ML, Python, GPT-4, NLP, FastAPI

Shubham M.

Machine Learning Engineer

13+ years of experience

Summary
  • 13 years of total experience in the IT industry as a Solution Engineer/Senior Software Engineer, covering project studies, development, testing, and deployment (on AWS) using Python and Java technologies.
  • Have worked on NLP projects for the last 6 years.
  • Strong NLP background with experience in machine learning, Python, and cutting-edge GenAI technologies, plus 2 years of experience with FastAPI.
Technical Skills
  • Python, MySQL, Prompt Engineering
  • ML: Linear & Logistic Regression, Naive Bayes, K-NN, K-Means, SVM, PCA, Decision Tree, Random Forest, Boosting Algorithms, NLP & Time Series
  • Pandas, NumPy
  • RegEx, ML libraries & NLP
  • Flask web framework, FastAPI framework
  • AWS EC2, S3, Bedrock, SageMaker services
  • Windows, Linux, Conda, VS Code
  • Azure Data Factory, Databricks & Machine Learning services
  • Generative AI tools: AutoGPT, HuggingGPT, Llama 2, GPT-4
  • LangChain, open-source LLMs, OpenAI models, NLP, machine learning, text classification, Python RESTful APIs, deep learning
  • Team leadership, project planning, and strong verbal and written communication skills
Competencies
  • Experience using ML libraries such as scikit-learn, spaCy, NLTK, Gensim, Flair, Hugging Face, Keras, and GPT-3.
  • Experience diving into data to discover hidden patterns, using Seaborn, Matplotlib, Pandas, NumPy, RegEx, etc.
  • Developed solutions leveraging cutting-edge technologies such as LangChain, open-source LLMs (Llama 2, Falcon, Ada, Mistral 7B), OpenAI models, Hugging Face models, and vector databases, for fine-tuning LLMs and constructing RAG-based applications.
  • Writing a white paper comparing knowledge-graph-based RAG systems with vector-database-based RAG systems.
  • Take ownership of the data science model end-to-end from data collection to model building to monitoring the model in production.
  • Solid understanding of the mathematics behind data science: probability, statistics, and linear algebra.
  • Ability to understand business concerns and formulate them as technical problems that can be solved using data and math / stats / ML.
  • Experience working as part of a product development team, along with engineers, product managers and SMEs to define the problem and execute the data science solution.
  • Designing and maintaining the infrastructure for scalable ETL data processing and storage in Azure Data Factory, and managing the end-to-end ML pipeline (data preprocessing, model training, deployment, and monitoring) using Azure Databricks and Machine Learning services.
Certifications
  • Cyber Security White Belt certification.
  • Lean Six Sigma Yellow Belt certification.
  • Career Essentials in Generative AI by Microsoft and LinkedIn Certification.
  • Machine Learning - Knowledge Based Assessment [201 - Intermediate]
  • Natural Language Processing - Knowledge Based Assessment [201 - Intermediate]
  • Statistics - Knowledge Based Assessment [201 - Intermediate]
  • Introduction to Data Science - Knowledge Based Assessment [101 - Basics]
Projects Worked On

PII PHI Redactor (Gen AI)

Domain: Insurance/Cross Industries

Description: A Gen-AI based solution. Companies incur losses when they fail to comply with laws such as HIPAA and GDPR. Insurance companies receive various sensitive documents as part of the customer onboarding process, and none of these documents can be stored or processed without masking. This application classifies each document by how sensitive its contents are, then identifies PII and PHI content and either redacts it or replaces it with placeholders. Documents can be images as well as text documents.

Roles:

  • Senior Data Scientist/Architect
  • Prompt Engineering, GPT-3.5
  • Created a Gradio-based solution to redact PII/PHI from documents.

 

Prequalification of Insurance Application (Gen AI)

Domain: Insurance/Cross Industries

Description: Improves operational efficiency by automating and/or modernizing the proofing process for insurance applications. The solution extracts information from the document and cross-validates it against the supporting documents, saving 4 to 5 hours of underwriter time. It can be used in other domains as well. Built with Amazon Bedrock services and Anthropic’s Claude v2 model.

Roles:

  • Project Lead/Architect
  • Led the project and served as the main architect of the overall idea.
  • Data Source - Insurance documents
  • Project Plan – Resource Sizing, Deliverables, Technology Stack, Architecture Decision

 

Accident Management-Service Date Adjustment Model (AI-ML)

Domain: Roadside Assistance & Accident Management

Description: Created a claim prediction model that predicts claims for cases not yet approved for payment. The AM-SDA model predicts claim cost on a daily basis, which gives a more up-to-date view of claims, particularly for recent dates. It makes the average claim cost report card available sooner at case level and gives easy access to average claim cost data. Weekly and monthly reports let the network and finance teams view up-to-date claim cases.

Roles:

  • Project Lead/Architect
  • Performed EDA, Statistical Model selection, Model Evaluation and deployment

 

Outlier Detection and Data exploration for KPIs (AI-ML)

Domain: Roadside Assistance & Accident Management

Description: Performed data exploration to identify and remove outlier data from experiments, using different statistical methods to identify outliers for different distributions. Outlier detection covered four KPIs: CPC (cost per case), EPJ (effort per job), CPI (cost per invoice), and EPC (effort per case). Also created a sigma dashboard for visualizing the data after outlier detection.

Roles:

  • Project Lead/Architect
  • Data Gathering, EDA, Feature Engineering, Data Visualization and Dashboard creation
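As an illustration of one such statistical method, here is a minimal sketch of Tukey's IQR rule applied to a KPI column. The choice of the IQR rule, the function name `iqr_outliers`, the multiplier `k=1.5`, and the sample CPC figures are all assumptions made for illustration; the source does not say which methods the project actually used.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Split values into (kept, outliers) using Tukey's IQR fences."""
    # Quartiles of the observed data; "inclusive" treats the sample as the full population.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Hypothetical cost-per-case (CPC) figures, not real project data.
cpc = [102.0, 98.5, 110.0, 95.0, 105.5, 99.0, 480.0]
kept, outliers = iqr_outliers(cpc)
print("outliers:", outliers)
```

The IQR rule suits skewed KPI distributions because it does not assume normality; for roughly normal data a z-score threshold would be a common alternative.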

 

Trade Credit Rater (AI-ML)

Domain: Insurance

Description: TC Rater is a tool for trade credit (TC) underwriters to generate a technical premium, expected loss, expenses, and capital charge, considering the time probability of default, country risk, industry risk, etc.

Roles:

  • Senior Developer/Architect
  • Worked closely with data scientists and the business to convert Excel-based rating solutions into a microservice-based Python architecture.
  • Created Monte Carlo simulation solutions for predicting premiums for different ratings for underwriters.

 

US Doc Proofing (AI-ML)

Domain: Legal Content

Description: Improve operational efficiency by automating and/or modernizing the proofing process for US legislation print proofing. Improve cycle time from 2 minutes per page to 1 minute per page. Achieve a per-section quality target of at least 98% per current quality guidelines. Reduce the number of required proofing reviews by eliminating the 2nd or 3rd review hand-offs.

Roles:

  • Project Lead/Architect
  • Led the project and served as the main architect of the overall idea.
  • Data Source - US legislation Documents
  • Project Plan – Resource Sizing, Deliverables, Technology Stack, Architecture Decision

 

Legislation Extraction (NLP)

Domain: Legal Content

Description: Legislation extraction identifies the section, subsection, rule, sub-rule, article, etc. of any act or regulation mentioned in a case law document and maps each to its corresponding source document. Accuracy in linking to the correct source document increased by 30%.

Roles:

  • Project Lead/Architect
  • Performed EDA, Statistical Model selection, Model Evaluation and deployment

 

De-Duping (NLP)

Domain: Legal Content

Description: Used Bag of Words and TF-IDF to detect duplication between documents, targeting fields such as Judge Name, Case Number, Advocates, and Party Names. Duplicate documents in the online product are a significant drawback because they adversely affect search results and linking.

Roles:

  • Project Lead/Architect
  • Data Gathering, EDA, Feature Engineering, Model Evaluation
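The TF-IDF approach described above can be sketched roughly as follows. This is an illustrative sketch only: the smoothed IDF formula, the cosine-similarity comparison, the `threshold=0.8` cut-off, the function names, and the sample records are assumptions, not details taken from the project.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF keeps weights positive even for terms found in every document.
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def find_duplicates(records, threshold=0.8):
    """Return index pairs whose TF-IDF cosine similarity meets the threshold."""
    vecs = tfidf_vectors([r.lower().split() for r in records])
    return [(i, j)
            for i in range(len(vecs))
            for j in range(i + 1, len(vecs))
            if cosine(vecs[i], vecs[j]) >= threshold]


# Made-up records built from judge, case number, and party fields.
records = [
    "judge_a case_101 advocate_p party_x party_y",
    "judge_a case_101 advocate_p party_x party_y",
    "judge_b case_202 advocate_q party_z party_w",
]
pairs = find_duplicates(records)
print(pairs)
```

Comparing every pair is quadratic in the number of documents, so a real corpus would likely need blocking on a field such as Case Number before scoring.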

 

Metadata Extraction (NLP)

Domain: Legal Content

Description: Trained a Naïve Bayes model to find metadata fields such as Judge Name, Case Number, Advocates, and Party Names in the PDF of a judgment. The solution also generates an Excel sheet of the extracted metadata and stores it in a database.

Roles:

  • Training dataset creation, EDA, Feature Engineering
  • Tokenization, Lemmatization, Text Wrangling, TF-IDF
  • Statistical models such as Naïve Bayes and SVM, used to classify the Judge Name field
