
Mohan

Senior Data Engineer


Available in IST Timezone


Summary:

  • 8 years of experience in SQL, ETL, Informatica, Snowflake, Power BI, Grafana, Python, PySpark, AWS Glue, AWS Lambda, Databricks, Data Modelling, Oracle Cloud Fusion Analytics, Azure, AWS, GCP, Cloud Technologies, Reporting Tools, and Automation processes.
  • Diligent database professional and Senior Data Engineer seeking to leverage solid technical skills and abilities.

 

Skills and Competencies:

  • Proficiency in Databricks platform
  • Advanced data pipeline design and development
  • Data quality and governance
  • Data integration processes
  • Data security and privacy regulations
  • Data visualization tools development
  • Data warehouse and data mart design and development
  • ETL (Extract, Transform, Load) processes
  • Data governance and compliance
  • Technical Skills: MS SQL Server, SSIS, SSRS, ETL, Informatica, Data Warehousing, Azure, Power BI, Python, PySpark, Lambda, Glue, Databricks, PowerShell, Control-M, AWS, EC2, AppDynamics, Splunk, Snowflake, Performance Engineering, Oracle Cloud Fusion Analytics.
  • Databases & Data Warehousing: SQL Server (2016), Snowflake, Data Warehousing
  • ETL & Data Integration: SSIS (SQL Server Integration Services), SSRS (SQL Server Reporting Services), Informatica, ETL (Extract, Transform, Load)
  • Big Data & Cloud Technologies: Azure (Data Factory, Synapse, Blob Storage), AWS (Lambda, Glue), Databricks, PySpark
  • Data Visualization & BI Tools: Power BI, Grafana
  • Scripting & Programming: Python, PowerShell
  • Version Control & DevOps: Git
  • Schedulers & Automation Tools: Task Scheduler, BMC Control-M, File Watcher Jobs
  • Third-Party Tools: Beyond Compare, V-Edit, IBM Lotus Notes, Stats and Strat’s, Retriever Prep
  • Operating Systems: Windows Server (2008, 2008 R2, 2012, 2016)

 

Worked with:

  • Amazon India Pvt Ltd - Aug 2016 to Apr 2017
  • Cotiviti India Pvt Ltd - Apr 2017 to Sep 2021
  • Vaco Binary Semantics - Sep 2021 to Dec 2022
  • Birlasoft India Pvt Ltd - Jan 2023 to Feb 2024

 

Projects Executed:

 

Project 1:

Description: We create and deliver multiple products and release them to market per business requirements, processing millions of customer records on a frequent basis and identifying bottlenecks for smoother delivery.
Customer Name: Retail & Healthcare Services
Roles & Responsibilities:

  • Designed and managed end-to-end ETL pipelines to orchestrate data ingestion, transformation, and loading into the data warehouse.
  • Improved data pipeline reliability and performance, reducing processing time by 40% by refactoring legacy workflows and utilizing Databricks' optimized cluster configurations and autoscaling capabilities.
  • Integrated diverse data sources, including SQL databases (sales transactions), REST APIs (inventory systems), and flat files (product catalog), using ADF's Copy Activity and Data Flow components.
  • End-to-end data pipeline development: managed all aspects of the data engineering lifecycle, from data ingestion to transformation to storage.
  • Worked extensively with Azure Data Factory, Databricks, Python, and cloud-based solutions such as Azure Synapse Analytics.
  • Cleaned, transformed, and validated large datasets using Python and Databricks.
  • Oversaw the overall administration and configuration of the Databricks workspace.
  • Managed access to Unity Catalog and configured data governance settings.
  • Assigned and managed roles for other users.
  • Monitored the health and usage of data in the workspace.
  • Controlled the deployment of data governance policies.
  • Delivered measurable business outcomes, including improved data processing times, more accurate sales forecasts, and reduced stockouts.
  • Scheduled and monitored the data pipelines, ensuring data was ingested and transformed in near real-time.
  • Integrated data from various sources (e.g., Azure Data Lake, Azure Blob Storage, SQL Server, and external APIs) into Azure Synapse Analytics to support business intelligence and analytics initiatives (see the sketch after this list).
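
A minimal illustrative sketch (not the actual project code) of the ingestion-and-load pattern described above: reading sales transactions over JDBC and a flat-file product catalog, applying a basic transformation in Databricks, and loading a warehouse table. All connection details, paths, and table names are hypothetical, and spark/dbutils are assumed to be provided by the Databricks runtime.

  # Illustrative PySpark sketch; hypothetical names and connection details.
  from pyspark.sql import functions as F

  # Ingest sales transactions from a SQL database over JDBC.
  sales = (
      spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://example-host:1433;database=sales_db")  # placeholder
      .option("dbtable", "dbo.sales_transactions")                            # placeholder
      .option("user", dbutils.secrets.get("etl-scope", "sql-user"))           # hypothetical secret scope
      .option("password", dbutils.secrets.get("etl-scope", "sql-password"))
      .load()
  )

  # Ingest the product catalog delivered as flat files in the data lake.
  catalog = spark.read.option("header", True).csv("/mnt/raw/product_catalog/")  # placeholder path

  # Basic transformation: join, derive revenue, drop malformed rows.
  curated = (
      sales.join(catalog, on="product_id", how="left")
      .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
      .dropna(subset=["product_id", "revenue"])
  )

  # Load into the warehouse layer as a Delta table consumed by Synapse/BI.
  curated.write.format("delta").mode("overwrite").saveAsTable("warehouse.fact_sales")  # placeholder table

Scheduling and monitoring of a job like this would typically sit in ADF or Databricks job orchestration, as noted in the bullets above.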


Project 2: Seadrill

Duration: From Jan 2023 to Feb 2024
Customer Name: Oil & Gas Services
Roles & Responsibilities:

  • Designed and implemented a scalable, cloud-based data pipeline for a retail company to process and analyze sales data across multiple channels, including in-store, online, and third-party sales platforms.
  • Designed, developed, and maintained data pipelines for extracting, transforming, and loading (ETL) data from various sources into Databricks.
  • Set up monitoring and alerting using Databricks' integrated logging and cloud-native monitoring tools (e.g., AWS CloudWatch, Azure Monitor) to track workflow performance, job completion status, and trigger alerts on failures or anomalies.
  • Collaborated with cross-functional teams (data scientists, business analysts, etc.) to integrate machine learning models into Databricks workflows, automating the training, evaluation, and deployment of models in production environments.
  • Automated data pipeline versioning using Databricks' integration with GitHub/Bitbucket for source control, ensuring a robust development lifecycle, and streamlined collaboration across team members.
  • Optimized data processing workflows and queries to ensure scalability and efficiency.
  • Managed and optimized data storage in Delta Lake and other distributed storage systems.
  • Worked with data scientists and analysts to understand data needs and ensure that data was available and accessible.
  • Leveraged Azure Databricks to perform complex data transformations, aggregations, and joins across sales, customer, and inventory datasets.
  • Utilized PySpark (Python API for Apache Spark) to scale data processing for large volumes of transactional and inventory data.
  • Built data models (e.g., fact tables for sales and dimension tables for customers, products, and stores) to support efficient querying and analytics.
  • Developed custom Python scripts for data cleaning: removing duplicates, handling missing values, and applying business rules such as validating product SKUs and price ranges (see the sketch after this list).
  • Applied Pandas and NumPy for in-memory data processing and preprocessing prior to uploading data into Azure Databricks for transformation.
  • Used Azure Stream Analytics in combination with Azure Synapse for real-time data ingestion and analytics.
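
A minimal PySpark sketch of the cleaning and modelling rules described above (deduplication, missing-value handling, SKU and price-range validation, and a partitioned Delta fact table). Column names, the SKU pattern, and the table/path names are assumptions for illustration only.

  # Illustrative cleaning/validation sketch; all names and rules are assumed.
  from pyspark.sql import functions as F

  raw = spark.read.format("delta").load("/mnt/raw/sales_orders")  # placeholder source path

  cleaned = (
      raw.dropDuplicates(["order_id"])                                    # remove duplicate orders
      .fillna({"discount": 0.0})                                          # default missing discounts
      .filter(F.col("sku").rlike(r"^[A-Z]{3}-\d{5}$"))                    # assumed SKU format rule
      .filter((F.col("unit_price") > 0) & (F.col("unit_price") < 10000))  # assumed price-range rule
  )

  # Persist the curated fact table in Delta Lake, partitioned for efficient querying.
  (
      cleaned.write.format("delta")
      .mode("overwrite")
      .partitionBy("order_date")
      .saveAsTable("analytics.fact_sales")  # placeholder target table
  )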


Project 3: CareSource

Duration: From Sep 2021 to Dec 2022

Customer Name: Healthcare & Insurance Services
Roles & Responsibilities:

  • Delivered the roll-out and embedding of Data Foundation initiatives in support of key business programs.
  • Provided advisory on technology use and implementation.
  • Coordinated change management, incident management, and problem management processes.
  • Ensured traceability of requirements from data through testing and scope changes.
  • Managed the migration of data from applications to Azure Cloud.
  • Developed Databricks notebooks for extracting data from application APIs (see the sketch after this list).
  • Developed and maintained ETL pipelines leveraging Azure Data Factory and Azure Synapse Pipelines to ingest, transform, and load data efficiently into Synapse SQL Pools.
  • Designed final architectural solutions for data projects.
  • Implemented processes in Azure Data Factory to streamline data flow.
  • Optimized notebooks for performance and cost efficiency.
  • Analyzed business needs and translated them into technical solutions.
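
A hedged sketch of the kind of Databricks notebook cell described above: pulling records from an application REST API and landing them in the raw zone for the downstream Azure Data Factory/Synapse pipelines. The endpoint, secret scope, and landing path are placeholders, not actual project values.

  # Illustrative API-extraction sketch; endpoint, secrets, and paths are placeholders.
  import requests

  token = dbutils.secrets.get("api-scope", "app-token")  # hypothetical secret scope/key
  resp = requests.get(
      "https://api.example.com/v1/claims",               # placeholder endpoint
      headers={"Authorization": f"Bearer {token}"},
      params={"updated_since": "2022-01-01"},
      timeout=60,
  )
  resp.raise_for_status()
  records = resp.json()  # assumed to return a list of JSON objects

  # Convert the payload to a Spark DataFrame and land it in the raw zone as Delta.
  df = spark.createDataFrame(records)
  df.write.format("delta").mode("append").save("/mnt/raw/claims_api/")  # placeholder landing path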


Project 4: Amazon, Riteaid, Macy’s, Walgreens, Walmart, Delhaize

Duration: From Apr 2017 to Sep 2021
Customer Name: Retail & Healthcare Services
Roles & Responsibilities:

  • Created several EC2 instances as per workload and load-balancing requirements.
  • Extracted and transformed data from different sources into the cloud.
  • Worked on Redshift and data warehousing concepts.
  • Created and managed mappings to load data from various sources.
  • Designed and implemented Grafana dashboards to visualize real-time system performance metrics and business KPIs.
  • Integrated Grafana with Prometheus and Elasticsearch to provide comprehensive monitoring solutions.
  • Optimized queries and dashboards for improved performance, significantly reducing load times.
  • Developed custom Grafana plugins and panels to meet specific visualization needs.
  • Set up alerting mechanisms to proactively identify and address system anomalies and potential issues.
  • Automated the ETL process end to end through scheduler jobs such as Control-M.
  • After data was loaded into target systems, ran various quality checks to ensure data standards (see the sketch after this list).
  • Created new mappings as needed for different source files.
  • Developed and maintained data lakes and analytical platforms using Databricks on AWS and Azure, ensuring scalability, data security, and automation of infrastructure as code (IaC).
  • Reduced production AWS EMR processing costs by 25% and decreased downtime by 37% through effective optimization techniques, resource management, and configuration adjustments.
  • Led project teams, distributed tasks, reviewed pull requests, and supervised the implementation of big data solutions, ensuring adherence to project timelines and quality standards.
  • Developed and maintained continuous integration and continuous deployment (CI/CD) pipelines for schema migrations, workflows, and cluster pools using tools like Git, Jenkins, Azure Repos, and Azure Pipelines.
  • Developed integration frameworks for FHIR format data and Azure Databricks, troubleshooting and optimizing Delta Live Tables jobs to ensure seamless data processing and integration.
  • Hands-on scripting experience for manipulating files as per given requirements.
  • Handled the end-to-end process for all kinds of files, from receipt through loading, QC checks, and archival.
  • Ran various quality checks on data loaded into the staging environment before pushing it to production.
  • Analyzed data and generated reports from the reporting tool.
  • Monitored production cycles and ensured completion of daily ETL loads.
  • Designed, developed, and tested ETL mappings, mapplets, workflows, and worklets.
  • Worked in a fast-paced environment under minimal supervision, providing technical guidance to team members.
  • Responsible for database schema design, extensive T-SQL development, integration testing, and other projects necessary to help the team achieve its goals.
  • Worked closely with onshore and offshore application development leads.
  • Identified efficiencies and ways to improve design and development processes.
  • Identified ways to increase production support efficiency, finding solutions that allow operations to do their job without involving development resource time.
  • Designed and developed interactive Power BI dashboards and reports to track key performance indicators (KPIs) and business metrics.
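
An illustrative post-load quality check of the kind described above, run against a staging table before data is promoted to production; it can be wrapped in a scheduled (for example Control-M-triggered) job so a failure blocks promotion. Table and column names are assumptions.

  # Illustrative staging-layer quality checks; table and column names are assumed.
  from pyspark.sql import functions as F

  staged = spark.table("staging.daily_claims")  # placeholder staging table

  checks = {
      "row_count_nonzero": staged.count() > 0,
      "no_null_keys": staged.filter(F.col("claim_id").isNull()).count() == 0,
      "no_duplicate_keys": staged.count() == staged.select("claim_id").distinct().count(),
      "no_future_load_dates": staged.filter(F.col("load_date") > F.current_date()).count() == 0,
  }

  failed = [name for name, passed in checks.items() if not passed]
  if failed:
      # Raising here fails the scheduled job so the load is not promoted to production.
      raise ValueError(f"Data quality checks failed: {failed}")
  print("All staging quality checks passed.")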