Sandeep Manam

Data Engineer
Duluth, GA

"The way to get started is to quit talking and begin doing." (Walt Disney)

Summary

  • Overall 10+ years of professional IT experience, including over 5 years in the Big Data ecosystem covering ingestion, storage, querying, processing, and analysis of big data with Databricks and cloud technologies (AWS, Azure, GCP).
  • Hands-on experience with Azure cloud services (PaaS and IaaS): Azure Databricks, Azure Synapse Analytics, Azure Cosmos DB, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure HDInsight, Key Vault, and Azure Data Lake for data ingestion, ETL, data integration, data migration, and AI solutions.
  • Ingested data into Azure Blob Storage and processed it using Databricks; wrote Spark scripts and UDFs to perform transformations on large datasets.
  • Experience working with Azure Blob and Data Lake storage and loading data into Azure Synapse Analytics.
  • Experience building and maintaining multiple Hadoop clusters of different sizes and configurations.
  • Created Databricks notebooks to streamline and curate data for various business use cases and mounted Blob Storage on Databricks.
  • Experience building data pipelines and computing large volumes of data using Azure Data Factory.
  • Developed Python scripts for file validations in Databricks and automated the process using ADF.
  • In-depth knowledge of Hadoop and Spark; experience with data mining and stream-processing technologies (Kafka, Spark Streaming).
  • Expertise in Big Data architectures such as Hadoop (Azure, Hortonworks, Cloudera) distributed systems, MongoDB, NoSQL, HDFS, and the MapReduce parallel-processing framework.
  • Developed Spark-based applications to load streaming data with low latency using Kafka and PySpark.
  • Extensive hands-on experience tuning Spark jobs.
  • Experienced in working with structured data using HiveQL and optimizing Hive queries.
  • Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing, and analysis of data.
  • Experience developing Big Data projects using Hadoop, Hive, Flume, and MapReduce open-source tools.
  • Experience in installation, configuration, support, and management of Hadoop clusters.
  • Experience writing MapReduce programs with Apache Hadoop for working with Big Data.
  • Experience developing, supporting, and maintaining ETL (Extract, Transform, Load) processes using Talend Integration Suite.
  • Worked with BTEQ in a UNIX environment and executed TPT scripts from the UNIX platform.
  • Worked on Teradata stored procedures and functions to confirm the data and load it into tables.
  • Used Teradata MultiLoad and FastLoad utilities to load data from Oracle and SQL Server into Teradata.
  • Wrote numerous BTEQ scripts to run complex queries on the Teradata database.
  • Tuned SQL queries to overcome spool space errors and improve performance.
  • Experience in installation, configuration, support, and monitoring of Hadoop clusters using Apache and Cloudera distributions and AWS.
  • Strong hands-on experience with AWS services, including EMR, S3, EC2, Lambda, Glue, Redshift, Athena, and DynamoDB.
  • Excellent working experience in Scrum/Agile and Waterfall project execution methodologies.
  • Hands-on experience with the Hadoop ecosystem, including Spark, Kafka, HBase, Scala, Hive, Sqoop, Oozie, and Flume.
  • Worked on Spark and Spark Streaming, using the core Spark API to build data pipelines.
  • Experienced with scripting technologies such as Python and UNIX shell scripts.
  • Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2; successfully loaded files into HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
  • Installed and configured Apache Airflow for workflow management and created workflows in Python.
  • Developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation with Airflow (a minimal sketch appears after this summary).
  • Experience in database design, entity relationships, and database analysis; programming SQL, PL/SQL stored procedures, packages, and triggers in Oracle.
  • Experience working with different data sources such as flat files, XML files, and databases.
  • Hands-on experience with Continuous Integration and Deployment (CI/CD).
  • Strong communication and analytical skills; a good team player and quick learner, organized and self-motivated.

Transitioning from a data-centric environment with a focus on developing efficient data solutions and optimizing workflows. Skilled in data architecture, database management, SQL, and Python, with a track record of enhancing data-driven decision-making processes. Seeking to apply these transferable skills in a new field, bringing a consultative approach to solving complex problems and improving operational efficiency.
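
A minimal sketch of the Airflow workflow pattern mentioned above (task dependencies, an SLA, and a time sensor). The DAG name and the `validate_files` callable are hypothetical placeholders, not actual project code.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.time_delta import TimeDeltaSensor


def validate_files(**context):
    # Placeholder for the file-validation logic that ran downstream in Databricks/ADF.
    print("validating landed files for", context["ds"])


with DAG(
    dag_id="daily_file_validation",      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={
        "retries": 1,
        "sla": timedelta(hours=2),       # SLA watcher: flags tasks running past 2 hours
    },
) as dag:
    # Time sensor: hold the run until 30 minutes past the schedule time
    wait_for_window = TimeDeltaSensor(
        task_id="wait_for_window",
        delta=timedelta(minutes=30),
    )

    validate = PythonOperator(
        task_id="validate_files",
        python_callable=validate_files,
    )

    wait_for_window >> validate
```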

Overview

11 years of professional experience
1 Certification
6 Languages

Work History

MasterControl

Senior Data Engineer
01.2022 - Current

Job overview

  • The Data Engineering team builds a data lake using big data technologies to pull in any type of customer information
  • The challenge is to make all customer data easily accessible to the marketing group so that, when they pull groups of data, they capture the correct audience through Burst
  • Worked on end-to-end ETL (Extract, Transform, Load) processes using ADF, ensuring seamless data integration and quality
  • Hands-on experience creating Spark clusters and configuring parameters in Databricks notebooks
  • Implemented Delta Lake concepts in Databricks
  • Experienced in implementing SCD1 and SCD2 using Delta Lake, data validations using PySpark, and scheduling Databricks notebooks with Databricks Jobs
  • Experienced in implementing audit logs, exception handling, and logging in every Databricks notebook
  • Experienced in implementing dynamic notebooks using notebook parameters
  • Hands-on knowledge of dbutils commands and processing different file formats such as CSV, Parquet, and JSON
  • Hands-on knowledge of different connectors in Databricks, such as creating mount points to connect to Data Lake storage and creating JDBC connections from Databricks to Azure Synapse (see the Databricks sketch after this list)
  • Hands-on knowledge of different Spark operations and converting unrefined data to refined data using PySpark
  • Strong knowledge of analytic functions such as window functions, rank, and dense_rank
  • Integrated Databricks with Azure Key Vault
  • Developed PySpark code for data cleansing, such as trimming columns and checking for duplicates, including key duplicates
  • Implemented SCD Type 1 using merge functionality (see the Databricks sketch after this list)
  • Hands-on experience implementing different types of joins in PySpark and SQL
  • Experience implementing Azure Data Factory (v2) pipeline components such as linked services, datasets, and activities
  • Implemented multi-table full loads and incremental loads from on-premises to cloud, and implemented auditing in Azure Data Factory using stored procedures
  • Implemented scheduling in Azure Data Factory using triggers, and dynamic data loading driven by a config table using Lookup, ForEach, and Copy Data activities for the tables configured in that table; experience with Key Vault
  • Scheduling ADF pipelines using scheduled triggers and event-based triggers
  • Working experience building pipelines in ADF using linked services, datasets, and pipelines to extract and load data from different sources such as Azure SQL, ADLS, Blob Storage, and Azure SQL Data Warehouse
  • Involved in Bug fixes and code debugging and job monitoring in Production Environment
  • Implemented PolyBase in Azure SQL DW and an email feature using Azure Logic Apps to get pipeline notifications
  • Designed and implemented Snowflake stages to efficiently load data from various sources into Snowflake tables
  • Created and managed different types of tables in Snowflake, such as transient, temporary, and persistent tables and optimized Snowflake warehouses by selecting appropriate sizes and configurations to achieve optimal performance and cost efficiency
  • Applied advanced partitioning techniques within Snowflake, elevating both query performance and the efficiency of data retrieval
  • Effectively configured and oversaw multi-cluster warehouses in Snowflake, ensuring their capacity to adeptly manage high-concurrency workloads
  • Worked with Snowpipe to seamlessly ingest real-time data into the Snowflake environment, guaranteeing a steady flow of data and automating the loading process (see the Snowflake sketch after this list)
  • Utilized Power BI to design and develop interactive dashboards and reports for various departments (e.g., Sales, Marketing, Finance) to track key performance indicators (KPIs) and gain insights into business performance
  • Created calculated columns and measures using DAX to manipulate and analyze data within Power BI
  • Employed data cleaning techniques and transformations to ensure data accuracy and consistency for reliable reporting
  • Partnered with business stakeholders to understand their data needs and translated them into actionable data visualizations and reports
  • Developed and maintained data refresh processes to ensure reports reflect the latest data
  • Increased sales team productivity by 20% through the creation of an interactive sales pipeline dashboard in Power BI
  • Reduced marketing campaign costs by 15% through data-driven insights generated from Power BI reports
  • Containerized applications and services using Docker and orchestrated them with Kubernetes for scalability and portability
  • Managed version control and code repositories using GIT, ensuring collaboration and code quality
  • Implemented CI/CD pipelines in Azure DevOps, automating deployment and ensuring continuous integration
  • Environment: Azure DW, ADF, Azure Databricks, Azure solutions, Python, SQL, Bash, Hadoop, Hive, Sqoop, PySpark, Scala, XML, JSON, Docker, Kubernetes, GIT, Azure DevOps, ELK, JIRA, Agile, Snowflake
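
A minimal sketch of the Databricks pattern referenced above: mounting storage through an Azure Key Vault-backed secret scope, basic PySpark cleansing, and an SCD Type 1 upsert into a Delta table. Storage account, container, secret scope, and table names are placeholders rather than actual project values; `spark` and `dbutils` are the objects Databricks provides inside a notebook.

```python
# Databricks notebook sketch (PySpark + Delta Lake); names below are illustrative only.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

storage_account = "examplestorage"   # placeholder storage account
container = "raw"                    # placeholder container
mount_point = "/mnt/raw"

# Mount Blob/ADLS storage using a key pulled from a Key Vault-backed secret scope
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
        mount_point=mount_point,
        extra_configs={
            f"fs.azure.account.key.{storage_account}.blob.core.windows.net":
                dbutils.secrets.get(scope="kv-scope", key="storage-account-key")
        },
    )

# Basic cleansing: trim a column and drop key duplicates before the merge
src = (spark.read.option("header", True).csv(f"{mount_point}/customers/")
       .withColumn("customer_name", F.trim("customer_name"))
       .dropDuplicates(["customer_id"]))

# SCD Type 1 upsert into an existing curated Delta table via the MERGE API
target = DeltaTable.forName(spark, "curated.customers")
(target.alias("t")
    .merge(src.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```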
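
A sketch of the Snowflake objects described above (an external stage, a transient staging table, and a Snowpipe for automated loading), issued through the Snowflake Python connector. The account, credentials, warehouse, and object names are placeholders, and the cloud event-notification setup that auto-ingest requires is omitted.

```python
# Illustrative Snowflake setup via the Python connector; all names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.east-us-2.azure",   # placeholder account locator
    user="ETL_USER",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

statements = [
    # External stage over cloud storage (credentials/integration omitted here)
    "CREATE OR REPLACE STAGE raw_stage URL='azure://exampleaccount.blob.core.windows.net/raw'",
    # Transient table: no Fail-safe, cheaper for re-loadable staging data
    "CREATE OR REPLACE TRANSIENT TABLE stg_orders (order_id NUMBER, amount NUMBER, loaded_at TIMESTAMP_NTZ)",
    # Snowpipe: auto-ingest new files landing in the stage (notification setup not shown)
    """CREATE OR REPLACE PIPE orders_pipe AUTO_INGEST = TRUE AS
       COPY INTO stg_orders FROM @raw_stage/orders/
       FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)""",
]

cur = conn.cursor()
try:
    for stmt in statements:
        cur.execute(stmt)
finally:
    cur.close()
    conn.close()
```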

Guardian Life Insurance limited

Data Engineer
02.2021 - 12.2021

Job overview

  • Established and developed serverless applications on AWS using the Serverless Framework and Python's boto3 module
  • Built serverless applications utilizing AWS Lambda, API Gateway, and DynamoDB, reducing infrastructure costs and improving scalability (see the sketch after this list)
  • Designed and developed end-to-end data pipelines using AWS Glue, Apache Airflow and Apache Spark to extract, transform, and load data from various sources into data warehouses
  • Integrated data quality checks and data governance mechanisms to ensure data accuracy and consistency throughout the organization
  • Created and designed ETL processes in AWS Glue to import various kinds of data from outside sources into Amazon Redshift
  • Monitored and maintained the health and performance of Amazon Redshift clusters through monitoring tools and dashboards
  • Managed backups and disaster recovery procedures to ensure data integrity and business continuity
  • Implemented data encryption and security measures to protect sensitive data in compliance with industry standards
  • Designed and optimized data models, including defining schemas and setting up indexes, to enhance data storage and retrieval efficiency in Snowflake
  • Developed and maintained data pipelines using Apache Airflow and custom scripts to extract, transform, and load (ETL) data from diverse sources into the RedShift data warehouse
  • Set up proactive monitoring and alerting systems to detect and address data loads, query performance, and system health issues
  • Utilized Apache Spark for distributed computing tasks, improving processing speeds
  • Environment: AWS Cloud S3, Snowflake, DB2, Bigdata, SSIS, Spark, EMR, Python, MapReduce, Glue, CloudWatch
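
A minimal sketch of the serverless pattern described above: an API Gateway proxy event handled by Lambda and persisted to DynamoDB with boto3. The table name, environment variable, and payload fields are illustrative assumptions, not the actual Guardian resources.

```python
# Lambda handler sketch: API Gateway -> Lambda -> DynamoDB (names are placeholders).
import json
import os

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ.get("TABLE_NAME", "customer-events"))


def handler(event, context):
    """Entry point for an API Gateway proxy integration."""
    body = json.loads(event.get("body") or "{}")

    item = {
        "customer_id": body["customer_id"],   # assumed partition key
        "event_ts": body["event_ts"],         # assumed sort key
        "payload": body,
    }
    table.put_item(Item=item)

    return {
        "statusCode": 200,
        "body": json.dumps({"stored": item["customer_id"]}),
    }
```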

Standard Chartered Bank

Azure Data Engineer
11.2019 - 01.2021

Job overview

  • Responsible for creating a data lake on the Azure Cloud Platform to improve business teams' use of Azure Synapse SQL for data analysis
  • Utilized Azure SQL as an external Hive metastore for Databricks clusters so that metadata persisted across multiple clusters
  • Employed Azure Data Lake Storage as the data lake and ensured that Spark and Hive tasks sent all processed data directly to ADLS
  • Responsible for designing fact and dimension tables with Snowflake schema to store the historical data and query them using T-SQL
  • Strong experience working with Azure Databricks runtimes and using the Databricks API to automate launching and terminating runtimes
  • Experience integrating Snowflake data with Azure Blob Storage and SQL Data Warehouse using Snowpipe
  • Employed resources like SQL Server Integration Services, Azure Data Factory, and other ETL tools to identify the route for transferring data from SAS reports to Azure Data Factory
  • Transferred data to Excel and Power BI for analysis and visualization after it was moved through Azure Data Factory and processed in Azure Databricks
  • Implemented version control using Git for collaborative development
  • Utilized Visual Studio Team Services (VSTS) for project management, code repositories, and continuous integration
  • Developed PowerShell scripts for automation and configuration management
  • Retrieved data from Kafka and transferred it to the Spark pipeline using Spark Kafka streaming (see the sketch after this list), and built Sqoop scripts to import and export data between RDBMS and HDFS
  • Worked on the Snowflake schema, data modelling and elements, source-to-target mappings, the interface matrix, and design components; created analytical warehouses on Snowflake and used SnowSQL to analyze data quality issues
  • Implemented data processing workflows using PySpark, leveraging the power of Python for Spark
  • Developed and optimized Spark pipelines for efficient large-scale data processing
  • Utilized JIRA for project tracking, task management, and collaboration
  • Developed Tableau dashboards for visualizing and presenting data insights to stakeholders
  • Designed and deployed infrastructure using Azure Resource Manager (ARM) templates
  • Implemented and managed SQL Data Warehousing (SQL DW) solutions for analytical processing
  • Maintained Azure Data Factory to consume data from several source systems and transferred data from upstream to downstream systems using Azure Data Factory as an orchestration tool
  • Developed pipelines in Azure Data Factory (ADF) using linked services, datasets, and pipelines to extract, transform, and load data from a variety of sources, including Azure SQL, Blob Storage, Azure SQL Data Warehouse, and write-back tools
  • Implemented and optimized data storage solutions using Azure Cosmos DB
  • Utilized Azure Analysis Services for multidimensional data analysis and reporting
  • Designed and orchestrated data workflows using Azure Data Factory
  • Utilized Azure Data Lake Storage (ADLS) for scalable and secure data storage
  • Implemented and optimized data processing using Azure Databricks
  • Utilized Azure Stream Analytics for real-time data streaming and analytics
  • Developed and optimized Pig and Hive scripts for data transformation and querying
  • Managed and optimized distributed storage using Hadoop Distributed File System (HDFS) and Cloud SQL
  • Implemented Spark-Kafka integration for efficient data streaming and processing
  • Utilized Apache Kafka for building scalable and fault-tolerant data pipelines
  • Environment: Apache Spark, Blob Storage, ADLS Gen2, ADF, Databricks, Synapse Analytics, Hive, Cosmos DB, Spark SQL, Snowflake, Unix, Kafka, Python, SQL Server
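
A minimal sketch of the Kafka-to-Spark Structured Streaming flow referenced above, landing processed records in ADLS. Broker addresses, topic, schema, and storage paths are placeholders chosen for illustration.

```python
# Spark Structured Streaming sketch: Kafka source -> parsed JSON -> ADLS sink.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-adls").getOrCreate()

event_schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
       .option("subscribe", "transactions")                  # placeholder topic
       .option("startingOffsets", "latest")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "abfss://curated@exampleadls.dfs.core.windows.net/transactions/")
         .option("checkpointLocation", "abfss://curated@exampleadls.dfs.core.windows.net/_chk/transactions/")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```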

Extarc Software Solutions Pvt. Ltd

Hadoop Developer
05.2017 - 10.2019

Job overview

  • Developed ETL jobs using Spark-Scala to migrate data from Oracle to new MySQL tables
  • Extensively used Spark-Scala (RDDs, DataFrames, Spark SQL) and the Spark-Cassandra Connector APIs for various tasks (data migration, business report generation, etc.)
  • Developed Spark Streaming application for real time sales analytics
  • Prepared an ETL framework using Sqoop, Pig, and Hive to frequently bring in data from the source and make it available for consumption
  • Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project
  • Engineered complex data pipelines using tools such as Databricks, processing terabytes of data to drive decision-making
  • Analyzed the source data and handled efficiently by modifying the data types
  • Worked on Excel sheets, flat files, and CSV files to generate Power BI ad hoc reports
  • Analyzed SQL scripts and designed the solution to implement them using PySpark (see the sketch after this list)
  • Extracted the data from other data sources into HDFS using Sqoop
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS
  • Extracted the data from MySQL into HDFS using Sqoop
  • Implemented automation for deployments by using YAML scripts for massive builds and releases
  • Worked with Apache Hive, Apache Pig, HBase, Apache Spark, ZooKeeper, Flume, Kafka, and Sqoop
  • Implemented Data classification algorithms using MapReduce design patterns
  • Extensively worked on creating combiners, Partitioning, distributed cache to improve the performance of MapReduce jobs
  • Used Git to maintain source code in GitHub repositories
  • Environment: Hadoop, Hive, Spark, PySpark, Sqoop, Spark SQL, Cassandra, YAML, ETL
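
An illustrative sketch of re-expressing a SQL script with the PySpark DataFrame API over Hive tables, as described above; the database, table, and column names are hypothetical.

```python
# SQL-to-PySpark sketch over Hive tables (placeholder names).
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("sales-report")
         .enableHiveSupport()
         .getOrCreate())

# Original SQL-style version
spark.sql("""
    SELECT region, SUM(amount) AS total_sales
    FROM sales_db.sales
    WHERE sale_date >= '2019-01-01'
    GROUP BY region
""").show()

# Equivalent DataFrame API version
sales = spark.table("sales_db.sales")
(sales.filter(F.col("sale_date") >= "2019-01-01")
      .groupBy("region")
      .agg(F.sum("amount").alias("total_sales"))
      .show())
```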

Triniti Advanced Software Labs

Data Warehouse Consultant
05.2014 - 04.2017

Job overview

  • Worked with Stakeholders regarding business requirements, functional specifications, and enhancements, based on the business needs created technical design and functional specification documents
  • Active participation in weekly calls with data modeling and analyst teams to understand and work on any new requirements
  • Analyzed data from source systems to design solutions for business requirements
  • Developed Complex Mappings in Informatica using Power Center transformations (Source Qualifier, Joiner, Lookups, Filter, Router, Aggregator, Expression, XML Update and Sequence generator transformations), Mapping Parameters/Variables, Parameter files, SQL overrides, Transformation Language
  • Profiled source files and developed data models and mappings for smaller requirements
  • Implemented CDC, SCD2, and SCD1 delta loads, snapshot and transactional fact tables, headers and footers for flat files, and file lists
  • Developed Unix scripts for SFTP file transfers and Target table truncate operations
  • Implemented partitioning at database level for better performance
  • Scheduled the Workflows to run on a daily and weekly basis using Control-M Scheduling tool
  • Provided support to the QA team for various testing phases of ETL development
  • Involved in unit testing and Unit test plan document preparation
  • Documented mapping specifications, STMs, unit test cases, procedures, and results
  • Implemented Push Down Optimization (PDO) for better performance when source data is huge
  • Environment: Informatica PowerCenter 9.5/9.1, Oracle 11g, Flat Files, COBOL Files, Erwin, Control-M, SQL, PL/SQL, Shell Scripting

Education

Rivier University
Nashua, NH

Master of Science in Computer and Information Systems
05.2001

Skills

Python

Certification

Microsoft Certified: Azure Data Engineer Associate (Microsoft)

Timeline

Senior Data Engineer

MasterControl
01.2022 - Current

Data Engineer

Guardian Life Insurance limited
02.2021 - 12.2021

Azure Data Engineer

Standard Chartered Bank
11.2019 - 01.2021

Hadoop Developer

Extarc Software Solutions Pvt. Ltd
05.2017 - 10.2019

Data Warehouse Consultant

Triniti Advanced Software Labs
05.2014 - 04.2017

Rivier University

Master of Science in Computer and Information Systems
05.2001

Accomplishments

  • Supervised a team of 4 staff members.

Work Preference

Work Type

Contract Work

Work Location

On-Site, Remote, Hybrid

Important To Me

Career advancement, Work-life balance, Company Culture

Languages

English
Full Professional
Hindi
Full Professional
Telugu
Native or Bilingual
Italian
Full Professional