Sandeep Manam

Data Engineer
Duluth, GA

"The way to get started is to quit talking and begin doing." (Walt Disney)

Summary

  • Overall 10+ years of professional IT experience, including over 5 years in the Big Data ecosystem covering ingestion, storage, querying, processing, and analysis of big data with Databricks and cloud technologies (AWS, Azure, GCP).
  • Hands-on experience with Azure cloud services (PaaS and IaaS): Azure Databricks, Azure Synapse Analytics, Azure Cosmos DB, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure HDInsight, Key Vault, and Azure Data Lake for data ingestion, ETL, data integration, data migration, and AI solutions.
  • Ingested data into Azure Blob Storage and processed it using Databricks; wrote Spark scripts and UDFs to perform transformations on large datasets.
  • Experience working with Azure Blob and Data Lake storage and loading data into Azure Synapse Analytics.
  • Experience building and maintaining multiple Hadoop clusters of different sizes and configurations.
  • Created Databricks notebooks to streamline and curate data for various business use cases and mounted Blob Storage on Databricks.
  • Experience building data pipelines and computing large volumes of data using Azure Data Factory.
  • Developed Python scripts for file validations in Databricks and automated the process using ADF.
  • In-depth knowledge of Hadoop and Spark; experience with data mining and stream-processing technologies (Kafka, Spark Streaming).
  • Expertise in Big Data architectures such as Hadoop (Azure, Hortonworks, Cloudera) distributed systems, MongoDB, NoSQL, HDFS, and the MapReduce parallel-processing framework.
  • Developed Spark-based applications to load streaming data with low latency using Kafka and PySpark.
  • Extensive hands-on experience tuning Spark jobs.
  • Experienced in working with structured data using HiveQL and optimizing Hive queries.
  • Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing, and analysis of data.
  • Experience developing Big Data projects using Hadoop, Hive, Flume, and MapReduce open-source tools.
  • Experience in installation, configuration, support, and management of Hadoop clusters.
  • Experience writing MapReduce programs with Apache Hadoop for working with Big Data.
  • Experience developing, supporting, and maintaining ETL (Extract, Transform, Load) processes using Talend Integration Suite.
  • Worked with BTEQ in a UNIX environment and executed TPT scripts from the UNIX platform.
  • Worked on Teradata stored procedures and functions to confirm the data and load it into tables.
  • Used Teradata MultiLoad and FastLoad utilities to load data from Oracle and SQL Server into Teradata.
  • Wrote numerous BTEQ scripts to run complex queries on the Teradata database.
  • Tuned SQL queries to overcome spool space errors and improve performance.
  • Experience in installation, configuration, support, and monitoring of Hadoop clusters using Apache and Cloudera distributions and AWS.
  • Strong hands-on experience with AWS services, including EMR, S3, EC2, Lambda, Glue, Redshift, Athena, and DynamoDB.
  • Excellent working experience in Scrum/Agile and Waterfall project execution methodologies.
  • Hands-on experience with the Hadoop ecosystem, including Spark, Kafka, HBase, Scala, Hive, Sqoop, Oozie, and Flume.
  • Worked on Spark and Spark Streaming, using the core Spark API to build data pipelines.
  • Experienced with scripting technologies such as Python and UNIX shell scripts.
  • Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2; successfully loaded files into HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
  • Installed and configured Apache Airflow for workflow management and created workflows in Python.
  • Developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation with Airflow (a minimal sketch appears after this summary).
  • Experience in database design, entity relationships, and database analysis; programming SQL, PL/SQL stored procedures, packages, and triggers in Oracle.
  • Experience working with different data sources such as flat files, XML files, and databases.
  • Hands-on experience with Continuous Integration and Deployment (CI/CD).
  • Strong communication and analytical skills; a good team player and quick learner, organized and self-motivated.

Transitioning from a data-centric environment with a focus on developing efficient data solutions and optimizing workflows. Skilled in data architecture, database management, SQL, and Python, with a track record of enhancing data-driven decision-making processes. Seeking to apply these transferable skills in a new field, bringing a consultative approach to solving complex problems and improving operational efficiency.
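
A minimal sketch of the Airflow workflow pattern mentioned above (task dependencies, an SLA, and a time sensor). The DAG name and the `validate_files` callable are hypothetical placeholders, not actual project code.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.time_delta import TimeDeltaSensor


def validate_files(**context):
    # Placeholder for the file-validation logic that ran downstream in Databricks/ADF.
    print("validating landed files for", context["ds"])


with DAG(
    dag_id="daily_file_validation",      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={
        "retries": 1,
        "sla": timedelta(hours=2),       # SLA watcher: flags tasks running past 2 hours
    },
) as dag:
    # Time sensor: hold the run until 30 minutes past the schedule time
    wait_for_window = TimeDeltaSensor(
        task_id="wait_for_window",
        delta=timedelta(minutes=30),
    )

    validate = PythonOperator(
        task_id="validate_files",
        python_callable=validate_files,
    )

    wait_for_window >> validate
```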

Overview

11 years of professional experience
1 Certification
6 Languages

Work History

MasterControl

Senior Data Engineer
01.2022 - Current

Job overview

  • The Data Engineering team builds a data lake using big data technologies to pull in any type of customer information
  • The challenge is to make all customer data easily accessible to the marketing group so that, when they pull groups of data, they capture the correct audience through Burst
  • Worked on end-to-end ETL (Extract, Transform, Load) processes using ADF, ensuring seamless data integration and quality
  • Hands-on experience creating Spark clusters and configuring parameters in Databricks notebooks
  • Implemented Delta Lake concepts in Databricks
  • Experienced in implementing SCD1 and SCD2 using Delta Lake, data validations using PySpark, and scheduling Databricks notebooks with Databricks Jobs
  • Experienced in implementing audit logs, exception handling, and logging in every Databricks notebook
  • Experienced in implementing dynamic notebooks using notebook parameters
  • Hands-on knowledge of dbutils commands and processing different file formats such as CSV, Parquet, and JSON
  • Hands-on knowledge of different connectors in Databricks, such as creating mount points to connect to Data Lake storage and creating JDBC connections from Databricks to Azure Synapse (see the Databricks sketch after this list)
  • Hands-on knowledge of different Spark operations and converting unrefined data to refined data using PySpark
  • Strong knowledge of analytic functions such as window functions, rank, and dense_rank
  • Integrated Databricks with Azure Key Vault
  • Developed PySpark code for data cleansing, such as trimming columns and checking for duplicates, including key duplicates
  • Implemented SCD Type 1 using merge functionality (see the Databricks sketch after this list)
  • Hands-on experience implementing different types of joins in PySpark and SQL
  • Experience implementing Azure Data Factory (v2) pipeline components such as linked services, datasets, and activities
  • Implemented multi-table full loads and incremental loads from on-premises to cloud, and implemented auditing in Azure Data Factory using stored procedures
  • Implemented scheduling in Azure Data Factory using triggers, and dynamic data loading driven by a config table using Lookup, ForEach, and Copy Data activities for the tables configured in that table; experience with Key Vault
  • Scheduling ADF pipelines using scheduled triggers and event-based triggers
  • Working experience building pipelines in ADF using linked services, datasets, and pipelines to extract and load data from different sources such as Azure SQL, ADLS, Blob Storage, and Azure SQL Data Warehouse
  • Involved in Bug fixes and code debugging and job monitoring in Production Environment
  • Implemented PolyBase in Azure SQL DW and an email feature using Azure Logic Apps to get pipeline notifications
  • Designed and implemented Snowflake stages to efficiently load data from various sources into Snowflake tables
  • Created and managed different types of tables in Snowflake, such as transient, temporary, and persistent tables and optimized Snowflake warehouses by selecting appropriate sizes and configurations to achieve optimal performance and cost efficiency
  • Applied advanced partitioning techniques within Snowflake, elevating both query performance and the efficiency of data retrieval
  • Effectively configured and oversaw multi-cluster warehouses in Snowflake, ensuring their capacity to adeptly manage high-concurrency workloads
  • Worked with Snowpipe to seamlessly ingest real-time data into the Snowflake environment, guaranteeing a steady flow of data and automating the loading process (see the Snowflake sketch after this list)
  • Utilized Power BI to design and develop interactive dashboards and reports for various departments (e.g., Sales, Marketing, Finance) to track key performance indicators (KPIs) and gain insights into business performance
  • Created calculated columns and measures using DAX to manipulate and analyze data within Power BI
  • Employed data cleaning techniques and transformations to ensure data accuracy and consistency for reliable reporting
  • Partnered with business stakeholders to understand their data needs and translated them into actionable data visualizations and reports
  • Developed and maintained data refresh processes to ensure reports reflect the latest data
  • Increased sales team productivity by 20% through the creation of an interactive sales pipeline dashboard in Power BI
  • Reduced marketing campaign costs by 15% through data-driven insights generated from Power BI reports
  • Containerized applications and services using Docker and orchestrated them with Kubernetes for scalability and portability
  • Managed version control and code repositories using GIT, ensuring collaboration and code quality
  • Implemented CI/CD pipelines in Azure DevOps, automating deployment and ensuring continuous integration
  • Environment: Azure DW, ADF, Azure Databricks, Azure solutions, Python, SQL, Bash, Hadoop, Hive, Sqoop, PySpark, Scala, XML, JSON, Docker, Kubernetes, GIT, Azure DevOps, ELK, JIRA, Agile, Snowflake
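
A minimal sketch of the Databricks pattern referenced above: mounting storage through an Azure Key Vault-backed secret scope, basic PySpark cleansing, and an SCD Type 1 upsert into a Delta table. Storage account, container, secret scope, and table names are placeholders rather than actual project values; `spark` and `dbutils` are the objects Databricks provides inside a notebook.

```python
# Databricks notebook sketch (PySpark + Delta Lake); names below are illustrative only.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

storage_account = "examplestorage"   # placeholder storage account
container = "raw"                    # placeholder container
mount_point = "/mnt/raw"

# Mount Blob/ADLS storage using a key pulled from a Key Vault-backed secret scope
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
        mount_point=mount_point,
        extra_configs={
            f"fs.azure.account.key.{storage_account}.blob.core.windows.net":
                dbutils.secrets.get(scope="kv-scope", key="storage-account-key")
        },
    )

# Basic cleansing: trim a column and drop key duplicates before the merge
src = (spark.read.option("header", True).csv(f"{mount_point}/customers/")
       .withColumn("customer_name", F.trim("customer_name"))
       .dropDuplicates(["customer_id"]))

# SCD Type 1 upsert into an existing curated Delta table via the MERGE API
target = DeltaTable.forName(spark, "curated.customers")
(target.alias("t")
    .merge(src.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```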
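
A sketch of the Snowflake objects described above (an external stage, a transient staging table, and a Snowpipe for automated loading), issued through the Snowflake Python connector. The account, credentials, warehouse, and object names are placeholders, and the cloud event-notification setup that auto-ingest requires is omitted.

```python
# Illustrative Snowflake setup via the Python connector; all names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.east-us-2.azure",   # placeholder account locator
    user="ETL_USER",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

statements = [
    # External stage over cloud storage (credentials/integration omitted here)
    "CREATE OR REPLACE STAGE raw_stage URL='azure://exampleaccount.blob.core.windows.net/raw'",
    # Transient table: no Fail-safe, cheaper for re-loadable staging data
    "CREATE OR REPLACE TRANSIENT TABLE stg_orders (order_id NUMBER, amount NUMBER, loaded_at TIMESTAMP_NTZ)",
    # Snowpipe: auto-ingest new files landing in the stage (notification setup not shown)
    """CREATE OR REPLACE PIPE orders_pipe AUTO_INGEST = TRUE AS
       COPY INTO stg_orders FROM @raw_stage/orders/
       FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)""",
]

cur = conn.cursor()
try:
    for stmt in statements:
        cur.execute(stmt)
finally:
    cur.close()
    conn.close()
```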

Guardian Life Insurance limited

Data Engineer
02.2021 - 12.2021

Job overview

  • Established and developed serverless applications on AWS using the Serverless Framework and Python's boto3 module
  • Built serverless applications utilizing AWS Lambda, API Gateway, and DynamoDB, reducing infrastructure costs and improving scalability (see the sketch after this list)
  • Designed and developed end-to-end data pipelines using AWS Glue, Apache Airflow and Apache Spark to extract, transform, and load data from various sources into data warehouses
  • Integrated data quality checks and data governance mechanisms to ensure data accuracy and consistency throughout the organization
  • Created and designed ETL processes in AWS Glue to import various kinds of data from outside sources into Amazon Redshift
  • Monitored and maintained the health and performance of Amazon Redshift clusters through monitoring tools and dashboards
  • Managed backups and disaster recovery procedures to ensure data integrity and business continuity
  • Implemented data encryption and security measures to protect sensitive data in compliance with industry standards
  • Designed and optimized data models, including defining schemas and setting up indexes, to enhance data storage and retrieval efficiency in Snowflake
  • Developed and maintained data pipelines using Apache Airflow and custom scripts to extract, transform, and load (ETL) data from diverse sources into the RedShift data warehouse
  • Set up proactive monitoring and alerting systems to detect and address data loads, query performance, and system health issues
  • Utilized Apache Spark for distributed computing tasks, improving processing speeds
  • Environment: AWS Cloud S3, Snowflake, DB2, Bigdata, SSIS, Spark, EMR, Python, MapReduce, Glue, CloudWatch
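
A minimal sketch of the serverless pattern described above: an API Gateway proxy event handled by Lambda and persisted to DynamoDB with boto3. The table name, environment variable, and payload fields are illustrative assumptions, not the actual Guardian resources.

```python
# Lambda handler sketch: API Gateway -> Lambda -> DynamoDB (names are placeholders).
import json
import os

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ.get("TABLE_NAME", "customer-events"))


def handler(event, context):
    """Entry point for an API Gateway proxy integration."""
    body = json.loads(event.get("body") or "{}")

    item = {
        "customer_id": body["customer_id"],   # assumed partition key
        "event_ts": body["event_ts"],         # assumed sort key
        "payload": body,
    }
    table.put_item(Item=item)

    return {
        "statusCode": 200,
        "body": json.dumps({"stored": item["customer_id"]}),
    }
```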

Standard Chartered Bank

Azure Data Engineer
11.2019 - 01.2021

Job overview

  • Responsible for creating a data lake on the Azure Cloud Platform to improve business teams' use of Azure Synapse SQL for data analysis
  • Utilized Azure SQL as an external Hive metastore for Databricks clusters so that metadata persisted across multiple clusters
  • Employed Azure Data Lake Storage as the data lake and ensured that Spark and Hive tasks sent all processed data directly to ADLS
  • Responsible for designing fact and dimension tables with Snowflake schema to store the historical data and query them using T-SQL
  • Strong experience working with Azure Databricks runtimes and using the Databricks API to automate launching and terminating runtimes
  • Experience integrating Snowflake data with Azure Blob Storage and SQL Data Warehouse using Snowpipe
  • Employed resources like SQL Server Integration Services, Azure Data Factory, and other ETL tools to identify the route for transferring data from SAS reports to Azure Data Factory
  • Transferred data to Excel and Power BI for analysis and visualization after it was moved through Azure Data Factory and processed in Azure Databricks
  • Implemented version control using Git for collaborative development
  • Utilized Visual Studio Team Services (VSTS) for project management, code repositories, and continuous integration
  • Developed PowerShell scripts for automation and configuration management
  • Retrieved data from Kafka and transferred it to the Spark pipeline using Spark Kafka streaming (see the sketch after this list), and built Sqoop scripts to import and export data between RDBMS and HDFS
  • Worked on the Snowflake schema, data modelling and elements, source-to-target mappings, the interface matrix, and design components; created analytical warehouses on Snowflake and used SnowSQL to analyze data quality issues
  • Implemented data processing workflows using PySpark, leveraging the power of Python for Spark
  • Developed and optimized Spark pipelines for efficient large-scale data processing
  • Utilized JIRA for project tracking, task management, and collaboration
  • Developed Tableau dashboards for visualizing and presenting data insights to stakeholders
  • Designed and deployed infrastructure using Azure Resource Manager (ARM) templates
  • Implemented and managed SQL Data Warehousing (SQL DW) solutions for analytical processing
  • Maintained Azure Data Factory to consume data from several source systems and transferred data from upstream to downstream systems using Azure Data Factory as an orchestration tool
  • Developed pipelines in Azure Data Factory (ADF) using linked services, datasets, and pipelines to extract, transform, and load data from a variety of sources, including Azure SQL, Blob Storage, Azure SQL Data Warehouse, and write-back tools
  • Implemented and optimized data storage solutions using Azure Cosmos DB
  • Utilized Azure Analysis Services for multidimensional data analysis and reporting
  • Designed and orchestrated data workflows using Azure Data Factory
  • Utilized Azure Data Lake Storage (ADLS) for scalable and secure data storage
  • Implemented and optimized data processing using Azure Databricks
  • Utilized Azure Stream Analytics for real-time data streaming and analytics
  • Developed and optimized Pig and Hive scripts for data transformation and querying
  • Managed and optimized distributed storage using Hadoop Distributed File System (HDFS) and Cloud SQL
  • Implemented Spark-Kafka integration for efficient data streaming and processing
  • Utilized Apache Kafka for building scalable and fault-tolerant data pipelines
  • Environment: Apache Spark, Blob Storage, ADLS Gen2, ADF, Databricks, Synapse Analytics, Hive, Cosmos DB, Spark SQL, Snowflake, Unix, Kafka, Python, SQL Server
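
A minimal sketch of the Kafka-to-Spark Structured Streaming flow referenced above, landing processed records in ADLS. Broker addresses, topic, schema, and storage paths are placeholders chosen for illustration.

```python
# Spark Structured Streaming sketch: Kafka source -> parsed JSON -> ADLS sink.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-adls").getOrCreate()

event_schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
       .option("subscribe", "transactions")                  # placeholder topic
       .option("startingOffsets", "latest")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "abfss://curated@exampleadls.dfs.core.windows.net/transactions/")
         .option("checkpointLocation", "abfss://curated@exampleadls.dfs.core.windows.net/_chk/transactions/")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```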

Extarc Software Solutions Pvt. Ltd

Hadoop Developer
05.2017 - 10.2019

Job overview

  • Developed ETL jobs using Spark-Scala to migrate data from Oracle to new MySQL tables
  • Extensively used Spark-Scala (RDDs, DataFrames, Spark SQL) and the Spark-Cassandra Connector APIs for various tasks (data migration, business report generation, etc.)
  • Developed Spark Streaming application for real time sales analytics
  • Prepared an ETL framework using Sqoop, Pig, and Hive to frequently bring in data from the source and make it available for consumption
  • Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project
  • Engineered complex data pipelines using tools such as Databricks, processing terabytes of data to drive decision-making
  • Analyzed the source data and handled efficiently by modifying the data types
  • Worked on Excel sheets, flat files, and CSV files to generate Power BI ad hoc reports
  • Analyzed SQL scripts and designed the solution to implement them using PySpark (see the sketch after this list)
  • Extracted the data from other data sources into HDFS using Sqoop
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS
  • Extracted the data from MySQL into HDFS using Sqoop
  • Implemented automation for deployments by using YAML scripts for massive builds and releases
  • Worked with Apache Hive, Apache Pig, HBase, Apache Spark, ZooKeeper, Flume, Kafka, and Sqoop
  • Implemented Data classification algorithms using MapReduce design patterns
  • Extensively worked on creating combiners, Partitioning, distributed cache to improve the performance of MapReduce jobs
  • Used Git to maintain source code in GitHub repositories
  • Environment: Hadoop, Hive, Spark, PySpark, Sqoop, Spark SQL, Cassandra, YAML, ETL
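
An illustrative sketch of re-expressing a SQL script with the PySpark DataFrame API over Hive tables, as described above; the database, table, and column names are hypothetical.

```python
# SQL-to-PySpark sketch over Hive tables (placeholder names).
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("sales-report")
         .enableHiveSupport()
         .getOrCreate())

# Original SQL-style version
spark.sql("""
    SELECT region, SUM(amount) AS total_sales
    FROM sales_db.sales
    WHERE sale_date >= '2019-01-01'
    GROUP BY region
""").show()

# Equivalent DataFrame API version
sales = spark.table("sales_db.sales")
(sales.filter(F.col("sale_date") >= "2019-01-01")
      .groupBy("region")
      .agg(F.sum("amount").alias("total_sales"))
      .show())
```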

Triniti Advanced Software Labs

Data Warehouse Consultant
05.2014 - 04.2017

Job overview

  • Worked with Stakeholders regarding business requirements, functional specifications, and enhancements, based on the business needs created technical design and functional specification documents
  • Active participation in weekly calls with data modeling and analyst teams to understand and work on any new requirements
  • Analyzed data from source systems to design solutions for business requirements
  • Developed Complex Mappings in Informatica using Power Center transformations (Source Qualifier, Joiner, Lookups, Filter, Router, Aggregator, Expression, XML Update and Sequence generator transformations), Mapping Parameters/Variables, Parameter files, SQL overrides, Transformation Language
  • Profiled source files and developed data models and mappings for smaller requirements
  • Implemented CDC, SCD2, and SCD1 delta loads, snapshot and transactional fact tables, headers and footers for flat files, and file lists
  • Developed Unix scripts for SFTP file transfers and Target table truncate operations
  • Implemented partitioning at database level for better performance
  • Scheduled the Workflows to run on a daily and weekly basis using Control-M Scheduling tool
  • Provided support to the QA team for various testing phases of ETL development
  • Involved in unit testing and Unit test plan document preparation
  • Documented mapping specifications, STMs, unit test cases, procedures, and results
  • Implemented Push Down Optimization (PDO) for better performance when source data is huge
  • Environment: Informatica PowerCenter 9.5/9.1, Oracle 11g, Flat Files, COBOL Files, Erwin, Control-M, SQL, PL/SQL, Shell Scripting

Education

Rivier University
Nashua, NH

Master of Science in Computer and Information Systems
05.2001

Skills

Python

Certification

Microsoft Certified: Azure Data Engineer Associate (Microsoft)

Timeline

Senior Data Engineer

MasterControl
01.2022 - Current

Data Engineer

Guardian Life Insurance limited
02.2021 - 12.2021

Azure Data Engineer

Standard Chartered Bank
11.2019 - 01.2021

Hadoop Developer

Extarc Software Solutions Pvt. Ltd
05.2017 - 10.2019

Data Warehouse Consultant

Triniti Advanced Software Labs
05.2014 - 04.2017

Rivier University

Master of Science in Computer and Information Systems
05.2001

Accomplishments

  • Supervised a team of 4 staff members.

Work Preference

Work Type

Contract Work

Work Location

On-Site, Remote, Hybrid

Important To Me

Career advancement, Work-life balance, Company Culture

Languages

English
Full Professional
Hindi
Full Professional
Telugu
Native or Bilingual
Italian
Full Professional