
Maneesha Akepati

Cleveland, OH

Summary

  • Data Engineer with 4 years of experience in Azure, Hadoop, Snowflake, and Informatica.
  • Experienced in implementing ETL using Azure Data platform capabilities, including Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML, and Power BI.
  • Hands-on experience with Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
  • Good experience designing cloud-based solutions in Azure by creating Azure SQL databases, setting up Elastic Pool jobs, and designing tabular models in Azure Analysis Services.
  • Valuable experience working with Azure Blob and Data Lake storage and loading data into Azure Synapse Analytics (DW).
  • Extensive experience in creating pipeline jobs, scheduling triggers, and managing data quality using Azure Data Factory.
  • Expertise in Azure Data Platform, including Azure Synapse, Data Factory, SQL, and ADLS.
  • Used Delta Lake, Delta Tables, Delta Live Tables, Data Catalogs, and the Delta Lake API for implementing data pipelines.
  • Experience in building machine learning models and pipelines on Azure Databricks for predictive analytics and data-driven decision-making.
  • Skilled in creating and managing complex data pipelines, including error handling, monitoring, and scheduling, to ensure data flows reliably and on schedule.
  • Strong expertise in managing data in the Microsoft Azure public cloud environment.
  • Familiarity with deploying and managing HDInsight clusters.
  • Experience in building near-real-time data pipelines using Kafka and PySpark.
  • Implemented Spark performance tuning, Spark SQL, and Spark Streaming in Big Data and Azure Databricks environments.
  • Experience in developing Spark applications using Spark SQL and PySpark in Azure Databricks for data extraction, transformation, and aggregation from multiple file formats, uncovering insights into customer usage patterns.
  • Familiarity with integrating Cassandra with other tools and frameworks, such as Apache Spark, Apache Kafka, or Elasticsearch, for data processing and analytics.
  • Familiarity with database administration tasks, including backup and recovery, security, and access control.
  • Knowledgeable in applying data modeling techniques to normalize or de-normalize databases based on specific use cases.
  • Implemented real-time data ingestion using Apache Kafka.
  • Proficient in designing and developing DWH solutions, architecting ETL strategies, and utilizing SQL, PySpark, and SparkSQL for data manipulation and analysis.
  • Hands-on experience in implementing data pipeline solutions using Hadoop, Azure, ADF, Synapse, PySpark, MapReduce, Hive, Tez, Python, Scala, Azure Functions, Logic Apps, StreamSets, ADLS Gen2, and Snowflake.
  • Experienced in developing and optimizing cache routines, queries, and class methods to implement business logic and data manipulation efficiently.
  • Extensive experience working with AWS Cloud services and AWS SDKs, utilizing services like AWS API Gateway, Lambda, S3, IAM, and EC2.
  • Good understanding of Big Data Hadoop and Yarn architecture, along with various Hadoop daemons such as Job Tracker, Task Tracker, Name Node, Data Node, Resource/Cluster Manager, and Kafka (distributed stream-processing).
  • Experience in creating API endpoints using Boomi to enable efficient data exchange between different systems.
  • Experience in choosing appropriate data structures for diverse types of data stored in MongoDB.
  • Participated in the development, improvement, and maintenance of Snowflake database applications.
  • Experience in developing Scala applications on Hadoop and Spark SQL for high-volume and real-time data processing.
  • Capable of leveraging Snowpark to build complex data transformations and analytics directly within Snowflake.
  • Profound understanding of security practices in MuleSoft integrations, including securing APIs with OAuth, SSL, and token-based authentication.
  • Skilled in ensuring data quality and integrity through validation, cleansing, and transformation operations.
  • In-depth knowledge of Snowflake Database, schema, and table structures.
  • Implemented data encryption using Voltage and Azure Key Vault (private erasure).
  • Experience in developing Spark applications in Python (PySpark) on a distributed environment to load a huge number of CSV files with different schemas into Hive ORC tables (a brief sketch of this pattern appears after this summary).
  • Experience in data analysis, data modeling, and implementation of enterprise-class systems spanning Big Data, data integration, and object-oriented programming.
  • Experience in shell scripting and automation to streamline routine tasks and enhance system efficiency.
  • Implemented data warehouse solutions using Snowflake and Star Schema.
  • Implemented data pipelines using Informatica and PL/SQL.
  • Experience in implementing ad hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
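
A brief PySpark sketch of the CSV-to-Hive-ORC loading pattern mentioned above follows; it is a minimal illustration, and the paths, column names, and table name are hypothetical assumptions rather than details of any actual project.

    # Hedged sketch: load CSV files into a Hive ORC table with PySpark.
    # The path, columns, and target table below are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("csv_to_hive_orc")          # hypothetical app name
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read one source feed; in practice, each feed with a different schema
    # would be mapped to a common target layout before writing.
    raw = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/data/landing/feed_a/*.csv")   # hypothetical landing path
    )

    # Conform to the target layout (example columns only).
    conformed = raw.select(
        F.col("customer_id").cast("string"),
        F.col("event_ts").cast("timestamp"),
        F.col("amount").cast("double"),
    )

    # Persist as ORC into a Hive-managed table (hypothetical table name).
    conformed.write.format("orc").mode("append").saveAsTable("customer_events")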

Overview

6 years of professional experience

Work History

Data Engineer

Retail Next
08.2023 - Current
  • Designed and set up an Enterprise Data Lake to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing data.
  • Responsible for maintaining quality reference data in source systems by performing operations such as cleaning and transformation and ensuring integrity in a relational environment, working closely with stakeholders and the solution architect.
  • Orchestrated complex data workflows with Azure Data Factory, including scheduling, error handling, and monitoring, ensuring data consistency and reliability.
  • Worked on creating tabular models in Azure Analysis Services to meet business reporting requirements.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks as part of cloud migration.
  • Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.
  • Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
  • Streamlined complex workflows by breaking them down into manageable components for easier implementation and maintenance.
  • Developed and implemented T-SQL solutions to create and manage databases, to insert, update, and delete data, and to perform complex queries on data.
  • Optimized data processing by implementing efficient ETL pipelines and streamlining database design.
  • Worked with Azure Blob and Data Lake storage and loaded data into Azure Synapse Analytics (DW).
  • Increased efficiency of data-driven decision making by creating user-friendly dashboards that enable quick access to key metrics.
  • Developed Python, PySpark, and Bash scripts to transform and load data across on-premises and cloud platforms.
  • Developed database architectural strategies at modeling, design, and implementation stages to address business or industry requirements.
  • Worked on Apache Spark, utilizing the Spark SQL and Spark Streaming components to support intraday and real-time data processing.
  • Collaborated with system architects, design analysts and others to understand business and industry requirements.
  • Designed and implemented an ETL process to extract and load data from a variety of sources into a data warehouse, improving data quality and reducing data processing time by 20%.
  • Automated routine tasks using Python scripts, increasing team productivity and reducing manual errors.
  • Worked closely with cross-functional teams, including data scientists, financial analysts, and software engineers, to translate financial requirements into technical solutions.
  • Provided financial advice to clients on a range of topics, including investment planning, risk management, and retirement planning.
  • Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce access for new users.
  • Used the Spark SQL Scala and Python interfaces, which automatically convert RDD case classes to schema RDDs.
  • Imported data from different sources like HDFS and HBase into Spark RDDs and performed computations using PySpark to generate the output response.
  • Implemented different performance optimization techniques, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and performing map-side joins (see the sketch after this role's bullets).
  • Good knowledge of Spark platform parameters such as memory, cores, and executors.
  • Developed reusable framework to be leveraged for future migrations that automates ETL from RDBMS systems to the Data Lake utilizing Spark Data Sources and Hive data objects.
  • Imported and exported databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS packages).
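
A minimal PySpark sketch of the Hive partitioning, bucketing, and map-side (broadcast) join optimizations referenced in this role follows; the table and column names are illustrative assumptions only, not the actual schemas.

    # Hedged sketch: partitioned and bucketed Hive table plus a broadcast join.
    # All table and column names here are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Partition by load date and bucket by customer_id to prune and co-locate data.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS orders_bucketed (
            order_id    STRING,
            customer_id STRING,
            amount      DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    orders = spark.table("orders_bucketed")

    # Small lookup dataset created inline for the example.
    customers = spark.createDataFrame(
        [("c1", "Ohio"), ("c2", "Texas")], ["customer_id", "region"]
    )

    # broadcast() ships the small side to every executor, the Spark analogue
    # of a Hive map-side join for small lookup datasets.
    joined = orders.join(broadcast(customers), "customer_id")
    joined.groupBy("region").count().show()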

Azure Data Engineer

TATA CAPITAL
07.2020 - 12.2022
  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support visualization of data. Understood the current production state of the application and determined the impact of the new implementation on existing business processes.
  • Implemented solutions for ingesting data from various sources and processing the data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
  • Implemented security measures within Azure Data Factory, including role-based access control (RBAC) and data encryption, to ensure data protection and compliance with organizational standards.
  • Extracted data from various sources, transformed it in Azure Databricks according to business requirements, and loaded it into Snowflake for analysis and reporting. Used Snowflake for various data ingestion methods, including bulk loading, streaming, and integration with popular ETL tools like Azure Data Factory.
  • Involved in extracting, transforming, and loading data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, U-SQL, and Azure Data Lake Analytics; ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Implemented CI/CD pipelines for AKS deployments using Azure DevOps or other CI/CD tools.
  • Leveraged Azure Databricks for large-scale data processing, using PySpark to manipulate and analyze terabytes of data, resulting in a 15% performance improvement compared to traditional data processing methods.
  • Worked on migration of data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
  • Performed analyses on data quality and applied business rules in all layers of data extraction, transformation, and loading (ETL) process using Azure Synapse Analytics.
  • Leveraged HDInsight to perform big data processing and analytics tasks using popular tools like Hadoop, Spark, Hive, and HBase.
  • Managed and led the development effort with the help of a diverse internal and overseas group and designed/architected and implemented complex projects dealing with considerable data size (TB/PB) and high complexity.
  • Created pipelines in Azure Data Factory (ADF) using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse. Created pipelines in ADF to write data back to the source systems.
  • Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications, executed machine learning use cases with Spark ML and MLlib, and explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Performed data profiling and transformation on the raw data using Pig, Python, and Java, and developed predictive analytics using Apache Spark Scala APIs.
  • Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats for analyzing and transforming the data to uncover insights into the customer usage patterns.
  • Managed Azure Purview to ensure that data is classified and protected in accordance with organizational policies.
  • Developed JSON scripts for deploying the pipeline in Azure Data Factory (ADF) that processes the data using the SQL Activity.
  • Worked on migrating data from Teradata to Snowflake for consumption on Databricks, and worked on Teradata SQL queries, Teradata indexes, and utilities such as MLoad, TPump, FastLoad, and FastExport.
  • Used Snowflake as a SQL-based database to write SQL queries to interact with the data.
  • Utilized Snowflake to create and manage complex data models for storing and analyzing data.
  • Designed and developed real-time stream processing applications using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning, designed, and implemented streaming solutions using Kafka or Azure Stream Analytics.
  • Collaborated with engineers and stakeholders to design and implement Azure Blob Storage and Azure Key Vault solutions.
  • Created Spark clusters and configured high-concurrency clusters using Azure Databricks to speed up the preparation of high-quality data, and worked on Azure Stream Analytics with Event Hubs, sending output to a Power BI dashboard.
  • Explored DAGs, their dependencies, and logs in Airflow for automation, and used Apache Airflow to schedule and run DAGs that execute multiple Hive and Pig jobs from Python (a brief sketch appears after this role's bullets).
  • Built a CI/CD pipeline using Azure DevOps for automated deployment, resulting in more frequent releases and a 50% reduction in manual deployment efforts.
  • Utilized Terraform templating features for dynamic configuration based on variables.
  • Enhanced a traditional data warehouse based on a star schema, updating data models and performing data analytics and reporting using Tableau, and was involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Troubleshot and resolved Airflow issues to ensure the continuous operation of data pipelines.
  • Developed shell, Perl, and Python scripts to automate and provide control flow to Pig scripts, and developed HiveQL scripts for performing transformation logic and loading the data from staging zone to landing zone and semantic zone.
  • Developed and implemented an Azure Purview data map to catalog and classify data across the organization.
  • Collaborated with cross-functional teams to define data requirements, leading to the development of effective data models that met business needs.
  • Created and optimized database triggers and PL/SQL packages to automate data manipulation tasks.
  • Led the development of mission-critical PL/SQL applications that improved database efficiency and streamlined business processes.
  • Performed validation and verification of software at all testing phases, including functional testing, system integration testing, end-to-end testing, regression testing, sanity testing, user acceptance testing, smoke testing, disaster recovery testing, production acceptance testing, and pre-prod testing.
  • Designed real-time event-driven integrations using Boomi and its event-driven architecture, enabling faster response to critical business events.
  • Conducted comprehensive unit and integration testing of Django applications using built-in tools like unittest and pytest, ensuring code reliability and maintainability.
  • Environment: Azure Databricks, Data Factory, Logic Apps, Function App, Snowflake, data integration, data modeling, data pipelines, production support, shell scripting, GIT, JIRA, Jenkins, Kafka, ADF Pipeline, Power BI, MS SQL, Oracle, HDFS, MapReduce, Airflow, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Spark performance tuning.
  • Optimized data processing by implementing efficient ETL pipelines and streamlining database design.
  • Migrated legacy systems to modern big-data technologies, improving performance and scalability while minimizing business disruption.
  • Increased efficiency of data-driven decision making by creating user-friendly dashboards that enable quick access to key metrics.
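
A brief, hypothetical Apache Airflow sketch of the kind of DAG referenced in this role follows; the DAG id, schedule, and script paths are assumptions for illustration, and the real pipelines are not reproduced here.

    # Hedged sketch: an Airflow DAG that schedules a Hive job followed by a Pig job.
    # The DAG id, schedule, and script paths are hypothetical placeholders.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-engineering",
        "retries": 1,
        "retry_delay": timedelta(minutes=10),
    }

    with DAG(
        dag_id="daily_hive_pig_load",          # hypothetical DAG id
        start_date=datetime(2022, 1, 1),
        schedule_interval="0 2 * * *",         # nightly at 02:00
        catchup=False,
        default_args=default_args,
    ) as dag:

        load_staging = BashOperator(
            task_id="hive_staging_to_landing",
            bash_command="hive -f /opt/etl/hql/staging_to_landing.hql",  # hypothetical path
        )

        enrich = BashOperator(
            task_id="pig_enrich_semantic",
            bash_command="pig -f /opt/etl/pig/enrich_semantic.pig",      # hypothetical path
        )

        load_staging >> enrich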

Junior Data Engineer

Lupin
08.2019 - 06.2020
  • Designed and developed applications on the data lake to transform data for business users to perform analytics.
  • Worked on Hadoop architecture and various components such as HDFS, the Application Master, Node Manager, Resource Manager, NameNode, DataNode, and MapReduce concepts.
  • Involved in developing a MapReduce framework that filters bad and unnecessary records.
  • Involved heavily in setting up the CI/CD pipeline using Jenkins, Maven, Nexus, GitHub, and AWS.
  • Developed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data using Spark SQL (see the sketch after this role's bullets).
  • Used Hive to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • The Hive tables created as per requirement were internal or external tables defined with appropriate static and dynamic partitions, intended for efficiency.
  • Developed and implemented ETL solutions to extract, transform, and load data from various sources to data warehouses and other data marts.
  • Optimized ETL jobs for performance and scalability.
  • Implemented error handling and debugging procedures to ensure robust and fault-tolerant PL/SQL code.
  • Developed an SSIS package to transform data into a format that could be used for reporting and analytics.
  • Implemented the workflows using Apache OOZIE framework to automate tasks.
  • Developed and maintained SQL views and indexes to improve the performance of SQL queries.
  • Developed design documents considering all possible approaches and identifying the best of them.
  • Wrote MapReduce code that takes log files as input, parses them, and structures them in a tabular format to facilitate effective querying of the log data.
  • Migrated data from on-premises systems to the Azure cloud and used services such as Azure Data Factory, Azure Databricks, and Azure Synapse Analytics to enrich the data.
  • Utilized Jupyter Notebooks for exploratory data analysis and model development.
  • Developed scripts and automated data management end to end, keeping all the clusters synchronized.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Environment: Cloudera CDH 3/4, Hadoop, HDFS, MapReduce, Hive, Oozie, Pig, Shell Scripting, MySQL, Azure Services.
  • Contributed to a global data warehouse project that consolidated disparate sources into a single repository for unified reporting and analytics purposes.
  • Optimized data processing by implementing efficient ETL pipelines and data transformation techniques.
  • Collaborated closely with cross-functional teams to gather requirements and translate them into actionable insights using advanced analytics methodologies.
  • Reengineered legacy systems with modern frameworks, reducing technical debt without compromising operational stability or performance metrics.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Modeled predictions with feature selection algorithms.
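
A minimal Spark SQL sketch of the JSON-to-Hive pattern noted in this role follows; the input path, view name, column names, and output table are hypothetical assumptions for illustration.

    # Hedged sketch: read JSON into a DataFrame, register a temp view,
    # pre-aggregate with Spark SQL, and persist the result as a Hive table.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("json_to_hive")             # hypothetical app name
        .enableHiveSupport()
        .getOrCreate()
    )

    # Spark infers the schema from the JSON documents.
    events = spark.read.json("/data/raw/clickstream/*.json")   # hypothetical path
    events.createOrReplaceTempView("clickstream_raw")

    # Light pre-aggregation before storing on HDFS as a Hive table.
    daily = spark.sql("""
        SELECT user_id,
               to_date(event_time) AS event_date,
               COUNT(*)            AS events
        FROM clickstream_raw
        GROUP BY user_id, to_date(event_time)
    """)

    daily.write.mode("overwrite").saveAsTable("clickstream_daily")   # hypothetical table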

Education

Master of Science - Information Systems Security

University of The Cumberlands
Williamsburg, KY
12-2024

Skills

    Azure Services: Azure Data Factory, Airflow, Azure Databricks, Logic Apps, Function App, Snowflake, Azure DevOps

    Big Data Technologies: MapReduce, Hive, Python, PySpark, Scala, Kafka, Spark Streaming, Oozie, Sqoop, Airflow, Zookeeper, Snowflake

    Hadoop Distribution: Cloudera, Hortonworks

    Languages: Java, SQL, PL/SQL, Python, HiveQL, Scala

    Cloud Platform: Google Cloud Platform (GCP), Amazon Web Services (AWS)

    Operating Systems: Windows (XP/7/8/10), UNIX, LINUX, Red Hat Linux, UBUNTU, CENTOS

    Build Automation tools: Ant, Maven

    Version Control: GIT, GitHub

    IDE & Build Tools, Design: Eclipse, Visual Studio

    Databases: MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB

Timeline

Data Engineer

Retail Next
08.2023 - Current

Azure Data Engineer

TATA CAPITAL
07.2020 - 12.2022

Junior Data Engineer

Lupin
08.2019 - 06.2020

Master of Science - Information Systems Security

University of The Cumberlands
12-2024