Abhishek Ponnaganti

Tampa, FL

Summary

Dynamic and results-driven Data Engineer with over 8 years of experience in transforming business requirements into analytical models, designing algorithms, and implementing strategic solutions for massive volumes of data. Proven expertise in Big Data/Hadoop, cloud platforms, and data analysis. Seeking a challenging role to leverage technical proficiency and leadership skills in a forward-thinking organization.

Work History

Senior Data Engineer

Company Name
  • Led the gathering of requirements, conducted system analysis, and provided development and testing effort estimations
  • Contributed to the design of various system components, including Sqoop, Hadoop processes involving MapReduce, Hive, Spark, and FTP integration for downstream systems
  • Implemented optimized Hive and Spark queries using techniques such as window functions and customized Hadoop shuffle and sort parameters
  • Developed ETL processes using PySpark, utilizing both Dataframe API and Spark SQL API for transformations and actions
  • Stored the resulting data in HDFS and transferred it to the Snowflake database
  • Successfully migrated an on-premises application to AWS, utilizing services like EC2 and S3 for small dataset processing and storage
  • Maintained Hadoop clusters on AWS EMR
  • Performed real-time data analytics using Spark Streaming, Kafka, and Flume
  • Configured Spark Streaming to extract ongoing information from Kafka and store it in HDFS
  • Designed and developed ETL processes in AWS Glue to migrate Campaign data from external sources to AWS Redshift, employing various Spark transformations and actions for data cleansing
  • Utilized Jira for issue tracking and Jenkins for continuous integration and deployment
  • Enforced data catalog and governance standards
  • Created DataStage jobs incorporating different stages for ETL processes, including Transformer, Aggregator, Sort, Join, Merge, Lookup, and more
  • Created, debugged, scheduled, and monitored ETL batch processing jobs for Snowflake using Airflow
  • Built ETL pipelines for data ingestion, transformation, and validation on AWS, collaborating with data stewards for data compliance
  • Scheduled jobs with Airflow, using Python scripts to add tasks to DAGs and manage dependencies between tasks (see the sketch following this role's environment line)
  • Employed PySpark for data extraction, filtering, and transformation in data pipelines
  • Monitored servers using Nagios, CloudWatch, and ELK Stack (Elasticsearch, Kibana)
  • Used dbt (Data Build Tool) for ETL transformations, along with AWS Lambda and AWS SQS
  • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats
  • Responsible for estimating cluster size, monitoring, and troubleshooting Spark Databricks clusters
  • Automated data load processes to the target Data Warehouse using Unix Shell scripts
  • Implemented monitoring solutions in Ansible, Terraform, Docker, and Jenkins
  • Environment: Python, Power BI, AWS Glue, Athena, SSRS, SSIS, AWS S3, AWS Redshift, ETL, AWS EMR, AWS RDS, DynamoDB, SQL, Tableau, Distributed Computing, Snowflake, Spark, Kafka, MongoDB, Hadoop, Linux Command Line, Data Structures, PySpark, Oozie, HDFS, MapReduce, Cloudera, HBase, Hive, Pig, and Docker.
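The Airflow bullet above (scheduling jobs and wiring task dependencies in DAGs) can be illustrated with a minimal, hypothetical sketch; the DAG name, task IDs, and callables below are placeholders, not taken from the actual project:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Placeholder callables standing in for the real extract/load logic
    def extract_to_s3(**context):
        print("extract source data and land it in S3")

    def load_to_snowflake(**context):
        print("copy the staged data into Snowflake")

    with DAG(
        dag_id="daily_snowflake_load",        # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
        load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)

        extract >> load   # dependency: load runs only after extract succeeds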

Data Engineer

Company Name
  • Worked extensively with cloud service providers, particularly Azure
  • Successfully migrated a SQL database to various Azure services including Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse
  • Managed database access and facilitated the migration of on-premise databases to Azure Data Lake Store using Azure Data Factory
  • Analyzed, designed, and implemented modern data solutions using Azure PaaS services to support data visualization
  • Evaluated the impact of new implementations on existing business processes
  • Executed Extract, Transform, and Load (ETL) processes from source systems to Azure Data Storage services using Azure Data Factory, T-SQL, Spark SQL, and U-SQL Azure Data Lake Analytics
  • Ingested data into various Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks
  • Utilized REST APIs to retrieve analytics data from different data feeds and created pipelines in Azure Data Factory for data extraction, transformation, and loading from sources such as Azure SQL, Blob storage, Azure SQL Data Warehouse, and write-back tools
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, uncovering insights into customer usage patterns (illustrated in the sketch after this list)
  • Responsible for estimating the cluster size, monitoring, and troubleshooting of Spark Databricks clusters
  • Performance-tuned Spark applications by adjusting batch interval time, level of parallelism, and memory settings
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) to process data using the SQL activity
  • Developed SQL scripts for automation
  • Created Build and Release processes for multiple projects (modules) in a production environment using Visual Studio Team Services (VSTS)
  • Environment: PL/SQL, Python, Azure Data Factory, Azure Blob Storage, Azure Table Storage, Azure SQL Server, Apache Hive, Apache Spark, MDM, Netezza, Teradata, Oracle 12c, SQL Server, Teradata SQL Assistant, Teradata Vantage, Microsoft Word/Excel, Flask, Snowflake, DynamoDB, Athena, Lambda, MongoDB, Pig, Sqoop, Tableau, Power BI, UNIX, Docker, and Kubernetes.
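The PySpark work described above (extracting, transforming, and aggregating data from multiple file formats in Databricks) follows a common DataFrame pattern; the sketch below is illustrative only, with hypothetical paths, columns, and feed names:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("usage_patterns").getOrCreate()

    # Hypothetical source feeds in two different formats
    events = spark.read.json("/mnt/raw/events/")
    customers = spark.read.parquet("/mnt/raw/customers/")

    # Join, filter, and aggregate to summarize usage per customer
    usage = (
        events.join(customers, on="customer_id", how="inner")
              .filter(F.col("event_type") == "login")
              .groupBy("customer_id")
              .agg(F.count("*").alias("login_count"))
    )

    usage.write.mode("overwrite").parquet("/mnt/curated/usage_summary/")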

Data Engineer

Company Name
  • Designed and constructed multiple data pipelines, overseeing end-to-end ETL and ELT processes for data ingestion and transformation in both AWS and Spark environments
  • Utilized cloud and GPU computing technologies for automated machine learning and analytics pipelines, with a focus on AWS
  • Engaged in all stages of data mining, including data collection, cleaning, model development, validation, and visualization
  • Conducted gap analysis and provided feedback to enhance software delivery for business teams
  • Applied data mining techniques to large datasets of both structured and unstructured data, covering data acquisition, validation, predictive modeling, and visualization for provider, member, claims, and service fund data
  • Developed RESTful APIs (Microservices) using the Python Flask framework, packaged in Docker, and deployed in Kubernetes through Jenkins Pipelines
  • Constructed and architected multiple data pipelines, overseeing end-to-end ETL and ELT processes for data ingestion and transformation in PySpark
  • Created reusable REST APIs gathering requirements directly from businesses and blending data from various sources
  • Contributed to the development of a Data Warehouse and Business Intelligence architecture, involving data integration and conversion from multiple sources and platforms
  • Managed full data loads from production to AWS Redshift staging environment and played a role in migrating EDW to AWS using EMR and other technologies
  • Developed, scheduled, and debugged Spark jobs using Python, focusing on data analysis, migration, transformation, integration, and import/export tasks
  • Gathered and processed raw data at scale using scripts, web scraping, API calls, SQL queries, and application development
  • Implemented reusable Python scripts to ensure data integrity between source (Teradata/Oracle) and target systems (Snowflake/Redshift)
  • Migrated on-premise database structure to Confidential Redshift data warehouse
  • Established data pipelines for various events, loading data from DynamoDB to AWS S3 bucket, HDFS, and ensuring high success metrics
  • Applied Scala and Spark for authoring, scheduling, and monitoring Data Pipelines
  • Built Snowpipe ingestion pipelines and applied comprehensive knowledge of Snowflake Data Sharing, schema design, and table structures
  • Utilized Airflow pipelines to explore DAGs, dependencies, and logs for automation purposes
  • Designed and implemented a fully operational, large-scale data solution on Snowflake
  • Developed systems to collect data from multiple platforms using Kafka and processed it with Spark
  • Created Spark Streaming modules to ingest data into the Data Lake, working with data feeds such as JSON, CSV, and XML, and implementing the Data Lake concept (see the sketch after this list)
  • Executed Hive queries on Parquet tables stored in Hive for data analysis, and developed MapReduce programs to cleanse heterogeneous data from various sources for ingestion into Hive
  • Environment: Python, Teradata, Netezza, Oracle 12c, PySpark, MS Office, SQL Server, UML, MS Visio, Oracle Designer, Cassandra, Azure, Oracle SQL, Athena, SSRS, SSIS, DynamoDB, Lambda, Hive, HDFS, Sqoop, Scala, NoSQL, and Tableau.
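The Kafka-to-Data-Lake ingestion described above is commonly built with Spark Structured Streaming; the following is a hedged sketch with placeholder broker, topic, schema, and paths (it also assumes the spark-sql-kafka connector is available on the classpath):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka_to_datalake").getOrCreate()

    # Illustrative schema for a JSON payload on the topic
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
             .option("subscribe", "events")                      # placeholder topic
             .load()
    )

    # Kafka delivers bytes; cast the value to string and parse the JSON payload
    parsed = raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e")).select("e.*")

    # Continuously land the parsed records in the data lake as Parquet
    (parsed.writeStream.format("parquet")
           .option("path", "s3a://data-lake/events/")                    # placeholder lake path
           .option("checkpointLocation", "s3a://data-lake/_chk/events/")
           .start())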

Hadoop Developer

Company Name
  • Developed Spark programs in Python, applying functional programming principles to process intricate structured datasets
  • Operated in a dynamic agile development environment for rapid analysis, development, and testing of potential business use cases
  • Responsible for designing and developing high-performance data architectures supporting data warehousing, real-time ETL, and batch big-data processing
  • Utilized Hadoop infrastructure to store data in HDFS and migrated the underlying SQL codebase to AWS using Spark/Hive SQL
  • Translated Hive/SQL queries into Spark transformations using Spark RDDs and PySpark
  • Analyzed SQL scripts and devised solutions for implementation using PySpark
  • Exported tables from Teradata to HDFS with Sqoop, building tables in Hive
  • Processed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts
  • Loaded JSON data using Spark SQL, creating a SchemaRDD and loading it into Hive tables (see the sketch after this list)
  • Managed structured data using SparkSQL
  • Worked with the Hadoop ecosystem, implementing Spark using Scala, and utilizing DataFrames and Spark SQL API for faster data processing
  • Developed Spark code using Scala and Spark-SQL/Streaming for efficient data processing
  • Created RDDs/DataFrames in Spark, applying various transformation logics to load data from Hadoop Data Lakes
  • Filtered and cleaned data using Scala code and SQL Queries
  • Handled financial reports such as 2052A, FR-Y9C, 14Q, and 10K-Q
  • Acted as the primary on-site ETL Developer during project analysis, planning, design, development, and implementation stages, utilizing IBM WebSphere software
  • Prepared Data Mapping Documents and designed ETL jobs based on the DMD with required tables in the Dev Environment
  • Installed and configured a multi-node cluster on the Cloud using Amazon Web Services (AWS) on EC2
  • Designed and developed architecture for a data services ecosystem spanning Relational, NoSQL, and Bigdata technologies
  • Extracted large volumes of data from Amazon Redshift, AWS, and Elasticsearch using SQL queries to create reports
  • Participated actively in decision-making and QA meetings, regularly interacting with Business Analysts and the development team to better understand the Business Process, Requirements, and Design
  • Used DataStage as an ETL tool to extract data from source systems, loading it into the ORACLE database
  • Designed and developed DataStage Jobs to extract data from heterogeneous sources, applying transform logics, and loading it into Data Warehouse Databases
  • Employed Talend for Big Data Integration using Spark and Hadoop
  • Environment: Hadoop, HDFS, Hive, Flume, HBase, Sqoop, PIG, MySQL, Ubuntu, Zookeeper, Amazon EC2, SOLR.
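The Spark SQL bullet above (loading JSON and persisting it as Hive tables) maps onto the DataFrame/Hive-support API in current Spark versions; the database, table, and path below are hypothetical:

    from pyspark.sql import SparkSession

    # Hive support allows DataFrames to be saved as Hive tables
    spark = (
        SparkSession.builder.appName("json_to_hive")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read semi-structured JSON from HDFS; Spark infers the schema
    df = spark.read.json("hdfs:///data/raw/transactions/")   # illustrative path

    # Persist as a Hive table for downstream Hive / Spark SQL queries
    df.write.mode("overwrite").saveAsTable("analytics.transactions")   # hypothetical db.table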

Education

Master of Science - Business Analytics

Northwood University
Midland, MI
December 2023

Skills

  • Python, Scala, SQL, MVS, TSO/ISPF, VB, VTAM, Korn shell scripting
  • Oracle, SQL Server, MySQL, NoSQL, PostgreSQL, Microsoft Access, PL/SQL
  • AWS, Azure, GCP
  • Data Warehousing
  • Data Modeling
  • Machine Learning
  • Data Migration

Professional Summary

  • Well-rounded IT experience covering Big Data technologies (HDFS, MapReduce, Spark, Hive, Sqoop, Flume, Kafka, Oozie, Pig, HBase), Spark development, and database development
  • Demonstrated proficiency in Amazon Web Services (AWS), particularly EMR and EC2, ensuring fast and efficient processing of Teradata Big Data Analytics
  • Proficient in Google Cloud Platform (GCP), adding versatility to cloud service utilization
  • Expertise in transforming intricate business requirements into analytical models, involving algorithm design, model building, and solution development for Data Mining, Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, and scalable Machine Learning algorithms
  • Proven ability to develop solutions that scale across massive volumes of both structured and unstructured data
  • Experience working with Hadoop distributions such as Cloudera and Hortonworks
  • Excellent experience in end-to-end ETL processes, including designing, developing, documenting, and testing ETL jobs and mappings in Server and Parallel jobs using DataStage
  • Hands-on experience with Apache Spark, Spark Streaming, and Spark SQL
  • Familiarity with NoSQL databases such as HBase, Cassandra, and MongoDB
  • Established and executed comprehensive Data Quality Governance frameworks, ensuring decisions align with intended purposes through end-to-end processes
  • Expert in designing Server jobs using stages such as Sequential File, ODBC, Hashed File, Aggregator, Transformer, Sort, Link Partitioner, and Link Collector
  • Proficient in a wide range of Big Data practices and technologies, including HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, and Kafka
  • Expertise in designing Parallel jobs using stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML
  • Committed to continuous learning and skill enhancement throughout the career
