Saitejareddy Ammakolla

Aldie, VA

Summary

Data Engineer with over 5 years of experience designing, developing, and optimizing scalable data architectures and pipelines. Proficient in Big Data technologies, cloud environments, and real-time processing frameworks for efficient data storage, transformation, and analytics. Adept at collaborating with cross-functional teams to deliver high-quality, data-driven solutions that meet business objectives.

Extensive experience in Big Data Analytics, including Apache Spark, Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Apache NiFi, and Kafka.
Skilled in designing and optimizing Hadoop core components such as JobTracker, NameNode, DataNode, and MapReduce, improving data processing efficiency.
Skilled in data manipulation and analysis, utilizing Python (Pandas, NumPy, SciPy, Scikit-learn, Matplotlib) to extract meaningful insights from structured and unstructured data.
Hands-on experience with Hadoop distributions such as Cloudera CDH and Hortonworks HDP, managing large-scale Hadoop clusters.
Experienced in designing and developing Scala-based Spark applications for maximum scalability and performance.
Extensive experience with cloud computing platforms, including Azure (Azure Databricks, Azure Data Factory, Azure SQL, Azure Data Lake, Azure Machine Learning) and AWS (EC2, S3, Redshift, EMR), with expertise in migrating on-premises applications to the cloud.
Expertise in data migration and integration using Sqoop, Flume, and Kafka to transfer data seamlessly between disparate systems.
Experience with NoSQL databases such as HBase, Cassandra, and MongoDB, including SQL-to-NoSQL migration and tuning for large-scale applications.
Hands-on experience with Apache Airflow for workflow automation, conditional triggers, and job scheduling.
Deep understanding of data warehousing concepts, including ETL pipelines, dimensional modeling, and OLAP/OLTP environments, as well as tools such as Informatica PowerCenter.
Experienced in working with multiple data formats (Parquet, Avro, ORC, TextFile, XML) and compression codecs (GZIP, Snappy, LZO) to optimize storage and processing.
Skilled in Snowflake Cloud Data Warehouse, designing and implementing high-performance data architectures for advanced analytics and reporting.
Strong interpersonal and teamwork abilities, with a proven track record of working with stakeholders, engineers, and business groups to facilitate data-driven decision-making.

Overview

7 years of professional experience

Work History

Data Engineer

Equifax
07.2024 - Current
  • Developed machine learning-based ETL pipelines, accessed via REST APIs, for processing large volumes of financial and consumer data
  • Built a scalable data pipeline to process and store 500TB of raw data each week, leveraging Apache NiFi, Kafka, Elasticsearch, Redis, Python, and Go to ensure efficiency and reliability
  • Designed, validated, and deployed data pipelines across IT/UAT/PROD environments using Agile/Sprint methodologies and JIRA
  • Performed ETL operations for data cleansing, mapping, and transformation from Azure Data Lake and on-prem SQL databases
  • Migrated data from Teradata/SQL Server to Hadoop, leveraging dynamic/static partitions and Spark Streaming with Kafka for real-time ingestion
  • Created comprehensive technical design documents for business stakeholders to align technical implementations with requirements
  • Built data warehouses with star and snowflake schema designs, creating fact and dimension tables for efficient data analysis
  • Led the development of a risk assessment model that reduced processing time by 35%, enhancing decision-making efficiency
  • Collaborated with the underwriting team to tailor credit solutions, resulting in a 15% increase in customer satisfaction
  • Streamlined credit reporting processes by implementing new software, saving 10 hours of manual work per week
  • Orchestrated workflows using Apache Airflow and Azure services like Data Lake, Databricks, and Synapse Analytics
  • Implemented CI/CD pipelines for seamless deployments and automated data workflows using Airflow
  • Utilized Power BI to develop analytical dashboards, delivering actionable insights to business users
  • Environment: Python, Microsoft Azure (Synapse Analytics, Data Lake, Data Factory), Teradata, Hadoop, HDFS, Spark, PySpark, Databricks, Kafka, Apache Airflow, Power BI, Git, SQL Server, Oracle

Data Engineer

ADP
06.2023 - 05.2024
  • Implemented and configured ADP HCM solutions, ensuring seamless integration with clients' existing HR and payroll systems
  • Provided training and support to HR teams and business leaders on best practices for ADP Workforce Now and ADP Vantage HCM
  • Optimized payroll and HR processes, reducing processing time and ensuring 100% compliance with federal and state regulations
  • Conducted workforce analytics to provide insights into employee performance, attrition, and compensation trends
  • Designed and implemented aggregate data models, consolidating multiple data streams into a unified dataset
  • Developed and maintained Apache Airflow DAGs, leveraging Docker to streamline local development and deployment
  • Orchestrated complex ETL workflows using AWS Glue, EMR, and Airflow Providers Libraries, optimizing data ingestion pipelines
  • Built micro-batch data pipelines using AWS Lambda, SQS, and Salesforce Marketing Cloud APIs for targeted email remarketing
  • Ingested and transformed data into Snowflake warehouse using Spark and JDBC, implementing role-based user access controls
  • Enabled advanced data analysis and visualization with Tableau, ensuring seamless integration with APIs and other applications
  • Developed robust unit tests for PySpark code, achieving high code coverage with Pytest and SonarQube integration
  • Configured real-time monitoring using AWS CloudWatch and migrated logs to Splunk, integrating PagerDuty for proactive alerting
  • Environment: Python, Apache Airflow, PySpark, AWS (Glue, EMR, Lambda, S3, Redshift, CloudWatch), Snowflake, Tableau, Docker, Jenkins, Splunk, PagerDuty

Data Engineer

CYIENT
07.2018 - 11.2022
  • Successfully migrated legacy Hadoop MapReduce and HDFS systems to AWS Cloud, implementing a cloud-native real-time data processing solution
  • Optimized data workflows by enabling Hive table partitioning and bucketing, improving query performance
  • Developed AWS Lambda-based real-time event processing pipelines integrated with S3 and SQS
  • Transformed semi-structured data into Parquet format using Python (Pandas) and migrated it to AWS for scalable storage
  • Designed and implemented IAM roles and JSON-based access policies, enhancing system security and compliance
  • Automated database testing and validations with SQL procedures for PostgreSQL and MySQL, ensuring data integrity
  • Configured CloudWatch Alarms and log groups for real-time monitoring and debugging of AWS Lambda workflows
  • Managed code migration using GitHub and Bitbucket, ensuring a smooth transition from legacy systems
  • Environment: Apache Hadoop, HDFS, MapReduce, Hive, Python, AWS (S3, SQS, Lambda, CloudWatch), Pandas, PostgreSQL, MySQL, MongoDB, Bitbucket

Education

Master of Science - Computer and Information Systems Security

Wilmington University
New Castle, DE
12.2024

Skills

    Technical Skills

    Programming & Scripting:
    Python, Scala, PySpark, Bash, Shell, Perl

    Cloud Platforms:
    Amazon Web Services (AWS), Microsoft Azure

    AWS Services:
    EC2, S3, Lambda, Route 53, Elastic Beanstalk, VPC, IAM, ECS (Elastic Container Service), DynamoDB, Auto Scaling, Security Groups, Redshift, CloudWatch, CloudFormation

    Azure Services:
    Azure Databricks, Azure Data Factory (ADF), Blob Storage, Azure SQL, Azure Data Lake

    Python Libraries:
    NumPy, Matplotlib, SciPy, PySpark, Pandas, BeautifulSoup, Scikit-Learn

    Version Control:
    Git, GitHub, Bitbucket, SVN

    Big Data Technologies:
    Spark, Kafka, NiFi, Airflow, Flume, Snowflake, HDFS, MapReduce, Pig, Hive, Sqoop, Oozie

    Hadoop Frameworks:
    Cloudera CDH, Hortonworks HDP

    Data Modelling Schemas:
    Star Schema, Snowflake Schema

    Visualization Tools:
    Tableau, Power BI, Excel

    Databases:
    MySQL, Oracle, MS-SQL Server, Teradata, HBase, Cassandra, DynamoDB, MongoDB

    Operating Systems:
    Windows, Linux, Unix, macOS
