Data Engineer with over 5 years of experience designing, developing, and optimizing scalable data architectures and pipelines. Proficient in Big Data technologies, cloud environments, and real-time processing frameworks for efficient data storage, transformation, and analytics. Adept at collaborating with cross-functional teams to build high-quality, data-driven solutions that achieve business objectives.
Extensive experience in Big Data Analytics, including Apache Spark, Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Apache NiFi, and Kafka.
Skilled in designing and optimizing Hadoop core components such as JobTracker, TaskTracker, NameNode, DataNode, and MapReduce, improving data processing efficiency.
Skilled in data manipulation and analysis, utilizing Python (Pandas, NumPy, SciPy, Scikit-learn, Matplotlib) to extract meaningful insights from structured and unstructured data.
Excellent hands-on experience with Hadoop distributions such as Cloudera CDH and Hortonworks HDP, managing large-scale Hadoop clusters.
Experienced in designing and developing Spark applications in Scala to deliver maximum scalability and performance.
Extensive experience with cloud computing platforms, including Azure (Azure Databricks, Azure Data Factory, Azure SQL, Azure Data Lake, Azure Machine Learning) and AWS (EC2, S3, Redshift, EMR), with deep expertise in migrating on-premises applications to the cloud.
Expertise in data migration and integration using Sqoop, Flume, and Kafka to transfer data seamlessly between disparate systems.
Experience with NoSQL databases such as HBase, Cassandra, and MongoDB, including SQL-to-NoSQL migration and tuning for large-scale applications.
Hands-on experience with Apache Airflow for workflow automation, conditional triggers, and job scheduling.
Deep understanding of data warehousing concepts, including ETL pipeline development, dimensional modeling, and OLAP/OLTP environments, as well as experience administering tools such as Informatica PowerCenter.
Extensive experience working with multiple data formats (Parquet, Avro, ORC, TextFile, XML) and compression codecs (GZIP, Snappy, LZO) to optimize storage and processing.
Highly skilled in Snowflake Cloud Data Warehouse, designing and implementing high-performance data architectures that support advanced analytics and reporting.
Strong interpersonal and teamwork abilities, with a proven track record of collaborating with stakeholders, engineers, and business groups to facilitate data-driven decision-making.
Technical Skills
Programming & Scripting:
Python, Scala, PySpark, Bash, Shell, Perl
Cloud Platforms:
Amazon Web Services (AWS), Microsoft Azure
AWS Services:
EC2, S3, Lambda, Route 53, Elastic Beanstalk, EBS, VPC, IAM, ECS (Elastic Container Service), DynamoDB, Auto Scaling, Security Groups, Redshift, CloudWatch, CloudFormation
Azure Services:
Azure Databricks, Azure Data Factory (ADF), Blob Storage, Azure SQL, Azure Data Lake
Python Libraries:
NumPy, Matplotlib, SciPy, PySpark, Pandas, BeautifulSoup, Scikit-Learn
Version Control:
Git, GitHub, Bitbucket, SVN
Big Data Technologies:
Spark, Kafka, NiFi, Airflow, Flume, Snowflake, HDFS, MapReduce, Pig, Hive, Sqoop, Oozie
Hadoop Frameworks:
Cloudera CDH, Hortonworks HDP
Data Modelling Schemas:
Star Schema, Snowflake Schema
Visualization Tools:
Tableau, Power BI, Excel
Databases:
MySQL, Oracle, MS SQL Server, Teradata, HBase, Cassandra, DynamoDB, MongoDB
Operating Systems:
Windows, Linux, Unix, macOS