RAKESHKUMAR REDDY POREDDY

Austin, TX

Summary

Accomplished **Senior Data Engineer** with over **9 years of experience** in data-intensive environments, specializing in **SQL database development, cloud data solutions**, and **real-time data processing**. Proven expertise in **MySQL, PostgreSQL, and Amazon Redshift**, optimizing query performance and enhancing data warehousing capabilities to enable advanced analytics. Experienced in **Apache Kafka** for real-time data streaming, supporting timely data-driven decision-making.

Proficient in **cloud-based data solutions** on AWS, leveraging services like **AWS S3, DynamoDB**, and **Redshift** for scalable storage, retrieval, and big data analytics. Skilled in **Azure SQL Database**, **Cosmos DB**, and **Azure HDInsight**, enhancing big data processing and analytics within secure cloud environments. Demonstrated expertise in developing and orchestrating **ETL pipelines** using **Informatica, Apache NiFi**, and **Airflow**, ensuring efficient and accurate data flow across systems.
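
As a small illustration of the kind of ETL orchestration described above, the sketch below chains an extract task to a load task in a minimal Airflow DAG; the DAG id, schedule, S3 bucket, and helper logic are hypothetical placeholders rather than details from any specific project.

```python
# Minimal Airflow 2.x DAG sketch: extract to S3, then load into a warehouse.
# Bucket names, connection details, and task logic are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_to_s3(**context):
    """Pull a day's worth of records and stage them in S3 (stubbed)."""
    # In a real pipeline this would query the source system and upload to S3.
    print(f"extracting for {context['ds']} into s3://example-staging-bucket/daily/")


def load_to_warehouse(**context):
    """Copy the staged files into the warehouse (stubbed)."""
    print(f"loading staged data for {context['ds']} into the warehouse")


with DAG(
    dag_id="daily_etl_example",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ style schedule argument
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)

    extract >> load  # load runs only after extraction succeeds
```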

Hands-on experience with **Databricks** for advanced data engineering and machine learning workflows, incorporating the **Snowflake schema** to maximize query efficiency. Adept in **deploying machine learning models** using **PyTorch**, enabling predictive analytics and advanced decision support systems. Expertise in **stream analytics** using **Azure Stream Analytics** and **PostgreSQL** for real-time insights.

Proficient in **Terraform** for Infrastructure as Code (IaC), automating cloud infrastructure management, and supporting CI/CD operations. Experienced with **Git** for version control, maintaining code integrity in collaborative environments. Skilled in **data backup, recovery**, and **security compliance**, ensuring data integrity and adherence to regulatory standards.

Highly proficient with **data visualization tools** like **Power BI, Tableau**, and **AWS QuickSight**, delivering actionable insights through interactive dashboards. Demonstrated ability to manage large datasets using **Apache Hadoop** and **MongoDB**, optimizing data storage and retrieval in complex architectures. Adept at **workflow automation** and **collaborative analytics** using **DBT, Pandas**, and **JIRA**, ensuring agile delivery of data projects.

Continuously engaged in **data governance** and **data lifecycle management**, with a focus on leveraging **Databricks and Terraform** to modernize data architectures and ensure strategic alignment with business goals. Recognized for driving **end-to-end data engineering solutions** across cloud environments, delivering reliable, scalable, and efficient data platforms aligned with the latest industry standards.

Overview

9 years of professional experience

Work History

German Town Technologies

Sr Data Engineer
04.2023 - Current

  • Implemented data pipelines for large-scale healthcare data analysis using AWS EMR and Snowflake, enhancing data accessibility and analysis capabilities
  • Developed predictive models to improve patient care outcomes, utilizing Python and SQL in conjunction with collaborative analytics tools
  • Engineered real-time data streaming solutions with Apache Kafka and AWS S3 to facilitate instant access to healthcare data for timely decision-making
  • Automated data processing workflows using Airflow, increasing operational efficiency and reducing manual errors in data handling
  • Developed complex data pipelines using Databricks, optimizing data flow from multiple sources into structured and unstructured data lakes
  • Leveraged Apache Spark on Databricks for large-scale data processing, achieving significant performance improvements and cost reductions
  • Spearheaded the integration of PySpark with AWS services (S3, EMR, Redshift) to enhance data storage and retrieval processes
  • Utilized PySpark with Kafka to build real-time data streaming applications, improving data availability and decision-making capabilities (see the sketch after this list)
  • Implemented machine learning algorithms with PySpark MLlib to predict trends and behaviors from large datasets, enhancing business strategies
  • Led projects that integrated data from various sources into MongoDB, enhancing data storage and retrieval processes for critical healthcare applications
  • Utilized Python to create scripts that automated routine data cleaning and preparation tasks, streamlining the data pipeline
  • Implemented data integration solutions using AWS Glue to consolidate disparate data sources, simplifying data management and improving data quality
  • Enhanced data security and compliance in Databricks environments by implementing robust security measures and monitoring tools
  • Trained and mentored junior data engineers and analysts on best practices and efficient use of Databricks in data engineering projects
  • Collaborated with cross-functional teams to define data requirements and deliver scalable data solutions using Databricks
  • Optimized SQL queries and database schemas in PostgreSQL and Teradata, significantly improving performance and scalability of healthcare data applications
  • Deployed DBT for data transformation tasks within cloud environments, enabling more efficient data modeling and reporting
  • Coordinated with clinical staff to leverage analytics in operational and patient care strategies, using Pandas to analyze and interpret data
  • Configured and managed data streaming architectures using Apache Kafka, ensuring robust data flow and integration for real-time analytics
  • Fostered a collaborative environment using Databricks Notebooks, facilitating the sharing of insights and close collaboration with data scientists to refine analytics models
  • Implemented data governance practices using Databricks Unity Catalog, ensuring compliance with data security and privacy standards
  • Conducted data migration projects to AWS EMR, ensuring seamless transitions with minimal downtime and no data loss
  • Developed and maintained robust data security protocols using AWS technologies, safeguarding sensitive healthcare data against potential breaches
  • Integrated machine learning algorithms with existing data systems using Python and Pandas, enhancing predictive analytics capabilities
  • Enhanced data visualization and reporting capabilities using Python and SQL, enabling healthcare professionals to access tailored dashboards
  • Automated various data processes using Python scripting, significantly reducing the time required for data preparation and loading
  • Performed extensive data cleaning and normalization to improve the quality of healthcare analytics, using Python and SQL
  • Architected a data lake solution utilizing AWS S3 as the primary storage, paired with AWS Glue and Athena for efficient serverless querying
  • Enhanced data security for AWS S3 storage by implementing comprehensive bucket policies, IAM roles, and enforcing encryption standards both in transit and at rest
  • Automated data transformation and cleansing processes using Python scripts and AWS Lambda, storing outputs in AWS S3 to ensure high-quality data for analytics
  • Managed large datasets using MongoDB, optimizing data architecture for better performance and scalability in healthcare applications
  • Integrated real-time and batch data processing using Apache Kafka and AWS EMR, balancing workload and improving processing time
  • Deployed and optimized DBT models in cloud environments, enhancing data transformation and load processes for complex healthcare datasets
  • Implemented automated monitoring with Prometheus and Grafana in CI/CD workflows, enabling proactive issue detection and resolution in data pipelines
  • Utilized Ansible for configuration management, automating the setup and maintenance of Hadoop clusters integrated with CI/CD pipelines
  • Architected a CI/CD strategy for a multi-cloud environment using Spinnaker, improving deployment practices across the AWS platform
  • Assisted in the development of a collaborative analytics platform, facilitating data sharing and decision-making across healthcare teams
  • Implemented robust backup and disaster recovery solutions using AWS technologies, ensuring high availability and data integrity
  • Conducted performance tuning of healthcare databases in PostgreSQL and Teradata, ensuring optimal performance during critical data operations
  • Streamlined data ingestion and integration using AWS Glue and Python, enhancing the speed and efficiency of data flows into healthcare analysis systems
  • Environment: AWS EMR, Snowflake, Python, SQL, Databricks, Apache Kafka, AWS S3, Airflow, MongoDB, AWS Glue, PostgreSQL, Teradata, DBT, Pandas
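
A rough sketch of the PySpark-with-Kafka streaming pattern referenced in the list above, assuming a Kafka topic of JSON events landed in S3 as Parquet; the topic name, schema fields, and paths are illustrative assumptions, not details from the actual system.

```python
# Sketch: read JSON events from Kafka with Structured Streaming and land them in S3.
# Requires the spark-sql-kafka package on the cluster; names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka_to_s3_example").getOrCreate()

event_schema = StructType([
    StructField("patient_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "healthcare-events")
    .load()
)

# Kafka delivers bytes; cast the value to string and parse the JSON payload.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://example-data-lake/events/")
    .option("checkpointLocation", "s3a://example-data-lake/checkpoints/events/")
    .outputMode("append")
    .start()
)
```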

QBE

Data Engineer
01.2021 - 03.2023

  • Automated data transformations using Terraform and Python, enhancing data consistency and reducing manual intervention across insurance datasets
  • Improved data quality and integration using Informatica and Apache NiFi, which facilitated accurate reporting and analytics in financial services
  • Implemented real-time data streaming solutions with Databricks Structured Streaming and Apache Kafka to enhance data availability and decision-making processes
  • Leveraged Docker and Kubernetes to create reproducible environments for data pipelines, facilitating consistent deployments across development, testing, and production stages
  • Developed Terraform scripts to automate the provisioning of cloud infrastructure on AWS, ensuring scalable and resilient data engineering solutions
  • Led the migration of legacy data systems to Hadoop-based platforms, leveraging PySpark for efficient data transformation and aggregation
  • Utilized Databricks SQL Analytics to provide actionable insights through dashboards and visual reports, enhancing business decision-making
  • Led migration projects to Databricks, achieving seamless transitions from legacy systems and enabling cloud-based data analytics capabilities
  • Utilized Databricks MLflow to manage the machine learning lifecycle, including experimentation, reproducibility, and deployment of ML models
  • Deployed machine learning models for insurance risk assessment using PyTorch, integrated with AWS QuickSight for dynamic visualization
  • Optimized costs by analyzing and reconfiguring data storage and retrieval practices in AWS S3, applying lifecycle policies to transition to cost-effective storage tiers (see the sketch after this list)
  • Leveraged AWS S3 event notifications to automate and trigger downstream processing in AWS Lambda, enhancing responsiveness in event-driven data architectures
  • Created a secure, scalable multi-tenant data storage framework using AWS S3, improving data access control, governance, and compliance with international standards
  • Managed Agile project implementations, utilizing JIRA for project tracking, ensuring timely delivery of data enhancements and upgrades
  • Developed cost-effective solutions on Databricks by optimizing cluster management and leveraging spot instances
  • Ensured data security by implementing ACLs and role-based access control within Databricks environments
  • Designed data pipelines that combined AWS Glue with Python scripting to automate and simplify data ingestion and integration processes
  • Conducted data quality assessments using SQL and Python, ensuring high data integrity and supporting compliance with regulatory standards
  • Optimized data storage and processing using AWS DynamoDB, improving performance and scalability for large insurance datasets
  • Implemented version control best practices using Git, enhancing collaboration and code management across the data engineering team
  • Enhanced data visualization capabilities using AWS QuickSight, enabling stakeholders to derive actionable insights from complex datasets
  • Streamlined data migration processes using Terraform, ensuring efficient and error-free transitions between different storage platforms
  • Facilitated the integration of structured and unstructured data using Apache NiFi, supporting comprehensive analytics in insurance applications
  • Developed real-time data feeds using Apache Kafka, enabling immediate data availability for timely decision-making in risk management
  • Orchestrated data workflows using Informatica, automating complex transformations and loading processes in the insurance data environment
  • Designed and implemented robust backup and disaster recovery strategies using AWS technologies, ensuring data availability and continuity
  • Automated repetitive data processing tasks using Python, significantly improving efficiency and reducing processing times
  • Configured AWS S3 for optimized data storage and retrieval, effectively managing large volumes of insurance transaction data
  • Conducted performance tuning of AWS DynamoDB instances, optimizing response times and resource utilization for critical applications
  • Implemented security measures in data handling and storage using AWS security tools, safeguarding sensitive insurance customer data
  • Utilized Apache NiFi for efficient data ingestion and distribution, enhancing the flow of information across insurance processes
  • Enhanced team collaboration and project management using Agile methodologies and JIRA, improving overall project visibility and tracking
  • Developed and maintained documentation for data processes and systems using Confluence, ensuring knowledge sharing and continuity
  • Supported the training of team members on new technologies and data processes, fostering a culture of continuous learning and improvement
  • Environment: AWS Glue, AWS S3, AWS DynamoDB, Databricks, Terraform, Python, Informatica, Apache NiFi, PyTorch, AWS QuickSight, JIRA, SQL, Git, Apache Kafka, Confluence
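
One way the S3 lifecycle tiering mentioned in the list above can be expressed is as a boto3 lifecycle configuration; the bucket name, prefix, and day thresholds below are placeholder assumptions rather than values from the actual environment.

```python
# Sketch: apply an S3 lifecycle rule that tiers older objects to cheaper storage classes.
# Bucket name, prefix, and day thresholds are placeholder values.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-insurance-data",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-old-transaction-data",
                "Filter": {"Prefix": "transactions/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access after 30 days
                    {"Days": 365, "StorageClass": "GLACIER"},     # archive after a year
                ],
                # Expire after roughly seven years, per a hypothetical retention policy.
                "Expiration": {"Days": 2555},
            }
        ]
    },
)
```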

PNC Bank

Data Engineer
10.2018 - 12.2020

  • Designed data ingestion solutions using Apache NiFi, optimizing the flow of financial data for analytics purposes
  • Configured AWS DynamoDB to provide scalable and reliable data storage solutions tailored for the financial sector
  • Utilized Terraform to automate cloud infrastructure deployments, enhancing operational agility and system reliability
  • Deployed data applications in Docker containers to improve scalability and manageability of financial services applications
  • Developed Python scripts for data transformation, streamlining the processing tasks within the financial data environments
  • Integrated AWS S3 for secure and scalable data storage, supporting extensive data management solutions in finance
  • Employed Git for version control, ensuring code integrity and supporting collaboration in the development of financial data projects
  • Orchestrated automated data pipelines using Informatica, improving data quality and efficiency in data handling
  • Designed and implemented robust data pipelines in Databricks, integrating Apache Spark for real-time data processing and analytics
  • Used Databricks MLflow to manage the machine learning lifecycle, including experimentation, reproducibility, and deployment
  • Leveraged AWS QuickSight for developing insightful dashboards and visualizations, aiding financial decision-making processes
  • Configured AWS DynamoDB for high-performance data operations, enhancing data retrieval and storage in financial applications
  • Managed agile project cycles using JIRA, ensuring timely delivery of data engineering projects in the finance domain
  • Implemented data validation and testing using PyTorch, ensuring accuracy and reliability of predictive models in finance
  • Applied data encryption and security measures in AWS, ensuring compliance with financial regulations and data privacy standards
  • Developed and maintained data warehouses using AWS technologies, facilitating complex financial analyses and reporting
  • Implemented automated monitoring and alerting mechanisms in Databricks, ensuring high availability and performance of data processes
  • Developed custom UDFs (User Defined Functions) in Databricks for specific business logic integration, enhancing data transformation capabilities
  • Utilized Tableau for complex data visualization tasks, enhancing the presentation and accessibility of financial data
  • Enhanced data workflows with Apache Kafka, improving the real-time data streaming capabilities in financial operations
  • Configured automation frameworks using Terraform, streamlining infrastructure management for financial services
  • Applied continuous integration and deployment practices using Docker, enhancing the reliability of financial data applications
  • Utilized AWS DynamoDB streams to capture real-time changes in financial data, enhancing data accuracy and timeliness (see the sketch after this list)
  • Orchestrated data migrations to AWS cloud environments, ensuring seamless transitions and minimal downtime
  • Enhanced operational efficiency and data processing using AWS Glue, automating data integration tasks in the financial domain
  • Environment: Apache NiFi, AWS Glue, AWS DynamoDB, Terraform, Docker, Python, Git, Informatica, AWS S3, AWS QuickSight, JIRA, PyTorch, Tableau, Apache Kafka
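
The DynamoDB Streams pattern noted in the list above is typically consumed from an AWS Lambda function; the handler below is a minimal sketch with hypothetical attribute names, not code from the actual system.

```python
# Sketch: Lambda handler triggered by a DynamoDB stream, reacting to inserts and updates.
# Attribute names ("account_id", "balance") are hypothetical examples.
import json


def handler(event, context):
    processed = 0
    for record in event.get("Records", []):
        if record["eventName"] in ("INSERT", "MODIFY"):
            # DynamoDB stream images use typed attribute maps, e.g. {"S": "..."}.
            new_image = record["dynamodb"].get("NewImage", {})
            account_id = new_image.get("account_id", {}).get("S")
            balance = new_image.get("balance", {}).get("N")
            # Downstream handling (publish, enrich, alert) would go here.
            print(json.dumps({"account_id": account_id, "balance": balance}))
            processed += 1
    return {"processed": processed}
```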

Exl Service.com (I) Pvt. Ltd.

Data Quality Analyst
04.2017 - 06.2018

  • Analyzed data quality and implemented enhancements using Python and SQL, utilizing Apache Hive for big data management and insights (see the sketch after this list)
  • Designed visualizations in QlikView and Tableau, facilitating insightful business reports and dashboards that drove strategic decisions
  • Managed version control for data projects using Subversion (SVN), ensuring code integrity and facilitating collaborative development
  • Supported business intelligence initiatives with robust data models using Power BI, enhancing reporting capabilities across departments
  • Leveraged Apache Hadoop for efficient processing of large datasets, improving data access and analysis for client projects
  • Developed and maintained data warehouses using SQL, ensuring structured data storage and efficient data retrieval
  • Configured and utilized Apache Hive to handle complex data queries, optimizing data operations and analytics
  • Implemented data governance and compliance measures, ensuring data integrity and security were maintained
  • Conducted data visualization workshops using Tableau and QlikView, enhancing team skills and data presentation techniques
  • Optimized data retrieval processes using custom SQL queries, reducing processing times and improving response rates
  • Performed data migrations and integrations using Apache Hadoop, ensuring data consistency and accuracy
  • Developed SQL scripts for database management and report generation, enhancing operational efficiency and data usability
  • Utilized Power BI to develop interactive and automated reporting solutions, increasing data accessibility for non-technical users
  • Assisted in database tuning and performance optimization, ensuring high performance and reliability of data operations
  • Engaged in agile project management practices, utilizing JIRA to manage tasks and monitor project progress
  • Delivered comprehensive data analysis reports to stakeholders, providing insights that influenced key business strategies
  • Environment: Python, SQL, Apache Hive, QlikView, Tableau, Subversion (SVN), Power BI, Apache Hadoop, JIRA
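
A small sketch of the kind of Python-based data quality check described in the list above, assuming the records are available as a pandas DataFrame; the column names and the 5% null threshold are invented for illustration.

```python
# Sketch: basic data quality checks (nulls, duplicates, out-of-range values) on a DataFrame.
# Column names and thresholds are illustrative assumptions.
import pandas as pd


def run_quality_checks(df: pd.DataFrame) -> dict:
    issues = {}

    # Null-rate check per column, flagging anything above 5%.
    null_rates = df.isna().mean()
    issues["high_null_columns"] = null_rates[null_rates > 0.05].to_dict()

    # Duplicate check on a hypothetical business key.
    issues["duplicate_keys"] = int(df.duplicated(subset=["record_id"]).sum())

    # Range check on a hypothetical numeric field.
    issues["negative_amounts"] = int((df["amount"] < 0).sum())

    return issues


if __name__ == "__main__":
    sample = pd.DataFrame({"record_id": [1, 2, 2], "amount": [100.0, -5.0, None]})
    print(run_quality_checks(sample))
```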

High Radius Technologies

SQL Developer
07.2015 - 03.2017

  • Developed SQL databases using MySQL and PostgreSQL, enhancing data operations for business analytics and reporting
  • Utilized Talend for data integration and transformation, supporting analytics and business intelligence activities
  • Managed data backups and recovery procedures using Git, ensuring data integrity and high availability
  • Implemented dashboarding solutions with Power BI, delivering actionable insights to enhance business decision-making
  • Conducted performance tuning on SQL databases to ensure optimal operation and access speeds
  • Developed and maintained documentation for database designs and data management processes
  • Designed and executed SQL queries for data analysis and reporting, supporting various business units
  • Collaborated with business analysts to understand data requirements and deliver tailored database solutions
  • Participated in data migration projects, ensuring seamless data transfers with minimal downtime
  • Assisted in the setup and configuration of PostgreSQL databases, optimizing settings for performance and security
  • Developed custom data extraction and reporting tools using Python, enhancing data accessibility and user engagement (see the sketch after this list)
  • Trained junior developers in database management and ETL processes, fostering skill development within the team
  • Monitored and resolved database performance issues, ensuring stable and efficient data operations
  • Engaged in project meetings to provide updates on database health and data management strategies
  • Environment: MySQL, PostgreSQL, Talend, Git, Power BI, Python
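
A minimal sketch of a Python extraction-and-reporting utility like the one mentioned in the list above, assuming a PostgreSQL source accessed via psycopg2; the connection details, query, and output path are placeholders.

```python
# Sketch: extract rows from PostgreSQL and write a CSV report.
# Connection parameters, table, and output path are placeholder values.
import csv

import psycopg2


def export_report(output_path: str = "monthly_report.csv") -> None:
    conn = psycopg2.connect(
        host="localhost", dbname="analytics", user="report_user", password="change_me"
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT customer_id, invoice_date, amount FROM invoices WHERE invoice_date >= %s",
                ("2017-01-01",),
            )
            rows = cur.fetchall()
        with open(output_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["customer_id", "invoice_date", "amount"])
            writer.writerows(rows)
    finally:
        conn.close()


if __name__ == "__main__":
    export_report()
```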

Education

Gurunanak Institutions Technical Campus, Hyderabad

Bachelor of Science in Computer Science Engineering


Skills

  • Python
  • SQL
  • Scala
  • MySQL
  • PostgreSQL
  • Teradata
  • MongoDB
  • Azure Cosmos DB
  • AWS DynamoDB
  • Talend
  • Apache Airflow
  • Apache Kafka
  • AWS Glue
  • Informatica
  • Apache NiFi
  • AWS
  • Azure
  • GCP
  • Power BI

  • Tableau
  • QlikView
  • AWS QuickSight
  • ETL
  • Data Streaming
  • Machine Learning
  • Automation
  • Agile
  • Version Control
  • Git
  • SVN
  • DBT
  • Pandas
  • Kubeflow
  • Terraform
  • Docker
  • Apache Hadoop
  • Snowflake
