Sravana Sodipilli

Summary

  • Over 8 years of experience designing, building, and optimizing large-scale data architectures on Google Cloud Platform (GCP) and Amazon Web Services (AWS).
  • Proven expertise in cloud-native data warehousing and analytics platforms including BigQuery, Databricks, and Dataflow, with strong proficiency in ETL/ELT pipeline development using Python, SnowSQL, Spark, and PySpark.
  • Skilled in architecting and integrating structured, semi-structured, and unstructured data from diverse sources into centralized data lakes and enterprise data warehouses, ensuring high availability, scalability, and security.
  • Optimized SQL queries, data models, and schemas in Snowflake and Hive to enhance analytics, reporting, and BI performance.
  • Managed real-time data ingestion with AWS Kinesis into S3 and Redshift, improving data freshness and latency.
  • Experienced in implementing partitioning, clustering, materialized views, and query optimization techniques to improve performance and reduce costs across cloud platforms (see the sketch after this list).
  • Strong background in real-time streaming and event-driven architectures using technologies such as Pub/Sub, SNS, SQS, and Cloud Functions for low-latency processing.
  • Proficient in infrastructure-as-code (Terraform) and CI/CD automation (Cloud Build, GitLab CI/CD), enabling faster, repeatable deployments of data pipelines and infrastructure.
  • Deployed and managed containerized applications on Kubernetes (EKS/GKE/AKS), ensuring high availability, scalability, and self-healing of services.
  • Deep knowledge of data governance frameworks, metadata management, and compliance standards (GDPR, HIPAA, SOC 2), leveraging tools such as Cloud Data Catalog, DLP, and IAM policies.
  • Adept at collaborating with cross-functional teams, including data scientists, DevOps engineers, and business analysts, to deliver data-driven solutions and actionable insights.
  • Skilled in building interactive, real-time dashboards and BI solutions with Tableau and Power BI to support strategic decision-making.
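
An illustrative sketch of the partitioning and clustering approach noted above, assuming the google-cloud-bigquery Python client; the project, dataset, table, and column names are hypothetical examples, not from an actual engagement:

    # Minimal sketch: create a date-partitioned, clustered BigQuery table.
    # Project, dataset, table, and column names are hypothetical examples.
    from google.cloud import bigquery

    client = bigquery.Client()

    schema = [
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
    ]

    table = bigquery.Table("example-project.analytics.events", schema=schema)
    # Partitioning prunes scanned bytes, which is what cuts query cost.
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY, field="event_date"
    )
    # Clustering on a common filter column speeds up selective queries.
    table.clustering_fields = ["customer_id"]

    client.create_table(table)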

Overview

10 years of professional experience

Work History

Data Engineer

Nationwide Insurance
03.2024 - Current
  • Worked with Databricks on AWS for data processing in Scala and Python notebooks (see the PySpark sketch after this list).
  • Designed and managed end-to-end ETL/ELT pipelines on AWS, ensuring scalability, reliability, and high performance.
  • Integrated on-premises and cloud data (MySQL, Cassandra, AWS S3) into Snowflake using AWS Glue.
  • Built scalable ingestion pipelines with Apache Kafka, NiFi, and Flume to process high-volume structured and unstructured datasets.
  • Developed batch and streaming workflows using Apache Spark, PySpark, AWS EMR, and Airflow for efficient data transformation.
  • Optimized SQL queries, data models, and schemas in Snowflake and Hive to enhance analytics, reporting, and BI performance.
  • Implemented data quality checks, Change Data Capture (CDC), and governance practices to ensure accuracy, completeness, and compliance.
  • Automated pipeline deployments using Jenkins, GitLab CI/CD, Terraform, and AWS CloudFormation.
  • Experienced in the insurance domain and the Guidewire platform.
  • Developed serverless solutions with AWS Lambda for real-time, event-driven data processing.
  • Created interactive dashboards in Tableau and Power BI to deliver actionable insights to business and operations teams.
  • Environment: AWS Databricks, Snowflake, AWS Lambda, IAM, MS SQL, Oracle, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Spark performance tuning, data integration, data modeling, data pipelines, production support, shell scripting, Git, JIRA, Jenkins, Kafka, Power BI.
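
A minimal sketch of the notebook-style PySpark transform and data quality gate described in the bullets above; the bucket paths and column names are hypothetical examples:

    # Minimal sketch: PySpark batch transform with a simple quality check.
    # Bucket paths and column names are hypothetical examples.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims_batch").getOrCreate()

    raw = spark.read.json("s3://example-bucket/raw/claims/")

    cleaned = (
        raw.dropDuplicates(["claim_id"])
           .filter(F.col("claim_amount") > 0)
           .withColumn("ingest_date", F.current_date())
    )

    # Fail fast if required keys are missing before publishing downstream.
    null_ids = cleaned.filter(F.col("claim_id").isNull()).count()
    if null_ids > 0:
        raise ValueError(f"{null_ids} rows missing claim_id; aborting load")

    (cleaned.write.mode("overwrite")
            .partitionBy("ingest_date")
            .parquet("s3://example-bucket/curated/claims/"))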

Data Engineer

Freddie Mac, Virginia
04.2023 - 03.2024
  • Built scalable batch and streaming ETL/ELT pipelines using AWS EMR for processing data from APIs, databases, and files.
  • Managed real-time data ingestion with AWS Kinesis into S3 and Redshift, improving data freshness and latency.
  • Optimized queries and data models in Redshift and Hive using partitioning, clustering, and materialized views for performance and cost efficiency.
  • Created reusable ETL templates, PySpark scripts, and AWS Glue jobs for efficient data transformation and analytics.
  • Migrated SQL workloads to Redshift, improving efficiency with UDFs and optimized schema design.
  • Automated pipelines using AWS Lambda and event-driven architectures for reliable, serverless data processing (see the sketch after this list).
  • Designed, developed, and maintained high-quality conceptual, logical, and physical data models for structured and unstructured data sources.
  • Collaborated with business stakeholders to gather and analyze data requirements, translating them into scalable and efficient data structures.
  • Created and managed data dictionaries, metadata repositories, and ER diagrams using tools such as ERwin, SQL Workbench, and DBT.
  • Orchestrated workflows with AWS Step Functions, managing retries, dependencies, and monitoring.
  • Implemented IAM-based access controls, least-privilege security, and governance for AWS resources.
  • Reduced AWS costs by optimizing compute, storage, and eliminating redundant processes.
  • Deployed cloud resources using AWS CloudFormation and automated pipelines with CI/CD tools like GitHub Actions.
  • Documented AWS architecture, pipelines, and runbooks to support operations, governance, and incident management.
  • Environment: Python, Redshift, AWS Glue, AWS Kinesis, AWS EMR, AWS S3, Cloud IAM, Vertex AI, Pub/Sub, GitHub Actions.
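
A minimal sketch of the event-driven Lambda pattern referenced above, landing Kinesis records in S3; the bucket name and key layout are hypothetical examples:

    # Minimal sketch: Lambda handler draining a Kinesis batch into S3.
    # Bucket name and key layout are hypothetical examples.
    import base64
    import json
    import uuid

    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # Kinesis delivers records base64-encoded inside the Lambda event.
        rows = [
            json.loads(base64.b64decode(r["kinesis"]["data"]))
            for r in event["Records"]
        ]
        # Micro-batch the decoded records into one JSON-lines object.
        body = "\n".join(json.dumps(row) for row in rows)
        s3.put_object(
            Bucket="example-raw-bucket",
            Key=f"kinesis/landing/{uuid.uuid4()}.jsonl",
            Body=body.encode("utf-8"),
        )
        return {"records": len(rows)}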

Data Engineer

CoForge, India
09.2022 - 02.2023
  • Built ETL pipelines with Google Cloud Dataflow to move and transform financial data into BigQuery (see the sketch after this list).
  • Managed workflows with Cloud Composer and automated real-time data ingestion with Pub/Sub.
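
A minimal sketch of the Dataflow pattern above, using the Apache Beam Python SDK to stream Pub/Sub messages into BigQuery; the topic, table, and schema are hypothetical examples:

    # Minimal sketch: streaming Beam pipeline (runnable on Dataflow)
    # from Pub/Sub into BigQuery. Topic, table, and schema are hypothetical.
    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromPubSub(
                topic="projects/example-project/topics/transactions")
            | "Parse" >> beam.Map(json.loads)
            | "Write" >> beam.io.WriteToBigQuery(
                "example-project:finance.transactions",
                schema="txn_id:STRING,amount:NUMERIC,ts:TIMESTAMP",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )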

Big Data Developer

Mu Sigma, India
02.2018 - 08.2021
  • Developed and maintained data pipelines using Sqoop, Flume, and Kafka to ingest, transform, and process customer behavioral data for analysis.
  • Developed and optimized SQL queries, stored procedures, and data extraction scripts using Toad for Oracle/SQL Server, improving query performance and reducing execution time by 30%.
  • Performed data aggregation and analysis on large-scale datasets using Apache Spark and Hive, resulting in improved insights for the business.
  • Utilized big data ecosystems such as Hadoop, Spark, and Cloudera to load and transform large sets of structured, semi-structured, and unstructured data.
  • Integrated HBase with Hive on the Analytics Zone, creating and optimizing HBase tables for faster and more efficient querying of data.
  • Utilized Hive queries and Spark SQL to analyze and process data, meeting specific business requirements and replicating MapReduce-style aggregations (see the sketch after this list).
  • Implemented automation for deployments using YAML scripts, resulting in faster and more efficient builds and releases.
  • Migrated data from RDBMS (Oracle) to Hadoop using Sqoop for processing, enhancing data management and processing capabilities.
  • Utilized JIRA to manage issues and project workflow, improving project organization and efficiency.
  • Collaborated with team members to identify and resolve JVM-related issues, resulting in better performance and system stability.
  • Utilized Git as a version control tool to maintain the code repository, ensuring better code management and tracking of changes.
  • Environment: Sqoop, MySQL, HDFS, Apache Spark (Scala), Hive, Hadoop, Cloudera, HBase, Kafka, MapReduce, ZooKeeper, Oozie, data pipelines, RDBMS, EC2, Python, PySpark, shell scripting, Ambari, JIRA.
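
A minimal sketch of the Hive-on-Spark aggregation pattern mentioned above; the database, table, and column names are hypothetical examples:

    # Minimal sketch: Spark SQL aggregation over a Hive table, standing in
    # for a MapReduce-style job. All names are hypothetical examples.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("behavior_agg")
        .enableHiveSupport()  # resolve tables through the Hive metastore
        .getOrCreate()
    )

    daily = spark.sql("""
        SELECT event_date,
               customer_id,
               COUNT(*)        AS events,
               SUM(duration_s) AS total_duration_s
        FROM analytics.customer_events
        GROUP BY event_date, customer_id
    """)

    daily.write.mode("overwrite").saveAsTable("analytics.customer_daily_agg")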

Data Warehouse Developer

Sedin Technologies, India
01.2016 - 02.2018
  • Designed and implemented database objects (tables, views, indexes, constraints, and schemas) using DDL, ensuring scalability and optimized query performance.
  • Developed and maintained DML scripts (INSERT, UPDATE, DELETE, MERGE) to transform, cleanse, and load large datasets into staging and warehouse environments (see the sketch after this list).
  • Created, maintained, and supported SQL Server databases.
  • Contributed to data modeling and the physical and logical design of databases.
  • Helped integrate the front end with the SQL Server backend.
  • Created stored procedures, triggers, indexes, user-defined functions, and constraints on various database objects to obtain the required results.
  • Imported and exported data between servers using tools such as Data Transformation Services (DTS).
  • Provided application support by phone; developed and tested Windows command files and SQL Server queries for production database monitoring in a 24/7 support environment.
  • Created logging for ETL loads at the package and task levels to record the number of records processed by each package and each task using SSIS.
  • Developed, monitored, and deployed SSIS packages.
  • Created linked and ad hoc reports based on requirements; linked reports were created on the Report Server to reduce report duplication.
  • Environment: Microsoft Office, Windows 2007, T-SQL, DTS, SQL Server, HTML, SSIS, SSRS, XML.
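
A minimal sketch of the MERGE-based staging-to-warehouse load described above, driven from Python via pyodbc; the connection string and object names are hypothetical examples:

    # Minimal sketch: MERGE-based upsert from staging into a dimension
    # table over pyodbc. Connection string and names are hypothetical.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=example-server;DATABASE=warehouse;Trusted_Connection=yes;"
    )

    merge_sql = """
    MERGE dbo.dim_customer AS tgt
    USING stg.customer AS src
        ON tgt.customer_id = src.customer_id
    WHEN MATCHED THEN
        UPDATE SET tgt.name = src.name, tgt.updated_at = SYSDATETIME()
    WHEN NOT MATCHED THEN
        INSERT (customer_id, name, updated_at)
        VALUES (src.customer_id, src.name, SYSDATETIME());
    """

    cur = conn.cursor()
    cur.execute(merge_sql)
    conn.commit()
    cur.close()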

Education

Bachelor of Engineering (B.E.)

Andhra University College of Engineering
01.2009

Skills

  • Programming Languages: Python, SQL, T-SQL
  • Data Warehousing: BigQuery, Redshift, Snowflake
  • Cloud Platforms: GCP, AWS
  • AWS Services: AWS S3, Redshift, EMR, SNS, SQS, Athena, Glue, CloudWatch, Kinesis, Route 53, IAM
  • ETL Tools: Informatica PowerCenter, Talend, Microsoft SSIS, Apache NiFi, Cloud Data Fusion
  • Big Data Technologies: Spark, PySpark, Google BigQuery, Hadoop, Pig, Hive, HDFS, Sqoop, Storm, Kafka, YARN, Oozie, ZooKeeper, Pub/Sub
  • Data Processing: Spark, PySpark, Hadoop MapReduce, Spark Streaming, Google Cloud Dataproc, Google Cloud Functions, Databricks on Google Cloud, Snowflake
  • Database Management: MS SQL Server, Toad, MySQL Workbench, Oracle, DB2, PostgreSQL
  • Data Integration & Modeling: Data modeling, physical data warehouse design, MySQL schema design, SSIS, Power BI integration
  • Data Visualization: Power BI, Tableau
  • DevOps & CI/CD: GitHub, GitLab, GCP Cloud Build, Jira
  • Automation: GitHub Actions, Google Cloud Composer, Control-M, Oozie
  • Monitoring & Performance: CloudWatch, Google Cloud Monitoring, Cloud Logging, performance tuning
  • Security & Compliance: Google IAM, Google KMS encryption, Data access control, GDPR compliance
  • Data Validation: Data quality processes, Error handling mechanisms, Automated data-driven testing
  • Reporting: MS Word, MS Excel, Jupyter Notebooks, SPSS, SAS
  • Data Governance: Informatica Enterprise Data Catalog, Data cleansing routines, Google Cloud Data Catalog

Tools & Frameworks

Apache Airflow, Autosys, Cron, Control-M, Hadoop ecosystem components, Spark Context, Spark YARN

Timeline

Data Engineer

Nationwide Insurance
03.2024 - Current

Data Engineer

Freddie Mac, Virginia
04.2023 - 03.2024

Data Engineer

CoForge, India
09.2022 - 02.2023

Big Data Developer

Mu Sigma, India
02.2018 - 08.2021

Data Warehouse Developer

Sedin Technologies, India
01.2016 - 02.2018

Bachelor of Engineering (B.E.)

Andhra University College of Engineering