
Srinivasa Rao

Plano, TX

Summary

  • 8+ years of experience handling large datasets by designing complex frameworks and algorithms using Hadoop, Big Data, AWS services, RDBMS databases, and Business Intelligence tools.
  • Strong experience performing Data Cleansing, Data Wrangling, and Data Masking with Big Data ETL frameworks built using Spark and Scala.
  • Hands-on experience working with Hadoop ecosystem components such as Hive, HDFS, Pig, Sqoop, MapReduce, and Oozie.
  • Experience building a reusable PySpark framework that extracts data from a PostgreSQL database, applies data masking, and saves output files to an S3 bucket (see the sketch following this summary).
  • Modeled, lifted, and shifted custom SQL and transposed LookML into dbt for materializing incremental views.
  • Designed, built, and managed ELT data pipelines leveraging Airflow, Python, dbt, Stitch Data, and AWS services.
  • Hands-on expertise with AWS services such as EMR, EC2, S3, Redshift, and IAM.
  • Proficient in using Big Data tools such as Pig and Hive for data analysis, Qlik Enterprise Manager for data ingestion, Airflow for scheduling, and ZooKeeper for coordinating cluster resources.
  • Experience handling structured, unstructured, and semi-structured data using various Hadoop file formats such as Parquet, ORC, Avro, DAT, text, JSON, CSV, and deflate.
  • Experience writing extensive Snowflake SQL queries to transform data for use by downstream models.
  • Experience migrating on-premises ETL processes (Teradata) to the cloud (Snowflake on AWS).
  • Experience designing and scheduling complex workflows using Airflow, Arow, and Control-M.
  • Ability to work on complex data structures, dashboards and ad hoc reporting.
  • Strong experience writing custom shell scripts to handle ad hoc requirements for inbound and outbound file transfers.
  • Collaborate with cross-functional departments and distributed teams on large initiatives.
  • Very good understanding of Teradata architecture and utilities such as Teradata Parallel Transporter.
  • Experience preparing technical design documents, including the Metadata, BDQ, Dependency, and ILDM artifacts required by SRE teams.
  • Experience in using Nebula and Exchange for Metadata registration for various datasets.
  • Expertise in database performance tuning: implementing parallel execution, partitioning, materialized views, and query rewriting; creating and rebuilding appropriate indexes; using optimizer hints; and analyzing Explain Plans and SQL traces.
  • Good knowledge of Agile methodology and the Scrum process.
  • Astute Big Data Developer with a data-driven and technology-focused approach. Communicates clearly with stakeholders and builds consensus around well-founded models. Talented in writing applications and reformulating models.
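A minimal, illustrative sketch of the PySpark extraction-and-masking pattern described above. The JDBC endpoint, table, column names, credentials, and S3 path are hypothetical placeholders, and SHA-256 hashing stands in for whatever masking rule a production framework would apply:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Placeholder settings; real values would come from configuration and a secrets store.
JDBC_URL = "jdbc:postgresql://db-host:5432/sales"        # hypothetical PostgreSQL endpoint
SOURCE_TABLE = "public.customers"                        # hypothetical source table
MASK_COLUMNS = ["ssn", "card_number"]                    # hypothetical sensitive columns
OUTPUT_PATH = "s3a://example-bucket/masked/customers/"   # hypothetical S3 target

# Requires the PostgreSQL JDBC driver on the Spark classpath (e.g. via --jars).
spark = SparkSession.builder.appName("postgres-to-s3-masking").getOrCreate()

# Extract: read the source table over JDBC.
df = (spark.read.format("jdbc")
      .option("url", JDBC_URL)
      .option("dbtable", SOURCE_TABLE)
      .option("user", "etl_user")            # placeholder credentials
      .option("password", "etl_password")
      .load())

# Mask: replace each sensitive column with a one-way SHA-256 hash.
for col_name in MASK_COLUMNS:
    df = df.withColumn(col_name, F.sha2(F.col(col_name).cast("string"), 256))

# Load: write the masked output to S3 as Parquet.
df.write.mode("overwrite").parquet(OUTPUT_PATH)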

Overview

9 years of professional experience

Work History

Big Data Developer

Cigna
01.2023 - Current
  • Working on building data pipelines that transform streaming data from internal and external sources and load it into the AWS S3 data lake and then into the Snowflake data warehouse.
  • Working on a framework that detects and masks customers' Non-public Personal Information (NPI) and Payment Card Industry (PCI) data residing in AWS S3 and the Snowflake warehouse.
  • Coordinated the Kafka configuration and setup (source and Aiven connectors) for real-time streams into S3 and Snowflake.
  • Built Snowpipe pipelines for continuous data loading from AWS S3 into the Snowflake data warehouse.
  • Working on validating data between the SDP (Streaming Data Platform) and AWS S3 to check for data gaps (missing data and data mismatches).
  • Worked on ETL jobs (Fivetran, Qlik Enterprise Manager) to migrate data from on-premises systems to AWS S3 and Snowflake, generating JSON and CSV files to support Catalog API integration.
  • Working on a New Relic dashboard that generates alerts for ongoing job failures.
  • Working on onboarding the Qlik Enterprise Manager tool, which compares data between source and destination.
  • Understanding of structured data sets, data pipelines, ETL tools, and data reduction, transformation, and aggregation techniques; knowledge of tools such as dbt.
  • Experienced with dbt, which provides several benefits for data engineering teams: it lets data engineers write modular, reusable SQL that can be version controlled and tested like any other software code, and it offers built-in features for data modeling such as automatic type inference, schema management, and data lineage tracking.
  • Working with PySpark to encrypt sensitive data residing in historical and ongoing data set files.
  • Extensively using VS Code for debugging purposes.
  • Preparing technical design documents and detailed design documents.
  • Created a wrapper shell script for each framework developed in PySpark and provided it as input to the Airflow jobs (see the DAG sketch after this list).
  • Created a shell script to load historical data residing in S3 buckets into Snowflake.
  • Set up replication and cloning for large tables, splitting them into multiple tables based on partitions, to migrate data into the Snowflake data warehouse.
  • Following Agile methodology and Scrum meetings to track and optimize features to customer needs.
  • Ran queries on a maximized warehouse cluster and tested different queries using multiple warehouse sizes.
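A minimal Airflow DAG sketch for the wrapper-script pattern above. The DAG id, schedule, and script path are hypothetical; the wrapper is assumed to handle spark-submit and its configuration:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical DAG that invokes a PySpark framework through its wrapper shell script.
with DAG(
    dag_id="pyspark_masking_framework",    # illustrative name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_masking = BashOperator(
        task_id="run_masking_wrapper",
        # Placeholder path; {{ ds }} passes the logical run date to the wrapper.
        bash_command="/opt/frameworks/mask_wrapper.sh {{ ds }} ",
    )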


Environment: Scala, Java, SnowSQL, Python, JSON, Snowflake, SQL, Airflow, Qlik Enterprise Manager, Fivetran, Snowpipe, Shell Scripting, Git, AWS services (S3, EMR, EC2, IAM), Nebula, Exchange, Agile.

Big Data Developer

HDFC
06.2017 - 11.2021
  • Constructed a data pipeline to process semi-structured data by incorporating 100 million raw records from 14 data sources
  • Designed the data pipeline architecture for a new product that quickly scaled from 0 to 60,000 daily users
  • Integrated data from multiple third-party APIs that provided data on local language preferences, leading to customized landing pages that improved paid conversion rate by 6%.
  • Ingested streaming and transactional data across 9 diverse primary data sources using Spark, Redshift, S3, and Python.
  • Created a Python library to parse and reformat data from external vendors, reducing the error rate in the data pipeline by 12% (see the sketch after this list).
  • Automated ETL processes across billions of rows of data, saving 45 hours of manual work per month.
  • Experience performing root cause analysis on internal and external data and processes to answer specific business questions.
  • Experience with building processes supporting data transformation, data structures, metadata, dependency and workload management.
  • Designed and developed scalable solutions for storing and processing large amounts of data across multiple regions.
  • Analyzed business requirements and translated them into technical specifications used by developers to implement new features and enhancements.
  • Provided support during all phases of development including design, implementation, testing, deployment and maintenance of applications/services.
  • Participated in cross-functional teams (e.g., infrastructure engineering) when required to ensure effective communication between groups with overlapping functionality or shared resources.
  • Developed and implemented data pipelines using AWS services such as Kinesis, S3, EMR, Athena, Redshift to process petabyte-scale data in real time.
  • Implemented a data warehouse using Redshift to store and analyze terabytes of raw data
  • Built ETL processes in Python, Pig, and SQL to transform unstructured data into structured datasets
  • Developed an automated machine learning system that reduced manual labor by 80%
  • Created custom dashboards with Tableau for real-time monitoring of key business metrics
  • Spearheaded the migration from on-premises servers to AWS cloud infrastructure (EC2, S3, RDS).
  • Conducted data analysis to support business decision-making by extracting, cleansing, and manipulating data from various sources.
  • Created data visualizations to communicate complex data sets in an easily understandable format for business users.
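A minimal sketch of the kind of vendor-file normalization such a parsing library might perform, assuming a hypothetical CSV layout (cust_id, amount, date) and a single date format; the real library's schema, rules, and error handling are not shown here:

import csv
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator

@dataclass
class VendorRecord:
    customer_id: str
    amount_usd: float
    event_date: str  # normalized to ISO-8601 (YYYY-MM-DD)

def parse_vendor_file(path: str) -> Iterator[VendorRecord]:
    """Parse a vendor CSV and normalize its fields, skipping rows that fail validation."""
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            try:
                yield VendorRecord(
                    customer_id=row["cust_id"].strip(),                # hypothetical column names
                    amount_usd=float(row["amount"].replace(",", "")),  # strip thousands separators
                    event_date=datetime.strptime(row["date"], "%m/%d/%Y").strftime("%Y-%m-%d"),
                )
            except (KeyError, ValueError):
                # Malformed rows are dropped here; a real library would log or route them for review.
                continue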

SQL/BI Developer

Riosoft Technologies
05.2014 - 05.2017

Responsibilities:

  • Designed, developed, and maintained BI solutions using SQL, including data warehouse and data mart structures.
  • Created and maintained SQL-based ETL (Extract, Transform, Load) processes to extract, clean and load data into the data warehouse.
  • Collaborated with stakeholders and other teams to understand business requirements and design BI solutions that meet their needs.
  • Created and maintained SQL-based reports and analytics using tools such as SSRS (SQL Server Reporting Services), Power BI, and Tableau.
  • Monitored installation and operations to consistently meet customer requirements.
  • Created data models and designed database schemas to support reporting and analytics.
  • Troubleshot and debugged BI issues, identifying and resolving data and performance issues.
  • Optimized SQL queries for performance and scalability.
  • Continuously improved the BI development process by researching and experimenting with new tools and technologies.
  • Worked with other BI developers and IT teams to design, develop and implement security, backup and recovery procedures.
  • Provided technical guidance and mentorship to other team members, and acted as a subject matter expert on BI development using SQL.

Environment: SQL Server Business Intelligence Development Studio, PL/SQL, SQL Server, Oracle, Power BI, Tableau, MS Office, Windows

Education

Bachelor of Science - Bachelor of Engineering

Acharya Nagarjuna University
India

Master of Science - Information Technology

Auburn University-Montgomery
Montgomery, AL
12.2022

Skills

Databases: Teradata 14, Oracle 10g, SQL Server, DB2, MS Access, Snowflake, MongoDB, Cassandra

Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Spark, Sqoop, Oozie, ZooKeeper

ETL Tools: SQL Server Integration Services, Talend, DataStage, dbt, Qlik Enterprise Manager, Fivetran, Snowpipe

Languages: Scala, Python, Java, SQL, PL/SQL, MDX

Schedulers: Arow, Control-M, Oozie, Crontab, Airflow

Cloud Services: AWS EMR, EC2, Simple Storage Service (S3), IAM

Methodologies: Agile, Waterfall

CI/CD Tools: GitHub, Jenkins, Puppet, Chef



Timeline

Big Data Developer

Cigna
01.2023 - Current

Big Data Developer

HDFC
06.2017 - 11.2021

SQL/BI Developer

Riosoft Technologies
05.2014 - 05.2017

Bachelor of Science - Bachelor of Engineering

Acharya Nagarjuna University

Master of Science - Information Technology

Auburn University-Montgomery