Siva Teja

Atlanta, GA

Summary

With over 9 years of experience across roles including Senior Data Engineer, Azure Data Engineer, Big Data Engineer, and SQL/ETL Developer, specializes in advanced AWS services such as AWS Glue, Lambda, EMR, Databricks, and Redshift, as well as Azure services including Azure Data Factory and Azure Synapse Analytics. Proficient in Python and PySpark scripting for building robust, scalable solutions, with hands-on experience in real-time processing with Kafka to keep data flowing reliably. Skilled in data visualization and analytics, and in managing a variety of databases, including SQL Server, MySQL, and Cosmos DB, to ensure data integrity and availability. Extensive knowledge of big data tools such as Sqoop and Hive supports efficient data acquisition and transformation. Beyond technical skills, strong presentation abilities aid effective decision-making within cross-functional teams. A commitment to continuous learning, reflected in certifications covering cloud technologies and modern data paradigms, ensures staying abreast of the latest tools. This holistic approach to data engineering combines technical acuity with strategic insight, a valuable asset in any data-centric role.

Overview

10 years of professional experience
1 Certification

Work History

Senior AWS Data Engineer

Texas Health and Human Services Commission
Austin, TX
12.2022 - Current
  • Description: The Texas Health and Human Services Commission (HHSC) is committed to providing high-quality healthcare services to the residents of Texas. HHSC needed to streamline the collection and processing of Medicaid claims data to enhance healthcare utilization analysis, detect fraudulent activities, and support innovative care improvement initiatives. As a data engineer on this engagement, developed a data pipeline to collect and process Medicaid claims data.
  • Roles and responsibilities:
  • Designed a centralized AWS S3-based Medicaid claims data lake with AI-driven metadata classification, storing and managing petabyte-scale structured and unstructured healthcare data efficiently.
  • Developed real-time data ingestion pipelines using AWS Glue, Kinesis Data Streams, and DynamoDB, enabling fraud detection in Medicaid claims with 99% accuracy via Amazon SageMaker and AWS Fraud Detector.
  • Built a scalable Snowflake-based data warehouse with automated Snowpipe ingestion, optimizing multi-terabyte ML training datasets for AI-powered claims processing.
  • Implemented AI-driven data cleansing and deduplication using AWS Glue and PyDeequ, improving Medicaid claims data quality by 35%, and reducing errors by 50%.
  • Engineered predictive analytics models using Amazon Forecast, enabling 20% more accurate Medicaid expenditure predictions, and optimizing funding allocation by 15%.
  • Automated the configuration of AWS Glue Crawlers to discover, catalog, and transform structured data within Amazon S3, improving data accessibility and analysis through an enhanced centralized data catalog.
  • Implemented robust data governance practices, including data quality monitoring, metadata management, and compliance enforcement, ensuring data integrity.
  • Developed an AI-powered claim validation engine using Amazon Comprehend Medical, improving ICD-10 and CPT code accuracy by 60% and reducing incorrect claims.
  • Integrated Amazon Bedrock and AWS Textract to process unstructured Medicaid claims, extracting key insights with 95% precision, and automating document processing by 80%.
  • Processed multi-terabyte datasets using PySpark on AWS EMR, reducing Medicaid claims processing time by 40% and optimizing compute costs by 23% (see the sketch at the end of this role).
  • Optimized Athena queries using ML-driven optimization techniques, reducing query execution time by 30%, and the cost of Medicaid claims analysis by 25%.
  • Deployed real-time AI-powered fraud detection models using AWS Lambda and API Gateway, lowering fraudulent Medicaid claims by 40% and improving fraud detection response time by 70%.
  • Automated ML model deployment and performance monitoring using SageMaker Pipelines, AWS CodeDeploy, and Splunk AI-driven logs, ensuring 99.9% uptime for fraud detection models.
  • Integrated Amazon QuickSight with AI-powered analytics, enabling Medicaid administrators to detect abnormal spending patterns instantly and reducing manual auditing effort by 50%.
  • Integrated AWS CodeDeploy with repositories in GitHub to automate the build and deployment process.
  • Well-versed in Agile methodologies, with a specific focus on the Scrum framework, to optimize workflow, reduce lead times, and enhance team collaboration.
  • Growth: Gained a strong understanding of Medicaid fund utilization while contributing to reduced fraud and abuse, improved quality of care, and the development of new programs and services.
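
A minimal PySpark sketch of the kind of EMR claims-processing job referenced above; the bucket paths, column names, and aggregation logic are illustrative assumptions rather than actual project values.

```python
# Illustrative PySpark job for cleansing and aggregating Medicaid claims on EMR.
# All paths and column names below are placeholder assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medicaid-claims-processing").getOrCreate()

# Read raw claims from the S3 data lake (placeholder bucket).
claims = spark.read.parquet("s3://example-claims-lake/raw/medicaid/")

# Basic cleansing: drop exact duplicates and rows missing key identifiers.
cleaned = (
    claims.dropDuplicates(["claim_id"])
          .filter(F.col("member_id").isNotNull() & F.col("claim_amount").isNotNull())
)

# Aggregate monthly spend per provider for downstream fraud and utilization analysis.
monthly_spend = (
    cleaned.groupBy(
        "provider_id",
        F.date_trunc("month", F.col("service_date")).alias("month"),
    )
    .agg(
        F.sum("claim_amount").alias("total_claim_amount"),
        F.count("claim_id").alias("claim_count"),
    )
)

# Write curated output back to S3 as Parquet, partitioned by month.
monthly_spend.write.mode("overwrite").partitionBy("month").parquet(
    "s3://example-claims-lake/curated/monthly_provider_spend/"
)
```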

AWS Data Engineer

NRG
Houston, TX
02.2022 - 12.2022
  • Company Overview: NRG is an American energy company specializing in the generation and distribution of electricity and related services
  • Collaborated effectively with cross-functional team members to achieve project goals by fostering open communication, sharing insights, and actively participating in brainstorming sessions and decision-making processes
  • Designed and implemented a robust data pipeline to stream data from Apache Kafka into Amazon S3, enabling real-time data processing and analytics (see the sketch at the end of this role)
  • Utilized AWS Glue Crawler to crawl and catalog unstructured data from S3 and text files, enhancing analytics readiness
  • Successfully led a Proof of Concept (PoC) initiative to integrate Databricks with Amazon Redshift, showcasing the platform's seamless compatibility with existing data warehouse infrastructure
  • Utilized PySpark, Python, and Databricks notebooks to develop and execute data processing workflows, handling large volumes of structured and unstructured data with efficiency and scalability
  • Built an AWS data lake using Lake Formation to store and manage raw and processed data
  • Designed and implemented a robust data ingestion pipeline leveraging Amazon EMR and Apache Spark to load large volumes of OLAP data from Amazon Data Lake into Amazon Redshift
  • Executed advanced SQL queries in Redshift for data transformation, including aggregation, filtering, and table joins, optimizing data readiness for in-depth analysis
  • Implemented cost-effective strategies for managing EMR clusters and S3 storage costs, saving 17% of the budget through the use of features such as lifecycle policies
  • Developed a comprehensive data migration strategy for moving OLTP data from Data Lake to MySQL, ensuring data consistency
  • Designed and configured MySQL database to support ACID transactions effectively
  • Established and maintained data pipelines that connected data from Redshift and MySQL databases, performed necessary data transformations, aggregations, and cleansing before making the data available for Amazon QuickSight
  • Designed and created interactive and visually compelling data visualizations, reports, and dashboards using Amazon QuickSight for business intelligence and reporting
  • Utilized Amazon CloudWatch for resource monitoring and leveraged CloudWatch Logs for real-time log monitoring and troubleshooting
  • Utilized Amazon Simple Notification Service (SNS) to design and implement SNS topics, message publishing, and subscriber management to enable real-time notifications
  • Designed and implemented CI/CD pipelines tailored specifically for data workflows, automating the end-to-end process of data extraction, transformation, loading, and validation
  • Orchestrated automated deployments of data pipelines and ETL workflows using Jenkins
  • Worked in an Agile methodology and participated in daily Scrum meetings
  • Growth: Reduced energy costs, improved energy efficiency, reduced greenhouse gas emissions, and development of new energy products and services
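
A minimal Spark Structured Streaming sketch of the Kafka-to-S3 pipeline referenced above; the broker addresses, topic name, and bucket paths are illustrative placeholders, and the spark-sql-kafka connector is assumed to be available on the cluster.

```python
# Illustrative Structured Streaming job landing Kafka events in S3 for later
# cataloging by Glue crawlers. Brokers, topic, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-s3-stream").getOrCreate()

# Subscribe to the Kafka topic carrying raw events.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "energy-events")
    .option("startingOffsets", "latest")
    .load()
)

# Keep the message payload as plain text; schema is applied downstream.
payload = events.selectExpr("CAST(value AS STRING) AS value")

# Continuously write micro-batches to S3, with checkpointing for fault tolerance.
query = (
    payload.writeStream.format("text")
    .option("path", "s3://example-energy-lake/raw/events/")
    .option("checkpointLocation", "s3://example-energy-lake/checkpoints/events/")
    .start()
)
query.awaitTermination()
```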

Azure Data Engineer

Flexera
Bangalore, India
12.2020 - 11.2021
  • Company Overview: Square Up is a financial technology company known for its payment processing solutions and mobile point-of-sale systems
  • Managed source data stored in multiple formats, including JSON, CSV, SQL databases and text documents
  • Successfully ingested structured data from on-premises SQL databases and unstructured data using Azure Data Factory, ensuring data consistency and integrity
  • Successfully stored ingested data in Azure Blob Storage, utilizing scalable and cost-effective storage capabilities
  • Joined data from multiple sources and tables to create a unified dataset and performed aggregation tasks
  • Orchestrated complex workflows by chaining multiple activities together to create end-to-end data pipelines
  • Implemented data validation checks to ensure data quality and consistency throughout the pipeline
  • Migrated unstructured data to Cosmos DB to support ML tasks
  • Utilized Azure Data Factory to schedule and trigger HDInsight Spark activities to ingest data from various sources into Azure Blob Storage
  • Created data pipelines using Azure Data Factory to represent batch processing workflows
  • Performed batch data transformations such as filtering, aggregating, joining, and pivoting using HDInsight Spark (see the sketch at the end of this role)
  • Utilized T-SQL scripts to perform data transformations, including data cleansing and aggregation, and implemented data partitioning strategies within Azure Synapse Analytics
  • Performed advanced analytics using Python, R, and Spark within Azure Synapse Analytics
  • Utilized Azure Monitor, Azure Data Factory and Synapse Management Studio for monitoring and logging
  • Analyzed data within Power BI to represent business concepts and relationships
  • Worked in an Agile environment within the Scrum framework
  • Growth: Improved sales performance, reduced costs, improved customer experience, and increased revenue
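
A minimal PySpark sketch of the batch transformations referenced above (filter, join, aggregate, pivot); the Blob Storage paths, dataset names, and columns are illustrative placeholders.

```python
# Illustrative HDInsight Spark batch job: filter, join, aggregate, and pivot.
# Storage account, container, and column names are placeholder assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("batch-transformations").getOrCreate()

# Load ingested batch data from Blob Storage (placeholder WASB paths).
payments = spark.read.json("wasbs://raw@examplestorage.blob.core.windows.net/payments/")
merchants = spark.read.csv(
    "wasbs://raw@examplestorage.blob.core.windows.net/merchants/",
    header=True,
    inferSchema=True,
)

# Filter out incomplete payments, join to merchant reference data, and pivot
# monthly totals by sales channel into a unified dataset.
monthly_by_channel = (
    payments.filter(F.col("status") == "completed")
    .join(merchants, "merchant_id")
    .groupBy("merchant_id", F.date_format("paid_at", "yyyy-MM").alias("month"))
    .pivot("channel")
    .agg(F.sum("amount"))
)

# Persist the curated output for Synapse Analytics and Power BI consumption.
monthly_by_channel.write.mode("overwrite").parquet(
    "wasbs://curated@examplestorage.blob.core.windows.net/monthly_by_channel/"
)
```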

SQL/ETL Developer

Persistent Systems Limited
Pune, India
02.2018 - 11.2020
  • Company Overview: Persistent Systems Limited, a prominent player in the finance industry, provides financial services to a wide range of clients, including banks, investment firms, and insurance companies
  • Built strong relationships with SMEs and Training and Placement coordinators to better understand their problems and information needs
  • Created procedures for standardizing unprocessed data from various sources in a range of file formats, and assisted team members in cleaning and enhancing data quality (see the sketch at the end of this role)
  • Developed mappings in ETL applications such as Pentaho Kettle, using transformations like Value Mapper, Expression, Lookup, Aggregate, Update Strategy, and Filters to load data from numerous sources into the data warehouse
  • Implemented OLAP-to-OLTP conversions, data staging, data integration, and data cleansing operations
  • Developed, implemented, and maintained data mining, statistical, and visualization techniques using Python and PySpark scripts
  • Delivered data integration and design expertise that enabled accurate mapping using SQL and the integration of data from numerous reliable sources
  • Created reports in Excel and Pentaho BI Server that corresponded with the Training Officer's description of business-user needs
  • Created automated methods for refreshing the content of standardized dashboards and interactive data visuals
  • Created engaging, interactive Pentaho dashboards that deliver crucial business insights
  • Audited datasets to ensure data were continuous, accurate, and consistent
  • Prioritized requirements for data retrieval, visualization, and research and completed them quickly
  • Prepared technical documentation that concisely outlined data cleaning and analysis procedures and outcomes
  • Assisted with database application testing and production implementation
  • Managed big datasets, transferred and combined data files from many systems, aggregated data, and performed quality control using big data technologies like Hive, Impala, and Spark
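
A minimal PySpark sketch of the data-standardization step referenced above; the file paths, column names, and status lookup are illustrative placeholders, with the lookup join standing in for the Value Mapper-style transformation used in Pentaho Kettle.

```python
# Illustrative standardization of raw client files with PySpark.
# Paths, columns, and the status mapping are placeholder assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("standardize-raw-files").getOrCreate()

# Raw files arrive in mixed formats; CSV is shown here for brevity.
raw = spark.read.csv("/data/raw/clients/", header=True, inferSchema=True)

# Small reference table mapping raw status codes to standardized values,
# analogous to a Value Mapper / Lookup step in Pentaho Kettle.
status_lookup = spark.createDataFrame(
    [("A", "ACTIVE"), ("I", "INACTIVE"), ("P", "PENDING")],
    ["status", "status_name"],
)

# Standardize formats: trim strings, normalize dates, map codes, drop bad rows.
standardized = (
    raw.withColumn("client_name", F.trim(F.col("client_name")))
       .withColumn("onboard_date", F.to_date(F.col("onboard_date"), "dd-MM-yyyy"))
       .join(status_lookup, "status", "left")
       .filter(F.col("client_id").isNotNull())
)

# Stage the cleaned data for loading into the warehouse.
standardized.write.mode("overwrite").parquet("/data/staging/clients_standardized/")
```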

Big Data Engineer

HSBC
Hyderabad, India
10.2015 - 01.2018
  • Company Overview: HSBC needed to enhance its fraud detection capabilities to stay ahead of evolving fraudulent schemes
  • Developed data ingestion pipelines using Informatica PowerCenter, an ETL tool, and bash scripting, together with big data technologies such as Hive, Impala, Spark, and Kafka
  • Developed ETL (Extract, Transform, Load) pipelines using Sqoop to transfer structured data from RDBMS sources into Hadoop Distributed File System (HDFS)
  • Gathered requirements such as life cycle, data quality check, transformations, and metadata enrichment, for ingestion of new data sources
  • Provided data engineering services to data scientists employing big data technologies, such as data exploration, ad-hoc ingestions, and subject-matter knowledge
  • Using PySpark and MLlib, created machine learning models to illustrate the analytical potential of big data (see the sketch at the end of this role)
  • Used Kafka to implement data streaming capabilities for various data sources
  • Utilized the Cloudera distribution to process and analyze large-scale datasets, harnessing the power of PySpark
  • Experienced in managing and optimizing diverse databases such as Hive, Impala, and Kudu, harnessing their distinct features for effective data storage and processing
  • Took part in the analysis and resolution of production issues across several scenarios
  • Developed UNIX scripts to process data files, design use case processes, and automate jobs
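
A minimal PySpark MLlib sketch of a fraud-classification model along the lines referenced above; the Hive table name, feature columns, and label are illustrative placeholders rather than actual project details.

```python
# Illustrative MLlib pipeline training a fraud classifier from a Hive table.
# Table name, features, and label below are placeholder assumptions.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("fraud-model").enableHiveSupport().getOrCreate()

# Transactions labeled by historical fraud investigations.
txns = spark.table("fraud_db.labeled_transactions")

# Assemble numeric features and fit a simple logistic regression baseline.
assembler = VectorAssembler(
    inputCols=["amount", "merchant_risk_score", "txn_hour", "velocity_24h"],
    outputCol="features",
)
lr = LogisticRegression(featuresCol="features", labelCol="is_fraud")

train, test = txns.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, lr]).fit(train)

# Evaluate with area under the ROC curve on the held-out split.
auc = BinaryClassificationEvaluator(labelCol="is_fraud").evaluate(model.transform(test))
print(f"Test AUC: {auc:.3f}")
```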

Education

Master of Science - Computer Science

University of Missouri - Kansas City
Kansas City, MO

Bachelor of Technology -

Vellore Institute of Technology
Vellore, TN

Skills

AWS: EC2, EMR, Kinesis, Lambda, IAM, Glue, S3, Athena, RDS, Redshift, Schema Conversion Tool, CloudFormation, CloudWatch, SNS

Azure: Data Factory, Stream Analytics, Data Lake Storage Gen2, Functions, Synapse Analytics, HDInsight, Logic Apps, Monitor, Cosmos DB

AI Services: Amazon SageMaker, AWS Fraud Detector, PyDeequ, Amazon Comprehend, Amazon Bedrock & AWS Textract

Visualization Tools: Power BI, Tableau, QuickSight

Databases: MySQL, PostgreSQL, Oracle, MS SQL Server, Cosmos DB, DynamoDB, MongoDB

Big Data Tools: HBase, Sqoop, Spark, Kafka, Scala, HDFS, Hive, MapReduce, YARN, Cassandra, Flume

Certification

AWS Certified Data Analytics - Specialty (DAS)

Personal Information

Title: Senior Cloud AI Data Engineer

Timeline

Senior AWS Data Engineer

Texas Health and Human Services Commission
12.2022 - Current

AWS Data Engineer

NRG
02.2022 - 12.2022

Azure Data Engineer

Flexera
12.2020 - 11.2021

SQL/ETL Developer

Persistent Systems Limited
02.2018 - 11.2020

Big Data Engineer

HSBC
10.2015 - 01.2018

Master of Science - Computer Science

University of Missouri - Kansas City

Bachelor of Technology -

Vellore Institute of Technology