Siva Teja

Atlanta, GA

Summary

With over 9 years of experience across roles including Senior Data Engineer, Azure Data Engineer, Big Data Engineer, and SQL/ETL Developer, specializes in advanced AWS services such as AWS Glue, Lambda, EMR, Databricks, and Redshift, as well as Azure services including Azure Data Factory and Azure Synapse Analytics. Proficient in Python and PySpark scripting for building robust, scalable solutions, with hands-on experience in real-time processing with Kafka to keep data flowing reliably. Skilled in data visualization and analytics, and in managing a variety of databases, including SQL Server, MySQL, and Cosmos DB, to ensure data integrity and availability. Extensive knowledge of big data tools such as Sqoop and Hive supports efficient data acquisition and transformation. Beyond technical skills, strong presentation abilities aid effective decision-making within cross-functional teams. A commitment to continuous learning, reflected in certifications covering cloud technologies and modern data paradigms, ensures staying abreast of the latest tools. This holistic approach to data engineering combines technical acuity with strategic insight, a valuable asset in any data-centric role.

Overview

10 years of professional experience
1 Certification

Work History

Senior AWS Data Engineer

Texas Health and Human Services Commission
Austin, TX
12.2022 - Current
  • Description: The Texas Health and Human Services Commission (HHSC) is committed to providing high-quality healthcare services to the residents of Texas. HHSC needed to streamline the collection and processing of Medicaid claims data to enhance healthcare utilization analysis, detect fraudulent activities, and support innovative care improvement initiatives. As a data engineer on this engagement, developed a data pipeline to collect and process Medicaid claims data.
  • Roles and responsibilities:
  • Designed a centralized AWS S3-based Medicaid claims data lake with AI-driven metadata classification, storing and managing petabyte-scale structured and unstructured healthcare data efficiently.
  • Developed real-time data ingestion pipelines using AWS Glue, Kinesis Data Streams, and DynamoDB, enabling fraud detection in Medicaid claims with 99% accuracy via Amazon SageMaker and AWS Fraud Detector.
  • Built a scalable Snowflake-based data warehouse with automated Snowpipe ingestion, optimizing multi-terabyte ML training datasets for AI-powered claims processing.
  • Implemented AI-driven data cleansing and deduplication using AWS Glue and PyDeequ, improving Medicaid claims data quality by 35%, and reducing errors by 50%.
  • Engineered predictive analytics models using Amazon Forecast, enabling 20% more accurate Medicaid expenditure predictions, and optimizing funding allocation by 15%.
  • Automated the configuration of AWS Glue Crawlers to discover, catalog, and transform structured data within Amazon S3, improving data accessibility and analysis through an enhanced centralized data catalog.
  • Implemented robust data governance practices, including data quality monitoring, metadata management, and compliance enforcement, ensuring data integrity.
  • Developed an AI-powered claim validation engine using Amazon Comprehend Medical, improving ICD-10 and CPT code accuracy by 60% and reducing incorrect claims.
  • Integrated Amazon Bedrock and AWS Textract to process unstructured Medicaid claims, extracting key insights with 95% precision, and automating document processing by 80%.
  • Processed multi-terabyte datasets using PySpark on AWS EMR, reducing Medicaid claims processing time by 40% and optimizing compute costs by 23% (see the sketch at the end of this role).
  • Optimized Athena queries using ML-driven optimization techniques, reducing query execution time by 30%, and the cost of Medicaid claims analysis by 25%.
  • Deployed real-time AI-powered fraud detection models using AWS Lambda and API Gateway, lowering fraudulent Medicaid claims by 40% and improving fraud detection response time by 70%.
  • Automated ML model deployment and performance monitoring using SageMaker Pipelines, AWS CodeDeploy, and Splunk AI-driven logs, ensuring 99.9% uptime for fraud detection models.
  • Integrated Amazon QuickSight with AI-powered analytics, enabling Medicaid administrators to detect abnormal spending patterns instantly and reducing manual auditing effort by 50%.
  • Integrated AWS CodeDeploy with repositories in GitHub to automate the build and deployment process.
  • Well-versed in Agile methodologies, with a specific focus on the Scrum framework, to optimize workflow, reduce lead times, and enhance team collaboration.
  • Growth: Gained a strong understanding of Medicaid fund utilization while contributing to reduced fraud and abuse, improved quality of care, and the development of new programs and services.
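
A minimal PySpark sketch of the kind of EMR claims-processing job referenced above; the bucket paths, column names, and aggregation logic are illustrative assumptions rather than actual project values.

```python
# Illustrative PySpark job for cleansing and aggregating Medicaid claims on EMR.
# All paths and column names below are placeholder assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medicaid-claims-processing").getOrCreate()

# Read raw claims from the S3 data lake (placeholder bucket).
claims = spark.read.parquet("s3://example-claims-lake/raw/medicaid/")

# Basic cleansing: drop exact duplicates and rows missing key identifiers.
cleaned = (
    claims.dropDuplicates(["claim_id"])
          .filter(F.col("member_id").isNotNull() & F.col("claim_amount").isNotNull())
)

# Aggregate monthly spend per provider for downstream fraud and utilization analysis.
monthly_spend = (
    cleaned.groupBy(
        "provider_id",
        F.date_trunc("month", F.col("service_date")).alias("month"),
    )
    .agg(
        F.sum("claim_amount").alias("total_claim_amount"),
        F.count("claim_id").alias("claim_count"),
    )
)

# Write curated output back to S3 as Parquet, partitioned by month.
monthly_spend.write.mode("overwrite").partitionBy("month").parquet(
    "s3://example-claims-lake/curated/monthly_provider_spend/"
)
```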

AWS Data Engineer

NRG
Houston, TX
02.2022 - 12.2022
  • Company Overview: NRG is an American energy company specializing in the generation and distribution of electricity and related services
  • Collaborated effectively with cross-functional team members to achieve project goals by fostering open communication, sharing insights, and actively participating in brainstorming sessions and decision-making processes
  • Designed and implemented a robust data pipeline to stream data from Apache Kafka into Amazon S3, enabling real-time data processing and analytics (see the sketch at the end of this role)
  • Utilized AWS Glue Crawler to crawl and catalog unstructured data from S3 and text files, enhancing analytics readiness
  • Successfully led a Proof of Concept (PoC) initiative to integrate Databricks with Amazon Redshift, showcasing the platform's seamless compatibility with existing data warehouse infrastructure
  • Utilized PySpark, Python, and Databricks notebooks to develop and execute data processing workflows, handling large volumes of structured and unstructured data with efficiency and scalability
  • Built an AWS data lake using Lake Formation to store and manage raw and processed data
  • Designed and implemented a robust data ingestion pipeline leveraging Amazon EMR and Apache Spark to load large volumes of OLAP data from Amazon Data Lake into Amazon Redshift
  • Executed advanced SQL queries in Redshift for data transformation, including aggregation, filtering, and table joins, optimizing data readiness for in-depth analysis
  • Implemented cost-effective strategies for managing EMR clusters and S3 storage costs, saving 17% of the budget through the use of features such as lifecycle policies
  • Developed a comprehensive data migration strategy for moving OLTP data from Data Lake to MySQL, ensuring data consistency
  • Designed and configured MySQL database to support ACID transactions effectively
  • Established and maintained data pipelines that connected data from Redshift and MySQL databases, performed necessary data transformations, aggregations, and cleansing before making the data available for Amazon QuickSight
  • Designed and created interactive and visually compelling data visualizations, reports, and dashboards using Amazon QuickSight for business intelligence and reporting
  • Utilized Amazon CloudWatch for resource monitoring and leveraged CloudWatch Logs for real-time log monitoring and troubleshooting
  • Utilized Amazon Simple Notification Service (SNS) to design and implement SNS topics, message publishing, and subscriber management to enable real-time notifications
  • Designed and implemented CI/CD pipelines tailored specifically for data workflows, automating the end-to-end process of data extraction, transformation, loading, and validation
  • Orchestrated automated deployments of data pipelines and ETL workflows using Jenkins
  • Worked in an Agile methodology and participated in daily Scrum meetings
  • Growth: Reduced energy costs, improved energy efficiency, reduced greenhouse gas emissions, and development of new energy products and services
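
A minimal Spark Structured Streaming sketch of the Kafka-to-S3 pipeline referenced above; the broker addresses, topic name, and bucket paths are illustrative placeholders, and the spark-sql-kafka connector is assumed to be available on the cluster.

```python
# Illustrative Structured Streaming job landing Kafka events in S3 for later
# cataloging by Glue crawlers. Brokers, topic, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-s3-stream").getOrCreate()

# Subscribe to the Kafka topic carrying raw events.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "energy-events")
    .option("startingOffsets", "latest")
    .load()
)

# Keep the message payload as plain text; schema is applied downstream.
payload = events.selectExpr("CAST(value AS STRING) AS value")

# Continuously write micro-batches to S3, with checkpointing for fault tolerance.
query = (
    payload.writeStream.format("text")
    .option("path", "s3://example-energy-lake/raw/events/")
    .option("checkpointLocation", "s3://example-energy-lake/checkpoints/events/")
    .start()
)
query.awaitTermination()
```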

Azure Data Engineer

Flexera
Bangalore, India
12.2020 - 11.2021
  • Company Overview: Square Up is a financial technology company known for its payment processing solutions and mobile point-of-sale systems
  • Managed source data stored in multiple formats, including JSON, CSV, SQL databases and text documents
  • Successfully ingested structured data from on-premises SQL databases and unstructured data using Azure Data Factory, ensuring data consistency and integrity
  • Successfully stored ingested data in Azure Blob Storage, utilizing scalable and cost-effective storage capabilities
  • Joined data from multiple sources and tables to create a unified dataset and performed aggregation tasks
  • Orchestrated complex workflows by chaining multiple activities together to create end-to-end data pipelines
  • Implemented data validation checks to ensure data quality and consistency throughout the pipeline
  • Migrated unstructured data to Cosmos DB to support ML tasks
  • Utilized Azure Data Factory to schedule and trigger HDInsight Spark activities to ingest data from various sources into Azure Blob Storage
  • Created data pipelines using Azure Data Factory to represent batch processing workflows
  • Performed batch data transformations such as filtering, aggregating, joining, and pivoting using HDInsight Spark (see the sketch at the end of this role)
  • Utilized T-SQL scripts to perform data transformations, including data cleansing and aggregation, and implemented data partitioning strategies within Azure Synapse Analytics
  • Performed advanced analytics using Python, R, and Spark within Azure Synapse Analytics
  • Utilized Azure Monitor, Azure Data Factory and Synapse Management Studio for monitoring and logging
  • Analyzed data within Power BI to represent business concepts and relationships
  • Worked in an Agile environment within the Scrum framework
  • Growth: Improved sales performance, reduced costs, improved customer experience, and increased revenue
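
A minimal PySpark sketch of the batch transformations referenced above (filter, join, aggregate, pivot); the Blob Storage paths, dataset names, and columns are illustrative placeholders.

```python
# Illustrative HDInsight Spark batch job: filter, join, aggregate, and pivot.
# Storage account, container, and column names are placeholder assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("batch-transformations").getOrCreate()

# Load ingested batch data from Blob Storage (placeholder WASB paths).
payments = spark.read.json("wasbs://raw@examplestorage.blob.core.windows.net/payments/")
merchants = spark.read.csv(
    "wasbs://raw@examplestorage.blob.core.windows.net/merchants/",
    header=True,
    inferSchema=True,
)

# Filter out incomplete payments, join to merchant reference data, and pivot
# monthly totals by sales channel into a unified dataset.
monthly_by_channel = (
    payments.filter(F.col("status") == "completed")
    .join(merchants, "merchant_id")
    .groupBy("merchant_id", F.date_format("paid_at", "yyyy-MM").alias("month"))
    .pivot("channel")
    .agg(F.sum("amount"))
)

# Persist the curated output for Synapse Analytics and Power BI consumption.
monthly_by_channel.write.mode("overwrite").parquet(
    "wasbs://curated@examplestorage.blob.core.windows.net/monthly_by_channel/"
)
```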

SQL/ETL Developer

Persistent Systems Limited
Pune, India
02.2018 - 11.2020
  • Company Overview: Persistent Systems Limited, a prominent player in the finance industry, provides financial services to a wide range of clients, including banks, investment firms, and insurance companies
  • Built strong relationships with SMEs and Training and Placement coordinators to better understand their problems and information needs
  • Created procedures for standardizing unprocessed data from various sources in a range of file formats, and assisted team members in cleaning and enhancing data quality (see the sketch at the end of this role)
  • Developed mappings in ETL applications such as Pentaho Kettle, using transformations like Value Mapper, Expression, Lookup, Aggregate, Update Strategy, and Filters to load data from numerous sources into the data warehouse
  • Implemented OLAP-to-OLTP conversions, data staging, data integration, and data cleansing operations
  • Developed, implemented, and maintained data mining, statistical, and visualization techniques using Python and PySpark scripts
  • Delivered data integration and design expertise that enabled accurate mapping using SQL and the integration of data from numerous reliable sources
  • Created reports in Excel and Pentaho BI Server that corresponded with the Training Officer's description of business-user needs
  • Created automated methods for refreshing the content of standardized dashboards and interactive data visuals
  • Created engaging, interactive Pentaho dashboards that deliver crucial business insights
  • Audited datasets to ensure data were continuous, accurate, and consistent
  • Prioritized requirements for data retrieval, visualization, and research and completed them quickly
  • Prepared technical documentation that concisely outlined data cleaning and analysis procedures and outcomes
  • Assisted with database application testing and production implementation
  • Managed big datasets, transferred and combined data files from many systems, aggregated data, and performed quality control using big data technologies like Hive, Impala, and Spark
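
A minimal PySpark sketch of the data-standardization step referenced above; the file paths, column names, and status lookup are illustrative placeholders, with the lookup join standing in for the Value Mapper-style transformation used in Pentaho Kettle.

```python
# Illustrative standardization of raw client files with PySpark.
# Paths, columns, and the status mapping are placeholder assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("standardize-raw-files").getOrCreate()

# Raw files arrive in mixed formats; CSV is shown here for brevity.
raw = spark.read.csv("/data/raw/clients/", header=True, inferSchema=True)

# Small reference table mapping raw status codes to standardized values,
# analogous to a Value Mapper / Lookup step in Pentaho Kettle.
status_lookup = spark.createDataFrame(
    [("A", "ACTIVE"), ("I", "INACTIVE"), ("P", "PENDING")],
    ["status", "status_name"],
)

# Standardize formats: trim strings, normalize dates, map codes, drop bad rows.
standardized = (
    raw.withColumn("client_name", F.trim(F.col("client_name")))
       .withColumn("onboard_date", F.to_date(F.col("onboard_date"), "dd-MM-yyyy"))
       .join(status_lookup, "status", "left")
       .filter(F.col("client_id").isNotNull())
)

# Stage the cleaned data for loading into the warehouse.
standardized.write.mode("overwrite").parquet("/data/staging/clients_standardized/")
```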

Big Data Engineer

HSBC
Hyderabad, India
10.2015 - 01.2018
  • Company Overview: HSBC needed to enhance its fraud detection capabilities to stay ahead of evolving fraudulent schemes
  • Developed data ingestion pipelines using Informatica PowerCenter, an ETL tool, and bash scripting, together with big data technologies such as Hive, Impala, Spark, and Kafka
  • Developed ETL (Extract, Transform, Load) pipelines using Sqoop to transfer structured data from RDBMS sources into Hadoop Distributed File System (HDFS)
  • Gathered requirements such as life cycle, data quality check, transformations, and metadata enrichment, for ingestion of new data sources
  • Provided data engineering services to data scientists employing big data technologies, such as data exploration, ad-hoc ingestions, and subject-matter knowledge
  • Using PySpark and MLlib, created machine learning models to illustrate the analytical potential of big data (see the sketch at the end of this role)
  • Used Kafka to implement data streaming capabilities for various data sources
  • Utilized the Cloudera distribution to process and analyze large-scale datasets, harnessing the power of PySpark
  • Experienced in managing and optimizing diverse databases such as Hive, Impala, and Kudu, harnessing their distinct features for effective data storage and processing
  • Took part in the analysis and resolution of production issues across several scenarios
  • Developed UNIX scripts to process data files, design use case processes, and automate jobs
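
A minimal PySpark MLlib sketch of a fraud-classification model along the lines referenced above; the Hive table name, feature columns, and label are illustrative placeholders rather than actual project details.

```python
# Illustrative MLlib pipeline training a fraud classifier from a Hive table.
# Table name, features, and label below are placeholder assumptions.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("fraud-model").enableHiveSupport().getOrCreate()

# Transactions labeled by historical fraud investigations.
txns = spark.table("fraud_db.labeled_transactions")

# Assemble numeric features and fit a simple logistic regression baseline.
assembler = VectorAssembler(
    inputCols=["amount", "merchant_risk_score", "txn_hour", "velocity_24h"],
    outputCol="features",
)
lr = LogisticRegression(featuresCol="features", labelCol="is_fraud")

train, test = txns.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, lr]).fit(train)

# Evaluate with area under the ROC curve on the held-out split.
auc = BinaryClassificationEvaluator(labelCol="is_fraud").evaluate(model.transform(test))
print(f"Test AUC: {auc:.3f}")
```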

Education

Master of Science - Computer Science

University of Missouri - Kansas City
Kansas City, MO

Bachelor of Technology -

Vellore Institute of Technology
Vellore, TN

Skills

AWS: EC2, EMR, Kinesis, Lambda, IAM, Glue, S3, Athena, RDS, Redshift, Schema Conversion Tool, CloudFormation, CloudWatch, SNS

Azure: Data Factory, Stream Analytics, Data Lake Storage Gen2, Functions, Synapse Analytics, HDInsight, Logic Apps, Monitor, Cosmos DB

AI Services: Amazon SageMaker, AWS Fraud Detector, PyDeequ, Amazon Comprehend, Amazon Bedrock & AWS Textract

Visualization Tools: Power BI, Tableau, QuickSight

Databases: MySQL, PostgreSQL, Oracle, MS SQL Server, Cosmos DB, DynamoDB, MongoDB

Big Data Tools: HBase, Sqoop, Spark, Kafka, Scala, HDFS, Hive, MapReduce, YARN, Cassandra, Flume

Certification

AWS Certified Data Analytics - Specialty (DAS)

Personal Information

Title: Senior Cloud AI Data Engineer

Timeline

Senior AWS Data Engineer

Texas Health and Human Services Commission
12.2022 - Current

AWS Data Engineer

NRG
02.2022 - 12.2022

Azure Data Engineer

Flexera
12.2020 - 11.2021

SQL/ETL Developer

Persistent Systems Limited
02.2018 - 11.2020

Big Data Engineer

HSBC
10.2015 - 01.2018

Master of Science - Computer Science

University of Missouri - Kansas City

Bachelor of Technology -

Vellore Institute of Technology