Tarun Teja Pasupuleti

Frisco, TX

Summary

Experienced Data Engineer focused on designing, developing, and maintaining highly scalable, secure, and reliable data infrastructure. Works closely with system architects, software architects, and design analysts to translate business and industry requirements into comprehensive data models. Proficient at developing database architecture strategies across the modeling, design, and implementation stages. Applies advanced SQL and Python skills to create and maintain robust data architectures, with a track record of delivering scalable solutions that strengthen data integrity and support informed decision-making.

Overview

11 years of professional experience

Work History

Senior GCP Data Engineer

State Street
12.2023 - Current
  • Worked with product teams to create store-level metrics and the supporting data pipelines built on GCP's big data stack
  • Worked with app teams to collect information from Google Analytics 360 and built data marts in BigQuery for analytical reporting for the sales and product teams
  • Experience with GCP Dataproc, Dataflow, Pub/Sub, GCS, Cloud Functions, BigQuery, Stackdriver, Cloud Logging, IAM, and Data Studio for reporting
  • Developed automated ETL processes using Teradata SQL and Dataflow, ensuring efficient data extraction, transformation, and loading
  • Configured and managed Apigee API proxies to handle traffic routing, security policies, and rate limiting, enhancing the reliability and performance of .NET-based APIs deployed on GCP
  • Expertise in data migration projects from on-premises databases to AlloyDB on GCP, ensuring data integrity and minimal downtime
  • Integrated RAG (Retrieval-Augmented Generation) systems with LLMs for enhanced knowledge-based generation, combining real-time data retrieval with generative capabilities
  • Deployed PyTorch-based models in production environments using Azure ML and GCP Vertex AI, ensuring scalability and efficiency
  • Loaded data on an incremental basis into the BigQuery raw layer using Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts (see the pipeline sketch after the tool list below)


Tools Used: Hadoop, Scala, Spark, Hive, Sqoop, ADF, Databricks, HBase, Kafka, YAML, Apache Flume, Ambari, MS SQL, MySQL, Snowflake, MongoDB, Cassandra, Git, Data Storage Explorer, SAS, Java, Python, GCP, GCS, GKE, Teradata, Apache Drill, HDFS, ETL, Flink
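
A minimal sketch of the kind of incremental GCS-to-BigQuery load mentioned in the bullets above, written with the Apache Beam Python SDK (the SDK behind Dataflow). The project, bucket, dataset, table, and field names are illustrative placeholders, not the actual production pipeline.

```python
# Minimal sketch (assumed names) of an incremental GCS -> BigQuery load
# in the style of the Dataflow pipelines described above.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_record(line: str) -> dict:
    """Parse one newline-delimited JSON record from a daily extract."""
    record = json.loads(line)
    return {
        "store_id": record.get("store_id"),          # placeholder fields
        "metric_name": record.get("metric_name"),
        "metric_value": record.get("metric_value"),
        "event_date": record.get("event_date"),
    }


def run() -> None:
    options = PipelineOptions(
        runner="DataflowRunner",              # use "DirectRunner" to test locally
        project="my-gcp-project",             # placeholder project id
        region="us-central1",
        temp_location="gs://my-temp-bucket/tmp",
    )
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadDailyExtract" >> beam.io.ReadFromText("gs://my-raw-bucket/daily/*.json")
            | "ParseJson" >> beam.Map(parse_record)
            | "AppendToRawTable" >> beam.io.WriteToBigQuery(
                "my-gcp-project:raw_layer.store_metrics",   # table assumed to exist
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()
```

Running the same transform logic locally with the DirectRunner before submitting to Dataflow is a common way to validate the parsing step.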

Senior GCP Data Engineer

PayPal
03.2023 - 11.2023


  • Developed scalable data pipelines using GCP technologies, leveraging tools like Apache Beam or Cloud Dataflow for data ingestion, processing, and transformation
  • Implemented PyTorch Lightning for scalable and modular training pipelines, reducing development time for machine learning experiments (see the training-loop sketch after the tool list below)
  • Designed and implemented end-to-end MLOps pipelines on Google Cloud Platform (GCP) to automate machine learning workflows, ensuring continuous integration and deployment (CI/CD) for ML models
  • Developed and deployed predictive models for credit card fraud detection, leveraging ensemble learning and advanced regression techniques to ensure high model accuracy and reliability
  • Utilized GCP services such as AI Platform, Cloud Functions, and Cloud Build to orchestrate and manage ML pipelines efficiently
  • Designed and implemented Java-based solutions for handling and managing data stored in GCP Cloud Storage, including data partitioning, versioning, and lifecycle management
  • Used LLM-based techniques for data extraction, knowledge retrieval, and natural language understanding (NLU) tasks in high-demand business operations
  • Built and automated ETL (Extract, Transform, Load) processes in Java, using GCP services like Cloud Data Fusion and Dataflow, to clean, enrich, and load data into target systems like BigQuery
  • Utilized big data tools for MLOps on GCP, including BigQuery and Dataproc to streamline data lakes, and AutoML to automate the model-building process
  • Tuned and optimized Power BI reports and dashboards for performance and scalability, ensuring efficient and effective data visualization and analysis


Tools Used: AWS (EC2, S3, EMR, RDS, Glue, Athena, CLI, Lambda, Kinesis, Redshift, CloudFormation, CloudWatch), Ansible, Flink, Ant, Maven, Jenkins CI/CD, Spark, Scala, Hive, Sqoop, HDFS, MongoDB, OLAP, Power BI, Kafka, Hadoop, Splunk, Bitbucket, Git, JIRA, Java, Python, SSH, Shell Scripting, Snowflake, Informatica, Talend, Docker, JSON, PySpark, Kubernetes, Linux, Kibana
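
A minimal PyTorch Lightning sketch of the modular training-pipeline pattern referenced above, using a toy binary classifier over synthetic tabular features as a stand-in for the fraud-detection models; the architecture, data, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch of a PyTorch Lightning training loop; the model and data
# are placeholders, not the production fraud-detection pipeline.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class FraudClassifier(pl.LightningModule):
    """Small binary classifier over tabular transaction features."""

    def __init__(self, num_features: int = 16, lr: float = 1e-3):
        super().__init__()
        self.lr = lr
        self.net = nn.Sequential(
            nn.Linear(num_features, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss_fn(self(x).squeeze(-1), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)


if __name__ == "__main__":
    # Synthetic stand-in data; in practice features come from upstream pipelines.
    x = torch.randn(1024, 16)
    y = (torch.rand(1024) > 0.5).float()
    loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

    trainer = pl.Trainer(max_epochs=2, accelerator="auto")
    trainer.fit(FraudClassifier(), train_dataloaders=loader)
```

Keeping the model, optimizer, and logging inside the LightningModule is what makes the training step reusable across local runs and managed jobs such as Vertex AI custom training.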

Data Engineer

Cardinal Health
09.2020 - 02.2023
  • Developed and implemented data engineering solutions to analyze healthcare data, including electronic health records (EHR), claims data, and medical research
  • Designed and implemented LookML models for healthcare-specific datasets, including patient records, medication inventory, and clinical data stored in GCP BigQuery, enabling streamlined reporting and analytics for key stakeholders (see the reporting-query sketch after the tool list below)
  • Collaborated with cross-functional teams, including clinical researchers and data scientists, to identify data requirements and develop data models for healthcare analytics projects
  • Developed and managed FHIR (Fast Healthcare Interoperability Resources) servers using Firely and Azure FHIR, ensuring secure and compliant healthcare data exchange
  • Implemented FHIR data storage solutions, ensuring compatibility with healthcare standards like HL7 and FHIR for seamless integration with clinical systems
  • Collaborated with healthcare clients to set up Azure FHIR services, optimizing the flow of medical data across various healthcare platforms
  • Developed generative AI models for real-time content generation through Vertex AI’s low-latency serving capabilities, ensuring scalable and performant responses for customer-facing applications
  • Implemented custom evaluation metrics and real-time monitoring of LLM performance using Vertex AI’s built-in tools, allowing continuous feedback and model improvements
  • Integrated TensorFlow models with Google Cloud Dataflow for large-scale distributed training, optimizing compute and storage costs


Tools Used: Cloudera CDH 4.3, Hadoop, AWS, Java, R, Pig, Hive, Informatica, HBase, Kafka, Tableau, Azure Data Storage, MapReduce, HDFS, Python, SQL, Sqoop, Spark, DataMart, Git, Teradata, DataStage
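
A hedged sketch of the kind of BigQuery-backed reporting query sitting behind the LookML/analytics bullets above, using the google-cloud-bigquery Python client; the project, dataset, table, and column names are hypothetical, not the real healthcare schema.

```python
# Minimal reporting-aggregate sketch against a hypothetical BigQuery
# healthcare dataset; names are placeholders.
from google.cloud import bigquery


def medication_inventory_summary(project_id: str = "my-gcp-project") -> None:
    client = bigquery.Client(project=project_id)

    query = """
        SELECT
          medication_name,
          SUM(quantity_on_hand) AS total_on_hand
        FROM `my-gcp-project.clinical_mart.medication_inventory`
        GROUP BY medication_name
        ORDER BY total_on_hand DESC
        LIMIT 20
    """

    # client.query() starts the job; result() waits for and iterates the rows.
    for row in client.query(query).result():
        print(f"{row.medication_name}: {row.total_on_hand}")


if __name__ == "__main__":
    medication_inventory_summary()
```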

Data Engineer

Cigna, Health Insurance
11.2017 - 08.2020
  • Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.
  • Conducted performance tuning and optimization of GCP services and infrastructure to improve data processing and analysis
  • Implemented GCP-based machine learning solutions, leveraging tools like Google Cloud ML Engine or AutoML for predictive analytics and data-driven insights
  • Streamlined complex workflows by breaking them down into manageable components for easier implementation and maintenance.
  • Actively kept up-to-date with the latest GCP features, enhancements, and best practices, and applied them to drive innovation and continuous improvement in data engineering processes
  • Implemented Microservices architecture using .NET, enabling modular and scalable API development that integrates seamlessly with GCP components like Cloud Run and Kubernetes Engine
  • Used GCP tools to verify and safely store incoming patient data while setting up ETL procedures for HL7 message integration, ensuring all industry standards were met (see the validation-and-storage sketch after the tool list below)
  • Implemented monitoring solutions on GCP to track the performance and integrity of data exchanges utilizing HL7 and ADT protocols
  • Created and managed API gateways using Apigee, streamlining the deployment of .NET services and ensuring consistent access control and traffic management
  • Optimized performance and resource utilization in GCP-based Big Data deployments, including fine-tuning query performance in BigQuery and optimizing cluster configurations in Dataproc
  • Conducted troubleshooting and debugging of issues related to data pipelines, performance bottlenecks, and system failures in GCP environments
  • In-depth understanding of FHIR protocols and standards, specifically leveraging Firely and Azure FHIR for healthcare data management
  • Skilled in configuring and scaling Azure FHIR services to meet the regulatory requirements of healthcare applications, ensuring compliance with HIPAA and other standards
  • Extensive experience with ETL tools such as IBM DataStage and Informatica IICS for efficient data integration, transformation, and loading
  • Skilled in deploying and managing Google Cloud services using Terraform, ensuring seamless scalability and reliability


Tools Used: Python, Pandas, Shell, Hadoop, Sqoop, MapReduce, SQL, Teradata, Snowflake, Hive, Pig, Azure, Databricks, Kafka, Azure Data Factory, Glue, HBase, Apache, Eclipse, Airflow, Informatica
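
A minimal sketch, under stated assumptions, of the "verify and safely store incoming patient data" step referenced above: it assumes the HL7/ADT messages have already been parsed into Python dicts upstream, and the bucket name and field names are placeholders.

```python
# Validate-then-store sketch for incoming patient records; bucket and
# field names are illustrative, and upstream HL7 parsing is assumed.
import json
from datetime import datetime, timezone

from google.cloud import storage

REQUIRED_FIELDS = ("patient_id", "message_type", "event_timestamp")


def validate_record(record: dict) -> None:
    """Raise ValueError if a required field is missing or empty."""
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if missing:
        raise ValueError(f"Record rejected, missing fields: {missing}")


def store_record(record: dict, bucket_name: str = "my-patient-raw-bucket") -> str:
    """Write one validated record to GCS as a timestamped JSON object."""
    validate_record(record)
    client = storage.Client()
    object_name = (
        f"adt/{record['patient_id']}/"
        f"{datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%S')}.json"
    )
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_string(json.dumps(record), content_type="application/json")
    return object_name


if __name__ == "__main__":
    sample = {
        "patient_id": "12345",
        "message_type": "ADT^A01",
        "event_timestamp": "2020-01-15T10:30:00Z",
    }
    print(store_record(sample))
```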

ETL Developer

AT&T
06.2014 - 10.2017
  • Collaborated with supply chain teams to develop forecasting models, enabling accurate demand planning and optimizing inventory levels
  • Led customer segmentation analysis projects by leveraging customer data and machine learning techniques
  • Developed data models and implemented data pipelines to enable effective customer segmentation for targeted marketing campaigns and personalized offerings
  • Extensive experience in writing Teradata scripts using BTEQ, MultiLoad, FastLoad, and FastExport
  • Ensured data quality and accuracy by implementing data cleansing and validation processes, maintaining high data integrity for analysis and decision-making
  • Collaborated with data governance teams to establish data standards, policies, and access controls, ensuring compliance and data security
  • Utilized big data technologies, such as Apache Hadoop and Spark, to process and analyze large datasets efficiently
  • Documented data engineering processes, data models, and system configurations, facilitating knowledge sharing and ensuring a robust technical foundation
  • Engineered an ETL service to monitor file updates on the server and streamline their transfer into the Kafka queue, improving data flow and responsiveness (see the file-watcher sketch after the tool list below)
  • Leveraged SQL*Loader extensively to import data from flat files directly into Oracle database tables, ensuring fast and accurate data availability
  • Developed custom reports for business stakeholders, providing valuable insights into key performance metrics.
  • Collaborated with business intelligence staff at customer facilities to produce customized ETL solutions for specific goals.
  • Designed integration tools to combine data from multiple, varied data sources such as RDBMS, SQL and big data installations.
  • Documented technical specifications and designs, facilitating knowledge sharing among team members and supporting future development efforts.


Tools Used: Python, Pandas, Matplotlib, Scikit-learn, SciPy, Machine Learning, K-Means, Tableau, Hadoop, ETL, SQL, Oracle, Agile
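
A simplified sketch of the file-watcher-to-Kafka ETL service described above, using the kafka-python client and a basic polling loop; the watch directory, topic name, and broker address are illustrative placeholders.

```python
# Poll a directory and publish new or modified files to a Kafka topic;
# paths, topic, and broker address are placeholders.
import os
import time

from kafka import KafkaProducer  # kafka-python

WATCH_DIR = "/data/incoming"
TOPIC = "file-updates"


def poll_for_updates(producer: KafkaProducer, seen: dict, interval: int = 10) -> None:
    """Publish any file whose modification time changed since the last poll."""
    while True:
        for name in os.listdir(WATCH_DIR):
            path = os.path.join(WATCH_DIR, name)
            if not os.path.isfile(path):
                continue
            mtime = os.path.getmtime(path)
            if seen.get(path) != mtime:
                seen[path] = mtime
                with open(path, "rb") as fh:
                    producer.send(TOPIC, key=name.encode(), value=fh.read())
        producer.flush()
        time.sleep(interval)


if __name__ == "__main__":
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    poll_for_updates(producer, seen={})
```

A production version would track offsets or checksums rather than modification times alone, but the polling-and-publish shape is the same.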

Education

Bachelor of Science - Computer Science

Raghu Engineering College
Visakhapatnam, India
06.2014

Skills

  • Data modeling
  • Database management
  • SQL proficiency
  • Big data technologies
  • ETL processes
  • Data warehousing
  • Data analysis
  • Data integration
  • Data architecture
  • Data pipelines
  • Cloud computing
  • Data visualization
  • Python programming
  • Machine learning
  • AWS
  • Azure

Timeline

Senior GCP Data Engineer

State Street
12.2023 - Current

Senior GCP Data Engineer

PayPal
03.2023 - 11.2023

Data Engineer

Cardinal Health
09.2020 - 02.2023

Data Engineer

Cigna, Health Insurance
11.2017 - 08.2020

ETL Developer

AT&T
06.2014 - 10.2017

Bachelor of Science - Computer Science

Raghu Engineering College