Tarun Teja Pasupuleti

Frisco, TX

Summary

Experienced Data Engineer focused on designing, developing, and maintaining highly scalable, secure, and reliable data infrastructure. Works closely with system architects, software architects, and design analysts to translate business and industry requirements into comprehensive data models. Proficient at developing database architecture strategies across the modeling, design, and implementation stages. Applies advanced SQL and Python skills to create and maintain robust data architectures, with a track record of delivering scalable solutions that strengthen data integrity and support informed decision-making.

Overview

11 years of professional experience

Work History

Senior GCP Data Engineer

State Street
12.2023 - Current
  • Worked with product teams to create store-level metrics and the supporting data pipelines built on GCP's big data stack
  • Worked with app teams to collect information from Google Analytics 360 and built data marts in BigQuery for analytical reporting for the sales and product teams
  • Experience with GCP Dataproc, Dataflow, Pub/Sub, GCS, Cloud Functions, BigQuery, Stackdriver, Cloud Logging, IAM, and Data Studio for reporting
  • Developed automated ETL processes using Teradata SQL and Dataflow, ensuring efficient data extraction, transformation, and loading
  • Configured and managed Apigee API proxies to handle traffic routing, security policies, and rate limiting, enhancing the reliability and performance of .NET-based APIs deployed on GCP
  • Expertise in data migration projects from on-premises databases to AlloyDB on GCP, ensuring data integrity and minimal downtime
  • Integrated RAG (Retrieval-Augmented Generation) systems with LLMs for enhanced knowledge-based generation, combining real-time data retrieval with generative capabilities
  • Deployed PyTorch-based models in production environments using Azure ML and GCP Vertex AI, ensuring scalability and efficiency
  • Loaded data on an incremental basis into the BigQuery raw layer using Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts (see the pipeline sketch after the tool list below)


Tools Used: Hadoop, Scala, Spark, Hive, Sqoop, ADF, Databricks, HBase, Kafka, YAML, Apache Flume, Ambari, MS SQL, MySQL, Snowflake, MongoDB, Cassandra, Git, Data Storage Explorer, SAS, Java, Python, GCP, GCS, GKE, Teradata, Apache Drill, HDFS, ETL, Flink
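
A minimal sketch of the kind of incremental GCS-to-BigQuery load mentioned in the bullets above, written with the Apache Beam Python SDK (the SDK behind Dataflow). The project, bucket, dataset, table, and field names are illustrative placeholders, not the actual production pipeline.

```python
# Minimal sketch (assumed names) of an incremental GCS -> BigQuery load
# in the style of the Dataflow pipelines described above.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_record(line: str) -> dict:
    """Parse one newline-delimited JSON record from a daily extract."""
    record = json.loads(line)
    return {
        "store_id": record.get("store_id"),          # placeholder fields
        "metric_name": record.get("metric_name"),
        "metric_value": record.get("metric_value"),
        "event_date": record.get("event_date"),
    }


def run() -> None:
    options = PipelineOptions(
        runner="DataflowRunner",              # use "DirectRunner" to test locally
        project="my-gcp-project",             # placeholder project id
        region="us-central1",
        temp_location="gs://my-temp-bucket/tmp",
    )
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadDailyExtract" >> beam.io.ReadFromText("gs://my-raw-bucket/daily/*.json")
            | "ParseJson" >> beam.Map(parse_record)
            | "AppendToRawTable" >> beam.io.WriteToBigQuery(
                "my-gcp-project:raw_layer.store_metrics",   # table assumed to exist
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()
```

Running the same transform logic locally with the DirectRunner before submitting to Dataflow is a common way to validate the parsing step.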

Senior GCP Data Engineer

PayPal
03.2023 - 11.2023


  • Developed scalable data pipelines using GCP technologies, leveraging tools like Apache Beam or Cloud Dataflow for data ingestion, processing, and transformation
  • Implemented PyTorch Lightning for scalable and modular training pipelines, reducing development time for machine learning experiments (see the training-loop sketch after the tool list below)
  • Designed and implemented end-to-end MLOps pipelines on Google Cloud Platform (GCP) to automate machine learning workflows, ensuring continuous integration and deployment (CI/CD) for ML models
  • Developed and deployed predictive models for credit card fraud detection, leveraging ensemble learning and advanced regression techniques to ensure high model accuracy and reliability
  • Utilized GCP services such as AI Platform, Cloud Functions, and Cloud Build to orchestrate and manage ML pipelines efficiently
  • Designed and implemented Java-based solutions for handling and managing data stored in GCP Cloud Storage, including data partitioning, versioning, and lifecycle management
  • Used LLM-based techniques for data extraction, knowledge retrieval, and natural language understanding (NLU) tasks in high-demand business operations
  • Built and automated ETL (Extract, Transform, Load) processes in Java, using GCP services like Cloud Data Fusion and Dataflow, to clean, enrich, and load data into target systems like BigQuery
  • Utilized big data tools for MLOps on GCP, including BigQuery and Dataproc to streamline data lakes, and AutoML to automate the model-building process
  • Tuned and optimized Power BI reports and dashboards for performance and scalability, ensuring efficient and effective data visualization and analysis


Tools Used: AWS (EC2, S3, EMR, RDS, Glue, Athena, CLI, Lambda, Kinesis, Redshift, CloudFormation, CloudWatch), Ansible, Flink, Ant, Maven, Jenkins CI/CD, Spark, Scala, Hive, Sqoop, HDFS, MongoDB, OLAP, Power BI, Kafka, Hadoop, Splunk, Bitbucket, Git, JIRA, Java, Python, SSH, Shell Scripting, Snowflake, Informatica, Talend, Docker, JSON, PySpark, Kubernetes, Linux, Kibana
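
A minimal PyTorch Lightning sketch of the modular training-pipeline pattern referenced above, using a toy binary classifier over synthetic tabular features as a stand-in for the fraud-detection models; the architecture, data, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch of a PyTorch Lightning training loop; the model and data
# are placeholders, not the production fraud-detection pipeline.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class FraudClassifier(pl.LightningModule):
    """Small binary classifier over tabular transaction features."""

    def __init__(self, num_features: int = 16, lr: float = 1e-3):
        super().__init__()
        self.lr = lr
        self.net = nn.Sequential(
            nn.Linear(num_features, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss_fn(self(x).squeeze(-1), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)


if __name__ == "__main__":
    # Synthetic stand-in data; in practice features come from upstream pipelines.
    x = torch.randn(1024, 16)
    y = (torch.rand(1024) > 0.5).float()
    loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

    trainer = pl.Trainer(max_epochs=2, accelerator="auto")
    trainer.fit(FraudClassifier(), train_dataloaders=loader)
```

Keeping the model, optimizer, and logging inside the LightningModule is what makes the training step reusable across local runs and managed jobs such as Vertex AI custom training.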

Data Engineer

Cardinal Health
09.2020 - 02.2023
  • Developed and implemented data engineering solutions to analyze healthcare data, including electronic health records (EHR), claims data, and medical research
  • Designed and implemented LookML models for healthcare-specific datasets, including patient records, medication inventory, and clinical data stored in GCP BigQuery, enabling streamlined reporting and analytics for key stakeholders (see the reporting-query sketch after the tool list below)
  • Collaborated with cross-functional teams, including clinical researchers and data scientists, to identify data requirements and develop data models for healthcare analytics projects
  • Developed and managed FHIR (Fast Healthcare Interoperability Resources) servers using Firely and Azure FHIR, ensuring secure and compliant healthcare data exchange
  • Implemented FHIR data storage solutions, ensuring compatibility with healthcare standards like HL7 and FHIR for seamless integration with clinical systems
  • Collaborated with healthcare clients to set up Azure FHIR services, optimizing the flow of medical data across various healthcare platforms
  • Developed generative AI models for real-time content generation through Vertex AI’s low-latency serving capabilities, ensuring scalable and performant responses for customer-facing applications
  • Implemented custom evaluation metrics and real-time monitoring of LLM performance using Vertex AI’s built-in tools, allowing continuous feedback and model improvements
  • Integrated TensorFlow models with Google Cloud Dataflow for large-scale distributed training, optimizing compute and storage costs


Tools Used: Cloudera CDH 4.3, Hadoop, AWS, Java, R, Pig, Hive, Informatica, HBase, Kafka, Tableau, Azure Data Storage, MapReduce, HDFS, Python, SQL, Sqoop, Spark, DataMart, Git, Teradata, DataStage
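
A hedged sketch of the kind of BigQuery-backed reporting query sitting behind the LookML/analytics bullets above, using the google-cloud-bigquery Python client; the project, dataset, table, and column names are hypothetical, not the real healthcare schema.

```python
# Minimal reporting-aggregate sketch against a hypothetical BigQuery
# healthcare dataset; names are placeholders.
from google.cloud import bigquery


def medication_inventory_summary(project_id: str = "my-gcp-project") -> None:
    client = bigquery.Client(project=project_id)

    query = """
        SELECT
          medication_name,
          SUM(quantity_on_hand) AS total_on_hand
        FROM `my-gcp-project.clinical_mart.medication_inventory`
        GROUP BY medication_name
        ORDER BY total_on_hand DESC
        LIMIT 20
    """

    # client.query() starts the job; result() waits for and iterates the rows.
    for row in client.query(query).result():
        print(f"{row.medication_name}: {row.total_on_hand}")


if __name__ == "__main__":
    medication_inventory_summary()
```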

Data Engineer

Cigna, Health Insurance
11.2017 - 08.2020
  • Enhanced data quality by performing thorough cleaning, validation, and transformation tasks.
  • Conducted performance tuning and optimization of GCP services and infrastructure to improve data processing and analysis
  • Implemented GCP-based machine learning solutions, leveraging tools like Google Cloud ML Engine or AutoML for predictive analytics and data-driven insights
  • Streamlined complex workflows by breaking them down into manageable components for easier implementation and maintenance.
  • Actively kept up-to-date with the latest GCP features, enhancements, and best practices, and applied them to drive innovation and continuous improvement in data engineering processes
  • Implemented Microservices architecture using .NET, enabling modular and scalable API development that integrates seamlessly with GCP components like Cloud Run and Kubernetes Engine
  • Used GCP tools to verify and safely store incoming patient data while setting up ETL procedures for HL7 message integration, ensuring all industry standards were met (see the validation-and-storage sketch after the tool list below)
  • Implemented monitoring solutions on GCP to track the performance and integrity of data exchanges utilizing HL7 and ADT protocols
  • Created and managed API gateways using Apigee, streamlining the deployment of .NET services and ensuring consistent access control and traffic management
  • Optimized performance and resource utilization in GCP-based Big Data deployments, including fine-tuning query performance in BigQuery and optimizing cluster configurations in Dataproc
  • Conducted troubleshooting and debugging of issues related to data pipelines, performance bottlenecks, and system failures in GCP environments
  • In-depth understanding of FHIR protocols and standards, specifically leveraging Firely and Azure FHIR for healthcare data management
  • Skilled in configuring and scaling Azure FHIR services to meet the regulatory requirements of healthcare applications, ensuring compliance with HIPAA and other standards
  • Extensive experience with ETL tools such as IBM DataStage and Informatica IICS for efficient data integration, transformation, and loading
  • Skilled in deploying and managing Google Cloud services using Terraform, ensuring seamless scalability and reliability


Tools Used: Python, Pandas, Shell, Hadoop, Sqoop, MapReduce, SQL, Teradata, Snowflake, Hive, Pig, Azure, Databricks, Kafka, Azure Data Factory, Glue, HBase, Apache, Eclipse, Airflow, Informatica
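
A minimal sketch, under stated assumptions, of the "verify and safely store incoming patient data" step referenced above: it assumes the HL7/ADT messages have already been parsed into Python dicts upstream, and the bucket name and field names are placeholders.

```python
# Validate-then-store sketch for incoming patient records; bucket and
# field names are illustrative, and upstream HL7 parsing is assumed.
import json
from datetime import datetime, timezone

from google.cloud import storage

REQUIRED_FIELDS = ("patient_id", "message_type", "event_timestamp")


def validate_record(record: dict) -> None:
    """Raise ValueError if a required field is missing or empty."""
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if missing:
        raise ValueError(f"Record rejected, missing fields: {missing}")


def store_record(record: dict, bucket_name: str = "my-patient-raw-bucket") -> str:
    """Write one validated record to GCS as a timestamped JSON object."""
    validate_record(record)
    client = storage.Client()
    object_name = (
        f"adt/{record['patient_id']}/"
        f"{datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%S')}.json"
    )
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_string(json.dumps(record), content_type="application/json")
    return object_name


if __name__ == "__main__":
    sample = {
        "patient_id": "12345",
        "message_type": "ADT^A01",
        "event_timestamp": "2020-01-15T10:30:00Z",
    }
    print(store_record(sample))
```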

ETL Developer

AT&T
06.2014 - 10.2017
  • Collaborated with supply chain teams to develop forecasting models, enabling accurate demand planning and optimizing inventory levels
  • Led customer segmentation analysis projects by leveraging customer data and machine learning techniques
  • Developed data models and implemented data pipelines to enable effective customer segmentation for targeted marketing campaigns and personalized offerings
  • Extensive experience in writing Teradata scripts using BTEQ, MultiLoad, FastLoad, and FastExport
  • Ensured data quality and accuracy by implementing data cleansing and validation processes, maintaining high data integrity for analysis and decision-making
  • Collaborated with data governance teams to establish data standards, policies, and access controls, ensuring compliance and data security
  • Utilized big data technologies, such as Apache Hadoop and Spark, to process and analyze large datasets efficiently
  • Documented data engineering processes, data models, and system configurations, facilitating knowledge sharing and ensuring a robust technical foundation
  • Engineered an ETL service to monitor file updates on the server and streamline their transfer into the Kafka queue, improving data flow and responsiveness (see the file-watcher sketch after the tool list below)
  • Leveraged SQL*Loader extensively to import data from flat files directly into Oracle database tables, ensuring fast and accurate data availability
  • Developed custom reports for business stakeholders, providing valuable insights into key performance metrics.
  • Collaborated with business intelligence staff at customer facilities to produce customized ETL solutions for specific goals.
  • Designed integration tools to combine data from multiple, varied data sources such as RDBMS, SQL and big data installations.
  • Documented technical specifications and designs, facilitating knowledge sharing among team members and supporting future development efforts.


Tools Used: Python, Pandas, Matplotlib, Scikit-learn, SciPy, Machine Learning, K-Means, Tableau, Hadoop, ETL, SQL, Oracle, Agile
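
A simplified sketch of the file-watcher-to-Kafka ETL service described above, using the kafka-python client and a basic polling loop; the watch directory, topic name, and broker address are illustrative placeholders.

```python
# Poll a directory and publish new or modified files to a Kafka topic;
# paths, topic, and broker address are placeholders.
import os
import time

from kafka import KafkaProducer  # kafka-python

WATCH_DIR = "/data/incoming"
TOPIC = "file-updates"


def poll_for_updates(producer: KafkaProducer, seen: dict, interval: int = 10) -> None:
    """Publish any file whose modification time changed since the last poll."""
    while True:
        for name in os.listdir(WATCH_DIR):
            path = os.path.join(WATCH_DIR, name)
            if not os.path.isfile(path):
                continue
            mtime = os.path.getmtime(path)
            if seen.get(path) != mtime:
                seen[path] = mtime
                with open(path, "rb") as fh:
                    producer.send(TOPIC, key=name.encode(), value=fh.read())
        producer.flush()
        time.sleep(interval)


if __name__ == "__main__":
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    poll_for_updates(producer, seen={})
```

A production version would track offsets or checksums rather than modification times alone, but the polling-and-publish shape is the same.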

Education

Bachelor of Science - Computer Science

Raghu Engineering College
Visakhapatnam, India
06.2014

Skills

  • Data modeling
  • Database management
  • SQL proficiency
  • Big data technologies
  • ETL processes
  • Data warehousing
  • Data analysis
  • Data integration
  • Data architecture
  • Data pipelines
  • Cloud computing
  • Data visualization
  • Python programming
  • Machine learning
  • AWS
  • Azure

Timeline

Senior GCP Data Engineer

State Street
12.2023 - Current

Senior GCP Data Engineer

PayPal
03.2023 - 11.2023

Data Engineer

Cardinal Health
09.2020 - 02.2023

Data Engineer

Cigna, Health Insurance
11.2017 - 08.2020

ETL Developer

AT&T
06.2014 - 10.2017

Bachelor of Science - Computer Science

Raghu Engineering College