Company Overview: Client - Aetna
- Spearheaded a large-scale GCP migration, transitioning the company's data infrastructure to the cloud
- Migrated GCP tenants from shared compute projects to dedicated projects, enabling per-tenant tracking of expenses and utilization and more accurate business budget estimation
- Refactored existing on-premises code into Airflow DAGs, leveraging BigQuery and Python for efficient data processing (a representative DAG is sketched below)
- Developed and optimized SQL queries to extract, transform, and load data from various sources
- Conducted numerous proofs of concept (POCs) involving FTP, Dataproc, and other technologies to evaluate their suitability for specific use cases
- Led the data migration effort, ensuring data integrity, security, and compliance throughout the process
- Collaborated with cross-functional teams to ensure accurate data migration and to resolve cross-team dependencies for a smooth migration process
- Automated the validation and comparison of tables migrated by two different migration tools for the data migration team (see the comparison sketch below)
- Automated data ingestion using Dataproc, exporting data from BigQuery into various downstream databases (see the Dataproc sketch below)
- Provided training and mentorship to team members, fostering knowledge sharing and professional growth
- Participated in agile development processes, contributing to sprint planning, stand-ups, and reviews to ensure timely delivery of data projects
- Optimized SQL queries and database schemas for performance improvements in data retrieval operations
- Designed, constructed, and maintained scalable data pipelines for data ingestion, cleaning, and processing using Python and SQL
- Implemented data visualization in Tableau and Power BI, creating dashboards and reports for business stakeholders
- Developed Python scripts for extracting data from web service APIs and loading it into databases (see the extraction sketch below)
- Managed version control and deployment of data applications using Git, Docker, and Jenkins
- Analyzed user requirements, designed and developed ETL processes to load enterprise data into the Data Warehouse
- Automated ETL processes across billions of rows of data, which reduced manual workload by 29% monthly
- Designed and developed scalable data pipelines using Google Cloud Platform (GCP) services such as Cloud Dataflow, Cloud Composer, and Apache Beam to process and manage large financial datasets efficiently
- Built and optimized data storage solutions leveraging BigQuery, Cloud SQL, and Cloud Spanner to ensure high-performance data retrieval and analytics for financial reporting and compliance needs
- Developed ETL/ELT workflows using Cloud Data Fusion, Cloud Storage, and Python, ensuring seamless data ingestion from various sources, including transactional banking systems and third-party financial services
- Worked with data scientists to design data structures and support machine learning (ML) models using Google AI Platform and BigQuery ML (see the BigQuery ML sketch below)
- Implemented CI/CD pipelines for data workflows using Cloud Build, Terraform, and GitHub Actions, automating deployment, testing, and monitoring to ensure consistency and reliability
- Developed and maintained data pipelines using GCP services such as Cloud Dataflow, BigQuery, and Cloud Pub/Sub for seamless data ingestion, transformation, and storage (see the streaming sketch below)
- Automated infrastructure provisioning and deployments using Terraform, Cloud Deployment Manager, and CI/CD pipelines with Cloud Build and GitHub Actions to ensure consistency and scalability
- Supported machine learning and AI initiatives by preparing and transforming large datasets for model training and evaluation using BigQuery, Vertex AI, and Cloud AI Platform
- Monitored and troubleshot data workflows using Cloud Logging, Cloud Monitoring, and Error Reporting, ensuring minimal downtime and quick resolution of data pipeline issues
- Migrated on-premises Hadoop clusters (including HDFS, YARN, and MapReduce) to Google Cloud Storage
Environment: Google Cloud Platform (GCP), Apache Beam, BigQuery, Cloud SQL, Cloud Spanner, ETL/ELT workflows, Cloud Data Fusion, Apache Kafka, Google AI Platform, CI/CD pipelines, and GitHub Actions.
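
Illustrative sketches: the snippets below are simplified, representative examples of the work described above, not production code; all project, dataset, connection, and credential values are hypothetical placeholders.

A minimal sketch of the kind of Airflow DAG the on-premises jobs were refactored into, running a transformation as a scheduled BigQuery job (Airflow 2.x with the Google provider; the DAG, project, and table names are assumptions):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="daily_claims_load",        # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Run the transformation that previously lived in an on-prem batch
    # script as a BigQuery job, writing to a managed destination table.
    transform = BigQueryInsertJobOperator(
        task_id="transform_claims",
        configuration={
            "query": {
                "query": """
                    SELECT member_id, SUM(paid_amount) AS total_paid
                    FROM `my-project.claims.raw_claims`
                    GROUP BY member_id
                """,
                "destinationTable": {
                    "projectId": "my-project",
                    "datasetId": "claims",
                    "tableId": "claims_daily_agg",
                },
                "writeDisposition": "WRITE_TRUNCATE",
                "useLegacySql": False,
            }
        },
    )
```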
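One way to automate the comparison of tables migrated by two different tools is to check row counts plus an order-insensitive fingerprint of every row; a sketch under that assumption (table names are hypothetical):

```python
from google.cloud import bigquery

client = bigquery.Client()

def table_fingerprint(table: str):
    """Return (row_count, aggregate_fingerprint) for a BigQuery table.

    BIT_XOR over per-row FARM_FINGERPRINT values is order-insensitive,
    so two tables with identical rows compare equal regardless of layout.
    """
    sql = f"""
        SELECT
          COUNT(*) AS row_count,
          BIT_XOR(FARM_FINGERPRINT(TO_JSON_STRING(t))) AS fingerprint
        FROM `{table}` AS t
    """
    row = next(iter(client.query(sql).result()))
    return row.row_count, row.fingerprint

result_a = table_fingerprint("my-project.migrated_by_tool_a.claims")
result_b = table_fingerprint("my-project.migrated_by_tool_b.claims")
assert result_a == result_b, f"Mismatch: {result_a} != {result_b}"
```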
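A PySpark sketch of the Dataproc ingestion pattern: read a table through the spark-bigquery connector (available on Dataproc clusters) and write it to a downstream database over JDBC. The connection URL, credentials, and table names are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-to-jdbc-export").getOrCreate()

# Read the source table from BigQuery via the spark-bigquery connector.
df = (
    spark.read.format("bigquery")
    .option("table", "my-project.claims.claims_daily_agg")
    .load()
)

# Write the result to the target database over JDBC.
(
    df.write.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/reporting")
    .option("dbtable", "claims_daily_agg")
    .option("user", "etl_user")
    .option("password", "placeholder")  # from a secret manager in practice
    .mode("overwrite")
    .save()
)
```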
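A sketch of the web-service extraction scripts, assuming a simple page-number pagination scheme; the endpoint, schema, and connection string are hypothetical:

```python
import requests
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://etl_user:placeholder@db-host:5432/staging")

def fetch_pages(url: str):
    """Yield pages of results from a paginated REST API."""
    page = 1
    while True:
        resp = requests.get(url, params={"page": page}, timeout=30)
        resp.raise_for_status()
        rows = resp.json().get("results", [])
        if not rows:
            return
        yield rows
        page += 1

with engine.begin() as conn:
    for rows in fetch_pages("https://api.example.com/v1/providers"):
        # executemany over the page's list of row dicts
        conn.execute(
            text("INSERT INTO providers (id, name) VALUES (:id, :name)"),
            rows,
        )
```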
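A condensed Apache Beam sketch of the streaming Pub/Sub-to-BigQuery pattern run on Dataflow; the subscription and table spec are assumptions:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        # Pub/Sub messages arrive as bytes; decode and parse each one.
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/txn-sub"
        )
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        # Append parsed records to an existing BigQuery table.
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:finance.transactions",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```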
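A sketch of training a BigQuery ML model directly over warehouse data, the kind of support provided to the data-science team; the dataset, label, and feature names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

create_model_sql = """
CREATE OR REPLACE MODEL `my-project.analytics.churn_model`
OPTIONS (
  model_type = 'logistic_reg',
  input_label_cols = ['churned']
) AS
SELECT
  tenure_months,
  monthly_spend,
  support_tickets,
  churned
FROM `my-project.analytics.member_features`
"""

client.query(create_model_sql).result()  # blocks until training finishes
```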