With 8+ years of experience in the IT industry, I am a highly skilled and motivated team player specializing in data engineering. My proficiency lies in building and managing data pipelines, designing data models, conducting data analytics, and developing software on cloud platforms such as Google Cloud Platform (GCP), Azure, and AWS. I have a strong passion for cloud computing, and my data engineering expertise is an asset to any organization.
• Experienced Data Engineer with a proven track record of successfully migrating on-premises data warehouses and data marts to cloud data warehouses including Snowflake, GCP BigQuery, and Redshift. Proficient in designing and deploying applications within the AWS stack, with a strong focus on high availability, fault tolerance, and auto-scaling.
• Possesses a practical understanding of dimensional and relational data modeling concepts, encompassing techniques such as star-schema and snowflake-schema modeling and the creation of fact and dimension tables.
• Demonstrates strong expertise in Data Migration from RDBMS to Snowflake cloud data warehouse, ensuring efficient and secure data transfer.
• Hands-on experience utilizing Google Cloud Platform (GCP) big data products such as BigQuery, Cloud Dataproc, and Cloud Composer (Airflow as a service), showcasing proficiency in cloud-based data solutions.
• Proficient in constructing data pipelines using tools such as Apache Airflow, GCP Dataflow, and AWS Glue, leveraging technologies including Python, PySpark, Hive SQL, Presto, and BigQuery, as well as in creating, debugging, scheduling, and monitoring jobs in Cloud Composer (Airflow).
• Skilled in data preparation, data modeling, and data visualization utilizing Power BI. Proficient in developing various analysis services using DAX queries.
• Possesses expertise in multiple programming languages, including Python, R, and SAS, with a keen interest in staying updated with emerging technologies within the Google Cloud Platform ecosystem.
• Developed ETL applications capable of handling large volumes of data using tools such as MapReduce, Spark (Core, SQL, and Scala APIs), and Pig, ensuring efficient data processing.
• Strong SQL development skills, encompassing the creation of stored procedures, triggers, views, and user-defined functions, contributing to effective data management.
• Extensive experience in GCP Dataproc, GCS, Cloud Functions, BigQuery, Azure Data Factory, and Databricks, with a track record of building efficient data pipelines that move data between GCP and Azure via Azure Data Factory.
• Proficient in building Power BI reports on Azure Analysis Services, improving performance over direct queries against GCP BigQuery.
• Well-versed in AWS services such as EC2, S3, Route 53, ELB, EBS, VPC, RDS, DynamoDB, SNS, SQS, IAM, KMS, Lambda, Kinesis, ECS, and EKS, and skilled in infrastructure management using CloudFormation and OpsWorks.
• Experienced in Azure IaaS, Docker, SQL, Oracle, and NoSQL databases, with proficiency in Bash and Python scripting and an understanding of Linux internals.
• Proficient in Business Intelligence and data visualization, utilizing tools such as Tableau and Alteryx to deliver actionable insights.
• Prioritizes high availability, fault tolerance, and auto-scaling, with experience deploying services through AWS CloudFormation and OpsWorks and adhering to security best practices (IAM, CloudWatch, CloudTrail).
• Well-versed in REST API development against SQL and NoSQL databases such as MySQL and MongoDB, as well as data processing on EMR.
• Possesses a strong background in Bash and Python scripting on Linux, with experience spanning all phases of the Systems Life Cycle, including project definition, analysis, design, coding, testing, implementation, and support.
• Skilled in unit testing and data validation to ensure data accuracy and reliability.
• Proficient in Agile and Scrum methodologies, with hands-on experience in all phases of the Software Development Life Cycle (SDLC).
• Excellent communication and interpersonal skills, coupled with a rapid learning ability, ensuring effective collaboration and knowledge acquisition.
• Led the migration efforts of SAP HANA and Oracle databases to BigQuery, showcasing expertise in database migration and cloud integration.
• Assumed responsibility for comprehending the intricacies of Calculation views in SAP HANA, successfully converting and optimizing them for seamless integration with GCP BigQuery.
• Architected and developed robust data pipelines in Apache Airflow within GCP, proficiently configuring ETL jobs using a diverse array of operators for optimized data processing.
• Demonstrated proficiency across various essential GCP services, including DataFlow, Google Cloud Storage (GCS), Cloud Functions, and BigQuery, enabling streamlined data processing and analysis.
• Utilized the Cloud Shell SDK within GCP to configure and fine-tune critical services such as Dataproc, Storage, and BigQuery, ensuring optimal performance and resource allocation.
• Collaborated seamlessly with cross-functional teams to engineer a comprehensive framework for generating daily ad hoc reports and data extracts from enterprise-level data residing in BigQuery, driving data-driven decision-making.
• Leveraged advanced data processing techniques to efficiently download BigQuery data into Pandas and Spark data frames, enabling advanced and flexible ETL capabilities.
• Proficiently created Audit reports utilizing GCP Data Studio, demonstrating a deep understanding of data visualization and reporting.
• Implemented and managed Apache Airflow to orchestrate job workflows, exemplifying expertise in job scheduling and automation.
• Showcased hands-on experience in the migration of data to cloud platforms through lift-and-shift strategies, facilitating seamless data transfer and integration.
• Demonstrated adeptness in utilizing Google Data Catalog and other Google Cloud APIs for comprehensive monitoring, query, and billing-related analysis in the context of BigQuery utilization.
• Exhibited sound knowledge of Cloud Dataflow and Apache Beam, effectively utilizing the Cloud Shell for diverse tasks, including service deployment and configuration.
• Crafted BigQuery authorized views to enforce row-level security, fortifying data access control, and critically assessed Snowflake Design considerations to accommodate application changes.
• Proficiently designed both Logical and Physical data models tailored for Snowflake, adhering to the evolving requirements of the project. Defined roles and privileges necessary to access various database objects, ensuring data security.
• Demonstrated the ability to size virtual warehouses within Snowflake for distinct types of workloads, optimizing resource allocation and performance.
Environment: GCP, Git, Pub/Sub, GCP BigQuery, GCP Looker, SAP HANA, Oracle, Python
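The migration validation work described above (comparing source extracts against BigQuery loads) can be illustrated with a minimal, self-contained sketch. This is pure Python with hypothetical sample rows, not the actual production pipeline; real validation would read from the source database and BigQuery:

```python
import hashlib

def table_fingerprint(rows):
    """Return (row_count, order-independent checksum) for a list of row dicts.

    Sorting the serialized rows makes the checksum independent of row order,
    which typically differs between a source extract and a warehouse load.
    """
    serialized = sorted(
        "|".join(f"{k}={row[k]}" for k in sorted(row)) for row in rows
    )
    digest = hashlib.sha256("\n".join(serialized).encode()).hexdigest()
    return len(rows), digest

def validate_migration(source_rows, target_rows):
    """Compare a source extract (e.g. Oracle/SAP HANA) against the target load."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)

# Hypothetical sample data standing in for a source extract and a BigQuery load.
source = [{"id": 1, "amt": 10.0}, {"id": 2, "amt": 7.5}]
target = [{"id": 2, "amt": 7.5}, {"id": 1, "amt": 10.0}]  # same rows, new order
```

Checksumming both sides rather than diffing row by row keeps the comparison cheap enough to run after every load.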
• Implemented a robust ETL process in Alteryx to efficiently extract data from diverse sources, including SQL Server, XML, Excel, and CSV. Proficiently scheduled workflows for streamlined data operations.
• Managed end-to-end complex data migration, conversion, and data modeling utilizing Alteryx and SQL, coupled with the creation of visualizations using Tableau to craft high-impact dashboards.
• Updated fields and data types, showcasing a keen grasp of case statement logic for the development of business rules.
• Skillfully employed joins and sub-queries/nested queries in SQL, and adeptly designed Alteryx workflows to handle scenarios where data availability was limited.
• Proficiently navigated UNIX/LINUX systems, boasting bash scripting expertise while constructing data pipelines.
• Demonstrated a high level of expertise in SQL development, encompassing the creation of stored procedures, triggers, views, and user-defined functions to enhance data management.
• Architected and constructed multiple end-to-end ETL and ELT processes for data ingestion and transformation within Google Cloud Platform (GCP), leveraging services such as BigQuery, Cloud Dataproc, and Apache Airflow.
• Developed BigQuery authorized views to enforce row-level security and facilitate secure data sharing across teams, and migrated existing cron jobs to Composer/Airflow for improved orchestration.
• Possessed hands-on experience with various reporting tools and software, including SQL Database, Looker, Tableau dashboards, and data warehouses, enhancing data visualization and reporting capabilities.
• Demonstrated practical knowledge in importing data for reporting tools, conducting comprehensive SQL and LookML testing within Looker, and leveraging Snowflake. Proficiently handled error handling and debugging of Looker reports, crafted custom table calculations such as offset, and established data actions within Looker for enhanced interactivity.
Environment: GCP, BigQuery, Alteryx, Looker, Kubernetes, Cucumber, Git.
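The case-statement business rules mentioned above (built in Alteryx and SQL) can be sketched as a pure-Python equivalent. The rule name, thresholds, and labels here are hypothetical, chosen only to show the pattern:

```python
def classify_order(amount):
    """Pure-Python equivalent of a SQL CASE statement used as a business rule.

    Mirrors (hypothetical thresholds):
        CASE WHEN amount >= 1000 THEN 'large'
             WHEN amount >= 100  THEN 'medium'
             ELSE 'small' END
    """
    if amount >= 1000:
        return "large"
    if amount >= 100:
        return "medium"
    return "small"

# Apply the rule to a batch of order amounts, as a workflow formula tool would.
orders = [50, 250, 5000]
labels = [classify_order(a) for a in orders]
```

Expressing the rule as a single function keeps the branch order explicit, which is the usual source of bugs when CASE logic is translated between tools.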
• Proficiently navigated and leveraged a spectrum of Google Cloud Platform (GCP) services, including GCP core services, BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, Dataproc, and Stackdriver.
• Spearheaded the data migration process to GCP BigQuery, employing Apache Beam and Apache Airflow for seamless, automated data transfers.
• Collaborated closely with the ETL team to optimize data performance originating from diverse sources, architecting star schemas within BigQuery for enhanced data modeling.
• Orchestrated the loading of Salesforce data into BigQuery using a multifaceted toolset encompassing SQL, Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts.
• Demonstrated proficiency in Python and Apache Beam, developing and deploying programs within Cloud Dataflow to execute comprehensive data validation between raw source files and BigQuery tables.
• Engineered a versatile and scalable framework in Scala and Spark, facilitating the integration of common data sources such as MySQL, Oracle, PostgreSQL, SQL Server, and BigQuery, streamlining data loading processes.
• Extended expertise into the realm of ETL pipelines, focusing on S3 Parquet files within data lakes using AWS Glue. Leveraged AWS Glue for data extraction from S3 and subsequent data transformations.
• Administered and managed Google Cloud clusters efficiently, harnessing Kubernetes (k8s) for orchestration and resource allocation.
• Implemented Stackdriver Monitoring within GCP, actively monitoring application alerts running on GCP, and effectively deploying applications using Google Cloud Deployment Manager.
• Proficiently crafted a diverse range of data visualizations, including Line Charts, Pie Charts, Heat Maps, Plots, Filters, Sets, Parameters, and Groupings. Employed advanced techniques for complex LOD (Level of Detail) calculations and data blending across multiple sources, such as SQL Server and Teradata.
• Adept in generating ad-hoc reports and adeptly working with User Filters and Row Security for enhanced data access control.
• Acclaimed for expertise in dashboard design, adhering to best practices in data visualization to deliver compelling and informative insights. Utilized advanced techniques, including filters and Sheet swapping/Sheet selector, to enhance dashboard performance and user experience.
• Maintained a meticulous approach to Tableau report creation, consistently optimizing report performance to ensure swift data retrieval and analysis.
Environment: GCP, Git, AWS Glue, AWS Athena, GCP Pub/Sub, GCP BigQuery, Docker, Kubernetes, Python, Terraform
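The star schemas architected in BigQuery above come down to fact tables joined to dimension tables. A minimal sketch of that denormalizing join, using hypothetical table contents in pure Python rather than SQL:

```python
# Hypothetical star schema: one fact table keyed into two dimension tables.
dim_date = {1: {"date": "2023-01-01"}, 2: {"date": "2023-01-02"}}
dim_product = {10: {"name": "widget"}, 11: {"name": "gadget"}}
fact_sales = [
    {"date_key": 1, "product_key": 10, "qty": 3},
    {"date_key": 2, "product_key": 11, "qty": 1},
]

def denormalize(facts, dates, products):
    """Join each fact row to its dimensions -- what a star-schema query does."""
    return [
        {**dates[f["date_key"]], **products[f["product_key"]], "qty": f["qty"]}
        for f in facts
    ]

report = denormalize(fact_sales, dim_date, dim_product)
```

Keeping measures in a narrow fact table and descriptive attributes in small dimensions is what lets a columnar warehouse like BigQuery scan only the columns a report needs.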
• Spearheaded the development of robust data pipelines within the Google Cloud Platform using Apache Airflow, employing a variety of operators to streamline ETL processes.
• Demonstrated proficiency across multiple GCP services, including Dataproc, Google Cloud Storage (GCS), Cloud Functions, and BigQuery, while executing seamless data transfers between GCP and Azure through Azure Data Factory.
• Elevated reporting efficiency by constructing Power BI reports on Azure Analysis Services, enhancing overall system performance.
• Utilized the Cloud Shell SDK in GCP to configure essential services such as Dataproc, Storage, and BigQuery. Collaborated closely with cross-functional teams to devise a framework for generating daily ad hoc reports and enterprise data extracts from BigQuery.
• Orchestrated the implementation of advanced analytical models within Hadoop clusters, tackling extensive datasets in collaboration with the Data Science team.
• Developed Hive SQL scripts to create complex tables optimized for performance through partitioning, clustering, and skew handling.
• Executed data extraction from BigQuery into Pandas and Spark data frames to enable advanced ETL capabilities, ensuring data agility and flexibility.
• Leveraged Google Data Catalog and various Google Cloud APIs for comprehensive monitoring, querying, and billing-related analyses related to BigQuery usage.
• Pioneered a Proof of Concept (POC) integrating machine learning models and Cloud ML to enhance table-quality analysis within batch processes.
• Proficiently utilized Cloud Dataflow and Apache Beam, while harnessing the power of Cloud Shell for diverse tasks and service deployments.
• Designed and implemented BigQuery authorized views, enhancing data security through row-level access control and facilitating data sharing with other teams.
• Extensive expertise in designing and deploying Hadoop clusters, along with a deep understanding of Big Data analytic tools such as Pig, Hive, Sqoop, Apache Spark, and the Cloudera Distribution.
• Led the end-to-end migration of a complex Oracle database to Google BigQuery, leveraging Power BI for comprehensive reporting and data visualization.
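The Hive table-creation work described above can be sketched as a small DDL generator. The table, columns, and bucket count are hypothetical; the point is the shape of a partitioned, clustered table definition:

```python
def hive_create_table(name, columns, partition_cols, cluster_cols, buckets):
    """Build a Hive CREATE TABLE statement with partitioning and clustering.

    Partitioning prunes scans to the partitions a query touches; CLUSTERED BY
    buckets rows on a key to speed up joins. All names here are hypothetical.
    """
    cols = ", ".join(f"{c} {t}" for c, t in columns)
    parts = ", ".join(f"{c} {t}" for c, t in partition_cols)
    clustered = ", ".join(cluster_cols)
    return (
        f"CREATE TABLE {name} ({cols}) "
        f"PARTITIONED BY ({parts}) "
        f"CLUSTERED BY ({clustered}) INTO {buckets} BUCKETS "
        f"STORED AS ORC"
    )

ddl = hive_create_table(
    "sales",
    [("order_id", "BIGINT"), ("amount", "DOUBLE")],
    [("sale_date", "STRING")],
    ["order_id"],
    16,
)
```

Generating DDL from a spec like this keeps partition and bucket choices reviewable in one place instead of scattered across hand-written scripts.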
GCP Cloud Services
GCP Cloud Storage, BigQuery, Cloud Composer, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub
Big Data
Spark, Azure Storage, Azure Data Factory, Azure Analysis Services
ETL/Reporting
GCP Cloud Composer, DataFlow, Power BI, Data Studio, Tableau
Databases
GCP BigQuery, SAP HANA, SQL Server, DynamoDB, Cosmos DB
Scripting Languages
Python, Bash, PowerShell, JavaScript, Java
Configuration Management
Chef, Ansible, Terraform