Eight years of hands-on experience as a Data Engineer, excelling in the design, development, and implementation of advanced data architectures. Proficient in both Google Cloud and Azure services, with a strong track record throughout the SDLC. Specialized in GCP, Azure services, and the Big Data ecosystem, bringing expertise in large-scale data warehousing and end-to-end integration.
Good working knowledge of the Google Cloud Platform, including services such as Compute Engine, Cloud Storage, Virtual Private Cloud, Cloud Identity, Firebase, BigQuery, Cloud Functions, Eventarc, Cloud Monitoring, Auto Scaling, Security Groups, Deployment Manager, Dataflow, Pub/Sub, and Firebase Cloud Messaging.
Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Impala, Sqoop, Oozie, Pig, ZooKeeper, Spark, Hue, Flume, Storm, Kafka, and YARN distributions.
Very good knowledge of and working experience with GCP services such as Dataproc and BigQuery, along with other services that provide fast, efficient processing of big data.
Experience in analyzing data using Python, R, SQL, Hive, PySpark, and Spark SQL for data mining, data cleansing, data munging, and machine learning.
Good understanding of Spark architecture with Databricks and Structured Streaming. Experience setting up Databricks on GCP and Microsoft Azure, configuring Databricks workspaces for business analytics, managing clusters in Databricks, and managing the machine learning lifecycle.
Utilized Scala and Spark SQL to design and implement Spark applications, significantly improving the speed of testing and data processing. These applications parallelized data processing, enabling faster and more scalable operations.
Demonstrated proficiency in working with diverse file formats such as AVRO, ORC, JSON, and Parquet. Expertise in handling these formats ensures adaptability to different data structures and requirements.
Possessed strong skills in UNIX shell scripting and Perl scripting, leveraging these capabilities to automate and enhance data processing workflows. In UNIX environments, shell scripting played a crucial role in automating repetitive tasks, while Perl scripting provided additional flexibility and customization.
Proficient in using Terraform to define, manage, and provision infrastructure as code, ensuring consistency and repeatability in infrastructure deployments. Familiar with Snowflake's architecture, including its multi-cluster, multi-warehouse design that allows for efficient, concurrent data processing.
Experienced in utilizing Terraform to automate the deployment and management of databases, including configuration, scaling, and updates.
Knowledge of Star Schema Modeling, Snowflake Modeling, Facts and Dimension tables, Physical and Logical Modeling.
Skilled in integrating Terraform with cloud platforms (e.g., AWS, Azure, GCP) to orchestrate and manage cloud resources efficiently.
Developed and maintained reusable Terraform modules for database configurations, streamlining the deployment process and promoting code reusability.
Strong communication and analytical skills with very good experience in programming and problem-solving.
Excellent working experience in Scrum/Agile framework and Waterfall project execution methodologies.
Proven capabilities in database design and development and BI using SQL Server, with extensive proficiency in SQL concepts, Python, Scala, Java, and PySpark. Skilled in implementing new features and optimizing code with Kubernetes and Docker. Experienced in CI/CD pipelines, automation tools, and ETL processes using Flume, Kafka, Power BI, and SSIS.
Overview
9 years of professional experience
Work History
Sr. Data Engineer
H & R Block
08.2022 - Current
Lead the design and implementation of GCP-based data solutions, ensuring alignment with client needs.
Established resilient GCP architectures ensuring data availability, integrity, and security. Designed and implemented robust solutions for optimal performance and protection of sensitive information.
Lead the design and execution of complex infrastructure projects using Terraform, showcasing a deep understanding of IaC principles and best practices.
Demonstrate expertise in Cloud Build, orchestrating seamless and automated CI/CD pipelines to accelerate software delivery, enhance reliability, and optimize deployment processes.
Led the strategic design and implementation of Terraform security modules, leveraging over 3 years of experience to architect robust security measures within the infrastructure as code, ensuring the confidentiality, integrity, and availability of resources.
Demonstrate a track record of continuous improvement in Terraform security best practices, staying abreast of industry standards and evolving security threats, and applying this knowledge to enhance the security posture of infrastructure deployments.
Develop and optimize data processing pipelines using GCP services such as Dataproc, Dataflow, Cloud Functions, and Composer.
Implemented and optimized Google BigQuery data warehouse solutions for efficient querying and analysis, ensuring high-performance analytics for large datasets.
Developed ETL (Extract, Transform, Load) processes using Dataproc and Dataflow, automating data preparation and integration workflows to streamline data processing tasks.
Designed and implemented serverless functions using Cloud Functions to execute code in response to events, enhancing scalability and efficiency in data processing workflows.
Orchestrated and coordinated GCP services using Cloud Composer to create scalable and reliable workflows, providing a seamless integration of various microservices.
Utilized Google BigQuery for ad-hoc querying of data stored in Google Cloud Storage (GCS), enabling quick and cost-effective analysis of large datasets without the need for complex data infrastructure.
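For illustration, a minimal sketch of this kind of ad-hoc federated query using the BigQuery Python client; the project, bucket, column names, and Parquet format are hypothetical, not details from an actual engagement.

from google.cloud import bigquery

# Hypothetical project and bucket names, for illustration only.
client = bigquery.Client(project="my-analytics-project")

# Define an external data source so BigQuery reads Parquet files
# directly from Cloud Storage without loading them first.
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://example-bucket/events/2024/*.parquet"]

job_config = bigquery.QueryJobConfig(
    table_definitions={"events_ext": external_config}
)

query = """
    SELECT user_id, COUNT(*) AS event_count
    FROM events_ext
    GROUP BY user_id
    ORDER BY event_count DESC
    LIMIT 10
"""

# Run the ad-hoc query and print the top users by event volume.
for row in client.query(query, job_config=job_config).result():
    print(row.user_id, row.event_count)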
Implemented and managed Dataproc clusters for processing and analyzing vast amounts of data using popular frameworks like Spark and Hive.
Ensure data accuracy and timeliness, meeting the dynamic reporting needs of the Insurance domain.
Exhibited strong scripting skills in Bash, PowerShell, and Groovy, showcasing the ability to seamlessly automate and optimize diverse tasks across different operating environments, contributing to efficient and scalable infrastructure management.
Leverage the Spark SQL API in PySpark for intricate data transformations, extracting, transforming, and loading data seamlessly.
Execute complex SQL queries, facilitating comprehensive data analysis and supporting data-driven decision-making.
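A minimal PySpark sketch of this pattern, mixing the Spark SQL API with DataFrame functions; the bucket paths and claim columns are hypothetical examples rather than the actual production pipeline.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims-transform").getOrCreate()

# Extract: read raw claim records from Parquet (illustrative path).
claims = spark.read.parquet("gs://example-bucket/raw/claims/")

# Transform: register a view and run Spark SQL alongside DataFrame functions.
claims.createOrReplaceTempView("claims")
summary = spark.sql("""
    SELECT policy_id,
           SUM(claim_amount) AS total_claimed,
           COUNT(*)          AS claim_count
    FROM claims
    WHERE claim_status = 'APPROVED'
    GROUP BY policy_id
""").withColumn("avg_claim", F.col("total_claimed") / F.col("claim_count"))

# Load: write the aggregated result back to a curated zone.
summary.write.mode("overwrite").parquet("gs://example-bucket/curated/claim_summary/")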
Implement automation scripts using Python and Terraform to streamline Cloud Identity policy provisioning and management, reducing manual efforts significantly.
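A simplified sketch of how such Python-driven Terraform automation could look, assuming a Terraform configuration that accepts an iam_bindings variable; the group names, roles, and file layout are illustrative only.

import json
import subprocess

# Hypothetical group-to-role bindings to be provisioned via Terraform.
POLICY_BINDINGS = {
    "group:data-engineers@example.com": ["roles/bigquery.dataEditor"],
    "group:analysts@example.com": ["roles/bigquery.dataViewer"],
}

def write_tfvars(path: str = "bindings.auto.tfvars.json") -> None:
    """Write the desired bindings where Terraform auto-loads them."""
    with open(path, "w") as fh:
        json.dump({"iam_bindings": POLICY_BINDINGS}, fh, indent=2)

def apply_bindings() -> None:
    """Plan and apply so the deployed policies converge to the desired state."""
    subprocess.run(["terraform", "init", "-input=false"], check=True)
    subprocess.run(["terraform", "plan", "-input=false", "-out=tfplan"], check=True)
    subprocess.run(["terraform", "apply", "-input=false", "tfplan"], check=True)

if __name__ == "__main__":
    write_tfvars()
    apply_bindings()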
Develop and maintain CI/CD pipelines on GCP using Jenkins and Terraform, ensuring seamless code deployment and testing.
Enforce best practices for continuous integration and deployment, enhancing the efficiency of the development lifecycle.
Cloud Data Engineer
Capgemini Private Limited
11.2019 - 06.2022
To meet the demand for processing and analyzing substantial volumes of data, I orchestrated the implementation and maintenance of sophisticated data pipelines on Google Cloud Platform (GCP).
These pipelines were meticulously designed to handle large-scale data operations, encompassing tasks such as data ingestion, transformation, and analysis.
Leveraging GCP's powerful suite of services, including BigQuery, Dataflow, and Storage, I ensured the seamless flow of data through the pipeline.
Leveraged Scala and Python programming to implement custom data processing solutions, improving data quality and processing efficiency.
Utilized GCP services, including Google Cloud Storage (GCS), Cloud Dataprep, and Dataproc, to enhance data processing solutions, optimizing efficiency and quality.
Implemented GCP optimization strategies, incorporating tools such as Google Cloud Monitoring and Google Cloud Billing, to streamline processes and improve overall performance.
Introduced innovative GCP solutions, leveraging tools like Cloud Functions and Cloud Composer, to drive efficiency and effectiveness in data processing.
Hands-on experience with Cloud IAM services on Google Cloud Platform (GCP), showcasing proficiency in designing and managing identity and access management solutions tailored to cloud environments.
Developed Spark programs in Scala and PySpark to execute intricate data transformations, optimizing large datasets for analysis. Created structured data frames to facilitate efficient manipulation and integration within the Spark ecosystem.
In optimizing data workflows, Cloud Functions served as a pivotal component in orchestrating the efficient movement of data from Google Cloud Dataproc to downstream applications. This serverless approach enabled lightweight, event-driven functions triggered by changes or events in the Dataproc environment.
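A hedged sketch of this event-driven pattern using the Functions Framework for Python: a function triggered by a Cloud Storage object-finalized event (written by a Dataproc job) publishes a notification for downstream consumers. The project, bucket prefix, and topic names are hypothetical.

import functions_framework
from google.cloud import pubsub_v1

# Hypothetical project and topic; the trigger is a GCS "object finalized"
# event fired when a Dataproc job writes its output files.
publisher = pubsub_v1.PublisherClient()
TOPIC = publisher.topic_path("my-analytics-project", "dataproc-output-ready")

@functions_framework.cloud_event
def notify_downstream(cloud_event):
    """Forward a notification to downstream consumers when new output lands."""
    data = cloud_event.data
    bucket = data["bucket"]
    object_name = data["name"]

    # Only react to the Dataproc output prefix; ignore unrelated writes.
    if not object_name.startswith("dataproc/output/"):
        return

    publisher.publish(
        TOPIC,
        b"new-output",
        bucket=bucket,
        object=object_name,
    ).result()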
Evaluated the suitability of Hadoop and its ecosystem for the project, then deployed and validated several proof-of-concept (POC) applications to benefit from the big data Hadoop and Dataproc initiative.
Created a real-time data pipeline utilizing Kafka and Spark Structured Streaming to ingest and process client data from their weblog server.
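A condensed sketch of such a pipeline with PySpark Structured Streaming reading from Kafka (assumes the spark-sql-kafka connector is on the classpath); the broker, topic, sink paths, and weblog field layout are illustrative.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("weblog-stream").getOrCreate()

# Ingest raw weblog events from Kafka as a streaming DataFrame.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka-broker:9092")
       .option("subscribe", "weblogs")
       .load())

# Kafka delivers bytes; decode the payload and split it into columns.
events = (raw.selectExpr("CAST(value AS STRING) AS line")
          .select(F.split("line", " ").alias("parts"))
          .select(F.col("parts")[0].alias("client_ip"),
                  F.col("parts")[1].alias("url"),
                  F.col("parts")[2].cast("int").alias("status")))

# Continuously write parsed events to Parquet with checkpointing.
query = (events.writeStream
         .format("parquet")
         .option("path", "gs://example-bucket/weblogs/parsed/")
         .option("checkpointLocation", "gs://example-bucket/weblogs/_checkpoints/")
         .start())
query.awaitTermination()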
Enhanced Teradata SQL scripts through the implementation of RANK functions, boosting query efficiency for retrieving data from sizable tables.
Crafted SQL queries and generated test data for unit testing of Informatica Cloud mappings.
Developed data integration workflows in Data Fusion using a visual drag-and-drop interface, enabling rapid development and deployment of data pipelines.
Implemented proof of concepts for SOAP & REST APIs and utilized REST APIs to retrieve analytics data.
Developed and maintained CI/CD pipelines on GCP using Cloud Build and Cloud Deployment Manager to enable seamless code deployment and testing in a controlled environment.
Data Engineer
Capgemini Private Limited
09.2017 - 11.2019
Designed, developed, and implemented ETL processes using IICS Data Integration, incorporating Spark, Hive, Oozie, Sqoop, Kafka, and Shell scripting for comprehensive data integration and processing.
Built scalable and optimized Snowflake schemas, tables, and views to support complex analytics queries, leveraging Spark and Hive.
Implemented efficient data transfer between different environments using Sqoop, ensuring seamless integration and optimal data flow.
Worked on enhancements and maintenance activities of the data, including profiling, tuning, and modifying existing solutions, incorporating Oozie for workflow orchestration.
Orchestrated complex workflow processes using Oozie, enhancing coordination and automation in data processing tasks.
Re-engineered existing ETL processes for better performance, utilizing Sqoop for efficient data transfer.
Conducted unit testing at various ETL stages, actively participated in team code reviews, and implemented Kafka for real-time data streaming.
Designed and developed various automated processes, including scheduled execution using Shell scripting to meet downstream SLAs.
Employed Shell scripting to develop custom solutions for specific data processing requirements, contributing to the overall automation and efficiency of the ETL processes.
Built Power BI reports on Azure Analysis Services, enhancing visualization and analytics capabilities.
Constructed data pipelines with Cloud ETL mappings in Informatica Intelligent Cloud Services, utilizing Kafka for event-driven data processing, and published them for API calls.
Data Analyst
Winnow IT Services Pvt Ltd
06.2015 - 08.2017
Examined and developed business requirements into technically sound data solutions that could be put into practice.
Analyzed classified data items for data profiling and source-to-target mapping across data environments, and created working documents to back up results and assign responsibilities.
Used complex SQL to analyze and profile data from a variety of sources, including Teradata and Oracle.
Participated in meetings for gathering information and JAD sessions to deliver a business requirements document and a draft logical data model.
Created mappings using the transformations Source Qualifier, Expression, Filter, Lookup, Update Strategy, Sorter, Joiner, Normalizer, and Router.
Carried out data administration tasks and completed ad-hoc requests per user requirements using data management software and tools such as Perl, Toad, MS Access, Excel, and SQL.
Utilized PL/SQL to write, test, and implement triggers, stored procedures, and functions at the database level.
Created several reports using data collected from SQL queries and a set of pivot tables built in Excel.
Extracted data from a variety of sources such as MySQL, Oracle 11g, MS SQL server, etc., and analyzed large data sets using tools such as SQL, SAS, and JMP.
Created and delivered Tableau reports with drill-down, drill-through, and drop-down menu options, as well as parameterized and linked reports.