Sai Krishna

Senior Data Engineer
Jersey City, NJ

Summary

  • More than 7 years of professional experience as a Data Engineer, specializing in developing data systems and analyzing data within business frameworks.
  • Engaged in all phases of the Software Development Life Cycle (SDLC) and actively participated in daily Scrum meetings, fostering collaboration across cross-functional teams.
  • Responsible for designing and implementing scalable data architectures on the Azure platform, utilizing services like Azure Data Lake Storage, Azure Synapse Analytics, and Azure Databricks to efficiently store, process, and analyze large volumes of data.
  • Acquired hands-on expertise in importing and exporting data through stream processing platforms such as Flume and Kafka, adeptly managing efficient data flow.
  • Proficient in hosting applications on Google Cloud Platform (GCP), leveraging Compute Engine, App Engine, Cloud SQL, Kubernetes Engine, and Cloud Storage for seamless deployment.
  • Developed and deployed various Lambda functions in AWS, both with built-in AWS Lambda Libraries and custom libraries in Scala.
  • Proficient in deploying microservices to AWS ECS, running Python jobs in AWS Lambda, and containerized deployments of Java and Python.
  • Competent in working with Terraform, making use of its infrastructure as code, execution plans, and change automation features.
  • Hands-on experience in Kubernetes for cluster management in AWS.
  • Engaged with Big Data tools on Google Cloud Platform, including BigQuery, Pub/Sub, Dataproc, and Dataflow, facilitating large-scale data processing and analysis.
  • Robust proficiency in data migration, cleansing, transformation, integration, import, and export. Excellent technical and analytical skills, with a clear understanding of design objectives.
  • Expertise in designing for Online Transaction Processing (OLTP) and dimensional modeling for Online Analytical Processing (OLAP).
  • Experienced in Scala-based Apache Spark projects, proficient in RDD transformations, Data Frame operations, Spark SQL, and Spark Streaming implementations.
  • Proficient in Snowflake utilities, including SnowSQL and SnowPipe, and have used Python and Java for implementing Big Data model techniques.
  • Demonstrated expertise in migrating data warehouses and databases to Hadoop and NoSQL platforms, ensuring seamless transitions. Extensive experience with ETL tools like Informatica and Talend.
  • Proficient in Apache Spark job execution components, encompassing the DAG (Directed Acyclic Graph), lineage graph, DAG Scheduler, Task Scheduler, and Stages. Crafted insightful dashboards using Tableau to fulfill various business requirements and stakeholder needs.
  • Proficient in Python scripting, including statistical functions with NumPy and data visualization using Matplotlib and Pandas. Experienced in using Sqoop to import and export data between RDBMS and HDFS/Hive.
  • Experienced with Amazon Web Services (AWS) concepts such as EMR (Elastic MapReduce) and EC2 web services.
  • Proficient in data preparation, data modeling, and data visualization using Power BI, including the development of various analysis services using Data Analysis Expressions (DAX) queries.
  • Demonstrated the ability to address complex Proof of Concepts (POCs) based on business requirements and develop comprehensive test cases for unit testing.

Overview

8 years of professional experience

Work History

Senior GCP Data Engineer

Nationwide
08.2023 - Current

•Crafted and designed multiple data pipelines, overseeing the complete ETL and ELT processes for data ingestion and transformation within Google Cloud Platform (GCP)
•Successfully set up Continuous Delivery pipeline using Docker and GitHub, streamlining deployment process
•Developed, deployed, and managed workloads using Spark and Scala code within a Hadoop cluster hosted on GCP
•Gained hands-on experience with Google Cloud Functions, using Python to transfer data from CSV files stored in Google Cloud Storage (GCS) buckets into BigQuery (a brief sketch of this pattern follows this list)
•Proficient in processing and loading both bounded and unbounded data from Google Pub/Sub topics to BigQuery via Cloud Dataflow
•Leveraged Spark and Scala APIs to assess performance of Spark in comparison to Hive and SQL
•Successfully deployed applications to GCP using Spinnaker, leveraging rpm-based packages
•Architected several Directed Acyclic Graphs (DAGs) to automate ETL pipelines for seamless data processing.
•Developed pipeline for Proof of Concept (POC) to assess performance and efficiency of pipeline execution, comparing Google Cloud Dataproc clusters with Google Cloud Dataflow
•Automated feature engineering using Python scripts and deployed them on Google Cloud Platform (GCP) and BigQuery
•Responsible for implementing monitoring solutions using Terraform, Docker, and Jenkins
•Also automated Datadog Dashboards using Terraform Scripts
•Proficient in architecting ETL transformation layers and writing Spark jobs to facilitate data processing.
•Proficient in collecting and processing large-scale raw data through scripting, web scraping, API calls, SQL queries, and application development
•Experienced in fact-dimensional modeling, including Star schema, Snowflake schema, transactional modeling, and Slowly Changing Dimensions (SCD)
•Involved in building ETL processes within Kubernetes, employing tools like Apache Airflow and Spark on GCP
•Proficient in machine learning techniques such as Decision Trees, Linear/Logistic Regression, and Statistical Modeling
•Experience in implementing machine learning back-end pipelines, particularly with Pandas and NumPy.
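
Illustrative sketch for the Cloud Functions bullet above: a minimal, hypothetical Python Cloud Function that loads a newly landed CSV file from a GCS bucket into BigQuery. The function, bucket, dataset, and table names are placeholders, not details from the project described here.

    # Hypothetical GCS-triggered Cloud Function that loads a CSV into BigQuery.
    from google.cloud import bigquery

    BQ_TABLE = "my_project.analytics.daily_extract"  # placeholder destination table

    def load_csv_to_bq(event, context):
        """Background Cloud Function triggered when a file lands in a GCS bucket."""
        if not event["name"].endswith(".csv"):
            return  # ignore non-CSV objects

        uri = f"gs://{event['bucket']}/{event['name']}"
        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,          # skip the header row
            autodetect=True,              # infer the schema from the file
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        load_job = client.load_table_from_uri(uri, BQ_TABLE, job_config=job_config)
        load_job.result()  # block until the load finishes so errors surface in logs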

Senior GCP Data Engineer

Global Atlantic Financial Group
11.2022 - 07.2023

• Crafted data pipelines in Cloud Composer (Google Cloud's managed Airflow) to streamline ETL tasks using Airflow operators (an illustrative DAG sketch follows this list)
• Proficient in a wide range of Google Cloud Platform (GCP) services including Dataproc, Google Cloud Storage (GCS), Cloud Functions, and BigQuery
• Developed a real-time analytics pipeline on Google Cloud Platform (GCP), leveraging Apache Kafka for management and analysis of extensive streaming data stored in Google Cloud Storage (GCS), facilitating prompt insights for business decision-making
• Designed and implemented data migration pipelines using Google Cloud's suite of services such as Cloud Storage, BigQuery, and Dataflow to transfer data seamlessly from Azure to GCP
• Integrated Datadog into continuous integration and continuous deployment (CI/CD) pipelines to monitor performance impact of code changes, track deployments, and ensure reliability of applications throughout the software development lifecycle
• Actively participated in migrating on-premises Hadoop systems to GCP (Google Cloud Platform)
• Conducted in-depth analysis of data from diverse domains to enable seamless integration into a Data Marketplace
• Developed Pyspark programs, established data frames, and executed data transformations
• Employed a variety of GCP services, including Cloud Storage, Dataproc, Dataflow, BigQuery, Compute Engine, and GKE
• Configured Snowflake to directly ingest data from GCP storage services like Google Cloud Storage using storage integrations
• Leveraged GCP's managed services, notably Cloud Dataflow running Apache Beam pipelines, to orchestrate complex data processing tasks and perform batch and stream processing on data stored in Snowflake and other GCP services
• Developed a Continuous Delivery pipeline incorporating Maven, Ant, Jenkins, and GCP
• Engineered multi-cloud strategies, leveraging strengths of GCP, especially its Platform as a Service (PaaS) offerings
• Crafted and implemented automated remediation workflows utilizing Datadog's integrations and APIs for finance data management; these workflows addressed monitoring alerts, executed self-healing actions, and mitigated incidents in real time
• Stored daily data files in Google Cloud Storage buckets, harnessing Dataproc and BigQuery to maintain cloud-based solutions
• Collaborated with various business units to steer design and development strategy
• Produced functional specifications and technical design documentation
• Coordinated with teams such as cloud security, Identity Access Management, Platform, and Network to secure necessary accreditations and intake processes
• Leveraged cloud and GPU computing technologies for automation of machine learning and analytics pipelines, with primary focus on GCP
• Actively engaged in Proof of Concept (POC) to assess different cloud offerings, including Google Cloud Platform (GCP)
• Conducted comparative analysis between self-hosted Hadoop and GCP's Dataproc, while also exploring Bigtable (managed HBase) use cases and evaluating performance improvements.
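
Illustrative sketch for the Airflow bullet above: a minimal Cloud Composer / Airflow DAG that stages CSV extracts from GCS into BigQuery and then runs a SQL transform. The DAG name, bucket, dataset, tables, and query are hypothetical placeholders.

    # Hypothetical Cloud Composer (Airflow) DAG: GCS -> BigQuery staging -> SQL transform.
    from datetime import datetime
    from airflow import DAG
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    with DAG(
        dag_id="daily_policy_etl",          # placeholder DAG name
        schedule_interval="@daily",
        start_date=datetime(2023, 1, 1),
        catchup=False,
    ) as dag:

        # Stage raw CSV extracts from GCS into a BigQuery staging table
        load_raw = GCSToBigQueryOperator(
            task_id="load_raw_to_staging",
            bucket="example-landing-bucket",
            source_objects=["exports/policies_*.csv"],
            destination_project_dataset_table="my_project.staging.policies_raw",
            source_format="CSV",
            skip_leading_rows=1,
            write_disposition="WRITE_TRUNCATE",
        )

        # Transform the staged rows into a reporting table with a BigQuery SQL job
        transform = BigQueryInsertJobOperator(
            task_id="transform_to_reporting",
            configuration={
                "query": {
                    "query": (
                        "CREATE OR REPLACE TABLE my_project.reporting.policies AS "
                        "SELECT policy_id, SUM(premium) AS total_premium "
                        "FROM my_project.staging.policies_raw GROUP BY policy_id"
                    ),
                    "useLegacySql": False,
                }
            },
        )

        load_raw >> transform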

Senior AWS Data Engineer

Hudda Infotech Private Limited
03.2020 - 06.2022

• Leveraged Spark RDD, Data Frame API, Data Set API, Data Source API, Spark SQL, and Spark Streaming alongside SQL and DynamoDB for comprehensive data processing
• Developed Spark applications using both Python and R, including implementing Apache Spark data processing projects to handle data from various RDBMS and streaming sources
• Employed Apache Spark's data frames, Spark-SQL, and Spark MLlib extensively, designing and developing POCs using Scala, Spark SQL, and MLlib libraries
• Pioneered the deployment of AWS CloudFormation templates to streamline provisioning and managing infrastructure resources, ensuring scalability and resilience in multi-tier application environments
• Efficiently extracted data from SQL server, Amazon S3 buckets, and internal SFTP, loading them into AWS S3 buckets in a data warehouse context
• Developed Spark jobs for data processing and orchestrated instances and clusters to load data into AWS S3 buckets, thereby creating a DataMart (see the PySpark sketch after this list)
• Leveraged AWS EMR for processing and transforming data to assist the Data Science team based on business requirements
• Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources, such as S3, ORC/Parquet/Text files, into AWS Redshift
• Engaged in both batch processing and real-time data processing using Spark Streaming with a Lambda architecture
• Developed Python code for various tasks, dependencies, and time sensors in the context of workflow management and automation using the Airflow tool
• Collaborated with the DevOps team to implement Nifi Pipelines on EC2 nodes, integrated with Spark, Kafka, and Postgres running on other instances, using SSL handshakes in QA and Production Environments.
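
Illustrative sketch for the S3/DataMart bullets above: a minimal PySpark job, as it might run on EMR, that reads raw extracts from S3, aggregates them, and writes a partitioned Parquet DataMart back to S3. Bucket names and columns are hypothetical placeholders.

    # Hypothetical PySpark job: raw S3 extracts -> aggregated DataMart in S3.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("campaign_datamart").getOrCreate()

    # Read raw campaign extracts landed in S3 (assumes the cluster has S3 access configured)
    raw = spark.read.option("header", "true").csv("s3a://example-raw-bucket/campaigns/")

    # Aggregate to the grain consumed by downstream teams
    datamart = (
        raw.withColumn("event_date", F.to_date("event_ts"))
           .groupBy("campaign_id", "event_date")
           .agg(F.count("*").alias("events"),
                F.sum("spend").alias("total_spend"))
    )

    # Write the DataMart back to S3 as partitioned Parquet
    (datamart.write
             .mode("overwrite")
             .partitionBy("event_date")
             .parquet("s3a://example-datamart-bucket/campaign_daily/"))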

Azure Data Engineer

Grapesoft Solutions
09.2018 - 02.2020

•Orchestrated pipelines to extract, transform, and load data from diverse sources including Azure SQL, Blob storage, Azure SQL Data Warehouse, and write-back tools
•Analyzed, designed, and constructed contemporary data solutions using Azure's Platform as a Service (PaaS) to facilitate data visualization
•Extracted, transformed, and loaded data from source systems to Azure Data Storage services
•Designed and maintained data models and schemas in Azure Synapse Analytics for efficient querying and reporting, utilizing T-SQL for schema management and optimization
•Implemented a scalable data integration solution on Microsoft Azure utilizing Informatica, enabling seamless extraction, transformation, and loading (ETL) of large datasets from diverse sources into Azure data repositories for advanced analytics and reporting.
•Developed and deployed Java MapReduce jobs on Azure HDInsight, enhancing data processing capabilities
•Designed and implemented data processing and transformation logic on Azure using Spark, PySpark, and SQL
•Architected scalable and cost-effective data processing pipelines using Azure Databricks, Spark, and Delta Lake to handle large volumes of streaming and batch data (an illustrative sketch follows this list)
•Integrated Azure data services with other Azure platform services like Azure Active Directory, Azure VNet, and Azure Monitoring
•Implemented SVN (Subversion) version control system for maintaining and tracking revisions in data pipelines, facilitating effective collaboration and versioning control among development teams.
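
Illustrative sketch for the Databricks/Delta Lake bullet above: a minimal PySpark Structured Streaming job that ingests JSON files from ADLS into a Delta table, assuming a Databricks runtime where Delta Lake is available. The storage paths and schema are hypothetical placeholders.

    # Hypothetical streaming ingestion into Delta Lake on Azure Databricks.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("delta_ingest").getOrCreate()

    schema = StructType([
        StructField("device_id", StringType()),
        StructField("reading", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Stream JSON files as they land in ADLS and append them to a Delta table
    stream = (
        spark.readStream
             .schema(schema)
             .json("abfss://landing@examplestorage.dfs.core.windows.net/telemetry/")
    )

    (stream.writeStream
           .format("delta")
           .outputMode("append")
           .option("checkpointLocation",
                   "abfss://lake@examplestorage.dfs.core.windows.net/_checkpoints/telemetry/")
           .start("abfss://lake@examplestorage.dfs.core.windows.net/delta/telemetry/"))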

Big Data Engineer

Yana Software Private Limited
07.2016 - 07.2018

•Proficiently analyzed the Hadoop cluster and various big data analytic tools, including the HBase database and Sqoop
•Leveraged Talend for data integration, cleansing, and transformation, while using dbt to refine raw data into structured datasets, leading to faster processing times and higher data quality
•Crafted, developed, and maintained Tableau functional reports according to user specifications, ensuring meaningful data visualization
•Deployed Hadoop and Cloudera Distribution for Hadoop (CDH) to optimize the data processing pipeline, including cluster setup, real-time data ingestion with Flume, and Spark analytics
•Proficient in Python and Scala, with a knack for creating user-defined functions (UDFs) for Hive and Pig using Python (a brief sketch follows this list)
•Integrated MongoDB with big data processing frameworks like Hadoop and Spark to build end-to-end data pipelines for batch and stream processing
•Configured HBase tables to accommodate various data formats, specifically PII data from diverse portfolios
•Developed complex Hive SQL queries to extract, transform, and load data from HDFS into Hive tables
•Demonstrated a commitment to best practices in unit testing, continuous integration, continuous delivery (CI/CD), performance testing, capacity planning, documentation, monitoring, alerting, and incident response.
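
Illustrative sketch for the Hive UDF bullet above: one common way to run a Python "UDF" in Hive is a streaming script invoked through TRANSFORM. The table, columns, and script name below are hypothetical placeholders.

    # Hypothetical Hive streaming script; Hive pipes tab-separated rows to stdin via TRANSFORM.
    # Example HiveQL:
    #   ADD FILE normalize_emails.py;
    #   SELECT TRANSFORM (user_id, email)
    #   USING 'python normalize_emails.py'
    #   AS (user_id, email_clean)
    #   FROM users;
    import sys

    for line in sys.stdin:
        user_id, email = line.rstrip("\n").split("\t")
        # Normalize the email address before it lands in the curated table
        print(f"{user_id}\t{email.strip().lower()}")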

Skills

Azure Services: Azure SQL, Blob storage, Azure Data Storage, Azure Synapse Analytics, Azure Databricks, HDInsight

