Rama Krishna D

Summary

  • 8+ years of IT experience delivering end-to-end data analytics solutions.
  • GCP Proficiency: Mastery of GCP services such as BigQuery, Google Cloud Storage (GCS) buckets, Cloud Functions, and Dataflow for seamless data analytics solutions.
  • GCP Command-Line Tools: Solid grasp of Cloud Shell and the gsutil and bq command-line utilities for efficient management of GCP resources.
  • Collaboration: Works in data-driven settings and fosters cross-team collaboration to deliver products and services built on innovative technologies such as Artificial Intelligence and Machine Learning.
  • Data Pipeline Execution: Designs and executes data pipelines on the GCP platform, optimizing data flow strategies for insight generation.
  • Containerization and Kubernetes: Proficient with Docker and Kubernetes for containerization and orchestration, with particular experience managing deployments on Google Kubernetes Engine (GKE).
  • Azure DevOps Expertise: Builds reusable YAML pipelines, creates CI/CD pipelines in Azure DevOps, and implements Git-flow branching strategies.
  • Azure Cloud Services: Solid understanding of Azure services including Databricks, Data Factory, Data Lake, and Function Apps for effective data management and analytics.
  • Efficient Data Integration: Expertise in designing and deploying SSIS packages for data extraction, transformation, and loading into Azure SQL Database and Data Lake Storage.
  • Hadoop Proficiency: Strong support experience across major Hadoop distributions (Cloudera, Amazon EMR, Azure HDInsight, Hortonworks), using HDFS, MapReduce, Spark, Kafka, Hive, and more.
  • Real-Time Data Solutions (Azure): Builds real-time data pipelines and analytics using Azure components such as Data Factory, HDInsight, and Stream Analytics.
  • API Development and Integration: Develops highly scalable and resilient RESTful APIs, ETL solutions, and third-party platform integrations within the GCP ecosystem.
  • AWS Cloud Services: Proficient with AWS services such as EC2, S3, Glue, Athena, DynamoDB, and Redshift, with hands-on experience in Hadoop ecosystem tools.
  • Legacy Data Migration: Led migrations from Teradata to AWS Redshift and from on-premises systems to AWS Cloud, as well as SQL database migrations to GCP Data Lake, BigQuery, and related services.
  • AWS Cloud-Based Pipelines: Uses AWS services such as EMR, Lambda, and Redshift to develop cloud-based pipelines and Spark applications.
  • Azure Data Services: ETL expertise with Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics), along with ingestion and processing in Azure Databricks.
  • GCP Data Services: Expertise in GCP data services, including Cloud Composer for orchestrating data tasks, with deep ETL experience using services such as Dataflow.
  • DevOps and Scripting: Skilled in PowerShell, Bash, YAML, JSON, Git, REST APIs, and Azure Resource Manager (ARM) templates for effective pipeline management.
  • Data Visualization and Analysis: Creates data visualizations with Python, Scala, and Tableau, and develops Spark scripts for data transformation.
  • Big Data Ecosystem: Extensive experience with Amazon EC2, Azure Cloud, and Big Data tools such as Hadoop, HDFS, MapReduce, Hive, HBase, Spark, Kafka, Flume, Avro, Sqoop, and PySpark.
  • Database Migration: Migrates SQL databases to AWS Redshift and Azure Data Lake; manages SQL databases, Parquet files, and JSON parsing.
  • Cloud Computing and Big Data Tools: Proficient with AWS and Azure components, with working knowledge of Spark using Scala and PySpark.
  • Real-Time Data Solutions (AWS): Builds real-time data pipelines and analytics using AWS components and sets up workflows with tools such as Apache Airflow and Oozie.
  • Database Expertise: Works with SQL Server and MySQL databases; skilled in setting up workflows with Apache Airflow.
  • IDEs and Version Control: Proficient with version control systems such as Git and IDEs such as PyCharm and IntelliJ for efficient code management.
  • Spark Streaming: Develops Spark streaming modules for RabbitMQ and Kafka data ingestion in Azure and GCP environments.
  • Windows Scripting and Cloud Containerization: Proficient in scripting and debugging in Windows environments; familiar with container orchestration, Kubernetes, Docker, and Azure Kubernetes Service (AKS).

Overview

8 years of professional experience

Work History

Senior Data Engineer

Cencora
TX
01.2023 - Current
  • Developing and deploying robust data pipelines using Apache Airflow to handle Extract, Transform, Load (ETL) tasks, ensuring seamless data flow from source to destination.
  • Utilizing Cloud Dataflow and Python to perform comprehensive data validation between source files and BigQuery tables, assuring data integrity and reliability (see the sketch after this list).
  • Engineering PySpark scripts to efficiently merge dynamic and static files, cleanse data, and perform intricate transformations, enabling high-quality data for analysis.
  • Deploying applications onto GCP using Spinnaker, overseeing the end-to-end project lifecycle, from design and development to deployment and maintenance.
  • Implementing novel iterative development procedures in the JupyterLab-based AI Notebooks IDE.
  • Leveraging Google Cloud Composer for advanced workflow monitoring and logging, providing real-time insights into task progress and status throughout project phases.
  • Proficiency in data ingestion, transformation, and loading into Redis, enhancing data accessibility and usability, while maintaining high performance.
  • Deploying and configuring Apache Flink clusters for efficient, real-time data processing, ensuring optimal data flow and manipulation.
  • Designing and executing complex data processing pipelines using Apache Flink, supporting intricate data transformations and analytics.
  • Leveraging IBM Streams for processing data from diverse sources, utilizing various processing operators and advanced analytics techniques.
  • Developing a new data schema for the data consumption store used by the Machine Learning and AI models, reducing processing time with SQL, Hadoop, and cloud services.
  • Ensuring data security and privacy by implementing Google Cloud Dataproc with integrated encryption mechanisms, safeguarding sensitive project data.
  • Proficiency in Redis Pub/Sub for real-time messaging and event-driven architectures, enabling responsive data interactions within projects.
  • Developing and executing complex queries, calculations, and aggregations using OBIEE, providing actionable insights to project stakeholders.
  • Leading multiple AI and Machine Learning programs within the product suite, including a Predictive Analytics service (baselining and forecasting of performance and security KPIs) and a Security Analytics anomaly-detection service (clustering devices by behavior over time), using NLP, LSTM, Kubeflow, Docker, AWS SageMaker, and AWS Greengrass.
  • Demonstrating expertise in GCP Dataproc, GCS, Cloud Functions, and BigQuery, harnessing the full capabilities of these services for efficient data management and analysis.
  • Utilizing the GCP Cloud Shell SDK for streamlined configuration of services such as Dataproc, Storage, and BigQuery, ensuring project readiness.
  • Optimizing PySpark jobs for Kubernetes Clusters, significantly improving data processing speed and efficiency for real-time projects.
  • Transforming Hive/SQL queries into Spark transformations, streamlining data processing for effective decision-making.
  • Designing complex Directed Acyclic Graphs (DAGs) in Google Cloud Composer to automate intricate project workflows, enhancing efficiency and reliability.
  • Successfully launching multi-node Kubernetes clusters on Google Kubernetes Engine (GKE), ensuring scalability and resilience for data-intensive projects.
  • Specialization in Redis data modeling and cache management, optimizing data accessibility and performance in high-throughput project environments.
  • Proficiency in Python libraries like pandas, NumPy, and SQLAlchemy for comprehensive data processing and analysis, supporting data-driven decision-making.
  • Demonstrating expertise in Snowflake, including table and view creation, optimization techniques, and ensuring high query performance for data analytics projects.
  • Utilizing various GCP utilities and services for cloud-based ETL and data management, streamlining data processing capabilities for data-centric projects.
  • Streamlining large-scale data processing using Google Cloud Dataproc, ensuring efficient and effective data handling for high-volume, real-time projects.
  • Proficiently managing SQL and PL/SQL using SQL Developer for Oracle databases, ensuring robust data management and manipulation for project data.
  • Leveraging Sqoop for seamless data import and export into HDFS and Hive, simplifying data integration for data-centric projects.
  • Implementing Hibernate for efficient data persistence, employing HQL-based queries for comprehensive CRUD operations, supporting project data integrity.
  • Designing Airflow scripts with Python for workflow automation, enabling the seamless execution of project tasks and processes.
  • Playing a pivotal role in performance tuning and optimization of applications, ensuring optimal resource utilization and responsiveness for critical projects.
  • Environment: GCP, Python, Google Cloud Pub/Sub, Google BigQuery, Google Cloud Storage (GCS) bucket, Kubernetes, Docker, Oracle, Hadoop, PySpark, Apache Airflow, SQL, JSON, Jenkins, Git (CI/CD), Cucumber (with Python).
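
A minimal, illustrative Airflow sketch of the source-to-BigQuery row-count validation described above. The bucket, file, table, and DAG names are hypothetical placeholders, not actual project artifacts.

    # Illustrative Airflow DAG: check that a BigQuery table's row count matches
    # the row count of its source CSV in GCS. All names are placeholders.
    # Requires the google-cloud-storage and google-cloud-bigquery packages.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from google.cloud import bigquery, storage


    def validate_row_counts():
        # Count data rows in the source file (header excluded).
        blob = storage.Client().bucket("example-bucket").blob("daily/orders.csv")
        source_rows = len(blob.download_as_text().splitlines()) - 1

        # Count rows loaded into the destination BigQuery table.
        query = "SELECT COUNT(*) AS n FROM `example-project.analytics.orders`"
        table_rows = next(iter(bigquery.Client().query(query).result())).n

        if source_rows != table_rows:
            raise ValueError(f"Row count mismatch: {source_rows} vs {table_rows}")


    with DAG(
        dag_id="orders_etl_validation",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="validate_row_counts",
                       python_callable=validate_row_counts)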

Senior Data Engineer

Target
NY
02.2022 - 12.2023
  • Developed RESTful APIs using Python with Flask and Django frameworks, integrating diverse data sources (see the sketch after this list).
  • Leveraged Apache Spark with Python for Big Data Analytics and Machine Learning applications.
  • Designed and deployed SSIS packages for data loading and transformation within Azure databases.
  • Configured and managed SSIS Integration Runtime for executing packages in Azure.
  • Proficient in version control systems like Git for managing Data Lakehouse pipelines.
  • Automated monitoring tasks using AWS CloudWatch and Redshift Query Performance Insights.
  • Conducted performance tuning and optimization of existing AWS Redshift stored procedures.
  • Implemented automated backups and snapshots for AWS RDS instances.
  • Leveraged Python libraries for advanced data analysis and integration tasks.
  • Designed and developed RESTful APIs for data integration.
  • Collaborated with data scientists and analysts to deploy machine learning models using Python.
  • Proficient in data validation and cleansing procedures using SQL.
  • Experience working with Microsoft Azure Cloud services.
  • Executed ETL operations using Azure Data Factory, Spark SQL, and T-SQL.
  • Expertise in data migration to various Azure services.
  • Orchestrated data extraction, transformation, and loading across Azure services.
  • Automated script execution through Apache Airflow and shell scripting.
  • Constructed pipelines in Azure Data Factory.
  • Led Data Migration initiatives employing SQL, SQL Azure, and Azure Data Factory.
  • Proficiently profiled structured, unstructured, and semi-structured data from various sources.
  • Employed PowerShell and UNIX scripts for various tasks.
  • Leveraged Sqoop for data transfer between RDBMS and HDFS.
  • Installed and configured Apache Airflow for data workflows.
  • Employed MongoDB for data storage.
  • Developed RESTful APIs, ETL solutions, and platform integrations.
  • Proficiently used IDEs and version control systems.
  • Environment: Azure, Python, Azure Event Hubs, Azure Data Lake Storage, Azure Databricks, Kubernetes, Docker, Azure SQL Database, Azure HDInsight (Hadoop), PySpark, Apache Airflow, SQL, JSON, Jenkins, Git (CI/CD), Cucumber (with Python).
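
A minimal sketch of the kind of Flask-based REST endpoint described above, assuming data served from Azure SQL Database via SQLAlchemy (1.4+ style). The route, table, and connection string are hypothetical placeholders.

    # Illustrative Flask endpoint serving rows from Azure SQL via SQLAlchemy.
    # The connection string, table, and route are placeholder examples.
    from flask import Flask, jsonify
    from sqlalchemy import create_engine, text

    app = Flask(__name__)
    engine = create_engine(
        "mssql+pyodbc://user:pass@example-server.database.windows.net/exampledb"
        "?driver=ODBC+Driver+17+for+SQL+Server"
    )

    @app.route("/api/customers/<int:customer_id>", methods=["GET"])
    def get_customer(customer_id):
        # Parameterized query keeps the handler simple and avoids SQL injection.
        with engine.connect() as conn:
            row = conn.execute(
                text("SELECT id, name, segment FROM customers WHERE id = :id"),
                {"id": customer_id},
            ).mappings().first()
        if row is None:
            return jsonify({"error": "not found"}), 404
        return jsonify(dict(row))

    if __name__ == "__main__":
        app.run(debug=True)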

Data Engineer

Charles Schwab
NY
03.2020 - 01.2022
  • Contributed to the analysis, design, and development phases of the Software Development Lifecycle (SDLC) in an agile environment.
  • Leveraged PySpark extensively for transformations and data processing on Azure HDInsight.
  • Developed Azure ML Studio pipelines, integrating Python for machine learning algorithms.
  • Orchestrated data loading from diverse sources to Azure Data Lake using Azure Data Factory.
  • Implemented business rules for contact deduplication using Spark transformations with PySpark (see the sketch after this list).
  • Designed and deployed multi-tier applications on AWS services, focusing on high-availability and fault tolerance.
  • Developed Graph Database nodes and relations using Cypher language.
  • Built microservices with AWS Lambda for third-party vendor API calls.
  • Provided support for cloud instances on AWS, managing resources and security.
  • Engineered data pipelines using Spark, Hive, Pig, Python, Impala, and HBase for customer data.
  • Configured AWS services including EC2, S3, Elastic Load Balancing, and security measures.
  • Automated backups and data management tasks using AWS CLI.
  • Created and managed Docker containers for application deployment.
  • Proficient in container clustering with Docker Swarm, Mesos, and Kubernetes.
  • Established and managed Jenkins CI/CD pipelines for automation.
  • Managed artifacts within Nexus repository and deployed them using Ansible and Jenkins.
  • Utilized monitoring tools like Nagios, Splunk, AppDynamics, and CloudWatch.
  • Set up JIRA as a defect tracking system for bug and issue tracking.
  • Environment: Azure, Azure Data Lake Storage, Azure Databricks, Azure HDInsight (Spark), Azure Stream Analytics, Azure SQL Database, Azure Machine Learning, Azure Data Factory, Spark, Spark Streaming, Spark SQL, HDFS, Hive, Apache Kafka, Sqoop, Java, Scala, Linux, Jenkins, Git (CI/CD), Flask Framework, IntelliJ IDEA, Eclipse, Tableau, MySQL, Postman, Agile Methodologies, Azure Functions, Docker.
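
A minimal PySpark sketch of the contact-deduplication pattern referenced above, keeping the most recently updated record per e-mail address. The storage paths and column names are hypothetical placeholders.

    # Illustrative PySpark dedup: keep the latest record per contact e-mail.
    # Paths and column names are placeholder examples.
    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("contact_dedup").getOrCreate()

    contacts = spark.read.parquet(
        "abfss://raw@examplelake.dfs.core.windows.net/contacts/"
    )

    # Rank records within each (normalized) e-mail by recency, keep the newest.
    w = (Window.partitionBy(F.lower(F.trim(F.col("email"))))
               .orderBy(F.col("updated_at").desc()))

    deduped = (
        contacts.withColumn("rn", F.row_number().over(w))
                .filter(F.col("rn") == 1)
                .drop("rn")
    )

    deduped.write.mode("overwrite").parquet(
        "abfss://curated@examplelake.dfs.core.windows.net/contacts_deduped/"
    )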

Data Engineer

Magneto IT Solutions
Pune
08.2017 - 10.2019
  • Captured comprehensive business, system, and design requirements, conducting gap analysis.
  • Designed a dynamic, cross-device, cross-browser, and mobile-friendly web dashboard using Angular JS.
  • Orchestrated the development of bot-framework conversation flows with Node-RED, Node.js, and the Microsoft Bot Framework.
  • Managed SSIS packages for seamless data integration and transformation within Azure.
  • Implemented client-side validation with Validation Controls and Angular Material Design.
  • Engineered Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation.
  • Established a CI/CD pipeline with Jenkins and Airflow for containerization using Docker and Kubernetes.
  • Orchestrated ETL operations using SSIS, NIFI, Python scripts, and Spark Applications.
  • Implemented data quality checks with Spark Streaming for data integrity (see the sketch after this list).
  • Utilized Python, Informatica, MS SQL SERVER, T-SQL, SSIS, SSRS, and SQL Server Management Studio for various tasks.
  • Environment: Hadoop (Cloudera distribution), SSIS, SQL, MapReduce, HDFS, Hive, Pig, Java, Linux, XML, Eclipse, Oracle 10g, PL/SQL.
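
A minimal Spark Structured Streaming sketch of the data-quality checks mentioned above, assuming JSON events arriving on a Kafka topic. The broker, topic, schema, and output paths are hypothetical placeholders.

    # Illustrative streaming data-quality check: parse Kafka JSON events and
    # keep only records that satisfy basic integrity rules. Names are placeholders.
    # Requires the spark-sql-kafka connector on the classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("dq_stream").getOrCreate()

    schema = StructType([
        StructField("order_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "orders")
        .load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Integrity rules: key must be present and the amount non-negative.
    valid = events.filter(F.col("order_id").isNotNull() & (F.col("amount") >= 0))

    (valid.writeStream.format("parquet")
         .option("path", "/data/curated/orders")
         .option("checkpointLocation", "/data/checkpoints/orders")
         .start()
         .awaitTermination())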

Skills

  • Kafka
  • Cassandra
  • Apache Spark
  • HBase
  • Impala
  • Hadoop
  • HDFS
  • MapReduce
  • Hive
  • Pig
  • Sqoop
  • Flume
  • Oozie
  • Zookeeper
  • Cloudera CDH
  • Hortonworks HDP
  • SQL
  • Python
  • PySpark
  • Scala
  • Shell Scripting
  • Java
  • Regular Expressions
  • RDD
  • Spark SQL
  • Spark Streaming
  • Azure
  • AWS
  • GCP
  • Oracle
  • Teradata
  • MySQL
  • SQL Server
  • NoSQL Databases
  • Git
  • Maven
  • SBT
  • Kubernetes
  • Docker
  • Power BI
  • Tableau

Timeline

Senior Data Engineer

Cencora
01.2023 - Current

Senior Data Engineer

Target
02.2022 - 12.2023

Data Engineer

Charles Schwab
03.2020 - 01.2022

Data Engineer

Magneto IT Solutions
08.2017 - 10.2019