P V Krishna

Summary

Seasoned Data Engineer with over 7 years of experience designing, implementing, and managing complex data pipelines and ETL processes across Databricks, AWS, Azure, Snowflake, and the Hadoop ecosystem.
Expert in SQL, Python, and PySpark for designing and implementing data pipelines within Databricks and for optimizing data processing and analytics tasks. Proficient in developing Databricks Notebooks and applying transformations with Spark SQL for seamless data flow and integration across platforms. Skilled in improving data integrity and availability by building and maintaining data transformation processes with Scala and SQL in Databricks. Experienced with Apache Spark and other distributed computing frameworks for processing and analyzing large-scale datasets.
Implemented and automated data movement in and out of Snowflake using features such as Snowpipe, UDFs, and zero-copy clones. Created and managed Azure Data Factory (ADF) pipelines, integrating Azure services such as Azure Databricks and Azure Synapse Analytics. Configured and managed Azure Blob Storage and Azure Data Lake Storage, alongside AWS S3, for scalable data storage. Developed and deployed machine learning models with Azure Machine Learning Studio and Databricks MLflow, and used AWS Lambda for serverless data processing tasks. Employed AWS Redshift for data warehousing, Amazon Athena for ad-hoc queries, and AWS Glue for ETL processing, with Apache Airflow for workflow orchestration.
Developed Hive and Bash scripts for automation, using Sqoop and Hadoop Filesystem APIs for data ingestion. Hands-on experience in Hadoop administration and support, including cluster management and maintenance with Cloudera Manager and Ambari.
Used Java and Spring Boot to design and develop RESTful APIs within a microservices architecture, with Docker for containerization and Kubernetes on OpenShift for orchestration. Implemented data streaming with Kafka for real-time data processing between microservices. Integrated Cassandra for NoSQL data storage and Elasticsearch for advanced search and analytics, and worked with data formats such as AVRO, Parquet, and ORC. Implemented security best practices in API development, including data encryption and authorization mechanisms, and integrated SonarQube and Fortify for continuous code quality and security assessments. Monitored application performance and managed cloud resources using Azure Monitor, Splunk, Kibana, and Elasticsearch for log analysis and performance metrics.
Worked in Agile Scrum teams, using Confluence and Jira for project management and collaboration. Conducted code reviews and collaborated with QA teams to deliver robust, error-free applications. Delivered CI/CD pipelines using GitLab, focusing on automation, continuous integration, and deployment to support a robust production environment. Committed to continuous learning and professional development, exploring emerging technologies such as machine learning pipelines with TensorFlow and PyTorch, and blockchain for enhanced data security and transparency.

Overview

7 years of professional experience

Work History

Senior Data Engineer

Cigna
02.2023 - Current
  • Designed and implemented data pipelines in Databricks, leveraging SQL, Python, and PySpark for data processing and analysis
  • Developed Databricks Notebooks to apply transformations using Spark SQL, ensuring seamless data flow and integration across systems, databases, and applications
  • Constructed and maintained processes for data transformation, data structures, metadata, dependency, and workload management, enhancing data integrity and availability
  • Delivered CI/CD pipelines using GitLab, emphasizing automation, continuous integration, and deployment practices to support a robust production environment
  • Performed hands-on data refinement and performance tuning using PySpark, Scala, and SQL within Databricks, optimizing data processing and analytics tasks
  • Worked with distributed computing frameworks like Apache Spark to process and analyze large-scale datasets, demonstrating expertise in big data technologies and methodologies
  • Implemented Snowflake features including Snowpipe, UDFs, zero-copy clones, Time Travel, micro-partitions, stored procedures, data import/export, and external tables for efficient cloud data management
  • Automated data movement in and out of Snowflake using Snowpipe, Streams, and Tasks, ensuring efficient data pipelines and data availability
  • Executed test plans to identify and analyze defects, integrated new code seamlessly, and identified performance bottlenecks, ensuring high-quality software delivery
  • Created Azure Data Factory (ADF) Pipelines and managed custom Azure development, showcasing proficiency in cloud data orchestration and troubleshooting
  • Integrated Azure services like Azure Databricks and Azure Synapse Analytics for enhanced data processing and analytics capabilities
  • Managed and configured Azure Blob Storage and Azure Data Lake Storage for data storage solutions
  • Utilized Snowflake's Data Sharing capabilities to securely share real-time, read-only data with external partners
  • Designed and implemented security measures in Snowflake, configuring roles, warehouses, and access controls
  • Optimized query performance in Snowflake by utilizing Query History and Warehouse Scaling features
  • Leveraged Azure Logic Apps and Azure Functions for creating serverless event-driven architectures
  • Developed and deployed machine learning models within Azure Machine Learning Studio and Databricks MLflow
  • Monitored and managed cloud resources using Azure Monitor and Snowflake's Account Usage and Warehouse Metrics.

Sr. Data Engineer

Wipro Limited
08.2021 - 01.2023
  • Designed, built, and automated ETL processes using AWS Redshift and S3, alongside AWS Lambda for serverless data processing tasks, ensuring efficient data flow and optimal storage solutions
  • Leveraged Amazon Athena for ad-hoc query executions directly on data stored in S3, enhancing data accessibility and analytical capabilities
  • Employed AWS Glue for robust ETL processing, integrating additional tools like Apache Airflow for workflow orchestration, automating and scheduling complex data pipelines
  • Conducted data profiling and synthesized findings using advanced analytics tools like Amazon QuickSight and Google Data Studio, ensuring comprehensive data quality assessments
  • Utilized advanced features of Snowflake for data warehousing, implementing data sharing and cloning capabilities to enhance data availability and disaster recovery strategies
  • Applied data modeling techniques using tools such as Erwin and dbt (data build tool) to manage data transformations and version control in the data warehouse
  • Built and managed scalable data pipelines using Azure Databricks in conjunction with PySpark, integrating Azure Data Lake Storage for extensive data volume management, ensuring robust data processing capabilities
  • Addressed and resolved performance bottlenecks by implementing indexing and partitioning strategies in databases and data lakes, using tools such as Amazon Redshift Spectrum for querying data across AWS storage services
  • Engaged in Agile Scrum processes, effectively using tools like Confluence alongside Jira for documentation, project tracking, and sprint planning, fostering a collaborative project environment
  • Implemented CI/CD for data pipelines using Jenkins and GitLab CI, automating testing and deployment to ensure delivery of high-quality data solutions
  • Executed complex data integration tasks from a variety of sources, incorporating stream-processing systems like Apache Flink for real-time data processing, alongside traditional batch processing
  • Enhanced data ingestion capabilities with Kafka Connect, facilitating real-time data flows into and out of Apache Kafka, supporting high-throughput and scalable data pipelines
  • Committed to continuous learning and professional development, exploring emerging technologies such as machine learning pipelines with TensorFlow and PyTorch, and blockchain data integrity solutions for enhanced security and transparency.

Hadoop and Spark Developer

Optum
11.2018 - 08.2021
  • Developed Hive and Bash scripts for validating and transforming source data, automating data loading into HDFS and Hive for preprocessing using One Automation, demonstrating proficiency in scripting and automation
  • Gathered data from data warehouses in Teradata and Snowflake, showcasing skills in data extraction and integration from diverse data warehousing technologies
  • Developed Spark/Scala and Python scripts for a regular expression project in the Hadoop/Hive environment, highlighting expertise in big data processing and programming languages
  • Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata, showing advanced skills in ETL processes and data warehousing solutions
  • Generated reports using Tableau, emphasizing capabilities in data visualization and reporting for executive decision-making
  • Built Big Data applications using Cassandra and Hadoop, demonstrating hands-on experience in developing scalable big data solutions
  • Used Sqoop and Hadoop Filesystem APIs to implement ETL data ingestion pipelines, showcasing knowledge of data ingestion tools and methodologies
  • Worked on batch data of different granularity ranging from hourly to monthly, illustrating the ability to manage and process data at varying intervals
  • Performed Hadoop administration and support activities, including installation and configuration of Apache Big Data tools and Hadoop clusters using Cloudera Manager, indicating a strong foundation in Hadoop ecosystem management
  • Handled Hadoop cluster installations in various environments such as Unix, Linux, and Windows, showing versatility in operating systems
  • Assisted in upgrading, configuring, and maintaining Hadoop ecosystem components such as Ambari, Pig, and Hive, demonstrating maintenance skills in big data platforms
  • Wrote SQL queries and stored procedures in Teradata, loaded data into Snowflake, and wrote SnowSQL scripts, highlighting database programming and data loading expertise
  • Created TDCH scripts for full and incremental refresh of Hadoop tables, showcasing skills in data synchronization and update mechanisms
  • Worked with various data formats like AVRO, Sequence File, JSON, Map File, Parquet, and ORC, indicating versatility in handling different data serialization formats
  • Gained extensive experience in Teradata, Hadoop/Hive, Spark, SQL, PL/SQL, and SnowSQL, covering a broad range of database and data processing technologies
  • Designed and published visually rich and intuitive Tableau dashboards and Crystal Reports, emphasizing skills in advanced data visualization techniques
  • Handled large datasets using partitions, Spark in-memory capabilities, broadcast variables in PySpark, and efficient joins and transformations, highlighting optimization techniques for big data processing
  • Developed Spark scripts in Scala and PySpark using both DataFrames/SQL and RDD/MapReduce for data aggregation, queries, and writing data back into OLTP systems through Sqoop, showcasing advanced Spark programming and data integration skills
  • Worked with SQL, T-SQL, PL/SQL scripts, views, indexes, stored procedures, and other components of database applications, indicating a strong foundation in relational database management
  • Engaged with the Hadoop ecosystem from Hortonworks Data Platform and managed services through Cloudera Manager, showing expertise in big data platform administration
  • Employed Agile Scrum methodology for development, demonstrating proficiency in agile project management and teamwork.

API Developer

Emirates
04.2017 - 03.2018
  • Utilized Java and Spring Boot to design and develop highly scalable and maintainable RESTful APIs within a microservices architecture
  • Employed Docker for containerization and Kubernetes on OpenShift for orchestrating and managing microservices deployments
  • Implemented data streaming processes using Kafka for efficient and reliable data flow between microservices
  • Integrated microservices with Cassandra for NoSQL data storage and Elasticsearch for advanced search and analytics capabilities
  • Managed source code with Git and orchestrated CI/CD pipelines using Jenkins, integrating SonarQube and Fortify for continuous code quality and security assessments
  • Wrote JUnit test cases for Controller, Service, and DAO layers using Mockito and DBUnit
  • Developed unit test cases using a proprietary framework similar to JUnit
  • Used the JUnit framework for unit testing and ANT for building and deploying applications on WebLogic Server
  • Monitored application performance and troubleshot issues using tools like Splunk, Kibana, and Elasticsearch for log analysis and performance metrics
  • Implemented security best practices in API development, including secure coding techniques, data encryption, authentication, and authorization mechanisms
  • Integrated and utilized SonarQube for continuous code quality monitoring and technical debt management
  • Conducted code reviews and collaborated with QA teams to ensure the delivery of robust, error-free applications
  • Worked within Agile development frameworks, actively participating in sprints, stand-ups, and retrospectives
  • Collaborated with cross-functional teams, including data scientists, analysts, and product managers, to understand requirements and provide technical solutions, documenting API designs and system architecture.

Education

Master of Science - Information Technology

Trine University
Angola, IN
12.2023

Skills

  • Java, Python, Scala
  • Linux (Ubuntu, CentOS), Windows, macOS
  • Agile
  • MS SQL Server 2000/2005/2008/2012, PostgreSQL, Oracle 11g, MS Access, MySQL
  • Shell scripting, UNIX, R
  • Azure Blob Storage, Azure Synapse Analytics, Azure Data Factory (ADF), ADLS Gen2
