P V Krishna

Summary

As a seasoned Data Engineer with over 7 years of experience, I specialize in the design, implementation, and management of complex data pipelines and ETL processes across Databricks, AWS, Azure, Snowflake, and the Hadoop ecosystem.

Expert in SQL, Python, and PySpark for designing and implementing data pipelines within Databricks and optimizing data processing and analytics tasks. Proficient in developing Databricks Notebooks and applying transformations using Spark SQL for seamless data flow and integration across platforms. Skilled in enhancing data integrity and availability by building and maintaining data transformation processes with Scala and SQL within Databricks. Experienced with Apache Spark and other distributed computing frameworks for processing and analyzing large-scale datasets.

Implemented and automated data movement in and out of Snowflake using features like Snowpipe, UDFs, and zero copy clones for efficient cloud data management. Created and managed Azure Data Factory (ADF) pipelines, integrating Azure services like Azure Databricks and Azure Synapse Analytics for enhanced data processing capabilities. Configured and managed Azure Blob Storage and Azure Data Lake Storage, alongside AWS S3, for scalable data storage. Developed and deployed machine learning models within Azure Machine Learning Studio and Databricks MLflow, and used AWS Lambda for serverless data processing tasks. Employed AWS Redshift for data warehousing, Amazon Athena for ad-hoc queries, and AWS Glue for robust ETL processing, with Apache Airflow for workflow orchestration.

Developed Hive and Bash scripts for automation and used SQOOP and Hadoop Filesystem APIs for data ingestion. Hands-on experience in Hadoop administration and support, including the use of Cloudera Manager and Ambari for cluster management and maintenance.

Utilized Java and Spring Boot to design and develop RESTful APIs within a microservices architecture, employing Docker and Kubernetes on OpenShift for containerization and orchestration. Implemented data streaming processes using Kafka for real-time data flow between microservices. Integrated Cassandra for NoSQL data storage and Elasticsearch for advanced search and analytics, and worked with data storage formats like AVRO, Parquet, and ORC.

Engaged in Agile Scrum methodologies, using Confluence and Jira for project management and collaboration. Monitored application performance and managed cloud resources using Azure Monitor, Splunk, Kibana, and Elasticsearch for log analysis and performance metrics. Implemented security best practices in API development, including data encryption and authorization mechanisms, and integrated SonarQube and Fortify for continuous code quality and security assessments. Conducted code reviews and collaborated with QA teams to deliver robust, error-free applications. Delivered CI/CD pipelines using GitLab, focusing on automation, continuous integration, and deployment to support a robust production environment. Committed to continuous learning and professional development, exploring emerging technologies like machine learning pipelines with TensorFlow and PyTorch, and blockchain for enhanced data security and transparency.

Overview

7+ years of professional experience

Work History

Senior Data Engineer

Cigna

02.2023 - Current

Designed and implemented data pipelines in Databricks, leveraging SQL, Python, and PySpark for data processing and analysis
Developed Databricks Notebooks to apply transformations using Spark SQL, ensuring seamless data flow and integration across systems, databases, and applications
Constructed and maintained processes for data transformation, data structures, metadata, dependency, and workload management, enhancing data integrity and availability
Delivered CI/CD pipelines using GitLab, emphasizing automation, continuous integration, and deployment practices to support a robust production environment
Performed hands-on data refinement and performance tuning using PySpark, Scala, and SQL within Databricks, optimizing data processing and analytics tasks
Worked with distributed computing frameworks like Apache Spark to process and analyze large-scale datasets, demonstrating expertise in big data technologies and methodologies
Implemented Snowflake features including Snowpipe, UDFs, zero copy clones, time travel, micro-partitions, stored procedures, data import/export, and external tables for efficient cloud data management
Automated data movement in and out of Snowflake using Snowpipe, Streams, and Tasks, ensuring efficient data pipeline construction and data availability
Executed test plans to identify and analyze defects, integrated new code seamlessly, and identified performance bottlenecks, ensuring high-quality software delivery
Created Azure Data Factory (ADF) Pipelines and managed custom Azure development, showcasing proficiency in cloud data orchestration and troubleshooting
Integrated Azure services like Azure Databricks and Azure Synapse Analytics for enhanced data processing and analytics capabilities
Managed and configured Azure Blob Storage and Azure Data Lake Storage for data storage solutions
Utilized Snowflake's Data Sharing capabilities to securely share real-time, read-only data with external partners
Designed and implemented security measures in Snowflake, configuring roles, warehouses, and access controls
Optimized query performance in Snowflake by utilizing Query History and Warehouse Scaling features
Leveraged Azure Logic Apps and Azure Functions for creating serverless event-driven architectures
Developed and deployed machine learning models within Azure Machine Learning Studio and Databricks MLflow
Monitored and managed cloud resources using Azure Monitor and Snowflake's Account Usage and Warehouse Metrics.

Sr. Data Engineer

Wipro Limited

08.2021 - 01.2023

Designed, built, and automated ETL processes using AWS Redshift and S3, alongside AWS Lambda for serverless data processing tasks, ensuring efficient data flow and optimal storage solutions
Leveraged Amazon Athena for ad-hoc query executions directly on data stored in S3, enhancing data accessibility and analytical capabilities
Employed AWS Glue for robust ETL processing, integrating additional tools like Apache Airflow for workflow orchestration, automating and scheduling complex data pipelines
Conducted data profiling and synthesized findings using advanced analytics tools like Amazon QuickSight and Google Data Studio, ensuring comprehensive data quality assessments
Utilized advanced features of Snowflake for data warehousing, implementing data sharing and cloning capabilities to enhance data availability and disaster recovery strategies
Applied sophisticated data modeling techniques using tools like Erwin or dbt (data build tool) for managing data transformations and version control in the data warehouse
Built and managed scalable data pipelines using Azure Databricks in conjunction with PySpark, integrating Azure Data Lake Storage for extensive data volume management, ensuring robust data processing capabilities
Addressed and resolved performance bottlenecks by implementing indexing and partitioning strategies in databases and data lakes, using tools such as Amazon Redshift Spectrum for querying data across AWS storage services
Engaged in Agile Scrum processes, effectively using tools like Confluence alongside Jira for documentation, project tracking, and sprint planning, fostering a collaborative project environment
Implemented CI/CD for data pipelines using Jenkins and GitLab CI, automating testing and deployment processes to ensure delivery of high-quality data solutions
Executed complex data integration tasks from a variety of sources, incorporating stream-processing systems like Apache Flink for real-time data processing, alongside traditional batch processing
Enhanced data ingestion capabilities with Kafka Connect, facilitating real-time data flows into and out of Apache Kafka, supporting high-throughput and scalable data pipelines
Committed to continuous learning and professional development, exploring emerging technologies such as machine learning pipelines with TensorFlow and PyTorch, and blockchain data integrity solutions for enhanced security and transparency.

Hadoop and Spark Developer

Optum

11.2018 - 08.2021

Developed Hive and Bash scripts for validating and transforming source data, automating data loading into HDFS and Hive for preprocessing using One Automation, demonstrating proficiency in scripting and automation
Gathered data from data warehouses in Teradata and Snowflake, showcasing skills in data extraction and integration from diverse data warehousing technologies
Developed Spark/Scala and Python scripts for a regular expression project in the Hadoop/Hive environment, highlighting expertise in big data processing and programming languages
Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata, showing advanced skills in ETL processes and data warehousing solutions
Generated reports using Tableau, emphasizing capabilities in data visualization and reporting for executive decision-making
Built Big Data applications using Cassandra and Hadoop, demonstrating hands-on experience in developing scalable big data solutions
Utilized SQOOP, ETL, and Hadoop Filesystem APIs for implementing data ingestion pipelines, showcasing knowledge in data ingestion tools and methodologies
Worked on batch data of different granularity ranging from hourly to monthly, illustrating the ability to manage and process data at varying intervals
Performed Hadoop administration and support activities, including installation and configuration of Apache big data tools and Hadoop clusters using Cloudera Manager, indicating a strong foundation in Hadoop ecosystem management
Handled Hadoop cluster installations in various environments such as Unix, Linux, and Windows, showing versatility in operating systems
Assisted in upgrading, configuring, and maintaining Hadoop ecosystem components such as Ambari, Pig, and Hive, demonstrating maintenance skills in big data platforms
Wrote SQL queries and stored procedures in Teradata, loaded data into Snowflake, and wrote SnowSQL scripts, highlighting database programming and data loading expertise
Created TDCH scripts for full and incremental refresh of Hadoop tables, showcasing skills in data synchronization and update mechanisms
Worked with various data formats like AVRO, Sequence File, JSON, Map File, Parquet, and ORC, indicating versatility in handling different data serialization formats
Gained extensive experience in Teradata, Hadoop-Hive, Spark, SQL, PL/SQL, and SnowSQL, showing a broad range of database and data processing technologies
Designed and published visually rich and intuitive Tableau dashboards and Crystal Reports, emphasizing skills in advanced data visualization techniques
Handled large datasets using partitions, Spark in-memory capabilities, broadcasts in PySpark, and efficient joins and transformations, highlighting optimization techniques for big data processing
Developed Scala and PySpark scripts using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, queries, and writing data back into OLTP systems through SQOOP, showcasing advanced Spark programming and data integration skills
Worked with SQL, T-SQL, PL/SQL scripts, views, indexes, stored procedures, and other components of database applications, indicating a strong foundation in relational database management
Engaged with the Hadoop ecosystem from Hortonworks Data Platform and managed services through Cloudera Manager, showing expertise in big data platform administration
Employed Agile Scrum methodology for development, demonstrating proficiency in agile project management and teamwork.

API Developer

Emirates

04.2017 - 03.2018

Utilized Java and Spring Boot to design and develop highly scalable and maintainable RESTful APIs within a microservices architecture
Employed Docker for containerization and Kubernetes on OpenShift for orchestrating and managing microservices deployments
Implemented data streaming processes using Kafka for efficient and reliable data flow between microservices
Integrated microservices with Cassandra for NoSQL data storage and Elasticsearch for advanced search and analytics capabilities
Managed source code with Git and orchestrated CI/CD pipelines using Jenkins, integrating SonarQube and Fortify for continuous code quality and security assessments
Wrote JUnit test cases for Controller, Service, and DAO layers using Mockito and DBUnit
Developed unit test cases using a proprietary framework similar to JUnit
Used the JUnit framework for unit testing and ANT for building and deploying applications on WebLogic Server
Monitored application performance and troubleshot issues using tools like Splunk, Kibana, and Elasticsearch for log analysis and performance metrics
Implemented security best practices in API development, including secure coding techniques, data encryption, authentication, and authorization mechanisms
Integrated and utilized SonarQube for continuous code quality monitoring and technical debt management
Conducted code reviews and collaborated with QA teams to ensure the delivery of robust, error-free applications
Worked within Agile development frameworks, actively participating in sprints, stand-ups, and retrospectives
Collaborated with cross-functional teams, including data scientists, analysts, and product managers, to understand requirements and provide technical solutions, documenting API designs and system architecture.

Education

Master of Science - Information Technology

Trine University

Angola, IN

12.2023

Skills

Java, Python, Scala
Linux (Ubuntu, CentOS), Windows, Mac OS
Agile

MS-SQL, PostgreSQL, Oracle 11g, MS-Access, MySQL, SQL Server 2000/2005/2008/2012
Shell scripting, UNIX, Python, R
Azure Blob Storage, Azure Synapse Analytics, ADF, and ADLS Gen2 storage

Timeline

Senior Data Engineer

Cigna

02.2023 - Current

Sr. Data Engineer

Wipro Limited

08.2021 - 01.2023

Hadoop and Spark Developer

Optum

11.2018 - 08.2021

API Developer

Emirates

04.2017 - 03.2018

Master of Science - Information Technology

Trine University
