KARTHIK REDDY KARNA

Houston, TX

Summary

To secure a challenging role as a Data Engineer, leveraging 9+ years of software industry experience with a focus on Azure cloud services and Big Data technologies such as Spark, MapReduce, Hive, YARN, and HDFS, using programming languages including Scala and Python. With 4 years of experience in data warehousing, I possess a deep understanding of ETL processes, data modeling, and warehouse design. I am committed to delivering efficient and scalable data solutions that drive business growth and support strategic decision-making.

  • Highly results-driven Data Engineer specializing in the design and implementation of scalable data ingestion pipelines using Azure Data Factory.
  • Expertise in leveraging Azure Databricks and Spark for distributed data processing and transformation tasks, ensuring optimal performance.
  • Skilled in maintaining data quality and integrity through robust validation, cleansing, and transformation operations.
  • Adept at architecting cloud-based data warehouse solutions on Azure, utilizing Snowflake for efficient data storage, retrieval, and analysis.
  • Extensive experience with Snowflake Multi-Cluster Warehouses and deep understanding of Snowflake cloud technology.
  • Proficient in utilizing advanced Snowflake features such as Clone and Time Travel to enhance database applications.
  • Actively involved in the development, improvement, and maintenance of Snowflake database applications.
  • Expertise in building logical and physical data models for Snowflake, adapting them to meet changing requirements.
  • Skilled in defining roles and privileges to control access to different database objects.
  • Thorough knowledge of Snowflake database, schema, and table structures, enabling efficient data organization and retrieval.
  • Strong collaboration skills, working closely with data analysts and stakeholders to implement effective data models and structures.
  • Proven proficiency in optimizing Spark jobs and utilizing Azure Synapse Analytics for big data processing and advanced analytics.
  • Track record of success in performance optimization and capacity planning to ensure scalability and efficiency.
  • Experienced in developing CI/CD frameworks for automated deployment of data pipelines, collaborating with DevOps teams.
  • Proficient in scripting languages such as Python and Scala, enabling efficient automation and customization.
  • Skilled in utilizing Hive, SparkSQL, Kafka, and Spark Streaming for ETL tasks and real-time data processing.
  • Strong working experience in the Hadoop ecosystem, including HDFS, MapReduce, Hive, and Python.
  • Hands-on expertise in developing large-scale data pipelines using Spark and Hive.
  • Experience in using Apache Sqoop for importing and exporting data between HDFS and relational databases.
  • Proven experience in setting up and managing Hadoop job workflows using Apache Oozie.
  • Optimized query performance in Hive using advanced techniques such as bucketing and partitioning, with extensive experience in Spark job tuning.
  • Highly proficient in Agile methodologies, utilizing JIRA for efficient project management and reporting.

Overview

9 years of professional experience

Work History

Azure Snowflake Data Engineer

CenterPoint Energy
11.2022 - Current
  • Implemented highly scalable data ingestion pipelines using Azure Data Factory, efficiently ingesting data from diverse sources such as SQL databases, CSV files, and REST APIs
  • Developed comprehensive data processing workflows utilizing Azure Databricks, harnessing the power of Spark for distributed data processing and advanced transformations
  • Ensured exceptional data quality and integrity by performing thorough data validation, cleansing, and transformation operations through seamless integration of Azure Data Factory and Databricks
  • Architected and implemented a robust cloud-based data warehouse solution on Azure, leveraging the strengths of Snowflake to achieve outstanding scalability and performance
  • Expertly created and optimized Snowflake schemas, tables, and views, optimizing data storage and retrieval for high-performance analytics and reporting requirements
  • Collaborated closely with data analysts and business stakeholders, gaining deep insights into their needs and translating them into effective data models and structures within Snowflake
  • Developed and fine-tuned Spark jobs to execute intricate data transformations, perform aggregations, and accomplish machine learning tasks on large-scale datasets
  • Leveraged the powerful capabilities of Azure Synapse Analytics to seamlessly integrate big data processing and advanced analytics, unlocking valuable opportunities for data exploration and insight generation
  • Implemented sophisticated event-based triggers and scheduling mechanisms to automate data pipelines and workflows, ensuring optimal efficiency and reliability
  • Established comprehensive data lineage and metadata management solutions, enabling efficient tracking and monitoring of data flow and transformations across the entire ecosystem
  • Identified and successfully addressed performance bottlenecks in both data processing and storage layers, achieving significant enhancements in query execution and data latency reduction
  • Implemented cutting-edge strategies such as partitioning, indexing, and caching in Snowflake and Azure services, resulting in superior query performance and reduced processing time
  • Conducted thorough performance tuning exercises and robust capacity planning to ensure the scalability and efficiency of the entire data infrastructure
  • Developed a robust CI/CD framework for data pipelines using the Jenkins tool, enabling streamlined deployment and continuous integration of data workflows
  • Collaborated closely with DevOps engineers to develop and implement automated CI/CD pipelines and test-driven development practices on Azure, precisely tailored to meet client requirements
  • Proficiently programmed in scripting languages such as Python and Scala, leveraging their flexibility and power to optimize data processes and enable customization
  • Contributed actively to ETL tasks, meticulously maintaining data integrity and performing rigorous pipeline stability checks
  • Demonstrated hands-on expertise in utilizing Kafka, Spark Streaming, and Hive for processing streaming data in specific use cases, unlocking real-time data insights
  • Designed and implemented end-to-end data pipelines encompassing Kafka, Spark, and Hive, effectively ingesting, transforming, and analyzing data for diverse business needs
  • Developed and fine-tuned Spark core and Spark SQL scripts using Scala, achieving remarkable acceleration in data processing capabilities
  • Utilized JIRA extensively for project reporting, creating subtasks for Development, QA, and Partner validation, and effectively managing projects from start to finish
  • Deeply experienced in Agile methodologies, actively participating in the full breadth of Agile ceremonies, from daily stand-ups to internationally coordinated PI Planning sessions
  • Environment: Azure Databricks, Azure Data Factory, Snowflake, Logic Apps, Function App, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, shell scripting, Git, JIRA, Jenkins, Kafka, ADF Pipeline, Power BI

Data Developer

Zions Bank
06.2021 - 10.2022
  • Designed and implemented end-to-end data pipelines using Azure Data Factory to efficiently extract, transform, and load (ETL) data from diverse sources into Snowflake
  • Utilized Azure Databricks to design and implement data processing workflows, leveraging the power of Spark for large-scale data transformations
  • Built optimized and scalable Snowflake schemas, tables, and views to support complex analytics queries and meet stringent reporting requirements
  • Developed data ingestion pipelines using Azure Event Hubs and Azure Functions, enabling real-time data streaming into Snowflake for timely insights
  • Leveraged Azure Data Lake Storage as a robust data lake solution, implementing effective data partitioning and retention strategies
  • Utilized Azure Blob Storage for efficient data file storage and retrieval, implementing compression and encryption techniques for enhanced security and cost optimization
  • Integrated Azure Data Factory with Azure Logic Apps to orchestrate complex data workflows and trigger actions based on specific events
  • Implemented rigorous data governance practices and data quality checks using Azure Data Factory and Snowflake, ensuring data accuracy and consistency
  • Implemented efficient data replication and synchronization strategies between Snowflake and other data platforms, leveraging Azure Data Factory and Change Data Capture techniques
  • Developed and deployed Azure Functions to handle data preprocessing, enrichment, and validation tasks within data pipelines
  • Leveraged Azure Machine Learning in conjunction with Snowflake to implement advanced analytics and machine learning workflows, enabling predictive analytics and data-driven insights
  • Designed and implemented data archiving and retention strategies using Azure Blob Storage and Snowflake's Time Travel feature
  • Developed custom monitoring and alerting solutions using Azure Monitor and Snowflake Query Performance Monitoring (QPM), ensuring proactive identification and resolution of performance issues
  • Integrated Snowflake with Power BI and Azure Analysis Services, enabling the creation of interactive dashboards and reports for self-service analytics by business users
  • Optimized data pipelines and Spark jobs in Azure Databricks to improve performance, leveraging techniques such as Spark configuration tuning, caching, and data partitioning
  • Implemented comprehensive data cataloging and data lineage solutions using tools like Azure Purview and Apache Atlas, providing a holistic view of data assets and their relationships
  • Collaborated closely with cross-functional teams, including data scientists, data analysts, and business stakeholders, to understand data requirements and deliver scalable and reliable data solutions
  • Environment: Azure Databricks, Azure Data Factory, Logic Apps, Snowflake, Function App, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, shell scripting, Git, JIRA, Jenkins, Kafka, ADF Pipeline, Power BI

Big Data Developer

Charter Communications
05.2019 - 05.2021
  • Designed and implemented a robust ETL framework utilizing Sqoop, Pig, and Hive to efficiently extract data from diverse sources and make it readily available for consumption
  • Performed data processing on HDFS and created external tables using Hive, while also developing reusable scripts for table ingestion and repair across the project
  • Developed ETL jobs using Spark and Scala to seamlessly migrate data from Oracle to new MySQL tables
  • Leveraged Spark (RDDs, DataFrames, Spark SQL) and Spark-Cassandra Connector APIs for a range of tasks, including data migration and generation of business reports
  • Created a Spark Streaming application for real-time sales analytics, enabling timely insights and decision-making
  • Conducted comprehensive analysis of source data, efficiently managing data type modifications, and leveraging Excel sheets, flat files, and CSV files for ad-hoc report generation in Power BI
  • Analyzed SQL scripts and designed solutions using PySpark, ensuring efficient data extraction, transformation, and loading processes
  • Utilized Sqoop to extract data from various data sources into HDFS, enabling seamless integration of data into the big data environment
  • Handled data import from multiple sources, performed transformations using Hive and MapReduce, and loaded processed data into HDFS
  • Leveraged Sqoop to extract data from MySQL into HDFS, ensuring seamless integration and availability of MySQL data within the big data environment
  • Implemented automation for deployments using YAML scripts, streamlining the build and release processes for efficient project delivery
  • Worked extensively with a range of big data technologies, including Apache Hive, Apache Pig, HBase, Apache Spark, Zookeeper, Flume, Kafka, and Sqoop
  • Implemented data classification algorithms using well-established MapReduce design patterns, enabling efficient data classification and analysis
  • Utilized advanced techniques such as combiners, partitioning, and distributed cache to optimize the performance of MapReduce jobs
  • Leveraged Git and GitHub repositories for efficient source code management and version control, ensuring seamless collaboration and code tracking
  • Environment: Hadoop, Hive, Spark, PySpark, Sqoop, Spark SQL, shell scripting, Cassandra, YAML, ETL.

Big Data Developer

Bank Of America
05.2018 - 04.2019
  • Created a shell script to automate the creation of staging and landing tables with matching schemas to the source data, generating properties used by Oozie jobs for seamless execution
  • Developed Oozie workflows to orchestrate Sqoop and Hive actions, integrating with NoSQL databases like HBase to create HBase tables for efficient loading of large volumes of semi-structured data from diverse sources
  • Implemented performance optimizations on Spark and Python, diagnosing and resolving performance issues to enhance overall processing efficiency
  • Contributed to the development of a roadmap for migrating enterprise data from multiple data sources, such as SQL Server and provider databases, to S3 as a centralized data hub, enabling improved data accessibility and management across the organization
  • Successfully loaded and transformed extensive sets of structured and semi-structured data from various downstream systems, leveraging Spark and Hive for business-specific transformations
  • Developed Spark applications and automated pipelines for both bulk loads and incremental loads of diverse datasets, ensuring streamlined data processing and efficient data ingestion
  • Wrote scripts to execute Oozie workflows, capturing job logs from the cluster and creating a metadata table that records the execution times of each job for monitoring and analysis purposes
  • Converted existing MapReduce applications into PySpark applications as part of an overarching initiative to modernize legacy jobs and establish a new framework for data processing
  • Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Oozie, Impala, Java (JDK 1.8), Cloudera, Python, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Kafka, Oracle.

Data Warehouse Developer

Truist Bank
02.2014 - 05.2018
  • Managed and administered SQL Server databases, including creation, manipulation, and support of database objects
  • Contributed to data modeling and the design of both physical and logical database structures
  • Assisted in integrating front-end applications with the SQL Server backend, ensuring smooth data interactions and seamless functionality
  • Created stored procedures, triggers, indexes, user-defined functions, and constraints to optimize database performance and retrieve the desired results
  • Utilized Data Transformation Services (DTS) to import and export data between servers, ensuring smooth data transfer and synchronization
  • Wrote T-SQL statements to retrieve data and conducted performance tuning on T-SQL queries for improved query execution times
  • Implemented data transfers from various sources, including MS Excel, MS Access, and flat files, to SQL Server using SSIS/DTS, employing features like data conversion and derived column creation as per requirements
  • Supported the team in resolving issues related to SQL Reporting Services and T-SQL, leveraging expertise in report creation, including cross-tab, conditional, drill-down, top N, summary, form, OLAP, and sub-reports, while ensuring proper formatting
  • Provided application support via phone, offering assistance and resolving queries related to the SQL Server environment
  • Developed and tested Windows command files and SQL Server queries for monitoring the production database in a 24/7 support environment
  • Implemented comprehensive logging for ETL loads at both the package and task levels, capturing and recording the number of records processed by each package and task using SSIS
  • Developed, monitored, and deployed SSIS packages for efficient data integration and transformation processes
  • Environment: IBM WebSphere DataStage EE/7.0/6.0 (Manager, Designer, Director, Administrator), Ascential Profile Stage 6.0, Ascential QualityStage 6.0, Erwin, TOAD, Autosys, Oracle 9i, PL/SQL, SQL, UNIX Shell Scripts, Sun Solaris, Windows 2000.

Education

Bachelor's Degree - Electronics and Communication Engineering

Vignan Bharathi Institute of Technology, JNTUH

Skills

  • Azure Services: Azure Data Factory, Azure Databricks, Logic Apps, Function App, Snowflake, Azure DevOps
  • Big Data Technologies: MapReduce, Hive, Tez, Python, PySpark, Scala, Kafka, Spark Streaming, Oozie, Sqoop, Zookeeper
  • Hadoop Distributions: Cloudera, Hortonworks
  • Languages: SQL, PL/SQL, Python, HiveQL, Scala
  • Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP
  • Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
  • Build Automation Tools: Ant, Maven
  • Version Control: Git, GitHub, Bitbucket
  • IDE & Build Tools: Eclipse, Visual Studio
  • Databases: MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB

Timeline

Azure Snowflake Data Engineer

CenterPoint Energy
11.2022 - Current

Data Developer

Zions Bank
06.2021 - 10.2022

Big Data Developer

Charter Communications
05.2019 - 05.2021

Big Data Developer

Bank Of America
05.2018 - 04.2019

Data Warehouse Developer

Truist Bank
02.2014 - 05.2018

Bachelor's Degree - Electronics and Communication Engineering

Vignan Bharathi Institute of Technology, JNTUH