KARTHIK REDDY KARNA

Houston, TX

Summary

To secure a challenging role as a Data Engineer, leveraging 9+ years of software industry experience with a focus on Azure cloud services and Big Data technologies such as Spark, MapReduce, Hive, YARN, and HDFS, using programming languages including Scala and Python. With 4 years of experience in data warehousing, I possess a deep understanding of ETL processes, data modeling, and warehouse design. I am committed to delivering efficient and scalable data solutions that drive business growth and support strategic decision-making.

  • Highly results-driven Data Engineer specializing in the design and implementation of scalable data ingestion pipelines using Azure Data Factory.
  • Expertise in leveraging Azure Databricks and Spark for distributed data processing and transformation tasks, ensuring optimal performance.
  • Skilled in maintaining data quality and integrity through robust validation, cleansing, and transformation operations.
  • Adept at architecting cloud-based data warehouse solutions on Azure, utilizing Snowflake for efficient data storage, retrieval, and analysis.
  • Extensive experience with Snowflake Multi-Cluster Warehouses and deep understanding of Snowflake cloud technology.
  • Proficient in utilizing advanced Snowflake features such as Clone and Time Travel to enhance database applications.
  • Actively involved in the development, improvement, and maintenance of Snowflake database applications.
  • Expertise in building logical and physical data models for Snowflake, adapting them to meet changing requirements.
  • Skilled in defining roles and privileges to control access to different database objects.
  • Thorough knowledge of Snowflake database, schema, and table structures, enabling efficient data organization and retrieval.
  • Strong collaboration skills, working closely with data analysts and stakeholders to implement effective data models and structures.
  • Proven proficiency in optimizing Spark jobs and utilizing Azure Synapse Analytics for big data processing and advanced analytics.
  • Track record of success in performance optimization and capacity planning to ensure scalability and efficiency.
  • Experienced in developing CI/CD frameworks for automated deployment of data pipelines, collaborating with DevOps teams.
  • Proficient in scripting languages such as Python and Scala, enabling efficient automation and customization.
  • Skilled in utilizing Hive, SparkSQL, Kafka, and Spark Streaming for ETL tasks and real-time data processing.
  • Strong working experience in the Hadoop ecosystem, including HDFS, MapReduce, Hive, and Python.
  • Hands-on expertise in developing large-scale data pipelines using Spark and Hive.
  • Experience in using Apache Sqoop for importing and exporting data between HDFS and relational databases.
  • Proven experience in setting up and managing Hadoop job workflows using Apache Oozie.
  • Optimized query performance in Hive using advanced techniques such as bucketing and partitioning, with extensive experience in Spark job tuning.
  • Highly proficient in Agile methodologies, utilizing JIRA for efficient project management and reporting.

Overview

9 years of professional experience

Work History

Azure Snowflake Data Engineer

CenterPoint Energy
11.2022 - Current
  • Implemented highly scalable data ingestion pipelines using Azure Data Factory, efficiently ingesting data from diverse sources such as SQL databases, CSV files, and REST APIs
  • Developed comprehensive data processing workflows utilizing Azure Databricks, harnessing the power of Spark for distributed data processing and advanced transformations
  • Ensured exceptional data quality and integrity by performing thorough data validation, cleansing, and transformation operations through seamless integration of Azure Data Factory and Databricks
  • Architected and implemented a robust cloud-based data warehouse solution on Azure, leveraging the strengths of Snowflake to achieve outstanding scalability and performance
  • Expertly created and optimized Snowflake schemas, tables, and views, optimizing data storage and retrieval for high-performance analytics and reporting requirements
  • Collaborated closely with data analysts and business stakeholders, gaining deep insights into their needs and translating them into effective data models and structures within Snowflake
  • Developed and fine-tuned Spark jobs to execute intricate data transformations, perform aggregations, and accomplish machine learning tasks on large-scale datasets
  • Leveraged the powerful capabilities of Azure Synapse Analytics to seamlessly integrate big data processing and advanced analytics, unlocking valuable opportunities for data exploration and insight generation
  • Implemented sophisticated event-based triggers and scheduling mechanisms to automate data pipelines and workflows, ensuring optimal efficiency and reliability
  • Established comprehensive data lineage and metadata management solutions, enabling efficient tracking and monitoring of data flow and transformations across the entire ecosystem
  • Identified and successfully addressed performance bottlenecks in both data processing and storage layers, achieving significant enhancements in query execution and data latency reduction
  • Implemented cutting-edge strategies such as partitioning, indexing, and caching in Snowflake and Azure services, resulting in superior query performance and reduced processing time
  • Conducted thorough performance tuning exercises and robust capacity planning to ensure the scalability and efficiency of the entire data infrastructure
  • Developed a robust CI/CD framework for data pipelines using the Jenkins tool, enabling streamlined deployment and continuous integration of data workflows
  • Collaborated closely with DevOps engineers to develop and implement automated CI/CD pipelines and test-driven development practices on Azure, precisely tailored to meet client requirements
  • Proficiently programmed in scripting languages such as Python and Scala, leveraging their flexibility and power to optimize data processes and enable customization
  • Contributed actively to ETL tasks, meticulously maintaining data integrity and performing rigorous pipeline stability checks
  • Demonstrated hands-on expertise in utilizing Kafka, Spark Streaming, and Hive for processing streaming data in specific use cases, unlocking real-time data insights
  • Designed and implemented end-to-end data pipelines encompassing Kafka, Spark, and Hive, effectively ingesting, transforming, and analyzing data for diverse business needs
  • Developed and fine-tuned Spark core and Spark SQL scripts using Scala, achieving remarkable acceleration in data processing capabilities
  • Utilized JIRA extensively for project reporting, creating subtasks for Development, QA, and Partner validation, and effectively managing projects from start to finish
  • Deeply experienced in Agile methodologies, actively participating in the full breadth of Agile ceremonies, from daily stand-ups to internationally coordinated PI Planning sessions
  • Environment: Azure Databricks, Azure Data Factory, Snowflake, Logic Apps, Function App, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, shell scripting, Git, JIRA, Jenkins, Kafka, ADF Pipeline, Power BI

Data Developer

Zions Bank
06.2021 - 10.2022
  • Designed and implemented end-to-end data pipelines using Azure Data Factory to efficiently extract, transform, and load (ETL) data from diverse sources into Snowflake
  • Utilized Azure Databricks to design and implement data processing workflows, leveraging the power of Spark for large-scale data transformations
  • Built optimized and scalable Snowflake schemas, tables, and views to support complex analytics queries and meet stringent reporting requirements
  • Developed data ingestion pipelines using Azure Event Hubs and Azure Functions, enabling real-time data streaming into Snowflake for timely insights
  • Leveraged Azure Data Lake Storage as a robust data lake solution, implementing effective data partitioning and retention strategies
  • Utilized Azure Blob Storage for efficient data file storage and retrieval, implementing compression and encryption techniques for enhanced security and cost optimization
  • Integrated Azure Data Factory with Azure Logic Apps to orchestrate complex data workflows and trigger actions based on specific events
  • Implemented rigorous data governance practices and data quality checks using Azure Data Factory and Snowflake, ensuring data accuracy and consistency
  • Implemented efficient data replication and synchronization strategies between Snowflake and other data platforms, leveraging Azure Data Factory and Change Data Capture techniques
  • Developed and deployed Azure Functions to handle data preprocessing, enrichment, and validation tasks within data pipelines
  • Leveraged Azure Machine Learning in conjunction with Snowflake to implement advanced analytics and machine learning workflows, enabling predictive analytics and data-driven insights
  • Designed and implemented data archiving and retention strategies using Azure Blob Storage and Snowflake's Time Travel feature
  • Developed custom monitoring and alerting solutions using Azure Monitor and Snowflake Query Performance Monitoring (QPM), ensuring proactive identification and resolution of performance issues
  • Integrated Snowflake with Power BI and Azure Analysis Services, enabling the creation of interactive dashboards and reports for self-service analytics by business users
  • Optimized data pipelines and Spark jobs in Azure Databricks to improve performance, leveraging techniques such as Spark configuration tuning, caching, and data partitioning
  • Implemented comprehensive data cataloging and data lineage solutions using tools like Azure Purview and Apache Atlas, providing a holistic view of data assets and their relationships
  • Collaborated closely with cross-functional teams, including data scientists, data analysts, and business stakeholders, to understand data requirements and deliver scalable and reliable data solutions
  • Environment: Azure Databricks, Azure Data Factory, Logic Apps, Snowflake, Function App, MS SQL, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, shell scripting, Git, JIRA, Jenkins, Kafka, ADF Pipeline, Power BI

Big Data Developer

Charter Communications
05.2019 - 05.2021
  • Designed and implemented a robust ETL framework utilizing Sqoop, Pig, and Hive to efficiently extract data from diverse sources and make it readily available for consumption
  • Performed data processing on HDFS and created external tables using Hive, while also developing reusable scripts for table ingestion and repair across the project
  • Developed ETL jobs using Spark and Scala to seamlessly migrate data from Oracle to new MySQL tables
  • Leveraged Spark (RDDs, DataFrames, Spark SQL) and Spark-Cassandra Connector APIs for a range of tasks, including data migration and generation of business reports
  • Created a Spark Streaming application for real-time sales analytics, enabling timely insights and decision-making
  • Conducted comprehensive analysis of source data, efficiently managing data type modifications, and leveraging Excel sheets, flat files, and CSV files for ad-hoc report generation in Power BI
  • Analyzed SQL scripts and designed solutions using PySpark, ensuring efficient data extraction, transformation, and loading processes
  • Utilized Sqoop to extract data from various data sources into HDFS, enabling seamless integration of data into the big data environment
  • Handled data import from multiple sources, performed transformations using Hive and MapReduce, and loaded processed data into HDFS
  • Leveraged Sqoop to extract data from MySQL into HDFS, ensuring seamless integration and availability of MySQL data within the big data environment
  • Implemented automation for deployments using YAML scripts, streamlining the build and release processes for efficient project delivery
  • Worked extensively with a range of big data technologies, including Apache Hive, Apache Pig, HBase, Apache Spark, Zookeeper, Flume, Kafka, and Sqoop
  • Implemented data classification algorithms using well-established MapReduce design patterns, enabling efficient data classification and analysis
  • Utilized advanced techniques such as combiners, partitioning, and distributed cache to optimize the performance of MapReduce jobs
  • Leveraged Git and GitHub repositories for efficient source code management and version control, ensuring seamless collaboration and code tracking
  • Environment: Hadoop, Hive, Spark, PySpark, Sqoop, Spark SQL, shell scripting, Cassandra, YAML, ETL.

Big Data Developer

Bank Of America
05.2018 - 04.2019
  • Created a shell script to automate the creation of staging and landing tables with matching schemas to the source data, generating properties used by Oozie jobs for seamless execution
  • Developed Oozie workflows to orchestrate Sqoop and Hive actions, integrating with NoSQL databases like HBase to create HBase tables for efficient loading of large volumes of semi-structured data from diverse sources
  • Implemented performance optimizations on Spark and Python, diagnosing and resolving performance issues to enhance overall processing efficiency
  • Contributed to the development of a roadmap for migrating enterprise data from multiple data sources, such as SQL Server and provider databases, to S3 as a centralized data hub, enabling improved data accessibility and management across the organization
  • Successfully loaded and transformed extensive sets of structured and semi-structured data from various downstream systems, leveraging Spark and Hive for business-specific transformations
  • Developed Spark applications and automated pipelines for both bulk loads and incremental loads of diverse datasets, ensuring streamlined data processing and efficient data ingestion
  • Wrote scripts to execute Oozie workflows, capturing job logs from the cluster and creating a metadata table that records the execution times of each job for monitoring and analysis purposes
  • Converted existing MapReduce applications into PySpark applications as part of an overarching initiative to modernize legacy jobs and establish a new framework for data processing
  • Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Oozie, Impala, Java (JDK 1.8), Cloudera, Python, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Kafka, Oracle.

Data Warehouse Developer

Truist Bank
02.2014 - 05.2018
  • Managed and administered SQL Server databases, including creation, manipulation, and support of database objects
  • Contributed to data modeling and the design of both physical and logical database structures
  • Assisted in integrating front-end applications with the SQL Server backend, ensuring smooth data interactions and seamless functionality
  • Created stored procedures, triggers, indexes, user-defined functions, and constraints to optimize database performance and retrieve the desired results
  • Utilized Data Transformation Services (DTS) to import and export data between servers, ensuring smooth data transfer and synchronization
  • Wrote T-SQL statements to retrieve data and conducted performance tuning on T-SQL queries for improved query execution times
  • Implemented data transfers from various sources, including MS Excel, MS Access, and flat files, to SQL Server using SSIS/DTS, employing features like data conversion and derived column creation as per requirements
  • Supported the team in resolving issues related to SQL Reporting Services and T-SQL, leveraging expertise in report creation, including cross-tab, conditional, drill-down, top N, summary, form, OLAP, and sub-reports, while ensuring proper formatting
  • Provided application support via phone, offering assistance and resolving queries related to the SQL Server environment
  • Developed and tested Windows command files and SQL Server queries for monitoring the production database in a 24/7 support environment
  • Implemented comprehensive logging for ETL loads at both the package and task levels, capturing and recording the number of records processed by each package and task using SSIS
  • Developed, monitored, and deployed SSIS packages for efficient data integration and transformation processes
  • Environment: IBM WebSphere DataStage EE/7.0/6.0 (Manager, Designer, Director, Administrator), Ascential Profile Stage 6.0, Ascential QualityStage 6.0, Erwin, TOAD, Autosys, Oracle 9i, PL/SQL, SQL, UNIX Shell Scripts, Sun Solaris, Windows 2000.

Education

Bachelor's Degree - Electronics and Communication Engineering

Vignan Bharathi Institute of Technology, JNTUH

Skills

  • Azure Services: Azure Data Factory, Azure Databricks, Logic Apps, Function App, Snowflake, Azure DevOps
  • Big Data Technologies: MapReduce, Hive, Tez, Python, PySpark, Scala, Kafka, Spark Streaming, Oozie, Sqoop, Zookeeper
  • Hadoop Distributions: Cloudera, Hortonworks
  • Languages: SQL, PL/SQL, Python, HiveQL, Scala
  • Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP
  • Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
  • Build Automation Tools: Ant, Maven
  • Version Control: Git, GitHub, Bitbucket
  • IDE & Build Tools: Eclipse, Visual Studio
  • Databases: MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB

Timeline

Azure Snowflake Data Engineer

CenterPoint Energy
11.2022 - Current

Data Developer

Zions Bank
06.2021 - 10.2022

Big Data Developer

Charter Communications
05.2019 - 05.2021

Big Data Developer

Bank Of America
05.2018 - 04.2019

Data Warehouse Developer

Truist Bank
02.2014 - 05.2018

Bachelor's Degree - Electronics and Communication Engineering

Vignan Bharathi Institute of Technology, JNTUH