Mohana Koka

Irving, TX

Summary

A dedicated Data Engineer with 3+ years of professional experience in the IT industry and expertise in Big Data, Azure cloud data engineering, and enterprise applications. Experienced in data warehousing with a focus on Big Data technologies, data pipelines, SQL/NoSQL, cloud-based RDS, distributed databases, serverless architecture, data mining, and web scraping, using cloud technologies such as AWS EMR, Redshift, Lambda, Step Functions, and CloudWatch.

· Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC, Parquet, and text files) into AWS Redshift.

· Proficient in data warehousing, ETL (Extract, Transform, Load), and data modeling.

· Expert in building Databricks notebooks to extract data from source systems such as DB2 and Teradata, and to perform data cleansing, data wrangling, and ETL processing before loading into Azure SQL DB.

· Experience with multiple cloud vendors, including AWS, GCP, and Azure.

· Assisted in the development of data pipelines using Apache Spark to process large volumes of log data for real-time analytics.

· Gained hands-on experience with cloud-based data storage solutions, including Amazon S3 and Google Cloud Storage.

· Collaborated with data analysts and data scientists to understand data requirements and provide clean, structured datasets for analysis.

· Good knowledge of AWS services such as EMR, EC2, S3, Lambda, and Redshift, which provide fast, efficient processing of big data.

Overview

5 years of professional experience
1 Certification

Work History

Data Engineer

Homesite Insurance
11.2022 - Current

· Performed ETL operations in Azure Databricks using JDBC connectors to connect to various relational database source systems.

· Designed data ingestion modules using AWS Glue for loading data into different layers in S3, enabling reporting with Athena and QuickSight.

· Implemented Azure Key Vault for centralized secrets management, leveraging secrets in Azure Data Factory and Databricks notebooks.

· Created data quality scripts using SQL and Hive to ensure data integrity and successful data loads.

· Developed automated processes in Azure cloud for daily data ingestion from web services into Azure SQL DB.

· Built REST APIs to serve data generated by prediction models to other teams/customers.

· Leveraged DBT (Data Build Tool) for data transformation and modeling, improving data quality and consistency across datasets.

· Worked extensively with Hive, creating tables, and loading event data from Kafka using Spark Streaming.

· Collaborated on NiFi pipeline integration with Spark and Kafka on EC2 nodes in QA and production environments.

· Migrated data to Cloud databases (Azure Synapse Analytics, Azure SQL DB).

· Set up Azure infrastructure components such as storage accounts, integration runtimes, and service principals for optimized analytical requirements.

· Developed administrative APIs for managing Kafka objects.

· Created data visualizations using Python and Tableau for effective data analysis.

· Automated infrastructure deployment and management using Terraform on a Linux environment, resulting in a more reliable and scalable setup.

· Developed and maintained reporting integrations with business tools such as Zendesk and Clari to support data analysis and business intelligence.

· Administered and optimized PostgreSQL databases to support business intelligence and analytics.

· Utilized Agile methodologies and tools like JIRA for project management.

Cloud Data Engineer

Novo Nordisk
12.2019 - 06.2021

· Analyzed data in Azure Data Lake and Blob using Azure Databricks for efficient data processing.

· Utilized Logic Apps for workflow decisions and developed custom alerts using Azure Data Factory, Azure SQL DB, and Logic Apps.

· Implemented Apache Airflow workflows for automated data extraction, transformation, and loading (ETL), reducing manual intervention and errors.

· Developed Spark programs for data transformations, including creating datasets, data frames, writing Spark SQL queries, and implementing streaming applications.

· Utilized AWS Glue catalog with crawler for data operations in S3 and performed SQL queries.

· Created Hive tables, loaded data from Kafka using Spark Streaming, and optimized performance at various levels.

· Designed ETL pipelines in Azure cloud for API data processing into Azure SQL DB.

· Developed Admin API, Producer API, and Consumer API for Kafka object management and event streaming.

· Implemented DAGs for task execution and performance tracking using Email, Bash, and Spark Livy operators.

· Conducted data quality validations using SQL, Hive, and Python scripts.

· Designed and executed data visualizations using Python, Tableau, and Spark for effective data analysis.

· Collaborated with Business Analysts to understand requirements and implemented solutions accordingly.

· Managed data engineering pipelines including ingestion, transformations, and analysis.

· Utilized Ansible for configuration management and automation of repetitive tasks, improving system consistency.

· Managed Git repositories for code versioning and collaboration, enhancing project transparency and team productivity.

Education

Master of Science - Information Systems and Technology

University of North Texas
Denton, TX

Skills

  • ETL development
  • Data Pipeline Design
  • Big Data Processing
  • Hadoop Ecosystem
  • SQL Programming
  • Relational databases

Certification

AWS Certified Cloud Practitioner - 2024
