
Charitha Kandula

USA

Summary

Data Engineer with 3+ years of experience in designing and optimizing data pipelines on AWS and Azure. Expert in ETL processes, big data (Hadoop, Spark, Kafka), and programming in Python, SQL, and Scala. Proven success in cloud migrations, infrastructure automation (Terraform), workflow orchestration (Airflow), data visualization (Tableau, Power BI), financial analysis, machine learning, and IoT. MS in Computer Science with multiple academic excellence awards. Motivated to tackle new challenges.

Overview

4 years of professional experience
1 Certification

Work History

Data Engineer (Contract)

Wells Fargo
San Francisco, CA
02.2024 - 08.2024
  • Migrated an existing on-premises application to AWS
  • Used AWS services such as EC2 and S3 for processing and storing small data sets; maintained the Hadoop cluster on AWS EMR
  • Wrote Pig and Hive scripts with UDFs in MapReduce and Python to perform ETL on AWS cloud services
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them
  • Converted Hive/SQL queries into Spark transformations using Spark SQL, Python, and Scala
  • Wrote Python applications on Apache Spark to parse and convert TXT and XLS files
  • Wrote Terraform scripts to automate AWS services including ELB, CloudFront distributions, RDS, EC2, database security groups, Route 53, VPC, subnets, security groups, and S3 buckets; converted existing AWS infrastructure to AWS Lambda deployed via Terraform and AWS CloudFormation
  • Cleaned data and ensured data quality, consistency, and integrity using Pandas and NumPy
  • Created several types of data visualizations using Python and Tableau
  • Designed and developed Tableau visualizations, including dashboards built with calculations, parameters, calculated fields, groups, sets, and hierarchies
  • Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata
  • Created Python scripts to read CSV and JSON files from S3 buckets and load them into AWS S3 and DynamoDB
  • Developed MapReduce jobs in Python for data cleaning and data processing
  • Connected to MySQL databases through the Spark driver
  • Designed and implemented real-time and batch workflows
  • Worked on implementing an Audit, Balance, and Control framework with Airflow as the backend
  • Authored Airflow DAGs for use cases involving Spark, Python, Java, etc.
  • Imported data from Oracle into HDFS using Sqoop
  • Developed analytical components using Scala, Spark, and Spark Streaming
  • Environment: Python, Scala, SQL, AWS, S3, EC2, EMR, Lambda, RDS, Hadoop, Spark, Hive, Pig, Sqoop, MySQL, Tableau, Oracle, Airflow, Teradata, Java, J2EE.

Graduate Teaching Assistant - Data Science

Northern Arizona University
AZ, USA
01.2023 - 12.2023
  • Assisted in preparing and organizing course materials, including lecture slides and lab exercises, with a focus on data science, machine learning, and artificial intelligence
  • Helped students with course-related questions and technical issues in data preprocessing, feature engineering, and machine learning model development during office hours and tutoring sessions
  • Graded assignments, exams, and projects, providing constructive feedback on topics such as supervised and unsupervised learning algorithms, model tuning, and AI model evaluation metrics
  • Supervised and supported students during practical lab sessions, ensuring proper use of data science tools, machine learning frameworks (e.g., TensorFlow, PyTorch), and statistical analysis methods
  • Managed online course content and maintained communication channels between students and the instructor, including troubleshooting technical problems with data science platforms and tools
  • Handled administrative tasks such as tracking attendance and performance metrics, and assisted with the management of course-related databases and data pipelines
  • Supported the instructor in delivering lectures and presentations, including setting up and demonstrating advanced AI models, neural networks, and machine learning algorithms.

Data Engineer

CGI
Hyderabad, India
07.2020 - 08.2022
  • Developed and maintained Python scripts to automate data processing tasks, including data cleaning, transformation, and integration, writing efficient, reusable code and debugging issues to ensure reliable data pipelines
  • Optimized the SQL Server database structure to facilitate quicker access to information, addressing customer-reported incidents
  • Worked with JSON, CSV, Sequential, and Text file formats
  • Achieved 90% service precision by deploying and managing services with Azure Kubernetes Service (AKS)
  • Imported data from Microsoft SQL Server to Azure Data Lake Gen2 utilizing tools in Azure Data Factory
  • Created workflows and mappings using Informatica ETL and worked with transformations such as lookup, source qualifier, update strategy, router, sequence generator, aggregator, rank, stored procedure, filter, joiner, and sorter
  • Utilized Azure Monitor to track and analyze system performance metrics, identifying bottlenecks and optimizing query execution plans for improved performance
  • Integrated Azure Databricks with Azure Synapse Analytics for efficient query processing and analytics on Databricks-managed datasets
  • Wrote Python scripts to automate the generation of other scripts
  • Performed data curation using Azure Databricks
  • Experienced in Reporting Services, Power BI (dashboard reports), Crystal Reports, and SSRS using MS SQL Server, as well as MDX technology in the supporting Analysis Services
  • Developed Power BI reports and dashboards from multiple data sources using data blending
  • Applied statistical methods to analyze data sets and draw meaningful conclusions
  • Authored optimized queries on databases to retrieve and verify information related to support cases
  • Environment: Python, SQL, JavaScript, Azure, Data Factory, Data Lake, Databricks, Synapse Analytics, Microsoft Excel, SQL Server, ETL, Informatica, Power BI.

Education

Master's Degree - Computer Science

Northern Arizona University
Flagstaff, AZ
12.2023

Bachelor's Degree - Computer Science

Vel Tech University
India
05.2022

Skills

Programming Languages: Python 3.7/2.7, C, SQL

Database Tools: Oracle, MS SQL Server, MySQL, PL/SQL, Teradata

Reporting Tools: Power BI, Tableau

Web Programming: HTML, CSS

Cloud Technologies: AWS (S3, EC2, EMR, Lambda, RDS), Azure (Data Factory, Data Lake, Databricks, Synapse Analytics)

Data Formats: CSV, JSON, TXT, XML

Operating systems: Windows, Mac, Linux, Unix

Technologies/Tools/IDEs: PyCharm, Visual Studio, Jupyter Notebook, Eclipse, DBeaver

Big Data Technologies: Hadoop, Spark, Hive, Kafka, MapReduce

Certification

  • Python - Expert Level
  • MATLAB
  • SQL for Data Science
  • Microsoft Power BI
  • Microsoft Azure
