NIKHIL VUTLA

Birmingham, AL

Summary

  • Data Engineer with 4+ years of experience in Data Extraction, Data Modeling, Statistical Modeling, Data Mining, and Data Visualization.
  • Proficient in the entire Software Development Life Cycle (SDLC), including Agile and Waterfall methodologies.
  • Knowledge of Python packages such as NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, and TensorFlow.
  • Able to perform data analysis in IDEs such as Jupyter Notebook and PyCharm.
  • Understanding of relational database implementation and development using MySQL and SQL Server, with proficiency in writing complex SQL queries.
  • Good knowledge of developing data visualizations and dashboards using Power BI and Tableau.
  • Proven knowledge of Amazon Web Services (AWS), using Elastic MapReduce (EMR), Redshift, and EC2 for data processing.
  • Demonstrated expertise in designing, configuring, and managing Apache Kafka clusters, brokers, topics, and partitions.
  • Well-versed in big data tools across the Hadoop ecosystem, including MapReduce, Apache Spark, Hive, HDFS, and Pig. Capable of handling database issues and connections with SQL and NoSQL databases such as MongoDB by installing and configuring the relevant Python packages.
  • Hands-on experience with the Databricks workspace user interface, managing notebooks, and Delta Lake with Python and Spark SQL. Proficient in version control tools such as Git.

Overview

5 years of professional experience
1 Certification

Work History

Data Engineer

Nike
02.2023 - Current
  • Responsible for gathering requirements and analyzing the data sources
  • Developed ETL and ELT pipelines using Databricks in an AWS environment
  • Created Databricks notebooks with Delta-format tables and implemented a lakehouse architecture
  • Developed PySpark and Spark SQL scripts implementing business logic for multiple data products
  • Worked with big data file formats such as Parquet, CSV, and JSON across different data pipeline scenarios
  • Developed data cleaning and data validation scripts to run before applying business transformations on Databricks
  • Worked with Visual Studio and Repos in AWS DevOps to commit code and migrate it across dev, test, and prod environments
  • Created stored procedures for performing full and incremental loads
  • Worked in an Agile model during project development.

Data Engineer

Airbnb
11.2021 - 01.2023
  • Implemented Agile Methodology for building an internal application
  • Developed Spark applications using PySpark
  • Designed tables in Hive and processed data, including importing and exporting databases to HDFS; involved in processing large datasets in a variety of formats
  • Developed Python and Scala code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows big data resources
  • Used the AWS Glue Data Catalog with crawlers to pull data from S3 and perform SQL query operations
  • Successfully designed and deployed an Apache Kafka-based data streaming architecture, enabling real-time data ingestion and processing
  • Responsible for developing data pipelines on AWS to extract data from S3 buckets and store it in HDFS
  • Queried big data; designed and implemented data pipelines for data extraction; scheduled and automated tasks from data fetching and data cleaning to model testing with DAGs in Airflow
  • Scheduled and automated processes by writing Python programs (DAGs) in Apache Airflow
  • Worked closely with Data Scientists to understand data requirements for the experiments
  • Experienced with cloud-hosted version control platforms such as GitHub.

Data Engineer

MetLife
09.2020 - 10.2021
  • Worked in an Agile environment using the Scrum methodology within a cross-functional team, acting as a liaison between the business user group and the technical team
  • Prepared scripts to automate the ingestion process using Python and Scala as needed, drawing from sources such as APIs, AWS S3, Teradata, and Snowflake
  • Involved in developing and documenting the ETL (Extract, Transform, Load) strategy to populate the Data Warehouse from various source systems
  • Used Jupyter notebooks to develop, test, and analyze Spark jobs before scheduling customized Spark jobs
  • Designed and developed SSIS packages to import and export data from MS Excel, SQL Server, and flat files
  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support visualization of data
  • Used Git for version control and pull requests.

Data Engineer

Blue light IT Solutions, India
09.2018 - 12.2019
  • Developed data ingestion pipelines using the Talend ETL tool and Bash scripting, with big data technologies including, but not limited to, Hive, Impala, Spark, and Kafka
  • Supported data quality management by implementing proper data quality checks in data pipelines
  • Delivered data engineering services such as data exploration, ad-hoc ingestions, and subject-matter expertise to Data Scientists using big data technologies
  • Built machine learning models to showcase big data capabilities using PySpark and MLlib
  • Implemented data streaming capability using Kafka and Talend for multiple data sources
  • Worked with multiple storage formats (Avro, Parquet) and databases (Hive, Impala, Kudu)
  • Managed the S3-based data lake
  • Responsible for maintaining and handling data inbound and outbound requests through big data platform
  • Involved in the development of agile, iterative, and proven data modeling patterns that provide flexibility
  • Troubleshot users' analysis bugs (JIRA and IRIS tickets)
  • Worked with the Scrum team to deliver agreed user stories on time every sprint
  • Implemented UNIX scripts to define the use case workflow and to process the data files and automate the jobs.

Education

Master's in Computer Science

University of Alabama
Birmingham, AL

Bachelor's in Electrical Engineering

Gokaraju Rangaraju Institute

Skills

  • Methodology: SDLC, Agile, Waterfall
  • Languages: Python, Scala, SQL
  • IDEs: PyCharm, Jupyter Notebook, Databricks
  • Big Data Ecosystem: Hadoop, MapReduce, Hive, Apache Spark, Pig, Kafka
  • ETL Tools: SSIS, Informatica
  • Cloud Technologies: AWS, Azure
  • Reporting Tools: Tableau, Power BI, SSRS
  • Database: MS SQL Server, PostgreSQL, MongoDB, MySQL
  • Other Tools: Git, MS Office, Windows, Linux

Certifications

  • Azure Fundamentals - AZ 900
  • AWS Cloud Practitioner
