Data Engineer with 4+ years of experience in Data Extraction, Data Modelling, Statistical Modeling, Data Mining and Data Visualization.
Proficient across the entire Software Development Life Cycle (SDLC) and in Agile and Waterfall methodologies.
Knowledge of Python packages such as NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn and TensorFlow.
Able to perform data analysis in IDEs such as Jupyter Notebook and PyCharm.
Understanding of relational database implementation and development using MySQL and SQL Server, and proficiency in writing complex SQL queries.
Good knowledge of developing data visualizations and dashboards using Power BI and Tableau.
Proven knowledge of Amazon Web Services (AWS), using Elastic MapReduce (EMR), Redshift and EC2 for data processing.
Demonstrated expertise in designing, configuring, and managing Apache Kafka clusters, brokers, topics, and partitions.
Well versed in big data tools, including the Hadoop technologies MapReduce, Apache Spark, Hive, HDFS and Pig. Capable of handling database issues and connections with SQL and NoSQL databases such as MongoDB by installing and configuring various Python packages.
Hands-on experience with the Databricks workspace user interface, managing notebooks, and Delta Lake with Python and Spark SQL. Proficient in version control tools such as Git.
Overview
5 years of professional experience
1 Certification
Work History
Data Engineer
Nike
02.2023 - Current
Gathered requirements and analyzed data sources
Developed ETL and ELT pipelines using Databricks in an AWS environment
Created Databricks notebooks with Delta format tables and implemented a lakehouse architecture
Developed PySpark and Spark SQL scripts for multiple data products based on the business logic
Worked with various big data file formats such as Parquet, CSV and JSON in different scenarios of building data pipelines
Developed Data Cleaning and Data Validation scripts before applying business transformations on Databricks
Worked with Visual Studio and Repos in AWS DevOps for code commits and code migrations across dev, test and prod environments
Created stored procedures for performing full loads and incremental loads
Worked in an Agile model throughout project development.
Data Engineer
Airbnb
11.2021 - 01.2023
Implemented Agile Methodology for building an internal application
Developed Spark applications using PySpark
Designed tables in Hive and processed data, including importing and exporting databases to HDFS; processed large datasets in a variety of formats
Developed Python and Scala code for a regular expression (regex) project in the Hadoop/Hive environment on Linux and Windows for big data resources
Used the AWS Glue Catalog with a crawler to read data from S3 and run SQL queries
Successfully designed and deployed an Apache Kafka-based data streaming architecture, enabling real-time data ingestion and processing
Developed data pipelines on AWS to extract data from S3 buckets and store it in HDFS
Queried big data, designed and implemented data pipelines for data extraction, and scheduled and automated tasks from data fetching and data cleaning through model testing using DAGs in Airflow
Scheduled and automated processes by writing Python programs (DAGs) in Apache Airflow
Worked closely with Data Scientists to understand data requirements for their experiments
Used cloud-based version control tools such as GitHub.
Data Engineer
MetLife
09.2020 - 10.2021
Worked in an Agile environment, including the Scrum methodology, within a cross-functional team and acted as a liaison between the business user group and the technical team
Prepared scripts to automate the ingestion process using Python and Scala as needed from various sources such as APIs, AWS S3, Teradata and Snowflake
Involved in developing and documenting the ETL (Extract, Transform and Load) strategy to populate the Data Warehouse from various source systems
Used Jupyter notebooks to develop, test and analyze Spark jobs before scheduling customized Spark jobs
Designed and developed SSIS packages to import and export data from MS Excel, SQL Server, and flat files
Analyzed, designed and built modern data solutions using Azure PaaS services to support visualization of data
Used Git for version control and pull requests.
Data Engineer
Blue light IT Solutions, India
09.2018 - 12.2019
Developed data ingestion pipelines using the ETL tool Talend and Bash scripting with big data technologies including but not limited to Hive, Impala, Spark, and Kafka
Supported data quality management by implementing proper data quality checks in data pipelines
Delivered data engineering services such as data exploration, ad hoc ingestion, and subject-matter expertise to Data Scientists using big data technologies
Built machine learning models to showcase big data capabilities using PySpark and MLlib
Implemented data streaming capability using Kafka and Talend for multiple data sources
Worked with multiple storage formats (Avro, Parquet) and databases (Hive, Impala, Kudu)
Managed the data lake on S3
Responsible for maintaining and handling inbound and outbound data requests through the big data platform
Involved in the development of agile, iterative, and proven data modeling patterns that provide flexibility
Troubleshot bugs in users' analyses (JIRA and IRIS tickets)
Worked with the Scrum team to deliver agreed user stories on time every sprint
Implemented UNIX scripts to define the use case workflow and to process the data files and automate the jobs.
Education
Master's in Computer Science - University of Alabama
Birmingham, AL
Bachelor's in Electrical Engineering - Gokaraju Rangaraju Institute
Skills
Methodology: SDLC, Agile, Waterfall
Languages: Python, Scala, SQL
IDEs: PyCharm, Jupyter Notebook, Databricks
Big Data Ecosystem: Hadoop, MapReduce, Hive, Apache Spark, Pig, Kafka
ETL Tools: SSIS, Informatica
Cloud Technologies: AWS, Azure
Reporting Tools: Tableau, Power BI, SSRS
Database: MS SQL Server, PostgreSQL, MongoDB, MySQL
Certified Nursing Assistant at Noland Health Services Inc./ Oaks on Parkwood