Accomplished Data Engineer with over 6 years of experience in Big Data ecosystems and enterprise application development. Demonstrated expertise in Hadoop, Spark, and large-scale data processing, with a proven track record in data architecture, ETL design, data modeling, and data warehousing. Proficient in SQL and experienced with major data analytics platforms. Skilled in building scalable data pipelines on AWS and GCP. Adept at collaborating with cross-functional teams to deliver high-quality, data-driven solutions, and committed to keeping pace with advances in data engineering technologies.
Overview
6 years of professional experience
Work History
Data Engineer
TikTok
01.2024 - 06.2024
TikTok is a social media platform for creating, sharing, and discovering short videos.
Developed and optimized Spark scripts for data encryption and processing using hashing algorithms
Created Hive and HBase tables and used Hive queries in Spark SQL for data analysis
Implemented ETL processes using Talend and Spark for data ingestion and transformation
Automated data workflows using Apache NiFi and Control-M
Developed Python-based APIs for revenue analysis and data migration projects to Snowflake
Utilized AWS services including S3, Redshift, and Lambda for data processing and storage solutions
Monitored and maintained data pipeline performance, ensuring high availability
Ensured data security and compliance with company policies
Conducted data quality assessments and implemented improvement measures
Provided technical support and troubleshooting for data-related issues.
Conducted extensive troubleshooting to identify root causes of issues and implement effective resolutions in a timely manner.
Managed cloud-based infrastructure to ensure optimal performance, security, and cost-efficiency of the company's data platform.
Collaborated with data scientists to develop machine learning models by providing the necessary data infrastructure and preprocessing tools.
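As an illustration of the column-hashing approach mentioned in the Spark encryption work above, here is a minimal pure-Python sketch (field names such as "email" and "user_id" are hypothetical; a production Spark job would apply the same logic via a UDF or the built-in sha2() function):

```python
import hashlib

def hash_value(value: str, salt: str = "demo-salt") -> str:
    """Return a salted SHA-256 hex digest for a single field value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

def hash_sensitive_fields(record: dict, fields: tuple = ("email", "user_id")) -> dict:
    """Replace sensitive fields in a record with their hashed equivalents."""
    return {k: hash_value(v) if k in fields else v for k, v in record.items()}

# Hypothetical input row; non-sensitive fields pass through unchanged.
row = {"email": "user@example.com", "user_id": "42", "country": "US"}
masked = hash_sensitive_fields(row)
```

Salting before hashing, as sketched here, is a common way to resist rainbow-table lookups while keeping the transformation deterministic for joins.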
Data Engineer Intern
Bayview Asset Management
04.2023 - 09.2023
Bayview is an investment management firm focused on mortgage and consumer credit investments, including whole loans, asset-backed securities, mortgage servicing rights, and other credit-related assets.
Implemented Kafka/Spark streaming pipelines for real-time data ingestion
Utilized Apache Airflow for scheduling and monitoring data workflows
Developed Spark applications using Python for data extraction and transformation
Utilized GCP services including BigQuery, Dataproc, and Pub/Sub for data analytics
Built ETL pipelines and deployed applications in cloud environments using Docker and Kubernetes
Conducted data validation and reconciliation processes to ensure data quality
Implemented data warehousing solutions to support business analytics
Optimized data storage and retrieval processes for performance efficiency
Collaborated with business stakeholders to define data requirements
Developed automated reporting solutions to provide real-time insights
Participated in code reviews and provided constructive feedback to peers.
Boosted performance of machine learning models by preprocessing large volumes of raw data for feature extraction and selection.
Developed custom scripts for data cleansing, ensuring consistency and accuracy across various datasets.
Provided reliable and secure access to sensitive information by enforcing strict authorization policies in line with company guidelines.
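The Kafka/Spark streaming ingestion described above typically buckets events into fixed time windows per key. A pure-Python stand-in for that windowed aggregation (event shape and the "clicks" key are hypothetical; in practice this would be a Spark Structured Streaming groupBy on a window column):

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW_MINUTES = 5

def window_start(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its 5-minute window."""
    minutes = (ts.minute // WINDOW_MINUTES) * WINDOW_MINUTES
    return ts.replace(minute=minutes, second=0, microsecond=0)

def aggregate(events):
    """Count events per (key, window-start) bucket."""
    counts = defaultdict(int)
    for key, ts in events:
        counts[(key, window_start(ts))] += 1
    return dict(counts)

# Three hypothetical events: two fall in the 12:00 window, one in 12:05.
events = [
    ("clicks", datetime(2023, 5, 1, 12, 1)),
    ("clicks", datetime(2023, 5, 1, 12, 3)),
    ("clicks", datetime(2023, 5, 1, 12, 6)),
]
result = aggregate(events)
```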
Data Engineer
NTT Data
10.2019 - 05.2022
NTT DATA, part of NTT Group, is a trusted global innovator of IT and business services headquartered in Tokyo. We help clients transform through consulting, industry solutions, business process services, IT modernization, and managed services.
Executed big data analytics initiatives using Hadoop, Spark, and AWS
Developed Spark scripts for data aggregation and transformation
Automated data ingestion processes using Python and Apache Airflow
Migrated data to Snowflake and optimized ETL workflows for better performance
Implemented data validation and reconciliation processes to ensure data quality
Designed and developed data pipelines using AWS Glue for data transformation and loading
Managed and monitored data infrastructure to ensure high availability
Conducted performance tuning and optimization of data processes
Provided technical guidance and support to junior team members
Collaborated with data analysts to develop actionable insights
Implemented data governance policies to ensure data integrity and compliance.
Collaborated with system architects, design analysts and others to understand business and industry requirements.
Developed and delivered business information solutions.
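The data validation and reconciliation step mentioned above for the Snowflake migration can be sketched as a row-count comparison between source and target tables (table names and counts are illustrative; real counts would come from SELECT COUNT(*) queries against each system):

```python
def reconcile(source_counts: dict, target_counts: dict) -> list:
    """Return (table, source_count, target_count) tuples that disagree."""
    mismatches = []
    for table, src in source_counts.items():
        tgt = target_counts.get(table, 0)
        if src != tgt:
            mismatches.append((table, src, tgt))
    return mismatches

# Hypothetical post-load counts: "customers" is short two rows in the target.
source = {"orders": 1000, "customers": 250}
target = {"orders": 1000, "customers": 248}
bad = reconcile(source, target)
```

Row counts are only a first-pass check; a fuller reconciliation would also compare checksums or column-level aggregates.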
Data Analyst
ADP Inc
07.2018 - 09.2019
ADP is a global provider of human capital management solutions.
Migrated workflows from development to production environments
Performed data analysis and profiling, working with data transformation and quality rules
Utilized Kubernetes and Docker for managing containerized applications
Ingested real-time data using Flume, Kafka, and Spark Streaming
Developed ETL processes using SSIS for data extraction, transformation, and loading
Created and maintained data models to support business reporting
Implemented data quality checks and validation processes
Developed dashboards and visualizations using Tableau and Power BI
Collaborated with business stakeholders to gather and analyze requirements
Provided training and support to end-users on data tools and best practices
Documented data processes and created user guides for reference.
Produced monthly reports using advanced Excel spreadsheet functions.
Maintained up-to-date knowledge of industry trends and advancements in data analytics, enhancing the adaptability of solutions provided.
Generated standard and custom reports to provide insights into business performance.
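The monthly reporting described above boils down to grouping records by period and summing a measure. A minimal plain-Python sketch (record fields "date" and "amount" are hypothetical; the production versions used Excel and SSIS as noted):

```python
from collections import defaultdict

def monthly_totals(rows):
    """Sum the amount column per (year, month) bucket."""
    totals = defaultdict(float)
    for r in rows:
        year, month, _day = r["date"].split("-")
        totals[(year, month)] += r["amount"]
    return dict(totals)

# Hypothetical transactions spanning two months.
rows = [
    {"date": "2019-01-15", "amount": 100.0},
    {"date": "2019-01-20", "amount": 50.0},
    {"date": "2019-02-01", "amount": 75.0},
]
report = monthly_totals(rows)
```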
Education
Master of Science - Computer Technology
Eastern Illinois University
Charleston, IL
12.2023
Bachelor of Science - Computer Science
Jawaharlal Nehru Technological University
India
05.2018
Skills
Technical Skills
Hadoop Ecosystem:
HDFS, HBase, MapReduce, Spark, Kafka
Programming Languages: Python, SQL
ETL Tools:
Talend, Informatica, Apache NiFi, AWS Glue
Data Orchestration: Apache Airflow
Databases: SQL Server, MySQL, Oracle, MongoDB, Cassandra