
Dushyant Singh

Lutz, FL

Summary

  • 9+ years of extensive development across all phases of the Software Development Life Cycle (SDLC), with skills in data analysis, design, development, testing, and deployment of software systems.
  • Strong experience working on Apache Hadoop ecosystem components like HDFS, Hive, Sqoop, and PySpark/Spark, and Amazon Web Services like IAM, EC2, VPC, AMI, SNS, SQS, EMR, Lambda, Glue, Athena, Redshift, CloudWatch, Auto Scaling, and S3.
  • Good experience working with cloud environments like Amazon Web Services (AWS) EMR, EC2, and S3.
  • Experience using the Snowflake analytic data warehouse.
  • Experience using Databricks to handle analytical processes from ETL to data modeling by leveraging familiar tools, languages, and skills via interactive notebooks or APIs.
  • Experience with Apache Airflow to author workflows as directed acyclic graphs (DAGs), visualize batch and real-time data pipelines running in production, monitor progress, and troubleshoot issues when needed.
  • Experience installing, configuring, supporting, and managing Hadoop clusters using Apache Cloudera (CDH 5.X) distributions on Amazon Web Services (AWS).
  • Experience with Amazon AWS services such as EMR, EC2, S3, and Redshift, which provide fast and efficient processing of Big Data.
  • Imported data from different sources like AWS S3 and the local file system into Spark RDDs.
  • Experience developing and maintaining applications written for Amazon Simple Storage Service, AWS Elastic MapReduce, and AWS CloudFormation.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, Pair RDDs, and Spark on YARN.
  • Developed a Spark application to filter JSON source data in an AWS S3 location and store it into HDFS with partitions, using Spark to extract the schema of the JSON files (a minimal sketch follows this summary).
  • Strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka, and Flume.
  • Experience designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • In-depth understanding of Hadoop architecture and its components such as HDFS, MapReduce, Hadoop GEN2 Federation, High Availability, and YARN, with a good understanding of workload management, scalability, and distributed platform architectures.
  • Experienced in moving data from different sources using Kafka producers and consumers and preprocessing the data.
  • Experience importing and exporting data using stream processing platforms like Flume and Kafka.
  • Good knowledge of Linux/Unix shell scripting and Python.
  • Continuous integration, automated deployment, and management using Jenkins.
  • Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL.
  • Proficient in data warehousing and data mining concepts and ETL transformations from source to target systems.
  • Diverse experience working with a variety of databases like Oracle, MySQL, and SQL Server.
  • Experience with the NumPy, Matplotlib, Pandas, Seaborn, and Cufflinks Python libraries.
  • Experience with Python web UI frameworks like Flask and Django.
  • Worked on large datasets using PySpark, NumPy, and Pandas.
  • Good experience in Agile engineering practices, Scrum, Test-Driven Development, and Waterfall methodologies.
  • Good knowledge of Core Java and J2EE technologies such as JDBC, EJB, Servlets, JSP, JavaScript, Struts, and Spring.
  • Experienced in using IDEs and tools like Eclipse, NetBeans, GitHub, Jenkins, and Maven.
  • Strong team player with the ability to work independently and in a team, adapt to a rapidly changing environment, and a commitment to learning; excellent communication, project management, documentation, and interpersonal skills.
  • Practical database engineer possessing in-depth knowledge of data manipulation techniques and computer programming, paired with expertise in integrating and implementing new software packages and products into existing systems.
  • Several years of background managing various aspects of development, design, and delivery of database solutions; tech-savvy and independent professional with outstanding communication and organizational abilities.
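
A minimal PySpark sketch of the JSON-filtering Spark application mentioned above; the bucket, paths, column names, and filter condition are hypothetical placeholders rather than actual project values.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json-filter").getOrCreate()

    # Read semi-structured JSON from S3; Spark infers (extracts) the schema
    source = spark.read.json("s3a://example-bucket/raw/events/")
    source.printSchema()  # inspect the inferred schema

    # Keep only the records of interest and write to HDFS, partitioned by date
    (source
     .filter("status = 'VALID'")
     .write
     .mode("overwrite")
     .partitionBy("event_date")
     .parquet("hdfs:///data/curated/events/"))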

Overview

11 years of professional experience

Work History

Sr. Data Engineer

Fidelity
Salt Lake, UT
05.2023 - Current
  • Craft highly scalable and resilient cloud architectures that address customer business problems and accelerate the adoption of AWS services for clients
  • Build application and database servers using AWS EC2 and create AMIs as well as use RDS for PostgreSQL
  • Carried out deployments and builds in various environments using Jenkins as the continuous integration tool
  • Designed the project workflows/pipelines using Jenkins as CI tool
  • Used Terraform to allow infrastructure to be expressed as code in building EC2, LAMBDA, RDS, EMR
  • Built analytical warehouses in Snowflake and queried data in staged files by referencing metadata columns in the staged files
  • Set up continuous data loads using Snowpipe with appropriate file sizing, and loaded structured and semi-structured data into Snowflake through web interfaces
  • Designed Data Quality Framework to perform schema validation and data profiling on Spark (Pyspark)
  • Worked on ETL processing consisting of data sourcing, mapping, transformation, conversion, and loading
  • Used the Pandas API to put timestamp data into time-series and tabular form for data manipulation and retrieval
  • Implemented Spark on EMR for processing enterprise data across the data lake in AWS
  • Fine-tuned EC2 instances for long-running Spark applications to achieve better parallelism and more executor memory for caching
  • Used Spark Structured Streaming to consume real-time data from various sources like the data lake and Snowflake, build feature calculations, and produce the results back to Kafka (see the sketch after this list)
  • Worked on real-time ETL, loading data into Snowflake from the data lake in structured format
  • Extensively worked with Avro and Parquet files and converted data between the two formats; parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark
  • Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket
  • Created Hive DDL on Parquet and Avro data files residing in both HDFS and S3 bucket
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS
  • Imported the data from different sources like AWS S3, Local file system into Spark RDD
  • Configured Glue dev endpoints to point Glue jobs to a specific EMR cluster or EC2 instance.
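
Below is a hedged sketch of the Spark Structured Streaming feature pipeline referenced in the list above; the broker address, topic names, window sizes, and checkpoint path are assumptions for illustration, and the spark-sql-kafka connector is assumed to be on the classpath.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("feature-stream").getOrCreate()

    # Consume raw events from Kafka (broker and topic names are placeholders)
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "raw-events")
              .load()
              .selectExpr("CAST(key AS STRING) AS key", "timestamp"))

    # Example feature: a windowed event count per key
    features = (events
                .withWatermark("timestamp", "10 minutes")
                .groupBy(F.window("timestamp", "5 minutes"), "key")
                .count())

    # Publish the computed features back to Kafka as JSON
    (features
     .selectExpr("key", "to_json(struct(*)) AS value")
     .writeStream
     .format("kafka")
     .option("kafka.bootstrap.servers", "broker:9092")
     .option("topic", "feature-values")
     .option("checkpointLocation", "/tmp/checkpoints/feature-stream")
     .outputMode("update")
     .start()
     .awaitTermination())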

Data Engineer

Dish Wireless
Denver, CO
09.2022 - 05.2023
  • Built a generic data ingestion framework to extract data from multiple sources like SQL server, delimited flat files, XML, and JSON, using it to build redshift tables
  • Created APIs using Apache Kafka and Node.js
  • Consumed event data from Kafka using Spark Streaming
  • Developed Admin API to manage and inspect topics, brokers, and other Kafka objects
  • Developed Producer and Consumer APIs to publish and subscribe to streams of events in one or more topics
  • Developed Spark scripts, Spark SQL query for data aggregation, querying, and writing data back into RDBMS through Sqoop
  • Developed SQL Queries, Views, and Packages for handling Database activities
  • Developed Python scripts to perform data cleansing and transformation
  • Built an inventory management system using Python Flask
  • Saved data generated from the APIs into a MySQL RDBMS using Flask-SQLAlchemy (see the sketch after this list)
  • Developed REST services to get data from the Salesforce API and save it in AWS S3
  • Extensively used AWS S3, EC2, and EMR instances to deploy and test the applications in various environments (DEV, QA, PROD); involved in performance tuning the application at various levels, such as Hive and Spark
  • Designed and developed data visualizations and monthly reports on the hazardous chemical data based on city requirements using Tableau
  • Worked in scrum/Agile environment, using tools such as JIRA.
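
A rough sketch of the Flask / Flask-SQLAlchemy persistence described above; the connection string, model, and endpoint are illustrative assumptions, not the production schema.

    from flask import Flask, request, jsonify
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    # MySQL connection string is a placeholder
    app.config["SQLALCHEMY_DATABASE_URI"] = "mysql+pymysql://user:pass@localhost/inventory"
    db = SQLAlchemy(app)

    class Item(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String(120), nullable=False)
        quantity = db.Column(db.Integer, default=0)

    @app.route("/items", methods=["POST"])
    def create_item():
        # Persist data received from the API into MySQL
        payload = request.get_json()
        item = Item(name=payload["name"], quantity=payload.get("quantity", 0))
        db.session.add(item)
        db.session.commit()
        return jsonify({"id": item.id}), 201

    if __name__ == "__main__":
        with app.app_context():
            db.create_all()  # create tables on first run
        app.run(debug=True)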

Hadoop and Spark Developer

HDFC
Mumbai, India
04.2019 - 07.2021
  • Worked on analyzing Hadoop cluster and different Big Data analytic tools including Hive, HBase database and SQOOP
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs for data cleaning and pre-processing
  • Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application
  • Responsible for the design and development of Spark SQL scripts based on functional specifications
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the sketch after this list)
  • Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose
  • Developed simple to complex MapReduce jobs using Scala and Java in Spark
  • Developed data pipeline using Flume, Sqoop to ingest cargo data and customer histories into HDFS for analysis
  • Worked on importing data from HDFS to Oracle database and vice versa using Sqoop; configured the Hive metastore with MySQL, which stores the metadata for Hive tables
  • Wrote Hive scripts as per requirements and automated the workflow using shell scripts
  • Participated in Rapid Application Development and Agile processes to deliver new cloud platform services
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL)
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop
  • Built a Django web tool to capture validation tests performed on SSDs.
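
An illustrative sketch of Hive table creation and HQL querying through Spark SQL, in line with the work described above; the database, table, columns, and HDFS locations are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-etl")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS logistics")

    # External Hive table over Parquet files already landed in HDFS
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS logistics.shipments (
            shipment_id STRING,
            origin      STRING,
            weight_kg   DOUBLE
        )
        PARTITIONED BY (ship_date STRING)
        STORED AS PARQUET
        LOCATION 'hdfs:///data/logistics/shipments/'
    """)
    spark.sql("MSCK REPAIR TABLE logistics.shipments")  # register existing partitions

    # HQL-style aggregation for the reporting layer
    spark.sql("""
        SELECT ship_date, origin, SUM(weight_kg) AS total_weight
        FROM logistics.shipments
        GROUP BY ship_date, origin
    """).show()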

Data Engineer

Oracle
India
04.2017 - 03.2019
  • Involved in the design of the project using UML Use Case Diagrams, Sequence Diagrams, Object diagrams, and Class Diagrams
  • Developed Spark programs to perform data transformations, create Datasets and DataFrames, and write Spark SQL queries, including Spark Streaming and windowed streaming applications
  • Wrote Bash scripts to automate the running of regular jobs
  • Developed advanced PL/SQL packages, procedures, triggers, functions, indexes, and collections to implement business logic using SQL Navigator
  • Involved in unit testing and integration testing, and provided support for system and performance testing
  • Performed performance tuning of OLTP and Data warehouse environments using SQL
  • Built a continuous ETL pipeline using Kafka, Spark Streaming, and HDFS (see the sketch after this list)
  • Performed ETL on data from various file formats (JSON, Parquet, and Database)
  • Strong knowledge of various data warehousing methodologies and data modeling concepts
  • Heavily involved in testing Snowflake to understand the best possible way to use the cloud resources.
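
A hedged sketch of the continuous Kafka-to-HDFS pipeline referenced above, written with Spark Structured Streaming; the topic, schema, and paths are assumptions, and the spark-sql-kafka connector is assumed to be available.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Assumed JSON payload schema for the incoming events
    schema = StructType([
        StructField("order_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_time", StringType()),
    ])

    # Parse the JSON value column coming from Kafka
    orders = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "orders")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("o"))
              .select("o.*"))

    # Continuously append Parquet files to HDFS with checkpointing
    (orders.writeStream
     .format("parquet")
     .option("path", "hdfs:///data/landing/orders/")
     .option("checkpointLocation", "hdfs:///checkpoints/orders/")
     .outputMode("append")
     .start()
     .awaitTermination())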

Java Developer

IBM
India
06.2013 - 03.2017
  • Developed application using Struts MVC architecture
  • Developed Back Office and Front-End forms/templates with Validations for Login, Registration, maintain security through session / application variables, deliver dynamic content using HTML, JavaScript and Java as required
  • Developed web interfaces using HTML and JavaScript
  • Developed Stored Procedures, Functions using Oracle
  • Proficient in developing SQL and PL/SQL via JDBC with Oracle 9i/10g, as well as in client-server environments
  • Developed UI using CSS, JSP, Servlets
  • Experience in creating user interfaces using JSP, HTML, DHTML, XML, XSLT, and JavaScript
  • Involved in validating the views using validator plug-in in Struts Framework
  • Experienced in building and deploying J2EE Application Archives (Jar, War and Ear) on IBM WebSphere application server using Apache Ant
  • Implemented the service tier and data access tier using Spring
  • Developed various test cases using JUnit for unit testing
  • Designed message formats in XML
  • Back-end stored procedures development with PL/SQL
  • Used multi-threading in programming to improve overall performance
  • Developed DAO objects to mock persistence implementation to test Business Logic
  • Used SVN as version control.

Education

Master’s in Computer and Information Science

SAU-Southern Arkansas University
USA
05.2023

Skills

  • Apache Hadoop ecosystem components like HDFS, Hive, Sqoop, Pyspark/Spark
  • Amazon Web Services like IAM, EC2, VPC, AMI, SNS, SQS, EMR, LAMBDA, GLUE, ATHENA, REDSHIFT, Cloud Watch, Auto Scaling, S3
  • Cloud environments like Amazon Web Services (AWS) EMR, EC2, and S3
  • Analytic data warehouse like Snowflake
  • Databricks for handling analytical processes from ETL to data modeling
  • Apache Airflow to author workflows as directed acyclic graphs (DAGs)
  • Installation, configuring, supporting, and managing Hadoop Clusters using Apache Cloudera (CDH 5.X) distributions on Amazon Web Services (AWS)
  • Spark Context, Spark-SQL, Data Frame, Pair RDD's, Spark YARN
  • Real-time data analytics using Spark Streaming, Kafka, and Flume
  • Designing and developing applications in Spark using Scala
  • In-depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MR, Hadoop GEN2 Federation, High Availability, and YARN architecture
  • Moving data from different sources using Kafka producers and consumers, and preprocessing data
  • Importing and exporting data using stream processing platforms like Flume and Kafka
  • Scripting with Linux/Unix shell and Python
  • Continuous integration and automated deployment and management using Jenkins
  • Developing UDFs, DataFrames, and SQL queries in Spark SQL
  • Data Warehousing, Data Mining concepts, and ETL transformations
  • Diverse experience in working with a variety of Databases like Oracle, MySQL, SQL Server
  • NumPy, Matplotlib, Pandas, Seaborn, and Cufflinks Python libraries
  • Python Web UI Frameworks like Flask and Django
  • Working on large datasets using Pyspark, NumPy, and pandas
  • Agile Engineering practices, Scrum methodologies, and Test-Driven Development and Waterfall methodologies
  • Core Java and J2EE technologies such as JDBC, EJB, Servlets, JSP, JavaScript, Struts, and Spring
  • IDEs and Tools like Eclipse, NetBeans, GitHub, Jenkins, Maven
  • Excellent communication, project management, documentation, interpersonal skills
  • Python Programming
  • NoSQL Databases
  • API Development
  • Data Modeling
  • Data Security
  • Continuous integration
  • Data Warehousing
  • Performance Tuning
  • Data Analysis
  • Risk Analysis
  • SQL transactional replications
  • SQL and Databases
  • Big data technologies

Technology and Tools

  • Big Data Ecosystem: Hadoop, MapReduce, Hive, YARN, Kafka, Spark, Avro, Elasticsearch, Parquet
  • Languages: Python, Scala, SQL, Linux shell scripting
  • Databases: Oracle, DB2, SQL Server, MySQL, PL/SQL, NoSQL, RDS, HBase, Cassandra
  • AWS: EC2, S3, EMR, Redshift, IAM, Athena, VPC, SNS, SQS, Lambda, Glue, CloudWatch
  • IDE/Programming Tools: Eclipse, PyCharm, IntelliJ
  • Operating Systems: Unix, Linux, Windows
  • J2EE Technologies: Servlets, JDBC, JSP, Struts
  • Web Technologies: HTML, CSS, XML, JavaScript, jQuery, Django, Flask, Bootstrap, RESTful services
  • Libraries and Tools: PySpark, Boto3, Jira, Scrum, Agile Methodologies

Timeline

Sr. Data Engineer

Fidelity
05.2023 - Current

Data Engineer

Dish Wireless
09.2022 - 05.2023

Hadoop and Spark Developer

HDFC
04.2019 - 07.2021

Data Engineer

Oracle
04.2017 - 03.2019

Java Developer

IBM
06.2013 - 03.2017

Master’s in Computer and Information Science

SAU-Southern Arkansas University