
Dushyant Singh

Lutz, FL

Summary

  • 9+ years of extensive development across all phases of the Software Development Life Cycle (SDLC), with skills in data analysis, design, development, testing, and deployment of software systems.
  • Strong experience working on Apache Hadoop ecosystem components like HDFS, Hive, Sqoop, and PySpark/Spark, and Amazon Web Services like IAM, EC2, VPC, AMI, SNS, SQS, EMR, Lambda, Glue, Athena, Redshift, CloudWatch, Auto Scaling, and S3.
  • Good experience working with cloud environments like Amazon Web Services (AWS) EMR, EC2, and S3.
  • Experience using the Snowflake analytic data warehouse.
  • Experience using Databricks to handle analytical processes from ETL to data modeling by leveraging familiar tools, languages, and skills via interactive notebooks or APIs.
  • Experience with Apache Airflow to author workflows as directed acyclic graphs (DAGs), visualize batch and real-time data pipelines running in production, monitor progress, and troubleshoot issues when needed.
  • Experience installing, configuring, supporting, and managing Hadoop clusters using Apache Cloudera (CDH 5.X) distributions on Amazon Web Services (AWS).
  • Experience with Amazon AWS services such as EMR, EC2, S3, and Redshift, which provide fast and efficient processing of Big Data.
  • Imported data from different sources like AWS S3 and the local file system into Spark RDDs.
  • Experience developing and maintaining applications written for Amazon Simple Storage Service, AWS Elastic MapReduce, and AWS CloudFormation.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, Pair RDDs, and Spark on YARN.
  • Developed a Spark application to filter JSON source data in an AWS S3 location and store it into HDFS with partitions, using Spark to extract the schema of the JSON files (a minimal sketch follows this summary).
  • Strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka, and Flume.
  • Experience designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • In-depth understanding of Hadoop architecture and its components such as HDFS, MapReduce, Hadoop GEN2 Federation, High Availability, and YARN, with a good understanding of workload management, scalability, and distributed platform architectures.
  • Experienced in moving data from different sources using Kafka producers and consumers and preprocessing the data.
  • Experience importing and exporting data using stream processing platforms like Flume and Kafka.
  • Good knowledge of Linux/Unix shell scripting and Python.
  • Continuous integration, automated deployment, and management using Jenkins.
  • Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL.
  • Proficient in data warehousing and data mining concepts and ETL transformations from source to target systems.
  • Diverse experience working with a variety of databases like Oracle, MySQL, and SQL Server.
  • Experience with the NumPy, Matplotlib, Pandas, Seaborn, and Cufflinks Python libraries.
  • Experience with Python web UI frameworks like Flask and Django.
  • Worked on large datasets using PySpark, NumPy, and Pandas.
  • Good experience in Agile engineering practices, Scrum, Test-Driven Development, and Waterfall methodologies.
  • Good knowledge of Core Java and J2EE technologies such as JDBC, EJB, Servlets, JSP, JavaScript, Struts, and Spring.
  • Experienced in using IDEs and tools like Eclipse, NetBeans, GitHub, Jenkins, and Maven.
  • Strong team player with the ability to work independently and in a team, adapt to a rapidly changing environment, and a commitment to learning; excellent communication, project management, documentation, and interpersonal skills.
  • Practical database engineer possessing in-depth knowledge of data manipulation techniques and computer programming, paired with expertise in integrating and implementing new software packages and products into existing systems.
  • Several years of background managing various aspects of development, design, and delivery of database solutions; tech-savvy and independent professional with outstanding communication and organizational abilities.
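
A minimal PySpark sketch of the JSON-filtering Spark application mentioned above; the bucket, paths, column names, and filter condition are hypothetical placeholders rather than actual project values.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json-filter").getOrCreate()

    # Read semi-structured JSON from S3; Spark infers (extracts) the schema
    source = spark.read.json("s3a://example-bucket/raw/events/")
    source.printSchema()  # inspect the inferred schema

    # Keep only the records of interest and write to HDFS, partitioned by date
    (source
     .filter("status = 'VALID'")
     .write
     .mode("overwrite")
     .partitionBy("event_date")
     .parquet("hdfs:///data/curated/events/"))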

Overview

11 years of professional experience

Work History

Sr. Data Engineer

Fidelity
Salt Lake, UT
05.2023 - Current
  • Craft highly scalable and resilient cloud architectures that address customer business problems and accelerate the adoption of AWS services for clients
  • Build application and database servers using AWS EC2 and create AMIs as well as use RDS for PostgreSQL
  • Carried out deployments and builds in various environments using Jenkins as the continuous integration tool
  • Designed the project workflows/pipelines using Jenkins as CI tool
  • Used Terraform to allow infrastructure to be expressed as code in building EC2, LAMBDA, RDS, EMR
  • Built analytical warehouses in Snowflake and queried data in staged files by referencing metadata columns in the staged files
  • Set up continuous data loads using Snowpipe with appropriate file sizing, and loaded structured and semi-structured data into Snowflake through web interfaces
  • Designed Data Quality Framework to perform schema validation and data profiling on Spark (Pyspark)
  • Worked on ETL processing consisting of data sourcing, mapping, transformation, conversion, and loading
  • Used the Pandas API to put timestamp data into time-series and tabular form for data manipulation and retrieval
  • Implemented Spark on EMR for processing enterprise data across the data lake in AWS
  • Fine-tuned EC2 instances for long-running Spark applications to achieve better parallelism and more executor memory for caching
  • Used Spark Structured Streaming to consume real-time data from various sources like the data lake and Snowflake, build feature calculations, and produce the results back to Kafka (see the sketch after this list)
  • Worked on real-time ETL, loading data into Snowflake from the data lake in structured format
  • Extensively worked with Avro and Parquet files and converted data between the two formats; parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark
  • Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket
  • Created Hive DDL on Parquet and Avro data files residing in both HDFS and S3 bucket
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS
  • Imported the data from different sources like AWS S3, Local file system into Spark RDD
  • Configured Glue dev endpoints to point Glue jobs to a specific EMR cluster or EC2 instance.
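
Below is a hedged sketch of the Spark Structured Streaming feature pipeline referenced in the list above; the broker address, topic names, window sizes, and checkpoint path are assumptions for illustration, and the spark-sql-kafka connector is assumed to be on the classpath.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("feature-stream").getOrCreate()

    # Consume raw events from Kafka (broker and topic names are placeholders)
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "raw-events")
              .load()
              .selectExpr("CAST(key AS STRING) AS key", "timestamp"))

    # Example feature: a windowed event count per key
    features = (events
                .withWatermark("timestamp", "10 minutes")
                .groupBy(F.window("timestamp", "5 minutes"), "key")
                .count())

    # Publish the computed features back to Kafka as JSON
    (features
     .selectExpr("key", "to_json(struct(*)) AS value")
     .writeStream
     .format("kafka")
     .option("kafka.bootstrap.servers", "broker:9092")
     .option("topic", "feature-values")
     .option("checkpointLocation", "/tmp/checkpoints/feature-stream")
     .outputMode("update")
     .start()
     .awaitTermination())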

Data Engineer

Dish Wireless
Denver, CO
09.2022 - 05.2023
  • Built a generic data ingestion framework to extract data from multiple sources like SQL server, delimited flat files, XML, and JSON, using it to build redshift tables
  • Created APIs using Apache Kafka and Node.js
  • Consumed event data from Kafka using Spark Streaming
  • Developed Admin API to manage and inspect topics, brokers, and other Kafka objects
  • Developed Producer and Consumer APIs to publish and subscribe to streams of events in one or more topics
  • Developed Spark scripts, Spark SQL query for data aggregation, querying, and writing data back into RDBMS through Sqoop
  • Developed SQL Queries, Views, and Packages for handling Database activities
  • Developed Python scripts to perform data cleansing and transformation
  • Built an inventory management system using Python Flask
  • Saved data generated from the APIs into a MySQL RDBMS using Flask-SQLAlchemy (see the sketch after this list)
  • Developed REST services to get data from the Salesforce API and save it in AWS S3
  • Extensively used AWS S3, EC2, and EMR instances to deploy and test the applications in various environments (DEV, QA, PROD); involved in performance tuning the application at various levels, such as Hive and Spark
  • Designed and developed data visualizations and monthly reports on the hazardous chemical data based on city requirements using Tableau
  • Worked in scrum/Agile environment, using tools such as JIRA.
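
A rough sketch of the Flask / Flask-SQLAlchemy persistence described above; the connection string, model, and endpoint are illustrative assumptions, not the production schema.

    from flask import Flask, request, jsonify
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    # MySQL connection string is a placeholder
    app.config["SQLALCHEMY_DATABASE_URI"] = "mysql+pymysql://user:pass@localhost/inventory"
    db = SQLAlchemy(app)

    class Item(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String(120), nullable=False)
        quantity = db.Column(db.Integer, default=0)

    @app.route("/items", methods=["POST"])
    def create_item():
        # Persist data received from the API into MySQL
        payload = request.get_json()
        item = Item(name=payload["name"], quantity=payload.get("quantity", 0))
        db.session.add(item)
        db.session.commit()
        return jsonify({"id": item.id}), 201

    if __name__ == "__main__":
        with app.app_context():
            db.create_all()  # create tables on first run
        app.run(debug=True)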

Hadoop and Spark Developer

HDFC
Mumbai, India
04.2019 - 07.2021
  • Worked on analyzing Hadoop cluster and different Big Data analytic tools including Hive, HBase database and SQOOP
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs for data cleaning and pre-processing
  • Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application
  • Responsible for the design and development of Spark SQL scripts based on functional specifications
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the sketch after this list)
  • Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose
  • Developed simple to complex MapReduce jobs using Scala and Java in Spark
  • Developed data pipeline using Flume, Sqoop to ingest cargo data and customer histories into HDFS for analysis
  • Worked on importing data from HDFS to Oracle database and vice versa using Sqoop; configured the Hive metastore with MySQL, which stores the metadata for Hive tables
  • Wrote Hive scripts as per requirements and automated the workflow using shell scripts
  • Participated in Rapid Application Development and Agile processes to deliver new cloud platform services
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL)
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop
  • Built a Django web tool to capture validation tests performed on SSDs.
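
An illustrative sketch of Hive table creation and HQL querying through Spark SQL, in line with the work described above; the database, table, columns, and HDFS locations are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-etl")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS logistics")

    # External Hive table over Parquet files already landed in HDFS
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS logistics.shipments (
            shipment_id STRING,
            origin      STRING,
            weight_kg   DOUBLE
        )
        PARTITIONED BY (ship_date STRING)
        STORED AS PARQUET
        LOCATION 'hdfs:///data/logistics/shipments/'
    """)
    spark.sql("MSCK REPAIR TABLE logistics.shipments")  # register existing partitions

    # HQL-style aggregation for the reporting layer
    spark.sql("""
        SELECT ship_date, origin, SUM(weight_kg) AS total_weight
        FROM logistics.shipments
        GROUP BY ship_date, origin
    """).show()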

Data Engineer

Oracle
India
04.2017 - 03.2019
  • Involved in the design of the project using UML Use Case Diagrams, Sequence Diagrams, Object diagrams, and Class Diagrams
  • Developed Spark programs to perform data transformations, create Datasets and DataFrames, and write Spark SQL queries, including Spark Streaming and windowed streaming applications
  • Wrote Bash scripts to automate the running of regular jobs
  • Developed advanced PL/SQL packages, procedures, triggers, functions, indexes, and collections to implement business logic using SQL Navigator
  • Involved in unit testing and integration testing, and provided support for system and performance testing
  • Performed performance tuning of OLTP and Data warehouse environments using SQL
  • Built a continuous ETL pipeline using Kafka, Spark Streaming, and HDFS (see the sketch after this list)
  • Performed ETL on data from various file formats (JSON, Parquet, and Database)
  • Strong knowledge of various data warehousing methodologies and data modeling concepts
  • Heavily involved in testing Snowflake to understand the best possible way to use the cloud resources.
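
A hedged sketch of the continuous Kafka-to-HDFS pipeline referenced above, written with Spark Structured Streaming; the topic, schema, and paths are assumptions, and the spark-sql-kafka connector is assumed to be available.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Assumed JSON payload schema for the incoming events
    schema = StructType([
        StructField("order_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_time", StringType()),
    ])

    # Parse the JSON value column coming from Kafka
    orders = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "orders")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("o"))
              .select("o.*"))

    # Continuously append Parquet files to HDFS with checkpointing
    (orders.writeStream
     .format("parquet")
     .option("path", "hdfs:///data/landing/orders/")
     .option("checkpointLocation", "hdfs:///checkpoints/orders/")
     .outputMode("append")
     .start()
     .awaitTermination())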

Java Developer

IBM
India
06.2013 - 03.2017
  • Developed application using Struts MVC architecture
  • Developed Back Office and Front-End forms/templates with Validations for Login, Registration, maintain security through session / application variables, deliver dynamic content using HTML, JavaScript and Java as required
  • Developed web interfaces using HTML and JavaScript
  • Developed Stored Procedures, Functions using Oracle
  • Proficient in developing SQL and PL/SQL via JDBC with Oracle 9i/10g, as well as in client-server environments
  • Developed UI using CSS, JSP, Servlets
  • Experience in creating user interfaces using JSP, HTML, DHTML, XML, XSLT, and JavaScript
  • Involved in validating the views using validator plug-in in Struts Framework
  • Experienced in building and deploying J2EE Application Archives (Jar, War and Ear) on IBM WebSphere application server using Apache Ant
  • Implemented the service tier and data access tier using Spring
  • Developed various test cases using JUnit for unit testing
  • Designed message formats in XML
  • Back-end stored procedures development with PL/SQL
  • Used multi-threading in programming to improve overall performance
  • Developed DAO objects to mock persistence implementation to test Business Logic
  • Used SVN as version control.

Education

Master’s in Computer and Information Science

SAU-Southern Arkansas University
USA
05.2023

Skills

  • Apache Hadoop ecosystem components like HDFS, Hive, Sqoop, Pyspark/Spark
  • Amazon Web Services like IAM, EC2, VPC, AMI, SNS, SQS, EMR, LAMBDA, GLUE, ATHENA, REDSHIFT, Cloud Watch, Auto Scaling, S3
  • Cloud environments like Amazon Web Services (AWS) EMR, EC2, and S3
  • Analytic data warehouse like Snowflake
  • Databricks for handling analytical processes from ETL to data modeling
  • Apache Airflow to author workflows as directed acyclic graphs (DAGs)
  • Installation, configuring, supporting, and managing Hadoop Clusters using Apache Cloudera (CDH 5.X) distributions on Amazon Web Services (AWS)
  • Spark Context, Spark-SQL, Data Frame, Pair RDD's, Spark YARN
  • Real-time data analytics using Spark Streaming, Kafka, and Flume
  • Designing and developing applications in Spark using Scala
  • In-depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MR, Hadoop GEN2 Federation, High Availability, and YARN architecture
  • Moving data from different sources using Kafka producers and consumers, and preprocessing data
  • Importing and exporting data using stream processing platforms like Flume and Kafka
  • Scripting with Linux/Unix shell and Python
  • Continuous integration and automated deployment and management using Jenkins
  • Developing UDFs, DataFrames, and SQL queries in Spark SQL
  • Data Warehousing, Data Mining concepts, and ETL transformations
  • Diverse experience in working with a variety of Databases like Oracle, MySQL, SQL Server
  • NumPy, Matplotlib, Pandas, Seaborn, and Cufflinks Python libraries
  • Python Web UI Frameworks like Flask and Django
  • Working on large datasets using Pyspark, NumPy, and pandas
  • Agile Engineering practices, Scrum methodologies, and Test-Driven Development and Waterfall methodologies
  • Core Java and J2EE technologies such as JDBC, EJB, Servlets, JSP, JavaScript, Struts, and Spring
  • IDEs and Tools like Eclipse, NetBeans, GitHub, Jenkins, Maven
  • Excellent communication, project management, documentation, interpersonal skills
  • Python Programming
  • NoSQL Databases
  • API Development
  • Data Modeling
  • Data Security
  • Continuous integration
  • Data Warehousing
  • Performance Tuning
  • Data Analysis
  • Risk Analysis
  • SQL transactional replications
  • SQL and Databases
  • Big data technologies

Technology and Tools

  • Big Data Ecosystem: Hadoop, MapReduce, Hive, YARN, Kafka, Spark, Avro, Elasticsearch, Parquet
  • Languages: Python, Scala, SQL, Linux shell scripting
  • Databases: Oracle, DB2, SQL Server, MySQL, PL/SQL, NoSQL, RDS, HBase, Cassandra
  • AWS: EC2, S3, EMR, Redshift, IAM, Athena, VPC, SNS, SQS, Lambda, Glue, CloudWatch
  • IDE/Programming Tools: Eclipse, PyCharm, IntelliJ
  • Operating Systems: Unix, Linux, Windows
  • J2EE Technologies: Servlets, JDBC, JSP, Struts
  • Web Technologies: HTML, CSS, XML, JavaScript, jQuery, Django, Flask, Bootstrap, RESTful services
  • Libraries and Tools: PySpark, Boto3, Jira, Scrum, Agile Methodologies

Timeline

Sr. Data Engineer

Fidelity
05.2023 - Current

Data Engineer

Dish Wireless
09.2022 - 05.2023

Hadoop and Spark Developer

HDFC
04.2019 - 07.2021

Data Engineer

Oracle
04.2017 - 03.2019

Java Developer

IBM
06.2013 - 03.2017

Master’s in Computer and Information Science

SAU-Southern Arkansas University