Team-oriented Senior Cloud Data Engineer with hands-on technical and lead experience. Strong understanding of designing large data warehouse systems, including relational and dimensional data models. Successful in implementing on-premise (Hadoop/Big Data) and cloud (AWS) technologies, including data-pipelines and data-lakes.
Responsible for an on-premise Cloudera data-lake implementation and its migration to AWS, and for designing secure data-pipelines in the cloud using Terraform for infrastructure as code.
· AWS (Cloud)
o Collaborated on the design of a secure data-lake solution and led its implementation using a staged, multi-zone approach, including the creation of data-pipelines for multiple source systems.
o Leveraged AWS services for the pipelines, including Glue (ETL, Crawler, Data Catalog), Lambda, Step Functions, Lake Formation, CloudWatch, Athena, Redshift, and QuickSight.
o Developed the infrastructure using Terraform, Glue, Lambda, PySpark, and dbt (data build tool), with Jupyter notebooks, SageMaker, VS Code, Git, and shell as development tools.
o Implemented CI/CD for the data-pipeline workflow using Git and AWS CodePipeline.
o Troubleshot issues using CloudTrail, CloudWatch, and Kibana dashboards.
o Automated AWS key and password rotation using AWS Secrets Manager.
o Ensured secure access to production data, whether in the raw S3 layer or in Redshift, through appropriate IAM roles and policies, and protected the data with the appropriate KMS keys.
o Designed and implemented the historical migration of on-premise ODS data using Glue jobs and Lambda triggers for data transfers between S3 sources and targets across VPCs (illustrated in the sketch below).
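A minimal sketch of the S3-triggered transfer pattern described above; the bucket, job, and argument names are hypothetical placeholders, not the actual project configuration:

    # Hypothetical Lambda handler: start a Glue job run for each ODS extract
    # that lands in the source bucket. All names here are illustrative assumptions.
    import os
    import urllib.parse

    import boto3

    glue = boto3.client("glue")
    GLUE_JOB_NAME = os.environ.get("GLUE_JOB_NAME", "ods-history-migration")
    TARGET_BUCKET = os.environ.get("TARGET_BUCKET", "datalake-raw-zone")

    def handler(event, context):
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            # The Glue job copies/transforms the object across VPC boundaries;
            # source and target locations are passed in as job arguments.
            glue.start_job_run(
                JobName=GLUE_JOB_NAME,
                Arguments={
                    "--source_path": f"s3://{bucket}/{key}",
                    "--target_path": f"s3://{TARGET_BUCKET}/{key}",
                },
            )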
· Cloudera/HortonWorks (On-premise)
o Built on-premise Cloudera (CDP) clusters, including OS and cluster software installation on multiple Linux servers. Designed and implemented the data-lake using a secure multi-zone architecture.
o Enabled Kerberos authentication to restrict the cluster to valid access. Secured data in motion by implementing trusted certificates (TLS/SSL).
o Designed data-lake zones to support raw, trusted, and refined data, enforcing security for data at rest (HDFS encryption) by application and zone.
o Used Apache Atlas as the metadata layer for ingested data, with the option to attach a business taxonomy to schema attributes and provide visual lineage of loaded data.
o Implemented data-access policies at the Hive and HDFS levels through Apache Ranger, including auditing for policy violations. Managed the encryption keys for data-lake zones through Ranger.
o Clients included: Allina Health System, Medtronic, BI Worldwide, Children’s Hospital and Fortis
Participated in various capacities in a program to transition a major worldwide hotel chain from its legacy customer-loyalty application to a proprietary one.
o Implemented a multi-tenant data-lake for secure code/data storage and retrieval, comprising migration, ingestion, integration, reporting, and analytics zones built on raw, staged, and processed layers.
o Developed data-pipeline processes for client-data migration, ingestion, integration, and reporting, using a combination of NiFi, Kafka, Scala, PySpark, Hive, Airflow, shell scripts, and SSRS.
o Wrote processes to audit ingested Hive data against Kafka events from the source. Created a producer to send missing data keys back to the Kafka broker so the source could be updated for the next NiFi fetch (see the sketch after this list).
o Developed change-data-capture processes so that hard deletes and updates on the source RDBMS flowed through to Hadoop/Hive, using a combination of Kafka/NiFi, HiveQL, and Python scripts.
o Worked with the BA team to review requirements for sensitive fields and used Atlas to define the tags. Integrated with Ranger to enable tag-based policies per AD group. Enforced LDAP group-based access to data-lake folders and Hive tables for developers and QA using Apache Ranger.
o The development process followed an agile framework with sprints, using tools including JIRA, Confluence, IntelliJ, HiveRunner, Git/BitBucket, and Bamboo for CI/CD of development branches.
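A minimal sketch of the audit-and-replay step described above, assuming hypothetical broker, topic, and table names and the kafka-python client; the actual jobs used the project's own tooling:

    # Illustrative audit: compare Kafka event keys against rows landed in Hive,
    # then republish missing keys so the source re-queues them for the next
    # NiFi fetch. All names are hypothetical.
    import json

    from kafka import KafkaProducer
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    event_keys = spark.sql("SELECT record_key FROM audit.kafka_event_keys")
    hive_keys = spark.sql("SELECT record_key FROM ingest.loyalty_members")
    missing = event_keys.subtract(hive_keys)   # in the event stream, not in Hive

    producer = KafkaProducer(
        bootstrap_servers="broker:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for row in missing.toLocalIterator():
        producer.send("missing-keys-replay", {"record_key": row["record_key"]})
    producer.flush()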
Led the implementation of an Enterprise Data Warehouse (EDW) and Big Data platform. Conducted POCs, delivered prototypes, and recommended appropriate solutions for application and technology implementation teams.
o Principal lead for the EDW database, ETL, and BI, including database design, data management, system health checks, mentoring and training, and coordination with offshore teams.
o Architected the strategy and plan to migrate a 170 TB EDW to lower-cost compute/storage platforms within a 24-hour production outage window, resulting in $4M+ in savings and eliminating $400K in outside consulting costs by driving the project in-house.
o Implemented a HortonWorks Hadoop cluster to augment the EDW, storing 20 times more data.
o Improved query performance using de-normalized versions of terabyte-sized Hive tables (see the sketch after this list).
o Enabled engineers to output Hive query results for analysis in JMP and Tableau.
o Collaborated with business partners and analysts to identify data sources and define ingestion processes and storage methodology with appropriate standards and governance.
o Developed the strategic roadmap for the enterprise data warehouse and business intelligence, including key initiatives for platform migrations, product upgrades, security enhancements, and cost reduction.
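A minimal sketch of the de-normalization pattern referenced above, with hypothetical table, column, and partition names:

    # Illustrative PySpark job: pre-join dimensions onto a terabyte-scale fact
    # table and publish the result as a partitioned Hive table, so downstream
    # JMP/Tableau queries avoid repeated joins. Names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    fact = spark.table("edw.test_results")
    lot_dim = spark.table("edw.lot_dim")
    tool_dim = spark.table("edw.tool_dim")

    denorm = (
        fact.join(lot_dim, "lot_id", "left")
            .join(tool_dim, "tool_id", "left")
    )

    (denorm.write
           .mode("overwrite")
           .partitionBy("test_date")
           .format("parquet")
           .saveAsTable("analytics.test_results_denorm"))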
Cloud: AWS (S3, Lambda, Glue, Step Functions, VPC, Lake Formation, Redshift, Secrets Manager), Terraform
BigData: Hive, Cloudera Manager, Ambari, NiFi, Kafka, Sqoop, Atlas, Ranger, Airflow, Zeppelin
ETL/BI/Dev: dbt, QuickSight, Jupyter, SageMaker, Denodo, Informatica, BOBJ, VS Code, IntelliJ, Shell
Agile/CI/CD: Confluence, GitHub, JIRA, BitBucket, SourceTree, Bamboo, Docker, Kanban, Scrum
Databases: Redshift, Oracle, SQL Server, MySQL, PostgreSQL
Languages: Unix/Linux shell, Python, PySpark, C, HiveQL, Perl, SQL, PL/SQL
Platforms: AWS, Cloudera/HortonWorks, HDFS, Unix, Linux, AIX, Zaloni, PagerDuty