
Srikanth Reddy Chanda

Dayton, OH

Summary

9 years of IT experience across diverse domains: extensive background in end-to-end data analytics solutions encompassing Big Data, Hadoop, Informatica, Data Modeling, and System Analysis in the Banking, Finance, Insurance, and Telecom sectors.

Overview

9 years of professional experience

Work History

Senior Data Engineer

Bank of America
08.2022 - Current
  • Led Agile delivery and integrated SAFe and DevOps frameworks, leveraging DevOps tools for end-to-end planning, building, testing, releasing, and monitoring processes
  • Architected, developed, and maintained robust CI/CD pipelines for Big Data solutions within Azure DevOps, ensuring seamless deployment of code to production
  • Constructed reusable YAML pipelines, leveraging Azure Data Factory, Data Lake, and Databricks, while effectively managing code changes with Git flow branching strategy
  • Employed PowerShell scripting, Bash, YAML, JSON, GIT, Rest API, and Azure Resource Management (ARM) templates to orchestrate and manage CI/CD pipelines
  • Set and enforced CI/CD standards and best practices, encompassing version control, code reviews, and scalable data processing pipelines implemented with PySpark for efficient data ingestion, transformation, and analysis
  • Leveraged PySpark's distributed computing capabilities to optimize large-scale data processing, significantly enhancing processing speed and overall performance
  • Orchestrated messaging queues using RabbitMQ to facilitate seamless data flow from HDFS for processing, and harnessed Kafka and RabbitMQ to capture data streams, all encapsulated within Docker virtualized test and dev environments
  • Proficiently designed and deployed SSIS packages, enabling seamless extraction, transformation, and loading of data into Azure SQL Database and Azure Data Lake Storage
  • Adeptly configured and fine-tuned SSIS Integration Runtime for efficient execution of SSIS packages in Azure, optimizing overall performance
  • Designed Docker Containers, both Linux and Windows-based, utilizing existing Linux Containers, AMIs, and building from scratch, while effectively managing container clusters with Docker Swarm, Mesos, and Kubernetes
  • Collaborated closely with development teams to diagnose issues and debug code within Windows environments, additionally mentoring junior engineers on CI/CD best practices and cloud-native architectures
  • Developed robust Databricks solutions for data extraction, transformation, and aggregation from diverse sources, creating high-performance data ingestion pipelines via Azure Data Factory and Azure Databricks
  • Constructed SCD Type 2 Dimensions and Facts leveraging Delta Lake and Databricks capabilities, ensuring accurate and efficient data management (a minimal sketch of this pattern follows this list)
  • Engineered custom ETL solutions, encompassing batch processing and real-time data ingestion using PySpark and Shell Scripting, facilitating seamless data movement within Hadoop clusters
  • Crafted Azure Databricks (Spark) notebooks to efficiently extract and load data between Data Lake storage accounts, Blob storage accounts, and on-premises SQL server databases
  • Conducted comprehensive statistical analysis utilizing SQL, Python, Scala, R Programming, and Excel, augmenting data-driven insights and generating key conclusions
  • Employed Python and SAS to extract, transform, and load source data from transaction systems, producing transformative reports and insights, while seamlessly transferring data from Azure storage to Azure SQL on Azure Databricks platform
  • Automated Azure Databricks jobs and constructed SSIS packages to facilitate smooth data transfer from Azure SQL to on-premises servers
  • Designed and implemented ETL solutions in Databricks, adhering to bronze, silver, and gold layer architecture, and leveraged Azure Data Factory to orchestrate data preparation and loading into SQL Data Warehouse
  • Seamlessly integrated on-premises data sources (MySQL, HBase) with cloud platforms (Blob Storage, Azure SQL DB), applying transformations to facilitate loading into Azure Synapse via Azure Data Factory
  • Created, published, and deployed Docker container images via Azure Container Registry into Azure Kubernetes Service (AKS), ensuring efficient containerized deployments
  • Transferred metadata into Hive, seamlessly migrating existing tables and applications for Hive and Azure compatibility, while implementing complex transformations and manipulations using ADF, Scala, and Python
  • Streamlined data ingestion from varied sources, including relational and non-relational databases, through Azure Data Factory configurations, optimizing Apache Airflow performance with tailored settings
  • Designed and implemented DAGs within Apache Airflow to schedule ETL jobs, enhancing workflow efficiency and incorporating additional components like Pool, Executors, and multi-node functionality
  • Configured Spark streaming for real-time data reception from Apache Flume, employing Scala to store stream data in Azure Table and Data Lake, ultimately used for processing and analytics
  • Architected and executed cloud implementation strategies for hosting complex app workloads on MS Azure, ensuring optimal performance and scalability
  • Performed transformation layer operations using Apache Drill, Spark RDDs, DataFrame APIs, and Spark SQL, harnessing Spark's capabilities for various aggregations and data manipulations
  • Derived real-time insights and reports by harnessing Spark Scala functions, optimizing cluster performance and reliability through continuous monitoring and fine-tuning
  • Enhanced query performance by transitioning log storage from Cassandra to Azure SQL Data Warehouse, resulting in improved overall data processing efficiency
  • Engineered custom input adapters utilizing Spark, Hive, and Sqoop to seamlessly ingest analytics data from diverse sources (Snowflake, MS SQL, MongoDB) into HDFS
  • Leveraged Scala for concurrency and parallel processing to optimize large dataset processing efficiency, while developing map-reduce jobs for streamlined data processing
  • Accelerated data processing by developing and optimizing Spark jobs using Python and Spark SQL, fine-tuning parameters like batch interval time and parallelism
  • Implemented indexing for data ingestion using Flume sink, facilitating direct writing to cluster-based indexers
  • Managed and delivered data for analytics and Business Intelligence needs using Azure Synapse, ensuring seamless and reliable data availability
  • Bolstered security by integrating Azure DevOps, VSTS (Visual Studio Team Services), Active Directory, and Apache Ranger for robust CI/CD and authentication mechanisms, effectively managing resource allocation and scheduling through Azure Kubernetes Service
  • Environment: Hadoop, Scala, Spark, Hive, Sqoop, HBase, Flume, Ambari, MS SQL, MySQL, SSIS, Snowflake, MongoDB, Git, Python, Azure (Data Storage Explorer, ADF, AKS, Blob Storage), RabbitMQ, Docker.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability
  • Designed and implemented effective database solutions and models to store and retrieve data.
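
A minimal sketch of the SCD Type 2 pattern referenced above, using the Delta Lake MERGE API on Databricks. The table names, business key, and tracked attribute (gold.dim_customer, customer_id, address) are illustrative assumptions, not the production schema; it also assumes a Databricks notebook where `spark` is predefined:

    # SCD Type 2 sketch with Delta Lake; all table and column names are
    # placeholders. Step 1 closes out changed current rows, step 2 appends
    # the new current versions.
    from delta.tables import DeltaTable
    from pyspark.sql import functions as F

    dim = DeltaTable.forName(spark, "gold.dim_customer")
    updates = spark.table("silver.customer_changes")

    # Step 1: expire current rows whose tracked attribute changed.
    (dim.alias("d")
        .merge(updates.alias("u"),
               "d.customer_id = u.customer_id AND d.is_current = true")
        .whenMatchedUpdate(
            condition="d.address <> u.address",
            set={"is_current": "false", "end_date": "u.effective_date"})
        .execute())

    # Step 2: append the incoming records as the new current versions.
    (updates
        .withColumn("is_current", F.lit(True))
        .withColumn("start_date", F.col("effective_date"))
        .withColumn("end_date", F.lit(None).cast("date"))
        .write.format("delta").mode("append").saveAsTable("gold.dim_customer"))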

Senior Data Engineer

Fannie Mae
01.2021 - 07.2022
  • Architected and executed intricate data pipelines via AWS Glue for efficient transformation and loading of extensive data from diverse sources
  • Automated ETL processes through maintenance and optimization of Glue jobs and crawlers, ensuring seamless data processing and analysis
  • Designed and implemented robust data processing pipelines using Amazon EMR, PySpark, and AWS Glue, enabling efficient extraction, transformation, and loading of large-scale datasets into Redshift for analysis (a minimal Glue job sketch follows this list)
  • Optimized PySpark jobs for performance and scalability, achieving 40% reduction in processing time and enabling real-time insights from data
  • Leveraged AWS Glue to automate ETL workflows, reducing manual intervention and improving data accuracy by creating automated data quality checks and transformations
  • Developed serverless data processing solutions using AWS Lambda and Step Functions, enabling event-driven data processing and reducing operational overhead
  • Architected serverless ETL pipelines with Lambda and Step Functions, achieving cost savings of up to 30% compared to traditional infrastructure setups
  • Implemented logging, monitoring, and error handling mechanisms within serverless applications, ensuring robustness and reliability of data processing workflow
  • Orchestrated containerized data processing workloads using Amazon ECS with Fargate, achieving seamless scaling and resource isolation for data-intensive applications
  • Designed and implemented Docker containers for data processing tasks, allowing for consistent environments and efficient deployment across multiple stages of data pipeline
  • Utilized ECS with Fargate to achieve cost optimization by matching container resources precisely to workload demands, resulting in a 20% reduction in infrastructure costs
  • Designed and implemented holistic data pipelines incorporating AWS services like S3, Glue, and Redshift, seamlessly integrated with Snowflake as a cloud data warehouse
  • Crafted and maintained scalable, performant data models within Snowflake, optimizing PySpark jobs for Kubernetes cluster execution to enhance data processing speed
  • Developed a framework for migrating PowerCenter mappings to PySpark (Python and Spark) jobs, guiding and enforcing quality standards for development team
  • Orchestrated PySpark integration with Hadoop, Hive, and other big data technologies, establishing comprehensive end-to-end data processing pipelines
  • Employed AWS EMR to deploy and manage big data processing applications, utilizing frameworks like Spark and Hadoop for advanced data processing
  • Engineered Spark, Hive, Pig, Python, Impala, and HBase data pipelines for seamless customer data ingestion and processing
  • Designed RESTful APIs with Django Rest Framework (DRF) and Flask-RESTful, ensuring seamless integration with external systems
  • Generated SQL and PL/SQL scripts for managing database objects, encompassing tables, views, primary keys, indexes, and sequences
  • Orchestrated Amazon EC2 instances creation, troubleshooting, and health monitoring, alongside other AWS services for multi-tier application deployment
  • Designed and executed high-availability, fault-tolerant, and auto-scaling multi-tier applications utilizing AWS services like EC2, Route53, S3, RDS, DynamoDB, SNS, SQS, and IAM
  • Employed Apache Spark and Python for Big Data Analytics and Machine Learning applications, with expertise in Spark ML and MLlib
  • Provided Linux and Windows cloud instances support on AWS, configuring Elastic IP, Security Groups, and Virtual Private Cloud
  • Configured Amazon EC2, S3, Elastic Load Balancing, and security components in VPC, ensuring robust network security
  • Automated data backups to S3 buckets, EBS, and AMIs using AWS CLI, ensuring data safety for critical production servers
  • Created OpenShift namespaces for on-premises applications transitioning to the cloud in an OpenShift PaaS environment
  • Virtualized servers using Docker for testing and development environments, streamlining configuration through Docker containers
  • Managed Docker clusters, including Docker Swarm, Mesos, and Kubernetes, integrating them with Amazon AWS/EC2 and Google Kubernetes Engine
  • Developed Jenkins CI/CD pipeline jobs for end-to-end automation, overseeing artifact management in Nexus repository, and utilizing Jenkins nodes for parallel builds
  • Environment: AWS, Ansible, ANT, Maven, Jenkins, Bamboo, Splunk, Confluence, Bitbucket, GIT, JIRA, Python, SSH, Shell Scripting, Docker, JSON, JAVA/J2EE, Kubernetes, Nagios, Red Hat Enterprise Linux, Terraform, Kibana, AWS Fargate.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability
  • Designed and implemented effective database solutions and models to store and retrieve data
  • Prepared documentation and analytic reports, delivering summarized results, analysis and conclusions to stakeholders
  • Applied Good Documentation Practices (GDP) to validation protocols, test cases, and change control documents
  • Analyzed complex data and identified anomalies, trends, and risks to provide useful insights to improve internal controls
  • Developed, implemented and maintained data analytics protocols, standards, and documentation
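
The minimal AWS Glue job sketch referenced above: read a crawled table from the Glue Data Catalog, apply light cleanup, and land curated Parquet in S3 for Redshift COPY or Spectrum queries. The catalog database, table, columns, and bucket are illustrative assumptions:

    # Minimal AWS Glue (PySpark) job sketch; all names are placeholders.
    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read a crawled table from the Glue Data Catalog.
    src = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="transactions")

    # Light cleanup: drop malformed records, keep only the needed columns.
    clean = (src.drop_fields(["_corrupt_record"])
                .select_fields(["txn_id", "account_id", "amount", "txn_date"]))

    # Land curated Parquet in S3 for downstream Redshift consumption.
    glue_context.write_dynamic_frame.from_options(
        frame=clean,
        connection_type="s3",
        connection_options={"path": "s3://curated-bucket/transactions/"},
        format="parquet")
    job.commit()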

Senior Data Engineer

AutoZone
10.2019 - 12.2020
  • Developed RESTful APIs using Python with Flask and Django frameworks, seamlessly integrating diverse data sources such as Java, JDBC, RDBMS, Shell Scripting, Spreadsheets, and Text files
  • Leveraged Apache Spark with Python to architect and execute sophisticated Big Data Analytics and Machine Learning applications, successfully implementing machine learning use cases within Spark ML and MLlib
  • Designed and deployed SSIS packages for data loading and transformation within Azure databases and storage environments
  • Configured and managed SSIS Integration Runtime for seamless execution of SSIS packages in Azure infrastructure
  • Employed Spark and Python to craft regular expression (regex) projects within the Hadoop/Hive ecosystem, spanning Linux and Windows environments for comprehensive big data processing
  • Developed Spark streaming modules for efficient data acquisition from RabbitMQ and Kafka sources (a streaming ingestion sketch follows this list)
  • Proficiently profiled structured, unstructured, and semi-structured data across diverse sources, adeptly identifying data patterns
  • Implemented data quality metrics via essential queries and Python scripts tailored to source characteristics
  • Analyzed, designed, and engineered contemporary, scalable, and distributed data solutions using Hadoop and Azure cloud services.
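
A minimal sketch of the Kafka-side streaming ingestion referenced above, written with the Spark Structured Streaming API; it assumes the spark-sql-kafka connector is on the classpath, and the broker address, topic, and paths are illustrative placeholders:

    # Structured Streaming sketch: Kafka topic -> Parquet micro-batches.
    # Endpoints and paths below are illustrative.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    events = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "orders")
        .load()
        .select(F.col("value").cast("string").alias("payload"), "timestamp"))

    query = (events.writeStream
        .format("parquet")
        .option("path", "/data/landing/orders")
        .option("checkpointLocation", "/data/checkpoints/orders")
        .trigger(processingTime="1 minute")
        .start())
    query.awaitTermination()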

Data Engineer

Hike
08.2017 - 09.2019
  • Contributed to the analysis, design, and development phases of the Software Development Lifecycle (SDLC)
  • Proficient in agile methodologies, actively participating in sprint planning, scrum calls, and retrospective meetings
  • Managed project tracking with JIRA and version control via GitHub
  • Designed, developed, and maintained transformation processes in both non-production and production environments within Azure
  • Crafted data pipelines using PySpark Programming, employing technologies like Spark, Hive, Pig, Python, Impala, and HBase for effective customer data ingestion
  • Utilized Spark Streaming to segment streaming data into batches for seamless input to Spark engine, facilitating efficient batch processing
  • Developed Spark applications for tasks such as data validation, cleansing, transformation, and custom aggregation (a minimal sketch follows this list)
  • Employed Spark engine and Spark SQL for comprehensive data analysis, providing valuable insights for data scientists' further investigations
  • Engineered RESTful APIs using Python with Flask and Django frameworks, seamlessly integrating diverse data sources including Java, JDBC, RDBMS, Shell Scripting, Spreadsheets, and Text files
  • Leveraged Apache Spark with Python to architect and execute advanced Big Data Analytics and Machine Learning applications, successfully executing machine learning use cases within Spark ML and MLlib
  • Developed Spark and Python solutions for regular expression (regex) projects within the Hadoop/Hive environment, proficiently operating across Linux and Windows platforms for robust big data processing
  • Created Spark streaming modules for efficient data acquisition from RabbitMQ and Kafka sources
  • Proficiently profiled structured, unstructured, and semi-structured data from various sources, identifying key data patterns
  • Implemented data quality metrics through tailored queries and Python scripts based on source-specific characteristics
  • Designed, constructed, and managed SSIS packages to facilitate efficient data integration and transformation within Azure
  • Skillfully configured and optimized SSIS Integration Runtime for seamless package execution on the Azure platform
  • Analyzed, designed, and constructed modern, scalable distributed data solutions utilizing Hadoop and Azure cloud services.
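
A minimal PySpark sketch of the validation, cleansing, and custom-aggregation pattern referenced above; the input path, column names, and rules are illustrative assumptions:

    # Batch cleanse-and-aggregate sketch; paths and columns are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("customer-etl").getOrCreate()

    raw = spark.read.parquet("/data/raw/events")

    # Cleanse: drop rows missing keys, normalize a text field.
    clean = (raw.dropna(subset=["customer_id", "event_type"])
                .withColumn("event_type", F.lower(F.trim("event_type"))))

    # Custom aggregation: per-customer activity summary for analysts.
    summary = (clean.groupBy("customer_id")
        .agg(F.count("*").alias("event_count"),
             F.countDistinct("event_type").alias("distinct_event_types"),
             F.max("event_ts").alias("last_seen")))

    summary.write.mode("overwrite").parquet("/data/curated/customer_summary")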

Data Engineer

Myntra
12.2014 - 06.2017
  • Played a pivotal role in capturing comprehensive business, system, and design requirements
  • Conducted gap analysis and illustrated findings through use case diagrams and flow charts
  • Architected a dynamic, cross-device, cross-browser, mobile-friendly web dashboard utilizing AngularJS
  • Enabled management of multiple chatbots across diverse environments
  • Orchestrated the development of Bot Framework conversation flows utilizing Node-RED, Node.js, and the MS Bot Framework
  • Crafted the user interface for the web dashboard utilizing HTML, CSS, Bootstrap, and AngularJS
  • Designed, constructed, and managed SSIS packages, enabling seamless data integration and transformation within Azure
  • Skillfully configured and optimized SSIS Integration Runtime for efficient package execution on the Azure platform
  • Pioneered the creation of custom nodes on the Node-RED dashboard, facilitating streamlined conversation building through Node.js over the MS Bot Framework
  • Actively contributed to the implementation of user authentication mechanisms within the application, leveraging Stormpath and Passports for robust security measures
  • Employed a diverse array of Validation Controls for client-side validation
  • Crafted custom validation controls using Angular validation controls and Angular Material Design, enhancing data integrity
  • Engineered Spark applications using PySpark and Spark-SQL for robust data extraction, transformation, and aggregation
  • Analyzed and transformed data from multiple file formats, unveiling valuable insights into customer usage patterns
  • Successfully established a robust CI/CD pipeline leveraging Jenkins and Airflow, with containerization via Docker and Kubernetes
  • Orchestrated ETL operations using SSIS, NIFI, Python scripts, and Spark Applications
  • Constructed data flow pipelines, expertly transforming data from legacy tables to Hive, HBase tables, and S3 buckets, and handed the results off to business stakeholders and data scientists for advanced analytics
  • Implemented data quality checks using Spark Streaming, categorizing records with bad and passable flags to ensure data integrity and reliability (a minimal sketch follows this list).
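
A minimal sketch of the streaming quality-flag pattern referenced in the last bullet, shown here with the Structured Streaming API; the schema, rules, and paths are illustrative assumptions:

    # Streaming data-quality flags: tag records bad/passable, route by flag.
    # Schema, rules, and paths are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dq-flags").getOrCreate()

    stream = (spark.readStream
        .schema("order_id STRING, amount DOUBLE, ts TIMESTAMP")
        .json("/data/incoming/orders"))

    flagged = stream.withColumn(
        "dq_flag",
        F.when(F.col("order_id").isNull() | (F.col("amount") <= 0), "bad")
         .otherwise("passable"))

    def route(batch_df, batch_id):
        # Persist each micro-batch into the matching zone.
        batch_df.filter("dq_flag = 'passable'") \
                .write.mode("append").parquet("/data/clean/orders")
        batch_df.filter("dq_flag = 'bad'") \
                .write.mode("append").parquet("/data/quarantine/orders")

    query = (flagged.writeStream
        .foreachBatch(route)
        .option("checkpointLocation", "/data/checkpoints/dq")
        .start())
    query.awaitTermination()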

Skills

  • Hadoop Proficiency: Strong support experience across major Hadoop distributions (Cloudera, Amazon EMR, Azure HDInsight, Hortonworks); proficient with Hadoop tools including HDFS, MapReduce, YARN, Spark, Kafka, Hive, Impala, HBase, Sqoop, and Airflow
  • Azure Cloud and Big Data Tools: Working knowledge of Azure components including HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, SQL DB, SQL DWH, and Cosmos DB; hands-on experience with Spark using Scala and PySpark
  • Database Migration: Expertise in migrating SQL databases to Azure Data Lake, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; proficient in access control and migration using Azure Data Factory
  • Cloud Computing and Big Data Tools: Proficient in Azure Cloud and Big Data tools including Hadoop, HDFS, MapReduce, Hive, HBase, Spark, Amazon EC2, DynamoDB, S3, Kafka, Flume, Avro, Sqoop, and PySpark
  • Real-time Data Solutions: Build real-time data pipelines and analytics using Azure components like Data Factory, HDInsight, Azure ML Studio, Stream Analytics, Azure Blob Storage, and Microsoft SQL DB
  • Database Expertise: Work with SQL Server and MySQL databases; skilled in working with Parquet files and in parsing and validating JSON formats; hands-on experience setting up workflows with Apache Airflow and Oozie
  • API Development and Integration: Develop highly scalable and resilient RESTful APIs, ETL solutions, and third-party platform integrations as part of an Enterprise Site platform
  • IDE and Version Control: Proficient with IDEs such as PyCharm and IntelliJ, and with the SVN and Git version control systems
  • Technical Skills:
  • Big Data Technologies: Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Flume, Impala, HDFS, MapReduce, Hive, Pig, BDM, Sqoop, Oozie, Zookeeper
  • Hadoop Distributions: Cloudera CDH, Apache, Hortonworks HDP
  • Programming Languages: SQL, PL/SQL, Python, R, PySpark, Scala, Pig, HiveQL, Shell, Regular Expressions
  • Messaging: RabbitMQ
  • Spark Components: RDD, Spark SQL (DataFrames and Datasets), Spark Streaming
  • Cloud Infrastructure: Azure, GCP
  • Databases: Oracle, Teradata, MySQL, SQL Server; NoSQL: HBase, MongoDB
  • Scripting & Query Languages: Shell scripting, SQL
  • Version Control: CVS, SVN, ClearCase, Git
  • Build Tools: Maven, SBT
  • Containerization Tools: Kubernetes, Docker, Docker Swarm
  • Reporting and Development Tools: JUnit, Eclipse, Visual Studio, NetBeans, Azure Databricks, CI/CD, Linux, UNIX, Google Shell, Power BI, SAS, Tableau
  • Environments:
  • Python, HDFS, PySpark, YARN, Pandas, NumPy, Spark, EMR, Spectrum, Glue, Netezza, Azure (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, SQL DB, SQL DWH, AD, AKS), Scala, SSIS, Hadoop 2.x, Spark v2.0.2, NLP, Airflow v1.8.2, Hive v2.0.1, Sqoop v1.4.6, HBase, Oozie, Talend, Cosmos DB, MS SQL, MongoDB, Ambari, Power BI, Azure DevOps, Ranger, Git, RabbitMQ
  • Python, Informatica v9, MS SQL Server, SSIS, T-SQL, SSRS, SQL Server Management Studio, Oracle, Excel

Accomplishments

  • Proficiently build reusable YAML pipelines in Azure DevOps, create CI/CD pipelines using cloud-native architectures on Azure Cloud, and implement Git flow branching strategy
  • Cloud Native Technologies: Solid grasp of Azure services including Databricks, Data Factory, Data Lake, and Function Apps
  • Proficient with SQL and experienced in Agile, SAFe, and DevOps methodologies
  • Developed Spark streaming modules for RabbitMQ and Kafka data ingestion
  • DevOps and Scripting Proficiency: Skilled in PowerShell scripting, Bash, YAML, JSON, GIT, Rest API, and Azure Resource Management (ARM) templates
  • Implement CI/CD standards, integrate security scanning tools, and manage pipelines effectively
  • Windows Scripting and Cloud Containerization: Proficient in scripting and debugging within Windows environments
  • Familiarity with container orchestration, Kubernetes, Docker, and AKS
  • Efficient Data Integration: Expertise in designing and deploying SSIS packages for data extraction, transformation, and loading into Azure SQL Database and Data Lake Storage
  • Configure SSIS Integration Runtime for Azure execution and optimize package performance
  • Data Visualization and Analysis: Create data visualizations using Python, Scala, and Tableau
  • Develop Spark scripts with custom RDDs in Scala for data transformation and actions
  • Conduct statistical analysis on healthcare data using Python and various tools
  • Big Data Ecosystem: Extensive experience with Amazon EC2 for computing, query processing, and storage
  • Proficiently set up Pipelines in Azure Data Factory using Linked Services, Datasets, and Pipelines for ETL tasks
  • Azure Data Services: ETL expertise using Azure Data Factory, T-SQL, Spark SQL, and U-SQL Azure Data Lake Analytics
  • Ingest data to Azure Services and process within Azure Databricks
  • Developed JSON scripts for streamlined deployment of pipelines within Azure Data Factory (ADF), facilitating efficient data processing through SQL Activity
  • Demonstrated expertise in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, Delta Lake, and Azure SQL Data Warehouse
  • Successfully managed database access, control, and migration via Azure Data Factory
  • Executed Extract Transform Load (ETL) operations on Azure Data Storage services using a blend of Azure Data Factory, T-SQL, Spark SQL, and U-SQL Azure Data Lake Analytics
  • Profoundly knowledgeable in Clustering, NLP, and Neural Networks, effectively translating outcomes into interactive dashboards for visualization and presentation
  • Implemented Spark RDD transformations for comprehensive business analysis and subsequent actionable processes
  • Strategically engaged in data migration to Hadoop while optimizing Hive queries for performance enhancement
  • Expertly orchestrated data extraction, transformation, and loading across Azure Data Storage services through Azure Data Factory, T-SQL, Spark SQL, and U-SQL Azure Data Lake Analytics
  • Subsequent data processing transpired within Azure Databricks
  • Automated script execution through Apache Airflow and shell scripting, ensuring seamless daily production procedures
  • Constructed pipelines in Azure Data Factory (ADF) encompassing Linked Services, Datasets, and Pipelines
  • Successfully extracted, transformed, and loaded data from diverse sources like Azure SQL, Blob storage, Azure SQL Data Warehouse, and more
  • Spearheaded Data Migration initiatives employing SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell technologies
  • Proficiently profiled structured, unstructured, and semi-structured data from various sources, implementing data quality metrics via essential queries or Python scripts aligned with source attributes (a minimal sketch follows this list)
  • Skillfully employed PowerShell and UNIX scripts for file management, transfer, emailing, and related tasks
  • Conceptualized a novel data model integrating NoSQL submodules within a relational structure through hybrid data modeling concepts
  • Leveraged Sqoop to seamlessly transfer data between RDBMS and HDFS, streamlining data integration
  • Proficiently installed and configured Apache Airflow for Snowflake data warehouse, establishing robust DAGs (Directed Acyclic Graphs) for automated workflow execution
  • Employed MongoDB for data storage in JSON format, adeptly crafting and testing dashboard features utilizing Python, Bootstrap, CSS, and JavaScript
  • Ensured efficient code deployment to EMR via CI/CD using Jenkins
  • Possess sound expertise in developing highly scalable and resilient RESTful APIs, ETL solutions, and third-party platform integrations within Enterprise Site platforms
  • Proficiently navigated IDEs including PyCharm and IntelliJ, and managed repositories using the SVN and Git version control systems.
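
A minimal Python sketch of the source-profiling metrics referenced above; the file name, columns, and metric set are illustrative assumptions:

    # Quick data-quality profile of a source extract; names are placeholders.
    import pandas as pd

    df = pd.read_csv("source_extract.csv")

    metrics = {
        "row_count": len(df),
        "null_pct": df.isna().mean().round(4).to_dict(),  # null ratio per column
        "duplicate_rows": int(df.duplicated().sum()),
        "distinct_ids": int(df["record_id"].nunique()),
    }
    print(metrics)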

Timeline

Senior Data Engineer

Bank of America
08.2022 - Current

Senior Data Engineer

Fannie Mae
01.2021 - 07.2022

Senior Data Engineer

AutoZone
10.2019 - 12.2020

Data Engineer

Hike
08.2017 - 09.2019

Data Engineer

Myntra
12.2014 - 06.2017