
Navaneeswara Namala

Charlotte, North Carolina

Summary

  • Over 9 years of software development experience, including 6 years on Big Data technologies within the Apache Spark and Hadoop ecosystem
  • Proficient in building ETL and machine learning data pipelines using Azure Databricks and Azure Data Factory; skilled in the MLflow model registry, Delta Lakehouse, and AutoML features in Azure Databricks
  • Extensive experience in end-to-end machine learning projects, from design to delivery
  • Expertise in cloud-based data warehouses such as Snowflake and Delta Lakehouse; built automated data pipelines from landing zones to refined tables in the Snowflake data warehouse
  • Hands-on experience in Big Data analytics, encompassing data extraction, transformation, loading, and analysis using the Databricks, Cloudera, and Hortonworks platforms
  • Proficient in Java, Hadoop MapReduce, HDFS, Pig, Hive, Oozie, Sqoop, HBase, Scala, Python, Kafka, and NoSQL databases
  • Worked on on-premises and cloud computing environments, including Azure, AWS, and Google Cloud, with expertise in cloud Big Data tools on Azure and AWS
  • Familiar with Azure data engineering tools such as Azure Cosmos DB (SQL, Table, Cassandra, MongoDB, Graph), Azure Synapse Analytics, Azure Stream Analytics, Azure Databricks, Azure Data Lake Storage Gen2, and Azure Storage account services
  • Experienced with AWS Big Data services including Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, SQS, S3, EMR, DynamoDB, Redshift, Aurora, Glue, and QuickSight
  • In-depth understanding of Hadoop architecture and components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm
  • Extensive experience working with structured data using Hive, writing custom UDFs, and optimizing Hive queries
  • Proficient in importing/exporting data with Sqoop between HDFS and relational databases such as Oracle, Teradata, Netezza, and SQL Server
  • Strong knowledge of NoSQL databases including MongoDB, HBase, Cassandra, AWS DynamoDB, and Azure Cosmos DB
  • Well-versed in data warehousing concepts, fact and dimension tables, and diverse data formats (CSV, text, Avro, ORC, JSON, Parquet, and Delta)
  • Managed and monitored Apache Hadoop clusters using Ambari
  • Proficient in data ingestion tools such as Azure Data Factory, Apache Sqoop, and AWS Kinesis
  • Experienced with Spark and Spark Streaming (Scala and Python), with hands-on experience in Azure Databricks
  • Hands-on experience in data mining techniques and machine learning algorithms using Python libraries: scikit-learn, Seaborn, Matplotlib, NumPy, and Pandas
  • Strong understanding of NoSQL databases, with hands-on experience writing applications on HBase and real-time processing through REST APIs
  • Experienced with build tools Maven, Ant, Jenkins, Bamboo, GitLab, and Azure DevOps for deploying automated builds in different environments; familiar with CI/CD pipelines

Overview

10 years of professional experience

Work History

Data Engineer/ ML Engineer

Cummins
11.2020 - Current
  • Worked in a Data Science team that delivers solutions using predictive modeling and machine learning techniques
  • Worked on exploratory data analysis using Scala, Python, and data visualization modules Seaborn and Matplotlib
  • Worked with data scientists to create a clean and transformed dataset to train a model
  • Registered Python scikit-learn machine learning models in the MLflow registry on Azure Databricks
  • Created custom models, packaged them as MLflow pyfunc models, and registered them in the model registry to serve as an on-demand API for predictions
  • Created an end-to-end pipeline to process and transform the data and apply the machine learning classification model in a Spark Structured Streaming job, saving the results to an Azure SQL Server table for access through a REST API (see the sketch after this list)
  • Created an end-to-end pipeline involving data wrangling, transformation, and aggregation to build the input data for the classification model, then applied the ML model from the MLflow registry to generate predictions
  • Experienced in using Azure Event Hubs to send events to the downstream team
  • Created a job to generate metrics for the machine learning pipelines that help business clients gain insights
  • Implemented Delta Lake migration by converting the parquet tables into the Delta tables and storing them on the Delta Lakehouse
  • Created indexes on the SQL server table to optimize the performance
  • Implemented Continuous integration and continuous delivery using Gitlab and Databricks repos
  • Created a data pipeline that fetches data through a REST API, cleans and transforms it to build the model input, passes it to the regression model for predictions, and sends the results to the downstream team using Azure Event Hubs
  • Created a framework to verify the predictions of the machine learning models with the ground-level raw data
  • Created a model monitoring framework to identify the data drift in the raw data serving the machine learning models
  • Worked on a proof of concept for implementing the Feature Store in Databricks and the MLflow remote model registry
  • Worked on a proof of concept for implementing Azure Machine Learning Studio
  • Worked on Databricks runtime migration project and workspace migrations
  • Worked on automated scripts to load and read data from Snowflake data warehouse tables using Python, Snowpipe, and SnowSQL
  • Experienced in Snowflake time travel, table optimization, and data sharing with other customers through the Snowflake Marketplace.
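
A minimal sketch of the scoring pattern referenced above (see the Structured Streaming bullet): load a registered model from the MLflow registry as a Spark UDF and apply it in a Structured Streaming job that lands predictions in an Azure SQL Server table. The model name, feature columns, source table, and JDBC settings are hypothetical placeholders, not the actual production pipeline.

```python
import mlflow.pyfunc
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Load a registered model from the MLflow registry as a Spark UDF
# (model name and stage are placeholders).
predict_udf = mlflow.pyfunc.spark_udf(
    spark, model_uri="models:/claim_classifier/Production")

feature_cols = ["sensor_a", "sensor_b", "sensor_c"]  # hypothetical features

# Stream the cleaned, transformed records, apply the model, and keep
# only what the downstream REST API needs.
scored = (
    spark.readStream.table("refined.engine_events")
         .withColumn("prediction", predict_udf(*[F.col(c) for c in feature_cols]))
)

def write_to_sql(batch_df, batch_id):
    # Write each micro-batch to an Azure SQL Server table over JDBC;
    # the SQL Server driver ships with Databricks runtimes.
    (batch_df.write
        .format("jdbc")
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .option("url", "jdbc:sqlserver://<server>.database.windows.net;database=<db>")
        .option("dbtable", "dbo.predictions")
        .option("user", "<user>")
        .option("password", "<password>")
        .mode("append")
        .save())

query = (scored.writeStream
               .foreachBatch(write_to_sql)
               .option("checkpointLocation", "/mnt/checkpoints/predictions")
               .start())
```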

Big Data Developer

Walgreens
06.2018 - 10.2020
  • Worked on analyzing the Hadoop cluster and Big Data analytic tools, including Hive, HBase, and Sqoop
  • Implemented Spark 2.4 using Scala 2.11 and Spark SQL for faster processing of data
  • Used Spark for interactive queries, processing of streaming data and integration
  • Imported data from Teradata, Oracle, and SQL Server into HDFS using Sqoop
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing
  • Worked on Spark Streaming and Spark SQL to run sophisticated applications on Hadoop
  • Ingested data into HDFS and wrote Hive queries to process the required data
  • Loaded and transformed large sets of structured and semi-structured XML data into HDFS
  • Configured connection between HDFS and Tableau using Impala for Tableau developer team
  • Experienced in managing and reviewing Hadoop log files for troubleshooting
  • Developed shell scripts for executing the Hadoop file system commands for file handling operations
  • Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs
  • Created jobs for loading the raw data into the tables and denormalized tables
  • Performance-tuned and optimized Hive queries to achieve high performance
  • Wrote Hive queries for data analysis to meet business requirements
  • Monitored System health and logs and responded accordingly to warning or failure conditions
  • Responsible for managing the test data coming from different sources
  • Involved in scheduling the Oozie workflow engine to run multiple Hive jobs for the business team to run analytical queries
  • Created and maintained technical documentation for launching Hadoop jobs and executing Hive queries and Pig Scripts
  • Implemented schedulers on the Job tracker to share the cluster’s resources for the Map Reduce jobs given by the users
  • Worked on Azure data engineering tools such as Azure Data Factory, Azure Databricks, Cosmos DB, Azure Synapse Analytics, Power BI, and Azure Data Lake Storage Gen2
  • Created ETL jobs using Azure Data Factory to ingest data from Azure Blob Storage into Azure Data Lake Storage and then load it into Azure Synapse Analytics (Azure data warehouse) for processing with Azure Databricks and running analytical queries
  • Created PySpark jobs to process the data in Azure Data Lake Storage Gen2 and load it into Azure Synapse Analytics tables (see the sketch after this list)
  • Created automated jobs for ingesting and processing data using Azure Data Factory, Azure Databricks, Azure Data Lake Storage Gen2 and Azure Synapse Analytics
  • Worked on POC to migrate the On-premises data from HBase tables to Azure Cosmos DB SQL table
  • Created jobs using NiFi to extract, load, and transform data into HDFS
  • Implemented Transparent Data Encryption and created encryption zones in existing Hadoop folders
  • Wrote automated scripts using Python and shell scripting to implement TDE in the Hadoop cluster
  • Wrote Python scripts to run multiple threads from a single processor
  • Wrote shell scripts to copy larger volumes of data using the DistCp tool
  • Created Hadoop jobs to transfer the data between the 2 Hadoop clusters
  • Created incremental jobs to load data into HBase tables using the HBase bulk load tool, importing data from different file formats such as CSV, TSV, and JSON
  • Worked on space optimization by redesigning the HBase table row key
  • Experienced in performing exploratory data analysis (EDA) using Python with the Seaborn, Matplotlib, NumPy, Pandas, and scikit-learn packages
  • Worked on training and building predictive models using linear regression algorithms with the Python scikit-learn module
  • Made changes to an existing Ab Initio job by updating the SQL query and the DML files and verified the results in the Teradata tables
  • Worked and analyzed the Spark Scala jobs
  • Used Bitbucket as the code repository and Artifactory to store the artifacts
  • Implemented continuous integration and continuous delivery using DevOps tools such as Bitbucket and Artifactory to deploy the artifacts to the test and production servers.
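
A minimal PySpark sketch of the ADLS Gen2 to Azure Synapse load referenced above; the storage account, container, table name, and staging path are hypothetical, and the Databricks Synapse connector (com.databricks.spark.sqldw) is assumed to be available on the cluster.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read raw files landed in ADLS Gen2 (paths are placeholders).
raw = (spark.read
            .option("header", "true")
            .csv("abfss://raw@<storageaccount>.dfs.core.windows.net/sales/"))

# Example transformation: basic cleansing and typing before the warehouse load.
cleaned = (raw.dropDuplicates()
              .withColumn("sale_date", F.to_date("sale_date", "yyyy-MM-dd"))
              .filter(F.col("amount").isNotNull()))

# Write to an Azure Synapse Analytics table, staging data through ADLS Gen2.
(cleaned.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<synapse-server>.database.windows.net:1433;database=<dw>")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.sales_refined")
    .option("tempDir", "abfss://staging@<storageaccount>.dfs.core.windows.net/tmp/")
    .mode("append")
    .save())
```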

Hadoop Developer

Entergy
03.2017 - 05.2018
  • Developed Spark jobs using Spark Streaming APIs to perform necessary transformations and actions on the fly for building the standard learner data model, which receives data from AWS Kinesis Data Firehose in near real time and persists it into AWS DynamoDB
  • Configured AWS Kinesis Data Streams and Kinesis Data Firehose jobs in Dev and Test environment and loaded data into AWS S3 buckets and AWS Redshift
  • Worked on AWS Elastic Cloud Compute (EC2 Instance) for computational tasks and stored data into AWS S3 buckets as the storage mechanism
  • Created automated PySpark batch processing jobs to clean and process data by loading it from the AWS S3 bucket into a Spark DataFrame, writing the processed data back to S3, and then loading it into AWS Redshift tables with the COPY command for analytics (see the sketch after this list)
  • Worked on provisioning the AWS Elastic Map Reduce (EMR) cluster by specifying the data inputs, outputs and IAM security roles
  • Developed Spark scripts by using Scala shell commands as per the requirement
  • Used Spark SQL API over Hadoop YARN to perform analytics on data in Hive
  • Developed Scala scripts and UDFs using DataFrames, SQL, Datasets, and RDDs in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, and other transformations during the ingestion process itself
  • Developed a generic Spring application for automating Hadoop jobs in sequence, integrated with Hive, Sqoop, Oracle Database, HDFS, and shell scripts
  • Developed an Oracle database command line interface for executing queries from the Hadoop Edge node using Spring JDBC and Maven
  • Wrote complex SQL queries, stored procedures, and PL/SQL using SQL Developer, executed on an Oracle database
  • Worked on creating Hive tables and writing Hive queries for data analysis to meet business requirements, and experienced in using Sqoop to import and export data from Oracle and Teradata
  • Experienced in job management using the ESP scheduler and developed job processing scripts using Oozie workflows
  • Developed and executed hive queries for de-normalizing the data
  • Designed and implemented Incremental Imports into Hive tables and writing Hive queries to run on TEZ
  • Developed runnable Java JARs for integrating and supporting the existing Hadoop jobs using Core Java, the Spring framework, Spring JDBC, Maven, JUnit, etc.
  • Used Jenkins for Continuous Integration and GitHub as a Version control
  • Scheduled automated triggers to build the artifacts like Jar and EAR files and to store in the Artifactory and to deploy in the destination server location
  • Worked on UNIX shell Scripting.
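
A minimal sketch of the S3 batch-processing pattern referenced above: read raw objects from S3 into a DataFrame, clean them, and write the processed output back to S3 for a subsequent Redshift COPY. Bucket names and columns are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("s3-batch-clean").getOrCreate()

# Load raw CSV files from S3 into a DataFrame (bucket/prefix are placeholders).
raw = (spark.read
            .option("header", "true")
            .csv("s3a://raw-bucket/meter-readings/"))

# Basic cleaning: drop duplicates, enforce types, remove bad rows.
processed = (raw.dropDuplicates()
                .withColumn("reading_ts", F.to_timestamp("reading_ts"))
                .filter(F.col("reading_value").cast("double").isNotNull()))

# Write processed data back to S3 as Parquet; a Redshift COPY command
# (run separately, e.g. COPY analytics.readings FROM 's3://processed-bucket/...')
# then loads it into the warehouse tables for analytics.
(processed.write
    .mode("overwrite")
    .parquet("s3a://processed-bucket/meter-readings/"))
```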

Big Data Developer

Key Bank
08.2016 - 02.2017
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
  • Configured Sqoop Jobs to import data from RDBMS into HDFS using Oozie workflows
  • Involved in creating Hive internal and external tables, loading data, and writing Hive queries that run internally as MapReduce jobs
  • Involved in Migrating the Hive queries to Impala
  • Created batch analysis job prototypes using Hadoop, Pig, Oozie, Hue and Hive
  • Assisted with data capacity planning and node forecasting
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts)
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action
  • Created views, managed tables, and external tables in Hive and loaded incremental data into the tables (see the sketch after this list)
  • Documented the system processes and procedures for future reference
  • Monitored workload, job performance and capacity planning using Cloudera Manager
  • Monitored and performance-tuned Hadoop clusters, screened Hadoop cluster job performance for capacity planning, monitored Hadoop cluster connectivity and security, and managed and reviewed Hadoop log files.
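
A minimal illustration of the managed vs. external Hive table pattern and an incremental load referenced above, issued here through PySpark's spark.sql for brevity (the original work used HiveQL directly); the database, table, and path names are hypothetical.

```python
from pyspark.sql import SparkSession

# enableHiveSupport assumes a Hive metastore is configured for this cluster.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Managed (internal) table: Hive owns both the metadata and the data files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders_managed (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_date DATE)
    STORED AS PARQUET
""")

# External table: Hive tracks only the metadata; data stays at the HDFS path.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders_ext (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_date DATE)
    STORED AS PARQUET
    LOCATION '/data/landing/orders'
""")

# Incremental load: append only the rows newer than the current high-water mark.
spark.sql("""
    INSERT INTO sales.orders_managed
    SELECT order_id, customer_id, amount, order_date
    FROM sales.orders_ext
    WHERE order_date > (SELECT COALESCE(MAX(order_date), DATE '1970-01-01')
                        FROM sales.orders_managed)
""")
```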

Hadoop Developer

Lowe’s
05.2016 - 07.2016
  • Installed and configured Hadoop on multiple nodes on the Cloudera platform
  • Set up and optimized standalone, pseudo-distributed, and distributed clusters
  • Developed simple to complex MapReduce streaming jobs; analyzed data with Hive, Pig, and Hadoop Streaming
  • Built, tuned, and maintained HiveQL and Pig scripts for reporting purposes
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop
  • Analyzed data by writing Hive queries (HQL) and running Pig scripts (Pig Latin) to study customer behavior
  • Stored the data in an Apache Cassandra cluster
  • Used Impala to query the Hadoop data stored in HDFS
  • Managed and reviewed Hadoop log files
  • Supported and troubleshot MapReduce programs running on the cluster
  • Loaded data from the Linux file system into HDFS
  • Installed and configured Hive and wrote Hive UDFs
  • Created tables, loaded data, and wrote queries in Hive
  • Developed scripts to automate routine DBA tasks using Linux shell scripts and Python (see the sketch after this list).
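
A minimal sketch of the routine-task automation mentioned in the last bullet: a Python wrapper around the HDFS shell that reports directory usage and flags paths above a threshold. The watched directories and threshold are hypothetical.

```python
#!/usr/bin/env python
"""Report HDFS directory usage and flag paths above a size threshold."""
import subprocess

# Directories to check and an alert threshold (placeholders).
WATCHED_DIRS = ["/user/hive/warehouse", "/data/landing"]
THRESHOLD_BYTES = 500 * 1024 ** 3  # 500 GB

def hdfs_du(path):
    """Yield (bytes_used, path) pairs from `hdfs dfs -du` for one directory."""
    out = subprocess.check_output(["hdfs", "dfs", "-du", path]).decode()
    for line in out.strip().splitlines():
        parts = line.split()
        # Output format: <size> [<size_with_replication>] <path>
        yield int(parts[0]), parts[-1]

if __name__ == "__main__":
    for base in WATCHED_DIRS:
        for size, path in hdfs_du(base):
            if size > THRESHOLD_BYTES:
                print("WARNING: %s is using %.1f GB" % (path, size / 1024.0 ** 3))
```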

Java Developer

Smartron
01.2014 - 07.2015
  • Involved in the Software Development Life Cycle (SDLC), from analysis and design through programming, test cases, implementation, and production support of the application
  • Involved in all phases of the end-to-end implementation project- requirements gathering, analysis, design, development, testing, and debugging
  • Actively participated in the daily SCRUM and weekly meetings to produce quality deliverables within time
  • Built the Web application using Spring MVC and implemented Spring Web-Flow for controlled page navigation
  • Used the Spring MVC Framework to develop the application by implementing the controller and service classes
  • Designed and developed the business logic using Spring and Hibernate, integrated with Spring ORM for database mapping
  • Developed Web Services using SOAP for sending and getting data and Implemented SOAP Web services using JAX-WS
  • Gathered and analyzed user requirements and translated them into system solutions using Rational Rose (UML)
  • Implemented persistence layer using JDBC template that uses POJO classes to represent persistent database tables
  • Wrote stored Procedures, Functions, Triggers, and Cursors in PL/SQL for efficient interaction with the database
  • Used RESTful and SOAP web services for sending and retrieving data from different applications
  • Handled the Java multi-threading part in the back-end component, with one thread running per user to serve that user's requests
  • Designed Schemas for XML and used SAX parser to parse the XML documents
  • Involved in creating and deploying an enterprise application on WebSphere Application Server 8.0
  • Worked on queries and stored procedures on Oracle using SQL Navigator
  • Made necessary changes to add new products and field information in the application
  • Involved in using JMS Queues and JMS Topics for one-to-one and one-to-many communication in the Application
  • Used GitHub and Jenkins for the Continuous Integration process
  • Worked with the JUnit framework for writing JUnit tests and integration tests.

Education

Master of Science - Computer Science

Texas A&M University - Kingsville
Kingsville, TX
06.2017

Bachelor of Science - Electrical, Electronics And Communications Engineering

Priyadarshini College of Engineering
Nellore, India
05.2014

Skills

  • Azure Data Factory
  • Azure Databricks
  • Azure Event Hubs
  • Azure SQL Server
  • MLflow
  • Azure Data Lake
  • Delta Lakehouse
  • Snowflake
  • Scala
  • Python
  • Shell scripting
  • SQL
