
Navaneeswara Namala

Charlotte, North Carolina

Summary

  • Over 9 years of software development experience, including 6 years on Big Data technologies within the Apache Spark and Hadoop ecosystem
  • Proficient in building ETL and machine learning data pipelines using Azure Databricks and Azure Data Factory; skilled in the MLflow model registry, Delta Lakehouse, and AutoML features in Azure Databricks
  • Extensive experience in end-to-end machine learning projects, from design to delivery
  • Expertise in cloud-based data warehouses such as Snowflake and Delta Lakehouse; built automated data pipelines from landing zones to refined tables in the Snowflake data warehouse
  • Hands-on experience in Big Data analytics, encompassing data extraction, transformation, loading, and analysis using the Databricks, Cloudera, and Hortonworks platforms
  • Proficient in Java, Hadoop MapReduce, HDFS, Pig, Hive, Oozie, Sqoop, HBase, Scala, Python, Kafka, and NoSQL databases
  • Worked on on-premises and cloud computing environments, including Azure, AWS, and Google Cloud, with expertise in cloud Big Data tools on Azure and AWS
  • Familiar with Azure data engineering tools such as Azure Cosmos DB (SQL, Table, Cassandra, MongoDB, Graph), Azure Synapse Analytics, Azure Stream Analytics, Azure Databricks, Azure Data Lake Storage Gen2, and Azure Storage account services
  • Experienced with AWS Big Data services including Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, SQS, S3, EMR, DynamoDB, Redshift, Aurora, Glue, and QuickSight
  • In-depth understanding of Hadoop architecture and components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm
  • Extensive experience working with structured data using Hive, writing custom UDFs, and optimizing Hive queries
  • Proficient in importing/exporting data with Sqoop between HDFS and relational databases such as Oracle, Teradata, Netezza, and SQL Server
  • Strong knowledge of NoSQL databases including MongoDB, HBase, Cassandra, AWS DynamoDB, and Azure Cosmos DB
  • Well-versed in data warehousing concepts, fact and dimension tables, and diverse data formats (CSV, text, Avro, ORC, JSON, Parquet, and Delta)
  • Managed and monitored Apache Hadoop clusters using Ambari
  • Proficient in data ingestion tools such as Azure Data Factory, Apache Sqoop, and AWS Kinesis
  • Experienced with Spark and Spark Streaming (Scala and Python), with hands-on experience in Azure Databricks
  • Hands-on experience in data mining techniques and machine learning algorithms using Python libraries: scikit-learn, Seaborn, Matplotlib, NumPy, and Pandas
  • Strong understanding of NoSQL databases, with hands-on experience writing applications on HBase and real-time processing through REST APIs
  • Experienced with build tools Maven, Ant, Jenkins, Bamboo, GitLab, and Azure DevOps for deploying automated builds in different environments; familiar with CI/CD pipelines

Overview

10 years of professional experience

Work History

Data Engineer/ ML Engineer

Cummins
11.2020 - Current
  • Worked in a Data Science team that delivers solutions using predictive modeling and machine learning techniques
  • Worked on exploratory data analysis using Scala, Python, and data visualization modules Seaborn and Matplotlib
  • Worked with data scientists to create a clean and transformed dataset to train a model
  • Registered Python scikit-learn machine learning models in the MLflow registry on Azure Databricks
  • Created custom models, packaged them as MLflow pyfunc models, and registered them in the model registry to serve as an on-demand API for predictions
  • Created an end-to-end pipeline to process and transform the data and apply the machine learning classification model in a Spark Structured Streaming job, saving the results to an Azure SQL Server table for access through a REST API (see the sketch after this list)
  • Created an end-to-end pipeline involving data wrangling, transformation, and aggregation to build the input data for the classification model, then applied the ML model from the MLflow registry to generate predictions
  • Experienced in using Azure Event Hubs to send events to the downstream team
  • Created a job to generate metrics for the machine learning pipelines that help business clients gain insights
  • Implemented Delta Lake migration by converting the parquet tables into the Delta tables and storing them on the Delta Lakehouse
  • Created indexes on the SQL server table to optimize the performance
  • Implemented Continuous integration and continuous delivery using Gitlab and Databricks repos
  • Created a data pipeline that fetches data through a REST API, cleans and transforms it to build the model input, passes it to the regression model for predictions, and sends the results to the downstream team using Azure Event Hubs
  • Created a framework to verify the predictions of the machine learning models with the ground-level raw data
  • Created a model monitoring framework to identify the data drift in the raw data serving the machine learning models
  • Worked on a proof of concept for implementing the Feature Store in Databricks and the MLflow remote model registry
  • Worked on a proof of concept for implementing Azure Machine Learning Studio
  • Worked on Databricks runtime migration project and workspace migrations
  • Worked on automated scripts to load and read data from Snowflake data warehouse tables using Python, Snowpipe, and SnowSQL
  • Experienced in Snowflake time travel, table optimization, and data sharing with other customers through the Snowflake Marketplace.
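
A minimal sketch of the scoring pattern referenced above (see the Structured Streaming bullet): load a registered model from the MLflow registry as a Spark UDF and apply it in a Structured Streaming job that lands predictions in an Azure SQL Server table. The model name, feature columns, source table, and JDBC settings are hypothetical placeholders, not the actual production pipeline.

```python
import mlflow.pyfunc
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Load a registered model from the MLflow registry as a Spark UDF
# (model name and stage are placeholders).
predict_udf = mlflow.pyfunc.spark_udf(
    spark, model_uri="models:/claim_classifier/Production")

feature_cols = ["sensor_a", "sensor_b", "sensor_c"]  # hypothetical features

# Stream the cleaned, transformed records, apply the model, and keep
# only what the downstream REST API needs.
scored = (
    spark.readStream.table("refined.engine_events")
         .withColumn("prediction", predict_udf(*[F.col(c) for c in feature_cols]))
)

def write_to_sql(batch_df, batch_id):
    # Write each micro-batch to an Azure SQL Server table over JDBC;
    # the SQL Server driver ships with Databricks runtimes.
    (batch_df.write
        .format("jdbc")
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .option("url", "jdbc:sqlserver://<server>.database.windows.net;database=<db>")
        .option("dbtable", "dbo.predictions")
        .option("user", "<user>")
        .option("password", "<password>")
        .mode("append")
        .save())

query = (scored.writeStream
               .foreachBatch(write_to_sql)
               .option("checkpointLocation", "/mnt/checkpoints/predictions")
               .start())
```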

Big Data Developer

Walgreens
06.2018 - 10.2020
  • Worked on analyzing the Hadoop cluster and Big Data analytic tools, including Hive, HBase, and Sqoop
  • Implemented Spark 2.4 using Scala 2.11 and Spark SQL for faster processing of data
  • Used Spark for interactive queries, processing of streaming data and integration
  • Imported data from Teradata, Oracle, and SQL Server into HDFS using Sqoop
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing
  • Worked on Spark Streaming and Spark SQL to run sophisticated applications on Hadoop
  • Ingested data into HDFS and wrote Hive queries to process the required data
  • Loaded and transformed large sets of structured and semi-structured XML data into HDFS
  • Configured connection between HDFS and Tableau using Impala for Tableau developer team
  • Experienced in managing and reviewing Hadoop log files for troubleshooting
  • Developed shell scripts for executing the Hadoop file system commands for file handling operations
  • Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs
  • Created jobs for loading the raw data into the tables and denormalized tables
  • Performance-tuned and optimized Hive queries to achieve high performance
  • Wrote Hive queries for data analysis to meet business requirements
  • Monitored System health and logs and responded accordingly to warning or failure conditions
  • Responsible for managing the test data coming from different sources
  • Involved in scheduling the Oozie workflow engine to run multiple Hive jobs for the business team to run analytical queries
  • Created and maintained technical documentation for launching Hadoop jobs and executing Hive queries and Pig Scripts
  • Implemented schedulers on the Job tracker to share the cluster’s resources for the Map Reduce jobs given by the users
  • Worked on Azure data engineering tools such as Azure Data Factory, Azure Databricks, Cosmos DB, Azure Synapse Analytics, Power BI, and Azure Data Lake Storage Gen2
  • Created ETL jobs using Azure Data Factory to ingest data from Azure Blob Storage into Azure Data Lake Storage and then load it into Azure Synapse Analytics (Azure data warehouse) for processing with Azure Databricks and running analytical queries
  • Created PySpark jobs to process the data in Azure Data Lake Storage Gen2 and load it into Azure Synapse Analytics tables (see the sketch after this list)
  • Created automated jobs for ingesting and processing data using Azure Data Factory, Azure Databricks, Azure Data Lake Storage Gen2 and Azure Synapse Analytics
  • Worked on POC to migrate the On-premises data from HBase tables to Azure Cosmos DB SQL table
  • Created jobs using NiFi to extract, load, and transform data into HDFS
  • Implemented Transparent Data Encryption and created encryption zones in existing Hadoop folders
  • Wrote automated scripts using Python and shell scripting to implement TDE in the Hadoop cluster
  • Wrote Python scripts to run multiple threads from a single processor
  • Wrote shell scripts to copy larger volumes of data using the DistCp tool
  • Created Hadoop jobs to transfer the data between the 2 Hadoop clusters
  • Created incremental jobs to load data into HBase tables using the HBase bulk load tool, importing data from different file formats such as CSV, TSV, and JSON
  • Worked on space optimization by redesigning the HBase table row key
  • Experienced in performing exploratory data analysis (EDA) using Python with the Seaborn, Matplotlib, NumPy, Pandas, and scikit-learn packages
  • Worked on training and building predictive models using linear regression algorithms with the Python scikit-learn module
  • Made changes to an existing Ab Initio job by updating the SQL query and the DML files and verified the results in the Teradata tables
  • Worked and analyzed the Spark Scala jobs
  • Used Bitbucket as the code repository and Artifactory to store the artifacts
  • Implemented continuous integration and continuous delivery using DevOps tools such as Bitbucket and Artifactory to deploy the artifacts to the test and production servers.
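
A minimal PySpark sketch of the ADLS Gen2 to Azure Synapse load referenced above; the storage account, container, table name, and staging path are hypothetical, and the Databricks Synapse connector (com.databricks.spark.sqldw) is assumed to be available on the cluster.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read raw files landed in ADLS Gen2 (paths are placeholders).
raw = (spark.read
            .option("header", "true")
            .csv("abfss://raw@<storageaccount>.dfs.core.windows.net/sales/"))

# Example transformation: basic cleansing and typing before the warehouse load.
cleaned = (raw.dropDuplicates()
              .withColumn("sale_date", F.to_date("sale_date", "yyyy-MM-dd"))
              .filter(F.col("amount").isNotNull()))

# Write to an Azure Synapse Analytics table, staging data through ADLS Gen2.
(cleaned.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<synapse-server>.database.windows.net:1433;database=<dw>")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.sales_refined")
    .option("tempDir", "abfss://staging@<storageaccount>.dfs.core.windows.net/tmp/")
    .mode("append")
    .save())
```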

Hadoop Developer

Entergy
03.2017 - 05.2018
  • Developed Spark jobs using Spark Streaming APIs to perform necessary transformations and actions on the fly for building the standard learner data model, which receives data from AWS Kinesis Data Firehose in near real time and persists it into AWS DynamoDB
  • Configured AWS Kinesis Data Streams and Kinesis Data Firehose jobs in Dev and Test environment and loaded data into AWS S3 buckets and AWS Redshift
  • Worked on AWS Elastic Cloud Compute (EC2 Instance) for computational tasks and stored data into AWS S3 buckets as the storage mechanism
  • Created automated PySpark batch processing jobs to clean and process data by loading it from the AWS S3 bucket into a Spark DataFrame, writing the processed data back to S3, and then loading it into AWS Redshift tables with the COPY command for analytics (see the sketch after this list)
  • Worked on provisioning the AWS Elastic Map Reduce (EMR) cluster by specifying the data inputs, outputs and IAM security roles
  • Developed Spark scripts by using Scala shell commands as per the requirement
  • Used Spark SQL API over Hadoop YARN to perform analytics on data in Hive
  • Developed Scala scripts and UDFs using DataFrames, SQL, Datasets, and RDDs in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, and other transformations during the ingestion process itself
  • Developed a generic Spring application for automating Hadoop jobs in sequence, integrated with Hive, Sqoop, Oracle Database, HDFS, and shell scripts
  • Developed an Oracle database command line interface for executing queries from the Hadoop Edge node using Spring JDBC and Maven
  • Wrote complex SQL queries, stored procedures, and PL/SQL using SQL Developer, executed on an Oracle database
  • Worked on creating Hive tables and writing Hive queries for data analysis to meet business requirements, and experienced in using Sqoop to import and export data from Oracle and Teradata
  • Experienced in job management using the ESP scheduler and developed job processing scripts using Oozie workflows
  • Developed and executed hive queries for de-normalizing the data
  • Designed and implemented Incremental Imports into Hive tables and writing Hive queries to run on TEZ
  • Developed runnable Java JARs for integrating and supporting the existing Hadoop jobs using Core Java, the Spring framework, Spring JDBC, Maven, JUnit, etc.
  • Used Jenkins for Continuous Integration and GitHub as a Version control
  • Scheduled automated triggers to build the artifacts like Jar and EAR files and to store in the Artifactory and to deploy in the destination server location
  • Worked on UNIX shell Scripting.
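
A minimal sketch of the S3 batch-processing pattern referenced above: read raw objects from S3 into a DataFrame, clean them, and write the processed output back to S3 for a subsequent Redshift COPY. Bucket names and columns are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("s3-batch-clean").getOrCreate()

# Load raw CSV files from S3 into a DataFrame (bucket/prefix are placeholders).
raw = (spark.read
            .option("header", "true")
            .csv("s3a://raw-bucket/meter-readings/"))

# Basic cleaning: drop duplicates, enforce types, remove bad rows.
processed = (raw.dropDuplicates()
                .withColumn("reading_ts", F.to_timestamp("reading_ts"))
                .filter(F.col("reading_value").cast("double").isNotNull()))

# Write processed data back to S3 as Parquet; a Redshift COPY command
# (run separately, e.g. COPY analytics.readings FROM 's3://processed-bucket/...')
# then loads it into the warehouse tables for analytics.
(processed.write
    .mode("overwrite")
    .parquet("s3a://processed-bucket/meter-readings/"))
```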

Big Data Developer

Key Bank
08.2016 - 02.2017
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
  • Configured Sqoop Jobs to import data from RDBMS into HDFS using Oozie workflows
  • Involved in creating Hive internal and external tables, loading data, and writing Hive queries that run internally as MapReduce jobs
  • Involved in Migrating the Hive queries to Impala
  • Created batch analysis job prototypes using Hadoop, Pig, Oozie, Hue and Hive
  • Assisted with data capacity planning and node forecasting
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts)
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action
  • Created views, managed tables, and external tables in Hive and loaded incremental data into the tables (see the sketch after this list)
  • Documented the system processes and procedures for future reference
  • Monitored workload, job performance and capacity planning using Cloudera Manager
  • Monitored and performance-tuned Hadoop clusters, screened Hadoop cluster job performance for capacity planning, monitored Hadoop cluster connectivity and security, and managed and reviewed Hadoop log files.
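
A minimal illustration of the managed vs. external Hive table pattern and an incremental load referenced above, issued here through PySpark's spark.sql for brevity (the original work used HiveQL directly); the database, table, and path names are hypothetical.

```python
from pyspark.sql import SparkSession

# enableHiveSupport assumes a Hive metastore is configured for this cluster.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Managed (internal) table: Hive owns both the metadata and the data files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders_managed (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_date DATE)
    STORED AS PARQUET
""")

# External table: Hive tracks only the metadata; data stays at the HDFS path.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders_ext (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_date DATE)
    STORED AS PARQUET
    LOCATION '/data/landing/orders'
""")

# Incremental load: append only the rows newer than the current high-water mark.
spark.sql("""
    INSERT INTO sales.orders_managed
    SELECT order_id, customer_id, amount, order_date
    FROM sales.orders_ext
    WHERE order_date > (SELECT COALESCE(MAX(order_date), DATE '1970-01-01')
                        FROM sales.orders_managed)
""")
```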

Hadoop Developer

Lowe’s
05.2016 - 07.2016
  • Installed and configured Hadoop on multiple nodes on the Cloudera platform
  • Set up and optimized standalone, pseudo-distributed, and distributed clusters
  • Developed simple to complex MapReduce streaming jobs; analyzed data with Hive, Pig, and Hadoop Streaming
  • Built, tuned, and maintained HiveQL and Pig scripts for reporting purposes
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop
  • Analyzed data by writing Hive queries (HQL) and running Pig scripts (Pig Latin) to study customer behavior
  • Stored the data in an Apache Cassandra cluster
  • Used Impala to query the Hadoop data stored in HDFS
  • Managed and reviewed Hadoop log files
  • Supported and troubleshot MapReduce programs running on the cluster
  • Loaded data from the Linux file system into HDFS
  • Installed and configured Hive and wrote Hive UDFs
  • Created tables, loaded data, and wrote queries in Hive
  • Developed scripts to automate routine DBA tasks using Linux shell scripts and Python (see the sketch after this list).
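
A minimal sketch of the routine-task automation mentioned in the last bullet: a Python wrapper around the HDFS shell that reports directory usage and flags paths above a threshold. The watched directories and threshold are hypothetical.

```python
#!/usr/bin/env python
"""Report HDFS directory usage and flag paths above a size threshold."""
import subprocess

# Directories to check and an alert threshold (placeholders).
WATCHED_DIRS = ["/user/hive/warehouse", "/data/landing"]
THRESHOLD_BYTES = 500 * 1024 ** 3  # 500 GB

def hdfs_du(path):
    """Yield (bytes_used, path) pairs from `hdfs dfs -du` for one directory."""
    out = subprocess.check_output(["hdfs", "dfs", "-du", path]).decode()
    for line in out.strip().splitlines():
        parts = line.split()
        # Output format: <size> [<size_with_replication>] <path>
        yield int(parts[0]), parts[-1]

if __name__ == "__main__":
    for base in WATCHED_DIRS:
        for size, path in hdfs_du(base):
            if size > THRESHOLD_BYTES:
                print("WARNING: %s is using %.1f GB" % (path, size / 1024.0 ** 3))
```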

Java Developer

Smartron
01.2014 - 07.2015
  • Involved in the Software Development Life Cycle (SDLC), from analysis and design through programming, test cases, implementation, and production support of the application
  • Involved in all phases of the end-to-end implementation project- requirements gathering, analysis, design, development, testing, and debugging
  • Actively participated in the daily SCRUM and weekly meetings to produce quality deliverables within time
  • Built the Web application using Spring MVC and implemented Spring Web-Flow for controlled page navigation
  • Used the Spring MVC Framework to develop the application by implementing the controller and service classes
  • Designed and developed the business logic using Spring and Hibernate, integrated with Spring ORM for database mapping
  • Developed Web Services using SOAP for sending and getting data and Implemented SOAP Web services using JAX-WS
  • Gathered and analyzed user requirements and translated them into system solutions using Rational Rose (UML)
  • Implemented persistence layer using JDBC template that uses POJO classes to represent persistent database tables
  • Wrote stored Procedures, Functions, Triggers, and Cursors in PL/SQL for efficient interaction with the database
  • Used RESTful and SOAP web services for sending and retrieving data from different applications
  • Handled the Java multi-threading part in the back-end component, with one thread running per user to serve that user's requests
  • Designed Schemas for XML and used SAX parser to parse the XML documents
  • Involved in creating and deploying an enterprise application on WebSphere Application Server 8.0
  • Worked on queries and stored procedures on Oracle using SQL Navigator
  • Made necessary changes to add new products and field information in the application
  • Involved in using JMS Queues and JMS Topics for one-to-one and one-to-many communication in the Application
  • Used GitHub and Jenkins for the Continuous Integration process
  • Worked with the JUnit framework for writing JUnit tests and integration tests.

Education

Master of Science - Computer Science

Texas A&M University - Kingsville
Kingsville, TX
06.2017

Bachelor of Science - Electrical, Electronics And Communications Engineering

Priyadarshini College of Engineering
Nellore, India
05.2014

Skills

  • Azure Data Factory
  • Azure Databricks
  • Azure Event Hubs
  • Azure SQL Server
  • MLflow
  • Azure Data Lake
  • Delta Lakehouse
  • Snowflake
  • Scala
  • Python
  • Shell scripting
  • SQL
