Vijay Kumar Tulluri

Nashville, Tennessee

Summary

  • Data Engineer with over 8 years of experience, specializing in Big Data engineering and analytics across various industries.
  • Expertise in the Hadoop ecosystem, including HDFS, MapReduce, YARN, Spark, Hive, and Sqoop, with significant experience in Cloudera and Hortonworks distributions.
  • Proficient in cloud-based data solutions, having migrated SQL databases to Azure Data Lake, Azure SQL Database, and AWS, utilizing Azure Data Factory for seamless data transfers.
  • Skilled in data ingestion and ETL processes using tools like Kafka, NiFi, and Flume for handling structured, semi-structured, and unstructured data.
  • Advanced programming capabilities in Java, Python, SQL, and T-SQL, with extensive experience in developing Spark applications for data cleansing, transformation, and summarization.
  • Familiar with CI/CD practices and tools such as Jenkins and Ansible, and adept in both Waterfall and Agile project management methodologies.
  • Demonstrated ability in machine learning and predictive modeling using algorithms like Linear Regression, Logistic Regression, and Decision Trees.
  • Proficient in SQL querying and NoSQL databases, including Cassandra and HBase, with a strong understanding of data architecture and processing techniques for large datasets.
  • Experienced in optimizing data pipelines and applications for improved performance and scalability, including configuring and installing Hadoop/Spark ecosystem components.

Overview

8 years of professional experience

Work History

Senior Data Engineer

Persistent Systems Ltd
05.2019 - 12.2021
  • Engineered and executed stream processing jobs using Spark Streaming in Scala, enhancing real-time data analysis capabilities.
  • Developed real-time data processing applications in Scala and Java, integrating Apache Spark Streaming with Kafka and JMS for efficient data ingestion.
  • Used PySpark and Hive for sensor data analysis, clustering users based on behavioral patterns during events.
  • Implemented high-performance, real-time processing jobs using Spark Streaming, coupled with Kafka, to establish a robust data pipeline system.
  • Used AWS services such as EMR and EC2 for scalable and efficient Big Data processing.
  • Crafted and optimized Spark programs, creating data frames and performing transformations to facilitate complex data analysis tasks.
  • Designed and managed data ingestion from diverse sources using Kafka producers and partitions, applying custom encoders for efficient data loading.
  • Conducted a proof of concept (POC) with Hadoop, extracting data into HDFS using Spark.
  • Optimized data processing and analytics using Spark SQL and Scala on Cloudera Hadoop YARN, enhancing data insights extracted from Hive.
  • Developed and automated Databricks notebooks using SQL and Python, configuring Azure Databricks clusters for high concurrency, ensuring rapid preparation of high-quality data.


Environment: Hadoop, Spark, Spark Streaming, MapReduce, Hive, Pig, Oozie, Kafka, Storm, Scala, Java, Python, Sqoop, Talend, AWS (EMR, S3, CloudWatch), MongoDB, Solr, Hadoop Cluster, Azure Databricks, Linux.

Data Engineer

Randstad Technologies, LLC
04.2017 - 05.2019
  • Analyzed and optimized large datasets, leveraging Azure Cloud services and Spark for efficient data aggregation and reporting.
  • Designed and implemented a scalable Big Data Analytics architecture, integrating technologies like ADLS Gen2, Delta Lake, and Hive.
  • Developed robust Spark applications in Scala and Python for batch data extraction, transforming source data into Parquet format for storage in Azure Data Lake Storage (ADLS Gen2).
  • Constructed ETL pipelines using Databricks notebooks, integrating wear algorithm predictions with outputs to Azure Cosmos DB for real-time analytics.
  • Engineered Kafka consumers in Scala to ingest data from Kafka topics, enhancing data flow and accessibility within the pipeline.
  • Migrated legacy HQL computational code to PySpark, streamlining data processing workflows and improving system performance.
  • Facilitated data ingestion from various sources into Azure Data Storage services, employing Azure Data Factory, T-SQL, and Spark SQL for comprehensive data management.
  • Deployed Spark jobs on Amazon EMR, leveraging AWS cluster computing for scalable data processing and analysis.
  • Integrated data engineering practices with the Tire Wear model, collaborating with the data science team to refine predictive outputs and resolve technical issues.
  • Implemented data warehouse solutions on AWS Redshift, leading projects to migrate on-premises databases to cloud-based storage and computing environments.

Environment: Hadoop, Hive, Spark, MapReduce, HBase, Kafka, Flume, Azure Databricks, AWS, Azure Data Lake, Azure SQL, Azure DW, Scala, Python, Java, Shell Scripting, HDFS, ADLS Gen2, Cassandra, Teradata, NoSQL, Sqoop, Ambari, Azure Data Factory, Tableau, Ubuntu, Oracle 10g/11g/12c.

Data Engineer

GlobalLogic Technologies Limited
06.2014 - 04.2017
  • Designed and implemented complex T-SQL queries, stored procedures, and tables for data extraction and manipulation to meet business requirements.
  • Developed and executed shell scripts for automating DataStage jobs, enhancing efficiency and reliability in data processing workflows.
  • Enhanced data accessibility for service developers by identifying and cataloging essential content within existing reference models.
  • Leveraged Spark SQL API within PySpark for efficient data extraction, loading, and execution of SQL queries for analytical processing.
  • Crafted PySpark scripts that apply hashing algorithms to mask sensitive data, ensuring data privacy and compliance with client specifications.
  • Engineered Python-based APIs (RESTful Web Services) for revenue tracking and analysis, providing critical insights into financial performance.
  • Optimized SQL queries through meticulous analysis of indexes and execution plans, significantly reducing runtime and enhancing database performance.
  • Conducted comprehensive ETL testing, overseeing job execution, data extraction, transformation, and loading into Data Warehouse servers, ensuring data integrity and accuracy.
  • Spearheaded data migration projects utilizing a blend of SQL, Azure Data Factory, SSIS, and PowerShell, achieving seamless transition to Azure Data Storage services.
  • Created dynamic dashboards and ad-hoc reports using Tableau and Power BI, facilitating data-driven decision-making by visualizing POS and revenue data.

Environment: Spark, PySpark, AWS, S3, Glue, Redshift, DynamoDB, Hive, Spark SQL, Docker, Kubernetes, Airflow, GCP, ETL workflows.

Education

Master of Science - Computer and Information Sciences

Christian Brothers University
Memphis, TN
12.2023

Bachelor of Science - Computer Engineering

Jawaharlal Nehru Technological University
Hyderabad, India
05.2013

Skills

Category : Skills

Big Data Technologies : HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Spark, Kafka, NiFi, Airflow, Flume, Snowflake, Ambari, Hue

Hadoop Frameworks : Cloudera CDH, Hortonworks HDP, MapR, Spark, Impala

Cloud Services : AWS (IAM, S3, EMR, EC2, Lambda, Route 53, CloudWatch, SNS), Azure, GCP

Programming Languages : SQL, Python, Scala, Java, C, C

Databases : Oracle (10g/11g), PL/SQL, MySQL, MS SQL Server 2012, DB2, Teradata, NoSQL, HBase, Cassandra, MongoDB, DynamoDB

Development Tools : Eclipse, NetBeans, IntelliJ, PyCharm, Jupyter, Databricks notebooks

Data Formats : JSON, Parquet, Avro, XML, CSV

Business Intelligence : Tableau, Power BI, Data Studio

Modeling Tools : Rational Rose, StarUML, Visual Paradigm for UML

Build Tools : Maven, Gradle, Jenkins

Operating Systems : Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X

Web Technologies : JDBC, JSP, Servlets, Struts (Tomcat, JBoss)

Methodologies : Agile, Waterfall
