
Poojith Reddy Jakka

Irving, TX

Summary

  • Over 7 years of IT experience, including Data Engineering and implementation of Hadoop, Spark, and cloud data warehousing solutions.
  • Experience developing Spark applications using RDD transformations, Spark Core, Spark Streaming, and Spark SQL.
  • Experienced in writing Spark applications in Scala and Python (PySpark), including creating Spark Contexts, Spark SQL Contexts, and Spark Streaming Contexts to process huge data sets.
  • Experience writing distributed Scala code for efficient big data processing.
  • Experience managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance, using the Cloudera CDH and Hortonworks HDP distributions.
  • Hands-on experience architecting ETL transformation layers and writing Spark jobs to do the processing.
  • Experience with structural modifications using MapReduce and Hive, and with analyzing data using visualization/reporting tools (Tableau).
  • Experience writing scripts in Python and familiarity with AWS Lambda, AWS S3, AWS EC2, AWS Redshift, and PostgreSQL on AWS.
  • Hands-on experience deploying Kafka Connect in standalone and distributed modes and creating Docker containers.
  • Experience in data analysis, data validation, data cleansing, data verification, and identifying data mismatches.
  • Hands-on experience with Star schema and Snowflake schema modeling, fact and dimension tables, and physical and logical data modeling using Erwin.
  • Experience with Amazon Web Services (AWS) offerings such as EC2, S3, EMR, ElastiCache, DynamoDB, Redshift, and Aurora.
  • Experience in data analysis using Hive, Pig Latin, and Impala.
  • Experience working with Docker Hub, creating Docker images, and handling multiple images, primarily for middleware installations and domain configuration.
  • Strong experience in CI (Continuous Integration) / CD (Continuous Delivery) software development pipeline stages such as Commit, Build, Automated Tests, and Deploy using Jenkins.
  • Hands-on experience with SQL and NoSQL databases such as Snowflake, HBase, Cassandra, and MongoDB.
  • Experience working in both Waterfall and Agile methodologies.
  • A self-motivated, enthusiastic learner, comfortable with challenging projects and able to work through ambiguity to solve complex problems independently or in a collaborative team.
  • Strong analytical, presentation, communication, and problem-solving skills, with the ability to work independently or in a team, and to follow the best practices and principles defined for the team.

Overview

7+ years of professional experience

Work History

Data Engineer

DocuSign
10.2022 - Current
  • Involved in requirements gathering and analysis: understanding the client's requirements and the application flow.
  • Developed Spark streaming application to pull data from cloud to Hive table.
  • Configured Spark to optimize data processing.
  • Designed and developed Scala workflows for pulling data from cloud-based systems and applying transformations to it.
  • Developed shell scripts for ingesting data to HDFS and partitioned the data over Hive.
  • Developed Pig scripts for analysis of semi-structured data.
  • Involved in planning and design of data warehouse in STAR schema.
  • Designed table structures and documented them.
  • Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly using quick filters for on-demand information.
  • Created workflows, mappings using Informatica ETL and worked with different transformations such as lookup, source qualifier, update strategy, router, sequence generator, aggregator, rank, stored procedure, filter, joiner, sorter.
  • Created several types of data visualizations using Python and Tableau.
  • Designed and implemented a configurable data delivery pipeline for scheduled updates to customer-facing data stores, built with Python.
  • Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehousing, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, and UNION.
  • Created Hive tables for loading and analyzing data; implemented partitions and buckets, and developed Hive queries to process data and generate data cubes for visualization.
  • Involved in writing script files for processing data and loading to HDFS.
  • Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
  • Worked with complex SQL, Stored Procedures, Triggers, and packages in large databases from various servers.
  • Worked on designing, building, deploying, and maintaining MongoDB.
  • Implemented SQL and PL/SQL stored procedures.
  • Worked in an Agile methodology.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Designed compliance frameworks for multi-site data warehousing efforts to verify conformity with state and federal data security guidelines.
  • Developed database architectural strategies at modeling, design and implementation stages to address business or industry requirements.
  • Monitored incoming data analytics requests and distributed results to support strategies.
  • Designed data models for complex analysis needs.
  • Gathered, defined, and refined requirements; led project design and oversaw implementation.

Data Engineer

PayPod digital labs
04.2021 - 08.2022
  • Worked with business users to gather, define business requirements and analyze possible technical solutions.
  • Developed Spark scripts using Python and Scala shell commands as per requirements.
  • Wrote Spark jobs with RDDs, pair RDDs, transformations and actions, and DataFrames for data transformations from relational data sets.
  • Designed and developed Scala workflows for pulling data from cloud-based systems and applying transformations to it.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed complex, maintainable, and easy-to-use Python and Scala code that satisfies application requirements for data processing and analytics using built-in libraries.
  • Developed PySpark scripts to merge static and dynamic files and cleanse the data.
  • Developed an ETL framework using Spark and Hive (including daily runs, error handling, and logging) to produce useful data.
  • Developed Tableau data visualizations using cross tabs, heat maps, box-and-whisker charts, scatter plots, geographic maps, pie charts, bar charts, and density charts.
  • Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehousing, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making.
  • Responsible for developing Pig Latin scripts for extracting data using a JSON reader function.
  • Wrote Pig scripts for sorting, joining, filtering, and grouping data.
  • Extracted data from a Teradata database and loaded it into the data warehouse using Spark.
  • Implemented a Continuous Delivery pipeline with Docker and GitHub.
  • Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines using Snowpipe and Matillion from the data lake in an AWS S3 bucket.
  • Performed analysis on unused user navigation data by loading it into HDFS and writing MapReduce jobs.
  • Worked on data pre-processing and cleaning to perform feature engineering, and performed data imputation techniques for missing values in the dataset using Python.
  • Created Hive queries to process large sets of structured, semi-structured, and unstructured data and store the results in managed and external tables.
  • Created Python scripts to read CSV, JSON, and Parquet files from S3 buckets and load the data into AWS S3, DynamoDB, and Snowflake.
  • Used SQL queries and other tools to perform data analysis and profiling.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.

Data Engineer

eBay
08.2019 - 03.2021
  • Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Developed Spark programs with Scala and applied principles of functional programming for batch processing.
  • Used various Spark transformations and actions to cleanse the input data, and used the Spark application master to monitor Spark jobs and capture their logs.
  • Developed Apache Spark applications using Scala for data processing from various streaming sources.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them using MapReduce programs.
  • Responsible for data services and data movement infrastructure; worked with ETL concepts, building ETL solutions, and data modeling.
  • Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.
  • Designed, developed, and implemented ETL pipelines using the Python API of Apache Spark (PySpark) on AWS EMR.
  • Developed and implemented several types of sub-reports, drill-down reports, summary reports, parameterized reports, and ad-hoc reports using Tableau.
  • Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
  • Implemented data ingestion and cluster handling for real-time processing using Kafka.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Worked with the AWS stack: S3, EC2, Snowball, EMR, Athena, Glue, Redshift, DynamoDB, RDS, Aurora, IAM, Firehose, and Lambda.
  • Designed, developed, implemented, and maintained solutions using Docker, Jenkins, and Git for microservices and continuous deployment.
  • Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers.
  • Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines using Snowpipe and Matillion from the data lake in an AWS S3 bucket.
  • Involved in creating Hive tables, loading and analyzing data using Hive queries, and wrote complex Hive queries to transform the data.
  • Created HBase tables to load large sets of structured data.
  • Created and modified several database objects such as tables, views, indexes, constraints, stored procedures, packages, functions, and triggers using SQL and PL/SQL.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.

Data Engineer

Axos Bank
03.2018 - 07.2019
  • Involved in analyzing business requirements and preparing detailed specifications that follow project guidelines required for project development.
  • Involved in developing Spark jobs in PySpark and Spark SQL to run on top of Hive tables and create transformed data sets for downstream consumption.
  • Developed Spark scripts by writing custom RDDs in Scala for data transformations and performing actions on RDDs.
  • Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations.
  • Developed Scala scripts and UDFs using DataFrames and RDDs in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
  • Developed ETL (Extraction, Transformation, and Loading) procedures and data conversion scripts using Pre-Stage, Stage, Pre-Target, and Target tables.
  • Involved in designing and deploying rich graphic visualizations with drill-down and drop-down menu options and parameters using Tableau.
  • Wrote Python scripts to parse JSON documents and load the data into a database.
  • Performed transformations such as event joins, bot-traffic filtering, and pre-aggregations using Pig.
  • Implemented data ingestion and cluster handling for real-time processing using Kafka.
  • Worked on building data warehouse structures and creating facts, dimensions, and aggregate tables through dimensional modeling with Star and Snowflake schemas.
  • Migrated an existing on-premises application to AWS.
  • Used AWS services such as EC2 and S3 for small data set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR.
  • Implemented the AWS Elastic Container Service (ECS) scheduler to automate application deployment in the cloud using Docker automation techniques.
  • Used Hive for transformations, event joins, and pre-aggregations before storing the data to HDFS.
  • Created HBase tables to store various data formats coming from different applications.
  • Analyzed the SQL scripts and designed the solution for implementation using PySpark.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Used SQL queries and other tools to perform data analysis and profiling.
  • Followed agile methodology and involved in daily SCRUM meetings and sprint planning.

Data Engineer

Datadot Software Solution
01.2017 - 02.2018
  • Gathered business requirements, performed business analysis, and designed various data products.
  • Developed Spark scripts using Python shell commands as per the requirements.
  • Implemented data pipelines for big data processing using Spark transformations, the Python API, and clusters in AWS.
  • Designed and developed Spark workflows using Scala to pull data from an AWS S3 bucket and Snowflake, applying transformations to it.
  • Developed Spark code in Python and the Spark SQL environment for faster testing and processing of data; loaded data into Spark RDDs and performed in-memory computation to generate the output with less memory usage.
  • Built key business metrics, visualizations, dashboards, and reports with Tableau.
  • Extracted, transformed, and loaded (ETL) data from multiple federated data sources (JSON, relational databases, etc.) with DataFrames in Spark.
  • Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
  • Created Hive tables to store the processed results in a tabular format.
  • Analyzed the SQL scripts and designed the solution for implementation using PySpark.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Extracted, transformed, and loaded data sources to generate CSV data files with Python programming and SQL queries.
  • Worked on SQL queries in dimensional and relational data warehouses.
  • Performed data analysis and data profiling using complex SQL queries on various systems.
  • Followed agile methodology for the entire project.

Skills

  • Python
  • SQL
  • Scala
  • MATLAB
  • Red Hat Linux
  • Unix
  • Windows
  • macOS
  • Snowflake
  • AWS RDS
  • Teradata
  • Oracle
  • MySQL
  • Microsoft SQL Server
  • PostgreSQL
  • AWS
  • Docker
