
Poojith Reddy Jakka

Irving, TX

Summary

  • Over 7 years of IT experience, including Data Engineering and implementation of Hadoop, Spark, and cloud data warehousing solutions.
  • Experience developing Spark applications using RDD transformations, Spark Core, Spark Streaming, and Spark SQL.
  • Experienced in writing Spark applications in Scala and Python (PySpark), including creating Spark Contexts, Spark SQL Contexts, and Spark Streaming Contexts to process huge data sets.
  • Experience writing distributed Scala code for efficient big data processing.
  • Experience managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance, using the Cloudera CDH and Hortonworks HDP distributions.
  • Hands-on experience architecting ETL transformation layers and writing Spark jobs to do the processing.
  • Experience with structural modifications using MapReduce and Hive, and with analyzing data using visualization/reporting tools (Tableau).
  • Experience writing scripts in Python and familiarity with AWS Lambda, AWS S3, AWS EC2, AWS Redshift, and PostgreSQL on AWS.
  • Hands-on experience deploying Kafka Connect in standalone and distributed modes and creating Docker containers.
  • Experience in data analysis, data validation, data cleansing, data verification, and identifying data mismatches.
  • Hands-on experience with Star schema and Snowflake schema modeling, fact and dimension tables, and physical and logical data modeling using Erwin.
  • Experience with Amazon Web Services (AWS) offerings such as EC2, S3, EMR, ElastiCache, DynamoDB, Redshift, and Aurora.
  • Experience in data analysis using Hive, Pig Latin, and Impala.
  • Experience working with Docker Hub, creating Docker images, and handling multiple images, primarily for middleware installations and domain configuration.
  • Strong experience in CI (Continuous Integration) / CD (Continuous Delivery) software development pipeline stages such as Commit, Build, Automated Tests, and Deploy using Jenkins.
  • Hands-on experience with SQL and NoSQL databases such as Snowflake, HBase, Cassandra, and MongoDB.
  • Experience working in both Waterfall and Agile methodologies.
  • A self-motivated, enthusiastic learner, comfortable with challenging projects and able to work through ambiguity to solve complex problems independently or in a collaborative team.
  • Strong analytical, presentation, communication, and problem-solving skills, with the ability to work independently or in a team, and to follow the best practices and principles defined for the team.

Overview

7+ years of professional experience

Work History

Data Engineer

DocuSign
10.2022 - Current
  • Involved in requirements gathering and analysis: understanding the client's requirements and the application flow.
  • Developed Spark streaming application to pull data from cloud to Hive table.
  • Configured Spark to optimize data processing.
  • Designed and developed Scala workflows for pulling data from cloud-based systems and applying transformations to it.
  • Developed shell scripts for ingesting data to HDFS and partitioned the data over Hive.
  • Developed Pig scripts for analysis of semi-structured data.
  • Involved in planning and design of data warehouse in STAR schema.
  • Designed table structures and documented them.
  • Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly using quick filters for on-demand information.
  • Created workflows, mappings using Informatica ETL and worked with different transformations such as lookup, source qualifier, update strategy, router, sequence generator, aggregator, rank, stored procedure, filter, joiner, sorter.
  • Created several types of data visualizations using Python and Tableau.
  • Designed and implemented a configurable data delivery pipeline for scheduled updates to customer-facing data stores, built with Python.
  • Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehousing, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, and UNION.
  • Created Hive tables for loading and analyzing data; implemented partitions and buckets, and developed Hive queries to process data and generate data cubes for visualization.
  • Involved in writing script files for processing data and loading to HDFS.
  • Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
  • Worked with complex SQL, Stored Procedures, Triggers, and packages in large databases from various servers.
  • Worked on designing, building, deploying, and maintaining MongoDB.
  • Implemented SQL and PL/SQL stored procedures.
  • Worked in an Agile methodology.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Designed compliance frameworks for multi-site data warehousing efforts to verify conformity with state and federal data security guidelines.
  • Developed database architectural strategies at modeling, design and implementation stages to address business or industry requirements.
  • Monitored incoming data analytics requests and distributed results to support strategies.
  • Designed data models for complex analysis needs.
  • Gathered, defined, and refined requirements; led project design and oversaw implementation.

Data Engineer

PayPod digital labs
04.2021 - 08.2022
  • Worked with business users to gather, define business requirements and analyze possible technical solutions.
  • Developed Spark scripts using Python and Scala shell commands as per requirements.
  • Wrote Spark jobs with RDDs, pair RDDs, transformations and actions, and DataFrames for data transformations from relational data sets.
  • Designed and developed Scala workflows for pulling data from cloud-based systems and applying transformations to it.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed complex, maintainable, and easy-to-use Python and Scala code that satisfies application requirements for data processing and analytics using built-in libraries.
  • Developed PySpark scripts to merge static and dynamic files and cleanse the data.
  • Developed an ETL framework using Spark and Hive (including daily runs, error handling, and logging) to produce useful data.
  • Developed Tableau data visualizations using cross tabs, heat maps, box-and-whisker charts, scatter plots, geographic maps, pie charts, bar charts, and density charts.
  • Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehousing, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making.
  • Responsible for developing Pig Latin scripts for extracting data using a JSON reader function.
  • Wrote Pig scripts for sorting, joining, filtering, and grouping data.
  • Extracted data from a Teradata database and loaded it into the data warehouse using Spark.
  • Implemented a Continuous Delivery pipeline with Docker and GitHub.
  • Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines using Snowpipe and Matillion from the data lake in an AWS S3 bucket.
  • Performed analysis on unused user navigation data by loading it into HDFS and writing MapReduce jobs.
  • Worked on data pre-processing and cleaning to perform feature engineering, and performed data imputation techniques for missing values in the dataset using Python.
  • Created Hive queries to process large sets of structured, semi-structured, and unstructured data and store the results in managed and external tables.
  • Created Python scripts to read CSV, JSON, and Parquet files from S3 buckets and load the data into AWS S3, DynamoDB, and Snowflake.
  • Used SQL queries and other tools to perform data analysis and profiling.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.

Data Engineer

eBay
08.2019 - 03.2021
  • Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Developed Spark programs with Scala and applied principles of functional programming for batch processing.
  • Used various Spark transformations and actions to cleanse the input data, and used the Spark application master to monitor Spark jobs and capture their logs.
  • Developed Apache Spark applications using Scala for data processing from various streaming sources.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them using MapReduce programs.
  • Responsible for data services and data movement infrastructure; worked with ETL concepts, building ETL solutions, and data modeling.
  • Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.
  • Designed, developed, and implemented ETL pipelines using the Python API of Apache Spark (PySpark) on AWS EMR.
  • Developed and implemented several types of sub-reports, drill-down reports, summary reports, parameterized reports, and ad-hoc reports using Tableau.
  • Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
  • Implemented data ingestion and cluster handling for real-time processing using Kafka.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Worked with the AWS stack: S3, EC2, Snowball, EMR, Athena, Glue, Redshift, DynamoDB, RDS, Aurora, IAM, Firehose, and Lambda.
  • Designed, developed, implemented, and maintained solutions using Docker, Jenkins, and Git for microservices and continuous deployment.
  • Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers.
  • Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines using Snowpipe and Matillion from the data lake in an AWS S3 bucket.
  • Involved in creating Hive tables, loading and analyzing data using Hive queries, and wrote complex Hive queries to transform the data.
  • Created HBase tables to load large sets of structured data.
  • Created and modified several database objects such as tables, views, indexes, constraints, stored procedures, packages, functions, and triggers using SQL and PL/SQL.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.

Data Engineer

Axos Bank
03.2018 - 07.2019
  • Involved in analyzing business requirements and preparing detailed specifications that follow project guidelines required for project development.
  • Involved in developing Spark jobs in PySpark and Spark SQL to run on top of Hive tables and create transformed data sets for downstream consumption.
  • Developed Spark scripts by writing custom RDDs in Scala for data transformations and performing actions on RDDs.
  • Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations.
  • Developed Scala scripts and UDFs using DataFrames and RDDs in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
  • Developed ETL (Extraction, Transformation, and Loading) procedures and data conversion scripts using Pre-Stage, Stage, Pre-Target, and Target tables.
  • Involved in designing and deploying rich graphic visualizations with drill-down and drop-down menu options and parameters using Tableau.
  • Wrote Python scripts to parse JSON documents and load the data into a database.
  • Performed transformations such as event joins, bot-traffic filtering, and pre-aggregations using Pig.
  • Implemented data ingestion and cluster handling for real-time processing using Kafka.
  • Worked on building data warehouse structures and creating facts, dimensions, and aggregate tables through dimensional modeling with Star and Snowflake schemas.
  • Migrated an existing on-premises application to AWS.
  • Used AWS services such as EC2 and S3 for small data set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR.
  • Implemented the AWS Elastic Container Service (ECS) scheduler to automate application deployment in the cloud using Docker automation techniques.
  • Used Hive for transformations, event joins, and pre-aggregations before storing the data to HDFS.
  • Created HBase tables to store various data formats coming from different applications.
  • Analyzed the SQL scripts and designed the solution for implementation using PySpark.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Used SQL queries and other tools to perform data analysis and profiling.
  • Followed agile methodology and involved in daily SCRUM meetings and sprint planning.

Data Engineer

Datadot Software Solution
01.2017 - 02.2018
  • Gathered business requirements, performed business analysis, and designed various data products.
  • Developed Spark scripts using Python shell commands as per the requirements.
  • Implemented data pipelines for big data processing using Spark transformations, the Python API, and clusters in AWS.
  • Designed and developed Spark workflows using Scala to pull data from an AWS S3 bucket and Snowflake, applying transformations to it.
  • Developed Spark code in Python and the Spark SQL environment for faster testing and processing of data; loaded data into Spark RDDs and performed in-memory computation to generate the output with less memory usage.
  • Built key business metrics, visualizations, dashboards, and reports with Tableau.
  • Extracted, transformed, and loaded (ETL) data from multiple federated data sources (JSON, relational databases, etc.) with DataFrames in Spark.
  • Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
  • Created Hive tables to store the processed results in a tabular format.
  • Analyzed the SQL scripts and designed the solution for implementation using PySpark.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Extracted, transformed, and loaded data sources to generate CSV data files with Python programming and SQL queries.
  • Worked on SQL queries in dimensional and relational data warehouses.
  • Performed data analysis and data profiling using complex SQL queries on various systems.
  • Followed agile methodology for the entire project.

Skills

  • Python
  • SQL
  • Scala
  • MATLAB
  • Red Hat Linux
  • Unix
  • Windows
  • macOS
  • Snowflake
  • AWS RDS
  • Teradata
  • Oracle
  • MySQL
  • Microsoft SQL Server
  • PostgreSQL
  • AWS
  • Docker
