
Priyanka Kosuri

Pine Brook, NJ

Summary

  • Around 5 years of experience in systems analysis, design, and development in the fields of Java, Data Warehousing, Hadoop Ecosystem, AWS Cloud Data Engineering, Data Visualization, Reporting, and Data Quality Solutions.
  • Good experience in Amazon Web Services such as S3, IAM, EC2, EMR, Kinesis, VPC, DynamoDB, Redshift, Amazon RDS, Lambda, Athena, Glue, DMS, QuickSight, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SQS, and other services of the AWS family.
  • Hands-on experience in data analytics services such as Athena, Glue, Data Catalog, and QuickSight.
  • Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
  • Experience in developing Hadoop-based applications using HDFS, MapReduce, Spark, Hive, Sqoop, HBase, and Oozie.
  • Hands-on experience in architecting legacy data migration projects from on-premises to the AWS Cloud.
  • Wrote AWS Lambda functions in Python that invoke scripts to perform various transformations and analytics on large data sets in EMR clusters.
  • Hands-on experience with tools like Hive for data analysis, Sqoop for data ingestion, and Oozie for scheduling.
  • Experience in scheduling and configuring Oozie, including writing Oozie workflows and coordinators.
  • Worked on different file formats like JSON, XML, CSV, ORC, and Parquet.
  • Experience in processing both structured and semi-structured data in these file formats.
  • Good knowledge of Kafka and Flume.
  • Experience in Java and Java EE (J2EE) technologies; proficient in Core Java, Servlets, JSP, EJB, JDBC, XML, Spring, Struts, Hibernate, and RESTful web services.
  • Proven knowledge of standards-compliant, cross-browser compatible HTML, CSS, JavaScript, and Ajax.
  • Good experience with different SDLC models, including Waterfall, V-Model, and Agile.
  • Demonstrated proficiency in Microsoft Office suite (Excel, Word, PowerPoint) to create comprehensive reports, presentations, and documentation for internal and external stakeholders.
  • Collaborated with cross-functional teams to streamline data collection processes, improving efficiency by 20%.
  • Employed SAP ERP system to manage inventory and streamline procurement processes, enhancing operational efficiency.
  • Maintained a high level of customer service orientation by promptly addressing client inquiries and concerns, leading to a 95% satisfaction rate.
  • Presented findings and recommendations to senior management through clear and concise communication, leveraging strong presentation skills.
  • Actively sought out information and remained up-to-date with industry trends, enhancing analytical and conceptual thinking abilities.
  • Thrived in a fast-paced environment by prioritizing tasks effectively and delivering high-quality results under tight deadlines.
  • Demonstrated organizational awareness and commitment by adapting to evolving business needs and fostering a collaborative work environment.

Overview

5 years of professional experience

Work History

AWS Data Engineer

Comcast
Philadelphia, PA
01.2023 - 01.2024
  • Designed and set up an enterprise data lake to support various use cases, including storing, processing, analytics, and reporting of voluminous, rapidly changing data, using various AWS services.
  • Used various AWS services including S3, EC2, AWS Glue, Athena, Redshift, EMR, SNS, SQS, DMS, and Kinesis.
  • Extracted data from multiple source systems (S3, Redshift, RDS) and created multiple tables/databases in the Glue Data Catalog by creating Glue crawlers.
  • Created AWS Glue crawlers for crawling the source data in S3 and RDS.
  • Created multiple Glue ETL jobs in Glue Studio, processed the data using different transformations, and loaded it into S3, Redshift, and RDS.
  • Created multiple recipes in Glue DataBrew and used them in various Glue ETL jobs.
  • Designed and developed ETL processes in AWS Glue to migrate data from external sources like S3 (Parquet/text files) into AWS Redshift.
  • Used the AWS Glue Catalog with crawlers to get the data from S3 and performed SQL query operations using AWS Athena.
  • Wrote PySpark jobs in AWS Glue to merge data from multiple tables and utilized crawlers to populate the AWS Glue Data Catalog with metadata table definitions.
  • Used AWS Glue for transformations and AWS Lambda to automate the process.
  • Created monitors, alarms, notifications, and logs for Lambda functions and Glue jobs using CloudWatch.
  • Performed end-to-end architecture and implementation assessment of various AWS services like Amazon EMR, Redshift, and S3.
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
  • Created Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics to capture and process the streaming data, then output into S3, DynamoDB, and Redshift for storage and analysis.
  • Created Lambda functions to run AWS Glue jobs based on S3 events (see the sketch below).
  • Performed unit testing of all the mappings developed in the ETL layer before delivering them to the production environment.

Environment: AWS Glue, S3, IAM, EC2, RDS, Redshift, Lambda, Boto3, DynamoDB, Apache Spark, Kinesis, Athena, Hive, Sqoop, Python, ETL
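
A minimal sketch of the S3-event-to-Glue trigger pattern described above, assuming a hypothetical Glue job name and an execution role with glue:StartJobRun permission; the job name and arguments are illustrative only, not the actual jobs from this role.

  import boto3

  glue = boto3.client("glue")

  # Hypothetical job name for illustration; not a real job from this role.
  GLUE_JOB_NAME = "example-curation-job"

  def lambda_handler(event, context):
      """Invoked by an S3 ObjectCreated notification; starts the Glue ETL job
      and passes the new object's location as job arguments."""
      record = event["Records"][0]
      bucket = record["s3"]["bucket"]["name"]
      key = record["s3"]["object"]["key"]
      run = glue.start_job_run(
          JobName=GLUE_JOB_NAME,
          Arguments={"--source_bucket": bucket, "--source_key": key},
      )
      return {"JobRunId": run["JobRunId"]}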

AWS Data Engineer

Bluejestic
Tampa, FL
05.2022 - 12.2022
  • Responsible for provisioning key AWS Cloud services and configuring them for scalability, flexibility, and cost optimization.
  • Created VPCs, private and public subnets, and NAT gateways in a multi-region, multi-zone infrastructure landscape to manage worldwide operations.
  • Managed Amazon Web Services (AWS) infrastructure with orchestration tools such as CloudFormation templates (CFT), Terraform, and Jenkins pipelines.
  • Created Terraform scripts to automate deployment of EC2 instances, S3, EFS, EBS, IAM roles, snapshots, and a Jenkins server.
  • Built cloud data stores in S3 with logical layers for raw, curated, and transformed data management.
  • Created data ingestion modules using AWS Glue for loading data into the various S3 layers, with reporting using Athena and QuickSight.
  • Created and managed bucket policies and lifecycle rules for S3 storage per organizational and compliance guidelines.
  • Created parameters and SSM documents using AWS Systems Manager.
  • Established CI/CD tools such as Jenkins and Git Bucket for code repository, build, and deployment of the Python code base.
  • Built Glue jobs for technical data cleansing such as deduplication, NULL-value imputation, and removal of redundant columns, as well as Glue jobs for standard data transformations (date/string and math operations) and business transformations required by business users (see the cleansing sketch below).
  • Used the Kinesis family (Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics) to collect, process, and analyze streaming data.
  • Created Athena data sources on S3 buckets for ad hoc querying and business dashboarding using QuickSight and Tableau reporting tools.
  • Copied fact/dimension and aggregate output from S3 to Redshift for historical data analysis using Tableau and QuickSight.
  • Used Lambda functions and Step Functions to trigger Glue jobs and orchestrate the data pipeline.
  • Used the PyCharm IDE for Python/PySpark development and Git for version control and repository management.
Environment: AWS - EC2, VPC, S3, EBS, ELB, CloudWatch, CloudFormation, ASG, Lambda, AWS CLI, Git, Glue, Athena, QuickSight, Python, PySpark, Shell scripting, Jenkins
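
A minimal sketch of the kind of Glue cleansing job described above (deduplication, NULL-value imputation, redundant column removal), written as a plain PySpark script; the bucket, table, and column names are hypothetical, and the real jobs used the Glue job framework and S3 layers rather than this standalone form.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("cleansing-sketch").getOrCreate()

  # Hypothetical raw-layer path and columns, for illustration only.
  raw = spark.read.parquet("s3://example-raw-bucket/orders/")

  cleansed = (
      raw.dropDuplicates(["order_id"])                     # deduplication
         .fillna({"quantity": 0, "status": "UNKNOWN"})     # NULL-value imputation
         .drop("legacy_flag")                              # redundant column removal
         .withColumn("order_date", F.to_date("order_ts"))  # standard date transformation
  )

  # Write to a hypothetical curated layer.
  cleansed.write.mode("overwrite").parquet("s3://example-curated-bucket/orders/")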

Hadoop - AWS Data Engineer

LSInextGen
Piscataway, NJ
05.2019 - 04.2022
  • Participated in requirements gathering and was actively involved in developing the requirements into technical specifications.
  • Used Spring XD for data ingestion into HDFS.
  • Involved in development of MapReduce jobs using various APIs like Mapper, Reducer, RecordReader, InputFormat, etc.
  • Extensively used HDFS for storing the data.
  • Worked on Hive, creating external and internal tables, and performed analysis on the data.
  • Used HiveQL for analyzing and validating the data.
  • Created Hive load queries for loading the data from HDFS.
  • Used Sqoop to export data from Hive to Netezza and to import data from Netezza into Hive.
  • Used Informatica to load the data into the final table, using bulk load for this process.
  • Created Sqoop jobs for importing and exporting data from/to Netezza.
  • Used Oozie for scheduling this entire process.
  • Worked on AWS POC for transferring data from local file system to S3.
  • Hands-on experience in creating EMR clusters and developing Glue jobs.
  • Wrote Oozie workflows and job.properties files for managing the Oozie jobs, and configured all MapReduce, Hive, and Sqoop jobs in Oozie workflows.
  • Scheduled the Oozie jobs using coordinators, writing workflow.xml, job.properties, and coordinator.xml files for scheduling.
  • Created Kinesis streams for live streaming of the data.
  • Created mappings in Informatica and loaded the data into the target tables.
  • Wrote Oozie classes for moving and deleting files.
  • Configured the JARs in the Oozie workflows.
  • Validated the Hadoop jobs (MapReduce, Oozie) using the CLI and handled the jobs in Hue as well.
  • Deployed these Hadoop applications into the development, stage, and production environments.
  • Extensively used Spark, creating RDDs and Hive SQL queries for aggregating the data (see the sketch below).

Environment: Hadoop, MapReduce, Java, Spark, Hive, Sqoop, Oozie, HDFS, Netezza, AWS, EMR, Glue, S3, Informatica 9.1, DB2, Oracle 11g, SQL, Windows XP, Agile Scrum, MRUnit, Mockito, Apache Log4j, Spring XD, Subversion
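
A minimal sketch of the Spark/Hive aggregation pattern mentioned above, shown with Spark SQL rather than the RDD API for brevity; the database, table, and column names are hypothetical and assume a Hive-enabled Spark session on the cluster.

  from pyspark.sql import SparkSession

  # Hive-enabled session; assumes the cluster's Hive metastore is configured.
  spark = (
      SparkSession.builder
      .appName("aggregation-sketch")
      .enableHiveSupport()
      .getOrCreate()
  )

  # Hypothetical Hive table; real table names from this project are not shown.
  daily_totals = spark.sql("""
      SELECT load_date, source_system,
             COUNT(*)    AS record_count,
             SUM(amount) AS total_amount
      FROM   staging_db.transactions
      GROUP  BY load_date, source_system
  """)

  daily_totals.write.mode("overwrite").saveAsTable("analytics_db.daily_transaction_totals")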

Education

Master of Science in Data Science -

Saint Peter's University
02.2024

Bachelor of Engineering in Information Technology -

Bharat Institute of Engineering and Technology
05.2019

Skills

Programming Languages: Java 14/15/16, Python
Hadoop/Big Data: HDP, HDFS, Sqoop, Hive, Pig, HBase, MapReduce, Spark, Oozie
AWS Cloud Technologies: IAM, S3, EC2, VPC, EMR, Glue, DynamoDB, RDS, Redshift, CloudWatch, CloudTrail, CloudFormation, Kinesis, Lambda, Athena, EBS, DMS, Elasticsearch, SQS, SNS, KMS, QuickSight, ELB, Auto Scaling
Java/J2EE Technologies: XML, XSL, XSLT, EJB 2.0/3.0, Struts 1.x/2, Spring 2.5, Hibernate 3.2, Ajax
Scripting Languages: JavaScript, Python, Shell scripting
Web Servers: Apache Tomcat 4.1/5.0
Databases: Oracle (PL/SQL, SQL), DB2, Netezza
Tools: CVS, CodeCommit, GitHub, Apache Log4j, TOAD, ANT, Maven, JUnit, Mockito, REST HTTP Client, JMeter, Cucumber, Jenkins, Aginity
ETL Tools: Informatica, DataStage
IDEs: Eclipse, IBM RAD 7.5

  • Proficiency in Microsoft Office (Excel, Word, PowerPoint)
  • Advanced Excel skills, including pivot tables, VLOOKUP, and macros
  • SAP ERP hands-on experience
  • Excellent communication skills
  • Analytical and conceptual thinking
  • Information seeking
  • Customer service orientation
  • Presentation skills
  • Collaboration
  • Organizational awareness & commitment
  • Ability to deliver in a fast-paced environment

Timeline

AWS Data Engineer

Comcast
01.2023 - 01.2024

AWS Data Engineer

Bluejestic
05.2022 - 12.2022

Hadoop - AWS Data Engineer

LSInextGen
05.2019 - 04.2022

Master of Science in Data Science -

Saint Peter's University

Bachelor of Engineering in Information Technology -

Bharat Institute of Engineering and Technology