SHRAVYA

Summary

Data Engineer with 10+ years of experience across the Software Development Life Cycle (SDLC), from requirements analysis and design specification through testing, in both Waterfall and Agile environments. Hands-on with the AWS, Azure, and GCP cloud platforms and the major components of the Hadoop ecosystem, including HDFS, YARN, MapReduce, Hive, Spark, Spark SQL, Spark Streaming, Kafka, Sqoop, and Oozie. Strong programming skills in Python, Scala, Java, SQL, and PL/SQL, with experience building ETL pipelines, working with relational and NoSQL databases (Oracle, SQL Server, MySQL, PostgreSQL, MongoDB, HBase, Cassandra, DynamoDB), and handling a wide range of file formats and compression codecs (delimited text, Avro, JSON, XML, ORC, Parquet; gzip, Snappy, LZO).

Work History

Sr. Data Engineer

  • 10+ years of strong experience across the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies
  • Strong experience with the Amazon Web Services (AWS) cloud platform, including EC2, S3, Glue, EMR, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Auto Scaling, and Security Groups
  • Experience with Microsoft Azure cloud services such as SQL Data Warehouse, Azure SQL Server, Azure Databricks, Azure Data Lake, Azure Blob Storage, and Azure Data Factory
  • Hands-on experience with Google Cloud Platform (GCP) big data products: BigQuery, Cloud Dataproc, Google Cloud Storage, and Cloud Composer (Airflow as a service)
  • Strong experience with the major components of the Hadoop ecosystem, including HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Flume, Oozie, Zookeeper, and Hue
  • Excellent programming skills in Java, PL/SQL, SQL, Scala, and Python
  • Hands-on experience writing MapReduce programs in Java to process data sets using map and reduce tasks
  • Worked with frameworks including Spring, Spring Boot, Spring DAO, Hibernate (ORM), Angular, and Angular Material
  • Experience importing and exporting data with Sqoop between HDFS and relational database systems and loading it into partitioned Hive tables (a minimal sketch follows this list)
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis
  • Good understanding of NoSQL databases such as MongoDB, HBase, and Cassandra, as well as PostgreSQL
  • Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro, JSON, and XML
  • Proficient with columnar file formats such as RCFile, ORC, and Parquet
  • Good understanding of compression techniques used in Hadoop processing, such as gzip, Snappy, and LZO
  • Experience extracting data from MongoDB through Sqoop, landing it in HDFS, and processing it there
  • Extensive experience with RDBMSs such as Oracle, Microsoft SQL Server, and MySQL, and with NoSQL databases such as HBase, DynamoDB, and Cassandra
  • Worked across multiple programming languages using IDEs such as Eclipse, NetBeans, and IntelliJ, along with tools like PuTTY and Git.
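
A minimal PySpark sketch of the Sqoop-to-partitioned-Hive loading pattern referenced above; the HDFS path, database, table, and column names are illustrative assumptions, not details taken from this resume:

```python
# Hypothetical sketch: load Sqoop-landed delimited files from HDFS into a
# partitioned Hive table. Paths, schema, and table names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sqoop-to-partitioned-hive")
    .enableHiveSupport()          # so saveAsTable targets the Hive metastore
    .getOrCreate()
)

# Read the delimited files Sqoop imported into HDFS (placeholder location).
orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///data/landing/orders/")
)

# Write into a Hive table partitioned by order_date (assumes the target
# database already exists); partitioning lets downstream queries prune by date.
(
    orders.write
    .mode("overwrite")
    .partitionBy("order_date")
    .format("parquet")
    .saveAsTable("analytics.orders_partitioned")
)
```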

Data Engineer

  • Designed and set up an enterprise data lake to support use cases including analytics, processing, storage, and reporting of voluminous, rapidly changing data
  • Responsible for maintaining quality reference data at the source by cleaning and transforming it and ensuring integrity in a relational environment, working closely with stakeholders and the solution architect
  • Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB
  • Built data pipelines in Airflow on GCP for ETL jobs using different Airflow operators
  • Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery
  • Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts of enterprise data from BigQuery
  • Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing-related analysis of BigQuery usage
  • Set up and worked with Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce access for new users
  • Performed end-to-end architecture and implementation assessment of AWS services such as Amazon EMR, Redshift, and S3
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB
  • Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs
  • Imported data from sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to generate the output response
  • Created Lambda functions with Boto3 to deregister unused AMIs across all application regions to reduce EC2 costs (a minimal sketch follows this list)
  • Imported and exported databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS) packages
  • Coded Teradata BTEQ scripts to load and transform data and fix defects such as SCD Type 2 date chaining and duplicate cleanup
  • Developed a reusable framework for future migrations that automates ETL from RDBMS systems to the data lake using Spark data sources and Hive data objects
  • Conducted data blending and data preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server
  • Developed Kibana dashboards based on Logstash data and integrated different source and target systems into Elasticsearch for near-real-time log analysis and end-to-end transaction monitoring
  • Implemented AWS Step Functions to automate and orchestrate Amazon SageMaker tasks such as publishing data to S3, training the ML model, and deploying it for prediction
  • Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with tasks running on Amazon SageMaker.
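
A minimal, hypothetical sketch of the Boto3-based AMI cleanup described above; it assumes a single-region Lambda and omits tagging rules, snapshot deletion, and dry-run safeguards:

```python
# Hypothetical sketch: a Lambda handler that deregisters account-owned AMIs
# not referenced by any instance in the current region.
import boto3

def lambda_handler(event, context):
    ec2 = boto3.client("ec2")

    # Collect the AMIs currently referenced by instances in this region.
    in_use = set()
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                in_use.add(instance["ImageId"])

    # AMIs owned by this account.
    owned = ec2.describe_images(Owners=["self"])["Images"]

    deregistered = []
    for image in owned:
        if image["ImageId"] not in in_use:
            ec2.deregister_image(ImageId=image["ImageId"])
            deregistered.append(image["ImageId"])

    return {"deregistered": deregistered}
```

Scoping the "in use" check to describe_instances keeps the sketch simple; a production version would typically also consult launch templates and Auto Scaling configurations before deregistering anything.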

Azure Data Engineer

  • Developed a data pipeline using Spark, Hive, and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis
  • Working experience with Azure Databricks, organizing data into notebooks and making it easy to visualize using dashboards
  • Performed ETL from different source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics)
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks
  • Created database tables and stored procedures as required for reporting and ETL needs
  • Configured Databricks jobs and refactored ETL Databricks notebooks
  • Implemented data ingestion from various source systems using Sqoop and PySpark
  • Performed end-to-end architecture and implementation assessment of AWS services such as Amazon EMR, Redshift, S3, Athena, Glue, and Kinesis
  • Hands-on experience tuning the performance of Spark and Hive jobs through proper troubleshooting, estimation, and monitoring of the clusters
  • Performed data aggregation and validation on Azure HDInsight using Spark scripts written in Python
  • Performed monitoring and management of the Hadoop cluster using Azure HDInsight
  • Involved in extraction, transformation, and loading of data directly from different source systems (flat files, Excel, Oracle, SQL) using SAS/SQL and SAS macros
  • Generated PL/SQL scripts for data manipulation, validation, and materialized views for remote instances
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and created Hive queries for analysis
  • Good experience working with analysis tools such as Tableau and Splunk for regression analysis, pie charts, and bar graphs
  • Created and modified several database objects such as tables, views, indexes, constraints, stored procedures, packages, functions, and triggers using SQL and PL/SQL
  • Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques in SAS/Base
  • Extensively worked on shell scripts for running SAS programs in batch mode on UNIX
  • Wrote Python scripts to parse XML documents and load the data into the database (a minimal sketch follows this list)
  • Used Hive, Impala, and Sqoop utilities and Oozie workflows for data extraction and data loading
  • Created HBase tables to store various formats of data coming from different sources
  • Responsible for importing log files from various sources into HDFS using Flume
  • Responsible for translating business and data requirements into logical data models in support of enterprise data models, ODS, OLAP, OLTP, and operational data structures
  • Created SSIS packages to migrate data from heterogeneous sources such as MS Excel, flat files, and CSV files
  • Provided thought leadership for the architecture and design of big data analytics solutions for customers, actively driving Proof of Concept (POC) and Proof of Technology (POT) evaluations and implementing big data solutions.
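
A minimal sketch of the XML-to-database loading mentioned above, using only the Python standard library; sqlite3 stands in for the actual target database, and the element and column names are invented for illustration:

```python
# Hypothetical sketch: parse <customer> elements from an XML document and load
# them into a database table. Schema and field names are placeholders.
import sqlite3
import xml.etree.ElementTree as ET

def load_customers(xml_path: str, db_path: str) -> int:
    tree = ET.parse(xml_path)
    root = tree.getroot()

    # One row per <customer> element (placeholder structure).
    rows = [
        (
            elem.findtext("id"),
            elem.findtext("name"),
            elem.findtext("email"),
        )
        for elem in root.iter("customer")
    ]

    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customers (id TEXT, name TEXT, email TEXT)"
        )
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.close()
    return len(rows)
```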

Big Data Engineer

  • Implemented solutions for ingesting data from various sources and processing data at rest utilizing big data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
  • Experienced in developing and maintaining ETL jobs
  • Performed data profiling and transformation on raw data using Pig, Python, and Oracle
  • Experienced with batch processing of data sources using Apache Spark
  • Developed predictive analytics using Apache Spark Scala APIs
  • Created Hive external tables, loaded the data into tables, and queried data using HQL
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream log data from servers
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing
  • Imported millions of structured records from relational databases using Sqoop, processed them using Spark, and stored the data in HDFS in CSV format
  • Developed a Spark Streaming application to pull data from the cloud into a Hive table
  • Used Spark SQL to process large amounts of structured data (a minimal sketch follows this list)
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data
  • Loaded data from sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output response
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them
  • Developed complete end-to-end big data processing in the Hadoop ecosystem
  • Created automated Python scripts to convert data from different sources and generate the ETL pipelines
  • Configured StreamSets to store the converted data in SQL Server using JDBC drivers
  • Converted existing Hive scripts to Spark applications using RDDs for transforming data and loading it into HDFS
  • Extensively worked with text, ORC, Avro, and Parquet file formats and compression techniques such as Snappy, gzip, and zlib
  • Extensively used Hive optimization techniques such as partitioning, bucketing, map joins, and parallel execution
  • Worked with different tools to verify the quality of the data transformations
  • Created, configured, and monitored shard sets
  • Analyzed the data to be sharded and chose a shard key to distribute data evenly
  • Worked with the Spark SQL context to create DataFrames to filter input data for model execution
  • Configured the setup of the development and production environments
  • Worked extensively on the Linux platform to set up the server
  • Extensively worked on Amazon S3 for data storage and retrieval purposes
  • Experienced in running queries using Impala and used BI tools to run ad hoc queries directly on Hadoop
  • Worked with Alteryx, a data analytics tool, to develop workflows for ETL jobs.
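
A minimal PySpark sketch of the Spark SQL batch processing and columnar output described above; the HDFS paths, view name, and columns are illustrative assumptions:

```python
# Hypothetical sketch: aggregate Sqoop-imported CSV data on HDFS through a
# Spark SQL temp view and write the result as Snappy-compressed ORC.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-aggregation").getOrCreate()

# Read the structured CSV data imported into HDFS (placeholder path).
txns = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///data/raw/transactions/")
)

# Expose the DataFrame to Spark SQL and compute daily totals per account.
txns.createOrReplaceTempView("transactions")
daily_totals = spark.sql("""
    SELECT account_id, txn_date, SUM(amount) AS total_amount, COUNT(*) AS txn_count
    FROM transactions
    GROUP BY account_id, txn_date
""")

# Columnar output with Snappy compression, per the file-format work above.
(
    daily_totals.write
    .mode("overwrite")
    .option("compression", "snappy")
    .orc("hdfs:///data/curated/daily_totals/")
)
```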

ETL Developer

  • Provided technical support to the team as the ETL developer
  • Addressed best practices and productivity-enhancing issues
  • Worked on designing and developing QualityStage jobs
  • Loaded data into load, staging, and lookup tables
  • Implemented the staging area using flat files
  • Created jobs in DataStage to import data from heterogeneous data sources such as Oracle 9i, text files, and SQL Server
  • Generated surrogate IDs for the dimensions in the fact table for indexed, faster data access in server jobs
  • Extensively worked on job sequences to control the execution of the job flow using various activities and triggers (conditional and unconditional) such as Job Activity, Wait For File, Email Notification, Sequencer, Exception Handler, and Execute Command
  • Sliced and diced the input data for business feedback
  • Tested the system
  • Designed data masking techniques to protect sensitive information when working with the offshore team (a minimal sketch follows this list)
  • Assisted the mapping team in transforming business requirements into ETL-specific mapping rules
  • Developed ETL procedures to ensure conformity, compliance with standards, and lack of redundancy, translating business rules and functionality requirements into ETL procedures using Informatica PowerMart
  • Worked with the Erwin tool for data modeling (both physical and logical design)
  • Developed and documented data mappings/transformations, audit procedures, and Informatica sessions
  • Enhanced various complex jobs for performance tuning
  • Responsible for version control and promoting code to higher environments
  • Worked on Teradata optimization and performance tuning
  • Performed unit testing, system integration testing, and user acceptance testing
  • Involved in ongoing production support and process improvements
  • Ran the DataStage jobs through third-party schedulers.
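
A minimal, hypothetical sketch of the kind of data masking described above: sensitive columns in a delimited extract are replaced with salted SHA-256 digests before the file is shared; the column names and salt handling are illustrative only:

```python
# Hypothetical sketch: mask sensitive columns in a CSV extract by replacing
# their values with salted SHA-256 digests. Columns and salt are placeholders.
import csv
import hashlib

SENSITIVE_COLUMNS = {"ssn", "email", "phone"}

def mask_value(value: str, salt: str) -> str:
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

def mask_extract(src_path: str, dst_path: str, salt: str) -> None:
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Replace only the sensitive columns; everything else passes through.
            for column in SENSITIVE_COLUMNS & set(row):
                row[column] = mask_value(row[column], salt)
            writer.writerow(row)
```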

Education

Bachelor’s degree in Computer Science
JNTUH, 01.2013

Skills

  • Hadoop
  • MapReduce
  • Spark
  • HDFS
  • Sqoop
  • YARN
  • Oozie
  • Hive
  • Impala
  • Apache Airflow
  • HBase
  • PL/SQL
  • SQL
  • Python
  • Scala
  • Java
  • MySQL
  • SQL Server
  • Oracle
  • MS Access
  • Cassandra
  • DynamoDB
  • Autosys
  • Tableau
  • Power BI
  • Informatica
  • Talend
  • Azure
  • AWS
  • GCP
  • Eclipse
  • Jupyter Notebook
  • Spyder
  • PyCharm
  • IntelliJ
  • Git
  • SVN
