SHRAVYA

Summary

Data Engineer with 10+ years of experience across the Software Development Life Cycle (SDLC), from requirements analysis and design specification through testing, in both Waterfall and Agile environments. Hands-on with the AWS, Azure, and GCP cloud platforms and the major components of the Hadoop ecosystem, including HDFS, YARN, MapReduce, Hive, Spark, Spark SQL, Spark Streaming, Kafka, Sqoop, and Oozie. Strong programming skills in Python, Scala, Java, SQL, and PL/SQL, with experience building ETL pipelines, working with relational and NoSQL databases (Oracle, SQL Server, MySQL, PostgreSQL, MongoDB, HBase, Cassandra, DynamoDB), and handling a wide range of file formats and compression codecs (delimited text, Avro, JSON, XML, ORC, Parquet; gzip, Snappy, LZO).

Work History

Sr. Data Engineer

  • 10+ years of strong experience across the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies
  • Strong experience with the Amazon Web Services (AWS) cloud platform, including EC2, S3, Glue, EMR, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Auto Scaling, and Security Groups
  • Experience with Microsoft Azure cloud services such as SQL Data Warehouse, Azure SQL Server, Azure Databricks, Azure Data Lake, Azure Blob Storage, and Azure Data Factory
  • Hands-on experience with Google Cloud Platform (GCP) big data products: BigQuery, Cloud Dataproc, Google Cloud Storage, and Cloud Composer (Airflow as a service)
  • Strong experience with the major components of the Hadoop ecosystem, including HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Flume, Oozie, Zookeeper, and Hue
  • Excellent programming skills in Java, PL/SQL, SQL, Scala, and Python
  • Hands-on experience writing MapReduce programs in Java to process data sets using map and reduce tasks
  • Worked with frameworks including Spring, Spring Boot, Spring DAO, Hibernate (ORM), Angular, and Angular Material
  • Experience importing and exporting data with Sqoop between HDFS and relational database systems and loading it into partitioned Hive tables (a minimal sketch follows this list)
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis
  • Good understanding of NoSQL databases such as MongoDB, HBase, and Cassandra, as well as PostgreSQL
  • Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro, JSON, and XML
  • Proficient with columnar file formats such as RCFile, ORC, and Parquet
  • Good understanding of compression techniques used in Hadoop processing, such as gzip, Snappy, and LZO
  • Experience extracting data from MongoDB through Sqoop, landing it in HDFS, and processing it there
  • Extensive experience with RDBMSs such as Oracle, Microsoft SQL Server, and MySQL, and with NoSQL databases such as HBase, DynamoDB, and Cassandra
  • Worked across multiple programming languages using IDEs such as Eclipse, NetBeans, and IntelliJ, along with tools like PuTTY and Git.
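
A minimal PySpark sketch of the Sqoop-to-partitioned-Hive loading pattern referenced above; the HDFS path, database, table, and column names are illustrative assumptions, not details taken from this resume:

```python
# Hypothetical sketch: load Sqoop-landed delimited files from HDFS into a
# partitioned Hive table. Paths, schema, and table names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sqoop-to-partitioned-hive")
    .enableHiveSupport()          # so saveAsTable targets the Hive metastore
    .getOrCreate()
)

# Read the delimited files Sqoop imported into HDFS (placeholder location).
orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///data/landing/orders/")
)

# Write into a Hive table partitioned by order_date (assumes the target
# database already exists); partitioning lets downstream queries prune by date.
(
    orders.write
    .mode("overwrite")
    .partitionBy("order_date")
    .format("parquet")
    .saveAsTable("analytics.orders_partitioned")
)
```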

Data Engineer

  • Designed and set up an enterprise data lake to support use cases including analytics, processing, storage, and reporting of voluminous, rapidly changing data
  • Responsible for maintaining quality reference data at the source by cleaning and transforming it and ensuring integrity in a relational environment, working closely with stakeholders and the solution architect
  • Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB
  • Built data pipelines in Airflow on GCP for ETL jobs using different Airflow operators
  • Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery
  • Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts of enterprise data from BigQuery
  • Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing-related analysis of BigQuery usage
  • Set up and worked with Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce access for new users
  • Performed end-to-end architecture and implementation assessment of AWS services such as Amazon EMR, Redshift, and S3
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB
  • Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs
  • Imported data from sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to generate the output response
  • Created Lambda functions with Boto3 to deregister unused AMIs across all application regions to reduce EC2 costs (a minimal sketch follows this list)
  • Imported and exported databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS) packages
  • Coded Teradata BTEQ scripts to load and transform data and fix defects such as SCD Type 2 date chaining and duplicate cleanup
  • Developed a reusable framework for future migrations that automates ETL from RDBMS systems to the data lake using Spark data sources and Hive data objects
  • Conducted data blending and data preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server
  • Developed Kibana dashboards based on Logstash data and integrated different source and target systems into Elasticsearch for near-real-time log analysis and end-to-end transaction monitoring
  • Implemented AWS Step Functions to automate and orchestrate Amazon SageMaker tasks such as publishing data to S3, training the ML model, and deploying it for prediction
  • Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with tasks running on Amazon SageMaker.
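
A minimal, hypothetical sketch of the Boto3-based AMI cleanup described above; it assumes a single-region Lambda and omits tagging rules, snapshot deletion, and dry-run safeguards:

```python
# Hypothetical sketch: a Lambda handler that deregisters account-owned AMIs
# not referenced by any instance in the current region.
import boto3

def lambda_handler(event, context):
    ec2 = boto3.client("ec2")

    # Collect the AMIs currently referenced by instances in this region.
    in_use = set()
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                in_use.add(instance["ImageId"])

    # AMIs owned by this account.
    owned = ec2.describe_images(Owners=["self"])["Images"]

    deregistered = []
    for image in owned:
        if image["ImageId"] not in in_use:
            ec2.deregister_image(ImageId=image["ImageId"])
            deregistered.append(image["ImageId"])

    return {"deregistered": deregistered}
```

Scoping the "in use" check to describe_instances keeps the sketch simple; a production version would typically also consult launch templates and Auto Scaling configurations before deregistering anything.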

Azure Data Engineer

  • Developed a data pipeline using Spark, Hive, and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis
  • Working experience with Azure Databricks, organizing data into notebooks and making it easy to visualize using dashboards
  • Performed ETL from different source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics)
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks
  • Created database tables and stored procedures as required for reporting and ETL needs
  • Configured Databricks jobs and refactored ETL Databricks notebooks
  • Implemented data ingestion from various source systems using Sqoop and PySpark
  • Performed end-to-end architecture and implementation assessment of AWS services such as Amazon EMR, Redshift, S3, Athena, Glue, and Kinesis
  • Hands-on experience tuning the performance of Spark and Hive jobs through proper troubleshooting, estimation, and monitoring of the clusters
  • Performed data aggregation and validation on Azure HDInsight using Spark scripts written in Python
  • Performed monitoring and management of the Hadoop cluster using Azure HDInsight
  • Involved in extraction, transformation, and loading of data directly from different source systems (flat files, Excel, Oracle, SQL) using SAS/SQL and SAS macros
  • Generated PL/SQL scripts for data manipulation, validation, and materialized views for remote instances
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and created Hive queries for analysis
  • Good experience working with analysis tools such as Tableau and Splunk for regression analysis, pie charts, and bar graphs
  • Created and modified several database objects such as tables, views, indexes, constraints, stored procedures, packages, functions, and triggers using SQL and PL/SQL
  • Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques in SAS/Base
  • Extensively worked on shell scripts for running SAS programs in batch mode on UNIX
  • Wrote Python scripts to parse XML documents and load the data into the database (a minimal sketch follows this list)
  • Used Hive, Impala, and Sqoop utilities and Oozie workflows for data extraction and data loading
  • Created HBase tables to store various formats of data coming from different sources
  • Responsible for importing log files from various sources into HDFS using Flume
  • Responsible for translating business and data requirements into logical data models in support of enterprise data models, ODS, OLAP, OLTP, and operational data structures
  • Created SSIS packages to migrate data from heterogeneous sources such as MS Excel, flat files, and CSV files
  • Provided thought leadership for the architecture and design of big data analytics solutions for customers, actively driving Proof of Concept (POC) and Proof of Technology (POT) evaluations and implementing big data solutions.
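
A minimal sketch of the XML-to-database loading mentioned above, using only the Python standard library; sqlite3 stands in for the actual target database, and the element and column names are invented for illustration:

```python
# Hypothetical sketch: parse <customer> elements from an XML document and load
# them into a database table. Schema and field names are placeholders.
import sqlite3
import xml.etree.ElementTree as ET

def load_customers(xml_path: str, db_path: str) -> int:
    tree = ET.parse(xml_path)
    root = tree.getroot()

    # One row per <customer> element (placeholder structure).
    rows = [
        (
            elem.findtext("id"),
            elem.findtext("name"),
            elem.findtext("email"),
        )
        for elem in root.iter("customer")
    ]

    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customers (id TEXT, name TEXT, email TEXT)"
        )
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.close()
    return len(rows)
```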

Big Data Engineer

  • Implemented solutions for ingesting data from various sources and processing data at rest utilizing big data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
  • Experienced in developing and maintaining ETL jobs
  • Performed data profiling and transformation on raw data using Pig, Python, and Oracle
  • Experienced with batch processing of data sources using Apache Spark
  • Developed predictive analytics using Apache Spark Scala APIs
  • Created Hive external tables, loaded the data into tables, and queried data using HQL
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream log data from servers
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing
  • Imported millions of structured records from relational databases using Sqoop, processed them using Spark, and stored the data in HDFS in CSV format
  • Developed a Spark Streaming application to pull data from the cloud into a Hive table
  • Used Spark SQL to process large amounts of structured data (a minimal sketch follows this list)
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data
  • Loaded data from sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output response
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them
  • Developed complete end-to-end big data processing in the Hadoop ecosystem
  • Created automated Python scripts to convert data from different sources and generate the ETL pipelines
  • Configured StreamSets to store the converted data in SQL Server using JDBC drivers
  • Converted existing Hive scripts to Spark applications using RDDs for transforming data and loading it into HDFS
  • Extensively worked with text, ORC, Avro, and Parquet file formats and compression techniques such as Snappy, gzip, and zlib
  • Extensively used Hive optimization techniques such as partitioning, bucketing, map joins, and parallel execution
  • Worked with different tools to verify the quality of the data transformations
  • Created, configured, and monitored shard sets
  • Analyzed the data to be sharded and chose a shard key to distribute data evenly
  • Worked with the Spark SQL context to create DataFrames to filter input data for model execution
  • Configured the setup of the development and production environments
  • Worked extensively on the Linux platform to set up the server
  • Extensively worked on Amazon S3 for data storage and retrieval purposes
  • Experienced in running queries using Impala and used BI tools to run ad hoc queries directly on Hadoop
  • Worked with Alteryx, a data analytics tool, to develop workflows for ETL jobs.
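
A minimal PySpark sketch of the Spark SQL batch processing and columnar output described above; the HDFS paths, view name, and columns are illustrative assumptions:

```python
# Hypothetical sketch: aggregate Sqoop-imported CSV data on HDFS through a
# Spark SQL temp view and write the result as Snappy-compressed ORC.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-aggregation").getOrCreate()

# Read the structured CSV data imported into HDFS (placeholder path).
txns = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///data/raw/transactions/")
)

# Expose the DataFrame to Spark SQL and compute daily totals per account.
txns.createOrReplaceTempView("transactions")
daily_totals = spark.sql("""
    SELECT account_id, txn_date, SUM(amount) AS total_amount, COUNT(*) AS txn_count
    FROM transactions
    GROUP BY account_id, txn_date
""")

# Columnar output with Snappy compression, per the file-format work above.
(
    daily_totals.write
    .mode("overwrite")
    .option("compression", "snappy")
    .orc("hdfs:///data/curated/daily_totals/")
)
```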

ETL Developer

  • Provided technical support to the team as the ETL developer
  • Addressed best practices and productivity-enhancing issues
  • Worked on designing and developing QualityStage jobs
  • Loaded data into load, staging, and lookup tables
  • Implemented the staging area using flat files
  • Created jobs in DataStage to import data from heterogeneous data sources such as Oracle 9i, text files, and SQL Server
  • Generated surrogate IDs for the dimensions in the fact table for indexed, faster data access in server jobs
  • Extensively worked on job sequences to control the execution of the job flow using various activities and triggers (conditional and unconditional) such as Job Activity, Wait For File, Email Notification, Sequencer, Exception Handler, and Execute Command
  • Sliced and diced the input data for business feedback
  • Tested the system
  • Designed data masking techniques to protect sensitive information when working with the offshore team (a minimal sketch follows this list)
  • Assisted the mapping team in transforming business requirements into ETL-specific mapping rules
  • Developed ETL procedures to ensure conformity, compliance with standards, and lack of redundancy, translating business rules and functionality requirements into ETL procedures using Informatica PowerMart
  • Worked with the Erwin tool for data modeling (both physical and logical design)
  • Developed and documented data mappings/transformations, audit procedures, and Informatica sessions
  • Enhanced various complex jobs for performance tuning
  • Responsible for version control and promoting code to higher environments
  • Worked on Teradata optimization and performance tuning
  • Performed unit testing, system integration testing, and user acceptance testing
  • Involved in ongoing production support and process improvements
  • Ran the DataStage jobs through third-party schedulers.
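
A minimal, hypothetical sketch of the kind of data masking described above: sensitive columns in a delimited extract are replaced with salted SHA-256 digests before the file is shared; the column names and salt handling are illustrative only:

```python
# Hypothetical sketch: mask sensitive columns in a CSV extract by replacing
# their values with salted SHA-256 digests. Columns and salt are placeholders.
import csv
import hashlib

SENSITIVE_COLUMNS = {"ssn", "email", "phone"}

def mask_value(value: str, salt: str) -> str:
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

def mask_extract(src_path: str, dst_path: str, salt: str) -> None:
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Replace only the sensitive columns; everything else passes through.
            for column in SENSITIVE_COLUMNS & set(row):
                row[column] = mask_value(row[column], salt)
            writer.writerow(row)
```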

Education

Bachelor’s degree in Computer Science
JNTUH, 01.2013

Skills

  • Hadoop
  • MapReduce
  • Spark
  • HDFS
  • Sqoop
  • YARN
  • Oozie
  • Hive
  • Impala
  • Apache Airflow
  • HBase
  • PL/SQL
  • SQL
  • Python
  • Scala
  • Java
  • MySQL
  • SQL Server
  • Oracle
  • MS Access
  • Cassandra
  • DynamoDB
  • Autosys
  • Tableau
  • Power BI
  • Informatica
  • Talend
  • Azure
  • AWS
  • GCP
  • Eclipse
  • Jupyter Notebook
  • Spyder
  • PyCharm
  • IntelliJ
  • Git
  • SVN
