Bhanulatha Rayapalli

Torrance, CA

Summary

  • 10+ years of professional experience in information technology, with expert hands-on skills in Big Data, Hadoop, Spark, Hive, Impala, Sqoop, Flume, Kafka, SQL tuning, ETL development, report development, SAS, database development, and data modeling, plus strong knowledge of Oracle database architecture.
  • Experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, GCP, Azure, Spring Boot, and Spark integration with Cassandra, Solr, and Zookeeper.
  • Strong experience in migrating other databases to Snowflake.
  • Managed databases and Azure Data Platform services (Azure Data Lake Storage (ADLS), Data Factory (ADF), Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB), SQL Server, Oracle, and data warehouses; built multiple data lakes.
  • Built a scalable, automated data pipeline using AWS services (Glue, EMR, Redshift) and GCP tools (BigQuery, Pub/Sub), integrating diverse data sources into Snowflake for analytics.
  • Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards using tools like Tableau and Power BI.
  • Adopted best practices for AWS Lambda security, ensuring compliance with industry standards and enhancing data protection protocols.
  • Created snowflake schemas by normalizing dimension tables as appropriate and creating a sub-dimension named Demographic as a subset of the Customer dimension.
  • Built ETL (Extract, Transform, Load) processes that integrate various data sources to centralize and prepare data for analysis.
  • Leveraged Airflow for orchestration, dbt for data transformations, and Informatica for data quality. Implemented continuous data validation and governance using IDQ and AWS Lambda for serverless automation.
  • Expertise in Java programming, with a good understanding of OOP, I/O, collections, exception handling, lambda expressions, and annotations.
  • Conducted training sessions on Apache Flink for team members, fostering knowledge sharing and enhancing team capabilities in stream processing.
  • Ensured data integrity and compliance with regulations (e.g., GDPR, CCPA) while managing big data projects, safeguarding sensitive information.
  • Experienced in building automated regression scripts for validation of ETL processes between multiple databases such as Oracle, SQL Server, Hive, and MongoDB using Python (a minimal sketch follows this summary).
  • Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy; experienced in creating and running Docker images with multiple microservices.
  • Experienced in utilizing AWS services such as Amazon S3 for data storage, Amazon RDS for relational database management, and Amazon Redshift for data warehousing solutions.
  • Implemented event-driven architectures with AWS Lambda, enabling real-time data processing and enhancing system responsiveness.
  • Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like Map Reduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, Spark GraphX, Spark SQL, Kafka.
  • Proficient with Spark Core, Spark SQL, Spark MLlib, Spark GraphX and Spark Streaming for processing and transforming complex data using in-memory computing capabilities written in Scala. Worked with Spark to improve the efficiency of existing algorithms using Spark Context, Spark SQL, Spark MLlib, DataFrames, Pair RDDs, and Spark on YARN.
  • Extensive experience in Microsoft Azure cloud computing, GCP, and SQL BI technologies.
  • Hands-on experience in Azure Cloud Services (PaaS & SaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, Azure Data Lake.
  • Good experience in tracking and logging end-to-end software application builds using Azure DevOps.
  • Used SQL Azure extensively for database needs in various applications.
  • Experienced with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their application (CI/CD) to deploying either on public or private cloud.
  • Strong experience in core Java, Scala, SQL, PL/SQL and Restful web services.
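
A minimal sketch of the cross-database ETL validation scripting referenced in this summary; the connection wiring, table names, and the row_count/validate_counts helpers are hypothetical, and only row counts are compared for brevity.

```python
# Minimal cross-database ETL validation sketch (hypothetical tables/connections).
# Compares row counts between source and target tables over any two DB-API 2.0
# connections (e.g. oracledb for Oracle, pyodbc for SQL Server, pyhive for Hive).

def row_count(conn, table: str) -> int:
    """Return SELECT COUNT(*) for the given table on a DB-API connection."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")  # table names come from a trusted config
    (count,) = cur.fetchone()
    cur.close()
    return count

def validate_counts(src_conn, tgt_conn, src_table: str, tgt_table: str) -> bool:
    """Flag any mismatch between source and target row counts."""
    src, tgt = row_count(src_conn, src_table), row_count(tgt_conn, tgt_table)
    if src != tgt:
        print(f"MISMATCH {src_table} -> {tgt_table}: source={src}, target={tgt}")
        return False
    print(f"OK {src_table} -> {tgt_table}: {src} rows")
    return True

# Example wiring (hypothetical connection details):
#   import oracledb; from pyhive import hive
#   ora = oracledb.connect(user="etl", password="...", dsn="orcl")
#   hv  = hive.Connection(host="hive-gateway", port=10000)
#   validate_counts(ora, hv, "SALES.ORDERS", "warehouse.orders")
```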

Overview

11 years of professional experience

Work History

Senior Data Engineer

Humana
08.2023 - Current
  • Created powerful search solutions with Apache Solr, enabling users to perform detailed and fast full-text searches and complex queries
  • Built and managed scalable Aerospike clusters that efficiently handled massive data loads, ensuring minimal downtime and high system reliability
  • Integrated AWS Lambda with third-party APIs to automate data retrieval and processing tasks, reducing manual data entry efforts and improving accuracy
  • Worked on migration of data from On-Prem SQL server to Cloud database (Azure Synapse Analytics (DW) & Azure SQL DB)
  • Managed NoSQL database on GCP, designed for large analytical and operational workloads, often used for time-series data or high-throughput applications
  • Developed and deployed real-time data processing applications using Apache Flink, enabling the analysis of streaming data with latency as low as a few seconds
  • Developed interactive dashboards using tools like Tableau and Power BI to present complex data findings to stakeholders, enhancing decision-making processes
  • Developed and managed ETL processes using AWS Glue, enabling seamless data integration from multiple sources and automating data transformation tasks
  • Built a serverless RESTful API with AWS Lambda and API Gateway for a mobile application, achieving seamless integration with front-end components and improving response times
  • Created tabular models on Azure analysis services for meeting business reporting requirements
  • Implemented AWS security best practices, including IAM (Identity and Access Management), to manage user permissions and ensure data compliance with regulations like GDPR
  • Experienced in data transformations using Azure HDInsight and Hive for different file formats
  • Developed Spark and SparkSQL code to process the data in Apache Spark on Azure HDInsight to perform the necessary transformations based on the STMs developed
  • Developed business intelligence solutions using SQL Server data tools to load data to SQL & Azure Cloud databases
  • Leveraged big data technologies (e.g., Hadoop, Spark) to analyze large datasets, driving insights that informed strategic business decisions
  • Created an automated data ingestion process utilizing AWS Lambda, which processed and transformed incoming data streams from IoT devices, resulting in a 60% reduction in manual intervention
  • Analyzed, designed and built modern data solutions using Azure PaaS services to support visualization of data
  • Converted Talend joblets to support Snowflake functionality
  • Used Airflow for scheduling the Hive, Spark and Map Reduce jobs
  • Developed Oozie workflows for scheduling and orchestrating the ETL cycle
  • Transformed date-related data into an application-compatible format by developing Apache Pig UDFs
  • Configured Zookeeper to coordinate and support Kafka, Spark, Spark Streaming, HBase and HDFS
  • Worked as a programmer analyst with expertise in Tableau Server, ETL, Teradata, and other EDW data integrations and development
  • Built and managed secure, scalable data lakes that store raw data from various sources
  • Integrated Cloudera with Spark, Hadoop, and other big data technologies for efficient data management
  • Used AWS Lambda for event-driven serverless computing, automating data processing tasks such as triggering ETL pipelines and handling data events from sources like S3
  • Consulted on Snowflake data platform solution architecture, design, development, and deployment, helping to bring a data-driven culture across the enterprise
  • Worked on Oracle databases, Redshift, and Snowflake
  • Documented the requirements including the available code, which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search
  • Developed Kafka consumer API in Scala for consuming data from Kafka topics
  • Designed and implemented Sqoop for incremental jobs to read data from DB2 and load it into Hive tables, then connected to Tableau via HiveServer2 for generating interactive reports
  • Used Sqoop to transfer data between RDBMS sources and HDFS
  • Developed Scala scripts using both DataFrames/Spark SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system (a minimal sketch follows this list)
  • Integrated Oozie with Hue and scheduled workflows for multiple Hive, Pig and Spark Jobs
  • Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation and aggregation from multiple file formats
  • Involved in modeling different key risk indicators in Splunk and building extensive Hive and SPL queries to understand behavior across the customer life cycle
  • Utilized AWS Cost Explorer and Trusted Advisor to monitor and optimize cloud resource usage, resulting in significant cost savings
  • Created and managed Splunk DB Connect identities, connections, inputs, outputs, lookups, and access controls
  • Created dashboards, reports and alerts for real time monitoring in Splunk
  • Worked with Tableau and Jaspersoft for reporting
  • Performed statistical analysis and predictive modeling using R to uncover trends and patterns, which helped the business make informed decisions
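
A minimal PySpark Structured Streaming sketch of the Kafka-based real-time processing mentioned in this list; the broker address, topic name, and event schema are hypothetical assumptions, not the production configuration.

```python
# Minimal PySpark Structured Streaming sketch reading from Kafka.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Assumed JSON payload: {"member_id": "...", "metric": 12.3, "event_ts": "..."}
schema = StructType([
    StructField("member_id", StringType()),
    StructField("metric", DoubleType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-1:9092")   # hypothetical broker
       .option("subscribe", "events")                        # hypothetical topic
       .load())

# Kafka delivers bytes; decode the value column and parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(F.from_json("json", schema).alias("e"))
             .select("e.*"))

# 5-minute tumbling-window aggregation, written to the console for illustration.
agg = (events.withWatermark("event_ts", "10 minutes")
             .groupBy(F.window("event_ts", "5 minutes"), "member_id")
             .agg(F.avg("metric").alias("avg_metric")))

query = agg.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
```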

Data Analyst

HCL TECHNOLOGIES LTD
12.2019 - 05.2022
  • Company Overview: USAA
  • Worked in an Agile environment and used the Rally tool to maintain user stories and tasks
  • Implemented Apache Sentry to restrict the access on the Hive tables on a group level
  • Designed and implemented topics in a new Kafka cluster across all environments
  • Created multiple dashboards in Tableau for multiple business needs
  • Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for efficient data access
  • Designed SSIS Packages to extract, transfer, load (ETL) existing data into SQL Server from different environments for the SSAS cubes (OLAP)
  • Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database
  • Implemented Composite Server for data virtualization needs and created multiple views for restricted data access using a REST API
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau
  • Migrated MapReduce jobs to Spark jobs to achieve better performance
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs with Scala and Python
  • Developed Apache Spark applications for data processing from various streaming sources
  • Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer data
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala (see the sketch after this list)
  • Queried and analyzed data from Cassandra for quick searching, sorting and grouping through CQL
  • Joined various tables in Cassandra using spark and Scala and ran analytics on top of them
  • Applied advanced Spark procedures such as text analytics using in-memory processing
  • Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop
  • Brought data from various sources into Hadoop and Cassandra using Kafka
  • Created and formatted Cross-Tab, Conditional, Drill-down, Top N, Summary, Form, OLAP, sub-reports, ad-hoc reports, parameterized reports, interactive reports, and custom reports using SQL Server Reporting Services (SSRS)
  • Designed and developed Oracle PL/SQL and shell scripts for data import/export, data conversion, and data cleansing
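
A short sketch of how a HiveQL query can be rewritten as equivalent Spark DataFrame transformations, as referenced in this list; the claims table and its columns are hypothetical.

```python
# Sketch of translating a Hive query into equivalent Spark DataFrame transformations.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()          # assumes a Hive metastore is configured
         .getOrCreate())

# Original HiveQL-style query (hypothetical table and columns):
sql_result = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM claims
    WHERE claim_date >= '2021-01-01'
    GROUP BY customer_id
""")

# Same logic expressed as DataFrame transformations:
df_result = (spark.table("claims")
             .filter(F.col("claim_date") >= "2021-01-01")
             .groupBy("customer_id")
             .agg(F.sum("amount").alias("total_amount")))

sql_result.show(5)
df_result.show(5)
```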

Data Engineer

Avon Technologies Pvt Ltd
12.2018 - 12.2019
  • Company Overview: Proxima
  • Ingested data from Oracle databases using Sqoop and Flume, ensuring smooth data flow into our systems
  • Developed a custom Pig User-Defined Function (UDF) to convert various date and timestamp formats from unstructured files into standardized formats
  • Engaged in hands-on Extract, Transform, Load (ETL) processes with Ab Initio, managing data mapping and transformation in a complex, high-volume environment
  • Analyzed and managed system logs using tools like Splunk and syslog to ensure data integrity and system performance
  • Imported and exported data to and from Hadoop Distributed File System (HDFS) and Hive using Sqoop and Kafka
  • Developed MapReduce programs using Apache Hadoop, allowing efficient processing of large datasets
  • Validated Sqoop jobs and Shell scripts, ensuring accurate data loading without discrepancies
  • Also handled migration and testing of both static and transactional data across core systems
  • Utilized Apache Kafka to enhance data processing by transforming live streaming data with batch processing to generate insightful reports
  • Proficient in several open-source programming languages, including Perl, Python, Scala, and Java
  • Wrote scripts for managing HBase tables, including creating, truncating, dropping, and altering tables to store processed data for future analytics
  • Designed and implemented self-service reporting solutions in Azure Data Lake Store Gen2 using an ELT (Extract, Load, Transform) approach
  • Developed data warehouse models in Snowflake for over 100 datasets, utilizing the Cape tool for efficient data management
  • Worked within Agile methodologies, participating in Scrum stories and sprints while focusing on data analytics and wrangling tasks in a Python environment
  • Tuned the performance of Phoenix/HBase, Hive queries, and Spark applications to optimize system performance
  • Installed Kafka to collect data from various sources, storing it for further consumption
  • Utilized a custom File System plugin to enable seamless access for Hadoop MapReduce programs, HBase, Pig, and Hive
  • Wrote PySpark and Spark SQL transformations in Azure Databricks to implement complex business rules and data transformations (see the sketch after this list)
  • Extended the functionality of Hive and Pig by writing custom UDFs, UDTFs, and UDAFs to meet specific project needs
  • Built and maintained a robust environment on Azure's Infrastructure as a Service (IaaS) and Platform as a Service (PaaS)
  • Implemented best practices for Continuous Integration and Continuous Delivery using Azure DevOps, ensuring effective code versioning
  • Architected and implemented medium to large-scale Business Intelligence (BI) solutions on Azure, leveraging various Azure Data Platform services, including Azure Data Lake, Data Factory, and Stream Analytics
  • Utilized the Azure Portal extensively, including Azure PowerShell, Storage Accounts, and Data Management for efficient operations
  • Created Azure PowerShell scripts to transfer data between the local file system and HDFS Blob storage
  • Managed various database and Azure Data Platform services, including Azure Data Lake, Data Factory, SQL Server, and Oracle, successfully building multiple Data Lakes
  • Developed ETL jobs using Spark-Scala to migrate data from Oracle databases to new Hive tables, ensuring efficient data handling
  • Gained experience in various scripting technologies, including Python and Unix shell scripts
  • Created Spark code using Scala and Spark-SQL/Streaming to facilitate quicker testing and processing of data
  • Developed middleware component services using Java Spring to fetch data from HBase through the Phoenix SQL layer for various web application use cases
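
A brief sketch of the kind of rule-based PySpark transformation written in Azure Databricks, as referenced above; the Delta paths, column names, and thresholds are hypothetical placeholders (Delta Lake is assumed available, as on Databricks).

```python
# Sketch of rule-based PySpark transformations in a Databricks-style notebook.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("databricks-rules-sketch").getOrCreate()

orders = spark.read.format("delta").load("/mnt/raw/orders")   # hypothetical mount path

cleaned = (orders
           .withColumn("order_date", F.to_date("order_ts"))
           # Business rule: classify order size by amount thresholds (hypothetical values).
           .withColumn("order_tier",
                       F.when(F.col("amount") >= 1000, "large")
                        .when(F.col("amount") >= 100, "medium")
                        .otherwise("small"))
           # Business rule: drop test accounts and rows with no customer id.
           .filter(F.col("customer_id").isNotNull() & ~F.col("is_test_account")))

cleaned.write.format("delta").mode("overwrite").save("/mnt/curated/orders")
```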

ETL Developer

HI-Gate Infosystems Pvt. Ltd
07.2015 - 11.2018
  • Company Overview: Barrick Gold Corporation
  • Developed a Python utility to validate HDFS tables against source tables
  • Implemented code in Python to retrieve and manipulate data
  • Designed and implemented data processing systems on GCP using services such as BigQuery, Dataflow, and Dataproc
  • Built and managed data warehouses and data lakes on GCP, ensuring data integrity and security
  • Implemented real-time data streaming and processing solutions using GCP services like Pub/Sub and Apache Beam
  • Designed ETL processes using Informatica to load data from flat files and Excel files into the target Oracle data warehouse database
  • Scheduled different Snowflake jobs using NiFi
  • Developed and maintained scalable data pipelines for ingesting, processing, and transforming large volumes of data
  • Designed and optimized data models and schemas for efficient storage and retrieval
  • Handled importing of data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS
  • Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows
  • Designed and implemented data ingestion processes to capture and load data from various sources into GCP storage systems such as Cloud Storage or Bigtable
  • Developed Python wrapper scripts that extract specific date ranges using Sqoop by passing custom properties required for the workflow (see the sketch after this list)
  • Involved in filtering data stored in S3 buckets using Elasticsearch and loaded data into Hive external tables
  • Designed and developed UDFs to extend functionality in both Pig and Hive
  • Imported and exported data using Sqoop between MySQL and HDFS on a regular basis
  • Developed a shell script to create staging, landing tables with the same schema as the source and generate the properties which are used by Oozie Jobs
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources
  • Developed Oozie workflows for executing Sqoop and Hive actions
  • Built various graphs for business decision making using Python matplotlib library
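
A sketch of the Python wrapper pattern for date-range Sqoop extracts described above; the JDBC connection string, credentials path, table, and date column are hypothetical.

```python
# Sketch of a Python wrapper that builds and runs a Sqoop import for a date range.
import subprocess
from datetime import date

def sqoop_import_range(table: str, date_col: str, start: date, end: date,
                       target_dir: str) -> None:
    """Run a Sqoop import restricted to [start, end) on the given date column."""
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://db-host:3306/sales",   # hypothetical source database
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop_pwd",        # hypothetical credentials path
        "--table", table,
        "--where", f"{date_col} >= '{start}' AND {date_col} < '{end}'",
        "--target-dir", target_dir,
        "--num-mappers", "4",
    ]
    subprocess.run(cmd, check=True)   # raise if the Sqoop job fails

if __name__ == "__main__":
    # Example: pull January 2017 into a staging directory (hypothetical names).
    sqoop_import_range("orders", "order_date",
                       date(2017, 1, 1), date(2017, 2, 1),
                       "/staging/orders/2017-01")
```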

Hadoop Developer

Hewlett Packard
01.2014 - 06.2015
  • Company Overview: Dot-com team
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs
  • Developed simple to complex MapReduce jobs using Hive and Pig
  • Developed MapReduce programs for data analysis and data cleaning
  • Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements
  • Integrated external data sources and APIs into GCP data solutions, ensuring data quality and consistency
  • Built data transformation pipelines using GCP services like Dataflow or Apache Beam to cleanse, normalize, and enrich data
  • Built machine learning models to showcase big data capabilities using PySpark and MLlib (a minimal sketch follows this list)
  • Designed, implemented and deployed within a customer's existing Hadoop / Cassandra cluster a series of custom parallel algorithms for various customer-defined metrics and unsupervised learning models
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
  • Extensively used SSIS transformations such as Lookup, Derived Column, Data Conversion, Aggregate, Conditional Split, SQL Task, Script Task, and Send Mail Task
  • Performed data cleansing, enrichment, mapping tasks and automated data validation processes to ensure meaningful and accurate data was reported efficiently
  • Implemented Apache Pig scripts to load data from and store data into Hive
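
A minimal PySpark MLlib sketch illustrating the model-building work mentioned above; the input path, feature columns, and label column are hypothetical placeholders.

```python
# Minimal PySpark MLlib sketch: logistic-regression pipeline on a labeled dataset.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

df = spark.read.parquet("/data/labeled_events")      # hypothetical input path
train, test = df.randomSplit([0.8, 0.2], seed=42)

# Assemble hypothetical numeric feature columns into a single vector column.
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

model = Pipeline(stages=[assembler, lr]).fit(train)
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print(f"Test AUC: {auc:.3f}")
```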

Education

Master of Science - Information Systems

Central Michigan University
Mount Pleasant, MI
05-2023

Bachelor of Science - Information Technology

Al Ameer College of Engineering And IT
Anandapuram - Bakurupalem, Andhra Pradesh 531173
11-2011

Skills

  • All versions of Windows
  • UNIX
  • LINUX
  • macOS
  • Sun Solaris
  • Java
  • Scala
  • R
  • Python (NumPy, SciPy, Pandas, Gensim, Keras)
  • Shell Scripting
  • Microsoft SQL Server
  • MySQL
  • Oracle 11g/12c
  • DB2
  • Teradata
  • Netezza
  • MS Office (Word, Excel, PowerPoint, Visio, Outlook)
  • SSRS
  • Cognos
  • Jenkins
  • Toad
  • SQL Loader
  • PostgreSQL
  • Talend
  • Maven
  • ANT
  • RTC
  • RSA
  • Control-M
  • Oozie
  • Hue
  • SOAP UI
  • Microsoft SQL Studio
  • IntelliJ
  • Azure Databricks
  • Eclipse
  • NetBeans

Timeline

Senior Data Engineer

Humana
08.2023 - Current

Data Analyst

HCL TECHNOLOGIES LTD
12.2019 - 05.2022

Data Engineer

Avon Technologies Pvt Ltd
12.2018 - 12.2019

ETL Developer

HI-Gate Infosystems Pvt. Ltd
07.2015 - 11.2018

Hadoop Developer

Hewlett Packard
01.2014 - 06.2015

Master of Science - Information Systems

Central Michigan University

Bachelor of Science - Information Technology

Al Ameer College of Engineering And IT