Ram Kolla

Summary

  • Data Engineer with 10+ years of experience designing and delivering large-scale data applications on Big Data technologies and cloud-based data engineering platforms.
  • Hands-on experience in unified data engineering with Azure Data Factory, AWS, GCP, Snowflake, and Databricks.
  • Designed and developed Databricks notebooks using Python, Apache Spark, and SQL to build scalable data pipelines.
  • Designed and implemented end-to-end data processing pipelines using Azure Data Factory, Databricks, Azure Functions, triggers, and Azure Key Vault to address complex business use cases and enable real-time analytics and monitoring.
  • Good understanding of Spark architecture with Databricks and Structured Streaming.
  • Integrated Azure Data Factory, AWS, and GCP with Databricks workspaces for business analytics; managed Databricks clusters and the machine learning lifecycle.
  • Implemented data encryption and access control measures using Azure Key Vault, Azure AD, and Microsoft Defender for Cloud, securing sensitive data and credentials.
  • Built a robust and scalable data pipeline using AWS services including S3, Glue, Redshift, Lambda, CloudWatch, Kinesis, Step Functions, and Secrets Manager to orchestrate ETL workflows, manage secure data access, and support business intelligence reporting.
  • Designed and delivered fact and dimension tables to support BI reporting and executive-level visualization in Power BI and Tableau, enabling data-driven decision-making for key management.
  • Skilled in query optimization, data workflows using Alteryx, and data visualization using Power BI and QuickSight.
  • Built pipelines in ADF using datasets, linked services, and pipelines to extract, load, and transform data between sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, in both directions.
  • Developed scalable ETL pipelines on GCP using Dataflow, BigQuery, and Cloud Storage to ingest, process, and transform data from sources such as Cloud SQL, on-premises databases, Pub/Sub, and SaaS platforms; automated workflows with Cloud Functions and Cloud Composer.
  • Built robust data pipelines in Snowflake using Streams, Tasks, and Snowpipe to extract, load, and transform data from diverse sources such as AWS S3, Azure Blob Storage, on-premises SQL systems, and third-party APIs; enabled near real-time ingestion and transformation for analytics and reporting.
  • Built end-to-end data pipelines using AWS Glue, Lambda, and Step Functions to extract, transform, and load data from Amazon S3, RDS, Redshift, and external APIs, ensuring data quality and orchestration for both batch and near real-time processing.
  • Migrated on-premises data from SQL and Oracle to Azure Data Lake Storage (ADLS Gen2) using Azure Data Factory, ensuring scalable and efficient cloud data integration.
  • Good knowledge of batch and historical data processing using Hive, Pig, and Databricks, enabling retrospective trend analysis, advanced feature engineering, and generation of ML-ready datasets for predictive modeling.
  • Proficient in implementing CI/CD frameworks for data pipelines using tools such as Jenkins, ensuring efficient automation and deployment.
  • Configured data quality checks using Great Expectations, providing real-time monitoring and automatic alerting when validation rules were violated.
  • Proficient in cloud migration strategies with a focus on Azure, AWS, Snowflake, and GCP, including Azure Migrate and lift-and-shift methodologies.
  • Managed end-to-end deployment workflows for data processing jobs using Bamboo, ensuring smooth transitions from development to production environments.
  • Experienced in delivering business intelligence solutions, developing executive dashboards and self-service BI using Tableau, Power BI, Looker, and Looker Studio to visualize KPIs, operational metrics, and compliance outcomes.
  • Proficient in sorting, analyzing, and integrating data using GCP (BigQuery, Cloud Dataflow), AWS (Glue, Redshift), and Snowflake (SQL, Streams & Tasks, Snowpipe).
  • Proficient in configuring Azure DevOps pipelines and GitHub for continuous integration, automated testing, and streamlined code delivery.
  • Experienced in dimensional modeling, data migration, cleansing, and ETL processes for data warehousing.
  • Skilled in statistical modeling, machine learning, and decision trees, with a strong foundation in ETL processes and data transformation in AWS, GCP, Snowflake, and Azure.
  • Experienced in developing, maintaining, and supporting data pipelines using Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics, IBM DataStage jobs, custom frameworks, and related technologies.
  • Worked across the analysis, design, development, production support, and implementation phases of data warehouse and BI applications.
  • Worked extensively with data warehouse concepts such as data modeling, ETL jobs, data marts, and ETL frameworks.
  • Extensive experience with Power BI, covering data visualization, report creation, and dashboard development.
  • Hands-on experience designing and implementing ETL/ELT pipelines to ingest semi-structured and nested JSON data from MongoDB and MongoDB Atlas into cloud data warehouses such as Snowflake.
  • Proficient in writing Spark Core and Spark SQL scripts in Scala to accelerate data processing.
  • Good experience in SQL (SSIS, SSAS, SSRS), T-SQL, PL/SQL, Unix, Microsoft Power BI, and Office 365.

Overview

11 years of professional experience

Work History

Senior Data Engineer

CYIENT Ltd
IN
10.2024 - Current
  • Created end-to-end, scalable Azure Data Factory (ADF) pipelines to orchestrate ETL workflows across various sources, including Azure Blob Storage, AWS S3, on-premises SQL Server, and APIs, loading processed data into Azure Synapse Analytics and Azure Data Lake Storage Gen2.
  • Implemented Slowly Changing Dimension (SCD) Type 2 logic in Azure Data Factory (ADF) to capture changes in patient insurance details, treatment records, and demographics in the SQL data warehouse.
  • Implemented Data Encryption and Access Control Measures using Azure Key Vault, Azure AD, and Microsoft Defender for Cloud, securing sensitive data and credentials.
  • Implemented stored procedures in Azure SQL Database and leveraged Azure HDInsight for optimized query performance, streamlined data manipulation, and orchestrated data warehouse solutions.
  • Integrated Azure Databricks notebooks into ADF pipelines to perform advanced data transformations, cleansing, and enrichment using PySpark, pandas, and Delta Lake on large-scale datasets.
  • Automated CI/CD pipelines for data workflows using Azure DevOps, performed data ingestion with Azure Data Factory, built data pipelines, and eliminated unnecessary data.
  • Developed fact and dimension tables for enterprise BI reporting, delivering interactive dashboards in Power BI and Tableau to provide key management with actionable insights.
  • Monitored and guided BI teams in creating reports and dashboards using Power BI and Tableau, ensuring accuracy and consistency in business reporting.
  • Sorted, analyzed, and integrated previously hidden data using legacy SSIS packages.
  • Migrated existing data objects (tables, views) from the current Hive metastore to Unity Catalog using tools such as UCX (Unity Catalog Upgrade).
  • Collaborated with cross-functional teams to support data governance using Databricks Unity Catalog.
  • Used Azure Log Analytics to track activity, reducing errors and improving query response.
  • Applied data integration and transformation skills with Azure Data Factory, Databricks, and SQL to design and implement scalable, efficient ETL/ELT pipelines across diverse data sources.
  • Established Azure DevOps pipelines for continuous integration, testing, and code delivery (CI/CD) to keep the testing and deployment process consistent.
  • Improved auditing with Unity Catalog by automatically logging user access to data objects, providing valuable insights into data usage patterns and potential security risks.
  • Worked in an Agile style, breaking large projects into smaller, iterative cycles (sprints) with continuous delivery of working features.
  • Environment: Azure, SQL, Kafka, Python, Visualization, NoSQL, SQL Server, Excel, Oracle, SQL Azure, Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, Azure Data Factory, Apache Airflow, SaaS, PaaS, AWS Redshift, AWS S3, AWS Glue.

Data Engineer

CYIENT Ltd
IN
10.2022 - 09.2024
  • Created data pipelines using Azure Data Factory (ADF) and Azure Synapse pipelines, leveraging features such as change tracking and scheduled triggers to extract, transform, and load data from various sources, including Azure Blob Storage and relational databases.
  • Managed Azure Synapse Analytics environments, including performance scaling, resource monitoring, and scheduled maintenance to ensure optimal performance, cost efficiency, and high availability.
  • Successfully completed a proof of concept for Azure implementation as part of a larger cloud migration strategy, moving on-premises servers and data to the Azure cloud platform.
  • Responsible for estimating Azure Synapse dedicated SQL pool capacity, monitoring performance metrics, and troubleshooting issues to ensure efficient query execution and resource utilization.
  • Developed optimized data pipelines in Azure Databricks using Spark and pandas DataFrames for complex operations such as joins, aggregations, filtering, and reshaping of multi-million-row datasets before loading into Azure Synapse Analytics.
  • Performed data quality analysis in Azure Databricks and Azure Synapse by creating analytical datasets and executing complex SQL queries to validate accuracy and consistency.
  • Designed and implemented ETL and data movement solutions integrating Azure Data Factory with Azure Databricks, Delta Lake, and Python-based frameworks for scalable data processing.
  • Migrated on-premises data from SQL Server to Azure Synapse Analytics using bulk loading techniques such as the COPY command and staging in Azure Data Lake Storage Gen2.
  • Worked on Azure Databricks notebooks to extract raw data from multiple sources, perform complex transformations, and load cleansed datasets into Azure Synapse Analytics for downstream analytics.
  • Designed and developed data warehouse models in Azure Synapse following star and snowflake schema best practices to optimize performance and query efficiency.
  • Built end-to-end data ingestion, cleansing, conversion, and analysis workflows using Azure Data Factory and Azure Databricks to generate actionable business insights.
  • Experienced with SQL scripting, query optimization, and advanced features in Azure Synapse, including handling semi-structured data formats such as JSON and Parquet.
  • Environment: Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage Gen2, Azure SQL Database, Delta Lake, Spark, Python, Pandas, REST APIs, Power BI, Git, Azure DevOps, JSON, Parquet, CSV.

Data Engineer

Wipro
Hyderabad, IN
11.2020 - 09.2022
  • Processed streaming data in real time using AWS Kinesis by building data ingestion pipelines that captured, transformed, and delivered high-volume event streams from multiple sources into downstream systems, ensuring low-latency analytics and seamless integration with AWS services such as S3 and Redshift.
  • Acquired and ingested data from the client’s AWS RDS MySQL instances by designing and implementing secure extraction processes, integrating the data into ETL pipelines for transformation and loading into downstream data warehouses, ensuring data accuracy, consistency, and availability for analytics.
  • Transformed data using AWS Glue Crawlers to generate metadata for the AWS Data Catalog and implemented SNS with Lambda functions to automate notifications for new application events.
  • Implemented custom ETL processes in Java to acquire, transform, and load data from AWS RDS MySQL to S3.
  • Wrote PySpark scripts to evaluate data and apply transformation rules that convert it to a Kinesis-compatible format.
  • Used Amazon EMR (Elastic MapReduce) to process the data arriving via the Kinesis stream.
  • Set up Kafka Producers across multiple servers to generate data at 10-second intervals and created interactive dashboards and graphs in Splunk to monitor pipeline activity, analyze streaming data, and extract actionable insights.
  • Leveraged Aerospike's robust storage capabilities for managing metadata and other critical information generated by AWS Glue Crawlers, ensuring quick access and updates.
  • Leveraged Netezza for its high-performance data warehousing capabilities, efficient data loading, and advanced analytics, seamlessly integrating with the AWS ecosystem to ensure the project's success.
  • Acquired a thorough understanding of the Teradata architecture and all its elements, including database administration, security, performance optimization, utilities, and the Teradata Database Management System's capabilities.
  • Handled PostgreSQL database administration, including setup, tuning, optimization, and maintenance.
  • Utilized Jira to break down the project into manageable user stories and tasks.
  • Acquired a great deal of expertise using Teradata's suite of tools, including Teradata Manager, Teradata Studio, and Teradata Queryman.
  • Assigned user roles and set up access controls for the pipelines using AWS Identity and Access Management (IAM).
  • Orchestrated automated tasks using Apache Airflow and stored the transformed data in AWS Redshift to ensure availability and accessibility for end users.
  • Used Confluence to ensure clear communication and foster collaboration among team members working on different parts of the data pipeline.
  • Developed interactive dashboards and visualizations in AWS QuickSight to illustrate key features and configured AWS CloudWatch with automated trigger-based actions for proactive monitoring and alerting.
  • Followed Agile Scrum practices to advance the project in a timely and organized manner.
  • Environment: AWS RDS, AWS Data Pipeline, AWS Kinesis, Apache Airflow, AWS Glue crawlers, PySpark, Glue jobs, Lambda functions, Glue triggers, AWS Redshift, PostgreSQL, AWS CloudWatch, Java, Kafka, Netezza, Aerospike, Jira, Teradata Manager, Teradata Studio, Teradata Queryman, Confluence, Agile-Scrum.

Data Engineer

Arcserve
Hyderabad, IN
12.2017 - 10.2020
  • Company Overview: Clients provide healthcare services in growing regions, rural areas, and small towns.
  • Developed Azure Data Factory (ADF) pipelines according to source-to-target mapping documents.
  • Tuned the performance of pipelines processing large data volumes.
  • Reprocessed rejected data, ensuring no data loss.
  • Worked on various complex transformations such as change capture, SCD, lookup, join, merge, and aggregate.
  • Provided timely status updates to all stakeholders.
  • Participated in Scrum calls, backlog grooming, iteration planning, and retrospective meetings.
  • Participated in technical discussions with the customer's technical lead.
  • Configured pipelines with parameters and parameter sets such as database credentials, auditing dates, and important directory paths.
  • Prepared various stored procedures to load data into the respective tables.
  • Prepared unit test cases and unit tested the jobs.
  • Handled code deployment using Git.
  • Worked across different modules to meet project deliverables.
  • Provided technical help to other teams.
  • Defined and implemented approaches to load and extract data from the database.
  • Automated the manual task of validating files and reporting any issues to the concerned team.
  • Technical Platform: Operating System: Windows 10; Tools and Technologies: Azure ADF, Azure Data Lake, MS SQL Server; Code Repository: GitHub; Ticketing Tool: ServiceNow.

Data Engineer

S&P Global Capital IQ
Hyderabad, IN
05.2014 - 11.2017
  • Developed SSIS packages to build a reporting database, with reports delivered through SSRS, and performed unit testing.
  • Handled multiple clients for ELITE (an ERP tool) and migrated data for the existing client.
  • Converted 3E clients to the new architecture in terms of data movement.
  • Held discussions with the client to gather information about the customer database, then updated the code and migrated the data accordingly.
  • Ran migrations in phases; after each migration, the database was sent for UAT and the migration packages were frozen.
  • Optimized SQL code for faster execution and periodically checked execution plans to ensure heavy data loads did not stall during execution.
  • Reconciled financial and company data between the existing environment and the new environment.
  • Technical Platform: Operating System: Windows 10; Tools and Technologies: SSIS, MS SQL Server 2008/2012, SQL Profiler, 3E tool; Code Repository: GitHub.

Education

Master of Science - Computers

JNTU Hyderabad
Hyderabad
06-2014

Skills

  • Databricks
  • Hadoop Distributed File System (HDFS)
  • Hive
  • Pig
  • Sqoop
  • MapReduce
  • Spring Boot
  • Flume
  • YARN
  • Cloudera
  • MLlib
  • Azure Data Factory
  • AWS
  • Azure Databricks
  • Azure Data Explorer
  • Azure DevOps
  • Azure HDInsight
  • Jenkins
  • Salesforce
  • AWS QuickSight
  • Tableau
  • Linux
  • BigQuery
  • Ab Initio
  • Bash Shell
  • Snowflake
  • Hadoop
  • Unix
  • Power BI
  • Vertex AI
  • Kubernetes
  • SAS
  • GKE
  • Docker
  • Unity Catalog
  • Dynamics 365
  • PostgreSQL
  • Jira
  • Confluence
  • Netezza
  • Avro
  • Snowpipe
  • Django
  • SnowSQL
  • Parquet
  • RC
  • TPT
  • Crystal Reports
  • Microsoft SQL Server 2005/2019
  • MySQL
  • DB2
  • Flat Files
  • Oracle 11g/10g/9i/8i
  • Microsoft Azure
  • Google Cloud (GCP)
  • Amazon Web Services (AWS)
  • Azure Data Lake
  • Azure Monitoring
  • Active Directory
  • Synapse
  • Key Vault
  • SQL Azure
  • GCP Data Lake
  • Cloud Dataflow
  • Cloud Composer
  • Stackdriver Monitoring
  • IAM (Identity & Access Management)
  • Cloud Functions
  • Cloud SQL
  • Secret Manager
  • Amazon S3
  • Amazon Redshift
  • AWS Glue
  • Step Functions
  • CloudWatch
  • IAM
  • Lambda
  • Amazon RDS
  • Secrets Manager
  • Snowflake Data Cloud
  • Streams & Tasks
  • Time Travel
  • Secure Data Sharing
  • Role-Based Access Control (RBAC)
  • Snowflake Marketplace
  • Virtual Warehouses
  • HBase
  • Cassandra
  • MongoDB
  • All versions of Windows
  • UNIX
  • LINUX
  • MS Office (Word/Excel/PowerPoint/Outlook)
  • Terraform
  • ALB
  • SQL
  • PL/SQL
  • Python
  • HiveQL
  • Scala
  • Java
  • PowerShell

Personal Information

Title: Senior Data Engineer
