KISHOR SANCHINA

Allentown, PA, USA

Summary

Seasoned, cloud-savvy Big Data Engineer and Cloud Data Developer with over 13 years of experience building scalable, cloud-native data platforms using Snowflake, Databricks, and Apache Spark. Proven ability to deliver high-performance data pipelines, data lakehouses, and ETL/ELT solutions on both AWS and Azure. Experienced across multiple industries, delivering actionable insights from massive datasets through modern big data ecosystems and enterprise-grade data platforms.

Overview

15 years of professional experience

Work History

Senior Cloud Engineer

Verizon
Allentown, PA
02.2023 - Current
  • Engineered real-time streaming ingestion pipelines using Google Pub/Sub and Cloud Functions to load data into BigQuery with minimal latency (a minimal sketch follows this list).
  • Engineered declarative ELT pipelines employing Google Dataform for streamlined data transformation and schema management in BigQuery.
  • Developed reusable SQL data models and automated job scheduling utilizing Cloud Composer (Airflow) with environment-based deployments.
  • Managed GCS-based data lake and integrated it with BigQuery using federated queries and ingestion workflows.
  • Implemented robust data governance practices with table partitioning, clustering, access control policies, and data retention rules.
  • Developed and maintained ETL jobs using Python and SQL to transform data from GCS to BigQuery.
  • Integrated third-party APIs, flat files, and internal app data into the analytics ecosystem using batch and streaming approaches.
  • Worked with data scientists and analysts to build and support Looker dashboards on top of BigQuery datasets.
  • Developed data pipelines using Azure Data Factory and custom Python scripts to load data into Snowflake.
  • Implemented security policies in Snowflake, including RBAC, object-level permissions, and data masking.
  • Created dynamic dashboards with Power BI, connected to Snowflake using DirectQuery mode.
  • Built custom web-based analytics tools using Python and JavaScript for real-time visualization.
  • Developed monitoring dashboards with Cloudera Manager and Grafana to visualize cluster health metrics.
  • Designed and implemented Elasticsearch clusters, both on-premises and in the cloud.
  • Set up Kibana dashboards for real-time data visualization.
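
As a rough illustration of the Pub/Sub-to-BigQuery ingestion pattern above, a minimal Cloud Functions sketch in Python might look like the following; the project, dataset, table, and field names are hypothetical, not taken from the actual pipelines.

```python
# Hypothetical sketch: a Pub/Sub-triggered Cloud Function that streams
# decoded messages into BigQuery. Table and field names are assumptions.
import base64
import json

from google.cloud import bigquery

client = bigquery.Client()
TABLE_ID = "my-project.analytics.events"  # hypothetical dataset/table


def ingest_event(event, context):
    """Background Cloud Function triggered by a Pub/Sub message."""
    # Pub/Sub delivers the payload base64-encoded under the "data" key.
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))

    row = {
        "event_id": payload.get("id"),
        "event_type": payload.get("type"),
        "event_ts": payload.get("timestamp"),
    }

    # Streaming inserts keep end-to-end latency low; errors are returned
    # per row rather than raised.
    errors = client.insert_rows_json(TABLE_ID, [row])
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```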

Cloud Engineer

Athena
Allentown, PA
03.2021 - 01.2023
  • Designed and maintained highly available EMR clusters on AWS to support real-time and batch processing.
  • Automated infrastructure provisioning using Terraform and Ansible, reducing deployment time by 70%.
  • Led migration of on-prem Hadoop cluster to AWS cloud-based EMR environment.
  • Implemented centralized logging using ELK and integrated monitoring dashboards with Grafana.
  • Supported Spark and Kafka workloads for streaming data pipelines.
  • Built CI/CD pipelines using Jenkins and GitHub Actions for data platform releases.

Big Data Administrator

Reliance Jio
Bangalore, IND
01.2018 - 12.2020
  • Designed and integrated scalable, secure Azure cloud infrastructure solutions.
  • Directed a complex, multi-year initiative to transition legacy on-premise systems to Azure.
  • Designed and maintained Azure Virtual Machines, Blob Storage, Networks, and SQL Databases.
  • Spearheaded the implementation of Azure Security Center, Azure Active Directory (AAD), RBAC, and Azure Firewall for regulatory compliance with GDPR and HIPAA.
  • Automated application deployment and infrastructure provisioning with Terraform, ARM Templates, and Azure Automation.
  • Conducted an analysis of cloud usage using Azure Cost Management and Azure Advisor to identify cost-saving opportunities.
  • Implemented Azure Stack hybrid cloud solutions for seamless on-premises infrastructure and cloud integration.
  • Enhanced Azure SQL Database and Cosmos DB performance through implementing indexing, partitioning, and caching strategies.
  • Implemented strategies ensuring disaster resilience with robust, Azure-based recovery methods.
  • Collaborated across teams to ensure cloud-based systems adhered to security policies and compliance standards.
  • Automated provisioning and configuration management of Kubernetes clusters using Terraform and Helm.
  • Enhanced the efficiency of Kubernetes clusters through resource allocation optimization.
  • Enhanced security through RBAC, network policies, and HashiCorp Vault integration for centralized secrets management, ensuring secure and compliant systems.
  • Led the deployment and management of a large-scale Cloudera Hadoop ecosystem (over 1,500 nodes) to support data ingestion, processing, and analytics across multiple departments.

Big Data Administrator

Etisalat Telecommunication
India
11.2016 - 06.2017
  • Worked on gigabytes of data in a multi-cluster environment of roughly 50+ nodes
  • Kerberized the CDH cluster before installing services such as Ranger, YARN, HDFS, and Impala
  • Developed scripts and batch jobs to schedule various Hadoop programs
  • Installed the latest version (7.1.7) of CDH and migrated data and metadata from HDP to CDH
  • Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions for faster data analysis (see the sketch after this list)
  • Responsible for writing Hive queries for data analysis to meet business requirements
  • Prepared automated scripts to collect metrics from YARN, GC, NameNodes, DataNodes, HDFS, and related services
  • Provided on-call support for daily jobs across clusters through ticketing systems
  • Experienced with data pipelines and turning large volumes of data into business value
  • Strong understanding of and hands-on experience with distributed processing frameworks
  • Created automated job flows using Crontab and MySQL
  • Installed an independent Kafka cluster with ZooKeeper, Yahoo Kafka Manager, Zoonavigator, LinkedIn Kafka Monitor, Prometheus, and Grafana
  • Monitored and performance-tuned Hadoop clusters
  • Performed distributed copies (DistCp) between clusters
  • Created snapshot policies and recovery procedures for node failures
  • Environment: CDP 7.1.7, Hive, Pig, HDFS, Ranger, Python, Solr, MongoDB, MapReduce, GCP, Java, BigQuery, Sqoop, Spark, Scala, REST API, SQL, HCatalog, Oozie, Hue, ORC, JSON, ZooKeeper, Linux CentOS, ServiceNow, MySQL, SQL Server
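
As an illustration of the MapReduce-to-Spark migration bullet above, a minimal PySpark sketch replacing a Hive aggregation script with DataFrame transformations might look like the following; the table and column names are hypothetical examples, not the actual schemas.

```python
# Hypothetical sketch: replacing a Hive aggregation script with Spark
# DataFrame transformations. Table and column names are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("cdr-daily-usage")   # hypothetical job name
    .enableHiveSupport()          # read existing Hive tables
    .getOrCreate()
)

# Equivalent of a Hive GROUP BY query, expressed as DataFrame operations.
usage = (
    spark.table("telecom.call_detail_records")   # hypothetical table
    .filter(F.col("call_date") == "2017-01-01")
    .groupBy("subscriber_id")
    .agg(
        F.sum("duration_sec").alias("total_duration"),
        F.count("*").alias("call_count"),
    )
)

# Write the result back as an ORC table for downstream Hive/Impala access.
usage.write.mode("overwrite").format("orc").saveAsTable("telecom.daily_usage")
```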

BigData Engineer

Hewlett Packard Enterprise
India
12.2015 - 11.2016
  • Worked on petabytes of data in a multi-cluster environment of roughly 100+ nodes
  • Installed the latest version (7.1.7) of CDH and migrated data and metadata from HDP to CDH
  • Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions for faster data analysis
  • Kerberized the CDH cluster before installing services such as Ranger, YARN, HDFS, and Impala
  • Developed scripts and batch jobs to schedule various Hadoop programs
  • Responsible for writing Hive queries for data analysis to meet business requirements
  • Prepared automated scripts to collect metrics from YARN, GC, NameNodes, DataNodes, HDFS, and related services
  • Provided on-call support for daily jobs across clusters through ticketing systems
  • Installed Apache Atlas (2.0.0), Apache Tomcat (10.0.16), Apache Impala (3.0.0), Apache Solr (8.11.1), Apache Ranger (2.1.0), Apache Storm (2.0.0), Anaconda (2020.11), and Python (3.6); additionally installed NoSQL databases such as MongoDB, Cassandra, and PostgreSQL end to end on bare-metal servers as well as in the cloud
  • Installed an independent Kafka cluster with ZooKeeper, Yahoo Kafka Manager, Zoonavigator, LinkedIn Kafka Monitor, Prometheus, and Grafana
  • Monitored and performance-tuned Hadoop clusters
  • Performed distributed copies (DistCp) between clusters
  • Created snapshot policies and recovery procedures for node failures
  • Performed commissioning and decommissioning of nodes and ran the balancer utility
  • Managed and reviewed Hadoop log files
  • Set up Kerberos and troubleshot TGT-related issues
  • Set up SSL/TLS and renewed certificates
  • Installed and configured Hadoop security using Ranger
  • Resolved incidents raised by users through Zendesk within SLA
  • Prepared monthly incident reports and identified recurring issues
  • Automated manual processes that consumed significant time and effort
  • Environment: HDP 3.0 and CDH 7.1.x, Spark, MapReduce, Hive, ZooKeeper, HBase, Flume, Sqoop, Kerberos, Oozie, Ranger, Kafka, GitHub

Data Engineer

FlyTxT
India
02.2015 - 09.2015
  • Extracted and loaded data from relational databases and mainframe flat files into the Hadoop ecosystem using Sqoop
  • Integrated data from disparate sources such as DB2, Oracle, and mainframes into Hadoop using Sqoop and Hive
  • Developed data points for the MDE dashboard using Hadoop-stack technologies
  • Worked on NoSQL databases such as MongoDB, with solid experience in data ingestion
  • Performed ETL and ELT using Pig, Hive, Python, and MapReduce
  • Wrote Python scripts to process data with Hadoop Streaming (a minimal sketch follows this list)
  • Worked with various compression and file formats such as Snappy, Bzip2, Parquet, and Gzip
  • Extensively troubleshot issues in the Hadoop environment and provided effective solutions
  • Built an application to schedule and run jobs using shell scripts, MySQL, and Cron
  • Performed data validation using Pig scripts, HiveQL, and SQL
  • Experienced with data pipelines and turning large volumes of data into business value
  • Strong understanding of and hands-on experience with distributed processing frameworks
  • Created automated job flows using Crontab and MySQL
  • Followed the Scrum agile software development framework and participated in daily scrum meetings
  • Utilized Hadoop, Hive, Pig, HDFS, Python, Solr, MongoDB, MapReduce, GCP, Java, BigQuery, Sqoop, Spark, Scala, REST API, SQL, Teradata, Eclipse, HCatalog, Oozie, Hue, Avro, JSON, ZooKeeper, Linux CentOS, Maven, ServiceNow, MySQL, SQL Assistant
  • Environment: HDP 2.6, Spark, MapReduce, Hive, Pig, ZooKeeper, HBase, Flume, Sqoop, Kerberos, Sentry, CentOS
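
As a rough illustration of the Hadoop Streaming bullet above, a minimal count-by-key mapper and reducer in Python might look like the following; the script name, input layout, and field positions are hypothetical examples.

```python
#!/usr/bin/env python
# Hypothetical sketch of a Hadoop Streaming mapper/reducer pair.
# Example invocation (paths and script name are assumptions):
#   hadoop jar hadoop-streaming.jar \
#     -input /data/raw -output /data/counts \
#     -mapper "python streaming_job.py map" \
#     -reducer "python streaming_job.py reduce" \
#     -file streaming_job.py
import sys


def mapper():
    # Emit "key<TAB>1" for the first field of each comma-separated record.
    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if fields and fields[0]:
            print(f"{fields[0]}\t1")


def reducer():
    # Hadoop Streaming delivers mapper output sorted by key, so counts can
    # be summed with a simple running total per key.
    current_key, total = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0
        total += int(value or 0)
    if current_key is not None:
        print(f"{current_key}\t{total}")


if __name__ == "__main__":
    mapper() if len(sys.argv) > 1 and sys.argv[1] == "map" else reducer()
```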

System Admin

Radien Softcom
India
10.2010 - 02.2015
  • Extracted and loaded data from relational databases and mainframe flat files into the Hadoop ecosystem using Sqoop
  • Developed a data integration framework to get claims information into Hadoop to identify historical trends and patterns
  • Worked with on-shore and off-shore programmers and mentored new team members
  • Wrote shell scripts for data loads and data manipulation
  • Optimized Hadoop MapReduce code and Hive/Pig scripts for better scalability, reliability, and performance
  • Integrated data from multiple sources into Hadoop using Pig, Hive, and Impala
  • Implemented partitioning, dynamic partitions, and buckets in Hive
  • Managed and wrote workflows for MapReduce and Hive programs using Oozie
  • Worked on map-side joins, reduce-side joins, shuffle and sort, distributed cache, compression techniques, and multiple input and output formats
  • Experienced with file formats such as SequenceFile, Avro, Parquet, RC, ORC, and text
  • Communicated with clients to understand specific data extraction requirements
  • Planned extraction, transformation, and loading processes for various data sources
  • Designed new data pipelines and automated the process
  • Worked on copying data from the primary cluster to the secondary cluster using DistCp
  • Designed a shell script that automatically notified users when HDFS storage crossed the threshold limit
  • Provided resolutions for end-user tickets
  • Maintained existing data pipelines
  • Performed data analysis
  • Provided technical guidance to end users
  • Provided mechanisms to optimize client queries and jobs
  • Documented design specifications and technical best practices
  • Mentored team members
  • Environment: HDP 2.5, HDFS, MapReduce, YARN, Spark, Hive, Pig, Flume, Oozie, Sqoop, Ambari

Education

Bachelor of Science - Information Technology

Kakatiya University
Hyderabad
03-2006

Skills

  • Programming and AI Frameworks: Python, TensorFlow, PyTorch, SQL, and Shell Scripting
  • Security & Governance: Kerberos, Ranger, Sentry, TLS/SSL, Active Directory Integration, Data Masking
  • Data Pipelines & Orchestration: Airflow, Luigi, Jenkins, GitHub Actions, Terraform
  • Cloud and Big Data: AWS, GCP, Azure, CDP, HDP
  • Databases: PostgreSQL, MySQL, Teradata, and Oracle
  • Generative AI and LLMs: OpenAI API, LangChain, Retrieval-Augmented Generation (RAG), and fine-tuning LLMs
  • MLOps & Model Deployment: Kubernetes, Docker, MLflow, TensorFlow Serving, and TorchServe
  • Software Development and APIs: FastAPI, Flask, RESTful APIs, Microservices, CI/CD
  • Monitoring: Splunk, Prometheus, Grafana, Elasticsearch

Timeline

Senior Cloud Engineer

Verizon
02.2023 - Current

Cloud Engineer

Athena
03.2021 - 01.2023

Big Data Administrator

Reliance Jio
01.2018 - 12.2020

Big Data Administrator

Etisalat Telecommunication
11.2016 - 06.2017

BigData Engineer

Hewlett Packard Enterprise
12.2015 - 11.2016

Data Engineer

FlyTxT
02.2015 - 09.2015

System Admin

Radien Softcom
10.2010 - 02.2015

Bachelor of Science - Information Technology

Kakatiya University