Seasoned Big Data Engineer and Cloud Data Developer with over 13 years of experience building scalable, cloud-native data platforms using Snowflake, Databricks, and Apache Spark. Proven ability to deliver high-performance data pipelines, data lakehouses, and ETL/ELT solutions on both AWS and Azure. Experienced across multiple industries in turning massive datasets into actionable insights through modern big data ecosystems and enterprise-grade data platforms.
Overview
15 years of professional experience
Work History
Senior Cloud Engineer
Verizon
Allentown, PA
02.2023 - Current
Engineered real-time streaming ingestion pipelines using Google Pub/Sub and Cloud Functions to load data into BigQuery with minimal latency (see the first sketch following this role).
Built declarative ELT pipelines with Google Dataform for streamlined data transformation and schema management in BigQuery.
Developed reusable SQL data models and automated job scheduling using Cloud Composer (Airflow) with environment-based deployments (see the DAG sketch following this role).
Managed GCS-based data lake and integrated it with BigQuery using federated queries and ingestion workflows.
Implemented robust data governance practices with table partitioning, clustering, access control policies, and data retention rules.
Developed and maintained ETL jobs using Python and SQL to transform data from GCS to BigQuery.
Integrated third-party APIs, flat files, and internal app data into the analytics ecosystem using batch and streaming approaches.
Worked with data scientists and analysts to build and support Looker dashboards on top of BigQuery datasets.
Developed data pipelines using Azure Data Factory and custom Python scripts to load data into Snowflake.
Implemented security policies in Snowflake, including RBAC, object-level permissions, and data masking.
Created dynamic Power BI dashboards connected to Snowflake in DirectQuery mode.
Built custom web-based analytics tools using Python and JavaScript for real-time visualization.
Developed monitoring dashboards with Cloudera Manager and Grafana to visualize cluster health metrics.
Designed and implemented Elasticsearch clusters, both on-premises and in the cloud.
Set up Kibana dashboards for real-time data visualization.
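
A minimal sketch of the Pub/Sub-to-BigQuery ingestion pattern referenced above, assuming a first-generation background Cloud Function; the table ID and message fields are hypothetical placeholders, not production values.

import base64
import json

from google.cloud import bigquery

# Hypothetical destination table; real project/dataset/table names differ.
TABLE_ID = "my-project.analytics.events_raw"

bq_client = bigquery.Client()

def ingest_event(event, context):
    """Background Cloud Function triggered by a Pub/Sub message.

    Decodes the payload and streams it into BigQuery via the
    streaming-insert API to keep end-to-end latency low.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    row = {
        "event_id": payload.get("id"),          # assumed message fields
        "event_type": payload.get("type"),
        "occurred_at": payload.get("timestamp"),
    }
    errors = bq_client.insert_rows_json(TABLE_ID, [row])
    if errors:
        # Raising lets Pub/Sub redeliver the message for a retry.
        raise RuntimeError(f"BigQuery insert failed: {errors}")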
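
A minimal sketch of the Cloud Composer (Airflow) scheduling pattern referenced above; the DAG ID, Variable name, and SQL are hypothetical stand-ins for the reusable SQL models and environment-based deployments.

from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

# Hypothetical environment switch: each Composer environment sets its own
# value of the "env" Airflow Variable (e.g. "dev" or "prod").
ENV = Variable.get("env", default_var="dev")

with DAG(
    dag_id="daily_sql_models",
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    build_model = BigQueryInsertJobOperator(
        task_id="build_orders_model",
        configuration={
            "query": {
                # Placeholder model query; the real SQL models were
                # reusable scripts parameterized per environment.
                "query": (
                    f"CREATE OR REPLACE TABLE `{ENV}_mart.orders` AS "
                    f"SELECT * FROM `{ENV}_staging.orders`"
                ),
                "useLegacySql": False,
            }
        },
    )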
Cloud Engineer
Athena
Allentown, PA
03.2021 - 01.2023
Designed and maintained highly available EMR clusters on AWS to support real-time and batch processing.
Automated infrastructure provisioning using Terraform and Ansible, reducing deployment time by 70%.
Led migration of on-prem Hadoop cluster to AWS cloud-based EMR environment.
Implemented centralized logging using ELK and integrated monitoring dashboards with Grafana.
Supported Spark and Kafka workloads for streaming data pipelines (see the sketch following this role).
Built CI/CD pipelines using Jenkins and GitHub Actions for data platform releases.
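
A minimal sketch of the Spark-on-Kafka streaming pattern referenced above, assuming PySpark Structured Streaming; the broker, topic, schema, and S3 paths are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Assumed event schema for the hypothetical "clickstream" topic.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("ts", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "clickstream")
    .load()
    # Kafka delivers the payload as bytes; parse the JSON value.
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3://example-bucket/clickstream/")        # placeholder sink
    .option("checkpointLocation", "s3://example-bucket/chk/")
    .start()
)
query.awaitTermination()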
Big Data Administrator
Reliance Jio
Bangalore, IND
01.2018 - 12.2020
Designed and integrated scalable, secure Azure cloud infrastructure solutions.
Directed a complex, multi-year initiative to transition legacy on-premise systems to Azure.
Designed and maintained Azure Virtual Machines, Blob Storage, Networks, and SQL Databases.
Spearheaded the implementation of Azure Security Center, Azure Active Directory (AAD), RBAC, and Azure Firewall for regulatory compliance with GDPR and HIPAA.
Automated application deployment and infrastructure provisioning with Terraform, ARM Templates, and Azure Automation.
Analyzed cloud usage with Azure Cost Management and Azure Advisor to identify cost-saving opportunities.
Implemented Azure Stack hybrid cloud solutions for seamless on-premises infrastructure and cloud integration.
Enhanced Azure SQL Database and Cosmos DB performance through implementing indexing, partitioning, and caching strategies.
Implemented strategies ensuring disaster resilience with robust, Azure-based recovery methods.
Collaborated cross-functionally to ensure cloud-based systems' adherence to security policies and compliance standards.
Automated provisioning and configuration management of Kubernetes clusters using Terraform and Helm.
Enhanced the efficiency of Kubernetes clusters through resource allocation optimization.
Enhanced security through RBAC, network policies, and HashiCorp Vault integration for centralized secrets management (see the Vault sketch following this role).
Led the deployment and management of a large-scale Cloudera Hadoop ecosystem (over 1,500 nodes) to support data ingestion, processing, and analytics across multiple departments.
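
A minimal sketch of the centralized secrets pattern referenced above, using the hvac client for HashiCorp Vault; the Vault address and secret path are hypothetical, and production workloads would typically authenticate via Kubernetes auth rather than a raw token.

import os

import hvac

client = hvac.Client(
    url=os.environ.get("VAULT_ADDR", "https://vault.example.com:8200"),
    token=os.environ["VAULT_TOKEN"],  # placeholder auth for illustration
)

# Read a secret from the KV v2 engine (mounted at the default "secret/").
resp = client.secrets.kv.v2.read_secret_version(path="data-platform/db")
db_password = resp["data"]["data"]["password"]  # KV v2 nests data twice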
Big Data Administrator
Etisalat Telecommunication
India
11.2016 - 06.2017
Worked in multi-cluster environments of 50 to 100+ nodes, on data volumes ranging from gigabytes to petabytes.
Kerberized the CDH cluster before installing services such as Ranger, YARN, HDFS, and Impala.
Developed scripts and batch jobs to schedule various Hadoop programs.
Installed CDH 7.1.7 and migrated data and metadata from HDP to CDH.
Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions for faster data analysis.
Wrote Hive queries for data analysis to meet business requirements.
Prepared automated scripts to collect metrics from YARN, GC, NameNodes, DataNodes, HDFS, and related services (see the sketch following this role).
Provided on-call support for daily jobs across clusters through ticketing systems.
Built data pipelines turning large volumes of data into business value, with strong working knowledge of distributed processing frameworks.
Created automated job flows using crontab and MySQL.
Installed an independent Kafka cluster with ZooKeeper, Yahoo Kafka Manager, Zoonavigator, LinkedIn Kafka Monitor, Prometheus, and Grafana.
Monitored and performance-tuned Hadoop clusters.
Performed distributed copies (DistCp) between clusters.
Created snapshot policies and recovered from node failures.
Installed Apache Atlas (2.0.0), Apache Tomcat (10.0.16), Apache Impala (3.0.0), Apache Solr (8.11.1), Apache Ranger (2.1.0), Apache Storm (2.0.0), Anaconda (2020.11), and Python (3.6) packages; additionally installed NoSQL databases such as MongoDB, Cassandra, and PostgreSQL end to end on bare-metal servers as well as in the cloud.
Commissioned and decommissioned nodes and ran the HDFS balancer utility.
Managed and reviewed Hadoop log files.
Set up Kerberos and troubleshot TGT-related issues.
Set up SSL/TLS and renewed certificates.
Installed and configured Hadoop security using Apache Ranger.
Resolved incidents raised through Zendesk and by users within SLA.
Prepared monthly incident reports and identified recurring issues.
Automated time-consuming manual processes.
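
A minimal sketch of the metrics-collection scripts referenced above, polling the YARN ResourceManager REST API; the ResourceManager host is a hypothetical placeholder.

import json
import time

import requests

RM_METRICS_URL = "http://resourcemanager.example.com:8088/ws/v1/cluster/metrics"

def collect_yarn_metrics():
    """Poll the YARN ResourceManager REST API for cluster health metrics."""
    resp = requests.get(RM_METRICS_URL, timeout=10)
    resp.raise_for_status()
    metrics = resp.json()["clusterMetrics"]
    # Emit a compact record suitable for shipping to a time-series store.
    return {
        "ts": int(time.time()),
        "apps_running": metrics["appsRunning"],
        "memory_available_mb": metrics["availableMB"],
        "active_nodes": metrics["activeNodes"],
        "unhealthy_nodes": metrics["unhealthyNodes"],
    }

if __name__ == "__main__":
    print(json.dumps(collect_yarn_metrics()))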