Sai Ganesh Vasam

Trenton, NJ

Summary

  • Overall 10+ years of professional IT experience, including 3+ years of AWS experience with S3, EC2, EMR, and Redshift, and 7+ years as an AWS Data Engineer in the ingestion, storage, querying, processing, and analysis of big data.
  • Hands-on experience in architecting and implementing Hadoop clusters on Amazon Web Services (AWS) using EMR, EC2, S3, Redshift, Cassandra, DynamoDB, PostgreSQL, and SQL.
  • Experience in code/build/deployment tools like Git, SVN, Maven, SBT, Jenkins.
  • Set up Hadoop clusters on AWS, including configuring the different Hadoop components.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Designed a data analysis pipeline in Python, using Amazon Web Services such as S3, EC2 and Elastic MapReduce.
  • Maintained Hadoop Cluster on AWS EMR. Used AWS services like EC2 and S3 for small data sets processing and storage.
  • Involved in designing Kafka for multi data center cluster and monitoring it.
  • Strong knowledge of and experience with Amazon AWS services such as EMR and EC2, which provide fast, efficient processing of big data.
  • Worked on a migration POC to move data from existing on-premises Hive to Snowflake.
  • Experience in administration activities of RDBMS databases such as MS SQL Server.
  • Debugged and resolved EMR cluster job failure issues.
  • Analyzed large data sets by running Hive queries.

Overview

11 years of professional experience

Work History

AWS Data Engineer

Anthem
04.2023 - Current
  • Worked on end-to-end creation of a Bitbucket metrics dashboard capturing metrics across the portfolio.
  • Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
  • Extracted data from the Bitbucket API using Python.
  • Performed rigorous testing and made code changes based on the needs of the final dashboard.
  • Developed AWS Glue jobs and integrated the code to run and create the final file in desired AWS S3 location.
  • Designed a data analysis pipeline in Python, using Amazon Web Services such as S3, EC2 and Elastic MapReduce.
  • Developed and designed a system to collect data from multiple portals using Kafka and then process it using Spark.
  • Implemented a near-real-time system with Kafka and ZooKeeper.
  • Demonstrated expertise in utilizing SBT as the primary build tool for Scala projects.
  • Developed and implemented custom tasks within SBT to automate specific project workflows, enhancing overall productivity.
  • Successfully integrated and configured SBT plugins to extend the functionality of the build tool according to project needs.
  • Worked closely with Tableau team to reflect the data in the Dashboard.
  • Created and executed AWS Step Functions to load the data from Hive tables.
  • Developed and deployed outcomes using Spark and Scala code on a Hadoop cluster running on AWS.
  • Wrote Spark programs in Python for data quality checks.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Experience with AWS cloud services such as S3, EC2, EMR, Lambda, Glue, Redshift, and Athena, as well as Microsoft Azure.
  • Maintained Hadoop Cluster on AWS EMR. Used AWS services like EC2 and S3 for small data sets processing and storage.
  • Loaded monthly and ad-hoc client reports and dashboards for health-related claims data using Apache Spark, Scala, AWS EMR, and AWS Lambda.
  • The framework is also used for data replication from sources such as RDS and S3 to the target file system (S3).
  • Debugged and resolved EMR cluster job failure issues.
  • Also used Step Functions to create EMR clusters and execute jobs on them.
  • Analyzed large data sets by running Hive queries.
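The data quality checks mentioned above could look something like the following minimal sketch. This is an illustrative assumption, not the actual project code: all field names and rules are hypothetical, and in the real pipeline this logic would run inside a PySpark job rather than plain Python.

```python
# Hypothetical sketch of a row-level data quality check on claims-style
# records; field names and validation rules are illustrative assumptions.
from datetime import datetime

REQUIRED_FIELDS = ["claim_id", "member_id", "service_date"]

def check_row(row: dict) -> list:
    """Return a list of quality issues found in one record."""
    issues = []
    for field in REQUIRED_FIELDS:
        if not row.get(field):
            issues.append(f"missing {field}")
    service_date = row.get("service_date")
    if service_date:
        try:
            # Expect ISO-style dates; flag anything else.
            datetime.strptime(service_date, "%Y-%m-%d")
        except ValueError:
            issues.append("bad service_date format")
    return issues

def quality_report(rows) -> dict:
    """Summarize good vs. bad records across a batch."""
    bad = {i: issues for i, r in enumerate(rows) if (issues := check_row(r))}
    return {"total": len(rows), "bad": len(bad), "issues": bad}
```

In a Spark job the same per-row function would typically be applied via a UDF or a `filter`/`flatMap` over the dataset.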

AWS Cloud Engineer

Tech Mahindra
03.2018 - 02.2023
  • Enhanced data quality by refining ETL processes using Python and AWS Glue, reducing data inaccuracies by 15%.
  • Managed 15 AWS accounts, including the setup and configuration of EC2 instances, RDS databases, VPCs, Elastic Load Balancers (ELBs), CloudFront distributions, and Route 53 hosted zones and health checks.
  • Developed 8 automation tools that streamlined infrastructure deployment processes using Bash scripting.
  • Collaborated with the software development team to deploy software updates more frequently due to streamlined AWS processes.
  • Developed a real-time analytics solution using AWS Kinesis and AWS Lambda, streamlining decision-making processes.
  • Defined information models supporting data assets for complex data structures represented through relational and hierarchical databases.
  • Participated in system development life cycle from requirements analysis through system implementation.
  • Optimized cloud resource usage through continuous monitoring and cost analysis.
  • Designed and implemented a high-throughput data pipeline processing 10TB of data daily using Amazon EMR, reducing processing time by 30%.
  • Decreased data loading times by 40% by optimizing Amazon Redshift configurations.
  • Established robust encrypted data pipelines with AWS KMS, enhancing data security.
  • Accelerated data extraction from various sources using Amazon EMR, saving 15 hours per week.
  • Scaled data processing capabilities by 3X by efficiently partitioning data in Amazon S3.
  • Led a cross-functional team of 7 to migrate on-premise databases to AWS Redshift, improving query performance by 2X.
  • Initiated a data governance strategy using AWS Lake Formation, enhancing data security and compliance.
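The S3 partitioning described above is commonly done with Hive-style date prefixes. A minimal sketch of that idea follows; the prefix scheme, field names, and file names are assumptions for illustration, and the actual uploads would go through boto3 or a Spark writer.

```python
# Hypothetical sketch of building Hive-style date partition prefixes for
# S3, as in the partitioning work described above. Names are assumptions.
from datetime import date

def partition_key(prefix: str, event_date: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 object key, e.g.
    claims/year=2023/month=04/day=01/part-0.parquet"""
    return (f"{prefix}/year={event_date.year}"
            f"/month={event_date.month:02d}"
            f"/day={event_date.day:02d}/{filename}")
```

Partitioning keys this way lets engines like Athena, EMR, and Redshift Spectrum prune irrelevant prefixes at query time, which is where the throughput gains come from.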


Data Engineer

NBC
08.2016 - 12.2017


  • Created a Spark Streaming application to consume real-time data from Kafka sources and applied real-time data analysis models that can be updated on new data as it arrives in the stream.
  • Used CI/CD tools for continuous delivery with minimal to no support from production support team.
  • Worked on importing and transforming large sets of structured, semi-structured, and unstructured data.
  • Involved in converting Hive/SQL queries into Spark transformations and actions using Spark SQL (RDDs and DataFrames) in Python.
  • Stored and retrieved data from data-warehouses using Amazon Redshift.
  • Created various hive external tables, staging tables and joined the tables as per the requirement.
  • Worked on data processing, transformations, and actions in Spark using Python (PySpark).
  • Implemented Spark SQL queries with Python for faster testing and processing of data.
  • Involved in creating Hive tables, and loading and analyzing data using hive queries.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Developed multiple MapReduce jobs in python for data cleaning and preprocessing.
  • Used partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries.
  • Developed applications using Eclipse and used Maven as the build and deployment tool.
  • Hands on experience in AWS Cloud in various AWS services such as EC2, EMR, S3, and RDS.
  • Spun up different AWS instances, including EC2-Classic and EC2-VPC, using CloudEndure.
  • Worked in Agile environment in delivering the agreed user stories within the sprint time.
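The map/reduce-style cleaning jobs mentioned above can be sketched in plain Python, shown here without the Hadoop streaming harness. This is a hypothetical illustration; the record-normalization rules are assumptions, not the original job logic.

```python
# Hypothetical sketch of a map/reduce-style cleaning and counting job,
# as described above, using plain Python in place of Hadoop streaming.
from functools import reduce

def map_clean(line: str):
    """Map step: normalize whitespace/case and drop empty records."""
    value = line.strip().lower()
    return [(value, 1)] if value else []

def reduce_counts(acc: dict, pair) -> dict:
    """Reduce step: aggregate counts per cleaned record."""
    key, n = pair
    acc[key] = acc.get(key, 0) + n
    return acc

def run(lines) -> dict:
    """Chain the map and reduce steps over an iterable of raw lines."""
    mapped = [pair for line in lines for pair in map_clean(line)]
    return reduce(reduce_counts, mapped, {})
```

In a real Hadoop streaming job, `map_clean` and `reduce_counts` would be separate scripts reading stdin and writing stdout, with the framework handling the shuffle between them.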

AWS Engineer

Daffodil
08.2013 - 06.2015
  • Worked as part of the DevOps systems team to deliver releases to the production environment in a timely manner.
  • Responsible for Continuous Integration (CI) and Continuous Delivery (CD) process implementation using Jenkins along with shell scripts to automate routine jobs.
  • Installed, configured, and administered Jenkins continuous integration tools.
  • Proposed, implemented, and maintained new branching strategies for development teams to support trunk and development baseline code along with several feature branches.
  • Managed environments DEV, SIT, QA, UAT and PROD for various releases and designed instance strategies.
  • Implemented CI/CD pipelines using AWS, which will spin up on demand instances as required.
  • Configured AWS EC2 Instances using AMI’s and launched instances with requirements of specific applications.
  • Created the automated build and deployment process for applications, re-engineered the setup for a better user experience, and led the build-out of a continuous integration system for all products.
  • Managed the build server and set up and monitored daily continuous builds and deployments to the Development, Test, and Production environments.
  • Gathered build specifications, wrote build/deployment instructions, and instructed teams in the operation of continuous integration and automated deployment tools.
  • Developed build and deployment scripts using Maven as the build tool in Jenkins to promote code between environments, and created new jobs and branches through Jenkins.

Education

Master of Science - Computer Science

SFBU
Fremont, CA
08.2016

Bachelor of Science - Computer Science

Mallareddy Engineering College
Hyderabad, IN
06.2013

Skills

  • Cloud Platforms: AWS, Microsoft Azure, Google Cloud Platform (GCP)
  • Operating Systems: Linux, Unix, Windows Server
  • AWS Cloud Services: S3, EC2, EMR, Redshift, DynamoDB, Lambda, Glue
  • Big Data Tools: Apache Hadoop, Hive, Spark, Kafka, Pig, Zookeeper, YARN
  • Database Management: MySQL, Oracle, SQL Server, PostgreSQL
  • DevOps & CI/CD Tools: Docker, Jenkins, Git, Kubernetes, Ansible, Terraform, Puppet, Chef
  • Monitoring & Logging: CloudWatch, Prometheus, ELK Stack (Elasticsearch, Logstash, Kibana)
  • Cloud Security: IAM, Cognito, Security Groups, Network ACLs
  • Development IDEs: NetBeans, Eclipse IDE, IntelliJ
  • NoSQL Databases: HBase, Cassandra and MongoDB
