Vamshi Krishna Venkataswamy

San Francisco, CA

Summary

Data enthusiast with 8+ years of experience in Data Engineering and Business Intelligence who loves to harness the power of data, translating business problems into solutions through code and statistical and quantitative analysis, and surfacing meaningful insights that drive key business decisions. Working in different positions with different clients has built the ability to identify and translate business requirements: paying attention to the specifics of a dataset, recognizing the business issue, knowing exactly what to draw from the data, and gathering all the information needed for analysis. Quick to grasp domain knowledge in any industry through effective research, learning, and communication.

Overview

10 years of professional experience

Work History

Senior Data Quality Engineer

Salesforce
San Francisco, CA
06.2022 - Current
  • Performed data profiling and analysis of various objects in Salesforce.com (SFDC) and MS Access database tables for an in-depth understanding of source entities, attributes, relationships, domains, source data quality, and hidden and potential data issues.
  • Worked with peers to implement full Agile methodology to manage projects and reduced downtime by 28% in first four months.
  • Led complete lifecycle of visual analytical applications, from designing mock ups and storyboards to developing and deploying a complete production-ready application using Ataccama.
  • Led data operations, applying a thorough understanding of relational databases and data structures and the ability to cleanse, analyze, and connect data from multiple sources.
  • Implemented the Salesforce Bulk API to extract large volumes of data from Salesforce and addressed all performance issues encountered (see the sketch after this list).
  • Managed all functions of the product catalog, including initial upload, periodic updates, and troubleshooting any data quality issues.
  • Performed analysis, design, development, and configuration to establish the Ataccama ONE data management tool and its processes for the organization.
  • Performed data profiling, complex sampling, statistical testing, and reliability testing on data.
  • Identified incomplete data, improved data quality, and integrated data from several data sources.
  • Interacted with users to gather requirements and functional specifications and developed ETL procedures that are consistent across applications and systems.
  • Built expertise in Salesforce-Informatica integration, including Salesforce CRM configuration, customization, and application testing.
  • Designed and developed Informatica ETL Interfaces to load data incrementally from SFDC, MS Access databases and Flat files into Staging schema.
  • Designed and developed Informatica ETL/SCD Interfaces to load data from staging schema into the Customer Reporting dimension tables.
  • Designed and developed Informatica ETL Interfaces to load data from staging schema into the Customer Reporting DW Fact tables.
  • Scheduled ETL jobs for entire DW load process using Tidal Scheduler.
  • Developed Source-Target Map documents to describe relationship between source and target data.
  • Developed ETL Specification Design document containing detailed information on ETL processing, mapping/workflow specifications, exception handling process, staging and data warehouse schemas, etc.
  • Configured connections in Informatica to extract data from the SFDC APIs.
  • Extracted data using Informatica PowerCenter and Data Loader by connecting to the SFDC cloud.
  • Tuned the Synchronization and Replication jobs to reduce the runtimes of the jobs and eliminate data contention in bulk data jobs.
  • Worked on Snowflake modeling; highly proficient in data warehousing techniques for data cleansing, slowly changing dimensions, surrogate key assignment, and change data capture.
  • Consulted on Snowflake data platform solution architecture, design, development, and deployment focused on building a data-driven culture across the enterprise.
  • Developed Talend MDM jobs to populate claims data into the data warehouse (star, snowflake, and hybrid schemas).
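
A minimal sketch of the Bulk API extraction referenced above, assuming the simple-salesforce Python library; the credentials, object, and field names are hypothetical placeholders rather than the actual client configuration.

```python
# Hypothetical sketch: bulk extraction from Salesforce via simple-salesforce.
from simple_salesforce import Salesforce

sf = Salesforce(
    username="user@example.com",      # placeholder credentials
    password="********",
    security_token="********",
)

# The Bulk API runs the query asynchronously on the Salesforce side and
# returns results in batches, avoiding REST query limits on large objects.
records = sf.bulk.Account.query("SELECT Id, Name, Industry FROM Account")

for rec in records:
    # Each record is a plain dict, ready for staging/ETL downstream.
    print(rec["Id"], rec["Name"])
```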

Lead Data Engineer - Data Analytics & Engineering

Collective Health
Chicago, IL
05.2021 - 06.2022
  • Engineered and orchestrated data flows and pipelines in a cloud environment using a progressive tech stack.
  • Organized system operating procedures to strengthen controls.
  • Identified needed business improvements and determined appropriate systems required to implement solutions.
  • Ingested and integrated data from a large number of disparate data sources.
  • Constructed and optimized the data warehouse and data pipelines.
  • Cleansed data toward a normal distribution by applying techniques such as missing-value treatment, outlier treatment, and hypothesis testing.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Created several types of data visualizations using Python and Looker.
  • Installed and configured Apache Airflow for workflow management and created workflows in Python.
  • Developed Python code for task definitions, dependencies, SLA monitoring, and time sensors for each job to automate workflows in Airflow (see the DAG sketch after this list).
  • Implemented data load pipeline algorithms in python and SQL.
  • Used Python and SQL daily to perform transformations that apply business logic.
  • Created test scripts for regression tests, smoke tests and unit tests.
  • Integrated Airflow to perform recurrent ETL batch jobs.
  • Developed ETL pipelines for structured and unstructured data using Pandas, and FeatureTools.
  • Led and implemented a master data management system covering data governance, a data dictionary, and data categorization.
  • Used AWS S3 buckets to store files, ingested files into Databricks using Auto Loader, and ran deltas using data pipelines.
  • Set up scripts for creation of new snapshots and deletion of old snapshots in S3 using S3 CLI tools.
  • Created PySpark Scripts to improve performance of the application.
  • Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy DB2 and SQL Server database systems.
  • Established a single source of truth for various data to increase availability, accessibility, and scalability.
  • Created and Maintained Logical Data Model (LDM) for project. Includes documentation of all Entities, Attributes, Data Relationships, Primary, and Foreign key Structures, Allowed Values, Codes, Business Rules, Glossary Terms, etc.
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Worked closely with the business, development, and quality assurance teams to ensure the application achieved the desired functionality.
  • Skilled at designing and implementing SQL queries using joins (inner joins, outer joins), unions, select within select, order by, group by and aggregate functions to extract data from different data sources.
  • Actively involved in walkthroughs and meetings with Project Team to discuss related business and project issues.
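
A minimal Airflow DAG sketch of the dependency, SLA, and time-sensor pattern described above; the DAG id, schedule, and callables are hypothetical.

```python
# Hypothetical sketch: Airflow DAG with a time sensor, task dependencies, and an SLA.
from datetime import datetime, time, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.time_sensor import TimeSensor


def extract():
    print("extract source data")


def load():
    print("load transformed data")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"sla": timedelta(hours=2)},  # flag task runs that finish late
) as dag:
    # Hold the batch until 06:00 on the scheduled day.
    wait_for_window = TimeSensor(task_id="wait_for_window", target_time=time(6, 0))

    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    wait_for_window >> extract_task >> load_task
```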

ETL Developer

Cisco
Charlotte, NC
01.2019 - 04.2021
  • Served as a key team member for requirements analysis during the design phase and for interaction with business users.
  • Adhered to timelines to meet quality assurance targets.
  • Performed technical analysis, ETL design, development, and deployment on data per business requirements.
  • Developed Talend Big Data jobs to load high volumes of data into the S3 data lake and then into the Snowflake data warehouse.
  • ETL development using EMR/Hive/Spark, Lambda, Scala, DynamoDB Streams, Amazon Kinesis Firehose, Redshift and S3.
  • Developed DDL scripts for around 185 tables in Oracle and developed ETL to bring data into the Oracle environment from Hadoop.
  • Used Spark API over Cloudera Hadoop Yarn to perform analytics on data in Hive.
  • Developed Scala scripts using both Data frames/SQL and RDD/MapReduce in Spark for Data Aggregation, queries and writing data back into OLTP system through Sqoop.
  • Wrote various data normalization jobs for new data ingested into Redshift.
  • Designed the dimensional model, data lake architecture, and Data Vault 2.0 on Snowflake, and used the Snowflake logical data warehouse for compute.
  • Extracted, aggregated, and consolidated Adobe data within AWS Glue using PySpark (see the sketch after this list).
  • Worked with the AWS cloud platform and its features, including EC2, IAM, EBS, CloudWatch, and S3.
  • Deployed application using AWS EC2 standard deployment techniques and worked on AWS infrastructure and automation. Worked on CI/CD environment on deploying application on Docker containers.
  • Used AWS S3 buckets to store files, ingested files into Snowflake tables using Snowpipe, and ran deltas using data pipelines.

  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC, Parquet, and text files) into AWS Redshift.
  • Wrote SQL queries in SQL Server Management Studio to validate data integrity after Extract, Transform, and Load (ETL) processes in the Enterprise Data Warehouse (EDW); knowledge of Informatica PowerCenter Data Quality and exposure to other functionality such as metadata reporting, advanced transformations, and partitioning.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build a common learner data model that gets data from Kafka in near real time and persists it into Cassandra.
  • Implemented the strategy to migrate Netezza-based analytical systems to Snowflake on AWS.
  • Worked with the architect on the final approach and streamlined the Informatica-Snowflake integration.
  • Created various reports with the BI team using Tableau and QlikView based on requirements.
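
A minimal PySpark sketch of the kind of aggregation and consolidation run inside the Glue jobs mentioned above; the S3 paths and column names are hypothetical, and the Glue-specific job/context boilerplate is omitted.

```python
# Hypothetical sketch: consolidate raw Adobe clickstream data before the Redshift load.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("campaign-aggregation").getOrCreate()

# Read raw clickstream exports (Parquet) from a landing zone.
events = spark.read.parquet("s3://example-landing/adobe/clickstream/")

# Consolidate to one row per campaign per day.
daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("campaign_id", "event_date")
    .agg(
        F.countDistinct("visitor_id").alias("unique_visitors"),
        F.count("*").alias("events"),
        F.sum("revenue").alias("revenue"),
    )
)

# Write the consolidated output back to S3 for the Redshift load step.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-curated/adobe/campaign_daily/"
)
```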

ETL Developer

Bank of America
Dallas, TX
01.2017 - 07.2018
  • Designed integration tools to combine data from multiple, varied data sources such as RDBMS, SQL, and big data systems.
  • Designed and created ETL code installations, aiding in transitions from one data warehouse to another.
  • Collaborated with business intelligence staff at customer facilities to produce customized ETL solutions for specific goals.
  • Interpreted data models for conversion into ETL diagrams and code.
  • Performed data extraction, transformation, loading, and integration in data warehouse, operational data stores and master data management.
  • Involved in designing, developing, testing, and documenting an application to combine personal loan, credit card, and mortgage data from different countries and load it from the Hive database into a Sybase database for reporting insights.
  • Developed an architecture to move the project from Ab Initio to PySpark and Scala Spark.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
  • Built scalable distributed data systems using Hadoop.
  • Used Sqoop to load data from HDFS, Hive, MySQL, and many other sources on a daily basis.
  • Used Delta Lake time travel, where data versioning enables rollbacks, full historical audit trails, and reproducible machine learning experiments (see the sketch after this list).
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats to analyze data and uncover insights into customer usage patterns.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool.
  • Used the enterprise data lake to support various use cases, including analytics, storage, and reporting of voluminous, rapidly changing structured and unstructured data.
  • Exported analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Converted data load pipeline algorithms written in Python and SQL to Scala Spark and PySpark.
  • Mentored and supported other team members (both onshore and offshore) to help complete tasks and meet objectives.
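
A minimal sketch of the Delta Lake time-travel reads described above, using the documented versionAsOf/timestampAsOf options; the table path, version number, and timestamp are hypothetical, and the delta-spark package is assumed to be available.

```python
# Hypothetical sketch: Delta Lake time travel for rollback comparison and audit.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-time-travel")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

table_path = "s3://example-bucket/curated/loans"   # placeholder path

# Current state of the table.
current = spark.read.format("delta").load(table_path)

# An earlier version, pinned by version number, for audit or rollback checks.
v0 = spark.read.format("delta").option("versionAsOf", 0).load(table_path)

# The same table as of a point in time.
as_of = (
    spark.read.format("delta")
    .option("timestampAsOf", "2018-01-01")
    .load(table_path)
)
```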

Big Data Developer

DaVita Dialysis Corporation
Nashville, TN
01.2015 - 07.2016
  • Wrote software that scaled to petabytes of data and supported millions of transactions per second.
  • Worked in hybrid environment where legacy and data warehouse applications and new big-data applications co-existed.
  • Partnered with infrastructure engineers and system administrators in designing big-data infrastructures.
  • Engaged with business representatives, business analysts and developers and delivered comprehensive business-facing analytics solutions.
  • Involved in building a scalable, distributed data lake system for the client's real-time and batch analytical needs.
  • Involved in designing, reviewing, optimizing data transformation processes using Apache Storm.
  • Managed jobs using fair scheduling and developed job processing scripts using Control-M workflows.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
  • Tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Performed advanced procedures like text analytics and processing, using in-memory computing capacities of Spark using Scala.
  • Responsible for ingesting data from various source systems (RDBMS, flat files, big data) into Azure Blob Storage using a framework model.
  • Hands on experience using Azure Data Factory (ADF) to perform data ingestion into Azure Data Lake Storage (ADLS).
  • Created Spark clusters and configured high concurrency clusters using Azure Databricks to speed up preparation of high-quality data.
  • Primarily involved in Data Migration process using SQL, Azure SQL, SQL Azure DW, Azure storage and Azure Data Factory (ADF) for Azure Subscribers and Customers.
  • Implemented Custom Azure Data Factory (ADF) pipeline Activities and SCOPE scripts.
  • Primarily responsible for creating new Azure subscriptions, data factories, virtual machines, SQL Azure instances, SQL Azure DW instances, and HDInsight clusters, and for installing DMGs on VMs to connect to on-premises servers.
  • Imported data from Kafka Consumer into HBase using Spark streaming.
  • Experienced in using Zookeeper and Oozie Operational Services for coordinating cluster and scheduling workflows.
  • Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system specific jobs.
  • Handled large datasets during the ingestion process itself using partitioning, Spark in-memory capabilities, broadcasts in Spark, and effective, efficient joins and transformations (see the broadcast-join sketch after this list).
  • Worked on migrating legacy Map Reduce programs into Spark transformations using Spark and Scala.
  • Worked on a POC comparing processing time for Impala with Apache Hive for batch applications, in order to implement the former in the project.
  • Worked extensively with Sqoop for importing metadata from Oracle.
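
A minimal PySpark sketch of the broadcast-join pattern referenced above, which keeps a large ingest from shuffling against a small lookup table; the paths and column names are hypothetical.

```python
# Hypothetical sketch: broadcast join to enrich a large dataset during ingestion.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-example").getOrCreate()

# Large dataset arriving during ingestion.
claims = spark.read.parquet("hdfs:///data/claims/")

# Small lookup table that fits comfortably in executor memory.
facilities = spark.read.parquet("hdfs:///data/facilities/")

# Broadcasting the small side sends it to every executor and avoids
# shuffling the large side across the cluster.
enriched = claims.join(broadcast(facilities), on="facility_id", how="left")

enriched.write.mode("overwrite").parquet("hdfs:///data/claims_enriched/")
```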

Data Analyst

Tech Mahindra
Bangalore, India
05.2013 - 12.2014
  • Created various Excel documents to assist with pulling metrics data and presenting information to stakeholders for concise explanations of best placement for needed resources.
  • Produced monthly reports using advanced Excel spreadsheet functions.
  • Documented effective and replicable methods for extracting data and organizing data sources.
  • Analyzed transactions to build logical business intelligence model for real-time reporting needs.
  • Worked with business intelligence software and various reports to glean insights into trends and prospects.
  • Performed data analysis, data modeling, data migration, and data profiling using complex SQL on various source systems, including Oracle and Teradata.
  • Experienced in building applications based on large datasets in MarkLogic.
  • Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
  • Analyzed data lineage processes to identify vulnerable data points, control gaps, data quality issues, and overall lack of data governance.
  • Worked on data cleansing and standardization using cleanse functions in Informatica MDM.
  • Designed Star and Snowflake Data Models for Enterprise Data Warehouse using ERWIN.
  • Validated and updated the appropriate LDMs to reflect process mappings, screen designs, use cases, the business object model, and the system object model as they evolved and changed.
  • Maintained data model and synchronized it with changes to database.
  • Designed and developed use cases, activity diagrams, and sequence diagrams using UML.
  • Extensively involved in modeling and development of Reporting Data Warehousing System.
  • Designed database tables and created table- and column-level constraints using suggested naming conventions for constraint keys.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Used the ETL tool BO Data Services to extract, transform, and load data into data warehouses from various sources such as relational databases, application systems, temp tables, and flat files.
  • Wrote packages, procedures, functions, and exception handlers using PL/SQL.
  • Reviewed database programming for triggers, exceptions, functions, packages, and procedures.

Skills

    SQL

Quote

Judge a man by his questions rather than his answers.
Voltaire
