Swathi M

Aurora, US

Summary

Results-driven Data Engineer with extensive experience in designing, implementing, and optimizing data solutions using AWS and Snowflake. Adept at architecting scalable data pipelines, developing ETL processes, and ensuring robust data governance and security. Proven track record of leveraging AWS services like Redshift, S3, and Glue to support large-scale data operations and drive insightful business decisions. Expertise in Snowflake's data warehousing capabilities, including data modeling, performance tuning, and cost optimization. Strong analytical skills combined with a collaborative approach to problem-solving and a commitment to delivering high-quality, data-driven solutions. Passionate about staying current with emerging technologies and industry best practices to continuously improve data engineering processes.

Overview

6 years of professional experience

Work History

AWS/Snowflake Data Engineer

Office Depot Inc
02.2023 - Current
  • Developed and implemented scalable and efficient data pipelines using AWS services such as S3, Glue, Kinesis, and Lambda
  • Worked with data scientists and business stakeholders to understand their requirements and design data solutions that meet their needs
  • Designed and implemented data models and data warehousing solutions using AWS services such as Redshift and Athena
  • Developed and implemented data processing solutions using AWS Lambda and Apache NiFi
  • Designed and implemented data governance policies and data quality frameworks
  • Developed and implemented data security solutions using AWS services such as IAM, KMS, and S3 bucket policies
  • Worked with AWS databases such as RDS, DynamoDB, and Aurora, and implemented solutions for data replication and synchronization
  • Designed and implemented data archiving and backup solutions using AWS services such as S3 and Glacier
  • Developed and implemented data visualization solutions using Amazon QuickSight or third-party tools such as Tableau and Power BI
  • Implemented real-time data processing solutions using AWS Kinesis and AWS Lambda (a minimal handler sketch follows this list)
  • Developed and maintained data processing workflows using Apache Airflow and AWS Glue (see the orchestration sketch after this list)
  • Worked with AWS machine learning services such as SageMaker and Comprehend
  • Optimized database performance and managed ETL processes
  • Managed AWS infrastructure and resources using AWS CloudFormation or Terraform
  • Worked with DevOps teams to implement CI/CD pipelines for data solutions
  • Applied AWS cost optimization strategies and implemented cost-saving measures
  • Worked with AWS VPCs, subnets, security groups, and load balancers
  • Implemented and managed AWS networking components, including AWS Direct Connect, VPN, and Route 53
  • Performed Hive test queries on local sample files and HDFS files
  • Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing
  • Analyzed the Hadoop cluster and worked with Big Data analytics tools including Pig, Hive, HBase, Spark, and Sqoop
  • Generated graphical capacity planning reports using Python packages such as NumPy and Matplotlib
  • Analyzed generated logs and forecast the next occurrence of events using various Python libraries
  • Gained hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and Big Data modeling techniques using Python
  • Built ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake
  • Designed and implemented ETL pipelines using Talend for data extraction, transformation, and loading into Snowflake
  • Developed Snowflake schemas and optimized data models for efficient querying and reporting
  • Tuned performance by optimizing SQL queries, managing clustering keys, and utilizing materialized views
  • Automated data ingestion with Snowpipe and Python scripts, reducing manual intervention and increasing data freshness (illustrated in the Snowflake ingestion sketch after this list)
  • Ensured data security and compliance with role-based access control and data masking
  • Centralized customer data, providing a unified source of truth and improving data accessibility
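
For context on the real-time processing bullet above, the following is a minimal sketch of an AWS Lambda handler for a Kinesis trigger: records arrive base64-encoded inside the event envelope and are decoded before downstream processing. The assumption of JSON payloads and the print-based processing step are illustrative, not the production code from this role.

```python
import base64
import json


def handler(event, context):
    """Process records delivered by a Kinesis trigger."""
    processed = 0
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the event envelope.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        # Placeholder for the real transformation/load step used in the pipeline.
        print(f"partition_key={record['kinesis']['partitionKey']} body={message}")
        processed += 1
    return {"records_processed": processed}
```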
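
The Airflow and Glue orchestration bullet can be sketched as a TaskFlow-style DAG (assuming Airflow 2.x and boto3) that starts a Glue job run and reports its state; the job name "sales_to_redshift", the region, and the schedule are hypothetical placeholders.

```python
from datetime import datetime

import boto3
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2023, 2, 1), catchup=False)
def glue_refresh():
    """Start a (hypothetical) Glue job and report its current run state."""

    @task
    def start_glue_job() -> str:
        glue = boto3.client("glue", region_name="us-east-1")
        # start_job_run returns a JobRunId used to track this run.
        return glue.start_job_run(JobName="sales_to_redshift")["JobRunId"]

    @task
    def report_state(run_id: str) -> None:
        glue = boto3.client("glue", region_name="us-east-1")
        run = glue.get_job_run(JobName="sales_to_redshift", RunId=run_id)
        # A production DAG would poll until completion; here we only log the state.
        print("Glue run state:", run["JobRun"]["JobRunState"])

    report_state(start_glue_job())


glue_refresh()
```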
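
As a sketch of the Snowpipe/SnowSQL ingestion work described above (assuming the snowflake-connector-python package; the account, credentials, stage, and table names are placeholders), a COPY INTO statement loads staged S3 files into a raw table; Snowpipe automates the same statement whenever new files land on the stage.

```python
import snowflake.connector

# Account, credentials, stage, and table names below are placeholders; in the
# real pipeline they would come from configuration or a secrets manager.
conn = snowflake.connector.connect(
    account="xy12345",
    user="ETL_USER",
    password="***",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)

try:
    cur = conn.cursor()
    # Load newly staged files from an external S3 stage into a raw table.
    # Snowpipe automates the same COPY INTO on arrival of new files.
    cur.execute(
        """
        COPY INTO RAW.ORDERS
        FROM @RAW.S3_ORDERS_STAGE
        FILE_FORMAT = (TYPE = 'PARQUET')
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """
    )
    for row in cur.fetchall():
        print(row)  # one result row per staged file: name, status, rows loaded
finally:
    conn.close()
```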

Data Engineer

CVS Pharmacy
03.2022 - 02.2023
  • Responsible for Business Analysis and Requirements Collection.
  • Involved in SDLC requirements gathering, analysis, design, development, and testing of the application using Agile methodology.
  • Created linked services for multiple source systems (e.g., Azure SQL Server, ADLS, Blob Storage, REST API).
  • Created pipelines to extract data from on-premises source systems to Azure cloud data lake storage. Extensively worked on copy activities and implemented copy behaviors such as flatten hierarchy, preserve hierarchy, and merge hierarchy. Implemented error handling through copy activity.
  • Configured Logic Apps to send email notifications to end users and key stakeholders via web activities.
  • Created dynamic pipelines to handle multiple sources extracting to multiple targets, and extensively used Azure Key Vault to configure the connections in linked services.
  • Developed and optimized PySpark and Spark SQL scripts within Azure Synapse notebooks to perform complex data transformations aligned with business requirements (see the PySpark sketch after this list).
  • Implemented advanced performance tuning techniques on Spark applications within Synapse, resulting in an improvement in processing efficiency compared to original job runtimes.
  • Engineered intricate data queries utilizing PySpark and SparkSQL in Azure Synapse Spark pools to meet diverse business needs.
  • Appointed as the primary point of contact for production support, overseeing the deployment and operation of a new data framework leveraging Azure Data Factory and Azure Databricks.
  • Led the migration of on-premises SQL Server data to Azure Data Lake Storage Gen2 (ADLS Gen2) using Azure Data Factory (ADF V2) and Azure Synapse Pipelines.
  • Extracted data from various sources to analyze the shelf life of products and reported findings to customers to help improve their business.
  • Analyzed store representatives' work patterns over specific time periods by creating Power BI reports.
  • Collaborated closely with cross-functional teams to define test data.
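
The PySpark/Spark SQL transformation bullets above can be illustrated with a minimal sketch; the ADLS Gen2 paths, column names, and the shelf-life aggregation are assumptions standing in for the actual datasets. In a Synapse notebook the `spark` session is pre-created; `getOrCreate()` keeps the sketch runnable elsewhere.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# In a Synapse notebook the `spark` session already exists; getOrCreate()
# keeps this sketch runnable outside the notebook as well.
spark = SparkSession.builder.getOrCreate()

# Hypothetical ADLS Gen2 path and columns standing in for the real datasets.
raw = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/products/")

# Standardize dates, derive days on shelf, and aggregate per product.
shelf_life = (
    raw.withColumn("received_date", F.to_date("received_date"))
       .withColumn("sold_date", F.to_date("sold_date"))
       .withColumn("days_on_shelf", F.datediff("sold_date", "received_date"))
       .groupBy("product_id")
       .agg(
           F.avg("days_on_shelf").alias("avg_days_on_shelf"),
           F.count("*").alias("units_sold"),
       )
)

shelf_life.write.mode("overwrite").parquet(
    "abfss://curated@examplelake.dfs.core.windows.net/shelf_life/"
)
```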


Data Analyst

Thrymr Software
12.2018 - 01.2021
  • Interacted with legacy business users to understand business process flow and business logic, and to assess major data objects, data volume, and the level of effort required for data conversion and migration.
  • Applied Statistical methods with Excel, SQL, R, and Python to analyze large datasets to identify trends and patterns.
  • Worked on Informatica Power Center tools, such as Repository Manager, Workflow Manager, and Workflow Monitor.
  • Utilized RUP to create use cases, activity, class diagrams, and workflow process diagrams.
  • Verified the correlation between the UML diagrams and developed detailed diagrams.
  • Validated the system's end-to-end testing to meet the approved functional requirements.
  • Blueprinted technical and operational training manuals and led PowerPoint/Visio presentations for the development of financial and technical applications for the departmental user base and for audit purposes.
  • Responsible for providing fiscal and economic solutions through financial modeling and cost basis analysis.
  • Increased process efficiencies and reduced development cycle time across the department by spearheading the implementation of the full lifecycle software development methodology.
  • Documented and updated business requirements in close coordination with data stewards and key stakeholders regarding the ongoing approach and the mapping of objects and data elements across systems.
  • Wrote multiple SQL and PL/SQL scripts, stored procedures, etc., to process and standardize the data, and conducted checks across different databases to determine referential integrity and consistency of data (a sample check appears after this list).
  • Developed a framework to extract and load data from various legacy sources into the staging area, and created scripts and setup to extract, parse, cleanse, and massage the data to standardize it.
  • Created the data structure and staging area for loading the legacy data and interlinking all segment data for in-depth analysis and a closer look.
  • Conducted post-migration checks and acceptance testing to validate the quality, accuracy, and consistency of the data in the target database.
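
As a small illustration of the cross-database consistency checks mentioned above (assuming pyodbc and a configured DSN; the table and column names are hypothetical), an orphaned-foreign-key query flags rows that lost their parent during migration.

```python
import pyodbc

# DSN, table, and column names are hypothetical; the real checks ran against
# the legacy source and target databases described above.
conn = pyodbc.connect("DSN=legacy_staging;UID=etl;PWD=***")
cur = conn.cursor()

# Orphaned-key check: order rows whose customer_id has no parent in the
# customer table indicate a referential-integrity break after migration.
cur.execute(
    """
    SELECT o.customer_id, COUNT(*) AS orphan_rows
    FROM stg_orders o
    LEFT JOIN stg_customers c ON c.customer_id = o.customer_id
    WHERE c.customer_id IS NULL
    GROUP BY o.customer_id
    """
)

orphans = cur.fetchall()
if orphans:
    print(f"{len(orphans)} customer keys have no matching parent row:")
    for customer_id, orphan_rows in orphans:
        print(f"  {customer_id}: {orphan_rows} rows")
else:
    print("Referential integrity check passed.")

conn.close()
```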

Education

Master of Science - Computer And Information Sciences

Texas A&M University - Kingsville
Kingsville, TX
05.2022

Skills

TECHNICAL SKILLS

Databases: AWS RDS, Aurora, MongoDB, MySQL, Teradata, Oracle 10g, DB2, SQL Server

Cloud/SaaS: AWS, GCP, Azure, Snowflake, SnowSQL, Redshift, Snowpipe

Programming Languages: Python, R, SQL, HTML, JavaScript, PySpark, Scala
Methodology: SDLC, Agile, Waterfall
Version Control: GitHub
Other Tools: MS Office, OLAP, OLTP
DW/BI Tools: Informatica PowerCenter 10.4, SAS DI, AWS Glue, Tableau, Power BI
Environment: Unix, Linux, Windows
