
Anandan Kaliyamoorthy

Memphis, TN

Summary

  • Cloud Data Engineer with 8+ years of proven expertise delivering end-to-end data engineering solutions, with an in-depth understanding of the underlying business process models.
  • Holds AWS Solutions Architect Associate and Tableau Desktop Specialist certifications; proficient in Python, R, SQL, and DAX, and in analytical tools such as Power BI, Tableau, AWS, Microsoft Azure, SQL Server Data Tools, RStudio, and Microsoft Excel.
  • Extensive experience in business analysis, requirement gathering, and elicitation for application development projects, including Azure DevOps practice with Agile and Waterfall methodologies; participated in all SDLC phases (Analyze, Plan, Design, Develop, Test, and Deploy).
  • Azure Data Engineer at Wipro Technologies leading ETL data pipeline operations, with hands-on experience in Azure Databricks, Data Factory, Snowflake, and Spark (Scala, Python).
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 and ORC/Parquet/text files into AWS Redshift; built serverless data pipelines with AWS Lambda functions registered in the Glue Catalog and queried from Athena.
  • Expert in ELT/ETL pipeline development, data-fetching optimization, and CI/CD framework deployment.
  • Strong programming skills in Python and Scala, with proficiency in SQL for data warehousing and analysis, including coding procedures, triggers, and packages.
  • Experienced in data architecture and data modeling using Erwin, ER/Studio, and MS Visio; versed in dimensional data modeling (star and snowflake schemas, fact and dimension tables), Lambda architecture, and batch processing.
  • Experienced in Python data manipulation for loading and extraction, using libraries such as NumPy, SciPy, and Pandas for data analysis and numerical computation.
  • Extensive knowledge of data modeling, data conversion, data integration, and data migration; imported and exported data with Sqoop between HDFS and relational database systems (Oracle, DB2, SQL Server).
  • Well versed in normalization, de-normalization, and standardization techniques for optimal performance in relational and dimensional database environments.
  • Solid understanding of AWS (Redshift, S3, EC2) and Apache Spark/Scala processes and concepts; experienced with UNIX/Linux commands, scripting, and deploying applications to servers.
  • Developed SSAS cubes (aggregations, KPIs, measures, partitions, and data mining models), deployed and processed SSAS objects, and created ad hoc and complex-formula reports for business intelligence; built parameterized, chart, graph, linked, dashboard, scorecard, and cascading SSRS reports on SSAS cubes.
  • Responsive, detail-oriented engineer experienced in monitoring database performance, troubleshooting issues, and optimizing database environments; background includes data mining, warehousing, analytics, and machine/deep learning.
  • Adept at collaborating with cross-functional teams and stakeholders; strong written, verbal, and leadership skills; currently pursuing a master's in Data Science and committed to building and optimizing robust data infrastructure.

Overview

8 years of professional experience
1 Certification

Work History

Azure Data Engineer

Chase Bank
01.2023 - Current
  • Performed all phases of software engineering including requirements analysis, application design, and code development & testing
  • Developed and maintained end-to-end operations of ETL data pipelines and worked with large data sets in Azure Data Factory
  • Increased the efficiency of data fetching through query optimization and indexing
  • Wrote SQL including DDL and DML statements, indexes, triggers, views, stored procedures, functions, and packages
  • Designed and Developed ETL Processes in AWS Glue to migrate Campaign data from external sources like S3, ORC/Parquet/Text Files into AWS Redshift
  • Worked on ETL Migration services by developing and deploying AWS Lambda functions for generating a serverless data pipeline which can be written to Glue Catalog and can be queried from Athena
  • Worked on Azure Data Factory to integrate data from both on-premises (MySQL, Cassandra) and cloud (Blob Storage, Azure SQL DB) sources, applying transformations and loading the results into Snowflake
  • Deployed Data Factory pipelines to orchestrate data into the SQL database
  • Worked on Snowflake modeling using data warehousing techniques: data cleansing, Slowly Changing Dimensions, surrogate key assignment, and change data capture
  • Applied an analytical approach to problem-solving, using Azure Data Factory, Data Lake, and Azure Synapse to solve business problems
  • Migrated data from on-premises systems to AWS storage buckets
  • Developed a Python script to transfer data from on-premises systems to AWS S3
  • Developed scripts that call REST APIs and extract data to AWS S3
  • Deployed application using AWS EC2 standard deployment techniques and worked on AWS infrastructure and automation
  • Worked on CI/CD environment on deploying application on Docker containers
  • Worked on Ingesting data by going through cleansing and transformations and leveraging AWS Lambda, AWS Glue and Step Functions
  • Developed ELT/ETL pipelines to move data to and from the Snowflake data store using a combination of Python and Snowflake SnowSQL
  • Developed ETL transformations and validation using Spark SQL/Spark DataFrames with Azure Databricks and Azure Data Factory
  • Worked with Azure Logic Apps administrators to monitor and troubleshoot issues related to process automation and data processing pipelines
  • Developed and optimized code for Azure Functions to extract, transform, and load data from various sources, such as databases, APIs, and file systems
  • Designed, built, and maintained data integration programs in Hadoop and RDBMS environments
  • Developed CI/CD framework for data pipelines using Jenkins tool
  • Collaborated with DevOps engineers to develop automated CI/CD and test-driven development pipelines on Azure per client requirements
  • Hands-on programming experience in scripting languages such as Python and Scala
  • Involved in running all the Hive scripts through Hive on Spark and some through Spark SQL
  • Collaborated on ETL tasks, maintaining data integrity, and verifying pipeline stability
  • Hands-on experience using Kafka and Spark Streaming to process streaming data in specific use cases
  • Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform and analyze data
  • Developed Spark core and Spark SQL scripts using Scala for faster data processing
  • Worked with JIRA for project reporting, creating subtasks for Development, QA, and Partner validation
  • Experience in full breadth of Agile ceremonies, from daily stand-ups to internationally coordinated PI Planning.

Azure Data Engineer

HealthTrust Workforce Solutions
01.2022 - 12.2022
  • Extensive proficiency in working within the AWS cloud platform, with hands-on experience in AWS services like EC2, S3, EMR, Redshift, Lambda, and Glue
  • Proficient in Spark, including expertise in Spark RDD, Data Frame API, Data Set API, Data Source API, Spark SQL, and Spark Streaming
  • Developed Spark applications using Python, implementing Apache Spark data processing projects for handling data from various sources, including RDBMS and streaming platforms
  • Utilized Spark for enhancing performance and optimizing existing algorithms within Hadoop
  • Competent in Spark technologies such as Spark Context, Spark SQL, Spark MLlib, Data Frame, Pair RDD, and Spark YARN
  • Employed Spark Streaming APIs to perform real-time transformations and actions, creating a common data model by ingesting data from Kafka and persisting it to Cassandra
  • Developed a Python-based Kafka consumer API for consuming data from Kafka topics
  • Processed Extensible Markup Language (XML) messages with Kafka and utilized Spark Streaming to capture User Interface (UI) updates
  • Created preprocessing jobs using Spark Data Frames to flatten JSON documents into flat files
  • Loaded DStream data into Spark RDDs and performed in-memory computations to generate output responses
  • Extensive experience in developing live real-time processing and core jobs using Spark Streaming in conjunction with Kafka as a data pipeline system
  • Successfully migrated an existing on-premises application to AWS, making use of AWS services like EC2 and S3 for data processing and storage
  • Proficient in maintaining Hadoop clusters on AWS Elastic MapReduce (EMR)
  • Loaded data into S3 buckets using AWS Glue and PySpark, filtered data stored in S3 buckets using Elasticsearch, and loaded the data into Hive external tables
  • Configured Snowpipe to extract data from S3 buckets and store it in Snowflake's staging area
  • Created numerous ODI interfaces for loading into Snowflake DB
  • Utilized Amazon Redshift to consolidate multiple data warehouses into a single data warehouse
  • Designed columnar families in Cassandra, ingested data from RDBMS, performed data transformations, and exported transformed data to Cassandra as per business requirements
  • Utilized the Spark Cassandra Connector for loading data to and from Cassandra
  • Configured Kafka from scratch, including settings for managers and brokers
  • Developed data models for clients' transactional logs and analyzed data from Cassandra tables using the Cassandra Query Language
  • Conducted cluster performance testing using the Cassandra-stress tool to measure and enhance Read/Writes
  • Employed HiveQL to analyze partitioned and bucketed data, executing Hive queries on Parquet tables stored in Hive to meet business specifications
  • Used Apache Kafka to aggregate web log data from multiple servers and make it available in downstream systems for data analysis and engineering roles
  • Implemented Kafka security measures and enhanced its performance
  • Expertise in working with various data formats such as Avro, Parquet, RCFile, and JSON, including the development of user-defined functions (UDFs) in Hive
  • Developed custom UDFs in Python and utilized them for data sorting and preparation
  • Worked on custom loaders and storage classes in Pig to process diverse data formats like JSON, XML, CSV, and generated bags for further processing using Pig
  • Developed Sqoop and Kafka jobs to load data from RDBMS and external systems into HDFS and Hive
  • Created Oozie coordinators to schedule Hive scripts, effectively establishing data pipelines
  • Authored numerous MapReduce jobs using PySpark and NumPy and implemented Jenkins for continuous integration
  • Conducted cluster testing and monitoring of HDFS, Hive, Pig, and MapReduce to facilitate access for new users
  • Ensured the continuous monitoring and management of the Hadoop cluster through Cloudera Manager.

AWS Data Engineer

EDASSIST
05.2020 - 12.2021
  • Collaborated closely with technical staff, business managers, and practitioners within the business unit to ascertain project requirements and necessary functionality
  • Executed a variety of transformations, including wide and narrow transformations, and actions such as filter, lookup, join, and count on Spark DataFrames
  • Utilized PySpark for working with Parquet and ORC files, as well as Spark Streaming with DataFrames, to meet project needs
  • Developed batch and streaming processing applications using Spark APIs to fulfill functional pipeline requirements
  • Configured AWS Kinesis Data Firehose to automate data storage from streaming sources to AWS data stores such as S3, Redshift, and RDS
  • Leveraged the real-time integration capabilities of AWS Kinesis Data Streams to perform analytics on streamed data
  • Created PySpark code utilizing Spark SQL to generate DataFrames from Avro-formatted raw data and write them to data service layer internal tables in Parquet format
  • Designed workflows using Apache Airflow and Apache Oozie to schedule Hadoop jobs controlling large data transformations
  • Utilized Sqoop for importing/exporting data between HDFS/Hive and relational databases such as Teradata
  • Created and configured EC2 instances on AWS to establish clusters in the cloud
  • Implemented CI/CD solutions using Git, Jenkins, and Docker to set up and configure big data architecture on the AWS cloud platform
  • Analyzed SQL scripts and designed solutions for implementation using PySpark
  • Developed Spark applications in Scala and Python (PySpark), leveraging Spark's in-memory computing for faster data processing
  • Utilized Spark Streaming applications to consume data from Kafka topics and write processed streams to HBase
  • Employed the Spark API over MapReduce for analytics on data in Hive and HBase tables
  • Utilized AWS Lambda to run code without managing servers, triggering execution via S3 and SNS
  • Integrated a Kafka publisher into Spark jobs to capture errors from the Spark application and push them to the database.

Azure Data Engineer

05.2018 - 12.2019
  • Utilized Hadoop Cloudera, Microsoft Azure services (HDInsight Clusters, BLOB, Data Factory, Logic Apps)
  • Conducted data ingestion and processing in Azure services (Data Lake, Storage, SQL, DW) and Azure Databricks
  • Executed ETL, migrating Oracle processes to Azure Synapse Analytics
  • Migrated SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse
  • Controlled and granted database access, migrating databases to Azure Data Lake Store using Azure Data Factory
  • Cloud Integration: Extensive experience integrating Informatica PowerCenter with cloud platforms such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), leveraging services like AWS Glue, Azure Data Factory, and GCP Dataflow
  • Imported data in various formats such as JSON, ORC, and Parquet into the HDFS cluster in compressed form for optimization
  • Conducted data transfer using Azure Synapse and Polybase
  • Developed and optimized Python web applications through Azure DevOps CI/CD
  • Developed enterprise-level solutions with Spark Streaming, Apache Kafka
  • Processed schema-oriented and non-schema-oriented data using Scala and Spark
  • Created partitions and buckets for processing using Hive joins
  • Imported data using Sqoop for regular MySQL to HDFS loading
  • Worked with Data Lakes and big data ecosystems (Hadoop, Spark, Cloudera)
  • Loaded and transformed structured, semi-structured, and unstructured data
  • Developed a data pipeline using Kafka, Spark, and Hive for ingestion and analysis
  • Utilized JIRA for issue and project workflow
  • Worked on Spark using Python (PySpark) and Spark SQL for data processing
  • Used Git for version control.

Data Engineer

Value Labs
06.2016 - 04.2018
  • Developed complex stored procedures, efficient triggers, required functions, created indexes, and indexed views for optimal performance
  • Responsible for developing data pipelines with Amazon AWS to extract the data from weblogs and store in HDFS
  • Demonstrated excellent expertise in monitoring SQL Server performance and implementing performance tuning strategies.

Education

Bachelor of Engineering - Electronics And Communications Engineering

Sri Krishna College Of Technology
Coimbatore, TN

Master of Science - Computer And Information Systems

Christian Brothers University
Memphis, TN

Skills

  • Big Data Technologies: AWS EMR, S3, EC2-Fleet, Spark 2.2/2.0/1.6, Hortonworks HDP, Hadoop, MapReduce, Pig, Hive, Apache Spark, Spark SQL, Informatica PowerCenter 9.6.1/8.x, Kafka, NoSQL, Elastic MapReduce (EMR), Hue, YARN, NiFi, Impala, Sqoop, Solr, Oozie
  • Databases: Cloudera Hadoop CDH 5.x, Hortonworks HDP, Oracle 10g/11g, Teradata, DB2, Microsoft SQL Server, MySQL, NoSQL, SQL databases
  • Platforms (O/S): Red-Hat LINUX, Ubuntu, Windows NT/2000/XP
  • Programming languages: Java, Scala, SQL, UNIX shell script, JDBC, Python, Perl
  • Security Management: Hortonworks Ambari, Cloudera Manager, Apache Knox, XA Secure, Kerberos
  • Web Technologies: DHTML, HTML, XHTML, XML, XSL (XSLT, XPath), XSD, CSS, JavaScript, SOAP, RESTful; Agile, Design Patterns; Data Warehousing: Informatica PowerCenter/PowerMart/Data Quality/Big Data, DBT, Pentaho, ETL Development, Amazon Redshift, IDQ
  • Database Tools: JDBC, HADOOP, Hive, No-SQL, SQL Navigator, SQL Developer, TOAD, SQL
  • Data Modeling: Rational Rose, Erwin 7.3/7.1/4.1/4.0; Code Editors: Eclipse, IntelliJ
  • Data Analysis Tools: Machine Learning, Deep Learning, Data Warehousing, Data Mining, Data Analysis, Big Data, Visualization, Data Munging, Data Modeling
  • ETL development
  • Data Warehousing
  • Data Modeling
  • Data Pipeline Design
  • Data Migration
  • Big Data Processing
  • Spark Framework
  • SQL Expertise
  • Machine Learning
  • Data Governance
  • NoSQL Databases
  • Data Security
  • API Development
  • Hadoop Ecosystem
  • Risk Analysis
  • Data Mining
  • Data repositories
  • Load Balancing
  • XML Web Services
  • RDBMS
  • Data Analysis

Accomplishments

  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics)
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks
  • Designed ETL data flows using SSIS, creating mappings and workflows for extracting data from SQL Server
  • Conducted Data Migration and Transformation from Access/Excel Sheets using SQL Server SSIS
  • Efficiently implemented Dimensional Data Modeling for Data Mart design, identifying Facts and Dimensions
  • Developed fact tables and dimension tables, incorporating Slowly Changing Dimensions (SCD)
  • Implemented robust error and event handling mechanisms, including Precedence Constraints, Break Points, Check Points, and Logging
  • Built Cubes and Dimensions with different Architectures and Data Sources for Business Intelligence
  • Applied thorough knowledge of Features, Structure, Attributes, Hierarchies, and Star/Snowflake Schemas of Data Marts.

Certification

AI For Everyone

Google Data Analytics

Microsoft Azure Fundamentals
