
Sreeja Katta

Apex, NC

Summary

Engineering Manager with over 10 years of experience leading high-performing data engineering teams and delivering scalable Big Data and Cloud solutions. Expert in building and optimizing data platforms on AWS, Databricks, and Snowflake, with a strong focus on reliability, data quality, and performance in high-availability environments. Manages ETL/ELT systems, drives engineering best practices, and partners with cross-functional business teams to design data models and pipelines that enable analytics, forecasting, and operational decision-making. AWS Solutions Architect and Snowflake SnowPro Core certified, with a track record of mentoring engineers, improving engineering processes, and leading multi-functional teams to troubleshoot complex issues, optimize systems, and accelerate business outcomes.

Overview

13 years of professional experience
1 Certification

Work History

Technical Engineering Manager | Data Engineering & Analytics

LTIMindtree Infotech | Travelers
11.2024 - Current
  • Managed, mentored, and coached a team of Data Engineers, guiding them on pipeline design, performance tuning, and coding best practices.
  • Established engineering best practices for coding standards, data quality checks, testing, version control, and CI/CD workflows using GitHub.
  • Coordinated across multiple Lines of Business (LOBs) to align technical and business priorities, ensuring seamless communication and issue resolution across data engineering and analytics teams.
  • Worked closely with various LOB stakeholders to capture representative sample sets of ETL jobs, which served as test scenarios for validation and sign-off before production deployment.
  • Designed and evolved the enterprise analytics data model to support financial reporting, product insights, inventory operations, and customer behavior analytics.
  • Led and guided technical teams across Talend, Databricks & Snowflake environments to identify, analyze, and resolve issues within established SLAs.
  • Oversaw end-to-end data pipeline delivery, ensuring successful integration of multiple artifacts as part of the lift-and-shift strategy from Talend to Databricks.
  • Used Snowflake Cortex AI and Ataccama extensively to enforce data governance across enterprise data.
  • Managed release planning, testing cycles, and deployment coordination, ensuring code changes were properly reviewed, version-controlled, and merged using GitHub.
  • Partnered with architecture and engineering teams to define data migration and transformation strategies, ensuring scalability, performance, and governance compliance.
  • Established clear communication channels between business and technical teams to proactively identify dependencies, risks, and blockers across projects.
  • Facilitated defect triage and root cause analysis, driving accountability and technical resolution across multi-platform data ecosystems.
  • Created and maintained program-level reports and burndown charts, tracking milestones, deliverables, and release readiness for executive reporting.
  • Architected and maintained scalable ETL/ELT frameworks, ensuring clean, well-tested, and production-ready code.
  • Worked closely with internal leadership and project owners to generate weekly project status reports, highlighting progress, key risks, mitigation plans, and upcoming milestones for executive review.
  • Led the team to ensure the successful migration from Talend to Databricks.
  • Environment: Talend, Databricks, GitHub, Snowflake, Teradata, Oracle, AWS S3, SQL Server, AWS, Git, Jenkins

Application Development Manager

Enact Mortgage, Inc
03.2021 - 11.2024
  • Utilized Agile methodologies and led the team in maintaining a prioritized backlog through regular Agile ceremonies.
  • Developed and implemented text mining algorithms to categorize and classify textual data, enabling automated content tagging and organization.
  • Utilized Snowflake to build the data lake architecture and serve as the warehousing layer for business consumption.
  • Leveraged GenAI capabilities through Snowflake Cortex and Snowflake Copilot to better understand the data residing in Snowflake.
  • Managed, mentored, and coached a team of Data Engineers, guiding them on pipeline design, performance tuning, and coding best practices.
  • Conducted design reviews and code reviews, enforcing engineering standards to improve quality and consistency.
  • Fostered a strong engineering culture based on ownership, continuous learning, and technical excellence.
  • Worked with a team of data scientists to build automated pipelines for better understanding the data using Snowflake's GenAI capabilities.
  • Utilized AWS S3, EC2, Step Functions, and Lambda for orchestrating and maintaining ETL pipelines.
  • Worked closely with the data science team, providing quality data for building successful models in SageMaker.
  • Responsible for building processes for sourcing, processing, contextualizing, and modeling data.
  • Built and led the overall CI/CD effort, ensuring data pipelines were automated and robust enough to handle the end-to-end ETL process.
  • Developed and executed transformation logic using the Snowpark APIs (Python, Scala, or Java) to clean, enrich, and transform data within Snowflake (see the Snowpark sketch after this list).
  • Deployed and managed Docker containers on local machines, AWS services (e.g., Amazon ECS, Amazon EKS), or other environments.
  • Configured and built the data lake using Snowflake capabilities such as Dynamic Tables and Apache Iceberg to automate the data lake process.
  • Utilized Snowpipe and Snowflake's schema inference capabilities, which in turn helped the data science team run their models successfully.
  • Utilized dynamic SQL, Python, and Spark to build automated ETL and CI/CD pipelines.
  • Environment: Talend, Python, Bitbucket, Snowpark, AWS, Git, Jenkins, Chef, Maven, JIRA, Shell Scripts, Oracle, DB2, Snowflake, AWS Jupyter Notebooks, GitHub, GitLab, Snow CLI, Informatica, SAS, VSAM, ActiveBatch
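
A minimal Snowpark (Python) sketch of the clean-enrich-aggregate pattern referenced in the bullets above. The connection parameters, table names (RAW_LOANS, LOAN_SUMMARY_BY_STATE), and column names are illustrative assumptions, not actual project objects.

# Snowpark (Python) sketch: clean, enrich, and aggregate data inside Snowflake.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, trim, upper, sum as sum_

# In practice these values come from a secrets manager, not literals.
connection_parameters = {
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

def run(session: Session) -> None:
    raw = session.table("RAW_LOANS")  # hypothetical source table

    cleaned = (
        raw.filter(col("LOAN_AMOUNT") > 0)                    # drop invalid rows
           .with_column("STATE", upper(trim(col("STATE"))))   # normalize a text field
    )

    # Aggregate to a reporting grain and persist the result as a Snowflake table.
    summary = cleaned.group_by("STATE").agg(
        sum_(col("LOAN_AMOUNT")).alias("TOTAL_LOAN_AMOUNT")
    )
    summary.write.mode("overwrite").save_as_table("LOAN_SUMMARY_BY_STATE")

if __name__ == "__main__":
    session = Session.builder.configs(connection_parameters).create()
    try:
        run(session)
    finally:
        session.close()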

AWS Big Data Engineer/PTE

Vanguard Group, Inc
12.2019 - 02.2021
  • Designed, created, tested, and maintained complete data management and processing systems.
  • Good experience with Snowflake SnowSQL and writing user-defined functions; also participated in the Snowflake implementation.
  • Good knowledge of advanced Snowflake concepts, including resource monitors, RBAC controls, virtual warehouse sizing, query performance tuning, zero-copy cloning, and Time Travel.
  • Utilized the AWS Glue Data Catalog for schema management and to track metadata.
  • Expertise in deploying Snowflake features such as data sharing, events, and lake-house patterns.
  • Good understanding of relational and NoSQL data stores, methods, and approaches (star and snowflake schemas, dimensional modeling).
  • Created and maintained optimal data pipeline architecture.
  • Good experience designing, developing, and implementing solutions in Snowflake.
  • Improved data quality, reliability, and efficiency of individual components and of the complete system.

AWS Big Data Developer/PTE

Vanguard Group Inc, NC
03.2019 - 12.2019
  • Set up AWS Glue Crawlers to automatically discover and catalog data stored in S3, RDS, or other data sources.
  • Configured AWS Glue connections to extract data from various sources.
  • Utilized container orchestration platforms (e.g., Kubernetes on Amazon EKS) to manage the deployment, scaling, and operation of containerized applications.
  • Created Hive tables and partitioned data for better performance; implemented Hive UDFs and performed tuning for better results.
  • Set up automated deployment pipelines for Docker containers using CI/CD tools (e.g., AWS CodePipeline, GitHub Actions) to ensure smooth application updates.
  • Developed MapReduce programs to clean and aggregate the data.
  • Implemented optimized map joins to get data from different sources to perform cleaning operations before applying algorithms.
  • Integrated Docker into CI/CD pipelines to automate testing, building, and deploying containerized applications.
  • Developed Spark scripts to perform data transformations, such as filtering, aggregating, and joining datasets (see the PySpark sketch after this list).
  • Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and monthly batch cycles.
  • Optimized Spark jobs by tuning Spark configurations, partitioning data, and using caching strategies to enhance performance.
  • Built an on-demand, secure EMR launcher with custom spark-submit steps using S3 events, SNS, KMS, and a Lambda function.
  • Migrated an existing on-premises application to AWS and used AWS services like EC2 and S3 for small data sets.
  • Integrated data from various sources into Azure data platforms (e.g., Azure Data Factory, Azure Data Lake, Azure Synapse Analytics).
  • Monitored and analyzed system metrics, query plans, and resource utilization to identify bottlenecks and implement proactive measures for improving data processing efficiency.
  • Developed and maintained infrastructure as code (IaC) using Azure Resource Manager (ARM) templates, Terraform, or other IaC tools.
  • Implemented multi-stage pipelines for continuous integration and continuous deployment (CI/CD).
  • Used version control systems (e.g., Git) to manage code and configurations related to data pipelines, scripts, and DevOps processes.
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by the application.
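
A minimal PySpark sketch of the filter/join/aggregate transformations noted in the Spark bullet above. The S3 paths, dataset names, and columns are hypothetical placeholders.

# PySpark sketch: filter, join, and aggregate datasets, then write partitioned output.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-trades-clean").getOrCreate()

# Hypothetical S3 locations; a real job might read Glue Catalog tables instead.
trades = spark.read.parquet("s3://example-bucket/raw/trades/")
accounts = spark.read.parquet("s3://example-bucket/raw/accounts/")

daily_summary = (
    trades.filter(F.col("trade_amount").isNotNull())       # drop incomplete records
          .join(accounts, on="account_id", how="inner")    # enrich with account attributes
          .groupBy("account_type", "trade_date")
          .agg(
              F.sum("trade_amount").alias("total_amount"),
              F.count("*").alias("trade_count"),
          )
)

# Partition the output by date for downstream query performance.
(daily_summary.write
    .mode("overwrite")
    .partitionBy("trade_date")
    .parquet("s3://example-bucket/curated/trades_daily/"))

spark.stop()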

AWS Big Data Developer

Vanguard Group Inc, NC
08.2018 - 03.2019
  • Created and maintained optimal data pipeline architecture.
  • Ensured optimal ETL of data from a wide variety of data sources using modern technologies.
  • Used Talend for data quality checks and data integration.
  • Responsible for designing, developing, and documenting Talend ETL processes.
  • Participated in technical architecture, implemented data pipelines, and performance scaling using tools to integrate Talend data and ensure data quality in a big data environment.
  • Utilized Talend to perform data requirements analyses, assisting with data flow and data mapping.
  • Maintained and kept data separated and secure across national boundaries through multiple data centers and AWS regions.
  • Optimized DAGs and tasks for performance, ensuring efficient resource utilization and minimizing execution time.
  • Used Airflow operators and hooks to interact with different systems and services (a minimal DAG sketch follows this list).
  • Wrote Chef cookbooks and recipes to automate the deployment process and integrated them into Jenkins jobs for a continuous delivery framework.
  • Integrated Automation scripts (Selenium WebDriver API) on Continuous Integration tool Jenkins for nightly batch run of the script.
  • Knowledge of monitoring and managing Hadoop clusters.
  • Experienced in working with Flume to load the log data from multiple sources directly into HDFS.
  • Experienced in installing, configuring, and managing RDBMS and NoSQL tools such as Elasticsearch, MongoDB, and Cassandra.
  • Focused on high availability, fault tolerance, and auto scaling in CloudFormation; created snapshots and Amazon Machine Images (AMIs) of instances for backup and created clone instances.
  • Environment: Chef, Jenkins, Airflow, Azure DevOps, WebLogic, WebSphere, MongoDB, MySQL, Shell Scripting, Ruby, Python, Selenium, Git, Maven, SSIS, Nginx, VMware ESX
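
A minimal Airflow 2.x DAG sketch illustrating the operator usage mentioned in the bullets above. The DAG id, schedule, and spark-submit command are illustrative assumptions.

# Airflow sketch: a daily batch DAG chaining a Python task and a spark-submit step.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def extract_and_stage(**context):
    # A real task would use a hook (e.g., S3Hook) to pull source files for the run date.
    print("extracting source files for", context["ds"])

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_batch_pipeline",   # hypothetical name
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_and_stage",
        python_callable=extract_and_stage,
    )
    transform = BashOperator(
        task_id="run_spark_transform",
        bash_command="spark-submit /opt/jobs/transform.py {{ ds }}",
    )
    extract >> transform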

AWS DevOps Engineer

BSAPSEC INC
04.2016 - 07.2017
  • Installation, configuration and maintenance of Red Hat, CentOS, SUSE, and Solaris servers at multiple data centers.
  • Assembled large, complex data sets that met functional and non-functional business requirements.
  • Identified, designed, and implemented internal process improvements: automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability.
  • Built the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS 'big data' technologies.
  • Built analytics tools that utilized the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
  • Worked with stakeholders, including the Executive, Product, Data, and Design teams, to assist with data-related technical issues and support their data infrastructure needs.
  • Kept data separated and secure across national boundaries through multiple data centers and AWS regions.
  • Created data tools for analytics and data science team members to assist them in building and optimizing the product.
  • Worked with data and analytics experts to strive for greater functionality in data systems.
  • Performed root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
  • Strong project management and organizational skills.
  • Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, HBase, Java, Python, Oozie, Linux, UNIX, SQL, Snowflake, Jenkins

SQL Developer

TCS- CMC LTD, Chennai, India.
05.2013 - 12.2014
  • Designed, developed, tested, deployed, and supported new software and changes to existing software solutions.
  • Contributed to the design and creation of data architectures and analyzed information from disparate data sources.
  • Collaborated with team members and contributed thoughtful discussion on tactics and strategies.
  • Developed complex workflows and mappings using Informatica ETL and shell scripts to satisfy business requirements.
  • Designed and developed ETL solutions using Informatica that followed in-house ETL development standards, managing program coding for extracting data from existing source systems, transforming it to industry standards, and generating extract files.
  • Created and maintained documentation of the physical and logical data models, data dictionaries, and ETL processes via process flow diagrams.
  • Established testing and quality assurance environments, working with technical teams and systems to ensure that programs and modifications were error-free.
  • Tested, debugged, and performance-tuned ETL bottlenecks; experienced with very large database and data warehouse implementations (20 TB).
  • Designed new solutions for the project; strong understanding of data warehousing concepts, schemas, Slowly Changing Dimensions, and facts and dimensions.
  • Environment: SQL Server Management Studio, Informatica, ETL, Agile, Oracle SQL, Data Analysis, Jira, Confluence, Shell Scripts, Data Modeling

Education

Master's - Computer and Information Sciences

Manchester, NH
05.2016

Bachelor's - Computer Science and Engineering

Vignan University
India
05.2014

Skills

  • Operating Systems: Windows Server 2000, 2003, 2008, and 2012; Ubuntu; HP-UX 10.x/11.x; Mac OS X
  • Web Servers: Apache 1.3.x, Apache 2.0.x, JBoss 4.x, and Nginx
  • Experience with WebSphere application servers
  • Experience with AWS and Azure
  • Automation Tools: Chef, Puppet, Ansible, Docker, Terraform, Kubernetes
  • Virtual Servers: VMware ESX/ESXi Servers, vCenter, vSphere 5.x, Solaris Zones
  • Database Tools: MySQL, SQL, Oracle, NoSQL, DynamoDB, Cassandra
  • Scripting: Python, PowerShell and Bash Shell scripting

Certification

Snowflake SnowPro Core Certified

AWS Solutions Architect Certified

IIPM Certified Scrum Master

Timeline

Technical Engineering Manager | Data Engineering & Analytics

LTIMindtree Infotech | Travelers
11.2024 - Current

Application Development Manager

Enact Mortgage, Inc
03.2021 - 11.2024

AWS Big Data Engineer/PTE

Vanguard Group, Inc
12.2019 - 02.2021

AWS Big Data Developer/PTE

Vanguard Group Inc, NC
03.2019 - 12.2019

AWS Big Data Developer

Vanguard Group Inc, NC
08.2018 - 03.2019

AWS DevOps Engineer

BSAPSEC INC
04.2016 - 07.2017

SQL Developer

TCS- CMC LTD, Chennai, India.
05.2013 - 12.2014

Bachelor's - Computer Science and Engineering

Vignan University

Master's - Computer and Information Sciences

PROFESSIONAL SUMMARY

  • Data Engineering Leader with strong experience in building and managing scalable data pipelines using Spark (Python + Scala), Databricks, and AWS Cloud.
  • Skilled in designing analytical data models, optimizing ETL/ELT workflows, and ensuring high data quality and reliability for analytics and business decision-making.
  • Expertise in Spark SQL, Spark Streaming (Lambda Architecture), performance tuning, debugging distributed compute clusters, and building high-throughput ingestion pipelines.
  • Strong background in data architecture including ingestion design, Hadoop ecosystem components, data modeling, advanced data processing, and machine learning with MLlib and NLP frameworks.
  • Proven experience developing production-grade pipelines on Databricks, feeding Large Language Models (LLMs) using Python/PySpark, and architecting low-latency, high-quality data flows that support analytics, financial modeling, and product insights.
  • Hands-on cloud engineering skills across AWS (EC2, S3, RDS, VPC, IAM, CloudWatch, SQS/SNS, ELB, Auto Scaling) and strong understanding of virtualization, networking, and distributed systems.
  • Experienced in selecting and integrating AWS services to meet application and data requirements.
  • Strong DevOps experience, including CI/CD automation with GitHub Actions, Jenkins, and Azure DevOps; Docker-based pipelines; release management; and monitoring/alerting through CloudWatch and Azure Monitor. Skilled in integrating ADO with JIRA, Slack, and ServiceNow to streamline engineering workflows.
  • Experienced in leading teams, enforcing coding standards and data quality, conducting code reviews, and partnering cross-functionally with Analytics, Product, Marketing, and Operations to deliver trusted analytical datasets and drive business impact.