
Sritej Marni

Dallas, TX

Summary

  • 12 years of IT experience spanning Data Engineering, Data Analysis, and Machine Learning Engineering.
  • Expertise in building data pipelines and ETL processes with tools such as Python, Groovy (Java), SQL, AWS, GCP, Kafka, and Spark.
  • Developed batch ETL pipelines to extract data from sources such as MySQL, Mixpanel, REST APIs, and text files into Snowflake.
  • Skilled in distributed data processing using Databricks for data transformation, validation, and cleaning, ensuring high data quality and consistency.
  • Experienced with Spark (AWS EMR) to process data from S3 and load it into Snowflake.
  • Worked closely with analytics and data science teams to support model deployment using Docker, Python, Flask, and AWS Elastic Beanstalk.
  • Contributed to end-to-end CI/CD pipelines with tools such as CodeBuild, CodeDeploy, Git, and CodePipeline.
  • Developed and managed API gateways and web services for seamless data integration.
  • Strong foundation in object-oriented programming (OOP), writing extensible, reusable, and maintainable code.
  • Hands-on experience with IDEs and development tools, including Eclipse, PyCharm, PyScripter, Notepad++, and Sublime Text.
  • Proficient with Python libraries such as NumPy, Matplotlib, Beautiful Soup, and Pickle for data manipulation and visualization.
  • Expertise in writing efficient Python code and resolving performance bottlenecks; implemented optimized data processing pipelines to meet performance SLAs and reduce latency.
  • Demonstrated ability to lead teams and work independently to deliver complex projects.
  • Strong client interaction and presentation skills, effectively bridging technical and business communication gaps.
  • Proven success in delivering solutions on time and driving collaborative teamwork.
  • Experience working in cloud-native environments (AWS and GCP) and using Kafka for real-time data streaming.
  • Skilled in data modeling, governance, and ensuring data integrity across distributed systems.
  • Adept at agile development methodologies, ensuring smooth project delivery and iteration.

Overview

12 years of professional experience

Work History

Senior Data Engineer

ROKU INC
10.2021 - Current
  • Engineered Python-based solution for downloading data from various payment processors
  • Established Python-based framework for daily and monthly file monitoring via Roku Pay platforms
  • Transformed Parquet and AVRO files from AWS S3 and GCP into Snowflake and Hive data warehouses
  • Configured stages to facilitate seamless data migration into Snowflake cloud
  • Tailored visual reports based on specific requirements using Tableau
  • Designed a data quality framework to ensure data integrity across multiple tables in Snowflake Cloud Data Warehouse
  • Implemented real-time data loading using Snowpipe for partner-supplied files, including those from Hulu and Disney
  • Leveraged PySpark to streamline recommendations data workflows
  • Developed ETL processes with Spark SQL/PySpark DataFrames (see the sketch after this list)
  • Implemented file tracking system on GCP platforms using Python
  • Automated missing data report generation to ensure delivery timeliness
  • Implemented data pipelines leveraging PySpark for efficient batch processing on AWS
  • Formulated comprehensive test plans in alignment with business needs
  • Created, reviewed, and updated test scenarios; performed end-to-end, regression, and integration testing
  • Executed integration testing to ensure optimal ETL performance
  • Diagnosed data quality issues to identify underlying problems
  • Transformed data into Parquet and AVRO formats to facilitate integration with Snowflake and Hive data warehouses
  • Exhibited outstanding verbal, written, and interpersonal communication abilities
  • Managed security protocols such as authentication and authorization to protect sensitive information stored in databases
  • Technical environment: Python, AWS EMR, Spark, AWS EC2, AWS S3, AWS Lambda, AWS Data Pipeline, GCP, Hive, SQL, Snowflake, DBT, Tableau, NumPy, Pandas, Dask, Scikit-learn, Machine Learning, Flask, HTML, CSS, JavaScript
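
A minimal PySpark sketch of the S3 Parquet-to-Snowflake batch flow described above. The bucket, table, and Snowflake connection options are hypothetical placeholders, and it assumes the Snowflake Spark connector is available on the cluster; it is an illustration, not the production job.

    # Read Parquet from S3, apply basic cleanup, and write to Snowflake.
    # All names and credentials below are illustrative placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("s3_parquet_to_snowflake").getOrCreate()

    # Read daily Parquet files landed in S3 (hypothetical bucket/prefix).
    df = spark.read.parquet("s3://example-bucket/payments/dt=2021-10-01/")

    # Basic validation: drop duplicates and null transaction ids, add a load timestamp.
    clean = (
        df.dropDuplicates(["transaction_id"])
          .filter(F.col("transaction_id").isNotNull())
          .withColumn("load_ts", F.current_timestamp())
    )

    # Write to Snowflake via the Spark connector (connection options are placeholders).
    sf_options = {
        "sfURL": "example.snowflakecomputing.com",
        "sfDatabase": "ANALYTICS",
        "sfSchema": "PAYMENTS",
        "sfWarehouse": "ETL_WH",
        "sfRole": "ETL_ROLE",
        "sfUser": "etl_user",
        "sfPassword": "********",
    }
    (
        clean.write.format("net.snowflake.spark.snowflake")
             .options(**sf_options)
             .option("dbtable", "PAYMENT_TRANSACTIONS")
             .mode("append")
             .save()
    )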

Data Engineer

CAPITAL ONE
02.2021 - 09.2021
  • Built data pipelines (ELT/ETL scripts) to extract data from sources like Snowflake, AWS S3, and Teradata, transform it, and load it into Salesforce
  • Utilized Teradata ETL tools such as BTEQ, FastLoad, MultiLoad, and FastExport for data extraction from Teradata
  • Developed ETL scripts using Python, AWS S3, and cloud storage to migrate data from Teradata to Snowflake via PySpark
  • Created Spark applications using Spark SQL in Databricks to extract, transform, and load user click and view data
  • Designed and implemented AWS CloudFormation templates in JSON to build custom VPCs, subnets, and NAT for application deployment
  • Created new dashboards, reports, scheduled searches, and alerts using Splunk
  • Leveraged Kafka to build real-time data pipelines between clusters (see the sketch after this list)
  • Developed custom Jenkins jobs/pipelines using Bash scripts and AWS CLI for automated infrastructure provisioning
  • Created an ETL framework to process revenue files received from partners
  • Built a machine learning model for predicting user behavior (transactor vs. revolver) and detecting fraud
  • Optimized data pipelines in Databricks using Spark, reducing data processing time by 30%
  • Developed AWS Lambda serverless scripts for handling ad-hoc requests
  • Performed cost optimization, reducing infrastructure expenses
  • Proficient in Python libraries such as NumPy, Pandas, Dask, Scikit-learn, and ONNX for machine learning tasks
  • Technical environment: Python, AWS EMR, Spark, AWS EC2, AWS S3, AWS Lambda, AWS Data Pipeline, GCP, AWS CloudWatch, AWS CloudFormation, AWS Glue, AWS Kinesis, Shell scripts, DBT, Jenkins
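
A hedged sketch of the kind of Kafka relay referenced above, using the kafka-python client. Topic names, broker addresses, and the enrichment step are illustrative assumptions rather than the actual pipeline.

    # Relay records from a source cluster topic to a target cluster topic.
    # Brokers, topics, and fields below are hypothetical placeholders.
    import json
    from kafka import KafkaConsumer, KafkaProducer  # kafka-python

    consumer = KafkaConsumer(
        "transactions-raw",                          # hypothetical source topic
        bootstrap_servers=["source-cluster:9092"],
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
        group_id="relay-group",
    )
    producer = KafkaProducer(
        bootstrap_servers=["target-cluster:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    for message in consumer:
        record = message.value
        record["relayed"] = True                     # placeholder enrichment
        producer.send("transactions-clean", record)  # hypothetical target topic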

Data Engineer

Credit Sesame
09.2017 - 01.2021
  • Built data pipelines (ELT/ETL scripts) to extract data from different sources (MySQL, DynamoDB, AWS S3 files), transform it, and load it into the Snowflake data warehouse
  • Added a REST API layer to the ML models using Python and Flask, and deployed the models to the AWS Elastic Beanstalk environment using Docker containers (see the sketch after this list)
  • Developed analytical dashboards using Tableau
  • Built aggregate and de-normalized tables, populating them via ETL to improve Tableau analytics dashboard performance and to help data scientists and analysts speed up ML model training and analysis
  • Developed a user-eligibility library in Python to accommodate partner filters and exclude those users from receiving credit products
  • Built data pipelines to aggregate user clickstream session data using the Spark Streaming module, which reads clickstream data from Kinesis streams, stores the aggregated results in S3, and eventually loads them into the Snowflake data warehouse
  • Built data pipelines using PySpark to process data files in S3 and load them into Snowflake
  • Supported and built the infrastructure for Approval Odds, a core module of Credit Sesame; started with batch ETL, moved to micro-batches, and then converted to real-time predictions
  • Designed and developed AWS CloudFormation templates to deploy web applications
  • Built data pipelines using Spark (AWS EMR) to process data files in S3 and load them into Snowflake
  • Knowledge and experience using Python libraries NumPy, Pandas, Scikit-learn, and ONNX for machine learning
  • Other activities included keeping the data pipelines active, working with product managers, analysts, and data scientists to address their requests, unit testing, load testing, and SQL optimization
  • Technical environment: Java, Groovy, Python, Flask, NumPy, Pandas, SQL, MySQL, Cassandra, AWS EMR, Spark, AWS Kinesis, Snowflake, AWS EC2, AWS S3, AWS Beanstalk, AWS Lambda, AWS Data Pipeline, AWS CloudFormation, AWS CloudWatch, Docker, Shell scripts, DynamoDB
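
A minimal sketch of a Flask REST layer over a pickled scikit-learn model, in the spirit of the model-serving work above. The model file, feature names, and route are hypothetical; the actual deployment ran in Docker containers on AWS Elastic Beanstalk.

    # Serve predictions from a serialized model over a simple REST endpoint.
    # Model path, payload fields, and response shape are illustrative assumptions.
    import pickle
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    with open("model.pkl", "rb") as f:     # hypothetical serialized classifier
        model = pickle.load(f)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(force=True)
        features = [[payload["credit_score"], payload["utilization"], payload["income"]]]
        score = model.predict_proba(features)[0][1]
        return jsonify({"approval_odds": round(float(score), 4)})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)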

ETL Developer

Ingram Micro
07.2012 - 08.2017
  • Participated in project planning sessions with management, data architects, external stakeholders, and team members to analyze business requirements and outline the proposed data warehouse solution
  • Collaborated with business analysts and stakeholders to gather and analyze requirements, translating them into technical specifications for ETL processes and data integration
  • Collaborated with a Senior Data Architect to construct a comprehensive data warehouse using a combination of on-premises and cloud-based infrastructures
  • Maintained complex data models for a large-scale data warehouse, ensuring optimal performance and scalability for handling terabytes of data
  • Translated business requests into specific KPI dashboards and reports
  • Provided support to Power BI developers in designing, developing, and maintaining BI solutions
  • Designed and executed a comprehensive system test plan to ensure the accurate implementation of the data solution
  • Designed and implemented end-to-end ETL processes, extracting data from diverse sources, transforming it to meet business requirements, and loading it into cloud-based data warehouses like Snowflake
  • Developed and optimized scalable ETL pipelines, ensuring seamless data integration across multiple platforms using tools such as Informatica, Talend, SSIS, and Pentaho
  • Enhanced data quality and integrity through profiling, validation, and automated error-handling mechanisms, ensuring consistency and accuracy in the data pipeline
  • Automated workflows using schedulers like Apache Airflow and cloud-native orchestration tools to streamline operations and reduce manual intervention (see the sketch after this list)
  • Collaborated with data science teams to prepare data pipelines for AI/ML models, providing ready-to-use datasets for advanced analytics and machine learning projects
  • Integrated real-time data streaming using platforms like Apache Kafka alongside batch ETL processes to support time-sensitive reporting needs
  • Developed CI/CD pipelines for ETL solutions using tools like Git, Jenkins, and CodePipeline, ensuring smooth deployments and version control
  • Utilized SQL and Python expertise to write efficient queries, optimize data transformations, and automate routine tasks for enhanced productivity
  • Monitored and maintained ETL processes, identifying and resolving issues through real-time logging and troubleshooting to ensure uninterrupted data flow
  • Worked with cloud platforms (AWS, GCP, Azure) to design cloud-native pipelines, leveraging their storage and compute capabilities for efficient data handling
  • Collaborated with cross-functional teams, including business stakeholders, to translate requirements into actionable ETL workflows while ensuring high performance and scalability
  • Stayed updated with emerging trends, including real-time data processing, automation, and cloud-native ETL tools, to continuously enhance ETL operations
  • Technical environment: Python, NumPy, Pandas, SQL, MySQL, Redshift, AWS
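
A hedged sketch of a daily extract-transform-load DAG in the style of the Apache Airflow automation mentioned above, written with Airflow 2.x syntax for readability; task names and callables are placeholders rather than the actual workflows.

    # Daily ETL orchestration: extract -> transform -> load.
    # All task bodies are placeholders for the real pipeline steps.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("extract from source systems")

    def transform():
        print("apply business rules")

    def load():
        print("load into the warehouse")

    with DAG(
        dag_id="daily_etl",
        start_date=datetime(2017, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        t_extract >> t_transform >> t_load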

Education

Bachelor of Science - Computer Science and Engineering

JNTU Hyderabad
Telangana
05-2012

Skills

  • Languages: Python, Groovy (Java), SQL, Shell Scripting, JavaScript, HTML/CSS
  • Frameworks and Libraries: PySpark, Flask, NumPy, Pandas, Dask, Scikit-learn, ONNX, Matplotlib, Beautiful Soup
  • ETL Tools: Informatica, Talend, Pentaho, SSIS
  • Data Processing: Spark, Databricks, AWS Glue, Apache Kafka, Airflow, Snowpipe
  • Data Warehousing and Databases: Snowflake, MySQL, Hive, Cassandra, DynamoDB, Redshift, Teradata
  • AWS Services: EMR, S3, Lambda, EC2, Beanstalk, CloudFormation, Kinesis, CodePipeline, CloudWatch
  • GCP Services: Cloud Storage, Parquet/AVRO processing, Dataflow, Cloud Functions
  • CI/CD Tools: Jenkins, Git, CodeBuild, CodeDeploy
  • IDEs: Eclipse, PyCharm, PyScripter, Notepad++, Sublime Text
  • Version Control: Git, CodePipeline
  • Soft Skills and Methodologies: Agile development and project management, strong client interaction, presentation skills, and teamwork abilities

Timeline

Senior Data Engineer

ROKU INC
10.2021 - Current

Data Engineer

CAPITAL ONE
02.2021 - 09.2021

Data Engineer

Credit Sesame
09.2017 - 01.2021

ETL Developer

Ingram Micro
07.2012 - 08.2017

Bachelor of Science - Computer Science and Engineering

JNTU Hyderabad