
Sritej Marni

Dallas, TX

Summary

With over 9 years of IT experience in Data Engineering, Data Analysis, and Machine Learning Engineering, I have developed expertise in building data pipelines and ETL scripts using Python, Groovy (Java), SQL, AWS, GCP, Kafka, and Spark. I have worked on batch ETL processes to load data from sources such as MySQL, Mixpanel, REST APIs, and text files into the Snowflake data warehouse, and I have leveraged Databricks for distributed data processing, transformation, validation, and cleaning to ensure data quality and integrity. I have collaborated with analytics and data science teams to support and deploy models using Docker, Python, Flask, and AWS Elastic Beanstalk, and I have built data pipelines using Spark (AWS EMR) to process data from S3 and load it into Snowflake. Proficient in development tools such as Eclipse, PyCharm, PyScripter, Notepad++, and Sublime Text, I have a strong foundation in Object-Oriented Programming and write extensible, reusable, and maintainable code. My work also includes Python libraries such as NumPy, Matplotlib, Beautiful Soup, and Pickle, as well as developing web services, API gateways, and CI/CD pipelines with CodeBuild, CodeDeploy, Git, and CodePipeline. I am skilled in writing efficient Python code and resolving performance issues, and I bring excellent client interaction and presentation skills and proven leadership, with success working both independently and within teams.

Overview

  • 8 years of professional experience
  • 1 Certification

Work History

Lead Data Engineer

ROKU INC
10.2021 - Current
  • Engineered Python-based solution for downloading data from various payment processors
  • Established Python-based framework for daily and monthly file monitoring via Roku Pay platforms
  • Transformed Parquet and AVRO files from AWS S3 and GCP into the Snowflake and Hive data warehouses
  • Configured Snowflake stages to facilitate seamless data migration into the Snowflake cloud
  • Tailored visual reports based on specific requirements using Tableau
  • Designed a data quality framework to ensure data integrity across multiple tables in Snowflake Cloud Data Warehouse
  • Implemented real-time data loading using Snowpipe for partner-supplied files, including those from Hulu and Disney
  • Leveraged PySpark to streamline recommendations data workflows
  • Developed ETL processes with Spark SQL/PySpark DataFrames
  • Implemented file tracking system on GCP platforms using Python
  • Automated missing data report generation to ensure delivery timeliness
  • Implemented data pipelines leveraging PySpark for efficient batch processing on AWS
  • Formulated comprehensive test plans in alignment with business needs
  • Created, reviewed, and updated test scenarios; performed end-to-end, regression, and integration testing
  • Executed integration testing to ensure optimal ETL performance
  • Diagnosed data quality issues to identify underlying problems
  • Transformed data into Parquet and AVRO formats to facilitate integration with Snowflake and Hive data warehouses
  • Exhibited outstanding verbal, written, and interpersonal communication abilities
  • Managed security protocols such as authentication and authorization to protect sensitive information stored in databases
  • Technical environment: Python, AWS EMR, Spark, AWS EC2, AWS S3, AWS Lambda, AWS Data Pipeline, GCP Hive, SQL, Snowflake, DBT, Tableau, NumPy, Pandas, Dask, Scikit-learn, Machine Learning, Flask, HTML, CSS, JavaScript

Data Engineer

CAPITAL ONE
02.2021 - 09.2021
  • Built data pipelines (ELT/ETL scripts) to extract data from sources like Snowflake, AWS S3, and Teradata, transform it, and load it into Salesforce.
  • Utilized Teradata ETL tools such as BTEQ, FastLoad, MultiLoad, and FastExport for data extraction from Teradata.
  • Developed ETL scripts using Python, AWS S3, and cloud storage to migrate data from Teradata to Snowflake via PySpark.
  • Created Spark applications using Spark SQL in Databricks to extract, transform, and load user click and view data.
  • Designed and implemented AWS CloudFormation templates in JSON to build custom VPCs, subnets, and NAT for application deployment.
  • Created new dashboards, reports, scheduled searches, and alerts using Splunk.
  • Leveraged Kafka to build real-time data pipelines between clusters.
  • Developed custom Jenkins jobs/pipelines using Bash scripts and AWS CLI for automated infrastructure provisioning.
  • Created an ETL framework to process revenue files received from partners.
  • Built a machine learning model for predicting user behavior (transactor vs. revolver) and detecting fraud.
  • Optimized data pipelines in Databricks using Spark, reducing data processing time by 30%.
  • Developed AWS Lambda serverless scripts for handling ad-hoc requests.
  • Performed cost optimization, reducing infrastructure expenses.
  • Proficient in Python libraries such as NumPy, Pandas, Dask, Scikit-learn, and ONNX for machine learning tasks.
  • Technical environment: Python, AWS EMR, Spark, AWS EC2, AWS S3, AWS Lambda, AWS Data Pipeline, GCP, AWS CloudWatch, AWS CloudFormation, AWS Glue, AWS Kinesis, Shell scripts, DBT, Jenkins, Splunk, PagerDuty, Snowflake, Scikit-learn.

Data Engineer

Credit Sesame
Mountain View, CA
09.2017 - 01.2021
  • Built data pipelines (ELT/ETL scripts) to extract data from different sources (MySQL, DynamoDB, AWS S3 files), transform it, and load it into the Snowflake data warehouse
  • Added a REST API layer to ML models built with Python and Flask, and deployed the models to the AWS Elastic Beanstalk environment using Docker containers
  • Developed analytical dashboards using Tableau
  • Built aggregate and de-normalized tables populated via ETL to improve Tableau analytics dashboard performance and to help data scientists and analysts speed up ML model training and analysis
  • Developed a user-eligibility library in Python to accommodate partner filters and exclude filtered users from receiving credit products
  • Built data pipelines to aggregate user clickstream session data using the Spark Streaming module, which reads clickstream data from Kinesis streams, stores the aggregated results in S3, and eventually loads them into the Snowflake data warehouse
  • Built data pipelines using PySpark to process data files in S3 and load them into Snowflake
  • Supported and built the infrastructure for Approval Odds, a core module of Credit Sesame, starting with batch ETL, moving to micro-batches, and finally converting to real-time predictions
  • Designed and developed AWS CloudFormation templates to deploy web applications
  • Built data pipelines using Spark (AWS EMR) to process data files in S3 and load them into Snowflake
  • Knowledge of and experience with Python libraries including NumPy, Pandas, Scikit-learn, and ONNX for machine learning
  • Other activities included keeping the data pipelines running, working with product managers, analysts, and data scientists to address their requests, unit testing, load testing, and SQL optimization
  • Technical environment: Java, Groovy, Python, Flask, NumPy, Pandas, SQL, MySQL, Cassandra, AWS EMR, Spark, AWS Kinesis, Snowflake, AWS EC2, AWS S3, AWS Elastic Beanstalk, AWS Lambda, AWS Data Pipeline, AWS CloudFormation, AWS CloudWatch, Docker, Shell scripts, DynamoDB

ETL Developer

Nic InfoTek
Tampa, FL
07.2016 - 08.2017
  • Participated in project planning sessions with management, data architects, external stakeholders, and team members to analyze business requirements and outline the proposed data warehouse solution
  • Collaborated with business analysts and stakeholders to gather and analyze requirements, translating them into technical specifications for ETL processes and data integration
  • Collaborated with a Senior Data Architect to construct a comprehensive data warehouse using a combination of on-premises and cloud-based infrastructures
  • Maintained complex data models for a large-scale data warehouse, ensuring optimal performance and scalability for handling terabytes of data
  • Translated business requests into specific KPI dashboards and reports
  • Provided support to Power BI developers in designing, developing, and maintaining BI solutions
  • Designed and executed a comprehensive system test plan to ensure the accurate implementation of the data solution
  • Technical environment: Python, NumPy, Pandas, SQL, MySQL, Redshift, AWS

Education

Master of Science - Computer Science and Programming

Southern University and A&M College
Baton Rouge, LA
06.2016

Bachelor of Science - Computer Science and Engineering

JNTU Hyderabad
05.2012

Skills

  • Data Integration Expertise
  • Information Security Management
  • Version Control Proficiency
  • Automated Deployment
  • Integration Workflow Optimization
  • API Integration Expertise
  • Risk Management Analysis
  • SQL Data Synchronization
  • Resource Utilization Optimization

Certification

  • GCP Professional Data Engineer

Timeline

Lead Data Engineer

ROKU INC
10.2021 - Current

Data Engineer

CAPITAL ONE
02.2021 - 09.2021

Data Engineer

Credit Sesame
09.2017 - 01.2021

ETL Developer

Nic InfoTek
07.2016 - 08.2017

Master of Science - Computer Science and Programming

Southern University and A&M College

Bachelor of Science - Computer Science and Engineering

JNTU Hyderabad