
Sritej Marni

Dallas, TX

Summary

With over 9 years of IT experience in Data Engineering, Data Analysis, and Machine Learning Engineering, I have developed expertise in building data pipelines and ETL scripts using Python, Groovy (Java), SQL, AWS, GCP, Kafka, and Spark. I have worked on batch ETL processes to load data from sources such as MySQL, Mixpanel, REST APIs, and text files into the Snowflake data warehouse, and I have leveraged Databricks for distributed data processing, transformation, validation, and cleaning to ensure data quality and integrity. I have collaborated with analytics and data science teams to support and deploy models using Docker, Python, Flask, and AWS Elastic Beanstalk, and I have built data pipelines using Spark (AWS EMR) to process data from S3 and load it into Snowflake. Proficient in development tools such as Eclipse, PyCharm, PyScripter, Notepad++, and Sublime Text, I have a strong foundation in Object-Oriented Programming and write extensible, reusable, and maintainable code. My work also includes Python libraries such as NumPy, Matplotlib, Beautiful Soup, and Pickle, as well as developing web services, API gateways, and CI/CD pipelines with CodeBuild, CodeDeploy, Git, and CodePipeline. I am skilled in writing efficient Python code and resolving performance issues, and I bring excellent client interaction and presentation skills and proven leadership, with success working both independently and within teams.

Overview

  • 8 years of professional experience
  • 1 Certification

Work History

Lead Data Engineer

ROKU INC
10.2021 - Current
  • Engineered Python-based solution for downloading data from various payment processors
  • Established Python-based framework for daily and monthly file monitoring via Roku Pay platforms
  • Transformed Parquet and AVRO files from AWS S3 and GCP into the Snowflake and Hive data warehouses
  • Configured Snowflake stages to facilitate seamless data migration into the Snowflake cloud
  • Tailored visual reports based on specific requirements using Tableau
  • Designed a data quality framework to ensure data integrity across multiple tables in Snowflake Cloud Data Warehouse
  • Implemented real-time data loading using Snowpipe for partner-supplied files, including those from Hulu and Disney
  • Leveraged PySpark to streamline recommendations data workflows
  • Developed ETL processes with Spark SQL/PySpark DataFrames
  • Implemented file tracking system on GCP platforms using Python
  • Automated missing data report generation to ensure delivery timeliness
  • Implemented data pipelines leveraging PySpark for efficient batch processing on AWS
  • Formulated comprehensive test plans in alignment with business needs
  • Created, reviewed, and updated test scenarios; performed end-to-end, regression, and integration testing
  • Executed integration testing to ensure optimal ETL performance
  • Diagnosed data quality issues to identify underlying problems
  • Transformed data into Parquet and AVRO formats to facilitate integration with Snowflake and Hive data warehouses
  • Exhibited outstanding verbal, written, and interpersonal communication abilities
  • Managed security protocols such as authentication and authorization to protect sensitive information stored in databases
  • Technical environment: Python, AWS EMR, Spark, AWS EC2, AWS S3, AWS Lambda, AWS Data Pipeline, GCP Hive, SQL, Snowflake, DBT, Tableau, NumPy, Pandas, Dask, Scikit-learn, Machine Learning, Flask, HTML, CSS, JavaScript

Data Engineer

CAPITAL ONE
02.2021 - 09.2021
  • Built data pipelines (ELT/ETL scripts) to extract data from sources like Snowflake, AWS S3, and Teradata, transform it, and load it into Salesforce.
  • Utilized Teradata ETL tools such as BTEQ, FastLoad, MultiLoad, and FastExport for data extraction from Teradata.
  • Developed ETL scripts using Python, AWS S3, and cloud storage to migrate data from Teradata to Snowflake via PySpark.
  • Created Spark applications using Spark SQL in Databricks to extract, transform, and load user click and view data.
  • Designed and implemented AWS CloudFormation templates in JSON to build custom VPCs, subnets, and NAT for application deployment.
  • Created new dashboards, reports, scheduled searches, and alerts using Splunk.
  • Leveraged Kafka to build real-time data pipelines between clusters.
  • Developed custom Jenkins jobs/pipelines using Bash scripts and AWS CLI for automated infrastructure provisioning.
  • Created an ETL framework to process revenue files received from partners.
  • Built a machine learning model for predicting user behavior (transactor vs. revolver) and detecting fraud.
  • Optimized data pipelines in Databricks using Spark, reducing data processing time by 30%.
  • Developed AWS Lambda serverless scripts for handling ad-hoc requests.
  • Performed cost optimization, reducing infrastructure expenses.
  • Proficient in Python libraries such as NumPy, Pandas, Dask, Scikit-learn, and ONNX for machine learning tasks.
  • Technical environment: Python, AWS EMR, Spark, AWS EC2, AWS S3, AWS Lambda, AWS Data Pipeline, GCP, AWS CloudWatch, AWS CloudFormation, AWS Glue, AWS Kinesis, Shell scripts, DBT, Jenkins, Splunk, PagerDuty, Snowflake, Scikit-learn.

Data Engineer

Credit Sesame
Mountain View, CA
09.2017 - 01.2021
  • Built data pipelines (ELT/ETL scripts) to extract data from different sources (MySQL, DynamoDB, AWS S3 files), transform it, and load it into the Snowflake data warehouse
  • Added a REST API layer to ML models built with Python and Flask, and deployed the models to the AWS Elastic Beanstalk environment using Docker containers
  • Developed analytical dashboards using Tableau
  • Built aggregate and de-normalized tables populated via ETL to improve Tableau analytics dashboard performance and to help data scientists and analysts speed up ML model training and analysis
  • Developed a user-eligibility library in Python to accommodate partner filters and exclude filtered users from receiving credit products
  • Built data pipelines to aggregate user clickstream session data using the Spark Streaming module, which reads clickstream data from Kinesis streams, stores the aggregated results in S3, and eventually loads them into the Snowflake data warehouse
  • Built data pipelines using PySpark to process data files in S3 and load them into Snowflake
  • Supported and built the infrastructure for Approval Odds, a core module of Credit Sesame, starting with batch ETL, moving to micro-batches, and finally converting to real-time predictions
  • Designed and developed AWS CloudFormation templates to deploy web applications
  • Built data pipelines using Spark (AWS EMR) to process data files in S3 and load them into Snowflake
  • Knowledge of and experience with Python libraries including NumPy, Pandas, Scikit-learn, and ONNX for machine learning
  • Other activities included keeping the data pipelines running, working with product managers, analysts, and data scientists to address their requests, unit testing, load testing, and SQL optimization
  • Technical environment: Java, Groovy, Python, Flask, NumPy, Pandas, SQL, MySQL, Cassandra, AWS EMR, Spark, AWS Kinesis, Snowflake, AWS EC2, AWS S3, AWS Elastic Beanstalk, AWS Lambda, AWS Data Pipeline, AWS CloudFormation, AWS CloudWatch, Docker, Shell scripts, DynamoDB

ETL Developer

Nic InfoTek
Tampa, FL
07.2016 - 08.2017
  • Participated in project planning sessions with management, data architects, external stakeholders, and team members to analyze business requirements and outline the proposed data warehouse solution
  • Collaborated with business analysts and stakeholders to gather and analyze requirements, translating them into technical specifications for ETL processes and data integration
  • Collaborated with a Senior Data Architect to construct a comprehensive data warehouse using a combination of on-premises and cloud-based infrastructures
  • Maintained complex data models for a large-scale data warehouse, ensuring optimal performance and scalability for handling terabytes of data
  • Translated business requests into specific KPI dashboards and reports
  • Provided support to Power BI developers in designing, developing, and maintaining BI solutions
  • Designed and executed a comprehensive system test plan to ensure the accurate implementation of the data solution
  • Technical environment: Python, NumPy, Pandas, SQL, MySQL, Redshift, AWS

Education

Master of Science - Computer Science and Programming

Southern University and A&M College
Baton Rouge, LA
06.2016

Bachelor of Science - Computer Science and Engineering

JNTU Hyderabad
05.2012

Skills

  • Data Integration Expertise
  • Information Security Management
  • Version Control Proficiency
  • Automated Deployment
  • Integration Workflow Optimization
  • API Integration Expertise
  • Risk Management Analysis
  • SQL Data Synchronization
  • Resource Utilization Optimization

Certification

  • GCP Professional Data Engineer

Timeline

Lead Data Engineer

ROKU INC
10.2021 - Current

Data Engineer

CAPITAL ONE
02.2021 - 09.2021

Data Engineer

Credit Sesame
09.2017 - 01.2021

ETL Developer

Nic InfoTek
07.2016 - 08.2017

Master of Science - Computer Science and Programming

Southern University and A&M College

Bachelor of Science - Computer Science and Engineering

JNTU Hyderabad