
Sritej Marni

Dallas, TX

Summary

  • 12 years of IT experience spanning Data Engineering, Data Analysis, and Machine Learning Engineering.
  • Expertise in building data pipelines and ETL processes with tools such as Python, Groovy (Java), SQL, AWS, GCP, Kafka, and Spark.
  • Developed batch ETL pipelines to extract data from sources such as MySQL, Mixpanel, REST APIs, and text files into Snowflake.
  • Skilled in distributed data processing using Databricks for data transformation, validation, and cleaning, ensuring high data quality and consistency.
  • Experienced with Spark (AWS EMR) to process data from S3 and load it into Snowflake.
  • Worked closely with analytics and data science teams to support model deployment using Docker, Python, Flask, and AWS Elastic Beanstalk.
  • Contributed to end-to-end CI/CD pipelines with tools such as CodeBuild, CodeDeploy, Git, and CodePipeline.
  • Developed and managed API gateways and web services for seamless data integration.
  • Strong foundation in object-oriented programming (OOP), writing extensible, reusable, and maintainable code.
  • Hands-on experience with IDEs and development tools, including Eclipse, PyCharm, PyScripter, Notepad++, and Sublime Text.
  • Proficient with Python libraries such as NumPy, Matplotlib, Beautiful Soup, and Pickle for data manipulation and visualization.
  • Expertise in writing efficient Python code and resolving performance bottlenecks; implemented optimized data processing pipelines to meet performance SLAs and reduce latency.
  • Demonstrated ability to lead teams and work independently to deliver complex projects.
  • Strong client interaction and presentation skills, effectively bridging technical and business communication gaps.
  • Proven success in delivering solutions on time and driving collaborative teamwork.
  • Experience working in cloud-native environments (AWS and GCP) and using Kafka for real-time data streaming.
  • Skilled in data modeling, governance, and ensuring data integrity across distributed systems.
  • Adept at agile development methodologies, ensuring smooth project delivery and iteration.

Overview

12 years of professional experience

Work History

Senior Data Engineer

ROKU INC
10.2021 - Current
  • Engineered Python-based solution for downloading data from various payment processors
  • Established Python-based framework for daily and monthly file monitoring via Roku Pay platforms
  • Transformed Parquet and AVRO files from AWS S3 and GCP into Snowflake and Hive data warehouses
  • Configured stages to facilitate seamless data migration into Snowflake cloud
  • Tailored visual reports based on specific requirements using Tableau
  • Designed a data quality framework to ensure data integrity across multiple tables in Snowflake Cloud Data Warehouse
  • Implemented real-time data loading using Snowpipe for partner-supplied files, including those from Hulu and Disney
  • Leveraged PySpark to streamline recommendations data workflows
  • Developed ETL processes with Spark SQL/PySpark DataFrames (see the sketch after this list)
  • Implemented file tracking system on GCP platforms using Python
  • Automated missing data report generation to ensure delivery timeliness
  • Implemented data pipelines leveraging PySpark for efficient batch processing on AWS
  • Formulated comprehensive test plans in alignment with business needs
  • Created, reviewed, and updated test scenarios; performed end-to-end, regression, and integration testing
  • Executed integration testing to ensure optimal ETL performance
  • Diagnosed data quality issues to identify underlying problems
  • Transformed data into Parquet and AVRO formats to facilitate integration with Snowflake and Hive data warehouses
  • Exhibited outstanding verbal, written, and interpersonal communication abilities
  • Managed security protocols such as authentication and authorization to protect sensitive information stored in databases
  • Technical environment: Python, AWS EMR, Spark, AWS EC2, AWS S3, AWS Lambda, AWS Data Pipeline, GCP, Hive, SQL, Snowflake, DBT, Tableau, NumPy, Pandas, Dask, Scikit-learn, Machine Learning, Flask, HTML, CSS, JavaScript
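
A minimal PySpark sketch of the S3 Parquet-to-Snowflake batch flow described above. The bucket, table, and Snowflake connection options are hypothetical placeholders, and it assumes the Snowflake Spark connector is available on the cluster; it is an illustration, not the production job.

    # Read Parquet from S3, apply basic cleanup, and write to Snowflake.
    # All names and credentials below are illustrative placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("s3_parquet_to_snowflake").getOrCreate()

    # Read daily Parquet files landed in S3 (hypothetical bucket/prefix).
    df = spark.read.parquet("s3://example-bucket/payments/dt=2021-10-01/")

    # Basic validation: drop duplicates and null transaction ids, add a load timestamp.
    clean = (
        df.dropDuplicates(["transaction_id"])
          .filter(F.col("transaction_id").isNotNull())
          .withColumn("load_ts", F.current_timestamp())
    )

    # Write to Snowflake via the Spark connector (connection options are placeholders).
    sf_options = {
        "sfURL": "example.snowflakecomputing.com",
        "sfDatabase": "ANALYTICS",
        "sfSchema": "PAYMENTS",
        "sfWarehouse": "ETL_WH",
        "sfRole": "ETL_ROLE",
        "sfUser": "etl_user",
        "sfPassword": "********",
    }
    (
        clean.write.format("net.snowflake.spark.snowflake")
             .options(**sf_options)
             .option("dbtable", "PAYMENT_TRANSACTIONS")
             .mode("append")
             .save()
    )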

Data Engineer

CAPITAL ONE
02.2021 - 09.2021
  • Built data pipelines (ELT/ETL scripts) to extract data from sources like Snowflake, AWS S3, and Teradata, transform it, and load it into Salesforce
  • Utilized Teradata ETL tools such as BTEQ, FastLoad, MultiLoad, and FastExport for data extraction from Teradata
  • Developed ETL scripts using Python, AWS S3, and cloud storage to migrate data from Teradata to Snowflake via PySpark
  • Created Spark applications using Spark SQL in Databricks to extract, transform, and load user click and view data
  • Designed and implemented AWS CloudFormation templates in JSON to build custom VPCs, subnets, and NAT for application deployment
  • Created new dashboards, reports, scheduled searches, and alerts using Splunk
  • Leveraged Kafka to build real-time data pipelines between clusters (see the sketch after this list)
  • Developed custom Jenkins jobs/pipelines using Bash scripts and AWS CLI for automated infrastructure provisioning
  • Created an ETL framework to process revenue files received from partners
  • Built a machine learning model for predicting user behavior (transactor vs. revolver) and detecting fraud
  • Optimized data pipelines in Databricks using Spark, reducing data processing time by 30%
  • Developed AWS Lambda serverless scripts for handling ad-hoc requests
  • Performed cost optimization, reducing infrastructure expenses
  • Proficient in Python libraries such as NumPy, Pandas, Dask, Scikit-learn, and ONNX for machine learning tasks
  • Technical environment: Python, AWS EMR, Spark, AWS EC2, AWS S3, AWS Lambda, AWS Data Pipeline, GCP, AWS CloudWatch, AWS CloudFormation, AWS Glue, AWS Kinesis, Shell scripts, DBT, Jenkins
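
A hedged sketch of the kind of Kafka relay referenced above, using the kafka-python client. Topic names, broker addresses, and the enrichment step are illustrative assumptions rather than the actual pipeline.

    # Relay records from a source cluster topic to a target cluster topic.
    # Brokers, topics, and fields below are hypothetical placeholders.
    import json
    from kafka import KafkaConsumer, KafkaProducer  # kafka-python

    consumer = KafkaConsumer(
        "transactions-raw",                          # hypothetical source topic
        bootstrap_servers=["source-cluster:9092"],
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
        group_id="relay-group",
    )
    producer = KafkaProducer(
        bootstrap_servers=["target-cluster:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    for message in consumer:
        record = message.value
        record["relayed"] = True                     # placeholder enrichment
        producer.send("transactions-clean", record)  # hypothetical target topic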

Data Engineer

Credit Sesame
09.2017 - 01.2021
  • Built data pipelines (ELT/ETL scripts) to extract data from different sources (MySQL, DynamoDB, AWS S3 files), transform it, and load it into the Snowflake data warehouse
  • Added a REST API layer to the ML models using Python and Flask, and deployed the models to the AWS Elastic Beanstalk environment using Docker containers (see the sketch after this list)
  • Developed analytical dashboards using Tableau
  • Built aggregate and de-normalized tables, populating them via ETL to improve Tableau analytics dashboard performance and to help data scientists and analysts speed up ML model training and analysis
  • Developed a user-eligibility library in Python to accommodate partner filters and exclude those users from receiving credit products
  • Built data pipelines to aggregate user clickstream session data using the Spark Streaming module, which reads clickstream data from Kinesis streams, stores the aggregated results in S3, and eventually loads them into the Snowflake data warehouse
  • Built data pipelines using PySpark to process data files in S3 and load them into Snowflake
  • Supported and built the infrastructure for Approval Odds, a core module of Credit Sesame; started with batch ETL, moved to micro-batches, and then converted to real-time predictions
  • Designed and developed AWS CloudFormation templates to deploy web applications
  • Built data pipelines using Spark (AWS EMR) to process data files in S3 and load them into Snowflake
  • Knowledge and experience using Python libraries NumPy, Pandas, Scikit-learn, and ONNX for machine learning
  • Other activities included keeping the data pipelines active, working with product managers, analysts, and data scientists to address their requests, unit testing, load testing, and SQL optimization
  • Technical environment: Java, Groovy, Python, Flask, NumPy, Pandas, SQL, MySQL, Cassandra, AWS EMR, Spark, AWS Kinesis, Snowflake, AWS EC2, AWS S3, AWS Beanstalk, AWS Lambda, AWS Data Pipeline, AWS CloudFormation, AWS CloudWatch, Docker, Shell scripts, DynamoDB
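
A minimal sketch of a Flask REST layer over a pickled scikit-learn model, in the spirit of the model-serving work above. The model file, feature names, and route are hypothetical; the actual deployment ran in Docker containers on AWS Elastic Beanstalk.

    # Serve predictions from a serialized model over a simple REST endpoint.
    # Model path, payload fields, and response shape are illustrative assumptions.
    import pickle
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    with open("model.pkl", "rb") as f:     # hypothetical serialized classifier
        model = pickle.load(f)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(force=True)
        features = [[payload["credit_score"], payload["utilization"], payload["income"]]]
        score = model.predict_proba(features)[0][1]
        return jsonify({"approval_odds": round(float(score), 4)})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)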

ETL Developer

Ingram Micro
07.2012 - 08.2017
  • Participated in project planning sessions with management, data architects, external stakeholders, and team members to analyze business requirements and outline the proposed data warehouse solution
  • Collaborated with business analysts and stakeholders to gather and analyze requirements, translating them into technical specifications for ETL processes and data integration
  • Collaborated with a Senior Data Architect to construct a comprehensive data warehouse using a combination of on-premises and cloud-based infrastructures
  • Maintained complex data models for a large-scale data warehouse, ensuring optimal performance and scalability for handling terabytes of data
  • Translated business requests into specific KPI dashboards and reports
  • Provided support to Power BI developers in designing, developing, and maintaining BI solutions
  • Designed and executed a comprehensive system test plan to ensure the accurate implementation of the data solution
  • Designed and implemented end-to-end ETL processes, extracting data from diverse sources, transforming it to meet business requirements, and loading it into cloud-based data warehouses like Snowflake
  • Developed and optimized scalable ETL pipelines, ensuring seamless data integration across multiple platforms using tools such as Informatica, Talend, SSIS, and Pentaho
  • Enhanced data quality and integrity through profiling, validation, and automated error-handling mechanisms, ensuring consistency and accuracy in the data pipeline
  • Automated workflows using schedulers like Apache Airflow and cloud-native orchestration tools to streamline operations and reduce manual intervention (see the sketch after this list)
  • Collaborated with data science teams to prepare data pipelines for AI/ML models, providing ready-to-use datasets for advanced analytics and machine learning projects
  • Integrated real-time data streaming using platforms like Apache Kafka alongside batch ETL processes to support time-sensitive reporting needs
  • Developed CI/CD pipelines for ETL solutions using tools like Git, Jenkins, and CodePipeline, ensuring smooth deployments and version control
  • Utilized SQL and Python expertise to write efficient queries, optimize data transformations, and automate routine tasks for enhanced productivity
  • Monitored and maintained ETL processes, identifying and resolving issues through real-time logging and troubleshooting to ensure uninterrupted data flow
  • Worked with cloud platforms (AWS, GCP, Azure) to design cloud-native pipelines, leveraging their storage and compute capabilities for efficient data handling
  • Collaborated with cross-functional teams, including business stakeholders, to translate requirements into actionable ETL workflows while ensuring high performance and scalability
  • Stayed updated with emerging trends, including real-time data processing, automation, and cloud-native ETL tools, to continuously enhance ETL operations
  • Technical environment: Python, NumPy, Pandas, SQL, MySQL, Redshift, AWS
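
A hedged sketch of a daily extract-transform-load DAG in the style of the Apache Airflow automation mentioned above, written with Airflow 2.x syntax for readability; task names and callables are placeholders rather than the actual workflows.

    # Daily ETL orchestration: extract -> transform -> load.
    # All task bodies are placeholders for the real pipeline steps.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("extract from source systems")

    def transform():
        print("apply business rules")

    def load():
        print("load into the warehouse")

    with DAG(
        dag_id="daily_etl",
        start_date=datetime(2017, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        t_extract >> t_transform >> t_load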

Education

Bachelor of Science - Computer Science and Engineering

JNTU Hyderabad
Telangana
05-2012

Skills

  • Languages: Python, Groovy (Java), SQL, Shell Scripting, JavaScript, HTML/CSS
  • Frameworks and Libraries: PySpark, Flask, NumPy, Pandas, Dask, Scikit-learn, ONNX, Matplotlib, Beautiful Soup
  • ETL Tools: Informatica, Talend, Pentaho, SSIS
  • Data Processing: Spark, Databricks, AWS Glue, Apache Kafka, Airflow, Snowpipe
  • Data Warehousing and Databases: Snowflake, MySQL, Hive, Cassandra, DynamoDB, Redshift, Teradata
  • AWS Services: EMR, S3, Lambda, EC2, Beanstalk, CloudFormation, Kinesis, CodePipeline, CloudWatch
  • GCP Services: Cloud Storage, Parquet/AVRO processing, Dataflow, Cloud Functions
  • CI/CD Tools: Jenkins, Git, CodeBuild, CodeDeploy
  • IDEs: Eclipse, PyCharm, PyScripter, Notepad++, Sublime Text
  • Version Control: Git, CodePipeline
  • Soft Skills and Methodologies: Agile development and project management, strong client interaction, presentation skills, and teamwork abilities

Timeline

Senior Data Engineer

ROKU INC
10.2021 - Current

Data Engineer

CAPITAL ONE
02.2021 - 09.2021

Data Engineer

Credit Sesame
09.2017 - 01.2021

ETL Developer

Ingram Micro
07.2012 - 08.2017

Bachelor of Science - Computer Science and Engineering

JNTU Hyderabad