Moses G.

Lead Data Scientist & Cloud Engineer | Cloud Architecture, Data Engineering, Machine Learning Engineering
Austin, TX

Summary

Passionate Lead Data Scientist & Cloud Engineer with 10+ years architecting and leading scalable, high-performance data platforms that turn complex data into strategic business assets.

My expertise centers on AWS, where I've designed end-to-end solutions using Amazon S3 for robust data lakes, AWS Glue for serverless ETL/ELT, Amazon EMR (PySpark/Spark) for big data processing, Amazon Redshift and Spectrum for analytics warehousing, Amazon Kinesis for real-time streaming, AWS Lambda & Step Functions for event-driven orchestration, Amazon Athena for query-on-lake, and AWS Lake Formation for governance and security. I've led migrations to AWS-native architectures, optimized petabyte-scale pipelines (10–15TB+ daily), reduced costs by 35–45% through auto-scaling, Spot Instances, and serverless shifts, and achieved 99.9% uptime with sub-second latency for mission-critical analytics.

I also bring strong Azure capabilities, including Azure Data Factory for pipeline orchestration, Azure Databricks for Spark-based transformations and Delta Lake, Azure Synapse Analytics for unified warehousing and big data processing, and ADLS Gen2 for storage—enabling hybrid/multi-cloud strategies and seamless integrations when needed.

Beyond cloud platforms, I bring data science and engineering leadership: mentoring teams of 8–12 engineers on best practices, enforcing data governance and quality frameworks, collaborating cross-functionally with data science, analytics, and product teams, and delivering resilient batch/streaming pipelines that support ML models, BI dashboards, and real-time decision-making. Python (Boto3/PySpark/Pandas), SQL, Apache Spark/Kafka, Airflow (MWAA), Terraform/CloudFormation, and Docker are my daily tools for building reliable, cost-efficient systems.

What drives me is solving tough data challenges at scale—whether slashing processing times by 50–70%, enabling self-service analytics for hundreds of users, or aligning infrastructure with revenue-generating outcomes. I've delivered millions in business value through optimized, governed platforms.

Open to connecting on multi-cloud data architecture, AWS/Azure migrations, team leadership in data engineering, or opportunities to build next-gen scalable solutions. Let's discuss how we can drive impact together! 🚀

Cloud Data Engineering & Distributed Systems

• Pipeline orchestration and ETL across Databricks, Airflow, Glue, and Data Factory

• Distributed data processing (Spark, Databricks)

• Lakehouse and data warehouse architectures (Delta Lake, BigQuery, Redshift, etc.)

Overview

12 years of professional experience
4 certifications

Work History

Lead Data Scientist & Cloud Engineer

AT&T
Remote
12.2022 - Current

• Act as a technical liaison between customers, service engineering teams, and leadership to design and deploy AWS solutions.

• Led architecture of enterprise data lake on Amazon S3 with AWS Glue Crawlers and Data Catalog, enabling scalable ingestion and discovery for petabyte datasets.
• Designed and implemented high-throughput ETL/ELT pipelines using AWS Glue and PySpark, processing 10TB+ daily and reducing runtime by 45%.
• Architected real-time streaming solutions with Amazon Kinesis Data Streams/Firehose to S3/Redshift, supporting sub-second analytics for mission-critical apps.
• Optimized Amazon Redshift clusters and Spectrum queries on S3, improving BI query performance by 3x and saving $400K+ annually in compute/storage.
• Built serverless data pipelines using AWS Lambda triggered by S3 events, integrated with Glue jobs for automated transformations and cost efficiency.
• Led cross-functional teams in adopting AWS Step Functions for workflow orchestration, enhancing pipeline reliability and reducing manual interventions by 70%.
• Implemented data governance frameworks with AWS Lake Formation and IAM policies, ensuring compliance and secure access across 500+ users.
• Mentored 7+ junior engineers on AWS best practices (Glue, EMR, Lambda), resulting in 30% improved team productivity and faster delivery.
• Engineered cost-optimization strategies across AWS services (Glue DPUs, EMR scaling, Athena partitioning), cutting monthly data platform expenses by 40%.
• Developed event-driven architectures with AWS Lambda and EventBridge, automating data quality checks and notifications via SNS.
• Integrated Amazon Athena for ad-hoc querying on S3 data lakes, enabling self-service analytics and reducing dependency on heavy warehousing.
• Built and implemented data infrastructure, ingesting and transforming data via ETL/ELT for large-scale apps.

• Led development of machine learning models to enhance predictive analytics and data-driven decision-making.
• Directed cross-functional teams in implementing data strategies, improving operational efficiency across departments.

Senior Data Engineer

AT&T
Remote
11.2019 - 12.2022

• Designed and implemented large-scale distributed ETL pipelines across AWS and Databricks environments to support enterprise analytics.
• Led modernization from on-prem data platforms to cloud-based architectures, improving performance and operational efficiency by about 30%.
• Architected enterprise data models and analytics solutions supporting self-service reporting for 200+ users.
• Led a cross-functional team to design and implement a new CI/CD pipeline, reducing deployment time by 30% and presenting the results to senior leadership.
• Built and operated production MLOps platforms using SageMaker, Docker, MLflow, and CI/CD pipelines to standardize model delivery.
• Served as senior technical advisor, translating business requirements into scalable AWS solutions for data and analytics initiatives.
• Engineered a scalable AWS data pipeline using S3 and Lambda, ensuring GDPR compliance for a machine learning workflow.
• Designed and secured multi-account AWS environments using VPC Peering and Transit Gateway.
• Managed and mentored data engineers and analysts, improving delivery quality and architectural consistency.

Senior Data Scientist & Engineer

Adidas
Remote
10.2018 - 11.2019

• Improved data analysis and visualization workflows using SQL, Python, R, SAS, and ArcGIS, analytical methods (regression analysis, web analytics, predictive modeling), and visualization tools (R-Shiny, Power BI, Tableau), reducing data analysis time by 50% and increasing data insights by 20%.
• Developed predictive analytics and statistical models to optimize program performance, operational efficiency, and financial forecasting for clients in healthcare, government, and technology.
• Automated reporting and pipeline workflows using Python, SQL, and Azure Data Factory, reducing manual processing time by up to 40%.
• Partnered with business stakeholders to translate requirements into scalable BI dashboards and data pipelines supporting enterprise KPIs and strategic decision-making.
• Established early MLOps best practices, including experiment tracking, dataset versioning, and reproducible model training pipelines.

Data Research Analyst

New York City Department of Education
New York, NY
09.2014 - 10.2018

• Conducted in-depth longitudinal analysis of student performance using SQL, Excel and R, driving a 20% improvement in test scores by identifying key performance gaps.
• Designed and implemented data-driven curriculum optimization experiments, reducing inefficiencies by 25% and enhancing learning outcomes.
• Translated complex data insights into actionable recommendations, presenting findings to senior stakeholders and aligning strategies with educational objectives.
• Partnered with school administrators and educators to develop interactive dashboards, streamlining student progress tracking and informing data-driven instructional decisions.
• Designed and implemented training sessions for staff on effective data utilization practices.

Education

Post-Graduate Certificate - Cloud Computing

The University of Texas at Austin
05.2025

Master of Science - Data Science

City University of New York
06.2018

Bachelor of Science - Statistics

University of Ilorin
06.2011

Skills

Cloud Platforms: Azure (AKS, ARM), AWS (EC2, EKS), GCP
DevOps Tools: JIRA, Jenkins, Slack, Azure DevOps
Build Tools: Ant, Maven, MSBuild
SCMs: SVN, Git, GitHub, Bitbucket, GitLab, Azure Git
IaC Tools: Terraform, CloudFormation
Containers/Orchestration: Docker, Kubernetes
Application/Web Servers: Tomcat, WebLogic 9.x/10.x/12c, Apache 2.x/1.3.x, JBoss 7.1
Operating Systems: Ubuntu 18.04, Red Hat Linux, Windows, HP-UX, Solaris 10
Programming & Scripting Languages: Ruby, Python, Shell scripting (Ksh, Bash), Git Bash
Web Technologies: HTML5, CSS3, JavaScript, JSON
Frameworks and Libraries: Angular, Flask, RESTful APIs, React
Database Technologies: Oracle, SQL Server, MySQL, PostgreSQL, S3, RDS, DynamoDB
Methodologies: Agile, Scrum
Networking/Security Tools: IAM, ELB, PuTTY, VMware

Certification

• AWS Certified Solutions Architect – Professional

• AWS Certified Solutions Architect – Associate

• Certified Power BI Associate, Microsoft

• Certified Database Fundamentals (T-SQL), Microsoft

Awards

Recognized for outstanding leadership in mentoring junior data scientists and driving cross-functional collaboration at the City of New York.

Accomplishments


• Developed a predictive analytics pipeline using Python and Scikit-learn to identify high-risk properties for health and safety violations, based on historical inspection, complaint, and maintenance data.
• Integrated multi-source data using SQL and PySpark in Databricks, ensuring clean, scalable datasets for machine learning model training.
• Designed and published Power BI dashboards for operational managers and field inspectors to visualize risk levels across buildings, zones, and violation types.

• Built a forecasting model using R and AWS SageMaker to predict peak inspection periods and optimize inspector scheduling, improving field coverage and reducing overtime costs by 20%.
• Automated data extraction and cleansing using SQL scripts, enhancing the timeliness of inspection reports.
• Conducted clustering analysis to group buildings based on historical violations, population vulnerability, and inspection history to develop proactive inspection routes.

• Designed and implemented a financial forecasting system using predictive modeling (Random Forest, Linear Regression) to simulate various budget scenarios.
• Used Azure Machine Learning to deploy models and monitor performance in real time.

Affiliations

  • American Statistical Association
  • American Society for Quality

Work Availability

Days: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
Times: morning, afternoon, evening

Languages

English: Native or Bilingual
