Project: Real-Time Sales Data Processing with PySpark for Retail Analytics.
Real-Time Data Processing and Analytics: Built real-time ETL pipelines with PySpark (Apache Spark's Python API), processing transactional data from retail outlets to generate up-to-the-minute sales insights. This allowed the company to adapt quickly to trends, demand shifts, and inventory needs.
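The real pipeline used PySpark; the core micro-batch idea behind such a streaming aggregation can be sketched in plain Python (store IDs and amounts are hypothetical):

```python
from collections import defaultdict

def process_micro_batch(batch, running_totals):
    """Fold one micro-batch of (store_id, amount) transactions into
    running per-store sales totals, mimicking a streaming
    groupBy-and-sum as done in Spark Structured Streaming."""
    for store_id, amount in batch:
        running_totals[store_id] += amount
    return running_totals

totals = defaultdict(float)
process_micro_batch([("store_1", 19.99), ("store_2", 5.00)], totals)
process_micro_batch([("store_1", 10.01)], totals)
# store_1 accumulates across both micro-batches (~30.00)
```

In Spark the same stateful accumulation is managed by the engine across micro-batches; this sketch only shows the aggregation logic.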
Data Integration with Sales Systems: Integrated sales transaction data from point-of-sale systems, third-party logistics providers, and inventory management software. Utilized SQL and PySpark for efficient data transformations and storage, ensuring data consistency across systems.
Predictive Analytics and Reporting: Developed dashboards and reports using Apache Spark and Databricks to present real-time analytics to business teams. The system enabled automatic reporting of daily sales performance, regional trends, and inventory alerts, helping stakeholders make informed decisions.
Workflow Orchestration with Apache Airflow: Automated data workflows using Apache Airflow, orchestrating the data extraction, transformation, and loading processes, ensuring seamless and reliable data pipeline execution.
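Airflow expresses such pipelines as DAGs of dependent tasks; the scheduling idea can be illustrated with the standard library's `graphlib` (task names here are hypothetical, not the project's actual DAG):

```python
from graphlib import TopologicalSorter

# Task dependency graph mirroring a simple ETL DAG:
# extract feeds transform and validate; load waits on both.
dag = {
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}

# Resolve a valid execution order, as an orchestrator would.
order = list(TopologicalSorter(dag).static_order())
```

Airflow adds scheduling, retries, and monitoring on top of this dependency-resolution core, but the ordering guarantee is the same: a task runs only after all its upstream tasks complete.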
Cloud Infrastructure and Scalability: Deployed and managed cloud-based infrastructure using AWS (EC2, S3) and Azure Databricks to store large volumes of sales transaction data. Leveraged cloud services for dynamic scaling, ensuring high availability and performance during peak transaction periods.
Version Control and Project Management: Collaborated with cross-functional teams using Git for version control and JIRA for project tracking, enabling efficient project management, regular progress updates, and task prioritization.
Data Optimization and Performance Tuning: Optimized data processing performance in PySpark by fine-tuning partitioning strategies, caching intermediate results, and applying broadcast joins for large datasets, significantly reducing processing time and improving overall pipeline efficiency. Implemented partitioning and bucketing techniques in Apache Spark to enable faster query execution and enhance scalability for future growth.
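A broadcast join ships the small dimension table to every worker so the large fact table is never shuffled; in PySpark this is requested with `pyspark.sql.functions.broadcast`. A plain-Python sketch of the idea, with hypothetical data:

```python
# Small "broadcast" side: a dimension table held in memory everywhere.
stores = {1: "North", 2: "South"}

# Large fact side: transactions that stay where they are.
transactions = [
    {"store_id": 1, "amount": 12.5},
    {"store_id": 2, "amount": 7.0},
    {"store_id": 1, "amount": 3.5},
]

# Each fact row is enriched by a local dictionary lookup --
# no shuffle of the large side is needed.
joined = [
    {**txn, "region": stores[txn["store_id"]]}
    for txn in transactions
]
```

This is why broadcast joins pay off when one side is small: the expensive network shuffle of the large table is replaced by a cheap local lookup on every partition.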
Database Management Intern
Rashtriya Ispat Nigam
Visakhapatnam, Andhra Pradesh
05.2022 - 08.2022
Gained an understanding of SQL and exposure to real-world database projects.
Familiarity with database management systems (DBMS) like MySQL, PostgreSQL, and Oracle.
Working with large datasets and enterprise databases.
Experience with CRUD operations (Create, Read, Update, Delete).
Writing and optimizing SQL queries for performance.
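The CRUD operations and query work above can be sketched with the standard library's `sqlite3` as an in-memory stand-in for MySQL/Oracle (table and data are illustrative, not from the internship):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)

# Create
conn.execute("INSERT INTO orders (customer, total) VALUES (?, ?)", ("Alice", 42.0))
# Read
row = conn.execute(
    "SELECT total FROM orders WHERE customer = ?", ("Alice",)
).fetchone()
# Update
conn.execute("UPDATE orders SET total = ? WHERE customer = ?", (50.0, "Alice"))
# Delete
conn.execute("DELETE FROM orders WHERE customer = ?", ("Alice",))

# An index on the filter column is the first lever for query performance.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Parameterized queries (the `?` placeholders) are used throughout, which is also the standard defense against SQL injection.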
Database: SQL (MySQL, Oracle), MS Access, Snowflake
Platforms: Linux, Web, Windows, AWS, Azure
Soft Skills: Leadership, Event Management, Writing, Public Speaking, Time Management
Academic Projects
Project: Cricket Data Analytics Pipeline for Optimal Team Selection June 2023
Data Extraction and Cleaning: Scraped and processed over 1,000 T20 World Cup records using Python, leveraging BrightData and Pandas for data collection and transformation. Cleaned and standardized data to ensure consistency and accuracy in analytical calculations.
Performance Analysis and Visualization: Designed interactive Power BI dashboards for dynamic player evaluation based on KPIs such as strike rate, economy rate, and batting average.
Automated, Scalable Data Processing: Built automated pipelines for data extraction, transformation, and analysis, ensuring scalability and efficient team selection based on historical trends.
Scalability and Optimization: Optimized data pipelines for efficient processing and storage, ensuring scalability for future growth in data volume and additional analytical features.
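The project's cleaning step used Pandas; the standardization idea can be sketched with plain Python (field names and records here are hypothetical): trim whitespace, normalize team names, and coerce numeric fields so downstream KPI calculations are consistent.

```python
raw_records = [
    {"player": " V Kohli ", "team": "india", "strike_rate": "136.7"},
    {"player": "J Buttler", "team": " ENGLAND", "strike_rate": "144.2"},
]

def clean(record):
    """Standardize one scraped record: strip stray whitespace,
    normalize team-name casing, and coerce numbers to float."""
    return {
        "player": record["player"].strip(),
        "team": record["team"].strip().title(),
        "strike_rate": float(record["strike_rate"]),
    }

cleaned = [clean(r) for r in raw_records]
```

In Pandas the same steps map to vectorized operations such as `str.strip()`, `str.title()`, and `astype(float)` on whole columns.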
Mathematics Teaching Assistant at High Mowing School (Unpaid, Voluntary)