Programming & Scripting: Python, SQL, PL/SQL, JavaScript, C, and Shell Script
Databases & Data Warehousing: Snowflake, Redshift, PostgreSQL, SQL Server, MySQL, MongoDB, Cassandra
Big Data & Distributed Systems: Apache Spark, Hadoop, Hive, Kafka, Databricks, Apache Flink, Spark SQL, Spark Core, MapReduce, Spark Streaming
ETL Data Pipelines: Informatica, dbt
Cloud Platforms: AWS, Azure
Scheduler Tools & APIs: Apache Airflow, REST APIs
Data Visualization & Analytics: Power BI, Tableau, Matplotlib, Pandas, NumPy
DevOps & CI/CD Tools: Docker, Kubernetes, Terraform, Git, GitHub
Project Management & Operating Systems: Jira, ServiceNow, Confluence, Windows, Linux, Unix, macOS
Real-Time Streaming Analytics on E-Commerce Transactions -
Developed a real-time data pipeline to process e-commerce transactions, using Kafka for streaming and PySpark for data transformation. Stored raw data in Amazon S3 and analyzed it with Redshift and SQL to derive customer purchase trends. Built interactive visualizations in Matplotlib to identify high-demand products and sales patterns, supporting business decision-making.
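A minimal sketch of the kind of windowed aggregation such a pipeline might apply to incoming transactions. It uses plain Python as a stand-in for the PySpark transformation step, so it runs without a cluster or a Kafka broker. The field names (ts, product_id, quantity, unit_price), the 60-second window, and the sample records are assumptions for illustration, not details of the project.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # hypothetical tumbling-window size


def window_start(ts, size=WINDOW_SECONDS):
    """Round a timestamp down to the start of its tumbling window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def aggregate(transactions, size=WINDOW_SECONDS):
    """Sum revenue per (window start, product) over a batch of events."""
    totals = defaultdict(float)
    for tx in transactions:
        key = (window_start(tx["ts"], size), tx["product_id"])
        totals[key] += tx["quantity"] * tx["unit_price"]
    return dict(totals)


# Hypothetical events, shaped like messages a consumer might decode.
sample = [
    {"ts": datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc),
     "product_id": "A", "quantity": 2, "unit_price": 10.0},
    {"ts": datetime(2024, 1, 1, 12, 0, 50, tzinfo=timezone.utc),
     "product_id": "A", "quantity": 1, "unit_price": 10.0},
    {"ts": datetime(2024, 1, 1, 12, 1, 10, tzinfo=timezone.utc),
     "product_id": "B", "quantity": 3, "unit_price": 5.0},
]

if __name__ == "__main__":
    for (start, product), revenue in sorted(aggregate(sample).items()):
        print(start.isoformat(), product, revenue)
```

In a Spark job the same grouping would typically be expressed as a windowed group-by on the event timestamp; the plain-Python version only illustrates the logic.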
Sales Data Pipeline & Analysis Using PySpark and AWS -
Built an end-to-end ETL pipeline to process and analyze sales data for a retail business. Ingested raw sales data from CSV files stored in Amazon S3 and used AWS Glue to clean and transform it. Loaded the processed data into Amazon Redshift for further analysis. Used PySpark and SQL to extract insights such as top-selling products, revenue trends, and customer behavior. Created visual reports using Matplotlib to present key findings, helping the business make data-driven decisions.
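A minimal sketch of the clean-then-aggregate step described above, using only the Python standard library in place of AWS Glue and PySpark so that it runs anywhere. The column names (order_id, product, quantity, unit_price), the cleaning rules, and the sample rows are assumptions for illustration, not the project's actual schema.

```python
import csv
import io
from collections import Counter

# Hypothetical raw extract; row 3 has a missing quantity.
RAW_CSV = """order_id,product,quantity,unit_price
1,Widget,2,9.99
2,Gadget,1,24.50
3,Widget,,9.99
4,Gadget,3,24.50
5,Widget,1,9.99
"""


def clean_rows(text):
    """Yield typed rows, skipping any with missing or invalid numbers."""
    for row in csv.DictReader(io.StringIO(text)):
        try:
            quantity = int(row["quantity"])
            unit_price = float(row["unit_price"])
        except (TypeError, ValueError):
            continue  # drop rows that cannot be parsed
        if quantity <= 0 or unit_price < 0:
            continue  # drop rows with impossible values
        yield {
            "product": row["product"].strip(),
            "quantity": quantity,
            "unit_price": unit_price,
        }


def top_products(rows, n=3):
    """Return the n products with the most units sold."""
    units = Counter()
    for row in rows:
        units[row["product"]] += row["quantity"]
    return units.most_common(n)


if __name__ == "__main__":
    print(top_products(clean_rows(RAW_CSV)))
```

The equivalent warehouse query would be a GROUP BY on product with SUM(quantity), ordered descending.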