Data Analyst with 4 years of experience specializing in data pipeline development, SQL optimization, and advanced data analysis to drive business decisions. Proven expertise in Azure, Python, and SQL for data engineering tasks, with a track record of delivering actionable insights and improving data workflows. Skilled in using data visualization tools to communicate findings effectively to stakeholders.
• Used Python to analyze preventive maintenance records for 1000+ equipment units, applying regression analysis with Pandas and NumPy to forecast maintenance costs at the start of each fiscal year.
• Used Kafka for real-time processing of equipment preventive maintenance logs, handling over 300 messages per minute, and stored the processed data in AWS S3. Also employed AWS Glue crawlers for automated metadata extraction.
• Streamlined ELT tasks using PySpark, improving data processing efficiency by 20%, and integrated the pipeline with a Redshift data warehouse for optimized storage and analysis.
• Built Power BI dashboards and reports to analyze marketing campaign data, identifying key trends and patterns and communicating the resulting insights.
• Compiled monthly performance reports highlighting key performance indicators (KPIs) using advanced Excel features, including VBA macros and VLOOKUP.
• Conducted predictive modeling using SPSS and SAS to forecast sales trends, achieving a 25% accuracy rate and informing strategic business decisions.
• Reduced fraudulent transactions by 17% in offline retail environments by implementing real-time, data-driven rules using SQL, association rule mining, K-means clustering, and predictive modeling.
• Drove a 20% increase in CRM growth using Salesforce by deploying advanced models incorporating RFM, demand forecasting, and market basket analysis for customer retention and cross-sell product analysis through real-time campaigns.
• Extracted, transformed, and loaded over 1 TB of data from SQL Server, HTTP endpoints, and REST APIs into Hive external tables via Sqoop and Spark ETL processes
• Integrated retail and corporate sales data with PySpark, using UDFs for aggregation and Python scripts for DataFrame manipulation, resulting in a 30% reduction in processing time while handling over 2 million records daily
• Orchestrated end-to-end data workflows and pipelines by leveraging Azure Data Factory, enhancing data integration and management
• Designed and managed scalable data warehouse solutions in Azure Synapse Analytics, handling over 5 TB of data and improving query performance by 10% for high-performance analytics and reporting
• Created Power BI reports and dashboards utilizing DAX expressions and complex data models to provide sales insights
• Databases and Programming: MongoDB, SQL Server, Oracle, R, Python, Advanced SQL, Hive SQL, SparkSQL
• Big Data & Azure Tools: Hadoop, PySpark, Sqoop, Kafka, Snowflake, Redshift, Hive, Azure Data Factory, Synapse Analytics, Data Lake
• Visualization and Analytical Tools: Tableau, Power BI, Matplotlib, Excel, SPSS, SAS
• Optimized database performance by 90%, significantly improving system efficiency.
• Reduced cloud costs by 40% through strategic cost optimization techniques.
• Awarded 'Excellence in Innovation' at Technova for outstanding contributions.
• Implemented predictive analytics models, improving cost forecasting accuracy.
• Developed scalable microservices using Spring Boot, enhancing system modularity and performance.
• Debugged deployment and infrastructure issues.