Data professional with two years of industry experience and a solid foundation in big data engineering, data analysis, and visualization. Proficient in using Hadoop, Spark, and visualization tools to extract insights from complex datasets. Seeking a master’s program to advance my knowledge of data science, statistical analysis, and machine learning, with a focus on applying these skills to real-world problems.
Big Data Frameworks: Hadoop, Spark, Hive
Languages: Python, PySpark, SQL, PL/SQL
Data Warehousing and ETL Tools: Informatica, Ab Initio, SAP BusinessObjects
Scripting: UNIX Shell Scripting (Bash, KornShell)
Schedulers: Airflow, Autosys
Databases: Teradata, Oracle
Business Intelligence Tools: Tableau
Project: SAT (System for Administration of Tax)
Role: ETL Developer
Tools: Hadoop, PySpark, Ab Initio, Informatica, Airflow, UNIX
Description: Developed a comprehensive data pipeline for the System for Administration of Tax (SAT), supporting tax-related data processing and secure reporting in compliance with regulatory requirements.
Responsibilities:
• Engineered ETL workflows in Spark on Hadoop clusters to handle and transform high-volume tax data.
• Ensured secure file generation and data encryption for sensitive tax data using Ab Initio and Informatica.
• Collaborated with cross-functional teams to maintain robust data quality controls and adherence to compliance standards.
Project: Internal Fraud Detection
Role: ETL Developer
Tools: Hadoop, Spark, Python, Ab Initio, Informatica, Airflow, UNIX
Description: Implemented a data pipeline to monitor and detect internal fraud by analyzing key transactional and demographic data. Leveraged Spark and Hadoop to manage high-volume data and ensure real-time data quality.
Responsibilities:
• Developed complex ETL jobs with Spark to transform and cleanse transactional data for fraud detection.
• Coordinated with data quality teams to enhance data validation and monitoring capabilities.
• Scheduled data workflows using Airflow and provided support during testing and deployment phases.
Big Data Analytics, Data Visualization, Spark, Predictive Modeling, Distributed Systems, Data Mining, Machine Learning, Advanced Statistical Analysis
Customer Churn Analysis:
• Conducted a comprehensive customer churn analysis using Python and SQL, employing statistical techniques to identify factors contributing to churn.
• Designed an interactive sales performance dashboard using Tableau, incorporating filters and drill-downs to allow stakeholders to explore data by region, product, and time.
• Analyzed sales data trends and provided actionable insights, which helped guide strategic decisions in resource allocation and product promotions.
• Developed a predictive model that identifies at-risk customers to support customer retention strategies.