Results-driven Data Engineer known for high productivity and efficient task completion. Skilled in big data processing frameworks such as Hadoop and Apache Spark, database management with SQL, and data visualization with tools such as Tableau. Excels in problem-solving, collaboration, and adaptability, leveraging technical skills to build innovative data solutions across diverse environments.
Programming Languages: Python (Pandas, NumPy, Scikit-learn), Shell scripting, R, SQL, Oracle PL/SQL
Tools: Azure ML Studio, Visio, Visual Studio, SAP, Postman, Tableau, Power BI, MS Excel, Azure Databricks
Databases: MySQL, MS SQL, PostgreSQL, Oracle, AWS Redshift, SQLite
Big Data Technologies: PySpark, Kafka, Databricks, Airflow
Management Tools: Jira, Azure DevOps (ADO) Boards, CRM, Salesforce
Cloud Technologies: AWS (EC2, S3), AWS Lambda, AWS Glue, AWS EMR, Textract, AWS SageMaker, AWS Redshift, Databricks, Apache Airflow, Snowflake, Azure Data Lake, Azure Data Factory, Azure SQL, Azure Synapse
Student Data ETL Pipeline: Built an ETL pipeline to automate the ingestion, transformation, and loading of student enrollment and performance data into a central database, using Python and SQL for data processing and automation. The project improved data consistency and reduced manual intervention by 50%.

Real-Time Weather Data Processing: Developed a real-time data pipeline that collects weather data from an API and stores it in a cloud data warehouse, using Apache Kafka for streaming and PySpark for processing large datasets. The project enabled near real-time data analysis and faster decision-making based on updated weather information.
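Below is a minimal Python sketch of the student-data ETL flow described in the first project above. The CSV file name, column names, and the SQLite target database are illustrative assumptions, not the actual production schema.

    # Hypothetical sketch of the student-data ETL pipeline; file, column,
    # and table names are assumptions for illustration only.
    import sqlite3
    import pandas as pd

    def run_etl(csv_path: str = "student_enrollment.csv",
                db_path: str = "students.db") -> None:
        # Extract: read the raw enrollment/performance export
        raw = pd.read_csv(csv_path)

        # Transform: normalize column names, drop duplicate students,
        # and coerce the enrollment date to a proper datetime
        raw.columns = [c.strip().lower().replace(" ", "_") for c in raw.columns]
        cleaned = raw.drop_duplicates(subset=["student_id"]).copy()
        cleaned["enrollment_date"] = pd.to_datetime(
            cleaned["enrollment_date"], errors="coerce")

        # Load: replace the central table with the refreshed snapshot
        with sqlite3.connect(db_path) as conn:
            cleaned.to_sql("student_enrollment", conn,
                           if_exists="replace", index=False)

    if __name__ == "__main__":
        run_etl()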
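The weather project can be illustrated with a short PySpark Structured Streaming sketch. The Kafka topic, broker address, message schema, and output paths are assumptions, and running it requires the Spark Kafka connector package on the cluster.

    # Hypothetical sketch of the Kafka -> PySpark weather pipeline; topic name,
    # schema, and output locations are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    spark = SparkSession.builder.appName("weather-stream").getOrCreate()

    # Expected shape of each JSON weather event (assumed fields)
    schema = StructType([
        StructField("station_id", StringType()),
        StructField("temperature_c", DoubleType()),
        StructField("humidity_pct", DoubleType()),
        StructField("observed_at", TimestampType()),
    ])

    # Read raw events from a Kafka topic as a streaming DataFrame
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "weather-events")
              .load())

    # Parse the JSON payload into typed columns
    parsed = (events.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("w"))
              .select("w.*"))

    # Append parsed records to a warehouse staging area in Parquet
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "/tmp/weather/parquet")
             .option("checkpointLocation", "/tmp/weather/checkpoints")
             .outputMode("append")
             .start())

    query.awaitTermination()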