• Strategic Data Engineer with over 8 years of hands-on experience using a broad range of tools and programming languages to extract, transform, and analyze data across industries including Healthcare, Cloud Computing, Media and Entertainment, Telecom, and Banking.
• Proficient in SQL across relational database systems including MySQL, PostgreSQL, SQL Server, and Oracle, applying advanced querying techniques to derive insights from complex relational databases.
• Experienced in graph ontology and semantic modeling with GraphQL or SPARQL.
• Advanced proficiency in Python for data preprocessing, statistical analysis, and predictive modeling, utilizing libraries such as Pandas, NumPy, and SciPy to manipulate and analyze data efficiently.
• Strong command of the R programming language and RStudio for statistical analysis, data visualization, and machine learning, employing packages like ggplot2, dplyr, and caret to explore and interpret data effectively.
• Experience in utilizing SPSS for advanced statistical analysis, hypothesis testing, and predictive modeling, providing deeper insights into complex datasets.
• Expertise in data visualization tools including Tableau and Power BI for creating dynamic dashboards and visualizations that support data-driven decision-making and storytelling.
• Proven track record of using multiple SQL database systems and cloud platforms such as AWS, Azure, and Google Cloud Platform to integrate and analyze diverse data sources.
• Proficiency in web scraping using libraries like BeautifulSoup and Scrapy to extract structured data from web pages, extending the scope of data analysis to unstructured web sources.
• Knowledgeable in machine learning algorithms and techniques for predictive modeling, classification, and clustering, utilizing libraries such as scikit-learn, TensorFlow, and Keras to build and deploy machine learning models.
• Experienced in data wrangling techniques including data cleansing, transformation, and feature engineering, ensuring data quality and integrity for accurate analysis and interpretation.
Languages
Databases
Jupyter Notebooks, Apache Zeppelin, RStudio, Tableau, Power BI, Excel (with VBA for automation), SQL IDEs (e.g., SQL Server Management Studio, DBeaver), Google Sheets, Alteryx, KNIME
Apache Kafka
Data Storage
Snowflake, Amazon Redshift, PostgreSQL, MySQL, Microsoft SQL Server
Google Cloud Platform, Azure, AWS, Databricks
Pandas, NumPy, Spark, Java Spring Boot