Senior Data Engineer with over 6 years of hands-on experience designing and deploying robust data solutions on Azure, AWS, and GCP. Adept at building scalable data pipelines and analytical platforms for real-time and batch processing. Strong foundation in Python, R, SQL, and PySpark for transforming clinical, biological, and enterprise data. Proven track record of collaborating with cross-functional teams to deliver compliant, scalable data products using modern cloud-native tools. Holds a Master’s in Data Science (NJIT) and a Bachelor’s in Computer Science. Specializes in clinical data engineering and biological datasets, with a deep understanding of data QC workflows and feature selection techniques. Proficient in Git-based version control and in CI/CD using Azure DevOps and AWS CodePipeline. Experienced in delivering data visualizations with Tableau and Power BI (with knowledge of Spotfire equivalents) and in building machine learning and deep learning workflows with Python, Spark MLlib, and Vertex AI. Open to up to 15% domestic and international travel.
Databases - Oracle, MySQL, Hive, SQL Server, HBase, Cassandra, MongoDB
Big Data Technologies - HDFS, Hive, PySpark, MapReduce, Pig, YARN, Sqoop, Oozie, ZooKeeper, Flume
Programming Languages & Data Formats - Python, Java, SQL, R, PL/SQL, Scala, C#, JSON, XML
Cloud Services - Azure, Cosmos DB, Blob Storage, Kubernetes, Azure Synapse Analytics (DW), Azure Data Lake, Databricks, DWH, Azure Data Factory
Techniques - Data Mining, Clustering, Data Visualization, Data Analytics
Methodologies - Agile/Scrum, UML, Design Patterns, Waterfall
Containers & CI/CD - Docker, Kubernetes, CI/CD, Jenkins
Tools & Utilities - JIRA, GitHub, Tableau 9.1, Power BI, Control-M, PowerShell