Tejaswani Dash

Data Engineer
Fairfax, VA

Summary

Data Engineer with 5+ years of experience spanning machine learning, healthcare, automotive, customer, and media data; reporting; SAS, SAP, PyTorch, TensorFlow, Spark, Scala, NLP, and LLMs; and programming in SQL, Python, R, and shell scripting. Solid understanding of statistical analysis, machine learning algorithms, and predictive modeling. Proven track record of delivering innovative data solutions with effective project management, including hands-on experience delivering work ahead of deadlines with 98% accuracy. ETL developer practiced in helping companies through diverse transitions, including sensitive data and massive big data installations; promotes extensive simulation and testing to ensure smooth ETL execution, and is known for building quick, effective tools to automate and optimize database management tasks. Seeking a challenging role in a dynamic environment to apply these skills and knowledge effectively.

Overview

6 years of professional experience
21 years of post-secondary education

Work History

Senior Data Engineer

Intelligenie
02.2023 - Current
  • Developed highly complex yet maintainable, easy-to-use Python, SQL, and PySpark code that satisfies application requirements and drives data processing and analytics using built-in libraries
  • Designing and developing data pipelines and workflows using Ab Initio's graphical interface and scripting language
  • Experienced in ETL processes from REST APIs, encompassing data extraction, transformation, loading, integration, optimization, security, monitoring, and collaboration for data-driven decision-making
  • Skilled in managing cloud infrastructure and scaling ETL solutions to meet growing data demands using cloud services
  • Experience in data pipelines and all phases of ETL and ELT data processing, converting big data and unstructured data sets (JSON, log data) into structured data sets for product analysts and data scientists
  • Proficient in establishing and enforcing data governance policies, metadata standards, and ensuring compliance with data regulations
  • Developing and implementing efficient data structures tailored to specific data processing and storage requirements
  • Experience in collecting, processing, and aggregating large amounts of streaming data using Kafka and Spark Streaming
  • Designing and implementing ETL data integration workflows using Informatica PowerCenter and data quality tools to ensure accurate and reliable data transfer from diverse sources to Data environments
  • Proficient in designing and optimizing ETL pipelines using Java, Python, Hive, and Pig for data transformation and processing
  • Skilled in Hadoop architecture, HDFS, and NoSQL databases like Cassandra
  • Expertise in query optimization and performance tuning for complex SQL and NoSQL queries
  • Responsible for end-to-end data pipeline development, integration into CI/CD processes, automation, testing, performance optimization, security, collaboration, and documentation to ensure the seamless and reliable delivery of high-quality data solutions
  • Experienced in administering HDFS and managing data in Hadoop clusters
  • Competent in working with MySQL and NoSQL databases, including schema design and data modeling
  • Familiar with indexing and partitioning strategies
  • Adept at creating bash shell scripts to automate ETL processes and routine tasks
  • Proficient in utilizing UNIX utilities and commands for data manipulation and system administration
  • Developed JSON Scripts for deploying the Pipeline in Azure Data Factory (ADF) that process the data using the SQL Activity
  • Built an ETL job that executes the business analytical model inside a Spark JAR
  • Developed robust and scalable data pipelines using Spark, PySpark, and Scala for healthcare data extraction, transformation, and loading, ensuring data quality and traceability with GIT integration
  • Designed and implemented scalable data architectures on Azure, ensuring performance, reliability, and security, including data modeling and storage mechanisms
  • Developed and optimized ETL processes for seamless data integration from diverse sources into Azure storage solutions, focusing on data quality and workflow efficiency
  • Established Azure-based data warehouses for advanced analytics and reporting, employing dimensional modeling techniques aligned with business intelligence requirements
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, including write-back
  • Constructed scalable big data solutions using Google Cloud Platform (GCP) products and services including BigQuery, Dataflow, Dataproc, Pub/Sub, Vertex AI, and Apache Spark, maximizing speed and scalability via distributed computing strategies and configuration tuning
  • Performed data migration work, ensured robust data governance and security on GCP by implementing access controls, encryption, and monitoring processes, while also addressing compliance requirements through meticulous auditing and adherence to regulatory standards
  • Utilizing GCP tools like Apache Beam and Apache Spark for data processing and transformation, optimizing performance and scalability
  • Used Azure Data Factory, the SQL API, and the MongoDB API to integrate data from MongoDB, MS SQL, and cloud storage (Blob, Azure SQL DB); strong experience leading multiple Azure Big Data and data transformation implementations in Pharmacovigilance
  • Optimized Spark and PySpark jobs for performance, employing techniques like partitioning and caching to efficiently process large healthcare datasets, while addressing data quality and compliance with regulatory standards
  • Developed data visualization dashboards using Power BI and MicroStrategy to communicate findings effectively.
  • Developed, implemented and maintained data analytics protocols, standards, and documentation.
  • Ensured data quality through rigorous testing, validation, and monitoring of all data assets, minimizing inaccuracies and inconsistencies.
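The bullets above describe converting unstructured JSON and log data into structured data sets for downstream analysts. A minimal, stdlib-only sketch of that kind of flattening step (the field names and records here are hypothetical, not taken from the actual pipelines):

```python
import json

# Hypothetical raw JSON log lines, standing in for unstructured input.
raw_lines = [
    '{"event": "claim", "patient": {"id": "p1"}, "amount": 120.5}',
    '{"event": "claim", "patient": {"id": "p2"}, "amount": 80.0}',
    'not-json garbage line',
]

def flatten(line):
    """Parse one log line into a flat record; return None if unparseable."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return None
    return {
        "event": obj.get("event"),
        "patient_id": obj.get("patient", {}).get("id"),
        "amount": obj.get("amount"),
    }

# Keep only the parseable rows; the result is a structured data set
# ready to load into a warehouse table.
records = [r for r in (flatten(ln) for ln in raw_lines) if r is not None]
```

In a production pipeline this logic would typically run inside a PySpark job over many files rather than an in-memory list, but the parse-and-flatten shape is the same.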

Senior Data Engineer - ETL Developer

Cox Automotive
  • Proficiency in Python, PyTorch, TensorFlow, SQL, ELT development (particularly in Snowflake), and ELT orchestration using Airflow
  • Extensive experience in ETL processes, especially with Informatica Power Center, and expertise in integrating data from diverse sources into Data Warehouses
  • Designed and implemented data solutions utilizing various non-relational databases and data stores
  • Evaluated suitability and selected appropriate non-relational databases and data stores (object storage, document or key-value stores, graph databases, column-family databases) for specific use cases
  • Designed optimized data models and schema designs for efficient storage and retrieval
  • Strong AWS Cloud skills, including experience with analytical services, non-relational databases, data stores and Terraform
  • Familiarity with reporting tools, production support, Python data manipulation, and version control with GIT
  • Knowledge of New Relic for monitoring and performance management, as well as experience with project management using Rally and programming in Scala
  • Ensuring data is stored in a way that maximizes efficiency and performance, utilizing appropriate data structures such as arrays, lists, trees, graphs, or hash tables
  • Proficient in migrating data from EC2 instances to AWS MWAA, encompassing data analysis, transformation, transfer, and validation with a strong focus on security and compliance
  • Collaborated with Data Scientist team, Business Process Owners to capture business and functional requirements for Scope of Data Migration
  • Implemented real-time data streaming solutions using Kinesis and Firehose to ingest and process large volumes of data
  • Developed and optimized Apache Spark jobs for efficient data processing and analysis, ensuring scalability and performance
  • Designed and maintained data pipelines to seamlessly integrate streaming data from Kinesis and Firehose into Spark for real-time analytics and insights generation
  • Worked on GCP services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver monitoring, and Cloud Deployment Manager
  • Hands-on experience with AWS services such as S3, AWS Glue, Redshift, EMR, Kinesis, Firehose, IAM roles and permissions, Lambda functions, SQS, SNS, and EC2
  • Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket
  • Selecting appropriate AWS services to design and deploy an application based on given requirements
  • Orchestrated and automated complex data workflows using Apache Airflow, ensuring efficient ETL processes, task dependencies, error handling, and scalability
  • Managed security groups on AWS, focusing on high-availability, fault-tolerance, and auto scaling using Terraform templates
  • Implemented continuous integration and continuous deployment with AWS Lambda and AWS CodePipeline
  • Used Pandas in Python for Data Cleansing and validating the source data
  • Experienced with GCP services, including Compute Engine, Cloud Load Balancing, Cloud Storage, and Cloud SQL
  • Skilled in AWS services and technologies, including MWAA configuration and DAG orchestration
  • Able to optimize performance, troubleshoot, and ensure a smooth data migration process
  • Collaborated with data science teams to develop AI chatbots optimized using machine learning, NLP, generative AI, and LLM fine-tuning with GPT-3.5 for prompt customer responses.
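The Airflow/MWAA orchestration described above centers on declaring task dependencies and letting the scheduler resolve a valid execution order. A stdlib-only sketch of that dependency resolution (the task names are hypothetical; a real deployment would declare these as operators in an Airflow DAG file rather than a plain dict):

```python
from graphlib import TopologicalSorter

# Hypothetical ETL task graph: each task maps to its upstream dependencies,
# mirroring `extract >> transform >> [load, validate]`-style Airflow wiring.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "validate": {"transform"},
    "report": {"load", "validate"},
}

# static_order() yields every task only after all of its upstreams,
# which is the guarantee an orchestrator's scheduler provides.
order = list(TopologicalSorter(dag).static_order())
```

Airflow adds retries, scheduling intervals, and error handling on top of this ordering; the dict form just makes the dependency-resolution idea concrete.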

Research Assistant

George Mason University
08.2021 - 01.2023
  • Conducted text analysis using regression models and performed medical image processing using NLP with 92% accuracy
  • Developed computational methods and algorithms to analyze and quantify biomedical data, achieving 87% accuracy in outcome prediction by applying information analysis, healthcare interoperability tools (HL7/FHIR), visualization, and SQL
  • Worked on a Smart Watch Prediction Model, achieving a 90% success rate for health record accuracy
  • Assisted in the Empowered Communities Opioid Project, analyzing behavioral data resulting in a 95% reduction in processing time.
  • Gathered, arranged, and corrected research data to create representative graphs and charts highlighting results for presentations.

Clinical Resolution Data Analyst Intern

HTC Global
05.2022 - 08.2022
  • Analyzed real-time data using machine learning for specific clients
  • Developed prediction models and managed clients directly
  • Expertise in data computation and security on GCP.

Senior Data Engineer

APCER Life Science
01.2018 - 12.2020
  • Led end-to-end data integration and migration projects, employing Python, SQL, and ETL processes to merge Pharmacovigilance ICSR data from the ARISg and Argus databases as part of a business merger
  • Built robust data pipelines that extract, transform, and load (ETL) data from various sources into structured formats using appropriate data structures
  • Used ETL processes to combine multiple data sources, with a SQL framework serving as an intermediate database for analysis
  • Completed 6 cycles of data transfer from the legacy system to the target system
  • Collaborated with Business Process Owners to capture business and functional requirements for Scope of Data Migration
  • Led Pre-load and Post load validation report publications and follow-up actions
  • Implemented a real-time load summary report using SQL and a reporting dashboard to provide key insights on data migration status, eliminating all manual effort to update statistics after each load attempt
  • Analyzed adverse drug effect data using Python, SQL, SAS, and visualizations; performed predictions of possible positive cases and coded events
  • Responsible for data extraction, client management, and data management
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, including write-back
  • Used Azure Data Factory, SQL API and MongoDB API and integrated data from MongoDB, MS SQL, and cloud (Blob, Azure SQL DB)
  • Strong experience of leading multiple Azure Big Data and Data transformation implementations in Pharmacovigilance
  • Implemented large Lambda architectures using Azure Data platform capabilities like Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML and Power BI
  • Designed end to end scalable architecture to solve business problems using various Azure Components like HDInsight, Data Factory, Data Lake, Storage and Machine Learning Studio
  • Developed JSON Scripts for deploying the Pipeline in Azure Data Factory (ADF) that process the data using the SQL Activity
  • Involved in complete project life cycle starting from design discussion to production deployment
  • Optimized Hive queries and used Hive on top of Spark engine
  • Proficient with Azure Cloud Platform services like Azure Data Factory (ADF), Azure Data Lake, Azure Blob Storage, Azure SQL Analytics, Azure Databricks
  • Worked on Sequence files, Map side joins, Bucketing, Static and Dynamic Partitioning for Hive performance enhancement and storage improvement
  • Experience retrieving data from Oracle using PHP and Java
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs
  • Proficient in designing and building data pipelines, conducting ETL processes, and integrating machine learning models into data flows to support data-driven decision-making
  • Worked closely with the business team to gather their requirements and new support features
  • Built a 16-node cluster for the Data Lake design using the Cloudera Distribution
  • Responsible for building scalable distributed data solutions using Hadoop
  • Implemented and configured High Availability Hadoop Cluster
  • Designing and developing data pipelines to ETL data from SAP S/4 HANA, SAP IBP, and other SAAS data warehouses
  • Installed and configured Hadoop Clusters with required services (HDFS, Hive, HBase, Spark, Zookeeper)
  • Enhanced Hive scripts to analyze data; PHI was categorized into segments, and promotions were offered to customers based on those segments
  • Extensive experience in writing Pig scripts to transform raw data into baseline data
  • Developed UDFs in Java as and when necessary to use in Pig and HIVE queries
  • Worked on Oozie workflow engine for job scheduling
  • Created Hive tables, partitions and loaded the data to analyze using HiveQL queries
  • Created different staging tables like ingestion tables and preparation tables in Hive environment
  • Created tables in HBase to store the variable data formats of data coming from different upstream sources
  • Experience in managing and reviewing Hadoop log files
  • Good understanding of ETL tools and how they can be applied in a Big Data environment
  • Handled data processing for 25 clients with different data sets, predicting performance and positive and negative events.
  • Optimized data pipelines by implementing advanced ETL processes and streamlining data flow.
  • Prepared documentation and analytic reports, delivering summarized results, analysis and conclusions to stakeholders.
  • Trained staff and updated training documents to meet regulations and standards.
  • Reduced workload backlog by effectively prioritizing high-priority cases based on severity level and regulatory deadlines during peak periods of volume influx.
  • Championed the adoption of agile methodologies within the team, resulting in faster delivery times and increased collaboration among team members.
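The pre-load and post-load validation reports mentioned above typically reconcile row counts and per-row checksums between the legacy and target systems. A stdlib-only sketch of that reconciliation (the record layout and values here are hypothetical):

```python
import hashlib

def row_checksum(row):
    """Stable checksum over a row's fields, for source/target comparison."""
    payload = "|".join(str(v) for v in row).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def validate_load(source_rows, target_rows):
    """Return a small validation report comparing source and target."""
    src = {row_checksum(r) for r in source_rows}
    tgt = {row_checksum(r) for r in target_rows}
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "missing_in_target": len(src - tgt),
        "unexpected_in_target": len(tgt - src),
        "passed": src == tgt and len(source_rows) == len(target_rows),
    }

# Hypothetical migration batch: one row failed to land in the target.
source = [("case1", "drug_a"), ("case2", "drug_b"), ("case3", "drug_c")]
target = [("case1", "drug_a"), ("case2", "drug_b")]
report = validate_load(source, target)
```

A real migration would run this per table and per load cycle, feeding the counts into the load summary dashboard; the set-of-checksums comparison is just the simplest form of the idea.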

Education

Master of Science - Data Analytics & Health Informatics

George Mason University
Fairfax, VA
05.2001 - 05.2022

Skills

Scripting Languages: Python, R, SQL, Java, PHP, Bash, Powershell, Pyspark, Scala

Accomplishments

  • Excellence in Performance Award for business client management at APCER Life Science, 01/2020
  • Excellent performance in data engineering and recognition in cloud data migration, 01/2020
  • Best Debut Performance in Pharmacovigilance Data, 12/2018

AWS Certifications

  • AWS Certified Data Engineer, https://www.credly.com/badges/885e9f66-7c47-4fea-bb5b-3c6489f30fdc/public_url, 03/2024
  • AWS Certified Machine Learning – Specialty, https://www.credly.com/badges/15e9f69c-8ec3-4247-ab25-89f50b88af64/public_url, 04/2024

Projects

Capstone Project- Effect of Social Determinants incidents on Diabetes, 2022, https://github.com/TejaswaniDash/Effect-of-Social-Determinants-incidents-on-Diabetes- 

  • Used LASSO regression to construct a causal network explaining how variation in social determinants affects diabetes incidence, achieving an accuracy of 76%.
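A minimal sketch of the LASSO step in this project, on synthetic data (the features and coefficients are invented for illustration, and the sketch assumes NumPy and scikit-learn are available):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic stand-in for social-determinant features: only two of the
# ten features actually drive the (hypothetical) incidence signal.
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# The L1 penalty drives irrelevant coefficients exactly to zero, which
# is what makes LASSO usable for selecting edges in a causal network.
model = Lasso(alpha=0.1).fit(X, y)
selected = [i for i, c in enumerate(model.coef_) if abs(c) > 1e-6]
```

The surviving nonzero coefficients indicate which predictors the penalty kept, i.e., the candidate edges of the network.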


AI Project on Fingerphoto, 2022, https://github.com/TejaswaniDash/AIProject-Color-space-finger-photo-presentation-detection-on-background-variation-using-ResNet 

  • Addressed presentation attack issues using Convolutional Neural Networks (CNNs) for Fingerphoto-based authentication, a touchless authentication method in medical patient scenarios, incorporating GAN models.


Credit Card Fraud Prediction using ML and AI, 2022, https://github.com/TejaswaniDash/Credit-Card-Detection 

  • Achieved 96% accuracy and minimized false negatives using Python, logistic regression, and ensemble models.
  • Employed exploratory data analysis (EDA), implemented advanced classification algorithms, and fine-tuned models via hyperparameter optimization to effectively predict fraudulent credit card transactions.
  • This approach yielded reliable detection of fraudulent activity.


Machine Learning Model to predict Covid infection and death rates, 2022, https://github.com/TejaswaniDash/Covid-19-Death-Recovery-and-confirmed-Prediction-and-Analysis-using-ML-and-AI 

  • Utilized European and country-level USA datasets to predict future US infection and death rates with 90% accuracy. Compared the performance of multiple ML models (regression, random forest, etc.), LLMs, and neural networks.


Text Analysis of data to predict sentiment using NLP, LLM, 2023, https://github.com/TejaswaniDash/Text-Analysis-of-data-to-predict-sentiment 

  • Employed LASSO regression, multi-level regression, and LLM/ML tools to create a network for sentiment prediction, achieving a 95% success rate in predicting sentence sentiment.


Mental Health in Tech world using Deep Learning and ML, 2021, https://github.com/TejaswaniDash/Mental-Health-In-Tech-World

  • Analyzed and predicted the need for medical treatment for mental health issues in the tech industry using support vector machines, decision trees, and random forests.
