Vinaybabu Bandaru

Sr Cloud Data Engineer & Python Developer

Summary

Experienced data engineering professional with a proven track record of developing and managing efficient data systems. Recognized for expertise in data warehousing and ETL processes, with extensive experience designing and implementing scalable data architectures, optimizing data pipelines, and leveraging big data technologies. Proficient in SQL, Python, Spark, and cloud platforms, with a keen ability to align technical solutions with business objectives. Collaborative team leadership, adaptability to evolving project requirements, and a results-driven approach have been key to delivering impactful results.

Overview

10 years of professional experience

Work History

Lead Data Engineer

ZYTER|Trucare
02.2024 - Current
  • Managed large-scale data storage and processing using Amazon S3 and HDFS, ensuring efficient data availability, fault tolerance, and integration with data lakes and warehouses
  • Designed and developed scalable data pipelines for business intelligence (BI) and analytics, integrating SnapLogic and Azure Data Factory (ADF) for optimized ETL/ELT processes
  • Automated workflows for seamless data extraction, transformation, and loading into data warehouses for real-time insights
  • Proficient in building data models and optimizing SQL queries for BI tools like Tableau, Power BI, and Looker, enabling data-driven decision-making
  • Created interactive dashboards and visualizations to present actionable insights across business units
  • Integrated Generative AI solutions into production systems for automated content generation, enhancing marketing automation and content creation workflows
  • Optimized data storage strategies for unstructured data (e.g., text, images) used in training AI models
  • Optimized rendering performance in ReactJS by implementing lazy loading, memoization, and shouldComponentUpdate to enhance the overall user experience
  • Configured and optimized NiFi processors for seamless integration with various data sources such as databases, APIs, and file systems
  • Migrated on-premises PostgreSQL databases to AlloyDB, ensuring high availability, scalability, and improved performance for data-intensive applications and workloads
  • Designed and implemented scalable data architectures in Google BigQuery to support large-scale analytics and reporting across diverse data sources and business units
  • Designed and built automated ETL pipelines for data acquisition from diverse sources (APIs, databases, third-party vendors, flat files), ensuring reliable and timely data delivery for BI purposes
  • Designed, developed, and deployed scalable data pipelines using GCP services such as BigQuery, Dataflow, Cloud Storage, and Cloud Pub/Sub to enable real-time and batch data processing
  • Managed task dependencies and scheduling within DAGs to ensure proper execution order, minimizing delays and optimizing resource usage
  • Optimized data pipelines by leveraging BigQuery's partitioning, clustering, and query optimization techniques to reduce processing time and cost for large datasets (see the first sketch at the end of this role)
  • Developed and managed end-to-end data pipelines in Azure Databricks using PySpark and Scala, ensuring efficient data processing and transformation for large-scale datasets
  • Automated ETL workflows using Azure Data Factory and AWS Glue, integrating Spark Streaming and Apache Kafka for real-time data processing
  • Implemented data integrity checks and quality assurance measures to ensure accuracy across datasets
  • Utilized business intelligence tools like Looker, Tableau, and Power BI to design intuitive, high-performance dashboards and reporting solutions, providing actionable insights to business stakeholders
  • Built and managed data warehouses in BigQuery, optimizing schema design, partitioning, and clustering for high-performance querying and cost efficiency across large datasets
  • Leveraged DAGs to automate data extraction, transformation, and loading (ETL) processes, significantly reducing manual intervention and improving process efficiency
  • Integrated RESTful APIs and GraphQL endpoints with ReactJS to enable seamless data flow between the front end and backend services
  • Developed and orchestrated data workflows using Cloud Composer (Airflow), automating ETL processes and ensuring timely, reliable data movement between systems in the GCP ecosystem
  • Implemented NiFi's data provenance features to track data lineage, enabling enhanced visibility and traceability of data across complex workflows
  • Integrated AlloyDB with GCP BigQuery and other cloud services to enable hybrid data processing solutions, combining real-time transactional data with large-scale analytics
  • Collaborated with data analysts and business stakeholders to define key metrics and KPIs, creating data models and BI reports that support data-driven decision-making across the organization
  • Applied advanced data transformation techniques, including text cleaning, tokenization, and image preprocessing, to prepare datasets for Generative AI model training
  • Developed complex SQL queries and optimized database performance through indexing and restructuring
  • Troubleshot and resolved production issues related to data ingestion, transformation, and integration, ensuring timely incident resolution
  • Developed custom processors and controller services in Java to extend NiFi's out-of-the-box capabilities, addressing unique business requirements
  • Developed and managed BigQuery ETL processes, integrating data from Google Cloud Storage, Cloud Pub/Sub, and other sources to deliver high-quality, real-time data insights
  • Designed and implemented machine learning models using Azure Machine Learning Studio, ensuring efficient model training, evaluation, and deployment pipelines
  • Collaborated with backend teams to integrate ReactJS with Node.js/Express APIs, ensuring efficient data handling and state management
  • Leveraged AlloyDB's advanced indexing and query optimization features to ensure efficient data retrieval and processing, reducing query latency for mission-critical applications
  • Integrated Azure Synapse Analytics, AWS Glue, and Apache Airflow for seamless orchestration of data flows and transformation tasks
  • Leveraged Microsoft Fabric for scalable data integration and real-time analytics solutions
  • Managed data acquisition and transformation processes, ensuring data quality, integrity, and consistency before feeding into business intelligence platforms for analysis and reporting
  • Implemented proactive monitoring and alerting mechanisms using tools such as Grafana, Prometheus, and ELK stack to detect potential system failures
  • Collaborated with UX/UI designers to translate wireframes and mockups into responsive, high-performance web interfaces using ReactJS and CSS frameworks like Bootstrap and Material-UI
  • Collaborated with cross-functional teams to build and deploy machine learning models in Azure Databricks, utilizing its integration with Azure Machine Learning for scalable and optimized training
  • Designed and implemented star and snowflake schema for optimized reporting and analytics
  • Utilized Snowflake and AWS Athena for optimized querying and data storage
  • Managed performance tuning, indexing, and schema designs for enterprise applications
  • Collaborated with data teams to define DAG structures that ensure clear task execution order, error handling, and logging to support complex data pipeline requirements
  • Designed and deployed SSIS packages to extract, transform, and load data from multiple sources into centralized data warehouses, ensuring seamless data integration
  • Automated data migration, cleansing, and scheduled updates using SSIS, reducing manual effort by 80% and improving operational efficiency
  • Executed real-time data streaming solutions with Apache Kafka and Spark Streaming, facilitating immediate analytics and decision-making (see the second sketch at the end of this role)
  • Optimized data ingestion and processing workflows using Microsoft Fabric tools
  • Led efforts in managing data security and compliance by leveraging NiFi's built-in data encryption, access control, and authentication mechanisms
  • Built and maintained data lakes by integrating Azure Databricks with Azure Data Lake Storage, enabling efficient data storage and real-time analytics for business intelligence
  • Automated model deployment and monitoring using Azure ML pipelines, streamlining the end-to-end workflow from data preparation to production
  • Built and maintained scalable data warehouses in Snowflake and Amazon Redshift, optimizing storage configurations, partitioning, and query performance for large datasets
  • Developed and executed complex queries to extract actionable business insights
  • Developed predictive models and analytics dashboards using Alteryx and Power BI, enabling actionable insights for strategic decision-making
  • Integrated external data sources to enhance model accuracy and reporting capabilities
  • Collaborated with cross-functional teams to perform root cause analysis (RCA) and implement preventive measures to avoid recurring issues in production
  • Automated data ingestion workflows using Google Cloud Dataflow and Cloud Composer, ensuring seamless and reliable data movement into BigQuery for downstream analytics
  • Configured and managed cloud resources on AWS, Azure, and GCP for optimized computing and storage, including EC2, Synapse Analytics, Hadoop, and RDS
  • Ensured data security, scalability, and fault tolerance across cloud environments
  • Implemented RESTful API services for efficient data interchange, enabling system integration and data accessibility
  • Developed custom reporting solutions using SQL and DAX to support cross-departmental decision-making
  • Designed and implemented high-availability and disaster recovery strategies using AlloyDB's multi-zone deployment capabilities, ensuring business continuity and minimal downtime
  • Leveraged Azure Machine Learning's automated machine learning (AutoML) feature to accelerate model development and improve model accuracy for business-critical applications
  • Contributed to data governance and metadata management using tools like AWS Athena and Apache NiFi, ensuring compliance with data policies and high data integrity
  • Built infrastructure for real-time content generation using Generative AI models, supporting low-latency applications such as chatbots and personalized recommendations
  • Automated deployment and model monitoring using Scikit-learn
  • Integrated GCP BigQuery with external tools and services such as Google Analytics, Firebase, and Google Cloud AI, providing a unified solution for data analysis and business intelligence
  • Automated deployment processes using Jenkins, enhancing CI/CD pipelines and streamlining code deployment to AWS and Azure environments
  • Optimized ETL processes and scheduled automated pipeline execution using ADF triggers
  • Integrated Azure ML with Azure Databricks for distributed training and high-performance machine learning on large-scale datasets
  • Implemented security and data governance policies within GCP, using IAM (Identity and Access Management), Cloud Security Command Center, and VPC for secure and compliant data handling
  • Used Webpack and Babel for building and bundling ReactJS applications, ensuring compatibility across browsers and reducing JavaScript bundle sizes
  • Led data migration projects for seamless extraction, transformation, and loading (ETL) between on-premise systems and cloud platforms
  • Integrated Azure Data Factory pipelines with Snowflake and Amazon RDS for consistent data orchestration
  • Leveraged Tableau, Power BI, and Looker to create visually compelling analytics dashboards, driving data-driven decisions
  • Implemented advanced analytics techniques for real-time performance monitoring and business insight generation
  • Architected and maintained secure data lakes using AWS technologies, including EMR, Lambda, and Redshift, ensuring robust and compliant storage solutions for large-scale data sets
  • Designed systems for efficient querying and data exchange in big data environments
  • Environment: AWS, Azure, Snowflake, Redshift, Databricks, Apache Spark, Hadoop, Tableau, Power BI, Looker, SQL, PL/SQL, Python, Generative AI, SnapLogic, Azure Data Factory (ADF), HDFS, S3, Lambda, Kafka, Cassandra, Kubernetes, Elasticsearch, Data Governance, Cloud Migration, Data Lakes, Real-Time Analytics, Machine Learning, API Development, Data Integration, Apache Airflow, and more.
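
First sketch (illustrative only, not part of the original role description): one way a date-partitioned, clustered BigQuery table of the kind referenced above might be created with the google-cloud-bigquery Python client. The project, dataset, table, and column names are hypothetical placeholders.

    # Hypothetical example: partitioned and clustered BigQuery table via the Python client.
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")  # assumes default credentials

    table_id = "example-project.analytics.events"
    schema = [
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
    ]

    table = bigquery.Table(table_id, schema=schema)
    # Partition by day on the event timestamp so queries can prune by date range.
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="event_ts",
    )
    # Cluster on a frequently filtered column to reduce bytes scanned per query.
    table.clustering_fields = ["customer_id"]

    client.create_table(table, exists_ok=True)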
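
Second sketch (illustrative only, not part of the original role description): a minimal PySpark Structured Streaming job that reads from a Kafka topic and lands micro-batches in object storage, approximating the real-time streaming work referenced above. The broker, topic, schema, and paths are hypothetical, and the Spark-Kafka connector package is assumed to be available on the cluster.

    # Hypothetical example: Kafka -> Parquet streaming pipeline with Structured Streaming.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-stream-demo").getOrCreate()

    schema = StructType([
        StructField("order_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
           .option("subscribe", "orders")                     # placeholder topic
           .load())

    # Kafka values arrive as bytes; cast to string and parse the JSON payload.
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("d"))
              .select("d.*"))

    query = (parsed.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/orders/")
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/orders/")
             .trigger(processingTime="1 minute")
             .start())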

Cloud Data Engineer

HSBC
10.2022 - 09.2023
  • Developed scalable ETL pipelines leveraging Azure Databricks, Dataflow, Snowflake, PySpark, and Azure Data Factory to ingest, transform, and enrich data stored in Azure Data Lake
  • Created data ingestion pipelines using Spark SQL and integrated with Cosmos DB for seamless data flow
  • Designed and implemented real-time data processing workflows with SnapLogic Ultra Pipelines and Azure, ensuring low-latency delivery for critical business operations
  • Integrated SnapLogic with multiple cloud platforms (Azure, AWS, GCP) for data processing across environments
  • Built efficient data transformation workflows to cleanse, enrich, and prepare datasets for analytics and reporting
  • Utilized tools like SnapLogic, ADF, and Alteryx for dynamic workflows, ensuring data quality and consistency
  • Automated data quality checks to ensure accuracy in SQL Server and cloud databases
  • Led migration efforts from legacy systems to modern cloud data architecture, using Azure Data Factory, GCP, Databricks, and Snowflake to integrate, store, and process large datasets
  • Migrated legacy on-premises data warehouses to Snowflake, ensuring minimal downtime and data integrity
  • Built custom SQL queries and BigQuery scripts to extract, transform, and load (ETL) data, enabling business stakeholders to generate actionable insights from large datasets
  • Integrated Cloud-based BI solutions with on-premise databases and cloud storage, ensuring seamless data flow between acquisition, processing, and presentation layers
  • Worked with development teams to implement AlloyDB's automated backups, ensuring that data is securely backed up and available for recovery in case of failure
  • Leveraged GCP Pub/Sub and Dataflow to build real-time data streaming pipelines, enabling immediate insights and automated decision-making across data sources
  • Integrated DAGs with external systems and APIs to fetch, process, and store data, ensuring seamless data flow across various cloud and on-premise environments
  • Integrated data into BI platforms such as Power BI, Tableau, and SQL Server, enabling advanced analytics and reporting
  • Built and maintained data marts to improve the performance and usability of BI solutions
  • Designed and executed real-time dashboards and CX scorecards using Qualtrics for actionable insights
  • Automated deployment and scaling of AlloyDB instances using Terraform and Google Cloud Deployment Manager, streamlining database provisioning and configuration in cloud environments
  • Integrated BigQuery with Google Data Studio, Looker, and other BI tools to deliver real-time dashboards and visualizations that support data-driven decision-making
  • Automated data collection from various sources using APIs, custom scripts, and GCP Dataflow, reducing manual data entry and improving data accuracy and timeliness
  • Optimized NiFi cluster performance for high availability and fault tolerance, ensuring reliable data processing at scale
  • Developed and maintained complex single-page applications (SPAs) using ReactJS, ensuring a seamless user experience and optimal performance across browsers
  • Created interactive notebooks in Azure Databricks for data exploration, transformation, and visualization, enabling stakeholders to access actionable insights with ease
  • Managed data versioning and change management processes, ensuring smooth updates to production data workflows and minimizing disruptions
  • Coordinated the deployment and scaling of NiFi in a multi-node environment to support large-scale, real-time data integration projects
  • Built and maintained data marts and data lakes to centralize and organize large volumes of data, enabling advanced analytics and reporting through business intelligence platforms
  • Applied machine learning models within Microsoft Fabric for uncovering actionable business insights
  • Integrated Azure Synapse Analytics and Power BI for advanced data visualization and model performance tracking
  • Collaborated with application teams to optimize database schema design and indexing strategies in AlloyDB, resulting in improved query performance and system responsiveness
  • Implemented dynamic DAG generation based on external configurations or parameters, allowing for reusable and flexible data pipelines that can scale with business needs
  • Integrated NiFi with other big data technologies, including Hadoop, Spark, and Kafka, to support end-to-end data pipelines for analytics and reporting
  • Automated infrastructure provisioning using Google Cloud Deployment Manager and Terraform, streamlining resource management and ensuring reproducible environments for data engineering tasks
  • Implemented component-based architecture using ReactJS to create modular, reusable UI components that adhere to DRY (Don't Repeat Yourself) principles
  • Implemented batch and stream processing workflows using Azure Databricks and Structured Streaming to handle both real-time and historical data processing needs
  • Developed efficient data aggregation and transformation logic to support complex queries and reporting, optimizing the performance of BI tools and dashboards
  • Collaborated with cross-functional teams to create data-driven solutions by applying predictive modeling, clustering, and classification techniques in Azure ML
  • Leveraged ReactJS hooks such as useState, useEffect, and useReducer to manage component state and side effects, streamlining code complexity and enhancing maintainability
  • Implemented robust data governance and security protocols across SnapLogic, Snowflake, and Azure to ensure compliance with organizational standards
  • Managed role-based access control (RBAC) in Snowflake and utilized Azure Key Vault for enhanced security
  • Implemented complex data transformations and robust error-handling mechanisms, improving data accuracy and reducing processing errors by 95%
  • Enhanced SSIS packages to handle large datasets (1TB+ daily), reducing processing time by 50% through parallel execution and resource optimization
  • Applied BigQuery ML to build and deploy machine learning models directly within the data warehouse, enabling predictive analytics and advanced data insights without data movement
  • Provided technical guidance and best practices on NiFi's use, helping cross-functional teams improve data flow efficiency and reduce time to insight
  • Utilized Airflow's native features such as task retries, XCom, and sensors within DAGs to handle failures, manage task state, and create robust, fault-tolerant data workflows (see the first sketch at the end of this role)
  • Developed and maintained CI/CD pipelines using GitHub Actions, Azure DevOps, and DBT for streamlined code deployment, version control, and workflow automation
  • Automated API integrations with SnapLogic's API Management to ensure seamless data exchanges and integrations
  • Utilized GCP's BigQuery, Compute Engine, and Kubernetes for scalable data processing and application deployment
  • Configured Google Cloud Storage and Cloud SQL for optimized data storage and migration
  • Implemented Cloud Load Balancing and Cloud CDN for efficient networking during migration
  • Ensured compliance with data governance frameworks and security policies during the data acquisition process, maintaining proper data access controls and audit trails for sensitive business data
  • Managed and monitored machine learning experiments through Azure ML's tracking capabilities, ensuring high-quality models and continuous improvement
  • Maintained and optimized databases in production environments, including performance tuning, index optimization, and data consistency checks
  • Monitored and optimized SQL Server and Snowflake databases for improved performance, query execution times, and storage efficiency
  • Employed tools such as SQL Trace, TKPROF, and AWR reports to identify and resolve performance bottlenecks
  • Integrated third-party libraries and frameworks with ReactJS, such as Chart.js, D3.js, and Redux-Saga, to implement data visualizations and handle complex state management logic
  • Implemented model versioning and management using Azure Machine Learning's model registry, ensuring smooth version control and governance for machine learning models
  • Proficient in using PySpark, Databricks, and Talend for data transformation and migration tasks
  • Optimized Spark Streaming for real-time data processing from sources like Apache Flume
  • Collaborated with cross-functional teams to design scalable data models, ensuring data consistency, security, and compliance across cloud platforms and databases
  • Facilitated the design of KPI dashboards and CX programs to track customer feedback and performance
  • Led successful cloud migrations from on-premise systems to cloud-based solutions, ensuring data integrity and cost optimization
  • Implemented data-loading strategies for Data Lakes and Data Warehouses using ADF and Snowflake for seamless and automated data ingestion (see the second sketch at the end of this role)
  • Designed and deployed real-time dashboards and KPI reports using Qualtrics to provide insights into customer feedback trends
  • Configured Power BI for interactive data visualizations that drive business decisions
  • Developed data preprocessing pipelines within Azure ML for large-scale datasets, integrating with Azure Data Lake, Azure SQL Database, and other data storage solutions
  • Applied best practices for data backup, disaster recovery, and data security to ensure the integrity and confidentiality of production data
  • Conducted extensive SQL tuning and optimization tasks, including query restructuring, indexing, and partitioning to improve the overall performance of SQL Server and Snowflake queries
  • Managed and optimized API integrations using SnapLogic, reducing operational costs and enhancing scalability
  • Automated data movement and ETL processes for seamless integration between enterprise systems and cloud platforms
  • Used Azure Monitor and Log Analytics for pipeline health monitoring, ensuring proactive error resolution and performance optimization
  • Led debugging and optimization efforts to ensure high availability and reliability of ETL processes
  • Committed to ongoing professional development, staying up-to-date with advancements in Snowflake, Azure, GCP, and emerging data engineering technologies
  • Applied domain-specific knowledge to enhance the impact of engineering solutions in the Finance and Banking sectors
  • Environment: Snowflake, Databricks, Azure Data Factory (ADF), Azure Synapse Analytics, GCP, Power BI, PySpark, SQL, Data Lakes, ETL, SnapLogic, Machine Learning, Alteryx, Spark Scala, Data Governance, CI/CD, DBT, Cosmos DB, AWS, API Integration, Python (NumPy, Pandas), Kubernetes, Talend, SQL Server.
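
First sketch (illustrative only, not part of the original role description): a minimal Airflow DAG showing the retries, sensor, and XCom patterns mentioned above as a generic template rather than the actual production pipeline. The DAG id, file path, and callables are hypothetical.

    # Hypothetical example: Airflow 2.x DAG with a file sensor, retries, and XCom hand-off.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.sensors.filesystem import FileSensor

    def extract(**context):
        row_count = 42  # placeholder for a real extraction step
        context["ti"].xcom_push(key="row_count", value=row_count)

    def load(**context):
        row_count = context["ti"].xcom_pull(task_ids="extract", key="row_count")
        print(f"loading {row_count} rows")  # placeholder for the real load step

    with DAG(
        dag_id="example_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        # Wait for the daily drop file before running downstream tasks.
        wait_for_file = FileSensor(task_id="wait_for_file", filepath="/data/in/daily.csv",
                                   poke_interval=300, timeout=3600)
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)

        wait_for_file >> extract_task >> load_task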
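
Second sketch (illustrative only, not part of the original role description): one way the Snowflake side of an automated load might be scripted with snowflake-connector-python; in practice the role above describes orchestrating such loads through ADF. The account, credentials, stage, and table names are hypothetical placeholders.

    # Hypothetical example: COPY INTO a Snowflake table from a named stage.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="example_account",
        user="etl_user",
        password="***",          # placeholder; real pipelines would use a secrets store
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="RAW",
    )
    try:
        cur = conn.cursor()
        # COPY INTO pulls staged files (landed by the upstream pipeline) into the target table.
        cur.execute("""
            COPY INTO RAW.ORDERS
            FROM @RAW.ORDERS_STAGE
            FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
            ON_ERROR = 'ABORT_STATEMENT'
        """)
    finally:
        conn.close()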

Sr. Data Engineer & Python Developer

Tata Consultancy Services
09.2020 - 04.2022
  • Implemented the PolyBase technique for data loading and exporting in Azure Synapse Analytics using serverless SQL pools and Spark pools
  • Migrated workflow orchestration from Oozie to Apache Airflow for incremental loads on Hadoop, extracting data from RDBMS sources
  • Implemented Kafka high-level consumers for obtaining data from Kafka partitions and moving it into HDFS
  • Managed resources and scheduling on Azure Kubernetes Service for handling online and batch workloads
  • Leveraged Azure DevOps (VSTS) for CI/CD across Databricks, Azure Synapse Analytics, DBT, Dataflow, and the data warehouse; used Active Directory for authentication and Apache Ranger for authorization
  • Supported incident response and post-mortem processes, documenting incidents, resolutions, and steps to improve system reliability
  • Ensured compliance and security within AlloyDB by implementing encryption, access controls, and audit logs to meet data privacy and regulatory requirements
  • Collaborated with cross-functional teams to define data schemas, design data models, and optimize queries for performance in BigQuery, enhancing data accessibility and usability
  • Monitored and fine-tuned DAG performance through execution metrics, logs, and visualizations, ensuring optimal performance and identifying bottlenecks in data processing
  • Used Scala for concurrency support, developing map-reduce jobs for JVM-based data processing
  • Executed SQL queries and published data for interactive Power BI dashboards and reporting
  • Implemented Dimensional Data Modeling for Multi-Dimensional STAR schemas and Snowflake Schemas
  • Utilized Google Cloud Functions for serverless compute and Cloud Run for deploying microservices, improving scalability and reducing operational overhead in data-driven applications
  • Created and managed forms using ReactJS controlled components, validation libraries like Formik and Yup, and custom input components to collect and process user data efficiently
  • Assisted in automation of manual tasks and the implementation of CI/CD pipelines to streamline deployments and reduce human error in production environments
  • Ensured data pipeline scalability by structuring DAGs to handle high-volume datasets, utilizing parallel processing and resource optimization techniques to meet growing data demands
  • Collaborated with cross-functional teams to understand business requirements and develop SnapLogic solutions tailored to meet specific objectives
  • Utilized AlloyDB's compatibility with PostgreSQL to support complex SQL queries, foreign key relationships, and other advanced database features for transactional workloads
  • Managed access control and data security in BigQuery by implementing fine-grained IAM roles, encryption, and data auditing to ensure compliance with governance standards
  • Managed and optimized GCP Cloud Storage for large-scale data storage, ensuring efficient data retrieval, security, and cost-effective use of resources
  • Worked with WebSockets to implement real-time updates in ReactJS applications, allowing users to see live data without refreshing the page
  • Conducted performance tuning of pipelines to maximize efficiency, reducing runtime and resource consumption
  • Integrated SnapLogic with downstream applications, such as Tableau, Power BI, and other BI tools, to enable data-driven decision-making
  • Leveraged BigQuery's federated queries to connect with external data sources (e.g., Google Sheets, Cloud SQL) for cross-platform data analysis
  • Provided technical training and documentation on SnapLogic solutions to ensure team-wide adoption and knowledge sharing
  • Collaborated with cross-functional teams to design GCP-based solutions for big data analytics, machine learning, and data visualization, enhancing the organization's data-driven decision-making
  • Optimized ReactJS components by employing techniques such as memoization and PureComponent to reduce unnecessary re-renders and improve app performance
  • Created test cases and validation processes to ensure the accuracy and consistency of data pipelines
  • Monitored and maintained SnapLogic environments, ensuring optimal performance and timely updates to Snaplex nodes
  • Collaborated with the backend team to design and integrate APIs with ReactJS, using tools like Axios for HTTP requests, and ensuring efficient data fetching strategies with caching mechanisms
  • Conducted unit testing, system integration testing, and user acceptance testing (UAT) for database components, ensuring smooth deployment across environments
  • Enforced database security measures, including roles, privileges, and data masking, to safeguard sensitive data
  • Authored comprehensive technical documentation, including database schemas, PL/SQL code standards, and deployment guides, facilitating team collaboration and knowledge sharing
  • Designed and implemented Azure subscriptions, data factories, Kubernetes clusters, virtual machines, SQL Azure instances, and HDInsight clusters
  • Utilized Spark Data Frames in Azure Databricks for business transformations and data cleansing
  • Developed Python scripts for ETL pipelines, DBT models, and DAG workflows in Airflow and Apache NiFi
  • Designed custom input adapters using Spark and Hive, orchestrated through Airflow, to ingest and analyze data and load it into Snowflake
  • Developed a scalable web application using Django, integrating complex business logic and user authentication
  • Led the migration of a legacy system to a Python-based platform, enhancing efficiency and scalability
  • Implemented a data processing pipeline using Python, DBT, Pandas, and NumPy, resulting in a 30% reduction in processing time
  • Designed and built RESTful APIs in Flask, facilitating seamless integration with front-end applications and third-party services (see the sketch at the end of this role)
  • Engineered an automated testing framework in Python for Azure Synapse Analytics, improving code quality and reducing bug rates by 25%
  • Managed database schema design and integration with Hadoop, Dataflow, PostgreSQL and MongoDB, optimizing data retrieval and storage
  • Contributed to an open-source project in Python, adding features and fixing bugs, demonstrating community engagement and collaboration skills
  • Implemented real-time data visualization tools using Python libraries like Matplotlib and Seaborn, aiding in insightful business decision-making
  • Spearheaded a cross-functional team in an Agile environment, successfully delivering complex project milestones on schedule
  • Continuously refactored and optimized existing Python codebase, ensuring high performance and adherence to modern coding standards
  • Environment: Python 3.7, Django, Django REST Framework, AWS, Selenium API, DevOps, Flask, React, PySpark, Spark, Spark SQL, MySQL, Cassandra, Snowflake, MongoDB, Flume, Data Warehouse, VSTS, Azure HDInsight, Databricks, Data Lake, Cosmos DB, Azure AD, Blob Storage, DBT, Data Factory, Azure Synapse Analytics, Git, Scala, Hadoop 2.x (HDFS, MapReduce, YARN), Airflow, Hive, Sqoop, HBase, Power BI.
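
Sketch (illustrative only, not part of the original role description): a minimal Flask REST API of the kind referenced above, with an in-memory store standing in for a real backend. The resource name and payload fields are hypothetical.

    # Hypothetical example: simple Flask endpoints for creating and fetching a resource.
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    _items = {}  # in-memory store standing in for a real database

    @app.route("/items", methods=["POST"])
    def create_item():
        payload = request.get_json(force=True)
        item_id = len(_items) + 1
        _items[item_id] = payload
        return jsonify({"id": item_id, **payload}), 201

    @app.route("/items/<int:item_id>", methods=["GET"])
    def get_item(item_id):
        item = _items.get(item_id)
        if item is None:
            return jsonify({"error": "not found"}), 404
        return jsonify({"id": item_id, **item})

    if __name__ == "__main__":
        app.run(debug=True)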

SQL & ETL Developer

Cogwave Software Technologies
06.2015 - 08.2020
  • Designed and implemented a robust SQL Server database schema for a Hospital Management System (HMS), developing complex queries, stored procedures, and functions to enable efficient data retrieval and manipulation
  • Integrated SQL-based reporting tools for dynamic generation of patient and hospital operation reports
  • Developed and maintained ETL pipelines using Matillion and DBT for cloud data warehouses (AWS Redshift, Google BigQuery) and Hadoop, optimizing data ingestion, transformation, and loading processes
  • Integrated data from multiple sources, improving accessibility and data accuracy
  • Collaborated with healthcare professionals to design clinical data solutions, creating an SQL-driven data warehouse for the HMS to facilitate advanced analytics and decision-making
  • Implemented database security measures to ensure patient data privacy and regulatory compliance
  • Built and maintained a POS system database to support retail operations, including inventory management and payment processing
  • Designed SQL-based solutions for real-time tracking of stock levels and sales analytics
  • Leveraged DBT for sales and customer data analytics, improving business intelligence capabilities
  • Utilized Python and Pandas for data cleaning, feature scaling, and enrichment to enhance reporting accuracy (see the sketch at the end of this role)
  • Integrated customer data from various platforms (e.g., web, mobile, CRM) into Customer Data Platforms (CDPs) and Data Management Platforms (DMPs), creating a 360-degree view of the customer for targeted marketing and segmentation
  • Utilized PySpark for large-scale data transformations and Kafka for real-time streaming data analysis
  • Built data pipelines to process, merge, and enrich data, ensuring efficient ETL workflows
  • Developed data processing pipelines in AWS and containerized web analytics applications and pipelines using Docker to ensure scalability and portability across different environments
  • Managed routine backup, recovery, and disaster recovery solutions for both HMS and POS systems, ensuring high availability and operational continuity
  • Conducted statistical analysis using Python and presented findings to stakeholders
  • Developed web services using SOAP for XML data exchange and Flask for data manipulation and retrieval
  • Worked with Jenkins for continuous integration and deployment, utilizing GIT version control for efficient code management and collaboration
  • Utilized Informatica Power Center for ETL tasks and data integration
  • Environment: SQL Server, Matillion, AWS Redshift, Hadoop, Google BigQuery, DBT, Informatica, PySpark, Kafka, Docker, Flask, Python, Pandas, NumPy, SQL, AWS, Data Warehousing, Data Integration, ETL, Dimensional Data Modelling, Web Services, UNIX, Jenkins, Git.
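
Sketch (illustrative only, not part of the original role description): Pandas-based cleaning and min-max feature scaling of the kind described above. The file names and column names are hypothetical.

    # Hypothetical example: clean a sales extract and scale a numeric column for reporting.
    import pandas as pd

    df = pd.read_csv("sales.csv")

    # Basic cleaning: drop duplicates, fill missing quantities, normalize text fields.
    df = df.drop_duplicates()
    df["quantity"] = df["quantity"].fillna(0)
    df["store"] = df["store"].str.strip().str.upper()

    # Min-max scale the numeric column used for downstream reporting.
    amount_min, amount_max = df["amount"].min(), df["amount"].max()
    df["amount_scaled"] = (df["amount"] - amount_min) / (amount_max - amount_min)

    df.to_csv("sales_clean.csv", index=False)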

Education

Master of Science - Data Science

Saint Peter's University
Jersey City, NJ
05.2001 -

Skills

Big Data Technologies: Hadoop, Spark, PySpark, Hive, Kafka, Flume, Sqoop, Oozie, Zookeeper, MapReduce, Cloudera Manager

Work Availability

Available Monday through Sunday: morning, afternoon, and evening

Quote

Success is not final; failure is not fatal: It is the courage to continue that counts.
Winston S. Churchill

Timeline

Lead Data Engineer

ZYTER|Trucare
02.2024 - Current

Cloud Data Engineer

HSBC
10.2022 - 09.2023

Sr. Data Engineer & Python Developer

Tata Consultancy Services
09.2020 - 04.2022

SQL & ETL Developer

Cogwave Software Technologies
06.2015 - 08.2020

Master of Science - Data Science

Saint Peter's University
05.2001 -