
Lokesh Bachu

Frisco, TX

Summary

Results-driven Data Engineer with over two years of experience designing and implementing efficient data engineering solutions. Highly skilled in extract, transform, load (ETL) processes using technologies such as Apache Hadoop, Apache NiFi, and Azure services. Proven expertise in data cleaning, validation, and integration to ensure high-quality, reliable data for analysis. Experienced in leveraging distributed databases such as Apache HBase and Azure Cosmos DB for scalable storage and retrieval of large datasets. Skilled in data modeling, schema design, and query optimization for enhanced performance. Collaborative team player with strong communication skills, adept at working with cross-functional teams to understand business requirements and deliver effective data solutions. Committed to continuous learning and staying current with the latest trends in data engineering and cloud technologies.

Overview

3 years of professional experience

Work History

Intern Volunteer

Servicebee LLC
05.2022 - 09.2022
  • Designed and implemented a data engineering pipeline for medical data, leveraging Azure HDInsight (based on Apache Hadoop), Azure Data Lake Storage, and distributed computing systems.
  • Led the ingestion process by utilizing Azure Data Factory, Apache Flume, and Apache NiFi to extract and ingest medical data from various sources, including electronic health records and medical devices.
  • Developed and implemented data cleaning and transformation algorithms using the MapReduce framework in Azure HDInsight, ensuring data quality and integrity.
  • Overcame scalability challenges by efficiently processing and analyzing large volumes of medical data in Azure, ensuring the pipeline's ability to handle increasing data volumes and processing requirements.
  • Implemented data anonymization techniques using Azure Data Lake Analytics and Azure Databricks to ensure compliance with privacy regulations and protect patient confidentiality.
  • Leveraged distributed databases like Azure Cosmos DB (with Cassandra API) and Apache HBase in Azure HDInsight for storing cleaned and transformed medical data, ensuring fault tolerance and high availability.
  • Utilized Azure Databricks and Azure Synapse Analytics (formerly SQL Data Warehouse) for data analysis, enabling complex computations and machine learning tasks on the distributed data.
  • Collaborated with cross-functional teams, including data scientists and healthcare professionals, to understand data requirements and design effective data models in Azure SQL Database and Azure Synapse Analytics.
  • Implemented agile methodologies, enabling iterative development, frequent feedback loops, and adaptation to changing requirements in Azure DevOps.
  • Ensured data governance by implementing best practices for data quality, security, and compliance with regulatory standards in Azure Data Catalog and Azure Purview.
  • Implemented DevOps practices for continuous integration, automated testing, and deployment using Azure DevOps pipelines, ensuring smooth development and operation of the data engineering pipeline.
  • Employed effective data modeling techniques to design schemas for storing medical data and optimize query performance in Azure SQL Database and Azure Synapse Analytics.
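The anonymization step described above can be sketched in a few lines. This is a minimal illustration, not the actual Servicebee pipeline: the salt value, record shape, and field names are assumptions made up for the example, and a production system would pull the salt from a secret store and handle many more identifier fields.

```python
import hashlib

# Hypothetical salt for the example; a real pipeline would load this
# from a secure secret store, never from source code.
SALT = "example-salt"

def pseudonymize(patient_id: str) -> str:
    """Replace a direct identifier with a stable one-way pseudonym."""
    return hashlib.sha256((SALT + patient_id).encode("utf-8")).hexdigest()

# Toy records standing in for rows from an electronic health record feed.
records = [
    {"patient_id": "P-1001", "heart_rate": 72},
    {"patient_id": "P-1002", "heart_rate": 88},
]

# Same record otherwise, but the identifier is no longer reversible.
anonymized = [{**r, "patient_id": pseudonymize(r["patient_id"])} for r in records]
```

Because the hash is deterministic, the same patient still maps to the same pseudonym across batches, which preserves joins while removing the raw identifier.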

Data Engineer

Apps Associates
01.2021 - 12.2021
  • Data Ingestion:
      - Extracted data from various sources, including databases, APIs, and files.
      - Utilized tools such as Apache Flume and Apache NiFi for efficient data ingestion.
      - Implemented data validation and transformation processes during ingestion.
  • Data Transformation and ETL (Extract, Transform, Load):
      - Designed and implemented robust ETL processes to transform raw data into structured and consistent formats.
      - Applied data cleansing techniques to ensure data quality and integrity.
      - Used SQL and scripting languages (e.g., Python, Shell) for data transformation tasks.
  • Data Modeling:
      - Developed data models that captured the relationships and structure of the data.
      - Designed star schemas and dimensional models for efficient data warehousing.
      - Implemented optimization techniques to enhance query performance.
  • Data Warehousing:
      - Implemented end-to-end data warehousing solutions to support reporting and analytics.
      - Created and maintained data warehouse architectures, including staging, integration, and presentation layers.
      - Utilized technologies such as Apache Hive and Apache Impala for querying and analysis.
  • Database Management:
      - Administered and optimized databases used in data warehousing.
      - Performed database tuning, indexing, and partitioning for improved performance.
      - Implemented security measures and access controls to ensure data integrity and confidentiality.
  • Data Integration:
      - Integrated data from diverse internal and external sources, ensuring consistency and reliability.
      - Implemented data integration pipelines using tools such as Apache Kafka and Apache NiFi.
      - Developed data integration workflows and data mapping strategies.
  • Data Quality Management:
      - Implemented data quality processes, including data profiling, cleansing, and validation.
      - Monitored and resolved data quality issues to ensure accurate and reliable data.
      - Developed and maintained data quality metrics and reporting mechanisms.
  • Data Governance:
      - Established data governance frameworks and policies for proper data management and usage.
      - Ensured compliance with data governance standards and regulatory requirements.
      - Implemented data governance processes for data lineage, metadata management, and data cataloging.
  • Troubleshooting and Issue Resolution:
      - Identified and resolved data-related issues, performing root cause analysis.
      - Implemented preventive measures and conducted performance tuning to optimize system efficiency.
      - Collaborated with cross-functional teams to address data-related challenges and provide technical expertise.
  • Collaboration and Communication:
      - Worked closely with data analysts, data scientists, and business stakeholders to understand requirements.
      - Communicated effectively to translate business needs into technical data engineering solutions.
      - Collaborated in cross-functional teams to deliver high-quality, data-driven solutions.
  • Emerging Technologies and Industry Trends:
      - Kept up to date with the latest advancements in data engineering, including big data technologies, cloud computing, and automation.
      - Evaluated and implemented emerging technologies to improve data engineering processes and efficiency.
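The extract, cleanse, and load cycle outlined above can be sketched minimally with the Python standard library. The row shape, cleansing rules, and `orders` table here are illustrative assumptions invented for the example, not the actual Apps Associates pipeline, which used Hadoop-ecosystem tooling at far larger scale.

```python
import sqlite3

# Toy "extracted" rows, standing in for data pulled from databases, APIs, or files.
raw_rows = [
    {"order_id": 1, "customer": " Alice ", "amount": "19.99"},
    {"order_id": 2, "customer": None, "amount": "5.00"},        # incomplete row
    {"order_id": 1, "customer": " Alice ", "amount": "19.99"},  # duplicate key
]

def transform(rows):
    """Cleanse: trim whitespace, drop incomplete rows, de-duplicate by key."""
    seen, out = set(), []
    for r in rows:
        if r["customer"] is None or r["order_id"] in seen:
            continue
        seen.add(r["order_id"])
        out.append((r["order_id"], r["customer"].strip(), float(r["amount"])))
    return out

# Load the cleansed rows into a warehouse-style table (in-memory for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", transform(raw_rows))
```

The same shape scales up directly: swap the in-memory list for a source query, and the SQLite target for a staging table in the warehouse.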

Skills Utilized:

  • Big Data Technologies (Apache Hadoop, HDFS, MapReduce)
  • Apache Flume and Apache NiFi for Data Ingestion
  • SQL and Scripting Languages (Python, Shell)
  • Data Modeling (Star Schemas, Dimensional Models)
  • Apache Hive and Apache Impala for Querying and Analysis
  • Database Administration and Optimization
  • Data Integration Tools (Apache Kafka, Apache NiFi)
  • Data Quality Management
  • Data Governance
  • Troubleshooting and Performance Tuning
  • Collaboration and Communication
  • Emerging Technologies and Industry Trends Awareness
  • ETL (Extract, Transform, Load) Collaboration, Data Integrity, and Pipeline Stability Verification

Software Engineer-QA

Cybage
01.2020 - 12.2020
  • Test Planning: Collaborated with the development team to design comprehensive test plans and strategies.
  • Test Case Development: Created detailed test cases and test scenarios based on project requirements.
  • Test Execution: Conducted manual and automated testing to verify software functionality, performance, and usability.
  • Defect Identification and Tracking: Identified and reported software defects using bug tracking systems, providing clear and concise bug reports.
  • Test Automation: Developed and maintained automated test scripts using [automation tools/frameworks] to improve testing efficiency.
  • Regression Testing: Performed regular regression testing to ensure software stability and maintain the integrity of existing features.
  • Continuous Integration and Delivery: Integrated testing activities into the continuous integration and delivery pipeline to support agile software development practices.
  • Test Data Management: Managed and maintained test data sets and test environments to ensure accurate and reliable testing.
  • Performance Testing: Conducted performance testing to evaluate software responsiveness and scalability under various workloads.
  • Documentation: Created comprehensive test documentation, including test procedures, test results, and other relevant information.
  • Collaboration: Collaborated closely with cross-functional teams, including developers, business analysts, and project managers, to ensure effective communication and alignment of testing activities.
  • Quality Assurance Processes: Contributed to the improvement and implementation of quality assurance processes and best practices.
  • Continuous Learning: Stayed updated with emerging testing tools, techniques, and industry trends to enhance the quality assurance process and drive continuous improvement.
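The regression-testing practice described above boils down to pinning known-good behavior in an automated suite. A minimal sketch with Python's built-in `unittest`; `apply_discount` is a hypothetical function under test invented for the example, not code from the Cybage role.

```python
import unittest

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical application function standing in for the system under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class DiscountRegressionTests(unittest.TestCase):
    def test_existing_behaviour_unchanged(self):
        # Pinned expectations: any change in these outputs flags a regression.
        self.assertEqual(apply_discount(100.0, 10), 90.0)
        self.assertEqual(apply_discount(19.99, 0), 19.99)

    def test_invalid_input_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(50.0, 150)

# Run the suite programmatically so it can be embedded in a CI step.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiscountRegressionTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Wired into a continuous integration pipeline, a failing suite blocks the build, which is what keeps existing features stable release after release.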

Education

Master of Science - Data Engineering

University Of North Texas
Denton, TX
05.2023

Bachelor of Science - Computer Science And Engineering

CVR College Of Engineering
Hyderabad, India
06.2021

Skills

  • Programming: Python, Java, C, SQL, HTML, CSS
  • Tools & Software: VS Code, Git, Microsoft Excel, Tableau
  • Databases: MS SQL Server, Oracle
  • Frameworks: ASP.NET MVC, .NET Framework
  • Libraries: pandas, NumPy, scikit-learn, NLTK, Matplotlib, spaCy, re, BeautifulSoup
  • ML and Data Mining Concepts: Random Forests, Decision Trees, KNN, Logistic Regression
  • Azure HDInsight (Apache Hadoop)
  • Azure Data Lake Storage
  • Azure Data Factory
  • Apache Flume, Apache NiFi
  • MapReduce
  • Azure Data Lake Analytics
  • Azure Databricks
  • Azure Cosmos DB (Cassandra API)
  • Azure Synapse Analytics (formerly SQL Data Warehouse)
  • Azure SQL Database
  • Azure DevOps
  • Azure Data Catalog, Azure Purview
