- 10+ years of professional experience in IT, working with various legacy database systems.
- Hands-on experience with the complete Software Development Life Cycle (SDLC) on projects using Agile and hybrid methodologies.
- Significant experience in Scala, Python, and Shell scripting.
- Extensive experience developing Kafka producers and consumers that stream millions of events per minute, using Scala, PySpark, Python, and Spark Streaming (a minimal illustrative sketch follows this list).
- 8 years of hands-on experience designing and building scalable data pipelines for large datasets on cloud data platforms.
- Good experience with Amazon ECS and ECR for containerization.
- Good understanding of AWS IAM roles, policies, and cloud security principles.
- Responsible for developing, refactoring, and optimizing real-time data processing applications using Apache Flink, as well as refactoring and optimizing batch data processing jobs.
- Worked with tools such as Apache NiFi, Apache Kafka, and RabbitMQ to streamline data ingestion and flow between systems, and to enhance, sort, modify, combine, split, and verify data sets.
- Developed data-handling routines with the Pandas library to shepherd data.
- Worked on various optimization techniques using Databricks.
- Good experience with optimized dataset structures in Parquet and Delta Lake formats, with the ability to design and implement complex transformations between datasets.
- Strong proficiency in data pipeline and workflow management tools.
- Working experience with Spark optimization techniques such as salting, Adaptive Query Execution (AQE), and others.
- Good understanding of Big Data Hadoop and YARN architecture, along with Hadoop daemons such as JobTracker, TaskTracker, NameNode, DataNode, and the Resource/Cluster Manager, as well as Kafka (distributed stream processing).
- Involved in designing and creating migration plans and high-level documentation with clients and the project team.
- Attended meetings to troubleshoot priority issues with end users during and after migrations.
- Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
- Good experience with MySQL queries and their migration to PySpark; migrated MySQL stored procedures into PySpark.
- Strong understanding of and good experience with the NumPy Python library.
- Strong experience with and knowledge of NoSQL databases such as HBase and MongoDB.
- Experience in development and support of Oracle, SQL, PL/SQL, and T-SQL queries.
- Significant experience with AWS services related to data engineering, such as S3, Redshift, Glue, Lambda, Kinesis, MSK, EMR, API Gateway, SQS, DynamoDB, Elasticsearch, and Flink.
- Developed scalable data pipelines using Databricks and Apache Spark, and integrated Databricks with cloud services such as AWS, Azure, or Google Cloud.
- Worked on various Databricks projects and assisted team members with data analysis tasks.
- Experienced in technical consulting and end-to-end delivery covering architecture, data modeling, data governance, and the design, development, and implementation of solutions.
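The following is a minimal, illustrative sketch of the kind of streaming pipeline described above: a PySpark Structured Streaming job that consumes Kafka events and appends them to a Delta Lake table. The broker address, topic name, schema, and paths are hypothetical placeholders, and it assumes the Spark Kafka connector and Delta Lake packages are available on the cluster; it is not code from any specific project.

```python
# Illustrative sketch: Kafka -> Spark Structured Streaming -> Delta Lake.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

# Assumed JSON layout of each Kafka message (placeholder schema).
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the raw Kafka stream (placeholder broker and topic).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Parse the message value from JSON into typed columns.
events = (raw
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Append parsed events to a Delta table (placeholder checkpoint and table paths).
query = (events.writeStream
         .format("delta")
         .outputMode("append")
         .option("checkpointLocation", "/tmp/checkpoints/events")
         .start("/tmp/delta/events"))
```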
- Experience with the Big Data Hadoop ecosystem for ingestion, storage, querying, processing, and analysis of big data.
- Extensive working experience in an Agile environment using a CI/CD model.
- Extensive experience working with structured data using Spark SQL, DataFrames, and HiveQL, optimizing queries and incorporating complex UDFs into business logic (see the UDF sketch below).
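Below is a minimal sketch of incorporating a Python UDF into business logic from both the DataFrame API and Spark SQL, as mentioned above. The table, column names, and mapping logic are hypothetical examples rather than code from any specific engagement.

```python
# Illustrative sketch: defining a UDF and using it via DataFrames and Spark SQL.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

def normalize_region(code):
    # Map raw region codes to a canonical label; unknown codes pass through.
    mapping = {"us-e": "US-EAST", "us-w": "US-WEST"}
    return mapping.get((code or "").lower(), code)

# Register for DataFrame use and for SQL use.
normalize_region_udf = udf(normalize_region, StringType())
spark.udf.register("normalize_region", normalize_region, StringType())

# Placeholder data and view.
df = spark.createDataFrame([("us-e", 10), ("us-w", 20)], ["region", "orders"])
df.createOrReplaceTempView("orders")

# Same logic applied through the DataFrame API and through Spark SQL.
df.select(normalize_region_udf("region").alias("region"), "orders").show()
spark.sql("SELECT normalize_region(region) AS region, SUM(orders) AS total "
          "FROM orders GROUP BY normalize_region(region)").show()
```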