Over 5 years of experience as a Senior Data Engineer with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes.
Experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
Experience with Spark Core, Spark SQL, Spark MLlib, Spark GraphX, and Spark Streaming for processing and transforming complex data using in-memory computing, written in Scala.
Worked with Spark to improve the efficiency of existing algorithms using Spark Context, Spark SQL, Spark MLlib, DataFrames, Pair RDDs, and Spark on YARN.
Experience in Python and Scala, including writing user-defined functions (UDFs) for Hive and Pig in Python.
Hands-on experience with Hadoop architecture and its components, including the Hadoop Distributed File System (HDFS), Job Tracker, Task Tracker, NameNode, DataNode, and Hadoop MapReduce programming.
Experience using Airflow to schedule ETL jobs that extract data from AWS data warehouses.
Proficient in designing, implementing, and optimizing ETL processes using Talend, leveraging its suite of data integration tools to ensure seamless data movement and transformation across systems and platforms.
Experienced in using Informatica PowerCenter for ETL development, including mapping design, workflow creation, and performance tuning, to deliver efficient data pipelines that meet business requirements within tight timelines.
Experience performing structural modifications using MapReduce and Hive, and analyzing data with visualization/reporting tools (Tableau).
Experienced in dimensional modeling (star schema, snowflake schema), transactional modeling, and slowly changing dimensions (SCD).
Experience creating and running Docker images with multiple microservices.
Hands-on experience with Amazon Web Services (AWS), using Elastic MapReduce (EMR), Redshift, and EC2 for data processing.
Experience with PySpark and Azure Data Factory in creating, developing, and deploying high-performance ETL pipelines.
Experience developing JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity.
Developed Spark jobs on Databricks to perform data cleansing, validation, and standardization, and applied transformations per the use cases.
Hands-on experience with SQL and NoSQL databases such as Snowflake, HBase, Cassandra, and MongoDB.
Extensive experience in Agile software development methodology.
Team player who can also work independently with minimal supervision; innovative, efficient, strong in debugging, and driven to keep pace with the latest technologies.
Excellent communication and presentation skills, with solid experience communicating and working with various stakeholders.
Python