Data Engineer with over 5 years of experience designing, developing, and optimizing scalable data architectures and pipelines. Proficient in Big Data technologies, cloud environments, and real-time processing frameworks for efficient data storage, transformation, and analytics. Adept at collaborating with cross-functional teams to build high-quality, data-driven solutions that achieve business objectives.
Extensive experience in Big Data Analytics, including Apache Spark, Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Apache NiFi, and Kafka.
Skilled in designing and optimizing Hadoop core components such as JobTracker, TaskTracker, NameNode, DataNode, and MapReduce, improving data processing efficiency.
Skilled in data manipulation and analysis, utilizing Python (Pandas, NumPy, SciPy, Scikit-learn, Matplotlib) to extract meaningful insights from structured and unstructured data.
Excellent hands-on experience with Hadoop distributions such as Cloudera CDH and Hortonworks HDP, managing large-scale Hadoop clusters.
Experienced in designing and developing Spark applications in Scala to deliver maximum scalability and performance.
Extensive experience with cloud computing platforms, including Azure (Azure Databricks, Azure Data Factory, Azure SQL, Azure Data Lake, Azure Machine Learning) and AWS (EC2, S3, Redshift, EMR), with deep expertise in migrating on-premises applications to the cloud.
Expertise in data migration and integration using Sqoop, Flume, and Kafka to transfer data seamlessly between disparate systems.
Experience with NoSQL databases such as HBase, Cassandra, and MongoDB, including SQL-to-NoSQL migration and tuning for large-scale applications.
Hands-on experience with Apache Airflow for workflow automation, conditional triggers, and job scheduling.
Deep understanding of data warehousing concepts, including ETL pipeline development, dimensional modeling, and OLAP/OLTP environments, as well as experience administering tools such as Informatica PowerCenter.
Extensive experience working with multiple data formats (Parquet, Avro, ORC, TextFile, XML) and compression codecs (GZIP, Snappy, LZO) to optimize storage and processing.
Highly skilled in Snowflake Cloud Data Warehouse, designing and implementing high-performance data architectures that support advanced analytics and reporting.
Strong interpersonal and teamwork abilities, with a proven track record of collaborating with stakeholders, engineers, and business groups to facilitate data-driven decision-making.
Technical Skills
Programming & Scripting:
Python, Scala, PySpark, Bash, Shell, Perl
Cloud Platforms:
Amazon Web Services (AWS), Microsoft Azure
AWS Services:
EC2, S3, Lambda, Route 53, Elastic Beanstalk, EBS, VPC, IAM, ECS (Elastic Container Service), DynamoDB, Auto Scaling, Security Groups, Redshift, CloudWatch, CloudFormation
Azure Services:
Azure Databricks, Azure Data Factory (ADF), Blob Storage, Azure SQL, Azure Data Lake
Python Libraries:
NumPy, Matplotlib, SciPy, PySpark, Pandas, BeautifulSoup, Scikit-Learn
Version Control:
Git, GitHub, Bitbucket, SVN
Big Data Technologies:
Spark, Kafka, NiFi, Airflow, Flume, Snowflake, HDFS, MapReduce, Pig, Hive, Sqoop, Oozie
Hadoop Frameworks:
Cloudera CDH, Hortonworks HDP
Data Modelling Schemas:
Star Schema, Snowflake Schema
Visualization Tools:
Tableau, Power BI, Excel
Databases:
MySQL, Oracle, MS SQL Server, Teradata, HBase, Cassandra, DynamoDB, MongoDB
Operating Systems:
Windows, Linux, Unix, macOS