Data Engineer with 5 years of experience designing, implementing, and optimizing scalable data pipelines, ETL processes, and distributed data systems. Skilled in building data architectures on AWS, GCP, and Azure, and in delivering both real-time and batch processing solutions. Extensive hands-on experience with data modeling, system design, schema optimization, and API integrations. Proficient in building and maintaining large-scale data pipelines with Apache Spark, Kafka, Hadoop, and Elasticsearch, with a strong foundation in database management, transactions, indexing, and concurrency control. Strong communicator with a proven track record of supporting data science and machine learning workflows.
Programming Languages: Python, Scala, Java, SQL, Go
Cloud Platforms: AWS (S3, Redshift, Glue, Lambda), GCP (BigQuery, Dataproc), Azure (Data Factory, Synapse)
ETL & Orchestration: Apache Airflow, dbt, AWS Glue, SSIS
Data Warehousing: Snowflake, Redshift, BigQuery, Azure Synapse
Big Data Technologies: Apache Hadoop, Hive, Spark, HBase, Kafka, Presto, Beam
CI/CD & DevOps: Jenkins, Docker, Kubernetes, Terraform
Distributed Storage & File Formats: Elasticsearch, Cassandra, HDFS, Parquet, Avro
Database Systems: PostgreSQL, MySQL, SQL Server, Cassandra
Data Visualization: Tableau, Power BI, Amazon QuickSight
Data Modeling & Warehousing: Star Schema, Snowflake Schema, Data Lakes
Streaming Technologies: Apache Kafka, AWS Kinesis
API Integrations: RESTful APIs, CRM Platforms
Data Science & ML: PySpark, scikit-learn, TensorFlow, Databricks