Data Engineer with 7+ years of experience designing, building, and optimizing scalable data pipelines across AWS, Google Cloud Platform (GCP), and Hadoop ecosystems. Proficient in orchestrating cloud-native ETL workflows using Airflow, AWS Glue, Lambda, Step Functions, Cloud Composer, Dataflow, and Dataproc. Hands-on expertise in BigQuery, Cloud Storage, Amazon S3, Snowflake, and modern data lake architectures. Strong coding skills in Python and SQL, with deep knowledge of Apache Spark, Hive, and HDFS for distributed data processing and analytics. Experienced in production support, incident management, and performance tuning to ensure data reliability at scale. Actively expanding capabilities in Generative AI (GenAI), with hands-on experience integrating LLM-powered automation into cloud-based data workflows to enable intelligent analytics and next-generation data solutions.
Environment: AWS (Amazon S3, Lambda, Glue, Step Functions, CloudWatch, Athena, DynamoDB), Hadoop, Cloudera Manager, Informatica, Snowflake, Hive, Apache Ambari, SQL, GitHub, Python, Bitbucket, Shell Scripting, Unix/Linux, Splunk, Jira, ServiceNow.
Environment: GCP (Google Cloud Storage, BigQuery, Dataproc, Dataflow, Cloud Composer), SQL, GitHub, Airflow, FHIR Store, Cloud Healthcare API, Apache Beam, Cloud Shell, Python, Looker.
Environment: Hadoop, HDFS, Spark, Python, Teradata, Hive, Aorta, Sqoop, API, GCP, Google Cloud Storage, BigQuery, Dataproc, Dataflow, Cloud Composer, Pub/Sub, SQL, DB2, UDP, GitHub, Tableau, Data Studio, Looker.
Environment: Git, Oozie, Apache Spark, Scala, Python, Shell Scripting (Bash), PowerShell, Hadoop Ecosystem, Cloudera, Ambari, Jenkins, Agile, Scrum, JIRA, Linux, Windows, SQL.