Over 9 years of diversified IT experience as a Data Engineer, specializing in requirements gathering, design, development, testing, and maintenance of databases, cloud technologies, data pipelines, and data warehouse applications.
Well-versed in RDBMS such as Oracle, MS SQL Server, MySQL, Teradata, DB2, Netezza, PostgreSQL, and MS Access; exposure to NoSQL databases such as MongoDB, HBase, DynamoDB, and Cassandra.
Hands-on experience with Azure (including Azure Data Factory, Data Lake Storage, Synapse Analytics, and Cosmos DB for NoSQL), GCP (including BigQuery, GCS, Cloud Functions, Dataflow, Pub/Sub, and Dataproc), and AWS (including EC2, Glue, Lambda, SNS, S3, RDS, CloudWatch, VPC, Elastic Beanstalk, Auto Scaling, and Redshift).
Experience in developing web applications using Python, PySpark, Django, C++, XML, CSS, HTML, JavaScript, and jQuery.
Proficient in developing business reports with Power BI, Tableau, and SQL Server Reporting Services (SSRS), analysis using SQL Server Analysis Services (SSAS), and ETL processes using SQL Server Integration Services (SSIS). Skilled in handling complex processes using SAS/Base, SAS/SQL, and SAS/STAT.
Adept at using data analysis and modeling packages such as NumPy, SciPy, pandas, Beautiful Soup, scikit-learn, Matplotlib, and Seaborn in Python, and dplyr, tidyr, and ggplot2 in R. Knowledge of OLAP/OLTP and dimensional data modeling using the Ralph Kimball methodology.
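As an illustrative sketch of the kind of exploratory analysis these packages support (all data and column names below are hypothetical, not taken from any project):

```python
# Hypothetical sales dataset aggregated with pandas and NumPy;
# an illustrative sketch only, not project code.
import numpy as np
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "units": [10, 7, 3, 12, 5],
    "unit_price": [2.5, 4.0, 2.5, 4.0, 3.0],
})

# Derive a revenue column, then aggregate it per region.
sales["revenue"] = sales["units"] * sales["unit_price"]
by_region = sales.groupby("region")["revenue"].sum()

print(by_region.to_dict())                      # revenue per region
print(np.round(sales["revenue"].mean(), 2))     # overall mean revenue
```

The same groupby-and-aggregate shape carries over directly to dplyr's `group_by()`/`summarise()` in R.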
Well-versed with tools such as SVN, Git, SourceTree, and Bitbucket; experienced with Unix/Linux commands, shell scripting, and deployment to servers.
Involved in all SDLC phases under Agile, Scrum, and Waterfall management processes, with a focus on high availability, fault tolerance, auto scaling, and query optimization techniques.
Programming Languages: Python, R, SQL, Java, .NET, HTML, CSS, Scala
Python Libraries: Requests, ReportLab, NumPy, SciPy, PyTables, cv2, imageio, Python-Twitter, Matplotlib, httplib2, urllib2, Beautiful Soup, PySpark, pytest, PyMongo, cx_Oracle, PyExcel, Boto3
Web Frameworks and IDEs: Django, Flask, Pyramid, PyCharm, Sublime Text
Web Technologies: REST, SOAP, Microservices, HTML, CSS, JavaScript, MVW, MVC
Databases: Oracle, PostgreSQL, Teradata, IBM DB2, MySQL, PL/SQL, MongoDB, Cassandra, DynamoDB, HBase, WAMP, LAMP
Big Data Ecosystem: Cloudera distribution, Hortonworks Ambari, HDFS, MapReduce, YARN, Pig, Sqoop, HBase, Hive, Flume, Cassandra, Apache Spark, Oozie, ZooKeeper, Hadoop, Scala, Impala, Kafka, Airflow, dbt, NiFi
Reporting and ETL Tools: Power BI, SSIS, SSAS, SSRS, Tableau
Containers and Orchestration: Kubernetes, Docker, Docker Registry, Docker Hub, Docker Swarm
AWS: EC2, S3, RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, Step Functions, CloudFormation, EMR
GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil and bq command-line utilities, Dataproc
Azure: Web Apps, App Services, Storage, SQL Database, Virtual Machines, Search, Notification Hubs
Data Modeling: Relational data modeling, ER/Studio, Erwin, Sybase PowerDesigner, star schema, snowflake schema, fact and dimension tables
Streaming: Kinesis, Kafka, Flume
Version Control: Concurrent Versions System (CVS), Subversion (SVN), Git, GitHub, Mercurial, Bitbucket
Proficient in building data pipelines using Python, PySpark, Hive SQL, Presto, BigQuery, and Apache Airflow. Experienced in using Teradata utilities and Informatica client tools, Sqoop for data import/export, and Flume and NiFi for loading log files.
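The extract-transform-load pattern behind these pipelines can be sketched in plain Python (a hypothetical three-stage flow; in practice each stage would run as an Airflow task or a PySpark/Hive job, and the sink would be a warehouse table rather than a list):

```python
# Minimal ETL sketch in plain Python; all data here is hypothetical.
import csv
import io

RAW = "id,amount\n1,10\n2,\n3,30\n"  # stand-in for a raw source extract

def extract(text):
    """Parse raw CSV text into row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop rows with missing amounts and cast fields to proper types."""
    return [
        {"id": int(r["id"]), "amount": float(r["amount"])}
        for r in rows
        if r["amount"]
    ]

def load(rows, sink):
    """Append cleaned rows to the sink; returns the number of rows loaded."""
    sink.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract(RAW)), warehouse)
print(loaded)  # rows loaded after the invalid record is dropped
```

Keeping each stage a pure function of its input is what makes such a flow easy to hand to a scheduler like Airflow, where each function maps naturally onto one task.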
Extensive hands-on experience with Hadoop architecture and its components, Spark applications (RDD transformations, Spark Core, MLlib, Spark Streaming, and Spark SQL), the Cloudera ecosystem (HDFS, YARN, Hive, Sqoop, Flume, HBase, Oozie, Kafka, and Pig), data pipeline development, and data analysis with Hive SQL, Impala, Spark, and Spark SQL.
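The RDD transformation style mentioned above (lazy map/filter transformations followed by a reducing action) can be illustrated with plain-Python equivalents; a real Spark job would call `sc.parallelize(...).map(...).filter(...)` on a SparkContext, whose API mirrors these built-ins. The log lines below are made up for illustration:

```python
# Plain-Python analogue of a Spark RDD transformation chain;
# data and parsing logic are hypothetical.
from functools import reduce

lines = ["error disk full", "info ok", "error timeout", "info ok"]

# map-style transformation: split each line into a (level, message) pair
pairs = map(lambda line: tuple(line.split(" ", 1)), lines)

# filter-style transformation: keep only error records
errors = filter(lambda kv: kv[0] == "error", pairs)

# reduce-style action: count the surviving records
error_count = reduce(lambda acc, _: acc + 1, errors, 0)
print(error_count)
```

As in Spark, `map` and `filter` here are lazy iterators; nothing is evaluated until the terminal `reduce` consumes the chain.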