Professional Summary:
With over 10 years of experience in Information Technology and 7+ years specializing in Big Data on the Hadoop ecosystem, I bring expertise in analysis, design, development, testing, deployment, and integration using SQL and Big Data technologies. I have hands-on experience with the major Hadoop components, including HDFS, YARN, MapReduce, Hive, Impala, Pig, Spark, and Kafka.
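For illustration, below is a minimal sketch of the kind of Hadoop-ecosystem work described above: a Hive-enabled PySpark session querying data on HDFS. The path, view name, and columns are hypothetical placeholders, not from a specific project.

```python
from pyspark.sql import SparkSession

# Minimal sketch: a Hive-enabled Spark session querying data stored on HDFS.
# The HDFS path, view name, and columns are hypothetical placeholders.
spark = (
    SparkSession.builder
    .appName("hadoop-ecosystem-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Read raw events from HDFS and register them for SQL access.
events = spark.read.parquet("hdfs:///data/raw/events")
events.createOrReplaceTempView("events")

# Aggregate with Spark SQL; the same query could be served by Hive or Impala.
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS event_count
    FROM events
    GROUP BY event_date
""")
daily_counts.show()
```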
I am a results-focused data professional with expertise in designing, building, and optimizing complex data pipelines and ETL processes. Strong SQL, Python, and cloud-platform skills allow me to deliver seamless data integration and robust data solutions, and I excel in collaborative environments, adapting quickly to evolving needs and driving team success.
I have strong knowledge of distributed systems and of the MapReduce and Spark processing frameworks, along with experience in ETL methods for data extraction, transformation, and loading. I have successfully deployed Big Data applications using Talend on AWS and Microsoft Azure, and have optimized AWS services including EC2, Redshift, Glue, Lambda, and Kinesis.
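A representative sketch of such an ETL flow in PySpark follows: extract raw CSV from S3, cleanse and transform it, and stage Parquet for a Redshift COPY. The bucket names and columns are illustrative assumptions, not a specific production job.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hedged ETL sketch: extract from S3, transform, and stage back to S3 for a
# Redshift COPY. Bucket names and columns are hypothetical.
spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: raw CSV landed in S3 (e.g., by batch drops or Kinesis Firehose).
orders = spark.read.csv("s3://example-raw/orders/", header=True, inferSchema=True)

# Transform: cleanse nulls, standardize types, derive a load date.
cleaned = (
    orders.dropna(subset=["order_id"])
    .withColumn("order_total", F.col("order_total").cast("double"))
    .withColumn("load_date", F.current_date())
)

# Load: stage as Parquet; Redshift then ingests via COPY (or Spectrum reads in place).
cleaned.write.mode("overwrite").parquet("s3://example-curated/orders/")
```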
My expertise extends to data ingestion, cleansing, transformations, and aggregation using tools like Spark SQL, Kafka, Flume, and AWS Glue. I’ve worked extensively on cloud migration, real-time streaming data processing, and optimizing Hive tables for better query performance.
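As a sketch of that streaming and Hive-tuning work, the snippet below reads a Kafka topic with Spark Structured Streaming and writes date-partitioned Parquet, the standard lever for Hive partition pruning. The broker, topic, and paths are hypothetical, and the job assumes the spark-sql-kafka connector is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Sketch of real-time ingestion: Kafka -> Spark Structured Streaming ->
# date-partitioned storage. Broker, topic, and paths are hypothetical.
spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "clickstream")
    .load()
)

# Kafka values arrive as bytes; cast to string and stamp a date for partitioning.
events = (
    stream.selectExpr("CAST(value AS STRING) AS payload")
    .withColumn("event_date", F.current_date())
)

# Partitioning by event_date lets Hive/Spark prune partitions at query time.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3://example-lake/clickstream/")
    .option("checkpointLocation", "s3://example-lake/_checkpoints/clickstream/")
    .partitionBy("event_date")
    .start()
)
```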
I also design and optimize data pipelines to ensure seamless data flow, apply advanced SQL and Python skills to create and maintain robust data architectures, and have a track record of implementing scalable solutions that enhance data integrity and support informed decision-making.
I have also collaborated with Data Science teams to build machine learning models and have developed the data pipelines that support them. Additionally, I've led serverless architecture deployments, managed Databricks workspaces, and implemented Python-based data processing solutions on AWS EMR.
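The following is an illustrative, simplified example of the kind of pipeline handed off to a Data Science team: engineered features assembled and fed to a baseline MLlib model on EMR or Databricks. The table and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# Illustrative sketch of an ML-supporting pipeline: assemble engineered
# features and fit a baseline model. Table and column names are hypothetical.
spark = SparkSession.builder.appName("ml-pipeline-sketch").getOrCreate()

training = spark.read.parquet("s3://example-curated/customer_features/")

# Combine raw feature columns into the single vector column MLlib expects.
assembler = VectorAssembler(
    inputCols=["tenure_days", "order_count", "avg_order_value"],
    outputCol="features",
)
lr = LogisticRegression(featuresCol="features", labelCol="churned")

# Fit the assembled pipeline; the model can then be handed to the DS team.
model = Pipeline(stages=[assembler, lr]).fit(training)
```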
I also have experience with data visualization, Google Cloud components, and container management with Kubernetes, plus a strong foundation in programming languages including Python, Java, and SQL. I've been involved in full project life cycles, from design through implementation, using both Agile and Waterfall methodologies.
Lastly, I am proficient in maintaining data quality, performing business and data analysis, and delivering efficient data solutions and client deliverables on time.
Technical Skills:
Big Data/Hadoop Technologies - MapReduce, Spark, Spark SQL, Azure, Spark Streaming, Kafka, PySpark, Pig, Hive, HBase, Flume, Flink, YARN, Oozie, ZooKeeper, Hue, Ambari Server, Teradata, GCP, NiFi
Languages - HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/RStudio, SAS Enterprise Guide, SAS, R (Caret, Weka, ggplot), Perl, MATLAB, Mathematica, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), JavaScript, Shell Scripting
NoSQL Databases - Cassandra, HBase, MongoDB, MariaDB
Web Design Tools - HTML, CSS, JavaScript, JSP, jQuery, XML
Development Tools - Microsoft SQL Server Management Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans
Public Cloud (AWS) - EC2, IAM, S3, Auto Scaling, CloudWatch, Route 53, EMR, Redshift, Glue, Athena, SageMaker
Orchestration Tools - Oozie, Airflow
Development Methodologies - Agile/Scrum, UML, Design Patterns, Waterfall
Build Tools - Jenkins, Toad, SQL*Loader, PostgreSQL, Talend, Maven, Ant, RTC, RSA, Control-M, Oozie, Hue, SoapUI
Reporting Tools - MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos
Databases - Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, Oracle 11g/12c, DB2, Teradata, Netezza
Operating Systems - Windows (all versions), UNIX, Linux, macOS, Sun Solaris