Overall 9 years of experience in the IT industry, including 4 years as a SAS Developer and 5 years as a Hadoop/Spark Developer working with Big Data technologies such as the Hadoop ecosystem, the Spark ecosystem, and SQL.
Hands-on experience in the analysis, design, coding, and testing phases of the Software Development Life Cycle (SDLC).
Experience in using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Sqoop, and Spark.
Experience in migrating HQL code to PySpark using Spark libraries.
In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming.
Expertise in using Spark SQL with various data sources such as JSON, Parquet, and Hive.
Experience in creating tables, partitioning, bucketing, loading, and aggregating data using Hive.
Experience in migrating code from Hive to Apache Spark and Python using Spark SQL and RDDs.
Experience in analyzing data collected from different data sources and performing validations as part of integration testing.
Experience in implementing Python code to perform sanity checks and audit-table validations for a batch load.
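A batch-load check of this sort can be sketched in plain Python; the record layout, field names, and checks below are illustrative assumptions, not the original audit logic:

```python
# Illustrative batch-load validation: row-count reconciliation against an
# audit count, plus null/missing checks on required columns.

def validate_batch(records, audit_expected_count, required_fields):
    """Return a pass/fail report for one batch load."""
    issues = []

    # Audit-table check: loaded row count must match the expected count.
    if len(records) != audit_expected_count:
        issues.append(
            f"row count {len(records)} != audit count {audit_expected_count}"
        )

    # Null/missing checks on required fields.
    for i, rec in enumerate(records):
        for field in required_fields:
            if rec.get(field) in (None, ""):
                issues.append(f"record {i}: missing {field}")

    return {"passed": not issues, "issues": issues}

batch = [
    {"id": 1, "amount": 10.5},
    {"id": 2, "amount": None},
]
report = validate_batch(batch, audit_expected_count=2, required_fields=["id", "amount"])
```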
Experience in working with SAS Enterprise Guide.
Experience in working with ETL tool SAS Data Integration Studio.
Experience in using SAS/MACRO to create macro variables and macro programs that modify existing SAS programs for ease of modification while maintaining consistency of results.
Experience in using SAS/SQL for creating summary reports, displaying query results, generating tables and views, and working with SQL joins and set operators.
Responsible for maintaining quality reference data in the source by performing operations such as cleaning and transformation, and for ensuring integrity in a relational environment by working closely with the stakeholders and the solution architect.
Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
Used Hive to analyze partitioned and bucketed data.
Applied Hive optimization techniques during joins and followed best practices in writing Hive scripts using HiveQL.
Implemented Spark jobs using Python and Spark SQL for faster testing and processing of data.
Experience in using accumulator variables, broadcast variables, and RDD caching.
Used the PySpark framework to access data from Hive and perform aggregations and joins to implement the business logic.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
Implemented Spark jobs in Python, utilizing Spark Core, Spark Streaming, DataFrames, and the Spark SQL API for faster data processing than MapReduce.
Developed Spark scripts and UDFs using Spark SQL queries for data aggregation and querying, and wrote data back into an RDBMS through Sqoop.
Used Spark SQL to load JSON data, create a schema RDD, load it into Hive tables, and handle structured data.
Experienced in performing data ingestion and data processing (transformations and aggregations).
Handled ingestion of data from different data sources into HDFS using Sqoop.
Experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
Experience with different file formats such as Avro, Parquet, ORC, and JSON.
Involved in functional studies and took part in walkthroughs with stakeholders.
Analyzed the requirements.
Created understanding documents for design and mapping.
Developed SAS code to import EOC files from the different vendor locations.
Implemented code to check the file-format compliance and data integrity of EOC files.
Involved in clean-up activities and performance tuning of jobs as part of a SAS migration from one version to another.
Performed unit testing and documented the test results.
Involved in banking data preparation using SAS EG 4.2 for the Management Information System.
Involved in SAS Campaign Management using SAS CI Studio.
Created files for various channels for the Barclays Marketing Team, as per the input leads provided for campaigns such as Marketing (Cross-Sell/Up-Sell), Operational, and Open Markets.
Responded to ad hoc requests from various departments and interacted with them to clarify their need to view the data; performed other tasks as necessary to meet the needs of the business.
Accessed Oracle data through SQL using SAS/ACCESS, with both implicit and explicit pass-through.
Involved in creating SAS datasets from Excel data using LIBNAME and PROC IMPORT techniques, as per requirements.
Created SAS datasets by combining many tables per business logic.
Delivered business reports to end users using the SAS Output Delivery System (ODS).
Created table metadata for source and target tables, and created jobs to populate the target tables by applying various transformations (Table Loader, SQL Join, Splitter, Summary Statistics, Append) using SAS Data Integration Studio.
Subset datasets using SET, MERGE, SORT, UPDATE, and conditional statements.
Involved in development and enhancements of SAS Programs.
Resolved issues pertaining to data delivered to the client.
SAS:
SAS 9.4, Base SAS, Advanced SAS, SAS/ACCESS, SAS/SQL, SAS DI Studio and SAS EG 4.7
Big Data Technologies:
Hadoop, MapReduce, HDFS, Sqoop, Hive, Apache Spark
Programming Languages:
Python, PySpark, Scala, SQL and Shell scripting
Databases:
SQL Server, Oracle and Teradata
Versioning Tools:
Git and GitHub