Portfolio

This portfolio compiles the data science and data analysis projects I have completed for academic, self-learning, and hobby purposes. It also lists my achievements, skills, and certificates.

Achievements

  • Recipient of Outstanding Master of Engineering - Industrial Engineering Student Award.
  • Winner of TAMU Datathon 2020 among 50+ teams.
  • Recipient of TAMU Scholarship and Fee Waiver for excellent academic performance (4.0 GPA).
  • Published a total of 8 research papers in the domains of machine learning, renewable energy, and computational fluid dynamics.

Projects

Customer Survival Analysis and Churn Prediction

In this project I used survival analysis to study how the likelihood of customer churn changes over time. I also implemented a Random Forest model to predict customer churn and deployed it as a Flask web app on Heroku.
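To illustrate how survival analysis tracks churn risk over time, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is not the project's code: the function name `kaplan_meier` and the toy tenure data are hypothetical, and a real analysis would more likely rely on a dedicated survival-analysis library.

```python
from collections import Counter


def kaplan_meier(durations, churned):
    """Return [(time, survival_probability)] at each time where churn occurred.

    durations: tenure of each customer (e.g. in months).
    churned:   True if the customer churned at that time, False if censored
               (still active when observation ended).
    """
    events = Counter(t for t, c in zip(durations, churned) if c)
    exits = Counter(durations)  # every customer leaves the risk set at their own time

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk customers who did not churn at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: tenure in months and whether the customer churned.
tenure = [2, 3, 3, 5, 8, 8, 12, 12]
churn = [True, True, False, True, False, True, False, False]
for month, s in kaplan_meier(tenure, churn):
    print(f"month {month}: S(t) = {s:.3f}")
```

Censored customers (those who had not churned when observation ended) still count toward the at-risk total until their last observed month, which is what separates this estimate from a naive churn rate.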

Instacart Market Basket Analysis

The objective of this project is to analyze 3 million grocery orders from more than 200,000 Instacart users and predict which previously purchased items will be in a user's next order. Customer segmentation and affinity analysis are also performed to study user purchase patterns.
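As a rough illustration of affinity analysis, the sketch below computes the lift of item pairs from a handful of baskets. The function name `pair_lift` and the baskets are hypothetical and are not taken from the project or from the Instacart data.

```python
from collections import Counter
from itertools import combinations


def pair_lift(orders):
    """Return {(item_a, item_b): lift} for every pair that co-occurs in an order.

    lift = P(a and b) / (P(a) * P(b)); values above 1 suggest the items are
    bought together more often than independence would predict.
    """
    n = len(orders)
    item_counts = Counter()
    pair_counts = Counter()
    for order in orders:
        items = sorted(set(order))  # de-duplicate and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


# Hypothetical toy baskets.
orders = [
    ["banana", "milk"],
    ["banana", "milk", "bread"],
    ["bread", "eggs"],
    ["banana", "eggs"],
]
for pair, value in sorted(pair_lift(orders).items()):
    print(pair, round(value, 3))
```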

Hybrid-filtering News Articles Recommendation Engine

A hybrid-filtering personalized news article recommendation system that suggests articles from popular news service providers based on the reading history of Twitter users who share similar interests (collaborative filtering) and on the content similarity between an article and the user’s tweets (content-based filtering).
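One simple way to combine the two signals is a weighted blend, sketched below. This is an assumption for illustration only: the description does not say how the project merges its collaborative and content-based scores, and the names `cosine`, `hybrid_score`, and the weight `alpha` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_score(user_tweets, article_text, similar_users_reads, article_id, alpha=0.5):
    """Blend a content-based and a collaborative score into one number in [0, 1]."""
    # Content-based: word overlap between the user's tweets and the article.
    content = cosine(Counter(user_tweets.lower().split()), Counter(article_text.lower().split()))
    # Collaborative: share of similar users who have read this article.
    readers = sum(article_id in reads for reads in similar_users_reads)
    collaborative = readers / len(similar_users_reads) if similar_users_reads else 0.0
    return alpha * content + (1 - alpha) * collaborative


# Hypothetical inputs: one user's tweets, one article, and two similar users' reading sets.
score = hybrid_score(
    user_tweets="new python release for data science",
    article_text="Python release brings faster data tools",
    similar_users_reads=[{"a1", "a7"}, {"a2"}],
    article_id="a1",
)
print(round(score, 3))
```

Setting `alpha` closer to 1 favours content similarity, which can help for new articles that no similar user has read yet.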

Predictive Maintenance of Aircraft Engine

In this project I used models such as RNN, LSTM, and 1D-CNN to predict engine failure 50 cycles in advance, and calculated feature importance using sensitivity analysis and SHAP values. Exponential degradation and similarity-based models are also used to estimate the engine's remaining useful life.

Multivariate Phase 1 Analysis

The objective of this project is to identify the in-control data points and eliminate the out-of-control data points in order to set up distribution parameters for manufacturing process monitoring. I used PCA for dimension reduction, and Hotelling T² and m-CUSUM control charts to establish the mean vector and covariance matrix.
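The iterative "flag, remove, re-estimate" idea behind a Phase 1 analysis can be sketched for the Hotelling T² step alone. The code below is a simplified illustration, not the project's implementation: it handles only two variables, skips PCA and m-CUSUM, and assumes a chi-square control limit (exact Phase 1 limits are typically derived from a beta distribution). The function name `phase_one_t2` and the toy data are hypothetical.

```python
import math


def phase_one_t2(points, alpha=0.0027):
    """Iteratively drop 2-D points whose Hotelling T^2 exceeds the limit.

    Returns (in_control_points, mean, covariance) once no point is flagged.
    Assumption: the limit is the chi-square quantile with 2 degrees of
    freedom, which has the closed form -2 * ln(alpha).
    """
    limit = -2.0 * math.log(alpha)
    data = list(points)
    while True:
        n = len(data)
        mx = sum(x for x, _ in data) / n
        my = sum(y for _, y in data) / n
        sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
        syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
        sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
        det = sxx * syy - sxy * sxy

        def t2(p):
            dx, dy = p[0] - mx, p[1] - my
            # (dx, dy) @ inverse(S) @ (dx, dy) for the 2x2 covariance S.
            return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

        kept = [p for p in data if t2(p) <= limit]
        if len(kept) == n:
            return data, (mx, my), ((sxx, sxy), (sxy, syy))
        data = kept  # re-estimate parameters without the flagged points


# Hypothetical toy data: 29 points on a unit circle plus one gross outlier.
points = [(math.cos(2 * math.pi * k / 29), math.sin(2 * math.pi * k / 29)) for k in range(29)]
points.append((50.0, 50.0))
in_control, mean, cov = phase_one_t2(points)
print(len(in_control), mean)
```

Re-estimating after each removal matters because a gross outlier inflates the covariance estimate and can mask other out-of-control points on the first pass.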

Micro Projects

  • Statistics and Machine Learning

    • Genetic Algorithm: In this file, I implemented a simple genetic algorithm that finds a list of numbers whose sum equals a specified target.
    • Bayesian Statistics: In this file, I explored how Bayesian statistics works and how prior assumptions affect posterior probabilities, using a gun control example.
    • Gaussian Mixture Model and Expectation Maximization: In this file, I implemented the Expectation Maximization algorithm to recover the true distribution of a one-dimensional GMM with two Gaussians.
    • Linear Regression: In this file, I solve linear regression analytically and by implementing gradient descent, stochastic gradient descent, and mini-batch gradient descent.
    • Neural Network Implementation: In this file, I implemented a simple neural network using forward propagation, backward propagation, and optimization functions to predict customer churn.

  • Challenges

    • SQL Challenges: This repository contains my solutions to online SQL challenges (from HackerRank, LeetCode, TestDome, etc.).
    • Data Science Challenges: This repository contains my solutions to online data science challenges (from HackerRank, TestDome, etc.).

  • Ranking Algorithms

    • Ranking of NFL teams using Markov-chain methods: In this project I implemented and compared three approaches based on the stationary distribution of a Markov chain to rank the 32 NFL (National Football League) teams from "Best" to "Worst" using the scores of the 2007 NFL regular season.
    • Ranking of Tennis players: The objective of this project is to rank all tennis players based on the matches they played in 2018. The project comprises 4 ranking approaches, each designed to be more robust than the previous one.
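To give a sense of how a stationary distribution can rank teams, here is a minimal sketch of one common construction, in which each team "votes" for the opponents that scored against it. It is not necessarily one of the three approaches used in the NFL project: the function name `markov_rank`, the PageRank-style `damping` parameter, and the toy scores are all assumptions.

```python
def markov_rank(teams, games, damping=0.85, iterations=200):
    """Rank teams by the stationary distribution of a score-weighted voting chain.

    games: iterable of (team_a, score_a, team_b, score_b).
    Each team passes its probability mass to opponents in proportion to the
    points those opponents scored against it.
    """
    index = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    votes = [[0.0] * n for _ in range(n)]
    for a, sa, b, sb in games:
        votes[index[a]][index[b]] += sb  # a votes for b with b's points
        votes[index[b]][index[a]] += sa  # b votes for a with a's points

    # Row-normalise into transition probabilities; a team that conceded
    # nothing spreads its vote uniformly.
    P = []
    for row in votes:
        total = sum(row)
        P.append([v / total for v in row] if total else [1.0 / n] * n)

    # Power iteration with teleportation (assumed here to guarantee convergence).
    rating = [1.0 / n] * n
    for _ in range(iterations):
        rating = [
            (1 - damping) / n + damping * sum(rating[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
    return sorted(zip(teams, rating), key=lambda tr: tr[1], reverse=True)


# Hypothetical toy season with three teams.
games = [("A", 30, "B", 10), ("A", 20, "C", 10), ("B", 20, "C", 10)]
for team, r in markov_rank(["A", "B", "C"], games):
    print(team, round(r, 3))
```

Teams that score heavily against strong opponents accumulate more probability mass, so the ordering reflects strength of schedule as well as win counts.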

Core Competencies

  • Languages: Python, R, SQL, C++
  • Methodologies: Machine Learning, Time Series Analysis, Deep Learning, NLP, Statistics, Explainable AI, Data Structures & Algorithms
  • Tools: Amazon Web Services (S3, Lambda, API Gateway, ECR, ECS, EC2, CloudWatch, SNS, SQS), Docker, Airflow, Elasticsearch, PostgreSQL, Serverless, Travis CI, Git, Terraform, Flask, MS Excel, Tableau, LangChain

Certificates