"All data has its beauty, but not everyone sees it."
-- Damian Mingle



About Me

Hey, there! I'm Zhi!

I'm a Data Scientist with previous experience in environmental science. I am fascinated by the potential power of data to solve business challenges.

I will soon graduate from the Master of Science in Data Science (MSDS) program at the University of San Francisco, where I have developed strong programming and statistics skills for tackling business problems involving big data.

As a Data Scientist at Fair.com, my primary role is to improve user performance and increase customer conversion rates by building a vehicle recommendation engine.

Skills

Below, I'd like to share some of the projects I have completed so far that I found interesting.

Click "more" for details and source code on GitHub.

Featured Projects

Spark ML - California Air Quality Prediction

A data pipeline that automates the extraction of California air quality data from AWS S3 into MongoDB and connects to Spark to model California air quality.

(ETL, Data Pipeline, AWS, S3, EC2, EMR, MongoDB, Spark SQL, Spark MLlib, PySpark)

Deep Vision - Computer Vision Solutions

A complete web product that enables companies whose employees have limited machine learning expertise to train high-quality image classification models.

(AWS, ETL, RDS, Flask, HTML, CSS, Deep Learning, PyTorch, Google Analytics)

Time Series - Canadian Bankruptcy Rate Prediction

Predicted the monthly Canadian bankruptcy rate using univariate and multivariate time series models with macroeconomic indicators.

(R, Time Series, Box-Jenkins, ARIMAX/SARIMAX, Holt-Winters, VAR/VARX, ggplot)

Machine Learning Modeling

Spam Classification

A predictive model that classifies whether an email is spam.

(Python, Boosted Trees, NumPy, XGBClassifier)

Click Through Rate Prediction

A predictive model that estimates the click-through rate for Avazu.

(Python, Feature Engineering, Mean Target Encoding, Random Forest)
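For readers unfamiliar with mean target encoding, here is a minimal sketch of the idea in plain Python. It is not the project's code: the function name `mean_target_encode`, the `smoothing` parameter, and the toy data are assumptions made for illustration. Each category is replaced by the average target value observed for it, optionally pulled toward the global mean.

```python
from collections import defaultdict


def mean_target_encode(categories, targets, smoothing=0.0):
    """Replace each category with the (optionally smoothed) mean of its targets.

    Hypothetical helper for illustration only. `smoothing` acts like a number
    of pseudo-observations at the global mean; 0 gives the plain per-category mean.
    """
    if len(categories) != len(targets) or not targets:
        raise ValueError("categories and targets must be non-empty and equal length")

    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, target in zip(categories, targets):
        sums[cat] += target
        counts[cat] += 1

    # Per-category mean, shrunk toward the global mean when smoothing > 0.
    mapping = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in sums
    }
    return [mapping[cat] for cat in categories], mapping


if __name__ == "__main__":
    sites = ["a", "a", "b", "b", "b"]   # toy categorical feature
    clicks = [1, 0, 1, 1, 0]            # toy binary target
    encoded, mapping = mean_target_encode(sites, clicks)
    print(mapping)
```

In practice the mapping is usually computed on training folds only, so that a row's own target does not leak into its encoded feature.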

Online Bidder Classification

A classification model that determines whether an online auction bidder is a bot.

(Python, Random Forest, Grid Search, Gradient Boosting)

Recommendation & Ranking System

Search Engine Implementation

An implementation of several search engines, including linear search, index search, and hash table search.

(Python, Flask, HTML, Jinja2)
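To illustrate the difference between linear search and index search, here is a minimal sketch. It is not the project's implementation: the function names, the tokenizer, and the sample documents are assumptions, and the index relies on Python's built-in `dict` (itself a hash table) rather than a hand-written one. Linear search scans every document per query, while the inverted index looks up each query term directly.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def linear_search(docs, query):
    """Scan every document and keep those containing all query terms."""
    terms = tokenize(query)
    if not terms:
        return []
    return [i for i, doc in enumerate(docs) if terms <= tokenize(doc)]


def build_index(docs):
    """Build an inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for i, doc in enumerate(docs):
        for word in tokenize(doc):
            index[word].add(i)
    return index


def index_search(index, query):
    """Intersect the posting sets of all query terms."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    docs = ["The quick brown fox", "A quick red fox jumps", "Lazy dogs sleep"]
    index = build_index(docs)
    print(linear_search(docs, "quick fox"))
    print(index_search(index, "quick fox"))
```

Both functions return the same document ids; the index pays a one-time build cost so that each query no longer has to touch every document.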

Articles Recommender System

An interactive website that recommends articles similar to the one you choose.

(word2vec, Stanford GloVe, AWS, Flask)

Movie Ranking System

A collaborative filtering system that predicts a viewer's movie ratings.

(Matrix Factorization, Stochastic Gradient Descent Optimization)
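As a rough illustration of matrix factorization trained with stochastic gradient descent, here is a minimal sketch in plain Python. It is not the project's code: the function names, the hyperparameters (`k`, `lr`, `reg`, `epochs`), the initialization range, and the toy ratings are all assumptions. Each user and each movie gets a small latent vector, and a rating is predicted as their dot product.

```python
import random


def predict(P, Q, user, item):
    """Predicted rating: dot product of the user and item latent vectors."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))


def mse(P, Q, ratings):
    """Mean squared error over (user, item, rating) triples."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Fit latent factors with SGD on squared error plus L2 regularization."""
    rng = random.Random(seed)
    # Small positive initial factors (an illustrative choice, not a requirement).
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the ratings in random order each epoch
        for u, i, r in samples:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on both factors using the pre-update values.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


if __name__ == "__main__":
    # Toy (user, movie, rating) triples, invented for this example.
    data = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
    P, Q = train_mf(data, n_users=3, n_items=3)
    print("training MSE:", mse(P, Q, data))
    print("predicted rating for user 0, movie 2:", predict(P, Q, 0, 2))
```

The last line shows the point of the technique: once the factors are learned, a rating can be estimated for a user and movie pair that never appeared in the data.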

Natural Language Processing

Movie Review Sentiment

A sentiment prediction model that determines whether an IMDb movie review conveys positive or negative sentiment.

(NLTK, Continuous Bag of Words Model, Word Embedding, PyTorch)

Twitter Sentiment Analysis

A deployed web page that lists Twitter feeds, color-coded by each feed's sentiment, along with the average sentiment score.

(Tweepy, vaderSentiment, Jinja, Flask, AWS, EC2, Python)

Vehicle Similarity

A tutorial showing how to compute vehicle similarity by generating your own word embeddings with the gensim word2vec library.

(gensim, word2vec, Word Embedding, Cosine Similarity, Python)
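Once each vehicle has an embedding vector, similarity comes down to the cosine of the angle between two vectors. Here is a minimal sketch of that step in plain Python. The function name and the toy vectors are assumptions for illustration and are not taken from the tutorial; real embeddings would come from a trained word2vec model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings, invented for this example.
    sedan = [0.9, 0.1, 0.3]
    coupe = [0.8, 0.2, 0.4]
    truck = [0.1, 0.9, 0.2]
    print("sedan vs coupe:", cosine_similarity(sedan, coupe))
    print("sedan vs truck:", cosine_similarity(sedan, truck))
```

Because the measure depends only on direction, two vectors that point the same way score close to 1 regardless of their lengths.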

Deep Learning & Python Algorithm Implementation

Handwritten Digit Recognition

Applied a neural network to recognize handwritten digits from images with PyTorch.

(PyTorch, Neural Network, Image Recognition)

Credit Card Anomaly Detection

An anomaly detection implementation based on the Isolation Forest method, written in Python.

(Python, Object-Oriented Programming)

Stay Connected