"All data has its beauty, but not everyone sees it."
-- Damian Mingle
Hey, there! I'm Zhi!
I'm a Data Scientist with previous experience in environmental science. I am fascinated by the potential power of data to solve business challenges.
I will soon graduate from the Master of Science in Data Science (MSDS) program at the University of San Francisco, where I have developed strong programming and statistics skills for tackling business problems involving big data.
As a Data Scientist at Fair.com, my primary role is to improve user performance and increase customer conversion rates by building a vehicle recommendation engine.
Below are some of the projects I have completed so far that I found interesting.
Click "more" for details and source code on GitHub.
A data pipeline that automates the extraction of California air quality data from AWS S3 into MongoDB and connects to Spark to model California air quality.
(ETL, data pipeline, AWS, S3, EC2, EMR, MongoDB, Spark SQL, Spark MLlib, PySpark)
A complete web product that enables companies whose employees have limited machine learning expertise to train high-quality image classification models.
(AWS, ETL, RDS, Flask, HTML, CSS, Deep Learning, PyTorch, Google Analytics)
A forecast of monthly Canadian bankruptcies using univariate and multivariate time series models with macroeconomic indicators.
(R, Time Series, Box-Jenkins, ARIMAX/SARIMAX, Holt-Winters, VAR/VARX, ggplot)
A predictive model to classify whether an email is spam or not.
(Python, boosting trees, NumPy, XGBClassifier)
A predictive model to predict the click-through rate for Avazu.
(Python, Feature Engineering, Mean Target Encoding, Random Forest)
A classification model to determine whether an online auction bidder is a bot or not.
(Python, Random Forest, Grid Search, Gradient Boosting)
An implementation of search engines, including linear search, index search, and hash table search.
(Python, Flask, HTML, Jinja2)
An interactive website that recommends articles similar to the one you choose.
(word2vec, Stanford GloVe, AWS, Flask)
A collaborative filtering system to predict a viewer's movie ratings.
(Matrix Factorization, Stochastic Gradient Descent Optimization)
A sentiment prediction model that classifies whether an IMDb movie review conveys positive or negative sentiment.
(NLTK, Continuous Bag of Words Model, Word Embedding, PyTorch)
A deployed website that shows a Twitter list page with tweets colored by their sentiment, along with the average sentiment score.
(Tweepy, vaderSentiment, Jinja, Flask, AWS, EC2, Python)
A tutorial showing how to compute vehicle similarity by generating your own word embeddings with the gensim word2vec library.
(gensim, word2vec, word embedding, cosine similarity, Python)
A neural network that recognizes handwritten digits from images, built with PyTorch.
(PyTorch, Neural Network, Image Recognition)
An anomaly detection implementation based on the Isolation Forest method, written in Python.
(Python, Object Oriented Programming)