I'm a data-driven analyst and scientist with a background in both geophysics and data analytics. With a Master’s degree in each, I have a strong foundation in signal processing, data wrangling, data analysis, data visualization, machine learning, and automation. I seek to be inspired, to learn proactively, and to adapt in order to master data.
Data Wrangling (5/5)
Automation (5/5)
Software Quality Assurance (5/5)
Python (5/5)
Machine Learning (5/5)
Data Visualization (5/5)
Data Analysis (5/5)
Statistics/Mathematics (5/5)
R (4/5)
Matlab (4/5)
SQL (4/5)
PyTorch (3/5)
September, 2009 - August, 2013
January, 2014 - May, 2016
May, 2018 - December, 2021
Oilonomics (oil + socio-economics) is a web application for analyzing the impact of hydrocarbon production on socio-economic conditions, with Colorado as the study area. Regression and correlation analyses were used to examine and forecast how each socio-economic factor responds to hydrocarbon production.
Using DenseNet121 and PyTorch, new and existing machine learning models were created and evaluated. With chest diseases as the labels to predict, and numerous clinical-data features combined with X-ray images as inputs, the model achieved an AUC of over 85%.
Three multiclass-multioutput probabilistic models (Random Forest, XGBoost, KNN) are discussed and evaluated on METAR-ASPM data for major airports in the United States. The output for each airport is the average probability that a flight will be delayed. Extensive hyperparameter tuning via grid search was used to optimize the models and increase accuracy.
The Raven’s Progressive Matrices (RPM) solver is an artificial intelligence agent that attempts to solve 2 x 2 and 3 x 3 RPM problems. Using multiple image processing techniques and transformations via Python’s Pillow and OpenCV, the agent is designed to search efficiently for the best-matching answer among 6 (2 x 2) or 8 (3 x 3) choices. It reaches around 80% accuracy and takes 10 seconds to answer 20 questions.
Using supervised machine learning algorithms, the model predicts the odds of winning the infamous online game from in-game stats that are readily available online. Extensive EDA, feature engineering, and feature selection were performed to identify the highest-performing classification models.
Walnut Creek, California 94597
832-466-1356
jinhaeng.lee.87@gmail.com