Abstract of the paper:
In this work we train three decision-tree-based ensemble machine learning algorithms (Random Forest Classifier, Adaptive Boosting, and Gradient Boosting Decision Tree) to classify quasars in the variable source catalog in SDSS Stripe 82. We build training and test samples (both containing quasars and stars in a 1:1 ratio) using spectroscopically confirmed sources in SDSS DR14 (including 8330 quasars and 3966 stars). We model the light curve of each source in each of the five SDSS bands with a damped random walk process. We find that, trained with the variation parameters alone, all three models can select quasars with similarly and remarkably high precision and completeness (∼98.5% and 97.5%), even better than when trained with SDSS colors alone (∼97.2% and 96.5%). Combining variability and color features, we achieve precision and completeness both ∼99.0%. Applying the trained classifiers to the 48,716 unlabeled variable sources in SDSS Stripe 82, we find ∼1100 more quasar candidates, 57% of which are likely true quasars based on the performance of the classifiers. We thus estimate that the sample of spectroscopically confirmed quasars in the SDSS Stripe 82 variable source catalog is ∼93% complete. We finally present the relative importance of each observational feature in classifying quasars given by the random forest classifier, and discuss the effect of imbalanced datasets.
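The ∼93% completeness figure can be reproduced approximately from the numbers quoted in the abstract. The sketch below is one plausible reading, not necessarily the calculation used in the paper: it assumes the 8330 confirmed quasars are the full confirmed set in the catalog, and that completeness is the confirmed count divided by the confirmed count plus the expected number of true quasars among the new candidates. The function name `estimate_completeness` is hypothetical.

```python
def estimate_completeness(n_confirmed, n_candidates, candidate_purity):
    """Fraction of all quasars in the catalog that are already confirmed.

    Assumed formula (not stated in the abstract):
        confirmed / (confirmed + candidates * purity)
    """
    expected_new = n_candidates * candidate_purity
    return n_confirmed / (n_confirmed + expected_new)


# Numbers quoted in the abstract; the candidate count is approximate (~1100).
N_CONFIRMED = 8330   # spectroscopically confirmed quasars
N_CANDIDATES = 1100  # new quasar candidates among the unlabeled sources
PURITY = 0.57        # fraction of candidates expected to be true quasars

completeness = estimate_completeness(N_CONFIRMED, N_CANDIDATES, PURITY)
print(f"estimated completeness: {completeness:.1%}")  # prints 93.0%
```

Under these assumptions about 627 of the candidates would be real quasars, so roughly 8330 of an estimated 8957 quasars are already confirmed.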
1. Trained three ensemble machine learning classifiers to identify quasars in the SDSS Stripe 82 variable source catalog using both color and variability features, achieving precision and completeness of ∼99.0%.
2. Estimated the completeness of the spectroscopically confirmed quasar sample in the SDSS Stripe 82 variable source catalog by applying the classifiers to the unlabeled sources in the catalog; the result, ∼93%, is consistent with previous research.
3. Calculated the structure function of extreme variability quasars (EVQs) and compared it with that of non-EVQs.
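For readers unfamiliar with the structure function mentioned in item 3, a minimal sketch follows. It assumes the simplest common definition, the root-mean-square magnitude difference between all pairs of epochs whose time lag falls in a given bin, with no correction for photometric noise. The estimator actually used in this work is not stated here, and the function name and toy light curve are hypothetical.

```python
import math


def structure_function(times, mags, bin_edges):
    """RMS magnitude difference per time-lag bin (assumed definition).

    Returns one value per bin, or None for bins that contain no pairs.
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    # Loop over every unordered pair of epochs in the light curve.
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            lag = abs(times[j] - times[i])
            for k in range(n_bins):
                if bin_edges[k] <= lag < bin_edges[k + 1]:
                    sums[k] += (mags[j] - mags[i]) ** 2
                    counts[k] += 1
                    break
    return [math.sqrt(s / c) if c else None for s, c in zip(sums, counts)]


# Toy light curve (made-up values): epochs in days, magnitudes in one band.
toy_times = [0.0, 1.0, 2.0, 3.0]
toy_mags = [19.0, 19.2, 19.0, 19.2]
print(structure_function(toy_times, toy_mags, [0.5, 1.5, 2.5, 3.5]))
```

Comparing EVQs with non-EVQs would then amount to evaluating such a function for each sample over the same lag bins.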
Advisor: Junxian Wang at USTC in 2019