Extracting physical insight from a random forest classifier: A rigorous test of AGN feedback models against the observed local universe
Thursday
Abstract details
Date Submitted
2021-04-28 15:24:00
Joanna Piotrowska
University of Cambridge
Machine Learning Applications in Astronomy
Contributed
J. Piotrowska (University of Cambridge), A. Bluck (University of Cambridge), R. Maiolino (University of Cambridge), Y. Peng (KIAA, Peking University)
Understanding the physical processes responsible for the cessation of star formation in galaxies is one of the most important, and complex, open questions in observational cosmology. In this talk we investigate how star formation is brought to a halt in local, massive, central galaxies by comparing Sloan Digital Sky Survey (SDSS) observations with three state-of-the-art cosmological simulations: EAGLE, Illustris and IllustrisTNG.
To address the complex and highly non-linear nature of quenching in the observational parameter space, we combine machine learning techniques with more conventional statistical methods to determine which galactic properties are fundamentally connected with the physical process of central galaxy quenching.
We optimise a Random Forest (RF) architecture to classify galaxies into star-forming and quenched categories on the basis of their stellar, halo and black hole masses (in addition to black hole accretion rates in simulations). Having successfully optimised and trained the RFs, we then extract feature ‘importances’ (via reduction in Gini impurity) to determine which galactic properties provide the highest information gain across all decision trees. Additionally, we rigorously test for the influence of measurement uncertainty, inter-correlation between input parameters, and sample selection on the inferred importance. We demonstrate that our conclusions drawn from the RF classification are highly robust to all of these common issues inherent in information extraction via machine learning.
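As an illustration of the quantity behind the feature importances described above, the following minimal sketch computes the Gini impurity decrease of a single threshold split for three correlated inputs. It is not the authors' pipeline: the synthetic galaxy sample, the scatter values, and the names `gini`, `best_gini_reduction`, `log_mbh`, `log_mstar` and `log_mhalo` are all assumptions made for illustration. A random forest would accumulate such decreases over many splits and many trees rather than a single split.

```python
import random


def gini(labels):
    """Gini impurity of binary labels (1 = quenched, 0 = star-forming)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def best_gini_reduction(values, labels):
    """Largest impurity decrease from a single threshold split on `values`."""
    n = len(labels)
    parent = gini(labels)
    ordered = [lab for _, lab in sorted(zip(values, labels))]
    keys = sorted(values)
    best = 0.0
    for i in range(1, n):
        if keys[i] == keys[i - 1]:
            continue  # no threshold can separate identical values
        left, right = ordered[:i], ordered[i:]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best


# Toy sample: every number below is an illustrative assumption, not survey data.
rng = random.Random(0)
sample = {"log_mbh": [], "log_mstar": [], "log_mhalo": []}
quenched = []
for _ in range(400):
    log_mbh = rng.gauss(7.5, 1.0)
    log_mstar = log_mbh + 3.0 + rng.gauss(0.0, 0.6)    # correlated with log_mbh
    log_mhalo = log_mstar + 1.5 + rng.gauss(0.0, 0.6)  # correlated with log_mstar
    sample["log_mbh"].append(log_mbh)
    sample["log_mstar"].append(log_mstar)
    sample["log_mhalo"].append(log_mhalo)
    # By construction, quenching in this toy depends on black hole mass only.
    quenched.append(1 if log_mbh + rng.gauss(0.0, 0.3) > 7.5 else 0)

reductions = {
    name: best_gini_reduction(vals, quenched) for name, vals in sample.items()
}
ranked = sorted(reductions.items(), key=lambda kv: kv[1], reverse=True)
for name, value in ranked:
    print(f"{name}: {value:.3f}")
```

Because the toy labels are driven by `log_mbh` by construction, its top ranking here is built in. The sketch only shows how inter-correlated inputs can be compared by impurity decrease, and says nothing about the observed or simulated galaxy samples.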
In terms of scientific results, we find that supermassive black hole mass (M_BH) is the most predictive parameter for determining whether a galaxy is star-forming or passive in observations, a result which also holds for all three implementations of AGN feedback in the simulations. Yet this fundamental consistency between observations and simulations lay hidden in the inter-correlation of the raw data: only through machine learning could we extract this key result.