Although Big Data has become common in many domains, developing efficient, automated methods for mining ever-growing data sets remains a pressing challenge for new generations of data scientists. These challenges span wide swathes of society, business and research. Astronomers, with their high-tech observatories, have historically been at the forefront of this field, but the impact on, for example, commercial applications, security, environmental monitoring and experimental research is also immense. We aim to contribute to this general discussion by training a number of young scientists in computer science and astronomy, focussing on techniques for automated learning from large quantities of data to answer fundamental questions about how the properties of galaxies evolve. While these techniques will lead to major advances in our understanding of the formation and evolution of galaxies, we will also promote, in collaboration with industry, much more general applications in society, e.g. in medical imaging or remote sensing. We have put together a team of astronomers and computer scientists, from academic and private-sector partners, to develop techniques to detect and classify ultra-faint galaxies and galaxy remnants in a deep survey of the Fornax cluster, and to use the results to study how galaxies evolve in the dense environment of galaxy clusters. With a team of young researchers we will develop novel computer science algorithms addressing fundamental topics in galaxy formation. The collaboration is unique: it will develop a platform for deep symbiosis between two radically different approaches, purely data-driven machine learning and specialist approaches based on techniques developed in astronomy. Young scientists trained with such skills are in high demand in both research and business.
The objective of SUNDIAL is to train researchers to address the most prominent CI topics related to the analysis of Big Data and their application to galaxy evolution studies. These are:
(1) Automatic detection of faint low surface brightness features (dwarf galaxies, merger remnants, intracluster light) in deep astronomical surveys, and interpreting them astrophysically in terms of galaxy formation and evolution.
(2) Automated object recognition in Big Data sets: (a) the unsupervised identification of groups of objects with similar properties (clustering), and (b) the supervised assignment of objects to pre-defined target classes (classification). The addition of prior information from astrophysics will be crucial in both cases.
(3) Simulations of galaxy interactions, their characterisation and visualisation. The simulations serve to identify the key characteristics needed to describe the observations optimally. Comparing simulations with observations in this way will lead to a better parametrisation and understanding of galaxy cluster evolution.
Developing detection methods for faint objects (problem 1) involves a number of aspects. First, one has to understand the dataset and the instrumental effects that are intrinsic to it. This is the astronomical part of the problem. Second, the detection algorithm itself has to be optimised. In SUNDIAL this is done in close collaboration between astronomers and computer scientists. The third and final task is to determine accurate source parameters, such as size and flux. We have advanced considerably in parts one and two. We have improved MTObjects, a detection algorithm previously developed by our group. This tool has been compared extensively with other faint object detection methods, and turns out to give superior performance in most aspects. At the same time we are working on understanding and characterising artefacts in deep datasets, which is necessary to decide whether faint detections are real astronomical sources, artefacts in the data, or foreground features such as Galactic cirrus. The third step, automatically determining reliable parameters, will follow in the coming year.
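To illustrate the detection step in its simplest form, the following Python sketch flags connected groups of pixels that lie above a robust background estimate and reports basic source parameters (size, flux, centroid). It is not MTObjects and does not reproduce its algorithm; the function name detect_sources, the parameters k_sigma and min_pixels, and the synthetic test image are all assumptions made for this example. A single global threshold like this one cannot separate nested sources, which is one reason more sophisticated tools are needed for deep data.

```python
import numpy as np
from collections import deque

# Hypothetical baseline for illustration only; this is NOT the MTObjects algorithm.
def detect_sources(image, k_sigma=3.0, min_pixels=5):
    """Return connected pixel groups brighter than background + k_sigma * noise."""
    # Robust background and noise: median and scaled median absolute deviation.
    background = np.median(image)
    noise = 1.4826 * np.median(np.abs(image - background))
    mask = image > background + k_sigma * noise

    visited = np.zeros(image.shape, dtype=bool)
    n_rows, n_cols = image.shape
    sources = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if visited[r0, c0]:
            continue
        # Breadth-first flood fill with 4-connectivity.
        visited[r0, c0] = True
        queue = deque([(r0, c0)])
        pixels = []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                if inside and mask[rr, cc] and not visited[rr, cc]:
                    visited[rr, cc] = True
                    queue.append((rr, cc))
        # Discard groups too small to be trusted as sources.
        if len(pixels) >= min_pixels:
            rows, cols = np.array(pixels).T
            sources.append({
                "n_pixels": len(pixels),
                "flux": float(np.sum(image[rows, cols] - background)),
                "centroid": (float(rows.mean()), float(cols.mean())),
            })
    return sources

# Synthetic example (assumed data): unit Gaussian noise plus one 5x5 bright patch.
rng = np.random.default_rng(0)
image = rng.normal(0.0, 1.0, (64, 64))
image[30:35, 30:35] += 10.0
sources = detect_sources(image, k_sigma=4.0, min_pixels=5)
```

Each entry of sources holds the pixel count, background-subtracted flux and centroid of one detection. Real survey images would additionally need a spatially varying background model and a treatment of the artefacts discussed above.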
We have been developing two types of analysis for the morphological classification of galaxies in the GAMA catalogue (problem 2): an unsupervised and a supervised analysis, both with prototype-based methods. We assessed whether class structure can be recovered by clustering the data with the unsupervised Self-Organizing Map (SOM), and investigated whether the morphological classification can be reproduced using the Generalized Matrix Learning Vector Quantization (GMLVQ) method. We are able to produce state-of-the-art results, but are limited by the human bias in morphological classification schemes. For that reason we aim to develop new, physically based classification schemes using new information, especially from the outer parts of galaxies, making optimal use of the astronomical datasets mentioned before.
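The supervised, prototype-based idea can be illustrated with basic LVQ1, a much simpler relative of GMLVQ: each class is represented by a prototype vector, which is pulled towards correctly classified samples and pushed away from misclassified ones. GMLVQ additionally learns a relevance matrix that adapts the distance measure, which this sketch omits. The function names, hyperparameters and the two-class synthetic data below are assumptions for illustration; the data merely stand in for galaxy features and are not taken from GAMA.

```python
import numpy as np

# Illustrative LVQ1 sketch; a simplified stand-in, not the GMLVQ setup used in the project.
def train_lvq1(X, y, n_epochs=30, learning_rate=0.05, seed=0):
    """Train one prototype per class with the LVQ1 update rule."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # Initialise each prototype at the mean of its class.
    prototypes = np.array([X[y == c].mean(axis=0) for c in classes])
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            # Find the prototype closest to the sample (squared Euclidean distance).
            distances = np.sum((prototypes - X[i]) ** 2, axis=1)
            winner = np.argmin(distances)
            # Attract the winner if its class is correct, repel it otherwise.
            sign = 1.0 if classes[winner] == y[i] else -1.0
            prototypes[winner] += sign * learning_rate * (X[i] - prototypes[winner])
    return prototypes, classes

def predict_lvq(X, prototypes, classes):
    """Assign each sample the class of its nearest prototype."""
    distances = np.sum((X[:, None, :] - prototypes[None, :, :]) ** 2, axis=2)
    return classes[np.argmin(distances, axis=1)]

# Synthetic two-class data (assumed): two well-separated Gaussian clouds.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(4.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
prototypes, classes = train_lvq1(X, y)
accuracy = np.mean(predict_lvq(X, prototypes, classes) == y)
```

Because the trained prototypes live in the same space as the data, they can be inspected directly as typical representatives of each class, which is one attraction of prototype-based methods for scientific classification.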
As a testbed for problem 3, we use the jellyfish galaxy NGC 1427A, a galaxy with large amounts of gas forming many stars at present, which is presumably falling into the Fornax Cluster, and losing its gas due to ram pressure stripping by the intracluster medium. At present, we are making realistic computer simulations of such late-type dwarf galaxies to model this object falling into the cluster. To describe these simulations, we have developed models which are able to automatically detect dense (possibly lower-dimensional) structures embedded in a substantial noisy background.
Our comparison of various faint-object detection methods shows that MTObjects is the best method for detecting faint galaxies in deep data. It is reliable, fast, and objective. We are working on a number of applications that should convincingly show the scientific community that this is the prime tool to use. MTObjects is currently the only tool in the comparison that can reliably detect nested sources, and it provides an initial deblending estimate. It performs consistently, with the same parameter settings, across different quality criteria and data sources.
To prepare attractive samples for galaxy classification, including spectral information, we have created a K-band infrared imaging survey of the Fornax Cluster. With this survey, together with the optical data of the Fornax Deep Survey (FDS), we are preparing samples of galaxies with known photometric decompositions, which will serve as training sets for automatic morphological decompositions. For this we are developing new classifiers with more discriminative power, which we measure on the galaxy images. These include faint imaging features and spectral information for part of the dataset. We will also explore how to include spectral data in the automatic methods, using ultra-compact dwarf galaxies (UCDs) as a first training set.
We have performed realistic simulations of late-type dwarf galaxies falling into the Fornax Cluster. We are characterising them with a fully developed novel machine learning method for robust detection of multiple low-dimensional manifolds in a potentially significant noisy background. The manifolds can be of different dimensionalities and the methodology does not assume their number is known in advance. The method gives us potentially unprecedented possibilities to quantitatively compare simulations with each other and with observations (with full flexibility in defining the observation space). More refined calibration of simulations enabled by this methodology can help us to better understand the physics of dwarf galaxies falling into clusters.
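The general problem of separating dense structures from a noisy background can be illustrated with a much cruder technique than the method described above: thresholding the distance to the k-th nearest neighbour. The sketch below is not the project's manifold-detection method; it does not estimate the number or dimensionality of manifolds. The function name kth_neighbour_distance, the values of k and radius, and the synthetic point set (a thin line embedded in uniform noise) are assumptions chosen for this example.

```python
import numpy as np

# Crude density filter for illustration; NOT the manifold-detection method of the project.
def kth_neighbour_distance(points, k=5):
    """Distance from each point to its k-th nearest neighbour (brute force, O(n^2))."""
    diffs = points[:, None, :] - points[None, :, :]
    distances = np.sqrt(np.sum(diffs ** 2, axis=2))
    distances.sort(axis=1)
    # Column 0 is the zero distance of each point to itself.
    return distances[:, k]

# Synthetic data (assumed): a thin 1D structure in 2D plus uniform background noise.
rng = np.random.default_rng(1)
n_structure, n_noise = 200, 200
t = rng.uniform(0.1, 0.9, n_structure)
structure = np.column_stack([t, t]) + rng.normal(0.0, 0.01, (n_structure, 2))
noise = rng.uniform(0.0, 1.0, (n_noise, 2))
points = np.vstack([structure, noise])

# Points whose k-th neighbour lies within the radius are treated as part of a dense structure.
radius = 0.04
is_dense = kth_neighbour_distance(points, k=5) < radius

# Fraction of structure points kept, and fraction of noise points wrongly kept.
recall = is_dense[:n_structure].mean()
false_positive_rate = is_dense[n_structure:].mean()
```

A fixed radius works here only because the structure is far denser than the background; noise points that happen to lie close to the line are still kept, which shows why a more robust, model-based approach is needed for real simulation data.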
More info: http://www.astro.rug.nl/sundial.