As more and more data is collected by science, industry, finance, and other fields, the need increases for automated methods to identify content of interest. The discovery of new or unusual observations within large data sets is a key element of the scientific process, since unexpected observations can inspire revisions to current knowledge and overturn existing theories. When exploring a new environment, such as the deep ocean or the surface of Mars, quickly identifying observations that do not fit our expectations is essential for making the best use of limited mission lifetimes. The challenge is particularly acute for image data sets that may contain millions of images (or more), rendering exhaustive manual review infeasible.
Once an observation is identified as novel or anomalous, the next question is generally, “Why?” To investigate the anomaly, users need to know what properties of the observation caused it to be selected. For images, these properties might include color, shape, location, objects, content, etc.
In this work, we present the first method to generate human-comprehensible (visual) explanations for novel discoveries in large image data sets. A simple example is shown below, where a flower with a dark center was selected from a set of images containing yellow objects (e.g., banana, squash, lemon). The yellow color is not novel (middle), so it is omitted from the explanation (right). Instead, the dark center of the flower is highlighted as novel. We describe the details of how the selections are made and how the explanations are generated, then conduct experiments that include studies of well-known ImageNet images and data compiled for real scientific investigations. One of the urgent questions within the field is whether interpretability has to come at the expense of accuracy or performance. Our results support an optimistic answer: in the case of novel image detection, we can obtain explainable results in tandem with the best discovery performance.