All of us here at the Andromeda Project are really excited to re-launch the project for a second round of classification in mid-October. While we’re busy getting everything ready for you to look at, we wanted to give you an update on what we’ve been doing to analyze your work since last December.
The Basics — From Clicks to Clusters
The primary goal of the Andromeda Project is to find star clusters, big and small. Our hard-working Round 1 participants looked at images over 1 million times, clicking on all the clusters they found. This means at least 80 of you looked at each of the 12,425 Round 1 images. Matt Wallace, a University of Utah undergraduate student who has joined the Andromeda Project Science Team, found watching your image clicks mesmerizing and made this movie, in which the drawings are color-coded by the type of object (cluster=white, galaxy=green):
The simplest way we can translate these clicks into real clusters is to look at the fraction of people who called an object a cluster. If 72 of the 80 people who looked at an image circled the same object, then that object has a “ClusterFrac” of 0.90 or 90%, while if only 8 of the 80 people clicked on something, its ClusterFrac is 0.10. We can then pick out clusters by choosing a threshold ClusterFrac (e.g. 0.35): by keeping only objects above this threshold we get mostly real clusters without including too many objects that aren’t clusters.
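For readers who like to see the arithmetic, here is a minimal Python sketch of the ClusterFrac calculation and the threshold cut. The function names, the object names and the toy click counts are made up for illustration and are not the science team's actual code. We also assume that an object sitting exactly at the threshold is kept.

```python
def cluster_frac(n_cluster_clicks, n_viewers):
    """Fraction of the people who viewed an image that marked this object as a cluster."""
    if n_viewers <= 0:
        raise ValueError("n_viewers must be positive")
    return n_cluster_clicks / n_viewers


def select_clusters(candidates, threshold=0.35):
    """Return the names of candidates whose ClusterFrac is at or above the threshold.

    candidates: list of (name, n_cluster_clicks, n_viewers) tuples (hypothetical layout).
    """
    return [
        name
        for name, clicks, viewers in candidates
        if cluster_frac(clicks, viewers) >= threshold
    ]


# Toy data: the first two objects match the worked examples in the text.
candidates = [
    ("obj_a", 72, 80),  # ClusterFrac = 0.90
    ("obj_b", 8, 80),   # ClusterFrac = 0.10
    ("obj_c", 30, 80),  # ClusterFrac = 0.375
]
print(select_clusters(candidates))
```

With the example threshold of 0.35, the first and third toy objects pass the cut and the second does not.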
Our Testing Ground — the Year 1 Sample
How do we know if we’re finding the clusters that we want to find? One reference point is the “Year 1” cluster sample, published in Johnson et al. 2012. This sample is based on about 20% of the Andromeda Project images, which a group of professional astronomers looked through to create our initial catalog of 601 good star clusters, as well as a catalog of galaxies and other non-cluster objects. The fraction of these Year 1 clusters that were also found by Andromeda Project users tells us the completeness of the Andromeda Project cluster sample. A completeness value of 0.90 means that 90% of these Year 1 clusters were found by Andromeda Project users. The lower we make the ClusterFrac threshold, the higher the resulting completeness. We can also look at all the other objects that were found that might be contaminants; these include previously classified galaxies and objects we previously decided weren’t clusters, as well as objects that were not identified by professional astronomers during the Year 1 search (at least some of which may be real clusters!).
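The completeness check described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the identifiers are invented, and we pretend that objects in the two catalogs have already been matched up by a shared ID, which in practice is a separate step.

```python
def completeness(reference_ids, recovered_ids):
    """Fraction of the reference (e.g. Year 1) clusters that also appear in the recovered sample."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference catalog is empty")
    return len(reference & set(recovered_ids)) / len(reference)


# Toy example: 10 reference clusters, 9 of which were recovered,
# plus 2 extra objects that are not in the reference catalog.
year1_clusters = [f"Y1-{i}" for i in range(10)]
recovered = year1_clusters[:9] + ["new-1", "new-2"]

print(completeness(year1_clusters, recovered))
```

In this toy case the completeness is 0.9. The two extra objects do not affect it; they are the kind of object that would be examined separately as possible contaminants or as new clusters.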
The plot below summarizes how we make the comparison between the Andromeda Project data and the Year 1 cluster sample.
The top panel of this plot shows the ClusterFrac threshold required to get the completeness shown on the horizontal x-axis. For instance, to achieve 90% completeness of the Year 1 cluster catalog, we need to use a ClusterFrac threshold of 0.35 (i.e. where at least 35% of people clicked on the candidates). In the bottom panel, we can look at the number of possible non-cluster contaminants we pick up along with the good clusters. For instance, for 90% completeness in the Year 1 sample, we find that a few percent of the objects in the sample are known galaxies, 7-8% are objects we previously decided were not clusters, and a similar fraction were previously unidentified. On the other hand, if we use a ClusterFrac threshold of 0.5, contaminants make up <10% of the sample, but we only include ~75% of the Year 1 clusters.
Our goal is to analyze the data in a way that maximizes the completeness and minimizes the number of contaminants. As part of this effort we’re weighting users based on how well they did at identifying good clusters, and trying to determine which objects Andromeda Project users might be missing. This may sound kind of critical, so we’d like to emphasize how awesome the data is. Regardless of how we analyze the data, the Andromeda Project will produce the largest and best characterized sample of clusters known in any galaxy! Thanks to you!
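If you’d like to play with the trade-off between completeness and contamination yourself, here is a rough Python sketch. Everything in it is an assumption made for illustration: the toy ClusterFrac values are invented, and “contaminant” here simply means any selected object that is not in the reference sample, which lumps together known galaxies, rejected objects and previously unidentified objects (some of which may be real clusters).

```python
def tradeoff(objects, thresholds):
    """For each threshold, return (threshold, completeness, contaminant fraction).

    objects: list of (cluster_frac, is_reference_cluster) pairs (hypothetical layout).
    """
    n_reference = sum(1 for _, is_ref in objects if is_ref)
    rows = []
    for t in thresholds:
        # Keep every object whose ClusterFrac is at or above the threshold.
        selected = [is_ref for frac, is_ref in objects if frac >= t]
        found = sum(selected)
        comp = found / n_reference if n_reference else 0.0
        contam = (len(selected) - found) / len(selected) if selected else 0.0
        rows.append((t, comp, contam))
    return rows


# Toy data: four reference clusters and two other objects.
objects = [
    (0.95, True), (0.80, True), (0.60, True), (0.40, True),
    (0.45, False), (0.20, False),
]
for t, comp, contam in tradeoff(objects, [0.35, 0.5]):
    print(f"threshold={t:.2f}  completeness={comp:.2f}  contaminants={contam:.2f}")
```

In this toy sample, raising the threshold from 0.35 to 0.5 removes the one contaminant but also drops one of the four reference clusters, which is the same kind of trade-off the plot shows for the real data.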