Photometric Redshifts

Methods overview

The estimation method is the same as the one used in Data Release 10; following the name used in Csabai et al. (2007), we refer to it as a kd-tree nearest neighbor fit (KF). The KF estimates are stored in the Photoz table in the CAS.

The method is empirical in the sense that it uses a training set of galaxies with both photometric and spectroscopic observations as a reference, and then applies a machine learning technique to estimate redshifts. We chose this approach, rather than template fitting, because machine learning techniques give higher overall precision. The second estimation method offered in earlier releases was dropped because we found that the main factor limiting the accuracy of the results is the composition and photometric errors of the training set, not the choice of machine learning technique.
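The core idea of a kd-tree nearest neighbor fit can be illustrated with a short sketch. It assumes SciPy's cKDTree for the neighbor search, a hypothetical feature vector per galaxy (for example r magnitude plus colors), k=100 neighbors, and a local linear fit of redshift against those features. The outlier removal and error estimation that the KF pipeline performs are omitted, so this is an illustration of the technique, not the production code.

```python
import numpy as np
from scipy.spatial import cKDTree


def kf_photoz(train_X, train_z, query_x, k=100):
    """Sketch of a kd-tree nearest-neighbor local linear fit for one galaxy.

    train_X : (n, d) training features (hypothetical choice, e.g. r and colors)
    train_z : (n,) spectroscopic redshifts of the training galaxies
    query_x : (d,) features of the galaxy to estimate
    """
    train_X = np.asarray(train_X, dtype=float)
    train_z = np.asarray(train_z, dtype=float)
    query_x = np.asarray(query_x, dtype=float)

    # Find the k nearest training galaxies in feature space.
    tree = cKDTree(train_X)
    _, idx = tree.query(query_x, k=k)
    X_nn, z_nn = train_X[idx], train_z[idx]

    # Least-squares fit z ~ c0 + c . x using the neighbors only.
    A = np.column_stack([np.ones(len(idx)), X_nn])
    coef, *_ = np.linalg.lstsq(A, z_nn, rcond=None)

    # Evaluate the local linear model at the query point.
    return float(coef[0] + query_x @ coef[1:])
```

For many galaxies one would build the tree once and reuse it; it is rebuilt here only to keep the function self-contained.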

Photozs updated in DR12, unchanged in DR13 and later data releases

Data Release 12 includes photometric redshift estimates for all primary photometric measurements tagged as galaxies (i.e., the elements of the GalaxyTag view in the CAS). This version features a greatly expanded training set, an updated template-fitting method, and a more detailed treatment of errors. Unlike previous releases, we provide the results of only one estimation technique; they are stored in the Photoz and PhotozErrorMap tables in the CAS. This page summarizes the methods used to calculate the photometric redshift estimates; further details are available in Beck et al. (2016).

To infer physical parameters of galaxies such as k-corrections, spectral type, and rest-frame colors, we extend the KF method with a conservative template-fitting step. We determined the best-fitting template via a minimum chi-square fit to the photometric magnitudes, using the composite spectral template atlas of Dobos et al. (2012). The photometric errors were calculated using the prescriptions of Scranton et al. (2005).
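A minimum chi-square single-template fit can be sketched as follows. The sketch assumes the templates have already been converted to magnitudes in the SDSS bands at the galaxy's fixed KF redshift, and it assumes the free normalization is an additive magnitude offset; it does not reproduce the pipeline's exact chisq or rnorm definitions.

```python
import numpy as np


def best_template(mags, mag_errs, template_mags):
    """Pick the template with minimum chi-square against observed magnitudes.

    mags, mag_errs : (n_bands,) observed magnitudes and their errors
    template_mags  : (n_templates, n_bands) template magnitudes, assumed to be
                     already evaluated at the galaxy's photometric redshift
    Returns (index, offset, chi2); offset is the fitted normalization.
    """
    mags = np.asarray(mags, dtype=float)
    w = 1.0 / np.asarray(mag_errs, dtype=float) ** 2
    T = np.asarray(template_mags, dtype=float)

    # Best additive offset per template: inverse-variance weighted mean
    # of (observed - template) over the bands.
    diff = mags - T                                   # (n_templates, n_bands)
    offsets = (diff * w).sum(axis=1) / w.sum()
    chi2 = (((diff - offsets[:, None]) ** 2) * w).sum(axis=1)

    i = int(np.argmin(chi2))
    return i, float(offsets[i]), float(chi2[i])
```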

The previous method, used in Data Release 10, computed a non-negative least-squares (NNLS) linear combination of spectral model templates. Although this method is more sophisticated, it is prone to overfitting and also allows non-physical spectral solutions, which is especially problematic when the photometric errors are underestimated. The current method is limited by the number and coverage of the templates used, but it avoids these issues.
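The overfitting concern can be seen in a small experiment. A single template with a non-negative scale factor is one feasible solution of the NNLS problem, so an NNLS combination can never give a larger chi-square than the best single template, whether or not the combination is physical. This sketch works in flux space with synthetic, hypothetical template fluxes and uses scipy.optimize.nnls; it illustrates the argument and is not the DR10 code.

```python
import numpy as np
from scipy.optimize import nnls


def single_template_chi2(flux, err, templates):
    """Best chi-square over single templates with a non-negative scale."""
    w = 1.0 / err ** 2
    best = np.inf
    for t in templates:
        # Optimal scale a >= 0 minimizing sum w (flux - a t)^2.
        a = max(0.0, np.sum(w * flux * t) / np.sum(w * t * t))
        best = min(best, np.sum(w * (flux - a * t) ** 2))
    return best


def nnls_chi2(flux, err, templates):
    """Chi-square of a non-negative linear combination of all templates."""
    A = (np.asarray(templates) / err).T   # (n_bands, n_templates), rows weighted by 1/err
    b = flux / err
    coeffs, rnorm = nnls(A, b)            # rnorm is the residual 2-norm
    return rnorm ** 2, coeffs
```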

Training set overview

The training set is made up of three main subsets. The first two are extracted from the DR12 spectroscopic catalog: the main galaxy sample, containing more than 830,000 galaxies (average r magnitude: 17.3; average redshift: 0.15, extending to 0.5), and the BOSS sample, comprising over 1,060,000 galaxies (average r: 19.7; average redshift: 0.45, extending to 0.8). We also matched results from nine public spectroscopic redshift surveys to SDSS photometry, yielding 76,000 additional galaxies in the third subset (average r: 19.1; average redshift: 0.25, extending to 0.8). To ensure higher accuracy, we applied cuts in photometric color space and on photometric errors; however, these cuts greatly limited the size and redshift coverage of the third subset. The RMS values of the estimation errors for the three subsets are 0.029, 0.050, and 0.070, respectively. Fainter objects generally have significantly larger photometric errors, which in turn leads to larger redshift estimation errors.
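RMS figures like those above, and the growth of the error toward fainter magnitudes, can be computed for any set of galaxies with both photometric and spectroscopic redshifts. A minimal sketch, assuming the estimation error is the plain difference z_phot - z_spec (no 1+z normalization) and using hypothetical r-magnitude bin edges:

```python
import numpy as np


def rms_by_magnitude(z_phot, z_spec, r_mag, edges):
    """RMS of (z_phot - z_spec) in r-magnitude bins [edges[i], edges[i+1])."""
    dz = np.asarray(z_phot, dtype=float) - np.asarray(z_spec, dtype=float)
    r_mag = np.asarray(r_mag, dtype=float)
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (r_mag >= lo) & (r_mag < hi)
        # Empty bins get NaN rather than a misleading zero.
        result.append(float(np.sqrt(np.mean(dz[in_bin] ** 2))) if in_bin.any() else float("nan"))
    return result
```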

Error fields and flags

The error statistics of the reference set are good indicators of the error of the estimated redshifts only when the objects to be estimated follow the same distribution in color space and have the same photometric error properties as the training set. The KF method provides an explicit estimate of the redshift error (zErr), and we have found this estimate to be reliable and unbiased when these assumptions hold.

The new flag field photoErrorClass in the Photoz table divides galaxies into 7 categories based on their photometric errors: 1 is the best, matching the limits of the training set; 2 is somewhat worse; and so on up to 7, the worst. In addition, the sign of photoErrorClass shows whether the galaxy lies within the bounding box of its k nearest neighbors: positive if inside (the estimate is an interpolation), negative if outside (the estimate is an extrapolation). The following table shows the average RMS for each photoErrorClass value, calculated for all galaxies with available redshifts.

Average RMS for different photoErrorClass values
photoErrorClass (inside bounding box) | RMS | photoErrorClass (outside bounding box) | RMS
1 | 0.043 | -1 | 0.066
2 | 0.074 | -2 | 0.17
3 | 0.074 | -3 | 0.15
4 | 0.085 | -4 | 0.16
5 | 0.097 | -5 | 0.16
6 | 0.11 | -6 | 0.17
7 | 0.17 | -7 | 0.26
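The sign convention of photoErrorClass can be sketched as follows, assuming the bounding box is the axis-aligned box spanned by the neighbors' feature vectors. The feature space and all helper names are hypothetical; a box-volume helper in the spirit of nnVol is included for completeness.

```python
import numpy as np


def inside_bounding_box(x, neighbors):
    """True if point x lies in the axis-aligned box spanned by the neighbors."""
    nb = np.asarray(neighbors, dtype=float)
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= nb.min(axis=0)) and np.all(x <= nb.max(axis=0)))


def signed_error_class(error_class, x, neighbors):
    """Attach the interpolation (+) / extrapolation (-) sign to a class in 1..7."""
    return error_class if inside_bounding_box(x, neighbors) else -error_class


def bounding_box_volume(neighbors):
    """Volume of the neighbors' axis-aligned bounding box."""
    nb = np.asarray(neighbors, dtype=float)
    return float(np.prod(nb.max(axis=0) - nb.min(axis=0)))


def decode_error_class(photo_error_class):
    """Split a stored photoErrorClass into (quality class, interpolated?)."""
    return abs(photo_error_class), photo_error_class > 0
```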

The redshift error (zErr) values in the Photoz table are valid estimates only for photoErrorClass 1. For other error classes, an additional statistical error must be taken into account; however, this error depends strongly on the location in color and magnitude space. We recommend using photoErrorClass 1 (and, at most, perhaps -1, 2, and 3), with additional filtering based on the zErr values.
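In a local analysis, the recommended selection might look like the sketch below. It assumes the Photoz rows have already been retrieved (for example from CasJobs) as Python dictionaries keyed by the column names z, zErr, and photoErrorClass; the zErr cut of 0.05 and the function name are hypothetical, not recommended values.

```python
MISSING = -9999  # value used instead of NULL in the Photoz-related tables


def select_reliable(rows, allowed_classes=(1,), max_zerr=0.05):
    """Keep rows whose photoErrorClass is allowed and whose zErr is small.

    rows: iterable of dicts keyed by Photoz column names.
    max_zerr is a hypothetical cut, to be tuned for the science case.
    """
    selected = []
    for row in rows:
        # Skip galaxies for which no estimate was possible.
        if row["z"] == MISSING or row["zErr"] == MISSING:
            continue
        if row["photoErrorClass"] not in allowed_classes:
            continue
        if row["zErr"] > max_zerr:
            continue
        selected.append(row)
    return selected
```

Passing allowed_classes=(1, -1, 2, 3) corresponds to the most permissive selection suggested above.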

We added the table PhotozErrorMap, which provides supporting information, based on the training set, on how the errors depend on position in color space. For each cell of a grid in r magnitude and g-r and r-i colors, it gives the average actual RMS, the average error estimate, the average standard deviation of the k nearest neighbors, the average photometric and spectroscopic redshifts, and the number of training set galaxies. This table can be used to pinpoint regions with poor training set coverage and with bad error estimates.
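How a grid like PhotozErrorMap summarizes the training set can be sketched as below. The cell sizes (0.5 mag in r, 0.1 mag in g-r and r-i), the reduced set of per-cell statistics (count and RMS of z_phot - z_spec only), and all names are hypothetical; the actual grid and columns should be taken from the table itself.

```python
import math
from collections import defaultdict


def cell_key(r, g_r, r_i, r_step=0.5, color_step=0.1):
    """Grid cell index for a galaxy in (r, g-r, r-i) space (hypothetical steps)."""
    return (math.floor(r / r_step),
            math.floor(g_r / color_step),
            math.floor(r_i / color_step))


def build_error_map(r, g_r, r_i, z_phot, z_spec):
    """Per-cell training set count and RMS of (z_phot - z_spec)."""
    residuals = defaultdict(list)
    for rv, gr, ri, zp, zs in zip(r, g_r, r_i, z_phot, z_spec):
        residuals[cell_key(rv, gr, ri)].append(zp - zs)
    return {
        key: {"count": len(dz), "rms": math.sqrt(sum(d * d for d in dz) / len(dz))}
        for key, dz in residuals.items()
    }
```

Looking up cell_key for a target galaxy then shows whether its region is well covered (large count) and how large the typical error is there.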

The KF method provides some additional parameters that are useful for quality assurance. For each galaxy in the Photoz table, nnCount is the number of nearest neighbors remaining after outlier removal; a value much smaller than 100 indicates poor training set coverage for that galaxy. Similarly, a large nnVol (the volume of the neighbors' bounding box) warns that the reference set is only sparsely populated around the galaxy. Although the spectroscopic redshift of the nearest neighbor (nnSpecz) and the average redshift of the nearest neighbors (nnAvgZ) are poorer estimators than the fitted redshift (z), values that differ significantly from z may indicate large errors. Note that in all related tables the large negative value -9999 is used instead of NULL to indicate that the estimation was not possible or that the data are not available.
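These quality-assurance parameters can be combined into simple per-galaxy warnings. A minimal sketch, assuming each Photoz row is a dictionary keyed by column name; the thresholds (nnCount below 50, a redshift disagreement above 0.1, and an optional nnVol ceiling) are hypothetical and should be tuned to the application.

```python
MISSING = -9999  # sentinel used instead of NULL in the Photoz-related tables


def qa_warnings(row, min_nn_count=50, max_dz=0.1, max_nn_vol=None):
    """Return warning strings for one Photoz row (dict keyed by column name).

    All thresholds are hypothetical; max_nn_vol=None skips the nnVol check.
    """
    if row["z"] == MISSING:
        return ["no estimate"]
    warnings = []
    if row["nnCount"] < min_nn_count:
        warnings.append("poor training coverage (nnCount)")
    if max_nn_vol is not None and row["nnVol"] > max_nn_vol:
        warnings.append("sparse neighborhood (nnVol)")
    # Large disagreement with the neighbor-based redshifts hints at a bad fit.
    for col in ("nnSpecz", "nnAvgZ"):
        value = row[col]
        if value != MISSING and abs(value - row["z"]) > max_dz:
            warnings.append(f"z disagrees with {col}")
    return warnings
```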

Template fitting details

After the photometric redshift of each galaxy is determined, template fitting is used to estimate the galaxy's k-correction, distance modulus, absolute magnitudes, rest-frame colors, and spectral type. The templates are evaluated at the fixed redshift given by the KF estimator. Where applicable, a cosmology with Omega=0.2739, Lambda=0.726, and h=0.705 was assumed, to match what is used elsewhere in SkyServer/CasJobs; luminosity distances are in Mpc. The chisq and rnorm values indicate the quality of the minimum chi-square fit, and bestFitTemplateID identifies the spectral template giving the best fit; bestFitTemplateID=0 indicates a failed fit. The empirical spectral templates described in Dobos et al. (2012) were used; the following table lists the bestFitTemplateID values with the corresponding names from the compositeatlas library.
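The distance modulus and absolute magnitudes follow from the luminosity distance in the stated cosmology. A minimal sketch using SciPy's quad integrator; it assumes the standard Friedmann expression for E(z), with curvature Omega_k = 1 - Omega - Lambda (about 1e-4 for these values), and assumes the k-correction K comes from the template fit. It is not the SkyServer implementation, so values may differ slightly from the tabulated ones.

```python
import math

from scipy.integrate import quad

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance_mpc(z, omega_m=0.2739, omega_l=0.726, h=0.705):
    """Luminosity distance in Mpc for a Lambda-CDM cosmology with curvature."""
    omega_k = 1.0 - omega_m - omega_l      # about 1e-4 for the default values
    d_h = C_KM_S / (100.0 * h)             # Hubble distance in Mpc

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1 + zp) ** 3 + omega_k * (1 + zp) ** 2 + omega_l)

    d_c = d_h * quad(inv_e, 0.0, z)[0]     # line-of-sight comoving distance
    # Transverse comoving distance, including the curvature correction.
    if omega_k > 0:
        sk = math.sqrt(omega_k)
        d_m = d_h / sk * math.sinh(sk * d_c / d_h)
    elif omega_k < 0:
        sk = math.sqrt(-omega_k)
        d_m = d_h / sk * math.sin(sk * d_c / d_h)
    else:
        d_m = d_c
    return (1 + z) * d_m


def distance_modulus(z, **cosmo):
    """mu = 5 log10(D_L / 10 pc), with D_L in Mpc."""
    return 5.0 * math.log10(luminosity_distance_mpc(z, **cosmo)) + 25.0


def absolute_magnitude(m, z, k_corr, **cosmo):
    """M = m - mu - K, where K is the k-correction from the template fit."""
    return m - distance_modulus(z, **cosmo) - k_corr
```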

bestFitTemplateID values with the corresponding names from the spectral library
bestFitTemplateID | name | bestFitTemplateID | name | bestFitTemplateID | name
1 | p_RG | 14 | GG | 27 | s_G
2 | h_RG | 15 | p_BG | 28 | G
3 | hh_RG | 16 | h_BG | 29 | RED0_0
4 | t_RG | 17 | hh_BG | 30 | RED1_0
5 | l_RG | 18 | t_BG | 31 | RED2_0
6 | s_RG | 19 | l_BG | 32 | RED3_0
7 | RG | 20 | s_BG | 33 | RED4_0
8 | p_GG | 21 | BG | 34 | SF0_0
9 | h_GG | 22 | p_G | 35 | SF1_0
10 | hh_GG | 23 | h_G | 36 | SF2_0
11 | t_GG | 24 | hh_G | 37 | SF3_0
12 | l_GG | 25 | t_G | 38 | SF4_0
13 | s_GG | 26 | l_G | |
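The ID-to-name mapping in the table above follows a regular pattern that can be reproduced programmatically, which is handy for labeling query results. A small sketch; the pattern is read off the table itself, and the function name is hypothetical.

```python
def build_template_names():
    """Map bestFitTemplateID to compositeatlas names, as listed in the table."""
    names = {0: None}  # 0 indicates a failed fit
    tid = 1
    # IDs 1-28: seven variants (six prefixes plus the bare name) per group.
    for group in ("RG", "GG", "BG", "G"):
        for prefix in ("p_", "h_", "hh_", "t_", "l_", "s_", ""):
            names[tid] = prefix + group
            tid += 1
    # IDs 29-38: RED0_0..RED4_0 followed by SF0_0..SF4_0.
    for family in ("RED", "SF"):
        for i in range(5):
            names[tid] = f"{family}{i}_0"
            tid += 1
    return names
```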

Examples

Two examples of querying photometric redshifts in DR12 data are given in SkyServer under Sample Queries: Photometric Redshifts.

External survey data

The following table references the spectroscopic redshift surveys that we used to extend the training set.

Spectroscopic redshift survey references
Survey name | Reference | Website
2dF | Colless et al. (2001), Colless et al. (2003) | http://magnum.anu.edu.au/~TDFgg/
6dF | Jones et al. (2004), Jones et al. (2009) | www.6dfgs.net/
DEEP2 | Davis et al. (2003), Newman et al. (2013) | http://deep.ps.uci.edu/
GAMA | Driver et al. (2011), Baldry et al. (2014) | www.gama-survey.org/
PRIMUS | Coil et al. (2011), Cool et al. (2013) | http://primus.ucsd.edu/
VIPERS | Garilli et al. (2014), Guzzo et al. (2014) | http://vipers.inaf.it/
VVDS | Le Fèvre et al. (2004), Garilli et al. (2008) | https://cesam.lam.fr/vvds/
WiggleZ | Drinkwater et al. (2010), Parkinson et al. (2012) | http://wigglez.swin.edu.au/site/
zCOSMOS | Lilly et al. (2007), Lilly et al. (2009) | http://cesam.lam.fr/zCosmos/