The Optical Spectroscopy Pipeline
Introduction and Resources
The SDSS Early Data Release (EDR) paper is the original resource for understanding the processing and data products from the SDSS, describing the pipelines and spectroscopic data products. Successive data release papers (DR1, DR2, DR3, DR4, DR5, DR6, DR7, DR8, DR9, DR10, DR11, DR12, DR13, DR14, DR15, and DR16) describe changes to the optical spectroscopic data reduction between data releases. The technical summary paper provides more general information on the SDSS-I survey; the SDSS-III and SDSS-IV summary papers do the same for SDSS-III and SDSS-IV. The BOSS overview paper provides general information on BOSS, and the eBOSS overview paper describes SEQUELS and the eBOSS program. The algorithms page includes links to pages describing algorithms used by the spectroscopic data-reduction pipelines. This page provides a summary of those steps and the associated output files.
Most of the optical spectroscopic catalog data (but not the spectra themselves) have been loaded into the Catalog Archive Server (CAS) database. Depending on the scientific use case, users may be better off obtaining SDSS data through a carefully constructed CAS query rather than by downloading the data files from the SAS. Simple queries can be used to select just the objects and parameters of interest, while more complex queries can be used to do complex calculations on many objects, thereby avoiding the need to download the data.
The remainder of this page gives a short description of pipeline changes for DR16 motivated by eBOSS. We provide a brief overview of spectroscopic data processing and sections that describe the steps of data processing in detail. In addition to the descriptions, each section provides references to papers that give additional details and a table of the files associated with that step of the pipeline that can be found in the SAS. These tables include links to the file format documentation (the "data model") and templates which can be used to generate SAS URLs for those files. The templates are in "C printf" format, and can be used in C, bash, Python, and many other languages to automatically generate URLs.
Changes for DR16
- Improved flux-calibration: a new set of stellar templates is now used to fit absorption lines of standard stars in the flux calibration process. Flux calibration residuals were reduced by a factor of 2 in the blue spectrograph (3600 to 6000 Å).
- Improved extraction algorithm: the background flux inside the spectrographs is now modeled prior to the extraction of the flux from individual fibers. This change improved the stability of extraction, mitigated fiber-to-fiber crosstalk from bright objects, and removed some features observed in quasar spectra. While these changes did not significantly improve the spectroscopic classification success rates, they represent an improvement in the overall data quality.
Even though eBOSS is a different survey than BOSS, it uses the same instrument and has similar data to BOSS. Starting with DR13, all SDSS data releases contain both BOSS and eBOSS spectra reprocessed using the latest version of the reduction pipeline. The reprocessing consists of raw CCD count extraction, sky subtraction, flux calibration, wavelength solution, PSF estimates, and the spectral classification (galaxy, quasar or star). We did not, however, reprocess the SDSS-I/-II data.
Overview
- Spectroscopic Observing
- The spectrographs mounted on the primary 2.5m telescope collected spectra from each plate. There are two spectrographs, each of which collects data from 320 (SDSS) or 500 (BOSS/eBOSS) fibers. Each spectrograph has a dichroic that sends light to red and blue cameras, so the instrument produces a total of four images for each exposure.
- Spectroscopic Data Reduction
- The first component of the spectroscopic pipeline extracts one-dimensional spectra from the raw exposures produced by the spectrographs, calibrates them in wavelength and flux, and combines the red and blue halves of the spectra. The second component measures features in these spectra, measures redshifts from these features, and classifies the objects as galaxies, stars, or quasars.
Notes
- In the tables below listing the data URLs, prepend 'https://data.sdss.org/sas/dr16/' to all 'URL format' values to get the full URL.
- URL suffixes are listed for the eBOSS survey, which includes a reprocessing of BOSS data from SDSS-III. Replace "eboss/" with "sdss/" to get the equivalent location for the original SDSS-I/-II survey files.
- Data reduction can occur multiple times for both images and spectra. The output of each unique reduction is labeled with a distinct "rerun" number (for photometry) or different "run2d" and "run1d" versions (for spectroscopy).
- The BOSS+eBOSS spectroscopic data for DR16 are processed with idlspec2d code version run2d=run1d=v5_13_0.
- The legacy target selection and BOSS target selection algorithm pages describe how the various spectroscopic target classes are selected for the BOSS and legacy SDSS surveys.
- The tiling algorithms page describes the process by which the spectroscopic plates are designed and placed relative to each other.
Spectroscopic Observing
Plate Plugging (plug)
When the observatory is ready to observe a plate, the observatory staff plugs optical fibers into the holes drilled into the plates, and maps which fiber corresponds to which hole (and therefore which object) by shining light through each fiber. This data is incorporated into one of the HDUs of the spPlate file described below.
Raw Data Collection
Observers mount cartridges containing the drilled, plugged plates on the telescope and collect a series of 15-minute exposures on each plate until it reaches a threshold estimated signal-to-noise ratio and at least three exposures have been collected.
File Type | in/out | Description | URL format | format parameters |
---|---|---|---|---|
sdReport | out | records exposures collected on a night | Not public | |
sdR | out | raw spectroscopic data frames | eboss/spectro/data/%d/sdR-%c%d-%08d.fit.gz | mjd, CCD (r or b), camera (1 or 2), exposure id |
Spectroscopic Data Reduction
The idlspec2d software has two major pipeline steps:
- spec2d
- Extract and calibrate 1-dimensional spectra from 2-dimensional raw CCD data
- spec1d
- Measure object classifications and redshift from those 1D spectra.
Two-dimensional Pipeline (spec2d)
References: Stoughton et al. (2002), section 4.10.1
The spec2d pipeline reads science and calibration exposures from the spectrographs, reduces and calibrates the science exposures, extracts the one-dimensional spectra from the two-dimensional exposures, stacks multiple exposures into combined spectra, and produces corresponding masks and noise estimates.
File Type | in/out | Description | URL format | format parameters |
---|---|---|---|---|
spPlan2d | in | the spectro2d processing plan | eboss/spectro/redux/%s/%04d/spPlan2d-%04d-%d.par | run2d, plate, plate, mjd |
spPlancomb | in | the processing plan for combining spectra | eboss/spectro/redux/%s/%04d/spPlancomb-%04d-%d.par | run2d, plate, plate, mjd |
plPlugMapM | in | records which fiber corresponds to which hole in a plate (and therefore objects, and what coordinates on the sky) | plPlugMapM not public, but table is in HDU 5 of the spPlate files | |
sdReport | in | records exposures collected on a night | Not released | |
sdR | in | raw spectroscopic data frames | eboss/spectro/data/%d/sdR-%c%d-%08d.fit.gz | mjd, CCD (r or b), camera (1 or 2), exposure id |
spCFrame | out | calibrated spectra for a single CCD and exposure | eboss/spectro/redux/%s/%04d/spCFrame-%c%d-%08d.fits | run2d, plate, CCD (r or b), camera (1 or 2), exposure id |
spPlate | out | the 640 (SDSS) or 1000 (BOSS) combined flux- and wavelength-calibrated spectra over all exposures (potentially spanning multiple nights) for a given mapped plate | eboss/spectro/redux/%s/%04d/spPlate-%04d-%d.fits | run2d, plate, plate, mjd |
Example: the spPlate file for plate 4444 MJD 55538 is at URL
https://data.sdss.org/sas/dr16/eboss/spectro/redux/v5_13_0/4444/spPlate-4444-55538.fits [107 MB].
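The example URL above can be reproduced by filling the printf-style spPlate template from the table. The following is a minimal Python sketch; the function name `spplate_url` and its `survey` argument are illustrative choices, not part of any SDSS tool. The run2d value is formatted as a string because DR16 values such as v5_13_0 are not integers.

```python
BASE = "https://data.sdss.org/sas/dr16/"
# "URL format" value for spPlate; parameters are run2d, plate, plate, mjd.
SPPLATE_TEMPLATE = "eboss/spectro/redux/%s/%04d/spPlate-%04d-%d.fits"

def spplate_url(run2d, plate, mjd, survey="eboss"):
    """Build the full SAS URL of an spPlate file (hypothetical helper)."""
    suffix = SPPLATE_TEMPLATE % (run2d, plate, plate, mjd)
    if survey == "sdss":
        # Per the Notes: replace "eboss/" with "sdss/" for SDSS-I/-II files.
        suffix = "sdss/" + suffix[len("eboss/"):]
    return BASE + suffix

print(spplate_url("v5_13_0", 4444, 55538))
```

The same pattern applies to the other templates on this page once the parameter order in the "format parameters" column is followed.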
BOSS/eBOSS has two spectrographs with 500 fibers each, grouped in 25 bundles of 20 fibers each. The light is split into a blue channel and a red channel, for a total of 4 CCD images per exposure. The CCD y-coordinate is the spectral dispersion direction (larger y is larger λ) and larger x is larger fiber number, though the spectral "traces" in y vs. x are curved and do not exactly align with CCD columns.
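The fiber layout arithmetic can be sketched as follows. This assumes that plate fiber numbers run contiguously from 1 to 1000, with fibers 1 to 500 on spectrograph 1 and 501 to 1000 on spectrograph 2, and that bundles are filled in order; the function name `locate_fiber` is hypothetical.

```python
FIBERS_PER_SPECTROGRAPH = 500
FIBERS_PER_BUNDLE = 20  # 25 bundles per spectrograph

def locate_fiber(fiber):
    """Map a 1-based fiber number to (spectrograph, bundle, slot), all 1-based.

    Assumes contiguous numbering across spectrographs and bundles.
    """
    if not 1 <= fiber <= 2 * FIBERS_PER_SPECTROGRAPH:
        raise ValueError("fiber must be in 1..1000")
    index = fiber - 1
    spectrograph = index // FIBERS_PER_SPECTROGRAPH + 1
    within = index % FIBERS_PER_SPECTROGRAPH
    bundle = within // FIBERS_PER_BUNDLE + 1
    slot = within % FIBERS_PER_BUNDLE + 1
    return spectrograph, bundle, slot

for fiber in (1, 21, 500, 501, 1000):
    print(fiber, locate_fiber(fiber))
```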
Raw electrons are extracted from the CCD images using row-by-row extractions similar to Horne 1986 by fitting Gaussians, plus a polynomial background, to each CCD row for each bundle of 20 fibers.
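To illustrate the kind of per-row fit described above, here is a simplified sketch using NumPy. It is not the idlspec2d implementation: it assumes the trace centers and Gaussian widths are already known for the row, so that the fiber amplitudes and polynomial background coefficients can be found by weighted linear least squares. The function name `extract_row` and all of its arguments are hypothetical.

```python
import numpy as np

def extract_row(row, ivar, centers, sigma, npoly=2):
    """Fit Gaussian amplitudes plus a polynomial background to one CCD row.

    row, ivar : 1D arrays of counts and inverse variance along x
    centers   : x-position of each fiber trace in this row (assumed known)
    sigma     : Gaussian width in pixels, scalar or per fiber (assumed known)
    Returns (fluxes, background_coefficients).
    """
    x = np.arange(row.size, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), centers.shape)
    # One unit-area Gaussian profile per fiber, as columns of the design matrix.
    profiles = np.exp(-0.5 * ((x[:, None] - centers) / sigma) ** 2)
    profiles = profiles / (np.sqrt(2.0 * np.pi) * sigma)
    # Polynomial background columns in a coordinate scaled to [-1, 1].
    u = 2.0 * x / (row.size - 1) - 1.0
    background = np.vander(u, npoly + 1, increasing=True)
    design = np.hstack([profiles, background])
    # Weighted linear least squares: scale rows by sqrt(inverse variance).
    w = np.sqrt(ivar)
    coeffs, *_ = np.linalg.lstsq(design * w[:, None], row * w, rcond=None)
    return coeffs[:centers.size], coeffs[centers.size:]
```

With unit-area profiles, each fitted amplitude is the integrated flux of that fiber in the row. Fitting all 20 fibers of a bundle simultaneously lets the solution account for overlap between neighboring profiles.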
Fiber flats correct for fiber-to-fiber variations by comparing the differences between fibers equally illuminated by a smooth flat lamp spectrum.
The sky model is derived from the flat-fielded electrons of sky fibers, then interpolated to the location of every science fiber and subtracted.
Flux calibration vectors model the instrument and atmospheric throughput per exposure by comparing standard star spectra to a set of models of known flux.
Flux correction vectors adjust for flux mis-calibrations with low-order polynomials per-fiber per-exposure to make different exposures of the same object consistent with each other.
Flux distortion vectors model variations in the throughput across the focal plane.
Putting these together, the "flat-fielded sky-subtracted electrons" in spFrame:
$F_{e} = \mathrm{electrons} / \big(\mathrm{superflat} \cdot \mathrm{fiberflat}\big) - \mathrm{skymodel}$
become the "calibrated flux" in spCFrame:
$F = \big(F_{e}/\mathrm{calib}\big) \cdot \mathrm{fluxcorr} \cdot \mathrm{fluxdistort}$
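The two expressions above translate directly into array arithmetic. The sketch below uses made-up numbers purely for illustration; the function names are hypothetical, and the arguments stand for the vectors named in the formulas.

```python
import numpy as np

def flat_fielded_sky_subtracted(electrons, superflat, fiberflat, skymodel):
    """F_e = electrons / (superflat * fiberflat) - skymodel."""
    return electrons / (superflat * fiberflat) - skymodel

def calibrated_flux(f_e, calib, fluxcorr, fluxdistort):
    """F = (F_e / calib) * fluxcorr * fluxdistort."""
    return (f_e / calib) * fluxcorr * fluxdistort

# Made-up values for three pixels of one fiber (illustration only).
electrons = np.array([220.0, 440.0, 330.0])
superflat = np.array([1.1, 1.1, 1.1])
fiberflat = np.array([1.0, 1.0, 1.0])
skymodel = np.array([50.0, 50.0, 50.0])

f_e = flat_fielded_sky_subtracted(electrons, superflat, fiberflat, skymodel)
flux = calibrated_flux(f_e, calib=10.0, fluxcorr=1.02, fluxdistort=0.99)
print(f_e)
print(flux)
```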
The following sections describe the flux correction and flux distortion steps in more detail.
Flux correction
Individual exposures are initially flux-calibrated with no constraint that the same object has the same flux across different exposures. Empirical "fluxcorr" vectors are broadband corrections to bring the different exposures into alignment for each object prior to coaddition. In DR13 and prior, these were implemented for each spectrum by minimizing
$\chi_{i}^{2} = \sum_{\lambda}\frac{\big(f_{i\lambda}-f_{ref,\lambda}/a_{i\lambda}\big)^{2}}{\sigma^{2}_{i\lambda}+\sigma^{2}_{ref,\lambda}/a^{2}_{i\lambda}}$
where $f_{i\lambda}$ is the flux of exposure $i$ at wavelength $\lambda$; $f_{ref,\lambda}$ is the flux of the selected reference exposure; and $a_{i\lambda}$ are low-order Legendre polynomials. The number of polynomial terms is dynamic, up to a maximum of 5 terms. Higher order terms are added only if they improve the $\chi^{2}$ by 5 relative to the fit with one fewer term. This approach is biased toward small $a_{i\lambda}$, since that inflates the denominator to reduce the $\chi^{2}$.
Since DR14, we solve the fluxcorr vectors relative to a common weighted coadd $F_{\lambda}$, which is treated as noiseless compared to the individual exposures:
$\chi_{i}^{2} = \sum_{\lambda}{\frac{\big(f_{i\lambda}-F_{\lambda}/a_{i\lambda}\big)^{2}}{\sigma^{2}_{i\lambda}}}$
We additionally include an empirically-tuned prior that $a_{i\lambda} \sim 1$ to avoid large excursions in the solution for very low signal-to-noise data.
In this approach the fluxcorr terms are Chebyshev polynomials instead of Legendre polynomials. In practice we minimize $\big((f - Fa)/\sigma\big)^{2}$ and then return $1/a$. The prior is weighted by the data weights such that the relative strength of the prior vs. the data is approximately independent of S/N.
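Because the residual $(f - Fa)/\sigma$ is linear in the Chebyshev coefficients of $a$, the fit reduces to linear least squares. The sketch below illustrates that structure with NumPy. The form of the prior used here (extra rows penalizing $a - 1$ with the same per-pixel weights as the data, scaled by a `prior_strength` parameter) is an assumption made for illustration and is not the pipeline's empirically tuned prior. The function name and arguments are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def solve_fluxcorr(wave, flux, sigma, coadd, nterms=3, prior_strength=0.01):
    """Fit a(lambda) minimizing sum(((f - F*a)/sigma)^2) and return 1/a.

    a(lambda) is a Chebyshev series with `nterms` terms. The prior pulling
    a toward 1 shares the data weights (F/sigma)^2; its exact form is an
    assumption for this sketch.
    """
    # Map wavelength onto [-1, 1], the natural Chebyshev domain.
    x = 2.0 * (wave - wave.min()) / (wave.max() - wave.min()) - 1.0
    basis = cheb.chebvander(x, nterms - 1)           # shape (npix, nterms)
    w = 1.0 / sigma
    # Data rows: (F * a - f) / sigma = 0
    a_data = basis * (coadd * w)[:, None]
    b_data = flux * w
    # Prior rows: sqrt(p) * (F / sigma) * (a - 1) = 0
    s = np.sqrt(prior_strength)
    a_prior = s * a_data
    b_prior = s * coadd * w
    design = np.vstack([a_data, a_prior])
    target = np.concatenate([b_data, b_prior])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    a = cheb.chebval(x, coeffs)
    return 1.0 / a

# Synthetic exposure that is 10% tilted relative to the coadd.
wave = np.linspace(3600.0, 10000.0, 200)
coadd = 10.0 + 2.0 * np.sin(wave / 500.0)
tilt = 1.0 + 0.1 * (2.0 * (wave - wave.min()) / (wave.max() - wave.min()) - 1.0)
fluxcorr = solve_fluxcorr(wave, coadd * tilt, np.ones_like(wave), coadd,
                          prior_strength=0.0)
print(np.max(np.abs(coadd * tilt * fluxcorr - coadd)))
```

Tying the prior rows to the data weights means that scaling the noise up or down rescales both sets of rows together, which is one way to keep the balance between prior and data roughly independent of S/N.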
Flux distortion
The flux distortion vectors are parameterized in terms of magnitude (i.e. log-flux) terms that are achromatic in $x$, $y$, $x^{2}$, $y^{2}$, and $xy$, where $x$ and $y$ are the linear coordinates XFOCAL, YFOCAL from the plugmap.
There are also chromatic terms that scale as $\widetilde{\lambda} = 1- \big(5070/\lambda\big)^{2}$, since that function gives an equal effect between 3900 and 5070 Å as between 5070 and 9000 Å. There are also magnitude offsets as a function of spectrograph ID, and a chromatic offset as a function of spectrograph ID. The 13 parameters are:
$F_{new} = F_{orig}\,(1+a_{0}s_{1}+a_{1}s_{2})\exp\big(a_{2}x + a_{3}y + a_{4}xy + a_{5}x^{2} + a_{6}y^{2} + a_{7}\widetilde{\lambda}x + a_{8}\widetilde{\lambda}y + a_{9}\widetilde{\lambda}s_{1} + a_{10}\widetilde{\lambda}s_{2} + a_{11}\widetilde{\lambda}^{2}s_{1} + a_{12}\widetilde{\lambda}^{2}s_{2}\big),$
where $s_{1} = 1$ if specid = 1, else 0, and $s_{2} = 1$ if specid = 2, else 0.
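As a worked illustration, the 13-parameter model can be evaluated with a few lines of NumPy. This sketch takes $s_{1}$ and $s_{2}$ to be indicator flags for spectrograph 1 and spectrograph 2 respectively, and returns the multiplicative factor $F_{new}/F_{orig}$. The function names are hypothetical, and the coefficient values in the usage lines are made up.

```python
import numpy as np

def lambda_tilde(wave):
    """Chromatic coordinate 1 - (5070 / lambda)^2, with lambda in Angstroms."""
    return 1.0 - (5070.0 / np.asarray(wave, dtype=float)) ** 2

def flux_distortion(a, x, y, wave, specid):
    """Return F_new / F_orig for the 13 coefficients a[0]..a[12]."""
    a = np.asarray(a, dtype=float)
    if a.shape != (13,):
        raise ValueError("expected 13 coefficients")
    s1 = 1.0 if specid == 1 else 0.0   # spectrograph 1 indicator
    s2 = 1.0 if specid == 2 else 0.0   # spectrograph 2 indicator
    lt = lambda_tilde(wave)
    exponent = (a[2] * x + a[3] * y + a[4] * x * y
                + a[5] * x ** 2 + a[6] * y ** 2
                + a[7] * lt * x + a[8] * lt * y
                + a[9] * lt * s1 + a[10] * lt * s2
                + a[11] * lt ** 2 * s1 + a[12] * lt ** 2 * s2)
    return (1.0 + a[0] * s1 + a[1] * s2) * np.exp(exponent)

# The chromatic coordinate is nearly antisymmetric about 5070 Angstroms.
print(lambda_tilde([3900.0, 5070.0, 9000.0]))

# Made-up coefficients: a 2% offset on spectrograph 1 only.
coeffs = np.zeros(13)
coeffs[0] = 0.02
print(flux_distortion(coeffs, x=100.0, y=-50.0, wave=6000.0, specid=1))
```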
Only SPECTROPHOTO_STD or REDDEN_STD objects are used to compute the flux distortion. The procedure minimizes differences between spectro-flux and photometry (CALIBFLUX). Only the g, r, and i bands are used.
One-dimensional Pipeline (spec1d)
Reference: Bolton et al. (2012)
The spec1d pipeline reads spectra and determines classifications, redshifts, and other spectroscopic parameters. It produces the following files.
File Type | in/out | Description | URL format | format parameters |
---|---|---|---|---|
spZline | out | emission line fits | eboss/spectro/redux/%s/%04d/spZline-%04d-%d.fits | run2d, plate, plate, mjd |
spZall | out | all spectroscopic classifications and redshifts | eboss/spectro/redux/%s/%04d/spZall-%04d-%d.fits | run2d, plate, plate, mjd |
spZbest | out | spectroscopic classifications and redshifts | eboss/spectro/redux/%s/%04d/spZbest-%04d-%d.fits | run2d, plate, plate, mjd |
Per-object Spectrum Files
As of DR9, the pipeline also provides a reformatting of the same spectral data into one file per PLATE-MJD-FIBER, including the coadded spectra from spPlate, the emission line fits from spZline, the redshifts and classifications from spZall and spZbest, and optionally the individual exposure spectra from spCFrame. These are useful when you need all of the information for a small subset of objects.
File Type | in/out | Description | URL format | format parameters |
---|---|---|---|---|
spec | out | All spectral information for a single PLATE-MJD-FIBER | eboss/spectro/redux/%s/spectra/full/%04d/spec-%04d-%05d-%04d.fits | run2d, plate, plate, mjd, fiber |
speclite | out | All spectral information for a single PLATE-MJD-FIBER except the individual exposures | eboss/spectro/redux/%s/spectra/lite/%04d/spec-%04d-%05d-%04d.fits | run2d, plate, plate, mjd, fiber |
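For a short list of objects, the per-object URLs can be generated in a loop. The sketch below merges the two templates above into one by making the full/lite directory a parameter, which is a convenience of this example rather than an SDSS convention. The run2d value is formatted as a string since DR16 values such as v5_13_0 are not integers, and the function name `spec_url` and the fiber numbers are hypothetical.

```python
BASE = "https://data.sdss.org/sas/dr16/"
# Parameters: run2d, full-or-lite, plate, plate, mjd, fiber.
SPEC_TEMPLATE = "eboss/spectro/redux/%s/spectra/%s/%04d/spec-%04d-%05d-%04d.fits"

def spec_url(run2d, plate, mjd, fiber, lite=True):
    """Build the SAS URL of a per-object spectrum file (hypothetical helper)."""
    kind = "lite" if lite else "full"
    return BASE + SPEC_TEMPLATE % (run2d, kind, plate, plate, mjd, fiber)

# Illustrative (plate, mjd, fiber) triples.
targets = [(4444, 55538, 1), (4444, 55538, 250)]
for plate, mjd, fiber in targets:
    print(spec_url("v5_13_0", plate, mjd, fiber))
```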
Stellar Parameters Pipeline (sspp)
References: Lee et al. (2008a), Lee et al. (2008b), Allende Prieto et al. (2008)
The SEGUE stellar parameters pipeline produces a number of files, stored together:
File Type | in/out | Description | URL format | format parameters |
---|---|---|---|---|
ssppOut | out | SSPP stellar parameters ([Fe/H], log g, etc.) | sdss/sspp/%d/%04d/output/param/ssppOut-%04d-%5d.fit | rerun, plate, plate, mjd |
ssppOut_lineindex | out | SSPP line indices | sdss/sspp/%d/%04d/output/param/ssppOut-%04d-%5d.lineindex.fit | rerun, plate, plate, mjd |
Galaxy Parameter Pipelines
For spectra of galaxies, several additional analysis pipelines are applied to derive catalogs of physical galaxy parameters. These pipelines and their outputs are described fully in the Galaxy Parameter Pipeline pages.