MultiDrizzle: Automated Image Combination and Cosmic-Ray Identification

MultiDrizzle: Automated Image Combination and Cosmic-Ray Identification Software
Warren Hack, Christopher Hanley, Ivo Busko, Robert Jedrzejewski, Anton Koekemoer
Space Telescope Science Institute, Baltimore, MD

IRAF
Many applications, including most pipeline calibration software, rely on IRAF.
• An interactive shell environment
• Its native language is FORTRAN-like
• Large library of tasks for data analysis developed over the years
• Astronomers can string these tasks together to create specialized data reduction scripts
• Scripts and tasks are procedural in nature
• File I/O occurs between each step of these scripts
• Parameter files are used to pass around input and output values
• Little capability for error handling
• Python has recently been introduced for use with IRAF (through PyRAF)

It is this framework on which the calibration pipeline software was originally developed…

Calibration Pipeline History
• Original instruments (WFPC, WFPC2, FOC)
  – Provided basic calibrations in pipeline
  – No cosmic-ray rejection or image combination
• STIS and NICMOS (1997 - )
  – Automatically removed cosmic rays from CR-SPLIT associated data and combined into 1 product in pipeline
  – Repeat-obs data not cleaned of cosmic rays
  – Combination of dithered exposures only for NICMOS using simple shift-and-add
• ACS (2002 - 21 Sept 2004)
  – Automatically combined dithered images while removing distortion using PyDrizzle
  – Combined products still contained cosmic rays
• MultiDrizzle implemented in pipeline for ACS on 22 Sept 2004

What is MultiDrizzle?
Used by the Hubble Space Telescope (HST) Advanced Camera for Surveys data processing pipeline.
Purpose: Combine images taken at slightly different pointings
  – Remove fixed image defects/detector gaps
  – Remove image distortion
  – Remove cosmic rays and other transient events

ACS Pipeline Processing: Raw Data
The Wide Field Channel of ACS consists of two 2048 x 4096 CCDs that don’t overlap.
(Figure label: 50 Pixel Gap)

ACS Pipeline Processing: Basic Calibration
Basic CCD reductions have been applied to the data, primarily:
• Bias correction
• Dark subtraction
• Flat-fielding

ACS Pipeline Processing: Basic Calibration
Associations specify how multiple images are related to a final product.
Each input, though, will have:
• Cosmic rays
• Geometric distortion
The final product will be made from:
• Multiple observations of the same portion of the sky
• Observations that may be offset slightly from previous observations
(Figure label: Cosmic Rays)

Image Combination
• Associated data, which may have been taken at different positions, should be combined.
  – Why not just shift and add the images, like the NICMOS pipeline?

Shift and Add doesn’t work!!!
Geometric distortion means a fixed number of arcseconds on the sky corresponds to a variable number of pixels across the image.
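A toy illustration of why a single pixel shift cannot register distorted images. The linear pixel-scale model and its numbers are assumptions chosen for illustration (centred on the nominal ACS/WFC scale of about 0.05 arcsec/pixel); this is not the real ACS distortion solution.

```python
import numpy as np

# Toy model (an assumption): the pixel scale grows linearly from
# 0.048 to 0.052 arcsec/pixel across 4096 detector columns.
def pixel_scale(x, ncols=4096, scale_min=0.048, scale_max=0.052):
    """Pixel scale in arcsec/pixel at detector column x."""
    return scale_min + (scale_max - scale_min) * x / (ncols - 1)

def dither_in_pixels(offset_arcsec, x):
    """Size in pixels of a fixed sky offset, measured at column x."""
    return offset_arcsec / pixel_scale(x)

# The same 1 arcsec dither, measured at the left edge, centre and right edge.
columns = np.array([0.0, 2047.5, 4095.0])
shifts = dither_in_pixels(1.0, columns)
for col, shift in zip(columns, shifts):
    print(f"column {col:7.1f}: 1 arcsec = {shift:.2f} pixels")
```

In this toy model the same dither differs by more than a pixel between the two edges of the chip, so no single integer or fractional shift lines up the whole field.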

So how do you combine images then?

Image Combination with PyDrizzle
DRIZZLE, as automated by PyDrizzle, has been used in the ACS pipeline for image combination.
• ‘drizzle’ is an implementation of the image combination method known as variable-pixel linear reconstruction.
• This technique applies a distortion correction model and an optimum weighting scheme to a set of input images to create an undistorted output image.
• Performs distortion correction and any shifting/rotating/scaling in one step.
• Technique is flux conserving.
• Exists as a FORTRAN program that runs under IRAF

MultiDrizzle Procedure
Initially implemented as a Python script calling IRAF tasks to perform the steps:
• Mask known defects.
• Subtract the sky level from each detector.
• Undistort (‘drizzle’) the input images onto separate, registered outputs.
• Combine the drizzled images to create a “median” image.
• Distort back the median image to the geometry of the input images.
• Identify cosmic rays by comparing the distorted median image to the input images and mask them.
• Do the final drizzle of all masked input images into the same output to create the cleaned, sub-sampled, mosaic image.
Distributed with STSDAS for use by the external community.

Single Undistorted Image
The goal of this step is to create distortion-free images that are easily registered with the other input images.

Median Image
• Inputs to this step are the undistorted images from the previous step
• The goal is to create a mosaic image that is free of cosmic rays
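A minimal sketch of the median step, written with NumPy as a stand-in for numarray. The function name, the boolean "good pixel" mask convention, and the toy pixel values are assumptions for illustration; the slides do not describe the actual combination options.

```python
import numpy as np

def masked_median(stack, good):
    """Per-pixel median over the image axis, ignoring masked pixels.

    stack: float array of shape (n_images, ny, nx), already registered.
    good:  bool array of the same shape, True where a pixel is usable.
    Pixels that are masked in every image come out as NaN.
    """
    data = np.where(good, stack, np.nan)
    return np.nanmedian(data, axis=0)

# Three registered images of a flat 10-count sky.
stack = np.full((3, 4, 4), 10.0)
stack[0, 1, 1] = 1000.0          # a cosmic-ray hit in the first image
good = np.ones(stack.shape, dtype=bool)
good[1, 2, 2] = False            # a known bad pixel in the second image

median = masked_median(stack, good)
print(median[1, 1])              # the hit does not survive the median
```

Because a cosmic ray lands on a given sky position in at most a few exposures, the per-pixel median rejects it as long as most of the stack is clean there.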

Cosmic Ray Identification
Differences between the images are identified as cosmic rays and added to the image masks.
(Figure panels: Input Image; Distorted Median Image)
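A minimal sketch of this comparison, assuming a simple noise-threshold test. The slides do not give the actual rejection criterion, so the function name, the noise model (Poisson plus read noise, in electrons) and the default thresholds are illustrative assumptions.

```python
import numpy as np

def flag_cosmic_rays(image, blotted_median, read_noise=5.0, nsigma=4.0):
    """Return a bool mask, True where `image` deviates from the median.

    image:          one input exposure, in electrons.
    blotted_median: the median image distorted back to this exposure's
                    geometry, in electrons.
    A pixel is flagged when its difference from the median exceeds
    `nsigma` times the noise expected from the median itself.
    """
    expected_noise = np.sqrt(np.clip(blotted_median, 0, None) + read_noise**2)
    return np.abs(image - blotted_median) > nsigma * expected_noise

# Flat 100-electron sky with one strong hit in the input exposure.
blotted_median = np.full((5, 5), 100.0)
image = blotted_median.copy()
image[2, 3] = 5000.0

cr_mask = flag_cosmic_rays(image, blotted_median)
print(np.argwhere(cr_mask))      # location of the flagged pixel
```

The resulting mask would then be merged with the static bad-pixel mask before the final drizzle.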

Final Mosaic
• Using our sky-subtracted images, static pixel mask, and cosmic-ray masks as input, we convert to our distortion-corrected output frame.
• The image is displayed with an orientation of North up.
Image is photometrically flat, distortion free, and cleaned of all(?) cosmic rays and bad pixels right out of the pipeline.

Why rewrite this IRAF script in Python if it was already being used by astronomers?

MultiDrizzle Automatic Pipeline Requirements
• Minimize File I/O
  – This is a problem for IRAF scripts since all array-like calculations typically require I/O of files.
• Error trapping
  – IRAF scripts have no ability to trap errors. When an IRAF script crashes, usually no meaningful diagnostic information is returned.

MultiDrizzle Automatic Pipeline Requirements (cont’d)
• Robustness
  – This HAS to work every time…
• IRAF Parameter files
  – Minimize dependence on IRAF par files to avoid multiple processes colliding when updating the par file for the same task
• Processing time is an issue
  – The HST data processing pipeline needs to be able to process a day’s worth of data in significantly less than a day. If the program is I/O bound, we could run into problems.

Why Python?
Python provides advantages over IRAF:
• Object-oriented
• Easy to write, read, and maintain
• Robust
• Good error handling
• Fast, especially with C extensions
• Active community

PyFITS and NUMARRAY
• The MultiDrizzle I/O model is based upon the use of a couple of STScI additions to Python: PyFITS and NUMARRAY.
  – All computations are done on NUMARRAY objects.
  – PyFITS allows us to directly read our FITS input as NUMARRAY objects and manipulate header information.
  – Pre-existence of this code was fundamental.
  – Numarray objects allow simple array arithmetic just as in IDL; e.g., output = image1 + image2
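The IDL-style arithmetic mentioned above, sketched with NumPy, which later replaced numarray. NumPy is used here only as a stand-in, and the pixel values are made up.

```python
import numpy as np

# Two small "images" held entirely in memory (values are illustrative).
image1 = np.array([[1.0, 2.0],
                   [3.0, 4.0]], dtype=np.float32)
image2 = np.full((2, 2), 0.5, dtype=np.float32)

# Element-wise addition over the whole array: no explicit loops and
# no intermediate files written to disk.
output = image1 + image2
print(output)
```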

What problems are solved by PyFITS and NUMARRAY?
• With all calculations done on NUMARRAY objects in memory, file I/O is minimized.
• Using PyFITS allows for a uniform handling of input during processing regardless of how the files were originally structured.
  – ACS != STIS != WFPC2 != NICMOS
  – We can support any new instrument as long as we can convert the science data into a NUMARRAY object.

MultiDrizzle Issue
• MultiDrizzle relies on creating a median image from a set of input images
  – It’s necessary to have all of the pixels of the stack for a particular location in memory at the same time.
  – Two methods for handling this stack of images: read them all into memory at once or process them in sections

THE BRUTE FORCE APPROACH: All input images in memory at once
How much memory is needed?
• Worst-case scenario in the pipeline is 30 ACS/WFC images
  – 192 Mbytes x 30 input images = 5760 Mbytes
  – That is far more memory than can be addressed on a 32-bit operating system
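The arithmetic on this slide can be checked directly. The 4096 Mbyte figure is not from the slides: it is the full 2^32-byte address space of a 32-bit process, added here for comparison, and it is an upper bound since real systems make less than that available to a single process.

```python
MB_PER_IMAGE = 192        # per-image footprint quoted on the slide
N_IMAGES = 30             # worst-case number of ACS/WFC inputs

total_mb = MB_PER_IMAGE * N_IMAGES          # memory for the whole stack
address_space_mb = 2**32 // 2**20           # 32-bit address space in Mbytes

print(f"stack needs {total_mb} Mbytes")
print(f"32-bit address space is {address_space_mb} Mbytes")
print("fits in memory:", total_mb <= address_space_mb)
```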

Fine. Do your processing in image sections then…

Solution One: Memory Mapping and Iterators
• We wrote an imageiterator class that allowed us to step through a stack of images using a predefined (large) buffer size.
• Using the memory mapping property of NUMARRAY we planned only to read in the number of pixels defined by our imageiterator buffer size.

Solution One: Why doesn’t this work?
• Although you are only reading small portions of the data into memory at any one time, the operating system still needs to be able to address every pixel in each input image simultaneously.
  – Problem on 32-bit systems

Solution Two: Look to PyFITS developer for help
• A new attribute was developed for the PyFITS data object called “section”, which returns a user-specified slice.
  – Data sections must be contiguous
  – Sections are read directly from the file without having to memory map the entire image.
  – Using sections in our image iterator got us around our 32-bit limit
  – We didn’t need to change any other code to implement this
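A rough sketch of the section-by-section idea, not the actual imageiterator class. The function names are hypothetical, and in-memory NumPy arrays stand in for FITS images. In astropy.io.fits, the descendant of PyFITS, an image HDU's `section` attribute should be usable in place of these arrays, though that is an assumption not exercised here.

```python
import numpy as np

def iter_sections(images, rows_per_section):
    """Yield (start, stop, block) for contiguous blocks of rows.

    images: sequence of 2-D array-likes of identical shape that support
            slicing; only the requested rows are pulled from each one.
    block:  array of shape (n_images, stop - start, nx).
    """
    nrows = images[0].shape[0]
    for start in range(0, nrows, rows_per_section):
        stop = min(start + rows_per_section, nrows)
        yield start, stop, np.stack([img[start:stop, :] for img in images])

def median_by_sections(images, rows_per_section=64):
    """Median-combine a stack while holding one block of rows at a time."""
    out = np.empty(images[0].shape, dtype=np.float64)
    for start, stop, block in iter_sections(images, rows_per_section):
        out[start:stop, :] = np.median(block, axis=0)
    return out

# Five small random "images"; the section size does not divide the
# image height evenly, so the last block is shorter.
rng = np.random.default_rng(0)
images = [rng.normal(size=(10, 8)) for _ in range(5)]
sectioned = median_by_sections(images, rows_per_section=3)
```

Since the median is computed independently for each pixel, the sectioned result matches what a single pass over the full stack would give, while peak memory scales with the block size rather than the image size.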

Drizzle as an IRAF task
• MultiDrizzle originally relied on running ‘drizzle’ as an IRAF task.
• The code for ‘drizzle’ was revised and reorganized to not only support an IRAF interface, but also to provide a Python callable interface
  – Python interface supports direct input of numarray objects instead of using file I/O
  – Removes all IRAF dependencies, with image I/O done by PyFITS from within PyDrizzle

Current Status
• ACS Pipeline
  – Successfully deployed in pipeline on 22 September 2004 with OPUS 15.4
  – Only a small number of problems encountered so far
• Released with STSDAS v3.3 and STSCI_Python v2.0 on 12 November 2004
• Support for WFPC2 instrument data in testing
• Support for STIS and NICMOS imaging under development.

Credits
• Drizzle Algorithm – Fruchter and Hook
  – Drizzle: A Method for the Linear Reconstruction of Undersampled Images, Fruchter & Hook, PASP, 114:144-152, 2002 February
• Anton Koekemoer – Original MultiDrizzle Algorithm and Script
  – Koekemoer, Fruchter, Hook & Hack, 2002, HST Calibration Workshop, p. 337
• Todd Miller – NUMARRAY
• J. C. Hsu – PyFITS

Credits
• MultiDrizzle Redesign Team
  – Warren Hack: hack@stsci.edu
  – Ivo Busko: busko@stsci.edu
  – Robert Jedrzejewski: rij@stsci.edu
  – Christopher Hanley: chanley@stsci.edu
• MultiDrizzle Project Lead – Anton Koekemoer
• Schedule: Carl Biagetti
• Web site: http://stsdas.stsci.edu/multidrizzle

Future Enhancements
• Better image registration in the mosaic
  – Prototype task ‘tweakshifts’ available for testing
  – Development continuing on a more general solution
• Support more instruments (in progress)
• Support of a generic instrument class
• Support for 64-bit operating systems
• …