6d6a7f2dc0e7d48086157ba7c4fb35ca.ppt
- Number of slides: 45
HPCS Productivity Benchmarks Working Group
SSCA #3: Sensor Processing, Knowledge Formation, and Data I/O
Serial v1.0
MIT Lincoln Laboratory
January 4, 2007
MIT Lincoln Laboratory 3/19/2018
Outline
• Scalable Synthetic Compact Applications
• SSCA #3
  – Overview
  – Quick Recipe: Data I/O Mode
• Implementation and Results
Scalable Synthetic Compact Applications
Goals:
• Identify which dimensions must be examined at full complexity and which can be examined at reduced scale, while still providing an understanding of both today's full applications and future applications
[Figure: application size/complexity versus system size/complexity, spanning micro benchmarks (Micro BMKs), HPCS compact apps, full apps, and next-generation apps. Builds on a motivation slide from Fred Johnson (15 January 2004).]
HPCS Benchmark Spectrum
[Figure: the HPCS benchmark spectrum, showing the position of SSCA #3.]
Outline
• The Vision
• SSCA #3
  – Overview
  – Quick Recipe: Data I/O Mode
• Implementation and Results
Overview
• SSCA #3 focuses on two stages:
  – Front-end image processing and storage (Stage 1)
  – Back-end image retrieval and knowledge formation (Stage 2)
• It is representative of many application areas:
  – Medical imaging (e.g., tumor growth): image many patients daily; later compare images of the same patient over time
  – Astronomical image processing (e.g., monitoring supernovae): image many regions of the sky daily; later compare images of a region over time
  – Reconnaissance monitoring (e.g., enemy movement): image many areas daily; later compare images of a given region over time
Overview
• The benchmark stresses computation, communication, and data I/O
• It can be run in three modes:
  – System Mode: a combination of the Compute and Data I/O Modes
  – Compute Mode (data I/O minimized)
  – Data I/O Mode (computation minimized)
• The principal performance goal is throughput:
  – Maximize the rate at which answers are generated
  – Data I/O and compute kernels may overlap in operation
  – Data I/O and compute kernels may run on different systems
  – Some data is required to be contiguous
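The throughput goal above allows data I/O and compute kernels to overlap. Below is a minimal sketch in Python that assumes a simple producer/consumer arrangement. `read_item` and `compute_item` are hypothetical placeholders for a data I/O kernel and a compute kernel, and throughput is measured as answers per second of wall-clock time. This shows one way to overlap the two; it is not the SSCA #3 reference implementation.

```python
import queue
import threading
import time


def run_overlapped(n_items, read_item, compute_item, buffer_size=2):
    """Overlap a data I/O stage with a compute stage (producer/consumer).

    read_item(i) stands in for a data I/O kernel and compute_item(x) for a
    compute kernel; both are hypothetical callables supplied by the caller.
    A reader thread fills a bounded queue while the calling thread computes,
    so the read for item i+1 can proceed while item i is being processed.
    Returns (results, throughput in answers per second).
    """
    buf = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the input stream

    def reader():
        for i in range(n_items):
            buf.put(read_item(i))  # blocks while the buffer is full
        buf.put(done)

    start = time.perf_counter()
    worker = threading.Thread(target=reader)
    worker.start()
    results = []
    while True:
        item = buf.get()
        if item is done:
            break
        results.append(compute_item(item))
    worker.join()
    elapsed = time.perf_counter() - start
    throughput = n_items / elapsed if elapsed > 0 else float("inf")
    return results, throughput


if __name__ == "__main__":
    answers, rate = run_overlapped(5, lambda i: i, lambda x: x * x)
    print(answers, f"{rate:.1f} answers/s")
```

The small bounded buffer limits how far the reader can run ahead, so at most a couple of inputs are held in memory at once.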
SSCA #3 – System Mode
[Figure: System Mode block diagram, split into computation and data I/O paths. Stage 1, Front-End Sensor Processing: Scalable Data and Template Generator (raw complex data, coefficients, groups of templates), Kernel #1 Data Read and Image Formation, Template Insertion, and Kernel #2 Image Storage into a grid of images. Stage 2, Back-End Knowledge Formation: Kernel #3 Image Retrieval (image pairs, groups of templates, template indices), Kernel #4 Detection (detection sub-images), and Validation. Caption: the community has traditionally focused on computation … but data I/O performance is increasingly important.]
SSCA #3 – Compute Mode
[Figure: Compute Mode block diagram. Sensor Processing: Scalable Data and Template Generator (raw SAR data files, groups of template files), Kernel #1 Image Formation, Template Insertion, and Kernel #2 Image Storage (SAR image files). Knowledge Formation: Kernel #3 Image Retrieval (SAR image pair, template files), Kernel #4 Detections (sub-image detection files), and Validation.]
SSCA #3: Compute Mode Challenges
• Scalable Data and Template Generator: scalable synthetic data generation
• Kernel #1, Image Formation (front-end sensor processing): pulse compression; polar interpolation; FFT, IFFT (corner turn)
• Image storage and retrieval (Kernels #2 and #3): sequential store; non-sequential retrieve; large and small I/O
• Kernel #4, Detection (back-end knowledge formation): large-image difference and threshold; many small correlations on selected pieces of a large image
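The "corner turn" in the FFT/IFFT challenge is the transpose between the row pass and the column pass of a 2-D transform. The sketch below uses naive O(n^2) 1-D inverse DFTs as a stand-in for an optimized IFFT. In a distributed implementation the `corner_turn` step becomes an all-to-all data exchange, and that exchange is what makes it a communication challenge.

```python
import cmath


def idft(row):
    """Naive O(n^2) 1-D inverse DFT; a stand-in for an optimized 1-D IFFT."""
    n = len(row)
    return [
        sum(row[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def corner_turn(matrix):
    """Transpose a row-major matrix so its columns become contiguous rows.

    On a distributed-memory machine this is the all-to-all exchange that
    tends to make the 2-D transform communication-bound.
    """
    return [list(col) for col in zip(*matrix)]


def ifft2_with_corner_turn(matrix):
    """2-D inverse transform: 1-D transforms along rows, corner turn,
    1-D transforms along the former columns, then corner turn back."""
    rows_done = [idft(r) for r in matrix]
    cols_done = [idft(r) for r in corner_turn(rows_done)]
    return corner_turn(cols_done)
```

Because every pass works on contiguous rows, each 1-D transform can run locally. Only the corner turn moves data between processors.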
SSCA #3 – Data I/O Mode
[Figure: Data I/O Mode block diagram. Stage 1, Front-End: the Scalable Data and Template Generator writes large complex data and groups of small data; Kernel #1 (Data Read and Image Formation) reads them and produces an image; Kernel #2 (Image Storage) writes the image to the grid of images. Stage 2, Back-End: Kernel #3 (Image Retrieval) reads image pairs and groups of small data; Kernel #4 writes sub-images.]
Ingredients
To run Data I/O Mode, the user only needs to set 1) SCALE, 2) N_SDG_GROUPS, and 3) the grid dimensions, where:
• SCALE sets the size of the raw input data and of the image. It should be set so that these are a significant fraction of a single processor's memory.
• N_SDG_GROUPS is the number of raw input data and template groups. It should be set large enough to avoid disk cache effects.
• The number of images in the grid is GRID_SIDE_SIZE x GRID_SIDE_SIZE x AV_GRID_DEPTH.
[Figure: image grid, GRID_SIDE_SIZE x GRID_SIDE_SIZE points, AV_GRID_DEPTH deep.]
Ingredients
Parameters to Code:
• $\mathrm{PICTURE\_SIZE} = \mathrm{GRID\_SIDE\_SIZE}^2$ is the number of images in a picture
• $\mathrm{EST\_TOT\_GRID\_SIZE} = \mathrm{PICTURE\_SIZE} \times \mathrm{AV\_GRID\_DEPTH}$ is the total number of times the input data will be retrieved, and the total number of images stored to the grid
• $\mathrm{mc} \times n$ is the size of the raw complex-valued input data, where $\mathrm{mc} = 2\lceil 80 \cdot \mathrm{SCALE} \rceil$ and $n = 2\lceil 158.496 \cdot \mathrm{SCALE} + 60 \rceil$
• ROTATION_STEP is the templates' rotation angle increment, in degrees
• $\mathrm{nDistinctLetters} \times \mathrm{nDistinctRotations}$ is the total number of pixelated templates, where:
  – nDistinctLetters = number of least-correlated letters in the alphabet (21)
  – nDistinctRotations = number of ROTATION_STEP angles between 0 and 360 degrees
• $\mathrm{FONT\_SIZE} \times \mathrm{FONT\_SIZE}$ is the size of a single template, in pixels
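The derived sizes above follow directly from the user-set parameters. Below is a minimal Python sketch of that arithmetic. The function name and dictionary keys are hypothetical. Counting nDistinctRotations as the angles 0, ROTATION_STEP, 2 x ROTATION_STEP, ..., strictly below 360 degrees, is an assumption.

```python
import math


def data_io_sizes(scale, grid_side_size, av_grid_depth, rotation_step,
                  n_distinct_letters=21):
    """Derived Data I/O Mode sizes from the user-set parameters.

    Assumption: nDistinctRotations counts the angles 0, ROTATION_STEP,
    2*ROTATION_STEP, ... strictly below 360 degrees.
    """
    picture_size = grid_side_size ** 2                 # images per picture
    est_tot_grid_size = picture_size * av_grid_depth   # images stored / retrievals
    mc = 2 * math.ceil(80 * scale)                     # raw data rows
    n = 2 * math.ceil(158.496 * scale + 60)            # raw data columns
    n_distinct_rotations = math.ceil(360 / rotation_step)
    return {
        "PICTURE_SIZE": picture_size,
        "EST_TOT_GRID_SIZE": est_tot_grid_size,
        "mc": mc,
        "n": n,
        "nDistinctRotations": n_distinct_rotations,
        "nTemplates": n_distinct_letters * n_distinct_rotations,
    }
```

For example, SCALE = 1, a 4 x 4 grid of average depth 3, and a 30-degree ROTATION_STEP give mc x n = 160 x 438, EST_TOT_GRID_SIZE = 48, and 252 templates.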
Ingredients
Parameters to Code (Cont.):
• $m \times \mathrm{nx}$ is the size of an image, where:
  – $m = 2\lceil \mathrm{mc}/0.8405246 \rceil$
  – $\mathrm{k1n} = 8.3776\,(1.5 - 1/n)$
  – $\mathrm{kxmin} = \sqrt{70.1841812 - 6.3165469\,(m/\mathrm{mc})^2}$
  – $\mathrm{kxmax} = \sqrt{4\,\mathrm{k1n}^2 - 25.2661877\,(1/\mathrm{mc})^2}$
  – $\mathrm{nx} = 2\lceil 20 \cdot \mathrm{SCALE}\,(\mathrm{kxmax} - \mathrm{kxmin})/\pi \rceil + 20$
• $\mathrm{nSubImages} = \left\lfloor \mathrm{pOccupancy} \cdot \mathrm{p2ndNot1st} \cdot \frac{m}{\mathrm{SARLOBE\_DISTANCE} \cdot \mathrm{FONT\_SIZE}} \cdot \frac{\mathrm{nx}}{\mathrm{SARLOBE\_DISTANCE} \cdot \mathrm{FONT\_SIZE}} \right\rfloor$ is the number of smaller images to be stored (by the last kernel), where:
  – pOccupancy = 0.5 is the probability of template occupancy
  – p2ndNot1st = 0.5 is the probability that a template appears in the second image but not in the first
• Total memory required, in bytes:
  $\mathrm{N\_SDG\_GROUPS} \times \left(8\,\mathrm{mc}\,n + 4\,\mathrm{nDistinctLetters}\,\mathrm{nDistinctRotations}\,\mathrm{FONT\_SIZE}^2\right) + \mathrm{EST\_TOT\_GRID\_SIZE} \times \left(4\,m\,\mathrm{nx} + 4\,\mathrm{nSubImages}\,(4\,\mathrm{FONT\_SIZE})^2\right)$ + (coefficients, support, and verification parameters; stored once)
• Grows with $\mathrm{SCALE}^2$
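The image size, sub-image count, and storage estimate can likewise be computed mechanically. The sketch below makes three assumptions. The formulas are transcribed as written on this slide. FONT_SIZE and SARLOBE_DISTANCE are inputs, because their values are not given here. The coefficient/support/verification term is left out.

```python
import math


def image_and_storage(scale, n_sdg_groups, est_tot_grid_size,
                      n_distinct_letters, n_distinct_rotations,
                      font_size, sarlobe_distance,
                      p_occupancy=0.5, p_2nd_not_1st=0.5):
    """Image size, sub-image count and total storage in bytes, per the
    formulas above; excludes the coefficient/support/verification data,
    which is stored once."""
    mc = 2 * math.ceil(80 * scale)
    n = 2 * math.ceil(158.496 * scale + 60)
    m = 2 * math.ceil(mc / 0.8405246)
    k1n = 8.3776 * (1.5 - 1 / n)
    kxmin = math.sqrt(70.1841812 - 6.3165469 * (m / mc) ** 2)
    kxmax = math.sqrt(4 * k1n ** 2 - 25.2661877 * (1 / mc) ** 2)
    nx = 2 * math.ceil(20 * scale * (kxmax - kxmin) / math.pi) + 20
    cell = sarlobe_distance * font_size
    n_sub_images = math.floor(
        p_occupancy * p_2nd_not_1st * (m / cell) * (nx / cell))
    sdg_bytes = n_sdg_groups * (
        8 * mc * n                                    # complex single precision
        + 4 * n_distinct_letters * n_distinct_rotations * font_size ** 2)
    grid_bytes = est_tot_grid_size * (
        4 * m * nx                                    # real single-precision image
        + 4 * n_sub_images * (4 * font_size) ** 2)    # Kernel 4 sub-images
    return {"m": m, "nx": nx, "nSubImages": n_sub_images,
            "total_bytes": sdg_bytes + grid_bytes}
```

With SCALE = 1 and the hypothetical values FONT_SIZE = 8 and SARLOBE_DISTANCE = 4, this yields a 382 x 266 image and 24 sub-images per image pair.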
Directions
SDG
• Create a group:
  – Create a random single-precision complex-valued (large) mc x n matrix
  – Store the data
  – Create a random real-valued (small) FONT_SIZE x FONT_SIZE matrix
  – Store the small matrix nDistinctLetters x nDistinctRotations times
• Copy the above group N_SDG_GROUPS times
STAGE 1
for iImage = 1 to EST_TOT_GRID_SIZE
  KERNEL 1
  – Randomly pick and retrieve one of the N_SDG_GROUPS groups
  – Create a random single-precision real-valued m x nx matrix
  KERNEL 2
  – Randomly select i and j values in the range [1, GRID_SIDE_SIZE] and use these to create a filename
  – Store the image matrix
end
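The SDG and Stage 1 directions can be sketched with only the Python standard library. All of the following are hypothetical assumptions:

- Files go under a caller-supplied directory as `group{g}/raw.bin`, `group{g}/template{t}.bin`, and `image_{i}_{j}_{k}.bin`, where k counts the images already stored at grid point (i, j).
- Data is stored as little-endian 32-bit floats.
- Complex data is stored as interleaved real/imaginary pairs.

Keep the sizes tiny when trying it out.

```python
import array
import os
import sys


def write_f32(path, values):
    """Store values as 32-bit floats, little-endian regardless of host (assumed)."""
    data = array.array("f", values)
    if sys.byteorder == "big":
        data.byteswap()
    with open(path, "wb") as f:
        data.tofile(f)


def read_f32(path):
    """Read back a little-endian 32-bit float file."""
    data = array.array("f")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    if sys.byteorder == "big":
        data.byteswap()
    return data


def sdg(root, n_groups, mc, n, font_size, n_templates, rng):
    """Create one group (raw data + templates) and store N_SDG_GROUPS copies."""
    raw = [rng.random() for _ in range(2 * mc * n)]  # interleaved re/im pairs
    template = [rng.random() for _ in range(font_size * font_size)]
    for g in range(n_groups):
        group_dir = os.path.join(root, f"group{g}")
        os.makedirs(group_dir, exist_ok=True)
        write_f32(os.path.join(group_dir, "raw.bin"), raw)
        for t in range(n_templates):  # small matrix stored n_templates times
            write_f32(os.path.join(group_dir, f"template{t}.bin"), template)


def stage1(root, n_groups, est_tot_grid_size, grid_side_size, m, nx, rng):
    """Kernels 1 and 2; returns the depth reached at each grid point (i, j)."""
    depth = {}
    for _ in range(est_tot_grid_size):
        # Kernel 1: retrieve a random group, then make a random m x nx image.
        group_dir = os.path.join(root, f"group{rng.randrange(n_groups)}")
        for name in sorted(os.listdir(group_dir)):
            read_f32(os.path.join(group_dir, name))
        image = [rng.random() for _ in range(m * nx)]
        # Kernel 2: random grid point -> filename; k is the next depth there.
        i = rng.randint(1, grid_side_size)
        j = rng.randint(1, grid_side_size)
        depth[(i, j)] = depth.get((i, j), 0) + 1
        write_f32(os.path.join(root, f"image_{i}_{j}_{depth[(i, j)]}.bin"), image)
    return depth
```

Pass a seeded `random.Random` as `rng` so that runs are repeatable.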
Directions
STAGE 2
for iImageSeq = 1 to PICTURE_SIZE
  – Randomly select i and j values in the range [1, GRID_SIDE_SIZE]
  – Find the grid depth at this particular point
  for k = 1 to gridPointDepth-2
    KERNEL 3
    – Retrieve a pair of images, and an SDG group of templates
    KERNEL 4
    for l = 1 to nSubImages
      – Create a random (4 x FONT_SIZE) x (4 x FONT_SIZE) matrix
      – Store the sub-image
    end
  end
end
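Stage 2 can be sketched the same way, reading back files laid out as in Stage 1. The following are hypothetical assumptions:

- Files use the naming `image_{i}_{j}_{k}.bin` and `group{g}/template{t}.bin`, stored as little-endian float32.
- A `depth` map records how many images each grid point holds.
- "A pair of images" means images k and k + 1 at the selected point.

The loop bound mirrors the slide's k = 1 to gridPointDepth-2.

```python
import array
import os
import sys


def write_f32(path, values):
    """Store values as little-endian 32-bit floats (assumed on-disk convention)."""
    data = array.array("f", values)
    if sys.byteorder == "big":
        data.byteswap()
    with open(path, "wb") as f:
        data.tofile(f)


def read_f32(path):
    data = array.array("f")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    if sys.byteorder == "big":
        data.byteswap()
    return data


def stage2(root, depth, grid_side_size, picture_size, n_groups,
           n_sub_images, font_size, rng):
    """Kernels 3 and 4; returns the number of sub-images written.

    depth maps (i, j) -> number of images stored at that grid point.
    """
    side = 4 * font_size
    written = 0
    for seq in range(1, picture_size + 1):
        i = rng.randint(1, grid_side_size)
        j = rng.randint(1, grid_side_size)
        grid_point_depth = depth.get((i, j), 0)
        for k in range(1, grid_point_depth - 1):  # k = 1 .. gridPointDepth-2
            # Kernel 3: read images k and k+1 plus one random template group.
            for kk in (k, k + 1):
                read_f32(os.path.join(root, f"image_{i}_{j}_{kk}.bin"))
            group_dir = os.path.join(root, f"group{rng.randrange(n_groups)}")
            for name in sorted(os.listdir(group_dir)):
                if name.startswith("template"):
                    read_f32(os.path.join(group_dir, name))
            # Kernel 4: write nSubImages random (4*FONT_SIZE)^2 sub-images.
            for sub_idx in range(1, n_sub_images + 1):
                sub = [rng.random() for _ in range(side * side)]
                write_f32(os.path.join(root, f"sub_{seq}_{k}_{sub_idx}.bin"), sub)
                written += 1
    return written
```

Including `seq` in the sub-image filename keeps names unique even when the same grid point is drawn twice.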
SSCA #3 Serial Release v1.0
Types of data I/O implemented:
• FWRITE: binary, IEEE floating point with the appropriate big- or little-endian byte ordering and a 32-bit data type
• HDF5: HDF5 32-bit float format
Modes:
• System Mode – includes both the Compute (SAR processing) and Data I/O Modes
• Compute Mode – dials the smallest possible grid of 2 images, thus minimizing data I/O
• Data I/O Mode – generates random data, thus forgoing SAR processing
Outputs metrics at each level of the system's hierarchy:
• Kernels, Stages, and overall SSCA #3
• Bytes, seconds, and bandwidth (bytes/sec)
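MATLAB's `fwrite` takes a precision and a machine format, and those two arguments are how the FWRITE path controls the 32-bit type and the byte ordering. Below is an analogous Python sketch that reports the three listed metrics (bytes, seconds, bandwidth). The function name is hypothetical, and timing only the write itself is an assumption. The HDF5 path is not shown.

```python
import array
import sys
import time


def timed_fwrite_f32(path, values, big_endian=False):
    """Write 32-bit IEEE floats in a chosen byte order and report metrics.

    Returns bytes written, elapsed seconds and bandwidth (bytes/sec).
    Only the file write itself is timed.
    """
    data = array.array("f", values)
    if data.itemsize != 4:
        raise ValueError("expected a 32-bit C float")
    wanted = "big" if big_endian else "little"
    if sys.byteorder != wanted:
        data.byteswap()  # convert from host order to the requested order
    start = time.perf_counter()
    with open(path, "wb") as f:
        data.tofile(f)
    seconds = time.perf_counter() - start
    n_bytes = data.itemsize * len(data)
    bandwidth = n_bytes / seconds if seconds > 0 else float("inf")
    return {"bytes": n_bytes, "seconds": seconds, "bandwidth": bandwidth}
```

Per-kernel and per-stage figures can then be obtained by summing the bytes and seconds of the individual calls and dividing the totals.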
SSCA #3 Serial Release v1.0
• One of many possible implementations
• Over 2200 lines of well-commented MATLAB code, with a carefully chosen functional breakdown, data structures, variable names, and comments
• Coding standard: a modified version of "Programming in C++, Rules and Recommendations" by Mats Henricson and Erik Nyquist of Ellemtel Telecommunication Systems Laboratories, 1990-1992
• Development tools used:
  – MATLAB Version 7.1.0.246 (R14) Service Pack 3 (version required)
  – Octave Version 2.9.5
  – Pentium® 4 2.66 GHz CPU with 1.00 GB of RAM and 2.5 GB of virtual RAM, running MS Windows XP Professional Version 2002 Service Pack 1
  – A dedicated dual-processor hyperthreaded P4 Xeon, 2.8 GHz, ½ MB cache, GNU/Linux 2.4.20-28.9 (Red Hat 9)
• Accompanying documentation:
  – Written Specification, and these slides
  – MANIFEST.txt – list of files with brief descriptions
  – README.txt – installation and run-time instructions; code overview
  – RELEASE_NOTES.txt – known outstanding issues in the current release
SSCA #3 Release v1.0a
Summary
Challenges:
• Large-scale parallel two-dimensional (2D) inverse fast Fourier transform (IFFT), which may require a "corner turn" or a "gather/scatter" (depending on architecture) with large quantities of data. Polar interpolation is known to be even more computationally intensive than the IFFT (Kernel 1).
• Streaming image data storage to a data I/O device (write) may involve large block data transfers, storing one large image after another (Kernel 2).
• Random-location image sequence retrieval from a data I/O device (read) also involves large quantities of data, with possibly stressful spatial or temporal memory access patterns and locality issues (Kernel 3).
• Small data I/O occurs in all four kernels; large data I/O occurs in three of the four kernels.
• Many small convolutions on random pieces of a large image (Kernel 4).
Status:
• Written and MATLAB Executable Specification v1.0 released June 22, 2006
• Architecture of Data I/O Mode – Martha Bancroft of Shomo Tech Systems, and Jeremy Kepner
• Works with Octave 2.9.5
• Written Specification – SAR Editor – Glenn Schrader, MIT Lincoln Laboratory
• C version based on release v1.0a (unofficial) – Meng-Ju of UMD, and Janice Onanian McMahon of USC/ISI
SSCA #3 Backup Slides
SSCA #3 Specification
• Intent
• Overview
• Compute Mode Main Components
  – Synthetic Scalable Data Generator
  – Kernel 1 – SAR Image Formation
  – Template Insertion
  – Kernel 4 – Detection
  – Validation
• Data I/O Mode Main Components
  – Kernel 1 – Large & Small Data Retrieval
  – Image Grid
  – Kernel 2 – Image Storage
  – Kernel 3 – Image Retrieval
  – Kernel 4 – Small Image Storage
The Vision – Scalable Synthetic Compact Applications
• Bridge the gap between scalable synthetic kernel benchmarks and (non-scalable) real applications, and become an important benchmarking tool
• Representative of real application workloads while not being numerically rigorous:
  – memory access characteristics
  – communications characteristics
  – I/O characteristics
• Multi-processor compact application, designed to be easily scalable and verifiable
• No limits on distribution to vendors and universities
• SSCAs represent a wide spectrum of potential HPCS Mission Partner applications
Executable Specification
What is an Executable Specification?
• It implements the Written Specification, illustrating all specified properties; it is just one of many possible implementations
• It gives developers further insight into the corresponding Written Specification
• It is a tool with which developers can validate their own work
• It includes a serial version, and may include one or more approaches to a parallel version
• It must be easily readable and intelligible, through its choice of functional structure, variable names, comments, and supporting documentation
Structure:
• Scalable Data Generator – creates synthetic data that can be scaled to stress any computer, from a single workstation to a petascale multiprocessor
• Kernels – timed computational algorithms
• Verification – checks the correctness of selected results
• Validation – validates the resulting solution
SSCA #3 – Compute Only Mode
[Figure: Compute Only Mode block diagram. Sensor Processing: Scalable Data and Template Generator (raw SAR data files, groups of template files), Kernel #1 Image Formation, Template Insertion, and Kernel #2 Image Storage (SAR image files). Knowledge Formation: Kernel #3 Image Retrieval (SAR image pair, template files), Kernel #4 Detections (sub-image detection files), and Validation.]
Spotlight SAR
Compute Mode – SAR Overview
• Radar captures echo returns from a "swath" on the ground
• Notional linear FM chirp pulse train, plus two ideally non-overlapping echoes returned from different positions on the swath
• Summation and scaling of echo returns realizes a challengingly long antenna aperture along the flight path
[Figure: synthetic aperture L with the antenna fixed to broadside; swath of range X = 2X0 and cross-range Y = 2Y0. Labels: delayed transmitted SAR waveform; received "raw" SAR; reflection coefficient scale factor, different for each return from the swath.]
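The notional pulse above is a linear FM chirp. The sketch below implements the standard complex-baseband model s(t) = exp(jπKt²) with chirp rate K = B/T, and models an echo as a delayed copy of the pulse scaled by a reflection coefficient. This is a textbook model, not the SSCA #3 data generator's code, and the parameter values in the usage note are hypothetical.

```python
import cmath
import math


def lfm_chirp(bandwidth_hz, duration_s, sample_rate_hz):
    """Complex baseband linear FM pulse exp(j*pi*K*t^2) for |t| <= T/2.

    K = B/T is the chirp rate, so the instantaneous frequency K*t sweeps
    linearly from -B/2 to +B/2 over the pulse.
    """
    chirp_rate = bandwidth_hz / duration_s
    n_samples = int(round(duration_s * sample_rate_hz))
    pulse = []
    for idx in range(n_samples):
        t = -duration_s / 2 + idx / sample_rate_hz
        pulse.append(cmath.exp(1j * math.pi * chirp_rate * t * t))
    return pulse


def delayed_echo(pulse, delay_samples, reflectivity, total_samples):
    """One echo: the pulse delayed and scaled by a reflection coefficient."""
    out = [0j] * total_samples
    for idx, value in enumerate(pulse):
        if 0 <= idx + delay_samples < total_samples:
            out[idx + delay_samples] += reflectivity * value
    return out
```

For example, `lfm_chirp(1e6, 1e-4, 1e7)` gives a 1000-sample pulse. Adding `delayed_echo(pulse, 100, 0.8, 4000)` and `delayed_echo(pulse, 2500, 0.3, 4000)` element-wise gives two non-overlapping echoes like those on the slide.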
Scalable Synthetic Data Generator
• Generates synthetic raw SAR complex data
• Data size is scalable to enable rigorous testing of high performance computing systems
  – A user-defined scale factor determines the size of the images generated
• Generates "templates" that consist of rotated and pixelated capital letters
[Figure: spotlight SAR returns, plotted in range and cross-range.]
Kernel 1 – SAR Image Formation
Spotlight SAR reconstruction by spatial frequency domain interpolation:
• Fourier transform: $s(t, u) \rightarrow s(\omega, k_u)$
• Matched filtering with $s_0^*(\omega, k_u)$
• Interpolation: $k_x = \sqrt{4k^2 - k_u^2}$, $k_y = k_u$, giving $F(k_x, k_y)$
• Inverse Fourier transform: $F(k_x, k_y) \rightarrow f(x, y)$
[Figure: received samples fit a polar swath in the (kx, ky) plane; processed samples fit a rectangular swath. The reconstructed image is shown in range and cross-range pixels.]
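The interpolation step maps each received sample at (k, ku) to kx = sqrt(4k² - ku²), ky = ku. These points land on a curved (polar) raster and must be resampled onto a uniform kx grid before the inverse FFT. The sketch below covers both pieces. Plain linear interpolation is a stand-in chosen for brevity; the slide does not say which interpolator the benchmark uses.

```python
import bisect
import math


def polar_to_rect(k, ku):
    """Map a received sample at (k, ku) to spatial frequency (kx, ky)."""
    arg = 4 * k * k - ku * ku
    if arg < 0:
        raise ValueError("sample lies outside the visible region: 4k^2 < ku^2")
    return math.sqrt(arg), ku


def interp_linear(xs, ys, x_new):
    """Linearly interpolate samples (xs ascending) onto the points x_new.

    A simple stand-in for the interpolation step; points outside
    [xs[0], xs[-1]] are set to 0.
    """
    out = []
    for x in x_new:
        if x < xs[0] or x > xs[-1]:
            out.append(0)
            continue
        hi = bisect.bisect_left(xs, x)
        if xs[hi] == x:
            out.append(ys[hi])
            continue
        lo = hi - 1
        w = (x - xs[lo]) / (xs[hi] - xs[lo])
        out.append((1 - w) * ys[lo] + w * ys[hi])
    return out
```

For one ku column, compute `kx = [polar_to_rect(k, ku)[0] for k in ks]` over ascending positive `ks`, which keeps `kx` ascending. Then resample with `interp_linear(kx, column, uniform_kx)`.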
Template Insertion (not timed)
• Inserts rotated, pixelated capital-letter templates into each SAR image
  – Non-overlapping locations and rotations
  – Randomly selects 50%
  – Used as ideal detection targets in Kernel 4
[Figure: a hypothetical 100% insertion of templates versus an image with only a random 50% of templates inserted; axes X pixels and Y pixels.]
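Template placement can be sketched as tiling the image into slots and sampling half of them, which guarantees non-overlapping locations. The following choices are assumptions:

- The slot spacing `cell` might be SARLOBE_DISTANCE x FONT_SIZE, by analogy with the nSubImages formula.
- Each slot gets a random letter and rotation.
- Distinct rotations across slots are not enforced.

```python
def choose_insertions(m, nx, cell, n_letters, n_rotations, rng, fraction=0.5):
    """Pick non-overlapping template slots and a letter/rotation for each.

    The m x nx image is tiled into cell x cell slots (cell is a hypothetical
    spacing); a random `fraction` of the slots is selected, so chosen slots
    never overlap. Returns (row, col, letter, rotation) tuples, with
    (row, col) the slot's top-left pixel.
    """
    slots = [(r * cell, c * cell)
             for r in range(m // cell) for c in range(nx // cell)]
    chosen = rng.sample(slots, int(len(slots) * fraction))
    return [(row, col, rng.randrange(n_letters), rng.randrange(n_rotations))
            for row, col in chosen]
```

Pass a seeded `random.Random` as `rng` to make the selection reproducible between the inserted image and its validation.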
Kernel 4 – Detection
• Detects targets in SAR images:
  1. Image difference
  2. Threshold
  3. Sub-regions
  4. Correlate with every template; the maximum gives the target ID
• Computationally difficult
  – Many small correlations over random pieces of a large image
• Requires 100% recognition and no false alarms, including for thresholded objects that cross distributed memory boundaries
[Figure: Image A and Image B, their image difference, a thresholded sub-region, and its correlation.]
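The four detection steps can be sketched on small nested lists. This is a deliberate simplification. Sub-regions are fixed, non-overlapping template-sized tiles, and correlation is an unnormalized sum of products. As a result, a target straddling a tile boundary would be missed, which falls short of the 100% recognition the slide requires.

```python
def detect(image_a, image_b, templates, threshold):
    """Simplified Kernel 4: difference, threshold, sub-regions, correlate.

    Sub-regions are non-overlapping template-sized tiles of the difference
    image that contain at least one pixel above the threshold; each is
    correlated (sum of products) with every template and labelled with the
    index of the best match. Returns (row, col, template_index) tuples.
    """
    size = len(templates[0])
    rows, cols = len(image_a), len(image_a[0])
    diff = [[image_b[r][c] - image_a[r][c] for c in range(cols)]
            for r in range(rows)]
    detections = []
    for r0 in range(0, rows - size + 1, size):
        for c0 in range(0, cols - size + 1, size):
            sub = [row[c0:c0 + size] for row in diff[r0:r0 + size]]
            if not any(abs(v) > threshold for row in sub for v in row):
                continue  # nothing new in this tile
            scores = [sum(sub[i][j] * t[i][j]
                          for i in range(size) for j in range(size))
                      for t in templates]
            best = max(range(len(templates)), key=scores.__getitem__)
            detections.append((r0, c0, best))
    return detections
```

Normalizing each score by the template's energy would stop templates with more lit pixels from dominating. Sliding windows centred on thresholded pixels would remove the tile-boundary blind spot.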
Computational Challenges
• Scalable Data and Template Generator: scalable synthetic data generation
• Kernel #1, Image Formation (front-end sensor processing): pulse compression; polar interpolation; FFT, IFFT (corner turn)
• Image storage and retrieval (Kernels #2 and #3): sequential store; non-sequential retrieve; large and small I/O
• Kernel #4, Detection (back-end knowledge formation): large-image difference and threshold; many small correlations on selected pieces of a large image
Scalable Synthetic Data Generator
• Generates large complex data and groups of small data
• Writes a "dialed" number of large complex data sets to external memory
• For each large data set, writes a group of small data to external memory
• Single precision
• Not timed
[Figure: the Scalable Data Generator writes large complex data and associated groups of small data, which Kernel #1 reads.]
Kernel 1 – Data Retrieval
• Randomly reads one large complex data set from external memory at each Stage 1 pass
• Also reads the associated group of small data from external memory at each Stage 1 pass
• Generates a single-precision random image (of the size dialed by SCALE)
• I/O is timed
[Figure: Stage 1, Front-End; Kernel #1 (Data Read) reads large complex data and the associated group of small data, and outputs an image.]
Image Grid
• Image size requires a non-trivial amount of memory
• The grid is scalable by image size and by number of images
• Intended for dealing with an enormous quantity of data, with simultaneous reads and writes
• The external-memory image grid is accessed by Kernels 2 & 3
[Figure: image grid of GRID_SIDE_SIZE x GRID_SIDE_SIZE points with depth AV_GRID_DEPTH, shown scaled to 80 images.]
Kernel 2 – Image Storage
• Writes a different image to a random location on the grid in external memory at each Stage 1 pass
• Images may be stored together, or in separate pieces (to allow simultaneous reading/writing of the same image)
• Computes filenames and addresses, and writes streaming data to random locations on the grid at each Stage 1 front-end processing pass
• I/O is timed
[Figure: Stage 1, Front-End; Kernel #2 (Image Storage) writes images into the grid.]
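Kernel 2 might compute "filenames and addresses" in one of two ways. The first is a file-per-image naming scheme. The second is a byte offset into a single pre-allocated file, with each grid point's depth stored contiguously so that Kernel 3's sequence reads become sequential. The naming format, the `max_depth` bound, and the layout below are all hypothetical.

```python
def image_filename(i, j, k):
    """Hypothetical naming scheme: one file per image at grid point (i, j), depth k."""
    return f"image_{i:04d}_{j:04d}_{k:04d}.bin"


def image_offset(i, j, k, grid_side_size, max_depth, image_bytes):
    """Byte offset of image (i, j, k) in one pre-allocated file (1-based indices).

    Images at the same grid point are contiguous, so reading a sequence
    through its depth is one sequential sweep.
    """
    if not 1 <= k <= max_depth:
        raise ValueError("depth exceeds the pre-allocated max_depth")
    slot = ((i - 1) * grid_side_size + (j - 1)) * max_depth + (k - 1)
    return slot * image_bytes


class GridWriter:
    """Tracks the current depth at each grid point for Kernel 2 writes."""

    def __init__(self, grid_side_size, rng):
        self.side = grid_side_size
        self.rng = rng
        self.depth = {}

    def next_target(self):
        """Pick a random grid point and return (i, j, k) for the next image."""
        i = self.rng.randint(1, self.side)
        j = self.rng.randint(1, self.side)
        k = self.depth.get((i, j), 0) + 1
        self.depth[(i, j)] = k
        return i, j, k
```

The fixed-offset layout makes writes random and sequence reads sequential. As the Kernels 2 and 3 notes point out, a scheme that favors storage may not favor retrieval.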
Kernel 3 – Image Retrieval
• At each Stage 2 pass, computes the address of an image sequence at a random grid location and reads pairs of its images until it reaches the full depth
• Each image sequence is read through the grid's entire depth
• Also reads a group of small data (templates) at each Stage 2 pass
• I/O is timed
[Figure: Stage 2, Back-End; Kernel #3 (Image Retrieval) reads image pairs from the image grid and a group of small data (templates).]
Kernels 2 and 3
Additional notes:
• A scheme that is optimal for data storage may not be optimal for data retrieval, and vice versa
• "Read behind write" is allowed
[Figure: Kernel 2 image output and Kernel 3 image-pair input.]
Kernel 4 – Small Image
• Writes labeled sub-images; this is repeated for each image pair, at each grid point, at each Stage 2 pass
• I/O is timed
[Figure: Stage 2, Back-End; Kernel #4 (Small Image Output) writes sub-images for each sub-image pair.]