b8290339f52b8997d92b5c58d697fa9e.ppt
- Number of slides: 19
Intel® IPP: Fighting for Performance. Novosibirsk, 2008. Boris Sabanin
Why Primitives?
“It would be wasteful and ignorant not to provide developers with a common foundation for building their [systems].” A. P. Ershov, "Fourth-Generation Software"
• To optimize deeply
• To make it cross-platform
• To make it orthogonal in functionality
• To test perfectly
• To develop independently
• To give customers the building blocks
Intel® Integrated Performance Primitives
Being Primitive
• ANSI C. Portable
• Low overhead. High performance with small data
• Low structure. No conversion
• Basic common operation. For many ISVs
• Atomic. Does one thing. Building blocks, flexible
• Self-contained. Minimal or zero OS dependency
• Predictable. Expected behavior and results
• Well defined. No “result is not defined”
• Well documented. And self-documented
• Intuitive. Understand once: ippsAddC_8u_I
• No magic. No side effects, explicit behavior
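The properties above can be illustrated with a small reference sketch. It assumes the usual reading of the name ippsAddC_8u_I (signal domain, add a constant, unsigned 8-bit data, in-place) and assumes saturation at 255. The function name, status codes, and signature are hypothetical; this is not the IPP implementation or its exact API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes: every input has a defined outcome. */
typedef enum { REF_OK = 0, REF_NULL_PTR = -1, REF_BAD_SIZE = -2 } ref_status;

/* Hypothetical portable reference for an "add constant, 8u, in-place"
 * primitive. Assumption: results saturate at 255 instead of wrapping. */
ref_status ref_add_c_8u_inplace(uint8_t val, uint8_t *src_dst, int len)
{
    if (src_dst == NULL) return REF_NULL_PTR;  /* explicit, no crash */
    if (len <= 0) return REF_BAD_SIZE;         /* explicit, no silent no-op */

    for (int i = 0; i < len; ++i) {
        unsigned sum = (unsigned)src_dst[i] + val;       /* widen first */
        src_dst[i] = (uint8_t)(sum > 255u ? 255u : sum); /* then saturate */
    }
    return REF_OK;
}
```

The function has no global state, no allocation, and no OS calls, so it stays atomic and self-contained in the sense of the list above.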
High Temperature IPP
SW: Applications, Components, OS, IPP
HW: CPU & chipset
IPP & Media. What Is Inside?
• Signal & Image Processing
• String Processing
• Computer Vision
• Speech Recognition primitives
• JPEG & JPEG 2000 primitives
• Speech, Audio and Video Coding
• Lossless Data Compression
• Small Matrix operations, Vector Math
• Cryptography
• Realistic Rendering
• Data Integrity
• Automatically generated DSP transforms
IPP, What Else? For Free?
50+ IPP samples provided as source code
• Video codecs: MPEG-2, MPEG-4, H.264, VC-1
• Audio codecs: MP3, AAC, AC-3
• JPEG and JPEG 2000 codecs
• Speech codecs: G.722, G.723, G.726, G.728
• Computer Vision: Face Detection
• Ray Tracing demo
• Interfaces: Java, C#, VB, F90, C++
• Yes. Download free source-code samples: http://www.intel.com/support/performancetools/libraries/ipp/
Performance: More Optimization Needed
[Chart: performance over time; labels: Core Arch, SSE, MHz, MMX, 3 GHz]
Optimization is needed. A lot of work
Achieving Performance
• Algorithms
• SIMD
• Threading
• HW accelerators
• Hybrid Solution
Algorithm: Right DFT Decomposition. Manually optimized code vs. automatically generated code. The best of 200 decomposition cases are benchmarked
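To show what one decomposition case looks like, here is a minimal sketch of a radix-2 (even/odd) split checked against a direct transform. To keep the arithmetic exact and free of floating point, it swaps in a DFT over the integers modulo 17 (a number-theoretic transform) with 2 as the 8th root of unity. That choice, the function names, and the size limit of 8 are assumptions for illustration, not how IPP computes its FFTs.

```c
#include <assert.h>
#include <stdint.h>

enum { MOD = 17 };  /* prime modulus; 2 has multiplicative order 8 mod 17 */

/* Direct O(n^2) transform: out[k] = sum_j in[j] * w^(j*k) mod 17.
 * Inputs are assumed to be small non-negative integers. */
void dft_direct(const uint32_t *in, uint32_t *out, int n, uint32_t w)
{
    uint32_t wk = 1;                      /* w^k */
    for (int k = 0; k < n; ++k) {
        uint32_t acc = 0, t = 1;          /* t steps through w^(j*k) */
        for (int j = 0; j < n; ++j) {
            acc = (acc + in[j] * t) % MOD;
            t = (t * wk) % MOD;
        }
        out[k] = acc;
        wk = (wk * w) % MOD;
    }
}

/* Radix-2 decomposition: one size-n transform becomes two size-n/2
 * transforms plus n/2 butterflies. n must be a power of two, n <= 8,
 * and w must have order n modulo 17. */
void dft_radix2(const uint32_t *in, uint32_t *out, int n, uint32_t w)
{
    assert(n >= 1 && n <= 8 && (n & (n - 1)) == 0);
    if (n == 1) { out[0] = in[0] % MOD; return; }

    uint32_t even[4], odd[4], fe[4], fo[4];
    int h = n / 2;
    for (int j = 0; j < h; ++j) {
        even[j] = in[2 * j];
        odd[j] = in[2 * j + 1];
    }

    uint32_t w2 = (w * w) % MOD;          /* root of unity for half size */
    dft_radix2(even, fe, h, w2);
    dft_radix2(odd, fo, h, w2);

    uint32_t t = 1;                       /* twiddle factor w^k */
    for (int k = 0; k < h; ++k) {
        uint32_t x = (t * fo[k]) % MOD;
        out[k] = (fe[k] + x) % MOD;
        out[k + h] = (fe[k] + MOD - x) % MOD;  /* w^(k+h) = -w^k */
        t = (t * w) % MOD;
    }
}
```

Calling both functions with n = 8 and w = 2 on the same input should give identical outputs. Other decompositions (radix 4, mixed radix) compute the same result with a different operation order, which is why candidates can be compared by timing alone.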
Threading. Function Level and Above
• Primitive level: the 1D FFT is optimized and threaded. Performance on Core™2 Duo: 22 GFlops
• Above primitives: IPP-based GZIP is faster even in its single-threaded version (the chart shows performance in CPU clocks per byte). The threaded version is much faster thanks to the two threading modes implemented: multi-file and in-file parallelization
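In-file parallelization can be sketched as splitting one buffer into independent chunks and processing them in a parallel loop. The sketch below assumes OpenMP for threading (without OpenMP enabled the pragma is ignored and the loop runs serially with the same result) and uses a byte sum as a hypothetical stand-in for compressing one block. All names are illustrative; this is not the IPP GZIP sample code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_CHUNKS = 64 };

/* Hypothetical per-chunk work: stands in for compressing one block. */
static uint32_t process_chunk(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

/* Split buf into `chunks` nearly equal pieces, process each piece
 * independently, then combine the partial results in order. */
uint32_t process_buffer_chunked(const uint8_t *buf, size_t len, int chunks)
{
    assert(buf != NULL && chunks >= 1 && chunks <= MAX_CHUNKS);

    uint32_t partial[MAX_CHUNKS] = {0};
    size_t base = len / (size_t)chunks;   /* minimum chunk size */
    size_t rem = len % (size_t)chunks;    /* first `rem` chunks get +1 byte */

    #pragma omp parallel for
    for (int c = 0; c < chunks; ++c) {
        size_t extra = (size_t)c < rem ? (size_t)c : rem;
        size_t start = (size_t)c * base + extra;
        size_t n = base + ((size_t)c < rem ? 1 : 0);
        partial[c] = process_chunk(buf + start, n);  /* no shared writes */
    }

    uint32_t total = 0;                   /* serial, deterministic combine */
    for (int c = 0; c < chunks; ++c) total += partial[c];
    return total;
}
```

Each chunk writes only to its own slot in `partial`, so the loop needs no locks. Multi-file parallelization follows the same pattern with whole files in place of chunks.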
FFTW Compares FFT Performance
FFTW web site benchmark, IPP included: http://www.fftw.org/speed
Test machine: 3.60 GHz Intel Xeon Pentium 4 (Prescott), unknown L2 size, 64-bit mode. Linux 2.4.21, Intel C/C++ Compiler 9.0, Intel Fortran Compiler 9.0, Intel Math Kernel Library Version 8.0.1, Intel Integrated Performance Primitives v5.0. Has SSE (4-way single-precision SIMD), SSE2 (2-way double-precision SIMD), SSE3
The Open Source Powered by IPP
• Data Compression: GZIP, ZLIB, BZIP2, LZO
• Image Coding (JPEG): IJG
• Cryptography: OpenSSL
• Computer Vision: OpenCV
OpenCV Calls IPP and Wins
• The Stanford Racing Team won the Grand Challenge. OpenCV & IPP are used in the “Stanley” computer vision software
• DARPA "Urban Challenge": in November, a 60-mile multi-robot face-off in a simulated city. Powered by Intel Core 2 Quad running IPP and OpenCV
HW Acceleration. Low Power Computing
In media, lower CPU utilization is desirable (unlike in HPC) because it reduces power consumption and lets other applications run. IPP video decoders running on the CPU and on HW accelerators (PowerVR technology) are compared. Platform: Menlow with Linux
Hybrid Solution: MC + HT + HWA (multi-core, Hyper-Threading, HW accelerators)
AMD Performance Library
• IPP API compatible
• Much less functionality
• Much lower performance
Quality vs. Performance. MSU Graphics Lab reports that the IPP H.264 encoder is in the top 3
IPP Economics
• 16 functional domains
• 10K functions
• 350 MB of source code
• Windows, Linux, Mac OS X
• IA-32, Intel® 64, IA-64, XScale
• All development in Russia
• 3 releases a year + out-of-cycle releases
• IPP $199, IPP samples $0
IPP Customers
• Microsoft • Adobe • Philips Medical • MathWorks • Ulead • Thomson • Yahoo • OKI • Apple • Symantec • Pixar • Envivio • SGI • Oracle • SAP • Google
Russian?