  • Number of slides: 63

Computer Engineering Senior Projects & Research Overview
An informal overview of past & current projects (students' & my own)
by Al Davis
School of Computing

The Engineering Discipline
• Role
  – design and build things
  – change the world around us
    » hopefully for the better
    » hence faced with a continuous ethical dilemma
• Ultimate requirement
  – what we build must work
• Requisite skills
  – science: math, physics, chemistry, materials, CS, …
  – engineering: state of the art, current practice, technology trends, manufacturing, testability, maintenance, life cycle costs, …
  – art: creative component that is clearly evident in the great engineers

Computer Engineering
• Design and build computer systems
  – inherently involves both software and hardware design skills
• System software
  – compiler, operating system, device drivers, …
  – as opposed to application specific software
    » applications are the target system "user"
    » hence they are used in design evaluation (pre- and post-build)
• Hardware: possibly many disciplines and levels
  – VLSI chip design: analog and digital circuit aspects
    » CS, EE, physics are the key disciplines
    » yet cooling is a big issue, which brings in ME aspects
  – board design: CS, EE, and manufacturing issues are dominant
  – system design: balance of HW and SW capabilities

CE Senior Projects at Utah
• Logistics
  – CE program run jointly by SoC and ECE departments
  – Senior project is capstone project course
    » team based
    » students choose their own project
    » best mechanism to demonstrate your abilities to future employers
  – CE Senior Project is a year long activity
    » at least for the last 2.5 years
    » Spring term of junior year: plan and propose
    » Summer: get parts and start building (optional)
    » Fall term of senior year: build and demonstrate
  – Exit interview feedback
    » rave reviews for being hard, fun, and instructive

04 Projects
• Satellite Tracking station
• Weaver: an 802.11 remote control vehicle interface
  – camera on car: image and commands to base station via wireless
  – car has autonomous anti-collision capability (infrared)
• GPS Hummer
  – autonomous navigation and anti-collision
  – some AI in route finding since Hummer remembers obstacles that it saw previously
• PCI Coprocessor
  – efficient acceleration via PCI add-on
• Jiggawax
  – build your own iPod
• RVI: remote vehicle interface
  – control via web or cell phone
  – control windows, engine, and door locks from RF base station

05 Projects
• Carputer
  – OBDII car data and 802.11g auto-sync to base station
  – monitor your car or your kids
• IR tag
  – paintball without the mess
• Athlete monitor system
  – real time tracking of position and heart rate to central coaching station
  – GPS, RF, and HRM on-athlete
• Inverted pendulum 2-wheeled robot
• Multi-carrier reflectometry
  – finding faults in aircraft wires without tearing the plane apart
• Glider avionics package
  – using accelerometers, GPS, and strain sensors

Current 06 projects (underway now)
• PEN: electronic paper
  – the only paper you'll ever buy!
• Recipedia
  – a cook book that talks and listens to you
• GPS tracker
  – use campus ubiquitous wireless to keep track of where things are via your cell phone or computer
• OmegaCore
  – a DVR that knows how to remove commercials for you
• NoCPR
  – bathtub drowning prevention
• Tracking Visor
  – virtual reality on your head

Selected Examples
• Some images to illustrate previous projects

Satellite Tracking Station
Final dual band antenna on the roof of MEB during demo day

2 meter (VHF side) antenna specs: students used an antenna design CAD tool

GPS Hummer

Controlling direction and speed with transistors

GPS internals

A build-your-own GPS kit from Motorola

Autonomous anti-collision system

Glider Avionics Package (note this ended up being done by a single student as a thesis)

Designing an electronic compass is non-trivial, especially if you want tilt-compensation

Board Schematic

Power Supply Filters and Registers Board Artwork

Senior Project Synopsis
• This was just a peek
• Just remember
  – if you can imagine it you can usually build it
    » there are some things you just can't do
    » like a perpetual motion machine
      • which violates the laws of physics
  – all it takes is dedication and time
• Huge diversity of both opportunities and problems
• You might have noticed the world isn't perfect
  – so help fix it!

Personal Research Overview
• Past
  – dataflow, VLSI, asynchronous circuits, parallel computing, high performance architectures (50% academia, 50% industry)
• Currently there are 4 projects
  – Domain specific architectures
    » target highly constrained embedded systems
    » will highlight the perception processor today
      • have also worked in signal processing and cell phone domains
  – Interconnect driven architecture
    » w/ Rajeev Balasubramonian & students
  – RPU design
    » w/ Erik Brunvand, Pete Shirley, Steve Parker, & students
  – VLSI wire scaling theory
    » w/ Stephanie Forrest & Melanie Moses @ UNM

Embedded Computing Characteristics
• Historically
  – narrow application specific focus
  – typically cheap, low-power, provide just enough compute power
    » niche filled by small microcontroller/DSP devices
    » AND often ASIC component(s)
• New Pressures
  – world goes bonkers on mobility and the web
    » expects ubiquitous information access
    » expects better and cheaper everything
  – sensors, microphones & cameras become free
    » so use lots of them
  – now we're talking real computing

New Look for ECS
• Sophisticated application suites
  – not single algorithms
  – e.g.
    » 3G and 4G cellular handsets
      • multiple channels and multiple encoding models
      • plus the usual DSP stuff
    » process what is streaming in from the net
      • includes real time media & web access
    » process the sensor, microphone, and camera streams
      • plus network information from the neighborhood
      • since things are starting to happen in groups
  – wide range of services
    » dynamic selection
    » no single app will do
• Rate of algorithmic change is staggering

ECS Economics
• Traditional reliance on the ASIC design cycle
  – lengthy IC design: > 1 year typical
  – little re-use
    » IP import works but there are many pitfalls
      • HDL code synthesize: ed inefficiency
      • Macroblock forces process and layout issues
  – turning an IC is costly
    » even when it works the first time
• ECS product cycles
  – lifetime similar to a mayfly
  – need next improved version "real soon now"
• Result
  – sell monster volumes in a short time or lose

What is Perception Processing?
• Ubiquitous computing needs natural human interfaces
• Processor support for perceptual applications
  – Gesture recognition
  – Object detection, recognition, tracking
  – Speech recognition
  – Biometrics
• Applications
  – Multi-modal human friendly interfaces (our focus)
  – Intelligent digital assistants
  – Robotics, unmanned vehicles
  – Perception prosthetics

Perception Processing Problem
consider always on aspect!!

Current Processors Inadequate
• Too slow, too much power for embedded space!
  – 2.4 GHz Pentium 4 ~ 60 Watts
  – 400 MHz XScale ~ 800 mW
  – 10x or more difference in performance but 100x in power
• Inadequate memory bandwidth
  – Sphinx requires 1.2 GB/s memory bandwidth
  – XScale delivers 64 MB/s ~ 1/19th
• Our methodology
  – Characterize applications to find the problems
  – Derive acceleration architecture
    » History of FPUs is an analogy
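The ratios on this slide can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted above; the variable names are illustrative, and 1 GB is assumed to mean 1000 MB.

```python
# Figures quoted on the slide; the variable names are illustrative.
pentium4_power_w = 60.0          # 2.4 GHz Pentium 4
xscale_power_w = 0.8             # 400 MHz XScale, 800 mW
sphinx_bandwidth_mb_s = 1200.0   # 1.2 GB/s, assuming 1 GB = 1000 MB
xscale_bandwidth_mb_s = 64.0

# How many times more power the Pentium 4 draws than the XScale.
power_ratio = pentium4_power_w / xscale_power_w

# How many times more memory bandwidth Sphinx needs than the XScale delivers.
bandwidth_shortfall = sphinx_bandwidth_mb_s / xscale_bandwidth_mb_s

print(f"power ratio: {power_ratio:.0f}x")
print(f"bandwidth shortfall: {bandwidth_shortfall:.2f}x")
```

The power ratio comes out at 75x, so the slide's "100x" reads as an order-of-magnitude figure, and the bandwidth shortfall of 18.75x matches the quoted "~ 1/19th".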

The Problem w/ GPP's
• caches & speculation
  – consume significant area and energy
  – great when they work, a liability when they don't
• rigid communication model
  – data moves from memory to registers
  – register → execution unit → register
  – inability to support specialized computational pipelines
    » ASIC advantage
• bottom line
  – can process anything
  – but not efficiently in many cases
  – it's the von Neumann trap
    » lots of overhead for almost no work

The FaceRec Application

FaceRec In Action
Bobby Evans

Application Structure
Block diagram: Image; Flesh tone; Segment Image; Rowley Face Detector (ANN based); Viola & Jones Face Detector (Adaboost, ~200 stage); Neural Net Eye Locator; Eigenfaces Face Recognizer; outputs: Identity, Coordinates
• Flesh toning: Soriano et al, Bertran et al
• Segmentation: Text book approach
• Rowley detector, voter: Henry Rowley, CMU
• Viola & Jones' detector: Published algorithm + Carbonetto, UBC
• Eigenfaces: Re-implementation by Colorado State University

Application Profile

Face Recognition Analysis
• Cache
  – small L1 D$ high hit rate
  – L2$ is useless: most L1 misses pass through
• IPC
  – low even with lots of FP execution units
  – Why?
    » load store register & memory ports saturate
      • multiple large matrix traversals are the critical kernel
      • several indirect accesses per operation
    » dominant loop is a SFP inner product
      • no single cycle accumulate
• Implications
  – restructure the code: loop fusion, more temporary reg's
  – need architectures which move data well
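The loop fusion mentioned under Implications can be illustrated with a small sketch. The two functions below are hypothetical and are not taken from the FaceRec code: the first traverses a matrix twice (once to subtract a mean vector, once to take inner products), while the fused version does both in a single traversal and keeps the intermediate value in a temporary.

```python
def project_unfused(matrix, mean, weights):
    """Two passes: build a centered copy, then take inner products."""
    centered = [[x - m for x, m in zip(row, mean)] for row in matrix]
    return [sum(c * w for c, w in zip(row, weights)) for row in centered]


def project_fused(matrix, mean, weights):
    """One pass: center and accumulate in the same loop body."""
    result = []
    for row in matrix:
        acc = 0.0
        for x, m, w in zip(row, mean, weights):
            diff = x - m      # temporary value, never written back to memory
            acc += diff * w   # multiply-accumulate
        result.append(acc)
    return result
```

The fused form reads each matrix element once and needs no intermediate matrix, at the cost of one more live temporary per iteration, which is the trade the slide points to.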

CMU Sphinx 3.2 Profile
Feature Vector = 13 Mel + 1st and 2nd derivative
→ 10 ms of speech is compressed into 39 SP floats
→ iMic possibility
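The count of 39 floats follows from 13 Mel coefficients plus 13 first derivatives plus 13 second derivatives. A minimal sketch, assuming a simple central difference for the derivatives (Sphinx's actual delta computation may use a different window; the function names and the synthetic input are illustrative only):

```python
def delta(frames):
    """Central difference between neighbouring frames; edge frames are repeated."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def feature_vectors(mel_frames):
    """Concatenate each frame with its 1st and 2nd derivative estimates."""
    d1 = delta(mel_frames)
    d2 = delta(d1)
    return [m + a + b for m, a, b in zip(mel_frames, d1, d2)]


# Five synthetic frames of 13 coefficients each, one frame per 10 ms.
frames = [[float(t + k) for k in range(13)] for t in range(5)]
features = feature_vectors(frames)

# 4-byte single precision floats, 100 frames per second.
bytes_per_second = len(features[0]) * 4 * 100
```

At 39 floats per 10 ms frame the feature stream is about 15.6 KB/s, which suggests why moving the front end into the microphone looks feasible.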

Speech Analysis
• Results
  – similar to FaceRec
    » cache
    » port saturation
  – big difference
    » also memory B/W starved
    » due to language model (opt)

Simple ASIC Design Example: Matrix Multiply

    def matrix_multiply(A, B, C):
        # C is the result matrix
        for i in range(0, 16):
            for j in range(0, 16):
                C[i][j] = inner_product(A, B, i, j)

    def inner_product(A, B, row, col):
        sum = 0.0
        for i in range(0, 16):
            sum = sum + A[row][i] * B[i][col]
        return sum

ASIC Accelerator Design: Matrix Multiply
The same matrix multiply code is shown again, emphasizing in turn:
• Control Pattern
• Access Pattern
• Compute Pattern

How can we generalize?
• Decompose loop into:
  – Control pattern
  – Access pattern
  – Compute pattern
Programmable h/w acceleration for each pattern
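One way to picture the decomposition is to split the matrix multiply into three cooperating pieces. This is only a software sketch of the idea, not the hardware design: the three function names and the generator-based structure are assumptions made for illustration.

```python
N = 16


def control_pattern(n=N):
    """Control: the loop nest alone, yielding index triples (i, j, k)."""
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield i, j, k


def access_pattern(A, B, indices):
    """Access: turn each index triple into the operands fetched from A and B."""
    for i, j, k in indices:
        yield i, j, A[i][k], B[k][j]


def compute_pattern(operands, n=N):
    """Compute: multiply-accumulate the fetched operands into the result."""
    C = [[0.0] * n for _ in range(n)]
    for i, j, a, b in operands:
        C[i][j] += a * b
    return C


def matrix_multiply(A, B, n=N):
    """Chain the three patterns to produce C = A * B."""
    return compute_pattern(access_pattern(A, B, control_pattern(n)), n)
```

Each stage only knows about its own concern, so a different kernel could reuse the same structure by swapping one stage, which is the sense in which each pattern could get its own programmable accelerator.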

Architecture Family

Experimental Method
• Measure processor power on
  – 2.4 GHz Pentium 4, 0.13 µm process
  – 400 MHz XScale, 0.18 µm process
• Perception Processor
  – 1 GHz, 0.13 µm process (Berkeley Predictive Tech Model)
  – Verilog, MCL HDLs
  – Synthesized using Synopsys Design Compiler
  – Fanout based heuristic wire loads
  – Spice (Nanosim) simulation yields current waveform
  – Numerical integration to calculate energy
• ASICs in 0.25 µm process
• Normalize 0.18 µm, 0.25 µm energy and delay numbers
  – model = constant field scaling
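The last two steps of the method can be sketched in a few lines. The slides do not give the integration rule or the exact scaling exponents, so this sketch assumes trapezoidal integration and the textbook constant field scaling rules (for a feature size ratio s, delay scales by 1/s and switching energy by 1/s³); the function names are illustrative.

```python
def energy_joules(times_s, currents_a, vdd_v):
    """Trapezoidal integration of vdd * i(t) over a sampled current waveform."""
    energy = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        avg_current = 0.5 * (currents_a[k] + currents_a[k - 1])
        energy += vdd_v * avg_current * dt
    return energy


def scale_constant_field(energy_j, delay_s, from_um, to_um):
    """Normalize energy and delay from one feature size to another.

    Assumes textbook constant field scaling: with s = from_um / to_um,
    delay scales by 1/s and energy per operation by 1/s**3.
    """
    s = from_um / to_um
    return energy_j / s**3, delay_s / s
```

For example, `scale_constant_field(e, d, 0.25, 0.13)` would bring a 0.25 µm ASIC measurement to the 0.13 µm node under these assumptions.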

Benchmarks
• Visual feature recognition
  – Erode, Dilate: Image segmentation operators
  – Fleshtone: NCC flesh tone detector
  – Viola, Rowley: Face detectors
• Speech recognition
  – HMM: 5 state Hidden Markov Model
  – GAU: 39 element, 8 mixture Gaussian
• DSP
  – FFT: 128 point, complex to complex, floating point
  – FIR: 32 tap, integer
• Encryption
  – Rijndael: 128 bit key, 576 byte packets

Results: IPC
Mean IPC = 3.3x R14K

Results: Throughput
Mean Throughput = 1.75x Pentium, 0.41x ASIC

Results: Energy
Mean Energy/packet = 7.4% of XScale, 5x of ASIC

Results: Energy Delay Product
Mean EDP = 159x XScale, 1/12 of ASIC

Perception Results: Summary
• 41% of ASIC's performance, but programmable!
• 1.75 times the Pentium 4's throughput, but 7.4% of the energy of an XScale!
• advanced perceptive embedded systems are possible
  – above results are maximally pessimistic
  – and as always there are improvements in the works
• Problems
  – manually intensive design process
  – requires highly skilled programmer, architect, circuit designer
  – current effort is to fix this

Automating the design process
Flow diagram blocks: Application Suite; C & ifc Splitter; Human opt. Interaction; C Host Code; Stream Code; Host Compiler; Stream Compiler; Host Object Code; CoProcessor Description; CoProcessor Object Code; Synthesize; CoProcessor Simulator; Simulation Analysis & Design Space Explore; Design Track Graph
Edge labels: design choice, dilation, add point

DSE Results
Plot axes: Power, Performance
Dividing lines: Performance Requirement, Power Limit
Quadrants: No Way, Too "Watty", Too Dweeby, Choice
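The quadrant picture amounts to a two-threshold classification of each candidate design. A minimal sketch, assuming the mapping suggested by the quadrant names (the plot itself is not recoverable, so both the mapping and the function name are assumptions):

```python
def classify(power_w, performance, power_limit_w, performance_requirement):
    """Place one design point into a quadrant of the power/performance plot."""
    meets_power = power_w <= power_limit_w
    meets_performance = performance >= performance_requirement
    if meets_power and meets_performance:
        return "Choice"          # fast enough and within the power budget
    if meets_performance:
        return 'Too "Watty"'     # fast enough but over the power limit
    if meets_power:
        return "Too Dweeby"      # within budget but too slow
    return "No Way"              # too slow and over the power limit
```

Design space exploration then reduces to generating candidate design points and keeping those that land in the Choice quadrant.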

Conclusions
• Significant benefit
  – 3 forms of parallelism: control, address, execution
  – program controlled communication patterns
    » able to mimic ASIC flows
    » more efficient use of execution units and memory structures
• Results to date (in terms of ed, the energy-delay product)
  – 2-3 orders of magnitude improvement over GPP
  – within 1 order of magnitude of an ASIC
  – while maintaining most of the generality of the GPP approach

Thanks! Questions?