Software Thread Integration for Concurrency and ILP in Embedded Systems

Alex Dean
alex_dean@ncsu.edu
Center for Embedded Systems Research
Department of Electrical and Computer Engineering
North Carolina State University
www.cesr.ncsu.edu/agdean

STI and ASTI Eliminate Context Switches and Interrupts Where They Limit Performance

• Problem: Embedded systems are inherently multithreaded, but most processors are single-threaded
• Solution: Create efficient implicitly multithreaded (integrated) functions
• Use a compiler at design time to create the functions (compile for low-cost concurrency)
• Build the task/function scheduling decisions into the scheduler or ISRs

[Figure labels: Hardware Function; Schedule (Execution Time Reqts.); Primary guest (real-time) thread; Secondary (asynchronous) thread; Software Thread Integration; Integrated thread; Idle Time; Idle Time Reclaimed]

Break down the barrier between task scheduling (scheduler and dispatcher) and instruction scheduling (compiler)

• Two efficiencies improved
  – Integrated threads more efficient
  – Integration process automated
• Simplifies hardware to software migration

[Figure labels, integration tool flow: foo.s; Data-flow; Control-flow; Static Timing Analysis; GProf; foo.id; Integration Analysis; Integration; foo.int.s]
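
The idea of reclaiming idle time can be illustrated with a small hand-written simulation. This is only a sketch under stated assumptions, not the STI tool's output: the real process works on assembly code with static timing analysis, whereas here the cycle costs (EVENT_PERIOD, EVENT_COST, CHUNK_COST), the sim_t type, and all function names are hypothetical. The padded version wastes the idle cycles between real-time events; the integrated version fills them with pieces of a secondary thread while keeping every event on schedule.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cycle costs, chosen only for illustration. */
enum { EVENT_PERIOD = 10, EVENT_COST = 2, CHUNK_COST = 3 };

/* Simulated machine state: elapsed cycles and work completed. */
typedef struct {
    uint32_t cycles;
    uint32_t events;           /* primary (real-time) events issued */
    uint32_t secondary_chunks; /* pieces of secondary work finished */
} sim_t;

/* Primary thread step: must start exactly on a period boundary. */
static void primary_event(sim_t *s) {
    assert(s->cycles % EVENT_PERIOD == 0); /* timing requirement */
    s->cycles += EVENT_COST;
    s->events++;
}

/* One fixed-cost piece of the secondary thread. */
static void secondary_chunk(sim_t *s) {
    s->cycles += CHUNK_COST;
    s->secondary_chunks++;
}

/* Idle padding: cycles that do no useful work. */
static void pad(sim_t *s, uint32_t n) {
    s->cycles += n;
}

/* Non-integrated: the idle time between events is simply burned. */
sim_t run_padded(uint32_t n_events) {
    sim_t s = {0, 0, 0};
    for (uint32_t i = 0; i < n_events; i++) {
        primary_event(&s);
        pad(&s, EVENT_PERIOD - EVENT_COST);
    }
    return s;
}

/* Integrated: two secondary chunks fit in each idle slot; the
   remainder is padded so the next event still starts on time. */
sim_t run_integrated(uint32_t n_events) {
    sim_t s = {0, 0, 0};
    for (uint32_t i = 0; i < n_events; i++) {
        primary_event(&s);
        secondary_chunk(&s);
        secondary_chunk(&s);
        pad(&s, EVENT_PERIOD - EVENT_COST - 2 * CHUNK_COST);
    }
    return s;
}
```

Both versions take the same number of cycles per event and meet the same timing requirement; only the integrated one makes progress on the secondary thread, and it does so without a context switch or interrupt.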

Demo Systems: NTSC Generator & Hot Soft CAN

STIGlitz: Sync. Threads w/STI
[Figure labels: ATmega128 MCU; 115 kbps serial port; MCU Clock; Clock Divider; Pixel Clock Divider; Latch Byte; Load; Shift; Clear; Sync; 4-bit Shift Register (x2); 64 kByte SRAM; NTSC Video Out]

HSCAN: Async. Threads w/ASTI
[Figure labels, Hardware: CAN Bus; CAN Controller; Honeywell HT83C51 Microcontroller; Serial Data Link to Controller; Honeywell HT6256 32K x 8 SRAM]
[Figure labels, Software CAN: CAN Bus; Honeywell HT83C51 Microcontroller; Additional 4 kBytes of code; Serial Data Link to Controller; Honeywell HT6256 32K x 8 SRAM]

Other Activities

• STI for streaming programs on VLIW processors
  – Problem: VLIW processors often have many unused issue slots
  – Software pipelining doesn’t always work (resource bound < recurrence bound, control flow, calls, register pressure)
  – STI can help in many cases
  – Target System: StreamIt + TI C6x DSP
  – StreamIt implicitly guarantees data independence, simplifying analysis
  – Developing methods to analyze and transform StreamIt program graph to improve performance
• Energy Efficiency for Low-End Embedded Systems
• Portable Benchmarking for Embedded Systems
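
The "resource bound < recurrence bound" case in the software-pipelining bullet can be made concrete with the standard lower bounds on a modulo schedule's initiation interval (II). The sketch below is a simplified illustration, not the project's analysis: it assumes a single uniform pool of issue slots and a single recurrence cycle, and the function names and example numbers are hypothetical. When the recurrence bound dominates, the loop cannot start iterations fast enough to fill the machine, and the leftover slots are what an integrated second thread could use.

```c
#include <assert.h>

/* Ceiling of a / b for positive b. */
static int ceil_div(int a, int b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

/* Resource bound: cycles needed just to issue n_ops operations
   on a machine with issue_width slots per cycle (uniform slots
   are a simplifying assumption). */
int res_mii(int n_ops, int issue_width) {
    return ceil_div(n_ops, issue_width);
}

/* Recurrence bound for one dependence cycle: total latency around
   the cycle divided by its iteration distance. */
int rec_mii(int cycle_latency, int cycle_distance) {
    return ceil_div(cycle_latency, cycle_distance);
}

/* The minimum II is the larger of the two bounds. */
int min_ii(int res, int rec) {
    return res > rec ? res : rec;
}

/* Issue slots left empty per iteration when running at a given II. */
int unused_slots(int n_ops, int issue_width, int ii) {
    return ii * issue_width - n_ops;
}
```

For example, a hypothetical loop body of 12 operations on an 8-issue machine has a resource bound of 2 cycles, but a recurrence with latency 6 and distance 1 forces an II of 6, leaving 36 of the 48 issue slots per iteration empty.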