  • Number of slides: 107

Goals
• Provide an overview of the 8260 device
• Allow a quick start of an 8260 design cycle
• Gain familiarity with debug issues particular to the 8260
• Create the basis to build further experience
[Rev 2.8] 1 of 107

Outline
• 8260 Architecture
• Application examples
• Debug considerations

Outline
• 8260 Architecture
  – Device overview
  – Core CPU
  – SIU
  – CPM

EC603e PowerPC Core
• 16 KB I-Cache, IMMU, 16 KB D-Cache, DMMU
SYSTEM INTERFACE UNIT
• 60x Bus Interface Unit, PowerPC-to-Local Bridge, Local Bus Interface Unit, Memory Controller, Time Counter/PIT, Bus Arbiter, L2 Cache Controller, System Functions
COMM. PROCESSOR MODULE
• Internal Memory Space, Four Timers, Interrupt Controller, Parallel I/O, Baud Rate Generators, 32-bit RISC and Program ROM, Serial DMAs, Virtual IDMAs
• Channels: MCC1, MCC2, FCC1, FCC2, FCC3, SCC1, SCC2, SCC3, SCC4, SMC1, SMC2, SPI, I2C
• Serial Interface: Time Slot Assigner, 8 TDMs, MII, 2 UTOPIA

CPU
• Based on the MPC603e core
• Up to two instructions fetched per clock
• Up to three instructions issued and retired per clock
• Up to five instructions in execution per clock
• Most instructions execute in one clock
• Branches can execute in zero clocks

Programming Model
• General purpose registers (32 bits): GPR0 to GPR31
• Floating point registers (64 bits): FPR0 to FPR31
• Other registers: CR, XER, FPSCR, MSR, PVR, CTR, LR, TBU, TBL, SRR0, SRR1, DEC, SPRn, SPRx

MSR
Bit 0 is the MSB; bit 31 is the LSB.
Field order, MSB to LSB: 0 0 0 0 POW 0 ILE EE PR FP ME FE0 SE BE FE1 0 IP IR DR 0 0 RI LE
• POW: Power management enabled
• ILE: Interrupt little endian mode
• EE: External interrupt enable
• PR: Privilege level
• FP: Floating point available
• ME: Machine check enable
• FE0, FE1: Floating point exception mode [0, 1]
• SE: Single step trace enabled
• BE: Branch trace enabled
• IP: Exception [interrupt] prefix
• IR: Instruction address translation enabled
• DR: Data address translation enabled
• RI: Recoverable exception
• LE: Little endian mode
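The MSB-first bit numbering is easy to get wrong when building masks by hand, so here is a minimal Python sketch of the conversion. The bit positions in `MSR_BITS` are not given on the slide; they are an assumption based on the usual 603e-family MSR layout and should be checked against the device manual. The names `msr_mask` and `decode_msr` are hypothetical helpers.

```python
# PowerPC bit numbering: bit 0 is the MSB of the 32-bit MSR.
# ASSUMPTION: positions follow the common 603e-family layout; the slide
# shows only the field order, not the bit numbers.
MSR_BITS = {
    "POW": 13, "ILE": 15, "EE": 16, "PR": 17, "FP": 18, "ME": 19,
    "FE0": 20, "SE": 21, "BE": 22, "FE1": 23, "IP": 25, "IR": 26,
    "DR": 27, "RI": 30, "LE": 31,
}


def msr_mask(name):
    """Return the 32-bit mask for a named MSR field (bit 0 = MSB)."""
    return 1 << (31 - MSR_BITS[name])


def decode_msr(value):
    """Return the names of all fields that are set in an MSR value."""
    return [name for name in MSR_BITS if value & msr_mask(name)]
```

Under these assumptions `msr_mask("EE")` is 0x8000, which is the value a debugger would show changing when external interrupts are masked.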

CPU Overview
(Block diagram components)
• Instruction Unit: Sequential Fetcher, Instruction Queue, Dispatch, Branch Processing (CTR, CR, LR)
• Inst. Cache and Inst. MMU
• System Register Unit
• Integer Unit (/ + *) with XER, GPR File R0-R31, and GP Rename Regs
• Floating Point Unit (/ + *) with FPR File FPR0-FPR31 and FP Rename Regs
• Load/Store Unit
• Completion Unit
• Data MMU, Data Cache, Main Memory

Execution Units
• Execution units operate in parallel
  – Fetch / Branch
  – Integer
  – Floating Point
  – Load / Store
  – System
  – Completion

Fetch / Dispatch
• Instructions are fetched in pairs
• Non-branch instructions enter the instruction queue
• Branch instructions are redirected to the branch unit
• Two instructions can be sent to the execution units and one to the branch unit for a total of three issued instructions per clock
• All instructions “appear” to execute sequentially

On each CPU clock:
• 64 bit wide transfer from the instruction cache
• Instructions fall through to the first open location in the queue
• The branch instruction closest to the bottom of the queue is issued to the branch unit (CTR, CR, LR)
• The bottom two non-branch instructions are dispatched to available execution units

Branch
• Branches are pre-executed, giving an effective execution time of zero clocks
• Instruction queue provides look ahead to determine data dependencies
• Unresolved conditional branches are statically predicted under control of the compiler

Subroutine Control Flow
• The stack is software maintained (GPR1)
• Branch to sub: the address of this instruction is placed into the Link Register (LR) by the branch function
• Instructions save the LR to the stack to allow nested function calls
• Branch to sub: the LR is reused for another call
• The LR is recalled from the stack to allow a return from subroutine
• Branch to LR: branching to the contents of the LR is a return instruction

Integer
• Integer unit directly accesses the GPR file
• Rename registers prevent stalls and allow instructions to be un-executed
• Most instructions execute in one clock
• Divides have been optimized over the 603 to reduce latency by 50%

Floating Point
• Floating point unit directly accesses the FPR file
• Rename registers prevent stalls and allow instructions to be un-executed (the same as in the integer GPR file)
• Supports single (32 bit) and double (64 bit) precision operands
• Three stage pipeline accepts one instruction per clock
• Supports all IEEE 754 floating-point data types (normalized, denormalized, NaN, zero, and infinity) in hardware, eliminating the latency incurred by software exception routines

Load/Store
• Responsible for all transfers between the GPR file and main memory
• Instructions appear to execute in order
• Actual accesses can occur out of order
• Loads from cache execute in one clock with a two clock latency
• Stores to cache execute in one clock with a latency of three clocks
• Speculative loads are placed in the rename registers
• Speculative stores remain in the store queue

System
• Performs moves to and from SPRs
• Doubles as an auxiliary integer unit
  – Executes add / compare instructions
  – Executes condition register logical operations
• Instructions that affect processor mode force serialization of the processor

Completion
• Holds instructions executed in parallel or out of order until they can be retired in order
• Retiring an instruction commits its results to the processor state
• Simply discarding an instruction from the completion queue effectively un-executes it
• Two instructions can be retired per clock

Instruction Set
• 68K instructions were based on an accumulator, direct memory model
    add (0x00035300).L, D4
  (Diagram: data registers D0 to D7; the operand at 0x00035300 is added into D4.)

Instruction Set
• PowerPC instructions are based on a triadic, load/store model
    lwz r2, 0x00035300
    add r6, r2, r4
  (Diagram: registers GPR0 to GPR31; the word at 0x00035300 is loaded into r2, then r2 + r4 is written to r6.)

Exceptions
• All exceptions cause processing to vector to a predetermined memory location
• The base address of the vector table is controlled by the [IP] bit in the MSR
• Each vector is placed at a 256-byte (0x100) boundary
  – 64 instructions can be placed at a vector before hitting the next vector
  – Reset = 0xnnn00100
  – Machine Check = 0xnnn00200
  – External Interrupt = 0xnnn00500
  – Decrementer = 0xnnn00900
  – Etc.
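The vector arithmetic can be sketched in a few lines of Python. The two base addresses come from the vector map in this deck (0xFFF00100 with MSR[IP] = 1, 0x00000100 with MSR[IP] = 0); the function and table names are hypothetical.

```python
# Offsets listed on the slide; other exceptions follow the same pattern.
VECTOR_OFFSETS = {
    "reset": 0x100,
    "machine_check": 0x200,
    "external_interrupt": 0x500,
    "decrementer": 0x900,
}


def vector_address(name, msr_ip):
    """Address the core vectors to for an exception, given MSR[IP]."""
    base = 0xFFF00000 if msr_ip else 0x00000000
    return base + VECTOR_OFFSETS[name]


# Vectors sit 0x100 bytes apart and every instruction is 4 bytes long,
# which is where the 64-instruction limit comes from.
INSTRUCTIONS_PER_VECTOR = 0x100 // 4
```

A handler that needs more than 64 instructions has to branch out of the vector area before it runs into the next vector.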

Exceptions
(Vector table map)
• MSR[IP] = 1: vectors are in Flash, starting at FFF00100
• MSR[IP] = 0: vectors are in RAM, starting at 00000100
• Offsets: Reset 100, Machine Check 200, DSI 300, ISI 400, External 500
• Each vector has room for 64 instructions

Exceptions
• Only the Decrementer and the External Interrupt can be masked by the [EE] bit in the MSR
• Machine Check exceptions can vector to a routine or force Checkstop state
• All other exceptions are synchronous (caused by instruction execution) and are unmaskable

Nesting Exceptions
• When an exception occurs, return state is stored in the processor
  – There is no automated stacking of critical registers
  – The address of the return instruction is stored in SRR0
  – The MSR prior to the exception is in SRR1
  – The [EE] bit of the MSR is cleared
• The processor must save these registers and any other GPRs to a software maintained stack
• The EABI specifies GPR1 to be the stack pointer
• The [RI] bit in the MSR is set by software when enough information is saved to allow recovery from a nested exception

Exception Control Flow
• The stack is software maintained (GPR1) and holds the saved SRR0 and SRR1
• The address of this instruction is placed into SRR0 by the hardware
• An exception after the completion of this instruction causes flow to be directed to the ISR
• Instructions save the SRRs to the stack to allow nested exceptions
• The MSR[RI] bit is cleared by the exception hardware and set by software after the SRRs have been saved
• An exception while MSR[RI] is cleared causes a machine check event
• It is safe for exceptions to occur in this section of code (while MSR[RI] is set)
• The MSR[RI] bit is cleared by the software just before the SRRs are restored by the software
• The SRRs are recalled from the stack to allow a return from the exception (rfi)
• Breakpoints are exceptions!

Cache
• Independent instruction and data caches implement an internal Harvard Architecture
• Each cache is 16 Kbyte, four way set associative
• Caching of separate memory areas is controlled by the MMU

Cache Organization
• Address (bit 0 to bit 31): address tag (20 bits), set select (7 bits), word, byte
• 128 sets of four ways (Way 0 to Way 3), Block 0 to Block 511
• Each block holds an address tag, state, and words 0-7
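As a worked example of the address split, the sketch below breaks a 32-bit address into the fields shown above. The 3-bit word and 2-bit byte widths are not printed on the slide; they are inferred from the 8 words per block and 4 bytes per word. `split_cache_address` is a hypothetical helper name.

```python
def split_cache_address(addr):
    """Split a 32-bit address into (tag, set index, word, byte)."""
    tag = (addr >> 12) & 0xFFFFF   # 20-bit address tag
    set_index = (addr >> 5) & 0x7F  # 7 bits select one of 128 sets
    word = (addr >> 2) & 0x7        # 3 bits select one of 8 words (inferred)
    byte = addr & 0x3               # 2 bits select a byte in the word (inferred)
    return tag, set_index, word, byte


# 128 sets x 4 ways x 8 words x 4 bytes per word
CACHE_BYTES = 128 * 4 * 8 * 4
```

The product works out to 16 Kbyte, matching the cache size quoted earlier, which is a useful sanity check on the field widths.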

Cache Operation
• Each cache block (or line) can be in one of three states (MEI protocol)
  – M = modified (or dirty)
    • Resides in cache and is different than memory
  – E = exclusive (resident and clean)
    • Resides in cache and is identical to memory
  – I = invalid (not resident)
• The “shared” state of the full MESI protocol is not supported
  – Would allow synchronization of multiply cached blocks
• There is no cache coherency for the instruction cache

Cache control
• Hardware implementation dependent registers (HIDn) control cache function
  – Enabling
  – Invalidate
  – Locking
• Supervisor instructions provide block level control
  – Allocate, flush, invalidate, store, touch, zero
• Ability to store a given block of memory into the cache is controlled by the MMU
  – Each block or page in the MMU has WIMG bits
    • (Write-through, Inhibited, Global, Guarded)

MMU
• The MMU provides for both memory translation and access control
• The system boots in Real (un-translated) mode
• To effectively use the caches, the MMU must be used in block or page mode
  – Effectively, a null translation is performed

Protection
• The primary use of the MMU in embedded applications is for cache control and access protection
• The WIMG bits are set for each page
  – W = write-through (applicable only to data cache)
  – I = inhibited
  – M = memory coherency supported in hardware
  – G = guarded (indicates that memory is ill-behaved)
    • I/O spaces
    • All accesses are forced to be in order
    • No speculative reads or pre-fetches

Translation
• Block or page translation allows the full use of a virtual memory model
• Block translation provides a memory space of 2^32 bytes
• Page translation provides a virtual memory space of 2^52 bytes
• System must be debugged with RTOS tools
  – Emulators and hardware debuggers don’t support it

Real mode
• The 32 bit logical address is used as the 32 bit physical address
• WIMG:
  – W = 0: write-back
  – I = 0: cache enable
  – M = 1: data is global
  – G = 1: memory is guarded

BAT mode
• Logical address fields: 4, 11, and 17 bits
• BAT Reg n holds BEPI (15 bits), BL (11 bits), BRPN, and the WIMG bits
• Physical address: 4 bits from BRPN, 11 bits formed from BRPN and the logical address under control of BL, and the low 17 bits of the logical address
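A minimal Python sketch of block translation, assuming the standard PowerPC BAT behaviour: the upper 15 address bits are compared against BEPI except where BL has ones, and those BL-masked bits pass through from the logical address. The function name and the idea of passing the fields as separate integers are illustrative, not how the registers are actually laid out.

```python
def bat_translate(ea, bepi, bl, brpn):
    """Return the physical address for ea, or None if this BAT misses.

    bepi and brpn are 15-bit fields; bl is the 11-bit block length mask.
    ASSUMPTION: standard PowerPC BAT matching, as described in the lead-in.
    """
    page = ea >> 17                    # upper 15 bits of the logical address
    compare_mask = 0x7FFF & ~bl        # bits that must match BEPI
    if (page & compare_mask) != (bepi & compare_mask):
        return None
    # BL-masked bits come from the logical address, the rest from BRPN.
    phys_page = (brpn & ~bl & 0x7FFF) | (page & bl)
    return (phys_page << 17) | (ea & 0x1FFFF)
```

With `bl = 0x7FF` the block covers 256 Mbyte, since only the top 4 bits are compared; setting BRPN equal to BEPI gives the null translation used purely for cache control.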

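Page mode
• Logical address fields: 4 bits (segment register select), 16 bits, and 12 bits
• The segment register supplies 24 bits, giving a virtual address of 24 + 16 + 12 bits
• The upper 40 bits are looked up in the TLB / page table, which returns 20 bits and the WIMG bits
• Physical address: the 20 bits from the TLB followed by the 12 bit offset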
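The 52-bit figure quoted for page translation falls out of the field widths above. A small sketch, assuming the segment registers are modelled as a list of sixteen 24-bit values (usually called VSIDs in PowerPC documentation; the name does not appear on the slide):

```python
def virtual_address(ea, segment_registers):
    """Form the 52-bit virtual address for a 32-bit logical address."""
    sr_index = ea >> 28                             # top 4 bits pick a segment register
    vsid = segment_registers[sr_index] & 0xFFFFFF   # 24 bits from the segment register
    page_index = (ea >> 12) & 0xFFFF                # 16-bit page index
    offset = ea & 0xFFF                             # 12-bit byte offset
    return (vsid << 28) | (page_index << 12) | offset


def virtual_page_number(va):
    """The 40-bit (24 + 16) value used to search the TLB / page table."""
    return va >> 12
```

The page table lookup itself (hashing and the resulting 20-bit physical page number) is deliberately left out of this sketch.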

Reset operation
• Reset sources (table rows): Power-on reset, External hard reset, Software watchdog, Bus monitor, Checkstop, External soft reset
• Effects (table columns): Reset PLL, System configuration sampled, Clock module reset, HRESET driven, Other internal logic reset, SRESET driven, Core reset
• Table cells: yes yes yes yes

Reset Types
• Power-on reset is used to align all logic from a chaotic state after Vcc stabilizes
  – The PLL then begins to lock
• Hard reset is analogous to the normal reset on other processors
  – The PLL is not affected
• Soft reset can be used to initiate a warm start
  – Not commonly used
  – Not driven or monitored by the emulator
  – Basically, a non-returnable exception to the reset vector

Reset Sequence
• Entry events: POR asserted, HRESET asserted, SRESET asserted
• Steps shown: HRESET & SRESET asserted, PLL locks, RSTCONF sampled, internal logic reset, HRESET & SRESET negated

Memory Map Startup
• At boot, CS0 is active for one of two large areas of the address space. All other chip selects are invalid.
• Boot map before config word: Flash (CS0), IMMR
• After config word: Flash (CS0), IMMR
• Application target map: Flash, RAM, I/O, and IMMR, on chip selects CSi and CSx, y, z

Memory Map Implications
• Since the Flash memory accessed by CS0 occupies one of two large areas in the address space, boot code can be linked to execute in a number of different locations
• Any branches will change the NIA from the boot location to the linked location
• All other chip selects are off
• IMMR RAM is still available
• CS0 must be reduced in scope before activating other chip selects
• Be careful not to pull the rug out from under the boot code when reducing CS0
• BSP re-entry issues:
  – Altering chip select option registers while assuming the value in the Valid bit
  – Can the chip selects to the RAM and Flash be altered while running out of either?

Memory Map Init Issues
• Three different factors can enhance (confuse) the boot process:
  – The MSR[IP]
    • The reset vector can be 0x0000_0100 or 0xfff0_0100
    • Determined by the Reset Configuration Word
    • Not changed by an SRESET
  – CS0 scope
    • CS0 responds to either the upper or lower end of the memory map
    • It must be changed while it is being used
    • It may have already been reduced by a previous pass through the BSP
  – Code link results
    • Execution can start in code that is linked to a different address than the boot vector
    • Only the address lines within the memory device are significant
    • PC relative addressing will solve this, right? WRONG!
    • The first branch will set the NIA MSBs to the current execution value

RTOS Boot Sequences
• Flash holds the boot code and a compressed application image; boot code decompresses and relocates the application from flash
• External application image: boot code loads the application over a communication channel or backplane
• RAM holds the uncompressed application image, the BSP, and data, stack, heap, etc.
• The map also shows the IMMR and I/O
• Chip Select x: Base Register (Base Address, V) and Option Register (Mask, Options)
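The base/mask pairing can be modelled in a few lines to see why CS0 has to be narrowed before other chip selects are enabled. This is a simplified sketch of base-and-mask decoding in general, not the 8260's exact register layout; the function names and the example values in the note below are hypothetical.

```python
def chip_select_hit(addr, base, mask, valid):
    """True if a bank with this base/mask pair would respond to addr."""
    return bool(valid) and (addr & mask) == (base & mask)


def banks_hit(addr, banks):
    """Names of all banks that decode addr; more than one means an overlap.

    banks maps a name to a (base, mask, valid) tuple.
    """
    return [name for name, (base, mask, valid) in banks.items()
            if chip_select_hit(addr, base, mask, valid)]
```

For example, a CS0 entry of `(0x80000000, 0x80000000, 1)` claims the whole upper half of the map, so enabling an I/O bank at `(0x90000000, 0xFFFF0000, 1)` makes both respond to 0x90000000 until the CS0 mask is widened.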

Endian Bus Connections
(Diagram: 8 bit devices, data bits 7 to 0, attached to the byte lanes of 68K, x86, and PPC buses. The 68K and x86 buses number the MS byte lane 31 to 24 and the LS byte lane 7 to 0; the PPC bus numbers the MS byte lane 0 to 7 and the LS byte lane 24 to 31.)

Big Endian Bus
(Diagram: 8 bit, 16 bit, 32 bit, and 64 bit devices attached to the 8260 data bus. 8260 bits 0-7 are the MS byte lane and bits 56-63 are the LS byte lane; device data bits are numbered in the opposite direction, so device bits 7-0 of an 8 bit device sit on 8260 bits 0-7 and device bits 63-0 of a 64 bit device sit on 8260 bits 0-63.)

Configuration Word
• Configuration word is latched from Flash memory during the reset cycle
• A 32 bit value is loaded 8 bits at a time from the high order bits of the data bus
  – Immune to boot memory width
• RSTCONF pin allows the configuration word to be forced to all zero
• Multiple 8260s can access the same memory device

Configuration Word Contents
Fields: EARB, EXMC, CDIS, EBM, BPS, CIP, ISPS, L2CPC, DPPC, ISB, BMS, BBD, MMR, LBPC, APPC, CS10PC, MODCK_H
• EARB – External arbitration
• EXMC – External memory controller
• CDIS – Core disable
• EBM – External bus mode
• BPS – Boot port size
• CIP – Core initial prefix
• ISPS – Internal space port size
• L2CPC – L2 cache control pins
• DPPC – Data parity pin configuration
• ISB – Internal space base address
• BMS – Boot memory space
• BBD – Busy bus disable
• MMR – Mask Masters request
• LBPC – Local bus pin configuration
• APPC – Address parity pin configuration
• CS10PC – CS10 pin configuration
• MODCK_H – MODCK high order bits

Configuration Word Format
8 bit wide boot device
  Address offset from CS0    603 bus MSB byte lane (0-7)
  0x00                       Byte 0
  0x01                       Ignored
  0x08                       Byte 1
  0x09                       Ignored
  0x10                       Byte 2
  0x11                       Ignored
  0x18                       Byte 3
  0x19                       Ignored
32 bit wide boot device
  Address offset from CS0    603 bus MSB byte lane (0-7)    603 bus byte lane (24-31)
  0x00                       Byte 0                         Ignored
  0x04                       Ignored                        Ignored
  0x08                       Byte 1                         Ignored
  0x0C                       Ignored                        Ignored
  0x10                       Byte 2                         Ignored
  0x14                       Ignored                        Ignored
  0x18                       Byte 3                         Ignored
  0x1C                       Ignored                        Ignored
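The table above can be turned into a short sketch that places a configuration word into a flash image and reads it back the way the reset logic is described as doing: one byte from the most significant lane at offsets 0x00, 0x08, 0x10, and 0x18. The helper names and the 0xFF fill value (erased flash) are assumptions made for illustration.

```python
CONFIG_OFFSETS = (0x00, 0x08, 0x10, 0x18)


def place_config_word(word, image_size=0x20):
    """Return a flash image with the 4 bytes of word at the sampled offsets."""
    image = bytearray([0xFF] * image_size)   # ASSUMPTION: unused bytes left erased
    for i, offset in enumerate(CONFIG_OFFSETS):
        image[offset] = (word >> (24 - 8 * i)) & 0xFF   # Byte 0 is the MSB
    return bytes(image)


def read_config_word(image):
    """Assemble the 32-bit word from the bytes at the sampled offsets."""
    word = 0
    for offset in CONFIG_OFFSETS:
        word = (word << 8) | image[offset]
    return word
```

Because only the first byte at each 8-byte step is used, the same image works for both the 8 bit and 32 bit boot devices in the table.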

Configuring a single 8260
(Diagram: an 8260 connected to the Boot Flash over the A bus and D bus, showing the RSTCONF pin connections, one of them to Vcc.)

Configuring multiple 8260s
(Diagram: a master 8260 and slaves 1 through 7 share the A bus and D bus with the Boot Flash; address lines A0 through A6 are shown at the slaves’ RSTCONF pins.)

SIU
• The SIU contains the logic to interface the external system components to the 8260
• Contains all of the glue logic needed for a typical embedded application

SIU Overview
SYSTEM INTERFACE UNIT
• 60x Bus Interface Unit
• PowerPC-to-Local Bridge
• Local Bus Interface Unit
• Memory Controller
• Time Counter/PIT
• Bus Arbiter
• L2 Cache Controller
• System Functions

603e Bus
• Very high performance bus
  – Separate address and data tenures
  – Pipelined
  – Bursting
  – Multi-master
  – Cache snooping

603e bus cycle
(Timing diagram: address and data tenures, including an address only cycle to support cache snoop.)

Local Bus
• Two busses, one address map
(Diagram: the address map contains Flash, Code/Data SDRAM, and CPM Buffer SDRAM; memory control connects the Code/Data SDRAM and the CPM Buffer SDRAM.)

Memory Control
• 12 banks of memory
  – Each can be configured for any type of device
• Glueless support of SDRAM devices
• Glueless support of SRAM, EPROM, Flash
  – Using general purpose chip select machine
• Three user programmable machines
• All memory controllers can be allocated to either the 603 or local bus

System control
• Clock synthesis
• Reset control
• Interrupt control
• Real time clock
• Periodic interrupt timer
• Bus monitor
• Bus arbiter
• Watchdog timer

Interrupt Control
(Diagram: the software watchdog timer or IRQ0 drives MCP to the 603 core. IRQ[1-7], Port C [0-15], the CPM channels, and the on board timers feed the interrupt controller, which drives INT to the 603 core. Input sense labels: Fall / Level and Edge / Fall.)

SIU Interrupt Vectors
• All external interrupts cause processing at 0xnnn00500
  – There is space for 64 instructions to save processor state and resolve the SIU vector
• Vectors are six bits
  – Shifting w/ indirect addressing is used to decommutate to service routines
  – A 16 bit load from the long word address of the SIVEC register will point to a 64 entry array of 1 K byte (256 instructions) service routines
  – An 8 bit load will allow a 64 entry jump table of branch instructions

SIU Interrupt Vector Register
• Bits 0-5: six bit interrupt code; bits 6-31 read as 0
• 8 bit read from address 0xnnn10C04 returns bits 0-7
• 16 bit read from address 0xnnn10C04 returns bits 0-15
• 32 bit read from address 0xnnn10C04 returns bits 0-31

SIU Interrupt Vectors: 8 bit Read
• Value read: six bit interrupt code followed by 0 0
• Each vector value points to a different branch instruction in a table of branch instructions to ISRs
    _18  ba routine_g
    _14  ba routine_f
    _10  ba routine_e
    _0c  ba routine_d
    _08  ba routine_c
    _04  ba routine_b
    _00  ba routine_a

SIU Interrupt Vectors: 16 bit Read
• Value read: six bit interrupt code followed by zeros
• Each vector value points to a block of 1 K bytes / 256 instructions
    nnnn0c00 to nnnn0fff  256 32-bit instructions
    nnnn0800 to nnnn0bff  256 32-bit instructions
    nnnn0400 to nnnn07ff  256 32-bit instructions
    nnnn0000 to nnnn03ff  256 32-bit instructions
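The two read widths amount to two different shifts of the same six-bit code. A small Python model, with hypothetical helper names and a caller-supplied table base address:

```python
def sivec_reads(code):
    """Return the (8-bit, 16-bit, 32-bit) read values for a six-bit code."""
    if not 0 <= code < 64:
        raise ValueError("interrupt code must fit in six bits")
    reg = code << 26                  # the code occupies the top six bits
    return reg >> 24, reg >> 16, reg


def branch_table_entry(table_base, code):
    """Address of the 4-byte branch instruction selected by an 8-bit read."""
    return table_base + sivec_reads(code)[0]


def handler_block(table_base, code):
    """Address of the 1 K byte handler block selected by a 16-bit read."""
    return table_base + sivec_reads(code)[1]
```

The 8-bit form costs an extra branch per interrupt but needs only 256 bytes of table; the 16-bit form reserves 64 K bytes so each handler can live in place.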

CPM
• Communications processor module
• Direct hardware support for all protocol and application interfaces
  – Ethernet, ATM, HDLC, T1/E1, T3/E3, BiSync, UART, ISDN, PCM highway
  – Parallel I/O
  – Full serial and virtual DMA support

IMMR Format
• All on-chip peripherals are accessed through a single 128 K byte area of memory
• Within the first 64 K of address space, there are three blocks of dual ported RAM
• The second 64 K of address space contains the control registers of the on-chip peripherals

IMMR Map
Upper 64 K: Hardware Registers (up to 0x1_ffff)
  0x1_2000 to 0x1_4000   SI routing RAM (8 K)
  0x1_0000 to 0x1_1c00   Control registers (7 K)
Lower 64 K: Dual Ported RAM
  0x0_b000 to 0x0_c000   FCC Data (4 K)
  0x0_8000 to 0x0_9000   Parameter RAM (4 K)
  0x0_0000 to 0x0_4000   Buffer Descriptors / uCode / Data (16 K)
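When reading a bus trace it helps to classify an IMMR offset quickly. The lookup below is a sketch built only from the boundaries in the map above, with each end address treated as exclusive; `immr_region` is a hypothetical helper.

```python
# (start, end, name); end is exclusive. Boundaries are taken from the map.
IMMR_REGIONS = [
    (0x00000, 0x04000, "Buffer Descriptors / uCode / Data"),
    (0x08000, 0x09000, "Parameter RAM"),
    (0x0B000, 0x0C000, "FCC Data"),
    (0x10000, 0x11C00, "Control registers"),
    (0x12000, 0x14000, "SI routing RAM"),
]


def immr_region(offset):
    """Return the region name for an offset from the IMMR base, or None."""
    for start, end, name in IMMR_REGIONS:
        if start <= offset < end:
            return name
    return None
```

For instance, the SIVEC register read at 0xnnn10C04 falls in the control register block, while an offset such as 0x5000 lands in a gap between the dual ported RAM blocks.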

Dual Ported RAM usage
• The layout of the Dual Ported RAM is determined by the uCode in the CPM
• When the CPM is not in operation, it is nothing more than internal memory
  – During the boot sequence, stack, global data, and heap can reside in this memory
  – Initialization code can be written in C++!
  – A multi-layered boot process can be used
    • First code resides in flash, uses internal RAM to set up chip selects
    • Second code resides in another section of flash and uses external RAM to load main application over a CPM channel
    • Third level is the main application
  – Each level has its own crt0.s function and initializes the EABI from scratch

CPM Overview
COMM. PROCESSOR MODULE
• Internal Memory Space, Four Timers, Interrupt Controller, Parallel I/O, Baud Rate Generators, 32-bit RISC and Program ROM, Serial DMAs, Virtual IDMAs
• Channels: MCC1, MCC2, FCC1, FCC2, FCC3, SCC1, SCC2, SCC3, SCC4, SMC1, SMC2, SPI, I2C
• Serial Interface: Time Slot Assigner

DMAs
• Serial DMAs
  – Full bi-directional support of all serial channels
  – Can access the 603 or local bus
• Virtual DMA
  – 4 channels
  – Uses the serial DMA hardware to generate transfers
  – Memory to memory or memory to/from I/O

CPM Buffer Structure
(Diagram: buffer descriptors BD 1, BD 2, BD 3, up to BD 128 in the IMMR, each pointing to a buffer in RAM.)

Buffer Descriptor Format
Four 16 bit entries:
• Status and Control
• Data Length
• High Order Pointer
• Low Order Pointer
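The four 16-bit entries map naturally onto an 8-byte big-endian record. A minimal sketch using Python's `struct` module, treating the high and low order pointer halves as one 32-bit value; the helper names are hypothetical.

```python
import struct

# Big-endian: 16-bit status/control, 16-bit data length, 32-bit buffer
# pointer (high order half first, then low order half).
BD_FORMAT = ">HHI"


def pack_bd(status, length, pointer):
    """Return the 8 bytes of a buffer descriptor as stored in memory."""
    return struct.pack(BD_FORMAT, status, length, pointer)


def unpack_bd(raw):
    """Return (status, length, pointer) from 8 descriptor bytes."""
    return struct.unpack(BD_FORMAT, raw)
```

Dumping dual ported RAM in 8-byte rows and running each row through `unpack_bd` is a quick way to check a descriptor ring by eye.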

From Channel to Buffer
• Communication channel hardware
• Parameter RAM
  – Location fixed by: hardware channel
  – Format fixed by: protocol
• Dual ported RAM (Buffer Descriptors)
  – Location determined by: Parameter RAM value
  – Format of control and status determined by protocol
• Data Buffers
  – Location determined by: value in Buffer Descriptor, memory controller mapping of Local/603 bus
  – Format determined by: protocol

SCCs
• The SCCs implement the following protocols:
  – SDLC/HDLC
  – AppleTalk
  – UART
  – 10-Mbps Ethernet

Ethernet Frame
  Preamble               7 bytes
  Start Frame            1 byte
  Destination Address    6 bytes
  Source Address         6 bytes
  Type / Length          2 bytes
  Data                   46 - 1500 bytes
  Frame Check            4 bytes
(Diagram brackets mark which fields are stored by the CPM in the receive buffer and by the CPU in the transmit buffer.)

Ethernet Buffer Descriptor
• Receive Control & Status: E - W I L F - M - LG NO SH CR OV CL
• Transmit Control & Status: R PAD W I L TC DEF HB RC RL RC UN CSL
• Common for Transmit and Receive: Data Length, High Order Pointer, Low Order Pointer

Status and Control Definitions
• Receive Control & Status: E - W I L F - M - LG NO SH CR OV CL
• Transmit Control & Status: R PAD W I L TC DEF HB RC RL RC UN CSL
• Empty / Ready: 0 = this buffer is owned by the CPU; 1 = this buffer is owned by the CPM
• Wrap: This is the last BD in this set of BDs.
• Interrupt: Generate an interrupt after this buffer is used by the CPM.
• Last in Frame: Set by the CPM or the CPU to inform the other that this is the last buffer of a frame.
• First in Frame: Set by the CPM to inform the CPU that this is the start of a new frame.
• Transmit CRC: Transmit the CRC after this buffer.
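The leading positions of the status word can be turned into masks directly from the field order above, counting position 0 as the MSB of the 16-bit word. This is a sketch that assumes the order shown on the slide is the bit order; the constant and function names are hypothetical.

```python
def bd_mask(position):
    """Mask for a field at the given position, where position 0 is the MSB."""
    return 1 << (15 - position)


# ASSUMPTION: positions follow the left-to-right order on the slide.
BD_E = BD_R = bd_mask(0)    # Empty (receive) / Ready (transmit)
BD_W = bd_mask(2)           # Wrap
BD_I = bd_mask(3)           # Interrupt
BD_L = bd_mask(4)           # Last in frame
BD_F = BD_TC = bd_mask(5)   # First in frame (receive) / Transmit CRC (transmit)


def owner(status):
    """Who owns the buffer according to the E/R bit."""
    return "CPM" if status & BD_E else "CPU"
```

Because E and R share a position, the same `owner` check works for both transmit and receive descriptors.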

Transmit Frames
  BD   R   W   I   L   TC
  1    0   0   0   0   1     Parameter RAM points to this BD
  2    0   0   1   1   1
  3    0   0   0   0   1     These BDs (3 and 4) are for the next frame for this channel
  4    0   0   1   1   1
  5    0   1   1   1   1     This BD is for a single buffer frame
• After all buffers are filled, “R” is set to “1” in all BDs in this list

Receive Frames
  BD   E   W   I   L   F
  1    1   0   0   0   1     Parameter RAM points to this BD
  2    1   0   0   0   0
  3    1   0   0   1   0
  4    1   0   0   0   1     These BDs (4 to 6) are for the next frame for this channel
  5    1   0   0   0   0
  6    1   0   0   1   0
  7    1   1   0   1   1     This BD is for a single buffer frame
• After all buffers are filled, “E” is set to “1” in all BDs in this list

The [E/R] bits
                     Transmit [Ready]          Receive [Empty]
  Initial value      0                         1
  Operation          Fill with data by CPU     Fill with data by CPM
  Changed by / to    CPU / 1                   CPM / 0
  Operation          CPM transmits buffer      CPU reads buffer
  Changed by / to    CPM / 0                   CPU / 1
• Polarity can be confusing because the sense is reversed for complementary operations. However, the same level always indicates who [CPU vs. CPM] owns the buffer.
• This bit is the same for all protocols on all channels.

The [W] bits
• The Wrap bit is always set to indicate the last buffer descriptor for the channel
• It does not delineate frames!
• The value of the first buffer descriptor is stored in the channel’s parameter RAM
  – The list of BDs is bounded by the parameter RAM and the [W] bit
• Any BD past a BD with the [W] bit set that is not pointed to by parameter RAM is inaccessible by the CPM
• This bit is the same for all protocols on all channels.

The [I] Bits
• The Interrupt bits generate an interrupt to the CPU when the CPM hands the BD to the CPU
  – Whenever the CPM flips the [E/R] bit to “0”
    • A redundant phrase, the CPM can only flip that bit to “0”, right?
• For transmit, it’s common to only receive an interrupt at the end of transmission of the last buffer
• For receive, the last buffer is not known, so it’s more common to receive an interrupt for most buffers on non-frame oriented protocols
  – If a buffer is small enough that it can’t contain an entire frame, then this bit might be cleared
    • The CPU has to stay ahead of the CPM to know when a wrap occurred
  – On Ethernet, the end of frame interrupt is more efficient
• This bit is the same for all protocols on all channels.

The [L] Bits
• The Last bits indicate the end of a frame within the list of buffer descriptors
• Set and cleared by the CPU on transmit frames
  – The CPM only reads this bit for transmit
• Set by the CPM on receive frames
  – Should be cleared by the CPU before the [E] bit is used to hand the buffer to the CPM
• This bit is not the same for all protocols on all channels.
[Rev 2.8] 80 of 107

The [F] Bits
• The First bit is only present in receive frames
• Set by the CPM to tell the CPU that this buffer starts a frame
  – An underrun, late collision, or aborted frame can cause a new frame in the next buffer without the [L] bit being set in the previous BD
• Not needed for transmit
  – The CPU will control the state of the CPM with the [L] bit
  – An [L] bit set or an underrun will cause the next buffer to be considered the first buffer of a frame
• This bit is not the same for all protocols on all channels.
[Rev 2.8] 81 of 107
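The way [F] and [L] delimit received frames, including the case where a frame restarts without a preceding [L], can be sketched as follows. The `bd_t` layout, the masks for [L] and [F] (which vary by protocol), and `find_frame` are assumptions for illustration; wrap handling is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed generic BD layout. */
typedef struct {
    uint16_t status;
    uint16_t length;
    uint32_t buffer;
} bd_t;

#define BD_E 0x8000u   /* assumed: 1 = still owned by the CPM */
#define BD_L 0x0800u   /* assumed position of the Last bit */
#define BD_F 0x0400u   /* assumed position of the First bit */

/*
 * Scan bd[start..n-1] for the next complete frame.
 * Returns 1 and stores the first and last BD index, or 0 if no complete
 * frame is available yet.
 */
int find_frame(const bd_t *bd, size_t n, size_t start,
               size_t *first, size_t *last)
{
    int in_frame = 0;

    assert(bd != NULL && first != NULL && last != NULL);
    for (size_t i = start; i < n; i++) {
        if (bd[i].status & BD_E)
            return 0;              /* CPM has not filled this BD yet */
        if (bd[i].status & BD_F) {
            *first = i;            /* [F] restarts the frame, even if the */
            in_frame = 1;          /* previous one never reached [L]      */
        }
        if ((bd[i].status & BD_L) && in_frame) {
            *last = i;
            return 1;
        }
    }
    return 0;
}
```

Relying on [F] rather than on “the BD after an [L]” is what lets the CPU discard the partial buffers of an aborted frame.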

The [TC] Bits
• The Transmit CRC bits work in conjunction with the [L] bit
• The [TC] bit is ignored if the [L] bit is cleared
• Initializing all [TC] bits to “1” is a good precaution
• Only custom protocols that don’t use hardware generated CRC’s should have this bit cleared
• This bit is not the same for all protocols on all channels.
[Rev 2.8] 82 of 107
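Putting the [I], [L] and [TC] rules together for a transmit frame that spans several buffers might look like the sketch below. It assumes a generic `bd_t` layout and bit positions that are typical but protocol-dependent; `tx_queue_frame` is a hypothetical helper, and the frame is assumed not to cross the [W] descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed generic BD layout. */
typedef struct {
    uint16_t status;
    uint16_t length;
    uint32_t buffer;
} bd_t;

#define BD_R  0x8000u  /* assumed: Ready, 1 = owned by the CPM */
#define BD_W  0x2000u  /* assumed: Wrap */
#define BD_I  0x1000u  /* assumed: Interrupt */
#define BD_L  0x0800u  /* assumed: Last buffer of the frame */
#define BD_TC 0x0400u  /* assumed: Transmit CRC */

/* Queue one frame held in n consecutive BDs. Returns -1 if any BD is busy. */
int tx_queue_frame(bd_t *bd, const uint32_t *bufs, const uint16_t *lens,
                   size_t n)
{
    assert(bd != NULL && bufs != NULL && lens != NULL && n > 0);

    for (size_t i = 0; i < n; i++) {
        if (bd[i].status & BD_R)
            return -1;                       /* CPM still owns this BD */
    }
    for (size_t i = n; i-- > 0; ) {          /* hand over last to first */
        uint16_t s = (uint16_t)(bd[i].status & BD_W);  /* keep only [W] */
        s |= BD_TC;                          /* ignored unless [L] is set */
        if (i == n - 1)
            s |= BD_L | BD_I;                /* one interrupt per frame */
        bd[i].buffer = bufs[i];
        bd[i].length = lens[i];
        bd[i].status = (uint16_t)(s | BD_R);
    }
    return 0;
}
```

Setting [TC] on every descriptor follows the precaution on the slide: it has no effect on the intermediate buffers and cannot be forgotten on the last one.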

Subtle points on BD’s
• Frames can span buffers
• Buffers never span frames
  – Unless you have all hardware support turned off and are running transparent
• Be careful with small receive buffers that have the [I] bit set
  – You’ll get hammered with interrupts
• Turn buffers over to the CPM from last to first
  – If an interrupt interferes with the handoff, an underrun / overflow can occur
• Hands off a BD with the [E/R] bit set
  – Unless you like working weekends
[Rev 2.8] 83 of 107
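The “last to first” advice for a receive ring can be sketched like this. The `bd_t` layout, the bit masks, and `rx_hand_over` are assumptions for illustration; real descriptors would be declared volatile and live in dual ported RAM or main memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed generic BD layout. */
typedef struct {
    uint16_t status;
    uint16_t length;
    uint32_t buffer;
} bd_t;

#define BD_E 0x8000u   /* assumed: Empty, 1 = owned by the CPM */
#define BD_W 0x2000u   /* assumed: Wrap */
#define BD_L 0x0800u   /* assumed: Last */
#define BD_F 0x0400u   /* assumed: First */

/*
 * Return n processed receive BDs to the CPM, starting with the last one.
 * If the CPU is interrupted partway through, the CPM cannot run through the
 * first BD and then find the following ones still owned by the CPU.
 */
void rx_hand_over(bd_t *ring, size_t n)
{
    assert(ring != NULL && n > 0);
    for (size_t i = n; i-- > 0; ) {
        uint16_t s = ring[i].status;
        s = (uint16_t)(s & ~(BD_L | BD_F));  /* clear the old frame markers */
        s |= BD_E;                           /* [W] and [I] are left as is */
        ring[i].length = 0;
        ring[i].status = s;
    }
}
```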

FCCs
• The FCC’s support:
  – 10/100-Mbps Ethernet through an MII
  – Full 155 Mbps ATM SAR through UTOPIA
  – 45 Mbps HDLC (DS-3)
  – Operation is similar to SCCs
• Block mode allows buffers to be dynamically moved into dual ported RAM
[Rev 2.8] 84 of 107

FCC Buffer Descriptors
• Identical in format to the SCC’s buffer descriptors
• Except:
  – Buffer descriptors, as well as buffers, are in main memory
  – Pointers to buffer descriptors in the parameter RAM are 32 bits
• Buffer descriptors must still be in consecutive memory locations
[Rev 2.8] 85 of 107

SMC’s
• The SMC’s perform basic UART as well as transparent mode transmission
• Buffer descriptor operation is identical to the SCC’s
  – The status and control word has different bit fields pertaining to the protocols
  – Bit fields controlling protocol independent operation are unchanged
[Rev 2.8] 86 of 107

Status and Control Definitions [SMC in UART mode]

Receive Control & Status:   E - W I - - CM ID - BR FR PR - OV -
Transmit Control & Status:  R - W I - - CM P - - - - -

ID    Idle: Close buffer on reception of idles
CM    Continuous mode: [E] bit isn’t cleared on buffer reception
I     Interrupt: Generate an interrupt after this buffer is used by the CPM.
W     Wrap: This is the last BD in this set of BD’s.
E/R   Empty / Ready:
      0 = This buffer is owned by the CPU
      1 = This buffer is owned by the CPM

[Rev 2.8] 87 of 107

MII

Signals between the PQ II MPC8260 FCCn and the Fast Ethernet PHY:
• Transmit Error (Tx_ER)
• Transmit Nibble Data (TxD[3:0])
• Transmit Enable (Tx_EN)
• Transmit Clock (Tx_clk)
• Collision Detect (COL)
• Receive Nibble Data (RxD[3:0])
• Receive Error (Rx_ER)
• Receive Clock (Rx_clk)
• Receive Data Valid (Rx_DV)
• Carrier Sense output (CRS)
• Management Data Clock (MDC)
• Management Data I/O (MDIO)
[Rev 2.8] 88 of 107

Utopia Interface

[Connection diagram between the MPC8260 and a PM5350]
MPC8260 pins: A[24-31], D[0-7], BCTL0*, PWE0*/PDQM/PBS0*, ATMCS0*, ATMRST*, DP6/CSE0/IRQ6*
PM5350 pins: A[7-0], D[7-0], CS*, RD*, WR*, RST*, ALE, INT*

[Rev 2.8] 89 of 107

Applications
• Performance drives the complexity of the 8260 system
  – Single processor
    • Single 8260
    • Multiple 8260’s with all but one core turned off
    • Multiple 8260’s with all cores off, using an external MPC750
  – Multiple processor
    • Combinations of 8260’s and 750’s
[Rev 2.8] 90 of 107

Single 8260

[Block diagram: one MPC8260 with SDRAM/SRAM/DRAM/Flash on the 60x bus, SDRAM/SRAM/DRAM and the ATM connection tables on the local bus, a PHY on the communication channels, and a 155 Mbps ATM PHY on UTOPIA]

[Rev 2.8] 91 of 107

Multiple 8260s

[Block diagram: two MPC8260s sharing SDRAM/SRAM/DRAM/Flash on the 60x bus; each MPC8260 has its own local bus with SDRAM/SRAM/DRAM and ATM connection tables, a PHY on its communication channels, and a 155 Mbps ATM PHY on UTOPIA]

[Rev 2.8] 92 of 107

MPC7xx w/ 8260(s)

[Block diagram: an MPC7xx with backside cache, 32-Kbyte I cache and 32-Kbyte D cache, sharing SDRAM/SRAM/DRAM/Flash on the 60x bus with an MPC8260; the MPC8260 has a local bus with SDRAM/SRAM/DRAM and ATM connection tables, a PHY on its communication channels, and a 155 Mbps ATM PHY on UTOPIA]

[Rev 2.8] 93 of 107

Debug Considerations
Ø What is JTAG
Ø Limitations
Ø Getting out of reset
Ø The 60x Core and Bus
Ø The cache is on
Ø CPM Realities
Ø Exception Routines
Ø Tracing at the Bus Cycle Level
[Rev 2.8] 94 of 107

What is JTAG?
Ø JTAG is a SLOW serial connection to the 8260 CPU resources
Ø The serial data is called the scan chain.
Ø JTAG provides the ability to modify memory and registers.
Ø The scan chain for each processor is different.
Ø JTAG was not created for Debug…
[Rev 2.8] 95 of 107

JTAG connection
• JTAG connector allows for full run control of the processor
• The emulator can sync with the processor without disrupting its state

Connector signals: TDO, TDI, QREQ*, TCK, TMS, SRESET*, HRESET*, XBR 3*, TRST*, 3.3 V, GND

[Rev 2.8] 96 of 107

JTAG Limitations
Ø Slow download of code to RAM.
Ø JTAG accesses during execution MAY dramatically affect performance.
Ø All commands through JTAG must be “scanned in”
[Rev 2.8] 97 of 107

Getting out of Reset
Ø The Reset Configuration word is of vital importance
Ø TRST must not be permanently asserted
Ø When flashing your boot code, be careful to replace or keep the configuration word
Ø What is your Interrupt Prefix?
Ø Switchable pullup on RSTCONF*?
[Rev 2.8] 98 of 107

The 60x Core and Bus
Ø A STOP instruction must be scanned in (no breakpoint pin)
Ø Only one hardware code breakpoint available; no hardware data breakpoints
Ø Address and Data do not necessarily appear on the bus at the same time
Ø Predictive Fetching means what you see on the bus may not be executed.
[Rev 2.8] 99 of 107

The Caches are On
• Bus Cycles now appear as bursts
• Fetches are determined by the BIU, not related to instruction execution
• No Cache Visibility pins
• Instrumentation required for accurate debug
• Caution must be exercised when the boot process performs a code relocation
  – Contents are cached as data during the move
  – Contents are fetched as instructions after the move
  – The instruction queue doesn’t snoop the data cache
  – The load/store unit doesn’t snoop the instruction cache
  – There is no cache coherency for the instruction cache
[Rev 2.8] 100 of 107
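Because the instruction cache is not kept coherent with the data cache, relocated code has to be pushed out of the data cache and invalidated in the instruction cache before it is executed. The sketch below shows the usual PowerPC sequence (`dcbst`, `sync`, `icbi`, `sync`, `isync`) using GCC-style inline assembly. The 32-byte line size and the function names are assumptions for the example, and on a non-PowerPC host the cache operations compile to nothing so the copy logic can still be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 32u   /* assumed cache line size for this core */

/* Make [start, start+len) visible to instruction fetch after a data write. */
void sync_icache_range(void *start, size_t len)
{
    uintptr_t first = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)start + len;

#if defined(__powerpc__) || defined(__PPC__)
    /* Write modified data cache lines back to memory. */
    for (uintptr_t a = first; a < end; a += CACHE_LINE)
        __asm__ volatile("dcbst 0,%0" : : "r"(a) : "memory");
    __asm__ volatile("sync" : : : "memory");
    /* Discard any stale copies held in the instruction cache. */
    for (uintptr_t a = first; a < end; a += CACHE_LINE)
        __asm__ volatile("icbi 0,%0" : : "r"(a) : "memory");
    __asm__ volatile("sync; isync" : : : "memory");
#else
    (void)first;   /* host build: no cache maintenance performed */
    (void)end;
#endif
}

/* Copy code to its run address, then synchronize the caches. */
void *relocate_code(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
    sync_icache_range(dst, len);
    return dst;
}
```

A boot loader would call `relocate_code` once per image and only then branch to the new address.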

CPM Realities
• The CPM operates independently of the CPU
• The CPM is not debugged yet... Expect the unexpected
• Early releases of the silicon didn’t propagate watchdog resets to the external reset pin
• “Last Buffer Interrupt” occurs at the beginning of transmission
[Rev 2.8] 101 of 107

Exception Routines
Ø Exception routines are difficult to debug
Ø The Recoverability of exceptions is an issue
Ø On board hardware breakpoints do not work in the head or tail of an exception handler
[Rev 2.8] 102 of 107

Tracing at the Bus Cycle Level
Ø The 8260 comes in a BGA package
Ø Connecting to an emulator
Ø Connecting to an analyzer
[Rev 2.8] 103 of 107

Connecting to an Emulator

[Diagram labels: Connection to Emulator; Buffer Board; Original 8260; BGA site to pin socket; Target Adaptor; Pin header; Target board]

[Rev 2.8] 104 of 107

Connecting to an Analyzer

[Diagram labels: Mictor Connectors; 8260; Target board]

[Rev 2.8] 105 of 107

Connecting to an Emulator or Analyzer

[Diagram labels: Connection to Emulator - OR - Socket to Mictor adaptor; Buffer Board; Original 8260; BGA site to pin socket; Target Adaptor; Pin header; Target board]

[Rev 2.8] 106 of 107

Summary of debug issues
• Init MMU before turning on caches
• Loads and stores can be re-ordered
• The CPM doesn’t use the MMU’s or the caches
• Don’t single step through moves to or from SPR’s
• ISR’s cannot have breakpoints in the first or last few instructions
• Each processor must have its own JTAG connector
• JTAG lines must be terminated with 1 K or 2 K values (depending on the signal)
• JTAG connector should be within 2 inches of the processor
• Provide for the ability to pull RSTCONF* high
• When using the 750 as the CPU, provide the ability to access the 8260 configuration word in flash
• Don’t place code or program data on the local bus
[Rev 2.8] 107 of 107