- Number of slides: 107
Goals • Provide an overview of the 8260 device • Allow a quick start of an 8260 design cycle • Gain familiarity with debug issues particular to the 8260 • Create the basis to build further experience [Rev 2.8] 1 of 107
Outline • 8260 Architecture • Application examples • Debug considerations
Outline • 8260 Architecture – Device overview – Core CPU – SIU – CPM
Device overview (block diagram) • EC603e PowerPC core: 16 KB I-cache, IMMU, 16 KB D-cache, DMMU • Communications Processor Module: 32-bit RISC and program ROM, internal memory space, interrupt controller, four timers, baud rate generators, parallel I/O, serial DMAs, virtual IDMAs • System Interface Unit: 60x bus interface unit, PowerPC-to-local bridge, local bus interface unit, memory controller, time counter/PIT, bus arbiter, L2 cache controller, system functions • Channels: MCC1, MCC2, FCC1, FCC2, FCC3, SCC1, SCC2, SCC3, SCC4, SMC1, SMC2, SPI, I2C • Serial interface: time slot assigner, 8 TDMs, MII, 2 UTOPIA
CPU • Based on the MPC603e core • Up to two instructions fetched per clock • Up to three instructions issued and retired per clock • Up to five instructions in execution per clock • Most instructions execute in one clock • Branches can execute in zero clocks
Programming Model • 32-bit general purpose registers: GPR0-GPR31 • 64-bit floating point registers: FPR0-FPR31 • Other registers: CR, XER, FPSCR, MSR, PVR, CTR, LR, TBU, TBL, SRR0, SRR1, DEC, SPRn, SPRx
MSR (bit 0 is the MSB, bit 31 is the LSB) • Bit order, MSB first: 0 ... 0, POW, 0, ILE, EE, PR, FP, ME, FE0, SE, BE, FE1, 0, IP, IR, DR, 0, 0, RI, LE • POW: power management enabled • ILE: interrupt little endian mode • EE: external interrupt enable • PR: privilege level • FP: floating point available • ME: machine check enable • FE[0, 1]: floating point exception mode • SE: single step trace enabled • BE: branch trace enabled • IP: exception [interrupt] prefix • IR: instruction address translation enabled • DR: data address translation enabled • RI: recoverable exception • LE: little endian mode
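Because the MSR is numbered with bit 0 as the MSB, converting a bit name into a conventional mask is a common source of mistakes. The following is a minimal sketch, not taken from the slides: the bit positions are an assumption that follows the order shown on the MSR slide and the usual 603e layout, so check them against the user's manual before relying on them.

```python
# MSR bit positions in PowerPC numbering (bit 0 = MSB, bit 31 = LSB).
# Assumption: positions follow the order shown on the MSR slide.
MSR_BITS = {
    "POW": 13, "ILE": 15, "EE": 16, "PR": 17, "FP": 18, "ME": 19,
    "FE0": 20, "SE": 21, "BE": 22, "FE1": 23, "IP": 25, "IR": 26,
    "DR": 27, "RI": 30, "LE": 31,
}

def msr_mask(bit):
    """Convert a PowerPC bit number (0 = MSB) into a conventional mask."""
    return 1 << (31 - bit)

def decode_msr(value):
    """Return the names of all MSR flags that are set in `value`."""
    return [name for name, bit in MSR_BITS.items() if value & msr_mask(bit)]

# Example: external interrupts, machine check, and both translations enabled.
example = (msr_mask(MSR_BITS["EE"]) | msr_mask(MSR_BITS["ME"])
           | msr_mask(MSR_BITS["IR"]) | msr_mask(MSR_BITS["DR"]))
print(hex(example), decode_msr(example))
```

The same `31 - bit` conversion applies to any other register that the documentation describes in MSB-first numbering.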
CPU Overview (block diagram) • Instruction unit: sequential fetcher, instruction queue, dispatch, branch processing (CTR, CR, LR) • Instruction cache and instruction MMU • System register unit • Integer unit (/ + *): XER, GPR file R0-R31, GP rename registers • Floating point unit (/ + *): FPR file FPR0-FPR31, FP rename registers • Load/store unit • Data MMU and data cache • Completion unit • Main memory
Execution Units • Execution units operate in parallel – Fetch / Branch – Integer – Floating Point – Load / Store – System – Completion
Fetch / Dispatch • Instructions are fetched in pairs • Non-branch instructions enter the instruction queue • Branch instructions are redirected to the branch unit • Two instructions can be sent to the execution units and one to the branch unit, for a total of three issued instructions per clock • All instructions “appear” to execute sequentially
On each CPU clock: • 64-bit-wide transfer from the instruction cache • Instructions fall through to the first open location in the queue • The branch instruction closest to the bottom of the queue is issued to the branch unit (CTR, CR, LR) • The bottom two non-branch instructions are dispatched to available execution units
Branch • Branches are pre-executed, giving an effective execution time of zero clocks • Instruction queue provides look ahead to determine data dependencies • Unresolved conditional branches are statically predicted under control of the compiler
Subroutine Control Flow • Software maintained stack (GPR1) • Branch to sub: the address of this instruction is placed into the Link Register (LR) by the branch function • Instructions save the LR to the stack to allow nested function calls • Branch to sub: the LR is reused for another call • The LR is recalled from the stack to allow a return from subroutine • Branch to LR: branching to the contents of the LR is a return instruction
Integer • Integer unit directly accesses the GPR file • Rename registers prevent stalls and allow instructions to be un-executed • Most instructions execute in one clock • Divides have been optimized over the 603 to reduce latency by 50%
Floating Point • Floating point unit directly accesses the FPR file • Rename registers prevent stalls and allow instructions to be un-executed (the same as in the integer GPR file) • Supports single (32 bit) and double (64 bit) precision operands • Three stage pipeline accepts one instruction per clock • Supports all IEEE 754 floating-point data types (normalized, denormalized, NaN, zero, and infinity) in hardware, eliminating the latency incurred by software exception routines
Load/Store • Responsible for all transfers between the GPR file and main memory • Instructions appear to execute in order • Actual accesses can occur out of order • Loads from cache execute in one clock with a two clock latency • Stores to cache execute in one clock with a latency of three clocks • Speculative loads are placed in the rename registers • Speculative stores remain in the store queue
System • Performs moves to and from SPR’s • Doubles as an auxiliary integer unit – Executes add / compare instructions – Executes condition register logical operations • Instructions that affect processor mode force serialization of the processor
Completion • Holds instructions executed in parallel or out of order until they can be retired in order • Retiring an instruction commits its results to the processor state • Simply discarding an instruction from the completion queue effectively un-executes it • Two instructions can be retired per clock
Instruction Set • 68K instructions were based on an accumulator, direct memory model – add (0x00035300).L, D4 • Diagram: data registers D0-D7; the operand at 0x00035300 is added to D4
Instruction Set • PowerPC instructions are based on a triadic, load/store model – lwz r2, 0x00035300 – add r6, r2, r4 • Diagram: registers GPR0-GPR31; the word at 0x00035300 is loaded into GPR2, then GPR2 + GPR4 is written to GPR6
Exceptions • All exceptions cause processing to vector to a predetermined memory location • The base address of the vector table is controlled by the [IP] bit in the MSR • Each vector is placed at a page boundary • 64 instructions can be placed at a vector before hitting the next vector – Reset = 0xnnn00100 – Machine Check = 0xnnn00200 – External Interrupt = 0xnnn00500 – Decrementer = 0xnnn00900 – Etc.
Exceptions (memory map) • MSR[IP] = 1: vectors are in Flash, starting at FFF00100 • MSR[IP] = 0: vectors are in RAM, starting at 00000100 • Vector offsets: Reset 100, Machine Check 200, DSI 300, ISI 400, External 500 • Each vector has room for 64 instructions
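The vector arithmetic described on the exception slides can be sketched in a few lines. This is an illustration only: the function and table names are hypothetical, and the offsets are limited to the ones named on the slides.

```python
# Vector offsets named on the exception slides (relative to the table base).
VECTOR_OFFSETS = {
    "reset": 0x100,
    "machine_check": 0x200,
    "dsi": 0x300,
    "isi": 0x400,
    "external": 0x500,
    "decrementer": 0x900,
}

BYTES_PER_INSTRUCTION = 4
VECTOR_SPACING = 0x100  # distance between adjacent vectors, in bytes
INSTRUCTIONS_PER_VECTOR = VECTOR_SPACING // BYTES_PER_INSTRUCTION

def vector_address(name, msr_ip):
    """Return the address the core vectors to for exception `name`.

    MSR[IP] = 1 selects the high table (0xFFFn_nnnn, typically Flash);
    MSR[IP] = 0 selects the low table (0x000n_nnnn, typically RAM).
    """
    base = 0xFFF00000 if msr_ip else 0x00000000
    return base + VECTOR_OFFSETS[name]

for ip in (0, 1):
    print(ip, [hex(vector_address(name, ip)) for name in VECTOR_OFFSETS])
```

The 0x100 spacing is where the figure of 64 instructions per vector comes from: 256 bytes divided by 4 bytes per instruction.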
Exceptions • Only the Decrementer and the External Interrupt can be masked by the [EE] bit in the MSR • Machine Check exceptions can vector to a routine or force Checkstop state • All other exceptions are synchronous (caused by instruction execution) and are unmaskable
Nesting Exceptions • When an exception occurs, return state is stored in the processor – There is no automated stacking of critical registers – The address of the return instruction is stored in SRR0 – The MSR prior to the exception is in SRR1 – The [EE] bit of the MSR is cleared • The processor must save these registers and any other GPR’s to a software maintained stack • The EABI specifies GPR1 to be the stack pointer • The [RI] bit in the MSR is set by software when enough information is saved to allow recovery from a nested exception
Exception Control Flow • An exception after the completion of an instruction causes flow to be directed to the ISR; the address of the instruction to return to is placed into SRR0 by the hardware • Software maintained stack (GPR1) holds SRR0 and SRR1 • Instructions save the SRRs to the stack to allow nested exceptions • The MSR[RI] bit is cleared by the exception hardware and set by software after the SRRs have been saved • An exception while MSR[RI] is cleared causes a machine check event • It is safe for exceptions to occur in the section of code where MSR[RI] is set • The MSR[RI] bit is cleared by the software just before the SRRs are restored by the software • Breakpoints are exceptions! • The SRRs are recalled from the stack to allow a return from the exception (rfi)
Cache • Independent instruction and data caches implement an internal Harvard architecture • Each cache is 16 Kbyte, four way set associative • Caching of separate memory areas is controlled by the MMU
Cache Organization • Address bits 0-31: address tag (20 bits, stored in the cache), set select (7 bits), then word and byte within the block • 128 sets of four ways (Way 0-Way 3), for Block 0 through Block 511 • Each block holds an address tag, state, and Words 0-7
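The field widths on the cache organization slide determine how an address selects a cache block. A minimal sketch, assuming the 20/7/5-bit split implied by the slide (20-bit tag, 7-bit set select, and 5 bits for the word and byte within an 8-word block); the function name is hypothetical.

```python
BLOCK_BYTES = 32   # 8 words of 4 bytes each
SETS = 128         # selected by 7 address bits
WAYS = 4           # four-way set associative

def split_address(addr):
    """Split a 32-bit address into (tag, set index, byte offset in block)."""
    offset = addr & (BLOCK_BYTES - 1)        # low 5 bits
    set_index = (addr >> 5) & (SETS - 1)     # next 7 bits
    tag = addr >> 12                         # top 20 bits
    return tag, set_index, offset

# Total capacity follows from the geometry: 128 sets x 4 ways x 32 bytes.
print(SETS * WAYS * BLOCK_BYTES)
print([hex(field) for field in split_address(0x00035300)])
```

Two addresses that differ only in their tag compete for the same four ways, which is why addresses 4 Kbytes apart map to the same set.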
Cache Operation • Each cache block (or line) can be in one of three states (MEI protocol) – M = modified (or dirty) • Resides in cache and is different than memory – E = exclusive (resident and clean) • Resides in cache and is identical to memory – I = invalid (not resident) • The “shared” state of the full MESI protocol is not supported – Would allow synchronization of multiply cached blocks • There is no cache coherency for the instruction cache
Cache control • Hardware implementation dependent registers (HIDn) control cache function – Enabling – Invalidate – Locking • Supervisor instructions provide block level control – Allocate, flush, invalidate, store, touch, zero • Ability to store a given block of memory into the cache is controlled by the MMU – Each block or page in the MMU has WIMG bits • (Write-through, Inhibited, Global, Guarded)
MMU • The MMU provides for both memory translation and access control • The system boots in Real (un-translated) mode • To effectively use the caches, the MMU must be used in block or page mode – Effectively, a null translation is performed
Protection • The primary use of the MMU in embedded applications is for cache control and access protection • The WIMG bits are set for each page – W = write-through (applicable only to data cache) – I = inhibited – M = memory coherency supported in hardware – G = guarded (indicates that memory is ill-behaved) • I/O spaces • All accesses are forced to be in order • No speculative reads or pre-fetches
Translation • Block or page translation allows the full use of a virtual memory model • Block translation provides a memory space of 2^32 bytes • Page translation provides a virtual memory space of 2^52 bytes • System must be debugged with RTOS tools – Emulators and hardware debuggers don’t support it
Real mode • The 32-bit logical address is used directly as the 32-bit physical address • WIMG: – W = 0: write-back – I = 0: cache enable – M = 1: data is global – G = 1: memory is guarded
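BAT mode (diagram) • Logical address: 4 + 11 + 17 bits • BAT register n: BEPI (15 bits), BL (11 bits), BRPN, WIMG • Physical address: 4 + 11 + 17 bits, with the 11-bit field formed from the logical address, BL (&), and BRPN (+)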
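Page mode (diagram) • Logical address: 4 + 16 + 12 bits; the upper 4 bits select a segment register • Virtual address: 24 + 16 + 12 bits; the segment register supplies the 24-bit field • The TLB / page table maps the upper 40 bits to a 20-bit physical page number and WIMG • Physical address: 20 + 12 bits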
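The bit widths on the page mode slide can be checked with a small sketch. It is an illustration under stated assumptions: the segment register contents and the physical page number are hypothetical values, and the page table search itself is not modeled.

```python
def virtual_address(logical, segment_values):
    """Form the 52-bit virtual address from a 32-bit logical address.

    `segment_values` is a hypothetical list of sixteen 24-bit values,
    one per segment register.
    """
    segment = logical >> 28                # upper 4 bits pick the segment
    page_index = (logical >> 12) & 0xFFFF  # 16-bit page index
    offset = logical & 0xFFF               # 12-bit byte offset
    upper = segment_values[segment] & 0xFFFFFF
    return (upper << 28) | (page_index << 12) | offset

def physical_address(physical_page, logical):
    """Combine a 20-bit physical page number with the 12-bit byte offset."""
    return ((physical_page & 0xFFFFF) << 12) | (logical & 0xFFF)

segments = [0] * 16
segments[3] = 0xABCDEF                     # hypothetical segment contents
va = virtual_address(0x30001234, segments)
print(hex(va), va.bit_length())
print(hex(physical_address(0x00042, 0x30001234)))
```

The 24 + 16 + 12 split is where the 2^52-byte virtual space comes from, while the physical address stays at 32 bits.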
Reset operation (table) • Reset sources (rows): power-on reset, external hard reset, software watchdog, bus monitor, checkstop, external soft reset • Effects (columns): reset PLL, system configuration sampled, clock module reset, HRESET driven, other internal logic reset, SRESET driven, core reset • Cells are marked “yes” where a source causes an effect
Reset Types • Power-on reset is used to align all logic from a chaotic state after Vcc stabilizes – The PLL then begins to lock • Hard reset is analogous to the normal reset on other processors – The PLL is not affected • Soft reset can be used to initiate a warm start – Not commonly used – Not driven or monitored by the emulator – Basically, a non-returnable exception to the reset vector
Reset Sequence (timing diagram) • POR asserted • HRESET asserted • SRESET asserted • HRESET & SRESET asserted • PLL locks • RSTCONF sampled • Internal logic reset • HRESET & SRESET negated
Memory Map • Startup boot map, before config word: at boot, CS0 is active for one of two large areas of the address space; all other chip selects are invalid • After config word: Flash (CS0) and IMMR • Application target map: IMMR (CSi), I/O, Flash (CSx, y, z), RAM
Memory Map Implications • Since the Flash memory accessed by CS0 occupies one of two large areas in the address space, boot code can be linked to execute in a number of different locations • Any branches will change the NIA from the boot location to the linked location • All other chip selects are off • IMMR RAM is still available • CS0 must be reduced in scope before activating other chip selects • Be careful not to pull the rug out from under the boot code when reducing CS0 • BSP re-entry issues: • Altering chip select option registers while assuming the value in the Valid bit • Can the chip selects to the RAM and Flash be altered while running out of either?
Memory Map Init Issues • Three different factors can enhance (confuse) the boot process: • The MSR[IP] • The reset vector can be 0x0000_0100 or 0xfff0_0100 • Determined by the Reset Configuration Word • Not changed by an SRESET • CS0 scope • CS0 responds to either the upper or lower end of the memory map • It must be changed while it is being used • It may have already been reduced by a previous pass through the BSP • Code link results • Execution can start in code that is linked to a different address than the boot vector • Only the address lines within the memory device are significant • PC relative addressing will solve this, right? WRONG! • The first branch will set the NIA MSBs to the current execution value
RTOS Boot Sequences (diagram) • Flash holds the boot code, BSP, and a compressed application image: boot code decompresses and relocates the application from flash • External application image: boot code loads the application over a communication channel or backplane • RAM holds the uncompressed application image, BSP, data, stack, heap, etc. • Memory map also shows IMMR and I/O • Chip select x: base register (base address, V) and option register (mask, options)
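The interaction between a chip select's base address, mask, and Valid bit can be sketched as a simple address match. This is a simplified model, not the 8260 register encoding: the class, field names, and example values are all hypothetical, chosen only to show why CS0 must shrink before other chip selects are enabled.

```python
class ChipSelect:
    """Simplified chip select: a base address, a mask, and a Valid bit."""

    def __init__(self, base, mask, valid=True):
        self.base = base
        self.mask = mask
        self.valid = valid

    def matches(self, addr):
        """True when the chip select is valid and `addr` falls in its range."""
        return self.valid and (addr & self.mask) == (self.base & self.mask)

# Hypothetical boot-time CS0 covering a large area at the top of memory.
cs0_boot = ChipSelect(base=0xFE000000, mask=0xFE000000)
# Hypothetical reduced CS0 covering only the top 1 Mbyte.
cs0_reduced = ChipSelect(base=0xFFF00000, mask=0xFFF00000)

for addr in (0xFE000100, 0xFFF00100):
    print(hex(addr), cs0_boot.matches(addr), cs0_reduced.matches(addr))
```

In this model, boot code executing at the first address would lose its memory the moment CS0 is reduced, which is the "rug" the Memory Map Implications slide warns about.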
Endian Bus Connections (diagram) • Shows how an 8-bit device (bits 7-0) connects to the byte lanes of 68K, x86, and PPC buses • 68K and x86: MS byte lane is bits 31-24, LS byte lane is bits 7-0 • PPC: MS byte lane is bits 0-7, LS byte lane is bits 24-31
Big Endian Bus (diagram) • Shows 8-, 16-, 32-, and 64-bit devices connected to the 8260 data bus • 8260 byte lanes are bits 0-7 (MS byte lane), 8-15, 16-23, 24-31, 32-39, 40-47, 48-55, and 56-63 (LS byte lane) • Device bits are numbered in the opposite direction (63-56 down to 7-0)
Configuration Word • Configuration word is latched from Flash memory during reset cycle • A 32 bit value is loaded 8 bits at a time from the high order bits of the data bus – Immune to boot memory width • RSTCONF pin allows configuration word to be forced to all zero • Multiple 8260s can access the same memory device
Configuration Word Contents • EARB – External arbitration • EXMC – External memory controller • CDIS – Core disable • EBM – External bus mode • BPS – Boot port size • CIP – Core initial prefix • ISPS – Internal space port size • L2CPC – L2 cache control pins • DPPC – Data parity pin configuration • ISB – Internal space base address • BMS – Boot memory space • BBD – Busy bus disable • MMR – Mask Masters request • LBPC – Local bus pin configuration • APPC – Address parity pin configuration • CS10PC – CS10 pin configuration • MODCK_H – MODCK high order bits
Configuration Word Format • 8 bit wide boot device, address offset from CS0 and contents of the 603 bus MSB byte lane (0-7): 0x00 Byte 0, 0x01 ignored, 0x08 Byte 1, 0x09 ignored, 0x10 Byte 2, 0x11 ignored, 0x18 Byte 3, 0x19 ignored • 32 bit wide boot device, address offset from CS0 and contents of the 603 bus MSB byte lane (0-7): 0x00 Byte 0, 0x04 ignored, 0x08 Byte 1, 0x0C ignored, 0x10 Byte 2, 0x14 ignored, 0x18 Byte 3, 0x1C ignored • On the 32 bit device, the other byte lanes (through 24-31) are ignored
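The layout above implies that the four configuration bytes sit 8 bytes apart in the boot image regardless of device width. A minimal sketch of assembling the word from a flash image follows; the function name is hypothetical, and treating Byte 0 as the most significant byte is an assumption consistent with the big-endian bus, not something the slide states.

```python
# Offsets from the start of CS0 at which Byte 0 .. Byte 3 are sampled.
CONFIG_BYTE_OFFSETS = (0x00, 0x08, 0x10, 0x18)

def read_config_word(flash):
    """Assemble the 32-bit configuration word from a boot flash image.

    Assumption: Byte 0 is the most significant byte of the word.
    """
    word = 0
    for offset in CONFIG_BYTE_OFFSETS:
        word = (word << 8) | flash[offset]
    return word

# Example image: every byte is 0xFF except the four sampled locations.
image = bytearray(b"\xff" * 0x20)
for offset, value in zip(CONFIG_BYTE_OFFSETS, (0x12, 0x34, 0x56, 0x78)):
    image[offset] = value
print(hex(read_config_word(image)))
```

A check like this is useful before reflashing boot code, since overwriting these four bytes changes how the part comes out of reset.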
Configuring a single 8260 (diagram) • The 8260 A bus and D bus connect to the boot Flash • Signals shown: Vcc, RSTCONF
Configuring multiple 8260’s (diagram) • The master 8260 and slaves 1 through 7 share the A bus and D bus with the boot Flash • Address lines A0 through A6 are shown with the slaves’ RSTCONF inputs
SIU • The SIU contains the logic to interface the external system components to the 8260 • Contains all of the glue logic needed for a typical embedded application
SIU Overview • System Interface Unit: – 60x bus interface unit – PowerPC-to-local bridge – Local bus interface unit – Memory controller – Time counter/PIT – Bus arbiter – L2 cache controller – System functions
603e Bus • Very high performance bus – Separate address and data tenures – Pipelined – Bursting – Multi-master – Cache snooping
603e bus cycle (timing diagram) • Address and data tenures • Address only cycle to support cache snoop
Local Bus • Two busses, one address map • Diagram: the address map contains Flash, code/data SDRAM, and CPM buffer SDRAM; memory control places the code/data SDRAM and the CPM buffer SDRAM on the two busses
Memory Control • 12 banks of memory – Each can be configured for any type of device • Glueless support of SDRAM devices • Glueless support of SRAM, EPROM, Flash – Using general purpose chip select machine • Three user programmable machines • All memory controllers can be allocated to either the 603 or local bus
System control • Clock synthesis • Reset control • Interrupt control • Real time clock • Periodic interrupt timer • Bus monitor • Bus arbiter • Watchdog timer
Interrupt Control (diagram) • Software watchdog timer or IRQ0 drives MCP to the 603 core • IRQ[1-7] (fall / level), Port C [0-15] (edge / fall), CPM channels, and on board timers feed the interrupt controller, which drives INT to the 603 core
SIU Interrupt Vectors • All external interrupts cause processing at 0xnnn00500 – There is space for 64 instructions to save processor state and resolve the SIU vector • Vectors are six bits – Shifting with indirect addressing is used to decommutate to service routines – A 16 bit load from the long word address of the SIVEC register will point to a 64 entry array of 1 Kbyte (256 instructions) service routines – An 8 bit load will allow a 64 entry jump table of branch instructions
SIU Interrupt Vector Register • Bits 0-5: six bit interrupt code; bits 6-31: zero • 8 bit read from address 0xnnn10C04 returns bits 0-7 • 16 bit read from address 0xnnn10C04 returns bits 0-15 • 32 bit read from address 0xnnn10C04 returns bits 0-31
SIU Interrupt Vectors, 8 bit read • The value read is the six bit interrupt code followed by two zero bits • Each vector value points to a different branch instruction in a table of branch instructions to ISRs – _00: ba routine_a – _04: ba routine_b – _08: ba routine_c – _0c: ba routine_d – _10: ba routine_e – _14: ba routine_f – _18: ba routine_g
SIU Interrupt Vectors, 16 bit read • The value read is the six bit interrupt code followed by zero bits • Each vector value points to a block of 1 Kbyte / 256 32-bit instructions – nnnn0000 to nnnn03ff – nnnn0400 to nnnn07ff – nnnn0800 to nnnn0bff – nnnn0c00 to nnnn0fff
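The effect of the read width on the SIVEC value can be modeled directly. This is a sketch with hypothetical helper names; it only assumes what the slides show, namely that the six bit code occupies bits 0-5 (the most significant bits) of a 32-bit register.

```python
def sivec_value(code):
    """Model SIVEC: the six bit code in bits 0-5 (MSB-first numbering)."""
    return (code & 0x3F) << 26

def read8(sivec):
    """8 bit read of the register address: returns bits 0-7."""
    return (sivec >> 24) & 0xFF

def read16(sivec):
    """16 bit read of the register address: returns bits 0-15."""
    return (sivec >> 16) & 0xFFFF

for code in range(4):
    sivec = sivec_value(code)
    # read8 steps by 4 bytes (one branch instruction per code);
    # read16 steps by 0x400 bytes (one 1 Kbyte service routine per code).
    print(code, hex(read8(sivec)), hex(read16(sivec)))
```

Either value can be added to a table base without further shifting, which is the point of placing the code in the top bits of the register.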
CPM • Communications processor module • Direct hardware support for all protocol and application interfaces – Ethernet, ATM, HDLC, T1/E1, T3/E3, BiSync, UART, ISDN, PCM highway – Parallel I/O – Full serial and virtual DMA support
IMMR Format • All on-chip peripherals are accessed through a single 128 Kbyte area of memory • Within the first 64K of address space, there are three blocks of dual ported RAM • The second 64K of address space contains the control registers of the on-chip peripherals
IMMR Map • Upper 64K, hardware registers (0x1_0000 to 0x1_ffff) – Control registers (7K): 0x1_0000 to 0x1_1c00 – SI routing RAM (8K): 0x1_2000 to 0x1_4000 • Lower 64K, dual ported RAM (0x0_0000 to 0x0_ffff) – Buffer descriptors / uCode / data (16K): 0x0_0000 to 0x0_4000 – Parameter RAM (4K): 0x0_8000 to 0x0_9000 – FCC data (4K): 0x0_b000 to 0x0_c000
Dual Ported RAM usage • The layout of the Dual Ported RAM is determined by the uCode in the CPM • When the CPM is not in operation, it is nothing more than internal memory – During the boot sequence, stack, global data, and heap can reside in this memory – Initialization code can be written in C++! – A multi-layered boot process can be used • First code resides in flash, uses internal RAM to set up chip selects • Second code resides in another section of flash and uses external RAM to load main application over a CPM channel • Third level is the main application – Each level has its own crt0.s function and initializes the EABI from scratch
CPM Overview • Communications Processor Module: 32-bit RISC and program ROM, internal memory space, interrupt controller, four timers, baud rate generators, parallel I/O, serial DMAs, virtual IDMAs • Channels: MCC1, MCC2, FCC1, FCC2, FCC3, SCC1, SCC2, SCC3, SCC4, SMC1, SMC2, SPI, I2C • Serial interface with time slot assigner
DMA’s • Serial DMA’s – Full bi-directional support of all serial channels – Can access the 603 or local bus • Virtual DMA – 4 channels – Uses the serial DMA hardware to generate transfers – Memory to memory or memory to/from I/O
CPM Buffer Structure (diagram) • Buffer descriptors BD1, BD2, BD3, through BD128 reside in the IMMR and point to buffers in RAM
Buffer Descriptor Format (four 16 bit fields) • Status and Control • Data Length • High Order Pointer • Low Order Pointer
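The four 16 bit fields make each buffer descriptor 8 bytes long. The following sketch packs and unpacks one descriptor as it would appear in big-endian memory; the helper names and the example values are hypothetical, and the high and low order pointer halves are treated as a single 32-bit pointer.

```python
import struct

# Status/control (16 bits), data length (16 bits), buffer pointer (32 bits,
# i.e. the high order and low order pointer halves), all big-endian.
BD_FORMAT = ">HHI"
BD_SIZE = struct.calcsize(BD_FORMAT)

def pack_bd(status, length, pointer):
    """Return the 8 bytes of a buffer descriptor as laid out in memory."""
    return struct.pack(BD_FORMAT, status, length, pointer)

def unpack_bd(raw):
    """Return (status, length, pointer) from 8 bytes of descriptor memory."""
    return struct.unpack(BD_FORMAT, raw)

raw = pack_bd(status=0x8000, length=64, pointer=0x00100000)
print(BD_SIZE, raw.hex(), unpack_bd(raw))
```

Host-side tools that dump dual ported RAM can use the same format string to decode a descriptor ring for inspection.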
From Channel to Buffer • Communication channel hardware • Parameter RAM – Location fixed by: hardware channel – Format fixed by: protocol • Dual ported RAM (buffer descriptors) – Location determined by: parameter RAM value – Format of control and status determined by protocol • Data buffers – Location determined by: value in buffer descriptor, memory controller mapping of local/603 bus – Format determined by: protocol
SCC’s • The SCC’s implement the following protocols: – SDLC/HDLC – AppleTalk – UART – 10-Mbps Ethernet
Ethernet Frame • Preamble: 7 bytes • Start frame: 1 byte • Destination address: 6 bytes • Source address: 6 bytes • Type / length: 2 bytes • Data: 46-1500 bytes • Frame check: 4 bytes • The diagram marks which fields are stored by the CPM in the receive buffer and by the CPU in the transmit buffer
Ethernet Buffer Descriptor • Receive control & status: E, -, W, I, L, F, -, M, -, LG, NO, SH, CR, OV, CL • Transmit control & status: R, PAD, W, I, L, TC, DEF, HB, RC, RL, RC, UN, CSL • Common for transmit and receive: Data Length, High Order Pointer, Low Order Pointer
Status and Control Definitions • Receive control & status: E, -, W, I, L, F, -, M, -, LG, NO, SH, CR, OV, CL • Transmit control & status: R, PAD, W, I, L, TC, DEF, HB, RC, RL, RC, UN, CSL • First in Frame (F): set by the CPM to inform the CPU that this is the start of a new frame • Last in Frame (L): set by the CPM or the CPU to inform the other that this is the last buffer of a frame • Interrupt (I): generate an interrupt after this buffer is used by the CPM • Wrap (W): this is the last BD in this set of BDs • Empty / Ready (E/R): 0 = this buffer is owned by the CPU, 1 = this buffer is owned by the CPM • Transmit CRC (TC): transmit the CRC after this buffer
Transmit Frames • Parameter RAM points to the first BD • BD 1: R=0 W=0 I=0 L=0 TC=1 • BD 2: R=0 W=0 I=1 L=1 TC=1 • BD 3: R=0 W=0 I=0 L=0 TC=1 • BD 4: R=0 W=0 I=1 L=1 TC=1 (BDs 3 and 4 are for the next frame for this channel) • BD 5: R=0 W=1 I=1 L=1 TC=1 (this BD is for a single buffer frame) • After all buffers are filled, “R” is set to “1” in all BDs in this list
Receive Frames • Parameter RAM points to the first BD • BD 1: E=1 W=0 I=0 L=0 F=1 • BD 2: E=1 W=0 I=0 L=0 F=0 • BD 3: E=1 W=0 I=0 L=1 F=0 • BD 4: E=1 W=0 I=0 L=0 F=1 • BD 5: E=1 W=0 I=0 L=0 F=0 • BD 6: E=1 W=0 I=0 L=1 F=0 (BDs 4 through 6 are for the next frame for this channel) • BD 7: E=1 W=1 I=0 L=1 F=1 (this BD is for a single buffer frame) • After all buffers are filled, “E” is set to “1” in all BDs in this list
The [E/R] bits • Transmit [Ready]: initial value 0; buffer is filled with data by the CPU; changed by the CPU to 1; CPM transmits the buffer; changed by the CPM to 0 • Receive [Empty]: initial value 1; buffer is filled with data by the CPM; changed by the CPM to 0; CPU reads the buffer; changed by the CPU to 1 • Polarity can be confusing because the sense is reversed for complementary operations. However, the same level always indicates who [CPU vs. CPM] owns the buffer. • This bit is the same for all protocols on all channels.
The [W] bits • The Wrap bit is always set to indicate the last buffer descriptor for the channel • It does not delineate frames! • The value of the first buffer descriptor is stored in the channel’s parameter RAM – The list of BD’s is bounded by the parameter RAM and the [W] bit • Any BD past a BD with the [W] bit set, that’s not pointed to by parameter RAM is inaccessible by the CPM • This bit is the same for all protocols on all channels.
The [I] Bits • The Interrupt bits generate an interrupt to the CPU when the CPM hands the BD to the CPU – Whenever the CPM flips the [E/R] bit to “ 0” • A redundant phrase, the CPM can only flip that bit to “ 0”, right? • For transmit, it’s common to only receive an interrupt at the end of transmission of the last buffer • For receive, the last buffer is not known, so it’s more common to receive an interrupt for most buffers on non-frame oriented protocols – If a buffer is small enough that it can’t contain an entire frame, then this bit might be cleared • The CPU has to stay ahead of the CPM to know when a wrap occurred – On Ethernet, the end of frame interrupt is more efficient • This bit is the same for all protocols on all channels.
The [L] Bits • The Last bits indicate the end of a frame within the list of buffer descriptors • Set and cleared by the CPU on transmit frames – The CPM only reads this bit for transmit • Set by the CPM on receive frames – Should be cleared by the CPU before the [E] is used to hand the buffer to the CPM • This bit is not the same for all protocols on all channels.
The [F] Bits • The First bit is only present in receive frames • Set by the CPM to tell the CPU that this buffer starts a frame – An underrun, late collision, or aborted frame can cause a new frame in the next buffer without the [L] bit being set in the previous BD • Not needed for transmit – The CPU will control the state of the CPM with the [L] bit – An [L] bit set or an underrun will cause the next buffer to be considered the first buffer of a frame • This bit is not the same for all protocols on all channels.
The [TC] Bits • The Transmit CRC bits work in conjunction with the [L] bit • The [TC] bit is ignored if the [L] bit is cleared • Initializing all [TC] bits to “ 1” is a good precaution • Only custom protocols that don’t use hardware generated CRC’s should have this bit cleared • This bit is not the same for all protocols on all channels.
Subtle points on BD’s • Frames can span buffers • Buffers never span frames – Unless you have all hardware support turned off and are running transparent • Be careful with small receive buffers that have the [I] bit set – You’ll get hammered with interrupts • Turn buffers over to the CPM from last to first – If an interrupt interferes with the handoff, an underrun / overflow can occur • Hands off a BD with the [E/R] bit set – Unless you like working weekends
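The transmit rules from the preceding slides (Wrap on the last BD of the ring, Last and Interrupt on the final buffer of a frame, Transmit CRC everywhere, and handoff from last to first) can be sketched as follows. The bit masks are an assumption derived from the field order shown on the Ethernet buffer descriptor slide, and the function names are hypothetical.

```python
# Status bit masks, assuming the field order on the Ethernet BD slide
# (bit 0 = MSB of the 16 bit status word).
R, W, I, L, TC = 0x8000, 0x2000, 0x1000, 0x0800, 0x0400

def build_tx_frame(buffer_count, is_last_in_ring=False):
    """Return status words for one frame, still owned by the CPU (R = 0)."""
    statuses = [TC] * buffer_count    # TC set everywhere as a precaution
    statuses[-1] |= L | I             # last buffer ends the frame, interrupts
    if is_last_in_ring:
        statuses[-1] |= W             # Wrap marks the end of the BD ring
    return statuses

def hand_to_cpm(statuses):
    """Set Ready from the last BD to the first; return the order used."""
    order = []
    for index in reversed(range(len(statuses))):
        statuses[index] |= R
        order.append(index)
    return order

frame = build_tx_frame(3, is_last_in_ring=True)
print([hex(s) for s in frame], hand_to_cpm(frame))
```

Handing over the first BD last means the CPM cannot start on a frame whose later buffers are not yet ready, even if the handoff loop is interrupted partway through.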
FCCs • The FCC’s support: – 10/100-Mbps Ethernet through an MII – Full 155 Mbps ATM SAR through UTOPIA – 45 Mbps HDLC (DS-3) • Operation is similar to SCCs • Block mode allows buffers to be dynamically moved into dual ported RAM
FCC Buffer Descriptors • Identical in format to the SCC’s buffer descriptors • Except: – Buffer descriptors, as well as buffers are in main memory – Pointers to buffer descriptors in the parameter RAM are 32 bits • Buffer descriptors must still be in consecutive memory locations
SMC’s • The SMC’s perform basic UART as well as transparent mode transmission • Buffer description operation is identical to the SCC’s – The status and control word has different bit fields pertaining to the protocols – Bit fields controlling protocol independent operation are unchanged
Status and Control Definitions [SMC in UART mode] • Receive control & status: E, -, W, I, -, -, CM, ID, -, BR, FR, PR, -, OV, - • Transmit control & status: R, -, W, I, -, -, CM, P, and the remaining bits unused • Idle (ID): close buffer on reception of idles • Continuous mode (CM): [E] bit isn’t cleared on buffer reception • Interrupt (I): generate an interrupt after this buffer is used by the CPM • Wrap (W): this is the last BD in this set of BDs • Empty / Ready (E/R): 0 = this buffer is owned by the CPU, 1 = this buffer is owned by the CPM
MII (diagram) • Signals between the PQII MPC8260 FCCn and the Fast Ethernet PHY: – Transmit Error (Tx_ER) – Transmit Nibble Data (TxD[3:0]) – Transmit Enable (Tx_EN) – Transmit Clock (Tx_clk) – Collision Detect (COL) – Receive Nibble Data (RxD[3:0]) – Receive Error (Rx_ER) – Receive Clock (Rx_clk) – Receive Data Valid (Rx_DV) – Carrier Sense output (CRS) – Management Data Clock (MDC) – Management Data I/O (MDIO)
Utopia Interface (diagram) • MPC8260 signals: A[24-31], D[0-7], BCTL0*, PWE0*/PDQM/PBS0*, ATMCS0*, ATMRST*, DP6/CSE0/IRQ6* • PM5350 signals: A[7-0], D[7-0], CS*, RD*, WR*, RST*, ALE, INT*
Applications • Performance drives the complexity of the 8260 system – Single processor • Single 8260 • Multiple 8260’s with all but one core turned off • Multiple 8260’s with all cores off, using an external MPC750 – Multiple processor • Combinations of 8260’s and 750’s
Single 8260 (diagram) • 60x bus: SDRAM/SRAM/DRAM/Flash • Local bus: SDRAM/SRAM/DRAM, ATM connection tables • Communication channels: PHY; 155 Mbps ATM PHY through UTOPIA
Multiple 8260s (diagram) • Two MPC8260s share the 60x bus with SDRAM/SRAM/DRAM/Flash • Each MPC8260 has its own local bus (SDRAM/SRAM/DRAM, ATM connection tables) and its own communication channels (PHY; 155 Mbps ATM PHY through UTOPIA)
MPC7xx w/ 8260(s) (diagram) • MPC7xx: backside cache, 32-Kbyte I cache, 32-Kbyte D cache • 60x bus shared with the MPC8260: SDRAM/SRAM/DRAM/Flash • MPC8260 local bus: SDRAM/SRAM/DRAM, ATM connection tables • MPC8260 communication channels: PHY; 155 Mbps ATM PHY through UTOPIA
Debug Considerations • What is JTAG • Limitations • Getting out of reset • The 60x Core and Bus • The cache is on • CPM Realities • Exception Routines • Tracing at the Bus Cycle Level
What is JTAG? • JTAG is a SLOW serial connection to the 8260 CPU resources • The serial data is called the scan chain • JTAG provides the ability to modify memory and registers • The scan chain for each processor is different • JTAG was not created for debug…
JTAG connection • JTAG connector allows for full run control of the processor • The emulator can sync with the processor without disrupting its state • Connector signals: TDO, TDI, QREQ*, TCK, TMS, SRESET*, HRESET*, XBR3*, TRST*, 3.3 V, GND
JTAG Limitations • Slow download of code to RAM • JTAG accesses during execution MAY dramatically affect performance • All commands through JTAG must be “scanned in”
Getting out of Reset • Reset Configuration word of vital importance • TRST must not be permanently asserted • When flashing your boot code, be careful to replace or keep the configuration word • What is your Interrupt Prefix? • Switchable pullup on RSTCONF*?
The 60x Core and Bus • A STOP instruction must be scanned in (no breakpoint pin) • Only one hardware code breakpoint available; no hardware data breakpoints • Address and data do not necessarily appear on the bus at the same time • Predictive fetching means what you see on the bus may not be executed
The Caches are On • Bus cycles now appear as bursts • Fetches are determined by the BIU, not related to instruction execution • No cache visibility pins • Instrumentation required for accurate debug • Caution must be exercised when the boot process performs a code relocation – Contents are cached as data during the move – Contents are fetched as instructions after the move – The instruction queue doesn’t snoop the data cache – The load/store unit doesn’t snoop the instruction cache – There is no cache coherency for the instruction cache
CPM Realities • The CPM operates independently of the CPU • The CPM is not debugged yet. Expect the unexpected • Early releases of the silicon didn’t propagate watchdog resets to the external reset pin • “Last Buffer Interrupt” occurs at the beginning of transmission
Exception Routines • Exception routines are difficult to debug • The recoverability of exceptions is an issue • On board hardware breakpoints do not work in the head or tail of an exception handler
Tracing at the Bus Cycle Level • The 8260 comes in a BGA package • Connecting to an emulator • Connecting to an analyzer
Connecting to an Emulator (diagram) • Components shown: connection to emulator, buffer board, target adaptor, pin header, BGA site to pin socket at the original 8260 site, target board
Connecting to an Analyzer (diagram) • Mictor connectors on the target board next to the 8260
Connecting to an Emulator or Analyzer (diagram) • Components shown: connection to emulator or socket to Mictor adaptor, buffer board, target adaptor, pin header, BGA site to pin socket at the original 8260 site, target board
Summary of debug issues • Init MMU before turning on caches • Loads and stores can be re-ordered • The CPM doesn’t use the MMUs or the caches • Don’t single step through moves to or from SPRs • ISRs cannot have breakpoints in the first or last few instructions • Each processor must have its own JTAG connector • JTAG lines must be terminated with 1K or 2K values (depending on the signal) • JTAG connector should be within 2 inches of the processor • Provide for the ability to pull RSTCONF high • When using the 750 as the CPU, provide the ability to access the 8260 configuration word in flash • Don’t place code or program data on the local bus


