- Number of slides: 37

A Flexible VLSI Architecture for Extracting Diversity and Spatial Multiplexing Gains in MIMO Channels
Chia-Hsiang Yang
University of California, Los Angeles
Qualifying Exam, 11/19/2007

Motivation
- Limited spectrum motivates MIMO
  - Growing demand for high-speed wireless connectivity
  - Less availability of unlicensed spectrum bands
- MIMO is used to increase range and rate
- Signal processing implementations are very complex
  - Diversity algorithms (increased range)
  - Spatial multiplexing algorithms (increased rate)
Chia-Hsiang Yang – Qualifying Exam UCLA

Existing Solutions
- Diversity algorithms
  - Repetition scheme
  - Alamouti scheme
  - Space-time coding
- Spatial multiplexing algorithms
  - Bell Labs Layered Space-Time (BLAST) algorithm
  - Singular Value Decomposition (SVD)
  - QR decomposition
These are all point-wise solutions.

Fundamental Diversity-Multiplexing Tradeoff
- Tradeoff curve in diversity-multiplexing space
[Figure: optimal tradeoff curve, tradeoff curve of the repetition and Alamouti schemes, and tradeoff curve of V-BLAST]
L. Zheng and D. Tse, "Diversity and Multiplexing: A Fundamental Tradeoff in Multiple-Antenna Channels," IEEE Trans. Information Theory, vol. 49, no. 5, pp. 1073-1096, May 2003.
Can we span the entire curve (unify point-wise solutions)?
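The optimal curve from the cited Zheng and Tse paper is piecewise linear through the corner points (k, (M−k)(N−k)) for M transmit and N receive antennas. A minimal Python sketch of that curve follows; the function name and argument names are illustrative and do not come from the slides.

```python
import math

def optimal_diversity_gain(r, m, n):
    """Optimal diversity gain d*(r) for m transmit and n receive antennas.

    Piecewise-linear interpolation between the corner points
    (k, (m - k) * (n - k)), k = 0 .. min(m, n).
    """
    r_max = min(m, n)
    if not 0 <= r <= r_max:
        raise ValueError("multiplexing gain must lie in [0, min(m, n)]")
    k = min(int(math.floor(r)), r_max - 1)    # corner to the left of r
    d_left = (m - k) * (n - k)
    d_right = (m - k - 1) * (n - k - 1)
    return d_left + (r - k) * (d_right - d_left)

# A 2x2 channel: full diversity at r = 0, full multiplexing at r = 2.
print([optimal_diversity_gain(r, 2, 2) for r in (0, 1, 2)])
```

The end points of the curve are the pure-diversity and pure-multiplexing operating points that the individual schemes on the previous slide target.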

Unified Sphere Decoder
- Sphere decoding can extract both diversity and spatial multiplexing gains
  - Diversity gain d: error probability decays as 1/SNR^d
  - Spatial multiplexing gain r: channel capacity increases proportionally to r log(SNR)
[Figure: diversity gain (range) versus spatial multiplexing gain (rate), each axis limited by array size; labels: Repetition, Alamouti, Space-time; BLAST, SVD, QR; Sphere decoding]

Outline
- Sphere decoding algorithm
  - Existing implementations
  - Remaining challenges
- Layered design methodology
  - Algorithm-architecture-circuit optimization
- Preliminary results
  - Scalable architecture design
  - FPGA hardware emulation
- Research plan
  - Ongoing work: multi-core architecture
  - Future work: ASIC implementation and verification
  - Summary and timeline

MIMO System Model
- Signal model: received signal
- Approach 1: Maximum-Likelihood (ML) detection
- Approach 2: Decompose H as H = QR
  - Q is unitary, i.e. Q^H Q = I
  - R is an upper triangular matrix
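A sketch of the formulation this slide refers to, assuming the standard MIMO detection model with a square channel matrix H. The symbols y (received vector), s (transmitted symbols), n (noise), Ω (constellation) and s_ZF (zero-forcing solution) are assumed names, chosen to agree with the Q^H y and s_ZF terms used later in the talk.

```latex
\begin{align}
\mathbf{y} &= \mathbf{H}\mathbf{s} + \mathbf{n} \\
\hat{\mathbf{s}}_{\mathrm{ML}} &= \arg\min_{\mathbf{s}\in\Omega^{M}}
   \lVert \mathbf{y} - \mathbf{H}\mathbf{s} \rVert^{2} \\
\hat{\mathbf{s}} &= \arg\min_{\mathbf{s}\in\Omega^{M}}
   \lVert \mathbf{Q}^{H}\mathbf{y} - \mathbf{R}\mathbf{s} \rVert^{2}
   \quad\text{or}\quad
   \hat{\mathbf{s}} = \arg\min_{\mathbf{s}\in\Omega^{M}}
   \lVert \mathbf{R}(\mathbf{s} - \mathbf{s}_{\mathrm{ZF}}) \rVert^{2},
   \qquad \mathbf{s}_{\mathrm{ZF}} = \mathbf{R}^{-1}\mathbf{Q}^{H}\mathbf{y}
\end{align}
```

The second line follows from the first because multiplying by the unitary matrix Q^H leaves the Euclidean norm unchanged.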

Basic Idea of Sphere Decoding
- Exhaustive search: complexity grows exponentially with the number of Tx antennas (M) for constellation size k: O(k^M)
- Sphere decoding: only nodes within the search radius are examined. Discarding a candidate symbol in the tree removes every node on its branch: complexity is O(M^3)
[Figure: search tree with one level per Tx antenna]

Complexity Reduction of Sphere Decoding
- Near-optimal ML performance with significantly reduced computational complexity (# search points)
- The reduction comes mainly from tree pruning
[Figure: algorithm exploration, computational complexity for a 4×4 array with 16-QAM; total # search points = 65536]
Near-ML detection with 0.1% of the computational complexity
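The search-point count on this slide can be checked with a few lines of Python; the helper name is illustrative only.

```python
def exhaustive_search_points(constellation_size, num_tx):
    """Number of candidate vectors an exhaustive ML search has to test."""
    return constellation_size ** num_tx

total = exhaustive_search_points(16, 4)   # 4x4 array, 16-QAM
print(total)                              # 65536
print(round(0.001 * total))               # 0.1% of the points is about 66
```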

Sphere Decoding Algorithm
- Definition: the cumulative metric is the Partial Euclidean Distance (PED)
- Decoding starts from the last row (the root in the tree topology)
  - Traverse the constellation points within the search radius
  - Since the PED increases monotonically, any branch with a PED larger than the minimum PED should be discarded
[Figure: decoding sequence across ant-1, ant-2, ..., ant-M]
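A minimal software sketch of this search, assuming the metric is ||z − R s||² with z standing for Q^H y and R upper triangular. The function and variable names are hypothetical, and this is a plain depth-first reference model, not the hardware architecture described in the talk.

```python
def sphere_decode(R, z, constellation, radius_sq=float("inf")):
    """Depth-first search for the s minimising ||z - R s||^2.

    R is an upper-triangular M x M matrix (list of rows), z is the rotated
    received vector, constellation is the list of allowed symbol values.
    Returns (symbols, ped); symbols is None if nothing lies inside the radius.
    """
    m = len(R)
    best = {"ped": radius_sq, "symbols": None}
    s = [None] * m

    def search(level, ped):
        if level < 0:                       # leaf reached: new best candidate
            best["ped"], best["symbols"] = ped, list(s)
            return
        # Remove the contribution of symbols already decided (rows below).
        b = z[level] - sum(R[level][j] * s[j] for j in range(level + 1, m))
        # Visit candidates closest-first so pruning can stop the loop early.
        ordered = sorted(constellation,
                         key=lambda c: abs(b - R[level][level] * c))
        for cand in ordered:
            new_ped = ped + abs(b - R[level][level] * cand) ** 2
            if new_ped >= best["ped"]:      # PED only grows: prune the rest
                break
            s[level] = cand
            search(level - 1, new_ped)

    search(m - 1, 0.0)                      # start from the last row (root)
    return best["symbols"], best["ped"]

R = [[2, 1], [0, 1]]
print(sphere_decode(R, [1.2, -0.8], [-1, 1])[0])
```

Each time a leaf is reached the best PED shrinks, which tightens the pruning test for every branch visited afterwards.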

State-of-the-Art and Proposed Work
- Prior work: fixed array size, fixed modulation scheme
  - Antenna array size: up to 8×8 (mostly 4×4)
  - Modulation scheme: up to 16-QAM

Reference                            Array size  Modulation  Search method
Knagge, 2006 [1]                     8×8         QPSK        K-best
Barbero, 2006 [2]; Guo, 2006 [3]     4×4         16-QAM      K-best
Burg, 2005 [4]; Garrett, 2004 [5]    4×4         16-QAM      Depth-first

- This work: flexible sphere decoder architecture
  - Antenna array size (2×2 to 16×16)
  - Modulation (BPSK to 64-QAM)
  - Search method (K-best and depth-first)
  - Also, number of sub-carriers (16 to 128)

Expected Research Contributions
- A unified sphere decoder architecture for extracting diversity and spatial multiplexing gains in MIMO wireless channels
- A flexible architecture
  - Antenna array: 2×2 to 16×16
  - Modulations: BPSK to 64-QAM
  - Number of sub-carriers: 16 to 128
  - Search method: K-best or depth-first search
- A simplified multiplier
  - Numerical strength reduction and Gray coding
- A region-partition-based enumeration method
- A multi-core architecture for enhanced performance

Challenge #1: Search Method
- Major types of tree search methods
  - Depth-first: starts the search from the root of the tree and explores as far as possible along each branch until a leaf node is found
  - K-best: keeps only the K branches with the smallest partial Euclidean distance (PED) at each level
- Basic architecture
[Figure: basic architecture]
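For contrast with depth-first search, the K-best rule can be sketched as a breadth-first loop that keeps K survivors per tree level. The sketch below assumes the metric ||z − R s||² with R upper triangular; all names are hypothetical and it is a software reference model only.

```python
def k_best_decode(R, z, constellation, k):
    """Breadth-first K-best search for a small ||z - R s||^2.

    Keeps the k partial paths with the smallest PED at every tree level,
    starting from the last row of the upper-triangular matrix R.
    """
    m = len(R)
    survivors = [(0.0, [])]                 # (ped, symbols for rows below)
    for level in range(m - 1, -1, -1):
        children = []
        for ped, tail in survivors:
            # tail[j] is the symbol already chosen for antenna level + 1 + j
            interference = sum(R[level][level + 1 + j] * sym
                               for j, sym in enumerate(tail))
            b = z[level] - interference
            for cand in constellation:
                inc = abs(b - R[level][level] * cand) ** 2
                children.append((ped + inc, [cand] + tail))
        children.sort(key=lambda item: item[0])
        survivors = children[:k]            # keep only the k best branches
    best_ped, best_symbols = survivors[0]
    return best_symbols, best_ped

print(k_best_decode([[2, 1], [0, 1]], [1.2, -0.8], [-1, 1], k=2)[0])
```

Unlike depth-first search, the work per level is fixed by k, which gives constant throughput at the cost of possibly discarding the ML path when k is small.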

Challenge #2: Antenna Array Size
- Area and delay tradeoffs for two architectures
  - Multiple-stage architecture: area increases quadratically with # transmit antennas
  - Folding architecture: critical path increases linearly with # transmit antennas
[Figure: impact on critical path delay; impact on area]

Challenge #3: Number of Sub-Carriers
- Data-stream interleaving
  - Samples of independent data streams can be introduced in the loop
  - Improves area efficiency through logic sharing
  - Provides flexibility for a varying number of data sub-carriers
[Figure: interleaving by 2 (Noble Identity): Y(t), Y(t-1), ... becomes Y1(t), Y2(t), Y1(t-1), Y2(t-1), ...]
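The interleaving idea can be illustrated on a toy recursive loop. The sketch below is an assumption-laden stand-in, not the decoder datapath: it uses a first-order recursion y[t] = coeff * y[t-1] + x[t] and replaces the single feedback register with N registers so that N independent streams share one arithmetic unit.

```python
from collections import deque

def interleaved_recursion(streams, coeff):
    """Run y[t] = coeff * y[t-1] + x[t] for several streams on one datapath.

    A delay line of len(streams) registers sits in the feedback loop, so
    the value read in each time slot belongs to the same stream as the
    sample being processed.
    """
    n = len(streams)
    delay_line = deque([0] * n, maxlen=n)   # n registers instead of one
    outputs = [[] for _ in range(n)]
    for t in range(len(streams[0])):
        for k in range(n):                  # time slot k serves stream k
            prev = delay_line[0]            # written n slots ago: same stream
            y = coeff * prev + streams[k][t]
            delay_line.append(y)            # oldest value drops out
            outputs[k].append(y)
    return outputs

print(interleaved_recursion([[1, 1, 1], [2, 2, 2]], coeff=1))
```

The two output lists are identical to running each stream through its own copy of the loop, which is why the logic can be shared across sub-carriers.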

Challenge #4: Modulation
- Hardware cost grows quickly with the modulation size
- Schnorr-Euchner (SE) enumeration
  - Traverses the constellation candidates in ascending order of distance increment for each transmit antenna
  - Corresponds to finding the scaled constellation points R_ii Q_i closest to b_i, ordered from the closest to the farthest
[Figure: I/Q plane with b_i and scaled constellation points R_ii Q_1 to R_ii Q_4; exhaustive search]
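A one-dimensional sketch of closest-first enumeration, assuming real-valued PAM levels and a search centre given in the same (scaled or unscaled) plane as the levels. The function name is hypothetical; the two-pointer walk below avoids computing and sorting every distance.

```python
import bisect

def se_enumerate(center, levels):
    """Return the levels ordered by increasing distance from center."""
    levels = sorted(levels)
    hi = bisect.bisect_left(levels, center)   # first level >= center
    lo = hi - 1                               # last level < center
    order = []
    while lo >= 0 or hi < len(levels):
        # Take the upper neighbour if the lower side is exhausted or
        # the upper neighbour is strictly closer.
        take_hi = lo < 0 or (hi < len(levels)
                             and levels[hi] - center < center - levels[lo])
        if take_hi:
            order.append(levels[hi])
            hi += 1
        else:
            order.append(levels[lo])
            lo -= 1
    return order

print(se_enumerate(0.6, [-3, -1, 1, 3]))   # [1, -1, 3, -3]
```

The result zigzags around the centre, which is the characteristic visiting order of SE enumeration.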

Numerical Strength Reduction
- Multiplier size is the key factor for complexity reduction
- Two equivalent representations
- The latter is a better choice from a hardware perspective
  - Rationale: s_ZF and Q^H y can be precomputed
  - Wordlength of s (3-bit for real/imag part) is usually shorter than that of s_ZF
  - Separate terms: multipliers with reduced wordlength / area
- Area and delay reduction due to numerical strength reduction
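A hedged reconstruction of the two representations, written per row i of the upper-triangular matrix R. The notation T_i for the distance increment is assumed here, as is the ordering of the two forms; the forms themselves follow from s_ZF = R^{-1} Q^H y.

```latex
\begin{align}
T_i &= \Bigl\lvert \sum_{j=i}^{M} R_{ij}\,\bigl(s_j - s_{\mathrm{ZF},j}\bigr) \Bigr\rvert^{2} \\
T_i &= \Bigl\lvert \bigl(\mathbf{Q}^{H}\mathbf{y}\bigr)_i - \sum_{j=i}^{M} R_{ij}\, s_j \Bigr\rvert^{2},
\qquad \bigl(\mathbf{Q}^{H}\mathbf{y}\bigr)_i = \sum_{j=i}^{M} R_{ij}\, s_{\mathrm{ZF},j}
\end{align}
```

In the first form each multiplier sees the full-precision difference s_j − s_ZF,j. In the second form it sees only the 3-bit symbol s_j, with the precomputed term (Q^H y)_i handled by a subtraction.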

Simplified Multiplier
- Since s (decoded symbols) is encoded with Gray code, each real multiplier can be implemented by shift, add and invert operations
  - One complex multiplier = 6 adders + inverters and multiplexers
- A total 40% area reduction compared to the traditional approach
[Figure: s[2:0] representation and the corresponding "multiplier"]
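A small sketch of how multiplication by an odd PAM level reduces to shifts and adds. The Gray mapping of s[1:0] to magnitudes (00→7, 01→5, 11→3, 10→1) is read off the symbol remapping slide and should be treated as an assumption, as should the use of s[2] as the sign bit; the function name is hypothetical.

```python
def multiply_by_symbol(x, sign_bit, gray_mag):
    """Multiply integer x by a level in {+-1, +-3, +-5, +-7} without '*'.

    gray_mag is the assumed 2-bit Gray code of the magnitude and
    sign_bit (assumed s[2]) selects the inverted result.
    """
    if gray_mag == 0b10:        # magnitude 1
        mag = x
    elif gray_mag == 0b11:      # magnitude 3 = 2 + 1
        mag = (x << 1) + x
    elif gray_mag == 0b01:      # magnitude 5 = 4 + 1
        mag = (x << 2) + x
    elif gray_mag == 0b00:      # magnitude 7 = 8 - 1
        mag = (x << 3) - x
    else:
        raise ValueError("gray_mag must be a 2-bit value")
    return -mag if sign_bit else mag

print(multiply_by_symbol(5, 0, 0b00))   # 35
```

Every case needs at most one adder after a fixed shift, which is the kind of saving over a general multiplier that the slide describes.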

Metric Calculation
- The Metric Calculation Unit (MCU) computes the metric
- A bi-directional shift register chain is embedded to support back-trace and forward-trace
- Area-efficient storage of R matrix coefficients: off-diagonal terms are organized into a square memory

Metric Enumeration Unit (MEU)
- Enumerates the possible constellation points in ascending order of distance increment
- Exhaustive search is not suitable for large constellation sizes
- A region partition method simplifies the SE enumeration
[Figure: I/Q plane with b_i and scaled points R_ii Q_i, comparing exhaustive search with region partition search; dashed lines mark decision boundaries]

Detailed MEU Circuit
- Instead of deciding in the original constellation plane, decide b_i in a constellation plane scaled by R_ii
- With the decision boundary denoted db, the decision simplifies to a calculation in the scaled plane
- Decoded symbols are remapped to support different modulations
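One way to read the scaled-plane decision is as a division-free slicer: rather than computing b_i / R_ii and comparing with the boundaries db, compare b_i with R_ii · db. The sketch below assumes real-valued odd-integer levels, a positive real R_ii and boundaries halfway between adjacent levels; none of these details or names are confirmed by the slide.

```python
def slice_scaled(b, r_ii, levels=(-7, -5, -3, -1, 1, 3, 5, 7)):
    """Return the level closest to b / r_ii without dividing (r_ii > 0)."""
    decision = levels[0]
    for lower, upper in zip(levels, levels[1:]):
        boundary = (lower + upper) // 2     # unscaled boundary: even integer
        if b > r_ii * boundary:             # compare in the scaled plane
            decision = upper
    return decision

print(slice_scaled(2.6, 2.0))   # 2.6 / 2.0 = 1.3, closest level is 1
```

Because the unscaled boundaries are small even integers, each product r_ii * boundary can be formed with shifts and adds in hardware.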

Symbol Remapping
- As the constellation size is changed, the symbol mapping and the # of decoded bits should be adjusted simultaneously
[Figure: Gray-coded s[1:0] labels (00, 01, 11, 10) for amplitude levels 7, 5, 3, 1 under 64-QAM, 16-QAM, and QPSK]
[Table: bits s[2], s[1], s[0] of the real and imaginary parts used by 64-QAM (6 bit), 16-QAM (4 bit), QPSK (2 bit), and BPSK (1 bit)]

Enumeration for Remaining Candidates
- We use a geometric relationship instead of a sorting algorithm
- The 8 surrounding constellation points are divided into 2 subsets:
  - 1 bit error and 2 bit errors if Gray coding is used
- The 2nd closest point is decided by the decision boundaries
- The remaining points are decided by the search direction
[Figure: constellation neighborhood with decision boundaries (dashed) and search direction]

Scalable PE Architecture

Parameter        Configuration modes
Antenna array    Any square array size between 2×2 and 16×16
Modulation       BPSK, QPSK, 16-QAM, 64-QAM
# sub-carriers   16, 32, 64, 128
Detection        Depth-first, K-best

Hardware Complexity Reduction
- An overall 20× area reduction compared to a 16-bit direct-mapped architecture
  - Signal processing & circuit techniques

Hardware Emulation on BEE2 FPGA Array
- A graphical Matlab-Simulink environment is used for bit-true, cycle-accurate algorithm simulation
- FPGA emulation speeds up simulation (~10^6×)
[Figure: automated FPGA mapping of 1 PE onto the BEE2 (10M gate equivalent)]

Hardware Emulation Results
- Comparable BER performance of 4×4, 8×8, and 16×16, with different throughput given a fixed bandwidth
- Repetition coding by a factor of 2 reduces the throughput by 2×, but improves BER performance
- An 8×8 system with repetition coding by 2 outperforms the ML 4×4 system performance by 5 dB
[Figure: BER vs. Eb/No (dB) for 64-QAM data and for 16-QAM data; curves for 4×4, 8×8, 16×16, 4×4 ML, repetition 2, and repetition 4; 5 dB gap annotated]

Hardware Complexity Comparison
- Key features of the proposed architecture
  - Highest reported area efficiency (normalized area per antenna)
  - Highest reported antenna array size and constellation size
  - Supports multiple sub-carriers and search methods
  - The first hardware design to deploy the diversity-multiplexing tradeoff
[Figure: normalized area per antenna for [1] to [5] and this work; bar values 18.2, 9.2, 6.5, 1.3, 2.5, 1]
[Figure: supported array size (2×2 to 16×16) vs. supported modulation (BPSK to 64-QAM) for [1] (8×8), [2-5] (4×4), and this work]

Multi-core Architecture: Improved Performance
- Sorting: enabled by recording the branch with minimum Euclidean distance
- Radius checking: expands the search area of a single PE
- Candidate enumeration: more candidates
- Number of PEs decided from the BER-area-power tradeoff
[Figure: proposed 16-PE architecture]

Choosing the Optimal Sorting Circuit
- A serial sorting circuit is adopted in the design
  - Area efficient with low routing complexity
  - Leveraging the data-interleaving operation, N − 1 time slots are available for additional sub-carriers
Sorting circuits compared (in column order): Serial, Parallel, SIMD
- Latency: N, n(n+1)/2
- Area: N/2, (n^2+n)N/4, N/2
- Routing complexity: Low, Medium, High
*n = log2 N
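The n(n+1)/2 and (n^2+n)N/4 entries match the stage count and comparator count of a bitonic sorting network with N = 2^n inputs. Assuming that is the parallel sorter meant here (the slide does not name it), the figures can be tabulated with a short helper whose name is illustrative.

```python
def bitonic_network_cost(num_inputs):
    """Return (stages, comparators) of a bitonic sorter for 2**n inputs."""
    if num_inputs < 2 or num_inputs & (num_inputs - 1):
        raise ValueError("num_inputs must be a power of two, at least 2")
    n = num_inputs.bit_length() - 1          # n = log2(N)
    stages = n * (n + 1) // 2                # latency in compare stages
    comparators = stages * num_inputs // 2   # N/2 comparators per stage
    return stages, comparators

# For N = 16: 10 stages and 80 comparators, versus N = 16 steps serially.
print(bitonic_network_cost(16))
```

The comparator count grows much faster than N, which is consistent with the serial circuit being the area-efficient choice on the slide.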

ASIC Design and Verification
- A hierarchical sensitivity-based optimization framework will be used to balance design variables
- Optimization includes wordlength optimization, gate sizing, VDD optimization, and architecture techniques
- The fabricated chip will be tested using an FPGA-based ASIC verification approach
[Figure: client PC connected over Ethernet to the FPGA board, which connects over GPIO to the ASIC board]

Summary and Timeline
- Proposal: flexible sphere decoder architecture for extracting diversity and spatial multiplexing gains
- Completed work
  - Scalable processing element that supports varying levels of data rate and transmission range
  - FPGA hardware emulation for measuring SNR vs. BER characteristics for various operation modes
- Future work
  - Multi-core architecture (Dec. 2007)
  - BEE2 emulation of the multi-core design (Dec. 2007)
  - ASIC implementation (Apr. 2008)
  - ASIC verification (Sep. 2008)

Proposed Timeline
- Publications:
  - C.-H. Yang, D. Markovic, "A Flexible VLSI Architecture for Extracting Diversity and Spatial Multiplexing Gains in MIMO Channels," submitted to IEEE International Conference on Communications (ICC '08)

Sorting Circuit Choices
[Figure: parallel, SIMD (Single Instruction Multiple Data), and serial sorting circuits]