- Number of slides: 96
Kris Gaj, Electrical and Computer Engineering, George Mason University. Towards secure cryptographic transformations efficient in both software and hardware: A case for synergy among math, computing, and engineering. http://ece.gmu.edu/crypto-text.htm
Motivation
Criteria used to evaluate cryptographic transformations: security, hardware efficiency, software efficiency, flexibility
Flexibility • Additional key sizes and block sizes • Ability to function efficiently and securely on a wide variety of platforms and applications: low-end smartcards and wireless devices (small memory requirements); IPSec and ATM (small key setup time in hardware); B-ISDN and satellite communication (high encryption speed)
Advanced Encryption Standard (AES) contest, 1997-2001. June 1998: 15 candidates from USA, Canada, Belgium, France, Germany, Norway, UK, Israel, Korea, Japan, Australia, Costa Rica. Round 1 criteria: security, software efficiency, flexibility. August 1999: 5 final candidates: Mars, RC6, Rijndael, Serpent, Twofish. Round 2 criteria: security, hardware efficiency. October 2000: 1 winner: Rijndael (Belgium).
Europe: NESSIE Project (New European Schemes for Signatures, Integrity, and Encryption), 2000-2002. Japan: CRYPTREC Project, 2000-2002.
NESSIE, CRYPTREC: multiple types of transformations: • Symmetric-key block ciphers • Stream ciphers • Hash functions • MACs • Asymmetric encryption schemes • Asymmetric digital signature schemes • Asymmetric identification schemes. Development of a methodology for fair evaluation and comparison of algorithms belonging to the same class, including software and hardware efficiency
Speed of the final AES candidates in hardware [figure: speed in Mbit/s (0-500) of Serpent, Rijndael, Twofish, RC6, Mars; K. Gaj, P. Chodowiec, AES3, April 2000]
Survey filled in by 167 participants of the Third AES Conference, April 2000 [figure: number of votes for Rijndael, Serpent, Twofish, RC6, Mars]
Results of the NSA group: hardware speed [Mbit/s], NSA ASIC vs. GMU FPGA: Rijndael 606 vs. 414; Serpent 202 vs. 431; Twofish 105 vs. 177; RC6 103 vs. 143; Mars 57 vs. 61
Efficiency in software: NIST-specified platform (200 MHz Pentium Pro, Borland C++) [figure: speed in Mbit/s (0-30) of Rijndael, RC6, Twofish, Mars, Serpent for 128-, 192-, and 256-bit keys]
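NIST Report: security margin vs. complexity [figure: security margin (adequate to high) vs. complexity (simple to complex)]. High security margin: Serpent, MARS, Twofish. Adequate security margin: Rijndael, RC6.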
Security: theoretical attacks better than exhaustive key search (number of rounds in the best attack / total number of rounds): Serpent 9/32, Twofish 6/16, Mars 11/16 (core rounds only, without the 16 mixing rounds), Rijndael 7/10, RC6 15/20
Security: theoretical attacks better than exhaustive key search, as a fraction of all rounds (rounds covered by the attack / remaining rounds): Serpent 28% / 72%, Twofish 38% / 62%, Mars 69% / 31%, Rijndael 70% / 30%, RC6 75% / 25%
Security and hardware speed for hash functions [figure: speed in hardware, Mbit/s (0-700), of SHA-1 and SHA-512, GMU team, May 2002; values shown: 610 and 359 Mbit/s]. Complexity of the best attack: 2^80 for SHA-1 (the same as for Skipjack) and 2^256 for SHA-512 (the same as for AES-256).
What’s more important: software or hardware?
Historical view. Secret-key ciphers: DES (1970s), optimized for hardware; Fast Software Encryption (1990s), ciphers optimized for software, e.g., RC5, Blowfish, RC4; AES (2000), optimized for software and hardware. Hash functions: DES-based hash functions, optimized for hardware; MD4 family, optimized primarily for software.
Software or hardware? Both: security of data during transmission. Software: low cost; flexibility (new cryptographic algorithms, protection against new attacks). Hardware: speed; random key generation; access control to keys; tamper resistance (viruses, internal attacks).
Efficiency indicators
Primary efficiency indicators. Software: speed, memory. Hardware: speed, area, power consumption.
Efficiency parameters [figure: blocks M_i, M_i+1, M_i+2 entering the encryption/decryption unit and C_i, C_i+1, C_i+2 leaving it]. Latency: time to encrypt/decrypt a single block of data. Throughput (speed): number of bits encrypted/decrypted in a unit of time: $\text{Throughput} = \dfrac{\text{Block\_size} \cdot \text{Number\_of\_blocks\_processed\_simultaneously}}{\text{Latency}}$
What’s more important: Speed or area?
Non-feedback cipher modes: ECB, counter
Comparison for non-feedback cipher modes, e.g., counter mode (CTR) [figure: the counter values IV, IV+1, IV+2, ..., IV+N are each encrypted by E, and each result is XORed with the corresponding message block]: $C_i = M_i \oplus E(IV + i)$ for $i = 0, \ldots, N$
Increasing speed by parallel processing [figure: several encryption/decryption units operating in parallel]
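Increasing speed using pipelining [figure: pipelined rounds of two ciphers, cipher 1 with 10 rounds and cipher 2 with 16 rounds; each pipeline stage fits within a target clock period, e.g., 20 ns]: $\text{Speed} = \dfrac{\text{block size}}{\text{target clock period}}$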
Pipelined operation of the encryption unit [figure: clock cycles 1 to 16; a new block B1, B2, B3, ... enters the pipeline in every clock cycle while earlier blocks advance by one stage, so once the pipeline is full, one block is completed per clock cycle]
Encryption in non-feedback modes (ECB, counter) and decryption in all modes [figure: speed in Mbit/s (0-7000) vs. area in CLB slices (0-60,000) for Rijndael, Serpent, RC6, Twofish, Mars; Rijndael reaches 6.4 Gbit/s], assuming a clock frequency of 50 MHz (clock period of 20 ns)
Our results: full mixed pipelining, Virtex FPGA, throughput [Gbit/s]: Serpent 16.8, Twofish 15.2, RC6 13.1, Rijndael 12.2
Our results: full mixed pipelining, area [CLB slices]: Serpent 19,700; Twofish 21,000; RC6 46,900; Rijndael 12,600 plus 80 dedicated memory blocks (RAMs)
NIST Report + GMU Report: hardware efficiency in non-feedback cipher modes (ECB, CTR) [figure: speed (low, medium, high) vs. area (small, medium, large) for Rijndael, Serpent, Twofish, RC6, Mars]
Feedback cipher modes: CBC, CFB, OFB
Feedback cipher modes: CBC [figure: each message block is XORed with the previous ciphertext block (with IV for the first block) before encryption by E]: $C_1 = E(M_1 \oplus IV)$, $C_i = E(M_i \oplus C_{i-1})$ for $i = 2, \ldots, N$
Typical flow diagram of a secret-key block cipher: initial transformation with Round Key[0]; i := 1; cipher round with Round Key[i]; if i < #rounds, then i := i+1 and repeat, so the cipher round is executed #rounds times; final transformation with Round Key[#rounds+1]
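Basic iterative architecture [figure: a multiplexer selects either the input block or the fed-back result, a register stores it, and one round of combinational logic computes the next value]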
Increasing speed in cipher feedback modes: loop unrolling [figure: speed vs. area for the basic architecture and for k = 2, 3, 4, 5 rounds unrolled]
GMU results: encryption in cipher feedback modes (CBC, CFB, OFB), Virtex FPGA [figure: throughput in Mbit/s (0-500) vs. area in CLB slices (0-5000) for Serpent I8, Rijndael, Twofish, Serpent I1, RC6, Mars]
NSA results: encryption in cipher feedback modes (CBC, CFB, OFB), ASIC, 0.5 µm CMOS [figure: throughput in Mbit/s (0-700) vs. area (0-40) for Rijndael, Serpent I1, RC6, Mars, Twofish]
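Decreasing area by resource sharing [figure: before, two copies of function F process D0 and D1 in parallel, producing D0' and D1'; after, a single F is shared through a multiplexer, and D0' and D1' are computed one after the other and stored in registers]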
Resource sharing: speed vs. area [figure: throughput vs. area of the basic architecture and of the resource-sharing architecture]
NIST Report + GMU Report: hardware efficiency in feedback cipher modes (CBC, CFB) [figure: speed (low, medium, high) vs. area (small, medium, large) for Rijndael, Serpent, Twofish, RC6, MARS]
Aren’t software and hardware optimizations equivalent?
Efficiency in software: NIST-specified platform (200 MHz Pentium Pro, Borland C++) [figure repeated: speed in Mbit/s of Rijndael, RC6, Twofish, Mars, Serpent for 128-, 192-, and 256-bit keys]
Our results: basic architecture, speed [figure: throughput in Mbit/s (0-500) of Serpent, Rijndael, Twofish, RC6, Mars]
Basic atomic operations of secret-key ciphers and hash functions
Atomic operations used in the 41 most popular secret-key ciphers (1) (B. Chetwynd, MS Thesis, WPI). Considered ciphers: Blowfish, CAST-128, CAST-256, CRYPTON, CS-Cipher, DEAL, DES, DFC, E2, FEAL, FROG, GOST, Hasty Pudding, ICE, IDEA, Khafre, Khufu, LOKI91, LOKI97, Lucifer, MacGuffin, MAGENTA, MARS, MISTY1, MISTY2, MMB, RC2, RC5, RC6, Rijndael, SAFER K, SAFER+, Serpent, SQUARE, SHARK, Skipjack, TEA, Twofish, WAKE, WiderWake
Major atomic operations used in the 41 most popular secret-key ciphers (2) (B. Chetwynd, MS Thesis, WPI), number of ciphers using each operation: S-box 30; variable rotation 10; modular multiplication 7; GF(2^n) multiplication 7; modular inversion 1
Auxiliary atomic operations used in the 41 most popular secret-key ciphers (3) (B. Chetwynd, MS Thesis, WPI), number of ciphers using each operation: Boolean (XOR, AND, OR, etc.) 40; fixed rotation 25; modular addition & subtraction 20; permutation ?
Major cipher operations (1): S-box (n x m). Hardware: ROM with an n-bit address. Software: lookup table, WORD S[1<
S-box: memory in hardware. 32 x 4 = 128 bits: 32 separate 4 x 4 S-boxes, memory $= 32 \cdot 2^4 \cdot 4$ bits = 2 kbit. 16 x 8 = 128 bits: 16 separate 8 x 8 S-boxes, memory $= 16 \cdot 2^8 \cdot 8$ bits = 32 kbit = 16 · 2 kbit.
S-box: memory in software. 32 x 4 = 128 bits: one 4 x 4 S-box table used for all 32 lookups, memory $= 2^4 \cdot 4$ bits = 64 bits. 16 x 8 = 128 bits: one 8 x 8 S-box table used for all 16 lookups, memory $= 2^8 \cdot 8$ bits = 2 kbit = 32 · 64 bits.
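Major cipher operations (2): variable rotation A <<< B of a 32-bit word. Software: C: C = (A << (B & 31)) | (A >> ((32 - B) & 31)); (masking with 31 avoids an undefined shift by 32 when B = 0); ASM: ROL A, B. Hardware: mux-based shifter with five stages controlled by the bits B[4], B[3], B[2], B[1], B[0] of the rotation amount; each stage selects either its input or its input rotated by a fixed amount (e.g., A<<<0 or A<<<16 in the stage controlled by B[4]).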
Major cipher operations (3): modular multiplication, C = A · B mod 2^n, n = 32, 16. Software: C: uint32_t A, B, C; C = A*B; ASM: MUL. Hardware: n-bit half-multiplier (computes only the n least significant bits of the product).
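Major cipher operations (4): multiplication in the Galois field GF(2^m), e.g., Y = X · C in GF(2^8) with a constant C. Software: C: shifts and Boolean operations (<<, ^, |, &) or the table lookup alog[(log[X] + log[C]) % 255] (for X, C ≠ 0); ASM: ROL, XOR, AND or lookups in tables (ALOG DW 3H, 5H, …; LOG DW 7H, 9H, …). Hardware: a network of XOR gates computing the output bits y0, ..., y7 from the input bits x0, ..., x7.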
Auxiliary cipher operations (1): permutation of n bits. Software: complex sequence of instructions (C: <<, |, &; ASM: ROL, OR, AND). Hardware: only the order of wires changes; inputs x1, ..., xn are routed to outputs y1, ..., yn.
Auxiliary cipher operations (2): fixed rotation A <<< n of a 32-bit word. Software: C: C = (A << n) | (A >> (32 - n)); for a constant 0 < n < 32; ASM: ROL A, n. Hardware: only the order of wires changes; inputs x1, ..., xn are routed to outputs y1, ..., yn.
Auxiliary cipher operations (3): Boolean operations on n-bit words. Software: C: A^B, A&B, A|B; ASM: XOR A, B; AND A, B; OR A, B. Hardware: n independent two-input gates, one per bit position (a0, b0 → y0, ..., an-1, bn-1 → yn-1).
Auxiliary cipher operations (4): addition/subtraction, C = A + B mod 2^n, n = 32, 16. Software: C: uint32_t A, B, C; C = A+B; ASM: ADD. Hardware: n-bit adder/subtractor.
Multiple designs for hardware adders [figure: delay vs. area]: ripple-carry adder (RC), carry-skip adder (CS), carry-lookahead adder (CLA), carry-select adder, parallel-prefix network adders (Kogge-Stone, Brent-Kung)
Basic operations: delay and area in HARDWARE [figure: delay vs. area of modular multiplication, addition (RC), addition (CLA), GF(2^n) multiplication, Boolean, permutation, fixed rotation, modular inverse, variable rotation, S-box 4 x 4, S-box 8 x 8, S-box 9 x 32]
Basic operations: delay and memory in SOFTWARE [figure: delay vs. memory of modular inverse, permutation, GF(2^n) multiplication, variable rotation, fixed rotation, multiplication, addition, Boolean, S-box 4 x 4, S-box 8 x 8, S-box 9 x 32]
Major operations of AES finalists [table: use of S-boxes, multiplication in GF(2^m), variable rotation, and integer multiplication by Serpent, Twofish, Rijndael, RC6, and Mars]
Auxiliary operations of AES finalists [table: use of Boolean operations, fixed rotation, addition/subtraction, and permutation by Serpent, Twofish, Rijndael, RC6, and Mars]
MARS (IBM team): delay and area in HARDWARE [figure: the hardware delay vs. area chart of basic operations, with the operations used by MARS marked]
Serpent (R. Anderson, E. Biham, L. Knudsen): delay and area in HARDWARE [figure: the hardware delay vs. area chart of basic operations, with the operations used by Serpent marked]
Rijndael (V. Rijmen, J. Daemen): delay and area in HARDWARE [figure: the hardware delay vs. area chart of basic operations, with the operations used by Rijndael marked]
MARS (IBM team): delay and memory in SOFTWARE [figure: the software delay vs. memory chart of basic operations, with the operations used by MARS marked]
Operations efficient in both software and hardware: summary [figure: operations placed on a grid of software cost vs. hardware cost, each axis ranging from fast & compact through slow or big to slow & big: permutation, GF(2^n) multiply, S-box, modular inverse, variable rotation, Boolean, fixed rotation, addition, multiplication]
Types of ciphers
AES: types of candidate algorithms. Feistel networks: Twofish, E2, DFC, DEAL, LOKI97, Magenta. Substitution-linear transformation networks: Rijndael, Serpent, SAFER+, CRYPTON. Modified Feistel networks: RC6, MARS, CAST-256. Others: FROG, HPC.
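Feistel network: single round of Twofish [figure: words D[0] and D[1] pass through the F-function, keyed with subkeys K[2r+8] and K[2r+9], and the results are combined with D[2] and D[3] together with the fixed rotations <<< 1 and >>> 1, giving D'[0], ..., D'[3]; marked units are shared between encryption and decryption]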
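Modified Feistel network: single round of MARS [figure: D[0] enters the E-function keyed with k = K[4+2i] and k' = K[5+2i] (i: round number); its outputs out1, out2, out3 are combined with D[1], D[2], D[3], and D[0] is rotated by <<< 13, giving D'[0], ..., D'[3]; marked units are shared between encryption and decryption]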
Substitution-linear transformation network: single round of Serpent [figure: the 128-bit state is XORed with round key K[i] and passed through the S-boxes and the linear transformation; marked units are shared between encryption and decryption]
Substitution-linear transformation network: Serpent in hardware [figure: 128-bit data path with an initial permutation, separate encryption and decryption blocks (round keys K0, ..., K7, K32 for encryption and K32, ..., K7, K0 for decryption), and a final permutation]
Substitution-linear transformation network: Rijndael in hardware [figure: encryption and decryption paths built from inversion in GF(2^8), the affine transformation and the inverse affine transformation, ShiftRow and InvShiftRow, InvMixColumn, and subkey addition; marked units are shared between encryption and decryption]
Number and complexity of rounds
Number vs. complexity of a round [figure: number of rounds (0-50) vs. complexity of a round for Triple DES, Serpent, Mars, RC6, DES, Twofish, Rijndael]
Complexity of the cipher round in hardware (K. Gaj, P. Chodowiec, April 2000) [figure: time in hardware (0-100 ns) of one regular round of Serpent, Rijndael, Twofish, RC6, and Mars, broken down into the components on the critical path: 4 x 4 and 8 x 8 S-boxes, multi-input XORs, 2-to-1 multiplexers, 32-bit additions, 32-bit squaring (RC6), 32-bit multiplication (Mars), and 32-bit rotations (RC6, Mars)]
Security margin: theoretical attacks better than exhaustive key search (number of rounds in the best attack / total number of rounds): Serpent 9/32, Twofish 6/16, Mars 11/16 (core rounds only, without the 16 mixing rounds), Rijndael 7/10, RC6 15/20
Making all rounds identical
Serpent: hardware architecture I8 [figure: a 128-bit register feeds one implementation round of Serpent equal to 8 regular cipher rounds: round 0 (key K0, 32 x S-box 0, linear transformation) through round 7 (key K7, 32 x S-box 7, linear transformation); key K32 is applied to the output]
Serpent: hardware architecture I1 [figure: a 128-bit register feeds one regular Serpent round: XOR with key Ki, 32 copies of each of S-box 0 to S-box 7, an 8-to-1 128-bit multiplexer selecting the S-box layer for the current round, and the linear transformation; key K32 is applied to the output]
GMU results: encryption in cipher feedback modes (CBC, CFB, OFB), Virtex FPGA [figure: throughput in Mbit/s (0-500) vs. area in CLB slices (0-5000) for Serpent with all S-boxes identical, Rijndael, Serpent I8, Twofish, Serpent I1, RC6, Mars]
Parallelism
Parallelism in SHA-1 [figure: two consecutive SHA-1 steps on the 32-bit words A, B, C, D, E: ROTL5(A), f_t(B, C, D), E, K_t, and W_t are added modulo 2^32, and B is rotated by ROTL30; operations from the two different steps that can be performed in parallel are marked]
Executing SHA-1 on a 7-way superscalar processor (A. Bosselaers, R. Govaerts, J. Vandewalle, 1997) [figure: schedule in which the rotations ROL 1, ROL 5, and ROL 30 of steps n to n+4 overlap in time]
Number of operations that can be executed in parallel for various hash functions (A. Bosselaers, R. Govaerts, J. Vandewalle, 1997) [figure: values on a 0-8 scale for SHA-1, RIPEMD-128, RIPEMD-160, MD5, MD4]
Optimization tricks
Rijndael round: table-lookup implementation [figure: state bytes a_{i,j} are looked up in tables T0, ..., T3 and combined with the round-key column k_j to give the output column b_j, e.g., b_2]: $b_j = T_0[a_{0,j}] \oplus T_1[a_{1,j+1}] \oplus T_2[a_{2,j+2}] \oplus T_3[a_{3,j+3}] \oplus k_j$ (column indices mod 4). Speed-up in software: ~100 times; speed-up in hardware: ~20%.
Serpent: bit-slice implementation [figure: the 128-bit block is split into 32 four-bit S-box inputs ($x_1^{(k)}, x_2^{(k)}, x_3^{(k)}, x_4^{(k)}$), k = 0, ..., 31; bit-slicing stores each input position i in one 32-bit word ($x_i^{(31)} \ldots x_i^{(0)}$), so an S-box output such as $y_1^{(k)} = f(x_1^{(k)}, x_2^{(k)}, x_3^{(k)}, x_4^{(k)}) = x_1^{(k)} x_2^{(k)} \oplus (x_3^{(k)} \lor x_4^{(k)})$ is computed for all 32 S-boxes at once with one AND ($u_1 = x_1 \land x_2$), one OR ($v_1 = x_3 \lor x_4$), and one XOR ($y_1 = u_1 \oplus v_1$) on 32-bit words]
The proposed approach
Cipher design methodology (1): 1. Choose one or at most two major operations efficient in both software and hardware; best choice: S-box 4 x 4, GF(2^n) multiplication. 2. Choose one or at most two auxiliary operations efficient in both software and hardware; best choice: Boolean, fixed rotation. 3. Choose a cipher type that enables maximum sharing between encryption and decryption; best choice: Feistel network, modified Feistel network.
Cipher design methodology (2): 4. Design a round taking into account the trade-off between • round complexity • the number of rounds necessary to guarantee a sufficient security margin. 5. Make all rounds identical, if possible; negative examples: Serpent, Mars. 6. Look for parallelism within a round and among consecutive rounds; positive example: SHA-1. 7. Look for optimization tricks; positive examples: table lookup in Rijndael, bit-slice implementation in Serpent.
[figure: security, flexibility, software efficiency, and hardware efficiency as the shared concerns of mathematicians, computer scientists, and computer engineers]
$A100 challenges. For mathematicians: prove or disprove that Serpent with • all S-boxes identical • 16 rounds is at least as secure as Rijndael. For computer scientists: is there a way of using instruction-level parallelism to speed up a software implementation of [modified] Serpent to make it as fast as Rijndael?
$A50 challenges. For mathematicians: is there a way of changing Serpent into a modified Feistel network cipher without losing its security properties? For computer scientists: what is the level of parallelism present in SHA-256, SHA-384, and SHA-512?