• Number of slides: 96

Kris Gaj, Electrical and Computer Engineering, George Mason University
Towards secure cryptographic transformations efficient in both software and hardware: A case for synergy among math, computing, and engineering
http://ece.gmu.edu/crypto-text.htm

Motivation

Criteria used to evaluate cryptographic transformations: Security, Hardware efficiency, Software efficiency, Flexibility

Flexibility
• Additional key sizes and block sizes
• Ability to function efficiently and securely on a wide variety of platforms and applications:
  • low-end smartcards, wireless: small memory requirements
  • IPSec, ATM: small key setup time in hardware
  • B-ISDN, satellite communication: high encryption speed

Advanced Encryption Standard (AES) Contest, 1997-2001
• June 1998: 15 candidates from USA, Canada, Belgium, France, Germany, Norway, UK, Israel, Korea, Japan, Australia, Costa Rica
• Round 1: Security, Software efficiency, Flexibility
• August 1999: 5 final candidates: Mars, RC6, Rijndael, Serpent, Twofish
• Round 2: Security, Hardware efficiency
• October 2000: 1 winner: Rijndael (Belgium)

Europe: NESSIE Project (New European Schemes for Signatures, Integrity, and Encryption), 2000-2002
Japan: CRYPTREC Project, 2000-2002

NESSIE, CRYPTREC
Multiple types of transformations:
• Symmetric-key block ciphers
• Stream ciphers
• Hash functions
• MACs
• Asymmetric encryption schemes
• Asymmetric digital signature schemes
• Asymmetric identification schemes
Development of a methodology for a fair evaluation and comparison of algorithms belonging to the same class, including software and hardware efficiency

Speed of the final AES candidates in hardware (K. Gaj, P. Chodowiec, AES3, April 2000)
[Figure: speed in Mbit/s (0-500) for Serpent, Rijndael, Twofish, RC6, Mars]

Survey filled in by 167 participants of the Third AES Conference, April 2000
[Figure: number of votes (0-100) for Rijndael, Serpent, Twofish, RC6, Mars]

Results of the NSA group: hardware speed [Mbit/s], NSA ASIC vs. GMU FPGA
• Rijndael: NSA ASIC 606, GMU FPGA 414
• Serpent: NSA ASIC 202, GMU FPGA 431
• Twofish: NSA ASIC 105, GMU FPGA 177
• RC6: NSA ASIC 103, GMU FPGA 143
• Mars: NSA ASIC 57, GMU FPGA 61

Efficiency in software: NIST-specified platform (200 MHz Pentium Pro, Borland C++)
[Figure: speed in Mbit/s (0-30) with 128-bit, 192-bit, and 256-bit keys for Rijndael, RC6, Twofish, Mars, Serpent]

NIST Report: security margin vs. complexity
• High security margin: Serpent, MARS, Twofish
• Adequate security margin: Rijndael, RC6
[Figure: complexity axis ranging from simple to complex]

Security: theoretical attacks better than exhaustive key search (number of rounds in the attack / total number of rounds)
• Serpent: 9/32
• Twofish: 6/16
• Mars: 11/16 (without the 16 mixing rounds)
• Rijndael: 7/10
• RC6: 15/20

Security: theoretical attacks better than exhaustive key search, as a percentage of the total number of rounds (rounds covered by the attack / rounds not covered)
• Serpent: 28% / 72%
• Twofish: 38% / 62%
• Mars: 69% / 31%
• Rijndael: 70% / 30%
• RC6: 75% / 25%

Security and hardware speed for hash functions (GMU team, May 2002)
[Figure: speed in hardware in Mbit/s (0-700) for SHA-1 and SHA-512, bars labeled 610 and 359]
Complexity of the best attack: SHA-1 $2^{80}$, the same as exhaustive key search for Skipjack; SHA-512 $2^{256}$, the same as for AES-256

What’s more important: software or hardware?

Historical view (timeline 1970-2000)
Secret-key ciphers:
• DES: optimized for hardware
• Fast Software Encryption: ciphers optimized for software, e.g., RC5, Blowfish, RC4
• AES: optimized for both software and hardware
Hash functions:
• DES-based hash functions: optimized for hardware
• MD4 family: optimized primarily for software

Software or hardware?
• Both: security of data during transmission
• Software: low cost; flexibility (new cryptoalgorithms, protection against new attacks)
• Hardware: speed; random key generation; access control to keys; tamper resistance (viruses, internal attacks)

Efficiency indicators

Primary efficiency indicators
• Hardware: speed, area, power consumption
• Software: speed, memory

Efficiency parameters
• Latency: time to encrypt/decrypt a single block of data (input block $M_i$ to output block $C_i$)
• Throughput (= speed): number of bits encrypted/decrypted in a unit of time, for a stream of blocks $M_i, M_{i+1}, M_{i+2}, \ldots$ producing $C_i, C_{i+1}, C_{i+2}, \ldots$

$$\text{Throughput} = \frac{\text{Block\_size} \cdot \text{Number\_of\_blocks\_processed\_simultaneously}}{\text{Latency}}$$
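The throughput formula can be checked numerically with a small C helper. This is a sketch only: the function name and the choice of units (bits, nanoseconds, Mbit/s) are assumptions made for illustration.

```c
/* Throughput = Block_size * Number_of_blocks_processed_simultaneously / Latency.
   Hypothetical helper: block_bits in bits, latency_ns in nanoseconds, result in Mbit/s. */
double throughput_mbps(unsigned block_bits, unsigned blocks_in_flight, double latency_ns)
{
    /* bits per nanosecond times 1000 gives bits per microsecond, i.e. Mbit/s */
    return (double)block_bits * blocks_in_flight * 1000.0 / latency_ns;
}
```

For example, a hypothetical unit that holds 10 blocks of 128 bits with a latency of 200 ns gives throughput_mbps(128, 10, 200.0) = 6400 Mbit/s, while the same latency with a single block in flight gives 640 Mbit/s.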

What’s more important: speed or area?

Non-feedback cipher modes: ECB, counter

Comparison for non-feedback cipher modes, e.g., Counter Mode (CTR)
Each message block is XORed with the encryption of a counter value IV, IV+1, ..., IV+N, so the encryptions E do not depend on each other:

$$C_i = M_i \oplus E(IV + i), \quad i = 0, \ldots, N$$
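A minimal C sketch of counter mode shows why it parallelizes: each output block depends only on its own index. The 32-bit toy cipher toy_encrypt is hypothetical and NOT secure; it only stands in for E so the example runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy 32-bit "block cipher", only to make the example runnable; NOT secure. */
static uint32_t toy_encrypt(uint32_t key, uint32_t x) {
    for (int r = 0; r < 4; r++) {
        x ^= key;
        x = (x << 7) | (x >> 25);
        x += 0x9E3779B9u;
    }
    return x;
}

/* Counter mode: C_i = M_i XOR E(IV + i). No iteration uses the result of another,
   so the loop body could be spread over parallel or pipelined encryption units.
   The same function also decrypts, because XOR with the same keystream undoes itself. */
void ctr_crypt(uint32_t key, uint32_t iv, const uint32_t *in, uint32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] ^ toy_encrypt(key, iv + (uint32_t)i);
}
```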

Increasing speed by parallel processing
[Figure: two encryption/decryption units operating side by side]

Increasing speed using pipelining
[Figure: Cipher 1 with rounds 1-10 and Cipher 2 with rounds 1-16, each round a pipeline stage; target clock period, e.g., 20 ns]

$$\text{Speed} = \frac{\text{block size}}{\text{target clock period}}$$
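A worked instance of the speed formula, assuming a 128-bit block, one pipeline stage per round, and the 20 ns target clock period (50 MHz). For a 10-round cipher such as Rijndael, ten blocks are in flight and the latency is 10 stages of 20 ns each, so the general throughput formula gives the same result:

$$\text{Speed} = \frac{128\ \text{bit}}{20\ \text{ns}} = 6.4\ \text{bit/ns} = 6.4\ \text{Gbit/s}, \qquad \text{Throughput} = \frac{128\ \text{bit} \cdot 10}{10 \cdot 20\ \text{ns}} = 6.4\ \text{Gbit/s}$$

This agrees with the 6.4 Gbit/s quoted for Rijndael at a 50 MHz clock in the non-feedback-mode comparison.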

Pipelined operation of the encryption unit
[Figure: timing diagram over clock cycles 1-16; in each cycle a new block (B1, B2, B3, ...) enters the first pipeline stage while the blocks already inside advance by one stage]
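The timing diagram can be reproduced with two small C functions. This is a hypothetical model, assuming a k-stage pipeline that accepts one new block every clock cycle and numbering cycles, stages, and blocks from 1.

```c
/* Block B_b occupying stage s in clock cycle t: block b enters stage 1 in cycle b,
   so b = t - s + 1. Returns 0 if the stage is still empty (pipeline filling). */
int block_in_stage(int s, int t) {
    int b = t - s + 1;
    return b >= 1 ? b : 0;
}

/* Number of blocks that have left the last of k stages by the end of clock cycle t */
int blocks_completed(int k, int t) {
    int n = t - k + 1;
    return n > 0 ? n : 0;
}
```

After the first k cycles fill the pipeline, exactly one block completes per cycle, which is why the speed equals the block size divided by the clock period.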

Encryption in non-feedback modes (ECB, counter) and decryption in all modes
[Figure: throughput in Mbit/s (0-7000) vs. area in CLB slices (0-60,000) for Rijndael, Serpent, RC6, Twofish, Mars; Rijndael reaches 6.4 Gbit/s, assuming a clock frequency of 50 MHz (20 ns clock period)]

Our results: full mixed pipelining, throughput on Virtex FPGA [Gbit/s]
• Serpent: 16.8
• Twofish: 15.2
• RC6: 13.1
• Rijndael: 12.2

Our results: full mixed pipelining, area [CLB slices]
• Serpent: 19,700
• Twofish: 21,000
• RC6: 46,900
• Rijndael: 12,600, plus 80 dedicated memory blocks (RAMs)

NIST Report + GMU Report: hardware efficiency in non-feedback cipher modes (ECB, CTR)
[Figure: speed (low, medium, high) vs. area (small, medium, large) for Rijndael, Serpent, Twofish, RC6, Mars]

Feedback cipher modes: CBC, CFB, OFB

Feedback cipher modes: CBC
Each block is XORed with the previous ciphertext block before encryption, so block i cannot be encrypted until block i-1 is finished:

$$C_1 = E(M_1 \oplus IV), \qquad C_i = E(M_i \oplus C_{i-1}) \quad \text{for } i = 2, \ldots, N$$
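A minimal C sketch of CBC makes the serial dependency concrete (indices are 0-based here, unlike the 1-based formula). The toy 32-bit cipher and its inverse are hypothetical stand-ins for E and D and are NOT secure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy 32-bit block cipher and its inverse; NOT secure. */
static uint32_t toy_encrypt(uint32_t key, uint32_t x) {
    for (int r = 0; r < 4; r++) {
        x ^= key;
        x = (x << 7) | (x >> 25);
        x += 0x9E3779B9u;
    }
    return x;
}

static uint32_t toy_decrypt(uint32_t key, uint32_t x) {
    for (int r = 0; r < 4; r++) {   /* undo the identical rounds step by step */
        x -= 0x9E3779B9u;
        x = (x >> 7) | (x << 25);
        x ^= key;
    }
    return x;
}

/* CBC encryption: c[0] = E(m[0] ^ iv), c[i] = E(m[i] ^ c[i-1]).
   Each iteration needs the previous ciphertext, so the loop is inherently sequential. */
void cbc_encrypt(uint32_t key, uint32_t iv, const uint32_t *m, uint32_t *c, size_t n) {
    uint32_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        c[i] = toy_encrypt(key, m[i] ^ prev);
        prev = c[i];
    }
}

/* CBC decryption: m[i] = D(c[i]) ^ c[i-1]. All ciphertext blocks are known in advance,
   so the iterations are independent and could run in parallel or in a pipeline. */
void cbc_decrypt(uint32_t key, uint32_t iv, const uint32_t *c, uint32_t *m, size_t n) {
    for (size_t i = 0; i < n; i++)
        m[i] = toy_decrypt(key, c[i]) ^ (i ? c[i - 1] : iv);
}
```

The asymmetry between the two loops is why decryption can use the fast pipelined architectures in all modes, while CBC encryption cannot.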

Typical flow diagram of a secret-key block cipher
1. Initial transformation with Round Key[0]; i := 1
2. Cipher round with Round Key[i]
3. If i < #rounds: i := i + 1 and repeat step 2 (the cipher round is executed #rounds times)
4. Final transformation with Round Key[#rounds+1]
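The flow diagram maps directly onto a loop. In the sketch below the round count, the 32-bit block, and the round function are hypothetical illustrations, not a real cipher; in the basic iterative hardware architecture, each loop iteration would correspond to one clock cycle through the round logic.

```c
#include <stdint.h>

#define NROUNDS 8   /* hypothetical number of rounds */

/* Hypothetical round: key addition, fixed rotation, constant addition. NOT a real cipher. */
static uint32_t round_fn(uint32_t x, uint32_t rk) {
    x ^= rk;
    x = (x << 5) | (x >> 27);
    return x + 0x7F4A7C15u;
}

static uint32_t round_inv(uint32_t x, uint32_t rk) {
    x -= 0x7F4A7C15u;
    x = (x >> 5) | (x << 27);
    return x ^ rk;
}

/* rk[0]: initial transformation, rk[1..NROUNDS]: cipher rounds, rk[NROUNDS+1]: final transformation */
uint32_t encrypt_block(uint32_t x, const uint32_t rk[NROUNDS + 2]) {
    x ^= rk[0];                            /* initial transformation */
    for (int i = 1; i <= NROUNDS; i++)     /* the same round, #rounds times */
        x = round_fn(x, rk[i]);
    return x ^ rk[NROUNDS + 1];            /* final transformation */
}

uint32_t decrypt_block(uint32_t x, const uint32_t rk[NROUNDS + 2]) {
    x ^= rk[NROUNDS + 1];
    for (int i = NROUNDS; i >= 1; i--)     /* round keys in reverse order */
        x = round_inv(x, rk[i]);
    return x ^ rk[0];
}
```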

Basic iterative architecture: a multiplexer, a register, and one round of combinational logic (one round computed per clock cycle)

Increasing speed in cipher feedback modes: loop unrolling
[Figure: speed vs. area for the basic architecture and for loop-unrolled architectures with k = 2, 3, 4, 5 rounds per clock cycle]
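The effect of loop unrolling can be explored with a first-order timing model. The model and all delay values below are hypothetical assumptions, not measured results: k rounds are unrolled into one clock cycle, so a block needs ceil(rounds / k) cycles, and the clock period is k round delays plus a fixed register overhead.

```c
/* Hypothetical first-order timing model of loop unrolling (illustration only). */
double unrolled_throughput_mbps(unsigned block_bits, unsigned rounds, unsigned k,
                                double round_delay_ns, double reg_overhead_ns)
{
    unsigned cycles = (rounds + k - 1) / k;                   /* ceil(rounds / k) */
    double period_ns = k * round_delay_ns + reg_overhead_ns;  /* longer critical path */
    return block_bits * 1000.0 / (cycles * period_ns);        /* Mbit/s */
}
```

With made-up values of 10 rounds, 10 ns per round, and 2 ns register overhead, k = 2 and k = 5 beat the basic architecture (k = 1) because fewer register delays are paid per block, while k = 3 is slower because 10 is not a multiple of 3 and part of the last cycle is wasted. The area grows roughly in proportion to k.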

GMU results: encryption in cipher feedback modes (CBC, CFB, OFB), Virtex FPGA
[Figure: throughput in Mbit/s (0-500) vs. area in CLB slices (0-5000) for Serpent I8, Rijndael, Twofish, Serpent I1, RC6, Mars]

NSA results: encryption in cipher feedback modes (CBC, CFB, OFB), ASIC, 0.5 µm CMOS
[Figure: throughput in Mbit/s (0-700) vs. area (axis 0-40) for Rijndael, Serpent I1, RC6, Mars, Twofish]

Decreasing area by resource sharing
• Before: two copies of unit F process D0 and D1 in parallel, producing D0' and D1'
• After: a single copy of F, preceded by a multiplexer that selects D0 or D1, produces D0' and D1' in turn into their registers

Resource sharing: speed vs. area
[Figure: throughput vs. area for the basic architecture and the resource-sharing architecture]

NIST Report + GMU Report: hardware efficiency in feedback cipher modes (CBC, CFB)
[Figure: speed (low, medium, high) vs. area (small, medium, large) for Rijndael, Serpent, Twofish, RC6, MARS]

Aren’t software and hardware optimizations equivalent?

Efficiency in software: NIST-specified platform (200 MHz Pentium Pro, Borland C++)
[Figure: speed in Mbit/s (0-30) with 128-bit, 192-bit, and 256-bit keys for Rijndael, RC6, Twofish, Mars, Serpent]

Our results: basic architecture, speed
[Figure: throughput in Mbit/s (0-500) for Serpent, Rijndael, Twofish, RC6, Mars]

Basic atomic operations of secret-key ciphers and hash functions

Atomic operations used in the 41 most popular secret-key ciphers (1) (B. Chetwynd, MS Thesis, WPI)
Considered ciphers: Blowfish, CAST-128, CAST-256, CRYPTON, CS-Cipher, DEAL, DES, DFC, E2, FEAL, FROG, GOST, Hasty Pudding, ICE, IDEA, Khafre, Khufu, LOKI91, LOKI97, Lucifer, MacGuffin, MAGENTA, MARS, MISTY1, MISTY2, MMB, RC2, RC5, RC6, Rijndael, SAFER K, SAFER+, Serpent, SQUARE, SHARK, Skipjack, TEA, Twofish, WAKE, WiderWake

Major atomic operations used in the 41 most popular secret-key ciphers (2) (B. Chetwynd, MS Thesis, WPI)
Number of ciphers using each operation:
• S-box: 30
• Variable rotation: 10
• Modular multiplication: 7
• $GF(2^n)$ multiplication: 7
• Modular inversion: 1

Auxiliary atomic operations used in the 41 most popular secret-key ciphers (3) (B. Chetwynd, MS Thesis, WPI)
Number of ciphers using each operation:
• Boolean (XOR, AND, OR, etc.): 40
• Fixed rotation: 25
• Modular addition & subtraction: 20
• Permutation: ?

Major cipher operations (1): S-box
• n x m S-box: n-bit input, m-bit output
• Software: table lookup in an array S of $2^n$ words of m bits
• Hardware: ROM with an n-bit address and an m-bit output
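In software an S-box is simply an array indexed by the input value. The 4 x 4 table below is a hypothetical example (an arbitrary permutation of 0..15 chosen only for illustration), applied to all eight nibbles of a 32-bit word.

```c
#include <stdint.h>

/* Hypothetical 4 x 4 S-box: an arbitrary permutation of 0..15, illustration only */
static const uint8_t S4[16] = {
    0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
    0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7
};

/* C = S[A] for one 4-bit input */
uint8_t sbox4(uint8_t a) { return S4[a & 0xF]; }

/* The same 16-entry table serves all eight 4-bit nibbles of a 32-bit word */
uint32_t sbox4_word(uint32_t x) {
    uint32_t y = 0;
    for (int i = 0; i < 8; i++)
        y |= (uint32_t)S4[(x >> (4 * i)) & 0xF] << (4 * i);
    return y;
}
```

In hardware, each of those eight lookups would instead be a separate small ROM, all operating at the same time.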

S-box: memory in hardware
• 32 parallel 4 x 4 S-boxes (32 x 4 = 128 bits of data): Memory = $32 \cdot 2^4 \cdot 4\ \text{bits} = 2\ \text{kbit}$
• 16 parallel 8 x 8 S-boxes (16 x 8 = 128 bits of data): Memory = $16 \cdot 2^8 \cdot 8\ \text{bits} = 32\ \text{kbit} = 16 \cdot 2\ \text{kbit}$

S-box: memory in software (a single table is stored once and reused for every lookup)
• 4 x 4 S-box applied to 32 x 4 = 128 bits of data: Memory = $2^4 \cdot 4\ \text{bits} = 64\ \text{bits}$
• 8 x 8 S-box applied to 16 x 8 = 128 bits of data: Memory = $2^8 \cdot 8\ \text{bits} = 2\ \text{kbit} = 32 \cdot 64\ \text{bits}$

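Major cipher operations (2): variable rotation, C = A <<< B (32-bit)
• Software (C): C = (A << B) | (A >> (32 - B)); for B = 0 this shifts by 32, which is undefined in C, so a portable form masks the amounts: C = (A << (B & 31)) | (A >> ((32 - B) & 31));
• Software (ASM): ROL A, B
• Hardware: mux-based shifter with five stages controlled by B[4], B[3], B[2], B[1], B[0], each stage selecting either its input or its input rotated by 16, 8, 4, 2, 1 positions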
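Both forms of variable rotation can be written in C: the one-line software version, and a model of the mux-based hardware shifter in which bit B[i] of the rotation amount controls one 2-to-1 multiplexer stage. The function names are illustrative.

```c
#include <stdint.h>

/* Software variable rotation, defined for every b (masking avoids a shift by 32) */
uint32_t rotl_var(uint32_t a, unsigned b) {
    b &= 31;
    return (a << b) | (a >> ((32 - b) & 31));
}

/* Model of the mux-based hardware shifter: stage i rotates by 2^i when B[i] = 1 */
uint32_t rotl_mux(uint32_t a, unsigned b) {
    for (int i = 4; i >= 0; i--) {
        unsigned s = 1u << i;                            /* 16, 8, 4, 2, 1 */
        uint32_t rotated = (a << s) | (a >> (32 - s));   /* s in 1..16: both shifts defined */
        a = ((b >> i) & 1u) ? rotated : a;               /* 2-to-1 multiplexer controlled by B[i] */
    }
    return a;
}
```

The hardware version costs five multiplexer levels of 32 bits each, whereas a fixed rotation costs nothing but wiring.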

Major cipher operations (3): modular multiplication, C = A · B mod $2^n$, n = 32, 16
• Software (C): unsigned long A, B, C; C = A*B; (this reduces mod $2^{32}$ only where unsigned long is 32 bits wide; uint32_t from <stdint.h> guarantees that width)
• Software (ASM): MUL
• Hardware: half multiplier (only the lower n bits of the product are computed)
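With fixed-width unsigned types, C reduces products mod $2^n$ automatically, keeping exactly the low half that a hardware half multiplier would produce. As one example of multiplication inside a cipher, RC6 uses f(x) = x(2x + 1) mod $2^{32}$; the function names here are illustrative.

```c
#include <stdint.h>

/* C = A * B mod 2^32: uint32_t arithmetic keeps only the low 32 bits of the product */
uint32_t mul_mod2_32(uint32_t a, uint32_t b) { return a * b; }

/* C = A * B mod 2^16: widen first so the intermediate product cannot overflow int */
uint16_t mul_mod2_16(uint16_t a, uint16_t b) { return (uint16_t)((uint32_t)a * b); }

/* RC6's quadratic function f(x) = x(2x + 1) mod 2^32 */
uint32_t rc6_f(uint32_t x) { return x * (2u * x + 1u); }
```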

Major cipher operations (4): multiplication in the Galois field $GF(2^m)$, e.g., Y = C · X in $GF(2^8)$ with a constant C
• Software (C): shifts and logic operations (<<, ^, |, &), or log/antilog tables: alog[(log[X] + log[C]) % 255] (for nonzero X and C)
• Software (ASM): ROL, XOR, AND, or lookups in tables ALOG DW 3H, 5H, … and LOG DW 7H, 9H, …
• Hardware: multiplication by a constant is a network of XOR gates mapping input bits x0, …, x7 to output bits y0, …, y7
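Both software methods can be sketched in C. The sketch assumes Rijndael's field polynomial $x^8 + x^4 + x^3 + x + 1$ (0x11B) and uses 3 as the generator for the log/antilog tables; other ciphers may use other polynomials.

```c
#include <stdint.h>

/* Shift-and-XOR multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 */
uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        /* multiply a by x; if the x^8 term appears, reduce by XOR with 0x1B */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

static uint8_t LOG[256], ALOG[255];

/* Build log/antilog tables: ALOG[i] = 3^i, LOG[3^i] = i */
void gf_init_tables(void) {
    uint8_t x = 1;
    for (int i = 0; i < 255; i++) {
        ALOG[i] = x;
        LOG[x] = (uint8_t)i;
        x = gf_mul(x, 3);
    }
}

/* Table-based multiplication: alog[(log[a] + log[b]) % 255], with 0 handled separately */
uint8_t gf_mul_table(uint8_t a, uint8_t b) {
    if (a == 0 || b == 0) return 0;
    return ALOG[(LOG[a] + LOG[b]) % 255];
}
```

The shift-and-XOR loop needs no memory but several instructions per bit; the table method needs 511 bytes of tables but only three lookups.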

Auxiliary cipher operations (1): permutation of n bits
• Software (C): complex sequence of instructions (<<, |, &)
• Software (ASM): complex sequence of instructions (ROL, OR, AND)
• Hardware: only an order of wires mapping x1, …, xn to y1, …, yn
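A generic software bit permutation shows where the cost comes from: every output bit needs its own shift, AND, and OR. The table-driven interface (output bit i takes input bit p[i]) is an assumption for illustration.

```c
#include <stdint.h>

/* Software bit permutation: output bit i takes input bit p[i].
   32 iterations of shift/AND/OR, where hardware needs only rewired connections. */
uint32_t permute32(uint32_t x, const uint8_t p[32]) {
    uint32_t y = 0;
    for (int i = 0; i < 32; i++)
        y |= ((x >> p[i]) & 1u) << i;
    return y;
}
```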

Auxiliary cipher operations (2): fixed rotation, C = A <<< n (32-bit)
• Software (C): C = (A << n) | (A >> (32 - n)); (valid for 0 < n < 32)
• Software (ASM): ROL A, n
• Hardware: only an order of wires mapping x1, …, xn to y1, …, yn

Auxiliary cipher operations (3): Boolean operations on n-bit words A, B
• Software (C): A^B, A&B, A|B
• Software (ASM): XOR A, B; AND A, B; OR A, B
• Hardware: n independent gates, each combining bits $a_i$ and $b_i$ into $y_i$

Auxiliary cipher operations (4): addition/subtraction, C = A + B mod $2^n$, n = 32, 16
• Software (C): unsigned long A, B, C; C = A+B; (the wrap-around is mod $2^{32}$ only for a 32-bit type such as uint32_t)
• Software (ASM): ADD
• Hardware: adder/subtractor

Multiple designs for hardware adders, trading delay against area:
• Ripple-carry adder (RC)
• Carry-skip adder (CS)
• Carry-lookahead adder (CLA)
• Carry-select adder
• Parallel-prefix network adder (Kogge-Stone, Brent-Kung)
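The two extremes of the list can be modeled at bit level in C: a ripple-carry adder, whose carry passes through 32 positions one after another, and a Kogge-Stone parallel-prefix adder, which combines generate/propagate signals in log2(32) = 5 levels. This is a behavioral sketch of the carry logic, not a hardware description; the other designs are omitted.

```c
#include <stdint.h>

/* Ripple-carry adder: the carry travels through all 32 bit positions in sequence */
uint32_t add_ripple(uint32_t a, uint32_t b) {
    uint32_t sum = 0, carry = 0;
    for (int i = 0; i < 32; i++) {
        uint32_t ai = (a >> i) & 1u, bi = (b >> i) & 1u;
        sum |= (ai ^ bi ^ carry) << i;
        carry = (ai & bi) | (carry & (ai ^ bi));
    }
    return sum;                      /* the final carry out is dropped: result mod 2^32 */
}

/* Kogge-Stone adder: generate/propagate combined over spans of 1, 2, 4, 8, 16 bits */
uint32_t add_kogge_stone(uint32_t a, uint32_t b) {
    uint32_t g = a & b;              /* bit i generates a carry */
    uint32_t p = a ^ b;              /* bit i propagates an incoming carry */
    const uint32_t half_sum = p;
    for (unsigned d = 1; d < 32; d <<= 1) {
        g |= p & (g << d);           /* group generate over a span of 2d bits */
        p &= p << d;                 /* group propagate over a span of 2d bits */
    }
    return half_sum ^ (g << 1);      /* carry into bit i = group generate of bits 0..i-1 */
}
```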

Basic operations: delay and area in HARDWARE
[Figure: relative delay vs. area of modular multiplication, addition (RC), $GF(2^n)$ multiplication, Boolean, permutation, fixed rotation, modular inverse, variable rotation, addition (CLA), S-box 4 x 4, S-box 8 x 8, S-box 9 x 32]

Basic operations: delay and area (memory) in SOFTWARE
[Figure: relative delay vs. memory of modular inverse, permutation, $GF(2^n)$ multiplication, variable rotation, fixed rotation, multiplication, addition, Boolean, S-box 4 x 4, S-box 8 x 8, S-box 9 x 32]

Major operations of AES finalists
• Serpent: S-boxes
• Twofish: S-boxes, multiplication in $GF(2^m)$
• Rijndael: S-boxes, multiplication in $GF(2^m)$
• RC6: variable rotation, integer multiplication
• Mars: S-boxes, variable rotation, integer multiplication

Auxiliary operations of AES finalists (Serpent, Twofish, Rijndael, RC6, Mars): Boolean, fixed rotation, addition/subtraction, permutation

MARS (IBM team): delay and area in HARDWARE
[Figure: delay vs. area chart of the basic operations in hardware, for MARS]

Serpent (R. Anderson, E. Biham, L. Knudsen): delay and area in HARDWARE
[Figure: delay vs. area chart of the basic operations in hardware, for Serpent]

Rijndael (V. Rijmen, J. Daemen): delay and area in HARDWARE
[Figure: delay vs. area chart of the basic operations in hardware, for Rijndael]

MARS (IBM team): delay and area (memory) in SOFTWARE
[Figure: delay vs. memory chart of the basic operations in software, for MARS]

Operations efficient in both software and hardware: summary
[Figure: operations classified by efficiency in software (fast & compact, slow or big, slow & big) and in hardware (same scale): permutation, $GF(2^n)$ multiply, S-box, modular inverse, variable rotation, Boolean, fixed rotation, addition, multiplication]

Types of ciphers

AES: types of candidate algorithms
• Feistel networks: Twofish, E2, DFC, DEAL, LOKI97, Magenta
• Substitution-linear transformation networks: Rijndael, Serpent, SAFER+, Crypton
• Modified Feistel networks: RC6, MARS, CAST-256
• Others: FROG, HPC

Feistel network: single round of Twofish
[Figure: input words D[0]-D[3]; subkeys $K_{2r+8}$ and $K_{2r+9}$; F-function; rotations <<< 1 and >>> 1; output words D'[0]-D'[3]; legend: units shared between encryption and decryption]
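The reason a Feistel network shares so much between encryption and decryption can be shown with a generic balanced Feistel network in C. This is not Twofish (which adds 1-bit rotations and its own F-function); the round count and F below are hypothetical. Decryption runs the identical procedure, including the same F, with the round keys in reverse order, and F itself never has to be inverted.

```c
#include <stdint.h>

#define ROUNDS 16   /* hypothetical number of rounds */

/* Hypothetical round function F; a Feistel network does not require F to be invertible */
static uint32_t F(uint32_t x, uint32_t k) {
    x = (x + k) * 0x2545F491u;
    return x ^ (x >> 15);
}

/* Generic balanced Feistel network on a 64-bit block held as (L, R) */
static void feistel(uint32_t *l, uint32_t *r, const uint32_t k[ROUNDS], int decrypt) {
    for (int i = 0; i < ROUNDS; i++) {
        uint32_t ki = decrypt ? k[ROUNDS - 1 - i] : k[i];  /* only the key order changes */
        uint32_t t = *l ^ F(*r, ki);                       /* the same F unit serves both directions */
        *l = *r;
        *r = t;
    }
    /* undo the last swap so that decryption is the same procedure with reversed keys */
    uint32_t t = *l; *l = *r; *r = t;
}
```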

Modified Feistel network: single round of MARS
[Figure: input words D[0]-D[3]; E-function with input "in" and outputs out1, out2, out3; subkeys k = K[4+2i] and k' = K[5+2i], where i is the round number; rotation <<< 13; output words D'[0]-D'[3]; legend: units shared between encryption and decryption]

Substitution-linear transformation network: single round of Serpent
[Figure: 128-bit data path: key mixing with K[i], S-boxes, linear transformation; legend: units shared between encryption and decryption]

Substitution-linear transformation network: Serpent in hardware
[Figure: 128-bit data path: initial permutation; encryption block using subkeys K0, ..., K7, K32; decryption block using subkeys K32, ..., K7, K0; final permutation]

Substitution-linear transformation network: Rijndael in hardware
[Figure: encryption and decryption paths built from inversion in $GF(2^8)$, affine transformation, inverse affine transformation, ShiftRow, InvShiftRow, InvMixColumn, and subkey addition; legend: units shared between encryption and decryption]

Number and complexity of rounds

Number vs. complexity of a round
[Figure: number of rounds (0-50) vs. complexity of a round for Triple DES, Serpent, Mars, RC6, DES, Twofish, Rijndael]

Complexity of the cipher round in hardware (K. Gaj, P. Chodowiec, April 2000)
[Figure: time in hardware (0-100 ns) of one round, drawn as a chain of components:
• Serpent (regular round): S-box 4 x 4, XOR7, MUX2
• Rijndael: S-box 8 x 8, XOR6, XOR5, XOR4, 2 MUX2
• Twofish: 2 ADD32, 6 S-boxes 4 x 4, 9 XOR2, XOR5, XOR4, 2 MUX2
• RC6: SQR32, 2 ADD32, ROT32, 4 MUX2
• Mars: ADD32, MUL32, ROT32, ADD32, 2 XOR2, 4 MUX2]

Security margin: theoretical attacks better than exhaustive key search (number of rounds in the attack / total number of rounds)
• Serpent: 9/32
• Twofish: 6/16
• Mars: 11/16 (without the 16 mixing rounds)
• Rijndael: 7/10
• RC6: 15/20

Making all rounds identical

Serpent: hardware architecture I8
[Figure: a 128-bit register feeds eight consecutive rounds; round j (j = 0, ..., 7) consists of key mixing with Kj, 32 copies of S-box j, and the linear transformation; one implementation round equals 8 regular cipher rounds; final key mixing with K32 before the output]

Serpent: hardware architecture I1
[Figure: a 128-bit register feeds one regular Serpent round: key mixing with Ki; eight parallel banks 32 x S-box 0, ..., 32 x S-box 7; an 8-to-1 128-bit multiplexer selecting one bank; the linear transformation; final key mixing with K32 before the output]

GMU results: encryption in cipher feedback modes (CBC, CFB, OFB), Virtex FPGA
[Figure: throughput in Mbit/s (0-500) vs. area in CLB slices (0-5000) for Serpent with all S-boxes identical, Rijndael, Serpent I8, Twofish, Serpent I1, RC6, Mars]

Parallelism

Parallelism in SHA-1
[Figure: two consecutive SHA-1 steps on 32-bit words A, B, C, D, E: ROTL5(A), ROTL30(B), $f_t(B, C, D)$, and additions of E, $K_t$, $W_t$; operations from two different steps that can be performed in parallel are marked]
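A compact C sketch of the standard SHA-1 compression function makes the dependencies within a step visible. It is written for clarity, not speed; the function names and the one-block interface are assumptions, and the caller is assumed to supply an already padded 512-bit block as sixteen big-endian 32-bit words.

```c
#include <stdint.h>

static uint32_t rotl(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); } /* 0 < n < 32 */

/* SHA-1 compression of one padded 512-bit block (16 big-endian words) into state h[5] */
void sha1_compress(uint32_t h[5], const uint32_t block[16]) {
    uint32_t w[80];
    for (int t = 0; t < 16; t++) w[t] = block[t];
    for (int t = 16; t < 80; t++)
        w[t] = rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);

    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int t = 0; t < 80; t++) {
        uint32_t f, k;
        if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
        /* e + k + w[t], f(b, c, d) and rotl(b, 30) do not depend on rotl(a, 5);
           the next step's e is the current d, so the next step's e + k + w[t + 1]
           can already be computed while this step is still running. */
        uint32_t tmp = rotl(a, 5) + f + e + k + w[t];
        e = d; d = c; c = rotl(b, 30); b = a; a = tmp;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}
```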

Executing SHA-1 on a 7-way superscalar processor (A. Bosselaers, R. Govaerts, J. Vandewalle, 1997)
[Figure: instruction schedule in which the rotations ROL 1, ROL 5, and ROL 30 of steps n, n+1, ..., n+4 overlap in time]

Number of operations that can be executed in parallel for various hash functions (A. Bosselaers, R. Govaerts, J. Vandewalle, 1997)
[Figure: bar chart (0-8) for SHA-1, RIPEMD-128, RIPEMD-160, MD5, MD4]

Optimization tricks

Rijndael round: table-lookup implementation
Each output column $b_j$ is the XOR of four table lookups, one per row of the input state $a_{i,j}$, and a round-key word $k_j$ (column indices taken mod 4):

$$b_j = T_0[a_{0,j}] \oplus T_1[a_{1,j+1}] \oplus T_2[a_{2,j+2}] \oplus T_3[a_{3,j+3}] \oplus k_j$$

e.g., $b_2 = x_{0,2} \oplus x_{1,2} \oplus x_{2,2} \oplus x_{3,2} \oplus k_2$, where $x_{i,2}$ is the output of table $T_i$.
• Speed-up in software: ~100 times
• Speed-up in hardware: ~20%
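A self-contained C sketch of the table-lookup round follows. For brevity it computes the S-box at startup (inversion in $GF(2^8)$ followed by the affine transformation) instead of hard-coding it; the state layout a[row][col] and the packing of each column into a 32-bit word with row 0 in the top byte are assumptions of this sketch.

```c
#include <stdint.h>

static uint8_t xtime(uint8_t a) { return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0)); }

static uint8_t gmul(uint8_t a, uint8_t b) {          /* GF(2^8) multiply, AES polynomial */
    uint8_t p = 0;
    for (; b; b >>= 1, a = xtime(a)) if (b & 1) p ^= a;
    return p;
}

static uint8_t rotl8(uint8_t x, int n) { return (uint8_t)((x << n) | (x >> (8 - n))); }

/* AES S-box: multiplicative inverse in GF(2^8) (0 maps to 0), then the affine transformation */
static uint8_t sbox(uint8_t x) {
    uint8_t inv = 0;
    for (int y = 1; y < 256 && x; y++)
        if (gmul(x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

static uint32_t T[4][256];

/* T0[x] packs the MixColumn column (2s, s, s, 3s) for s = S[x]; T1..T3 are byte rotations of T0 */
void build_tables(void) {
    for (int x = 0; x < 256; x++) {
        uint8_t s = sbox((uint8_t)x);
        uint32_t t0 = ((uint32_t)gmul(s, 2) << 24) | ((uint32_t)s << 16) | ((uint32_t)s << 8) | gmul(s, 3);
        T[0][x] = t0;
        T[1][x] = (t0 >> 8) | (t0 << 24);
        T[2][x] = (t0 >> 16) | (t0 << 16);
        T[3][x] = (t0 >> 24) | (t0 << 8);
    }
}

/* One round (SubBytes, ShiftRows, MixColumns, AddRoundKey) as four lookups per column */
void round_ttable(const uint8_t a[4][4], const uint32_t k[4], uint32_t b[4]) {
    for (int j = 0; j < 4; j++)
        b[j] = T[0][a[0][j]] ^ T[1][a[1][(j + 1) % 4]] ^ T[2][a[2][(j + 2) % 4]]
             ^ T[3][a[3][(j + 3) % 4]] ^ k[j];
}
```

The four tables occupy 4 KB, which is cheap in software memory; in hardware the same tables would have to be replicated for every byte processed in parallel, which is consistent with the much smaller hardware gain quoted above.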

Serpent: bit-slice implementation
The 128-bit state (32 x 4 bits) is held in four 32-bit words $x_1, x_2, x_3, x_4$, where bit k of word $x_j$ is input bit j of S-box k. An S-box output bit given as a Boolean function, e.g.,

$$y_1^{(k)} = f\left(x_1^{(k)}, x_2^{(k)}, x_3^{(k)}, x_4^{(k)}\right) = x_1^{(k)} x_2^{(k)} \oplus \left(x_3^{(k)} \lor x_4^{(k)}\right),$$

is then evaluated for all 32 S-boxes at once with word-wide instructions: $u_1 = x_1$ AND $x_2$, $v_1 = x_3$ OR $x_4$, $y_1 = u_1$ XOR $v_1$.
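The bit-slice idea can be checked in C by comparing the three word-wide instructions against a per-bit evaluation of the same example function for each of the 32 S-boxes. The function is the illustrative $y_1$ above, not an actual Serpent S-box formula.

```c
#include <stdint.h>

/* Reference: evaluate y1 = x1*x2 XOR (x3 OR x4) separately for each S-box position k */
uint32_t y1_per_bit(uint32_t x1, uint32_t x2, uint32_t x3, uint32_t x4) {
    uint32_t y = 0;
    for (int k = 0; k < 32; k++) {
        unsigned a = (x1 >> k) & 1u, b = (x2 >> k) & 1u, c = (x3 >> k) & 1u, d = (x4 >> k) & 1u;
        y |= (uint32_t)((a & b) ^ (c | d)) << k;
    }
    return y;
}

/* Bit-slice: word xj holds input bit j of all 32 S-boxes, so 3 instructions cover 32 S-boxes */
uint32_t y1_bitslice(uint32_t x1, uint32_t x2, uint32_t x3, uint32_t x4) {
    uint32_t u1 = x1 & x2;   /* AND */
    uint32_t v1 = x3 | x4;   /* OR  */
    return u1 ^ v1;          /* XOR */
}
```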

The proposed approach

Cipher design methodology (1)
1. Choose one or at most two major operations efficient in both software and hardware; best choice: S-box 4 x 4, $GF(2^n)$ multiplication
2. Choose one or at most two auxiliary operations efficient in both software and hardware; best choice: Boolean, fixed rotation
3. Choose a cipher type that enables maximum sharing between encryption and decryption; best choice: Feistel network, modified Feistel network

Cipher design methodology (2)
4. Design a round taking into account a trade-off between:
  • round complexity
  • the number of rounds necessary to guarantee a sufficient security margin
5. Make each round identical (if possible); negative examples: Serpent, Mars
6. Look for parallelism within a round and among consecutive rounds; positive example: SHA-1
7. Look for optimization tricks; positive examples: table lookup in Rijndael, bit-slice implementation in Serpent

[Diagram: Mathematicians (security), Computer scientists (software efficiency), Computer engineers (hardware efficiency); flexibility]

$A100 Challenges
• For mathematicians: Prove or disprove that Serpent with all S-boxes identical and 16 rounds is at least as secure as Rijndael.
• For computer scientists: Is there a way of using instruction-level parallelism to speed up a software implementation of [modified] Serpent to make it as fast as Rijndael?

$A50 Challenge
• For mathematicians: Is there a way of changing Serpent into a modified Feistel network cipher without losing its security properties?
• For computer scientists: What is the level of parallelism present in SHA-256, SHA-384, and SHA-512?