

# **Low-Latency Encryption in Cyber-Physical Systems: Balancing Security, Efficiency, and Performance**



### **Elif Bilge Kavun** September 19, 2024

## *Overview*



- Fundamental concepts
	- Need for lightweight security
	- When is low latency a requirement for (crypto) hardware?
	- How to achieve?
- Challenges of low latency crypto on hardware implementations
- Existing low-latency symmetric encryption ciphers
	- PRINCE and PRINCEv2 (2012 and 2020)
	- Midori (2015)
	- MANTIS (2016)
	- QARMA (2017\*)
	- Kcipher (2020)
	- SPEEDY (2021)
	- Orthros (2021)
	- SCARF (2022\*)
	- LLLWBC (2022)
	- Sonic and SuperSonic (2023)
	- BipBip (2023)
	- IoVCipher (2024)
- Physical attack resistant variants



# *Ubiquitous Computing Era*

- Smartphones, mobile applications
- Products, IPs
- Medical & sensor systems
- Automation components, asset tracking systems











# **Security Concerns!!!**

### *Ubiquitous Computing Era*




September/October 2011

#### Another one bites the dust (Mifare DESFire)!



I'm not sure it's a big surprise but this month David Oswald and Christof Paar from the Horst Görtz Institute for IT Security in Bochum, Germany have given a paper at the CHES (Cryptographic Hardware and Embedded Systems) conference in Japan on how to break the Mifare DESFire MF3ICD40 contactless memory chip.

So back to basics, do we need to panic, is it important and will there be further repercussions? Really it's no to all these questions but don't go away yet because that would belittle the quality of their work.

They used Side Channel Analysis (SCA) by using an electromagnetic probe to contactlessly measure the power signal taken by the chip. Using these techniques they were able to recover the 2 DES keys (56 bits each) from

ANDY GREENBERG, SECURITY, 09.10.2018 01:00 PM

#### Hackers Can Steal a Tesla Model S in Seconds by Cloning Its Key Fob

Weak encryption in Tesla Model S key fobs allowed all-too-easy theft, but you can set a PIN code on your Tesla to protect it.






ANDY GREENBERG, SECURITY, 08.10.16 4:29 PM

Faculty of Computer Science and Mathematics

#### **A New Wireless Hack Can Unlock 100 Million Volkswagens**




In 2013, when University of Birmingham computer scientist Flavio Garcia and a team of researchers were preparing to reveal a vulnerability that allowed them to start the ignition of millions of Volkswagen cars and drive them off without a key, they were hit with a lawsuit that delayed the publication of their research for two years. But that experience doesn't seem to have deterred Garcia and his





# **Good security designs and architecture needed to resist attacks!**

- Access control
- Data confidentiality
- Security
- Counterfeiting mitigations





- **Conventional cryptography**
	- RSA
	- Standard block cipher solutions (AES, etc.)
- **Applications in**
	- Servers
	- PCs
	- "Strong" tablets, smartphones



# Embedded/IoT devices  $\rightarrow$  Resource-constrained!

**Power and Energy Consumption:** Lowest Possible!



**Chip Area:** Limited!

 $\rightarrow$  to match these constraints:

# **Need for** *Tailored* **Cryptography:** *Lightweight* **Cryptography**



- Reduces computational efforts to provide security
	- Cheaper than traditional crypto
	- Not weak, but "sufficient" security
- Many different proposals/implementations especially in the last decade
	- Public-key cryptography: ECC
	- Stream ciphers: Grain, Trivium, etc.
	- Hash functions: Photon, Quark, etc.
	- **Block ciphers**
		- Core for symmetric cryptography, stream ciphers, MACs, etc.






**NIST demands lightweight cryptography to protect IoT devices**



- Lightweight Cryptography (LWC) Standardization Competition by NIST
	- Specifically: "Authenticated Encryption with Associated Data" (AEAD)



\*Figure from "L. Cardoso Santos and J. López, Pipeline Oriented Implementation of NORX for ARM Processors, SBSEG'17"



- In round format
- ASCON selected out of 10 finalists

1. Initialization: initializes the state with the key $K$ and nonce $N$.
2. Associated Data Processing: updates the state with associated data blocks $A_i$.
3. Plaintext Processing: injects plaintext blocks $P_i$ into the state and extracts ciphertext blocks $C_i$.
4. Finalization: injects the key $K$ again and extracts the tag $T$ for authentication.



The duplex sponge mode for Ascon authenticated encryption







Ascon's S-box

#### Ascon's linear layer

$$
\begin{aligned}
x_0 &:= x_0 \oplus (x_0 \ggg 19) \oplus (x_0 \ggg 28) \\
x_1 &:= x_1 \oplus (x_1 \ggg 61) \oplus (x_1 \ggg 39) \\
x_2 &:= x_2 \oplus (x_2 \ggg 1) \oplus (x_2 \ggg 6) \\
x_3 &:= x_3 \oplus (x_3 \ggg 10) \oplus (x_3 \ggg 17) \\
x_4 &:= x_4 \oplus (x_4 \ggg 7) \oplus (x_4 \ggg 41)
\end{aligned}
$$

Ascon's permutation: $\oplus$ denotes XOR, $\odot$ denotes AND, $\ggg$ is rotation to the right.
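The linear layer above can be sketched in a few lines of Python. This is only an illustration of the five rotation equations, assuming 64-bit state words and right rotations; the helper names `ror64` and `ascon_linear_layer` are made up for this sketch, and the S-box and constant addition of the full permutation are not included.

```python
MASK64 = (1 << 64) - 1

def ror64(x: int, r: int) -> int:
    """Rotate a 64-bit word right by r bits (0 < r < 64)."""
    return ((x >> r) | (x << (64 - r))) & MASK64

# Rotation offsets for x0..x4, copied from the equations above.
ROTATIONS = [(19, 28), (61, 39), (1, 6), (10, 17), (7, 41)]

def ascon_linear_layer(state):
    """Apply x_i := x_i XOR (x_i >>> a_i) XOR (x_i >>> b_i) to each of the five words."""
    return [x ^ ror64(x, a) ^ ror64(x, b) for x, (a, b) in zip(state, ROTATIONS)]
```

Each output bit depends on exactly three input bits of the same word, which is why this layer is cheap in hardware: rotations are wiring and only two XOR levels remain.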

# *Initial Proposals*

ISO/IEC 29192-2:2019 Information security — Lightweight cryptography — Part 2: Block ciphers

### **PRESENT: An Ultra-Lightweight Block Cipher**

A. Bogdanov<sup>1</sup>, L.R. Knudsen<sup>2</sup>, G. Leander<sup>1</sup>, C. Paar<sup>1</sup>, A. Poschmann<sup>1</sup>, M.J.B. Robshaw<sup>3</sup>, Y. Seurin<sup>3</sup>, and C. Vikkelsoe<sup>2</sup>

<sup>1</sup> Horst-Görtz-Institute for IT-Security, Ruhr-University Bochum, Germany <sup>2</sup> Technical University Denmark, DK-2800 Kgs. Lyngby, Denmark <sup>3</sup> France Telecom R&D, Issy les Moulineaux, France leander@rub.de, {abogdanov, cpaar, poschmann}@crypto.rub.de, lars@ramkilde.com, chv@mat.dtu.dk, {matt.robshaw, yannick.seurin}@orange-ftgroup.com

**Abstract.** With the establishment of the AES the need for new block ciphers has been greatly diminished; for almost all block cipher applications the AES is an excellent and preferred choice. However, despite recent implementation advances, the AES is not suitable for extremely constrained environments such as RFID tags and sensor networks. In this paper we describe an ultra-lightweight block cipher, PRESENT. Both security and hardware efficiency have been equally important during the design of the cipher and at 1570 GE, the hardware requirements for PRESENT are competitive with today's leading compact stream ciphers.

- Simple but strong design
	- Well-studied substitution-permutation network (SPN)
- Targeting hardware
- Low-area
	- Permutation is just wiring in hardware!
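To make "permutation is just wiring" concrete, here is a minimal software sketch of a PRESENT-style bit permutation. It assumes the published PRESENT rule that bit $i$ moves to position $16i \bmod 63$ (with bit 63 fixed), which the slides do not spell out; the function names are illustrative. In hardware none of this loop exists, since each output bit is simply a wire from one input bit.

```python
def p_layer_index(i: int) -> int:
    """Destination of bit i under the assumed PRESENT rule P(i) = 16*i mod 63, P(63) = 63."""
    return 63 if i == 63 else (16 * i) % 63

def p_layer(state: int) -> int:
    """Permute the 64 bits of state; software needs a loop, hardware needs only wires."""
    out = 0
    for i in range(64):
        bit = (state >> i) & 1
        out |= bit << p_layer_index(i)
    return out
```

A software implementation pays for every moved bit, while a hardware implementation pays zero gates, which is one reason the design targets hardware.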
















### **KLEIN: A New Family of Lightweight Block Ciphers**

Zheng Gong<sup>1</sup>, Svetla Nikova<sup>2,3</sup> and Yee Wei Law<sup>4</sup>

<sup>1</sup> School of Computer Science, South China Normal University, China cis.gong@gmail.com <sup>2</sup> Faculty of EWI, University of Twente, The Netherlands <sup>3</sup> Dept. ESAT/SCD-COSIC, Katholieke Universiteit Leuven, Belgium s.i.nikova@utwente.nl <sup>4</sup> Department of EEE, The University of Melbourne, Australia yee.wei.law@gmail.com

Abstract. Resource-efficient cryptographic primitives are essential for realizing both security and efficiency in embedded systems like RFID tags and sensor nodes. Among those primitives, lightweight block cipher plays a major role as a building block for security protocols. In this paper, we describe a new family of lightweight block ciphers named KLEIN, which is designed for resource-constrained devices such as wireless sensors and RFID tags. Compared to related proposals, KLEIN has advantage in the software performance on legacy sensor platforms, while its hardware implementation can be compact as well.

- AES-like
- Works on nibbles
- Involution Sbox



### The LED Block Cipher\*

Jian Guo<sup>1</sup>, Thomas Peyrin<sup>2,†</sup>, Axel Poschmann<sup>2,†</sup>, and Matt Robshaw<sup>3,‡</sup>

<sup>1</sup> Institute for Infocomm Research, Singapore <sup>2</sup> Nanyang Technological University, Singapore <sup>3</sup> Applied Cryptography Group, Orange Labs, France {ntu.guo, thomas.peyrin}@gmail.com, aposchmann@ntu.edu.sg, matt.robshaw@orange.com

**Abstract.** We present a new block cipher LED. While dedicated to compact hardware implementation, and offering the smallest silicon footprint among comparable block ciphers, the cipher has been designed to simultaneously tackle three additional goals. First, we explore the role of an ultra-light (in fact non-existent) key schedule. Second, we consider the resistance of ciphers, and LED in particular, to related-key attacks: we are able to derive simple yet interesting AES-like security proofs for LED regarding related- or single-key attacks. And third, while we provide a block cipher that is very compact in hardware, we aim to maintain a reasonable performance profile for software implementation.

- AES-like

- Uses PRESENT Sbox
- Consists of steps
	- Number based on key size
	- Each step 4 rounds



- Initial proposals mostly address area in hardware / speed in software
- Other important metrics?





- Area
	- Usually measured in μm², but this depends on the technology and the standard cell library
	- Hence stated in gate equivalents (GE), independent of the technology and library
	- One GE is the area of a two-input NAND gate, so the GE count is obtained by dividing the area in μm² by the area of a two-input NAND gate
- Cycles
	- Number of clock cycles required to compute and output the results
- Time
	- Time required for a certain operation, i.e., the number of cycles divided by the operating frequency

\* Shahram Rasoolzadeh, Hardware-Oriented SPN Block Ciphers, PhD Thesis, RUB, 2020



- Throughput
	- Rate at which new output bits are produced, i.e., the number of output bits divided by time (expressed in bits per second, bps)
- Power
	- Usually the power consumption is estimated at gate level by the synthesis tool (typically in μW)
	- Power estimates at transistor level are more accurate (but need more steps in the design flow)
- Energy
	- Power consumption over a certain time period, i.e., the power consumption multiplied by the required time (typically in μJ)
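The metric definitions above (area in GE, time, throughput, energy) reduce to a handful of divisions and multiplications. The sketch below only restates them in Python; the function names and all numbers in the usage note are hypothetical and are not taken from any synthesis report.

```python
def gate_equivalents(area_um2: float, nand2_area_um2: float) -> float:
    """Area in GE: circuit area divided by the area of one two-input NAND gate."""
    return area_um2 / nand2_area_um2

def time_seconds(cycles: int, frequency_hz: float) -> float:
    """Time for one operation: number of cycles divided by the operating frequency."""
    return cycles / frequency_hz

def throughput_bps(output_bits: int, seconds: float) -> float:
    """Throughput: number of output bits divided by the time needed to produce them."""
    return output_bits / seconds

def energy_joules(power_watts: float, seconds: float) -> float:
    """Energy: power consumption multiplied by the required time."""
    return power_watts * seconds
```

For instance, a made-up design of 798 μm² in a library whose NAND2 cell occupies 0.798 μm² would be reported as about 1000 GE, and a 12-cycle encryption of a 64-bit block at 100 MHz takes 120 ns, i.e., roughly 533 Mbps.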

\* Shahram Rasoolzadeh, Hardware-Oriented SPN Block Ciphers, PhD Thesis, RUB, 2020



- Unrolled
	- The whole encryption or decryption process is computed within a single clock cycle by a combinatorial circuit, without using any registers
	- Low latency
- Pipelined
	- The circuit for the whole encryption or decryption process is implemented (similar to unrolled), but registers are inserted in the critical path (the path with maximum delay) to increase the achievable clock frequency
	- Higher throughput, but at the cost of higher area and power consumption

\* Shahram Rasoolzadeh, Hardware-Oriented SPN Block Ciphers, PhD Thesis, RUB, 2020



- Round-based
	- Each round function of the cipher is computed within one clock cycle
	- Reduces area and power at the cost of lower throughput
- Serialized
	- Each round function is computed over several clock cycles; in each clock cycle, only a small part of the round function is computed (e.g., only one S-box, or only one word of the linear layer)
	- Lower area and power consumption, but also the lowest throughput
	- Beyond a certain point, the additional control logic can cost more than the datapath savings
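A rough way to compare the unrolled, round-based and serialized architectures is to count clock cycles per block. The following is an idealized model, not a statement about any real implementation: it ignores load/unload cycles and control overhead, leaves out pipelining, and the function name and parameters are assumptions made for this sketch.

```python
def cycles_per_block(architecture: str, rounds: int, steps_per_round: int = 1) -> int:
    """Idealized clock-cycle count for one block under each architecture."""
    if architecture == "unrolled":
        return 1                         # everything happens in one (long) cycle
    if architecture == "round-based":
        return rounds                    # one round per cycle
    if architecture == "serialized":
        return rounds * steps_per_round  # e.g. one S-box per cycle
    raise ValueError(f"unknown architecture: {architecture}")
```

With a hypothetical 12-round cipher whose round has 16 S-boxes processed one per cycle, the model gives 1, 12 and 192 cycles respectively. The unrolled design wins on cycle count, but its single cycle must be long enough for the whole combinatorial path.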

\* Shahram Rasoolzadeh, Hardware-Oriented SPN Block Ciphers, PhD Thesis, RUB, 2020










## *Low-latency in Focus*





## Latency: Time to encrypt one block of data (ns)



\* Knezevic et al., Low-Latency Encryption – Is "Lightweight = Light + Wait"?, CHES, 2012



## Latency: Time to encrypt one block of data (ns)  $\rightarrow$  per round



## Latency: Time to encrypt one block of data (corresponding area cost in GE)



\* Knezevic et al., Low-Latency Encryption – Is "Lightweight = Light + Wait"?, CHES, 2012

## Latency: Time to encrypt one block of data (corresponding area cost in GE $\rightarrow$ per round)



### **Less area possible for encryption of one block of data?**

\* Knezevic et al., Low-Latency Encryption – Is "Lightweight = Light + Wait"?, CHES, 2012



## *Low-latency in Focus: What is Important? (1/2)*



- Keep the hardware cost of one round as low as possible
	- Main savings are in the Sbox: smaller is better (3- or 4-bit Sboxes)
	- Even among these there are significant differences
- All rounds are unrolled
	- The cipher can be thought of as one big round
	- Hence the number of rounds is important and should be minimized
- Making all rounds the same decreases cost
	- Round complexity (in terms of components) should also be low, but not too low

## *Low-latency in Focus: What is Important? (2/2)*



- Slightly heavier rounds with a smaller, balanced number of rounds
- Simpler key schedule
	- Should be independent of the number of rounds
	- Constant addition should be preferred over a key schedule, if possible
- Minimum overhead for supporting both encryption and decryption
	- Use involutions





## *Low-latency: PRINCE Cipher*

- 64-bit block, 128-bit key
- Core cipher with 64-bit key
- 64-bit whitening keys (FX construction)
- 12 rounds
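The FX construction listed above wraps a 64-bit-key core cipher between two whitening keys. The sketch below assumes the key expansion from the published PRINCE specification, $k_0' = (k_0 \ggg 1) \oplus (k_0 \gg 63)$, which is not shown on the slide; `core` is a placeholder callable standing in for the 12-round core and is not implemented here.

```python
MASK64 = (1 << 64) - 1

def ror64(x: int, r: int) -> int:
    """Rotate a 64-bit word right by r bits (0 < r < 64)."""
    return ((x >> r) | (x << (64 - r))) & MASK64

def expand_key(k0: int, k1: int):
    """Split a 128-bit key (k0 || k1) into whitening keys k0, k0' and core key k1."""
    k0_prime = ror64(k0, 1) ^ (k0 >> 63)  # assumed PRINCE rule for k0'
    return k0, k0_prime, k1

def fx_encrypt(core, k0: int, k1: int, message: int) -> int:
    """FX construction: pre-whiten with k0, run the core under k1, post-whiten with k0'."""
    k0, k0_prime, k1 = expand_key(k0, k1)
    return core(k1, message ^ k0) ^ k0_prime
```

The whitening costs only two 64-bit XORs on top of the core, so it adds almost nothing to the unrolled critical path.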











$$
R = \texttt{SR} \circ \texttt{MC} \circ \texttt{SB}, \quad R'_{\texttt{PRINCE}} = \texttt{SB}^{-1} \circ \texttt{MC} \circ \texttt{SB} \quad \text{and} \quad R^{-1} = \texttt{SB}^{-1} \circ \texttt{MC} \circ \texttt{SR}^{-1}
$$

$$
\oplus_{k_i}(x) := x + k_i, \qquad \oplus_{RC_i}(x) := x + RC_i
$$




### SB Sbox



$$
\widehat{M}^{(0)} = \begin{pmatrix} M_1 & M_2 & M_3 & M_4 \\ M_2 & M_3 & M_4 & M_1 \\ M_3 & M_4 & M_1 & M_2 \\ M_4 & M_1 & M_2 & M_3 \end{pmatrix}, \ \widehat{M}^{(1)} = \begin{pmatrix} M_2 & M_3 & M_4 & M_1 \\ M_3 & M_4 & M_1 & M_2 \\ M_4 & M_1 & M_2 & M_3 \\ M_1 & M_2 & M_3 & M_4 \end{pmatrix}, \ M' = \begin{pmatrix} \widehat{M}^{(0)} & 0 & 0 & 0 \\ 0 & \widehat{M}^{(1)} & 0 & 0 \\ 0 & 0 & \widehat{M}^{(1)} & 0 \\ 0 & 0 & 0 & \widehat{M}^{(0)} \end{pmatrix}
$$

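$M_i$ is the $4 \times 4$ identity matrix with the $i$-th diagonal entry set to zero.

Permutation of SR:

| Nibble | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|--------|---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|
| SR | 0 | 5 | 10 | 15 | 4 | 9 | 14 | 3 | 8 | 13 | 2 | 7 | 12 | 1 | 6 | 11 |

The MC layer multiplies the state with $M'$, which is an involution.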
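The involution property of $M'$ can be checked mechanically. The sketch below builds $\widehat{M}^{(0)}$, $\widehat{M}^{(1)}$ and $M'$ as 0/1 matrices over GF(2), assuming (as in the PRINCE specification) that $M_i$ is the $4 \times 4$ identity with its $i$-th diagonal entry cleared, and then tests whether $M' \cdot M'$ is the identity. All function names are illustrative.

```python
def m_block(i):
    """4x4 identity with the i-th diagonal entry (1-indexed) set to zero."""
    return [[1 if (r == c and r != i - 1) else 0 for c in range(4)] for r in range(4)]

def m_hat(first):
    """16x16 matrix whose block row br is M_(first+br), M_(first+br+1), ... (indices cyclic in 1..4)."""
    rows = []
    for br in range(4):
        blocks = [m_block((first - 1 + br + bc) % 4 + 1) for bc in range(4)]
        for r in range(4):
            rows.append([bit for blk in blocks for bit in blk[r]])
    return rows

def block_diag(blocks):
    """Place square matrices along the diagonal of a larger zero matrix."""
    n = sum(len(b) for b in blocks)
    out = [[0] * n for _ in range(n)]
    off = 0
    for b in blocks:
        for r, row in enumerate(b):
            out[off + r][off:off + len(row)] = row
        off += len(b)
    return out

def matmul_gf2(a, b):
    """Matrix product over GF(2)."""
    cols = list(zip(*b))
    return [[sum(x & y for x, y in zip(row, col)) & 1 for col in cols] for row in a]

def m_prime():
    """M' = diag(M^(0), M^(1), M^(1), M^(0)) as in the formula above."""
    m0, m1 = m_hat(1), m_hat(2)
    return block_diag([m0, m1, m1, m0])

def is_involution(m):
    n = len(m)
    identity = [[int(r == c) for c in range(n)] for r in range(n)]
    return matmul_gf2(m, m) == identity
```

Because the matrix is its own inverse, the same circuit serves encryption and decryption. Every row has exactly three ones, so each output bit costs two XOR gates.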






- 64000 Sboxes (and their inverses) with good cryptographic criteria are implemented and synthesized to obtain average gate counts
- Smallest Sbox selected




## **Results**



## *Low-latency: PRINCE Cipher*



### **Results**



### *Low-latency: PRINCE Cipher (Round-based)*







### **Results**



# *PRINCE: Deployment (NXP)*



#### **LPC55S6x MCU Block Diagram**





- NIST lightweight security requirement:
	- 112-bit security with at most $2^{50}$ bytes of chosen data
	- *PRINCE cannot reach this*: data complexity $2^n$, time complexity $2^{126-n}$
- PRINCEv2
	- 64-bit block, 128-bit key
	- Core cipher with 64-bit key
	- 12 rounds
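A short worked calculation shows why the PRINCE trade-off misses the NIST target. It assumes that the data complexity $2^n$ is counted in 64-bit (8-byte) blocks, which the slide does not state explicitly:

$$
2^{50}\ \text{bytes} = \frac{2^{50}}{2^{3}}\ \text{blocks} = 2^{47}\ \text{blocks}
\;\Rightarrow\; n = 47
\;\Rightarrow\; \text{time} = 2^{126-47} = 2^{79} \ll 2^{112}.
$$

Under this assumption, an attacker allowed the full NIST data budget faces only about 79-bit security, which is well below the required 112 bits.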







- Differences to PRINCE
	- FX construction and alpha reflection removed
	- Key schedule changed, new round constant introduced
	- Keyed middle layer






$$
R' = \text{SB}^{-1} \circ \oplus_{RC_{11}+k_1} \circ \text{MC} \circ \oplus_{k_0} \circ \text{SB}
$$

$$
\texttt{Swap}(k_0,k_1,\text{dec}) = \begin{cases} k_0, k_1 & \text{if } \text{dec} = 0 \\ k_1 \oplus \beta, k_0 \oplus \alpha & \text{if } \text{dec} = 1 \end{cases}
$$





- Differences to PRINCE
	- PRINCE round keys: $k_0 \oplus k_1$, $k_1$, $k_1$, $k_1$, $k_1$, $k_1$, $k_1 \oplus \alpha$, $k_0' \oplus k_1 \oplus \alpha$
	- PRINCEv2 round keys: $k_0$, $k_1$, $k_0$, $k_1$, $k_0$, $k_1$, $k_0$, $k_1 \oplus \beta$, $k_0 \oplus \alpha$, $k_1 \oplus \beta$, $k_0 \oplus \alpha$, $k_1 \oplus \beta$, $k_0 \oplus \alpha$, $k_1 \oplus \beta$

$$
k_0 \rightarrow k_1 \rightarrow k_0 \rightarrow k_1 \rightarrow k_0 \rightarrow k_1 \rightarrow k_0
$$

$$
k_1 \oplus \beta \leftarrow k_0 \oplus \alpha \leftarrow k_1 \oplus \beta \leftarrow k_0 \oplus \alpha \leftarrow k_1 \oplus \beta \leftarrow k_0 \oplus \alpha \leftarrow k_1 \oplus \beta
$$

With $k_0 \leftarrow k_1 \oplus \beta$ and $k_1 \leftarrow k_0 \oplus \alpha$:

$$
k_1 \oplus \beta \rightarrow k_0 \oplus \alpha \rightarrow k_1 \oplus \beta \rightarrow k_0 \oplus \alpha \rightarrow k_1 \oplus \beta \rightarrow k_0 \oplus \alpha \rightarrow k_1 \oplus \beta
$$

$$
k_0 \oplus \alpha \oplus \beta \leftarrow k_1 \oplus \alpha \oplus \beta \leftarrow k_0 \oplus \alpha \oplus \beta \leftarrow k_1 \oplus \alpha \oplus \beta \leftarrow k_0 \oplus \alpha \oplus \beta \leftarrow k_1 \oplus \alpha \oplus \beta \leftarrow k_0 \oplus \alpha \oplus \beta
$$
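The effect of the key swap on the PRINCEv2 round-key sequence can be checked with a few lines of Python. This sketch models only the order of the 14 round keys listed above, not the cipher itself; `ALPHA` and `BETA` are arbitrary placeholder values, not the real PRINCEv2 constants, and the function names are hypothetical.

```python
ALPHA = 0x0123456789ABCDEF  # placeholder, NOT the real alpha
BETA = 0x0FEDCBA987654321   # placeholder, NOT the real beta

def round_keys(k0: int, k1: int):
    """Round-key sequence as listed on the slide: 7 plain keys, then 7 with alpha/beta added."""
    first_half = [k0, k1, k0, k1, k0, k1, k0]
    second_half = [k1 ^ BETA, k0 ^ ALPHA] * 3 + [k1 ^ BETA]
    return first_half + second_half

def swap(k0: int, k1: int, dec: int):
    """Key swap used to switch the same circuit from encryption to decryption."""
    return (k1 ^ BETA, k0 ^ ALPHA) if dec else (k0, k1)

def decryption_matches_reversed_encryption(k0: int, k1: int) -> bool:
    """Compare the swapped-key sequence with the reversed encryption sequence."""
    enc_reversed = round_keys(k0, k1)[::-1]
    dec = round_keys(*swap(k0, k1, 1))
    same_first_half = dec[:7] == enc_reversed[:7]
    offset_second_half = all(d == e ^ ALPHA ^ BETA for d, e in zip(dec[7:], enc_reversed[7:]))
    return same_first_half and offset_second_half
```

In this model, running the sequence with swapped keys reproduces the reversed encryption sequence exactly in the first half and up to the fixed offset $\alpha \oplus \beta$ in the second half; in the slide's formulas that offset corresponds to the $\alpha \oplus \beta$ terms in the last line.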



### Minimum latency constrained



\* LP = Low Power

\*\* HPC = High Performance Computing








### Minimum area constrained






## *Midori*

Low energy oriented design



- Not necessarily designed for low latency, but compared against PRINCE
- Cipher specifics
	- 64/128-bit block, 128-bit key
	- SPN: 2 bijective Sboxes (nonlinear) and involutive binary matrix (linear)



\* Banik et al., Midori: A Block Cipher for Low Energy, ASIACRYPT, 2015

## *MANTIS*

- Turning PRINCE into a tweakable block cipher



- Well understood when TWEAKEY framework is used
- Cipher specifics
	- 64-bit block, 128-bit key, 64-bit tweak
	- FX-design, SPN
		- Midori Sbox used as it has better latency than PRINCE Sbox
	- 14 rounds (PRINCE-like middle)



\* Beierle et al., The SKINNY Family of Block Ciphers and its Low-Latency Variant MANTIS, CRYPTO, 2016

### *QARMA*



- Lightweight tweakable block cipher
- Primarily known for its use in the ARMv8 architecture
	- For protection of software, as a cryptographic hash for the Pointer Authentication Code
- Cipher specifics
	- 64/128-bit block, 128/256-bit key (round numbers 7/10 in permutation)
	- An Even-Mansour cipher using three stages: whitening keys $w^0$ and $w^1$ are XORed in between a permutation $F$ (and its inverse), which uses the core key $k^0$ and is parameterized by a tweak $T$, and a "central" permutation $C$, which uses the key $k^1$ and is designed to be reversible via a simple key transformation



\* Roberto Avanzi, The QARMA Block Cipher Family, ToSC, vol. 17, 2017






Comparison of unrolled block ciphers in NanGate 45nm Open Cell Library.




### K-Cipher: A Low Latency, Bit Length Parameterizable Cipher

Michael Kounavis, Sergej Deutsch, Santosh Ghosh, and David Durham

Intel Labs, Intel Corporation, 2111, NE 25th Avenue, Hillsboro, OR 97124 Email: {michael.e.kounavis, sergej.deutsch, santosh.ghosh, david.durham}@intel.com

Abstract. We present the design of a novel low latency, bit length parameterizable cipher, called the "K-Cipher". K-Cipher is particularly useful to applications that need to support ultra low latency encryption at arbitrary ciphertext lengths. We can think of a range of networking, gaming and computing applications that may require encrypting data at unusual block lengths for many different reasons, such as to make space for other unencrypted state values. Furthermore, in modern applications, encryption is typically required to complete inside stringent time frames in order not to affect performance. K-Cipher has been designed to meet these requirements. In the paper we present the K-Cipher design and discuss its rationale. We also present results from our ongoing security analysis which suggest that only 2 to 4 rounds are sufficient to make the cipher operate securely. Finally, we present synthesis results from 2-round 32-bit and 64-bit K-Cipher encrypt datapaths, produced using Intel's® 10 nm process technology. Our results show that the encrypt datapaths can complete in no more than 767 psec, or 3 clocks in 3.9-4.9 GHz frequencies, and are associated with a maximum area requirement of $1875\ \mu m^2$.



Fig. 1. The Aggressive Adder Component of the K-Cipher Round






Fig. 2. Two Round K-Cipher Specification

| Cipher                   | Area ($\mu m^2$) | Latency (psec) | Number of clocks | Frequency |
|--------------------------|------------------|----------------|------------------|-----------|
| K-Cipher Enc-32, $r = 2$ | 614              | 613            | 3                | 4.9 GHz   |
| K-Cipher Enc-64, $r = 2$ | 1875             | 767            | 3                | 3.9 GHz   |


### The SPEEDY Family of Block Ciphers

Engineering an Ultra Low-Latency Cipher from Gate Level for Secure Processor Architectures

Gregor Leander<sup>1</sup>, Thorben Moos<sup>1</sup>, Amir Moradi<sup>1</sup> and Shahram Rasoolzadeh<sup>2</sup>

<sup>1</sup> Ruhr University Bochum, Horst Görtz Institute for IT Security, Bochum, Germany firstname.lastname@rub.de <sup>2</sup> Radboud University, Nijmegen, The Netherlands firstname.lastname@ru.nl

Abstract. We introduce SPEEDY, a family of ultra low-latency block ciphers. We mix engineering expertise into each step of the cipher's design process in order to create a secure encryption primitive with an extremely low latency in CMOS hardware. The centerpiece of our constructions is a high-speed 6-bit substitution box whose coordinate functions are realized as two-level NAND trees. In contrast to other low-latency block ciphers such as PRINCE, PRINCEv2, MANTIS and QARMA, we neither constrain ourselves by demanding decryption at low overhead, nor by requiring a super low area or energy. This freedom together with our gate- and transistor-level considerations allows us to create an ultra low-latency cipher which outperforms all known solutions in single-cycle encryption speed. Our main result, SPEEDY-6-192, is a 6-round 192-bit block and 192-bit key cipher which can be executed faster in hardware than any other known encryption primitive (including Gimli in Even-Mansour scheme and the Orthros pseudorandom function) and offers 128-bit security. One round more, i.e., SPEEDY-7-192, provides full 192-bit security. SPEEDY primarily targets hardware security solutions embedded in high-end CPUs, where area and energy restrictions are secondary while high performance is the number one priority.

Keywords: Low-Latency Cryptography, High-Speed Encryption, Block Cipher



Figure 3: Implementation of the 6-bit S-box of SPEEDY based on two-level NAND trees.





\* = Optimized HDL code with direct instantiation of library cells based on Figures 3 and 4.

## *Orthros*

**IACR Transactions on Symmetric Cryptology** ISSN 2519-173X, Vol. 2021, No. 1, pp. 37-77.

DOI:10.46586/tosc.v2021.i1.37-77



#### **Orthros: A Low-Latency PRF**

Subhadeep Banik<sup>1</sup> and Takanori Isobe<sup>2,3,4</sup> and Fukang Liu<sup>2,5</sup> and Kazuhiko Minematsu<sup>6</sup> and Kosei Sakamoto<sup>2</sup>

<sup>1</sup> LASEC, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland. subhadeep.banik@epfl.ch <sup>2</sup> University of Hyogo, Kobe, Japan. takanori.isobe@ai.u-hyogo.ac.jp, liufukangs@gmail.com, k.sakamoto0728@gmail.com <sup>3</sup> NICT, Tokyo, Japan <sup>4</sup> PRESTO, Japan Science and Technology Agency, Tokyo, Japan <sup>5</sup> East China Normal University, Shanghai, China <sup>6</sup> NEC, Kawasaki, Japan k-minematsu@nec.com

Abstract. We present Orthros, a 128-bit block pseudorandom function. It is designed with primary focus on latency of fully unrolled circuits. For this purpose, we adopt a parallel structure comprising two keyed permutations. The round function of each permutation is similar to Midori, a low-energy block cipher, however we thoroughly revise it to reduce latency, and introduce different rounds to significantly improve cryptographic strength in a small number of rounds. We provide a comprehensive, dedicated security analysis. For hardware implementation, Orthros achieves the lowest latency among the state-of-the-art low-latency primitives. For example, using the STM 90nm library, Orthros achieves a minimum latency of around 2.4 ns, while other constructions like PRINCE, Midori-128 and QARMA<sub>9</sub>-128- $\sigma_0$  achieve 2.56 ns, 4.10 ns, 4.38 ns respectively.

Keywords: Pseudorandom Function  $\cdot$  Low Latency  $\cdot$  Lightweight Cryptography  $\cdot$ Sum of Permutations






## *SCARF*



#### **SCARF-A Low-Latency Block Cipher for Secure Cache-Randomization**

Federico Canale<sup>1</sup>, Tim Güneysu<sup>1,4</sup>, Gregor Leander<sup>1</sup>, Jan Philipp Thoma<sup>1</sup>, Yosuke Todo<sup>2</sup>, and Rei Ueno<sup>3</sup>

<sup>1</sup> Ruhr University Bochum, Bochum, Germany firstname.lastname@rub.de <sup>2</sup> NTT Social Informatics Laboratories, Tokyo, Japan yosuke.todo@ntt.com <sup>3</sup> Tohoku University, Sendai-shi, Japan. rei.ueno.a8@tohoku.ac.jp <sup>4</sup> DFKI, Bremen, Germany.

Abstract. Randomized cache architectures have proven to significantly increase the complexity of contention-based cache side-channel attacks and therefore present an important building block for side-channel secure microarchitectures. By randomizing the address-to-cache-index mapping, attackers can no longer trivially construct minimal eviction sets which are fundamental for contention-based cache attacks. At the same time, randomized caches maintain the flexibility of traditional caches, making them broadly applicable across various CPU types. This is a major advantage over cache partitioning approaches.

A large variety of randomized cache architectures has been proposed. However, the actual randomization function received little attention and is often neglected in these proposals. Since the randomization operates directly on the critical path of the cache lookup, the function needs to have extremely low latency. At the same time, attackers must not be able to bypass the randomization which would nullify the security benefit of the randomized mapping. In this paper, we propose SCARF (Secure CAche Randomization Function), the first dedicated cache randomization cipher which achieves low latency and is cryptographically secure in the cache attacker model. The design methodology for this dedicated cache cipher enters new territory in the field of block ciphers with a small 10-bit block length and heavy key-dependency in few rounds.












#### LLLWBC: A New Low-Latency Light-Weight Block Cipher

Lei Zhang<sup>1,2</sup>, Ruichen Wu<sup>1</sup>, Yuhan Zhang<sup>1</sup>, Yafei Zheng<sup>1,2</sup>, and Wenling Wu<sup>1</sup>

<sup>1</sup> Institute of Software, Chinese Academy of Sciences, Beijing 100190, China zhanglei@iscas.ac.cn <sup>2</sup> State Key Laboratory of Cryptology, P. O. Box 5159, Beijing, China

Abstract. Lightweight cipher suitable for resource constrained environment is crucial to the security of applications such as RFID, Internet of Things, etc. Moreover, in recent years low-latency is becoming more important and highly desirable by some specific applications which need instant response and real-time security. In this paper, we propose a new low-latency block cipher named LLLWBC. Similar to other known low-latency block ciphers, LLLWBC preserves the important $\alpha$-reflection property, namely the decryption for a key $K$ is equal to encryption with a key $K \oplus \alpha$ where $\alpha$ is a fixed constant. However, instead of the normally used SP-type construction, the core cipher employs a variant of generalized Feistel structure called extended GFS. It has 8 branches and employs byte-wise round function and nibble-wise round permutation iterated for 21 rounds. We choose the round permutations carefully together with a novel key schedule to guarantee the $\alpha$-reflection property. This allows an efficient fully unrolled implementation of LLLWBC in hardware and the overhead of decryption on top of encryption is negligible. Moreover, because of the involutory property of extended GFS, the inverse round function is not needed, which makes it possible to be implemented in round-based architecture with a competitive area cost. Furthermore, our security evaluation shows that LLLWBC can achieve enough security margin within the constraints of security claims. Finally, we evaluate the hardware and software performances of LLLWBC on various platforms and a brief comparison with other low-latency ciphers is also presented.

Keywords: Block cipher · Low-latency · Lightweight · Extended GFS
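The $\alpha$-reflection property from the abstract can be checked on a toy cipher. The sketch below is not LLLWBC (which uses an extended GFS); it is a 16-bit SP-type toy with a symmetric round structure around an involutive middle layer, chosen only because it is short. The S-box, the rotation amounts, the constant `ALPHA`, and the round constants are all assumptions. The only structural requirement is that the round constants satisfy `RC[i] ^ RC[2R+1-i] == ALPHA`.

```python
MASK = 0xFFFF
ALPHA = 0xC0AC                       # arbitrary non-zero constant (assumption)
SBOX = [0xB, 0xF, 0x3, 0x2, 0xA, 0xC, 0x9, 0x1,
        0x6, 0x7, 0x8, 0x0, 0xE, 0x5, 0xD, 0x4]   # illustrative 4-bit permutation
SBOX_INV = [SBOX.index(v) for v in range(16)]
R = 2                                # rounds on each side of the middle layer
_HALF = [0x0000, 0x1319, 0xA409]     # freely chosen constants RC[0..R]
# The second half is forced by RC[i] ^ RC[2R+1-i] == ALPHA.
RC = _HALF + [c ^ ALPHA for c in reversed(_HALF)]


def rotl(x, n):
    return ((x << n) | (x >> (16 - n))) & MASK


def sub(x, box):
    """Apply a 4-bit S-box to each nibble of a 16-bit word."""
    return sum(box[(x >> s) & 0xF] << s for s in (0, 4, 8, 12))


def lin(x):
    return rotl(x, 5)


def lin_inv(x):
    return rotl(x, 11)


def middle(x):
    """S-box, byte swap, inverse S-box: an involution."""
    return sub(rotl(sub(x, SBOX), 8), SBOX_INV)


def encrypt(key, x):
    x ^= key ^ RC[0]
    for i in range(1, R + 1):
        x = lin(sub(x, SBOX)) ^ key ^ RC[i]
    x = middle(x)
    for i in range(R + 1, 2 * R + 1):
        x = sub(lin_inv(x ^ key ^ RC[i]), SBOX_INV)
    return x ^ key ^ RC[2 * R + 1]


def decrypt(key, y):
    """Explicit step-by-step inverse of encrypt (no reflection trick used)."""
    y ^= key ^ RC[2 * R + 1]
    for i in range(2 * R, R, -1):
        y = lin(sub(y, SBOX)) ^ key ^ RC[i]
    y = middle(y)
    for i in range(R, 0, -1):
        y = sub(lin_inv(y ^ key ^ RC[i]), SBOX_INV)
    return y ^ key ^ RC[0]


key, block = 0x2B7E, 0x3243
reflection_holds = decrypt(key, block) == encrypt(key ^ ALPHA, block)
```

In hardware this means a single unrolled encryption datapath also serves for decryption; only the key input is XORed with $\alpha$, which is why the abstract calls the decryption overhead negligible.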



Table 8. Performance results of round-based version of LLLWBC and PRINCE.



Table 7. Performance results of fully unrolled version of LLLWBC and other ciphers.

### *Sonic and SuperSonic*




### Introducing two Low-Latency Cipher Families: Sonic and SuperSonic

Yanis Belkheyar<sup>1</sup>, Joan Daemen<sup>1</sup>, Christoph Dobraunig<sup>2</sup>, Santosh Ghosh<sup>2</sup> and Shahram Rasoolzadeh<sup>1</sup>

<sup>1</sup> Digital Security Group, Radboud University, Nijmegen, The Netherlands firstname.lastname@ru.nl

<sup>2</sup> Intel Labs, Hillsboro, USA firstname.lastname@intel.com

Abstract. For many latency-critical operations in computer systems, like memory reads/writes, adding encryption can have a big impact on the performance. Hence, the existence of cryptographic primitives with good security properties and minimal latency is a key element in the wide-spread implementation of such security measures. In this paper, we introduce two new families of low-latency permutations/block ciphers called SONIC and SUPERSONIC, inspired by the SIMON block ciphers.

Keywords: low-latency, Simon, Sonic, SuperSonic, Feistel structure, gate-delay-balanced Feistel, block cipher





Figure 1: Comparison of round functions of SIMON, SONIC, and SUPERSONIC.
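As a reference point for Figure 1, the sketch below shows the publicly specified SIMON Feistel round on 16-bit words (the word size of SIMON32) together with its inverse. The SONIC and SUPERSONIC rounds are deliberately not reproduced, since only the figure caption survives here; the function names are illustrative.

```python
WORD = 16
MASK = (1 << WORD) - 1


def rotl(x, r):
    """Rotate a WORD-bit value left by r positions."""
    return ((x << r) | (x >> (WORD - r))) & MASK


def f(x):
    """SIMON non-linear function: one AND of two rotations plus a rotation."""
    return (rotl(x, 1) & rotl(x, 8)) ^ rotl(x, 2)


def simon_round(x, y, k):
    """One Feistel round: (x, y) -> (y ^ f(x) ^ k, x)."""
    return y ^ f(x) ^ k, x


def simon_round_inv(x, y, k):
    """Inverse of simon_round."""
    return y, x ^ f(y) ^ k


state = simon_round(0x6565, 0x6877, 0x0100)
restored = simon_round_inv(*state, 0x0100)
```

The round has a single AND gate layer followed by XORs on its critical path, which is the kind of shallow logic depth that makes SIMON-like rounds a natural starting point for low-latency designs.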

### *BipBip*

### BipBip: A Low-Latency Tweakable Block Cipher with Small Dimensions

Yanis Belkheyar<sup>1</sup>, Joan Daemen<sup>1</sup>, Christoph Dobraunig<sup>2</sup>, Santosh Ghosh<sup>2</sup> and Shahram Rasoolzadeh<sup>1</sup>

<sup>1</sup> Digital Security Group, Radboud University, Nijmegen, The Netherlands firstname.lastname@ru.nl

<sup>2</sup> Intel Labs, Hillsboro, USA firstname.lastname@intel.com

**Abstract.** Recently, a memory safety concept called Cryptographic Capability Computing ($C^3$) has been proposed. $C^3$ is the first memory safety mechanism that works without requiring extra storage for metadata and hence, has the potential to significantly enhance the security of modern IT-systems at a rather low cost. To achieve this, $C^3$ heavily relies on ultra-low-latency cryptographic primitives. However, the most crucial primitive required by $C^3$ demands uncommon dimensions. To partially encrypt 64-bit pointers, a 24-bit tweakable block cipher with a 40-bit tweak is needed. The research on low-latency tweakable block ciphers with such small dimensions is not very mature. Therefore, designing such a cipher provides a great research challenge, which we take on with this paper. As a result, we present BipBip, a 24-bit tweakable block cipher with a 40-bit tweak that allows for ASIC implementations with a latency of 3 cycles at a 4.5 GHz clock frequency on a modern 10 nm CMOS technology.

Keywords: BipBip · low-latency · tweakable block cipher



| Cipher | Critical path: gate levels | Critical path: delay [ps] | Area [GE] | Power [mW] |
|----------------|-----|------|-------|-------|
| PRINCE Enc/Dec | 74  | 853  | 7542  | 42.71 |
| **BipBip Dec** | 48  | 622  | 5741  | 15.91 |
| **BipBip Enc** | 148 | 1523 | 10776 | 19.23 |

**Table 9:** Unrolled implementation results on Intel 10nm with 0.85 V and 100°C.
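As a back-of-the-envelope consistency check between the abstract and Table 9, the 3-cycle latency at 4.5 GHz corresponds to the following time budget (assuming, as the slide does not state explicitly, that the 3-cycle figure refers to the decryption direction):

$$
t_{\text{budget}} = \frac{3\ \text{cycles}}{4.5 \times 10^{9}\ \text{cycles/s}} \approx 667\ \text{ps}
$$

The unrolled BipBip decryption delay of 622 ps fits within this budget, whereas the 1523 ps encryption delay would need more than twice as long. This suggests the design favours the decryption direction, which is the one exercised when a pointer is used.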



Figure 2: Structure of BipBip.



Ad Hoc Networks 160 (2024) 103524



#### IoVCipher: A low-latency lightweight block cipher for internet of vehicles

Xiantong Huang<sup>a,b</sup>, Lang Li<sup>a,b,\*</sup>, Hong Zhang<sup>a,b</sup>, Jinling Yang<sup>a,b</sup>, Juanli Kuang<sup>a,b,c</sup>

<sup>a</sup> College of Computer Science and Technology, Hengyang Normal University, Hengyang 421002, China

<sup>b</sup> Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang Normal University, Hengyang 421002, China

<sup>c</sup> Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China

Keywords: Low latency · Internet of vehicles · Automotive security · Electronic control units · Lightweight block cipher

**Abstract.**

The data security of the CAN bus system is receiving increasing attention with the rapid development of the Internet of Vehicles (IoV). However, traditional ciphers are not the best choice due to the limitations of computation, real-time, and resources of Electronic Control Units in vehicles. Thus, this paper proposes a lightweight block cipher, IoVCipher, to protect the security of IoV. It is designed with a focus on the latency and area in round-based architectures (both encryption and decryption) to meet these resource-constrained environments. For this purpose, two S-boxes with low latency and tiny area are constructed in this paper, one involution and one non-involution. Considering the decryption latency, a low-latency subkey generation method is designed. In addition, this paper proposes a new extended MISTY structure that makes the encryption and decryption of hardware implementations similar. In comparison to other low-latency lightweight block ciphers such as PRINCE, QARMA, MANTIS and LLLWBC, IoVCipher achieves an effective balance between latency and area in the round-based architecture, and IoVCipher has low latency, low area, and low energy in the fully unrolled architecture. Finally, IoVCipher is implemented on a real-time speed acquisition and encryption testbed to simulate encrypted transmission of real-time speed in a CAN bus environment.



Fig. 4. Encryption and decryption on CAN bus messages.

Implementation results for fully unrolled architecture (encryption-only). Estimated for 100 MHz operation.







**Classical attacks**





**Physical attacks**



# *Attack-resistant Lightweight Crypto Implementations*

- Efficiency and functionality alone are not adequate
- Implementations should also resist physical attacks
	- Side-channel attacks (SCA)
	- Fault attacks
- Countermeasures
	- Masking: Threshold implementations, etc.
	- Redundancy
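To illustrate the masking countermeasure listed above, the sketch below splits two secret bits into three Boolean shares each and evaluates an AND gate with the classic three-share, non-complete sharing used to introduce threshold implementations: output share *i* never reads input share *i*. This is a single-bit toy under stated assumptions, not the PRINCE implementation discussed later, and this particular AND sharing is known not to be uniform unless fresh randomness or more shares are added.

```python
import secrets
from itertools import product


def share(bit, n=3):
    """Split one bit into n random shares whose XOR equals the bit."""
    parts = [secrets.randbits(1) for _ in range(n - 1)]
    last = bit
    for p in parts:
        last ^= p
    return parts + [last]


def unshare(shares):
    """Recombine shares by XOR."""
    out = 0
    for s in shares:
        out ^= s
    return out


def ti_and(x, y):
    """Three-share AND: each output share omits one input share index."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    z1 = (x2 & y2) ^ (x2 & y3) ^ (x3 & y2)   # independent of x1, y1
    z2 = (x3 & y3) ^ (x1 & y3) ^ (x3 & y1)   # independent of x2, y2
    z3 = (x1 & y1) ^ (x1 & y2) ^ (x2 & y1)   # independent of x3, y3
    return [z1, z2, z3]


a, b = 1, 1
masked_result = unshare(ti_and(share(a), share(b)))
```

Each shared input bit consumes two fresh random bits here, and the single AND gate turns into nine AND terms, which gives a feel for the area and randomness overhead that the following slides quantify.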





### *Countermeasures*

- Masking: novel techniques
	- Threshold implementations
	- Domain-oriented masking
	- Comes at a cost: sharing the secret triples or quadruples the implementation cost (even for first-order security)
- Requires additional randomness as well
	- Cost of random number generators, e.g., non-linear feedback shift registers (NLFSRs)
- Redundancy against fault attacks
	- For critical parts in the design (NLFSRs)
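Redundancy against fault attacks can be as simple as computing twice and comparing. The sketch below is a software toy under stated assumptions: `toy_encrypt` is a placeholder operation rather than a real cipher, and `fault` is a hypothetical hook that models an injected bit flip. Hardware designs typically compare in logic and may use spatial or coded redundancy instead of plain duplication.

```python
from typing import Callable, Optional


def toy_encrypt(x: int, key: int) -> int:
    """Placeholder for a cipher call (not a real cipher)."""
    return ((x ^ key) * 0x9E37 + 0x79B9) & 0xFFFF


def protected_call(x: int, key: int,
                   fault: Optional[Callable[[int], int]] = None) -> Optional[int]:
    """Run the operation twice; release the result only if both runs agree."""
    first = toy_encrypt(x, key)
    if fault is not None:
        first = fault(first)           # model a fault hitting one computation
    second = toy_encrypt(x, key)       # redundant computation
    return first if first == second else None   # suppress faulty output


ok = protected_call(0x1234, 0xBEEF)
faulty = protected_call(0x1234, 0xBEEF, fault=lambda v: v ^ 0x0001)
```

Suppressing the output on a mismatch denies the attacker the faulty ciphertext that differential fault analysis needs, at the price of doubling the protected logic or its execution time.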



### *PRINCE: When Protected*





### *SCA-resistant "Threshold Implementation" of PRINCE*

### **Božilov et al. (KU Leuven) at LWC Workshop 2016**

#### Threshold Implementations of PRINCE: The Cost of Physical Security

**Abstract.** Threshold implementations have recently emerged as one of the most popular masking countermeasures for hardware implementations of cryptographic primitives. In the original version of TI, the number of input shares was dependent on both security order $d$ and algebraic degree of a function $t$, namely $td + 1$. At CRYPTO 2015 Reparaz et al. presented a way to perform $d$-th order secure implementation using $d+1$ shares. Here we analyze $d+1$ and $td+1$ TI versions for first and second order secure implementations of the PRINCE block cipher. We compare a plain round-based implementation of PRINCE with its secured versions and we report hardware figures to indicate the overhead introduced by adding a side-channel protection.
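The two share-count formulas in the abstract can be worked through for the orders considered in this deck. Assuming the cubic PRINCE S-box is first decomposed into quadratic functions (which the use of class $Q_{294}$ suggests, though the slides do not spell it out), each stage has algebraic degree $t = 2$:

$$
\begin{aligned}
td+1:&\quad d=1 \Rightarrow 2\cdot 1+1 = 3 \text{ shares}, &\quad d=2 \Rightarrow 2\cdot 2+1 = 5 \text{ shares},\\
d+1:&\quad d=1 \Rightarrow 1+1 = 2 \text{ shares}, &\quad d=2 \Rightarrow 2+1 = 3 \text{ shares}.
\end{aligned}
$$

Reading "3 by 3" and "5 by 10" as input shares by output shares (an interpretation, not stated on the slides), the $td+1$ values match the first-order and second-order sharings listed for class $Q_{294}$. Without the decomposition, the degree-3 S-box would need $3d+1$ input shares, that is 4 for first order and 7 for second order.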


- Applied to the PRINCE S-box: algebraic degree 3, class $Q_{294}$
- Unprotected, round-based PRINCE





- Class  $Q_{294}$  sharing, first-order secure, 3 by 3 sharing
- No re-masking, sharing is uniform





- Class  $Q_{294}$  sharing, second-order secure, 5 by 10 sharing
- Re-masking applied





# **Results**

#### PRINCE-128 (round-based implementation) unprotected



#### PRINCE-128 (round-based implementation) 1st-order secure



#### PRINCE-128 (round-based implementation) 2nd-order secure





- Development of novel resource-efficient ciphers
	- Both SCA and fault attacks in mind
	- With the cost of randomness in mind
		- Needed for countermeasures
	- Still more work needed on low latency!



Faculty of Computer Science and Mathematics

# **Thanks for listening!**

Any questions?

([elif.kavun@uni-passau.de](mailto:elif.kavun@uni-passau.de))