JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 492

TI Error Analysis and Detection Procedures for a Hardware Implementation of the Advanced Encryption Standard

A1 Guido Bertoni,

A1 Paolo Maistri,

A1 Israel Koren,

A1 Luca Breveglieri,

A1 Vincenzo Piuri,

K1 Advanced Encryption Standard

K1 AES

K1 fault tolerance

K1 fault detection

K1 parity codes.

AB <p><b>Abstract</b>—The goal of the Advanced Encryption Standard (AES) is to achieve secure communication. The use of AES does not, however, guarantee reliable communication. Prior work has shown that even a single transient error occurring during the AES encryption (or decryption) process will very likely result in a large number of errors in the encrypted/decrypted data. Such faults must be detected before the data are sent, to avoid the transmission and use of erroneous data. Concurrent fault detection is important not only to protect the encryption/decryption process from random faults, but also to protect the encryption/decryption circuitry from an attacker who may maliciously inject faults in order to find the secret encryption key. In this paper, we first describe some studies of the effects that faults may have on a hardware implementation of AES by analyzing the propagation of such faults to the outputs. We then present two fault detection schemes: the first is redundancy-based, while the second uses an error detecting code. The latter is a novel scheme that leads to very efficient, high-coverage fault detection. Finally, we estimate the hardware costs and detection latencies of both schemes.</p>
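
The keywords name parity codes as the error detecting code. The following is a minimal sketch of how one parity bit per byte can protect an S-box lookup. It assumes that the table stores, next to each output byte, the predicted parity of that byte. The names `SBOX`, `SBOX_PARITY`, `protected_sbox` and the single-bit fault model are illustrative assumptions, not the authors' exact design.

```python
# Sketch: per-byte parity protection of the AES S-box lookup.
# Assumption: one parity bit per byte; the table stores the predicted output parity.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Inverse via a^254; this also yields 0 for a = 0, as the AES S-box requires."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox(x):
    """AES S-box: field inversion followed by the affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

def parity(x):
    return bin(x).count("1") & 1

SBOX = [sbox(x) for x in range(256)]
SBOX_PARITY = [parity(y) for y in SBOX]  # predicted parity stored with each entry

def protected_sbox(x, px):
    """Check the input parity bit, then return (output byte, predicted output parity)."""
    if parity(x) != px:
        raise ValueError("parity error on S-box input")
    return SBOX[x], SBOX_PARITY[x]

def output_ok(y, py):
    """A later stage recomputes the parity of y and compares it with the predicted bit."""
    return parity(y) == py
```

Any single-bit fault in the output byte flips its parity and is caught. An even number of flipped bits in the same byte escapes a single parity bit, which is the usual coverage trade-off of parity codes.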

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190590

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190590

RT Journal Article

JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 403

TI A High-Speed Oscillator-Based Truly Random Number Source for Cryptographic Applications on a Smart Card IC

A1 Marco Bucci,

A1 Raimondo Luzzi,

A1 Lucia Germani,

A1 Alessandro Trifiletti,

A1 Mario Varanonuovo,

K1 Random number generator

K1 noise source

K1 jittered oscillator

K1 Smart Cards.

AB <p><b>Abstract</b>—The design of a high-speed IC random number source macro-cell, suitable for integration in a Smart Card microcontroller, is presented. The oscillator sampling technique is exploited, and a jittered oscillator featuring an amplified thermal noise source has been designed in order to increase the output throughput and the statistical quality of the generated bit sequences. The oscillator feedback loop acts as an offset compensation for the noise amplifier, thus solving one of the major issues in this kind of circuit. A numerical model of the proposed system has been developed, which allows us to derive an analytical expression for the transition probability between successive bits in the output stream. A prototype chip has been fabricated in a standard digital <tmath>0.18\mu m</tmath> <it>n</it>-well CMOS process; it features a 10 Mbps throughput and passes the NIST FIPS and correlation-based tests for randomness. The macro-cell area, excluding pads, is <tmath>0.016mm^{2}</tmath> (<tmath>184\mu m \times 86\mu m</tmath>), and a <tmath>2.3mW</tmath> power consumption has been measured.</p>

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190581

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190581

RT Journal Article

JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 506

TI Two-Version Based Concurrency Control and Recovery in Real-Time Client/Server Databases

A1 Tei-Wei Kuo,

A1 Chin-Fu Kuo,

A1 Yuan-Ting Kao,

K1 Real-time database

K1 concurrency control

K1 recovery

K1 read-only transactions

K1 client/server databases

K1 logging

K1 write through procedure.

AB <p><b>Abstract</b>—While there has been a significant amount of research in real-time concurrency control, little work has been done in logging and recovery for real-time databases. This paper proposes a two-version approach that considers both real-time concurrency control and recovery. We propose a network-server-based architecture and algorithms that not only reduce the blocking time of higher-priority transactions and improve the response time of client-side read-only transactions, but also provide a diskless runtime logging mechanism and an efficient and predictable recovery procedure. The performance of the algorithms was evaluated through a series of simulation experiments comparing them with the well-known Priority Ceiling Protocol (PCP), the Read/Write PCP, the New PCP, and the two-version two-phase locking protocol, with very encouraging results. The schedulability of higher-priority transactions and the response time of client-side read-only transactions were both greatly improved.</p>

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190591

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190591

RT Journal Article

JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 449

TI A Scalable Dual-Field Elliptic Curve Cryptographic Processor

A1 Kohji Takano,

A1 Akashi Satoh,

K1 Elliptic curve cryptography

K1 public key cryptography

K1 Montgomery multiplication

K1 Galois field

K1 high-speed hardware

K1 ASIC implementation.

AB <p><b>Abstract</b>—We propose an elliptic curve (EC) cryptographic processor architecture that can support Galois fields <tmath>{\rm GF}(p)</tmath> and <tmath>{\rm GF}(2^n)</tmath> for arbitrary prime numbers and irreducible polynomials by introducing a dual-field multiplier. A Montgomery multiplier with an optimized data bus and an on-the-fly redundant binary converter boost the throughput of the EC scalar multiplication. Popular cryptographic functions such as DSA, EC-DSA, RSA, CRT, and prime generation are also supported. All commands are organized in a hierarchical structure according to their complexity. Our processor is highly scalable and offers flexible trade-offs among speed, hardware area, and operand size. In a hardware evaluation using a 0.13-<tmath>\mu</tmath>m CMOS standard cell library, the high-speed design, using 117.5 Kgates with a 64-bit multiplier, achieved operation times of 1.21 ms and 0.19 ms for a 160-bit EC scalar multiplication in <tmath>{\rm GF}(p)</tmath> and <tmath>{\rm GF}(2^n)</tmath>, respectively. A compact version with an 8-bit multiplier requires only 28.3 Kgates and executes the same operations in 7.47 ms and 2.79 ms. Not only 160-bit operations but operations of any bit length can be supported by any hardware configuration, as long as the memory capacity is sufficient.</p>
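
A dual-field multiplier is possible because Montgomery multiplication has the same structure in GF(p) and GF(2^n). The integer version adds with carries, and the binary-polynomial version adds with XOR. The minimal bit-serial (radix-2) sketch below assumes single-bit digits rather than the 8-bit or 64-bit multipliers of the processor. The function names are hypothetical.

```python
def mont_mul_int(a, b, n, k):
    """Radix-2 Montgomery product a*b*2^(-k) mod n, for odd n < 2^k and a, b < n."""
    t = 0
    for i in range(k):
        t += ((a >> i) & 1) * b   # add b if bit i of a is set (integer addition, with carries)
        if t & 1:
            t += n                # make t even so the division by 2 is exact
        t >>= 1                   # invariant: t < 2n
    return t - n if t >= n else t

def mont_mul_gf2(a, b, f, k):
    """Radix-2 Montgomery product a*b*x^(-k) mod f(x) over GF(2); deg f = k, f(0) = 1, deg a, b < k."""
    t = 0
    for i in range(k):
        if (a >> i) & 1:
            t ^= b                # addition in GF(2)[x] is XOR: no carries
        if t & 1:
            t ^= f                # clear the x^0 coefficient so the division by x is exact
        t >>= 1                   # deg t stays below k, so no final subtraction is needed
    return t

def gf2_mulmod(a, b, f):
    """Schoolbook reference: a*b mod f(x) over GF(2), for deg a < deg f."""
    k = f.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> k) & 1:
            a ^= f
    return r
```

The two loops differ only in carry propagation and the final conditional subtraction. One common way to share a single datapath in hardware is to disable the carries in GF(2^n) mode. The sketch shows the arithmetic only, not the processor's redundant binary converter or optimized data bus.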

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190586

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190586

RT Journal Article

JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 428

TI Efficient Multiplication Beyond Optimal Normal Bases

A1 Arash Reyhani-Masoleh,

A1 M. Anwar Hasan,

K1 Finite fields

K1 multiplication

K1 normal bases

K1 composite fields

K1 optimal bases.

AB <p><b>Abstract</b>—In cryptographic applications, the use of normal bases to represent elements of the finite field <tmath>{\rm GF}( 2^{m})</tmath> is quite advantageous, especially for hardware implementation. In this article, we consider an important field operation, namely multiplication, which is used in many cryptographic functions. We present a class of algorithms for normal basis multiplication in <tmath>{\rm GF}( 2^{m})</tmath>. Our proposed multiplication algorithm for composite finite fields requires a significantly lower number of bit-level operations and, hence, can reduce the space complexity of cryptographic systems.</p>

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190584

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190584

RT Journal Article

JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 401

TI Guest Editors' Introduction to the Special Section on Cryptographic Hardware and Embedded Systems

A1 Christof Paar,

A1 Çetin K. Koç,

K1

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190580

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190580

RT Journal Article

JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 525

TI What Designers of Bus and Network Architectures Should Know about Hypercubes

A1 M. Sami Fadali,

A1 Kirk F. Korver,

A1 Laurence E. LaForge,

K1 Hypercube fault tolerance

K1 hypercube latency

K1 configuration architectures

K1 performability

K1 quorums

K1 Hamming graphs

K1 K-cubes

K1 Moore graphs

K1 Moore Bound

K1 C-cubes

K1 Lee distance.

AB <p><b>Abstract</b>—We quantify why, as designers, we should prefer clique-based hypercubes (<it>K-cubes</it>) over traditional hypercubes based on cycles (<it>C-cubes</it>). Reaping fresh analytic results, we find that K-cubes minimize the wirecount and, <it>simultaneously</it>, the latency of hypercube architectures that tolerate failure of any <tmath>f</tmath> nodes. Refining the graph model of Hayes (1976), we pose the feasibility of configuration as a problem in multivariate optimization:</p><p>What <tmath>(f + 1){\hbox{-}}{\rm connected}</tmath> <tmath>n{\hbox{-}}{\rm vertex}</tmath> graphs with fewest edges <tmath>\lceil n ( f + 1) / 2\rceil</tmath> minimize the maximum a) radius or b) diameter of subgraphs (i.e., <it>quorums</it>) induced by deleting up to <tmath>f</tmath> vertices? (1)</p><p>We solve (1) for <tmath>f</tmath> that is superlogarithmic but sublinear in <tmath>n</tmath> and, in the process, prove: 1) the fault tolerance of K-cubes is proportionally greater than that of C-cubes; 2) quorums formed from K-cubes have a diameter that is asymptotically convergent to the Moore Bound on radius; 3) under any conditions of scaling, by contrast, C-cubes diverge from the Moore Bound. Thus, K-cubes are <it>optimal</it>, while C-cubes are <it>suboptimal</it>. Our exposition furthermore: 4) counterexamples, corrects, and generalizes a mistaken claim by Armstrong and Gray (1981) concerning binary cubes; 5) proves that K-cubes and certain of their quorums are the <it>only</it> graphs which can be labeled such that the edge distance between any two vertices equals the Hamming distance between their labels; and 6) extends our results to K-cube-connected cycles and edges. We illustrate and motivate our work with applications to the synthesis of multicomputer architectures for deep space missions.</p>
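
Point 5) can be checked on a small case. In a K-cube, each dimension is a clique K_q, so the edge distance between two labels equals their Hamming distance. The sketch below assumes that a C-cube is the product of cycles C_q, whose edge distance is the Lee distance listed in the keywords. Both the construction and the function names are illustrative assumptions.

```python
from collections import deque
from itertools import product

def vertices(d, q):
    """All d-digit labels over the alphabet {0, ..., q-1}."""
    return list(product(range(q), repeat=d))

def k_cube_neighbors(v, q):
    """K-cube (Hamming graph): change any one coordinate to any other value (a clique per dimension)."""
    for i, x in enumerate(v):
        for y in range(q):
            if y != x:
                yield v[:i] + (y,) + v[i + 1:]

def c_cube_neighbors(v, q):
    """C-cube, assumed to be a product of cycles C_q (q >= 2): step one coordinate by +/-1 mod q."""
    for i, x in enumerate(v):
        for y in {(x + 1) % q, (x - 1) % q}:
            yield v[:i] + (y,) + v[i + 1:]

def bfs_distances(src, neighbors, q):
    """Edge distance from src to every reachable vertex, by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in neighbors(v, q):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def lee(u, v, q):
    return sum(min((a - b) % q, (b - a) % q) for a, b in zip(u, v))
```

For d = 3 and q = 4, the K-cube has diameter 3 and degree 9, while the assumed C-cube has diameter 6 and degree 6. The paper compares wirecount at equal fault tolerance, which this sketch does not model.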

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190592

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190592

RT Journal Article

JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 418

TI Low Complexity Multiplication in a Finite Field Using Ring Representation

A1 Rajendra Katti,

A1 Joseph Brennan,

K1 Finite field multiplication

K1 ring representation

K1 systolic arrays.

AB <p><b>Abstract</b>—Elements of a finite field, <tmath>GF(2^m )</tmath>, are represented as elements in a ring in which multiplication is more time efficient. This leads to faster multipliers with a modest increase in the number of XOR and AND gates needed to construct the multiplier. Such multipliers are used in error control coding and cryptography. We consider rings modulo trinomials and 4-term polynomials. In each case, we show that our multiplier is faster than multipliers over elements in a finite field defined by irreducible pentanomials. These results are especially significant in the field of elliptic curve cryptography, where pentanomials are used to define finite fields. Finally, an efficient systolic implementation of a multiplier for elements in a ring defined by <tmath>x^n + x + 1</tmath> is presented.</p>
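
Here is a toy illustration of the ring idea, with the field, the ring, and the function names chosen as assumptions to keep the numbers small. GF(2^3), defined by p(x) = x^3 + x^2 + 1, embeds in the ring GF(2)[x]/(x^5 + x + 1), because x^5 + x + 1 = (x^2 + x + 1)(x^3 + x^2 + 1). Reduction modulo the trinomial needs a single fold. Reducing a ring product modulo p(x) recovers the field product. This shows the algebra only, not the paper's gate-level or systolic multiplier.

```python
def clmul(a, b):
    """Carry-less (GF(2)[x]) product of bit-packed polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def poly_mod(a, m):
    """Remainder of a modulo m in GF(2)[x], by long division."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def ring_mul(a, b, n):
    """Multiply a, b (degree < n) in GF(2)[x]/(x^n + x + 1) with one reduction fold."""
    c = clmul(a, b)              # degree <= 2n - 2
    hi = c >> n                  # coefficients of x^n .. x^(2n-2); degree <= n - 2
    lo = c & ((1 << n) - 1)
    return lo ^ hi ^ (hi << 1)   # x^n = x + 1, so hi*x^n = hi*x + hi, and the degree stays < n
```

Because the fold uses only a shifted copy of the high half, the reduction costs about n XOR gates. The price is computing with n-bit ring elements instead of m-bit field elements.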

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190583

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190583

RT Journal Article

JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 440

TI Achieving NTRU with Montgomery Multiplication

A1 Berk Sunar,

A1 Colleen O'Rourke,

K1 Cryptography

K1 NTRU

K1 unified architectures

K1 Montgomery multipliers

K1 Montgomery multiplication

K1 finite fields.

AB <p><b>Abstract</b>—In this paper, we propose a new unified architecture that uses the Montgomery multiplication algorithm to perform modular multiplication of both integers and binary polynomials, as well as NTRU's polynomial multiplications. The unified design can support a majority of public-key cryptosystems, such as NTRU, RSA, Diffie-Hellman key exchange, and elliptic curve schemes, among others. Furthermore, the architecture is highly efficient in terms of area and speed.</p>
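
NTRU's polynomial multiplication takes place in the ring Z_q[x]/(x^N - 1), so it is a cyclic convolution of coefficient vectors. The plain reference sketch below makes that operation concrete. It does not show how the paper maps the operation onto a Montgomery datapath. The function name and the toy N and q are assumptions.

```python
def ntru_mul(a, b, q):
    """Multiply a, b in Z_q[x]/(x^N - 1), given as coefficient lists of length N (cyclic convolution)."""
    n = len(a)
    assert len(b) == n
    c = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue              # NTRU operands often have many zero coefficients; skip them
        for j, bj in enumerate(b):
            k = (i + j) % n       # x^N = 1, so exponents wrap around
            c[k] = (c[k] + ai * bj) % q
    return c
```

For example, multiplying by x cyclically rotates the coefficients of b by one position.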

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190585

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190585

RT Journal Article

JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 461

TI RSA Speedup with Chinese Remainder Theorem Immune against Hardware Fault Cryptanalysis

A1 Seongan Lim,

A1 Seungjoo Kim,

A1 Sang-Jae Moon,

A1 Sung-Ming Yen,

K1 Chinese Remainder Theorem (CRT)

K1 cryptography

K1 denial of service attack

K1 factorization

K1 fault detection

K1 fault infective CRT

K1 fault tolerance

K1 hardware fault cryptanalysis

K1 physical cryptanalysis

K1 residue number system

K1 side channel attack.

AB <p><b>Abstract</b>—This article considers how to protect fast RSA signature and decryption computations that use the residue number system speedup (the CRT-based approach) against hardware fault cryptanalysis in a highly reliable and efficient way. The CRT-based speedup for RSA signature has been widely adopted as an implementation standard on platforms ranging from large servers to very small smart IC cards. However, given a single erroneous computation result, hardware fault cryptanalysis can completely break the RSA system by factoring the public modulus. Some countermeasures using a simple verification function (e.g., raising a signature to the power of the public key) or fault detection (e.g., an expanded modulus approach) have been reported in the literature; however, this paper points out that very few of these existing solutions are both sound and efficient. Unreasonably, these methods assume that a comparison instruction is always fault-free when developing countermeasures against hardware fault cryptanalysis. Research shows that the expanded modulus approach proposed by Shamir is superior to the approach of using a simple verification function when other physical cryptanalysis (e.g., timing cryptanalysis) is considered, so we aim to improve Shamir's method. In this paper, the new concepts of fault infective CRT computation and fault infective CRT recombination are proposed. Based on these concepts, two novel protocols are developed with rigorous proofs of security. Two possible parameter settings are provided for the protocols. One setting selects a small public key <tmath>e</tmath>, with which the proposed protocols have performance comparable to Shamir's scheme. The other setting performs better than Shamir's scheme (i.e., comparably to the conventional CRT speedup), but requires a large public key.
Most importantly, we wish to emphasize the importance of developing and proving the security of physically secure protocols without relying on unreliable or unreasonable assumptions, e.g., always fault-free instructions. Related protocols are also carefully examined in this paper to point out possible weaknesses.</p>
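
The threat described above can be reproduced with toy numbers: a single faulty CRT result is enough to factor the modulus. The sketch below shows the classic Bellcore and Lenstra gcd attacks on an unprotected CRT implementation, not the paper's fault-infective protocols. The textbook parameters and the fault model (adding 1 to the mod-q half) are assumptions for illustration.

```python
from math import gcd

def crt_sign(m, p, q, d, fault=False):
    """RSA signature s = m^d mod pq computed with the CRT (Garner recombination)."""
    sp = pow(m, d % (p - 1), p)
    sq = pow(m, d % (q - 1), q)
    if fault:
        sq = (sq + 1) % q          # simulated transient fault in the mod-q half only
    q_inv = pow(q, -1, p)          # q^-1 mod p (Python 3.8+)
    h = (q_inv * (sp - sq)) % p
    return sq + q * h              # s = sq (mod q) and s = sp (mod p)

# Toy textbook parameters, far too small for real use.
p, q, e = 61, 53, 17
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))

m = 65
s = crt_sign(m, p, q, d)
s_bad = crt_sign(m, p, q, d, fault=True)

# Bellcore-style attack: s and s_bad agree mod p but not mod q.
factor_1 = gcd(s - s_bad, N)
# Lenstra's variant needs only the faulty signature and the message.
factor_2 = gcd(pow(s_bad, e, N) - m, N)
```

Checking that s^e = m mod N before releasing the signature stops this particular attack only if the final comparison itself cannot be faulted. That is exactly the assumption the abstract criticizes.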

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190587

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190587

RT Journal Article

JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 483

TI A Highly Regular and Scalable AES Hardware Architecture

A1 Stefan Mangard,

A1 Sandra Dominikus,

A1 Manfred Aigner,

K1 Advanced Encryption Standard (AES)

K1 hardware architecture

K1 IP module

K1 VLSI

K1 scalability

K1 regularity.

AB <p><b>Abstract</b>—This article presents a highly regular and scalable AES hardware architecture, suited for full-custom as well as for semi-custom design flows. Contrary to other publications, a complete architecture (even including CBC mode) that is scalable in terms of throughput and in terms of the used key size is described. Similarities of encryption and decryption are utilized to provide a high level of performance using only a relatively small area (10,799 gate equivalents for the standard configuration). This performance is reached by balancing the combinational paths of the design. No other published AES hardware architecture provides similar balancing or a comparable regularity. Implementations of the fastest configuration of the architecture provide a throughput of 241 Mbits/sec on a 0.6 <tmath>\mu</tmath>m CMOS process using standard cells.</p>

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190589

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190589

RT Journal Article

JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 473

TI Efficient Uses of FPGAs for Implementations of DES and Its Experimental Linear Cryptanalysis

A1 Gaël Rouvroy,

A1 François-Xavier Standaert,

A1 Jean-Jacques Quisquater,

A1 Jean-Didier Legat,

K1 Cryptography

K1 DES

K1 linear cryptanalysis

K1 FPGA

K1 efficient implementations.

AB <p><b>Abstract</b>—In its basic version, linear cryptanalysis is a known-plaintext attack that uses a linear relation between input bits, output bits, and key bits of an encryption algorithm that holds with a certain probability. If enough plaintext-ciphertext pairs are provided, this approximation can be used to assign probabilities to the possible keys and to locate the most probable one. In 1993, Matsui applied it to DES, and it became the best known attack against DES. In 2000, Knudsen proposed three chosen-plaintext linear attacks, the third of which became the best chosen-plaintext attack. This paper presents two original FPGA implementations of a DES encryption/decryption core that work at data rates up to 21.3 Gbps (333 MHz). We believe that our implementations are the fastest currently known. In our design, the plaintext, the key, and the mode (encryption/decryption) can be changed with no dead cycles. Based on one of our fast DES implementations, we present an FPGA implementation of the known-plaintext linear cryptanalysis of DES. The resulting design is deployed on eight FPGAs and allows us to find 12 + 1 key bits in about 2.3 hours. For comparison, the fastest software implementation known so far (in 2000) used the idle time of 18 Intel Pentium III MMX processors and broke a DES key in 4.32 days. Our fast linear cryptanalysis implementation made practical tests possible, allowing a comparison with Matsui's theoretical estimations.</p>

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190588

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190588

RT Journal Article

JF IEEE Transactions on Computers

YR 2003

VO 52

IS

SP 410

TI Efficient Implementation of Cryptographically Useful 'Large' Boolean Functions

A1 Subhamoy Maitra,

A1 Palash Sarkar,

K1 Boolean function

K1 cryptography

K1 pipelined architecture

K1 reconfigurable hardware

K1 stream cipher.

AB <p><b>Abstract</b>—We present a low-cost hardware architecture for implementing state-of-the-art theoretical constructions of secure Boolean functions suitable for stream ciphers. Using a pipelined architecture, we show that it is possible to implement systems that use Boolean functions of a relatively large number of variables. Our architecture is reconfigurable and provides a universal circuit for a certain class of secure Boolean functions.</p>
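
"Secure" Boolean functions for stream ciphers are usually judged by criteria such as balancedness and nonlinearity. Both can be read off the Walsh-Hadamard spectrum, and nonlinearity equals 2^(n-1) - max|W(a)|/2. The minimal software check below assumes that the function is given as a truth table indexed by the input integer. The names are hypothetical, and the code is a property check, not the paper's pipelined architecture.

```python
def walsh_spectrum(truth_table):
    """Fast Walsh-Hadamard transform of (-1)^f; len(truth_table) must be a power of 2."""
    w = [1 - 2 * bit for bit in truth_table]   # f = 0 -> +1, f = 1 -> -1
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                x, y = w[j], w[j + h]
                w[j], w[j + h] = x + y, x - y  # butterfly step
        h *= 2
    return w                                   # w[a] = sum over x of (-1)^(f(x) + a.x)

def nonlinearity(truth_table):
    """Distance to the nearest affine function: (2^n - max|W(a)|) / 2."""
    return (len(truth_table) - max(abs(v) for v in walsh_spectrum(truth_table))) // 2
```

The 4-variable bent function x0x1 + x2x3 reaches the maximum nonlinearity of 6, while a linear function has nonlinearity 0. A zero at spectrum index 0 indicates a balanced function.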

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2003.1190582

LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190582