RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 492
TI Error Analysis and Detection Procedures for a Hardware Implementation of the Advanced Encryption Standard
A1 Guido Bertoni,
A1 Paolo Maistri,
A1 Israel Koren,
A1 Luca Breveglieri,
A1 Vincenzo Piuri,
K1 Advanced Encryption Standard
K1 AES
K1 fault tolerance
K1 fault detection
K1 parity codes.
AB <p><b>Abstract</b>—The goal of the Advanced Encryption Standard (AES) is to achieve secure communication. The use of AES does not, however, guarantee reliable communication. Prior work has shown that even a single transient error occurring during the AES encryption (or decryption) process will very likely result in a large number of errors in the encrypted/decrypted data. Such faults must be detected before sending to avoid the transmission and use of erroneous data. Concurrent fault detection is important not only to protect the encryption/decryption process from random faults. It will also protect the encryption/decryption circuitry from an attacker who may maliciously inject faults in order to find the encryption secret key. In this paper, we first describe some studies of the effects that faults may have on a hardware implementation of AES by analyzing the propagation of such faults to the outputs. We then present two fault detection schemes: The first is a redundancy-based scheme while the second uses an error detecting code. The latter is a novel scheme which leads to very efficient and high coverage fault detection. Finally, the hardware costs and detection latencies of both schemes are estimated.</p>
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190590
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190590

RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 403
TI A High-Speed Oscillator-Based Truly Random Number Source for Cryptographic Applications on a Smart Card IC
A1 Marco Bucci,
A1 Raimondo Luzzi,
A1 Lucia Germani,
A1 Alessandro Trifiletti,
A1 Mario Varanonuovo,
K1 Random number generator
K1 noise source
K1 jittered oscillator
K1 Smart Cards.
AB <p><b>Abstract</b>—The design of a high-speed IC random number source macro-cell, suitable to be integrated in a Smart Card microcontroller, is presented. The oscillator sampling technique is exploited and a jittered oscillator which features an amplified thermal noise source has been designed in order to increase the output throughput and the statistical quality of the generated bit sequences. The oscillator feedback loop acts as an offset compensation for the noise amplifier, thus solving one of the major issues in this kind of circuit. A numerical model for the proposed system has been developed which allows us to carry out an analytical expression for the transition probability between successive bits in the output stream. A prototype chip has been fabricated in a standard digital <tmath>0.18\mu m</tmath> <it>n</it>-well CMOS process which features a 10Mbps throughput and fulfills the NIST FIPS and correlation-based tests for randomness. The macro-cell area, excluding pads, is <tmath>0.0016mm^{2}</tmath> (<tmath>184\mu m \times 86\mu m</tmath>) and a <tmath>2.3mW</tmath> power consumption has been measured.</p>
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190581
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190581

RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 506
TI Two-Version Based Concurrency Control and Recovery in Real-Time Client/Server Databases
A1 Tei-Wei Kuo,
A1 Chin-Fu Kuo,
A1 Yuan-Ting Kao,
K1 Real-time database
K1 concurrency control
K1 recovery
K1 read-only transactions
K1 client/server databases
K1 logging
K1 write through procedure.
AB <p><b>Abstract</b>—While there has been a significant amount of research in real-time concurrency control, little work has been done in logging and recovery for real-time databases. This paper proposes a two-version approach which considers both real-time concurrency control and recovery. We propose a network-server-based architecture and algorithms which can not only reduce the blocking time of higher-priority transactions and improve the response time of client-side read-only transactions, but also provide a diskless runtime logging mechanism and an efficient and predictable recovery procedure. The performance of the algorithms was verified by a series of simulation experiments by comparing the algorithms with the well-known Priority Ceiling Protocol (PCP), the Read/Write PCP, the New PCP, and the 2-version two-phase locking protocol, for which we have very encouraging results. The schedulability of higher-priority transactions and the response time of client-side read-only transactions were all greatly improved.</p>
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190591
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190591

RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 449
TI A Scalable Dual-Field Elliptic Curve Cryptographic Processor
A1 Kohji Takano,
A1 Akashi Satoh,
K1 Elliptic curve cryptography
K1 public key cryptography
K1 Montgomery multiplication
K1 Galois field
K1 high-speed hardware
K1 ASIC implementation.
AB <p><b>Abstract</b>—We propose an elliptic curve (EC) cryptographic processor architecture that can support Galois fields <tmath>{\rm GF}(p)</tmath> and <tmath>{\rm GF}(2^n)</tmath> for arbitrary prime numbers and irreducible polynomials by introducing a dual field multiplier. A Montgomery multiplier with an optimized data bus and an on-the-fly redundant binary converter boost the throughput of the EC scalar multiplication. All popular cryptographic functions such as DSA, EC-DSA, RSA, CRT, and prime generation are also supported. All commands are organized in a hierarchical structure according to their complexity. Our processor has high scalability and flexibility between speed, hardware area, and operand size. In the hardware evaluation using a 0.13-<tmath>\mu</tmath>m CMOS standard cell library, the high-speed design using 117.5 Kgates with a 64-bit multiplier achieved operation times of 1.21 ms and 0.19 ms for a 160-bit EC scalar multiplication in <tmath>{\rm GF}(p)</tmath> and <tmath>{\rm GF}(2^n)</tmath>, respectively. A compact version with an 8-bit multiplier requires only 28.3K gates and executes the operations in 7.47 ms and 2.79 ms. Not only 160-bit operations, but any bit length can be supported by any hardware configuration so long as the memory capacity is sufficient.</p>
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190586
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190586

RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 428
TI Efficient Multiplication Beyond Optimal Normal Bases
A1 Arash Reyhani-Masoleh,
A1 M. Anwar Hasan,
K1 Finite fields
K1 multiplication
K1 normal bases
K1 composite fields
K1 optimal bases.
AB <p><b>Abstract</b>—In cryptographic applications, the use of normal bases to represent elements of the finite field <tmath>{\rm GF}( 2^{m})</tmath> is quite advantageous, especially for hardware implementation. In this article, we consider an important field operation, namely, multiplication which is used in many cryptographic functions. We present a class of algorithms for normal basis multiplication in <tmath>{\rm GF}( 2^{m})</tmath>. Our proposed multiplication algorithm for composite finite fields requires a significantly lower number of bit level operations and, hence, can reduce the space complexity of cryptographic systems.</p>
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190584
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190584

RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 401
TI Guest Editors' Introduction to the Special Section on Cryptographic Hardware and Embedded Systems
A1 Christof Paar,
A1 Çetin K. Koç,
K1
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190580
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190580

RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 525
TI What Designers of Bus and Network Architectures Should Know about Hypercubes
A1 M. Sami Fadali,
A1 Kirk F. Korver,
A1 Laurence E. LaForge,
K1 Hypercube fault tolerance
K1 hypercube latency
K1 configuration architectures
K1 performability
K1 quorums
K1 Hamming graphs
K1 K-cubes
K1 Moore graphs
K1 Moore Bound
K1 C-cubes
K1 Lee distance.
AB <p><b>Abstract</b>—We quantify why, as designers, we should prefer clique-based hypercubes (<it>K-cubes</it>) over traditional hypercubes based on cycles (<it>C-cubes</it>). Reaping fresh analytic results, we find that K-cubes minimize the wirecount and, <it>simultaneously</it>, the latency of hypercube architectures that tolerate failure of any <tmath>f</tmath> nodes. Refining the graph model of Hayes (1976), we pose the feasibility of configuration as a problem in multivariate optimization:</p><p>What <tmath>(f + 1){\hbox{-}}{\rm connected}</tmath> <tmath>n{\hbox{-}}{\rm vertex}</tmath> graphs with fewest edges <tmath>\lceil n ( f + 1) / 2\rceil</tmath> minimize the maximum a) radius or b) diameter of subgraphs (i.e., <it>quorums</it>) induced by deleting up to <tmath>f</tmath> vertices? (1)</p><p>We solve (1) for <tmath>f</tmath> that is superlogarithmic but sublinear in <tmath>n</tmath> and, in the process, prove: 1) the fault tolerance of K-cubes is proportionally greater than that of C-cubes; 2) quorums formed from K-cubes have a diameter that is asymptotically convergent to the Moore Bound on radius; 3) under any conditions of scaling, by contrast, C-cubes diverge from the Moore Bound. Thus, K-cubes are <it>optimal</it>, while C-cubes are <it>suboptimal</it>. Our exposition furthermore: 4) counterexamples, corrects, and generalizes a mistaken claim by Armstrong and Gray (1981) concerning binary cubes; 5) proves that K-cubes and certain of their quorums are the <it>only</it> graphs which can be labeled such that the edge distance between any two vertices equals the Hamming distance between their labels; and 6) extends our results to K-cube-connected cycles and edges. We illustrate and motivate our work with applications to the synthesis of multicomputer architectures for deep space missions.</p>
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190592
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190592

RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 418
TI Low Complexity Multiplication in a Finite Field Using Ring Representation
A1 Rajendra Katti,
A1 Joseph Brennan,
K1 Finite field multiplication
K1 ring representation
K1 systolic arrays.
AB <p><b>Abstract</b>—Elements of a finite field, <tmath>GF(2^m )</tmath>, are represented as elements in a ring in which multiplication is more time efficient. This leads to faster multipliers with a modest increase in the number of XOR and AND gates needed to construct the multiplier. Such multipliers are used in error control coding and cryptography. We consider rings modulo trinomials and 4-term polynomials. In each case, we show that our multiplier is faster than multipliers over elements in a finite field defined by irreducible pentanomials. These results are especially significant in the field of elliptic curve cryptography, where pentanomials are used to define finite fields. Finally, an efficient systolic implementation of a multiplier for elements in a ring defined by <tmath>x^n + x + 1</tmath> is presented.</p>
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190583
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190583

RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 440
TI Achieving NTRU with Montgomery Multiplication
A1 Berk Sunar,
A1 Colleen O'Rourke,
K1 Cryptography
K1 NTRU
K1 unified architectures
K1 Montgomery multipliers
K1 Montgomery multiplication
K1 finite fields.
AB <p><b>Abstract</b>—In this paper, we propose a new unified architecture that utilizes the Montgomery Multiplication algorithm to perform a modular multiplication for both integers and binary polynomials and NTRU's polynomial multiplications. The unified design is capable of supporting a majority of public-key cryptosystems such as NTRU, RSA, Diffie-Hellman key exchange, and Elliptic Curve schemes, among others. Furthermore, the architecture is highly efficient in terms of area and speed.</p>
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190585
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190585

RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 461
TI RSA Speedup with Chinese Remainder Theorem Immune against Hardware Fault Cryptanalysis
A1 Seongan Lim,
A1 Seungjoo Kim,
A1 Sang-Jae Moon,
A1 Sung-Ming Yen,
K1 Chinese Remainder Theorem (CRT)
K1 cryptography
K1 denial of service attack
K1 factorization
K1 fault detection
K1 fault infective CRT
K1 fault tolerance
K1 hardware fault cryptanalysis
K1 physical cryptanalysis
K1 residue number system
K1 side channel attack.
AB <p><b>Abstract</b>—This article considers the problem of how to protect the fast RSA signature and decryption computation with residue number system speedup (also called the CRT-based approach) from hardware fault cryptanalysis in a highly reliable and efficient way. The CRT-based speedup for RSA signature has been widely adopted as an implementation standard ranging from large servers to very tiny smart IC cards. However, given a single erroneous computation result, a hardware fault cryptanalysis can totally break the RSA system by factoring the public modulus. Some countermeasures using a simple verification function (e.g., raising a signature to the power of the public key) or fault detection (e.g., an expanded modulus approach) have been reported in the literature; however, it will be pointed out in this paper that very few of these existing solutions are both sound and efficient. Unreasonably, these methods assume that a comparison instruction will always be fault-free when developing countermeasures against hardware fault cryptanalysis. Research shows that the expanded modulus approach proposed by Shamir is superior to the approach of using a simple verification function when other physical cryptanalysis (e.g., timing cryptanalysis) is considered. So, we intend to improve Shamir's method. In this paper, the new concepts of fault infective CRT computation and fault infective CRT recombination are proposed. Based on the new concepts, two novel protocols are developed with rigorous proof of security. Two possible parameter settings are provided for the protocols. One setting is to select a small public key <tmath>e</tmath> and the proposed protocols can have comparable performance to Shamir's scheme. The other setting is to have better performance than Shamir's scheme (i.e., having comparable performance to conventional CRT speedup), but with a large public key. Most importantly, we wish to emphasize the importance of developing and proving the security of physically secure protocols without relying on unreliable or unreasonable assumptions, e.g., always fault-free instructions. In this paper, related protocols are also considered and are carefully examined to point out possible weaknesses.</p>
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190587
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190587

RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 483
TI A Highly Regular and Scalable AES Hardware Architecture
A1 Stefan Mangard,
A1 Sandra Dominikus,
A1 Manfred Aigner,
K1 Advanced Encryption Standard (AES)
K1 hardware architecture
K1 IP module
K1 VLSI
K1 scalability
K1 regularity.
AB <p><b>Abstract</b>—This article presents a highly regular and scalable AES hardware architecture, suited for full-custom as well as for semi-custom design flows. Contrary to other publications, a complete architecture (even including CBC mode) that is scalable in terms of throughput and in terms of the used key size is described. Similarities of encryption and decryption are utilized to provide a high level of performance using only a relatively small area (10,799 gate equivalents for the standard configuration). This performance is reached by balancing the combinational paths of the design. No other published AES hardware architecture provides similar balancing or a comparable regularity. Implementations of the fastest configuration of the architecture provide a throughput of 241 Mbits/sec on a 0.6 <tmath>\mu</tmath>m CMOS process using standard cells.</p>
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190589
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190589

RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 473
TI Efficient Uses of FPGAs for Implementations of DES and Its Experimental Linear Cryptanalysis
A1 Gaël Rouvroy,
A1 Francois-Xavier Standaert,
A1 Jean-Jacques Quisquater,
A1 Jean-Didier Legat,
K1 Cryptography
K1 DES
K1 linear cryptanalysis
K1 FPGA
K1 efficient implementations.
AB <p><b>Abstract</b>—In its basic version, linear cryptanalysis is a known-plaintext attack that uses a linear relation between input-bits, output-bits, and key-bits of an encryption algorithm that holds with a certain probability. If enough plaintext-ciphertext pairs are provided, this approximation can be used to assign probabilities to the possible keys and to locate the most probable one. In 1993, Matsui applied it to DES, and it became the best known attack against DES. In 2000, Knudsen proposed three chosen-plaintext linear attacks, the third one becoming the best chosen-plaintext attack. This paper presents two original FPGA implementations of a DES encryption/decryption core that work at data rates up to 21.3 Gbps (333 MHz). We believe that our implementations are the fastest ones known nowadays. In our design, the plaintext, the key, and the mode (encryption/decryption) can be changed with no dead cycles. Based on one of our fast DES implementations, we present an FPGA implementation of the known-plaintext linear cryptanalysis of DES. The resulting design is deployed on eight FPGAs and allows us to find 12 + 1 key bits in about 2.3 hours. As a comparison, the fastest software implementation known so far (in 2000) used the idle time of 18 Intel Pentium III MMX and broke a DES key in 4.32 days. Our fast linear cryptanalysis implementation made the performing of practical tests possible, allowing a comparison with Matsui's theoretical estimations.</p>
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190588
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190588

RT Journal Article
JF IEEE Transactions on Computers
YR 2003
VO 52
IS
SP 410
TI Efficient Implementation of Cryptographically Useful 'Large' Boolean Functions
A1 Subhamoy Maitra,
A1 Palash Sarkar,
K1 Boolean function
K1 cryptography
K1 pipelined architecture
K1 reconfigurable hardware
K1 stream cipher.
AB <p><b>Abstract</b>—We present a low-cost hardware architecture for implementing state-of-the-art theoretical constructions of secure Boolean functions suitable for stream ciphers. Using a pipelined architecture, we show that it is possible to implement systems which use Boolean functions of a relatively large number of variables. Our architecture is reconfigurable and provides a universal circuit for a certain class of secure Boolean functions.</p>
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2003.1190582
LK http://doi.ieeecomputersociety.org/10.1109/TC.2003.1190582