JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1441

TI Guest Editors' Introduction to the Special Section on Special-Purpose Hardware for Cryptography and Cryptanalysis

A1 Rainer Steinwandt,

A1 Willi Geiselmann,

A1 Çetin Kaya Koç,

K1 Special issues and sections

K1 Cryptography

K1 Hardware

AB The eight papers in this special section focus on special-purpose hardware for cryptography and cryptanalysis.

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.188

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.188

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1469

TI Provably Sublinear Point Multiplication on Koblitz Curves and Its Hardware Implementation

A1 V.S. Dimitrov,

A1 K.U. Jarvinen,

A1 M.J. Jacobson,

A1 W. Chan,

A1 Zhun Huang,

K1 field programmable gate arrays

K1 cryptography

K1 parallel processing

K1 sublinear point multiplication

K1 Koblitz curves

K1 hardware implementation

K1 multiple-base expansions

K1 sublinear type

K1 nonadjacent form expansions

K1 FPGA implementation

K1 performance data

K1 sublinearity

K1 Elliptic curve cryptography

K1 Field-programmable gate arrays

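AB We describe algorithms for point multiplication on Koblitz curves using multiple-base expansions of the form $k = \sum \pm \tau^a (\tau-1)^b$ and $k = \sum \pm \tau^a (\tau-1)^b (\tau^2 - \tau - 1)^c$. We prove that the number of terms in the second type is sublinear in the bit length of $k$, which leads to the first provably sublinear point multiplication algorithm on Koblitz curves. For the first type, we conjecture that the number of terms is sublinear and provide numerical evidence demonstrating that the number of terms is significantly less than that of $\tau$-adic non-adjacent form expansions. We present details of an innovative FPGA implementation of our algorithm and performance data demonstrating the efficiency of our method. We also show that implementations with very low computation latency are possible with the proposed method because parallel processing can be exploited efficiently.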

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.65

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.65

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1498

TI Cryptanalysis with COPACOBANA

A1 Martin Novotný,

A1 Tim Güneysu,

A1 Andy Rupp,

A1 Christof Paar,

A1 Timo Kasper,

K1 Special-Purpose and Application-Based Systems

K1 Reconfigurable hardware

K1 Cryptanalysis

AB Cryptanalysis of ciphers usually involves massive computations. The security parameters of cryptographic algorithms are commonly chosen so that attacks are infeasible with available computing resources. This contribution presents a variety of cryptanalytical applications utilizing the COPACOBANA (Cost-Optimized Parallel Code Breaker) machine, which is a high-performance, low-cost cluster consisting of 120 Field Programmable Gate Arrays (FPGAs). COPACOBANA appears to be the only such reconfigurable parallel FPGA machine optimized for code breaking tasks reported in the open literature. Depending on the actual algorithm, the parallel hardware architecture can outperform conventional computers by several orders of magnitude. In this work, we focus on novel implementations of cryptanalytical algorithms, utilizing the impressive computational power of COPACOBANA. We describe various exhaustive key search attacks on symmetric ciphers and demonstrate an attack on a security mechanism employed in the electronic passport. Furthermore, we describe time-memory tradeoff techniques which can, e.g., be used for attacking the popular A5/1 algorithm used in GSM voice encryption. In addition, we introduce efficient implementations of more complex cryptanalysis on asymmetric cryptosystems, e.g., Elliptic Curve Cryptosystems (ECC) and number co-factorization for RSA.

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.80

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.80

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1550

TI Novel Table Lookup-Based Algorithms for High-Performance CRC Generation

A1 Frank L. Berry,

A1 Michael E. Kounavis,

K1 Error handling and recovery

K1 Mathematical Software

K1 Data communications

K1 Network Protocols

AB A framework for designing a family of novel fast CRC generation algorithms is presented. Our algorithms can ideally read arbitrarily large amounts of data at a time, while optimizing their memory requirement to meet the constraints of specific computer architectures. In addition, our algorithms can be implemented in software using commodity processors instead of specialized parallel circuits. We use this framework to design two efficient algorithms that run in the popular Intel IA32 processor architecture. First, a 'slicing-by-4' algorithm doubles the performance of existing software-based, table-driven CRC implementations based on the Sarwate [12] algorithm while using a 4K cache footprint. Second, a 'slicing-by-8' algorithm triples the performance of existing software-based CRC implementations while using an 8K cache footprint. Whereas well-known software-based CRC implementations compute the current CRC value from a bit-stream reading 8 bits at a time, our algorithms read 32 and 64 bits at a time, respectively. The slicing-by-8 source code is freely available for experimentation and can be found at: http://sourceforge.net/projects/slicing-by-8

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.85

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.85

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1454

TI Algorithms and Arithmetic Operators for Computing the η_T Pairing in Characteristic Three

A1 Nicolas Brisebarre,

A1 Eiji Okamoto,

A1 Masaaki Shirase,

A1 Jérémie Detrey,

A1 Jean-Luc Beuchat,

A1 Tsuyoshi Takagi,

K1 Eta_T pairing

K1 elliptic curve

K1 finite field arithmetic

K1 hardware accelerator

K1 FPGA

AB Since their introduction in constructive cryptographic applications, pairings over (hyper)elliptic curves are at the heart of an ever increasing number of protocols. Software implementations being rather slow, the study of hardware architectures became an active research area. In this paper, we discuss several algorithms to compute the η_T pairing in characteristic three and suggest further improvements. These algorithms involve addition, multiplication, cubing, inversion, and sometimes cube root extraction over GF(3^m). We propose a hardware accelerator based on a unified arithmetic operator able to perform the operations required by a given algorithm. We describe the implementation of a compact coprocessor for the field GF(3^97) given by GF(3)[x]/(x^97 + x^12 + 2), which compares favorably with other solutions described in the open literature.

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.103

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.103

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1514

TI Elliptic-Curve-Based Security Processor for RFID

A1 Lejla Batina,

A1 Yong Ki Lee,

A1 Kazuo Sakiyama,

A1 Ingrid Verbauwhede,

K1 Support for security

K1 Micro-architecture implementation considerations

K1 Processor Architectures

K1 Compu

K1 Special-purpose

K1 General

K1 Low-power design

AB RFID (Radio Frequency IDentification) tags need to include security functions, yet at the same time their resources are extremely limited. Moreover, to provide privacy, authentication and protection against tracking of RFID tags without losing the system scalability, a public-key based approach is inevitable, as shown by M. Burmester et al. In this paper, we present an architecture of a state-of-the-art processor for RFID tags with an Elliptic Curve (EC) processor over GF(2^163). It shows the plausibility of meeting both security and efficiency requirements even in a passive RFID tag. The proposed processor is able to perform EC scalar multiplications as well as general modular arithmetic (additions and multiplications) which are needed for the cryptographic protocols. As we work with large numbers, the register file is the most critical component in the architecture. By combining several techniques, we are able to reduce the number of registers from 9 to 6, resulting in an EC processor of 10.1K gates. To obtain an efficient modulo arithmetic, we introduce a redundant modular operation. Moreover, the proposed architecture can support multiple cryptographic protocols. The synthesis results with a 0.13 µm CMOS technology show that the gate area of the most compact version is 12.5K gates.

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.148

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.148

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1443

TI High-Performance Architecture of Elliptic Curve Scalar Multiplication

A1 M. Anwar Hasan,

A1 Bijan Ansari,

K1 Elliptic curves

K1 finite fields

K1 scalar multiplication

AB A high-performance architecture of elliptic curve scalar multiplication based on the Montgomery ladder method over the finite field GF(2^m) is proposed. A pseudo-pipelined word-serial finite field multiplier with word size w, suitable for the scalar multiplication, is also developed. Implemented in hardware, this system performs a scalar multiplication in approximately 6⌈m/w⌉(m−1) clock cycles and the gate delay in the critical path is equal to T_AND + ⌈log_2(w/k)⌉ T_XOR, where T_AND and T_XOR are delays due to two-input AND and XOR gates, respectively, and 1 ≤ k ≪ w is used to shorten the critical path.

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.133

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.133

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1561

TI Efficient Approximate Wordlength Optimization

A1 Paul D. Fiore,

K1 Signal processing systems

K1 Reconfigurable hardware

K1 Tradeoffs between Complexity Measures

K1 Multiple precision arithmetic

K1 Constrained optimization

AB In this paper, the problem of bounding the performance of good wordlength combinations for fixed-point digital signal processing flowgraphs is addressed. By formulating and solving an approximate optimization problem, a lower bounding curve on attainable cost/quality combinations is rapidly calculated. This curve and the associated wordlength combinations are useful in several situations, and can serve as starting points for real design searches. A detailed design example that utilizes these concepts is given.

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.87

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.87

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1540

TI Protection Circuit against Differential Power Analysis Attacks for Smart Cards

A1 Stefano Gregori,

A1 Radu Muresan,

K1 VLSI

K1 Security and Privacy Protection

AB In this paper, we present a circuit that protects smart cards against differential power analysis attacks. The circuit is based on a current flattening technique, is designed using a standard 0.18-µm CMOS technology, and can be integrated on the same die or in the same package with the smart card microcontroller. We evaluate the current flattening performance and the effectiveness of the protection against differential power analysis attacks. Our analysis is based on transistor-level simulations in the Cadence environment using experimental current traces collected from an 8-bit microcontroller for smart cards executing DES encryptions. The proposed circuit effectively protects against differential power analysis attacks with small chip area overhead and a limited increase in power consumption during the encryption cycles.

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.107

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.107

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1528

TI Double-Data-Rate Computation as a Countermeasure against Fault Analysis

A1 Paolo Maistri,

A1 Régis Leveugle,

K1 AES

K1 DFA

K1 Fault detection

K1 Redundancy

AB Differential Fault Analysis (DFA) is one of the most powerful techniques to attack cryptosystems. Several countermeasures have been proposed, which are based either on information or temporal redundancy. In this work, we propose a novel approach based on a Double-Data-Rate (DDR) computation template. A few sample architectures have been implemented: they are compared to other existing architectures and countermeasures, and a thorough dependability analysis is given.

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.149

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.149

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1571

TI Configurable Flash-Memory Management: Performance versus Overheads

A1 Tei-Wei Kuo,

A1 Yi-Lin Tsai,

A1 Jen-Wei Hsieh,

A1 Tzao-Lin Lee,

K1 Storage Management

K1 Performance

AB Flash memory is widely adopted in various consumer products for information storage, especially for embedded systems. With strong demands on product designs for overhead control and performance requirements, vendors must have an effective design for the mapping of logical block addresses (LBAs) and physical addresses of data over flash memory. This paper targets this essential issue by proposing a configurable mapping method that can trade main-memory overhead against system performance according to the needs of vendors. A series of experiments is conducted to provide insights on different configurations and the proposed method, compared to existing implementations.

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.61

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.61

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1469

TI Provably Sublinear Point Multiplication on Koblitz Curves and Its Hardware Implementation

A1 Wai Fong (Andy) Chan,

A1 Zhun Huang,

A1 Kimmo U. Järvinen,

A1 Vassil S. Dimitrov,

A1 Michael J. Jacobson Jr.,

K1 Elliptic curve cryptography

K1 Field-programmable gate arrays

K1 Koblitz curves

K1 multiple-base expansions

K1 parallel processing

K1 sublinearity

AB We describe algorithms for point multiplication on Koblitz curves using multiple-base expansions of the form $k = \sum \pm \tau^a (\tau-1)^b$ and $k= \sum \pm \tau^a (\tau-1)^b (\tau^2 - \tau - 1)^c.$ We prove that the number of terms in the second type is sublinear in the bit length of $k$, which leads to the first provably sublinear point multiplication algorithm on Koblitz curves. For the first type, we conjecture that the number of terms is sublinear and provide numerical evidence demonstrating that the number of terms is significantly less than that of $\tau$-adic non-adjacent form expansions. We present details of an innovative FPGA implementation of our algorithm and performance data demonstrating the efficiency of our method. We also show that implementations with very low computation latency are possible with the proposed method because parallel processing can be exploited efficiently.

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.65

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.65

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1482

TI Security Evaluation of WDDL and SecLib Countermeasures against Power Attacks

A1 Laurent Sauvage,

A1 Guido Marco Bertoni,

A1 Sylvain Guilley,

A1 Sumanta Chaudhuri,

A1 Philippe Hoogvorst,

A1 Renaud Pacalet,

K1 Types and Design Styles

K1 Power Management

AB Power-constant logic styles are promising solutions to counteract side-channel attacks on sensitive cryptographic devices. Recently, one vulnerability has been identified in a standard-cell based power-constant logic called WDDL. Another logic, nicknamed SecLib, is considered and does not present the flaw of WDDL. In this paper, we evaluate the security level of WDDL and SecLib. The methodology consists of embedding in a dedicated circuit one unprotected DES co-processor along with two others, implemented in WDDL and in SecLib. One essential part of this article is to describe the design of the cryptographic ASIC, devised to foster side-channel cryptanalyses, with a view to modeling the strongest possible attacker. The same analyses are carried out successively on the three DES modules. We conclude that, provided the backend of the WDDL module is carefully designed, its vulnerability cannot be exploited by the state-of-the-art attacks. Similarly, the SecLib DES module resists all assaults. However, using a principal component analysis, we show that WDDL is more vulnerable than SecLib. The statistical dispersion of WDDL, which reflects the correlation between the secrets and the power dissipation, is proved to be an order of magnitude higher than that of SecLib.

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.109

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.109

RT Journal Article

JF IEEE Transactions on Computers

YR 2008

VO 57

IS 11

SP 1441

TI Guest Editors' Introduction to the Special Section on Special-Purpose Hardware for Cryptography and Cryptanalysis

A1 Rainer Steinwandt,

A1 Çetin Kaya Koç,

A1 Willi Geiselmann,

PB IEEE Computer Society, [URL:http://www.computer.org]

SN 0018-9340

LA English

DO 10.1109/TC.2008.188

LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.188