RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS
SP 1498
TI Cryptanalysis with COPACOBANA
A1 Martin Novotný,
A1 Tim Güneysu,
A1 Andy Rupp,
A1 Christof Paar,
A1 Timo Kasper,
K1 Special-Purpose and Application-Based Systems
K1 Reconfigurable hardware
K1 Cryptanalysis
AB Cryptanalysis of ciphers usually involves massive computations. The security parameters of cryptographic algorithms are commonly chosen so that attacks are infeasible with available computing resources. This contribution presents a variety of cryptanalytical applications utilizing the COPACOBANA (Cost-Optimized Parallel Code Breaker) machine which is a high-performance, low-cost cluster consisting of 120 Field Programmable Gate Arrays (FPGA). COPACOBANA appears to be the only such reconfigurable parallel FPGA machine optimized for code breaking tasks reported in the open literature. Depending on the actual algorithm, the parallel hardware architecture can outperform conventional computers by several orders of magnitude. In this work, we will focus on novel implementations of cryptanalytical algorithms, utilizing the impressive computational power of COPACOBANA. We describe various exhaustive key search attacks on symmetric ciphers and demonstrate an attack on a security mechanism employed in the electronic passport. Furthermore, we describe time-memory tradeoff techniques which can, e.g., be used for attacking the popular A5/1 algorithm used in GSM voice encryption. In addition, we introduce efficient implementations of more complex cryptanalysis on asymmetric cryptosystems, e.g., Elliptic Curve Cryptosystems (ECC) and number co-factorization for RSA.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.80
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.80

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS
SP 1550
TI Novel Table Lookup-Based Algorithms for High-Performance CRC Generation
A1 Frank L. Berry,
A1 Michael E. Kounavis,
K1 Error handling and recovery
K1 Mathematical Software
K1 Data communications
K1 Network Protocols
AB A framework for designing a family of novel fast CRC generation algorithms is presented. Our algorithms can ideally read arbitrarily large amounts of data at a time, while optimizing their memory requirement to meet the constraints of specific computer architectures. In addition, our algorithms can be implemented in software using commodity processors instead of specialized parallel circuits. We use this framework to design two efficient algorithms that run on the popular Intel IA32 processor architecture. First, a 'slicing-by-4' algorithm doubles the performance of existing software-based, table-driven CRC implementations based on the Sarwate [12] algorithm while using a 4K cache footprint. Second, a 'slicing-by-8' algorithm triples the performance of existing software-based CRC implementations while using an 8K cache footprint. Whereas well-known software-based CRC implementations compute the current CRC value from a bit-stream reading 8 bits at a time, our algorithms read 32 and 64 bits at a time, respectively. The slicing-by-8 source code is freely available for experimentation and can be found at: http://sourceforge.net/projects/slicing-by-8
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.85
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.85

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS
SP 1454
TI Algorithms and Arithmetic Operators for Computing the ηT Pairing in Characteristic Three
A1 Nicolas Brisebarre,
A1 Eiji Okamoto,
A1 Masaaki Shirase,
A1 Jérémie Detrey,
A1 Jean-Luc Beuchat,
A1 Tsuyoshi Takagi,
K1 Eta_T pairing
K1 elliptic curve
K1 finite field arithmetic
K1 hardware accelerator
K1 FPGA
AB Since their introduction in constructive cryptographic applications, pairings over (hyper)elliptic curves are at the heart of an ever-increasing number of protocols. Software implementations being rather slow, the study of hardware architectures became an active research area. In this paper, we discuss several algorithms to compute the ηT pairing in characteristic three and suggest further improvements. These algorithms involve addition, multiplication, cubing, inversion, and sometimes cube root extraction over GF(3^m). We propose a hardware accelerator based on a unified arithmetic operator able to perform the operations required by a given algorithm. We describe the implementation of a compact coprocessor for the field GF(3^97) given by GF(3)[x]/(x^97 + x^12 + 2), which compares favorably with other solutions described in the open literature.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.103
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.103

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS
SP 1514
TI Elliptic-Curve-Based Security Processor for RFID
A1 Lejla Batina,
A1 Yong Ki Lee,
A1 Kazuo Sakiyama,
A1 Ingrid Verbauwhede,
K1 Support for security
K1 Micro-architecture implementation considerations
K1 Processor Architectures
K1 Compu
K1 Special-purpose
K1 General
K1 Low-power design
AB RFID (Radio Frequency IDentification) tags need to include security functions, yet at the same time their resources are extremely limited. Moreover, to provide privacy, authentication, and protection against tracking of RFID tags without losing system scalability, a public-key based approach is inevitable, as shown by M. Burmester et al. In this paper, we present an architecture of a state-of-the-art processor for RFID tags with an Elliptic Curve (EC) processor over GF(2^163). It shows the plausibility of meeting both security and efficiency requirements even in a passive RFID tag. The proposed processor is able to perform EC scalar multiplications as well as general modular arithmetic (additions and multiplications), which are needed for the cryptographic protocols. As we work with large numbers, the register file is the most critical component in the architecture. By combining several techniques, we are able to reduce the number of registers from 9 to 6, resulting in an EC processor of 10.1K gates. To obtain efficient modular arithmetic, we introduce a redundant modular operation. Moreover, the proposed architecture can support multiple cryptographic protocols. The synthesis results with a 0.13 µm CMOS technology show that the gate area of the most compact version is 12.5K gates.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.148
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.148

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS
SP 1443
TI High-Performance Architecture of Elliptic Curve Scalar Multiplication
A1 M. Anwar Hasan,
A1 Bijan Ansari,
K1 Elliptic curves
K1 finite fields
K1 scalar multiplication
AB A high-performance architecture for elliptic curve scalar multiplication based on the Montgomery ladder method over the finite field GF(2^m) is proposed. A pseudo-pipelined word-serial finite field multiplier with word size w, suitable for the scalar multiplication, is also developed. Implemented in hardware, this system performs a scalar multiplication in approximately 6⌈m/w⌉(m−1) clock cycles, and the gate delay in the critical path is equal to T_AND + ⌈log_2(w/k)⌉ T_XOR, where T_AND and T_XOR are the delays of two-input AND and XOR gates, respectively, and 1 ≤ k ≪ w is used to shorten the critical path.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.133
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.133

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS
SP 1561
TI Efficient Approximate Wordlength Optimization
A1 Paul D. Fiore,
K1 Signal processing systems
K1 Reconfigurable hardware
K1 Multiple precision arithmetic
K1 Constrained optimization
AB In this paper, the problem of bounding the performance of good wordlength combinations for fixed-point digital signal processing flowgraphs is addressed. By formulating and solving an approximate optimization problem, a lower bounding curve on attainable cost/quality combinations is rapidly calculated. This curve and the associated wordlength combinations are useful in several situations, and can serve as starting points for real design searches. A detailed design example that utilizes these concepts is given.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.87
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.87

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS
SP 1540
TI Protection Circuit against Differential Power Analysis Attacks for Smart Cards
A1 Stefano Gregori,
K1 VLSI
K1 Security and Privacy Protection
AB In this paper, we present a circuit that protects smart cards against differential power analysis attacks. The circuit is based on a current flattening technique, is designed using a standard 0.18-µm CMOS technology, and can be integrated on the same die or in the same package with the smart card microcontroller. We evaluate the current flattening performance and the effectiveness of the protection against differential power analysis attacks. Our analysis is based on transistor-level simulations in the Cadence environment using experimental current traces collected from an 8-bit microcontroller for smart cards executing DES encryptions. The proposed circuit effectively protects against differential power analysis attacks with a small chip area overhead and a limited increase in power consumption during the encryption cycles.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.107
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.107

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS
SP 1528
TI Double-Data-Rate Computation as a Countermeasure against Fault Analysis
A1 Paolo Maistri,
A1 Régis Leveugle,
K1 AES
K1 DFA
K1 Fault detection
K1 Redundancy
AB Differential Fault Analysis (DFA) is one of the most powerful techniques to attack cryptosystems. Several countermeasures have been proposed, which are based either on information or temporal redundancy. In this work, we propose a novel approach based on a Double-Data-Rate (DDR) computation template. A few sample architectures have been implemented: they are compared to other existing architectures and countermeasures, and a thorough dependability analysis is given.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.149
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.149

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS
SP 1571
TI Configurable Flash-Memory Management: Performance versus Overheads
A1 Tei-Wei Kuo,
A1 Yi-Lin Tsai,
A1 Jen-Wei Hsieh,
A1 Tzao-Lin Lee,
K1 Storage Management
K1 Performance
AB Flash memory is widely adopted in various consumer products for information storage, especially in embedded systems. With strong demands on product designs for overhead control and performance, vendors must have an effective design for the mapping between logical block addresses (LBAs) and physical addresses of data over flash memory. This paper targets this essential issue by proposing a configurable mapping method that can trade main-memory overhead against system performance according to the needs of vendors. A series of experiments is conducted to provide insights into different configurations and the proposed method, compared to existing implementations.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.61
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.61

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS 11
SP 1469
TI Provably Sublinear Point Multiplication on Koblitz Curves and Its Hardware Implementation
A1 Wai Fong (Andy) Chan,
A1 Zhun Huang,
A1 Kimmo U. Järvinen,
A1 Vassil S. Dimitrov,
A1 Michael J. Jacobson Jr.,
K1 Elliptic curve cryptography
K1 Field-programmable gate arrays
K1 Koblitz curves
K1 multiple-base expansions
K1 parallel processing
K1 sublinearity
AB We describe algorithms for point multiplication on Koblitz curves using multiple-base expansions of the form $k = \sum \pm \tau^a (\tau-1)^b$ and $k= \sum \pm \tau^a (\tau-1)^b (\tau^2 - \tau - 1)^c.$ We prove that the number of terms in the second type is sublinear in the bit length of $k$, which leads to the first provably sublinear point multiplication algorithm on Koblitz curves. For the first type, we conjecture that the number of terms is sublinear and provide numerical evidence demonstrating that the number of terms is significantly less than that of $\tau$-adic non-adjacent form expansions. We present details of an innovative FPGA implementation of our algorithm and performance data demonstrating the efficiency of our method. We also show that implementations with very low computation latency are possible with the proposed method because parallel processing can be exploited efficiently.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.65
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.65

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS
SP 1482
TI Security Evaluation of WDDL and SecLib Countermeasures against Power Attacks
A1 Laurent Sauvage,
A1 Guido Marco Bertoni,
A1 Sylvain Guilley,
A1 Sumanta Chaudhuri,
A1 Philippe Hoogvorst,
A1 Renaud Pacalet,
K1 Types and Design Styles
K1 Power Management
AB Power-constant logic styles are promising solutions to counteract side-channel attacks on sensitive cryptographic devices. Recently, one vulnerability has been identified in a standard-cell based power-constant logic called WDDL. Another logic, nicknamed SecLib, is considered and does not present the flaw of WDDL. In this paper, we evaluate the security level of WDDL and SecLib. The methodology consists of embedding in a dedicated circuit one unprotected DES co-processor along with two others, implemented in WDDL and in SecLib. One essential part of this article describes the design of the cryptographic ASIC, devised to foster side-channel cryptanalyses, with a view to modeling the strongest possible attacker. The same analyses are carried out successively on the three DES modules. We conclude that, provided the backend of the WDDL module is carefully designed, its vulnerability cannot be exploited by state-of-the-art attacks. Similarly, the SecLib DES module resists all attacks. However, using a principal component analysis, we show that WDDL is more vulnerable than SecLib. The statistical dispersion of WDDL, which reflects the correlation between the secrets and the power dissipation, is shown to be an order of magnitude higher than that of SecLib.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.109
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.109

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS 11
SP 1441
TI Guest Editors' Introduction to the Special Section on Special-Purpose Hardware for Cryptography and Cryptanalysis
A1 Rainer Steinwandt,
A1 Çetin Kaya Koç,
A1 Willi Geiselmann,
K1
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.188
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.188

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS 11
SP 1469
TI Provably Sublinear Point Multiplication on Koblitz Curves and Its Hardware Implementation
A1 V.S. Dimitrov,
A1 K.U. Jarvinen,
A1 M.J. Jacobson,
A1 W. Chan,
A1 Zhun Huang,
K1 field programmable gate arrays
K1 cryptography
K1 parallel processing
K1 sublinear point multiplication
K1 Koblitz curves
K1 hardware implementation
K1 multiple-base expansions
K1 sublinear type
K1 FPGA implementation
K1 performance data
K1 sublinearity
K1 Elliptic curve cryptography
AB We describe algorithms for point multiplication on Koblitz curves using multiple-base expansions of the form $k = \sum \pm \tau^a (\tau-1)^b$ and $k = \sum \pm \tau^a (\tau-\mu)^b (\tau^2 - \mu\tau - 1)^c$. We prove that the number of terms in the second type is sublinear in the bit length of $k$, which leads to the first provably sublinear point multiplication algorithm on Koblitz curves. For the first type, we conjecture that the number of terms is sublinear and provide numerical evidence demonstrating that the number of terms is significantly less than that of $\tau$-adic nonadjacent form expansions. We present details of an innovative FPGA implementation of our algorithm and performance data demonstrating the efficiency of our method. We also show that implementations with very low computation latency are possible with the proposed method because parallel processing can be exploited efficiently.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.65
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.65

RT Journal Article
JF IEEE Transactions on Computers
YR 2008
VO 57
IS 11
SP 1441
TI Guest Editors' Introduction to the Special Section on Special-Purpose Hardware for Cryptography and Cryptanalysis
K1 Special issues and sections
K1 Cryptography
K1 Hardware
AB The eight papers in this special section focus on special-purpose hardware for cryptography and cryptanalysis.
PB IEEE Computer Society, [URL:http://www.computer.org]
SN 0018-9340
LA English
DO 10.1109/TC.2008.188
LK http://doi.ieeecomputersociety.org/10.1109/TC.2008.188