RT Journal Article
JF Parallel Algorithms / Architecture Synthesis, AIZU International Symposium on
YR 1997
VO 00
SP 42
TI Run-Time Reference Clustering for Cache Performance Optimization
A1 Boleslaw K. Szymanski
A1 Viktor K. Decyk
A1 Wesley K. Kaplow
A1 Peter Tannenbaum
AB We introduce a method for improving the cache performance of irregular computations in which data are referenced through indirection arrays defined at run time. Such computations often arise in scientific problems. The presented method, called Run-Time Reference Clustering (RTRC), is a run-time analog of the compile-time blocking used for dense matrix problems. RTRC uses the data partitioning and re-mapping techniques that are part of distributed-memory multiprocessor codes designed to minimize interprocessor communication. Re-mapping each set of local data decreases cache misses in the same way that re-mapping the global data decreases off-processor references. We demonstrate the applicability and performance of the RTRC technique on several prevalent applications: Sparse Matrix-Vector Multiply, Particle-In-Cell, and CHARMM-like codes. Performance results on SPARC-20, SP-2, and T3-D processors show that single-node execution performance can be improved by as much as 35%.
PB IEEE Computer Society, [URL:http://www.computer.org]
LA English
DO 10.1109/AISPAS.1997.581623
LK http://doi.ieeecomputersociety.org/10.1109/AISPAS.1997.581623