ABSTRACT
Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single-thread performance remains of paramount importance, since some applications have limited thread-level parallelism (TLP), and even a small portion of code with limited TLP imposes significant constraints on overall performance, as explained by Amdahl's law.
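To make the Amdahl's law argument concrete, the following is a minimal sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of execution time that can be parallelized and n is the number of cores. The function name and the sample values are illustrative assumptions and are not taken from the paper.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction: share of execution time that can run in parallel (0..1).
    cores: number of cores applied to the parallel part (>= 1).
    Both parameter names are hypothetical, chosen for this illustration.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative values only: a 10% serial portion on 16 cores.
    print(amdahl_speedup(0.9, 16))
```

With these illustrative numbers, a program that is 90% parallel gains far less than 16x on 16 cores, which is why the serial portion dominates and single-thread performance still matters.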
In this paper we propose a novel approach for leveraging multiple cores to improve single-thread performance in a multi-core design. The proposed technique features a set of novel hardware mechanisms that support the execution of threads generated at compile time. These threads result from a fine-grain speculative decomposition of the original application, and they are executed on a modified multi-core system that includes: (1) mechanisms to support multiple versions; (2) mechanisms to detect violations among threads; (3) mechanisms to reconstruct the original sequential order; and (4) mechanisms to checkpoint the architectural state and recover from misspeculations.
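The four mechanisms listed above can be illustrated with a small software model. This is only a toy sketch under stated assumptions, not the hardware design proposed in the paper: the class names `SpecThread` and `SpeculativeMemory`, the value-based violation check, and the policy of rolling back to a single checkpoint are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SpecThread:
    order: int                                   # position in the original sequential order
    reads: dict = field(default_factory=dict)    # addr -> value observed speculatively
    writes: dict = field(default_factory=dict)   # addr -> private speculative version


class SpeculativeMemory:
    """Toy model (assumption): versions, violation checks, ordered commit, rollback."""

    def __init__(self, initial):
        self.committed = dict(initial)    # architectural state
        self.checkpoint = dict(initial)   # last known-good state

    def load(self, thread, addr):
        # (1) Multiple versions: a thread sees its own speculative write first.
        if addr in thread.writes:
            return thread.writes[addr]
        value = self.committed.get(addr, 0)
        thread.reads.setdefault(addr, value)  # remember what was observed
        return value

    def store(self, thread, addr, value):
        thread.writes[addr] = value           # kept private until commit

    def commit_in_order(self, threads):
        # (3) Reconstruct sequential order: commit oldest thread first.
        for thread in sorted(threads, key=lambda t: t.order):
            # (2) Violation detection: a value read earlier is now stale.
            for addr, seen in thread.reads.items():
                if self.committed.get(addr, 0) != seen:
                    # (4) Recovery: restore the checkpointed state.
                    self.committed = dict(self.checkpoint)
                    return False
            self.committed.update(thread.writes)
        self.checkpoint = dict(self.committed)  # new checkpoint after success
        return True


def run_example(younger_reads):
    """Older thread writes x; younger thread reads one address and writes y."""
    mem = SpeculativeMemory({"x": 1, "y": 0, "z": 7})
    older, younger = SpecThread(order=0), SpecThread(order=1)
    mem.store(older, "x", 5)
    mem.store(younger, "y", mem.load(younger, younger_reads) + 1)
    ok = mem.commit_in_order([younger, older])
    return ok, mem.committed
```

Calling `run_example("x")` models a misspeculation: the younger thread read `x` before the older thread's write became visible, so the commit fails and the state returns to the checkpoint. Calling `run_example("z")` models independent threads, and both commit in order.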
The proposed scheme outperforms previous hardware-only schemes that combine cores to execute single-thread applications in a multi-core design by more than 10% on average on SPEC2006 for all configurations. Moreover, single-thread performance is improved by 41% on average when the proposed scheme is used on a Tiny Core, and by up to 2.6x for some selected applications.