DOI: 10.1145/1555754.1555813
research-article

Boosting single-thread performance in multi-core systems through fine-grain multi-threading

Published: 20 June 2009

ABSTRACT

Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single-thread performance remains of paramount importance, since some applications have limited thread-level parallelism (TLP), and even a small portion of code with limited TLP imposes significant constraints on overall performance, as explained by Amdahl's law.
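As a rough illustration of the Amdahl's law argument, the sketch below computes the overall speedup when only part of a program scales with the number of cores. The function names and the 90% figure are illustrative assumptions, not values taken from the paper.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Overall speedup when only `parallel_fraction` of the run time scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at the original speed; the parallel part is divided among cores.
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the core count grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical workload: 90% of the run time parallelizes perfectly.
    for cores in (2, 4, 16, 64):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
    print("limit", round(amdahl_limit(0.9), 2))
```

With these assumed numbers, 16 cores give a speedup of about 6.4x and no core count can exceed about 10x, which is why speeding up the sequential portion itself still matters.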

In this paper we propose a novel approach for leveraging multiple cores to improve single-thread performance in a multi-core design. The proposed technique features a set of novel hardware mechanisms that support the execution of threads generated at compile time. These threads result from a fine-grain speculative decomposition of the original application, and they are executed on a modified multi-core system that includes: (1) mechanisms to support multiple versions; (2) mechanisms to detect violations among threads; (3) mechanisms to reconstruct the original sequential order; and (4) mechanisms to checkpoint the architectural state and recover from misspeculations.
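The four mechanisms listed above can be pictured with a toy software model. The sketch below is not the paper's hardware design: the class `SpeculativeMemory`, its methods, and the idea of tagging every access with its position `seq` in the original sequential program are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeMemory:
    """Toy model; every name here is hypothetical and seq values must be >= 0."""
    committed: dict = field(default_factory=dict)   # architectural state
    stores: dict = field(default_factory=dict)      # (1) addr -> {seq: value} versions
    loads: list = field(default_factory=list)       # (addr, load_seq, source_seq)
    violated: bool = False
    checkpoint: dict = field(default_factory=dict)  # (4) saved architectural state

    def take_checkpoint(self) -> None:
        self.checkpoint = dict(self.committed)

    def store(self, seq: int, addr: str, value: int) -> None:
        # (2) Violation: a load that is later in program order already ran
        # and consumed a version older than this store.
        for a, load_seq, source_seq in self.loads:
            if a == addr and source_seq < seq < load_seq:
                self.violated = True
        self.stores.setdefault(addr, {})[seq] = value

    def load(self, seq: int, addr: str) -> int:
        # Read the closest earlier speculative version, else committed state.
        older = [s for s in self.stores.get(addr, {}) if s < seq]
        if older:
            source_seq = max(older)
            value = self.stores[addr][source_seq]
        else:
            source_seq = -1  # -1 marks "read from committed state"
            value = self.committed.get(addr, 0)
        self.loads.append((addr, seq, source_seq))
        return value

    def commit_or_recover(self) -> bool:
        if self.violated:
            # (4) Roll back to the checkpoint and drop all speculative work.
            self.committed = dict(self.checkpoint)
            ok = False
        else:
            # (3) The store latest in original program order wins per address.
            for addr, versions in self.stores.items():
                self.committed[addr] = versions[max(versions)]
            ok = True
        self.stores.clear()
        self.loads.clear()
        self.violated = False
        return ok
```

In this model, a load tagged `seq=3` that executes before a store tagged `seq=2` to the same address is flagged as a violation, and `commit_or_recover()` then restores the checkpoint instead of committing. Real designs track this in hardware structures rather than Python dictionaries.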

The proposed scheme outperforms previous hardware-only schemes that combine cores to execute single-thread applications in a multi-core design by more than 10% on average on Spec2006 for all configurations. Moreover, single-thread performance improves by 41% on average when the proposed scheme is used on a Tiny Core, and by up to 2.6x for some selected applications.

        Published in

        ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
        June 2009, 510 pages
        ISBN: 9781605585260
        DOI: 10.1145/1555754

        ACM SIGARCH Computer Architecture News, Volume 37, Issue 3
        June 2009, 495 pages
        ISSN: 0163-5964
        DOI: 10.1145/1555815

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 543 of 3,203 submissions, 17%
