Seung-Jai Min and Professor Rudolf Eigenmann
Purdue University, School of Electrical and Computer Engineering

Motivation

  • Software distributed shared memory (SDSM) systems are known to incur performance overhead. However, the concept behind a software DSM system, which is to provide a shared address space abstraction on a cluster of computing nodes, is still valid; it is the implementation mechanism that makes software DSM inefficient.
  • We do not want yet another software DSM if it is just an enhanced version of the conventional software DSM. Instead, we need to create a software DSM starting from a completely new concept. It is possible to obtain both the performance of MPI and the productivity of shared memory programming.
Design Philosophy

  • The goal of LDSM is to achieve scalable performance comparable to hand-tuned MPI applications on distributed memory systems, while retaining the ease of programming of shared memory programming models such as OpenMP.
  • It is all about the performance!
    • Ease of programming is great, but it is not sufficient to persuade people in the MPI community to use LDSM.
  • We do not want to deal with the overheads of the conventional page-based software DSMs.
    • No false sharing caused by a large, fixed-size memory coherence unit
    • No OS intervention, such as the page fault mechanism, to detect shared data accesses
  • We do not want to deal with the overheads of the conventional object-based software DSMs.
    • Shared-write detection in many object-based software DSMs is costly because it monitors every store operation to shared data.
    • Data access analysis should be performed only at synchronization points, and only when it is necessary.
  • But we still want to retain the advantages of conventional software DSMs.
    • Software DSMs can generally handle a wide range of applications; this is an advantage that needs to be retained.
    • In page-based software DSMs in particular, communication messages are aggregated efficiently thanks to the paging mechanism. However, message aggregation can be achieved even more effectively without the paging mechanism.
  • Aggressive Communication Optimizations
    • Sources of communication inefficiency
      • unnecessary communication - sending data that will not be used by the receiving processor
      • redundant communication - sending data that the receiving processor already has in its local memory.
    • Inefficient communication is caused by a lack of good data access analysis (for the solution, see the next bullet)
  • LDSM's combined compile-time/runtime data access analysis!
    • Efficiency - Runtime analysis is good at precise shared-write analysis; however, it may incur runtime overhead. Compiler analysis can reduce that overhead by pinpointing where to inspect.
    • Versatility - The conventional inspector/executor model is difficult to use, and its applicability is limited to simple indirect array accesses. LDSM's data access analysis is general enough to be applied to any access pattern, including indirect array accesses, non-affine array subscript expressions, and even pointers.
  • Compiler analysis should be simple and supportive
    • The compiler should not be a bottleneck - relying heavily on the compiler's capabilities makes the compiler implementation tedious, because handling corner cases and case-by-case scenarios takes tremendous effort. We do not need a system that works only on a limited set of simple applications.
    • Therefore, compiler analysis should play a supporting role that helps the runtime system execute efficiently. If compiler information is available, the runtime system benefits from it. If not, the runtime system should still be able to handle the program.
  • Simple and Lean Implementation
    • LDSM is simple, so its source code is easy to understand and modify; we want LDSM to be a software infrastructure for researchers who want to evaluate their optimization techniques in a distributed computing environment.
    • LDSM is portable - it is written in C with the standard MPI library, so it can easily be ported to different platforms.
An Overview of LDSM

  • An Integrated Compile-Time/Runtime Data Access Analysis


    The gist of our data access analysis is an integrated compile-time/runtime analysis - it combines what each analysis technique does well.

    Data Access Analysis

    1. Key Observations
      • Compile-time analysis - a compiler is good at summarizing future data accesses (before the program executes), but it can be conservative; complex access patterns, or access patterns not known at compile time, are overestimated.
      • Runtime analysis - both regular and irregular accesses that occurred in the past can be analyzed precisely by inspecting modified memory regions, but it cannot analyze data accesses that have not happened yet.
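
    The two observations above can be illustrated with a minimal sketch in C. It is not LDSM's actual implementation: the snapshot-and-compare ("twin") inspection is a technique borrowed from conventional software DSMs, used here only for illustration, and the names `mod_range`, `inspect_region`, `example_sync_point`, and the hint bounds are all hypothetical. The assumed compiler hint is a conservative range of elements that may have been written; at a synchronization point the runtime inspects only that range and recovers the precise runs of modified elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor: one contiguous run of modified elements [begin, end). */
typedef struct {
    size_t begin;
    size_t end;
} mod_range;

/*
 * Runtime inspection: compare elements [lo, hi) of `data` with the snapshot
 * `twin` and record each maximal run of changed elements in `out`.
 * Stops after `max_out` runs. Returns the number of runs recorded.
 */
size_t inspect_region(const int *data, const int *twin,
                      size_t lo, size_t hi,
                      mod_range *out, size_t max_out)
{
    assert(lo <= hi);
    size_t n = 0;
    size_t i = lo;
    while (i < hi && n < max_out) {
        if (data[i] == twin[i]) {       /* unchanged element, skip it */
            i++;
            continue;
        }
        size_t start = i;               /* first changed element of a run */
        while (i < hi && data[i] != twin[i])
            i++;
        out[n].begin = start;
        out[n].end = i;
        n++;
    }
    return n;
}

/* Example: irregular writes, then an inspection at a synchronization point. */
size_t example_sync_point(mod_range *runs, size_t max_runs)
{
    enum { N = 16 };
    int shared[N] = {0};
    int twin[N];
    memcpy(twin, shared, sizeof shared);    /* snapshot from the previous sync point */

    /* Writes through an index array: a pattern a compiler cannot resolve exactly. */
    const size_t idx[] = {5, 6, 9};
    for (size_t k = 0; k < sizeof idx / sizeof idx[0]; k++)
        shared[idx[k]] = (int)k + 1;

    /* Assumed compiler hint (conservative): writes fall somewhere in [4, 12). */
    const size_t hint_lo = 4, hint_hi = 12;

    size_t n = inspect_region(shared, twin, hint_lo, hint_hi, runs, max_runs);
    for (size_t r = 0; r < n; r++)
        printf("modified [%zu, %zu)\n", runs[r].begin, runs[r].end);
    return n;
}
```

    In this sketch, the hint narrows the inspection from 16 elements to 8, and the comparison narrows what would be communicated to the two runs [5, 7) and [9, 10). If no hint were available, the same function could be called with the full range [0, N), which matches the idea that the runtime should still handle the program without compiler information. A write that stores the value already present is not detected by this comparison, which is harmless here because the receiver's copy would not change.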

    For more information, please come back later. We will provide a complete description of the LDSM system and open the download link for the LDSM software package (compiler + runtime libraries) by February 2009. Thank you.

    Publications

    Seung-Jai Min and Rudolf Eigenmann, Optimizing Irregular Shared-Memory Applications for Clusters, Proc. of the 22nd ACM International Conference on Supercomputing (ICS'08), June 2008.

    Funding

    This work is supported in part by the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Contact

  • Seung-Jai Min