New Features in OFED

Mellanox OFED version 4.3 ships a large number of new features. They are useful for developing RDMA-based applications and for improving their performance. I introduce some of the core ones below.

Advanced Transport

  • Dynamically Connected Transport: The Dynamically Connected Transport (DCT) service is an extension to the transport services that enables a higher degree of scalability while maintaining high performance for sparse traffic. DCT reduces the total number of QPs required system-wide by letting Reliable-type QPs dynamically connect to, and disconnect from, any remote node. DCT connections stay connected only while they are active. This results in a smaller memory footprint, less connection-setup overhead, and higher on-chip cache utilization, and hence better performance.
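
To make the QP saving concrete, here is a rough back-of-the-envelope sketch in C. It is not verbs code: the `cluster` struct, both function names, and the per-process pool sizes are hypothetical, and it assumes a full mesh in which every process talks to every remote process.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cluster description, used only for this estimate. */
struct cluster {
    uint64_t nodes;          /* number of hosts */
    uint64_t procs_per_node; /* communicating processes on each host */
};

/* Reliable Connected, full mesh (assumption): every process keeps
 * one QP for each process on every other node. */
uint64_t rc_qps_per_node(struct cluster c)
{
    assert(c.nodes > 0 && c.procs_per_node > 0);
    uint64_t remote_procs = (c.nodes - 1) * c.procs_per_node;
    return c.procs_per_node * remote_procs;
}

/* Dynamically Connected (assumption): every process keeps a small,
 * fixed pool of DC endpoints, independent of the cluster size. */
uint64_t dc_qps_per_node(struct cluster c,
                         uint64_t initiators_per_proc,
                         uint64_t targets_per_proc)
{
    assert(c.nodes > 0 && c.procs_per_node > 0);
    return c.procs_per_node * (initiators_per_proc + targets_per_proc);
}
```

For a hypothetical cluster of 1024 nodes with 16 processes each, `rc_qps_per_node` gives 261888 QPs per node, while `dc_qps_per_node` with 4 initiators and 1 target per process gives 80. The RC count grows with the cluster size, while the DC count does not.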

Optimized Memory Access

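  • Contiguous Pages: Contiguous Pages improves performance by allocating user memory regions over physically contiguous pages. It lets a user application ask the low-level drivers to allocate contiguous memory for it as part of ibv_reg_mr. The allocation policy takes one of the values listed below (in MLNX_OFED it is selected through the MLX_MR_ALLOC_TYPE environment variable).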

    Possible Value | Description
    ANON | Use regular anonymous (small) pages.
    HUGE | Force huge pages.
    CONTIG | Force contiguous pages.
    PREFER_CONTIG | Try contiguous pages; fall back to small anonymous pages. (Default)
    PREFER_HUGE | Try huge pages; fall back to small anonymous pages.
    ALL | Try huge pages; fall back to contiguous pages; if that also fails, fall back to small anonymous pages.
  • Memory Window: A Memory Window gives the application more flexible control over remote access to its memory. Memory Windows are intended for situations where the application wants to:

    • grant and revoke remote access rights to a registered region dynamically, with less of a performance penalty
    • grant different remote access rights to different remote agents, and/or grant those rights over different ranges within a registered region
  • Inline Receive: The inline optimization is only available for RDMA_Send/RDMA_Write. When Inline-Receive is active, the HCA may write received data into the receive WQE or CQE. Inline-Receive saves a PCIe read transaction because the HCA does not need to read the scatter list, so it improves performance for short received messages. Usage: Inline-Receive on the requestor side is possible only if the user chooses IB(V)_SIGNAL_ALL_WR.
  • ODP (On-Demand Paging): On-Demand Paging (ODP) is a technique that alleviates many of the shortcomings of memory registration. Applications no longer need to pin down the underlying physical pages of the address space or track the validity of the mappings. Instead, the HCA requests the latest translations from the OS when pages are not present, and the OS invalidates translations that are no longer valid because of non-present pages or mapping changes. ODP does not support contiguous pages.
    ODP can be further divided into two subclasses: Explicit and Implicit ODP.
    • Explicit ODP: In Explicit ODP, applications still register memory buffers for communication, but the registration is used to define access control for IO rather than to pin down the pages. An ODP Memory Region (MR) does not need to have valid mappings at registration time.
    • Implicit ODP: In Implicit ODP, applications are provided with a special memory key that represents their complete address space. IO accesses that reference this key (subject to the access rights associated with the key) therefore do not need any virtual address range to be registered.
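
The ODP flow described above (no translations at registration time, translations fetched on first access, translations dropped when the OS invalidates them) can be illustrated with a toy simulation in C. This is only a sketch of the idea, not driver or verbs code: the `odp_table` struct, the `odp_*` functions, and the frame allocator are all hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ODP_PAGES 8 /* size of the toy address space, in pages */

/* Toy stand-in for the HCA's translation table: one entry per page. */
struct odp_table {
    bool present[ODP_PAGES]; /* does the HCA hold a valid translation? */
    int  frame[ODP_PAGES];   /* physical frame, valid only if present */
    int  faults;             /* translations fetched on demand so far */
    int  next_frame;         /* toy "OS" frame allocator */
};

/* "Registration": nothing is pinned and no translation exists yet. */
void odp_init(struct odp_table *t)
{
    for (size_t i = 0; i < ODP_PAGES; i++) {
        t->present[i] = false;
        t->frame[i] = -1;
    }
    t->faults = 0;
    t->next_frame = 0;
}

/* HCA access: on a miss, ask the "OS" for the current translation
 * (counted as a fault), then complete the access. */
int odp_access(struct odp_table *t, size_t page)
{
    assert(page < ODP_PAGES);
    if (!t->present[page]) {
        t->frame[page] = t->next_frame++;
        t->present[page] = true;
        t->faults++;
    }
    return t->frame[page];
}

/* OS side: the mapping changed or the page was evicted, so the
 * HCA's cached translation is dropped. */
void odp_invalidate(struct odp_table *t, size_t page)
{
    assert(page < ODP_PAGES);
    t->present[page] = false;
}
```

In this model, the first `odp_access` to a page counts one fault, repeated accesses to the same page count none, and an access after `odp_invalidate` faults again and picks up a new frame. That mirrors the trade-off of ODP: registration is cheap, and the cost moves to the first touch of each page.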