GPUDirect
- Eliminate the need to make a redundant copy in CUDA host memory
- Eliminate CPU bandwidth and latency bottlenecks
PeerDirect
- Eliminate the need to make a redundant copy in host memory
- Direct data path between the RDMA adapter and peer device memory (e.g. GPU memory)
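The copy elimination claimed for GPUDirect and PeerDirect can be illustrated with a toy model. It treats a receive path as the ordered list of memories the payload lands in, then counts the transfers and the host DRAM traffic. The path names, `PathCost`, and `path_cost` are hypothetical illustrations, not part of any NVIDIA or Mellanox API, and the model deliberately ignores PCIe topology and latency.

```python
from dataclasses import dataclass

# Hypothetical toy paths: each entry is a memory the payload is written into.
STAGED_PATH = ["nic", "host_mem", "gpu_mem"]  # NIC DMAs into a host bounce buffer, then a copy to the GPU
DIRECT_PATH = ["nic", "gpu_mem"]              # GPUDirect RDMA / PeerDirect: NIC DMAs straight into GPU memory


@dataclass
class PathCost:
    transfers: int       # number of hops (DMA or memcpy) the payload takes
    host_mem_bytes: int  # bytes written into plus read back out of host memory


def path_cost(path: list[str], nbytes: int) -> PathCost:
    # Every intermediate stop in host memory costs one write into DRAM
    # and one read back out of it before the next hop.
    host_stops = path[1:-1].count("host_mem")
    return PathCost(transfers=len(path) - 1, host_mem_bytes=2 * nbytes * host_stops)


if __name__ == "__main__":
    one_mib = 1 << 20
    print(path_cost(STAGED_PATH, one_mib))  # PathCost(transfers=2, host_mem_bytes=2097152)
    print(path_cost(DIRECT_PATH, one_mib))  # PathCost(transfers=1, host_mem_bytes=0)
```

In this model the direct path removes one hop and all host memory traffic, which is the bandwidth and latency relief the bullets describe; real gains depend on the platform.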
PeerDirect Async
- Control RDMA device from the GPU
- Reduce CPU utilization
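The control-path difference behind PeerDirect Async can be sketched as a toy schedule. In a CPU-driven loop, the CPU waits for each kernel, posts the send, and polls for completion. In a GPU-driven loop, the CPU only enqueues work up front, while the GPU stream rings the NIC doorbell and waits on the completion. All operation names and functions below are hypothetical. In practice, GPU-side triggering is typically built on CUDA stream memory operations such as `cuStreamWriteValue32` and `cuStreamWaitValue32`, which this model does not call.

```python
# Hypothetical operation names; only these stall the CPU thread.
BLOCKING = {"sync_gpu", "poll_cq"}


def cpu_driven(iterations: int) -> list[tuple[str, int]]:
    """Classic flow: the CPU sits between every compute step and every network step."""
    cpu_ops = []
    for i in range(iterations):
        cpu_ops += [("launch_kernel", i),  # start the compute kernel
                    ("sync_gpu", i),       # block until the kernel has produced the data
                    ("post_send", i),      # CPU rings the NIC doorbell
                    ("poll_cq", i)]        # CPU spins until the send completes
    return cpu_ops


def gpu_driven(iterations: int) -> tuple[list[tuple[str, int]], list[tuple[str, int]]]:
    """Async flow: the CPU pre-builds and enqueues; the GPU stream drives the NIC."""
    cpu_ops, gpu_stream = [], []
    for i in range(iterations):
        cpu_ops.append(("enqueue", i))          # non-blocking: prepare the send and queue the work
        gpu_stream += [("kernel", i),
                       ("ring_doorbell", i),    # GPU writes the NIC doorbell
                       ("wait_cq", i)]          # GPU waits on the completion queue
    return cpu_ops, gpu_stream


def cpu_blocking_ops(cpu_ops: list[tuple[str, int]]) -> int:
    # Count the CPU operations that stall the host thread.
    return sum(1 for op, _ in cpu_ops if op in BLOCKING)


if __name__ == "__main__":
    print(cpu_blocking_ops(cpu_driven(10)))     # 20
    print(cpu_blocking_ops(gpu_driven(10)[0]))  # 0
```

The model only shows where the waiting moves: the GPU stream still waits on completions, but the CPU is free for other work.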