GPUDirect
- Eliminate the need to make a redundant copy in CUDA host memory
- Eliminate CPU bandwidth and latency bottlenecks
 
PeerDirect
- Eliminate the need to make a redundant copy in host memory
- Direct path for data exchange between peer devices (e.g. GPU and RDMA adapter)
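
The copies removed by GPUDirect and PeerDirect can be illustrated with a toy model. This is only a sketch: Python byte buffers stand in for GPU and host memory, the function names (`send_staged`, `send_shared_pinned`, `send_peer_direct`, `nic_read`) are hypothetical, and nothing here calls CUDA or an RDMA stack. It assumes GPUDirect means CUDA and the network driver share one pinned host buffer, and PeerDirect means the adapter reads GPU memory directly.

```python
# Toy model only: byte buffers stand in for GPU memory and host memory.
# It counts the host-memory copies made before the simulated adapter
# reads the payload. All names are hypothetical, not a real API.

def nic_read(buf):
    """Simulated adapter DMA read: returns the bytes placed on the wire."""
    return bytes(buf)

def send_staged(gpu_buf):
    """Baseline: CUDA and the network driver each own a host buffer."""
    cuda_host = bytearray(gpu_buf)    # copy 1: GPU memory -> CUDA host buffer
    net_host = bytearray(cuda_host)   # copy 2: CUDA host buffer -> driver host buffer
    return nic_read(net_host), 2

def send_shared_pinned(gpu_buf):
    """GPUDirect (assumed): both drivers share one pinned host buffer."""
    shared_host = bytearray(gpu_buf)  # copy 1: GPU memory -> shared host buffer
    return nic_read(shared_host), 1   # the host-to-host copy is gone

def send_peer_direct(gpu_buf):
    """PeerDirect (assumed): the adapter reads GPU memory directly."""
    return nic_read(gpu_buf), 0       # no staging in host memory

if __name__ == "__main__":
    payload = b"gradient shard"
    for send in (send_staged, send_shared_pinned, send_peer_direct):
        wire, copies = send(payload)
        assert wire == payload        # same data reaches the wire on every path
        print(f"{send.__name__}: {copies} host copies")
```

Every path delivers the same bytes; what changes is how many times the payload is staged in host memory on the way.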
 
PeerDirect Async
- Control RDMA device from the GPU
- Reduce CPU utilization