Exposing graphics memory to other devices has long been a goal for applications that need low-latency data exchange between a device and the GPU.

Using DirectGMA, any device on the bus can read and write GPU memory directly. DirectGMA eliminates unnecessary system memory copies, lowers CPU overhead, and reduces latency, which shortens data transfer times for applications running on AMD FirePro™ W5x00 and above and on all AMD FirePro™ S series products.

Key Features:

  • Makes a portion of the GPU memory accessible to other devices
  • Allows devices on the bus to write directly into this area of GPU memory
  • Allows GPUs to write directly into the memory of remote devices on the bus supporting DirectGMA
  • Provides a driver interface to allow 3rd party hardware vendors to support data exchange with an AMD GPU using DirectGMA
  • Peer-to-Peer Transfers between GPUs
    Use high-speed DMA transfers to copy data between the memories of two GPUs on the same system/PCIe bus.
  • Peer-to-Peer Transfers between GPU and FPGAs
    Use high-speed DMA transfers to copy data between the memories of the GPU and the FPGA memory.
  • DirectGMA for Video
    Optimized pipeline for frame-based devices such as frame grabbers, video switchers, HD-SDI capture, and CameraLink devices. See our SDI webpage

Requirements:

  • The APIs supporting AMD’s DirectGMA are: OpenGL, OpenCL™, DirectX®
  • The supported operating systems are: Windows® 7/8 64-bit and Linux® 64-bit
  • The supported cards are: AMD FirePro™ W5x00 and above, and all AMD FirePro™ S series

 

AMD’s DirectGMA P2P

  • Direct communication between PCI cards
  • Bidirectional DirectGMA P2P requires memory on both cards

AMD’s DirectGMA in OpenGL

The OpenGL extension AMD_BUS_ADDRESSABLE_MEMORY provides access to DirectGMA. The functions are:

void glMakeBuffersResidentAMD(sizei n, uint* buffers, uint64* baddr, uint64* maddr);
void glBufferBusAddressAMD(enum target, sizeiptr size, uint64 surfbusaddress, uint64 markerbusaddress);
void glWaitMarkerAMD(uint buf, uint value);
void glWriteMarkerAMD(uint buf, uint value, uint64 offset);

The new tokens are:

GL_BUS_ADDRESSABLE_MEMORY_AMD

GL_EXTERNAL_PHYSICAL_MEMORY_AMD
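
Before calling any of these entry points, an application would normally confirm that the driver advertises the extension. The following is a minimal sketch of an exact-token search over a space-separated extension list. The helper name hasExtension is hypothetical, and the extension string name GL_AMD_bus_addressable_memory used in the usage note is an assumption; the SDK headers and documentation are authoritative for the exact spelling.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns true if `name` appears as a whole token in the
// space-separated list `extensions`. A plain strstr() would also match a
// prefix of a longer extension name, so token boundaries are checked.
bool hasExtension(const char* extensions, const char* name)
{
    if (extensions == nullptr || name == nullptr || *name == '\0')
        return false;

    const std::size_t nameLen = std::strlen(name);
    const char* pos = extensions;

    while ((pos = std::strstr(pos, name)) != nullptr)
    {
        const bool startsToken = (pos == extensions) || (pos[-1] == ' ');
        const char after = pos[nameLen];
        const bool endsToken = (after == ' ') || (after == '\0');
        if (startsToken && endsToken)
            return true;
        pos += nameLen;  // keep searching after this partial match
    }
    return false;
}
```

With a compatibility context, the list could come from glGetString(GL_EXTENSIONS), for example hasExtension(list, "GL_AMD_bus_addressable_memory"). In a core profile context the extensions are queried one at a time with glGetStringi, so a direct string comparison per entry is enough.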

 

Creating a buffer to receive data

To receive data, create a buffer that other devices on the bus can access. The remote device needs to know the physical bus address of this buffer in order to write to it.

glGenBuffers(m_uiNumBuffers, m_pBuffer);

m_pBufferBusAddress = new unsigned long long[m_uiNumBuffers];
m_pMarkerBusAddress = new unsigned long long[m_uiNumBuffers];

for (unsigned int i = 0; i < m_uiNumBuffers; i++)
{
    glBindBuffer(GL_BUS_ADDRESSABLE_MEMORY_AMD, m_pBuffer[i]);
    glBufferData(GL_BUS_ADDRESSABLE_MEMORY_AMD, m_uiBufferSize, 0, GL_DYNAMIC_DRAW);
}

// Call makeResident when all BufferData calls were submitted.
glMakeBuffersResidentAMD(m_uiNumBuffers, m_pBuffer, m_pBufferBusAddress, m_pMarkerBusAddress);

// Make sure that the buffer creation really succeeded
if (glGetError() != GL_NO_ERROR)
    return false;

glBindBuffer(GL_BUS_ADDRESSABLE_MEMORY_AMD, 0);

Using a buffer on a remote device

To write into a buffer on a remote device, create an OpenGL buffer and assign it the physical addresses of the memory on the remote device.

glGenBuffers(m_uiNumBuffers, m_pBuffer);

for (unsigned int i = 0; i < m_uiNumBuffers; i++)
{
    glBindBuffer(GL_EXTERNAL_PHYSICAL_MEMORY_AMD, m_pBuffer[i]);
    glBufferBusAddressAMD(GL_EXTERNAL_PHYSICAL_MEMORY_AMD, m_uiBufferSize, m_pBufferBusAddress[i], m_pMarkerBusAddress[i]);

    if (glGetError() != GL_NO_ERROR)
        return false;
}

glBindBuffer(GL_EXTERNAL_PHYSICAL_MEMORY_AMD, 0);

 

GPU to GPU copy

Create one thread per GPU. Each thread creates its own context. One thread acts as the data sink, the other as the source.

On the sink GPU, a GL_BUS_ADDRESSABLE_MEMORY_AMD buffer is created.

On the source GPU, a GL_EXTERNAL_PHYSICAL_MEMORY_AMD buffer is created and assigned the bus addresses of the sink buffer.

// Sink GPU: create buffers that other devices on the bus can write to.
glGenBuffers(m_uiNumBuffers, m_pSinkBuffer);

for (unsigned int i = 0; i < m_uiNumBuffers; i++)
{
    glBindBuffer(GL_BUS_ADDRESSABLE_MEMORY_AMD, m_pSinkBuffer[i]);
    glBufferData(GL_BUS_ADDRESSABLE_MEMORY_AMD, m_uiBufferSize, 0, GL_DYNAMIC_DRAW);
}

// Call makeResident when all BufferData calls were submitted.
glMakeBuffersResidentAMD(m_uiNumBuffers, m_pSinkBuffer, m_pBufferBusAddress, m_pMarkerBusAddress);

// Source GPU: create buffers whose data store is the memory on the sink GPU.
glGenBuffers(m_uiNumBuffers, m_pSourceBuffer);

for (unsigned int i = 0; i < m_uiNumBuffers; i++)
{
    glBindBuffer(GL_EXTERNAL_PHYSICAL_MEMORY_AMD, m_pSourceBuffer[i]);
    glBufferBusAddressAMD(GL_EXTERNAL_PHYSICAL_MEMORY_AMD, m_uiBufferSize, m_pBufferBusAddress[i], m_pMarkerBusAddress[i]);
}

 

The source creates data and copies it into the GL_EXTERNAL_PHYSICAL_MEMORY_AMD buffer that has its data store on the sink device.

// Draw…

++uiTransferId;

// Bind buffer that has its data store on the sink GPU
glBindBuffer(GL_PIXEL_PACK_BUFFER, uiBufferId);

// Copy local buffer into remote buffer
glReadPixels(0, 0, m_uiBufferWidth, m_uiBufferHeight, m_nExtFormat, m_nType, NULL);

// Write marker
glWriteMarkerAMD(uiBufferId, uiTransferId, ullMarkerBusAddress);
glFlush();

The sink device receives the data and copies it into a texture to be displayed.

// Submit draw calls that do not require data sent by the source
// …

glBindTexture(GL_TEXTURE_2D, m_uiTexture);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, uiBufferId);

// Indicate that the following commands will need the data transferred by the source
glWaitMarkerAMD(uiBufferId, uiTransferId);

// Copy buffer into texture
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_uiTextureWidth, m_uiTextureHeight, m_nExtFormat, m_nType, NULL);

// Draw using received texture
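
The ordering between the two sides may be easier to follow in a CPU-only model. The sketch below is not DirectGMA code: SharedBuffer, sourceSend, and sinkTryReceive are hypothetical names, and the rule that a wait is satisfied once the marker has reached the expected transfer ID is an assumption made for illustration (the extension specification defines the real comparison). It only shows why the source writes the marker after the data, and why the sink checks the marker before reading.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side stand-in for a bus-addressable buffer and its marker.
struct SharedBuffer
{
    std::vector<std::uint8_t> data;  // stands in for the buffer memory on the sink GPU
    std::uint32_t marker = 0;        // stands in for the marker the source writes
};

// Source side: copy the frame into the remote buffer first, then publish the
// transfer ID. This mirrors the order glReadPixels -> glWriteMarkerAMD.
void sourceSend(SharedBuffer& remote,
                const std::vector<std::uint8_t>& frame,
                std::uint32_t transferId)
{
    remote.data = frame;
    remote.marker = transferId;
}

// Sink side: consume the buffer only once the marker has reached the expected
// ID (assumed rule). A real glWaitMarkerAMD would stall GPU command processing
// at this point instead of returning false.
bool sinkTryReceive(const SharedBuffer& local,
                    std::uint32_t expectedId,
                    std::vector<std::uint8_t>& texture)
{
    if (local.marker < expectedId)
        return false;
    texture = local.data;  // stands in for glTexSubImage2D from the unpack buffer
    return true;
}
```

In this model, a sink that asks for transfer 1 before the source has sent anything gets nothing, and the same request succeeds after sourceSend(buffer, frame, 1). Incrementing the transfer ID for every frame, as the source code does, keeps the sink from consuming the same frame twice.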

 

AMD’s DirectGMA in OpenCL

The OpenCL extension CL_AMD_BUS_ADDRESSABLE_MEMORY provides access to DirectGMA.

The functions are:

cl_int clEnqueueWaitSignalAMD(cl_command_queue command_queue, cl_mem mem_object, uint value, cl_uint num_events, …

cl_int clEnqueueWriteSignalAMD(cl_command_queue command_queue, cl_mem mem_object, uint value, cl_ulong offset, …

cl_int clEnqueueMakeBuffersResidentAMD(cl_command_queue command_queue, cl_uint num_mem_objects, cl_mem* mem_objects, cl_bool blocking_make_resident, cl_bus_address_amd* bus_addresses, cl_uint num_events, …

 

The new tokens are:

CL_BUS_ADDRESSABLE_MEMORY_AMD

CL_EXTERNAL_PHYSICAL_MEMORY_AMD

AMD’s DirectGMA in DX9

The DirectGMA functionality in DX9 is made available through a so-called communication surface. The process for using it is as follows:

  • Create a 1×1 offscreen plain surface of format FOURCC_SDIF.
  • Lock the surface. On lock, the driver allocates and returns a pointer to an AMDDX9SDICOMMPACKET structure. This structure is the communication surface.
  • Cast the pBits pointer and assign it to a local AMDDX9SDICOMMPACKET pointer.

The commands that can be sent through the communication surface are:

AMD_SDI_CMD_GET_CAPS_DATA

AMD_SDI_CMD_CREATE_SURFACE_LOCAL_BEGIN

AMD_SDI_CMD_CREATE_SURFACE_LOCAL_END

AMD_SDI_CMD_CREATE_SURFACE_REMOTE_BEGIN

AMD_SDI_CMD_CREATE_SURFACE_REMOTE_END

AMD_SDI_CMD_QUERY_PHY_ADDRESS_LOCAL

AMD_SDI_CMD_SYNC_WAIT_MARKER

AMD_SDI_CMD_SYNC_WRITE_MARKER

 Running a DirectGMA command:

HRESULT RunSDICommand(IN LPDIRECT3DDEVICE9 pd3dDevice, IN AMDDX9SDICMD sdiCmd, IN PBYTE pInBuf, IN DWORD dwInBufSize, IN PBYTE pOutBuf, IN DWORD dwOutBufSize)
{
    HRESULT              hr;
    PAMDDX9SDICOMMPACKET pCommPacket;
    D3DLOCKED_RECT       lockedRect;
    LPDIRECT3DSURFACE9   pCommSurf = NULL;

    // Create the 1x1 communication surface.
    hr = pd3dDevice->CreateOffscreenPlainSurface(1, 1, (D3DFORMAT)FOURCC_SDIF, D3DPOOL_DEFAULT, &pCommSurf, NULL);
    if (FAILED(hr))
        return hr;

    // On lock, the driver returns a pointer to the communication packet.
    hr = pCommSurf->LockRect(&lockedRect, NULL, 0);
    if (FAILED(hr))
    {
        REL(pCommSurf);
        return hr;
    }

    // Fill in the command and its input and output buffers.
    pCommPacket = (PAMDDX9SDICOMMPACKET)(lockedRect.pBits);
    pCommPacket->dwSign = 'SDIF';
    pCommPacket->pResult = &hr;   // the command result is reported through pResult
    pCommPacket->sdiCmd = sdiCmd;
    pCommPacket->pOutBuf = pOutBuf;
    pCommPacket->dwOutBufSize = dwOutBufSize;
    pCommPacket->pInBuf = pInBuf;
    pCommPacket->dwInBufSize = dwInBufSize;

    pCommSurf->UnlockRect();
    REL(pCommSurf);

    return hr;
}
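
RunSDICommand identifies the communication surface by a FOURCC format code. As background, the sketch below reproduces the byte packing used by Direct3D's MAKEFOURCC macro, which places the first character in the lowest byte. The function name makeFourCC and the constant kExampleSdifCode are hypothetical, and whether FOURCC_SDIF is built from exactly these four characters in this order is an assumption; the SDK header is authoritative.

```cpp
#include <cassert>
#include <cstdint>

// Same packing as Direct3D's MAKEFOURCC macro: the first character goes into
// the lowest byte, the fourth character into the highest byte.
constexpr std::uint32_t makeFourCC(char c0, char c1, char c2, char c3)
{
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(c0))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c1)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c2)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c3)) << 24);
}

// Hypothetical stand-in for illustration only; the real FOURCC_SDIF value is
// defined by the SDK header and may differ.
constexpr std::uint32_t kExampleSdifCode = makeFourCC('S', 'D', 'I', 'F');

static_assert(kExampleSdifCode == 0x46494453u,
              "'S' (0x53) is the low byte, 'F' (0x46) is the high byte");
```

Note that the multi-character literal 'SDIF' assigned to dwSign has an implementation-defined value in C++, so it is not necessarily equal to the packed code computed here. Code that needs a specific value should use the constants from the SDK headers.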

Create a local surface that can be accessed by a remote device

hr = RunSDICommand(pd3dDevice, AMD_SDI_CMD_CREATE_SURFACE_LOCAL_BEGIN, NULL, 0, NULL, 0);
if (SUCCEEDED(hr))
{
    // Create SDI_LOCAL resources here
    hr = pd3dDevice->CreateTexture(width, height, 1, usage, format, D3DPOOL_DEFAULT, ppTex, NULL);
    if (SUCCEEDED(hr))
    {
        hr = MakeAllocDoneViaDumpDraw(pd3dDevice, *ppTex);
        hr = RunSDICommand(pd3dDevice, AMD_SDI_CMD_CREATE_SURFACE_LOCAL_END, NULL, 0, (PBYTE)pAttrib, sizeof(AMDDX9SDISURFACEATTRIBUTES));
        if (SUCCEEDED(hr))
        {
            // pAttrib has been filled in and can now be read:
            //   pAttrib->surfaceHandle
            //   pAttrib->surfaceAddr.surfaceBusAddr
            //   pAttrib->surfaceAddr.markerBusAddr
        }
    }
}

return hr;

 

AMD’s DirectGMA in DX10/DX11

AMD’s DirectGMA extension is accessed through the IAmdDxExt interface. To create and use this interface, the extension client must do the following:

  • Include the “AmdDxExtSDIApi.h” file
  • Get the exported function AmdDxExtCreate() from the DXX driver using GetProcAddress()
  • Call AmdDxExtCreate to create an IAmdDxExt interface
  • Get and use the desired specific extension interfaces
  • Close the AMD DirectX extension interface IAmdDxExt once it is no longer needed
  • Release the SDI interface IAmdDxExtSDI
  • Release the extension interface IAmdDxExt

The DirectGMA functions are provided through the SDI extension interface IAmdDxExtSDI.

Download the FirePro DirectGMA SDK from this page