
The xGPU version with modified input data fetching (bypassing host→device transfers) is in the following GitHub repository:


To change the number of tiles, the ultrafine channel width, pipelining, etc.:

  1. Edit mwax-xGPU/src/xgpu_info.h
  2. make clean
  3. make
  4. sudo cp /usr/local/lib
  5. sudo ldconfig


NSTATION is set to the number of tiles (not signal paths).


NFREQUENCY is set to the number of ultrafine channels.
For 125 Hz ultrafine resolution: #define NFREQUENCY 10240
For 250 Hz ultrafine resolution: #define NFREQUENCY 5120
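
Both values are consistent with dividing a fixed coarse channel width by the ultrafine resolution. The following is a minimal sketch of that arithmetic, assuming a 1.28 MHz coarse channel (this width is not stated on this page; it is what both settings above imply). The helper name is hypothetical and not part of xGPU.

```c
#include <assert.h>

/* Assumption: one coarse channel is 1.28 MHz wide. This value is not
   stated on this page; it is what both NFREQUENCY settings imply. */
#define COARSE_CHANNEL_HZ 1280000

/* Hypothetical helper (not part of xGPU): number of ultrafine channels
   for a given ultrafine resolution in Hz. Returns -1 if the resolution
   is not positive or does not divide the coarse channel evenly. */
static int nfrequency_for_resolution(int resolution_hz)
{
    if (resolution_hz <= 0 || COARSE_CHANNEL_HZ % resolution_hz != 0)
        return -1;
    return COARSE_CHANNEL_HZ / resolution_hz;
}

/* Compile-time check that the two documented settings match the assumption. */
_Static_assert(10240 * 125 == COARSE_CHANNEL_HZ, "125 Hz setting");
_Static_assert(5120 * 250 == COARSE_CHANNEL_HZ, "250 Hz setting");
```

Calling nfrequency_for_resolution(125) or nfrequency_for_resolution(250) reproduces the two NFREQUENCY values listed above.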


xGPU is modified to accept data written directly into array_d.
If no pipelining is used (which is fine because there's little to be gained given that host→device transfers are bypassed), all data is written to array_d[0].
For no pipelining, set PIPE_LENGTH = 1, i.e. NTIME_PIPE = NTIME.
The xGPU malloc of array_d has been modified such that array_d[0] and array_d[1] are always contiguous.  This means one can also use PIPE_LENGTH = 2 (but no higher).
For PIPE_LENGTH = 2, set NTIME_PIPE = NTIME/2.  NTIME_PIPE is constrained by xGPU to be a multiple of 4, so NTIME must be a multiple of 8.
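
The constraints above can be collected into a small check before editing xgpu_info.h. This is only a sketch: it assumes PIPE_LENGTH equals NTIME / NTIME_PIPE, as the two cases above imply, and the function name is hypothetical (it is not part of xGPU).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not part of xGPU): checks an NTIME / NTIME_PIPE
   pair against the constraints described on this page. */
static bool pipeline_config_ok(int ntime, int ntime_pipe)
{
    if (ntime <= 0 || ntime_pipe <= 0)
        return false;
    if (ntime % ntime_pipe != 0)   /* PIPE_LENGTH must be a whole number */
        return false;
    if (ntime_pipe % 4 != 0)       /* xGPU needs NTIME_PIPE to be a multiple of 4 */
        return false;

    /* Assumption: PIPE_LENGTH = NTIME / NTIME_PIPE. */
    int pipe_length = ntime / ntime_pipe;

    /* Only array_d[0] and array_d[1] are contiguous, so 1 or 2 only. */
    return pipe_length == 1 || pipe_length == 2;
}
```

For example, 52/26 is rejected because 26 is not a multiple of 4, and 96/32 is rejected because it would need PIPE_LENGTH = 3.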

Settings for 250 Hz ultrafine resolution and 256T (or fewer):

The number of actual time samples per xGPU gulp is 50 (10 per 40 ms block × 5 blocks).

PIPE_LENGTH = 2 must be used so that the input cube to xGPU fits within CUDA texture memory constraints.  NTIME must be a multiple of 8 that is >= 50; the smallest such value is 56.


#define NTIME 56
#define NTIME_PIPE 28

Alternatively, one can use:

#define NTIME 64
#define NTIME_PIPE 32

Both options give near-identical speed for 256T on the RTX 2080 Ti.  (Having NTIME_PIPE a multiple of 16 seems to bring a benefit that offsets the cost of correlating the extra time samples.)
However, 56/28 has a very slight edge, at least for the subset of modes tested, and it should also accumulate smaller floating-point rounding errors (although that is difficult to confirm).
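
The NTIME values above come from rounding the 50 real samples up to the next allowed multiple. A minimal sketch of that rounding, and of the padding each choice carries, is given below. Both helper names are hypothetical and not part of xGPU.

```c
#include <assert.h>

/* Hypothetical helper (not part of xGPU): smallest NTIME that holds
   `samples` real time samples for a given PIPE_LENGTH. NTIME_PIPE must
   be a multiple of 4, so NTIME must be a multiple of 4 * pipe_length. */
static int smallest_ntime(int samples, int pipe_length)
{
    int step = 4 * pipe_length;
    return ((samples + step - 1) / step) * step;  /* round up to a multiple of step */
}

/* Hypothetical helper: padding time samples that xGPU correlates in
   addition to the real ones. */
static int padding_samples(int ntime, int samples)
{
    return ntime - samples;
}
```

With 50 samples and PIPE_LENGTH = 2, smallest_ntime gives 56, which carries 6 padding samples. The 64/32 alternative carries 14.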

Settings for 250 Hz ultrafine resolution and 128T (or fewer):

With 128 or fewer tiles, PIPE_LENGTH = 1 can be used without hitting texture memory constraints.  In that case, only array_d[0] is used and NTIME = NTIME_PIPE can be the smallest multiple of 4 that is >= 50.


#define NTIME 52
#define NTIME_PIPE 52

Settings for 125 Hz ultrafine resolution:

The input cube to xGPU is similar in size to the 250 Hz case because twice the number of channels is offset by roughly half the time samples.  The exact size depends on the number of tiles and the NTIME/NTIME_PIPE values, so the best settings have to be found by trial.  Try it and see!
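
To compare candidate settings before trying them, the size of the input cube can be estimated as a count of complex samples. The sketch below assumes two polarizations per tile (signal paths = 2 × tiles); the helper name is hypothetical and not part of xGPU. The 125 Hz NTIME values used in the note after it (28 and 32) are illustrative only: they assume 25 real samples per gulp, half of the 250 Hz figure, and are not tested settings.

```c
#include <assert.h>

/* Assumption: two polarizations per tile (signal paths = 2 x tiles). */
#define NPOL 2

/* Hypothetical helper (not part of xGPU): number of complex input
   samples in one xGPU gulp. */
static long long input_cube_samples(int ntiles, int nfrequency, int ntime)
{
    return (long long)ntiles * NPOL * nfrequency * ntime;
}
```

For 256 tiles, the 250 Hz setting with NTIME = 56 gives 146,800,640 samples. At 125 Hz, an illustrative NTIME = 28 gives the same count, and NTIME = 32 gives 167,772,160.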
