The xGPU version with modified input data fetching (bypassing host→device transfers) is at the following GitHub repository:
To change the number of tiles, ultrafine channel width, pipelining, etc.:
- Edit mwax-xGPU/src/xgpu_info.h
- make clean
- sudo cp libxgpu.so /usr/local/lib
Settings for 250 Hz ultrafine resolution and 256T (or fewer):
The number of actual time samples per xGPU gulp is 50 (10 per 40 ms block × 5 blocks).
PIPE_LENGTH = 2 must be used so that the input cube to xGPU fits within CUDA texture memory constraints. NTIME must be a multiple of 8 that is >= 50, the first of which is 56.
#define NTIME 56
#define NTIME_PIPE 28
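The arithmetic behind these two values can be sketched as a small helper. This is only an illustration, not part of xGPU: the function names `round_up`, `min_ntime` and `ntime_pipe` are hypothetical, and the rule that NTIME must be a multiple of 4 × PIPE_LENGTH is an assumption generalised from the two cases described in these notes (multiple of 8 for PIPE_LENGTH = 2, multiple of 4 for PIPE_LENGTH = 1).

```c
#include <assert.h>

/* Round n up to the nearest multiple of m (requires m > 0). */
static int round_up(int n, int m)
{
    return ((n + m - 1) / m) * m;
}

/* Smallest NTIME that holds `samples` real time samples.
 * ASSUMPTION: NTIME must be a multiple of 4 * pipe_length, as inferred
 * from the PIPE_LENGTH = 1 and PIPE_LENGTH = 2 cases in these notes. */
static int min_ntime(int samples, int pipe_length)
{
    assert(samples > 0 && pipe_length > 0);
    return round_up(samples, 4 * pipe_length);
}

/* NTIME_PIPE is NTIME divided evenly by the pipeline length. */
static int ntime_pipe(int ntime, int pipe_length)
{
    assert(pipe_length > 0 && ntime % pipe_length == 0);
    return ntime / pipe_length;
}
```

Under that assumption, `min_ntime(50, 2)` gives the NTIME of 56 used above and `ntime_pipe(56, 2)` gives the NTIME_PIPE of 28. The padding samples beyond the 50 real ones are extra slots in the input cube.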
Alternatively can use:
Settings for 250 Hz ultrafine resolution and 128T (or fewer):
With 128 or fewer tiles, we have the option to use PIPE_LENGTH = 1 without hitting texture memory constraints. In that case, only array_d is used and NTIME = NTIME_PIPE can be the first multiple of 4 that is >= 50, which is 52.
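As a sketch, the corresponding xgpu_info.h values would look something like the fragment below. The numbers follow directly from the rule above (first multiple of 4 that is >= 50). The compile-time checks are an illustrative addition, not something present in xgpu_info.h.

```c
#include <assert.h>

/* 128T or fewer, PIPE_LENGTH = 1: NTIME and NTIME_PIPE are equal. */
#define NTIME      52
#define NTIME_PIPE 52

/* Illustrative sanity checks (not part of xGPU itself). */
static_assert(NTIME >= 50, "NTIME must hold all 50 real time samples");
static_assert(NTIME_PIPE % 4 == 0, "NTIME_PIPE must be a multiple of 4");
static_assert(NTIME / NTIME_PIPE == 1, "PIPE_LENGTH = 1 means NTIME == NTIME_PIPE");
```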
Settings for 125 Hz ultrafine resolution:
The input cube to xGPU is similar in size to the 250 Hz case, because having twice as many channels is offset by having roughly half as many time samples. What actually fits depends on the number of tiles and the NTIME/NTIME_PIPE values. Suck it and see!
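A rough way to compare cube sizes before trying a build is to count the complex input samples per gulp. The sketch below is hypothetical: `cube_samples` and `cube_ratio` are illustrative names, and the example numbers in the usage note are assumptions (5120 and 10240 channels, 2 polarisations, and a 125 Hz gulp of 25 real samples padded to NTIME = 32). None of these values come from xgpu_info.h.

```c
#include <assert.h>

/* Number of complex input samples in one xGPU gulp:
 * time samples x frequency channels x tiles x polarisations. */
static long long cube_samples(int ntime, int nfrequency, int nstation, int npol)
{
    assert(ntime > 0 && nfrequency > 0 && nstation > 0 && npol > 0);
    return (long long)ntime * nfrequency * nstation * npol;
}

/* Size of cube b relative to cube a (1.0 means identical size). */
static double cube_ratio(long long a, long long b)
{
    assert(a > 0);
    return (double)b / (double)a;
}
```

For example, with those assumed values and 256 tiles, `cube_samples(56, 5120, 256, 2)` for 250 Hz and `cube_samples(32, 10240, 256, 2)` for 125 Hz differ by a ratio of 64/56, so the 125 Hz cube is about 14% larger. Whether that still fits within the texture memory limits has to be checked on the actual hardware.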