Recombining means converting legacy VCS data from one format to another. Both are binary formats in which each byte represents a (4+4)-bit complex signed sample. They differ mainly in the ordering of the bytes and in how the bytes are distributed across multiple files. For the purposes of this document, the "from" format will be called the PFB or unrecombined format, and the "to" format will be called the VCS or recombined format. The conversion is necessary because the beamformer currently supports only the VCS format as input.
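To make the (4+4)-bit sample layout concrete, the following is a minimal Python sketch of unpacking one byte into a complex sample. The nibble order (real part in the high nibble, imaginary part in the low nibble) and the use of two's complement are assumptions made for illustration and are not stated above; the function names are hypothetical. Check the format descriptions before relying on it.

```python
def to_signed4(nibble):
    """Interpret a 4-bit value (0..15) as two's complement (-8..7)."""
    return nibble - 16 if nibble >= 8 else nibble


def decode_sample(byte):
    """Decode one byte into a complex sample with 4-bit signed parts.

    ASSUMPTION: real part in the high nibble, imaginary part in the low
    nibble. Swap the two lines below if the actual order is the reverse.
    """
    real = to_signed4((byte >> 4) & 0x0F)
    imag = to_signed4(byte & 0x0F)
    return complex(real, imag)


def decode_buffer(raw):
    """Decode a bytes object into a list of complex samples, one per byte."""
    return [decode_sample(b) for b in raw]
```

Under these assumptions, `decode_buffer(bytes([0x12, 0xF0]))` gives the samples 1+2j and -1+0j.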
The primary software for recombining is found in the mwa-voltage repository. This has recently been forked to its own dedicated recombine repo in order to promote further development and maintenance, but as of this writing (2022-06-01) it is identical in functionality and usage to mwa-voltage. In either case, the name of the executable is recombine. On Garrawarla, it is provided by the recombine modules. Future developments will be made available through the recombine module, but the mwa-voltage module will always remain available for compatibility with historical pipelines.
Processing on Garrawarla
Examples of using recombine to process both single-second and multiple-second jobs on Garrawarla are now provided as part of the documentation included in the recombine repository.
Other (wrapper) scripts
vcs_download.nf is a Nextflow script provided by the mwa_search repo. Its use is described on the main Documentation page. As a quick reference, however, the following template can be followed on Garrawarla:
vcs_download.nf will remove the PFB files once they have been successfully recombined.
checks.py are provided as part of VCSTools (vcstools module on Garrawarla). This (among many other things) is a wrapper for running recombine on the GPU cluster ("gpuq") on Galaxy.
To recombine all of the data, use
or, for only a subset of data, use
If you want to see the progress, then use:
Generally, this processing should not take too long: typically a few hours.
Checking the recombined data
It is a good idea to check at this stage to make sure that all of the data were recombined properly. To do this, use:
This will check that all of the recombined files are present and of the correct size. If any raw files are missing, the recombining process will make zero-padded files in their place, leaving gaps in your data. If you would like to do a more robust check, beamform and splice the data (using the following steps) and then run:
Then you can look through the produced .dat file for gaps using:
Once you are happy that the data have been recombined correctly, you should delete the raw voltages, as they are no longer used in the pipeline and are a massive drain on storage resources.
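The presence-and-size check described in this section can be illustrated with a short Python sketch. It is not the VCSTools check itself: the function names are hypothetical, the file list and expected size are supplied by the caller rather than derived from any real naming scheme, and the all-zero test assumes that a file written purely as padding contains only zero bytes.

```python
import os


def is_all_zero(path, chunk_size=1 << 20):
    """Return True if every byte in the file is zero.

    ASSUMPTION: a padding-only file consists entirely of zero bytes.
    An empty file also counts as all-zero.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True
            if chunk.count(0) != len(chunk):
                return False


def check_recombined_files(paths, expected_size):
    """Sort files into missing, wrong-size, and all-zero groups.

    `paths` is the list of files that should exist and `expected_size`
    is the size in bytes each one should have; both come from the caller.
    """
    report = {"missing": [], "wrong_size": [], "all_zero": []}
    for path in paths:
        if not os.path.isfile(path):
            report["missing"].append(path)
        elif os.path.getsize(path) != expected_size:
            report["wrong_size"].append(path)
        elif is_all_zero(path):
            report["all_zero"].append(path)
    return report
```

An empty report (all three lists empty) means every listed file exists, has the expected size, and contains at least one non-zero byte. Files in the `all_zero` list point to seconds where the raw data were probably missing.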
Planned future developments for recombine
- Add GPU support
- Improve the command-line interface
Description of PFB format