Skip to end of metadata
Go to start of metadata

VCS pulsar data processing

This documentation is focused on the pulsar science applications of VCS data. For a detailed description of the VCS and its science goals and capabilities, please see Tremblay et al. (2015)
While not necessary, it will be useful to have a basic understanding of radio interferometry when reading this documentation, so this might help (if you're unfamiliar). We also have a series of papers on the tied-array beamformer: Ord et al. (2019), Xue et al. (2019) and McSweeney et al. (2020).

The wrapper script is used for many of the steps needed to reduce the voltages to produce incoherently/coherently summed data in PSRFITS and/or VDIF format. It attempts to make things as streamlined and user-friendly as possible. To see detailed usage information, -h. There is also the recently developed which can display its detailed usage with --help.

For our current Pawsey installation, there are a few specifics:

  • We now have access to the Garrawarla cluster, which we should use for the bulk of our processing. Note Garrawarla does not have access to the /group file system so make sure all of your data is on /astro. The Galaxy (and Magnus) Cluster(s) should be thought of a backup cluster or somewhere to do testing.
  • In order to access any of the tools, you will need to source in your personal profile: /pawsey/mwa/software/psrBash.profile or on Galaxy: /group/mwavcs/PULSAR/psrBash.profile. One way to do this is to add the following to your .bashrc

    if [[ $HOSTNAME == garrawarla* ]]; then 
    	alias sp3='source /pawsey/mwa/software/profiles/mwavcs_pulsar.profile'
    elif [[ $HOSTNAME == galaxy* ]]; then
    	alias sp3='source /group/mwavcs/PULSAR/psrBash.profile'

    Then just run the command sp3 to load all our software

There is a master branch and a development branch of the main codebase. By default, psrBash.profile will load the master branch, which is meant to represent the most stable version. However, there may be bugs that were later discovered and fixed, but which have not yet been grafted back into the master branch. If you wish to try the more cutting edge development branch, then after sourcing /group/mwavcs/PULSAR/psrBash.profile, run

module load vcstools/devel

The Nextflow scripts (.nf) are in the mwa_search module so to switch to the development version run

module load mwa_search/devel

You will also have to load the nextflow module if you are using any nextflow scripts :

module load nextflow
  • There is a strict directory structure for VCS processing, given the large volumes of data involved. Raw/recombined and all other products (incoherent sum products, coherent sum products, calibration solutions, etc.) data is stored on the /astro file system.
    • We have endeavoured to make sure this is done without requiring the user's attention, however, please double-check working directories and defaults for scripts.

A note regarding code examples:

Code snippets or complete command-line calls will be enclosed in a code block, i.e.

This is a code block

Furthermore, scripts will require some input and other input will be optional. Optional arguments will be surrounded by [] characters. In both cases, where the user need to specify some input (rather than just toggling a mode of operation), a brief description of that input will be surrounded by <> characters. For example: --user_input <required input> [--optional_user_input <optional user input>] [--optional_flag]

Table of Contents

Downloading data

Data are downloaded via the "copyq" on the Zeus machine at Pawsey. In the background on the NGAS servers, the raw voltages are (slowly) being "recombined" - they are labeled with the "Processed" flag on the MWA data archive
Therefore, there are two different "kinds" of downloads available: raw data OR recombined data. In both cases, the time from a download request to having the data on disk is typically ~few days (depending on volume, load on queues/archives, etc). 

All being well, the decision between "raw" and "recombined" downloads should be auto-magically made for you by the script (see Nextflow Notes for tips on running Nextflow scripts). If the data on the NGAS server has been processed, the script will download the "tarballed" recombined files (and, rather nicely, sends them off to get "untarred" on the fly). If the data has not been completely processed, then the raw data will be downloaded and then recombined.

  • If raw voltages are downloaded, then they will be placed in: /astro/mwavcs/vcs/[obs ID]/raw
  • If the processed/recombined data are downloaded, then the relevant files are placed in: /astro/mwavcs/vcs/[obs ID]/combined

If the observation is recent (performed within 4 days) then it is best to first check if all the VCS files have been archived (moved from site to Pawsey). This can be done with the following command: <obs ID> | grep Files

Which should output something like this:

Files available:    97.6%  (55244/56590)

In this example, 2.4% of the files are not yet archived so the user should wait until 100% of the files are archived. If all the files have not been archived in over 5 days you should contact someone on the MWA operations team to ask them what may have gone wrong.

To download VCS data, you must use the ASVO. Once you have created an account and logged in, you can use the ASVO job dashboard to submit a download job. You must click on the "Voltage Download Job" tab, input the observation ID, offset (seconds from the beginning of observation) and duration (in seconds). For example, here is how you would download the first 600 seconds of an observation.

You should then be able to see your download jobs on the ASVO job monitor. Once the job is complete, you can continue to the next step

The data will be downloaded to /astro/mwavcs/asvo/<ASVO job ID>. You can look at the first and last files in this directory to confirm your observation's first and last GPS second. To recombine raw data or untar combined data and put it in the correct directory can be done with: --obsid <obs ID> --begin <starting GPS second> --end <end GPS second> --download_dir /astro/mwavcs/asvo/<ASVO job ID>

A note on beginning and end times: the observation ID does not correspond to the starting GPS second of the data. In order to determine the beginning/end times of the observation data, use: <obs ID>

and this will help you determine the correct GPS times to pass to the -b/-e options. This applies for all jobs that have an optional beginning and end times.


As noted above in Downloading Data, the script includes the recombine step, so manually recombining will only be necessary for situations where the Nextflow option is either unavailable or undesirable. (Note that with the recent switch to using ASVO as the data downloading server, is now a bit of a misnomer – it can still be used to do the recombining, even if the data are already downloaded.) Other methods of recombining are detailed at the Recombine page.

Incoherent sums

After the download has completed, you already have all that is necessary to create the first kind of beamformed data: an incoherent sum. The data used to create this kind of beamformed output are labelled as <obsID>_<GPS second>_ics.dat in the downloaded data directory (/group/mwavcs/vcs/<obs ID>/combined by default), hence we refer to the incoherent sum files as ICS files. The incoherent sum is, basically, the sum of the tile powers, producing a full tile's field of view, with a √N improvement in sensitivity over a single tile, where N is the number of tiles combined. Unfortunately, this kind of data is also very susceptible to RFI corruption, thus you will need to be quite stringent with your RFI mitigation techniques (in time and frequency). 

To create the incoherent sum, we run: <obs ID> [-b base directory]

where the optional argument -b can be used to make the script look in the input path for the ICS data, but the default is /group/mwavcs/vcs/<obs ID>. This script will create a folder called ics in the base directory and populate it with the incoherently summed PSRFITS data products. There will be one file per 200-seconds of data, and the files are labelled as <obs ID>_XXXX.fits, where XXXX corresponds to which 200-second chunk is included (i.e. 0001 means the first 200 seconds of data).


In order to coherently beamform the data (i.e. form a tied-array or phased-array beam at a given pointing somewhere within the "incoherent" beam), we need to account for a number of factors. These corrections and how they are calculated is totally beyond the scope of this guide, but it amounts to: "What delay do we add to each tile to make sure they all phase-up and point at the same place on the sky?"

There are different ways that we can get this information. In our case, there are two types of observations that can be used:

  • dedicated calibrator: usually directly before or after the target observation on a bright calibrator source. These are stored on the MWA data archive as visibilities (they are run through the online correlator like normal MWA observations). You can find the observation ID for the calibrator by searching on the MWA data archive or by using the following command which will list compatible calibrator IDs sorted by how close they are in time to the observation: -c <obs ID>

    If this command generates an error, it may be due to the lack of calibration observations with the same frequency channels. If so try the Calibration Combine Method.

  • in-beam calibrator: using data from the target observation itself, correlating a section offline, and using those visibilities. See Offline correlation.

In order to download the calibration observation, set your MW ASVO API key as an environment variable. Below are some steps to do so:

  • First go to the website
  • Login with your username and password, or via eduGAIN (using your institution details)
  • Go to your profile and get your API key
  • Copy this API key to your .profile  or  .bashrc file (i.e. export MWA_ASVO_API_KEY=<your API key>)
  • For more information visit manta-ray-client github

To download a dedicated calibrator observation, use: -m download_cal -o <obs ID> -O <cal obs ID>

which will automatically create the correct directory structure and symbolic links and download the calibrator data to /astro/mwavcs/vcs/[cal obs ID]/[cal obs ID]. A symbolic link to this directory, called "vis", that is created in /group/mwavcs/vcs/[obs ID]/cal/[cal obs ID].

For both of these types of calibration data, there are (technically) multiple methods we can use to get calibration solutions, including: 

We currently only use the RTS to produce solutions, though in principle the imaging software pipeline is just as valid.

Calibrating with the Real Time System (RTS)

There are a number of steps to go from the visibilities to a usable calibration solution. Most of which is setting up the RTS input files. In all cases, we need a metafits file which contains a lot of information about the array setup, observation parameters, etc. By downloading a dedicated calibrator observation, there will be a [obs ID]*metafits*.fits included. When correlating for in-beam will also check to see if there is a metafits file available and download it.

Getting calibrator source lists

In order to do the calibration, we need 1 or more bright sources in the field to actually get the phase and amplitude solutions. The RTS uses a list of known sources to calibrate the antennas.

First, change to the directory where the calibrator "vis" symbolic link is located. This should be /group/mwavcs/vcs/[obs ID]/cal/[cal obs ID]. We get a list of sources for the RTS to use by running: -m <cal obs metafits file> -n <number of sources> -s <base RTS source list>

which will create a source list of one "super" source, which contains N components and is called "srclist_puma-*_[obs ID]_patch[N].txt". In general, it's a good idea to have N~1000, regardless of whether you are using a dedicated calibrator observation of doing in-beam calibration, but this number is totally arbitrary (in some cases, 100 may do).The RTS selects the brightest source per channel and uses it to calibrate the tile amplitudes and gains. The remaining N-1 sources ("components") are then used to correct for ionospheric shifts.

An important note here is that we have two options for the base RTS source list, which are captured with the following environment variables:

  • $RTS_SRCLIST  (newest source list, based on GLEAM catalogue, but does have gaps, particularly around some of the "A-team" sources and the Galactic plane)
  • $RTS_SRCLIST_V2  (the old source list, still has gaps around the Galactic plane, but includes the "A-team" we normally use for calibration)

By default, this code searches for a source with a beam-weighted flux density >10 Jy and within 1 degree of the nominal pointing centre. If no source satisfies that criteria, the search radius is incrementally increased by 0.5 degrees until a source is found.  The code also has an experimental option "order" (-o), which we can use to specify which sources to select based on flux density and distance from the pointing centre. For example, if we wanted to allow the primary calibrator to be 5 Jy (instead of the default minimum of 10 Jy), and search within a 20 degree radius of the pointing centre, then our command would become: -m <cal obs metafits file> -n <number of sources> -s <base RTS source list> -o experimental=10,20

This is particularly handy if, by default, the code does not pick up on the desired calibrator source (i.e. if you know Pic A is in the field, but not within the central degree or so). You can specify a wider search radius to begin with, at which point Pic A should be selected as the "base" calibrator source. If all else fails, please use $RTS_SRCLIST_V2 as the base source list and try again.

RTS configuration and tile/channel flagging

To create a RTS-specific configuration file (which the RTS reads to get all the information and file locations, etc, that it needs), use: -o <obs ID> -O <cal obs ID> -m <cal obs metafits file> -s <output source list> 
				 [--gpubox_dir <path/to/visibilities>] [--rts_output_dir <output directory>] [--n_vis_grp <# groups>] [--offline] [--nosubmit]

The --gpubox_dir option is by default pointing to /group/mwavcs/vcs/[obs ID]/cal/[cal obs ID]/vis, and should only be changed if the visibilities are stored in a non-standard location.

The --rts_output_dir is by default pointing to /group/mwavcs/vcs/[obs ID]/cal/[cal obs ID], and only experienced users should change the output directory. The script will create a "rts" subdirectory in the argument of --rts_output_dir.

The --n_vis_grp is an advanced option that sets how many bins the visibility data are chopped into. This affects how well the RTS can combat decorrelation over the band and will depend strongly on the baseline configuration (i.e. long baselines decorrelate faster, so more bins are required). In theory there is no downside to increasing this number other than computation time, but we are limited by GPU memory in this case. The default number of bins created is 6, but even this is not sufficient for the longest baselines. Given the current limitations of the Galaxy GPU queue hardware, we cannot use anything more than 6.

The --offline flag allows for offline correlated data to be used with the RTS (experimental - we have had limited success in doing this).

The --nosubmit flag tells the script to not submit the jobs to the GPU queue, but just to write the required scripts.

You should see an output like the following:

This script will create a "rts" subdirectory in whatever directory was given --rts_output_dir and populate it with:

  • flagged_tiles.txt, which contains the numbers of tiles that need to be flagged, based on what is in the metafits file provided,
  • flagged_channels.txt, which contains the channel numbers (depending on the fine channel resolution provided) to be flagged, and
  • a configuration file, rts_[cal obs ID].in, which is what the RTS will read during initialisation.

Inspecting the solutions

After the RTS has run, it will create a set of files in the "rts" subdirectory (Bandpass calibration files and Direction Independent Jones matrix files) used to actually beamform the data.
Once the calibration is completed (it runs on the "gpuq"), move to the rts directory and run: 


which will show you the bandpass amplitude and phase calibration solutions for each coarse channel in the "attempt_1" subdirectory.  From these plots, we are able to identify troublesome tiles and/or channels and include these in the flagged_tiles.txt and flagged_channels.txt and then re-running the calibration step.

A good solution will have a relatively flat bandpass with a nominal  "gain relative to band average" value of ~1 for Jones Matrix elements P←X (top left) and Q←Y (bottom right), and should have a gain value of ~0 for the P←Y (top right) and Q←X (bottom left) components. Below is an example of the amplitudes for an excellent calibration solution.

For the phase solutions, we expect that the P←X (top left) and Q←Y (bottom right) components are around 0 degrees, while the P←Y (top right) and Q←X(bottom left) should be random. Below are the phase solutions corresponding to the above amplitude solutions.

Remember to check the calibration solution for each coarse channel and if only a single coarse channel appears to have problems, that is okay, just remember that when processing your data (with Presto or DSPSR) as you may have to flag that coarse channel.

Improving the RTS solutions

If your calibration solutions don't look as clean as the above images you can use the following steps to attempt to improve them. Remember that calibration is more art than science so try a few different methods and see what works, auto_plot.bash will make a new "attempt_N" subdirectory so if you make your calibration solution worse you can copy the files from the attempt subdirectory with a better solution to the rts directory to overwrite your latest attempt. If a calibration solution isn't converging it may be easy to try a different calibration observation (3C444 often gives the best calibration solutions).

Here is an example of an imperfect calibration solution

The key in these plots are suggestions of which tiles may need to be flagged. These suggestions mean different things for amplitude or phase solution plotting:

  • Amplitudes: the relative gain amplitude threshold used to determine whether tiles are bad. Default is 0.35. 
  • Phases: the phase deviation threshold (in degrees) used to determine bad tiles. Default is 30 degrees.

I (Nick) normally use the amplitude plots to work out what needs flagging so you can assume I'm talking about amplitude plots from here onward. Here are some basic tips:

  • If there are any tiles with max>3 on multiple channels it is worth flagging.
  • If there are any tiles that have a max=0 this is worth flagging (even if only one polarisation has max=0!) as it is contributing no signal and can cause errors in beamforming
  • It is best to only flag up to ~3 tiles at a time as bad tiles can affect other potentially good tiles
  • Sensitivity scales as ~sqrt(128-N) where N is the number of tiles flagged so if you start flagging more than 50 tiles will start to lower your sensitivity and maybe worth abandoning
  • Make sure you put the number in the right of the key (between flag and ?) into the flagged_tiles.txt file

In the "attempt_number_N" subdirectory are a chan_x_output.txt and phase_x_output.txt file that contains all of the recommended flags that the calibration plotting script creates. These can be useful when deciding which tile(s) to flag next. The following bash command will output the tile with the greatest gain value for each channel:

for i in $(ls chan*txt); do grep $(cat $i | cut -d '=' -f 3 | cut -d ' ' -f 1 | sort -n | tail -n 1 | head -n 1) $i; done

Here is an example output of the bash command

Possible PQ flag: ID= 82, max=2.921873 (flag  81?)
Possible QQ flag: ID=123, max=1.634151 (flag 122?)
Possible PP flag: ID= 69, max=1.744891 (flag  68?)
Possible PP flag: ID= 82, max=1.776776 (flag  81?)
Possible PP flag: ID=123, max=1.676935 (flag 122?)
Possible QQ flag: ID=123, max=1.500668 (flag 122?)
Possible PP flag: ID=  6, max=1.759412 (flag   5?)
Possible PP flag: ID=123, max=1.584389 (flag 122?)
Possible PP flag: ID=123, max=1.545457 (flag 122?)
Possible PP flag: ID= 82, max=1.906439 (flag  81?)
Possible QQ flag: ID= 15, max=1.805018 (flag  14?)
Possible PP flag: ID= 82, max=1.837698 (flag  81?)
Possible PP flag: ID=123, max=1.877094 (flag 122?)
Possible QQ flag: ID=121, max=1.762988 (flag 120?)
Possible PP flag: ID=  6, max=1.699051 (flag   5?)
Possible QQ flag: ID= 14, max=1.678808 (flag  13?)
Possible PP flag: ID=123, max=1.892947 (flag 122?)
Possible QQ flag: ID=123, max=1.839456 (flag 122?)
Possible QQ flag: ID=121, max=1.696999 (flag 120?)
Possible QQ flag: ID= 69, max=1.818684 (flag  68?)
Possible PP flag: ID=  6, max=1.750552 (flag   5?)
Possible PP flag: ID=123, max=1.908161 (flag 122?)
Possible PP flag: ID=  6, max=2.012411 (flag   5?)
Possible PP flag: ID= 15, max=1.563874 (flag  14?)

So for this example, you should flag tile 122.

It is also interesting to look for obviously too low powers, which can be achieved by running the following, which will list each tile with the lowest gain value. Tiles with very small (near-zero) values should be flagged.

for i in $(ls chan*.txt);do grep $(cat $i | tail -n+3 | cut -d '=' -f 3 | cut -d ' ' -f 1 | sort -n | head -1) $i;done

If there is a single channel that is affecting your solution, you can flag it using the following method.

Make sure you have used the -X option when you ssh to enable an interactive terminal, then run: -f BandpassCalibration_node<coarse channel number>.dat -c

This should create an interactive plot that looks like this

Use the magnifying glass symbol to drag a rectangle around the bad channel to zoom in like so

It is now clear that you should add 23 to flagged_channels.txt

Picket fence calibration with the RTS

The MWA can optionally split its 30.72 MHz of bandwidth up into 24 x 1.28 MHz bands that can be spread anywhere within the nominal observing range (70-300 MHz). We call observations of this type "picket fence". Calibrating these observations and combining them can be a little more involved, but none-the-less doable. 

The RTS takes quite a bit of work to get going with picket fence data. It assumes that when you pass it data that all the frequency bands are contiguous. So we have to do some "hacking" of the configuration files we send for each picket fence channel. We've incorporated all of the difficult stuff into, so the user shouldn't need to worry whether they are calibrating using picket fence data or not.


  • The RTS basically does a "ls" command to find all of the gpubox files, which on a parallel file system can sometimes hang. This will be obvious when watching the RTS job for a channel takes much longer than other channels. In this case, it is a good idea to look in the batch output file (named something like RTS_[obs ID]*.out) and the log files (which are written to the output directory specified when calling to help determine which jobs are hanging. These jobs should be cancelled and resubmitted individually to the queue. 

  • You may find that some coarse channel solutions simply don't converge. In this case, look at those channels that did converge and flag any obviously bad tiles or fine frequency channels (by adding the relevant entry to either flagged_tiles.txt or flagged_channels.txt) and re-run the calibration (often you will need to iterate a few times anyway). Hopefully, by eliminating really bad tiles/fine-channels, other coarse channels will converge. 

Cleaning up after using the RTS

Once you have a calibration solutions you're happy with, please remember to delete the raw visibilities, which are no longer needed.

It is a good idea to make a backup copy of the solutions themselves. We suggest making a compressed tarball that includes the following files:

  • The DI_Jones files,
  • The Bandpass files,
  • flagged_channels.txt and flagged_tiles.txt, and
  • the rts_[OBSID].txt file

Compression will take the size down from ~55 MB to ~20 MB. The final tarball can then be easily transported to a secure backup location of your choice.

Calibrating with the calibrate software from mwa-reduce package by Andre Offringa

This is initial introduction of using calibrate software from mwa-reduce package by Andre Offringa (AO). It is available on Garrawarla after loading module:

  • module load mwa-reduce/master
  • Execute calibrate to see the available options.

It requires CASA measurement set which can be downloaded using MWA ASVO interface (see also in the RTS section above). To use mwa_client on Garrawarla load:

  • module load manta-ray-client/master
  • Downloading example :
    • Create csv file target.csv, cat target.csv:
      obs_id=1103645160, job_type=c, download_type=vis, timeres=1, freqres=40, conversion=ms, edgewidth=160, allowmissing=true, noflagautos=true
    • Then use command: mwa_client --csv=target.csv --dir=./
    • This will download zip file with CASA measurement set from MWA ASVO regardless of the internal format

Sky model file can be generated in the same way as sky model for the RTS using PUMA  (see also in the RTS section above). A-team calibrator soures may require a better model than the default in PUMA.

Assuming the name of CASA measurement set is and sky model is saved in sky_model.txt the following command can be used as an example:

calibrate -j 14 -applybeam -mwa-path /pawsey/mwa -absmem 40 -m sky_model.txt -minuv 58 -i 100 1103645160_calsol.bin

There are several options. Here, -applybeam beam specifies if you want to use the MWA 2016 (or other) beam model (beam model files are in /pawsey/mwa/) to calculate apparent flux densities of the calibrator sources in the sky_model.txt file. Alternatively, some people do it themselves and put "apparent sky" flux values (corrected for the beam) into sky_model_BeamCorr.txt sky model files. Parameters to specify min and max baselines to be used -minuv and maxuv respectively.
The resulting calibration solutions will be saved to a binary file 1103645160_calsol.bin. This file can be easily read into python.

I've attached two (untested) example scripts which should work on Garrawarla. Calibration solutions in the .bin AO-format can be plotted for example by using script from package by running:

  • 1103645160_calsol.bin

Offline correlation

In the case where trying to calibrate on a dedicated calibrator observation fails, we can try in-beam calibration. This requires visibilities made from the actual data you're ultimately trying to beamform on, thus we need to use the offline correlator. This is relatively easy, as it's a mode that has: it will send off the correlator jobs at the frequency and time resolution you request and will (by default) put them in the /astro/mwavcs/vcs/[obs ID]/vis

While you technically can correlate the entire observation, that is not recommended - the data volume will be immense. Usually you can just correlate ~200 seconds and that will be sufficient for a calibration solution.

For a desired frequency and time resolution over the interval [begin : end] (in GPS seconds), use: -m correlate -o <obs ID> -b <starting GPS second> -e <end GPS second> --ft_res <freq. res. (kHz)> <time res. (ms)>

A typical ft_res is --ft_res 40 1000 which is 40 (kHz) and 1 (second) integrations. Note that the offline implementation of the correlator is (currently) LIMITED to a maximum dump time of 1 second. Again, see the help for details.

The correlator jobs are run on Garrawarla's "gpuq", thus you can check the jobs using:

squeue -u $USER -p gpuq

Finding pulsars in the observation

The huge field-of-view of the MWA's tile beam means that there can be 100s of pulsars in an observation. To list all pulsars in the beam run the following command

module load vcstools/devel -o <obs ID>

Which will create a file <obs ID>_analytic_beam.txt with all the pulsars in the beam.

You can also run -o <obs ID> --sn_est

To estimate the signal-to-noise ratio of the pulsars in this observation. Note that this is a time-consuming calculation so you'll have to be patient and they often have large uncertainties. See section Estimating pulsar fluxes at MWA frequencies for more information on the method.

We do have a data processing pipeline that will beamform, fold and create stokes profiles on all pulsars in the beam. This pipeline is extremely processing heavy and still in development so it's best that you ask Keegan or Nick to run the pipeline for you.

Beamforming (Nextflow method)

The new Nextflow beamforming method is just a better wrapper for the beamformer (see Nextflow Notes for pros, cons and tips) and automatically splices the fits files. The old method is shown below if you prefer it.

To display the available options of the Nextflow beamformer run --help

Once we have finished calibrating we can beamform using --obsid <obs ID> --calid <cal ID> --all (or --begin <begin GPS time> --end <end GPS time>) --pointings <"RA string">_<"DEC string">,<"RA string">_<"DEC string"> [--ipfb] [--summed]

where ["RA string"] is formatted as "" and ["DEC string"] is formatted as "" (including the sign: "+" or "-") you can also include multiple pointings by separating them by a comma (,). The beamformer will output full polarisation (Stokes I, Q, U & V) PSRFITS files unless you used the --summed flag. The summed option only outputs Stokes I which means you can't create polarisation profiles but uses a quarter of the storage and for that reasons is useful for large scale searches.

In our experience, it typically takes ~0.3x real-time to beamform. The process will create a directory /astro/mwavcs/vcs/[obs ID]/pointings/[RA string]_[DEC string] where the PSRFITS files will labelled as something like: 

[obs ID]_[pointing]_ch[lowest channel]-[highest channel]_0001.fits

where [channel] is the absolute frequency channel (0-256).

We are also able to create VDIF format output by including --ipfb which instead produces two files per channel: 

[obs ID]_[pointing]_ch[channel]_u.hdr
[obs ID]_[pointing]_ch[channel]_u.vdif

where [obs ID], [pointing] and [channel] are defined as above.

Pulsar Processing on Garrawarla

PRESTO and DSPSR are not currently natively installed on Garrawarla so the following singularity commands.

For PRESTO commands use:

/pawsey/mwa/singularity/presto/presto.sif <command_here>

For example:

/pawsey/mwa/singularity/presto/presto.sif prepfold -o <output_name> -psr <pulsar_jname> *fits

For simple DSPSR and PSRCHIVE commands use:

/pawsey/mwa/singularity/dspsr/dspsr.sif <command_here>

If you require an interactive or pop up window (such as with tempo2) first, make sure you use the -Y option when you ssh into Garrawarla then use:

singularity run -B ~/.Xauthority /pawsey/mwa/singularity/psrchive_tempo2/psrchive_tempo2.sif <command_here>

For PulsePortraiture use

singularity run -B ~/.Xauthority /pawsey/mwa/singularity/pulseportraiture/pulseportraiture.sif <command_here>

The Observation Processing Pipeline

The OPP is a piece of software available on Garrawarla designed to fold on all known pulsars in an observation. The OPP will also launch polarimetric processing for detected pulsars.

First estimate the number of pulsars that will be beamformed using -o <obsid> and checking the output file. The number of sources may be higher than this because we also search for other sources (FRBs, RRATs and Fermi candidates) and pulsars further away from the beam centre if we estimate them to be extremely bright. We can estimate the storage the beamformed fits files require using <number of pulsars> X <observation duration in minutes> TB. If this number is below 50 TB then use the method described in Beamforming on all pulsars and Launching the OPP, if it is over 50 TB use the method described in Running the OPP in chunks

The PPP Workflow

Image accurate as of commit e6215f42c1d7c0b5a255721bc46840335170e579 to mwa_search repo

  1. Input Data: The OPP requires the calibrated and beamformed products of the VCS. These data can be acquired using the method described here.
  2. Pulsar Search: Given an observation ID, each pulsar within the field is identified and handed to the Pulsar Processing Pipeline (PPP)
  3. Initial Fold(s): Performs a PRESTO fold on the data. For slow pulsars, this will probably be 100 bins. Fast pulsars will be 50 bins.
  4. Classification: The resulting folds are classified as either a detection, or non-detection.
  5. Best Pointing: For the MWA's extended array configuration, there may be multiple pointings for a single pulsar. Should this be the case, we want to find the brightest detection to use for the rest of the pipeline. The "best" detection will be decided on and its pointing will be the only one used going forward.
  6. Post Folds:  A series of high-bin folds will be done. This is in order to find the highest time resolution fold we can do while still getting a detection.
  7. Upload products to database: Uploads the initial fold and best fold to the pulsar database.
  8. IQUV Folding: Uses DSPSR to fold on stokes IQUV, making a timescrunched archive. This archive is immediately converted back to PSRFITS format for use with PSRSALSA
  9. RM Synthesis: Runs RM synthesis on the archive. If successful, will apply this RM correction.
  10. RVM Fitting: Attempts to fit the Rotating Vector Model to the profile. If successful, will upload products to the database.

Beamforming on all pulsars

Before running the OPP, we need to beamform on all pulsars in the field of view of the observation we're processing. This can done with the following command: --obsid <OBSID> --calid <CALID> --begin <OBS BEGIN> --end <OBS END> --publish_fits_scratch

Like many nextflow scripts, it's recommended that you run this in a screen as it may take some time to complete.

 You can make sure you have the beamformed fits files by checking the following directory:


Launching the OPP

To run the OPP, ensure you have the vcstools and mwa_search modules loaded. Then, the following command will launch the  OPP: --obsid <OBSID> --calid <CALID> --beg <OBS BEGIN> --end <OBS END> --run_type fresh_run

This will launch an instance of the Pulsar Processing Pipeline for each pulsar in the field. Additional run options can be accessed with the --help tag. This may take up to an hour for observations with many pulsars.

Currently, there are three run types:

  • Fresh run: Creates a new config file for each pulsar, clears the working directories of any previous results and launches the PPP
  • Rerun existing: Loads the existing config files in the 'dpp' directory and launches the PP for each of them without any modifications
  • Rerun broken: Loads the existing config files, checks to see which did not complete properly and relaunches these jobs only

The Pulsar Processing Pipeline

The PPP is a piece of software designed to work in conjunction with the OPP. It handles the processing requirements for a single pulsar. You shouldn't need to interact with the PPP, however, this page will detail how it works.

When the PPP is launched, it checks the config file (.yaml) created by the OPP and decides what needs to be done. From here, it will create a job script and a relaunch script that will run the PPP script again. When the job has finished, the PPP will launch and read the config file again and decide what to do once again. This continues until there are no more things that can be done for the pulsar. Everything that happens in the PPP is documented in a .log file in the following directory:


The .log file contains all of the job information for the run. Similarly, the results of the run can be found in the config file and the resulting files will be found in the above directory.

Should you need to launch the PP manually, you can so so with the following command: --cfg <CONFIG_YAML>

Note that by default, the PPP will continue where it left off. If you want to restart the pipeline, add the --fresh_run tag. You can use the --help tag for more information


Many observations will contain a large number of pulsars in their field of view. This makes it inconvenient to check each output of the PPP manually to check its outcome. Instead, you can use the following command to do so (provided the OPP run is completely finished): -o <OBSID>

An example output looks something like this: -o 1223042480
# INFO:root: Reading config files from obsid 1223042480
# INFO:root: The following pulsars were detected:


# INFO:root: 100 || The following pulsars completed their run successfully:


# INFO:root: 101 || The following pulsars could not proceed with an RVM fit:


# INFO:root: 200 || The following pulsars were not classified as a detection:


# INFO:root: Total pulsars run through opp: 15
# INFO:root: Total successful detections: 2


Note that the positive detections are listed at the very top of the output. A detected pulsar is merely one that was classified as a detection by the LOTAAS classifier.

Running the OPP in chunks

This method explains how to run the OPP a chunk at a time so when you have a large number of pulsars in the FoV you can save storage space. It is best to run all of these commands in a screen from your Nextflow work directory (/astro/mwaops/<user>/<obsid>_work)

First, run all of the searches using --obsid <OBSID> --calid <CALID> --begin <OBS BEGIN> --end <OBS END> --only_cand_search

Once it is complete you can check the candidates in /astro/mwavcs/pulsar_search/<obsid>_candidates and delete the contents of the Nextflow work directory using

rm -r /astro/mwaops/<user>/<obsid>_work/??

Then run the following command to get all the required pointings and split them into 60 pointing chunks -o <obsid> -a -n 60 -c

This should output several pointing chunk files that look like this


Each file represents a chunk of the OPP you will run. For each file complete the following 4 steps (beamform, delete work files, run OPP and delete pointing files)

Run the beamformer using --obsid <obsid> --calid <calid> --begin <OBS BEGIN> --end <OBS END> --pointing_file <pointing chunk file> --publish_fits_scratch (--ipfb)

The (–ipfb) option should only be used for the vdif sources (eg. 1275758864_fov_vdif_sources_1_25.txt). Once the beamforming has complete successfully delete the Nextflow work files

rm -r /astro/mwaops/<user>/<obsid>_work/??

Then run the OPP --obsid <OBSID> --calid <CALID> --beg <OBS BEGIN> --end <OBS END> --run_type (fresh_run/rerun_broken)

If this is your first chunk of data use --run_type fresh run, if not use --run_type rerun_broken.

Once all the OPP jobs have finished you can safely delete the pointing files using

for i in $(cat <pointing chunk file>); do rm -r /astro/mwavcs/vcs/1275758864/pointings/${i}; done

Repeat these 4 steps until you have used all the pointing chunk files.

Multi-Gaussian Profile Fitting

Should you need to fit a pulse profile, vcstools contains a program to do so using multiple Gaussian components. The purpose of the fitter is not to attempt to model individual beam components, but rather to dissect the profile into a clean mathematical entity that can be used to attain some useful measurements of the profile. The script we use to run the fitter is This script requires a single input file that is one of the following formats:

  • PRESTO .pfd.bestprof
  • PRSCHIVE .archive
  • ASCII output of a .archive file

Note that even if a full-stokes .archive or ACII profile is provided, only Stokes I will be used, other stokes variants will be ignored.

The profile is the only required input for However, I would recommend using the --clip_type input and setting it to verbose

The verbose tag increases the number of alpha attempts when determining the optimal clipping verbosity. This is used to determine what part of the profile is noise and what is a real on-pulse region. A small alpha will not flag much of the profile as noise while a large alpha will be more aggressive with its flagging. The fitter decides which alpha value is most appropriate by attempting to fit a profile to ALL alpha variants and evaluating the result using the Bayesian Information Criterion.

An example run of the script using a command-line interface looks like this: --clip_type verbose --bestprof <path/to/profile.pfd.bestprof>

Which will produce a logger output that looks something like this:

2021-04-09 11:56:59,743  vcstools.gfit  223   INFO      :: ### Best fit results ###
2021-04-09 11:56:59,743  vcstools.gfit  224   INFO      :: Best model found with BIC of 1204.336446223922 and reduced Chi of 1.0756249942716816 using an alpha value of 4.0
2021-04-09 11:56:59,743  vcstools.gfit  225   INFO      :: W10:                   0.0327960216238512 +/- 0.0001587968509055737
2021-04-09 11:56:59,743  vcstools.gfit  226   INFO      :: W50:                   0.00741350847899902 +/- 0.00018715018881919736
2021-04-09 11:56:59,743  vcstools.gfit  227   INFO      :: Wscat:                 0.025053330660786743 +/- 0.0001387467684711917
2021-04-09 11:56:59,743  vcstools.gfit  228   INFO      :: Weq:                   0.013014107730249715 +/- 4.857997176991152e-06
2021-04-09 11:56:59,743  vcstools.gfit  229   INFO      :: Maxima:                [0.5415930174045701, 0.558888361926715]
2021-04-09 11:56:59,743  vcstools.gfit  230   INFO      :: Maxima error:          [6.478112024072669e-05, 0.00015442136061085071]
2021-04-09 11:56:59,743  vcstools.gfit  231   INFO      :: S/N estimate:          56.069659562426395 +/- 1.2607087746511898

Here, the widths and maxima are measured in units of phase.

One additional feature of the fitter is the plotting capability. To plot your result, you need to supply a plot name. For example: --clip_type verbose --bestprof <path/to/profile.pfd.bestprof> --plot_name <path/to/ouput/plot/plotname.png>

Which produces something like this:

The fitter is also capable of fitting noisy profiles:

The MWA Pulsar Database

The MWA Pulsar Database is where we store all of our detections and it is your responsibility to upload all detections you make to it.

Setting up your account

To get an account email Nick Swainston ( and then perform the following steps to set up your account on the Pawsey supercomputers.

Make a file with your username and password

cd ~
vi .mwa_pulsar_database_auth

Then write this to the file:

export MWA_PULSAR_DB_USER="<username>"
export MWA_PULSAR_DB_PASS="<password>"

but replace <username> and <password> with your username and password. Then we will change the permissions so no one else can open it

chmod 600 .mwa_pulsar_database_auth

Then open your .bashrc and add the line

source ~/.mwa_pulsar_database_auth

This should now set your username and password every time you log in so log in again

Uploaded detections to the database

There are a variety of files that you can upload to the database so it's a good idea to look through the available options with -h

The most common upload is a PRESTO prepfold detection which can be uploaded with -o <obs ID> -O <cal ID> -p <pulsar_Jname> -b <bestprof_file> --ppps <presto_plot>

Where the <bestprof_file> file should end in .pfd.bestprof and the <presto_plot> file should end in

Pulsar search pipelines

Nick has developed an MWA pulsar search pipeline which is getting stable enough for most users to be able to start running. These pipelines are very processing heavy and can cause bugs if run in a non-standard way so it's best to discuss with Nick what sort of search you're planning to do.

Working out a pointing grid

If you are doing a large scale search (more than a single pointing), for example, a search of a supernova remnant, then you will likely have to play how you grid your pointings using If you want to do a single loop (-l 1) of beams at the half-power point (-f 0.5) then you could use the following command -o <obsid> -l 1 -f 0.5 -d <fwhm_deg> -p <pointing>

where <fwhm_deg> is the FWHM of the beam in degrees, this can be estimated using <obsid>.

If you want to fill a RA and dec range you could use a command like this -o <obsid> -l 50 -f 0.8 -d <fwhm_deg> -p <pointing> --ra_range <min RA> <max RA> --dec_range <min dec> <max dec>

Where the --ra_range and --dec_range values are in degrees.

If you want to fill a circle with beams you could use a command like this (note this is only available in the mwa_search devel branch/module version) -o <obsid> --fill <radius_to_fill_in_degrees> -f 0.8 -d <fwhm_deg> -p <pointing>

It's always best to check the output png file to check that the grid looks reasonable. Once you are happy with your grid you can use the search pipelines using --pointing_file < output> 


The will beamform and search. A common search command is: --obsid <obs ID> --calid <cal ID> --all (or --begin <begin GPS time> --end <end GPS time>) --pointings <"RA string">_<"DEC string"> 

The candidates will be output to /astro/mwavcs/pulsar_search/<obsid>_candidates

By default, the search is a simple periodic search. To perform an acceleration search use --zmax 200 (this is extremely processing heavy so recommended you only do it on short observations)

By default, DMs 1 to 250 is searched. To change this use --dm_min --dm_max.

To see all the available options use: --help


If you have already beamformed and have the fits files available, you can use --obsid <obs ID> --fits_file <fits file location> --dur <fits file duration in seconds>

You should only input a single fits file to prevent errors in the single pulse searcher. The candidates will also be output to /astro/mwavcs/pulsar_search/<obsid>_candidates.

Estimating pulsar fluxes at MWA frequencies

The ATNF database is an abundant resource for published pulsar data. This may include flux densities over a broad range of frequencies. Should such data be available for a pulsar, we are able to make a first-order estimate of the flux we expect from a pulsar.

To view the spectral data on the ATNF database for any pulsar: --mode ATNF -p <Pulsar Name>

Which will output a plot such as this one:

Note that a spectral Index will be derived from the points. If there is sufficient data to do so, like in this example, this is calculated from a least-squares approach. Otherwise, the standard 1.4 +/- 1.0 will be applied from Bates et al. 2013

We can also make an estimate of a pulsar's flux at from any observation ID: --mode SNFE -p <Pulsar Name> -o <Obs ID>

This will apply the radiometer equation and estimate the flux and signal to noise ratio at the observation's central frequency. It is also possible to specify beginning and end times if you don't intend to use the entire observation length by adding:

-b <beginning GPS time > -e <end GPS time>

Plotting the SED like before is also possible with the tag:


For example: --mode SNFE -p J0630-2834 -o 1258221008 --plot_est

screen will output the following:

J0630-2834 derived spectral index: -1.3698868864530596 +/- 0.04796669612862415
J0630-2834 flux estimate at 154.24 MHz: 0.41368443304349944 +/- 0.006638533625983657 Jy
Pulsar S/N: 747.0705031130714 +/- 161.9689693670181

Nextflow Notes

This section explains the pros and cons of Nextflow and some tips on how to run the Nextflow scripts well


  • Shows you the progress of the pipeline
  • Prints any errors that occur (no need to look through the batch scripts)
  • Keeps all intermediate files in one place so they're easily deleted
  • If one job breaks, the pipeline can be resumed (using -resume) without having to rerun everything again
  • Makes handling container images easier


  • The Nextflow job runs as long as the pipeline so it's often best you put it in a screen (explained below)
  • Newer so there may be more bugs

Using a screen 

Screens allow you to run a command in the background that won't be interrupted when you log out of an ssh session. Here is a good guide of how to use screens:
I will go through the basics that you will require for running Nextflow

To make a screen use the command

screen -S <screen_name>

Make sure the screen_name is descriptive enough so that you know what you're doing in that screen. You will then be put inside a screen where you can run a Nextflow pipeline. You can then detach from the screen by holding down ALT and pressing d. Your pipeline will then continue to run in the background and you can return to the screen using

screen -r <screen_name>

If you can't remember what the screen_name was then you can run

screen -r 

This will reattach you to your screen if you only have one or list all your screens. If you can't see the screen you've made then you may be on the wrong login node. To switch to the other login in node run

nswainston@garrawarla-1:~> ssh garrawarla-2

Resuming Nextflow Pipelines

One large benefit of Nextflow pipelines is that you can resume the pipelines. Once you have fixed the bug that caused the pipeline to crash simply relaunch the pipeline with the -resume option added. For the resume option to work you must run the command from the same directory and the working directory can't be deleted

Cleaning up the work directories

Once the pipeline is done and you are confident you don't need to resume the pipeline or need the intermediate files then it is a good idea to remove the Nextflow work directories to save space. By default, the work directories are stored in

Calibration Combining Method (username is MWA-guest and the password is bowtie) can be used to search for observations and look at their metadata. To look for calibrators for your observations input your observation ID and click search like so

Open the observation ID link in a new tab and find the frequency channel IDs:

Then to find nearby calibrators, remove the observation ID, click the calibration box and add a start time 4 hours earlier than the target observation and end time hours 4 later. Like so:

Click search and you should see results similar to this

The name formatting for calibrator observations is the name of the calibrator source, an underscore and the centre frequency channel ID. Try and find a pair of calibration observations with the same calibrator source and, together, will cover the entire frequency range of the target observation. For the above example, this was 1195317056 and 1195316936. If you can't find any suitable calibration observations, then you can keep increasing the time search window up to 48 hours.

Now that you know which calibration observations you need, download and calibrate them as you normally would as explained in the calibration section. It is best to use the same values in the flagged_tiles.txt and flagged_channels.txt for all calibration obs to ensure your calibration solutions are consistent. Once the calibration is complete you can combine the two calibrations into one using the script -o <observation ID> -c1 <first calibration ID> -c2 <second calibration ID>

This will output the combined calibration solution to /astro/mwavcs/vcs/[obs ID]/cal/<first calibration ID> _<second calibration ID>/rts and you can treat the calibrator ID as <first calibration ID> _<second calibration ID> when being used in other scripts.

Deprecated Methods

These are old methods that are not maintained but may be useful if you need to do something specific or the new scripts have failed

Download (old python method)

If there are any issues with the above download method you can always try the previous method using the script

To download an entire observation (referred to by their observation IDs, GPS seconds, as labelled on the MWA data archive), use: -m download -o <obs ID> -a

otherwise, if you only want some interval [begin, end] of the full observation, then use: -m download -o <obs ID> -b <starting GPS second> -e <end GPS second>

Additionally, if the data have been processed on the NGAS server, then you have the option to ONLY download the incoherently summed data files (*_ics.dat) with the mode option: -m download_ics.

To check progress, use:

squeue -p copyq -M zeus -u $USER

and you should see something like this:

Each user can have a maximum of 8 concurrent download jobs running. 

Checking downloaded data

By default, data are written to /astro/mwavcs/vcs/[obs ID]. Once the observation appears to have downloaded correctly, it is good practice to check. The easiest way is to check to output from the download command. In /group/mwavcs/vcs/[obs ID] there is a subdirectory labelled "batch". Within the "batch" directory is stored all of the submitted SLURM scripts and their output. If there has been a problem in downloading data, this will appear as up to 10 files named "check_volt*.out", i.e.

check_volt_[GPS time]_0.out
check_volt_[GPS time]_1.out
check_volt_[GPS time]_2.out

After 10 attempts to correctly download the data, we stop trying. In this case, the user will need to investigate the SLURM output to see what was going wrong.

If the raw data was downloaded, and you manually want to check them, then use: -m download -d raw -o <obs ID> -a (or -b/-e)

otherwise, if the recombined data (i.e. with the "Processed" flag on the MWA data archive) were downloaded, and you manually want to check them, then use: -m recombine -o <obs ID> -a (or -b/-e)

The total data volume downloaded will vary, but for maximum duration VCS observations this can easily be ~40 TB of just raw data. It is therefore important to keep in mind the amount of data you are processing and the disk space your are consuming. If only the raw voltages have been downloaded then you will need to recombine the data yourself, which doubles the amount of data (see next section).

Beamforming (old python method)

There are a few different modes in which the beamformer can be operated and these different "modes" produce different outputs.

With the calibration solutions, we are able to form a tied-array beam at any pointing direction within the field-of-view. To start the beamforming process (it runs on the "gpuq"), use: -m beamform -o <obs ID> -a (or -b/-e) -p <"RA string"> <"DEC string"> --DI_dir </path/to/rts/output> --flagged_tiles </path/to/flagged_tiles.txt> 
 			   [--incoh] [--bf_out_format <"psrfits" or "vdif" or "both">]

where ["RA string"] is formatted as "" and ["DEC string"] is formatted as "" (including the sign: "+" or "-"). If you also want the incoherent sum, you can pass the --incoh flag.

Splicing channels together

We are able to splice together adjacent frequency channels for each 200 second chunk using the splice_psrfits program. In it's current form it accepts all the files you want to splice into one (i.e. the 24 coarse channel data files for an individual 200 second chunk), plus an output suffix. It is invoked as follows:

splice_psrfits [file1] [file2] ... [suffix]

and the output file will be named [suffix]_0001.fits.

Note that frequency ordering matters here! The beamformed PSRFITS are already ordered by channel, so just add them in order. i.e.

splice_psrfits [lowest channel] [second lowest channel] [...] [suffix]

If you give it channels in the wrong frequency order, it will complain about channels not being "adjacent" and crash.

For VDIF output, each channel is written to its own file. As far as we can tell, the only way to add them together is to first create the archives using DSPSR and then add them together with PSRCHIVE tools.

To make this all easier, the script will handle the multiple frequency channels and multiple 200-second chunk aspect of this. To make use of this, change into the directory that contains the individual coarse channel PSRFITS, and then run:

at which point you will be asked to input the Project ID (usually G0024), the observation ID (this is required), how many 200-second chunks you want to combine, and the lowest and high coarse channel number. All of these, except the observation ID, have sensible defaults. If you do not enter an observation ID, the script will not proceed.

  • No labels