Code Documentation

CLI Module

Contains functionality for commandline interface as well as argument validation.

dcdf.cli._build_parser() → argparse.ArgumentParser

Build the parser for commandline arguments.

Returns:returns an argument parser for the CLI
dcdf.cli._check_nifti(file_list: List[str], from_file: Optional[bool] = False) → bool

Check whether each of the nifti files can be found.

Parameters:
  • file_list – Should be one of: args.build, args.evaluate, arg.reference_masks, args.evaluation_masks, args.group_mask. Where args are the parsed arguments.
  • from_file – Whether the arguments have been supplied as text files.
Returns:

True if each of the nifti images can be found on disk.

dcdf.cli._get_bounds_filter(args)

If lower/upper bounds have been specified by the arguments, then provide a filter to be applied to the data.

Parameters:args – The parsed arguments to this program.
dcdf.cli._get_list(arg: List[str], from_file: bool) → List[str]

Helper function to handle the from_file = True/False usecases.

Parameters:
  • arg – either a list of nifti filenames, or a list with a single entry to a textfile (of filenames)
  • from_file – if False, arg is a list of filenames, if true, it is a list with a single entry to a textfile
Returns:

List of filenames

dcdf.cli._lc(filename: str) → int

Helper function to count the number of lines in a file.

Parameters:filename – name of the file to be read – assumed to be plain text.
Returns:count of the number of lines in file
dcdf.cli._validate_args(parser: argparse.ArgumentParser) → bool

Sanity checking on the inputs. Returns False if any checks fail.

Parameters:parser – the parser returned from _build_parser
Returns:False if any of the arguments fail the sanity checks, else True.
dcdf.cli.main()

This function is called on startup, and consists of the following steps: 1. Construct our parser, validate commandline arguments, and retrieve arguments. 2. Construct a filter based on any specified bounds. 3. Build reference as specified. 4. Write reference (if requested). 5. Evaluate samples (if requested) – run in parallel if requested. 6. Print results.

Data Module

Contains functions used to load and create the datastructures used in this project.

class dcdf.data.ModifiedECDF(ecdf: scipy.stats.stats.CumfreqResult)

Bases: object

__dict__ = mappingproxy({'__module__': 'dcdf.data', '__init__': <function ModifiedECDF.__init__>, '__dict__': <attribute '__dict__' of 'ModifiedECDF' objects>, '__weakref__': <attribute '__weakref__' of 'ModifiedECDF' objects>, '__doc__': None})
__init__(ecdf: scipy.stats.stats.CumfreqResult)

Slightly hack-ish class that was added to add support for inverse functions without changing too much of the code.

Parameters:ecdf – the output of scipy.stats.cumfreq which is being modified.
__module__ = 'dcdf.data'
__weakref__

list of weak references to the object (if defined)

dcdf.data.get_datapoints(input_filename: str, mask_filename: Optional[str] = None, mask_indices: Optional[numpy.ndarray] = None, ignore_zeros: Optional[bool] = True, filter: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None) → numpy.ndarray

This function reads a nifti file and returns a flat (1D) array of the image. Various options can be used to filter the array

Parameters:
  • input_filename – filename of the nifti file to be loaded
  • mask_filename – Optional: filename of mask to be applied to data
  • mask_indices – Optional and ignored if mask_filename is set. Indices to extract from data array
  • filter – Optional: function which takes in an np.ndarray and returns an np.ndarrray. Can be used to apply a filter to the data (e.g thresholding)
Returns:

A 1-D number array containing the filtered datapoints

dcdf.data.get_null_reference_cdf(lowerlimit: numpy.float32, upperlimit: numpy.float32, numbins: int = 1000) → dcdf.data.ModifiedECDF

This function will return a CDF to be used as a null reference.

Parameters:
  • lowerlimit – lower bound for the CDF
  • upperlimit – upperbound for the CDF
  • numbins – How many bins should be used for the reference
Returns:

ModifiedECDF of all zeros for the specified range

dcdf.data.get_percentiles(data: numpy.ndarray, nsamples: int) → numpy.ndarray

Sample the data at various percentiles.

Parameters:
  • data – the full data to be considered
  • nsamples – the number of percentiles to be evaluated (e.g nsamples=100 will calulate the 1st, 2nd, … , 99th, 100th percentile)
Returns:

a numpy array holding the data values which correspond

the index’s percentile

dcdf.data.get_reference_cdf(reference_list: List[str], numbins: Optional[int] = 1000, indv_mask_list: Optional[List[str]] = None, group_mask_filename: Optional[str] = None, filter: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None, lowerlimit: Optional[numpy.float32] = None, upperlimit: Optional[numpy.float32] = None, _piecewise: Optional[bool] = True) → dcdf.data.ModifiedECDF

This function will return a CDF to be used as a reference based on the provided images.

Parameters:
  • reference_list – List of nifti files to be used for reference.
  • numbins – How many bins should be used for the reference
  • indv_mask_list – A list with the same length as reference_list of masks to be used for each subject.
  • group_mask_filename – If not None, this should be a path to a nifti file which should be used as a mask for each of the reference images. If set, indv_mask_list will be ignored.
  • filter – A filtering function to be applied to the flattened array of nifti data.
  • lowerlimit – lower bound for the CDF
  • upperlimit – upperbound for the CDF
  • _piecewise – memory efficient loading – requires lowerlimit and upper limit to be set behaviour is undefined otherwise.
Returns:

A data structure containing information about the cumulative distribution of the data

dcdf.data.get_subject_cdf(subject_array: numpy.ndarray, reference_cdf: dcdf.data.ModifiedECDF) → dcdf.data.ModifiedECDF

Calculate the individual subject’s cdf with respect to the reference CDF.

Parameters:
  • subject_array – numpy array of datapoints from get_datapoints
  • reference_cdf – reference_cdf that was built using get_reference_cdf.
Returns:

ECDF information for the requested subject.

dcdf.data.get_subject_cdf2(subject_array: numpy.ndarray, numbins: int, lowerlimit: numpy.float32, binsize: int) → dcdf.data.ModifiedECDF

Calculate the individual subject’s cdf with respect to the reference CDF.

Parameters:
  • subject_array – numpy array of datapoints from get_datapoints
  • numbins – len(Cumfreqresult.cumcount)
  • lowerlimit – CumfreqResult.lowerlimit
  • binsize – Cumfreqresult.binsize
Returns:

ECDF information for the requested subject.

dcdf.data.load_reference(filename) → dcdf.data.ModifiedECDF

Load and retrun a pickled reference. Note: this function will assume that the pickled object is in fact of type ModifiedECDF. No checks will be performed …

Parameters:filename – path to the pickled ModifiedECDF which should be loaded.
Returns:the pickled reference
dcdf.data.save_reference(reference: dcdf.data.ModifiedECDF, filename: str)

Save the reference using pickle. If available, protocol 4 will be used.

Parameters:
  • reference – ModifiedECDF to be saved.
  • filename – path to save the reference to.

Measure Module

Contains functions used to perform single-threaded evaluation of subjects.

dcdf.measure.get_func_dict(func_file: str) → dict

Reads a textfile which should be in the format: [function name] : [function code] where [function code] will later be executed using eval

Parameters:func_file – name of the textfile containing the equations
Returns:a dictionary of [function name] and [function code] pairs
dcdf.measure.measure_single_subject(subject: str, reference: scipy.stats.stats.CumfreqResult, func_dict: Dict[str, Callable[[numpy.ndarray, numpy.ndarray, numpy.float32], numpy.float32]], indv_mask: Optional[str] = None, group_mask_indices: Optional[numpy.ndarray] = None, filter: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None, _print_inverse: Optional[bool] = False) → Tuple[str, Dict[str, numpy.float32]]

Function to apply provided measures to a single subject

Parameters:
  • subjects – Nifti file paths
  • reference – CumfreqResult from data.get_reference_cdf
  • func_dict – Output of measure.get_func_dict. A dictionary of functions to be calculated over CDF differences. Keys will be used as column names in the return of this function
  • indv_mask – Mask to be applied to subject image
  • group_mask_filename – If not None, this should be a path to a nifti file which will be used as a mask for eac of the individual images. If set, indv_mask_list will be ignored.
  • filter – Optional: function which takes in an np.ndarray and returns an np.ndarray. Can be used to apply a filter to the data (e.g thresholding)
Returns:

a Tuple with first element being the nifti file path, and the second element being a dictionary of results with keys being function names

dcdf.measure.measure_subjects(subjects_list: List[str], reference: scipy.stats.stats.CumfreqResult, func_dict: Dict[str, Callable[[numpy.ndarray, numpy.ndarray, numpy.float32], numpy.float32]], indv_mask_list: Optional[List[str]] = None, group_mask_filename: Optional[str] = None, filter: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None) → pandas.core.frame.DataFrame

Wrapper around measure_single_subject to apply to each subject

Parameters:
  • subjects_list – List of nifti file paths
  • reference – CumfreqResult from data.get_reference_cdf
  • func_dict – Output of measure.get_func_dict. A dictionary of functions to be calculated over CDF differences. Keys will be used as column names in the return of this function
  • indv_mask_list – A list with the same length as subjects_list to be used for each subject.
  • group_mask_filename – If not None, this should be a path to anifti file which will be used as a mask for eac of the individual images. If set, indv_mask_list will be ignored.
  • filter – Optional: function which takes in an np.ndarray and returns an np.ndarray. Can be used to apply a filter to the data (e.g thresholding)
Returns:

pandas.DataFrame containing all of the results.

dcdf.measure.print_measurements(mdf: pandas.core.frame.DataFrame)

This function will print out the results of measure.measure_subjects.

Parameters:mdf – pd.DataFrame returned from measure.measure_subjects

Parallel Module

As per measure module, but allows for parallel evaluation of subjects.

dcdf.parallel._mp_measure(subject: str, indv_mask_filename: Optional[str] = None) → List[numpy.float32]

Function to apply provided measures to a single subject

Parameters:
  • subjects – Nifti file paths
  • indv_mask_filename – Mask to be applied to subject image
dcdf.parallel._worker_init(binsize: numpy.float32, inverse_binsize: numpy.float32, lowerlimit: numpy.float32, func_dict: Dict[str, Callable[[numpy.ndarray, numpy.ndarray, numpy.float32], numpy.float32]], shared_mask: Tuple[str, Tuple[int], numpy.dtype], shared_ref: Tuple[str, Tuple[int], numpy.dtype], shared_ref_inverse: Tuple[str, Tuple[int], numpy.dtype], filter: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None) → None

Initialization function for parallel workers calling _mp_measure. :param binsize: CumfreqResult.binsize :param lowerlimit: CumfreqResult.lowerlimit. :param func_dict: Output of measure.get_func_dict. A dictionary :param shm_ref_tuple: (shm.name,shape,dtype) :param shm_mask_tuple: (shm.name,shape,dtype) :param filter: Optional: function which takes in an np.ndarray and

dcdf.parallel.parallel_measure_subjects(subjects_list: List[str], reference: scipy.stats.stats.CumfreqResult, func_dict: Dict[str, Callable[[numpy.ndarray, numpy.ndarray, numpy.float32], numpy.float32]], indv_mask_list: Optional[List[str]] = None, group_mask_filename: Optional[str] = None, filter: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None, n_procs: Optional[int] = None) → pandas.core.frame.DataFrame
Parameters:
  • subjects_list – List of nifti file paths
  • reference – CumfreqResult from data.get_reference_cdf
  • func_dict – Output of measure.get_func_dict. A dictionary of functions to be calculated over CDF differences. Keys will be used as column names in the return of this function
  • indv_mask_list – A list with the same length as subjects_list to be used for each subject.
  • group_mask_filename – If not None, this should be a path to anifti file which will be used as a mask for eac of the individual images. If set, indv_mask_list will be ignored.
  • filter – Optional: function which takes in an np.ndarray and returns an np.ndarray. Can be used to apply a filter to the data (e.g thresholding)
  • n_procs – Number of processes to be started. If none, then the number returned by os.cpu_count() is used