Code Documentation
CLI Module
Contains functionality for the command-line interface as well as argument validation.
-
dcdf.cli.
_build_parser
() → argparse.ArgumentParser¶ Build the parser for commandline arguments.
Returns: an argument parser for the CLI
-
dcdf.cli.
_check_nifti
(file_list: List[str], from_file: Optional[bool] = False) → bool¶ Check whether each of the nifti files can be found.
Parameters: - file_list – Should be one of: args.build, args.evaluate, args.reference_masks, args.evaluation_masks, args.group_mask, where args are the parsed arguments.
- from_file – Whether the arguments have been supplied as text files.
Returns: True if each of the nifti images can be found on disk.
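An existence check of this kind could be sketched as follows. The helper name all_files_exist is hypothetical, and the real _check_nifti may resolve text files or report missing paths differently; the sketch only illustrates the rule that every listed path must exist on disk.

```python
import os
from typing import List


def all_files_exist(paths: List[str]) -> bool:
    """Return True only if every path refers to an existing file."""
    missing = [p for p in paths if not os.path.isfile(p)]
    for p in missing:
        # Report each missing file so the user can correct the input list.
        print(f"File not found: {p}")
    return not missing
```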
-
dcdf.cli.
_get_bounds_filter
(args)¶ If lower/upper bounds have been specified by the arguments, then provide a filter to be applied to the data.
Parameters: args – The parsed arguments to this program.
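As an illustration, a bounds filter can be built as a closure over the optional limits. This is a sketch under assumptions: the name make_bounds_filter is hypothetical, the bounds are treated as inclusive, and returning None when no bounds are given is a guess at the behaviour rather than something the documentation states.

```python
from typing import Callable, Optional

import numpy as np


def make_bounds_filter(
    lower: Optional[float] = None, upper: Optional[float] = None
) -> Optional[Callable[[np.ndarray], np.ndarray]]:
    """Build a filter that keeps values inside [lower, upper]."""
    if lower is None and upper is None:
        return None  # no bounds requested, so no filter is needed

    def bounds_filter(data: np.ndarray) -> np.ndarray:
        keep = np.ones(data.shape, dtype=bool)
        if lower is not None:
            keep &= data >= lower  # assumed inclusive lower bound
        if upper is not None:
            keep &= data <= upper  # assumed inclusive upper bound
        return data[keep]

    return bounds_filter
```

The returned callable has the np.ndarray to np.ndarray shape that the filter parameters elsewhere in this documentation expect.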
-
dcdf.cli.
_get_list
(arg: List[str], from_file: bool) → List[str]¶ Helper function to handle the from_file = True/False use cases.
Parameters: - arg – either a list of nifti filenames, or a list with a single entry naming a text file (of filenames)
- from_file – if False, arg is a list of filenames; if True, it is a list with a single entry naming a text file
Returns: List of filenames
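The two cases can be sketched as below. The name resolve_file_list is hypothetical, and the layout of the text file (one filename per line, blank lines ignored) is an assumption, since the format is not specified here.

```python
from typing import List


def resolve_file_list(arg: List[str], from_file: bool) -> List[str]:
    """Return filenames given directly, or read them from a text file."""
    if not from_file:
        return list(arg)
    # With from_file=True, arg holds a single path to a text file of filenames.
    with open(arg[0]) as handle:
        # Assumed layout: one filename per line; blank lines are skipped.
        return [line.strip() for line in handle if line.strip()]
```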
-
dcdf.cli.
_lc
(filename: str) → int¶ Helper function to count the number of lines in a file.
Parameters: filename – name of the file to be read; assumed to be plain text.
Returns: count of the number of lines in the file
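A line counter of this kind is typically a one-liner over the file iterator. The sketch below uses the hypothetical name count_lines and may differ from what _lc does internally.

```python
def count_lines(filename: str) -> int:
    """Count the lines in a plain-text file without loading it all at once."""
    with open(filename) as handle:
        return sum(1 for _ in handle)
```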
-
dcdf.cli.
_validate_args
(parser: argparse.ArgumentParser) → bool¶ Sanity checking on the inputs. Returns False if any checks fail.
Parameters: parser – the parser returned from _build_parser
Returns: False if any of the arguments fail the sanity checks, else True.
-
dcdf.cli.
main
()¶ This function is called on startup, and consists of the following steps:
1. Construct our parser, validate commandline arguments, and retrieve arguments.
2. Construct a filter based on any specified bounds.
3. Build reference as specified.
4. Write reference (if requested).
5. Evaluate samples (if requested) – run in parallel if requested.
6. Print results.
Data Module
Contains functions used to load and create the data structures used in this project.
-
class
dcdf.data.
ModifiedECDF
(ecdf: scipy.stats.stats.CumfreqResult)¶ Bases:
object
-
__init__
(ecdf: scipy.stats.stats.CumfreqResult)¶ A somewhat hack-ish wrapper class, added to support inverse functions without changing too much of the existing code.
Parameters: ecdf – the output of scipy.stats.cumfreq which is being modified.
-
dcdf.data.
get_datapoints
(input_filename: str, mask_filename: Optional[str] = None, mask_indices: Optional[numpy.ndarray] = None, ignore_zeros: Optional[bool] = True, filter: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None) → numpy.ndarray¶ Reads a nifti file and returns a flat (1-D) array of the image. Various options can be used to filter the array.
Parameters: - input_filename – filename of the nifti file to be loaded
- mask_filename – Optional: filename of mask to be applied to data
- mask_indices – Optional; ignored if mask_filename is set. Indices to extract from the data array
- filter – Optional: function which takes in an np.ndarray and returns an np.ndarray. Can be used to apply a filter to the data (e.g. thresholding)
Returns: A 1-D numpy array containing the filtered datapoints
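The masking and filtering steps can be sketched on an array that has already been loaded. File loading is omitted here, the name flatten_and_filter is hypothetical, and the order in which the mask, zero removal, and filter are applied is an assumption rather than documented behaviour.

```python
from typing import Callable, Optional

import numpy as np


def flatten_and_filter(
    image: np.ndarray,
    mask: Optional[np.ndarray] = None,
    ignore_zeros: bool = True,
    filter: Optional[Callable[[np.ndarray], np.ndarray]] = None,
) -> np.ndarray:
    """Reduce an image array to a 1-D array of the values of interest."""
    # Boolean indexing keeps only voxels inside the mask and flattens to 1-D.
    data = image[mask.astype(bool)] if mask is not None else image.ravel()
    if ignore_zeros:
        data = data[data != 0]
    if filter is not None:
        data = filter(data)  # e.g. a thresholding function
    return data
```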
-
dcdf.data.
get_null_reference_cdf
(lowerlimit: numpy.float32, upperlimit: numpy.float32, numbins: int = 1000) → dcdf.data.ModifiedECDF¶ This function will return a CDF to be used as a null reference.
Parameters: - lowerlimit – lower bound for the CDF
- upperlimit – upper bound for the CDF
- numbins – How many bins should be used for the reference
Returns: ModifiedECDF of all zeros for the specified range
-
dcdf.data.
get_percentiles
(data: numpy.ndarray, nsamples: int) → numpy.ndarray¶ Sample the data at various percentiles.
Parameters: - data – the full data to be considered
- nsamples – the number of percentiles to be evaluated (e.g. nsamples=100 will calculate the 1st, 2nd, … , 99th, 100th percentile)
Returns: a numpy array holding the data values which correspond to each index’s percentile
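One way to sample evenly spaced percentiles is with np.percentile, as in this sketch. The name sample_percentiles is hypothetical, and the real get_percentiles may choose its levels or interpolation differently.

```python
import numpy as np


def sample_percentiles(data: np.ndarray, nsamples: int) -> np.ndarray:
    """Return the data values at nsamples evenly spaced percentile levels."""
    # Levels end exactly at 100, e.g. 1, 2, ..., 100 for nsamples=100.
    levels = np.linspace(100.0 / nsamples, 100.0, nsamples)
    return np.percentile(data, levels)
```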
-
dcdf.data.
get_reference_cdf
(reference_list: List[str], numbins: Optional[int] = 1000, indv_mask_list: Optional[List[str]] = None, group_mask_filename: Optional[str] = None, filter: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None, lowerlimit: Optional[numpy.float32] = None, upperlimit: Optional[numpy.float32] = None, _piecewise: Optional[bool] = True) → dcdf.data.ModifiedECDF¶ This function will return a CDF to be used as a reference based on the provided images.
Parameters: - reference_list – List of nifti files to be used for reference.
- numbins – How many bins should be used for the reference
- indv_mask_list – A list with the same length as reference_list of masks to be used for each subject.
- group_mask_filename – If not None, this should be a path to a nifti file which should be used as a mask for each of the reference images. If set, indv_mask_list will be ignored.
- filter – A filtering function to be applied to the flattened array of nifti data.
- lowerlimit – lower bound for the CDF
- upperlimit – upper bound for the CDF
- _piecewise – memory-efficient loading; requires lowerlimit and upperlimit to be set, behaviour is undefined otherwise.
Returns: A data structure containing information about the cumulative distribution of the data
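The reason piecewise loading needs fixed limits can be seen in a small sketch: with a shared range, every image is binned on identical edges, so per-image counts can be summed without holding all the data in memory. This uses np.histogram as a stand-in and is not the package's implementation; the name accumulate_cdf and the normalisation to 1 are assumptions.

```python
from typing import Iterable

import numpy as np


def accumulate_cdf(
    arrays: Iterable[np.ndarray],
    lowerlimit: float,
    upperlimit: float,
    numbins: int = 1000,
) -> np.ndarray:
    """Build a normalised cumulative distribution one array at a time."""
    counts = np.zeros(numbins, dtype=np.int64)
    for data in arrays:
        # Fixed limits give every array the same bin edges, so counts add up.
        hist, _ = np.histogram(data, bins=numbins, range=(lowerlimit, upperlimit))
        counts += hist
    cumulative = np.cumsum(counts)
    # Assumes at least one value fell inside the limits.
    return cumulative / cumulative[-1]
```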
-
dcdf.data.
get_subject_cdf
(subject_array: numpy.ndarray, reference_cdf: dcdf.data.ModifiedECDF) → dcdf.data.ModifiedECDF¶ Calculate the individual subject’s cdf with respect to the reference CDF.
Parameters: - subject_array – numpy array of datapoints from get_datapoints
- reference_cdf – reference_cdf that was built using get_reference_cdf.
Returns: ECDF information for the requested subject.
-
dcdf.data.
get_subject_cdf2
(subject_array: numpy.ndarray, numbins: int, lowerlimit: numpy.float32, binsize: int) → dcdf.data.ModifiedECDF¶ Calculate the individual subject’s cdf with respect to the reference CDF.
Parameters: - subject_array – numpy array of datapoints from get_datapoints
- numbins – len(CumfreqResult.cumcount)
- lowerlimit – CumfreqResult.lowerlimit
- binsize – CumfreqResult.binsize
Returns: ECDF information for the requested subject.
-
dcdf.data.
load_reference
(filename) → dcdf.data.ModifiedECDF¶ Load and return a pickled reference. Note: this function assumes that the pickled object is in fact of type ModifiedECDF. No checks will be performed …
Parameters: filename – path to the pickled ModifiedECDF which should be loaded.
Returns: the pickled reference
-
dcdf.data.
save_reference
(reference: dcdf.data.ModifiedECDF, filename: str)¶ Save the reference using pickle. If available, protocol 4 will be used.
Parameters: - reference – ModifiedECDF to be saved.
- filename – path to save the reference to.
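A save and load round trip with pickle might look like the sketch below. The function names save_pickled and load_pickled are hypothetical; the protocol selection mirrors the "protocol 4 if available" note above, but the exact logic in the package is not shown here.

```python
import pickle
from typing import Any


def save_pickled(obj: Any, filename: str) -> None:
    """Pickle obj to filename, preferring protocol 4."""
    # Fall back to the highest protocol this interpreter offers if below 4.
    protocol = min(4, pickle.HIGHEST_PROTOCOL)
    with open(filename, "wb") as handle:
        pickle.dump(obj, handle, protocol=protocol)


def load_pickled(filename: str) -> Any:
    """Unpickle and return whatever object the file contains."""
    # No type check is made; only unpickle files from a trusted source.
    with open(filename, "rb") as handle:
        return pickle.load(handle)
```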
Measure Module
Contains functions used to perform single-threaded evaluation of subjects.
-
dcdf.measure.
get_func_dict
(func_file: str) → dict¶ Reads a text file whose entries should be in the format [function name] : [function code], where [function code] will later be executed using eval.
Parameters: func_file – name of the text file containing the equations
Returns: a dictionary of [function name] and [function code] pairs
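Parsing such a file can be sketched as follows, assuming one entry per line. The name parse_function_file is hypothetical, and the sketch only collects the name and code strings; it does not evaluate them.

```python
from typing import Dict


def parse_function_file(func_file: str) -> Dict[str, str]:
    """Read 'name : code' entries into a dictionary of strings."""
    functions: Dict[str, str] = {}
    with open(func_file) as handle:
        for line in handle:
            if ":" not in line:
                continue  # skip blank or malformed lines
            # Split on the first colon only, since the code may contain colons.
            name, code = line.split(":", 1)
            functions[name.strip()] = code.strip()
    return functions
```

Because the code strings are later passed to eval, function files should only come from a trusted source.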
-
dcdf.measure.
measure_single_subject
(subject: str, reference: scipy.stats.stats.CumfreqResult, func_dict: Dict[str, Callable[[numpy.ndarray, numpy.ndarray, numpy.float32], numpy.float32]], indv_mask: Optional[str] = None, group_mask_indices: Optional[numpy.ndarray] = None, filter: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None, _print_inverse: Optional[bool] = False) → Tuple[str, Dict[str, numpy.float32]]¶ Function to apply provided measures to a single subject
Parameters: - subject – Nifti file path
- reference – CumfreqResult from data.get_reference_cdf
- func_dict – Output of measure.get_func_dict. A dictionary of functions to be calculated over CDF differences. Keys will be used as column names in the return of this function
- indv_mask – Mask to be applied to subject image
- group_mask_filename – If not None, this should be a path to a nifti file which will be used as a mask for each of the individual images. If set, indv_mask_list will be ignored.
- filter – Optional: function which takes in an np.ndarray and returns an np.ndarray. Can be used to apply a filter to the data (e.g. thresholding)
Returns: a Tuple with first element being the nifti file path, and the second element being a dictionary of results with keys being function names
-
dcdf.measure.
measure_subjects
(subjects_list: List[str], reference: scipy.stats.stats.CumfreqResult, func_dict: Dict[str, Callable[[numpy.ndarray, numpy.ndarray, numpy.float32], numpy.float32]], indv_mask_list: Optional[List[str]] = None, group_mask_filename: Optional[str] = None, filter: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None) → pandas.core.frame.DataFrame¶ Wrapper around measure_single_subject to apply to each subject
Parameters: - subjects_list – List of nifti file paths
- reference – CumfreqResult from data.get_reference_cdf
- func_dict – Output of measure.get_func_dict. A dictionary of functions to be calculated over CDF differences. Keys will be used as column names in the return of this function
- indv_mask_list – A list of masks, with the same length as subjects_list, to be used for each subject.
- group_mask_filename – If not None, this should be a path to a nifti file which will be used as a mask for each of the individual images. If set, indv_mask_list will be ignored.
- filter – Optional: function which takes in an np.ndarray and returns an np.ndarray. Can be used to apply a filter to the data (e.g. thresholding)
Returns: pandas.DataFrame containing all of the results.
-
dcdf.measure.
print_measurements
(mdf: pandas.core.frame.DataFrame)¶ This function will print out the results of measure.measure_subjects.
Parameters: mdf – pd.DataFrame returned from measure.measure_subjects
Parallel Module
As per the measure module, but allows for parallel evaluation of subjects.
-
dcdf.parallel.
_mp_measure
(subject: str, indv_mask_filename: Optional[str] = None) → List[numpy.float32]¶ Function to apply provided measures to a single subject
Parameters: - subject – Nifti file path
- indv_mask_filename – Mask to be applied to subject image
-
dcdf.parallel.
_worker_init
(binsize: numpy.float32, inverse_binsize: numpy.float32, lowerlimit: numpy.float32, func_dict: Dict[str, Callable[[numpy.ndarray, numpy.ndarray, numpy.float32], numpy.float32]], shared_mask: Tuple[str, Tuple[int], numpy.dtype], shared_ref: Tuple[str, Tuple[int], numpy.dtype], shared_ref_inverse: Tuple[str, Tuple[int], numpy.dtype], filter: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None) → None¶ Initialization function for parallel workers calling _mp_measure.
Parameters: - binsize – CumfreqResult.binsize
- lowerlimit – CumfreqResult.lowerlimit
- func_dict – Output of measure.get_func_dict. A dictionary of functions to be calculated over CDF differences.
- shared_ref – (shm.name, shape, dtype)
- shared_mask – (shm.name, shape, dtype)
- filter – Optional: function which takes in an np.ndarray and returns an np.ndarray. Can be used to apply a filter to the data (e.g. thresholding)
-
dcdf.parallel.
parallel_measure_subjects
(subjects_list: List[str], reference: scipy.stats.stats.CumfreqResult, func_dict: Dict[str, Callable[[numpy.ndarray, numpy.ndarray, numpy.float32], numpy.float32]], indv_mask_list: Optional[List[str]] = None, group_mask_filename: Optional[str] = None, filter: Optional[Callable[[numpy.ndarray], numpy.ndarray]] = None, n_procs: Optional[int] = None) → pandas.core.frame.DataFrame¶
Parameters: - subjects_list – List of nifti file paths
- reference – CumfreqResult from data.get_reference_cdf
- func_dict – Output of measure.get_func_dict. A dictionary of functions to be calculated over CDF differences. Keys will be used as column names in the return of this function
- indv_mask_list – A list of masks, with the same length as subjects_list, to be used for each subject.
- group_mask_filename – If not None, this should be a path to a nifti file which will be used as a mask for each of the individual images. If set, indv_mask_list will be ignored.
- filter – Optional: function which takes in an np.ndarray and returns an np.ndarray. Can be used to apply a filter to the data (e.g. thresholding)
- n_procs – Number of processes to be started. If None, the number returned by os.cpu_count() is used.