deepblink package¶
deepblink.augment module¶
Model utility functions for augmentation.
-
deepblink.augment.
augment_batch_baseline
(images: numpy.ndarray, masks: numpy.ndarray, flip_: bool = False, illuminate_: bool = False, gaussian_noise_: bool = False, rotate_: bool = False, translate_: bool = False, cell_size: int = 4) → Tuple[numpy.ndarray, numpy.ndarray][source]¶ Baseline augmentation function.
Probability of augmentations is determined in the corresponding functions and not in this baseline.
Parameters: - images – Batch of input images to be augmented with shape (n, x, y).
- masks – Batch of corresponding prediction matrix with ground truth values with shape (n, x, y).
- flip_ – If True, images might be flipped.
- illuminate_ – If True, images might be altered in illumination.
- gaussian_noise_ – If True, gaussian noise might be added.
- rotate_ – If True, images might be rotated.
- translate_ – If True, images might be translated.
- cell_size – Size of one cell in the prediction matrix.
-
deepblink.augment.
flip
(image: numpy.ndarray, mask: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]¶ Augment through horizontal/vertical flipping.
-
deepblink.augment.
gaussian_noise
(image: numpy.ndarray, mask: numpy.ndarray, mean: int = 0) → Tuple[numpy.ndarray, numpy.ndarray][source]¶ Augment through the addition of gaussian noise.
Parameters: - image – Image to be augmented.
- mask – Corresponding prediction matrix with ground truth values.
- mean – Average noise pixel values added. Zero means no net difference occurs.
-
deepblink.augment.
illuminate
(image: numpy.ndarray, mask: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]¶ Augment through changing illumination.
-
deepblink.augment.
rotate
(image: numpy.ndarray, mask: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]¶ Augment through rotation.
-
deepblink.augment.
translate
(image: numpy.ndarray, mask: numpy.ndarray, cell_size: int = 4) → Tuple[numpy.ndarray, numpy.ndarray][source]¶ Augment through translation along all axes.
Parameters: - image – Image to be augmented.
- mask – Corresponding prediction matrix with ground truth values.
- cell_size – Size of one cell in the prediction matrix.
deepblink.cli package¶
Module contents¶
Module that contains the command line interface.
Divided into <_command> files, each containing the parser and a call handler. Utility functions are located in _logger and _parseutil.
deepblink.data module¶
List of functions to handle data including converting matrices <-> coordinates.
-
deepblink.data.
absolute_coordinate
(coord_spot: Tuple[numpy.float32, numpy.float32], coord_cell: Tuple[numpy.float32, numpy.float32], cell_size: int = 4) → Tuple[numpy.float32, numpy.float32][source]¶ Return the absolute image coordinate from a relative cell coordinate.
Parameters: - coord_spot – Relative spot coordinate in format (r, c).
- coord_cell – Top-left coordinate of the cell.
- cell_size – Size of one cell in a grid.
Returns: Absolute coordinate.
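The conversion described above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: it assumes the relative coordinate is expressed as a fraction of one cell and that the cell coordinate is given in pixels. The name `absolute_coordinate_sketch` is hypothetical.

```python
from typing import Tuple


def absolute_coordinate_sketch(
    coord_spot: Tuple[float, float],
    coord_cell: Tuple[float, float],
    cell_size: int = 4,
) -> Tuple[float, float]:
    """Scale the in-cell offset to pixels and add the cell's top-left corner."""
    # Assumption: coord_spot holds fractions of a cell in (r, c) order.
    rel_r, rel_c = coord_spot
    cell_r, cell_c = coord_cell
    return (cell_r + rel_r * cell_size, cell_c + rel_c * cell_size)


# A spot halfway down and a quarter across the cell whose corner is at (8, 12).
print(absolute_coordinate_sketch((0.5, 0.25), (8, 12), cell_size=4))
```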
-
deepblink.data.
get_coordinate_list
(matrix: numpy.ndarray, image_size: int = 512, probability: float = 0.5) → numpy.ndarray[source]¶ Convert the prediction matrix into a list of coordinates.
NOTE - plt.scatter uses the x, y system. Therefore any plots must be inverted by assigning x=c, y=r!
Parameters: - matrix – Matrix representation of spot coordinates.
- image_size – Default image size the grid was laid on.
- probability – Cutoff value to round model prediction probability.
Returns: Array of r, c coordinates with the shape (n, 2).
-
deepblink.data.
get_prediction_matrix
(coords: numpy.ndarray, image_size: int, cell_size: int = 4, size_c: int = None) → numpy.ndarray[source]¶ Return np.ndarray of shape (n, n, 3): p, r, c format for each cell.
Parameters: - coords – List of coordinates in r, c format with shape (n, 2).
- image_size – Size of the image from which the coordinates were extracted.
- cell_size – Size of one grid cell inside the matrix. A cell_size of 2 means that one cell corresponds to 2 pixels in the original image.
- size_c – If empty, assumes a square image. Else the length of the c axis.
Returns: The prediction matrix as numpy array of shape (n, n, 3) – p, r, c format for each cell.
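To make the p, r, c encoding concrete, the following sketch builds a small prediction matrix from coordinates and converts it back. It is a pure-Python illustration under stated assumptions: r and c are stored as fractions of the cell size, the image is square, and the decoder takes cell_size directly for brevity. The names `prediction_matrix_sketch` and `coordinate_list_sketch` are hypothetical and do not reproduce the package functions.

```python
import math
from typing import List, Sequence, Tuple


def prediction_matrix_sketch(
    coords: Sequence[Tuple[float, float]], image_size: int, cell_size: int = 4
) -> List[List[List[float]]]:
    """Build an (n, n, 3) nested list holding [p, r, c] for every cell."""
    n = math.ceil(image_size / cell_size)
    matrix = [[[0.0, 0.0, 0.0] for _ in range(n)] for _ in range(n)]
    for r, c in coords:
        # Index of the cell that contains the spot, clipped to the grid.
        cell_r = min(n - 1, int(r) // cell_size)
        cell_c = min(n - 1, int(c) // cell_size)
        # Assumption: r and c are stored as fractions of the cell size.
        rel_r = (r - cell_r * cell_size) / cell_size
        rel_c = (c - cell_c * cell_size) / cell_size
        matrix[cell_r][cell_c] = [1.0, rel_r, rel_c]
    return matrix


def coordinate_list_sketch(matrix, cell_size: int = 4, probability: float = 0.5):
    """Invert the encoding: keep cells whose p exceeds the cutoff."""
    coords = []
    for i, row in enumerate(matrix):
        for j, (p, rel_r, rel_c) in enumerate(row):
            if p > probability:
                coords.append(((i + rel_r) * cell_size, (j + rel_c) * cell_size))
    return coords


spots = [(1.0, 2.0), (6.0, 5.0)]
grid = prediction_matrix_sketch(spots, image_size=8, cell_size=4)
print(coordinate_list_sketch(grid, cell_size=4))
```

With a cell_size of 4, an 8 x 8 image maps to a 2 x 2 grid, and each spot ends up in exactly one cell.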
deepblink.datasets package¶
Submodules¶
deepblink.datasets.sequence module¶
SequenceDataset class.
-
class
deepblink.datasets.sequence.
SequenceDataset
(x: numpy.ndarray, y: numpy.ndarray, batch_size: int = 16, augment_fn: Callable = None, format_fn: Callable = None, overfit: bool = False)[source]¶ Bases:
keras.utils.data_utils.Sequence
Custom Sequence class used to feed data into model.fit.
Parameters: - x – List of inputs.
- y – List of targets.
- batch_size – Size of one mini-batch.
- augment_fn – Function to augment one mini-batch of x and y.
- format_fn – Function to format raw data to model input.
- overfit – If only one batch should be used thereby causing overfitting.
deepblink.datasets.spots module¶
SpotsDataset class.
-
class
deepblink.datasets.spots.
SpotsDataset
(name: str, cell_size: int, smooth_factor: float = 1)[source]¶ Bases:
deepblink.datasets._datasets.Dataset
Class used to load all spots data.
Parameters: - cell_size – Number of pixels (from original image) constituting one cell in the prediction matrix.
- smooth_factor – Value used to weigh true cells, weighs false cells with 1-smooth_factor.
-
image_size
¶ Check if all images have the same square shape.
Module contents¶
Datasets module with classes to handle data import and data presentation for training.
-
class
deepblink.datasets.
Dataset
(name: str, *_)[source]¶ Bases:
object
Simple abstract class for datasets.
Parameters: name – Absolute path to dataset file. -
data_filename
¶ Return the absolute path to the dataset.
-
-
class
deepblink.datasets.
SequenceDataset
(x: numpy.ndarray, y: numpy.ndarray, batch_size: int = 16, augment_fn: Callable = None, format_fn: Callable = None, overfit: bool = False)[source]¶ Bases:
keras.utils.data_utils.Sequence
Custom Sequence class used to feed data into model.fit.
Parameters: - x – List of inputs.
- y – List of targets.
- batch_size – Size of one mini-batch.
- augment_fn – Function to augment one mini-batch of x and y.
- format_fn – Function to format raw data to model input.
- overfit – If only one batch should be used thereby causing overfitting.
-
class
deepblink.datasets.
SpotsDataset
(name: str, cell_size: int, smooth_factor: float = 1)[source]¶ Bases:
deepblink.datasets._datasets.Dataset
Class used to load all spots data.
Parameters: - cell_size – Number of pixels (from original image) constituting one cell in the prediction matrix.
- smooth_factor – Value used to weigh true cells, weighs false cells with 1-smooth_factor.
-
image_size
¶ Check if all images have the same square shape.
deepblink.inference module¶
Model prediction / inference functions.
-
deepblink.inference.
get_intensities
(image: numpy.ndarray, coordinate_list: numpy.ndarray, radius: int, method: str = 'sum') → numpy.ndarray[source]¶ Finds integrated intensities in a radius around each coordinate.
Parameters: - image – Input image with pixel values.
- coordinate_list – List of r, c coordinates in shape (n, 2).
- radius – Radius of kernel to determine intensities.
- method – How the integrated intensity should be calculated [options: sum, mean, std].
Returns: Array with all integrated intensities.
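The idea of integrating intensities around each coordinate can be sketched without any dependencies. The block below is an illustration only: it assumes a disk-shaped neighborhood, rounds coordinates to the nearest pixel, and uses the population standard deviation for the std option. The name `intensities_sketch` is hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def intensities_sketch(
    image: Sequence[Sequence[float]],
    coordinate_list: Sequence[Tuple[float, float]],
    radius: int,
    method: str = "sum",
) -> List[float]:
    """Aggregate pixel values inside a disk around each (r, c) coordinate."""
    reducers = {"sum": sum, "mean": statistics.fmean, "std": statistics.pstdev}
    if method not in reducers:
        raise ValueError(f"method must be one of {sorted(reducers)}")
    n_rows, n_cols = len(image), len(image[0])
    results = []
    for r, c in coordinate_list:
        center_r, center_c = round(r), round(c)
        values = [
            image[i][j]
            for i in range(max(0, center_r - radius), min(n_rows, center_r + radius + 1))
            for j in range(max(0, center_c - radius), min(n_cols, center_c + radius + 1))
            # Assumption: the kernel is a disk, so corners of the box are skipped.
            if (i - center_r) ** 2 + (j - center_c) ** 2 <= radius ** 2
        ]
        results.append(reducers[method](values))
    return results


image = [[1, 1, 1], [1, 5, 1], [1, 1, 1]]
print(intensities_sketch(image, [(1, 1)], radius=1, method="sum"))
```

With a radius of 1 the disk covers the center pixel and its four direct neighbors, so the corners of the 3 x 3 window are not counted.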
-
deepblink.inference.
get_probabilities
(matrix: numpy.ndarray, coordinates: numpy.ndarray, image_size: int = 512) → numpy.ndarray[source]¶ Find prediction probability given the matrix and coordinates.
Parameters: - matrix – Matrix representation of spot coordinates.
- coordinates – Coordinates at which the probability should be determined.
- image_size – Default image size the grid was laid on.
Returns: Array with all probabilities matching the coordinates.
-
deepblink.inference.
predict
(image: numpy.ndarray, model: keras.engine.training.Model, probability: Union[None, float] = None) → numpy.ndarray[source]¶ Returns a binary or categorical model based prediction of an image.
Parameters: - image – Image to be predicted.
- model – Model used to predict the image.
- probability – Cutoff value to round model prediction probability.
Returns: List of coordinates [r, c].
deepblink.io module¶
Dataset preparation functions.
-
deepblink.io.
basename
(path: Union[str, os.PathLike[str]]) → str[source]¶ Returns the basename removing path and extension.
-
deepblink.io.
grab_files
(path: Union[str, os.PathLike[str]], extensions: Tuple[str, ...]) → List[str][source]¶ Grab all files in directory with listed extensions.
Parameters: - path – Path to files to be grabbed. Without trailing “/”.
- extensions – List of all file extensions. Without leading “.”.
Returns: Sorted list of all corresponding files.
Raises: OSError
– Path not existing.
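The behavior documented for basename and grab_files can be approximated with the standard library. This sketch is an assumption about how such helpers might work, not the package code: extensions are matched case-sensitively and only the final extension is stripped. The names `basename_sketch` and `grab_files_sketch` are hypothetical.

```python
import os
import tempfile
from typing import List, Tuple


def basename_sketch(path: str) -> str:
    """Drop the directory part and the final extension."""
    return os.path.splitext(os.path.basename(path))[0]


def grab_files_sketch(path: str, extensions: Tuple[str, ...]) -> List[str]:
    """Return a sorted list of files in path ending in one of the extensions."""
    if not os.path.isdir(path):
        raise OSError(f"Path <{path}> does not exist.")
    # Assumption: extensions are matched case-sensitively against the suffix.
    suffixes = tuple(f".{ext}" for ext in extensions)
    return sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if name.endswith(suffixes)
    )


with tempfile.TemporaryDirectory() as folder:
    for name in ("b.tif", "a.png", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    found = grab_files_sketch(folder, ("tif", "png"))
    print([basename_sketch(f) for f in found])
```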
-
deepblink.io.
load_image
(fname: Union[str, os.PathLike[str]], extensions: Tuple[str, ...] = ('tif', 'tiff', 'jpeg', 'jpg', 'png'), is_rgb: bool = False) → numpy.ndarray[source]¶ Import a single image as numpy array checking format requirements.
Parameters: - fname – Absolute or relative filepath of image.
- extensions – Allowed image extensions.
- is_rgb – If true, converts RGB images to grayscale.
-
deepblink.io.
load_model
(fname: Union[str, os.PathLike[str]]) → keras.engine.training.Model[source]¶ Import a deepBlink model from file.
-
deepblink.io.
load_npz
(fname: Union[str, os.PathLike[str]], test_only: bool = False) → List[numpy.ndarray][source]¶ Imports the standard npz file format used for custom training and inference.
Only for files saved using “np.savez_compressed(fname, x_train, y_train…)”.
Parameters: - fname – Path to npz file.
- test_only – Only return testing images and labels.
Returns: A list of the required numpy arrays. If “test_only” is not set, returns [x_train, y_train, x_valid, y_valid, x_test, y_test].
Raises: ValueError
– If not all datasets are found.
deepblink.losses module¶
Functions to calculate training loss on batches of images.
While functions are comparable to the ones found in the module metrics, these rely on keras’ backend and do not take raw numpy as input.
-
deepblink.losses.
categorical_crossentropy
(y_true, y_pred)[source]¶ Keras’ categorical crossentropy loss.
-
deepblink.losses.
combined_bce_rmse
(y_true, y_pred)[source]¶ Loss that combines binary cross entropy for probability and rmse for coordinates.
The optimal values for binary crossentropy (bce) and rmse are both 0.
-
deepblink.losses.
combined_dice_rmse
(y_true, y_pred)[source]¶ Loss that combines dice for probability and rmse for coordinates.
The optimal values for dice and rmse are both 0.
-
deepblink.losses.
combined_f1_rmse
(y_true, y_pred)[source]¶ Difference between F1 score and root mean square error (rmse).
The optimal values for F1 score and rmse are 1 and 0 respectively. Therefore, the combined optimal value is 1.
-
deepblink.losses.
dice_loss
(y_true, y_pred)[source]¶ Dice score loss corresponding to deepblink.losses.dice_score.
-
deepblink.losses.
dice_score
(y_true, y_pred, smooth: int = 1)[source]¶ Computes the dice coefficient on a batch of tensors.
\[\textrm{Dice} = \frac{2 * {\lvert X \cap Y\rvert}}{\lvert X\rvert +\lvert Y\rvert}\]
ref: https://arxiv.org/pdf/1606.04797v1.pdf
Parameters: - y_true – Ground truth masks.
- y_pred – Predicted masks.
- smooth – Epsilon value to avoid division by zero.
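As a worked illustration of the dice coefficient, the sketch below operates on flattened masks given as plain lists. It assumes the smoothing term is added to both numerator and denominator, which is a common convention but not confirmed by this page. The name `dice_score_sketch` is hypothetical.

```python
from typing import Sequence


def dice_score_sketch(
    y_true: Sequence[float], y_pred: Sequence[float], smooth: float = 1.0
) -> float:
    """Dice coefficient on flattened masks with additive smoothing."""
    # For binary masks the element-wise product counts the overlap.
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    total = sum(y_true) + sum(y_pred)
    # Assumption: smooth is added to numerator and denominator alike.
    return (2.0 * intersection + smooth) / (total + smooth)


print(dice_score_sketch([1, 1, 0, 0], [1, 0, 0, 0]))
```

A matching loss would then be one minus this score, so that a perfect overlap yields a loss of 0.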
-
deepblink.losses.
f1_loss
(y_true, y_pred)[source]¶ F1 score loss corresponding to deepblink.losses.f1_score.
-
deepblink.losses.
f1_score
(y_true, y_pred)[source]¶ F1 score metric.
\[F1 = \frac{2 * \textrm{precision} * \textrm{recall}}{\textrm{precision} + \textrm{recall}}\]
The harmonic mean of precision and recall, weighting both equally. The best value is 1 and the worst value is 0.
-
deepblink.losses.
precision_score
(y_true, y_pred)[source]¶ Precision score metric.
Defined as
tp / (tp + fp)
where tp is the number of true positives and fp the number of false positives. Can be interpreted as the accuracy to not mislabel samples or how many selected items are relevant. The best value is 1 and the worst value is 0.
-
deepblink.losses.
recall_score
(y_true, y_pred)[source]¶ Recall score metric.
Defined as
tp / (tp + fn)
where tp is the number of true positives and fn the number of false negatives. Can be interpreted as the accuracy of finding positive samples or how many relevant samples were selected. The best value is 1 and the worst value is 0.
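The three score definitions above reduce to simple arithmetic on confusion counts. The sketch below works on plain integers rather than Keras tensors and guards against empty denominators by returning 0. Both choices are assumptions made for illustration, and `precision_recall_f1` is a hypothetical helper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# 8 correct detections, 2 spurious detections, 8 missed spots.
print(precision_recall_f1(tp=8, fp=2, fn=8))
```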
deepblink.metrics module¶
Functions to calculate training loss on a single image.
-
deepblink.metrics.
compute_metrics
(pred: numpy.ndarray, true: numpy.ndarray, mdist: float = 3.0) → pandas.core.frame.DataFrame[source]¶ Calculate metric scores across cutoffs.
Parameters: - pred – Predicted set of coordinates.
- true – Ground truth set of coordinates.
- mdist – Maximum euclidean distance in px to which F1 scores will be calculated.
Returns: DataFrame with one row per cutoff containing columns for:
- f1_score: Harmonic mean of precision and recall based on the number of coordinates found at different distance cutoffs (around ground truth).
- abs_euclidean: Average euclidean distance at each cutoff.
- offset: List of (r, c) coordinates denoting offset in pixels.
- f1_integral: Area under curve f1_score vs. cutoffs.
- mean_euclidean: Normalized average euclidean distance based on the total number of assignments.
-
deepblink.metrics.
euclidean_dist
(x1: float, y1: float, x2: float, y2: float) → float[source]¶ Return the euclidean distance between the two points (x1, y1) and (x2, y2).
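For reference, the same distance can be computed with the standard library. This is a small sketch with a hypothetical name, `euclidean_dist_sketch`, and is not taken from the package source.

```python
import math


def euclidean_dist_sketch(x1: float, y1: float, x2: float, y2: float) -> float:
    """Straight-line distance between (x1, y1) and (x2, y2)."""
    return math.hypot(x2 - x1, y2 - y1)


print(euclidean_dist_sketch(0, 0, 3, 4))  # 5.0
```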
-
deepblink.metrics.
f1_integral
(pred: numpy.ndarray, true: numpy.ndarray, mdist: float = 3.0, n_cutoffs: int = 50, return_raw: bool = False) → Union[float, tuple][source]¶ F1 integral calculation / area under F1 vs. cutoff.
Compute the area under the curve when plotting F1 score vs cutoff values. The optimal score is ~1 (subject to floating point inaccuracy) when an F1 score of 1 is achieved across all cutoff values including 0.
Parameters: - pred – Array of shape (n, 2) for predicted coordinates.
- true – Array of shape (n, 2) for ground truth coordinates.
- mdist – Maximum cutoff distance to calculate F1. Defaults to 3.0.
- n_cutoffs – Number of intermediate cutoff steps. Defaults to 50.
- return_raw – If True, returns f1_scores, offsets, and cutoffs. Defaults to False.
Returns: By default, a single value: the f1_integral score. If return_raw is True, a tuple containing:
- f1_scores: The non-integrated list of F1 values for all cutoffs.
- offsets: Offset in r, c on predicted coords assigned to true coords.
- cutoffs: A list of all cutoffs used.
Notes
scipy.spatial.distance.cdist(xa, xb) takes two arrays of shapes (xa, n) and (xb, n) and returns a matrix of shape (xa, xb). Here we use pred as xa and true as xb. This means that the matrix has all pred coordinates along the row axis and all true coordinates along the column axis. Its transpose has the opposite.
The linear assignment takes in a cost matrix and returns the coordinates of assigned costs that fall below a defined cutoff. This assignment takes the rows as reference and assigns columns to them. Therefore, the transposed matrix, which yields the row and column coordinates named “true_pred_r” and “true_pred_c” respectively, uses true (along the matrix row axis) as reference and pred (along the matrix column axis) as assignments. In other terms, these are the assigned predictions that are close to ground truth coordinates.
To calculate the offsets, we can use the “true_pred” rows and columns to find the originally referenced coordinates. As mentioned, the transposed matrix has true along its row axis and pred along its column axis, so basic indexing suffices. The [0] and [1] indices refer to the coordinates’ row and column values. This offset is used twice: first to plot the scatter pattern to make sure models aren’t biased in one direction, and second to compute the euclidean distance.
The euclidean distance cannot simply be summed up as with the F1 score because the different cutoffs actively influence the maximum euclidean distance score. Instead, we sum all distances measured across every cutoff and then divide by the total number of assigned coordinates. This automatically weighs models with more detections at lower cutoff scores.
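The overall shape of the calculation (match coordinates at each cutoff, compute F1, integrate over cutoffs) can be sketched as follows. This is a simplified stand-in, not the package's method: it swaps the linear sum assignment for a greedy nearest-pair matching, and it assumes the area is normalized by mdist so that a perfect result is about 1. The names `count_matches` and `f1_integral_sketch` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float]


def count_matches(pred: Sequence[Coord], true: Sequence[Coord], cutoff: float) -> int:
    """Greedy one-to-one matching of pred to true within the cutoff distance."""
    pairs = sorted(
        (math.dist(p, t), i, j)
        for i, p in enumerate(pred)
        for j, t in enumerate(true)
    )
    used_pred, used_true = set(), set()
    for dist, i, j in pairs:
        if dist > cutoff:
            break  # All remaining pairs are even further apart.
        if i not in used_pred and j not in used_true:
            used_pred.add(i)
            used_true.add(j)
    return len(used_pred)


def f1_integral_sketch(
    pred: Sequence[Coord], true: Sequence[Coord], mdist: float = 3.0, n_cutoffs: int = 50
) -> float:
    """Area under the F1-vs-cutoff curve, normalized by mdist (an assumption)."""
    cutoffs = [i * mdist / (n_cutoffs - 1) for i in range(n_cutoffs)]
    f1_scores = []
    for cutoff in cutoffs:
        tp = count_matches(pred, true, cutoff)
        # F1 = 2*tp / (2*tp + fp + fn) = 2*tp / (n_pred + n_true)
        f1_scores.append(2 * tp / (len(pred) + len(true)))
    # Trapezoidal rule over the cutoff axis.
    area = sum(
        (f1_scores[k] + f1_scores[k + 1]) / 2 * (cutoffs[k + 1] - cutoffs[k])
        for k in range(n_cutoffs - 1)
    )
    return area / mdist


print(f1_integral_sketch([(0.0, 0.0)], [(0.0, 2.0)], mdist=4.0, n_cutoffs=5))
```

In the example, the single prediction lies 2 px from the ground truth, so F1 is 0 for cutoffs 0 and 1 and 1 for cutoffs 2, 3, and 4. Greedy matching can differ from an optimal assignment when several spots compete for the same partner.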
-
deepblink.metrics.
f1_score
(pred: numpy.ndarray, true: numpy.ndarray) → Optional[float][source]¶ F1 score metric.
\[F1 = \frac{2 * \textrm{precision} * \textrm{recall}}{\textrm{precision} + \textrm{recall}}\]
The harmonic mean of precision and recall, weighting both equally. The best value is 1 and the worst value is 0.
NOTE – direction dependent, arguments cannot be switched!
Parameters: - pred – np.ndarray of shape (n, n, 3): p, r, c format for each cell.
- true – np.ndarray of shape (n, n, 3): p, r, c format for each cell.
-
deepblink.metrics.
linear_sum_assignment
(matrix: numpy.ndarray, cutoff: float = None) → Tuple[list, list][source]¶ Solve the linear sum assignment problem with a cutoff.
A problem instance is described by a matrix where each matrix[i, j] is the cost of matching i (worker) with j (job). The goal is to find the optimal assignment of j to i, keeping only assignments whose cost is below the cutoff.
Parameters: - matrix – Matrix containing cost/distance to assign cols to rows.
- cutoff – Maximum cost/distance value assignments can have.
Returns: (rows, columns) corresponding to the matching assignment.
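A cutoff-aware assignment can be illustrated on a tiny cost matrix. The sketch below brute-forces the minimum-cost assignment with itertools and then discards pairs above the cutoff. It is only practical for very small matrices and is not how the package solves the problem. The name `assignment_sketch` is hypothetical.

```python
import itertools
from typing import List, Optional, Sequence, Tuple


def assignment_sketch(
    matrix: Sequence[Sequence[float]], cutoff: Optional[float] = None
) -> Tuple[List[int], List[int]]:
    """Minimum-cost row-to-column assignment; drops pairs above the cutoff."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    if n_rows > n_cols:
        raise ValueError("this sketch expects at most as many rows as columns")
    # Brute force over every way to give each row its own column.
    best = min(
        itertools.permutations(range(n_cols), n_rows),
        key=lambda cols: sum(matrix[r][c] for r, c in enumerate(cols)),
    )
    rows, cols = [], []
    for r, c in enumerate(best):
        if cutoff is None or matrix[r][c] <= cutoff:
            rows.append(r)
            cols.append(c)
    return rows, cols


cost = [[1.0, 4.0], [2.0, 9.0]]
print(assignment_sketch(cost))
print(assignment_sketch(cost, cutoff=3.0))
```

Without a cutoff, row 0 takes column 1 and row 1 takes column 0 (total cost 6). With a cutoff of 3.0, the pair costing 4.0 is dropped and only row 1 with column 0 remains.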
-
deepblink.metrics.
offset_euclidean
(offset: List[tuple]) → numpy.ndarray[source]¶ Calculates the euclidean distance based on row_column_offsets per coordinate.
-
deepblink.metrics.
precision_score
(pred: numpy.ndarray, true: numpy.ndarray) → float[source]¶ Precision score metric.
Defined as
tp / (tp + fp)
where tp is the number of true positives and fp the number of false positives. Can be interpreted as the accuracy to not mislabel samples or how many selected items are relevant. The best value is 1 and the worst value is 0.
NOTE – direction dependent, arguments cannot be switched!
Parameters: - pred – np.ndarray of shape (n, n, 3): p, r, c format for each cell.
- true – np.ndarray of shape (n, n, 3): p, r, c format for each cell.
-
deepblink.metrics.
recall_score
(pred: numpy.ndarray, true: numpy.ndarray) → float[source]¶ Recall score metric.
Defined as
tp / (tp + fn)
where tp is the number of true positives and fn the number of false negatives. Can be interpreted as the accuracy of finding positive samples or how many relevant samples were selected. The best value is 1 and the worst value is 0.
NOTE – direction dependent, arguments cannot be switched!
Parameters: - pred – np.ndarray of shape (n, n, 3): p, r, c format for each cell.
- true – np.ndarray of shape (n, n, 3): p, r, c format for each cell.
deepblink.models package¶
Submodules¶
Module contents¶
Models module with the training loop and logic to handle data which feeds into the loop.
-
class
deepblink.models.
Model
(augmentation_args: Dict[KT, VT], dataset_args: Dict[KT, VT], dataset_cls: deepblink.datasets._datasets.Dataset, network_args: Dict[KT, VT], network_fn: Callable, loss_fn: Callable, optimizer_fn: Callable, train_args: Dict[KT, VT], pre_model: keras.engine.training.Model = None, **kwargs)[source]¶ Bases:
object
Base class, to be subclassed by predictors for specific type of data, e.g. spots.
Parameters: - dataset_args – Dataset arguments containing - version, cell_size, flip, illuminate, rotate, gaussian_noise, and translate.
- dataset_cls – Specific dataset class.
- network_args – Network arguments containing - n_channels.
- network_fn – Network function returning a built model.
- loss_fn – Loss function.
- optimizer_fn – Optimizer function.
- train_args – Training arguments containing - batch_size, epochs, learning_rate.
- pre_model – Loaded, pre-trained model to bypass a new network creation.
- Kwargs:
- batch_format_fn: Formatting function added in the specific model, e.g. spots.
- batch_augment_fn: Same as batch_format_fn for augmentation.
-
evaluate
(x: numpy.ndarray, y: numpy.ndarray) → List[float][source]¶ Evaluate on images / masks and return l2 norm and f1 score.
-
fit
(dataset: deepblink.datasets._datasets.Dataset, augment_val: bool = True, callbacks: list = None) → None[source]¶ Training loop.
-
metrics
¶ Return metrics.
deepblink.networks package¶
Submodules¶
deepblink.networks.unet module¶
UNet architecture.
-
deepblink.networks.unet.
unet
(dropout: float = 0.2, cell_size: int = 4, filters: int = 5, ndown: int = 2, l2: float = 1e-06, block: str = 'convolutional') → keras.engine.training.Model[source]¶ Unet model with second, cell size dependent encoder.
Note that “convolutional” is currently the best block.
Parameters: - dropout – Percentage of dropout before each MaxPooling step.
- cell_size – Size of one cell in the prediction matrix.
- filters – Log_2 number of filters in the first inception block.
- ndown – Downsampling steps in the first encoder / decoder.
- l2 – L2 value for kernel and bias regularization.
- block – Type of block in each layer. [options: convolutional, inception, residual]
Module contents¶
Networks folder.
Contains functions returning the base architectures of used models.
-
deepblink.networks.
unet
(dropout: float = 0.2, cell_size: int = 4, filters: int = 5, ndown: int = 2, l2: float = 1e-06, block: str = 'convolutional') → keras.engine.training.Model[source]¶ Unet model with second, cell size dependent encoder.
Note that “convolutional” is currently the best block.
Parameters: - dropout – Percentage of dropout before each MaxPooling step.
- cell_size – Size of one cell in the prediction matrix.
- filters – Log_2 number of filters in the first inception block.
- ndown – Downsampling steps in the first encoder / decoder.
- l2 – L2 value for kernel and bias regularization.
- block – Type of block in each layer. [options: convolutional, inception, residual]
deepblink.optimizers module¶
Optimizers are used to update weight parameters in a neural network.
The learning rate defines what stepsizes are taken during one iteration of training. This file contains functions to return standard or custom optimizers.
-
deepblink.optimizers.
adam
(learning_rate: float)[source]¶ Keras’ adam optimizer with a specified learning rate.
deepblink.training module¶
Training functions.
-
deepblink.training.
run_experiment
(cfg: Dict[KT, VT], pre_model: keras.engine.training.Model = None)[source]¶ Run a training experiment.
Configuration file can be generated using deepblink config.
Parameters: - cfg – Dictionary configuration file.
- pre_model – Pre-trained model if not training from scratch.
-
deepblink.training.
train_model
(model: deepblink.models._models.Model, dataset: deepblink.datasets._datasets.Dataset, cfg: Dict[KT, VT], run_name: str = 'model', use_wandb: bool = True) → deepblink.models._models.Model[source]¶ Model training loop with callbacks.
Parameters: - model – Model class with the .fit method.
- dataset – Dataset class with access to train and validation images.
- cfg – Configuration file equivalent to the one used in deepblink.training.run_experiment.
- run_name – Name given to the model.h5 file saved.
- use_wandb – If Wandb should be used.
deepblink.util module¶
Utility helper functions.
-
deepblink.util.
delete_non_unique_columns
(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]¶ Deletes DataFrame columns that only contain one (non-unique) value.
-
deepblink.util.
get_from_module
(path: str, attribute: str) → Callable[source]¶ Grab an attribute (e.g. class) from a given module path.
-
deepblink.util.
predict_pixel_size
(fname: Union[str, os.PathLike[str]]) → Tuple[float, float][source]¶ Predict the pixel size based on tifffile metadata.
-
deepblink.util.
predict_shape
(shape: tuple) → str[source]¶ Predict the channel arrangement based on common standards.
Assumes the following things:
- x, y are the two largest axes
- rgb only if the last axis is 3
- up to 4 channels
- “fill up order” is c, z, t
Parameters: shape – Shape to be predicted. Output from np.ndarray.shape.
-
deepblink.util.
relative_shuffle
(x: Union[list, numpy.ndarray], y: Union[list, numpy.ndarray]) → Tuple[Union[list, numpy.ndarray], Union[list, numpy.ndarray]][source]¶ Shuffles x and y keeping their relative order.
-
deepblink.util.
remove_falses
(tup: tuple) → tuple[source]¶ Removes all false occurrences from a tuple.
-
deepblink.util.
train_valid_split
(x_list: list, y_list: list, valid_split: float = 0.2, shuffle: bool = True) → Iterable[list][source]¶ Split two lists (usually input and ground truth).
Splitting into random training and validation sets with an optional shuffling.
Parameters: - x_list – First list of items. Typically input data.
- y_list – Second list of items. Typically labeled data.
- valid_split – Number between 0-1 to denote the percentage of examples used for validation.
Returns: (x_train, x_valid, y_train, y_valid) split lists containing training or validation examples respectively.
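A paired shuffle followed by a split can be sketched with the standard library alone. The block below is illustrative and makes two assumptions not stated on this page: validation examples are taken from the end of the (optionally shuffled) lists, and a `seed` argument is added for reproducibility. The names `relative_shuffle_sketch` and `train_valid_split_sketch` are hypothetical.

```python
import random
from typing import Sequence, Tuple


def relative_shuffle_sketch(x: Sequence, y: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle x and y with one shared permutation so pairs stay aligned."""
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    return [x[i] for i in order], [y[i] for i in order]


def train_valid_split_sketch(
    x_list: list, y_list: list, valid_split: float = 0.2, shuffle: bool = True
) -> Tuple[list, list, list, list]:
    """Return (x_train, x_valid, y_train, y_valid)."""
    if len(x_list) != len(y_list):
        raise ValueError("x_list and y_list must have the same length")
    if shuffle:
        x_list, y_list = relative_shuffle_sketch(x_list, y_list)
    # Assumption: the validation examples are taken from the end of the lists.
    split = len(x_list) - round(len(x_list) * valid_split)
    return x_list[:split], x_list[split:], y_list[:split], y_list[split:]


x = list(range(10))
y = [value * 10 for value in x]
x_train, x_valid, y_train, y_valid = train_valid_split_sketch(x, y)
print(len(x_train), len(x_valid))
```

Because both lists are reordered with the same permutation, every input stays paired with its label after the split.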