Title: | Blocking and Randomization for Experimental Design |
---|---|
Description: | Intelligently assign samples to batches in order to reduce batch effects. Batch effects can have a significant impact on data analysis, especially when the assignment of samples to batches coincides with the contrast groups being studied. By defining a batch container and a scoring function that reflects the contrasts, this package allows users to assign samples in a way that minimizes the potential impact of batch effects on the comparison of interest. Among other functionality, we provide an implementation for OSAT score by Yan et al. (2012, <doi:10.1186/1471-2164-13-689>). |
Authors: | Iakov I. Davydov [aut, cre, cph], Juliane Siebourg-Polster [aut, cph], Guido Steiner [aut, cph], Konrad Rudolph [ctb], Jitao David Zhang [aut, cph], Balazs Banfai [aut, cph], F. Hoffman-La Roche [cph, fnd] |
Maintainer: | Iakov I. Davydov <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.5.0.9000 |
Built: | 2024-11-11 17:39:21 UTC |
Source: | https://github.com/bedapub/designit |
Alternative acceptance function for multi-dimensional scores in which order (left to right, e.g. first to last) denotes relevance.
accept_leftmost_improvement(current_score, best_score, ..., tolerance = 0)
current_score |
One- or multi-dimensional score from the current optimizing iteration (double or vector of doubles) |
best_score |
Best one- or multi-dimensional score found so far (double or vector of doubles) |
... |
Ignored arguments that may be used by alternative acceptance functions |
tolerance |
Tolerance value: when comparing score vectors from left to right, differences within +/- tolerance do not immediately shortcut the comparison at that position, allowing an improvement in a less important score to exert some influence |
Boolean, TRUE if current score should be taken as the new optimal score, FALSE otherwise
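The left-to-right comparison described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `leftmost_improvement_sketch` is hypothetical, lower scores are assumed to be better, and a comparison in which no position differs by more than the tolerance is assumed to count as "no improvement".

```r
# Hypothetical helper for illustration; not the package's implementation.
leftmost_improvement_sketch <- function(current_score, best_score, tolerance = 0) {
  stopifnot(length(current_score) == length(best_score))
  delta <- current_score - best_score
  # Positions where the two score vectors differ by more than the tolerance
  decisive <- which(abs(delta) > tolerance)
  if (length(decisive) == 0) {
    # Assumption: nothing decisive means the current score is not accepted
    return(FALSE)
  }
  # The leftmost decisive position determines the outcome (lower is better)
  delta[decisive[1]] < 0
}

# The first score improves, so the worse second score is irrelevant
leftmost_improvement_sketch(c(1, 5), c(2, 0))

# The first scores differ by less than the tolerance, so the second one decides
leftmost_improvement_sketch(c(2.05, 1), c(2, 3), tolerance = 0.1)
```

With `tolerance = 0`, the second call would be decided by the slightly worse first element instead.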
Distributes samples based on a sample sheet.
assign_from_table(batch_container, samples)
batch_container |
Instance of BatchContainer class |
samples |
|
Returns a new BatchContainer.
bc <- BatchContainer$new(
  dimensions = list(
    plate = 2,
    column = list(values = letters[1:3]),
    row = 3
  )
)

sample_sheet <- tibble::tribble(
  ~plate, ~column, ~row, ~sampleID, ~group,
  1, "a", 1, 1, "TRT",
  1, "b", 2, 2, "CNTRL",
  2, "a", 1, 3, "TRT",
  2, "b", 2, 4, "CNTRL",
  2, "a", 3, 5, "TRT",
)

# assign samples from the sample sheet
bc <- assign_from_table(bc, sample_sheet)

bc$get_samples(remove_empty_locations = TRUE)
First sample is assigned to the first location, second sample is assigned to the second location, etc.
assign_in_order(batch_container, samples = NULL)
batch_container |
Instance of BatchContainer class |
samples |
data.frame with samples. |
Returns a new BatchContainer.
samples <- data.frame(sampId = 1:3, sampName = letters[1:3])
samples

bc <- BatchContainer$new(dimensions = c("row" = 3, "column" = 2))
bc

set.seed(42)
# assigns samples randomly
bc <- assign_random(bc, samples)
bc$get_samples()

# assigns samples in order
bc <- assign_in_order(bc)
bc$get_samples()
Assignment function which distributes samples randomly.
assign_random(batch_container, samples = NULL)
batch_container |
Instance of BatchContainer class |
samples |
data.frame with samples. |
Returns a new BatchContainer.
samples <- data.frame(sampId = 1:3, sampName = letters[1:3])
samples

bc <- BatchContainer$new(dimensions = c("row" = 3, "column" = 2))
bc

set.seed(42)
# assigns samples randomly
bc <- assign_random(bc, samples)
bc$get_samples()

# assigns samples in order
bc <- assign_in_order(bc)
bc$get_samples()
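The difference between in-order and random assignment can be sketched in base R. This is only an illustration, assuming the assignment is represented as a vector with one entry per location that holds a sample index, or NA for an empty location; the variable names are hypothetical and this is not the package's implementation.

```r
n_locations <- 6 # e.g. 3 rows x 2 columns
n_samples <- 3

# In order: sample i goes to location i; remaining locations stay empty (NA)
in_order <- c(seq_len(n_samples), rep(NA_integer_, n_locations - n_samples))

# Random: the same entries, permuted across all locations
set.seed(42)
random <- sample(in_order)

in_order
random
```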
Creates a BatchContainer from a table (data.frame/tibble::tibble) containing sample and location information.
batch_container_from_table(tab, location_cols)
tab |
A table with location and sample information.
Table rows with all |
location_cols |
Names of columns containing information about locations. |
A BatchContainer with assigned samples.
tab <- data.frame(
  row = rep(1:3, each = 3),
  column = rep(1:3, 3),
  sample_id = c(1, 2, 3, NA, 5, 6, 7, NA, 9)
)
bc <- batch_container_from_table(tab, location_cols = c("row", "column"))
Describes container dimensions and samples to container location assignment.
A typical workflow starts with creating a BatchContainer. Then samples can be assigned to locations in that container.
trace
Optimization trace, a tibble::tibble()
scoring_f
Scoring functions used for optimization. Each scoring function should receive a BatchContainer and return a floating-point score value for the assignment. This is a list of functions. Upon assignment, a single function is automatically converted to a list. When several functions are given, each function is called.
has_samples
Returns TRUE if BatchContainer has samples.
has_samples_attr
Returns TRUE if BatchContainer has sample attributes assigned.
n_locations
Returns number of locations in a BatchContainer.
n_dimensions
Returns number of dimensions in a BatchContainer. This field cannot be assigned.
dimension_names
character vector with dimension names. This field cannot be assigned.
samples
Samples in the batch container. When assigning, the data.frame should not have a column named .sample_id.
samples_attr
Extra attributes of samples. If set, this is included in BatchContainer$get_samples() output.
assignment
Sample assignment vector. Should contain NAs for empty locations.
Assigning this field is deprecated, please use $move_samples() instead.
new()
Create a new BatchContainer object.
BatchContainer$new(locations_table, dimensions, exclude = NULL)
locations_table
A table with available locations.
dimensions
A vector or list of dimensions. Every dimension should have a name. Could be an integer vector of dimensions or a named list. Every value of a list could be either dimension size or parameters for BatchContainerDimension$new(). Can be used as an alternative to passing locations_table.
exclude
data.frame with excluded locations of a container. Only used together with dimensions.
bc <- BatchContainer$new(
  dimensions = list(
    "plate" = 3,
    "row" = list(values = letters[1:3]),
    "column" = list(values = c(1, 3))
  ),
  exclude = data.frame(plate = 1, row = "a", column = c(1, 3), stringsAsFactors = FALSE)
)

bc
get_samples()
Return table with samples and sample assignment.
BatchContainer$get_samples( assignment = TRUE, include_id = FALSE, remove_empty_locations = FALSE, as_tibble = TRUE )
assignment
Return sample assignment. If FALSE, only the samples table is returned, without batch assignment.
include_id
Keep .sample_id in the table. Use TRUE for lower overhead.
remove_empty_locations
Removes empty locations from the result tibble.
as_tibble
Return tibble. If FALSE returns data.table. This should have lower overhead, as internally there is a cached data.table.
table with samples and sample assignment.
get_locations()
Get a table with all the locations in a BatchContainer.
BatchContainer$get_locations()
A tibble with all the available locations.
move_samples()
Move samples between locations, modifying the BatchContainer in place. This method can receive either src and dst or location_assignment.
BatchContainer$move_samples(src, dst, location_assignment)
src
integer vector of source locations
dst
integer vector of destination locations (the same length as src).
location_assignment
integer vector with location assignment. The length of the vector should match the number of locations, NA should be used for empty locations.
BatchContainer, invisibly
score()
Score current sample assignment.
BatchContainer$score(scoring)
scoring
a function or a named list of scoring functions. Each function should return a numeric vector.
Returns a named vector of all scoring functions values.
copy()
Create an independent copy (clone) of a BatchContainer
BatchContainer$copy()
Returns a new BatchContainer
print()
Prints information about BatchContainer.
BatchContainer$print(...)
...
not used.
scores_table()
Return a table with scores from an optimization.
BatchContainer$scores_table(index = NULL, include_aggregated = FALSE)
index
optimization index, all by default
include_aggregated
include aggregated scores
a tibble::tibble() with scores
plot_trace()
Plot trace
BatchContainer$plot_trace(index = NULL, include_aggregated = FALSE, ...)
index
optimization index, all by default
include_aggregated
include aggregated scores
...
not used.
a ggplot2::ggplot() object
List of scoring functions.
Tibble with batch container locations.
Tibble with sample information and sample ids.
Sample attributes, a data.table.
Vector with assignment of sample ids to locations.
Cached data.table with samples assignment.
Validate sample assignment.
## ------------------------------------------------
## Method `BatchContainer$new`
## ------------------------------------------------

bc <- BatchContainer$new(
  dimensions = list(
    "plate" = 3,
    "row" = list(values = letters[1:3]),
    "column" = list(values = c(1, 3))
  ),
  exclude = data.frame(plate = 1, row = "a", column = c(1, 3), stringsAsFactors = FALSE)
)

bc
R6 Class representing a batch container dimension.
name
dimension name.
values
vector of dimension values.
size
Returns size of a dimension.
short_info
Returns a string summarizing the dimension. E.g., "mydim<size=10>".
new()
Create a new BatchContainerDimension object.
This is usually used implicitly via BatchContainer$new().
BatchContainerDimension$new(name, size = NULL, values = NULL)
name
Dimension name, a character string. Required.
size
Dimension size. Setting this implies that dimension values are 1:size.
values
Explicit list of dimension values. Could be numeric, character or factor.
It is required to provide the dimension name and either size or values.
plate_dimension <- BatchContainerDimension$new("plate", size = 3)
row_dimension <- BatchContainerDimension$new("row", values = letters[1:3])
column_dimension <- BatchContainerDimension$new("column", values = 1:3)

bc <- BatchContainer$new(
  dimensions = list(plate_dimension, row_dimension, column_dimension),
  exclude = data.frame(plate = 1, row = "a", column = c(1, 3), stringsAsFactors = FALSE)
)

bc
clone()
The objects of this class are cloneable with this method.
BatchContainerDimension$clone(deep = FALSE)
deep
Whether to make a deep clone.
## ------------------------------------------------
## Method `BatchContainerDimension$new`
## ------------------------------------------------

plate_dimension <- BatchContainerDimension$new("plate", size = 3)
row_dimension <- BatchContainerDimension$new("row", values = letters[1:3])
column_dimension <- BatchContainerDimension$new("column", values = 1:3)

bc <- BatchContainer$new(
  dimensions = list(plate_dimension, row_dimension, column_dimension),
  exclude = data.frame(plate = 1, row = "a", column = c(1, 3), stringsAsFactors = FALSE)
)

bc
All information needed to perform this function (primarily the number and size of subgroups plus the levels of the allocation variable) is contained in and extracted from the subgroup object.
compile_possible_subgroup_allocation( subgroup_object, fullTree = FALSE, maxCalls = 1e+06 )
subgroup_object |
A subgrouping object as returned by |
fullTree |
Boolean: Enforce full search of the possibility tree, independent of the value of |
maxCalls |
Maximum number of recursive calls in the search tree, to avoid long run times with very large trees |
List of possible allocations; Each allocation is an integer vector of allocation levels that are assigned in that order to the subgroups with given sizes
This function was just added to test early on the functionality of optimize_design() to accept a permutation vector rather than a list with src and dst indices.
complete_random_shuffling(batch_container, ...)
batch_container |
The batch-container. |
... |
Other params that are passed to a generic shuffling function (like the iteration number). |
A random permutation of the sample assignment in the container.
data("invivo_study_samples")
bc <- BatchContainer$new(
  dimensions = c("plate" = 2, "column" = 5, "row" = 6)
)
scoring_f <- osat_score_generator("plate", "Sex")
bc <- optimize_design(
  bc,
  scoring = scoring_f,
  invivo_study_samples,
  max_iter = 100,
  shuffle_proposal_func = complete_random_shuffling
)
Drop highest order interactions
drop_order(.terms, m = -1)
.terms |
|
m |
order of interaction (highest available if -1) |
This function enables comparison of the results of two scoring functions by just basing the decision on the first element. This reflects the original behavior of the optimization function, just evaluating the 'auxiliary' scores for the user's information.
first_score_only(scores, ...)
scores |
A score or multiple component score vector |
... |
Parameters to be ignored by this aggregation function |
The aggregated score, i.e. the first element of a multiple-component score vector.
first_score_only(c(1, 2, 3))
Form groups and subgroups of 'homogeneous' samples as defined by certain variables and size constraints
form_homogeneous_subgroups( batch_container, allocate_var, keep_together_vars = c(), n_min = NA, n_max = NA, n_ideal = NA, subgroup_var_name = NULL, prefer_big_groups = TRUE, strict = TRUE )
batch_container |
Batch container with all samples assigned that are to be grouped and sub-grouped |
allocate_var |
Name of a variable in the |
keep_together_vars |
Vector of column names in sample table; groups are formed by pooling samples with identical values of all those variables |
n_min |
Minimal number of samples in one sub(!)group; by default 1 |
n_max |
Maximal number of samples in one sub(!)group; by default the size of the biggest group |
n_ideal |
Ideal number of samples in one sub(!)group; by default the floor or ceiling of |
subgroup_var_name |
An optional column name for the subgroups which are formed (or NULL) |
prefer_big_groups |
Boolean; indicating whether or not bigger subgroups should be preferred in case of several possibilities |
strict |
Boolean; if TRUE, subgroup size constraints have to be met strictly, implying the possibility of finding no solution at all |
Subgroup object to be used in subsequent calls to compile_possible_subgroup_allocation()
Generate terms.object (formula with attributes)
generate_terms(.tbl, ...)
.tbl |
data |
... |
columns to skip (unquoted) |
Get highest order interaction
get_order(.terms)
.terms |
highest order (numeric).
This sample list is intended to be used in connection with the "invivo_study_treatments" data object
data(invivo_study_samples)
An object of class "tibble"
The animal IDs, i.e. unique identifiers for each animal
Strain (A or B)
Female (F) or Male (M)
Date of birth, not available for all the animals
Markings to distinguish individual animals, applied on the left (L), right (R) or both (B) ears
Initial body weight of the animal
Unit of the body weight, here: grams
The litter IDs, grouping offspring from one set of parents
Guido Steiner
This treatment list is intended to be used in connection with the "invivo_study_samples" data object
data(invivo_study_treatments)
An object of class "tibble"
The treatment to be given to an individual animal (1-3, plus a few untreated cases)
Strain (A or B) - a constraint which kind of animal may receive the respective treatment
Female (F) or Male (M) - a constraint which kind of animal may receive the respective treatment
Guido Steiner
This function enables comparison of the results of two scoring functions by calculating an L1 norm (Manhattan distance from origin).
L1_norm(scores, ...)
scores |
A score or multiple component score vector |
... |
Parameters to be ignored by this aggregation function |
The L1 norm as an aggregated score.
L1_norm(c(2, 2))
This function enables comparison of the results of two scoring functions by calculating an L2 norm (Euclidean distance from origin). Since this is only used for ranking solutions, the squared L2 norm is returned.
L2s_norm(scores, ...)
scores |
A score or multiple component score vector |
... |
Parameters to be ignored by this aggregation function |
The squared L2 norm as an aggregated score.
L2s_norm(c(2, 2))
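The three aggregation strategies documented here (first element only, L1 norm, squared L2 norm) can rank the same pair of score vectors differently. The following base-R sketch reimplements them from their descriptions; the `*_sketch` names are hypothetical and the code is an illustration, not the package's source.

```r
# Hypothetical re-implementations based on the descriptions above
first_only_sketch <- function(scores, ...) scores[1]
l1_sketch <- function(scores, ...) sum(abs(scores))
l2s_sketch <- function(scores, ...) sum(scores^2) # squared L2 norm, no sqrt needed for ranking

a <- c(1, 4)
b <- c(2, 2)

# Lower aggregated score is better
first_only_sketch(a) < first_only_sketch(b) # a wins on the first element
l1_sketch(a) < l1_sketch(b)                 # b wins: 5 vs 4
l2s_sketch(a) < l2s_sketch(b)               # b wins: 17 vs 8
```

Skipping the square root is safe for ranking because squaring preserves the order of non-negative values.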
Create locations table from dimensions and exclude table
locations_table_from_dimensions(dimensions, exclude)
dimensions |
A vector or list of dimensions. Every dimension should have a name. Could be an integer vector of dimensions or a named list. Every value of a list could be either dimension size or parameters for BatchContainerDimension$new(). |
exclude |
data.frame with excluded locations of a container. |
a tibble::tibble() with all the available locations.
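Conceptually, a locations table is the full cross product of the dimension values minus the excluded rows. A minimal base-R sketch of that idea follows, using the dimension values from the BatchContainer$new example; the helper `location_key` is hypothetical, and row order and column types may differ from what the package returns.

```r
# Dimension values: 3 plates, rows a-c, columns 1 and 3
dims <- list(plate = 1:3, row = letters[1:3], column = c(1, 3))

# All combinations of dimension values
locations <- expand.grid(dims, stringsAsFactors = FALSE)

# Locations to exclude: row "a" on plate 1, both columns
exclude <- data.frame(plate = 1, row = "a", column = c(1, 3), stringsAsFactors = FALSE)

# Hypothetical helper: one text key per location, used to drop excluded rows
location_key <- function(df) paste(df$plate, df$row, df$column, sep = "|")
locations <- locations[!location_key(locations) %in% location_key(exclude), ]

nrow(locations) # 3 * 3 * 2 = 18 combinations minus 2 excluded
```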
A sample list with 9 columns as described below.
There are 3 types of records (rows) indicated by the SampleType variable: patient samples, controls and spike-in standards. Patient samples were collected over up to 7 time points. Controls and SpikeIns are QC samples for distribution of the samples on 96 well plates.
data(longitudinal_subject_samples)
An object of class "tibble"
A unique sample identifier.
Indicates whether the sample is a patient sample, control or spike-in.
The subject identifier.
Indicates the treatment group of a subject.
Sampling time points in weeks of study.
Subject Sex, Female (F) or Male (M).
Subject age.
Subject Body Mass Index.
Look up variable for the number of samples per subject. This varies as not all subjects have samples from all weeks.
Juliane Siebourg
Alternative acceptance function for multi-dimensional scores with exponentially downweighted score improvements from left to right
mk_exponentially_weighted_acceptance_func( kappa = 0.5, simulated_annealing = FALSE, temp_function = mk_simanneal_temp_func(T0 = 500, alpha = 0.8) )
kappa |
Coefficient that determines how quickly the weights for the individual score improvements drop when going from left to right (i.e. first to last score). The weight for the first score's delta is 1; the delta of the p-th score is multiplied by kappa^(p-1) |
simulated_annealing |
Boolean; if TRUE, simulated annealing (SA) will be used to minimize the weighted improved score |
temp_function |
In case SA is used, a temperature function that returns the annealing temperature for a certain iteration number |
Acceptance function which returns TRUE if current score should be taken as the new optimal score, FALSE otherwise
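The kappa weighting can be illustrated with a small base-R sketch. It assumes that lower scores are better and that a candidate is accepted when the weighted sum of per-score improvements is positive; that acceptance rule and the helper name `weighted_improvement_sketch` are assumptions made for illustration, and simulated annealing is left out.

```r
# Hypothetical helper; the acceptance rule is an assumption, not the package's code
weighted_improvement_sketch <- function(current_score, best_score, kappa = 0.5) {
  p <- seq_along(current_score)
  weights <- kappa^(p - 1) # 1, kappa, kappa^2, ...
  # Positive delta = improvement, since lower scores are better
  delta <- best_score - current_score
  sum(weights * delta) > 0
}

# First score gets worse by 0.5, second improves by 4
weighted_improvement_sketch(c(1.5, 0), c(1, 4), kappa = 0.5) # weighted sum: -0.5 + 0.5 * 4
weighted_improvement_sketch(c(1.5, 0), c(1, 4), kappa = 0.1) # weighted sum: -0.5 + 0.1 * 4
```

A smaller kappa makes later scores count less, so the same trade-off can flip from accepted to rejected.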
Create a list of scoring functions (one per plate) that quantify the spatially homogeneous distribution of conditions across the plate
mk_plate_scoring_functions( batch_container, plate = NULL, row, column, group, p = 2, penalize_lines = "soft" )
batch_container |
Batch container (bc) with all columns that denote plate related information |
plate |
Name of the bc column that holds the plate identifier (may be missing or NULL in case just one plate is used) |
row |
Name of the bc column that holds the plate row number (integer values starting at 1) |
column |
Name of the bc column that holds the plate column number (integer values starting at 1) |
group |
Name of the bc column that denotes a group/condition that should be distributed on the plate |
p |
p parameter for Minkowski type of distance metrics. Special cases: p=1 - Manhattan distance; p=2 - Euclidean distance |
penalize_lines |
How to penalize samples of the same group in one row or column of the plate. Valid options are: 'none' - there is no penalty and the pure distance metric counts, 'soft' - penalty will depend on the well distance within the shared plate row or column, 'hard' - samples in the same row/column will score a zero distance |
List of scoring functions, one per plate, that calculate a real valued measure for the quality of the group distribution (the lower the better).
data("invivo_study_samples")
bc <- BatchContainer$new(
  dimensions = c("column" = 6, "row" = 10)
)
bc <- assign_random(bc, invivo_study_samples)
scoring_f <- mk_plate_scoring_functions(
  bc,
  row = "row", column = "column", group = "Sex"
)
bc <- optimize_design(bc, scoring = scoring_f, max_iter = 100)
plot_plate(bc$get_samples(), .col = Sex)
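The role of the p parameter can be seen in a small base-R sketch of the Minkowski distance between two wells given as (row, column) coordinates. The helper `minkowski_sketch` is hypothetical and only illustrates the distance metric itself; how the package turns distances into a plate score, including the penalize_lines options, is not shown.

```r
# Hypothetical helper: Minkowski distance between two coordinate vectors
minkowski_sketch <- function(a, b, p = 2) {
  sum(abs(a - b)^p)^(1 / p)
}

well_1 <- c(row = 1, column = 1)
well_2 <- c(row = 4, column = 5)

minkowski_sketch(well_1, well_2, p = 1) # Manhattan: 3 + 4
minkowski_sketch(well_1, well_2, p = 2) # Euclidean: sqrt(9 + 16)
```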
Generate acceptance function for an optimization protocol based on simulated annealing
mk_simanneal_acceptance_func( temp_function = mk_simanneal_temp_func(T0 = 500, alpha = 0.8) )
temp_function |
A temperature function that returns the annealing temperature for a certain cycle k |
A function that takes parameters (current_score, best_score, iteration) for an optimization step and returns a Boolean indicating whether the current solution should be accepted or dismissed. The acceptance probability of a worse solution decreases with the annealing temperature.
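A common way to implement such an acceptance rule is the Metropolis criterion: improvements are always accepted, and a worse solution is accepted with probability exp(-(current - best) / T). The sketch below assumes that criterion and single-valued scores; both helper names are hypothetical, and the package's exact formula is not stated in this section.

```r
# Hypothetical helpers using the Metropolis criterion (an assumption)
accept_prob_sketch <- function(current_score, best_score, temperature) {
  if (current_score < best_score) {
    return(1) # improvements are always accepted
  }
  exp(-(current_score - best_score) / temperature)
}

simanneal_accept_sketch <- function(current_score, best_score, temperature) {
  runif(1) < accept_prob_sketch(current_score, best_score, temperature)
}

# The same worse solution is far more likely to be accepted while it is hot
accept_prob_sketch(3, 2, temperature = 500)
accept_prob_sketch(3, 2, temperature = 5)
```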
Supported annealing types are currently "Exponential multiplicative", "Logarithmic multiplicative", "Quadratic multiplicative" and "Linear multiplicative", each with dedicated constraints on alpha. For information, see http://what-when-how.com/artificial-intelligence/a-comparison-of-cooling-schedules-for-simulated-annealing-artificial-intelligence/
mk_simanneal_temp_func(T0, alpha, type = "Quadratic multiplicative")
T0 |
Initial temperature at step 1 (when k=0) |
alpha |
Rate of cooling |
type |
Type of annealing protocol. Defaults to the quadratic multiplicative method which seems to perform well. |
Temperature at cycle k.
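The four named cooling schedules have standard textbook forms, sketched below in base R. The formulas are the commonly cited ones for these schedule names and are an assumption here, since this section does not spell them out; `temp_func_sketch` is a hypothetical name, and the constraints on alpha are not checked.

```r
# Hypothetical generator using textbook forms of the multiplicative schedules
temp_func_sketch <- function(T0, alpha, type = "Quadratic multiplicative") {
  force(T0)
  force(alpha)
  switch(type,
    "Exponential multiplicative" = function(k) T0 * alpha^k,
    "Logarithmic multiplicative" = function(k) T0 / (1 + alpha * log(1 + k)),
    "Linear multiplicative" = function(k) T0 / (1 + alpha * k),
    "Quadratic multiplicative" = function(k) T0 / (1 + alpha * k^2),
    stop("Unknown annealing type: ", type)
  )
}

temp <- temp_func_sketch(T0 = 500, alpha = 0.8)
temp(0)       # initial temperature T0
temp(c(1, 5)) # cooling over later cycles
```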
If length(n_swaps)==1, the returned function may be called an arbitrary number of times. If length(n_swaps)>1, the returned function may be called length(n_swaps) times before returning NULL, which is the stopping criterion once all requested swaps have been exhausted.
mk_subgroup_shuffling_function( subgroup_vars, restrain_on_subgroup_levels = c(), n_swaps = 1 )
subgroup_vars |
Column names of the variables that together define the relevant subgroups |
restrain_on_subgroup_levels |
Permutations can be forced to take place only within a level of the factor of the subgrouping variable. In this case, the user must pass only one subgrouping variable and a number of levels that together define the permuted subgroup. |
n_swaps |
Vector with number of swaps to be proposed in successive calls to the returned function (each value should be in valid range from 1..floor(n_locations/2)) |
Function to return a list with length n vectors src and dst, denoting source and destination index for the swap operation, or NULL if the user provided a defined protocol for the number of swaps and the last iteration has been reached
set.seed(42)

bc <- BatchContainer$new(
  dimensions = c(
    plate = 2,
    row = 4, col = 4
  )
)

bc <- assign_in_order(bc, samples = tibble::tibble(
  Group = c(rep(c("Grp 1", "Grp 2", "Grp 3", "Grp 4"), each = 8)),
  ID = 1:32
))

# here we use a 2-step approach:
# 1. Assign samples to plates.
# 2. Arrange samples within plates.

# overview of sample assignment before optimization
plot_plate(bc,
  plate = plate, row = row, column = col, .color = Group
)

# Step 1, assign samples to plates
scoring_f <- osat_score_generator(
  batch_vars = c("plate"), feature_vars = c("Group")
)
bc <- optimize_design(
  bc,
  scoring = scoring_f,
  max_iter = 10, # the real number of iterations should be bigger
  n_shuffle = 2,
  quiet = TRUE
)
plot_plate(
  bc,
  plate = plate, row = row, column = col, .color = Group
)

# Step 2, distribute samples within plates
scoring_f <- mk_plate_scoring_functions(
  bc,
  plate = "plate", row = "row", column = "col", group = "Group"
)
bc <- optimize_design(
  bc,
  scoring = scoring_f,
  max_iter = 50,
  shuffle_proposal_func = mk_subgroup_shuffling_function(subgroup_vars = c("plate")),
  aggregate_scores_func = L2s_norm,
  quiet = TRUE
)
plot_plate(bc,
  plate = plate, row = row, column = col, .color = Group
)
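The idea of restricting swaps to a subgroup, as in step 2 of the example where samples only move within their plate, can be sketched in base R. The helper below is hypothetical and takes a plain vector of subgroup labels per location rather than a BatchContainer; it proposes a single swap and is not the package's implementation.

```r
# Hypothetical helper: propose one swap between two locations of the same subgroup
subgroup_swap_sketch <- function(subgroup) {
  labels <- as.character(subgroup)
  # Only subgroups with at least two locations can host a swap
  eligible <- names(which(table(labels) >= 2))
  chosen <- sample(eligible, 1)
  picked <- sample(which(labels == chosen), 2)
  list(src = picked[1], dst = picked[2])
}

set.seed(42)
plate <- rep(1:2, each = 16) # 2 plates with 16 locations each
swap <- subgroup_swap_sketch(plate)
plate[swap$src] == plate[swap$dst] # both locations are on the same plate
```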
If length(n_swaps)==1, the returned function may be called an arbitrary number of times. If length(n_swaps)>1 and called without argument, the returned function may be called length(n_swaps) times before returning NULL, which is the stopping criterion once all requested swaps have been exhausted. Alternatively, the function may be called with an iteration number as the only argument, giving the user some freedom how to iterate over the sample swapping protocol.
mk_swapping_function(n_swaps = 1)
n_swaps |
Vector with number of swaps to be proposed in successive calls to the returned function (each value should be in valid range from 1.. |
Function to return a list with length n vectors src and dst, denoting source and destination index for the swap operation, or NULL if the user provided a defined protocol for the number of swaps and the last iteration has been reached.
data("invivo_study_samples")
bc <- BatchContainer$new(
  dimensions = c("plate" = 2, "column" = 5, "row" = 6)
)
scoring_f <- osat_score_generator("plate", "Sex")
optimize_design(
  bc,
  scoring = scoring_f,
  invivo_study_samples,
  max_iter = 100,
  shuffle_proposal_func = mk_swapping_function(1)
)
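The n_swaps protocol can be illustrated with a closure in base R. This sketch is hypothetical: it takes the number of locations as its argument instead of a BatchContainer or iteration number, and it only demonstrates the "call until NULL" behavior described above.

```r
# Hypothetical generator; the argument of the returned function is an assumption
swapping_function_sketch <- function(n_swaps = 1) {
  calls <- 0
  function(n_locations) {
    calls <<- calls + 1
    # With a protocol (length > 1), stop once every entry has been used
    if (length(n_swaps) > 1 && calls > length(n_swaps)) {
      return(NULL)
    }
    n <- if (length(n_swaps) == 1) n_swaps else n_swaps[calls]
    # Draw 2 * n distinct locations: first half sources, second half destinations
    picked <- sample(n_locations, 2 * n)
    list(src = picked[seq_len(n)], dst = picked[n + seq_len(n)])
  }
}

set.seed(1)
propose <- swapping_function_sketch(n_swaps = c(2, 1))
first <- propose(10)  # two swaps
second <- propose(10) # one swap
third <- propose(10)  # NULL: protocol exhausted
```

Drawing all indices in one `sample()` call guarantees that no location appears in both src and dst within a proposal.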
A sample list with 4 columns: SampleName, Well, Time and Treatment. Not all treatments are available at all time points. All samples are placed on the same plate.
data(multi_trt_day_samples)
An object of class "tibble"
siebourj
Generic optimizer that can be customized by user provided functions for generating shuffles and progressing towards the minimal score
optimize_design( batch_container, samples = NULL, scoring = NULL, n_shuffle = NULL, shuffle_proposal_func = NULL, acceptance_func = accept_strict_improvement, aggregate_scores_func = identity, check_score_variance = TRUE, autoscale_scores = FALSE, autoscaling_permutations = 100, autoscale_useboxcox = TRUE, sample_attributes_fixed = FALSE, max_iter = 10000, min_delta = NA, quiet = FALSE )
batch_container |
An instance of |
samples |
A |
scoring |
Scoring function or a named |
n_shuffle |
Vector of length 1 or larger, defining how many random sample
swaps should be performed in each iteration. If |
shuffle_proposal_func |
A user defined function to propose the next shuffling of samples.
Takes priority over n_shuffle if both are provided. The function is called with
a BatchContainer |
acceptance_func |
Alternative function to select a new score as the best one.
Defaults to strict improvement rule, i.e. all elements of a score have to be smaller or equal in order to accept the solution as better.
This may be replaced with an alternative acceptance function included in the package
(e.g. |
aggregate_scores_func |
A function to aggregate multiple scores AFTER (potential) auto-scaling and BEFORE acceptance evaluation.
If a function is passed, (multi-dimensional) scores will be transformed (often to a single double value) before calling the acceptance function.
E.g., see |
check_score_variance |
Logical: if TRUE, scores will be checked for variability under sample permutation and the optimization is not performed if at least one subscore appears to have a zero variance. |
autoscale_scores |
Logical: if TRUE, perform a transformation on the fly to equally scale scores to a standard normal. This makes scores more directly comparable and easier to aggregate. |
autoscaling_permutations |
How many random sample permutations should be done to estimate autoscaling parameters. (Note: minimum will be 20, regardless of the specified value) |
autoscale_useboxcox |
Logical; if TRUE, use a boxcox transformation for the autoscaling if possible at all.
Requires installation of the |
sample_attributes_fixed |
Logical; if FALSE, the sample shuffle function may generate altered sample attributes at each iteration. This affects estimation of score distributions. (Parameter only relevant if the shuffle function introduces attributes!) |
max_iter |
Stop optimization after a maximum number of iterations, independent from other stopping criteria (user defined shuffle proposal or min_delta). |
min_delta |
If not NA, optimization is stopped as soon as successive improvement (i.e. euclidean distance between score vectors from current best and previously best solution) drops below min_delta. |
quiet |
If TRUE, suppress non-critical warnings or messages. |
A trace object
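The interplay of `check_score_variance`, `autoscale_scores` and `autoscaling_permutations` can be illustrated with a minimal base-R sketch. It is not the package's implementation: the toy layout, the `score_layout()` helper and the plain z-score standardization (without any Box-Cox step) are assumptions made purely for illustration.

```r
set.seed(42)

# Toy data (hypothetical): 12 samples, two plates with six wells each.
sex <- rep(c("F", "M"), each = 6)
plate <- rep(1:2, each = 6)

# Illustrative score: squared deviation of observed from expected counts.
score_layout <- function(plate_assignment) {
  observed <- table(sex, plate_assignment)
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  sum((observed - expected)^2)
}

# Estimate the score distribution under random sample permutations.
perm_scores <- replicate(100, score_layout(sample(plate)))

# A score with zero variance under permutation cannot guide an optimization.
stopifnot(sd(perm_scores) > 0)

# Standardize scores with the permutation-based mean and standard deviation.
score_mean <- mean(perm_scores)
score_sd <- sd(perm_scores)
autoscale <- function(score) (score - score_mean) / score_sd

scaled <- autoscale(perm_scores)
```

After this kind of scaling, scores from different scoring functions live on a comparable scale, which is what makes a simple aggregation such as a sum reasonable.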
data("invivo_study_samples")
bc <- BatchContainer$new(
  dimensions = c("plate" = 2, "column" = 5, "row" = 6)
)
bc <- optimize_design(bc, invivo_study_samples,
  scoring = osat_score_generator("plate", "Sex"),
  max_iter = 100
)
plot_plate(bc$get_samples(), .col = Sex)
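The general idea of such an optimizer (propose a shuffle, score it, accept it only if it improves on the best score so far, stop after `max_iter` iterations) can be sketched in base R. This is a simplified illustration under stated assumptions, not the code behind `optimize_design()`: the toy data, `score_layout()` and the single-swap proposal are hypothetical, and the score is one-dimensional.

```r
set.seed(1)

# Toy data (hypothetical): the worst case, where plate coincides with sex.
sex <- rep(c("F", "M"), each = 6)
plate <- rep(1:2, each = 6)

# Illustrative score: squared deviation of observed from expected counts.
score_layout <- function(plate_assignment) {
  observed <- table(sex, plate_assignment)
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  sum((observed - expected)^2)
}

max_iter <- 200
best_plate <- plate
best_score <- score_layout(best_plate)
initial_score <- best_score

for (i in seq_len(max_iter)) {
  # Propose a shuffle: swap the plate assignment of two random samples.
  idx <- sample.int(length(best_plate), 2)
  candidate <- best_plate
  candidate[idx] <- candidate[rev(idx)]

  # Accept the candidate only if it strictly improves the score.
  candidate_score <- score_layout(candidate)
  if (candidate_score < best_score) {
    best_plate <- candidate
    best_score <- candidate_score
  }
}
```

Because every proposal is a swap, plate sizes never change; only the composition of each plate does.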
The batch container will in the end contain the updated experimental layout
optimize_multi_plate_design( batch_container, across_plates_variables = NULL, within_plate_variables = NULL, plate = "plate", row = "row", column = "column", n_shuffle = 1, max_iter = 1000, quiet = FALSE )
batch_container |
Batch container (bc) with all columns that denote plate related information |
across_plates_variables |
Vector with bc column name(s) that denote(s) groups/conditions to be balanced across plates, sorted by relative importance of the factors |
within_plate_variables |
Vector with bc column name(s) that denote(s) groups/conditions to be spaced out within each plate, sorted by relative importance of the factors |
plate |
Name of the bc column that holds the plate identifier |
row |
Name of the bc column that holds the plate row number (integer values starting at 1) |
column |
Name of the bc column that holds the plate column number (integer values starting at 1) |
n_shuffle |
Vector of length 1 or larger, defining how many random sample
swaps should be performed in each iteration. See |
max_iter |
Stop any of the optimization runs after this maximum number of iterations. See |
quiet |
If TRUE, suppress informative messages. |
A list with named traces, one for each optimization step
The OSAT score is intended to ensure even distribution of samples across batches and is closely related to the chi-square test contingency table (Yan et al. (2012) doi:10.1186/1471-2164-13-689).
osat_score(bc, batch_vars, feature_vars, expected_dt = NULL, quiet = FALSE)
bc |
BatchContainer with samples
or |
batch_vars |
character vector with batch variable names to take into account for the score computation. |
feature_vars |
character vector with sample variable names to take into account for score computation. |
expected_dt |
A |
quiet |
Do not warn about |
A list with two attributes: $score (numeric score value) and $expected_dt (expected counts data.table for reuse)
sample_assignment <- tibble::tribble(
  ~ID, ~SampleType, ~Sex, ~plate,
  1, "Case", "Female", 1,
  2, "Case", "Female", 1,
  3, "Case", "Male", 2,
  4, "Control", "Female", 2,
  5, "Control", "Female", 1,
  6, "Control", "Male", 2,
  NA, NA, NA, 1,
  NA, NA, NA, 2,
)
osat_score(sample_assignment,
  batch_vars = "plate",
  feature_vars = c("SampleType", "Sex")
)
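To make the relation to a contingency table concrete, the following base-R sketch computes a score of this kind by hand for the six samples above. It assumes the score is the sum of squared differences between observed and expected counts per feature combination and batch, with expected counts proportional to batch size. Empty locations (the `NA` rows) are simply dropped here, which may differ from how `osat_score()` treats them, so the result is not claimed to match the package output.

```r
# The six non-empty samples from the example (empty wells omitted).
sample_assignment <- data.frame(
  ID = 1:6,
  SampleType = c("Case", "Case", "Case", "Control", "Control", "Control"),
  Sex = c("Female", "Female", "Male", "Female", "Female", "Male"),
  plate = c(1, 1, 2, 2, 1, 2)
)

# One stratum per combination of the feature variables.
stratum <- interaction(
  sample_assignment$SampleType, sample_assignment$Sex,
  drop = TRUE
)

# Observed counts: strata in rows, plates in columns.
observed <- table(stratum, sample_assignment$plate)

# Expected counts if every stratum were spread proportionally to plate size.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Sum of squared deviations (assumed form of the score).
osat_sketch <- sum((observed - expected)^2)
```

For this layout the sketch evaluates to 3: both Case/Female samples sit on plate 1 (contributing 2), and each of the two single-sample Male strata contributes 0.5.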
This function wraps osat_score() in order to take full advantage of the speed gain without managing the buffered objects in the user code.
osat_score_generator(batch_vars, feature_vars, quiet = FALSE)
batch_vars |
character vector with batch variable names to take into account for the score computation. |
feature_vars |
character vector with sample variable names to take into account for score computation. |
quiet |
Do not warn about |
A function that returns the OSAT score for a specific sample arrangement
sample_assignment <- tibble::tribble(
  ~ID, ~SampleType, ~Sex, ~plate,
  1, "Case", "Female", 1,
  2, "Case", "Female", 1,
  3, "Case", "Male", 2,
  4, "Control", "Female", 2,
  5, "Control", "Female", 1,
  6, "Control", "Male", 2,
  NA, NA, NA, 1,
  NA, NA, NA, 2,
)
osat_scoring_function <- osat_score_generator(
  batch_vars = "plate",
  feature_vars = c("SampleType", "Sex")
)
osat_scoring_function(sample_assignment)
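The "buffered objects" idea can be illustrated with a closure that computes the expected counts once and reuses them on later calls. This is a hypothetical base-R sketch of the pattern, not the internals of `osat_score_generator()`; `make_cached_scorer()` and its arguments are invented for this example, and it assumes that shuffling only moves samples between locations, so the table margins stay constant.

```r
make_cached_scorer <- function(batch_var, feature_var) {
  cached_expected <- NULL # filled on the first call, reused afterwards

  function(df) {
    observed <- table(df[[feature_var]], df[[batch_var]])
    if (is.null(cached_expected)) {
      # Expected counts depend only on the margins, which shuffling preserves.
      cached_expected <<-
        outer(rowSums(observed), colSums(observed)) / sum(observed)
    }
    sum((observed - cached_expected)^2)
  }
}

scorer <- make_cached_scorer(batch_var = "plate", feature_var = "Sex")

# Fully confounded layout: plate coincides with sex.
confounded <- data.frame(
  Sex = rep(c("F", "M"), each = 3),
  plate = rep(1:2, each = 3)
)
# Same samples, alternating plates.
balanced <- data.frame(
  Sex = rep(c("F", "M"), each = 3),
  plate = rep(1:2, times = 3)
)

score_confounded <- scorer(confounded) # computes and caches expected counts
score_balanced <- scorer(balanced)     # reuses the cached expected counts
```

The caller only ever sees a function of the sample table, which is the form a scoring function passed to an optimizer typically takes.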
Here top and bottom row were both used as controls (in dilutions). The top row however was affected differently than the bottom one. This makes normalization virtually impossible.
data(plate_effect_example)
An object of class "tibble"
Plate row
Plate column
Sample concentration
Logarithm of sample concentration
Sample treatment
Readout from experiment
Balazs Banfai
Plot plate layouts
plot_plate( .tbl, plate = plate, row = row, column = column, .color, .alpha = NULL, .pattern = NULL, title = paste("Layout by", rlang::as_name(rlang::enquo(plate))), add_excluded = FALSE, rename_empty = FALSE )
.tbl |
a |
plate |
optional dimension variable used for the plate ids |
row |
the dimension variable used for the row ids |
column |
the dimension variable used for the column ids |
.color |
the continuous or discrete variable to color by |
.alpha |
a continuous variable encoding transparency |
.pattern |
a discrete variable encoding tile pattern (needs ggpattern) |
title |
string for the plot title |
add_excluded |
flag to add excluded wells (in bc$exclude) to the plot. A BatchContainer must be provided for this. |
rename_empty |
whether NA entries in sample table should be renamed to 'empty'. |
the ggplot object
siebourj
nPlate <- 3
nColumn <- 4
nRow <- 6

treatments <- c("CTRL", "TRT1", "TRT2")
timepoints <- c(1, 2, 3)

bc <- BatchContainer$new(
  dimensions = list(
    plate = nPlate,
    column = list(values = letters[1:nColumn]),
    row = nRow
  )
)

sample_sheet <- tibble::tibble(
  sampleID = 1:(nPlate * nColumn * nRow),
  Treatment = rep(treatments, each = floor(nPlate * nColumn * nRow) / length(treatments)),
  Timepoint = rep(timepoints, floor(nPlate * nColumn * nRow) / length(treatments))
)

# assign samples from the sample sheet
bc <- assign_random(bc, samples = sample_sheet)

plot_plate(bc$get_samples(),
  plate = plate, column = column, row = row,
  .color = Treatment, .alpha = Timepoint
)

plot_plate(bc$get_samples(),
  plate = plate, column = column, row = row,
  .color = Treatment, .pattern = Timepoint
)
Generate in one go a shuffling function that produces permutations with specific constraints on multiple sample variables and group sizes fitting one specific allocation variable
shuffle_grouped_data( batch_container, allocate_var, keep_together_vars = c(), keep_separate_vars = c(), n_min = NA, n_max = NA, n_ideal = NA, subgroup_var_name = NULL, report_grouping_as_attribute = FALSE, prefer_big_groups = FALSE, strict = TRUE, fullTree = FALSE, maxCalls = 1e+06 )
batch_container |
Batch container with all samples assigned that are to be grouped and sub-grouped |
allocate_var |
Name of a variable in the |
keep_together_vars |
Vector of column names in sample table; groups are formed by pooling samples with identical values of all those variables |
keep_separate_vars |
Vector of column names in sample table; items with identical values in those variables will not be put into the same subgroup if at all possible |
n_min |
Minimal number of samples in one sub(!)group; by default 1 |
n_max |
Maximal number of samples in one sub(!)group; by default the size of the biggest group |
n_ideal |
Ideal number of samples in one sub(!)group; by default the floor or ceiling of |
subgroup_var_name |
An optional column name for the subgroups which are formed (or NULL) |
report_grouping_as_attribute |
Boolean, if TRUE, add an attribute table to the permutation functions' output, to be used in scoring during the design optimization |
prefer_big_groups |
Boolean; indicating whether or not bigger subgroups should be preferred in case of several possibilities |
strict |
Boolean; if TRUE, subgroup size constraints have to be met strictly, implying the possibility of finding no solution at all |
fullTree |
Boolean: Enforce full search of the possibility tree, independent of the value of |
maxCalls |
Maximum number of recursive calls in the search tree, to avoid long run times with very large trees |
Shuffling function that on each call returns an index vector for a valid sample permutation
Can be used with optimize_design to improve convergence speed.
shuffle_with_constraints(src = TRUE, dst = TRUE)
src |
Expression to define possible source locations in the samples/locations
table. Usually evaluated based on
|
dst |
Expression to define possible destination locations in the
samples/locations table. Usually evaluated based on |
Returns a function which accepts a BatchContainer and an iteration number (i). This function returns a list with two names: src, a vector of length 2, and dst, a vector of length 2. See BatchContainer$move_samples().
set.seed(43)

samples <- data.frame(
  id = 1:100,
  sex = sample(c("F", "M"), 100, replace = TRUE),
  group = sample(c("treatment", "control"), 100, replace = TRUE)
)

bc <- BatchContainer$new(
  dimensions = c("plate" = 5, "position" = 25)
)

scoring_f <- function(samples) {
  osat_score(
    samples,
    "plate",
    c("sex", "group")
  )$score
}

# in this example we treat all the positions in the plate as equal.
# when shuffling we enforce that source location is non-empty,
# and destination location has a different plate number
bc <- optimize_design(
  bc,
  scoring = scoring_f,
  samples,
  shuffle_proposal_func = shuffle_with_constraints(
    # source is non-empty location
    !is.na(.sample_id),
    # destination has a different plate
    plate != .src$plate
  ),
  max_iter = 10
)
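The two constraints used in the example (non-empty source, destination on a different plate) can be mimicked on a plain data frame. The sketch below is a simplified, hypothetical stand-in rather than the package's implementation: `propose_constrained_swap()` and the `layout` table are invented here, and the function returns a single source and destination row index instead of operating on a BatchContainer.

```r
set.seed(43)

# Hypothetical locations table: 2 plates x 3 positions, two empty wells.
layout <- data.frame(
  plate = rep(1:2, each = 3),
  position = rep(1:3, times = 2),
  sample_id = c(1, 2, NA, 3, NA, 4)
)

propose_constrained_swap <- function(layout) {
  # Source: any non-empty location.
  src_candidates <- which(!is.na(layout$sample_id))
  src <- src_candidates[sample.int(length(src_candidates), 1)]

  # Destination: any location on a different plate than the source.
  dst_candidates <- which(layout$plate != layout$plate[src])
  dst <- dst_candidates[sample.int(length(dst_candidates), 1)]

  list(src = src, dst = dst)
}

proposal <- propose_constrained_swap(layout)

# Apply the proposal by exchanging the contents of the two locations.
swapped <- layout
swapped$sample_id[c(proposal$src, proposal$dst)] <-
  layout$sample_id[c(proposal$dst, proposal$src)]
```

Restricting proposals this way avoids wasting iterations on swaps that cannot change a plate-level score, such as exchanging two empty wells or two wells on the same plate.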
Compose shuffling function based on already available subgrouping and allocation information
shuffle_with_subgroup_formation( subgroup_object, subgroup_allocations, keep_separate_vars = c(), report_grouping_as_attribute = FALSE )
subgroup_object |
A subgrouping object as returned by |
subgroup_allocations |
A list of possible assignments of the allocation variable as returned by |
keep_separate_vars |
Vector of column names in sample table; items with identical values in those variables will not be put into the same subgroup if at all possible |
report_grouping_as_attribute |
Boolean, if TRUE, add an attribute table to the permutation functions' output, to be used in scoring during the design optimization |
Shuffling function that on each call returns an index vector for a valid sample permutation
Aggregation of scores: sum up all individual scores
sum_scores(scores, na.rm = FALSE, ...)
scores |
A score or multiple component score vector |
na.rm |
Boolean. Should NA values be ignored when computing the sum? FALSE by default as ignoring NA values may render the sum meaningless. |
... |
Parameters to be ignored by this aggregation function |
The aggregated score, i.e. the sum of all individual scores.
sum_scores(c(3, 2, 1))
Validates sample data.frame.
validate_samples(samples)
samples |
A |
This function enables comparison of the results of two scoring functions by just basing the decision on the largest element. This corresponds to the infinity-norm in ML terms.
worst_score(scores, na.rm = FALSE, ...)
scores |
A score or multiple component score vector |
na.rm |
Boolean. Should NA values be ignored when obtaining the maximum? FALSE by default as ignoring NA values may hide some issues with the provided scoring functions and also the aggregated value cannot be seen as the proper infinity norm anymore. |
... |
Parameters to be ignored by this aggregation function |
The aggregated score, i.e. the value of the largest element in a multiple-component score vector.
worst_score(c(3, 2, 1))
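The difference between aggregating by sum and aggregating by the largest element can be seen with plain base-R `sum()` and `max()`. These are used here only as stand-ins for the behaviour described above, on the assumption that the aggregation functions reduce a score vector in this way; the score vectors are made up.

```r
scores <- c(3, 2, 1)
total <- sum(scores) # sum-based aggregation: 6
worst <- max(scores) # largest-element aggregation: 3

# Two hypothetical multi-dimensional scores (lower is better).
a <- c(5, 1, 1)
b <- c(3, 3, 3)

a_better_by_sum <- sum(a) < sum(b) # TRUE: 7 < 9
a_better_by_max <- max(a) < max(b) # FALSE: 5 is not below 3

# NA handling: without na.rm the aggregate is NA, which flags the problem.
with_na <- c(3, NA, 1)
sum_na <- sum(with_na)                     # NA
sum_na_removed <- sum(with_na, na.rm = TRUE) # 4
```

The same pair of solutions is ranked differently by the two aggregations, so the choice depends on whether one poor sub-score should be allowed to be offset by good ones.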