---
title: "designit: a flexible engine to generate experiment layouts"
author: "Juliane Siebourg-Polster, Iakov Davydov, Guido Steiner, Balazs Banfai"
output: 
  rmarkdown::html_vignette: 
vignette: >
  %\VignetteIndexEntry{designit: a flexible engine to generate experiment layouts}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
editor_options: 
  chunk_output_type: inline
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include = FALSE}
library(designit)
library(tidyverse)
```


# Introduction

Examples in this vignette are used were used in our presentation.

It uses a subset of the `longitudinal_subject_samples` dataset.

```{r get_data, include = TRUE}
data("longitudinal_subject_samples")

dat <- longitudinal_subject_samples |> 
  filter(Group %in% 1:5, Week %in% c(1, 4)) |>
  select(SampleID, SubjectID, Group, Sex, Week)

# for simplicity: remove two subjects that don't have both visits
dat <- dat |>
  filter(SubjectID %in%
    (dat |> count(SubjectID) |> filter(n == 2) |> pull(SubjectID)))


subject_data <- dat |>
  select(SubjectID, Group, Sex) |>
  unique()
```

## Batch effects matter
Here's an example of plate effect. Here both top and bottom rows of the
plate are used as controls. 

This is the experiment design:

```{r, fig.width= 4, fig.height=3, echo = FALSE}
data("plate_effect_example")
plate_effect_example |>
  ggplot() +
  aes(x = column, y = row, fill = treatment, alpha = log_conc) +
  geom_tile() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  ) +
  scale_y_discrete(limits = rev) +
  scale_fill_brewer(palette = "Set1") +
  # make transparency more visible
  scale_alpha_continuous(range = c(0.2, 1)) +
  ggtitle("Design")
```

These are the readouts:

```{r, fig.width= 4, fig.height=5, echo = FALSE}
p1 <- plate_effect_example |>
  ggplot() +
  aes(x = column, y = row, fill = readout) +
  geom_tile() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  ) +
  scale_y_discrete(limits = rev) +
  scale_fill_viridis_c() +
  ggtitle("Readout")

p2 <- plate_effect_example |>
  filter(treatment == "control") |>
  mutate(column = as.numeric(column)) |>
  ggplot() +
  aes(x = column, y = readout, color = row) +
  geom_point() +
  geom_line() +
  scale_color_brewer(palette = "Set1") +
  ggtitle("Control")

cowplot::plot_grid(p1, p2, nrow = 2)
```

Due to the plate effect, the control rows are affected differently. It is
virtually impossible to normalize readouts in a meaningful way.

## Go fully random? 

* Could it be sufficient to randomly distribute samples across batches?
* Not necessarily!
    * Often sample sizes are too small to avoid grouping by change
    * Experimental constraints might not allow for a fully random layout

```{r, echo = FALSE}
set.seed(17) # gives `bad` random assignment

bc <- BatchContainer$new(
  dimensions = list("batch" = 3, "location" = 11)
) |>
  assign_random(subject_data)
```

Gone wrong: Random distribution of 31 grouped subjects into 3 batches 
turns out unbalanced: 

```{r, fig.width= 3, fig.height=3, echo = FALSE}
bc$get_samples() |>
  ggplot(aes(x = batch, fill = Group)) +
  geom_bar() +
  labs(y = "subject count")
```

“**Block** what you can and **randomize** what you cannot.”  (G. Box, 1978)


## designit

::: {#hello .greeting .message style="color: darkgreen;"}
To **avoid batch or gradient effects** in complex experiments, 
`designit` is an R package that offers flexible ways to **allocate a given set of 
samples to experiment layouts**. 
It's strength is that it implements a very general framework that can 
easily be customized and extended to fit specific constrained layouts.
:::

* Data structure: `BatchContainer` class
  * R6 object storing:
      * Experiment dimensions (cages, plates…)
      * Sample annotation
      * Scoring functions for sample distribution
* Main function: `optimize_design()`
  * Optimizes the layout with user defined
    * Scores for sample distribution
    * Optimization protocols
    * Sample shuffling functions
  * Returns improved design and optimization trace


# Sample Batching 

## Setup

* Assign 31 samples to 3 equally sized batches
* Balance by: 
    * treatment group (higher priority)
    * sex (lower priority)

```{r, include=FALSE}
set.seed(17) # gives `bad` random assignment
```
```{r}
bc <- BatchContainer$new(
  dimensions = list("batch" = 3, "location" = 11)
) |>
  assign_random(subject_data)
```


**Batch composition before optimization**

```{r, fig.width= 5.5, fig.height=3, echo = FALSE}
cowplot::plot_grid(
  plotlist = list(
    bc$get_samples() |>
      ggplot(aes(x = batch, fill = Group)) +
      geom_bar() +
      labs(y = "subject count"),
    bc$get_samples() |>
      ggplot(aes(x = batch, fill = Sex)) +
      geom_bar() +
      labs(y = "subject count")
  ),
  nrow = 1
)
```

```{r, eval = FALSE}
bc$get_samples()
```

```{r, echo=FALSE}
bind_rows(
  head(bc$get_samples(), 3) |>
    mutate(across(everything(), as.character)),
  tibble(
    batch = "...",
    location = " ...",
    SubjectID = "...",
    Group = "...", Sex = "..."
  ),
  tail(bc$get_samples(), 3) |>
    mutate(across(everything(), as.character))
) |>
  gt::gt() |>
  gt::tab_options(
    table.font.size = 11,
    data_row.padding = 0.1
  )
```


## Optimization

* Assign 31 samples to 3 equally sized batches
* Balance by: 
    * treatment group (higher priority)
    * sex (lower priority)

```{r, warning=FALSE}
bc <- optimize_design(
  bc,
  scoring = list(
    group = osat_score_generator(
      batch_vars = "batch",
      feature_vars = "Group"
    ),
    sex = osat_score_generator(
      batch_vars = "batch",
      feature_vars = "Sex"
    )
  ),
  n_shuffle = 1,
  acceptance_func =
    ~ accept_leftmost_improvement(..., tolerance = 0.01),
  max_iter = 150,
  quiet = TRUE
)
```

**Batch composition after optimization**

```{r, fig.width= 8, fig.height=3, echo = FALSE}
cowplot::plot_grid(
  plotlist = list(
    bc$get_samples() |>
      ggplot(aes(x = batch, fill = Group)) +
      geom_bar() +
      labs(y = "subject count"),
    bc$get_samples() |>
      ggplot(aes(x = batch, fill = Sex)) +
      geom_bar() +
      labs(y = "subject count"),
    bc$plot_trace(include_aggregated = TRUE)
  ),
  ncol = 3
)
```


```{r, echo=FALSE}
bind_rows(
  head(bc$get_samples(), 3) |>
    mutate(across(everything(), as.character)),
  tibble(
    batch = "...",
    location = " ...",
    SubjectID = "...",
    Group = "...", Sex = "..."
  ),
  tail(bc$get_samples(), 3) |>
    mutate(across(everything(), as.character))
) |>
  gt::gt() |>
  gt::tab_options(
    table.font.size = 11,
    data_row.padding = 0.1
  )
```


# Plate layouts

## Continuous confounding

Assays are often performed in well plates  (24, 96, 384) 

Observed effects

* Edge effects (bad plate sealing)
* Gradients (non-equal temperature distribution)
* Row / column effects (pipetting issues)

Since plate effects often cannot be avoided, we aim to distribute sample groups 
of interest evenly across the plate and adjust for the effect computationally.


## Setup

* Assume previous batches are 24-well plates 
* Within plate optimization & across plate blocking
* Balanced by: 
    * treatment group (higher priority)
    * sex (lower priority)

```{r}
set.seed(4)

bc <- BatchContainer$new(
  dimensions = list("plate" = 3, "row" = 4, "col" = 6)
) |>
  assign_in_order(dat)
```

```{r, fig.width= 5, fig.height=4.5, eval=FALSE}
plot_plate(bc,
  plate = plate, row = row, column = col,
  .color = Group, title = "Initial layout by Group"
)
plot_plate(bc,
  plate = plate, row = row, column = col,
  .color = Sex, title = "Initial layout by Sex"
)
```

```{r, fig.width= 5, fig.height=4.5, echo=FALSE}
cowplot::plot_grid(
  plotlist = list(
    plot_plate(bc,
      plate = plate, row = row, column = col,
      .color = Group, title = "Initial layout by Group"
    ),
    plot_plate(bc,
      plate = plate, row = row, column = col,
      .color = Sex, title = "Initial layout by Sex"
    )
  ),
  nrow = 2
)
```

## 2-step optimization

### Across plate optimization using osat score as before
```{r, warning=FALSE}
bc1 <- optimize_design(
  bc,
  scoring = list(
    group = osat_score_generator(
      batch_vars = "plate",
      feature_vars = "Group"
    ),
    sex = osat_score_generator(
      batch_vars = "plate",
      feature_vars = "Sex"
    )
  ),
  n_shuffle = 1,
  acceptance_func =
    ~ accept_leftmost_improvement(..., tolerance = 0.01),
  max_iter = 150,
  quiet = TRUE
)
```

```{r, fig.width= 5, fig.height=4.5, echo=FALSE}
cowplot::plot_grid(
  plotlist = list(
    plot_plate(bc1,
      plate = plate, row = row, column = col,
      .color = Group, title = "Layout after the first step, Group"
    ),
    plot_plate(bc1,
      plate = plate, row = row, column = col,
      .color = Sex, title = "Layout after the first step, Sex"
    )
  ),
  nrow = 2
)
```

### Within plate optimization using distance based sample scoring function
```{r, warning=FALSE}
bc2 <- optimize_design(
  bc1,
  scoring = mk_plate_scoring_functions(
    bc1,
    plate = "plate", row = "row", column = "col",
    group = "Group"
  ),
  shuffle_proposal_func = shuffle_with_constraints(dst = plate == .src$plate),
  max_iter = 150,
  quiet = TRUE
)
```

```{r, fig.width= 5, fig.height=4.5, echo=FALSE}
cowplot::plot_grid(
  plotlist = list(
    plot_plate(bc2,
      plate = plate, row = row, column = col,
      .color = Group, title = "Layout after the second step, Group"
    ),
    plot_plate(bc2,
      plate = plate, row = row, column = col,
      .color = Sex, title = "Layout after the second step, Sex"
    )
  ),
  nrow = 2
)
```


## 2-step optimization `multi_plate_layout()`

We are performing the same optimization as before, but using the
`multi_plate_layout()` function to combine the two steps.

```{r, warning=FALSE, message=FALSE} 
bc <- optimize_multi_plate_design(
  bc,
  across_plates_variables = c("Group", "Sex"),
  within_plate_variables = c("Group"),
  plate = "plate", row = "row", column = "col",
  n_shuffle = 2,
  max_iter = 500 # 2000
)
```


```{r, fig.width= 5, fig.height=4.5, echo=FALSE}
cowplot::plot_grid(
  plotlist = list(
    plot_plate(bc,
      plate = plate, row = row, column = col,
      .color = Group, title = "After optimization, Group"
    ),
    plot_plate(bc,
      plate = plate, row = row, column = col,
      .color = Sex, title = "After optimization, Sex"
    )
  ),
  nrow = 2
)
```

```{r fig.width=5, fig.height=4, echo=FALSE}
bc$plot_trace()
```


# Glimpse on more complex application

Goal: 

* Assign 3 treatment conditions to 59 animals, 
  representing 2 relevant strains
* Avoid confounding by sex, weight and age

Constraints:

* Cages host ideally 3 animals (preferably 2-5)
* Strain, Sex and Treatment must be homogeneous within a cage
* Don’t put males from different litters same cage; 
  litter mixing is possible for females!
* Average weight and age composition comparable between treatment groups and cages
* Avoid animals with identical ear markings in same cage (if possible)
* Treatment distribution across animal subgroups (if specified) has to be respected

see vignette `invivo_study_design` for the full story.


# Conclusion

* designit aims to be general and adaptable
    * One framework to address simple batching as well as complex multi-step procedures
    * Easy add-ons: custom scoring-functions, acceptance-criteria and
    shuffling-procedures can be passed to optimize_design by the user
* Includes functions and vignettes for frequently used layouts such as plates.


**Acknowledgements**

* Martha Serrano
* Sabine Wilson
* David Jitao Zhang
* Fabian Birzele
* PMDA group for feedback?


```{r, fig.width=4.0, fig.hight = 5.0, echo = FALSE}
layout <- crossing(row = 1:9, column = 1:12) |>
  mutate(Questions = "no")
layout$Questions[c(
  16, 17, 18, 19, 20, 21,
  27, 28, 33, 34,
  45, 46,
  55, 56, 66, 67, 90, 91
)] <- "yes"

plot_plate(layout, .color = Questions, title = "Thank you")
```