We load the packages used throughout the analysis for data manipulation (dplyr, data.table), string handling (stringr), scaling utilities (scales), and tidy outputs (tibble).
#install.packages('lubridate')
pkgs <- c("dplyr","data.table","stringr","scales","tibble")
for (p in pkgs) if (!requireNamespace(p, quietly=TRUE)) install.packages(p)
library(dplyr); library(data.table); library(stringr); library(scales); library(tibble)
##
## Adjuntando el paquete: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Adjuntando el paquete: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
We first load a gene-by-sample expression matrix in logTPM scale, where the first column contains gene identifiers and the remaining columns correspond to samples, and a metadata table with sample-level annotations.
expr_df <- read.csv("log_tpm.csv", check.names = FALSE) # first col = gene_id
meta_df <- read.csv("sample_table.csv", check.names = FALSE)
colnames(expr_df)[1:10]
## [1] "" "p1k_00001" "p1k_00002" "p1k_00003" "p1k_00004" "p1k_00005"
## [7] "p1k_00006" "p1k_00007" "p1k_00008" "p1k_00009"
colnames(meta_df)[1:10]
## [1] "" "sample_id" "study"
## [4] "project" "condition" "rep_id"
## [7] "Strain Description" "Strain" "Culture Type"
## [10] "Evolved Sample"
We download the E. coli genome annotation (GFF) from NCBI to obtain genomic coordinates for each gene. These coordinates allow us to map expression measurements onto positions along the chromosome.
gff_url <- "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.gff.gz"
gff_file <- "GCF_000005845.2_ASM584v2_genomic.gff.gz"
if (!file.exists(gff_file)) download.file(gff_url, gff_file, mode="wb")
gff <- fread(gff_file, sep="\t", header=FALSE, quote="", comment.char="#", fill=TRUE) #fread help us to manage the information from the NCBI File
setnames(gff, c("seqid","source","type","start","end","score","strand","phase","attr")) #We set some known names to the columns to manage more easy the
genes_main <- copy(gff)
table(gff$type) #We make some expoloration of the data in gff
##
## CDS exon gene
## 4340 216 4506
## mobile_genetic_element ncRNA origin_of_replication
## 50 108 1
## pseudogene region rRNA
## 145 1 22
## sequence_feature tRNA
## 48 86
In this step we transform the raw NCBI GFF annotation into a clean gene-coordinate table that can be joined with the expression matrix. We first restrict the annotation to features of type gene and extract the locus_tag from the attribute field (attr) to obtain a gene identifier compatible with our dataset. Because the GFF may contain multiple sequences (e.g., chromosome and plasmids), we identify the main chromosome as the seqid with the largest coordinate span and store it in genes_main for downstream origin (oriC) detection. We then keep only the essential columns (gene_id, start, end), enforce integer coordinates, and estimate the chromosome length as \(L=\max(end)\). Finally, each gene is represented by a single position \(pos=(start+end)/2\) (midpoint), which provides a unique coordinate per gene for subsequent binning and circular mapping.
gff <- gff[type == "gene"] #We remain just with the gene information
gff[, gene_id := sub(".*locus_tag=([^;]+).*", "\\1", attr)] #We extract from the NCBI attr the identificator of the gene that correspond to the name of the gene in our dataset
seq_spans <- gff[, .(span = max(end, na.rm=TRUE) - min(start, na.rm=TRUE)), by=seqid]
main_seqid <- seq_spans[which.max(span), seqid]
genes_main <- gff[seqid == main_seqid]
gff <- gff[, .(gene_id, start, end)]
gff[, start := as.integer(start)]
gff[, end := as.integer(end)] #We extract the positions of the gene
typeof(gff$start)
## [1] "integer"
L <- max(gff$end, na.rm=TRUE)
#Sets L to the maximum end coordinate on the main chromosome:
#your estimate of the genome length (useful for building circular coordinates later).
#L should be approximately the chromosome length (e.g., ~4.6e6 for E. coli K12).
gene_coords <- gff[grepl("^b\\d{4}$", gene_id),
.(gene_id, start, end, pos=(start+end)/2)]
gene_coords <- unique(gene_coords, by="gene_id")
head(gene_coords)
## gene_id start end pos
## <char> <int> <int> <num>
## 1: b0001 190 255 222.5
## 2: b0002 337 2799 1568.0
## 3: b0003 2801 3733 3267.0
## 4: b0004 3734 5020 4377.0
## 5: b0005 5234 5530 5382.0
## 6: b0006 5683 6459 6071.0
At this stage, we align the expression matrix with the genomic coordinate table using a common gene identifier. As a basic integrity check, we confirm that gene IDs in the expression input are not duplicated, since repeated identifiers would break the one-to-one mapping between expression rows and genomic positions. We then build gene_info by joining the expression gene list with gene_coords, yielding a per-gene table that includes genomic coordinates (and later-derived quantities such as bin and angle). In parallel, we convert the expression values into a numeric matrix (expr_mat) and reorder/subset it to produce expr_mat2, ensuring that its row order matches exactly the order of gene_info. This guarantees that downstream binning and circular modeling are performed on expression values consistently matched to the correct genomic locations.
expr_gene_id <- expr_df[[1]] #Expression values themselves have no genomic meaning unless you know which gene each row corresponds to. So we extract the genes id of our dataset.
#This vector becomes the backbone for joining expression ↔ coordinates
anyDuplicated(expr_gene_id) #In our sanity check we have not repeated genes.
## [1] 0
#“take genes in expression order, then attach coordinates if available.”
#We create then "gene_info", that contains the genes of our dataset with its respective position.
gene_info <- tibble(gene_id = expr_gene_id) %>%
left_join(as.data.frame(gene_coords) %>% mutate(gene_id=as.character(gene_id)),
by="gene_id")
expr_mat <- as.matrix(expr_df[, -1]) #We transform our data into a matrix to apply it in the linear model.
rownames(expr_mat) <- expr_df[[1]]
storage.mode(expr_mat) <- "numeric"
expr_mat2 <- expr_mat[gene_info$gene_id, , drop=FALSE] #We organize our matrix to keep only rows corresponding to genes with coordinates, reorders it to match exactly the order of gene_info.
head(expr_mat2[, 1:6, drop=FALSE])
## p1k_00001 p1k_00002 p1k_00003 p1k_00004 p1k_00005 p1k_00006
## b0002 -0.05399314 0.05399314 0.879043050 1.0896003 0.18245342 -0.07887270
## b0003 -0.06197325 0.06197325 1.063329885 1.4778890 0.39257269 0.23467136
## b0004 -0.03697197 0.03697197 0.003131557 0.3567021 -0.09675615 -0.33491258
## b0005 -0.10496714 0.10496714 -1.118144530 -1.2779633 -0.18154598 -0.36405654
## b0006 0.04224167 -0.04224167 -0.123591888 -0.1077775 -0.17317704 -0.08988823
## b0007 0.05465326 -0.05465326 0.327976479 0.2759481 0.10268599 0.11661840
To represent the circular chromosome at a coarser (and less noisy) resolution, we partitioned the genome into fixed windows of 100 kb (“bins”). Each gene was assigned to a bin using its genomic midpoint, and because the chromosome is circular we used the wrapped coordinate \(pos\_{mod} = pos \bmod L\) to keep positions inside \([0, L)\).
For each bin we defined a single representative angular coordinate \(\theta\) using the bin midpoint. This midpoint was mapped onto \([0,2\pi)\) as: \[\theta = \frac{2\pi \cdot \text{bin\_mid}}{L}.\]
This bin-level angle is the input for the downstream circular regression model, \[y = \alpha + \beta\cos(\theta) + \gamma\sin(\theta),\] where \(y\) is the mean log-expression per bin. We also tracked the number of genes per bin (\(n\_{genes}\)), which can later be used as regression weights to account for unequal gene counts across windows.
window_size <- 100000
#E. coli genome ≈ 4.6 Mb → ~46 bins This is a tradeoff:
#Smaller bins → noisier but higher resolution
#Larger bins → smoother but may wash out signal
L / window_size #46.41628
## [1] 46.41628
gene_info <- gene_info %>%
mutate(
pos_mod = pos %% L,
bin = floor(pos_mod / window_size),
bin_mid = ((bin + 0.5) * window_size) %% L,
theta_bin = 2*pi*bin_mid/L
)
#y = α + β cos(θ) + γ sin(θ)
idx_by_bin <- split(seq_len(nrow(gene_info)), gene_info$bin)
n_genes_in_window <- sapply(idx_by_bin, length)
# this counts how many genes are in each bin;
# 0 1 2 3 4 5
#86 85 97 88 93 89
theta_by_bin <- gene_info %>%
distinct(bin, theta_bin) %>%
mutate(bin = as.integer(bin)) %>%
arrange(bin)
To fit the circular model, we first aggregated gene-level expression into bin-level expression. For each sample and each genomic bin, we computed the mean of the logTPM values across all genes assigned to that bin: \[y_{b,s}=\frac{1}{|G_b|}\sum_{g\in G_b} y_{g,s},\] where \(G_b\) is the set of genes in bin \(b\). Because the expression matrix is already in log scale, averaging within bins provides a stable summary of the bin’s typical expression level.
We then reshaped the bin-by-sample matrix into a long-format table
with one row per \((bin,\ sample)\)
pair, and we attached: (i) the bin’s angular coordinate \(\theta_{bin}\) (from the bin midpoint) and
(ii) the number of genes in the bin \(n_{genes}=|G_b|\). This final table
(bin_expr) is the direct input for the per-sample circular
regression, and \(n_{genes}\) can be
used as a weight to account for bins containing different numbers of
genes.
y_bin_mat <- sapply(idx_by_bin, function(idx) {
colMeans(expr_mat2[idx, , drop=FALSE], na.rm=TRUE)
}) #Here we build our dataset with the average expression for each bin.
y_bin_mat <- t(y_bin_mat) # And as we want to mantain the order we use the transpose so we have in the columns the experiments and in the lines the bin.
bins <- as.integer(names(idx_by_bin)) #Its easier to use the bins as an integer in the future
bin_expr <- data.frame(
bin = rep(bins, times = ncol(y_bin_mat)),
sample_id = rep(colnames(y_bin_mat), each = length(bins)),
y = as.vector(y_bin_mat)
) %>%
left_join(theta_by_bin, by="bin") %>%
mutate(n_genes = rep(as.integer(n_genes_in_window), times=ncol(y_bin_mat))) #Here we have our output for the model. With the long format required
head(bin_expr)
## bin sample_id y theta_bin n_genes
## 1 0 p1k_00001 -0.05766221 0.06768299 86
## 2 1 p1k_00001 -0.01140104 0.20304897 85
## 3 2 p1k_00001 -0.03812402 0.33841495 97
## 4 3 p1k_00001 -0.01064519 0.47378094 88
## 5 4 p1k_00001 -0.02505665 0.60914692 93
## 6 5 p1k_00001 0.02339887 0.74451290 89
For each sample, we fit a null model (fit0) that assumes constant expression across the genome (y=α), we will use this later to compare if using circular model (y = α + β cos(θ) + γ sin(θ)) improves it significantly the fit.
We parametrized the circular model using sine and cosine basis functions, allowing the sinusoidal pattern to be estimated within a linear regression framework. This avoids nonlinear optimization (\(y = \alpha + ACos(\theta-\phi)\)) over amplitude and phase parameters.
The non linear model starts by representing the bin as follows: \[A(\cos(\theta-\phi))=A\cos(\theta)\cos(\phi)+A\sin(\theta)\sin(\phi)\] We defined: \[\beta = A\cos{\phi}\] \[\gamma=A\sin(\phi)\] With this transformation we represent the same model with a different parameterization. From the previous analysis we can set as useful information that: \[A=\sqrt{\beta^2+\gamma^2}\] That give us the amplitude And: \[\phi=\operatorname{atan2}(\gamma, \beta)\] That give us the phase, so the position in the circular chromosome in dependence of \(\gamma\) and \(\beta\). We at this point can use also a theoretical treatment that define the phase duration as C= DNA synthesis phase and D= predivisional phase. The doubling time is \(\tau = \frac{2}{growthrate}\) and the position of a locus is \(n_a= 2^{\frac{p_aC+D}{\tau}}\) where \(n_a\) is the copy number of locus \(a\) positioned at a distance \(p_a\) from the terminus of the chromosome. Due to this we can arrive that: \[log_2\frac{n_{ori}}{n_{ter}}=\frac{C}{\tau}=\frac{C\mu}{ln(2)}\]
In our case we can say that, if expression is proportional to copy
number, then in \(log_2\) space the
difference between maximal and minimal expression across the chromosome
corresponds to: \[y_{max}-y_{min}=2A=\delta\] Thus, the
fitted amplitude A provides a quantitative estimate of the
replication-associated expression gradient. We can retrive the position
due to the fact that we have \(\phi\),
where the expression is maximum and minimum in radiants; to apply the
model to all the samples, we used the fit_one_sample function that we’ve
just created. This function operates on one sample’s binned data, so in
the “results” variable for each sample, there are all the parameters of
the fitted model explained above.
In order to evaluate if there is statistical evidence to reject the null
model “fit0”, we used an anova F test to compute p value (p_full), then
corrected with BH method for multiple testing correction (q_full)
fit_one_sample <- function(df, use_weights=TRUE) {
w <- if (use_weights) df$n_genes else NULL
fit0 <- lm(y ~ 1, data=df, weights=w)
fitFull <- lm(y ~ cos(theta_bin) + sin(theta_bin), data=df, weights=w)
# Reduced models (for later comparison)
fitCos <- lm(y ~ cos(theta_bin), data=df, weights=w)
fitSin <- lm(y ~ sin(theta_bin), data=df, weights=w)
p_full <- anova(fit0, fitFull)$`Pr(>F)`[2]
p_cos <- anova(fit0, fitCos )$`Pr(>F)`[2]
p_sin <- anova(fit0, fitSin )$`Pr(>F)`[2]
coefs <- coef(summary(fitFull))
alpha <- coefs["(Intercept)", "Estimate"]
beta <- coefs["cos(theta_bin)", "Estimate"]
gamma <- coefs["sin(theta_bin)", "Estimate"]
# amplitude/phase representation:
# y(theta) = alpha + A cos(theta - phi)
A <- sqrt(beta^2 + gamma^2)
phi <- atan2(gamma, beta) # [-pi, pi]
delta <- 2*A # naive: max-min in log space
theta_max <- (phi %% (2*pi)) # where cos(theta-phi) is max
theta_min <- (theta_max + pi) %% (2*pi)
tibble(alpha, beta, gamma, A, phi, delta, theta_max, theta_min, p_full, p_cos, p_sin)
}
results <- bin_expr %>%
group_by(sample_id) %>%
group_modify(~ fit_one_sample(.x, use_weights=TRUE)) %>%
ungroup() %>%
mutate(
q_full = p.adjust(p_full, method="BH"),
q_cos = p.adjust(p_cos, method="BH"),
q_sin = p.adjust(p_sin, method="BH")
) %>%
as_tibble()
sig_q <- !is.na(results$q_full) & results$q_full < 0.05
head(results)
## # A tibble: 6 × 15
## sample_id alpha beta gamma A phi delta theta_max theta_min
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 p1k_00001 -0.0169 -0.00572 0.00283 0.00638 2.68 0.0128 2.68 5.82
## 2 p1k_00002 0.0169 0.00572 -0.00283 0.00638 -0.460 0.0128 5.82 2.68
## 3 p1k_00003 0.168 -0.00762 0.0792 0.0796 1.67 0.159 1.67 4.81
## 4 p1k_00004 0.160 -0.00587 0.0747 0.0749 1.65 0.150 1.65 4.79
## 5 p1k_00005 -0.0992 -0.0201 -0.00811 0.0217 -2.76 0.0434 3.52 0.383
## 6 p1k_00006 -0.106 -0.0196 -0.00540 0.0203 -2.87 0.0407 3.41 0.269
## # ℹ 6 more variables: p_full <dbl>, p_cos <dbl>, p_sin <dbl>, q_full <dbl>,
## # q_cos <dbl>, q_sin <dbl>
The histogram shows the distribution of amplitude values across all
samples. Most amplitudes cluster at low values, and the distribution is
clearly right-skewed, with a long tail toward larger amplitudes. The rug
marks indicate where significant fits (q_full < 0.05)
fall along the x-axis: they are sparse at very low amplitudes and become
increasingly frequent as amplitude increases, concentrating mainly in
the upper part of the distribution.
This scatter plot shows the fitted \(\beta\) (cosine component) against \(\gamma\) (sine component) across samples,
with points colored by q_full. The cloud is densest near
the origin, indicating that most samples have small estimated
coefficients. As expected, the most significant fits concentrate away
from (0,0), while points near the origin are predominantly
non-significant. The distribution is not isotropic: the sample cloud is
elongated along a negative diagonal, suggesting that \(\beta\) and \(\gamma\) may not be independent in practice
and tend to co-vary with opposite signs in many samples.
How are these significant values distributed? In the \((\beta,\gamma)\) plane, significant samples
are primarily enriched at larger distances from the origin
(equivalently, larger \(A\)),
indicating that significance largely tracks effect size. In our data,
the median \(A\) increases from 0.0859
in non-significant samples to 0.214 in significant samples
(q_full < 0.05; Wilcoxon p-value \(= 5.89\times10^{-38}\)). To assess whether
there is also a preferred direction (i.e., a consistent peak location
across samples), we examine the angular distribution (phi)
among significant samples; the resulting histogram shows clear
clustering rather than a uniform spread, suggesting that many
significant samples share similar peak directions.
What happens if the models account only for the cosine or only for the sine term? The reduced models typically retain significance only for samples whose signal is sufficiently aligned with that single basis term. In our dataset, the full model identifies 95 significant samples, whereas the cosine-only and sine-only models identify 24 and 39, respectively. Importantly, 37 samples are significant under the full model but not under either reduced model, consistent with signals that require both sine and cosine terms (i.e., peak directions not aligned with a single basis). Conversely, only 5 samples are significant under the sine-only model but not under the full model, which is consistent with borderline cases where the 1-df reduced model can be slightly more powerful than the 2-df full model.
What does the estimated parameter mean? The pair \((\beta,\gamma)\) jointly encodes magnitude
and direction: the distance to the origin reflects effect size (strength
of the fitted pattern), while the angle (phi) reflects
where along the circle the maximum/minimum occurs. Therefore, points
farther from (0,0) correspond to stronger fitted patterns, while
different quadrants/angles correspond to different peak positions.
Do you think one can derive a quantitative relationship of the
parameter from the model and the growth rate? Under the
replication-driven assumptions, larger effect sizes (e.g., \(A\) or \(2A\)) should tend to be associated with
higher growth rate \(\mu\). } Overall,
these comparisons support using q_full as the primary significance
criterion in the remainder of the analysis. The full model is
phase-invariant (it captures signal regardless of whether it aligns with
the cosine or sine basis) and therefore detects substantially more
significant samples than either reduced model (95 vs 24/39). The
presence of many ‘full-only’ cases (37 samples significant only under
q_full) indicates that restricting to a single basis term would miss a
large fraction of signals whose peak direction is not aligned with that
term. While a small number of borderline cases can be recovered by a
1-df reduced model (5 samples significant only under q_sin), the overall
gain in sensitivity and robustness makes q_full the most appropriate and
conservative choice for downstream analyses.
## # A tibble: 2 × 2
## group median_A
## <chr> <dbl>
## 1 non-significant (q_full >= 0.05) 0.0859
## 2 significant (q_full < 0.05) 0.214
## # A tibble: 1 × 2
## test p_value
## <chr> <dbl>
## 1 Wilcoxon rank-sum test: A ~ (q_full < 0.05) 5.89e-38
## # A tibble: 5 × 2
## category n
## <chr> <int>
## 1 Significant (full model): q_full < 0.05 95
## 2 Significant (cos-only): q_cos < 0.05 24
## 3 Significant (sin-only): q_sin < 0.05 39
## 4 Full-only: q_full < 0.05 AND q_cos >= 0.05 AND q_sin >= 0.05 37
## 5 Sin-only not full: q_sin < 0.05 AND q_full >= 0.05 5
## Cos-only significant (q_cos<0.05)
## Full significant (q_full<0.05) FALSE TRUE
## FALSE 940 0
## TRUE 71 24
## Sin-only significant (q_sin<0.05)
## Full significant (q_full<0.05) FALSE TRUE
## FALSE 935 5
## TRUE 61 34
i <- which(names(meta_df) == "")[1]
names(meta_df)[i] <- "sample_id"
merged_results <- dplyr::left_join(
results,
dplyr::distinct(meta_df, sample_id, .keep_all = TRUE),
by = "sample_id"
)
To anchor the circular expression model to a biologically meaningful reference point, we determined the genomic position of the replication origin (oriC). In E. coli, the dnaA gene is located immediately adjacent to oriC and was therefore used as a proxy for the origin position. The dnaA locus was identified from the genome annotation, and its midpoint was computed as:
\[ori_{pos} = \frac{\text{start}_{\text{dnaA}} + \text{end}_{\text{dnaA}}}{2}.\]
This genomic coordinate was then mapped onto the circular genome using:
\[ \theta_{\text{ori}} = \frac{2\pi \cdot (ori_{pos} \equiv L)}{L}\]
where \(L\) is the chromosome length. The resulting angular coordinate defines the position of oriC within the circular regression framework and enables computation of the origin–terminus expression contrast.
ori_row <- genes_main[grepl("(^|;)gene=dnaA(;|$)|(^|;)Name=dnaA(;|$)", attr)]
if (nrow(ori_row) == 0) {
print(head(genes_main[grepl("dnaA", attr)]$attr, 10))
stop("dnaA not found with gene=/Name=. Inspect attr above and adjust grepl().")
}
ori_pos <- (ori_row$start[1] + ori_row$end[1]) / 2
theta_ori <- (2*pi*(ori_pos %% L))/L
cat("theta_ori (radians):", theta_ori, "\n")
## theta_ori (radians): 5.256298
At this stage we have already identified, for each sample, the peak location of the fitted replication-related pattern (\(\theta_{max}\)) and we restrict the analysis to samples with a significant full model (\(q_{full} < 0.05\)). The key question is whether these peak angles are randomly distributed along the circular chromosome or show a consistent preferred direction relative to the replication origin. The Rayleigh test strongly rejects the hypothesis of a uniform angular distribution, indicating that the peaks are not random but instead concentrate around a specific direction. However, the directed V-tests show that this concentration is not toward our oriC reference angle \(\theta_{ori}\), but rather toward the opposite direction \(\theta_{ori}+\pi\). In other words, the fitted “peak direction” is systematically shifted by approximately half a chromosome relative to the origin anchor. This empirical result matters for the downstream quantification of the origin–terminus contrast: it implies that a naive definition based on the fitted maximum/minimum locations (e.g., using \(\theta_{max}\) and \(\theta_{min}\)) would not reliably represent an ori-centered replication gradient in this dataset. Therefore, before proceeding to any technical derivation, these directionality tests motivate an explicit ori-anchored contrast that compares expression at \(\theta_{ori}\) versus \(\theta_{ori}+\pi\).
if (!requireNamespace("circular", quietly = TRUE)) install.packages("circular")
if (!requireNamespace("CircStats", quietly = TRUE)) install.packages("CircStats")
library(circular)
##
## Adjuntando el paquete: 'circular'
## The following objects are masked from 'package:stats':
##
## sd, var
library(CircStats)
## Cargando paquete requerido: MASS
##
## Adjuntando el paquete: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## Cargando paquete requerido: boot
##
## Adjuntando el paquete: 'CircStats'
## The following objects are masked from 'package:circular':
##
## A1, A1inv, deg, I.0, I.1, I.p, pp.plot, rad, rose.diag, rstable
sig_q <- !is.na(results$q_full) & results$q_full < 0.05 # We use only BH-significant samples from the full model
th <- (results$theta_max[sig_q]) %% (2*pi) # We need to transform the indormation in a circular form.
mu_ori <- (theta_ori) %% (2*pi)
mu_opp <- (mu_ori+pi) %% (2*pi)
ray <- circular::rayleigh.test(circular(th, units = "radians", modulo = "2pi")) #We perform the tests
vt_ori <- CircStats::v0.test(th, mu0 = mu_ori, degree = FALSE)
vt_opp <- CircStats::v0.test(th, mu0 = mu_opp, degree = FALSE)
mu_hat <- atan2(mean(sin(th)), mean(cos(th))) %% (2*pi)
Rbar <- sqrt(mean(cos(th))^2 + mean(sin(th))^2)
V_ori <- sum(cos(th - mu_ori)); rbar_ori <- V_ori / length(th)
V_opp <- sum(cos(th - mu_opp)); rbar_opp <- V_opp / length(th)
tibble(
n_sig = length(th),
theta_ori = mu_ori,
theta_opp = mu_opp,
mu_hat = mu_hat,
Rbar = Rbar,
rayleigh_p = unname(ray$p.value),
V_ori = V_ori,
rbar_ori = rbar_ori,
p_ori = unname(vt_ori$p.value),
V_opp = V_opp,
rbar_opp = rbar_opp,
p_opp = unname(vt_opp$p.value)
)
## # A tibble: 1 × 12
## n_sig theta_ori theta_opp mu_hat Rbar rayleigh_p V_ori rbar_ori p_ori V_opp
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 95 5.26 2.11 2.82 0.619 1.53e-16 -44.8 -0.471 1.000 44.8
## # ℹ 2 more variables: rbar_opp <dbl>, p_opp <dbl>
To obtain a biologically anchored estimate, we redefined the contrast directly on the fitted model as the difference between the predicted expression at the origin and at the antipodal position: \[\Delta_{ori}=y(\theta_{ori})-y(\theta_{ori}+\pi)= 2(\beta\cos{\theta_{ori}}+\gamma\sin{\theta_{ori}})\] which preserves the sign (ori higher vs opposite higher) and enforces a consistent ori-based reference across samples.
results <- results %>%
mutate(
delta_ori = 2 * (beta*cos(theta_ori) + gamma*sin(theta_ori)),
delta_opp = -delta_ori
)
merged_results <- merged_results %>%
dplyr::left_join(
dplyr::select(results, sample_id, delta_ori, delta_opp),
by = "sample_id"
)
sig_q <- !is.na(results$q_full) & results$q_full < 0.05
delta_summary <- results %>%
filter(sig_q) %>%
summarise(
n_sig = n(),
median_delta_naive = median(delta, na.rm=TRUE), # delta = 2A (max–min)
median_delta_ori = median(delta_ori, na.rm=TRUE), # ori-anchored contrast
cor_naive_vs_ori = cor(delta, delta_ori, use="complete.obs"),
prop_delta_ori_pos = mean(delta_ori > 0, na.rm=TRUE),
prop_delta_ori_neg = mean(delta_ori < 0, na.rm=TRUE)
)
cat("\n### Delta contrast summary (q_full < 0.05)\n")
##
## ### Delta contrast summary (q_full < 0.05)
delta_summary
## # A tibble: 1 × 6
## n_sig median_delta_naive median_delta_ori cor_naive_vs_ori prop_delta_ori_pos
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 95 0.428 -0.314 -0.166 0.221
## # ℹ 1 more variable: prop_delta_ori_neg <dbl>
cat("\n### Example rows: naive delta (2A) vs corrected delta_ori\n")
##
## ### Example rows: naive delta (2A) vs corrected delta_ori
results %>%
dplyr::filter(sig_q) %>%
dplyr::select(sample_id, q_full, A, delta, delta_ori) %>%
dplyr::arrange(q_full) %>%
head(10)
## # A tibble: 10 × 5
## sample_id q_full A delta delta_ori
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 p1k_00494 0.00158 0.222 0.445 -0.437
## 2 p1k_00493 0.00243 0.214 0.428 -0.421
## 3 p1k_00080 0.00324 0.203 0.406 -0.395
## 4 p1k_00475 0.00324 0.192 0.384 -0.380
## 5 p1k_00476 0.00324 0.205 0.409 -0.405
## 6 p1k_00052 0.00841 0.0866 0.173 0.173
## 7 p1k_00139 0.00841 0.193 0.385 0.384
## 8 p1k_00877 0.00841 0.354 0.707 -0.658
## 9 p1k_00288 0.00933 0.151 0.303 -0.254
## 10 p1k_00291 0.00933 0.161 0.321 -0.282
To evaluate whether the replication-associated expression gradient scales with bacterial growth, we integrated the growth-rate metadata provided for each sample. Growth rate was taken directly from the metadata column “Growth Rate (1/hr)” and converted to numeric. As indicated in the dataset documentation, samples with growth rate equal to 1 correspond to missing measurements and were excluded from downstream analyses. We further restricted the dataset to samples with finite estimates of the fitted effect size (A), the naive amplitude contrast (\(\Delta=2A\)), and the ori-anchored contrast (\(\Delta_{ori}\)). Finally, we defined the subset of statistically supported patterns using the BH-adjusted significance threshold on the full model (\(q_{full}<0.05\)), which is used in the subsequent association analyses with growth rate.
merged_results2 <- merged_results %>%
dplyr::mutate(
growth_rate = as.numeric(`Growth Rate (1/hr)`)
)
## Keep only samples with measured growth rate (exclude sentinel = 1) and finite contrasts
known <- merged_results2 %>%
dplyr::filter(
!is.na(growth_rate),
growth_rate != 1,
is.finite(delta_ori),
is.finite(delta),
is.finite(A)
)
## Significant fits (full model BH)
mask_q <- !is.na(known$q_full) & known$q_full < 0.05
## Quick look
known %>%
dplyr::select(sample_id, growth_rate, q_full, A, delta, delta_ori) %>%
dplyr::arrange(growth_rate) %>%
head(10)
## # A tibble: 10 × 6
## sample_id growth_rate q_full A delta delta_ori
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 p1k_00003 0 0.415 0.0796 0.159 -0.143
## 2 p1k_00004 0 0.463 0.0749 0.150 -0.134
## 3 p1k_00007 0 0.190 0.124 0.248 -0.240
## 4 p1k_00008 0 0.198 0.116 0.233 -0.224
## 5 p1k_00019 0 0.462 0.0752 0.150 -0.0730
## 6 p1k_00020 0 0.489 0.0707 0.141 -0.0652
## 7 p1k_00021 0 0.475 0.0599 0.120 0.000967
## 8 p1k_00022 0 0.553 0.0536 0.107 0.00127
## 9 p1k_00023 0 0.408 0.0760 0.152 -0.134
## 10 p1k_00024 0 0.385 0.0789 0.158 -0.130
Using the samples with measured growth rate (known), we tested whether the replication-associated signal scales with growth by correlating growth rate with (i) the naive amplitude contrast \(\Delta=2A\) and (ii) the ori-anchored contrast \(\Delta_{ori}\) defined above. Correlations were computed on all samples with growth-rate measurements and on the subset of samples with a significant circular signal (\(q<0.05\)).
In the left plot, (\(\delta=2A\)) mainly tells you how strong the wave is in each sample. Red points (significant fits) tend to have larger delta, which just means the wave is clearer. But growth rate does not increase in a clear way as delta increases. In the right plot, delta_ori tells you which side is higher. The positive positive side are the ones higher at the origin side, and the negative are higher on the opposite side.
Most non-significant samples sit near 0 (no clear difference). Significant samples spread away from 0, and many are negative, matching the earlier result that the peak is often opposite to the origin proxy. Therefore we keep using delta_ori, because it measures the specific “origin vs opposite” difference, not just the strength of the wave.
## # A tibble: 4 × 4
## set metric pearson spearman
## <chr> <chr> <dbl> <dbl>
## 1 ALL delta (=2A) -0.0315 -0.0654
## 2 ALL delta_ori 0.383 0.360
## 3 q_full<0.05 delta (=2A) 0.110 -0.00949
## 4 q_full<0.05 delta_ori 0.816 0.768
To quantify how informative the replication-related metrics are about bacterial growth, we framed the problem as a prediction task: given a sample’s replication signal estimated from the circular model, can we predict its measured growth rate \(\mu\) (1/hr)? We evaluated simple linear regression models because the project’s theoretical expectation suggests an approximately monotonic relationship between copy-number gradients and growth, and linear models are interpretable and allow a clean comparison between alternative replication metrics.
We compared four model specifications:
Because \(\Delta = 2A\), models \(m_A\) and \(m_\Delta\) are expected to behave identically (up to rescaling), and are effectively two views of the same phase-free effect-size information.
Then, to estimate out-of-sample performance, we used repeated 5-fold cross-validation (50 repetitions). In each repetition, samples were randomly split into 5 folds; each fold was predicted by a model trained on the other 4 folds, and predictions were pooled across folds to compute performance metrics. We report:
We evaluated performance on two datasets: First , all samples with valid growth-rate measurements (excluding the sentinel value \(\mu=1\) used for missing growth rates), and the second one sig_q: the subset with a statistically supported circular signal (\(q_{\mathrm{full}}<0.05\)). In the full dataset, the ori-anchored contrast \(\Delta_{\mathrm{ori}}\) provides measurable predictive value (positive \(R^2\)), whereas phase-free effect-size predictors (\(A\) or \(\Delta\)) perform worse than predicting the mean growth rate (negative \(R^2\)). This indicates that, once phase is ignored, “how strong the wave is” does not reliably map to growth across heterogeneous conditions; anchoring the contrast to ori carries substantially more growth-relevant information.
Restricting to samples with a clear circular replication signal strengthens the relationship: \(\Delta_{\mathrm{ori}}\) explains a substantial fraction of out-of-sample variance (\(R^2 \approx 0.59\)). Adding \(A\) slightly worsens performance, suggesting that once the contrast is correctly oriented (ori vs opposite), additional “overall amplitude” does not add independent predictive information and may introduce noise given the small sample size.
These results are consistent with the earlier directionality analysis: in this dataset the fitted peak location is not reliably aligned with \(\theta_{\mathrm{ori}}\), so phase-free metrics (\(A\) or \(\Delta\)) are not appropriate proxies for an origin–terminus gradient. In contrast, \(\Delta_{\mathrm{ori}}\) directly measures the fitted expression difference between ori and the opposite genomic position, preserving direction and remaining interpretable even when the peak is shifted. Therefore, in the remainder of the analysis we treat \(\Delta_{\mathrm{ori}}\) as the primary replication signal when relating the fitted sinusoid to growth rate.
# datasets
mask_q <- !is.na(known$q_full) & known$q_full < 0.05
datasets <- list(
ALL = known,
sig_q = known[mask_q, , drop = FALSE]
)
# models (keep it small and interpretable)
models <- list(
delta_ori = growth_rate ~ delta_ori,
delta = growth_rate ~ delta,
A = growth_rate ~ A,
delta_ori_A = growth_rate ~ delta_ori + A
)
# repeated 5-fold CV
cv_rep <- function(df, form, k = 5, R = 50, seed = 1) {
set.seed(seed)
n <- nrow(df)
out <- vector("list", R)
for (r in seq_len(R)) {
fold <- sample(rep(seq_len(k), length.out = n))
y <- df$growth_rate
pred <- rep(NA_real_, n)
for (f in seq_len(k)) {
fit <- lm(form, data = df[fold != f, , drop = FALSE])
pred[fold == f] <- predict(fit, newdata = df[fold == f, , drop = FALSE])
}
res <- y - pred
rmse <- sqrt(mean(res^2, na.rm = TRUE))
mae <- mean(abs(res), na.rm = TRUE)
r2 <- 1 - sum(res^2, na.rm = TRUE) / sum((y - mean(y, na.rm = TRUE))^2, na.rm = TRUE)
out[[r]] <- tibble(RMSE = rmse, MAE = mae, R2 = r2)
}
dplyr::bind_rows(out)
}
# run CV
cv_all <- list()
for (ds in names(datasets)) {
for (m in names(models)) {
cv_all[[paste(ds, m, sep = "__")]] <-
cv_rep(datasets[[ds]], models[[m]]) %>%
dplyr::mutate(dataset = ds, model = m, n = nrow(datasets[[ds]]))
}
}
cv_df <- dplyr::bind_rows(cv_all)
# summary table (main deliverable)
cv_summary <- cv_df %>%
dplyr::group_by(dataset, model, n) %>%
dplyr::summarise(
RMSE = mean(RMSE, na.rm = TRUE),
MAE = mean(MAE, na.rm = TRUE),
R2 = mean(R2, na.rm = TRUE),
.groups = "drop"
) %>%
dplyr::arrange(dataset, RMSE)
cv_summary
## # A tibble: 8 × 6
## dataset model n RMSE MAE R2
## <chr> <chr> <int> <dbl> <dbl> <dbl>
## 1 ALL delta_ori 353 0.244 0.179 0.134
## 2 ALL delta_ori_A 353 0.245 0.179 0.128
## 3 ALL A 353 0.265 0.193 -0.0152
## 4 ALL delta 353 0.265 0.193 -0.0152
## 5 sig_q delta_ori 27 0.214 0.145 0.594
## 6 sig_q delta_ori_A 27 0.224 0.159 0.552
## 7 sig_q A 27 0.373 0.300 -0.241
## 8 sig_q delta 27 0.373 0.300 -0.241
# RMSE distributions (compact visual comparison)
library(ggplot2)
library(dplyr)
cv_df$model <- factor(cv_df$model,
levels = c("A","delta","delta_ori","delta_ori_A"))
p <- cv_df %>%
filter(dataset %in% c("ALL","sig_q")) %>%
mutate(dataset = ifelse(dataset=="ALL",
"ALL samples",
"Significant (q < 0.05)")) %>%
ggplot(aes(x=model, y=RMSE, fill=model)) +
geom_boxplot(alpha=0.8, outlier.alpha=0.3) +
facet_wrap(~dataset, scales="free_y") +
scale_fill_brewer(palette="Blues") +
theme_minimal(base_size = 13) +
theme(
legend.position="none",
axis.text.x = element_text(angle=45, hjust=1),
panel.grid.minor = element_blank()
) +
labs(
title = "Model comparison via repeated cross-validation",
y = "RMSE",
x = "Model"
)
print(p)
Based on the cross-validation results, we retained the simplest and best-performing specification and fitted the final model on the full dataset \(growth_rate \sim \Delta_{ori}\).
Across all samples with measured growth rate (\(n=353\)), \(\Delta_{ori}\) shows a clear positive association with growth (slope \(=0.660\), \(p=9.35\times 10^{-14}\)), explaining a modest but non-negligible fraction of variance (\(R^2=0.146\)). This is expected in a heterogeneous compendium where many factors beyond replication-related gene dosage contribute to growth.
We then fitted the same model on a higher-confidence subset restricted to samples with a statistically significant circular signal (\(q_{full}<0.05\); \(n=27\)). In this subset the relationship strengthens substantially (slope \(=0.881\), \(p=2.19\times 10^{-7}\)), and the model explains roughly two thirds of the observed growth-rate variance (\(R^2=0.665\)). Overall, these final fits confirm that the ori-anchored replication contrast \(\Delta_{ori}\) is an interpretable predictor of growth, and that its explanatory power is highest when the replication-associated sinusoidal signal is clearly detectable in the transcriptome.
# Final model (ALL samples with growth rate)
fit_all <- lm(growth_rate ~ delta_ori, data = known)
# High-confidence subset (q_full < 0.05)
mask_q <- !is.na(known$q_full) & known$q_full < 0.05
fit_sigq <- lm(growth_rate ~ delta_ori, data = known[mask_q, , drop = FALSE])
summary(fit_all)
##
## Call:
## lm(formula = growth_rate ~ delta_ori, data = known)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.63878 -0.10467 0.00721 0.11575 0.78458
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.63794 0.01369 46.58 < 2e-16 ***
## delta_ori 0.66021 0.08508 7.76 9.35e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2434 on 351 degrees of freedom
## Multiple R-squared: 0.1464, Adjusted R-squared: 0.144
## F-statistic: 60.22 on 1 and 351 DF, p-value: 9.345e-14
summary(fit_sigq)
##
## Call:
## lm(formula = growth_rate ~ delta_ori, data = known[mask_q, ,
## drop = FALSE])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.30299 -0.09349 -0.01470 0.02464 0.49837
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.74656 0.04079 18.301 5.43e-16 ***
## delta_ori 0.88111 0.12503 7.047 2.19e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2018 on 25 degrees of freedom
## Multiple R-squared: 0.6652, Adjusted R-squared: 0.6518
## F-statistic: 49.66 on 1 and 25 DF, p-value: 2.187e-07
As an extra check on the final model (\(growth\_rate \sim \Delta_{ori}\)), we generated out-of-fold (OOF) predictions with a single 5-fold split (one prediction per sample, always from a model trained without that sample). This complements the repeated CV summary by showing how the errors look, not only how big they are.
The cloud follows the expected upward trend, so \(\Delta_{ori}\) carries real signal. However, predictions are pulled toward the average: very low growth tends to be overestimated and very high growth tends to be underestimated. This “regression to the mean” is typical for a simple one-feature linear model.
Residuals are centered around zero with no strong curved pattern, which supports a roughly linear relationship at this resolution. The spread is not perfectly constant and a few points sit far from zero, suggesting remaining variability driven by factors not included in the model (condition-to-condition differences, noise, or other biology beyond replication signal).
Residuals are broadly symmetric but not perfectly tight, consistent with a model that captures a real effect while leaving substantial unexplained variation in the full dataset.
Overall, the diagnostics match what we saw in cross-validation: \(\Delta_{ori}\) is a meaningful predictor, but a single linear term cannot fully explain growth across all experimental conditions.
set.seed(123)
df <- known
k <- 5
# Assign folds once, produce one OOF prediction per sample
fold_id <- sample(rep(1:k, length.out = nrow(df)))
pred_oof <- rep(NA_real_, nrow(df))
for (f in 1:k) {
fit <- lm(growth_rate ~ delta_ori, data = df[fold_id != f, , drop=FALSE])
pred_oof[fold_id == f] <- predict(fit, newdata = df[fold_id == f, , drop=FALSE])
}
resid_oof <- df$growth_rate - pred_oof
rmse_oof <- sqrt(mean(resid_oof^2, na.rm=TRUE))
mae_oof <- mean(abs(resid_oof), na.rm=TRUE)
r2_oof <- 1 - sum(resid_oof^2, na.rm=TRUE) /
sum((df$growth_rate - mean(df$growth_rate, na.rm=TRUE))^2, na.rm=TRUE)
tibble(
model = "OOF 5-fold | growth_rate ~ delta_ori",
RMSE = rmse_oof,
MAE = mae_oof,
R2 = r2_oof
)
## # A tibble: 1 × 4
## model RMSE MAE R2
## <chr> <dbl> <dbl> <dbl>
## 1 OOF 5-fold | growth_rate ~ delta_ori 0.244 0.179 0.138
# Plots
cols <- col_q_fun(pmin(pmax(df$q_full, 0), 1))
par(mfrow=c(1,3), mar=c(4,4,2,1))
plot(df$growth_rate, pred_oof,
pch=16, col=cols,
xlab="Observed growth rate (1/hr)",
ylab="OOF predicted growth rate (1/hr)",
main="OOF predicted vs observed")
abline(0, 1, lty=2, lwd=2)
plot(pred_oof, resid_oof, pch=16, col="grey50",
xlab="OOF fitted", ylab="Residual",
main="Residuals vs fitted (OOF)")
abline(h=0, lty=2)
hist(resid_oof, breaks=40, col="grey90", border="grey60",
main="Residuals (OOF)", xlab="Residual")
par(mfrow=c(1,1))
To evaluate whether the replication signal can separate fast vs slow growth, we turned growth rate into a binary label: fast is the top 25% of observed growth rates, slow the remaining 75%.
We then used the OOF predicted growth rates as a continuous score and computed an ROC curve across all thresholds. The AUC summarizes discrimination: it is the probability that a randomly chosen fast sample gets a higher score than a randomly chosen slow sample (\(AUC=0.5\) is random, \(AUC \to 1\) is strong separation).
The ROC curve lies consistently above the diagonal, indicating genuine out-of-sample discriminative signal. However, the separation is only moderate (AUC ≈ 0.69): many fast and slow samples overlap in predicted scores, so achieving high sensitivity requires accepting a non-trivial false-positive rate. Overall, \(\Delta_{ori}\) captures enough replication-related information to rank growth phenotypes better than chance, but it is not a strong standalone classifier in this dataset.
## Type 'citation("pROC")' for a citation.
##
## Adjuntando el paquete: 'pROC'
## The following object is masked from 'package:circular':
##
## var
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
##
## AUC (fast = top 25% by observed growth): 0.6884576
Samples with missing growth-rate measurements were identified using the sentinel value \(growth_rate = 1\). For these samples, we predicted growth rate from the ori-anchored replication contrast \(\Delta_{ori}\) using the final regression model fitted on all labeled samples (\(growth_rate \sim \Delta_{ori}\)). We also computed 95% prediction intervals to reflect sample-level uncertainty. For unlabeled samples that also show a statistically supported circular signal (\(q_{full}<0.05\)), we additionally generated predictions from the model fitted only on the significant subset (same specification, trained on high-confidence samples). The output table reports the final prediction, its interval, and which model generated it.
## # A tibble: 20 × 8
## sample_id q_full delta_ori A pred_final pred_final_lo pred_final_hi
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 p1k_00486 0.0222 0.542 0.297 1.22 0.770 1.68
## 2 p1k_00485 0.0224 0.542 0.296 1.22 0.770 1.68
## 3 p1k_00462 0.0330 0.504 0.319 1.19 0.739 1.64
## 4 p1k_00461 0.0336 0.500 0.317 1.19 0.737 1.64
## 5 p1k_00471 0.0312 0.493 0.308 1.18 0.731 1.63
## 6 p1k_00826 0.157 0.812 0.454 1.17 0.673 1.67
## 7 p1k_00472 0.0378 0.468 0.300 1.16 0.711 1.61
## 8 p1k_00840 0.190 0.753 0.416 1.13 0.637 1.63
## 9 p1k_00841 0.181 0.750 0.418 1.13 0.636 1.63
## 10 p1k_00830 0.299 0.680 0.407 1.09 0.592 1.58
## 11 p1k_00831 0.300 0.670 0.399 1.08 0.586 1.57
## 12 p1k_00839 0.341 0.644 0.366 1.06 0.570 1.56
## 13 p1k_00845 0.361 0.613 0.369 1.04 0.551 1.53
## 14 p1k_00835 0.336 0.605 0.359 1.04 0.545 1.53
## 15 p1k_00834 0.336 0.596 0.356 1.03 0.540 1.52
## 16 p1k_00865 0.0202 0.319 0.306 1.03 0.591 1.46
## 17 p1k_00838 0.377 0.584 0.340 1.02 0.533 1.51
## 18 p1k_00851 0.310 0.582 0.367 1.02 0.531 1.51
## 19 p1k_00776 0.0360 0.308 0.161 1.02 0.582 1.45
## 20 p1k_00850 0.317 0.573 0.363 1.02 0.526 1.51
## # ℹ 1 more variable: pred_source <chr>
## # A tibble: 1 × 6
## n_unknown n_sig_q n_all pred_min pred_med pred_max
## <int> <int> <int> <dbl> <dbl> <dbl>
## 1 682 68 614 0.161 0.577 1.22
To make the circular model easier to interpret, we plotted three representative samples: (i) a fast-growing sample with a significant circular replication signal, (ii) a slow / weak-signal sample, and (iii) one unlabeled sample (missing growth rate) when available.
For each sample we show three panels:
Expression vs genomic angle (\(\theta\)): binned mean logTPM across the
genome with the fitted sinusoid, plus reference lines at \(\theta_{ori}\) and \(\theta_{ori}+\pi\) (opposite side), and the
fitted \(\theta_{max}\) / \(\theta_{min}\).
Polar view: the same information mapped to a circle to emphasize genome
continuity.
Compact summary: signal strength and orientation (\(A\), \(\Delta=2A\), \(\Delta_{ori}\)), model significance (\(q_{full}\)), observed growth rate when
available, and the predicted growth rate with a 95% prediction interval.
Predictions follow the same rule used before: if \(q_{full}<0.05\) we use the \(sig\_q\) model; otherwise we use the ALL
model.
These plots provide an intuitive link between the fitted circular expression gradient and how \(\Delta_{ori}\) is used to predict growth.