Simulate experiments in which abundances of rhythmic and non-rhythmic features are measured at multiple timepoints in one or more conditions.
Usage
simphony(
featureGroupsList,
fracFeatures = NULL,
nFeatures = 10,
timepointsType = c("auto", "specified", "random"),
timeRange = c(0, 48),
interval = 2,
nReps = 1,
timepoints = NULL,
nSamplesPerCond = NULL,
rhyFunc = sin,
dispFunc = NULL,
logOdds = FALSE,
family = c("gaussian", "negbinom", "bernoulli", "poisson")
)
Arguments
- featureGroupsList
data.frame
ordata.table
(for a single condition) or list ofdata.frame
s ordata.table
s (for multiple conditions), where each row corresponds to a group of features to simulate. The following columns are all optional:- fracFeatures
Fraction of simulated features to allocate to each group. Defaults to 1/(number of groups).
- rhyFunc
Function to generate rhythmic abundance. Must have a period of \(2\pi\). Defaults to
sin
.- amp
Amplitude of rhythm. Defaults to 0. Corresponds to multiplicative term in front of
rhyFunc
. Can be numeric (constant over time) or a function (time-dependent). See vignette for examples.- period
Period of rhythm. Defaults to 24.
- phase
Phase of rhythm, in the same units as
period
. Defaults to 0. Corresponds to an additive term inrhyFunc
.- base
Baseline abundance, i.e., abundance when
rhyFunc
term is 0. Depending onfamily
, defaults to 0 ('gaussian'), 8 ('negbinom', mean log2 counts), 0 ('bernoulli' withlogOdds
asTRUE
), 0.5 ('bernoulli' iflogOdds
asFALSE
), or 1 ('poisson'). Can be numeric (constant over time) or a function (time-dependent). See vignette for examples.- sd
Standard deviation of sampled abundance values. Defaults to 1. Only used if
family
is 'gaussian'.- dispFunc
Function to calculate dispersion of sampled abundance values, given expected abundance in counts. Only used if
family
is 'negbinom'.
- fracFeatures
Fraction of simulated features to allocate to each group. Defaults to 1/(number of groups). Only used if the first
featureGroupsList
data.frame
lacks afracFeatures
column.- nFeatures
Integer for the total number of features to simulate.
- timepointsType
Character string for how to set the timepoints for the simulation. Must be 'auto' (default), 'specified', or 'random'.
- timeRange
Numeric vector for the range of timepoints to use for the simulation. Defaults to c(0, 48). Only used if
timepointsType
is 'auto' or 'random'.- interval
Number for the amount of time between consecutive timepoints, in the same units as
period
. The first timepoint is 0. Only used iftimepointsType
is 'auto'.- nReps
Integer for the number of replicates per timepoint. Only used if
timepointsType
is 'auto'.- timepoints
Numeric vector of exact timepoints to simulate, including any replicates. Only used if
timepointsType
is 'specified'.- nSamplesPerCond
Integer for the number of samples per condition, which will be randomly uniformly spaced between 0 and
period
and different for each condition. Only used if timepointsType is 'random'.- rhyFunc
Function to generate rhythmic abundance. Must have a period of \(2\pi\). Defaults to
sin
. Only used if adata.frame
infeatureGroupsList
lacks arhyFunc
column.- dispFunc
Function to calculate dispersion of sampled abundance values, given expected abundance in counts. Defaults to
defaultDispFunc
. Only used iffamily
is 'negbinom' and adata.frame
infeatureGroupsList
lacks adispFunc
column.- logOdds
Logical for whether the rhythmic function corresponds to log-odds. Only used if
family
is 'bernoulli'.- family
Character string for the family of distributions from which to sample the abundance values.
simphony
will give a warning if it tries to sample from a distribution outside the region in which the distribution is defined: \(\mu < 0\) for negative binomial and Poisson, and \(\mu < 0\) or \(\mu > 1\) for Bernoulli.
Value
List with the following elements:
- abundData
Matrix of abundance values (counts, if
family
is 'negbinom'), with features as rownames and samples as colnames.- sampleMetadata
data.table
with one row per sample.- featureMetadata
data.table
with one row per feature per condition. Columnsamp
andbase
are functions of time. Columnsamp0
andbase0
are numeric and correspond to the amplitude and baseline abundance at time 0, respectively.- experMetadata
List of arguments that were passed to
simphony
.
Examples
library('data.table')
# Simulate data for features having one of three sets of rhythmic parameters.
featureGroups = data.table(amp = c(0, 1, 1), phase = c(0, 0, 6),
rhyFunc = c(cos, cos, sin))
simData = simphony(featureGroups)
# Simulate data for an experiment with specified timepoints and replicates.
featureGroups = data.table(amp = c(0, 1))
simData = simphony(featureGroups, timepointsType = 'specified',
timepoints = c(0, 2, 2, 4, 12, 16, 21))
# Simulate data for an experiment with random timepoints between 0 and 24.
featureGroups = data.table(amp = c(0, 2))
simData = simphony(featureGroups, timepointsType = 'random',
timeRange = c(0, 24), nSamplesPerCond = 20)
# Simulate data with time-dependent rhythm amplitude or baseline abundance
featureGroups = data.table(amp = c(function(x) 1, function(x) 2^(-x / 24)),
base = c(function(x) x / 12, function(x) 0))
simData = simphony(featureGroups)
# Simulate data for features whose rhythmicity varies between two conditions.
featureGroupsList = list(
data.table(amp = c(1, 2, 2), phase = c(0, -3, 0), period = c(24, 24, 22)),
data.table(amp = c(3, 2, 2), phase = c(0, 3, 0), period = c(24, 24, 26)))
simData = simphony(featureGroupsList)
# Simulate data from a negative binomial distribution with a higher variance.
featureGroups = data.table(amp = 1, base = 6:8)
dispFunc = function(x) 3 * defaultDispFunc(x)
simData = simphony(featureGroups, family = 'negbinom', dispFunc = dispFunc)
# Simulate data at high temporal resolution from a Poisson distribution that
# alternates between two states.
featureGroups = data.table(amp = 1, base = 0,
rhyFunc = function(x) ifelse(x %% (2 * pi) < pi, 0.5, 4))
simData = simphony(featureGroups, timeRange = c(0, 24 * 4), interval = 0.1,
nReps = 1, family = 'poisson')
# Simulate data for 100 features, half non-rhythmic and half rhythmic, with
# amplitudes for rhythmic features sampled from a log-normal distribution.
nFeatures = 100
rhyFrac = 0.5
nRhyFeatures = round(rhyFrac * nFeatures)
rhyAmps = exp(rnorm(nRhyFeatures, mean = 0, sd = 0.25))
fracFeatures = c(1 - rhyFrac, rep(rhyFrac / nRhyFeatures, nRhyFeatures))
featureGroups = data.table(amp = c(0, rhyAmps), fracFeatures = fracFeatures)
simData = simphony(featureGroups, nFeatures = nFeatures)
# Simulate data for 100 rhythmic features, with baseline log2 expected counts
# and residual log dispersion sampled from distributions whose parameters
# were estimated, using DESeq2 and fitdistrplus, from circadian RNA-seq data
# from mouse liver (PRJNA297287).
nFeatures = 100
baseLog2Counts = rnorm(nFeatures, mean = 8.63, sd = 2.73)
dispFactors = exp(rnorm(nFeatures, sd = 0.819))
dispFuncs = sapply(dispFactors, function(z) {function(x) defaultDispFunc(x) * z})
featureGroups = data.table(base = baseLog2Counts, dispFunc = dispFuncs, amp = 1)
simData = simphony(featureGroups, nFeatures = nFeatures, family = 'negbinom')