Simulate experiments in which abundances of rhythmic and non-rhythmic features are measured at multiple timepoints in one or more conditions.
Usage
simphony(
featureGroupsList,
fracFeatures = NULL,
nFeatures = 10,
timepointsType = c("auto", "specified", "random"),
timeRange = c(0, 48),
interval = 2,
nReps = 1,
timepoints = NULL,
nSamplesPerCond = NULL,
rhyFunc = sin,
dispFunc = NULL,
logOdds = FALSE,
family = c("gaussian", "negbinom", "bernoulli", "poisson")
)Arguments
- featureGroupsList
data.frameordata.table(for a single condition) or list ofdata.frames ordata.tables (for multiple conditions), where each row corresponds to a group of features to simulate. The following columns are all optional:- fracFeatures
Fraction of simulated features to allocate to each group. Defaults to 1/(number of groups).
- rhyFunc
Function to generate rhythmic abundance. Must have a period of \(2\pi\). Defaults to
sin.- amp
Amplitude of rhythm. Defaults to 0. Corresponds to multiplicative term in front of
rhyFunc. Can be numeric (constant over time) or a function (time-dependent). See vignette for examples.- period
Period of rhythm. Defaults to 24.
- phase
Phase of rhythm, in the same units as
period. Defaults to 0. Corresponds to an additive term inrhyFunc.- base
Baseline abundance, i.e., abundance when
rhyFuncterm is 0. Depending onfamily, defaults to 0 ('gaussian'), 8 ('negbinom', mean log2 counts), 0 ('bernoulli' withlogOddsasTRUE), 0.5 ('bernoulli' iflogOddsasFALSE), or 1 ('poisson'). Can be numeric (constant over time) or a function (time-dependent). See vignette for examples.- sd
Standard deviation of sampled abundance values. Defaults to 1. Only used if
familyis 'gaussian'.- dispFunc
Function to calculate dispersion of sampled abundance values, given expected abundance in counts. Only used if
familyis 'negbinom'.
- fracFeatures
Fraction of simulated features to allocate to each group. Defaults to 1/(number of groups). Only used if the first
featureGroupsListdata.framelacks afracFeaturescolumn.- nFeatures
Integer for the total number of features to simulate.
- timepointsType
Character string for how to set the timepoints for the simulation. Must be 'auto' (default), 'specified', or 'random'.
- timeRange
Numeric vector for the range of timepoints to use for the simulation. Defaults to c(0, 48). Only used if
timepointsTypeis 'auto' or 'random'.- interval
Number for the amount of time between consecutive timepoints, in the same units as
period. The first timepoint is 0. Only used iftimepointsTypeis 'auto'.- nReps
Integer for the number of replicates per timepoint. Only used if
timepointsTypeis 'auto'.- timepoints
Numeric vector of exact timepoints to simulate, including any replicates. Only used if
timepointsTypeis 'specified'.- nSamplesPerCond
Integer for the number of samples per condition, which will be randomly uniformly spaced between 0 and
periodand different for each condition. Only used if timepointsType is 'random'.- rhyFunc
Function to generate rhythmic abundance. Must have a period of \(2\pi\). Defaults to
sin. Only used if adata.frameinfeatureGroupsListlacks arhyFunccolumn.- dispFunc
Function to calculate dispersion of sampled abundance values, given expected abundance in counts. Defaults to
defaultDispFunc. Only used iffamilyis 'negbinom' and adata.frameinfeatureGroupsListlacks adispFunccolumn.- logOdds
Logical for whether the rhythmic function corresponds to log-odds. Only used if
familyis 'bernoulli'.- family
Character string for the family of distributions from which to sample the abundance values.
simphonywill give a warning if it tries to sample from a distribution outside the region in which the distribution is defined: \(\mu < 0\) for negative binomial and Poisson, and \(\mu < 0\) or \(\mu > 1\) for Bernoulli.
Value
List with the following elements:
- abundData
Matrix of abundance values (counts, if
familyis 'negbinom'), with features as rownames and samples as colnames.- sampleMetadata
data.tablewith one row per sample.- featureMetadata
data.tablewith one row per feature per condition. Columnsampandbaseare functions of time. Columnsamp0andbase0are numeric and correspond to the amplitude and baseline abundance at time 0, respectively.- experMetadata
List of arguments that were passed to
simphony.
Examples
library('data.table')
# Simulate data for features having one of three sets of rhythmic parameters.
featureGroups = data.table(amp = c(0, 1, 1), phase = c(0, 0, 6),
rhyFunc = c(cos, cos, sin))
simData = simphony(featureGroups)
# Simulate data for an experiment with specified timepoints and replicates.
featureGroups = data.table(amp = c(0, 1))
simData = simphony(featureGroups, timepointsType = 'specified',
timepoints = c(0, 2, 2, 4, 12, 16, 21))
# Simulate data for an experiment with random timepoints between 0 and 24.
featureGroups = data.table(amp = c(0, 2))
simData = simphony(featureGroups, timepointsType = 'random',
timeRange = c(0, 24), nSamplesPerCond = 20)
# Simulate data with time-dependent rhythm amplitude or baseline abundance
featureGroups = data.table(amp = c(function(x) 1, function(x) 2^(-x / 24)),
base = c(function(x) x / 12, function(x) 0))
simData = simphony(featureGroups)
# Simulate data for features whose rhythmicity varies between two conditions.
featureGroupsList = list(
data.table(amp = c(1, 2, 2), phase = c(0, -3, 0), period = c(24, 24, 22)),
data.table(amp = c(3, 2, 2), phase = c(0, 3, 0), period = c(24, 24, 26)))
simData = simphony(featureGroupsList)
# Simulate data from a negative binomial distribution with a higher variance.
featureGroups = data.table(amp = 1, base = 6:8)
dispFunc = function(x) 3 * defaultDispFunc(x)
simData = simphony(featureGroups, family = 'negbinom', dispFunc = dispFunc)
# Simulate data at high temporal resolution from a Poisson distribution that
# alternates between two states.
featureGroups = data.table(amp = 1, base = 0,
rhyFunc = function(x) ifelse(x %% (2 * pi) < pi, 0.5, 4))
simData = simphony(featureGroups, timeRange = c(0, 24 * 4), interval = 0.1,
nReps = 1, family = 'poisson')
# Simulate data for 100 features, half non-rhythmic and half rhythmic, with
# amplitudes for rhythmic features sampled from a log-normal distribution.
nFeatures = 100
rhyFrac = 0.5
nRhyFeatures = round(rhyFrac * nFeatures)
rhyAmps = exp(rnorm(nRhyFeatures, mean = 0, sd = 0.25))
fracFeatures = c(1 - rhyFrac, rep(rhyFrac / nRhyFeatures, nRhyFeatures))
featureGroups = data.table(amp = c(0, rhyAmps), fracFeatures = fracFeatures)
simData = simphony(featureGroups, nFeatures = nFeatures)
# Simulate data for 100 rhythmic features, with baseline log2 expected counts
# and residual log dispersion sampled from distributions whose parameters
# were estimated, using DESeq2 and fitdistrplus, from circadian RNA-seq data
# from mouse liver (PRJNA297287).
nFeatures = 100
baseLog2Counts = rnorm(nFeatures, mean = 8.63, sd = 2.73)
dispFactors = exp(rnorm(nFeatures, sd = 0.819))
dispFuncs = sapply(dispFactors, function(z) {function(x) defaultDispFunc(x) * z})
featureGroups = data.table(base = baseLog2Counts, dispFunc = dispFuncs, amp = 1)
simData = simphony(featureGroups, nFeatures = nFeatures, family = 'negbinom')