Title: | Parallel Factor Analysis Modelling of Longitudinal Microbiome Data |
---|---|
Description: | Creation and selection of PARAllel FACtor Analysis (PARAFAC) models of longitudinal microbiome data. You can import your own data with our import functions or use one of the example datasets to create your own PARAFAC models. Selection of the optimal number of components can be done using assessModelQuality() and assessModelStability(). The selected model can then be plotted using plotPARAFACmodel(). The Parallel Factor Analysis method was originally described by Caroll and Chang (1970) <doi:10.1007/BF02310791> and Harshman (1970) <https://www.psychology.uwo.ca/faculty/harshman/wpppfac0.pdf>. |
Authors: | Geert Roelof van der Ploeg [aut, cre]
|
Maintainer: | Geert Roelof van der Ploeg <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.3 |
Built: | 2025-02-24 15:03:50 UTC |
Source: | https://github.com/grvanderploeg/parafac4microbiome |
Create randomly initialized models to determine the correct number of components by assessing model quality metrics.
assessModelQuality( X, minNumComponents = 1, maxNumComponents = 5, numRepetitions = 100, method = "als", ctol = 1e-06, maxit = 2500, max_fn = 10000, rel_tol = 1e-08, abs_tol = 1e-08, grad_tol = 1e-08, numCores = 1 )
assessModelQuality( X, minNumComponents = 1, maxNumComponents = 5, numRepetitions = 100, method = "als", ctol = 1e-06, maxit = 2500, max_fn = 10000, rel_tol = 1e-08, abs_tol = 1e-08, grad_tol = 1e-08, numCores = 1 )
X |
Input data |
minNumComponents |
Minimum number of components (default 1). |
maxNumComponents |
Maximum number of components (default 5). |
numRepetitions |
Number of randomly initialized models to create (default 100). |
method |
Use ALS algorithm ("als", default) or use all-at-once optimization ("opt"). The all-at-once optimization is based on a nonlinear conjugate gradient method with Hestenes-Stiefel updates and the More-Thuente line search algorithm. |
ctol |
Change in SSQ needed for model to be converged (default 1e-6). |
maxit |
Maximum number of iterations (default 2500). |
max_fn |
Maximum number of function evaluations allowed without convergence in the OPT case (default 10000). |
rel_tol |
Relative change in loss tolerated to call the algorithm converged in the OPT case (default 1e-8). |
abs_tol |
Absolute loss tolerated to call the algorithm converged in the OPT case (default 1e-8). |
grad_tol |
Tolerance on the two-norm of the gradient divided over the number of elements in the gradient in the OPT case (default 1e-8). |
numCores |
Number of cores to use. If set larger than 1, it will run the job in parallel (default 1) |
A list object of the following:
plots: Plots of all assessed metrics and an overview plot showing a summary of all of them.
metrics: metrics of every created model (number of iterations, sum of squared errors, CORCONDIA score and variance explained).
models: all created models.
X = Fujita2023$data # Run assessModelQuality with less strict convergence parameters as example assessment = assessModelQuality(X, minNumComponents=1, maxNumComponents=3, numRepetitions=5, ctol=1e-4, maxit=250) assessment$plots$overview
X = Fujita2023$data # Run assessModelQuality with less strict convergence parameters as example assessment = assessModelQuality(X, minNumComponents=1, maxNumComponents=3, numRepetitions=5, ctol=1e-4, maxit=250) assessment$plots$overview
Bootstrapping procedure to determine PARAFAC model stability for a given number of components.
assessModelStability( dataset, minNumComponents = 1, maxNumComponents = 5, numFolds = dim(dataset$data)[1], considerGroups = FALSE, groupVariable = "", colourCols = NULL, legendTitles = NULL, xLabels = NULL, legendColNums = NULL, arrangeModes = NULL, method = "als", ctol = 1e-06, maxit = 2500, max_fn = 10000, rel_tol = 1e-08, abs_tol = 1e-08, grad_tol = 1e-08, numCores = 1 )
assessModelStability( dataset, minNumComponents = 1, maxNumComponents = 5, numFolds = dim(dataset$data)[1], considerGroups = FALSE, groupVariable = "", colourCols = NULL, legendTitles = NULL, xLabels = NULL, legendColNums = NULL, arrangeModes = NULL, method = "als", ctol = 1e-06, maxit = 2500, max_fn = 10000, rel_tol = 1e-08, abs_tol = 1e-08, grad_tol = 1e-08, numCores = 1 )
dataset |
See Fujita2023, Shao2019 or vanderPloeg2024. |
minNumComponents |
Minimum number of components (default 1). |
maxNumComponents |
Maximum number of components (default 5). |
numFolds |
Number of bootstrapped models to create. |
considerGroups |
Consider subject groups in calculating sparsity (default FALSE) |
groupVariable |
Column name in dataset$mode1 that should be used to consider groups (default "") |
colourCols |
Vector of strings stating which column names should be factorized for colours per mode. |
legendTitles |
Vector of strings stating the legend title per mode. |
xLabels |
Vector of strings stating the x-axis labels per mode. |
legendColNums |
Vector of integers stating the desired number of columns for the legends per mode. |
arrangeModes |
Vector of boolean values per mode, stating if the loadings should be arranged according to colourCols (TRUE) or not (FALSE). |
method |
Use ALS algorithm ("als", default) or use all-at-once optimization ("opt"). The all-at-once optimization is based on a nonlinear conjugate gradient method with Hestenes-Stiefel updates and the More-Thuente line search algorithm. |
ctol |
Relative change in loss tolerated to call the algorithm converged in the ALS case (default 1e-4). |
maxit |
Maximum number of iterations allowed without convergence in the ALS case (default 500). |
max_fn |
Maximum number of function evaluations allowed without convergence in the OPT case (default 10000). |
rel_tol |
Relative change in loss tolerated to call the algorithm converged in the OPT case (default 1e-8). |
abs_tol |
Absolute loss tolerated to call the algorithm converged in the OPT case (default 1e-8). |
grad_tol |
Tolerance on the two-norm of the gradient divided over the number of elements in the gradient in the OPT case (default 1e-8). |
numCores |
Number of cores to use. If set larger than 1, it will run the job in parallel (default 1) |
A list containing the following:
models: All stabilized sign-flipped bootstrapped PARAFAC models.
modelPlots: A list of plots of the median model with error bars for each number of components.
FMSplot: A bar plot showing the Factor Match Scores per number of components (see Li et al., 2024).
FMS: FMS values that the FMS plot is based on.
processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.99, centerMode=1, scaleMode=2) modelStability = assessModelStability(processedFujita, minNumComponents=1, maxNumComponents=2, ctol=1e-4, maxit=250)
processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.99, centerMode=1, scaleMode=2) modelStability = assessModelStability(processedFujita, minNumComponents=1, maxNumComponents=2, ctol=1e-4, maxit=250)
Calculate Factor Match Score for all initialized models.
calculateFMS(models)
calculateFMS(models)
models |
Output of |
Vector containing FMS scores of all comparisons
A = array(rnorm(108*2), c(108, 2)) B = array(rnorm(100*2), c(100, 2)) C = array(rnorm(10*2), c(10, 2)) X = reinflateTensor(A, B, C) models = parafac(X, 2, initialization="random", nstart=10, maxit=2, output="all") calculateFMS(models)
A = array(rnorm(108*2), c(108, 2)) B = array(rnorm(100*2), c(100, 2)) C = array(rnorm(10*2), c(10, 2)) X = reinflateTensor(A, B, C) models = parafac(X, 2, initialization="random", nstart=10, maxit=2, output="all") calculateFMS(models)
Calculate sparsity across the feature mode of a multi-way array.
calculateSparsity(dataset, considerGroups = FALSE, groupVariable = "")
calculateSparsity(dataset, considerGroups = FALSE, groupVariable = "")
dataset |
See Fujita2023, Shao2019 or vanderPloeg2024. |
considerGroups |
Consider subject groups in calculating sparsity (default FALSE) |
groupVariable |
Column name in dataset$mode1 that should be used to consider groups (default "") |
Vector of sparsity fractions (N x J) where N is the number of groups and J is the number of features.
# No groups sparsity = calculateSparsity(Fujita2023) length(sparsity) hist(sparsity) # Consider groups colnames(Shao2019$mode1) sparsity = calculateSparsity(Shao2019, considerGroups=TRUE, groupVariable="Delivery_mode") dim(sparsity) hist(sparsity[1,]) hist(sparsity[2,])
# No groups sparsity = calculateSparsity(Fujita2023) length(sparsity) hist(sparsity) # Consider groups colnames(Shao2019$mode1) sparsity = calculateSparsity(Shao2019, considerGroups=TRUE, groupVariable="Delivery_mode") dim(sparsity) hist(sparsity[1,]) hist(sparsity[2,])
Calculate the variation explained by a PARAFAC model.
calculateVarExp(Fac, X)
calculateVarExp(Fac, X)
Fac |
Fac object output from the |
X |
Input data of the PARAFAC model. |
The variation explained by the model, expressed as a fraction (between 0-1).
X = Fujita2023$data model = parafac(X, nfac=1, nstart=1, verbose=FALSE) calculateVarExp(model$Fac, X)
X = Fujita2023$data model = parafac(X, nfac=1, nstart=1, verbose=FALSE) calculateVarExp(model$Fac, X)
Calculate the variance explained of a PARAFAC model, per component
calcVarExpPerComponent(Fac, X)
calcVarExpPerComponent(Fac, X)
Fac |
Fac object output of a model |
X |
Input dataset |
Vector of scalars of the percentage of variation explained per component
X = array(rnorm(108*100*10), c(108,100,10)) model = parafac(X, 2) calcVarExpPerComponent(model$Fac, X)
X = array(rnorm(108*100*10), c(108,100,10)) model = parafac(X, 2) calcVarExpPerComponent(model$Fac, X)
Core Consistency Diagnostic (CORCONDIA) calculation
corcondia(X, Fac)
corcondia(X, Fac)
X |
Input data matrix |
Fac |
PARAFAC model Fac object |
Scalar of the CORCONDIA value
X = Fujita2023$data model = parafac(X, 2) corcondia(X, model$Fac)
X = Fujita2023$data model = parafac(X, 2) corcondia(X, model$Fac)
Vectorize Fac object
fac_to_vect(Fac)
fac_to_vect(Fac)
Fac |
Fac object output of parafac. |
Vectorized Fac object
set.seed(123) A = array(rnorm(108*2), c(108, 2)) B = array(rnorm(100*2), c(100, 2)) C = array(rnorm(10*2), c(10, 2)) Fac = list(A, B, C) v = fac_to_vect(Fac)
set.seed(123) A = array(rnorm(108*2), c(108, 2)) B = array(rnorm(100*2), c(100, 2)) C = array(rnorm(10*2), c(10, 2)) Fac = list(A, B, C) v = fac_to_vect(Fac)
Sign flip the loadings of many randomly initialized models to make consistent overview plots.
flipLoadings(models, X)
flipLoadings(models, X)
models |
Output of parafac. |
X |
Input dataset of parafac modelling procedure. |
models with sign flipped components where applicable.
A = array(rnorm(108*2), c(108,2)) B = array(rnorm(100*2), c(100,2)) C = array(rnorm(10*2), c(10,2)) X = reinflateTensor(A, B, C) models = parafac(X, 2, nstart=10, output="all", sortComponents=TRUE) flippedModels = flipLoadings(models, X)
A = array(rnorm(108*2), c(108,2)) B = array(rnorm(100*2), c(100,2)) C = array(rnorm(10*2), c(10,2)) X = reinflateTensor(A, B, C) models = parafac(X, 2, nstart=10, output="all", sortComponents=TRUE) flippedModels = flipLoadings(models, X)
The Fujita2023 longitudinal microbiome dataset as a three-dimensional array, with replicates in mode 1, microbial abundances in mode 2 and time in mode 3.
Fujita2023
Fujita2023
Fujita2023
A list object with three elements:
Array object of the data cube
Dataframe with all the subject metadata, ordered the same as the rows in the data cube.
Taxonomic classification of the microbiota, ordered the same as the columns in the data cube.
Dataframe with the time metadata, ordered the same as the third dimension in the array.
...
doi:10.1186/s40168-023-01474-5
Import MicrobiotaProcess object for PARAFAC modelling
importMicrobiotaProcess(MPobject, subjectIDs, thirdMode, taxa_are_rows = TRUE)
importMicrobiotaProcess(MPobject, subjectIDs, thirdMode, taxa_are_rows = TRUE)
MPobject |
MicrobiotaProcess object containing at least an OTU table and sample information, preferably also taxonomic information. |
subjectIDs |
Column name in the sample information corresponding to the subject IDs. |
thirdMode |
Column name in the sample information corresponding to the study design aspect to put in the third mode of the data cube. |
taxa_are_rows |
Boolean specifying if the taxa are in the rows of the OTU table (TRUE) or not (FALSE). |
List object containing:
'data': data cube
'mode1': metadata of the subject mode
'mode2': taxonomy information
'mode3': metadata of the third mode
library(MicrobiotaProcess) # Generate synthetic data sample_info = data.frame(Sample = factor(c("S1", "S2", "S3", "S4", "S5")), time = factor(c("T1", "T2", "T1", "T2", "T1"))) otu_table = matrix(runif(25, min = 0, max = 100), nrow = 5, ncol = 5, dimnames = list(paste0("OTU", 1:5), sample_info$Sample)) taxonomy_table = data.frame(OTU = paste0("OTU", 1:5), Kingdom = rep("King", 5), Phylum = rep("Phy", 5), Class = rep("Cla", 5), Order = rep("Ord", 5), Family = rep("Fam", 5), Genus = rep("Gen", 5)) # Create Summarized Experiment synthetic_SE = SummarizedExperiment::SummarizedExperiment( assays = list(otu = otu_table), colData = sample_info, rowData = taxonomy_table) # Convert to MicrobiotaProcess object synthetic_MPSE = as.MPSE(synthetic_SE) dataset = importMicrobiotaProcess(synthetic_MPSE, subjectIDs = "Sample", thirdMode = "time", taxa_are_rows = TRUE)
library(MicrobiotaProcess) # Generate synthetic data sample_info = data.frame(Sample = factor(c("S1", "S2", "S3", "S4", "S5")), time = factor(c("T1", "T2", "T1", "T2", "T1"))) otu_table = matrix(runif(25, min = 0, max = 100), nrow = 5, ncol = 5, dimnames = list(paste0("OTU", 1:5), sample_info$Sample)) taxonomy_table = data.frame(OTU = paste0("OTU", 1:5), Kingdom = rep("King", 5), Phylum = rep("Phy", 5), Class = rep("Cla", 5), Order = rep("Ord", 5), Family = rep("Fam", 5), Genus = rep("Gen", 5)) # Create Summarized Experiment synthetic_SE = SummarizedExperiment::SummarizedExperiment( assays = list(otu = otu_table), colData = sample_info, rowData = taxonomy_table) # Convert to MicrobiotaProcess object synthetic_MPSE = as.MPSE(synthetic_SE) dataset = importMicrobiotaProcess(synthetic_MPSE, subjectIDs = "Sample", thirdMode = "time", taxa_are_rows = TRUE)
Import Phyloseq object for PARAFAC modelling
importPhyloseq(phyloseqObject, subjectIDs, thirdMode)
importPhyloseq(phyloseqObject, subjectIDs, thirdMode)
phyloseqObject |
Phyloseq object containing at least an otu table and sample data, preferably also taxonomic information. |
subjectIDs |
Column name in sam_data corresponding to the subject IDs. |
thirdMode |
Column name in sam_data corresponding to the study design aspect to put in the third mode of the data cube. |
List object containing:
'data': data cube
'mode1': metadata of the subject mode
'mode2': taxonomy information
'mode3': metadata of the third mode
library(phyloseq) data(GlobalPatterns) GP = GlobalPatterns # Add custom subject IDs to the sample data to make this example work alteredSampleData = sample_data(GP) alteredSampleData$subjectID = c(1,2,3,1,2,1,2,3,1,2,1,2,1,2,3,1,2,3,1,2,3,4,5,1,2,3) df = phyloseq(otu_table(GP), tax_table(GP), alteredSampleData) # Make a data cube with SampleType (soil, feces, etc.) as the third mode. result = importPhyloseq(df, subjectIDs = "subjectID", thirdMode="SampleType")
library(phyloseq) data(GlobalPatterns) GP = GlobalPatterns # Add custom subject IDs to the sample data to make this example work alteredSampleData = sample_data(GP) alteredSampleData$subjectID = c(1,2,3,1,2,1,2,3,1,2,1,2,1,2,3,1,2,3,1,2,3,4,5,1,2,3) df = phyloseq(otu_table(GP), tax_table(GP), alteredSampleData) # Make a data cube with SampleType (soil, feces, etc.) as the third mode. result = importPhyloseq(df, subjectIDs = "subjectID", thirdMode="SampleType")
Import TreeSummarizedExperiment object for PARAFAC modelling
importTreeSummarizedExperiment( treeObject, subjectIDs, thirdMode, taxa_are_rows )
importTreeSummarizedExperiment( treeObject, subjectIDs, thirdMode, taxa_are_rows )
treeObject |
TreeSummarizedExperiment object containing at least an OTU table and sample information, preferably also taxonomic information. |
subjectIDs |
Column name in the sample information corresponding to the subject IDs. |
thirdMode |
Column name in the sample information corresponding to the study design aspect to put in the third mode of the data cube. |
taxa_are_rows |
Boolean specifying if the taxa are in the rows of the OTU table (TRUE) or not (FALSE). |
List object containing:
'data': data cube
'mode1': metadata of the subject mode
'mode2': taxonomy information
'mode3': metadata of the third mode
library(TreeSummarizedExperiment) fakeOTU = t(rTensor::k_unfold(rTensor::as.tensor(Fujita2023$data), 2)@data) fakeTaxa = as.matrix(Fujita2023$mode2) fakeSam = as.data.frame(cbind(rep(1:8, 110), rep(1:110, each=8))) colnames(fakeSam) = c("replicate.id", "timepoint") fakeTreeObj = TreeSummarizedExperiment(assays = list(Count = fakeOTU), rowData = fakeSam, colData = fakeTaxa) dataset = importTreeSummarizedExperiment(fakeTreeObj, subjectIDs="replicate.id", thirdMode="timepoint", taxa_are_rows=FALSE)
library(TreeSummarizedExperiment) fakeOTU = t(rTensor::k_unfold(rTensor::as.tensor(Fujita2023$data), 2)@data) fakeTaxa = as.matrix(Fujita2023$mode2) fakeSam = as.data.frame(cbind(rep(1:8, 110), rep(1:110, each=8))) colnames(fakeSam) = c("replicate.id", "timepoint") fakeTreeObj = TreeSummarizedExperiment(assays = list(Count = fakeOTU), rowData = fakeSam, colData = fakeTaxa) dataset = importTreeSummarizedExperiment(fakeTreeObj, subjectIDs="replicate.id", thirdMode="timepoint", taxa_are_rows=FALSE)
Initialize PARAFAC algorithm input vectors
initializePARAFAC(Tensor, nfac, initialization = "random", output = "Fac")
initializePARAFAC(Tensor, nfac, initialization = "random", output = "Fac")
Tensor |
Input dataset matrix or tensor |
nfac |
Number of components to initialize. |
initialization |
Either "random" for random initialization or "svd" for svd based. |
output |
Output the initialized components as a Fac object ("Fac", default) or as a vector ("vect"). |
Fac or vector with initialized components.
A = array(rnorm(108,2), c(108,2)) B = array(rnorm(100,2), c(100,2)) C = array(rnorm(10,2), c(10,2)) Tensor = reinflateTensor(A, B, C, returnAsTensor=TRUE) init = initializePARAFAC(Tensor, 2)
A = array(rnorm(108,2), c(108,2)) B = array(rnorm(100,2), c(100,2)) C = array(rnorm(10,2), c(10,2)) Tensor = reinflateTensor(A, B, C, returnAsTensor=TRUE) init = initializePARAFAC(Tensor, 2)
Center a multi-way array
multiwayCenter(X, mode = 1)
multiwayCenter(X, mode = 1)
X |
Multi-way array |
mode |
Mode to center across (default 1). |
Centered multi-way array
cube_cnt = multiwayCenter(Fujita2023$data)
cube_cnt = multiwayCenter(Fujita2023$data)
Note: Propagates NAs corresponding to missing samples.
multiwayCLR(X, pseudocount = 1)
multiwayCLR(X, pseudocount = 1)
X |
Multi-way array of counts |
pseudocount |
Pseudocount value to use (default 1). |
CLRed cube
cubeCLR = multiwayCLR(Fujita2023$data)
cubeCLR = multiwayCLR(Fujita2023$data)
Scale a multi-way array
multiwayScale(X, mode = 2)
multiwayScale(X, mode = 2)
X |
Multi-way array |
mode |
Mode to scale within: 1=subjects,2=features,3=time (default 2). |
Scaled multi-way array
cube_scl = multiwayCenter(Fujita2023$data)
cube_scl = multiwayCenter(Fujita2023$data)
Parallel Factor Analysis
parafac( Tensor, nfac, nstart = 1, maxit = 500, max_fn = 10000, ctol = 1e-04, rel_tol = 1e-08, abs_tol = 1e-08, grad_tol = 1e-08, initialization = "random", method = "als", verbose = FALSE, output = "best", sortComponents = FALSE )
parafac( Tensor, nfac, nstart = 1, maxit = 500, max_fn = 10000, ctol = 1e-04, rel_tol = 1e-08, abs_tol = 1e-08, grad_tol = 1e-08, initialization = "random", method = "als", verbose = FALSE, output = "best", sortComponents = FALSE )
Tensor |
3-way matrix of numeric data |
nfac |
Number of factors (components) to fit. |
nstart |
Number of models to randomly initialize (default 1). |
maxit |
Maximum number of iterations allowed without convergence in the ALS case (default 500). |
max_fn |
Maximum number of function evaluations allowed without convergence in the OPT case (default 10000). |
ctol |
Relative change in loss tolerated to call the algorithm converged in the ALS case (default 1e-4). |
rel_tol |
Relative change in loss tolerated to call the algorithm converged in the OPT case (default 1e-8). |
abs_tol |
Absolute loss tolerated to call the algorithm converged in the OPT case (default 1e-8). |
grad_tol |
Tolerance on the two-norm of the gradient divided over the number of elements in the gradient in the OPT case (default 1e-8). |
initialization |
"Random" for randomly initialized input vectors or "nvec" for svd-based best guess. |
method |
Use ALS algorithm ("als", default) or use all-at-once optimization ("opt"). The all-at-once optimization is based on a nonlinear conjugate gradient method with Hestenes-Stiefel updates and the More-Thuente line search algorithm. |
verbose |
|
output |
String ("best"/"all") Return only the best model of the nstart models ("best") or return all of them in a list object ("all"). |
sortComponents |
Boolean to sort the components based on their variance explained (default FALSE) |
List object of the PARAFAC model or models.
X = array(rnorm(108*100*10), c(108,100,10)) model = parafac(X, 2)
X = array(rnorm(108*100*10), c(108,100,10)) model = parafac(X, 2)
Internal PARAFAC alternating least-squares (ALS) core algorithm
parafac_core_als(Tensor, nfac, init, maxit = 500, ctol = 1e-04)
parafac_core_als(Tensor, nfac, init, maxit = 500, ctol = 1e-04)
Tensor |
Tensor data object |
nfac |
Number of components to compute |
init |
Initialization from initializePARAFAC. |
maxit |
Maximum number of iterations to run (default 500). |
ctol |
Loss function tolerance for convergence (default 1e-4) |
List containing a Fac object and the loss per iteration
A = array(rnorm(108*2), c(108,2)) B = array(rnorm(100*2), c(100,2)) C = array(rnorm(10*2), c(10,2)) Tensor = reinflateTensor(A, B, C) init = initializePARAFAC(Tensor, 2) model = parafac_core_als(Tensor, 2, init)
A = array(rnorm(108*2), c(108,2)) B = array(rnorm(100*2), c(100,2)) C = array(rnorm(10*2), c(10,2)) Tensor = reinflateTensor(A, B, C) init = initializePARAFAC(Tensor, 2) model = parafac_core_als(Tensor, 2, init)
PARAFAC loss function calculation
parafac_fun(x, Tensor, lambdas = NULL)
parafac_fun(x, Tensor, lambdas = NULL)
x |
Vector of fitted loadings generated by the PARAFAC algorithm, can also be a Fac object |
Tensor |
input data |
lambdas |
If lambdas (from the kruskal tensor case) are generated to make the Fac norm 1, they can be supplied. |
Scalar value of the loss function
A = array(rnorm(108*2), c(108,2)) B = array(rnorm(100*2), c(100,2)) C = array(rnorm(10*2), c(10,2)) X = reinflateTensor(A, B, C) model = parafac(X, 2) f = parafac_fun(model$Fac, X)
A = array(rnorm(108*2), c(108,2)) B = array(rnorm(100*2), c(100,2)) C = array(rnorm(10*2), c(10,2)) X = reinflateTensor(A, B, C) model = parafac(X, 2) f = parafac_fun(model$Fac, X)
Calculate gradient of PARAFAC model.
parafac_gradient(x, Tensor)
parafac_gradient(x, Tensor)
x |
Vector of fitted loadings generated by the PARAFAC algorithm, can also be a Fac object |
Tensor |
input data |
Gradient of the PARAFAC model.
A = array(rnorm(108*2), c(108,2)) B = array(rnorm(100*2), c(100,2)) C = array(rnorm(10*2), c(10,2)) X = reinflateTensor(A, B, C) init = initializePARAFAC(X, 2) g = parafac_gradient(init, X)
A = array(rnorm(108*2), c(108,2)) B = array(rnorm(100*2), c(100,2)) C = array(rnorm(10*2), c(10,2)) X = reinflateTensor(A, B, C) init = initializePARAFAC(X, 2) g = parafac_gradient(init, X)
Plot diagnostics of many initialized PARAFAC models.
plotModelMetric( metric, plottingMode = "box", ylabel = "metric", titleString = "" )
plotModelMetric( metric, plottingMode = "box", ylabel = "metric", titleString = "" )
metric |
Matrix of metrics per initialized model (number of models x number of components). |
plottingMode |
Plot the metrics as a box plot ("box", default) or as a bar plot ("bar"). |
ylabel |
String of the y axis label (default "metric"). |
titleString |
String of the plot title (default ""). |
A plot of the metrics
varExp = array(runif(100*2, min=50, max=100), c(100,2)) plotModelMetric(varExp, plottingMode="box", ylabel="Variation explained (%)")
varExp = array(runif(100*2, min=50, max=100), c(100,2)) plotModelMetric(varExp, plottingMode="box", ylabel="Variation explained (%)")
Plot a summary of the loadings of many initialized parafac models.
plotModelStability( models, dataset, colourCols = NULL, legendTitles = NULL, xLabels = NULL, legendColNums = NULL, arrangeModes = NULL, continuousModes = NULL, overallTitle = "" )
plotModelStability( models, dataset, colourCols = NULL, legendTitles = NULL, xLabels = NULL, legendColNums = NULL, arrangeModes = NULL, continuousModes = NULL, overallTitle = "" )
models |
Models list output from |
dataset |
A longitudinal microbiome dataset, ideally processed with
|
colourCols |
Vector of strings stating which column names should be factorized for colours per mode. |
legendTitles |
Vector of strings stating the legend title per mode. |
xLabels |
Vector of strings stating the x-axis labels per mode. |
legendColNums |
Vector of integers stating the desired number of columns for the legends per mode. |
arrangeModes |
Vector of boolean values per mode, stating if the loadings should be arranged according to colourCols (TRUE) or not (FALSE). |
continuousModes |
Vector of boolean values per mode, stating if the loadings should be plotted as a line plot (TRUE) or a bar plot (FALSE). |
overallTitle |
Overall title of the plot. |
Plot of loadings with error bars
processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.99, centerMode=1, scaleMode=2) models = parafac(processedFujita$data, 2, nstart=10, output="all") plotModelStability(models, processedFujita)
processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.99, centerMode=1, scaleMode=2) models = parafac(processedFujita$data, 2, nstart=10, output="all") plotModelStability(models, processedFujita)
Plots Tucker Congruence Coefficients of randomly initialized models.
plotModelTCCs(models)
plotModelTCCs(models)
models |
Models list output of |
Plot of TCCs
processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.99, centerMode=1, scaleMode=2) models = parafac(processedFujita$data, 3, nstart=10, output="all") plotModelTCCs(models)
processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.99, centerMode=1, scaleMode=2) models = parafac(processedFujita$data, 3, nstart=10, output="all") plotModelTCCs(models)
Plot a PARAFAC model
plotPARAFACmodel( model, dataset, numComponents, colourCols = NULL, legendTitles = NULL, xLabels = NULL, legendColNums = NULL, arrangeModes = NULL, continuousModes = NULL, overallTitle = "" )
plotPARAFACmodel( model, dataset, numComponents, colourCols = NULL, legendTitles = NULL, xLabels = NULL, legendColNums = NULL, arrangeModes = NULL, continuousModes = NULL, overallTitle = "" )
model |
Model output from |
dataset |
A longitudinal microbiome dataset, ideally processed with
|
numComponents |
Number of PARAFAC components in the model. |
colourCols |
Vector of strings stating which column names should be factorized for colours per mode. |
legendTitles |
Vector of strings stating the legend title per mode. |
xLabels |
Vector of strings stating the x-axis labels per mode. |
legendColNums |
Vector of integers stating the desired number of columns for the legends per mode. |
arrangeModes |
Vector of boolean values per mode, stating if the loadings should be arranged according to colourCols (TRUE) or not (FALSE). |
continuousModes |
Vector of boolean values per mode, stating if the loadings should be plotted as a line plot (TRUE) or a bar plot (FALSE). |
overallTitle |
Overall title of the plot. |
Plot object
library(multiway) library(dplyr) library(ggplot2) set.seed(0) # Process the data processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.9, centerMode=1, scaleMode=2) # Make PARAFAC model model = parafac(processedFujita$data, nfac=2, nstart=10, verbose=FALSE) # Make plot plotPARAFACmodel(model, processedFujita, numComponents = 2, colourCols = c("", "Genus", ""), legendTitles = c("", "Genus", ""), xLabels = c("Replicate", "Feature index", "Time point"), legendColNums = c(0,5,0), arrangeModes = c(FALSE, TRUE, FALSE), continuousModes = c(FALSE,FALSE,TRUE), overallTitle = "Fujita PARAFAC model")
library(multiway) library(dplyr) library(ggplot2) set.seed(0) # Process the data processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.9, centerMode=1, scaleMode=2) # Make PARAFAC model model = parafac(processedFujita$data, nfac=2, nstart=10, verbose=FALSE) # Make plot plotPARAFACmodel(model, processedFujita, numComponents = 2, colourCols = c("", "Genus", ""), legendTitles = c("", "Genus", ""), xLabels = c("Replicate", "Feature index", "Time point"), legendColNums = c(0,5,0), arrangeModes = c(FALSE, TRUE, FALSE), continuousModes = c(FALSE,FALSE,TRUE), overallTitle = "Fujita PARAFAC model")
Process a multi-way array of count data.
processDataCube( dataset, sparsityThreshold = 1, considerGroups = FALSE, groupVariable = "", CLR = TRUE, centerMode = 0, scaleMode = 0 )
processDataCube( dataset, sparsityThreshold = 1, considerGroups = FALSE, groupVariable = "", CLR = TRUE, centerMode = 0, scaleMode = 0 )
dataset |
A longitudinal microbiome dataset, formatted as follows:
See Fujita2023, Shao2019 or vanderPloeg2024 for more information. |
sparsityThreshold |
Maximum sparsity for a feature to be selected (default=1, i.e. do not select features). |
considerGroups |
Consider groups when calculating sparsity (default=FALSE). |
groupVariable |
Column name in dataset$mode1 that should be used to consider groups (default=""). |
CLR |
Perform a centered log-ratio transformation of the count data (default=TRUE). |
centerMode |
Mode to center across: 1=subjects,2=features,3=time (default 0, i.e. do not center). See |
scaleMode |
Mode to scale within: 1=subjects,2=features,3=time (default 0, i.e. do not scale). See |
CLRed, centered and scaled cube
processedCube = processDataCube(Fujita2023)
processedCube = processDataCube(Fujita2023)
Calculate Xhat from a model Fac object
reinflateFac(Fac, X, returnAsTensor = FALSE)
reinflateFac(Fac, X, returnAsTensor = FALSE)
Fac |
Fac object from parafac |
X |
Input data X |
returnAsTensor |
Boolean to return Xhat as rTensor tensor (TRUE) or matrix (default, FALSE). |
Xhat
processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.99, centerMode=1, scaleMode=2) model = parafac(processedFujita$data, nfac=1, nstart=1, verbose=FALSE) Xhat = reinflateFac(model$Fac, processedFujita$data)
processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.99, centerMode=1, scaleMode=2) model = parafac(processedFujita$data, nfac=1, nstart=1, verbose=FALSE) Xhat = reinflateFac(model$Fac, processedFujita$data)
Create a tensor out of a set of matrices similar to a component model.
reinflateTensor(A, B, C, returnAsTensor = FALSE)
reinflateTensor(A, B, C, returnAsTensor = FALSE)
A |
I x N matrix corresponding to loadings in the first mode for N components. |
B |
J x N matrix corresponding to loadings in the second mode for N components. |
C |
K x N matrix corresponding to loadings in the third mode for N components. |
returnAsTensor |
Boolean return as rTensor S4 tensor object (default FALSE). |
M, an I x J x K tensor.
A = rnorm(108) B = rnorm(100) C = rnorm(10) M = reinflateTensor(A,B,C)
A = rnorm(108) B = rnorm(100) C = rnorm(10) M = reinflateTensor(A,B,C)
The Shao2019 longitudinal microbiome dataset as a three-dimensional array, with subjects in mode 1, microbial abundances in mode 2 and time in mode 3. Note: only time points 4, 7, 21 and Infancy are used. Note: all-zero microbial abundances have been removed to save disk space.
Shao2019
Shao2019
Shao2019
A list object with three elements:
Array object of the data cube
Dataframe with all the subject metadata, ordered the same as the rows in the data cube.
Taxonomic classification of the microbiota, ordered the same as the columns in the data cube.
Dataframe with the time metadata, ordered the same as the third dimension in the array.
...
Sort PARAFAC components based on variance explained per component.
sortComponents(Fac, X)
sortComponents(Fac, X)
Fac |
Fac object output of a parafac model |
X |
Input data |
Fac object of sorted components
X = array(rnorm(108*100*10), c(108,100,10)) model = parafac(X, 2) sortedFac = sortComponents(model$Fac, X)
X = array(rnorm(108*100*10), c(108,100,10)) model = parafac(X, 2) sortedFac = sortComponents(model$Fac, X)
Transform PARAFAC loadings to an orthonormal basis. Note: this function only works for 3-way PARAFAC models.
transformPARAFACloadings(Fac, modeToCorrect, moreOutput = FALSE)
transformPARAFACloadings(Fac, modeToCorrect, moreOutput = FALSE)
Fac |
Fac object from a PARAFAC object, see |
modeToCorrect |
Correct the subject (1), feature (2) or time mode (3). |
moreOutput |
Give orthonormal basis and transformation matrices as part of output (default FALSE). |
Corrected loadings of the specified mode.
processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.99, centerMode=1, scaleMode=2) model = parafac(processedFujita$data, nfac=2, nstart=1, verbose=FALSE) transformedA = transformPARAFACloadings(model$Fac, 1)
processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.99, centerMode=1, scaleMode=2) model = parafac(processedFujita$data, nfac=2, nstart=1, verbose=FALSE) transformedA = transformPARAFACloadings(model$Fac, 1)
The vanderPloeg2024 longitudinal microbiome dataset as a three-dimensional array, with subjects in mode 1, microbial abundances in mode 2, and time in mode 3. Note: all-zero microbial abundances have been removed to save disk space.
vanderPloeg2024
vanderPloeg2024
vanderPloeg2024
A list object with three elements:
Array object of the data cube
Dataframe with all the subject metadata, ordered the same as the rows in the data cube.
Taxonomic classification of the microbiota, ordered the same as the columns in the data cube.
Dataframe with the time metadata, ordered the same as the third dimension in the array.
...
Convert vectorized output of PARAFAC to a Fac list object with all loadings per mode.
vect_to_fac(vect, X, sortComponents = FALSE)
vect_to_fac(vect, X, sortComponents = FALSE)
vect |
Vectorized output of PARAFAC modelling |
X |
Input data |
sortComponents |
Sort the order of the components by variation explained (default FALSE). |
Fac: list object with all loadings in all components per mode, ordered the same way as Z$modes.
set.seed(123) A = array(rnorm(108*2), c(108, 2)) B = array(rnorm(100*2), c(100, 2)) C = array(rnorm(10*2), c(10, 2)) X = reinflateTensor(A, B, C) result = initializePARAFAC(X, 2, initialization="random", output="vect") Fac = vect_to_fac(result, X)
set.seed(123) A = array(rnorm(108*2), c(108, 2)) B = array(rnorm(100*2), c(100, 2)) C = array(rnorm(10*2), c(10, 2)) X = reinflateTensor(A, B, C) result = initializePARAFAC(X, 2, initialization="random", output="vect") Fac = vect_to_fac(result, X)