Perform Brain Gene Set Analysis
Performs gene set analysis on group-level brain data. Associations between gene expression and brain data are computed per gene and then aggregated within each gene set defined in the annotation data. The empirical aggregation score of each gene set is compared with a null distribution generated by the selected null model. Custom functions can be supplied for both the correlation and the aggregation step.
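The three steps described above (correlate, aggregate, compare with a null) can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation. The toy data, the gene set, and the choice of Pearson correlation, mean aggregation, and a simple gene-resampling null (in the spirit of 'resample_gene') are all assumptions made for the example.

```r
set.seed(1)

# Simulated inputs (hypothetical): 30 regions, 100 genes, one brain phenotype
n_regions <- 30
n_genes <- 100
gene_data <- as.data.frame(matrix(rnorm(n_regions * n_genes), nrow = n_regions))
rownames(gene_data) <- paste0("region", seq_len(n_regions))
colnames(gene_data) <- paste0("gene", seq_len(n_genes))
brain_data <- data.frame(pheno = rnorm(n_regions), row.names = rownames(gene_data))

# Step 1: gene-wise association with the brain phenotype (Pearson)
gene_assoc <- cor(as.matrix(gene_data), brain_data$pheno)[, 1]

# Step 2: aggregate the associations within one hypothetical gene set (mean)
gene_set <- paste0("gene", 1:15)
emp_score <- mean(gene_assoc[gene_set])

# Step 3: null distribution from random gene sets of the same size
n_perm <- 1000
null_scores <- replicate(n_perm, mean(sample(gene_assoc, length(gene_set))))

# Two-sided empirical p-value with the usual +1 correction
p_value <- (sum(abs(null_scores) >= abs(emp_score)) + 1) / (n_perm + 1)
```

The 'spin_brain' and 'coexp_matched' null models replace step 3 with spatially constrained or co-expression-matched resampling, which this sketch does not attempt to reproduce.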
Usage
brainenrich(
  brain_data,
  gene_data,
  annoData,
  cor_method = c("pearson", "spearman", "pls1c", "pls1w"),
  aggre_method = c("mean", "median", "meanabs", "meansqr", "maxmean", "ks_orig",
    "ks_weighted", "ks_pos_neg_sum", "sign_test", "rank_sum"),
  null_model = c("spin_brain", "resample_gene", "coexp_matched"),
  minGSSize = 10,
  maxGSSize = 200,
  n_cores = 0,
  n_perm = 5000,
  perm_id = NULL,
  coord.l = NULL,
  coord.r = NULL,
  seed = NULL,
  threshold_type = c("sd", "percentile", "none"),
  threshold_value = 1,
  pvalueCutoff = 0.05,
  pAdjustMethod = c("fdr", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
    "none"),
  matchcoexp_tol = 0.05,
  matchcoexp_max_iter = 1e+06
)
Arguments
- brain_data
A data frame of brain data with regions as rows. The row names (i.e., region names) must be consistent with those in gene_data.
- gene_data
A data frame of gene expression data with regions as rows and genes as columns. The row names (i.e., region names) must be consistent with those in brain_data.
- annoData
An environment containing annotation data. See get_annoData for more details.
- cor_method
A character string specifying the correlation method. Default is 'pearson'. Other options include 'spearman', 'pls1c', and 'pls1w'. If a custom function that takes (gene_data, brain_data) as input is provided, the function will use the custom correlation method, and cor_method will be set to 'custom'.
- aggre_method
A character string specifying the aggregation method. Default is 'mean'. Other options include 'median', 'meanabs', 'meansqr', 'maxmean', 'ks_orig', 'ks_weighted', 'ks_pos_neg_sum', 'sign_test', and 'rank_sum'. If a custom function that takes (geneList, geneSet) as input is provided, the function will use the custom aggregation method, and aggre_method will be set to 'custom'.
- null_model
A character string specifying the null model to use. Default is 'spin_brain'. Other options include 'resample_gene' and 'coexp_matched'.
- minGSSize
An integer specifying the minimum gene set size after intersecting with the genes in gene_data. Default is 10.
- maxGSSize
An integer specifying the maximum gene set size after intersecting with the genes in gene_data. Default is 200.
- n_cores
An integer specifying the number of cores to use. Default is 0 (use all available cores minus one).
- n_perm
An integer specifying the number of permutations. Default is 5000.
- perm_id
A matrix of permutation indices for the 'spin_brain' null model. Default is NULL. When null_model is 'spin_brain', either perm_id or at least one of coord.l and coord.r must be provided.
- coord.l
A matrix of coordinates for the left hemisphere used in the 'spin_brain' null model. Default is NULL. Can be NULL if coord.r or perm_id is provided.
- coord.r
A matrix of coordinates for the right hemisphere used in the 'spin_brain' null model. Default is NULL. Can be NULL if coord.l or perm_id is provided.
- seed
An integer specifying the seed for reproducibility when using the 'spin_brain' model. Default is NULL.
- threshold_type
A character string specifying the threshold type for identifying core genes. Default is 'sd'. Other options are 'percentile' and 'none'. For 'sd', the threshold value represents the number of standard deviations from the mean. For 'percentile', the threshold value represents the percentile of the distribution. For 'none', no identification of core genes is performed.
- threshold_value
A numeric value specifying the threshold for core gene identification. Default is 1. See find_core_genes for more details.
- pvalueCutoff
A numeric value specifying the p-value cutoff for significance in the output. Default is 0.05.
- pAdjustMethod
A character string specifying the method for p-value adjustment ("fdr", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "none"). Default is 'fdr'. See p.adjust for more details.
- matchcoexp_tol
A numeric value specifying the tolerance for co-expression matching in the 'coexp_matched' null model. Lower values result in better matching but increase the number of iterations required. Default is 0.05. See resample_geneSetList_matching_coexp for more details.
- matchcoexp_max_iter
An integer specifying the maximum number of iterations for co-expression matching in the 'coexp_matched' null model. Default is 1,000,000. See resample_geneSetList_matching_coexp for more details.
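
The cor_method and aggre_method arguments accept custom functions taking (gene_data, brain_data) and (geneList, geneSet), respectively. The sketch below shows what such functions might look like. The function names, the choice of Kendall correlation and a trimmed mean, and in particular the assumed input and return shapes (a genes-by-brain-maps matrix from the correlation function, a single number from the aggregation function) are illustrative assumptions and are not confirmed by this page; check the package source for the exact contract.

```r
# Hypothetical custom correlation: Kendall's tau between each gene and each
# brain map. Assumed to return a genes x brain-maps matrix.
kendall_cor <- function(gene_data, brain_data) {
  stats::cor(as.matrix(gene_data), as.matrix(brain_data), method = "kendall")
}

# Hypothetical custom aggregation: 10% trimmed mean of the associations in a set.
# geneList is assumed to be a one-column matrix with gene names as row names,
# geneSet a character vector of gene names.
trimmed_mean_aggre <- function(geneList, geneSet) {
  in_set <- intersect(geneSet, rownames(geneList))
  mean(geneList[in_set, 1], trim = 0.1)
}

# Standalone check on toy data (20 regions, 8 genes, one brain map)
set.seed(42)
toy_gene <- as.data.frame(matrix(
  rnorm(20 * 8), nrow = 20,
  dimnames = list(paste0("r", 1:20), paste0("g", 1:8))
))
toy_brain <- data.frame(map1 = rnorm(20), row.names = rownames(toy_gene))

assoc <- kendall_cor(toy_gene, toy_brain)
score <- trimmed_mean_aggre(assoc, c("g1", "g2", "g3", "not_measured"))
```

Under these assumptions, the functions would be passed as cor_method = kendall_cor and aggre_method = trimmed_mean_aggre in the call to brainenrich.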