Perform Brain Gene Set Analysis — brainenrich • BrainEnrich

This function performs gene set analysis using group-level brain data. For each gene set in the annotation data, it aggregates the associations between gene expression and the brain data into a single score, then compares this empirical score with a null distribution generated by the selected null model. Custom functions can be supplied for both the correlation step and the aggregation step.
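
The correlate, aggregate, and compare-to-null logic described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the object names (`gene_list`, `gene_set`, `null_scores`), the simple gene-resampling null, and the two-sided p-value formula are all assumptions made for illustration.

```r
set.seed(1)
n_regions <- 34
n_genes <- 200

# Toy inputs (hypothetical): regions as rows, genes as columns
gene_data <- matrix(
  rnorm(n_regions * n_genes),
  nrow = n_regions,
  dimnames = list(paste0("region", seq_len(n_regions)),
                  paste0("gene", seq_len(n_genes)))
)
brain_data <- data.frame(map = rnorm(n_regions),
                         row.names = rownames(gene_data))

# Step 1: association between each gene and the brain map (Pearson)
gene_list <- cor(gene_data, brain_data$map)[, 1]

# Step 2: aggregate the associations within one gene set (mean)
gene_set <- paste0("gene", 1:20)
emp_score <- mean(gene_list[gene_set])

# Step 3: null distribution from random gene sets of the same size
# (a simplified stand-in for a gene-resampling null model)
n_perm <- 1000
null_scores <- replicate(n_perm,
                         mean(sample(gene_list, length(gene_set))))

# Step 4: two-sided permutation p-value with a +1 correction (assumed form)
p_value <- (sum(abs(null_scores) >= abs(emp_score)) + 1) / (n_perm + 1)
p_value
```

A spatial null such as 'spin_brain' would instead permute the brain map across regions and recompute the gene-wise associations for each permutation; the comparison in the final step stays the same in spirit.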

Usage

brainenrich(
  brain_data,
  gene_data,
  annoData,
  cor_method = c("pearson", "spearman", "pls1c", "pls1w"),
  aggre_method = c("mean", "median", "meanabs", "meansqr", "maxmean", "ks_orig",
    "ks_weighted", "ks_pos_neg_sum", "sign_test", "rank_sum"),
  null_model = c("spin_brain", "resample_gene", "coexp_matched"),
  minGSSize = 10,
  maxGSSize = 200,
  n_cores = 0,
  n_perm = 5000,
  perm_id = NULL,
  coord.l = NULL,
  coord.r = NULL,
  seed = NULL,
  threshold_type = c("sd", "percentile", "none"),
  threshold_value = 1,
  pvalueCutoff = 0.05,
  pAdjustMethod = c("fdr", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
    "none"),
  matchcoexp_tol = 0.05,
  matchcoexp_max_iter = 1e+06
)

Arguments

brain_data: A data frame of brain data with regions as rows. The row names (i.e., region names) must be consistent with those in gene_data.
gene_data: A data frame of gene expression data with regions as rows and genes as columns. The row names (i.e., region names) must be consistent with those in brain_data.
annoData: An environment containing annotation data. See get_annoData for more details.
cor_method: A character string specifying the correlation method. Default is 'pearson'. Other options include 'spearman', 'pls1c', and 'pls1w'. If a custom function that takes (gene_data, brain_data) as input is provided, the function will use the custom correlation method, and cor_method will be set to 'custom'.
aggre_method: A character string specifying the aggregation method. Default is 'mean'. Other options include 'median', 'meanabs', 'meansqr', 'maxmean', 'ks_orig', 'ks_weighted', 'ks_pos_neg_sum', 'sign_test', and 'rank_sum'. If a custom function that takes (geneList, geneSet) as input is provided, the function will use the custom aggregation method, and aggre_method will be set to 'custom'.
null_model: A character string specifying the null model to use. Default is 'spin_brain'. Other options include 'resample_gene' and 'coexp_matched'.
minGSSize: An integer specifying the minimum gene set size after intersecting with the genes in gene_data. Default is 10.
maxGSSize: An integer specifying the maximum gene set size after intersecting with the genes in gene_data. Default is 200.
n_cores: An integer specifying the number of cores to use. Default is 0 (use all available cores minus one).
n_perm: An integer specifying the number of permutations. Default is 5000.
perm_id: A matrix of permutation indices for the 'spin_brain' null model. Default is NULL. Either perm_id or coord.l/coord.r must be provided when null_model is 'spin_brain'.
coord.l: A matrix of coordinates for the left hemisphere used in the 'spin_brain' null model. Default is NULL. Can be NULL if coord.r or perm_id is provided.
coord.r: A matrix of coordinates for the right hemisphere used in the 'spin_brain' null model. Default is NULL. Can be NULL if coord.l or perm_id is provided.
seed: An integer specifying the seed for reproducibility when using the 'spin_brain' model. Default is NULL.
threshold_type: A character string specifying the threshold type for identifying core genes. Default is 'sd'. Other options are 'percentile' and 'none'. For 'sd', the threshold value represents the number of standard deviations from the mean. For 'percentile', the threshold value represents the percentile of the distribution. For 'none', no identification of core genes is performed.
threshold_value: A numeric value specifying the threshold for core gene identification. Default is 1. See find_core_genes for more details.
pvalueCutoff: A numeric value specifying the p-value cutoff for significance in the output. Default is 0.05.
pAdjustMethod: A character string specifying the method for p-value adjustment. Options are 'fdr', 'holm', 'hochberg', 'hommel', 'bonferroni', 'BH', 'BY', and 'none'. Default is 'fdr', which p.adjust treats as an alias for 'BH'. See p.adjust for more details.
matchcoexp_tol: A numeric value specifying the tolerance for co-expression matching in the 'coexp_matched' null model. Lower values result in better matching but increase the number of iterations required. Default is 0.05. See resample_geneSetList_matching_coexp for more details.
matchcoexp_max_iter: An integer specifying the maximum number of iterations for co-expression matching in the 'coexp_matched' null model. Default is 1,000,000. See resample_geneSetList_matching_coexp for more details.
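
The cor_method and aggre_method arguments accept custom functions taking (gene_data, brain_data) and (geneList, geneSet) respectively. The sketch below shows what such functions might look like. The function names `kendall_cor` and `trimmed_mean_aggre`, the column name `thickness`, and the assumed return shapes (a genes-by-brain-columns matrix for the correlation, a single number for the aggregation, with geneList treated as a named numeric vector) are hypothetical and should be checked against the package before use.

```r
# Hypothetical custom correlation: Kendall's tau between every gene
# and every brain_data column. Assumed to return a genes x columns matrix.
kendall_cor <- function(gene_data, brain_data) {
  stopifnot(identical(rownames(gene_data), rownames(brain_data)))
  cor(as.matrix(gene_data), as.matrix(brain_data), method = "kendall")
}

# Hypothetical custom aggregation: 10% trimmed mean of the associations
# of the genes in geneSet. geneList is assumed to be a named numeric vector.
trimmed_mean_aggre <- function(geneList, geneSet) {
  vals <- geneList[intersect(geneSet, names(geneList))]
  mean(vals, trim = 0.1)
}

# Toy data to exercise both functions outside the package
set.seed(42)
regions <- paste0("region", 1:20)
gene_data <- as.data.frame(matrix(
  rnorm(20 * 5), nrow = 20,
  dimnames = list(regions, paste0("gene", 1:5))
))
brain_data <- data.frame(thickness = rnorm(20), row.names = regions)

assoc <- kendall_cor(gene_data, brain_data)
# "gene9" is absent from the data and is dropped by intersect()
score <- trimmed_mean_aggre(assoc[, 1], c("gene1", "gene3", "gene9"))
score
```

If these shapes match what the package expects, the functions would be supplied as cor_method = kendall_cor and aggre_method = trimmed_mean_aggre, in which case the documentation above states that the method names are recorded as 'custom'.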

Value

A gseaResult object containing the enrichment results.