Skip to contents

Introduction

This tutorial demonstrates how to perform enrichment analysis of individual-level imaging-derived phenotypes (IDPs) using the BrainEnrich package.

1. Prepare Data for the Analysis

In this section, we will prepare the data needed for individual-level enrichment analysis.

Note that for demonstration purpose, the data used here is simulated from the HCP dataset.

1.1. Load sample data frame. The data is simulated from HCP data.

The data is simulated from HCP data. Let’s load and inspect it.

data(sim_hcp)
str(sim_hcp)
#> 'data.frame':    100 obs. of  37 variables:
#>  $ L_bankssts                : num  2.56 2.48 2.47 2.63 2.48 ...
#>  $ L_caudalanteriorcingulate : num  2.56 2.7 2.63 2.48 2.43 ...
#>  $ L_caudalmiddlefrontal     : num  2.52 2.75 2.57 2.67 2.68 ...
#>  $ L_cuneus                  : num  1.7 1.92 1.87 1.87 1.85 ...
#>  $ L_entorhinal              : num  3.51 3.38 3.38 3.15 2.63 ...
#>  $ L_fusiform                : num  2.57 2.64 2.7 2.55 2.56 ...
#>  $ L_inferiorparietal        : num  2.42 2.51 2.51 2.56 2.54 ...
#>  $ L_inferiortemporal        : num  2.55 2.71 2.76 2.8 2.76 ...
#>  $ L_isthmuscingulate        : num  2.18 2.54 2.43 2.41 2.47 ...
#>  $ L_lateraloccipital        : num  2 2.25 2.21 2.17 2.29 ...
#>  $ L_lateralorbitofrontal    : num  2.79 2.77 2.7 2.84 2.67 ...
#>  $ L_lingual                 : num  1.84 2.07 2.11 2.12 2.07 ...
#>  $ L_medialorbitofrontal     : num  2.54 2.43 2.41 2.68 2.45 ...
#>  $ L_middletemporal          : num  2.75 2.64 2.8 2.9 2.81 ...
#>  $ L_parahippocampal         : num  2.88 2.47 2.68 3.1 2.98 ...
#>  $ L_paracentral             : num  2.32 2.43 2.38 2.56 2.47 ...
#>  $ L_parsopercularis         : num  2.63 2.67 2.47 2.57 2.72 ...
#>  $ L_parsorbitalis           : num  2.86 2.74 3.08 2.78 2.61 ...
#>  $ L_parstriangularis        : num  2.44 2.39 2.54 2.51 2.47 ...
#>  $ L_pericalcarine           : num  1.3 1.87 1.86 1.82 1.61 ...
#>  $ L_postcentral             : num  1.74 2.18 2.14 2.42 2.13 ...
#>  $ L_posteriorcingulate      : num  2.34 2.72 2.32 2.38 2.52 ...
#>  $ L_precentral              : num  2.54 2.7 2.69 2.79 2.64 ...
#>  $ L_precuneus               : num  2.26 2.35 2.31 2.34 2.36 ...
#>  $ L_rostralanteriorcingulate: num  3.06 2.72 2.91 2.57 2.83 ...
#>  $ L_rostralmiddlefrontal    : num  2.34 2.42 2.39 2.57 2.49 ...
#>  $ L_superiorfrontal         : num  2.81 2.83 2.71 2.77 2.85 ...
#>  $ L_superiorparietal        : num  2.06 2.24 2.16 2.35 2.29 ...
#>  $ L_superiortemporal        : num  2.73 2.9 2.91 3.09 2.86 ...
#>  $ L_supramarginal           : num  2.42 2.48 2.62 2.71 2.62 ...
#>  $ L_frontalpole             : num  2.8 2.75 3.1 2.87 3.29 ...
#>  $ L_temporalpole            : num  3.52 3.44 3.89 3.67 2.87 ...
#>  $ L_transversetemporal      : num  2.16 2.64 2.71 2.58 2.47 ...
#>  $ L_insula                  : num  2.83 3.23 2.95 3.16 3.04 ...
#>  $ Age                       : int  29 28 36 29 28 27 29 26 34 31 ...
#>  $ Sex                       : Factor w/ 2 levels "0","1": 2 1 1 1 1 2 1 1 2 2 ...
#>  $ BMI                       : num  37.2 23.8 30.1 32.1 25.7 ...

1.2. prepare brain_data for scoring

Next, we’ll prepare the brain data by selecting the relevant columns and transforming the data for scoring.

brain_data <- dplyr::select(sim_hcp, starts_with("L_")) %>% t()
colnames(brain_data) <- paste0("sub-", 1:ncol(brain_data))
str(brain_data)
#>  num [1:34, 1:100] 2.56 2.56 2.52 1.7 3.51 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:34] "L_bankssts" "L_caudalanteriorcingulate" "L_caudalmiddlefrontal" "L_cuneus" ...
#>   ..$ : chr [1:100] "sub-1" "sub-2" "sub-3" "sub-4" ...

1.3. Load gene expression data for the Analysis

We will now load gene expression data for the analysis. This data is based on a specific brain atlas (Desikan) and includes gene expression levels (correlation across donnors > 0.4) for the left hemisphere.

gene_data <- get_geneExp(atlas = "desikan", rdonor = "r0.4", hem = "L")
str(gene_data)
#>  num [1:34, 1:6513] 0.469 0.607 0.512 0.341 0.664 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:34] "L_bankssts" "L_caudalanteriorcingulate" "L_caudalmiddlefrontal" "L_cuneus" ...
#>   ..$ : chr [1:6513] "A1BG" "A1BG-AS1" "AACS" "AADAT" ...

1.4. Load SynGO for the Analysis

We will load SynGO terms that will be used in the enrichment analysis. This data represents gene sets associated with various molecular functions.

annoData <- get_annoData(type = "SynGO")
geneSetList <- get_geneSetList(annoData)
length(geneSetList)
#> [1] 302
print(sprintf("Number of SynGO terms: %d", length(geneSetList)))
#> [1] "Number of SynGO terms: 302"

Filter the geneSetList (Optional)

Filtering the gene sets by size is optional and is shown here for demonstration purposes. This step is embedded in the brainenrich/brainscore function.

selected.gs <- filter_geneSetList(bg_genes = colnames(gene_data), geneSetList = geneSetList, minGSSize = 20, maxGSSize = 200)
print(sprintf("%d MF terms have gene size ranging between 20 and 200", length(selected.gs)))

2. Get gene set scores using brainscore

In this section, we will run the individual enrichment analysis using the prepared data.

2.1. get the raw (empirical) score of the brain_data (null_model='none')

We calculate the gene set scores for the brain data without using any null model.

gsScore.raw <- brainscore(
  brain_data = brain_data,
  gene_data = gene_data,
  annoData = annoData,
  cor_method = "pearson",
  aggre_method = "mean",
  n_cores = 0,
  minGSSize = 20,
  maxGSSize = 200,
  null_model = "none"
)
str(head(gsScore.raw))
#> List of 6
#>  $ GO:0007268: num [1:100] 0.0588 0.0278 -0.013 0.0465 0.0445 ...
#>  $ GO:0007416: num [1:100] 0.1185 0.0785 0.0442 0.1009 0.088 ...
#>  $ GO:0008021: num [1:100] 0.25 0.216 0.148 0.223 0.216 ...
#>  $ GO:0016079: num [1:100] -0.0939 -0.0846 -0.1204 -0.0569 -0.0439 ...
#>  $ GO:0030285: num [1:100] 0.1037 0.1043 0.0359 0.1126 0.1658 ...
#>  $ GO:0030672: num [1:100] 0.277 0.242 0.174 0.247 0.252 ...

The gsScore.raw is a list of molecular terms. For each molecular term, you can find individual gene set scores and use them for downstream analyses (e.g., correlation analysis, machine learning, clustering etc.).

2.2. Get the null score of the brain_data (null_model='spin_brain')

Next, we will calculate the null scores using the ‘spin_brain’ null model.

gsScore.spin <- brainscore(
  brain_data = brain_data,
  gene_data = gene_data,
  annoData = annoData,
  cor_method = "pearson",
  aggre_method = "mean",
  n_cores = 0,
  minGSSize = 20,
  maxGSSize = 200,
  null_model = "spin_brain",
  n_perm = 5,
  perm_id = perm_id_dk_lh_5000
)
str(head(gsScore.spin, 5), 1)
#> List of 5
#>  $ null_1:List of 55
#>  $ null_2:List of 55
#>  $ null_3:List of 55
#>  $ null_4:List of 55
#>  $ null_5:List of 55

The result is a list of lists. Each sublist contains null scores for the molecular terms (i.e. SynGO terms). This is useful for permutation tests, where you compare the observed results (e.g., correlation) with your dependent variable between empirical gsScore and null model gsScore.

2.3. Get the null score of the resample_gene (set null_model='resample_gene')

gsScore.resample <- brainscore(
  brain_data = brain_data,
  gene_data = gene_data,
  annoData = annoData,
  cor_method = "pearson",
  aggre_method = "mean",
  n_cores = 0,
  minGSSize = 20,
  maxGSSize = 200,
  null_model = "resample_gene",
  n_perm = 5
)
str(head(gsScore.resample, 5), 1)
#> List of 5
#>  $ null_1:List of 55
#>  $ null_2:List of 55
#>  $ null_3:List of 55
#>  $ null_4:List of 55
#>  $ null_5:List of 55

The result is also a list of lists. Each sublist contains null scores for the molecular terms (i.e. SynGO terms). This is useful for permutation tests, where you compare the observed results (e.g., correlation) with your dependent variable between empirical gsScore and null model gsScore.

3. Perform linear regression between the gsScore and predictors using brainscore.lm_test

The function brainscore.lm_test is designed to perform linear regression tests between the gsScore and the brain data as follows: gsScore ~ predictor + covariates. This would cover scenarios where gsScores will be used. The predictor can be either a single numeric variable or a single two-level factor variable. The covariates can be a data frame with multiple numeric variables or factors. The function will first get the emprical statistics (i.e., standardized beta corresponding to the predictor) and then test that against the null model (e.g., spin_brain) to get the p-value.

Normal setup

cov_df <- sim_hcp %>% dplyr::select(Age, Sex)
pred_df <- sim_hcp %>% dplyr::select(BMI)
res <- brainscore.lm_test(
  pred_df = pred_df,
  cov_df = cov_df,
  brain_data = brain_data,
  gene_data = gene_data,
  annoData = annoData,
  cor_method = "pearson",
  aggre_method = "mean",
  n_cores = 0,
  minGSSize = 50,
  maxGSSize = 200,
  null_model = "spin_brain",
  n_perm = 5,
  perm_id = perm_id_dk_lh_5000,
  pvalueCutoff = 0.8
)

For demonstration purposes, we only performed 10 permutations and set minGSSize to 50. In practice, you may need to perform more permutations (e.g., 5000) for a more reliable p-value.

Setup with precomputed null scores

As an alternative to the previous step, you can use precomputed null scores to perform linear regression tests between the gsScore and the brain data. Generating null brain scores can be time consuming, especially when the number of permutations is high. Therefore, we recommend using precomputed null scores for the analysis. Here we used the null scores generated in Section 2.2.

res1 <- brainscore.lm_test(
  pred_df = pred_df,
  cov_df = cov_df,
  brain_data = brain_data,
  gene_data = gene_data,
  annoData = annoData,
  gsScoreList.null = gsScore.spin,
  cor_method = "pearson",
  aggre_method = "mean",
  n_cores = 0,
  minGSSize = 20,
  maxGSSize = 200,
  null_model = "spin_brain",
  n_perm = 5,
  pvalueCutoff = 0.8
)

The attribute of the gsScoreList.null should be the same as your input of the function for these variables: cor_method, aggre_method, minGSSize, maxGSSize, null_model, n_perm, if not, that means the gsScoreList.null is not usable for the currrent setup.

Note that the output of the brainscore.lm_test function is gseaResult object. Some methods and functions are available for this object. For example, you can convert it to dataframe easily and visualize the results (see section below).

res.df <- as.data.frame(res)
# make scollable table
kable(res.df, format = "html", escape = FALSE) %>%
  kable_styling("striped", full_width = FALSE) %>%
  scroll_box(width = "100%", height = "400px")
ID Description setSize Predictor Unstandardized_Coefficient Standard_Error t_Value Standardized_Coefficient CI_95_Lower CI_95_Upper pvalue p.adjust qvalue null_model core_enrichment
GO:0007268 GO:0007268 chemical synaptic transmission (GO:0007268) 109 BMI 0.0013914 0.0004921 2.827405 0.2746981 0.0818458 0.4675503 0.1666667 0.1666667 0.1666667 spin_brain AKAP12/PDE4A/NCAN/PACSIN2/DLGAP3/CAMKV/RPS6KB1/CHRNA4/GRM8/NR3C2/NRN1/SYT12/OPRM1/NRXN2/PLCB1/INA/NEFL/NEFH/ATG5
GO:0007416 GO:0007416 synapse assembly (GO:0007416) 92 BMI 0.0010665 0.0004921 2.167452 0.2148880 0.0180905 0.4116856 0.1666667 0.1666667 0.1666667 spin_brain CADM1/EFNB2/SDC2/LRRTM1/NRG1/FARP1/GRID2/LRRC4B/GPC4/NUMBL/CPEB3/PALM/PDLIM5/SYNDIG1/DCLK1/DOCK4/FLRT2/ASIC1
GO:0008021 GO:0008021 synaptic vesicle (GO:0008021) 82 BMI 0.0018788 0.0007204 2.607986 0.2542290 0.0607308 0.4477271 0.1666667 0.1666667 0.1666667 spin_brain TRIM9/BRSK1/DTNBP1/SH3GL2/GNAO1/GNB2/VAMP1/MAL2/SLC17A8/OPRK1/SV2B/SYT12/ATP6V0A4
GO:0016079 GO:0016079 synaptic vesicle exocytosis (GO:0016079) 58 BMI 0.0009755 0.0004370 2.232404 0.2221946 0.0246260 0.4197632 0.1666667 0.1666667 0.1666667 spin_brain UNC13C/STX1B/MCTP1/DOC2A/PRKCG/GIT2/SV2B/NPY1R/BRSK1
GO:0030672 GO:0030672 synaptic vesicle membrane (GO:0030672) 69 BMI 0.0019300 0.0007385 2.613354 0.2545216 0.0611986 0.4478446 0.1666667 0.1666667 0.1666667 spin_brain CLTA/SH3GL2/GNAO1/GNB2/ATP6V1G2/VAMP1/MAL2/SLC17A8/OPRK1/SV2B/SYT9/SYT12/ATP6V0A4
GO:0042734 GO:0042734 presynaptic membrane (GO:0042734) 103 BMI 0.0014652 0.0005742 2.551639 0.2506532 0.0556638 0.4456425 0.1666667 0.1666667 0.1666667 spin_brain SNPH/KCTD16/KCTD12/GNAO1/GNB5/RGS7/CTNNA2/EPHB2/ADCY8/EFNB2/CNR1/CACNA1C/NRXN1/CHRNA4/HTR1A/VAPB/DRD1/IGSF8/ABCC8
GO:0045211 GO:0045211 postsynaptic membrane (GO:0045211) 100 BMI 0.0012215 0.0004954 2.465768 0.2427001 0.0473225 0.4380776 0.1666667 0.1666667 0.1666667 spin_brain NTRK2/KCTD12/KCTD16/GNAO1/TENM2/FBXO2/KCNA2/EPHB2/FXYD6/SLC8A1/ADRA2C/GRIA3/ABHD6/KCNN2/HCN1/SDC2/CADM2
GO:0048786 GO:0048786 presynaptic active zone (GO:0048786) 75 BMI 0.0014962 0.0005709 2.620828 0.2579514 0.0625820 0.4533208 0.1666667 0.1666667 0.1666667 spin_brain FLOT1/PI4K2A/ERC1/ERC2/NR3C2/FLOT2/CACNA2D2/SCN8A/GRM7/GRIA1/GABRA2/KCNJ8/GRM8/HCN1/GRIA3/NRXN2
GO:0050804 GO:0050804 modulation of chemical synaptic transmission (GO:0050804) 61 BMI 0.0010765 0.0004931 2.183228 0.2157757 0.0195931 0.4119582 0.1666667 0.1666667 0.1666667 spin_brain ADCY8/AKAP12/PDE4A/PPP1R9A/DGKB/NCAN/SHANK1/CAMKV/RPS6KB1
GO:0051963 GO:0051963 regulation of synapse assembly (GO:0051963) 67 BMI 0.0009248 0.0004219 2.191853 0.2178028 0.0205564 0.4150492 0.1666667 0.1666667 0.1666667 spin_brain SDC2/LRRC4B/NUMBL/CPEB3/PALM/PDLIM5/DCLK1/DOCK4/ASIC1
GO:0098839 GO:0098839 postsynaptic density membrane (GO:0098839) 92 BMI 0.0014867 0.0005539 2.683837 0.2632578 0.0685505 0.4579650 0.1666667 0.1666667 0.1666667 spin_brain RGS9/MDGA1/ADCY1/EFNB2/SCN8A/ADRA2C/DCC/CACNG3/GRID2/LRRC4B/PTPRT/SHISA6/GRIA3/SEMA4F/TMEM108/ASIC1
GO:0099055 GO:0099055 integral component of postsynaptic membrane (GO:0099055) 76 BMI 0.0015037 0.0006345 2.369779 0.2338337 0.0379689 0.4296984 0.1666667 0.1666667 0.1666667 spin_brain EPHB2/CACNA1H/FXYD6/SLC6A6/GRIA3/ITGB1/DRD1/KCNN2/ADGRL3/CADM2/CACNA1A
GO:0099056 GO:0099056 integral component of presynaptic membrane (GO:0099056) 77 BMI 0.0017934 0.0007413 2.419248 0.2385934 0.0428284 0.4343584 0.1666667 0.1666667 0.1666667 spin_brain EPHB2/KCNQ5/ADCY8/EFNB2/CNR1/CACNA1C/CHRNA4/HTR1A/VAPB/IGSF8/ABCC8
GO:0099061 GO:0099061 integral component of postsynaptic density membrane (GO:0099061) 75 BMI 0.0015592 0.0005803 2.687153 0.2633527 0.0688156 0.4578898 0.1666667 0.1666667 0.1666667 spin_brain ADCY1/EFNB2/SCN8A/ADRA2C/DCC/GRID2/LRRC4B/LRRC7/SHISA6/GRIA3/SEMA4F/TMEM108/ASIC1
GO:0099072 GO:0099072 regulation of postsynaptic membrane neurotransmitter receptor levels (GO:0099072) 91 BMI 0.0013138 0.0004926 2.667087 0.2605179 0.0666270 0.4544088 0.1666667 0.1666667 0.1666667 spin_brain EPS8/CPT1C/CACNG2/ZDHHC5/AGAP3/ATG5/GRID2/ADAM22/TNIK/KALRN/GPC4/CACNA2D2/EPB41L3/GABARAP/NEDD4/OPHN1/RNF220
GO:0099504 GO:0099504 synaptic vesicle cycle (GO:0099504) 124 BMI 0.0015391 0.0005665 2.716692 0.2658276 0.0715974 0.4600578 0.1666667 0.1666667 0.1666667 spin_brain ATG7/NRXN1/ARHGDIA/UNC13C/MCTP1/ERC2/FBXL20/STXBP5L/SV2B/SLC4A8/EFR3A/BRSK1/SLC17A8/CLCN3/ATP6V1C1/ATP6V0A4/STON2/AP3M2/OPHN1/CXADR
GO:0099536 GO:0099536 synaptic signaling (GO:0099536) 138 BMI 0.0013507 0.0004851 2.784359 0.2704906 0.0776564 0.4633247 0.1666667 0.1666667 0.1666667 spin_brain PLCB1/ABHD6/TENM2/CADM1/FARP1/AKAP12/PDE4A/PLPPR4/PTPRA/MAPK3/SHANK1/CAMKV/RPS6KB1/GSK3B/CHRNA4/NR3C2/NRN1/SYT12/NRXN2/KCNK2/INA/NEFL/NEFH/ATG5
GO:0099537 GO:0099537 trans-synaptic signaling (GO:0099537) 131 BMI 0.0013964 0.0004943 2.824992 0.2742737 0.0815549 0.4669925 0.1666667 0.1666667 0.1666667 spin_brain PLCB1/ABHD6/CADM1/FARP1/AKAP12/PDE4A/PLPPR4/PTPRA/NCAN/MAPK3/SHANK1/CAMKV/RPS6KB1/HTR1A/CHRNA4/NR3C2/NRN1/SYT12/NRXN2/KCNK2/INA/NEFL/NEFH/ATG5
SYNGO:postsynprocess SYNGO:postsynprocess process in the postsynapse 158 BMI 0.0012515 0.0004827 2.592633 0.2540571 0.0595447 0.4485695 0.1666667 0.1666667 0.1666667 spin_brain NR3C2/HCN1/ANO6/GABRA2/GRIN2A/CHRNA3/GRIA1/GABRA1/GABRG2/GRIA3/KCNA2/KCNK1/CACNG4/EFNB2/CPT1C/AGAP3/CAMK2G/ATG5/GRID2/NSG2/ADAM22/KALRN/GPC4/GABARAP/NEDD4/OPHN1/RNF220/SYT6
SYNGO:presynprocess SYNGO:presynprocess process in the presynapse 179 BMI 0.0016795 0.0006074 2.764976 0.2705764 0.0763288 0.4648240 0.1666667 0.1666667 0.1666667 spin_brain SV2B/ERC2/NPY/TSPOAP1/CACNA1A/CACNB2/CACNB4/GRIA3/GABRA2/GABRB1/KCTD12/KCTD16/NRXN1/TBC1D24/UNC13C/STX1B/MCTP1/RELN/PRKCA/STXBP5L/EFR3A/BRSK1/SLC17A8/CLCN3/ATP6V1C1/ATP6V1G2/ATP6V0A4/RAB27B/AP3M2/LRRK2/SYT10/RAB10/SIPA1L2

4. Visualization

This section covers the visualization of the results. Note: the upsetplot, heatplot available for brainenrich resutls are not applicable for these results.

4.1. Gene-Concept Network

We will visualize the gene-concept network.

geneList <- res@geneList
showCategory <- 10
showID <- res@result$ID[1:showCategory]
show.geneList <- geneList[showID]
color_palette <- colorRampPalette(c("#0197b2", "white", "orange"))(100)
breaks <- seq(-5, 5, length.out = 101)
geneList_colors <- color_palette[cut(show.geneList, breaks = breaks, labels = FALSE)]
named_colors <- setNames(geneList_colors, names(show.geneList))
cnetplot(res,
  layout = "kk",
  color.params = list(category = named_colors),
  showCategory = showCategory, cex.params = list(
    category_node = 1, gene_node = 0.75,
    category_label = 1.2, gene_label = 0.6
  ),
) + scale_color_gradientn(colors = color_palette, limits = c(-5, 5), name = "t value")

4.2. Dot Plot

We will create a dot plot to visualize the results.

dotplot(res, x = "t_Value", label_format = 50, showCategory = 30) +
  xlab("Correlation") + scale_fill_gradient(
    high = "#0197b2", low = "orange",
    space = "Lab", name = "pval"
  )