Individual-level enrichment analysis (brainscore/brainscore.lm_test)
brainscore_demo.Rmd
Introduction
This tutorial demonstrates how to perform enrichment analysis of individual-level imaging-derived phenotypes (IDPs) using the BrainEnrich package.
1. Prepare Data for the Analysis
In this section, we will prepare the data needed for individual-level enrichment analysis.
Note that for demonstration purpose, the data used here is simulated from the HCP dataset.
1.1. Load sample data frame. The data is simulated from HCP data.
The data is simulated from HCP data. Let’s load and inspect it.
data(sim_hcp)
str(sim_hcp)
#> 'data.frame': 100 obs. of 37 variables:
#> $ L_bankssts : num 2.56 2.48 2.47 2.63 2.48 ...
#> $ L_caudalanteriorcingulate : num 2.56 2.7 2.63 2.48 2.43 ...
#> $ L_caudalmiddlefrontal : num 2.52 2.75 2.57 2.67 2.68 ...
#> $ L_cuneus : num 1.7 1.92 1.87 1.87 1.85 ...
#> $ L_entorhinal : num 3.51 3.38 3.38 3.15 2.63 ...
#> $ L_fusiform : num 2.57 2.64 2.7 2.55 2.56 ...
#> $ L_inferiorparietal : num 2.42 2.51 2.51 2.56 2.54 ...
#> $ L_inferiortemporal : num 2.55 2.71 2.76 2.8 2.76 ...
#> $ L_isthmuscingulate : num 2.18 2.54 2.43 2.41 2.47 ...
#> $ L_lateraloccipital : num 2 2.25 2.21 2.17 2.29 ...
#> $ L_lateralorbitofrontal : num 2.79 2.77 2.7 2.84 2.67 ...
#> $ L_lingual : num 1.84 2.07 2.11 2.12 2.07 ...
#> $ L_medialorbitofrontal : num 2.54 2.43 2.41 2.68 2.45 ...
#> $ L_middletemporal : num 2.75 2.64 2.8 2.9 2.81 ...
#> $ L_parahippocampal : num 2.88 2.47 2.68 3.1 2.98 ...
#> $ L_paracentral : num 2.32 2.43 2.38 2.56 2.47 ...
#> $ L_parsopercularis : num 2.63 2.67 2.47 2.57 2.72 ...
#> $ L_parsorbitalis : num 2.86 2.74 3.08 2.78 2.61 ...
#> $ L_parstriangularis : num 2.44 2.39 2.54 2.51 2.47 ...
#> $ L_pericalcarine : num 1.3 1.87 1.86 1.82 1.61 ...
#> $ L_postcentral : num 1.74 2.18 2.14 2.42 2.13 ...
#> $ L_posteriorcingulate : num 2.34 2.72 2.32 2.38 2.52 ...
#> $ L_precentral : num 2.54 2.7 2.69 2.79 2.64 ...
#> $ L_precuneus : num 2.26 2.35 2.31 2.34 2.36 ...
#> $ L_rostralanteriorcingulate: num 3.06 2.72 2.91 2.57 2.83 ...
#> $ L_rostralmiddlefrontal : num 2.34 2.42 2.39 2.57 2.49 ...
#> $ L_superiorfrontal : num 2.81 2.83 2.71 2.77 2.85 ...
#> $ L_superiorparietal : num 2.06 2.24 2.16 2.35 2.29 ...
#> $ L_superiortemporal : num 2.73 2.9 2.91 3.09 2.86 ...
#> $ L_supramarginal : num 2.42 2.48 2.62 2.71 2.62 ...
#> $ L_frontalpole : num 2.8 2.75 3.1 2.87 3.29 ...
#> $ L_temporalpole : num 3.52 3.44 3.89 3.67 2.87 ...
#> $ L_transversetemporal : num 2.16 2.64 2.71 2.58 2.47 ...
#> $ L_insula : num 2.83 3.23 2.95 3.16 3.04 ...
#> $ Age : int 29 28 36 29 28 27 29 26 34 31 ...
#> $ Sex : Factor w/ 2 levels "0","1": 2 1 1 1 1 2 1 1 2 2 ...
#> $ BMI : num 37.2 23.8 30.1 32.1 25.7 ...
1.2. prepare brain_data for scoring
Next, we’ll prepare the brain data by selecting the relevant columns and transforming the data for scoring.
brain_data <- dplyr::select(sim_hcp, starts_with("L_")) %>% t()
colnames(brain_data) <- paste0("sub-", 1:ncol(brain_data))
str(brain_data)
#> num [1:34, 1:100] 2.56 2.56 2.52 1.7 3.51 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:34] "L_bankssts" "L_caudalanteriorcingulate" "L_caudalmiddlefrontal" "L_cuneus" ...
#> ..$ : chr [1:100] "sub-1" "sub-2" "sub-3" "sub-4" ...
1.3. Load gene expression data for the Analysis
We will now load gene expression data for the analysis. This data is based on a specific brain atlas (Desikan) and includes gene expression levels (correlation across donnors > 0.4) for the left hemisphere.
gene_data <- get_geneExp(atlas = "desikan", rdonor = "r0.4", hem = "L")
str(gene_data)
#> num [1:34, 1:6513] 0.469 0.607 0.512 0.341 0.664 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:34] "L_bankssts" "L_caudalanteriorcingulate" "L_caudalmiddlefrontal" "L_cuneus" ...
#> ..$ : chr [1:6513] "A1BG" "A1BG-AS1" "AACS" "AADAT" ...
1.4. Load SynGO for the Analysis
We will load SynGO terms that will be used in the enrichment analysis. This data represents gene sets associated with various molecular functions.
annoData <- get_annoData(type = "SynGO")
geneSetList <- get_geneSetList(annoData)
length(geneSetList)
#> [1] 302
print(sprintf("Number of SynGO terms: %d", length(geneSetList)))
#> [1] "Number of SynGO terms: 302"
Filter the geneSetList (Optional)
Filtering the gene sets by size is optional and is shown here for demonstration purposes. This step is embedded in the brainenrich/brainscore function.
selected.gs <- filter_geneSetList(bg_genes = colnames(gene_data), geneSetList = geneSetList, minGSSize = 20, maxGSSize = 200)
print(sprintf("%d MF terms have gene size ranging between 20 and 200", length(selected.gs)))
2. Get gene set scores using brainscore
In this section, we will run the individual enrichment analysis using the prepared data.
2.1. get the raw (empirical) score of the brain_data (null_model='none'
)
We calculate the gene set scores for the brain data without using any null model.
gsScore.raw <- brainscore(
brain_data = brain_data,
gene_data = gene_data,
annoData = annoData,
cor_method = "pearson",
aggre_method = "mean",
n_cores = 0,
minGSSize = 20,
maxGSSize = 200,
null_model = "none"
)
str(head(gsScore.raw))
#> List of 6
#> $ GO:0007268: num [1:100] 0.0588 0.0278 -0.013 0.0465 0.0445 ...
#> $ GO:0007416: num [1:100] 0.1185 0.0785 0.0442 0.1009 0.088 ...
#> $ GO:0008021: num [1:100] 0.25 0.216 0.148 0.223 0.216 ...
#> $ GO:0016079: num [1:100] -0.0939 -0.0846 -0.1204 -0.0569 -0.0439 ...
#> $ GO:0030285: num [1:100] 0.1037 0.1043 0.0359 0.1126 0.1658 ...
#> $ GO:0030672: num [1:100] 0.277 0.242 0.174 0.247 0.252 ...
The gsScore.raw is a list of molecular terms. For each molecular term, you can find individual gene set scores and use them for downstream analyses (e.g., correlation analysis, machine learning, clustering etc.).
2.2. Get the null score of the brain_data (null_model='spin_brain'
)
Next, we will calculate the null scores using the ‘spin_brain’ null model.
gsScore.spin <- brainscore(
brain_data = brain_data,
gene_data = gene_data,
annoData = annoData,
cor_method = "pearson",
aggre_method = "mean",
n_cores = 0,
minGSSize = 20,
maxGSSize = 200,
null_model = "spin_brain",
n_perm = 5,
perm_id = perm_id_dk_lh_5000
)
str(head(gsScore.spin, 5), 1)
#> List of 5
#> $ null_1:List of 55
#> $ null_2:List of 55
#> $ null_3:List of 55
#> $ null_4:List of 55
#> $ null_5:List of 55
The result is a list of lists. Each sublist contains null scores for the molecular terms (i.e. SynGO terms). This is useful for permutation tests, where you compare the observed results (e.g., correlation) with your dependent variable between empirical gsScore and null model gsScore.
2.3. Get the null score of the resample_gene (set null_model='resample_gene'
)
gsScore.resample <- brainscore(
brain_data = brain_data,
gene_data = gene_data,
annoData = annoData,
cor_method = "pearson",
aggre_method = "mean",
n_cores = 0,
minGSSize = 20,
maxGSSize = 200,
null_model = "resample_gene",
n_perm = 5
)
str(head(gsScore.resample, 5), 1)
#> List of 5
#> $ null_1:List of 55
#> $ null_2:List of 55
#> $ null_3:List of 55
#> $ null_4:List of 55
#> $ null_5:List of 55
The result is also a list of lists. Each sublist contains null scores for the molecular terms (i.e. SynGO terms). This is useful for permutation tests, where you compare the observed results (e.g., correlation) with your dependent variable between empirical gsScore and null model gsScore.
3. Perform linear regression between the gsScore and predictors using brainscore.lm_test
The function brainscore.lm_test is designed to perform linear regression tests between the gsScore and the brain data as follows: gsScore ~ predictor + covariates
. This would cover scenarios where gsScores will be used. The predictor can be either a single numeric variable or a single two-level factor variable. The covariates can be a data frame with multiple numeric variables or factors. The function will first get the emprical statistics (i.e., standardized beta corresponding to the predictor) and then test that against the null model (e.g., spin_brain) to get the p-value.
Normal setup
cov_df <- sim_hcp %>% dplyr::select(Age, Sex)
pred_df <- sim_hcp %>% dplyr::select(BMI)
res <- brainscore.lm_test(
pred_df = pred_df,
cov_df = cov_df,
brain_data = brain_data,
gene_data = gene_data,
annoData = annoData,
cor_method = "pearson",
aggre_method = "mean",
n_cores = 0,
minGSSize = 50,
maxGSSize = 200,
null_model = "spin_brain",
n_perm = 5,
perm_id = perm_id_dk_lh_5000,
pvalueCutoff = 0.8
)
For demonstration purposes, we only performed 10 permutations and set minGSSize to 50. In practice, you may need to perform more permutations (e.g., 5000) for a more reliable p-value.
Setup with precomputed null scores
As an alternative to the previous step, you can use precomputed null scores to perform linear regression tests between the gsScore and the brain data. Generating null brain scores can be time consuming, especially when the number of permutations is high. Therefore, we recommend using precomputed null scores for the analysis. Here we used the null scores generated in Section 2.2.
res1 <- brainscore.lm_test(
pred_df = pred_df,
cov_df = cov_df,
brain_data = brain_data,
gene_data = gene_data,
annoData = annoData,
gsScoreList.null = gsScore.spin,
cor_method = "pearson",
aggre_method = "mean",
n_cores = 0,
minGSSize = 20,
maxGSSize = 200,
null_model = "spin_brain",
n_perm = 5,
pvalueCutoff = 0.8
)
The attribute of the gsScoreList.null should be the same as your input of the function for these variables: cor_method, aggre_method, minGSSize, maxGSSize, null_model, n_perm, if not, that means the gsScoreList.null is not usable for the currrent setup.
Note that the output of the brainscore.lm_test function is gseaResult object. Some methods and functions are available for this object. For example, you can convert it to dataframe easily and visualize the results (see section below).
res.df <- as.data.frame(res)
# make scollable table
kable(res.df, format = "html", escape = FALSE) %>%
kable_styling("striped", full_width = FALSE) %>%
scroll_box(width = "100%", height = "400px")
ID | Description | setSize | Predictor | Unstandardized_Coefficient | Standard_Error | t_Value | Standardized_Coefficient | CI_95_Lower | CI_95_Upper | pvalue | p.adjust | qvalue | null_model | core_enrichment | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GO:0007268 | GO:0007268 | chemical synaptic transmission (GO:0007268) | 109 | BMI | 0.0013914 | 0.0004921 | 2.827405 | 0.2746981 | 0.0818458 | 0.4675503 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | AKAP12/PDE4A/NCAN/PACSIN2/DLGAP3/CAMKV/RPS6KB1/CHRNA4/GRM8/NR3C2/NRN1/SYT12/OPRM1/NRXN2/PLCB1/INA/NEFL/NEFH/ATG5 |
GO:0007416 | GO:0007416 | synapse assembly (GO:0007416) | 92 | BMI | 0.0010665 | 0.0004921 | 2.167452 | 0.2148880 | 0.0180905 | 0.4116856 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | CADM1/EFNB2/SDC2/LRRTM1/NRG1/FARP1/GRID2/LRRC4B/GPC4/NUMBL/CPEB3/PALM/PDLIM5/SYNDIG1/DCLK1/DOCK4/FLRT2/ASIC1 |
GO:0008021 | GO:0008021 | synaptic vesicle (GO:0008021) | 82 | BMI | 0.0018788 | 0.0007204 | 2.607986 | 0.2542290 | 0.0607308 | 0.4477271 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | TRIM9/BRSK1/DTNBP1/SH3GL2/GNAO1/GNB2/VAMP1/MAL2/SLC17A8/OPRK1/SV2B/SYT12/ATP6V0A4 |
GO:0016079 | GO:0016079 | synaptic vesicle exocytosis (GO:0016079) | 58 | BMI | 0.0009755 | 0.0004370 | 2.232404 | 0.2221946 | 0.0246260 | 0.4197632 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | UNC13C/STX1B/MCTP1/DOC2A/PRKCG/GIT2/SV2B/NPY1R/BRSK1 |
GO:0030672 | GO:0030672 | synaptic vesicle membrane (GO:0030672) | 69 | BMI | 0.0019300 | 0.0007385 | 2.613354 | 0.2545216 | 0.0611986 | 0.4478446 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | CLTA/SH3GL2/GNAO1/GNB2/ATP6V1G2/VAMP1/MAL2/SLC17A8/OPRK1/SV2B/SYT9/SYT12/ATP6V0A4 |
GO:0042734 | GO:0042734 | presynaptic membrane (GO:0042734) | 103 | BMI | 0.0014652 | 0.0005742 | 2.551639 | 0.2506532 | 0.0556638 | 0.4456425 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | SNPH/KCTD16/KCTD12/GNAO1/GNB5/RGS7/CTNNA2/EPHB2/ADCY8/EFNB2/CNR1/CACNA1C/NRXN1/CHRNA4/HTR1A/VAPB/DRD1/IGSF8/ABCC8 |
GO:0045211 | GO:0045211 | postsynaptic membrane (GO:0045211) | 100 | BMI | 0.0012215 | 0.0004954 | 2.465768 | 0.2427001 | 0.0473225 | 0.4380776 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | NTRK2/KCTD12/KCTD16/GNAO1/TENM2/FBXO2/KCNA2/EPHB2/FXYD6/SLC8A1/ADRA2C/GRIA3/ABHD6/KCNN2/HCN1/SDC2/CADM2 |
GO:0048786 | GO:0048786 | presynaptic active zone (GO:0048786) | 75 | BMI | 0.0014962 | 0.0005709 | 2.620828 | 0.2579514 | 0.0625820 | 0.4533208 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | FLOT1/PI4K2A/ERC1/ERC2/NR3C2/FLOT2/CACNA2D2/SCN8A/GRM7/GRIA1/GABRA2/KCNJ8/GRM8/HCN1/GRIA3/NRXN2 |
GO:0050804 | GO:0050804 | modulation of chemical synaptic transmission (GO:0050804) | 61 | BMI | 0.0010765 | 0.0004931 | 2.183228 | 0.2157757 | 0.0195931 | 0.4119582 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | ADCY8/AKAP12/PDE4A/PPP1R9A/DGKB/NCAN/SHANK1/CAMKV/RPS6KB1 |
GO:0051963 | GO:0051963 | regulation of synapse assembly (GO:0051963) | 67 | BMI | 0.0009248 | 0.0004219 | 2.191853 | 0.2178028 | 0.0205564 | 0.4150492 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | SDC2/LRRC4B/NUMBL/CPEB3/PALM/PDLIM5/DCLK1/DOCK4/ASIC1 |
GO:0098839 | GO:0098839 | postsynaptic density membrane (GO:0098839) | 92 | BMI | 0.0014867 | 0.0005539 | 2.683837 | 0.2632578 | 0.0685505 | 0.4579650 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | RGS9/MDGA1/ADCY1/EFNB2/SCN8A/ADRA2C/DCC/CACNG3/GRID2/LRRC4B/PTPRT/SHISA6/GRIA3/SEMA4F/TMEM108/ASIC1 |
GO:0099055 | GO:0099055 | integral component of postsynaptic membrane (GO:0099055) | 76 | BMI | 0.0015037 | 0.0006345 | 2.369779 | 0.2338337 | 0.0379689 | 0.4296984 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | EPHB2/CACNA1H/FXYD6/SLC6A6/GRIA3/ITGB1/DRD1/KCNN2/ADGRL3/CADM2/CACNA1A |
GO:0099056 | GO:0099056 | integral component of presynaptic membrane (GO:0099056) | 77 | BMI | 0.0017934 | 0.0007413 | 2.419248 | 0.2385934 | 0.0428284 | 0.4343584 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | EPHB2/KCNQ5/ADCY8/EFNB2/CNR1/CACNA1C/CHRNA4/HTR1A/VAPB/IGSF8/ABCC8 |
GO:0099061 | GO:0099061 | integral component of postsynaptic density membrane (GO:0099061) | 75 | BMI | 0.0015592 | 0.0005803 | 2.687153 | 0.2633527 | 0.0688156 | 0.4578898 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | ADCY1/EFNB2/SCN8A/ADRA2C/DCC/GRID2/LRRC4B/LRRC7/SHISA6/GRIA3/SEMA4F/TMEM108/ASIC1 |
GO:0099072 | GO:0099072 | regulation of postsynaptic membrane neurotransmitter receptor levels (GO:0099072) | 91 | BMI | 0.0013138 | 0.0004926 | 2.667087 | 0.2605179 | 0.0666270 | 0.4544088 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | EPS8/CPT1C/CACNG2/ZDHHC5/AGAP3/ATG5/GRID2/ADAM22/TNIK/KALRN/GPC4/CACNA2D2/EPB41L3/GABARAP/NEDD4/OPHN1/RNF220 |
GO:0099504 | GO:0099504 | synaptic vesicle cycle (GO:0099504) | 124 | BMI | 0.0015391 | 0.0005665 | 2.716692 | 0.2658276 | 0.0715974 | 0.4600578 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | ATG7/NRXN1/ARHGDIA/UNC13C/MCTP1/ERC2/FBXL20/STXBP5L/SV2B/SLC4A8/EFR3A/BRSK1/SLC17A8/CLCN3/ATP6V1C1/ATP6V0A4/STON2/AP3M2/OPHN1/CXADR |
GO:0099536 | GO:0099536 | synaptic signaling (GO:0099536) | 138 | BMI | 0.0013507 | 0.0004851 | 2.784359 | 0.2704906 | 0.0776564 | 0.4633247 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | PLCB1/ABHD6/TENM2/CADM1/FARP1/AKAP12/PDE4A/PLPPR4/PTPRA/MAPK3/SHANK1/CAMKV/RPS6KB1/GSK3B/CHRNA4/NR3C2/NRN1/SYT12/NRXN2/KCNK2/INA/NEFL/NEFH/ATG5 |
GO:0099537 | GO:0099537 | trans-synaptic signaling (GO:0099537) | 131 | BMI | 0.0013964 | 0.0004943 | 2.824992 | 0.2742737 | 0.0815549 | 0.4669925 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | PLCB1/ABHD6/CADM1/FARP1/AKAP12/PDE4A/PLPPR4/PTPRA/NCAN/MAPK3/SHANK1/CAMKV/RPS6KB1/HTR1A/CHRNA4/NR3C2/NRN1/SYT12/NRXN2/KCNK2/INA/NEFL/NEFH/ATG5 |
SYNGO:postsynprocess | SYNGO:postsynprocess | process in the postsynapse | 158 | BMI | 0.0012515 | 0.0004827 | 2.592633 | 0.2540571 | 0.0595447 | 0.4485695 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | NR3C2/HCN1/ANO6/GABRA2/GRIN2A/CHRNA3/GRIA1/GABRA1/GABRG2/GRIA3/KCNA2/KCNK1/CACNG4/EFNB2/CPT1C/AGAP3/CAMK2G/ATG5/GRID2/NSG2/ADAM22/KALRN/GPC4/GABARAP/NEDD4/OPHN1/RNF220/SYT6 |
SYNGO:presynprocess | SYNGO:presynprocess | process in the presynapse | 179 | BMI | 0.0016795 | 0.0006074 | 2.764976 | 0.2705764 | 0.0763288 | 0.4648240 | 0.1666667 | 0.1666667 | 0.1666667 | spin_brain | SV2B/ERC2/NPY/TSPOAP1/CACNA1A/CACNB2/CACNB4/GRIA3/GABRA2/GABRB1/KCTD12/KCTD16/NRXN1/TBC1D24/UNC13C/STX1B/MCTP1/RELN/PRKCA/STXBP5L/EFR3A/BRSK1/SLC17A8/CLCN3/ATP6V1C1/ATP6V1G2/ATP6V0A4/RAB27B/AP3M2/LRRK2/SYT10/RAB10/SIPA1L2 |
4. Visualization
This section covers the visualization of the results. Note: the upsetplot, heatplot available for brainenrich resutls are not applicable for these results.
4.1. Gene-Concept Network
We will visualize the gene-concept network.
geneList <- res@geneList
showCategory <- 10
showID <- res@result$ID[1:showCategory]
show.geneList <- geneList[showID]
color_palette <- colorRampPalette(c("#0197b2", "white", "orange"))(100)
breaks <- seq(-5, 5, length.out = 101)
geneList_colors <- color_palette[cut(show.geneList, breaks = breaks, labels = FALSE)]
named_colors <- setNames(geneList_colors, names(show.geneList))
cnetplot(res,
layout = "kk",
color.params = list(category = named_colors),
showCategory = showCategory, cex.params = list(
category_node = 1, gene_node = 0.75,
category_label = 1.2, gene_label = 0.6
),
) + scale_color_gradientn(colors = color_palette, limits = c(-5, 5), name = "t value")
4.2. Dot Plot
We will create a dot plot to visualize the results.
dotplot(res, x = "t_Value", label_format = 50, showCategory = 30) +
xlab("Correlation") + scale_fill_gradient(
high = "#0197b2", low = "orange",
space = "Lab", name = "pval"
)