SCS【37】hdWGCNA在空间转录组学中的作用

简介

生物系统是非常复杂的，在不同分子、细胞、器官和有机体之间严格调节的相互作用的基础上，被组织成一个多尺度的功能细胞层次。虽然实验方法能够在数百万个细胞中进行转录组范围的测量，但流行的生物信息学工具不支持系统级分析。在这里，我们提出了hdWGCNA，这是一个全面的框架，用于分析高维转录组学数据中的共表达网络，如单细胞和空间RNA测序(RNA-seq)。hdWGCNA提供网络推理、基因模块识别、基因富集分析、统计测试和数据可视化等功能。除了传统的单细胞RNA-seq, hdWGCNA能够使用长读单细胞数据进行同型水平的网络分析。我们利用自闭症谱系障碍和阿尔茨海默病大脑样本的数据展示了hdWGCNA，确定了与疾病相关的共表达网络模块。hdWGCNA与Seurat直接兼容，Seurat是一个广泛使用的R包，用于单细胞和空间转录组学分析，我们通过分析包含近100万个细胞的数据集来证明hdWGCNA的可扩展性。

hdWGCNA工作流程及其在人类前额皮质中的应用综述：

(A)scRNA-seq数据集上标准hdWGCNA工作流程的示意图概述。UMAP图显示了来自11名认知正常供体的36671个细胞Zhou等人的前额皮质(PFC)数据集。ASC,星形胶质细胞;EX，兴奋性神经元;INH，抑制性神经元;毫克,小胶质细胞;ODC,少突胶质细胞;OPC,少突胶质祖细胞。

(B)密度图显示单细胞(sc)表达矩阵和元细胞表达基因之间的成对Pearson相关性分布具有变化的K近邻参数K值的矩阵。

(C)每种细胞类型中不同K值的sc、伪体积(pb)和元细胞矩阵的表达矩阵密度(1，稀疏度)。

(D)kME对INH-M6、EX-M2、ODC-M3、OPC-M2、ASC-M18和MG-M14中前5个枢纽基因的标度基因表达热图。

(E)(D)中所选模块的snRNA-seq用模块特征基因(ME)着色的UMAP。

(F)ODC共表达网络的UMAP图。每个节点代表一个基因，边缘代表基因与模块集线器之间的共表达链接基因。点的大小按kME缩放。节点通过共表达式模块分配来着色。每个模块的前两个中心基因被标记。网络边缘是为视觉清晰度下采样。

(G) 10个ODC共表达模块(F)的snRNA-seq UMAP (A)，用MEs标记。

(H) Morabito et al.12人类PFC数据集中ODC模块的模块保存分析。

软件包安装

R 包 hdWGCNA 安装时候依赖很多其他的包，并且加载时总是断掉，因此大家多尝试几次，保证网速，即可实现安装成功。

devtools::install_github('smorabit/hdWGCNA', ref='dev')
install.packages("WGCNA")
BiocManager::install("impute")

数据读取

首先，加载所需要软件包，如下：

# single-cell analysis package
library(Seurat)

# package to install the mouse brain dataset
library(SeuratData)

# plotting and data science packages
library(tidyverse)
library(cowplot)
library(patchwork)

# co-expression network analysis packages:
library(WGCNA)
library(hdWGCNA)

# install this package, which allows us to compute distance between the spots
# install.packages('proxy')
library(proxy)

# enable parallel processing for network analysis (optional)
enableWGCNAThreads(nThreads = 2)
## Allowing parallel execution with up to 2 working processes.

# using the cowplot theme for ggplot
theme_set(theme_cowplot())

# set random seed for reproducibility
set.seed(12345)

其次，下载并处理鼠标大脑数据集，然后保存.rds文件，方便后续调取使用。

# download the mouse brain ST dataset (stxBrain)
SeuratData::InstallData("stxBrain")

# load the anterior and posterior samples
brain <- LoadData("stxBrain", type = "anterior1")
brain$region <- 'anterior'
brain2 <- LoadData("stxBrain", type = "posterior1")
brain2$region <- 'posterior'

# merge into one seurat object
seurat_obj <- merge(brain, brain2)
seurat_obj$region <- factor(as.character(seurat_obj$region), levels=c('anterior', 'posterior'))

# save unprocessed object
saveRDS(seurat_obj, file='mouse_brain_ST_unprocessed.rds')

加载保存后的.rds文件即可：

seurat_obj <- readRDS("mouse_brain_ST_unprocessed.rds")

对单细胞空间转录组数据进行处理，这里就是一般的空间单细胞处理流程，具体可以参考桓峰基因公众号上的推文：

SCS【21】单细胞空间转录组可视化 (Seurat V5)

实例操作

对单细胞空间转录组数据进行常规处理

# make a dataframe containing the image coordinates for each sample
image_df <- do.call(rbind, lapply(names(seurat_obj@images), function(x) {
    seurat_obj@images[[x]]@coordinates
}))

# merge the image_df with the Seurat metadata
new_meta <- merge(seurat_obj@meta.data, image_df, by = "row.names")

# fix the row ordering to match the original seurat object
rownames(new_meta) <- new_meta$Row.names
ix <- match(as.character(colnames(seurat_obj)), as.character(rownames(new_meta)))
new_meta <- new_meta[ix, ]

# add the new metadata to the seurat object
seurat_obj@meta.data <- new_meta

head(image_df)
##                      tissue row col imagerow imagecol
## AAACAAGTATCTCCCA-1_1      1  50 102     7475     8501
## AAACACCAATAACTGC-1_1      1  59  19     8553     2788
## AAACAGAGCGACTCCT-1_1      1  14  94     3164     7950
## AAACAGCTTTCAGAAG-1_1      1  43   9     6637     2099
## AAACAGGGTCTATATT-1_1      1  47  13     7116     2375
## AAACATGGTGAGAGGA-1_1      1  62   0     8913     1480
# normalization, feature selection, scaling, and PCA
seurat_obj <- seurat_obj %>%
    NormalizeData() %>%
    FindVariableFeatures() %>%
    ScaleData() %>%
    RunPCA()

# Louvain clustering and umap
seurat_obj <- FindNeighbors(seurat_obj, dims = 1:30)
seurat_obj <- FindClusters(seurat_obj, verbose = TRUE)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 6049
## Number of edges: 188730
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9093
## Number of communities: 22
## Elapsed time: 0 seconds
seurat_obj <- RunUMAP(seurat_obj, dims = 1:30)

数据可视化

绘制UMAP聚类图

# show the UMAP
p1 <- DimPlot(seurat_obj, label = TRUE, reduction = "umap", group.by = "seurat_clusters") +
    NoLegend()
p1

绘制空间聚类图

p2 <- SpatialDimPlot(seurat_obj, label = TRUE, label.size = 3)
p2

####　加入细胞注释结果并可视化

# add annotations to Seurat object
annotations <- read.table("annotations.txt", header = T, sep = "\t")
head(annotations)
##   seurat_clusters   annotation
## 1               0 Caudoputamen
## 2               1 White matter
## 3               2    Cortex L5
## 4               3 White matter
## 5               4    Cortex L6
## 6               5      Medulla
ix <- match(seurat_obj$seurat_clusters, annotations$seurat_clusters)
seurat_obj$annotation <- annotations$annotation[ix]

# set idents
Idents(seurat_obj) <- seurat_obj$annotation

p3 <- SpatialDimPlot(seurat_obj, label = TRUE, label.size = 3)
p3 + NoLegend()

构建 WGCNA 数据结构：

现在我们准备使用与单细胞工作流相同的管道来执行共表达网络分析。在这个分析中，我们使用所有区域的所有点进行共表达网络分析，但这种分析可以调整为对特定区域进行网络分析。

seurat_obj <- SetupForWGCNA(seurat_obj, gene_select = "fraction", fraction = 0.05,
    wgcna_name = "vis")

seurat_obj <- MetaspotsByGroups(seurat_obj, group.by = c("region"), ident.group = "region",
    assay = "Spatial", slot = "counts")
seurat_obj <- NormalizeMetacells(seurat_obj)

m_obj <- GetMetacellObject(seurat_obj)
m_obj
## An object of class Seurat 
## 31053 features across 1505 samples within 1 assay 
## Active assay: Spatial (31053 features, 0 variable features)
##  2 layers present: counts, data

# set up the expression matrix, set group.by and group_name to NULL to include
# all spots
seurat_obj <- SetDatExpr(seurat_obj, group.by = NULL, group_name = NULL)

# test different soft power thresholds
seurat_obj <- TestSoftPowers(seurat_obj)
## pickSoftThreshold: will use block size 3621.
##  pickSoftThreshold: calculating connectivity for given powers...
##    ..working on genes 1 through 3621 of 12355
##    ..working on genes 3622 through 7242 of 12355
##    ..working on genes 7243 through 10863 of 12355
##    ..working on genes 10864 through 12355 of 12355
##    Power SFT.R.sq   slope truncated.R.sq  mean.k. median.k. max.k.
## 1      1 0.076400  7.3400          0.847 6.41e+03  6.43e+03 7130.0
## 2      2 0.000029  0.0755          0.924 3.36e+03  3.37e+03 4220.0
## 3      3 0.038200 -1.8700          0.978 1.78e+03  1.78e+03 2550.0
## 4      4 0.097800 -2.2900          0.994 9.53e+02  9.45e+02 1570.0
## 5      5 0.188000 -2.5200          0.957 5.16e+02  5.04e+02  983.0
## 6      6 0.404000 -3.0600          0.930 2.83e+02  2.72e+02  627.0
## 7      7 0.691000 -3.6600          0.963 1.57e+02  1.47e+02  422.0
## 8      8 0.818000 -3.8200          0.981 8.82e+01  8.04e+01  293.0
## 9      9 0.878000 -3.7000          0.985 5.04e+01  4.41e+01  207.0
## 10    10 0.909000 -3.4100          0.986 2.92e+01  2.43e+01  150.0
## 11    12 0.940000 -2.8300          0.988 1.04e+01  7.54e+00   83.6
## 12    14 0.960000 -2.3100          0.986 4.09e+00  2.41e+00   49.9
## 13    16 0.985000 -1.9400          0.994 1.78e+00  7.79e-01   32.4
## 14    18 0.977000 -1.8000          0.994 8.67e-01  2.58e-01   25.4
## 15    20 0.965000 -1.6900          0.986 4.70e-01  8.73e-02   20.9
## 16    22 0.949000 -1.6200          0.966 2.80e-01  3.00e-02   17.7
## 17    24 0.953000 -1.5500          0.966 1.80e-01  1.06e-02   15.2
## 18    26 0.915000 -1.5300          0.911 1.23e-01  3.79e-03   13.3
## 19    28 0.945000 -1.4600          0.945 8.83e-02  1.39e-03   11.7
## 20    30 0.960000 -1.4200          0.958 6.58e-02  5.12e-04   10.4
plot_list <- PlotSoftPowers(seurat_obj)
##   Power     SFT.R.sq       slope truncated.R.sq   mean.k. median.k.    max.k.
## 1     1 7.639643e-02  7.33514850      0.8466033 6405.5297 6432.7005 7132.2276
## 2     2 2.904133e-05  0.07551022      0.9240082 3358.2870 3373.6067 4220.6212
## 3     3 3.820507e-02 -1.87021534      0.9781808 1779.6276 1780.2656 2550.8785
## 4     4 9.781546e-02 -2.29479976      0.9944443  953.2136  944.6947 1570.6053
## 5     5 1.876156e-01 -2.51991523      0.9568117  516.2232  504.2980  983.3341
## 6     6 4.044255e-01 -3.06021052      0.9296996  282.8325  271.6237  627.0142

wrap_plots(plot_list, ncol = 2)

一般单细胞转录数据的hdWGCNA教程可以参考：

SCS【28】单细胞转录组加权基因共表达网络分析(hdWGCNA)

构建共表达网络并绘制系统树图

# construct co-expression network:
seurat_obj <- ConstructNetwork(seurat_obj, tom_name = "test", overwrite_tom = TRUE)
## Soft power not provided. Automatically using the lowest power that meets 0.8 scale-free topology fit. Using soft_power = 8
##  Calculating consensus modules and module eigengenes block-wise from all genes
##  Calculating topological overlaps block-wise from all genes
##    Flagging genes and samples with too many missing values...
##     ..step 1
##     TOM calculation: adjacency..
##     ..will not use multithreading.
##      Fraction of slow calculations: 0.000000
##     ..connectivity..
##     ..matrix multiplication (system BLAS)..
##     ..normalization..
##     ..done.
##  ..Working on block 1 .
##  ..Working on block 1 .
##  ..merging consensus modules that are too close..

# plot the dendrogram
PlotDendrogram(seurat_obj, main = "Spatial hdWGCNA dendrogram")

计算 MEs 并使用 Seurat 的 DotPlot 函数绘制

我们分别使用ModuleEigengenes和ModuleConnectivity函数计算模块特征基因(MEs)和基于特征基因的连通性(kMEs)。

seurat_obj <- ModuleEigengenes(seurat_obj)
## [1] "grey"
## [1] "red"
## [1] "turquoise"
## [1] "brown"
## [1] "tan"
## [1] "black"
## [1] "blue"
## [1] "yellow"
## [1] "pink"
## [1] "purple"
## [1] "magenta"
## [1] "green"
## [1] "greenyellow"
seurat_obj <- ModuleConnectivity(seurat_obj)

seurat_obj <- ResetModuleNames(seurat_obj, new_name = "SM")

modules <- GetModules(seurat_obj) %>%
    subset(module != "grey")
head(modules[, 1:3])
##                   gene_name module     color
## Rgs20                 Rgs20    SM1       red
## Oprk1                 Oprk1    SM1       red
## St18                   St18    SM2 turquoise
## 3110035E14Rik 3110035E14Rik    SM3     brown
## A830018L16Rik A830018L16Rik    SM3     brown
## Sulf1                 Sulf1    SM4       tan

数据结果可视化

# get module eigengenes and gene-module assignment tables
MEs <- GetMEs(seurat_obj)
modules <- GetModules(seurat_obj)
mods <- levels(modules$module)
mods <- mods[mods != "grey"]

# add the MEs to the seurat metadata so we can plot it with Seurat functions
seurat_obj@meta.data <- cbind(seurat_obj@meta.data, MEs)

# plot with Seurat's DotPlot function
p4 <- DotPlot(seurat_obj, features = mods, group.by = "annotation", dot.min = 0.1)

# flip the x/y axes, rotate the axis labels, and change color scheme:
p4 <- p4 + coord_flip() + RotatedAxis() + scale_color_gradient2(high = "red", mid = "grey95",
    low = "blue") + xlab("") + ylab("")

p5 <- SpatialFeaturePlot(
  seurat_obj,
  features = mods,
  alpha = c(0.1, 1),
  ncol = 8
)

png("MEs_featureplot.png", height=16, width=20, units='in', res=200)
p5
dev.off()

Reference

Morabito S, Reese F, Rahimzadeh N, Miyoshi E, Swarup V. hdWGCNA identifies co-expression networks in high-dimensional transcriptomics data. Cell Rep Methods. 2023;3(6):100498. Published 2023 Jun 12. doi:10.1016/j.crmeth.2023.100498
Morabito, S., Miyoshi, E., Michael, N. et al. Single-nucleus chromatin accessibility and transcriptomic characterization of Alzheimer’s disease. Nat Genet 53, 1143–1155 (2021). https://doi.org/10.1038/s41588-021-00894-z