Microbial Network Analysis in R: A Complete Tutorial
This tutorial introduces the construction, visualization, and interpretation of microbial association networks using three families of approaches: correlation-based, distance-based, and probabilistic models. It is designed for users with introductory R knowledge and some experience with microbiome data.
1. Introduction
Microbial communities consist of interacting populations whose relationships can be represented as networks.
- Nodes represent microbial taxa (OTUs/ASVs)
- Edges represent significant associations between taxa
Understanding these associations helps disentangle ecological processes such as niche differentiation and community assembly.
1.1 Types of microbial networks
Common approaches to infer pairwise microbial associations include:
- Correlation-based: Spearman, SparCC
- Distance-based: Bray–Curtis, Jaccard, Kullback–Leibler
- Probability-based: Veech model (co-occurrence probability)
Each method captures different ecological signals and produces different network structures.
2. Requirements and Setup
2.1 Install required R packages
```r
install.packages(c(
  "ggplot2", "igraph", "reshape2", "Hmisc",
  "vegan", "RColorBrewer", "pheatmap"
))
```
2.2 Load libraries
```r
library(ggplot2)
library(igraph)
library(reshape2)
library(Hmisc)
library(vegan)
library(RColorBrewer)
library(pheatmap)
```
3. Downloading the Tutorial Data
The tutorial uses six example files:
- otu_df.csv
- otu_df2.csv
- tax_df.csv
- tax_df2.csv
- meta_df.txt
- meta_df2.txt
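The files form two datasets, each with an OTU table, a taxonomy table, and sample metadata. Upload them to a GitHub repository, then replace <USER> and <REPO> in the URLs below with your GitHub user name and repository name.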
3.1 Load the data
```r
# OTU tables: samples in rows, taxa (OTUs/ASVs) in columns; first column = sample IDs
otu_df <- read.csv("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/otu_df.csv", row.names = 1)
otut_df <- otu_df / rowSums(otu_df)      # relative abundance: each sample divided by its total
otu_df2 <- read.csv("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/otu_df2.csv", row.names = 1)
otut_df2 <- otu_df2 / rowSums(otu_df2)

# Taxonomy tables
tax_df <- read.csv("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/tax_df.csv", row.names = 1)
tax_df2 <- read.csv("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/tax_df2.csv", row.names = 1)

# Sample metadata: space-separated text files with a header row
sample_df <- read.table("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/meta_df.txt",
                        sep = " ", header = TRUE)
sample_df2 <- read.table("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/meta_df2.txt",
                         sep = " ", header = TRUE)
```
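Dividing a data frame by rowSums() works because R recycles the vector of row totals down each column, so every sample (row) is scaled by its own total. A minimal check on a made-up three-sample table (the values are illustrative, not tutorial data):

```r
# Toy OTU table: 3 samples (rows) x 3 taxa (columns); values are made up
toy_otu <- data.frame(OTU1 = c(10, 0, 5),
                      OTU2 = c(30, 4, 5),
                      OTU3 = c(60, 16, 0),
                      row.names = c("S1", "S2", "S3"))

# Same operation as otu_df / rowSums(otu_df): each row divided by its total
toy_rel <- toy_otu / rowSums(toy_otu)

# Every sample now sums to 1 (a sample with zero reads would give NaN instead)
print(rowSums(toy_rel))
print(toy_rel["S1", ])   # 10/100, 30/100, 60/100
```

Samples with zero total reads are worth removing before this step, since they produce NaN rows.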
4. Network Construction Methods
This tutorial covers:
- Co-occurrence probability (Veech model)
- Spearman correlation
- Jaccard similarity
- Bray–Curtis similarity
5. Probabilistic Co-Occurrence Network (Veech Model)
5.1 Convert to presence/absence
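```r
# Presence/absence: any non-zero count becomes 1
otu_prob <- otu_df
otu_prob[otu_prob > 0] <- 1
bac_prob <- as.matrix(otu_prob)
# Taxa x taxa matrix: off-diagonal = number of samples where both taxa occur,
# diagonal = number of samples where each taxon occurs
occurs_prob <- crossprod(bac_prob)
```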
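crossprod(X) computes t(X) %*% X, so for a 0/1 samples x taxa matrix each off-diagonal cell counts the samples in which two taxa are both present, and the diagonal holds each taxon's number of occurrences. A small made-up example:

```r
# Toy presence/absence matrix: 4 samples (rows) x 3 taxa (columns), made up
pa <- matrix(c(1, 1, 0, 1,   # taxon A present in samples 1, 2, 4
               1, 0, 0, 1,   # taxon B present in samples 1, 4
               0, 1, 1, 0),  # taxon C present in samples 2, 3
             nrow = 4,
             dimnames = list(paste0("S", 1:4), c("A", "B", "C")))

cooc <- crossprod(pa)   # same as t(pa) %*% pa
print(cooc)
# A and B share samples 1 and 4  -> cooc["A", "B"] is 2
# A and C share sample 2         -> cooc["A", "C"] is 1
# B and C never co-occur         -> cooc["B", "C"] is 0
# Diagonal = occurrences: A = 3, B = 2, C = 2
```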
5.2 Veech probability function
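```r
# Veech (2013): probability that two taxa occurring at N1 and N2 of N sites
# co-occur at exactly j sites, if each is placed at random, independently
Veech_calc <- function(N, j, N1, N2) {
  (choose(N, j) * choose(N - j, N2 - j) * choose(N - N2, N1 - j)) /
    (choose(N, N2) * choose(N, N1))
}
```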
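Veech's probability is a hypergeometric probability: cancelling the binomial terms gives choose(N2, j) * choose(N - N2, N1 - j) / choose(N, N1), which base R provides as dhyper(j, N2, N - N2, N1). The sketch below repeats the function so it runs on its own, checks the identity numerically for arbitrary example values (N = 20, N1 = 8, N2 = 5), and confirms the probabilities over all possible j sum to 1.

```r
# Veech (2013) co-occurrence probability, copied from section 5.2
Veech_calc <- function(N, j, N1, N2) {
  (choose(N, j) * choose(N - j, N2 - j) * choose(N - N2, N1 - j)) /
    (choose(N, N2) * choose(N, N1))
}

# Example values (arbitrary): 20 sites, taxa occurring at 8 and 5 of them
N <- 20; N1 <- 8; N2 <- 5
j <- 0:min(N1, N2)                  # every possible number of shared sites

p_veech <- Veech_calc(N, j, N1, N2) # choose() is vectorised over j
p_hyper <- dhyper(j, m = N2, n = N - N2, k = N1)

print(round(p_veech, 4))
print(all.equal(p_veech, p_hyper))  # identical up to floating-point error
print(sum(p_veech))                 # a full distribution over j sums to 1
```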
5.3 Compute probabilities
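```r
# res[i, j] = P(co-occurrence < observed) for taxa i < j; NA elsewhere
n_sites <- nrow(bac_prob)   # number of samples
n_taxa  <- ncol(bac_prob)
res <- matrix(NA, n_taxa, n_taxa)
for (i in seq_len(n_taxa - 1)) {
  for (j in (i + 1):n_taxa) {
    obs <- occurs_prob[i, j]   # observed co-occurrences of taxa i and j
    # Sum P(k) for k = 0, ..., obs - 1; seq_len(0) is empty, so obs = 0 gives 0
    res[i, j] <- sum(Veech_calc(n_sites, seq_len(obs) - 1,
                                occurs_prob[j, j], occurs_prob[i, i]))
  }
}
```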
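Because each Veech term is a hypergeometric probability, the per-pair sum in section 5.3 is a hypergeometric lower tail, P(co-occurrence < observed) = phyper(observed - 1, N1, N - N1, N2), which is 0 when two taxa never co-occur. A vectorised sketch that computes it for all taxon pairs at once (veech_lower_tail is our own helper name, not from any package) avoids the nested loops, which can matter with thousands of taxa:

```r
# Hypothetical helper: Veech lower-tail probabilities for all taxon pairs.
# pa: samples x taxa 0/1 matrix (like bac_prob). Returns a taxa x taxa matrix
# holding P(co-occurrence < observed) in the upper triangle, NA elsewhere.
veech_lower_tail <- function(pa) {
  N     <- nrow(pa)                              # number of samples (sites)
  occ   <- colSums(pa)                           # occurrences of each taxon
  cooc  <- crossprod(pa)                         # observed co-occurrences
  n_tax <- ncol(pa)
  n_i   <- matrix(occ, n_tax, n_tax)                 # row taxon's count
  n_j   <- matrix(occ, n_tax, n_tax, byrow = TRUE)   # column taxon's count
  # P(X <= obs - 1) with X ~ hypergeometric; phyper(-1, ...) is 0
  p <- matrix(phyper(cooc - 1, m = n_i, n = N - n_i, k = n_j),
              nrow = n_tax, dimnames = dimnames(cooc))
  p[lower.tri(p, diag = TRUE)] <- NA             # each pair once, as in 5.3
  p
}

# Toy example with made-up data: 30 samples, 5 taxa
set.seed(42)
pa_toy <- matrix(rbinom(30 * 5, 1, 0.4), nrow = 30,
                 dimnames = list(NULL, paste0("T", 1:5)))
print(round(veech_lower_tail(pa_toy), 3))
```

Used as res <- veech_lower_tail(bac_prob), the result carries taxon names, so melt() in section 5.4 already returns names; the two lines there that convert indices to names should then be skipped.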
5.4 Filter significant associations
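```r
# value >= 0.95 means P(co-occurrence >= observed) <= 0.05, i.e. the pair
# co-occurs more often than expected by chance; subset() drops the NA cells
pos_prob <- subset(melt(res), value >= 0.95)
colnames(pos_prob) <- c("Source", "Target", "value")
# Replace row/column indices with taxon names
pos_prob$Source <- colnames(bac_prob)[pos_prob$Source]
pos_prob$Target <- colnames(bac_prob)[pos_prob$Target]
```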
5.5 Build and plot network
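```r
# Undirected graph from the significant pairs (taxa without edges are not included)
metanet_prob <- graph_from_data_frame(pos_prob[, 1:2], directed = FALSE)
set.seed(1)   # the Fruchterman-Reingold layout is random; fix the seed to reproduce it
layout_prob <- layout_with_fr(metanet_prob)
plot(metanet_prob, vertex.size = 5, vertex.label = NA,
     edge.color = "gray70", layout = layout_prob)
```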
6. Correlation-Based Networks (Spearman)
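```r
# Pairwise Spearman correlations between taxa (columns), with p-values
sp.corr <- rcorr(as.matrix(otu_df), type = "spearman")
sp.corr.r <- sp.corr$r
sp.corr.p <- sp.corr$P
# Keep only significant (p <= 0.05), strong positive (rho >= 0.5) correlations
sp.corr.r[sp.corr.p > 0.05] <- 0
sp.corr.r[sp.corr.r < 0.5] <- 0
# Drop the diagonal (self-correlations of 1) and the duplicate lower triangle
sp.corr.r[lower.tri(sp.corr.r, diag = TRUE)] <- 0
pos_corr <- melt(sp.corr.r)
pos_corr <- subset(pos_corr, value != 0)
colnames(pos_corr) <- c("Source", "Target", "value")
```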
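The Spearman step above ends with an edge list. A sketch of wrapping the same filtering into a reusable function that returns an igraph object, so it can be plotted like the Veech network in section 5.5 and rebuilt for otu_df2 (spearman_network and its argument names are our own; the defaults mirror the p <= 0.05 and rho >= 0.5 cut-offs above):

```r
library(Hmisc)    # rcorr()
library(igraph)   # graph_from_data_frame(), as_edgelist()
library(reshape2) # melt()

# Hypothetical helper: positive Spearman network from a samples x taxa table
spearman_network <- function(otu, r_min = 0.5, p_max = 0.05) {
  sp <- rcorr(as.matrix(otu), type = "spearman")
  r <- sp$r
  r[sp$P > p_max] <- 0                 # non-significant pairs (NA diagonal untouched)
  r[r < r_min] <- 0                    # weak or negative correlations
  r[lower.tri(r, diag = TRUE)] <- 0    # no self-loops, each pair kept once
  edges <- subset(melt(r), value != 0) # subset() also drops NA cells
  colnames(edges) <- c("Source", "Target", "weight")
  edges$Source <- as.character(edges$Source)
  edges$Target <- as.character(edges$Target)
  graph_from_data_frame(edges, directed = FALSE)   # "weight" becomes an edge attribute
}

# Made-up example: A and B rise together, C falls as they rise
toy_corr <- data.frame(A = 1:20, B = (1:20)^2, C = 20:1)
g_corr <- spearman_network(toy_corr)
print(as_edgelist(g_corr))   # a single edge between A and B
print(E(g_corr)$weight)      # rho of about 1 for the A-B pair
```

With the tutorial data this would be called as spearman_network(otu_df) and spearman_network(otu_df2).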
7. Distance-Based Networks (Jaccard & Bray–Curtis)
7.1 Jaccard similarity
```r
# Jaccard similarity between taxa, based on presence/absence across samples
sim.jacc <- 1 - as.matrix(vegdist(t(otu_df),
                                  method = "jaccard",
                                  binary = TRUE))
```
7.2 Bray–Curtis similarity
```r
# Bray-Curtis similarity between taxa, based on abundances across samples
sim.bc <- 1 - as.matrix(vegdist(t(otu_df), method = "bray"))
```
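A similarity matrix still needs a cut-off to become a network, and no value is given here. A sketch assuming an illustrative threshold of 0.5 (a placeholder, not a recommended value) that links every taxon pair whose similarity reaches it; similarity_network is a hypothetical helper name:

```r
library(vegan)   # vegdist()
library(igraph)  # graph_from_adjacency_matrix(), as_edgelist()

# Hypothetical helper: unweighted network from a taxa x taxa similarity matrix
similarity_network <- function(sim, cutoff = 0.5) {
  adj <- (sim >= cutoff) * 1   # 1 where the pair is similar enough
  adj[is.na(adj)] <- 0         # NaN similarities (e.g. empty taxa) -> no edge
  diag(adj) <- 0               # drop self-loops (self-similarity is 1)
  graph_from_adjacency_matrix(adj, mode = "undirected")
}

# Made-up table: 4 samples (rows) x 3 taxa (columns)
toy_sim <- data.frame(A = c(1, 1, 0, 0), B = c(2, 3, 0, 0), C = c(0, 0, 1, 1))

# Same construction as sim.jacc above, applied to the toy table
toy_jacc <- 1 - as.matrix(vegdist(t(toy_sim), method = "jaccard", binary = TRUE))
g_jacc <- similarity_network(toy_jacc, cutoff = 0.5)

print(toy_jacc)             # A and B share all samples, C shares none
print(as_edgelist(g_jacc))  # a single A-B edge; C stays as an isolated node
```

Unlike the edge lists built for the Veech and Spearman networks, this keeps taxa without edges as isolated nodes; the same call applies to sim.jacc or sim.bc.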
8. Comparing Networks
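Networks built with different methods, or from different datasets, can be compared with:
- Jaccard edge overlap: the number of shared edges divided by the number of distinct edges in either network
- Overlap coefficient: the number of shared edges divided by the edge count of the smaller network
- Modularity: how strongly the network divides into densely connected modules
- Clustering coefficient: the fraction of connected triples of nodes that close into triangles
- Edge density: the number of observed edges divided by the number of possible edges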
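A sketch of computing these measures for two undirected igraph networks with named vertices. Edges are matched by their unordered pair of vertex names; compare_networks and edge_keys are our own helper names, and modularity is taken from the partition found by cluster_fast_greedy(), one of several community detection methods, chosen here because it is deterministic:

```r
library(igraph)

# Unordered "name1|name2" keys so A-B and B-A count as the same edge
edge_keys <- function(g) {
  el <- as_edgelist(g, names = TRUE)
  paste(pmin(el[, 1], el[, 2]), pmax(el[, 1], el[, 2]), sep = "|")
}

# Hypothetical helper comparing two undirected networks
compare_networks <- function(g1, g2) {
  e1 <- edge_keys(g1)
  e2 <- edge_keys(g2)
  shared <- length(intersect(e1, e2))
  c(jaccard_edges = shared / length(union(e1, e2)),
    overlap_coef  = shared / min(length(e1), length(e2)),
    modularity_1  = modularity(cluster_fast_greedy(g1)),
    modularity_2  = modularity(cluster_fast_greedy(g2)),
    clustering_1  = transitivity(g1, type = "global"),
    clustering_2  = transitivity(g2, type = "global"),
    density_1     = edge_density(g1),
    density_2     = edge_density(g2))
}

# Made-up networks: g1 has edges A-B, B-C, C-A, C-D; g2 has A-B, B-C, D-E
g1 <- graph_from_literal(A - B, B - C, C - A, C - D)
g2 <- graph_from_literal(A - B, B - C, D - E)
print(round(compare_networks(g1, g2), 3))
# 2 shared edges out of 5 distinct ones: jaccard_edges = 0.4,
# overlap_coef = 2 / 3; g1 closes 1 triangle over 5 connected triples -> 0.6
```

For networks without edges the two overlap ratios become NaN, so empty networks are best filtered out first.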
9. Case Study
Reconstruct networks for the two example datasets (otu_df and otu_df2) and compare:
- Global structure
- Sample subnetworks
- Module completeness
- Metadata relationships
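One common reading of a sample subnetwork, assumed here rather than defined in this tutorial, is the part of the global network induced by the taxa detected in a given sample. A minimal sketch using igraph's induced_subgraph(); sample_subnetwork is a hypothetical helper name:

```r
library(igraph)

# Hypothetical helper. Assumption: a sample subnetwork is the subgraph of the
# global network induced by the taxa with count > 0 in that sample.
# g: global network with taxon names as vertex names
# otu: samples x taxa table; sample_id: a row name of otu
sample_subnetwork <- function(g, otu, sample_id) {
  counts  <- unlist(otu[sample_id, ])
  present <- names(counts)[counts > 0]
  keep    <- intersect(V(g)$name, present)   # taxa both present and in g
  induced_subgraph(g, keep)
}

# Made-up global network and OTU table
g_global <- graph_from_literal(A - B, B - C, C - D, D - A)
toy_tab  <- data.frame(A = c(5, 0), B = c(3, 2), C = c(0, 7), D = c(1, 1),
                       row.names = c("S1", "S2"))

sub_s1 <- sample_subnetwork(g_global, toy_tab, "S1")
print(V(sub_s1)$name)   # taxa A, B and D are present in S1
print(ecount(sub_s1))   # edges A-B and D-A survive
```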
10. Conclusions
This workflow provides a reproducible pipeline for constructing and comparing microbial networks.
11. References
- Layeghifard et al., 2018
- Röttjers & Faust, 2018
- Faust & Raes, 2012
- Veech, 2013
- Hernandez et al., 2021