
Microbial Network Analysis in R: A Complete Tutorial

This tutorial introduces the construction, visualization, and interpretation of microbial association networks using several common approaches: correlation-based, distance-based, and probabilistic models. It assumes basic R skills and some familiarity with microbiome data (OTU/ASV tables).


1. Introduction

Microbial communities consist of interacting populations whose relationships can be represented as networks.

  • Nodes represent microbial taxa (OTUs/ASVs)
  • Edges represent significant associations between taxa

Analyzing these associations can help disentangle ecological processes such as niche differentiation and community assembly.

1.1 Types of microbial networks

Common approaches to infer pairwise microbial associations include:

  • Correlation-based: Spearman, SparCC
  • Distance-based: Bray–Curtis, Jaccard, Kullback–Leibler
  • Probability-based: Veech model (co-occurrence probability)

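Each method captures a different ecological signal (rank-based covariation of abundances, overlap of occurrence or abundance profiles, or co-occurrence beyond random expectation), so networks inferred from the same data can differ substantially in structure.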
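As a toy illustration (made-up abundances, not tutorial data), consider two taxa that occur in every sample but whose abundances move in opposite directions. A presence/absence measure such as Jaccard scores them as identical, while Spearman correlation scores them as perfectly anti-correlated:

```r
# Abundances of two hypothetical taxa across five samples
taxon_a <- c(1, 2, 3, 4, 5)
taxon_b <- c(5, 4, 3, 2, 1)

# Spearman: correlation of abundance ranks across samples
rho <- cor(taxon_a, taxon_b, method = "spearman")

# Jaccard: samples where both are present / samples where either is present
present_a <- taxon_a > 0
present_b <- taxon_b > 0
jacc <- sum(present_a & present_b) / sum(present_a | present_b)

c(spearman = rho, jaccard = jacc)   # spearman = -1, jaccard = 1
```

The same pair would be joined by an edge in a Jaccard network and excluded from a positive-correlation network.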


2. Requirements and Setup

2.1 Install required R packages

```
install.packages(c(
  "ggplot2", "igraph", "reshape2", "Hmisc",
  "vegan", "RColorBrewer", "pheatmap"
))
```

2.2 Load libraries

```
library(ggplot2)
library(igraph)
library(reshape2)
library(Hmisc)
library(vegan)
library(RColorBrewer)
library(pheatmap)
```

3. Downloading the Tutorial Data

The tutorial uses six example files:

  • otu_df.csv
  • otu_df2.csv
  • tax_df.csv
  • tax_df2.csv
  • meta_df.txt
  • meta_df2.txt

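Host these files in a GitHub repository (or at any other web-accessible location), then replace <USER> and <REPO> in the URLs below with your own values.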

3.1 Load the data

```
# OTU tables: rows = samples, columns = taxa (assumed by all code below)
otu_df <- read.csv("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/otu_df.csv", row.names = 1)
otut_df <- otu_df / rowSums(otu_df)      # relative abundances per sample

otu_df2 <- read.csv("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/otu_df2.csv", row.names = 1)
otut_df2 <- otu_df2 / rowSums(otu_df2)

# Taxonomy tables
tax_df <- read.csv("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/tax_df.csv", row.names = 1)
tax_df2 <- read.csv("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/tax_df2.csv", row.names = 1)

# Sample metadata (tab-separated)
sample_df <- read.table("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/meta_df.txt",
                        sep = "\t", header = TRUE)
sample_df2 <- read.table("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/meta_df2.txt",
                         sep = "\t", header = TRUE)
```
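The code in this tutorial assumes that OTU tables have samples in rows and taxa in columns (crossprod() and rcorr() both operate on columns). A few quick checks can catch orientation or ID mismatches early. The sketch below runs them on a small made-up table; the assumption that the taxonomy table uses taxon IDs as row names (as read with row.names = 1) should be verified against your own files.

```r
# Toy OTU table: rows = samples, columns = taxa (made-up counts)
otu_toy <- data.frame(OTU1 = c(10, 0, 5),
                      OTU2 = c(30, 20, 0),
                      OTU3 = c(60, 80, 15),
                      row.names = c("S1", "S2", "S3"))

# Toy taxonomy table with taxon IDs as row names (assumed layout)
tax_toy <- data.frame(Phylum = c("Firmicutes", "Bacteroidota", "Proteobacteria"),
                      row.names = c("OTU1", "OTU2", "OTU3"))

# Dividing a data frame by rowSums() rescales each row (sample) to sum to 1
otut_toy <- otu_toy / rowSums(otu_toy)

stopifnot(all(rowSums(otu_toy) > 0))                       # no empty samples
stopifnot(all(abs(rowSums(otut_toy) - 1) < 1e-12))         # proper proportions
stopifnot(all(colnames(otu_toy) %in% rownames(tax_toy)))   # every taxon annotated
```

If the first check fails on real data, empty samples produce NaN relative abundances and should be removed before normalizing.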

4. Network Construction Methods

This tutorial covers:

  1. Co-occurrence probability (Veech model)
  2. Spearman correlation
  3. Jaccard similarity
  4. Bray–Curtis similarity

5. Probabilistic Co-Occurrence Network (Veech Model)

5.1 Convert to presence/absence

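```
# Presence/absence: any count > 0 becomes 1
otu_prob <- otu_df
otu_prob[otu_prob > 0] <- 1
bac_prob <- as.matrix(otu_prob)

# crossprod(X) = t(X) %*% X: the diagonal holds the number of samples each
# taxon occurs in; off-diagonal cells hold the number of shared samples
occurs_prob <- crossprod(bac_prob)
```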
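To see what crossprod() produces, here is a toy presence/absence matrix (made-up data) with four sites and three taxa. Each diagonal cell counts the sites a taxon occupies, and each off-diagonal cell counts the sites two taxa share:

```r
# Toy presence/absence matrix: rows = sites, columns = taxa
bac_toy <- matrix(c(1, 1, 0,
                    1, 0, 1,
                    1, 1, 1,
                    0, 1, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("S", 1:4), c("A", "B", "C")))

occ_toy <- crossprod(bac_toy)

# diag(occ_toy) is 3, 3, 2 (sites occupied by A, B, C);
# occ_toy["A", "B"] is 2 (shared sites S1 and S3)
occ_toy
```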

5.2 Veech probability function

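Veech (2013) gives the probability that two taxa, present at N1 and N2 of N sites, co-occur at exactly j sites when they are distributed independently of each other:

```
# N: number of sites (samples); j: number of shared sites;
# N1, N2: number of sites occupied by each taxon
Veech_calc <- function(N, j, N1, N2){
  (choose(N, j) * choose(N - j, N2 - j) * choose(N - N2, N1 - j)) /
    (choose(N, N2) * choose(N, N1))
}
```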
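Algebraically, the Veech expression is the hypergeometric probability of drawing j "occupied by taxon 1" sites when N2 sites are drawn from N, of which N1 are occupied by taxon 1. A quick numerical check with toy values (N = 20, N1 = 8, N2 = 5, chosen arbitrarily) against base R's dhyper() confirms this and shows that the probabilities sum to 1:

```r
# Veech_calc() repeated here so this block runs on its own
Veech_calc <- function(N, j, N1, N2){
  (choose(N, j) * choose(N - j, N2 - j) * choose(N - N2, N1 - j)) /
    (choose(N, N2) * choose(N, N1))
}

N  <- 20              # number of sites
N1 <- 8               # sites occupied by taxon 1
N2 <- 5               # sites occupied by taxon 2
j  <- 0:min(N1, N2)   # every feasible number of shared sites here

p_veech <- Veech_calc(N, j, N1, N2)

# Hypergeometric density: m = N1 "successes", n = N - N1 "failures", k = N2 draws
p_hyper <- dhyper(j, m = N1, n = N - N1, k = N2)

stopifnot(isTRUE(all.equal(p_veech, p_hyper)))   # same probabilities
stopifnot(isTRUE(all.equal(sum(p_veech), 1)))    # a proper distribution
```

Because of this equivalence, cumulative sums of Veech_calc() can also be obtained with phyper().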

5.3 Compute probabilities

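For each pair of taxa, the probabilities for j = 0 up to one less than the observed number of shared sites are summed. The result is P(X < observed), the probability of seeing fewer co-occurrences than observed if the two taxa were placed independently. Only the upper triangle of the matrix is filled.

```
n_taxa  <- ncol(bac_prob)   # number of taxa
N_sites <- nrow(bac_prob)   # number of samples (sites)
res <- matrix(NA, n_taxa, n_taxa)

for (i in seq_len(n_taxa - 1)) {
  for (j in (i + 1):n_taxa) {
    obs <- occurs_prob[i, j]   # observed number of shared sites
    # Sum P(X = k) for k = 0 to obs - 1. seq_len(obs) - 1 is empty when
    # obs = 0, giving probability 0 (0:(obs - 1) would yield c(0, -1)).
    res[i, j] <- sum(Veech_calc(N_sites, seq_len(obs) - 1,
                                occurs_prob[j, j], occurs_prob[i, i]))
  }
}
```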
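Two details are worth checking here. In R, 0:(obs - 1) evaluates to c(0, -1) when obs is 0, so a loop bound written that way returns P(X = 0) instead of 0 for pairs that never co-occur; seq_len(obs) - 1 avoids this. And since the Veech probabilities are hypergeometric, the whole matrix can be computed without an inner loop using phyper(obs - 1, ...), which equals P(X < obs). The sketch below compares both on a random toy presence/absence matrix (made-up data); the vectorized form is an alternative, not the tutorial's original method.

```r
Veech_calc <- function(N, j, N1, N2){
  (choose(N, j) * choose(N - j, N2 - j) * choose(N - N2, N1 - j)) /
    (choose(N, N2) * choose(N, N1))
}

# Toy presence/absence matrix: 30 sites x 6 taxa (random, made-up data)
set.seed(42)
bac_toy <- matrix(rbinom(30 * 6, 1, 0.4), nrow = 30)
N   <- nrow(bac_toy)
occ <- crossprod(bac_toy)
n_i <- diag(occ)              # number of sites occupied by each taxon

# Loop version with a guarded range
res_loop <- matrix(NA, ncol(bac_toy), ncol(bac_toy))
for (i in seq_len(ncol(bac_toy) - 1)) {
  for (j in (i + 1):ncol(bac_toy)) {
    obs <- occ[i, j]
    res_loop[i, j] <- sum(Veech_calc(N, seq_len(obs) - 1, n_i[j], n_i[i]))
  }
}

# Vectorized version: phyper(q) = P(X <= q), so q = obs - 1 gives P(X < obs);
# phyper(-1, ...) is 0, matching the empty sum for obs = 0
res_vec <- outer(seq_along(n_i), seq_along(n_i), function(i, j)
  phyper(occ[cbind(i, j)] - 1, m = n_i[j], n = N - n_i[j], k = n_i[i]))
res_vec[lower.tri(res_vec, diag = TRUE)] <- NA   # upper triangle only

stopifnot(isTRUE(all.equal(res_loop, res_vec)))
```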

5.4 Filter significant associations

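Each entry of res is P(X < observed). An entry of at least 0.95 therefore means that observing this many co-occurrences or more by chance has probability at most 0.05, i.e. the pair co-occurs more often than expected (a positive association). Negative associations are not retained by this filter.

```
# subset() drops the NA lower triangle along with pairs below 0.95.
# res has no dimnames, so Source/Target are column indices; map them to taxon names
pos_prob <- subset(melt(res), value >= 0.95)
colnames(pos_prob) <- c("Source", "Target", "value")

pos_prob$Source <- colnames(bac_prob)[pos_prob$Source]
pos_prob$Target <- colnames(bac_prob)[pos_prob$Target]
```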

5.5 Build and plot network

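```
# Undirected graph from the significant pairs (taxa without edges are omitted)
metanet_prob <- graph_from_data_frame(pos_prob[, 1:2], directed = FALSE)

set.seed(1)   # the Fruchterman–Reingold layout is stochastic; fix the seed
layout_prob <- layout_with_fr(metanet_prob)
plot(metanet_prob, vertex.size = 5, vertex.label = NA,
     edge.color = "gray70", layout = layout_prob)
```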
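The taxonomy tables are loaded but not used above; colouring nodes by taxonomic group is a common next step. The sketch below uses a made-up network and taxonomy table, and assumes the taxonomy table has taxon IDs as row names and a column named Phylum. Both are assumptions about tax_df, so adjust the column name to your file.

```r
library(igraph)

# Toy edge list and taxonomy table (made-up)
edges_toy <- data.frame(Source = c("OTU1", "OTU1", "OTU2"),
                        Target = c("OTU2", "OTU3", "OTU4"))
tax_toy <- data.frame(Phylum = c("Firmicutes", "Firmicutes",
                                 "Proteobacteria", "Bacteroidota"),
                      row.names = c("OTU1", "OTU2", "OTU3", "OTU4"),
                      stringsAsFactors = FALSE)

g <- graph_from_data_frame(edges_toy, directed = FALSE)

# Look up each vertex's phylum by name, then map phyla to colours
V(g)$phylum <- tax_toy[V(g)$name, "Phylum"]
phyla <- sort(unique(V(g)$phylum))
pal <- setNames(hcl.colors(length(phyla), palette = "Set 2"), phyla)
V(g)$color <- unname(pal[V(g)$phylum])

set.seed(1)
plot(g, vertex.size = 8, vertex.label = NA, edge.color = "gray70",
     layout = layout_with_fr(g))
legend("topleft", legend = phyla, fill = pal, bty = "n", cex = 0.8)
```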

6. Correlation-Based Networks (Spearman)

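Pairwise Spearman correlations are computed between taxa (the columns of otu_df). A pair is kept as an edge when its correlation is significant (p ≤ 0.05) and at least 0.5, so only strong positive associations are retained. The p-values from rcorr() are not adjusted for multiple testing; with many taxa, consider adjusting them (for example with p.adjust(..., method = "BH")) before thresholding.

```
# Hmisc::rcorr() returns correlations ($r) and p-values ($P) between columns
sp.corr <- rcorr(as.matrix(otu_df), type = "spearman")
sp.corr.r <- sp.corr$r
sp.corr.p <- sp.corr$P

# Zero out non-significant and weak or negative correlations
sp.corr.r[sp.corr.p > 0.05] <- 0
sp.corr.r[sp.corr.r < 0.5] <- 0

# Drop self-correlations (diagonal) and the mirrored lower triangle
# so that each taxon pair appears only once
sp.corr.r[lower.tri(sp.corr.r, diag = TRUE)] <- 0

pos_corr <- melt(sp.corr.r)
pos_corr <- subset(pos_corr, value != 0)
```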
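rcorr() returns a symmetric matrix with 1 on the diagonal, so unless the diagonal and one triangle are removed before melting, the edge list contains self-loops and every pair twice. A minimal sketch of turning a thresholded correlation matrix into a weighted igraph network, using made-up values in place of sp.corr.r:

```r
library(igraph)
library(reshape2)

# Toy thresholded correlation matrix (zero = no edge)
r_toy <- matrix(c(1.0, 0.7, 0.0,
                  0.7, 1.0, 0.6,
                  0.0, 0.6, 1.0),
                nrow = 3, dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Keep the upper triangle only: no self-loops, one row per pair
r_toy[lower.tri(r_toy, diag = TRUE)] <- 0
edges <- subset(melt(r_toy), value != 0)
colnames(edges) <- c("Source", "Target", "weight")

# Columns after the first two become edge attributes (here: weight)
g_corr <- graph_from_data_frame(edges, directed = FALSE)
E(g_corr)$weight
```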

7. Distance-Based Networks (Jaccard & Bray–Curtis)

7.1 Jaccard similarity

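```
# vegdist() compares rows, so t() makes taxa the rows; binary = TRUE
# uses presence/absence. Similarity = 1 - dissimilarity.
sim.jacc <- 1 - as.matrix(vegdist(t(otu_df),
                                  method = "jaccard",
                                  binary = TRUE))
```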

7.2 Bray–Curtis similarity

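```
# Abundance-weighted similarity between taxa; taxa with zero total
# counts give undefined (NaN) values and are best removed beforehand
sim.bc <- 1 - as.matrix(vegdist(t(otu_df), method = "bray"))
```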
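Both similarity matrices are dense, so building a network from them requires a cutoff. The sketch below uses a hypothetical threshold of 0.6 (not a value from this tutorial) on a made-up OTU table, drops self-similarities, and keeps the remaining values as edge weights:

```r
library(vegan)
library(igraph)

# Toy OTU table: rows = samples, columns = taxa (made-up counts)
otu_toy <- data.frame(A = c(5, 3, 0, 8),
                      B = c(4, 2, 0, 9),
                      C = c(0, 0, 7, 1),
                      row.names = paste0("S", 1:4))

sim_toy <- 1 - as.matrix(vegdist(t(otu_toy), method = "bray"))

cutoff <- 0.6               # hypothetical similarity threshold
adj <- sim_toy
adj[adj < cutoff] <- 0      # weak pairs: no edge
diag(adj) <- 0              # no self-loops

# mode = "upper" reads each pair once; nonzero values become edge weights
g_bc <- graph_from_adjacency_matrix(adj, mode = "upper", weighted = TRUE)
as_data_frame(g_bc, what = "edges")
```

Here only A and B (similarity 28/31) pass the cutoff. The same steps apply to sim.jacc; results depend strongly on the threshold, so it is worth checking how network size changes across a range of cutoffs.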

8. Comparing Networks

Networks inferred with different methods, or from different datasets, can be compared using:

  • Jaccard edge overlap
  • Overlap coefficient
  • Modularity
  • Clustering coefficient
  • Edge density
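A minimal sketch of these metrics on two made-up graphs, assuming the usual definitions: Jaccard edge overlap is |E1 ∩ E2| / |E1 ∪ E2| and the overlap coefficient is |E1 ∩ E2| / min(|E1|, |E2|), with edges matched by taxon name. Modularity is computed on communities from cluster_fast_greedy(), which is only one of several community detection algorithms in igraph (cluster_louvain() is another), so modularity values depend on that choice.

```r
library(igraph)

# Two toy undirected networks over taxa A-E (made-up edges)
g1 <- graph_from_literal(A - B, B - C, C - A, D - E)
g2 <- graph_from_literal(A - B, B - C, C - D, D - E)

# Canonical "u|v" labels so an undirected edge matches in either direction
edge_keys <- function(g) {
  el <- as_edgelist(g, names = TRUE)
  paste(pmin(el[, 1], el[, 2]), pmax(el[, 1], el[, 2]), sep = "|")
}
e1 <- edge_keys(g1)
e2 <- edge_keys(g2)
shared <- length(intersect(e1, e2))

jaccard_edges <- shared / length(union(e1, e2))        # 3 / 5
overlap_coef  <- shared / min(length(e1), length(e2))  # 3 / 4

summarize_net <- function(g) {
  c(modularity = modularity(cluster_fast_greedy(g)),
    clustering = transitivity(g, type = "global"),
    density    = edge_density(g))
}
rbind(net1 = summarize_net(g1), net2 = summarize_net(g2))
```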

9. Case Study

Reconstruct networks for two datasets and compare:

  • Global structure
  • Sample subnetworks
  • Module completeness
  • Metadata relationships
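"Sample subnetworks" and "module completeness" are not defined further here. One plausible reading, assumed for this sketch rather than taken from the original analysis: a sample subnetwork is the part of the global network induced by the taxa detected in that sample, and module completeness is the fraction of a module's taxa detected in the sample. Both helper functions below are hypothetical, and the network and OTU table are made up.

```r
library(igraph)

# Toy global network and OTU table (rows = samples, columns = taxa)
g <- graph_from_literal(A - B, B - C, C - A, D - E)
otu_toy <- rbind(S1 = c(A = 3, B = 1, C = 0, D = 2, E = 5),
                 S2 = c(A = 0, B = 4, C = 2, D = 0, E = 0))

# Hypothetical helper: subnetwork of taxa detected in one sample
sample_subnetwork <- function(g, otu, sample) {
  present <- colnames(otu)[otu[sample, ] > 0]
  induced_subgraph(g, which(V(g)$name %in% present))
}

# Hypothetical helper: share of each module's taxa detected in a sample
module_completeness <- function(otu, sample, modules) {
  present <- names(modules) %in% colnames(otu)[otu[sample, ] > 0]
  tapply(present, as.integer(modules), mean)
}

modules <- membership(cluster_fast_greedy(g))
sub_s1  <- sample_subnetwork(g, otu_toy, "S1")
comp_s1 <- module_completeness(otu_toy, "S1", modules)
comp_s1
```

Per-sample values such as these can then be related to the sample metadata.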

10. Conclusions

This workflow provides a reproducible pipeline for constructing and comparing microbial networks.


11. References

  • Layeghifard et al., 2018
  • Röttjers & Faust, 2018
  • Faust & Raes, 2012
  • Veech, 2013
  • Hernandez et al., 2021