Microbial Network Analysis in R: A Complete Tutorial

This tutorial introduces the construction, visualization, and interpretation of microbial association networks using several commonly used approaches: correlation-based, distance-based, and probabilistic models. It is designed for users with introductory R knowledge and microbiome experience.

1. Introduction

Microbial communities consist of interacting populations whose relationships can be represented as networks.

Nodes represent microbial taxa (OTUs/ASVs)
Edges represent significant associations between taxa

Understanding these associations helps disentangle ecological processes, niche differentiation, and community assembly.

1.1 Types of microbial networks

Common approaches to infer pairwise microbial associations include:

Correlation-based: Spearman, SparCC
Distance-based: Bray–Curtis, Jaccard, Kullback–Leibler
Probability-based: Veech model (co-occurrence probability)

Each method captures different ecological signals and produces different network structures.

2. Requirements and Setup

2.1 Install required R packages

```
install.packages(c(
  "ggplot2", "igraph", "reshape2", "Hmisc",
  "vegan", "RColorBrewer", "pheatmap"
))
```

2.2 Load libraries

```
library(ggplot2)
library(igraph)
library(reshape2)
library(Hmisc)
library(vegan)
library(RColorBrewer)
library(pheatmap)
```

3. Downloading the Tutorial Data

The tutorial uses six example files:

otu_df.csv
otu_df2.csv
tax_df.csv
tax_df2.csv
meta_df.txt
meta_df2.txt

Upload your files to a GitHub repository, then replace the <USER>/<REPO> placeholders in the URLs below with your own account and repository names.

3.1 Load the data

```
# OTU tables: rows are samples, columns are taxa.
# Dividing by rowSums() converts counts to per-sample relative abundances.
otu_df <- read.csv("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/otu_df.csv", row.names = 1)
otut_df <- otu_df / rowSums(otu_df)

otu_df2 <- read.csv("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/otu_df2.csv", row.names = 1)
otut_df2 <- otu_df2 / rowSums(otu_df2)

# Taxonomy tables
tax_df <- read.csv("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/tax_df.csv", row.names = 1)
tax_df2 <- read.csv("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/tax_df2.csv", row.names = 1)

# Sample metadata (tab-separated)
sample_df <- read.table("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/meta_df.txt",
                        sep = "\t", header = TRUE)
sample_df2 <- read.table("https://raw.githubusercontent.com/<USER>/<REPO>/main/data/meta_df2.txt",
                         sep = "\t", header = TRUE)
```

4. Network Construction Methods

This tutorial covers:

Co-occurrence probability (Veech model)
Spearman correlation
Jaccard similarity
Bray–Curtis similarity

5. Probabilistic Co-Occurrence Network (Veech Model)

5.1 Convert to presence/absence

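```
# Convert counts to presence/absence (1 = taxon detected in the sample)
otu_prob <- otu_df
otu_prob[otu_prob > 0] <- 1
bac_prob <- as.matrix(otu_prob)
# Taxa-by-taxa matrix: the diagonal holds each taxon's number of occurrences,
# off-diagonal entries hold the number of samples in which a pair co-occurs
occurs_prob <- crossprod(bac_prob)
```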
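
To see what crossprod() returns for a presence/absence table, here is a minimal sketch on a toy matrix. The matrix, its sample and taxon names, and its values are invented for illustration and are not part of the tutorial data.

```r
# Toy presence/absence matrix: 4 samples (rows) x 3 taxa (columns).
# All values are made up for illustration only.
toy <- matrix(c(1, 1, 0,
                1, 0, 0,
                1, 1, 1,
                0, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("S", 1:4), c("A", "B", "C")))

# crossprod(x) is t(x) %*% x, which gives a taxa-by-taxa matrix
occ <- crossprod(toy)

diag(occ)      # number of samples in which each taxon occurs
occ["A", "B"]  # number of samples in which A and B co-occur
```

Reading the diagonal as per-taxon occurrence counts and the off-diagonal as pairwise co-occurrence counts is what lets the later steps look up N1, N2, and the observed co-occurrence directly from one matrix.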

5.2 Veech probability function

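```
# Probability that two taxa co-occur in exactly j of N samples, given that
# they occur in N1 and N2 samples, respectively (Veech, 2013)
Veech_calc <- function(N, j, N1, N2){
  (choose(N, j) * choose(N - j, N2 - j) * choose(N - N2, N1 - j)) /
    (choose(N, N2) * choose(N, N1))
}
```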
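
As a sanity check, the formula can be compared with the hypergeometric density in base R, since both count the ways two taxa can share exactly j samples. The sketch below repeats the function so that it runs on its own; the sample counts (N = 10, N1 = 4, N2 = 5) are hypothetical.

```r
# Same function as in section 5.2, repeated so this block is self-contained
Veech_calc <- function(N, j, N1, N2) {
  (choose(N, j) * choose(N - j, N2 - j) * choose(N - N2, N1 - j)) /
    (choose(N, N2) * choose(N, N1))
}

# Hypothetical example: 10 samples, taxa found in 4 and 5 of them
N <- 10; N1 <- 4; N2 <- 5
j <- 0:min(N1, N2)  # all possible co-occurrence counts

p_veech <- Veech_calc(N, j, N1, N2)
# dhyper(x, m, n, k): x successes in k draws from m successes and n failures
p_hyper <- dhyper(j, N1, N - N1, N2)

all.equal(p_veech, p_hyper)  # checks that the two forms agree
sum(p_veech)                 # probabilities over all j should sum to 1
```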

5.3 Compute probabilities

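```
n_taxa <- ncol(bac_prob)
res <- matrix(NA, n_taxa, n_taxa)

# For each taxon pair (upper triangle only), accumulate P(X < observed),
# the probability of fewer co-occurrences than were observed.
for (i in seq_len(n_taxa - 1)) {
  for (j in (i + 1):n_taxa) {
    out <- 0
    # seq_len() gives an empty loop when the pair never co-occurs;
    # 0:(observed - 1) would wrongly iterate over c(0, -1) in that case
    for (k in seq_len(occurs_prob[i, j]) - 1) {
      out <- out + Veech_calc(nrow(bac_prob), k, occurs_prob[j, j], occurs_prob[i, i])
    }
    res[i, j] <- out
  }
}
```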
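
For tables with many samples, choose() can overflow or lose precision, and the nested loops become slow. Because the Veech probability is a hypergeometric probability, one possible alternative is base R's phyper(), which returns the cumulative probability directly. This is a sketch on an invented toy matrix, not the tutorial's own method; the names toy, n_i, and res_hyper are hypothetical.

```r
# Toy presence/absence matrix (made-up values): 6 samples x 3 taxa
toy <- matrix(c(1, 1, 0,
                1, 1, 0,
                1, 1, 1,
                0, 0, 1,
                1, 1, 0,
                0, 0, 1),
              nrow = 6, byrow = TRUE)

occ <- crossprod(toy)  # taxa-by-taxa co-occurrence counts
N   <- nrow(toy)       # number of samples
n_i <- diag(occ)       # occurrences per taxon

# P(X < observed) for every pair at once.
# phyper(q, m, n, k) is P(X <= q); q = -1 yields 0 for pairs that never co-occur.
res_hyper <- matrix(
  phyper(occ - 1, n_i[row(occ)], N - n_i[row(occ)], n_i[col(occ)]),
  nrow = nrow(occ)
)

# Keep each pair once, matching the upper-triangle layout of the loop version
res_hyper[lower.tri(res_hyper, diag = TRUE)] <- NA
```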

5.4 Filter significant associations

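```
# Keep pairs that co-occur more often than expected by chance:
# P(fewer co-occurrences than observed) >= 0.95
pos_prob <- subset(melt(res), value >= 0.95)
colnames(pos_prob) <- c("Source", "Target", "value")

# melt() returns row/column indices; map them back to taxon names
pos_prob$Source <- colnames(bac_prob)[pos_prob$Source]
pos_prob$Target <- colnames(bac_prob)[pos_prob$Target]
```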

5.5 Build and plot network

```
metanet_prob <- graph_from_data_frame(pos_prob[,1:2], directed = FALSE)

set.seed(1)
layout_prob <- layout_with_fr(metanet_prob)
plot(metanet_prob, vertex.size = 5, vertex.label = NA,
     edge.color = "gray70", layout = layout_prob)
```

6. Correlation-Based Networks (Spearman)

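```
# rcorr() correlates columns, so taxa must be in columns
sp.corr <- rcorr(as.matrix(otu_df), type = "spearman")
sp.corr.r <- sp.corr$r
sp.corr.p <- sp.corr$P

# Keep only significant (p <= 0.05), strongly positive (r >= 0.5) correlations
sp.corr.r[sp.corr.p > 0.05] <- 0
sp.corr.r[sp.corr.r < 0.5] <- 0
# Drop self-correlations and the duplicated lower triangle
sp.corr.r[lower.tri(sp.corr.r, diag = TRUE)] <- 0

pos_corr <- melt(sp.corr.r)
pos_corr <- subset(pos_corr, value != 0)
colnames(pos_corr) <- c("Source", "Target", "value")
```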
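
The filtered correlations can be turned into a network in the same way as in section 5.5. The sketch below uses a hand-made edge list with Source, Target, and value columns, mirroring the layout used in section 5.4; the taxon names and correlation values are invented, and pos_corr_toy and metanet_corr are hypothetical names.

```r
library(igraph)

# Hypothetical edge list (taxon names and correlations are invented)
pos_corr_toy <- data.frame(
  Source = c("OTU1", "OTU1", "OTU2"),
  Target = c("OTU2", "OTU3", "OTU3"),
  value  = c(0.82, 0.61, 0.74)
)

# Columns beyond the first two become edge attributes (here: value)
metanet_corr <- graph_from_data_frame(pos_corr_toy, directed = FALSE)

set.seed(1)
plot(metanet_corr, vertex.size = 5, vertex.label = NA,
     edge.color = "gray70",
     edge.width = 1 + 4 * E(metanet_corr)$value,  # thicker edge = stronger correlation
     layout = layout_with_fr(metanet_corr))
```

Keeping the correlation as an edge attribute is optional, but it makes it possible to scale edge widths or filter edges later without recomputing the correlations.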

7. Distance-Based Networks (Jaccard & Bray–Curtis)

7.1 Jaccard similarity

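```
# vegdist() compares rows, so t() puts taxa in rows;
# binary = TRUE computes Jaccard on presence/absence
sim.jacc <- 1 - as.matrix(vegdist(t(otu_df),
                                  method = "jaccard",
                                  binary = TRUE))
```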

7.2 Bray–Curtis similarity

```
sim.bc <- 1 - as.matrix(vegdist(t(otu_df), method = "bray"))
```
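
A similarity matrix only becomes a network once a cutoff decides which pairs count as edges. The sketch below shows one way to do this on a toy matrix standing in for sim.jacc or sim.bc. The matrix values and the 0.5 cutoff are assumptions for illustration; the tutorial does not prescribe a threshold.

```r
library(igraph)

# Toy taxa-by-taxa similarity matrix (invented values)
sim_toy <- matrix(c(1.0, 0.8, 0.2,
                    0.8, 1.0, 0.6,
                    0.2, 0.6, 1.0),
                  nrow = 3,
                  dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

threshold <- 0.5  # hypothetical cutoff; choose one suited to your data

# Keep each pair once (upper triangle) and drop self-similarities
keep <- upper.tri(sim_toy) & sim_toy >= threshold
edges <- data.frame(
  Source = rownames(sim_toy)[row(sim_toy)[keep]],
  Target = colnames(sim_toy)[col(sim_toy)[keep]],
  value  = sim_toy[keep]
)

metanet_sim <- graph_from_data_frame(edges, directed = FALSE)
```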

8. Comparing Networks

Networks inferred by different methods can be compared using the following metrics:

Jaccard edge overlap
Overlap coefficient
Modularity
Clustering coefficient
Edge density
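
The metrics listed above can be computed with igraph and base R set operations. The following is a minimal sketch on two invented toy networks. The helper names edge_keys and net_summary are hypothetical, and cluster_fast_greedy() is only one of several community-detection methods that could supply the modularity score.

```r
library(igraph)

# Two toy networks on overlapping taxa (edges invented for illustration)
g1 <- graph_from_literal(A - B, B - C, C - A, C - D)
g2 <- graph_from_literal(A - B, B - C, D - E)

# Represent each undirected edge as a sorted "u|v" key so edge sets can be compared
edge_keys <- function(g) {
  el <- as_edgelist(g)
  paste(pmin(el[, 1], el[, 2]), pmax(el[, 1], el[, 2]), sep = "|")
}
e1 <- edge_keys(g1)
e2 <- edge_keys(g2)
shared <- length(intersect(e1, e2))

# Pairwise comparisons between the two edge sets
jaccard_overlap <- shared / length(union(e1, e2))
overlap_coef    <- shared / min(length(e1), length(e2))

# Per-network structural summaries
net_summary <- function(g) {
  c(density    = edge_density(g),
    clustering = transitivity(g, type = "global"),
    modularity = modularity(cluster_fast_greedy(g)))
}
rbind(g1 = net_summary(g1), g2 = net_summary(g2))
```

The Jaccard overlap penalizes differences in network size, whereas the overlap coefficient divides by the smaller edge set, so it stays high when one network is largely contained in the other.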

9. Case Study

Reconstruct networks for two datasets and compare:

Global structure
Sample subnetworks
Module completeness
Metadata relationships
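
One possible reading of "sample subnetworks" is the part of the global network induced by the taxa detected in a single sample. The tutorial does not define the term, so the sketch below is an assumption rather than the prescribed procedure; the toy network, the presence/absence table, and the helper sample_subnet are all hypothetical.

```r
library(igraph)

# Toy global network and presence/absence table (all values invented)
g <- graph_from_literal(A - B, B - C, C - D, A - D)
pa <- matrix(c(1, 1, 0, 1,
               0, 1, 1, 1),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("S1", "S2"), c("A", "B", "C", "D")))

# Subnetwork for one sample: keep only the taxa detected in that sample
sample_subnet <- function(g, pa, sample_id) {
  present <- colnames(pa)[pa[sample_id, ] > 0]
  induced_subgraph(g, vids = intersect(V(g)$name, present))
}

sub_S1 <- sample_subnet(g, pa, "S1")
vcount(sub_S1)  # taxa from the global network present in S1
ecount(sub_S1)  # associations among those taxa
```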

10. Conclusions

This workflow provides a reproducible pipeline for constructing and comparing microbial networks.

11. References

Layeghifard et al., 2018
Röttjers & Faust, 2018
Faust & Raes, 2012
Veech, 2013
Hernandez et al., 2021
