1 Introduction

This report includes stats and diagnostic plots of the Nanostring GeoMx Digital Spatial (DSP) data.

Nanostring offers a set of R packages that enables comprehensive QC/analysis of the GeoMx DSP data and, visit the vignettes for more information. Nanostring generously provided Spatial Organ Atlas (SOA) and I downloaded the Kidney dataset to generate the following results. Lastly, the following results can also be generated by an App from Docker Hub that was built to compile a set of packages to improve reproducibility and replicability. To execute the Docker app, visit a repo on GitHub for the instructions.

2 Prerequisite

2.1 Environment

  • OS: macOS Sonoma 14.0
  • Platform: aarch64-apple-darwin20 (64-bit)
  • Software: R (v4.3.1)
  • Pakcages: NanoStringNCTools (v1.8.0), GeoMxWorkflows (v1.6.0), GeomxTools (v3.4.0), stringr (v1.5.1), yaml (v2.3.7), dplyr (v1.1.3), scales (v1.2.1), ggplot2 (v3.4.3), ggpubr (v0.6.0), ggsankey (v0.0.99999), plotly (v4.10.3), reshape2 (v1.4.4), DescTools (v0.99.50)

2.2 GeoMx DSP Data: SOA_Kidney

  • Slide: hu_kidney_001, hu_kidney_002, hu_kidney_003, hu_kidney_004; N=4
  • Segment: CD10+ â—¼, PanCK+ â—¼, MUC1+ â—¼, Full â—¼; N=4
  • Files: Annotation.xlsx, DCC files, and PKC file are under the Annot, DCC, and PKC folders, respectivley
WORKING_DIRECTORY
├── Annot
│   └── Annotation.xlsx
├── DCC
│   ├── DSP-*-?-???.dcc
└── PKC
    └── *.pkc

3 Preprocessing/QC

3.1 Preparation

  • Set working directory to load files
# Do not run
baseDir <- "WORKING_DIRECTORY" # where the Annot/DCC/PKC/Settings folders are located
setwd(baseDir)
suppressPackageStartupMessages(library(NanoStringNCTools))
suppressPackageStartupMessages(library(GeoMxWorkflows))
suppressPackageStartupMessages(library(GeomxTools))
suppressPackageStartupMessages(library(stringr))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(scales))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(ggpubr))
suppressPackageStartupMessages(library(ggsankey))
suppressPackageStartupMessages(library(plotly))
suppressPackageStartupMessages(library(reshape2))
suppressPackageStartupMessages(library(UpSetR))
suppressPackageStartupMessages(library(grid))
suppressPackageStartupMessages(library(EnvStats))
suppressPackageStartupMessages(library(DescTools))
suppressPackageStartupMessages(library(RColorBrewer))
suppressPackageStartupMessages(library(ComplexHeatmap))
suppressPackageStartupMessages(library(circlize))

3.2 Compile a GeoMx object (Step 1)

  • Load DCC/PKC/Annotation files and export to a GeoMx object, e.g., geomxSet.RDS
dccFiles <- dir(file.path(baseDir, "DCC"), pattern = ".dcc$", full.names = TRUE, recursive = TRUE)
pkcFile <- dir(file.path(baseDir, "PKC"), pattern = ".pkc$", full.names = TRUE, recursive = TRUE)
annotFile <- dir(file.path(baseDir, "Annot"), pattern = ".xlsx$", full.names = TRUE, recursive = TRUE)
initSet <- readNanoStringGeoMxSet(
        dccFiles = dccFiles,
        pkcFiles = pkcFile,
        phenoDataFile = annotFile,
        phenoDataSheet = "Annotation",
        phenoDataDccColName = "FileName",
        protocolDataColNames = c("ROI", "AOI"),
        experimentDataColNames = c("Panel")
)
## Warning in readNanoStringGeoMxSet(dccFiles = dccFiles, pkcFiles = pkcFile, :
## Annotations missing for the following: DSP-1003320000078-H-A01.dcc,
## DSP-1003320000078-H-A02.dcc, DSP-1003320000078-H-A03.dcc,
## DSP-1003320000078-H-A04.dcc, DSP-1003320000078-H-A05.dcc,
## DSP-1003320000078-H-A06.dcc, DSP-1003320000078-H-A07.dcc,
## DSP-1003320000078-H-A08.dcc, DSP-1003320000078-H-A09.dcc,
## DSP-1003320000078-H-A10.dcc, DSP-1003320000078-H-A11.dcc,
## DSP-1003320000078-H-A12.dcc, DSP-1003320000078-H-B01.dcc,
## DSP-1012300040419-G-A01.dcc, DSP-1012300040610-F-A01.dcc,
## DSP-1012300040610-F-E02.dcc, DSP-1012300040610-F-E03.dcc,
## DSP-1012300040610-F-E04.dcc, DSP-1012300040610-F-E05.dcc,
## DSP-1012300040610-F-E06.dcc, DSP-1012300040610-F-E07.dcc,
## DSP-1012300040610-F-E08.dcc, DSP-1012300040610-F-E09.dcc,
## DSP-1012300040610-F-E10.dcc, DSP-1012300040610-F-E11.dcc,
## DSP-1012300040610-F-E12.dcc, DSP-1012300040610-F-F01.dcc,
## DSP-1012300040610-F-F02.dcc, DSP-1012300040610-F-F03.dcc,
## DSP-1012300040610-F-F04.dcc, DSP-1012300040610-F-F05.dcc,
## DSP-1012300040610-F-F06.dcc, DSP-1012300040610-F-F07.dcc,
## DSP-1012300040610-F-F08.dcc, DSP-1012300040610-F-F09.dcc,
## DSP-1012300040610-F-F10.dcc, DSP-1012300040610-F-F11.dcc,
## DSP-1012300040610-F-F12.dcc, DSP-1012300040610-F-G01.dcc,
## DSP-1012300040610-F-G02.dcc, DSP-1012300040610-F-G03.dcc,
## DSP-1012300040610-F-G04.dcc, DSP-1012300040610-F-G05.dcc,
## DSP-1012300040610-F-G06.dcc, DSP-1012300040610-F-G07.dcc,
## DSP-1012300040610-F-G08.dcc, DSP-1012300040610-F-G09.dcc,
## DSP-1012300040610-F-G10.dcc, DSP-1012300040610-F-G11.dcc,
## DSP-1012300040610-F-G12.dcc, DSP-1012300040610-F-H01.dcc,
## DSP-1012305110210-G-D01.dcc, DSP-1012305110210-G-D02.dcc,
## DSP-1012305110210-G-D03.dcc, DSP-1012305110210-G-D04.dcc,
## DSP-1012305110210-G-D05.dcc, DSP-1012305110210-G-D06.dcc,
## DSP-1012305110210-G-D07.dcc, DSP-1012305110210-G-D08.dcc,
## DSP-1012305110210-G-D09.dcc, DSP-1012305110210-G-D10.dcc,
## DSP-1012305110210-G-D11.dcc, DSP-1012305110210-G-D12.dcc,
## DSP-1012305110210-G-E01.dcc, DSP-1012305110210-G-E02.dcc,
## DSP-1012305110210-G-E03.dcc, DSP-1012305110210-G-E04.dcc,
## DSP-1012305110210-G-E05.dcc, DSP-1012305110210-G-E06.dcc,
## DSP-1012305110210-G-E07.dcc, DSP-1012305110210-G-E08.dcc,
## DSP-1012305110210-G-E09.dcc, DSP-1012305110210-G-E10.dcc,
## DSP-1012305110210-G-E11.dcc, DSP-1012305110210-G-E12.dcc,
## DSP-1012305110210-G-F01.dcc, DSP-1012305110210-G-F02.dcc,
## DSP-1012305110210-G-F03.dcc, DSP-1012305110210-G-F04.dcc,
## DSP-1012305110210-G-F05.dcc, DSP-1012305110210-G-F06.dcc,
## DSP-1012305110210-G-F07.dcc, DSP-1012305110210-G-F08.dcc,
## DSP-1012305110210-G-F09.dcc, DSP-1012305110210-G-F10.dcc,
## DSP-1012305110210-G-F11.dcc, DSP-1012305110210-G-F12.dcc,
## DSP-1012305110210-G-G01.dcc, DSP-1012305110210-G-G02.dcc,
## DSP-1012305110210-G-G03.dcc, DSP-1012305110210-G-G04.dcc,
## DSP-1012305110210-G-G05.dcc, DSP-1012305110210-G-G06.dcc,
## DSP-1012305110210-G-G07.dcc, DSP-1012305110210-G-G08.dcc,
## DSP-1012305110210-G-G09.dcc, DSP-1012305110210-G-G10.dcc,
## DSP-1012305110210-G-G11.dcc, DSP-1012305110210-G-G12.dcc,
## DSP-1012305110210-G-H01.dcc, DSP-1012305110210-G-H02.dcc,
## DSP-1012305110210-G-H03.dcc, DSP-1012310040419-H-A01.dcc, These will be
## excluded from the GeoMxSet object.
## Warning in readNanoStringGeoMxSet(dccFiles = dccFiles, pkcFiles = pkcFile, : Not all probes are found within PKC probe metadata. The following probes are ignored from analysis and were most likely removed from metadata while resolving multiple module PKC version conflicts.
## RTS0021749RTS0021916RTS0021976RTS0021979RTS0022732RTS0027943RTS0031708RTS0036612RTS0036631RTS0050279RTS0050433RTS0050547RTS0050580RTS0051214RTS0051219RTS0051232RTS0051449RTS0052009RTS0029660
dim(initSet)
## Features  Samples 
##    18815      234
head(assayData(initSet)$exprs[, c(1:2)])
##            DSP-1003320000078-H-B02.dcc DSP-1003320000078-H-B03.dcc
## RTS0020877                          81                         177
## RTS0020878                           0                           6
## RTS0020879                           3                           8
## RTS0020880                           4                          16
## RTS0020881                           5                          13
## RTS0020882                           7                          10

3.3 Assign a pseudo value (Step 2)

  • Shift any expression counts with a value of zero (0) to one (1) to enable in downstream transformations
gSet <- shiftCountsOne(initSet, useDALogic = TRUE)
head(assayData(gSet)$exprs[, c(1:2)])
##            DSP-1003320000078-H-B02.dcc DSP-1003320000078-H-B03.dcc
## RTS0020877                          81                         177
## RTS0020878                           1                           6
## RTS0020879                           3                           8
## RTS0020880                           4                          16
## RTS0020881                           5                          13
## RTS0020882                           7                          10

3.4 Segment QC (Step 3)

  • Assess sequencing quality and adequate tissue sampling for every ROI/AOI segment
    • Any segments below the dotted line, i.e., recommended thresholds, will be filtered out
    • Row: Slide, Column: Segment
module <- gsub(".pkc", "", annotation(gSet))

gSet <- setSegmentQCFlags(gSet, qcCutoffs = segmentQcParams)
qcMat <- sData(gSet)
qcMat$Segment <- factor(qcMat$Segment, levels = segmentOrder)
suppressWarnings(histQC(qcMat, "Trimmed (%)", segmentQC_colBy, segmentQC_rowBy, segmentQC_trimmedThre, cols = segmentCols))

suppressWarnings(histQC(qcMat, "Stitched (%)", segmentQC_colBy, segmentQC_rowBy, segmentQC_stitchedThre, cols = segmentCols))

suppressWarnings(histQC(qcMat, "Aligned (%)", segmentQC_colBy, segmentQC_rowBy, segmentQC_alignedThre, cols = segmentCols))

suppressWarnings(histQC(qcMat, "Saturated (%)", segmentQC_colBy, segmentQC_rowBy, segmentQC_saturatedThre, cols = segmentCols))

suppressWarnings(histQC(qcMat, "area", segmentQC_colBy, segmentQC_rowBy, segmentQC_areaThre, "log10", "AOI Area (log10)", cols = segmentCols))

suppressWarnings(histQC(qcMat, "nuclei", segmentQC_colBy, segmentQC_rowBy, segmentQC_nucleiThre, "log10", "AOI nuclei count", cols = segmentCols))

segmentQcResults <- protocolData(gSet)[["QCFlags"]]
flagColumns <- colnames(segmentQcResults)
qcSummary <- data.frame(Pass = colSums(!segmentQcResults[, flagColumns]), Warning = colSums(segmentQcResults[, flagColumns]))
segmentQcResults$QCStatus <- apply(segmentQcResults, 1L, function(x) {
        ifelse(sum(x) == 0L, "PASS", "WARNING")
})
qcSummary["TOTAL FLAGS", ] <- c(sum(segmentQcResults[, "QCStatus"] == "PASS"), sum(segmentQcResults[, "QCStatus"] == "WARNING"))
qcSummary[, "TOTAL"] <- apply(qcSummary, 1, sum)
qcSummary
##               Pass Warning TOTAL
## LowReads       234       0   234
## LowTrimmed     233       1   234
## LowStitched    233       1   234
## LowAligned     233       1   234
## LowSaturation  234       0   234
## LowNegatives   234       0   234
## LowNuclei      234       0   234
## LowArea        234       0   234
## TOTAL FLAGS    233       1   234
gSet <- gSet[, segmentQcResults$QCStatus == "PASS"]
dim(gSet)
## Features  Samples 
##    18815      233

3.5 Background signal (Step 4)

  • Calculate negative count which is the geometric mean of the several unique negative probes in the GeoMx panel that do not target mRNA and establish the background count level per segment
negCol <- "NegGeoMean"

negativeGeoMeans <- esBy(
        negativeControlSubset(gSet),
        GROUP = "Module",
        FUN = function(x) {
                assayDataApply(x, MARGIN = 2, FUN = ngeoMean, elt = "exprs")
        }
)
protocolData(gSet)[[negCol]] <- negativeGeoMeans
pData(gSet)[, negCol] <- sData(gSet)[[negCol]]

backgrounMat <- sData(gSet)
backgrounMat <- backgrounMat[, c("NegGeoMean", segmentQC_colBy, segmentQC_rowBy)]
backgrounMat$Segment <- factor(backgrounMat$Segment, levels = segmentOrder)
suppressWarnings(histQC(backgrounMat, "NegGeoMean", segmentQC_colBy, segmentQC_rowBy, 2, "log10", "GeoMean(negative probes)", cols = segmentCols))

3.6 Probe-level QC (Step 5)

  • Remove low-performing probes. In short, this QC is an outlier removal process, whereby probes are either removed entirely from the study (global) or from specific segments (local)
gSet <- setBioProbeQCFlags(gSet, qcCutoffs = probeQcParams, removeLocalOutliers = TRUE)
probeQcResults <- fData(gSet)[["QCFlags"]]

qcDf <- data.frame(
        Passed = sum(rowSums(probeQcResults[, -1]) == 0),
        Global = sum(probeQcResults$GlobalGrubbsOutlier),
        Local = sum(rowSums(probeQcResults[, -2:-1]) > 0 & !probeQcResults$GlobalGrubbsOutlier),
        TOTAL = nrow(probeQcResults)
)
qcDf
##   Passed Global Local TOTAL
## 1  18790      1    24 18815
probeQcPassed <- subset(
        gSet,
        fData(gSet)[["QCFlags"]][, c("LowProbeRatio")] == FALSE &
                fData(gSet)[["QCFlags"]][, c("GlobalGrubbsOutlier")] == FALSE
)
gSet <- probeQcPassed
dim(gSet)
## Features  Samples 
##    18814      233

3.7 Gene-level aggregation (Step 6)

  • Generate a gene-level count matrix where the count for any gene with multiple probes per segment is calculated as the geometric mean of those probes
newSet <- aggregateCounts(gSet)
newSet <- subset(newSet, fData(newSet)$TargetName != "NegProbe-WTX")
dim(newSet)
## Features  Samples 
##    18676      233

3.8 Segment QC (Step 7)

3.8.1 Calculate LOQ per segment

  • Check the AOI area and the number of nuclei estimated
lowLvqcDf <- data.frame(
        Slide = pData(newSet)$Slide,
        Sample = pData(newSet)$Library,
        Segment = pData(newSet)$Segment,
        Area = pData(newSet)$area,
        Nuclei = pData(newSet)$nuclei
)
lowLvqcDf$Slide <- factor(lowLvqcDf$Slide, levels=slideOrder)
lowLvqcDf$Segment <- factor(lowLvqcDf$Segment, levels=segmentOrder)

dodge <- position_dodge(width = 0.5)
suppressWarnings({
        ggplot(data = lowLvqcDf, aes(x = Segment, y = Area, fill = Segment)) +
                geom_violin(position = dodge, size = 0) +
                geom_boxplot(width = 0.1, position = dodge, fill="white") +
                scale_fill_manual(values = segmentCols) +
                facet_grid( ~ Slide) +
                labs(
                        title = "",
                        x = "", 
                        y = "Area"
                ) +
                theme_bw() +
                theme(
                        axis.line = element_line(colour = "black"),
                        panel.grid.major = element_blank(),
                        panel.grid.minor = element_blank(),
                        panel.border = element_blank(),
                        panel.background = element_blank(),
                        axis.text.x = element_text(angle = 0, vjust = 0, hjust = 0.5),
                        legend.position = "none", 
                        text = element_text(size = 12)
                ) + 
                scale_y_continuous(trans = "log10")
})

suppressWarnings({
        ggplot(data = lowLvqcDf, aes(x = Segment, y = Nuclei, fill = Segment)) +
                geom_violin(position = dodge, size = 0) +
                geom_boxplot(width = 0.1, position = dodge, fill="white") +
                scale_fill_manual(values = segmentCols) +
                facet_grid( ~ Slide) +
                labs(
                        title = "",
                        x = "", 
                        y = "#Nuclei"
                ) +
                theme_bw() +
                theme(
                        axis.line = element_line(colour = "black"),
                        panel.grid.major = element_blank(),
                        panel.grid.minor = element_blank(),
                        panel.border = element_blank(),
                        panel.background = element_blank(),
                        axis.text.x = element_text(angle = 0, vjust = 0, hjust = 0.5),
                        legend.position = "none", 
                        text = element_text(size = 12)
                ) + 
                scale_y_continuous(trans = "log10")
})

suppressWarnings({
        scatterPlot <- ggplot(data = lowLvqcDf, aes(x = Area, y = Nuclei, color = Segment, label = Sample, label2 = Slide)) +
                geom_point() +
                scale_color_manual(values = segmentCols) +
                labs(
                        title = "",
                        x = "Area", 
                        y = "#Nuclei"
                ) +
                theme_bw() +
                theme(
                        axis.line = element_line(colour = "black"),
                        panel.grid.major = element_blank(),
                        panel.grid.minor = element_blank(),
                        panel.border = element_blank(),
                        panel.background = element_blank(),
                        axis.text.x = element_text(angle = 0, vjust = 0, hjust = 0.5),
                        text = element_text(size = 12)
                ) + 
                scale_x_continuous(trans = "log10") + 
                scale_y_continuous(trans = "log10")
})
ggplotly(scatterPlot)
  • Determine the limit of quantification (LOQ) per segment where the LOQ is calculated based on the distribution of negative control probes and is intended to approximate the quantifiable limit of gene expression per segment
loqDf <- data.frame(row.names = colnames(newSet))

varNames <- paste0(c("NegGeoMean_", "NegGeoSD_"), module)
if (all(varNames[1:2] %in% colnames(pData(newSet)))) {
        loqDf[, module] <- pData(newSet)[, varNames[1]] * (pData(newSet)[, varNames[2]]^loqCutoff)
}

statDf <- data.frame(
        Slide = pData(newSet)$Slide,
        Sample = pData(newSet)$Sample,
        Library = protocolData(newSet)$AOI,
        Segment = pData(newSet)$Segment,
        LOQ = loqDf[, 1]
)
statShort <- dcast(statDf, Sample ~ Segment, fun.aggregate = mean, value.var = "LOQ")
perSample <- statShort[, c(2:ncol(statShort))]
rownames(perSample) <- as.character(statShort[, 1])
statShort <- data.frame(
        Sample = rownames(perSample),
        LOQmean = as.matrix(apply(perSample, 1, mean, na.rm = TRUE))
)
statShort <- statShort[order(statShort$LOQmean), ]
statDf$Sample <- factor(statDf$Sample, levels = statShort$Sample)
statDf$Segment <- factor(statDf$Segment, levels = segmentOrder)

dodge <- position_dodge(width = 0.5)
suppressWarnings({
        ggplot(data = statDf, aes(x = Segment, y = log10(LOQ), fill = Segment)) +
                geom_violin(position = dodge, size = 0) +
                geom_boxplot(width = 0.1, position = dodge, fill = "white") +
                scale_fill_manual(values = segmentCols) +
                labs(
                        title = "",
                        x = "Segment",
                        y = "LOQ, log10"
                ) +
                theme_bw() +
                theme(
                        axis.line = element_line(colour = "black"),
                        panel.grid.major = element_blank(),
                        panel.grid.minor = element_blank(),
                        panel.border = element_blank(),
                        panel.background = element_blank(),
                        axis.text.x = element_text(angle = 0, vjust = 0, hjust = 0.5),
                        legend.position = "none",
                        text = element_text(size = 12)
                ) +
                geom_hline(aes(yintercept = log10(loqMin)), lty = 2, col = "grey50")
})

suppressWarnings({
        ggplot(data = statDf, aes(x = Segment, y = log10(LOQ), fill = Segment)) +
                geom_violin(position = dodge, size = 0) +
                geom_boxplot(width = 0.1, position = dodge, fill = "white") +
                scale_fill_manual(values = segmentCols) +
                facet_grid(~Slide) +
                labs(
                        title = "",
                        x = "Segment",
                        y = "LOQ, log10"
                ) +
                theme_bw() +
                theme(
                        axis.line = element_line(colour = "black"),
                        panel.grid.major = element_blank(),
                        panel.grid.minor = element_blank(),
                        panel.border = element_blank(),
                        panel.background = element_blank(),
                        axis.text.x = element_text(angle = 90, vjust = 0, hjust = 0),
                        legend.position = "none",
                        text = element_text(size = 12)
                ) +
                geom_hline(aes(yintercept = log10(loqMin)), lty = 2, col = "grey50")
})

suppressWarnings({
        ggplot(data = statDf, aes(x = Sample, y = log10(LOQ), group = Segment)) +
                geom_line(aes(color = Segment), lwd = 0.5) +
                geom_point(aes(color = Segment)) +
                scale_color_manual(values = segmentCols) +
                labs(
                        title = "",
                        x = "",
                        y = "LOQ, log10"
                ) +
                theme_bw() +
                theme(
                        axis.line = element_line(colour = "black"),
                        panel.grid.major = element_blank(),
                        panel.grid.minor = element_blank(),
                        panel.border = element_blank(),
                        panel.background = element_blank(),
                        axis.text.x = element_text(angle = 90, vjust = 0, hjust = 0),
                        # legend.position = "none",
                        text = element_text(size = 12)
                ) +
                scale_x_discrete(guide = guide_axis(check.overlap = TRUE)) + 
                geom_hline(aes(yintercept = log10(loqMin)), lty = 2, col = "grey50")
})

loqDf[loqDf < loqMin] <- loqMin
pData(newSet)$LOQ <- loqDf

loqMat <- t(esApply(newSet, MARGIN = 1, FUN = function(x) {
        x > LOQ[, module]
}))
loqMat <- loqMat[fData(newSet)$TargetName, ] # Ordering

3.8.2 Segment gene detection

  • Filter out segments with exceptionally low signal that have a small fraction of panel genes detected above the LOQ relative to the other segments in the study
pData(newSet)$GenesDetected <- colSums(loqMat, na.rm = TRUE)
pData(newSet)$GeneDetectionRate <- pData(newSet)$GenesDetected / nrow(newSet)
pData(newSet)$DetectionThreshold <- cut(pData(newSet)$GeneDetectionRate, breaks = geneDetectionRateBins, labels = geneDetectionRateBinLabels)

rateMat <- pData(newSet)
rateMat$Segment <- factor(rateMat$Segment, levels = segmentOrder)
suppressWarnings({
        ggplot(rateMat, aes(x = DetectionThreshold)) +
                geom_bar(aes(fill = Segment)) +
                geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5) +
                scale_fill_manual(values = segmentCols) +
                theme_bw() +
                theme(
                        axis.line = element_line(colour = "black"),
                        panel.background = element_blank(),
                        axis.text.x = element_text(angle = 90, vjust = 0, hjust = 0),
                        text = element_text(size = 12)
                ) +
                scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
                labs(
                        x = "Gene Detection Rate",
                        y = "Number of Segments/Samples",
                        fill = "Segment"
                ) +
                facet_grid(as.formula(paste("~", segmentQC_rowBy)))
})

statDf <- data.frame(
        Sample = protocolData(newSet)$AOI,
        Segment = pData(newSet)$Segment,
        GenesDetected = colSums(loqMat, na.rm = TRUE),
        Nuclei = pData(newSet)$nuclei
)
statDf$Segment <- factor(statDf$Segment, segmentOrder)

for (segment in segmentOrder) {
        tmpDf <- statDf[which(statDf$Segment == segment), ]
        tmpDf <- tmpDf[order(tmpDf$GenesDetected, decreasing = F), ]
        tmpDf$Sample <- factor(tmpDf$Sample, levels = tmpDf$Sample)
        barplot <- ggplot(tmpDf, aes(x = Sample, y = GenesDetected, fill = Segment)) +
                geom_bar(stat = "identity") +
                scale_fill_manual(values = segmentCols[segment]) +
                theme_minimal() +
                labs(
                        title = "",
                        x = "Sample"
                ) +
                coord_flip() +
                theme_bw() +
                theme(
                        axis.line = element_line(colour = "black"),
                        panel.grid.major = element_blank(),
                        panel.grid.minor = element_blank(),
                        panel.border = element_blank(),
                        panel.background = element_blank(),
                        axis.text.x = element_text(angle = 0, vjust = 0, hjust = 0),
                        legend.position = "none"
                ) +
                scale_x_discrete(guide = guide_axis(check.overlap = TRUE))
        print(barplot)
}

newSet <- newSet[, pData(newSet)$GeneDetectionRate >= geneDetectionRateThre]
dim(newSet)
## Features  Samples 
##    18676      233

3.8.3 Gene detection rate

  • Determine the detection rate for genes across the study where individual genes are detected to varying degrees in the segments. In other words, we will calculate the total number of genes detected in different percentages of segments to filter out low detected genes
loqMat <- loqMat[, colnames(newSet)]
fData(newSet)$DetectedSegments <- rowSums(loqMat, na.rm = TRUE)
fData(newSet)$DetectionRate <- fData(newSet)$DetectedSegments / nrow(pData(newSet))

geneDetectionRateDf <- data.frame(
        Gene = rownames(newSet),
        Number = fData(newSet)$DetectedSegments,
        DetectionRate = percent(fData(newSet)$DetectionRate)
)
geneDetectionRateDf <- geneDetectionRateDf[order(geneDetectionRateDf$Number, geneDetectionRateDf$DetectionRate, geneDetectionRateDf$Gene), ]

finalSet <- newSet[fData(newSet)$DetectionRate >= geneDetectionRateThre, ]
dim(finalSet)
## Features  Samples 
##    11225      233

3.9 Normalization (Step 8)

  • Upper-quartile (Q3) normalization method estimates a normalization factor per segment to bring the segment data distributions together
finalSet <- normalize(finalSet, norm_method = "quant", desiredQuantile = .75, toElt = "q_norm")
annot <- pData(finalSet)

segmentIdx <- c()
for (segment in segmentOrder) segmentIdx <- c(segmentIdx, which(annot$Segment == segment))

rawMat <- assayData(finalSet)$exprs
colnames(rawMat) <- annot$Library
rawDf <- melt(rawMat)
colnames(rawDf) <- c("Gene", "Sample", "Expression")
rawDf$Segment <- rep(annot$Segment, each=nrow(rawMat))
rawDf$Segment <- factor(rawDf$Segment, levels = segmentOrder)

normMat <- assayData(finalSet)$q_norm
colnames(normMat) <- annot$Library
normDf <- melt(normMat)
colnames(normDf) <- c("Gene", "Sample", "Expression")
normDf$Segment <- rep(annot$Segment, each=nrow(rawMat))
normDf$Segment <- factor(normDf$Segment, levels = segmentOrder)
boxplot(log10(Expression) ~ Segment, data = rawDf, col = segmentCols, pch = 20, ylab = "Raw count, log10", xlab = "Segment")

rawMat <- rawMat[, segmentIdx]
boxplot(log10(rawMat), col = segmentCols[annot$Segment[segmentIdx]], pch = 20, names = NA, xlab = "Sample", ylab = "Raw count, log10")

boxplot(log10(Expression) ~ Segment, data = normDf, col = segmentCols, pch = 20, ylab = "Upper-quartile norm, log10", xlab = "Segment")

normMat <- normMat[, segmentIdx]
boxplot(log10(normMat), col = segmentCols[annot$Segment[segmentIdx]], pch = 20, names = NA, xlab = "Sample", ylab = "Upper-quartile norm, log10")

3.10 Outlier detection (Step 9)

  • Principal Component Analysis
normMat <- assayData(finalSet)$q_norm
pcaRes <- prcomp(t(normMat), scale = TRUE)
pcaDf <- data.frame(
        Sample = annot$Library,
        X = pcaRes$x[, 1],
        Y = pcaRes$x[, 2],
        Slide = annot$Slide,
        Segment = annot$Segment,
        Region = annot$Region
)
pcaDf$Slide <- factor(pcaDf$Slide, levels = slideOrder)
pcaDf$Segment <- factor(pcaDf$Segment, levels = segmentOrder)
pcaDf$Region <- factor(pcaDf$Region)

suppressWarnings({
        pcaPlot <- ggplot(data = pcaDf, aes(x = X, y = Y, color = Slide, label = Sample)) +
                geom_point(size = 2, shape = 20) +
                scale_color_manual(values = slideCols) +
                labs(
                        x = paste0("PC1 (", round(summary(pcaRes)$importance[2, c(1)] * 100, 1), "%)"),
                        y = paste0("PC2 (", round(summary(pcaRes)$importance[2, c(2)] * 100, 1), "%)")
                ) +
                theme_bw() +
                theme(
                        axis.line = element_line(colour = "black"),
                        panel.grid.major = element_blank(),
                        panel.grid.minor = element_blank(),
                        panel.border = element_blank(),
                        panel.background = element_blank(),
                        axis.text.x = element_text(angle = 0, vjust = 0, hjust = 0.5),
                        text = element_text(size = 12)
                )
})
ggplotly(pcaPlot)
suppressWarnings({
        pcaPlot <- ggplot(data = pcaDf, aes(x = X, y = Y, color = Segment, label = Sample)) +
                geom_point(size = 2, shape = 20) +
                scale_color_manual(values = segmentCols) +
                labs(
                        x = paste0("PC1 (", round(summary(pcaRes)$importance[2, c(1)] * 100, 1), "%)"),
                        y = paste0("PC2 (", round(summary(pcaRes)$importance[2, c(2)] * 100, 1), "%)")
                ) +
                theme_bw() +
                theme(
                        axis.line = element_line(colour = "black"),
                        panel.grid.major = element_blank(),
                        panel.grid.minor = element_blank(),
                        panel.border = element_blank(),
                        panel.background = element_blank(),
                        axis.text.x = element_text(angle = 0, vjust = 0, hjust = 0.5),
                        text = element_text(size = 12)
                )
})
ggplotly(pcaPlot)
suppressWarnings({
        pcaPlot <- ggplot(data = pcaDf, aes(x = X, y = Y, color = Region, label = Sample)) +
                geom_point(size = 2, shape = 20) +
                scale_color_manual(values = regionCols) +
                labs(
                        x = paste0("PC1 (", round(summary(pcaRes)$importance[2, c(1)] * 100, 1), "%)"),
                        y = paste0("PC2 (", round(summary(pcaRes)$importance[2, c(2)] * 100, 1), "%)")
                ) +
                theme_bw() +
                theme(
                        axis.line = element_line(colour = "black"),
                        panel.grid.major = element_blank(),
                        panel.grid.minor = element_blank(),
                        panel.border = element_blank(),
                        panel.background = element_blank(),
                        axis.text.x = element_text(angle = 0, vjust = 0, hjust = 0.5),
                        text = element_text(size = 12)
                )
})
ggplotly(pcaPlot)

3.11 QC Summary

  • The number of features, i.e., probe or gene, and samples
Dataset #Samples #Features
Initial 234 18815
After QC (analysis-ready) 233 11225
  • Study design after QC
countDf <- pData(finalSet) %>% make_long(Slide, Segment, Region)
ggplot(countDf, aes(x = x, next_x = next_x, node = node, next_node = next_node, fill = factor(node), label = node)) +
        geom_sankey(flow.alpha = .6, node.color = "gray30") +
        geom_sankey_label(size = 3, color = "black", fill = "white") +
        scale_fill_viridis_d(option = "A", alpha = 0.95) +
        theme_sankey(base_size = 18) +
        labs(
                x = NULL, 
                y = NULL
        ) +
        theme_bw() +
        theme(
                axis.line = element_blank(),
                panel.grid.major = element_blank(),
                panel.grid.minor = element_blank(),
                panel.border = element_blank(),
                panel.background = element_blank(),
                axis.text.x = element_text(angle = 0, vjust = 0, hjust = 0.5),
                axis.ticks.x = element_blank(),
                axis.text.y = element_blank(),
                axis.ticks.y = element_blank(),
                legend.position = "none", 
                text = element_text(size = 12)
        )

  • Stats in the reference datasets, i.e., Nanostring Spatial Organ Atals (SOA)

    • QC paramters used in this report
Step Filter name Threshold Description
SegmentQC minSegmentReads 1000 Minimum number of reads
percentTrimmed 80 Minimum % of reads trimmed
percentStitched 80 Minimum % of reads stitched
percentAligned 75 Minimum % of reads aligned
percentSaturation 50 Minimum sequencing saturation (%)
minNuclei 20 Minimum number of nuclei estimated
minArea 1000 Minimum segment area
ProbeQC minProbeRatio 0.1 Geometric mean of a given probe / geometric mean of all probe
percentFailGrubbs 20 An outlier according to the Grubb’s test (%)
LOQ loqCutoff 2 LOQ cut off value
loqMin 2 LOQ minimum value
GeneQC geneDetectionRateThre 0.05 Minimum gene detection rate