Quick facts about the datasets:
library(tidyverse)
library(FRmatch)
library(SingleCellExperiment)
library(viridis)
The smFISH data is pre-clustered following the SCANPY pipeline using Leiden clustering with resolution 0.8, which resulted in 16 smFISH clusters shown in a UMAP below.
Load pre-made data objects. See here to construct your own data object. The markers in this case will be the smFISH probe genes.
load("~/Documents/SpaceTx-2020/Data/dat-smFISH-filtered-cluster.rda")
load("~/Documents/SpaceTx-2020/Data/dat-scRNA-smFISHprobes.rda")
ls()
[1] "sce.scRNA" "sce.smFISH"
Check the gene exprresion distribution from the two assays. The density plots showed very different data range and distributional property.
par(mfrow=c(1,2))
plot(density(assay(sce.smFISH)), main="smFISH")
plot(density(assay(sce.scRNA)), main="scRNAseq")
Use the built-in normalization function in FR-Match to align the distributions.
sce.scRNA.norm <- normalization(sce.scRNA)
sce.smFISH.norm <- normalization(sce.smFISH, norm.by = "mean")
par(mfrow=c(1,2))
plot(density(assay(sce.smFISH.norm)), main="smFISH after normalization")
plot(density(assay(sce.scRNA.norm)), main="scRNAseq after normalization")
Note that, in our gene by gene examination, we noticed that the sensitivity of the Kcnip4 probe gene in smFISH was low (i.e. noise-like weak expression), therefore we remove this probe gene in our analysis.
par(mfrow=c(1,2))
plot(density(assay(sce.smFISH.norm)["Kcnip4",]), main="Kcnip4 | smFISH normalized")
plot(density(assay(sce.scRNA.norm)["Kcnip4",]), main="Kcnip4 | scRNAseq normalized")
rowData(sce.smFISH.norm)["Kcnip4",] <- 0
rowData(sce.scRNA.norm)["Kcnip4",] <- 0
myrst <- FRmatch_cell2cluster(sce.smFISH.norm, sce.scRNA.norm,
filter.size=5, subsamp.size=10, subsamp.iter=1e3)
Preparing for spatial plot.
## spatial coordinates
coord <- read.csv("~/Documents/SpaceTx-2020/Data/smFISH_filtered/smFISH_filtered_metadata.csv") %>%
select(sample_name, rotated_x, rotated_y) %>% mutate(cell=as.character(sample_name), x=rotated_x, y=rotated_y) %>%
select(cell, x, y)
## order query (smFISH) clusters based on SCANPY dendrogram
query.cluster.levels <- paste0("query.",c("15","11","7","10","0","12","9","14",
"2","3","6","1","5","4","8","13"))
## scRNA cluster map between consensus clusters and subclasses
sumtab.clusters <- data.table::fread("~/Documents/SpaceTx-2020/Data/snRNAseq/mouseVISp_cluster_summary.csv")
scRNA.cluster.map <- sumtab.clusters %>% select(consensus_cluster, subclass, broad_class) %>% distinct() %>%
filter(!(consensus_cluster=="cl12_i155_Crabp1_Sncg"&subclass=="Lamp5"))
Here, we used a bottom-up strategy to summarize the cell type calling at the subclass level. The granularity of the scRNAseq data is high, which resulted in 191 consensus cell types corresponding to 22 cell subclasses.
df.myrst <- myrst$cell2cluster %>% mutate(match=gsub("ref.","",match))
df <- inner_join(coord, df.myrst, by=c("cell"="query.cell")) %>%
mutate(query.cluster = factor(query.cluster, levels = query.cluster.levels)) %>%
left_join(scRNA.cluster.map, by=c("match"="consensus_cluster")) #bottom-up cluster map
## plot subclass
g <- ggplot(df, aes(x=x, y=y, col=query.cluster, label=gsub("query.","",query.cluster))) +
geom_point(aes(alpha=score)) + geom_text(size=2) +
xlim(min(df$x)-1, max(df$x)+1) + ylim(min(df$y)-1, max(df$y)+1) +
theme_bw() +
facet_wrap(~ subclass, ncol=4)
g
Conclusion: