cell-to-cell-heterogeneity

We evaluate the variability in gene expression profiles for each individual samples. The principal component analysis suggest that after our normalization steps, the major source of variation is not driven by replicate labels but instead is highly correlated with mean molecule counts across replicates. Therefore, for transcriptome-wide comparison of cell-to-cell heterogeneity, we combine replicates of each individual samples and compute each gene’s coefficients of variation across the combined samples. link1

To account for the confounding effect of mean molecule counts in the individual difference of coefficients of variation, we fit a data-wide PCA and filtered the first PC (29% of variation), which is highly correlated with mean molecule counts (r = .90).

Transcriptome-wide cell-to-cell heterogeneity between individuals

To compare cell-to-cell heterogeneity of gene expression profiles between individuals, we used coefficients of variation of the normalized counts. We combine the replicates to compute per gene coefficient of variation for each individual samples (link1); principal component analysis of each individual samples consistenly suggest that the main source of sample variation is mean gene expression level, and is not a function of replicate labels. We observed that the CV profiles differ between individuals by the Friedman rank sum test (p < 1e-5). To investigate individual differencees in cell-to-cell heterogeneity beyond the effect of mean gene expression levels, we remove the first PC from the normalized counts, which accounts for 29% of variation in the data and is highly correlated with mean gene expression levels (r = .9) (link2).

Between individuals, the CV profiles are highly correlated prior to PC1 removal, possibly due to the high correlation between PC1 and mean expression levels. The adjusted CV profiles are orthogonal of each other, suggesting the presence of individual-specific cell-to-cell heterogeneity profiles in single-cell sequencing data that is independent of mean expression levels. link3

We furthered investigated genes with extremely high or low adjusted CV profiles (mean +/- 2 standard deviation). [Summarize functional categories…]

We further investigate potential sources of biological variation in iPSC cells that may be due to cel-cycle and pluripotency. The adjusted CV profiles of the cell-cycle genes significantly differed between individuals (p < .0005), and so are the non-cell-cycle genes (p < 1e-6). However, we observed that the adjusted CV profiles do not differ between the pluripotent genes. (link2)