Unique properties of a subset of human pluripotent stem cells with high capacity for self-renewal

Lau, Kevin X.; Mason, Elizabeth A.; Kie, Joshua; De Souza, David P.; Kloehn, Joachim; Tull, Dedreia; McConville, Malcolm J.; Keniry, Andrew; Beck, Tamara; Blewitt, Marnie E.; Ritchie, Matthew E.; Naik, Shalin H.; Zalcenstein, Daniela; Korn, Othmar; Su, Shian; Romero, Irene Gallego; Spruce, Catrina; Baker, Christopher L.; McGarr, Tracy C.; Wells, Christine A.; Pera, Martin F.

doi:10.1038/s41467-020-16214-8

Published: 15 May 2020

Nature Communications volume 11, Article number: 2420 (2020)

Abstract

Archetypal human pluripotent stem cells (hPSC) are widely considered to be equivalent in developmental status to mouse epiblast stem cells, which correspond to pluripotent cells at a late post-implantation stage of embryogenesis. Heterogeneity within hPSC cultures complicates this interspecies comparison. Here we show that a subpopulation of archetypal hPSC enriched for high self-renewal capacity (ESR) has distinct properties relative to the bulk of the population, including a cell cycle with a very low G1 fraction and a metabolomic profile that reflects a combination of oxidative phosphorylation and glycolysis. ESR cells are pluripotent and capable of differentiation into primordial germ cell-like cells. Global DNA methylation levels in the ESR subpopulation are lower than those in mouse epiblast stem cells. Chromatin accessibility analysis revealed a unique set of open chromatin sites in ESR cells. RNA-seq at the subpopulation and single cell levels shows that, unlike mouse epiblast stem cells, the ESR subset of hPSC displays no lineage priming, and that it can be clearly distinguished from gastrulating and extraembryonic cell populations in the primate embryo. ESR hPSC correspond to an earlier stage of post-implantation development than mouse epiblast stem cells.

Introduction

The successful application of human pluripotent stem cells (hPSC) in research and cell therapy relies on the ability to maintain, expand, and differentiate these cells in vitro in a tightly controlled and efficient fashion. Our understanding of the regulation of pluripotent stem cell self-renewal and lineage specification is in turn built largely on embryological paradigms, with developmental roadmaps providing critical knowledge of the key transitional stages, and pinpointing the extrinsic and internal molecular pathways that drive cell fate decisions. In the mouse, we now have a fairly clear understanding of the states of pluripotency that span the developmental stages between the blastocyst and the late gastrula in vivo¹. The characterization of mouse naive embryonic stem cells (ESC)² and epiblast stem cells (EpiSC)^3,4 as in vitro equivalents of the preimplantation epiblast and the anterior primitive streak, respectively⁵, has shed considerable light on the properties of the cultured cells. Recently, a stage between these two pluripotent states called formative pluripotency, corresponding to the early post-implantation epiblast, has been described⁶. Defined by specific molecular and biological features, including an absence of lineage priming and a rapid response to induction of lineage specification (including the germline lineage), formative mouse pluripotent stem cells have yet to be successfully serially propagated in vitro.

The derivation of mouse EpiSC, their characterization as epithelial cells dependent upon activin and FGF2 for maintenance in the pluripotent state^3,4, and their co-expression of lineage-specific and pluripotency genes^4,5,7,8,9,10, led many researchers to the conclusion that hPSC derived and maintained under conventional culture conditions (archetypal hPSC), which share these features, equate to the primed state of mouse pluripotency. This in turn led to a search for conditions that would support long-term maintenance of hPSC in a naive state^11,12. Several culture systems have been described that support a cell with molecular features quite similar to the human preimplantation epiblast^13,14. Extended propagation of diploid hPSC in these systems remains challenging, however.

Population heterogeneity complicates the interpretation of stem cell phenotype. To dissect heterogeneous populations in archetypal hPSC cultures, we used monoclonal antibodies to cell surface antigens to define subsets of stem cells that exist in a hierarchical continuum of cell states. When we subjected these cells to transcriptional profiling, we observed co-expression of pluripotency genes with lineage specific transcription factors, particularly in subpopulations of cells with lower levels of stem cell surface marker and pluripotency gene expression¹⁵. Importantly though, we also found that cells which expressed pluripotency markers at high levels were less likely to display lineage priming.

Subsequently, by analyzing cells grown under defined conditions and using a more refined sorting strategy to isolate the subpopulation enriched for high self-renewal (ESR), followed by medium-throughput single-cell RT-qPCR, we were able to show that the cells at the top of the hierarchy expressed very uniform and high levels of pluripotency markers, and showed no lineage priming¹⁶.

As cells traverse through stages of pluripotency in vitro and in vivo, they undergo changes in cell cycle regulation and metabolic activity, they restructure their epigenome, and their gene expression profile changes. The ability to sort highly purified populations of hPSC with high self-renewal capacity, and to analyse transcription in these subpopulations at the genomic level using RNA-Seq for subpopulation and single cell analysis, coupled with the availability of new single-cell gene expression data from preimplantation and post-implantation primate embryos, prompted us to re-examine the properties of the ESR subpopulation of archetypal hPSC, and to reconsider its developmental status. The results show that the ESR subpopulation resembles the primate early post-implantation epiblast, similar to the mouse formative state of pluripotency.

Results

Self-renewal of subsets of hPSC

We have previously used flow cytometry to isolate cell subpopulations, followed by assay of colony forming ability as an indicator of self-renewal^15,17. hPSC survival after dissociation to single cells and flow cytometry is poor. We therefore developed a simple methodology that would allow comparison of self-renewal of defined cell populations at reasonable levels of initial survival after flow cytometry, through sorting small aggregates of cells, to maintain cell–cell contacts and enhance post-sort survival¹⁸.

Fluorescence activated cell sorting using antibodies GCTM-2 and TG30 (recognizing CD9) enabled us to recover small cell aggregates (chiefly doublets; singlets, 17%; doublets, 61%; triplets, 16%; quadruplets, 6%) (Fig. 1a). We seeded wells with equal cell numbers as aggregates or singlets. Although both aggregates and single cells attached to the plate after sorting, a much larger fraction of the aggregates had begun to spread 1 h after plating, indicative of high viability; the difference was more evident after 24 h (Fig. 1b). Initial attachment, spreading and survival of the GCTM-2^highCD9^high subpopulation were similar to those of the GCTM-2^midCD9^mid subpopulation. Flow cytometry re-analysis of both subpopulations 72 h later showed that they had largely retained their cell surface phenotypes, although, as shown previously¹⁹, the GCTM-2^highCD9^high population had begun to reconstruct the entire cell state continuum (Fig. 1c). By 4 days, cells plated as aggregates displayed a higher colony forming efficiency than single cells, and the GCTM-2^highCD9^high subpopulation had formed a much larger number of microcolonies (Fig. 1d–e). Colonies formed from GCTM-2^highCD9^high aggregates were larger and showed a higher proportion of cells bearing stem cell markers compared with colonies formed by GCTM-2^midCD9^mid cells (Fig. 1f). Time-lapse video microscopy confirmed that the initial cell numbers attaching to the dish were similar for the two subpopulations. However, subsequent monitoring showed that while both subpopulations were migratory and underwent cell division, the GCTM-2^highCD9^high subpopulation persisted to form colonies of 4–32 cells several days later, whereas the GCTM-2^midCD9^mid colonies suffered abortive expansion and, in many cases, extinction (Supplementary Movies 1 and 2).
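The aggregate composition reported above can be turned into an average aggregate size, which is what "equal cell numbers as aggregates or singlets" implicitly relies on. The following is a minimal sketch, assuming the percentages are fractions of sorted aggregates (not of cells); the function names and the 1000-cell seeding figure are hypothetical and not taken from the study.

```python
# Reported size distribution of sorted aggregates (Fig. 1a), as fractions.
# Assumption: percentages count aggregates, not individual cells.
aggregate_fractions = {1: 0.17, 2: 0.61, 3: 0.16, 4: 0.06}


def mean_cells_per_aggregate(fractions):
    """Weighted mean number of cells per aggregate."""
    total = sum(fractions.values())
    return sum(size * frac for size, frac in fractions.items()) / total


def aggregates_needed(n_cells, fractions):
    """Number of aggregates that carries roughly n_cells cells."""
    return n_cells / mean_cells_per_aggregate(fractions)


mean_size = mean_cells_per_aggregate(aggregate_fractions)
print(round(mean_size, 2))  # about 2.11 cells per aggregate

# Hypothetical seeding target, for illustration only.
print(round(aggregates_needed(1000, aggregate_fractions)))
```

Under this assumption, matching a well of single cells requires roughly half as many sorted events when plating aggregates.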

**Fig. 1: Assay of self-renewal of subpopulations of hPSC under conditions that maintain cell–cell contacts.**

Differentiation potential of hPSC subpopulations

In the mouse, the ability to differentiate efficiently into germ cells in vitro is limited to epiblast like cells at the stage of formative pluripotency^20,21. Naive pluripotent stem cells or epiblast stem cells both lack this capacity²¹. Using the two-step protocol to generate first a mesoderm-like intermediate and subsequently convert these cells to PGC-like cells²², we measured the degree of differentiation of the GCTM-2^highCD9^highEPCAM^high and GCTM-2^midCD9^mid populations using flow cytometry to quantitate EPCAM/ITGA6 double high cells which represent PGC-like cells (Fig. 2a). GCTM-2^highCD9^highEPCAM^high cells showed higher expression of EPCAM and ITGA6 at the onset, as expected. After 2 days of differentiation into mesoderm-like cells, both groups showed loss of EPCAM expression, but at Day 4 after PGC-like cell induction, an EPCAM/ITGA6 double positive fraction was observed in both groups. Cultures that did not receive growth factors to induce germ cell formation contained no EPCAM/ITGA6 double high population. The GCTM-2^highCD9^highEPCAM^high population had largely disappeared by the end of the time course, as noted by Sasaki et al.²². The identity of the PGC-like cells was confirmed by staining with antibodies to PRDM1 and NANOS3 (Fig. 2b). More PGC-like cells were consistently obtained from GCTM-2^highCD9^highEPCAM^high compared with GCTM-2^midCD9^mid subpopulations in both cell lines, though there was a high degree of inter-assay variability (Fig. 2c). We and others have shown previously that the ESR fraction of hPSC is pluripotent (Figs. 4 and 5, refs. ^17,19). In this study, directed differentiation in adherent culture, carried out on cell lines WA09 and WA01, confirmed that both the GCTM-2^highCD9^high subpopulation and the remaining cell population were pluripotent, as measured by expression of markers characteristic of progenitors of all three embryonic germ layers (Fig. 2d).

**Fig. 2: Differentiation potential of GCTM-2^highCD9^highEPCAM^high and GCTM-2^midCD9^mid subpopulations.**

ESR cells have a cell cycle with a low G1 fraction

Initial analyses of the mouse ESC cell cycle indicated that naive mouse ESC, and cells of the epiblast, show a shortened cell cycle with a minimal G1 component²³. While more recent work shows that the cell cycle state of naive mouse stem cells is dependent upon culture conditions²⁴, it remains clear that lineage commitment is coupled to cell cycle regulation in mouse and human PSC, and that the G1 phase represents a decision point for undergoing lineage specification^25,26,27.

Here we re-assessed our previous finding that a subset of cells expressing high levels of the stem cell antigen GCTM-2 had a reduced G1 fraction compared with the remainder of the population²⁸. Using a more refined sorting procedure and EdU incorporation to identify S-phase cells, we determined the cell cycle phase distribution of GCTM-2^highCD9^highEPCAM^high, GCTM-2^highCD9^high, and GCTM-2^lowCD9^low subpopulations, compared with unsorted cells (the general population). As shown in Fig. 3, very few of the GCTM-2^highCD9^highEPCAM^high or GCTM-2^highCD9^high cells were in G0/G1 phase, with most of this subpopulation in S or G2/M. By contrast, GCTM-2^lowCD9^low cells were predominantly in G0/G1 (~70%). The cell cycle phase distribution of the unsorted general population was consistent with these findings. Results using a different WA09 subline with a FUCCI reporter²⁹ confirmed these conclusions (Supplementary Fig. 1).
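The logic of assigning cell cycle phases from EdU incorporation combined with DNA content can be sketched as a simple two-threshold gate. This is an illustrative sketch only: the thresholds, the normalization of DNA content (2N near 1.0, 4N near 2.0), the function names, and the toy events are all assumptions, not the gating actually used for Fig. 3.

```python
from collections import Counter


def classify_phase(dna_content, edu_signal, edu_threshold=1.0, dna_midpoint=1.5):
    """Assign one flow cytometry event to a cell cycle phase.

    Assumes dna_content is normalized so 2N cells sit near 1.0 and 4N cells
    near 2.0. Both thresholds are illustrative placeholders.
    """
    if edu_signal >= edu_threshold:
        return "S"  # EdU incorporation marks cells synthesizing DNA
    if dna_content < dna_midpoint:
        return "G0/G1"  # EdU-negative with roughly 2N DNA
    return "G2/M"  # EdU-negative with roughly 4N DNA


def phase_fractions(events, **thresholds):
    """Fraction of events in each phase; events are (dna_content, edu_signal)."""
    if not events:
        raise ValueError("no events to classify")
    counts = Counter(classify_phase(dna, edu, **thresholds) for dna, edu in events)
    return {phase: counts[phase] / len(events) for phase in ("G0/G1", "S", "G2/M")}


# Toy events, invented for illustration.
toy_events = [(1.0, 0.1), (1.4, 3.2), (1.6, 2.8), (2.0, 0.2), (1.1, 0.05)]
fractions = phase_fractions(toy_events)
print(fractions)
```

Applying the same gate to each sorted subpopulation gives phase distributions that can be compared directly, which is the comparison the text describes.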

**Fig. 3: Cell cycle analysis of hPSC subpopulations by flow cytometry of EdU labeled cultures.**

Active mitochondria and bivalent metabolism in ESR cells

Metabolic activity is modulated throughout early mammalian development. The preimplantation embryo relies on a combination of aerobic glycolysis and oxidative phosphorylation, and this metabolic pattern is maintained during post-implantation development up to E7.5 in the mouse³⁰. Like the epiblast, naive state mouse ESC show this bivalent metabolism, while primed epiblast stem cells rely primarily on glycolysis³¹. Similar transitions from bivalent metabolism to glycolysis have been reported during conversion of naive to primed hPSC^32,33.

To determine if the energy metabolism status of hPSC is dependent on their position in the pluripotency hierarchy, we first assessed the mitochondrial membrane potential of defined subsets of hPSC by measuring the uptake of the mitochondrial dye tetramethylrhodamine methyl ester (TMRM), or the combined uptake and the ratio of red/green fluorescence of the dye JC-1. Live cell staining of WA09 cells showed that staining with TMRM was strongest at the edge of the colonies, where cells expressing the highest levels of the stem cell marker CD9 are found (Fig. 4a). To quantitate mitochondrial activity across the cell populations, WA09 cells were incubated with TMRM or JC-1, labeled with stem cell surface markers GCTM-2, TG30 (anti-CD9), and anti-EPCAM, and the GCTM-2^highCD9^highEPCAM^high and GCTM-2^lowCD9^low fractions were identified by flow cytometry. The GCTM-2^highCD9^highEPCAM^high subpopulation stained more intensely with TMRM, and showed a higher ratio of red to green fluorescence following incubation with JC-1 compared with the general (remaining) population, or the GCTM-2^lowCD9^low subpopulation (Fig. 4b–c), indicating increased mitochondrial activity in the GCTM-2^highCD9^highEPCAM^high subpopulation.

**Fig. 4: Mitochondrial activity in subpopulations of hPSC.**

Analysis of mitochondrial oxidative phosphorylation in live cells using the Agilent Seahorse XF apparatus confirmed that the basal oxygen consumption rate (OCR) was higher in the GCTM-2^highCD9^highEPCAM^high subpopulation compared with the remaining general (GEN) population (Fig. 5a). These cells also exhibited higher spare mitochondrial capacity, as indicated by maximum OCR achieved after addition of the proton uncoupling agent FCCP (Fig. 5a). Comprehensive analysis of intracellular metabolite levels in the GCTM-2^highCD9^highEPCAM^high cells and unfractionated population using liquid chromatography and gas chromatography–mass spectrometry (LC–MS, GC–MS) provided further evidence that the GCTM-2^highCD9^highEPCAM^high cells exist in a distinct metabolic state. Principal Component Analysis (PCA, Supplementary Fig. 2a, b) and hierarchical cluster analysis (Fig. 5b, LC–MS; Supplementary Fig. 2c, GC–MS) clearly separated the purified cell population from the GEN population. Consistent with the live cell staining and OCR analysis, the GCTM-2^highCD9^highEPCAM^high subpopulation contained elevated levels of TCA cycle metabolites (Fig. 5c) and was depleted in many amino acids and metabolites in the urea cycle (Supplementary Tables 1–2), compared with the general population. Pathway analysis further confirmed the distinct metabolic profile of the GCTM-2^highCD9^highEPCAM^high subpopulation (Supplementary Fig. 2d, e).
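The relationship between basal OCR, FCCP-stimulated maximal OCR, and spare capacity can be sketched as below. This follows the commonly used convention that spare capacity is maximal minus basal respiration, each corrected for non-mitochondrial oxygen consumption; whether the study applied exactly this correction is an assumption. The function name is hypothetical and the OCR values are invented placeholders, not measurements from Fig. 5a.

```python
def mito_stress_summary(basal_ocr, max_ocr_after_fccp, non_mito_ocr=0.0):
    """Summarize respiration parameters from OCR readings (same units throughout).

    Assumes the common convention: subtract non-mitochondrial OCR from both
    readings, then take spare capacity as maximal minus basal respiration.
    """
    basal = basal_ocr - non_mito_ocr
    maximal = max_ocr_after_fccp - non_mito_ocr
    return {
        "basal_respiration": basal,
        "maximal_respiration": maximal,
        "spare_capacity": maximal - basal,
    }


# Invented example readings (arbitrary units), for illustration only.
sorted_cells = mito_stress_summary(basal_ocr=120.0, max_ocr_after_fccp=260.0, non_mito_ocr=20.0)
general_cells = mito_stress_summary(basal_ocr=90.0, max_ocr_after_fccp=170.0, non_mito_ocr=20.0)

print(sorted_cells["spare_capacity"], general_cells["spare_capacity"])
```

A population can have both higher basal respiration and higher spare capacity, as reported here, only if its FCCP-stimulated maximum rises by more than its basal rate does.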

**Fig. 5: Analysis of the metabolism of GCTM-2^highCD9^highEPCAM^high cells.**

To gain further insights into the metabolic wiring of this subpopulation of cells, GCTM-2^highCD9^highEPCAM^high fractionated cells and the GEN population of hPSC were cultivated in the presence of ¹³C glucose for 2 h, and the level of ¹³C-enrichment in different intermediates of central carbon metabolism was monitored by GC–MS. High levels of ¹³C-enrichment were observed in all intermediates in glycolysis and the pentose phosphate pathway, confirming that both cell populations exhibit high rates of aerobic glycolysis (Fig. 5d). Interestingly, ¹³C-enrichment in serine and glycine, which are synthesized from the glycolytic intermediate 3-phosphoglycerate, was higher in the GCTM-2^highCD9^highEPCAM^high cells, indicating that these cells may have higher rates of amino acid and protein synthesis. Consistent with GCTM-2^highCD9^highEPCAM^high cells having elevated rates of mitochondrial metabolism, ¹³C-enrichment in citrate and isocitrate was substantially higher in this subpopulation (Fig. 5d). However, ¹³C-enrichment in later intermediates in the oxidative cycle (succinate, fumarate, and malate) was not significantly changed between the two cell populations, indicating that early intermediates in the TCA cycle may be diverted into anabolic pathways (cataplerosis). Increased cataplerosis in the GCTM-2^highCD9^highEPCAM^high cells was supported by the elevated levels of ¹³C-labeling in glutamate, which is synthesized from the TCA cycle intermediate, α-ketoglutarate, as well as high levels of ¹³C-enrichment in long chain unsaturated fatty acids (i.e., oleic acid) and cholesterol, indicating high rates of membrane biogenesis or turnover. Overall, these data provide compelling evidence that the GCTM-2^highCD9^highEPCAM^high cells are metabolically more active than the GEN hPSC population, exhibiting higher rates of oxidative phosphorylation and anabolic amino acid and lipid synthesis.
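One common way to express ¹³C-enrichment of a metabolite is the average fraction of labeled carbons, computed from its mass isotopologue distribution. The sketch below assumes the distribution has already been corrected for natural isotope abundance and for atoms contributed by derivatization, and it is not necessarily the calculation used in this study; the function name and the example distribution are hypothetical.

```python
def fractional_enrichment(isotopologue_abundances):
    """Average fraction of carbons labeled with 13C.

    isotopologue_abundances[i] is the abundance of the M+i isotopologue,
    for i from 0 to n, where n is the number of carbons considered.
    Assumes natural-abundance correction was done upstream.
    """
    n_carbons = len(isotopologue_abundances) - 1
    total = sum(isotopologue_abundances)
    if n_carbons < 1 or total <= 0:
        raise ValueError("need at least M+0 and M+1 with positive total abundance")
    labeled_carbons = sum(i * a for i, a in enumerate(isotopologue_abundances))
    return labeled_carbons / (n_carbons * total)


# Invented distribution for a three-carbon metabolite: M+0, M+1, M+2, M+3.
example = [0.2, 0.1, 0.1, 0.6]
print(round(fractional_enrichment(example), 3))  # 0.7
```

Comparing this value for early TCA intermediates (citrate, isocitrate) against later ones (succinate, fumarate, malate) is the kind of contrast the paragraph above uses to infer diversion of carbon into anabolic pathways.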

DNA methylation in hPSC cultured under defined conditions

Levels of DNA methylation in the mouse and human epiblast are low, but increase substantially in the mouse following embryo implantation and activation of the de novo methyltransferases Dnmt3a and Dnmt3b^34,35. Reduced representation bisulfite sequencing to assess levels of DNA methylation showed no major differences between the relatively low mean levels of overall DNA methylation or levels of DNA methylation over CpG islands between the GCTM-2^highCD9^highEPCAM^high subpopulation and the general population (Fig. 6a, b). DNA methylation distributions were bimodal (Fig. 6c, d), and there was little evidence of differential methylation at CpG islands across any particular loci in the GCTM-2^highCD9^highEPCAM^high subpopulation (Fig. 6e). The extent of DNA methylation at CpG islands in various repeat elements was high and did not vary between the unsorted and GCTM-2^highCD9^highEPCAM^high fraction (Fig. 6f). De novo DNA methyltransferases DNMT3A and DNMT3B were both expressed along with TET1 across the cell populations studied (below and Supplementary Fig. 3a–j), similar to the early post-implantation epiblast in the mouse³⁶, and suggestive of a highly dynamic state of DNA methylation in these cells.

**Fig. 6: Reduced representation bisulfite sequencing analysis of DNA methylation in unsorted (GEN) and GCTM-2^highCD9^highEPCAM^high (HHH) subpopulations.**

Differential chromatin accessibility in hPSC subpopulations

We next identified critical differences in the chromatin landscape between GCTM-2^highCD9^highEPCAM^high and GCTM-2^midCD9^mid populations using the assay for transposase accessible chromatin³⁷. In total, we identified 118,442 regions as accessible across both populations. Of these, 3144 were more accessible in the GCTM-2^highCD9^highEPCAM^high cells, while 4730 were more accessible in the GCTM-2^midCD9^mid population (Fig. 7a, FDR < 0.01; quality assessment data Supplementary Fig. 4, Supplementary Data 1). Generally, peaks with increased accessibility in the high population were distant from transcription start sites (TSS), more often found in intergenic regions and introns typical of enhancers, whereas peaks with increased accessibility in the middle population were closer to TSS (Fig. 7b) and identified as promoters (Supplementary Fig. 5a). To annotate open chromatin sites in each population we compared their location to the entire set of transcription factor (TF) binding sites across a diverse range of human cell types identified by the ENCODE project³⁸ using a locus overlap enrichment analysis³⁹ (Supplementary Data 2). Genomic locations more accessible in the GCTM-2^highCD9^highEPCAM^high population were highly enriched for TF binding sites identified in hPSC compared with the GCTM-2^midCD9^mid population (Fig. 7c). Furthermore, the most enriched TFs binding at the GCTM-2^highCD9^highEPCAM^high population sites included known pluripotency factors NANOG, POU5F1 (OCT4), and TCF12 and BCL11A (Fig. 7d, upper panel), the latter two TFs having been previously identified as highly expressed in the GCTM-2^highCD9^highEPCAM^high population¹⁶. While these same factors were still identified as enriched in the regions in the GCTM-2^midCD9^mid population, other general chromatin factors showed the highest overlap (Fig. 7d, lower panel).
Finally, we compared regions of increased accessibility in both populations to the tissue and cell-type specific DNAse hypersensitivity clusters identified in human samples⁴⁰. Again, this analysis identified that the regions with greatest enrichment in the GCTM-2^highCD9^highEPCAM^high population were annotated as being unique to stem cells (Supplementary Fig. 5b, c). In summary, these data indicate that the GCTM-2^highCD9^highEPCAM^high population has greater open chromatin at putative enhancers bound by canonical pluripotent factors at sites unique to stem cells, whereas the GCTM-2^midCD9^mid population has greater DNA accessibility at promoters bound by general chromatin and transcription factors. Together these data support the hypothesis that the chromatin of cells in the GCTM-2^highCD9^highEPCAM^high population exists in a distinct state compared with the GCTM-2^midCD9^mid population.
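To put the differential accessibility counts in proportion, the short calculation below uses only the three region counts reported above; the variable and function names are hypothetical.

```python
# Region counts reported in the text (ATAC-seq, FDR < 0.01).
total_regions = 118_442
more_open_high = 3_144  # more accessible in GCTM-2^high CD9^high EPCAM^high
more_open_mid = 4_730   # more accessible in GCTM-2^mid CD9^mid

differential = more_open_high + more_open_mid


def pct(part, whole):
    """Percentage of whole represented by part."""
    return 100.0 * part / whole


print(differential)                                  # 7874
print(round(pct(differential, total_regions), 2))    # about 6.65
print(round(pct(more_open_high, total_regions), 2))  # about 2.65
print(round(pct(more_open_mid, total_regions), 2))   # about 3.99
print(round(pct(more_open_high, differential), 1))   # about 39.9
```

On these numbers, fewer than 7% of accessible regions differ between the two populations, so the distinct chromatin state described here reflects a targeted set of sites and not a global change in accessibility.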

**Fig. 7: Landscape of accessible chromatin differentiates stem cell populations.**

Comparison of ESR cell transcriptome with primate epiblast

RNA-seq analysis comparing gene expression in the GCTM-2^highCD9^highEPCAM^high subpopulation to the general (total unfractionated) population (Supplementary Data 3) identified 515 genes differentially expressed between the GCTM-2^highCD9^highEPCAM^high subset and the total population (132 upregulated in the GCTM-2^highCD9^highEPCAM^high cells and 383 downregulated relative to the unfractionated cells, at >1.5-fold change in expression level with an adjusted p value <0.05; Fig. 8a).
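The selection criteria stated above (more than 1.5-fold change in either direction, adjusted p value below 0.05) can be sketched as a simple filter. This is an illustration of the thresholds only, not the authors' pipeline; the function name, the placeholder gene names, and all values in the toy table are invented.

```python
import math

FOLD_CHANGE_CUTOFF = 1.5  # from the text: >1.5-fold change
PADJ_CUTOFF = 0.05        # from the text: adjusted p value < 0.05


def classify_gene(log2_fold_change, padj):
    """Label a gene as 'up', 'down', or 'not significant' in the sorted fraction."""
    if padj >= PADJ_CUTOFF:
        return "not significant"
    threshold = math.log2(FOLD_CHANGE_CUTOFF)  # about 0.585 on the log2 scale
    if log2_fold_change > threshold:
        return "up"
    if log2_fold_change < -threshold:
        return "down"
    return "not significant"


# Placeholder results: gene -> (log2 fold change, adjusted p value). Invented values.
toy_results = {
    "GENE_A": (1.2, 0.001),
    "GENE_B": (-0.9, 0.01),
    "GENE_C": (0.3, 0.0001),  # significant p value but below the fold-change cutoff
    "GENE_D": (2.0, 0.2),     # large change but not significant
}
calls = {gene: classify_gene(lfc, padj) for gene, (lfc, padj) in toy_results.items()}
print(calls)

# Consistency check on the reported counts: 132 up + 383 down = 515 genes.
assert 132 + 383 == 515
```

Requiring both criteria excludes genes like the third and fourth placeholders, which pass one threshold but not the other.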

**Fig. 8: Global gene expression analysis of hPSC subpopulations by RNA-seq.**

Genes upregulated in the GCTM-2^highCD9^highEPCAM^high fraction included NODAL and its antagonists LEFTY1 and LEFTY2, in agreement with our previous study¹⁶. POU3F1, a marker of the naive to formative transition in the mouse, was also upregulated. Notably, a number of small nuclear and small nucleolar RNAs were expressed at high levels in the GCTM-2^highCD9^highEPCAM^high population. Negative regulators of MAPK signaling, including DUSP5 (inactivator of ERK1), DUSP6 (inactivator of ERK2), and SPRY2, were upregulated in the self-renewing fraction, as was DACT1 (an antagonist of canonical WNT signaling). Negative regulators of MAPK signaling including Dusp4 and Spry were recently shown to be upregulated at an early stage during dissolution of the mouse naive state⁴¹. Amongst the genes expressed at lower levels in the GCTM-2^highCD9^highEPCAM^high cellular subset relative to the unfractionated population were members of the WNT signaling pathway, including WNT4, FRZB, FZD3, FZD5, and FZD8. Of the top 100 genes upregulated in the general population, 47 genes (all upregulated at twofold change or higher) were previously reported to be expressed at peak levels at the onset of neural differentiation in the CORTECON study of Temple and colleagues⁴² (global analysis, Supplementary Fig. 6). Examination of previously published microarray data⁴³ for a subset of these neural induction genes confirmed a pattern of continuous upregulation in cell subsets with decreasing levels of pluripotency associated cell surface markers in multiple cell lines (Supplementary Fig. 7, data visualized in the Stemformatics platform https://www.stemformatics.org). BMP2, BMP4, and FST were also upregulated in the general population, consistent with our previous results¹⁶.

Some genes characteristic of the primate preimplantation epiblast and naive hPSC were expressed in the GCTM-2^highCD9^highEPCAM^high fraction (PRDM14, TFCP2L1, ZFP42, DPPA2, and TFAP2C), but others were not (ARGFX, KLF17, TBX3, NLRP7). Genes expressed in primitive endoderm (SOX17, GATA4, GATA6, FOXA2, and APOA2) were not found in either the GCTM-2^highCD9^highEPCAM^high or the general population, nor were genes activated during early gastrulation (T, MIXL1, GSC, EOMES, FOXA2, LHX1).

scRNA-seq on hPSC fractionated into four separate subpopulations enabled us to compare gene expression in the fractionated subpopulations with the single-cell data of Nakamura et al.⁴⁴ for Macaca fascicularis pre- and post-implantation embryos (quality assessment, Supplementary Figs. 8–9). We analyzed 300 cells, and we detected expression of 8403 genes. All cells uniformly expressed the general pluripotency associated transcription factors POU5F1, SOX2, and NANOG (Supplementary Fig. 10a–d). ZFP42, a marker of the naive state in mouse, was expressed throughout the population, but another naive state marker, TFCP2L1, was expressed primarily in the GCTM-2^highCD9^highEPCAM^high and GCTM-2^highCD9^high subsets; POU3F1, characteristic of post-implantation epiblast, was expressed throughout (Supplementary Fig. 10e–g). PCA of the human cell subpopulations alone indicated that they could be clearly separated along a continuum of cell states (Fig. 8b). Ontology analysis of differential gene expression highlighted a number of pathways involving ribonucleoprotein complexes, ribosomes, and metabolic processes, in concurrence with the expression of small nuclear and nucleolar RNAs noted above in the subpopulation analysis (Supplementary Table 3), and a number of pathways related to oxidative metabolism, including metabolic processes, mitochondrion, mitochondrion organization, generation of precursor metabolites and energy, mitochondrial inner membrane, and hydrogen ion transmembrane transporter activity.

The GCTM-2^highCD9^highEPCAM^high subset and general populations both expressed markers of the post-implantation mouse epiblast, including POU3F1, OTX2, DNMT3A, DNMT3B, SOX4, SOX11, LIN28A, and ZNF281 (Supplementary Data 3). In a PCA of the human single-cell data and the cynomolgus data of Nakamura et al.⁴⁴ (Fig. 8c), the first principal component resolved the two experiments with cynomolgus and human cells. The second principal component resolved the inner cell mass and pre- and post-implantation epiblast, along with extraembryonic cells. In this dimension, the human cells aligned with preimplantation epiblast stage cells. In the third principal component, which separated post-implantation stages of cynomolgus development, human cells aligned with the post-implantation epiblast, with cells in the GCTM-2^highCD9^highEPCAM^high fraction between early and late post-implantation stages. The human cells were clearly distinguished from inner cell mass and preimplantation epiblast stages, from gastrulating cells, and from extraembryonic tissues. PCA of the cynomolgus cells alone revealed that it was difficult to separate early from late post-implantation epiblast (Supplementary Fig. 11).

The cells used in this study were WA09 cells grown on mTeSR in feeder-free, serum-free conditions. To assess the generality of these findings, we compared these results with two previous microarray analyses: an independent study that examined similarly defined subpopulations of cell lines MEL1 and WA09 grown in proprietary serum replacement with mouse embryo fibroblast feeders or mTeSR1 (WA09)⁴³, and our previous study using cell line ES02 grown in serum-supplemented medium in the presence of mouse fibroblast feeder cells¹⁷. We identified a panel of stage-specific genes (Supplementary Table 4) on the basis of their differential expression in the data of Nakamura et al.⁴⁴ and a recent scRNA-seq study of the human preimplantation embryo⁴⁵ (Fig. 9). Genes specific to the inner cell mass, or mainly expressed in the inner cell mass and preimplantation epiblast, were very weakly expressed in the GCTM-2^highCD9^high subpopulation in the previous works and in the GCTM-2^highCD9^highEPCAM^high cells in the current RNA-seq study. Expression levels of the gene panel characteristic of the inner cell mass, preimplantation epiblast and early post-implantation epiblast were found at appreciable levels in self-renewing hPSC in all studies. Those genes with highest expression levels in all three cynomolgus epiblast populations, or in early and late post-implantation epiblast, were expressed robustly in all hPSC subpopulations in all studies. The gene panel specific to late post-implantation epiblast and gastrulating populations was expressed at somewhat lower levels in all of our populations relative to pan-epiblast specific genes. Last, genes characteristic of gastrulation stages 2A and 2B (nomenclature, ref. ⁴⁴) were expressed at low levels in the GCTM-2^highCD9^highEPCAM^high and GCTM-2^highCD9^high and GCTM-2^midCD9^mid subpopulations, with levels rising in the GCTM-2^lowCD9^low and GCTM-2⁻CD9⁻ fractions.
Thus, these data support the same conclusion as PCA of the single cell data: hPSC cell subpopulations enriched for self-renewal capacity show a pattern of gene expression that is strongly similar to early post-implantation epiblast stages in the primate embryo, but clearly distinguished from inner cell mass or gastrulation stages.

**Fig. 9: String section plot comparison of expression of embryonic stage-specific genes by scRNA-seq in cynomolgus embryo⁴⁴ with expression in subpopulations of hPSC.**

Discussion

We showed previously that the minority ESR subpopulation of hPSC can be isolated with cell surface markers and identified by colony formation assay. Colony formation measures both survival and self-renewal, and dissociation to single cells and flow cytometry compromises survival. We have used several assay strategies to avoid conflation of survival and self-renewal^16,17, but the approach described in this study of isolating aggregates is simple, and yields defined subpopulations with high initial survival for subsequent analysis. Although the initial survival of aggregates of the hPSC subpopulations was similar, further development of viable stem cell colonies was observed predominantly in the fraction bearing the highest level of stem cell markers. Time-lapse video microscopy revealed that cells in the GCTM-2^highCD9^high fraction formed microcolonies that persisted during extended propagation, whereas microcolonies of cells in the lower fraction underwent gradual extinction. This is similar to the observations of Barbaric et al.⁴⁶, who found that only a subset of SSEA3-positive hPSC formed microcolonies that persisted and grew. Their findings and ours suggest that self-renewal might be a function of the formation of a critical mass of cells expressing survival or growth factors at a sufficiently high local level. We have shown that cells in the GCTM-2^highCD9^high or GCTM-2^highCD9^highEPCAM^high fractions express the highest levels of components of the NODAL signaling pathway^16,17. Recent results in zebrafish indicate that cell–cell contacts are key to Nodal signaling⁴⁷, suggesting the possibility that hPSC might similarly depend on a positive feedback loop of NODAL signaling and cell–cell adhesion to drive self-renewal.

In the mouse, under appropriate culture conditions, naive cells pass through an intermediate state between naive and primed pluripotency, and in this transient state, are competent to undergo germline differentiation²¹. It was previously reported that archetypal hPSC can form primordial germ cell-like cells²². We confirm this finding and show that ESR hPSC have the capacity for germline differentiation. This distinguishes these cells from the naive and primed states in the mouse.

Cells in the GCTM-2^highCD9^highEPCAM^high fraction displayed a cell cycle with a very limited G1 fraction relative to other cells in the population. It has been shown that hPSC pause in G1 when preparing to embark on differentiation^25,26,28. It is possible that ESR stem cells do not execute such a differentiation checkpoint and continue in a self-renewing loop, until a pause in the cell cycle is activated, possibly through an RB-dependent mechanism⁴⁸.

Cells in the GCTM-2^highCD9^highEPCAM^high fraction show higher mitochondrial membrane potential and increased oxidative phosphorylation compared with the general population. Comprehensive metabolite profiling and ¹³C-glucose labeling studies confirmed increased rates of catabolism of pyruvate in the mitochondrial TCA cycle. These studies also showed that increased TCA cycle flux in the GCTM-2^highCD9^highEPCAM^high cells is used to generate key intermediates, including citrate and α-ketoglutarate, which are subsequently exported from the mitochondria and used for synthesis of lipids and amino acids/proteins. Elevated rates of amino acid synthesis in the GCTM-2^highCD9^highEPCAM^high cells were also supported by increased flux of glycolytic intermediates into serine and glycine synthesis. The high rate of de novo synthesis of cholesterol in hPSC is unusual and further highlights the importance of the TCA cycle in generating precursors (i.e., citrate) for this pathway. Overall, these data indicate that the GCTM-2^highCD9^highEPCAM^high cells retain a highly active anabolic metabolism.

DNA methylation levels in mouse naive ES cells are generally low, similar to the pattern in the preimplantation epiblast, and rise in vivo as development progresses toward gastrulation. In the mouse or primate post-implantation embryo, DNMT3A, DNMT3B, and TET1 are activated shortly after implantation³⁶. Levels of methylation in the GCTM-2^highCD9^highEPCAM^high and general population were similar in this study, and considerably lower than levels reported for mouse epiblast stem cells. In hPSC, co-expression of DNA methyltransferases and TET enzymes could account for the dynamic nature of DNA methylation, exemplified by the dramatic response of these cells to the presence of ascorbic acid, an activator of TET enzymes, in the cell culture medium⁴⁹.

In contrast to DNA methylation levels, chromatin accessibility varied strikingly in the cell subpopulations that we studied. Chromatin regions that showed high accessibility in the GCTM-2^highCD9^highEPCAM^high fraction mapped to sites of previously identified pluripotency transcription factor binding in hPSC, and to sites of DNase hypersensitivity that were found to be unique to stem cells. This pattern changed markedly in the GCTM-2^midCD9^mid population. These findings suggest that targets of pluripotency transcription factors in the GCTM-2^highCD9^highEPCAM^high cell fraction might be involved in the regulation of self-renewal. We found no evidence for enrichment of binding sites for TFAP2C in open chromatin in our GCTM-2^highCD9^highEPCAM^high cells, thus distinguishing this population from the naive state of hPSC described earlier⁵⁰.

The properties of the ESR subpopulation of hPSC discussed above are consistent with those of the mouse early post-implantation epiblast. The ability of archetypal hPSC to undergo differentiation into amnion^51,52,53 or the germline^22,54 is also consistent with a phenotype closer to an earlier post-implantation state rather than to the EpiSC. We have confirmed the capacity of archetypal hPSC for germline differentiation, a capacity which is found in the majority of the population of archetypal hPSC grown in defined conditions; further study with refinements to this assay will be required to confirm a higher capacity for germline differentiation in GCTM-2^highCD9^highEPCAM^high cells. We previously used embryoid body assays¹⁶ to show that the ESR subpopulation is pluripotent, as did another study using the teratoma assay¹⁹. These findings are not surprising, since flow cytometry analyses of replated populations of the ESR subpopulation reported here (Fig. 1a vs. 1c) and in a more extensive previous work¹⁹ indicate that this subpopulation can eventually regenerate the entire hierarchy of archetypal hPSC. Here quantitative assessment of directed differentiation into three germ layer lineages in adherent cultures confirmed that both the ESR and the remaining population are pluripotent. Together, our current data and the previous work are consistent with a developmental equivalence of the ESR with early post-implantation epiblast, and not naive or primed pluripotent states.

Previous studies have indicated that gene expression in archetypal hPSC aligns with the primate post-implantation but not preimplantation epiblast^44,45. Our studies on gene expression in the subpopulation of hPSC with a high capacity for self-renewal are consistent with these findings. The post-implantation epiblast persists longer in the primate than in the mouse, and it is difficult to resolve changes in epiblast gene expression during the post-implantation period until gastrulation, when pluripotency genes turn off and the activation of lineage-specific programs occurs. Our results here and elsewhere demonstrate that the cell cycle and metabolic profile, and the lack of lineage priming in the archetypal hPSC subpopulation showing a high capacity for self-renewal, are all consistent with early post-implantation epiblast, but not a mouse EpiSC-like state. Whether a subpopulation within mouse EpiSC cultures with high self-renewal capacity exists is unknown.

Recently Nakanishi et al.⁵⁵ also described a subpopulation of hPSC with high self-renewal capacity. These cells, which reside at the periphery of hPSC colonies, were isolated on the basis of cell surface expression of NCAD, and display priming toward the primitive endoderm lineage. We previously identified a subset of cells at the periphery of hPSC colonies which co-isolated with the self-renewing subpopulation and similarly co-expressed pluripotency and primitive endoderm genes (GATA4/GATA6/HNF4A/BMP2/FN1)¹⁶. We found that these cells were more abundant in cultures grown in medium supplemented with serum replacement on a feeder cell layer relative to defined conditions; they were quite rare in the HSR subpopulation isolated from cultures grown in defined medium¹⁶. We did not observe cells expressing markers of primitive endoderm in the current study. It is possible that these lineage-primed cells are descendants of the subpopulation we describe here. Determination of the relationship between the NCAD⁺ cells and the HSR fraction we have identified will require further analysis, but it is clear that the majority of HSR cells in hPSC cultures grown under defined conditions resemble the early post-implantation epiblast, and not primitive endoderm. Cornacchia et al.⁵⁶ recently reported that hPSC cultured in E8 media display features that are intermediate between naive and primed hPSC. We have used E8 and mTeSR interchangeably in these studies and have not noted differences between the two with respect to the parameters we have studied. Several features of the E8 cultures described by Cornacchia et al.⁵⁶, including increased capacity for self-renewal, pluripotency-associated marker and gene expression, and the metabolome, are similar to the properties of the ESR population we describe. A simple interpretation of both sets of data, consistent with prior observations^16,43, is that culture in defined media shifts the overall population toward the ESR state.

The self-renewing subpopulations of archetypal hPSC that we have identified here and elsewhere represent a minority of the culture with distinct properties. The majority of the archetypal hPSC population shows features closer to mouse EpiSC, as noted by others. Even under the defined culture conditions used in this study that enhance self-renewal, transcriptional evidence of neural lineage priming is evident in the general population. The development of methods to propagate pure populations of self-renewing archetypal hPSC in a state similar to the formative stage in mouse might enhance the efficiency of cloning and differentiation protocols, and reduce the variability in differentiation efficiency often observed between hPSC lines from different genetic backgrounds. Further elucidation of the molecular regulatory pathways that maintain cells in this state will enhance our understanding of human development and guide efforts to model embryology in a dish.

Methods

hPSC culture, differentiation, and marker expression

Experimental procedures for culture and differentiation of human embryonic stem cells, indirect immunofluorescence microscopy and flow cytometry followed minor modifications to established protocols.

hPSC culture

Human embryonic stem cell stocks (WA09 and FUCCI-G1) were maintained as described previously⁵⁷. The FUCCI cells were a gift from Prof Jonathan S. Draper²⁹. Routine maintenance of hPSC was carried out in serum-supplemented medium with fibroblast feeder cell support and subculture was performed using Dispase or collagenase to harvest fragments of colonies dissected manually.

For experiments, cells were transferred to defined feeder-free conditions (mTeSR conditions) using either mTeSR1 medium or Essential 8™ medium on Matrigel or recombinant vitronectin, and subculture was performed with Dispase or TrypLE Express according to the manufacturers’ protocols.

For dissociation to single cells, cultures were treated with 10 μM of InSolution™Blebbistatin (Merck Millipore cat. no. 203389) for 1 h prior to dissociation using TrypLE™ (Life Tech, cat. no. 12605). Cultures were incubated overnight in medium supplemented with 10 μM of InSolution™Blebbistatin, after which Blebbistatin was removed.

Routine tests confirmed the absence of mycoplasma contamination and a diploid karyotype (20/20 cells on G-banding) in the cell lines used.

Fluorescence activated cell sorting for colony formation assays

Cells grown for 24 h in the presence of Y-27632 Rho kinase inhibitor were dissociated using Accutase, harvested, and examined under the microscope to ensure that most of the cells had not completely dissociated into single cells. Immunolabeling was carried out by incubation in a primary antibody cocktail containing TG30 anti-CD9 antibody (mouse IgG2a) and GCTM-2 (mouse IgM) (neat supernatants, this laboratory) for 20 min at 4 °C, followed by incubation in a secondary antibody cocktail containing goat anti-mouse IgG2a Alexa Fluor 488 antibody and goat anti-mouse IgM Alexa Fluor 647 antibody (A21131, A21238, both from Thermofisher), diluted in 2% FBS in DMEM-F12 flow cytometry buffer, again at 4 °C for 20 min. The cells were resuspended in 1x DAPI in cold mTeSR1 before filtering the suspension through a 35-µm cell strainer to remove any larger aggregates and debris. Commercial antibodies were used at the dilutions recommended by the manufacturer.

All cell sorting was performed on a BD FACSAria III, using the 100-µm nozzle at 17 psi. A custom cytometer configuration was created that used the 17 psi pressure as opposed to the typical 20 psi pressure. This was done to minimize dissociation of aggregates into single cells as they were deposited into the 96-well plates. To isolate the GCTM-2^highCD9^high and GCTM-2^midCD9^mid fractions of aggregates and single cells, the gating strategy in Fig. 1a was used (forward and side scatter were employed to gate out debris and to identify single cells and aggregates; propidium iodide or 7-AAD was used to identify dead cells). The various fractions were deposited into 96-well tissue culture plates coated with Matrigel via the automated cell deposition unit on the Aria III. Subsequent to sorting, tissue culture plates were centrifuged at 200 × g for 2 min at room temperature. The mixture of sheath solution and medium was discarded and replaced with 150 µL of mTeSR1 medium.

Single cells from each of two FACS populations (GCTM-2^highCD9^high and GCTM-2^midCD9^mid) were plated at a density between 100 and 4000 cells per well of a 96-well Matrigel-coated plate. Aggregates from each of two FACS populations were plated at between 50 and 2000 aggregates per well of a 96-well Matrigel-coated plate (Corning, Costar 3603). Cultures were maintained in mTeSR1 with the medium changed once at 48 h post-plating. After 72 h, wells were fixed and stained for colony counting as follows. The culture medium was removed from the cells and each well was rinsed once with PBS prior to fixing with 100% ethanol for 10 min at room temperature. Ethanol was removed and wells were allowed to air dry for 30 min prior to staining with haematoxylin for 10 min. Wells were then washed four times with Milli-Q water. Aqueous ammonia solution (0.08%) was added to each well and incubated at room temperature for 2 min. The wells were then washed three times with Milli-Q water and allowed to dry overnight. For time-lapse video microscopy, cells were imaged during 1–24 h post-plating in an environmental chamber (Clear State Solutions TCH 885-9G) to maintain a humid atmosphere with 5% CO₂ in air while phase contrast exposures were recorded every 15 min with a Leica SP8 Confocal microscope.

Differentiation potential of hPSC subpopulations

Differentiation into primordial germ cell-like cells was carried out essentially as described²². Cultures grown in defined, feeder-free conditions (mTeSR1 medium, above) were sorted into GCTM-2^highCD9^high and GCTM-2^midCD9^mid fractions and plated onto fibronectin-coated dishes in Glasgow Minimal Essential Medium supplemented with Activin A (50 ng/ml), CHIR99021 (3 μM), and Y-27632 Rho kinase inhibitor (10 μM) for induction of incipient mesoderm-like cells. Two days later, cultures were harvested and replated as aggregates in low attachment 96-well plates in Glasgow Minimal Essential Medium, either supplemented with LIF (10 ng/ml), BMP4 (200 ng/ml), KITLG (100 ng/ml), and EGF (50 ng/ml) or without these supplements. Samples were taken at Days 0, 2, 4, and 6 and analyzed by flow cytometry for expression of cell surface ITGA6 and EPCAM using anti-human and mouse ITGA6 BV421 (Rat IgG2a, Biolegend 313624) and mouse anti-human CD326-PerCP (Mouse IgG2b, Biolegend 324213). On Day 4, some aggregates were fixed with 4% paraformaldehyde in PBSA for 30 min, permeabilized in 0.2% Triton X-100 and 10% bovine serum albumin in PBSA for 30 min, stained overnight with rabbit antibodies against PRDM1 (rabbit monoclonal IgG clone 9115 from Cell Signaling Technology) or NANOS3 (rabbit antisera, Abcam ab70001), then incubated for 1 h with goat anti-rabbit IgG Alexa488 conjugate, and counterstained with DAPI (Life Tech, Cat. #D1306), prior to examination by indirect immunofluorescence microscopy. Commercial antibodies were used at the concentrations recommended by the manufacturer.

To examine the potential for differentiation into somatic lineages in directed differentiation assays, we used the StemCell Technologies StemDiff Trilineage Differentiation Kit. WA09 or WA01 cells were separated by flow cytometry and seeded onto Matrigel-coated 8-well chamber slides, then incubated in three differentiation media for 5 days following the manufacturer’s instructions. The cultures were then fixed in 4% paraformaldehyde in PBSA for 15 min, incubated with primary antibodies (rabbit anti-PAX6 polyclonal IgG Biolegend Poly19013; goat anti-T polyclonal IgG R&D AF2085; goat anti-SOX17 polyclonal IgG R&D AF1924) for 3 h, then with secondary antibodies (donkey anti-rabbit IgG Alexa Fluor Plus 488 A21206; donkey anti-goat IgG Alexa Fluor Plus 488 A11070, both from Thermofisher) for 1 h, followed by counterstaining with DAPI and examination by indirect immunofluorescence microscopy. Sufficient cells from each group were counted to attain a 95% confidence interval of <0.05% for the proportion of positive cells. Commercial antibodies were used at the concentrations recommended by the manufacturer.
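
The counting criterion above can be illustrated with a confidence interval for a proportion. The sketch below uses the normal-approximation (Wald) interval, which is an assumption: the text does not state which interval was used, and the function name and cell counts are hypothetical.

```python
import math

def wald_half_width(positive: int, total: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% CI for a proportion.

    Assumption: a Wald interval; the source does not name the method.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = positive / total
    return z * math.sqrt(p * (1.0 - p) / total)

# Hypothetical counts: 450 marker-positive cells out of 500 counted.
half_width = wald_half_width(450, 500)
print(f"proportion = {450 / 500:.3f}, 95% CI half-width = {half_width:.4f}")
```

Counting more cells narrows the interval roughly in proportion to the square root of the number counted, so the same function can be used to decide when enough cells have been scored.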

Analytical and preparative flow cytometry

All single-cell analytical sorting was performed on a BD FACSAria III, using the 100 µm nozzle at 20 psi. Single cells were stained and quantified for the following cell surface markers, GCTM-2 and TG30 or GCTM-2, TG30, and EPCAM. Cells were stained in solution using a mixture of GCTM-2 (mouse IgM, neat hybridoma supernatant) and TG30 (anti-CD9, mouse IgG2a, neat hybridoma supernatant) (double stain) and anti-EPCAM-BV421 (BD Cat. #563180) (triple stain). Primary antibodies against GCTM-2 and TG30 were detected using goat anti-mouse IgM-AF647 (A21238) and goat anti-mouse IgG2a-AF488 (A21131), respectively (Life Tech, Carlsbad, CA). Rat anti-mouse IgG2a Secondary Antibody, PE/Cy7 (RMG2a-62, 407107 Biolegend), was used to detect TG30 in experiments that used FUCCI cell lines, mitochondrial membrane potential dyes, or for the Click-iT® EdU Alexa Fluor® 488 Flow Cytometry Assay (Thermo Fisher, Cat. #10425). Commercial antibodies were used at the concentrations recommended by the manufacturer.

Control samples included unlabeled cells, cells labeled with secondary antibody only and single fluorochrome labeled cells. Cells were sorted using a FACSAria (BD Biosciences) with a 100 µm nozzle and low-pressure conditions. Cells were first gated based on forward and side scatter properties then were analyzed for levels of GCTM-2, TG30, and EPCAM labeling. Double stained cells (GCTM-2 and TG30) were sorted into several populations: GCTM-2^lowCD9^low, GCTM-2^highCD9^high, or GCTM-2^highCD9^highEPCAM^high. The low population consists of cells with low (bottom 25%) expression of GCTM-2 and TG30, the GCTM-2^highCD9^high subset consists of the top 25% of cells expressing GCTM-2 and TG30, whereas the GCTM-2^highCD9^highEPCAM^high subset is a fraction of the GCTM-2^highCD9^high population with the highest expression of EPCAM, representing ~10% of the GCTM-2^highCD9^high fraction. Sorted single cells were processed for gene expression analysis as described below. In some experiments GCTM-2^highCD9^highEPCAM^high cells were compared with an unsorted (general) population.
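
The population definitions above can be sketched as percentile cut-offs on exported per-cell intensities. This is an illustration only: sort gates on the cytometer are drawn on bivariate plots, whereas the sketch assumes quartile thresholds applied to each marker independently, and the dictionary keys, function names, and synthetic data are hypothetical.

```python
import math
from collections import Counter

def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def assign_gates(cells, epcam_top_percent=10):
    """Label each cell 'low', 'high', 'high_epcam_high', or 'other'.

    cells: list of dicts with hypothetical keys 'gctm2', 'cd9', 'epcam'.
    Assumption: quartile cut-offs are applied to each marker independently.
    """
    g_lo = percentile([c["gctm2"] for c in cells], 25)
    g_hi = percentile([c["gctm2"] for c in cells], 75)
    c_lo = percentile([c["cd9"] for c in cells], 25)
    c_hi = percentile([c["cd9"] for c in cells], 75)

    is_high = [c["gctm2"] > g_hi and c["cd9"] > c_hi for c in cells]
    high_epcam = [c["epcam"] for c, h in zip(cells, is_high) if h]
    # The EPCAM cut-off is taken within the double-high fraction only.
    e_hi = percentile(high_epcam, 100 - epcam_top_percent) if high_epcam else math.inf

    labels = []
    for c, h in zip(cells, is_high):
        if h:
            labels.append("high_epcam_high" if c["epcam"] > e_hi else "high")
        elif c["gctm2"] <= g_lo and c["cd9"] <= c_lo:
            labels.append("low")
        else:
            labels.append("other")
    return labels

# Synthetic, perfectly correlated intensities for 100 hypothetical cells.
cells = [{"gctm2": i, "cd9": i, "epcam": i} for i in range(1, 101)]
labels = assign_gates(cells)
print(Counter(labels))
```

With real data the two markers are not perfectly correlated, so independent quartile cut-offs would capture fewer than 25% of cells in the double-high gate; the thresholds would need adjusting to match the fractions described in the text.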

For TMRM or JC-1 analysis, rat anti-mouse IgG2a Secondary Antibody, PE/Cy7 (Biolegend 407107, RMG2a-62) was used to detect TG30. 1 × 10⁶ single cells that were stained with antibodies were resuspended in 1 mL of mTeSR1 supplemented with 250 nM TMRM (Thermo Fisher, Cat. #T668) or 0.3 μg/mL JC-1 (Thermo Fisher, Cat. #T3168) and placed in the incubator for 30 min. The medium was subsequently removed and the cells were washed three times with mTeSR1 before analysis on the flow cytometer.

For EdU incorporation, rat anti-mouse IgG2a Secondary Antibody, PE/Cy7 (Biolegend, 407107, RMG2a-62) was used to detect TG30. Stained single cells were subsequently labeled with the Click-iT® EdU Alexa Fluor® 488 Flow Cytometry Assay kit (Thermo Fisher, Cat. #10425) according to the manufacturer’s instructions.

All FACS plots were created using FCS Express (De Novo Software), and the coefficient of variation was calculated using FCS Express or the FlowJo software package.

Immunofluorescence microscopy

For staining of cells in colony formation assays, H9 cells that were cultured in 96-well Matrigel-coated plates were washed twice with PBS prior to fixation with 2% paraformaldehyde (PFA) for 30 min at room temperature. Cells were permeabilized with 0.3% Triton X-100 in PBS and blocked with 1% IgG-free BSA, incubated in antibody GCTM-2 overnight at 4 °C, followed by goat anti-mouse IgM Alexa Fluor 488 for 30 min. Samples were then washed with PBS and counterstained with DAPI (Life Tech, Cat. #D1306).

For co-staining of live cells, anti-CD9 primary antibody was preincubated with goat anti-mouse IgG2a-AF488 (A21131). WA09 hPSC were incubated with mTeSR1 containing 250 nM TMRM (Thermo Fisher, Cat. #T668) and AF488-bound TG30 for 30 min at 37 °C and the cells were then washed three times with warmed mTeSR1 before they were visualized under indirect fluorescence microscopy.

Metabolomics studies

The GCTM-2^highCD9^highEPCAM^high subpopulation or unsorted cells were plated in 6-well format and allowed to expand for 24 h. Spent media was aspirated, and cell cultures were washed once in Milli-Q water to remove extraneous media, then sufficient liquid nitrogen was added to cover the base of the culture surface and enable metabolic arrest. Next, cells were incubated with 600 μl (per 10 cm² surface area) of ice cold 9:1 MeOH:CHCl₃ containing 0.83 µM ¹³C₆-sorbitol and 8.3 µM ¹³C₅,¹⁵N-valine for 10 min on ice. The cell lysate was scraped and then transferred to a clean tube, incubated on ice for another 5 min, and centrifuged at 16,100 × g for 5 min at 4 °C following which the supernatant was collected for mass spectrometric analysis.

For stable isotope labeling studies, cells were prepared as described above, but the culture media was supplemented with U-¹³C₆-glucose for 24 h prior to cell harvest.

LC–MS intracellular metabolite profiling analysis

Metabolite analysis was performed by LC–MS, using hydrophilic interaction (HILIC) LC and high resolution QTOF mass spectrometry. Sample extracts (10 μL) were injected onto an Agilent 1290 LC fitted with a ZIC-pHILIC column (5 μm, 2.1 × 150 mm; Merck), with 20 mM ammonium carbonate (A) and acetonitrile (B) as the mobile phases. A 14 min gradient was used, running from 90% B to 40% B over 12 min and held for 2 min, followed by washing at 5% B for 3 min and re-equilibration at 90% B. Mass spectrometry utilized an Agilent 6545 QTOF with heated electrospray source operating in negative ionization mode and scan range m/z 50–1700. Conditioning was performed before each batch using 2–3 blanks and 5 mixtures of authentic standards (234 metabolites), which were analyzed in data-dependent MS/MS mode to facilitate downstream metabolite identification where necessary. Pooled biological quality control (PBQC) samples were analyzed periodically throughout the analysis.

GC–MS intracellular metabolite profiling analysis

Five hundred microliters of the cell extract was evaporated to complete dryness under vacuum (Christ RVC 2-33). Polar metabolites were derivatised online using a Gerstel MPS2 XL autosampler robot (Gerstel, Germany). Samples were first methoxyaminated by the addition of 20 µL methoxyamine (30 mg/mL in pyridine, 2 h, 37 °C, 750 rpm), followed by trimethylsilylation with 20 µL BSTFA + 1% TMCS (1 h, 37 °C, 750 rpm). Metabolite profiles were acquired on an Agilent 7890A Gas Chromatograph coupled to a 5975C Mass Selective Detector, where 1 µL of derivatised sample was injected into a split/splitless inlet set at 250 °C. Chromatographic separation was achieved using an Agilent VF-5 ms capillary column (30 m × 0.25 mm × 0.25 µm + 10 m duraguard). Oven conditions were set at 35 °C starting temperature, held for 2 min, then ramped at 25 °C/min to 325 °C and held for 5 min. Helium was used as the carrier gas at a flow rate of 1 mL/min. Compounds were fragmented by electron impact (EI) ionization and detected across an m/z range of 50–600 amu, with a scan speed of 9.2 scans/s. Chromatograms were processed using PyMS⁵⁸ to align metabolites and quantify a representative target ion, and subsequently generate a data matrix. Metabolites were annotated using the in-house Metabolomics Australia (MA_25C) metabolite library and NIST11 database.
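
As a worked check of the oven program described above (35 °C for 2 min, 25 °C/min to 325 °C, 5 min final hold), the sketch below computes the oven temperature at a given time and the total program length. The function names are hypothetical, and the calculation ignores any solvent delay or post-run time, which the text does not specify.

```python
def oven_temperature(t_min, start_c=35.0, hold_min=2.0, ramp=25.0, final_c=325.0):
    """Oven temperature (deg C) at time t_min for a hold / ramp / hold program."""
    if t_min <= hold_min:
        return start_c
    # Linear ramp, capped at the final temperature.
    return min(final_c, start_c + ramp * (t_min - hold_min))

def program_minutes(start_c=35.0, hold_min=2.0, ramp=25.0, final_c=325.0,
                    final_hold_min=5.0):
    """Total program time: initial hold + ramp duration + final hold."""
    return hold_min + (final_c - start_c) / ramp + final_hold_min

total_min = program_minutes()
print(f"ramp lasts {(325 - 35) / 25:.1f} min; total program {total_min:.1f} min")
```

The ramp covers 290 °C at 25 °C/min, i.e. 11.6 min, so the full program runs for about 18.6 min per injection.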

GC–MS intracellular stable isotope incorporation analysis

Stable isotope labeled samples were prepared for GC–MS and analyzed as described above. ¹³C-targeted metabolomics was performed as previously described⁵⁹. GC–MS was carried out using an Agilent 7890 GC system, VF-5 capillary column with 10 m inert eziguard (J&W Scientific, 30 m, 250 μm inner diameter, 0.25 μm film thickness) and an Agilent 5975 MSD (Agilent Technologies, Santa Clara, USA) in electron ionization (EI) mode. We used mass isotopomer peak shift analysis to measure ¹³C-glucose derived carbon labeling in key metabolites of glycolysis and the TCA cycle. Elution times and fragmentation patterns can be found in the NIST database.
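
The idea behind mass isotopomer analysis can be sketched as follows: the intensities of the M+0 to M+n peaks of a fragment are converted to fractional abundances, and the mean ¹³C enrichment is the abundance-weighted number of labeled carbons divided by n. This is a simplified illustration, not the published pipeline⁵⁹: it assumes no correction for natural isotope abundance, and the intensities and function names are hypothetical.

```python
def isotopomer_fractions(intensities):
    """Fractional abundance of each mass isotopomer (M+0, M+1, ..., M+n)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return [i / total for i in intensities]

def mean_enrichment(intensities):
    """Mean fraction of carbons labeled: sum(i * f_i) / n.

    Assumption: raw intensities, no natural-abundance correction.
    """
    n_carbons = len(intensities) - 1
    fractions = isotopomer_fractions(intensities)
    return sum(i * f for i, f in enumerate(fractions)) / n_carbons

# Hypothetical M+0..M+6 intensities for a six-carbon fragment.
example = [50, 10, 25, 5, 5, 3, 2]
fractions = isotopomer_fractions(example)
enrichment = mean_enrichment(example)
print(f"M+2 fraction = {fractions[2]:.2f}, mean enrichment = {enrichment:.3f}")
```

In glucose-tracing experiments of this kind, a prominent M+2 peak in TCA cycle intermediates is commonly read as entry of labeled acetyl-CoA derived from pyruvate.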

Seahorse flux analysis

The oxygen consumption rate and extracellular acidification rate were determined using an extracellular flux analyzer (Seahorse XFe96 Analyzer, Agilent). GCTM-2^highCD9^highEPCAM^high cells and the unsorted population were seeded into CellTak Cell and Tissue Adhesive (Corning)-coated wells of a Seahorse XF96 Cell Culture Microplate (n > 6) at 10⁵ cells per well in 180 μl Seahorse XF Assay Medium (supplemented with 10 mM glucose, 2 mM glutamine and 1 mM sodium pyruvate). The mitochondrial electron transport chain was challenged using the Seahorse XF Mito Stress Test Kit (Agilent). A total of 12 measurement cycles were carried out, each cycle consisting of 3 min mixing and 3 min measurement of the oxygen consumption rate and extracellular acidification rate. The first three cycles determined the basal rate and 3 × 3 additional cycles were measured after addition of oligomycin (2 μM final), carbonyl cyanide-4-(trifluoromethoxy)phenylhydrazone (FCCP, 2 μM final) and rotenone/antimycin A (0.5 μM final each). The data were exported and analyzed using Excel (Microsoft) and Wave Software (Agilent).
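
The 12-cycle design above (three basal cycles, then three cycles after each of oligomycin, FCCP, and rotenone/antimycin A) lends itself to the conventional Mito Stress Test summary parameters. The sketch below applies the commonly used definitions to a single well; it is an assumption that these match the calculations performed in Wave and Excel, and the OCR values and function name are hypothetical.

```python
def mito_stress_parameters(ocr, cycles_per_phase=3):
    """Summarize one well's OCR trace (pmol O2/min) from a Mito Stress Test.

    ocr: measurements ordered as basal, oligomycin, FCCP, rotenone/antimycin A.
    Assumption: conventional parameter definitions, not confirmed by the source.
    """
    n = cycles_per_phase
    if len(ocr) != 4 * n:
        raise ValueError("expected four phases of equal length")
    basal, oligo, fccp, rot_aa = ocr[:n], ocr[n:2 * n], ocr[2 * n:3 * n], ocr[3 * n:]

    non_mito = min(rot_aa)                      # residual OCR after ETC block
    basal_resp = basal[-1] - non_mito           # last basal reading
    atp_linked = basal[-1] - min(oligo)         # oligomycin-sensitive OCR
    proton_leak = min(oligo) - non_mito
    maximal = max(fccp) - non_mito              # uncoupled respiration
    return {
        "non_mitochondrial": non_mito,
        "basal": basal_resp,
        "atp_linked": atp_linked,
        "proton_leak": proton_leak,
        "maximal": maximal,
        "spare_capacity": maximal - basal_resp,
    }

# Hypothetical 12-cycle trace for one well.
trace = [100, 98, 96, 40, 38, 39, 180, 175, 170, 20, 18, 19]
params = mito_stress_parameters(trace)
print(params)
```

Per-well parameters computed this way would then be averaged across replicate wells for each population before comparison.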

Reduced representational bisulfite sequencing

FACS sorted cells were snap frozen in buffer RLT plus (Qiagen) before DNA extraction using the AllPrep DNA/RNA Mini Kit (Qiagen) as per the manufacturer’s instructions. DNA was treated with RNase A then purified through the DNA Clean and Concentrator column (Zymo). RRBS libraries were made from 100 ng of purified DNA using the Ovation RRBS Methyl-Seq System (NuGEN), according to the manufacturer’s recommendations, which include the Qiagen Epitect kit for bisulfite conversion. Sequencing was performed on a NextSeq 500 with a 75 bp single-end sequencing protocol⁶⁰. Sequence quality control was performed using FastQC⁶¹. Trimming of adapters and low-quality base calls was performed with trim_galore⁶². Trimmed reads were filtered for true RRBS reads (which contain an MspI cut site at the 5ʹ end) using trimRRBSdiversityAdaptCustomers.py (NuGEN). Reads were aligned to a bisulfite converted human genome (hg38) using Bismark⁶³. Methylation calls were made with bismark_methylation_extractor⁶³. Analysis of methylation over CpG islands was performed using Seqmonk⁶⁴, where only CGIs with 10 or more informative CpG sites were considered. FastQC, trim_galore, and Seqmonk are all available from www.bioinformatics.babraham.ac.uk.
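
The CpG island filter described above can be sketched in a few lines. The example below is not Seqmonk's implementation: it assumes that an "informative" CpG is one covered by at least one read and that a CGI's methylation is the mean of its per-CpG percentages. The input layout, function name, and data are hypothetical.

```python
from collections import defaultdict

def cgi_methylation(cpg_calls, min_sites=10, min_coverage=1):
    """Mean percent methylation per CpG island.

    cpg_calls: iterable of (cgi_id, methylated_reads, unmethylated_reads),
               one entry per CpG site already assigned to a CGI.
    Only CGIs with at least `min_sites` informative CpGs are reported.
    """
    per_cgi = defaultdict(list)
    for cgi_id, meth, unmeth in cpg_calls:
        coverage = meth + unmeth
        if coverage >= min_coverage:            # site is informative
            per_cgi[cgi_id].append(100.0 * meth / coverage)
    return {
        cgi_id: sum(values) / len(values)
        for cgi_id, values in per_cgi.items()
        if len(values) >= min_sites
    }

# Hypothetical calls: CGI_A has 10 covered CpGs, CGI_B only 9.
calls = [("CGI_A", 1, 3)] * 10 + [("CGI_B", 2, 2)] * 9
result = cgi_methylation(calls)
print(result)
```

Requiring a minimum number of informative sites keeps islands with sparse RRBS coverage from contributing noisy estimates to the comparison between populations.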

Chromatin accessibility

DNA accessibility was measured using the FAST-ATAC protocol⁶⁵ with 50,000 sorted cells for each of three biological replicates of both GCTM-2^highCD9^highEPCAM^high and GCTM-2^midCD9^mid populations. Sorted cells were pelleted by centrifugation for 5 min at 4 °C, supernatant was removed, and the cell pellet was washed once with PBS pH 7.2. Cells were resuspended in 50 µl of transposase mixture consisting of 25 µl of 2x TD buffer (Illumina), 2.5 µl of TDE1 enzyme (Illumina), 0.5 µl of 1% digitonin (G9441, Promega), and 22 µl of nuclease-free water. Transposase reactions were incubated at 37 °C for 30 min with constant shaking in an Eppendorf ThermoMixer. Following transposition, DNA was purified using the Zymo DNA Clean and Concentrator-5 Kit (#D4014) following the manufacturer’s protocol and eluted in 20 µl of 10 mM Tris-HCl, pH 8. Transposase samples were barcoded using dual indexes during library amplification using NEBNext Ultra II Q5 Master Mix (#M0544L), 25 µM of each primer, and 20 µl of transposed DNA. Libraries were amplified for a total of nine cycles (98 °C for 10 s, 63 °C for 30 s, 72 °C for 1 min), followed by purification using AMPure beads (Beckman Coulter #A63881). ATAC-seq libraries were visualized on a Bioanalyzer for typical nucleosome banding followed by sequencing on the Illumina Next-Seq platform. Libraries were trimmed using trimmomatic⁶⁶ and aligned to Ensembl (build hg19) using bwa⁶⁷ with default settings. Duplicate reads were removed using Picard tools MarkDuplicates, and each aligned read was shifted toward the Tn5 cut site as described³⁷. As reported, we found that FAST-ATAC resulted in low percentages of reads mapping to mitochondria (Supplementary Fig. 4a). Regions of open chromatin were determined by combining alignment files across replicates and using MACS v.1.4.3⁶⁸. Open chromatin regions were merged between both high and low samples to form a universal set of peaks (peakome), and regions in the ENCODE blacklist were removed.
Reads were quantified for each interval in the peakome using bedtools⁶⁹ and normalized using the TMM method⁷⁰. We found that each sample had a high fraction of reads in the peakome, showed strong enrichment for open chromatin at transcription start sites, and had a characteristic distribution of fragments showing nucleosome free regions and mono- and di-nucleosome patterns, indicating high-quality ATAC libraries (Supplementary Fig. 4b–d). Data exploration using principal component analysis (PCA) found that the first principal component, explaining 43% of the variance, separated high and low populations, while the second principal component represented potential batch effects (Supplementary Fig. 4e, f). Based on these observations, a generalized linear model in edgeR⁷⁰ was used to identify differences in DNA accessibility between populations, including batch as a covariate. Peak set annotations and enrichments were calculated using LOLA³⁹ by comparing peaks more accessible in either high or low populations (FDR < 0.01, Supplementary Table 1) with the universal set of DNase hypersensitivity sites as the universe against the LOLA core (Supplementary Data 2). Statistical analysis, data exploration, and visualization were performed using R (http://www.R-project.org).
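
Construction of the peakome, as described above, amounts to taking the union of peak intervals from both populations, merging those that overlap, and discarding blacklisted regions. The sketch below illustrates this on BED-style half-open intervals. It is an assumption that peaks touching a blacklisted region were dropped whole rather than trimmed, and that adjacent (book-ended) peaks were merged; the function names and coordinates are hypothetical.

```python
def merge_peaks(*peak_sets):
    """Union of peak sets with overlapping or adjacent intervals merged.

    Each peak is (chrom, start, end) with half-open BED-style coordinates.
    """
    intervals = sorted(p for peaks in peak_sets for p in peaks)
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)   # extend the open interval
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]

def remove_blacklisted(peaks, blacklist):
    """Drop any peak that overlaps a blacklisted interval (assumed behaviour)."""
    def overlaps(a, b):
        return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]

# Hypothetical peak calls for the two populations and one blacklisted region.
high_peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
mid_peaks = [("chr1", 150, 250), ("chr2", 10, 50)]
blacklist = [("chr1", 550, 560)]

peakome = remove_blacklisted(merge_peaks(high_peaks, mid_peaks), blacklist)
print(peakome)
```

Counting reads against one shared interval set, rather than against each population's own peaks, is what allows accessibility to be compared peak by peak in the differential analysis.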

Single-cell RNA sequencing

Single cells were flow sorted into a chilled 384-well PCR plate containing 1.2 μl of primer/lysis mix [20 nM indexed polydT primer, 1:6,000,000 dilution of ERCC RNA spike-in mix (Ambion, 4456740), 1 mM dNTPs, 1.2 units SUPERaseIN RNase Inhibitor (Thermo Fisher, AM2696), DEPC water (Thermo Fisher, AM9920)] using a BD FACSAria III flow cytometer (BD Biosciences, San Jose, CA, USA) and the protocol described above. Sorted plates were sealed, centrifuged for 1 min at 3000 rpm and immediately frozen upside down at −80 °C until further processing using an adapted CelSeq2 protocol⁷¹.

Sorted plates were thawed on ice and briefly centrifuged. To lyse the cells and anneal the mRNA capture primer, the plate was incubated at 65 °C for 5 min and immediately chilled on ice for at least 2 min before adding 0.8 μl reverse transcription reaction mix [in 2 μl RT reaction: 1x First Strand buffer (Invitrogen, 18064-014), 20 mM DTT (Invitrogen, 18064-014), 4 units RNaseOUT (Invitrogen, 10777-019), 10 units SuperScript II (Invitrogen, 18064-014)]. The plate was incubated at 42 °C for 1 h, 70 °C for 10 min and chilled to 4 °C to generate first strand cDNA. For second strand cDNA synthesis, 6 μl of second strand reaction mix was added [1x NEBNext Second Strand Synthesis buffer (NEB #E6111S), NEBNext Second Strand Synthesis Enzyme Mix: 2.4 units DNA Polymerase I (E. coli), 2 units RNase H, 10 units E. coli DNA Ligase (NEB #E6111S), DEPC water (Thermo Fisher, AM9920)]. The plate was incubated at 16 °C for 2 h to generate double stranded cDNA.

All samples were pooled and cleaned using 1.2X NucleoMag NGS Clean-up and Size select magnetic beads (Macherey-Nagel, 7449970.5) according to the manufacturer’s instructions. To reduce the amount of beads, 20 μl beads and 100 μl bead binding buffer (20% PEG8000, 2.5 M NaCl, pH 5.5) were added for each 100 μl of pooled sample. The cDNA was eluted in 6.4 μl DEPC water and kept with beads for the following IVT reaction, where 9.6 μl of IVT reaction mix [1.6 μl of each of the following: A, G, C, U, 10X T7 buffer, T7 enzyme (MEGAscript T7 transcription kit (Ambion, AM1334))] was added and incubated at 37 °C for 13 h and then chilled and kept at 4 °C. To remove the leftover primers, 6 μl ExoSAP-IT For PCR Product Clean-Up (Affymetrix, 78200) was added and the sample was incubated at 37 °C for 15 min and then chilled and kept at 4 °C.

Chemical heat fragmentation was performed by adding 5.5 μl of 10X Fragmentation buffer (RNA fragmentation reagents, AM8740) to the sample and incubating in a pre-heated thermal cycler at 94 °C for 2.5 min, followed by immediate chilling on ice and addition of 2.75 μl of Fragmentation Stop buffer (RNA fragmentation reagents, AM8740). The fragmented amplified RNA was purified using 1.8X RNAClean XP beads (Beckman Coulter, A63987) according to the manufacturer’s instructions and eluted in 6 µl DEPC water, of which 5 μl (no beads) were transferred to a fresh tube for library preparation.

The fragmented RNA was transcribed into cDNA using 5ʹ-tagged random hexamer primers (GCCTTGGCACCCGAGAATTCCANNNNNN) introducing a partial Illumina adapter as also described in ref. ⁷¹. To remove RNA secondary structure and anneal the mRNA capture primer 1 μl of tagged random hexamer (100 µM) and 0.5 μl of 10 mM dNTPs (dNTP solution set NEB, N0446S) were added to the sample and incubated at 65 °C for 5 min and immediately chilled on ice for at least 2 min before adding 4 μl reverse transcription reaction mix [in 10 μl RT reaction: 1x First Strand buffer (Invitrogen, 18064-014), 20 mM DTT (Invitrogen, 18064-014), 4 units RNaseOUT (Invitrogen, 10777-019), 10 units SuperScript II (Invitrogen, 18064-014)].

The PCR primers introduce the full-length adaptor sequence required for Illumina sequencing (for details see Illumina small RNA PCR primers). PCR was performed in 12.5 μl using half of the ranhexRT sample as a template [1X KAPA HiFi HotStart ReadyMix (KapaBiosystems KK2602), 400 nM each primer].

The final PCR-amplified library was subjected to two consecutive 1x clean-ups with NucleoMag NGS Clean-up and Size Select magnetic beads (Macherey-Nagel, 7449970.5) according to the manufacturer’s instructions. The final library was eluted in 20 μl of 10 mM Tris-HCl solution (Sigma-Aldrich, T2319-1L).

RNA-seq data analysis

CelSeq2 scRNA-sequencing reads were mapped to the GRCh38 human genome using the Subread aligner⁷² and assigned to genes using scPipe⁷³ with ENSEMBL v86 annotation. Gene counts were exported as a matrix by scPipe with UMI-aware counting and imported into R. Cells were removed from further analysis if they had fewer than 12,500 total counts, more than 60,000 total counts, or fewer than 4500 total genes detected. Genes were filtered out if they failed to achieve 1 count in at least 10% of a particular cell condition group. Heatmaps were generated on normalized expression values using heatmap.2 from the gplots package with row normalization. Dimensionality reduction was performed on normalized log2-cpm expression values with size factors from computeSumFactors in scran⁷⁴. Single-cell RNA-seq data are available through the Gene Expression Omnibus under accession number GSE119323.
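The cell-level quality thresholds can be illustrated with a minimal Python sketch. It is not the authors' R code. The function names and the toy data are hypothetical, and the gene threshold is read as a lower bound, consistent with the "<4500 expressed genes" criterion given later in the Methods.

```python
def passes_qc(counts, min_counts=12_500, max_counts=60_000, min_genes=4_500):
    """Return True if one cell's per-gene count vector passes all thresholds.

    counts: list of non-negative UMI counts, one entry per gene.
    """
    total = sum(counts)                          # total counts in the cell
    detected = sum(1 for c in counts if c > 0)   # genes with at least 1 count
    return min_counts <= total <= max_counts and detected >= min_genes


def filter_cells(count_matrix, **thresholds):
    """Keep only cells (dict: cell id -> count vector) that pass QC."""
    return {cell: counts for cell, counts in count_matrix.items()
            if passes_qc(counts, **thresholds)}


# Toy example with scaled-down thresholds so the behaviour is easy to follow.
toy = {"a": [5, 0, 3], "b": [0, 0, 1], "c": [50, 2, 9]}
kept = filter_cells(toy, min_counts=5, max_counts=20, min_genes=2)
print(sorted(kept))
```

In the toy data, cell "b" is dropped for too few counts and cell "c" for too many, mirroring the lower and upper count bounds in the text.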

The non-human primate data were generated on the SC3-seq platform⁷⁵ and are publicly available under GEO accession number GSE74767. In order to compare between human and macaque expression levels, we defined a set of high confidence orthologous metaexons between the two species as in ref. ⁷⁶. Briefly, we used BLAT⁷⁷ to compare every annotated human exon in ENSEMBL release 86 (737,982 unique exons across 63,305 genes) against the human (hg38) and Macaca fascicularis (macFas5) genomes, retaining those that matched the macaque genome with at least 92% sequence identity and mapped back to their annotated location in humans. We then excluded all exons that had a second match with >90% sequence similarity in either genome, to control for interspecies differences in mappability. Overlapping exons from the same gene (associated with different isoforms) were collapsed into a single “metaexon”.
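The final step above, collapsing overlapping exons of one gene into a single metaexon, is an interval-merging problem. The following Python sketch shows one way to do it. The function name and tuple layout are hypothetical, and coordinates are assumed to be inclusive and to lie on a single chromosome and strand per gene. It does not reproduce the BLAT-based identity filtering.

```python
from collections import defaultdict


def collapse_metaexons(exons):
    """Merge overlapping exons of the same gene into metaexons.

    exons: iterable of (gene_id, start, end) tuples with start <= end.
    Returns a sorted list of (gene_id, start, end) metaexons.
    """
    by_gene = defaultdict(list)
    for gene, start, end in exons:
        by_gene[gene].append((start, end))

    metaexons = []
    for gene, spans in by_gene.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:            # overlaps the current metaexon
                cur_end = max(cur_end, end)
            else:                           # gap: close the current metaexon
                metaexons.append((gene, cur_start, cur_end))
                cur_start, cur_end = start, end
        metaexons.append((gene, cur_start, cur_end))
    return sorted(metaexons)


exons = [("g1", 100, 200), ("g1", 150, 250), ("g1", 300, 400), ("g2", 120, 180)]
print(collapse_metaexons(exons))
```

Exons from different genes are never merged, even when their coordinates overlap, because merging is done per gene.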

We discarded overlapping exons associated with more than one ENSEMBL gene ID, exons associated with any gene annotated to two or more chromosomes in either species, and exons where the difference in intron size between the two species is ≥10,000 bp, suggestive of poor genome assembly or annotation. After applying these quality control criteria, we ultimately retained 198,172 unique metaexons in humans and macaques across 34,142 annotated ENSEMBL human genes. The final table, as well as code and additional documentation for metaexon identification is available at http://www.bitbucket.org/ee_reh_neh/orthoexon

We processed all 2526 files from ref. ⁴⁴ (from 390 cells) with sickle [https://github.com/najoshi/sickle] to remove bad reads and trim low-quality bases from the 3ʹ end. We then mapped all reads to macFas5 using Rsubread 1.20.6 and R 3.2.2, allowing up to 2 mismatches and 2 indels per 50 bp read, which is proportional to our setting of 5 mismatches or indels for a 100 bp read. Mapped reads were assigned to the orthologous metaexon list using featureCounts at both the single metaexon and whole-gene level, and summed within individuals in R 3.2.2.

Human scRNA-seq data processing

Sequence analysis was performed on the Illumina Next-Seq 500 platform.

Quality of the raw data was assessed using FASTQC and visualized using MultiQC. The scPipe package v1.0 for R was used to count genes based on UMI profile. Gene expression was normalized using the scater v1.6.1 and scran v1.6.6 packages for R. FASTQ files were aligned to hg38 using the Subread package v1.26.1 for R statistical software, and aligned reads were re-annotated to exons using the ENSEMBL v86 transcriptome to define the exon/intron mapping rate. Cells with <4500 expressed genes or more than 60,000 counts were discarded, resulting in 300 of 370 non-control cells retained for downstream analysis.

Data from both species reflected Log2(CPM + 1) aligned to HG38v86. Data from both species were mapped to a set of highly orthologous metaexons. Supplementary Fig. 8 illustrates the distribution of gene expression in each dataset before and after gene-based filtering. A gene was retained in a given dataset based on expression above a cut off of 1 (non-human primate) or 5 (human) Log2(CPM + 1) in at least 10% of cells assigned to a given phenotype. These cutoffs were selected based on the distribution of gene expression in each dataset. Downstream analysis was limited to the 7308 genes commonly expressed in each dataset.
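The transformation and gene-retention rule can be sketched as follows. This is an illustrative Python version, not the authors' R code. All names are hypothetical, and "above a cutoff" is read as strictly greater than the cutoff. A gene is kept if it passes in at least one phenotype, which is one reading of "a given phenotype".

```python
import math


def log2_cpm_plus1(counts):
    """counts: dict gene -> raw count for one cell. Returns log2(CPM + 1)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}    # log2(0 + 1) = 0
    return {gene: math.log2(c * 1e6 / total + 1.0) for gene, c in counts.items()}


def expressed_genes(expr_by_cell, phenotype_by_cell, cutoff, min_fraction=0.10):
    """Genes above `cutoff` in >= min_fraction of cells of any one phenotype."""
    cells_by_pheno = {}
    for cell, pheno in phenotype_by_cell.items():
        cells_by_pheno.setdefault(pheno, []).append(cell)

    kept = set()
    for cells in cells_by_pheno.values():
        n_needed = min_fraction * len(cells)
        tally = {}
        for cell in cells:
            for gene, value in expr_by_cell[cell].items():
                if value > cutoff:
                    tally[gene] = tally.get(gene, 0) + 1
        kept.update(gene for gene, n in tally.items() if n >= n_needed)
    return kept


# Toy data: dataset-specific cutoffs (5 for human, 1 for non-human primate),
# then the intersection gives the commonly expressed genes.
human = {"h1": {"A": 6.0, "B": 2.0}, "h2": {"A": 7.0, "B": 5.5}}
macaque = {"m1": {"A": 1.5, "B": 0.2, "C": 3.0}}
common = (expressed_genes(human, {"h1": "ESR", "h2": "ESR"}, cutoff=5, min_fraction=1.0)
          & expressed_genes(macaque, {"m1": "epiblast"}, cutoff=1))
print(sorted(common))
```

Applying the intersection after per-dataset filtering corresponds to restricting downstream analysis to genes expressed in both datasets.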

To rescale and combine the datasets, each gene was assigned a rank in a given cell based on its abundance. Ties in the data were assigned the same rank and the minimum value was used. Supplementary Fig. 9 illustrates the distributions of z-scores of ranked gene expression for every cell in each dataset. PCA was performed on the merged data using the prcomp function in the stats base package for R statistical software version 3.3.2. Downstream analysis of the principal components was performed using the mixOmics package version 6.3.1 for R version 3.3.2. Panther.db was used to perform Fisher’s exact test for over-representation of ontological terms in gene sets of interest.
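The rank-based rescaling with minimum-rank ties can be illustrated with a short Python sketch. The function names are hypothetical. The ranking direction (ascending, so the most abundant gene receives the highest rank) and the per-cell z-scoring with the sample standard deviation are assumptions, since the text does not specify them.

```python
import math


def min_rank(values):
    """Rank values in ascending order; tied values share the minimum rank."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)   # keep the first (minimum) rank
    return [first_position[v] for v in values]


def zscore(xs):
    """Standardize a list to mean 0 and sample standard deviation 1."""
    n = len(xs)
    if n < 2:
        return [0.0] * n
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    if sd == 0:
        return [0.0] * n                             # all values identical
    return [(x - mean) / sd for x in xs]


# One toy cell with four genes; two genes are tied at the highest abundance.
expression = [5.0, 0.0, 5.0, 2.0]
ranks = min_rank(expression)
print(ranks)
print(zscore(ranks))
```

Because only the ordering of genes within each cell is used, the ranked values from two platforms share a common scale before they are merged for PCA.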

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

RNA-seq, scRNA-seq, and ATAC-seq data are available at Gene Expression Omnibus under accession Superseries GSE119326 (RNA-seq, GSE119324; scRNA-seq, GSE119323; ATAC-seq, GSE147338).

References

Ramos-Ibeas, P., Nichols, J. & Alberio, R. States and origins of mammalian embryonic pluripotency in vivo and in a dish. Curr. Top. Dev. Biol. 128, 151–179 (2018).
Ying, Q. L. et al. The ground state of embryonic stem cell self-renewal. Nature 453, 519–523 (2008).
Brons, I. G. et al. Derivation of pluripotent epiblast stem cells from mammalian embryos. Nature 448, 191–195 (2007).
Tesar, P. J. et al. New cell lines from mouse epiblast share defining features with human embryonic stem cells. Nature 448, 196–199 (2007).
Kojima, Y. et al. The transcriptional and functional properties of mouse epiblast stem cells resemble the anterior primitive streak. Cell Stem Cell 14, 107–120 (2014).
Smith, A. Formative pluripotency: the executive phase in a developmental continuum. Development 144, 365–373 (2017).
Bernemann, C. et al. Distinct developmental ground states of epiblast stem cell lines determine different pluripotency features. Stem Cells 29, 1496–1503 (2011).
Iwafuchi-Doi, M. et al. Transcriptional regulatory networks in epiblast cells and during anterior neural plate development as modeled in epiblast stem cells. Development 139, 3926–3937 (2012).
Teo, A. K. et al. Pluripotency factors regulate definitive endoderm specification through eomesodermin. Genes Dev. 25, 238–250 (2011).
Tsakiridis, A. et al. Distinct Wnt-driven primitive streak-like populations reflect in vivo lineage precursors. Development 141, 1209–1221 (2014).
Pera, M. F. In search of naivety. Cell Stem Cell 15, 543–545 (2014).
Davidson, K. C., Mason, E. A. & Pera, M. F. The pluripotent state in mouse and human. Development 142, 3090–3099 (2015).
Theunissen, T. W. et al. Systematic identification of culture conditions for induction and maintenance of naive human pluripotency. Cell Stem Cell 15, 471–487 (2014).
Guo, G. et al. Naive pluripotent stem cells derived directly from isolated cells of the human inner cell mass. Stem Cell Rep. 6, 437–446 (2016).
Laslett, A. L. et al. Transcriptional analysis of early lineage commitment in human embryonic stem cells. BMC Developmental Biol. 7, 12 (2007).
Hough, S. R. et al. Single-cell gene expression profiles define self-renewing, pluripotent, and lineage primed States of human pluripotent stem cells. Stem Cell Rep. 2, 881–895 (2014).
Hough, S. R., Laslett, A. L., Grimmond, S. B., Kolle, G. & Pera, M. F. A continuum of cell states spans pluripotency and lineage commitment in human embryonic stem cells. PLoS ONE 4, e7708 (2009).
Ohgushi, M. & Sasai, Y. Lonely death dance of human pluripotent stem cells: ROCKing between metastable cell states. Trends Cell Biol. 21, 274–282 (2011).
Polanco, J. C. et al. Identification of unsafe human induced pluripotent stem cell lines using a robust surrogate assay for pluripotency. Stem Cells 31, 1498–1510 (2013).
Buecker, C. et al. Reorganization of enhancer patterns in transition from naive to primed pluripotency. Cell Stem Cell 14, 838–853 (2014).
Hayashi, K., Ohta, H., Kurimoto, K., Aramaki, S. & Saitou, M. Reconstitution of the mouse germ cell specification pathway in culture by pluripotent stem cells. Cell 146, 519–532 (2011).
Sasaki, K. et al. Robust in vitro induction of human germ cell fate from pluripotent stem cells. Cell Stem Cell 17, 178–194 (2015).
Coronado, D. et al. A short G1 phase is an intrinsic determinant of naive embryonic stem cell pluripotency. Stem Cell Res. 10, 118–131 (2013).
Ter Huurne, M., Chappell, J., Dalton, S. & Stunnenberg, H. G. Distinct cell-cycle control in two different states of mouse pluripotency. Cell Stem Cell 21, 449–455 (2017).
Pauklin, S. & Vallier, L. The cell-cycle state of stem cells determines cell fate propensity. Cell 155, 135–147 (2013).
Singh, A. M. et al. Cell-cycle control of bivalent epigenetic domains regulates the exit from pluripotency. Stem Cell Rep. 5, 323–336 (2015).
Soufi, A. & Dalton, S. Cycling through developmental decisions: how cell cycle dynamics control pluripotency, differentiation and reprogramming. Development 143, 4301–4311 (2016).
Filipczyk, A. A., Laslett, A. L., Mummery, C. & Pera, M. F. Differentiation is coupled to changes in the cell cycle regulatory apparatus of human embryonic stem cells. Stem Cell Res. 1, 45–60 (2007).
Calder, A. et al. Lengthened G1 phase indicates differentiation status in human embryonic stem cells. Stem Cells Dev. 22, 279–295 (2013).
Houghton, F. D., Thompson, J. G., Kennedy, C. J. & Leese, H. J. Oxygen consumption and energy metabolism of the early mouse embryo. Mol. Reprod. Dev. 44, 476–485 (1996).
Mathieu, J. & Ruohola-Baker, H. Metabolic remodeling during the loss and acquisition of pluripotency. Development 144, 541–551 (2017).
Sperber, H. et al. The metabolome regulates the epigenetic landscape during naive-to-primed human embryonic stem cell transition. Nat. Cell Biol. 17, 1523–1535 (2015).
Zhou, W. et al. HIF1alpha induced switch from bivalent to exclusively glycolytic metabolism during ESC-to-EpiSC/hESC transition. EMBO J. 31, 2103–2116 (2012).
Atlasi, Y. & Stunnenberg, H. G. The interplay of epigenetic marks during stem cell differentiation and development. Nat. Rev. Genet. 18, 643–658 (2017).
Guo, H. et al. The DNA methylation landscape of human early embryos. Nature 511, 606–610 (2014).
Khoueiry, R. et al. Lineage-specific functions of TET1 in the postimplantation mouse embryo. Nat. Genet. 49, 1061–1072 (2017).
Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
Consortium, E. P. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Sheffield, N. C. & Bock, C. LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor. Bioinformatics 32, 587–589 (2016).
Sheffield, N. C. et al. Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions. Genome Res. 23, 777–788 (2013).
Kalkan, T. et al. Tracking the embryonic stem cell transition from ground state pluripotency. Development 144, 1221–1234 (2017).
van de Leemput, J. et al. CORTECON: a temporal transcriptome analysis of in vitro human cerebral cortex development from human embryonic stem cells. Neuron 83, 51–68 (2014).
Kolle, G. et al. Identification of human embryonic stem cell surface markers by combined membrane-polysome translation state array analysis and immunotranscriptional profiling. Stem Cells 27, 2446–2456 (2009).
Nakamura, T. et al. A developmental coordinate of pluripotency among mice, monkeys and humans. Nature 537, 57–62 (2016).
Stirparo, G. G. et al. Integrated analysis of single-cell embryo data yields a unified transcriptome signature for the human pre-implantation epiblast. Development 145, https://doi.org/10.1242/dev.158501 (2018).
Barbaric, I. et al. Time-lapse analysis of human embryonic stem cells reveals multiple bottlenecks restricting colony formation and their relief upon culture adaptation. Stem Cell Rep. 3, 142–155 (2014).
Barone, V. et al. An effective feedback loop between cell-cell contact duration and morphogen signaling determines cell fate. Dev. Cell 43, 198–211 (2017).
Boward, B., Wu, T. & Dalton, S. Concise Review: Control of cell fate through cell cycle and pluripotency networks. Stem Cells 34, 1427–1436 (2016).
Chung, T. L. et al. Vitamin C promotes widespread yet specific DNA demethylation of the epigenome in human embryonic stem cells. Stem Cells 28, 1848–1855 (2010).
Pastor, W. A. et al. TFAP2C regulates transcription in human naive pluripotency by opening enhancers. Nat. Cell Biol. 20, 553–564 (2018).
Deglincerti, A. et al. Self-organization of the in vitro attached human embryo. Nature 533, 251–254 (2016).
Shahbazi, M. N. et al. Self-organization of the human embryo in the absence of maternal tissues. Nat. Cell Biol. 18, 700–708 (2016).
Shao, Y. et al. Self-organized amniogenesis by human pluripotent stem cells in a biomimetic implantation-like niche. Nat. Mater. 16, 419–425 (2017).
Kojima, Y. et al. Evolutionarily distinctive transcriptional and signaling programs drive human germ cell lineage specification from pluripotent stem cells. Cell Stem Cell 21, 517–532 (2017).
Nakanishi, M. et al. Human pluripotency is initiated and preserved by a unique subset of founder cells. Cell https://doi.org/10.1016/j.cell.2019.03.013 (2019).
Cornacchia, D. et al. Lipid deprivation induces a stable, naive-to-primed intermediate state of pluripotency in human PSCs. Cell Stem Cell 25, 120–136 (2019).
Reubinoff, B. E., Pera, M. F., Fong, C. Y., Trounson, A. & Bongso, A. Embryonic stem cell lines from human blastocysts: somatic differentiation in vitro. Nat. Biotechnol. 18, 399–404 (2000).
O’Callaghan, S. et al. PyMS: a Python toolkit for processing of gas chromatography-mass spectrometry (GC-MS) data. Application and comparative study of selected tools. BMC Bioinforma. 13, 115 (2012).
Kowalski, G. M. et al. Application of dynamic metabolomics to examine in vivo skeletal muscle glucose metabolism in the chronically high-fat fed mouse. Biochem Biophys. Res. Commun. 462, 27–32 (2015).
Yin, D. et al. High concordance between Illumina HiSeq2500 and NextSeq500 for reduced representation bisulfite sequencing (RRBS). Genom. Data 10, 97–100 (2016).
Andrews, S. FastQC (Babraham Bioinformatics, 2010).
Krueger, F. Trim Galore v.0.6.5. (Babraham Bioinformatics, 2019).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
Andrews, S. SeqMonk (Babraham Bioinformatics, 2010).
Corces, M. R. et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203 (2016).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Hashimshony, T. et al. CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol. 17, 77 (2016).
Liao, Y., Smyth, G. K. & Shi, W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Res. 41, e108 (2013).
Tian, L. et al. scPipe: a flexible R/Bioconductor preprocessing pipeline for single-cell RNA-sequencing data. PLoS Comput Biol. 14, e1006361 (2018).
Lun, A. T., Bach, K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).
Nakamura, T. et al. SC3-seq: a method for highly parallel and quantitative measurement of single-cell gene expression. Nucleic Acids Res. 43, e60 (2015).
Blekhman, R., Marioni, J. C., Zumbo, P., Stephens, M. & Gilad, Y. Sex-specific and lineage-specific alternative splicing in primates. Genome Res. 20, 180–189 (2010).
Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).


Acknowledgements

This study was supported by the Australian Research Council Special Research Initiative in Stem Cell Sciences (SR1101002), and Bioplatforms Australia, through funding from the Australian Government National Collaborative Research Infrastructure Strategy. M.J.M. is a Principal Research Fellow of the National Health and Medical Research Council. M.E.B. was supported by a Bellberry-Viertel Senior Medical Research Fellowship.

Author information

These authors contributed equally: Kevin X. Lau, Elizabeth A. Mason.

Authors and Affiliations

Department of Anatomy and Neuroscience, University of Melbourne, Melbourne, Victoria, 3010, Australia
Kevin X. Lau, Elizabeth A. Mason, Joshua Kie, Christine A. Wells & Martin F. Pera
Centre for Stem Cell Systems, Department of Anatomy and Neuroscience, University of Melbourne, Melbourne, Victoria, 3010, Australia
Elizabeth A. Mason & Christine A. Wells
Metabolomics Australia, Bio21 Institute of Molecular Science and Biotechnology, University of Melbourne, Parkville, Victoria, 3052, Australia
David P. De Souza, Dedreia Tull & Malcolm J. McConville
Department of Biochemistry and Molecular Biology, Bio21 Institute of Molecular Science and Biotechnology, University of Melbourne, Parkville, Victoria, 3052, Australia
Joachim Kloehn & Malcolm J. McConville
Division of Molecular Medicine, The Walter and Eliza Hall Institute, 1G Royal Parade, Parkville, Victoria, 3052, Australia
Andrew Keniry, Tamara Beck, Marnie E. Blewitt, Matthew E. Ritchie, Shalin H. Naik, Daniela Zalcenstein, Shian Su & Martin F. Pera
Department of Medical Biology, University of Melbourne, Melbourne, Victoria, 3010, Australia
Andrew Keniry & Marnie E. Blewitt
Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Queensland, 4072, Australia
Othmar Korn
Melbourne Integrative Genomics, School of Biosciences, University of Melbourne, Melbourne, Victoria, 3010, Australia
Irene Gallego Romero
The Jackson Laboratory, Bar Harbor, ME, 04609, USA
Catrina Spruce, Christopher L. Baker, Tracy C. McGarr & Martin F. Pera
Divisions of Cancer and Hematology and Molecular Medicine, The Walter and Eliza Hall Institute, 1G Royal Parade, Parkville, Victoria, 3052, Australia
Christine A. Wells
The Florey Institute of Neuroscience and Mental Health, 30 Royal Parade, Parkville, Victoria, 3052, Australia
Martin F. Pera

Authors

Kevin X. Lau
Elizabeth A. Mason
Joshua Kie
David P. De Souza
Joachim Kloehn
Dedreia Tull
Malcolm J. McConville
Andrew Keniry
Tamara Beck
Marnie E. Blewitt
Matthew E. Ritchie
Shalin H. Naik
Daniela Zalcenstein
Othmar Korn
Shian Su
Irene Gallego Romero
Catrina Spruce
Christopher L. Baker
Tracy C. McGarr
Christine A. Wells
Martin F. Pera

Contributions

Contributions of authors to drafting of specific sections of the paper are identified by name of section titles. K.X.L.: design of experiments, collection of data, and writing of the paper (Methods); E.A.M.: analysis of data and writing of the paper (Methods); J.K.: design of experiments, collection of data, and writing of the paper (Methods); D.P.D. and M.M.: design of experiments, analysis of data, and writing of the paper (Methods, Results); J.K.: collection and analysis of data; D.T., T.B., C.S., and T.C.M.: collection of data; A.K.: design of experiments, analysis and collection of data, and writing of the paper (Methods); M.E.B.: design of experiments, analysis of data; M.E.R.: analysis of data; S.H.N.: design of experiments, analysis of results, writing of the paper (Methods); D.Z.: design of experiments and collection of data; O.K. and S.S.: analysis of results; I.G.R.: design of experiments, analysis of results, writing of manuscript (Methods); C.L.B.: design of experiments, analysis of results, writing of the paper; C.A.W.: design of experiments, analysis of results, and writing of the paper (Methods, Results); M.F.P.: conception of study, design of experiments, analysis of results, and writing of the paper.

Corresponding author

Correspondence to Martin F. Pera.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Reporting Summary

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Movie 1

Supplementary Movie 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Lau, K.X., Mason, E.A., Kie, J. et al. Unique properties of a subset of human pluripotent stem cells with high capacity for self-renewal. Nat Commun 11, 2420 (2020). https://doi.org/10.1038/s41467-020-16214-8


Received: 13 June 2019
Accepted: 16 April 2020
Published: 15 May 2020
DOI: https://doi.org/10.1038/s41467-020-16214-8
