Paleoproterozoic sterol biosynthesis and the rise of oxygen


Natural products preserved in the geological record can function as ‘molecular fossils’, providing insight into organisms and physiologies that existed in the deep past. One important group of molecular fossils is the steroidal hydrocarbons (steranes), which are the diagenetic remains of sterol lipids. Complex sterols with modified side chains are unique to eukaryotes, although simpler sterols can also be synthesized by a few bacteria (ref. 1). Sterol biosynthesis is an oxygen-intensive process; thus, the presence of complex steranes in ancient rocks signals not only the presence of eukaryotes but also aerobic metabolic processes (ref. 2). In 1999, steranes were reported in 2.7 billion year (Gyr)-old rocks from the Pilbara Craton in Australia (ref. 3), suggesting a long delay between photosynthetic oxygen production and its accumulation in the atmosphere (also known as the Great Oxidation Event) 2.45–2.32 Gyr ago (ref. 4). However, the recent reappraisal and rejection of these steranes as contaminants (ref. 5) pushes the oldest reported steranes forward to around 1.64 Gyr ago (ref. 6). Here we use a molecular clock approach to improve constraints on the evolution of sterol biosynthesis. We infer that stem eukaryotes shared functionally modern sterol biosynthesis genes with bacteria via horizontal gene transfer. Comparing multiple molecular clock analyses, we find that the maximum marginal probability for the divergence time of bacterial and eukaryal sterol biosynthesis genes is around 2.31 Gyr ago, concurrent with the most recent geochemical evidence for the Great Oxidation Event (ref. 7). Our results therefore indicate that simple sterol biosynthesis existed well before the diversification of living eukaryotes, substantially predating the oldest detected sterane biomarkers (approximately 1.64 Gyr ago; ref. 6), and furthermore, that the evolutionary history of sterol biosynthesis is tied to the first widespread availability of molecular oxygen in the ocean–atmosphere system.

Figure 1: Phylogeny and synteny of sqmo and osc genes.
Figure 2: Molecular clock for one of the datasets used in this study.
Figure 3: Marginal probability curves for the timing of the Bacterial Group I/stem-eukaryote split.

We gratefully acknowledge funding from the Agouron Institute Geobiology Fellowship to D.A.G. and the Simons Foundation Collaboration on the Origins of Life to R.E.S. and G.P.F. Additional support was provided by the National Science Foundation programme ‘Frontiers of Earth System Dynamics’ (EAR-1338810) to R.E.S., and the National Science Foundation programme ‘Integrated Earth Systems’ (IES-1615426) to G.P.F.

Author information

R.E.S. and D.A.G. designed the experiment. D.A.G. and A.C. performed the data analysis. All authors were involved in interpreting the data and drafting the manuscript.

Corresponding author

Correspondence to Roger E. Summons.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Maximum likelihood (RAxML) tree, showing removal of problematic SQMO sequences.

Extended Data Figure 2 Maximum likelihood (RAxML) tree, showing removal of problematic OSC sequences.

Extended Data Figure 3 Maximum likelihood (RAxML) tree from vetted SQMO dataset.

Extended Data Figure 4 Maximum likelihood (RAxML) tree from vetted OSC dataset.

Extended Data Figure 5 Bayesian (MrBayes) tree from vetted SQMO dataset.

Extended Data Figure 6 Bayesian (MrBayes) tree from vetted OSC dataset.

Extended Data Figure 7 Reproducibility of BEAST runs, and relationship between BEAST and RelTime trees.

Extended Data Table 1 Distribution of marginal probabilities for all molecular clock analyses, binned by geological time

Extended Data Table 2 Fossil calibration points used in molecular clock

Supplementary information

Supplementary Information

This file contains Supplementary Results and Discussion and additional references. (PDF 346 kb)

Supplementary Data

This zipped file contains the files for Supplementary Data 1 and 2. Data 1 contains all amino acid alignments and trees from this study; the GenInfo Identifier (GI) numbers for the sequences used are included in the taxon IDs. Data 2 contains the code used in this analysis. Please note that the authors place no restriction on its use. (ZIP 485 kb)
