Supplementary Figure 3: K-mer analysis for estimating the genome size of G. arboreum.
From: Genome sequence of the cultivated cotton Gossypium arboreum

The volume of K-mers is plotted against the frequency at which they occur. The truncated peak on the left, at low frequency and high volume, represents K-mers that contain essentially random sequencing errors, whereas the distribution on the right represents proper (putatively error-free) data. The total K-mer number is 43,099,162,547, and the volume peak lies at a frequency of 25. The genome size can be estimated as (total K-mer number)/(frequency at the volume peak), which gives approximately 1,724 Mb.
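
The arithmetic behind this estimate can be checked with a short sketch. It uses only the two values stated in the caption; the function name `estimate_genome_size` and the variable names are illustrative assumptions, not part of the original analysis pipeline.

```python
def estimate_genome_size(total_kmers: int, peak_depth: int) -> float:
    """Estimate genome size in bases as total K-mer number / volume-peak frequency.

    Hypothetical helper for illustration; it only restates the caption's formula.
    """
    if peak_depth <= 0:
        raise ValueError("peak_depth must be a positive integer")
    return total_kmers / peak_depth


# Values taken from the caption.
total_kmers = 43_099_162_547   # total K-mer number
peak_depth = 25                # frequency at which the volume peak occurs

size_bp = estimate_genome_size(total_kmers, peak_depth)
size_mb = size_bp / 1e6        # convert bases to megabases

print(f"Estimated genome size: {size_mb:,.0f} Mb")
```

Rounded to the nearest megabase, the result agrees with the 1,724 Mb reported in the caption.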