The cultivated strawberry (Fragaria x ananassa) is one of the most popular and globally consumed fruit crops.
F. x ananassa is an octoploid (2n=8x=56) species that originated from a natural hybridization between F. virginiana and F. chiloensis.
The genus Fragaria belongs to the family Rosaceae and comprises one cultivated species
(F. x ananassa) and 21 wild species, including 12 diploids, five tetraploids, one hexaploid, two octoploids, and one decaploid.
De novo whole genome sequencing of the octoploid strawberry, F. x ananassa, was performed using the Illumina and Roche 454 sequencing platforms. A Japanese variety bred in Chiba Prefecture, 'Reikou', was subjected to the analysis.
A virtual 'reference genome', which integrated the genome sequences of homeologous chromosomes, was constructed by eliminating heterozygous bases during sequence assembly (FANhybrid_r1.2). In parallel, four wild Fragaria species, which represent the genetic diversity of the genus Fragaria, were selected based on simple sequence repeat (SSR) markers and subjected to whole genome sequencing using an Illumina platform. The assembled contigs of the wild species, along with the F. x ananassa contigs, were designated as follows:
FAN (F. x ananassa), FII (F. iinumae), FNI (F. nipponica), FNU (F. nubicola) and FOR (F. orientalis).
The sequence IDs were named according to the following criteria.
Reference genome: Sequences derived from the 454 scaffolds were prefixed 'FANhyb_rscf', followed by a sequence-specific eight-digit number. Sequences derived from the Illumina scaffolds and unassembled contigs were prefixed 'FANhyb_icon' and suffixed '_a' after the eight digits.
The Illumina singlet sequences were prefixed 'FANhyb_iscf' or 'FANhyb_icon'.
The former and the latter were used for singlets derived from the Illumina scaffolds and unassembled contigs, respectively, which were assembled with SOAPdenovo 1.0.5. The suffixes '_r' and '_o' were appended after the eight digits for repeat and outlier singlets, respectively, and all other singlets were suffixed with '_s'.
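The reference-genome naming rules can be illustrated with a minimal Python sketch. It is not part of the original pipeline: the function name classify_reference_id is hypothetical, and it assumes that the eight digits follow the prefix directly, with no separator. IDs that do not fit a pattern described above are reported as None.

```python
import re

# Assumption: the eight digits follow the prefix directly, with no separator,
# and the optional suffix is one of _a, _r, _o, _s.
_REFERENCE_ID = re.compile(r"^FANhyb_(rscf|iscf|icon)(\d{8})(?:_([aros]))?$")

_SINGLET_CLASSES = {
    "r": "repeat singlet",
    "o": "outlier singlet",
    "s": "other singlet",
}


def classify_reference_id(seq_id):
    """Describe a reference-genome sequence ID, or return None if it does not fit."""
    match = _REFERENCE_ID.match(seq_id)
    if match is None:
        return None
    kind, number, suffix = match.groups()

    if kind == "rscf":
        # 454-derived scaffolds: no suffix is described for them.
        if suffix is not None:
            return None
        origin = "454 scaffold"
    elif kind == "icon" and suffix == "a":
        # Illumina scaffolds and unassembled contigs in the reference.
        origin = "Illumina scaffold or unassembled contig"
    elif suffix in _SINGLET_CLASSES:
        # Illumina singlets: iscf = from scaffolds, icon = from unassembled contigs.
        base = "Illumina scaffold" if kind == "iscf" else "Illumina unassembled contig"
        origin = f"{_SINGLET_CLASSES[suffix]} from {base}"
    else:
        # Combination not covered by the naming criteria.
        return None

    return {
        "prefix": "FANhyb_" + kind,
        "number": number,
        "suffix": suffix,
        "origin": origin,
    }
```

For example, calling classify_reference_id on the made-up ID 'FANhyb_iscf00000003_r' would report a repeat singlet derived from an Illumina scaffold.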
Illumina assembled genome sequences: Each Illumina scaffold and unassembled contig was named with three capital letters representing the designated genome name (FAN, FII, FNI, FNU, or FOR), followed by 'iscf' or 'icon' and eight digits. 'iscf' and 'icon' were used for scaffolds and unassembled contigs, respectively.
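
As an illustration of the per-species naming rule, the following Python sketch splits an Illumina assembly ID into its parts. The function name parse_illumina_id is hypothetical. The text does not say whether a separator sits between the genome name and 'iscf'/'icon', so the sketch assumes an optional underscore there; the example IDs are invented for demonstration.

```python
import re

# Genome names and species as designated in the text.
GENOME_NAMES = {
    "FAN": "F. x ananassa",
    "FII": "F. iinumae",
    "FNI": "F. nipponica",
    "FNU": "F. nubicola",
    "FOR": "F. orientalis",
}

_KINDS = {"iscf": "scaffold", "icon": "unassembled contig"}

# Assumption: an underscore between the genome name and 'iscf'/'icon' is optional.
_ILLUMINA_ID = re.compile(r"^(FAN|FII|FNI|FNU|FOR)_?(iscf|icon)(\d{8})$")


def parse_illumina_id(seq_id):
    """Split an Illumina assembly ID into genome, kind, and number, or return None."""
    match = _ILLUMINA_ID.match(seq_id)
    if match is None:
        return None
    genome, kind, number = match.groups()
    return {
        "genome": genome,
        "species": GENOME_NAMES[genome],
        "kind": _KINDS[kind],
        "number": int(number),
    }


if __name__ == "__main__":
    # Invented example IDs, for demonstration only.
    for example in ("FII_icon00000042", "FORiscf00000001"):
        print(example, parse_illumina_id(example))
```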