
Annotations

IWGSC RefSeq Annotations:

  • IWGSC Annotation v2.1 is available in open access for download and display in a browser.

Under the leadership of Frédéric Choulet and Hélène Rimbert (INRAE), and with funding from the French Government managed by the National Research Agency (ANR) under the Investments for the Future program (BreedWheat project ANR-10-BTBR-03), a new annotation, IWGSC Annotation v2.1, was completed to accompany IWGSC RefSeq v2.1. First, the previous annotation was updated to IWGSC Annotation v1.2 by integrating a set of 117 novel genes and 81 microRNAs, many of which had been manually curated by the wheat community; this, in turn, was used to annotate IWGSC RefSeq v2.1. The transposable elements (TEs) in the IWGSC RefSeq v2.1 assembly were reannotated, and the gene annotation was updated by transferring the previously known gene models (v1.1) using a fine-tuned, dedicated strategy implemented in the Marker-Assisted Gene Annotation Transfer for Triticeae (MAGATT) pipeline. The newly released IWGSC RefSeq Annotation v2.1 contains 266,753 genes, comprising 106,913 high-confidence (HC) genes and 159,840 low-confidence (LC) genes.

Article: Zhu et al., Optical maps refine the bread wheat Triticum aestivum cv Chinese Spring genome assembly, Plant J. 2021 Apr 24, https://doi.org/10.1111/tpj.15289

The corresponding assembly is also available in open access: IWGSC RefSeq v2.1

See also the IWGSC announcement.

 

The IWGSC RefSeq v1.0 annotation includes gene models generated by integrating predictions made by INRA-GDEC using TriAnnot and by PGSB using their customised pipeline (previously the MIPS pipeline). The integration was undertaken by the Earlham Institute (EI), which also added UTRs to the gene models where supporting data are available. Gene models have been assigned to high-confidence (HC) or low-confidence (LC) classes based on completeness, similarity to genes represented in protein and DNA databases, and repeat content. The automated functional annotation of genes was generated by PGSB based on AHRD parameters.

In addition, annotated transposable elements (TEs), non-coding RNAs, varietal SNPs, RH maps, GBS maps and optical maps are available.

The syntenic gene pairs are available for download.

More information about these data is provided in the README file.

The IWGSC RefSeq v1.1 annotation is a new version of the gene annotation that refers to the same assembly as v1.0. It includes genes and RNA-seq mapping.

Compared with the v1.0 annotation, three modifications were made:

- genes that had been wrongly removed during the integration were added back

- LC genes that overlap manually curated genes were removed (IWGSC_v1.1_LC_removed.ids)

- the IDs of TE-related LC genes coming from the HC set were updated to fit the LC naming and numbering (IWGSC_v1.1_LC.correspondanceTEHC.txt).

More information about these data is provided in the README file.

 

How to access the data?

All these data are now in open access. While scientists may freely publish using the IWGSC data, IWGSC does request that the source of the data be properly acknowledged.

>>> The corresponding assembly is accessible here. <<<

These data should be displayed in Ensembl Plants and GrainGenes.

 

Warning:

Note that some bioinformatics tools (e.g. GATK) require that you split the chromosomes into chunks of at most 512 Mbp.
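
The splitting mentioned in the warning above can be sketched as follows. This is a minimal illustration, not an IWGSC or URGI tool: the function names, the "_partN" naming of the chunks and the reading of "512 Mbp" as 512,000,000 bp are all assumptions made for the example. Each chunk header records the 1-based coordinates of the chunk on the original chromosome so that positions can be translated back later.

```python
from typing import Iterator, Tuple

# Assumption: "512 Mbp" is read here as 512,000,000 bp.
MAX_CHUNK_BP = 512_000_000


def chunk_intervals(length: int, max_bp: int = MAX_CHUNK_BP) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) intervals covering [0, length)."""
    if max_bp <= 0:
        raise ValueError("max_bp must be positive")
    for start in range(0, length, max_bp):
        yield start, min(start + max_bp, length)


def read_fasta(handle):
    """Yield (name, sequence) pairs; each sequence is held fully in memory."""
    name, parts = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            # Keep only the first word of the header as the sequence name.
            name, parts = line[1:].split()[0], []
        elif line:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def split_fasta(in_handle, out_handle, max_bp: int = MAX_CHUNK_BP, width: int = 60) -> None:
    """Write every sequence as one or more chunks of at most max_bp bases."""
    for name, seq in read_fasta(in_handle):
        for i, (start, end) in enumerate(chunk_intervals(len(seq), max_bp), 1):
            # Hypothetical naming scheme: <name>_part<i>, followed by the
            # 1-based inclusive coordinates on the original sequence.
            out_handle.write(f">{name}_part{i} {name}:{start + 1}-{end}\n")
            for pos in range(start, end, width):
                out_handle.write(seq[pos:min(pos + width, end)] + "\n")
```

A position p (1-based) on a chunk that starts at original coordinate s corresponds to position s + p - 1 on the unsplit chromosome; annotation coordinates need the reverse shift before they can be used with the split sequences.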

 

 

IWGSC Survey sequence annotations

Versions 1 and 2:

  • Gene models produced by the MIPS plant group (K. Mayer) are publicly available. Major changes are:
    a.) The genome assembly scaffolds were renamed from the old identifiers (e.g. ">10") to identifiers of the form ">ta_iwgsc_1al_v2_10" in the FASTA files of the CLEANED and repeat-masked genome sequences, and the IDs in the annotation GTF files were adapted accordingly.
    b.) An issue with missing stop codons in the gene prediction FASTA and GTF files was fixed.
    No structural changes were made between the v2.1 and v2.2 annotations and all gene identifiers remain stable, so this update can be considered cosmetic, mainly improving user convenience.
    Renamed genome assembly: genome_assembly/genome_arm_assemblies_CLEANED/ and genome_assembly/genome_arm_assemblies_CLEANED_REPMASKED/
    Gene predictions, incl. changelog, README, ...: genePrediction_v2.2/
  • POPSEQ, performed by IPK (N. Stein), is publicly available.
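
For users who still hold files with the old scaffold names, the renaming described in item a.) above can be reproduced along the following lines. This is only a sketch: the two helper functions are hypothetical, and the assumption that every scaffold of a given arm file receives the same prefix (here "ta_iwgsc_1al_v2_", taken from the example identifier above) is an inference from that single example, not a documented rule.

```python
def rename_fasta_headers(lines, prefix):
    """Prepend prefix to every FASTA header; sequence lines pass through unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield f">{prefix}{line[1:]}"
        else:
            yield line


def rename_gtf_seqids(lines, prefix):
    """Prepend prefix to the first (sequence name) column of each GTF record."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            # Comment and blank lines are left untouched.
            yield line
        else:
            seqid, sep, rest = line.partition("\t")
            yield f"{prefix}{seqid}{sep}{rest}"


if __name__ == "__main__":
    # Assumed prefix for the 1AL arm, based on the example ">ta_iwgsc_1al_v2_10".
    prefix = "ta_iwgsc_1al_v2_"
    for out in rename_fasta_headers([">10\n", "ACGT\n"], prefix):
        print(out, end="")
```

Applying the same prefix to both the FASTA headers and the first GTF column keeps the annotation consistent with the renamed sequences.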

 

Version 3:

Gene models produced by the National Research Council Canada and the University of Saskatchewan (A. Sharpe, D. Konkin and C. Pozniak) are publicly available.

 

 

 

 
 


Update: 27 Apr 2021
Creation date: 27 Feb 2013