Annotations

IWGSC RefSeq Annotations:

The IWGSC RefSeq v1.0 annotation includes gene models generated by integrating predictions made by INRA-GDEC using TriAnnot and by PGSB using their customised pipeline (previously the MIPS pipeline). The integration was undertaken by the Earlham Institute (EI), which also added UTRs to the gene models where supporting data were available. Gene models have been assigned to high confidence (HC) or low confidence (LC) classes based on completeness, similarity to genes represented in protein and DNA databases, and repeat content. The automated functional annotation of genes was generated by PGSB based on AHRD parameters.

In addition, annotated transposable elements (TEs), non-coding RNAs, varietal SNPs, RH maps, GBS maps, and optical maps are available.

More information about these data is provided in the README file.

The v1.1 annotation is the new version of the gene annotation and refers to the same assembly. It includes genes and RNA-Seq mapping.

Compared with the v1.0 annotation, three modifications were made:

- genes that were wrongly removed during the integration were added back

- LC genes that overlap manually curated genes were removed (IWGSC_v1.1_LC_removed.ids)

- the IDs of TE-related LC genes coming from the HC set were updated to fit the LC naming and numbering (IWGSC_v1.1_LC.correspondanceTEHC.txt).
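
The second modification above (dropping the LC genes listed in IWGSC_v1.1_LC_removed.ids) can be reproduced on a local gene list with a few lines of Python. This is only a minimal sketch: it assumes the .ids file holds one gene identifier per line, which is not stated on this page, and the function names and gene IDs below are placeholders invented for illustration.

```python
def load_ids(lines):
    """Collect one identifier per non-empty line (assumed file layout)."""
    return {line.strip() for line in lines if line.strip()}


def drop_removed(gene_ids, removed):
    """Keep the gene IDs that are not in the removed set, preserving order."""
    return [gene_id for gene_id in gene_ids if gene_id not in removed]


# Placeholder identifiers, not real IWGSC gene IDs.
removed = load_ids(["geneB_LC\n", "\n", "geneD_LC\n"])
kept = drop_removed(["geneA_LC", "geneB_LC", "geneC_LC"], removed)
print(kept)
```

With a real download, `load_ids` could be given an open file handle instead of a list of strings, since both yield lines.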

More information about these data is provided in the README file.

 

How to access the data?

All these data are now open access. While scientists may freely publish using the IWGSC data, the IWGSC requests that the source of the data be properly acknowledged.

>>> The corresponding assembly is accessible here. <<<

These data should be displayed in Ensembl Plants and GrainGenes.

 

IWGSC Survey sequence annotations

Versions 1 and 2:

  • Gene models produced by the MIPS plant group (K. Mayer) are publicly available. Major changes in v2.2 are:
    a.) we renamed the genome assembly scaffolds from the old identifiers (e.g. ">10") to new identifiers (e.g. ">ta_iwgsc_1al_v2_10") in the fasta files of CLEANED and repeat-masked genome sequences, and adapted the IDs in the annotation GTF files accordingly.
    b.) we fixed an issue with missing stop codons in the gene prediction fasta and GTF files.
    NO structural changes were made between the v2.1 and v2.2 annotations and all gene identifiers remain stable, so this update can be considered cosmetic and mainly aimed at better user convenience.
    Renamed genome assembly: genome_assembly/genome_arm_assemblies_CLEANED/ and genome_assembly/genome_arm_assemblies_CLEANED_REPMASKED/
    Gene predictions incl. changelog, README, ...: genePrediction_v2.2/
  • POPSEQ data produced by IPK (N. Stein) are publicly available.
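
For users who still hold files with the old scaffold names, change (a) above amounts to prefixing each identifier. The sketch below illustrates this for FASTA header lines and for the first (seqname) column of GTF records. The prefix is taken from the ">ta_iwgsc_1al_v2_10" example in the text; treating it as a fixed per-arm prefix is an assumption, and the function names and the sample GTF record are hypothetical.

```python
def rename_fasta_header(line, prefix):
    """Prefix the sequence ID on a FASTA header line; pass other lines through."""
    if line.startswith(">"):
        return ">" + prefix + line[1:]
    return line


def rename_gtf_seqname(line, prefix):
    """Prefix column 1 (seqname) of a GTF record; pass comments and blanks through."""
    if not line.strip() or line.startswith("#"):
        return line
    seqname, tab, rest = line.partition("\t")
    return prefix + seqname + tab + rest


# Prefix derived from the example identifier in the text (arm 1AL, assumed).
PREFIX = "ta_iwgsc_1al_v2_"

print(rename_fasta_header(">10\n", PREFIX), end="")
# Hypothetical GTF record, for illustration only.
print(rename_gtf_seqname('10\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1";\n', PREFIX), end="")
```

Sequence lines and GTF comment lines are returned unchanged, so each function can be mapped over every line of a file.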

 

Version 3:

Gene models produced by the National Research Council Canada and the University of Saskatchewan (A. Sharpe, D. Konkin and C. Pozniak) are publicly available.

 

 

 

 
 


Update: 21 Aug 2018
Creation date: 27 Feb 2013