
Help and Documentation






A quick primer...

Difference between AlignView and PhyloView

Technical requirements

Symbol Legends

Syntenic Blocks in ancestral species

Conserved Non-coding Elements

Frequently Asked Questions

Questions & Comments: contact us

Credits & How to cite


A quick primer...


1. Enter a gene name and either select a species from the menu, or click "search" and select a species from the list that appears. These become the "reference gene" and the "reference species".
2. A 4-column table is presented in which you select either AlignView or PhyloView for a given root ancestor (column 1). See below for a quick graphical overview of the differences between the two types of views.
3. In the chosen view, navigate around the reference gene by selecting ancestral genomes, selecting only extant species, collapsing 2X (low coverage) genomes, collapsing branches that are not of immediate interest, zooming in/out, shifting left/right, switching views, switching reference gene/species, etc.

Example: Query with the gene PAX1 and select the species "Human". Click Search. In the table that appears, select AlignView with "Chordata" as root. This shows the human PAX1 gene as reference, with 10 genes on either side, each in a different colour. By default, the reconstructed regions in all the ancestors of human up to chordates are shown above the human track, and all the other extant species are hidden. Click on "Hide Ancestors", then "Expand All", then "Hide 2X species". To clear up the display, click on the nodes corresponding to Glires, Laurasiatheria, Sauria, Opossum and Platypus to collapse (hide) these parts of the tree. What remains is an alignment of the human locus containing PAX1 with some primate and teleost fish genomes. Strikingly, each teleost fish genome shows two chromosomes for a single region in human: some genes are duplicated, but others have been differentially lost, as a consequence of the whole-genome duplication that occurred in the ancestor of teleost fish. Click here to see the result of the above example.


Difference between AlignView and PhyloView


The choice between the two views is made right after querying the database with a gene name. You are presented with a 4-column table in which you select one of the two types of views (columns 3 and 4) for a given root ancestor (column 1). See below for a quick graphical overview of the differences between the two types of views. The figures in columns 3 and 4 (e.g. [34 desc.]) indicate the number of tree nodes that will be shown on the next page. In AlignView, this is the number of species (ancestral and extant) that descend from the ancestor shown in column 1. In PhyloView, this is the number of nodes that descend from the ancestral copy of the reference gene in the root species shown in column 1, including speciation nodes (intermediate ancestral copies of the reference gene), duplication nodes, and terminal nodes (modern copies of the gene in extant species). In short, AlignView is based on a species tree, while PhyloView is based on a phylogenetic tree specific to the reference gene. The example below is a query for gene CD209 in human, with Primates as root.

AlignView

This view shows an alignment between gene orders in different species. The species chosen initially when querying the database is the reference species, and the reference gene itself is positioned in the centre of the display, intersecting the dashed line. The species tree is shown on the left of the aligned genomes. By default, 10 genes on either side of the reference gene are shown in the reference species, each in a different colour, and orthologs of these genes in other species are shown in matching colours. If a species shares the root ancestor (here the primate ancestor) with the reference species but does not possess any ortholog of the 21 genes on display in the reference genome (the reference gene plus 10 genes on either side), then this species is not shown at all.

PhyloView

This view shows the gene order of a reference gene and its neighbouring genes, and the order of their respective orthologs and paralogs in different species that share the same ancestral "root" species. The tree on the left of the display is the phylogenetic tree (computed by Ensembl) of the gene shown in the middle, which intersects the vertical line. In this view, some species may appear twice if the tree contains a duplication node, shown as a square (as in the example, at the second node along the red path). Orthologs of these genes in other species are shown in matching colours. See below for a more detailed legend of PhyloView symbols.




Technical requirements

The Genomicus browser is currently optimised for Firefox and Safari, and is known to work poorly, or not at all, with Internet Explorer. We are working on this issue. More specifically:

1) Firefox: The browser has been tested on Firefox for Linux (version 1.5.1 and above) and Macintosh (version 3 and above).
2) Safari: The browser has been tested on Safari for Macintosh version 3.1.2 and above.
3) Internet Explorer: The Adobe SVG Viewer plugin must be installed. With it, IE version 6.0 displays some features, while IE version 7.0 does not work at all at the moment.


Symbol Legend


The gene initially placed in the centre of the display and aligned over a vertical black line is the gene that was used as query (reference gene).
Genes outlined in white are paralogs of the genes in the same colour but outlined in black.
PhyloView only: shaded genes correspond to genes that are not orthologous to any gene from the species used in the query (reference species).
Coloured genes over a light blue background correspond to extant or ancestral genes that i) are orthologous to genes of the same colour from the species used in the query, but ii) belong to branches other than the one leading from the ancestral root to the query species.
Coloured genes over a light green background correspond to ancestral versions of the genes of the same colour from the species used in the query.
Branches shown as thicker, lighter blue lines represent the path leading from the ancestral root to the reference species used as query.
Blue square nodes represent ancestral species leading from the same "root" ancestral species to orthologs and/or paralogs of the gene used as query.
Red square nodes represent duplication events of an ancestral version of the gene used as query.
Open blue square nodes represent extant species.
In PhyloView, coloured circles between genes indicate intergenic Conserved Non-coding Elements (CNEs). Green, red and blue indicate the level of conservation (CNE Set 1, Set 2 and Set 3 respectively; see the CNE section for more information).
In PhyloView, coloured bars between genes indicate intronic Conserved Non-coding Elements (CNEs). The "host" gene is always the first gene on the left of the CNE. Green, red and blue indicate the level of conservation (CNE Set 1, Set 2 and Set 3 respectively; see the CNE section for more information).
In AlignView, a thick green line between two genes in an ancestral species is equivalent to a "gap" in the alignment, i.e. the two genes are neighbours in this species but not in the reference species, where the two orthologs are separated by one or more genes.
In AlignView, a thick blue line between two genes is equivalent to a "gap" in the alignment of this extant species, i.e. the two genes are neighbours in this species but not in the reference species, where their orthologs are separated by one or more genes. Clicking on the line brings up a window indicating the size of the gap in base pairs, and a link to other genome browsers.
In AlignView, a thin green line between two genes is equivalent to a "break" in the continuity of the alignment, i.e. the two genes are linked (on the same chromosome or scaffold) in the order shown in the corresponding extant species, but at least one gene separates them in that species.
In AlignView, a thin double-headed arrow under a block of genes means that the order of the genes shown was flipped around (reversed) compared to the "canonical" orientation found, e.g., in Ensembl for extant species and in this browser for ancestral species.
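
The difference between a "gap" and a "break" in AlignView can be illustrated with a minimal Python sketch. It is not the browser's code: it assumes hypothetical inputs, namely the genes of one species in display order, the index of each gene along its own chromosome or scaffold, and the index of its ortholog in the reference species (orthologs are matched by a shared name for simplicity).

```python
from typing import Dict, List, Tuple


def classify_links(
    displayed: List[str],
    species_pos: Dict[str, int],
    ref_pos: Dict[str, int],
) -> List[Tuple[str, str, str]]:
    """Label each pair of consecutive displayed genes as 'break', 'gap' or 'aligned'.

    displayed   -- genes of one species, in the order shown (hypothetical input)
    species_pos -- index of each gene along its own chromosome or scaffold
    ref_pos     -- index of each gene's ortholog in the reference species
    """
    links = []
    for left, right in zip(displayed, displayed[1:]):
        if abs(species_pos[right] - species_pos[left]) > 1:
            # Linked in this species, but other genes lie between them here.
            kind = "break"
        elif abs(ref_pos[right] - ref_pos[left]) > 1:
            # Neighbours in this species, separated by other genes in the reference.
            kind = "gap"
        else:
            kind = "aligned"
        links.append((left, right, kind))
    return links


links = classify_links(
    ["a", "b", "c", "d"],
    species_pos={"a": 0, "b": 1, "c": 2, "d": 5},
    ref_pos={"a": 10, "b": 11, "c": 13, "d": 14},
)
print(links)
# [('a', 'b', 'aligned'), ('b', 'c', 'gap'), ('c', 'd', 'break')]
```

Checking the species positions first means that a pair separated in both genomes is reported as a break rather than a gap, since it is not a pair of neighbours in the species shown.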


Syntenic Blocks in ancestral species

The Genomicus browser displays (when possible) the predicted order of genes in ancestral species. The method used to predict this order is as follows (see Muffato et al., manuscript in preparation, for a more formal definition):

1) A pairwise comparison between ALL available species is performed to identify pairwise syntenic blocks. Two consecutive genes A1 and B1 in species 1 belong to a syntenic block with their respective orthologs A2 and B2 in species 2 if A2 and B2 are also consecutive and in the same respective orientation as A1 and B1. This definition is applied strictly for any number of consecutive genes.
2) All pairwise syntenic blocks are compared, and when two blocks overlap without any inconsistency, they are merged into a larger block.
3) Merged blocks represent the ancestral gene order in the common ancestor of the extant species that contributed the pairwise syntenic blocks.

Because the definition of pairwise syntenic blocks is very strict, it is assumed that this order accurately reflects the order and orientation of genes in their last common ancestor. Merging pairwise syntenic blocks solves the problem of gene losses or duplications in terminal branches of the tree, which would otherwise break the above definition.
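
Step 1 above (detecting pairwise syntenic blocks) can be sketched in Python as follows. This is a simplified illustration, not the method of Muffato et al.: it assumes that each genome is a single ordered list of (gene family, strand) pairs, that orthologs share the same family name, and that orthology is one-to-one. "The same respective orientation" is interpreted here as follows: on a block that is reversed in species 2, every gene must also sit on the opposite strand. The merging of overlapping blocks (step 2) is not shown.

```python
from typing import Dict, List, Tuple

# A genome is modelled as one ordered list of (gene_family, strand) pairs,
# with strand +1 or -1. Orthologs share the same family name (assumption).
Genome = List[Tuple[str, int]]


def pairwise_blocks(g1: Genome, g2: Genome) -> List[List[str]]:
    """Return maximal runs of genes from g1 whose orthologs are consecutive,
    in the same (or exactly reversed) order and orientation, in g2."""
    where2: Dict[str, Tuple[int, int]] = {
        fam: (i, strand) for i, (fam, strand) in enumerate(g2)
    }

    def conserved_link(i: int) -> bool:
        """True if genes g1[i - 1] and g1[i] are also adjacent in g2."""
        (fam_a, s_a), (fam_b, s_b) = g1[i - 1], g1[i]
        if fam_a not in where2 or fam_b not in where2:
            return False
        (pos_a, t_a), (pos_b, t_b) = where2[fam_a], where2[fam_b]
        step = pos_b - pos_a  # +1: same order, -1: reversed order
        if step not in (1, -1):
            return False
        # On a reversed block each gene must also lie on the opposite strand.
        return t_a == s_a * step and t_b == s_b * step

    blocks: List[List[str]] = []
    current: List[str] = []
    for i in range(1, len(g1)):
        if conserved_link(i):
            if not current:
                current = [g1[i - 1][0]]
            current.append(g1[i][0])
        else:
            if current:
                blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Toy genomes: A-B-C is conserved in reversed order with flipped strands;
# D-E is reversed but keeps its strands, so it does not form a block.
sp1 = [("A", 1), ("B", 1), ("C", -1), ("D", 1), ("E", 1)]
sp2 = [("X", 1), ("C", 1), ("B", -1), ("A", -1), ("E", 1), ("D", 1)]
print(pairwise_blocks(sp1, sp2))  # [['A', 'B', 'C']]
```

With one-to-one orthology, two consecutive conserved links can never point in opposite directions (that would place two genes at the same position in species 2), so a block never needs to be split on a change of direction.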

Conserved Non-coding Elements (CNEs)

CNEs were computed from multiple alignments between 28 vertebrate genomes projected on the mouse genome, generated using multiz and other tools by the UCSC and Penn State Bioinformatics groups, and made available on the UCSC web site (http://hgdownload.cse.ucsc.edu/goldenPath/mm9/multiz30way/).
To identify CNEs, the multiple alignments are scanned with a window of fixed size W. A column is called "straight" when all bases are strictly identical in the species considered, and windows in which more than D percent of the columns are straight are selected. Each selected window is then used as an "anchor" and extended on either side until a position is reached where at least X consecutive columns are not straight. The extension then stops at the last straight column.

The current CNE set was generated with three levels of conservation:

Set 1: human + mouse + dog + cow
Set 2: human + mouse + dog + cow + chicken
Set 3: human + mouse + dog + cow + chicken + stickleback

Using the following parameters:

W = 20 bases
D = 80 %
X = 2 columns
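
The anchor-and-extend procedure described above, with the parameters W, D and X, can be sketched in Python as follows. This is an illustrative reconstruction, not the code used to build the CNE sets. Several details are assumptions not stated in the text: the alignment is given as a list of columns, each a string with one base per species of the set considered; gaps and ambiguous bases never count as straight; extension starts from the outermost straight columns of the anchor window; and overlapping extended anchors are merged into a single CNE. The exclusion of coding regions is not shown.

```python
from typing import List, Tuple


def is_straight(column: str) -> bool:
    """A column is 'straight' if all species carry the same base (A, C, G or T)."""
    first = column[0].upper()
    return first in "ACGT" and all(base.upper() == first for base in column)


def extend(straight: List[bool], pos: int, step: int, x: int) -> int:
    """Walk from pos in direction step (+1 or -1) until x consecutive
    non-straight columns (or the alignment edge) are reached; return the
    last straight column seen."""
    last, run, i = pos, 0, pos + step
    while 0 <= i < len(straight) and run < x:
        if straight[i]:
            last, run = i, 0
        else:
            run += 1
        i += step
    return last


def find_cnes(columns: List[str], w: int = 20, d: float = 80.0,
              x: int = 2) -> List[Tuple[int, int]]:
    """Return CNEs as half-open [start, end) column intervals."""
    straight = [is_straight(c) for c in columns]
    hits = []
    for start in range(len(straight) - w + 1):
        window = straight[start:start + w]
        # Keep only windows with MORE than d percent straight columns.
        if 100.0 * sum(window) / w <= d:
            continue
        first = start + window.index(True)
        last = start + w - 1 - window[::-1].index(True)
        hits.append((extend(straight, first, -1, x),
                     extend(straight, last, +1, x) + 1))
    # Merge overlapping extended anchors into single elements (assumption).
    merged: List[Tuple[int, int]] = []
    for s, e in sorted(hits):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


# Toy alignment of four species: 5 variable columns, 25 straight columns,
# one mismatch, 3 straight columns, then 5 variable columns.
cols = ["ACGT"] * 5 + ["AAAA"] * 25 + ["CCCA"] + ["GGGG"] * 3 + ["T-TT"] * 5
print(find_cnes(cols))  # [(5, 34)]
```

In the toy example the single mismatch at column 30 is shorter than X = 2 non-straight columns, so the extension runs past it and the element ends at the last straight column, 33.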

CNEs are excluded from regions overlapping protein coding sequences in all of the species considered. By convention, intronic CNEs are displayed on the right-hand side of the gene in which they are included (regardless of the transcription orientation of the gene). Intronic CNEs are shown as small vertical bars and intergenic CNEs as circles.


Frequently Asked Questions:

  1. The space between genes is too small to see the CNEs distinctly. Can I make that space wider?
  2. I initially queried for gene XYZ in dog. How can I change the display so that gene XYZ in mouse is now the reference?
  3. I am only interested in the 5' side of my gene, how can I shift the display to only show the region upstream?
  4. When I "mouse over" a CNE, orthologous CNEs do not light up in other species. Why?
  5. I am only interested in comparing a gene between primates and fish, how can I clear the display so that other species are not shown?
  6. My gene of interest has an ortholog in the cat genome but why are there no neighbouring genes in this species?
  7. What does the "A" in a red circle mean?
  8. I typed a well known gene symbol in the entry form but it says that the gene does not exist. Why?
  9. The order of genes from left to right is reversed compared to what I see in Ensembl or UCSC. Why?

The space between genes is too small to see the CNEs distinctly. Can I make that space wider?

You can "zoom in" around the gene in the centre of the display (marked by a vertical black line) by using the slider at the top to specify how many genes you want to see on either side. By default, up to 10 genes are shown on either side of the "query" gene. When zooming in, the genes are redistributed over the width of the display, making the intergenic regions "wider". To zoom in on an intergenic region that is not next to the "central" gene, first reset the display around a gene that borders your region of interest (click on "explore the tree of this gene" in the information box), then zoom in.

I initially queried for gene XYZ in dog. How can I change the display so that gene XYZ in mouse is now the reference?

Click on the name of the species you want to use as reference, on the right-hand side of the panel containing the genomic regions. All the genes in that new species (e.g. mouse) will then be coloured, and orthologs in other species will be coloured to match the mouse genes.

I am only interested in the 5' side of my gene, how can I shift the display to only show the region upstream?

Use the left and right arrows on either side of the zoom button to shift to the left or the right of the genomic region of interest, while keeping the same gene and same species as reference.

When I "mouse over" a CNE, orthologous CNEs do not light up in other species. Why?

There may be two reasons for this:

a. If a CNE of Set 1 (human + mouse + dog + cow) is shown in human, for instance, it necessarily exists in the other three species, since this condition is part of the definition of this CNE (see the section on CNEs). However, in the Synteny Browser we only show "syntenic" CNEs. To be syntenic, the orthologous non-coding regions in their respective genomes must share an orthologous gene within a range of five genes upstream or downstream. Therefore, if in mouse, for instance, the CNE does not satisfy this condition, it is not shown in mouse.
b. If a CNE of Set 1 (human + mouse + dog + cow) is shown in human, it necessarily exists in the other three species, since this condition is part of the definition of this CNE. Even though the CNE might be syntenic in mouse or dog (see point "a" above), it may not be located in a genomic region that is currently shown on the display: it may lie "outside" of the display, to the left of the leftmost gene or to the right of the rightmost gene.
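
One possible reading of the synteny condition in point "a" is sketched below in Python. It is an assumption, not the browser's implementation: each genome is reduced to an ordered list of gene-family names (orthologs share a name), a CNE copy is located by a hypothetical index `gap` so that it lies between `genes[gap - 1]` and `genes[gap]`, and a CNE copy in another species is kept when its neighbourhood of five genes on either side shares at least one gene family with that of the CNE in the reference species.

```python
from typing import List, Set


def neighbourhood(genes: List[str], gap: int, k: int = 5) -> Set[str]:
    """Gene families within k genes upstream or downstream of a CNE
    lying between genes[gap - 1] and genes[gap]."""
    return set(genes[max(0, gap - k):gap + k])


def is_syntenic(ref_genes: List[str], ref_gap: int,
                other_genes: List[str], other_gap: int, k: int = 5) -> bool:
    """True if the two CNE copies share at least one orthologous neighbour."""
    shared = neighbourhood(ref_genes, ref_gap, k) & neighbourhood(other_genes, other_gap, k)
    return bool(shared)


reference = list("ABCDEFGHIJKL")
print(is_syntenic(reference, 6, ["X", "Y", "F", "Z"], 2))  # True (shares F)
print(is_syntenic(reference, 6, ["X", "Y", "Z", "W"], 2))  # False
```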

I am only interested in comparing a gene between primates and fish, how can I clear the display so that other species are not shown?

You can hide some extant and ancestral tracks by clicking on the oldest ancestral node of these species (generally a blue circle). The node then becomes underlined to remind you that it is collapsed. Click the node again to unfold it.

My gene of interest has an ortholog in the cat genome but why are there no neighbouring genes in this species?

A number of mammalian species have had their genome sequenced only at low coverage (generally close to 2X) to provide a first glimpse of the sequence (see the current status of this project at the Broad Institute). Because of the low coverage, the genome assemblies remain fragmented and lack long-range continuity: the sequence is organised in so-called "scaffolds" rather than chromosomes. A given scaffold often contains a single gene, which is why that gene has no neighbours on the display.

What does the "A" in a red circle mean?

Strictly speaking, the gene order shown for ancestral species is the order that we predict in the last common ancestor of the extant species located below this node in the tree. We have very little idea of what these last common ancestors looked like and how to represent them graphically. Short of a better idea, we currently use a generic "A" for Ancestor to represent these species.

I typed a well known gene symbol in the entry form but it says that the gene does not exist. Why?

Currently, the only gene symbols accepted (other than the Ensembl nomenclature) are the approved human, mouse and zebrafish symbols, as assigned respectively by the HUGO Gene Nomenclature Committee, Mouse Genome Informatics and the ZFIN consortium.

The order of the genes from left to right is reversed compared to what I see in Ensembl or UCSC. Why?

To make comparisons easier, all the chromosomal regions on display are oriented using the "root" as reference. In practice, we first determine the orientation of the ancestral version of the query gene (in the root species), then orient all the descendant genomic regions with respect to this gene. Sometimes the ancestral and extant genomic regions are in the same orientation, so you don't notice any change compared to Ensembl or UCSC; sometimes the gene order in the extant species needs to be "flipped" so that it is aligned with the other genomes. In AlignView, this is indicated by a double arrow under the block of genes. Note, however, that this orientation is purely arbitrary: the relative order of the genes within each region is unchanged.
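
The reorientation rule can be sketched in Python under a simplifying assumption: a region is an ordered list of (gene, strand) pairs, and it is flipped when the copy of the query gene in that region lies on the opposite strand from the ancestral copy in the root species. The function and variable names are hypothetical. Reversing the list while inverting each strand leaves every gene's neighbours unchanged, which is why the relative gene order within a region is preserved.

```python
from typing import List, Tuple

Region = List[Tuple[str, int]]  # (gene, strand) pairs, strand +1 or -1


def orient_like_root(region: Region, query: str, root_strand: int) -> Tuple[Region, bool]:
    """Return the region oriented like the root copy of the query gene, plus a
    flag telling whether it was flipped (shown as a double arrow in AlignView)."""
    strand = dict(region)[query]
    if strand == root_strand:
        return list(region), False
    # Read the region from the other end: order reversed, strands inverted.
    return [(gene, -s) for gene, s in reversed(region)], True


region = [("a", 1), ("q", -1), ("b", 1)]
print(orient_like_root(region, "q", root_strand=1))
# ([('b', -1), ('q', 1), ('a', -1)], True)
```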

Contact us

For questions, comments or bug reports, please email us.

Credit and how to cite

The initial implementation of the Genomicus browser was the work of Charles-Edouard Poisnel, a computer science student at the ENSIIE in Evry, during his 2008 summer internship in the Dyogen Lab. Since then, the browser has been improved and expanded by Matthieu Muffato, a PhD student in the Dyogen Lab. As of March 2009, a publication describing the browser and its applications is in preparation.


Last update: 19 March 2009