International Gene Trap Consortium

Gene Trap Overview

This tutorial gives a brief overview of gene trap technology, reviews gene trap vector types and function, and discusses experimental opportunities available to gene trap cell line users. For information on how to locate a gene trap cell line on the IGTC web site, please see our search tutorial.


Introduction to Gene Trapping

Gene trapping is a method of randomly generating embryonic stem cells with well-characterized insertional mutations. The mutation is generated by inserting a gene trap vector construct into an intronic or coding region of genomic DNA. Gene trap vector constructs contain selectable reporter tags that are used to identify cell lines in which the vector has successfully interrupted a gene. These reporter tags can also be useful for further experimentation in cells and mice. Gene trap sequences are derived from cDNA or genomic DNA from the trapped locus using primer sequences from the vector ends, and the sequences are used to identify and annotate the trapped gene. Gene trap cell lines reliably contribute to the germ line, producing mutant mouse strains useful for the functional characterization of genes. Although the insertion of the vector construct in a genic region typically results in complete inactivation of the “trapped” gene (a null allele), this is not guaranteed. In some cases vector insertion can fail to inactivate a gene, lead to hypomorphic gene function, or result in a dominant negative phenotype. Generally, vector insertion close to the 5' end of a gene, but downstream of the 5' untranslated region, is more likely to create a null allele than insertion near the 3' end.

The International Gene Trap Consortium website represents all publicly available gene trap cell lines, which are distributed on a non-collaborative basis for nominal handling fees. By using gene trap cell lines found on the IGTC site, researchers can save the time and expense of targeting a gene for knockout. Researchers can find trapped genes of interest on the IGTC website (see tutorial on finding gene trap cell lines of interest), and order cell lines for the generation of mutant mice through blastocyst injection.


Vector Types and Function

Gene trap vectors are designed to insert into genomic sequence and interrupt transcription of the trapped gene. There are a variety of different gene trap vector types, and each will produce cell lines with different characteristics and research opportunities. Researchers are advised to learn the characteristics of the different vectors used to create cell lines available for a particular gene or locus of interest. This information is available on the IGTC site on the cell line annotation page and on IGTC member websites. For a more detailed review of gene trap technology see:

Figure 1 shows how genes are normally transcribed and spliced into mRNA products. Gene trapping takes advantage of the splicing apparatus by using a vector construct containing a splice acceptor signal, causing the vector sequence to be spliced into the mRNA. Gene trap vectors contain a polyadenylation signal at the 3' end that causes the mRNA to be truncated and non-functional.



The basic traits of a gene trap vector are shown in Figure 2 below. The splice acceptor interrupts normal splicing and causes the downstream vector sequence to be spliced into the transcript. The gene trap cassette contains a combination of selection and reporter constructs and is followed by a polyadenylation signal, which terminates transcription and truncates the mRNA. PolyA vectors work in a different manner, using a promoter and a splice donor to trap the 3' ends of genes, as shown in more detail later.



There are two main strategies employed by gene trap vectors: conventional vectors use the endogenous promoter, while polyA vectors contain a strong promoter in the vector sequence. Conventional gene trap vectors use a splice acceptor to take advantage of endogenous transcription and truncate the mRNA, leaving the portion of the gene 5' of the insertion site intact, followed by the vector sequence containing the selection/reporter construct. A polyA signal is placed at the 3' end of standard vectors, causing transcription to end there, so that a truncated fusion protein is produced. This is shown in Figure 3 below.



PolyA gene trap vectors employ a different strategy, shown in Figure 4. These vectors contain a promoter signal and a transcriptional start site, allowing the trapping of genes that are not normally expressed, or are expressed at very low levels, under experimental conditions. A splice donor sequence is present at the end of the gene trap cassette, causing the mRNA product of the vector construct to be fused with any downstream exons. Since these vectors do not have their own polyA sequences to signal the end of the transcript, only cell lines in which the vector inserts upstream of a terminal exon will produce the selectable reporter tag.

It is important to note that the sequence used to identify a gene inactivated by a polyA gene trap vector is taken from exons 3' of the vector insertion site. This differs from a typical gene trap sequence, which is taken from the exons 5' of the vector insertion site.
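
The difference in where a sequence tag originates can be summarized in a short sketch. The function name, the vector type labels, and the exon numbering below are hypothetical and chosen only for illustration; the sketch assumes the vector sits in the intron that follows a given exon.

```python
def tag_exons(vector_type, n_exons, insertion_after_exon):
    """Return the 1-based exon numbers a sequence tag could be drawn from.

    Hypothetical illustration: assumes the vector sits in the intron that
    follows exon `insertion_after_exon` of a gene with `n_exons` exons.
    """
    if not 0 < insertion_after_exon < n_exons:
        raise ValueError("insertion must lie in an intron between two exons")
    if vector_type == "conventional":
        # Splice acceptor vectors: tag comes from exons 5' of the insertion.
        return list(range(1, insertion_after_exon + 1))
    if vector_type == "polyA":
        # Splice donor (polyA) vectors: tag comes from exons 3' of the insertion.
        return list(range(insertion_after_exon + 1, n_exons + 1))
    raise ValueError(f"unknown vector type: {vector_type}")


print(tag_exons("conventional", 5, 2))  # [1, 2]
print(tag_exons("polyA", 5, 2))         # [3, 4, 5]
```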




Experimental Opportunities

Gene trap vectors produce different insertional mutations in genes, resulting in different allele types and different options for further manipulation of the trapped locus. Conventional gene trap vectors will produce a null fusion protein that is regulated in the same manner as the trapped gene. PolyA vectors introduce a promoter in the locus to drive transcription, resulting in constitutive expression of the mutant transcript. The insertion site of the gene trap vector also affects the protein, with some truncated proteins retaining partial functionality depending on the intragenic location of functional domains.

Newer gene trap vectors have been developed that incorporate site-specific recombination sites, allowing for further modification of the trapped locus. These sites may be used to create different alleles, including reverting the trapped gene to wild type, and creating conditional alleles. Figure 5 shows schematic examples of how these systems work. For more detailed information on the characteristics and options available for particular vectors, view the IGTC publication list and visit IGTC member websites. For more information about post-insertional modification and site-specific recombination, see:



Figure 5 shows the integration of a vector containing site-specific recombination sites, represented by the pink and blue arrows. Two systems are shown, the Cre-LoxP system and the Flp-FRT system, although other strategies are also in use. Addition of Cre causes the LoxP sites to recombine, which excises the vector cassette. This results in reversion of the gene trap cell line to wild type expression of the previously trapped gene.



Figure 6 shows a more complicated strategy involving directional site-specific recombination sites that will invert the gene trap cassette. The first step creates a revertant allele using Cre by inverting the vector cassette to a non-functioning state. The second step returns the locus to the null allele using Flp to reinvert the vector sequence. These recombination steps can be directed in a temporally and spatially specific manner.
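
The two-step inversion strategy can be thought of as a simple two-state switch. The sketch below is a toy model, not a description of any particular vector: the state names and the function are hypothetical, and it assumes that Cre acts only on the trapping orientation and Flp only on the inverted one.

```python
def apply_recombinase(orientation, recombinase):
    """Toy model of the two-step strategy described for Figure 6.

    orientation: "trapping" (cassette functional, null allele) or
                 "inverted" (cassette non-functional, revertant allele).
    Assumption: Cre inverts the cassette only from the trapping state,
    and Flp re-inverts it only from the inverted state.
    """
    if recombinase == "Cre" and orientation == "trapping":
        return "inverted"
    if recombinase == "Flp" and orientation == "inverted":
        return "trapping"
    return orientation  # any other combination leaves the locus unchanged


state = "trapping"
state = apply_recombinase(state, "Cre")
print(state)  # inverted (revertant allele)
state = apply_recombinase(state, "Flp")
print(state)  # trapping (null allele restored)
```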

Users are advised to learn the details about the vector used, and to also confirm that the insertion occurred at the listed locus. As gene trap sequences can be short and imperfect and handling errors do occur, the identification and annotation provided at the IGTC site should be validated experimentally. For protocols on the use and handling of gene trap cell lines, visit the website of the IGTC member that produced the cell line of interest.


Limitations and Pitfalls

Researchers wishing to order an embryonic stem cell line are advised to carefully investigate the genomic region surrounding the vector insertion to make sure that their gene of interest is likely to be fully and solely inactivated. Because a greater percentage of a gene's exons are prevented from being transcribed and translated, vector insertions near the 5' end of a gene are more likely to result in complete gene inactivation than insertion at the 3' end, as shown in Figure 7.
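
As a rough way to compare candidate cell lines, the fraction of exons lying downstream of the insertion can be computed. This is only a heuristic sketch with a hypothetical function name; it assumes an intronic insertion and ignores where functional domains lie, so it cannot replace inspection of the locus.

```python
def fraction_exons_downstream(n_exons, insertion_after_exon):
    """Fraction of a gene's exons that lie 3' of the vector insertion.

    Hypothetical heuristic: assumes the vector sits in the intron that
    follows exon `insertion_after_exon` (1-based) of `n_exons` exons.
    A higher value means more of the gene is excluded from the transcript.
    """
    if not 0 < insertion_after_exon < n_exons:
        raise ValueError("insertion must lie in an intron between two exons")
    return (n_exons - insertion_after_exon) / n_exons


# Insertion after exon 1 of 10: most of the gene is lost.
print(fraction_exons_downstream(10, 1))  # 0.9
# Insertion after exon 9 of 10: only the last exon is lost.
print(fraction_exons_downstream(10, 9))  # 0.1
```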



Another concern is inactivating multiple genes. This can happen if the vector insertion is in a region where the coding regions of multiple genes overlap, or is upstream of an RNA gene such as a microRNA that would normally be transcribed along with the gene that has been identified as inactivated. Genes coding on the opposite strand can also be inactivated by vector insertion, as shown in Figure 8, so be sure to check the strand orientation of your gene of interest.
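
A check of this kind can be sketched as a simple interval lookup. The gene symbols, coordinates, and function below are invented for illustration; in practice the annotations would come from a genome browser or annotation database.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    symbol: str
    start: int   # genomic start coordinate (inclusive)
    end: int     # genomic end coordinate (inclusive)
    strand: str  # "+" or "-"


def genes_at_position(genes, position):
    """Return every annotated gene, on either strand, spanning `position`."""
    return [g for g in genes if g.start <= position <= g.end]


# Hypothetical annotations, not real mouse genes.
annotations = [
    Gene("GeneA", 1000, 5000, "+"),
    Gene("GeneB", 4000, 8000, "-"),
    Gene("GeneC", 9000, 12000, "+"),
]

# An insertion at 4500 falls inside two overlapping genes on opposite strands.
for gene in genes_at_position(annotations, 4500):
    print(gene.symbol, gene.strand)
```

More than one hit suggests that the insertion may affect genes other than the one listed for the cell line.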



Alignment of the sequence tag for a cell line to the mouse genome at UCSC, Ensembl, or Entrez Gene will provide information about other genes in the region of the vector insertion. It is also wise to re-sequence any cell lines received to ensure that no handling errors have occurred.


Gene and Cell Line Identification Changes

In order to maintain up-to-date information, IGTC cell lines are periodically run through an automated identification process that matches cell line sequence tags with gene sequences in the GenBank database and locations on the mouse genome. Occasionally, the name or symbol of a gene will be changed, resulting in a change in the identification of cell lines in the IGTC database. The gene sequences themselves can also change, due to replacement with a better sequence, changes in vector-removal protocols, or other reasons. Sometimes sequences are removed from databases entirely. This may lead to cell lines being associated with new sequences, retracted sequences no longer available from GenBank or Ensembl, or no sequences at all. This can also result in the category of identification changing between "localized" (L), "transcript found" (T), "unlocalized" (U), and "conflict" (C). Most often when a cell line identification changes, it is only the description or symbol used to represent the gene that has changed, possibly altering the results of keyword searches. When a new version of the mouse genome is released, the coordinates of a gene locus can change. Localizations of cell lines on recent versions of the mouse genome can be accessed from the IGTC database. The genome coordinates may differ, but that usually indicates a numbering change, rather than a new identification.

It is important to note that the knockout carried by a cell line does not change even if the identification changes. Likewise, the sequence used to identify that cell line will not change.


Cell Line Ordering

The International Gene Trap Consortium website is a repository for information on the gene trap cell lines generated by its members. Each member has its own means of distributing its cell lines. A link is provided to the respective member's cell line provider on each cell line page in the Sequence Tag Information section in the Availability line, as shown in the example below:

