Gene Trap Overview
This tutorial gives a brief overview of gene trap technology, reviews gene trap vector
types and function, and discusses experimental opportunities available to gene trap cell
line users. For information on how to locate a gene trap cell line on the IGTC web site,
please see our search tutorial.
Introduction to Gene Trapping
Gene trapping is a method of randomly generating embryonic stem cells with well-characterized insertional
mutations. The mutation is generated by inserting a gene trap vector construct into an intronic or coding region
of genomic DNA. The gene trap vector constructs contain
selectable reporter tags used to identify cell lines where the vector has successfully interrupted a gene. These reporter tags
can also be useful for further experimentation in cells and mice. Gene trap sequences are derived from cDNA
or genomic DNA from the trapped locus using primer sequences from vector ends, and the sequences are used
to identify and annotate the trapped gene. Gene trap cell lines reliably contribute to the germ line,
producing very useful mutant mouse strains for the functional characterization of genes. Although the insertion of
the vector construct in a genic region typically results in complete inactivation of the “trapped” gene (a null allele),
this is not guaranteed.
In some cases vector insertion can fail to inactivate a gene, lead to hypomorphic gene function, or
result in a dominant negative phenotype. Generally, vector insertion close to the 5' end of a gene, but downstream of
the 5' untranslated region, is more likely to create a null allele than insertion near the 3' end.
The International Gene Trap Consortium website represents all publicly available gene trap cell lines,
which are distributed on a non-collaborative basis for nominal handling fees. By using gene trap cell
lines found on the IGTC site, researchers can save the time and expense of targeting a gene for knockout.
Researchers can find trapped genes of interest on the IGTC website (see the tutorial on finding gene trap cell
lines of interest) and order cell lines for the generation of mutant mice through blastocyst injection.
Vector Types and Function
Gene trap vectors are designed to insert into genomic sequence and interrupt transcription of the trapped gene.
There are a variety of different gene trap vector types, and each will produce cell lines with different
characteristics and research opportunities. Researchers are advised to learn the characteristics of the
different vectors used to create cell lines available for a particular gene or locus of interest. This
information is available on the IGTC site on the cell line annotation page and on IGTC member websites. For a
more detailed review of gene trap technology see:
Figure 1 shows how genes are normally transcribed and spliced into mRNA products. Gene trapping takes
advantage of the splicing apparatus by using a vector construct containing a splice acceptor signal,
causing the vector sequence to be spliced into the mRNA. Gene trap vectors contain a polyadenylation signal
at the 3' end that causes the mRNA to be truncated and non-functional.

The basic traits of a gene trap vector are shown in Figure 2 below. The splice acceptor interrupts normal splicing
and causes the downstream vector sequence to be spliced into the transcript. The gene trap cassette contains a combination of
selection and reporter constructs and is followed by a polyadenylation signal, which terminates transcription.
PolyA vectors work in a different manner, using a promoter and a splice donor to trap the 3' ends of genes, shown
in more detail later.

There are two main strategies employed by gene trap vectors: conventional vectors use the endogenous promoter,
while polyA vectors contain a strong promoter in the vector sequence. Conventional gene trap vectors use a
splice acceptor to take advantage of endogenous transcription and truncate the mRNA, leaving the portion of the gene 5' of the
insertion site intact, followed by the vector sequence containing the selection/reporter construct. A polyA
signal is placed at the 3' end of conventional vectors, causing transcription to terminate there, so that a truncated fusion protein is produced.
This is shown in Figure 3 below.
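The conventional strategy described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: exons are plain labels, the insertion is assumed to fall in an intron, and the function name `conventional_trap_transcript` and the labels `reporter` and `polyA` are hypothetical, not part of any IGTC tool.

```python
from typing import List


def conventional_trap_transcript(exons: List[str], exons_upstream: int) -> List[str]:
    """Model the spliced transcript from a conventional gene trap allele.

    exons: exon labels in 5'-to-3' order.
    exons_upstream: number of exons lying 5' of the intronic insertion site.
    """
    if not 1 <= exons_upstream < len(exons):
        raise ValueError("insertion must fall in an intron between two exons")
    # Exons 5' of the insertion are transcribed from the endogenous promoter.
    upstream = exons[:exons_upstream]
    # The splice acceptor joins the cassette to the last upstream exon, and the
    # vector's polyA signal ends the transcript before any downstream exon.
    return upstream + ["reporter", "polyA"]


gene = ["exon1", "exon2", "exon3", "exon4"]
print(conventional_trap_transcript(gene, 1))  # ['exon1', 'reporter', 'polyA']
```

In this model, an insertion in the first intron leaves only `exon1` fused to the reporter, while an insertion in the last intron retains three of the four exons, which is why a 3' insertion is less likely to give a null allele.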

PolyA gene trap vectors employ a different strategy, shown in Figure 4. These vectors contain a promoter
and a transcriptional start site, allowing genes to be trapped that are not normally expressed, or are expressed
at very low levels under experimental conditions. A splice donor sequence is present at the end of the gene trap cassette,
causing the mRNA product of the vector construct to be fused with any downstream exons. Since these
vectors do not carry their own polyA signal, the reporter transcript depends on the polyA signal of the trapped gene, so only cell lines in which the vector
inserts upstream of a terminal exon will produce the selectable reporter tag.
It is important to note that the sequence used to identify a gene inactivated by a polyA gene trap vector is taken from exons 3'
of the vector insertion site. This differs from typical gene trap sequence which is taken from the exons 5' of the vector
insertion site.
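The difference between the two vector types, including which exons supply the identifying sequence tag, can be sketched as follows. This is a simplified model, not an IGTC procedure: exons are plain labels, and the function names `polya_trap_transcript` and `tag_exons` are hypothetical.

```python
from typing import List, Optional


def polya_trap_transcript(exons: List[str], exons_upstream: int) -> Optional[List[str]]:
    """Model the vector-driven transcript from a polyA trap allele.

    exons_upstream is the number of exons 5' of the insertion site.
    Returns None when no downstream exon exists to supply a polyA signal.
    """
    if not 0 <= exons_upstream <= len(exons):
        raise ValueError("exons_upstream is out of range for this gene")
    downstream = exons[exons_upstream:]
    if not downstream:
        # Inserted after the terminal exon: no endogenous polyA signal is
        # captured, so the reporter tag is not produced.
        return None
    # The vector promoter drives the reporter, and the splice donor fuses it
    # to the downstream exons, ending with the terminal exon and its polyA.
    return ["reporter"] + downstream


def tag_exons(exons: List[str], exons_upstream: int, vector_type: str) -> List[str]:
    """Return the exons from which the identifying sequence tag is drawn."""
    if vector_type == "conventional":
        return exons[:exons_upstream]  # exons 5' of the insertion
    if vector_type == "polyA":
        return exons[exons_upstream:]  # exons 3' of the insertion
    raise ValueError("vector_type must be 'conventional' or 'polyA'")


gene = ["exon1", "exon2", "exon3", "exon4"]
print(polya_trap_transcript(gene, 2))   # ['reporter', 'exon3', 'exon4']
print(tag_exons(gene, 2, "polyA"))      # ['exon3', 'exon4']
```

For the same insertion site, a conventional vector would be identified from `exon1` and `exon2`, while a polyA vector is identified from `exon3` and `exon4`.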

Experimental Opportunities
Gene trap vectors produce different insertional mutations in genes, resulting in different allele types and different
options for further manipulation of the trapped locus. Conventional gene trap vectors will produce a null fusion
protein that is regulated in the same manner as the trapped gene. PolyA vectors introduce a promoter in the locus to
drive transcription, so the mutant transcript is expressed constitutively. In addition, the insertion
site of the gene trap vector will affect the protein as well, with some truncated proteins retaining partial
functionality depending on the intragenic location of functional domains.
Newer gene trap vectors have been developed that incorporate site-specific recombination sites, allowing for further
modification of the trapped locus. These sites may be used to create different alleles, including reverting the
trapped gene to wild type, and creating conditional alleles. Figure 5 shows schematic examples of how these systems
work. For more detailed information on the characteristics and options available for particular vectors, view the
IGTC publication list and visit IGTC member websites. For more information about post-insertional modification and
site-specific recombination, see:

Figure 5 shows the integration of a vector containing site-specific recombination sites, represented by the pink
and blue arrows. Two systems are shown: the Cre-LoxP system and the Flp-FRT system. Other
strategies are also in use. Addition of Cre causes the LoxP sites to recombine, which excises the vector
cassette. This results in reversion of the gene trap cell line to wild type expression of the previously trapped gene.

Figure 6 shows a more complicated strategy involving directional site-specific recombination sites that will invert
the gene trap cassette. The first step creates a revertant allele using Cre by inverting the vector cassette to a
non-functioning state. The second step returns the locus to the null allele using Flp to reinvert the vector
sequence. These recombination steps can be directed in a temporally and spatially specific manner.
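The excision and inversion strategies described above can be summarized as a small state model. This is a deliberately coarse sketch: it tracks only whether the cassette is present and which way it faces, and ignores the recombination sites themselves. The class and function names (`TrapAllele`, `excise`, `invert`) are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrapAllele:
    cassette_present: bool = True
    cassette_forward: bool = True  # splice acceptor oriented with the gene

    @property
    def gene_trapped(self) -> bool:
        # The trap is only mutagenic when the cassette is present and its
        # splice acceptor faces the same direction as the trapped gene.
        return self.cassette_present and self.cassette_forward


def excise(allele: TrapAllele) -> TrapAllele:
    """Excision (Figure 5 strategy): the cassette is removed."""
    return replace(allele, cassette_present=False)


def invert(allele: TrapAllele) -> TrapAllele:
    """Inversion (Figure 6 strategy): the cassette orientation is flipped."""
    return replace(allele, cassette_forward=not allele.cassette_forward)


trapped = TrapAllele()
print(excise(trapped).gene_trapped)          # False: reverted by excision
revertant = invert(trapped)                  # first step, e.g. with Cre
print(revertant.gene_trapped)                # False: cassette non-functional
print(invert(revertant).gene_trapped)        # True: second step, e.g. with Flp
```

The model makes the practical difference visible: excision is a one-way reversion, while inversion can be applied a second time to restore the null allele.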
Users are advised to learn the details about the vector used, and to also confirm that the insertion occurred at the
listed locus. As gene trap sequences can be short and imperfect and handling errors do occur, the identification
and annotation provided at the IGTC site should be validated experimentally. For protocols on the use and handling
of gene trap cell lines, visit the website of the IGTC member that produced the cell line of interest.
Limitations and Pitfalls
Researchers wishing to order an embryonic stem cell line are advised to carefully investigate the genomic region
surrounding the vector insertion to make sure that their gene of interest is likely to be fully and solely inactivated. Because
a greater percentage of a gene's exons are prevented from being transcribed and translated,
vector insertions near the 5' end of a gene are more likely to result in complete gene inactivation than insertion
at the 3' end, as shown in Figure 7.
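The reasoning above can be made concrete by counting how many exons lie downstream of an insertion. The sketch below assumes exon coordinates are given as genomic `(start, end)` pairs and that the insertion is intronic. The function name `fraction_exons_downstream` and the coordinates are hypothetical, and the result is only a rough indicator, not a prediction of phenotype.

```python
from typing import List, Tuple


def fraction_exons_downstream(
    exons: List[Tuple[int, int]], insertion: int, strand: str
) -> float:
    """Fraction of a gene's exons that lie 3' of an intronic insertion site.

    exons: genomic (start, end) coordinates with start < end.
    strand: '+' or '-', the coding strand of the gene.
    """
    if strand == "+":
        # On the plus strand, downstream means higher coordinates.
        lost = [e for e in exons if e[0] > insertion]
    elif strand == "-":
        # On the minus strand, downstream means lower coordinates.
        lost = [e for e in exons if e[1] < insertion]
    else:
        raise ValueError("strand must be '+' or '-'")
    return len(lost) / len(exons)


exons = [(100, 200), (300, 400), (500, 600), (700, 800)]
print(fraction_exons_downstream(exons, 250, "+"))  # 0.75 (5' insertion)
print(fraction_exons_downstream(exons, 650, "+"))  # 0.25 (3' insertion)
```

Note that the same coordinate gives a different answer for a minus-strand gene, which is one reason to confirm strand orientation before ordering a line.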

Another concern is inactivating multiple genes. This can happen if the vector insertion is in a region where the coding
regions of multiple genes overlap, or is upstream of an RNA gene such as a microRNA that would normally be transcribed
along with the gene that has been identified as inactivated. Genes coding on the opposite strand can also be inactivated
by vector insertion, as shown in Figure 8, so be sure to check the strand orientation of your gene of interest.

Alignment of the sequence tag for a cell line to the mouse genome at UCSC, Ensembl, or Entrez Gene will provide information
about other genes in the region of the vector insertion. It is also wise to re-sequence any cell lines received to ensure
that no handling errors have occurred.
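Once annotations for the region have been retrieved from one of these browsers, checking for additional genes at the insertion site is a simple interval test. The sketch below assumes the annotations have already been downloaded into a list. The `GeneAnnotation` type, the function `genes_at_insertion`, and the gene symbols and coordinates are all hypothetical.

```python
from typing import List, NamedTuple


class GeneAnnotation(NamedTuple):
    symbol: str
    start: int
    end: int
    strand: str  # '+' or '-'


def genes_at_insertion(genes: List[GeneAnnotation], position: int) -> List[GeneAnnotation]:
    """Return every annotated gene whose span contains the insertion site.

    Genes on either strand are reported, since an insertion can disrupt a
    gene on the opposite strand as well.
    """
    return [g for g in genes if g.start <= position <= g.end]


region = [
    GeneAnnotation("GeneA", 1000, 9000, "+"),
    GeneAnnotation("GeneB", 4000, 6000, "-"),   # nested, opposite strand
    GeneAnnotation("GeneC", 12000, 15000, "+"),
]
hits = genes_at_insertion(region, 5000)
print([(g.symbol, g.strand) for g in hits])  # [('GeneA', '+'), ('GeneB', '-')]
```

More than one hit, or a hit on the unexpected strand, suggests the cell line may not inactivate the gene of interest alone.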
Gene and Cell Line Identification Changes
In order to maintain up-to-date information, IGTC cell lines are periodically run through an automated identification process
that matches cell line sequence tags with gene sequences in the GenBank database and locations on the mouse genome.
Occasionally, the name or symbol of a gene will be changed, resulting in a change in the identification of cell lines in
the IGTC database. The gene sequences themselves can also change, due to replacement with a better sequence, changes in
vector-removal protocols, or other reasons. Sometimes sequences are removed from databases entirely. This may lead to
cell lines being associated with new sequences, retracted sequences no longer available from GenBank or Ensembl, or no
sequences at all. This can also result in the category of identification changing between
"localized" (L), "transcript found" (T), "unlocalized" (U), and "conflict" (C). Most often when a cell line
identification changes, it is only the description or symbol used to represent
the gene that has changed, possibly altering the results of keyword searches. When a new version of the mouse genome is
released, the coordinates of a gene locus can change. Localizations of cell lines on recent versions of the mouse
genome can be accessed from the IGTC database. The genome coordinates may differ, but that usually indicates a numbering
change, rather than a new identification.
It is important to note that the knockout carried by a cell line does not change even if the identification changes.
Likewise, the sequence used to identify that cell line will not change.
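Users who keep their own records of cell line identifications may want to detect such changes between two points in time. A minimal sketch follows, assuming each record maps a cell line ID to a gene symbol (or `None`) and one of the category codes L, T, U, or C. The function name `identification_changes`, the cell line IDs, and the gene symbols are hypothetical, and this is not the IGTC's own identification process.

```python
from typing import Dict, Optional, Tuple

Identification = Tuple[Optional[str], str]  # (gene symbol or None, category code)


def identification_changes(
    old: Dict[str, Identification], new: Dict[str, Identification]
) -> Dict[str, Tuple[Identification, Identification]]:
    """Return cell lines whose symbol or category differs between two records."""
    changes = {}
    for cell_line, before in old.items():
        after = new.get(cell_line)
        # Only report lines present in both records with a different value.
        if after is not None and after != before:
            changes[cell_line] = (before, after)
    return changes


previous = {"CL001": ("GeneA", "L"), "CL002": ("GeneB", "T"), "CL003": (None, "U")}
current = {"CL001": ("GeneA2", "L"), "CL002": ("GeneB", "T"), "CL003": ("GeneC", "L")}
print(identification_changes(previous, current))
```

Here `CL001` shows a symbol change only, while `CL003` moves from unlocalized to localized. In both cases the underlying insertion and sequence tag are unchanged.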
Cell Line Ordering
The International Gene Trap Consortium website is a repository for information on the gene trap cell lines generated by
its members. Each member has its own means of distributing its cell lines. A link is provided to the respective
member's cell line provider on each cell line page in the Sequence Tag Information section in the Availability
line, as shown in the example below:
