The project started around September 2007. Major support is provided by the Wellcome Trust Sanger Institute in Hinxton, England; the Beijing Genomics Institute, Shenzhen (BGI Shenzhen), China; and the NHGRI, part of the National Institutes of Health (NIH). It was estimated that the project would cost more than $500 million if standard DNA sequencing technologies were used. Several newer technologies (e.g. Solexa, 454, SOLiD) are therefore applied, lowering the expected cost of the 1000 Genomes Project to about $120 million over five years, ending in 2012, provided by foundations and national governments. [1] [2]



The aim of the project as described by the Consortium is explicit: "to find most genetic variants that have a frequency of at least 1% in the population studied". However, this says very little about the reasons motivating such an enterprise. For each species, e.g. Homo sapiens, only a single reference genome is available. On its own, a reference genome is of limited use unless one can establish the link between genomic variation and phenotypic traits (disease, drug resistance, etc.). The first attempts to reach this goal are known as genome-wide association studies (GWAS), which use thousands of human samples. Although a few links between specific genetic variants and diseases were identified, many diseases well known to have a genetic component could not be linked to specific variants, a problem known as "missing heritability".

This observation made it apparent that thousands of genomes needed to be fully sequenced. The project itself involves sequencing 2,500 individuals, with samples drawn from a broad panel of populations covering all 5 continents, in order to capture as much variation as possible.


Technical point about how and why and the drawbacks...




The 1000 Genomes Project is still under way. It can be divided into 3 phases.

The first phase was a “pilot” involving low-coverage sequencing of a subset of 179 individuals. This pilot is now complete, and its results and limitations have been published. It produced a catalog of 8 million SNPs and 1 million structural variants (insertions, deletions, etc.).

The second phase consists only of exon sequencing of 679 individuals, on the assumption that most meaningful variants lie in exons.

The third phase will deliver the complete set of 2,500 sequenced genomes.

Scientific Impact

Impact on genetics


There are three different projects:

The trio design, with its high sequence coverage, enables accurate discovery of multiple variant types across the genome, with Mendelian transmission aiding:

  • genotype estimation

  • haplotype inference

  • quality control

The low-coverage project identifies shared variants on common haplotypes, but some of the resulting genotypes will be inaccurate.

The exon project makes it possible to identify common, low-frequency and rare variation in the targeted portion of the genome.
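To illustrate how Mendelian transmission can serve as a quality control in a trio design, here is a minimal sketch. It assumes unphased, diploid genotypes written as pairs of alleles; the function names and the example genotypes are hypothetical and are not taken from the project's own pipeline.

```python
def is_mendelian_consistent(child, mother, father):
    """Return True if the child's genotype can be explained by
    one allele inherited from each parent.

    Each genotype is a pair of alleles, e.g. ("A", "G").
    """
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


def mendelian_error_rate(sites):
    """Fraction of sites whose trio genotypes violate Mendelian transmission.

    `sites` is a list of (child, mother, father) genotype triples.
    """
    if not sites:
        return 0.0
    errors = sum(
        1 for child, mother, father in sites
        if not is_mendelian_consistent(child, mother, father)
    )
    return errors / len(sites)


# Invented example data: the last site cannot be inherited from these parents.
example_sites = [
    (("A", "G"), ("A", "A"), ("G", "G")),
    (("C", "C"), ("C", "T"), ("C", "C")),
    (("T", "T"), ("T", "G"), ("T", "T")),
    (("A", "A"), ("G", "G"), ("A", "G")),
]
rate = mendelian_error_rate(example_sites)
```

A high error rate at a site or in a sample suggests genotyping problems rather than true biology, since genuine de novo mutations are rare.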

Impact on biotechnology

The project has, from its inception, pushed levels of sequencing to the limits of technology typically associated with the budgets of public-sector biology. As a side-effect of the project, several innovations have occurred.

  • Using de Bruijn graphs to store massive amounts of sequencing data. A de novo genome assembler named Cortex (Caccamo, Iqbal et al. 2010) was built on this approach and used the low-coverage sequencing data to find 3.7 Mb of novel sequences longer than 100 bp.
  • Sharing raw data among the sequencing centers and collecting it at the EBI has necessitated the use of high-performance file transfer protocols including Aspera.
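The idea behind storing reads in a de Bruijn graph can be sketched as follows. This is a toy illustration only, not how Cortex itself is implemented: the function name, the choice of k and the example reads are all assumptions made for the example, and edge multiplicities are deliberately discarded.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers; each k-mer in a read adds an edge from its
    prefix (first k-1 bases) to its suffix (last k-1 bases).
    Successors are kept in a set, so repeated k-mers are stored once.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two invented, overlapping reads: six k-mers in total, but only four
# distinct edges, because the shared k-mers are stored a single time.
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)
edge_count = sum(len(successors) for successors in graph.values())
```

Because overlapping reads collapse onto the same nodes and edges, the size of the graph grows with the amount of distinct sequence rather than with the number of reads, which is what makes the structure attractive for very large data sets.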

Access to the Data

  • Last release: February 2011

  • There are 3 different projects:
  1. Low-coverage whole-genome sequencing of 629 individuals (last release);
  2. Exon-targeted sequencing of 697 individuals from 7 populations (first release);
  3. Deep sequencing of 2 mother-father-child trios (first release).
  • For each project, 3 datasets are available:

  1. Variant calls, stored in VCF 4.0 format (text). The format is described in the VCF specification. Several tools handle VCF data, with VCFtools and GATK from the Broad Institute being the most widely used.
  2. Alignment files in .bam format, available through the alignment index at EBI or NCBI.
  3. Raw reads, available in SRA/FASTQ format from NCBI or EBI.
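As a rough illustration of what the variant call files look like, the sketch below reads the eight fixed tab-separated VCF columns and keeps variants with an allele frequency of at least 1%. It assumes the allele frequency is recorded under the INFO key AF (a reserved key in the VCF specification, but not guaranteed to be present in every file); the helper names and the example records are invented. For real work, the tools listed above are a better choice.

```python
def parse_vcf_line(line):
    """Parse one VCF data line into a dict of its 8 fixed columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # Flag-style INFO entries have no "=value" part.
            info_dict[key] = value if sep else True
    return {
        "chrom": chrom, "pos": int(pos), "id": vid, "ref": ref,
        "alt": alt.split(","), "qual": qual, "filter": filt,
        "info": info_dict,
    }


def iter_variants(lines):
    """Yield parsed records, skipping header lines (starting with '#')."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


def max_allele_frequency(record):
    """Largest AF value over all ALT alleles, or None if AF is absent."""
    af = record["info"].get("AF")
    if af is None or af is True:
        return None
    return max(float(x) for x in af.split(","))


# Invented example records, not real project data.
example = [
    "##fileformat=VCFv4.0",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t50\tPASS\tAF=0.25;DP=100",
    "1\t2000\t.\tC\tT\t40\tPASS\tAF=0.004;DP=80",
]
common = [
    v for v in iter_variants(example)
    if (max_allele_frequency(v) or 0.0) >= 0.01
]
```

With a real file, the same functions can be applied to the lines of an opened file object, e.g. `iter_variants(open(path))`, where `path` is whatever local file was downloaded.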

Competing/Associated Projects

  • "BGI and Danish Organizations Initiate DKK 170M Cancer Vaccine and Danish Genome Research": a competing project whose aim is to establish a unique catalogue of the millions of variations in Danes' DNA. [3]
  • Genome 10K: a competing project that aims to assemble a genomic zoo, a collection of DNA sequences representing the genomes of 10,000 vertebrate species, approximately one for every vertebrate genus. [4]
  • 1000 Plant Genomes Project: aims to obtain the transcriptome (expressed genes) of 1000 different plant species over the next few years. Its goals differ significantly from those of the 1000 Genomes Project: the latter focuses on genetic variation within a single species, whereas the 1000 Plant Genomes Project looks at the evolutionary relationships and genes of 1000 different plant species. [5]
  • The 1001 Genomes Project: sequencing the whole genomes of 1001 Arabidopsis strains. [6]
  • International HapMap Project: its goal is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. [7]
  • 10,000 Microbe Genome Plan: a genome project aiming to build a whole-genome sequence map for 10,000 microbes. [8]

Teams Involved

The sequencing work is being carried out by an international collaboration, with work performed at the Wellcome Trust Sanger Institute, the Beijing Genomics Institute in China, and the National Human Genome Research Institute (NHGRI) Large-Scale Sequencing Network, which includes the Broad Institute of MIT and Harvard; the Washington University Genome Sequencing Center at the Washington University School of Medicine in St. Louis; and the Human Genome Sequencing Center at the Baylor College of Medicine in Houston. The complete list of participants is available here
