Virginia Walbot
Published in Nature, 14 December 2000
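The complete genome sequence of a plant has now been determined, providing invaluable information for understanding the evolution of flowering plants and the genetics of crops.

Flowering plants are the great success story of recent evolutionary history. They arose about 200 million years ago and now number roughly 250,000 species. They dominate terrestrial, aquatic and estuarine ecosystems and are found in almost every corner of the Earth.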
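Humans depend on flowering plants for calories, essential amino acids, vitamins and thousands of chemicals and medicines. The diversity of form and chemistry among these plants is astonishing. Yet they are known to have diverged relatively recently, so most genes in one species can be expected to have counterparts in every other species. What makes a rose differ from a wild lily, or a palm from a plum tree, can be explained by changes in gene regulation or in protein activity. Arabidopsis thaliana is a small plant with five chromosomes. The sequences of the last three were published in Nature in December 2000; the other two were published a year earlier. All plant biologists stand to benefit directly.

The goal now achieved was set in 1996. The Arabidopsis sequencing consortium determined 118.7 million base pairs of the plant's nuclear genome. Thanks to technical advances, this is the most accurate eukaryotic genome sequence obtained so far. The other eukaryotes already sequenced are the yeast Saccharomyces cerevisiae, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster. The Arabidopsis sequence also has the fewest gaps in the gene-rich regions of the chromosomes. The basic sequences of the five centromeres have been obtained as well; centromeres are gene-poor structural DNA required for chromosome pairing and movement during cell division.

Some might think that crops would be more useful for research than a small weed like Arabidopsis. But crop plants are large, and their genomes are often large and difficult to work with. Arabidopsis was rapidly adopted as the model experimental flowering plant partly on the assumption that it carried only one copy of each gene and less than 10% repetitive DNA. Surprisingly, the complete sequence shows that it has many redundant genes: about 26,000 genes were identified, but at least 70% of the genome is duplicated. There are no more than 15,000 different genes in total, and that number will shrink as researchers learn more about the duplicated genes.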
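In other flowering plants, two factors lead to gene duplication: polyploidization (duplication of the entire chromosome set) and local gene duplication (duplication of individual genes within a chromosome). Both processes have occurred in the evolutionary history of Arabidopsis. Two polyploidization events, dated to about 180 million and 112 million years ago, explain why whole sets of genes are found duplicated on one or more chromosomes. Local duplications account for about 17% of the duplicated genes. Gene loss and chromosomal rearrangement have left present-day Arabidopsis with a small genome and just five chromosomes. By contrast, its crop relatives in the genus Brassica, such as cabbage and cauliflower, shared a common ancestor with Arabidopsis some 12 to 19 million years ago but have larger genomes as a result of further polyploidization.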
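Arabidopsis genes are compact. A typical gene contains several coding regions (exons) of about 250 base pairs each, separated by short non-coding regions (introns). Genes lie close together, about 4.6 kilobases apart, which implies that their regulatory regions are also short. Many animal genes, by contrast, contain dozens of exons and have regulatory regions of 10 kilobases or more. Small genes help the genome tolerate extensive rearrangement: the smaller a gene, the less likely it is to be disrupted. Plants with larger genomes also have compact genes, but the distances between genes are one to two orders of magnitude greater than in Arabidopsis.

Although gene duplication is extensive, geneticists have identified thousands of mutant genes (in maize, tomato, Arabidopsis and wheat) that cause visible defects in the plant, even though each defect lies in only one member of a duplicated set. If duplicated genes had identical functions, the mutated copy would be expected to be compensated for by the other. Many duplicated genes in these species must therefore have distinct roles. Mutations in regulatory regions can cause duplicated genes to be expressed differently during development or in response to environmental change. Mutations in coding regions can produce slightly altered proteins. Starting from the raw material supplied by gene duplication, the evolution of flowering plants has relied on these two kinds of mutation to generate new, species-specific structures and chemistry.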
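Arabidopsis exons contain 44% guanine and cytosine bases, more than the introns (32%). This is a distinctive feature of plant genes. When a gene is transcribed, a precursor messenger RNA is made first; the introns are then removed to yield functional messenger RNA. The difference in base composition is attributed to the effect of daily and seasonal temperature fluctuations on the accuracy of this process.

The number of different genes in Arabidopsis (fewer than 15,000) is only slightly greater than the 13,601 predicted for Drosophila melanogaster and smaller than the 18,424 of the nematode. Most Arabidopsis genes have counterparts in these animals, reflecting the common ancestry of plants and animals. In all three genomes, the number of genes is comparable to the number of words in a general-purpose dictionary; strung together in different ways, those words can produce a vast number of books. The diversity of life therefore depends in part on how genes are wired into each lineage's own developmental and evolutionary pathways. Over the 1.6 billion years since plants and animals diverged, even conserved biochemical components such as transcription factors and protein kinases have diverged substantially.

Several classes of genes are particularly abundant in Arabidopsis. Many genes encode water channels, and the genome encodes ten times as many peptide-hormone transporters as animal genomes do. There are predicted to be more than a hundred receptor-like protein kinases, yet many components of animal signal transduction are absent. More than 420 Arabidopsis genes are involved in the synthesis and modification of the cell wall, a structure that animals lack. About 25% of nuclear genes carry signal sequences that direct the encoded proteins to organelles such as chloroplasts or mitochondria, whereas in animals fewer than 5% of nuclear-encoded proteins carry signal sequences targeting them to mitochondria. This is not surprising, because far more metabolic activity takes place among plant organelles than among those of animals and fungi.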
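The exon and intron base compositions quoted above (44% versus 32% G+C) are plain base-count fractions. A minimal Python sketch of that calculation follows. The function name `gc_fraction` and the two toy sequences are illustrative assumptions; they are not real Arabidopsis data and not a method taken from the sequencing project.

```python
def gc_fraction(seq: str) -> float:
    """Return the share of G and C among the A/C/G/T/U bases in seq."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or gap characters are ignored.
    counted = sum(seq.count(base) for base in "ACGTU")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, T or U bases")
    return (seq.count("G") + seq.count("C")) / counted


# Hypothetical toy sequences, invented for illustration only.
toy_exon = "ATGGCTGCAAGCTTGGATCC"
toy_intron = "GTAAGTTTTATATTTCATAG"

if __name__ == "__main__":
    print(f"toy exon   G+C: {gc_fraction(toy_exon):.0%}")    # 55%
    print(f"toy intron G+C: {gc_fraction(toy_intron):.0%}")  # 20%
```

Applied to every annotated exon and intron in a genome, the same fraction gives genome-wide averages of the kind cited in the text.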
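The metabolic exchanges that plants carry out between the organic and inorganic worlds make animal and fungal life possible. Plants use solar energy to convert carbon dioxide into sugars, carbohydrates and fats. They reduce nitrate and sulfate ions and synthesize amino acids. They produce every vitamin and enzyme cofactor, and they are the main route by which phosphorus, iron, zinc, magnesium, potassium and other mineral nutrients are concentrated and made available in animal diets. These metabolic capabilities are shared by plants and photosynthetic bacteria, and Arabidopsis and the cyanobacterium Synechocystis do indeed have many genes in common. It is no surprise, then, that metabolic and biosynthetic genes are more numerous in Arabidopsis than in the other eukaryotic genomes available, making up at least 10% of the genome.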
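Flowering plants are estimated to synthesize at least 100,000 secondary compounds. These are not found in animal cells and are not essential for the life of the plant, but many are specific to a species or genus. This chemical diversity provides us with dyes, flavorings and spices, and therapeutic drugs. Although no single flowering plant makes the whole bewildering array of products, the Arabidopsis genes include the information for synthesizing the prototype precursors of secondary products. Chemical differences between plant species mainly reflect modifications of these core molecules.

Building on what is now known of the Arabidopsis genome, the next task is to study the roles of its individual proteins. Research on other flowering plants will identify the genes responsible for structural and biochemical diversity between species, and the evolutionary routes by which that diversity arose. The US National Science Foundation invested US$150 million in its plant genome awards program in the first three years, and has announced a "Functional Genome Project" (program in functional genomes) to be completed by 2010. Comparable large investments are being made in Europe and Asia. All of this will surely speed our understanding of this highly successful evolutionary branch, the plant kingdom.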
