From genome to interactome

Text: Harald Rösch

It all started with the Human Genome Project and the decoding of the human genome in 2001. At that time, the diagnosis, prevention and treatment of most, if not all, diseases seemed within reach. But disillusionment followed. Although decoding the genome was a scientific milestone, researchers are still a long way from fully understanding the processes of life. As a result, medicine has so far benefited less from the Human Genome Project than hoped. Doctors today know of many genes that influence the risk of cancer, diabetes or arteriosclerosis, for example, but each one raises the risk only slightly. Such genes make reliable disease prognoses difficult. So far, the decoded human genome has produced hardly any new treatments.

Clearly, knowing the sequence of letters in the genetic code is not enough to understand how a cell works. The reasons can be illustrated by an engineer who is asked to rebuild a passenger aircraft. First, he needs a blueprint of all the individual parts. He also needs to know what each part does and how the parts fit together. Instead of such a construction plan, however, he only has instructions for making the individual parts. He therefore knows what the parts are made of, but not what they look like or what function each one has, let alone how many copies of each he needs and how to assemble them. Under these circumstances, he could never build an airplane.

Biologists face a similar task when they try to use genetic data to understand the processes in a cell. The genome provides the instructions for the most important components of a cell, the proteins. But which proteins are actually made, when, and in what quantities cannot simply be read from the letters of the genetic code.

One reason is that a single gene can often give rise to several proteins. This diversity arises when a gene contains the information for several proteins or when a protein chain is later cut into several molecules. Different gene products also arise through alternative splicing, in which the segments of the messenger RNA are combined in different ways. In humans, up to ten different proteins can be traced back to a single gene. The number of proteins in a cell can therefore be many times higher than the number of its genes. Based on the current estimate of 20,000 to 25,000 human genes, scientists put the number of human proteins at 80,000 to 400,000.
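To illustrate how alternative splicing can multiply the products of one gene, here is a minimal Python sketch. It assumes a hypothetical gene whose first and last exons are always kept, while each middle exon can either be included or skipped. Real splicing is regulated and far more restricted than this toy model, so the sketch shows only the combinatorial principle, not how many isoforms an actual gene produces.

```python
from itertools import combinations


def splice_variants(exons: list[str]) -> list[str]:
    """Enumerate hypothetical mRNA isoforms produced by exon skipping.

    Assumption (toy model): the first and last exons are always kept, and
    any subset of the middle exons may be included, in their original order.
    """
    first, *middle, last = exons
    variants = []
    for k in range(len(middle) + 1):
        # combinations() keeps the original exon order within each subset
        for kept in combinations(middle, k):
            variants.append("-".join([first, *kept, last]))
    return variants


if __name__ == "__main__":
    isoforms = splice_variants(["E1", "E2", "E3", "E4", "E5"])
    print(len(isoforms))  # 8 isoforms from 3 optional middle exons
    for isoform in isoforms:
        print(isoform)
```

With three optional middle exons, the sketch yields 2³ = 8 isoforms, which shows how quickly variants can multiply even before a protein chain is cut or modified.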

This is where the proteome and the interactome come into play. No cell can survive without proteins. They serve as its molecular motors, scaffolding, doors, signaling molecules and antennas. The proteome thus determines which tasks a cell can take on in the organism. If the genome is the set of building instructions for all essential parts, then the proteome is the parts catalog and the interactome is the assembly plan showing which parts are connected to each other. Researchers hope that by knowing the proteins and their interaction partners, they can explain much more precisely how a cell works and thus get to the root of diseases.

Scientists therefore place great hopes in proteomics. They want to know which proteins an organism, a tissue, a cell or a cell organelle produces, and in what quantities. Like the Human Genome Project before it, the Human Proteome Project is intended to provide new insights into how cells work. In addition, by comparing the proteomes of healthy and diseased cells, researchers hope to track down the causes of disease. After all, a single faulty protein is sometimes enough to cause diseases such as cancer, Alzheimer's or Parkinson's.

However, a protein catalog alone is not enough; scientists also need to know the chemical modifications that proteins use to transmit signals to other molecules. Because these modifications are made after the protein has been synthesized, that is, after translation, they are called post-translational modifications. Most consist of small chemical groups attached to specific sites on the protein molecule, for example phosphate, methyl or acetyl groups. Depending on where a protein is phosphorylated, methylated or acetylated, it can activate a particular signaling pathway and thus influence metabolic pathways. The goal is therefore an inventory of all proteins together with their post-translational modifications.
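Why a plain protein catalog undercounts what a cell contains can be shown with a little combinatorics. The Python sketch below assumes a made-up protein with four hypothetical modification sites (the site names and groups are illustrative, not taken from a real protein), each of which is either unmodified or carries exactly one kind of group. It enumerates every resulting form of the molecule.

```python
from itertools import product
from typing import Dict, List, Optional

# Hypothetical sites on a made-up protein. Simplifying assumption: each site
# either stays unmodified or carries exactly one kind of chemical group.
SITES = {
    "S15": "phosphate",
    "K20": "acetyl",
    "R42": "methyl",
    "T100": "phosphate",
}


def modification_states(sites: Dict[str, str]) -> List[Dict[str, Optional[str]]]:
    """Return every combination of modified (group name) / unmodified (None) sites."""
    names = list(sites)
    states = []
    # Each site is independently off (False) or on (True): 2 ** n combinations
    for flags in product([False, True], repeat=len(names)):
        states.append({name: (sites[name] if on else None)
                       for name, on in zip(names, flags)})
    return states


if __name__ == "__main__":
    forms = modification_states(SITES)
    print(len(forms))  # 2 ** 4 = 16 distinct forms of one protein
```

With n independent sites there are 2ⁿ forms, so a protein with ten such sites would already have 1,024. Sites that can carry more than one kind of group (a lysine can be acetylated or methylated, for instance) would push the number higher still.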

What is still missing is the interactome: which proteins work together. Some proteins assemble in pairs and exchange signals in the process. The interactome of a human cell is estimated to comprise around 130,000 such pairwise interactions. Other proteins join with dozens of partners to form intricate structures such as ribosomes.
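An interactome is naturally represented as a network: proteins are nodes and each pairwise interaction is an edge. The minimal Python sketch below stores a tiny, made-up network as an adjacency map and uses breadth-first search to find groups of connected proteins. The protein names are hypothetical placeholders, and treating every connected group as a potential assembly is a simplifying assumption; real complexes are identified experimentally, not from connectivity alone.

```python
from collections import defaultdict, deque


def build_interactome(pairs):
    """Build an undirected adjacency map from pairwise interactions."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_groups(graph):
    """Return the connected components of the network via breadth-first search."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        queue, group = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups


# Hypothetical pairwise interactions between placeholder proteins
PAIRS = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P4", "P5")]

if __name__ == "__main__":
    interactome = build_interactome(PAIRS)
    # Each interaction is stored twice (once per partner), so halve the sum
    print(sum(len(partners) for partners in interactome.values()) // 2)  # 4
    for group in connected_groups(interactome):
        print(sorted(group))
```

An adjacency map keeps lookups of a protein's partners cheap, which matters when the network grows toward the roughly 130,000 interactions estimated for a human cell.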

Challenges, setbacks, successes: the story of the Human Proteome Project

Deciphering the human proteome is therefore a mammoth project, and the technical challenges are immense. There are two main reasons for this. The first is fundamental: not all genes are active in every cell of an organism. Depending on the cell type, different genes are read and different proteins are made. With around 250 different cell types in the human body, there are at least as many proteomes. Moreover, a cell's proteome depends on many other factors. A cell can produce different proteins depending on its age and on the organism's diet or state of health, so its protein composition changes accordingly. Environmental influences such as drugs or pollutants also affect the proteome.

Scientists therefore have to determine the proteome of each cell type separately, and it may take years before the complete proteomes of all human tissue types have been deciphered. For the Human Proteome Project, researchers have therefore set a goal that can be reached more quickly: first, they want to identify at least one associated protein for each gene. Once that is done, the proteomes of the different cell types, the post-translational modifications and the interactome can be added step by step.

The second reason lies in the chemical properties of proteins. Whereas DNA molecules all behave in chemically similar ways, proteins are extremely diverse: some are water-soluble, others fat-soluble. The largest are more than 200 times heavier than the smallest. Some are electrically charged, others are not.

Proteins also occur in vastly different amounts: some are so abundant that researchers can extract them from tissue in large quantities, while for others they have to make do with a few billionths of a milligram. For example, one milliliter of blood contains ten trillion times more albumin than interleukins. This is why these signaling molecules, present only in tiny amounts, are difficult to identify and easily overlooked, even though they often perform crucial functions in cells. Moreover, unlike DNA, proteins cannot simply be copied and amplified.
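The effect of such an abundance gap on detection can be sketched with simple arithmetic. The Python example below uses the ratio from the text (albumin about 10¹³ times more abundant than interleukins) and adds a hypothetical mid-abundance protein, "protein_x", for comparison. It assumes an idealized instrument that can only see proteins within a fixed number of orders of magnitude below the most abundant one; that dynamic range is an illustrative parameter, not a property of any particular instrument.

```python
import math

# Relative abundances: the albumin/interleukin ratio comes from the text;
# "protein_x" is a hypothetical mid-abundance protein added for comparison.
ABUNDANCES = {
    "albumin": 1e13,
    "protein_x": 1e8,
    "interleukin": 1.0,
}


def detectable(abundances, dynamic_range_orders):
    """Return proteins no more than `dynamic_range_orders` powers of ten
    below the most abundant one (idealized, hypothetical instrument)."""
    top = max(abundances.values())
    threshold = top / 10 ** dynamic_range_orders
    return {name for name, amount in abundances.items() if amount >= threshold}


if __name__ == "__main__":
    gap = ABUNDANCES["albumin"] / ABUNDANCES["interleukin"]
    print(round(math.log10(gap)))  # 13 orders of magnitude
    print(sorted(detectable(ABUNDANCES, 6)))   # interleukin is missed
    print(sorted(detectable(ABUNDANCES, 13)))  # all three are seen
```

Under this assumption, an instrument spanning six orders of magnitude would see albumin and the hypothetical protein_x but miss the interleukins entirely, which is one way to picture why rare signaling molecules are so easily overlooked.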

All of this makes analyzing the proteome extremely time-consuming. The Human Proteome Organization (HUPO) was therefore founded in 2001 to coordinate proteome research worldwide. In 2004, the organization launched a project to analyze blood plasma, but the initial results were disappointing. When several research groups analyzed one and the same sample, they arrived at different proteomes. The procedures differed too much, and the analytical methods were too error-prone.

HUPO then introduced standards for sample collection, data acquisition and data analysis. Since then, new techniques, such as mass spectrometry refined for proteomics and cryo-electron tomography, together with gentler methods for extracting and purifying proteins from cells and more powerful data-analysis software, have brought decisive progress to the Human Proteome Project. Today, scientists can analyze several thousand proteins at once.

Unlike the Human Genome Project, which was launched in the USA in 1990 and brought together various national genome projects, the Human Proteome Project is being pursued by several independent research consortia around the world. Besides HUPO and its eleven projects, which include initiatives on the brain, kidney, liver and stem cells, the main player in Europe is PROSPECTS, a consortium of eleven research institutions led by Matthias Mann at the Max Planck Institute of Biochemistry in Martinsried near Munich. PROSPECTS aims to compile a catalog of human proteins, including their structures, interactions and distribution within cells. The consortium is also investigating how findings from proteomics can help treat neurodegenerative diseases such as Alzheimer's and Parkinson's.