The human secretome and membrane proteome

Approximately 36% of the 20162 human protein-coding genes have at least one transcript predicted to encode a secreted protein or a protein with at least one transmembrane region, suggesting active transport of the corresponding protein out of the cell (secretion) or location in one of the numerous membrane systems of the cell. Several genes code for multiple protein isoforms (splice variants) with alternative locations, including 226 genes with both secreted and membrane-bound isoforms. In total, 1891 genes (9%) are predicted to have at least one secreted protein product, while 5580 (28%) are predicted to have at least one membrane-bound protein product. The remaining 12917 genes (64%) are predicted to be intracellular, i.e. to have no secreted or membrane-bound protein product, and most likely encode proteins that act in the cytoplasm and/or nucleus. Figure 1 shows the number of protein-coding genes in each category for all 20162 genes.

Figure 1. The number of human protein-coding genes predicted to have (1) only intracellular, (2) membrane-spanning, (3) secreted, or (4) both membrane-spanning and secreted protein isoforms. The last group consists of genes with multiple splice variants, of which at least one is secreted and at least one is membrane-spanning.
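The gene counts quoted above are related by simple inclusion-exclusion, which the following minimal sketch makes explicit. The numbers are taken from the text; the function and variable names are illustrative only.

```python
# Gene counts quoted in the text above.
TOTAL_GENES = 20162
SECRETED_GENES = 1891   # at least one predicted secreted product
MEMBRANE_GENES = 5580   # at least one predicted membrane-bound product
BOTH_GENES = 226        # both secreted and membrane-bound isoforms


def summarize_categories(total, secreted, membrane, both):
    """Return {category: (gene count, rounded percentage of all genes)}."""
    # Genes with both kinds of isoform appear in both groups,
    # so they are subtracted once to avoid double counting.
    secreted_or_membrane = secreted + membrane - both
    intracellular = total - secreted_or_membrane

    def with_percent(count):
        return count, round(100 * count / total)

    return {
        "secreted": with_percent(secreted),
        "membrane": with_percent(membrane),
        "secreted_or_membrane": with_percent(secreted_or_membrane),
        "intracellular": with_percent(intracellular),
    }


summary = summarize_categories(
    TOTAL_GENES, SECRETED_GENES, MEMBRANE_GENES, BOTH_GENES
)
for name, (count, percent) in summary.items():
    print(f"{name}: {count} genes ({percent}%)")
```

The 226 genes with both isoform types are included in the secreted and in the membrane totals, which is why the two percentages sum to slightly more than the combined 36%.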

The importance of secreted and membrane-bound proteins

Proteins that are secreted from the cell or located in the cellular membranes play a crucial role in many physiological and pathological processes. Medically important secreted proteins include cytokines, coagulation factors, growth factors and other signaling molecules. The functions of membrane proteins are diverse and include ion channel activity, transport of other molecules across the membrane, enzymatic processes, anchoring of other proteins and receptor signaling. A large fraction of the clinically approved treatment regimens today use drugs directed towards (or consisting of) secreted proteins or cell surface-associated membrane proteins. Out of the 854 protein targets with known pharmacological action for approved drugs currently on the market, 120 are secreted and 492 are membrane-bound. See the Druggable Proteome page for more details.

What is a secreted protein?

A secretory protein can be defined as a protein which is actively transported out of the cell. In humans, cells such as endocrine cells and B-lymphocytes are specialized in the secretion of proteins, but all cells in the body secrete proteins to a varying degree. In addition to being a rich source of new therapeutics and drug targets, secreted proteins are the target of a large fraction of the blood diagnostic tests used in the clinic, emphasizing the importance of this class of proteins for medicine and biology. The most abundant secreted proteins include pancreatic enzymes (PRSS1, CELA3A, AMY2A) and other digestive enzymes expressed in salivary gland (PRR4, STATH, ZG16B) or stomach (PGA3, PGA4). One of the most important secretory organs is the liver, which produces a large number of plasma proteins such as albumin, fibrinogen and transferrin. Another group of highly abundant secreted proteins belongs to the defensin family and is secreted by glandular cells in epididymis (DEFB118, DEFB106A and DEFB129). More information about the human secretome, including a categorisation of the genes based on whether their predicted final destination is in the blood, the brain, the digestive system or other locations, can be found here.


Figure 2. Immunohistochemistry-based images of the secreted proteins CELA3A (chymotrypsin-like elastase family member 3A) in pancreas, CPA1 (carboxypeptidase A1) in pancreas and AMY1B (amylase alpha 1B) in salivary gland.

What is a membrane protein?

Membrane proteins constitute one of the largest and most important classes of proteins. A membrane protein is associated with or attached to the membrane of a cell or of an organelle inside the cell, and can be classified as either peripheral or integral. Peripheral membrane proteins are bound either to peripheral regions of the membrane or to integral membrane proteins, but they do not fully span the membrane. Integral membrane proteins contain hydrophobic alpha-helical or beta-barrel structures that span the entire lipid bilayer and are linked by extramembranous loop regions. The alpha-helical integral membrane proteins form the major category of membrane proteins, are found in all types of biological membranes, and will be the main focus here. Their key roles as transporters and receptors explain why they represent approximately 58% of all currently approved drug targets, and hence their immense importance for the pharmaceutical industry. Many important receptors and cell surface molecules are found in the list of human cell differentiation molecules (CD markers). G-protein coupled receptors (GPCRs), which contain seven transmembrane (TM) segments and include 743 of the human protein-coding genes, comprise the largest group of membrane protein drug targets.


Figure 3. Different classes of membrane proteins.


Figure 4. Immunohistochemistry-based images of the CD marker C5AR1 in gall bladder, the G-protein coupled receptor CYSLTR2 in placenta and DSC2 in esophagus.

Prediction of transmembrane protein topology and signal peptides

Developing a better understanding of membrane protein structure and function is of immense importance for both biological and pharmacological purposes. Since membrane proteins are difficult to crystallize and severely underrepresented in structural databases, computational prediction of membrane protein structure has been crucial for continued studies of these key molecules. Most membrane protein prediction methods have focused on the topology of alpha-helical membrane proteins, i.e. the prediction of the position of the transmembrane (TM) segments in the protein sequence and their orientation relative to the membrane (Figure 5).


Figure 5. A schematic view of the topology of an alpha-helical membrane protein with four transmembrane segments and extracellular N- and C-termini.

The TM segments are identified based on features such as length, amino acid properties and hydrophobicity, and many prediction methods are based on machine-learning techniques. Here, a selection of seven prediction algorithms was used to create a majority decision-based method (MDM), which combines the results from the chosen tools to estimate the human membrane proteome. Each protein with at least one TM segment supported by overlapping predictions from at least four of the seven methods is considered a membrane protein. Table 1 shows the number of predicted protein-coding genes for each individual method, as well as for the MDM prediction.
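To illustrate how hydrophobicity and segment length can flag candidate TM regions, here is a minimal sketch of a classic sliding-window hydropathy scan using the Kyte-Doolittle scale. This is not one of the seven tools used here, which rely on more elaborate models. The window of 19 residues and the threshold of 1.6 follow the usual Kyte-Doolittle convention, and the test sequence is synthetic.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    # Unknown residue codes are scored as 0.0 (an arbitrary choice).
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge windows scoring >= threshold into (start, end) ranges.

    Ranges are 0-based and half-open, and cover every residue that
    lies in at least one high-scoring window.
    """
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous range: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Synthetic sequence: 20 leucines flanked by charged residues.
toy_sequence = "DEKR" * 5 + "L" * 20 + "DEKR" * 5
print(candidate_tm_segments(toy_sequence))
```

Because windows that only partly overlap the hydrophobic stretch can still exceed the threshold, the reported range is somewhat wider than the 20 leucines themselves. A scan like this gives no information on orientation, which is one reason dedicated topology predictors are used instead.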

Table 1. Prediction of the human membrane proteome by seven different prediction methods for membrane protein topology as well as the majority decision-based method MDM and a method specialized in prediction of GPCRs.

Protein class                               Number of genes   Number of proteins   Source
Predicted membrane proteins                 5580              17591                MDM
MEMSAT3 predicted membrane proteins         7505              23235                MEMSAT3
MEMSAT-SVM predicted membrane proteins      6459              20546                MEMSAT-SVM
Phobius predicted membrane proteins         5883              18427                Phobius
SCAMPI predicted membrane proteins          6560              19551                SCAMPI
SPOCTOPUS predicted membrane proteins       7826              25107                SPOCTOPUS
THUMBUP predicted membrane proteins         7290              23205                THUMBUP
TMHMM predicted membrane proteins           5648              17661                TMHMM
GPCRHMM predicted membrane proteins         856               1536                 GPCRHMM
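The majority decision used for the MDM row of Table 1 can be sketched as a per-residue vote. The sketch below assumes one possible reading of "overlapping predictions": a protein is called a membrane protein if at least one residue lies inside a predicted TM segment from at least four of the seven methods. The segment coordinates in the examples are hypothetical, and the actual overlap criterion used for the MDM may differ.

```python
from collections import Counter


def is_membrane_protein(predictions, min_votes=4):
    """Majority vote over per-method TM segment predictions.

    predictions maps a method name to a list of (start, end)
    residue ranges, 1-based and inclusive. Returns True if any
    residue is covered by segments from at least min_votes methods.
    """
    votes = Counter()
    for segments in predictions.values():
        covered = set()
        for start, end in segments:
            covered.update(range(start, end + 1))
        # Each method contributes at most one vote per residue.
        votes.update(covered)
    return any(count >= min_votes for count in votes.values())


# Hypothetical predictions for two proteins (coordinates invented).
four_agree = {
    "MEMSAT3": [(30, 50)],
    "MEMSAT-SVM": [(32, 52)],
    "Phobius": [(28, 48)],
    "TMHMM": [(31, 51)],
    "SCAMPI": [],
    "SPOCTOPUS": [],
    "THUMBUP": [],
}
three_agree = {
    "MEMSAT3": [(30, 50)],
    "MEMSAT-SVM": [(32, 52)],
    "Phobius": [(28, 48)],
    "TMHMM": [(120, 140)],  # a segment elsewhere does not add a vote
    "SCAMPI": [],
    "SPOCTOPUS": [],
    "THUMBUP": [],
}

print(is_membrane_protein(four_agree))
print(is_membrane_protein(three_agree))
```

Requiring agreement at the same position, rather than merely counting methods that predict any TM segment, is consistent with the MDM gene count in Table 1 being lower than that of every individual method.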

The N-terminal signal sequences that are found in most secreted proteins and some types of membrane proteins are often called signal peptides (SP). A signal peptide is primarily identified by a short hydrophobic alpha-helix combined with a number of features that enable computational prediction based on the amino acid sequence of the protein. There are also a number of methods which incorporate an SP prediction model into their TM topology prediction algorithm to enable more reliable results when distinguishing between the two features. Here, the human secretome was predicted by a whole-proteome scan using three methods for signal peptide prediction: SignalP 4.0, Phobius and SPOCTOPUS, all of which have been shown to give reliable prediction results in a comparative analysis. Similarly to the MDM, a majority decision-based method for secreted proteins (MDSEC) was constructed using the results from these three prediction methods. All proteins with an SP predicted by at least two of the three methods were considered secreted, and were further annotated in order to exclude genes that are predicted to reside in intracellular locations such as the ER or Golgi despite having a signal peptide prediction. Since signal peptides are found both in secreted proteins and in certain types of membrane proteins, all proteins with a predicted SP in combination with a predicted TM region according to the MDM are considered membrane-spanning and therefore not secreted. The resulting numbers of genes encoding a predicted secreted protein are shown in Table 2.

Table 2. Prediction of the human secretome by three different prediction methods for signal peptides as well as the MDSEC and the final prediction resulting from the annotation of the human secretome.

Protein class                            Number of genes   Number of proteins   Source
Predicted secreted proteins              1891              4950                 HPA
Secreted proteins predicted by MDSEC     3218              7537                 HPA
SignalP predicted secreted proteins      2801              6517                 SignalP
Phobius predicted secreted proteins      3616              8477                 Phobius
SPOCTOPUS predicted secreted proteins    3947              8982                 SPOCTOPUS
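The decision rules described for the secretome (a two-out-of-three signal peptide vote, followed by exclusion of TM-containing and intracellularly retained proteins) can be sketched as follows. The function names, the boolean inputs and the example calls are illustrative assumptions; in particular, the manual annotation step is reduced here to a single flag.

```python
SP_METHODS = ("SignalP", "Phobius", "SPOCTOPUS")


def mdsec_has_signal_peptide(sp_calls, min_votes=2):
    """True if at least min_votes of the three methods predict an SP.

    sp_calls maps a method name to True/False; missing methods
    count as "no signal peptide predicted".
    """
    votes = sum(bool(sp_calls.get(method, False)) for method in SP_METHODS)
    return votes >= min_votes


def is_predicted_secreted(sp_calls, tm_by_mdm, retained_intracellularly=False):
    """Apply the filters described in the text to one protein.

    tm_by_mdm: True if the MDM predicts a TM region.
    retained_intracellularly: True if annotation places the protein
    in an intracellular location such as the ER or Golgi.
    """
    if not mdsec_has_signal_peptide(sp_calls):
        return False
    if tm_by_mdm:
        # SP plus TM region: membrane-spanning, not secreted.
        return False
    return not retained_intracellularly


# Hypothetical prediction results for three proteins.
two_of_three = {"SignalP": True, "Phobius": True, "SPOCTOPUS": False}
one_of_three = {"SignalP": False, "Phobius": False, "SPOCTOPUS": True}

print(is_predicted_secreted(two_of_three, tm_by_mdm=False))
print(is_predicted_secreted(two_of_three, tm_by_mdm=True))
print(is_predicted_secreted(one_of_three, tm_by_mdm=False))
```

The two exclusion steps account for the drop in Table 2 from 3218 genes with an MDSEC signal peptide call to 1891 genes in the final predicted secretome.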

Classification of the human proteome

The combined results from the analyses of the membrane proteome and the secretome were used to map the distribution of potential membrane proteins and secreted proteins in the human proteome. The protein isoforms of all human genes were annotated using three categories: (i) secreted, (ii) membrane and (iii) intracellular (i.e., proteins with no predicted SP/TM features). Note that proteins classified as membrane may be located in intracellular membranes such as those of the endoplasmic reticulum or Golgi. Each human protein-coding gene was subsequently classified either as a gene with all isoforms belonging to one of these categories or as a gene encoding protein isoforms belonging to two or all three categories. The results (Figure 6) show that 36% of the human protein-coding genes have at least one protein isoform which is membrane-spanning or secreted (see top of page).


Figure 6. Venn diagram showing the overlap between the number of genes that are intracellular, membrane-spanning, secreted, or with isoforms belonging to more than one of the three categories.
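The gene-level classification summarized in Figure 6 amounts to collecting the set of categories observed across the isoforms of each gene. A minimal sketch, using hypothetical gene identifiers and isoform annotations, could look like this:

```python
from collections import Counter

CATEGORIES = frozenset({"secreted", "membrane", "intracellular"})


def classify_gene(isoform_categories):
    """Return a label such as 'membrane' or 'membrane+secreted'.

    isoform_categories is an iterable with one category per isoform.
    """
    observed = frozenset(isoform_categories)
    if not observed or not observed <= CATEGORIES:
        raise ValueError(f"unexpected isoform categories: {sorted(observed)}")
    return "+".join(sorted(observed))


def count_gene_classes(genes):
    """Count genes per combination of isoform categories."""
    return Counter(classify_gene(cats) for cats in genes.values())


# Hypothetical genes with invented isoform annotations.
toy_genes = {
    "GENE_A": ["intracellular", "intracellular"],
    "GENE_B": ["membrane"],
    "GENE_C": ["secreted", "membrane", "secreted"],
    "GENE_D": ["intracellular", "secreted"],
    "GENE_E": ["intracellular"],
}

class_counts = count_gene_classes(toy_genes)
for label, n in sorted(class_counts.items()):
    print(f"{label}: {n}")

# Genes with at least one membrane-spanning or secreted isoform.
n_secreted_or_membrane = sum(
    n for label, n in class_counts.items() if label != "intracellular"
)
print(n_secreted_or_membrane, "of", len(toy_genes))
```

Each distinct label corresponds to one region of a three-set Venn diagram, and the final count corresponds to the group reported as 36% of all genes.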

Examples of protein classes including secreted and membrane proteins

A number of important protein classes are related to the membrane proteome and the secretome. Some examples of such classes are presented in Table 3.

Table 3. A selection of classes related to the membrane proteome and secretome.

Protein class                     Number of genes   Number of proteins   Source
CD markers                        384               1005                 UniProt
Voltage-gated ion channels        132               355                  IUPHAR-DB
Transporters                      2138              5396                 TCDB
GPCRs excl olfactory receptors    391               776                  UniProt
Plasma proteins                   3750              9474                 Plasma Proteome Database

The plasma proteome

Plasma is the clear, liquid fraction of the blood which remains when the white blood cells, red blood cells and platelets are removed. It is composed of water (90%), proteins (7-8%) and smaller substances such as salts, gases and nutrients. The most important functions of plasma include transporting compounds needed in different parts of the body, balancing the fluid exchange of all tissues by regulating the osmotic pressure, and playing a large role in immune system function. Most cells in the body communicate with plasma directly or indirectly through other fluids. Analysis of the proteins present in plasma can therefore provide important information about a patient's health.

The plasma proteome has an extraordinary dynamic range, spanning more than 10 orders of magnitude between the concentration of the most abundant protein, albumin (ALB), which acts as a transporter and helps maintain colloid osmotic pressure, and that of the rarest proteins detectable today, which include interleukins and tissue leakage proteins. The ten most abundant proteins account for 90% of the plasma proteome; along with albumin, they include fibrinogen, involved in blood clotting, and immunoglobulins, mainly involved in immune processes.

Although many proteins of the plasma proteome are secreted proteins that have gone through the secretory pathway, another group is composed of tissue leakage proteins, which are normally found within cells but can be released into plasma as a result of cell death or damage. There is also an interesting class of proteins that undergo non-classical secretion without entering the ER/Golgi pathway; it includes cytokines such as interleukin 1β (IL1B) and mitogens such as fibroblast growth factor 2 (FGF2).

A list of plasma proteins obtained from the Plasma Proteome Database can be found here.

The secretory pathway

In the secretory pathway (Figure 7), proteins with a signal sequence that guides them to the endoplasmic reticulum (ER) are transported from the ER through the Golgi apparatus via vesicles to arrive at the surface of the cell. The signal sequence targeting proteins for secretion, called a signal peptide, is a short, hydrophobic N-terminal sequence which is inserted into the ER membrane and subsequently cleaved off from the protein. Membrane proteins may also contain an SP, but most often the N-terminal transmembrane (TM) region functions as the signal sequence. The ER signal sequences are recognized by chaperone proteins which guide the synthesizing ribosomes to the rough ER, where translocation of the protein sequence occurs in a protein complex named the translocon. Membrane proteins are transferred to the lipid bilayer of the ER membrane via the translocon, whereas secretory proteins are transported into the ER lumen. Once the protein is inside the ER lumen, other chaperone proteins make sure that it is folded and assembled correctly, and the oxidative environment enables formation of disulfide bonds, addition of carbohydrates and proteolytic cleavages. The proteins that pass the ER quality control are transported via vesicles to the Golgi apparatus, where they are further modified in important processes such as glycosylation and phosphorylation. The Golgi is also responsible for sorting proteins for transport to their final destination, which most often is the plasma membrane, the lysosomes or secretion out of the cell.


Figure 7. Overview of the secretory pathway.

Relevant links and publications

Uhlén M et al., Tissue-based map of the human proteome. Science (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419