Glycan nomenclature

From Wikipedia, the free encyclopedia

Glycan nomenclature is the systematic naming of glycans, which are carbohydrate-based polymers made by all living organisms. In general, glycans can be represented in (i) text formats, which include the commonly used CarbBank and IUPAC formats as well as several others; and (ii) symbol formats, which consist of the Symbol Nomenclature For Glycans and the Oxford notation.

History

In the early nineteenth century, sugars were named after their source. For example, glucose was called grape sugar (Traubenzucker) and saccharose was called cane sugar (Rohrzucker). The name glucose was coined in 1838; in 1866 Kekulé proposed the name 'dextrose' because glucose is dextrorotatory. The scientific community decided that sugar names should end in '-ose'; combined with the French word 'cellule' (cell), this ending gave the term cellulose. Because the empirical composition of monosaccharides can be expressed as Cn(H2O)n, they were termed 'carbohydrates' (French 'hydrate de carbone').[1]

Text formats

To represent the structural information of glycans more accurately and to serve the specific purposes of the community, several formats have been designed and used in carbohydrate databases developed by different research groups and organizations.

CarbBank

The CarbBank format originates from CarbBank,[2] a database management system for the Complex Carbohydrate Structure Database (CCSD). CarbBank was created by researchers at the Complex Carbohydrate Research Center (CCRC) of the University of Georgia. An example, the N-glycan Man-3-Core F, is shown below:

CCSD
                            a-L-Fucp-(1-6)+
                                          |
 a-D-Manp-(1-6)+                  b-D-GlcpNAc-(1-4)-Asn
               |                          |
          b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)+
               |
 a-D-Manp-(1-3)+

In general, this format is human-readable but the vertical bars make it difficult for a computer to parse.

IUPAC

The International Union of Pure and Applied Chemistry (IUPAC) proposes a nomenclature for representing complex carbohydrates called 2-Carb.[3] The IUPAC nomenclature provides three forms for representing glycans.

  • Extended form: In this form, each monosaccharide unit is represented by its symbol, preceded by the anomeric descriptor and the configuration symbol. An italic letter indicates the ring size, e.g. f for furanose and p for pyranose. The locants of the linkage are given in parentheses between the symbols, and a double-headed arrow indicates a linkage between two anomeric positions.
  • Condensed form: This form omits both the configurational symbol and the letter denoting ring size. The configuration is assumed to be D (except for fucose and iduronic acid, which are generally in the L configuration) and the rings are assumed to be in the pyranose form unless stated otherwise. The anomeric descriptor is written in the parentheses along with the locants.
  • Short form: The notation can be shortened further by omitting the locants of the anomeric carbon atoms, the parentheses around the locants of the linkage, and the hyphens. Branches can be shown on the same line with the aid of enclosing marks such as parentheses and square brackets.

The above example glycan can be represented as below:

Extended form: α-D-Manp-(1→3)-[α-D-Manp-(1→6)]-β-D-Manp-(1→4)-β-D-GlcpNAc-(1→4)-[α-L-Fucp-(1→6)]-β-D-GlcpNAc-(1→NASN-protein

Condensed form: Man(α1-3)[Man(α1-6)]Man(β1-4)GlcNAc(β1-4)[Fuc(α1-6)]GlcNAc(β1-ASN

Short form: Manα3(Manα6)Manβ4GlcNAcβ4(Fucα6)GlcNAcβASN

Note:

Modified Condensed IUPAC: Manα1-3(Manα1-6)Manβ1-4GlcNAcβ1-4(Fucα1-6)GlcNAcβ1-Asn
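The step from the condensed form to the short form is mechanical enough to sketch in code. The Python snippet below is an illustration only and is not part of the IUPAC recommendations: the function name `condensed_to_short` is hypothetical, and the sketch assumes single-digit locants and the α/β anomeric descriptors used in the example above.

```python
import re

def condensed_to_short(condensed: str) -> str:
    """Rewrite an IUPAC condensed string in the short form (illustrative only)."""
    # "(α1-3)" -> "α3": drop the anomeric locant, the parentheses and the hyphen.
    short = re.sub(r"\(([αβ])\d-(\d)\)", r"\1\2", condensed)
    # A trailing open linkage such as "(β1-" before the aglycon -> "β".
    short = re.sub(r"\(([αβ])\d-", r"\1", short)
    # Branches move from square brackets to parentheses.
    return short.replace("[", "(").replace("]", ")")
```

Applied to the condensed string of Man-3-Core F given above, this reproduces the short form listed there.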

LINUCS

LInear Notation for Unique description of Carbohydrate Sequences (LINUCS)[4] is a format used in Glycosciences.de. It is designed to describe a glycan structure uniquely.[5] The example glycan in LINUCS format is:

[][ASN]{[(4+1)][B-D-GLCPNAC]{[(4+1)][B-D-GLCPNAC]{[(4+1)][B-D-MANP]{[(3+1)][A-D-MANP]{}[(6+1)][A-D-MANP]{}}}[(6+1)][A-L-FUCP]{}}}

Linear Code

Linear Code is a linear notation proposed by GlycoMinds Ltd. and is one of the most compact formats. Here, (i) common monosaccharides are indicated by a code of at most two letters, (ii) the anomer of each linkage is indicated by “a” or “b”, (iii) the number that follows gives the carbon position of the linkage, and (iv) branches are indicated by parentheses.[6]

Ma3(Ma6)Mb4GNb4(Fa6)GN;N
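The four rules above can be illustrated by expanding a Linear Code string into the modified condensed IUPAC style. This is a minimal sketch, not GlycoMinds software: the function name `expand_linear_code` is hypothetical, the code table covers only the three residues that occur in the example, and the sketch assumes that every linkage starts from anomeric carbon 1 and that the text after ";" is the aglycon.

```python
import re

# Partial code table: only the residues used in the example (an assumption).
CODES = {"M": "Man", "GN": "GlcNAc", "F": "Fuc"}
ANOMERS = {"a": "α", "b": "β"}

# A token is a residue code with an optional linkage, or a branch parenthesis.
TOKEN = re.compile(r"(GN|M|F)(?:([ab])(\d))?|([()])")

def expand_linear_code(code):
    """Return (IUPAC-style glycan string, aglycon code)."""
    glycan, _, aglycon = code.partition(";")
    parts, pos = [], 0
    for match in TOKEN.finditer(glycan):
        if match.start() != pos:  # finditer silently skips unknown text
            raise ValueError(f"unrecognised text at offset {pos}")
        pos = match.end()
        residue, anomer, carbon, paren = match.groups()
        if paren:
            parts.append(paren)
        elif anomer:
            parts.append(f"{CODES[residue]}{ANOMERS[anomer]}1-{carbon}")
        else:
            parts.append(CODES[residue])  # reducing-end residue, no linkage
    if pos != len(glycan):
        raise ValueError(f"unrecognised text at offset {pos}")
    return "".join(parts), aglycon
```

For the example string, the glycan part expands to the modified condensed IUPAC string shown earlier without its terminal linkage to asparagine, and the aglycon is returned separately as "N".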

GlycoCT

GlycoCT is a format designed and developed under the EuroCarbDB project. It uses a connection table approach to describe the full complexity of carbohydrate sequence data.[7] It is widely used by the bioinformatics community through the database GlycomeDB.[8] The example glycan in GlycoCT format is shown below:

RES
1b: b-dglc-HEX-1: 5
2s: n-acetyl
3b: b-dglc-HEX-1: 5
4s: n-acetyl
5b: b-dman-HEX-1: 5
6b: a-dman-HEX-1: 5
7b: a-dman-HEX-1: 5
8b: a-lgal-HEX-1: 5 | 6: d
LIN
1: 1d (2 + 1) 2n
2: 1o (4 + 1) 3d
3: 3d (2 + 1) 4n
4: 3o (4 + 1) 5d
5: 5o (3 + 1) 6d
6: 5o (6 + 1) 7d
7: 1o (6 + 1) 8d
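A connection table of this kind separates the residues (RES) from the linkages between them (LIN), which makes it straightforward to read into two simple data structures. The following Python sketch is an assumption-laden illustration rather than a reference parser: the function name `parse_glycoct` is hypothetical, only the constructs that occur in the example are handled (no repeat units or alternative linkage positions), and whitespace inside a line is ignored.

```python
import re

RES_LINE = re.compile(r"(\d+)([a-z]):\s*(.+)")
LIN_LINE = re.compile(
    r"\d+:\s*(\d+)[a-z]\s*\(\s*(-?\d+)\s*\+\s*(-?\d+)\s*\)\s*(\d+)[a-z]"
)

def parse_glycoct(text):
    """Return (residues, links) read from the RES and LIN sections.

    residues maps id -> (kind, descriptor); kind is "b" for a basetype
    and "s" for a substituent in the example.
    links is a list of (parent_id, parent_pos, child_id, child_pos).
    """
    residues, links, section = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in ("RES", "LIN"):
            section = line
            continue
        pattern = RES_LINE if section == "RES" else LIN_LINE
        match = pattern.match(line)
        if section is None or match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        if section == "RES":
            number, kind, descriptor = match.groups()
            residues[int(number)] = (kind, descriptor.replace(" ", ""))
        else:
            parent, parent_pos, child_pos, child = match.groups()
            links.append((int(parent), int(parent_pos), int(child), int(child_pos)))
    return residues, links
```

Counting the entries of kind "b" in the result for the example above gives six monosaccharides, and the two "s" entries are the N-acetyl groups attached at position 2 of residues 1 and 3.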

WURCS

The Web3 Unique Representation of Carbohydrate Structures (WURCS) format was initially developed for GlyTouCan, the international glycan structure repository. Because GlyTouCan was built on Semantic Web technologies, it requires a linear string to represent each glycan.[9] The example glycan in WURCS format is shown below:

WURCS=2.0/4,6,5/[a2122h-1b_1-5_2*NCC/3=O][a1122h-1b_1-5][a1122h-1a_1-5][a1221m-1a_1-5]/1-1-2-3-3-4/a4-b1_a6-f1_b4-c1_c3-d1_c6-e1
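The string above can be read as a version, a triple of counts, a list of bracketed unique residues, a residue sequence that indexes into that list, and a list of linkages. The Python sketch below splits a string of this shape into those parts and checks the counts against the body. It is an illustration under that reading of the example, not a full WURCS parser; the name `split_wurcs` is hypothetical, and a regular expression is used because the bracketed residue strings may themselves contain "/".

```python
import re

WURCS_RE = re.compile(
    r"WURCS=(?P<version>[\d.]+)/"
    r"(?P<n_unique>\d+),(?P<n_res>\d+),(?P<n_lin>\d+)/"
    r"(?P<unique>(?:\[[^\]]*\])+)/"
    r"(?P<sequence>[^/]*)/"
    r"(?P<linkages>.*)"
)

def split_wurcs(wurcs):
    """Split a WURCS string into its sections and verify the header counts."""
    match = WURCS_RE.fullmatch(wurcs.strip())
    if match is None:
        raise ValueError("not a WURCS string of the expected shape")
    unique = re.findall(r"\[([^\]]*)\]", match["unique"])
    sequence = match["sequence"].split("-")
    linkages = match["linkages"].split("_") if match["linkages"] else []
    counts = (int(match["n_unique"]), int(match["n_res"]), int(match["n_lin"]))
    if counts != (len(unique), len(sequence), len(linkages)):
        raise ValueError("header counts disagree with the body")
    return {
        "version": match["version"],
        "unique_residues": unique,
        "sequence": sequence,
        "linkages": linkages,
    }
```

For the example, the header "4,6,5" agrees with four unique residues, a six-residue sequence "1-1-2-3-3-4" and five linkages.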

KCF

The KEGG Chemical Function (KCF) format was designed for and is used in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.[10] It also uses a connection table approach. The example glycan in KCF format is shown below:

ENTRY       G10661                      Glycan
NODE        7
            1   Asn        20     3
            2   GlcNAc     10     3
            3   LFuc        0     8
            4   GlcNAc      0    -2
            5   Man       -10    -2
            6   Man       -20     3
            7   Man       -20    -7
EDGE        6
            1     2:b1    1    
            2     3:a1    2:6  
            3     4:b1    2:4  
            4     5:b1    4:4  
            5     6:a1    5:6  
            6     7:a1    5:3  
///
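The NODE and EDGE sections of such an entry can be read into a node table and an edge list with a few lines of Python. This is a sketch under assumptions drawn from the example only: section keywords start in the first column, continuation lines are indented, and each EDGE line is read as "child:anomer-and-carbon" followed by "parent:position". The function name `parse_kcf` is hypothetical, and the two numeric columns after each node name, which appear to be drawing coordinates, are ignored.

```python
def parse_kcf(text):
    """Return (nodes, edges) from the NODE and EDGE sections of a KCF entry.

    nodes maps node id -> residue name.
    edges is a list of (child_id, child_pos, parent_id, parent_pos) with the
    positions kept as strings, e.g. (3, "a1", 2, "6").
    """
    nodes, edges, section = {}, [], None
    for raw in text.splitlines():
        if raw.startswith("///"):  # end-of-entry marker
            break
        fields = raw.split()
        if not fields:
            continue
        if not raw[0].isspace():  # keyword line: ENTRY, NODE, EDGE
            section = fields[0]
            continue
        if section == "NODE":
            nodes[int(fields[0])] = fields[1]
        elif section == "EDGE":
            child_id, _, child_pos = fields[1].partition(":")
            parent_id, _, parent_pos = fields[2].partition(":")
            edges.append((int(child_id), child_pos, int(parent_id), parent_pos))
    return nodes, edges
```

In the example, the first edge has no position on the parent side because node 1 is the asparagine to which the glycan is attached.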

CSDB Linear

The Carbohydrate Structure Database (CSDB) comprises the Bacterial (BCSDB)[11][12] and Plant and Fungal (PFCSDB)[13] parts. The database uses a connection table for internal storage of structures and the CSDB linear code for input and output.

aDManp(1-3)[aDManp(1-6)]bDManp(1-4)[Ac(1-2)]bDGlcpN(1-4)[aLFucp(1-6),Ac(1-2)]bDGlcpN(1-4)xLAsn

GLYCAM Condensed

The GLYCAM Condensed format, like the GLYCAM format, is provided by GLYCAM-Web, which is produced by the research group of Professor Robert J. Woods in the Complex Carbohydrate Research Center at the University of Georgia in Athens, GA.

GLYCAM Condensed: DManpa1-3[DManpa1-6]DManpb1-4DGlcpNAcb1-4[LFucpa1-6]DGlcpNAcb1-ASN

GLYCAM: a-D-Manp-(1-3)-[a-D-Manp-(1-6)]-b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-[a-L-Fucp-(1-6)]-b-D-GlcpNAc-ASN

Glyde and Glyde II

The GLYcan Data Exchange (GLYDE) format[14] is an XML-based representation format for glycomics data. It was part of the Integrated Technology Resources for Biomedical Glycomics, which was established by a team from the Complex Carbohydrate Research Center of the University of Georgia.

<Glycan>
  <residue>
    <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc" ring_form="p">
      <residue link="6" anomeric_carbon="1" anomer="a" chirality="L" monosaccharide="Fuc" ring_form="p">
      </residue>
      <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="GlcNAc" ring_form="p">
        <residue link="4" anomeric_carbon="1" anomer="b" chirality="D" monosaccharide="Man" ring_form="p">
          <residue link="3" anomeric_carbon="1" anomer="a" chirality="D" monosaccharide="Man" ring_form="p">
          </residue>
          <residue link="6" anomeric_carbon="1" anomer="a" chirality="D" monosaccharide="Man" ring_form="p">
          </residue>
        </residue>
      </residue>
    </residue>
  </residue>
</Glycan>
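Because this representation nests each residue inside the residue it is attached to, the linkages can be recovered by a recursive walk over the XML tree. The following Python sketch uses the standard library module `xml.etree.ElementTree`. The function name `glyde_linkages` is hypothetical, and reading the `link` attribute as the position on the parent residue is an interpretation of the example rather than something stated in a schema.

```python
import xml.etree.ElementTree as ET

def glyde_linkages(xml_text):
    """Return (child, anomer, parent_position, parent) tuples in document order."""
    edges = []

    def visit(parent):
        for child in parent.findall("residue"):
            # The outermost <residue> in the example carries no attributes;
            # it is skipped as a residue but its children are still visited.
            if "monosaccharide" in child.attrib:
                edges.append((
                    child.get("monosaccharide"),
                    child.get("anomer"),
                    child.get("link"),
                    parent.get("monosaccharide"),  # None at the attachment point
                ))
            visit(child)

    visit(ET.fromstring(xml_text))
    return edges
```

For the document above, the walk yields six tuples, one per monosaccharide, starting with the reducing-end GlcNAc, whose parent is reported as None.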

GLYDE II,[15] the successor of GLYDE, was developed to overcome the limitations of GLYDE and uses a connection table approach.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE GlydeII SYSTEM "http://glycomics.ccrc.uga.edu/GLYDE-II/GLYDE-II-1.2.DTD"[
  <!ENTITY mDBget "http://www.monosaccharideDB.org/GLYDE-II.jsp?G">
]>
<GlydeII>
  <molecule subtype="glycan" id="M3N2">
    <residue subtype="base_type" partid="1" ref="&mDBget;=b-dglc-HEX-1:5" />
    <residue subtype="substituent" partid="2" ref="&mDBget;=n-acetyl" />
    <residue subtype="base_type" partid="3" ref="&mDBget;=a-lfuc-HEX-1:5" />
    <residue subtype="base_type" partid="4" ref="&mDBget;=b-dglc-HEX-1:5" />
    <residue subtype="substituent" partid="5" ref="&mDBget;=n-acetyl" />
    <residue subtype="base_type" partid="6" ref="&mDBget;=b-dman-HEX-1:5" />
    <residue subtype="base_type" partid="7" ref="&mDBget;=a-dman-HEX-1:5" />
    <residue subtype="base_type" partid="8" ref="&mDBget;=a-dman-HEX-1:5" />
    <residue_link from="2" to="1">
      <atom_link from="N1" to="C2" from_replaces="O2" bond_order="1" />
    </residue_link>
    <residue_link from="3" to="1">
      <atom_link from="C1" to="O6" to_replaces="O1" bond_order="1" />
    </residue_link>
    <residue_link from="4" to="1">
      <atom_link from="C1" to="O4" to_replaces="O1" bond_order="1" />
    </residue_link>
    <residue_link from="5" to="4">
      <atom_link from="N1" to="C2" from_replaces="O2" bond_order="1" />
    </residue_link>
    <residue_link from="6" to="4">
      <atom_link from="C1" to="O4" to_replaces="O1" bond_order="1" />
    </residue_link>
    <residue_link from="7" to="6">
      <atom_link from="C1" to="O3" to_replaces="O1" bond_order="1" />
    </residue_link>
    <residue_link from="8" to="6">
      <atom_link from="C1" to="O6" to_replaces="O1" bond_order="1" />
    </residue_link>
  </molecule>
</GlydeII>

CabosML

The carbohydrate sequence markup language (CabosML)[16] is an XML description of carbohydrate structures.

<?xml version="1.0" encoding="UTF-8" ?>
<g:Glyco xmlns:g="http://bio.mki.co.jp/glycoinformatics/2003">
  <g:Carb_ID/>
  <g:Carb_structure>
    <g:MS name="GlcNAc" >
      <g:MS link="1-6" anom="a" name="Fuc" />
      <g:MS link="1-4" anom="b" name="GlcNAc" >
        <g:MS link="1-4" anom="b" name="Man" >
          <g:MS link="1-3" anom="a" name="Man" />
          <g:MS link="1-6" anom="a" name="Man" />
        </g:MS>
      </g:MS>
    </g:MS>
  </g:Carb_structure>
</g:Glyco>

Symbol formats

Many glycobiologists use figures to depict the complex glycan structures. Currently, there are two major ways to represent glycans using symbols: Symbol Nomenclature For Glycans (SNFG) and Oxford Notation.

Symbol Nomenclature For Glycans

SNFG representation of Man-3-Core F

Oxford notation

The Oxford notation was designed and developed by researchers from the Oxford Glycobiology Institute at the University of Oxford in 2009.

Oxford Glycobiology Institute (UOXF) Notation of Man-3-Core F

Hybrid notation

To comply with the SNFG notation while respecting the Oxford notation, some drawing tools[17] generate hybrid cartoons that use the SNFG symbols for monosaccharides and the linkage orientations set by the Oxford notation.

Hybrid SNFG + Oxford Glycobiology Institute (UOXF) Notation of Man-3-Core F


Format conversion tools

The scientific community has developed a number of software tools that convert glycans from one format to another. Some of the most commonly used tools are listed below:

  1. GlycanFormatConverter:[18] A core library of glycan text conversion tools, which encodes WURCS from IUPAC-Extended, KCF and LinearCode® for the great majority of glycans registered in GlyTouCan.
  2. RINGS: A web resource providing algorithmic and data mining tools to aid glycobiology research.
  3. glypy:[19] An open source glycoinformatics library.

References

  1. ^ Alan D. McNaught, International Union of Pure and App (1996), "Nomenclature of Carbohydrates", Glycoscience, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 2727–2838, doi:10.1007/978-3-540-30429-6_70, ISBN 978-3-540-36154-1, retrieved 2021-10-03
  2. ^ Albersheim, P. (1990-05-01). "CARBBANK--A structural and bibliographic data base". OSTI 5926286.
  3. ^ "Carbohydrate Nomenclature". iupac.qmul.ac.uk. Retrieved 2021-10-03.
  4. ^ Bohne-Lang, Andreas; Lang, Elke; Förster, Thomas; von der Lieth, Claus-W. (2001-11-01). "LINUCS: LInear Notation for Unique description of Carbohydrate Sequences". Carbohydrate Research. 336 (1): 1–11. doi:10.1016/S0008-6215(01)00230-0. ISSN 0008-6215. PMID 11675023.
  5. ^ "Glycosciences.de LINUCS - LInear Notation for Unique description of Carbohydrate Sequences". www.glycosciences.de. Retrieved 2021-10-03.
  6. ^ Banin, Ehud; Neuberger, Yael; Altshuler, Yaniv; Halevi, Asaf; Inbar, Ori; Nir, Dotan; Dukler, Avinoam (2002). "A Novel Linear Code® Nomenclature for Complex Carbohydrates". Trends in Glycoscience and Glycotechnology. 14 (77): 127–137. doi:10.4052/tigg.14.127.
  7. ^ Herget, S.; Ranzinger, R.; Maass, K.; Lieth, C.-W. V. D. (2008-08-11). "GlycoCT-a unifying sequence format for carbohydrates". Carbohydrate Research. 343 (12): 2162–2171. doi:10.1016/j.carres.2008.03.011. ISSN 0008-6215. PMID 18436199.
  8. ^ Ranzinger, René; Herget, Stephan; Wetter, Thomas; von der Lieth, Claus-Wilhelm (2008-09-19). "GlycomeDB - integration of open-access carbohydrate structure databases". BMC Bioinformatics. 9: 384. doi:10.1186/1471-2105-9-384. ISSN 1471-2105. PMC 2567997. PMID 18803830.
  9. ^ Tanaka, Kenichi; Aoki-Kinoshita, Kiyoko F.; Kotera, Masaaki; Sawaki, Hiromichi; Tsuchiya, Shinichiro; Fujita, Noriaki; Shikanai, Toshihide; Kato, Masaki; Kawano, Shin; Yamada, Issaku; Narimatsu, Hisashi (2014-06-23). "WURCS: The Web3 Unique Representation of Carbohydrate Structures". Journal of Chemical Information and Modeling. 54 (6): 1558–1566. doi:10.1021/ci400571e. ISSN 1549-9596. PMID 24897372.
  10. ^ Kotera, Masaaki; Tabei, Yasuo; Yamanishi, Yoshihiro; Moriya, Yuki; Tokimatsu, Toshiaki; Kanehisa, Minoru; Goto, Susumu (2013-12-13). "KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics". BMC Systems Biology. 7 (6): S2. doi:10.1186/1752-0509-7-S6-S2. ISSN 1752-0509. PMC 4029371. PMID 24564846.
  11. ^ Toukach, FV; Knirel, YA (2005). "New database of bacterial carbohydrate structures". Glycoconj. J. 22: 216–217.
  12. ^ Toukach, Philip V. (2011-01-24). "Bacterial Carbohydrate Structure Database 3: Principles and Realization". Journal of Chemical Information and Modeling. 51 (1): 159–170. doi:10.1021/ci100150d. ISSN 1549-9596. PMID 21155523.
  13. ^ Egorova, K. S.; Toukach, P. V. (2014-05-07). "Expansion of coverage of Carbohydrate Structure Database (CSDB)". Carbohydrate Research. EuroCarb 17. 389: 112–114. doi:10.1016/j.carres.2013.10.009. ISSN 0008-6215. PMID 24680503.
  14. ^ Packer, Nicolle H.; von der Lieth, Claus-Wilhelm; Aoki-Kinoshita, Kiyoko F.; Lebrilla, Carlito B.; Paulson, James C.; Raman, Rahul; Rudd, Pauline; Sasisekharan, Ram; Taniguchi, Naoyuki; York, William S. (January 2008). "Frontiers in glycomics: Bioinformatics and biomarkers in disease An NIH White Paper prepared from discussions by the focus groups at a workshop on the NIH campus, Bethesda MD (September 11–13, 2006)". Proteomics. 8 (1): 8–20. doi:10.1002/pmic.200700917. PMID 18095367. S2CID 23513084.
  15. ^ Ranzinger, Rene; Kochut, Krys J.; Miller, John A.; Eavenson, Matthew; Lütteke, Thomas; York, William S. (2017-01-01). "GLYDE-II: The GLYcan data exchange format". Perspectives in Science. Proceedings of the Beilstein Glyco-Bioinformatics Symposium 2015. 11: 24–30. doi:10.1016/j.pisc.2016.05.013. ISSN 2213-0209. PMC 5611833. PMID 28955652.
  16. ^ Kikuchi, N.; Kameyama, A.; Nakaya, S.; Ito, H.; Sato, T.; Shikanai, T.; Takahashi, Y.; Narimatsu, H. (2004-11-25). "The carbohydrate sequence markup language (CabosML): an XML description of carbohydrate structures". Bioinformatics. 21 (8): 1717–1718. doi:10.1093/bioinformatics/bti152. ISSN 1367-4803. PMID 15564307.
  17. ^ Lal, K.; Bermeo, R.; Perez, S. (2020-10-02). "Computational tools for drawing, building and displaying carbohydrates: a visual guide". Beilstein J. Org. Chem. 16: 2448–2468. doi:10.3762/bjoc.16.199. PMID 33082879.
  18. ^ Tsuchiya, Shinichiro; Yamada, Issaku; Aoki-Kinoshita, Kiyoko F (2018-12-07). "GlycanFormatConverter: a conversion tool for translating the complexities of glycans". Bioinformatics. 35 (14): 2434–2440. doi:10.1093/bioinformatics/bty990. ISSN 1367-4803. PMC 6612873. PMID 30535258.
  19. ^ Klein, Joshua; Zaia, Joseph (2019-09-06). "glypy: An Open Source Glycoinformatics Library". Journal of Proteome Research. 18 (9): 3532–3537. doi:10.1021/acs.jproteome.9b00367. ISSN 1535-3893. PMC 7158751. PMID 31310539.