Motivation

Traditional Chinese Medicine (TCM), with thousands of years of clinical practice behind it, plays an important role in maintaining the health of people across Asia and is increasingly used worldwide. However, TCM differs from modern Western medicine in substance, methodology and philosophy, which has hindered its recognition in Western countries. To make TCM serve people all over the world better, the ancient practice must be brought into line with modern standards, and elucidating the molecular mechanism of TCM is one of the key issues. Owing to the diversity of TCM ingredients and the complexity of their interactions with the human body, uncovering this mechanism remains difficult. Clarifying the molecular mechanism has become a bottleneck in the modernization and internationalization of TCM that urgently needs to be resolved.

BATMAN-TCM is the first online bioinformatics analysis tool designed specifically for research into the molecular mechanism of TCM. It is based mainly on target prediction for TCM ingredients, followed by network pharmacology analyses of the potential targets. It aims to contribute to the understanding of the “multi-component, multi-target and multi-pathway” combinational therapeutic mechanism of TCM and to provide clues for subsequent experimental validation.


Main functions

The main functions of BATMAN-TCM include:

  • The target prediction for each ingredient of the submitted TCM;
  • Bioinformatics analyses of the TCM’s (potential) targets, including Gene Ontology (GO) function term, KEGG biological pathway and OMIM/TTD disease enrichment analyses;
  • The visualization of the ingredient-target-pathway/disease association network and biological pathways with highlighted TCM’s target proteins;
  • Combinational/comparison analyses of multiple TCM formulas/herbs or herb lists/compound lists. This function is mainly designed to analyze the constituent herbs of a formula simultaneously, helping to understand the combinational principle of the formula at the molecular and systems level.

  • Function2TCM: search for TCM formulas/herbs significantly associated with a certain GO term/KEGG pathway/TTD or OMIM disease term. This function is based on BATMAN-TCM’s analysis results for TCMID formulas/herbs.

    Browser compatibility

    Chrome is the recommended browser for BATMAN-TCM.


    Input data type

    BATMAN-TCM supports three input types: 1) Formula; 2) Herb or Herb list; 3) Compound list. The three input types are described in detail below; for your convenience, corresponding demo examples are also provided on the homepage.

    Type 1: Formula (Example 1)

    Select input type "Formula";

    Input the formula’s Pinyin name (e.g. LIU WEI DI HUANG TANG), which is case insensitive;

    Only one formula can be submitted at a time for this input type.

    Type 2: Herb or Herb list (Example 2)

    This input type is designed mainly for the research of a herb or a formula composed of multiple herbs.

    Select input type “Herb or Herb list”;

    Input the herb’s Pinyin name (e.g. REN SHEN), Latin name (Ginseng) or English name (Panax ginseng), which are case insensitive;

    For this input type, multiple herbs are supported, one herb per line.

    Attention: This input type enables user-customized analyses. A formula that is not in our database can be analyzed as long as the list of its constituent herbs is provided.

    Type 3: Compound list (Example 3)

    Select input type “Compound list”;

    Input compounds as PubChem CIDs (Pubchem_CID) or as chemical structures in InChI format, one compound per line;

    For example: Pubchem_CID:867 or
    InChI format:
    InChI=1S/C27H30O16/c1-8-23(42-27-21(37)19(35)17(33)15(7-28)41-27)20(36)22(38)26(39-8)43-25-18(34)16-13(32)5-10(29)6-14(16)40-24(25)9-2-3-11(30)12(31)4-9/h2-6,8,15,17,19-23,26-33,35-38H,7H2,1H3/t8-,15-,17-,19+,20-,21-,22-,23-,26+,27+/m0/s1

    Attention: This input type enables user-customized analyses. Users can analyze a formula or herb that is not in our database as long as its constituent compound list is provided. Users can also analyze the molecular mechanism of a TCM with a user-defined compound list, for example the TCM’s active ingredients, metabolites of the original ingredients, ingredients detected in plasma, or other compounds that prior knowledge suggests contribute to its therapeutic effects.
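
    The one-compound-per-line format can be checked before submission with a small script. The following is a minimal sketch, not BATMAN-TCM's own parser: the function name, the regular expressions and the choice to accept a CID with or without the "Pubchem_CID:" prefix are all assumptions made for illustration. The short InChI in the example is that of methanol.

```python
import re

# Assumed patterns: an integer CID with an optional "Pubchem_CID:" prefix,
# or a string that starts like a standard InChI.
CID_PATTERN = re.compile(r"^(?:pubchem_cid\s*:\s*)?(\d+)$", re.IGNORECASE)
INCHI_PATTERN = re.compile(r"^InChI=1S?/\S+$")


def parse_compound_list(text):
    """Classify each non-empty line as a PubChem CID or an InChI string."""
    compounds = []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        entry = raw.strip()
        if not entry:
            continue  # blank lines are skipped
        cid_match = CID_PATTERN.match(entry)
        if cid_match:
            compounds.append(("cid", int(cid_match.group(1))))
        elif INCHI_PATTERN.match(entry):
            compounds.append(("inchi", entry))
        else:
            raise ValueError(
                f"line {line_number}: not a PubChem CID or InChI: {entry!r}"
            )
    return compounds


example = "Pubchem_CID:867\nInChI=1S/CH4O/c1-2/h2H,1H3\n"
print(parse_compound_list(example))
```

    A line that matches neither pattern raises a ValueError naming the offending line, so a malformed list is caught before it is pasted into the input box.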


    Multiple queries

    Besides the three input types above, BATMAN-TCM also lets users analyze multiple queries simultaneously (i.e. TCM formulas/herbs or herb lists/compound lists) for combinational/comparison analyses, using the “Add one cluster” function.

    Motivation of multiple queries (clusters):

    Besides supporting the research of holistic molecular mechanism of a formula or herb, BATMAN-TCM also provides the function of combinational/comparison analyses of multiple queries (clusters).

    A formula, the main therapeutic concept in TCM, is a combination of multiple herbs organized according to the combinatorial principle of “emperor-minister-assistant-courier” (“jun-chen-zuo-shi” in Chinese). The “jun” herb treats the main cause or primary symptoms of a disease, the “chen” herb assists the “jun” herb to augment its therapeutic effects and relieve secondary symptoms, the “zuo” herb is often used to counteract the toxicity or side effects of other herbs, and the “shi” herb generally ensures the absorption of the formula’s components and helps deliver or guide them to the target organs.

    The combinational analysis function was originally designed for comparing the constituent herbs of a formula, helping to understand the combinational principle of the formula at the molecular and systems level. It can of course also be used to compare any formulas, herbs/herb lists or compound lists for users’ own purposes.

    Cluster definition and submission (Example 4)

    By clicking the “Add one cluster” button on the BATMAN-TCM homepage, users can submit multiple queries (i.e. multiple formulas/herbs or herb lists/compound lists) for combinational/comparison analyses. Each query is then defined as a “cluster” in BATMAN-TCM. You can give each cluster its own “Cluster name” to tell the clusters apart; the default name is “ClusterN”. Currently a maximum of 4 clusters is supported.

    Please click “Example 4” on the homepage for a demo of this function, which performs the comparison/combinational analyses of the constituent herbs of the formula “Ma-huang Decoction”.


    Parameter setting

    Score_cutoff: For each TCM ingredient, BATMAN-TCM ranks the predicted candidate targets in order of decreasing score given by the drug-target interaction prediction algorithm (see the “The target prediction method” part of this tutorial for a detailed introduction to the algorithm). Predicted candidate targets with scores>=Score_cutoff (including known direct targets backed by DrugBank, KEGG and TTD) are taken as the potential targets of the query compound. The subsequent network pharmacology analyses are based on these potential targets. Score_cutoff can also be changed on the Result page (see the “Explanation of the analysis results” part of this tutorial for details).
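
    The effect of Score_cutoff on one ingredient can be sketched as follows. This is an illustration only: the function name, gene labels and scores are hypothetical, and the sketch assumes that known targets are always kept and listed first, regardless of their predicted score.

```python
def select_potential_targets(scores, score_cutoff, known_targets=()):
    """Return known targets first, then predicted targets by decreasing score.

    scores: dict mapping a target label to its prediction score (made-up values).
    known_targets: targets with database support (assumed to be always kept).
    """
    known = list(known_targets)
    known_set = set(known)
    predicted = sorted(
        (
            gene
            for gene, score in scores.items()
            if score >= score_cutoff and gene not in known_set
        ),
        key=lambda gene: scores[gene],
        reverse=True,
    )
    return known + predicted


# Hypothetical scores for one ingredient.
scores = {"GENE_A": 48.0, "GENE_B": 22.5, "GENE_C": 9.1}
print(select_potential_targets(scores, 20, known_targets=["GENE_K"]))
# ['GENE_K', 'GENE_A', 'GENE_B']
```

    Raising the cutoff shrinks the list to the most confident predictions; lowering it admits more candidates into the downstream enrichment and network analyses.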

    Adjusted P-value cutoff: BATMAN-TCM analyzes the GO functional terms, KEGG biological pathways and TTD/OMIM diseases that are significantly enriched among the potential targets of the TCM of interest. This parameter sets the significance cutoff for the P-value after Benjamini-Hochberg multiple testing correction (i.e. the adjusted P-value). The default value is 0.05.
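
    For readers unfamiliar with the correction, the standard Benjamini-Hochberg procedure can be sketched as below. This is a textbook version with made-up P-values, not BATMAN-TCM's own code; comparing with "<=" against the cutoff is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values never increase as the raw P-values decrease.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.001, 0.04, 0.03, 0.2]  # made-up P-values for four terms
adjusted = benjamini_hochberg(raw)
cutoff = 0.05  # the default adjusted P-value cutoff
print([p <= cutoff for p in adjusted])
```

    In this example only the first term remains significant after correction, although three of the four raw P-values are below 0.05.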

    E-mail notification (optional): If you enter your e-mail address, two e-mails will be sent to you. The first is sent after your job is submitted and provides a private URL for checking the progress of your job and the analysis results at any time. Note: if you do not receive this e-mail within ten minutes, please check your spam folder. The second is sent when your job is finished.


    Explanation of the analysis results

    Parameters adjustment


    On the result page, the Score_cutoff and adjusted P-value cutoff parameters set on the homepage can be changed, and all results (Result 1, 2 and 3) will then be re-analyzed (Figure 1).

    Figure 1. The screenshot of “Parameters adjustment”.



    Result 1: Target prediction result


    Target Prediction Result

    For each TCM ingredient, the result table lists the predicted candidate targets (denoted by Gene Symbol) with scores>=Score_cutoff, ranked in order of decreasing score given by the target prediction method. A screenshot of the target prediction result table for Example 1 is provided below (Figure 2). You can also visit this result by clicking here.


    Figure 2. A screenshot of the target prediction result page for Example 1



    A: According to user-defined parameter, only the predicted candidate targets with scores>=Score_cutoff will be presented in the result table and be considered as potential targets.

    B: A summary of the user’s query is given first, including the TCM formula, its constituent herbs and the chemical ingredient list, with hyperlinks to the corresponding TCMID database entries (the formula-herb-ingredient associations in BATMAN-TCM are backed by data from the TCMID database).

    C: The target prediction result table. Compounds are crosslinked to PubChem or TCMID, and targets to GenBank. For each ingredient, known targets reported by the DrugBank (version: 20150726), KEGG (version: July 31, 2014) and TTD (version: 4.3.02) databases (marked by “known target in DrugBank, KEGG or TTD”) are listed first, if any, followed by predicted targets.

    D: Users can search for a protein of interest among these targets. Only one keyword is allowed.

    E: The complete target prediction result can be downloaded; it contains, for each constituent compound, the predicted candidate targets with scores of at least 2.


    Venn diagram for comparison between the predicted target sets of different clusters

    If multiple clusters are submitted with the “Add one cluster” function of BATMAN-TCM, a Venn graph comparing the target sets of the clusters is produced in addition to the “Target Prediction Result” for each query. The Venn graph gives the number of targets common to multiple clusters or specific to one cluster, and clicking on an area of the graph shows the corresponding target list in the box on the right.

    A screenshot of the Venn graph for Example 4 is provided below (Figure 3). You can also visit this result by clicking here.

    Figure 3. A screenshot of the Venn graph for Example 4
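
    The areas of such a Venn graph correspond to targets found in exactly one combination of clusters. A minimal sketch of that bookkeeping is given below; the function name, cluster names and target labels are hypothetical, and this is not the code behind the web page.

```python
from itertools import combinations


def venn_regions(cluster_targets):
    """Map each combination of clusters to the targets found in exactly those clusters."""
    names = sorted(cluster_targets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Targets shared by every cluster in the combination ...
            inside = set.intersection(*(set(cluster_targets[n]) for n in combo))
            # ... minus targets that also occur in any other cluster.
            outside = set().union(
                *(set(cluster_targets[n]) for n in names if n not in combo)
            )
            exclusive = inside - outside
            if exclusive:
                regions[combo] = sorted(exclusive)
    return regions


clusters = {
    "Cluster1": {"T1", "T2", "T3"},
    "Cluster2": {"T2", "T3", "T4"},
    "Cluster3": {"T3", "T5"},
}
for combo, targets in venn_regions(clusters).items():
    print(" & ".join(combo), len(targets), targets)
```

    With the maximum of 4 clusters there are at most 15 such areas, so enumerating every combination stays cheap.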



    Result 2: Bioinformatics analyses of potential targets

    For these potential targets of TCM, BATMAN-TCM further provides three enrichment analyses: KEGG biological pathway, GO functional annotation and OMIM/TTD disease enrichment analyses. The result table presents the KEGG biological pathways, GO functional terms (including biological process, molecular function and cellular component) and OMIM disease phenotypes/TTD diseases that are significantly enriched among the potential targets of the TCM’s ingredients, together with the corresponding adjusted P-value and the targets mapped to each term.
    Attention: The enrichment analyses are based on the predicted candidate targets with scores>=Score_cutoff (including known targets).
    All the enrichment analysis results can be downloaded by clicking on the link “Download all the enrichment analysis results” at the top right corner of the Result 2 page.

    The Result 2 pages for Example 4 shown in the screenshots of this tutorial can be visited by clicking here.



    KEGG pathway enrichment analysis result

    The biological pathways significantly enriched among the potential targets of the TCM of interest may play crucial roles in its therapeutic effects, providing direct clues for further experimental validation of the molecular mechanism of TCM.

    Figure 4. The screenshot of the KEGG pathway enrichment analysis result table for Example 4

    (1) KEGG pathway; (2) KEGG pathway with “tree view”.



    A: Here we provide two kinds of views to present the KEGG pathway enrichment analysis result. Tree view shows the hierarchy of pathways.

    B: Clicking on the “Pathway Graph” will lead to the pathway view with highlighted targets (denoted by different colors for different clusters). Please see the “Pathway mapping” part of this tutorial for details.

    C: “Targets” refers to the targets mapped to the pathway; clicking on the number shows the detailed target list.

    D: The significantly enriched pathways with adjusted P-value smaller than the cutoff set by users are highlighted in red.

    E: The “KEGG pathway IDs” are crosslinked to the KEGG database.


    Pathway mapping

    Clicking on the “Pathway Graph” leads to the pathway view with highlighted targets (denoted by different colors for different clusters). The pathway view of the targets and their upstream/downstream relationships supports in-depth analysis of TCM’s molecular mechanism and provides clues for further experimental validation.

    This function is based on the R package “Pathview”, a tool designed for pathway-based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs (Luo W et al. Bioinformatics, 2013, 29(14): 1830-1831). If multiple clusters are submitted, targets from different clusters are shown in different colors.


    Figure 5. The screenshot of the pathway view for Example 4



    GO enrichment analysis result

    The significantly enriched GO functional terms (including biological processes, molecular functions and cellular components) may play crucial roles in TCM’s therapeutic effects, also providing direct clues for subsequent experimental validation of the molecular mechanism of TCM. The GO enrichment analysis result is presented as a tree structure that shows the hierarchical relationships between GO terms.

    Figure 6. The screenshot of the GO enrichment analysis result table for Example 4

    A: The “GO term IDs” are crosslinked to the GO website. GO terms of the three categories are shown in different colors.



    Disease enrichment analysis result

    TCM is prescribed for a particular “pattern” (“zheng” in Chinese), while modern Western drugs are designed to treat a particular disease. The diseases overrepresented among the potential targets of TCM connect TCM to diseases through targets and disease-related genes, not only bridging the gap between TCM and modern Western medicine but also providing clues for exploring additional diseases on which the TCM may have therapeutic effects. The disease-gene association data are taken from OMIM and TTD respectively.

    Figure 7. The screenshot of the disease enrichment analysis result table for Example 4

    A. Disease enrichment analysis result based on disease-gene associations from OMIM database.

    B. Disease enrichment analysis result based on disease-gene associations from TTD database.



    Result 3: Network visualization

    TCM is a typical “multi-component and multi-target” agent, and its molecular mechanism is especially well suited to network-based analyses, which capture the complexity of TCM in a simple and intuitive way. The “ingredient-target-pathway/disease” association network helps users intuitively understand the “multi-component, multi-target and multi-pathway” combinational therapeutic mechanism and the potential disease treatment mechanism of TCM.

    In the association network, four kinds of nodes are distinguished by shape and color: TCM ingredients, targets, biological pathways and OMIM/TTD diseases. There are three types of edges: ingredient-target associations (the protein is a known or potential target of the ingredient), target-pathway associations (the target protein is a member of the biological pathway) and target-disease associations (the target protein is a known disease-related gene). To emphasize the important elements, the size of each target, pathway and disease node is proportional to its degree in the network, defined respectively as the number of compounds acting on the target, the number of targets involved in the pathway and the number of targets that are known disease-related genes. Users can also restrict the network to targets with more than M linked compounds, where M is adjusted with the slider on the result page.

    There are two types of network view. Unlike the “Whole network view”, the “Simplified network view” shows only the significantly enriched pathways/diseases (adjusted P-value <= cutoff set by users).

    The network graph is drawn based on the predicted candidate target proteins with scores>=Score_cutoff (user defined) for each ingredient of the query TCM.
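
    The node degrees and the slider filter described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind the web page: the function name, argument names and all labels are hypothetical, and pathway and disease degrees are counted over the targets that pass the filter.

```python
from collections import defaultdict


def build_network(ingredient_targets, pathway_members, disease_genes, min_compounds=0):
    """Return (edges, degrees) for an ingredient-target-pathway/disease network."""
    # Target degree: number of compounds acting on the target.
    target_degree = defaultdict(int)
    for targets in ingredient_targets.values():
        for target in set(targets):
            target_degree[target] += 1

    # Keep targets linked to more than `min_compounds` compounds (the slider value M).
    kept = {t for t, degree in target_degree.items() if degree > min_compounds}

    edges = [
        (ingredient, target)
        for ingredient, targets in ingredient_targets.items()
        for target in sorted(set(targets) & kept)
    ]
    degrees = {t: target_degree[t] for t in kept}

    # Pathway/disease degree: number of kept targets mapped to the term.
    for label, members in list(pathway_members.items()) + list(disease_genes.items()):
        hits = sorted(set(members) & kept)
        if hits:
            degrees[label] = len(hits)
            edges.extend((target, label) for target in hits)
    return edges, degrees


ingredient_targets = {"cpd1": ["T1", "T2"], "cpd2": ["T2", "T3"], "cpd3": ["T2"]}
pathway_members = {"pathwayX": ["T1", "T2", "T9"]}
disease_genes = {"diseaseY": ["T3"]}
edges, degrees = build_network(
    ingredient_targets, pathway_members, disease_genes, min_compounds=1
)
print(degrees)
```

    With min_compounds=1, only T2 (hit by three compounds) survives, so the disease node connected solely through T3 disappears from the network.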

    A screenshot of the “Network visualization” for Example 4 is provided below (Figure 8). You can also visit this result by clicking here.

    Figure 8. A screenshot of the “Network visualization” for Example 4


    The target prediction method

    Here we use a similarity-based method to predict potential targets of TCM ingredients. The core idea of this method, first proposed by Perlman et al. [J Comput Biol. 2011 Feb;18(2):133-45], is to rank potential drug-target interactions by their similarity to known drug-target interactions.

    Drug-target prediction was treated as a binary classification problem, and the prediction model was constructed to distinguish true from false drug-target interactions. For each query drug-protein interaction, the value of a classification feature was defined as the largest similarity score between the query interaction and all known drug-target interactions (i.e. the gold standard positive dataset). The similarity between two drug-target interactions was calculated as the product of their drug similarity score and their target similarity score.

    We used 6 scores to measure drug-drug similarity, based respectively on chemical structure (FP2 fingerprint-based and functional group-based similarity scores), side effects, the ATC (Anatomical, Therapeutic and Chemical) classification system, drug-induced gene expression and the text-mining score of chemical-chemical association. We used 3 scores to measure protein-protein similarity, based respectively on protein sequence, closeness in a protein interaction network and Gene Ontology (GO) functional annotation. This results in 18 (=6×3) features in total. After minimum redundancy maximum relevance (mRMR) feature selection, the prediction model used in BATMAN-TCM integrates 8 features: ATC-GO, FP2-closeness, STITCH-sequence, expression-closeness, ATC-sequence, functional_group-sequence, functional_group-GO and side_effect-sequence.

    For a query interaction, the maximum of the likelihood ratios (LRs) of the 8 features is used as the final prediction score. The LR of a feature f is the ratio of the probability of observing f in the gold standard positive (GSP) dataset to that in the gold standard negative (GSN) dataset, and measures the confidence level of the feature. Generally, LR>1 means the feature has predictive ability.

    Comprehensive evaluation showed that the prediction method performs well (see details in our future publication).

    In BATMAN-TCM, for a query compound, we rank all candidate proteins in order of decreasing prediction score given by the prediction model, and consider proteins with high scores to be the potential drug targets of the query compound.
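
    The scoring steps described in this part can be sketched in a few lines. This is a toy illustration, not the BATMAN-TCM implementation: all function names and numbers are hypothetical, and the way the likelihood ratio is estimated here (binning the feature values, with add-one smoothing) is an assumption that the tutorial does not specify.

```python
from bisect import bisect_right


def interaction_feature(query_drug, query_protein, known_interactions, drug_sim, protein_sim):
    """Largest similarity between the query pair and any known drug-target pair.

    The similarity of two interactions is the product of the drug similarity
    and the target similarity.
    """
    return max(
        drug_sim(query_drug, known_drug) * protein_sim(query_protein, known_protein)
        for known_drug, known_protein in known_interactions
    )


def likelihood_ratio(value, positive_values, negative_values, bin_edges):
    """Estimate P(feature bin | GSP) / P(feature bin | GSN).

    Binning and add-one smoothing are assumptions made for this sketch.
    """
    n_bins = len(bin_edges) + 1
    target_bin = bisect_right(bin_edges, value)

    def bin_fraction(values):
        in_bin = sum(1 for v in values if bisect_right(bin_edges, v) == target_bin)
        return (in_bin + 1) / (len(values) + n_bins)

    return bin_fraction(positive_values) / bin_fraction(negative_values)


def prediction_score(likelihood_ratios):
    """Final score of a query interaction: the maximum LR over the features."""
    return max(likelihood_ratios)


# Toy similarity tables (made-up numbers).
DRUG_SIM = {("q", "d1"): 0.9, ("q", "d2"): 0.4}
PROTEIN_SIM = {("p", "t1"): 0.5, ("p", "t2"): 1.0}
known = [("d1", "t1"), ("d2", "t2")]

feature = interaction_feature(
    "q", "p", known,
    drug_sim=lambda a, b: DRUG_SIM[(a, b)],
    protein_sim=lambda a, b: PROTEIN_SIM[(a, b)],
)
lr = likelihood_ratio(
    feature,
    positive_values=[0.3, 0.45, 0.8, 0.4],  # feature values in a toy GSP set
    negative_values=[0.1, 0.2, 0.3, 0.9],   # feature values in a toy GSN set
    bin_edges=[0.25, 0.5, 0.75],
)
print(feature, lr, prediction_score([lr, 0.7, 1.3]))
```

    In the real model each of the 8 selected features would contribute one LR; here the second and third LRs passed to prediction_score are placeholders.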


    The method for enrichment analyses

    The enrichment analyses of GO functional terms, KEGG biological pathways and OMIM phenotypes/TTD diseases for a group of protein targets are all based on the hypergeometric cumulative distribution test (for the KEGG pathway enrichment analysis, targets that occur in >=10% of KEGG pathways are excluded). P-values are corrected for multiple testing with the Benjamini-Hochberg method.
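
    A hypergeometric enrichment test of this kind can be sketched with the Python standard library alone. The sketch below uses the usual upper-tail formulation, P(X >= overlap); the function names, the background population size and the numbers in the example are assumptions for illustration, not values used by BATMAN-TCM.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Upper-tail hypergeometric P-value, P(X >= overlap).

    population: number of background genes (assumed background definition)
    annotated:  background genes annotated to the term
    sample:     number of potential targets tested
    overlap:    targets annotated to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def drop_promiscuous_targets(targets, pathway_members, max_fraction=0.10):
    """Exclude targets that occur in >= max_fraction of all pathways."""
    n_pathways = len(pathway_members)

    def fraction(target):
        hits = sum(1 for members in pathway_members.values() if target in members)
        return hits / n_pathways

    return [t for t in targets if fraction(t) < max_fraction]


# Toy numbers: 20 background genes, 5 in the pathway, 4 targets, 2 of them in the pathway.
print(hypergeom_enrichment_p(population=20, annotated=5, sample=4, overlap=2))
```

    The resulting P-values, one per term, would then be passed through the Benjamini-Hochberg correction before being compared with the adjusted P-value cutoff.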


    Publications

    We would be very grateful if you could let us know of your publications based on BATMAN-TCM!

    Contact: Zhongyang Liu, liuzy1984@163.com, Beijing Proteome Research Center, Beijing, China.