GeneMANIA Help and documentation

Command line tools

Installation

First, you need to download the GeneMANIA JAR file. If you already installed the plugin through Cytoscape, you can find it in one of the following places:

  • Unix/Mac: ~/.cytoscape/Cytoscape Version/plugins/GeneMANIA-Version/
  • Windows: My Documents\.cytoscape\Cytoscape Version\plugins\GeneMANIA-Version\

Available Tools:

Gene Sanitizer: Prints out the mappings between the given gene list and GeneMANIA's preferred identifiers.
Id Importer: Creates a new data set from a set of identifiers and aliases. The identifiers correspond to node labels.
Query Runner: Runs one or more predictions and writes the results to disk. Each prediction needs to be provided in the form of a query file. One prediction report is generated for each query file.
Cross Validator: Performs k-fold cross validation on the prediction algorithm for a given set of pre-classified genes. Cross Validator reports on the following evaluation measures: area under the ROC curve (AUC-ROC), area under the precision-recall curve (AUC-PR), and precision at fixed recall.
Network Assessor: Assesses the value of a set of networks by performing k-fold cross validation against a baseline network set, as well as the networks to assess. The percentage error of each validation measure is computed for each query in the validation set and reported.
Network Importer: Imports network/profile data from a file into a GeneMANIA data set.
Validation Set Maker: Produces sets of genes based on Gene Ontology (GO) annotations for use in cross validation. One gene set is created for each GO category in the ontology. More specific annotations are propagated up to all genes associated with any of the parent annotations.

Gene Sanitizer

Prints out the mappings between the given gene list and GeneMANIA's preferred identifiers. This tool is useful for checking which of your genes are recognized by GeneMANIA. The output is a tab-delimited text file containing one mapping per line. The first item is GeneMANIA's preferred identifier, or nothing, if the identifier that follows isn't recognized.

Usage:

java -Xmx900M -cp GeneMANIA.jar org.genemania.plugin.apps.GeneSanitizer options gene-list-file

Example Gene List:

YMR043W
YPR113W
YCL067C
YIL015W
YNOT?
YCR084C
YFL026W
YHR084W
YGL008C
YNL145W

Example Output:

YMR043W MCM1
YPR113W PIS1
YCL067C HMLALPHA2
YIL015W BAR1
YNOT?
YCR084C TUP1
YFL026W STE2
YHR084W STE12
YGL008C PMA1
YNL145W MFA2
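
To pick out the unrecognized genes from a saved Gene Sanitizer report, a small script along these lines may help. It is only a sketch: the function name and the sample lines are illustrative, and it deliberately does not assume a column order. It treats any line with a single non-empty tab-delimited field as an unrecognized identifier.

```python
def unrecognized_genes(lines):
    """Return identifiers from lines that carry only one non-empty field."""
    missing = []
    for line in lines:
        # Split on tabs and drop empty fields (a missing mapping leaves one field).
        fields = [f for f in line.rstrip("\n").split("\t") if f.strip()]
        if len(fields) == 1:
            missing.append(fields[0])
    return missing


# Illustrative sample; real input would be read from the tool's output file.
sample = ["YMR043W\tMCM1\n", "YNOT?\n", "YCR084C\tTUP1\n"]
print(unrecognized_genes(sample))
```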

Options:

Name Description
--data directory Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28).
--organism name The name or taxonomy id of an organism whose genes should be considered.

Id Importer

Creates a new data set from a set of identifiers and aliases. The identifiers correspond to node labels. Although the resulting data set is generally treated like an organism, where the given ids denote its genome, it does not have to be an organism. The identifiers can be anything, as long as they're unique within the data set.

Usage:

java -Xmx900M -cp GeneMANIA.jar org.genemania.plugin.apps.IdImporter options

Options:

Name Description
--data directory Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28).
--filename file-name The path to a file that contains a complete set of identifiers that will serve as the basis of a new data set. Each line in the file should follow this format:
primary-id ( \t alias-1 ... )
--name entity-name The name of the resulting entity (e.g. organism).
--alias entity-name Optional. An alias for the resulting entity (e.g. a shorter, informal name).
--taxid number Optional. The taxonomy id of the resulting entity, if applicable.
--description description Optional. A description of the resulting entity.
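
The identifier file expected by --filename can be generated from any list of ids and aliases. The following is a minimal sketch, assuming the line format given above (primary id followed by optional tab-separated aliases). The function name and the sample ids are hypothetical, and only the primary ids are checked for uniqueness.

```python
def format_id_lines(entries):
    """entries: iterable of (primary_id, aliases). Returns tab-delimited lines."""
    seen = set()
    lines = []
    for primary, aliases in entries:
        # Identifiers must be unique within the data set.
        if primary in seen:
            raise ValueError(f"duplicate identifier: {primary}")
        seen.add(primary)
        lines.append("\t".join([primary, *aliases]))
    return lines


# Hypothetical node labels, for illustration only.
entries = [("NODE1", ["alpha", "a1"]), ("NODE2", [])]
print("\n".join(format_id_lines(entries)))
```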

Query Runner

Runs one or more predictions and writes the results to disk. Each prediction needs to be provided in the form of a query file. One prediction report is generated for each query file.

Usage (32-bit JVM):

java -Xmx1800M -cp GeneMANIA.jar org.genemania.plugin.apps.QueryRunner options query-file-1 [ query-file-2 ... ]

Usage (64-bit JVM):

java -d64 -Xmx3G -cp GeneMANIA.jar org.genemania.plugin.apps.QueryRunner options query-file-1 [ query-file-2 ... ]

Options:

Name Description
--data directory Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28).
--in input-format Optional. The format of the query files, which can be one of:
  • flat (default): Tab-delimited (see Example Query (Flat) below).
  • xml: Not yet supported.
--out output-format Optional. The format of the output files, which can be one of:
  • genes (default): List of result genes ordered by score; one per line.
  • flat: Tab-delimited report containing details of prediction results and query parameters.
  • xml: XML-formatted report containing details of prediction results and query parameters.
  • scores: List of result genes with scores ordered by score for the entire genome (ignores related genes limit); one per line.
--scoring-method method Optional. The method used to compute the gene scores, which can be one of:
  • discriminant (default): GeneMANIA's classic scoring method.
  • z: Z-scores.
--ids id-types Optional. A comma-separated list of identifier types, in descending order of preference, which may be one or more of the following:
  • Ensembl Gene Name
  • Entrez Gene Name
  • Ensembl Gene ID
  • RefSeq mRNA ID
  • TAIR ID
  • Uniprot ID
  • RefSeq Protein ID
  • Ensembl Protein ID
  • Entrez Gene ID
If the most preferred identifier is not available for a given gene, the next most preferred identifier is selected. The list above reflects the default order of preference.
--results directory Optional. Path to where the prediction result files will be created (one per input query file). Defaults to the current working directory.
--threads number Optional. The maximum number of parallel predictions. Ideally this should be set to the number of processing cores. Defaults to 1.
--verbose Optional. Makes QueryRunner print more details about what's happening.
--list-networks organism-name Optional. Lists the available networks for the given organism. You may need to put quotes around the organism name if invoked from a shell.
--list-genes organism-name Optional. Lists the genes that are recognized for the given organism. You may need to put quotes around the organism name if invoked from a shell. Each line in the output contains a gene and all its synonyms, if any.
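
When Query Runner is driven from a script, it can be convenient to assemble the command line as an argument list. The sketch below only builds the list and does not run anything. It uses the 32-bit usage line and option names from this page; the helper name and its defaults are assumptions, as is passing each option and its value as separate arguments.

```python
OUTPUT_FORMATS = {"genes", "flat", "xml", "scores"}


def query_runner_command(data_dir, query_files, out_format="genes",
                         results_dir=None, threads=1, jar="GeneMANIA.jar"):
    """Build an argument list for QueryRunner (suitable for subprocess.run)."""
    if out_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {out_format}")
    cmd = ["java", "-Xmx1800M", "-cp", jar,
           "org.genemania.plugin.apps.QueryRunner",
           "--data", data_dir, "--out", out_format, "--threads", str(threads)]
    if results_dir is not None:
        cmd += ["--results", results_dir]
    # Query files come last, after all options.
    return cmd + list(query_files)


cmd = query_runner_command("/Users/username/genemania_plugin/gmdata-2010-07-28",
                           ["yeast-example.query"], out_format="flat", threads=4)
print(" ".join(cmd))
```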

Example Query (Flat):

yeast-example.query

S. Cerevisiae
CDC27 APC11 APC4 XRS2 RAD54 APC2 RAD52 RAD10 MRE11 APC5
coexp pi gi
150
bp

Flat Query File Format:

organism-name
query-gene-1 [ \t query-gene-2 ... ]
networks
related-gene-limit
[ combining-method ]
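
A flat query file can also be generated programmatically. The sketch below assumes that each component of the format sits on its own line (organism, genes, networks, limit, optional combining method) and that networks are separated by tabs like the genes. Both points are assumptions to verify against a working query file. The function name and the network_sep parameter are hypothetical.

```python
def build_flat_query(organism, genes, networks, limit,
                     combining_method=None, network_sep="\t"):
    """Return the text of a flat query file, one component per line."""
    lines = [
        organism,
        "\t".join(genes),             # query genes are tab-delimited
        network_sep.join(networks),   # separator is an assumption
        str(limit),                   # related-gene limit
    ]
    if combining_method:
        lines.append(combining_method)
    return "\n".join(lines) + "\n"


query = build_flat_query("S. Cerevisiae", ["CDC27", "APC11", "APC4"],
                         ["coexp", "pi", "gi"], 150, "bp")
print(query, end="")
```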

Cross Validator

Performs k-fold cross validation on the prediction algorithm for a given set of pre-classified genes. Cross Validator reports on the following evaluation measures: area under the ROC curve (AUC-ROC), area under the precision-recall curve (AUC-PR), and precision at fixed recall.
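
For readers unfamiliar with these measures, the sketch below computes two of them from a ranked list using their textbook definitions. It is not GeneMANIA's implementation, the function names and sample data are illustrative, and AUC-PR is omitted for brevity.

```python
def auc_roc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_recall(scores, labels, recall):
    """Precision at the first rank where recall reaches the target."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(1 for _, y in ranked if y)
    if total_pos == 0:
        raise ValueError("need at least one positive")
    hits = 0
    for rank, (_, y) in enumerate(ranked, start=1):
        hits += bool(y)
        if hits / total_pos >= recall:
            return hits / rank
    raise ValueError("target recall not reachable")


scores = [0.9, 0.8, 0.7, 0.4, 0.2]   # illustrative gene scores
labels = [1, 0, 1, 0, 0]             # 1 = held-out positive gene
print(auc_roc(scores, labels), precision_at_recall(scores, labels, 1.0))
```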

Usage (32-bit JVM):

java -Xmx1800M -cp GeneMANIA.jar org.genemania.plugin.apps.CrossValidator options

Usage (64-bit JVM):

java -d64 -Xmx3G -cp GeneMANIA.jar org.genemania.plugin.apps.CrossValidator options

Options:

Name Description
--data directory Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28).
--organism name The name or taxonomy id of an organism whose genes should be considered.
--query file-name Perform validation against the gene sets listed in the given file. The file must follow the format described under Query File Format below.
--networks network-list A comma-separated list of network types and/or network names. To get a full listing of network names, use the option --list-networks with Query Runner.
--exclude-networks network-list Optional. A comma-separated list of network types and/or network names to exclude from the --networks list.
--folds number Optional. The number of folds to use during cross validation. Defaults to 5.
--min number Optional. The minimum number of positive genes for a query. Queries with fewer genes will be skipped. Defaults to 10.
--max number Optional. The maximum number of positive genes for a query. Queries with more genes will be skipped. Defaults to 300.
--use-go-cache Optional. Perform validation against bundled Gene Ontology gene sets. In this case, the query file should contain one GO id per line (e.g. GO:0005786). These gene sets have been pre-filtered so that the smallest has 10 genes and the largest has 300.
--outfile file-name Optional. The file where the validation results should be saved. If not provided, the results are sent to standard output (usually the console).
--auto-negatives Optional. Forces all non-positive genes to be labeled as negative examples during prediction. Otherwise, negative examples must be explicitly listed in the query file.
--method weighting-method Optional. The weighting method to use when combining the individual networks. Defaults to automatic.
--seed number Optional. A value used to initialize the pseudo-random number generator that shuffles each gene set during validation. Setting the seed to a constant value makes the validation results deterministic. Defaults to a pseudo-random value.
--threads number Optional. The maximum number of parallel predictions. Ideally this should be set to the number of processing cores. Defaults to 1.
--verbose Optional. Makes CrossValidator print more details about what's happening.
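
The roles of --folds, --seed, --min and --max can be illustrated with a generic k-fold split. This is a sketch of the general technique under the defaults listed above, not the tool's actual code. The function names are hypothetical and the way GeneMANIA assigns genes to folds may differ.

```python
import random


def is_usable(positives, minimum=10, maximum=300):
    """Queries outside the [minimum, maximum] size range would be skipped."""
    return minimum <= len(positives) <= maximum


def k_folds(genes, folds=5, seed=None):
    """Shuffle with a seeded generator, then deal genes into `folds` groups."""
    rng = random.Random(seed)      # a fixed seed gives a repeatable shuffle
    shuffled = list(genes)
    rng.shuffle(shuffled)
    return [shuffled[i::folds] for i in range(folds)]


genes = [f"GENE{i}" for i in range(12)]   # placeholder gene names
if is_usable(genes):
    for held_out in k_folds(genes, folds=5, seed=42):
        training = [g for g in genes if g not in held_out]
        print(len(training), "training genes,", len(held_out), "held out")
```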

Query File Format:

Multiple gene sets may be used during cross validation. Each gene set should be on its own line using the format below:

GENE_SET_ID \t + \t gene_symbol1 [ \t gene_symbol2 ... ] [ \t - \t neg_gene_symbol1 [ \t neg_gene_symbol2 ... ] ]

...where GENE_SET_ID is the name of your gene set, gene_symbol is a positive gene example, and neg_gene_symbol is a negative gene example (i.e. definitely not a member of the gene set).

If --use-go-cache is also specified, the query file should contain one GO id per line (e.g. GO:0005786).
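
The gene set line format above is simple enough to write and read with a few lines of code. The following sketch uses hypothetical helper names and a hypothetical set id, and it assumes that no gene symbol is literally "-".

```python
def format_gene_set(set_id, positives, negatives=()):
    """Build one tab-delimited query line: id, '+', positives, ['-', negatives]."""
    fields = [set_id, "+", *positives]
    if negatives:
        fields += ["-", *negatives]
    return "\t".join(fields)


def parse_gene_set(line):
    """Return (set_id, positives, negatives) from one query line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or fields[1] != "+":
        raise ValueError("expected: id, '+', then at least one gene")
    rest = fields[2:]
    if "-" in rest:
        cut = rest.index("-")      # genes after '-' are negative examples
        return fields[0], rest[:cut], rest[cut + 1:]
    return fields[0], rest, []


line = format_gene_set("my-set", ["SCR1", "SRP54"], ["TEF1"])
print(parse_gene_set(line))
```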

Example Query File (S. Cerevisiae):

This query file only lists positive examples of genes. Use the option --auto-negatives to automatically label all other genes in each set as negative examples.

GO:0005786 + SCR1 SRP54 SEC65 SRP14 SRP68 SRP21 SRP72
GO:0022626 + RPS21A RPS21B HEF3 RPS8B RDN18-2 RDN18-1 RPL9A RPL9B RPS11B RPS11A RPS29A RPS29B RPS14A RPL1A RPL1B YGR054W RPS19B RPS19A RPS6B RDN5-1 RDN5-2 RDN5-3 RDN5-4 RDN5-5 RDN5-6 RPL24B RPL8B RPL8A RPL24A RPS22A RPS12 RPS22B RPL18A FES1 RPL10 RPS8A RPL41A RPL42A ASC1 RPS18A RPS18B SQT1 RPL14A RPL31A RPL31B RPL14B RPS2 RPL37B RPL16B RPL16A RPL37A RPS17A RPS17B RPS27B RPL27B RPL27A RPL5 RPL3 RPL7B RPL7A NMD3 RPL41B RPL11B RPL11A RPP2A TIF5 RPP2B RPL20B RPL20A RPS16B RPL17A RPL17B RPS16A RPL26A RPL26B RPS7A RPL6A RPL6B RPS28B RPS28A RDN25-1 TEF1 SIS1 RRP14 RPS31 REI1 RDN25-2 JJJ1 RPL42B RPL35A RPL35B RPL18B RPS5 RPS3 RPS25A RPS25B RPS15 RPL13A RPL13B RDN58-2 RDN58-1 RPS9B RPL22A RPL22B RPS9A RPL36A RPS4A RPS4B RPL36B RPS30B RPS20 RPS30A RPS26A NAT1 RPS26B RPL19B NAT5 RPL19A GCN1 GCN2 RPS7B RPS6A RPL4B RPL4A ARX1 RPL21A RPL21B RPS13 RPP1A RPP1B RPS23B RPL23B RPL23A RPS23A RPL40A RPL40B RPS14B ARD1 MAP1 NIP7 RPS10A RPL29 RPL28 RPL25 GCN20 RPL15B RPL15A RPS10B RPS0A RPS0B RLI1 RPL34B RPL34A RPL43A RPL43B RPS24B RPS24A FUN12 RPS27A RPL2A RPL2B PAT1 RPL38 RPL39 STM1 RPL32 RPP0 RPL30 RPS1B RPS1A RPL33B RPL12B RPL12A RPL33A

Network Assessor

Assesses the value of a set of networks by performing k-fold cross validation against a baseline network set, as well as the networks to assess. The percentage error of each validation measure is computed for each query in the validation set and reported.

Usage (32-bit JVM):

java -Xmx1800M -cp GeneMANIA.jar org.genemania.plugin.apps.NetworkAssessor options

Usage (64-bit JVM):

java -d64 -Xmx3G -cp GeneMANIA.jar org.genemania.plugin.apps.NetworkAssessor options

Options:

Name Description
--data directory Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28).
--organism name The name or taxonomy id of an organism whose genes should be considered.
--query file-name Perform validation against the gene sets listed in the given file. The file must follow the format described under Query File Format below.
--baseline network-list A comma-separated list of network types and/or network names to use as a baseline for comparison. To get a full listing of network names, use the option --list-networks with Query Runner.
--exclude-baseline network-list Optional. A comma-separated list of network types and/or network names to exclude from the --baseline list.
--networks network-list A comma-separated list of network types and/or network names representing the networks to assess. To get a full listing of network names, use the option --list-networks with Query Runner.
--exclude-networks network-list Optional. A comma-separated list of network types and/or network names to exclude from the --networks list.
--folds number Optional. The number of folds to use during cross validation. Defaults to 5.
--min number Optional. The minimum number of positive genes for a query. Queries with fewer genes will be skipped. Defaults to 10.
--max number Optional. The maximum number of positive genes for a query. Queries with more genes will be skipped. Defaults to 300.
--use-go-cache Optional. Perform validation against bundled Gene Ontology gene sets. In this case, the query file should contain one GO id per line (e.g. GO:0005786). These gene sets have been pre-filtered so that the smallest has 10 genes and the largest has 300.
--outfile file-name Optional. The file where the validation results should be saved. If not provided, the results are sent to standard output (usually the console).
--auto-negatives Optional. Forces all non-positive genes to be labeled as negative examples during prediction. Otherwise, negative examples must be explicitly listed in the query file.
--method weighting-method Optional. The weighting method to use when combining the individual networks. Defaults to automatic.
--seed number Optional. A value used to initialize the pseudo-random number generator that shuffles each gene set during validation. Setting the seed to a constant value makes the validation results deterministic. Defaults to a pseudo-random value.
--threads number Optional. The maximum number of parallel predictions. Ideally this should be set to the number of processing cores. Defaults to 1.
--verbose Optional. Makes NetworkAssessor print more details about what's happening.

Query File Format:

Network Assessor uses the same query file format as Cross Validator.

Network Importer

Imports network/profile data from a file into a GeneMANIA data set.

Usage (32-bit JVM):

java -Xmx1800M -cp GeneMANIA.jar org.genemania.plugin.apps.NetworkImporter options

Usage (64-bit JVM):

java -d64 -Xmx3G -cp GeneMANIA.jar org.genemania.plugin.apps.NetworkImporter options

Options:

Name Description
--data directory Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28).
--organism name The name or taxonomy id of an organism whose genes should be considered.
--filename path Path to a file containing either interaction or profile data. Supported types of data include:
  • unweighted networks
    GENE1 \t GENE2
  • weighted networks
    GENE1 \t GENE2 \t SCORE
  • expression profiles
    GENE \t EXPR1 ( \t EXPR2 ... )
  • SOFT-formatted expression profiles (e.g. from GEO)
--name network-name The name of the new network.
--description description Optional. A description of the new network.
--group network-type Optional. The network group to which the new network will be added. If this group does not exist, it will be created. Defaults to other.
--group-description description Optional. A short description for a network group being created. Only applicable when the group specified by --group does not already exist.
--color RRGGBB Optional. The colour of the network group being created. Only applicable when the group specified by --group does not already exist. Defaults to 000000 (i.e. black).
--verbose Optional. Makes NetworkImporter print more details about what's happening.
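
Input for --filename is easy to produce from existing interaction data. The sketch below writes lines in the weighted network layout (GENE1, GENE2, SCORE separated by tabs) and checks that a --color value has the RRGGBB form. The helper names and the sample edges are hypothetical.

```python
import re


def weighted_network_lines(edges):
    """Yield one 'GENE1<TAB>GENE2<TAB>SCORE' line per (gene1, gene2, score)."""
    for gene_a, gene_b, score in edges:
        yield "\t".join([gene_a, gene_b, str(score)])


def is_valid_color(value):
    """True if value is six hexadecimal digits (RRGGBB, no leading '#')."""
    return re.fullmatch(r"[0-9A-Fa-f]{6}", value) is not None


# Illustrative edges; real scores would come from your own data.
edges = [("CDC27", "APC11", 0.82), ("APC4", "APC2", 0.5)]
print("\n".join(weighted_network_lines(edges)))
print(is_valid_color("000000"))
```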

Validation Set Maker

Produces sets of genes based on Gene Ontology (GO) annotations for use in cross validation. One gene set is created for each GO category in the ontology. More specific annotations are propagated up to all genes associated with any of the parent annotations.

Usage (32-bit JVM):

java -Xmx900M -cp GeneMANIA.jar org.genemania.plugin.apps.ValidationSetMaker options

Usage (64-bit JVM):

java -d64 -Xmx3G -cp GeneMANIA.jar org.genemania.plugin.apps.ValidationSetMaker options

Options:

Name Description
--data directory Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28).
--organism name The name or taxonomy id of an organism whose genes should be considered.
--query filename The file where the resulting validation set should be saved.
--db JDBC-connection-string Optional. A JDBC connection string for a GO MySQL database. No other database backends are currently supported. Defaults to EBI's MySQL instance (i.e. jdbc:mysql://mysql.ebi.ac.uk:4085/go_latest?user=go_select&password=amigo)
--branch GO-branch Optional. One of bp, mf, cc, or all, which selects GO categories from the biological process, molecular function, cellular component, or all branches, respectively. Defaults to all.
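
The upward propagation of annotations can be pictured with a toy ontology. In the sketch below, a gene annotated with a specific term is also added to the gene sets of all ancestor terms, which is our reading of the description above. The term ids, gene names and function name are made up for illustration, and this is not the tool's code.

```python
def propagate(annotations, parents):
    """annotations: gene -> set of terms; parents: term -> set of direct parents.
    Returns term -> set of genes, including genes inherited from descendants."""
    def with_ancestors(term, seen):
        # Walk up the ontology, collecting every ancestor once.
        for parent in parents.get(term, ()):
            if parent not in seen:
                seen.add(parent)
                with_ancestors(parent, seen)
        return seen

    gene_sets = {}
    for gene, terms in annotations.items():
        for term in terms:
            for t in with_ancestors(term, {term}):
                gene_sets.setdefault(t, set()).add(gene)
    return gene_sets


# Toy ontology: TERM:C is a child of TERM:B, which is a child of TERM:A.
parents = {"TERM:C": {"TERM:B"}, "TERM:B": {"TERM:A"}}
annotations = {"gene1": {"TERM:C"}, "gene2": {"TERM:B"}}
print(propagate(annotations, parents))
```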

Common Options

Organisms:

Name Taxonomy Id
A. Thaliana 3702
C. Elegans 6239
D. Melanogaster 7227
H. Sapiens 9606
M. Musculus 10090
S. Cerevisiae 4932
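
Scripts that accept either form of the --organism value can normalize it with a small lookup built from the table above. This is a convenience sketch only. The function name is hypothetical, and the case-insensitive matching is a choice made here, not documented behaviour of the tools.

```python
ORGANISMS = {
    "A. Thaliana": 3702,
    "C. Elegans": 6239,
    "D. Melanogaster": 7227,
    "H. Sapiens": 9606,
    "M. Musculus": 10090,
    "S. Cerevisiae": 4932,
}


def taxonomy_id(name_or_id):
    """Resolve an organism name or a taxonomy id to a known taxonomy id."""
    text = str(name_or_id).strip()
    if text.isdigit():
        if int(text) in ORGANISMS.values():
            return int(text)
        raise KeyError(text)
    for name, tax_id in ORGANISMS.items():
        if name.lower() == text.lower():
            return tax_id
    raise KeyError(text)


print(taxonomy_id("S. Cerevisiae"), taxonomy_id("9606"))
```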

Networks

Networks may be specified by type or by name. To get a full listing of network names, use the option --list-networks with Query Runner.

Available Network Types:

coexp Co-expression
coloc Co-localization
gi Genetic interactions
path Pathway interactions
pi Physical interactions
predict Predicted
spd Shared protein domains
other Networks that don't belong to any of the above types.
all Shorthand for specifying all available networks
preferred Shorthand for coexp,pi,gi. Typically used for cross validation.
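
The interplay between a network list, its shorthands and an exclusion list (as with --networks and --exclude-networks) can be sketched as follows. The tools resolve these lists themselves; this illustration works at the level of network types only, and it models "all" as the eight types listed above, which is a simplification. The function names are hypothetical.

```python
NETWORK_TYPES = ["coexp", "coloc", "gi", "path", "pi", "predict", "spd", "other"]
SHORTHANDS = {"all": NETWORK_TYPES, "preferred": ["coexp", "pi", "gi"]}


def expand(spec):
    """Expand a comma-separated list, replacing shorthands; keeps first-seen order."""
    result = []
    for item in spec.split(","):
        item = item.strip()
        for name in SHORTHANDS.get(item, [item]):
            if name not in result:
                result.append(name)
    return result


def select(networks, exclude=""):
    """Networks to use: the expanded list minus the expanded exclusions."""
    excluded = set(expand(exclude)) if exclude else set()
    return [n for n in expand(networks) if n not in excluded]


print(select("preferred,spd", exclude="gi"))
```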

Weighting Methods

automatic Default. The networks are weighted such that the query genes interact as much as possible.

Note: This option corresponds to the query gene-based combining method on the website. If you want the same behaviour as the website's automatic combining method, then omit any combining method options.

average All networks are weighted equally.
average_category Networks are weighted such that each type of network has the same overall weight.
For Organisms With GO Annotations:
bp Networks are weighted in an attempt to reproduce Gene Ontology Biological Process co-annotation patterns.
mf Networks are weighted in an attempt to reproduce Gene Ontology Molecular Function co-annotation patterns.
cc Networks are weighted in an attempt to reproduce Gene Ontology Cellular Component co-annotation patterns.
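
The difference between average and average_category is easiest to see with numbers. The sketch below assumes that weights are normalized to sum to 1 and that, under average_category, a type's share is split evenly among its networks. Both are assumptions made for illustration, and the network names are hypothetical.

```python
from collections import Counter


def average_weights(networks):
    """networks: name -> type. Every network gets the same weight."""
    return {name: 1 / len(networks) for name in networks}


def average_category_weights(networks):
    """Every type gets the same total weight, shared by its networks."""
    per_type = Counter(networks.values())
    share = 1 / len(per_type)
    return {name: share / per_type[kind] for name, kind in networks.items()}


networks = {"coexp-1": "coexp", "coexp-2": "coexp", "coexp-3": "coexp", "pi-1": "pi"}
print(average_weights(networks))           # each network weighs 0.25
print(average_category_weights(networks))  # pi-1 weighs 0.5, coexp networks share 0.5
```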