Cytoscape tools

Command line tools reference documentation

Installation

First, you need to download the GeneMANIA JAR file. If you already installed the plugin through Cytoscape, you can find it in one of the following places:

  • Unix/Mac: ~/.cytoscape/Cytoscape Version/plugins/GeneMANIA-Version/
  • Windows: My Documents\.cytoscape\Cytoscape Version\plugins\GeneMANIA-Version\

Second, you need a data set. If you’ve used the Cytoscape plugin to perform predictions, you’ll already have one installed in one of these locations:

  • Unix/Mac: ~/genemania_plugin/gmdata-id/
  • Windows: My Documents\genemania_plugin\gmdata-id\

Otherwise, you can use Data Admin to install one of the available data sets from genemania.org.

Available Tools

Data Import & Management:

  • Data Admin: Downloads and manages GeneMANIA data sets from genemania.org.
  • Gene Sanitizer: Prints out the mappings between the given gene list and GeneMANIA’s preferred identifiers.
  • Id Importer: Creates a new data set from a set of identifiers and aliases. The identifiers correspond to node labels.
  • Network Importer: Imports network/profile data from a file into a GeneMANIA data set.

Prediction:

  • Query Runner: Runs one or more predictions and writes the results to disk. Each prediction needs to be provided in the form of a query file. One prediction report is generated for each query file.

Validation:

  • Cross Validator: Performs k-fold cross validation on the prediction algorithm for a given set of pre-classified genes. Cross Validator reports on the following evaluation measures: area under the ROC curve (AUC-ROC), area under the precision-recall curve (AUC-PR), and precision at fixed recall.
  • Network Assessor: Assesses the value of a set of networks by performing k-fold cross validation against a baseline network set, as well as the networks to assess. The percentage error of each validation measure is computed for each query in the validation set and reported.
  • Validation Set Maker: Produces sets of genes based on Gene Ontology (GO) annotations for use in cross validation. One gene set is created for each GO category in the ontology. More specific annotations are propagated up to all genes associated with any of the parent annotations.

Data Admin

Downloads and manages GeneMANIA data sets from genemania.org. Each data set consists of multiple organisms which are identified by their data-id. Organisms can be installed and removed individually as needed.

Commands:

    • list: Lists available data sets.

      Usage:

      list

      Example:

      $ java -jar GeneMania.jar DataAdmin list
      Data Set ID	Total Size	Database Version
      2013-10-15	9351.08 MB	15 October 2013
      2013-10-15-core	2059.38 MB	15 October 2013
      2013-10-15-open_license	9324.49 MB	15 October 2013
      2012-08-02	5994.14 MB	19 July 2012
      2012-08-02-core	1764.09 MB	19 July 2012
      2012-08-02-open_license	5963.38 MB	19 July 2012
      ...
    • install: Installs the infrastructure for the given data set ID without actually installing any organism data.

      Usage:

      install data-set-id

      …where data-set-id is one of the IDs given by the list command above.

      Example:

      $ java -jar GeneMania.jar DataAdmin install 2013-10-15-core

      This example creates the directory gmdata-2013-10-15-core in the current directory and downloads the data set infrastructure into it. Organism data is installed separately with install-data.

    • list-data: Lists the data available for download for a particular data set.

      Usage:

      list-data path/to/data/set

      …where path/to/data/set is the path to a data set downloaded by the Cytoscape plugin, or the install command above.

      Example:

      $ java -jar GeneMania.jar DataAdmin list-data gmdata-2013-10-15-core
      Data ID	Description	Status
      1	A. thaliana Arabidopsis (424 MB)	
      2	C. elegans Worm (141 MB)	
      3	D. melanogaster Fly (237 MB)	
      4	H. sapiens Human (413 MB)	
      5	M. musculus Mouse (412 MB)	
      6	S. cerevisiae Baker's yeast (148 MB)	
      7	R. norvegicus Rat (154 MB)	
      8	D. rerio Zebrafish (126 MB)	
      
    • install-data: Downloads and installs data with the given ID from genemania.org.

      Usage:

      install-data path/to/data/set data-id [data-id ...]

      …where path/to/data/set is the path to a data set downloaded by the Cytoscape plugin, or the install command above; and data-id is one of the IDs given by the list-data command, or all, which is an alias for all available data for the given data set.

      Example: Installing yeast data

      $ java -jar GeneMania.jar DataAdmin install-data gmdata-2013-10-15-core 6

      Example: Installing all data for 2013-10-15-core

      $ java -jar GeneMania.jar DataAdmin install-data gmdata-2013-10-15-core all

    • uninstall-data: Deletes previously installed data from a data set.

      Usage:

      uninstall-data path/to/data/set data-id [data-id ...]

      …where path/to/data/set is the path to a data set downloaded by the Cytoscape plugin, or the install command above; and data-id is one of the IDs given by the list-data command.

      Example: Uninstalling human data

      $ java -jar GeneMania.jar DataAdmin uninstall-data gmdata-2013-10-15-core 4
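
The commands above are usually run in sequence: install, then list-data, then install-data. As a rough sketch, the Python helper below only builds the argument lists for that sequence and prints them; it does not run Java. The JAR path, the helper names, and the assumption that the data directory is always named gmdata- followed by the data set ID (taken from the install example) are illustrative.

```python
import shlex

JAR = "GeneMania.jar"  # assumption: path to the JAR, as written in the examples above


def data_admin_command(*args):
    """Return the argument list for one Data Admin invocation."""
    return ["java", "-jar", JAR, "DataAdmin", *args]


def setup_commands(data_set_id, data_ids):
    """Build the install, list-data and install-data commands for one data set."""
    # Assumption: the directory name follows the install example (gmdata-<id>).
    data_dir = "gmdata-" + data_set_id
    return [
        data_admin_command("install", data_set_id),
        data_admin_command("list-data", data_dir),
        data_admin_command("install-data", data_dir, *[str(i) for i in data_ids]),
    ]


commands = setup_commands("2013-10-15-core", [6])
for argv in commands:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(argv))
```

Each list can be passed to subprocess.run unchanged if you want to execute the steps from a script.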

Gene Sanitizer

Prints out the mappings between the given gene list and GeneMANIA’s preferred identifiers. This tool is useful for checking which of your genes are recognized by GeneMANIA. The output is a tab-delimited text file containing one mapping per line. The first item is GeneMANIA’s preferred identifier, or nothing, if the identifier that follows isn’t recognized.

Usage:

java -Xmx900M -jar GeneMANIA.jar GeneSanitizer options gene-list-file

Example Gene List:

YMR043W
YPR113W
YCL067C
YIL015W
YNOT?
YCR084C
YFL026W
YHR084W
YGL008C
YNL145W

Example Output:

YMR043W	MCM1
YPR113W	PIS1
YCL067C	HMLALPHA2
YIL015W	BAR1
YNOT?	
YCR084C	TUP1
YFL026W	STE2
YHR084W	STE12
YGL008C	PMA1
YNL145W	MFA2
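
To pick out the unrecognized genes from a report like the one above, a short script is usually enough. The sketch below makes no assumption about which column holds the preferred identifier; it only reports rows where one of the two tab-separated fields is empty. The function name is illustrative.

```python
def unrecognized_identifiers(report_text):
    """Return identifiers from a Gene Sanitizer report that have no mapping.

    Each line is split on the first tab. A line with an empty field on
    either side is treated as an unrecognized identifier.
    """
    missing = []
    for line in report_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        left, _, right = line.partition("\t")
        left, right = left.strip(), right.strip()
        if not left or not right:
            missing.append(left or right)
    return missing


# A few rows copied from the example output above.
report = "YMR043W\tMCM1\nYIL015W\tBAR1\nYNOT?\t\nYCR084C\tTUP1\n"
print(unrecognized_identifiers(report))
```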

Options:

Name	Description
--data directory	Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2013-10-15).
--organism name	The name or taxonomy id of an organism whose genes should be considered.

Id Importer

Creates a new data set from a set of identifiers and aliases. The identifiers correspond to node labels. Although the resulting data set is generally treated like an organism, where the given ids denote its genome, it does not have to be an organism. The identifiers can be anything, as long as they’re unique within the data set.

Usage:

java -Xmx900M -jar GeneMANIA.jar IdImporter options

Options:

Name	Description
--data directory	Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2013-10-15).
--filename file-name	The path to a file that contains a complete set of identifiers that will serve as the basis of a new data set. Each line in the file should follow this format:

    primary-id ( \t alias-1 ... )
--name entity-name	The name of the resulting entity (e.g. organism).
--alias entity-name	Optional. An alias for the resulting entity (e.g. shorter, informal name).
--taxid number	Optional. The taxonomy id of the resulting entity, if applicable.
--description description	Optional. A description of the resulting entity.
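
The identifier file format above (one primary id per line, followed by optional tab-separated aliases) is easy to generate from a script. The following is a minimal sketch, not part of GeneMANIA; the function names are illustrative, and the duplicate check covers primary ids only, which is a simplification of the uniqueness requirement.

```python
import os
import tempfile


def format_identifier_line(primary_id, aliases=()):
    """Return one line of an Id Importer file: primary id, then aliases."""
    fields = [primary_id, *aliases]
    for field in fields:
        if "\t" in field or "\n" in field:
            raise ValueError(f"identifier contains a tab or newline: {field!r}")
    return "\t".join(fields)


def write_identifier_file(path, entries):
    """Write (primary_id, aliases) pairs, rejecting repeated primary ids."""
    seen = set()
    lines = []
    for primary_id, aliases in entries:
        if primary_id in seen:
            raise ValueError(f"duplicate primary id: {primary_id}")
        seen.add(primary_id)
        lines.append(format_identifier_line(primary_id, aliases))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


# Identifiers borrowed from the Gene Sanitizer example; the last has no alias.
entries = [("YMR043W", ["MCM1"]), ("YPR113W", ["PIS1"]), ("YCL067C", [])]
path = os.path.join(tempfile.mkdtemp(), "identifiers.txt")
write_identifier_file(path, entries)
with open(path, encoding="utf-8") as handle:
    print(handle.read(), end="")
```

The resulting path is what you would pass to --filename.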

Network Importer

Imports network/profile data from a file into a GeneMANIA data set.

Usage (32-bit JVM):

java -Xmx1800M -jar GeneMANIA.jar NetworkImporter options

Usage (64-bit JVM):

java -d64 -Xmx3G -jar GeneMANIA.jar NetworkImporter options

Options:

Name	Description
--data directory	Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2013-10-15).
--organism name	The name or taxonomy id of an organism whose genes should be considered.
--filename path	Path to a file containing either interaction or profile data. Supported types of data include:

  • unweighted networks
    GENE1 \t GENE2
  • weighted networks
    GENE1 \t GENE2 \t SCORE
  • expression profiles
    GENE \t EXPR1 ( \t EXPR2 ... )
  • SOFT-formatted expression profiles (e.g. from GEO)
--name network-name	The name of the new network.
--description description	Optional. A description of the new network.
--group network-type	Optional. The network group to which the new network will be added. If this group does not exist, it will be created. Defaults to other.
--group-description description	Optional. A short description for a network group being created. Only applicable when the group specified by --group does not already exist.
--color RRGGBB	Optional. The colour of the network group being created. Only applicable when the group specified by --group does not already exist. Defaults to 000000 (i.e. black).
--verbose	Optional. Makes NetworkImporter print more details about what’s happening.
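
As an illustration of the weighted network format listed under --filename (GENE1, GENE2 and SCORE separated by tabs), the sketch below turns a list of edges into file lines. The gene names come from the Query Runner example; the scores are made up for illustration, and the function name is hypothetical.

```python
def weighted_network_lines(edges):
    """Format (gene1, gene2, score) tuples as tab-delimited network lines."""
    lines = []
    for gene1, gene2, score in edges:
        # float() rejects non-numeric scores early; :g keeps the number compact.
        lines.append(f"{gene1}\t{gene2}\t{float(score):g}")
    return lines


# Scores below are invented purely to show the layout.
edges = [("CDC27", "APC11", 0.75), ("APC4", "APC2", 1), ("RAD54", "RAD52", 0.2)]
for line in weighted_network_lines(edges):
    print(line)
```

Dropping the third column from each line gives the unweighted form.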

Query Runner

Runs one or more predictions and writes the results to disk. Each prediction needs to be provided in the form of a query file. One prediction report is generated for each query file.

Usage (32-bit JVM):

java -Xmx1800M -jar GeneMANIA.jar QueryRunner options query-file-1 [ query-file-2 ... ]

Usage (64-bit JVM):

java -d64 -Xmx3G -jar GeneMANIA.jar QueryRunner options query-file-1 [ query-file-2 ... ]

Options:

Name	Description
--data directory	Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2013-10-15).
--in input-format	Optional. The format of the query files, which can be one of:

  • flat (default): Tab-delimited (see the example query below).
  • xml: Not yet supported.
--out output-format	Optional. The format of the output files, which can be one of:

  • genes (default): List of result genes ordered by score; one per line.
  • flat: Tab-delimited report containing details of prediction results and query parameters.
  • xml: XML-formatted report containing details of prediction results and query parameters.
  • scores: List of result genes with scores ordered by score for the entire genome (ignores related genes limit); one per line.
--scoring-method method	Optional. The method used to compute the gene scores, which can be one of:

  • discriminant (default): GeneMANIA’s classic scoring method.
  • z: Z-scores.
--ids id-types	Optional. A comma-separated list of identifier types, in descending order of preference, which may be one or more of the following:

  • Ensembl Gene Name
  • Entrez Gene Name
  • Ensembl Gene ID
  • RefSeq mRNA ID
  • TAIR ID
  • Uniprot ID
  • Uniprot AC
  • RefSeq Protein ID
  • Ensembl Protein ID
  • Entrez Gene ID

If the most preferred identifier is not available for a given gene, the next most preferred identifier is selected. The list above reflects the default order of preference.

--results directory	Optional. Path to where the prediction result files will be created (one per input query file). Defaults to the current working directory.
--threads number	Optional. The maximum number of parallel predictions. Ideally this should be set to the number of processing cores. Defaults to 1.
--verbose	Optional. Makes QueryRunner print more details about what’s happening.
--list-networks organism-name	Optional. Lists the available networks for the given organism. You may need to put quotes around the organism name if invoked from a shell.
--list-genes organism-name	Optional. Lists the genes that are recognized for the given organism. You may need to put quotes around the organism name if invoked from a shell. Each line in the output contains a gene and all its synonyms, if any.
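
The fallback rule described for --ids (use the most preferred identifier type that is available for a gene) can be pictured with a few lines of Python. This is only an illustration of the rule, not Query Runner's actual code; the function name and the identifier values are placeholders.

```python
# Default order of preference, as listed under --ids.
DEFAULT_ID_ORDER = [
    "Ensembl Gene Name",
    "Entrez Gene Name",
    "Ensembl Gene ID",
    "RefSeq mRNA ID",
    "TAIR ID",
    "Uniprot ID",
    "Uniprot AC",
    "RefSeq Protein ID",
    "Ensembl Protein ID",
    "Entrez Gene ID",
]


def preferred_identifier(available, order=DEFAULT_ID_ORDER):
    """Return the first identifier found when walking the preference order.

    available maps an identifier type to the identifier a gene has of that
    type. Returns None if the gene has none of the listed types.
    """
    for id_type in order:
        value = available.get(id_type)
        if value:
            return value
    return None


# Placeholder values, not real database identifiers.
gene = {"Entrez Gene Name": "EXAMPLE-SYMBOL", "Entrez Gene ID": "EXAMPLE-ID"}
print(preferred_identifier(gene))
print(preferred_identifier(gene, order=["Entrez Gene ID", "Entrez Gene Name"]))
```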

Example Query (Flat):

yeast-example.query
S. Cerevisiae
CDC27	APC11	APC4	XRS2	RAD54	APC2	RAD52	RAD10	MRE11	APC5
coexp	pi	gi
150
bp

Flat Query File Format:

organism-name
query-gene-1 [ \t query-gene-2 ... ]
network-1 [ \t network-2 ... ]
related-gene-limit
[ combining-method ]
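
Query files in this format can be generated programmatically when you have many gene lists to run. The sketch below reproduces the yeast example; the function name is illustrative, and separating the networks with tabs is an assumption based on that example.

```python
def build_flat_query(organism, genes, networks, related_gene_limit,
                     combining_method=None):
    """Return the text of a flat query file, one field group per line."""
    lines = [
        organism,
        "\t".join(genes),
        "\t".join(networks),  # assumption: tab-separated, as in the example
        str(related_gene_limit),
    ]
    if combining_method is not None:
        lines.append(combining_method)  # optional fifth line
    return "\n".join(lines) + "\n"


query_text = build_flat_query(
    "S. Cerevisiae",
    ["CDC27", "APC11", "APC4", "XRS2", "RAD54",
     "APC2", "RAD52", "RAD10", "MRE11", "APC5"],
    ["coexp", "pi", "gi"],
    150,
    combining_method="bp",
)
print(query_text, end="")
```

Writing query_text to a file such as yeast-example.query gives a file that can be passed to Query Runner as a query-file argument.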

Cross Validator

Performs k-fold cross validation on the prediction algorithm for a given set of pre-classified genes. Cross Validator reports on the following evaluation measures: area under the ROC curve (AUC-ROC), area under the precision-recall curve (AUC-PR), and precision at fixed recall.
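
For readers unfamiliar with these measures, the sketch below shows textbook definitions of two of them, computed from a list of scores and true labels. It is not Cross Validator's implementation, and its handling of ties and recall thresholds may differ from the tool's; AUC-PR is omitted for brevity.

```python
def auc_roc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_recall(scores, labels, recall):
    """Precision at the first rank cutoff whose recall reaches the target."""
    total_pos = sum(1 for y in labels if y)
    if total_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
        if hits / total_pos >= recall:
            return hits / rank
    raise ValueError("recall target must be at most 1")


# Toy data: six genes, three of them true members of the gene set.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 1, 0, 0]
print(auc_roc(scores, labels))
print(precision_at_recall(scores, labels, 0.5))
```

In the toy data, 7 of the 9 positive/negative pairs are ordered correctly, so the AUC-ROC is 7/9; recall first reaches 0.5 at rank 3, where 2 of the 3 top genes are positives.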

Usage (32-bit JVM):

java -Xmx1800M -jar GeneMANIA.jar CrossValidator options

Usage (64-bit JVM):

java -d64 -Xmx3G -jar GeneMANIA.jar CrossValidator options

Options:

Name	Description
--data directory	Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2013-10-15).
--organism name	The name or taxonomy id of an organism whose genes should be considered.
--query file-name	Perform validation against the gene sets listed in the given file. The file must follow the query file format described below.
--networks network-list	A comma-separated list of network types and/or network names. To get a full listing of network names, use the option --list-networks with Query Runner.
--exclude-networks network-list	Optional. A comma-separated list of network types and/or network names to exclude from the --networks list.
--folds number	Optional. The number of folds to use during cross validation. Defaults to 5.
--min number	Optional. The minimum number of positive genes for a query. Queries with fewer genes will be skipped. Defaults to 10.
--max number	Optional. The maximum number of positive genes for a query. Queries with more genes will be skipped. Defaults to 300.
--use-go-cache	Optional. Perform validation against bundled Gene Ontology gene sets. In this case, the query file should contain one GO id per line (e.g. GO:0005786). These gene sets have been pre-filtered so that the smallest has 10 genes and the largest has 300.
--outfile file-name	Optional. The file where the validation results should be saved. If not provided, the results are sent to standard output (usually the console).
--auto-negatives	Optional. Forces all non-positive genes to be labeled as negative examples during prediction. Otherwise, negative examples must be explicitly listed in the query file.
--method weighting-method	Optional. The weighting method to use when combining the individual networks. Defaults to automatic.
--seed number	Optional. A value used to initialize the pseudo-random number generator used for shuffling each gene set during validation. Setting the seed to a constant value will make the validation results deterministic. Defaults to a pseudo-random value.
--threads number	Optional. The maximum number of parallel predictions. Ideally this should be set to the number of processing cores. Defaults to 1.
--verbose	Optional. Makes CrossValidator print more details about what’s happening.
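
To make the roles of --folds and --seed concrete, here is a generic k-fold split with a seeded shuffle. It illustrates the general technique only; how Cross Validator actually partitions a gene set is not specified here, and the function name is hypothetical.

```python
import random


def k_fold_splits(positives, folds=5, seed=None):
    """Yield (training, held_out) gene lists for each fold.

    The genes are shuffled once with a generator seeded by `seed`, so the
    same seed always produces the same folds.
    """
    genes = list(positives)
    random.Random(seed).shuffle(genes)
    for i in range(folds):
        held_out = genes[i::folds]  # every folds-th gene, starting at i
        training = [g for j, g in enumerate(genes) if j % folds != i]
        yield training, held_out


genes = ["CDC27", "APC11", "APC4", "XRS2", "RAD54",
         "APC2", "RAD52", "RAD10", "MRE11", "APC5"]
for training, held_out in k_fold_splits(genes, folds=5, seed=42):
    print(len(training), "training genes; held out:", held_out)
```

With 10 genes and 5 folds, each fold holds out 2 genes and trains on the other 8. Re-running with the same seed reproduces the same split, which is the property the --seed option describes.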

Query File Format:

Multiple gene sets may be used during cross validation. Each gene set should be on its own line using the format below:

GENE_SET_ID \t + \t gene_symbol1 [ \t gene_symbol2 ... ] [ \t - \t neg_gene_symbol1 [ \t neg_gene_symbol2 ... ] ]

…where GENE_SET_ID is the name of your gene set, gene_symbol is a positive gene example, and neg_gene_symbol is a negative gene example (i.e. definitely not a member of the gene set).

If --use-go-cache is also specified, the query file should contain one GO id per line (e.g. GO:0005786).
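
A line in this format can be assembled as follows. This is a small sketch with an illustrative function name; the negative example in the second call is chosen only to show the layout and is not a curated negative.

```python
def gene_set_line(gene_set_id, positives, negatives=()):
    """Return one tab-delimited gene set line for a validation query file."""
    if not positives:
        raise ValueError("a gene set needs at least one positive gene")
    fields = [gene_set_id, "+", *positives]
    if negatives:
        fields += ["-", *negatives]  # the negative block is optional
    return "\t".join(fields)


positives = ["SCR1", "SRP54", "SEC65", "SRP14", "SRP68", "SRP21", "SRP72"]
print(gene_set_line("GO:0005786", positives))
# Negative gene picked purely for illustration.
print(gene_set_line("GO:0005786", positives, negatives=["RPS21A"]))
```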

Example Query File (S. Cerevisiae):

This query file lists only positive examples. Use the option --auto-negatives to automatically label all other genes as negative examples for each gene set.

GO:0005786      +       SCR1    SRP54   SEC65   SRP14   SRP68   SRP21   SRP72
GO:0022626      +       RPS21A  RPS21B  HEF3    RPS8B   RDN18-2 RDN18-1 RPL9A   RPL9B   RPS11B  RPS11A  RPS29A  RPS29B    RPS14A  RPL1A   RPL1B   YGR054W RPS19B  RPS19A  RPS6B   RDN5-1  RDN5-2  RDN5-3  RDN5-4  RDN5-5  RDN5-6  RPL24B    RPL8B   RPL8A   RPL24A  RPS22A  RPS12   RPS22B  RPL18A  FES1    RPL10   RPS8A   RPL41A  RPL42A  ASC1    RPS18A    RPS18B  SQT1    RPL14A  RPL31A  RPL31B  RPL14B  RPS2    RPL37B  RPL16B  RPL16A  RPL37A  RPS17A  RPS17B  RPS27B    RPL27B  RPL27A  RPL5    RPL3    RPL7B   RPL7A   NMD3    RPL41B  RPL11B  RPL11A  RPP2A   TIF5    RPP2B   RPL20B    RPL20A  RPS16B  RPL17A  RPL17B  RPS16A  RPL26A  RPL26B  RPS7A   RPL6A   RPL6B   RPS28B  RPS28A  RDN25-1 TEF1      SIS1    RRP14   RPS31   REI1    RDN25-2 JJJ1    RPL42B  RPL35A  RPL35B  RPL18B  RPS5    RPS3    RPS25A  RPS25B    RPS15   RPL13A  RPL13B  RDN58-2 RDN58-1 RPS9B   RPL22A  RPL22B  RPS9A   RPL36A  RPS4A   RPS4B   RPL36B  RPS30B    RPS20   RPS30A  RPS26A  NAT1    RPS26B  RPL19B  NAT5    RPL19A  GCN1    GCN2    RPS7B   RPS6A   RPL4B   RPL4A     ARX1    RPL21A  RPL21B  RPS13   RPP1A   RPP1B   RPS23B  RPL23B  RPL23A  RPS23A  RPL40A  RPL40B  RPS14B  ARD1      MAP1    NIP7    RPS10A  RPL29   RPL28   RPL25   GCN20   RPL15B  RPL15A  RPS10B  RPS0A   RPS0B   RLI1    RPL34B    RPL34A  RPL43A  RPL43B  RPS24B  RPS24A  FUN12   RPS27A  RPL2A   RPL2B   PAT1    RPL38   RPL39   STM1    RPL32     RPP0    RPL30   RPS1B   RPS1A   RPL33B  RPL12B  RPL12A  RPL33A

Network Assessor

Assesses the value of a set of networks by performing k-fold cross validation against a baseline network set, as well as the networks to assess. The percentage error of each validation measure is computed for each query in the validation set and reported.

Usage (32-bit JVM):

java -Xmx1800M -jar GeneMANIA.jar NetworkAssessor options

Usage (64-bit JVM):

java -d64 -Xmx3G -jar GeneMANIA.jar NetworkAssessor options

Options:

Name	Description
--data directory	Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2013-10-15).
--organism name	The name or taxonomy id of an organism whose genes should be considered.
--query file-name	Perform validation against the gene sets listed in the given file. The file must follow the query file format described below.
--baseline network-list	A comma-separated list of network types and/or network names to use as a baseline for comparison. To get a full listing of network names, use the option --list-networks with Query Runner.
--exclude-baseline network-list	Optional. A comma-separated list of network types and/or network names to exclude from the --baseline list.
--networks network-list	A comma-separated list of network types and/or network names representing the networks to assess. To get a full listing of network names, use the option --list-networks with Query Runner.
--exclude-networks network-list	Optional. A comma-separated list of network types and/or network names to exclude from the --networks list.
--folds number	Optional. The number of folds to use during cross validation. Defaults to 5.
--min number	Optional. The minimum number of positive genes for a query. Queries with fewer genes will be skipped. Defaults to 10.
--max number	Optional. The maximum number of positive genes for a query. Queries with more genes will be skipped. Defaults to 300.
--use-go-cache	Optional. Perform validation against bundled Gene Ontology gene sets. In this case, the query file should contain one GO id per line (e.g. GO:0005786). These gene sets have been pre-filtered so that the smallest has 10 genes and the largest has 300.
--outfile file-name	Optional. The file where the validation results should be saved. If not provided, the results are sent to standard output (usually the console).
--auto-negatives	Optional. Forces all non-positive genes to be labeled as negative examples during prediction. Otherwise, negative examples must be explicitly listed in the query file.
--method weighting-method	Optional. The weighting method to use when combining the individual networks. Defaults to automatic.
--seed number	Optional. A value used to initialize the pseudo-random number generator used for shuffling each gene set during validation. Setting the seed to a constant value will make the validation results deterministic. Defaults to a pseudo-random value.
--threads number	Optional. The maximum number of parallel predictions. Ideally this should be set to the number of processing cores. Defaults to 1.
--verbose	Optional. Makes NetworkAssessor print more details about what’s happening.

Query File Format:

Network Assessor uses the same query file format as Cross Validator.

Validation Set Maker

Produces sets of genes based on Gene Ontology (GO) annotations for use in cross validation. One gene set is created for each GO category in the ontology. Annotations are propagated upwards: a gene annotated with a specific category is also included in the gene sets of that category’s parent categories.
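
The upward propagation of annotations can be sketched in a few lines. The example below assumes propagation continues through every ancestor (parents, grandparents, and so on), which is the usual convention for GO but is an assumption about this tool. The category and gene names are hypothetical placeholders, not real GO ids.

```python
def ancestors(term, parents):
    """Return every category reachable by following parent links upward."""
    found = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, ()))
    return found


def propagate_annotations(direct, parents):
    """Add each category's genes to the gene sets of all its ancestors.

    direct maps a category to the genes annotated to it directly;
    parents maps a category to its immediate parent categories.
    """
    propagated = {term: set(genes) for term, genes in direct.items()}
    for term, genes in direct.items():
        for ancestor in ancestors(term, parents):
            propagated.setdefault(ancestor, set()).update(genes)
    return propagated


# Hypothetical three-level ontology: child -> parent -> root.
direct = {"child": {"G1"}, "parent": {"G2"}}
parents = {"child": ["parent"], "parent": ["root"]}
gene_sets = propagate_annotations(direct, parents)
for term in ("child", "parent", "root"):
    print(term, sorted(gene_sets[term]))
```

Here G1 is annotated only to the most specific category, yet it ends up in the gene sets of both broader categories.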

Usage (32-bit JVM):

java -Xmx900M -jar GeneMANIA.jar ValidationSetMaker options

Usage (64-bit JVM):

java -d64 -Xmx3G -jar GeneMANIA.jar ValidationSetMaker options

Options:

Name	Description
--organism name	The name or taxonomy id of an organism whose genes should be considered.
--query filename	The file where the resulting validation set should be saved.
--db JDBC-connection-string	Optional. A JDBC connection string for a GO MySQL database. No other database backends are currently supported. Defaults to EBI’s MySQL instance (i.e. jdbc:mysql://mysql.ebi.ac.uk:4085/go_latest?user=go_select&password=amigo).
--branch GO-branch	Optional. One of bp, mf, cc, or all, which selects GO categories from the biological process, molecular function, cellular component, or all branches, respectively. Defaults to all.

Common Options

Organisms:

Name	Taxonomy Id
A. Thaliana	3702
C. Elegans	6239
D. Melanogaster	7227
H. Sapiens	9606
M. Musculus	10090
S. Cerevisiae	4932
R. Norvegicus	10116

Networks

Networks may be specified by type or by name. To get a full listing of network names, use the option --list-networks with Query Runner.

Available Network Types:

coexp	Co-expression
coloc	Co-localization
gi	Genetic interactions
path	Pathway interactions
pi	Physical interactions
predict	Predicted
spd	Shared protein domains
other	Networks that don’t belong to any of the above types.
default	The default set of networks used by the Cytoscape plugin and genemania.org.
all	Shorthand for specifying all available networks.
preferred	Shorthand for coexp,pi,gi. Typically used for cross validation.

Weighting Methods

automatic	Default. The networks are weighted such that the query genes interact as much as possible.

    Note: This option corresponds to the query gene-based combining method on the web site. If you want the same behaviour as the web site’s automatic combining method, use automatic_relevance.

automatic_relevance	A weighting method is chosen based on your query. This is the same behaviour as the “Automatically selected weighting method” option on the web site.
average	All networks are weighted equally.
average_category	Networks are weighted such that each type of network has the same overall weight.

For Organisms With GO Annotations:

bp	Networks are weighted in an attempt to reproduce Gene Ontology Biological Process co-annotation patterns.
mf	Networks are weighted in an attempt to reproduce Gene Ontology Molecular Function co-annotation patterns.
cc	Networks are weighted in an attempt to reproduce Gene Ontology Cellular Component co-annotation patterns.
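
The difference between average and average_category is easiest to see with numbers. The sketch below is one plausible reading of the two descriptions, assuming weights are normalized to sum to 1; GeneMANIA's actual normalization may differ, and the network names are hypothetical.

```python
from collections import Counter


def average_weights(network_types):
    """Give every network the same weight."""
    n = len(network_types)
    return {name: 1.0 / n for name in network_types}


def average_category_weights(network_types):
    """Give every network type the same total weight, split evenly within it.

    network_types maps a network name to its type (e.g. coexp, pi).
    """
    counts = Counter(network_types.values())
    n_types = len(counts)
    return {
        name: 1.0 / (n_types * counts[net_type])
        for name, net_type in network_types.items()
    }


# Hypothetical selection: two co-expression networks, one physical interaction network.
selected = {"coexp-A": "coexp", "coexp-B": "coexp", "pi-A": "pi"}
print(average_weights(selected))
print(average_category_weights(selected))
```

Under this reading, average gives each of the three networks one third of the weight, while average_category gives the single physical interaction network 0.5 and each co-expression network 0.25.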