R Interface
From PathVisio Wiki
R interface and package specification
This page gives an overview of the structure of the R interface and the PathVisio R package. The figure below illustrates the structure of the R interface, with a schematic representation of the data flow when performing pathway statistics in PathVisio.
When a statistical test is performed, the pathways and experimental data selected by the user are first cross-annotated and stored as Java objects. During cross-annotation, Ensembl references are added to the gene-products of the pathways and the data. This is required to link these gene-products with each other in R, where the annotation database is not available. The resulting Java objects are then transferred via the Java R Interface (JRI) to their R counterparts, which are defined by the PathVisio R package. The data in R can be saved to an R data file for later use. Once the data has been exported, or loaded from a previously saved R data file, the user can choose and configure a statistical function to apply.
The PathVisio R package defines an object-oriented data structure to represent pathway information and experimental data from PathVisio in R. The classes are PathwaySet, Pathway and GeneProduct for pathway data, DataSet for experimental data, and VisioFunction and ResultSet for a statistical function and its results, respectively. The following paragraphs briefly discuss each class. Full documentation can be found in the R documentation files included with the R package, which can be downloaded from the PathVisio repository.
GeneProduct
Gene-products are represented by the class GeneProduct. A gene-product can contain multiple references, each consisting of an identifier and a database code. These references are stored in an n-by-2 matrix, where n is the number of references, the first column contains the identifier and the second column the database code. For performance and convenience, references can also be represented as a character vector of the form “identifier:code”. This allows lists and arrays to be indexed by reference, which in turn allows use of the fast lookup and comparison functions in R. The GeneProduct class provides methods to create, convert and compare gene-product objects and their character vector representation.
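The two representations described above can be sketched in plain R. This is a minimal illustration, not the package's own API: the helper names as_ref_strings and as_ref_matrix are hypothetical, the identifiers and database codes are only illustrative, and the sketch assumes identifiers contain no ":" character.

```r
# References of one gene-product as an n-by-2 character matrix.
# Identifiers and database codes are illustrative values only.
refs <- matrix(
  c("ENSG00000141510", "En",
    "7157",            "L"),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("id", "code"))
)

# Hypothetical helper: collapse each row to the "identifier:code" form.
as_ref_strings <- function(m) paste(m[, "id"], m[, "code"], sep = ":")

# Hypothetical helper: split "identifier:code" strings back into a matrix.
# Assumes the identifier itself contains no ":".
as_ref_matrix <- function(x) {
  m <- do.call(rbind, strsplit(x, ":", fixed = TRUE))
  colnames(m) <- c("id", "code")
  m
}

ref_strings <- as_ref_strings(refs)

# Character references can index a named list directly ...
annotations <- list("ENSG00000141510:En" = "TP53", "7157:L" = "TP53")
annotations[[ref_strings[1]]]

# ... and work with R's vectorised lookup functions.
ref_strings %in% c("7157:L", "1234:L")
```

The character form is what makes operations such as %in% and match usable on references without comparing matrix rows one by one.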
Pathway and PathwaySet
The Pathway object is used to represent a pathway, which is currently a list of objects of the class GeneProduct. This representation contains sufficient information for current statistical methods, but should be extended when methods are developed that include relationships between pathway-elements in the analysis. The Pathway class provides methods to match lists of gene-product references or objects of the class GeneProduct with the gene-products in the pathway. The PathwaySet object is a list of Pathway objects, used to represent a collection of pathways that is exported from PathVisio.
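As a rough illustration of matching references against a pathway, the sketch below models a gene-product as a character vector of “identifier:code” references and a pathway as a named list of gene-products. The helper match_refs and all identifiers are made up for this example; the actual Pathway methods are described in the package documentation.

```r
# A gene-product is sketched as a character vector of "identifier:code"
# references; a pathway as a named list of gene-products.
# All identifiers are made up.
pathway <- list(
  gpA = c("ENSG00000000001:En", "1001:L"),
  gpB = c("ENSG00000000002:En"),
  gpC = c("ENSG00000000003:En", "1003:L")
)

# Hypothetical helper: for each query reference, return the name of the
# first gene-product in the pathway that carries it (NA if none does).
match_refs <- function(pathway, query) {
  all_refs <- unlist(pathway, use.names = FALSE)
  owner <- rep(names(pathway), lengths(pathway))
  owner[match(query, all_refs)]
}

hits <- match_refs(pathway, c("1003:L", "ENSG00000000002:En", "9999:L"))

# A pathway set is then simply a list of such pathways.
pathway_set <- list(pathwayOne = pathway, pathwayTwo = pathway["gpB"])
hits_per_pathway <- sapply(
  pathway_set,
  function(p) sum(!is.na(match_refs(p, "1001:L")))
)
```

Flattening the pathway into one reference vector lets a single call to match handle any number of query references.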
DataSet
PathVisio exports its experimental data to a DataSet object. This object is a matrix with a row for each gene-product reference and a column for every variable in the data. Additionally, a DataSet may contain a list with the mappings from the reporters to Ensembl genes. This conversion is needed to link the references in the data to the gene-product references in the pathway when the two use different database systems. Both the Pathway and DataSet classes provide a method called asEnsembl that represents all contained references as Ensembl genes. If a reference maps to multiple Ensembl genes, the corresponding row of the DataSet is duplicated for every Ensembl gene.
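The row duplication described above can be sketched with a plain matrix and a mapping list. The function as_ensembl_sketch is a hypothetical stand-in for the idea behind asEnsembl, not its implementation; the data values and identifiers are made up, and dropping reporters without an Ensembl mapping is an assumption of this sketch.

```r
# Experimental data: one row per reporter reference, one column per variable.
# All identifiers and values are made up.
data <- matrix(
  c( 1.2, 0.04,
    -0.7, 0.30,
     2.1, 0.01),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("probe1:X", "probe2:X", "probe3:X"), c("logFC", "pvalue"))
)

# Reporter-to-Ensembl mapping; probe2 maps to two genes, probe3 to none.
to_ensembl <- list(
  "probe1:X" = "ENSG00000000001:En",
  "probe2:X" = c("ENSG00000000002:En", "ENSG00000000003:En"),
  "probe3:X" = character(0)
)

# Hypothetical stand-in for the idea behind asEnsembl: repeat each row once
# per mapped Ensembl gene and relabel the rows with the Ensembl references.
# Rows without a mapping are dropped (an assumption of this sketch).
as_ensembl_sketch <- function(data, mapping) {
  genes <- mapping[rownames(data)]
  row_index <- rep(seq_len(nrow(data)), lengths(genes))
  out <- data[row_index, , drop = FALSE]
  rownames(out) <- unlist(genes, use.names = FALSE)
  out
}

ensembl_data <- as_ensembl_sketch(data, to_ensembl)
```

Here the row for probe2 appears twice in ensembl_data, once under each Ensembl reference, so both genes carry the same measured values.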
ResultSet
To import the results of a statistical test back into PathVisio, the data structure for these results is defined in the ResultSet class. Generally, statistical tests on pathways yield several numerical values per pathway, such as a p-value. A ResultSet object is therefore a matrix with a row for every pathway and a column for every statistic that needs to be shown in PathVisio. Internally, an additional column holds the filename of the pathway, which is needed to re-open the pathway in PathVisio for further analysis. These filenames, as well as the pathway names, which are stored internally as rownames, can be obtained using the fileNames and pathwayNames methods.
VisioFunction
Functions to be used in PathVisio by end-users can be created by instantiating an object of the class VisioFunction. This class extends the function type of R and includes additional metadata that is used to create the user interface for configuring the function in PathVisio. Examples of this metadata are the name, the description and the argument types. PathVisio creates a configurable user-interface element for every argument: either a text box or a combo box that lists all variables of the argument type that are present in the R workspace (B).
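The idea of a function that carries its own metadata, and of listing workspace variables by type, can be sketched in base R as follows. This does not use the VisioFunction or ResultSet classes themselves: make_function_sketch, candidates, the attribute names, the toy coverage statistic and the ".gpml" file names are all assumptions made for illustration, and storing file names in an attribute is only one possible layout.

```r
# Hypothetical constructor: attach metadata to an ordinary R function.
make_function_sketch <- function(fun, name, description, arg_types) {
  stopifnot(is.function(fun),
            all(names(arg_types) %in% names(formals(fun))))
  attr(fun, "name") <- name
  attr(fun, "description") <- description
  attr(fun, "arg_types") <- arg_types
  fun
}

# Toy statistic: fraction of each pathway's references found in the data.
# Pathways are character vectors; data is a matrix with reference rownames.
coverage <- make_function_sketch(
  function(pathways, data) {
    frac <- vapply(pathways,
                   function(p) mean(p %in% rownames(data)),
                   numeric(1))
    stats <- cbind(coverage = frac)  # one row per pathway, one column per statistic
    # File names kept beside the statistics (storage layout is an assumption).
    attr(stats, "fileNames") <- paste0(names(pathways), ".gpml")
    stats
  },
  name = "Coverage",
  description = "Fraction of pathway gene-products present in the data",
  arg_types = c(pathways = "PathwaySet", data = "DataSet")
)

# Hypothetical helper: names of objects in an environment that inherit from
# a class, i.e. the candidates a combo box could offer for one argument.
candidates <- function(type, env = globalenv()) {
  Filter(function(n) inherits(get(n, envir = env), type), ls(env))
}

pathways <- list(pathwayOne = c("a:En", "b:En"), pathwayTwo = c("c:En", "d:En"))
data <- matrix(0, nrow = 3, ncol = 1,
               dimnames = list(c("a:En", "b:En", "c:En"), "logFC"))
result <- coverage(pathways, data)
```

Because the metadata travels with the function object, a caller can read attr(coverage, "arg_types") to decide which input element to build for each argument before calling the function.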
Java counterparts
Every R object has a Java counterpart that is used to export data from PathVisio to R or, in the case of the VisioFunction and ResultSet objects, to display R data in PathVisio. The Java counterpart of VisioFunction can be extended to override the automatic generation of the user interface for configuring the function. Custom user-interface elements can be implemented for the arguments to make configuration of the function more user-friendly. The Java counterpart of ResultSet is displayed as a table in the side-panel, where the rows represent pathways and the columns the different statistics. Using the fileName slot in ResultSet, pathways can be opened directly in PathVisio by clicking on a row.
