GeneDatabaseLayout

From PathVisio Wiki

Jump to: navigation, search

Contents

Synonym Database Schema

Background

We call gene databases in PathVisio PGDB's, which stands for PathVisio Gene Database (although they are not restricted to gene information). They are similar to GenMAPP's gene databases. Both are distributed as standalone files, PGDB's get a .pgdb extension.

Java Code

To see how this is used in practice, take a look at the Java code at our subversion repository (http://svn.bigcat.unimaas.nl/pathvisio). In particular, take a look at these classes

  • org.pathvisio.data.Gdb -> interface for the rest of the application for contacting gene databases. Theoretically, this could be implemented in something other than a relational database, e.g. via a webservice.
  • org.pathvisio.data.SimpleGdb -> simple jdbc-based implementation of the Gdb interface
  • org.pathvisio.data.DBConnector -> interface for a glue layer to work with different database implementations (e.g. MySQL, server, etc)
  • org.pathvisio.data.DataDerby -> Derby-based implementation of DBConnector, can handle jar'ed single-file .pgdb files

Release process

Since pgdb's are intended to be released as stand-alone files, they store a schema version number so that we can write code to provide backwards compatibility. Especially gene databases lose their value with age (old annotation etc.) so we do expect users to upgrade reasonably quickly.

Schema Version 2.0

DataNode

Table to store datanode information such as identifier, database code and description For a list of database codes, see [1]

Table structure

column type key description
id varchar(50) PRI datanode identifier
code varchar(50) PRI identifier database code
backpageText varchar(800) backpage text (html formatted)

NOTE: This table was renamed from "Gene" for schema version 2

NOTE: backpageText is deprecated and will probably disappear in the future. The attribute table can take over this function.

Indexes

name column(s) generated
id, code automatic
I_DNCODE code manual
I_DNID id manual

Link

Table to link datanode identifiers from different databases with each other.

Table structure

column type key description
idLeft varchar(50) PRI left datanode identifier
codeLeft varchar(50) PRI left database code identifier
idRight varchar(50) PRI right datanode identifier
codeRight varchar(50) PRI right database code identifier

Thomas 00:41, 7 June 2007 (CEST): Currently the 'left' columns are only used to store Ensembl identifiers

Indexes

name column(s) generated
idLeft, codeLeft, idRight, codeRight automatic
I_CODELEFT codeLeft manual
I_RIGHT idRight, codeRight manual

Attribute

Table used to store attributes of a datanode

Table structure

column type key description
id varchar(50) PRI datanode identifier
code varchar(50) PRI identifier database code
attrName varchar(50) PRI attribute name (key)
attrValue varchar(255) attribute value

We can add or remove any type of attribute to this table without breaking anything. Note: this table is new in Schema v2.

Indexes

name column(s) generated
id, code, attrName automatic

Info

Table used to store the version of the database schema, used for compatibility checks in PathVisio This table should have only one row.

Table structure

column type key description
schemaversion integer database schema version, in this case 2
builddate integer date of database generation, format: yyyymmdd

Schema Version 1.0

Gene

Table to store datanode information such as identifier, database code and description

column type key description
id varchar(50) PRI datanode identifier
code varchar(50) PRI identifier database code
bpText varchar backpage text (html formatted)

Link

Table to link datanode identifiers from different databases with each other.

column type key description
idLeft varchar(50) PRI left datanode identifier
codeLeft varchar(50) PRI left database code identifier
idRight varchar(50) PRI right datanode identifier
codeRight varchar(50) PRI right database code identifier
bridge varchar(50) GenMAPP legacy, unused by PathVisio

Thomas 00:41, 7 June 2007 (CEST): Currently the 'left' columns are only used to store Ensembl identifiers

Info

Table used to store the version of the database schema, used for compatibility checks in PathVisio

column type key description
schemaversion integer PRI database schema version, in this case 1

Proposed changes for Version 3.0

  • Suggestion by John Dionisio: 64-bit integers as primary (local) keys -> faster and more efficient
  • Store species information in info table for automatic database selection?
    • Bridge column is already dropped in schema version 2 - thomas 12:20, 23 July 2008 (CEST)
  • Drop backpageText column?
Personal tools