GeneDatabaseLayout
From PathVisio Wiki
Contents |
Synonym Database Schema
Background
We call gene databases in PathVisio PGDB's, which stands for PathVisio Gene Database (although they are not restricted to gene information). They are similar to GenMAPP's gene databases. Both are distributed as standalone files, PGDB's get a .pgdb extension.
Java Code
To see how this is used in practice, take a look at the Java code at our subversion repository (http://svn.bigcat.unimaas.nl/pathvisio). In particular, take a look at these classes
- org.pathvisio.data.Gdb -> interface for the rest of the application for contacting gene databases. Theoretically, this could be implemented in something other than a relational database, e.g. via a webservice.
- org.pathvisio.data.SimpleGdb -> simple jdbc-based implementation of the Gdb interface
- org.pathvisio.data.DBConnector -> interface for a glue layer to work with different database implementations (e.g. MySQL, server, etc)
- org.pathvisio.data.DataDerby -> Derby-based implementation of DBConnector, can handle jar'ed single-file .pgdb files
Release process
Since pgdb's are intended to be released as stand-alone files, they store a schema version number so that we can write code to provide backwards compatibility. Especially gene databases lose their value with age (old annotation etc.) so we do expect users to upgrade reasonably quickly.
Schema Version 2.0
DataNode
Table to store datanode information such as identifier, database code and description For a list of database codes, see [1]
Table structure
| column | type | key | description |
|---|---|---|---|
| id | varchar(50) | PRI | datanode identifier |
| code | varchar(50) | PRI | identifier database code |
| backpageText | varchar(800) | backpage text (html formatted) |
NOTE: This table was renamed from "Gene" for schema version 2
NOTE: backpageText is deprecated and will probably disappear in the future. The attribute table can take over this function.
Indexes
| name | column(s) | generated |
|---|---|---|
| id, code | automatic | |
| I_DNCODE | code | manual |
| I_DNID | id | manual |
Link
Table to link datanode identifiers from different databases with each other.
Table structure
| column | type | key | description |
|---|---|---|---|
| idLeft | varchar(50) | PRI | left datanode identifier |
| codeLeft | varchar(50) | PRI | left database code identifier |
| idRight | varchar(50) | PRI | right datanode identifier |
| codeRight | varchar(50) | PRI | right database code identifier |
Thomas 00:41, 7 June 2007 (CEST): Currently the 'left' columns are only used to store Ensembl identifiers
Indexes
| name | column(s) | generated |
|---|---|---|
| idLeft, codeLeft, idRight, codeRight | automatic | |
| I_CODELEFT | codeLeft | manual |
| I_RIGHT | idRight, codeRight | manual |
Attribute
Table used to store attributes of a datanode
Table structure
| column | type | key | description |
|---|---|---|---|
| id | varchar(50) | PRI | datanode identifier |
| code | varchar(50) | PRI | identifier database code |
| attrName | varchar(50) | PRI | attribute name (key) |
| attrValue | varchar(255) | attribute value |
We can add or remove any type of attribute to this table without breaking anything. Note: this table is new in Schema v2.
Indexes
| name | column(s) | generated |
|---|---|---|
| id, code, attrName | automatic |
Info
Table used to store the version of the database schema, used for compatibility checks in PathVisio This table should have only one row.
Table structure
| column | type | key | description |
|---|---|---|---|
| schemaversion | integer | database schema version, in this case 2 | |
| builddate | integer | date of database generation, format: yyyymmdd |
Schema Version 1.0
Gene
Table to store datanode information such as identifier, database code and description
| column | type | key | description |
|---|---|---|---|
| id | varchar(50) | PRI | datanode identifier |
| code | varchar(50) | PRI | identifier database code |
| bpText | varchar | backpage text (html formatted) |
Link
Table to link datanode identifiers from different databases with each other.
| column | type | key | description |
|---|---|---|---|
| idLeft | varchar(50) | PRI | left datanode identifier |
| codeLeft | varchar(50) | PRI | left database code identifier |
| idRight | varchar(50) | PRI | right datanode identifier |
| codeRight | varchar(50) | PRI | right database code identifier |
| bridge | varchar(50) | GenMAPP legacy, unused by PathVisio |
Thomas 00:41, 7 June 2007 (CEST): Currently the 'left' columns are only used to store Ensembl identifiers
Info
Table used to store the version of the database schema, used for compatibility checks in PathVisio
| column | type | key | description |
|---|---|---|---|
| schemaversion | integer | PRI | database schema version, in this case 1 |
Proposed changes for Version 3.0
- Suggestion by John Dionisio: 64-bit integers as primary (local) keys -> faster and more efficient
- Store species information in info table for automatic database selection?
- Bridge column is already dropped in schema version 2 - thomas 12:20, 23 July 2008 (CEST)
- Drop backpageText column?
