Analyzing experimental data in PathVisio.
Follow the step-by-step instructions below (a video will be uploaded soon).
Step 1: Preparation and downloads
Download and start PathVisio
- Go to https://www.pathvisio.org/downloads and download the latest PathVisio release.
- Webstart version: Start PathVisio by clicking on the webstart jnlp file.
- Binary installation: Start PathVisio by running pathvisio.sh (Linux and macOS) or pathvisio.bat (Windows).
Download and prepare tutorial example data
- Download tutorial data (tutorial_files.zip). Extract this zip file. It contains:
  - Experimental dataset file: gcrma_ES_EB-expression.xls
  - 8 mouse pathways from WikiPathways: pathways directory
  - Identifier mapping database: Mm_lite.bridge. Don’t forget to load this identifier mapping database in PathVisio (Data -> Select Gene Database)!
The data is in the file gcrma_ES_EB-expression.xls, an Excel spreadsheet. PathVisio cannot read this format directly, so you first need to open the file in Excel and save it as tab-delimited text (.txt).
- Open gcrma_ES_EB-expression.xls in Excel. While the file is open in Excel, you can examine the dataset. Can you figure out what the columns mean? How many samples have been measured in this dataset, and how many conditions have been compared?
- Click File -> Save as. In the drop-down menu next to Save as type, select Text (Tab delimited).
- Click Save. A popup will appear warning about incompatible features. Click Yes.
- Close Excel. Select No when it asks whether to save again; saving a second time is not necessary.
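Before importing, it can help to confirm that the exported file really is tab-delimited. The following is a minimal sketch using only the Python standard library; the function name `summarize_tab_delimited` is made up for this illustration, and the encoding handling is an assumption, since Excel may write the text file in a system-specific encoding.

```python
import csv


def summarize_tab_delimited(path):
    """Return (header, number_of_data_rows) for a tab-delimited text file.

    Hypothetical helper for this tutorial, not part of PathVisio.
    """
    # errors="replace" is an assumption: it keeps the check from failing
    # if Excel wrote the file in a non-UTF-8 encoding.
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)                       # first line holds the column names
        n_rows = sum(1 for row in reader if row)    # count non-empty data lines
    return header, n_rows
```

For example, if you saved the export as gcrma_ES_EB-expression.txt (the name is an assumption), `summarize_tab_delimited("gcrma_ES_EB-expression.txt")` returns the column names and the number of data rows. If the header comes back as a single long string, the file was probably not saved with tab delimiters.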
Open example pathway in PathVisio
Before we import the data in PathVisio, we can already open a pathway.
- Click File -> Open and select the file Mm_Apoptosis.gpml in the pathways directory that you extracted at the beginning of the tutorial.
Step 2: Data import in PathVisio
- Select Data -> Import expression data.
- For the input file, select the tab delimited text file you just created.
- Leave the output file as it is.
- Make sure the gene database is Mm_lite.bridge (can be found in the tutorial_files folder). Click Next.
- Make sure the data delimiter is set to “tab”. Click Next.
- Make sure the primary identifier column is set to “Probe set”.
- Make sure the system code column is set to “System Code”. Click Next. Data import may take a few minutes.
- When the import is finished, you should see a message like this:
1034 genes were added successfully to the expression dataset
273 exceptions occurred
- You can safely ignore these 273 exceptions; they are caused by unrecognized genes. These are usually unknown genes, and it is unavoidable that a certain percentage of the data is unknown.
- When the Finish button becomes clickable, click it. A file named gcrma_ES_EB-expression.pgex has now been created; it stores all the gene expression data.
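To put the import message in perspective, you can work out what share of the rows could not be mapped. This is a small sketch that assumes every row of the input file ends up either as an added gene or as an exception; the function name is hypothetical.

```python
def unmapped_fraction(added, exceptions):
    """Fraction of imported rows that raised an exception.

    Assumes each input row is counted exactly once, either as
    "added" or as an "exception".
    """
    total = added + exceptions
    return exceptions / total


# Counts taken from the import message shown in this tutorial.
fraction = unmapped_fraction(added=1034, exceptions=273)
print(f"{fraction:.1%} of {1034 + 273} rows could not be mapped")
```

Under that assumption, roughly one in five rows in the tutorial dataset is not recognized by the identifier mapping database.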
Step 3: Visualization options
- Go to Data -> Visualization options. A dialog window pops up.
- Create a new visualization using the tool icon in the top-right corner. You are asked to type a name; any name will do.
- Enable the check box Text Label. A small panel shows up, but you don’t need to change anything there.
- Enable the check box Expression as color. A panel shows up.
- Scroll down a bit in the table and mark the check box for the item log Fold EB vs ES.
- Create a new color set by clicking the tool icon in the bottom right and selecting New.
- In the new dialog, select the Gradient check box.
Now the gene boxes on the pathway should have changed color based on their expression values. For example, down-regulated genes are blue, up-regulated genes are yellow, and unchanged genes are grey (the exact colors depend on the color options you selected).
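The idea behind a gradient color set can be sketched in a few lines of Python. This is an illustration only, not PathVisio's implementation: the value range (-1 to 1), the RGB values for blue, grey, and yellow, and the function names are all assumptions chosen to match the example colors mentioned above.

```python
def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB tuples; t runs from 0 to 1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def gradient_color(value, low=-1.0, mid=0.0, high=1.0,
                   low_rgb=(0, 0, 255),       # blue: down-regulated (assumed)
                   mid_rgb=(128, 128, 128),   # grey: unchanged (assumed)
                   high_rgb=(255, 255, 0)):   # yellow: up-regulated (assumed)
    """Map an expression value onto a three-point color gradient."""
    # Values outside the gradient range are clamped to its end colors.
    value = max(low, min(high, value))
    if value <= mid:
        return lerp_color(low_rgb, mid_rgb, (value - low) / (mid - low))
    return lerp_color(mid_rgb, high_rgb, (value - mid) / (high - mid))
```

With these assumed settings, a value of 0 maps to grey, and values at or beyond the ends of the range map to pure blue or pure yellow.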
Open other pathways. Which pathway has changed the most?
Step 4: Pathway statistics
- Go to Data -> Statistics. In the Expression field, enter the following:
[Fold EB vs ES] > 1.2 AND [rawp] < 0.05
- In the Pathway directory field, select the pathway directory that comes with the tutorial files you downloaded in the beginning.
- Click Calculate.
After the calculations have finished, you get a list of pathways, ordered by the number of genes that have a fold change > 1.2 (i.e., have increased by more than 20% in expression) and a significant rawp value (significance level 0.05).
In the result table, you see the values for r, the number of genes that meet your criterion, and n, the total number of genes in the pathway. From these values, a percentage and a z-score are calculated. Note how the z-scores and percentages are related for different pathways.
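To see how r and n can turn into a percentage and a z-score, here is a minimal sketch. It assumes a MAPPFinder-style z-score based on the hypergeometric distribution; whether PathVisio computes it exactly this way is an assumption. The symbols N (all genes measured in the dataset) and R (all genes in the dataset that meet the criterion) are introduced here for the illustration and do not appear in the result table described above.

```python
import math


def pathway_percentage(r, n):
    """Percentage of genes in the pathway that meet the criterion."""
    return 100.0 * r / n


def pathway_z_score(r, n, R, N):
    """Z-score of observing r hits among n pathway genes.

    r: genes in the pathway meeting the criterion
    n: genes in the pathway
    R: genes in the whole dataset meeting the criterion (assumed symbol)
    N: genes in the whole dataset (assumed symbol)
    """
    expected = n * R / N
    # Hypergeometric variance, including the finite-population correction.
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    if variance <= 0:
        return float("nan")  # undefined, e.g. when no gene or every gene is a hit
    return (r - expected) / math.sqrt(variance)
```

A pathway with exactly the expected number of hits gets a z-score of 0. Two pathways with the same percentage can still have different z-scores, because the larger pathway provides stronger evidence.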
You can do the same calculation for down-regulated genes by using the following expression:
[Fold EB vs ES] < -1.2 AND [rawp] < 0.05
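The two criteria above can be expressed as a simple row filter. The sketch below assumes each row of the dataset is available as a dictionary keyed by column name, for instance as produced by Python's `csv.DictReader` on the tab-delimited file; the function names are hypothetical, and the code mirrors the expressions literally rather than reproducing PathVisio's expression parser.

```python
def meets_criterion(row, direction="up", fold_cutoff=1.2, p_cutoff=0.05):
    """Check one data row against the tutorial's up or down criterion.

    "up"   mirrors: [Fold EB vs ES] > 1.2 AND [rawp] < 0.05
    "down" mirrors: [Fold EB vs ES] < -1.2 AND [rawp] < 0.05
    """
    fold = float(row["Fold EB vs ES"])
    rawp = float(row["rawp"])
    if direction == "up":
        return fold > fold_cutoff and rawp < p_cutoff
    return fold < -fold_cutoff and rawp < p_cutoff


def count_hits(rows, direction="up"):
    """Count the rows that meet the criterion (the r value for a set of genes)."""
    return sum(1 for row in rows if meets_criterion(row, direction))
```

Counting hits over the genes of a single pathway gives a value comparable to r in the result table, under the assumption that each gene corresponds to one row.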