Loading Tabular Data

Introduction

This tutorial will take you through the steps necessary to import tables into Genome Workbench. Tables can be any file that has a fixed number of fields in each row and has well-defined separators, such as tabs, whitespace or commas, between those fields. This can be very useful for user-created files, such as data exported from Microsoft Excel (tm). When a file is in a format already supported by Genome Workbench, such as a BED file, the file can be opened directly by selecting File on the open dialog rather than going through table import.

Select and Preview the Table
Select Field Delimiters
Identify Field Types
Transform and Save Results
View Transformed Tables

The following file will be used as the sample dataset for this tutorial:

sample_table.txt

Step 1: Select and Preview the Table

To start, from the File menu choose Open and select Table from the left side.

This opens the table-import wizard. On the first page under Filename:, click the file selection button and use the file browser to select sample_table.txt. You will immediately see the contents of the file under Table Preview at the bottom of the page.

Notice the option Table Data Format on this screen. It's telling you it thinks this is a delimited table. This option lets you decide how to import the table, as a Delimited table or a Fixed Width table. Most of the tables we deal with should be imported as delimited.

The next option, Import data starting with row: lets you skip the first rows of the table because, for example, they are comment rows. Just set the number of the first row you want to import here. The following field, Comment Character can be set to a single character that precedes all of the header rows. It may be filled in automatically after you select the file, but you can change it if you wish.

The last option, Parse Column Names from Selected Row lets you click on a row in the table header and the importer will then parse column names from that row using the same rules it uses for parsing the other rows. Header names can also be entered manually.

Step 2: Select Field Delimiters

After selecting Next > on the preview screen, you you will able to select the delimiters for separating the table rows into fields. When the file was read in, the table loader made a "best guess" of the delimiters and that will be shown on this page. To change them, click on the combination of delimiters and watch the results under Table Data to see which combination separates the fields correctly.

In addition to choosing the delimiters you can choose whether to Treat adjacent delimiters as a single delimiter. What this means is that if your delimiter is a tab, and you have two adjacent tabs with no text in between, will that create a blank field or will it be treated as a single break between fields. You can also choose whether there is a quote-character for defining strings. If you choose this option and your delimiter is a comma, for example, than any commas that appear between quotes will not be treated as field separators.

If on the first screen you indicated that the Table Data Format was a Fixed Width format, then you will be taken to a different screen for dividing rows into fields. On this screen you split existing columns in two by clicking on Insert Column and then clicking on the column to be split. To remove a column, click on Remove Column and then click on the column to be removed. After you add and remove columns you can move the column dividers around until the fields are properly divided.

Step 3: Identify Field Types

After leaving the delimiter or fixed width pages that divided the rows into fields, the table importer takes its best guess as to the underlying data and semantic types for each column. On this page you can review and update the column names, data type and properties.

To update the name and properties for a column, click on the column name in the table at the bottom of the page. When you then pick a Data Type such as Sequence ID or Integer the Properties list will change to show options for that type. If you select an Integer type you need to indicate whether the value is One-Based or not. The standard convention in Biological data is for data to be one-based, e.g. the first nucleotide in a sequence is position one. If a field is 0-based, the more common programming convention, then the first element of the set would be position 0. Lastly under types, if you do not want to import a column, select Skipped.

If one or more ID columns in the table represent a chromosome number or letter, then select an assembly for the column so that an accession can be found for the ID in the next import step. To choose an assembly for a column, click on the column header, then enter a search term to find the correct genome and then click Find Assembly.

Find Assembly brings up the Select Assembly selection dialog. Select the correct assembly here for the current column. Note that if you have chromosome identifiers in more than one column, you can assign a different assembly to each of these ID columns.

After all the column name, types and, if needed, assemblies have been specified for the columns hit Next > to go to the transformations page.

Step 4: Transform and Save Results

The transformation page allows you to save your table and, if the proper fields are available, to specify locations or features in order to link your table entries to other data. The three transformation and save options are:

Convert to Feature Table
Create Locations
Keep Table Unchanged

The last transformation option, Keep Table Unchanged saves your table directly as an ASN file with the column names and types you specified on the previous Identify Field Types page. This is the only option that you will be able to use if your table does not include the ID, start position and stop position columns needed to create features or locations.

The Convert to Feature Table transformation option adds and updates fields as needed so that the rows can be viewed as region features in Genome Workbench. If none of the columns were identified as a region field, one is added automatically. An ID, start, and stop column are also required. If there is no strand column, one is added based on whether stop is greater than or equal to start (positive) or start is greater than stop (negative). Lastly, if more than one ID column has been identified, you will not be able to create region features since these features can only refer to one region per row. To fix this, go back to the column types page and set the type of the column(s) you do not want to see to be Text or Integer rather than Sequence ID.

The Create Locations option does not change any of the existing columns but it does add a new location column to the table for each set of ID, start and stop fields in the table. When you open the resulting ASN file in Genome Workbench you will be able to click on the locations and jump to the corresponding positions in the Graphical View.

Once you have chosen a Transformation Type click Next > to save the resulting ASN file which you will then be able to open as a table in Genome Workbench.

This window also gives you the option to save your transformation parameters for the table by selecting Save Import Parameters. This option is provided for debugging and for future use.

Step 5: View Transformed Tables

Once you have saved the transformed table you will see it in your project tree and you will be able to open it as a Generic Table View or as a Text View. If you specified a feature table in your transformation step, then you can also open the the corresponding sequence in the same folder of the project tree and see the features on the sequence. Note that to see region features in the Graphical View, you also need to select the display option Other features as shown in the image.

If you added locations to your table, you will see the location field(s) in the Generic Table View on the right hand side of the table. Double-clicking on the locations will cause the corresponding sequence to open at the specified location.

If the table was saved without changes then the fields should appear just as they did in the original file, minus any fields that you chose not to import.

Download

Current Version is 2.6.0 (released August 31, 2012)

Release Notes
Windows
- Windows 7 (64-bit)
- Windows 2000/XP/Vista
Mac OS X
- Mac OS X 10.6+ 64-Bit (Snow Leopard, Lion, Mountain Lion)
Linux (Ubuntu 10.04 LTS (Lucid Lynx))
Source
Older Versions

Genome Workbench

Genome Workbench

Loading Tabular Data

Introduction

Step 1: Select and Preview the Table

Step 2: Select Field Delimiters

Step 3: Identify Field Types

Step 4: Transform and Save Results

Step 5: View Transformed Tables

Download

Simple NCBI Directory

Getting Started

Resources

Popular

Featured

NCBI Information