Working with Non-Public Data

Step 1: Introduction

This tutorial demonstrates two different ways to manage private data in Genome Workbench.

  • You've created your own sequence and want to work with it in Genome Workbench
  • You want to view your own data/annotation on a publicly available sequence

We'll demonstrate many of the Genome Workbench tools on data not found in the NCBI databases.

It is recommended that you complete Tutorial 1: Basic Operation first.

You'll need the sample data set BX530088_BX572102 to complete this tutorial.

Step 2: Getting Started

For the first exercise, we're going to do the following:

  • Load a user-generated AGP file (download sample)
  • SPLIGN some mRNAs on that AGP sequence
  • Create a FASTA file from the AGP
  • BLAST that FASTA sequence to see what's related to it
  • WindowMask that FASTA sequence (or part of it) to look for repetitive regions

When Genome Workbench starts, it displays the main screen. From here, choose File->Open from the main menu to load your data file. Genome Workbench (Gbench) understands many different file formats; for this step, choose BX530088_BX572102.comp.agp from the data files you downloaded. Click Next, then Next again to accept the defaults. Then click Finish to add the data file to a new project.

Now that your data is loaded, you can view it by selecting the data in the project tree, right-clicking, and choosing Open View. Then choose Graphical View. The default view isn't very interesting, but you can zoom in to see the sequence.
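For readers new to the format: an AGP file describes how a larger sequence object is assembled from components and gaps, one tab-delimited row per part. The following is a minimal Python sketch of reading such rows outside Genome Workbench. The column layout follows the AGP specification; the sample rows, the object and component names, and the helper names (parse_agp, Component, Gap) are hypothetical and are not taken from the tutorial's sample file.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass
class Component:
    """A non-gap AGP row: a piece of a component placed on the object."""
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    component_type: str
    component_id: str
    comp_beg: int
    comp_end: int
    orientation: str


@dataclass
class Gap:
    """A gap AGP row (component type N or U)."""
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    component_type: str
    gap_length: int
    gap_type: str
    linkage: str


def parse_agp(lines: Iterable[str]) -> Iterator[Union[Component, Gap]]:
    """Yield one Component or Gap per data row, skipping comments and blanks."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"too few columns in AGP row: {line!r}")
        obj, beg, end, part, ctype = (
            cols[0], int(cols[1]), int(cols[2]), int(cols[3]), cols[4]
        )
        if ctype in ("N", "U"):
            # Gap rows: length, gap type, linkage (evidence column ignored here).
            yield Gap(obj, beg, end, part, ctype, int(cols[5]), cols[6], cols[7])
        else:
            if len(cols) < 9:
                raise ValueError(f"component row needs 9 columns: {line!r}")
            yield Component(
                obj, beg, end, part, ctype,
                cols[5], int(cols[6]), int(cols[7]), cols[8],
            )


# Hypothetical rows for illustration only.
SAMPLE_AGP = (
    "# hypothetical AGP rows\n"
    "my_object\t1\t1000\t1\tF\tcomp_A\t1\t1000\t+\n"
    "my_object\t1001\t1100\t2\tN\t100\tscaffold\tyes\tpaired-ends\n"
    "my_object\t1101\t1600\t3\tF\tcomp_B\t1\t500\t-\n"
)

rows = list(parse_agp(SAMPLE_AGP.splitlines()))
object_length = max(row.obj_end for row in rows)
for row in rows:
    print(row)
print("object length:", object_length)
```

To try this on a real file, pass an open file handle to parse_agp instead of the sample string.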

Step 3: Apply a tool to private data

Now let's align an mRNA to our sequence. We will use the SPLIGN tool. SPLIGN, or SPLiced Aligner, is a global alignment tool used in NCBI's annotation pipeline. Search the NCBI Public Databases for NM_020137.3 and add it to the project. Then, in the data folder, select both entries. With both selected, choose Tools->Run Tool to open the Tools dialog, choose SPLIGN, and click Next.

Select BX530088... for the genomic sequence and NM_020137.3 for the Transcript Sequences, then click Next.

Add the results to the existing project and click Finish.

Step 4: Export a FASTA file

Select the data file we loaded previously in the Project Tree View. Right-click (Control-click on Mac OS) the selected data and choose Export. Select FASTA as the format, select a location, and give the file a name. Click Finish.

Export FASTA

Now, open the FASTA file you just created. Choose File->Open. Select the file and click Next. Accept the default settings and click Next again. Choose to create a new project and click Finish.

Select the FASTA data in the Project Tree View and double-click it. From the Open View menu, choose Graphical View.
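If you'd like to sanity-check the exported file outside Genome Workbench, a short script can report each record's ID, length, and number of N bases. This is only a sketch using the Python standard library; the function name read_fasta and the example record are hypothetical, and the real defline written by the export may look different.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence_id: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Hypothetical record for illustration only.
EXAMPLE_FASTA = """>my_exported_sequence hypothetical defline
ACGTACGTNN
ACGT
"""

records = read_fasta(EXAMPLE_FASTA.splitlines())
for seq_id, seq in records.items():
    print(seq_id, "length:", len(seq), "N bases:", seq.upper().count("N"))
```

For the file you exported, open it and pass the file handle to read_fasta in place of the example string.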

Step 5: Alignment

From the Graphical View of the FASTA sequence, use region selection to select the entire sequence. Click and drag on the number line at the top of the view to begin the selection. Once you have a region selected, drag its edges out to the boundaries of the view.

Region Select

Region Select Complete

With the entire region selected, choose Run Tool (Tools->Run Tool from the main menu, or right-click (Control-click on Mac OS)). From the Run Tool dialog, choose BLAST Search and click Next.

Run Tool BLAST

In the BLAST Search dialog, make sure you've selected the Nucleotides option, BLASTn from the Program menu, and BLAST Human Sequences/genome (all assemblies) from the Database menu, then click Next.

BLAST Params 1

In the next dialog, accept the general parameters, check Filter low complexity regions, and select Human from the Species specific repeats for: menu. Then click Next. Choose to add the results to the existing project and click Finish.

BLAST Filter Params

It can take 30 seconds for the analysis to return and present the results.

Step 6: WindowMasker

In this step we'll use WindowMasker on the FASTA sequence to look for repetitive regions. The FASTA file should still be available in the Project Tree View. Select it, double-click it, and open a Graphical View. Select a region by clicking on the number line and dragging.

Region Selection

Choose Tools->Run Tool from the main menu. Then select Search/Find Repetitive Sequences with WindowMasker and click Next. Ensure that our sequence is selected (BX530088...), and select 9606 Homo sapiens from the Mask using parameters for menu. Then click Next. Choose a project to add the results to and click Finish. It can take 60-90 seconds for the job to complete.

The result is a histogram showing repeat regions. You can scroll and zoom as you would in any other view.

Window Masker Selection

If the histogram doesn't appear automatically, select the content menu at the bottom of the graphical view and choose Repeat Region (see figure).

Show repeat regions
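To build some intuition for the histogram, it helps to know that WindowMasker finds repeats from how often short words (N-mers) occur. The toy sketch below is not WindowMasker's actual algorithm, scoring, or thresholds: it counts k-mers within the input sequence itself and reports the mean count per window, so windows rich in repeated words get higher scores. The function names, the example sequence, and the values of k and window are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import List


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def window_scores(seq: str, k: int, window: int) -> List[float]:
    """Mean k-mer count for each non-overlapping window of k-mer start positions."""
    seq = seq.upper()
    counts = kmer_counts(seq, k)
    last_start = len(seq) - k + 1  # number of k-mer start positions
    scores = []
    for start in range(0, last_start, window):
        stop = min(start + window, last_start)
        kmers = [seq[i:i + k] for i in range(start, stop)]
        scores.append(sum(counts[kmer] for kmer in kmers) / len(kmers))
    return scores


# Hypothetical sequence: unique flanks around a short AT repeat.
TOY_SEQUENCE = "ACGTGCTA" + "ATATATAT" + "CGGACTTG"

scores = window_scores(TOY_SEQUENCE, k=3, window=8)
for index, score in enumerate(scores):
    # Crude text histogram: one '#' per 0.5 score units.
    print(f"window {index}: {score:.2f} " + "#" * round(score * 2))
```

The middle window, which covers the AT repeat, scores highest. The real tool works from genome-wide counts, which is presumably why the dialog asks which organism's parameters to mask with.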

Step 7: Conclusion

There are many ways to use Genome Workbench, and this tutorial shows only a few simple examples. It should give you enough background to start exploring your data in new and interesting ways, keeping your own data private while still giving you access to public data.

Last updated: 2012-08-27T11:15:20-04:00