JoVE Journal
Genetics
Informatic Analysis of Sequence Data from Batch Yeast 2-Hybrid Screens

7,030 Views

09:14 min

June 28, 2018


Transcript

Automatically generated

This method can help answer key questions about the nature of protein interactomes. The main advantage of this technique is that all the bioinformatics processing is operated through an easy-to-use interface. This software employs a user-friendly graphical user interface that molecular biologists, cell biologists, and biochemists who don’t have a lot of expertise in bioinformatics can use to easily complete their analysis.

Generally, individuals new to processing sequence data struggle because the computer programs are confusing and cumbersome. We thought making the informatics easy would make the whole technique much more accessible and useful. The first step in this procedure is to download and install the MAPster software program as described in the text protocol.

Next, enter the required files and parameters through the Main tab. Select the appropriate Pairwise button to enter the files for DEEPN analysis. Turn the Pairwise option to Off to run in single read format.

Load files into MAPster by dragging and dropping them into the appropriate window. From the indexed genomes listed in the Genome box, select a reference DNA genome source that corresponds to the source of the Y2H prey library inserts. Since HISAT2 supports multi-threading, under the Threads box, indicate the number of processor threads to be devoted to the mapping program.
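As a rough illustration of what these settings correspond to, the sketch below assembles a HISAT2 command line from the same inputs: a genome index, one or two read files, a thread count, and an output SAM file. The function name and the file names in the usage note are hypothetical, and the exact command MAPster issues is not described here and may differ. Only the standard HISAT2 options -p, -x, -U, -1, -2, and -S are used.

```python
from typing import List, Optional


def build_hisat2_command(index_prefix: str,
                         output_sam: str,
                         reads_1: str,
                         reads_2: Optional[str] = None,
                         threads: int = 1) -> List[str]:
    """Assemble a HISAT2 argument list (hypothetical helper, not part of MAPster).

    With reads_2 omitted the command maps in single-read format;
    with reads_2 given it maps the two files as read pairs.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # -p sets the number of threads, -x the prefix of the indexed genome.
    cmd = ["hisat2", "-p", str(threads), "-x", index_prefix]
    if reads_2 is None:
        cmd += ["-U", reads_1]                 # unpaired (single) reads
    else:
        cmd += ["-1", reads_1, "-2", reads_2]  # paired reads
    cmd += ["-S", output_sam]                  # SAM output file
    return cmd
```

For example, build_hisat2_command("mm10/genome", "bait_selected.sam", "bait_selected.fastq", threads=4) returns a list that could be passed to subprocess.run on a machine where HISAT2 is installed.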

Specify an output file name. A short yet descriptive name without spaces or special characters is recommended, since this file name will be used throughout the DEEPN process. Using the Open Output Directory button, specify a folder for the mapped output files. Once the appropriate files and parameters have been selected, use the Add to Queue button to add the mapping job to the job queue.
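The naming recommendation can be checked before a job is queued. The sketch below is one possible check, assuming that "special characters" means anything other than letters, digits, underscores, and hyphens. That definition and the function name are assumptions, not rules stated by MAPster or DEEPN.

```python
import re

# Assumed definition of a "safe" name: letters, digits, underscore, hyphen only.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_output_name(name: str) -> bool:
    """Return True if the name is non-empty and has no spaces or special characters."""
    return _SAFE_NAME.fullmatch(name) is not None
```

Under this assumption a name such as bait1_selected passes, while a name containing a space or a symbol such as # is rejected.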

Once all the jobs are entered into the job queue, click the Run Queue button. Begin this analysis by opening the DEEPN software. From the main window, select the corresponding prey library information from the top selection box.

Select a folder where the processed files can go by clicking the Work Folder button and navigating to the folder directory. Once a work folder is selected, DEEPN will create three subfolders with the indicated names. For successful analysis, be sure to place the correct SAM files into the correct folders and to know which files have mapped and unmapped reads.

If using SAM files containing both mapped and unmapped reads, such as those produced with default settings of the MAPster program, place them in the sam_files folder. Initiate processing by clicking the Gene Count plus Junction Make button. The processing time depends on the size and number of sequence data files and processing speed of the computer used.
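To find out whether a given SAM file holds mapped reads, unmapped reads, or both, the alignment records can be tallied by their FLAG field. The sketch below relies only on the SAM format itself: header lines start with @, the FLAG is the second tab-separated field, and bit 0x4 marks an unmapped read. The function name is hypothetical and this is not part of DEEPN.

```python
from typing import Iterable, Tuple


def count_mapped_unmapped(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Count alignment records with and without the 'unmapped' FLAG bit (0x4)."""
    mapped = 0
    unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which begin with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped
```

Because an open file yields its lines one at a time, the function can be called directly on an open file handle, so large SAM files are never loaded into memory at once. A result with both counts above zero indicates a file of the kind that belongs in the sam_files folder.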

Once DEEPN has processed the data, a full set of folders will be created. The next step is to perform analysis in Stat_Maker. Open Stat_Maker, and click the Verify Installation button.

If running for the first time, Stat_Maker will automatically install R, JAGS, and Bioconductor by pulling these resources from the internet. Once R, JAGS, and Bioconductor are detected, Stat_Maker will become active and allow further user input. Click the Choose Folder button to navigate to the working folder that DEEPN processed.

Stat_Maker will automatically find and list the files for statistical analysis in the window. Drag and drop the appropriate files from the file list window above into the file windows below for each vector and bait dataset and for each growth condition, non-selected and selected. Importantly, Stat_Maker requires duplicate datasets for the empty vector alone: two samples of non-selected populations and two samples of selected populations.

This gives an estimate of variability within the experiment. Click the Run button. Depending on the speed of the computer, computation will take between 5 and 15 minutes.

The results from the Stat_Maker output are placed in a new subfolder within the main work folder labeled Stat_Maker Results. Review the results. To review the data on each potential candidate, open the DEEPN software, select the corresponding prey library information, and then select the correct working folder using the Work Folder button.

Click the Blast Query button to load a new window. In the top text box, type the gene name or GenBank NM number to select the candidate gene of interest. The gene name corresponds to the name listed in the Stat Maker output file.

Press Enter or Return, which initiates retrieval of the gene of interest. Select which datasets will be used for the analysis using the Select Dataset menus. Typically, these include the vector only and bait samples grown under non-selective conditions and the bait sample grown under selection conditions.

Initially, loading the data and the gene name can take a little while with the Blast Query program, but once the dataset is loaded, it’ll go much faster. Blast_Query will display the fusion points along the sequence of interest and how abundant each fusion point is. This can be displayed both in a table format using the Results tab or a graphic format using the Plot tab.

To export these results to a csv file, click the Save csv button in the top right. From the DEEPN window, click the Read Depth button under Analyze Data. Once the Read Depth window is open, type the NM number into the top text box.

Use the pull-down menu to select the relevant dataset that contains the enriched gene of interest. Use the table on the left and the graphics display on the right to determine how many reads were found in the data that correspond to the gene of interest. The Stat Maker program produces an Excel-viewable file that summarizes the pertinent information needed to identify candidate interacting proteins.

Also shown is the corresponding analysis of whether the plasmids corresponding to the prey candidate contain the proper open reading frame. The Blast Query evaluates every gene and orders the different positions according to their abundance in the dataset quantified by the ppm of the total number of junctions found within the database. In this example, the most abundant position that is In ORF and In Frame is position 867, indicating that nucleotide 867 of the GenBank NCBI Reference Sequence NM_019648 is the start of the prey fragment.
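The ordering by abundance described above comes down to a parts-per-million calculation: the junction count at a position divided by the total number of junctions, multiplied by one million. The sketch below shows that arithmetic and the ranking. The function name and any counts supplied to it are hypothetical, and Blast_Query's own implementation is not shown in this transcript.

```python
from typing import Dict, List, Tuple


def rank_junctions_by_ppm(position_counts: Dict[int, int],
                          total_junctions: int) -> List[Tuple[int, float]]:
    """Convert junction counts per position to ppm and sort, most abundant first.

    position_counts maps a nucleotide position in the reference sequence
    to the number of junction reads found there (illustrative input).
    """
    if total_junctions <= 0:
        raise ValueError("total_junctions must be positive")
    ranked = [(pos, count * 1_000_000 / total_junctions)
              for pos, count in position_counts.items()]
    # Highest ppm first; ties are broken by the lower position.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Normalizing to ppm rather than comparing raw counts makes positions comparable between datasets that contain different total numbers of junction reads.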

This graphic shows a hypothetical sequence and how to design the five-prime oligo to capture the correct frame and fusion point between the Gal4 activation domain and the prey sequence of interest. This Read Depth window shows how many sequences were found in the data that correspond to the nucleotide positions of the sequence of interest. Once mastered, processing data can be done in 10 to 20 hours, most of the time unattended.
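The read depth idea can be sketched as a per-nucleotide coverage count: for each position in the sequence of interest, how many aligned reads overlap it. The example below assumes reads are given as 1-based (start, length) pairs on the reference sequence. That input format and the function name are assumptions for illustration, and the Read Depth tool in DEEPN may summarize the data differently.

```python
from typing import Iterable, List, Tuple


def read_depth(alignments: Iterable[Tuple[int, int]], seq_length: int) -> List[int]:
    """Return the number of reads covering each nucleotide of a sequence.

    alignments: (start, length) pairs, with start 1-based on the reference.
    The result is a list in which index 0 corresponds to nucleotide 1.
    """
    # Difference array: +1 where a read begins, -1 just after it ends.
    diff = [0] * (seq_length + 1)
    for start, length in alignments:
        first = max(start, 1)
        last = min(start + length - 1, seq_length)  # clip to the sequence end
        if first > last:
            continue  # read lies entirely outside the sequence
        diff[first - 1] += 1
        diff[last] -= 1
    depth = []
    running = 0
    for delta in diff[:seq_length]:
        running += delta
        depth.append(running)
    return depth
```

The difference-array approach touches each read once and each position once, which keeps the calculation fast even when many reads map to the same gene.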

DEEPN screens and the corresponding informatics analysis will pave the way for investigators in a broad range of fields to explore interaction networks with their protein of interest. The MAPster program is designed to facilitate mapping sequence files from DEEPN experiments. It also provides a user-friendly method for mapping sequence files to reference DNA for a variety of applications, such as RNA-Seq or ChIP-Seq.

Although the DEEPN software is specifically built to process batch yeast two-hybrid data, it can also be used to process RNA-Seq data. The Stat Maker program not only predicts which genes give an authentic yeast two-hybrid interaction, but it also collates a number of other parameters, such as reading frame information, allowing investigators to quickly filter their candidate hits. Once candidate interacting proteins are identified, it is important to validate the interactions using standard yeast two-hybrid assays, biochemical pull-down assays, or related methods.

Summary

Automatically generated

Deep sequencing of yeast populations selected for positive yeast 2-hybrid interactions potentially yields a wealth of information about interacting partner proteins. Here, we describe the operation of specific bioinformatics tools and updated custom software to analyze sequence data from such screens.
