ALOHA - Academia Sinica

Version 1.2 (Dec, 2011)
User guide of software ALOHA
Hsin-Chou Yang, Hsin-Chi Lin and Mei-Chu Huang
Institute of Statistical Science, Academia Sinica
Correspondence: Hsin-Chou Yang (hsinchou@stat.sinica.edu.tw), Institute of
Statistical Science, Academia Sinica, 128, Academia Road, Section 2,
Nankang, Taipei 115, Taiwan.
Table of Contents:
1. ALOHA LICENSE
2. INTRODUCTION
3. SOFTWARE DOWNLOAD AND INSTALLATION
4. ALOHA INITIALIZATION
5. ALOHA INTERFACE AND FUNCTIONS
6. DATA INPUT FORMAT
7. TWO TEST EXAMPLES
8. ALOHA VERSION UPGRADE
1. ALOHA LICENSE
All copyrights are reserved by the authors of ALOHA. We welcome any noncommercial
use of ALOHA for your own research. Please do NOT modify or distribute the
ALOHA program in any form without the permission of the authors of ALOHA.
Inquiries about commercial use of ALOHA should be directed to
hsinchou@stat.sinica.edu.tw. ALOHA is free software; we provide no warranty and
assume no responsibility for the results of analyses. If publications are based
on results obtained with ALOHA, please cite the following reference:
Hsin-Chou Yang, Hsin-Chi Lin, Mei-Chu Huang, Ling-Hui Li, Wen-Harn Pan,
Jer-Yuarn Wu, and Yuan-Tsong Chen (2010). A new analysis tool for
individual-level allele frequency for genomic studies. BMC Genomics 11: 415.
2. INTRODUCTION
ALOHA (Allele-frequency/Loss-of-heterozygosity/Allele-imbalance; AF/LOH/AI),
written in R and R GUI, provides genome-wide analysis of allele frequency and
detection of loss of heterozygosity (LOH) and allelic imbalance (AI). An allele
frequency biplot is also provided for sample classification, outlier detection, and SNP
clustering.
3. SOFTWARE DOWNLOAD AND INSTALLATION
Running ALOHA requires installation of both program ALOHA and program R.
1. Download software ALOHA:
Software ALOHA is available at the ALOHA website. Download the zipped file
“ALOHA.zip” and unzip it to obtain a directory “ALOHA” containing the
programs of ALOHA and two illustrative data examples.
The directory “ALOHA” can be saved as a working directory, such as
“C:\ALOHA”.
2. Download program R:
Users can download the R installer “R-2.14.1-win.exe” from the ALOHA website.
Alternatively, users can download R from the website of “The R Project for
Statistical Computing” at http://www.r-project.org/. Click “CRAN” (Comprehensive R
Archive Network) on the left of the page and select a suitable mirror site.
Select the platform on which R will run (Linux, MacOS X, Windows (95 and later)),
click the hyperlink “base”, and select “R-2.14.1-win.exe”. Then execute the file
to install program R to “C:\Program Files\R\R-2.14.1”. After the installation is
finished, double-click the icon “R-2.14.1” to start program R; a window “RGui”
with a sub-window “R Console” opens and waits for subsequent commands.
Users are advised to update their R packages: select “Packages” in the tool bar,
click “Update packages”, choose a suitable mirror site in the window “CRAN mirror”
that opens, and click “OK”. Note
that the analyses provided by ALOHA require three additional R packages: tcltk,
gtools and pspline. These packages will be downloaded automatically if users use a
recent version of program R, e.g., R-2.14.1. Users are advised to run ALOHA with
program R-2.14.1 or a newer version of program R.
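The R version and package requirements described above can also be checked from the R Console before starting ALOHA. The following is a minimal sketch and is not part of ALOHA; the helper name `missing_packages` is hypothetical, and the sketch only reports what is absent rather than installing anything.

```r
# Packages that this guide lists as required by ALOHA.
aloha_packages <- c("tcltk", "gtools", "pspline")

# Hypothetical helper: return the packages in `pkgs` that are not installed.
# system.file() returns "" when a package cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

# TRUE if the running R is at least the version recommended in this guide.
r_is_recent_enough <- getRversion() >= "2.14.1"
if (!r_is_recent_enough) {
  message("R is older than 2.14.1; consider upgrading.")
}

to_install <- missing_packages(aloha_packages)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from a CRAN mirror
}
```

If `to_install` is empty and no message about the R version appears, the requirements listed in this section are met.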
4. ALOHA INITIALIZATION
Once the packages mentioned in the previous section have been installed, ALOHA can
be initialized by the following procedure. In this user guide, we suppose that
the programs of ALOHA are saved in the destination directory “C:\ALOHA”.
1. Start software R by double-clicking the icon “R-2.14.1”.
2. Key in the command
ALOHA.gui=paste("C://ALOHA//PROGRAM//ALOHA_interface.r",sep="")
in the command line in the window “R Console” and press the Enter key.
3. Type the command source(ALOHA.gui) in the command line and press the
Enter key to initialize ALOHA. The ALOHA interface (see Figure 1) opens
and waits for data entry.
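If ALOHA is saved somewhere other than “C:\ALOHA”, the two commands above can be wrapped so that the path is built from the installation directory and checked before sourcing. This is only a sketch: the function names `aloha_script_path` and `start_aloha` are hypothetical and are not provided by ALOHA, and the sub-path “PROGRAM/ALOHA_interface.r” is taken from the command in step 2.

```r
# Build the path to the ALOHA interface script under an installation directory.
# The default is the directory assumed throughout this guide.
aloha_script_path <- function(aloha_root = "C:/ALOHA") {
  file.path(aloha_root, "PROGRAM", "ALOHA_interface.r")
}

# Hypothetical wrapper: stop with a readable message if the script is missing,
# otherwise start ALOHA by sourcing the script.
start_aloha <- function(aloha_root = "C:/ALOHA") {
  script <- aloha_script_path(aloha_root)
  if (!file.exists(script)) {
    stop("ALOHA interface script not found: ", script)
  }
  source(script)
}

# start_aloha()            # installation in the default location
# start_aloha("D:/ALOHA")  # hypothetical alternative location
```

Forward slashes are used because R accepts them in Windows paths.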
Figure 1. Interface of ALOHA
5. ALOHA INTERFACE AND FUNCTIONS
ALOHA has a user-friendly interface developed with R GUI (Figure 1). The interface
contains a preface giving a short introduction to ALOHA. The directory structure of
ALOHA is shown in Figure 2. Four main items collect the required and optional
information for ALOHA data analysis.
Item 1: Input/output path:
• Study group: Users should select “one group” for a one-population or
multiple-population analysis or “two groups” for a case-control analysis.
• Directory of data input: Users should provide the working directory where
their data are saved.
• Directory of results output: Users should provide the directory
where their output should be saved. Note that the output directory must exist
before executing ALOHA.
Item 2: Allele frequency (AF) reference:
• ALOHA database: (a) Users should select a population matched to their
study, and (b) users should select the type of SNP chip.
• User provided: Users can provide their own AF reference.
Item 3: AI/LOH calculation:
• Confidence level: Users should key in a value X between 0.9 and 1 for
construction of a 100X% confidence interval for the AI and LOH indices of
study patients.
• Window size: Users should key in a value of at least 10.
• Upper bound (quantile) of reference: Users should key in a value X
between 0.9 and 1 to specify the 100X% quantile of the AI and LOH indices
of normal controls.
Item 4: AF biplot:
• Alpha scaling: Users should input 0 or 1.
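The ranges required in Items 3 and 4 can be checked in R before the values are typed into the interface. The sketch below is an illustration only and is not ALOHA's own validation; the function name `check_aloha_settings` and its argument names are hypothetical, and “between 0.9 and 1” is read as including both end points, which this guide does not state.

```r
# Hypothetical pre-check of the values entered in Items 3 and 4.
# Returns a character vector of problems; an empty vector means all passed.
check_aloha_settings <- function(conf_level, window_size,
                                 upper_quantile, alpha_scaling) {
  problems <- character(0)
  # Assumption: the interval 0.9 to 1 is treated as closed on both ends.
  if (!(conf_level >= 0.9 && conf_level <= 1)) {
    problems <- c(problems, "Confidence level must be between 0.9 and 1.")
  }
  if (!(window_size >= 10)) {
    problems <- c(problems, "Window size must be at least 10.")
  }
  if (!(upper_quantile >= 0.9 && upper_quantile <= 1)) {
    problems <- c(problems, "Upper bound (quantile) must be between 0.9 and 1.")
  }
  if (!(alpha_scaling %in% c(0, 1))) {
    problems <- c(problems, "Alpha scaling must be 0 or 1.")
  }
  problems
}

# Illustrative values only; these are not ALOHA's defaults.
check_aloha_settings(conf_level = 0.95, window_size = 50,
                     upper_quantile = 0.99, alpha_scaling = 0)
```

Each returned string names one setting that falls outside the range given in the list above.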
Once all items are completed, click the “Run” button to submit the
computational job. ALOHA then checks the inputted data information and data
files. If the inputted information is invalid or the data files are ill-formatted,
ALOHA shows warning or error messages so that users can make
corrections. If the inputted data pass the examination, ALOHA starts the
analysis and the message “Please wait a while, ALOHA is running…” is shown in
the command line. A prompt sign appears immediately, but the computation is
still proceeding. Please wait until a new window with the message “Computation of
ALOHA is finished.” opens to notify users that the computation is complete.
Note that users can interrupt the execution of ALOHA at any time by
pressing the Esc key in the window “R Console”. Once the execution of ALOHA is
finished, the numerical results and graphical outputs are automatically saved in
the output directory that users provided. We suggest that users remove figure
files from a previous analysis before starting a new analysis, to avoid confusing
figure files from the old and new analyses.
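Because the output directory must exist before ALOHA runs, and leftover figure files can cause confusion, both points can be handled from the R Console beforehand. The sketch below is an assumption-laden illustration: `prepare_output_dir` is a hypothetical helper, not an ALOHA function, and it only lists existing files so that users can decide what to remove.

```r
# Hypothetical helper: make sure the output directory exists and report any
# files already inside it (for example, figures from a previous analysis).
prepare_output_dir <- function(out_dir) {
  if (!isTRUE(file.info(out_dir)$isdir)) {
    dir.create(out_dir, recursive = TRUE)
  }
  # Paths are relative to out_dir; nothing is deleted here.
  list.files(out_dir, recursive = TRUE)
}

# Hypothetical output location; replace with the directory entered in Item 1.
# leftovers <- prepare_output_dir("C:/ALOHA/OUTPUT/MyStudy")
# print(leftovers)
```

An empty result means the directory is ready for a new analysis.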
Figure 2. Directory structure of ALOHA
6. DATA INPUT FORMAT
This section introduces data structure/format. Two test examples, which are used to
illustrate a one-group and two-group analysis, are also provided (see Section 7).
6.1 Nomenclature rule for the directory/file name and data format
• Two-group analysis:
As mentioned in Item 1 in Section 5, users can specify a working directory
where allele frequency data are saved, e.g., “C:\ALOHA\Work”. Under the working
directory, users MUST make subdirectories “C:\ALOHA\Work\Case” and
“C:\ALOHA\Work\Control” to save allele frequency data of patient samples and
control samples, respectively. Note that the directory names are case sensitive. In
each of these two directories, allele frequency data for each sample should be saved
in its own directory with a directory name following the nomenclature rule “X_Y”,
where Y is the sample ID, e.g., “01”, “02”, …, and X is an arbitrary string used to
describe the attribute of the case or control, e.g., “ALL” for acute lymphoblastic
leukaemia and “NC” for normal control. Under the directory of each sample, allele
frequency data for each chromosome should be saved in a separate file with a file name
following the nomenclature rule “Chr_W.txt”, where W is the chromosome number,
i.e., “01”, “02”, …, “23”. The allele frequency data file for each chromosome of each
sample contains six columns with the headers “Probe_set”, “Chr”,
“Phy_position”, “Genotype”, “Chiptype”, and “AF”. Please refer to Example 1
introduced in Section 7.1 for details.
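The two-group layout described above can be checked before running ALOHA. The following sketch is not part of ALOHA; `missing_chr_files` is a hypothetical helper that looks in every sample directory under “Case” and “Control” and reports which of the files “Chr_01.txt” to “Chr_23.txt” are absent.

```r
# File names expected in every sample directory: Chr_01.txt ... Chr_23.txt.
expected_chr_files <- sprintf("Chr_%02d.txt", 1:23)

# Hypothetical helper: for each sample directory under the group directories,
# return the expected chromosome files that are not present.
missing_chr_files <- function(work_dir, groups = c("Case", "Control")) {
  result <- list()
  for (group in groups) {
    sample_dirs <- list.dirs(file.path(work_dir, group),
                             full.names = TRUE, recursive = FALSE)
    for (sample_dir in sample_dirs) {
      absent <- setdiff(expected_chr_files, list.files(sample_dir))
      result[[file.path(group, basename(sample_dir))]] <- absent
    }
  }
  result
}

# missing_chr_files("C:/ALOHA/Work")  # working directory used as an example above
```

A sample whose entry is an empty vector has all 23 chromosome files; a group that does not appear in the result has no sample directories, or its directory name does not match “Case” or “Control” exactly.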
• One-group analysis:
As mentioned in Item 1 in Section 5, users can specify a working directory
where allele frequency data are saved, e.g., “C:\ALOHA\Work”. In this directory, data
for each population should be saved in its own directory with a directory
name following the nomenclature rule “X_Y”, where Y is an arbitrary string that
identifies the study population, e.g., “Asian”, “CEU”, “YRI”, and X is an arbitrary
string used to describe the attribute of this study, e.g., “Normal”. Under the directory of
each population, allele frequency data for each sample should be saved in
its own directory with a directory name following the nomenclature rule “X_Y”,
where Y is the sample ID, e.g., “01”, “02”, …, and X is an arbitrary string used to
describe the attribute of this study. Under the directory of each sample, allele frequency
data for each chromosome should be saved in a separate file with a file name
following the nomenclature rule “Chr_W.txt”, where W is the chromosome number,
i.e., “01”, “02”, …, “23”. The allele frequency data file for each chromosome of each
sample contains six columns with the headers “Probe_set”, “Chr”,
“Phy_position”, “Genotype”, “Chiptype”, and “AF”. Please refer to Example 2
introduced in Section 7.2 for details.
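The six-column header can be verified for a single chromosome file as sketched below. This is an illustration under stated assumptions, not ALOHA's own reader: `read_chr_file` is a hypothetical helper, and the file is assumed to be whitespace- or tab-delimited text with one header row, which this guide does not specify.

```r
# Column headers required in every chromosome file, in the order listed above.
aloha_columns <- c("Probe_set", "Chr", "Phy_position",
                   "Genotype", "Chiptype", "AF")

# Hypothetical helper: read one chromosome file and confirm its header.
read_chr_file <- function(path) {
  # Assumption: whitespace- or tab-delimited text with one header row.
  dat <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
  if (!identical(names(dat), aloha_columns)) {
    stop("Unexpected header in ", path, ": ",
         paste(names(dat), collapse = ", "))
  }
  dat
}

# Path below follows the Example 1 layout described in Section 7.1.
# chr1 <- read_chr_file("C:/ALOHA/EXAMPLE/Test1/Case/Abnorm_01_F/Chr_01.txt")
# str(chr1)
```

If the example files use a different delimiter, the `read.table` call would need a matching `sep` argument.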
7. TWO TEST EXAMPLES
ALOHA provides two test examples generated by Monte Carlo procedures. The first
example demonstrates an analysis of two groups, i.e., a case group and a control group.
The second example demonstrates an analysis of one group with three populations.
Data for these two examples are provided in the directory “C:\ALOHA\EXAMPLE”.
7.1 Example 1: A two-group (case-control) analysis
This is an example of two groups (case vs. control). This example consists of two
cancer patients and ten normal controls and data are provided in the directory
“C:\ALOHA\EXAMPLE\Test1”. Allele frequency data of the two cancer patients are
saved in the directory “C:\ALOHA\EXAMPLE\Test1\Case” and data of ten normal
controls are saved in the directory “C:\ALOHA\EXAMPLE\Test1\Control”. Directory
names for the two patients are “Abnorm_01_F” and “Abnorm_02_M”; directory
names for the ten controls are “Norm_001_F”, …, “Norm_010_M”, in which allele
frequency data for 23 chromosomes are provided.
This example is the default example of ALOHA and can be run easily by
pressing the “Run” button (with Test1 keyed in as the directory of data input in Item 1).
Note that commands, file names, and directory names are case sensitive. ALOHA
then starts the analysis and the message “Please wait a while, ALOHA is
running…” is shown in the command line. When the computation is finished, the
message “Computation of ALOHA is finished.” is shown to notify users of the
completion of the computation. The computation takes about 15
minutes on a machine with an Intel Core2 Duo E8400 3.00GHz CPU and 3.25 GB of
DDR2 RAM. Results of the analysis are automatically saved in the output
directory “C:\ALOHA\OUTPUT\Test_Example_Output\Test1”, which contains three
subdirectories, “Graphical result”, “Numerical result”, and “Sample list”, and a file
“Log.txt”. In addition, a file “Data description.txt” describing the study data and
parameter settings of the analysis is provided in the directory “Numerical
result”. In this illustrative example, the graphical results are shown in Figure 3,
Figure 4, Figure 5, and Figure 6. For an explanation of these figures, please
refer to the ALOHA paper (Yang et al., BMC Genomics, 2010).
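After the run finishes, the presence of the outputs named above can be confirmed from the R Console. The sketch below is not part of ALOHA; `check_output_layout` is a hypothetical helper that only tests whether the three subdirectories and the log file exist.

```r
# Entries this guide says ALOHA writes to the output directory.
expected_output <- c("Graphical result", "Numerical result",
                     "Sample list", "Log.txt")

# Hypothetical helper: named logical vector, TRUE where the entry exists.
check_output_layout <- function(out_dir) {
  present <- file.exists(file.path(out_dir, expected_output))
  names(present) <- expected_output
  present
}

# check_output_layout("C:/ALOHA/OUTPUT/Test_Example_Output/Test1")
```

Any FALSE entry suggests that the run did not complete or that a different output directory was used.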
Figure 3. Chromosomal aberration plots of the first cancer patient
Abnorm_01_F.
Figure 4. Chromosomal aberration plots of the second cancer patient
Abnorm_02_M.
Figure 5. Allele frequency biplots of the two cancer patients and 10 normal
controls.
Figure 6. Combined AI plot and combined LOH plot of two cancer patients. (A)
Combined AI plot, and (B) Combined LOH plot. The upper-left subplot shows the
status of AI/LOH in genomic regions for each study sample. Blue color denotes no
occurrence of AI/LOH and red color denotes occurrence of AI/LOH. The lower-left
subplot provides the proportion (%) of samples carrying AI/LOH aberrations in
specific genomic regions. The upper-right subplot provides the proportion (%) of
genomic regions carrying AI/LOH aberrations in a study sample. In this subplot, a
male is indicated by sky blue color and a female is indicated by pink color. The area
displayed by purple bars with left-slanting lines indicates chromosomal aberration.
For example, the first sample is a female and ~3% of her genome carries
chromosomal aberrations (note that ~97% of her genome is normal, but the subplot
is truncated at an aberration proportion < 4%).
7.2 Example 2: A one-group analysis with three populations
This example provides an analysis of one group with three populations. It
consists of 15 samples from African (YRI), Caucasian (CEU) and Asian populations,
with five samples per population; the data are provided in the directory
“C:\ALOHA\EXAMPLE\Test2”. Allele frequency data for the three populations are
provided in the sub-directories “Normal_Asian”, “Normal_CEU”, and “Normal_YRI”
under the directory “C:\ALOHA\EXAMPLE\Test2\”. The input data format is the
same as mentioned in the previous example (Example 1).
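The population and sample layout of a one-group data set can be summarised with a short R sketch. It is an illustration only; `count_samples` is a hypothetical helper, not an ALOHA function, and it simply counts the sample directories found under each population directory.

```r
# Hypothetical helper: number of sample directories under each population
# directory of a one-group working directory.
count_samples <- function(work_dir) {
  pop_dirs <- list.dirs(work_dir, full.names = TRUE, recursive = FALSE)
  counts <- vapply(pop_dirs, function(p) {
    length(list.dirs(p, full.names = FALSE, recursive = FALSE))
  }, integer(1))
  names(counts) <- basename(pop_dirs)
  counts
}

# count_samples("C:/ALOHA/EXAMPLE/Test2")
```

If the example data match the description above, this call should report five samples for each of “Normal_Asian”, “Normal_CEU”, and “Normal_YRI”.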
This example can be run easily by checking “One group”, keying in Test2 as the
directory of data input, and pressing the “Run” button. Note that commands,
file names, and directory names are case sensitive. ALOHA then starts the
analysis and the message “Please wait a while, ALOHA is running…” is shown in
the command line. When the computation is finished, the message “Computation of
ALOHA is finished.” is shown to notify users of the completion of the
computation. The computation takes about 2 minutes on a machine
with an Intel Core2 Duo E8400 3.00GHz CPU and 3.25 GB of DDR2 RAM. Results of
the analysis are automatically saved in the output directory
“C:\ALOHA\OUTPUT\Test_Example_Output\Test2”, which contains three subdirectories,
“Graphical result”, “Numerical result”, and “Sample list”, and a file “Log.txt”. In
addition, a file “Data description.txt” describing the data and parameter settings of the
analysis is provided in the directory “Numerical result”. The graphical result
of this example is shown in Figure 7. For an explanation of this figure, please refer to the
ALOHA paper (Yang et al., BMC Genomics, 2010).
Figure 7. Allele frequency biplots of the 15 samples from three populations,
Asian, CEU, and YRI populations.
8. ALOHA VERSION UPGRADE
Versions:
ALOHA Version 1.0: Jun. 2010
ALOHA Version 1.1: Jul. 2010
ALOHA Version 1.2: Dec. 2011
What are the new features in ALOHA?
In version 1.2, a new function that provides a combined AI plot and a combined
LOH plot was added.