moimport(file,...)
###Basic data import
The function is called with moimport(<file>,...)
,
where the mandatory argument <file>
is a string
specifying the path of the file to be imported. Consider the following
dataset:
provided by the package in the Excel file ‘testDatabasic.xlsx’. The
first column contains the sample IDs followed by marker columns. Marker
labels are in the first row. All marker columns are of type ‘STR’
(default for argument molecular
). The code below imports
and transforms this dataset to standard format:
infile <- system.file("extdata", "testDatabasic.xlsx", package = "MLMOI")
Outfile <- moimport(file = infile)
#> Warning: There are no warnings!
The first six rows of the imported data are printed using the command:
head(Outfile)
#> Sample IDs M1 M2 M3 M4
#> 1 s-id1005 258 200 200 175
#> 2 s-id1009 258 200 197 175
#> 3 s-id1009 260 <NA> <NA> 177
#> 4 s-id1016 258 200 197 175
#> 5 s-id1018 258 202 197 177
#> 6 s-id1024 258 200 197 175
###Exporting the dataset in standard format
If export
is not an absolute path, the dataset will be
stored in the current working directory:
###Importing datasets with metadata {#meta}
The number of metadata columns are specified via argument
nummtd
. If Users want to retain metadata,
keepmtd = TRUE
need to be set. The dataset
‘testDatametadata.xlsx’ contains three metadata variables, as shown
below:
The appropriate code for importing this dataset is:
infile <- system.file("extdata", "testDatametadata.xlsx", package = "MLMOI")
Outfile <- moimport(infile, nummtd = 3, keepmtd = TRUE)
#> Warning: There are no warnings!
The first six rows of the imported data are printed using the command:
head(Outfile)
#> Sample IDs Location year clonality locus1 locus2 locus3 locus4
#> 1 sam73 Lower Maroni 2006 polyclonal 188 256 102 316
#> 2 sam73 Lower Maroni 2006 polyclonal 166 <NA> <NA> 336
#> 3 sam11 Lower Maroni 2008 monoclonal 188 250 102 336
#> 4 sam26 Lower Maroni 2010 polyclonal 160 256 102 320
#> 5 sam26 Lower Maroni 2010 polyclonal <NA> <NA> 110 <NA>
#> 6 sam427 Lower Maroni 2010 polyclonal 160 250 102 300
If users forget to set nummtd
, the import function
regards the metadata columns as marker columns. This usually causes
various warnings.
###Importing different types of molecular markers
The Excel file ‘testDatamt.xlsx’ provided by the package contains different types of molecular data. This dataset is shown below
Marker columns start from third column. They are from left to right
of data type ‘STR’, ‘amino-acid’, ‘SNP’, ‘SNP’, ‘amino-acid’ and
‘codon’. The argument molecular
needs to be set as a
vector:
infile <- system.file("extdata", "testDatamt.xlsx", package = "MLMOI")
Outfile <- moimport(infile, nummtd = 1, molecular = c('str','amino','snp','snp','amino','codon'))
#> Warning: There are no warnings!
The first six rows of the imported data are printed using the command:
head(Outfile)
#> Sample IDs m1STR m1Amino m1SNP m2SNP m2Amino m1Codon
#> 1 sam1 132 HIS C A ARG AGC
#> 2 sam1 136 <NA> <NA> <NA> <NA> <NA>
#> 3 sam2 136 HIS C T CYS AGG
#> 4 sam3 140 HIS G T GLY GGG
#> 5 sam3 <NA> <NA> <NA> <NA> ARG <NA>
#> 6 sam4 140 ASP G A ARG GGG
Since the coding class of markers in this dataset are the default
ones, the argument coding
is not specified.
###Different molecular types with different coding classes
If a dataset contains a molecular marker whose coding is not assumed
as default in moimport
, the argument coding
needs to be set. Consider the following dataset:
called ‘testDatamtc.xlsx’ provided by the package. The molecular
markers in columns 3 (‘SNP’), 5 (‘STR’) and 6 (‘amino-acid’) are not in
default coding. Therefore, the argument coding
needs to be
set:
infile <- system.file("extdata", "testDatamtc.xlsx", package = "MLMOI")
Outfile <- moimport(infile, nummtd = 1,
molecular = c('snp', 'str', 'str', 'amino', 'snp', 'codon'),
coding = c('iupac', 'integer', 'nearest', 'full', '4let', 'triplet'))
#> Warning: The cell on row 13 and marker 'snp-iupac' contains an unidentified
#> entry: 'Z'.
Here a warning is issued because the entry ‘z’ is not in coding ‘iupac’. The last six rows of the imported data are printed using the command:
tail(Outfile)
#> Sample IDs snp-iupac ms-integer ms-real aa-amino snp-nucleo aa-codon
#> 9 xid6 <NA> 148 192 ALA <NA> ATT
#> 10 xid7 T 148 188 ALA G ATT
#> 11 xid8 A 164 182 GLY C ATT
#> 12 xid8 C <NA> 192 ALA G ATG
#> 13 xid9 T 160 188 GLY C ATG
#> 14 xid10 A 160 180 ALA C ATG
It can be seen that the entry ‘z’ is deleted.
###Data from several worksheets
The file ‘testDatacomplex.xlsx’ provided by the package contains datasets in multiple worksheets as shown below:
The numbering determines the order of worksheets in the Excel file.
Clearly, this is the case of multiple worksheet dataset. Hence the
option multsheets = TRUE
needs to be set. Notice that
worksheet (2) is entered in transposed format. Thus, the option
transposed
needs to be set as a logical vector, specifying
which worksheets are in transposed format. The following code imports
the dataset:
infile <- system.file("extdata", "testDatacomplex.xlsx", package = "MLMOI")
Outfile <- moimport(infile, multsheets = TRUE, keepmtd = TRUE, nummtd = c(0, 2, 2, 0), transposed = c(FALSE, TRUE, FALSE, FALSE),
molecular = list(c('str', 'str', 'snp', 'snp'),
c('amino', 'codon', 'snp', 'amino', 'codon'),
c('codon', 'str'),
c('str', 'amino', 'snp', 'str')),
coding = list(rep(c('integer', '4let'), each = 2),
c('1let', 'triplet', 'iupac', '3let', 'triplet'),
c('triplet', 'integer'),
c('integer', '1let', 'iupac', 'integer')))
#> Warning: In worksheet 1: The cell in row 3 and marker 'x1tr' contains the
#> unexpected character '/'.
#> Warning: In worksheet 1: The cell in row 5 and marker 'x1tr' contains the
#> unexpected character '/'.
#> Warning: In worksheet 1: The cell in row 5 and marker 'y1tr' contains the
#> unexpected character '/'.
#> Warning: In worksheet 1: The cell in row 7 and marker 'y1tr' contains the
#> unexpected character '/'.
#> Warning: In worksheet 1: The cell in row 4 and marker 'x1np' contains the
#> unexpected character '/'.
#> Warning: In worksheet 1: The cell in row 4 and marker 'y1np' contains the
#> unexpected character '/'.
#> Warning: In worksheet 2: The cell in column 5 and marker 'x2mino' contains the
#> unexpected character '?'.
#> Warning: In worksheet 2: The cell in column 7 and marker 'x2odon' contains the
#> unexpected character '?'.
#> Warning: In worksheet 2: The cell in column 5 and marker 'x2odon' contains the
#> unexpected character '?'.
#> Warning: In worksheet 2: The cell in column 6 and marker 'y2mino' contains the
#> unexpected character '?'.
Several warnings are issued because data in the first and second worksheets have several entry errors. The first seven rows of the imported data are printed using the command:
head(Outfile, n = 7)
#> Sample IDs site testDate year x1tr y1tr x1np y1np x2mino x2odon x2np y2mino
#> 1 sid4 Aioun 41794 1997 124 95 A A TYR <NA> C CYS
#> 2 sid4 Aioun 41794 1997 134 98 <NA> <NA> <NA> <NA> G PHE
#> 3 sid4 Aioun 41794 1997 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 sid8 Aioun <NA> 1997 <NA> 95 A A TYR CCG A <NA>
#> 5 sid8 Aioun <NA> 1997 <NA> <NA> <NA> <NA> <NA> CGA G <NA>
#> 6 sid3 <NA> 41795 1997 120 102 <NA> <NA> <NA> <NA> <NA> <NA>
#> 7 sid3 <NA> 41795 1997 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> y2odon x3codon x3tr x4tr x4amino x4np y4tr
#> 1 AAA CTA 95 280 THR G 230
#> 2 <NA> TTA 112 260 <NA> T <NA>
#> 3 <NA> CTT <NA> <NA> <NA> <NA> <NA>
#> 4 AAG <NA> <NA> 320 GLU A 230
#> 5 <NA> <NA> <NA> <NA> THR <NA> <NA>
#> 6 <NA> CCT 119 320 GLU A 190
#> 7 <NA> <NA> <NA> 280 <NA> <NA> 205
First warning of worksheet (2), addresses an entry with consecutive amino acids. Notice that, in spite of the warning the consecutive amino acids got converted to their equivalent three-letter designations after all.
moimerge(file1, file2,...)
###Merging datasets {#merge}
Consider the datasets ‘testDatamerge1.xlsx’ and ‘testDatamerge2.xlsx’ from the package: