Conversation
- changed parameter name from `estimate.params.density` to `initialize.on.subset` - allow for values higher than 1, which directly yields number of samples - priorize Mclust argument `initialization` - check for to few samples, 9 seems to be the magic number - add tests
Remove duplicate check for sample size.
| expect_equal(a,b) | ||
| }) | ||
|
|
||
| test_that("check if methSeg with cores > 1 is the same as cores=1 (non-tabix file)" ,{ |
There was a problem hiding this comment.
This seems to be the same test as before.
tests/testthat/test-8-methSeg.r
Outdated
| expect_equal(a,b) | ||
| }) | ||
|
|
||
| methylRawDB.obj <- methRead( |
There was a problem hiding this comment.
Is this really a tabix based object?
I think you have to set save.db=TRUE.
There was a problem hiding this comment.
if I am correct it's dbtype="tabix"
R/methSeg.R
Outdated
| gr0 = gr0[,"meth"] | ||
| }else if("meth.diff" %in% names(mcols(gr0))){ | ||
| gr0 = gr0[,"meth.diff"] | ||
| }else if (class(obj) != "GRanges"){ |
There was a problem hiding this comment.
This is unnecessary since you are already checking above for the class.
R/methSeg.R
Outdated
| # match argument names to fastseg arguments | ||
| args.fastseg=dots[names(dots) %in% names(formals(fastseg)[-1] ) ] | ||
| initialize.on.subset=1, | ||
| cores=1, ...){ |
There was a problem hiding this comment.
cores should be mc.cores to keep argument names consistent with other methylKit functions
| seg.res <- do.call("fastseg", args.fastseg) | ||
|
|
||
| # stop if segmentation produced only one range | ||
| if(length(seg.res)==1) { |
There was a problem hiding this comment.
shouldn't this if clause be called after fastseg and before calling mclust?
There was a problem hiding this comment.
yes, I think I know what you mean, this if clause is here in the original code and you return seg.res if there is only 1 segment
Line 127 in 0d007f2
R/methSeg.R
Outdated
| # methylKit naming convention | ||
| df2getcolnames = as.data.frame(gr0[1]) | ||
| df2getcolnames$width = NULL | ||
| methylKit:::.setMethylDBNames(df2getcolnames) |
There was a problem hiding this comment.
why do we need this line ?
methylKit:::.setMethylDBNames(df2getcolnames)
There was a problem hiding this comment.
This function is used to predict the column names of the given data.frame.
There was a problem hiding this comment.
But we do not need the methylKit:::!
There was a problem hiding this comment.
OK, I got confused, I thought we are resetting names on the tabix files but this doesn't do that, right ?
There was a problem hiding this comment.
No, actually the data.frame that we get from the tabix file does not have column names and with that function we retrieve them.
R/methSeg.R
Outdated
| ## Tabix files | ||
| } else if(class(obj)=="methylDiffDB" | class(obj)=="methylRawDB"){ | ||
|
|
||
| .run.fastseg.tabix = function(gr0, ...){ |
There was a problem hiding this comment.
I would suggest a function which actually takes the class of the object as argument:
.run.fastseg.tabix = function(gr0, class ,...) {
### and then you can directly set the colnames
.setMethylDBNames(df2getcolnames,class)
}
|
I improved the code according to your suggestions besides @al2na suggestion about the if clause #120 (diff) |
|
could we also comment the code wherever possible, please think about people who will maintain this in the future or your future selves. Certain things that are trivial are not going to be trivial after 3 months of not looking at the code. |
|
@al2na I added more comments, hope it's better now |
|
there is something wrong when |
|
I checked if with methylRawDB and multiple cores is faster than using methylRaw object on example of data with ~350K Cs (two chromosomes) and methylRaw is faster. I don't know why. Maybe it depends on the size of the input, I will check that |
|
please check datasets that have multiple chromosomes lets say at least 5
chromosomes, compare also memory consumption.
…On Wed, Jul 4, 2018 at 10:55 AM katwre ***@***.***> wrote:
I checked if with methylRawDB and multiple cores is faster than using
methylRaw object on example of data with ~350K Cs and methylRaw is faster.
I don't know why. Maybe it depends on the size of the input, I will check
that
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#120 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAm9ERdF218-iOSwujVjdGoOCvaGaiKcks5uDIL8gaJpZM4U7trW>
.
|
|
I checked it using 5 chromosomes and it's not better. thanks @alexg9010 for the suggestion to use profvis, but it didnt work for me, I got an error that I didn't what to do with. I used profmem instead and it showed that memory usage when there are parallel cores is smaller than without using multiple cores. |
|
@al2na @alexg9010 I didn't manage to show that this method is faster. Should we close this pull request? |
Hi guys,
I added parallel methSeg() with methylDB objects and non-tabix objects.
I separated code for running fastseg and mclust into two auxiliary function .run.fastseg and .run.mclust. Parallelization is in the step of fastseg and each fastseg run is concatenated. For that, I added to
return.typeinapplyTbxByChr"GRanges" to concatenate GRanges.I wrote some tests, but I think I maybe should add more of them
Let me know what do you think about it
Kasia