Conversation
|
@jananiravi I have made the changes. Kindly review/guide me for further steps if required in this Issue. |
- Refactored and merged R/tree.R CreateFA2Tree and ConvertFA2Tree. - Parametrized every hardcoded value and path in ConvertFA2Tree - Refactored convertAlignment2Trees to support running both modes. - Backward compatible - Updated all documentation for R/tree.R
|
FIxed the commit message to reflect the correct issue number |
epbrenner
left a comment
There was a problem hiding this comment.
A couple thoughts and suggestions as I passed through. I'm not 100% certain on how the tree generation is meant to flow from the older script you're refactoring, so please ask me for clarification or feel free to just toss suggestions if they don't align with the actual logic going on here!
R/tree.R
Outdated
| # estimate a distance matrix using a Jules-Cantor Model | ||
| dna_dist <- dist.ml(prot10, model = "JC69") | ||
| # estimate a distance matrix using a Jules-Cantor Model | ||
| dna_dist <- dist.ml(prot_subset, model = dna_model) |
There was a problem hiding this comment.
| dna_dist <- dist.ml(prot_subset, model = dna_model) | |
| prot_dist <- dist.ml(prot_subset, model = mt) |
I know the original outdated code references DNA here, but I believe this should all be protein based on the inputs. If the change to prot_dist is correct, the code below will also need modification to reflect the variable name changes.
There was a problem hiding this comment.
@jananiravi : Can you confirm if the above can be considered correct?
There was a problem hiding this comment.
@epbrenner are these the unused tree generators? Using dna could be a stub? Maybe change to protein for now (will work with current molevolvr) or make it a function argument to take dna or protein fasta sequences -- future-proofing the function given starting with DNA seqs will be part of molevolvr sooner rather than later. Does that answer your question, @Klangina ?
There was a problem hiding this comment.
Sure @jananiravi, I will create an argument to switch between DNA and protein sequences and keep the default as protein. Will update the pr shortly.
R/tree.R
Outdated
| plot(prot_NJ, main = "Neighbor Joining") | ||
| ## Neighbor Joining, UPGMA, and Maximum Parsimony | ||
| # quick and dirty UPGMA and NJ trees | ||
| prot_UPGMA <- upgma(dna_dist) |
There was a problem hiding this comment.
| prot_UPGMA <- upgma(dna_dist) | |
| prot_UPGMA <- upgma(prot_dist) |
R/tree.R
Outdated
| ## Neighbor Joining, UPGMA, and Maximum Parsimony | ||
| # quick and dirty UPGMA and NJ trees | ||
| prot_UPGMA <- upgma(dna_dist) | ||
| prot_NJ <- NJ(dna_dist) |
There was a problem hiding this comment.
| prot_NJ <- NJ(dna_dist) | |
| prot_NJ <- NJ(prot_dist) |
R/tree.R
Outdated
| # ml estimation w/ distance matrix | ||
| fit <- pml(prot_NJ, prot_subset) | ||
| print(fit) | ||
| fitJC <- optim.pml(fit, model = ml_model, rearrangement = rearrangement) |
There was a problem hiding this comment.
| fitJC <- optim.pml(fit, model = ml_model, rearrangement = rearrangement) | |
| fit_optimized <- optim.pml(fit, model = fit, rearrangement = rearrangement) |
Or something similar. JC here stands for Jukes-Cantor, which is a DNA substitution model, so removing that from the variable. I believe the model in model = ml_model will actually be the prior tree generated by pml(), which would be fit here, and then optim.pml() optimizes the previous model. Do double-check me on this, though!
There was a problem hiding this comment.
Yes, switching model based on DNA/protein input could be part of the function, too.
R/tree.R
Outdated
| fit <- pml(prot_NJ, prot_subset) | ||
| print(fit) | ||
| fitJC <- optim.pml(fit, model = ml_model, rearrangement = rearrangement) | ||
| logLik(fitJC) |
There was a problem hiding this comment.
| logLik(fitJC) | |
| logLik(fit_optimized) |
If above suggestion is correct.
R/tree.R
Outdated
| bs = bootstrap_reps, optNni = TRUE, multicore = TRUE, | ||
| control = pml.control(trace = 0) | ||
| ) | ||
| plotBS(midpoint(fitJC$tree), bs, p = bootstrap_p, type = plot_type) |
There was a problem hiding this comment.
| plotBS(midpoint(fitJC$tree), bs, p = bootstrap_p, type = plot_type) | |
| plotBS(midpoint(fit_optimized$tree), bs, p = bootstrap_p, type = plot_type) |
R/tree.R
Outdated
| prot_subset_NJ <- NJ(prot_subset_dm) | ||
| fit2 <- pml(prot_subset_NJ, data = prot_subset) | ||
| print(fit2) | ||
| fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement) |
There was a problem hiding this comment.
| fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement) | |
| fit_optimized2 <- optim.pml(fit2, model = fit2, rearrangement = rearrangement) |
As before, if model should be the output of pml() then update. Discard if I'm mistaken though!
R/tree.R
Outdated
| fit2 <- pml(prot_subset_NJ, data = prot_subset) | ||
| print(fit2) | ||
| fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement) | ||
| logLik(fitJC2) |
There was a problem hiding this comment.
| logLik(fitJC2) | |
| logLik(fit_optimized2) |
R/tree.R
Outdated
| print(fit2) | ||
| fitJC2 <- optim.pml(fit2, model = ml_model, rearrangement = rearrangement) | ||
| logLik(fitJC2) | ||
| bs_subset <- bootstrap.pml(fitJC2, |
There was a problem hiding this comment.
| bs_subset <- bootstrap.pml(fitJC2, | |
| bs_subset <- bootstrap.pml(fit_optimized2, |
R/tree.R
Outdated
| bs = bootstrap_reps, optNni = TRUE, multicore = TRUE, | ||
| control = pml.control(trace = 0) | ||
| ) | ||
| bs2 <- plotBS(midpoint(fitJC2$tree), bs, p = bootstrap_p, type = plot_type) |
There was a problem hiding this comment.
| bs2 <- plotBS(midpoint(fitJC2$tree), bs, p = bootstrap_p, type = plot_type) | |
| bs2 <- plotBS(midpoint(fit_optimized2$tree), bs, p = bootstrap_p, type = plot_type) |
- renamed dna_dist to seq_dist - renamed fitJC and fitJC2 to fit_optimised and fit_optimised2 to make variable names more general. - Updated code and documentation accordingly.
|
@epbrenner @jananiravi : Made changes as per recommendations before. Kindly review. |
… following two issues: createFA2Tree: no visible global function definition for ‘midpoint’ createFA2Tree: no visible binding for global variable ‘fit_optimized2’
Description
What kind of change(s) are included?
Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.