-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
can you add a function merge? #40
Comments
@panxiaoguang it might be helpful to provide a MWE of code that gives the sorts of inputs you're talking about, and then a few examples of what you'd like this function to do (how you're doing it currently, and what you think a good interface would be) |
Below is a generic example of how I would begin to approach partitioning a collection of intervals, and the merging of partitions. using GenomicFeatures
intervals = [Interval.("chr1", 1:2:10, 2:2:10); Interval.("chr2", 1:2:10, 2:2:10)]
col = IntervalCollection(intervals)
function custom_merge(interval_a::Interval{Nothing}, interval_b::Interval{Nothing})
return Interval(seqname(interval_a), leftposition(interval_a), rightposition(interval_b))
end
function custom_merge(interval::Interval{Nothing})
return interval
end
function custom_merge(intervals::Vector)
return custom_merge(intervals...)
end
function custom_partitioner(tree::GenomicFeatures.IntervalTrees.IntervalTree, p::Integer=2)
return Iterators.partition(tree, p)
end
function process(f::Function, col::IntervalCollection)
# Generate iterators for each tree.
iterators = Iterators.Generator(f, values(col.trees))
return Iterators.flatten(iterators)
end
merged = map(custom_merge, process(custom_partitioner, col))
merged = custom_merge.(process(custom_partitioner, col)) |
return Interval(seqname(interval_a), leftposition(interval_a), rightposition(interval_b)) how about a contains b ? then merged interval should be a |
If GenomicFeatures.jl can have all features of bedtools, I think it will be a great tools for every bioinformatics people |
I would welcome this feature. It would also be nice to see other interval set operations such as differences and intersections. I would probably write some implementations, but I am not sure on how to do these operations efficiently on large datasets. Is there a set of possible test cases to work with for these tasks? |
The merge methods would need to be generic. I don't see merge methods being added as their requirements are far too variable. But, contains or intersects methods could be made available. Test cases for such methods would need designing/sourcing and writing. I would not initially be concerned about speed or efficiency, those may be addressed later. GenomicFeatures has some implemented benchmarks, so you could add to those to evaluate and track what is more or less efficient. The following is an example of a custom merge method where intervals have metadata. function default_merge_checks(interval_a, interval_b)
seqname(interval_a) == seqname(interval_b) || error("Intervals do not have the same seqname.")
strand(interval_a) == strand(interval_b) || error("Intervals do not have the same strand.")
interval_a < interval_b || error("Intervals are not in order.")
end
function custom_merge(f::Function, interval_a::I, interval_b::I; check::Function = default_merge_checks) where {T, I<:AbstractInterval{T}}
check(interval_a, interval_b)
return I(
seqname(interval_a),
leftposition(interval_a),
rightposition(interval_b),
strand(interval_a),
f(metadata(interval_a, metadata(interval_b))
)
end interval = custom_merge(+, interval_a, interval_b) |
Although it's a small problem, merging adjacent intervals still requires a lot of code. Can you provide such a function
The text was updated successfully, but these errors were encountered: