Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can you add a function merge? #40

Open
panxiaoguang opened this issue Apr 20, 2021 · 6 comments
Open

can you add a function merge? #40

panxiaoguang opened this issue Apr 20, 2021 · 6 comments

Comments

@panxiaoguang
Copy link

panxiaoguang commented Apr 20, 2021

Although it's a small problem, merging adjacent intervals still requires a lot of code. Can you provide such a function

@kescobo
Copy link
Member

kescobo commented Apr 26, 2021

@panxiaoguang it might be helpful to provide a MWE of code that gives the sorts of inputs you're talking about, and then a few examples of what you'd like this function to do (how you're doing it currently, and what you think a good interface would be)

@CiaranOMara
Copy link
Member

Below is a generic example of how I would begin to approach partitioning a collection of intervals, and the merging of partitions.

using GenomicFeatures

intervals = [Interval.("chr1", 1:2:10, 2:2:10); Interval.("chr2", 1:2:10, 2:2:10)]

col = IntervalCollection(intervals)

function custom_merge(interval_a::Interval{Nothing}, interval_b::Interval{Nothing})
    return Interval(seqname(interval_a), leftposition(interval_a), rightposition(interval_b))
end

function custom_merge(interval::Interval{Nothing})
    return interval
end

function custom_merge(intervals::Vector)
    return custom_merge(intervals...)
end

function custom_partitioner(tree::GenomicFeatures.IntervalTrees.IntervalTree, p::Integer=2)
    return Iterators.partition(tree, p)
end

function process(f::Function, col::IntervalCollection)

    # Generate iterators for each tree.
    iterators = Iterators.Generator(f, values(col.trees))

    return Iterators.flatten(iterators)
end

merged = map(custom_merge, process(custom_partitioner, col))
merged = custom_merge.(process(custom_partitioner, col))

@panxiaoguang
Copy link
Author

return Interval(seqname(interval_a), leftposition(interval_a), rightposition(interval_b))

how about a contains b ? then merged interval should be a

@panxiaoguang
Copy link
Author

If GenomicFeatures.jl can have all features of bedtools, I think it will be a great tools for every bioinformatics people

@lysogeny
Copy link

I would welcome this feature. It would also be nice to see other interval set operations such as differences and intersections. I would probably write some implementations, but I am not sure on how to do these operations efficiently on large datasets. Is there a set of possible test cases to work with for these tasks?

@CiaranOMara
Copy link
Member

CiaranOMara commented Aug 8, 2021

The merge methods would need to be generic. I don't see merge methods being added as their requirements are far too variable. But, contains or intersects methods could be made available. Test cases for such methods would need designing/sourcing and writing. I would not initially be concerned about speed or efficiency, those may be addressed later. GenomicFeatures has some implemented benchmarks, so you could add to those to evaluate and track what is more or less efficient.

The following is an example of a custom merge method where intervals have metadata.

function default_merge_checks(interval_a, interval_b)
    seqname(interval_a) == seqname(interval_b) || error("Intervals do not have the same seqname.")
    strand(interval_a) == strand(interval_b) || error("Intervals do not have the same strand.")
    interval_a < interval_b || error("Intervals are not in order.")
end

function custom_merge(f::Function, interval_a::I, interval_b::I; check::Function = default_merge_checks) where {T, I<:AbstractInterval{T}}

    check(interval_a, interval_b)

    return I(
        seqname(interval_a),
        leftposition(interval_a),
        rightposition(interval_b),
        strand(interval_a),
        f(metadata(interval_a, metadata(interval_b))
    )

end
interval = custom_merge(+, interval_a, interval_b)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants