Added FUSION metrics #70

Merged
merged 43 commits into main on Sep 16, 2024

Conversation

bmcgaughey1
Collaborator

Added FUSION metrics except the cover-related metrics that require a height threshold. I think I have everything in place but haven't looked at the values that come out. I suspect there are much faster ways to compute some things. For example, I compute 15 percentile height metrics using a function call for each. It is easy with numpy.percentile() to compute them all with one call, but things aren't set up to pull a list of metrics into TileDB. The same goes for the l-moments and l-moment ratios: having each in a separate function means the l-moments are computed 7 times, when they could all be done with a single call.
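
For illustration, here is what a single-call version might look like. This is only a numpy sketch (the 15 percentile levels are assumed from FUSION's usual list, and the variable names are made up), not the per-metric functions in this PR:

import numpy as np

# example height vector; in practice this would be the Z values for one cell
heights = np.random.default_rng(0).gamma(2.0, 5.0, 1000).astype(np.float32)

# current approach: one function call (and one np.percentile call) per metric
p95 = np.percentile(heights, 95)

# single-call alternative: all percentile metrics come back as one array,
# which would require the TileDB side to accept a list of values per metric
levels = [1, 5, 10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90, 95, 99]
all_pcts = np.percentile(heights, levels)   # shape (15,)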

@hobu
Collaborator

hobu commented Feb 14, 2024

It should be a default that the type for all metrics is np.float32 so we don't have to copy-paste that all over the place.
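
One way to get that default (a hypothetical sketch only; the actual Metric constructor may look different) is to let the dtype argument default to np.float32 so individual metric definitions stop repeating it:

import numpy as np

class Metric:
    # hypothetical constructor: dtype defaults to np.float32
    def __init__(self, name, method, dtype=np.float32):
        self.name = name
        self._method = method
        self.dtype = np.dtype(dtype)

mean = Metric("mean", np.mean)                 # picks up float32 automatically
count = Metric("count", len, dtype=np.int32)   # override only where needed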

I also wonder if each metric shouldn't provide a test() harness that can be run by pytest to protect from dry rot.

@bmcgaughey1
Collaborator Author

Is there an example of what the test() should look like? I poked around in the tests a bit but didn't see anything.

Most of the metrics are computed in simple ways, but I suppose the underlying numpy methods could change and cause problems or differences.

@hobu
Collaborator

hobu commented Feb 15, 2024

Is there an example of what the test() should look like?

I was thinking of adding something like this to Metric:

    def test(self, vector):
        try:
            results = self._method(vector)
        except Exception as e:
            # if we div0 or throw or something we failed
            raise e

        # running the method over a vector of input 
        # should only return a single value that is a number (for now)
        assert isinstance(results, (int, float, complex)) and not isinstance(results, bool)
        return True

Then the test harness would pass in stuff that could be weird (like a vector of 0s or a string or other bad input). If any of the test calls raise an exception, we report to the user, etc.
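
A minimal pytest sketch of that harness, using a stand-in Metric class with the test() method above (the real class and the way metrics are registered will differ):

import numpy as np
import pytest

class Metric:
    # stand-in for illustration; mirrors the test() idea above
    def __init__(self, name, method):
        self.name = name
        self._method = method

    def test(self, vector):
        results = self._method(vector)
        # a metric should reduce its input to a single numeric value
        assert isinstance(results, (int, float, complex)) and not isinstance(results, bool)
        return True

mean_metric = Metric("mean", lambda v: float(np.mean(v)))

# deliberately "weird" inputs: all zeros, a single return, negative heights
@pytest.mark.parametrize("vector", [
    np.zeros(100, dtype=np.float32),
    np.array([1.0], dtype=np.float32),
    np.array([-1.0, -2.0, -3.0], dtype=np.float32),
])
def test_metric_survives_odd_input(vector):
    # any exception raised inside test() shows up as a pytest failure
    assert mean_metric.test(vector)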

@kylemann16
Collaborator

kylemann16 commented Feb 23, 2024

@bmcgaughey1 I made a PR with some changes to fix some small stuff as well as adding a simple framework for testing metrics. If you want to look that over, I can merge that into this PR whenever.

One thing that I'm finding with this PR is that some of the methods end up dividing by zero with data that hasn't been sanitized. The ones I can think of off the top of my head are m_crr and lmom4.

@bmcgaughey1
Collaborator Author

The divide by zero makes sense. Do you think it is best to test for divide by zero or just trap the exception and return a value indicating the metric couldn't be computed?

@bmcgaughey1
Collaborator Author

I accepted the change to the call for np.percentile to use 1 to 100 instead of 1 to 99. The 100th percentile is the max and it isn't used in the profilearea metric because of potential outliers. No problem leaving it at 100, but it adds time (arguably a small amount of time, but time nonetheless).

@kylemann16
Collaborator

kylemann16 commented Feb 23, 2024

I accepted the change to the call for np.percentile to use 1 to 100 instead of 1 to 99. The 100th percentile is the max and it isn't used in the profilearea metric because of potential outliers. No problem leaving it at 100, but it adds time (arguably a small amount of time, but time nonetheless).

The range function in Python returns results that are inclusive on the bottom and exclusive on the top, so p[98] was throwing an out-of-bounds error here:

>>> print(list(range(9)))
[0, 1, 2, 3, 4, 5, 6, 7, 8]
>>> print(list(range(1,10)))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

@hobu
Collaborator

hobu commented Feb 23, 2024

The divide by zero makes sense. Do you think it is best to test for divide by zero or just trap the exception and return a value indicating the metric couldn't be computed?

I guess we should leave it up to the Metric to decide what it wants to do, but it should be a requirement that if a Metric throws for ANY input on computation, its default value is written into TileDB. Then a Metric writer has the option of doing nan or min/max or -9999 or whatever it wants.
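
A sketch of that contract (illustrative names, not the actual implementation): the compute wrapper traps any exception and returns the metric's declared default, which is what gets written to TileDB:

import numpy as np

class Metric:
    # hypothetical: each metric declares the value stored on failure
    def __init__(self, name, method, default=np.nan):
        self.name = name
        self._method = method
        self.default = default

    def compute(self, data):
        try:
            return self._method(data)
        except Exception:
            # divide by zero, empty input, whatever: fall back to the default
            return self.default

# canopy relief ratio: (mean - min) / (max - min); fails when max == min
crr = Metric("m_crr",
             lambda z: (float(np.mean(z)) - float(np.min(z)))
                       / (float(np.max(z)) - float(np.min(z))),
             default=-9999.0)

print(crr.compute(np.zeros(10)))   # ZeroDivisionError trapped -> -9999.0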

@bmcgaughey1
Collaborator Author

Still waiting on a decision about how we want to handle the height thresholds. There is a "base" threshold for all of the simple metrics, a cover threshold, and a "no threshold" case for computing metrics by height strata; the last actually needs a list of thresholds that define the height strata. It seems to me that we want these as part of init. Calls to simple metric functions could filter the data prior to the call, or pass the threshold value to the function and let the filtering happen inside the function. For the cover metrics (ratios of return counts above the threshold to total return count), the function will need all of the points plus the threshold. For strata, we might need to move ahead with the ability to have the metric function return a list of values. However, we could call the strata-metric functions with a lower and upper threshold, one call for each stratum. That is just one more case where things could be done more efficiently (compute multiple metrics with a single function call).
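
To make the three cases concrete, here is a rough numpy sketch (the threshold values and function names are made up for illustration; the real plumbing would carry these in from init):

import numpy as np

htmin = 2.0                            # "base" threshold for the simple metrics
cover_htmin = 2.0                      # cover threshold
strata = [0.0, 2.0, 8.0, 16.0, 32.0]   # break heights defining the strata

def mean_height(z, htmin):
    # simple metric: threshold passed in, filtering done inside the function
    z = z[z > htmin]
    return float(np.mean(z)) if z.size else np.nan

def cover(z, htmin):
    # cover metric needs all returns: count above threshold / total count
    return float(np.count_nonzero(z > htmin)) / z.size if z.size else np.nan

def strata_mean(z, lower, upper):
    # strata variant: one call per stratum with its lower/upper thresholds
    zz = z[(z >= lower) & (z < upper)]
    return float(np.mean(zz)) if zz.size else np.nan

z = np.random.default_rng(1).gamma(2.0, 5.0, 500)
per_stratum = [strata_mean(z, lo, hi) for lo, hi in zip(strata[:-1], strata[1:])]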

@kylemann16
Collaborator

So what we definitely need in order to function correctly are optional arguments for low and high thresholds for each metric.

I agree that it would be simpler and more efficient for the Shatter process if we allowed metrics to output multiple results, but I think the complexity is transferred to the Extract step, because we would have to include a lot more information in what's being passed to it in order to know how to extract that information.

The biggest problem I see with this is how to generalize this information across Metrics: what is the relevance of each entry in a list passed to Extract? For percentiles, each entry may correspond to its percentage, but how can we generalize that across multiple Metrics?

I don't immediately see an elegant way to solve this problem. I think we should probably do it eventually, but maybe we should just do the thresholds for now and wait to do multiple value outputs until we have a fleshed out approach to it?

What are your thoughts @bmcgaughey1 ?

@bmcgaughey1
Collaborator Author

The single-output metric functions make sense for now. Adding thresholds passed to each metric function also makes sense. These would be set globally for all metrics: one set (high and low) for the base metrics, another for the cover metrics, and then either a list for the strata metrics, or the function gets called for each stratum with that stratum's lower and upper thresholds. I would see these thresholds stored in TileDB along with the list of metrics.

There aren't many FUSION metrics that would have multiple return values; the percentiles and l-moment metrics are the ones that stand out (just because there are easy ways to compute them all at once). I agree that having a flexible way to handle the multiple-return cases is not trivial (or at least not trivial to make robust).
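
For reference, the first four l-moments and the l-moment ratios can all come from one pass over the sorted sample via probability-weighted moments. This is a generic sketch (it assumes at least four values), not the functions in this PR:

import numpy as np

def lmoments(z):
    # probability-weighted moments b0..b3 from the ascending-sorted sample,
    # then the first four l-moments and the ratios, all from one call
    z = np.sort(np.asarray(z, dtype=np.float64))
    n = z.size
    j = np.arange(1, n + 1)
    b0 = z.mean()
    b1 = np.sum((j - 1) * z) / (n * (n - 1))
    b2 = np.sum((j - 1) * (j - 2) * z) / (n * (n - 1) * (n - 2))
    b3 = np.sum((j - 1) * (j - 2) * (j - 3) * z) / (n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    # note: l1 or l2 equal to zero (e.g. constant input) is the ratio
    # divide-by-zero case mentioned earlier in this thread
    return l1, l2, l3, l4, l2 / l1, l3 / l2, l4 / l2  # L1-L4, L-CV, L-skew, L-kurt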

@kylemann16
Collaborator

I'm going to go ahead and merge this. It's in working order, and I would like to apply issues to this as a base and work in smaller increments as it's begun to outgrow its scope.

kylemann16 merged commit 1454d5f into main on Sep 16, 2024
9 checks passed