Added FUSION metrics #70
Conversation
It should be a default that the type for all metrics is … I also wonder if each metric shouldn't provide a …
Is there an example of what the test() should look like? I poked around in the tests a bit but didn't see anything. Most of the metrics are computed in simple ways, but I suppose the underlying numpy methods could change and cause problems or differences.
I was thinking of adding something like this to …
Then the test harness would pass in input that could be weird (like a vector of 0s, a string, or other bad input). If any of the …
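The harness described above might be sketched as follows. This is only an illustration of the idea, not the framework actually added in the PR; `percentile_metric`, `run_metric_tests`, and the case names are all hypothetical:

```python
import numpy as np

def percentile_metric(data, p=90):
    """Hypothetical example metric: the p-th percentile of the data."""
    return float(np.percentile(data, p))

def run_metric_tests(metric, cases):
    """Feed the metric deliberately weird input and collect failures."""
    failures = []
    for name, data in cases.items():
        try:
            metric(data)
        except Exception as exc:
            failures.append((name, exc))
    return failures

# Inputs that could be "weird": all zeros, an empty vector, bad types.
cases = {
    "zeros": np.zeros(10),
    "empty": np.array([]),
    "string": "not a point cloud",
}
failures = run_metric_tests(percentile_metric, cases)
```

A metric that throws on any of these inputs shows up in `failures`, which is the signal the harness would act on.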
@bmcgaughey1 I made a PR with some changes to fix some small stuff, as well as adding a simple framework for testing metrics. If you want to look that over, I can merge it into this PR whenever. One thing I'm finding with this PR is that some of the methods end up dividing by zero on data that hasn't been sanitized. The ones I can think of off the top of my head are …
Kmann/metrics
The divide by zero makes sense. Do you think it is best to test for divide by zero, or just trap the exception and return a value indicating the metric couldn't be computed?
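One way to handle the second option is to guard the division and return NaN as the "couldn't be computed" value. A minimal sketch, using a hypothetical coefficient-of-variation metric (not necessarily one of the methods the PR flagged):

```python
import numpy as np

def coeff_of_variation(heights):
    """CV = stddev / mean; undefined when the mean is zero.

    Instead of letting a divide-by-zero propagate, return NaN to
    signal that the metric couldn't be computed for this input.
    """
    heights = np.asarray(heights, dtype=float)
    if heights.size == 0:
        return float("nan")
    mean = heights.mean()
    if mean == 0.0:
        return float("nan")
    return float(heights.std() / mean)
```

Checking the denominator explicitly (rather than catching the exception) also covers numpy's behavior of emitting a warning and producing inf/NaN instead of raising.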
I accepted the change to the call for np.percentile to use 1 to 100 instead of 1 to 99. The 100th percentile is the max, and it isn't used in the profilearea metric because of potential outliers. No problem leaving it at 100, but it adds time (arguably a small amount of time, but time nonetheless).
The range method in Python returns results that are inclusive on the bottom and exclusive on the top, so …
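Concretely, `range()` being half-open means `range(1, 100)` stops at 99, so reaching the 100th percentile requires an upper bound of 101. A small illustration (the example data is made up):

```python
import numpy as np

# range() is half-open: the stop value is excluded.
p_to_99 = list(range(1, 100))    # 1 .. 99
p_to_100 = list(range(1, 101))   # 1 .. 100

heights = np.array([1.0, 2.0, 5.0, 9.0])
# The 100th percentile is simply the maximum value.
percentiles = np.percentile(heights, p_to_100)
assert percentiles[-1] == heights.max()
```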
I guess we should leave it up to the Metric to decide what it wants to do, but it should be a requirement that if a Metric throws for ANY input on computation, it's …
Still waiting for how we want to handle the height thresholds. There is a "base" threshold for all of the simple metrics, a cover threshold, and a "no threshold" case for computing metrics by height strata. The last actually needs a list of thresholds that define the height strata. Seems to me that we want these as part of init. Calls to simple metric functions could filter the data prior to the call, or pass the threshold value to the function and let the filtering happen in the function. For the cover metrics (ratios of return counts above the threshold to the total return count), the function will need all the points and the threshold. For strata, we might need to move ahead with the ability to have a metric function return a list of values. However, we could call the strata-metric functions with a lower and upper threshold: one call for each stratum. Just one more case where it could be done more efficiently (compute multiple metrics with a single function call).
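The three threshold cases above could be sketched roughly as below. All function names and the strata boundaries are hypothetical, assuming heights arrive as a numpy array; this only illustrates "filter inside the function" and "one call per stratum":

```python
import numpy as np

def filter_heights(data, low=None, high=None):
    """Keep only returns in [low, high); None means unbounded."""
    data = np.asarray(data, dtype=float)
    if low is not None:
        data = data[data >= low]
    if high is not None:
        data = data[data < high]
    return data

def mean_height(data, low=None, high=None):
    """A "simple" metric: mean height above the base threshold."""
    data = filter_heights(data, low, high)
    return float(data.mean()) if data.size else float("nan")

def cover(data, threshold):
    """Cover metric: returns at/above the threshold over all returns."""
    data = np.asarray(data, dtype=float)
    if data.size == 0:
        return float("nan")
    return float((data >= threshold).sum() / data.size)

# Strata case: call the same single-value metric once per stratum.
strata = [(0.0, 2.0), (2.0, 8.0), (8.0, None)]
heights = np.array([0.5, 1.5, 3.0, 6.0, 10.0])
per_stratum = [mean_height(heights, lo, hi) for lo, hi in strata]
```

Note that `cover` takes the unfiltered points, since the denominator is the total return count, while the strata loop trades efficiency (one pass per stratum) for keeping the single-value-per-function contract.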
So what we definitely need in order to function correctly are optional arguments for low and high thresholds for each metric. I agree that it would be simpler and more efficient for the … The biggest problem I see with this is how to generalize this information across … I don't immediately see an elegant way to solve this problem. I think we should probably do it eventually, but maybe we should just do the thresholds for now and wait on multiple-value outputs until we have a fleshed-out approach? What are your thoughts @bmcgaughey1 ?
The single-output metric functions make sense for now. Adding thresholds passed to each metric function also makes sense. These are set globally for all metrics: one set (high and low) for the base metrics, another for the cover metrics, and then either a list for the strata metrics, or the function is called for each stratum with that stratum's lower and upper thresholds. I would see these thresholds stored in TileDB along with the list of metrics. There aren't many FUSION metrics that would have multiple return values; percentiles and the l-moment metrics are the ones that stand out (just because there are easy ways to compute them all at once). I agree that having a flexible way to handle the multiple-return cases is not trivial (or at least not trivial to make robust).
I'm going to go ahead and merge this. It's in working order, and I would like to open issues against this as a base and work in smaller increments, as it's begun to outgrow its scope.
Added FUSION metrics, except the cover-related metrics that require a height threshold. I think I have everything in place, but haven't looked at the values that come out. I suspect there are much faster ways to compute some things. For example, I compute 15 percentile height metrics using a function call for each. It is easy with numpy.percentile() to compute them all with one call, but things aren't set up to pull a list of metrics into TileDB. The same goes for the l-moments and l-moment ratios: having each in a separate function means the l-moments are computed 7 times, when the same could be done with a single call.
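The redundancy described above is easy to see with percentiles, since `np.percentile` accepts a sequence of percentile values. The specific list of 15 percentiles below is illustrative, not necessarily the set FUSION uses:

```python
import numpy as np

heights = np.random.default_rng(0).uniform(0.0, 30.0, size=1000)

# One function call per metric: np.percentile runs 15 times.
ps = [1, 5, 10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90, 95, 99]
one_at_a_time = [float(np.percentile(heights, p)) for p in ps]

# A single vectorized call computes all 15 values at once.
all_at_once = np.percentile(heights, ps)
```

Both produce identical values; the gap is that the current setup can only pull one metric value per function into TileDB, so the vectorized form has nowhere to go yet.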