diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 132de5c1d..dd36665dc 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -14,7 +14,7 @@ concurrency: env: FORCE_COLOR: 3 - TQDM_MININTERVAL: 100 + TQDM_DISABLE: yes_please jobs: build-and-test: @@ -24,7 +24,7 @@ jobs: fail-fast: false matrix: python-version: ["3.9", "3.10", "3.11"] - os: [ubuntu-latest, macOS-latest] + os: [ubuntu-latest, macOS-13] steps: - uses: actions/checkout@v4 diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 6ffddcac6..b7ba2a1b7 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -3,7 +3,7 @@ ci: autoupdate_schedule: "quarterly" autofix_commit_msg: "style: pre-commit fixes" -exclude: ^(attic|tutorials|src/pygama/math|src/pygama/flow/datagroup.py) +exclude: ^(attic|src/pygama/flow/datagroup.py) repos: - repo: https://github.com/pre-commit/pre-commit-hooks rev: "v4.5.0" @@ -17,7 +17,6 @@ repos: - id: check-json - id: check-toml - id: check-docstring-first - exclude: ^src/pygama/raw/(fc/fc_event_decoder.py|compass/compass_event_decoder.py)$ - id: debug-statements - id: end-of-file-fixer - id: forbid-new-submodules diff --git a/docs/source/notebooks/MathTutorial.ipynb b/docs/source/notebooks/MathTutorial.ipynb index c14627978..5d2c78f60 100644 --- a/docs/source/notebooks/MathTutorial.ipynb +++ b/docs/source/notebooks/MathTutorial.ipynb @@ -703,10 +703,10 @@ "metadata": {}, "outputs": [], "source": [ - "from pygama.math.functions.pygama_continuous import pygama_continuous\n", + "from pygama.math.functions.pygama_continuous import PygamaContinuous\n", "\n", "\n", - "class cauchy_gen(pygama_continuous):\n", + "class cauchy_gen(PygamaContinuous):\n", " def _pdf(self, x: np.ndarray) -> np.ndarray:\n", " x.flags.writeable = True\n", " return nb_cauchy_pdf(x, 0, 1)\n", @@ -906,7 +906,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Adding distributions with `sum_dists`\n", + "## Adding distributions with `SumDists`\n", "\n", "In the business of fitting data, adding two distributions together is our bread-and-butter. This part of the notebook will show you how to create your own distribution that is an instance of `pygama.math.sum_dists`. There are a couple of different use cases for adding distributions together, and we will look at each in turn, but here is a summary:\n", "\n", @@ -949,16 +949,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## A little about `sum_dists`\n", + "## A little about `SumDists`\n", "The first important thing to note about this class is that all methods in this class are of the form `method(x, *parameter_array)` or `method(x, parameters, ...)`. \n", "\n", - "`sum_dists` works by adding two --- and only two --- distributions together. The first thing a user does in creating a new instance is define the order of elements in a `parameter_array`. Then, `sum_dists` works by grabbing elements at different indices in this `parameter_array` according to rules that the user provides in the instantiation. Here's how to write a generic class: \n", + "`SumDists` works by adding two --- and only two --- distributions together. The first thing a user does in creating a new instance is define the order of elements in a `parameter_array`. Then, `sum_dists` works by grabbing elements at different indices in this `parameter_array` according to rules that the user provides in the instantiation. Here's how to write a generic class: \n", "\n", "1. Create a `parameter_index_array` that holds the indices of what will eventually come in the `parameter_array`. If the user will eventually pass `parameters = [frac1, mu, sigma]` then we just take `parameter_index_array=[frac1, mu, sigma]=range(3)`\n", "\n", - "2. `sum_dists` takes an alternating pattern of distributions and distribution-specific parameter_index_arrays. Each par array can contain `[mu, sigma, shape]`. These par arrays are placed in a tuple with their distribution like `(dist1, [mu1, sigma1, shape1])`. Finally, a list of these tuples is fed to the constructor as `sum_dists([(dist1, [mu1, sigma1, shape1]), (dist2, [mu2, sigma2])])`\n", + "2. `SumDists` takes an alternating pattern of distributions and distribution-specific parameter_index_arrays. Each par array can contain `[mu, sigma, shape]`. These par arrays are placed in a tuple with their distribution like `(dist1, [mu1, sigma1, shape1])`. Finally, a list of these tuples is fed to the constructor as `sum_dists([(dist1, [mu1, sigma1, shape1]), (dist2, [mu2, sigma2])])`\n", "\n", - "3. The `sum_dists` constructor then takes an array corresponding the index locations of where either fraction or area values will be passed in the ultimate `parameter_index_array`.\n", + "3. The `SumDists` constructor then takes an array corresponding the index locations of where either fraction or area values will be passed in the ultimate `parameter_index_array`.\n", "\n", "4. We pass one of the 4 flag options, to be described below. \n", "\n", @@ -978,26 +978,26 @@ "\n", "Finally, we would intitalize (with the `fracs` flag in this case, more on that later) \n", "\n", - "`sum_dists(args, area_frac_idxs = [frac_1], frac_flag = \"fracs\", parameter_names=[\"mu\", \"sigma\", \"tau\", \"frac_1\"])`\n", + "`SumDists(args, area_frac_idxs = [frac_1], frac_flag = \"fracs\", parameter_names=[\"mu\", \"sigma\", \"tau\", \"frac_1\"])`\n", "\n", "\n", - "### So... What is `sum_dists` actually doing? \n", - "Under the hood, `sum_dists` is applying a set of rules so that the following *is always* computed, regardless of the flag sent to the constructor. \n", + "### So... What is `SumDists` actually doing? \n", + "Under the hood, `SumDists` is applying a set of rules so that the following *is always* computed, regardless of the flag sent to the constructor. \n", "\n", "`area1*frac1*dist1(x, mu, sigma, shape) + area2*frac2*dist_2(x, mu2, sigma2, shape2)`\n", "\n", - "It computes this by first grabbing the relevant areas or fraction values from their position in the `parameter_index_array`. Then, at the time of method call, `sum_dists` grabs the values from `parameter_index_array` that correspond to each distribution via slicing the `parameter_index_array` with the individual par arrays passed in the instantiation. In our example above, `sum_dists` knows to grab the values at indices 0, 1, 2 for the first distribution because it is instantiated with the tuple `(dist1, [mu, sigma, tau])`.\n", + "It computes this by first grabbing the relevant areas or fraction values from their position in the `parameter_index_array`. Then, at the time of method call, `SumDists` grabs the values from `parameter_index_array` that correspond to each distribution via slicing the `parameter_index_array` with the individual par arrays passed in the instantiation. In our example above, `SumDists` knows to grab the values at indices 0, 1, 2 for the first distribution because it is instantiated with the tuple `(dist1, [mu, sigma, tau])`.\n", "\n", "There's also some work done to determine which of `area` and `frac` are present. That's the purpose of the `flag`. Let's take some time and learn a little more about what each flag does. \n", " \n", "\n", "\n", - "### The flag in the `sum_dists` constructor\n", + "### The flag in the `SumDists` constructor\n", "Let's say we are interested in knowing the amount of counts present in a signal and a background in our total spectrum, i.e. we want to create and fit a function that looks like\n", "\n", "$pdf= A\\cdot gauss\\_pdf + B\\cdot cauchy\\_pdf$\n", "\n", - "Because we are interested in fitting the areas, we send the `areas` keyword to `flag`. This causes `sum_dists` to look for an `area_frac_idxs` array of length 2 in the instantiation: this array contains the indices that the area values will be located at in the `parameter_index_array`. The `areas` keyword causes the fractions to be set to `1`. \n", + "Because we are interested in fitting the areas, we send the `areas` keyword to `flag`. This causes `SumDists` to look for an `area_frac_idxs` array of length 2 in the instantiation: this array contains the indices that the area values will be located at in the `parameter_index_array`. The `areas` keyword causes the fractions to be set to `1`. \n", "\n", "In our example above for constructing $new\\_pdf(x, \\mu, \\sigma, \\tau, frac_1) = frac_1\\cdot dist_1(x, \\mu, \\sigma, \\tau) + (1-frac1)\\cdot dist_2(x, \\mu, \\sigma)$, the instantiation takes the `fracs` keyword. This causes `sum_dists` to look for an `area_frac_idxs` array of length 1 in the instantiation: this array contains the index that the fraction values will be located at in the `parameter_index_array`. `sum_dists` works by multiplying `frac` times the first distribution, and `1-frac` times the second distribution.\n", "\n", @@ -1011,7 +1011,7 @@ "metadata": {}, "outputs": [], "source": [ - "from pygama.math.functions.sum_dists import sum_dists\n", + "from pygama.math.functions.sum_dists import SumDists\n", "\n", "# we first create an array contains the indices of the parameter array\n", "# that will eventually be input to function calls\n", @@ -1020,7 +1020,7 @@ "# we now create an array containing the tuples of the distributions and their shape parameters\n", "args = [(gaussian, [mu, sigma]), (cauchy, [mu2, sigma2])]\n", "# we initialize with the flag = \"areas\" to let the constructor know we are sending area parameter idxs only\n", - "cauchy_on_gauss = sum_dists(\n", + "cauchy_on_gauss = SumDists(\n", " args,\n", " area_frac_idxs=[area1, area2],\n", " flag=\"areas\",\n", @@ -1097,12 +1097,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## The `fracs` flag in the `sum_dists` constructor\n", - "To get a feel for how `sum_dists` works, let's create a function that creates the following distribution:\n", + "## The `fracs` flag in the `SumDists` constructor\n", + "To get a feel for how `SumDists` works, let's create a function that creates the following distribution:\n", "\n", "$pdf = f_1\\cdot gauss\\_{pdf}+(1-f_1)\\cdot cauchy\\_{pdf}$\n", "\n", - "The `fracs` keyword allows `sum_dists` to look for the parameter index corresponding to the value of one fraction, and then automatically calculates `f*dist1+(1-f)*dist2`" + "The `fracs` keyword allows `SumDists` to look for the parameter index corresponding to the value of one fraction, and then automatically calculates `f*dist1+(1-f)*dist2`" ] }, { @@ -1111,7 +1111,7 @@ "metadata": {}, "outputs": [], "source": [ - "from pygama.math.functions.sum_dists import sum_dists\n", + "from pygama.math.functions.sum_dists import SumDists\n", "\n", "\n", "# we first create an array contains the indices of the parameter array\n", @@ -1121,7 +1121,7 @@ "# we now create an array containing the distributions and their shape parameters\n", "args = [(gaussian, [mu, sigma]), (cauchy, [mu2, sigma2])]\n", "# we initialize with the flag = \"areas\" to let the constructor know we are sending area parameters idxs only\n", - "cauchy_on_gauss = sum_dists(\n", + "cauchy_on_gauss = SumDists(\n", " args,\n", " area_frac_idxs=[frac1],\n", " flag=\"fracs\",\n", @@ -1169,10 +1169,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### By using `sum_dists` on an instance of `sum_dists`, you can even sum three distributions together\n", + "### By using `SumDists` on an instance of `SumDists`, you can even sum three distributions together\n", "See the `hpge_peak` function to see an example of this. \n", "\n", - "There is also one other `flag` keyword that `sum_dists` can take: `one_area`. This special keyword is used if we have an odd number of distributions that we want to add together and fit their areas. The first two distributions can be an instance of `sum_dists` with the `areas` flag; however, to add a third distribution to this, we need a way to pass only one area idx to the instantiation of `sum_dists`. See the `triple_gauss_on_double_step` function for an example of this." + "There is also one other `flag` keyword that `SumDists` can take: `one_area`. This special keyword is used if we have an odd number of distributions that we want to add together and fit their areas. The first two distributions can be an instance of `SumDists` with the `areas` flag; however, to add a third distribution to this, we need a way to pass only one area idx to the instantiation of `SumDists`. See the `triple_gauss_on_double_step` function for an example of this." ] }, { diff --git a/pyproject.toml b/pyproject.toml index 621a67032..f98d87b7d 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -30,7 +30,7 @@ classifiers = [ ] requires-python = ">=3.9" dependencies = [ - "boost-histogram", + "hist", "colorlog", "dspeed>=1.3", "h5py>=3.2", @@ -46,7 +46,7 @@ dependencies = [ "scikit-learn", "scipy>=1.0.1", "tables", - "tqdm>=4.27", + "tqdm>=4.66", ] dynamic = [ "version", diff --git a/src/pygama/evt/utils.py b/src/pygama/evt/utils.py index 30d14639a..4f8391353 100644 --- a/src/pygama/evt/utils.py +++ b/src/pygama/evt/utils.py @@ -95,7 +95,7 @@ def find_parameters( if len(hit_flds) > 0: hit_ak = lh5.read_as( - f"{ch.replace('/','')}/{f.hit.group}/", + f"{ch.replace('/', '')}/{f.hit.group}/", f.hit.file, field_mask=hit_flds, idx=idx_ch, @@ -108,7 +108,7 @@ def find_parameters( if len(dsp_flds) > 0: dsp_ak = lh5.read_as( - f"{ch.replace('/','')}/{f.dsp.group}/", + f"{ch.replace('/', '')}/{f.dsp.group}/", f.dsp.file, field_mask=dsp_flds, idx=idx_ch, diff --git a/src/pygama/hit/build_hit.py b/src/pygama/hit/build_hit.py index 7a8c6a241..bb8c4df5a 100644 --- a/src/pygama/hit/build_hit.py +++ b/src/pygama/hit/build_hit.py @@ -153,6 +153,7 @@ def build_hit( if "lgdo_attrs" in info: outcol.attrs |= info["lgdo_attrs"] + log.debug(f"made new column {outname!r}={outcol!r}") outtbl_obj.add_column(outname, outcol) # make high level flags diff --git a/src/pygama/math/binned_fitting.py b/src/pygama/math/binned_fitting.py index 81f6455d3..6199575b5 100644 --- a/src/pygama/math/binned_fitting.py +++ b/src/pygama/math/binned_fitting.py @@ -1,13 +1,12 @@ """ pygama convenience functions for fitting binned data """ + import logging -import math -from typing import Optional, Union, Callable +from typing import Callable, Optional import numpy as np from iminuit import Minuit, cost -from scipy.optimize import brentq, minimize_scalar import pygama.math.histogram as pgh from pygama.math.functions.gauss import nb_gauss_amp @@ -15,31 +14,41 @@ log = logging.getLogger(__name__) -def fit_binned(func: Callable, hist: np.ndarray, bins: np.ndarray, var: np.ndarray=None, guess: np.ndarray=None, - cost_func: str='LL', Extended: bool=True, simplex: bool=False, bounds: tuple[tuple[float, float], ...]=None, fixed: tuple[int, ...] = None) -> tuple[np.ndarray, ...]: +def fit_binned( + func: Callable, + hist: np.ndarray, + bins: np.ndarray, + var: np.ndarray = None, + guess: np.ndarray = None, + cost_func: str = "LL", + extended: bool = True, + simplex: bool = False, + bounds: tuple[tuple[float, float], ...] = None, + fixed: tuple[int, ...] = None, +) -> tuple[np.ndarray, ...]: """ Do a binned fit to a histogram. - Default is Extended Log Likelihood fit, with option for either Least + Default is extended Log Likelihood fit, with option for either Least Squares or other cost function. Parameters ---------- - func + func the function to fit, if using LL as method needs to be a cdf - hist, bins, var + hist, bins, var histogrammed data - guess + guess initial guess parameters cost_func cost function to use - Extended + extended run extended or non extended fit simplex whether to include a round of simpson minimisation before main minimisation - bounds + bounds list of tuples with bounds can be None, e.g. [(0,None), (0,10)] - fixed + fixed list of parameter indices to fix Returns @@ -52,36 +61,38 @@ def fit_binned(func: Callable, hist: np.ndarray, bins: np.ndarray, var: np.ndarr Covariance matrix """ if guess is None: - raise NotImplementedError("auto-guessing not yet implemented, you must supply a guess.") + raise NotImplementedError( + "auto-guessing not yet implemented, you must supply a guess." + ) - if cost_func == 'LL': + if cost_func == "LL": if var is not None: - t_arr = np.zeros((len(hist),2)) - t_arr[:,0] = hist - t_arr[:,1] = var + t_arr = np.zeros((len(hist), 2)) + t_arr[:, 0] = hist + t_arr[:, 1] = var hist = t_arr - if Extended ==True: - cost_func = cost.ExtendedBinnedNLL(hist,bins, func) + if extended is True: + cost_func = cost.ExtendedBinnedNLL(hist, bins, func) else: - cost_func = cost.BinnedNLL( hist,bins, func) + cost_func = cost.BinnedNLL(hist, bins, func) - elif cost_func == 'Least Squares': + elif cost_func == "Least Squares": if var is None: - var = hist # assume Poisson stats if variances are not provided + var = hist # assume Poisson stats if variances are not provided - if len(bins) == len(hist)+1: + if len(bins) == len(hist) + 1: bin_centres = pgh.get_bin_centers(bins) else: - bin_centres = bins + bin_centres = bins # skip "okay" bins with content 0 +/- 0 # if bin content is non-zero but var = 0 let the user see the warning - zeros = (hist == 0) - zero_errors = (var == 0) + zeros = hist == 0 + zero_errors = var == 0 mask = ~(zeros & zero_errors) hist = hist[mask] var = np.sqrt(var[mask]) @@ -96,7 +107,7 @@ def fit_binned(func: Callable, hist: np.ndarray, bins: np.ndarray, var: np.ndarr if fixed is not None: for fix in fixed: m.fixed[fix] = True - if simplex == True: + if simplex is True: m.simplex().migrad() else: m.migrad() @@ -104,13 +115,21 @@ def fit_binned(func: Callable, hist: np.ndarray, bins: np.ndarray, var: np.ndarr return m.values, m.errors, m.covariance -def goodness_of_fit(hist: np.ndarray, bins: np.ndarray, var: np.ndarray, func: Callable, pars: np.ndarray, method: str='var', scale_bins: bool = False) -> tuple[float, int]: +def goodness_of_fit( + hist: np.ndarray, + bins: np.ndarray, + var: np.ndarray, + func: Callable, + pars: np.ndarray, + method: str = "var", + scale_bins: bool = False, +) -> tuple[float, int]: """ Compute chisq and dof of fit Parameters ---------- - hist, bins, var + hist, bins, var histogram data. var can be None if hist is integer counts func the function that was fit to the hist @@ -130,62 +149,73 @@ def goodness_of_fit(hist: np.ndarray, bins: np.ndarray, var: np.ndarray, func: C the number of degrees of freedom """ # arg checks - if method == 'var': + if method == "var": if var is None: raise RuntimeError("var must be non-None to use method 'var'") - if np.any(var==0): + if np.any(var == 0): raise ValueError("var cannot contain zeros") - if method == 'Neyman' and np.any(hist==0): + if method == "Neyman" and np.any(hist == 0): raise ValueError("hist cannot contain zeros for Neyman method") - # compute expected values yy = func(pgh.get_bin_centers(bins), *pars) - if scale_bins == True: + if scale_bins is True: yy *= pgh.get_bin_widths(bins) - - if method == 'LR': - log_lr = 2*np.sum(np.where(hist>0 , yy-hist + hist*np.log((hist+1.e-99) / (yy+1.e-99)), yy-hist)) + if method == "LR": + log_lr = 2 * np.sum( + np.where( + hist > 0, + yy - hist + hist * np.log((hist + 1.0e-99) / (yy + 1.0e-99)), + yy - hist, + ) + ) dof = len(hist) - len(pars) return log_lr, dof else: # compute chi2 numerator and denominator - numerator = (hist - yy)**2 - if method == 'var': + numerator = (hist - yy) ** 2 + if method == "var": denominator = var - elif method == 'Pearson': + elif method == "Pearson": denominator = yy - elif method == 'Neyman': + elif method == "Neyman": denominator = hist else: raise NameError(f"goodness_of_fit: unknown method {method}") # compute chi2 and dof - chisq = np.sum(numerator/denominator) + chisq = np.sum(numerator / denominator) dof = len(hist) - len(pars) return chisq, dof -def poisson_gof(pars: np.ndarray, func: Callable, hist: np.ndarray, bins: np.ndarray, is_integral: bool=False, **kwargs) -> float: +def poisson_gof( + pars: np.ndarray, + func: Callable, + hist: np.ndarray, + bins: np.ndarray, + is_integral: bool = False, + **kwargs, +) -> float: """ - Calculate the goodness of fit for the Poisson likelihood. + Calculate the goodness of fit for the Poisson likelihood. - Parameters + Parameters ---------- - pars + pars The parameters of the function, func - func - The function that was fit - hist - The data that were fit - bins - The bins of the histogram that was fit to - is_integral - Tells get_bin_estimates if the function is an integral function - - Returns + func + The function that was fit + hist + The data that were fit + bins + The bins of the histogram that was fit to + is_integral + Tells get_bin_estimates if the function is an integral function + + Returns ------- Poisson G.O.F. @@ -197,11 +227,19 @@ def poisson_gof(pars: np.ndarray, func: Callable, hist: np.ndarray, bins: np.nda it here. """ mu = pgh.get_bin_estimates(pars, func, bins, is_integral, **kwargs) - return 2.*np.sum(mu + hist*(np.log( (hist+1.e-99) / (mu+1.e-99) ) + 1)) - - -def gauss_mode_width_max(hist: np.ndarray, bins: np.ndarray, var: Optional[np.ndarray]=None, mode_guess: Optional[float]=None, n_bins: Optional[int]=5, - cost_func: str='Least Squares', inflate_errors: Optional[bool]=False, gof_method: Optional[str]='var') -> tuple[np.ndarray, ...]: + return 2.0 * np.sum(mu + hist * (np.log((hist + 1.0e-99) / (mu + 1.0e-99)) + 1)) + + +def gauss_mode_width_max( + hist: np.ndarray, + bins: np.ndarray, + var: Optional[np.ndarray] = None, + mode_guess: Optional[float] = None, + n_bins: Optional[int] = 5, + cost_func: str = "Least Squares", + inflate_errors: Optional[bool] = False, + gof_method: Optional[str] = "var", +) -> tuple[np.ndarray, ...]: r""" Get the max, mode, and width of a peak based on gauss fit near the max Returns the parameters of a gaussian fit over n_bins in the vicinity of the @@ -260,25 +298,37 @@ def gauss_mode_width_max(hist: np.ndarray, bins: np.ndarray, var: Optional[np.nd The covariance matrix for the 3 parameters in pars """ bin_centers = pgh.get_bin_centers(bins) - if mode_guess is not None: i_0 = pgh.find_bin(mode_guess, bins) + if mode_guess is not None: + i_0 = pgh.find_bin(mode_guess, bins) else: i_0 = np.argmax(hist) mode_guess = bin_centers[i_0] amp_guess = hist[i_0] - i_0 -= int(np.floor(n_bins/2)) + i_0 -= int(np.floor(n_bins / 2)) i_n = i_0 + n_bins - width_guess = (bin_centers[i_n] - bin_centers[i_0]) + width_guess = bin_centers[i_n] - bin_centers[i_0] vv = None if var is None else var[i_0:i_n] guess = (mode_guess, width_guess, amp_guess) - pars, errors, cov = fit_binned(nb_gauss_amp, hist[i_0:i_n], bins[i_0:i_n+1], vv, guess=guess, cost_func=cost_func) - if pars[1] < 0: pars[1] = -pars[1] + pars, errors, cov = fit_binned( + nb_gauss_amp, + hist[i_0:i_n], + bins[i_0 : i_n + 1], + vv, + guess=guess, + cost_func=cost_func, + ) + if pars[1] < 0: + pars[1] = -pars[1] if inflate_errors: chi2, dof = goodness_of_fit(hist, bins, var, nb_gauss_amp, pars) - if chi2 > dof: cov *= chi2/dof - return np.asarray([pars['mu'], pars['sigma'], pars['a']]), np.asarray(cov) + if chi2 > dof: + cov *= chi2 / dof + return np.asarray([pars["mu"], pars["sigma"], pars["a"]]), np.asarray(cov) -def gauss_mode_max(hist: np.ndarray, bins: np.ndarray, **kwargs) -> tuple[np.ndarray, ...]: +def gauss_mode_max( + hist: np.ndarray, bins: np.ndarray, **kwargs +) -> tuple[np.ndarray, ...]: """Alias for gauss_mode_width_max that just returns the max and mode See Also @@ -304,7 +354,7 @@ def gauss_mode_max(hist: np.ndarray, bins: np.ndarray, **kwargs) -> tuple[np.nda pars, cov = gauss_mode_width_max(hist, bins, **kwargs) if pars is None or cov is None: raise RuntimeError("fit binned failed to work") - return pars[::2], cov[::2, ::2] # skips "sigma" rows and columns + return pars[::2], cov[::2, ::2] # skips "sigma" rows and columns def gauss_mode(hist: np.ndarray, bins: np.ndarray, **kwargs) -> tuple[float, float]: @@ -320,11 +370,18 @@ def gauss_mode(hist: np.ndarray, bins: np.ndarray, **kwargs) -> tuple[float, flo dmode : the uncertainty in the mode """ pars, cov = gauss_mode_width_max(hist, bins, **kwargs) - if pars is None or cov is None: return None, None + if pars is None or cov is None: + return None, None return pars[0], np.sqrt(cov[0, 0]) -def taylor_mode_max(hist: np.ndarray, bins: np.ndarray, var: Optional[np.ndarray]=None, mode_guess: Optional[float]=None, n_bins: int = 5) -> tuple[np.ndarray, ...]: +def taylor_mode_max( + hist: np.ndarray, + bins: np.ndarray, + var: Optional[np.ndarray] = None, + mode_guess: Optional[float] = None, + n_bins: int = 5, +) -> tuple[np.ndarray, ...]: """Get the max and mode of a peak based on Taylor exp near the max Returns the amplitude and position of a peak based on a poly fit over n_bins in the vicinity of the maximum of the hist (or the max near mode_guess, if provided) @@ -362,18 +419,26 @@ def taylor_mode_max(hist: np.ndarray, bins: np.ndarray, var: Optional[np.ndarray >>> hist, bins, var = pgh.get_hist(normal(size=10000), bins=100, range=(-5,5)) >>> pgf.taylor_mode_max(hist, bins, var, n_bins=5) """ - if mode_guess is not None: i_0 = pgh.find_bin(mode_guess, bins) - else: i_0 = np.argmax(hist) - i_0 -= int(np.floor(n_bins/2)) + if mode_guess is not None: + i_0 = pgh.find_bin(mode_guess, bins) + else: + i_0 = np.argmax(hist) + i_0 -= int(np.floor(n_bins / 2)) i_n = i_0 + n_bins - wts = None if var is None else 1/np.sqrt(var[i_0:i_n]) + wts = None if var is None else 1 / np.sqrt(var[i_0:i_n]) - pars, cov = np.polyfit(pgh.get_bin_centers(bins)[i_0:i_n], hist[i_0:i_n], 2, w=wts, cov='unscaled') + pars, cov = np.polyfit( + pgh.get_bin_centers(bins)[i_0:i_n], hist[i_0:i_n], 2, w=wts, cov="unscaled" + ) mode = -pars[1] / 2 / pars[0] maximum = pars[2] - pars[0] * mode**2 # build the jacobian to compute the output covariance matrix - jac = np.array( [ [pars[1]/2/pars[0]**2, -1/2/pars[0], 0], - [pars[1]**2/4/pars[0]**2, -pars[1]/2/pars[0], 1] ] ) + jac = np.array( + [ + [pars[1] / 2 / pars[0] ** 2, -1 / 2 / pars[0], 0], + [pars[1] ** 2 / 4 / pars[0] ** 2, -pars[1] / 2 / pars[0], 1], + ] + ) cov_jact = np.matmul(cov, jac.transpose()) cov = np.matmul(jac, cov_jact) return (mode, maximum), cov diff --git a/src/pygama/math/distributions.py b/src/pygama/math/distributions.py index aee8ef836..4cff70881 100644 --- a/src/pygama/math/distributions.py +++ b/src/pygama/math/distributions.py @@ -1,32 +1,45 @@ r""" Contains a list of distribution functions, all implemented using Numba's -:func:`numba.jit` to take advantage of the just-in-time speed boost. +:func:`numba.jit` to take advantage of the just-in-time speed boost. """ -from pygama.math.functions.crystal_ball import crystal_ball, nb_crystal_ball_cdf, nb_crystal_ball_pdf -from pygama.math.functions.gauss import ( +# nopycln: file + +from pygama.math.functions.crystal_ball import ( # noqa: F401 + crystal_ball, + nb_crystal_ball_cdf, + nb_crystal_ball_pdf, +) +from pygama.math.functions.error_function import nb_erf, nb_erfc # noqa: F401 +from pygama.math.functions.exgauss import ( # noqa: F401 + exgauss, + nb_exgauss_cdf, + nb_exgauss_pdf, + nb_gauss_tail_approx, +) +from pygama.math.functions.exponential import exponential # noqa: F401 +from pygama.math.functions.gauss import ( # noqa: F401 gaussian, nb_gauss, nb_gauss_cdf, nb_gauss_pdf, - nb_gauss_scaled_pdf + nb_gauss_scaled_pdf, ) -from pygama.math.functions.gauss_on_linear import gauss_on_linear -from pygama.math.functions.gauss_on_uniform import gauss_on_uniform -from pygama.math.functions.gauss_on_step import gauss_on_step -from pygama.math.functions.exgauss import ( - exgauss, - nb_exgauss_pdf, - nb_gauss_tail_approx, - nb_exgauss_cdf +from pygama.math.functions.gauss_on_exgauss import gauss_on_exgauss # noqa: F401 +from pygama.math.functions.gauss_on_linear import gauss_on_linear # noqa: F401 +from pygama.math.functions.gauss_on_step import gauss_on_step # noqa: F401 +from pygama.math.functions.gauss_on_uniform import gauss_on_uniform # noqa: F401 +from pygama.math.functions.hpge_peak import hpge_peak # noqa: F401 +from pygama.math.functions.linear import linear # noqa: F401 +from pygama.math.functions.moyal import moyal # noqa: F401 +from pygama.math.functions.polynomial import nb_poly # noqa: F401 +from pygama.math.functions.step import ( # noqa: F401 + nb_step_cdf, + nb_step_pdf, + nb_unnorm_step_pdf, + step, +) +from pygama.math.functions.triple_gauss_on_double_step import ( # noqa: F401 + triple_gauss_on_double_step, ) -from pygama.math.functions.gauss_on_exgauss import gauss_on_exgauss -from pygama.math.functions.hpge_peak import hpge_peak -from pygama.math.functions.polynomial import nb_poly -from pygama.math.functions.step import step, nb_step_cdf, nb_step_pdf, nb_unnorm_step_pdf -from pygama.math.functions.triple_gauss_on_double_step import triple_gauss_on_double_step -from pygama.math.functions.moyal import moyal -from pygama.math.functions.uniform import uniform -from pygama.math.functions.linear import linear -from pygama.math.functions.exponential import exponential -from pygama.math.functions.error_function import nb_erf, nb_erfc +from pygama.math.functions.uniform import uniform # noqa: F401 diff --git a/src/pygama/math/functions/__init__.py b/src/pygama/math/functions/__init__.py index acb8e89f8..0287f55b7 100644 --- a/src/pygama/math/functions/__init__.py +++ b/src/pygama/math/functions/__init__.py @@ -1,5 +1,5 @@ r""" -Statistical distributions for the Pygama pacakge. +Statistical distributions for the Pygama package. Each distribution requires the definition of four vectorized numbafied functions that take an array as an input: @@ -13,11 +13,11 @@ Returns area*nb_dist_prd 4. :func:`nb_dist_scaled_cdf(x, area, mu, sigma, shape)` -Returns area*nb_dist_cdf +Returns area*nb_dist_cdf NOTE: The order of the arguments of these functions follows the ordering convention from left to right (for whichever are present): x, x_lo, x_hi, area, mu, sigma, shapes -Then these four functions are pacakged into a class that subclasses our own pygama_continuous. This distribution class then has 9 required methods. +Then these four functions are packaged into a class that subclasses our own PygamaContinuous. This distribution class then has 9 required methods. 1. :func:`_pdf(x, mu, sigma, shape)` An overloading of the scipy rv_continuous base class' method, this is slow, but it enables access to other scipy rv_continuous methods like _rvs @@ -48,12 +48,12 @@ NOTE: the order of the arguments to these functions must follow the convention for whichever are present: x, x_lo, x_hi,, area, mu, sigma, shape -pygama_continuous subclasses scipy's rv_continuous and overloads the class instantiation so that is calls numba_frozen. numba_frozen is +PygamaContinuous subclasses scipy's rv_continuous and overloads the class instantiation so that is calls numba_frozen. numba_frozen is a subclass of rv_continuous, and acts exactly like scipy's rv_continuous_frozen, except that this class also has methods for pygama specific get_pdf, get_cdf, pdf_ext, and cdf_ext. This subclassing is necessary so that we can instantiate frozen distributions that -have access to our required pygama class methods. +have access to our required pygama class methods. -In addition, the class sum_dists subclasses rv_continuous. This is so that distributions created out of the sum of distributions also have -access to scipy rv_continuous methods, such as random sampling. sum_dists also has convenience functions such as draw_pdf, which plots the pdf, +In addition, the class SumDists subclasses rv_continuous. This is so that distributions created out of the sum of distributions also have +access to scipy rv_continuous methods, such as random sampling. SumDists also has convenience functions such as draw_pdf, which plots the pdf, and required_args which returns a list of the required arguments for the sum of the distribution. -""" \ No newline at end of file +""" diff --git a/src/pygama/math/functions/crystal_ball.py b/src/pygama/math/functions/crystal_ball.py index 2777fa3bb..c575ec8eb 100644 --- a/src/pygama/math/functions/crystal_ball.py +++ b/src/pygama/math/functions/crystal_ball.py @@ -2,20 +2,20 @@ Crystal ball distributions for Pygama """ -import math +from math import erf import numba as nb import numpy as np -from math import erf -from pygama.math.functions.pygama_continuous import pygama_continuous -from pygama.utils import numba_math_defaults_kwargs as nb_kwargs +from pygama.math.functions.pygama_continuous import PygamaContinuous from pygama.utils import numba_math_defaults as nb_defaults - +from pygama.utils import numba_math_defaults_kwargs as nb_kwargs @nb.njit(**nb_kwargs) -def nb_crystal_ball_pdf(x: np.ndarray, mu: float, sigma: float, beta: float, m: float) -> np.ndarray: +def nb_crystal_ball_pdf( + x: np.ndarray, mu: float, sigma: float, beta: float, m: float +) -> np.ndarray: r""" PDF of a power-law tail plus gaussian. Its range of support is :math:`x\in\mathbb{R}, \beta>0, m>1`. It computes: @@ -24,13 +24,13 @@ def nb_crystal_ball_pdf(x: np.ndarray, mu: float, sigma: float, beta: float, m: pdf(x, \beta, m, \mu, \sigma) = \begin{cases}NA(B-\frac{x-\mu}{\sigma})^{-m} \quad \frac{x-\mu}{\sigma}\leq -\beta \\ Ne^{-(\frac{x-\mu}{\sigma})^2/2} \quad \frac{x-\mu}{\sigma}>-\beta\end{cases} - Where + Where .. math:: A = \frac{m^m}{\beta^m}e^{-\beta^2/2} \\ B = \frac{m}{\beta}-\beta \\ - N = \frac{1}{\sigma \frac{m e^{-\beta^2/2}}{\beta(m-1)} + \sigma \sqrt{\frac{\pi}{2}}\left(1+\text{erf}\left(\frac{\beta}{\sqrt{2}}\right)\right)} + n = \frac{1}{\sigma \frac{m e^{-\beta^2/2}}{\beta(m-1)} + \sigma \sqrt{\frac{\pi}{2}}\left(1+\text{erf}\left(\frac{\beta}{\sqrt{2}}\right)\right)} As a Numba vectorized function, it runs slightly faster than @@ -54,42 +54,47 @@ def nb_crystal_ball_pdf(x: np.ndarray, mu: float, sigma: float, beta: float, m: raise ValueError("beta must be greater than 0, and m must be greater than 1") # Define some constants to calculate the function - const_A =(m/np.abs(beta))**m * np.exp(-1*beta**2/2.0) - const_B = m/np.abs(beta) - np.abs(beta) + const_a = (m / np.abs(beta)) ** m * np.exp(-1 * beta**2 / 2.0) + const_b = m / np.abs(beta) - np.abs(beta) - N = 1.0/(m/np.abs(beta)/(m-1)*np.exp(-beta**2/2.0) + np.sqrt(np.pi/2)*(1+erf(np.abs(beta)/np.sqrt(2.0)))) + n = 1.0 / ( + m / np.abs(beta) / (m - 1) * np.exp(-(beta**2) / 2.0) + + np.sqrt(np.pi / 2) * (1 + erf(np.abs(beta) / np.sqrt(2.0))) + ) y = np.empty_like(x, dtype=np.float64) for i in nb.prange(x.shape[0]): # Shift the distribution - y[i] = (x[i]-mu)/sigma + y[i] = (x[i] - mu) / sigma # Check if it is powerlaw - if y[i] <= -1*beta: - y[i] = N*const_A*(const_B-y[i])**(-1*m)/sigma + if y[i] <= -1 * beta: + y[i] = n * const_a * (const_b - y[i]) ** (-1 * m) / sigma # If it isn't power law, then it Gaussian else: - y[i] = N*np.exp(-1*y[i]**2/2)/sigma + y[i] = n * np.exp(-1 * y[i] ** 2 / 2) / sigma return y @nb.njit(**nb_kwargs) -def nb_crystal_ball_cdf(x: np.ndarray, mu: float, sigma: float, beta: float, m: float) -> np.ndarray: +def nb_crystal_ball_cdf( + x: np.ndarray, mu: float, sigma: float, beta: float, m: float +) -> np.ndarray: r""" - CDF for power-law tail plus gaussian. Its range of support is :math:`x\in\mathbb{R}, \beta>0, m>1`. It computes: + CDF for power-law tail plus gaussian. Its range of support is :math:`x\in\mathbb{R}, \beta>0, m>1`. It computes: - .. math:: - cdf(x, \beta, m, \mu, \sigma)= \begin{cases} NA\sigma\frac{(B-\frac{x-\mu}{\sigma})^{1-m}}{m-1} \quad , \frac{x-\mu}{\sigma} \leq -\beta \\ NA\sigma\frac{(B+\beta)^{1-m}}{m-1} + N\sigma \sqrt{\frac{\pi}{2}}\left(\text{erf}\left(\frac{x-\mu}{\sigma \sqrt{2}}\right)+\text{erf}\left(\frac{\beta}{\sqrt{2}}\right)\right) \quad , \frac{x-\mu}{\sigma} > -\beta \end{cases} + .. math:: + cdf(x, \beta, m, \mu, \sigma)= \begin{cases} NA\sigma\frac{(B-\frac{x-\mu}{\sigma})^{1-m}}{m-1} \quad , \frac{x-\mu}{\sigma} \leq -\beta \\ NA\sigma\frac{(B+\beta)^{1-m}}{m-1} + n\sigma \sqrt{\frac{\pi}{2}}\left(\text{erf}\left(\frac{x-\mu}{\sigma \sqrt{2}}\right)+\text{erf}\left(\frac{\beta}{\sqrt{2}}\right)\right) \quad , \frac{x-\mu}{\sigma} > -\beta \end{cases} - Where + Where .. math:: A = \frac{m^m}{\beta^m}e^{-\beta^2/2} \\ B = \frac{m}{\beta}-\beta \\ - N = \frac{1}{\sigma \frac{m e^{-\beta^2/2}}{\beta(m-1)} + \sigma \sqrt{\frac{\pi}{2}}\left(1+\text{erf}\left(\frac{\beta}{\sqrt{2}}\right)\right)} + n = \frac{1}{\sigma \frac{m e^{-\beta^2/2}}{\beta(m-1)} + \sigma \sqrt{\frac{\pi}{2}}\left(1+\text{erf}\left(\frac{\beta}{\sqrt{2}}\right)\right)} As a Numba vectorized function, it runs slightly faster than @@ -112,33 +117,38 @@ def nb_crystal_ball_cdf(x: np.ndarray, mu: float, sigma: float, beta: float, m: if (beta <= 0) or (m <= 1): raise ValueError("beta must be greater than 0, and m must be greater than 1") # Define some constants to calculate the function - const_A =(m/np.abs(beta))**m * np.exp(-1*beta**2/2.0) - const_B = m/np.abs(beta) - np.abs(beta) + const_a = (m / np.abs(beta)) ** m * np.exp(-1 * beta**2 / 2.0) + const_b = m / np.abs(beta) - np.abs(beta) # Calculate the normalization constant - N = 1.0/((np.sqrt(np.pi/2)*(erf(beta/np.sqrt(2)) + 1))\ - + ((const_A*(const_B+beta)**(1-m))/(m-1))) + n = 1.0 / ( + (np.sqrt(np.pi / 2) * (erf(beta / np.sqrt(2)) + 1)) + + ((const_a * (const_b + beta) ** (1 - m)) / (m - 1)) + ) - y = np.empty_like(x, dtype = np.float64) + y = np.empty_like(x, dtype=np.float64) # Check if it is in the power law part for i in nb.prange(x.shape[0]): # Shift the distribution - y[i] = (x[i]-mu)/sigma - if y[i] <= -1*beta: - y[i] = N*const_A*((const_B-y[i])**(1-m))/(m-1) + y[i] = (x[i] - mu) / sigma + if y[i] <= -1 * beta: + y[i] = n * const_a * ((const_b - y[i]) ** (1 - m)) / (m - 1) # If it isn't in the power law, then it is Gaussian else: - y[i] = const_A*N*((const_B+beta)**(1-m))/(m-1)\ - + N*np.sqrt(np.pi/2)*(erf(beta/np.sqrt(2))+erf(y[i]/np.sqrt(2))) + y[i] = const_a * n * ((const_b + beta) ** (1 - m)) / (m - 1) + n * np.sqrt( + np.pi / 2 + ) * (erf(beta / np.sqrt(2)) + erf(y[i] / np.sqrt(2))) return y @nb.njit(**nb_defaults(parallel=False)) -def nb_crystal_ball_scaled_pdf(x: np.ndarray, area: float, mu: float, sigma: float, beta: float, m: float) -> np.ndarray: +def nb_crystal_ball_scaled_pdf( + x: np.ndarray, area: float, mu: float, sigma: float, beta: float, m: float +) -> np.ndarray: r""" - Scaled PDF of a power-law tail plus gaussian. + Scaled PDF of a power-law tail plus gaussian. As a Numba vectorized function, it runs slightly faster than 'out of the box' functions. @@ -162,9 +172,11 @@ def nb_crystal_ball_scaled_pdf(x: np.ndarray, area: float, mu: float, sigma: flo @nb.njit(**nb_defaults(parallel=False)) -def nb_crystal_ball_scaled_cdf(x: np.ndarray, area: float, mu: float, sigma: float, beta: float, m: float) -> np.ndarray: +def nb_crystal_ball_scaled_cdf( + x: np.ndarray, area: float, mu: float, sigma: float, beta: float, m: float +) -> np.ndarray: r""" - Scaled CDF for power-law tail plus gaussian. Used for extended binned fits. + Scaled CDF for power-law tail plus gaussian. Used for extended binned fits. As a Numba vectorized function, it runs slightly faster than 'out of the box' functions. @@ -187,37 +199,81 @@ def nb_crystal_ball_scaled_cdf(x: np.ndarray, area: float, mu: float, sigma: flo return area * nb_crystal_ball_cdf(x, mu, sigma, beta, m) -class crystal_ball_gen(pygama_continuous): +class CrystalBallGen(PygamaContinuous): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) - self.x_lo = -1*np.inf + self.x_lo = -1 * np.inf self.x_hi = np.inf - def _pdf(self, x: np.ndarray, mu: float, sigma: float, beta: float, m: float) -> np.ndarray: + def _pdf( + self, x: np.ndarray, mu: float, sigma: float, beta: float, m: float + ) -> np.ndarray: x.flags.writeable = True return nb_crystal_ball_pdf(x, mu[0], sigma[0], beta[0], m[0]) - def _cdf(self, x: np.ndarray, mu: float, sigma: float, beta: float, m: float) -> np.ndarray: + + def _cdf( + self, x: np.ndarray, mu: float, sigma: float, beta: float, m: float + ) -> np.ndarray: x.flags.writeable = True return nb_crystal_ball_cdf(x, mu[0], sigma[0], beta[0], m[0]) - def get_pdf(self, x: np.ndarray, mu: float, sigma: float, beta: float, m: float) -> np.ndarray: + def get_pdf( + self, x: np.ndarray, mu: float, sigma: float, beta: float, m: float + ) -> np.ndarray: return nb_crystal_ball_pdf(x, mu, sigma, beta, m) - def get_cdf(self, x: np.ndarray, mu: float, sigma: float, beta: float, m: float) -> np.ndarray: + + def get_cdf( + self, x: np.ndarray, mu: float, sigma: float, beta: float, m: float + ) -> np.ndarray: return nb_crystal_ball_cdf(x, mu, sigma, beta, m) - def pdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, beta: float, m: float) -> np.ndarray: + def pdf_norm( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + mu: float, + sigma: float, + beta: float, + m: float, + ) -> np.ndarray: return self._pdf_norm(x, x_lo, x_hi, mu, sigma, beta, m) - def cdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, beta: float, m: float) -> np.ndarray: - return self._cdf_norm(x, x_lo, x_hi, mu, sigma, beta, m) + def cdf_norm( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + mu: float, + sigma: float, + beta: float, + m: float, + ) -> np.ndarray: + return self._cdf_norm(x, x_lo, x_hi, mu, sigma, beta, m) - def pdf_ext(self, x: np.ndarray, x_lo: float, x_hi: float, area: float, mu: float, sigma: float, beta: float, m: float) -> np.ndarray: - return np.diff(nb_crystal_ball_scaled_cdf(np.array([x_lo, x_hi]), area, mu, sigma, beta, m))[0], nb_crystal_ball_scaled_pdf(x, area, mu, sigma, beta, m) - def cdf_ext(self, x: np.ndarray, area: float, mu: float, sigma: float, beta: float, m: float) -> np.ndarray: + def pdf_ext( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + area: float, + mu: float, + sigma: float, + beta: float, + m: float, + ) -> np.ndarray: + return np.diff( + nb_crystal_ball_scaled_cdf(np.array([x_lo, x_hi]), area, mu, sigma, beta, m) + )[0], nb_crystal_ball_scaled_pdf(x, area, mu, sigma, beta, m) + + def cdf_ext( + self, x: np.ndarray, area: float, mu: float, sigma: float, beta: float, m: float + ) -> np.ndarray: return nb_crystal_ball_scaled_cdf(x, area, mu, sigma, beta, m) def required_args(self) -> tuple[str, str, str, str]: return "mu", "sigma", "beta", "m" -crystal_ball = crystal_ball_gen(name = "crystal_ball") \ No newline at end of file + +crystal_ball = CrystalBallGen(name="crystal_ball") diff --git a/src/pygama/math/functions/error_function.py b/src/pygama/math/functions/error_function.py index bdff7647a..e9e2f8abd 100644 --- a/src/pygama/math/functions/error_function.py +++ b/src/pygama/math/functions/error_function.py @@ -1,13 +1,11 @@ -from typing import Union -import math from math import erf, erfc +from typing import Union import numba as nb import numpy as np -@nb.vectorize([nb.float32(nb.float32), -nb.float64(nb.float64)]) +@nb.vectorize([nb.float32(nb.float32), nb.float64(nb.float64)]) def nb_erf(x: Union[float, np.ndarray]) -> Union[float, np.ndarray]: r""" Numba version of error function. @@ -29,9 +27,7 @@ def nb_erf(x: Union[float, np.ndarray]) -> Union[float, np.ndarray]: return erf(x) - -@nb.vectorize([nb.float32(nb.float32), -nb.float64(nb.float64)]) +@nb.vectorize([nb.float32(nb.float32), nb.float64(nb.float64)]) def nb_erfc(x: Union[float, np.ndarray]) -> Union[float, np.ndarray]: r""" Numba version of complementary error function @@ -49,5 +45,5 @@ def nb_erfc(x: Union[float, np.ndarray]) -> Union[float, np.ndarray]: math.erfc(x) Complementary error function acting on the input """ - + return erfc(x) diff --git a/src/pygama/math/functions/exgauss.py b/src/pygama/math/functions/exgauss.py index 6af84269f..cbdbff948 100644 --- a/src/pygama/math/functions/exgauss.py +++ b/src/pygama/math/functions/exgauss.py @@ -3,27 +3,25 @@ """ import sys -from typing import Union +from math import erf, erfc import numba as nb import numpy as np -from math import erfc, erf -from pygama.math.functions.pygama_continuous import pygama_continuous -from pygama.math.functions.error_function import nb_erf, nb_erfc -from pygama.math.functions.gauss import nb_gauss_cdf, nb_gauss_pdf - -from pygama.utils import numba_math_defaults_kwargs as nb_kwargs +from pygama.math.functions.gauss import nb_gauss_pdf +from pygama.math.functions.pygama_continuous import PygamaContinuous from pygama.utils import numba_math_defaults as nb_defaults +from pygama.utils import numba_math_defaults_kwargs as nb_kwargs -limit = np.log(sys.float_info.max)/10 - +limit = np.log(sys.float_info.max) / 10 @nb.njit(**nb_defaults(parallel=False)) -def nb_gauss_tail_exact(x: float, mu: float, sigma: float, tau: float, tmp: float) -> float: +def nb_gauss_tail_exact( + x: float, mu: float, sigma: float, tau: float, tmp: float +) -> float: r""" - Exact form of a normalized exponentially modified Gaussian PDF. + Exact form of a normalized exponentially modified Gaussian PDF. It computes the following PDF: @@ -52,23 +50,31 @@ def nb_gauss_tail_exact(x: float, mu: float, sigma: float, tau: float, tmp: floa The scaled version of the exponential argument - See Also + See Also -------- :func:`nb_exgauss_pdf` """ abstau = np.absolute(tau) - if tmp < limit: tmp = tmp - else: tmp = limit - if sigma ==0 or abstau ==0: - return x*0 - z = (x-mu)/sigma - tail_f = (1/(2*abstau)) * np.exp(tmp) * erfc( (tau*z + sigma)/(np.sqrt(2)*abstau)) + if tmp < limit: + tmp = tmp + else: + tmp = limit + if sigma == 0 or abstau == 0: + return x * 0 + z = (x - mu) / sigma + tail_f = ( + (1 / (2 * abstau)) + * np.exp(tmp) + * erfc((tau * z + sigma) / (np.sqrt(2) * abstau)) + ) return tail_f @nb.njit(**nb_defaults(parallel=False)) -def nb_gauss_tail_approx(x: np.ndarray, mu: float, sigma: float, tau: float) -> np.ndarray: +def nb_gauss_tail_approx( + x: np.ndarray, mu: float, sigma: float, tau: float +) -> np.ndarray: r""" Approximate form of a normalized exponentially modified Gaussian PDF As a Numba JIT function, it runs slightly faster than @@ -86,16 +92,16 @@ def nb_gauss_tail_approx(x: np.ndarray, mu: float, sigma: float, tau: float) -> The characteristic scale of the Gaussian tail - See Also + See Also -------- :func:`nb_exgauss_pdf` """ - if sigma ==0: - return x*0 - elif (sigma + tau*(x-mu)/sigma)==0: - return x*0 - den = 1/(sigma + tau*(x-mu)/sigma) - tail_f = sigma * nb_gauss_pdf(x, mu, sigma) * den * (1.-tau*tau*den*den) + if sigma == 0: + return x * 0 + elif (sigma + tau * (x - mu) / sigma) == 0: + return x * 0 + den = 1 / (sigma + tau * (x - mu) / sigma) + tail_f = sigma * nb_gauss_pdf(x, mu, sigma) * den * (1.0 - tau * tau * den * den) return tail_f @@ -122,7 +128,7 @@ def nb_exgauss_pdf(x: np.ndarray, mu: float, sigma: float, tau: float) -> np.nda The characteristic scale of the Gaussian tail - See Also + See Also -------- :func:`nb_gauss_tail_exact`, :func:`nb_gauss_tail_approx` """ @@ -133,7 +139,7 @@ def nb_exgauss_pdf(x: np.ndarray, mu: float, sigma: float, tau: float) -> np.nda if tau == 0: tail_f[i] = np.nan else: - tmp = ((x[i]-mu)/tau) + ((sigma**2)/(2*tau**2)) + tmp = ((x[i] - mu) / tau) + ((sigma**2) / (2 * tau**2)) if tmp < limit: tail_f[i] = nb_gauss_tail_exact(x[i], mu, sigma, tau, tmp) else: @@ -145,7 +151,7 @@ def nb_exgauss_pdf(x: np.ndarray, mu: float, sigma: float, tau: float) -> np.nda def nb_exgauss_cdf(x: np.ndarray, mu: float, sigma: float, tau: float) -> np.ndarray: r""" The CDF for a normalized exponentially modified Gaussian. Its range of support is :math:`x\in(-\infty,\infty)`, :math:`\tau\in(-\infty,\infty)` - It computes the following CDF: + It computes the following CDF: .. math:: @@ -174,17 +180,23 @@ def nb_exgauss_cdf(x: np.ndarray, mu: float, sigma: float, tau: float) -> np.nda if tau == 0: cdf[i] = np.nan else: - cdf[i] = (tau/(2*abstau)) * erf((tau*(x[i]-mu) )/(np.sqrt(2)*sigma*abstau)) - tmp = ((x[i]-mu)/tau) + ((sigma**2)/(2*tau**2)) + cdf[i] = (tau / (2 * abstau)) * erf( + (tau * (x[i] - mu)) / (np.sqrt(2) * sigma * abstau) + ) + tmp = ((x[i] - mu) / tau) + ((sigma**2) / (2 * tau**2)) if tmp < limit: - cdf[i] += tau*nb_gauss_tail_exact(x[i], mu, sigma, tau, tmp) + 0.5 # This is duplicated code from the pdf, but putting it in parallel makes it run faster! + cdf[i] += ( + tau * nb_gauss_tail_exact(x[i], mu, sigma, tau, tmp) + 0.5 + ) # This is duplicated code from the pdf, but putting it in parallel makes it run faster! else: - cdf[i] += tau*nb_gauss_tail_approx(x[i], mu, sigma, tau) + 0.5 + cdf[i] += tau * nb_gauss_tail_approx(x[i], mu, sigma, tau) + 0.5 return cdf @nb.njit(**nb_defaults(parallel=False)) -def nb_exgauss_scaled_pdf(x: np.ndarray, area: float, mu: float, sigma: float, tau: float) -> np.ndarray: +def nb_exgauss_scaled_pdf( + x: np.ndarray, area: float, mu: float, sigma: float, tau: float +) -> np.ndarray: r""" Scaled PDF of an exponentially modified Gaussian distribution Can be used as a component of other fit functions w/args mu,sigma,tau @@ -209,7 +221,9 @@ def nb_exgauss_scaled_pdf(x: np.ndarray, area: float, mu: float, sigma: float, t @nb.njit(**nb_defaults(parallel=False)) -def nb_exgauss_scaled_cdf(x: np.ndarray, area: float, mu: float, sigma: float, tau: float) -> np.ndarray: +def nb_exgauss_scaled_cdf( + x: np.ndarray, area: float, mu: float, sigma: float, tau: float +) -> np.ndarray: r""" Scaled CDF of an exponentially modified Gaussian distribution As a Numba JIT function, it runs slightly faster than @@ -228,40 +242,76 @@ def nb_exgauss_scaled_cdf(x: np.ndarray, area: float, mu: float, sigma: float, t tau The characteristic scale of the Gaussian tail """ - + return area * nb_exgauss_cdf(x, mu, sigma, tau) -class exgauss_gen(pygama_continuous): +class ExgaussGen(PygamaContinuous): def __init__(self, *args, **kwargs): - self.x_lo = -1*np.inf + self.x_lo = -1 * np.inf self.x_hi = np.inf super().__init__(*args, **kwargs) def _pdf(self, x: np.ndarray, sigma: float, tau: float) -> np.ndarray: x.flags.writeable = True - return nb_exgauss_pdf(x, 0, 1, tau[0]/sigma[0]) # the scipy parameter k = tau/sigma + return nb_exgauss_pdf( + x, 0, 1, tau[0] / sigma[0] + ) # the scipy parameter k = tau/sigma + def _cdf(self, x: np.ndarray, sigma: float, tau: float) -> np.ndarray: x.flags.writeable = True - return nb_exgauss_cdf(x, 0, 1, tau[0]/sigma[0]) + return nb_exgauss_cdf(x, 0, 1, tau[0] / sigma[0]) def get_pdf(self, x: np.ndarray, mu: float, sigma: float, tau: float) -> np.ndarray: - return nb_exgauss_pdf(x, mu, sigma, tau) + return nb_exgauss_pdf(x, mu, sigma, tau) + def get_cdf(self, x: np.ndarray, mu: float, sigma: float, tau: float) -> np.ndarray: return nb_exgauss_cdf(x, mu, sigma, tau) - def pdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, tau: float) -> np.ndarray: + def pdf_norm( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + mu: float, + sigma: float, + tau: float, + ) -> np.ndarray: return self._pdf_norm(x, x_lo, x_hi, mu, sigma, tau) - def cdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, tau: float) -> np.ndarray: + + def cdf_norm( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + mu: float, + sigma: float, + tau: float, + ) -> np.ndarray: return self._cdf_norm(x, x_lo, x_hi, mu, sigma, tau) - def pdf_ext(self, x: np.ndarray, x_lo: float, x_hi: float, area: float, mu: float, sigma: float, tau: float) -> np.ndarray: - return np.diff(nb_exgauss_scaled_cdf(np.array([x_lo, x_hi]), area, mu, sigma, tau))[0], nb_exgauss_scaled_pdf(x, area, mu, sigma, tau) - def cdf_ext(self, x: np.ndarray, area: float, mu: float, sigma: float, tau: float) -> np.ndarray: + def pdf_ext( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + area: float, + mu: float, + sigma: float, + tau: float, + ) -> np.ndarray: + return np.diff( + nb_exgauss_scaled_cdf(np.array([x_lo, x_hi]), area, mu, sigma, tau) + )[0], nb_exgauss_scaled_pdf(x, area, mu, sigma, tau) + + def cdf_ext( + self, x: np.ndarray, area: float, mu: float, sigma: float, tau: float + ) -> np.ndarray: return nb_exgauss_scaled_cdf(x, area, mu, sigma, tau) def required_args(self) -> tuple[str, str, str]: return "mu", "sigma", "tau" -exgauss = exgauss_gen(name='exgauss') + +exgauss = ExgaussGen(name="exgauss") diff --git a/src/pygama/math/functions/exponential.py b/src/pygama/math/functions/exponential.py index 645cb0ef0..a29e3eee6 100644 --- a/src/pygama/math/functions/exponential.py +++ b/src/pygama/math/functions/exponential.py @@ -2,26 +2,25 @@ Exponential distributions for pygama """ -import sys - import numba as nb import numpy as np -from pygama.math.functions.pygama_continuous import pygama_continuous -from pygama.utils import numba_math_defaults_kwargs as nb_kwargs +from pygama.math.functions.pygama_continuous import PygamaContinuous from pygama.utils import numba_math_defaults as nb_defaults - +from pygama.utils import numba_math_defaults_kwargs as nb_kwargs @nb.njit(**nb_kwargs) -def nb_exponential_pdf(x: np.ndarray, mu: float, sigma: float, lamb: float) -> np.ndarray: +def nb_exponential_pdf( + x: np.ndarray, mu: float, sigma: float, lamb: float +) -> np.ndarray: r""" - Normalised exponential probability density distribution, w/ args: mu, sigma, lamb. Its range of support is :math:`x\in[0,\infty), \lambda>0`. + Normalised exponential probability density distribution, w/ args: mu, sigma, lamb. Its range of support is :math:`x\in[0,\infty), \lambda>0`. It computes: .. math:: - pdf(x, \lambda, \mu, \sigma) = \begin{cases} \lambda e^{-\lambda\frac{x-\mu}{\sigma}} \quad , \frac{x-\mu}{\sigma}\geq 0 \\ 0 \quad , \frac{x-\mu}{\sigma}<0 \end{cases} + pdf(x, \lambda, \mu, \sigma) = \begin{cases} \lambda e^{-\lambda\frac{x-\mu}{\sigma}} \quad , \frac{x-\mu}{\sigma}\geq 0 \\ 0 \quad , \frac{x-\mu}{\sigma}<0 \end{cases} As a Numba JIT function, it runs slightly faster than @@ -40,25 +39,27 @@ def nb_exponential_pdf(x: np.ndarray, mu: float, sigma: float, lamb: float) -> n The rate """ - y = np.empty_like(x, dtype = np.float64) + y = np.empty_like(x, dtype=np.float64) for i in nb.prange(x.shape[0]): - y[i] = (x[i]-mu)/sigma - if y[i] <0 : + y[i] = (x[i] - mu) / sigma + if y[i] < 0: y[i] = 0 else: - y[i] = (lamb*np.exp(-1*lamb*y[i]))/sigma + y[i] = (lamb * np.exp(-1 * lamb * y[i])) / sigma return y @nb.njit(**nb_kwargs) -def nb_exponential_cdf(x: np.ndarray, mu: float, sigma: float, lamb: float) -> np.ndarray: +def nb_exponential_cdf( + x: np.ndarray, mu: float, sigma: float, lamb: float +) -> np.ndarray: r""" - Normalised exponential cumulative distribution, w/ args: mu, sigma, lamb. Its range of support is :math:`x\in[0,\infty), \lambda>0`. + Normalised exponential cumulative distribution, w/ args: mu, sigma, lamb. Its range of support is :math:`x\in[0,\infty), \lambda>0`. It computes: .. math:: - cdf(x, \lambda, \mu, \sigma) = \begin{cases} 1-e^{-\lambda\frac{x-\mu}{\sigma}} \quad , \frac{x-\mu}{\sigma} > 0 \\ 0 \quad , \frac{x-\mu}{\sigma}\leq 0 \end{cases} + cdf(x, \lambda, \mu, \sigma) = \begin{cases} 1-e^{-\lambda\frac{x-\mu}{\sigma}} \quad , \frac{x-\mu}{\sigma} > 0 \\ 0 \quad , \frac{x-\mu}{\sigma}\leq 0 \end{cases} As a Numba JIT function, it runs slightly faster than @@ -77,18 +78,20 @@ def nb_exponential_cdf(x: np.ndarray, mu: float, sigma: float, lamb: float) -> n The rate """ - y = np.empty_like(x, dtype = np.float64) + y = np.empty_like(x, dtype=np.float64) for i in nb.prange(x.shape[0]): - y[i] = (x[i]-mu)/sigma + y[i] = (x[i] - mu) / sigma if y[i] <= 0: - y[i] = 0 + y[i] = 0 else: - y[i] = (1-np.exp(-1*lamb*y[i])) + y[i] = 1 - np.exp(-1 * lamb * y[i]) return y @nb.njit(**nb_defaults(parallel=False)) -def nb_exponential_scaled_pdf(x: np.ndarray, area: float, mu: float, sigma: float, lamb: float) -> np.ndarray: +def nb_exponential_scaled_pdf( + x: np.ndarray, area: float, mu: float, sigma: float, lamb: float +) -> np.ndarray: r""" Scaled exponential probability distribution, w/ args: area, mu, sigma, lambd. As a Numba JIT function, it runs slightly faster than @@ -106,15 +109,17 @@ def nb_exponential_scaled_pdf(x: np.ndarray, area: float, mu: float, sigma: floa The amount to scale the distribution lamb The rate - """ + """ return area * nb_exponential_pdf(x, mu, sigma, lamb) @nb.njit(**nb_defaults(parallel=False)) -def nb_exponential_scaled_cdf(x: np.ndarray, area: float, mu: float, sigma: float, lamb: float) -> np.ndarray: +def nb_exponential_scaled_cdf( + x: np.ndarray, area: float, mu: float, sigma: float, lamb: float +) -> np.ndarray: r""" - Exponential cdf scaled by the area, used for extended binned fits + Exponential cdf scaled by the area, used for extended binned fits As a Numba JIT function, it runs slightly faster than 'out of the box' functions. @@ -131,12 +136,12 @@ def nb_exponential_scaled_cdf(x: np.ndarray, area: float, mu: float, sigma: floa The amount to scale the distribution lamb The rate - """ - + """ + return area * nb_exponential_cdf(x, mu, sigma, lamb) -class exponential_gen(pygama_continuous): +class ExponentialGen(PygamaContinuous): def __init__(self, *args, **kwargs): self.x_lo = 0 @@ -146,27 +151,65 @@ def __init__(self, *args, **kwargs): def _pdf(self, x: np.ndarray, mu: float, sigma: float, lamb: float) -> np.ndarray: x.flags.writeable = True return nb_exponential_pdf(x, mu[0], sigma[0], lamb[0]) + def _cdf(self, x: np.ndarray, mu: float, sigma: float, lamb: float) -> np.ndarray: x.flags.writeable = True return nb_exponential_cdf(x, mu[0], sigma[0], lamb[0]) - def get_pdf(self, x: np.ndarray, mu: float, sigma: float, lamb: float) -> np.ndarray: - return nb_exponential_pdf(x, mu, sigma, lamb) - def get_cdf(self, x: np.ndarray, mu: float, sigma: float, lamb: float) -> np.ndarray: + def get_pdf( + self, x: np.ndarray, mu: float, sigma: float, lamb: float + ) -> np.ndarray: + return nb_exponential_pdf(x, mu, sigma, lamb) + + def get_cdf( + self, x: np.ndarray, mu: float, sigma: float, lamb: float + ) -> np.ndarray: return nb_exponential_cdf(x, mu, sigma, lamb) # needed so that we can hack iminuit's introspection to function parameter names... unless - def pdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, lamb: float) -> np.ndarray: + def pdf_norm( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + mu: float, + sigma: float, + lamb: float, + ) -> np.ndarray: return self._pdf_norm(x, x_lo, x_hi, mu, sigma, lamb) - def cdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, lamb: float) -> np.ndarray: + + def cdf_norm( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + mu: float, + sigma: float, + lamb: float, + ) -> np.ndarray: return self._cdf_norm(x, x_lo, x_hi, mu, sigma, lamb) - def pdf_ext(self, x: np.ndarray, x_lo: float, x_hi: float, area: float, mu: float, sigma: float, lamb: float) -> np.ndarray: - return np.diff(nb_exponential_scaled_cdf(np.array([x_lo, x_hi]), area, mu, sigma, lamb))[0], nb_exponential_scaled_pdf(x, area, mu, sigma, lamb) - def cdf_ext(self, x: np.ndarray, area: float, mu: float, sigma: float, lamb: float) -> np.ndarray: + def pdf_ext( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + area: float, + mu: float, + sigma: float, + lamb: float, + ) -> np.ndarray: + return np.diff( + nb_exponential_scaled_cdf(np.array([x_lo, x_hi]), area, mu, sigma, lamb) + )[0], nb_exponential_scaled_pdf(x, area, mu, sigma, lamb) + + def cdf_ext( + self, x: np.ndarray, area: float, mu: float, sigma: float, lamb: float + ) -> np.ndarray: return nb_exponential_scaled_cdf(x, area, mu, sigma, lamb) - + def required_args(self) -> tuple[str, str, str]: return "mu", "sigma", "lambda" -exponential = exponential_gen(a=0.0, name='exponential') \ No newline at end of file + +exponential = ExponentialGen(a=0.0, name="exponential") diff --git a/src/pygama/math/functions/gauss.py b/src/pygama/math/functions/gauss.py index 11b2b2be4..4b2cb8841 100644 --- a/src/pygama/math/functions/gauss.py +++ b/src/pygama/math/functions/gauss.py @@ -1,14 +1,12 @@ """ Gaussian distributions for pygama """ -import sys -from typing import Union import numba as nb import numpy as np from pygama.math.functions.error_function import nb_erf -from pygama.math.functions.pygama_continuous import pygama_continuous +from pygama.math.functions.pygama_continuous import PygamaContinuous from pygama.utils import numba_math_defaults as nb_defaults @@ -32,10 +30,12 @@ def nb_gauss(x: np.ndarray, mu: float, sigma: float) -> np.ndarray: TODO:: remove this in favor of using nb_gauss_pdf with a different normalization """ - if sigma ==0: invs=np.inf - else: invs = 1.0 / sigma + if sigma == 0: + invs = np.inf + else: + invs = 1.0 / sigma z = (x - mu) * invs - return np.exp(-0.5 * z ** 2) + return np.exp(-0.5 * z**2) @nb.njit(**nb_defaults(parallel=False)) @@ -60,10 +60,12 @@ def nb_gauss_amp(x: np.ndarray, mu: float, sigma: float, a: float) -> np.ndarray TODO:: potentially remove this, redundant with ``nb_gauss_scaled_pdf`` """ - if sigma ==0: invs=np.inf - else: invs = 1.0 / sigma + if sigma == 0: + invs = np.inf + else: + invs = 1.0 / sigma z = (x - mu) * invs - return a * np.exp(-0.5 * z ** 2) + return a * np.exp(-0.5 * z**2) @nb.njit(**nb_defaults(parallel=False)) @@ -87,11 +89,13 @@ def nb_gauss_pdf(x: np.ndarray, mu: float, sigma: float) -> np.ndarray: The standard deviation of the Gaussian """ - if sigma ==0: invs=np.inf - else: invs = 1.0 / sigma + if sigma == 0: + invs = np.inf + else: + invs = 1.0 / sigma z = (x - mu) * invs - invnorm = invs/ np.sqrt(2 * np.pi) - return np.exp(-0.5 * z ** 2) * invnorm + invnorm = invs / np.sqrt(2 * np.pi) + return np.exp(-0.5 * z**2) * invnorm @nb.njit(**nb_defaults(parallel=False)) @@ -115,15 +119,18 @@ def nb_gauss_cdf(x: np.ndarray, mu: float, sigma: float) -> np.ndarray: The standard deviation of the Gaussian """ - if sigma == 0 : invs=np.inf - else: - invs = 1.0/ sigma + if sigma == 0: + invs = np.inf + else: + invs = 1.0 / sigma - return 1/2 * (1 + nb_erf(invs*(x - mu)/(np.sqrt(2)))) + return 1 / 2 * (1 + nb_erf(invs * (x - mu) / (np.sqrt(2)))) @nb.njit(**nb_defaults(parallel=False)) -def nb_gauss_scaled_pdf(x: np.ndarray, area: float, mu: float, sigma: float) -> np.ndarray: +def nb_gauss_scaled_pdf( + x: np.ndarray, area: float, mu: float, sigma: float +) -> np.ndarray: """ Gaussian with height as a parameter for fwhm etc. As a Numba JIT function, it runs slightly faster than @@ -139,15 +146,17 @@ def nb_gauss_scaled_pdf(x: np.ndarray, area: float, mu: float, sigma: float) -> The centroid of the Gaussian sigma The standard deviation of the Gaussian - """ + """ return area * nb_gauss_pdf(x, mu, sigma) @nb.njit(**nb_defaults(parallel=False)) -def nb_gauss_scaled_cdf(x: np.ndarray, area: float, mu: float, sigma: float) -> np.ndarray: +def nb_gauss_scaled_cdf( + x: np.ndarray, area: float, mu: float, sigma: float +) -> np.ndarray: """ - Gaussian CDF scaled by the number of signal counts for extended binned fits + Gaussian CDF scaled by the number of signal counts for extended binned fits As a Numba JIT function, it runs slightly faster than 'out of the box' functions. @@ -161,40 +170,61 @@ def nb_gauss_scaled_cdf(x: np.ndarray, area: float, mu: float, sigma: float) -> The centroid of the Gaussian sigma The standard deviation of the Gaussian - """ - + """ + return area * nb_gauss_cdf(x, mu, sigma) -class gaussian_gen(pygama_continuous): +class GaussianGen(PygamaContinuous): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) - self.x_lo = -1*np.inf + self.x_lo = -1 * np.inf self.x_hi = np.inf def _pdf(self, x: np.ndarray) -> np.ndarray: x.flags.writeable = True return nb_gauss_pdf(x, 0, 1) + def _cdf(self, x: np.ndarray) -> np.ndarray: return nb_gauss_cdf(x, 0, 1) def get_pdf(self, x: np.ndarray, mu: float, sigma: float) -> np.ndarray: return nb_gauss_pdf(x, mu, sigma) + def get_cdf(self, x: np.ndarray, mu: float, sigma: float) -> np.ndarray: return nb_gauss_cdf(x, mu, sigma) - def pdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float) -> np.ndarray: + def pdf_norm( + self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float + ) -> np.ndarray: return self._pdf_norm(x, x_lo, x_hi, mu, sigma) - def cdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float) -> np.ndarray: + + def cdf_norm( + self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float + ) -> np.ndarray: return self._cdf_norm(x, x_lo, x_hi, mu, sigma) - def pdf_ext(self, x: np.ndarray, x_lo: float, x_hi: float, area: float, mu: float, sigma: float) -> np.ndarray: - return np.diff(nb_gauss_scaled_cdf(np.array([x_lo, x_hi]), area, mu, sigma))[0], nb_gauss_scaled_pdf(x, area, mu, sigma) - def cdf_ext(self, x: np.ndarray, area: float, mu: float, sigma: float) -> np.ndarray: + def pdf_ext( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + area: float, + mu: float, + sigma: float, + ) -> np.ndarray: + return np.diff(nb_gauss_scaled_cdf(np.array([x_lo, x_hi]), area, mu, sigma))[ + 0 + ], nb_gauss_scaled_pdf(x, area, mu, sigma) + + def cdf_ext( + self, x: np.ndarray, area: float, mu: float, sigma: float + ) -> np.ndarray: return nb_gauss_scaled_cdf(x, area, mu, sigma) def required_args(self) -> tuple[str, str]: return "mu", "sigma" -gaussian = gaussian_gen(name='gaussian') \ No newline at end of file + +gaussian = GaussianGen(name="gaussian") diff --git a/src/pygama/math/functions/gauss_on_exgauss.py b/src/pygama/math/functions/gauss_on_exgauss.py index e5b9381c3..db4460210 100644 --- a/src/pygama/math/functions/gauss_on_exgauss.py +++ b/src/pygama/math/functions/gauss_on_exgauss.py @@ -13,19 +13,26 @@ Returns ------- gauss_on_exgauss - A subclass of :class:`sum_dists` and :class:`rv_continuous`, has methods of :func:`pdf`, :func:`cdf`, etc. + A subclass of :class:`SumDists` and :class:`rv_continuous`, has methods of :func:`pdf`, :func:`cdf`, etc. Notes ----- The extended Gaussian distribution shares the mu, sigma with the Gaussian """ -from pygama.math.functions.sum_dists import sum_dists - -from pygama.math.functions.gauss import gaussian from pygama.math.functions.exgauss import exgauss +from pygama.math.functions.gauss import gaussian +from pygama.math.functions.sum_dists import SumDists -(mu, sigma, frac, tau) = range(4) # the sum_dist array we will pass will be mu, sigma, frac/htail, tau -par_array = [(exgauss, [mu, sigma, tau]), (gaussian, [mu, sigma])] +(mu, sigma, frac, tau) = range( + 4 +) # the sum_dist array we will pass will be mu, sigma, frac/htail, tau +par_array = [(exgauss, [mu, sigma, tau]), (gaussian, [mu, sigma])] -gauss_on_exgauss = sum_dists(par_array, [frac], "fracs", parameter_names = ["mu", "sigma", "htail", "tau"], name="gauss_on_exgauss") \ No newline at end of file +gauss_on_exgauss = SumDists( + par_array, + [frac], + "fracs", + parameter_names=["mu", "sigma", "htail", "tau"], + name="gauss_on_exgauss", +) diff --git a/src/pygama/math/functions/gauss_on_exponential.py b/src/pygama/math/functions/gauss_on_exponential.py index 40a5c2dcb..6c9c9b1b7 100644 --- a/src/pygama/math/functions/gauss_on_exponential.py +++ b/src/pygama/math/functions/gauss_on_exponential.py @@ -17,15 +17,22 @@ Returns ------- gauss_on_exponential - A subclass of :class:`sum_dists` and :class:`rv_continuous`, has methods of :func:`pdf`, :func:`cdf`, etc. + A subclass of :class:`SumDists` and :class:`rv_continuous`, has methods of :func:`pdf`, :func:`cdf`, etc. """ -from pygama.math.functions.sum_dists import sum_dists - -from pygama.math.functions.gauss import gaussian from pygama.math.functions.exponential import exponential +from pygama.math.functions.gauss import gaussian +from pygama.math.functions.sum_dists import SumDists -(n_sig, mu, sigma, n_bkg, lamb, mu_exp, sigma_exp)= range(7) # the sum_dist array we will pass will be n_sig, mu, sigma, lambda, n_bkg, mu_exp, sigma_exp -par_array = [(gaussian, [mu, sigma]), (exponential, [mu_exp, sigma_exp, lamb])] +(n_sig, mu, sigma, n_bkg, lamb, mu_exp, sigma_exp) = range( + 7 +) # the sum_dist array we will pass will be n_sig, mu, sigma, lambda, n_bkg, mu_exp, sigma_exp +par_array = [(gaussian, [mu, sigma]), (exponential, [mu_exp, sigma_exp, lamb])] -gauss_on_exponential = sum_dists(par_array, [n_sig, n_bkg], "areas", parameter_names = ["n_sig", "mu", "sigma", "n_bkg", "lambd", "mu_exp", "sigma_exp"], name="gauss_on_exponential") \ No newline at end of file +gauss_on_exponential = SumDists( + par_array, + [n_sig, n_bkg], + "areas", + parameter_names=["n_sig", "mu", "sigma", "n_bkg", "lambd", "mu_exp", "sigma_exp"], + name="gauss_on_exponential", +) diff --git a/src/pygama/math/functions/gauss_on_linear.py b/src/pygama/math/functions/gauss_on_linear.py index bd6660a54..edf995fee 100644 --- a/src/pygama/math/functions/gauss_on_linear.py +++ b/src/pygama/math/functions/gauss_on_linear.py @@ -1,19 +1,17 @@ -import sys - r""" -Provide a convenience function for a Gaussian on top of a linear distribution. +Provide a convenience function for a Gaussian on top of a linear distribution. -Parameters +Parameters ---------- -area1 +area1 The area of the Gaussian distribution mu, sigma The location and scale of the first Gaussian -area2 +area2 The area of the linear distributions respectively -m - The slope of the linear distribution -b +m + The slope of the linear distribution +b The y intercept of the linear distribution x_lo The lower bound on which to normalize the linear distribution @@ -21,17 +19,25 @@ The upper bound on which to normalize the linear distribution -Returns +Returns ------- gauss_on_linear - An instance of sum_dists and rv_continuous, has methods of pdf, cdf, etc. + An instance of SumDists and rv_continuous, has methods of pdf, cdf, etc. """ -from pygama.math.functions.sum_dists import sum_dists from pygama.math.functions.gauss import gaussian from pygama.math.functions.linear import linear +from pygama.math.functions.sum_dists import SumDists -(linear_x_lo, linear_x_hi, n_sig, mu, sigma, n_bkg, m, b) = range(8) # the sum_dist array we will pass will be x_lo, x_hi, n_sig, mu, sigma, n_bkg, m, b -par_array = [(gaussian, [mu, sigma]), (linear, [linear_x_lo, linear_x_hi, m, b])] +(linear_x_lo, linear_x_hi, n_sig, mu, sigma, n_bkg, m, b) = range( + 8 +) # the sum_dist array we will pass will be x_lo, x_hi, n_sig, mu, sigma, n_bkg, m, b +par_array = [(gaussian, [mu, sigma]), (linear, [linear_x_lo, linear_x_hi, m, b])] -gauss_on_linear = sum_dists(par_array, [n_sig, n_bkg], "areas", parameter_names = ["x_lo", "x_hi", "n_sig", "mu", "sigma", "n_bkg", "m", "b"], name="gauss_on_linear") +gauss_on_linear = SumDists( + par_array, + [n_sig, n_bkg], + "areas", + parameter_names=["x_lo", "x_hi", "n_sig", "mu", "sigma", "n_bkg", "m", "b"], + name="gauss_on_linear", +) diff --git a/src/pygama/math/functions/gauss_on_step.py b/src/pygama/math/functions/gauss_on_step.py index a58ce587d..a44267f11 100644 --- a/src/pygama/math/functions/gauss_on_step.py +++ b/src/pygama/math/functions/gauss_on_step.py @@ -1,36 +1,43 @@ r""" -Provide a convenience function for a Gaussian on top of a step distribution. +Provide a convenience function for a Gaussian on top of a step distribution. -Parameters +Parameters ---------- -x_lo, x_hi - The lower and upper bounds of the support of the step function +x_lo, x_hi + The lower and upper bounds of the support of the step function area1 The area of the Gaussian distribution mu, sigma The location and scale of the first Gaussian, and step function -area2 +area2 The area of the step function hstep The height of the step function -Returns +Returns ------- gauss_on_step - An instance of sum_dists and rv_continuous, has methods of pdf, cdf, etc. - -Notes + An instance of SumDists and rv_continuous, has methods of pdf, cdf, etc. + +Notes ----- -The step function shares a mu and sigma with the step function -The parameter array must have ordering (x_lo, x_hi, area1, mu, sigma, area2, hstep) +The step function shares a mu and sigma with the step function +The parameter array must have ordering (x_lo, x_hi, area1, mu, sigma, area2, hstep) """ -from pygama.math.functions.sum_dists import sum_dists from pygama.math.functions.gauss import gaussian from pygama.math.functions.step import step +from pygama.math.functions.sum_dists import SumDists +(x_lo, x_hi, n_sig, mu, sigma, n_bkg, hstep) = range( + 7 +) # the sum_dist array we will pass will be x_lo, x_hi, n_sig, mu, sigma, n_bkg, hstep +par_array = [(gaussian, [mu, sigma]), (step, [x_lo, x_hi, mu, sigma, hstep])] -(x_lo, x_hi, n_sig, mu, sigma, n_bkg, hstep) = range(7) # the sum_dist array we will pass will be x_lo, x_hi, n_sig, mu, sigma, n_bkg, hstep -par_array = [(gaussian, [mu, sigma]), (step, [x_lo, x_hi, mu, sigma, hstep])] - -gauss_on_step = sum_dists(par_array, [n_sig, n_bkg], "areas", parameter_names = ["x_lo", "x_hi", "n_sig", "mu", "sigma", "n_bkg", "hstep"], name="gauss_on_step") \ No newline at end of file +gauss_on_step = SumDists( + par_array, + [n_sig, n_bkg], + "areas", + parameter_names=["x_lo", "x_hi", "n_sig", "mu", "sigma", "n_bkg", "hstep"], + name="gauss_on_step", +) diff --git a/src/pygama/math/functions/gauss_on_uniform.py b/src/pygama/math/functions/gauss_on_uniform.py index 50f629306..fe1f9facf 100644 --- a/src/pygama/math/functions/gauss_on_uniform.py +++ b/src/pygama/math/functions/gauss_on_uniform.py @@ -1,28 +1,36 @@ r""" -Provide a convenience function for a Gaussian on top of a uniform distribution. +Provide a convenience function for a Gaussian on top of a uniform distribution. -Parameters +Parameters ---------- x_lo, x_hi The upper and lower bounds on which the uniform distribution is defined -n_sig +n_sig The area of the Gaussian distribution mu, sigma The location and scale of the first Gaussian -n_bkg +n_bkg The area of the uniform distributions respectively -Returns +Returns ------- gauss_on_uniform - An instance of sum_dists and rv_continuous, has methods of pdf, cdf, etc. + An instance of SumDists and rv_continuous, has methods of pdf, cdf, etc. """ -from pygama.math.functions.sum_dists import sum_dists from pygama.math.functions.gauss import gaussian +from pygama.math.functions.sum_dists import SumDists from pygama.math.functions.uniform import uniform -(x_lo, x_hi, n_sig, mu, sigma, n_bkg) = range(6) # the sum_dist array we will pass will be x_lo, x_hi, n_sig, mu, sigma, n_bkg -par_array = [(gaussian, [mu, sigma]), (uniform, [x_lo, x_hi])] +(x_lo, x_hi, n_sig, mu, sigma, n_bkg) = range( + 6 +) # the sum_dist array we will pass will be x_lo, x_hi, n_sig, mu, sigma, n_bkg +par_array = [(gaussian, [mu, sigma]), (uniform, [x_lo, x_hi])] -gauss_on_uniform = sum_dists(par_array, [n_sig, n_bkg], "areas", parameter_names = ["x_lo", "x_hi", "n_sig", "mu", "sigma", "n_bkg"], name="gauss_on_uniform") +gauss_on_uniform = SumDists( + par_array, + [n_sig, n_bkg], + "areas", + parameter_names=["x_lo", "x_hi", "n_sig", "mu", "sigma", "n_bkg"], + name="gauss_on_uniform", +) diff --git a/src/pygama/math/functions/hpge_peak.py b/src/pygama/math/functions/hpge_peak.py index 6adad4dfc..6ad18ba62 100644 --- a/src/pygama/math/functions/hpge_peak.py +++ b/src/pygama/math/functions/hpge_peak.py @@ -1,15 +1,15 @@ r""" -Provide a convenience function for the HPGe peak shape. +Provide a convenience function for the HPGe peak shape. A HPGe peak consists of a Gaussian -on an Exgauss on a step function. +on an Exgauss on a step function. .. math:: PDF = n_sig*((1-htail)*gauss + htail*exgauss) + n_bkg*step -Called with +Called with hpge_peak.get_pdf(x, x_lo, x_hi, n_sig, mu, sigma, htail, tau, n_bkg, hstep) @@ -34,153 +34,204 @@ hstep The height of the step function background -Returns +Returns ------- hpge_peak - A subclass of sum_dists and rv_continuous, has methods of pdf, cdf, etc. + A subclass of SumDists and rv_continuous, has methods of pdf, cdf, etc. -Notes ------ +Notes +----- The extended Gaussian distribution and the step distribution share the mu, sigma with the Gaussian """ + import numpy as np -from pygama.math.functions.sum_dists import sum_dists from pygama.math.functions.gauss_on_exgauss import gauss_on_exgauss from pygama.math.functions.step import step -from pygama.math.hpge_peak_fitting import hpge_peak_fwhm, hpge_peak_fwfm, hpge_peak_mode - +from pygama.math.functions.sum_dists import SumDists +from pygama.math.hpge_peak_fitting import hpge_peak_fwfm, hpge_peak_fwhm, hpge_peak_mode (x_lo, x_hi, n_sig, mu, sigma, frac1, tau, n_bkg, hstep) = range(9) -par_array = [(gauss_on_exgauss, [mu, sigma, frac1, tau]), (step, [x_lo, x_hi, mu, sigma, hstep])] - -hpge_peak = sum_dists(par_array, [n_sig, n_bkg], "areas", parameter_names = ["x_lo", "x_hi", "n_sig", "mu", "sigma", "htail", "tau", "n_bkg", "hstep"], name="hpge_peak") - -# This is defined here as to avoid a circular import inside `sum_dists` +par_array = [ + (gauss_on_exgauss, [mu, sigma, frac1, tau]), + (step, [x_lo, x_hi, mu, sigma, hstep]), +] + +hpge_peak = SumDists( + par_array, + [n_sig, n_bkg], + "areas", + parameter_names=[ + "x_lo", + "x_hi", + "n_sig", + "mu", + "sigma", + "htail", + "tau", + "n_bkg", + "hstep", + ], + name="hpge_peak", +) + + +# This is defined here as to avoid a circular import inside `SumDists` def hpge_get_fwhm(self, pars: np.ndarray, cov: np.ndarray = None) -> tuple: r""" Get the fwhm value from the output of a fit quickly - Need to overload this to use hpge_peak_fwhm (to avoid a circular import) for when self is an hpge peak, + Need to overload this to use hpge_peak_fwhm (to avoid a circular import) for when self is an hpge peak, and otherwise returns 2sqrt(2log(2))*sigma - Parameters + Parameters ---------- - pars + pars Array of fit parameters - cov + cov Optional, array of covariances for calculating error on the fwhm - Returns + Returns ------- - fwhm, error + fwhm, error the value of the fwhm and its error """ req_args = np.array(self.required_args()) sigma_idx = np.where(req_args == "sigma")[0][0] - if ("htail" in req_args) and ("hstep" in req_args): #having both the htail and hstep means it is an exgauss on a step + if ("htail" in req_args) and ( + "hstep" in req_args + ): # having both the htail and hstep means it is an exgauss on a step htail_idx = np.where(req_args == "htail")[0][0] tau_idx = np.where(req_args == "tau")[0][0] # We need to ditch the x_lo and x_hi columns and rows if cov is not None: cov = np.array(cov) dropped_cov = cov[:, 2:][2:, :] - return hpge_peak_fwhm(pars[sigma_idx], pars[htail_idx], pars[tau_idx], dropped_cov) + return hpge_peak_fwhm( + pars[sigma_idx], pars[htail_idx], pars[tau_idx], dropped_cov + ) else: return hpge_peak_fwhm(pars[sigma_idx], pars[htail_idx], pars[tau_idx]) - else: + else: if cov is None: - return pars[sigma_idx]*2*np.sqrt(2*np.log(2)) + return pars[sigma_idx] * 2 * np.sqrt(2 * np.log(2)) else: - return pars[sigma_idx]*2*np.sqrt(2*np.log(2)), np.sqrt(cov[sigma_idx][sigma_idx])*2*np.sqrt(2*np.log(2)) + return pars[sigma_idx] * 2 * np.sqrt(2 * np.log(2)), np.sqrt( + cov[sigma_idx][sigma_idx] + ) * 2 * np.sqrt(2 * np.log(2)) + -# This is defined here as to avoid a circular import inside `sum_dists` -def hpge_get_fwfm(self, pars: np.ndarray, frac_max=0.5, cov: np.ndarray = None) -> tuple: +# This is defined here as to avoid a circular import inside `SumDists` +def hpge_get_fwfm( + self, pars: np.ndarray, frac_max=0.5, cov: np.ndarray = None +) -> tuple: r""" Get the fwhm value from the output of a fit quickly - Need to overload this to use hpge_peak_fwhm (to avoid a circular import) for when self is an hpge peak, + Need to overload this to use hpge_peak_fwhm (to avoid a circular import) for when self is an hpge peak, and otherwise returns 2sqrt(-2log(frac_max))*sigma - Parameters + Parameters ---------- - pars + pars Array of fit parameters - cov + cov Optional, array of covariances for calculating error on the fwhm - Returns + Returns ------- - fwhm, error + fwhm, error the value of the fwhm and its error """ req_args = np.array(self.required_args()) sigma_idx = np.where(req_args == "sigma")[0][0] - if ("htail" in req_args) and ("hstep" in req_args): #having both the htail and hstep means it is an exgauss on a step + if ("htail" in req_args) and ( + "hstep" in req_args + ): # having both the htail and hstep means it is an exgauss on a step htail_idx = np.where(req_args == "htail")[0][0] tau_idx = np.where(req_args == "tau")[0][0] # We need to ditch the x_lo and x_hi columns and rows if cov is not None: cov = np.array(cov) dropped_cov = cov[:, 2:][2:, :] - return hpge_peak_fwfm(pars[sigma_idx], pars[htail_idx], pars[tau_idx], frac_max=frac_max, cov=dropped_cov) + return hpge_peak_fwfm( + pars[sigma_idx], + pars[htail_idx], + pars[tau_idx], + frac_max=frac_max, + cov=dropped_cov, + ) else: - return hpge_peak_fwfm(pars[sigma_idx], pars[htail_idx], pars[tau_idx], frac_max=frac_max) + return hpge_peak_fwfm( + pars[sigma_idx], pars[htail_idx], pars[tau_idx], frac_max=frac_max + ) - else: + else: if cov is None: - return pars[sigma_idx]*2*np.sqrt(2*np.log(2)) + return pars[sigma_idx] * 2 * np.sqrt(2 * np.log(2)) else: - return pars[sigma_idx]*2*np.sqrt(-2*np.log(frac_max)), np.sqrt(cov[sigma_idx][sigma_idx])*2*np.sqrt(-2*np.log(frac_max)) - -# This is defined here as to avoid a circular import inside `sum_dists` + return pars[sigma_idx] * 2 * np.sqrt(-2 * np.log(frac_max)), np.sqrt( + cov[sigma_idx][sigma_idx] + ) * 2 * np.sqrt(-2 * np.log(frac_max)) + + +# This is defined here as to avoid a circular import inside `SumDists` def hpge_get_mode(self, pars: np.ndarray, cov: np.ndarray = None) -> tuple: r""" Get the fwhm value from the output of a fit quickly - Need to overload this to use hpge_peak_fwhm (to avoid a circular import) for when self is an hpge peak, + Need to overload this to use hpge_peak_fwhm (to avoid a circular import) for when self is an hpge peak, and otherwise returns 2sqrt(2log(2))*sigma - Parameters + Parameters ---------- - pars + pars Array of fit parameters - cov + cov Optional, array of covariances for calculating error on the fwhm - Returns + Returns ------- - fwhm, error + fwhm, error the value of the fwhm and its error """ req_args = np.array(self.required_args()) sigma_idx = np.where(req_args == "sigma")[0][0] mu_idx = np.where(req_args == "mu")[0][0] - if ("htail" in req_args) and ("hstep" in req_args): #having both the htail and hstep means it is an exgauss on a step + if ("htail" in req_args) and ( + "hstep" in req_args + ): # having both the htail and hstep means it is an exgauss on a step htail_idx = np.where(req_args == "htail")[0][0] tau_idx = np.where(req_args == "tau")[0][0] # We need to ditch the x_lo and x_hi columns and rows if cov is not None: cov = np.array(cov) dropped_cov = cov[:, 2:][2:, :] - - return hpge_peak_mode(pars[mu_idx], pars[sigma_idx], pars[htail_idx], - pars[tau_idx], dropped_cov) + + return hpge_peak_mode( + pars[mu_idx], + pars[sigma_idx], + pars[htail_idx], + pars[tau_idx], + dropped_cov, + ) else: - return hpge_peak_mode(pars[mu_idx], pars[sigma_idx], pars[htail_idx], - pars[tau_idx]) + return hpge_peak_mode( + pars[mu_idx], pars[sigma_idx], pars[htail_idx], pars[tau_idx] + ) - else: + else: if cov is None: return pars[mu_idx] else: return np.sqrt(cov[mu_idx][mu_idx]) + # hpge_peak.get_fwhm = hpge_get_fwhm hpge_peak.get_fwfm = hpge_get_fwfm.__get__(hpge_peak) hpge_peak.get_mode = hpge_get_mode.__get__(hpge_peak) -hpge_peak.get_fwhm = hpge_get_fwhm.__get__(hpge_peak) \ No newline at end of file +hpge_peak.get_fwhm = hpge_get_fwhm.__get__(hpge_peak) diff --git a/src/pygama/math/functions/linear.py b/src/pygama/math/functions/linear.py index 037b16b82..b61e119ca 100644 --- a/src/pygama/math/functions/linear.py +++ b/src/pygama/math/functions/linear.py @@ -1,19 +1,18 @@ -import sys - - import numba as nb import numpy as np from numba import prange -from pygama.math.functions.pygama_continuous import pygama_continuous -from pygama.utils import numba_math_defaults_kwargs as nb_kwargs +from pygama.math.functions.pygama_continuous import PygamaContinuous from pygama.utils import numba_math_defaults as nb_defaults +from pygama.utils import numba_math_defaults_kwargs as nb_kwargs @nb.njit(**nb_kwargs) -def nb_linear_pdf(x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float) -> np.ndarray: +def nb_linear_pdf( + x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float +) -> np.ndarray: r""" - Normalised linear probability density function, w/ args: m, b. Its range of support is :math:`x\in(x_{lower},x_{upper})`. + Normalised linear probability density function, w/ args: m, b. Its range of support is :math:`x\in(x_{lower},x_{upper})`. If :math:`x_{lower} = np.inf` and :math:`x_{upper} = np.inf`, then the function takes :math:`x_{upper} = :func:`np.min(x)` and :math:`x_{upper} = :func:`np.amax(x)` It computes: @@ -38,18 +37,20 @@ def nb_linear_pdf(x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float) - b The y-intercept of the linear part """ - norm = (m/2)*(x_hi**2 - x_lo**2) + b*(x_hi-x_lo) + norm = (m / 2) * (x_hi**2 - x_lo**2) + b * (x_hi - x_lo) result = np.empty_like(x, np.float64) for i in prange(x.shape[0]): - result[i] = (m*x[i] + b)/norm + result[i] = (m * x[i] + b) / norm return result @nb.njit(**nb_kwargs) -def nb_linear_cdf(x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float) -> np.ndarray: +def nb_linear_cdf( + x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float +) -> np.ndarray: r""" - Normalised linear cumulative density function, w/ args: m, b. Its range of support is :math:`x\in(x_{lower},x_{upper})`. + Normalised linear cumulative density function, w/ args: m, b. Its range of support is :math:`x\in(x_{lower},x_{upper})`. If :math:`x_{lower} = np.inf` and :math:`x_{upper} = np.inf`, then the function takes :math:`x_{upper} = :func:`np.min(x)` and :math:`x_{upper} = :func:`np.amax(x)` It computes: @@ -74,16 +75,18 @@ def nb_linear_cdf(x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float) - b The y-intercept of the linear part """ - norm = (m/2)*(x_hi**2 - x_lo**2) + b*(x_hi-x_lo) + norm = (m / 2) * (x_hi**2 - x_lo**2) + b * (x_hi - x_lo) result = np.empty_like(x, np.float64) for i in prange(x.shape[0]): - result[i] = (m/2*(x[i]**2-x_lo**2) + b*(x[i]-x_lo))/norm + result[i] = (m / 2 * (x[i] ** 2 - x_lo**2) + b * (x[i] - x_lo)) / norm return result @nb.njit(**nb_defaults(parallel=False)) -def nb_linear_scaled_pdf(x: np.ndarray, x_lo: float, x_hi: float, area: float, m: float, b: float) -> np.ndarray: +def nb_linear_scaled_pdf( + x: np.ndarray, x_lo: float, x_hi: float, area: float, m: float, b: float +) -> np.ndarray: r""" Scaled linear probability distribution, w/ args: m, b. As a Numba JIT function, it runs slightly faster than @@ -103,15 +106,17 @@ def nb_linear_scaled_pdf(x: np.ndarray, x_lo: float, x_hi: float, area: float, m The slope of the linear part b The y-intercept of the linear part - """ + """ return area * nb_linear_pdf(x, x_lo, x_hi, m, b) @nb.njit(**nb_defaults(parallel=False)) -def nb_linear_scaled_cdf(x: np.ndarray, x_lo: float, x_hi: float, area: float, m: float, b: float) -> np.ndarray: +def nb_linear_scaled_cdf( + x: np.ndarray, x_lo: float, x_hi: float, area: float, m: float, b: float +) -> np.ndarray: r""" - Linear cdf scaled by the area for extended binned fits + Linear cdf scaled by the area for extended binned fits As a Numba JIT function, it runs slightly faster than 'out of the box' functions. @@ -129,13 +134,13 @@ def nb_linear_scaled_cdf(x: np.ndarray, x_lo: float, x_hi: float, area: float, m The slope of the linear part b The y-intercept of the linear part - """ - + """ + return area * nb_linear_cdf(x, x_lo, x_hi, m, b) -class linear_gen(pygama_continuous): - +class LinearGen(PygamaContinuous): + def __init__(self, *args, **kwargs): self.x_lo = None self.x_hi = None @@ -146,28 +151,47 @@ def _argcheck(self, x_lo, x_hi, m, b): def _pdf(self, x: np.ndarray, x_lo: float, x_hi: float, m, b) -> np.ndarray: x.flags.writeable = True - return nb_linear_pdf(x, x_lo[0], x_hi[0], m[0], b[0]) + return nb_linear_pdf(x, x_lo[0], x_hi[0], m[0], b[0]) + def _cdf(self, x: np.ndarray, x_lo: float, x_hi: float, m, b) -> np.ndarray: x.flags.writeable = True return nb_linear_cdf(x, x_lo[0], x_hi[0], m[0], b[0]) - def get_pdf(self, x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float) -> np.ndarray: - return nb_linear_pdf(x, x_lo, x_hi, m, b) - def get_cdf(self, x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float) -> np.ndarray: + def get_pdf( + self, x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float + ) -> np.ndarray: + return nb_linear_pdf(x, x_lo, x_hi, m, b) + + def get_cdf( + self, x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float + ) -> np.ndarray: return nb_linear_cdf(x, x_lo, x_hi, m, b) - # Because this function is already normalized over its limited support, we need to alias get_pdf as pdf_norm - def pdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float) -> np.ndarray: - return nb_linear_pdf(x, x_lo, x_hi, m, b) - def cdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float) -> np.ndarray: + # Because this function is already normalized over its limited support, we need to alias get_pdf as pdf_norm + def pdf_norm( + self, x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float + ) -> np.ndarray: + return nb_linear_pdf(x, x_lo, x_hi, m, b) + + def cdf_norm( + self, x: np.ndarray, x_lo: float, x_hi: float, m: float, b: float + ) -> np.ndarray: return nb_linear_cdf(x, x_lo, x_hi, m, b) - def pdf_ext(self, x: np.ndarray, x_lo: float, x_hi: float, area: float, m: float, b: float) -> np.ndarray: - return np.diff(nb_linear_scaled_cdf(np.array([x_lo, x_hi]), x_lo, x_hi, area, m, b))[0], nb_linear_scaled_pdf(x, x_lo, x_hi, area, m, b) - def cdf_ext(self, x: np.ndarray, x_lo: float, x_hi: float, area: float, m: float, b: float) -> np.ndarray: + def pdf_ext( + self, x: np.ndarray, x_lo: float, x_hi: float, area: float, m: float, b: float + ) -> np.ndarray: + return np.diff( + nb_linear_scaled_cdf(np.array([x_lo, x_hi]), x_lo, x_hi, area, m, b) + )[0], nb_linear_scaled_pdf(x, x_lo, x_hi, area, m, b) + + def cdf_ext( + self, x: np.ndarray, x_lo: float, x_hi: float, area: float, m: float, b: float + ) -> np.ndarray: return nb_linear_scaled_cdf(x, x_lo, x_hi, area, m, b) def required_args(self) -> tuple[str, str, str, str]: - return "x_lo", "x_hi", "m", "b" + return "x_lo", "x_hi", "m", "b" + -linear = linear_gen(name='linear') \ No newline at end of file +linear = LinearGen(name="linear") diff --git a/src/pygama/math/functions/moyal.py b/src/pygama/math/functions/moyal.py index 959f74172..2d43e4ab3 100644 --- a/src/pygama/math/functions/moyal.py +++ b/src/pygama/math/functions/moyal.py @@ -2,24 +2,21 @@ Moyal distributions for pygama """ -import sys +from math import erfc import numba as nb import numpy as np -from numba import prange -from math import erfc -from pygama.math.functions.pygama_continuous import pygama_continuous -from pygama.utils import numba_math_defaults_kwargs as nb_kwargs +from pygama.math.functions.pygama_continuous import PygamaContinuous from pygama.utils import numba_math_defaults as nb_defaults - +from pygama.utils import numba_math_defaults_kwargs as nb_kwargs @nb.njit(**nb_kwargs) def nb_moyal_pdf(x: np.ndarray, mu: float, sigma: float) -> np.ndarray: r""" Normalised Moyal probability distribution function, w/ args: mu, sigma. Its support is :math:`x\in\mathbb{R}` - It computes: + It computes: .. math:: @@ -40,10 +37,10 @@ def nb_moyal_pdf(x: np.ndarray, mu: float, sigma: float) -> np.ndarray: The amount to scale the distribution """ - y = np.empty_like(x, dtype = np.float64) + y = np.empty_like(x, dtype=np.float64) for i in nb.prange(x.shape[0]): - y[i] = (x[i]-mu)/sigma - y[i] = np.exp(-1 * (y[i] + np.exp(-y[i])) / 2.0) / np.sqrt(2.0*np.pi) / sigma + y[i] = (x[i] - mu) / sigma + y[i] = np.exp(-1 * (y[i] + np.exp(-y[i])) / 2.0) / np.sqrt(2.0 * np.pi) / sigma return y @@ -52,7 +49,7 @@ def nb_moyal_cdf(x: np.ndarray, mu: float, sigma: float) -> np.ndarray: r""" Normalised Moyal cumulative distribution, w/ args: mu, sigma. Its support is :math:`x\in\mathbb{R}` - It computes: + It computes: .. math:: @@ -73,15 +70,17 @@ def nb_moyal_cdf(x: np.ndarray, mu: float, sigma: float) -> np.ndarray: The amount to scale the distribution """ - y = np.empty_like(x, dtype = np.float64) + y = np.empty_like(x, dtype=np.float64) for i in nb.prange(x.shape[0]): - y[i] = (x[i]-mu)/sigma - y[i] = erfc(np.exp(-1*y[i]/2)/np.sqrt(2)) + y[i] = (x[i] - mu) / sigma + y[i] = erfc(np.exp(-1 * y[i] / 2) / np.sqrt(2)) return y @nb.njit(**nb_defaults(parallel=False)) -def nb_moyal_scaled_pdf(x: np.ndarray, area: float, mu: float, sigma: float) -> np.ndarray: +def nb_moyal_scaled_pdf( + x: np.ndarray, area: float, mu: float, sigma: float +) -> np.ndarray: r""" Scaled Moyal probability density function, w/ args: mu, sigma, area. As a Numba JIT function, it runs slightly faster than @@ -97,15 +96,17 @@ def nb_moyal_scaled_pdf(x: np.ndarray, area: float, mu: float, sigma: float) -> The amount to shift the distribution sigma The amount to scale the distribution - """ + """ return area * nb_moyal_pdf(x, mu, sigma) @nb.njit(**nb_defaults(parallel=False)) -def nb_moyal_scaled_cdf(x: np.ndarray, area: float, mu: float, sigma: float) -> np.ndarray: +def nb_moyal_scaled_cdf( + x: np.ndarray, area: float, mu: float, sigma: float +) -> np.ndarray: r""" - Moyal cdf scaled by the area, used for extended binned fits + Moyal cdf scaled by the area, used for extended binned fits As a Numba JIT function, it runs slightly faster than 'out of the box' functions. @@ -120,41 +121,62 @@ def nb_moyal_scaled_cdf(x: np.ndarray, area: float, mu: float, sigma: float) -> The amount to shift the distribution sigma The amount to scale the distribution - """ - + """ + return area * nb_moyal_cdf(x, mu, sigma) -class moyal_gen(pygama_continuous): +class MoyalGen(PygamaContinuous): def __init__(self, *args, **kwargs): - self.x_lo = -1*np.inf + self.x_lo = -1 * np.inf self.x_hi = np.inf super().__init__(*args, **kwargs) def _pdf(self, x: np.ndarray) -> np.ndarray: x.flags.writeable = True return nb_moyal_pdf(x, 0, 1) + def _cdf(self, x: np.ndarray) -> np.ndarray: x.flags.writeable = True return nb_moyal_cdf(x, 0, 1) def get_pdf(self, x: np.ndarray, mu: float, sigma: float) -> np.ndarray: - return nb_moyal_pdf(x, mu, sigma) + return nb_moyal_pdf(x, mu, sigma) + def get_cdf(self, x: np.ndarray, mu: float, sigma: float) -> np.ndarray: return nb_moyal_cdf(x, mu, sigma) - def pdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float) -> np.ndarray: + def pdf_norm( + self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float + ) -> np.ndarray: return self._pdf_norm(x, x_lo, x_hi, mu, sigma) - def cdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float) -> np.ndarray: + + def cdf_norm( + self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float + ) -> np.ndarray: return self._cdf_norm(x, x_lo, x_hi, mu, sigma) - def pdf_ext(self, x: np.ndarray, x_lo: float, x_hi: float, area: float, mu: float, sigma: float) -> np.ndarray: - return np.diff(nb_moyal_scaled_cdf(np.array([x_lo, x_hi]), area, mu, sigma))[0], nb_moyal_scaled_pdf(x, area, mu, sigma) - def cdf_ext(self, x: np.ndarray, area: float, mu: float, sigma: float) -> np.ndarray: + def pdf_ext( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + area: float, + mu: float, + sigma: float, + ) -> np.ndarray: + return np.diff(nb_moyal_scaled_cdf(np.array([x_lo, x_hi]), area, mu, sigma))[ + 0 + ], nb_moyal_scaled_pdf(x, area, mu, sigma) + + def cdf_ext( + self, x: np.ndarray, area: float, mu: float, sigma: float + ) -> np.ndarray: return nb_moyal_scaled_cdf(x, area, mu, sigma) def required_args(self) -> tuple[str, str]: return "mu", "sigma" -moyal = moyal_gen(name='moyal') \ No newline at end of file + +moyal = MoyalGen(name="moyal") diff --git a/src/pygama/math/functions/poisson.py b/src/pygama/math/functions/poisson.py index 80951571b..4c486d4fc 100644 --- a/src/pygama/math/functions/poisson.py +++ b/src/pygama/math/functions/poisson.py @@ -1,15 +1,13 @@ """ Poisson distributions for pygama """ -import sys import numba as nb import numpy as np -from numba import prange - from scipy.stats import rv_discrete -from pygama.utils import numba_math_defaults_kwargs as nb_kwargs + from pygama.utils import numba_math_defaults as nb_defaults +from pygama.utils import numba_math_defaults_kwargs as nb_kwargs @nb.njit(**nb_defaults(parallel=False)) @@ -19,6 +17,7 @@ def factorial(nn): res *= ii return res + @nb.njit(**nb_kwargs) def nb_poisson_pmf(x: np.ndarray, mu: int, lamb: float) -> np.ndarray: r""" @@ -39,14 +38,14 @@ def nb_poisson_pmf(x: np.ndarray, mu: int, lamb: float) -> np.ndarray: lamb The rate """ - - y = np.empty_like(x, dtype = np.float64) - for i in nb.prange(x.shape[0]): + + y = np.empty_like(x, dtype=np.float64) + for i in nb.prange(x.shape[0]): y[i] = x[i] - mu if y[i] < 0: y[i] = 0 else: - y[i] = lamb**y[i] * np.exp(-lamb) / factorial(int(y[i])) + y[i] = lamb ** y[i] * np.exp(-lamb) / factorial(int(y[i])) return y @@ -70,20 +69,21 @@ def nb_poisson_cdf(x: np.ndarray, mu: int, lamb: float) -> np.ndarray: The rate """ - y = np.empty_like(x, dtype = np.float64) - for i in nb.prange(x.shape[0]): + y = np.empty_like(x, dtype=np.float64) + for i in nb.prange(x.shape[0]): y[i] = x[i] - mu z = 0 - for j in nb.prange(1, np.floor(y[i])+2): + for j in nb.prange(1, np.floor(y[i]) + 2): j -= 1 z += lamb**j / factorial(j) - y[i] = z*np.exp(-lamb) + y[i] = z * np.exp(-lamb) return y - @nb.njit(**nb_defaults(parallel=False)) -def nb_poisson_scaled_pmf(x: np.ndarray, area: float, mu: int, lamb: float) -> np.ndarray: +def nb_poisson_scaled_pmf( + x: np.ndarray, area: float, mu: int, lamb: float +) -> np.ndarray: r""" Scaled Poisson probability distribution, w/ args: lamb, mu. As a Numba JIT function, it runs slightly faster than @@ -99,15 +99,17 @@ def nb_poisson_scaled_pmf(x: np.ndarray, area: float, mu: int, lamb: float) -> n Amount to shift the distribution lamb The rate - """ + """ return area * nb_poisson_pmf(x, mu, lamb) @nb.njit(**nb_defaults(parallel=False)) -def nb_poisson_scaled_cdf(x: np.ndarray, area: float, mu: int, lamb: float) -> np.ndarray: +def nb_poisson_scaled_cdf( + x: np.ndarray, area: float, mu: int, lamb: float +) -> np.ndarray: r""" - Poisson cdf scaled by the number of signal counts for extended binned fits + Poisson cdf scaled by the number of signal counts for extended binned fits As a Numba JIT function, it runs slightly faster than 'out of the box' functions. @@ -121,12 +123,12 @@ def nb_poisson_scaled_cdf(x: np.ndarray, area: float, mu: int, lamb: float) -> n Amount to shift the distribution lamb The rate - """ - + """ + return area * nb_poisson_cdf(x, mu, lamb) - -class poisson_gen(rv_discrete): + +class PoissonGen(rv_discrete): def __init__(self, *args, **kwargs): self.x_lo = 0 @@ -136,21 +138,29 @@ def __init__(self, *args, **kwargs): def _pmf(self, x: np.array, mu: int, lamb: float) -> np.array: x.flags.writeable = True return nb_poisson_pmf(x, mu[0], lamb[0]) - def _cdf(self, x: np.array, mu: int, lamb: float) -> np.array: + + def _cdf(self, x: np.array, mu: int, lamb: float) -> np.array: x.flags.writeable = True return nb_poisson_cdf(x, mu[0], lamb[0]) def get_pmf(self, x: np.array, mu: int, lamb: float) -> np.array: return nb_poisson_pmf(x, mu, lamb) + def get_cdf(self, x: np.array, mu: int, lamb: float) -> np.array: return nb_poisson_cdf(x, mu, lamb) - def pmf_ext(self, x: np.array, x_lo: float, x_hi: float, area: float, mu: int, lamb: float) -> np.array: - return np.diff(nb_poisson_scaled_cdf(np.array([x_lo, x_hi]), area, mu, lamb))[0], nb_poisson_scaled_pmf(x, area, mu, lamb) + def pmf_ext( + self, x: np.array, x_lo: float, x_hi: float, area: float, mu: int, lamb: float + ) -> np.array: + return np.diff(nb_poisson_scaled_cdf(np.array([x_lo, x_hi]), area, mu, lamb))[ + 0 + ], nb_poisson_scaled_pmf(x, area, mu, lamb) + def cdf_ext(self, x: np.array, area: float, mu: int, lamb: float) -> np.array: return nb_poisson_scaled_cdf(x, area, mu, lamb) def required_args(self) -> tuple[str, str]: return "mu", "lamb" -poisson = poisson_gen(a=0, name='poisson') \ No newline at end of file + +poisson = PoissonGen(a=0, name="poisson") diff --git a/src/pygama/math/functions/polynomial.py b/src/pygama/math/functions/polynomial.py index 808436083..189f280d8 100644 --- a/src/pygama/math/functions/polynomial.py +++ b/src/pygama/math/functions/polynomial.py @@ -1,5 +1,6 @@ import numba as nb import numpy as np + from pygama.utils import numba_math_defaults as nb_defaults @@ -23,21 +24,22 @@ def nb_poly(x: np.ndarray, pars: np.ndarray) -> np.ndarray: pars Coefficients of the polynomial, in numpy polynomial convention - Returns - ------- - result + Returns + ------- + result The polynomial defined by pars at the evaluated at x - + Notes ----- This follows the :func:`numpy.polynomial` convention of :math:`a_0 + a_1 x +.... + a_n x^n` """ - - result = x*0 # do x*0 to keep shape of x (scalar or array) - if len(pars) == 0: return result + + result = x * 0 # do x*0 to keep shape of x (scalar or array) + if len(pars) == 0: + return result result += pars[0] - y=x + y = x for i in nb.prange(1, len(pars)): - result += pars[i]*x - x = x*y + result += pars[i] * x + x = x * y return result diff --git a/src/pygama/math/functions/pygama_continuous.py b/src/pygama/math/functions/pygama_continuous.py index abb0f1f30..e3517f5a2 100644 --- a/src/pygama/math/functions/pygama_continuous.py +++ b/src/pygama/math/functions/pygama_continuous.py @@ -1,39 +1,40 @@ """ -Subclasses of scipy.stat's rv_continuous, allows for freezing of pygama numba-fied distributions for faster functions +Subclasses of scipy.stat's rv_continuous, allows for freezing of pygama numba-fied distributions for faster functions Example: >>> moyal = moyal_gen(name='moyal') >>> moyal = moyal(1, 2) ->>> moyal.get_pdf([-1,2,3]) # a direct call to the faster numbafied method +>>> moyal.get_pdf([-1,2,3]) # a direct call to the faster numbafied method >>> moyal.pdf([-1,2,3]) # a call to the slower scipy method, allows access to other scipy methods like .rvs >>> moyal.rvs(100) # Can access scipy methods! NOTE: the dist.pdf method is a slow scipy method, and the dist.get_pdf method is fast """ -from scipy.stats._distn_infrastructure import rv_continuous, rv_frozen import numpy as np +from scipy.stats._distn_infrastructure import rv_continuous, rv_frozen + -class numba_frozen(rv_frozen): +class NumbaFrozen(rv_frozen): r""" - Essentially, an overloading of the definition of rv_continuous_frozen so that our pygama class instantiations have + Essentially, an overloading of the definition of rv_continuous_frozen so that our pygama class instantiations have access to both the slower scipy methods, as well as the faster pygama methods (like get_pdf) """ - + def pdf(self, x: np.ndarray) -> np.ndarray: r""" Overloading of scipy's pdf function, this is a slow function """ return self.dist.pdf(x, *self.args, **self.kwds) - + def logpdf(self, x: np.ndarray) -> np.ndarray: r""" Overloading of scipy's logpdf function, this is a slow function """ return self.dist.logpdf(x, *self.args, **self.kwds) - + def get_pdf(self, x: np.ndarray) -> np.ndarray: r""" Direct access to numba-fied pdfs, a fast function @@ -56,7 +57,9 @@ def get_cdf(self, x: np.ndarray) -> np.ndarray: return self.dist.get_cdf(np.array(x), *self.args, **self.kwds) - def pdf_ext(self, x: np.ndarray, x_lo: float, x_hi: float, area: float) -> tuple[float, np.ndarray]: + def pdf_ext( + self, x: np.ndarray, x_lo: float, x_hi: float, area: float + ) -> tuple[float, np.ndarray]: r""" Direct access to numba-fied extended pdfs, a fast function @@ -77,7 +80,7 @@ def cdf_ext(self, x: np.ndarray, area: float) -> np.ndarray: """ return self.dist.cdf_ext(np.array(x), area, *self.args, **self.kwds) - + def pdf_norm(self, x, x_lo: float, x_hi: float): r""" Direct access to numba-fied pdfs, a fast function. Normalized on a fit range @@ -87,7 +90,7 @@ def pdf_norm(self, x, x_lo: float, x_hi: float): fit-range normalized pdf """ return self.dist.pdf_norm(x, x_lo, x_hi, *self.args, **self.kwds) - + def cdf_norm(self, x, x_lo: float, x_hi: float): r""" Direct access to numba-fied cdfs, a fast function. Normalized on a fit range @@ -97,20 +100,20 @@ def cdf_norm(self, x, x_lo: float, x_hi: float): fit-range normalized cdf """ return self.dist.cdf_norm(x, x_lo, x_hi, *self.args, **self.kwds) - + def required_args(self) -> tuple: r""" - Allow access to the required args of frozen distribution + Allow access to the required args of frozen distribution - Returns + Returns ------- Required shape, location, and scale arguments """ return self.dist.required_args() - -class pygama_continuous(rv_continuous): + +class PygamaContinuous(rv_continuous): r""" Subclass rv_continuous, and modify the instantiation so that we call an overloaded version of rv_continuous_frozen that has direct access to pygama numbafied functions @@ -118,72 +121,71 @@ class pygama_continuous(rv_continuous): def _pdf_norm(self, x, x_lo, x_hi, *args, **kwds): r""" - Normalize a pdf on a subset of its support, typically over a fit-range. + Normalize a pdf on a subset of its support, typically over a fit-range. - Parameters - ---------- - x + Parameters + ---------- + x The data - x_lo - The lower range to normalize on - x_hi - The upper range to normalize on - args - The shape and location and scale parameters of a specific distribution - - Returns + x_lo + The lower range to normalize on + x_hi + The upper range to normalize on + args + The shape and location and scale parameters of a specific distribution + + Returns ------- - pdf_norm - The pdf that is normalized on a smaller range + pdf_norm + The pdf that is normalized on a smaller range - Notes - ----- + Notes + ----- If upper_range and lower_range are both :func:`np.inf`, then the function automatically takes x_lo =:func:`np.amin(x)`, x_hi=:func:`np.amax(x)` We also need to overload this in every subclass because we want functions to be able to introspect the shape and location/scale names For distributions that are only defined on a limited range, like with lower_range, upper_range, we don't need to call these, instead just call the normal pdf. """ - norm = np.diff(self.get_cdf(np.array([x_lo, x_hi]), *args)) if norm == 0: return np.full_like(x, np.inf) else: - return self.get_pdf(x,*args)/norm - + return self.get_pdf(x, *args) / norm + def _cdf_norm(self, x, x_lo, x_hi, *args, **kwds): r""" - Derive a cdf from a pdf that is normalized on a subset of its support, typically over a fit-range. + Derive a cdf from a pdf that is normalized on a subset of its support, typically over a fit-range. - Parameters - ---------- - x + Parameters + ---------- + x The data - x_lo - The lower range to normalize on - x_hi - The upper range to normalize on - args - The shape and location and scale parameters of a specific distribution - - Returns + x_lo + The lower range to normalize on + x_hi + The upper range to normalize on + args + The shape and location and scale parameters of a specific distribution + + Returns ------- - pdf_norm - The pdf that is normalized on a smaller range + pdf_norm + The pdf that is normalized on a smaller range - Notes - ----- + Notes + ----- If upper_range and lower_range are both :func:`np.inf`, then the function automatically takes x_lo =:func:`np.amin(x)`, x_hi=:func:`np.amax(x)` - We also need to overload this in every subclass because we want functions to be able to introspect the shape and location/scale names. + We also need to overload this in every subclass because we want functions to be able to introspect the shape and location/scale names. For distributions that are only defined on a limited range, like with lower_range, upper_range, we don't need to call these, instead just call the normal pdf. """ norm = np.diff(self.get_cdf(np.array([x_lo, x_hi]), *args)) - + if norm == 0: return np.full_like(x, np.inf) else: - return (self.get_cdf(x,*args))/norm + return (self.get_cdf(x, *args)) / norm def __call__(self, *args, **kwds): - return numba_frozen(self, *args, **kwds) \ No newline at end of file + return NumbaFrozen(self, *args, **kwds) diff --git a/src/pygama/math/functions/step.py b/src/pygama/math/functions/step.py index 5a3950ac9..ae7d4652c 100644 --- a/src/pygama/math/functions/step.py +++ b/src/pygama/math/functions/step.py @@ -2,65 +2,66 @@ Step distributions for pygama """ -import sys -from typing import Union +from math import erf import numba as nb import numpy as np -from math import erf -from pygama.math.functions.gauss import nb_gauss -from pygama.math.functions.pygama_continuous import pygama_continuous -from pygama.utils import numba_math_defaults_kwargs as nb_kwargs +from pygama.math.functions.pygama_continuous import PygamaContinuous from pygama.utils import numba_math_defaults as nb_defaults +from pygama.utils import numba_math_defaults_kwargs as nb_kwargs @nb.njit(**nb_defaults(parallel=False)) def nb_step_int(x: float, mu: float, sigma: float, hstep: float) -> np.ndarray: r""" - Integral of step function w/args mu, sigma, hstep. It computes: + Integral of step function w/args mu, sigma, hstep. It computes: - .. math:: + .. math:: \int cdf(x, hstep, \mu, \sigma)\, dx = \sigma\left(\frac{x-\mu}{\sigma} + hstep \left(\frac{x-\mu}{\sigma}\text{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right) + \sqrt{\frac{2}{\pi}}\exp\left(-(\frac{x-\mu}{\sigma})^2/2\right) \right)\right) As a Numba JIT function, it runs slightly faster than 'out of the box' functions. - Parameters + Parameters ---------- - x + x A single input data point - mu + mu The location of the step - sigma - The width of the step - hstep - The height of the step + sigma + The width of the step + hstep + The height of the step - Returns + Returns ------- step_int The cumulative integral of the step distribution at x """ if sigma == 0: - y = (x)+hstep*(x-mu) + y = (x) + hstep * (x - mu) else: - z = (x-mu)/sigma - y = sigma*(z + hstep*(z*erf(z/np.sqrt(2)) + np.sqrt(2/np.pi)*np.exp(-z**2/2))) + z = (x - mu) / sigma + y = sigma * ( + z + + hstep + * (z * erf(z / np.sqrt(2)) + np.sqrt(2 / np.pi) * np.exp(-(z**2) / 2)) + ) return y @nb.njit(**nb_defaults(parallel=False)) -def nb_unnorm_step_pdf(x: float, mu: float, sigma: float, hstep: float) -> float: +def nb_unnorm_step_pdf(x: float, mu: float, sigma: float, hstep: float) -> float: r""" - Unnormalised step function for use in pdfs. It computes: + Unnormalised step function for use in pdfs. It computes: - .. math:: + .. math:: pdf(x, hstep, \mu, \sigma) = 1+ hstep\text{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right) @@ -68,100 +69,105 @@ def nb_unnorm_step_pdf(x: float, mu: float, sigma: float, hstep: float) -> floa As a Numba JIT function, it runs slightly faster than 'out of the box' functions. - Parameters + Parameters ---------- x A single input data point - mu + mu The location of the step - sigma + sigma The "width" of the step, because we are using an error function to define it - hstep - The height of the step + hstep + The height of the step """ - invs = (np.sqrt(2)*sigma) - if invs == 0: - return 1+hstep - else: - step_f = 1 + hstep * erf((x-mu)/invs) + invs = np.sqrt(2) * sigma + if invs == 0: + return 1 + hstep + else: + step_f = 1 + hstep * erf((x - mu) / invs) return step_f @nb.njit(**nb_kwargs) -def nb_step_pdf(x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, hstep: float) -> np.ndarray: +def nb_step_pdf( + x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, hstep: float +) -> np.ndarray: r""" - Normalised step function w/args mu, sigma, hstep, x_lo, x_hi. Its range of support is :math:`x\in` (x_lo, x_hi). It computes: + Normalised step function w/args mu, sigma, hstep, x_lo, x_hi. Its range of support is :math:`x\in` (x_lo, x_hi). It computes: - .. math:: + .. math:: pdf(x, \text{x_lo}, \text{x_hi}, \mu, \sigma, hstep) = pdf(y=\frac{x-\mu}{\sigma}, step, \text{x_lo}, \text{x_hi}) = \frac{1+hstep\text{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)}{\sigma\left[(y-y_{min}) +hstep\left(y\text{erf}(\frac{y}{\sqrt{2}})+\sqrt{\frac{2}{\pi}}e^{-y^2/2}-y_{min}\text{erf}(\frac{y_{min}}{\sqrt{2}})+\sqrt{\frac{2}{\pi}}e^{-y_{min}^2/2}\right)\right]} - Where :math:`y_{max} = \frac{\text{x_hi} - \mu}{\sigma}, y_{min} = \frac{\text{x_lo} - \mu}{\sigma}`. + Where :math:`y_{max} = \frac{\text{x_hi} - \mu}{\sigma}, y_{min} = \frac{\text{x_lo} - \mu}{\sigma}`. As a Numba JIT function, it runs slightly faster than 'out of the box' functions. - Parameters + Parameters ---------- x The input data x_lo The lower range on which to normalize the step PDF, default is to normalize from min to max x values - x_hi + x_hi The upper range on which to normalize the step PDF - mu + mu The location of the step - sigma + sigma The "width" of the step, because we are using an error function to define it - hstep - The height of the step + hstep + The height of the step """ - # Compute the normalization - norm = nb_step_int(x_hi, mu, sigma, hstep)-nb_step_int(x_lo, mu, sigma, hstep) + # Compute the normalization + norm = nb_step_int(x_hi, mu, sigma, hstep) - nb_step_int(x_lo, mu, sigma, hstep) - if norm == 0 : + if norm == 0: # If the normalization is zero, don't waste time computing the step_pdf z = np.full_like(x, np.inf, dtype=np.float64) else: z = np.empty_like(x, dtype=np.float64) for i in nb.prange(x.shape[0]): - step_f = nb_unnorm_step_pdf(x[i], mu, sigma, hstep) - z[i] = step_f/norm + step_f = nb_unnorm_step_pdf(x[i], mu, sigma, hstep) + z[i] = step_f / norm return z + @nb.njit(**nb_kwargs) -def nb_step_cdf(x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, hstep: float) -> np.ndarray: +def nb_step_cdf( + x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, hstep: float +) -> np.ndarray: r""" - Normalized CDF for step function w/args mu, sigma, hstep, x_lo, x_hi. Its range of support is :math:`x\in` (x_lo, x_hi). It computes: + Normalized CDF for step function w/args mu, sigma, hstep, x_lo, x_hi. Its range of support is :math:`x\in` (x_lo, x_hi). It computes: .. math:: cdf(x, \text{x_lo}, \text{x_hi}, \mu, \sigma, hstep) = cdf(y=\frac{x-\mu}{\sigma}, hstep, \text{x_lo}, \text{x_hi}) = \frac{(y-y_{min}) +hstep\left(y\text{erf}(\frac{y}{\sqrt{2}})+\sqrt{\frac{2}{\pi}}e^{-y^2/2}-y_{min}\text{erf}(\frac{y_{min}}{\sqrt{2}})+\sqrt{\frac{2}{\pi}}e^{-y_{min}^2/2}\right)}{\sigma\left[(y_{max}-y_{min}) +hstep\left(y_{max}\text{erf}(\frac{y_{max}}{\sqrt{2}})+\sqrt{\frac{2}{\pi}}e^{-y_{max}^2/2}-y_{min}\text{erf}(\frac{y_{min}}{\sqrt{2}})+\sqrt{\frac{2}{\pi}}e^{-y_{min}^2/2}\right)\right] } - Where :math:`y_{max} = \frac{\text{x_hi} - \mu}{\sigma}, y_{min} = \frac{\text{x_lo} - \mu}{\sigma}`. + Where :math:`y_{max} = \frac{\text{x_hi} - \mu}{\sigma}, y_{min} = \frac{\text{x_lo} - \mu}{\sigma}`. As a Numba JIT function, it runs slightly faster than 'out of the box' functions. - Parameters + Parameters ---------- x The input data x_lo The lower range on which to normalize the step PDF, default is to normalize from min to max x values - x_hi + x_hi The upper range on which to normalize the step PDF - mu + mu The location of the step - sigma + sigma The "width" of the step, because we are using an error function to define it - hstep - The height of the step + hstep + The height of the step """ integral_lo = nb_step_int(x_lo, mu, sigma, hstep) integral_hi = nb_step_int(x_hi, mu, sigma, hstep) @@ -169,108 +175,199 @@ def nb_step_cdf(x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float if norm == 0: # If the normalization is zero, return np.inf and avoid wasting time computing the integral - cdf = np.full_like(x, np.inf, dtype = np.float64) + cdf = np.full_like(x, np.inf, dtype=np.float64) else: integrated_pdf = np.empty_like(x, dtype=np.float64) for i in nb.prange(x.shape[0]): integrated_pdf[i] = nb_step_int(x[i], mu, sigma, hstep) - cdf = (integrated_pdf - integral_lo)/norm # compute (cdf(x)-cdf(x_min))/norm + cdf = (integrated_pdf - integral_lo) / norm # compute (cdf(x)-cdf(x_min))/norm return cdf @nb.njit(**nb_defaults(parallel=False)) -def nb_step_scaled_pdf(x: np.ndarray, x_lo: float, x_hi: float, area: float, mu: float, sigma: float, hstep: float) -> np.ndarray: +def nb_step_scaled_pdf( + x: np.ndarray, + x_lo: float, + x_hi: float, + area: float, + mu: float, + sigma: float, + hstep: float, +) -> np.ndarray: r""" Scaled step function pdf w/args x_lo, x_hi, area, mu, sigma, hstep. As a Numba JIT function, it runs slightly faster than 'out of the box' functions. - Parameters + Parameters ---------- x The input data x_lo The lower range on which to normalize the step PDF, default is to normalize from min to max x values - x_hi + x_hi The upper range on which to normalize the step PDF area The number to scale the distribution by - mu + mu The location of the step - sigma + sigma The "width" of the step, because we are using an error function to define it - hstep - The height of the step + hstep + The height of the step """ return area * nb_step_pdf(x, x_lo, x_hi, mu, sigma, hstep) @nb.njit(**nb_defaults(parallel=False)) -def nb_step_scaled_cdf(x: np.ndarray, x_lo: float, x_hi: float, area: float, mu: float, sigma: float, hstep: float) -> np.ndarray: +def nb_step_scaled_cdf( + x: np.ndarray, + x_lo: float, + x_hi: float, + area: float, + mu: float, + sigma: float, + hstep: float, +) -> np.ndarray: r""" Scaled step function CDF w/args x_lo, x_hi, area, mu, sigma, hstep. Used for extended binned fits. As a Numba JIT function, it runs slightly faster than 'out of the box' functions. - Parameters + Parameters ---------- x The input data x_lo The lower range on which to normalize the step PDF, default is to normalize from min to max x values - x_hi + x_hi The upper range on which to normalize the step PDF area The number to scale the distribution by - mu + mu The location of the step - sigma + sigma The "width" of the step, because we are using an error function to define it - hstep - The height of the step + hstep + The height of the step """ - + return area * nb_step_cdf(x, x_lo, x_hi, mu, sigma, hstep) -class step_gen(pygama_continuous): +class StepGen(PygamaContinuous): def _argcheck(self, x_lo, x_hi, mu, sigma, hstep): - return (x_hi>x_lo) - + return x_hi > x_lo + def __init__(self, *args, **kwargs): self.x_lo = None self.x_hi = None super().__init__(*args, **kwargs) - def _pdf(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, hstep: float) -> np.ndarray: + def _pdf( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + mu: float, + sigma: float, + hstep: float, + ) -> np.ndarray: x.flags.writeable = True return nb_step_pdf(x, x_lo[0], x_hi[0], mu[0], sigma[0], hstep[0]) - def _cdf(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, hstep: float) -> np.ndarray: + + def _cdf( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + mu: float, + sigma: float, + hstep: float, + ) -> np.ndarray: x.flags.writeable = True return nb_step_cdf(x, x_lo[0], x_hi[0], mu[0], sigma[0], hstep[0]) - def get_pdf(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, hstep: float) -> np.ndarray: + def get_pdf( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + mu: float, + sigma: float, + hstep: float, + ) -> np.ndarray: return nb_step_pdf(x, x_lo, x_hi, mu, sigma, hstep) - def get_cdf(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, hstep: float) -> np.ndarray: + + def get_cdf( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + mu: float, + sigma: float, + hstep: float, + ) -> np.ndarray: return nb_step_cdf(x, x_lo, x_hi, mu, sigma, hstep) # Because step is only defined on a user specified range, we don't need to return a different pdf_norm, just alias get_pdf and get_cdf - def pdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, hstep: float) -> np.ndarray: + def pdf_norm( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + mu: float, + sigma: float, + hstep: float, + ) -> np.ndarray: return nb_step_pdf(x, x_lo, x_hi, mu, sigma, hstep) - def cdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, mu: float, sigma: float, hstep: float) -> np.ndarray: + + def cdf_norm( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + mu: float, + sigma: float, + hstep: float, + ) -> np.ndarray: return nb_step_cdf(x, x_lo, x_hi, mu, sigma, hstep) - def pdf_ext(self, x: np.ndarray, x_lo: float, x_hi: float, area: float, mu: float, sigma: float, hstep: float) -> np.ndarray: - return np.diff(nb_step_scaled_cdf(np.array([x_lo, x_hi]), x_lo, x_hi, area, mu, sigma, hstep))[0], nb_step_scaled_pdf(x, x_lo, x_hi, area, mu, sigma, hstep) - def cdf_ext(self, x: np.ndarray, x_lo: float, x_hi: float, area: float, mu: float, sigma: float, hstep: float) -> np.ndarray: + def pdf_ext( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + area: float, + mu: float, + sigma: float, + hstep: float, + ) -> np.ndarray: + return np.diff( + nb_step_scaled_cdf( + np.array([x_lo, x_hi]), x_lo, x_hi, area, mu, sigma, hstep + ) + )[0], nb_step_scaled_pdf(x, x_lo, x_hi, area, mu, sigma, hstep) + + def cdf_ext( + self, + x: np.ndarray, + x_lo: float, + x_hi: float, + area: float, + mu: float, + sigma: float, + hstep: float, + ) -> np.ndarray: return nb_step_scaled_cdf(x, x_lo, x_hi, area, mu, sigma, hstep) def required_args(self) -> tuple[str, str, str, str, str]: - return "x_lo", "x_hi", "mu", "sigma", "hstep" + return "x_lo", "x_hi", "mu", "sigma", "hstep" + -step = step_gen(name='step') \ No newline at end of file +step = StepGen(name="step") diff --git a/src/pygama/math/functions/sum_dists.py b/src/pygama/math/functions/sum_dists.py index fa2e69a70..c7be6abbc 100644 --- a/src/pygama/math/functions/sum_dists.py +++ b/src/pygama/math/functions/sum_dists.py @@ -1,10 +1,10 @@ r""" -A class that creates the sum of distributions, with methods for scipy computed :func:`pdfs` and :func:`cdfs`, as well as fast :func:`get_pdfs`, and :func:`pdf_ext` +A class that creates the sum of distributions, with methods for scipy computed :func:`pdfs` and :func:`cdfs`, as well as fast :func:`get_pdfs`, and :func:`pdf_ext` .. code-block:: python mu1, mu2, sigma, frac = range(4) - moyal_add = sum_dists([(moyal, [mu1, sigma]), (moyal, [mu2, sigma])], [frac], "fracs") # create two moyals that share a sigma and differ by a fraction, + moyal_add = SumDists([(moyal, [mu1, sigma]), (moyal, [mu2, sigma])], [frac], "fracs") # create two moyals that share a sigma and differ by a fraction, x = np.arange(-10,10) pars = np.array([1, 2, 2, 0.1]) # corresponds to mu1 = 1, mu2 = 2, sigma = 2, frac=0.1 moyal_add.pdf(x, *pars) @@ -12,67 +12,79 @@ moyal_add.required_args() """ -import numba as nb -import numpy as np -import matplotlib.pyplot as plt -from scipy.stats._distn_infrastructure import rv_continuous, rv_frozen -from pygama.math.functions.pygama_continuous import pygama_continuous -from typing import Tuple import inspect +from typing import Tuple + +import matplotlib.pyplot as plt +import numpy as np +from scipy.stats._distn_infrastructure import rv_continuous -from scipy._lib._util import getfullargspec_no_self as _getfullargspec -def get_dists_and_par_idxs(dists_and_pars_array: np.array(tuple, tuple)) -> Tuple[np.array, np.array]: +from pygama.math.functions.pygama_continuous import PygamaContinuous + + +def get_dists_and_par_idxs( + dists_and_pars_array: np.array(tuple, tuple) +) -> Tuple[np.array, np.array]: r""" - Split the array of tuples passed to the :func:`sum_dists` constructor into separate arrays with one containing only the + Split the array of tuples passed to the :func:`SumDists` constructor into separate arrays with one containing only the distributions and the other containing the parameter index arrays. Also performs some sanity checks. Parameters ---------- - dists_and_pars_array - An array of tuples, each tuple contains a distribution and the indices of the parameters - in the :func:`sum_dists` call that correspond to that distribution + dists_and_pars_array + An array of tuples, each tuple contains a distribution and the indices of the parameters + in the :func:`SumDists` call that correspond to that distribution Returns ------- dists, par_idxs - Returned only if every distribution has the correct number of required arguments + Returned only if every distribution has the correct number of required arguments """ # can only sum two dists at once, check: if len(dists_and_pars_array) != 2: raise ValueError("Can only sum two distributions at once.") - + dists = [] par_idxs = [] # Check that each tuple in the dists_and_pars_array is of length two for dist_and_pars in dists_and_pars_array: if len(dist_and_pars) != 2: - raise ValueError("Each tuple needs a distribution and a parameter index array.") + raise ValueError( + "Each tuple needs a distribution and a parameter index array." + ) else: dists.append(dist_and_pars[0]) par_idxs.append(np.array(dist_and_pars[1])) return dists, par_idxs -def get_areas_fracs(params: np.array, area_frac_idxs: np.array, frac_flag: bool, area_flag: bool, one_area_flag: bool) -> Tuple[np.array, np.array]: + +def get_areas_fracs( + params: np.array, + area_frac_idxs: np.array, + frac_flag: bool, + area_flag: bool, + one_area_flag: bool, +) -> Tuple[np.array, np.array]: r""" - Grab the value(s) of either the fraction or the areas passed in the params array from the :func:`sum_dists` call. - If :func:`sum_dists` is in "fracs" mode, then this grabs `f` from the params array and returns fracs = [f, 1-f] and areas of unity. - If :func:`sum_dists` is in "areas" mode, then this grabs `s, b` from the params array and returns unity fracs and areas = [s, b] - If :func:`sum_dists` is in "one_area" mode, then this grabs `s` from the params array and returns unity fracs and areas = [s, 1] + Grab the value(s) of either the fraction or the areas passed in the params array from the :func:`SumDists` call. + If :func:`SumDists` is in "fracs" mode, then this grabs `f` from the params array and returns fracs = [f, 1-f] and areas of unity. + If :func:`SumDists` is in "areas" mode, then this grabs `s, b` from the params array and returns unity fracs and areas = [s, b] + If :func:`SumDists` is in "one_area" mode, then this grabs `s` from the params array and returns unity fracs and areas = [s, 1] Parameters ---------- - params - An array containing the shape values from a :func:`sum_dists` call + params + An array containing the shape values from a :func:`SumDists` call area_frac_idxs An array containing the indices of either the fracs or the areas present in the params array frac_flag - A boolean telling if :func:`sum_dists` is in fracs mode or not + A boolean telling if :func:`SumDists` is in fracs mode or not area_flag - A boolean telling if :func:`sum_dists` is in areas mode or not - one_area_flag - A boolean telling if :func:`sum_dists` is to apply only area to one distribution + A boolean telling if :func:`SumDists` is in areas mode or not + one_area_flag + A boolean telling if :func:`SumDists` is to apply only area to one distribution Returns ------- @@ -80,7 +92,7 @@ def get_areas_fracs(params: np.array, area_frac_idxs: np.array, frac_flag: bool, Values of the fractions and the areas to post-multiply the sum of the distributions with """ if frac_flag: - fracs = np.array([params[area_frac_idxs[0]], 1-params[area_frac_idxs[0]]]) + fracs = np.array([params[area_frac_idxs[0]], 1 - params[area_frac_idxs[0]]]) areas = np.array([1, 1]) elif area_flag: fracs = np.array([1, 1]) @@ -95,19 +107,20 @@ def get_areas_fracs(params: np.array, area_frac_idxs: np.array, frac_flag: bool, return fracs, areas + def get_parameter_names(dists: np.array, par_idxs: np.array, par_size: int) -> np.array: r""" - Returns an array of the names of the required parameters for an instance of :func:`sum_dists` - Works by calling :func:`.required_args` for each distribution present in the sum. - If a parameter is shared between distributions its name is only added once. + Returns an array of the names of the required parameters for an instance of :func:`SumDists` + Works by calling :func:`.required_args` for each distribution present in the sum. + If a parameter is shared between distributions its name is only added once. If two parameters are required and share a name, then the second parameter gets an added index at the end. Parameters ---------- - dists - An array containing the distributions in this instance of :func:`sum_dists` + dists + An array containing the distributions in this instance of :func:`SumDists` par_idxs - An array of arrays, each array contains the indices of the parameters in the :func:`sum_dists` call that correspond to that distribution + An array of arrays, each array contains the indices of the parameters in the :func:`SumDists` call that correspond to that distribution par_size The size of the single parameter index array @@ -116,11 +129,15 @@ def get_parameter_names(dists: np.array, par_idxs: np.array, par_size: int) -> n param_names An array containing the required parameter names """ - param_names = np.empty(par_size+1, dtype=object) + param_names = np.empty(par_size + 1, dtype=object) overall_par_idxs = [] for i in range(len(dists)): - mask = ~np.isin(par_idxs[i], overall_par_idxs) # see if indices of the required args that are not included yet - new_idxs = par_idxs[i][mask] # get indices of the required args that are not included yet + mask = ~np.isin( + par_idxs[i], overall_par_idxs + ) # see if indices of the required args that are not included yet + new_idxs = par_idxs[i][ + mask + ] # get indices of the required args that are not included yet prereq_names = np.array(dists[i].required_args())[mask] req_names = [] # Check for duplicate names @@ -131,57 +148,60 @@ def get_parameter_names(dists: np.array, par_idxs: np.array, par_size: int) -> n else: name = name + "1" req_names.append(name) - param_names[new_idxs] = req_names overall_par_idxs.extend(par_idxs[i]) return param_names + def copy_signature(signature_to_copy, obj_to_copy_to): """ - Copy the signature provided in signature_to_copy into the signature for "obj_to_copy_to". - This is necessary so that we can override the signature for the various methods attached to + Copy the signature provided in signature_to_copy into the signature for "obj_to_copy_to". + This is necessary so that we can override the signature for the various methods attached to different objects. """ + def wrapper(*args, **kwargs): return obj_to_copy_to(*args, **kwargs) + wrapper.__signature__ = signature_to_copy return wrapper -class sum_dists(rv_continuous): + +class SumDists(rv_continuous): r""" - Initialize an rv_continuous method so that we gain access to scipy computable methods. - Precompute the support of the sum of the distributions. + Initialize an rv_continuous method so that we gain access to scipy computable methods. + Precompute the support of the sum of the distributions. - The correct way to initialize is sum_dists([(d1, [p1]), (d2, [p2])], [area_idx_1/frac_idx, area_idx_2/], frac_flag) + The correct way to initialize is SumDists([(d1, [p1]), (d2, [p2])], [area_idx_1/frac_idx, area_idx_2/], frac_flag) Where d_i is a distribution and p_i is a parameter index array for that distribution. - Parameter index arrays contain indices that slice a single parameter array that is passed to method calls. + Parameter index arrays contain indices that slice a single parameter array that is passed to method calls. For example, if the user will eventually pass parameters=[mu, sigma, tau, frac] to function.get_pdf(x, parameters) and the first distribution takes (mu, sigma) as its parameters, then p1=[0,1]. If the second distribution takes (mu, sigma, tau) - then its parameter index array would be p2=[0, 1, 2] because tau is the index 2 entry in parameters. + then its parameter index array would be p2=[0, 1, 2] because tau is the index 2 entry in parameters. - Each par array can contain [x_lo, x_hi, mu, sigma, shape], and *must be placed in that order*. + Each par array can contain [x_lo, x_hi, mu, sigma, shape], and *must be placed in that order*. The single parameter array passed to function calls *must* follow the ordering convention [x_lo, x_hi, frac/areas, shapes_1, frac/areas2, shapes_2] - The single parameter array that is passed to :func:`pdf_norm`, :func:`pdf_ext`, and :func:`cdf_norm` calls *must* have x_lo, x_hi - as its first two arguments if none of the distributions require an explicit definition of their support. + The single parameter array that is passed to :func:`pdf_norm`, :func:`pdf_ext`, and :func:`cdf_norm` calls *must* have x_lo, x_hi + as its first two arguments if none of the distributions require an explicit definition of their support. There are 4 flag options: 1. flag = "areas", two areas passed as fit variables - 2. flag = "fracs", one fraction is passed a a fit variable, with the convention that sum_dists performs f*dist_1 + (1-f)*dist_2 - 3. flag = "one_area", one area is passed a a fit variable, with the convention that sum_dists performs area*dist_1 + 1*dist_2 + 2. flag = "fracs", one fraction is passed a a fit variable, with the convention that SumDists performs f*dist_1 + (1-f)*dist_2 + 3. flag = "one_area", one area is passed a a fit variable, with the convention that SumDists performs area*dist_1 + 1*dist_2 4. flag = None, no areas or fracs passed as fit variables, both are normalized to unity. - Notes + Notes ----- - dists must be unfrozen pygama distributions of the type :func:`pygama_continuous` or :func:`sum_dist`. + dists must be unfrozen pygama distributions of the type :func:`PygamaContinuous` or :func:`sum_dist`. """ def _argcheck(self, *args) -> bool: """ - Check that each distribution gets its own valid arguments. + Check that each distribution gets its own valid arguments. Notes ----- @@ -193,8 +213,16 @@ def _argcheck(self, *args) -> bool: cond = np.logical_and(cond, dist._argcheck(*args[self.par_idxs[i]])) return cond - - def __init__(self, dists_and_pars_array, area_frac_idxs, flag, parameter_names=None, components=False, support_required=False, **kwds): + def __init__( + self, + dists_and_pars_array, + area_frac_idxs, + flag, + parameter_names=None, + components=False, + support_required=False, + **kwds, + ): """ Parameters ---------- @@ -205,11 +233,11 @@ def __init__(self, dists_and_pars_array, area_frac_idxs, flag, parameter_names=N flag One of three strings that initialize :func:`sum_dist` in different modes. Either "fracs", "areas" or "one_area". parameter_names - An optional list of strings that contain the parameters names in the order they will appear in method calls + An optional list of strings that contain the parameters names in the order they will appear in method calls components A boolean that if true will cause methods to return components instead of the sum of the distributions support_required - A boolean that if true tells :func:`sum_dist` that x_lo, x_hi will *always* be passed in method calls. + A boolean that if true tells :func:`sum_dist` that x_lo, x_hi will *always* be passed in method calls. """ # Extract the distributions and parameter index arrays from the constructor dists, par_idxs = get_dists_and_par_idxs(dists_and_pars_array) @@ -222,43 +250,53 @@ def __init__(self, dists_and_pars_array, area_frac_idxs, flag, parameter_names=N # Check that the dists are in fact distributions for i in range(len(dists)): - if (not isinstance(dists[i], pygama_continuous)) and (not isinstance(dists[i], sum_dists)): - raise ValueError(f"Distribution at index {i} has value {dists[i]},\ - and is an array and not a pygama_continuous distribution") + if (not isinstance(dists[i], PygamaContinuous)) and ( + not isinstance(dists[i], SumDists) + ): + raise ValueError( + f"Distribution at index {i} has value {dists[i]},\ + and is an array and not a PygamaContinuous distribution" + ) # Get the parameter names for later introspection # First, find the length of the eventual single parameter array par_size = 0 for par_idx in par_idxs: - if np.amax(par_idx)>=par_size: + if np.amax(par_idx) >= par_size: par_size = np.amax(par_idx) - par_size = np.amax([par_size, np.amax(area_frac_idxs)]) if len(area_frac_idxs)!= 0 else par_size + par_size = ( + np.amax([par_size, np.amax(area_frac_idxs)]) + if len(area_frac_idxs) != 0 + else par_size + ) shapes = get_parameter_names(dists, par_idxs, par_size) - + # Set the internal state depending on what flag was passed if flag == "fracs": if len(area_frac_idxs) != 1: - raise ValueError("sum_dists only accepts the parameter position of one fraction.") + raise ValueError( + "SumDists only accepts the parameter position of one fraction." + ) self.frac_flag = True self.area_flag = False self.one_area_flag = False - shapes[area_frac_idxs[0]] = "f" # add frac name to the shapes + shapes[area_frac_idxs[0]] = "f" # add frac name to the shapes elif flag == "areas": if len(area_frac_idxs) != 2: - raise ValueError("sum_dists needs two parameter indices of areas.") + raise ValueError("SumDists needs two parameter indices of areas.") self.frac_flag = False self.area_flag = True self.one_area_flag = False - shapes[area_frac_idxs] = ["s", "b"] # add area names to the shapes + shapes[area_frac_idxs] = ["s", "b"] # add area names to the shapes elif flag == "one_area": - # needed so that we can create sum_dists from sum_dists without having an overall area to the second dist + # needed so that we can create SumDists from SumDists without having an overall area to the second dist # Sets the area of the second dist to 1 if len(area_frac_idxs) != 1: - raise ValueError("sum_dists needs one parameter index of an area.") + raise ValueError("SumDists needs one parameter index of an area.") self.frac_flag = False self.area_flag = False self.one_area_flag = True - shapes[area_frac_idxs]= "s" # add area name to the shapes + shapes[area_frac_idxs] = "s" # add area name to the shapes else: self.frac_flag = False self.area_flag = False @@ -266,7 +304,7 @@ def __init__(self, dists_and_pars_array, area_frac_idxs, flag, parameter_names=N shapes = list(shapes) # override the parameter names if string is passed to the constructor - if parameter_names != None: + if parameter_names is not None: shapes = parameter_names # If a support is explicitly passed, flag it. @@ -276,77 +314,116 @@ def __init__(self, dists_and_pars_array, area_frac_idxs, flag, parameter_names=N self.support_required = True self.x_lo_idx = shapes.index("x_lo") self.x_hi_idx = shapes.index("x_hi") - extended_shapes = ["x", *shapes] # get the parameter names for methods that require the fit range passed, like extended fits - self.extended_shapes = extended_shapes + extended_shapes = [ + "x", + *shapes, + ] # get the parameter names for methods that require the fit range passed, like extended fits + self.extended_shapes = extended_shapes else: self.x_lo_idx = 0 self.x_hi_idx = 1 - extended_shapes = ["x", "x_lo", "x_hi", *shapes] # get the parameter names for methods that require the fit range passed, like extended fits - self.extended_shapes = extended_shapes + extended_shapes = [ + "x", + "x_lo", + "x_hi", + *shapes, + ] # get the parameter names for methods that require the fit range passed, like extended fits + self.extended_shapes = extended_shapes # Now that we have inferred or set the shapes, store them so that :func:`required_args` can return them self.req_args = shapes - # set atttributes for the methods so that Iminuit can introspect parameter names - x_shapes = ["x", *shapes] # add the x values to the parameter names that are being passed, Iminuit assumes that this is in the first position - self.x_shapes = x_shapes - - # Scipy requires the argument names as one string with commas inbetween each parameter name - shapes = ','.join(str(x) for x in shapes) + # set attributes for the methods so that Iminuit can introspect parameter names + x_shapes = [ + "x", + *shapes, + ] # add the x values to the parameter names that are being passed, Iminuit assumes that this is in the first position + self.x_shapes = x_shapes + # Scipy requires the argument names as one string with commas in between each parameter name + shapes = ",".join(str(x) for x in shapes) super().__init__(self, shapes=shapes, **kwds) def _pdf(self, x, *params): """ - Overload :func:`rv_continuous` definition of pdf in order to access other methods. + Overload :func:`rv_continuous` definition of pdf in order to access other methods. """ - pdfs = self.dists - params = np.array(params)[:,0] # scipy ravels the parameter array... + pdfs = self.dists + params = np.array(params)[:, 0] # scipy ravels the parameter array... - fracs, areas = get_areas_fracs(params, self.area_frac_idxs, self.frac_flag, self.area_flag, self.one_area_flag) + fracs, areas = get_areas_fracs( + params, + self.area_frac_idxs, + self.frac_flag, + self.area_flag, + self.one_area_flag, + ) if self.components: - return areas[0]*fracs[0]*pdfs[0].pdf(x, *params[self.par_idxs[0]]), areas[1]*fracs[1]*pdfs[1].pdf(x, *params[self.par_idxs[1]]) + return areas[0] * fracs[0] * pdfs[0].pdf( + x, *params[self.par_idxs[0]] + ), areas[1] * fracs[1] * pdfs[1].pdf(x, *params[self.par_idxs[1]]) else: # This is faster than list comprehension - probs = areas[0]*fracs[0]*pdfs[0].pdf(x, *params[self.par_idxs[0]]) + areas[1]*fracs[1]*pdfs[1].pdf(x, *params[self.par_idxs[1]]) + probs = areas[0] * fracs[0] * pdfs[0].pdf( + x, *params[self.par_idxs[0]] + ) + areas[1] * fracs[1] * pdfs[1].pdf(x, *params[self.par_idxs[1]]) return probs def _cdf(self, x, *params): - """ - Overload :func:`rv_continuous` definition of cdf in order to access other methods. + """ + Overload :func:`rv_continuous` definition of cdf in order to access other methods. """ cdfs = self.dists - params = np.array(params)[:,0] + params = np.array(params)[:, 0] - fracs, areas = get_areas_fracs(params, self.area_frac_idxs, self.frac_flag, self.area_flag, self.one_area_flag) + fracs, areas = get_areas_fracs( + params, + self.area_frac_idxs, + self.frac_flag, + self.area_flag, + self.one_area_flag, + ) if self.components: - return areas[0]*fracs[0]*cdfs[0].cdf(x, *params[self.par_idxs[0]]) , areas[1]*fracs[1]*cdfs[1].cdf(x, *params[self.par_idxs[1]]) + return areas[0] * fracs[0] * cdfs[0].cdf( + x, *params[self.par_idxs[0]] + ), areas[1] * fracs[1] * cdfs[1].cdf(x, *params[self.par_idxs[1]]) else: # This is faster than list comprehension - probs = areas[0]*fracs[0]*cdfs[0].cdf(x, *params[self.par_idxs[0]]) + areas[1]*fracs[1]*cdfs[1].cdf(x, *params[self.par_idxs[1]]) + probs = areas[0] * fracs[0] * cdfs[0].cdf( + x, *params[self.par_idxs[0]] + ) + areas[1] * fracs[1] * cdfs[1].cdf(x, *params[self.par_idxs[1]]) return probs - def get_pdf(self, x, *params): """ Returns the specified sum of all distributions' :func:`get_pdf` methods. """ - pdfs = self.dists + pdfs = self.dists params = np.array(params) - fracs, areas = get_areas_fracs(params, self.area_frac_idxs, self.frac_flag, self.area_flag, self.one_area_flag) - + fracs, areas = get_areas_fracs( + params, + self.area_frac_idxs, + self.frac_flag, + self.area_flag, + self.one_area_flag, + ) + if self.components: - return areas[0]*fracs[0]*pdfs[0].get_pdf(x, *params[self.par_idxs[0]]), areas[1]*fracs[1]*pdfs[1].get_pdf(x, *params[self.par_idxs[1]]) + return areas[0] * fracs[0] * pdfs[0].get_pdf( + x, *params[self.par_idxs[0]] + ), areas[1] * fracs[1] * pdfs[1].get_pdf(x, *params[self.par_idxs[1]]) else: # This is faster than list comprehension - probs = areas[0]*fracs[0]*pdfs[0].get_pdf(x, *params[self.par_idxs[0]]) + areas[1]*fracs[1]*pdfs[1].get_pdf(x, *params[self.par_idxs[1]]) + probs = areas[0] * fracs[0] * pdfs[0].get_pdf( + x, *params[self.par_idxs[0]] + ) + areas[1] * fracs[1] * pdfs[1].get_pdf(x, *params[self.par_idxs[1]]) return probs def get_cdf(self, x, *params): @@ -356,109 +433,157 @@ def get_cdf(self, x, *params): cdfs = self.dists params = np.array(params) - fracs, areas = get_areas_fracs(params, self.area_frac_idxs, self.frac_flag, self.area_flag, self.one_area_flag) + fracs, areas = get_areas_fracs( + params, + self.area_frac_idxs, + self.frac_flag, + self.area_flag, + self.one_area_flag, + ) if self.components: - return areas[0]*fracs[0]*cdfs[0].get_cdf(x, *params[self.par_idxs[0]]) , areas[1]*fracs[1]*cdfs[1].get_cdf(x, *params[self.par_idxs[1]]) + return areas[0] * fracs[0] * cdfs[0].get_cdf( + x, *params[self.par_idxs[0]] + ), areas[1] * fracs[1] * cdfs[1].get_cdf(x, *params[self.par_idxs[1]]) else: # This is faster than list comprehension - probs = areas[0]*fracs[0]*cdfs[0].get_cdf(x, *params[self.par_idxs[0]]) + areas[1]*fracs[1]*cdfs[1].get_cdf(x, *params[self.par_idxs[1]]) + probs = areas[0] * fracs[0] * cdfs[0].get_cdf( + x, *params[self.par_idxs[0]] + ) + areas[1] * fracs[1] * cdfs[1].get_cdf(x, *params[self.par_idxs[1]]) return probs - def pdf_norm(self, x, *params): """ Returns the specified sum of all distributions' :func:`get_pdf` methods, but normalized on a range x_lo to x_hi. Used in unbinned NLL fits. NOTE: This assumes that x_lo, x_hi are the first two elements in the parameter array, unless x_lo and x_hi are required to define the support - of the :func:`sum_dists` for *all* method calls. For :func:`sum_dists` created from a distribution where the support needs - to be explicitly passed, this means that :func:`pdf_norm` sets the fit range equal to the support range. + of the :func:`SumDists` for *all* method calls. For :func:`SumDists` created from a distribution where the support needs + to be explicitly passed, this means that :func:`pdf_norm` sets the fit range equal to the support range. It also means that summing distributions with two different explicit supports is not allowed, e.g. we cannot sum two step functions with different supports. """ # Grab the values of x_lo and x_hi from the parameters. For functions where the support needs to be defined, this can be anywhere in the parameter array # Otherwise, x_lo and x_hi *must* be the first two values passed. x_lo = params[self.x_lo_idx] x_hi = params[self.x_hi_idx] - if not self.support_required: params = params[2:] # chop off x_lo, x_hi from the shapes to actually pass to get_cdf + if not self.support_required: + params = params[ + 2: + ] # chop off x_lo, x_hi from the shapes to actually pass to get_cdf norm = np.diff(self.get_cdf(np.array([x_lo, x_hi]), *params)) if norm == 0: return np.full_like(x, np.inf) else: - return self.get_pdf(x, *params)/norm + return self.get_pdf(x, *params) / norm def cdf_norm(self, x, *params): """ Returns the specified sum of all distributions' :func:`get_cdf` methods, but normalized on a range x_lo to x_hi. Used in binned NLL fits. NOTE: This assumes that x_lo, x_hi are the first two elements in the parameter array, unless x_lo and x_hi are required to define the support - of the :func:`sum_dists` for *all* method calls. For :func:`sum_dists` created from a distribution where the support needs - to be explicitly passed, this means that :func:`cdf_norm` sets the fit range equal to the support range. + of the :func:`SumDists` for *all* method calls. For :func:`SumDists` created from a distribution where the support needs + to be explicitly passed, this means that :func:`cdf_norm` sets the fit range equal to the support range. It also means that summing distributions with two different explicit supports is not allowed, e.g. we cannot sum two step functions with different supports. """ # Grab the values of x_lo and x_hi from the parameters. For functions where the support needs to be defined, this can be anywhere in the parameter array # Otherwise, x_lo and x_hi *must* be the first two values passed. x_lo = params[self.x_lo_idx] x_hi = params[self.x_hi_idx] - if not self.support_required: params = params[2:] # chop off x_lo, x_hi from the shapes to actually pass to get_cdf + if not self.support_required: + params = params[ + 2: + ] # chop off x_lo, x_hi from the shapes to actually pass to get_cdf norm = np.diff(self.get_cdf(np.array([x_lo, x_hi]), *params)) - + if norm == 0: return np.full_like(x, np.inf) else: - return (self.get_cdf(x, *params))/norm - + return (self.get_cdf(x, *params)) / norm def pdf_ext(self, x, *params): """ Returns the specified sum of all distributions' :func:`pdf_ext` methods, normalized on a range x_lo to x_hi. Used in extended unbinned NLL fits. NOTE: This assumes that x_lo, x_hi are the first two elements in the parameter array, unless x_lo and x_hi are required to define the support - of the :func:`sum_dists` for *all* method calls. For :func:`sum_dists` created from a distribution where the support needs - to be explicitly passed, this means that :func:`pdf_ext` sets the fit range equal to the support range. + of the :func:`SumDists` for *all* method calls. For :func:`SumDists` created from a distribution where the support needs + to be explicitly passed, this means that :func:`pdf_ext` sets the fit range equal to the support range. It also means that summing distributions with two different explicit supports is not allowed, e.g. we cannot sum two step functions with different supports. """ - pdf_exts = self.dists + pdf_exts = self.dists params = np.array(params) # Grab the values of x_lo and x_hi from the parameters. For functions where the support needs to be defined, this can be anywhere in the parameter array # Otherwise, x_lo and x_hi *must* be the first two values passed. x_lo = params[self.x_lo_idx] x_hi = params[self.x_hi_idx] - if not self.support_required: params = params[2:] # chop off x_lo, x_hi from the shapes to actually pass to get_cdf - - fracs, areas = get_areas_fracs(params, self.area_frac_idxs, self.frac_flag, self.area_flag, self.one_area_flag) + if not self.support_required: + params = params[ + 2: + ] # chop off x_lo, x_hi from the shapes to actually pass to get_cdf + + fracs, areas = get_areas_fracs( + params, + self.area_frac_idxs, + self.frac_flag, + self.area_flag, + self.one_area_flag, + ) # sig = areas[0] + areas[1] # this is a hack, it performs faster but may not *always* be true - sig = areas[0]*fracs[0]*np.diff(pdf_exts[0].get_cdf(np.array([x_lo, x_hi]), *params[self.par_idxs[0]]))[0] + areas[1]*fracs[1]*np.diff(pdf_exts[1].get_cdf(np.array([x_lo, x_hi]), *params[self.par_idxs[1]]))[0] - + sig = ( + areas[0] + * fracs[0] + * np.diff( + pdf_exts[0].get_cdf(np.array([x_lo, x_hi]), *params[self.par_idxs[0]]) + )[0] + + areas[1] + * fracs[1] + * np.diff( + pdf_exts[1].get_cdf(np.array([x_lo, x_hi]), *params[self.par_idxs[1]]) + )[0] + ) if self.components: - return sig, areas[0]*fracs[0]*pdf_exts[0].get_pdf(x, *params[self.par_idxs[0]]) , areas[1]*fracs[1]*pdf_exts[1].get_pdf(x, *params[self.par_idxs[1]]) + return ( + sig, + areas[0] * fracs[0] * pdf_exts[0].get_pdf(x, *params[self.par_idxs[0]]), + areas[1] * fracs[1] * pdf_exts[1].get_pdf(x, *params[self.par_idxs[1]]), + ) else: - probs = areas[0]*fracs[0]*pdf_exts[0].get_pdf(x, *params[self.par_idxs[0]]) + areas[1]*fracs[1]*pdf_exts[1].get_pdf(x, *params[self.par_idxs[1]]) + probs = areas[0] * fracs[0] * pdf_exts[0].get_pdf( + x, *params[self.par_idxs[0]] + ) + areas[1] * fracs[1] * pdf_exts[1].get_pdf(x, *params[self.par_idxs[1]]) return sig, probs - def cdf_ext(self, x, *params): - """ + """ Returns the specified sum of all distributions' :func:`get_cdf` methods, used in extended binned NLL fits. """ - cdf_exts = self.dists + cdf_exts = self.dists params = np.array(params) - fracs, areas = get_areas_fracs(params, self.area_frac_idxs, self.frac_flag, self.area_flag, self.one_area_flag) + fracs, areas = get_areas_fracs( + params, + self.area_frac_idxs, + self.frac_flag, + self.area_flag, + self.one_area_flag, + ) if self.components: - return areas[0]*fracs[0]*cdf_exts[0].get_cdf(x, *params[self.par_idxs[0]]) , areas[1]*fracs[1]*cdf_exts[1].get_cdf(x, *params[self.par_idxs[1]]) + return areas[0] * fracs[0] * cdf_exts[0].get_cdf( + x, *params[self.par_idxs[0]] + ), areas[1] * fracs[1] * cdf_exts[1].get_cdf(x, *params[self.par_idxs[1]]) else: # This is faster than list comprehension - probs = areas[0]*fracs[0]*cdf_exts[0].get_cdf(x, *params[self.par_idxs[0]]) + areas[1]*fracs[1]*cdf_exts[1].get_cdf(x, *params[self.par_idxs[1]]) + probs = areas[0] * fracs[0] * cdf_exts[0].get_cdf( + x, *params[self.par_idxs[0]] + ) + areas[1] * fracs[1] * cdf_exts[1].get_cdf(x, *params[self.par_idxs[1]]) return probs def required_args(self) -> list: @@ -466,76 +591,76 @@ def required_args(self) -> list: Returns ------- req_args - A list of the required arguments for the :func:`sum_dists` instance, either passed by the user, or inferred. + A list of the required arguments for the :func:`SumDists` instance, either passed by the user, or inferred. """ return self.req_args - def draw_pdf(self, x: np.ndarray, *params) -> None: plt.plot(x, self.get_pdf(x, *params)) - + def draw_cdf(self, x: np.ndarray, *params) -> None: plt.plot(x, self.get_cdf(x, *params)) + # Create some convenience functions for fitting - # Create some convenience functions for fitting - - def get_mu(self, pars: np.ndarray, cov:np.ndarray = None, errors:np.ndarray = None) -> tuple: + def get_mu( + self, pars: np.ndarray, cov: np.ndarray = None, errors: np.ndarray = None + ) -> tuple: r""" - Get the mu value from the output of a fit quickly + Get the mu value from the output of a fit quickly - Parameters + Parameters ---------- - pars + pars Array of fit parameters - cov + cov Array of covariances - errors - Array of erros + errors + Array of errors - Returns + Returns ------- - mu, error + mu, error where mu is the fit value, and error is either from the covariance matrix or directly passed """ - req_args = np.array(self.required_args()) mu_idx = np.where(req_args == "mu")[0][0] mu = pars[mu_idx] - + if errors is not None: return mu, errors[mu_idx] elif cov is not None: return mu, np.sqrt(cov[mu_idx][mu_idx]) else: return mu - - def get_mode(self, pars: np.ndarray, cov:np.ndarray = None, errors:np.ndarray = None) -> tuple: + + def get_mode( + self, pars: np.ndarray, cov: np.ndarray = None, errors: np.ndarray = None + ) -> tuple: r""" - Get the mode value from the output of a fit quickly + Get the mode value from the output of a fit quickly Need to overload this to use hpge_peak_fwhm (to avoid a circular import) for when self is an hpge peak - Parameters + Parameters ---------- - pars + pars Array of fit parameters - cov + cov Array of covariances - errors - Array of erros + errors + Array of errors - Returns + Returns ------- - mu, error + mu, error where mu is the fit value, and error is either from the covariance matrix or directly passed """ - req_args = np.array(self.required_args()) mu_idx = np.where(req_args == "mu")[0][0] mu = pars[mu_idx] - + if errors is not None: return mu, errors[mu_idx] elif cov is not None: @@ -543,24 +668,23 @@ def get_mode(self, pars: np.ndarray, cov:np.ndarray = None, errors:np.ndarray = else: return mu - def get_fwhm(self, pars: np.ndarray, cov: np.ndarray = None) -> tuple: r""" Get the fwhm value from the output of a fit quickly - Need to overload this to use hpge_peak_fwhm (to avoid a circular import) for when self is an hpge peak, + Need to overload this to use hpge_peak_fwhm (to avoid a circular import) for when self is an hpge peak, and otherwise returns 2sqrt(2log(2))*sigma - Parameters + Parameters ---------- - pars + pars Array of fit parameters - cov + cov Optional, array of covariances for calculating error on the fwhm - Returns + Returns ------- - fwhm, error + fwhm, error the value of the fwhm and its error """ @@ -568,27 +692,29 @@ def get_fwhm(self, pars: np.ndarray, cov: np.ndarray = None) -> tuple: sigma_idx = np.where(req_args == "sigma")[0][0] if cov is None: - return pars[sigma_idx]*2*np.sqrt(2*np.log(2)) + return pars[sigma_idx] * 2 * np.sqrt(2 * np.log(2)) else: - return pars[sigma_idx]*2*np.sqrt(2*np.log(2)), np.sqrt(cov[sigma_idx][sigma_idx])*2*np.sqrt(2*np.log(2)) - + return pars[sigma_idx] * 2 * np.sqrt(2 * np.log(2)), np.sqrt( + cov[sigma_idx][sigma_idx] + ) * 2 * np.sqrt(2 * np.log(2)) + def get_fwfm(self, pars: np.ndarray, cov: np.ndarray = None, frac_max=0.5) -> tuple: r""" Get the fwfm value from the output of a fit quickly - Need to overload this to use hpge_peak_fwfm (to avoid a circular import) for when self is an hpge peak, + Need to overload this to use hpge_peak_fwfm (to avoid a circular import) for when self is an hpge peak, and otherwise returns 2sqrt(2log(2))*sigma - Parameters + Parameters ---------- - pars + pars Array of fit parameters - cov + cov Optional, array of covariances for calculating error on the fwhm - Returns + Returns ------- - fwhm, error + fwhm, error the value of the fwhm and its error """ @@ -596,43 +722,48 @@ def get_fwfm(self, pars: np.ndarray, cov: np.ndarray = None, frac_max=0.5) -> tu sigma_idx = np.where(req_args == "sigma")[0][0] if cov is None: - return pars[sigma_idx]*2*np.sqrt(-2*np.log(frac_max)) + return pars[sigma_idx] * 2 * np.sqrt(-2 * np.log(frac_max)) else: - return pars[sigma_idx]*2*np.sqrt(-2*np.log(frac_max)), np.sqrt(cov[sigma_idx][sigma_idx])*2*np.sqrt(-2*np.log(frac_max)) + return pars[sigma_idx] * 2 * np.sqrt(-2 * np.log(frac_max)), np.sqrt( + cov[sigma_idx][sigma_idx] + ) * 2 * np.sqrt(-2 * np.log(frac_max)) - - def get_total_events(self, pars: np.ndarray, cov: np.ndarray = None, errors: np.ndarray =None) -> tuple: + def get_total_events( + self, pars: np.ndarray, cov: np.ndarray = None, errors: np.ndarray = None + ) -> tuple: r""" Get the total events from the output of an extended fit quickly - The total number of events come from the sum of the area of the signal and the area of the background components + The total number of events come from the sum of the area of the signal and the area of the background components - Parameters + Parameters ---------- - pars + pars Array of fit parameters - cov + cov Array of covariances - errors - Array of erros + errors + Array of errors - Returns + Returns ------- - n_total, error - the total number of events in a spectrum and its associated error + n_total, error + the total number of events in a spectrum and its associated error """ - + req_args = np.array(self.required_args()) n_sig_idx = np.where(req_args == "n_sig")[0][0] n_bkg_idx = np.where(req_args == "n_bkg")[0][0] - if errors is not None: - return pars[n_sig_idx]+pars[n_bkg_idx], np.sqrt(errors[n_sig_idx]**2 + errors[n_bkg_idx]**2) + return pars[n_sig_idx] + pars[n_bkg_idx], np.sqrt( + errors[n_sig_idx] ** 2 + errors[n_bkg_idx] ** 2 + ) elif cov is not None: - return pars[n_sig_idx]+pars[n_bkg_idx], np.sqrt(cov[n_sig_idx][n_sig_idx]**2 + cov[n_bkg_idx][n_bkg_idx]**2) + return pars[n_sig_idx] + pars[n_bkg_idx], np.sqrt( + cov[n_sig_idx][n_sig_idx] ** 2 + cov[n_bkg_idx][n_bkg_idx] ** 2 + ) else: - return pars[n_sig_idx]+pars[n_bkg_idx] - + return pars[n_sig_idx] + pars[n_bkg_idx] def __getattribute__(self, attr): """ @@ -640,14 +771,20 @@ def __getattribute__(self, attr): """ value = object.__getattribute__(self, attr) - if attr in ['get_pdf','get_cdf', 'cdf_ext']: - params = [inspect.Parameter(param, - inspect.Parameter.POSITIONAL_OR_KEYWORD) - for param in self.x_shapes] + if attr in ["get_pdf", "get_cdf", "cdf_ext"]: + params = [ + inspect.Parameter(param, inspect.Parameter.POSITIONAL_OR_KEYWORD) + for param in self.x_shapes + ] value = copy_signature(inspect.Signature(params), value) - if attr in ['pdf_norm', 'pdf_ext', 'cdf_norm']: # Set these to include the x_lo, x_hi at the correct positions since these always need those params - params = [inspect.Parameter(param, - inspect.Parameter.POSITIONAL_OR_KEYWORD) - for param in self.extended_shapes] + if attr in [ + "pdf_norm", + "pdf_ext", + "cdf_norm", + ]: # Set these to include the x_lo, x_hi at the correct positions since these always need those params + params = [ + inspect.Parameter(param, inspect.Parameter.POSITIONAL_OR_KEYWORD) + for param in self.extended_shapes + ] value = copy_signature(inspect.Signature(params), value) - return value \ No newline at end of file + return value diff --git a/src/pygama/math/functions/triple_gauss_on_double_step.py b/src/pygama/math/functions/triple_gauss_on_double_step.py index bd60d3d42..105b7bb41 100644 --- a/src/pygama/math/functions/triple_gauss_on_double_step.py +++ b/src/pygama/math/functions/triple_gauss_on_double_step.py @@ -1,54 +1,128 @@ r""" Provide a convenience function for three gaussians on two steps -Parameters +Parameters ---------- x_lo - The lower range to compute the normalization of the step functions + The lower range to compute the normalization of the step functions x_hi - The upper range to compute the normalization of the step functions -area1, mu_1, sigma_1 + The upper range to compute the normalization of the step functions +area1, mu_1, sigma_1 The area, location, and scale of the first Gaussian area2, mu_2, sigma_2 - The area, location, and scale of the second Gaussian -area3, mu_3, sigma_3 + The area, location, and scale of the second Gaussian +area3, mu_3, sigma_3 The area, location, and scale of the third Gaussian -area4, hstep_1 +area4, hstep_1 The area and height of the first step function area5, hstep_2 - The area and height of the second step function + The area and height of the second step function Example -------- +------- triple_gauss_on_double_step.get_pdf(x, pars = [x_lo, x_hi, n_sig1, mu1, sigma1, n_sig2, mu2, sigma2, n_sig3, mu3, sigma3, n_bkg1, hstep1, n_bkg2, hstep2]) -Returns +Returns ------- triple_gauss_on_double_step - A subclass of sum_dists and rv_continuous, has methods of pdf, cdf, etc. + A subclass of SumDists and rv_continuous, has methods of pdf, cdf, etc. -Notes ------ +Notes +----- The first step function shares the mu_1, sigma_1 with the first Gaussian, and the second step function shares the mu_2, sigma_2 with the second Gaussian """ -import sys - -import numba as nb -import numpy as np - -from scipy.stats import rv_continuous -from pygama.math.functions.sum_dists import sum_dists from pygama.math.functions.gauss import gaussian from pygama.math.functions.gauss_on_step import gauss_on_step +from pygama.math.functions.sum_dists import SumDists -(x_lo, x_hi, area1, mu1, sigma1, area4, hstep1, area2, mu2, sigma2, area5, hstep2) = range(12) -par_array = [(gauss_on_step, [x_lo, x_hi, area1, mu1, sigma1, area4, hstep1]), (gauss_on_step, [x_lo, x_hi, area2, mu2, sigma2, area5, hstep2])] +(x_lo, x_hi, area1, mu1, sigma1, area4, hstep1, area2, mu2, sigma2, area5, hstep2) = ( + range(12) +) +par_array = [ + (gauss_on_step, [x_lo, x_hi, area1, mu1, sigma1, area4, hstep1]), + (gauss_on_step, [x_lo, x_hi, area2, mu2, sigma2, area5, hstep2]), +] -double_gauss_on_double_step = sum_dists(par_array, [], None, parameter_names = ["x_lo", "x_hi", "area1", "mu1", "sigma1", "area4", "hstep1", "area2", "mu2", "sigma2", "area5", "hstep2"]) +double_gauss_on_double_step = SumDists( + par_array, + [], + None, + parameter_names=[ + "x_lo", + "x_hi", + "area1", + "mu1", + "sigma1", + "area4", + "hstep1", + "area2", + "mu2", + "sigma2", + "area5", + "hstep2", + ], +) -(x_lo, x_hi, area1, mu1, sigma1, area2, mu2, sigma2, area3, mu3, sigma3, area4, hstep1, area5, hstep2) = range(15) +( + x_lo, + x_hi, + area1, + mu1, + sigma1, + area2, + mu2, + sigma2, + area3, + mu3, + sigma3, + area4, + hstep1, + area5, + hstep2, +) = range(15) -par_array = [(gaussian, [mu3, sigma3]), (double_gauss_on_double_step, [x_lo, x_hi, area1, mu1, sigma1, area4, hstep1, area2, mu2, sigma2, area5, hstep2])] -triple_gauss_on_double_step = sum_dists(par_array, [area3], "one_area", parameter_names=["x_lo", "x_hi", "n_sig1", "mu1", "sigma1", "n_sig2", "mu2", "sigma2", "n_sig3", "mu3", "sigma3", "n_bkg1", "hstep1", "n_bkg2", "hstep2"], name="triple_gauss_on_double_step") \ No newline at end of file +par_array = [ + (gaussian, [mu3, sigma3]), + ( + double_gauss_on_double_step, + [ + x_lo, + x_hi, + area1, + mu1, + sigma1, + area4, + hstep1, + area2, + mu2, + sigma2, + area5, + hstep2, + ], + ), +] +triple_gauss_on_double_step = SumDists( + par_array, + [area3], + "one_area", + parameter_names=[ + "x_lo", + "x_hi", + "n_sig1", + "mu1", + "sigma1", + "n_sig2", + "mu2", + "sigma2", + "n_sig3", + "mu3", + "sigma3", + "n_bkg1", + "hstep1", + "n_bkg2", + "hstep2", + ], + name="triple_gauss_on_double_step", +) diff --git a/src/pygama/math/functions/uniform.py b/src/pygama/math/functions/uniform.py index 05bb2ac95..55d8ce695 100644 --- a/src/pygama/math/functions/uniform.py +++ b/src/pygama/math/functions/uniform.py @@ -2,16 +2,13 @@ Uniform distributions for pygama """ -import sys - - import numba as nb import numpy as np from numba import prange -from pygama.math.functions.pygama_continuous import pygama_continuous -from pygama.utils import numba_math_defaults_kwargs as nb_kwargs +from pygama.math.functions.pygama_continuous import PygamaContinuous from pygama.utils import numba_math_defaults as nb_defaults +from pygama.utils import numba_math_defaults_kwargs as nb_kwargs @nb.njit(**nb_kwargs) @@ -36,12 +33,12 @@ def nb_uniform_pdf(x: np.ndarray, a: float, b: float) -> np.ndarray: b The upper edge of the distribution """ - b = a+b # gives dist on [a, a+b] like scipy's does - w = b-a + b = a + b # gives dist on [a, a+b] like scipy's does + w = b - a p = np.empty_like(x, np.float64) for i in prange(x.shape[0]): if a <= x[i] <= b: - p[i] = 1/w + p[i] = 1 / w else: p[i] = 0 return p @@ -70,14 +67,14 @@ def nb_uniform_cdf(x: np.ndarray, a: float, b: float) -> np.ndarray: The upper edge of the distribution """ - b = a+b # gives dist on [a, a+b] like scipy's does - w = b-a + b = a + b # gives dist on [a, a+b] like scipy's does + w = b - a p = np.empty_like(x, np.float64) for i in prange(x.shape[0]): if a <= x[i]: if x[i] <= b: - p[i] = (x[i]-a)/w - else: + p[i] = (x[i] - a) / w + else: p[i] = 1 else: p[i] = 0 @@ -102,7 +99,7 @@ def nb_uniform_scaled_pdf(x: np.ndarray, area: float, a: float, b: float) -> np. The lower edge of the distribution b The upper edge of the distribution - """ + """ return area * nb_uniform_pdf(x, a, b) @@ -110,7 +107,7 @@ def nb_uniform_scaled_pdf(x: np.ndarray, area: float, a: float, b: float) -> np. @nb.njit(**nb_defaults(parallel=False)) def nb_uniform_scaled_cdf(x: np.ndarray, area: float, a: float, b: float) -> np.ndarray: r""" - Uniform cdf scaled by the area, used for extended binned fits + Uniform cdf scaled by the area, used for extended binned fits As a Numba JIT function, it runs slightly faster than 'out of the box' functions. @@ -125,19 +122,19 @@ def nb_uniform_scaled_cdf(x: np.ndarray, area: float, a: float, b: float) -> np. The lower edge of the distribution b The upper edge of the distribution - """ - + """ + return area * nb_uniform_cdf(x, a, b) -class uniform_gen(pygama_continuous): +class UniformGen(PygamaContinuous): def _get_support(self, a, b): return a, b def _argcheck(self, a, b): - return (b>a) - + return b > a + def __init__(self, *args, **kwargs): self.x_lo = None self.x_hi = None @@ -145,25 +142,38 @@ def __init__(self, *args, **kwargs): def _pdf(self, x: np.ndarray, a, b) -> np.ndarray: return nb_uniform_pdf(x, a[0], b[0]) + def _cdf(self, x: np.ndarray, a, b) -> np.ndarray: return nb_uniform_cdf(x, a[0], b[0]) def get_pdf(self, x: np.ndarray, a: float, b: float) -> np.ndarray: - return nb_uniform_pdf(x, a, b) + return nb_uniform_pdf(x, a, b) + def get_cdf(self, x: np.ndarray, a: float, b: float) -> np.ndarray: return nb_uniform_cdf(x, a, b) - def pdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, a: float, b: float) -> np.ndarray: + def pdf_norm( + self, x: np.ndarray, x_lo: float, x_hi: float, a: float, b: float + ) -> np.ndarray: return self._pdf_norm(x, x_lo, x_hi, a, b) - def cdf_norm(self, x: np.ndarray, x_lo: float, x_hi: float, a: float, b: float) -> np.ndarray: + + def cdf_norm( + self, x: np.ndarray, x_lo: float, x_hi: float, a: float, b: float + ) -> np.ndarray: return self._cdf_norm(x, x_lo, x_hi, a, b) - def pdf_ext(self, x: np.ndarray, x_lo: float, x_hi:float, area: float, a: float, b: float) -> np.ndarray: - return np.diff(nb_uniform_scaled_cdf(np.array([x_lo, x_hi]), area, a, b))[0], nb_uniform_scaled_pdf(x, area, a, b) + def pdf_ext( + self, x: np.ndarray, x_lo: float, x_hi: float, area: float, a: float, b: float + ) -> np.ndarray: + return np.diff(nb_uniform_scaled_cdf(np.array([x_lo, x_hi]), area, a, b))[ + 0 + ], nb_uniform_scaled_pdf(x, area, a, b) + def cdf_ext(self, x: np.ndarray, area: float, a: float, b: float) -> np.ndarray: return nb_uniform_scaled_cdf(x, area, a, b) def required_args(self) -> tuple[str, str]: return "a", "b" -uniform = uniform_gen(a=0.0, b=1.0, name='uniform') \ No newline at end of file + +uniform = UniformGen(a=0.0, b=1.0, name="uniform") diff --git a/src/pygama/math/histogram.py b/src/pygama/math/histogram.py index 45cb89c7b..f76c83b85 100644 --- a/src/pygama/math/histogram.py +++ b/src/pygama/math/histogram.py @@ -13,21 +13,28 @@ they will help you if you need to do something trickier than is provided (e.g. 2D hists). """ + import logging -from typing import Optional, Union, Callable +from typing import Callable, Optional, Union +import hist as bh import matplotlib.pyplot as plt +import numba as nb import numpy as np from matplotlib import rcParams -import boost_histogram as bh -import numba as nb import pygama.math.utils as pgu log = logging.getLogger(__name__) -def get_hist(data: np.ndarray, bins: Optional[Union[int, np.ndarray, str]] = None, range: Optional[tuple[float, float]] = None, dx: Optional[float] = None, wts: Optional[Union[float, np.ndarray]] = None) -> tuple[np.ndarray, ...]: +def get_hist( + data: np.ndarray, + bins: Optional[Union[int, np.ndarray, str]] = None, + range: Optional[tuple[float, float]] = None, + dx: Optional[float] = None, + wts: Optional[Union[float, np.ndarray]] = None, +) -> tuple[np.ndarray, ...]: """return hist, bins, var after binning data This is just a wrapper for humba.histogram, with optional weights for each @@ -38,7 +45,7 @@ def get_hist(data: np.ndarray, bins: Optional[Union[int, np.ndarray, str]] = Non Available binning methods: - Default (no binning arguments) : 100 bins over an auto-detected range - - bins=N, range=(x_lo, x_hi) : N bins over the specified range (or leave + - bins=n, range=(x_lo, x_hi) : n bins over the specified range (or leave range=None for auto-detected range) - bins=[str] : use one of np.histogram's automatic binning algorithms - bins=bin_edges_array : array lower bin edges, supports non-uniform binning @@ -78,7 +85,7 @@ def get_hist(data: np.ndarray, bins: Optional[Union[int, np.ndarray, str]] = Non array of variances in each bin of the histogram """ if bins is None: - bins = 100 # override boost_histogram.Histogram default of just 10 + bins = 100 # override boost_histogram.Histogram default of just 10 if dx is not None: bins = int((range[1] - range[0]) / dx) @@ -88,14 +95,18 @@ def get_hist(data: np.ndarray, bins: Optional[Union[int, np.ndarray, str]] = Non # bins includes left edge of first bin and right edge of all other bins # allow scalar weights - if wts is not None and np.shape(wts) == (): wts = np.full_like(data, wts) + if wts is not None and np.shape(wts) == (): + wts = np.full_like(data, wts) - # initialize the boost_histogram object + # initialize the boost_histogram object if isinstance(bins, int): - boost_histogram = bh.Histogram(bh.axis.Regular(bins=bins, start=range[0], stop=range[1]), storage=bh.storage.Weight()) + boost_histogram = bh.Hist( + bh.axis.Regular(bins=bins, start=range[0], stop=range[1]), + storage=bh.storage.Weight(), + ) else: # if bins are specified need to use variable - boost_histogram = bh.Histogram(bh.axis.Variable(bins), storage=bh.storage.Weight()) + boost_histogram = bh.Hist(bh.axis.Variable(bins), storage=bh.storage.Weight()) # create the histogram boost_histogram.fill(data, weight=wts) # read out the histogram, bins, and variances @@ -105,8 +116,10 @@ def get_hist(data: np.ndarray, bins: Optional[Union[int, np.ndarray, str]] = Non return hist, bins, var -def better_int_binning(x_lo: float = 0, x_hi: float = None, dx: float = None, n_bins: float = None) -> tuple[int, int, int, int]: - """ Get a good binning for integer data. +def better_int_binning( + x_lo: float = 0, x_hi: float = None, dx: float = None, n_bins: float = None +) -> tuple[int, int, int, int]: + """Get a good binning for integer data. Guarantees an integer bin width. @@ -135,42 +148,46 @@ def better_int_binning(x_lo: float = 0, x_hi: float = None, dx: float = None, n_ best int n_bins, returned if arg n_bins is not None """ # process inputs - n_Nones = int(x_hi is None) + int(dx is None) + int(n_bins is None) - if n_Nones > 1: - raise ValueError('must provide two of x_hi, dx or n_bins') - if n_Nones == 0: - log.warning('Overconstrained. Ignoring x_hi.') + n_nones = int(x_hi is None) + int(dx is None) + int(n_bins is None) + if n_nones > 1: + raise ValueError("must provide two of x_hi, dx or n_bins") + if n_nones == 0: + log.warning("Overconstrained. Ignoring x_hi.") x_hi = None # get valid dx or n_bins if dx is not None: if dx <= 0: - raise ValueError(f'invalid dx={dx}') + raise ValueError(f"invalid dx={dx}") dx = np.round(dx) - if dx == 0: dx = 1 + if dx == 0: + dx = 1 if n_bins is not None: if n_bins <= 0: - raise ValueError(f'invalid n_bins={n_bins}') + raise ValueError(f"invalid n_bins={n_bins}") n_bins = np.round(n_bins) # can already return if no x_hi - if x_hi is None: # must have both dx and n_bins + if x_hi is None: # must have both dx and n_bins return int(x_lo), int(dx), int(n_bins) # x_hi is valid. Get a valid dx if we don't have one - if dx is None: # must have n_bins - dx = np.round((x_hi-x_lo)/n_bins) - if dx == 0: dx = 1 + if dx is None: # must have n_bins + dx = np.round((x_hi - x_lo) / n_bins) + if dx == 0: + dx = 1 # Finally, build a good binning from dx - n_bins = np.ceil((x_hi-x_lo)/dx) + n_bins = np.ceil((x_hi - x_lo) / dx) x_lo = np.floor(x_lo) - x_hi = x_lo + n_bins*dx - if n_bins is None: return int(x_lo), int(x_hi), int(dx) - else: return int(x_lo), int(x_hi), int(n_bins) + x_hi = x_lo + n_bins * dx + if n_bins is None: + return int(x_lo), int(x_hi), int(dx) + else: + return int(x_lo), int(x_hi), int(n_bins) -@nb.njit(parallel = False, fastmath = True) +@nb.njit(parallel=False, fastmath=True) def get_bin_centers(bins: np.ndarray) -> np.ndarray: """ Returns an array of bin centers from an input array of bin edges. @@ -186,10 +203,10 @@ def get_bin_centers(bins: np.ndarray) -> np.ndarray: bin_centers The array of bin centers """ - return (bins[:-1] + bins[1:]) / 2. + return (bins[:-1] + bins[1:]) / 2.0 -@nb.njit(parallel = False, fastmath = True) +@nb.njit(parallel=False, fastmath=True) def get_bin_widths(bins: np.ndarray) -> np.ndarray: """ Returns an array of bin widths from an input array of bin edges. @@ -205,10 +222,10 @@ def get_bin_widths(bins: np.ndarray) -> np.ndarray: bin_widths An array of bin widths """ - return (bins[1:] - bins[:-1]) + return bins[1:] - bins[:-1] -@nb.njit(parallel = False, fastmath = True) +@nb.njit(parallel=False, fastmath=True) def find_bin(x: float, bins: np.ndarray) -> int: """ Returns the index of the bin containing x @@ -222,7 +239,7 @@ def find_bin(x: float, bins: np.ndarray) -> int: The value to search for amongst the bins bins The input array of bin-edges - + Returns ------- bin_idx @@ -231,45 +248,66 @@ def find_bin(x: float, bins: np.ndarray) -> int: TODO: replace np.searchsorted with for loops, numba will speed this function up then """ # first handle overflow / underflow - if len(bins) == 0: return 0 # i.e. overflow - if x < bins[0]: return -1 - if x > bins[-1]: return len(bins) + if len(bins) == 0: + return 0 # i.e. overflow + if x < bins[0]: + return -1 + if x > bins[-1]: + return len(bins) # we are definitely in range, and there are at least 2 bin edges, one below # and one above x. try assuming uniform bins - dx = bins[1]-bins[0] - index = int(np.floor((x-bins[0])/dx)) - if bins[index] <= x and bins[index+1] > x: return index + dx = bins[1] - bins[0] + index = int(np.floor((x - bins[0]) / dx)) + if bins[index] <= x and bins[index + 1] > x: + return index # bins are non-uniform: find by binary search - return np.searchsorted(bins, x, side='right') + return np.searchsorted(bins, x, side="right") -def range_slice(x_min: float, x_max: float, hist: np.ndarray, bins: np.ndarray, var: Optional[np.ndarray]=None) -> tuple[np.ndarray, ...]: +def range_slice( + x_min: float, + x_max: float, + hist: np.ndarray, + bins: np.ndarray, + var: Optional[np.ndarray] = None, +) -> tuple[np.ndarray, ...]: """ Get the histogram bins and values for a given slice of the range - Parameters - ---------- - x_min - Lower endpoint of the range - x_min - Upper endpoint of the range - hist, bins, var + Parameters + ---------- + x_min + Lower endpoint of the range + x_min + Upper endpoint of the range + hist, bins, var The histogrammed data to search through See Also -------- - find_bin + find_bin for parameters and return values """ i_min = find_bin(x_min, bins) i_max = find_bin(x_max, bins) - if var is not None: var = var[i_min:i_max] - return hist[i_min:i_max], bins[i_min:i_max+1], var - - -def get_fwhm(hist: np.ndarray, bins: np.ndarray, var: Optional[np.ndarray]=None, mx: Optional[Union[float , tuple[float, float]]]=None, dmx: Optional[float]=0, bl: Optional[Union[float , tuple[float, float]]]=0, dbl: Optional[float]=0, method: str='bins_over_f', n_slope: int=3) -> tuple[float, float]: + if var is not None: + var = var[i_min:i_max] + return hist[i_min:i_max], bins[i_min : i_max + 1], var + + +def get_fwhm( + hist: np.ndarray, + bins: np.ndarray, + var: Optional[np.ndarray] = None, + mx: Optional[Union[float, tuple[float, float]]] = None, + dmx: Optional[float] = 0, + bl: Optional[Union[float, tuple[float, float]]] = 0, + dbl: Optional[float] = 0, + method: str = "bins_over_f", + n_slope: int = 3, +) -> tuple[float, float]: """Convenience function for the FWHM of data in a histogram Typically used by sending slices around a peak. Searches about argmax(hist) @@ -323,11 +361,24 @@ def get_fwhm(hist: np.ndarray, bins: np.ndarray, var: Optional[np.ndarray]=None, Function that computes the FWFM """ if len(bins) == len(hist): - log.warning("note: this function has been updated to require bins rather than bin_centers. Don't trust this result") + log.warning( + "note: this function has been updated to require bins rather than bin_centers. Don't trust this result" + ) return get_fwfm(0.5, hist, bins, var, mx, dmx, bl, dbl, method, n_slope) -def get_fwfm(fraction: float, hist: np.ndarray, bins: np.ndarray, var: Optional[np.ndarray]=None, mx: Optional[Union[float , tuple[float, float]]]=None, dmx: Optional[float]=0, bl: Optional[Union[float , tuple[float, float]]]=0, dbl: Optional[float]=0, method: str='bins_over_f', n_slope: int=3) -> tuple[float, float]: +def get_fwfm( + fraction: float, + hist: np.ndarray, + bins: np.ndarray, + var: Optional[np.ndarray] = None, + mx: Optional[Union[float, tuple[float, float]]] = None, + dmx: Optional[float] = 0, + bl: Optional[Union[float, tuple[float, float]]] = 0, + dbl: Optional[float] = 0, + method: str = "bins_over_f", + n_slope: int = 3, +) -> tuple[float, float]: """ Estimate the full width at some fraction of the max of data in a histogram @@ -394,7 +445,7 @@ def get_fwfm(fraction: float, hist: np.ndarray, bins: np.ndarray, var: Optional[ mx = np.amax(hist) if var is not None and dmx == 0: dmx = np.sqrt(var[np.argmax(hist)]) - idxs_over_f = hist > (bl + fraction * (mx-bl)) + idxs_over_f = hist > (bl + fraction * (mx - bl)) # argmax will return the index of the first occurrence of a maximum # so we can use it to find the first and last time idxs_over_f is "True" @@ -403,101 +454,121 @@ def get_fwfm(fraction: float, hist: np.ndarray, bins: np.ndarray, var: Optional[ bin_centers = get_bin_centers(bins) # precalc dheight: uncertainty in height used as the threshold - dheight2 = (fraction*dmx)**2 + ((1-fraction)*dbl)**2 + dheight2 = (fraction * dmx) ** 2 + ((1 - fraction) * dbl) ** 2 - if method == 'bins_over_f': + if method == "bins_over_f": # the simplest method: just take the difference in the bin centers fwfm = bin_centers[bin_hi] - bin_centers[bin_lo] # compute rough uncertainty as [bin width] (+) [dheight / slope] - dx = bin_centers[bin_lo] - bin_centers[bin_lo-1] - dy = hist[bin_lo] - hist[bin_lo-1] - if dy == 0: dy = (hist[bin_lo+1] - hist[bin_lo-2])/3 - dfwfm2 = dx**2 + dheight2 * (dx/dy)**2 - dx = bin_centers[bin_hi+1] - bin_centers[bin_hi] - dy = hist[bin_hi] - hist[bin_hi+1] - if dy == 0: dy = (hist[bin_hi-1] - hist[bin_hi+2])/3 - dfwfm2 += dx**2 + dheight2 * (dx/dy)**2 + dx = bin_centers[bin_lo] - bin_centers[bin_lo - 1] + dy = hist[bin_lo] - hist[bin_lo - 1] + if dy == 0: + dy = (hist[bin_lo + 1] - hist[bin_lo - 2]) / 3 + dfwfm2 = dx**2 + dheight2 * (dx / dy) ** 2 + dx = bin_centers[bin_hi + 1] - bin_centers[bin_hi] + dy = hist[bin_hi] - hist[bin_hi + 1] + if dy == 0: + dy = (hist[bin_hi - 1] - hist[bin_hi + 2]) / 3 + dfwfm2 += dx**2 + dheight2 * (dx / dy) ** 2 return fwfm, np.sqrt(dfwfm2) - elif method == 'interpolate': + elif method == "interpolate": # interpolate between the two bins that cross the [fraction] line # works well for high stats - if bin_lo < 1 or bin_hi >= len(hist)-1: + if bin_lo < 1 or bin_hi >= len(hist) - 1: raise ValueError(f"Can't interpolate ({bin_lo}, {bin_hi})") - val_f = bl + fraction*(mx-bl) + val_f = bl + fraction * (mx - bl) # x_lo - dx = bin_centers[bin_lo] - bin_centers[bin_lo-1] - dhf = val_f - hist[bin_lo-1] - dh = hist[bin_lo] - hist[bin_lo-1] - x_lo = bin_centers[bin_lo-1] + dx * dhf/dh + dx = bin_centers[bin_lo] - bin_centers[bin_lo - 1] + dhf = val_f - hist[bin_lo - 1] + dh = hist[bin_lo] - hist[bin_lo - 1] + x_lo = bin_centers[bin_lo - 1] + dx * dhf / dh # uncertainty dx2_lo = 0 if var is not None: - dx2_lo = (dhf/dh)**2 * var[bin_lo] + ((dh-dhf)/dh)**2 * var[bin_lo-1] - dx2_lo *= (dx/dh)**2 - dDdh = -dx/dh + dx2_lo = (dhf / dh) ** 2 * var[bin_lo] + ((dh - dhf) / dh) ** 2 * var[ + bin_lo - 1 + ] + dx2_lo *= (dx / dh) ** 2 + dd_dh = -dx / dh # x_hi - dx = bin_centers[bin_hi+1] - bin_centers[bin_hi] + dx = bin_centers[bin_hi + 1] - bin_centers[bin_hi] dhf = hist[bin_hi] - val_f - dh = hist[bin_hi] - hist[bin_hi+1] + dh = hist[bin_hi] - hist[bin_hi + 1] if dh == 0: - raise ValueError(f"Interpolation failed, dh == 0") - x_hi = bin_centers[bin_hi] + dx * dhf/dh + raise ValueError("Interpolation failed, dh == 0") + x_hi = bin_centers[bin_hi] + dx * dhf / dh if x_hi < x_lo: - raise ValueError(f"Interpolation produced negative fwfm") + raise ValueError("Interpolation produced negative fwfm") # uncertainty dx2_hi = 0 if var is not None: - dx2_hi = (dhf/dh)**2 * var[bin_hi+1] + ((dh-dhf)/dh)**2 * var[bin_hi] - dx2_hi *= (dx/dh)**2 - dDdh += dx/dh + dx2_hi = (dhf / dh) ** 2 * var[bin_hi + 1] + ((dh - dhf) / dh) ** 2 * var[ + bin_hi + ] + dx2_hi *= (dx / dh) ** 2 + dd_dh += dx / dh - return x_hi - x_lo, np.sqrt(dx2_lo + dx2_hi + dDdh**2 * dheight2) + return x_hi - x_lo, np.sqrt(dx2_lo + dx2_hi + dd_dh**2 * dheight2) - elif method == 'fit_slopes': + elif method == "fit_slopes": # evaluate the [fraction] point on a line fit to n_slope bins near the crossing. # works okay even when stats are moderate - val_f = bl + fraction*(mx-bl) + val_f = bl + fraction * (mx - bl) # x_lo - i_0 = bin_lo - int(np.floor(n_slope/2)) + i_0 = bin_lo - int(np.floor(n_slope / 2)) if i_0 < 0: - raise RuntimeError(f"Fit slopes failed") + raise RuntimeError("Fit slopes failed") i_n = i_0 + n_slope - wts = None if var is None else 1/np.sqrt(var[i_0:i_n]) #fails for any var = 0 + wts = ( + None if var is None else 1 / np.sqrt(var[i_0:i_n]) + ) # fails for any var = 0 wts = [w if w != np.inf else 0 for w in wts] try: - (m, b), cov = np.polyfit(bin_centers[i_0:i_n], hist[i_0:i_n], 1, w=wts, cov='unscaled') + (m, b), cov = np.polyfit( + bin_centers[i_0:i_n], hist[i_0:i_n], 1, w=wts, cov="unscaled" + ) except np.linalg.LinAlgError: - raise RuntimeError(f"LinAlgError in x_lo") - x_lo = (val_f-b)/m - #uncertainty - dxl2 = cov[0,0]/m**2 + (cov[1,1] + dheight2)/(val_f-b)**2 + 2*cov[0,1]/(val_f-b)/m + raise RuntimeError("LinAlgError in x_lo") + x_lo = (val_f - b) / m + # uncertainty + dxl2 = ( + cov[0, 0] / m**2 + + (cov[1, 1] + dheight2) / (val_f - b) ** 2 + + 2 * cov[0, 1] / (val_f - b) / m + ) dxl2 *= x_lo**2 # x_hi - i_0 = bin_hi - int(np.floor(n_slope/2)) + 1 + i_0 = bin_hi - int(np.floor(n_slope / 2)) + 1 if i_0 == len(hist): - raise RuntimeError(f"Fit slopes failed") + raise RuntimeError("Fit slopes failed") i_n = i_0 + n_slope - wts = None if var is None else 1/np.sqrt(var[i_0:i_n]) + wts = None if var is None else 1 / np.sqrt(var[i_0:i_n]) wts = [w if w != np.inf else 0 for w in wts] try: - (m, b), cov = np.polyfit(bin_centers[i_0:i_n], hist[i_0:i_n], 1, w=wts, cov='unscaled') + (m, b), cov = np.polyfit( + bin_centers[i_0:i_n], hist[i_0:i_n], 1, w=wts, cov="unscaled" + ) except np.linalg.LinAlgError: - raise RuntimeError(f"LinAlgError in x_hi") - x_hi = (val_f-b)/m + raise RuntimeError("LinAlgError in x_hi") + x_hi = (val_f - b) / m if x_hi < x_lo: - raise RuntimeError(f"Fit slopes produced negative fwfm") + raise RuntimeError("Fit slopes produced negative fwfm") - #uncertainty - dxh2 = cov[0,0]/m**2 + (cov[1,1] + dheight2)/(val_f-b)**2 + 2*cov[0,1]/(val_f-b)/m + # uncertainty + dxh2 = ( + cov[0, 0] / m**2 + + (cov[1, 1] + dheight2) / (val_f - b) ** 2 + + 2 * cov[0, 1] / (val_f - b) / m + ) dxh2 *= x_hi**2 return x_hi - x_lo, np.sqrt(dxl2 + dxh2) @@ -506,11 +577,22 @@ def get_fwfm(fraction: float, hist: np.ndarray, bins: np.ndarray, var: Optional[ raise NameError(f"Unrecognized method {method}") -def plot_hist(hist: np.ndarray, bins: np.ndarray, var: Optional[np.ndarray]=None, show_stats: bool=False, stats_hloc: float=0.75, stats_vloc: float=0.85, fill: bool=False, fillcolor: str='r', fillalpha: float=0.2, **kwargs) -> None: +def plot_hist( + hist: np.ndarray, + bins: np.ndarray, + var: Optional[np.ndarray] = None, + show_stats: bool = False, + stats_hloc: float = 0.75, + stats_vloc: float = 0.85, + fill: bool = False, + fillcolor: str = "r", + fillalpha: float = 0.2, + **kwargs, +) -> None: """ Plot a step histogram, with optional error bars - Parameters + Parameters ---------- hist The histogram data array containing the peak @@ -520,74 +602,104 @@ def plot_hist(hist: np.ndarray, bins: np.ndarray, var: Optional[np.ndarray]=None An array of histogram variances. Used with the 'fit_slopes' method show_stats If True, shows the mean, mean error, and standard deviation of the histogram on the plot - stats_hloc + stats_hloc matplotlib.pyplot horizontal location to place the stats - stats_vloc + stats_vloc matplotlib.pyplot vertical location to place the stats - fill + fill If True, fills in a step histogram when plotting fill_color - Color to fill the step histogram if fill is True - fill_alpha + Color to fill the step histogram if fill is True + fill_alpha Alpha amount if fill is True """ if fill: # the concat calls get the steps to draw correctly at the range boundaries # where="post" tells plt to draw the step y[i] between x[i] and x[i+1] save_color = None - if 'color' in kwargs: save_color = kwargs.pop('color') - elif 'c' in kwargs: save_color = kwargs.pop('c') - plt.fill_between(np.concatenate(([bins[0]], bins)), np.concatenate(([0], hist, [0])), step="post", color=fillcolor, alpha=fillalpha, **kwargs) - if save_color is not None: kwargs['color'] = save_color + if "color" in kwargs: + save_color = kwargs.pop("color") + elif "c" in kwargs: + save_color = kwargs.pop("c") + plt.fill_between( + np.concatenate(([bins[0]], bins)), + np.concatenate(([0], hist, [0])), + step="post", + color=fillcolor, + alpha=fillalpha, + **kwargs, + ) + if save_color is not None: + kwargs["color"] = save_color if var is None: # the concat calls get the steps to draw correctly at the range boundaries # where="post" tells plt to draw the step y[i] between x[i] and x[i+1] - plt.step(np.concatenate(([bins[0]], bins)), np.concatenate(([0], hist, [0])), where="post", **kwargs) + plt.step( + np.concatenate(([bins[0]], bins)), + np.concatenate(([0], hist, [0])), + where="post", + **kwargs, + ) else: - plt.errorbar(get_bin_centers(bins), hist, - xerr=get_bin_widths(bins) / 2, yerr=np.sqrt(var), - fmt='none', **kwargs) + plt.errorbar( + get_bin_centers(bins), + hist, + xerr=get_bin_widths(bins) / 2, + yerr=np.sqrt(var), + fmt="none", + **kwargs, + ) if show_stats is True: bin_centers = get_bin_centers(bins) - N = np.sum(hist) - if N <= 1: - raise RuntimeError(f"can't compute sigma for N = {N}") - mean = np.sum(hist*bin_centers)/N - x2ave = np.sum(hist*bin_centers*bin_centers)/N - stddev = np.sqrt(N/(N-1) * (x2ave - mean*mean)) - dmean = stddev/np.sqrt(N) + n = np.sum(hist) + if n <= 1: + raise RuntimeError(f"can't compute sigma for n = {n}") + mean = np.sum(hist * bin_centers) / n + x2ave = np.sum(hist * bin_centers * bin_centers) / n + stddev = np.sqrt(n / (n - 1) * (x2ave - mean * mean)) + dmean = stddev / np.sqrt(n) mean, dmean = pgu.get_formatted_stats(mean, dmean, 2) - stats = f'$\\mu={mean} \\pm {dmean}$\n$\\sigma={stddev:#.3g}$' - stats_fontsize = rcParams['legend.fontsize'] - plt.text(stats_hloc, stats_vloc, stats, transform=plt.gca().transAxes, fontsize = stats_fontsize) - - -def get_gaussian_guess(hist: np.ndarray , bins: np.ndarray) -> tuple[float, float, float]: + stats = f"$\\mu={mean} \\pm {dmean}$\n$\\sigma={stddev:#.3g}$" + stats_fontsize = rcParams["legend.fontsize"] + plt.text( + stats_hloc, + stats_vloc, + stats, + transform=plt.gca().transAxes, + fontsize=stats_fontsize, + ) + + +def get_gaussian_guess( + hist: np.ndarray, bins: np.ndarray +) -> tuple[float, float, float]: """ given a hist, gives guesses for mu, sigma, and amplitude - Parameters + Parameters ---------- hist Array of histogrammed data - bins + bins Array of histogram bins Returns - ------- - guess_mu + ------- + guess_mu guess for the mu parameter of a Gaussian - guess_sigma + guess_sigma guess for the sigma parameter of a Gaussian guess_area guess for the A parameter of a Gaussian """ if len(bins) == len(hist): - log.warning("note: this function has been updated to require bins rather than bin_centers. Don't trust this result") + log.warning( + "note: this function has been updated to require bins rather than bin_centers. Don't trust this result" + ) max_idx = np.argmax(hist) - guess_mu = (bins[max_idx] + bins[max_idx])/2 # bin center + guess_mu = (bins[max_idx] + bins[max_idx]) / 2 # bin center guess_amp = hist[max_idx] # find 50% amp bounds on both sides for a FWHM guess @@ -597,33 +709,39 @@ def get_gaussian_guess(hist: np.ndarray , bins: np.ndarray) -> tuple[float, floa return (guess_mu, guess_sigma, guess_area) -def get_bin_estimates(pars: np.ndarray, func: Callable, bins: np.ndarray, is_integral: bool = False, **kwargs) -> np.ndarray: +def get_bin_estimates( + pars: np.ndarray, + func: Callable, + bins: np.ndarray, + is_integral: bool = False, + **kwargs, +) -> np.ndarray: """ Bin expected means are estimated by f(bin_center)*bin_width. Supply an integrating function to compute the integral over the bin instead. TODO: make default integrating function a numerical method that is off by default. - Parameters + Parameters ---------- - pars + pars The parameters of the function, func func - The function at which to evalutate the bin centers - bins - Array of histogram bins + The function at which to evaluate the bin centers + bins + Array of histogram bins is_integral if True, then func is an integral function - Returns + Returns ------- f(bin_center)*bin_width The expected mean of a bin See Also -------- - get_bin_widths - Returns the width of the bins + get_bin_widths + Returns the width of the bins get_bin_centers Finds the bin centers of the supplied bins """ @@ -633,6 +751,7 @@ def get_bin_estimates(pars: np.ndarray, func: Callable, bins: np.ndarray, is_int # func can be an integral functions return func(bins[1:], *pars, **kwargs) - func(bins[:-1], *pars, **kwargs) + def get_i_local_extrema(data, delta): """Get lists of indices of the local maxima and minima of data diff --git a/src/pygama/math/hpge_peak_fitting.py b/src/pygama/math/hpge_peak_fitting.py index 27e61efca..ad7d5501e 100644 --- a/src/pygama/math/hpge_peak_fitting.py +++ b/src/pygama/math/hpge_peak_fitting.py @@ -1,23 +1,25 @@ """ pygama convenience functions for fitting hpge peak shape data """ + import logging import math -from typing import Optional, Union +from typing import Optional import numpy as np from scipy.optimize import brentq, minimize_scalar from pygama.math.functions.exgauss import nb_exgauss_pdf -from pygama.math.functions.step import nb_unnorm_step_pdf -from pygama.math.functions.gauss import nb_gauss_pdf +from pygama.math.functions.gauss import nb_gauss_pdf from pygama.math.functions.gauss_on_exgauss import gauss_on_exgauss - +from pygama.math.functions.step import nb_unnorm_step_pdf log = logging.getLogger(__name__) -def hpge_peak_fwhm(sigma: float, htail: float, tau: float, cov: Optional[float] = None) -> tuple[float, float]: +def hpge_peak_fwhm( + sigma: float, htail: float, tau: float, cov: Optional[float] = None +) -> tuple[float, float]: """ Return the FWHM of the hpge_peak function, ignoring background and step components. If calculating error also need the normalisation for the step @@ -30,62 +32,85 @@ def hpge_peak_fwhm(sigma: float, htail: float, tau: float, cov: Optional[float] htail The height of the tail in the hpge_peak tau - The characteristic scale in the extended Gaussian in the hpge_peak + The characteristic scale in the extended Gaussian in the hpge_peak cov The covariant matrix of the previous parameters - Returns + Returns FWHM, FWHM_uncertainty The FWHM of the hpge_peak and its uncertainty """ + # optimize this to find max value - def neg_hpge_peak_peak_bgfree(E, sigma, htail, tau): - return -gauss_on_exgauss.get_pdf(np.array([E]), *np.array([0, sigma, htail, tau]))[0] + def neg_hpge_peak_peak_bgfree(e, sigma, htail, tau): + return -gauss_on_exgauss.get_pdf( + np.array([e]), *np.array([0, sigma, htail, tau]) + )[0] - if htail<0 or htail>1: + if htail < 0 or htail > 1: raise ValueError("htail outside allowed limits of 0 and 1") - res = minimize_scalar( neg_hpge_peak_peak_bgfree, - args=(sigma, htail, tau), - bounds=(-sigma-htail, sigma+htail) ) - Emax = res.x - half_max = -neg_hpge_peak_peak_bgfree(Emax, sigma, htail, tau)/2. + res = minimize_scalar( + neg_hpge_peak_peak_bgfree, + args=(sigma, htail, tau), + bounds=(-sigma - htail, sigma + htail), + ) + e_max = res.x + half_max = -neg_hpge_peak_peak_bgfree(e_max, sigma, htail, tau) / 2.0 # root find this to find the half-max energies - def hpge_peak_peak_bgfree_halfmax(E, sigma, htail, tau, half_max): - return gauss_on_exgauss.get_pdf(np.array([E]), *np.array([0, sigma, htail, tau]))[0] - half_max + def hpge_peak_peak_bgfree_halfmax(e, sigma, htail, tau, half_max): + return ( + gauss_on_exgauss.get_pdf(np.array([e]), *np.array([0, sigma, htail, tau]))[ + 0 + ] + - half_max + ) try: - lower_hm = brentq( hpge_peak_peak_bgfree_halfmax, - -(2.5*sigma/2 + htail*tau), Emax, - args = (sigma, htail, tau, half_max) ) - except: - lower_hm = brentq( hpge_peak_peak_bgfree_halfmax, - -(5*sigma + htail*tau), Emax, - args = (sigma, htail, tau, half_max) ) + lower_hm = brentq( + hpge_peak_peak_bgfree_halfmax, + -(2.5 * sigma / 2 + htail * tau), + e_max, + args=(sigma, htail, tau, half_max), + ) + except Exception: + lower_hm = brentq( + hpge_peak_peak_bgfree_halfmax, + -(5 * sigma + htail * tau), + e_max, + args=(sigma, htail, tau, half_max), + ) try: - upper_hm = brentq( hpge_peak_peak_bgfree_halfmax, - Emax, 2.5*sigma/2, - args = (sigma, htail, tau, half_max) ) - except: - upper_hm = brentq( hpge_peak_peak_bgfree_halfmax, - Emax, 5*sigma, - args = (sigma, htail, tau, half_max) ) - - if cov is None: return upper_hm - lower_hm - - #calculate uncertainty - #nsig set to 1, mu to 0, hstep+nbkg set to 0 - pars = [1,0, sigma, htail, tau, 0,0] + upper_hm = brentq( + hpge_peak_peak_bgfree_halfmax, + e_max, + 2.5 * sigma / 2, + args=(sigma, htail, tau, half_max), + ) + except Exception: + upper_hm = brentq( + hpge_peak_peak_bgfree_halfmax, + e_max, + 5 * sigma, + args=(sigma, htail, tau, half_max), + ) + + if cov is None: + return upper_hm - lower_hm + + # calculate uncertainty + # nsig set to 1, mu to 0, hstep+nbkg set to 0 + pars = [1, 0, sigma, htail, tau, 0, 0] step_norm = 1 - gradmax = hpge_peak_parameter_gradient(Emax, pars, step_norm) + gradmax = hpge_peak_parameter_gradient(e_max, pars, step_norm) gradmax *= 0.5 - grad1 = hpge_peak_parameter_gradient(lower_hm, pars,step_norm) + grad1 = hpge_peak_parameter_gradient(lower_hm, pars, step_norm) grad1 -= gradmax - grad1 /= hpge_peak_peakshape_derivative(lower_hm, pars,step_norm) - grad2 = hpge_peak_parameter_gradient(upper_hm, pars,step_norm) + grad1 /= hpge_peak_peakshape_derivative(lower_hm, pars, step_norm) + grad2 = hpge_peak_parameter_gradient(upper_hm, pars, step_norm) grad2 -= gradmax - grad2 /= hpge_peak_peakshape_derivative(upper_hm, pars,step_norm) + grad2 /= hpge_peak_peakshape_derivative(upper_hm, pars, step_norm) grad2 -= grad1 fwfm_unc = np.sqrt(np.dot(grad2, np.dot(cov, grad2))) @@ -93,109 +118,138 @@ def hpge_peak_peak_bgfree_halfmax(E, sigma, htail, tau, half_max): return upper_hm - lower_hm, fwfm_unc -def hpge_peak_fwfm(sigma, htail, tau, frac_max = 0.5, cov = None): +def hpge_peak_fwfm(sigma, htail, tau, frac_max=0.5, cov=None): """ Return the FWHM of the radford_peak function, ignoring background and step components. If calculating error also need the normalisation for the step function. """ + # optimize this to find max value - def neg_radford_peak_bgfree(E, sigma, htail, tau): - return -gauss_on_exgauss.get_pdf(np.array([E]), 0, sigma, htail, tau)[0] + def neg_radford_peak_bgfree(e, sigma, htail, tau): + return -gauss_on_exgauss.get_pdf(np.array([e]), 0, sigma, htail, tau)[0] - if htail<0 or htail>1: + if htail < 0 or htail > 1: raise ValueError("htail outside allowed limits of 0 and 1") - res = minimize_scalar( neg_radford_peak_bgfree, - args=(sigma, htail, tau), - bounds=(-sigma-htail, sigma+htail) ) - Emax = res.x - val_frac_max = -neg_radford_peak_bgfree(Emax, sigma, htail, tau)*frac_max + res = minimize_scalar( + neg_radford_peak_bgfree, + args=(sigma, htail, tau), + bounds=(-sigma - htail, sigma + htail), + ) + e_max = res.x + val_frac_max = -neg_radford_peak_bgfree(e_max, sigma, htail, tau) * frac_max # root find this to find the half-max energies - def radford_peak_bgfree_fracmax(E, sigma, htail, tau, val_frac_max): - return gauss_on_exgauss.get_pdf(np.array([E]), 0, sigma, htail, tau)[0] - val_frac_max + def radford_peak_bgfree_fracmax(e, sigma, htail, tau, val_frac_max): + return ( + gauss_on_exgauss.get_pdf(np.array([e]), 0, sigma, htail, tau)[0] + - val_frac_max + ) try: - lower_hm = brentq( radford_peak_bgfree_fracmax, - -(2.5*sigma/2 + htail*tau), Emax, - args = (sigma, htail, tau, val_frac_max) ) - except: - lower_hm = brentq( radford_peak_bgfree_fracmax, - -(5*sigma + htail*tau), Emax, - args = (sigma, htail, tau, val_frac_max) ) + lower_hm = brentq( + radford_peak_bgfree_fracmax, + -(2.5 * sigma / 2 + htail * tau), + e_max, + args=(sigma, htail, tau, val_frac_max), + ) + except Exception: + lower_hm = brentq( + radford_peak_bgfree_fracmax, + -(5 * sigma + htail * tau), + e_max, + args=(sigma, htail, tau, val_frac_max), + ) try: - upper_hm = brentq( radford_peak_bgfree_fracmax, - Emax, 2.5*sigma/2, - args = (sigma, htail, tau, val_frac_max) ) - except: - upper_hm = brentq( radford_peak_bgfree_fracmax, - Emax, 5*sigma, - args = (sigma, htail, tau, val_frac_max) ) - - if cov is None: return upper_hm - lower_hm - #calculate uncertainty - #nsig set to 1, mu to 0, hstep+nbkg set to 0 - pars = [1,0, sigma, htail, tau,0,0] + upper_hm = brentq( + radford_peak_bgfree_fracmax, + e_max, + 2.5 * sigma / 2, + args=(sigma, htail, tau, val_frac_max), + ) + except Exception: + upper_hm = brentq( + radford_peak_bgfree_fracmax, + e_max, + 5 * sigma, + args=(sigma, htail, tau, val_frac_max), + ) + + if cov is None: + return upper_hm - lower_hm + # calculate uncertainty + # nsig set to 1, mu to 0, hstep+nbkg set to 0 + pars = [1, 0, sigma, htail, tau, 0, 0] rng = np.random.default_rng(1) par_b = rng.multivariate_normal(pars, cov, size=100) y_b = np.zeros(len(par_b)) - for p in par_b: + for i, p in enumerate(par_b): try: - y_b[i] = hpge_peak_fwfm(p[2],p[3],p[4], frac_max=frac_max) + y_b[i] = hpge_peak_fwfm(p[2], p[3], p[4], frac_max=frac_max) except Exception: y_b[i] = np.nan yerr_boot = np.nanstd(y_b, axis=0) return upper_hm - lower_hm, yerr_boot -def hpge_peak_mode(mu, sigma, htail, tau, cov = None): - - if htail<0 or htail>1: + +def hpge_peak_mode(mu, sigma, htail, tau, cov=None): + + if htail < 0 or htail > 1: if cov is not None: return np.nan, np.nan else: return np.nan try: - mode = brentq(hpge_peak_peakshape_derivative, - mu-2*sigma - htail*tau, mu+2*sigma + htail*tau, - args = ([1,mu,sigma,htail,tau,0,0],1 )) + mode = brentq( + hpge_peak_peakshape_derivative, + mu - 2 * sigma - htail * tau, + mu + 2 * sigma + htail * tau, + args=([1, mu, sigma, htail, tau, 0, 0], 1), + ) except ValueError: try: - mode = brentq(hpge_peak_peakshape_derivative, - mu-4*sigma - htail*tau, mu+4*sigma + htail*tau, - args = ([1,mu,sigma,htail,tau,0,0],1 )) + mode = brentq( + hpge_peak_peakshape_derivative, + mu - 4 * sigma - htail * tau, + mu + 4 * sigma + htail * tau, + args=([1, mu, sigma, htail, tau, 0, 0], 1), + ) except ValueError: mode = np.nan - if cov is None: return mode + if cov is None: + return mode else: - #nsig set to 1, hstep+nbkg set to 0 - pars = np.array([1, mu, sigma, htail, tau,0,0]) + # nsig set to 1, hstep+nbkg set to 0 + pars = np.array([1, mu, sigma, htail, tau, 0, 0]) rng = np.random.default_rng(1) par_b = rng.multivariate_normal(pars, cov, size=10000) - modes = np.array([hpge_peak_mode(p[1],p[2],p[3],p[4]) for p in par_b]) + modes = np.array([hpge_peak_mode(p[1], p[2], p[3], p[4]) for p in par_b]) mode_err_boot = np.nanstd(modes, axis=0) return mode, mode_err_boot -def hpge_peak_peakshape_derivative(E: np.ndarray, pars: np.ndarray, step_norm: float) -> np.ndarray: +def hpge_peak_peakshape_derivative( + e: np.ndarray, pars: np.ndarray, step_norm: float +) -> np.ndarray: """ Computes the derivative of the hpge_peak peak shape - Parameters + Parameters ---------- - E - The array of energies of the hpge_peak - pars - The parameters of the hpge_peak fit + e + The array of energies of the hpge_peak + pars + The parameters of the hpge_peak fit step_norm The normalization of the background step function in the hpge_peak - Returns + Returns ------- derivative the derivative of the hpge_peak @@ -203,66 +257,82 @@ def hpge_peak_peakshape_derivative(E: np.ndarray, pars: np.ndarray, step_norm: f n_sig, mu, sigma, htail, tau, n_bkg, hstep = pars sigma = abs(sigma) - gaus = nb_gauss_pdf(E, mu, sigma) - y = (E-mu)/sigma - ret = -(1-htail)*(y/sigma)*gaus - ret -= htail/tau*(-nb_exgauss_pdf(np.array([E,E-1]), mu, sigma, tau)[0]+gaus) + gaus = nb_gauss_pdf(e, mu, sigma) + y = (e - mu) / sigma + ret = -(1 - htail) * (y / sigma) * gaus + ret -= ( + htail / tau * (-nb_exgauss_pdf(np.array([e, e - 1]), mu, sigma, tau)[0] + gaus) + ) - return n_sig*ret - n_bkg*hstep*gaus/step_norm #need norm factor for bkg + return n_sig * ret - n_bkg * hstep * gaus / step_norm # need norm factor for bkg -def hpge_peak_parameter_gradient(E: float, pars: np.ndarray, step_norm: float) -> np.ndarray: +def hpge_peak_parameter_gradient( + e: float, pars: np.ndarray, step_norm: float +) -> np.ndarray: """ Computes the gradient of the hpge_peak parameters - Parameters + Parameters ---------- - E - The energy of the hpge_peak - pars - The parameters of the hpge_peak fit + e + The energy of the hpge_peak + pars + The parameters of the hpge_peak fit step_norm The normalization of the background step function in the hpge_peak - Returns + Returns ------- gradient - gradient of the n_sig, mu, sigma, h_tail, tau, n_bkg, and hstep parameters of the + gradient of the n_sig, mu, sigma, h_tail, tau, n_bkg, and hstep parameters of the HPGe peak """ n_sig, mu, sigma, htail, tau, n_bkg, hstep = pars - gaus = nb_gauss_pdf(np.array([E, E-1]), mu, sigma)[0] + gaus = nb_gauss_pdf(np.array([e, e - 1]), mu, sigma)[0] - tailL = nb_exgauss_pdf(np.array([E, E-1]), mu, sigma, tau)[0] - if n_bkg ==0: + tail_l = nb_exgauss_pdf(np.array([e, e - 1]), mu, sigma, tau)[0] + if n_bkg == 0: step_f = 0 else: - step_f = nb_unnorm_step_pdf(np.array([E, E-1]), mu, sigma, hstep)[0] /step_norm + step_f = ( + nb_unnorm_step_pdf(np.array([e, e - 1]), mu, sigma, hstep)[0] / step_norm + ) - #some unitless numbers that show up a bunch - y = (E-mu)/sigma - sigtauL = sigma/tau + # some unitless numbers that show up a bunch + y = (e - mu) / sigma + sig_tau_l = sigma / tau - g_n_sig = 0.5*(htail*tailL + (1-htail)*gaus) + g_n_sig = 0.5 * (htail * tail_l + (1 - htail) * gaus) g_n_bkg = step_f - g_hs = n_bkg*math.erfc(y/np.sqrt(2))/step_norm - - g_ht = (n_sig/2)*(tailL-gaus) - - #gradient of gaussian part - g_mu = (1-htail)*y/sigma*gaus - g_sigma = (1-htail)*(y*y +-1)/sigma*gaus - - #gradient of low tail, use approximation if necessary - g_mu += htail/tau*(-tailL+gaus) - g_sigma += htail/tau*(sigtauL*tailL-(sigtauL-y)*gaus) - g_tau = -htail/tau*( (1.+sigtauL*y+sigtauL*sigtauL)*tailL - sigtauL*sigtauL*gaus) * n_sig - - g_mu = n_sig*g_mu + (2*n_bkg*hstep*gaus)/step_norm - g_sigma = n_sig*g_sigma + (2*n_bkg*hstep*gaus*y)/(step_norm*np.sqrt(sigma)) + g_hs = n_bkg * math.erfc(y / np.sqrt(2)) / step_norm + + g_ht = (n_sig / 2) * (tail_l - gaus) + + # gradient of gaussian part + g_mu = (1 - htail) * y / sigma * gaus + g_sigma = (1 - htail) * (y * y + -1) / sigma * gaus + + # gradient of low tail, use approximation if necessary + g_mu += htail / tau * (-tail_l + gaus) + g_sigma += htail / tau * (sig_tau_l * tail_l - (sig_tau_l - y) * gaus) + g_tau = ( + -htail + / tau + * ( + (1.0 + sig_tau_l * y + sig_tau_l * sig_tau_l) * tail_l + - sig_tau_l * sig_tau_l * gaus + ) + * n_sig + ) + + g_mu = n_sig * g_mu + (2 * n_bkg * hstep * gaus) / step_norm + g_sigma = n_sig * g_sigma + (2 * n_bkg * hstep * gaus * y) / ( + step_norm * np.sqrt(sigma) + ) gradient = g_n_sig, g_mu, g_sigma, g_ht, g_tau, g_n_bkg, g_hs return np.array(gradient) diff --git a/src/pygama/math/least_squares.py b/src/pygama/math/least_squares.py index 302bdbe22..e4d0b4c1d 100644 --- a/src/pygama/math/least_squares.py +++ b/src/pygama/math/least_squares.py @@ -1,10 +1,15 @@ """ pygama convenience functions for linearly fitting data """ -import numpy as np + from typing import Optional, Union -def linear_fit_by_sums(x: np.ndarray, y: np.ndarray, var: Optional[Union[float, np.ndarray]] = 1) -> tuple[float, float]: +import numpy as np + + +def linear_fit_by_sums( + x: np.ndarray, y: np.ndarray, var: Optional[Union[float, np.ndarray]] = 1 +) -> tuple[float, float]: """ Fast computation of weighted linear least squares fit to a linear model @@ -25,19 +30,21 @@ def linear_fit_by_sums(x: np.ndarray, y: np.ndarray, var: Optional[Union[float, The slope (m) and y-intercept (b) of the best fit (in the least-squares sense) of the data to y = mx + b """ - y = y/var - x = x/var - sum_wts = len(y)/var if np.isscalar(var) else np.sum(1/var) + y = y / var + x = x / var + sum_wts = len(y) / var if np.isscalar(var) else np.sum(1 / var) sum_x = np.sum(x) - sum_xx = np.sum(x*x) + sum_xx = np.sum(x * x) sum_y = np.sum(y) - sum_yx = np.sum(y*x) + sum_yx = np.sum(y * x) m = (sum_wts * sum_yx - sum_y * sum_x) / (sum_wts * sum_xx - sum_x**2) b = (sum_y - m * sum_x) / sum_wts return m, b -def fit_simple_scaling(x: np.ndarray, y: np.ndarray, var: Optional[Union[float, np.ndarray]] = 1) -> tuple[float, float]: +def fit_simple_scaling( + x: np.ndarray, y: np.ndarray, var: Optional[Union[float, np.ndarray]] = 1 +) -> tuple[float, float]: """ Fast computation of weighted linear least squares fit to a simple scaling @@ -59,6 +66,6 @@ def fit_simple_scaling(x: np.ndarray, y: np.ndarray, var: Optional[Union[float, """ x = np.asarray(x) y = np.asarray(y) - scale_var = 1/np.sum(x*x/var) - scale = np.sum(y*x/var) * scale_var + scale_var = 1 / np.sum(x * x / var) + scale = np.sum(y * x / var) * scale_var return scale, scale_var diff --git a/src/pygama/math/unbinned_fitting.py b/src/pygama/math/unbinned_fitting.py index 274898b88..2d63d3c47 100644 --- a/src/pygama/math/unbinned_fitting.py +++ b/src/pygama/math/unbinned_fitting.py @@ -1,20 +1,28 @@ """ pygama convenience functions for fitting ubinned data """ + import logging -import numpy as np from typing import Callable +import numpy as np from iminuit import Minuit, cost log = logging.getLogger(__name__) -def fit_unbinned(func: Callable, data: np.ndarray, guess:np.ndarray =None, - Extended: bool = True, cost_func:str = 'LL', simplex: bool = False, - bounds: list[tuple[float, float], ...]=None, fixed: list[bool, ...]=None) -> tuple[np.ndarray, ...]: +def fit_unbinned( + func: Callable, + data: np.ndarray, + guess: np.ndarray = None, + extended: bool = True, # noqa: N803 + cost_func: str = "LL", + simplex: bool = False, + bounds: list[tuple[float, float], ...] = None, + fixed: list[bool, ...] = None, +) -> tuple[np.ndarray, ...]: """Do a unbinned fit to data. - Default is Extended Log Likelihood fit, with option for other cost functions. + Default is extended Log Likelihood fit, with option for other cost functions. Parameters ---------- @@ -24,7 +32,7 @@ def fit_unbinned(func: Callable, data: np.ndarray, guess:np.ndarray =None, the data values to be fit guess initial guess parameters - Extended + extended run extended or non extended fit cost_func cost function to use. LL is ExtendedUnbinnedNLL, None is for just UnbinnedNLL @@ -42,10 +50,12 @@ def fit_unbinned(func: Callable, data: np.ndarray, guess:np.ndarray =None, the best-fit parameters and their errors / covariance """ if guess is None: - raise NotImplementedError("auto-guessing not yet implemented, you must supply a guess.") + raise NotImplementedError( + "auto-guessing not yet implemented, you must supply a guess." + ) - if cost_func =='LL': - if Extended == True: + if cost_func == "LL": + if extended is True: cost_func = cost.ExtendedUnbinnedNLL(data, func) else: @@ -63,7 +73,7 @@ def fit_unbinned(func: Callable, data: np.ndarray, guess:np.ndarray =None, if fixed is not None: for fix in fixed: m.fixed[fix] = True - if simplex == True: + if simplex is True: m.simplex().migrad() else: m.migrad() diff --git a/src/pygama/math/units.py b/src/pygama/math/units.py index 288014a20..a6c096d6c 100644 --- a/src/pygama/math/units.py +++ b/src/pygama/math/units.py @@ -1,8 +1,8 @@ -from pint import Measurement, Quantity, Unit, UnitRegistry, set_application_registry +from pint import UnitRegistry, set_application_registry unit_registry = UnitRegistry(auto_reduce_dimensions=True) set_application_registry(unit_registry) # Define constants useful for LEGEND below -unit_registry.define('m_76 = 75.921402729 * amu') # 10.1103/PhysRevC.81.032501 -unit_registry.define('Q_bb = 2039.061 * keV') # 10.1103/PhysRevC.81.032501 +unit_registry.define("m_76 = 75.921402729 * amu") # 10.1103/PhysRevC.81.032501 +unit_registry.define("Q_bb = 2039.061 * keV") # 10.1103/PhysRevC.81.032501 diff --git a/src/pygama/math/utils.py b/src/pygama/math/utils.py index 48bd6e365..b72a629ed 100644 --- a/src/pygama/math/utils.py +++ b/src/pygama/math/utils.py @@ -1,30 +1,30 @@ """ pygama utility functions. """ + import logging -import sys -from typing import Optional, Union, Callable +from typing import Callable, Optional import numpy as np log = logging.getLogger(__name__) -def sizeof_fmt(num: float, suffix: str='B') -> str: +def sizeof_fmt(num: float, suffix: str = "B") -> str: """ given a file size in bytes, output a human-readable form. - Parameters + Parameters ---------- - num - File size, in bytes + num + File size, in bytes suffix Desired file size suffix """ - for unit in ['', 'K', 'M', 'G', 'T', 'P', 'E', 'Z']: + for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]: if abs(num) < 1024.0: return f"{num:.3f} {unit}{suffix}" num /= 1024.0 - return "{:.1f} {} {}".format(num, 'Y', suffix) + return "{:.1f} {} {}".format(num, "Y", suffix) def get_par_names(func: Callable) -> tuple[str, ...]: @@ -32,58 +32,66 @@ def get_par_names(func: Callable) -> tuple[str, ...]: Return a list containing the names of the arguments of "func" other than the first argument. In pygamaland, those are the function's "parameters." - Parameters + Parameters ---------- func A function whose parameters we want to return """ import inspect + par = inspect.getfullargspec(func) return par[0][1:] -def get_formatted_stats(mean: float, sigma: float, ndigs: int =2) -> str: +def get_formatted_stats(mean: float, sigma: float, ndigs: int = 2) -> str: """ convenience function for formatting mean +/- sigma to the right number of significant figures. Parameters ---------- - mean + mean The mean value we want to format sigma The sigma value we want to format - ndigs + ndigs The number of significant digits we want to display """ if sigma == 0: - fmt = '%d' % ndigs - fmt = '%#.' + fmt + 'g' + fmt = "%d" % ndigs + fmt = "%#." + fmt + "g" return fmt % mean, fmt % sigma sig_pos = int(np.floor(np.log10(abs(sigma)))) - sig_fmt = '%d' % ndigs - sig_fmt = '%#.' + sig_fmt + 'g' + sig_fmt = "%d" % ndigs + sig_fmt = "%#." + sig_fmt + "g" mean_pos = int(np.floor(np.log10(abs(mean)))) - mdigs = mean_pos-sig_pos+ndigs - if mdigs < ndigs-1: mdigs = ndigs - 1 - mean_fmt = '%d' % mdigs - mean_fmt = '%#.' + mean_fmt + 'g' + mdigs = mean_pos - sig_pos + ndigs + if mdigs < ndigs - 1: + mdigs = ndigs - 1 + mean_fmt = "%d" % mdigs + mean_fmt = "%#." + mean_fmt + "g" return mean_fmt % mean, sig_fmt % sigma -def print_fit_results(pars: np.ndarray, cov: np.ndarray, func: Optional[Callable]=None, title: Optional[str]=None, pad: Optional[bool]=True) -> None: +def print_fit_results( + pars: np.ndarray, + cov: np.ndarray, + func: Optional[Callable] = None, + title: Optional[str] = None, + pad: Optional[bool] = True, +) -> None: """ Convenience function to write scipy.optimize.curve_fit results to the log - Parameters + Parameters ---------- - pars + pars The parameter values of the function func - func + func A function, if passed then the function's parameters' names are logged title A title to log - pad + pad If True, adds spaces to the log messages Returns @@ -95,7 +103,8 @@ def print_fit_results(pars: np.ndarray, cov: np.ndarray, func: Optional[Callable log.info(f"{title}:") par_names = [] if func is None: - for i in range(len(pars)): par_names.append("p"+str(i)) + for i in range(len(pars)): + par_names.append("p" + str(i)) else: par_names = get_par_names(func) for i in range(len(pars)): diff --git a/src/pygama/pargen/AoE_cal.py b/src/pygama/pargen/AoE_cal.py index 45efafb33..e3a853307 100644 --- a/src/pygama/pargen/AoE_cal.py +++ b/src/pygama/pargen/AoE_cal.py @@ -30,14 +30,14 @@ ) from pygama.math.functions.gauss import nb_gauss_amp from pygama.math.functions.hpge_peak import hpge_get_fwfm, hpge_get_fwhm, hpge_get_mode -from pygama.math.functions.sum_dists import sum_dists +from pygama.math.functions.sum_dists import SumDists from pygama.pargen.utils import convert_to_minuit, return_nans log = logging.getLogger(__name__) (x_lo, x_hi, n_sig, mu, sigma, n_bkg, tau) = range(7) par_array = [(gaussian, [mu, sigma]), (exgauss, [mu, sigma, tau])] -aoe_peak = sum_dists( +aoe_peak = SumDists( par_array, [n_sig, n_bkg], "areas", @@ -51,7 +51,7 @@ (gauss_on_exgauss, [mu, sigma, frac1, tau]), (exgauss, [mu, sigma, tau_bkg]), ] -aoe_peak_with_high_tail = sum_dists( +aoe_peak_with_high_tail = SumDists( par_array, [n_sig, n_bkg], "areas", @@ -251,7 +251,8 @@ def unbinned_aoe_fit( Fitting function for A/E, first fits just a Gaussian before using the full pdf to fit if fails will return NaN values - Args: + Parameters + ---------- aoe: np.array A/E values pdf: PDF @@ -259,7 +260,9 @@ def unbinned_aoe_fit( display: int Level of display - Returns: tuple(np.array, np.array) + Returns + ------- + tuple(np.array, np.array) Tuple of fit values and errors """ if not isinstance(aoe, np.ndarray): @@ -1507,7 +1510,7 @@ def get_aoe_cut_fit( ) valid_fits = self.cut_fits.query( - f'sf_err<{(1.5 * np.nanpercentile(self.cut_fits["sf_err"],85))}&sf_err==sf_err' + f'sf_err<{(1.5 * np.nanpercentile(self.cut_fits["sf_err"], 85))}&sf_err==sf_err' ) c = cost.LeastSquares( diff --git a/src/pygama/pargen/data_cleaning.py b/src/pygama/pargen/data_cleaning.py index 91e4eb4ad..c513476d6 100644 --- a/src/pygama/pargen/data_cleaning.py +++ b/src/pygama/pargen/data_cleaning.py @@ -21,12 +21,12 @@ import pygama.pargen.energy_cal as pgc from pygama.math.binned_fitting import goodness_of_fit from pygama.math.distributions import exgauss, gaussian -from pygama.math.functions.sum_dists import sum_dists +from pygama.math.functions.sum_dists import SumDists from pygama.math.unbinned_fitting import fit_unbinned (x_lo, x_hi, n_sig, mu, sigma, n_bkg, tau) = range(7) par_array = [(gaussian, [mu, sigma]), (exgauss, [mu, sigma, tau])] -gauss_on_exgauss_areas = sum_dists( +gauss_on_exgauss_areas = SumDists( par_array, [n_sig, n_bkg], "areas", @@ -36,7 +36,7 @@ (x_lo, x_hi, n_sig, mu, sigma, tau1, n_bkg, tau2) = range(8) par_array = [(exgauss, [mu, sigma, tau1]), (exgauss, [mu, sigma, tau2])] -double_exgauss = sum_dists( +double_exgauss = SumDists( par_array, [n_sig, n_bkg], "areas", @@ -373,58 +373,79 @@ def generate_cuts( Parameters ---------- data : lh5 table, dictionary of arrays or pandas dataframe - data to calculate cuts on + data to calculate cuts on + parameters : dict - dictionary of the form: - { - "output_parameter_name": { - "cut_parameter": "parameter_to_cut_on", - "cut_level": number_of_sigmas, - "mode": "inclusive" or "exclusive" - } + dictionary of the form: + + .. code-block:: json + + { + "output_parameter_name": { + "cut_parameter": "parameter_to_cut_on", + "cut_level": "number_of_sigmas", + "mode": "inclusive/exclusive" } - number of sigmas can instead be a dictionary to specify different cut levels for low and high side - or to only have a one sided cut only specify one of the low or high side - e.g. - { - "output_parameter_name": { - "cut_parameter": "parameter_to_cut_on", - "cut_level": {"low_side": 3, "high_side": 2}, - "mode": "inclusive" or "exclusive" - } + } + + number of sigmas can instead be a dictionary to specify different cut levels for low and high side + or to only have a one sided cut only specify one of the low or high side + e.g. + + .. code-block:: json + + { + "output_parameter_name": { + "cut_parameter": "parameter_to_cut_on", + "cut_level": {"low_side": "3", "high_side": "2"}, + "mode": "inclusive/exclusive" } - alternatively can specify hit dict fields to just copy dict into output dict e.g. - { - "is_valid_t0":{ - "expression":"(tp_0_est>a)&(tp_0_esta)&(tp_0_esta)&(tp_0_esta)&(tp_0_est None | Table: - """Builds a skimmed file from a (set) of `evt/hit/dsp` tier file(s). - - Parameters - ---------- - f_evt - path of `evt` file. - f_hit - path of `hit` file. - f_dsp - path of `dsp` file. - f_tcm - path of `tcm` file. - skm_conf - name of configuration file or dictionary defining `skm` fields. - - - ``multiplicity`` defines up to which row length - :class:`.VectorOfVector` fields should be kept. - - ``postfixes`` list of postfixes must be list of - ``len(multiplicity)``. If not given, numbers from 0 to - ``multiplicity -1`` are used - - ``operations`` are forwarded from lower tiers and clipped/padded - according to ``missing_value`` if needed. If the forwarded field - is not an evt tier, ``tcm_idx`` must be passed that specifies the - value to pick across channels. - - For example: - - .. code-block:: json - - { - "multiplicity": 2, - "postfixes":["", "aux"], - "operations": { - "timestamp":{ - "forward_field": "evt.timestamp" - }, - "multiplicity":{ - "forward_field": "evt.multiplicity" - }, - "energy":{ - "forward_field": "hit.cuspEmax_ctc_cal", - "missing_value": "np.nan", - "tcm_idx": "evt.energy_idx" - }, - "energy_id":{ - "forward_field": "tcm.array_id", - "missing_value": 0, - "tcm_idx": "evt.energy_idx" - } - } - } - f_skm - name of the `skm` output file. If ``None``, return the output - class:`.Table` instead of writing to disk. - - wo_mode - writing mode. - - - ``write_safe`` or ``w``: only proceed with writing if the file does - not already exists. - - ``append`` or ``a``: append to file. - - ``overwrite`` or ``o``: replaces existing file. - - skm_group - `skm` LH5 root group name. - evt_group - `evt` LH5 root group name. - hit_group - `hit` LH5 root group name. - dsp_group - `dsp` LH5 root group name. - tcm_group - `tcm` LH5 root group name. - tcm_id_table_pattern - pattern to format `tcm` id values to table name in higher tiers. Must have one - placeholder which is the `tcm` id. - """ - f_dict = {evt_group: f_evt, hit_group: f_hit, dsp_group: f_dsp, tcm_group: f_tcm} - log = logging.getLogger(__name__) - log.debug(f"I am skimming {len(f_evt) if isinstance(f_evt,list) else 1} files") - - tbl_cfg = skm_conf - if not isinstance(tbl_cfg, (str, dict)): - raise TypeError() - if isinstance(tbl_cfg, str): - with open(tbl_cfg) as f: - tbl_cfg = json.load(f) - - # Check if multiplicity is given - if "multiplicity" not in tbl_cfg.keys(): - raise ValueError("multiplicity field missing") - - multi = int(tbl_cfg["multiplicity"]) - store = LH5Store() - # df = pd.DataFrame() - table = Table() - if "operations" in tbl_cfg.keys(): - for op in tbl_cfg["operations"].keys(): - miss_val = np.nan - if "missing_value" in tbl_cfg["operations"][op].keys(): - miss_val = tbl_cfg["operations"][op]["missing_value"] - if isinstance(miss_val, str) and ( - miss_val in ["np.nan", "np.inf", "-np.inf"] - ): - miss_val = eval(miss_val) - - fw_fld = tbl_cfg["operations"][op]["forward_field"] - - # load object if from evt tier - if evt_group in fw_fld.replace(".", "/"): - obj = store.read( - f"/{fw_fld.replace('.','/')}", f_dict[fw_fld.split(".", 1)[0]] - )[0].view_as("ak") - - # else collect data from lower tier via tcm_idx - else: - if "tcm_idx" not in tbl_cfg["operations"][op].keys(): - raise ValueError( - f"{op} is an sub evt level operation. tcm_idx field must be specified" - ) - tcm_idx_fld = tbl_cfg["operations"][op]["tcm_idx"] - tcm_idx = store.read( - f"/{tcm_idx_fld.replace('.','/')}", - f_dict[tcm_idx_fld.split(".")[0]], - )[0].view_as("ak")[:, :multi] - - obj = ak.Array([[] for x in range(len(tcm_idx))]) - - # load TCM data to define an event - ids = store.read(f"/{tcm_group}/array_id", f_tcm)[0].view_as("ak") - ids = ak.unflatten(ids[ak.flatten(tcm_idx)], ak.count(tcm_idx, axis=-1)) - - idx = store.read(f"/{tcm_group}/array_idx", f_tcm)[0].view_as("ak") - idx = ak.unflatten(idx[ak.flatten(tcm_idx)], ak.count(tcm_idx, axis=-1)) - - if "tcm.array_id" == tbl_cfg["operations"][op]["forward_field"]: - obj = ids - elif "tcm.array_idx" == tbl_cfg["operations"][op]["forward_field"]: - obj = idx - - else: - chns = np.unique( - ak.to_numpy(ak.flatten(ids), allow_missing=False) - ).astype(int) - - # Get the data - for ch in chns: - ch_idx = idx[ids == ch] - ct_idx = ak.count(ch_idx, axis=-1) - fl_idx = ak.to_numpy(ak.flatten(ch_idx), allow_missing=False) - - if ( - f"{utils.get_table_name_by_pattern(tcm_id_table_pattern,ch)}/{fw_fld.replace('.','/')}" - not in lh5.ls( - f_dict[[key for key in f_dict if key in fw_fld][0]], - f"ch{ch}/{fw_fld.rsplit('.',1)[0]}/", - ) - ): - och = Array(nda=np.full(len(fl_idx), miss_val)) - else: - och, _ = store.read( - f"{utils.get_table_name_by_pattern(tcm_id_table_pattern,ch)}/{fw_fld.replace('.','/')}", - f_dict[[key for key in f_dict if key in fw_fld][0]], - idx=fl_idx, - ) - if not isinstance(och, Array): - raise ValueError( - f"{type(och)} not supported. Forward only Array fields" - ) - och = och.view_as("ak") - och = ak.unflatten(och, ct_idx) - obj = ak.concatenate((obj, och), axis=-1) - - # Pad, clip and numpyfy - if obj.ndim > 1: - obj = ak.pad_none(obj, multi, clip=True) - obj = ak.to_numpy(ak.fill_none(obj, miss_val)) - - if obj.ndim > 1: - if "postfixes" in tbl_cfg.keys(): - nms = [f"{op}{x}" for x in tbl_cfg["postfixes"]] - else: - nms = [f"{op}_{x}" for x in range(multi)] - - for i in range(len(nms)): - # add attribute if present - ob = Array(nda=obj[:, i]) - if "lgdo_attrs" in tbl_cfg["operations"][op].keys(): - ob.attrs |= tbl_cfg["operations"][op]["lgdo_attrs"] - table.add_field(nms[i], ob, True) - else: - obj = Array(nda=obj) - if "lgdo_attrs" in tbl_cfg["operations"][op].keys(): - obj.attrs |= tbl_cfg["operations"][op]["lgdo_attrs"] - table.add_field(op, obj, True) - - if not f_skm: - return table - - # last thing missing is writing it out - if wo_mode not in ["w", "write_safe", "o", "overwrite", "a", "append"]: - raise ValueError(f"wo_mode {wo_mode} not valid.") - - log.debug("saving skm file") - if (wo_mode in ["w", "write_safe"]) and os.path.exists(f_skm): - raise FileExistsError(f"Write_safe mode: {f_skm} exists.") - - wo = wo_mode if wo_mode not in ["o", "overwrite"] else "of" - store.write(obj=table, name=f"/{skm_group}/", lh5_file=f_skm, wo_mode=wo) diff --git a/tests/conftest.py b/tests/conftest.py index 2460708e4..3f09ace06 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -30,7 +30,7 @@ def pytest_sessionfinish(session, exitstatus): @pytest.fixture(scope="session") def lgnd_test_data(): ldata = LegendTestData() - ldata.checkout("89e91f6") + ldata.checkout("8f55832") return ldata diff --git a/tests/evt/test_utils.py b/tests/evt/test_evt_utils.py similarity index 100% rename from tests/evt/test_utils.py rename to tests/evt/test_evt_utils.py diff --git a/tests/hit/test_build_hit.py b/tests/hit/test_build_hit.py index 7c918a229..c43a8c720 100644 --- a/tests/hit/test_build_hit.py +++ b/tests/hit/test_build_hit.py @@ -2,6 +2,7 @@ import os from pathlib import Path +import awkward as ak import numpy as np import pytest from lgdo import lh5 @@ -236,3 +237,28 @@ def test_build_hit_spms_calc(dsp_test_file_spm, tmptestdir): assert np.nanmean(df0) == 0 assert np.nanmean(df1) == 1 assert np.nanmean(df2) == 2 + + +def test_vov_input(lgnd_test_data, tmptestdir): + infile = lgnd_test_data.get_path( + "lh5/l200-p03-r000-phy-20230312T055349Z-tier_psp.lh5" + ) + outfile = f"{tmptestdir}/LDQTA_r117_20200110T105115Z_cal_geds_hit.lh5" + + hit_config = { + "outputs": ["a"], + "operations": { + "a": { + "expression": "a + m * energies", + "parameters": {"a": 0, "m": 1}, + } + }, + } + + build_hit( + infile, outfile=outfile, hit_config=hit_config, wo_mode="of", buffer_len=9999999 + ) + + orig = lh5.read_as("ch1067205/dsp/energies", infile, "ak") + data = lh5.read_as("ch1067205/hit/a", outfile, "ak") + assert ak.all(data == orig) diff --git a/tests/math/functions/test_crystal_ball.py b/tests/math/functions/test_crystal_ball.py index 24f9e3e4f..480fc74a9 100644 --- a/tests/math/functions/test_crystal_ball.py +++ b/tests/math/functions/test_crystal_ball.py @@ -2,7 +2,7 @@ from scipy.stats import crystalball as scipy_crystal_ball from pygama.math.functions.crystal_ball import crystal_ball -from pygama.math.functions.pygama_continuous import pygama_continuous +from pygama.math.functions.pygama_continuous import PygamaContinuous def test_crystalball_pdf(): @@ -18,7 +18,7 @@ def test_crystalball_pdf(): y_direct = crystal_ball.get_pdf(x, *par_array) scipy_y = scipy_crystal_ball.pdf(x, beta, m, mu, sigma) - assert isinstance(crystal_ball, pygama_continuous) + assert isinstance(crystal_ball, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) @@ -51,7 +51,7 @@ def test_crystalball_cdf(): y_direct = crystal_ball.get_cdf(x, *par_array) scipy_y = scipy_crystal_ball.cdf(x, beta, m, mu, sigma) - assert isinstance(crystal_ball, pygama_continuous) + assert isinstance(crystal_ball, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) diff --git a/tests/math/functions/test_exgauss.py b/tests/math/functions/test_exgauss.py index 8b05786a4..123dca265 100644 --- a/tests/math/functions/test_exgauss.py +++ b/tests/math/functions/test_exgauss.py @@ -2,7 +2,7 @@ from scipy.stats import exponnorm from pygama.math.functions.exgauss import exgauss -from pygama.math.functions.pygama_continuous import pygama_continuous +from pygama.math.functions.pygama_continuous import PygamaContinuous def test_exgauss_pdf(): @@ -19,7 +19,7 @@ def test_exgauss_pdf(): -1 * x, tau / sigma, -1 * mu, sigma ) # to be equivalent to the scipy version, x -> -x, mu -> -mu, k -> k/sigma - assert isinstance(exgauss, pygama_continuous) + assert isinstance(exgauss, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) @@ -53,7 +53,7 @@ def test_exgauss_cdf(): ) # to be equivalent to the scipy version, x -> -x, mu -> -mu, k -> k/sigma y_direct = exgauss.get_cdf(x, *par_array) - assert isinstance(exgauss, pygama_continuous) + assert isinstance(exgauss, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) diff --git a/tests/math/functions/test_exponential.py b/tests/math/functions/test_exponential.py index 19fab0d0d..32e36665c 100644 --- a/tests/math/functions/test_exponential.py +++ b/tests/math/functions/test_exponential.py @@ -2,7 +2,7 @@ from scipy.stats import expon from pygama.math.functions.exponential import exponential -from pygama.math.functions.pygama_continuous import pygama_continuous +from pygama.math.functions.pygama_continuous import PygamaContinuous def test_exponential_pdf(): @@ -17,7 +17,7 @@ def test_exponential_pdf(): y_direct = exponential.get_pdf(x, *par_array) scipy_y = expon.pdf(x, mu, sigma / lamb) - assert isinstance(exponential, pygama_continuous) + assert isinstance(exponential, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) @@ -46,7 +46,7 @@ def test_exponential_cdf(): y_direct = exponential.get_cdf(x, *par_array) scipy_y = expon.cdf(x, mu, sigma / lamb) - assert isinstance(exponential, pygama_continuous) + assert isinstance(exponential, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) diff --git a/tests/math/functions/test_gauss.py b/tests/math/functions/test_gauss.py index 717863cde..86c03aa35 100644 --- a/tests/math/functions/test_gauss.py +++ b/tests/math/functions/test_gauss.py @@ -2,7 +2,7 @@ from scipy.stats import norm as scipy_gaussian from pygama.math.functions.gauss import gaussian -from pygama.math.functions.pygama_continuous import pygama_continuous +from pygama.math.functions.pygama_continuous import PygamaContinuous def test_gaussian_pdf(): @@ -14,7 +14,7 @@ def test_gaussian_pdf(): y_direct = gaussian.get_pdf(x, mu, sigma) scipy_y = scipy_gaussian.pdf(x, mu, sigma) - assert isinstance(gaussian, pygama_continuous) + assert isinstance(gaussian, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) @@ -41,7 +41,7 @@ def test_gaussian_cdf(): y_direct = gaussian.get_cdf(x, mu, sigma) scipy_y = scipy_gaussian.cdf(x, mu, sigma) - assert isinstance(gaussian, pygama_continuous) + assert isinstance(gaussian, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) diff --git a/tests/math/functions/test_gauss_on_exgauss.py b/tests/math/functions/test_gauss_on_exgauss.py index 0b39529b9..fca88cd46 100644 --- a/tests/math/functions/test_gauss_on_exgauss.py +++ b/tests/math/functions/test_gauss_on_exgauss.py @@ -2,7 +2,7 @@ from scipy.stats import exponnorm, norm from pygama.math.functions.gauss_on_exgauss import gauss_on_exgauss -from pygama.math.functions.sum_dists import sum_dists +from pygama.math.functions.sum_dists import SumDists def test_gauss_on_exgauss_pdf(): @@ -14,7 +14,7 @@ def test_gauss_on_exgauss_pdf(): pars = np.array([mu, sigma, h_tail, tau], dtype=float) - assert isinstance(gauss_on_exgauss, sum_dists) + assert isinstance(gauss_on_exgauss, SumDists) y_direct = gauss_on_exgauss.get_pdf(x, *pars) scipy_exgauss = h_tail * exponnorm.pdf( @@ -45,7 +45,7 @@ def test_gauss_on_exgauss_cdf(): pars = np.array([mu, sigma, h_tail, tau], dtype=float) - assert isinstance(gauss_on_exgauss, sum_dists) + assert isinstance(gauss_on_exgauss, SumDists) y_direct = gauss_on_exgauss.get_cdf(x, *pars) diff --git a/tests/math/functions/test_gauss_on_exponential.py b/tests/math/functions/test_gauss_on_exponential.py index c584375a5..405228ff7 100644 --- a/tests/math/functions/test_gauss_on_exponential.py +++ b/tests/math/functions/test_gauss_on_exponential.py @@ -2,7 +2,7 @@ from scipy.stats import expon, norm from pygama.math.functions.gauss_on_exponential import gauss_on_exponential -from pygama.math.functions.sum_dists import sum_dists +from pygama.math.functions.sum_dists import SumDists def test_gauss_on_exponential_pdf(): @@ -17,7 +17,7 @@ def test_gauss_on_exponential_pdf(): pars = np.array([n_sig, mu, sigma, n_bkg, lamb, mu_exp, sigma_exp], dtype=float) - assert isinstance(gauss_on_exponential, sum_dists) + assert isinstance(gauss_on_exponential, SumDists) y_direct = gauss_on_exponential.get_pdf(x, *pars) scipy_exponential = n_bkg * expon.pdf(x, mu_exp, sigma_exp / lamb) @@ -51,7 +51,7 @@ def test_gauss_on_exponential_cdf(): pars = np.array([n_sig, mu, sigma, n_bkg, lamb, mu_exp, sigma_exp], dtype=float) - assert isinstance(gauss_on_exponential, sum_dists) + assert isinstance(gauss_on_exponential, SumDists) y_direct = gauss_on_exponential.get_cdf(x, *pars) diff --git a/tests/math/functions/test_gauss_on_linear.py b/tests/math/functions/test_gauss_on_linear.py index e2f990ff5..ac6722d73 100644 --- a/tests/math/functions/test_gauss_on_linear.py +++ b/tests/math/functions/test_gauss_on_linear.py @@ -2,7 +2,7 @@ from scipy.stats import norm from pygama.math.functions.gauss_on_linear import gauss_on_linear -from pygama.math.functions.sum_dists import sum_dists +from pygama.math.functions.sum_dists import SumDists def line(x, m, b): @@ -26,7 +26,7 @@ def test_gauss_on_linear_pdf(): pars = np.array([x_lo, x_hi, n_sig, mu, sigma, n_bkg, m, b], dtype=float) - assert isinstance(gauss_on_linear, sum_dists) + assert isinstance(gauss_on_linear, SumDists) y_direct = gauss_on_linear.get_pdf(x, *pars) @@ -60,7 +60,7 @@ def test_gauss_on_linear_cdf(): pars = np.array([x_lo, x_hi, n_sig, mu, sigma, n_bkg, m, b], dtype=float) - assert isinstance(gauss_on_linear, sum_dists) + assert isinstance(gauss_on_linear, SumDists) y_direct = gauss_on_linear.get_cdf(x, *pars) diff --git a/tests/math/functions/test_gauss_on_step.py b/tests/math/functions/test_gauss_on_step.py index 17d72601b..9adb58327 100644 --- a/tests/math/functions/test_gauss_on_step.py +++ b/tests/math/functions/test_gauss_on_step.py @@ -2,7 +2,7 @@ from scipy.stats import norm from pygama.math.functions.gauss_on_step import gauss_on_step -from pygama.math.functions.sum_dists import sum_dists +from pygama.math.functions.sum_dists import SumDists def test_gauss_on_step_pdf(): @@ -45,7 +45,7 @@ def test_gauss_on_step_pdf(): scipy_y_gauss = norm.pdf(x, mu, sigma) scipy_y = n_sig * scipy_y_gauss + n_bkg * scipy_y_step - assert isinstance(gauss_on_step, sum_dists) + assert isinstance(gauss_on_step, SumDists) assert np.allclose(y_direct, scipy_y, rtol=1e-8) y_sig, y_ext = gauss_on_step.pdf_ext(x, *pars) @@ -106,7 +106,7 @@ def test_gauss_on_step_cdf(): # for the cdf, we actually do not want to normalize by the sum of the areas, this is because iminuit takes a total unnormalized cdf for extended binned fits scipy_y = n_sig * scipy_y_gauss + n_bkg * scipy_y_step - assert isinstance(gauss_on_step, sum_dists) + assert isinstance(gauss_on_step, SumDists) assert np.allclose(y_direct, scipy_y, rtol=1e-8) diff --git a/tests/math/functions/test_gauss_on_uniform.py b/tests/math/functions/test_gauss_on_uniform.py index bf911c731..fcb483aca 100644 --- a/tests/math/functions/test_gauss_on_uniform.py +++ b/tests/math/functions/test_gauss_on_uniform.py @@ -2,7 +2,7 @@ from scipy.stats import norm, uniform from pygama.math.functions.gauss_on_uniform import gauss_on_uniform -from pygama.math.functions.sum_dists import sum_dists +from pygama.math.functions.sum_dists import SumDists def test_gauss_on_uniform_pdf(): @@ -14,7 +14,7 @@ def test_gauss_on_uniform_pdf(): pars = np.array([np.amin(x), np.amax(x), n_sig, mu, sigma, n_bkg], dtype=float) - assert isinstance(gauss_on_uniform, sum_dists) + assert isinstance(gauss_on_uniform, SumDists) y_direct = gauss_on_uniform.get_pdf(x, *pars) @@ -39,7 +39,7 @@ def test_gauss_on_uniform_cdf(): pars = np.array([np.amin(x), np.amax(x), n_sig, mu, sigma, n_bkg], dtype=float) - assert isinstance(gauss_on_uniform, sum_dists) + assert isinstance(gauss_on_uniform, SumDists) y_direct = gauss_on_uniform.get_cdf(x, *pars) diff --git a/tests/math/functions/test_hpge_peak.py b/tests/math/functions/test_hpge_peak.py index 5eace500e..0aa738824 100644 --- a/tests/math/functions/test_hpge_peak.py +++ b/tests/math/functions/test_hpge_peak.py @@ -2,7 +2,7 @@ from scipy.stats import exponnorm, norm from pygama.math.functions.hpge_peak import hpge_peak -from pygama.math.functions.sum_dists import sum_dists +from pygama.math.functions.sum_dists import SumDists def test_hpge_peak_pdf(): @@ -23,7 +23,7 @@ def test_hpge_peak_pdf(): dtype=float, ) - assert isinstance(hpge_peak, sum_dists) + assert isinstance(hpge_peak, SumDists) y_direct = hpge_peak.get_pdf(x, *pars) scipy_exgauss = htail * exponnorm.pdf( @@ -76,7 +76,7 @@ def test_hpge_peak_cdf(): dtype=float, ) - assert isinstance(hpge_peak, sum_dists) + assert isinstance(hpge_peak, SumDists) y_direct = hpge_peak.get_cdf(x, *pars) diff --git a/tests/math/functions/test_linear.py b/tests/math/functions/test_linear.py index cdc158089..62a7809ab 100644 --- a/tests/math/functions/test_linear.py +++ b/tests/math/functions/test_linear.py @@ -2,7 +2,7 @@ from scipy.stats import uniform as scipy_uniform from pygama.math.functions.linear import linear -from pygama.math.functions.pygama_continuous import pygama_continuous +from pygama.math.functions.pygama_continuous import PygamaContinuous from pygama.math.functions.uniform import uniform @@ -17,7 +17,7 @@ def test_linear_pdf(): y_direct = linear.get_pdf(x, x_lo, x_hi, m, b) scipy_y = scipy_uniform.pdf(x, x[0], x[-1] + np.abs(x[0])) - assert isinstance(linear, pygama_continuous) + assert isinstance(linear, PygamaContinuous) assert np.array_equal(y, scipy_y) assert np.array_equal(y_direct, scipy_y) @@ -44,7 +44,7 @@ def test_linear_cdf(): y_direct = linear.get_cdf(x, x_lo, x_hi, m, b) scipy_y = scipy_uniform.cdf(x, x[0], x[-1] + np.abs(x[0])) - assert isinstance(uniform, pygama_continuous) + assert isinstance(uniform, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) diff --git a/tests/math/functions/test_moyal.py b/tests/math/functions/test_moyal.py index 2023ed539..06b8ad20f 100644 --- a/tests/math/functions/test_moyal.py +++ b/tests/math/functions/test_moyal.py @@ -2,7 +2,7 @@ from scipy.stats import moyal as scipy_moyal from pygama.math.functions.moyal import moyal -from pygama.math.functions.pygama_continuous import pygama_continuous +from pygama.math.functions.pygama_continuous import PygamaContinuous def test_moyal_pdf(): @@ -14,7 +14,7 @@ def test_moyal_pdf(): y_direct = moyal.get_pdf(x, mu, sigma) scipy_y = scipy_moyal.pdf(x, mu, sigma) - assert isinstance(moyal, pygama_continuous) + assert isinstance(moyal, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) @@ -41,7 +41,7 @@ def test_moyal_cdf(): y_direct = moyal.get_cdf(x, mu, sigma) scipy_y = scipy_moyal.cdf(x, mu, sigma) - assert isinstance(moyal, pygama_continuous) + assert isinstance(moyal, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) diff --git a/tests/math/functions/test_numba_frozen.py b/tests/math/functions/test_numba_frozen.py index 6cb648954..1c13fea4e 100644 --- a/tests/math/functions/test_numba_frozen.py +++ b/tests/math/functions/test_numba_frozen.py @@ -2,7 +2,7 @@ from scipy.stats import norm from pygama.math.functions.gauss import gaussian -from pygama.math.functions.pygama_continuous import numba_frozen +from pygama.math.functions.pygama_continuous import NumbaFrozen def test_numba_frozen(): @@ -10,7 +10,7 @@ def test_numba_frozen(): mu = 2 sigma = 3 gauss = gaussian(mu, sigma) - assert isinstance(gauss, numba_frozen) + assert isinstance(gauss, NumbaFrozen) numba_pdf = gauss.pdf(x) scipy_pdf = norm.pdf(x, mu, sigma) diff --git a/tests/math/functions/test_step.py b/tests/math/functions/test_step.py index 7ae932b58..1aa03dea9 100644 --- a/tests/math/functions/test_step.py +++ b/tests/math/functions/test_step.py @@ -1,7 +1,7 @@ import numpy as np from scipy.stats import norm -from pygama.math.functions.pygama_continuous import pygama_continuous +from pygama.math.functions.pygama_continuous import PygamaContinuous from pygama.math.functions.step import step @@ -41,7 +41,7 @@ def test_step_pdf(): # compute the normalized step function scipy_y = scipy_step / normalization - assert isinstance(step, pygama_continuous) + assert isinstance(step, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) @@ -103,7 +103,7 @@ def test_step_cdf(): # Compute the cdf scipy_y = unnormalized_cdf / pdf_normalization - assert isinstance(step, pygama_continuous) + assert isinstance(step, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) diff --git a/tests/math/functions/test_sum_dists.py b/tests/math/functions/test_sum_dists.py index 82bd5954d..e75945e69 100644 --- a/tests/math/functions/test_sum_dists.py +++ b/tests/math/functions/test_sum_dists.py @@ -2,12 +2,12 @@ from scipy.stats import norm from pygama.math.functions.gauss import gaussian -from pygama.math.functions.sum_dists import sum_dists +from pygama.math.functions.sum_dists import SumDists def test_only_two_distributions(): with pytest.raises(ValueError) as exc_info: - sum_dists([(gaussian, [0, 1])], [0], "fracs") + SumDists([(gaussian, [0, 1])], [0], "fracs") exception_raised = exc_info.value assert str(exception_raised) == "Can only sum two distributions at once." @@ -15,7 +15,7 @@ def test_only_two_distributions(): def test_two_distributions_no_par_array(): with pytest.raises(ValueError) as exc_info: - sum_dists([(gaussian, [0, 1]), (gaussian, [], [])], [0], "fracs") + SumDists([(gaussian, [0, 1]), (gaussian, [], [])], [0], "fracs") exception_raised = exc_info.value assert ( @@ -26,7 +26,7 @@ def test_two_distributions_no_par_array(): def test_pygama_continuous_check(): with pytest.raises(ValueError) as exc_info: - sum_dists([(gaussian, [0, 1]), (norm, [1, 0])], [0], "fracs") + SumDists([(gaussian, [0, 1]), (norm, [1, 0])], [0], "fracs") exception_raised = exc_info.value assert str(exception_raised)[-12:] == "distribution" @@ -34,26 +34,26 @@ def test_pygama_continuous_check(): def test_fracs_flag_with_wrong_args(): with pytest.raises(ValueError) as exc_info: - sum_dists([(gaussian, [0, 1]), (gaussian, [1, 0])], [0, 2], "fracs") + SumDists([(gaussian, [0, 1]), (gaussian, [1, 0])], [0, 2], "fracs") exception_raised = exc_info.value assert ( str(exception_raised) - == "sum_dists only accepts the parameter position of one fraction." + == "SumDists only accepts the parameter position of one fraction." ) def test_areas_flag_with_wrong_args(): with pytest.raises(ValueError) as exc_info: - sum_dists([(gaussian, [0, 1]), (gaussian, [1, 0])], [0], "areas") + SumDists([(gaussian, [0, 1]), (gaussian, [1, 0])], [0], "areas") exception_raised = exc_info.value - assert str(exception_raised) == "sum_dists needs two parameter indices of areas." + assert str(exception_raised) == "SumDists needs two parameter indices of areas." def test_onea_area_flag_with_wrong_args(): with pytest.raises(ValueError) as exc_info: - sum_dists([(gaussian, [0, 1]), (gaussian, [1, 0])], [0, 2], "one_area") + SumDists([(gaussian, [0, 1]), (gaussian, [1, 0])], [0, 2], "one_area") exception_raised = exc_info.value - assert str(exception_raised) == "sum_dists needs one parameter index of an area." + assert str(exception_raised) == "SumDists needs one parameter index of an area." diff --git a/tests/math/functions/test_triple_gauss_on_double_step.py b/tests/math/functions/test_triple_gauss_on_double_step.py index c46b53c30..bb04bfbc8 100644 --- a/tests/math/functions/test_triple_gauss_on_double_step.py +++ b/tests/math/functions/test_triple_gauss_on_double_step.py @@ -1,7 +1,7 @@ import numpy as np from scipy.stats import norm -from pygama.math.functions.sum_dists import sum_dists +from pygama.math.functions.sum_dists import SumDists from pygama.math.functions.triple_gauss_on_double_step import ( triple_gauss_on_double_step, ) @@ -83,7 +83,7 @@ def test_triple_gauss_on_double_step_pdf(): scipy_y = scipy_y_step + scipy_y_gauss - assert isinstance(triple_gauss_on_double_step, sum_dists) + assert isinstance(triple_gauss_on_double_step, SumDists) assert np.allclose(y_direct, scipy_y, rtol=1e-8) @@ -155,7 +155,7 @@ def test_triple_gauss_on_double_step_cdf(): ) scipy_y = scipy_y_step + scipy_y_gauss - assert isinstance(triple_gauss_on_double_step, sum_dists) + assert isinstance(triple_gauss_on_double_step, SumDists) assert np.allclose(y_direct, scipy_y, rtol=1e-8) diff --git a/tests/math/functions/test_uniform.py b/tests/math/functions/test_uniform.py index 00ba18af3..d21904735 100644 --- a/tests/math/functions/test_uniform.py +++ b/tests/math/functions/test_uniform.py @@ -1,7 +1,7 @@ import numpy as np from scipy.stats import uniform as scipy_uniform -from pygama.math.functions.pygama_continuous import pygama_continuous +from pygama.math.functions.pygama_continuous import PygamaContinuous from pygama.math.functions.uniform import uniform @@ -14,7 +14,7 @@ def test_uniform_pdf(): y_direct = uniform.get_pdf(x, a, b) scipy_y = scipy_uniform.pdf(x, a, b) - assert isinstance(uniform, pygama_continuous) + assert isinstance(uniform, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.array_equal(y_direct, scipy_y) @@ -40,7 +40,7 @@ def test_uniform_cdf(): y_direct = uniform.get_cdf(x, a, b) scipy_y = scipy_uniform.cdf(x, a, b) - assert isinstance(uniform, pygama_continuous) + assert isinstance(uniform, PygamaContinuous) assert np.allclose(y, scipy_y, rtol=1e-8) assert np.allclose(y_direct, scipy_y, rtol=1e-8) diff --git a/tests/math/test_unbinned_fitting.py b/tests/math/test_unbinned_fitting.py index e3467dd9b..9d2427241 100644 --- a/tests/math/test_unbinned_fitting.py +++ b/tests/math/test_unbinned_fitting.py @@ -18,7 +18,7 @@ def test_fit_unbinned(): from pygama.math.functions.gauss import gaussian fit, fit_error, fit_cov = pgubf.fit_unbinned( - gaussian.get_pdf, xdata, guess=[0, 0.9], cost_func="LL", Extended=False + gaussian.get_pdf, xdata, guess=[0, 0.9], cost_func="LL", extended=False ) assert fit["mu"] == approx(mu, rel=1e-2) assert fit["sigma"] == approx(sigma, rel=1e-1) diff --git a/tests/skm/configs/basic-skm-config.json b/tests/skm/configs/basic-skm-config.json deleted file mode 100644 index e1ffda941..000000000 --- a/tests/skm/configs/basic-skm-config.json +++ /dev/null @@ -1,25 +0,0 @@ -{ - "multiplicity": 3, - "operations": { - "timestamp": { - "forward_field": "evt.timestamp", - "lgdo_attrs": { "info": "pk was here" } - }, - "energy_sum": { - "forward_field": "evt.energy_sum" - }, - "multiplicity": { - "forward_field": "evt.multiplicity" - }, - "energy": { - "forward_field": "hit.cuspEmax_ctc_cal", - "missing_value": 0.0, - "tcm_idx": "evt.energy_idx" - }, - "energy_id": { - "forward_field": "tcm.array_id", - "missing_value": 0, - "tcm_idx": "evt.energy_idx" - } - } -} diff --git a/tests/skm/test_build_skm.py b/tests/skm/test_build_skm.py deleted file mode 100644 index 00fda9f08..000000000 --- a/tests/skm/test_build_skm.py +++ /dev/null @@ -1,119 +0,0 @@ -import os -from pathlib import Path - -import awkward as ak -import lgdo -import pytest -from lgdo.lh5 import LH5Store - -from pygama.evt import build_evt -from pygama.skm import build_skm - -config_dir = Path(__file__).parent / "configs" -evt_config_dir = Path(__file__).parent.parent / "evt" / "configs" -store = LH5Store() - - -@pytest.fixture(scope="module") -def files_config(lgnd_test_data, tmptestdir): - tcm_path = "lh5/prod-ref-l200/generated/tier/tcm/phy/p03/r001/l200-p03-r001-phy-20230322T160139Z-tier_tcm.lh5" - outfile = f"{tmptestdir}/l200-p03-r001-phy-20230322T160139Z-tier_evt.lh5" - - return { - "tcm": (lgnd_test_data.get_path(tcm_path), "hardware_tcm_1"), - "dsp": (lgnd_test_data.get_path(tcm_path.replace("tcm", "dsp")), "dsp", "ch{}"), - "hit": (lgnd_test_data.get_path(tcm_path.replace("tcm", "hit")), "hit", "ch{}"), - "evt": (outfile, "evt"), - } - - -def test_basics(tmptestdir, files_config): - build_evt( - files_config, - config=f"{evt_config_dir}/vov-test-evt-config.json", - wo_mode="of", - ) - outfile = files_config["evt"][0] - - skm_conf = f"{config_dir}/basic-skm-config.json" - skm_out = f"{tmptestdir}/l200-p03-r001-phy-20230322T160139Z-tier_skm.lh5" - - result = build_skm( - outfile, - files_config["hit"][0], - files_config["dsp"][0], - files_config["tcm"][0], - skm_conf, - ) - - assert isinstance(result, lgdo.Table) - - build_skm( - outfile, - files_config["hit"][0], - files_config["dsp"][0], - files_config["tcm"][0], - skm_conf, - skm_out, - wo_mode="o", - ) - - assert os.path.exists(skm_out) - obj, _ = store.read("/skm/", skm_out) - - assert obj == result - - df = obj.view_as("pd") - assert "timestamp" in df.keys() - assert "energy_0" in df.keys() - assert "energy_1" in df.keys() - assert "energy_2" in df.keys() - assert "energy_id_0" in df.keys() - assert "energy_id_1" in df.keys() - assert "energy_id_2" in df.keys() - assert "multiplicity" in df.keys() - assert "energy_sum" in df.keys() - assert (df.multiplicity.to_numpy() <= 3).all() - - vov_eid = ak.to_numpy( - ak.fill_none( - ak.pad_none( - store.read("/evt/energy_id", outfile)[0].view_as("ak"), 3, clip=True - ), - 0, - ), - allow_missing=False, - ) - assert (vov_eid[:, 0] == df.energy_id_0.to_numpy()).all() - assert (vov_eid[:, 1] == df.energy_id_1.to_numpy()).all() - assert (vov_eid[:, 2] == df.energy_id_2.to_numpy()).all() - - -def test_attribute_passing(tmptestdir, files_config): - outfile = f"{tmptestdir}/l200-p03-r001-phy-20230322T160139Z-tier_evt.lh5" - if os.path.exists(outfile): - os.remove(outfile) - - build_evt( - files_config, - config=f"{evt_config_dir}/vov-test-evt-config.json", - wo_mode="of", - ) - - skm_conf = f"{config_dir}/basic-skm-config.json" - - skm_out = f"{tmptestdir}/l200-p03-r001-phy-20230322T160139Z-tier_skm.lh5" - - build_skm( - outfile, - files_config["hit"][0], - files_config["dsp"][0], - files_config["tcm"][0], - skm_conf, - f_skm=skm_out, - wo_mode="o", - ) - - assert os.path.exists(skm_out) - assert "info" in store.read("/skm/timestamp", skm_out)[0].getattrs().keys() - assert store.read("/skm/timestamp", skm_out)[0].getattrs()["info"] == "pk was here"