
WIP: ENH: implement gradient approximation #60

Open · wants to merge 4 commits into base: main
Changes from 1 commit
26 changes: 24 additions & 2 deletions dask_glm/algorithms.py
@@ -65,7 +65,8 @@ def compute_stepsize_dask(beta, step, Xbeta, Xstep, y, curr_val,


@normalize
-def gradient_descent(X, y, max_iter=100, tol=1e-14, family=Logistic, **kwargs):
+def gradient_descent(X, y, max_iter=100, tol=1e-14, family=Logistic,
+                     approx_grad=False, initial_batch_size=None, **kwargs):
    """
    Michael Grant's implementation of Gradient Descent.

@@ -80,6 +81,13 @@ def gradient_descent(X, y, max_iter=100, tol=1e-14, family=Logistic, **kwargs):
        Maximum allowed change from prior iteration required to
        declare convergence
    family : Family
+    approx_grad : bool
+        If True (default False), approximate the gradient with a subset of the
+        examples in X and y. When the number of examples is very large, this
+        can result in speed gains.
+    initial_batch_size : int
+        The initial batch size when approximating the gradient. Only used when
+        `approx_grad == True`. Defaults to `min(n // n_chunks, n // 100) + 1`.

    Returns
    -------
@@ -97,9 +105,23 @@ def gradient_descent(X, y, max_iter=100, tol=1e-14, family=Logistic, **kwargs):
    backtrackMult = firstBacktrackMult
    beta = np.zeros(p)

+    if approx_grad:
+        n_chunks = max(len(c) for c in X.chunks)
+        batch_size = (min(n // n_chunks, n // 100) + 1
+                      if initial_batch_size is None else initial_batch_size)
+        keep = {'X': X, 'y': y}
+
    for k in range(max_iter):
+        if approx_grad:
+            batch_size = min(1.1 * batch_size + 1, n)
+            i = np.random.permutation(n).astype(int)
+            batch = list(i[:int(batch_size)])
+            X, y = keep['X'][batch], keep['y'][batch]
+            Xbeta = X.dot(beta)
+            func = loglike(Xbeta, y)
Member:
I wonder if we can downsample the array without creating an explicit list of indices locally. This probably doesn't matter much when doing things on a single machine but may matter on a distributed system.

If we can index with a boolean dask.array of increasing density then that might work more nicely, though I'm not sure.
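
A rough sketch of that idea (a reader's illustration, not part of this PR): draw a Bernoulli mask as a dask array and let its density grow each iteration. This assumes dask supports boolean-array indexing along the rows axis; the `subsample` helper and `density` argument are made up for illustration.

```python
import dask.array as da

def subsample(X, y, density, seed=0):
    # Keep each row independently with probability `density`; no explicit
    # index list is materialized on the client.
    rs = da.random.RandomState(seed)
    mask = rs.random_sample(size=X.shape[0], chunks=X.chunks[0]) < density
    # The resulting chunk sizes are unknown until computed, which any
    # downstream code would need to tolerate.
    return X[mask], y[mask]

# density could start near batch_size / n and grow by ~1.1x per iteration,
# mirroring the batch-size schedule in this PR.
```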

Member:
Sorry, this comment was hanging around from a while ago. Just submitted it now.


        # how necessary is this recalculation?
-        if k % recalcRate == 0:
+        if not approx_grad and k % recalcRate == 0:
            Xbeta = X.dot(beta)
            func = loglike(Xbeta, y)
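
As a quick sanity check on the schedule above (reader's note, not from the PR): the batch grows by roughly 1.1x per iteration, so the full dataset is reached after about log base 1.1 of (n / initial batch) iterations, after which the remaining iterations of the default `max_iter=100` run on the full data. Illustrative numbers:

```python
import math

n = 1000000                       # illustrative number of rows
b0 = n // 100 + 1                 # default initial batch when n // n_chunks >= n // 100
iters = math.ceil(math.log(n / b0) / math.log(1.1))
print(b0, iters)                  # 10001 49: full data after roughly 49 iterations
```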

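For reference, a minimal usage sketch of the option this PR adds, assuming this branch is installed; the synthetic data below is purely illustrative:

```python
import dask.array as da
import numpy as np
from dask_glm.algorithms import gradient_descent

n, p = 100000, 10
X = da.random.random((n, p), chunks=(10000, p))
true_beta = np.random.randn(p)

# Bernoulli labels drawn from a logistic model of X @ true_beta
prob = 1 / (1 + da.exp(-X.dot(true_beta)))
y = (da.random.random(n, chunks=10000) < prob).astype(float)

# approx_grad=True subsamples rows each iteration, starting from
# initial_batch_size and growing the batch by ~1.1x per step
beta = gradient_descent(X, y, approx_grad=True, initial_batch_size=1000)
```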