-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RandomSampler #24
Changes from 1 commit
a005317
a49551e
e414cb1
c27e8b6
80a3c3f
947320b
176c25c
9f740ce
bc196a4
db329d4
f1c61dd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,13 +24,32 @@ or locally on your CPU. | |
*dwave-samplers* implements the following classical algorithms for solving | ||
:term:`binary quadratic model`\ s (BQM): | ||
|
||
* Random: a sampler that draws uniform random samples. | ||
* `Simulated Annealing`_: a probabilistic heuristic for optimization and approximate | ||
Boltzmann sampling well suited to finding good solutions of large problems. | ||
* `Steepest Descent`_: a discrete analogue of gradient descent, often used in | ||
machine learning, that quickly finds a local minimum. | ||
* `Tabu`_: a heuristic that employs local search with methods to escape local minima. | ||
* `Tree Decomposition`_: an exact solver for problems with low treewidth. | ||
|
||
Random | ||
====== | ||
|
||
Random samplers provide a useful baseline performance comparison. The variable | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would this be the place to mention which PRNG is used? e.g., Mersenne-Twister? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
assignments in each sample are chosen by a coin flip. | ||
|
||
>>> from dwave.samplers import RandomSampler | ||
>>> sampler = RandomSampler() | ||
|
||
Create a random binary quadratic model. | ||
|
||
>>> import dimod | ||
>>> bqm = dimod.generators.gnp_random_bqm(100, .5, 'BINARY') | ||
|
||
Get the 20 best random samples found in .2 seconds of searching. | ||
|
||
>>> sampleset = sampler.sample(bqm, num_reads=20, time_limit=.2) | ||
|
||
Simulated Annealing | ||
=================== | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# Copyright 2022 D-Wave Systems Inc. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
from dwave.samplers.random.sampler import * |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,174 @@ | ||
# distutils: language = c++ | ||
# cython: language_level = 3 | ||
|
||
# Copyright 2022 D-Wave Systems Inc. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
cimport cython | ||
|
||
from cpython.pycapsule cimport PyCapsule_IsValid, PyCapsule_GetPointer | ||
from libc.time cimport time, time_t | ||
from libcpp.algorithm cimport sort | ||
from libcpp.vector cimport vector | ||
from posix.time cimport clock_gettime, timespec, CLOCK_REALTIME | ||
|
||
import dimod | ||
cimport dimod | ||
import numpy as np | ||
cimport numpy as np | ||
cimport numpy.random | ||
|
||
cdef extern from *: | ||
""" | ||
#if defined(_WIN32) || defined(_WIN64) | ||
|
||
#include <Windows.h> | ||
|
||
double realtime_clock() { | ||
LARGE_INTEGER frequency; | ||
LARGE_INTEGER now; | ||
|
||
QueryPerformanceFrequency(&frequency); | ||
QueryPerformanceCounter(&now); | ||
|
||
return now.QuadPart / frequency.QuadPart; | ||
} | ||
|
||
#else | ||
|
||
double realtime_clock() { | ||
struct timespec ts; | ||
clock_gettime(CLOCK_MONOTONIC, &ts); | ||
return ts.tv_sec + ts.tv_nsec / 1e9; | ||
} | ||
|
||
#endif | ||
""" | ||
double realtime_clock() | ||
|
||
|
||
cdef struct state_t: | ||
np.float64_t energy | ||
vector[np.int8_t] sample | ||
|
||
|
||
cdef bint comp_state(const state_t& a, const state_t& b) nogil: | ||
return a.energy < b.energy | ||
|
||
|
||
cdef state_t get_sample(dimod.cyBQM_float64 cybqm, | ||
numpy.random.bitgen_t* bitgen, | ||
bint is_spin = False, | ||
): | ||
# developer note: there is a bunch of potential optimization here | ||
cdef state_t state | ||
|
||
# generate the sample | ||
state.sample.reserve(cybqm.num_variables()) | ||
cdef Py_ssize_t i | ||
for i in range(cybqm.num_variables()): | ||
state.sample.push_back(bitgen.next_uint32(bitgen.state) % 2) | ||
|
||
if is_spin: | ||
# go back through and convert to spin | ||
for i in range(state.sample.size()): | ||
state.sample[i] = 2 * state.sample[i] - 1 | ||
|
||
state.energy = cybqm.data().energy(state.sample.begin()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Converting spins and computing energies seem more fitting as a postprocessing step. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Only in the case when you're keeping every sample. Otherwise you need the energies in order to keep the population to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK that makes sense. Follow-up question: why do we use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Four different combos with their uses:
Obviously these are not inclusive of use cases but should give some idea. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would forbid user setting both There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 947320b changed the behavior to always return There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was looking at your comment above, and missed that update, thanks. |
||
|
||
return state | ||
|
||
|
||
@cython.boundscheck(False) | ||
@cython.wraparound(False) | ||
def sample(bqm, Py_ssize_t num_reads, object seed, np.float64_t time_limit): | ||
|
||
cdef double preprocessing_start_time = realtime_clock() | ||
|
||
cdef Py_ssize_t i, j # counters for use later | ||
|
||
# Get Cython access to the BQM. We could template to avoid the copy, | ||
# but honestly everyone just uses float64 anyway so... | ||
cdef dimod.cyBQM_float64 cybqm = dimod.as_bqm(bqm, dtype=float).data | ||
cdef bint is_spin = bqm.vartype is dimod.SPIN | ||
|
||
# Get Cython access to the rng | ||
rng = np.random.default_rng(seed) | ||
cdef numpy.random.bitgen_t *bitgen | ||
cdef const char *capsule_name = "BitGenerator" | ||
capsule = rng.bit_generator.capsule | ||
if not PyCapsule_IsValid(capsule, capsule_name): | ||
raise ValueError("Invalid pointer to anon_func_state") | ||
bitgen = <numpy.random.bitgen_t *> PyCapsule_GetPointer(capsule, capsule_name) | ||
|
||
cdef double sampling_start_time = realtime_clock() | ||
|
||
cdef double sampling_stop_time | ||
if time_limit < 0: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this be Quoting docstring below
Should this be strictly positive? update: Nvm, I can see why a strict inequality is an intuitive choice. I don't have an objective argument at the moment. |
||
sampling_stop_time = float('inf') | ||
else: | ||
sampling_stop_time = sampling_start_time + time_limit | ||
|
||
# try sampling | ||
cdef Py_ssize_t num_drawn = 0 | ||
cdef vector[state_t] samples | ||
for i in range(num_reads): | ||
samples.push_back(get_sample(cybqm, bitgen, is_spin)) | ||
num_drawn += 1 | ||
|
||
if realtime_clock() > sampling_stop_time: | ||
break | ||
|
||
if time_limit >= 0: | ||
while realtime_clock() < sampling_stop_time: | ||
|
||
samples.push_back(get_sample(cybqm, bitgen, is_spin)) | ||
sort(samples.begin(), samples.end(), comp_state) | ||
samples.pop_back() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps use a heap instead? (for total Alternatively, you can use insertion sort (for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Tried that, it actually hurt performance. Imagine it's the compiler being smart. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or constant factor too big to notice benefits on smaller scale. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, after the change introduced in 176c25c, I was able to get a performance boost at very large There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For posterity I used bqm = dimod.generators.gnp_random_bqm(1000, .5, 'BINARY')
num_reads=10000 |
||
|
||
num_drawn += 1 | ||
|
||
cdef double postprocessing_start_time = realtime_clock() | ||
|
||
if time_limit < 0: | ||
# for consistency we sort in this case as well, though we count | ||
# it towards postprocessing since it's not necessary | ||
sort(samples.begin(), samples.end(), comp_state) | ||
|
||
record = np.rec.array(np.empty(num_reads, | ||
dtype=[('sample', np.int8, (cybqm.num_variables(),)), | ||
('energy', float), | ||
('num_occurrences', int)])) | ||
|
||
record['num_occurrences'][:] = 1 | ||
|
||
cdef np.float64_t[:] energies_view = record['energy'] | ||
for i in range(num_reads): | ||
energies_view[i] = samples[i].energy | ||
|
||
cdef np.int8_t[:, :] sample_view = record['sample'] | ||
for i in range(num_reads): | ||
for j in range(cybqm.num_variables()): | ||
sample_view[i, j] = samples[i].sample[j] | ||
|
||
sampleset = dimod.SampleSet(record, bqm.variables, info=dict(), vartype=bqm.vartype) | ||
|
||
sampleset.info.update( | ||
num_drawn=num_drawn, | ||
prepreocessing_time=sampling_start_time-preprocessing_start_time, | ||
sampling_time=postprocessing_start_time-sampling_start_time, | ||
postprocessing_time=realtime_clock()-preprocessing_start_time, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Return as a dict, i.e., There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am open to having it be nested. Though since we're already changing the names, IMO consistency is not the main reason to do it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we do nest it, and if we're already changing the names, IMO it would look nicer to have info = dict(timing=dict(
preprocessing=...,
sampling=...,
postprocessing=...,
)) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMHO consistency of timing terminology with QPU and other solvers is a priority. Otherwise the code to plot data from different solvers on the same graph becomes very unwieldy. This may require some exceptions and compromises because different solvers structures have different structures, but aiming for maximum comparability would be good thing. This would also be an argument for generic timing categories based on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 for nested timings! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would also consider typing the duration variables. Perhaps best to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would also like that. I did it this way for consistency with QPU timing, but we're already being inconsistent in other places so I think probably worth raising at #22 ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. QPU timing info could be considered encoded for transport -- and we can decode it as |
||
) | ||
|
||
return sampleset |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
# Copyright 2022 D-Wave Systems Inc. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import typing | ||
|
||
import dimod | ||
import numpy as np | ||
|
||
from dwave.samplers.random.cyrandom import sample | ||
|
||
|
||
__all__ = ['RandomSampler'] | ||
|
||
|
||
class RandomSampler(dimod.Sampler): | ||
"""A random sampler, useful as a performance baseline and for testing. | ||
|
||
Examples: | ||
|
||
>>> from dwave.samplers import RandomSampler | ||
>>> sampler = RandomSampler() | ||
|
||
Create a random binary quadratic model. | ||
|
||
>>> import dimod | ||
>>> bqm = dimod.generators.gnp_random_bqm(100, .5, 'BINARY') | ||
|
||
Get the 20 best random samples found in .2 seconds of searching. | ||
|
||
>>> sampleset = sampler.sample(bqm, num_reads=20, time_limit=.2) | ||
|
||
""" | ||
|
||
parameters: typing.Mapping[str, typing.List] = dict( | ||
num_reads=[], | ||
seed=[], | ||
time_limit=[], | ||
) | ||
"""Keyword arguments accepted by the sampling methods. | ||
|
||
Examples: | ||
|
||
>>> from dwave.samplers import RandomSampler | ||
>>> sampler = RandomSampler() | ||
>>> sampler.parameters | ||
{'num_reads': [], 'seed': [], 'time_limit': []} | ||
|
||
""" | ||
|
||
properties: typing.Mapping[str, typing.Any] = dict( | ||
) | ||
"""Information about the solver. Empty. | ||
|
||
Examples: | ||
|
||
>>> from dwave.samplers import RandomSampler | ||
>>> sampler = RandomSampler() | ||
>>> sampler.properties | ||
{} | ||
|
||
""" | ||
|
||
def sample(self, | ||
bqm: dimod.BinaryQuadraticModel, | ||
*, | ||
num_reads: int = 10, | ||
seed: typing.Union[None, int, np.random.Generator] = None, | ||
time_limit: typing.Optional[float] = None, | ||
**kwargs, | ||
) -> dimod.SampleSet: | ||
"""Return random samples for a binary quadratic model. | ||
|
||
Args: | ||
bqm: Binary quadratic model to be sampled from. | ||
|
||
num_reads: The number of samples to be returned. | ||
|
||
seed: | ||
Seed for the random number generator. | ||
Passed to :func:`numpy.random.default_rng()`. | ||
|
||
time_limit: | ||
The maximum sampling time in seconds. | ||
If given and non-negative, samples are drawn until ``time_limit``. | ||
Only the best ``num_reads`` (or fewer) samples are kept. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Until we figure out how best to extend/generalize A case of setting Perhaps we should introduce So:
With this scheme, I would set There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I had already changed it so There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To summarize the current implementation's behaviour...
I think the bolded case could be an invitation to making human errors for default edit: I reread the above, and I think the ambiguity comes from the word argument name "time limit" being suggestive of a hard cutoff. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, happy to lower the default to 1. I had it higher for backwards compatibility reasons with dimod but I don't think that actually matters. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, so it looks an implied AND between
I find this awkward and inconsistent with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, how about I just make two samplers. One that take a Personally I think that's way overkill, but it seems like no one else agrees with me 😄 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The problem is that we generate a lot of samples.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If we're deciding between the two, I also favour the latter for similar reasons. I do think Radomir's proposal is more intuitive and gives more control over the behaviour of the sampler. Another point to consider is if we add a similar feature for Neal, a consistent behaviour across classes of samplers would be desirable. And in the case of Neal, I think it's reasonable to set a time limit and retrieve all solutions it came across. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I concede. I will refactor. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Extending sampler interface with |
||
|
||
Returns: | ||
A sample set. | ||
Some additional information is provided in the | ||
:attr:`~dimod.SampleSet.info` dictionary: | ||
|
||
* **num_drawn**: The total number of samples generated. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is my understanding as follows correct? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Specifically when |
||
* **prepreocessing_time**: The time to parse the ``bqm`` and to | ||
initialize the random number generator. | ||
* **sampling_time**: The time used to generate the samples | ||
and calculate the energies. This is the number controlled by | ||
``time_limit``. | ||
* **postprocessing_time**: The time to construct the sample | ||
set. | ||
|
||
""" | ||
|
||
# we could count this towards preprocesing time but IMO it's fine to | ||
# skip for simplicity. | ||
self.remove_unknown_kwargs(**kwargs) | ||
|
||
return sample(bqm, | ||
num_reads=num_reads, | ||
seed=seed, | ||
time_limit=-1 if time_limit is None else time_limit, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing backticks i.e.,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are for the link. In this case I am not linking to anything. Though open to a good place to link to, https://en.wikipedia.org/wiki/Randomization seemed a bit over general 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's backticks plus underscore to link to the new Random section in line 35
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I misunderstood. Will do.