custom clustering now takes position matrix instead of distance^2 matrix #87
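
This PR changes the signature of the user-supplied cluster callback: instead of a square matrix of squared pairwise distances, the callback now receives the live-point positions themselves (one row per point) and can rebuild a distance matrix itself if it needs one. A rough sketch of the difference on the Python side (function names are illustrative, not taken from the diff):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Before this PR: the callback received an (nPoints, nPoints) matrix of squared distances.
def my_cluster_old(distance2_matrix):
    npoints = distance2_matrix.shape[0]
    return np.zeros(npoints, dtype=int)   # everything in one cluster

# After this PR: the callback receives the point positions, one row per live point,
# and can recompute squared distances itself if the algorithm needs them.
def my_cluster_new(points):
    distance2_matrix = cdist(points, points, metric="sqeuclidean")
    return np.zeros(points.shape[0], dtype=int)
```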

Open · wants to merge 48 commits into base: custom_cluster

Commits (48)
f1f52ed
neighbor spelling (assuming we're sticking to American spelling!)
AdamOrmondroyd May 19, 2022
1c40b21
custom clustering now takes position matrix instead of distance^2 mat…
AdamOrmondroyd May 23, 2022
634a4b8
corrected order of dimensions of position_matrix
AdamOrmondroyd May 24, 2022
d98cd08
renamed calculate_similarity_matrix() to calculate_position_matrix()
AdamOrmondroyd May 24, 2022
a04231a
renamed calculate_similarity_matrix() to calculate_position2_matrix()
AdamOrmondroyd May 24, 2022
d9571a3
Merge branch 'ormorod-custom_cluster' of github.com:Ormorod/PolyChord…
AdamOrmondroyd May 24, 2022
9568985
corrected index in default_cluster
AdamOrmondroyd May 25, 2022
92b3809
added new dimension to cluster input as position_matrix is not square…
AdamOrmondroyd May 26, 2022
db02cc2
changed cluster interfaces to add extra dimension for position_matrix
AdamOrmondroyd May 26, 2022
876de66
Merge branch 'master' into ormorod-custom_cluster
williamjameshandley May 26, 2022
f5f8c93
Updated CI to work on pull request to all branches
williamjameshandley May 26, 2022
eeb83ac
changed cluster interfaces to add extra dimension for position_matrix
AdamOrmondroyd May 26, 2022
4d68c9d
changed distance2_matrix to no longer be square
AdamOrmondroyd May 26, 2022
b6885cb
added extra dimension to cluster argument for non-square distance2_ma…
AdamOrmondroyd May 26, 2022
4f638f2
Merge branch 'ormorod-custom_cluster' of github.com:Ormorod/PolyChord…
AdamOrmondroyd May 26, 2022
7da5423
renamed m to nDims, n to nPoints, and position_matrix (and distance2_…
AdamOrmondroyd May 26, 2022
fa5c42d
changed neighbour to ~~correct~~ UK spelling
AdamOrmondroyd May 26, 2022
5184835
forgot to change the interface in clustering itself
AdamOrmondroyd May 26, 2022
3ed846c
another clustering interface using the wrong dimension of points
AdamOrmondroyd May 26, 2022
e4622f5
swapped indices over
AdamOrmondroyd May 26, 2022
1b90073
Merge branch 'ormorod-custom_cluster' of github.com:Ormorod/PolyChord…
AdamOrmondroyd May 26, 2022
23d8db8
Revert "swapped indices over"
AdamOrmondroyd May 26, 2022
4a9bf0a
now uses c indexing in c and python, and fortran indexing in fortran
AdamOrmondroyd May 26, 2022
2f0c339
Merge branch 'PolyChord:master' into master
AdamOrmondroyd May 28, 2022
f68dcb3
similarity matrix renamed to distance^2 in comments
AdamOrmondroyd Jun 13, 2022
da78633
wrap_cluster() adds one to the cluster list, as Python algos number c…
AdamOrmondroyd Jun 30, 2022
512fca1
Merge branch 'ormorod-custom_cluster' of github.com:Ormorod/PolyChord…
AdamOrmondroyd Jun 30, 2022
be3467a
example cluster now returns -1
AdamOrmondroyd Jun 30, 2022
eb412c3
Merge branch 'master' into ormorod-custom_cluster
AdamOrmondroyd Jul 29, 2022
9fbf7a3
Merge branch 'custom_cluster' of github.com:PolyChord/PolyChordLite i…
AdamOrmondroyd Jul 29, 2022
14efda1
example to test custom clustering
AdamOrmondroyd Jul 29, 2022
3029e60
equal weights of each Gaussian peak, corrected comments
AdamOrmondroyd Jul 29, 2022
88572e3
remove mutual_nearest_neighbours function
AdamOrmondroyd Jul 29, 2022
b49fb80
replace sklearn dependence with compute_knn
AdamOrmondroyd Jul 29, 2022
f8dc320
compute_knn() computed the full nn ordering, then returned the first …
AdamOrmondroyd Jul 29, 2022
4d95077
realised the for loop in the copied knn clustering is actually a whil…
AdamOrmondroyd Jul 29, 2022
2d99c89
Removed unused num_clusters_old from native clustering
AdamOrmondroyd Jul 29, 2022
1a561de
comment reminds that clustering algorithm should be 0-indexed
AdamOrmondroyd Jul 29, 2022
f67543d
updated run_pypolychord.ipynb with py2nb
AdamOrmondroyd Jul 29, 2022
8954de3
correct docstring to clarify cluster function must number clusters fr…
AdamOrmondroyd Sep 22, 2022
ebf3b91
further correction to clustering in docstring
AdamOrmondroyd Sep 22, 2022
e1102fb
change cluster function example comments to docstring
AdamOrmondroyd Sep 22, 2022
4d3adcc
remove unnecessary np.sqrt import
AdamOrmondroyd Sep 22, 2022
70628a5
Correct root=root -> root=0 in make_resume_file (#92)
AdamOrmondroyd Feb 7, 2023
6cf0fc4
PWD->CURDIR
williamjameshandley Feb 16, 2023
3084a81
Updated PWD to CURDIR in setup.py
williamjameshandley Feb 16, 2023
1854372
Update python3.6 CI (#96)
AdamOrmondroyd Feb 20, 2023
cfc77c4
Merge branch 'master' into ormorod-custom_cluster
AdamOrmondroyd Feb 20, 2023
10 changes: 6 additions & 4 deletions .github/workflows/CI.yml
@@ -4,7 +4,6 @@ on:
push:
branches: [ master ]
pull_request:
branches: [ master ]

Review comments on this hunk:

> Could you merge PolyChord:master into Ormorod:ormorod-custom_cluster so that the diff is cleaner?

> It would also be good to add a python test of clustering if you have time.

schedule:
- cron: "0 0 * * *"

@@ -16,12 +15,15 @@ jobs:
strategy:
matrix:
os: [ubuntu-latest]
python-version: [3.6, 3.7, 3.8]
python-version: ['3.7', '3.8', '3.9', '3.10', '3.11']
include:
- os: ubuntu-20.04
python-version: '3.6'

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}

8 changes: 4 additions & 4 deletions Makefile
@@ -5,13 +5,13 @@ EXAMPLES = gaussian pyramidal rastrigin twin_gaussian random_gaussian himmelblau
PROGRAMS = polychord_fortran polychord_CC polychord_CC_ini

# Directories
SRC_DIR = $(PWD)/src
SRC_DIR = $(CURDIR)/src
DRIVERS_DIR = $(SRC_DIR)/drivers
POLYCHORD_DIR = $(SRC_DIR)/polychord
LIKELIHOOD_DIR = $(PWD)/likelihoods
LIKELIHOOD_DIR = $(CURDIR)/likelihoods
EXAMPLES_DIR = $(LIKELIHOOD_DIR)/examples
BIN_DIR = $(PWD)/bin
LIB_DIR = $(PWD)/lib
BIN_DIR = $(CURDIR)/bin
LIB_DIR = $(CURDIR)/lib
export DRIVERS_DIR POLYCHORD_DIR PYPOLYCHORD_DIR LIKELIHOOD_DIR EXAMPLES_DIR BIN_DIR LIB_DIR

# Whether to use MPI
20 changes: 10 additions & 10 deletions pypolychord/_pypolychord.cpp
@@ -117,24 +117,24 @@ void dumper(int ndead, int nlive, int npars, double* live, double* dead, double*
/* Callback to the cluster */
static PyObject *python_cluster = NULL;

void cluster(double* distance2_matrix, int* cluster_list, int n)
void cluster(double* points, int* cluster_list, int nDims, int nPoints)
{
/* create a python version of distance2_matrix */
npy_intp shape[] = {n,n};
PyObject *array_distance2_matrix = PyArray_SimpleNewFromData(2, shape, NPY_DOUBLE, distance2_matrix);
if (array_distance2_matrix ==NULL) throw PythonException();
PyArray_CLEARFLAGS(reinterpret_cast<PyArrayObject*>(array_distance2_matrix), NPY_ARRAY_WRITEABLE);
/* create a python version of points */
npy_intp shape[] = {nPoints,nDims};
PyObject *array_points = PyArray_SimpleNewFromData(2, shape, NPY_DOUBLE, points);
if (array_points ==NULL) throw PythonException();
PyArray_CLEARFLAGS(reinterpret_cast<PyArrayObject*>(array_points), NPY_ARRAY_WRITEABLE);

/* create a python version of cluster_list */
npy_intp shape1[] = {n};
npy_intp shape1[] = {nPoints};
PyObject *array_cluster_list = PyArray_SimpleNewFromData(1, shape1, NPY_INT, cluster_list);
if (array_cluster_list==NULL) {Py_DECREF(array_distance2_matrix); throw PythonException();}
if (array_cluster_list==NULL) {Py_DECREF(array_points); throw PythonException();}

/* Compute cluster_list from the cluster */
PyObject_CallFunctionObjArgs(python_cluster,array_distance2_matrix,array_cluster_list,NULL);
PyObject_CallFunctionObjArgs(python_cluster,array_points,array_cluster_list,NULL);

/* Garbage collect */
Py_DECREF(array_cluster_list); Py_DECREF(array_distance2_matrix);
Py_DECREF(array_cluster_list); Py_DECREF(array_points);
}
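
On the Python side the callback therefore sees `points` as a read-only C-ordered array of shape (nPoints, nDims) and an integer `cluster_list` that it fills in place. A rough numpy equivalent of what this wrapper constructs (buffer contents are stand-ins, not part of the diff):

```python
import numpy as np

nDims, nPoints = 2, 3
flat = np.zeros(nDims * nPoints)                 # stand-in for the raw double* buffer
points = flat.reshape(nPoints, nDims)            # shape {nPoints, nDims}, C order
points.flags.writeable = False                   # mirrors clearing NPY_ARRAY_WRITEABLE

cluster_list = np.empty(nPoints, dtype=np.intc)  # filled in place by the Python callback
```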


26 changes: 15 additions & 11 deletions pypolychord/polychord.py
@@ -12,8 +12,8 @@ def default_dumper(live, dead, logweights, logZ, logZerr):
pass


def default_cluster(distance2_matrix):
return np.zeros(distance2_matrix.shape[0],dtype=int)
def default_cluster(points):
return np.full(points.shape[0], -1,dtype=int)


def run_polychord(loglikelihood, nDims, nDerived, settings,
@@ -112,14 +112,14 @@ def run_polychord(loglikelihood, nDims, nDerived, settings,

Parameters
----------
distance2_matrix: numpy.array
squared distances between points. Shape (nPoints, nPoints)
points: numpy.array
positions of points. Shape (nPoints, nDims)

Returns
-------
cluster_list: array-like
cluster labels. Must start from 1. All clusters must have at least
one point, so that max(cluster_list) gives the number of clusters
cluster labels. Must start from 0. All clusters must have at least
one point, so that max(cluster_list)+1 gives the number of clusters
found. Length nPoints.
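
A cluster function obeying this contract could be as simple as the following scipy-based sketch (single-linkage with an arbitrary cutoff; purely illustrative, not part of the diff):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def my_cluster(points):
    """Single-linkage clustering of the (nPoints, nDims) live points.

    Returns one integer label per point, 0-indexed and consecutive, so
    max(labels) + 1 is the number of clusters found.
    """
    Z = linkage(points, method="single")
    labels = fcluster(Z, t=0.2, criterion="distance")  # 0.2 is an arbitrary linking length
    return labels - 1                                   # fcluster labels start at 1
```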


@@ -197,8 +197,11 @@ def wrap_loglikelihood(theta, phi):
def wrap_prior(cube, theta):
theta[:] = prior(cube)

def wrap_cluster(distance2_matrix, cluster_list):
cluster_list[:] = cluster(distance2_matrix)
def wrap_cluster(points, cluster_list):
cluster_list[:] = cluster(points) + 1
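
The `+ 1` converts the 0-indexed labels returned by the Python cluster function to the 1-indexed convention used on the Fortran side, e.g. (illustrative values, not part of the diff):

```python
import numpy as np

labels = np.array([0, 0, 1, 2])   # what a user cluster() returns
cluster_list = labels + 1         # what is written back for Fortran: [1, 1, 2, 3]
```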

settings.grade_dims = [int(d) for d in settings.grade_dims]
settings.nlives = {float(logL):int(nlive) for logL, nlive in settings.nlives.items()}

# Run polychord from module library
_pypolychord.run(wrap_loglikelihood,
@@ -254,6 +257,7 @@ def make_resume_file(settings, loglikelihood, prior):
rank = comm.Get_rank()
size = comm.Get_size()
except ImportError:
MPI = False
rank = 0
size = 1

@@ -267,17 +271,17 @@ def make_resume_file(settings, loglikelihood, prior):
nDerived = len(derived)
lives.append(np.concatenate([cube,theta,derived,[logL_birth, logL]]))

try:
if MPI:
sendbuf = np.array(lives).flatten()
sendcounts = np.array(comm.gather(len(sendbuf)))
if rank == 0:
recvbuf = np.empty(sum(sendcounts), dtype=int)
else:
recvbuf = None
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendcounts), root=root)
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendcounts), root=0)

lives = np.reshape(sendbuf, (len(settings.cube_samples), len(lives[0])))
except NameError:
else:
lives = np.array(lives)

if rank == 0:
158 changes: 66 additions & 92 deletions run_pypolychord.ipynb

Large diffs are not rendered by default.

22 changes: 12 additions & 10 deletions run_pypolychord.py
@@ -1,4 +1,4 @@
from numpy import pi, log, sqrt
from numpy import pi, log
import numpy as np
import pypolychord
from pypolychord.settings import PolyChordSettings
@@ -41,15 +41,17 @@ def dumper(live, dead, logweights, logZ, logZerr):

#| Optional cluster function allow user-defined clustering

def cluster(distance2_matrix):
npoints = distance2_matrix.shape[0]
clusters = np.ones(npoints, dtype=int)

# <do some clustering algorithm to assign clusters>
# - clusters should be an array of cluster labels for each point
# - each cluster should have at least one point
# - thus max(clusters) should be the number of clusters
# - work with the above numpy integer array
def cluster(points):
"""
<do some clustering algorithm to assign clusters>
- clusters should be an array of cluster labels for each point
- each cluster should have at least one point
- thus max(clusters)+1 should be the number of clusters
- i.e. clusters are 0-indexed
- work with the above numpy integer array
"""
npoints = points.shape[0]
clusters = np.full(npoints, -1, dtype=int)

return clusters

151 changes: 151 additions & 0 deletions run_pypolychord_custom_clustering.py
@@ -0,0 +1,151 @@

from numpy import pi, log, sqrt
import numpy as np
import pypolychord
from pypolychord.settings import PolyChordSettings
from pypolychord.priors import UniformPrior
from scipy.spatial import distance_matrix
from scipy.special import logsumexp
try:
from mpi4py import MPI
except ImportError:
pass



#| Define a four-dimensional twin-spherical gaussian likelihood,
#| width sigma=0.1, centered on ±[0.5]^nDims with one derived parameter.
#| The derived parameter is the squared radius from [0]^nDims

nDims = 4
nDerived = 1
sigma = 0.1

centres = np.array([[-0.5] * nDims, [0.5] * nDims])
weights = np.array([0.5, 0.5])


def likelihood(theta):
"""Twin Gaussian likelihood."""
nDims = len(theta)
logL = -np.log(2 * np.pi * sigma * sigma) * nDims / 2
logL += logsumexp(-np.sum((theta - centres) ** 2, axis=-1) / 2 / sigma / sigma, b = weights)
return logL, [np.sum(theta**2)]
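
In closed form, the likelihood coded above is an equal-weight two-component Gaussian mixture,

$$
\log L(\theta) = -\frac{n_{\rm Dims}}{2}\log\!\left(2\pi\sigma^2\right)
+ \log\sum_{k=1}^{2} w_k \exp\!\left(-\frac{\lVert\theta-\mu_k\rVert^2}{2\sigma^2}\right),
\qquad \mu_k = \pm\,0.5\,\mathbf{1},\quad w_k = \tfrac{1}{2},
$$

with the derived parameter $r^2 = \lVert\theta\rVert^2$.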


#| Define a box uniform prior from -1 to 1

def prior(hypercube):
""" Uniform prior from [-1,1]^D. """
return UniformPrior(-1, 1)(hypercube)


#| Optional cluster function allow user-defined clustering
## KNN clustering algorithm translated from clustering.F90

def compute_nn(position_matrix):
return np.argsort(distance_matrix(position_matrix, position_matrix))


def relabel(labels):
k = max(labels)
appearance_order = {}
num_found = 0

for label in labels:
if label not in appearance_order:
appearance_order[label] = num_found
num_found += 1

for i, old_label in enumerate(labels):
labels[i] = appearance_order[old_label]
return labels
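
For example, `relabel` (as defined above) maps whatever labels come out of the merging step onto consecutive integers in order of first appearance (illustration, not part of the file):

```python
import numpy as np

labels = np.array([3, 3, 7, 1])
relabel(labels)   # -> array([0, 0, 1, 2]): 3 was seen first, then 7, then 1
```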


def do_clustering(knn_array):
num_points = knn_array.shape[0]
labels = np.arange(num_points)
for iii in range(num_points):
for ii in range(iii + 1, num_points):
if labels[ii] != labels[iii]:
if (ii in knn_array[iii]) or (iii in knn_array[ii]):
for i in range(num_points):
if labels[i] == labels[ii] or labels[i] == labels[iii]:
labels[i] = min([labels[ii], labels[iii]])
return relabel(labels)


def knn_cluster(position_matrix):
"""slight concern if two points are the same because sklearn"""
npoints = position_matrix.shape[0]
k = min(npoints, 10)
nn_array = compute_nn(position_matrix)
labels = np.arange(npoints)
num_clusters = npoints

labels_old = labels

n = 2
while n <= k:
labels = do_clustering(nn_array[:, :n])
num_clusters = max(labels) + 1

if num_clusters <= 0:
raise ValueError("somehow got <= 0 clusters")
elif 1 == num_clusters:
return labels
elif np.all(labels == labels_old):
break
elif k == n:
k = min(k * 2, npoints)

labels_old = labels
n += 1

if num_clusters > 1:
i_cluster = 0
while i_cluster < num_clusters:
cluster = position_matrix[labels == i_cluster]
labels[labels == i_cluster] = knn_cluster(cluster) + num_clusters
labels = relabel(labels)
if num_clusters - 1 == max(labels):
i_cluster += 1
return labels


#| Initialise the settings

settings = PolyChordSettings(nDims, nDerived)
settings.file_root = "custom_clustering"
settings.nlive = 200
settings.do_clustering = True
settings.read_resume = False

#| Run PolyChord

output = pypolychord.run_polychord(likelihood, nDims, nDerived, settings, prior, cluster=knn_cluster)

#| Create a paramnames file

paramnames = [('p%i' % i, r'\theta_%i' % i) for i in range(nDims)]
paramnames += [('r*', 'r')]
output.make_paramnames_files(paramnames)

#| Make an anesthetic plot (could also use getdist)
try:
from anesthetic import NestedSamples
samples = NestedSamples(root= settings.base_dir + '/' + settings.file_root)
fig, axes = samples.plot_2d(['p0','p1','p2','p3','r'])
fig.savefig('posterior.pdf')

except ImportError:
try:
import getdist.plots
posterior = output.posterior
g = getdist.plots.getSubplotPlotter()
g.triangle_plot(posterior, filled=True)
g.export('custom_clustering.pdf')
except ImportError:
print("Install matplotlib and getdist for plotting examples")

print("Install anesthetic or getdist for for plotting examples")
2 changes: 1 addition & 1 deletion setup.py
@@ -92,7 +92,7 @@ def run(self):
env["DEBUG"] = "1"

BASE_PATH = os.path.dirname(os.path.abspath(__file__))
env["PWD"] = BASE_PATH
env["CURDIR"] = BASE_PATH
env.update({k : os.environ[k] for k in ["CC", "CXX", "FC"] if k in os.environ})
subprocess.check_call(["make", "-e", "libchord.so"], env=env, cwd=BASE_PATH)
if not os.path.isdir("pypolychord/lib/"):
10 changes: 5 additions & 5 deletions src/polychord/c_interface.cpp
@@ -46,14 +46,14 @@ void run_polychord(
double (*c_loglikelihood_ptr)(double*,int,double*,int),
void (*c_prior_ptr)(double*,double*,int),
void (*c_dumper_ptr)(int,int,int,double*,double*,double*,double,double),
void (*c_cluster_ptr)(double*,int*,int),
void (*c_cluster_ptr)(double*,int*,int,int),
Settings s, MPI_Comm& comm)
#else
void run_polychord(
double (*c_loglikelihood_ptr)(double*,int,double*,int),
void (*c_prior_ptr)(double*,double*,int),
void (*c_dumper_ptr)(int,int,int,double*,double*,double*,double,double),
void (*c_cluster_ptr)(double*,int*,int),
void (*c_cluster_ptr)(double*,int*,int,int),
Settings s)
#endif
{
@@ -125,7 +125,7 @@ void run_polychord(
double (*c_loglikelihood_ptr)(double*,int,double*,int),
void (*c_prior_ptr)(double*,double*,int),
void (*c_dumper_ptr)(int,int,int,double*,double*,double*,double,double),
void (*c_cluster_ptr)(double*,int*,int),
void (*c_cluster_ptr)(double*,int*,int,int),
Settings s)
{
int flag;
@@ -228,5 +228,5 @@ void default_prior(double* cube, double* theta, int nDims)

void default_dumper(int,int,int,double*,double*,double*,double,double) {}

void default_cluster(double* distance2_matrix, int* cluster_list, int n)
{ for(int i=0;i<n;i++) cluster_list[i] = 0; }
void default_cluster(double* points, int* cluster_list, int nDims, int nPoints)
{ for(int i=0;i<nPoints;i++) cluster_list[i] = 0; }