MNIST dataset eval #3

Open
snapo opened this issue Nov 15, 2023 · 1 comment

Comments

snapo (Contributor) commented Nov 15, 2023

Very interesting behaviour...

With

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.94, random_state=42)

using only 6% of the data to train (with the one-hot encoded labels) and 94% as the test set, we still get 81% accuracy.
This is an absolutely amazing result!
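
For anyone wanting to reproduce the split, here is a minimal sketch of the setup (assuming MNIST is loaded through scikit-learn's fetch_openml and the labels are one-hot encoded with NumPy; the exact loading code in the repo may differ):

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Load MNIST (70,000 samples of 784 pixels each) -- assumed loading path
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0                             # scale pixels to [0, 1]
y_onehot = np.eye(10)[y.astype(int)]      # one-hot encode the 10 digit classes

# Keep only 6% for training, use the remaining 94% as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y_onehot, test_size=0.94, random_state=42
)
print(X_train.shape[0], X_test.shape[0])  # 4200 train, 65800 test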

It seems Method 1 > Method 3 > Method 2

I ran multiple different tests, and all of them came to the same result: Method 1 is the best of the three.

It is also (AFAIK) a record among everything I remember for MNIST: 81% accuracy on a single-threaded CPU within 15 seconds! I have never seen this before (in Python, not C)...

I will probably also try it with CIFAR-10 and see how well it does...

Result:

----------------------------
Size of training set: 4200
Size of testing set: 65800
----------------------------
De-correlating all the xs with each other
----------------------------
Method 1. regression on ys using multiple y_classes in the form of one_hot matrix
train accuracy: 0.9147619047619048
test accuracy: 0.8130851063829787
---------------------------------
Method 2. regression on ys with simple rounding & thresholding of the predicted y classes.....
train accuracy: 0.27404761904761904
test accuracy: 0.2289209726443769
---------------------------------
Method 3. regression on ys using multiple y_classes in the form of random vectors (embeddings)
train accuracy: 0.7904761904761904
test accuracy: 0.6704103343465045

hunar4321 (Owner) commented Nov 16, 2023

Thanks for the addition
You can increase the performance further by passing the data through a non-linear layer of random weights, e.g. with a ReLU activation.
I have added the following lines to the beginning of the code, which improve the performance of Method 1 to 0.91 on the testing set.
The more nodes, the better the performance (with 1000 nodes you can reach 95%), but the computational cost grows quickly because of the quadratic nature of the "xs" de-correlation. The risk of over-fitting also increases with more nodes.


import numpy as np

def activate(w, x):
    # Random projection followed by a ReLU non-linearity
    linear = w.T @ x
    # out = linear              # (linear variant, for comparison)
    out = np.maximum(linear, 0) # relu
    return out

nodes = 500
w = np.random.randn(xs.shape[0], nodes)  # fixed random projection weights
xs = activate(w, xs)

X_train = activate(w, X_train.T).T
X_test = activate(w, X_test.T).T
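
A minimal sketch of where that cost comes from, assuming the de-correlation step boils down to forming and inverting the feature covariance (the repo's actual implementation may differ):

import numpy as np

n_samples, n_features = 4200, 1000           # e.g. 1000 ReLU nodes
X = np.random.randn(n_features, n_samples)   # features x samples, like xs above
Y = np.random.randn(10, n_samples)           # one-hot targets (Method 1)

# Solving the normal equations needs the (n_features x n_features) covariance:
# memory grows as O(n_features^2) and the solve as O(n_features^3),
# so doubling the nodes roughly quadruples memory and ~8x the solve time.
cov = X @ X.T
W = Y @ X.T @ np.linalg.inv(cov + 1e-6 * np.eye(n_features))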

P.S. You can also increase the performance of Method 3 if you increase the embed_size.
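
For readers new to Method 3, a rough sketch of the random-embedding idea with embed_size (the names here are illustrative and may not match the repo's code):

import numpy as np

embed_size = 64                               # larger embed_size -> classes easier to separate
n_classes = 10
rng = np.random.default_rng(0)

# One fixed random embedding vector per digit class
class_embeddings = rng.standard_normal((n_classes, embed_size))

def labels_to_embeddings(y):
    # Regression targets: each label replaced by its class embedding
    return class_embeddings[y]                # shape: (n_samples, embed_size)

def decode(pred):
    # Map a predicted embedding back to the nearest class
    dists = np.linalg.norm(class_embeddings - pred, axis=1)
    return int(np.argmin(dists))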
