
Accuracy issue #8

Open · lx-zg opened this issue Jul 18, 2023 · 4 comments

Comments

lx-zg commented Jul 18, 2023

[screenshot of test results]
I tested the Filipino and AG_NEWS datasets, but I couldn't achieve the accuracy reported in your paper. I'm not sure where I went wrong.

lx-zg (Author) commented Jul 18, 2023

To add some information about the experimental results: performance is best when k is set to 2. However, across the 9 runs with k=2 the accuracy was not stable and did not consistently reach above 80%; on average it was around 73%.

kts commented Jul 18, 2023

What command did you run? Use `--all_train` and `--all_test`.
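For example, something along the lines of `python main_text.py --dataset AG_NEWS --all_train --all_test` — the script name and the `--dataset` flag here are assumptions; check the repo's README for the exact invocation.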

lx-zg (Author) commented Jul 20, 2023

Thank you very much for your response!
When I set the `all_train` parameter to True, I got an unclear error message, so I extracted your core functions and ran the code below; the result above was generated from it. (I assume this corresponds to `all_train = False` and `all_test = False`.)

```python
# Core functions extracted from the repo: dataset_pair,
# pick_n_sample_from_each_class_given_dataset, non_neural_knn_exp,
# agg_by_concat_space, and NCD are assumed to be imported from there.
import matplotlib.pyplot as plt

# Experiment settings
num_test = 100
test_idx_fn = None
num_train = 100
train_idx_fn = "./save"
compressor = "gzip"
para = False

# Print the dataset pair, number of test samples, and test index file name
print("dataset_pair:", dataset_pair[1], "args.num_test:", num_test,
      "args.test_idx_fn:", test_idx_fn)

# Training data and labels: num_train samples from each class of the training split
train_data, train_labels = pick_n_sample_from_each_class_given_dataset(
    dataset_pair[0], num_train, train_idx_fn)

# Test data and labels: num_test samples from each class of the test split
test_data, test_labels = pick_n_sample_from_each_class_given_dataset(
    dataset_pair[1], num_test, test_idx_fn)

# Sweep k from 1 to 9 and record the accuracy for each value
k_values = range(1, 10)
accuracies = []

for k in k_values:
    # Run the compressor-based k-NN experiment; returns predictions and per-sample correctness
    pred, correct = non_neural_knn_exp(compressor, test_data, test_labels,
                                       train_data, train_labels,
                                       agg_by_concat_space, NCD, k, para=para)
    # Accuracy = number of correct predictions / total number of predictions
    accuracy = sum(correct) / len(correct)
    accuracies.append(accuracy)
    print("Accuracy:", accuracy)

# Plot accuracy vs. k
plt.plot(k_values, accuracies)
plt.xlabel('k')
plt.ylabel('Accuracy')
plt.title('Accuracy vs. k')
plt.xlim(1, 9)
plt.ylim(0, 1)
plt.show()
```
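For reference, the aggregation and distance I extracted compute the standard normalized compression distance; here is a minimal self-contained sketch of the idea (calling gzip directly, not the repo's exact signatures):

```python
import gzip

def compressed_len(s: str) -> int:
    # Length of the gzip-compressed UTF-8 encoding of s
    return len(gzip.compress(s.encode("utf-8")))

def agg_by_concat_space(x: str, y: str) -> str:
    # Aggregate two texts by joining them with a single space
    return x + " " + y

def ncd(x: str, y: str) -> float:
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(agg_by_concat_space(x, y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# A test sample is assigned the majority label among its k nearest
# training samples under this distance.
print(ncd("the stock market rallied today", "shares rose sharply on wall street"))
print(ncd("the stock market rallied today", "the striker scored twice in the derby"))
```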

bazingagin (Owner) commented
Hi @lx-zg, as @kts pointed out, the issue is that you are running in a 100-shot setting instead of on the whole training set.

What's your error message when setting `all_train=True`? I don't think your extracted code runs on the whole training set.
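For clarity, "whole training set" means the experiment should see every training sample, not `num_train` per class. A rough sketch of the difference (assuming `dataset_pair[0]` and `dataset_pair[1]` iterate over (label, text) pairs; this may differ from the repo's actual loading code):

```python
# Use the entire training and test splits instead of
# pick_n_sample_from_each_class_given_dataset(..., num_train / num_test, ...)
train_pairs = list(dataset_pair[0])   # assumed iterable of (label, text) pairs
train_labels = [label for label, text in train_pairs]
train_data = [text for label, text in train_pairs]

test_pairs = list(dataset_pair[1])
test_labels = [label for label, text in test_pairs]
test_data = [text for label, text in test_pairs]

# Same experiment call as in your snippet, now over the full splits
pred, correct = non_neural_knn_exp(compressor, test_data, test_labels,
                                   train_data, train_labels,
                                   agg_by_concat_space, NCD, k, para=para)
print("Accuracy:", sum(correct) / len(correct))
```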
