Hi,
We're using word2vec for hypernymy discovery. To design a more efficient version of word2vec, we need to know exactly what the variable "c" means in the function ReadVocab() in the file word2vec.c. Thanks in advance.
void ReadVocab() {
  long long a, i = 0;
  char c;
  char word[MAX_STRING];
  FILE *fin = fopen(read_vocab_file, "rb");
  if (fin == NULL) {
    printf("Vocabulary file not found\n");
    exit(1);
  }
  for (a = 0; a < vocab_hash_size; a++) vocab_hash[a] = -1;
  vocab_size = 0;
  while (1) {
    ReadWord(word, fin);
    if (feof(fin)) break;
    a = AddWordToVocab(word);
    fscanf(fin, "%lld%c", &vocab[a].cn, &c); // semantics of c?
    i++;
  }
  SortVocab();
  if (debug_mode > 0) {
    printf("Vocab size: %lld\n", vocab_size);
    printf("Words in train file: %lld\n", train_words);
  }
  fin = fopen(train_file, "rb");
  if (fin == NULL) {
    printf("ERROR: training data file not found!\n");
    exit(1);
  }
  fseek(fin, 0, SEEK_END);
  file_size = ftell(fin);
  fclose(fin);
}
Hi mereogeometry, I have analyzed word2vec.c for further study.
The file named by 'read_vocab_file' (opened in binary mode, "rb") has the following layout:
word1Acount1Bword2Acount2B...
where A and B are single whitespace characters such as ' ', '\t', or '\n'.
For example, ReadWord() reads 'word1' and consumes 'A'; then fscanf() reads 'count1' and 'B', and 'B' is stored in the variable c. The value of c is never used afterwards, so it works like a trash variable, but reading it consumes the separator so that the next ReadWord() call starts at the next word.
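To illustrate this, here is a minimal, self-contained C sketch. It assumes the vocabulary file uses the "word count" plus newline layout (i.e. A is ' ' and B is '\n'). read_word_sketch() is a simplified re-implementation of ReadWord() written for this example, and parse_vocab() / parse_vocab_text() are hypothetical helpers, not part of word2vec.c. The same text is parsed twice: once with fscanf("%lld%c") as in ReadVocab(), and once with fscanf("%lld") only, to show what reading c protects against.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed to match the MAX_STRING constant in word2vec.c. */
#define MAX_STRING 100

/* Simplified re-implementation of ReadWord() (a sketch, not the original):
   reads up to a space, tab or newline. A newline that ends a word is pushed
   back; a newline seen before any word character yields "</s>". */
static void read_word_sketch(char *word, FILE *fin) {
  int a = 0, ch;
  while ((ch = fgetc(fin)) != EOF) {
    if (ch == '\r') continue;                /* ignore CR of CRLF */
    if (ch == ' ' || ch == '\t' || ch == '\n') {
      if (a > 0) {
        if (ch == '\n') ungetc(ch, fin);     /* leave '\n' for the next call */
        break;                               /* a space/tab is consumed here */
      }
      if (ch == '\n') {
        strcpy(word, "</s>");                /* empty line: sentence marker */
        return;
      }
      continue;                              /* skip leading spaces/tabs */
    }
    word[a] = (char)ch;
    if (a < MAX_STRING - 1) a++;             /* truncate over-long words */
  }
  word[a] = '\0';
}

/* Hypothetical helper mirroring the loop in ReadVocab(). With consume_sep
   nonzero it calls fscanf("%lld%c") and stores the separator in c; otherwise
   it reads only the count. Writes the last c to *last_c (if non-NULL) and
   returns the number of records read. */
static int parse_vocab(FILE *fin, int consume_sep,
                       char words[][MAX_STRING], long long counts[],
                       int max_words, char *last_c) {
  char word[MAX_STRING];
  char c = 0;
  int n = 0;
  assert(fin != NULL && max_words > 0);
  while (n < max_words) {
    read_word_sketch(word, fin);
    if (feof(fin)) break;                    /* same test as ReadVocab() */
    strcpy(words[n], word);
    int got = consume_sep ? fscanf(fin, "%lld%c", &counts[n], &c)
                          : fscanf(fin, "%lld", &counts[n]);
    if (got < 1) counts[n] = 0;              /* no number where one was expected */
    n++;
  }
  if (last_c != NULL) *last_c = c;
  return n;
}

/* Hypothetical wrapper: writes text to a temporary file and parses it.
   Returns -1 if the temporary file cannot be created. */
static int parse_vocab_text(const char *text, int consume_sep,
                            char words[][MAX_STRING], long long counts[],
                            int max_words, char *last_c) {
  FILE *f = tmpfile();
  if (f == NULL) return -1;
  fputs(text, f);
  rewind(f);
  int n = parse_vocab(f, consume_sep, words, counts, max_words, last_c);
  fclose(f);
  return n;
}
```

With the input "the 5\nof 3\n", the "%lld%c" mode in this sketch yields two records ('the' 5 and 'of' 3), and c ends up holding '\n'. With "%lld" only, the newline stays in the stream, so the next ReadWord-style call sees an empty line and returns "</s>"; the result is four records with two spurious "</s>" entries whose count is 0.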