pair programming #14

Draft
wants to merge 5 commits into
base: master

Conversation

wmilkowska

No description provided.

@wmilkowska
Author

[image: myplot]
an example plot here ^

speed.py contains the easy task; we tried to make the code as clear as possible. I'll be working on the other tasks in the near future :)

Comment on lines +47 to +64
x = df.iloc[0:16]
x = x[["pitch", "velocity"]]

scores = {"score": [], "idx": []}

seq_len = len(x)

for i in range(0, len(df) - seq_len):
    seq = df.iloc[i : i + seq_len]
    seq = seq[["pitch", "velocity"]]
    score = cos_sim_score(x, seq)
    scores["score"].append(score)
    scores["idx"].append(i)

similarity = pd.DataFrame(scores)
similarity.sort_values(by="score", ascending=False, inplace=True)

print(similarity)
Member

Suggested change
if __name__ == "__main__":
    # Moved this from the top of the script
    dataset = load_dataset("roszcz/internship-midi-data-science", split="train")
    record = dataset[0]
    df = pd.DataFrame(record["notes"])
    print(df.head())
    x = df.iloc[0:16]
    x = x[["pitch", "velocity"]]
    scores = {"score": [], "idx": []}
    seq_len = len(x)
    for i in range(0, len(df) - seq_len):
        seq = df.iloc[i : i + seq_len]
        seq = seq[["pitch", "velocity"]]
        score = cos_sim_score(x, seq)
        scores["score"].append(score)
        scores["idx"].append(i)
    similarity = pd.DataFrame(scores)
    similarity.sort_values(by="score", ascending=False, inplace=True)
    print(similarity)

Code outside of if __name__ == "__main__" is executed every time this file is imported (e.g. from sequence_similarity import cos_sim_score), and it's usually better to avoid that :)
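To illustrate the point about the import guard, here is a minimal sketch of the layout. The helper name mean_velocity and the sample values are hypothetical and not taken from the PR; the only thing the sketch is meant to show is which parts run on import and which run only when the file is executed directly.

```python
# Hypothetical module layout, for illustration only.


def mean_velocity(velocities: list[float]) -> float:
    """Importable helper: defining it has no side effects."""
    return sum(velocities) / len(velocities)


def main() -> None:
    # Script-only work (loading data, printing results) belongs here.
    print(mean_velocity([60.0, 70.0, 80.0]))


if __name__ == "__main__":
    # True when the file is run directly (python this_file.py),
    # False when another module imports it, so main() is skipped on import.
    main()
```

With this layout, importing mean_velocity from another file only defines the functions; nothing is loaded or printed until the script itself is run.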

Comment on lines +12 to +20
def cos_sim_score(sequence: pd.DataFrame, window: pd.DataFrame) -> float:
    """
    Calculating cosine similarity between sequence and window
    Args:
        sequence (pd.DataFrame): input sequence
        window (pd.DataFrame): subset of rolling window
    Returns:
        float: cosine similarity score
    """
Member

@roszcz roszcz Sep 14, 2023

Suggested change
def cos_sim_score(sequence_a: pd.DataFrame, sequence_b: pd.DataFrame) -> float:
    """
    Calculating cosine similarity between two sequences
    Args:
        sequence_a (pd.DataFrame): first sequence
        sequence_b (pd.DataFrame): second sequence
    Returns:
        float: cosine similarity score
    """

This function is great, but the name "window" doesn't really make sense from the point of view of this function - it measures similarity between any two sequences, and it's only in your specific use case that the second sequence is a "window" moving over the signal.
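The body of cos_sim_score is not shown in this thread, so the following is only a sketch of what a symmetric "any two sequences" version could look like. It assumes each sequence is flattened into one long vector of (pitch, velocity) values and uses plain lists instead of DataFrames so it runs without pandas; the name cos_sim_score_rows is hypothetical.

```python
import math
from typing import Iterable, Sequence


def cos_sim_score_rows(
    sequence_a: Iterable[Sequence[float]],
    sequence_b: Iterable[Sequence[float]],
) -> float:
    """Cosine similarity between two equally shaped sequences of rows.

    Assumption: every row (e.g. a (pitch, velocity) pair) is flattened
    into one long vector before the comparison.
    """
    a = [float(value) for row in sequence_a for value in row]
    b = [float(value) for row in sequence_b for value in row]
    if len(a) != len(b):
        raise ValueError("both sequences must have the same shape")

    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Cosine similarity is undefined for an all-zero vector; returning 0.0
    # in that case is a choice made for this sketch.
    return dot / norm if norm else 0.0
```

Neither argument is treated specially here, which is the point of the rename: the function does not need to know that one of its inputs happens to be a window sliding over the signal.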

Comment on lines +47 to +59
x = df.iloc[0:16]
x = x[["pitch", "velocity"]]

scores = {"score": [], "idx": []}

seq_len = len(x)

for i in range(0, len(df) - seq_len):
    seq = df.iloc[i : i + seq_len]
    seq = seq[["pitch", "velocity"]]
    score = cos_sim_score(x, seq)
    scores["score"].append(score)
    scores["idx"].append(i)
Member

Suggested change
target_sequence = df.iloc[0:16]
target_sequence = target_sequence[["pitch", "velocity"]]
scores = {"score": [], "idx": []}
seq_len = len(target_sequence)
for i in range(0, len(df) - seq_len):
    sequence_window = df.iloc[i : i + seq_len]
    sequence_window = sequence_window[["pitch", "velocity"]]
    score = cos_sim_score(sequence_a=target_sequence, sequence_b=sequence_window)
    scores["score"].append(score)
    scores["idx"].append(i)

Logic's good, these are just readability suggestions 👍
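As a possible follow-up to the readability suggestions, the search loop could be wrapped in a small function. This is a sketch only: rank_windows and score_fn are hypothetical names, and it works on plain sequences rather than a DataFrame. One detail worth double-checking: range(0, len(df) - seq_len) stops one position before the last full window, so the sketch uses + 1 on the assumption that the final window should be scored too.

```python
from typing import Callable, Sequence


def rank_windows(
    rows: Sequence,
    target_sequence: Sequence,
    score_fn: Callable[[Sequence, Sequence], float],
) -> list[tuple[float, int]]:
    """Score every full-length window of rows against target_sequence.

    Returns (score, start_index) pairs sorted by score, best first.
    """
    seq_len = len(target_sequence)
    scored = []
    # + 1 so that the window ending on the last row is included.
    for i in range(len(rows) - seq_len + 1):
        sequence_window = rows[i : i + seq_len]
        scored.append((score_fn(target_sequence, sequence_window), i))
    # sorted() is stable, so windows with equal scores keep index order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Passing the scoring function in as an argument keeps the search independent of cosine similarity, so another measure can be tried without touching the loop.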

2 participants