pair programming #14

wmilkowska · 2023-09-13T10:34:45Z

No description provided.

wmilkowska · 2023-09-13T12:04:42Z

[image: example plot]

an example plot here ^

speed.py contains the easy task; we tried to make the code as clear as possible. I'll be working on the other tasks in the near future :)

roszcz · 2023-09-14T15:35:01Z

sequence_similarity.py

+x = df.iloc[0:16]
+x = x[["pitch", "velocity"]]
+
+scores = {"score": [], "idx": []}
+
+seq_len = len(x)
+
+for i in range(0, len(df) - seq_len):
+    seq = df.iloc[i : i + seq_len]
+    seq = seq[["pitch", "velocity"]]
+    score = cos_sim_score(x, seq)
+    scores["score"].append(score)
+    scores["idx"].append(i)
+
+similarity = pd.DataFrame(scores)
+similarity.sort_values(by="score", ascending=False, inplace=True)
+
+print(similarity)


Suggested change

-x = df.iloc[0:16]
-x = x[["pitch", "velocity"]]
-
-scores = {"score": [], "idx": []}
-
-seq_len = len(x)
-
-for i in range(0, len(df) - seq_len):
-    seq = df.iloc[i : i + seq_len]
-    seq = seq[["pitch", "velocity"]]
-    score = cos_sim_score(x, seq)
-    scores["score"].append(score)
-    scores["idx"].append(i)
-
-similarity = pd.DataFrame(scores)
-similarity.sort_values(by="score", ascending=False, inplace=True)
-
-print(similarity)
+if __name__ == "__main__":
+    # Moved this from the top of the script
+    dataset = load_dataset("roszcz/internship-midi-data-science", split="train")
+    record = dataset[0]
+    df = pd.DataFrame(record["notes"])
+    print(df.head())
+
+    x = df.iloc[0:16]
+    x = x[["pitch", "velocity"]]
+
+    scores = {"score": [], "idx": []}
+
+    seq_len = len(x)
+
+    for i in range(0, len(df) - seq_len):
+        seq = df.iloc[i : i + seq_len]
+        seq = seq[["pitch", "velocity"]]
+        score = cos_sim_score(x, seq)
+        scores["score"].append(score)
+        scores["idx"].append(i)
+
+    similarity = pd.DataFrame(scores)
+    similarity.sort_values(by="score", ascending=False, inplace=True)
+
+    print(similarity)

Code outside of if __name__ == "__main__" is executed every time this file is imported (e.g. from sequence_similarity import cos_sim_score), and it's usually better to avoid that :)
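A minimal sketch of that layout, assuming the helper stays in sequence_similarity.py: importing the module only defines functions, and the script work lives in main() behind the guard. The cos_sim_score body here is an assumed flatten-and-normalize version (the review doesn't show the real one), and the toy DataFrame in main() is a hypothetical stand-in for the load_dataset(...) call so the sketch runs offline.

```python
import numpy as np
import pandas as pd


def cos_sim_score(sequence_a: pd.DataFrame, sequence_b: pd.DataFrame) -> float:
    # Module-level definitions like this are all that runs on import
    a = sequence_a.to_numpy(dtype=float).ravel()
    b = sequence_b.to_numpy(dtype=float).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def main() -> None:
    # Hypothetical toy notes; the real script would call
    # load_dataset("roszcz/internship-midi-data-science", split="train") here
    df = pd.DataFrame({"pitch": [60, 62, 64, 60, 62, 64], "velocity": [80] * 6})
    first = df.iloc[0:3][["pitch", "velocity"]]
    second = df.iloc[3:6][["pitch", "velocity"]]
    print(cos_sim_score(first, second))


if __name__ == "__main__":
    # Runs for `python sequence_similarity.py`, skipped for
    # `from sequence_similarity import cos_sim_score`
    main()
```

With this split, another script can reuse cos_sim_score without triggering the dataset download or the prints.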

roszcz · 2023-09-14T15:39:58Z

sequence_similarity.py

+def cos_sim_score(sequence: pd.DataFrame, window: pd.DataFrame) -> float:
+    """
+    Calculating cosine similarity between sequence and window
+    Args:
+        sequence (pd.DataFrame): input sequence
+        window (pd.DataFrame): subset of rolling window
+    Returns:
+        float: cosine similarity score
+    """


Suggested change

-def cos_sim_score(sequence: pd.DataFrame, window: pd.DataFrame) -> float:
-    """
-    Calculating cosine similarity between sequence and window
-    Args:
-        sequence (pd.DataFrame): input sequence
-        window (pd.DataFrame): subset of rolling window
-    Returns:
-        float: cosine similarity score
-    """
+def cos_sim_score(sequence_a: pd.DataFrame, sequence_b: pd.DataFrame) -> float:
+    """
+    Calculating cosine similarity between two sequences
+    Args:
+        sequence_a (pd.DataFrame): first sequence
+        sequence_b (pd.DataFrame): second sequence
+    Returns:
+        float: cosine similarity score
+    """

This function is great, but the name "window" doesn't really make sense from the point of view of this function - it measures the similarity between any two sequences, and it's only in your specific use case that the second sequence is a "window" moving over the signal.
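The review only shows the docstring, so here is one possible body for the renamed function, assuming both sequences are flattened row by row into single vectors before taking the cosine of the angle between them. The shape check and the all-zero check are additions for this sketch, not part of the original code.

```python
import numpy as np
import pandas as pd


def cos_sim_score(sequence_a: pd.DataFrame, sequence_b: pd.DataFrame) -> float:
    """
    Calculate cosine similarity between two sequences of equal shape.

    Args:
        sequence_a (pd.DataFrame): first sequence
        sequence_b (pd.DataFrame): second sequence
    Returns:
        float: cosine similarity score
    """
    # Compare frame shapes before flattening, so (3, 2) vs (2, 3) is rejected
    if sequence_a.shape != sequence_b.shape:
        raise ValueError(f"shape mismatch: {sequence_a.shape} vs {sequence_b.shape}")

    # Assumption: each sequence is flattened row by row into one vector
    a = sequence_a.to_numpy(dtype=float).ravel()
    b = sequence_b.to_numpy(dtype=float).ravel()

    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0:
        raise ValueError("cosine similarity is undefined for an all-zero sequence")
    return float(np.dot(a, b) / norm_product)
```

Nothing in the body refers to windows, so the same function works for any pair of note sequences. Since cosine similarity ignores vector length, scaling every pitch and velocity by the same factor leaves the score unchanged.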

roszcz · 2023-09-14T15:42:47Z

sequence_similarity.py

+x = df.iloc[0:16]
+x = x[["pitch", "velocity"]]
+
+scores = {"score": [], "idx": []}
+
+seq_len = len(x)
+
+for i in range(0, len(df) - seq_len):
+    seq = df.iloc[i : i + seq_len]
+    seq = seq[["pitch", "velocity"]]
+    score = cos_sim_score(x, seq)
+    scores["score"].append(score)
+    scores["idx"].append(i)


Suggested change

-x = df.iloc[0:16]
-x = x[["pitch", "velocity"]]
-
-scores = {"score": [], "idx": []}
-
-seq_len = len(x)
-
-for i in range(0, len(df) - seq_len):
-    seq = df.iloc[i : i + seq_len]
-    seq = seq[["pitch", "velocity"]]
-    score = cos_sim_score(x, seq)
-    scores["score"].append(score)
-    scores["idx"].append(i)
+target_sequence = df.iloc[0:16]
+target_sequence = target_sequence[["pitch", "velocity"]]
+
+scores = {"score": [], "idx": []}
+
+seq_len = len(target_sequence)
+
+for i in range(0, len(df) - seq_len):
+    sequence_window = df.iloc[i : i + seq_len]
+    sequence_window = sequence_window[["pitch", "velocity"]]
+    score = cos_sim_score(sequence_a=target_sequence, sequence_b=sequence_window)
+    scores["score"].append(score)
+    scores["idx"].append(i)

Logic's good, these are just readability suggestions 👍
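One way to take the readability suggestions a step further is to wrap the search in a function. The name window_similarity and its columns parameter are hypothetical, and cos_sim_score is an assumed flatten-and-normalize version since its real body isn't shown. One detail worth checking: range(0, len(df) - seq_len) stops one step early, so the last full window (starting at len(df) - seq_len) is never scored; the sketch adds + 1.

```python
import numpy as np
import pandas as pd


def cos_sim_score(sequence_a: pd.DataFrame, sequence_b: pd.DataFrame) -> float:
    # Assumed implementation: flatten both sequences and compare the vectors
    a = sequence_a.to_numpy(dtype=float).ravel()
    b = sequence_b.to_numpy(dtype=float).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def window_similarity(
    df: pd.DataFrame,
    target_sequence: pd.DataFrame,
    columns: list[str],
) -> pd.DataFrame:
    """Score every window of len(target_sequence) notes against target_sequence."""
    seq_len = len(target_sequence)
    target = target_sequence[columns]

    scores = {"score": [], "idx": []}
    # + 1 so the last full window (starting at len(df) - seq_len) is scored too
    for i in range(len(df) - seq_len + 1):
        sequence_window = df.iloc[i : i + seq_len][columns]
        score = cos_sim_score(sequence_a=target, sequence_b=sequence_window)
        scores["score"].append(score)
        scores["idx"].append(i)

    # Best matches first; sort_values returns a new DataFrame here
    return pd.DataFrame(scores).sort_values(by="score", ascending=False)


if __name__ == "__main__":
    # Hypothetical toy notes: the last three repeat the first three
    df = pd.DataFrame({"pitch": [60, 62, 64, 60, 62, 64], "velocity": [80] * 6})
    print(window_similarity(df, df.iloc[0:3], ["pitch", "velocity"]))
```

On the toy data the repeat starts at idx 3, which is exactly the window the original range would skip.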

wmilkowska added 3 commits September 13, 2023 12:33

first commit 25d95cd
speed task c266694
speed task cdb31a8
Sequence similarity task d0d2c6a

roszcz reviewed Sep 14, 2023


Chords task 4a583da
