Duplicated circulars are being distributed in batches #2639

Closed
1 of 2 tasks
dakota002 opened this issue Oct 21, 2024 · 10 comments · Fixed by #2655

Comments

@dakota002
Contributor

dakota002 commented Oct 21, 2024

There have been at least two reports of users whose Kafka consumers are subscribed to gcn.circulars receiving hundreds of old circulars at apparently random times.

Acceptance Criteria:

  • Determine cause of the redistribution
  • Prevent it from happening again
@lpsinger lpsinger changed the title from "Circulars are being redistributed in batches for no reason" to "Duplicated circulars are being distributed in batches" Oct 21, 2024
@PeterBKramer

Last night, between 4:03 am and 4:10 am East Coast Time, there were 16 old GCN Circulars with "SC" numbers in them, such as SC231206ca, SC190602aq, and SC231224e. Many of the 16 were repeats of the same circular.

@lpsinger
Member

Here is the immediate cause. Once a week, we retrieve bibliographic entries from the Astrophysics Data System (ADS) and update the citation data for GCN Circulars. For several old GCN Circulars, the ADS bibcode alternates between two different values each week.

I am not sure yet whether this is due to a bug in our code or a problem in ADS.
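To make the alternating pattern concrete, here is a minimal sketch of how one might spot Circulars whose bibcode flips back and forth between two values across weekly updates. The snapshot format (a list of `circularId` to bibcode mappings, oldest first) and the function name are assumptions for illustration, not how the GCN site actually stores or processes this data.

```python
from collections import defaultdict


def find_alternating_bibcodes(weekly_snapshots):
    """Return circular IDs whose bibcode flips between exactly two values.

    weekly_snapshots: list of dicts mapping circularId -> bibcode, oldest
    first. This shape is hypothetical, chosen only for the example.
    """
    # Collect the bibcode history of each circular across all snapshots.
    history = defaultdict(list)
    for snapshot in weekly_snapshots:
        for circular_id, bibcode in snapshot.items():
            history[circular_id].append(bibcode)

    alternating = {}
    for circular_id, bibcodes in history.items():
        # Count week-to-week changes in the recorded bibcode.
        changes = sum(a != b for a, b in zip(bibcodes, bibcodes[1:]))
        # Exactly two distinct values with at least two changes means the
        # bibcode returned to a value it had already held.
        if len(set(bibcodes)) == 2 and changes >= 2:
            alternating[circular_id] = sorted(set(bibcodes))
    return alternating
```

A Circular that changes bibcode only once is not flagged, since a single change is consistent with a legitimate correction.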

@lpsinger
Member

The problem is that some ADS entries have incorrect volume numbers. Once a week, the affected GCN Circulars swap between the correct and the incorrect ADS entries.

I sent the following to ADS:

Dear ADS,

We (the maintainers of the General Coordinates Network, https://gcn.nasa.gov/) have noticed some duplicated or incorrect ADS entries for GCN Circulars. There are two general categories:

There are between tens and hundreds of each category. What is the most efficient way to correct these ADS records?

Thanks,
Leo

@PeterBKramer

PeterBKramer commented Oct 29, 2024 via email

@lpsinger
Member

Thanks, @PeterBKramer, but now that this has been brought to our attention we don't need any more data on the duplicate Kafka records. I used our code snippet for replaying records from the earliest available offset to find all the duplicates in the past two weeks, and I have a good sense of what is going on now.

If, on the other hand, you receive duplicate emails, please let us know.
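For anyone who wants to check their own stream, the replay-and-count approach mentioned above could look roughly like the following. This is only a sketch: it assumes each record value on `gcn.circulars` is a JSON object with a `circularId` field, and the function name is made up for illustration. It is not the snippet referred to in the comment above.

```python
import json
from collections import Counter


def count_duplicates(record_values):
    """Count how often each circularId appears among replayed records.

    record_values: iterable of JSON-encoded message values (bytes or str).
    Returns {circularId: count} for IDs seen more than once.
    """
    counts = Counter()
    for value in record_values:
        circular = json.loads(value)  # json.loads accepts bytes and str
        counts[circular["circularId"]] += 1
    return {cid: n for cid, n in counts.items() if n > 1}
```

To feed it real data, one would read from a Kafka consumer whose `auto.offset.reset` setting is `earliest`, so that consumption starts from the oldest retained record, and pass each message's value to this function.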

lpsinger added a commit to lpsinger/gcn.nasa.gov that referenced this issue Oct 30, 2024
Explain why duplicate GCN Circular records occur on the
`gcn.circulars` Kafka topic in normal operation.

There are still _abnormal_ duplicates that are currently occurring
due to some errors in ADS' records for GCN Circulars.

See nasa-gcn#2639.
@lpsinger
Member

See #2654 for a partial fix.

@lpsinger
Member

lpsinger commented Nov 4, 2024

I received the following from ADS:

I can fix the 2nd category of problems in bulk. They should be correct after the weekend update.
For the first category of problems, it is usually because we got them from two different sources with conflicting
author information (including whether the collaboration should be the author or not). Since we have no way to
know which is correct, it would be best if you can send us that information - I just need to know which of the
pair of bibcodes is the good one.

@lpsinger lpsinger reopened this Nov 4, 2024
@PeterBKramer

PeterBKramer commented Nov 4, 2024 via email

@lpsinger
Member

lpsinger commented Nov 4, 2024

@PeterBKramer, thank you, but we have more than sufficient information now to reproduce this. We are still working on correcting the ADS entries, which will eliminate the unintentional duplicate messages.

There will always be some intentional duplicates; see our FAQ here: https://gcn.nasa.gov/docs/faq#why-do-i-receive-duplicates-of-old-gcn-circulars-over-kafka-on-the-gcncirculars-topic.
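Since some duplicates are expected, a consumer that only wants to act when a Circular's text actually changes can filter on its own side. Below is a minimal sketch under a few assumptions: record values are JSON objects with `circularId`, `subject`, and `body` fields, and the class name and the choice of fields to compare are purely illustrative.

```python
import hashlib
import json


class CircularDeduplicator:
    """Remember a fingerprint per circularId and flag unchanged repeats."""

    def __init__(self, fields=("subject", "body")):
        # Fields whose changes the consumer cares about (an assumption;
        # adjust to whatever matters for your application).
        self.fields = fields
        self._seen = {}

    def is_new(self, value):
        """Return True if this record is unseen or its tracked fields changed."""
        circular = json.loads(value)
        payload = json.dumps([circular.get(f) for f in self.fields]).encode()
        digest = hashlib.sha256(payload).hexdigest()
        circular_id = circular["circularId"]
        if self._seen.get(circular_id) == digest:
            return False  # same circular, same tracked content: skip it
        self._seen[circular_id] = digest
        return True
```

With the default fields, a repeat that differs only in metadata outside `subject` and `body` is skipped, while an edited body is passed through. The in-memory dictionary is lost on restart; a long-running consumer would need to persist it.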

@lpsinger
Member

lpsinger commented Nov 4, 2024

Fixed. Now tracking corrections to duplicate entries in ADS as #2656.

@lpsinger lpsinger closed this as completed Nov 4, 2024
3 participants