Duplicated circulars are being distributed in batches #2639
Last night, from 4:03 am to 4:10 am Eastern Time, 16 old GCN Circulars were distributed with "SC" numbers in them, such as SC231206ca, SC190602aq and SC231224e. Many of the 16 were multiple repeats of the same Circular. |
Here is the immediate cause. Once a week, we retrieve bibliographic entries from the Astrophysics Data System (ADS) and update the citation data for GCN Circulars. For several old GCN Circulars, the ADS bibcode alternates between two different values each week. I am not sure yet whether this is due to a bug in our code or a problem in ADS. |
The problem is that there are some ADS entries with incorrect volume numbers. Once a week, the affected GCN Circulars are swapped between the correct and the incorrect ADS entries. I sent the following to ADS:
|
If you need further duplicates, I can, unfortunately, retrieve a large list of the repeated Circulars that start with “S”.
For example, Circular 35424 for S231224e was received on
Oct 28, 2024 08:08:32 UT
Oct 28, 2024 08:04:12 UT
Oct 21, 2024
Oct 14, 2024
Oct 14, 2024
Oct 07, 2024
Oct 07, 2024
Sep 30, 2024
Sep 30, 2024
in addition to the original on
Dec 23, 2023
Peter
… On Oct 29, 2024, at 4:07 PM, Leo Singer ***@***.***> wrote:
The problem is that there are some ADS entries with incorrect volume numbers. Once a week, the affected GCN Circulars are swapped between the correct and the incorrect ADS entries.
I sent the following to ADS:
Dear ADS,
We (the maintainers of the General Coordinates Network, https://gcn.nasa.gov/) have noticed some duplicated or incorrect ADS entries for GCN Circulars. There are two general categories:
Pairs of duplicate ADS entries, for example https://ui.adsabs.harvard.edu/abs/2023GCN.35446....1L/abstract and https://ui.adsabs.harvard.edu/abs/2023GCN.35446....1Z/abstract.
Entries for which the volume number is saved incorrectly, although the volume is correct in the bibcode. For example, https://ui.adsabs.harvard.edu/abs/2009GCN.10001....1H/abstract. The bibcode is 2009GCN.10001....1H but the volume number is stored 101. The correct volume number is 10001.
There are between tens and hundreds of each category. What is the most efficient way to correct these ADS records?
Thanks,
Leo
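The second category above (a stored volume that disagrees with the bibcode) can be checked mechanically. The following is a minimal sketch, not the code used by GCN or ADS. The regular expression is an assumption inferred only from the example bibcodes quoted in this thread, and the function names are hypothetical.

```python
import re

# Assumed pattern, inferred only from the example bibcodes in this thread:
# four-digit year, "GCN", dot padding, the Circular number, then a dot.
BIBCODE_RE = re.compile(r"^(\d{4})GCN\.*(\d+)\.")


def volume_from_bibcode(bibcode: str) -> int:
    """Return the Circular number embedded in a GCN bibcode."""
    match = BIBCODE_RE.match(bibcode)
    if match is None:
        raise ValueError(f"not a GCN Circular bibcode: {bibcode!r}")
    return int(match.group(2))


def volume_is_consistent(bibcode: str, stored_volume: str) -> bool:
    """Check whether the stored volume matches the one in the bibcode."""
    return volume_from_bibcode(bibcode) == int(stored_volume)


# The example from the message above: bibcode says 10001, record says 101.
print(volume_is_consistent("2009GCN.10001....1H", "101"))    # False
print(volume_is_consistent("2009GCN.10001....1H", "10001"))  # True
```

A check like this would only flag the volume mismatches. The duplicate pairs in the first category differ in the trailing author initial, so they would need a separate comparison.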
|
Thanks, @PeterBKramer, but now that this has been brought to our attention we don't need any more data on the duplicate Kafka records. I used our code snippet for replaying records from the earliest available offset to find all the duplicates in the past two weeks, and I have a good sense of what is going on now. If, on the other hand, you receive duplicate emails please let us know. |
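For anyone who wants to check their own replayed records, here is a rough sketch of counting duplicates by Circular number. It is not the snippet referred to above. It assumes each record value is a JSON document with a `circularId` field, and the sample payloads are hypothetical stand-ins for Kafka record values.

```python
import json
from collections import Counter
from typing import Iterable


def find_duplicates(
    messages: Iterable[bytes], id_field: str = "circularId"
) -> dict[int, int]:
    """Count how often each Circular appears; return those seen more than once."""
    # id_field is an assumption about the JSON payload layout.
    counts = Counter(json.loads(m)[id_field] for m in messages)
    return {cid: n for cid, n in counts.items() if n > 1}


# Hypothetical payloads standing in for replayed Kafka record values.
replayed = [
    b'{"circularId": 35424, "subject": "example A"}',
    b'{"circularId": 38070, "subject": "example B"}',
    b'{"circularId": 35424, "subject": "example A"}',
]
print(find_duplicates(replayed))  # {35424: 2}
```

The function takes any iterable of raw values, so it can be fed from a consumer loop or from records saved to disk.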
Explain why duplicate GCN Circular records occur on the `gcn.circulars` Kafka topic in normal operation. There are still _abnormal_ duplicates that are currently occurring due to some errors in ADS' records for GCN Circulars. See nasa-gcn#2639.
See #2654 for partial fix. |
I received the following from ADS:
|
This morning at 8:06-8:09 UT I received repeated Circulars with the following NUMBER values:
37884, 37778, 37777, 35297, 24718
These are repeated circulars with a SUBJECT containing “LIGO/Virgo”. There may have been additional repeated Circulars without that phrase in their SUBJECT.
Current Circulars contain NUMBER values that increase with each Circular and are currently greater than 38070.
Peter B Kramer
***@***.***
… On Nov 4, 2024, at 8:33 AM, Leo Singer ***@***.***> wrote:
I received the following from ADS:
I can fix the 2nd category of problems in bulk. They should be correct after the weekend update.
For the first category of problems, it is usually because we got them from two different sources with conflicting
author information (including whether the collaboration should be the author or not). Since we have no way to
know which is correct, it would be best if you can send us that information - I just need to know which of the
pair of bibcodes is the good one.
|
@PeterBKramer, thank you, but we have more than sufficient information now to reproduce this. We are still working on correcting the ADS entries, which will eliminate the unintentional duplicate messages. There will always be some intentional duplicates; see our FAQ here: https://gcn.nasa.gov/docs/faq#why-do-i-receive-duplicates-of-old-gcn-circulars-over-kafka-on-the-gcncirculars-topic. |
Fixed. Now tracking corrections to duplicate entries in ADS as #2656. |
There have been at least 2 reports of users with Kafka consumers that are subscribed to `gcn.circulars` receiving upwards of hundreds of old circulars at apparently random times.
Acceptance Criteria: