This repository has been archived by the owner on Oct 17, 2024. It is now read-only.
Recently there was some work with BED files that use RefSeq/GenBank chromosome IDs, which typically contain a period for versioning purposes (e.g. "NC_000001.11"). The spec currently does not allow this as-is: only alphanumeric characters are allowed in the chrom field.
I e-mailed Jim Kent regarding this issue and this is what he had to say:
"Yes, I would consider this an error. All of our parsers are good with anything but white space there. Most of our utilities will handle spaces if you throw in a -tab option, but I wouldn't want to encourage that."
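To make the difference concrete, here is a minimal Python sketch comparing the two rules. `STRICT_CHROM`, `PERMISSIVE_CHROM`, and `chrom_ok` are hypothetical names for illustration, not part of any UCSC tool or of the spec. The strict pattern is only a stand-in for the "alphanumeric only" wording above; allowing underscore in it is my assumption, made so that the period is the only character that separates the two rules for an ID like "NC_000001.11".

```python
import re

# Assumption: stand-in for the current spec wording (alphanumeric, plus
# underscore as an illustrative allowance). Not copied from the spec.
STRICT_CHROM = re.compile(r"[A-Za-z0-9_]+")

# The rule Jim Kent describes: anything but white space.
PERMISSIVE_CHROM = re.compile(r"\S+")


def chrom_ok(name, strict=False):
    """Return True if `name` is an acceptable chrom field under the chosen rule."""
    pattern = STRICT_CHROM if strict else PERMISSIVE_CHROM
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["chr1", "NC_000001.11", "chr 1"]:
        print(name, chrom_ok(name, strict=True), chrom_ok(name))
```

Under the strict rule the versioned RefSeq ID is rejected because of the period, while the permissive rule rejects only names that contain white space.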
There was another response from UCSC. Matthew Speir had this to say:
In short, we think periods should be allowed in an update to the BED specification...
bigBed, bigWig, and other big* formats similarly don't have restrictions on using periods in the chrom field.
The details and the original reasoning come from Angie Hinrichs, an engineer at UCSC:
When we exclusively used MySQL for storage (before bigBed, etc), we split some of our largest tracks into a table per chromosome. For example, instead of a single table "xenoMrna" there would be separate tables chr1_xenoMrna, chr2_xenoMrna and so on. This meant only characters that could be used in MySQL table names without special quoting could be used for the chrom field, because they might end up as prefixes in mysql table names. As I'm sure you know, '.' has special meaning in SQL as a separator between database, table, and field.
However, we had to stop using "split tables" when we added new organisms whose assemblies consisted of tens of thousands or even hundreds of thousands of scaffold sequences -- that would just be way too many MySQL tables. That restriction still applied to old databases with split tables, but not to new databases after a certain point.
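The split-table constraint described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `split_table_name` and `needs_quoting` are hypothetical helpers, not UCSC code, and the pattern is a simplified view of which characters MySQL accepts in an unquoted identifier (ASCII letters, digits, `$`, and `_`).

```python
import re

# Simplified assumption: characters usable in a MySQL identifier without
# backtick quoting. Extended (non-ASCII) characters are ignored here.
UNQUOTED_IDENT = re.compile(r"[0-9A-Za-z$_]+")


def split_table_name(chrom, track):
    """Build a per-chromosome table name, e.g. chr1_xenoMrna."""
    return f"{chrom}_{track}"


def needs_quoting(table):
    """Return True if the table name could not be used unquoted."""
    return UNQUOTED_IDENT.fullmatch(table) is None


if __name__ == "__main__":
    for chrom in ["chr1", "NC_000001.11"]:
        table = split_table_name(chrom, "xenoMrna")
        print(table, "needs quoting" if needs_quoting(table) else "ok")
```

With a chrom such as "chr1" the resulting table name is safe as-is, while "NC_000001.11" yields a name containing '.', which SQL would read as a database/table/field separator unless the name were quoted.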