Addition of the continuous-time format #56
- Standard datetime format : **UTC**
> The UTC datetime format is adopted to consider the following levels: year, month, day, hour, minute, and second (without considering time zones).
> For example: `2020-01-01, 2020-01-02, ...` or `2020-01-01T13:00, ...`
> Inclusion of attributes is possible such as duration (More details [here](https://github.com/openENTRANCE/nomenclature/blob/master/nomenclature/definitions/subannual/months.yaml))
The more details link opens a file describing months?
It shouldn't be there, because we are proposing to use code to generate the lists of dates and datetimes. In the same way, there will be similar code in the validation function of the nomenclature.
Thanks @erikfilias for cleaning up the proposal! What confuses me, though, is that the description of this PR has a lot more information than the actual README in the PR - can you please include all relevant info in the README?
start_dt = datetime(2020, 6, 23)
end_dt = datetime(2020, 6, 30)
for dt in DateTimeRange(start_dt, end_dt):
    print(dt.strftime("%Y-%m-%d %H:%M:%S"))
This snippet is not consistent with the example given above!
> The UTC datetime format is adopted to consider the following levels: year, month, day, hour, minute, and second (without considering time zones).
> For example: `2020-01-01, 2020-01-02, ...` or `2020-01-01T13:00, ...`
> Lists of dates and datetimes can be generated with the code in the next section.
- Using time zone : **no relevance**
I think writing "no relevance" is misleading - this might be clearly relevant for some contexts, but we should rather explain that we do not include special treatment for identifying time zones and assume everything is in UTC-Z.
OK, I got it.
```
    print(dt.strftime("%Y-%m-%d %H:%M:%S"))
```

The format `"%Y-%m-%d %H:%M:%S"` is composed of tokens. Each token represents a different part of the date-time, like day, month, year, etc. More details can be found in [strftime() and strptime() section](https://docs.python.org/3/library/datetime.html).
There is no such section yet, right?
Yeah, right. It should be written "strftime() and strptime() Format Codes".
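For reference, a minimal sketch of how such lists of dates and datetimes could be generated with only the Python standard library, instead of the `DateTimeRange` helper used in the diff. The function name `datetime_list` and the half-open range (end excluded) are assumptions made for illustration; they are not taken from the PR.

```python
from datetime import datetime, timedelta


def datetime_list(start, end, step):
    """Return timestamps from start (inclusive) to end (exclusive), spaced by step."""
    stamps = []
    current = start
    while current < end:
        stamps.append(current)
        current += step
    return stamps


# Hourly timestamps for one day, formatted like the README example 2020-01-01T13:00
hourly = datetime_list(datetime(2020, 1, 1), datetime(2020, 1, 2), timedelta(hours=1))
print([dt.strftime("%Y-%m-%dT%H:%M") for dt in hourly][:3])

# Daily timestamps for one week, formatted like the README example 2020-01-01
daily = datetime_list(datetime(2020, 6, 23), datetime(2020, 6, 30), timedelta(days=1))
print([dt.strftime("%Y-%m-%d") for dt in daily])
```

Using the same format string for printing as in the README examples would address the inconsistency noted in the review.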
From our (@tperger) point of view, the current representation of the months is neither intuitive nor consistent with the extension in this PR regarding the hourly and weekly resolution of the time series. The monthly descriptions (January, February, ...) are currently entered in the column "Subannual". We suggest using Monthly in this column instead. The columns with the UTC time format will then not differ between day, week and month. Examples to make it clearer:
I partly agree with you about Monthly in the column. In addition, we could keep the UTC time format in "Subannual" for the representation of the months, because in a daily or hourly representation we could end up with several columns. For example, I'm planning to include data for 8760 hours.
You are right @erikfilias, we might end up with many columns. But imo having time stamps in the rows and columns is rather confusing. It would be more intuitive to have one line per time series and the subannual columns telling us what it is (e.g. the resolution).
In my opinion, there are three concerns:
I don't see how this proposal increases intuition or improves the third point...
@danielhuppmann, we could include a TimeStep parameter in our data, as we did in openTEPES. At first, we start with hourly resolution for a year, and after that the LoadLevels (timestamps) are averaged automatically according to the TimeStep (e.g., every 2 hours). @tperger, you are right. This idea of using columns could be used for representative timeslices to reduce the rows in the file, I think.
Just so I understand: What is the intuitive equivalent of "January" for the first day or week of the year? (This might be a question at the other levels of aggregation mentioned above I guess.)
As far as I understood it the attribute
Regarding the weeks, should we not include week1, week2 (or W1, W2), as used in our favourite calendars? (I do not know your conventions, but here we often refer to W23 as the 23rd week of the year, and the definition is the same for everyone; week1 is the week which contains January 1st.)
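As a side note on week numbering, a small sketch using Python's `date.isocalendar()`, which follows ISO 8601: there, week 1 is the week containing the first Thursday of the year, so it does not always contain January 1st. The helper name `iso_week_label` and the `YYYY-Www` label format are assumptions for illustration only.

```python
from datetime import date


def iso_week_label(day):
    """Return an ISO 8601 week label such as '2020-W01' for the given date."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


# 2020-01-01 is a Wednesday, so it falls in ISO week 1 of 2020
print(iso_week_label(date(2020, 1, 1)))

# 2050-01-01 is a Saturday, so under ISO 8601 it belongs to the last week of 2049
print(iso_week_label(date(2050, 1, 1)))
```

If week labels were ever adopted, the convention (ISO 8601 or "week containing January 1st") would need to be stated explicitly, since the two can disagree at year boundaries.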
I thought we had decided not to add columns (only the subannual one) and to treat all data in different lines to keep consistency? But I may have missed something...
I would think that kind of parameter is not data but a model parameter, and so probably should not be included in the data, no?
Indeed, there was no decision to change the columns, and I also do not see any convincing argument why a change would be better (more intuitive and/or easier to process).
I am also going to include hourly data (8760 per year then), but I had planned to have one line per hour, so that we can easily use the aggregation/disaggregation functions. Moreover, I doubt we can host files with 8766 columns (whereas files with too many lines can easily be cut into pieces).
I personally see arguments against (all my data are organised with 1 hour per line; changing this would require more transformations, not complex but, as the volume of the data is huge, long to process, plus the fact that files with 8766 columns are not manageable). (Thanks for your answer, I was a bit worried about having to do all this.)
Responding to @sebastianzwickl
The duration attribute can be included where it is obvious and where it can be helpful in the automated use of (dis)aggregation tools in the post-processing (e.g., take all timeslices that have the attribute
This is the crux of the matter: say you have Option 1: You can define two sets of subannual timesteps, where Option 2: Both models provide their data directly in UTC-compatible format
And to make matters more complicated (I'm sorry): @erikfilias and @sandrinecharousset are going to submit data at a resolution of 8760 hours per year. Is So we should find an agreement whether we follow the actual calendar years in using UTC (with leap years?), or if we construct a simplified "hypothetical" archetype year.
Regarding the last comment from Daniel, there is no problem for us in using UTC style to enumerate the 8760 hours of a year, and internally the model is not affected by a year starting on a Wednesday, Friday or whatever. In any case, it should be the responsibility of the modeler to take this into account and to make it consistent with the load, for example.
In our case to assume a Monday, Tuesday,... for the
The data I am going to include have been treated so as to fit the calendar of 2050. In particular, regarding electricity loads, there are dependencies on the calendar: seasons, but also the profile of a week, specific profiles for weekdays and weekend days (Fridays are different too), and also bank holidays and all school holidays of all countries. 1/1/2050 is a Saturday.
Just to complement: we do not need to tell the model that 1/1/2050 is a Saturday; there are plenty of nice functions that compute that easily.
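As an illustration of such functions, a short sketch with the Python standard library that checks the weekday of 1 January 2050 and counts the hours of a real calendar year. The helper name `hours_in_year` is an assumption for illustration, and the count ignores any daylight-saving shifts, i.e. it assumes a fixed UTC offset.

```python
import calendar
from datetime import date


def hours_in_year(year):
    """Return the number of hours in a real calendar year (leap years included)."""
    days = 366 if calendar.isleap(year) else 365
    return days * 24


# date.weekday() counts from Monday = 0, so Saturday is 5
print(date(2050, 1, 1).weekday())

# A common year versus a leap year
print(hours_in_year(2050), hours_in_year(2020))
```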
Ok, so there is a consensus that we use "real" calendar days. So for being specific, this implies that the years 2020, 2024, etc. have 8784 hours (not 8760). @erikfilias can you please update the proposed changes to the README:
responding to @sandrinecharousset
Can we please hold off on that discussion for a new PR after this one is merged? The discussion on how to deal with UTC and (multi-)hourly data seems already complicated enough...
Yes, of course. Anyway, we can already deal with weekly data provided that we use UTC for dates and a duration of 168 hours. I am not even sure any model would require defining weeks.
thanks @erikfilias, I'm confused... we previously always discussed to use UTC-format, but now, the description in the README says that we use Central European Time... So if a team uploads a timestamp
Yeah sorry, it should be Everything is in the same time zone (Berlin, Vienna, Paris, Madrid, etc.). I corrected it. Should I rebase the PR to include the "stickler-ci" file?
I think the worst thing to do when establishing a data-sharing convention is to come up with something that is almost identical to a very well-established format - in this case, you propose to use Two options to resolve this:
Option 2 is OK for me, since we'll use the complete format. I'll try to include a clarification about it in the README if you agree.
@openENTRANCE/coordination, do you agree that we should require to add the CET-timestamp `+01:00` (option 2 above)?
Hi, yes I do :-)
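To make the agreed convention concrete, a minimal sketch of writing and reading timestamps with an explicit `+01:00` offset using the Python standard library. The constant name `CET` and the fixed offset without summer time are assumptions for illustration, not part of the PR.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+01:00 offset (assumption: no summer/winter time switch)
CET = timezone(timedelta(hours=1))

# Write a timestamp with the explicit offset
stamp = datetime(2020, 1, 1, 13, 0, tzinfo=CET)
text = stamp.isoformat(timespec="minutes")
print(text)

# Read it back; the offset is preserved as part of the value
parsed = datetime.fromisoformat(text)
print(parsed.utcoffset())
```

With the offset written out, a timestamp is unambiguous even for a reader who assumes plain UTC by default.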
- Accumulating values over the time span:
> A value at the start of the period for stock over the time-period until the start of the next timeslice.
> For example, Capacity or Reservoir Level at 2020-01-01T13:00 is the value at 1pm, if subannual is given as 2020-01-01, it is understood as midnight that day, if it's January, it is understood as midnight on the first day of the month.
This may be unclear:
- Capacity is not a stock variable (unit is MW), whereas Reservoir Level is.
- For variables such as "storage level", the definition is fine.
- But what about variables such as inflows? It should be: "the value given for a timestamp, e.g. 2020-01-01T13:00, is the accumulation between this timestamp and the following one" (which means that if, e.g., an inflow value is given in MWh for 2020-01-01T13:00 and the next timestamp is 2020-01-02T13:00, anyone who needs the hourly value between those two timestamps has to divide by the duration, here 24).
Maybe we should just add a 3rd bullet for defining what it means for values such as inflows? Other variables may be included in that class?
If I'm getting your point, a 3rd bullet could be added to clarify if the value is divided automatically or avoid this assumption. I prefer the second option, so we could fall into many details that can be managed before. I think that we should upload data after the aggregation/disaggregation process and try to avoid any automatic math operation. Any opinion?
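To make the arithmetic in the inflow example above concrete, a small sketch of converting a value accumulated between two timestamps into an hourly average. The function name `hourly_average` and the value of 240 MWh are hypothetical; this only illustrates the division by the duration and is not a proposal for automatic processing in the nomenclature.

```python
from datetime import datetime


def hourly_average(accumulated, start, end):
    """Divide a value accumulated over [start, end) by the duration in hours."""
    hours = (end - start).total_seconds() / 3600
    return accumulated / hours


# Hypothetical inflow of 240 MWh reported for 2020-01-01T13:00,
# with the next timestamp at 2020-01-02T13:00 (a duration of 24 hours)
start = datetime(2020, 1, 1, 13)
end = datetime(2020, 1, 2, 13)
print(hourly_average(240.0, start, end))
```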
> To distinguish between different granularity levels of representative timeslices, the following was proposed: `<Granularity>|<Name of timeslice>`. For example: `2 Season-2 Times|Summer-Day`.

- Averaging values over the time span :
> A value is always the average for flow variables (i.e. It is the average between the subannual time and the subsequent one, where average is contingent on the lowest level of granularity - if you use 2020-01-01T13:00, it is hourly average, if you use 2020-01-01, it is daily average...).
What exactly do you mean by 'flow' variables (for me it is unclear). It is true that for variables such as 'capacity', 'flow in a line', 'generation' the value given on one time stamp is the mean value on the duration,
It is related to power flow, inflows, etc (e.g. any fluids, I think).
- Calendar : **Use of real days**
> Any model with an hourly resolution is assumed to use 8760 hours for a common year and 8784 hours for leap years such as 2020, 2024, etc.

- Using time zone : **Everything is in UTC+01:00 (Central European Time)**
I think that using Central European Time may simplify the conversion process for most of us, as the majority of partners (except khas, I think) are located in central Europe. Anyway, converting from one time zone to another is not that difficult (except that when you have a time series over one year, you 'lose' the first or last hours of the time series).
Yes, I also agree on this.
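The point about 'losing' the first or last hours can be seen in a short sketch that converts UTC+01:00 timestamps to UTC with the Python standard library. The names `CET` and `to_utc` are assumptions for illustration, and a fixed offset without summer time is assumed.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+01:00 offset (assumption: no summer/winter time switch)
CET = timezone(timedelta(hours=1))


def to_utc(stamp):
    """Convert an offset-aware timestamp to UTC."""
    return stamp.astimezone(timezone.utc)


# First and last hour of the year 2020 in UTC+01:00
first_hour = datetime(2020, 1, 1, 0, 0, tzinfo=CET)
last_hour = datetime(2020, 12, 31, 23, 0, tzinfo=CET)

# In UTC, the first hour moves into the previous year and the series ends one hour early
print(to_utc(first_hour).isoformat())
print(to_utc(last_hour).isoformat())
```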
In this explanation, a brief description of using the format and some code to get list of dates and datetimes
Co-authored-by: Daniel Huppmann <[email protected]>
I added a description related to the use of the UTC format and indications about hour duration, calendar and summer/winter time in the continuous-time format
aebfaf7 to ca32b8f
This is being updated and improved in #66.
Main details about the continuous-time format (daily and hourly data):
This PR therefore updates the subannual section of the README and adds scripts to get lists of dates and hours, respectively.
The format and details for representative timeslices will be handled in a new PR, taking into account the following: