Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configuration option to output logs in logfmt #1022

Open
wants to merge 8 commits into
base: unstable
Choose a base branch
from

Conversation

azuredream
Copy link

See more disscussion:
redis/redis#12934

Add ability to configure Redis to output logs in logfmt (See https://brandur.org/logfmt) as well as configure timestamp format options to more standard ISO 8601 or unix timestamp.

This change is implemented by two configs:
log-format: Either default or logfmt.
log-timestamp-format: default, iso8601, or unix.

#1006

$ ./valkey-server  /home/zhaoz12/git/valkey/valkey/valkey.conf
pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=warning message="WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect."
pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo"
pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="Valkey version=255.255.255, bits=64, commit=affbea5d, modified=1, pid=109463, just started"
pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="Configuration loaded"
pid=109463 role=master timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="monotonic clock: POSIX clock_gettime"
pid=109463 role=master timestamp="2024-09-10T20:37:25.739-04:00" level=warning message="Failed to write PID file: Permission denied"

Copy link

codecov bot commented Sep 12, 2024

Codecov Report

Attention: Patch coverage is 24.07407% with 41 lines in your changes missing coverage. Please review.

Project coverage is 70.62%. Comparing base (262d970) to head (3b0084d).
Report is 18 commits behind head on unstable.

Files with missing lines Patch % Lines
src/server.c 24.07% 41 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1022      +/-   ##
============================================
+ Coverage     70.61%   70.62%   +0.01%     
============================================
  Files           114      114              
  Lines         61734    63125    +1391     
============================================
+ Hits          43595    44584     +989     
- Misses        18139    18541     +402     
Files with missing lines Coverage Δ
src/config.c 78.84% <ø> (+0.14%) ⬆️
src/server.h 100.00% <ø> (ø)
src/server.c 87.67% <24.07%> (-0.97%) ⬇️

... and 90 files with indirect coverage changes

@madolson
Copy link
Member

😀 thanks for opening this, we're working on launching the next version but will take a look shortly.

@PingXie
Copy link
Member

PingXie commented Sep 12, 2024

I like this change. Free-form log lines can be tricky to parse and search, and this feels like a good step toward adopting more structured logging. We could even extend it further to include properties like "area" (e.g., "cluster," "normal replication," "dual channel replication," etc.). Thanks for your work on this, @azuredream!

@madolson madolson added the major-decision-pending Major decision pending by TSC team label Sep 12, 2024
src/server.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
src/server.h Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
@madolson
Copy link
Member

madolson commented Oct 6, 2024

@valkey-io/core-team Please indicate your vote on this comment. The open questions I still have are if we want to also do the larger investigation before 8.1 launches about also including the component the log was generated from. We also have the JSON format to add on.

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed this before in the open source redis time. I like it, just a few comments.

It seems that JSON is more popular. We can add that too in another PR.

Configurable time format is good.

src/server.h Outdated Show resolved Hide resolved
src/server.h Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
src/config.c Outdated
Comment on lines 127 to 129
configEnum log_timestamp_format_enum[] = {{"default", LOG_TIMESTAMP_DEFAULT},
{"iso8601", LOG_TIMESTAMP_ISO8601},
{"unix", LOG_TIMESTAMP_UNIX},
Copy link
Member

@madolson madolson Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
configEnum log_timestamp_format_enum[] = {{"default", LOG_TIMESTAMP_DEFAULT},
{"iso8601", LOG_TIMESTAMP_ISO8601},
{"unix", LOG_TIMESTAMP_UNIX},
configEnum log_timestamp_format_enum[] = {{"legacy", LOG_TIMESTAMP_LEGACY},
{"iso8601", LOG_TIMESTAMP_ISO8601},
{"milliseconds", LOG_TIMESTAMP_MILLISECONDS},

@azuredream Was there a reason we added the unix milliseconds timestamp option? While discussing it, we weren't sure if it was needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@madolson The PR currently emits it as seconds, not milliseconds, since the epoch.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want the epoch, I think millisecond timestamp makes a lot more sense. A lot of important events happen sub-second.

@madolson madolson added major-decision-approved Major decision approved by TSC team and removed major-decision-pending Major decision pending by TSC team labels Oct 7, 2024
src/server.c Outdated
snprintf(buf + off, sizeof(buf) - off, "%03d%s", (int)tv.tv_usec / 1000, tzbuf);
break;

case LOG_TIMESTAMP_UNIX: snprintf(buf, sizeof(buf), "%ld", tv.tv_sec); break;
Copy link
Member

@madolson madolson Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be in milliseconds, not seconds.

@azuredream
Copy link
Author

@madolson @zuiderkwast
PR updated as you suggested.
Thanks for reviewing.

@azuredream
Copy link
Author

azuredream commented Oct 12, 2024

Demo logs:

log-format logfmt
log-timestamp-format milliseconds

pid=557888 role=RDB/AOF timestamp="1728757076.562" level=notice message="oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo"

log-format legacy
log-timestamp-format milliseconds

558253:C 1728757310.034 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo

log-format logfmt
log-timestamp-format iso8601

pid=558427 role=RDB/AOF timestamp="2024-10-12T14:24:28.141-04:00" level=notice message="oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo"

log-format legacy
log-timestamp-format iso8601

558523:C 2024-10-12T14:24:57.923-04:00 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo

log-format logfmt
log-timestamp-format legacy

pid=558599 role=RDB/AOF timestamp="12 Oct 2024 14:25:59.813" level=notice message="oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo"

log-format legacy
log-timestamp-format legacy

558689:C 12 Oct 2024 14:26:30.872 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo

src/server.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
@madolson madolson self-requested a review October 15, 2024 04:41
src/server.c Outdated
/* Low level logging. To use only for very big messages, otherwise
* serverLog() is to prefer. */
void serverLogRaw(int level, const char *msg) {
void serverLogRaw(int level, char *msg) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutability modified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... I hope this works. I see many places in the code where this function is called with a string literal, like

serverLogRaw(LL_WARNING, "Failed applying new module configuration. Restoring previous settings.");

This can't be mutable, I think...

src/server.c Outdated
Comment on lines 140 to 146
while (msg[src] != '\0') {
if (msg[src] == '"') {
msg[dst++] = '\'';
} else if (msg[src] != '\n' && msg[src] != '\r') {
msg[dst++] = msg[src];
}
src++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea: If we replace \r and \n with a space instead, then we don't need to copy/move the rest of the string. We don't need the dst index.

Suggested change
while (msg[src] != '\0') {
if (msg[src] == '"') {
msg[dst++] = '\'';
} else if (msg[src] != '\n' && msg[src] != '\r') {
msg[dst++] = msg[src];
}
src++;
while (msg[src] != '\0') {
if (msg[src] == '"') {
msg[src] = '\'';
} else if (msg[src] == '\n' && msg[src] == '\r') {
msg[src] = ' ';
}
src++;

src/server.c Outdated Show resolved Hide resolved
@azuredream
Copy link
Author

Code updateded.

src/server.c Outdated Show resolved Hide resolved
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now.

One of the commits is missing sign-off. See the Details link on the DCO CI job. Can you fix it? You can squash and force-push as you want.

See also the clang-format CI job. It's a trailing whitespace on one line.

Is the Valkey logo logged on startup? It must look very weird with logfmt?

src/server.c Outdated
} else {
role_char = (server.primary_host ? 'S' : 'M'); /* replica or Primary. */
role_index = (server.primary_host ? 2 : 3); /* Slave or Master. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We should avoid slave and master even in comments.

Suggested change
role_index = (server.primary_host ? 2 : 3); /* Slave or Master. */
role_index = (server.primary_host ? 2 : 3); /* replica or primary. */

@azuredream
Copy link
Author

Is the Valkey logo logged on startup? It must look very weird with logfmt?

Actually it looks fine

pid=716151 role=RDB/AOF timestamp="1729404137284" level=notice message="oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo"
pid=716151 role=RDB/AOF timestamp="1729404137284" level=notice message="Valkey version=255.255.255, bits=64, commit=ca0cedf5, modified=1, pid=716151, just started"
pid=716151 role=RDB/AOF timestamp="1729404137284" level=notice message="Configuration loaded"
pid=716151 role=primary timestamp="1729404137285" level=notice message="monotonic clock: POSIX clock_gettime"
pid=716151 role=primary timestamp="1729404137286" level=warning message="Failed to write PID file: Permission denied"
                .+^+.                                                
            .+#########+.                                            
        .+########+########+.           Valkey 255.255.255 (ca0cedf5/1) 64 bit
    .+########+'     '+########+.                                    
 .########+'     .+.     '+########.    Running in standalone mode
 |####+'     .+#######+.     '+####|    Port: 6379
 |###|   .+###############+.   |###|    PID: 716151                     
 |###|   |#####*'' ''*#####|   |###|                                 
 |###|   |####'  .-.  '####|   |###|                                 
 |###|   |###(  (@@@)  )###|   |###|          https://valkey.io      
 |###|   |####.  '-'  .####|   |###|                                 
 |###|   |#####*.   .*#####|   |###|                                 
 |###|   '+#####|   |#####+'   |###|                                 
 |####+.     +##|   |#+'     .+####|                                 
 '#######+   |##|        .+########'                                 
    '+###|   |##|    .+########+'                                    
        '|   |####+########+'                                        
             +#########+'                                            
                '+v+' 

@azuredream
Copy link
Author

Updated comment, fixed format, fixed unsigned commit.

I appreciate your continuous effort to review my contributions and provide valuable suggestions for improvements.
Thanks to all of you.
@zuiderkwast @madolson @PingXie

@zuiderkwast
Copy link
Contributor

I appreciate your continuous effort to review my contributions and provide valuable suggestions for improvements.

Thanks, that's our job. :) Thanks for your patience!

It's a bit weird that we print role=RDB/AOF for the messages during startup, like "oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo". I guess it was always a "C" for these messages, but I never noticed it.

@azuredream
Copy link
Author

It's a bit weird that we print role=RDB/AOF for the messages during startup, like "oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo". I guess it was always a "C" for these messages, but I never noticed it.

Please refer to the discussion around Jan 11th. We introduced reader friendly role string Intentionally.
redis/redis#12934

    I suggest role=%c (instead of role_char=%c). Is one char clear enough? If not, we can consider using role=master, role=sentinel, etc. but it makes the log longer. Maybe one char is clear enough?

@zuiderkwast
Copy link
Contributor

zuiderkwast commented Oct 20, 2024

I know we want reader-friendly role. What I think is weird is that we print RDB/AOF during startup. This is not a forked child process performing RDB dump or AOF rewrite.

Now I see why: server.pid has not been initialized yet when these line are logged. (I think we can initialize server.pid earlier, but that is for another PR. It's a separate discussion.)

@azuredream
Copy link
Author

Now I see why: server.pid has not been initialized yet when these line are logged. (I think we can initialize server.pid earlier, but that is for another PR. It's a separate discussion.)

I think that means I don't need to update this PR.
I'll be glad to help if there is any further requirement.

@zuiderkwast
Copy link
Contributor

Now I see why: server.pid has not been initialized yet when these line are logged. (I think we can initialize server.pid earlier, but that is for another PR. It's a separate discussion.)

I think that means I don't need to update this PR. I'll be glad to help if there is any further requirement.

That's right, I hope this PR is ready to merge now. @madolson do you have any more comments before we merge this?

@azuredream If you want to make a follow-up PR for the role during startup, please continue reading...

  1. I'm looking at where server.pid = getpid() is done. Currently, it's done in initServer() but it would be better to do it before the first logging happens, in main() before we load config files. This is a simple fix. With this change, we will print "primary" instead of "RDB/AOF" during startup, which I think makes more sense. The server starts as a primary. Do you want to do it in a follow-up PR?

  2. Another potential problem. Actually, I don't think this one is a problem we can solve, but maybe something we can mention in the documentation later.

    I see that loading config is done early during startup. It can happen that some messages are logged before we set the logging format config, so maybe we get some log lines with the legacy format before we switch to logfmt. That's also not good if we want to parse the file using logfmt tools.

    If we fail to load a config file, we log "Fatal error, can't open config file '%s': %s" and exit. We always load a config file before we read any config provided on the command line (in valkey-server dummy.conf --log-format logfmt, the file is loaded first) so there is no way to get this logged in logfmt format. We probably don't want to change the behaviour to load configs on the command line before the file. That would be a breaking change, I think.

    Another message is when we load config from stdin.

     serverLog(LL_NOTICE, "Reading config from stdin");
    

    To read config from stdin, you pass "-" on the command line. I tried this and I used echo to pipe some config to it. I got one line of "legacy" log before switching to logfmt:

    $ echo "save ''" | ./valkey-server --log-format logfmt -
    514332:C 21 Oct 2024 10:53:37.887 * Reading config from stdin
    pid=514332 role=RDB/AOF timestamp="21 Oct 2024 10:53:37.887" level=warning message="WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect."
    pid=514332 role=RDB/AOF timestamp="21 Oct 2024 10:53:37.888" level=notice message="oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo"
    ...
    

    One line of legacy log format, before we switch to logfmt. I don't know if we can solve this. Just like when loading from a file, we load the stdin before any command line config flags.

@azuredream
Copy link
Author

  1. I'm looking at where server.pid = getpid() is done. Currently, it's done in initServer() but it would be better to do it before the first logging happens, in main() before we load config files. This is a simple fix. With this change, we will print "primary" instead of "RDB/AOF" during startup, which I think makes more sense. The server starts as a primary. Do you want to do it in a follow-up PR?

Sure. Please feel free to create an issue and assign to me.

@zuiderkwast
Copy link
Contributor

Please feel free to create an issue and assign to me.

We don't need issues for everything, but if you insist...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-decision-approved Major decision approved by TSC team
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

4 participants