'APP' and 'STG' logs fail to show up #1708

jbuns · 2021-03-15T13:52:00Z

Describe the bug
We’re currently facing issues with loggregator-bridge. When running cf logs, logs of type APP and STG fail to show up.

To Reproduce
We've seen this fail in two different scenarios.

Scenario 1: we’ve got a long-running deployment of kubecf with eirini and noticed that after a while, APP and STG logs stop appearing in cf logs. I’ve traced it down to loggregator-bridge. The pod logs look like:

{"level":"info","ts":1615161473.9439654,"caller":"kubeconfig/getter.go:53","msg":"Using in-cluster kube config"}
{"level":"info","ts":1615161473.9440942,"caller":"kubeconfig/checker.go:36","msg":"Checking kube config"}
Error:  unexpected EOF
Error:  unexpected EOF
Received non-pod object in watcher channel

Scenario 2: after a fresh installation of kubecf+eirini on OpenShift 4.6 (k8s version 1.19), APP and STG logs fail to appear in cf logs, and the problem is exactly the same as above.

Expected behavior
When running cf logs, I should also be able to see APP and STG logs.

Environment
KubeCF version: 2.7.12
Eirini version: 1.8
Kubernetes: 1.19

Additional context
This was tested on OpenShift 4.4 and 4.6


jbuns · 2021-03-19T16:40:41Z

Tested also on AKS and seeing the same problem:

$ k logs loggregator-bridge-59f5cb64bc-9scbb -n kubecf
{"level":"info","ts":1615563276.4147344,"caller":"kubeconfig/getter.go:53","msg":"Using in-cluster kube config"}
{"level":"info","ts":1615563276.414798,"caller":"kubeconfig/checker.go:36","msg":"Checking kube config"}
Received non-pod object in watcher channel
Error:  unexpected EOF

jandubois · 2021-03-19T17:14:39Z

@mudler Any ideas what this might be / where to look next?

jbuns · 2021-03-26T17:16:14Z

I've turned on DEBUG logging for loggregator-bridge and this is the error I'm seeing:

Starting Loggregator
{"level":"info","ts":1615410279.0113866,"caller":"kubeconfig/getter.go:53","msg":"Using in-cluster kube config"}
{"level":"info","ts":1615410279.0114636,"caller":"kubeconfig/checker.go:36","msg":"Checking kube config"}
Received event:  {ERROR &Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:Failure,Message:too old resource version: 43522014 (43524698),Reason:Expired,Details:nil,Code:410,}}
Received non-pod object in watcher channel

In the code, I can see that the failure is happening here:
https://github.com/cloudfoundry-incubator/eirini-loggregator-bridge/blob/master/podwatcher/podwatcher.go#L293-L306
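To make the two log lines above easier to relate, here is a minimal sketch of how a watch consumer could tell an ERROR event carrying a Status (such as the 410 "Expired" one) apart from a genuinely unexpected object. This is not the bridge's actual code: `Pod`, `Status`, `Event`, and `classify` are hypothetical stand-ins for the real Kubernetes types, kept local so the example has no dependencies.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Kubernetes types; the real ones live in
// k8s.io/api/core/v1 and k8s.io/apimachinery/pkg/apis/meta/v1.
type Pod struct{ Name string }

type Status struct {
	Code    int32
	Reason  string
	Message string
}

// Event mimics the shape of a watch event: an event type plus an object.
type Event struct {
	Type   string
	Object interface{}
}

// classify reports what kind of object arrived on the watch channel, so the
// caller can treat an expired watch differently from an unknown object.
func classify(ev Event) string {
	switch obj := ev.Object.(type) {
	case *Pod:
		return "pod"
	case *Status:
		if obj.Code == 410 {
			return "expired" // the watch must be restarted from a newer version
		}
		return "error"
	default:
		return "unknown"
	}
}

func main() {
	ev := Event{
		Type: "ERROR",
		Object: &Status{
			Code:    410,
			Reason:  "Expired",
			Message: "too old resource version: 43522014 (43524698)",
		},
	}
	fmt.Println(classify(ev))
}
```

With a check like this, the "non-pod object" message would only be logged for objects that are neither pods nor status errors.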

@mudler / @jandubois any suggestions on how we can try to fix this?

jandubois · 2021-03-26T17:35:20Z

@jbuns Sorry, I know nothing about the eirini-loggregator-bridge, and have no time to learn about it.

Let's see if @mudler can give you hints next week; this week has been Hackweek at SUSE, so everyone has been working on other stuff... (FWIW, I spent half a day of my hackweek time yesterday on getting Eirini-1.8 to continue to work with the latest cf-deployment, so we don't have to drop it (yet) for the kubecf-2.8 releases).

mudler · 2021-03-30T07:04:04Z

It looks like we are receiving old events in the channel - this reminds me of the work done in EiriniX cloudfoundry-incubator/eirinix#38 - is the loggregator-bridge using the latest EiriniX, including that fix? Otherwise, the alternative is manually specifying a ResourceVersion to start the watch from.

From the error message, it looks like the watcher starts listening for events that are old and no longer available - while the above PR was meant to fetch the latest ResourceVersion at startup to fix exactly that issue.

jbuns · 2021-03-30T11:29:56Z

@mudler loggregator-bridge is using eirinix v0.3.1
https://github.com/cloudfoundry-incubator/eirini-loggregator-bridge/blob/master/go.mod#L4

so I'm assuming that it's got the fix you've mentioned since cloudfoundry-incubator/eirinix#38 was merged since v0.2.0:
cloudfoundry-incubator/eirinix@v0.2.0...master

Does that mean that the manager in eirinix is the one that's failing? Only difference I can see between the PR above and what's in the code now is this line:
https://github.com/cloudfoundry-incubator/eirinix/blob/master/manager.go#L298

jbuns · 2021-03-30T15:53:09Z

The status Message:too old resource version seems to be expected behaviour according to Kubernetes:
kubernetes/kubernetes#22024

It looks like podwatcher needs to be updated in order to handle this, rather than erroring out.
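One possible shape for such handling, as a rough sketch only: when the watch ends with the "too old resource version" condition, list again to obtain a fresh resourceVersion and restart the watch instead of exiting. The `podAPI` interface, `errExpired`, `watchWithRestart`, and `fakeAPI` below are all hypothetical names invented for this illustration; they model the idea without using client-go and are not the podwatcher or EiriniX API.

```go
package main

import (
	"errors"
	"fmt"
)

// errExpired stands in for a watch ERROR event with Code 410 / Reason "Expired".
var errExpired = errors.New("too old resource version")

// podAPI is a hypothetical, minimal view of the API server for this sketch.
type podAPI interface {
	// LatestResourceVersion lists pods and returns the list's resourceVersion.
	LatestResourceVersion() (string, error)
	// Watch consumes events starting at rv until the watch ends. It returns
	// errExpired when the server reports the version as too old.
	Watch(rv string) error
}

// watchWithRestart re-lists and restarts the watch whenever it expires.
// It stops on a clean shutdown, on any other error, or after maxRestarts
// restarts have all ended in expiry.
func watchWithRestart(api podAPI, maxRestarts int) error {
	for attempt := 0; attempt <= maxRestarts; attempt++ {
		rv, err := api.LatestResourceVersion()
		if err != nil {
			return err
		}
		err = api.Watch(rv)
		if !errors.Is(err, errExpired) {
			return err // nil on clean shutdown, or a non-recoverable error
		}
		// Expired: loop around to fetch a fresh resourceVersion.
	}
	return fmt.Errorf("watch expired %d times: %w", maxRestarts+1, errExpired)
}

// fakeAPI expires the first `expirations` watches, then ends cleanly.
type fakeAPI struct {
	expirations int
	lists       int
	watched     []string
}

func (f *fakeAPI) LatestResourceVersion() (string, error) {
	f.lists++
	return fmt.Sprintf("rv-%d", f.lists), nil
}

func (f *fakeAPI) Watch(rv string) error {
	f.watched = append(f.watched, rv)
	if len(f.watched) <= f.expirations {
		return errExpired
	}
	return nil
}

func main() {
	api := &fakeAPI{expirations: 2}
	err := watchWithRestart(api, 5)
	fmt.Println(err, api.watched)
}
```

The restart cap is a design choice for the sketch: it keeps a permanently failing watch from looping forever, and a real fix might prefer a backoff between restarts.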

@mudler any preference on how I should fix this or should I just come up with the fix and it can be reviewed in a PR?

jbuns added the "Type: Bug" label Mar 15, 2021

jbuns assigned gaktive Mar 15, 2021
