This repository has been archived by the owner on Oct 22, 2021. It is now read-only.

'APP' and 'STG' logs fail to show up #1708

Open
jbuns opened this issue Mar 15, 2021 · 7 comments
Assignees
Labels
Type: Bug Something isn't working

Comments

@jbuns
Collaborator

jbuns commented Mar 15, 2021

Describe the bug
We're currently facing issues with loggregator-bridge. When running cf logs, logs of type APP and STG fail to show up.

To Reproduce
We've seen this fail in two different scenarios.

Scenario 1: we have a long-running deployment of kubecf with eirini and noticed that, after a while, APP and STG logs stop appearing in cf logs. I've traced it down to loggregator-bridge. The pod logs look like:

{"level":"info","ts":1615161473.9439654,"caller":"kubeconfig/getter.go:53","msg":"Using in-cluster kube config"}
{"level":"info","ts":1615161473.9440942,"caller":"kubeconfig/checker.go:36","msg":"Checking kube config"}
Error:  unexpected EOF
Error:  unexpected EOF
Received non-pod object in watcher channel

Scenario 2: after a fresh installation of kubecf+eirini on OpenShift 4.6 (k8s version 1.19), the logs fail to appear in cf logs and the problem is exactly the same as above.

Expected behavior
When running cf logs, I should also be able to see APP and STG logs.

Environment
KubeCF version: 2.7.12
Eirini version: 1.8
Kubernetes: 1.19

Additional context
This was tested on OpenShift 4.4 and 4.6

@jbuns jbuns added the Type: Bug Something isn't working label Mar 15, 2021
@jbuns
Collaborator Author

jbuns commented Mar 19, 2021

Also tested on AKS, and I'm seeing the same problem:

$ k logs loggregator-bridge-59f5cb64bc-9scbb -n kubecf
{"level":"info","ts":1615563276.4147344,"caller":"kubeconfig/getter.go:53","msg":"Using in-cluster kube config"}
{"level":"info","ts":1615563276.414798,"caller":"kubeconfig/checker.go:36","msg":"Checking kube config"}
Received non-pod object in watcher channel
Error:  unexpected EOF

@jandubois
Member

@mudler Any ideas what this might be / where to look next?

@jbuns
Collaborator Author

jbuns commented Mar 26, 2021

I've turned on DEBUG logging for loggregator-bridge and this is the error I'm seeing:

Starting Loggregator
{"level":"info","ts":1615410279.0113866,"caller":"kubeconfig/getter.go:53","msg":"Using in-cluster kube config"}
{"level":"info","ts":1615410279.0114636,"caller":"kubeconfig/checker.go:36","msg":"Checking kube config"}
Received event:  {ERROR &Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:Failure,Message:too old resource version: 43522014 (43524698),Reason:Expired,Details:nil,Code:410,}}
Received non-pod object in watcher channel

In the code, I can see that the failure is happening here:
https://github.com/cloudfoundry-incubator/eirini-loggregator-bridge/blob/master/podwatcher/podwatcher.go#L293-L306

@mudler / @jandubois any suggestions on how we can try to fix this?
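
The debug output above shows an ERROR event whose payload is a Status (code 410, reason Expired) rather than a pod. A minimal sketch of how a watcher could tell these apart before attempting a pod type assertion follows. The types here are simplified stand-ins for the Kubernetes watch event, Pod and Status types, introduced only so the example compiles on its own; this is not the actual podwatcher code.

```go
package main

import "fmt"

// EventType and Event are simplified stand-ins for the Kubernetes watch
// event types. They are assumptions made so this sketch is self-contained.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Error    EventType = "ERROR"
)

// Pod and Status model the two kinds of payload seen in the debug output.
type Pod struct{ Name string }

type Status struct {
	Code    int
	Reason  string
	Message string
}

type Event struct {
	Type   EventType
	Object interface{}
}

// classify reports what a watcher could do with an event. Checking the event
// type first keeps a Status payload from reaching the pod type assertion.
func classify(ev Event) string {
	if ev.Type == Error {
		if st, ok := ev.Object.(*Status); ok && st.Code == 410 {
			return "expired: restart watch from a fresh resource version"
		}
		return "error: report and restart watch"
	}
	if pod, ok := ev.Object.(*Pod); ok {
		return "pod: " + pod.Name
	}
	return "non-pod object"
}

func main() {
	expired := Event{Type: Error, Object: &Status{
		Code:    410,
		Reason:  "Expired",
		Message: "too old resource version: 43522014 (43524698)",
	}}
	fmt.Println(classify(expired))
	fmt.Println(classify(Event{Type: Added, Object: &Pod{Name: "app-0"}}))
}
```

With real client-go types, the equivalent check would likely go through apierrors.FromObject and apierrors.IsResourceExpired or apierrors.IsGone from k8s.io/apimachinery/pkg/api/errors, but that wiring is not shown here.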

@jandubois
Member

@jbuns Sorry, I know nothing about the eirini-loggregator-bridge, and have no time to learn about it.

Let's see if @mudler can give you hints next week; this week has been Hackweek at SUSE, so everyone has been working on other stuff... (FWIW, I spent half a day of my hackweek time yesterday getting Eirini-1.8 to continue working with the latest cf-deployment, so we don't have to drop it (yet) for the kubecf-2.8 releases).

@mudler
Member

mudler commented Mar 30, 2021

It looks like we are receiving old events in the channel - this reminds me of the work done in EiriniX cloudfoundry-incubator/eirinix#38 - is the loggregator-bridge using the latest EiriniX, including that fix? Otherwise, the alternative is to manually specify a ResourceVersion to start the watch from.

From the error message, it looks like the watcher is starting to listen for events that are old and no longer available - while the above PR was meant to fetch the latest ResourceVersion at startup to fix exactly that issue.

@jbuns
Collaborator Author

jbuns commented Mar 30, 2021

@mudler loggregator-bridge is using eirinix v0.3.1
https://github.com/cloudfoundry-incubator/eirini-loggregator-bridge/blob/master/go.mod#L4

so I'm assuming that it includes the fix you mentioned, as cloudfoundry-incubator/eirinix#38 was merged since v0.2.0:
cloudfoundry-incubator/eirinix@v0.2.0...master

Does that mean that the manager in eirinix is the one that's failing? The only difference I can see between the PR above and what's in the code now is this line:
https://github.com/cloudfoundry-incubator/eirinix/blob/master/manager.go#L298

@jbuns
Collaborator Author

jbuns commented Mar 30, 2021

The status message "too old resource version" seems to be expected behaviour according to Kubernetes:
kubernetes/kubernetes#22024

It looks like podwatcher needs to be updated to handle this status and restart the watch, rather than erroring out.

@mudler any preference on how I should fix this or should I just come up with the fix and it can be reviewed in a PR?
