page-xfer error during TestUsernsCheckpoint in runc CI #2551

Open
kolyshkin opened this issue Dec 18, 2024 · 6 comments
@kolyshkin
Contributor

Description

While testing #2545 in runc CI (opencontainers/runc#4559), I saw this error three times in the last 36 hours or so. I don't think I've seen it before.

The error always happens on AlmaLinux-8 in cirrus-ci. Here's a copy-paste of all three occurrences.

  1. From https://cirrus-ci.com/task/6207980225429504
=== RUN   TestUsernsCheckpoint/0
=== RUN   TestUsernsCheckpoint/1
time="2024-12-17T06:57:27Z" level=warning msg="--- Quoting \"/tmp/TestUsernsCheckpoint12863306293/003/criu/dump.log\""
time="2024-12-17T06:57:27Z" level=warning msg="841:(00.111681) page-xfer: Transferring pages:"
time="2024-12-17T06:57:27Z" level=warning msg="842:(00.111682) page-xfer: \tbuf 1/1"
time="2024-12-17T06:57:27Z" level=warning msg="843:(00.111684) page-xfer: \tp 0x7fff903b8000 [1]"
time="2024-12-17T06:57:27Z" level=warning msg="844:(00.111689) page-xfer: \th 0x7fff903b9000 [1]"
time="2024-12-17T06:57:27Z" level=warning msg="845:(00.111691) page-xfer: Checking 0x7fff903b9000/4096 hole"
time="2024-12-17T06:57:27Z" level=warning msg="846:(00.111693) Error (criu/page-xfer.c:299): page-xfer: Missing 7fff903b9000 in parent pagemap"
time="2024-12-17T06:57:27Z" level=warning msg="847:(00.111697) Error (criu/page-xfer.c:342): page-xfer: Hole 0x7fff903b9000/4096 not found in parent"
time="2024-12-17T06:57:27Z" level=warning msg="848:(00.111716) page-pipe: Killing page pipe"
time="2024-12-17T06:57:27Z" level=warning msg="849:(00.111760) ----------------------------------------"
time="2024-12-17T06:57:27Z" level=warning msg="850:(00.111764) Error (criu/mem.c:672): Can't dump page with parasite"
time="2024-12-17T06:57:27Z" level=warning msg=...
time="2024-12-17T06:57:27Z" level=warning msg="860:(00.112043) net: Unlock network"
time="2024-12-17T06:57:27Z" level=warning msg="861:(00.112046) Running network-unlock scripts"
time="2024-12-17T06:57:27Z" level=warning msg="862:(00.112048) \tRPC"
time="2024-12-17T06:57:27Z" level=warning msg="863:(00.133784) Unfreezing tasks into 1"
time="2024-12-17T06:57:27Z" level=warning msg="864:(00.133799) \tUnseizing 97673 into 1"
time="2024-12-17T06:57:27Z" level=warning msg="865:(00.133822) Error (criu/cr-dump.c:2111): Dumping FAILED."
time="2024-12-17T06:57:27Z" level=warning msg=---
    checkpoint_test.go:118: criu failed: type DUMP errno 0
=== RUN   TestUsernsCheckpoint/2
=== RUN   TestUsernsCheckpoint/3
  2. From https://cirrus-ci.com/task/6751216950050816:
=== RUN   TestUsernsCheckpoint
=== RUN   TestUsernsCheckpoint/0
time="2024-12-17T07:10:49Z" level=warning msg="--- Quoting \"/tmp/TestUsernsCheckpoint02954809155/003/criu/dump.log\""
time="2024-12-17T07:10:49Z" level=warning msg="842:(00.186487) page-xfer: Transferring pages:"
time="2024-12-17T07:10:49Z" level=warning msg="843:(00.186489) page-xfer: \tbuf 1/1"
time="2024-12-17T07:10:49Z" level=warning msg="844:(00.186491) page-xfer: \tp 0x7ffde716c000 [1]"
time="2024-12-17T07:10:49Z" level=warning msg="845:(00.186498) page-xfer: \th 0x7ffde716d000 [1]"
time="2024-12-17T07:10:49Z" level=warning msg="846:(00.186499) page-xfer: Checking 0x7ffde716d000/4096 hole"
time="2024-12-17T07:10:49Z" level=warning msg="847:(00.186502) Error (criu/page-xfer.c:299): page-xfer: Missing 7ffde716d000 in parent pagemap"
time="2024-12-17T07:10:49Z" level=warning msg="848:(00.186506) Error (criu/page-xfer.c:342): page-xfer: Hole 0x7ffde716d000/4096 not found in parent"
time="2024-12-17T07:10:49Z" level=warning msg="849:(00.186529) page-pipe: Killing page pipe"
time="2024-12-17T07:10:49Z" level=warning msg="850:(00.186561) ----------------------------------------"
time="2024-12-17T07:10:49Z" level=warning msg="851:(00.186563) Error (criu/mem.c:672): Can't dump page with parasite"
time="2024-12-17T07:10:49Z" level=warning msg=...
time="2024-12-17T07:10:49Z" level=warning msg="861:(00.186977) net: Unlock network"
time="2024-12-17T07:10:49Z" level=warning msg="862:(00.186981) Running network-unlock scripts"
time="2024-12-17T07:10:49Z" level=warning msg="863:(00.186983) \tRPC"
time="2024-12-17T07:10:49Z" level=warning msg="864:(00.204552) Unfreezing tasks into 1"
time="2024-12-17T07:10:49Z" level=warning msg="865:(00.204578) \tUnseizing 95994 into 1"
time="2024-12-17T07:10:49Z" level=warning msg="866:(00.204602) Error (criu/cr-dump.c:2111): Dumping FAILED."
time="2024-12-17T07:10:49Z" level=warning msg=---
    checkpoint_test.go:118: criu failed: type DUMP errno 0
=== RUN   TestUsernsCheckpoint/1
=== RUN   TestUsernsCheckpoint/2
  3. From https://cirrus-ci.com/task/5627926906929152?logs=unit_tests_1#L20
=== RUN   TestUsernsCheckpoint
time="2024-12-18T04:09:25Z" level=warning msg="--- Quoting \"/tmp/TestUsernsCheckpoint1601804805/003/criu/dump.log\""
time="2024-12-18T04:09:25Z" level=warning msg="843:(00.143747) page-xfer: Transferring pages:"
time="2024-12-18T04:09:25Z" level=warning msg="844:(00.143748) page-xfer: \tbuf 1/1"
time="2024-12-18T04:09:25Z" level=warning msg="845:(00.143750) page-xfer: \tp 0x7ffcd5eec000 [1]"
time="2024-12-18T04:09:25Z" level=warning msg="846:(00.143756) page-xfer: \th 0x7ffcd5eed000 [1]"
time="2024-12-18T04:09:25Z" level=warning msg="847:(00.143758) page-xfer: Checking 0x7ffcd5eed000/4096 hole"
time="2024-12-18T04:09:25Z" level=warning msg="848:(00.143761) Error (criu/page-xfer.c:299): page-xfer: Missing 7ffcd5eed000 in parent pagemap"
time="2024-12-18T04:09:25Z" level=warning msg="849:(00.143764) Error (criu/page-xfer.c:342): page-xfer: Hole 0x7ffcd5eed000/4096 not found in parent"
time="2024-12-18T04:09:25Z" level=warning msg="850:(00.143793) page-pipe: Killing page pipe"
time="2024-12-18T04:09:25Z" level=warning msg="851:(00.143820) ----------------------------------------"
time="2024-12-18T04:09:25Z" level=warning msg="852:(00.143822) Error (criu/mem.c:672): Can't dump page with parasite"
time="2024-12-18T04:09:25Z" level=warning msg=...
time="2024-12-18T04:09:25Z" level=warning msg="862:(00.144124) net: Unlock network"
time="2024-12-18T04:09:25Z" level=warning msg="863:(00.144129) Running network-unlock scripts"
time="2024-12-18T04:09:25Z" level=warning msg="864:(00.144131) \tRPC"
time="2024-12-18T04:09:25Z" level=warning msg="865:(00.155348) Unfreezing tasks into 1"
time="2024-12-18T04:09:25Z" level=warning msg="866:(00.155361) \tUnseizing 96793 into 1"
time="2024-12-18T04:09:25Z" level=warning msg="867:(00.155389) Error (criu/cr-dump.c:2111): Dumping FAILED."
time="2024-12-18T04:09:25Z" level=warning msg=---
    checkpoint_test.go:113: criu failed: type DUMP errno 0
--- FAIL: TestUsernsCheckpoint (0.60s)
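
In all three logs the missing hole (the `h` entry) is the single page right after the page being transferred (the `p` entry). To compare failures across more CI runs, the reported addresses can be pulled out of a quoted dump.log. The following is only a minimal sketch: the helper name `missingHoles` is hypothetical, and the regular expression assumes the exact wording of the "Missing ... in parent pagemap" message quoted above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// missingRe matches the CRIU error text quoted in the logs above, e.g.
// "page-xfer: Missing 7fff903b9000 in parent pagemap".
// The pattern is an assumption based only on those quoted lines.
var missingRe = regexp.MustCompile(`page-xfer: Missing ([0-9a-f]+) in parent pagemap`)

// missingHoles (hypothetical helper) returns every address that the log
// reports as missing from the parent pagemap, in order of appearance.
func missingHoles(logText string) ([]uint64, error) {
	var addrs []uint64
	for _, m := range missingRe.FindAllStringSubmatch(logText, -1) {
		addr, err := strconv.ParseUint(m[1], 16, 64)
		if err != nil {
			return nil, err
		}
		addrs = append(addrs, addr)
	}
	return addrs, nil
}

func main() {
	// The three error lines from the CI runs quoted above.
	logText := `Error (criu/page-xfer.c:299): page-xfer: Missing 7fff903b9000 in parent pagemap
Error (criu/page-xfer.c:299): page-xfer: Missing 7ffde716d000 in parent pagemap
Error (criu/page-xfer.c:299): page-xfer: Missing 7ffcd5eed000 in parent pagemap`

	addrs, err := missingHoles(logText)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, a := range addrs {
		// Report the address and whether it sits on a 4096-byte boundary.
		fmt.Printf("%#x aligned=%v\n", a, a%4096 == 0)
	}
}
```

Feeding it the full dump.log instead of the excerpt would need no changes, since lines that do not match the pattern are ignored.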

Steps to reproduce the issue:

I guess you can download the runc sources and run either make localunittest or go test -v -exec sudo -run Checkpoint ./libcontainer/integration.

Describe the results you received:

See above.

Describe the results you expected:

No errors.

Additional information you deem important (e.g. issue happens only occasionally):

CRIU logs and information:

I don't have full logs, but let me know if it would help for me to amend CI to produce them.

CRIU version is from #2545

Additional environment details:

Cirrus-CI, almalinux-8 (see .cirrus.yml in opencontainers/runc#4559)

@kolyshkin
Contributor Author

One more; this time it's a non-userns test. https://cirrus-ci.com/task/6077670582124544?logs=unit_tests_1#L750

@rst0git
Member

rst0git commented Dec 18, 2024

Error (criu/page-xfer.c:299): page-xfer: Missing 7fff903b9000 in parent pagemap

@kolyshkin Would it be possible to confirm that the parent images have not been modified by another test?

@kolyshkin
Contributor Author

Error (criu/page-xfer.c:299): page-xfer: Missing 7fff903b9000 in parent pagemap

@kolyshkin Would it be possible to confirm that the parent images have not been modified by another test?

The source code for the test is here: https://github.com/opencontainers/runc/blob/main/libcontainer/integration/checkpoint_test.go

It does not use t.Parallel(), meaning the tests run one by one. Even if they did run in parallel, each test case uses its own temp dir.

If you have any ideas of how to confirm this in any other way, let me know.

@kolyshkin
Contributor Author

Note that:

  • I can only reproduce this on AlmaLinux 8 in the Cirrus-CI environment
  • I can't repro this in Cirrus CI using AlmaLinux 9 (newer kernel, etc.) in an otherwise similar setup
  • I can't repro this using GHA CI (Ubuntu 20.04)
  • I can't repro this locally using AlmaLinux 8 in a vagrant vm
