Guest shuts-off with call trace after migrating between Boston and ZZ systems #19
Comments
------- Comment From [email protected] 2017-10-23 04:31:48 EDT-------
Guest is shutting down due to the tty driver crashing with an ISI exception.
------- Comment From [email protected] 2017-10-23 05:05:11 EDT-------
> Guest is shutting down due to the tty driver crashing with an ISI exception.

Hi Leonardo,
Guest is crashing with an ISI exception in the tty driver soon after migration. It doesn't [...]
------- Comment From [email protected] 2017-12-06 15:10:36 EDT-------
I tried migrating with two different guests: Ubuntu 17.04 with kernel 4.10.0-28-ppc64le and Fedora 26 with kernel 4.13.16-202.fc26.ppc64le. With both guests, migration in both directions (ZZ -> Boston and Boston -> ZZ) worked without problems. I see that the bug was opened against QEMU 2.10.0-2. It is likely that whatever caused the bug has already been fixed upstream.
Direct leak of 160 byte(s) in 4 object(s) allocated from:
    #0 0x55ed7678cda8 in calloc (/home/elmarco/src/qq/build/x86_64-softmmu/qemu-system-x86_64+0x797da8)
    #1 0x7f3f5e725f75 in g_malloc0 /home/elmarco/src/gnome/glib/builddir/../glib/gmem.c:124
    #2 0x55ed778aa3a7 in query_option_descs /home/elmarco/src/qq/util/qemu-config.c:60:16
    #3 0x55ed778aa307 in get_drive_infolist /home/elmarco/src/qq/util/qemu-config.c:140:19
    #4 0x55ed778a9f40 in qmp_query_command_line_options /home/elmarco/src/qq/util/qemu-config.c:254:36
    #5 0x55ed76d4868c in qmp_marshal_query_command_line_options /home/elmarco/src/qq/build/qmp-marshal.c:3078:14
    #6 0x55ed77855dd5 in do_qmp_dispatch /home/elmarco/src/qq/qapi/qmp-dispatch.c:104:5
    #7 0x55ed778558cc in qmp_dispatch /home/elmarco/src/qq/qapi/qmp-dispatch.c:131:11
    #8 0x55ed768b592f in handle_qmp_command /home/elmarco/src/qq/monitor.c:3840:11
    #9 0x55ed7786ccfe in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:105:5
    #10 0x55ed778fe37c in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:323:13
    #11 0x55ed778fdde6 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:373:15
    #12 0x55ed7786cd83 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:124:12
    #13 0x55ed768b559e in monitor_qmp_read /home/elmarco/src/qq/monitor.c:3882:5
    #14 0x55ed77714f29 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:167:9
    #15 0x55ed77714fde in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:179:9
    #16 0x55ed7772ffad in tcp_chr_read /home/elmarco/src/qq/chardev/char-socket.c:440:13
    #17 0x55ed7777113b in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84:12
    #18 0x7f3f5e71d90b in g_main_dispatch /home/elmarco/src/gnome/glib/builddir/../glib/gmain.c:3182
    #19 0x7f3f5e71e7ac in g_main_context_dispatch /home/elmarco/src/gnome/glib/builddir/../glib/gmain.c:3847
    #20 0x55ed77886ffc in glib_pollfds_poll /home/elmarco/src/qq/util/main-loop.c:214:9
    #21 0x55ed778865fd in os_host_main_loop_wait /home/elmarco/src/qq/util/main-loop.c:261:5
    #22 0x55ed77886222 in main_loop_wait /home/elmarco/src/qq/util/main-loop.c:515:11
    #23 0x55ed76d2a4df in main_loop /home/elmarco/src/qq/vl.c:1995:9
    #24 0x55ed76d1cb4a in main /home/elmarco/src/qq/vl.c:4914:5
    #25 0x7f3f555f6039 in __libc_start_main (/lib64/libc.so.6+0x21039)

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
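For context, the leak ASAN flags above follows a common pattern: a QMP query handler builds a heap-allocated list of option descriptors for a one-shot reply, and no caller ever frees it. A minimal sketch of the pattern and its fix, with hypothetical names (`make_option_descs`/`free_option_descs` are illustrative, not the actual qemu-config.c code):

```c
/* Illustrative sketch of the leak class reported above; the names
 * make_option_descs()/free_option_descs() are hypothetical, not the
 * actual qemu-config.c code. */
#include <glib.h>

typedef struct OptionDesc {
    char *name;
    struct OptionDesc *next;
} OptionDesc;

static OptionDesc *make_option_descs(const char *const *names, int n)
{
    OptionDesc *head = NULL;
    for (int i = n - 1; i >= 0; i--) {
        OptionDesc *d = g_malloc0(sizeof(*d));  /* frame #1 above: g_malloc0 */
        d->name = g_strdup(names[i]);
        d->next = head;
        head = d;
    }
    return head;  /* leaked if the query handler never frees it */
}

/* The fix for this class of bug: pair every list builder with a
 * destructor and call it on every exit path of the one-shot query. */
static void free_option_descs(OptionDesc *d)
{
    while (d) {
        OptionDesc *next = d->next;
        g_free(d->name);
        g_free(d);
        d = next;
    }
}
```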
Spotted thanks to ASAN:

==25226==ERROR: AddressSanitizer: global-buffer-overflow on address 0x556715a1f120 at pc 0x556714b6f6b1 bp 0x7ffcdfac1360 sp 0x7ffcdfac1350
READ of size 1 at 0x556715a1f120 thread T0
    #0 0x556714b6f6b0 in init_disasm /home/elmarco/src/qemu/disas/s390.c:219
    #1 0x556714b6fa6a in print_insn_s390 /home/elmarco/src/qemu/disas/s390.c:294
    #2 0x55671484d031 in monitor_disas /home/elmarco/src/qemu/disas.c:635
    #3 0x556714862ec0 in memory_dump /home/elmarco/src/qemu/monitor.c:1324
    #4 0x55671486342a in hmp_memory_dump /home/elmarco/src/qemu/monitor.c:1418
    #5 0x5567148670be in handle_hmp_command /home/elmarco/src/qemu/monitor.c:3109
    #6 0x5567148674ed in qmp_human_monitor_command /home/elmarco/src/qemu/monitor.c:613
    #7 0x556714b00918 in qmp_marshal_human_monitor_command /home/elmarco/src/qemu/build/qmp-marshal.c:1704
    #8 0x556715138a3e in do_qmp_dispatch /home/elmarco/src/qemu/qapi/qmp-dispatch.c:104
    #9 0x556715138f83 in qmp_dispatch /home/elmarco/src/qemu/qapi/qmp-dispatch.c:131
    #10 0x55671485cf88 in handle_qmp_command /home/elmarco/src/qemu/monitor.c:3839
    #11 0x55671514e80b in json_message_process_token /home/elmarco/src/qemu/qobject/json-streamer.c:105
    #12 0x5567151bf2dc in json_lexer_feed_char /home/elmarco/src/qemu/qobject/json-lexer.c:323
    #13 0x5567151bf827 in json_lexer_feed /home/elmarco/src/qemu/qobject/json-lexer.c:373
    #14 0x55671514ee62 in json_message_parser_feed /home/elmarco/src/qemu/qobject/json-streamer.c:124
    #15 0x556714854b1f in monitor_qmp_read /home/elmarco/src/qemu/monitor.c:3881
    #16 0x556715045440 in qemu_chr_be_write_impl /home/elmarco/src/qemu/chardev/char.c:172
    #17 0x556715047184 in qemu_chr_be_write /home/elmarco/src/qemu/chardev/char.c:184
    #18 0x55671505a8e6 in tcp_chr_read /home/elmarco/src/qemu/chardev/char-socket.c:440
    #19 0x5567150943c3 in qio_channel_fd_source_dispatch /home/elmarco/src/qemu/io/channel-watch.c:84
    #20 0x7fb90292b90b in g_main_dispatch ../glib/gmain.c:3182
    #21 0x7fb90292c7ac in g_main_context_dispatch ../glib/gmain.c:3847
    #22 0x556715162eca in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214
    #23 0x556715163001 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261
    #24 0x5567151631fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515
    #25 0x556714ad6d3b in main_loop /home/elmarco/src/qemu/vl.c:1950
    #26 0x556714ade329 in main /home/elmarco/src/qemu/vl.c:4865
    #27 0x7fb8fe5c9009 in __libc_start_main (/lib64/libc.so.6+0x21009)
    #28 0x5567147af4d9 in _start (/home/elmarco/src/qemu/build/s390x-softmmu/qemu-system-s390x+0xf674d9)

0x556715a1f120 is located 32 bytes to the left of global variable 'char_hci_type_info' defined in '/home/elmarco/src/qemu/hw/bt/hci-csr.c:493:23' (0x556715a1f140) of size 104
0x556715a1f120 is located 8 bytes to the right of global variable 's390_opcodes' defined in '/home/elmarco/src/qemu/disas/s390.c:860:33' (0x556715a15280) of size 40600

This fix is based on the upstream commit by Andreas Arnez <[email protected]>:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=9ace48f3d7d80ce09c5df60cccb433470410b11b

2014-08-19  Andreas Arnez  <[email protected]>

	* s390-dis.c (init_disasm): Simplify initialization of opc_index[].
	This also fixes an access after the last element of s390_opcodes[].

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
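The ChangeLog entry is terse, but the bug class is visible in the report: init_disasm built opc_index[] by examining each table entry's neighbor, which reads one element past the end of s390_opcodes[] on the final iteration (hence ASAN flagging an address 8 bytes to the right of the table). The upstream fix fills the index in reverse instead, so no out-of-range access is needed. A schematic reconstruction with simplified types (not the verbatim s390-dis.c code):

```c
#include <string.h>

#define NUM_OPCODES 2856   /* stand-in for the real table size */

struct s390_opcode { unsigned char opcode[6]; /* mask, name, ... */ };
static struct s390_opcode s390_opcodes[NUM_OPCODES];
static int opc_index[256];

static void init_disasm_sketch(void)
{
    int i;

    memset(opc_index, 0, sizeof(opc_index));
    /* Fill the index backwards: for entries sharing a first byte, the
     * earliest one writes last and wins, so each slot ends up pointing
     * at the FIRST matching opcode -- and no iteration ever touches
     * s390_opcodes[NUM_OPCODES], the access ASAN caught above. */
    for (i = NUM_OPCODES; i--; ) {
        opc_index[s390_opcodes[i].opcode[0]] = i;
    }
}
```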
------- Comment From [email protected] 2018-02-14 11:01:16 EDT-------
https://bugzilla.linux.ibm.com/show_bug.cgi?id=163870
This is the patch that fixes that bug. It's not upstream yet, but it is already in the 'fixes' branch of the ppc kernel maintainer. As soon as it is upstream, we can get a patched host OS kernel to verify whether it fixes the bug we're seeing here too.
Spotted thanks to ASAN:

QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 tests/migration-test -p /x86_64/migration/bad_dest

==30302==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f60efba1a38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38)
    #1 0x7f60eef3cf75 in g_malloc0 ../glib/gmem.c:124
    #2 0x55ca9094702c in error_copy /home/elmarco/src/qemu/util/error.c:203
    #3 0x55ca9037a30f in migrate_set_error /home/elmarco/src/qemu/migration/migration.c:1139
    #4 0x55ca9037a462 in migrate_fd_error /home/elmarco/src/qemu/migration/migration.c:1150
    #5 0x55ca9038162b in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:2411
    #6 0x55ca90386e41 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:81
    #7 0x55ca9038335e in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:85
    #8 0x55ca9083dd3a in qio_task_complete /home/elmarco/src/qemu/io/task.c:142
    #9 0x55ca9083d6cc in gio_task_thread_result /home/elmarco/src/qemu/io/task.c:88
    #10 0x7f60eef37317 in g_idle_dispatch ../glib/gmain.c:5552
    #11 0x7f60eef3490b in g_main_dispatch ../glib/gmain.c:3182
    #12 0x7f60eef357ac in g_main_context_dispatch ../glib/gmain.c:3847
    #13 0x55ca90927231 in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214
    #14 0x55ca90927420 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261
    #15 0x55ca909275fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515
    #16 0x55ca8fc1c2a4 in main_loop /home/elmarco/src/qemu/vl.c:1942
    #17 0x55ca8fc2eb3a in main /home/elmarco/src/qemu/vl.c:4724
    #18 0x7f60e4082009 in __libc_start_main (/lib64/libc.so.6+0x21009)

Indirect leak of 45 byte(s) in 1 object(s) allocated from:
    #0 0x7f60efba1850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x7f60eef3cf0c in g_malloc ../glib/gmem.c:94
    #2 0x7f60eef3d1cf in g_malloc_n ../glib/gmem.c:331
    #3 0x7f60eef596eb in g_strdup ../glib/gstrfuncs.c:363
    #4 0x55ca90947085 in error_copy /home/elmarco/src/qemu/util/error.c:204
    #5 0x55ca9037a30f in migrate_set_error /home/elmarco/src/qemu/migration/migration.c:1139
    #6 0x55ca9037a462 in migrate_fd_error /home/elmarco/src/qemu/migration/migration.c:1150
    #7 0x55ca9038162b in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:2411
    #8 0x55ca90386e41 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:81
    #9 0x55ca9038335e in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:85
    #10 0x55ca9083dd3a in qio_task_complete /home/elmarco/src/qemu/io/task.c:142
    #11 0x55ca9083d6cc in gio_task_thread_result /home/elmarco/src/qemu/io/task.c:88
    #12 0x7f60eef37317 in g_idle_dispatch ../glib/gmain.c:5552
    #13 0x7f60eef3490b in g_main_dispatch ../glib/gmain.c:3182
    #14 0x7f60eef357ac in g_main_context_dispatch ../glib/gmain.c:3847
    #15 0x55ca90927231 in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214
    #16 0x55ca90927420 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261
    #17 0x55ca909275fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515
    #18 0x55ca8fc1c2a4 in main_loop /home/elmarco/src/qemu/vl.c:1942
    #19 0x55ca8fc2eb3a in main /home/elmarco/src/qemu/vl.c:4724
    #20 0x7f60e4082009 in __libc_start_main (/lib64/libc.so.6+0x21009)

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
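Both leaks bottom out in error_copy(): migrate_set_error() stores its own copy of the caller's Error in the migration state, so that stored copy has an independent lifetime and must be released exactly once. A hedged sketch of the ownership rule using QEMU's real Error API (error_copy()/error_free()); the MigrationState handling shown is illustrative, not the exact upstream fix:

```c
#include "qapi/error.h"   /* QEMU's Error API: error_copy(), error_free() */

typedef struct MigrationStateSketch {
    Error *error;   /* owned copy, stored by migrate_set_error() */
    /* ... */
} MigrationStateSketch;

/* Stores a COPY of the caller's error (frames #2-#3 above), so the
 * caller keeps ownership of the original... */
static void migrate_set_error_sketch(MigrationStateSketch *s,
                                     const Error *error)
{
    if (!s->error) {
        s->error = error_copy(error);
    }
}

/* ...and the state owner must free the stored copy exactly once,
 * e.g. during teardown -- the step whose absence ASAN reported. */
static void migrate_state_cleanup_sketch(MigrationStateSketch *s)
{
    error_free(s->error);   /* error_free() accepts NULL */
    s->error = NULL;
}
```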
A few days ago, Eric Auger reported that OOB broke ARM when running with libvirt:

http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html

The problem was that the monitor dispatcher bottom half was now bound to qemu_aio_context, which could be polled unexpectedly in block code. We should keep the dispatchers running in iohandler_ctx, just as we did before the Out-Of-Band series (chardev uses qio, and qio binds everything with iohandler_ctx).

Without this change, the QMP dispatcher might run even before reaching the main loop, in the block I/O path, for example in a stack like this (the ARM case, where the "cont" command handler runs during the machine init phase):

    #0  qmp_cont ()
    #1  0x00000000006bd210 in qmp_marshal_cont ()
    #2  0x0000000000ac05c4 in do_qmp_dispatch ()
    #3  0x0000000000ac07a0 in qmp_dispatch ()
    #4  0x0000000000472d60 in monitor_qmp_dispatch_one ()
    #5  0x000000000047302c in monitor_qmp_bh_dispatcher ()
    #6  0x0000000000acf374 in aio_bh_call ()
    #7  0x0000000000acf428 in aio_bh_poll ()
    #8  0x0000000000ad5110 in aio_poll ()
    #9  0x0000000000a08ab8 in blk_prw ()
    #10 0x0000000000a091c4 in blk_pread ()
    #11 0x0000000000734f94 in pflash_cfi01_realize ()
    #12 0x000000000075a3a4 in device_set_realized ()
    #13 0x00000000009a26cc in property_set_bool ()
    #14 0x00000000009a0a40 in object_property_set ()
    #15 0x00000000009a3a08 in object_property_set_qobject ()
    #16 0x00000000009a0c8c in object_property_set_bool ()
    #17 0x0000000000758f94 in qdev_init_nofail ()
    #18 0x000000000058e190 in create_one_flash ()
    #19 0x000000000058e2f4 in create_flash ()
    #20 0x00000000005902f0 in machvirt_init ()
    #21 0x00000000007635cc in machine_run_board_init ()
    #22 0x00000000006b135c in main ()

Actually, the problem is more severe than that. After the switch to the QEMU AIO handler, the monitor dispatcher code can even be called from a nested aio_poll() -- an explicit aio_poll() inside another main-loop aio_poll() -- which could be racy too, breaking code like TPM and 9p that use nested event loops.

Switch to using iohandler_ctx for the monitor dispatchers.

My sincere thanks to Eric Auger, who offered great help during both debugging and verifying the problem. The ARM test was carried out by applying this patch on top of QEMU 2.12.0-rc0, and the problem is gone after the patch. A quick test of mine shows that with this patch applied we can pass all raw iotests even with OOB on by default.

CC: Eric Blake <[email protected]>
CC: Markus Armbruster <[email protected]>
CC: Stefan Hajnoczi <[email protected]>
CC: Fam Zheng <[email protected]>
Reported-by: Eric Auger <[email protected]>
Tested-by: Eric Auger <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Eric Blake <[email protected]>
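The change this commit message describes boils down to one call site: the QMP dispatcher bottom half must be created against iohandler_ctx instead of qemu_aio_context, because iohandler_ctx is only polled from the real main loop and never from nested aio_poll() calls in block code. A sketch of that change using QEMU's real helpers (aio_bh_new(), iohandler_get_aio_context(), qemu_get_aio_context()); the surrounding monitor setup function is abbreviated and its name is illustrative:

```c
#include "qemu/main-loop.h"   /* iohandler_get_aio_context() */
#include "block/aio.h"        /* aio_bh_new(), QEMUBH */

static void monitor_qmp_bh_dispatcher(void *opaque);

static QEMUBH *qmp_dispatcher_bh;

static void monitor_qmp_setup_dispatcher_sketch(void)
{
    /* Before (problematic): bound to the main AioContext, so any
     * nested aio_poll() -- e.g. blk_pread() while realizing pflash
     * during machine init, as in the backtrace above -- could run
     * QMP commands long before the main loop starts:
     *
     *     qmp_dispatcher_bh = aio_bh_new(qemu_get_aio_context(),
     *                                    monitor_qmp_bh_dispatcher,
     *                                    NULL);
     *
     * After: bound to iohandler_ctx, which is only dispatched from
     * the real main loop, restoring the pre-OOB behaviour. */
    qmp_dispatcher_bh = aio_bh_new(iohandler_get_aio_context(),
                                   monitor_qmp_bh_dispatcher,
                                   NULL);
}
```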
------- Comment From [email protected] 2018-07-31 10:47:03 EDT-------
------- Comment From [email protected] 2018-08-31 13:20:06 EDT-------
Mirrored with LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=160010
Description:
The guest goes to the shut-off state after migrating between Boston and ZZ systems. The migration itself succeeds without any issues, but after migration the guest hits call traces on the destination and shuts off.
Call Trace:
Test Environment: