Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP32 board support? #1

Open
dgtlrift opened this issue Oct 5, 2016 · 23 comments
Open

ESP32 board support? #1

dgtlrift opened this issue Oct 5, 2016 · 23 comments

Comments

@dgtlrift
Copy link

dgtlrift commented Oct 5, 2016

I see there's ESP8266 support, any chance of the new ESP32 support being added?

@jcmvbkbc
Copy link
Member

jcmvbkbc commented Oct 6, 2016

You might have noticed that ESP8266 support is very rudimentary.

Espressif haven't provided any feedback to my request for the hardware specifications of ESP32, I haven't been following ESP32 development/reverse engineering closely and I don't have the hardware ATM.
For the ESP32 support to appear something of the following need to happen: either somebody contributes the implementation (ideal, but doesn't sound likely), or somebody publishes hardware specifications (at least the subset with the memory layout and core peripherals description/register maps)/ROM dumps. In addition somebody willing to test that QEMU model behavior is somewhat close to the real hardware behavior is needed.

@Ebiroll
Copy link

Ebiroll commented Nov 7, 2016

I got my ESP32 board from adafruit. I have managed to dump the ROM and use it to start an elf file compiled with esp-idf in my patched version of QEMU. You can see the results here, https://github.com/Ebiroll/qemu_esp32 . However, I was wondering, what is the difference of this repository and the xtensa support in QEMU? (https://github.com/qemu/qemu) Thank you.

@jcmvbkbc
Copy link
Member

jcmvbkbc commented Nov 7, 2016

Hi @Ebiroll
the master branch of this tree follows (with some noticeable lag) the master branch of the QEMU mainline.
Other branches are experimental (e.g. windowed*, xtensa-perf or xtensa-esp8266), work in progress (e.g. xtensa-ccount, xtensa-tcg, xtensa-user), not currently upstreamable for some reason, but having somewhat useful functionality (e.g. xtensa-nommu, xtensa-smp*) or some kind of helpers (e.g. xtensa-cores, xtensa-fp-test, xtensa-staging*, xtensa-trace, xtensa-xtfpga).

@Ebiroll
Copy link

Ebiroll commented Nov 7, 2016

Thanks a lot. I guess the xtensa-esp8266 branch would be most helpful for esp32 emulation work.
I had to make some rom patches to make it run, but proper hardware emulation would be much better.

Running gdb with qemu on assembly level works quite well, however when stepping through the c-code I get this assertion in gdb: /home/ivan/e/crosstool-NG/.build/src/gdb-7.10/gdb/regcache.c:697: internal-error: regcache_raw_read_unsigned: Assertion `regnum >= 0 && regnum < regcache->descr->nr_raw_registers' failed.
This happens sometimes at return from a function. Maybe it has something to do with a new task being created.

Any chance you have an idea of whats the cause of this problem? Has it something to do with the windowed call mechanism or interrupts screwing up the registers? Its really annoying and discourages any further work on this. If not stepping in but rather just setting a breakpoint or running (next) it runs well.

Here is your toolchain that espresstif are using.
https://github.com/espressif/crosstool-NG
https://github.com/espressif/esp-idf/blob/master/docs/linux-setup.rst

Thanks for your great work anyways.
/Olof

@jcmvbkbc
Copy link
Member

jcmvbkbc commented Nov 7, 2016

I guess the xtensa-esp8266 branch would be most helpful for esp32 emulation work.

Sure, feel free to reuse it if it helps.

Running gdb with qemu on assembly level works quite well, however when stepping through the c-code I get this assertion in gdb: /home/ivan/e/crosstool-NG/.build/src/gdb-7.10/gdb/regcache.c:697: internal-error: regcache_raw_read_unsigned: Assertion `regnum >= 0 && regnum < regcache->descr->nr_raw_registers' failed.
This happens sometimes at return from a function. Maybe it has something to do with a new task being created.
Any chance you have an idea of whats the cause of this problem?

I believe it's gdb issue. Have seen it multiple times in different contexts with different xtensa cores in recent gdb versions, I'll likely have a chance to look deeper into it this week.

@Ebiroll
Copy link

Ebiroll commented Nov 8, 2016

Have seen it multiple times in different contexts with different xtensa cores in recent gdb versions,
I looked closer at the assert.
Assertion `regnum >= 0 && regnum < regcache->descr->nr_raw_registers' failed.
I added some printouts to gdb and it seems that regnum is 116 and nr_raw_registers is 105. My guess is that 116 is the EPC1 register and that one is not included in the 'raw' registers.

If you look for register 116 you find it here
From gdb xtensa-config.c
XTREG(116,464,32, 4, 4,0x02b1,0x0007,-2, 2,0x1000,epc1, 0,0,0,0,0,0)

The only location that it is accessed by gdb is here,
In xtensa-tdep.c
static void
xtensa_window_interrupt_frame_cache (struct frame_info this_frame,
xtensa_frame_cache_t *cache,
CORE_ADDR pc)
{
...
/
Read PC of interrupted function from EPC1 register. */
epc1_regnum = xtensa_find_register_by_name (gdbarch,"epc1");
if (epc1_regnum < 0)
error(_("Unable to read Xtensa register EPC1"));

cache->ra = xtensa_read_register (epc1_regnum);
cache->pc = get_frame_func (this_frame);
}

I also did a analysis of the gdb-protocol that was sent, you can see it here,
https://github.com/Ebiroll/qemu_esp32/blob/master/gdbanalysis.md

When patching the xtensa_window_interrupt_frame_cache() functions it seems like qemu gets into to the interrupt function , _WindowUnderflow8, however I am not sure what to put in cache->ra.

It could be an error in the file core-esp32/gdb-config.c but when I add more than 105 registers then gdb complaints about Remote 'g' packet reply is too long....

@jcmvbkbc
Copy link
Member

@Ebiroll looks like there are two issues here: first is that gdb overlay in the espressif crosstool-NG is not patched to mark all registers as unprivileged ( & ~1 in the 7th column of XTREG list in gdb/xtensa-config.c), second is that you've specified incorrect num_regs in https://github.com/Ebiroll/qemu_esp32/blob/master/qemu-patch/target-xtensa/core-esp32.c . It should work correctly with gdb corrected as said above if you just drop this line.

I've pushed a branch xtensa-esp32 with my version of the esp32 core to this repository, please take a look.
The combination of these two fixes works fine in my limited testing. I'll look at it some more.

@Ebiroll
Copy link

Ebiroll commented Nov 11, 2016

gdb overlay in the espressif crosstool-NG is not patched to mark all registers as unprivileged

Thanks, the gdb patch worked! That really improved the debugging experience when debugging with qemu. I also tested your xtensa-esp32 branch but had to comment out //gen_exception_cause(dc, ILLEGAL_INSTRUCTION_CAUSE); in the function gen_check_sr(), translate.c,
(register 97) SR 97 is not implemented
Other missing instructions,
WUR 234 not implemented, TBD(pc = 400971db)
WUR 235 not implemented, TBD(pc = 400971e0)
WUR 236 not implemented, TBD(pc = 400971e5)
But they do not generate exceptions. I guess these instructions are related to synchronisation between the cores.

static bool gen_check_sr(DisasContext *dc, uint32_t sr, unsigned access)
{
if (!xtensa_option_bits_enabled(dc->config, sregnames[sr].opt_bits)) {
if (sregnames[sr].name) {
qemu_log_mask(LOG_GUEST_ERROR, "SR %s is not configured\n", sregnames[sr].name);
} else {
qemu_log_mask(LOG_UNIMP, "SR %d %s is not implemented\n", sr);
}
//gen_exception_cause(dc, ILLEGAL_INSTRUCTION_CAUSE);

Keep the good work up and have a nice weekend!

@jcmvbkbc
Copy link
Member

(register 97) SR 97 is not implemented

Oh, right, MEMCTL. I'll implement it.

WUR 234 not implemented, TBD(pc = 400971db)
WUR 235 not implemented, TBD(pc = 400971e0)
WUR 236 not implemented, TBD(pc = 400971e5)

These are the registers of the double precision floating point accelerator. I'll look at implementing it too.

BTW, this core (esp32) uses new hardware floating point, slightly different from what's in the QEMU right now, and with additional instructions for division and square root. It's still in WIP state at the xtensa-new-ops branch.

@Ebiroll
Copy link

Ebiroll commented Nov 14, 2016

Thanks for MEMCTL. It works fine.
Espressif have published a reference manual, updated with MMU chapter on the 9/11, 2016
http://espressif.com/sites/default/files/documentation/esp32_technical_reference_manual_en.pdf
Do you have the hardware now? I can make a pull request with ROM0 & ROM1 if you want.
Otherwise making a romdump is easy with esptool.py, esp-idf/components/esptool_py/esptool/esptool.py --chip esp32 -b 921600 -p /dev/ttyUSB0 dump_mem 0x40000000 0x000C2000 rom0.bin

Also I have a question about the mapfile generated in the build directory by esp-idf. Is there anyway to use this information directly in gdb? The purpose of this is to get the name of the rom function in gdb when running in qemu. From the mapfile we know that uartAttach is here 0x40008fd0.
Rather than this,
(gdb) where
0x40008fd0 in ?? ()
0x40080990 in call_start_cpu0
I would like to see this,
0x40008fd0 in uartAttach
0x40080990 in call_start_cpu0
I tried adding -defsym=uartAttach=0x40008fd0 when linking but not with any success.
Try using, make VERBOSE=1 , with esp-idf to see how its compiled and linked. It would help a lot when trying to understand the code in the romdump.

@jcmvbkbc
Copy link
Member

Espressif have published a reference manual, updated with MMU chapter on the 9/11, 2016

Thanks, I'll take a look.

Do you have the hardware now?

No, not yet. Looks like I won't have time to play with it for the following several weeks anyway.

I can make a pull request with ROM0 & ROM1 if you want.

Ok, I can take a look.

Also I have a question about the mapfile generated in the build directory by esp-idf. Is there anyway to use this information directly in gdb? The purpose of this is to get the name of the rom function in gdb when running in qemu.

Not that I know of. OTOH I did this for ESP8266 for the same purpose. The resulting elf may be loaded into gdb with 'add-symbol-file' command.

@Ebiroll
Copy link

Ebiroll commented Nov 14, 2016

Thanks, almost works for me, not sure how to modify this though

rom-functions.s: esp32.rom.ld
sed -n 's/PROVIDE[[:space:]]([[:space:]]([^[:space:]=]+)[^0]+(0x4.......).*/\1 = \2 - 0x40000000 + _stext\n.global \1\n.type \1, @function/p' < $^ &gt; $@
Whe I run I get this,
(gdb) add-symbol-file rom.elf
The address where rom.elf has been loaded is missing

Output from make
xtensa-esp32-elf-ld -M -T bootrom.ld -r bootrom.o -o bootrom.elf
xtensa-esp32-elf-ld: bootrom.elf section .text' will not fit in regionbootrom0'

Discarded input sections

.data 0x0000000000000000 0x0 bootrom.o
.bss 0x0000000000000000 0x0 bootrom.o

Memory Configuration

Name Origin Length Attributes
bootrom0 0x0000000040000000 0x0000000000010000
default 0x0000000000000000 0xffffffffffffffff

Linker script and memory map

.text 0x0000000040000000 0xc2000
*(.text)
.text 0x0000000040000000 0xc2000 bootrom.o
0x0000000040000000 _WindowOverflow4
0x0000000040000010 _xtos_alloca_handler
0x0000000040000040 _WindowUnderflow4
0x0000000040000080 _WindowOverflow8

@Ebiroll
Copy link

Ebiroll commented Nov 14, 2016

Sorry, I was in a hurry.
(gdb) add-symbol-file rom.elf 0x40000000 did the trick.

@Ebiroll
Copy link

Ebiroll commented Dec 5, 2016

@jcmvbkbc any time to update xtensa-esp32 yet?

I tried to look at the new-ops branch and tried to add the missing instructions here. https://github.com/Ebiroll/qemu-xtensa-esp32
There is not support for DFP accelerator in xtensa-gcc at the moment so I guess there is no need to add all floating point coprocessor instructions. I guess it is also not really necessary to add the RUR and WUR instructions either as its not used in the code. After the rom lets the control to the app.elf file xthal_restore_extra_nw is not called and these instructions are not called at all.

I have also added an open core lwip ethernet driver, but never got interrupts to work. I need to look closer at this. I did something like this.
xtensa_get_extint(env, 9)
s = SYS_BUS_DEVICE(dev);
sysbus_connect_irq(s, 0, irq);

But when I called qemu_irq_raise(s->irq); no interrupt was generated. Any ideas?
Probably some numbering mismatch or maybe I should try to call qemu_irq_pulse() in case the interrupt is edge triggered.

This is how interrupt handler is added by the real esp32 ether-net driver.
static void emac_enable_intr()
{
//init emac intr
REG_SET_FIELD(DPORT_PRO_EMAC_INT_MAP_REG, DPORT_PRO_EMAC_INT_MAP, ETS_EMAC_INUM);
xt_set_interrupt_handler(ETS_EMAC_INUM, emac_process_intr, NULL);
xt_ints_on(1 << ETS_EMAC_INUM);

REG_WRITE(EMAC_DMAINTERRUPT_EN_REG, EMAC_INTR_ENABLE_BIT);

}

@jcmvbkbc
Copy link
Member

jcmvbkbc commented Dec 6, 2016

irq = xtensa_get_extint(env, 9)

AFAICS 9 is the internal IRQ number, not the external. Should be irq = env->irq_inputs[9];

@Ebiroll
Copy link

Ebiroll commented Dec 8, 2016

Thanks. It worked! I also spent a few extra hours because I forgot to add the argument -net nic,model=vlan0
:-P

Now I try to start both cores, but I am stuck here
x40009a69 <Cache_Flush+85> memw
0x40009a6c <Cache_Flush+88> l32i.n a10, a8, 0
0x40009a6e <Cache_Flush+90> bnone a10, a9, 0x40009a69 <Cache_Flush+85>
io read 58 DPORT_APP_CACHE_CTRL_REG 3ff00058=0x01

I guess returning 0 could do the trick. Any other advice for running 2 cores?
This is the startup code,
#if !CONFIG_FREERTOS_UNICORE
ESP_EARLY_LOGI(TAG, "Starting app cpu, entry point is %p", call_start_cpu1);
//Flush and enable icache for APP CPU
Cache_Flush(1);
Cache_Read_Enable(1);
esp_cpu_unstall(1);
//Enable clock gating and reset the app cpu.
SET_PERI_REG_MASK(DPORT_APPCPU_CTRL_B_REG, DPORT_APPCPU_CLKGATE_EN);
CLEAR_PERI_REG_MASK(DPORT_APPCPU_CTRL_C_REG, DPORT_APPCPU_RUNSTALL);
SET_PERI_REG_MASK(DPORT_APPCPU_CTRL_A_REG, DPORT_APPCPU_RESETTING);
CLEAR_PERI_REG_MASK(DPORT_APPCPU_CTRL_A_REG, DPORT_APPCPU_RESETTING);
ets_set_appcpu_boot_addr((uint32_t)call_start_cpu1);

while (!app_cpu_started) {
    ets_delay_us(100);
}

#else
(gdb) info threads
Id Target Id Frame

  • 2 Thread 2 (CPU#1 [running]) 0x400003c0 in _DoubleExceptionVector ()
    1 Thread 1 (CPU#0 [running]) 0x40009a69 in Cache_Flush ()

Any thoughts, advice? I guess
SET_PERI_REG_MASK(DPORT_APPCPU_CTRL_A_REG, DPORT_APPCPU_RESETTING);
should cause a reset of the app cpu.
Thanks for all your great advice anyways.

@Ebiroll
Copy link

Ebiroll commented Dec 8, 2016

Ivan sent me updated info regarding inter core communication.

Communication between cores is done using an atomic compare-and-set instruction (S32C1I) , which is implemented as a transaction over the AHB bus. We also use shared memory for some things, mainly in FreeRTOS scheduler. Once FreeRTOS is running, we use its communication primitives (queues, semaphores) to communicate between tasks running on different CPUs.

A simple transaction on the AHB consists of an address phase and a subsequent data phase (without wait states: only two bus-cycles). Access to the target device is controlled through a MUX (non-tristate), thereby admitting bus-access to one bus-master at a time.

Any thoughts? Is S32C1I possible to implement?

How about Flash MMU support? Would that be difficult to implement? Would it help to look at the esp8266 implementation? (esp8266_spi)

@jcmvbkbc
Copy link
Member

jcmvbkbc commented Dec 9, 2016

Any other advice for running 2 cores?
esp_cpu_unstall(1);
//Enable clock gating and reset the app cpu.
SET_PERI_REG_MASK(DPORT_APPCPU_CTRL_B_REG, DPORT_APPCPU_CLKGATE_EN);
CLEAR_PERI_REG_MASK(DPORT_APPCPU_CTRL_C_REG, DPORT_APPCPU_RUNSTALL);
SET_PERI_REG_MASK(DPORT_APPCPU_CTRL_A_REG, DPORT_APPCPU_RESETTING);
CLEAR_PERI_REG_MASK(DPORT_APPCPU_CTRL_A_REG, DPORT_APPCPU_RESETTING);

Looks like you need proper runstall signal to prevent app CPU from running too early. I've had it implemented in my xtensa-smp* branch, but IIRC it got broken in qemu-2.7. I'll look at it.

I guess SET_PERI_REG_MASK(DPORT_APPCPU_CTRL_A_REG, PORT_APPCPU_RESETTING); should cause a reset of the app cpu.

I agree.

Is S32C1I possible to implement?

It's there.

How about Flash MMU support? Would that be difficult to implement?

It doesn't look too difficult, but one part of it isn't entirely clear to me:

The decision an MPU can make, based on this information, is to allow or deny a process to access
the memory region or peripheral

in case of CPU being denied memory access, will it get any exception? Or only write would be ignored/read would return garbage?

@Ebiroll
Copy link

Ebiroll commented Dec 10, 2016

It worked! APP & PRO cpu running!
I (97142) cpu_start: Starting scheduler on PRO CPU.
I (68253) cpu_start: Starting scheduler on APP CPU.

Not sure about how the runstall stuff should have worked and I was not able to do a proper reset,
I tried this,
XtensaCPU *APPcpu = NULL; // Is set at init and saved for reset,

 CPUClass *cc = CPU_CLASS(APPcpu);
 XtensaCPUClass *xcc = XTENSA_CPU_GET_CLASS(APPcpu);
 CPUXtensaState *env = &APPcpu->env;
 cc->reset(env);   // core-dump. 

Instead I just set pc register like this instead of cc->reset(env);
env->pc = env->config->exception_vector[EXC_RESET];
Maybe you can tell me how to call reset_cpu(APPcpu)?

At startup, The app cpu (2) hangs in a _double exception until the pro cpu calls,
SET_PERI_REG_MASK(DPORT_APPCPU_CTRL_A_REG, PORT_APPCPU_RESETTING);
It seems like they communicate the app-cpu flash code startup adress with DPORT_APPCPU_CTRL_D_REG.

in case of CPU being denied memory access, will it get any exception? Or only write would be ignored/read would return garbage?

My guess would be that it would be ignored and maybe return the same value as previous memory access. You could probably test it on the actual hardware. See table 35. I could test if I knew how,

I am also very unsure of how to do MMU flash emulation. Do you have any example code?
I saw this,
c4ec11a

If you want to try running and implement SPI-flash emulation, I have implemented unpatch rom function that restores the original rom
int *unpatch=(int *) 0x3ff00088;
*unpatch=0x42;

Have a nice weekend anyways.

@Ebiroll
Copy link

Ebiroll commented Dec 10, 2016

One small observation regarding interrupts and -smp.
If starting 2 cores the lan driver only gets interrupts on the last created core.
How should interrupts work when running two cores?
Anyway, I changed the order so that PRO cpu is initated last. Maybe I am doing it wrong.

for (n = 0; n < smp_cpus; n++) {
    cpu = cpu_xtensa_init(cpu_model);
    if (cpu == NULL) {
        error_report("unable to find CPU definition '%s'",
                     cpu_model);
        exit(EXIT_FAILURE);
    }
    env = &cpu->env;

    if (smp_cpus==2) {
        if (n==1) {
            env->sregs[PRID] = 0xCDCD;
        } else {
            env->sregs[PRID] = 0xABAB;
            APPcpu = cpu;            
        }
    } else {
            // If only running one core, it should be PRO-CPU 
            env->sregs[PRID] = 0xCDCD;
    }

    qemu_register_reset(lx60_reset, cpu);
    /* Need MMU initialized prior to ELF loading,
     * so that ELF gets loaded into virtual addresses
     */
    cpu_reset(CPU(cpu));
}

I was doing it wrong. I was working too late.
I used the env = &cpu->env; variable, thats why only the last created core got the interrupts.

 open_net_init(system_memory,0x3ff69000,0x3ff69400 , 0x3FFF8000,
                  env->irq_inputs[9]
                  , nd_table);   //  

https://github.com/Ebiroll/qemu-xtensa-esp32/blob/master/hw/xtensa/esp32.c

jcmvbkbc pushed a commit that referenced this issue Dec 14, 2016
ASAN complains about buffer overflow when running:
aarch64-softmmu/qemu-system-aarch64 -machine xilinx-zynq-a9

==476==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000035e38 at pc 0x000000f75253 bp 0x7ffc597e0ec0 sp 0x7ffc597e0eb0
READ of size 8 at 0x602000035e38 thread T0
    #0 0xf75252 in xilinx_spips_realize hw/ssi/xilinx_spips.c:623
    #1 0xb9ef6c in device_set_realized hw/core/qdev.c:918
    #2 0x129ae01 in property_set_bool qom/object.c:1854
    #3 0x1296e70 in object_property_set qom/object.c:1088
    #4 0x129dd1b in object_property_set_qobject qom/qom-qobject.c:27
    #5 0x1297168 in object_property_set_bool qom/object.c:1157
    #6 0xb9aeac in qdev_init_nofail hw/core/qdev.c:358
    #7 0x78a5bf in zynq_init_spi_flashes /home/elmarco/src/qemu/hw/arm/xilinx_zynq.c:125
    #8 0x78af60 in zynq_init /home/elmarco/src/qemu/hw/arm/xilinx_zynq.c:238
    #9 0x998eac in main /home/elmarco/src/qemu/vl.c:4534
    #10 0x7f96ed692730 in __libc_start_main (/lib64/libc.so.6+0x20730)
    #11 0x41d0a8 in _start (/home/elmarco/src/qemu/aarch64-softmmu/qemu-system-aarch64+0x41d0a8)

0x602000035e38 is located 0 bytes to the right of 8-byte region [0x602000035e30,0x602000035e38)
allocated by thread T0 here:
    #0 0x7f970b014e60 in malloc (/lib64/libasan.so.3+0xc6e60)
    #1 0x7f96f15b0e18 in g_malloc (/lib64/libglib-2.0.so.0+0x4ee18)
    #2 0xb9ef6c in device_set_realized hw/core/qdev.c:918
    #3 0x129ae01 in property_set_bool qom/object.c:1854
    #4 0x1296e70 in object_property_set qom/object.c:1088
    #5 0x129dd1b in object_property_set_qobject qom/qom-qobject.c:27
    #6 0x1297168 in object_property_set_bool qom/object.c:1157
    #7 0xb9aeac in qdev_init_nofail hw/core/qdev.c:358
    #8 0x78a5bf in zynq_init_spi_flashes /home/elmarco/src/qemu/hw/arm/xilinx_zynq.c:125
    #9 0x78af60 in zynq_init /home/elmarco/src/qemu/hw/arm/xilinx_zynq.c:238
    #10 0x998eac in main /home/elmarco/src/qemu/vl.c:4534
    #11 0x7f96ed692730 in __libc_start_main (/lib64/libc.so.6+0x20730)

s->spi is allocated with the size of num_busses which may be 1 (by
default).  Change to use a loop up to s->num_busses also for the
call to ssi_auto_connect_slaves().

Reported-by: Marc-André Lureau <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Dec 14, 2016
Qemu crash in the source side while migrating, after starting ipmi service inside vm.

./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 \
-drive file=/work/suse/suse11_sp3_64_vt,format=raw,if=none,id=drive-virtio-disk0,cache=none \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 \
-vnc :99 -monitor vc -device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-kcs,bmc=bmc0,ioport=0xca2

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffec4268700 (LWP 7657)]
__memcpy_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:2757
(gdb) bt
 #0  __memcpy_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:2757
 #1  0x00005555559ef775 in memcpy (__len=3, __src=0xc1421c, __dest=<optimized out>)
     at /usr/include/bits/string3.h:51
 #2  qemu_put_buffer (f=0x555557a97690, buf=0xc1421c <Address 0xc1421c out of bounds>, size=3)
     at migration/qemu-file.c:346
 #3  0x00005555559eef66 in vmstate_save_state (f=f@entry=0x555557a97690,
     vmsd=0x555555f8a5a0 <vmstate_ISAIPMIKCSDevice>, opaque=0x555557231160,
     vmdesc=vmdesc@entry=0x55555798cc40) at migration/vmstate.c:333
 #4  0x00005555557cfe45 in vmstate_save (f=f@entry=0x555557a97690, se=se@entry=0x555557231de0,
     vmdesc=vmdesc@entry=0x55555798cc40) at /mnt/sdb/zyy/qemu/migration/savevm.c:720
 #5  0x00005555557d2be7 in qemu_savevm_state_complete_precopy (f=0x555557a97690,
     iterable_only=iterable_only@entry=false) at /mnt/sdb/zyy/qemu/migration/savevm.c:1128
 #6  0x00005555559ea102 in migration_completion (start_time=<synthetic pointer>,
     old_vm_running=<synthetic pointer>, current_active_state=<optimized out>,
     s=0x5555560eaa80 <current_migration.44078>) at migration/migration.c:1707
 #7  migration_thread (opaque=0x5555560eaa80 <current_migration.44078>) at migration/migration.c:1855
 #8  0x00007ffff3900dc5 in start_thread (arg=0x7ffec4268700) at pthread_create.c:308
 #9  0x00007fffefc6c71d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Signed-off-by: Zhuang Yanying <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
@Ebiroll
Copy link

Ebiroll commented Dec 20, 2016

I saw your runstall changes, and I meged them.
Any plans on doing spi/flash support? Otherwise I will give it a try.
I will delete the pull request.

@Ebiroll
Copy link

Ebiroll commented Dec 29, 2016

Hello again, @jcmvbkbc
I used the code from the esp8266 branch and added flash support. It worked fine.
I use the values written to the MMU table to map from flash to physical memory, with
cpu_physical_memory_write . I also store the written data in a memory mapped flash file.
/* Flash MMU table for PRO CPU /
#define DPORT_PRO_FLASH_MMU_TABLE ((volatile uint32_t
) 0x3FF10000)
Just to let you know, still work in progress. https://github.com/Ebiroll/qemu_esp32
Have fun & Happy new year.

jcmvbkbc pushed a commit that referenced this issue Jan 25, 2017
Current code that handles Tx buffer desciprtor ring scanning employs the
following algorithm:

	1. Restore current buffer descriptor pointer from TBPTRn

	2. Process current descriptor

	3. If current descriptor has BD_WRAP flag set set current
	   descriptor pointer to start of the descriptor ring

	4. If current descriptor points to start of the ring exit the
	   loop, otherwise increment current descriptor pointer and go
	   to #2

	5. Store current descriptor in TBPTRn

The way the code is implemented results in buffer descriptor ring being
scanned starting at offset/descriptor #0. While covering 99% of the
cases, this algorithm becomes problematic for a number of edge cases.

Consider the following scenario: guest OS driver initializes descriptor
ring to N individual descriptors and starts sending data out. Depending
on the volume of traffic and probably guest OS driver implementation it
is possible that an edge case where a packet, spread across 2
descriptors is placed in descriptors N - 1 and 0 in that order(it is
easy to imagine similar examples involving more than 2 descriptors).

What happens then is aforementioned algorithm starts at descriptor 0,
sees a descriptor marked as BD_LAST, which it happily sends out as a
separate packet(very much malformed at this point) then the iteration
continues and the first part of the original packet is tacked to the
next transmission which ends up being bogus as well.

This behvaiour can be pretty reliably observed when scp'ing data from a
guest OS via TAP interface for files larger than 160K (every time for
700K+).

This patch changes the scanning algorithm to do the following:

	1. Restore "current" buffer descriptor pointer from
	   TBPTRn

	2. If "current" descriptor does not have BD_TX_READY set, goto #6

	3. Process current descriptor

	4. If "current" descriptor has BD_WRAP flag set "current"
	   descriptor pointer to start of the descriptor ring otherwise
	   set increment "current" by the size of one descriptor

	5. Goto #1

	6. Save "current" buffer descriptor in TBPTRn

This way we preserve the information about which descriptor was
processed last and always start where we left off avoiding the original
problem. On top of that, judging by the following excerpt from
MPC8548ERM (p. 14-48):

"... When the end of the TxBD ring is reached, eTSEC initializes TBPTRn
to the value in the corresponding TBASEn. The TBPTR register is
internally written by the eTSEC’s DMA controller during
transmission. The pointer increments by eight (bytes) each time a
descriptor is closed successfully by the eTSEC..."

revised algorithm might also a more correct way of emulating this aspect
of eTSEC peripheral.

Cc: Alexander Graf <[email protected]>
Cc: Scott Wood <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: [email protected]
Signed-off-by: Andrey Smirnov <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Jan 25, 2017
QEMU will crash with the follow backtrace if the new created thread exited before
we call qemu_thread_set_name() for it.

  (gdb) bt
  #0 0x00007f9a68b095d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
  #1 0x00007f9a68b0acc8 in __GI_abort () at abort.c:90
  #2 0x00007f9a69cda389 in PAT_abort () from /usr/lib64/libuvpuserhotfix.so
  #3 0x00007f9a69cdda0d in patchIllInsHandler () from /usr/lib64/libuvpuserhotfix.so
  #4 <signal handler called>
  #5 pthread_setname_np (th=140298470549248, name=name@entry=0x8cc74a "io-task-worker") at ../nptl/sysdeps/unix/sysv/linux/pthread_setname.c:49
  #6 0x00000000007f5f20 in qemu_thread_set_name (thread=thread@entry=0x7ffd2ac09680, name=name@entry=0x8cc74a "io-task-worker") at util/qemu_thread_posix.c:459
  #7 0x00000000007f679e in qemu_thread_create (thread=thread@entry=0x7ffd2ac09680, name=name@entry=0x8cc74a "io-task-worker",start_routine=start_routine@entry=0x7c1300 <qio_task_thread_worker>, arg=arg@entry=0x7f99b8001720, mode=mode@entry=1) at util/qemu_thread_posix.c:498
  #8 0x00000000007c15b6 in qio_task_run_in_thread (task=task@entry=0x7f99b80033d0, worker=worker@entry=0x7bd920 <qio_channel_socket_connect_worker>, opaque=0x7f99b8003370, destroy=0x7c6220 <qapi_free_SocketAddress>) at io/task.c:133
  #9 0x00000000007bda04 in qio_channel_socket_connect_async (ioc=0x7f99b80014c0, addr=0x37235d0, callback=callback@entry=0x54ad00 <qemu_chr_socket_connected>, opaque=opaque@entry=0x38118b0, destroy=destroy@entry=0x0) at io/channel_socket.c:191
  #10 0x00000000005487f6 in socket_reconnect_timeout (opaque=0x38118b0) at qemu_char.c:4402
  #11 0x00007f9a6a1533b3 in g_timeout_dispatch () from /usr/lib64/libglib-2.0.so.0
  #12 0x00007f9a6a15299a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
  #13 0x0000000000747386 in glib_pollfds_poll () at main_loop.c:227
  #14 0x0000000000747424 in os_host_main_loop_wait (timeout=404000000) at main_loop.c:272
  #15 0x0000000000747575 in main_loop_wait (nonblocking=nonblocking@entry=0) at main_loop.c:520
  #16 0x0000000000557d31 in main_loop () at vl.c:2170
  #17 0x000000000041c8b7 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:5083

Let's detach the new thread after calling qemu_thread_set_name().

Signed-off-by: Caoxinhua <[email protected]>
Signed-off-by: zhanghailiang <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Feb 23, 2017
AioHandlers marked ->is_external must be skipped when aio_node_check()
fails.  bdrv_drained_begin() needs this to prevent dataplane from
submitting new I/O requests while another thread accesses the device and
relies on it being quiesced.

This patch fixes the following segfault:

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at qemu/block/io.c:2650
  2650            bdrv_io_plug(child->bs);
  [Current thread is 1 (Thread 0x7ff5c4bd1c80 (LWP 10917))]
  (gdb) bt
  #0  0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at qemu/block/io.c:2650
  #1  0x00005577f6114363 in blk_io_plug (blk=0x5577f7b8ba20) at qemu/block/block-backend.c:1561
  #2  0x00005577f5d4091d in virtio_blk_handle_vq (s=0x5577f9ada030, vq=0x5577f9b3d2a0) at qemu/hw/block/virtio-blk.c:589
  #3  0x00005577f5d4240d in virtio_blk_data_plane_handle_output (vdev=0x5577f9ada030, vq=0x5577f9b3d2a0) at qemu/hw/block/dataplane/virtio-blk.c:158
  #4  0x00005577f5d88acd in virtio_queue_notify_aio_vq (vq=0x5577f9b3d2a0) at qemu/hw/virtio/virtio.c:1304
  #5  0x00005577f5d8aaaf in virtio_queue_host_notifier_aio_poll (opaque=0x5577f9b3d308) at qemu/hw/virtio/virtio.c:2134
  #6  0x00005577f60ca077 in run_poll_handlers_once (ctx=0x5577f79ddbb0) at qemu/aio-posix.c:493
  #7  0x00005577f60ca268 in try_poll_mode (ctx=0x5577f79ddbb0, blocking=true) at qemu/aio-posix.c:569
  #8  0x00005577f60ca331 in aio_poll (ctx=0x5577f79ddbb0, blocking=true) at qemu/aio-posix.c:601
  #9  0x00005577f612722a in bdrv_flush (bs=0x5577f7c20970) at qemu/block/io.c:2403
  #10 0x00005577f60c1b2d in bdrv_close (bs=0x5577f7c20970) at qemu/block.c:2322
  #11 0x00005577f60c20e7 in bdrv_delete (bs=0x5577f7c20970) at qemu/block.c:2465
  #12 0x00005577f60c3ecf in bdrv_unref (bs=0x5577f7c20970) at qemu/block.c:3425
  #13 0x00005577f60bf951 in bdrv_root_unref_child (child=0x5577f7a2de70) at qemu/block.c:1361
  #14 0x00005577f6112162 in blk_remove_bs (blk=0x5577f7b8ba20) at qemu/block/block-backend.c:491
  #15 0x00005577f6111b1b in blk_remove_all_bs () at qemu/block/block-backend.c:245
  #16 0x00005577f60c1db6 in bdrv_close_all () at qemu/block.c:2382
  #17 0x00005577f5e60cca in main (argc=20, argv=0x7ffea6eb8398, envp=0x7ffea6eb8440) at qemu/vl.c:4684

The key thing is that bdrv_close() uses bdrv_drained_begin() and
virtio_queue_host_notifier_aio_poll() must not be called.

Thanks to Fam Zheng <[email protected]> for identifying the root cause of
this crash.

Reported-by: Alberto Garcia <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Fam Zheng <[email protected]>
Tested-by: Alberto Garcia <[email protected]>
Message-id: [email protected]
Signed-off-by: Stefan Hajnoczi <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Apr 19, 2017
Running postcopy-test with ASAN produces the following error:

QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64  tests/postcopy-test
...
=================================================================
==23641==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1556600000 at pc 0x55b8e9d28208 bp 0x7f1555f4d3c0 sp 0x7f1555f4d3b0
READ of size 8 at 0x7f1556600000 thread T6
    #0 0x55b8e9d28207 in htab_save_first_pass /home/elmarco/src/qq/hw/ppc/spapr.c:1528
    #1 0x55b8e9d2939c in htab_save_iterate /home/elmarco/src/qq/hw/ppc/spapr.c:1665
    #2 0x55b8e9beae3a in qemu_savevm_state_iterate /home/elmarco/src/qq/migration/savevm.c:1044
    #3 0x55b8ea677733 in migration_thread /home/elmarco/src/qq/migration/migration.c:1976
    #4 0x7f15845f46c9 in start_thread (/lib64/libpthread.so.0+0x76c9)
    #5 0x7f157d9d0f7e in clone (/lib64/libc.so.6+0x107f7e)

0x7f1556600000 is located 0 bytes to the right of 2097152-byte region [0x7f1556400000,0x7f1556600000)
allocated by thread T0 here:
    #0 0x7f159bb76980 in posix_memalign (/lib64/libasan.so.3+0xc7980)
    #1 0x55b8eab185b2 in qemu_try_memalign /home/elmarco/src/qq/util/oslib-posix.c:106
    #2 0x55b8eab186c8 in qemu_memalign /home/elmarco/src/qq/util/oslib-posix.c:122
    #3 0x55b8e9d268a8 in spapr_reallocate_hpt /home/elmarco/src/qq/hw/ppc/spapr.c:1214
    #4 0x55b8e9d26e04 in ppc_spapr_reset /home/elmarco/src/qq/hw/ppc/spapr.c:1261
    #5 0x55b8ea12e913 in qemu_system_reset /home/elmarco/src/qq/vl.c:1697
    #6 0x55b8ea13fa40 in main /home/elmarco/src/qq/vl.c:4679
    #7 0x7f157d8e9400 in __libc_start_main (/lib64/libc.so.6+0x20400)

Thread T6 created by T0 here:
    #0 0x7f159bae0488 in __interceptor_pthread_create (/lib64/libasan.so.3+0x31488)
    #1 0x55b8eab1d9cb in qemu_thread_create /home/elmarco/src/qq/util/qemu-thread-posix.c:465
    #2 0x55b8ea67874c in migrate_fd_connect /home/elmarco/src/qq/migration/migration.c:2096
    #3 0x55b8ea66cbb0 in migration_channel_connect /home/elmarco/src/qq/migration/migration.c:500
    #4 0x55b8ea678f38 in socket_outgoing_migration /home/elmarco/src/qq/migration/socket.c:87
    #5 0x55b8eaa5a03a in qio_task_complete /home/elmarco/src/qq/io/task.c:142
    #6 0x55b8eaa599cc in gio_task_thread_result /home/elmarco/src/qq/io/task.c:88
    #7 0x7f15823e38e6  (/lib64/libglib-2.0.so.0+0x468e6)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/elmarco/src/qq/hw/ppc/spapr.c:1528 in htab_save_first_pass

index seems to be wrongly incremented, unless I miss something that
would be worth a comment.

Signed-off-by: Marc-André Lureau <[email protected]>
Signed-off-by: David Gibson <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Apr 19, 2017
During block job completion, nothing is preventing
block_job_defer_to_main_loop_bh from being called in a nested
aio_poll(), which is a trouble, such as in this code path:

    qmp_block_commit
      commit_active_start
        bdrv_reopen
          bdrv_reopen_multiple
            bdrv_reopen_prepare
              bdrv_flush
                aio_poll
                  aio_bh_poll
                    aio_bh_call
                      block_job_defer_to_main_loop_bh
                        stream_complete
                          bdrv_reopen

block_job_defer_to_main_loop_bh is the last step of the stream job,
which should have been "paused" by the bdrv_drained_begin/end in
bdrv_reopen_multiple, but it is not done because it's in the form of a
main loop BH.

Similar to why block jobs should be paused between drained_begin and
drained_end, BHs they schedule must be excluded as well.  To achieve
this, this patch forces draining the BH in BDRV_POLL_WHILE.

As a side effect this fixes a hang in block_job_detach_aio_context
during system_reset when a block job is ready:

    #0  0x0000555555aa79f3 in bdrv_drain_recurse
    #1  0x0000555555aa825d in bdrv_drained_begin
    #2  0x0000555555aa8449 in bdrv_drain
    #3  0x0000555555a9c356 in blk_drain
    #4  0x0000555555aa3cfd in mirror_drain
    #5  0x0000555555a66e11 in block_job_detach_aio_context
    #6  0x0000555555a62f4d in bdrv_detach_aio_context
    #7  0x0000555555a63116 in bdrv_set_aio_context
    #8  0x0000555555a9d326 in blk_set_aio_context
    #9  0x00005555557e38da in virtio_blk_data_plane_stop
    #10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd
    #11 0x00005555559fa49b in virtio_bus_stop_ioeventfd
    #12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd
    #13 0x00005555559f6a18 in virtio_pci_reset
    #14 0x00005555559139a9 in qdev_reset_one
    #15 0x0000555555916738 in qbus_walk_children
    #16 0x0000555555913318 in qdev_walk_children
    #17 0x0000555555916738 in qbus_walk_children
    #18 0x00005555559168ca in qemu_devices_reset
    #19 0x000055555581fcbb in pc_machine_reset
    #20 0x00005555558a4d96 in qemu_system_reset
    #21 0x000055555577157a in main_loop_should_exit
    #22 0x000055555577157a in main_loop
    #23 0x000055555577157a in main

The rationale is that the loop in block_job_detach_aio_context cannot
make any progress in pausing/completing the job, because bs->in_flight
is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop
BH. With this patch, it does.

Reported-by: Jeff Cody <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Jeff Cody <[email protected]>
Tested-by: Jeff Cody <[email protected]>
Signed-off-by: Fam Zheng <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Jun 6, 2017
ASAN detects an "unknown-crash" when running pxe-test:

/ppc64/pxe/spapr-vlan: =================================================================
==7143==ERROR: AddressSanitizer: unknown-crash on address 0x7f6dcd298d30 at pc 0x55e22218830d bp 0x7f6dcd2989e0 sp 0x7f6dcd2989d0
READ of size 128 at 0x7f6dcd298d30 thread T2
    #0 0x55e22218830c in tftp_session_allocate /home/elmarco/src/qq/slirp/tftp.c:73
    #1 0x55e22218a1f8 in tftp_handle_rrq /home/elmarco/src/qq/slirp/tftp.c:289
    #2 0x55e22218b54c in tftp_input /home/elmarco/src/qq/slirp/tftp.c:446
    #3 0x55e2221833fe in udp6_input /home/elmarco/src/qq/slirp/udp6.c:82
    #4 0x55e222137b17 in ip6_input /home/elmarco/src/qq/slirp/ip6_input.c:67

Address 0x7f6dcd298d30 is located in stack of thread T2 at offset 96 in frame
    #0 0x55e222182420 in udp6_input /home/elmarco/src/qq/slirp/udp6.c:13

  This frame has 3 object(s):
    [32, 48) '<unknown>'
    [96, 124) 'lhost' <== Memory access at offset 96 partially overflows this variable
    [160, 200) 'save_ip' <== Memory access at offset 96 partially underflows this variable

The sockaddr_storage pointer is the sockaddr_in6 lhost on the
stack. Copy only the source addr size.

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Samuel Thibault <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Aug 12, 2020
When adding the generic PCA955xClass in commit 736132e, we
forgot to set the class_size field. Fill it now to avoid:

  (gdb) run -machine mcimx6ul-evk -m 128M -display none -serial stdio -kernel ./OS.elf
  Starting program: ../../qemu/qemu/arm-softmmu/qemu-system-arm -machine mcimx6ul-evk -m 128M -display none -serial stdio -kernel ./OS.elf
  double free or corruption (!prev)
  Thread 1 "qemu-system-arm" received signal SIGABRT, Aborted.
  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  (gdb) where
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  #1  0x00007ffff75d8859 in __GI_abort () at abort.c:79
  #2  0x00007ffff76433ee in __libc_message
      (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff776d285 "%s\n")
      at ../sysdeps/posix/libc_fatal.c:155
  #3  0x00007ffff764b47c in malloc_printerr
      (str=str@entry=0x7ffff776f690 "double free or corruption (!prev)")
      at malloc.c:5347
  #4  0x00007ffff764d12c in _int_free
      (av=0x7ffff779eb80 <main_arena>, p=0x5555567a3990, have_lock=<optimized out>) at malloc.c:4317
  #5  0x0000555555c906c3 in type_initialize_interface
      (ti=ti@entry=0x5555565b8f40, interface_type=0x555556597ad0, parent_type=0x55555662ca10) at qom/object.c:259
  #6  0x0000555555c902da in type_initialize (ti=ti@entry=0x5555565b8f40)
      at qom/object.c:323
  #7  0x0000555555c90d20 in type_initialize (ti=0x5555565b8f40)
      at qom/object.c:1028

  $ valgrind --track-origins=yes qemu-system-arm -M mcimx6ul-evk -m 128M -display none -serial stdio -kernel ./OS.elf
  ==77479== Memcheck, a memory error detector
  ==77479== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
  ==77479== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
  ==77479== Command: qemu-system-arm -M mcimx6ul-evk -m 128M -display none -serial stdio -kernel ./OS.elf
  ==77479==
  ==77479== Invalid write of size 2
  ==77479==    at 0x6D8322: pca9552_class_init (pca9552.c:424)
  ==77479==    by 0x844D1F: type_initialize (object.c:1029)
  ==77479==    by 0x844D1F: object_class_foreach_tramp (object.c:1016)
  ==77479==    by 0x4AE1057: g_hash_table_foreach (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.2)
  ==77479==    by 0x8453A4: object_class_foreach (object.c:1038)
  ==77479==    by 0x8453A4: object_class_get_list (object.c:1095)
  ==77479==    by 0x556194: select_machine (vl.c:2416)
  ==77479==    by 0x556194: qemu_init (vl.c:3828)
  ==77479==    by 0x40AF9C: main (main.c:48)
  ==77479==  Address 0x583f108 is 0 bytes after a block of size 200 alloc'd
  ==77479==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
  ==77479==    by 0x4AF8D30: g_malloc0 (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.2)
  ==77479==    by 0x844258: type_initialize.part.0 (object.c:306)
  ==77479==    by 0x844D1F: type_initialize (object.c:1029)
  ==77479==    by 0x844D1F: object_class_foreach_tramp (object.c:1016)
  ==77479==    by 0x4AE1057: g_hash_table_foreach (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.2)
  ==77479==    by 0x8453A4: object_class_foreach (object.c:1038)
  ==77479==    by 0x8453A4: object_class_get_list (object.c:1095)
  ==77479==    by 0x556194: select_machine (vl.c:2416)
  ==77479==    by 0x556194: qemu_init (vl.c:3828)
  ==77479==    by 0x40AF9C: main (main.c:48)

Fixes: 736132e ("hw/misc/pca9552: Add generic PCA955xClass")
Reported-by: Jean-Christophe DUBOIS <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: Jean-Christophe DUBOIS <[email protected]>
Message-id: [email protected]
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Peter Maydell <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Aug 12, 2020
Coverity has problems seeing through __builtin_choose_expr, which
result in it abandoning analysis of later functions that utilize a
definition that used MIN_CONST or MAX_CONST, such as in qemu-file.c:

 50    DECLARE_BITMAP(may_free, MAX_IOV_SIZE);

CID 1429992 (#1 of 1): Unrecoverable parse warning (PARSE_ERROR)1.
expr_not_constant: expression must have a constant value

As has been done in the past (see 07d6667), it's okay to dumb things
down when compiling for static analyzers.  (Of course, now the
syntax-checker has a false positive on our reference to
__COVERITY__...)

Reported-by: Peter Maydell <[email protected]>
Fixes: CID 1429992, CID 1429995, CID 1429997, CID 1429999
Signed-off-by: Eric Blake <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Aug 12, 2020
tries to fix a leak detected when building with --enable-sanitizers:
./i386-softmmu/qemu-system-i386
Upon exit:
==13576==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1216 byte(s) in 1 object(s) allocated from:
    #0 0x7f9d2ed5c628 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5)
    #1 0x7f9d2e963500 in g_malloc (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.)
    #2 0x55fa646d25cc in object_new_with_type /tmp/qemu/qom/object.c:686
    #3 0x55fa63dbaa88 in qdev_new /tmp/qemu/hw/core/qdev.c:140
    #4 0x55fa638a533f in pc_pflash_create /tmp/qemu/hw/i386/pc_sysfw.c:88
    #5 0x55fa638a54c4 in pc_system_flash_create /tmp/qemu/hw/i386/pc_sysfw.c:106
    #6 0x55fa646caa1d in object_init_with_type /tmp/qemu/qom/object.c:369
    #7 0x55fa646d20b5 in object_initialize_with_type /tmp/qemu/qom/object.c:511
    #8 0x55fa646d2606 in object_new_with_type /tmp/qemu/qom/object.c:687
    #9 0x55fa639431e9 in qemu_init /tmp/qemu/softmmu/vl.c:3878
    #10 0x55fa6335c1b8 in main /tmp/qemu/softmmu/main.c:48
    #11 0x7f9d2cf06e0a in __libc_start_main ../csu/libc-start.c:308
    #12 0x55fa6335f8e9 in _start (/tmp/qemu/build/i386-softmmu/qemu-system-i386)

Signed-off-by: Alexander Bulekov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Aug 12, 2020
… staging

* Make checkpatch say 'qemu' instead of 'kernel' (Aleksandar)
* Fix PSE guests with emulated NPT (Alexander B. #1)
* Fix leak (Alexander B. #2)
* HVF fixes (Roman, Cameron)
* New Sapphire Rapids CPUID bits (Cathy)
* cpus.c and softmmu/ cleanups (Claudio)
* TAP driver tweaks (Daniel, Havard)
* object-add bugfix and testcases (Eric A.)
* Fix Coverity MIN_CONST and MAX_CONST (Eric B.)
* "info lapic" improvement (Jan)
* SSE fixes (Joseph)
* "-msg guest-name" option (Mario)
* support for AMD nested live migration (myself)
* Small i386 TCG fixes (myself)
* improved error reporting for Xen (myself)
* fix "-cpu host -overcommit cpu-pm=on" (myself)
* Add accel/Kconfig (Philippe)
* iscsi sense handling fixes (Yongji)
* Misc bugfixes

# gpg: Signature made Sat 11 Jul 2020 00:33:41 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Paolo Bonzini <[email protected]>" [full]
# gpg:                 aka "Paolo Bonzini <[email protected]>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (47 commits)
  linux-headers: update again to 5.8
  apic: Report current_count via 'info lapic'
  scripts: improve message when TAP based tests fail
  target/i386: Enable TSX Suspend Load Address Tracking feature
  target/i386: Add SERIALIZE cpu feature
  softmmu/vl: Remove the check for colons in -accel parameters
  cpu-throttle: new module, extracted from cpus.c
  softmmu: move softmmu only files from root
  pc: fix leak in pc_system_flash_cleanup_unused
  cpus: Move CPU code from exec.c to cpus-common.c
  target/i386: Correct the warning message of Intel PT
  checkpatch: Change occurences of 'kernel' to 'qemu' in user messages
  iscsi: return -EIO when sense fields are meaningless
  iscsi: handle check condition status in retry loop
  target/i386: sev: fail query-sev-capabilities if QEMU cannot use SEV
  target/i386: sev: provide proper error reporting for query-sev-capabilities
  KVM: x86: believe what KVM says about WAITPKG
  target/i386: implement undocumented "smsw r32" behavior
  target/i386: remove gen_io_end
  Makefile: simplify MINIKCONF rules
  ...

Signed-off-by: Peter Maydell <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Aug 12, 2020
…tart

When the disconnect event is triggered in the connecting stage,
the tcp_chr_disconnect_locked may be called twice.

The first call:
    #0  qemu_chr_socket_restart_timer (chr=0x55555582ee90) at chardev/char-socket.c:120
    #1  0x000055555558e38c in tcp_chr_disconnect_locked (chr=<optimized out>) at chardev/char-socket.c:490
    #2  0x000055555558e3cd in tcp_chr_disconnect (chr=0x55555582ee90) at chardev/char-socket.c:497
    #3  0x000055555558ea32 in tcp_chr_new_client (chr=chr@entry=0x55555582ee90, sioc=sioc@entry=0x55555582f0b0) at chardev/char-socket.c:892
    #4  0x000055555558eeb8 in qemu_chr_socket_connected (task=0x55555582f300, opaque=<optimized out>) at chardev/char-socket.c:1090
    #5  0x0000555555574352 in qio_task_complete (task=task@entry=0x55555582f300) at io/task.c:196
    #6  0x00005555555745f4 in qio_task_thread_result (opaque=0x55555582f300) at io/task.c:111
    #7  qio_task_wait_thread (task=0x55555582f300) at io/task.c:190
    #8  0x000055555558f17e in tcp_chr_wait_connected (chr=0x55555582ee90, errp=0x555555802a08 <error_abort>) at chardev/char-socket.c:1013
    #9  0x0000555555567cbd in char_socket_client_reconnect_test (opaque=0x5555557fe020 <client8unix>) at tests/test-char.c:1152
The second call:
    #0  0x00007ffff5ac3277 in raise () from /lib64/libc.so.6
    #1  0x00007ffff5ac4968 in abort () from /lib64/libc.so.6
    #2  0x00007ffff5abc096 in __assert_fail_base () from /lib64/libc.so.6
    #3  0x00007ffff5abc142 in __assert_fail () from /lib64/libc.so.6
    #4  0x000055555558d10a in qemu_chr_socket_restart_timer (chr=0x55555582ee90) at chardev/char-socket.c:125
    #5  0x000055555558df0c in tcp_chr_disconnect_locked (chr=<optimized out>) at chardev/char-socket.c:490
    #6  0x000055555558df4d in tcp_chr_disconnect (chr=0x55555582ee90) at chardev/char-socket.c:497
    #7  0x000055555558e5b2 in tcp_chr_new_client (chr=chr@entry=0x55555582ee90, sioc=sioc@entry=0x55555582f0b0) at chardev/char-socket.c:892
    #8  0x000055555558e93a in tcp_chr_connect_client_sync (chr=chr@entry=0x55555582ee90, errp=errp@entry=0x7fffffffd178) at chardev/char-socket.c:944
    #9  0x000055555558ec78 in tcp_chr_wait_connected (chr=0x55555582ee90, errp=0x555555802a08 <error_abort>) at chardev/char-socket.c:1035
    #10 0x000055555556804b in char_socket_client_test (opaque=0x5555557fe020 <client8unix>) at tests/test-char.c:1023

Run test/test-char to reproduce this issue.

test-char: chardev/char-socket.c:125: qemu_chr_socket_restart_timer: Assertion `!s->reconnect_timer' failed.

Signed-off-by: Li Feng <[email protected]>
Acked-by: Marc-André Lureau <[email protected]>
Message-Id: <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Aug 12, 2020
With a reconnect socket, qemu_char_open() will start a background
thread. It should keep a reference on the chardev.

Fixes invalid read:
READ of size 8 at 0x6040000ac858 thread T7
    #0 0x5555598d37b8 in unix_connect_saddr /home/elmarco/src/qq/util/qemu-sockets.c:954
    #1 0x5555598d4751 in socket_connect /home/elmarco/src/qq/util/qemu-sockets.c:1109
    #2 0x555559707c34 in qio_channel_socket_connect_sync /home/elmarco/src/qq/io/channel-socket.c:145
    #3 0x5555596adebb in tcp_chr_connect_client_task /home/elmarco/src/qq/chardev/char-socket.c:1104
    #4 0x555559723d55 in qio_task_thread_worker /home/elmarco/src/qq/io/task.c:123
    #5 0x5555598a6731 in qemu_thread_start /home/elmarco/src/qq/util/qemu-thread-posix.c:519
    #6 0x7ffff40d4431 in start_thread (/lib64/libpthread.so.0+0x9431)
    #7 0x7ffff40029d2 in __clone (/lib64/libc.so.6+0x1019d2)

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Message-Id: <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Aug 12, 2020
"tmp.tls_hostname" and "tmp.tls_creds" allocated by migrate_params_test_apply()
is forgot to free at the end of qmp_migrate_set_parameters(). Fix that.

The leak stack:
Direct leak of 2 byte(s) in 2 object(s) allocated from:
   #0 0xffffb597c20b in __interceptor_malloc (/usr/lib64/libasan.so.4+0xd320b)
   #1 0xffffb52dcb1b in g_malloc (/usr/lib64/libglib-2.0.so.0+0x58b1b)
   #2 0xffffb52f8143 in g_strdup (/usr/lib64/libglib-2.0.so.0+0x74143)
   #3 0xaaaac52447fb in migrate_params_test_apply (/usr/src/debug/qemu-4.1.0/migration/migration.c:1377)
   #4 0xaaaac52fdca7 in qmp_migrate_set_parameters (/usr/src/debug/qemu-4.1.0/qapi/qapi-commands-migration.c:192)
   #5 0xaaaac551d543 in qmp_dispatch (/usr/src/debug/qemu-4.1.0/qapi/qmp-dispatch.c:165)
   #6 0xaaaac52a0a8f in qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:125)
   #7 0xaaaac52a1c7f in monitor_qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:214)
   #8 0xaaaac55cb0cf in aio_bh_call (/usr/src/debug/qemu-4.1.0/util/async.c:117)
   #9 0xaaaac55d4543 in aio_bh_poll (/usr/src/debug/qemu-4.1.0/util/aio-posix.c:459)
   #10 0xaaaac55cae0f in aio_dispatch (/usr/src/debug/qemu-4.1.0/util/async.c:268)
   #11 0xffffb52d6a7b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a7b)
   #12 0xaaaac55d1e3b(/usr/bin/qemu-kvm-4.1.0+0x1622e3b)
   #13 0xaaaac4e314bb(/usr/bin/qemu-kvm-4.1.0+0xe824bb)
   #14 0xaaaac47f45ef(/usr/bin/qemu-kvm-4.1.0+0x8455ef)
   #15 0xffffb4bfef3f in __libc_start_main (/usr/lib64/libc.so.6+0x23f3f)
   #16 0xaaaac47ffacb(/usr/bin/qemu-kvm-4.1.0+0x850acb)

Direct leak of 2 byte(s) in 2 object(s) allocated from:
   #0 0xffffb597c20b in __interceptor_malloc (/usr/lib64/libasan.so.4+0xd320b)
   #1 0xffffb52dcb1b in g_malloc (/usr/lib64/libglib-2.0.so.0+0x58b1b)
   #2 0xffffb52f8143 in g_strdup (/usr/lib64/libglib-2.0.so.0+0x74143)
   #3 0xaaaac5244893 in migrate_params_test_apply (/usr/src/debug/qemu-4.1.0/migration/migration.c:1382)
   #4 0xaaaac52fdca7 in qmp_migrate_set_parameters (/usr/src/debug/qemu-4.1.0/qapi/qapi-commands-migration.c:192)
   #5 0xaaaac551d543 in qmp_dispatch (/usr/src/debug/qemu-4.1.0/qapi/qmp-dispatch.c)
   #6 0xaaaac52a0a8f in qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:125)
   #7 0xaaaac52a1c7f in monitor_qmp_dispatch (/usr/src/debug/qemu-4.1.0/monitor/qmp.c:214)
   #8 0xaaaac55cb0cf in aio_bh_call (/usr/src/debug/qemu-4.1.0/util/async.c:117)
   #9 0xaaaac55d4543 in aio_bh_poll (/usr/src/debug/qemu-4.1.0/util/aio-posix.c:459)
   #10 0xaaaac55cae0f in in aio_dispatch (/usr/src/debug/qemu-4.1.0/util/async.c:268)
   #11 0xffffb52d6a7b in g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0+0x52a7b)
   #12 0xaaaac55d1e3b(/usr/bin/qemu-kvm-4.1.0+0x1622e3b)
   #13 0xaaaac4e314bb(/usr/bin/qemu-kvm-4.1.0+0xe824bb)
   #14 0xaaaac47f45ef (/usr/bin/qemu-kvm-4.1.0+0x8455ef)
   #15 0xffffb4bfef3f in __libc_start_main (/usr/lib64/libc.so.6+0x23f3f)
   #16 0xaaaac47ffacb(/usr/bin/qemu-kvm-4.1.0+0x850acb)

Signed-off-by: Chuan Zheng <[email protected]>
Reviewed-by: KeQian Zhu <[email protected]>
Reviewed-by: HaiLiang <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Aug 12, 2020
bdrv_aio_cancel() calls aio_poll() on the AioContext for the given I/O
request until it has completed. ENOMEDIUM requests are special because
there is no BlockDriverState when the drive has no medium!

Define a .get_aio_context() function for BlkAioEmAIOCB requests so that
bdrv_aio_cancel() can find the AioContext where the completion BH is
pending. Without this function bdrv_aio_cancel() aborts on ENOMEDIUM
requests!

libFuzzer triggered the following assertion:

  cat << EOF | qemu-system-i386 -M pc-q35-5.0 \
    -nographic -monitor none -serial none \
    -qtest stdio -trace ide\*
  outl 0xcf8 0x8000fa24
  outl 0xcfc 0xe106c000
  outl 0xcf8 0x8000fa04
  outw 0xcfc 0x7
  outl 0xcf8 0x8000fb20
  write 0x0 0x3 0x2780e7
  write 0xe106c22c 0xd 0x1130c218021130c218021130c2
  write 0xe106c218 0x15 0x110010110010110010110010110010110010110010
  EOF
  ide_exec_cmd IDE exec cmd: bus 0x56170a77a2b8; state 0x56170a77a340; cmd 0xe7
  ide_reset IDEstate 0x56170a77a340
  Aborted (core dumped)

  (gdb) bt
  #1  0x00007ffff4f93895 in abort () at /lib64/libc.so.6
  #2  0x0000555555dc6c00 in bdrv_aio_cancel (acb=0x555556765550) at block/io.c:2745
  #3  0x0000555555dac202 in blk_aio_cancel (acb=0x555556765550) at block/block-backend.c:1546
  #4  0x0000555555b1bd74 in ide_reset (s=0x555557213340) at hw/ide/core.c:1318
  #5  0x0000555555b1e3a1 in ide_bus_reset (bus=0x5555572132b8) at hw/ide/core.c:2422
  #6  0x0000555555b2aa27 in ahci_reset_port (s=0x55555720eb50, port=2) at hw/ide/ahci.c:650
  #7  0x0000555555b29fd7 in ahci_port_write (s=0x55555720eb50, port=2, offset=44, val=16) at hw/ide/ahci.c:360
  #8  0x0000555555b2a564 in ahci_mem_write (opaque=0x55555720eb50, addr=556, val=16, size=1) at hw/ide/ahci.c:513
  #9  0x000055555598415b in memory_region_write_accessor (mr=0x55555720eb80, addr=556, value=0x7fffffffb838, size=1, shift=0, mask=255, attrs=...) at softmmu/memory.c:483

Looking at bdrv_aio_cancel:

2728 /* async I/Os */
2729
2730 void bdrv_aio_cancel(BlockAIOCB *acb)
2731 {
2732     qemu_aio_ref(acb);
2733     bdrv_aio_cancel_async(acb);
2734     while (acb->refcnt > 1) {
2735         if (acb->aiocb_info->get_aio_context) {
2736             aio_poll(acb->aiocb_info->get_aio_context(acb), true);
2737         } else if (acb->bs) {
2738             /* qemu_aio_ref and qemu_aio_unref are not thread-safe, so
2739              * assert that we're not using an I/O thread.  Thread-safe
2740              * code should use bdrv_aio_cancel_async exclusively.
2741              */
2742             assert(bdrv_get_aio_context(acb->bs) == qemu_get_aio_context());
2743             aio_poll(bdrv_get_aio_context(acb->bs), true);
2744         } else {
2745             abort();     <===============
2746         }
2747     }
2748     qemu_aio_unref(acb);
2749 }

Fixes: 02c50ef ("block: Add bdrv_aio_cancel_async")
Reported-by: Alexander Bulekov <[email protected]>
Buglink: https://bugs.launchpad.net/qemu/+bug/1878255
Originally-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Max Reitz <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Aug 12, 2020
To make deallocating partially constructed objects work, the
visit_type_STRUCT() need to succeed without doing anything when passed
a null object.

Commit cdd2b22 "qapi: Smooth visitor error checking in generated
code" broke that.  To reproduce, run tests/test-qobject-input-visitor
with AddressSanitizer:

    ==4353==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 16 byte(s) in 1 object(s) allocated from:
	#0 0x7f192d0c5d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
	#1 0x7f192cd21b10 in g_malloc0 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x51b10)
	#2 0x556725f6bbee in visit_next_list qapi/qapi-visit-core.c:86
	#3 0x556725f49e15 in visit_type_UserDefOneList tests/test-qapi-visit.c:474
	#4 0x556725f4489b in test_visitor_in_fail_struct_in_list tests/test-qobject-input-visitor.c:1086
	#5 0x7f192cd42f29  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x72f29)

    SUMMARY: AddressSanitizer: 16 byte(s) leaked in 1 allocation(s).

Test case /visitor/input/fail/struct-in-list feeds a list with a bad
element to the QObject input visitor.  Visiting that element duly
fails, and aborts the visit with the list only partially constructed:
the faulty object is null.  Cleaning up the partially constructed list
visits that null object, fails, and aborts the visit before the list
node gets freed.

Fix the the generated visit_type_STRUCT() to succeed for null objects.

Fixes: cdd2b22
Reported-by: Li Qiang <[email protected]>
Signed-off-by: Markus Armbruster <[email protected]>
Message-Id: <[email protected]>
Tested-by: Li Qiang <[email protected]>
Reviewed-by: Li Qiang <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Aug 12, 2020
It should be safe to reenter qio_channel_yield() on io/channel read/write
path, so it's safe to reduce in_flight and allow attaching new aio
context. And no problem to allow drain itself: connection attempt is
not a guest request. Moreover, if remote server is down, we can hang
in negotiation, blocking drain section and provoking a dead lock.

How to reproduce the dead lock:

1. Create nbd-fault-injector.conf with the following contents:

[inject-error "mega1"]
event=data
io=readwrite
when=before

2. In one terminal run nbd-fault-injector in a loop, like this:

n=1; while true; do
    echo $n; ((n++));
    ./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf;
done

3. In another terminal run qemu-io in a loop, like this:

n=1; while true; do
    echo $n; ((n++));
    ./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000;
done

After some time, qemu-io will hang trying to drain, for example, like
this:

 #3 aio_poll (ctx=0x55f006bdd890, blocking=true) at
    util/aio-posix.c:600
 #4 bdrv_do_drained_begin (bs=0x55f006bea710, recursive=false,
    parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:427
 #5 bdrv_drained_begin (bs=0x55f006bea710) at block/io.c:433
 #6 blk_drain (blk=0x55f006befc80) at block/block-backend.c:1710
 #7 blk_unref (blk=0x55f006befc80) at block/block-backend.c:498
 #8 bdrv_open_inherit (filename=0x7fffba1563bc
    "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x55f006be86d0,
    flags=24578, parent=0x0, child_class=0x0, child_role=0,
    errp=0x7fffba154620) at block.c:3491
 #9 bdrv_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000",
    reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at
    block.c:3513
 #10 blk_new_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000",
    reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at
    block/block-backend.c:421

And connection_co stack like this:

 #0 qemu_coroutine_switch (from_=0x55f006bf2650, to_=0x7fe96e07d918,
    action=COROUTINE_YIELD) at util/coroutine-ucontext.c:302
 #1 qemu_coroutine_yield () at util/qemu-coroutine.c:193
 #2 qio_channel_yield (ioc=0x55f006bb3c20, condition=G_IO_IN) at
    io/channel.c:472
 #3 qio_channel_readv_all_eof (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0,
    niov=1, errp=0x7fe96d729eb0) at io/channel.c:110
 #4 qio_channel_readv_all (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0,
    niov=1, errp=0x7fe96d729eb0) at io/channel.c:143
 #5 qio_channel_read_all (ioc=0x55f006bb3c20, buf=0x7fe96d729d28
    "\300.\366\004\360U", buflen=8, errp=0x7fe96d729eb0) at
    io/channel.c:247
 #6 nbd_read (ioc=0x55f006bb3c20, buffer=0x7fe96d729d28, size=8,
    desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at
    /work/src/qemu/master/include/block/nbd.h:365
 #7 nbd_read64 (ioc=0x55f006bb3c20, val=0x7fe96d729d28,
    desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at
    /work/src/qemu/master/include/block/nbd.h:391
 #8 nbd_start_negotiate (aio_context=0x55f006bdd890,
    ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0,
    outioc=0x55f006bf19f8, structured_reply=true,
    zeroes=0x7fe96d729dca, errp=0x7fe96d729eb0) at nbd/client.c:904
 #9 nbd_receive_negotiate (aio_context=0x55f006bdd890,
    ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0,
    outioc=0x55f006bf19f8, info=0x55f006bf1a00, errp=0x7fe96d729eb0) at
    nbd/client.c:1032
 #10 nbd_client_connect (bs=0x55f006bea710, errp=0x7fe96d729eb0) at
    block/nbd.c:1460
 #11 nbd_reconnect_attempt (s=0x55f006bf19f0) at block/nbd.c:287
 #12 nbd_co_reconnect_loop (s=0x55f006bf19f0) at block/nbd.c:309
 #13 nbd_connection_entry (opaque=0x55f006bf19f0) at block/nbd.c:360
 #14 coroutine_trampoline (i0=113190480, i1=22000) at
    util/coroutine-ucontext.c:173

Note, that the hang may be
triggered by another bug, so the whole case is fixed only together with
commit "block/nbd: on shutdown terminate connection attempt".

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Signed-off-by: Eric Blake <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Aug 25, 2020
After build qemu with '-fsanitize=address' extra-cflags,
'make check' show following leak:

=================================================================
==44580==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 2500 byte(s) in 1 object(s) allocated from:
    #0 0x7f1b5a8b8d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
    #1 0x7f1b5a514b10 in g_malloc0 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x51b10)
    #2 0xd79ea4e4c0ad31c3  (<unknown module>)

SUMMARY: AddressSanitizer: 2500 byte(s) leaked in 1 allocation(s).

Call 'g_rand_free' in the end of function to avoid this.

Fixes: 4d3a329("tests/util-sockets: add abstract unix socket cases")
Signed-off-by: Li Qiang <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by:  xiaoqiang zhao <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>
jcmvbkbc pushed a commit that referenced this issue May 18, 2021
Refactor qemu_xkeymap_mapping_table() to have a single exit point,
so we can easily free the memory allocated by XGetAtomName().

This fixes when running a binary configured with --enable-sanitizers:

  Direct leak of 22 byte(s) in 1 object(s) allocated from:
      #0 0x561344a7473f in malloc (qemu-system-x86_64+0x1dab73f)
      #1 0x7fa4d9dc08aa in XGetAtomName (/lib64/libX11.so.6+0x2a8aa)

Fixes: 2ec7870 ("ui: convert GTK and SDL1 frontends to keycodemapdb")
Reviewed-by: Daniel P. Berrangé <[email protected]>
Reviewed-by: Laurent Vivier <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>
jcmvbkbc pushed a commit that referenced this issue May 18, 2021
Running the WDR opcode triggers a segfault:

  $ cat > foo.S << EOF
  > __start:
  >     wdr
  > EOF
  $ avr-gcc -nostdlib -nostartfiles -mmcu=avr6 foo.S -o foo.elf
  $ qemu-system-avr -serial mon:stdio -nographic -no-reboot \
    -M mega -bios foo.elf -d in_asm --singlestep
  IN:
  0x00000000:  WDR
  Segmentation fault (core dumped)

  (gdb) bt
     #0  0x00005555add0b23a in gdb_get_cpu_pid (cpu=0x5555af5a4af0) at ../gdbstub.c:718
     #1  0x00005555add0b2dd in gdb_get_cpu_process (cpu=0x5555af5a4af0) at ../gdbstub.c:743
     #2  0x00005555add0e477 in gdb_set_stop_cpu (cpu=0x5555af5a4af0) at ../gdbstub.c:2742
     #3  0x00005555adc99b96 in cpu_handle_guest_debug (cpu=0x5555af5a4af0) at ../softmmu/cpus.c:306
     #4  0x00005555adcc66ab in rr_cpu_thread_fn (arg=0x5555af5a4af0) at ../accel/tcg/tcg-accel-ops-rr.c:224
     #5  0x00005555adefaf12 in qemu_thread_start (args=0x5555af5d9870) at ../util/qemu-thread-posix.c:521
     #6  0x00007f692d940ea5 in start_thread () from /lib64/libpthread.so.0
     #7  0x00007f692d6699fd in clone () from /lib64/libc.so.6

Since the watchdog peripheral is not implemented, simply
log the opcode as unimplemented and keep going.

Reported-by: Fred Konrad <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: KONRAD Frederic <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>
jcmvbkbc pushed a commit that referenced this issue May 21, 2021
This is a partial revert of commits 77542d4 and bc79c87.

Usually, an error during initialisation means that the configuration was
wrong. Reconnecting won't make the error go away, but just turn the
error condition into an endless loop. Avoid this and return errors
again.

Additionally, calling vhost_user_blk_disconnect() from the chardev event
handler could result in use-after-free because none of the
initialisation code expects that the device could just go away in the
middle. So removing the call fixes crashes in several places.

For example, using a num-queues setting that is incompatible with the
backend would result in a crash like this (dereferencing dev->opaque,
which is already NULL):

 #0  0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
 #1  0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
 #2  0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
 #3  0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
 #4  0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
 #5  0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
 #6  0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
 #7  0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
 #8  0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
 #9  0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660

Note that this removes the ability to reconnect during initialisation
(but not during operation) when there is no permanent error, but the
backend restarts, as the implementation was buggy. This feature can be
added back in a follow-up series after changing error paths to
distinguish cases where retrying could help from cases with permanent
errors.

Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Mar 29, 2022
Include the qtest reproducer provided by Alexander Bulekov
in https://gitlab.com/qemu-project/qemu/-/issues/542.
Without the previous commit, we get:

  $ make check-qtest-i386
  ...
  Running test tests/qtest/intel-hda-test
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==1580408==ERROR: AddressSanitizer: stack-overflow on address 0x7ffc3d566fe0
      #0 0x63d297cf in address_space_translate_internal softmmu/physmem.c:356
      #1 0x63d27260 in flatview_do_translate softmmu/physmem.c:499:15
      #2 0x63d27af5 in flatview_translate softmmu/physmem.c:565:15
      #3 0x63d4ce84 in flatview_write softmmu/physmem.c:2850:10
      #4 0x63d4cb18 in address_space_write softmmu/physmem.c:2950:18
      #5 0x63d4d387 in address_space_rw softmmu/physmem.c:2960:16
      #6 0x62ae12f2 in dma_memory_rw_relaxed include/sysemu/dma.h:89:12
      #7 0x62ae104a in dma_memory_rw include/sysemu/dma.h:132:12
      #8 0x62ae6157 in dma_memory_write include/sysemu/dma.h:173:12
      #9 0x62ae5ec0 in stl_le_dma include/sysemu/dma.h:275:1
      #10 0x62ae5ba2 in stl_le_pci_dma include/hw/pci/pci.h:871:1
      #11 0x62ad59a6 in intel_hda_response hw/audio/intel-hda.c:372:12
      #12 0x62ad2afb in hda_codec_response hw/audio/intel-hda.c:107:5
      #13 0x62aec4e1 in hda_audio_command hw/audio/hda-codec.c:655:5
      #14 0x62ae05d9 in intel_hda_send_command hw/audio/intel-hda.c:307:5
      #15 0x62adff54 in intel_hda_corb_run hw/audio/intel-hda.c:342:9
      #16 0x62adc13b in intel_hda_set_corb_wp hw/audio/intel-hda.c:548:5
      #17 0x62ae5942 in intel_hda_reg_write hw/audio/intel-hda.c:977:9
      #18 0x62ada10a in intel_hda_mmio_write hw/audio/intel-hda.c:1054:5
      #19 0x63d8f383 in memory_region_write_accessor softmmu/memory.c:492:5
      #20 0x63d8ecc1 in access_with_adjusted_size softmmu/memory.c:554:18
      #21 0x63d8d5d6 in memory_region_dispatch_write softmmu/memory.c:1504:16
      #22 0x63d5e85e in flatview_write_continue softmmu/physmem.c:2812:23
      #23 0x63d4d05b in flatview_write softmmu/physmem.c:2854:12
      #24 0x63d4cb18 in address_space_write softmmu/physmem.c:2950:18
      #25 0x63d4d387 in address_space_rw softmmu/physmem.c:2960:16
      #26 0x62ae12f2 in dma_memory_rw_relaxed include/sysemu/dma.h:89:12
      #27 0x62ae104a in dma_memory_rw include/sysemu/dma.h:132:12
      #28 0x62ae6157 in dma_memory_write include/sysemu/dma.h:173:12
      #29 0x62ae5ec0 in stl_le_dma include/sysemu/dma.h:275:1
      #30 0x62ae5ba2 in stl_le_pci_dma include/hw/pci/pci.h:871:1
      #31 0x62ad59a6 in intel_hda_response hw/audio/intel-hda.c:372:12
      #32 0x62ad2afb in hda_codec_response hw/audio/intel-hda.c:107:5
      #33 0x62aec4e1 in hda_audio_command hw/audio/hda-codec.c:655:5
      #34 0x62ae05d9 in intel_hda_send_command hw/audio/intel-hda.c:307:5
      #35 0x62adff54 in intel_hda_corb_run hw/audio/intel-hda.c:342:9
      #36 0x62adc13b in intel_hda_set_corb_wp hw/audio/intel-hda.c:548:5
      #37 0x62ae5942 in intel_hda_reg_write hw/audio/intel-hda.c:977:9
      #38 0x62ada10a in intel_hda_mmio_write hw/audio/intel-hda.c:1054:5
      #39 0x63d8f383 in memory_region_write_accessor softmmu/memory.c:492:5
      #40 0x63d8ecc1 in access_with_adjusted_size softmmu/memory.c:554:18
      #41 0x63d8d5d6 in memory_region_dispatch_write softmmu/memory.c:1504:16
      #42 0x63d5e85e in flatview_write_continue softmmu/physmem.c:2812:23
      #43 0x63d4d05b in flatview_write softmmu/physmem.c:2854:12
      #44 0x63d4cb18 in address_space_write softmmu/physmem.c:2950:18
      #45 0x63d4d387 in address_space_rw softmmu/physmem.c:2960:16
      #46 0x62ae12f2 in dma_memory_rw_relaxed include/sysemu/dma.h:89:12
      #47 0x62ae104a in dma_memory_rw include/sysemu/dma.h:132:12
      #48 0x62ae6157 in dma_memory_write include/sysemu/dma.h:173:12
      ...
  SUMMARY: AddressSanitizer: stack-overflow softmmu/physmem.c:356 in address_space_translate_internal
  ==1580408==ABORTING
  Broken pipe
  Aborted (core dumped)

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Acked-by: Thomas Huth <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Mar 29, 2022
The issue reported by OSS-Fuzz produces the following backtrace:

  ==447470==ERROR: AddressSanitizer: heap-buffer-overflow
  READ of size 1 at 0x61500002a080 thread T0
      #0 0x71766d47 in sdhci_read_dataport hw/sd/sdhci.c:474:18
      #1 0x7175f139 in sdhci_read hw/sd/sdhci.c:1022:19
      #2 0x721b937b in memory_region_read_accessor softmmu/memory.c:440:11
      #3 0x72171e51 in access_with_adjusted_size softmmu/memory.c:554:18
      #4 0x7216f47c in memory_region_dispatch_read1 softmmu/memory.c:1424:16
      #5 0x7216ebb9 in memory_region_dispatch_read softmmu/memory.c:1452:9
      #6 0x7212db5d in flatview_read_continue softmmu/physmem.c:2879:23
      #7 0x7212f958 in flatview_read softmmu/physmem.c:2921:12
      #8 0x7212f418 in address_space_read_full softmmu/physmem.c:2934:18
      #9 0x721305a9 in address_space_rw softmmu/physmem.c:2962:16
      #10 0x7175a392 in dma_memory_rw_relaxed include/sysemu/dma.h:89:12
      #11 0x7175a0ea in dma_memory_rw include/sysemu/dma.h:132:12
      #12 0x71759684 in dma_memory_read include/sysemu/dma.h:152:12
      #13 0x7175518c in sdhci_do_adma hw/sd/sdhci.c:823:27
      #14 0x7174bf69 in sdhci_data_transfer hw/sd/sdhci.c:935:13
      #15 0x7176aaa7 in sdhci_send_command hw/sd/sdhci.c:376:9
      #16 0x717629ee in sdhci_write hw/sd/sdhci.c:1212:9
      #17 0x72172513 in memory_region_write_accessor softmmu/memory.c:492:5
      #18 0x72171e51 in access_with_adjusted_size softmmu/memory.c:554:18
      #19 0x72170766 in memory_region_dispatch_write softmmu/memory.c:1504:16
      #20 0x721419ee in flatview_write_continue softmmu/physmem.c:2812:23
      #21 0x721301eb in flatview_write softmmu/physmem.c:2854:12
      #22 0x7212fca8 in address_space_write softmmu/physmem.c:2950:18
      #23 0x721d9a53 in qtest_process_command softmmu/qtest.c:727:9

A DMA descriptor is previously filled in RAM. An I/O access to the
device (frames #22 to #16) start the DMA engine (frame #13). The
engine fetch the descriptor and execute the request, which itself
accesses the SDHCI I/O registers (frame #1 and #0), triggering a
re-entrancy issue.

Fix by prohibit transactions from the DMA to devices. The DMA engine
is thus restricted to memories.

Reported-by: OSS-Fuzz (Issue 36391)
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/451
Message-Id: <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Mar 29, 2022
Include the qtest reproducer provided by Alexander Bulekov
in https://gitlab.com/qemu-project/qemu/-/issues/451. Without
the previous commit, we get:

  $ make check-qtest-i386
  ...
  Running test qtest-i386/fuzz-sdcard-test
  ==447470==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61500002a080 at pc 0x564c71766d48 bp 0x7ffc126c62b0 sp 0x7ffc126c62a8
  READ of size 1 at 0x61500002a080 thread T0
      #0 0x564c71766d47 in sdhci_read_dataport hw/sd/sdhci.c:474:18
      #1 0x564c7175f139 in sdhci_read hw/sd/sdhci.c:1022:19
      #2 0x564c721b937b in memory_region_read_accessor softmmu/memory.c:440:11
      #3 0x564c72171e51 in access_with_adjusted_size softmmu/memory.c:554:18
      #4 0x564c7216f47c in memory_region_dispatch_read1 softmmu/memory.c:1424:16
      #5 0x564c7216ebb9 in memory_region_dispatch_read softmmu/memory.c:1452:9
      #6 0x564c7212db5d in flatview_read_continue softmmu/physmem.c:2879:23
      #7 0x564c7212f958 in flatview_read softmmu/physmem.c:2921:12
      #8 0x564c7212f418 in address_space_read_full softmmu/physmem.c:2934:18
      #9 0x564c721305a9 in address_space_rw softmmu/physmem.c:2962:16
      #10 0x564c7175a392 in dma_memory_rw_relaxed include/sysemu/dma.h:89:12
      #11 0x564c7175a0ea in dma_memory_rw include/sysemu/dma.h:132:12
      #12 0x564c71759684 in dma_memory_read include/sysemu/dma.h:152:12
      #13 0x564c7175518c in sdhci_do_adma hw/sd/sdhci.c:823:27
      #14 0x564c7174bf69 in sdhci_data_transfer hw/sd/sdhci.c:935:13
      #15 0x564c7176aaa7 in sdhci_send_command hw/sd/sdhci.c:376:9
      #16 0x564c717629ee in sdhci_write hw/sd/sdhci.c:1212:9
      #17 0x564c72172513 in memory_region_write_accessor softmmu/memory.c:492:5
      #18 0x564c72171e51 in access_with_adjusted_size softmmu/memory.c:554:18
      #19 0x564c72170766 in memory_region_dispatch_write softmmu/memory.c:1504:16
      #20 0x564c721419ee in flatview_write_continue softmmu/physmem.c:2812:23
      #21 0x564c721301eb in flatview_write softmmu/physmem.c:2854:12
      #22 0x564c7212fca8 in address_space_write softmmu/physmem.c:2950:18
      #23 0x564c721d9a53 in qtest_process_command softmmu/qtest.c:727:9

  0x61500002a080 is located 0 bytes to the right of 512-byte region [0x615000029e80,0x61500002a080)
  allocated by thread T0 here:
      #0 0x564c708e1737 in __interceptor_calloc (qemu-system-i386+0x1e6a737)
      #1 0x7ff05567b5e0 in g_malloc0 (/lib64/libglib-2.0.so.0+0x5a5e0)
      #2 0x564c71774adb in sdhci_pci_realize hw/sd/sdhci-pci.c:36:5

  SUMMARY: AddressSanitizer: heap-buffer-overflow hw/sd/sdhci.c:474:18 in sdhci_read_dataport
  Shadow bytes around the buggy address:
    0x0c2a7fffd3c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c2a7fffd3d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c2a7fffd3e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c2a7fffd3f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c2a7fffd400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  =>0x0c2a7fffd410:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c2a7fffd420: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c2a7fffd430: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c2a7fffd440: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c2a7fffd450: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c2a7fffd460: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Heap left redzone:       fa
    Freed heap region:       fd
  ==447470==ABORTING
  Broken pipe
  ERROR qtest-i386/fuzz-sdcard-test - too few tests run (expected 3, got 2)

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Acked-by: Thomas Huth <[email protected]>
Message-Id: <[email protected]>
[thuth: Replaced "-m 4G" with "-m 512M"]
Signed-off-by: Thomas Huth <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Apr 20, 2022
Since commit 0439c5a ("block/block-backend.c: assertions for
block-backend") QEMU crashes when using Cocoa on Darwin hosts.

Example on macOS:

  $ qemu-system-i386
  Assertion failed: (qemu_in_main_thread()), function blk_all_next, file block-backend.c, line 552.
  Abort trap: 6

Looking with lldb:

  Assertion failed: (qemu_in_main_thread()), function blk_all_next, file block-backend.c, line 552.
  Process 76914 stopped
  * thread #1, queue = 'com.apple.main-thread', stop reason = hit program assert
     frame #4: 0x000000010057c2d4 qemu-system-i386`blk_all_next.cold.1
  at block-backend.c:552:5 [opt]
      549    */
      550   BlockBackend *blk_all_next(BlockBackend *blk)
      551   {
  --> 552       GLOBAL_STATE_CODE();
      553       return blk ? QTAILQ_NEXT(blk, link)
      554                  : QTAILQ_FIRST(&block_backends);
      555   }
  Target 1: (qemu-system-i386) stopped.

  (lldb) bt
  * thread #1, queue = 'com.apple.main-thread', stop reason = hit program assert
     frame #0: 0x00000001908c99b8 libsystem_kernel.dylib`__pthread_kill + 8
     frame #1: 0x00000001908fceb0 libsystem_pthread.dylib`pthread_kill + 288
     frame #2: 0x000000019083a314 libsystem_c.dylib`abort + 164
     frame #3: 0x000000019083972c libsystem_c.dylib`__assert_rtn + 300
   * frame #4: 0x000000010057c2d4 qemu-system-i386`blk_all_next.cold.1 at block-backend.c:552:5 [opt]
     frame #5: 0x00000001003c00b4 qemu-system-i386`blk_all_next(blk=<unavailable>) at block-backend.c:552:5 [opt]
     frame #6: 0x00000001003d8f04 qemu-system-i386`qmp_query_block(errp=0x0000000000000000) at qapi.c:591:16 [opt]
     frame #7: 0x000000010003ab0c qemu-system-i386`main [inlined] addRemovableDevicesMenuItems at cocoa.m:1756:21 [opt]
     frame #8: 0x000000010003ab04 qemu-system-i386`main(argc=<unavailable>, argv=<unavailable>) at cocoa.m:1980:5 [opt]
     frame #9: 0x00000001012690f4 dyld`start + 520

As we are in passed release 7.0 hard freeze, disable the block
backend assertion which, while being valuable during development,
is not helpful to users. We'll restore this assertion immediately
once 7.0 is released and work on a fix.

Suggested-by: Akihiko Odaki <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Akihiko Odaki <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Message-Id: <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Mar 2, 2023
Fixes the appended use-after-free. The root cause is that
during tb invalidation we use CPU_FOREACH, and therefore
to safely free a vCPU we must wait for an RCU grace period
to elapse.

$ x86_64-linux-user/qemu-x86_64 tests/tcg/x86_64-linux-user/munmap-pthread
=================================================================
==1800604==ERROR: AddressSanitizer: heap-use-after-free on address 0x62d0005f7418 at pc 0x5593da6704eb bp 0x7f4961a7ac70 sp 0x7f4961a7ac60
READ of size 8 at 0x62d0005f7418 thread T2
    #0 0x5593da6704ea in tb_jmp_cache_inval_tb ../accel/tcg/tb-maint.c:244
    #1 0x5593da6704ea in do_tb_phys_invalidate ../accel/tcg/tb-maint.c:290
    #2 0x5593da670631 in tb_phys_invalidate__locked ../accel/tcg/tb-maint.c:306
    #3 0x5593da670631 in tb_invalidate_phys_page_range__locked ../accel/tcg/tb-maint.c:542
    #4 0x5593da67106d in tb_invalidate_phys_range ../accel/tcg/tb-maint.c:614
    #5 0x5593da6a64d4 in target_munmap ../linux-user/mmap.c:766
    #6 0x5593da6dba05 in do_syscall1 ../linux-user/syscall.c:10105
    #7 0x5593da6f564c in do_syscall ../linux-user/syscall.c:13329
    #8 0x5593da49e80c in cpu_loop ../linux-user/x86_64/../i386/cpu_loop.c:233
    #9 0x5593da6be28c in clone_func ../linux-user/syscall.c:6633
    #10 0x7f496231cb42 in start_thread nptl/pthread_create.c:442
    #11 0x7f49623ae9ff  (/lib/x86_64-linux-gnu/libc.so.6+0x1269ff)

0x62d0005f7418 is located 28696 bytes inside of 32768-byte region [0x62d0005f0400,0x62d0005f8400)
freed by thread T148 here:
    #0 0x7f49627b6460 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52
    #1 0x5593da5ac057 in cpu_exec_unrealizefn ../cpu.c:180
    #2 0x5593da81f851  (/home/cota/src/qemu/build/qemu-x86_64+0x484851)

Signed-off-by: Emilio Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Mar 2, 2023
Fixes this tsan crash, easy to reproduce with any large enough program:

$ tests/unit/test-qht
1..2
ThreadSanitizer: CHECK failed: sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40) (tid=1821568)
    #0 __tsan::CheckUnwind() ../../../../src/libsanitizer/tsan/tsan_rtl.cpp:353 (libtsan.so.2+0x90034)
    #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:86 (libtsan.so.2+0xca555)
    #2 __sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, __sanitizer::BasicBitVector<unsigned long> > >::addLock(unsigned long, unsigned long, unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h:67 (libtsan.so.2+0xb3616)
    #3 __sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, __sanitizer::BasicBitVector<unsigned long> > >::addLock(unsigned long, unsigned long, unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h:59 (libtsan.so.2+0xb3616)
    #4 __sanitizer::DeadlockDetector<__sanitizer::TwoLevelBitVector<1ul, __sanitizer::BasicBitVector<unsigned long> > >::onLockAfter(__sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, __sanitizer::BasicBitVector<unsigned long> > >*, unsigned long, unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h:216 (libtsan.so.2+0xb3616)
    #5 __sanitizer::DD::MutexAfterLock(__sanitizer::DDCallback*, __sanitizer::DDMutex*, bool, bool) ../../../../src/libsanitizer/sanitizer_common/sanitizer_deadlock_detector1.cpp:169 (libtsan.so.2+0xb3616)
    #6 __tsan::MutexPostLock(__tsan::ThreadState*, unsigned long, unsigned long, unsigned int, int) ../../../../src/libsanitizer/tsan/tsan_rtl_mutex.cpp:200 (libtsan.so.2+0xa3382)
    #7 __tsan_mutex_post_lock ../../../../src/libsanitizer/tsan/tsan_interface_ann.cpp:384 (libtsan.so.2+0x76bc3)
    #8 qemu_spin_lock /home/cota/src/qemu/include/qemu/thread.h:259 (test-qht+0x44a97)
    #9 qht_map_lock_buckets ../util/qht.c:253 (test-qht+0x44a97)
    #10 do_qht_iter ../util/qht.c:809 (test-qht+0x45f33)
    #11 qht_iter ../util/qht.c:821 (test-qht+0x45f33)
    #12 iter_check ../tests/unit/test-qht.c:121 (test-qht+0xe473)
    #13 qht_do_test ../tests/unit/test-qht.c:202 (test-qht+0xe473)
    #14 qht_test ../tests/unit/test-qht.c:240 (test-qht+0xe7c1)
    #15 test_default ../tests/unit/test-qht.c:246 (test-qht+0xe828)
    #16 <null> <null> (libglib-2.0.so.0+0x7daed)
    #17 <null> <null> (libglib-2.0.so.0+0x7d80a)
    #18 <null> <null> (libglib-2.0.so.0+0x7d80a)
    #19 g_test_run_suite <null> (libglib-2.0.so.0+0x7dfe9)
    #20 g_test_run <null> (libglib-2.0.so.0+0x7e055)
    #21 main ../tests/unit/test-qht.c:259 (test-qht+0xd2c6)
    #22 __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 (libc.so.6+0x29d8f)
    #23 __libc_start_main_impl ../csu/libc-start.c:392 (libc.so.6+0x29e3f)
    #24 _start <null> (test-qht+0xdb44)

Signed-off-by: Emilio Cota <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Alex Bennée <[email protected]>
Message-Id: <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Mar 2, 2023
address_space_map() can fail:

  uart:~$ hash test
  sha256_test
  tv[0]:
  Segmentation fault: 11
  Thread 3 "qemu-system-arm" received signal SIGSEGV, Segmentation fault.
  gen_acc_mode_iov (req_len=0x7ffff18b7778, id=<optimized out>, iov=0x7ffff18b7780, s=0x555556ce0bd0)
      at ../hw/misc/aspeed_hace.c:171
  171         if (has_padding(s, &iov[id], *req_len, &total_msg_len, &pad_offset)) {
  (gdb) bt
  #0  gen_acc_mode_iov (req_len=0x7ffff18b7778, id=<optimized out>, iov=0x7ffff18b7780, s=0x555556ce0bd0)
      at ../hw/misc/aspeed_hace.c:171
  #1  do_hash_operation (s=s@entry=0x555556ce0bd0, algo=3, sg_mode=sg_mode@entry=true, acc_mode=acc_mode@entry=true)
      at ../hw/misc/aspeed_hace.c:224
  #2  0x00005555559bdbb8 in aspeed_hace_write (opaque=<optimized out>, addr=12, data=262488, size=<optimized out>)
      at ../hw/misc/aspeed_hace.c:358

This change doesn't fix much, but at least the guest
can't crash QEMU anymore. Instead it is still usable:

  uart:~$ hash test
  sha256_test
  tv[0]:hash_final error
  sha384_test
  tv[0]:hash_final error
  sha512_test
  tv[0]:hash_final error
  [00:00:06.278,000] <err> hace_global: HACE poll timeout
  [00:00:09.324,000] <err> hace_global: HACE poll timeout
  [00:00:12.261,000] <err> hace_global: HACE poll timeout
  uart:~$

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Peter Delevoryas <[email protected]>
Reviewed-by: Cédric Le Goater <[email protected]>
Signed-off-by: Cédric Le Goater <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Mar 2, 2023
When compiling QEMU with "--enable-sanitizers --enable-xkbcommon --cc=clang"
there is a memory leak warning when running qemu-keymap:

 $ ./qemu-keymap -f pc-bios/keymaps/de -l de

 =================================================================
 ==610321==ERROR: LeakSanitizer: detected memory leaks

 Direct leak of 136 byte(s) in 1 object(s) allocated from:
     #0 0x5642830d0820 in __interceptor_calloc.part.11 asan_malloc_linux.cpp.o
     #1 0x7f31873b8d2b in xkb_state_new (/lib64/libxkbcommon.so.0+0x1dd2b) (BuildId: dd32581e2248833243f3f646324ae9b98469f025)

 SUMMARY: AddressSanitizer: 136 byte(s) leaked in 1 allocation(s).

It can be silenced by properly releasing the "state" again
after it has been used.

Message-Id: <[email protected]>
Reviewed-by: Peter Maydell <[email protected]>
Signed-off-by: Thomas Huth <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Mar 14, 2023
After live migration with virtio block device, qemu crash at:

	#0  0x000055914f46f795 in object_dynamic_cast_assert (obj=0x559151b7b090, typename=0x55914f80fbc4 "qio-channel", file=0x55914f80fb90 "/images/testvfe/sw/qemu.gerrit/include/io/channel.h", line=30, func=0x55914f80fcb8 <__func__.17257> "QIO_CHANNEL") at ../qom/object.c:872
	#1  0x000055914f480d68 in QIO_CHANNEL (obj=0x559151b7b090) at /images/testvfe/sw/qemu.gerrit/include/io/channel.h:29
	#2  0x000055914f4812f8 in qio_net_listener_set_client_func_full (listener=0x559151b7a720, func=0x55914f580b97 <tcp_chr_accept>, data=0x5591519f4ea0, notify=0x0, context=0x0) at ../io/net-listener.c:166
	#3  0x000055914f580059 in tcp_chr_update_read_handler (chr=0x5591519f4ea0) at ../chardev/char-socket.c:637
	#4  0x000055914f583dca in qemu_chr_be_update_read_handlers (s=0x5591519f4ea0, context=0x0) at ../chardev/char.c:226
	#5  0x000055914f57b7c9 in qemu_chr_fe_set_handlers_full (b=0x559152bf23a0, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=false, sync_state=true) at ../chardev/char-fe.c:279
	#6  0x000055914f57b86d in qemu_chr_fe_set_handlers (b=0x559152bf23a0, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=false) at ../chardev/char-fe.c:304
	#7  0x000055914f378caf in vhost_user_async_close (d=0x559152bf21a0, chardev=0x559152bf23a0, vhost=0x559152bf2420, cb=0x55914f2fb8c1 <vhost_user_blk_disconnect>) at ../hw/virtio/vhost-user.c:2725
	#8  0x000055914f2fba40 in vhost_user_blk_event (opaque=0x559152bf21a0, event=CHR_EVENT_CLOSED) at ../hw/block/vhost-user-blk.c:395
	#9  0x000055914f58388c in chr_be_event (s=0x5591519f4ea0, event=CHR_EVENT_CLOSED) at ../chardev/char.c:61
	#10 0x000055914f583905 in qemu_chr_be_event (s=0x5591519f4ea0, event=CHR_EVENT_CLOSED) at ../chardev/char.c:81
	#11 0x000055914f581275 in char_socket_finalize (obj=0x5591519f4ea0) at ../chardev/char-socket.c:1083
	#12 0x000055914f46f073 in object_deinit (obj=0x5591519f4ea0, type=0x5591519055c0) at ../qom/object.c:680
	#13 0x000055914f46f0e5 in object_finalize (data=0x5591519f4ea0) at ../qom/object.c:694
	#14 0x000055914f46ff06 in object_unref (objptr=0x5591519f4ea0) at ../qom/object.c:1202
	#15 0x000055914f4715a4 in object_finalize_child_property (obj=0x559151b76c50, name=0x559151b7b250 "char3", opaque=0x5591519f4ea0) at ../qom/object.c:1747
	#16 0x000055914f46ee86 in object_property_del_all (obj=0x559151b76c50) at ../qom/object.c:632
	#17 0x000055914f46f0d2 in object_finalize (data=0x559151b76c50) at ../qom/object.c:693
	#18 0x000055914f46ff06 in object_unref (objptr=0x559151b76c50) at ../qom/object.c:1202
	#19 0x000055914f4715a4 in object_finalize_child_property (obj=0x559151b6b560, name=0x559151b76630 "chardevs", opaque=0x559151b76c50) at ../qom/object.c:1747
	#20 0x000055914f46ef67 in object_property_del_child (obj=0x559151b6b560, child=0x559151b76c50) at ../qom/object.c:654
	#21 0x000055914f46f042 in object_unparent (obj=0x559151b76c50) at ../qom/object.c:673
	#22 0x000055914f58632a in qemu_chr_cleanup () at ../chardev/char.c:1189
	#23 0x000055914f16c66c in qemu_cleanup () at ../softmmu/runstate.c:830
	#24 0x000055914eee7b9e in qemu_default_main () at ../softmmu/main.c:38
	#25 0x000055914eee7bcc in main (argc=86, argv=0x7ffc97cb8d88) at ../softmmu/main.c:48

In char_socket_finalize after s->listener freed, event callback function
vhost_user_blk_event will be called to handle CHR_EVENT_CLOSED.
vhost_user_blk_event is calling qio_net_listener_set_client_func_full which
is still using s->listener.

Setting s->listener = NULL after object_unref(OBJECT(s->listener)) can
solve this issue.

Signed-off-by: Yajun Wu <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Mar 17, 2023
This leakage can be seen through test-io-channel-tls:

$ ../configure --target-list=aarch64-softmmu --enable-sanitizers
$ make ./tests/unit/test-io-channel-tls
$ ./tests/unit/test-io-channel-tls

Indirect leak of 104 byte(s) in 1 object(s) allocated from:
    #0 0x7f81d1725808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x7f81d135ae98 in g_malloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57e98)
    #2 0x55616c5d4c1b in object_new_with_propv ../qom/object.c:795
    #3 0x55616c5d4a83 in object_new_with_props ../qom/object.c:768
    #4 0x55616c5c5415 in test_tls_creds_create ../tests/unit/test-io-channel-tls.c:70
    #5 0x55616c5c5a6b in test_io_channel_tls ../tests/unit/test-io-channel-tls.c:158
    #6 0x7f81d137d58d  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a58d)

Indirect leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7f81d1725a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
    #1 0x7f81d1472a20 in gnutls_dh_params_init (/lib/x86_64-linux-gnu/libgnutls.so.30+0x46a20)
    #2 0x55616c6485ff in qcrypto_tls_creds_x509_load ../crypto/tlscredsx509.c:634
    #3 0x55616c648ba2 in qcrypto_tls_creds_x509_complete ../crypto/tlscredsx509.c:694
    #4 0x55616c5e1fea in user_creatable_complete ../qom/object_interfaces.c:28
    #5 0x55616c5d4c8c in object_new_with_propv ../qom/object.c:807
    #6 0x55616c5d4a83 in object_new_with_props ../qom/object.c:768
    #7 0x55616c5c5415 in test_tls_creds_create ../tests/unit/test-io-channel-tls.c:70
    #8 0x55616c5c5a6b in test_io_channel_tls ../tests/unit/test-io-channel-tls.c:158
    #9 0x7f81d137d58d  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a58d)

...

SUMMARY: AddressSanitizer: 49143 byte(s) leaked in 184 allocation(s).

The docs for `g_source_add_child_source(source, child_source)` says
"source will hold a reference on child_source while child_source is
attached to it." Therefore, we should unreference the child source at
`qio_channel_tls_read_watch()` after attaching it to `source`. With this
change, ./tests/unit/test-io-channel-tls shows no leakages.

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Matheus Tavares Bernardino <[email protected]>
Signed-off-by: Daniel P. Berrangé <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Mar 17, 2023
xbzrle_encode_buffer_avx512() checks for overflows too scarcely in its
outer loop, causing out-of-bounds writes:

$ ../configure --target-list=aarch64-softmmu --enable-sanitizers --enable-avx512bw
$ make tests/unit/test-xbzrle && ./tests/unit/test-xbzrle

==5518==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100000b100 at pc 0x561109a7714d bp 0x7ffed712a440 sp 0x7ffed712a430
WRITE of size 1 at 0x62100000b100 thread T0
    #0 0x561109a7714c in uleb128_encode_small ../util/cutils.c:831
    #1 0x561109b67f6a in xbzrle_encode_buffer_avx512 ../migration/xbzrle.c:275
    #2 0x5611099a7428 in test_encode_decode_overflow ../tests/unit/test-xbzrle.c:153
    #3 0x7fb2fb65a58d  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a58d)
    #4 0x7fb2fb65a333  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a333)
    #5 0x7fb2fb65aa79 in g_test_run_suite (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7aa79)
    #6 0x7fb2fb65aa94 in g_test_run (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7aa94)
    #7 0x5611099a3a23 in main ../tests/unit/test-xbzrle.c:218
    #8 0x7fb2fa78c082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
    #9 0x5611099a608d in _start (/qemu/build/tests/unit/test-xbzrle+0x28408d)

0x62100000b100 is located 0 bytes to the right of 4096-byte region [0x62100000a100,0x62100000b100)
allocated by thread T0 here:
    #0 0x7fb2fb823a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
    #1 0x7fb2fb637ef0 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57ef0)

Fix that by performing the overflow check in the inner loop, instead.

Signed-off-by: Matheus Tavares Bernardino <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Mar 22, 2023
For ex, when resetting the xlnx-zcu102 machine:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason =
EXC_BAD_ACCESS (code=1, address=0x50)
   * frame #0: 0x10020a740 gd_vc_send_chars(vc=0x000000000) at
gtk.c:1759:41 [opt]
     frame #1: 0x100636264 qemu_chr_fe_accept_input(be=<unavailable>) at
char-fe.c:159:9 [opt]
     frame #2: 0x1000608e0 cadence_uart_reset_hold [inlined]
uart_rx_reset(s=0x10810a960) at cadence_uart.c:158:5 [opt]
     frame #3: 0x1000608d4 cadence_uart_reset_hold(obj=0x10810a960) at
cadence_uart.c:530:5 [opt]
     frame #4: 0x100580ab4 resettable_phase_hold(obj=0x10810a960,
opaque=0x000000000, type=<unavailable>) at resettable.c:0 [opt]
     frame #5: 0x10057d1b0 bus_reset_child_foreach(obj=<unavailable>,
cb=(resettable_phase_hold at resettable.c:162), opaque=0x000000000,
type=RESET_TYPE_COLD) at bus.c:97:13 [opt]
     frame #6: 0x1005809f8 resettable_phase_hold [inlined]
resettable_child_foreach(rc=0x000060000332d2c0, obj=0x0000600002c1c180,
cb=<unavailable>, opaque=0x000000000, type=RESET_TYPE_COLD) at
resettable.c:96:9 [opt]
     frame #7: 0x1005809d8 resettable_phase_hold(obj=0x0000600002c1c180,
opaque=0x000000000, type=RESET_TYPE_COLD) at resettable.c:173:5 [opt]
     frame #8: 0x1005803a0
resettable_assert_reset(obj=0x0000600002c1c180, type=<unavailable>) at
resettable.c:60:5 [opt]
     frame #9: 0x10058027c resettable_reset(obj=0x0000600002c1c180,
type=RESET_TYPE_COLD) at resettable.c:45:5 [opt]

While the chardev is created early, the VirtualConsole is associated
after, during qemu_init_displays().

Signed-off-by: Marc-André Lureau <[email protected]>
Tested-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Apr 5, 2023
blk_get_geometry() eventually calls bdrv_nb_sectors(), which is a
co_wrapper_mixed_bdrv_rdlock. This means that when it is called from
coroutine context, it already assume to have the graph locked.

However, virtio_blk_sect_range_ok() in block/export/virtio-blk-handler.c
(used by vhost-user-blk and VDUSE exports) runs in a coroutine, but
doesn't take the graph lock - blk_*() functions are generally expected
to do that internally. This causes an assertion failure when accessing
an export for the first time if it runs in an iothread.

This is an example of the crash:

  $ ./storage-daemon/qemu-storage-daemon --object iothread,id=th0 --blockdev file,filename=/home/kwolf/images/hd.img,node-name=disk --export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.sock,node-name=disk,id=exp0,iothread=th0
  qemu-storage-daemon: ../block/graph-lock.c:268: void assert_bdrv_graph_readable(void): Assertion `qemu_in_main_thread() || reader_count()' failed.

  (gdb) bt
  #0  0x00007ffff6eafe5c in __pthread_kill_implementation () from /lib64/libc.so.6
  #1  0x00007ffff6e5fa76 in raise () from /lib64/libc.so.6
  #2  0x00007ffff6e497fc in abort () from /lib64/libc.so.6
  #3  0x00007ffff6e4971b in __assert_fail_base.cold () from /lib64/libc.so.6
  #4  0x00007ffff6e58656 in __assert_fail () from /lib64/libc.so.6
  #5  0x00005555556337a3 in assert_bdrv_graph_readable () at ../block/graph-lock.c:268
  #6  0x00005555555fd5a2 in bdrv_co_nb_sectors (bs=0x5555564c5ef0) at ../block.c:5847
  #7  0x00005555555ee949 in bdrv_nb_sectors (bs=0x5555564c5ef0) at block/block-gen.c:256
  #8  0x00005555555fd6b9 in bdrv_get_geometry (bs=0x5555564c5ef0, nb_sectors_ptr=0x7fffef7fedd0) at ../block.c:5884
  #9  0x000055555562ad6d in blk_get_geometry (blk=0x5555564cb200, nb_sectors_ptr=0x7fffef7fedd0) at ../block/block-backend.c:1624
  #10 0x00005555555ddb74 in virtio_blk_sect_range_ok (blk=0x5555564cb200, block_size=512, sector=0, size=512) at ../block/export/virtio-blk-handler.c:44
  #11 0x00005555555dd80d in virtio_blk_process_req (handler=0x5555564cbb98, in_iov=0x7fffe8003830, out_iov=0x7fffe8003860, in_num=1, out_num=0) at ../block/export/virtio-blk-handler.c:189
  #12 0x00005555555dd546 in vu_blk_virtio_process_req (opaque=0x7fffe8003800) at ../block/export/vhost-user-blk-server.c:66
  #13 0x00005555557bf4a1 in coroutine_trampoline (i0=-402635264, i1=32767) at ../util/coroutine-ucontext.c:177
  #14 0x00007ffff6e75c20 in ?? () from /lib64/libc.so.6
  #15 0x00007fffefffa870 in ?? ()
  #16 0x0000000000000000 in ?? ()

Fix this by creating a new blk_co_get_geometry() that takes the lock,
and changing blk_get_geometry() to be a co_wrapper_mixed around it.

To make the resulting code cleaner, virtio-blk-handler.c can directly
call the coroutine version now (though that wouldn't be necessary for
fixing the bug, taking the lock in blk_co_get_geometry() is what fixes
it).

Fixes: 8ab8140
Reported-by: Lukáš Doktor <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Emanuele Giuseppe Esposito <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Nov 19, 2023
"blob" resources don't have an associated pixman image:

#0  pixman_image_get_stride (image=0x0) at ../pixman/pixman-image.c:921
#1  0x0000562327c25236 in virtio_gpu_save (f=0x56232bb13b00, opaque=0x56232b555a60, size=0, field=0x5623289ab6c8 <__compound_literal.3+104>, vmdesc=0x56232ab59fe0) at ../hw/display/virtio-gpu.c:1225

Related to:
https://bugzilla.redhat.com/show_bug.cgi?id=2236353

Signed-off-by: Marc-André Lureau <[email protected]>
Acked-by: Peter Xu <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Nov 19, 2023
If there is a pending DMA operation during ide_bus_reset(), the fact
that the IDEState is already reset before the operation is canceled
can be problematic. In particular, ide_dma_cb() might be called and
then use the reset IDEState which contains the signature after the
reset. When used to construct the IO operation this leads to
ide_get_sector() returning 0 and nsector being 1. This is particularly
bad, because a write command will thus destroy the first sector which
often contains a partition table or similar.

Traces showing the unsolicited write happening with IDEState
0x5595af6949d0 being used after reset:

> ahci_port_write ahci(0x5595af6923f0)[0]: port write [reg:PxSCTL] @ 0x2c: 0x00000300
> ahci_reset_port ahci(0x5595af6923f0)[0]: reset port
> ide_reset IDEstate 0x5595af6949d0
> ide_reset IDEstate 0x5595af694da8
> ide_bus_reset_aio aio_cancel
> dma_aio_cancel dbs=0x7f64600089a0
> dma_blk_cb dbs=0x7f64600089a0 ret=0
> dma_complete dbs=0x7f64600089a0 ret=0 cb=0x5595acd40b30
> ahci_populate_sglist ahci(0x5595af6923f0)[0]
> ahci_dma_prepare_buf ahci(0x5595af6923f0)[0]: prepare buf limit=512 prepared=512
> ide_dma_cb IDEState 0x5595af6949d0; sector_num=0 n=1 cmd=DMA WRITE
> dma_blk_io dbs=0x7f6420802010 bs=0x5595ae2c6c30 offset=0 to_dev=1
> dma_blk_cb dbs=0x7f6420802010 ret=0

> (gdb) p *qiov
> $11 = {iov = 0x7f647c76d840, niov = 1, {{nalloc = 1, local_iov = {iov_base = 0x0,
>       iov_len = 512}}, {__pad = "\001\000\000\000\000\000\000\000\000\000\000",
>       size = 512}}}
> (gdb) bt
> #0  blk_aio_pwritev (blk=0x5595ae2c6c30, offset=0, qiov=0x7f6420802070, flags=0,
>     cb=0x5595ace6f0b0 <dma_blk_cb>, opaque=0x7f6420802010)
>     at ../block/block-backend.c:1682
> #1  0x00005595ace6f185 in dma_blk_cb (opaque=0x7f6420802010, ret=<optimized out>)
>     at ../softmmu/dma-helpers.c:179
> #2  0x00005595ace6f778 in dma_blk_io (ctx=0x5595ae0609f0,
>     sg=sg@entry=0x5595af694d00, offset=offset@entry=0, align=align@entry=512,
>     io_func=io_func@entry=0x5595ace6ee30 <dma_blk_write_io_func>,
>     io_func_opaque=io_func_opaque@entry=0x5595ae2c6c30,
>     cb=0x5595acd40b30 <ide_dma_cb>, opaque=0x5595af6949d0,
>     dir=DMA_DIRECTION_TO_DEVICE) at ../softmmu/dma-helpers.c:244
> #3  0x00005595ace6f90a in dma_blk_write (blk=0x5595ae2c6c30,
>     sg=sg@entry=0x5595af694d00, offset=offset@entry=0, align=align@entry=512,
>     cb=cb@entry=0x5595acd40b30 <ide_dma_cb>, opaque=opaque@entry=0x5595af6949d0)
>     at ../softmmu/dma-helpers.c:280
> #4  0x00005595acd40e18 in ide_dma_cb (opaque=0x5595af6949d0, ret=<optimized out>)
>     at ../hw/ide/core.c:953
> #5  0x00005595ace6f319 in dma_complete (ret=0, dbs=0x7f64600089a0)
>     at ../softmmu/dma-helpers.c:107
> #6  dma_blk_cb (opaque=0x7f64600089a0, ret=0) at ../softmmu/dma-helpers.c:127
> #7  0x00005595ad12227d in blk_aio_complete (acb=0x7f6460005b10)
>     at ../block/block-backend.c:1527
> #8  blk_aio_complete (acb=0x7f6460005b10) at ../block/block-backend.c:1524
> #9  blk_aio_write_entry (opaque=0x7f6460005b10) at ../block/block-backend.c:1594
> #10 0x00005595ad258cfb in coroutine_trampoline (i0=<optimized out>,
>     i1=<optimized out>) at ../util/coroutine-ucontext.c:177

Signed-off-by: Fiona Ebner <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Tested-by: [email protected]
Message-ID: <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
jcmvbkbc pushed a commit that referenced this issue Nov 22, 2023
On LoongArch host,  we got an Aborted from tcg_out_mov().

qemu-x86_64 configure with '--enable-debug'.

> (gdb) b /home1/gaosong/code/qemu/tcg/loongarch64/tcg-target.c.inc:312
> Breakpoint 1 at 0x2576f0: file /home1/gaosong/code/qemu/tcg/loongarch64/tcg-target.c.inc, line 312.
> (gdb) run hello
[...]
> Thread 1 "qemu-x86_64" hit Breakpoint 1, tcg_out_mov (s=0xaaaae91760 <tcg_init_ctx>, type=TCG_TYPE_V128, ret=TCG_REG_V2,
>     arg=TCG_REG_V0) at /home1/gaosong/code/qemu/tcg/loongarch64/tcg-target.c.inc:312
> 312           g_assert_not_reached();
> (gdb) bt
> #0  tcg_out_mov (s=0xaaaae91760 <tcg_init_ctx>, type=TCG_TYPE_V128, ret=TCG_REG_V2, arg=TCG_REG_V0)
>     at /home1/gaosong/code/qemu/tcg/loongarch64/tcg-target.c.inc:312
> #1  0x000000aaaad0fee0 in tcg_reg_alloc_mov (s=0xaaaae91760 <tcg_init_ctx>, op=0xaaaaf67c20) at ../tcg/tcg.c:4632
> #2  0x000000aaaad142f4 in tcg_gen_code (s=0xaaaae91760 <tcg_init_ctx>, tb=0xffe8030340 <code_gen_buffer+197328>,
>     pc_start=4346094) at ../tcg/tcg.c:6135
[...]
> (gdb) c
> Continuing.
> **
> ERROR:/home1/gaosong/code/qemu/tcg/loongarch64/tcg-target.c.inc:312:tcg_out_mov: code should not be reached
> Bail out! ERROR:/home1/gaosong/code/qemu/tcg/loongarch64/tcg-target.c.inc:312:tcg_out_mov: code should not be reached
>
> Thread 1 "qemu-x86_64" received signal SIGABRT, Aborted.
> 0x000000fff7b1c390 in raise () from /lib64/libc.so.6
> (gdb) q

Fixes: 16288de ("tcg/loongarch64: Lower basic tcg vec ops to LSX")
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Richard Henderson <[email protected]>
Signed-off-by: Song Gao <[email protected]>
Message-Id: <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants