Description of problem:
Trying to run a Windows 10 VM from virt-manager (qemu-based virtualization) - after a few seconds the VM stops with state 'Shutdown (Crashed)'. Until the last week or two this VM worked without any problems. I tried downgrading libvirt, qemu, and the kernel, but I can't pin down at which level the problem appears. Ubuntu 20.04 running as a Live CD has no problem.

Version-Release number of selected component:
qemu-system-x86-core-2:6.0.0-6.fc35

Additional info:
reporter: libreport-2.15.2
backtrace_rating: 4
cgroup: 0::/machine.slice/machine-qemu\x2d6\x2dwin10.scope/libvirt/emulator
cmdline: /usr/bin/qemu-system-x86_64 -name guest=win10,debug-threads=on -S -object $'{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-6-win10/master-key.aes"}' -machine pc-q35-3.1,accel=kvm,usb=off,vmport=off,dump-guest-core=off,memory-backend=pc.ram -cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff -m 1024 -object $'{"qom-type":"memory-backend-ram","id":"pc.ram","size":1073741824}' -overcommit mem-lock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 6c87098b-da4a-4415-ae9c-06872147750a -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=36,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-pci-bridge,id=pci.6,bus=pci.4,addr=0x0 -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 -device lsi,id=scsi0,bus=pci.6,addr=0x1 -blockdev $'{"driver":"file","filename":"/home/user/.local/share/gnome-boxes/images/win10","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' -blockdev $'{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' -device ide-hd,bus=ide.0,drive=libvirt-1-format,id=sata0-0-0,bootindex=1 -netdev tap,fd=39,id=hostnet0 -device e1000e,netdev=hostnet0,id=net0,mac=52:54:00:e5:c3:d3,bus=pci.1,addr=0x0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0,bus=usb.0,port=1 -audiodev id=audio1,driver=spice -vnc 127.0.0.1:0,audiodev=audio1 -device virtio-vga,id=video0,virgl=off,max_outputs=1,bus=pcie.0,addr=0x1 -device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0,audiodev=audio1 -device virtio-balloon-pci,id=balloon0,bus=pci.3,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
crash_function: __libc_scratch_buffer_grow
dso_list: /usr/bin/qemu-system-x86_64 qemu-system-x86-core-2:6.0.0-6.fc35.x86_64 (Fedora Project) 1624446818
executable: /usr/bin/qemu-system-x86_64
journald_cursor: s=452c1c6d97264aa68b749c09e33cabeb;i=1f7235;b=7a4b111a86db4f23a4c94b8b6eae9d79;m=f20f5698;t=5c56e1dee3d5d;x=bfa1f2772e4d355e
kernel: 5.13.0-0.rc7.51.fc35.x86_64
rootdir: /
runlevel: N 5
type: CCpp
uid: 107
Created attachment 1793790 [details] File: backtrace
Created attachment 1793791 [details] File: core_backtrace
Created attachment 1793792 [details] File: cpuinfo
Created attachment 1793793 [details] File: environ
Created attachment 1793794 [details] File: exploitable
Created attachment 1793796 [details] File: limits
Created attachment 1793797 [details] File: maps
Created attachment 1793798 [details] File: mountinfo
Created attachment 1793799 [details] File: open_fds
Created attachment 1793800 [details] File: proc_pid_status
Seems to be a crash deep in glibc. Does downgrading glibc help?
Hi, I just downgraded glibc to glibc-2.33.9000-13 - the problem still occurs.
I'm hitting this problem now:

qemu-system-x86-6.0.0-7.fc35.x86_64
glibc-2.33.9000-34.fc35.x86_64

The stack trace is ridiculously deep, with the frames between __libc_scratch_buffer_grow repeating until it crashes.

#34009 0x00007ff6c4664d65 in __libc_scratch_buffer_grow () at /lib64/libc.so.6
#34010 0x00007ff6c46d3099 in get_nprocs () at /lib64/libc.so.6
#34011 0x00007ff6c465f7da in arena_get2.part () at /lib64/libc.so.6
#34012 0x00007ff6c4661659 in tcache_init.part () at /lib64/libc.so.6
#34013 0x00007ff6c4661a0e in malloc () at /lib64/libc.so.6
#34014 0x00007ff6c4664d65 in __libc_scratch_buffer_grow () at /lib64/libc.so.6
#34015 0x00007ff6c46d3099 in get_nprocs () at /lib64/libc.so.6
#34016 0x00007ff6c465f7da in arena_get2.part () at /lib64/libc.so.6
#34017 0x00007ff6c4661659 in tcache_init.part () at /lib64/libc.so.6
#34018 0x00007ff6c4662007 in free () at /lib64/libc.so.6
#34019 0x00007ff6c4c1824d in g_free () at /lib64/libglib-2.0.so.0
#34020 0x000055ac6f6473a6 in qemu_thread_start ()
#34021 0x00007ff6c4651197 in start_thread () at /lib64/libc.so.6
#34022 0x00007ff6c46d6e54 in clone () at /lib64/libc.so.6

I'll see if I can get some debuginfo installed to get more information. But at the moment I'd say it looks like either memory corruption or a bug in glibc.
The crash happens here (although this is probably just because the stack limit has been hit):

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007ff6c46d305f in scratch_buffer_init (buffer=0x7ff6598f5e80) at ../include/scratch_buffer.h:78
78        buffer->length = sizeof (buffer->__space);

The top of the stack with symbols looks like:

#54415 0x00007ff6c4664d65 in __GI___libc_scratch_buffer_grow (buffer=buffer@entry=0x7ff65a0f3d20) at scratch_buffer_grow.c:37
#54416 0x00007ff6c46d3099 in scratch_buffer_grow (buffer=0x7ff65a0f3d20) at ../include/scratch_buffer.h:101
#54417 __GI___get_nprocs () at ../sysdeps/unix/sysv/linux/getsysstats.c:44
#54418 0x00007ff6c465f7da in arena_get2 (size=size@entry=640, avoid_arena=avoid_arena@entry=0x0) at /usr/src/debug/glibc-2.33.9000-34.fc35.x86_64/malloc/arena.c:902
#54419 0x00007ff6c4661659 in arena_get2 (avoid_arena=0x0, size=640) at /usr/src/debug/glibc-2.33.9000-34.fc35.x86_64/malloc/arena.c:893
#54420 tcache_init () at malloc.c:3182
#54421 0x00007ff6c4661a0e in tcache_init () at malloc.c:3179
#54422 __GI___libc_malloc (bytes=bytes@entry=2048) at malloc.c:3245
#54423 0x00007ff6c4664d65 in __GI___libc_scratch_buffer_grow (buffer=buffer@entry=0x7ff65a0f41f0) at scratch_buffer_grow.c:37
#54424 0x00007ff6c46d3099 in scratch_buffer_grow (buffer=0x7ff65a0f41f0) at ../include/scratch_buffer.h:101
#54425 __GI___get_nprocs () at ../sysdeps/unix/sysv/linux/getsysstats.c:44
#54426 0x00007ff6c465f7da in arena_get2 (size=size@entry=640, avoid_arena=avoid_arena@entry=0x0) at /usr/src/debug/glibc-2.33.9000-34.fc35.x86_64/malloc/arena.c:902
#54427 0x00007ff6c4661659 in arena_get2 (avoid_arena=0x0, size=640) at /usr/src/debug/glibc-2.33.9000-34.fc35.x86_64/malloc/arena.c:893
#54428 tcache_init () at malloc.c:3182
#54429 0x00007ff6c4662007 in tcache_init () at malloc.c:3179
#54430 __GI___libc_free (mem=0x7ff648000b60) at malloc.c:3333
#54431 0x00007ff6c4c1824d in g_free () at /lib64/libglib-2.0.so.0
#54432 0x000055ac6f6473a6 in qemu_thread_start (args=0x7ff64c000b60) at ../util/qemu-thread-posix.c:518
#54433 0x00007ff6c4651197 in start_thread (arg=<optimized out>) at pthread_create.c:429
#54434 0x00007ff6c46d6e54 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

(Not sure why there are more frames, since this is the same core dump as in the previous comment, but maybe debuginfo allows inlined frames to be exposed?)

These are the frames in the recursive loop:

(gdb) frame 54423
#54423 0x00007ff6c4664d65 in __GI___libc_scratch_buffer_grow (buffer=buffer@entry=0x7ff65a0f41f0) at scratch_buffer_grow.c:37
37        new_ptr = malloc (new_length);
(gdb) frame 54422
#54422 __GI___libc_malloc (bytes=bytes@entry=2048) at malloc.c:3245
3245      MAYBE_INIT_TCACHE ();
(gdb) frame 54421
#54421 0x00007ff6c4661a0e in tcache_init () at malloc.c:3179
3179      if (tcache_shutting_down)
(gdb) frame 54420
#54420 tcache_init () at malloc.c:3182
3182      arena_get (ar_ptr, bytes);
(gdb) frame 54419
#54419 0x00007ff6c4661659 in arena_get2 (avoid_arena=0x0, size=640) at /usr/src/debug/glibc-2.33.9000-34.fc35.x86_64/malloc/arena.c:893
893       if (a == NULL)
(gdb) frame 54418
#54418 0x00007ff6c465f7da in arena_get2 (size=size@entry=640, avoid_arena=avoid_arena@entry=0x0) at /usr/src/debug/glibc-2.33.9000-34.fc35.x86_64/malloc/arena.c:902
902       int n = __get_nprocs ();
(gdb) frame 54417
#54417 __GI___get_nprocs () at ../sysdeps/unix/sysv/linux/getsysstats.c:44
44        if (!scratch_buffer_grow (&set))
(gdb) frame 54416
#54416 0x00007ff6c46d3099 in scratch_buffer_grow (buffer=0x7ff65a0f3d20) at ../include/scratch_buffer.h:101
101       return __glibc_likely (__libc_scratch_buffer_grow (buffer));
(gdb) frame 54415
#54415 0x00007ff6c4664d65 in __GI___libc_scratch_buffer_grow (buffer=buffer@entry=0x7ff65a0f3d20) at scratch_buffer_grow.c:37
37        new_ptr = malloc (new_length);
This looks like a kernel/container host problem: sched_getaffinity fails, which does not happen on real Linux. However, the bug is real for systems with thousands of CPUs, so we have to fix this anyway.
(In reply to Florian Weimer from comment #15)
> This looks like a kernel/container host problem: sched_getaffinity fails,
> which does not happen on real Linux.

seccomp? If I understand the code correctly, it is blocked by qemu here:
https://github.com/qemu/qemu/blob/13d5f87cc3b94bfccc501142df4a7b12fee3a6e7/softmmu/qemu-seccomp.c#L108

I guess a broken syscall shouldn't cause glibc to go into an infinite loop though :-/
(In reply to Florian Weimer from comment #15)
> This looks like a kernel/container host problem: sched_getaffinity fails,
> which does not happen on real Linux.

Any syscall on "real" Linux can fail when seccomp is involved. We see from the args above that
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
is used, and resourcecontrol=deny blocks the sched_getaffinity syscall, amongst others.
Related glibc commit is probably this one: https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=903bc7dcc2acafc40be11639767e10a2de712649
I think there are two problems:

a) qemu's -sandbox resourcecontrol=deny blocks the sched_get* calls as well as set*; arguably that's a bit too mean of it - we should allow the gets and deny the sets.

b) glibc calls sched_getaffinity repeatedly with an increasing cpusetsize - but my reading is that it should only try a larger cpusetsize if sched_getaffinity returns EINVAL; any other failure isn't a sign of the buffer being too small.

Dave
(In reply to Daniel Berrangé from comment #17)
> (In reply to Florian Weimer from comment #15)
> > This looks like a kernel/container host problem: sched_getaffinity fails,
> > which does not happen on real Linux.
>
> Any syscall on "real" Linux can fail when seccomp is involved.

The kernel and glibc position is that if you break the system with seccomp, the seccomp filter needs to be fixed. A patch to document a more elaborate system call handshake that takes seccomp into account was quite strongly rejected:
https://lore.kernel.org/linux-api/87lfer2c0b.fsf@oldenburg2.str.redhat.com/

But as I mentioned in comment 15, this is a glibc bug for other reasons, so we will fix this in glibc. Blocking sched_getaffinity or returning bad data from it is still a very bad idea, though.
(In reply to Florian Weimer from comment #20)
> The kernel and glibc position is that if you break the system with seccomp,
> the seccomp filter needs to be fixed.

I will do a fix for QEMU to relax this restriction.

> A patch to document a more elaborate system call handshake that takes
> seccomp into account was quite strongly rejected:
> https://lore.kernel.org/linux-api/87lfer2c0b.fsf@oldenburg2.str.redhat.com/

Reading that makes my head hurt, so I can't disagree with it being rejected :-)

(In reply to Dr. David Alan Gilbert from comment #19)
> I think there's two problems:
> a) qemu's -sandbox resourcecontrol=deny blocks sched_get calls as well as
> set; arguably that's a bit too mean of it, we should allow get's and deny set's

Yes, I'll aim to allow the getters. Even the setters have been trouble in the past - Mesa tried to use sched_setaffinity on its own internal threads and that broke under QEMU too :-(

seccomp is just so fragile to use. Either the filter is wide open and thus largely useless, or it is restrictive and continuously breaks in unexpected ways :-(
Upstream patch posted: https://sourceware.org/pipermail/libc-alpha/2021-June/128271.html
Hi, I just downloaded the new glibc-2.33.9000-36 build from koji and it looks like it solves my problem: the VM I had problems with starts and keeps running for much longer than the few seconds it survived before.
The VM has now been running for more than 20 minutes and I don't observe any problems. Big thanks for solving this issue.
I have also tested it and it fixes the original problem.