Description of problem:
When a memory backend object (-object memory-backend-ram) and -sandbox on are specified together, the guest fails to start.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.1.2-13.el7.x86_64
kernel-3.10.0-206.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start the guest:
# /usr/libexec/qemu-kvm -name r7a -machine pc-i440fx-rhel7.1.0,accel=kvm,usb=off -cpu host -m 1024 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -object memory-backend-ram,size=1024M,id=ram-node0,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 -uuid 15af3918-627a-4b3a-af32-0502c4557a17 -no-user-config -nodefaults -rtc base=utc -no-shutdown -boot strict=on -vnc 127.0.0.1:0 -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=8,bus=pci.0,addr=0x2 -sandbox on 1>out 2>err
Bad system call
2. Nothing is written to standard output or standard error:
# cat out
# cat err

Actual results:
Guest NUMA + sandbox: fails.

Expected results:
The guest boots successfully. If guest NUMA is not supported when the sandbox is enabled, qemu-kvm-rhev should report an error so that the upper-level VM management software is aware of the problem.
After running the qemu-kvm command line in the original problem report, please run the following command and report the output:

# ausearch --start recent -m SECCOMP
(In reply to Paul Moore from comment #1)
> After running the qemu-kvm command line in the original problem report,
> please run the following command and report the output:
>
> # ausearch --start recent -m SECCOMP

Here is the audit log:

# ausearch --start recent -m SECCOMP
----
time->Thu Dec 11 11:35:18 2014
type=SECCOMP msg=audit(1418268918.436:2032): auid=0 uid=0 gid=0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=12377 comm="qemu-kvm" sig=31 arch=c000003e syscall=237 compat=0 ip=0x7f2e13047839 code=0x0
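For anyone who needs to pull the relevant fields out of such a record in a script, here is a minimal sketch. It assumes the record is a flat list of key=value pairs as in the log above; the function name parse_seccomp_record is made up for illustration and is not part of any audit tooling.

```python
import re

# Record copied verbatim from the audit log in this comment.
RECORD = (
    'type=SECCOMP msg=audit(1418268918.436:2032): auid=0 uid=0 gid=0 ses=1 '
    'subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=12377 '
    'comm="qemu-kvm" sig=31 arch=c000003e syscall=237 compat=0 '
    'ip=0x7f2e13047839 code=0x0'
)


def parse_seccomp_record(line):
    """Split one audit record into a dict of field name -> string value.

    Hypothetical helper: assumes whitespace-separated key=value tokens,
    with optional double quotes around the value.
    """
    fields = {}
    for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', line):
        fields[key] = value.strip('"')
    return fields


fields = parse_seccomp_record(RECORD)
# The three fields needed to identify the blocked call:
print(fields["comm"], fields["arch"], fields["syscall"])
```

The syscall field is a plain decimal number and arch is hexadecimal, so they still have to be resolved to a name for the matching architecture.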
It appears that the problematic syscall is mbind():

# scmp_sys_resolver -a x86_64 237
mbind

This makes sense, as mbind() is used to set the NUMA memory policy for a memory range. I'm guessing that in addition to mbind() we may also want to allow set_mempolicy() and get_mempolicy().
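The lookup above can also be illustrated without libseccomp installed. The sketch below is only a stand-in for scmp_sys_resolver: the table is a hand-written subset of the x86_64 syscall numbers covering the three NUMA policy calls mentioned here, and resolve_syscall is a hypothetical helper name.

```python
# arch=c000003e in the audit record is AUDIT_ARCH_X86_64.
AUDIT_ARCH_X86_64 = 0xC000003E

# Hand-written subset of the x86_64 syscall table (NUMA policy calls only).
# This is an illustrative table, not a replacement for scmp_sys_resolver.
X86_64_NUMA_SYSCALLS = {
    237: "mbind",
    238: "set_mempolicy",
    239: "get_mempolicy",
}


def resolve_syscall(arch, number):
    """Return the syscall name for (arch, number), or 'unknown'."""
    if arch != AUDIT_ARCH_X86_64:
        raise ValueError("this sketch only knows the x86_64 table")
    return X86_64_NUMA_SYSCALLS.get(number, "unknown")


print(resolve_syscall(int("c000003e", 16), 237))
```

For anything beyond these three numbers, scmp_sys_resolver remains the tool to use, since it tracks the full per-architecture tables.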
Looking at the current upstream code, of the three syscalls mentioned in comment #3, it appears that only mbind(2) is used (inside backends/hostmem.c).
Upstream posting: * https://marc.info/?l=qemu-devel&m=141884942806950&w=2
The patch is now present in the upstream QEMU repository:

commit be6c340fe98ccca9e51cac193f13f22c9dbb7e0b
Author: Paul Moore <pmoore>
Date:   Fri Jan 9 12:51:21 2015 -0500

    seccomp: add mbind() to the syscall whitelist

    The "memory-backend-ram" QOM object utilizes the mbind(2) syscall to
    set the policy for a memory range. Add the syscall to the seccomp
    sandbox whitelist.

    Signed-off-by: Paul Moore <pmoore>
(In reply to Paul Moore from comment #11)
> The patch is now present in the upstream QEMU repository:
>
> commit be6c340fe98ccca9e51cac193f13f22c9dbb7e0b
> Author: Paul Moore <pmoore>
> Date: Fri Jan 9 12:51:21 2015 -0500
>
> seccomp: add mbind() to the syscall whitelist
>
> The "memory-backend-ram" QOM object utilizes the mbind(2) syscall to
> set the policy for a memory range. Add the syscall to the seccomp
> sandbox whitelist.
>
> Signed-off-by: Paul Moore <pmoore>

Never mind, please disregard comment #11; the patch has still not been merged upstream.
The patch is now present in the upstream QEMU repository:

commit ea259acae5b2d88ee6e92caf1cf44eb501eaef47
Author: Paul Moore <pmoore>
Date:   Wed Dec 17 15:50:09 2014 -0500

    seccomp: add mbind() to the syscall whitelist

    The "memory-backend-ram" QOM object utilizes the mbind(2) syscall to
    set the policy for a memory range. Add the syscall to the seccomp
    sandbox whitelist.

    Signed-off-by: Paul Moore <pmoore>
    Signed-off-by: Eduardo Otubo <eduardo.otubo>
    Acked-by: Eduardo Otubo <eduardo.otubo>
    Tested-by: Eduardo Habkost <ehabkost>
    Reviewed-by: Eduardo Habkost <ehabkost>
Created attachment 980156
01-bz1172473.patch
Fix included in qemu-kvm-rhev-2.1.2-21.el7
Reproduced with qemu-kvm-rhev-2.1.2-20.el7.x86_64.

Steps:
1. Boot a guest with -sandbox on together with either -object memory-backend-file or -object memory-backend-ram.

Actual result:
Bad system call

Verified as fixed with qemu-kvm-rhev-2.1.2-21.el7.x86_64. Testing covered both memory-backend-file and memory-backend-ram, each with 4 policy modes: default, preferred, bind, and interleave. The issue no longer occurs.

CLI:
/usr/libexec/qemu-kvm -sandbox on -realtime mlock=off -M pc -S -cpu SandyBridge,enforce -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 -global kvm-pit.lost_tick_policy=discard -usb -device usb-tablet,id=input0 -rtc base=utc,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/home/rhel6.6.z.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=58:61:52:B6:40:21,bus=pci.0,addr=0x5 -device virtio-balloon-pci,id=ballooning,bus=pci.0,addr=0x6 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -k en-us -boot menu=on -qmp tcp:0:4444,server,nowait -serial unix:/tmp/ttyS0,server,nowait -spice port=5900,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -monitor stdio -object memory-backend-file,prealloc=yes,policy=bind,host-nodes=0,id=mem-0,size=2048M,mem-path=/mnt/hugepage-1 -object memory-backend-file,prealloc=yes,policy=bind,host-nodes=1,id=mem-1,size=2048M,mem-path=/mnt/hugepage-2 -numa node,cpus=0,cpus=2,memdev=mem-0 -numa node,cpus=1,cpus=3,memdev=mem-1
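The 2 x 4 coverage described above could be enumerated with a short script along these lines. This is a sketch, not the script used for verification: object_arg is a hypothetical helper, the id, size, and mem-path values are simply reused from the CLI above, and it assumes that host-nodes should be left out when policy=default.

```python
from itertools import product

BACKENDS = ("memory-backend-ram", "memory-backend-file")
POLICIES = ("default", "preferred", "bind", "interleave")


def object_arg(backend, policy, ident="mem-0", size="2048M",
               host_nodes="0", mem_path="/mnt/hugepage-1"):
    """Build the value of one -object option (hypothetical helper)."""
    props = [backend, f"id={ident}", f"size={size}", f"policy={policy}"]
    if policy != "default":
        # Assumption: host-nodes is only given with a non-default policy.
        props.append(f"host-nodes={host_nodes}")
    if backend == "memory-backend-file":
        props.append(f"mem-path={mem_path}")
    return ",".join(props)


# One entry per (backend, policy) combination, 8 in total.
MATRIX = [object_arg(b, p) for b, p in product(BACKENDS, POLICIES)]

for arg in MATRIX:
    print("-sandbox on -object", arg)
```

Each printed fragment would still need the rest of the guest command line (machine, CPU, -numa, disks) before it can be run.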
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0624.html