Bug 1172473

Summary: BUG: seccomp filter failure with "-object memory-backend-ram"
Product: Red Hat Enterprise Linux 7
Reporter: Jincheng Miao <jmiao>
Component: qemu-kvm-rhev
Assignee: Paul Moore <pmoore>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Docs Contact:
Priority: high
Version: 7.1
CC: chayang, dyuan, ehabkost, honzhang, huding, jmiao, juli, juzhang, knoel, mrezanin, mzhan, ovasik, pbonzini, pmoore, tlavigne, virt-maint, xfu
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.1.2-21.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-03-05 09:57:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
01-bz1172473.patch none

Description Jincheng Miao 2014-12-10 06:51:16 UTC
Description of problem:
If "-object memory-backend-ram" and "-sandbox on" are specified together, the guest fails to start.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.1.2-13.el7.x86_64
kernel-3.10.0-206.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1.
# /usr/libexec/qemu-kvm -name r7a -machine pc-i440fx-rhel7.1.0,accel=kvm,usb=off -cpu host -m 1024 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -object memory-backend-ram,size=1024M,id=ram-node0,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 -uuid 15af3918-627a-4b3a-af32-0502c4557a17 -no-user-config -nodefaults -rtc base=utc -no-shutdown -boot strict=on -vnc 127.0.0.1:0 -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=8,bus=pci.0,addr=0x2 -sandbox on 1>out 2>err

Bad system call

2. Nothing is written to standard output or standard error:
# cat out

# cat err


Actual results:
guest NUMA + sandbox: fails

Expected results:
Guest boot success.
If guest NUMA is not supported when the sandbox is enabled, qemu-kvm-rhev should report an error so that the VM management software above it can be made aware of the problem.
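
Until QEMU reports such an error itself, a management layer can at least tell a seccomp kill apart from an ordinary failure by checking how the child process ended. The following is a minimal sketch of that idea, not how any particular management stack does it: the helper names describe_exit and run_and_describe are hypothetical, and the demo child is a stand-in Python process that raises SIGSYS on itself rather than a real qemu-kvm invocation.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable reason."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # subprocess reports death by signal N as return code -N.
        try:
            sig = signal.Signals(-returncode)
        except ValueError:
            return f"killed by signal {-returncode}"
        if sig == signal.SIGSYS:
            # SIGSYS is what the shell prints as "Bad system call".
            return "killed by SIGSYS (bad system call; possible seccomp violation)"
        return f"killed by {sig.name}"
    return f"exited with status {returncode}"


def run_and_describe(argv):
    """Run argv, capture its output, and classify how it terminated."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.returncode, describe_exit(proc.returncode)


if __name__ == "__main__":
    # Stand-in child (assumption): a process that sends SIGSYS to itself,
    # mimicking what the kernel does to a process that trips a seccomp filter.
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGSYS)"]
    print(run_and_describe(child))
```

As in the report, the child writes nothing to stdout or stderr, so the termination signal is the only clue available to the caller.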

Comment 1 Paul Moore 2014-12-10 16:29:21 UTC
After running the qemu-kvm command line in the original problem report, please run the following command and report the output:

 # ausearch --start recent -m SECCOMP

Comment 2 Jincheng Miao 2014-12-11 03:36:13 UTC
(In reply to Paul Moore from comment #1)
> After running the qemu-kvm command line in the original problem report,
> please run the following command and report the output:
> 
>  # ausearch --start recent -m SECCOMP

Here is the audit log:

# ausearch --start recent -m SECCOMP
----
time->Thu Dec 11 11:35:18 2014
type=SECCOMP msg=audit(1418268918.436:2032): auid=0 uid=0 gid=0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=12377 comm="qemu-kvm" sig=31 arch=c000003e syscall=237 compat=0 ip=0x7f2e13047839 code=0x0
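
The fields of interest in this record are sig (31, which is SIGSYS on x86_64 Linux), arch (c000003e, the audit identifier for x86_64) and syscall (the number of the denied call). A small sketch of pulling them out of a raw record follows; the function name parse_seccomp_record is hypothetical, and the regular expression only assumes the key=value layout visible in the record above.

```python
import re

# Audit identifier for x86_64: ELF machine 62 (0x3e) plus the 64-bit and
# little-endian flag bits.
AUDIT_ARCH_X86_64 = 0xC000003E

RECORD = (
    'type=SECCOMP msg=audit(1418268918.436:2032): auid=0 uid=0 gid=0 ses=1 '
    'subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=12377 '
    'comm="qemu-kvm" sig=31 arch=c000003e syscall=237 compat=0 '
    'ip=0x7f2e13047839 code=0x0'
)


def parse_seccomp_record(line):
    """Extract the fields needed to identify the denied syscall."""
    # Values are either double-quoted strings or runs of non-space characters.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
    return {
        "comm": fields["comm"].strip('"'),
        "pid": int(fields["pid"]),
        "sig": int(fields["sig"]),
        "arch": int(fields["arch"], 16),  # logged as bare hexadecimal
        "syscall": int(fields["syscall"]),
    }


if __name__ == "__main__":
    rec = parse_seccomp_record(RECORD)
    print(rec["comm"], rec["sig"], hex(rec["arch"]), rec["syscall"])
```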

Comment 3 Paul Moore 2014-12-11 14:27:14 UTC
It appears that the problematic syscall is mbind():

  # scmp_sys_resolver -a x86_64 237
  mbind

This makes sense as mbind() is used to set the NUMA memory policy for a memory range.  I'm guessing that in addition to mbind we may also want to allow set_mempolicy() and get_mempolicy().
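
For reference, the three memory-policy syscalls named here are numbered consecutively on x86_64. The sketch below hard-codes just those three entries as a stand-in for what scmp_sys_resolver looks up. It is illustrative only: the table and function names are hypothetical, and the numbers apply to x86_64 and differ on other architectures.

```python
# x86_64 syscall numbers for the NUMA memory-policy calls discussed above.
X86_64_MEMPOLICY_SYSCALLS = {
    237: "mbind",
    238: "set_mempolicy",
    239: "get_mempolicy",
}


def resolve_syscall(number):
    """Map an x86_64 syscall number to a name, if it is one of the three."""
    return X86_64_MEMPOLICY_SYSCALLS.get(number, f"unknown({number})")


if __name__ == "__main__":
    # 237 is the number reported in the SECCOMP audit record.
    print(resolve_syscall(237))
```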

Comment 4 Paul Moore 2014-12-16 19:25:43 UTC
Looking at the current upstream code, of the three syscalls mentioned in comment #3, it appears that only mbind(2) is used (inside backends/hostmem.c).

Comment 10 Paul Moore 2014-12-17 20:54:21 UTC
Upstream posting:

 * https://marc.info/?l=qemu-devel&m=141884942806950&w=2

Comment 11 Paul Moore 2015-01-09 17:52:56 UTC
The patch is now present in the upstream QEMU repository:

    commit be6c340fe98ccca9e51cac193f13f22c9dbb7e0b
    Author: Paul Moore <pmoore>
    Date:   Fri Jan 9 12:51:21 2015 -0500

    seccomp: add mbind() to the syscall whitelist
    
    The "memory-backend-ram" QOM object utilizes the mbind(2) syscall to
    set the policy for a memory range.  Add the syscall to the seccomp
    sandbox whitelist.
    
    Signed-off-by: Paul Moore <pmoore>

Comment 12 Paul Moore 2015-01-09 19:38:06 UTC
(In reply to Paul Moore from comment #11)
> The patch is now present in the upstream QEMU repository:
> 
>     commit be6c340fe98ccca9e51cac193f13f22c9dbb7e0b
>     Author: Paul Moore <pmoore>
>     Date:   Fri Jan 9 12:51:21 2015 -0500
> 
>     seccomp: add mbind() to the syscall whitelist
>     
>     The "memory-backend-ram" QOM object utilizes the mbind(2) syscall to
>     set the policy for a memory range.  Add the syscall to the seccomp
>     sandbox whitelist.
>     
>     Signed-off-by: Paul Moore <pmoore>

Never mind, please disregard comment #11; the patch has not yet been merged upstream.

Comment 13 Paul Moore 2015-01-13 23:18:14 UTC
Now the patch is present in the upstream QEMU repository:

    commit ea259acae5b2d88ee6e92caf1cf44eb501eaef47
    Author: Paul Moore <pmoore>
    Date:   Wed Dec 17 15:50:09 2014 -0500

    seccomp: add mbind() to the syscall whitelist
    
    The "memory-backend-ram" QOM object utilizes the mbind(2) syscall to
    set the policy for a memory range.  Add the syscall to the seccomp
    sandbox whitelist.
    
    Signed-off-by: Paul Moore <pmoore>
    Signed-off-by: Eduardo Otubo <eduardo.otubo>
    Acked-by: Eduardo Otubo <eduardo.otubo>
    Tested-by: Eduardo Habkost <ehabkost>
    Reviewed-by: Eduardo Habkost <ehabkost>

Comment 14 Paul Moore 2015-01-14 18:45:41 UTC
Created attachment 980156
01-bz1172473.patch

Comment 20 Jeff Nelson 2015-01-23 20:00:22 UTC
Fix included in qemu-kvm-rhev-2.1.2-21.el7

Comment 22 Chao Yang 2015-01-26 07:25:36 UTC
Reproduced with qemu-kvm-rhev-2.1.2-20.el7.x86_64.

Steps:
1. boot a guest with -sandbox on as well as -object memory-backend-file or -object memory-backend-ram

Actual Result:
Bad system call


Verified as passing with qemu-kvm-rhev-2.1.2-21.el7.x86_64. Testing covered memory-backend-file and memory-backend-ram, each with 4 policies: default, preferred, bind, interleave. The issue no longer occurs.

CLI: 

 /usr/libexec/qemu-kvm -sandbox on -realtime mlock=off -M pc -S -cpu SandyBridge,enforce -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 -global kvm-pit.lost_tick_policy=discard -usb -device usb-tablet,id=input0 -rtc base=utc,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/home/rhel6.6.z.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=58:61:52:B6:40:21,bus=pci.0,addr=0x5 -device virtio-balloon-pci,id=ballooning,bus=pci.0,addr=0x6 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -k en-us -boot menu=on -qmp tcp:0:4444,server,nowait -serial unix:/tmp/ttyS0,server,nowait -spice port=5900,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -monitor stdio -object memory-backend-file,prealloc=yes,policy=bind,host-nodes=0,id=mem-0,size=2048M,mem-path=/mnt/hugepage-1 -object memory-backend-file,prealloc=yes,policy=bind,host-nodes=1,id=mem-1,size=2048M,mem-path=/mnt/hugepage-2 -numa node,cpus=0,cpus=2,memdev=mem-0 -numa node,cpus=1,cpus=3,memdev=mem-1
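
The 2 backends x 4 policies coverage described above can be enumerated mechanically. The following is a rough sketch of generating the "-object" arguments for each combination, not the test plan that was actually run. The helper names, default size, id and mem-path values are assumptions for illustration. Omitting host-nodes for policy=default is also an assumption, based on QEMU rejecting a host-nodes list when the default policy is used.

```python
from itertools import product

BACKENDS = ("memory-backend-file", "memory-backend-ram")
POLICIES = ("default", "preferred", "bind", "interleave")


def object_arg(backend, policy, size="1024M", obj_id="mem-0",
               host_nodes="0", mem_path="/mnt/hugepage-1"):
    """Build the value passed to -object for one backend/policy pair."""
    opts = [backend, f"size={size}", f"id={obj_id}"]
    if backend == "memory-backend-file":
        # Only the file backend takes a backing path.
        opts.append(f"mem-path={mem_path}")
    if policy != "default":
        # Assumption: the default policy is expressed by leaving both
        # host-nodes and policy unset.
        opts += [f"host-nodes={host_nodes}", f"policy={policy}"]
    return ",".join(opts)


def verification_matrix():
    """Return one partial argument list per backend/policy combination."""
    return [["-sandbox", "on", "-object", object_arg(backend, policy)]
            for backend, policy in product(BACKENDS, POLICIES)]


if __name__ == "__main__":
    for args in verification_matrix():
        print(" ".join(args))
```

Each generated list is only the sandbox and memory-backend portion of a command line; a real run would still need the "-numa node,...,memdev=" options and the rest of the guest configuration shown in the CLI above.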

Comment 25 errata-xmlrpc 2015-03-05 09:57:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html