Bug 1240187
| Summary: | virsh attach-interface / attach-device hangs VMs with high amount of memory for several seconds | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Martin Tessun <mtessun> |
| Component: | qemu-kvm | Assignee: | jason wang <jasowang> |
| Status: | CLOSED WONTFIX | QA Contact: | xiywang |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.6 | CC: | ailan, chayang, cww, hhuang, jasowang, juzhang, mkenneth, mst, rbalakri, sfroemer, virt-bugs, virt-maint, weliao, xiywang |
| Target Milestone: | rc | Flags: | jasowang:
needinfo?
(mst) |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-05-31 21:07:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1172231, 1269194, 1359965 | ||
Some additional tests about the influence of main-memory on that procedure: All mentioned sizes are Main Memory. 96GB: [root@ibm-hx5-7872-01 ~]# time virsh attach-interface rhel6 --type bridge --model virtio --source virbr0 --mac 00:16:3e:00:08:24 Interface attached successfully real 0m3.707s user 0m0.033s sys 0m0.028s [root@ibm-hx5-7872-01 ~]# 64GB: [root@ibm-hx5-7872-01 ~]# time virsh attach-interface rhel6 --type bridge --model virtio --source virbr0 --mac 00:16:3e:00:08:24 Interface attached successfully real 0m2.690s user 0m0.037s sys 0m0.017s 32GB: [root@ibm-hx5-7872-01 ~]# time virsh attach-interface rhel6 --type bridge --model virtio --source virbr0 --mac 00:16:3e:00:08:24 Interface attached successfully real 0m1.448s user 0m0.031s sys 0m0.031s [root@ibm-hx5-7872-01 ~]# 16GB: [root@ibm-hx5-7872-01 ~]# time virsh attach-interface rhel6 --type bridge --model virtio --source virbr0 --mac 00:16:3e:00:08:24 Interface attached successfully real 0m1.419s user 0m0.026s sys 0m0.025s [root@ibm-hx5-7872-01 ~]# 8GB: [root@ibm-hx5-7872-01 ~]# time virsh attach-interface rhel6 --type bridge --model virtio --source virbr0 --mac 00:16:3e:00:08:24 Interface attached successfully real 0m0.759s user 0m0.037s sys 0m0.022s [root@ibm-hx5-7872-01 ~]# So it somehow looks dependant on the amount of main memory, although it doesn't make any difference for 16 and 32GB. But maybe that was just some coincidence. Hi Chayang, Could you take care this issue? Best Regards, Junyi Hi jason, QE can reproduce this issue.
and I have a finding: if add NIC without vhost, no this issue, just add NIC with vhost=on, hit this issue.
Host version:
2.6.32-599.el6.x86_64
qemu-kvm-0.12.1.2-2.483.el6.x86_64
Guest: 2.6.32-595.el6.x86_64
step:
1.launch guest with big Memory
2.execute while loop in guest:
while true; do date; sleep 1; done
3.add NIC with vhost=on
{ 'execute' : 'qmp_capabilities' }
{"return": {}}
{ "execute": "netdev_add", "arguments": { "type":"tap","id":"hostnet1","vhost":"on"}}
{"return": {}}
qemu command:
/usr/libexec/qemu-kvm -name aaa \
-M rhel6.6.0 -enable-kvm \
-m 163840 -realtime mlock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid 8f751f99-1dc9-c004-2b1a-595876325a48 \
-nodefconfig -nodefaults \
-rtc base=utc -no-shutdown \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 \
-drive file=/mnt/RHEL-Server-6.7-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:b7:eb:79,bus=pci.0,addr=0x3 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0 -vnc :0 \
-vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 \
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
-msg timestamp=on \
-monitor stdio \
-qmp tcp:0:5555,nowait,server
scenarios:
1.120G
-m 122880 ; hang about 5s
Tue Jan 12 10:53:11 CST 2016
Tue Jan 12 10:53:16 CST 2016
2.64G
-m 65536 hang about 2-3s
Tue Jan 12 10:49:41 CST 2016
Tue Jan 12 10:49:44 CST 2016
3.32G
-m 65536 hang about 2s
Tue Jan 12 10:55:41 CST 2016
Tue Jan 12 10:55:43 CST 2016
4.16G hang about 1s
(In reply to weliao from comment #8) > Hi jason, QE can reproduce this issue. > and I have a finding: if add NIC without vhost, no this issue, just add NIC > with vhost=on, hit this issue. > > Host version: > 2.6.32-599.el6.x86_64 > qemu-kvm-0.12.1.2-2.483.el6.x86_64 > Guest: 2.6.32-595.el6.x86_64 > > step: > 1.launch guest with big Memory > 2.execute while loop in guest: > while true; do date; sleep 1; done > 3.add NIC with vhost=on > { 'execute' : 'qmp_capabilities' } > {"return": {}} > { "execute": "netdev_add", "arguments": { > "type":"tap","id":"hostnet1","vhost":"on"}} > {"return": {}} > > qemu command: > /usr/libexec/qemu-kvm -name aaa \ > -M rhel6.6.0 -enable-kvm \ > -m 163840 -realtime mlock=off \ > -smp 2,sockets=2,cores=1,threads=1 \ > -uuid 8f751f99-1dc9-c004-2b1a-595876325a48 \ > -nodefconfig -nodefaults \ > -rtc base=utc -no-shutdown \ > -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 \ > -device > ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on, > addr=0x5 \ > -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 \ > -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 \ > -drive > file=/mnt/RHEL-Server-6.7-64-virtio.qcow2,if=none,id=drive-virtio-disk0, > format=qcow2,cache=none \ > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0, > id=virtio-disk0,bootindex=1 \ > -netdev tap,id=hostnet0,vhost=on \ > -device > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:b7:eb:79,bus=pci.0, > addr=0x3 \ > -chardev pty,id=charserial0 \ > -device isa-serial,chardev=charserial0,id=serial0 \ > -device usb-tablet,id=input0 -vnc :0 \ > -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 \ > -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \ > -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \ > -msg timestamp=on \ > -monitor stdio \ > -qmp tcp:0:5555,nowait,server > > scenarios: > 1.120G > -m 122880 ; hang about 5s > Tue Jan 12 10:53:11 CST 2016 > Tue Jan 12 10:53:16 CST 2016 > > 2.64G > -m 65536 hang about 2-3s > Tue Jan 12 10:49:41 CST 2016 > Tue Jan 12 10:49:44 CST 2016 > > 3.32G > -m 65536 hang about 2s > Tue Jan 12 10:55:41 CST 2016 > Tue Jan 12 10:55:43 CST 2016 > > 4.16G hang about 1s Can you reproduce this on RHEL7? And how about retrying without cirrus? hi, jason on RHEL7,the same test command and the same test steps I can't reproduce this bug. Host: 3.10.0-351.el7.x86_64 qemu-kvm-1.5.3-105.el7_2.3.x86_64 Guest: rhel7.3 and rhel6.8 guest all no hit this issue. retested without cirrues(-spice port=5900,disable-ticketing -vga qxl) on RHEL6.8 Host, also had this issue. Host: 2.6.32-621.el6.x86_64 qemu-kvm-0.12.1.2-2.490.el6.x86_64 Guest: rhel6.8 and rhel7.3 guest all hit this issue. (In reply to weliao from comment #10) > hi, jason > on RHEL7,the same test command and the same test steps I can't reproduce > this bug. > Host: > 3.10.0-351.el7.x86_64 > qemu-kvm-1.5.3-105.el7_2.3.x86_64 > Guest: > rhel7.3 and rhel6.8 guest all no hit this issue. Cool, can you do a reverse bisection to see which commit fixes the issue? If you wish, I can tell you the steps. > > retested without cirrues(-spice port=5900,disable-ticketing -vga qxl) on > RHEL6.8 Host, also had this issue. > Host: > 2.6.32-621.el6.x86_64 > qemu-kvm-0.12.1.2-2.490.el6.x86_64 > Guest: > rhel6.8 and rhel7.3 guest all hit this issue. Hi Jason, can I get a current status on this? Michael: Any commit in your mind that can be back-ported to fix this issue? I remember we fix similar issues in the past. Thanks Red Hat Enterprise Linux 6 transitioned to the Production 3 Phase on May 10, 2017. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle This issue does not appear to meet the inclusion criteria for the Production Phase 3 and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com |
Description of problem: When attaching a Device to a running VM with one of the following commands, the command might hang for several seconds in case the VM has a huge amount of memory: virsh attach-interface <interface definition> virsh attach-device <xml-file> Version-Release number of selected component (if applicable): Most probably all versions, reproduced with the following versions: qemu-kvm-0.12.1.2-2.445.el6.x86_64 libvirt-0.10.2-46.el6.x86_64 How reproducible: always Steps to Reproduce: 1. Define a VM with 120GB+ Main memory: <memory unit='KiB'>125829120</memory> <currentMemory unit='KiB'>125829120</currentMemory> <vcpu placement='static'>4</vcpu> 2. Start the VM # virsh start <VM-NAME> 3. Attach an interface to the domain: [root@ibm-hx5-7872-01 ~]# time virsh attach-interface rhel6 --type bridge --model virtio --source virbr0 --mac 00:16:3e:00:08:24 Interface attached successfully real 0m4.332s user 0m0.041s sys 0m0.012s [root@ibm-hx5-7872-01 ~]# 4. During the operation also run a simple while-loop in the VM: [root@rhel6 ~]# while true; do date; sleep 1; done Mon Jul 6 09:05:03 CEST 2015 Mon Jul 6 09:05:04 CEST 2015 Mon Jul 6 09:05:05 CEST 2015 Mon Jul 6 09:05:06 CEST 2015 Mon Jul 6 09:05:11 CEST 2015 <=== 5 seconds gap!! Mon Jul 6 09:05:12 CEST 2015 Mon Jul 6 09:05:13 CEST 2015 ^C [root@rhel6 ~]# Actual results: The VM "hangs" for about 4-5 seconds. Expected results: The VM shouldn't hang, and the attach operation should be similiar as with less memory (e.g. 4 GB): [root@ibm-hx5-7872-01 ~]# virsh dumpxml rhel6 <domain type='kvm' id='2'> <name>rhel6</name> <uuid>b72d1061-517b-292a-b989-07d66e5151b6</uuid> <memory unit='KiB'>4194304</memory> <currentMemory unit='KiB'>4194304</currentMemory> <vcpu placement='static'>4</vcpu> [...] [root@ibm-hx5-7872-01 ~]# time virsh attach-interface rhel6 --type bridge --model virtio --source virbr0 --mac 00:16:3e:00:08:24 Interface attached successfully real 0m0.405s user 0m0.034s sys 0m0.027s [root@ibm-hx5-7872-01 ~]# [root@rhel6 ~]# while true; do date; sleep 1; done Mon Jul 6 09:19:43 CEST 2015 Mon Jul 6 09:19:44 CEST 2015 Mon Jul 6 09:19:45 CEST 2015 Mon Jul 6 09:19:46 CEST 2015 Mon Jul 6 09:19:47 CEST 2015 Mon Jul 6 09:19:48 CEST 2015 Mon Jul 6 09:19:49 CEST 2015 Mon Jul 6 09:19:50 CEST 2015 Mon Jul 6 09:19:51 CEST 2015 Mon Jul 6 09:19:52 CEST 2015 Mon Jul 6 09:19:53 CEST 2015 Mon Jul 6 09:19:54 CEST 2015 Mon Jul 6 09:19:55 CEST 2015 ^C [root@rhel6 ~]# Additional info: After stracing the qemu process, the main reason for this seems to be the futex call: 10269 13:20:02.221543 write(23, "\1\0\0\0\0\0\0\0", 8) = 8 <0.000022> 10269 13:20:02.221624 ioctl(53, 0x400454d4, 0x7fff9c19681c) = 0 <0.000023> 10269 13:20:02.221710 ioctl(54, 0xaf01 <unfinished ...> 10281 13:20:02.303602 <... ioctl resumed> , 0) = 0 <0.241958> 10281 13:20:02.303689 futex(0x7f86a5acb7a0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...> 10269 13:20:02.487004 <... ioctl resumed> , 0) = 0 <0.265269> 10269 13:20:02.487166 ioctl(54, 0x8008af00, 0x7fff9c1967e8) = 0 <0.001335> [2.5 Seconds nothing happens] 10269 13:20:04.987971 ioctl(53, 0x800454d7, 0x7fff9c1967ec) = 0 <0.002352> 10269 13:20:04.990440 ioctl(53, 0x400454d8, 0x7fff9c1967dc) = 0 <0.000023> That is also represented in the duration of the systemcalls: system-call, total, count, mean, min, max, unfinished open, 0.000061, 2, 0.000030, 0.000029, 0.000032, 0 select, 0.541804, 13896, 0.000039, 0.000000, 0.001208, 0 brk, 0.000069, 3, 0.000023, 0.000022, 0.000024, 0 futex, 24.476751, 15157, 0.001615, 0.000000, 9.285405, 2 fdatasync, 0.000838, 15, 0.000056, 0.000000, 0.000080, 0 [...]