Bug 1408333
Summary: | Regression: [BISECTED] Guest hangs on migrate, reverting patch fixes the problem | | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | bugzilla
Component: | qemu-kvm | Assignee: | Dr. David Alan Gilbert <dgilbert>
Status: | CLOSED NOTABUG | QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | urgent | Priority: | unspecified
Version: | 7.3 | Target Milestone: | rc
Hardware: | x86_64 | OS: | Linux
Last Closed: | 2017-02-15 18:36:36 UTC | Type: | Bug
CC: | bugzilla, chayang, hhuang, juzhang, knoel, mdeng, michen, qzhang, rbalakri, rh-bugzilla, virt-maint, xianwang, xuma, zhengtli | | |
Description (bugzilla, 2016-12-22 23:59:31 UTC)
Created attachment 1234901
Guest kernel panic after migrate
Hi Min,

Please help reproduce this bug, thanks.

Could you please provide the entire qemu cli and accurate version of el7.3.1611, if possible? Thanks in advance!

Thanks,
Min

What do you mean by entire qemu cli and accurate version? The qemu-kvm package version at issue is as above: qemu-kvm-1.5.3-126.el7.x86_64.rpm

We downloaded the .src.rpm, rebuilt it, and confirmed the problem. We then removed the patch shown above and our VM stopped hanging on migrate.

We just use this to migrate, nothing special and no direct qemu monitor interaction:

virsh --connect=qemu:///system --quiet migrate --live myfavoritevm qemu+ssh://remotenode/system

Libvirt doesn't seem to be involved, but we are using this version: libvirt-2.0.0-10.el7_3.2.x86_64, which comes with the latest el7 7.3.1611 release.

(In reply to Eric Wheeler from comment #5)
> What do you mean by entire qemu cli and accurate version?

Hi Eric,

Thanks for your reply. The entire qemu cli here means the entire qemu "command line", which can be gathered on the host with "# ps ax | grep kvm" while the VM is running.

Regards,
Qunfang

Ah! That makes sense.
Here it is:

/usr/libexec/qemu-kvm -name demo-1 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 384 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 08edf62d-1580-41f9-9fbc-36310a48bbca -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-105-demo-1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/drbd/by-res/demo-1,format=raw,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus -msg timestamp=on

If it is helpful, we have confirmed this on the following CPU hardware:

Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz

Thanks for the information.

qemu-kvm-1.5.3-126.el7.x86_64
guestos: RHEL-7.3-updates-20161130.1
Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
shared storage: nfs

I can't reproduce this issue with the qemu command line from comment 7. I will try to reproduce it once I reserve a machine with a CPU from comment 8.

Hello All,

We continued troubleshooting on our side. It occurred to us that perhaps this is not a user-space problem. We were running the Linux longterm 4.1.y releases and discovered that this was causing the problem. It turns out that in 4.1.16, this patch was merged into the kernel:

KVM: x86: expose MSR_TSC_AUX to userspace
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=8a3185c54d650a86dafc8d8bcafa124b50944315

It was flagged for cc: stable.org, but had some dependencies that were missed.
In order to be stable, these commits must also be pulled into the 4.1.y series:

609e36d372a KVM: x86: pass host_initiated to functions that read MSRs
81b1b9ca6d5 backport: KVM: VMX: Fix host initiated access to guest MSR_TSC_AUX

These commits were signed off by:

Signed-off-by: Paolo Bonzini <pbonzini>
Signed-off-by: Haozhong Zhang <haozhong.zhang>

I'm not sure if they should be added to this BZ or not, so I will let your team decide on that.

I understand that because this is not a supported kernel, you may be inclined to mark this as "not a bug" or "won't fix" or some other appropriate flag for RHEL itself. Please feel free to do whatever is most appropriate for your workflow.

However, please leave this BZ public so that I can post to the KVM and Linux-stable lists and reference this bug for complete information.

Thank you everyone for your help in troubleshooting to get to the root of this issue!

Sincerely,
Eric Wheeler

(In reply to bugzilla from comment #11)
> We were running the Linux longterm 4.1.y releases and discovered that this was causing the problem.

I wish you'd mentioned you were using a non-distro kernel earlier!
Does it work with the distro kernel?

> It turns out that in 4.1.16, this patch was merged into the kernel: KVM: x86: expose MSR_TSC_AUX to userspace
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=8a3185c54d650a86dafc8d8bcafa124b50944315

OK, I see that in our distro kernel.

> It was flagged for cc: stable.org, but had some dependencies that were missed. In order to be stable, these commits must also be pulled into the 4.1.y series:
> 609e36d372a KVM: x86: pass host_initiated to functions that read MSRs
> 81b1b9ca6d5 backport: KVM: VMX: Fix host initiated access to guest MSR_TSC_AUX

I see both of those in our distro kernel.
> I'm not sure if they should be added to this BZ or not, so I will let your team decide on that.

You might want b0996ae48 as well, which is a fix for the first of those.

> Thank you everyone for your help in troubleshooting to get to the root of this issue!

Can you just confirm it works fine with the distro kernel?
Thanks for tracking it down and making sure the missing fixes went into stable.

Dave

(In reply to Dr. David Alan Gilbert from comment #12)
> I wish you'd mentioned you were using a non-distro kernel earlier!
> Does it work with the distro kernel?

My apologies for not mentioning the kernel version. Since I was able to fix this in userspace, I had not considered it could be a kernel issue and forgot to mention it. Yes, it works with the distro kernel.
> You might want b0996ae48 as well, which is a fix for the first of those.

Thank you for that!

> Can you just confirm it works fine with the distro kernel?

Confirmed. This problem does not present itself with the distro kernel.

> Thanks for tracking it down and making sure the missing fixes went into stable.

You're welcome, I am happy to help!

-Eric

Thanks! Based on comment 13: closed as Not-a-bug. The distro kernel works and the main upstream tree works; it's just a bug in the upstream stable tree.
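For anyone maintaining a 4.1.y-based kernel, the thread implies a practical check: does a given tree already carry 609e36d372a, 81b1b9ca6d5 and the follow-up fix b0996ae48? Nobody in the bug posted a script for this. The POSIX sh functions below are a minimal, hypothetical sketch of one way to do it, and the names `has_commit` and `check_commits` are made up for illustration. The sketch assumes that a stable backport mentions the upstream hash in its commit message, which is the usual "commit <sha> upstream." convention. The message search is needed because a backported commit gets a new hash in the stable tree, so testing ancestry alone would miss it.

```shell
# Hypothetical helpers (not from the bug thread): check whether the kernel
# tree in the current directory contains given commits, either directly or
# as a stable backport whose message mentions the upstream hash.

# has_commit SHA: succeed if SHA resolves to an ancestor of HEAD, or if any
# commit reachable from HEAD mentions SHA in its message (assumed backport
# convention: "commit <sha> upstream.").
has_commit() {
  if git merge-base --is-ancestor "$1" HEAD 2>/dev/null; then
    return 0
  fi
  [ -n "$(git log --format=%H --grep="$1" HEAD)" ]
}

# check_commits SHA...: print "present: SHA" or "MISSING: SHA" for each
# argument; return non-zero if at least one commit was not found.
check_commits() {
  missing=0
  for sha in "$@"; do
    if has_commit "$sha"; then
      echo "present: $sha"
    else
      echo "MISSING: $sha"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, from a linux-stable checkout of the release in question, you might run `check_commits 609e36d372a 81b1b9ca6d5 b0996ae48`. A MISSING line only means that neither the hash nor a commit message citing it was found. A backport that references its upstream commit in some other way would need a search by subject line instead.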