Bug 1116398
Summary: RHEV-H crashes and reboots when ksmd (MOM) is enabled
Product: Red Hat Enterprise Linux 6
Reporter: akotov
Component: kernel
kernel sub component: KVM
Assignee: Paolo Bonzini <pbonzini>
QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED ERRATA
Docs Contact: ---
Severity: urgent
Priority: urgent
CC: aarcange, agkesos, alitke, areis, audgiri, bmcclain, carlos.molina.ext, chayang, cshao, dhoward, fdeutsch, f_ella, gouyang, hkrzesin, huiwa, iheim, jbuchta, juzhang, lcapitulino, leiwang, lilu, liwan, michen, mkenneth, pagupta, pbonzini, pstehlik, qiguo, qzhang, rbalakri, rbarry, rhodain, rhod, riel, rpacheco, virt-bugs, virt-maint, yaniwang, yanwang, ycui, yeylon
Version: 6.5
Keywords: Reopened, ZStream
Target Milestone: rc
Target Release: 6.5
Hardware: Unspecified
OS: Unspecified
Whiteboard: ---
Fixed In Version: kernel-2.6.32-527.el6
Doc Type: Bug Fix
Doc Text:
Cause: KVM takes a page fault with interrupts disabled.
Consequence: the page fault handler tries to take a lock, but KSM has sent an IPI while holding that same lock. KSM waits for the IPI to be processed, while KVM will not process it until it acquires the lock. KSM and KVM therefore deadlock, each waiting for the other.
Fix: Avoid operations that can page fault while interrupts are disabled.
Result: KVM and KSM no longer deadlock each other.
Story Points: ---
Clone Of: ---
Clones: 1192055 (view as bug list)
Environment: ---
Last Closed: 2015-07-22 08:09:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM: ---
Verified Versions: ---
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host: ---
Cloudforms Team: ---
Target Upstream Version: ---
Embargoed: ---
Bug Depends On: ---
Bug Blocks: 1002699, 1069309, 1192055
Attachments: eatmemory, RHEV-H-crash.png, mom.log, cpuinfo.txt (see comments)
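To make the deadlock described in the Doc Text above concrete, here is a minimal userspace sketch of its shape: one thread (standing in for ksmd) takes a lock and then spins waiting for its "IPI" to be acknowledged, while the thread that would acknowledge it (standing in for a KVM vCPU running with interrupts disabled) is blocked trying to take the same lock. The names and the pthread modelling are illustrative assumptions, not the actual kernel code paths.

```c
/* Userspace analogue of the deadlock described above.  "ksm" holds a lock
 * and then waits for its "IPI" to be acknowledged; "kvm" can only
 * acknowledge the IPI after its critical section, but that critical
 * section needs the same lock.  All names are illustrative. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile int ipi_pending = 0;
static volatile int ipi_acked = 0;

static void *ksm_side(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&shared_lock);   /* take the lock ...               */
    ipi_pending = 1;                    /* ... send the "IPI" ...          */
    while (!ipi_acked)                  /* ... and wait until it has run   */
        ;
    pthread_mutex_unlock(&shared_lock);
    return NULL;
}

static void *kvm_vcpu_side(void *arg)
{
    (void)arg;
    usleep(100 * 1000);                 /* let the "ksm" side win the race */

    /* "Interrupts are disabled" here: the pending IPI will only be
     * processed after this section.  The page-fault path needs the lock
     * that "ksm" is holding, so this blocks forever. */
    pthread_mutex_lock(&shared_lock);
    pthread_mutex_unlock(&shared_lock);

    if (ipi_pending)                    /* "interrupts on" again: ack the IPI */
        ipi_acked = 1;
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, ksm_side, NULL);
    pthread_create(&b, NULL, kvm_vcpu_side, NULL);
    puts("threads started; they deadlock by design (Ctrl-C to exit)");
    pthread_join(a, NULL);              /* never completes */
    pthread_join(b, NULL);
    return 0;
}
```

Built with `gcc -pthread`, the program hangs with both threads stuck, which is the point: each side is waiting for something only the other side can provide.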
Description (akotov, 2014-07-04 13:00:18 UTC)
Moving this to qemu-kvm for now, but I am not sure whether this is a qemu-kvm-rhev or a kernel issue. This affects (at least) RHEV-H, so requesting 6.5.z.

Hi,

The first thing that comes to mind when I see a soft lockup like this is that you are trying to run too many vCPUs for the number of physical cores in the hardware. Could you share some more information about your configuration:

1. /proc/cpuinfo on the host
2. Details about the VM load on the host:
   a. how many VMs
   b. how many vCPUs per VM
   c. how much memory is assigned to each VM
   d. the workload that is driving memory consumption inside the VMs
3. /var/log/vdsm/mom.log around the time of the crash, so I can see the KSM settings that are being activated.

RHEV-H QE reached the following conclusions after running three different scenarios.

Test version:
rhev-hypervisor6-6.5-20140624.0.el6ev
ovirt-node-3.0.1-18.el6_5.11.noarch
vdsm-4.14.7-3.el6ev.x86_64
RHEVM av10

Test scenario 1 (host: RHEV-H):
Run enough VMs to fill host memory and trigger KSM activation.
Test result: KSM starts automatically and RHEV-H does not crash.

Test scenario 2 (host: RHEV-H):
1. Create a VM whose memory is close to the host's (e.g. if the host has 48G of memory, set the VM memory to 48G).
2. Run the eatmemory script in the VM to exhaust memory.
Test result: RHEV-H crashes.

Test scenario 3 (host: RHEL):
Follow the same steps as scenario 2 on a RHEL host.
Test result:
1. The eatmemory process is killed automatically.
2. The RHEL host does not crash.

So the crash only occurs on RHEV-H, not on RHEL. You can find the script and crash.png in the attachments. Thanks!

Created attachment 917023 [details]
eatmemory
Created attachment 917024 [details]
RHEV-H-crash.png
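The eatmemory attachment itself is not inlined in this export; the sketch below is a hypothetical stand-in for that kind of memory hog. It allocates memory in chunks and touches every byte so the pages really get backed by RAM, until allocation fails or the OOM killer removes the process. The chunk size and command-line handling are assumptions, not the attached script.

```c
/* Hypothetical stand-in for an "eatmemory"-style hog (not the attached
 * script): allocate memory chunk by chunk and touch every byte so the
 * pages are actually backed, then hold on to it until killed. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    long mb = (argc > 1) ? atol(argv[1]) : 1024;  /* MiB to eat, e.g. ./eatmemory 20000 */
    const size_t chunk = 1024 * 1024;             /* assumed 1 MiB per allocation */

    printf("Eating %ld MiB in chunks of %zu bytes...\n", mb, chunk);
    for (long i = 0; i < mb; i++) {
        char *p = malloc(chunk);
        if (p == NULL) {
            fprintf(stderr, "allocation failed after %ld MiB\n", i);
            break;
        }
        memset(p, 0xa5, chunk);   /* touch the pages; intentionally never freed */
    }
    puts("holding the memory; kill the process to release it");
    pause();                      /* keep the pressure on until killed */
    return 0;
}
```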
Hey Ying, thanks for the extensive testing. Please provide some more details (see inline).

(In reply to shaochen from comment #6)
> Test scenario 2 (host: RHEV-H):
> 1. Create a VM whose memory is close to the host's (e.g. if the host has
> 48G of memory, set the VM memory to 48G).
> 2. Run the eatmemory script in the VM to exhaust memory.
>
> Test result:
> RHEV-H crashes.

Please provide the output of `free -m`, or some details on how much swap was available.

> Test scenario 3 (host: RHEL):
> Follow the same steps as scenario 2 on a RHEL host.
>
> Test result:
> 1. The eatmemory process is killed automatically.
> 2. The RHEL host does not crash.

The same as above - please provide `free -m` or some details on the swap available.

> So the crash only occurs on RHEV-H, not on RHEL.
> You can find the script and crash.png in the attachments.

IIUIC, then it is quite normal that memory hogs get killed when the kernel runs out of memory.

> Please provide the output of `free -m`, or some details on how much swap
> was available.

=====================================================
[root@ibm-x3650m3-01 admin]# free -m
             total       used       free     shared    buffers     cached
Mem:         48259       4505      43754          0         30        195
-/+ buffers/cache:        4278      43981
Swap:        32323        868      31455
[root@ibm-x3650m3-01 admin]# free -m
             total       used       free     shared    buffers     cached
Mem:         48259       9817      38442          0         30        195
-/+ buffers/cache:        9591      38668
Swap:        32323        868      31455
[root@ibm-x3650m3-01 admin]# free -m
             total       used       free     shared    buffers     cached
Mem:         48259      22919      25340          0         30        196
-/+ buffers/cache:       22692      25567
Swap:        32323        867      31456
[root@ibm-x3650m3-01 admin]# free -m
             total       used       free     shared    buffers     cached
Mem:         48259      36447      11812          0         30        196
-/+ buffers/cache:       36220      12039
Swap:        32323        864      31459
[root@ibm-x3650m3-01 admin]# free -m
             total       used       free     shared    buffers     cached
Mem:         48259      46712       1547          0         30        196
-/+ buffers/cache:       46485       1774
Swap:        32323        864      31459
[root@ibm-x3650m3-01 admin]# free -m
             total       used       free     shared    buffers     cached
Mem:         48259      47990        269          0         22        148
-/+ buffers/cache:       47819        440
Swap:        32323       1137      31186

Scenario 2's error is different from the original bug's error.

> > Test scenario 3 (host: RHEL):
> > Follow the same steps as scenario 2 on a RHEL host.
> >
> > Test result:
> > 1. The eatmemory process is killed automatically.
> > 2. The RHEL host does not crash.
>
> The same as above - please provide `free -m` or some details on the swap
> available.

====================================================
[root@dell-op740-03 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7808       1428       6380          0         81        163
-/+ buffers/cache:        1183       6625
Swap:            0          0          0
[root@dell-op740-03 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7808       3955       3853          0         81        163
-/+ buffers/cache:        3710       4098
Swap:            0          0          0
[root@dell-op740-03 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7808       5995       1813          0         81        163
-/+ buffers/cache:        5750       2058
Swap:            0          0          0
[root@dell-op740-03 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7808       7583        225          0         81        163
-/+ buffers/cache:        7338        470
Swap:            0          0          0

> > > So the crash only occurs on RHEV-H, not on RHEL.
> > > You can find the script and crash.png in the attachments.
> >
> > IIUIC, then it is quite normal that memory hogs get killed when the
> > kernel runs out of memory.

Chen, could you also please provide the information Adam needs, see comment 5.

(In reply to Fabian Deutsch from comment #14)
> Chen, could you also please provide the information Adam needs, see comment
> 5.

1. /proc/cpuinfo on the host
Please see attachment "cpuinfo.txt"

2. Details about the VM load on the host
a. how many VMs
9 VMs
b. how many vCPUs per VM
1 or 2 vCPUs per VM
c. how much memory is assigned to each VM
The total memory of the host is 48G, and 4800M of memory is assigned to each VM.
d. the workload that is driving memory consumption inside the VMs
99%

3. /var/log/vdsm/mom.log around the time of the crash, so I can see the KSM settings that are being activated.
Please see attachment "mom.log"

Created attachment 918569 [details]
mom.log
Created attachment 918570 [details]
cpuinfo.txt
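mom.log and cpuinfo.txt are attached rather than inlined, so the exact KSM settings MOM applied are not visible in this export. When KSM is "activated", what ultimately changes are the standard KSM sysfs knobs under /sys/kernel/mm/ksm/; the sketch below (run as root) shows those knobs with placeholder values, as an assumption about the net effect rather than a transcript of what MOM/vdsm did on this host.

```c
/* Illustration (run as root) of the KSM sysfs knobs that get adjusted
 * when KSM is activated.  The values are placeholders, not the settings
 * recorded in this bug's mom.log. */
#include <stdio.h>

static void write_knob(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return;
    }
    fputs(value, f);
    fclose(f);
}

int main(void)
{
    write_knob("/sys/kernel/mm/ksm/pages_to_scan",   "64");  /* pages scanned per wake-up */
    write_knob("/sys/kernel/mm/ksm/sleep_millisecs", "10");  /* pause between scan batches */
    write_knob("/sys/kernel/mm/ksm/run",             "1");   /* 1 = start ksmd, 0 = stop */
    return 0;
}
```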
Hey Linqing,

As comment 10 said, scenario 2's error is different from the original bug's error. We suspect it is a separate new bug, not the same issue as this one. And we cannot determine whether Shao Chen's test procedure reproduces the customer's bug exactly. Could you help reproduce this bug on the kernel side?

Thanks
Ying

Hey Alexander,

See comment 5, could you reply and provide the information?

Thanks
Ying

Hey Adam, does the information (thanks Chen) from comment 15 shed some more light on this?

The other two needinfo flags got removed unintentionally. Adding them back. Sorry for the confusion.

The attachment in comment #8 is a different bug than the one in comment #0 and comment #1. For comment #8 please file another bug report; in my upstream aa git tree I have fixed several issues with OOM handling related to ext4 I/O errors that even lead to remounting the fs read-only (found with trinity triggering floods of OOMs).

For this bug (comment #0 and comment #1) it seems to be some sort of deadlock in smp_call_function_single/many. I would have expected the NMI watchdog to trigger too, but checking the sos report it didn't. The soft lockup shows the deadlock kept running for 67 seconds before the full crash; the NMI watchdog should fire in 5 seconds, much less than that. It is unclear whether it is a lock inversion between all those smp_call_functions running simultaneously or something else. One wouldn't expect bugs in the IPI delivery logic because it runs all the time.

It would help if you could run SYSRQ+L and SYSRQ+T while syslog is still able to log (i.e. within the first 67 seconds) and report the output. A crash dump would also help, as then we could see the stack traces of all CPUs. To give an example, CPU 1 is not shown; it is possible the culprit is one of those CPUs that don't show the soft lockup. I'll think more about the available stack trace next week. And if this is only reproducible on a single NUMA system and not everywhere else, we could evaluate whether there are hardware issues in the NUMA IPI delivery. A lost IPI can explain this too: there is a CPU waiting in csd_lock_wait in generic_exec_single that is just waiting for the IPI to run. (Again, if the IPI doesn't run, it normally means irqs have been disabled for too long on that CPU, but then the NMI watchdog should have fired; or the IPI was lost by the hardware; or there is some other software bug in the IPI delivery.) "grep NMI /proc/interrupts" and "cat /proc/sys/kernel/nmi_watchdog" can also verify that the NMI watchdog is running.

Shao Chen,
As comment 25, could you help submit a new bug for your comment #8? Thanks.

(In reply to Ying Cui from comment #28)
> Shao Chen,
> As comment 25, could you help submit a new bug for your comment #8? Thanks.

I can't reproduce this issue with rhev-hypervisor6-6.5-20140821.1.el6ev (kernel-2.6.32-431.29.2.el6.x86_64 + qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64). The eatmemory process is killed automatically, so please ignore my comment. Thanks!

./eatmemory 20000M
Eating 20971520000 bytes in chunks of 1024...
Killed

I'm building a patch after discussion with Andrea.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

*** Bug 1083448 has been marked as a duplicate of this bug.
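The patch mentioned above follows the line in the Doc Text: avoid operations that can page fault while interrupts are disabled. The userspace sketch below illustrates that pattern only; it is not the actual kernel-2.6.32-527.el6 patch (whose changelog entry is quoted in the next comment). Here mlock() stands in for pinning the guest vAPIC page ahead of time, so the step that runs in the interrupts-off window is a plain store to memory that is already resident and cannot fault.

```c
/* Userspace sketch of the fix pattern only (not the kernel-2.6.32-527.el6
 * patch): do anything that can fault or sleep before the no-fault window.
 * mlock() stands in for pinning the guest page; the "interrupts disabled"
 * step is then a plain store to memory that is already resident. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096

/* Runs where faulting and sleeping are allowed. */
static unsigned char *prepare_vapic_page(void)
{
    void *p = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED || mlock(p, PAGE_SIZE) != 0) {
        perror("prepare_vapic_page");
        exit(1);
    }
    memset(p, 0, PAGE_SIZE);  /* fault the page in now, while that is safe */
    return p;
}

/* Stands in for the interrupts-off window right before guest entry:
 * nothing here can fault, so nothing can wait on a lock held by ksmd. */
static void sync_vapic(unsigned char *vapic, unsigned char isr)
{
    vapic[0] = isr;
}

int main(void)
{
    unsigned char *vapic = prepare_vapic_page();
    sync_vapic(vapic, 0x10);
    printf("vapic byte synced: 0x%02x\n", vapic[0]);
    return 0;
}
```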
*** Patch(es) available on kernel-2.6.32-527.el6

Tested the following scenarios with:

# uname -r
2.6.32-550.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.464.el6.x86_64
# rpm -qpi kernel-2.6.32-550.el6.x86_64.rpm --changelog | grep 1116398
- [x86] kvm: Avoid pagefault in kvm_lapic_sync_to_vapic (Paolo Bonzini) [1116398]

ENV: the host has 512G of memory:

# free -g
             total       used       free     shared    buffers     cached
Mem:           504          3        501          0          0          0
-/+ buffers/cache:           2        501
Swap:            3          0          3

The guests were started with a command line like this:

/usr/libexec/qemu-kvm -cpu Opteron_G1 -M rhel6.5.0 -enable-kvm -m 52G -smp 4,sockets=1,cores=4,threads=1 -name rhel6.4-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93009 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive file=/home/RHEL-Server-6.7-64-virtio-scsi.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,snapshot=on -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d5:51:12,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -vga qxl -spice port=5911,disable-ticketing,seamless-migration=on -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864

Scenario 1:
1. Start 14 guests:
# ps aux | grep qemu -c
14
2. Start stress in each guest:
# stress -m 1 --vm-bytes 50000M --vm-keep
3. Wait until host memory is fully used, which triggers KSM activation:
# free -m
             total       used       free     shared    buffers     cached
Mem:        516858     516337        521          0          1         46
-/+ buffers/cache:      516289        569
Swap:         4095       4040         55
# service ksm status
ksm is running

Result: after waiting a long time, host and guests work well; no crash or soft lockup occurs.

Scenario 2:
# service ksmtuned status
ksmtuned is stopped
# service ksm status
ksm is running
1. Start a guest with 50G of memory.
2. Start stress in the guest:
# stress -m 1 --vm-bytes 50000M --vm-keep
3. Try to execute the eatmemory program:
# ./eatmemory 500000M
Eating 524288000000 bytes in chunks of 1024...
Killed

Result: guest and host work well.

So the tests pass and this bug is fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html