Bug 1116398
Summary: RHEV-H crashes and reboots when ksmd (MOM) is enabled
Product: Red Hat Enterprise Linux 6
Reporter: akotov
Component: kernel
kernel sub component: KVM
Assignee: Paolo Bonzini <pbonzini>
QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED ERRATA
Docs Contact: ---
Severity: urgent
Priority: urgent
CC: aarcange, agkesos, alitke, areis, audgiri, bmcclain, carlos.molina.ext, chayang, cshao, dhoward, fdeutsch, f_ella, gouyang, hkrzesin, huiwa, iheim, jbuchta, juzhang, lcapitulino, leiwang, lilu, liwan, michen, mkenneth, pagupta, pbonzini, pstehlik, qiguo, qzhang, rbalakri, rbarry, rhodain, rhod, riel, rpacheco, virt-bugs, virt-maint, yaniwang, yanwang, ycui, yeylon
Version: 6.5
Keywords: Reopened, ZStream
Target Milestone: rc
Target Release: 6.5
Hardware: Unspecified
OS: Unspecified
Whiteboard: ---
Fixed In Version: kernel-2.6.32-527.el6
Doc Type: Bug Fix
Doc Text:
Cause: KVM takes a page fault with interrupts disabled.
Consequence: the page fault handler tries to take a lock, but KSM has sent an IPI while holding that same lock. KSM waits for the IPI to be processed, while KVM will not process it until it acquires the lock. KSM and KVM therefore deadlock, each waiting for the other.
Fix: Avoid operations that can page fault while interrupts are disabled.
Result: KVM and KSM no longer deadlock each other.
Story Points: ---
Clone Of: ---
Clones: 1192055 (view as bug list)
Environment: ---
Last Closed: 2015-07-22 08:09:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM: ---
Verified Versions: ---
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host: ---
Cloudforms Team: ---
Target Upstream Version: ---
Embargoed: ---
Bug Depends On: ---
Bug Blocks: 1002699, 1069309, 1192055
Attachments: eatmemory, RHEV-H-crash.png, mom.log, cpuinfo.txt (see comments)
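To make the deadlock described in the Doc Text above concrete, here is a minimal userspace sketch of its shape: one thread (standing in for ksmd) takes a lock and then spins waiting for its "IPI" to be acknowledged, while the thread that would acknowledge it (standing in for a KVM vCPU running with interrupts disabled) is blocked trying to take the same lock. The names and the pthread modelling are illustrative assumptions, not the actual kernel code paths.

```c
/* Userspace analogue of the deadlock described above.  "ksm" holds a lock
 * and then waits for its "IPI" to be acknowledged; "kvm" can only
 * acknowledge the IPI after its critical section, but that critical
 * section needs the same lock.  All names are illustrative. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile int ipi_pending = 0;
static volatile int ipi_acked = 0;

static void *ksm_side(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&shared_lock);   /* take the lock ...               */
    ipi_pending = 1;                    /* ... send the "IPI" ...          */
    while (!ipi_acked)                  /* ... and wait until it has run   */
        ;
    pthread_mutex_unlock(&shared_lock);
    return NULL;
}

static void *kvm_vcpu_side(void *arg)
{
    (void)arg;
    usleep(100 * 1000);                 /* let the "ksm" side win the race */

    /* "Interrupts are disabled" here: the pending IPI will only be
     * processed after this section.  The page-fault path needs the lock
     * that "ksm" is holding, so this blocks forever. */
    pthread_mutex_lock(&shared_lock);
    pthread_mutex_unlock(&shared_lock);

    if (ipi_pending)                    /* "interrupts on" again: ack the IPI */
        ipi_acked = 1;
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, ksm_side, NULL);
    pthread_create(&b, NULL, kvm_vcpu_side, NULL);
    puts("threads started; they deadlock by design (Ctrl-C to exit)");
    pthread_join(a, NULL);              /* never completes */
    pthread_join(b, NULL);
    return 0;
}
```

Built with `gcc -pthread`, the program hangs with both threads stuck, which is the point: each side is waiting for something only the other side can provide.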
Description (akotov, 2014-07-04 13:00:18 UTC)
Moving this to qemu-kvm for now, but I am not sure whether this is a qemu-kvm-rhev or a kernel issue. This affects (at least) RHEV-H, so requesting 6.5.z.

Hi,

The first thing that comes to mind when I see a soft lockup like this is that you are trying to run too many vCPUs for the number of physical cores in the hardware. Could you share some more information about your configuration:

1. /proc/cpuinfo on the host
2. Details about the VM load on the host:
   a. how many VMs
   b. how many vCPUs per VM
   c. how much memory is assigned to each VM
   d. the workload that is driving memory consumption inside the VMs
3. /var/log/vdsm/mom.log around the time of the crash, so I can see the KSM settings that are being activated.

RHEV-H QE reached the following conclusions after running three different scenarios.

Test version:
rhev-hypervisor6-6.5-20140624.0.el6ev
ovirt-node-3.0.1-18.el6_5.11.noarch
vdsm-4.14.7-3.el6ev.x86_64
RHEVM av10

Test scenario 1 (host: RHEV-H):
Run enough VMs to fill host memory and trigger KSM activation.
Test result: KSM starts automatically and RHEV-H does not crash.

Test scenario 2 (host: RHEV-H):
1. Create a VM whose memory is close to the host's (e.g. if the host has 48G of memory, set the VM memory to 48G).
2. Run the eatmemory script in the VM to exhaust memory.
Test result: RHEV-H crashes.

Test scenario 3 (host: RHEL):
Follow the same steps as scenario 2 on a RHEL host.
Test result:
1. The eatmemory process is killed automatically.
2. The RHEL host does not crash.

So the crash only occurs on RHEV-H, not on RHEL. You can find the script and crash.png in the attachments. Thanks!

Created attachment 917023 [details]
eatmemory
Created attachment 917024 [details]
RHEV-H-crash.png
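The eatmemory attachment itself is not inlined in this export; the sketch below is a hypothetical stand-in for that kind of memory hog. It allocates memory in chunks and touches every byte so the pages really get backed by RAM, until allocation fails or the OOM killer removes the process. The chunk size and command-line handling are assumptions, not the attached script.

```c
/* Hypothetical stand-in for an "eatmemory"-style hog (not the attached
 * script): allocate memory chunk by chunk and touch every byte so the
 * pages are actually backed, then hold on to it until killed. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    long mb = (argc > 1) ? atol(argv[1]) : 1024;  /* MiB to eat, e.g. ./eatmemory 20000 */
    const size_t chunk = 1024 * 1024;             /* assumed 1 MiB per allocation */

    printf("Eating %ld MiB in chunks of %zu bytes...\n", mb, chunk);
    for (long i = 0; i < mb; i++) {
        char *p = malloc(chunk);
        if (p == NULL) {
            fprintf(stderr, "allocation failed after %ld MiB\n", i);
            break;
        }
        memset(p, 0xa5, chunk);   /* touch the pages; intentionally never freed */
    }
    puts("holding the memory; kill the process to release it");
    pause();                      /* keep the pressure on until killed */
    return 0;
}
```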
Hey Ying, thanks for the extensive testing. Please provide some more details (see inline).

(In reply to shaochen from comment #6)
> Test scenario 2 (host: RHEV-H):
> 1. Create a VM whose memory is close to the host's (e.g. if the host has
> 48G of memory, set the VM memory to 48G).
> 2. Run the eatmemory script in the VM to exhaust memory.
>
> Test result:
> RHEV-H crashes.

Please provide the output of `free -m`, or some details on how much swap was available.

> Test scenario 3 (host: RHEL):
> Follow the same steps as scenario 2 on a RHEL host.
>
> Test result:
> 1. The eatmemory process is killed automatically.
> 2. The RHEL host does not crash.

The same as above - please provide `free -m` or some details on the swap available.

> So the crash only occurs on RHEV-H, not on RHEL.
> You can find the script and crash.png in the attachments.

IIUIC, then it is quite normal that memory hogs get killed when the kernel runs out of memory.

> Please provide the output of `free -m`, or some details on how much swap
> was available.

=====================================================
[root@ibm-x3650m3-01 admin]# free -m
             total       used       free     shared    buffers     cached
Mem:         48259       4505      43754          0         30        195
-/+ buffers/cache:        4278      43981
Swap:        32323        868      31455
[root@ibm-x3650m3-01 admin]# free -m
             total       used       free     shared    buffers     cached
Mem:         48259       9817      38442          0         30        195
-/+ buffers/cache:        9591      38668
Swap:        32323        868      31455
[root@ibm-x3650m3-01 admin]# free -m
             total       used       free     shared    buffers     cached
Mem:         48259      22919      25340          0         30        196
-/+ buffers/cache:       22692      25567
Swap:        32323        867      31456
[root@ibm-x3650m3-01 admin]# free -m
             total       used       free     shared    buffers     cached
Mem:         48259      36447      11812          0         30        196
-/+ buffers/cache:       36220      12039
Swap:        32323        864      31459
[root@ibm-x3650m3-01 admin]# free -m
             total       used       free     shared    buffers     cached
Mem:         48259      46712       1547          0         30        196
-/+ buffers/cache:       46485       1774
Swap:        32323        864      31459
[root@ibm-x3650m3-01 admin]# free -m
             total       used       free     shared    buffers     cached
Mem:         48259      47990        269          0         22        148
-/+ buffers/cache:       47819        440
Swap:        32323       1137      31186

Scenario 2's error is different from the original bug's error.

> > Test scenario 3 (host: RHEL):
> > Follow the same steps as scenario 2 on a RHEL host.
> >
> > Test result:
> > 1. The eatmemory process is killed automatically.
> > 2. The RHEL host does not crash.
>
> The same as above - please provide `free -m` or some details on the swap
> available.

====================================================
[root@dell-op740-03 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7808       1428       6380          0         81        163
-/+ buffers/cache:        1183       6625
Swap:            0          0          0
[root@dell-op740-03 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7808       3955       3853          0         81        163
-/+ buffers/cache:        3710       4098
Swap:            0          0          0
[root@dell-op740-03 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7808       5995       1813          0         81        163
-/+ buffers/cache:        5750       2058
Swap:            0          0          0
[root@dell-op740-03 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7808       7583        225          0         81        163
-/+ buffers/cache:        7338        470
Swap:            0          0          0

> > > So the crash only occurs on RHEV-H, not on RHEL.
> > > You can find the script and crash.png in the attachments.
> >
> > IIUIC, then it is quite normal that memory hogs get killed when the
> > kernel runs out of memory.

Chen, could you also please provide the information Adam needs, see comment 5.

(In reply to Fabian Deutsch from comment #14)
> Chen, could you also please provide the information Adam needs, see comment
> 5.

1. /proc/cpuinfo on the host
Please see attachment "cpuinfo.txt"

2. Details about the VM load on the host
a. how many VMs
9 VMs
b. how many vCPUs per VM
1 or 2 vCPUs per VM
c. how much memory is assigned to each VM
The total memory of the host is 48G, and 4800M of memory is assigned to each VM.
d. the workload that is driving memory consumption inside the VMs
99%

3. /var/log/vdsm/mom.log around the time of the crash, so I can see the KSM settings that are being activated.
Please see attachment "mom.log"

Created attachment 918569 [details]
mom.log
Created attachment 918570 [details]
cpuinfo.txt
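mom.log and cpuinfo.txt are attached rather than inlined, so the exact KSM settings MOM applied are not visible in this export. When KSM is "activated", what ultimately changes are the standard KSM sysfs knobs under /sys/kernel/mm/ksm/; the sketch below (run as root) shows those knobs with placeholder values, as an assumption about the net effect rather than a transcript of what MOM/vdsm did on this host.

```c
/* Illustration (run as root) of the KSM sysfs knobs that get adjusted
 * when KSM is activated.  The values are placeholders, not the settings
 * recorded in this bug's mom.log. */
#include <stdio.h>

static void write_knob(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return;
    }
    fputs(value, f);
    fclose(f);
}

int main(void)
{
    write_knob("/sys/kernel/mm/ksm/pages_to_scan",   "64");  /* pages scanned per wake-up */
    write_knob("/sys/kernel/mm/ksm/sleep_millisecs", "10");  /* pause between scan batches */
    write_knob("/sys/kernel/mm/ksm/run",             "1");   /* 1 = start ksmd, 0 = stop */
    return 0;
}
```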
Hey Linqing,

As comment 10 said, scenario 2's error is different from the original bug's error. We suspect it is a separate new bug, not the same issue as this one. And we cannot determine whether Shao Chen's test procedure reproduces the customer's bug exactly. Could you help reproduce this bug on the kernel side?

Thanks
Ying

Hey Alexander,

See comment 5, could you reply and provide the information?

Thanks
Ying

Hey Adam, does the information (thanks Chen) from comment 15 shed some more light on this?

The other two needinfo flags got removed unintentionally. Adding them back. Sorry for the confusion.

The attachment in comment #8 is a different bug than the one in comment #0 and comment #1. For comment #8 please file another bug report; in my upstream aa git tree I have fixed several issues with OOM handling related to ext4 I/O errors that even lead to remounting the fs read-only (found with trinity triggering floods of OOMs).

For this bug (comment #0 and comment #1) it seems to be some sort of deadlock in smp_call_function_single/many. I would have expected the NMI watchdog to trigger too, but checking the sos report it didn't. The soft lockup shows the deadlock kept running for 67 seconds before the full crash; the NMI watchdog should fire in 5 seconds, much less than that. It is unclear whether it is a lock inversion between all those smp_call_functions running simultaneously or something else. One wouldn't expect bugs in the IPI delivery logic because it runs all the time.

It would help if you could run SYSRQ+L and SYSRQ+T while syslog is still able to log (i.e. within the first 67 seconds) and report the output. A crash dump would also help, as then we could see the stack traces of all CPUs. To give an example, CPU 1 is not shown; it is possible the culprit is one of those CPUs that don't show the soft lockup. I'll think more about the available stack trace next week. And if this is only reproducible on a single NUMA system and not everywhere else, we could evaluate whether there are hardware issues in the NUMA IPI delivery. A lost IPI can explain this too: there is a CPU waiting in csd_lock_wait in generic_exec_single that is just waiting for the IPI to run. (Again, if the IPI doesn't run, it normally means irqs have been disabled for too long on that CPU, but then the NMI watchdog should have fired; or the IPI was lost by the hardware; or there is some other software bug in the IPI delivery.) "grep NMI /proc/interrupts" and "cat /proc/sys/kernel/nmi_watchdog" can also verify that the NMI watchdog is running.

Shao Chen,
As comment 25, could you help submit a new bug for your comment #8? Thanks.

(In reply to Ying Cui from comment #28)
> Shao Chen,
> As comment 25, could you help submit a new bug for your comment #8? Thanks.

I can't reproduce this issue with rhev-hypervisor6-6.5-20140821.1.el6ev (kernel-2.6.32-431.29.2.el6.x86_64 + qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64). The eatmemory process is killed automatically, so please ignore my comment. Thanks!

./eatmemory 20000M
Eating 20971520000 bytes in chunks of 1024...
Killed

I'm building a patch after discussion with Andrea.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

*** Bug 1083448 has been marked as a duplicate of this bug.
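The patch mentioned above follows the line in the Doc Text: avoid operations that can page fault while interrupts are disabled. The userspace sketch below illustrates that pattern only; it is not the actual kernel-2.6.32-527.el6 patch (whose changelog entry is quoted in the next comment). Here mlock() stands in for pinning the guest vAPIC page ahead of time, so the step that runs in the interrupts-off window is a plain store to memory that is already resident and cannot fault.

```c
/* Userspace sketch of the fix pattern only (not the kernel-2.6.32-527.el6
 * patch): do anything that can fault or sleep before the no-fault window.
 * mlock() stands in for pinning the guest page; the "interrupts disabled"
 * step is then a plain store to memory that is already resident. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096

/* Runs where faulting and sleeping are allowed. */
static unsigned char *prepare_vapic_page(void)
{
    void *p = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED || mlock(p, PAGE_SIZE) != 0) {
        perror("prepare_vapic_page");
        exit(1);
    }
    memset(p, 0, PAGE_SIZE);  /* fault the page in now, while that is safe */
    return p;
}

/* Stands in for the interrupts-off window right before guest entry:
 * nothing here can fault, so nothing can wait on a lock held by ksmd. */
static void sync_vapic(unsigned char *vapic, unsigned char isr)
{
    vapic[0] = isr;
}

int main(void)
{
    unsigned char *vapic = prepare_vapic_page();
    sync_vapic(vapic, 0x10);
    printf("vapic byte synced: 0x%02x\n", vapic[0]);
    return 0;
}
```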
*** Patch(es) available on kernel-2.6.32-527.el6

Tested the following scenarios with:

# uname -r
2.6.32-550.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.464.el6.x86_64
# rpm -qpi kernel-2.6.32-550.el6.x86_64.rpm --changelog | grep 1116398
- [x86] kvm: Avoid pagefault in kvm_lapic_sync_to_vapic (Paolo Bonzini) [1116398]

ENV: the host has 512G of memory:

# free -g
             total       used       free     shared    buffers     cached
Mem:           504          3        501          0          0          0
-/+ buffers/cache:           2        501
Swap:            3          0          3

The guests were started with a command line like this:

/usr/libexec/qemu-kvm -cpu Opteron_G1 -M rhel6.5.0 -enable-kvm -m 52G -smp 4,sockets=1,cores=4,threads=1 -name rhel6.4-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93009 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive file=/home/RHEL-Server-6.7-64-virtio-scsi.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,snapshot=on -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d5:51:12,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -vga qxl -spice port=5911,disable-ticketing,seamless-migration=on -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864

Scenario 1:
1. Start 14 guests:
# ps aux | grep qemu -c
14
2. Start stress in each guest:
# stress -m 1 --vm-bytes 50000M --vm-keep
3. Wait until host memory is fully used, which triggers KSM activation:
# free -m
             total       used       free     shared    buffers     cached
Mem:        516858     516337        521          0          1         46
-/+ buffers/cache:      516289        569
Swap:         4095       4040         55
# service ksm status
ksm is running

Result: after waiting a long time, host and guests work well; no crash or soft lockup occurs.

Scenario 2:
# service ksmtuned status
ksmtuned is stopped
# service ksm status
ksm is running
1. Start a guest with 50G of memory.
2. Start stress in the guest:
# stress -m 1 --vm-bytes 50000M --vm-keep
3. Try to execute the eatmemory program:
# ./eatmemory 500000M
Eating 524288000000 bytes in chunks of 1024...
Killed

Result: guest and host work well.

So the tests pass and this bug is fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html