Bug 1525356 - qemu quit silently after virtio-balloon of deadlock on OOM occurred
Summary: qemu quit silently after virtio-balloon of deadlock on OOM occurred
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.6
Hardware: ppc64le
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.6
Assignee: David Gibson
QA Contact: Min Deng
URL:
Whiteboard:
Depends On:
Blocks: 1513404 1528344
 
Reported: 2017-12-13 07:32 UTC by Min Deng
Modified: 2018-07-16 13:27 UTC (History)
12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-06 04:00:06 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
balloonissuelog (84.38 KB, text/plain)
2017-12-20 06:03 UTC, Min Deng
no flags Details
screenshot (258.12 KB, image/png)
2017-12-20 06:06 UTC, Min Deng
no flags Details
Log (15.24 KB, text/plain)
2018-01-29 02:24 UTC, Min Deng
no flags Details


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 162967 0 None None None 2019-03-15 05:03:26 UTC

Description Min Deng 2017-12-13 07:32:26 UTC
Description of problem:
qemu quit silently after virtio-balloon of deadlock on OOM occurred

Version-Release number of selected component (if applicable):
kernel-4.14.0-18.el7a.ppc64le
qemu-kvm-rhev-2.10.0-11.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

How reproducible:
3/3

Steps to Reproduce:
1.boot up the guest with the following command line
/usr/libexec/qemu-kvm -name virt-tests-vm1 -sandbox off -machine pseries-rhel7.5.0 -nodefaults -vga std -device virtio-blk-pci,id=virtio_blk_pci0,disable-legacy=off,disable-modern=off,drive=drive_image1 -drive id=drive_image1,if=none,cache=none,aio=native,format=qcow2,file=Pegas-Server-7.4-bug.qcow2 -qmp tcp:0:5555,server,nowait -vnc :11 -rtc base=utc,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off,strict=off -monitor stdio -device nec-usb-xhci,id=usb1 -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-down,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:11:36:3f:01 -chardev socket,id=serial_id_serial0,path=/tmp/min,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 -m 24G,maxmem=200G,slots=200 -smp 8 -device virtio-balloon-pci,id=balloon0

2.run the following command in the guest
  #memhog -r100 22G 

3.evict the balloon at the same time (in the qemu monitor)
  (qemu) balloon 300

Actual results:
qemu quit silently, just as if the guest had shut itself down.
Expected results:
qemu should not quit even if there is an OOM deadlock in virtio-balloon.

Additional info:
Please refer to bug 1516486 if needed.

Comment 1 Min Deng 2017-12-13 07:50:19 UTC
Output of terminal
/usr/libexec/qemu-kvm -name virt-tests-vm1 -sandbox off -machine pseries-rhel7.5.0 -nodefaults -vga std -device virtio-blk-pci,id=virtio_blk_pci0,disable-legacy=off,disable-modern=off,drive=drive_image1 -drive id=drive_image1,if=none,cache=none,aio=native,format=qcow2,file=Pegas-Server-7.4-bug.qcow2 -qmp tcp:0:5555,server,nowait -vnc :11 -rtc base=utc,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off,strict=off -monitor stdio -device nec-usb-xhci,id=usb1 -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-down,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:11:36:3f:01 -chardev socket,id=serial_id_serial0,path=/tmp/min,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 -m 24G,maxmem=200G,slots=200 -smp 8 -device virtio-balloon-pci,id=balloon0
QEMU 2.10.0 monitor - type 'help' for more information
(qemu) 
(qemu) 
(qemu) 
(qemu) 
(qemu) balloon 300
(qemu) qemu-kvm: network script /etc/qemu-down failed with status 256

Comment 2 Serhii Popovych 2017-12-14 12:41:57 UTC
Components:

Host kernel  : 4.14.0-18.el7a.ppc64le
Guest kernel : 4.14.0-18.el7a.ppc64le
Qemu-KVM     : 2.10.0-12.el7.ppc64le
SLOF         : SLOF-20170724-2.git89f519f.el7.noarch

Using the steps from comment 0 I was not able to get qemu-kvm to exit silently as in comment 1.

Instead I hit continuous traces from the guest kernel, even after booting another guest kernel
(i.e. originally booted 4.14.0-18.el7.ppc64le, hit traces, then booted 3.10.0-820.el7.ppc64le)
without a qemu-kvm stop/start.

[   31.758831] Node 0 DMA free:19584kB min:19584kB low:24448kB high:29376kB active_anon:32576kB inactive_anon:36864kB active_file:704kB inactive_file:896kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:25165824kB managed:413888kB mlocked:0kB dirty:0kB writeback:36928kB mapped:4224kB shmem:5120kB slab_reclaimable:21696kB slab_unreclaimable:99648kB kernel_stack:2688kB pagetables:2240kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[   31.759145] lowmem_reserve[]: 0 0 0
[   31.759204] Node 0 DMA: 34*64kB (UEM) 7*128kB (U) 5*256kB (UM) 4*512kB (UM) 3*1024kB (UM) 2*2048kB (UM) 2*4096kB (UM) 0*8192kB 0*16384kB = 21760kB
[   31.759430] 677 total pagecache pages
[   31.759456] 580 pages in swap cache
[   31.759482] Swap cache stats: add 2251, delete 1671, find 3/4
[   31.759524] Free swap  = 1949632kB
[   31.759550] Total swap = 2097088kB
[   31.759576] 393216 pages RAM
[   31.759601] 0 pages HighMem/MovableOnly
[   31.759627] 386749 pages reserved
[   31.759656] virtio_balloon virtio3: Out of puff! Can't get 16 pages
[   31.989644] kworker/2:1: page allocation failure: order:0, mode:0x310da
[   31.989693] CPU: 2 PID: 76 Comm: kworker/2:1 Not tainted 3.10.0-820.el7.ppc64le #1
[   31.989746] Workqueue: events_freezable update_balloon_size_func [virtio_balloon]
[   31.989804] Call Trace:
[   31.989824] [c0000005ed70b740] [c00000000001b340] show_stack+0x80/0x330 (unreliable)
[   31.989884] [c0000005ed70b7f0] [c000000000a27ebc] dump_stack+0x30/0x44
[   31.989936] [c0000005ed70b810] [c00000000026a77c] warn_alloc_failed+0x10c/0x160
[   31.989996] [c0000005ed70b8c0] [c000000000271518] __alloc_pages_nodemask+0xb68/0xc70
[   31.990055] [c0000005ed70bab0] [c0000000002e9920] alloc_pages_current+0x1f0/0x430
[   31.990115] [c0000005ed70bb30] [c00000000032a794] balloon_page_alloc+0x24/0x40
[   31.990175] [c0000005ed70bb50] [d000000004750f30] update_balloon_size_func+0xe0/0x430 [virtio_balloon]
[   31.990243] [c0000005ed70bc40] [c00000000011784c] process_one_work+0x1dc/0x680
[   31.990303] [c0000005ed70bce0] [c000000000117e90] worker_thread+0x1a0/0x520
[   31.990354] [c0000005ed70bd80] [c000000000123d4c] kthread+0xec/0x100
[   31.990406] [c0000005ed70be30] [c00000000000a4b8] ret_from_kernel_thread+0x5c/0xa4
[   31.990464] Mem-Info:
[   31.990485] active_anon:274 inactive_anon:297 isolated_anon:0
[   31.990485]  active_file:9 inactive_file:12 isolated_file:0
[   31.990485]  unevictable:0 dirty:0 writeback:308 unstable:0
[   31.990485]  slab_reclaimable:339 slab_unreclaimable:1557
[   31.990485]  mapped:66 shmem:79 pagetables:35 bounce:0
[   31.990485]  free:302 free_pcp:0 free_cma:0
[   31.990676] Node 0 DMA free:19328kB min:19584kB low:24448kB high:29376kB active_anon:17536kB inactive_anon:19008kB active_file:576kB inactive_file:768kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:25165824kB managed:380288kB mlocked:0kB dirty:0kB writeback:19712kB mapped:4224kB shmem:5056kB slab_reclaimable:21696kB slab_unreclaimable:99648kB kernel_stack:2688kB pagetables:2240kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

After reverting commit 0dab85ecc476 ([virtio] virtio_balloon: fix increment
of vb->num_pfns in fill_balloon()), introduced with bz1516486 in kernel-alt,
the traces only happen when I reboot the guest kernel without a qemu-kvm stop/start.

Min, do you have results from reproducing comment 0 on x86? Also, I am not sure this is qemu-kvm-rhev related (should the component be kernel-alt?).

Comment 3 David Gibson 2017-12-15 01:03:16 UTC
Serhii, the traces in comment 2 look expected to me: we're inflating the balloon so large that the guest doesn't have enough memory to operate properly, so it generates continuous out-of-memory errors.

qemu quitting is definitely wrong, but there's not much information to go on here.  

Min, are you able to reproduce with the 2.10.0-12.el7.ppc64le qemu that Serhii tried?

Comment 6 Min Deng 2017-12-20 06:02:27 UTC
> Min, correct me if I'm wrong: do you see qemu-kvm process on the host after
> memhog in guest being killed?
  I double checked: there was no qemu process on the host any more. The guest booted up again and again, but the qemu-kvm process quit eventually.
> Do you have any grub (if used) messages on guest?
  qemu-kvm quit from the terminal directly and I only got the following output.
  Please see the attachment.

  Thanks
  Min

Comment 7 Min Deng 2017-12-20 06:03:52 UTC
Created attachment 1370291 [details]
balloonissuelog

The issue can also be reproduced on build
qemu-kvm-rhev-2.10.0-13.el7.ppc64le

Comment 8 Min Deng 2017-12-20 06:06:57 UTC
Created attachment 1370293 [details]
screenshot

Comment 9 David Gibson 2017-12-21 01:48:02 UTC
Min,

We're having trouble reproducing this.  Can you check a couple of situations:

   1) Does the problem occur on POWER8 with the 4.14 kernel?
   2) If (1) does the problem occur with the 3.10 kernel on POWER8?

Comment 10 Min Deng 2017-12-21 09:10:29 UTC
(In reply to David Gibson from comment #9)
> Min,
> 
> We're having trouble reproducing this.  Can you check a couple of situations:
> 
>    1) Does the problem occur on POWER8 with the 4.14 kernel?
>    2) If (1) does the problem occur with the 3.10 kernel on POWER8?

  Hi David,
     The bug couldn't be reproduced on P8 host with the following builds installed
     kernel-3.10.0-823.el7.ppc64le(host and guest)
     qemu-kvm-rhev-2.10.0-13.el7.ppc64le
     and 
     kernel-4.14.0-18.el7a.ppc64le(host and guest)
     qemu-kvm-rhev-2.10.0-13.el7.ppc64le

    Thanks 
 
Min

Comment 11 Tetsuo Handa 2017-12-26 04:11:15 UTC
You are not using the deflate-on-oom=on option when specifying the balloon device.
( https://www.redhat.com/archives/libvir-list/2015-December/msg00494.html )

Unless deflate-on-oom=on is enabled by default, isn't this the expected result:
trying to inflate the balloon too much leads to a kernel panic because there are
no more OOM-killable processes?

What is the problem?
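[Editor's note] For reference, a minimal sketch of the balloon device from comment 0 with this option turned on. This is an illustration, not a command from the report; all other options from the original command line are elided.

```shell
# Sketch: the reproducer's balloon device with deflate-on-oom enabled, so the
# guest may deflate the balloon under memory pressure instead of panicking.
# "..." stands for the rest of the command line from comment 0.
/usr/libexec/qemu-kvm \
    ... \
    -device virtio-balloon-pci,id=balloon0,deflate-on-oom=on
```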

Comment 12 Tetsuo Handa 2017-12-26 07:13:15 UTC
So, the problem is that something is causing the qemu process to terminate
silently, isn't it? Then there are two possibilities to check. One is
that the qemu process is terminating abnormally regardless of the guest
(i.e. a bug on the qemu side). The other is that the guest is calling
reboot(LINUX_REBOOT_CMD_POWER_OFF) due to an emergency situation in which
restarting of killed processes is failing.

I think you can try inserting a SystemTap probe into the reboot events.
For example,

---------- halt2panic.stp ----------
probe kernel.function("kernel_power_off") { panic("Calling panic() due to poweroff\n"); }
probe kernel.function("kernel_halt") { panic("Calling panic() due to halt\n"); }
---------- halt2panic.stp ----------

# stap -p4 -g -m stap_halt2panic halt2panic.stp
# staprun -L stap_halt2panic.ko

will give you logs like below if somebody is calling poweroff/halt.

----------
[  190.895040] Kernel panic - not syncing: Calling panic() due to poweroff
[  190.895040] 
[  190.897992] CPU: 6 PID: 1610 Comm: poweroff Tainted: G           OE  ------------   3.10.0-693.11.1.el7.x86_64 #1
[  190.901634] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  190.905415]  ffffc900006bc000 00000000796da6b0 ffff880134ca3c08 ffffffff816a3e61
[  190.908269]  ffff880134ca3c88 ffffffff8169dd24 0000000000000010 ffff880134ca3c98
[  190.911091]  ffff880134ca3c38 00000000796da6b0 ffffffff810bd514 ffffffffc0506138
[  190.913800] Call Trace:
[  190.914426]  [<ffffffff816a3e61>] dump_stack+0x19/0x1b
[  190.915756]  [<ffffffff8169dd24>] panic+0xe8/0x20d
[  190.916982]  [<ffffffff810bd514>] ? __wake_up+0x44/0x50
[  190.918339]  [<ffffffffc04fff02>] function___global_panic__overload_0+0x62/0x70 [stap_halt2panic]
[  190.920664]  [<ffffffffc0501a80>] probe_6119+0x40/0x80 [stap_halt2panic]
[  190.922434]  [<ffffffff810a0311>] ? kernel_power_off+0x1/0x80
[  190.923893]  [<ffffffffc0503c7e>] enter_kprobe_probe+0x19e/0x330 [stap_halt2panic]
[  190.925789]  [<ffffffff810a0311>] ? kernel_power_off+0x1/0x80
[  190.927282]  [<ffffffff816afa47>] kprobe_ftrace_handler+0xb7/0x120
[  190.928813]  [<ffffffff810a0315>] ? kernel_power_off+0x5/0x80
[  190.930266]  [<ffffffff810a0310>] ? migrate_to_reboot_cpu+0x70/0x70
[  190.931800]  [<ffffffff810a23ab>] ? SYSC_reboot+0x18b/0x260
[  190.933208]  [<ffffffff8114151e>] ftrace_ops_list_func+0xee/0x110
[  190.934703]  [<ffffffff816b6eb4>] ftrace_regs_call+0x5/0x81
[  190.936107]  [<ffffffff810a0310>] ? migrate_to_reboot_cpu+0x70/0x70
[  190.937695]  [<ffffffff810a0315>] ? kernel_power_off+0x5/0x80
[  190.939105]  [<ffffffff810a23ab>] ? SYSC_reboot+0x18b/0x260
[  190.940473]  [<ffffffff816b0091>] ? __do_page_fault+0x171/0x450
[  190.941985]  [<ffffffff810a24ee>] SyS_reboot+0xe/0x10
[  190.943236]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
----------

Comment 13 David Gibson 2018-01-04 01:19:32 UTC
Ugly as the qemu quit is, it's not blocker material, so punting to 7.6.

Comment 14 Serhii Popovych 2018-01-04 13:45:46 UTC
After several tests I have a full picture of the case:

  1) According to comment 11 we do not use the deflate-on-oom=on option for the
     balloon device on the qemu command line. I played with this option and saw
     the behaviour described in bug 1002360#c3: the guest never exits, the OOM
     killer runs periodically, the balloon reports "Out of puff", and the guest
     does not panic.

     This is true for both the 4.14.0-18.el7 and 3.10.0-826.el7 kernels and
     qemu-kvm-rhev-2.10.0-15.el7.

     So after "balloon 300" from step (3) in comment 0 I tried "balloon 1024"
     and system operation resumed; I was able to reboot the guest cleanly, into
     a new kernel, etc.

  2) With deflate-on-oom=off (which I guess is the default) the following
     behaviour is observed:

     2.1) Boot qemu-kvm with 8G of memory and start memhog as suggested in
          comment 4.

     2.2) Then do "balloon 300" from step (3) in comment 0.

     2.3) Continuously get traces with "Out of puff": expected.

     2.4) The OOM killer starts killing processes: expected.

     2.5) Finally the guest panics because the OOM killer is not able to kill
          any process:

[    8.740024] Kernel panic - not syncing: Out of memory and no killable processes...
[    8.740024]

[   42.369528] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[   42.369579] [  522]     0   522      304        2       5       70         -1000 systemd-udevd
[   42.369636] [  818]     0   818      283        6       4       52         -1000 auditd
[   42.369686] [ 1244]     0  1244      368        1       6      106         -1000 sshd
[   42.369734] Kernel panic - not syncing: Out of memory and no killable processes...

     2.6) Kdump is in effect: dumps the kernel and reboots: expected.

     2.7) Note that "balloon 300" is still in effect.

     2.8) Booting a new (or the same as the crashed) kernel begins; the OOM
          killer + "Out of puff" are in effect, leading to an OOM-killer panic like:

[   42.369528] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[   42.369579] [  522]     0   522      304        2       5       70         -1000 systemd-udevd
[   42.369636] [  818]     0   818      283        6       4       52         -1000 auditd
[   42.369686] [ 1244]     0  1244      368        1       6      106         -1000 sshd
[   42.369734] Kernel panic - not syncing: Out of memory and no killable processes...

     2.9) qemu-kvm exits: unexpected; the crashkernel should be executed via
          Kdump and the system rebooted repeatedly (similar to steps 2.3 - 2.6).

Comment 15 Tetsuo Handa 2018-01-04 14:07:25 UTC
(In reply to Serhii Popovych from comment #14)
>      2.9) qemu-kvm exits: unexpected, crashkernel via Kdump should be
> executed
>           and reboot system repeatedly (similar to 2.3 - 2.6 steps).

So, you confirmed that the guest unexpectedly terminates when

  Kernel panic - not syncing: Out of memory and no killable processes...

is printed, correct?

Can you reproduce this problem when the guest panics by doing

# echo c > /proc/sysrq-trigger

or

# echo -1000 > /proc/self/oom_score_adj
# memhog -r100 22G

instead of doing "balloon 300" ?

Comment 16 Serhii Popovych 2018-01-04 14:39:02 UTC
Yes. Here is the full output from step 2.9 of comment 14:

[    9.135954] 127948 pages reserved
[    9.135979] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[    9.136030] [  515]     0   515      304        0       5       83         -1000 systemd-udevd
[    9.136092] [  814]     0   814      283        0       5       49         -1000 auditd
[    9.136142] Kernel panic - not syncing: Out of memory and no killable processes...
[    9.136142]
[    9.136199] CPU: 3 PID: 515 Comm: systemd-udevd Tainted: G           OE  ------------   3.10.0-826.el7.ppc64le #1
[    9.136272] Call Trace:
[    9.136290] [c0000001f7c336a0] [c00000000001b340] show_stack+0x80/0x330 (unreliable)
[    9.136347] [c0000001f7c33750] [c000000000a27dc8] dump_stack+0x30/0x44
[    9.136396] [c0000001f7c33770] [c000000000a1dc9c] panic+0x140/0x2d8
[    9.136445] [c0000001f7c33800] [c0000000002676a4] out_of_memory+0x6d4/0x760
[    9.136494] [c0000001f7c338d0] [c00000000027165c] __alloc_pages_nodemask+0xc4c/0xc70
[    9.136551] [c0000001f7c33ac0] [c0000000002ebe38] alloc_pages_vma+0x1c8/0x6f0
[    9.136600] [c0000001f7c33b80] [c0000000002d3298] swapin_readahead+0x298/0x370
[    9.136656] [c0000001f7c33c70] [c0000000002b13ec] handle_mm_fault+0xbdc/0x1180
[    9.136712] [c0000001f7c33d70] [c000000000a0f584] do_page_fault+0x4a4/0x870
[    9.136761] [c0000001f7c33e30] [c00000000000956c] handle_page_fault+0x10/0x30
qemu-ifdown: Removing tap1525356-3000 from bridge virbr0

With the SysRq trigger I get the expected behaviour: reboot into the crashkernel, then Kdump reboots into the same kernel with no problem. Will try oom_score_adj.

Comment 17 Serhii Popovych 2018-01-04 14:39:46 UTC
Update: I have tried SystemTap in the dracut initramfs (via the systemtap-initramfs package) to hook whether poweroff/halt is being called, with no luck.

During step 2.9 from comment 14 I observe same panic scenario and silent qemu-kvm exit.

This is the output during early initialization in the initramfs (to prove the stap module loaded):
-------------------------------------------------------------------------------

[    0.614114] halt2panic: loading out-of-tree module taints kernel.
[    0.614469] halt2panic: module verification failed: signature and/or required key missing - tainting kernel
[    0.613702] dracut-cmdline[103]: Disconnecting from systemtap module.
[    0.614168] dracut-cmdline[103]: To reconnect, type "staprun -A halt2panic"

Comment 18 Serhii Popovych 2018-01-04 14:46:00 UTC
According to comment 15:

  # echo -1000 > /proc/self/oom_score_adj
  # memhog -r100 8G

does not trigger the behaviour observed in comment 14 step 2.9.

Comment 19 Tetsuo Handa 2018-01-04 15:05:52 UTC
OK. So, the problem occurs only when you did "balloon 300", doesn't it?

Two questions remaining.

One is about the guest side. At 2.8), did you confirm that kdump service
was started successfully? If "balloon 300" is in progress, there might be
too little memory for starting kdump service normally.

  # systemctl status kdump

The other is about the host side. Is there a watchdog functionality
configured which powers off the guest when the guest seems to hang up?

Comment 20 Tetsuo Handa 2018-01-04 15:26:13 UTC
(In reply to Tetsuo Handa from comment #19)
> The other is about the host side. Is there a watchdog functionality
> configured which powers off the guest when the guest seems to hang up?

Well, a simple way to check it could be

# systemctl stop kdump
# echo c > /proc/sysrq-trigger

and wait for a while whether the guest is terminated automatically.

Comment 21 Serhii Popovych 2018-01-04 15:43:30 UTC
(In reply to Tetsuo Handa from comment #19)
> OK. So, the problem occurs only when you did "balloon 300", doesn't it?

Exactly.

> 
> Two questions remaining.
> 
> One is about the guest side. At 2.8), did you confirm that kdump service
> was started successfully? If "balloon 300" is in progress, there might be
> too little memory for starting kdump service normally.
> 
>   # systemctl status kdump

I suppose it is, since after step 2.5) crashkernel kexecuted and finally I get
dumps in /var/crash/ (as well as vmcore-dmesg.txt).

> 
> The other is about the host side. Is there a watchdog functionality
> configured which powers off the guest when the guest seems to hang up?

According to comment 20, with kdump stopped, qemu-kvm exits after triggering SysRq:

# echo 'c' >/proc/sysrq-trigger
[   79.734413] SysRq : Trigger a crash
[   79.734477] Unable to handle kernel paging request for data at address 0x00000000
[   79.734526] Faulting instruction address: 0xc000000000608b80
[   79.734598] Oops: Kernel access of bad area, sig: 11 [#1]
[   79.734631] SMP NR_CPUS=2048 NUMA pSeries
[   79.734685] Modules linked in: ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ext4 mbcache jbd2 sg virtio_balloon ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common virtio_net virtio_console virtio_scsi virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod halt2panic(OE)
[   79.735240] CPU: 4 PID: 1797 Comm: bash Tainted: G           OE  ------------   3.10.0-826.el7.ppc64le #1
[   79.735296] task: c0000001f4fb16e0 ti: c0000001f8f60000 task.ti: c0000001f8f60000
[   79.735343] NIP: c000000000608b80 LR: c0000000006097bc CTR: c000000000608b60
[   79.735391] REGS: c0000001f8f63a70 TRAP: 0300   Tainted: G           OE  ------------    (3.10.0-826.el7.ppc64le)
[   79.735453] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28222822  XER: 20000000
[   79.735566] CFAR: c000000000a1df1c DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c00000000060979c c0000001f8f63cf0 c000000001273f00 0000000000000063
GPR04: c00000000180a818 c00000000181b5f8 00000000000000c2 c00000000141aa30
GPR08: 0000000000000007 0000000000000001 0000000000000000 c00000000141fc28
GPR12: c000000000608b60 c000000007b82400 0000000010139e58 0000000040000000
GPR16: 000000001013b5d0 0000000000000000 00000000101306fc 0000000010139de4
GPR20: 0000000010139de8 0000000010093150 0000000000000000 0000000000000000
GPR24: 000000001013b5e0 00000000100fa0e8 0000000000000007 c0000000011bf590
GPR28: 0000000000000063 c0000000011bf950 c000000001189ba8 0000000000000002
[   79.736233] NIP [c000000000608b80] sysrq_handle_crash+0x20/0x30
[   79.736274] LR [c0000000006097bc] write_sysrq_trigger+0x10c/0x230
[   79.736314] Call Trace:
[   79.736332] [c0000001f8f63cf0] [c00000000060979c] write_sysrq_trigger+0xec/0x230 (unreliable)
[   79.736402] [c0000001f8f63d90] [c0000000003ef784] proc_reg_write+0x84/0x120
[   79.736451] [c0000001f8f63dd0] [c0000000003350b0] SyS_write+0x150/0x400
[   79.736500] [c0000001f8f63e30] [c00000000000a184] system_call+0x38/0xb4
[   79.736548] Instruction dump:
[   79.736573] 409effb8 7fc3f378 4bfff381 4bffffac 3c4c00c7 3842b3a0 3d42fff1 394a5c30
[   79.736654] 39200001 912a0000 7c0004ac 39400000 <992a0000> 4e800020 60000000 60420000
[   79.736758] ---[ end trace fa5f2ac54a8e4df4 ]---
[   79.738154]
[   81.738248] Kernel panic - not syncing: Fatal exception
qemu-ifdown: Removing tap1525356-3000 from bridge virbr0
^^^^^^^^^^^
   qemu-kvm exits here.

Comment 22 Tetsuo Handa 2018-01-04 21:03:49 UTC
(In reply to Serhii Popovych from comment #21)
> (In reply to Tetsuo Handa from comment #19)
> > OK. So, the problem occurs only when you did "balloon 300", doesn't it?
> 
> Exactly.
> 
> > 
> > Two questions remaining.
> > 
> > One is about the guest side. At 2.8), did you confirm that kdump service
> > was started successfully? If "balloon 300" is in progress, there might be
> > too little memory for starting kdump service normally.
> > 
> >   # systemctl status kdump
> 
> I suppose it is, since after step 2.5) crashkernel kexecuted and finally I
> get
> dumps in /var/crash/ (as well as vmcore-dmesg.txt).

Excuse me, but please do check.

The crashkernel that ran at 2.5) was loaded before memhog was started at 2.1), which
means that there was plenty of memory for starting the kdump service normally.

The crashkernel which should have run at 2.9) was loaded after 2.6), which
means that there might have been too little memory for starting the kdump service normally.

> 
> > 
> > The other is about the host side. Is there a watchdog functionality
> > configured which powers off the guest when the guest seems to hang up?
> 
> According to comment 20 with kdump stopped qemu-kvm exits after triggering
> SysRQ:
> 
> # echo 'c' >/proc/sysrq-trigger
> [   79.734413] SysRq : Trigger a crash
> [   79.734477] Unable to handle kernel paging request for data at address
> 0x00000000
> [   79.734526] Faulting instruction address: 0xc000000000608b80
> [   79.734598] Oops: Kernel access of bad area, sig: 11 [#1]
> [   79.734631] SMP NR_CPUS=2048 NUMA pSeries
> [   79.734685] Modules linked in: ip6t_rpfilter ipt_REJECT nf_reject_ipv4
> ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat
> ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
> nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
> iptable_mangle iptable_security iptable_raw ebtable_filter ebtables
> ip6table_filter ip6_tables iptable_filter ext4 mbcache jbd2 sg
> virtio_balloon ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic
> crct10dif_common virtio_net virtio_console virtio_scsi virtio_pci
> virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod halt2panic(OE)
> [   79.735240] CPU: 4 PID: 1797 Comm: bash Tainted: G           OE 
> ------------   3.10.0-826.el7.ppc64le #1
> [   79.735296] task: c0000001f4fb16e0 ti: c0000001f8f60000 task.ti:
> c0000001f8f60000
> [   79.735343] NIP: c000000000608b80 LR: c0000000006097bc CTR:
> c000000000608b60
> [   79.735391] REGS: c0000001f8f63a70 TRAP: 0300   Tainted: G           OE 
> ------------    (3.10.0-826.el7.ppc64le)
> [   79.735453] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28222822 
> XER: 20000000
> [   79.735566] CFAR: c000000000a1df1c DAR: 0000000000000000 DSISR: 42000000
> SOFTE: 1
> GPR00: c00000000060979c c0000001f8f63cf0 c000000001273f00 0000000000000063
> GPR04: c00000000180a818 c00000000181b5f8 00000000000000c2 c00000000141aa30
> GPR08: 0000000000000007 0000000000000001 0000000000000000 c00000000141fc28
> GPR12: c000000000608b60 c000000007b82400 0000000010139e58 0000000040000000
> GPR16: 000000001013b5d0 0000000000000000 00000000101306fc 0000000010139de4
> GPR20: 0000000010139de8 0000000010093150 0000000000000000 0000000000000000
> GPR24: 000000001013b5e0 00000000100fa0e8 0000000000000007 c0000000011bf590
> GPR28: 0000000000000063 c0000000011bf950 c000000001189ba8 0000000000000002
> [   79.736233] NIP [c000000000608b80] sysrq_handle_crash+0x20/0x30
> [   79.736274] LR [c0000000006097bc] write_sysrq_trigger+0x10c/0x230
> [   79.736314] Call Trace:
> [   79.736332] [c0000001f8f63cf0] [c00000000060979c]
> write_sysrq_trigger+0xec/0x230 (unreliable)
> [   79.736402] [c0000001f8f63d90] [c0000000003ef784]
> proc_reg_write+0x84/0x120
> [   79.736451] [c0000001f8f63dd0] [c0000000003350b0] SyS_write+0x150/0x400
> [   79.736500] [c0000001f8f63e30] [c00000000000a184] system_call+0x38/0xb4
> [   79.736548] Instruction dump:
> [   79.736573] 409effb8 7fc3f378 4bfff381 4bffffac 3c4c00c7 3842b3a0
> 3d42fff1 394a5c30
> [   79.736654] 39200001 912a0000 7c0004ac 39400000 <992a0000> 4e800020
> 60000000 60420000
> [   79.736758] ---[ end trace fa5f2ac54a8e4df4 ]---
> [   79.738154]
> [   81.738248] Kernel panic - not syncing: Fatal exception
> qemu-ifdown: Removing tap1525356-3000 from bridge virbr0
> ^^^^^^^^^^^
>    qemu-kvm exits here.

OK. So, "balloon 300" by itself should be irrelevant to shutting down the guest;
it is only indirectly causing the shutdown by not allowing the guest kernel to
load the crashkernel. I don't know much about qemu, but some option you specified
on the command line might be relevant to shutting down the guest (because the
guest kernel had already panic()ed).

Comment 23 David Gibson 2018-01-05 01:09:17 UTC
I think what's happening here is this:

 * Continuous OOMS panic the guest
 * Goes into a reboot loop attempting to kdump but running out of memory
 * At some point we hit a critical oom before we've even installed the kdump handler
 * That causes the guest to plain old panic (RTAS 'os-term' call)
 * qemu treats that as a guest panic and exits

If so, this is basically expected behaviour.

The way to check this would be to add a QMP socket to your qemu, initialize it (by sending the capabilities command), then look for a GUEST_PANICKED event issued by qemu just before it quits.

Alternatively you could pass the -no-shutdown option to qemu, which will change its behaviour on guest panic.
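[Editor's note] As a rough illustration of the check described above, the QMP event stream could be filtered for the event. This is a sketch under our own assumptions: the helper name and the sample events are invented, and in practice the input would come from the QMP socket (e.g. the tcp:0:5555 socket from comment 0, as shown in comment 25).

```shell
# Print the first GUEST_PANICKED event seen on stdin, if any.
watch_for_panic() {
    grep -m1 '"event": "GUEST_PANICKED"'
}

# Demo on a hand-written, QMP-style event stream; prints the GUEST_PANICKED line.
watch_for_panic <<'EOF'
{"timestamp": {"seconds": 1515154977, "microseconds": 0}, "event": "BALLOON_CHANGE", "data": {"actual": 799604736}}
{"timestamp": {"seconds": 1515154978, "microseconds": 0}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}
EOF
```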

Comment 24 Tetsuo Handa 2018-01-05 02:45:32 UTC
(In reply to David Gibson from comment #23)
>  * Goes into a reboot loop attempting to kdump but running out of memory

I don't know whether the crashkernel is getting "puff" messages.
The amount of memory reserved for the crashkernel is usually so small that
it is smaller than the amount specified by the balloon command.

>  * At some point we hit a critical oom before we've even installed the kdump
> handler

Yes. The ballooning operation is done via a kernel workqueue and loading the
crashkernel is done by userspace processes. The former can start earlier
than the latter.

>  * That causes the guest to plain old panic (RTAS 'os-term' call)
>  * qemu treats that as a guest panic and exits

Yes, if qemu is listening for that.

> 
> If so, this is basically expected behaviour.

I agree.

> 
> The way to check this would be to add a qmp socket to your qemu, intialize
> it (by sending the capabilities command), then look for a GUEST_PANICKED
> event issued by qemu just before it quits.
> 
> Alternatively you could -no-shutdown option to qemu which will change its
> behaviour on guest panic.

Let's try that option.

Comment 25 Serhii Popovych 2018-01-05 12:24:45 UTC
So with QMP enabled and reproducing the steps from comment 0, I get the following:

[spopovyc@ibm-p8-virt-03 bz1525356]$ telnet 0 3000
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": "(qemu-kvm-rhev-2.10.0-15.el7)"}, "capabilities": []}}
{"execute": "qmp_capabilities"}
{"return": {}}
{"timestamp": {"seconds": 1515154966, "microseconds": 426569}, "event": "BALLOON_CHANGE", "data": {"actual": 8588886016}}
{"timestamp": {"seconds": 1515154967, "microseconds": 374186}, "event": "BALLOON_CHANGE", "data": {"actual": 8479834112}}
{"timestamp": {"seconds": 1515154968, "microseconds": 298928}, "event": "BALLOON_CHANGE", "data": {"actual": 8340373504}}
{"timestamp": {"seconds": 1515154969, "microseconds": 345541}, "event": "BALLOON_CHANGE", "data": {"actual": 8189378560}}
{"timestamp": {"seconds": 1515154970, "microseconds": 532847}, "event": "BALLOON_CHANGE", "data": {"actual": 8020557824}}
{"timestamp": {"seconds": 1515154971, "microseconds": 495469}, "event": "BALLOON_CHANGE", "data": {"actual": 7917797376}}
{"timestamp": {"seconds": 1515154972, "microseconds": 631074}, "event": "BALLOON_CHANGE", "data": {"actual": 7761559552}}
{"timestamp": {"seconds": 1515154973, "microseconds": 476765}, "event": "BALLOON_CHANGE", "data": {"actual": 7606370304}}
{"timestamp": {"seconds": 1515154974, "microseconds": 648769}, "event": "BALLOON_CHANGE", "data": {"actual": 7453278208}}
{"timestamp": {"seconds": 1515154975, "microseconds": 818741}, "event": "BALLOON_CHANGE", "data": {"actual": 5052760064}}
{"timestamp": {"seconds": 1515154976, "microseconds": 819630}, "event": "BALLOON_CHANGE", "data": {"actual": 2542469120}}
{"timestamp": {"seconds": 1515154977, "microseconds": 736803}, "event": "BALLOON_CHANGE", "data": {"actual": 799604736}}
{"timestamp": {"seconds": 1515154978, "microseconds": 575283}, "event": "BALLOON_CHANGE", "data": {"actual": 797573120}}
{"timestamp": {"seconds": 1515154988, "microseconds": 768042}, "event": "RESET", "data": {"guest": true}}
{"timestamp": {"seconds": 1515155003, "microseconds": 527906}, "event": "BALLOON_CHANGE", "data": {"actual": 8588886016}}
{"timestamp": {"seconds": 1515155004, "microseconds": 527969}, "event": "BALLOON_CHANGE", "data": {"actual": 5937037312}}
{"timestamp": {"seconds": 1515155005, "microseconds": 528294}, "event": "BALLOON_CHANGE", "data": {"actual": 3322937344}}
{"timestamp": {"seconds": 1515155006, "microseconds": 527936}, "event": "BALLOON_CHANGE", "data": {"actual": 905969664}}
{"timestamp": {"seconds": 1515155007, "microseconds": 332827}, "event": "BALLOON_CHANGE", "data": {"actual": 777912320}}
{"timestamp": {"seconds": 1515155008, "microseconds": 384995}, "event": "BALLOON_CHANGE", "data": {"actual": 773259264}}
{"timestamp": {"seconds": 1515155008, "microseconds": 689389}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}
{"timestamp": {"seconds": 1515155008, "microseconds": 689482}, "event": "GUEST_PANICKED", "data": {"action": "poweroff"}}
{"timestamp": {"seconds": 1515155008, "microseconds": 694574}, "event": "SHUTDOWN", "data": {"guest": true}}

Comment 26 Serhii Popovych 2018-01-05 14:14:26 UTC
(In reply to Tetsuo Handa from comment #22)
> (In reply to Serhii Popovych from comment #21)
> > (In reply to Tetsuo Handa from comment #19)
> > > OK. So, the problem occurs only when you did "balloon 300", doesn't it?
> > 
> > Exactly.
> > 
> > > 
> > > Two questions remaining.
> > > 
> > > One is about the guest side. At 2.8), did you confirm that kdump service
> > > was started successfully? If "balloon 300" is in progress, there might be
> > > too little memory for starting kdump service normally.
> > > 
> > >   # systemctl status kdump
> > 
> > I suppose it is, since after step 2.5) crashkernel kexecuted and finally I
> > get
> > dumps in /var/crash/ (as well as vmcore-dmesg.txt).
> 
> Excuse me, but please do check.

Sorry, I just looked at your question again (you asked about 2.8, but I checked 2.5).

No, at 2.8) kdump isn't started; there is too little memory. You are right.

> 
> The crashkernel ran at 2.5) was loaded before starting memhog at 2.1), which
> means that there was a plenty of memory for starting kdump service normally.

Yes, there is 8GB of RAM before starting memhog and "balloon 300".

We boot the system with kdump enabled and then at 2.1) start memhog. Step 2.5)
starts the crashkernel and dumps successfully, as shown above.

> 
> The crashkernel which should have run at 2.9) is loaded after 2.6), which
> means that there might be too little memory for starting kdump service
> normally.

Yes, definitely. But I can't log in to run # systemctl status kdump
to check that kdump started.

Comment 27 Tetsuo Handa 2018-01-05 14:56:27 UTC
(In reply to Serhii Popovych from comment #26)
> > The crashkernel which should have run at 2.9) is loaded after 2.6), which
> > means that there might be too little memory for starting kdump service
> > normally.
> 
> Yes, definitely. But I can't login to check with # systemctl status kdump
> that Kdump is started.

No problem. Since the only difference is the amount of memory available for
starting the kdump service, we can assume that the crashkernel did not start
because it could not be loaded.

OK, everything is clear now. This is not a bug but expected behavior.

Comment 28 Min Deng 2018-01-23 06:01:52 UTC
Here are some concerns about this bug from QE's perspective.
Scenario 1,
1.boot up a guest with the following cli,
  /usr/libexec/qemu-kvm -name guest=nrs,debug-threads=on -machine pseries,accel=kvm,usb=off,dump-guest-core=off -m size=50G,slots=256,maxmem=419430400k -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid d7987973-2467-43ff-b8d2-acefc6ac59e5 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/tmp/qmp,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -boot strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -drive file=Pegas-Server-7.4-bug.qcow2-newk,format=qcow2,if=none,id=drive-scsi0-0-0-0 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:8a:8b,bus=pci.0,addr=0x1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on -monitor stdio -chardev socket,id=serial_id_serial0,path=/tmp/S,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 -monitor unix:/tmp/monitor3,server,nowait

2.balloon 2048

3.check it from guest
#free -m
              total        used        free      shared  buff/cache   available
Mem:            942         312         340          20         289         163
Swap:          2047           0        2047

#[root@localhost home]# cat /proc/meminfo
cat /proc/meminfo
MemTotal:         964736 kB
MemFree:          351424 kB
MemAvailable:     170176 kB

******** According to the test plan, MemTotal should be 2G, but it wasn't ********

Scenario 2.
1.use the same CLI to boot up a guest
2.balloon 1024
  ...
  (qemu) info balloon
  balloon: actual=1504   - here the qemu-kvm process quit; there was no qemu-kvm process on the host any longer. We also got a call trace in the log, and there was no memory consumer running inside the guest.

3.boot up the guest again with only 1280M, which is even less than 1504; as a result, the guest ran fine with 1280M of memory.

  QE wonders why qemu-kvm quit when the guest's memory was larger during ballooning. The issue can be reproduced easily with a large memory size.

  Thanks a lot. If there are any issues, please let me know.

Comment 29 Min Deng 2018-01-23 08:06:46 UTC
QE tried it on the latest build today, and got the following results.

Build information,
kernel-4.14.0-25.el7a.ppc64le
qemu-kvm-rhev-2.10.0-17.el7.ppc64le
1.boot up guest with the following cli
/usr/libexec/qemu-kvm -name avocado-vt-vm1 -sandbox off -machine pseries -nodefaults -vga std -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=rhel75-alt.qcow2 -device scsi-hd,id=image1,drive=drive_image1 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:13:17:51,bus=pci.0,addr=0x1e -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :11 -rtc base=utc,clock=host -enable-kvm -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -monitor stdio -smp 4 -m 80G,slots=256,maxmem=100G -qmp tcp:0:4444,server,nowait -monitor unix:/tmp/monitor3,server,nowait -device virtio-balloon-pci,id=balloon1 -chardev socket,id=serial_id_serial10,path=/tmp/S,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial10
2.balloon 2048

******Before balloon******
[root@localhost ~]# uname -r
uname -r
4.14.0-25.el7a.ppc64le
[root@localhost ~]# free -m
free -m
              total        used        free      shared  buff/cache   available
Mem:          79759         354       78851          20         552       78745
Swap:          3071           0        3071
[root@localhost ~]# 

Actual results: qemu-kvm quit silently, without an OOM or other errors.

Comment 30 Tetsuo Handa 2018-01-23 11:22:34 UTC
(In reply to Min Deng from comment #28)
> Here are some concerns about this bug from QE's perspective.
> Scenario 1,
> 1.boot up a guest with the following cli,
>   /usr/libexec/qemu-kvm -name guest=nrs,debug-threads=on -machine
> pseries,accel=kvm,usb=off,dump-guest-core=off -m
> size=50G,slots=256,maxmem=419430400k -realtime mlock=off -smp
> 4,sockets=4,cores=1,threads=1 -uuid d7987973-2467-43ff-b8d2-acefc6ac59e5
> -display none -no-user-config -nodefaults -chardev
> socket,id=charmonitor,path=/tmp/qmp,server,nowait -mon
> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -boot strict=on
> -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 -device
> virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -drive
> file=Pegas-Server-7.4-bug.qcow2-newk,format=qcow2,if=none,id=drive-scsi0-0-0-
> 0 -device
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,
> id=scsi0-0-0-0,bootindex=1 -netdev
> tap,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,id=hostnet0,vhost=on
> -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:8a:8b,bus=pci.0,
> addr=0x1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg
> timestamp=on -monitor stdio -chardev
> socket,id=serial_id_serial0,path=/tmp/S,server,nowait -device
> spapr-vty,reg=0x30000000,chardev=serial_id_serial0 -monitor
> unix:/tmp/monitor3,server,nowait
> 
> 2.balloon 2048
> 
> 3.check it from guest
> #free -m
>               total        used        free      shared  buff/cache  
> available
> Mem:            942         312         340          20         289        
> 163
> Swap:          2047           0        2047
> 
> #[root@localhost home]# cat /proc/meminfo
> cat /proc/meminfo
> MemTotal:         964736 kB
> MemFree:          351424 kB
> MemAvailable:     170176 kB
> 
> ********According to test plan,MemTotal should be 2G,but it wasn't********

I don't know what is happening here. Some memory might be reserved for
other purposes (such as the crashkernel= parameter or other hardware/driver
dependent usage). The first step is to check the size of the gap between
"free -m" / "cat /proc/meminfo" and "info balloon" before executing the
"balloon 2048" command.

> 
> Scenarios 2.
> 1.use the same CLI to boot up a guest
> 2.balloon 1024
>   ...
>   (qemu) info balloon
>   balloon: actual=1504   - here,qemu-kvm process quit,there wasn't any
> qemu-kvm process on host any longer,and also got call trace log,there wasn't
> any memory consumer within the guest.

I suspect that "info balloon" said 1504MB was an outdated value. Since the
ballooning operation can change the guest's memory very quickly, the value
printed might not be as of immediately before the qemu-kvm process quits.

If 1GB of memory is reserved for some reason, executing the "balloon 1024"
command would shrink the guest's usable memory to 0B. Then, the guest kernel
will panic() even without running a memory consumer. Did you check the
guest's messages?

> 
> 3.boot up guest again but only with 1280M,which was even less than 1504,as a
> result,the guest run well with 1280M memory.

Is the difference in step 3 that you did not execute the balloon command?
If so, what happens if you execute "balloon 1024" there?

> 
>   QE wonder Why the qemu-kvm quitted when guest's memory was bigger size
> during ballooning.The issue could be easily reproduced with big memory.
> 
>   Thanks a lot.Any issues please let me know.

I'm not familiar with the ppc64le architecture, but attaching the whole
kernel messages (which include memory mapping information etc.) might help.

Comment 31 Tetsuo Handa 2018-01-24 14:19:50 UTC
Can I confirm my understanding of your reports?

The problem in the description and comment 1 used "-m 24G,maxmem=200G,slots=200",
and what seemed to be a bug was actually expected behavior. Is this correct?

The problem in comment 28 used "-m size=50G,slots=256,maxmem=419430400k" and
the problem in comment 29 used "-m 80G,slots=256,maxmem=100G", but the problem
you are experiencing is different from what you experienced with
"-m 24G,maxmem=200G,slots=200". Is this correct?

Comment 28 says that /proc/meminfo reports an unexpected value after executing
"balloon 2048" on a guest booted with 50GB. Comment 28 and comment 29 both say
that qemu-kvm terminated silently without an OOM or other errors, although
comment 28 used "balloon 1024" after booting with 50GB and comment 29 used
"balloon 2048" after booting with 80GB. Is this correct?

Your comment

  Why the qemu-kvm quitted when guest's memory was bigger size during ballooning.
  The issue could be easily reproduced with big memory.

comes from the amount of memory available before executing the "balloon XXXX"
command. Is this correct?

Then, I guess something unexpected is happening when executing the
"balloon XXXX" command.

The virtio_balloon driver stores the return value of page_to_pfn() (which is
"unsigned long") into a "u32". Since page_to_balloon_pfn() returns a "u32"
value, pfn * VIRTIO_BALLOON_PAGES_PER_PAGE can overflow if page_to_pfn()
returned a large value. I don't know whether qemu-kvm can work correctly if
an overflow occurs because the amount of memory available before executing
the "balloon XXXX" command was large.

--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -100,6 +100,7 @@ static u32 page_to_balloon_pfn(struct page *page)
 	unsigned long pfn = page_to_pfn(page);
 
 	BUILD_BUG_ON(PAGE_SHIFT < VIRTIO_BALLOON_PFN_SHIFT);
+	BUG_ON(pfn * VIRTIO_BALLOON_PAGES_PER_PAGE > (unsigned long) U32_MAX);
 	/* Convert pfn from Linux page size to balloon page size. */
 	return pfn * VIRTIO_BALLOON_PAGES_PER_PAGE;
 }
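
A back-of-the-envelope sketch of the overflow Tetsuo describes, assuming 64 KiB guest pages (typical for ppc64le, though the exact page size here is an assumption). Masking to 32 bits mimics the kernel's u32 return type:

```python
VIRTIO_BALLOON_PFN_SHIFT = 12   # balloon PFNs are always in 4 KiB units
PAGE_SHIFT = 16                 # assuming 64 KiB guest pages on ppc64le
PAGES_PER_PAGE = 1 << (PAGE_SHIFT - VIRTIO_BALLOON_PFN_SHIFT)  # 16

def page_to_balloon_pfn(pfn):
    # mimic the kernel's u32 return type with a 32-bit mask
    return (pfn * PAGES_PER_PAGE) & 0xFFFFFFFF

# The product overflows once the guest physical address passes
# 2^32 * 4 KiB = 16 TiB; the first overflowing PFN wraps to zero.
first_overflow_pfn = (1 << 32) // PAGES_PER_PAGE
assert page_to_balloon_pfn(first_overflow_pfn) == 0

# An 80 GiB guest (comment 29) stays far below that threshold:
top_pfn_80g = (80 << 30) // (1 << PAGE_SHIFT)
assert page_to_balloon_pfn(top_pfn_80g) == top_pfn_80g * PAGES_PER_PAGE
```

Under these assumptions the wraparound only bites above 16 TiB of guest physical address space, so the 50G/80G guests in this bug would not trigger it, though the latent truncation is real.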

Comment 32 Min Deng 2018-01-26 09:15:42 UTC
(In reply to Tetsuo Handa from comment #31)
> Can I confirm your question?
> 
> The problem in description and comment 1 used "-m 24G,maxmem=200G,slots=200"
> and what seemed to be a bug was actually an expected behavior. Is this
> correct?
> 
> The problem in comment 28 used "-m size=50G,slots=256,maxmem=419430400k" and
> the problem in comment 29 used "-m 80G,slots=256,maxmem=100G", but the
> problem
> you are experiencing is different from what you experienced with
> "-m 24G,maxmem=200G,slots=200". Is this correct?

Hi Tetsuo,
   Thank you very much for your detailed reply. In my opinion, no matter how much memory the guest has, QE can hit the qemu-kvm quit issue with the builds mentioned in comment 0. Furthermore, QE also tried the following test matrix with an older kernel-alt build. With it, qemu-kvm does not quit, although there is a kernel panic warning message. That is more acceptable than quitting silently, so this looks more like a regression issue. Thanks, and please let me know about any issues.

Build information,
kernel-4.14.0-1.el7a.ppc64le

1) -m 80G,slots=256,maxmem=500G
2) -m 50G ...
3) -m 24G ...


[  122.776726] Kernel panic - not syncing: Out of memory and no killable processes...
[  122.776726] 
[  122.776785] CPU: 3 PID: 1 Comm: systemd Not tainted 4.14.0-1.el7a.ppc64le #1
[  122.776830] Call Trace:
[  122.776848] [c0000013f1603680] [c000000000c34ccc] dump_stack+0xb0/0xf4 (unreliable)
[  122.776895] [c0000013f16036c0] [c00000000012f604] panic+0x150/0x32c
[  122.776932] [c0000013f1603750] [c00000000031b710] out_of_memory+0x540/0x8c0
[  122.776970] [c0000013f16037f0] [c00000000032404c] __alloc_pages_nodemask+0xe2c/0x1070
[  122.777015] [c0000013f16039f0] [c0000000003d2290] alloc_pages_vma+0xe0/0x570
[  122.777060] [c0000013f1603a60] [c0000000003b2248] __read_swap_cache_async+0x1f8/0x300
[  122.777105] [c0000013f1603ae0] [c0000000003b2848] swapin_readahead+0x1f8/0x3c0
[  122.777149] [c0000013f1603ba0] [c000000000381428] do_swap_page+0xbb8/0xc90
[  122.777187] [c0000013f1603c30] [c000000000385ec8] __handle_mm_fault+0xc88/0x1010
[  122.777232] [c0000013f1603d30] [c000000000386378] handle_mm_fault+0x128/0x200
[  122.777275] [c0000013f1603d70] [c000000000076b24] do_page_fault+0x1d4/0x6f0
[  122.777314] [c0000013f1603e30] [c00000000000a4d4] handle_page_fault+0x18/0x38
[  122.777600] Sending IPI to other CPUs
[  122.779073] IPI complete
[  122.780830] kexec: Starting switchover sequence.

Comment 33 Min Deng 2018-01-29 02:24:28 UTC
Created attachment 1387528 [details]
Log

Comment 34 Min Deng 2018-01-29 02:26:32 UTC
For comment32,
qemu-kvm-rhev-2.10.0-17.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Comment 35 David Gibson 2018-04-13 06:01:55 UTC
Extra notes, based on an IRC discussion with Min Deng.

 1) Comment 32 mentions using various memory sizes; however, in each case memhog was adjusted so that most of the memory was used
 2) This mostly looks like expected behaviour: the guest dies due to OOM, and qemu reports the panic then quits, as it is configured to do
 3) kdump would usually prevent qemu quitting on the panic, but the setup here means we might be hitting the OOM before kdump can get established.

The only odd thing remaining is that the behaviour seems to have changed from previous versions; see comment 10.

Comment 36 Yumei Huang 2018-05-11 07:08:06 UTC
QE hit the same continuous traces as comment 2 on an x86 host.

The "deflate-on-oom" option seems to have nothing to do with it: whether it is turned on or off, we hit the same trace.

Details:
guest kernel: 3.10.0-879.el7.x86_64
host kernel: 3.10.0-862.el7.x86_64
qemu-kvm-rhev-2.12.0-1.el7

Cmdline:
#  /usr/libexec/qemu-kvm -m 10G -smp 32 \
/home/kvm_autotest_root/images/rhel76-64-virtio-scsi.qcow2 \
-netdev tap,id=tap0 -device virtio-net-pci,id=net0,netdev=tap0 \
-vnc :0 -monitor stdio \
-device virtio-balloon-pci,id=balloon0,deflate-on-oom=yes/no

Run stress in guest:
# nohup stress --cpu 4 --io 4 --vm 2 --vm-bytes 256M &

Inflate the balloon:
(qemu) balloon 2400

A few minutes later, we hit continuous traces from the guest kernel:

2018-05-10 07:39:36: [  191.104396] virtio_balloon virtio2: Out of puff! Can't get 1 pages
2018-05-10 07:39:37: [  192.183957] kworker/8:1: page allocation failure: order:0, mode:0x310da
2018-05-10 07:39:37: [  192.185742] CPU: 8 PID: 480 Comm: kworker/8:1 Kdump: loaded Not tainted 3.10.0-879.el7.x86_64 #1
2018-05-10 07:39:37: [  192.187284] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
2018-05-10 07:39:37: [  192.188893] Workqueue: events_freezable update_balloon_size_func [virtio_balloon]
2018-05-10 07:39:37: [  192.190748] Call Trace:
2018-05-10 07:39:37: [  192.191225]  [<ffffffffa5b176b3>] dump_stack+0x19/0x1b
2018-05-10 07:39:37: [  192.192218]  [<ffffffffa559ab30>] warn_alloc_failed+0x110/0x180
2018-05-10 07:39:37: [  192.193250]  [<ffffffffa559f6b4>] __alloc_pages_nodemask+0x9b4/0xbb0
2018-05-10 07:39:37: [  192.194790]  [<ffffffffa55e92b8>] alloc_pages_current+0x98/0x110
2018-05-10 07:39:37: [  192.195601]  [<ffffffffa56186a5>] balloon_page_alloc+0x15/0x20
2018-05-10 07:39:37: [  192.196949]  [<ffffffffc0567811>] update_balloon_size_func+0xb1/0x290 [virtio_balloon]
2018-05-10 07:39:37: [  192.198350]  [<ffffffffa54b2ecf>] process_one_work+0x17f/0x440
2018-05-10 07:39:37: [  192.199895]  [<ffffffffa54b3b96>] worker_thread+0x126/0x3c0
2018-05-10 07:39:37: [  192.201579]  [<ffffffffa54b3a70>] ? manage_workers.isra.24+0x2a0/0x2a0
2018-05-10 07:39:37: [  192.203032]  [<ffffffffa54baf01>] kthread+0xd1/0xe0
2018-05-10 07:39:37: [  192.203850]  [<ffffffffa54bae30>] ? insert_kthread_work+0x40/0x40
2018-05-10 07:39:37: [  192.205011]  [<ffffffffa5b29637>] ret_from_fork_nospec_begin+0x21/0x21
2018-05-10 07:39:37: [  192.206566]  [<ffffffffa54bae30>] ? insert_kthread_work+0x40/0x40
2018-05-10 07:39:37: [  192.207487] Mem-Info:


I also tried guest kernel 3.10.0-666.el7.x86_64, which does NOT contain the commit "[virtio] virtio_balloon: fix increment of vb->num_pfns in fill_balloon()", and still hit the same issue.


Serhii, could you please help clarify whether the above result is expected? Thanks!

Comment 37 Yumei Huang 2018-05-21 11:04:45 UTC
Hi Luiz, could you please have a look at comment 36? Is the call trace expected? If not, I will file a new bug. Thanks!


BTW, in comment 36 I forgot to mention that I ran the following stress command 3 more times in the guest to trigger the call trace.

# nohup stress --cpu 4 --io 4 --vm 2 --vm-bytes 256M &

Comment 38 Luiz Capitulino 2018-05-22 13:28:32 UTC
Yumei,

Yes, that's expected behavior. The kernel is running out of memory because you ballooned it down to less than 2GB but you have an application consuming memory.

Comment 39 David Gibson 2018-06-06 04:00:06 UTC
From comments 36-38, this doesn't look to be a bug.

Comment 40 Min Deng 2018-07-12 06:59:33 UTC
Hi all,
  QE needs to confirm something with you.
  How is the lowest safe balloon value defined? If we could find the correct value documented somewhere, that would be ideal.
  In today's tests, I had a guest with 4G of initial memory; after ballooning it down to around 700M, qemu quit and there was no longer any qemu-kvm process on the host. This means the problem can still be reproduced. The issue described in comment 36 is different from mine; I think the qemu process was still there in comment 36.
  Can we discuss this problem further if necessary?

1.boot up a guest 
cli,
/usr/libexec/qemu-kvm -name avocado-vt-vm1 -sandbox off -machine pseries -nodefaults -vga std -chardev socket,id=qmp_id_qmp1,path=/tmp/qmp,servet -mon chardev=qmp_id_qmp1,mode=control -chardev socket,id=qmp_id_catch_monitor,path=/tmp/qmp_id_catch_monitor,server,nowait -mon chardev=qmp_id_catch_monitor,mode=control -chardev socket,id=serial_id_serial0,path=/tmp/S,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=rhel76-ppc64-virtio-scsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1 -device virtio-net-pci,mac=9a:1a:1b:1c:1d:1e,id=idDTPKW8,vectors=4,netdev=net0,bus=pci.0,addr=0x5 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on -m 4096 -smp 16,cores=8,threads=1,sockets=2 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :1 -rtc base=utc,clock=host -boot menu=off,strict=off,order=cdn,once=c -enable-kvm -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -monitor stdio

2.Hotplug a balloon device

#nc -U /tmp/qmp

#{"execute":"qmp_capabilities"}

#{"execute":"device_add","arguments":{"driver":"virtio-balloon-pci","id":"balloon1","addr":"0x9"}}


#{"execute":"balloon","arguments":{"value":734003200}}
{"return": {}}

3.qemu quit.
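
Note that the QMP "balloon" command takes a value in bytes. The helper below (hypothetical, just for illustration) builds the command from a MiB target and confirms that the 734003200 used in step 2 corresponds to a 700M target:

```python
import json

def qmp_balloon_cmd(target_mib):
    """Build the QMP 'balloon' command for a target size in MiB."""
    return json.dumps({"execute": "balloon",
                       "arguments": {"value": target_mib * (1 << 20)}})

cmd = qmp_balloon_cmd(700)
print(cmd)
assert json.loads(cmd)["arguments"]["value"] == 734003200
```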

Comment 41 Min Deng 2018-07-12 07:03:35 UTC
Build info
kernel-3.10.0-916.el7.ppc64le
qemu-kvm-rhev-2.12.0-6.el7.ppc64le

Comment 42 Luiz Capitulino 2018-07-12 17:47:06 UTC
Min,

The minimum value you can resize your guest to is the minimum your guest needs to run without crashing out of memory. This depends on several things (your guest size and hardware, workload, etc). I guess that a minimal RHEL7 installation, running on an idle 1 vCPU guest, would probably need at the very least a few hundred megabytes.

In your example, you're resizing your guest from 4GB to about 700MB. Again, whether this is enough or not depends on your guest.

Regarding qemu exiting, I don't know. It could be a bug, but it could also be what David explains in comment 23.

A simple test to find out if nothing major is broken is to go from 4GB to 3GB or 2GB first.

Comment 43 Min Deng 2018-07-13 08:41:05 UTC
(In reply to Luiz Capitulino from comment #42)
> Min,
> 
> The minimum value you can resize your guest is the minimum value your guest
> needs to run without crashing out of memory. This depends on several things
> (your guest size and hardware, workload, etc). I guess that a minimal RHEL7
> installation, running on a 1 vCPU guest which is idle, would probably need
> at the very least a few hundred megas.
> 
> In you example, you're resizing your guest from 4GB to about 700MB. Again,
> whether this is enough or not depends on your guest.
> 
> Regarding qemu exiting, I don't know. It could be a bug, but it could also
> be What David explains in comment 23.
> 
> A simple test to find out if nothing major is broken is to go from 4GB to
> 3GB or 2GB first

Hi Luiz,
  Thanks for your kind reply; QE fully understands what you described.
QE just cares about comment 10: the test result differed from previous versions, and QE thinks the previous behavior was much better than the current one, since even when an OOM occurred qemu would not quit. It looks more like a regression problem in my opinion. Anyway, thanks again.

Thanks.
Min

Comment 44 Luiz Capitulino 2018-07-13 18:49:20 UTC
Min,

I think you have a point that if pvpanic is not used, QEMU shouldn't exit. I honestly don't know if this would matter much, since the guest is hung anyway, but if QEMU is exiting due to a bug then this has to be fixed.

Now, I quickly tried the test-case on latest RHEL7.6 x86 and it works as expected for me: I start a guest with 4GB under libvirt, balloon it down to 700MB, which causes the guest kernel to run out of memory and hang. But QEMU is indeed still up and running.

So, you could try your test-case on x86. If it works (i.e. QEMU is still running), then this might mean it is a ppc64le-specific issue. In that case I unfortunately can't help you, but you could try reopening the BZ.

Comment 45 David Gibson 2018-07-16 01:46:06 UTC
In reply to comment 44:

> I think you have a point that if pvpanic is not used, QEMU shouldn't exit. 

The POWER equivalent of pvpanic is a hypervisor facility that's always available.  So, effectively, we *always* have pvpanic - that's why we always get a qemu exit.

In reply to comment 43:

> The test result was different with previous.

Which earlier version behaved differently?

> And QE think the previous behavior was much better than current's,even if there was OOM occurred the qemu won't quit.

I'm not sure I agree here.  If qemu quits, it will be reported as a crash to the management tools, so the user can easily see what happened and restart the guest.  If the guest just freezes within qemu, it's no longer doing its job, but it won't be as obvious why.

Comment 46 Luiz Capitulino 2018-07-16 13:27:22 UTC
(In reply to David Gibson from comment #45)
> In reply to comment 44:
> 
> > I think you have a point that if pvpanic is not used, QEMU shouldn't exit. 
> 
> So the POWER equivalent of pvpanic is a hypervisor facility that's always
> available.  So, effectively, we *always* have pvpanic - that's why we always
> get a qemu exit.

Oh, OK. That makes sense then. Sorry for the noise.

