Bug 1506882

Summary: Call trace showed up in dmesg after migrating guest when "stress-ng --numa 2" was running inside guest
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.5
Hardware: ppc64le
OS: Unspecified
Reporter: Min Deng <mdeng>
Assignee: Laurent Vivier <lvivier>
QA Contact: Min Deng <mdeng>
CC: dgibson, knoel, lmiksik, lvivier, mdeng, michen, mrezanin, qzhang, virt-maint, xianwang
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Fixed In Version: qemu-kvm-rhev-2.10.0-9.el7
Last Closed: 2018-04-11 00:44:15 UTC
Attachments: calltracelogfromguest

Description Min Deng 2017-10-27 05:10:36 UTC
Description of problem:
Call trace showed up in dmesg after migrating guest with "stress-ng --numa 2" running.
Version-Release number of selected component (if applicable):
kernel-3.10.0-747.el7.ppc64le
qemu-kvm-rhev-2.10.0-3.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch
How reproducible:
3/3
Steps to Reproduce:
1. Boot up the guest with the following CLI:
src:
  /usr/libexec/qemu-kvm -name avocado-vt-vm1 -sandbox off -machine pseries -nodefaults -vga std -chardev socket,id=serial_id_serial0,path=/tmp/S,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=rhel75-ppc64le-virtio-scsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:43:17:1a,bus=pci.0,addr=0x1e -m 4096 -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :0 -rtc base=utc,clock=host -enable-kvm -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -monitor stdio
dst:
/usr/libexec/qemu-kvm -name avocado-vt-vm1 -sandbox off -machine pseries -nodefaults -vga std -chardev socket,id=serial_id_serial0,path=/tmp/S,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=rhel75-ppc64le-virtio-scsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:43:17:1a,bus=pci.0,addr=0x1e -m 4096 -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :1 -rtc base=utc,clock=host -enable-kvm -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -monitor stdio -incoming tcp:0:1111

2. Run "stress-ng --numa 2" inside the guest.
  
3. Migrate the guest:
  migrate -d tcp:0:1111
  migrate_set_speed 3G

Actual results:
Migration itself succeeds without error, but a call trace shows up in the guest dmesg.

Expected results:
Migration succeeds and no call trace appears in the guest dmesg.

Additional info:
See the call trace in the attachment.

[  179.641380] CPU: 0 PID: 2697 Comm: stress-ng-numa Tainted: G             L ------------   3.10.0-747.el7.ppc64le #1
[  179.641382] task: c000000029204020 ti: c0000000e9828000 task.ti: c0000000e9828000
[  179.641383] NIP: c000000000511a60 LR: c00000000050d564 CTR: 0000000000000000
[  179.641385] REGS: c0000000e982b9b0 TRAP: 0901   Tainted: G             L ------------    (3.10.0-747.el7.ppc64le)
[  179.641386] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44004444  XER: 20000000
[  179.641391] CFAR: c00000000050d560 SOFTE: 1 
GPR00: c00000000050d4f8 c0000000e982bc30 c00000000121e900 c0000000e982bd80 
GPR04: 0000000000000100 0000000000000007 ffffffffffffffc0 ffffffffffffffff 
GPR08: ffffffffffffffc0 0000000000000100 c0000000e982bd80 0000000000000000 
GPR12: 0000000024004444 c000000003380000 
[  179.641402] NIP [c000000000511a60] find_next_bit+0x0/0xf0
[  179.641404] LR [c00000000050d564] bitmap_bitremap+0xb4/0x200
[  179.641405] Call Trace:
[  179.641406] [c0000000e982bc30] [c00000000050d4f8] bitmap_bitremap+0x48/0x200 (unreliable)
[  179.641409] [c0000000e982bc80] [c0000000002e3264] do_migrate_pages+0x204/0x390
[  179.641411] [c0000000e982bd60] [c0000000002e36f0] SyS_migrate_pages+0x270/0x2a0
[  179.641413] [c0000000e982be30] [c00000000000a184] system_call+0x38/0xb4
[  179.641414] Instruction dump:
[  179.641415] 7c832378 4e800020 60000000 60420000 7cea3b78 7d230074 2063003f 794a3664 
[  179.641418] 7c6307b4 7c635214 4e800020 60000000 <7fa52040> 409c00cc 78a606a1 78a7d182 
[43810.942228] TCP: lp registered

Comment 1 Min Deng 2017-10-27 05:11:48 UTC
QE will try it on x86 and update as soon as the test result is available.

Comment 2 Min Deng 2017-10-27 08:03:30 UTC
It cannot be reproduced without migration.
It cannot be reproduced while the guest has more than 1 node.

Comment 3 Min Deng 2017-10-30 03:31:48 UTC
(In reply to Min Deng from comment #1)
> QE will try it on x86 later as soon as get the test result.

  QE tried the bug on the following builds and it could not be reproduced on the x86 platform, so the bug is marked as Power-only. Thanks a lot.
  qemu-kvm-rhev-2.10.0-3.el7.x86_64
  guest:kernel-3.10.0-693.el7.x86_64/kernel-3.10.0-759.el7.x86_64.rpm

Comment 4 David Gibson 2017-11-09 03:05:30 UTC
We need a bit more information here:

  1. Does the call trace appear in the guest or the host?

  2. Can we have more context from the call trace? Comment 0 mentions a trace in an attachment, but I don't see any attachment on this BZ.

Comment 5 Min Deng 2017-11-14 01:57:44 UTC
(In reply to David Gibson from comment #4)
> We need a bit more information here:
> 
>   1. Does the call trace appear in the guest or the host?
    The call trace happened in the guest.
>   2. Can we have more context from the call trace.  comment 0 mentions a
> trace in a attachement, but I don't see any attachement on this BZ.
    I will double-check and upload it, thanks a lot.

Comment 6 Min Deng 2017-11-14 01:58:31 UTC
Created attachment 1351778 [details]
calltracelogfromguest

Comment 7 David Gibson 2017-11-14 23:56:28 UTC
The call trace seems to be a soft lockup with a ludicrously long stuck time.

Looks like something is messed up in the migration of the time base.

Laurent, I believe you were looking at another bug with a similar symptom; this is likely to be a dupe.

Comment 8 xianwang 2017-11-16 06:20:17 UTC
Hi, Laurent,
I hit a problem that is similar to this bug, but I am not sure; the result is as follows:
On P9, execute "# stress --cpu 4 --io 4 --vm 2". After migration, on the src host the migration status is "completed" and on the dst host the status of the VM is "running", but the VM hangs on the dst host and there is a call trace in dmesg:

[   24.472855] Call Trace:
[   24.472858] [c0000001f593bc50] [c0000001f5b84970] 0xc0000001f5b84970 (unreliable)
[   24.472861] [c0000001f593bc80] [c00000000012e770] copy_mm+0x48c/0x63c
[   24.472865] [c0000001f593bd40] [c00000000012b09c] copy_process.isra.31.part.32+0x72c/0x10a0
[   24.472868] [c0000001f593bdc0] [c00000000012c970] _do_fork+0xf0/0x4e0
[   24.472873] [c0000001f593be30] [c00000000000b410] ppc_clone+0x8/0xc
[   24.472874] Instruction dump:
[   24.472878] 7c7f1b78 48000008 e8410018 e93f00a8 e95f00a0 e9090008 e94a00f0 e90806d8 
[   24.472885] 7d0a5278 7d4a0074 794ad182 694a0001 <0b0a0000> 39490030 7d005028 31080001 
[   24.472893] ---[ end trace b01f99a09aec16ae ]---
[   34.500801] nr_pdflush_threads exported in /proc is scheduled for removal
[   35.280012] sctp: Hash tables configured (bind 4096/4096)


Could you help check whether this problem is the same as this bug? If not, I will file a new BZ, thanks.

Comment 9 Laurent Vivier 2017-11-16 09:07:27 UTC
(In reply to xianwang from comment #8)
> Hi, Laurent,
> I hit a problem, it is similar to this bug, but I am not sure, the result is
> as following:
> on P9, execute "# stress --cpu 4 --io 4 --vm 2", after migration, on src
> host, migration status is "completed" while on dst host, the status of vm is
> "running", but vm is hang on dst host, and there is call trace on dmesg:
> 
> [   24.472855] Call Trace:
> [   24.472858] [c0000001f593bc50] [c0000001f5b84970] 0xc0000001f5b84970
> (unreliable)
> [   24.472861] [c0000001f593bc80] [c00000000012e770] copy_mm+0x48c/0x63c
> [   24.472865] [c0000001f593bd40] [c00000000012b09c]
> copy_process.isra.31.part.32+0x72c/0x10a0
> [   24.472868] [c0000001f593bdc0] [c00000000012c970] _do_fork+0xf0/0x4e0
> [   24.472873] [c0000001f593be30] [c00000000000b410] ppc_clone+0x8/0xc
> [   24.472874] Instruction dump:
> [   24.472878] 7c7f1b78 48000008 e8410018 e93f00a8 e95f00a0 e9090008
> e94a00f0 e90806d8 
> [   24.472885] 7d0a5278 7d4a0074 794ad182 694a0001 <0b0a0000> 39490030
> 7d005028 31080001 
> [   24.472893] ---[ end trace b01f99a09aec16ae ]---
> [   34.500801] nr_pdflush_threads exported in /proc is scheduled for removal
> [   35.280012] sctp: Hash tables configured (bind 4096/4096)
> 
> 
> could you help to check whether the problem is same with this bug? if not, I
> will file a new BZ, thanks

The interesting part in the comment 6 attachment is:

    NMI watchdog: BUG: soft lockup - CPU#0 stuck for 3489660870s! [stress-ng-numa:2697]
    NMI watchdog: BUG: soft lockup - CPU#1 stuck for 3489660884s! [stress-ng-numa:2698]

Do you have the same traces with this new bug?

Comment 10 Laurent Vivier 2017-11-16 11:18:52 UTC
Bisected to:

commit 42043e4f1241eeb77f87f5816b5cf0b6e9583ed7
Author: Laurent Vivier <lvivier>
Date:   Fri Jan 27 13:24:58 2017 +0100

    spapr: clock should count only if vm is running
    
    This is a port to ppc of the i386 commit:
        00f4d64 kvmclock: clock should count only if vm is running
    
    We remove timebase_post_load function, and use the VM state
    change handler to save and restore the guest_timebase (on stop
    and continue).
    
    We keep timebase_pre_save to reduce the clock difference on
    migration like in:
        6053a86 kvmclock: reduce kvmclock difference on migration
    
    Time base offset has originally been introduced by commit
        98a8b52 spapr: Add support for time base offset migration
    
    So while VM is paused, the time is stopped. This allows to have
    the same result with date (based on Time Base Register) and
    hwclock (based on "get-time-of-day" RTAS call).
    
    Moreover in TCG mode, the Time Base is always paused, so this
    patch also adjust the behavior between TCG and KVM.
    
    VM state field "time_of_the_day_ns" is now useless but we keep
    it to be able to migrate to older version of the machine.
    
    As vmstate_ppc_timebase structure (with timebase_pre_save() and
    timebase_post_load() functions) was only used by vmstate_spapr,
    we register the VM state change handler only in ppc_spapr_init().

Comment 11 Laurent Vivier 2017-11-16 14:47:58 UTC
Commit 42043e4 adds a call to kvm_set_one_reg(cpu, KVM_REG_PPC_TB_OFFSET, ...):

...
@@ -904,9 +890,44 @@ static int timebase_post_load(void *opaque, int version_id)
     CPU_FOREACH(cpu) {
         PowerPCCPU *pcpu = POWERPC_CPU(cpu);
         pcpu->env.tb_env->tb_offset = tb_off_adj;
+#if defined(CONFIG_KVM)
+        kvm_set_one_reg(cpu, KVM_REG_PPC_TB_OFFSET,
+                        &pcpu->env.tb_env->tb_offset);
+#endif
     }
+}
...

And it seems to be the cause of the problem. If I remove it, everything works fine.

Comment 12 Laurent Vivier 2017-11-17 08:58:15 UTC
I've checked the content of the TBR (Time Base Register) on src and dest, and it is okay before and after migration under stress.

#include <stdio.h>

#define FREQ 512000000          /* POWER time base frequency: 512 MHz */
int main(void)
{
        unsigned long timebase;
        asm("mftb %0" : "=r" (timebase));       /* read the Time Base Register */
        printf("%lu.%lu\n", timebase / FREQ, timebase % FREQ);
        return 0;
}
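
(For reference, the program above can be compiled natively on the ppc64le guest, e.g. with gcc, and run on the source before migration and on the destination afterwards, so that the printed time base readings can be compared.)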

Comment 13 Laurent Vivier 2017-11-17 15:17:40 UTC
If I use upstream kernel 4.14.0+ (cf9b077) in guest the problem cannot be reproduced.

Comment 14 David Gibson 2017-11-19 23:53:00 UTC
Smells like a host kernel bug setting the TB_OFFSET that got fixed upstream.

Comment 15 xianwang 2017-11-20 08:55:25 UTC
(In reply to Laurent Vivier from comment #13)
> If I use upstream kernel 4.14.0+ (cf9b077) in guest the problem cannot be
> reproduced.

Sorry, I made a mistake: I had used the old qemu-kvm-rhev-2.9 package. I have updated the packages to qemu-kvm-rhev-2.10.0-6.el7 and the problem does not occur, so please ignore comment 8, thanks.

Comment 16 xianwang 2017-11-20 09:10:08 UTC
(In reply to xianwang from comment #15)
> (In reply to Laurent Vivier from comment #13)
> > If I use upstream kernel 4.14.0+ (cf9b077) in guest the problem cannot be
> > reproduced.
> 
> Sorry, I have a mistake, I have used the old package of qemu-kvm-rhev-2.9, I
> have updated the packages to qemu-kvm-rhev-2.10.0-6.el7, and there is no
> this problem,so, please ignore comment8, thanks

Now, I have updated to the latest packages as follows:
Host:
4.14.0-2.el7a.ppc64le
qemu-kvm-rhev-2.10.0-6.el7.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

Guest:
4.14.0-2.el7a.ppc64le

If the guest is stressed with "# stress --io 4", this problem is not hit, but with "# stress --cpu 4 --io 4 --vm 2 --vm-bytes 128M" the bug is sometimes reproduced as in comment 8; maybe, as Laurent said, it is fixed upstream.

Comment 17 Laurent Vivier 2017-11-20 14:38:19 UTC
(In reply to Laurent Vivier from comment #13)
> If I use upstream kernel 4.14.0+ (cf9b077) in guest the problem cannot be
> reproduced.

Bisecting doesn't converge on a commit that could fix the problem (nothing related to POWER or KVM), so I think the problem simply cannot be reproduced reliably with the latest kernel and is not actually fixed.

Comment 18 Laurent Vivier 2017-11-20 16:59:24 UTC
The timestamp is computed using get_timestamp() which uses running_clock().

On POWER, running_clock() is:

unsigned long long running_clock(void)
{
...
        if (firmware_has_feature(FW_FEATURE_LPAR) &&
            cpu_has_feature(CPU_FTR_ARCH_207S))
                return mulhdu(get_vtb() - boot_tb, 
                              tb_to_ns_scale) << tb_to_ns_shift;

...
        return local_clock() -
               cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
}

And in our case, both features are true, and the cause of the problem is
boot_tb > get_vtb().

It seems get_vtb() is not computed correctly after migration.
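
To illustrate why this shows up as the absurdly large soft-lockup times quoted in comment 9: running_clock() effectively scales get_vtb() - boot_tb to nanoseconds, and the subtraction is done on unsigned 64-bit values, so a VTB that ends up behind boot_tb wraps around to an enormous tick count. A minimal user-space sketch of that effect (not the kernel code; the values are made up):

#include <stdio.h>
#include <stdint.h>

#define TB_FREQ 512000000ULL    /* POWER time base / VTB frequency: 512 MHz */

int main(void)
{
        /* Hypothetical values: after migration the guest VTB lags slightly
         * behind the boot_tb snapshot taken at boot. */
        uint64_t boot_tb = 1000000000ULL;
        uint64_t vtb_now =  999000000ULL;

        /* Unsigned subtraction wraps around instead of going negative,
         * so the "elapsed" tick count is close to 2^64. */
        uint64_t delta_ticks = vtb_now - boot_tb;
        uint64_t seconds = delta_ticks / TB_FREQ;

        printf("delta_ticks     = %llu\n", (unsigned long long)delta_ticks);
        printf("apparent uptime = %llu s\n", (unsigned long long)seconds);
        return 0;
}

The kernel scales with mulhdu()/tb_to_ns_shift rather than a plain division, but the wraparound effect is the same: the computed timestamps jump into the billions of seconds, which is what the watchdog then reports as the "stuck for" time.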

Comment 19 Laurent Vivier 2017-11-21 12:25:58 UTC
The problem is in QEMU which doesn't save/restore the content of the VTB.

The following patch fixes the problem:

diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index b9c49c2..4e11e6f 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -8081,10 +8081,10 @@ static void gen_spr_power8_ebb(CPUPPCState *env)
 /* Virtual Time Base */
 static void gen_spr_vtb(CPUPPCState *env)
 {
-    spr_register(env, SPR_VTB, "VTB",
+    spr_register_kvm(env, SPR_VTB, "VTB",
                  SPR_NOACCESS, SPR_NOACCESS,
                  &spr_read_tbl, SPR_NOACCESS,
-                 0x00000000);
+                 KVM_REG_PPC_VTB, 0x00000000);
 }

but the fix is not enough on its own, as we can have a problem when migrating from a previous release and trying to restore an invalid value.

Comment 20 Laurent Vivier 2017-11-21 17:12:10 UTC
I've tested a migration from qemu-2.8.0 and pseries-2.8.0 to a patched master (qemu-2.11.0-rc1). The VTB received is 0 (as it is not initialized by qemu-2.8.0), but the value seems to be ignored by KVM and a non-zero VTB is used by the kernel. I have no explanation for that.

Comment 21 Laurent Vivier 2017-11-22 13:57:44 UTC
Sent upstream:

http://patchwork.ozlabs.org/patch/840170/

Comment 23 Miroslav Rezanina 2017-11-28 10:52:31 UTC
Fix included in qemu-kvm-rhev-2.10.0-9.el7

Comment 25 Min Deng 2017-12-05 05:40:11 UTC
Reproduced the bug on 
kernel-3.10.0-747.el7.ppc64le
qemu-kvm-rhev-2.10.0-3.el7.ppc64le
Verified the bug on 
kernel-3.10.0-804.el7.ppc64le
qemu-kvm-rhev-2.10.0-10.el7.ppc64le
For the steps, please refer to comment 0.
In conclusion, the original issue has been fixed, so the bug can be moved to VERIFIED. Thanks a lot.

Comment 28 errata-xmlrpc 2018-04-11 00:44:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104