Bug 1317732 - Destination guest sometimes kernel panic or no response after migration with high stress load
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: pagupta
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-15 02:46 UTC by xiywang
Modified: 2018-02-05 09:50 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-05 09:50:12 UTC
Target Upstream Version:


Attachments (Terms of Use)
kernel-panic-1.log (76.37 KB, text/plain)
2016-03-15 02:46 UTC, xiywang
no flags
kernel-panic-2.log (49.61 KB, text/plain)
2016-03-15 02:47 UTC, xiywang
no flags
xfs-error.log (549 bytes, text/plain)
2016-03-15 02:48 UTC, xiywang
no flags
systemd-segfault.log (282 bytes, text/plain)
2016-03-15 02:49 UTC, xiywang
no flags
core dump (34.07 KB, text/plain)
2017-03-03 07:32 UTC, xiywang
no flags

Description xiywang 2016-03-15 02:46:50 UTC
Created attachment 1136350 [details]
kernel-panic-1.log

Description of problem:
After migrating 9 times under high load (iozone, dd, and stress running at the same time):
1. kernel panic, 3 times;
2. xfs error; guest unresponsive over ssh and console, though ping from an external host still succeeds, 3 times;
3. systemd error; guest unresponsive over ssh and console, though ping from an external host still succeeds, 1 time;
4. migrated successfully, 2 times.

Version-Release number of selected component (if applicable):
Host & Guest kernel:
3.10.0-364.rt56.241.el7.x86_64
Host qemu-kvm:
qemu-kvm-rhev-2.5.0-2.el7.x86_64

How reproducible:
75%

Steps to Reproduce:
1. boot guest in src host
/usr/libexec/qemu-kvm -name rhel7.2-rt-355 -machine pc-i440fx-rhel7.2.0 -cpu IvyBridge -m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 \
-drive file=/home/rhel7.2-rt-355.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,snapshot=off -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 \
-netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a1:d0:5f \
-monitor stdio -device qxl-vga,id=video0 -serial unix:/tmp/console,server,nowait -vnc :1 -spice port=5900,disable-ticketing

2. set migration parameters in src host hmp
1). migrate_set_speed 2G
2). migrate_set_capability xbzrle on

3. run high stress in guest
1). for((;;)); do iozone -a; done
2). for((;;)); do dd if=/dev/zero of=/home/test bs=1M count=50; done
3). stress --cpu 4 --vm-bytes 2048M --timeout 300s

4. boot guest in dst host with "-incoming tcp:0:4444"

5. migrate
1). migrate -d tcp:10.73.64.233:4444
2). migrate_set_downtime 20

6. observe output of "nc -U /tmp/console" on dst host
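The HMP setup in step 2 and the migration commands in step 5 can be sketched as a small script feeding the "-monitor stdio" prompt on the source host. This is a hypothetical helper, not part of the original report: the `hmp_setup`/`hmp_migrate` function names and the piping idea are assumptions, and in a real run the output would be piped into the qemu-kvm monitor rather than printed.

```shell
#!/bin/sh
# Sketch of the HMP command sequence from steps 2 and 5 above.
# DST_HOST and MIGRATE_PORT are taken from the report's "migrate -d" line.
DST_HOST=10.73.64.233
MIGRATE_PORT=4444

# Commands typed into the source monitor before migration (step 2).
hmp_setup() {
    echo "migrate_set_speed 2G"
    echo "migrate_set_capability xbzrle on"
}

# Commands that start the migration and cap downtime (step 5).
hmp_migrate() {
    echo "migrate -d tcp:${DST_HOST}:${MIGRATE_PORT}"
    echo "migrate_set_downtime 20"
}

# In a real run, these would be fed to the monitor, e.g.:
#   { hmp_setup; sleep 5; hmp_migrate; } | /usr/libexec/qemu-kvm ... -monitor stdio
# while "nc -U /tmp/console" watches the guest console on the destination (step 6).
hmp_setup
hmp_migrate
```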

Actual results:
Four kinds of results were observed.
1. kernel panic, 3 times
Only the console logs were kept; I lost track of the core file. Once I find it I will upload it.
The two log files I kept are attached as kernel-panic-1.log and kernel-panic-2.log.

2. xfs error and the guest on the destination host is unresponsive, 3 times
Log file: xfs-error.log

3. systemd segfault and the guest on the destination host is unresponsive, 1 time
Log file: systemd-segfault.log

4. guest migrated normally, 2 times

Expected results:
Guest migrates normally every time.

Additional info:
Tested this case on the non-rt kernel; no error occurred.

Comment 1 xiywang 2016-03-15 02:47:32 UTC
Created attachment 1136351 [details]
kernel-panic-2.log

Comment 2 xiywang 2016-03-15 02:48:25 UTC
Created attachment 1136352 [details]
xfs-error.log

Comment 3 xiywang 2016-03-15 02:49:09 UTC
Created attachment 1136353 [details]
systemd-segfault.log

Comment 6 juzhang 2016-03-24 23:58:40 UTC
Hi Xiywang,

Could you reply to comment 5?

Best Regards,
Junyi

Comment 14 pagupta 2017-02-13 07:30:55 UTC
Hello xiywang,

Can you please test with kernel >= kernel-rt-3.10.0-548.rt56.456.el7 and check whether the issue still persists? If it does, can you please provide a guest core dump so I can see what's going on inside the guest kernel?

Best regards,
Pankaj

Comment 15 xiywang 2017-03-03 07:30:47 UTC
(In reply to pagupta from comment #14)
> Hello xiywang,
> 
> Can you please test with kernel >= kernel-rt-3.10.0-548.rt56.456.el7 and
> check whether the issue still persists? If it does, can you please provide a
> guest core dump so I can see what's going on inside the guest kernel?
> 
> Best regards,
> Pankaj

Tested again. The issue remains.

Host & Guest kernel:
# uname -r
3.10.0-576.rt56.486.el7.x86_64
Host qemu-kvm-rhev:
# rpm -qa | grep qemu-kvm-rhev
qemu-kvm-rhev-2.8.0-5.el7.x86_64

After migration, the network in the guest is down. However, I can still manage the guest via remote-viewer or the console. I captured and uploaded the vmcore file, though there is no error message in "dmesg" or on the qemu-kvm command line.

Also, I'm not sure whether this will help the diagnosis:
After manually triggering a core dump in the guest, I saw these messages in the guest's dmesg:
[   46.666934] blk_update_request: I/O error, dev fd0, sector 0
[   48.478112] nr_pdflush_threads exported in /proc is scheduled for removal

--
Celia

Comment 16 xiywang 2017-03-03 07:32:28 UTC
Created attachment 1259418 [details]
core dump

Comment 17 pagupta 2017-12-15 09:43:15 UTC
Hi Pei, Xiywang,

Is this bug present with the latest version of realtime KVM?

Thanks,
Pankaj

Comment 18 Pei Zhang 2018-02-05 09:16:25 UTC
(In reply to pagupta from comment #17)
> Hi Pei, Xiywang,
> 
> Is this bug present with the latest version of realtime KVM?
> 
> Thanks,
> Pankaj

Hi Pankaj,

With the latest versions, this issue is gone.

Versions:
3.10.0-843.rt56.784.el7.x86_64
qemu-kvm-rhev-2.10.0-19.el7.x86_64
tuned-2.9.0-1.el7.noarch

Steps:
Followed the steps in the Description. All 10 migration runs worked well, with no errors on host or guest.


Best Regards,
Pei

Comment 19 pagupta 2018-02-05 09:50:12 UTC
Hi Pei,

Thanks for testing this. Per comment 18, I am closing this BZ. Feel free
to reopen it if the issue persists.

Thanks,
Pankaj

