
Bug 1317732

Summary: Destination guest sometimes kernel panic or no response after migration with high stress load
Product: Red Hat Enterprise Linux 7
Reporter: xiywang
Component: kernel-rt
Assignee: pagupta
kernel-rt sub component: KVM
QA Contact: Pei Zhang <pezhang>
Status: CLOSED CURRENTRELEASE
Docs Contact:
Severity: medium
Priority: medium
CC: bhu, chayang, hhuang, juzhang, michen, pezhang, qzhang, virt-maint, williams, xfu, xiywang
Version: 7.3
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-02-05 09:50:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description             Flags
kernel-panic-1.log      none
kernel-panic-2.log      none
xfs-error.log           none
systemd-segfault.log    none
core dump               none

Description xiywang 2016-03-15 02:46:50 UTC
Created attachment 1136350 [details]
kernel-panic-1.log

Description of problem:
After migrating 9 times under high load (iozone, dd, and stress running at the same time):
1. kernel panic, 3 times;
2. xfs error, with the guest unresponsive over ssh and console although ping from an external host still succeeded, 3 times;
3. systemd error, with the guest unresponsive over ssh and console although ping from an external host still succeeded, 1 time;
4. migrated successfully, 2 times.

Version-Release number of selected component (if applicable):
Host & Guest kernel:
3.10.0-364.rt56.241.el7.x86_64
Host qemu-kvm:
qemu-kvm-rhev-2.5.0-2.el7.x86_64

How reproducible:
75%
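
As a cross-check of the figure above: the outcome counts in the description (3 + 3 + 1 failures, 2 successes) give 7 failing runs out of 9. A minimal sketch of that arithmetic in shell follows; the function name failure_pct is illustrative only and not part of any tool.

```shell
# Illustrative helper (hypothetical name): percentage of failed runs,
# rounded to the nearest integer using shell integer arithmetic.
failure_pct() {
  failures=$1
  total=$2
  echo $(( (failures * 100 + total / 2) / total ))
}

# Counts from the description: 3 panics + 3 xfs errors + 1 systemd error, 9 runs.
failure_pct $((3 + 3 + 1)) 9
```

This prints 78, so the 75% stated above is a rounded estimate of 7 out of 9.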

Steps to Reproduce:
1. boot guest in src host
/usr/libexec/qemu-kvm -name rhel7.2-rt-355 -machine pc-i440fx-rhel7.2.0 -cpu IvyBridge -m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 \
-drive file=/home/rhel7.2-rt-355.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,snapshot=off -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 \
-netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a1:d0:5f \
-monitor stdio -device qxl-vga,id=video0 -serial unix:/tmp/console,server,nowait -vnc :1 -spice port=5900,disable-ticketing

2. set migration parameters in src host hmp
1). migrate_set_speed 2G
2). migrate_set_capability xbzrle on

3. run high stress in guest
1). for((;;)); do iozone -a; done
2). for((;;)); do dd if=/dev/zero of=/home/test bs=1M count=50; done
3). stress --cpu 4 --vm-bytes 2048M --timeout 300s

4. boot guest in dst host with "-incoming tcp:0:4444"

5. migrate
1). migrate -d tcp:10.73.64.233:4444
2). migrate_set_downtime 20

6. observe output of "nc -U /tmp/console" on dst host
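
Step 3 above runs three loads in the guest at the same time. A minimal sketch of starting them together from one shell is shown here. The helper names run_load and start_loads and the DRY_RUN switch are hypothetical, the loops are written as "while :" rather than "for((;;))" so they also run under plain sh, and the dd target is assumed to be an output file (of=/home/test).

```shell
# Sketch: run the three loads from step 3 concurrently inside the guest.
# Assumption: the dd target is an output file, i.e. of=/home/test.
# With DRY_RUN=1 the commands are only printed, not executed.
run_load() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$1"
  else
    # Start the load in the background and discard its output.
    sh -c "$1" >/dev/null 2>&1 &
  fi
}

start_loads() {
  run_load 'while :; do iozone -a; done'
  run_load 'while :; do dd if=/dev/zero of=/home/test bs=1M count=50; done'
  run_load 'stress --cpu 4 --vm-bytes 2048M --timeout 300s'
}

# Preview the commands without starting anything:
DRY_RUN=1 start_loads
```

Calling start_loads without DRY_RUN launches all three in the background; the two loops run until killed, while stress exits after its 300s timeout.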

Actual results:
Four kinds of results were observed.
1. kernel panic, 3 times
Only the console logs were kept; I lost track of the core file. Once I get the file I will upload it.
The two log files I kept are attached as kernel-panic-1.log and kernel-panic-2.log.

2. xfs error and the guest on the dst host does not respond, 3 times
Log file is xfs-error.log

3. systemd segfault and the guest on the dst host does not respond, 1 time
Log file is systemd-segfault.log

4. guest migrated normally, 2 times
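
The four outcomes above could be told apart in a saved console log roughly as follows. This is only a sketch: the function name classify_console_log is hypothetical, and the grep patterns are assumptions about typical kernel and systemd message text, not strings confirmed from the attached logs.

```shell
# Sketch: label a saved console log with one of the four outcomes above.
# The patterns below are assumed, not taken from the attachments.
classify_console_log() {
  log=$1
  if grep -q 'Kernel panic' "$log"; then
    echo kernel-panic
  elif grep -Eqi 'xfs.*(error|shutdown|corrupt)' "$log"; then
    echo xfs-error
  elif grep -q 'systemd.*segfault' "$log"; then
    echo systemd-segfault
  else
    echo no-known-error
  fi
}
```

One way to use it is to save the console output from step 6 with "nc -U /tmp/console | tee console.log" on the dst host and then run "classify_console_log console.log". A result of no-known-error does not prove that the guest is healthy, since a hung guest may print nothing.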

Expected results:
Guest migrates normally every time

Additional info:
Tested this case on a non-rt kernel; no error occurred.

Comment 1 xiywang 2016-03-15 02:47:32 UTC
Created attachment 1136351 [details]
kernel-panic-2.log

Comment 2 xiywang 2016-03-15 02:48:25 UTC
Created attachment 1136352 [details]
xfs-error.log

Comment 3 xiywang 2016-03-15 02:49:09 UTC
Created attachment 1136353 [details]
systemd-segfault.log

Comment 6 juzhang 2016-03-24 23:58:40 UTC
Hi Xiywang,

Could you reply to comment 5?

Best Regards,
Junyi

Comment 14 pagupta 2017-02-13 07:30:55 UTC
Hello xiywang,

Can you please test with kernel >= kernel-rt-3.10.0-548.rt56.456.el7 and check whether the issue still persists? If it does, can you please provide a guest core dump so I can look at what's going on inside the guest kernel?

Best regards,
Pankaj
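
A quick way to check whether a running kernel meets the minimum requested in the comment above is sketched below. It assumes GNU sort with the -V (version sort) option, which approximates but is not identical to RPM's own version comparison; the function name kernel_at_least is hypothetical.

```shell
# Sketch: succeed if kernel release $1 is at least $2, using GNU sort -V.
# The lower of the two versions sorts first; if that is $2, then $1 >= $2.
kernel_at_least() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Minimum from the comment above, with the "kernel-rt-" package prefix
# dropped so that it lines up with the output of uname -r.
min=3.10.0-548.rt56.456.el7
if kernel_at_least "$(uname -r)" "$min"; then
  echo "kernel is new enough for the retest"
else
  echo "kernel is older than $min"
fi
```

Run this on both the host and the guest, since the report uses the same kernel-rt build on each.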

Comment 15 xiywang 2017-03-03 07:30:47 UTC
(In reply to pagupta from comment #14)
> Hello xiywang,
> 
> Can you please test with kernel >= kernel-rt-3.10.0-548.rt56.456.el7 and
> check if issue still persists. If it persists Can you please provide me
> guest core dump to look at what's going on inside guest kernel.
> 
> Best regards,
> Pankaj

Tested again. The issue remains.

Host & Guest kernel:
# uname -r
3.10.0-576.rt56.486.el7.x86_64
Host qemu-kvm-rhev:
# rpm -qa | grep qemu-kvm-rhev
qemu-kvm-rhev-2.8.0-5.el7.x86_64

After migration, the network in the guest is down. However, I can still manage the guest through remote-viewer or the console. I captured the vmcore file and uploaded it, although there was no error message in "dmesg" or on the qemu-kvm command line.

Also, in case it helps with diagnosis:
After manually triggering a core dump in the guest, I saw these lines in the guest's dmesg:
[   46.666934] blk_update_request: I/O error, dev fd0, sector 0
[   48.478112] nr_pdflush_threads exported in /proc is scheduled for removal

--
Celia

Comment 16 xiywang 2017-03-03 07:32:28 UTC
Created attachment 1259418 [details]
core dump

Comment 17 pagupta 2017-12-15 09:43:15 UTC
Hi Pei, Xiywang,

Is this bug still present with the latest version of realtime KVM?

Thanks,
Pankaj

Comment 18 Pei Zhang 2018-02-05 09:16:25 UTC
(In reply to pagupta from comment #17)
> Hi Pei, Xiywang,
> 
> Is this bug present with the latest version of realtime KVM.
> 
> Thanks,
> Pankaj

Hi Pankaj,

With the latest versions, this issue is gone.

Versions:
3.10.0-843.rt56.784.el7.x86_64
qemu-kvm-rhev-2.10.0-19.el7.x86_64
tuned-2.9.0-1.el7.noarch

Steps:
Followed the steps in the Description. All 10 migration runs worked well, with no errors on the host or in the guest.


Best Regards,
Pei

Comment 19 pagupta 2018-02-05 09:50:12 UTC
Hi Pei,

Thanks for testing this. As per comment 18, closing this BZ. Feel free
to reopen the BZ if the issue persists.

Thanks,
Pankaj