Bug 709740 - [RHEL 5.6] High softirq load on a Xen host
Summary: [RHEL 5.6] High softirq load on a Xen host
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.6
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: Xen Maintainance List
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 514489
Reported: 2011-06-01 14:35 UTC by asilva
Modified: 2018-11-14 12:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-26 13:54:00 UTC
Target Upstream Version:



Description asilva 2011-06-01 14:35:32 UTC
>Description of problem:
The customer experienced a sudden high softirq load on a Xen host, which forced them to migrate the Xen guests to another host. The softirq load decreased linearly as guests were migrated out.

Customer runs a Xen virtualization host, ux004, that hosts several Xen paravirtualized guests (vux0xx).

- On May 4th at around 10:00, ux004 suddenly experienced a massive softirq load that left both the host and the guests unusable.
- The customer started shutting down the guests; after they managed to stop between three and five of them, the softirq load on the host dropped to nearly 0%.
- They still shut down the remaining guests, upgraded the host kernel from 2.6.18-238.5.1 to 2.6.18-238.9.1, and increased the dom0 vcpus from 2 to 4.
- /var/log/xen/xend.log shows multiple entries like this one:
"Cannot recreate information for dying domain 24.  Xend will ignore this domain from now on."
- BZ#695369 is not believed to be the cause of the softirq load.
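
To quantify the xend.log entries mentioned above, something like the following could be run against the log extracted from the sosreport. This is only a sketch: it assumes Python 3 on an analysis workstation (not on the RHEL 5 host itself), the function name is made up for illustration, and the only thing taken from the log is the message text quoted above.

```python
import re
from collections import Counter

# Message text as quoted from xend.log above; the domain ID is captured.
DYING_RE = re.compile(r"Cannot recreate information for dying domain (\d+)\.")

def count_dying_domains(lines):
    """Return a Counter mapping domain ID -> number of matching log lines.

    `lines` is any iterable of strings, e.g. an open file object for the
    xend.log copied out of the sosreport (hypothetical helper, not a Xen API).
    """
    counts = Counter()
    for line in lines:
        match = DYING_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Passing an open handle for xend.log and printing the sorted items would show whether the messages cluster on a few domain IDs or are spread across many.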

>Supporting information:
sosreport-ux004.201105041307-772065-0e0750.tar.bz2
ux004_sar04.pdf (sar statistics for May 4th)

A different event, possibly unrelated to the first one, happened on May 19th on a different virtualization host, ux001, running 2.6.18-238.12 (hotfix for BZ#695369) and with 'hardvirt' enabled.

- The customer started live migrating Xen paravirtualized guests from ux002 to ux001 at 06:36.
- After around 20 guests had been live migrated to ux001, a sudden high softirq load occurred on ux001 at 06:55.
- The customer stopped the live migrations and started to live migrate the guests from ux001 back to ux002.
- dom0 experienced a higher than usual steal time (up to 7%).
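
For reference, the softirq and steal percentages discussed here are the kind of figures that can be derived from two snapshots of /proc/stat. The sketch below is not how sar or mpstat are implemented; it is an approximation assuming Python 3 and the usual column order of the cpu lines (user, nice, system, idle, iowait, irq, softirq, steal). The function names are hypothetical.

```python
# Column order on the "cpu" lines of /proc/stat:
# user nice system idle iowait irq softirq steal
SOFTIRQ, STEAL = 6, 7

def parse_proc_stat(text):
    """Map 'cpu', 'cpu0', ... to a list of integer jiffy counters."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            counters[fields[0]] = [int(value) for value in fields[1:]]
    return counters

def soft_and_steal_percent(before, after):
    """Return {cpu: (%soft, %steal)} for the interval between two snapshots."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        # Skip CPUs missing from the first snapshot or lacking a steal column.
        if old is None or len(new) <= STEAL or len(old) <= STEAL:
            continue
        deltas = [n - o for n, o in zip(new, old)]
        total = sum(deltas[:8])  # only the eight columns listed above
        if total <= 0:
            continue
        result[cpu] = (100.0 * deltas[SOFTIRQ] / total,
                       100.0 * deltas[STEAL] / total)
    return result
```

Taking one snapshot before and one during a migration run, then comparing the per-CPU values, would show whether the softirq load is concentrated on the dom0 vcpus that handle the pinned IRQs.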

>Supporting information:
sosreport-ux001.00469714-757550-73e9fb.tar.bz2
sosreport-ux002.00469714-514346-5fa7bf.tar.bz2
ux001_sar19.pdf (sar statistics for May 19th)
ux002_sar19.pdf (sar statistics for May 19th)
ux001-softirq-analysis.tar.bz2 containing:
- 3 captures of /proc/interrupts (1 total, 1 for blk irqs, 1 for net irqs): proc-interrupts*.txt
- 4 captures of /proc/irq/*/smp_affinity as we set it: irq.smp_affinity.*.txt
- 1 five-minute mpstat log showing softirq per CPU with the above pinning: mpstat.20110519-073316.log
- 1 vuxiostat capture also showing steal time: vuxiostat.ux001.20110519-073833.log
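
When reading the irq.smp_affinity.*.txt captures, it can help to turn each hexadecimal mask into a list of CPU numbers. The following is a small sketch assuming Python 3 and the standard /proc/irq/<n>/smp_affinity format (a hex bitmask, written as comma-separated 32-bit groups on wide systems); the function name is hypothetical.

```python
def cpus_from_affinity_mask(mask_text):
    """Decode an smp_affinity hex mask into a sorted list of CPU numbers."""
    # Wide masks are written as comma-separated 32-bit hex groups, most
    # significant group first; removing the commas yields one hex number.
    value = int(mask_text.strip().replace(",", ""), 16)
    # Bit N set means the IRQ may be delivered to CPU N.
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]
```

For example, a mask of `3` decodes to CPUs 0 and 1, and `c` to CPUs 2 and 3.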

>Supporting information:
I've attached the oprofile:
- oprofile_test_reports.tar.gz

>How reproducible:
We have a reproducer at the FAB lab:

Xen hosts:
10.33.8.90  r210xen.gsslab.fab.redhat.com
10.33.8.140 dhcp-140.gsslab.fab.redhat.com

iSCSI server for the shared storage:
10.33.8.75  pe1950-5.gsslab.fab.redhat.com

All guests named RHEL5LMx can be live migrated between both Xen hosts, and xenoprofile should be installed, at least, on r210xen.
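
As a rough illustration of driving the reproducer, the sketch below only builds the `xm migrate --live` command lines for a set of guests and does not execute anything. It assumes Python 3; the guest names in the usage note are hypothetical expansions of the RHEL5LMx pattern, since the actual number of guests is not stated here.

```python
def live_migrate_commands(guests, target_host):
    """Return one `xm migrate --live <guest> <host>` argv list per guest."""
    return [["xm", "migrate", "--live", guest, target_host] for guest in guests]
```

Calling it with, say, `["RHEL5LM1", "RHEL5LM2"]` (hypothetical names) and `"dhcp-140.gsslab.fab.redhat.com"` gives the commands to run on the source host, for example through `subprocess.call`, while softirq load is monitored on the target.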
  
>Actual results:
The Xen host experienced a sudden high softirq load.

>Expected results:
Identify why the host is getting a high softirq load.

Comment 1 asilva 2011-06-01 14:40:50 UTC
Created attachment 502289 [details]
ux004_sar04.pdf

Comment 2 asilva 2011-06-01 14:43:19 UTC
Created attachment 502291 [details]
ux002_sar19.pdf

Comment 3 asilva 2011-06-01 14:44:22 UTC
Created attachment 502292 [details]
ux001-softirq-analysis.tar.bz2

Comment 4 asilva 2011-06-01 14:50:53 UTC
Created attachment 502293 [details]
ux001_sar19.pdf

Comment 6 asilva 2011-06-01 14:59:40 UTC
Created attachment 502295 [details]
sosreport-ux004.201105041307-772065-0e0750.tar.bz2

Comment 7 asilva 2011-06-01 15:09:26 UTC
Created attachment 502300 [details]
sosreport-ux001.00469714-757550-73e9fb.tar.bz2

Comment 8 asilva 2011-06-01 15:12:23 UTC
Created attachment 502302 [details]
sosreport-ux002.00469714-514346-5fa7bf.tar.bz2

Comment 17 RHEL Program Management 2011-06-21 05:32:35 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 22 Laszlo Ersek 2011-08-25 12:30:55 UTC
Hello Julio, I'm going to close this BZ as INSU tomorrow. Thank you.

