Bug 636534

Summary: XenDomU blkfront hangs
Product: [Fedora] Fedora
Reporter: Andrew Jones <drjones>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 13
CC: dougsland, gansalmon, itamar, jforbes, jonathan, kernel-maint, madhu.chinakonda, ngaywood
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-05-31 13:06:15 UTC

Description Andrew Jones 2010-09-22 13:45:12 UTC
When running Fedora as a Xen DomU guest, tasks (usually those associated with blkfront) can start hanging. This is due to the way the event channel IRQ was set up: Xen events behave like edge-triggered interrupts, but the kernel was setting them up as level-triggered. Since it's a fundamental issue, basically any action using interrupts could trigger the bug, but it seems you need high interrupt activity to have a real chance of hitting it. My reproducer was just to compile a kernel on the guest, which triggered interrupts for the Xen block device.

xen: use percpu interrupts for IPIs and VIRQs
xen: handle events as edge-triggered

These two patches correct the issue for RHEL6. See bug 550724. Fedora may need some other patches when bringing them in, or it may be able to pick them up when moving to a stable kernel. Either way, they should be integrated into rawhide and released in updates to supported Fedora releases.
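
The edge versus level distinction in the description can be illustrated with a small toy model. This is only a sketch, not kernel code: the names toy_irq, toy_deliver, toy_handler_done and toy_run are hypothetical. The model assumes a simplified rule: a level-style flow ignores an event that arrives while the handler is still running (a real level line would stay asserted and fire again), while an edge-style flow latches it as pending and replays it.

```c
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code): one event source, one handler. */
struct toy_irq {
    bool in_progress; /* handler is currently running, e.g. on another CPU */
    bool pending;     /* edge flow: an event arrived while the handler ran */
    int  handled;     /* number of events the handler has processed */
};

/* Deliver one event to the toy IRQ using either an edge or a level flow. */
static void toy_deliver(struct toy_irq *irq, bool edge_flow)
{
    if (irq->in_progress) {
        /* Edge flow latches the event; level flow assumes the line is
         * still asserted and does nothing, so an edge event is lost. */
        if (edge_flow)
            irq->pending = true;
        return;
    }
    irq->in_progress = true; /* handler starts running */
}

/* The running handler finishes; a latched event is replayed once. */
static void toy_handler_done(struct toy_irq *irq)
{
    irq->handled++;
    irq->in_progress = false;
    if (irq->pending) {
        irq->pending = false;
        irq->handled++;
    }
}

/* Two events, the second arriving while the first is still being handled.
 * Returns how many of them were actually processed. */
static int toy_run(bool edge_flow)
{
    struct toy_irq irq = { false, false, 0 };

    toy_deliver(&irq, edge_flow); /* first event starts the handler */
    toy_deliver(&irq, edge_flow); /* second event arrives meanwhile */
    toy_handler_done(&irq);
    return irq.handled;
}
```

Under these assumptions toy_run(false) processes only one of the two events, while toy_run(true) processes both. The dropped event in the level case is the analogue of a completion notification that blkfront never sees.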

Comment 1 Chuck Ebbert 2010-09-22 15:32:46 UTC
They've been in rawhide since late August. And they're also in F14 and F12, in kernels built but not yet released.

Comment 2 Chuck Ebbert 2010-09-22 16:18:13 UTC
Fixes added to 2.6.34.7-57.fc13

Comment 3 Chuck Ebbert 2010-09-23 15:25:02 UTC
Note that the second patch:
   [fb412a178502dc498430723b082a932f797e4763]
   xen: use percpu interrupts for IPIs and VIRQs

Has been reported to cause kernel panic on boot on 2.6.35.5:
   http://marc.info/?t=128509758700002&r=1&w=4&n=2

Comment 4 Norman Gaywood 2010-09-24 03:48:38 UTC
(In reply to comment #3)
> Note that the second patch:
>    [fb412a178502dc498430723b082a932f797e4763]
>    xen: use percpu interrupts for IPIs and VIRQs
> 
> Has been reported to cause kernel panic on boot on 2.6.35.5:
>    http://marc.info/?t=128509758700002&r=1&w=4&n=2

And 2.6.32.22 according to that link.

Seems there is a typo somewhere. I'm not an expert here but the typo is here:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=blobdiff;f=drivers/xen/events.c;h=a4dc7bf8fa137a7731d0855b7bf0a51d35f8f80d;hp=50cfdb02390b63517ed1813479819ed0fcc2d660;hb=c7b28349a66e0aa9a24872dfc55a5296fc01a360;hpb=c5783925493e315f91330241546da7915dcc46e3

(hope that link works). The typo is a:

static struct irq_chip en_percpu_chip __read_mostly = {

instead of:

static struct irq_chip xen_percpu_chip __read_mostly = {

How did it compile?

The typo is not everywhere. For example, this does not have it:

http://permalink.gmane.org/gmane.linux.kernel.commits.head/253917
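
On the question of how the typo could compile: one plausible explanation, assuming events.c also carries a forward declaration of xen_percpu_chip near the top of the file (not verified here against the 2.6.32.y tree), is that C treats such a declaration as a tentative definition. If no initialized definition with the same name follows, the object is simply zero-initialized, and the misspelled en_percpu_chip becomes a separate object. The sketch below uses a hypothetical toy_chip struct in place of struct irq_chip.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct irq_chip. */
struct toy_chip {
    const char *name;
    void (*ack)(void);
};

static void toy_ack(void)
{
}

/* Forward declaration (a tentative definition in C). The assumption is
 * that the real file has an equivalent line for xen_percpu_chip. */
static struct toy_chip xen_percpu_chip;

/* The misspelled definition: a different object that happens to carry
 * the intended initializer. */
static struct toy_chip en_percpu_chip = {
    .name = "xen-percpu",
    .ack  = toy_ack,
};

/* Callers use the correctly spelled name, so they see the object that
 * was never initialized and therefore holds all-zero members. */
static struct toy_chip *toy_chip_in_use(void)
{
    return &xen_percpu_chip;
}

/* The object that actually received the initializer. */
static struct toy_chip *toy_chip_as_typed(void)
{
    return &en_percpu_chip;
}
```

In this sketch the chip reached through toy_chip_in_use() has NULL callbacks. If the same thing happened in the kernel, calling through such a pointer during early interrupt setup would be consistent with the reported panic on boot.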

Comment 5 Norman Gaywood 2010-09-27 22:11:08 UTC
2.6.32.23-170.fc12.x86_64 boots for me.

Now running with irqbalance on. Running with irqbalance off was a work-around for this bug.

Comment 6 Norman Gaywood 2010-10-13 05:25:44 UTC
2.6.32.23-170.fc12.x86_64 has been running for over 15 days now with irqbalance on.

At this stage that kernel is still only available in koji and has not been promoted to updates-testing. However, it seems to definitely fix this problem.

Comment 7 Andrew Jones 2010-10-13 09:54:22 UTC
(In reply to comment #6)
> 2.6.32.23-170.fc12.x86_64 has been running for over 15 days now with irqbalance
> on.
> 
> At this stage that kernel is still only available in koji and has not been
> promoted to updates-testing. However, it seems to definitely fix this problem.

Norman, I'm glad it fixes your problems. Thanks again for your diligence in hunting down the solution!

Andrew

Comment 8 Bug Zapper 2011-05-31 12:51:57 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 9 Andrew Jones 2011-05-31 13:06:15 UTC
The bug is in MODIFIED and is fixed. Looks like the bugzapper mistargeted it for a WONTFIX though. I'll close it as CURRENTRELEASE.