247158 – process and kernel crashes under xen kernel with xfs stack overflow

Summary: process and kernel crashes under xen kernel with xfs stack overflow

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel-xen
Sub Component:
Version:	7
Hardware:	i386
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Eduardo Habkost
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-07-05 18:21 UTC by kennyt
Modified:	2009-12-14 20:41 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-06-17 01:48:20 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
crash w/ usb (29.96 KB, text/plain) 2007-07-05 18:21 UTC, kennyt
crash w/o usb (27.68 KB, text/plain) 2007-07-05 18:21 UTC, kennyt

Description kennyt 2007-07-05 18:21:00 UTC

Description of problem:


Version-Release number of selected component (if applicable):
2.6.20-2925.11.fc7xen

How reproducible:
always

Steps to Reproduce:
1. boot into xen0 (with or without xend running)
2. start a large rsync going either to or from affected box
  
Actual results:
rsync crashes in kernel space, following hypervisor calls

sometimes system hangs, gets checked, reboots

Expected results:
no oopses

Additional info:
Please see the two attached `dmesg` logs, one with USB and one without. Apparently
this is not the same problem as bug 190636.

The local FS is on a single-physical-volume LVM set with no RAID. This problem
does not occur under non-Xen kernels.

Comment 1 kennyt 2007-07-05 18:21:00 UTC

Created attachment 158607
crash w/ usb

Comment 2 kennyt 2007-07-05 18:21:43 UTC

Created attachment 158608
crash w/o usb

Comment 3 Richard W.M. Jones 2007-07-05 18:39:33 UTC

This sounds similar to bug 190636, although the stack trace is
different.

I tried to reproduce this one a few times, but unfortunately I only have
a 10 Mbps hub to drive data through, so I don't seem to be able to
push the network fast enough to see this.

Comment 4 kennyt 2007-07-05 19:03:34 UTC

I just tested it locally. I got a complete lockup, with a fatal error displayed
on the console, no network response, and a quick reboot.

However, I couldn't get a crash using / (ext3, raid1) as the target volume. It
only happened when I used a different volume (xfs on LVM) as the target.

Comment 5 Richard W.M. Jones 2007-07-05 19:24:12 UTC

This was a good excuse for me to get rid of my 10 Mbps hub and
order a shiny new gigE switch, so in a few days I'll be able to
test this again.

Comment 6 kennyt 2007-07-05 19:27:27 UTC

> I just tested it locally.

By that, I meant that I was rsyncing from localhost to localhost, and it crashed.

Comment 7 Red Hat Bugzilla 2007-07-25 01:42:44 UTC

change QA contact

Comment 8 Eduardo Habkost 2007-08-02 19:45:43 UTC

(In reply to comment #4)
> 
> However, I couldn't get a crash using / (ext3, raid1) as the target volume.
> It only happened when I tried another LVM, xfs volume.


I've noticed XFS is abusing the stack. Two functions each use more than 400
bytes of stack, and many others use more than 100 bytes. Unless there are XFS
stack-usage fixes that can be picked from the non-Xen kernel, the only fix
would be to increase the stack size until XFS stack usage improves.
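
To make the arithmetic concrete, here is a minimal sketch of how per-function
stack frames add up against a fixed-size kernel stack. It assumes the 4k stack
size discussed in this report; the call chain, the frame sizes, and every
function name in it are hypothetical placeholders, not measurements taken from
the attached traces.

```python
# Sketch: how nested call frames eat into a fixed-size kernel stack.
# All frame sizes and names below are hypothetical illustrations.

STACK_SIZE = 4096  # bytes; assumes 4k kernel stacks


def stack_remaining(frames, stack_size=STACK_SIZE):
    """Return bytes left once every frame in the call chain is live."""
    used = sum(size for _name, size in frames)
    return stack_size - used


def heaviest(frames, threshold=100):
    """List frames at or above `threshold` bytes, largest first."""
    big = [(name, size) for name, size in frames if size >= threshold]
    return sorted(big, key=lambda f: f[1], reverse=True)


# Hypothetical chain: syscall -> filesystem -> volume manager -> driver -> IRQ.
chain = [
    ("syscall_entry", 96),
    ("fs_write_path", 420),   # stands in for one >400-byte filesystem frame
    ("fs_alloc_path", 410),   # stands in for a second >400-byte frame
    ("fs_helpers", 1200),     # many smaller frames lumped together
    ("volume_manager", 700),
    ("block_driver", 600),
    ("irq_handler", 500),
]

if __name__ == "__main__":
    print("bytes remaining:", stack_remaining(chain))
    for name, size in heaviest(chain, threshold=400):
        print(f"{name}: {size} bytes")
```

With these made-up numbers the chain nearly exhausts a 4k stack, while the
same chain on an 8k stack (`stack_remaining(chain, 8192)`) leaves ample room,
which is the trade-off described above.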

Comment 9 Eric Sandeen 2007-10-17 04:25:59 UTC

Yes, these are stack overflows from xfs, plain and simple:  

do_IRQ: stack overflow: 460

F8 should be a bit better in that regard, if you'd like to test, but I'm still
of the belief that if you stack up enough IO layers, 4k will be trouble for any
filesystem.  It's just that xfs does indeed have trouble a bit sooner. :)
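
For context on that message, the following is a rough Python model of the
i386 `do_IRQ` stack debug check as found in kernels of this era, written under
stated assumptions rather than copied from the Fedora Xen source. It masks the
stack pointer to get its offset within the stack, and warns when fewer than
one eighth of the stack's bytes remain above the `thread_info` structure at
the stack base; the number printed is the bytes still free. The
`THREAD_INFO_SIZE` value and the helper names are assumed placeholders.

```python
# Rough model of the i386 do_IRQ stack check; not the kernel's actual code.

THREAD_SIZE = 4096             # assumes 4k kernel stacks
STACK_WARN = THREAD_SIZE // 8  # warn when fewer than this many bytes are free
THREAD_INFO_SIZE = 52          # assumed placeholder, not a verified value


def stack_free(esp, thread_size=THREAD_SIZE, thread_info_size=THREAD_INFO_SIZE):
    """Bytes between the stack pointer and thread_info at the stack base."""
    offset = esp & (thread_size - 1)  # position of esp within the stack
    return offset - thread_info_size


def irq_stack_check(esp):
    """Return the warning text if free stack is below STACK_WARN, else None."""
    free = stack_free(esp)
    if free < STACK_WARN:
        return f"do_IRQ: stack overflow: {free}"
    return None


if __name__ == "__main__":
    base = 0xC1234000  # hypothetical 4096-byte-aligned stack base
    print(irq_stack_check(base + THREAD_INFO_SIZE + 460))
    print(irq_stack_check(base + 2048))
```

Under this model, the `460` in the message means the interrupt arrived with
only 460 bytes of stack left, so any deeper call chain at that point would run
off the end of a 4k stack.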

Comment 10 Bug Zapper 2008-05-14 13:25:30 UTC

This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists.

Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs:
http://docs.fedoraproject.org/release-notes/

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 11 Bug Zapper 2008-06-17 01:48:18 UTC

Fedora 7 changed to end-of-life (EOL) status on June 13, 2008. 
Fedora 7 is no longer maintained, which means that it will not 
receive any further security or bug fix updates. As a result we 
are closing this bug. 

If you can reproduce this bug against a currently maintained version 
of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
