Red Hat Bugzilla – Full Text Bug Listing
|Summary:||xfs on large volumes with 4k-stack kernel crashes|
|Product:||[Fedora] Fedora||Reporter:||Darko Veberic <darko.veberic>|
|Component:||kernel||Assignee:||Eric Sandeen <esandeen>|
|Status:||CLOSED UPSTREAM||QA Contact:||Brian Brock <bbrock>|
|Version:||5||CC:||cattelan, cebbert, davej, jarod, nvalentine, wtogami|
|Fixed In Version:||Doc Type:||Bug Fix|
|Last Closed:||2007-09-17 15:41:58 EDT||Type:||---|
Description Darko Veberic 2007-02-05 06:39:20 EST
Description of problem:
kernel crashes during heavy load on large (>1TB md) xfs volumes when the kernel stack size is only 4k

Version-Release number of selected component (if applicable):
2.6.18-1.2257.fc5

How reproducible:
once or twice per week, under heavy load

Steps to Reproduce:
1. heavy load from several sources: httpd, nfs, scp, dd

Actual results:
xfs-related kernel panic, or simply a freeze

Expected results:
??? work without flaws, maybe???

Additional info:
i have the impression this is a known issue, that xfs and 4k stacks do not mix anymore. you should decide to either (1) switch to the default 8k or (2) remove xfs support, since a lot of people are losing data (one of the crashes required a complete reformat since xfs_repair was segfaulting on such a mess)...
Comment 1 Dave Jones 2007-02-13 11:44:25 EST
we need to see the traces really to be able to do anything with this.
Comment 2 Eric Sandeen 2007-02-13 12:31:23 EST
Yep, stacking xfs over MD and nfs and ... is going to push the limits of the 4k stack. sgi guys have done a little work to try to trim down stack usage but adding IO layers will help add it up again. Seeing traces would at least identify the particular callchain that you ran into....

a couple of thoughts on making this more robust/safe for fedora users... perhaps we could refuse to even load the xfs module into a 4kstacks kernel unless loaded with an "i_want_to_live_on_the_edge" module option or somesuch... also, Russell has suggested the possibility of making stack size a boot time parameter; but I bet that will go over like a ton of bricks upstream.

-Eric
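The opt-in idea floated in Comment 2 can be pictured with a small user-space model. This is only a sketch of the decision logic, not kernel code and not anything that was implemented: the function name `may_load_xfs`, its parameters, and the constant `FOUR_K` are hypothetical, and the option name is simply the joking one from the comment.

```python
# Toy model of the "refuse to load on 4k stacks unless opted in" idea.
# NOT kernel code: every name below is hypothetical and for illustration only.

FOUR_K = 4096  # stack size, in bytes, of a 4k-stack kernel


def may_load_xfs(kernel_stack_bytes: int,
                 i_want_to_live_on_the_edge: bool = False) -> bool:
    """Return True if the modelled xfs module would be allowed to load."""
    if kernel_stack_bytes > FOUR_K:
        # Larger stacks (for example 8k): no restriction in this model.
        return True
    # 4k stacks or smaller: load only if the user explicitly opted in.
    return i_want_to_live_on_the_edge
```

In this model, `may_load_xfs(4096)` is refused, while `may_load_xfs(4096, i_want_to_live_on_the_edge=True)` and `may_load_xfs(8192)` are allowed.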
Comment 3 Eric Sandeen 2007-03-08 10:59:36 EST
We need more info on how this actually failed, or we can't even begin to address it. Backtraces please?
Comment 4 paul.knowles 2007-06-12 10:34:38 EDT
Created attachment 156797: Panic backtrace of XFS stack overflow

Kernel panic trace from a Hyperthreaded P4 running 2.6.20-1.2316.fc5smp. The system provides NFS access to a 1.3T XFS volume on software raid 5. The machine is also used to analyze the data written to the discs. Under heavy load (two memory-intensive processes) plus a constant (~1Mb/s) write rate to the discs, the machine falls over after an unpredictable time, anywhere from a few hours to a few days.
Comment 5 Eric Sandeen 2007-06-12 10:51:58 EDT
Thanks, I'll look over that stack backtrace. Interestingly, there seem to be no xfs functions in the stack that caused the initial warning. FWIW in the past I think I've seen the stack overflow warning itself cause the *actual* stack overflow...

Honestly, xfs + any complex IO stack will have a very hard time fitting into 4k. SGI has been chipping away at the problem, but the low-hanging fruit has all been found... If this is critical to your workflow, I would seriously suggest using an x86_64 box for the purpose, which has 8k stacks (yes, they must share that 8k with irqs, but in my experience it's been a much more robust combination).

I'll look over those stack traces & see if anything interesting pops out, though.

-Eric
Comment 6 Eric Sandeen 2007-06-12 10:55:31 EDT
ah, I think the backtrace has the stack warning intermingled with the subsequent oops, it will take a bit to untangle it. (but yeah, xfs is in there) :)
Comment 7 Eric Sandeen 2007-08-09 15:42:40 EDT
*** Bug 240077 has been marked as a duplicate of this bug. ***
Comment 8 Eric Sandeen 2007-09-11 15:19:33 EDT
The other fun thing here is that the stack warning *itself* tends to push the stack over the edge, if you're close enough to make it go off. Neat, eh? I've sent something upstream to see about addressing this, but it's not been accepted yet. -Eric
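The effect described in Comment 8 can be sketched as simple stack-budget arithmetic. Everything numeric here is an assumption made up for illustration: the per-layer frame sizes, the warning threshold, and the cost of the warning path are not measurements from this bug or values taken from the kernel source. The point is only that once free stack drops below the warning threshold, the stack needed to print the warning can exceed what is left.

```python
# Stack-budget model: can the overflow warning itself overflow the stack?
# All sizes are hypothetical illustration values, not measured data.

STACK_BYTES = 4096      # 4k-stack kernel, as in this report
WARN_THRESHOLD = 512    # assumed: warn when fewer bytes than this remain
WARN_PATH_COST = 600    # assumed: stack consumed by emitting the warning


def simulate(frames, stack_bytes=STACK_BYTES,
             warn_threshold=WARN_THRESHOLD, warn_path_cost=WARN_PATH_COST):
    """Classify a call chain given (layer_name, stack_bytes_used) pairs."""
    used = sum(size for _, size in frames)
    free = stack_bytes - used
    if free < 0:
        return "overflow"                    # the call chain alone overflows
    if free < warn_threshold:
        if warn_path_cost > free:
            return "overflow caused by warning"
        return "warning"                     # warning fits in remaining stack
    return "ok"


# Hypothetical stacked-IO call chain: nfs over xfs over md raid5.
stacked_io = [("nfsd", 900), ("xfs", 1400), ("md/raid5", 900),
              ("block+driver", 500)]

print(simulate(stacked_io))                    # 396 bytes free, warning needs 600
print(simulate(stacked_io, stack_bytes=8192))  # same chain on an 8k stack
```

With these made-up numbers the 4k case ends in "overflow caused by warning", while the same call chain on an 8k stack stays "ok", which matches the qualitative picture in the comments above.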
Comment 9 Eric Sandeen 2007-09-17 15:41:58 EDT
I'm going to close this as "UPSTREAM" - I'm never going to fully resolve this bug, though I do occasionally try to whack down the worst offenders if I can. Realistically, the problem is known, and it's going to take upstream work (from sgi, on xfs, and maybe from core kernel work to ease stack for stacked IO). For now, as of fedora devel today, xfs is reasonably functional on 4k stacks, esp. if you don't put it over a volume manager....