Bug 204364 - GFS2 log flushing code looping
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Steve Whitehouse
QA Contact: Dean Jansa
Depends On:
Blocks: 204760
Reported: 2006-08-28 14:20 EDT by Russell Cattelan
Modified: 2007-11-30 17:07 EST

Fixed In Version: beta2
Doc Type: Bug Fix
Last Closed: 2006-12-22 19:06:06 EST


Attachments
Another example (4.35 KB, text/plain)
2006-09-12 05:50 EDT, Steve Whitehouse
Bug fix to stop lockups in log flush code (861 bytes, patch)
2006-11-24 06:28 EST, Steve Whitehouse

Description Russell Cattelan 2006-08-28 14:20:33 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.0.6) Gecko/20060808 Fedora/1.5.0.6-2.fc5 Firefox/1.5.0.6 pango-text

Description of problem:
bear-03.lab.msp.redhat.com login: [ 2410.524000] GFS2 (built Aug 28 2006 11:43:40) installed
[ 2474.744000] BUG: soft lockup detected on CPU#1!
[ 2474.748000]  [<c0203d97>] show_trace+0xd/0x10
[ 2474.752000]  [<c02042f5>] dump_stack+0x19/0x1b
[ 2474.760000]  [<c02430df>] softlockup_tick+0xa5/0xb9
[ 2474.764000]  [<c0228558>] run_local_timers+0x12/0x14
[ 2474.768000]  [<c02288bf>] update_process_times+0x3c/0x61
[ 2474.772000]  [<c0214132>] smp_apic_timer_interrupt+0x5f/0x69
[ 2474.780000]  [<c0203807>] apic_timer_interrupt+0x1f/0x24
[ 2474.784000]  [<f8ca12b6>] gfs2_log_flush+0x109/0x2fa [gfs2]
[ 2474.792000]  [<f8c9e50c>] inode_go_sync+0x38/0x98 [gfs2]
[ 2474.796000]  [<f8c9e0d9>] gfs2_glock_drop_th+0xc9/0x14d [gfs2]
[ 2474.800000]  [<f8c9e41a>] inode_go_drop_th+0x12/0x15 [gfs2]
[ 2474.808000]  [<f8c9ca32>] run_queue+0x10d/0x315 [gfs2]
[ 2474.812000]  [<f8c9ce26>] gfs2_glock_dq+0xa3/0xb1 [gfs2]
[ 2474.816000]  [<f8c9ce61>] gfs2_glock_dq_uninit+0xb/0x15 [gfs2]
[ 2474.824000]  [<f8cb0446>] gfs2_statfs_sync+0x201/0x20b [gfs2]
[ 2474.828000]  [<f8c9644a>] gfs2_quotad+0x4c/0x132 [gfs2]
[ 2474.836000]  [<c0230e3b>] kthread+0xc3/0xf0
[ 2474.840000]  [<c0201005>] kernel_thread_helper+0x5/0xb
[ 2484.844000] BUG: soft lockup detected on CPU#1!


It appears that gfs2_quotad, gfs2_logd, and pdflush are all competing
to flush the log.
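
When comparing soft lockup reports like the one above, it can help to reduce each trace to its chain of symbols. The following is a minimal sketch, not part of the original report: the helper names `parse_frames` and `module_frames` are hypothetical, and the regular expression assumes the `[<address>] symbol+0xoff/0xlen [module]` frame layout seen in this trace.

```python
import re

# Matches stack-frame lines in the format shown in the trace above, e.g.
#   [ 2474.784000]  [<f8ca12b6>] gfs2_log_flush+0x109/0x2fa [gfs2]
# The trailing "[module]" part is optional (absent for core kernel symbols).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>[\w-]+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module) pairs in trace order; module is None for core kernel symbols."""
    return [(m.group("symbol"), m.group("module")) for m in FRAME_RE.finditer(log_text)]


def module_frames(log_text, module):
    """Return only the symbols that belong to the given module, in trace order."""
    return [sym for sym, mod in parse_frames(log_text) if mod == module]


if __name__ == "__main__":
    sample = (
        "[ 2474.780000]  [<c0203807>] apic_timer_interrupt+0x1f/0x24\n"
        "[ 2474.784000]  [<f8ca12b6>] gfs2_log_flush+0x109/0x2fa [gfs2]\n"
        "[ 2474.792000]  [<f8c9e50c>] inode_go_sync+0x38/0x98 [gfs2]\n"
    )
    print(module_frames(sample, "gfs2"))
```

Filtering on the `gfs2` module drops the timer-interrupt frames at the top of the trace, so the first remaining symbol (`gfs2_log_flush` here) is the function that was executing when the watchdog fired.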

How reproducible:
Always


Steps to Reproduce:
1. Write a file to gfs2.

Comment 1 Steve Whitehouse 2006-08-31 07:29:36 EDT
Please try again with the latest fix now in my git tree:
623d93555c8884768db65ffc11509c93e50dd4db ([GFS2] Fix releasepage bug (fixes
direct i/o writes)), as I suspect that this will have fixed this bug.
Comment 2 Steve Whitehouse 2006-09-12 05:50:53 EDT
Created attachment 136063
Another example

Here is another example of such a lockup, this time reproduced with postmark:

set transactions 100000
set number 100000

The backtrace is different, but it's probably the same issue. I suspect an
infinite loop, since the lock debugging code should have caught any spinlock
recursion/deadlock.
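
For anyone wanting to rerun the postmark reproduction, the two settings quoted above can be wrapped in a complete command script. This is only a sketch under assumptions: the mount point `/mnt/gfs2`, the helper names, and the `set location`, `run`, and `quit` lines are not from this report; only the `set transactions` and `set number` values are. It also assumes a `postmark` binary on the PATH that reads commands from standard input.

```python
import subprocess


def build_postmark_config(target_dir, transactions=100000, number=100000):
    """Return PostMark commands as text.

    target_dir is a hypothetical GFS2 mount point; the two numeric
    defaults are the values quoted in this comment.
    """
    lines = [
        f"set location {target_dir}",
        f"set transactions {transactions}",
        f"set number {number}",
        "run",
        "quit",
    ]
    return "\n".join(lines) + "\n"


def run_postmark(target_dir, binary="postmark"):
    """Feed the generated commands to PostMark on stdin and return its exit code."""
    config = build_postmark_config(target_dir)
    result = subprocess.run([binary], input=config, text=True, check=False)
    return result.returncode


if __name__ == "__main__":
    # Print the script only; call run_postmark() on a test machine.
    print(build_postmark_config("/mnt/gfs2"), end="")
```

Running this against a GFS2 mount on an affected kernel may hang the machine, so it is best kept to a disposable test node with a serial console attached.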
Comment 3 Steve Whitehouse 2006-09-19 06:18:59 EDT
I've had another bash at fixing this. The git commit:
74669416f747363c14dba2ee6137540ae5a6834f has a patch which survives my postmark
test so far.
Comment 4 Steve Whitehouse 2006-09-22 05:54:06 EDT
This bug is not yet fixed. It seems to be timing related, as small changes in
unrelated code mean that sometimes I see this a lot, and sometimes hardly ever,
but it's certainly still happening.
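
Because the failure rate varies from build to build, counting the watchdog reports in a captured console log gives a rough way to compare runs. A minimal sketch, assuming the log uses the same "BUG: soft lockup detected on CPU#N!" message as the trace in the description; the function name `count_soft_lockups` is hypothetical.

```python
import re
from collections import Counter

# Message format taken from the console output in the bug description.
LOCKUP_RE = re.compile(r"BUG: soft lockup detected on CPU#(\d+)!")


def count_soft_lockups(log_text):
    """Return a Counter mapping CPU number to the number of soft lockup reports."""
    return Counter(int(cpu) for cpu in LOCKUP_RE.findall(log_text))


if __name__ == "__main__":
    sample = (
        "[ 2474.744000] BUG: soft lockup detected on CPU#1!\n"
        "[ 2484.844000] BUG: soft lockup detected on CPU#1!\n"
    )
    counts = count_soft_lockups(sample)
    print(dict(counts), sum(counts.values()))
```

A count of zero over a single run does not show the bug is gone, given how timing dependent it is; the numbers are only useful when compared across several runs of the same workload.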
Comment 7 Steve Whitehouse 2006-11-24 06:28:31 EST
Created attachment 142057
Bug fix to stop lockups in log flush code

This is the patch which has gone upstream
Comment 9 Don Zickus 2006-12-13 20:06:02 EST
in 2.6.18-1.2876.el5
Comment 10 RHEL Product and Program Management 2006-12-22 19:06:06 EST
A package has been built which should help the problem described in 
this bug report. This report is therefore being closed with a resolution 
of CURRENTRELEASE. You may reopen this bug report if the solution does 
not work for you.
