Bug 204364 - GFS2 log flushing code looping
Summary: GFS2 log flushing code looping
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Steve Whitehouse
QA Contact: Dean Jansa
URL:
Whiteboard:
Depends On:
Blocks: 204760
Reported: 2006-08-28 18:20 UTC by Russell Cattelan
Modified: 2007-11-30 22:07 UTC

Fixed In Version: beta2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-12-23 00:06:06 UTC
Target Upstream Version:
Embargoed:


Attachments
Another example (4.35 KB, text/plain)
2006-09-12 09:50 UTC, Steve Whitehouse
Bug fix to stop lockups in log flush code (861 bytes, patch)
2006-11-24 11:28 UTC, Steve Whitehouse

Description Russell Cattelan 2006-08-28 18:20:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.0.6) Gecko/20060808 Fedora/1.5.0.6-2.fc5 Firefox/1.5.0.6 pango-text

Description of problem:
bear-03.lab.msp.redhat.com login: [ 2410.524000] GFS2 (built Aug 28 2006 11:43:40) installed
[ 2474.744000] BUG: soft lockup detected on CPU#1!
[ 2474.748000]  [<c0203d97>] show_trace+0xd/0x10
[ 2474.752000]  [<c02042f5>] dump_stack+0x19/0x1b
[ 2474.760000]  [<c02430df>] softlockup_tick+0xa5/0xb9
[ 2474.764000]  [<c0228558>] run_local_timers+0x12/0x14
[ 2474.768000]  [<c02288bf>] update_process_times+0x3c/0x61
[ 2474.772000]  [<c0214132>] smp_apic_timer_interrupt+0x5f/0x69
[ 2474.780000]  [<c0203807>] apic_timer_interrupt+0x1f/0x24
[ 2474.784000]  [<f8ca12b6>] gfs2_log_flush+0x109/0x2fa [gfs2]
[ 2474.792000]  [<f8c9e50c>] inode_go_sync+0x38/0x98 [gfs2]
[ 2474.796000]  [<f8c9e0d9>] gfs2_glock_drop_th+0xc9/0x14d [gfs2]
[ 2474.800000]  [<f8c9e41a>] inode_go_drop_th+0x12/0x15 [gfs2]
[ 2474.808000]  [<f8c9ca32>] run_queue+0x10d/0x315 [gfs2]
[ 2474.812000]  [<f8c9ce26>] gfs2_glock_dq+0xa3/0xb1 [gfs2]
[ 2474.816000]  [<f8c9ce61>] gfs2_glock_dq_uninit+0xb/0x15 [gfs2]
[ 2474.824000]  [<f8cb0446>] gfs2_statfs_sync+0x201/0x20b [gfs2]
[ 2474.828000]  [<f8c9644a>] gfs2_quotad+0x4c/0x132 [gfs2]
[ 2474.836000]  [<c0230e3b>] kthread+0xc3/0xf0
[ 2474.840000]  [<c0201005>] kernel_thread_helper+0x5/0xb
[ 2484.844000] BUG: soft lockup detected on CPU#1!


It appears that gfs2_quotad, gfs2_logd, and pdflush are all competing
to flush the log.
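
For reading traces like the one above, here is a minimal sketch that pulls out the frames belonging to the gfs2 module. The regular expression and the function name `module_frames` are illustrative assumptions, not part of any kernel tooling; the sample lines are copied from the trace in this report.

```python
import re

# Matches one frame of a stack trace in the format shown in this report, e.g.
#   [ 2474.784000]  [<f8ca12b6>] gfs2_log_flush+0x109/0x2fa [gfs2]
# The trailing "[module]" tag is absent for built-in kernel symbols.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def module_frames(log_text, module="gfs2"):
    """Return the symbols of all frames tagged with `module`, innermost first."""
    symbols = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group("module") == module:
            symbols.append(match.group("symbol"))
    return symbols


# A few lines copied from the trace in this report.
SAMPLE = """\
[ 2474.780000]  [<c0203807>] apic_timer_interrupt+0x1f/0x24
[ 2474.784000]  [<f8ca12b6>] gfs2_log_flush+0x109/0x2fa [gfs2]
[ 2474.792000]  [<f8c9e50c>] inode_go_sync+0x38/0x98 [gfs2]
[ 2474.828000]  [<f8c9644a>] gfs2_quotad+0x4c/0x132 [gfs2]
[ 2474.836000]  [<c0230e3b>] kthread+0xc3/0xf0
"""

if __name__ == "__main__":
    frames = module_frames(SAMPLE)
    print("interrupted in:", frames[0])
    print("thread entry:  ", frames[-1])
```

On the trace in this report, the first gfs2 frame is gfs2_log_flush and the last is gfs2_quotad: the timer interrupt landed inside the log flush while the quota daemon thread was running.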

Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Write a file to a GFS2 filesystem.

Actual Results:


Expected Results:


Additional info:

Comment 1 Steve Whitehouse 2006-08-31 11:29:36 UTC
Please try again with the latest fix now in my git tree:
623d93555c8884768db65ffc11509c93e50dd4db ([GFS2] Fix releasepage bug (fixes
direct i/o writes)), as I suspect that this will have fixed this bug.


Comment 2 Steve Whitehouse 2006-09-12 09:50:53 UTC
Created attachment 136063
Another example

Here is another example of such a lock up. This time reproduced with postmark:

set transactions 100000
set number 100000

The back trace is different, but it's probably the same issue. I suspect an
infinite loop since the lock debugging code should have caught any spinlock
recursion/deadlock.
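
The two postmark settings quoted above could be wrapped into a complete command sequence along these lines. This is a sketch only: the mount point `/mnt/gfs2` and the helper name `postmark_commands` are assumptions, and the `set location`, `run` and `quit` commands are added here to make the sequence complete; they are not quoted in this comment.

```python
def postmark_commands(location, transactions=100000, number=100000):
    """Build the text of a PostMark command sequence.

    `transactions` and `number` default to the values quoted in this
    comment; `location` (the directory to run in) is an assumption.
    """
    lines = [
        f"set location {location}",
        f"set number {number}",
        f"set transactions {transactions}",
        "run",
        "quit",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # /mnt/gfs2 is a hypothetical mount point for the GFS2 filesystem.
    print(postmark_commands("/mnt/gfs2"), end="")
```

The resulting text is what one would type at the postmark prompt, with the location pointing at the GFS2 mount under test.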

Comment 3 Steve Whitehouse 2006-09-19 10:18:59 UTC
I've had another bash at fixing this. The git commit:
74669416f747363c14dba2ee6137540ae5a6834f has a patch which survives my postmark
test so far.

Comment 4 Steve Whitehouse 2006-09-22 09:54:06 UTC
This bug is not yet fixed. It seems to be timing related, as small changes in
unrelated code mean that sometimes I see this a lot and sometimes hardly ever,
but it's certainly still happening.

Comment 7 Steve Whitehouse 2006-11-24 11:28:31 UTC
Created attachment 142057
Bug fix to stop lockups in log flush code

This is the patch which has gone upstream.

Comment 9 Don Zickus 2006-12-14 01:06:02 UTC
in 2.6.18-1.2876.el5

Comment 10 RHEL Program Management 2006-12-23 00:06:06 UTC
A package has been built which should help the problem described in 
this bug report. This report is therefore being closed with a resolution 
of CURRENTRELEASE. You may reopen this bug report if the solution does 
not work for you.


