From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.0.6) Gecko/20060808 Fedora/1.5.0.6-2.fc5 Firefox/1.5.0.6 pango-text

Description of problem:

bear-03.lab.msp.redhat.com login: [ 2410.524000] GFS2 (built Aug 28 2006 11:43:40) installed
[ 2474.744000] BUG: soft lockup detected on CPU#1!
[ 2474.748000]  [<c0203d97>] show_trace+0xd/0x10
[ 2474.752000]  [<c02042f5>] dump_stack+0x19/0x1b
[ 2474.760000]  [<c02430df>] softlockup_tick+0xa5/0xb9
[ 2474.764000]  [<c0228558>] run_local_timers+0x12/0x14
[ 2474.768000]  [<c02288bf>] update_process_times+0x3c/0x61
[ 2474.772000]  [<c0214132>] smp_apic_timer_interrupt+0x5f/0x69
[ 2474.780000]  [<c0203807>] apic_timer_interrupt+0x1f/0x24
[ 2474.784000]  [<f8ca12b6>] gfs2_log_flush+0x109/0x2fa [gfs2]
[ 2474.792000]  [<f8c9e50c>] inode_go_sync+0x38/0x98 [gfs2]
[ 2474.796000]  [<f8c9e0d9>] gfs2_glock_drop_th+0xc9/0x14d [gfs2]
[ 2474.800000]  [<f8c9e41a>] inode_go_drop_th+0x12/0x15 [gfs2]
[ 2474.808000]  [<f8c9ca32>] run_queue+0x10d/0x315 [gfs2]
[ 2474.812000]  [<f8c9ce26>] gfs2_glock_dq+0xa3/0xb1 [gfs2]
[ 2474.816000]  [<f8c9ce61>] gfs2_glock_dq_uninit+0xb/0x15 [gfs2]
[ 2474.824000]  [<f8cb0446>] gfs2_statfs_sync+0x201/0x20b [gfs2]
[ 2474.828000]  [<f8c9644a>] gfs2_quotad+0x4c/0x132 [gfs2]
[ 2474.836000]  [<c0230e3b>] kthread+0xc3/0xf0
[ 2474.840000]  [<c0201005>] kernel_thread_helper+0x5/0xb
[ 2484.844000] BUG: soft lockup detected on CPU#1!

It appears that gfs2_quotad, gfs2_logd and pdflush are all competing to flush the log.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Write a file to gfs2.
2.
3.

Actual Results:

Expected Results:

Additional info:
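To make the suspected contention concrete, here is a minimal Python sketch, assuming a single lock around a shared journal. `SharedLog`, `flusher` and the entry names are hypothetical; this is not the GFS2 log code and not the upstream patch. Three threads, labelled after the daemons named in the report, append entries and flush. Because each flush drains everything pending while holding the lock, a caller that arrives after another thread has already done the work returns immediately instead of retrying.

```python
import threading


class SharedLog:
    """Hypothetical in-memory journal; NOT the GFS2 log implementation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []  # entries appended but not yet flushed
        self.written = []   # stand-in for blocks written to disk, in flush order
        self.flushes = 0    # number of flush calls that actually wrote something

    def append(self, entry):
        with self._lock:
            self._pending.append(entry)

    def flush(self):
        # Only one flusher holds the lock at a time. Whoever gets it drains
        # everything pending; later callers find an empty list and return at
        # once instead of repeating work another thread already did.
        with self._lock:
            if not self._pending:
                return 0
            batch, self._pending = self._pending, []
            self.written.extend(batch)
            self.flushes += 1
            return len(batch)


def flusher(log, name, n):
    # Each thread stands in for one of the daemons named in the report.
    for i in range(n):
        log.append(f"{name}-{i}")
        log.flush()


log = SharedLog()
threads = [
    threading.Thread(target=flusher, args=(log, name, 1000))
    for name in ("gfs2_quotad", "gfs2_logd", "pdflush")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
log.flush()  # drain anything left behind (normally nothing)

print(len(log.written), len(set(log.written)))  # 3000 3000
```

The number of flushes that actually write something varies from run to run, but every entry is written exactly once.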
Please try again with the latest fix, now in my git tree: 623d93555c8884768db65ffc11509c93e50dd4db ([GFS2] Fix releasepage bug (fixes direct i/o writes)), as I suspect that this will have fixed this bug.
Created attachment 136063: Another example

Here is another example of such a lockup, this time reproduced with postmark using:

set transactions 100000
set number 100000

The backtrace is different, but it's probably the same issue. I suspect an infinite loop, since the lock debugging code should have caught any spinlock recursion or deadlock.
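The reasoning about an infinite loop can be illustrated with a toy model of the soft-lockup check. This assumes the usual design in which a per-CPU watchdog task refreshes a timestamp whenever it gets to run, and the timer interrupt (softlockup_tick in the first trace) reports when that timestamp is more than about 10 seconds old. Code that never sleeps starves the watchdog and trips the check even if it never takes a lock twice, so a tight loop can show up here without any report from lock debugging. `SoftLockupDetector` and `simulate` are hypothetical names, time is simulated in whole milliseconds, and the model reports only once per stale timestamp.

```python
class SoftLockupDetector:
    """Toy model of a per-CPU soft-lockup check (simulated time, in ms).

    Assumption: a watchdog task refreshes `touched` whenever the CPU
    schedules it, and a timer tick compares `now` against that stamp.
    """

    def __init__(self, threshold_ms=10_000):
        self.threshold_ms = threshold_ms
        self.touched = 0
        self.reported_for = None  # stamp already reported, to avoid repeats

    def watchdog_runs(self, now):
        # Only happens if the CPU actually gets to schedule the watchdog task.
        self.touched = now

    def timer_tick(self, now):
        stale = now - self.touched > self.threshold_ms
        if stale and self.reported_for != self.touched:
            self.reported_for = self.touched
            return True  # where the kernel would print "BUG: soft lockup detected"
        return False


def simulate(yields, duration_ms=30_000, tick_ms=100):
    """Run one simulated CPU; `yields` says whether the busy code ever sleeps."""
    det = SoftLockupDetector()
    reports = []
    for now in range(tick_ms, duration_ms + 1, tick_ms):
        if yields:
            det.watchdog_runs(now)  # the CPU let the watchdog run this tick
        if det.timer_tick(now):
            reports.append(now)
    return reports


print(simulate(yields=True))   # []
print(simulate(yields=False))  # [10100]
```

A loop that yields never lets the stamp go stale, while one that spins is reported on the first tick past the threshold.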
I've had another bash at fixing this. Git commit 74669416f747363c14dba2ee6137540ae5a6834f contains a patch that survives my postmark test so far.
This bug is not yet fixed. It seems to be timing-related: small changes in unrelated code mean that sometimes I see it a lot and sometimes hardly ever, but it's certainly still happening.
Created attachment 142057: Bug fix to stop lockups in log flush code

This is the patch that has gone upstream.
Included in 2.6.18-1.2876.el5.
A package has been built which should help the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.