Bug 231910

Summary:

GFS2 withdraws if you are trying to remove a file while doing a du

Product:

Red Hat Enterprise Linux 5

Reporter:

Josef Bacik <jbacik>

Component:

kernel

Assignee:

Don Zickus <dzickus>

Status:

CLOSED ERRATA

QA Contact:

Martin Jenner <mjenner>

Severity:

medium

Docs Contact:

Priority:

high

Version:

5.1

CC:

adas, amirkin, bmarzins, kanderso, lwang, rkenna, swhiteho

Target Milestone:

---

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

RHBA-2007-0959

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2007-11-07 19:43:46 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
A test patch to try and narrow down the cause of this bug	none
An updated test patch	none
Force gfs2_ail_empty_gl to flush log	none
flush after filemap_fdatawrite	none

Description Josef Bacik 2007-03-12 22:29:16 UTC

getting this down so I remember to look at it tomorrow.  Basically I copied the
kernel source onto my gfs2 filesystem, did an rm -rf, and then on another node
did a du -h, and I got this on the node that was doing the rm -rf

GFS2: fsid=rhel5cluster:gfs2lv1.1: fatal: assertion
"gfs2_glock_is_held_excl(gl)" failed
GFS2: fsid=rhel5cluster:gfs2lv1.1:   function = glock_lo_after_commit, file =
fs/gfs2/lops.c, line = 61
GFS2: fsid=rhel5cluster:gfs2lv1.1: about to withdraw this file system
GFS2: fsid=rhel5cluster:gfs2lv1.1: telling LM to withdraw
inode_doinit_with_dentry:  getxattr returned 5 for dev=dm-2 ino=1436481

btw this is with steve's GLF_DEMOTE patch, not that I think that its the
problem, just an FYI.

Comment 2 Steve Whitehouse 2007-03-13 09:01:25 UTC

This looks like a problem related to a glock being demoted incorrectly. If the
glock is in a transaction, the transaction must be committed before the glock is
demoted. The glock's various operations in glops.c should take account of that
and it might well be a wrong assumption of the GLF_DEMOTE patch. I'll take a
look at that today.

Comment 3 Steve Whitehouse 2007-03-13 10:44:38 UTC

I've looked through the patch and I can't spot anything that might have the
effect you are seeing. I've also traced through the code relating to xattrs and
not found a lot thats enlightening there either.

The check thats triggering on line 61 of lops.c is just checking to see that, at
the end of the transaction, the glock in question is still locked in the
exclusive  state. 

The call to getxattr is, as a "read-only" call, a bit of a red-herring I think
as it just happens to be the next place the lock is used and the EIO returned is
due to the I/O failing as a result of the earlier triggered assertion.

So there are two questions which spring to mind. Firstly, are you using selinux
on this filesystem? and secondly can you get the assertion to print out which
type of glock has actually caused this assertion to trigger? as that will be a
big clue in solving this.

Comment 4 Andrey Mirkin 2007-03-13 15:10:08 UTC

Our team has encountered the same assertion during running bonnie++ test on GFS2
partition. It is 100% reproducable. I'll try to add some debug and find out what
type of glock caused this assertion.

Comment 5 Josef Bacik 2007-03-13 21:15:15 UTC

rh5cluster1.gsslab.rdu.redhat.com login: gl type is 2
GFS2: fsid=rhel5cluster:gfs2lfs2.0: fatal: 
assertion "gfs2_glock_is_held_excl(gl)" failed
GFS2: fsid=rhel5cluster:gfs2lfs2.0:   function = glock_lo_after_commit, file = 
fs/gfs2/lops.c, line = 64
GFS2: fsid=rhel5cluster:gfs2lfs2.0: about to withdraw this file system
GFS2: fsid=rhel5cluster:gfs2lfs2.0: telling LM to withdraw

so the gl type is LM_TYPE_INODE.

Comment 6 Andrey Mirkin 2007-03-14 16:49:03 UTC

glock type 2 state 0
GFS2: fsid=alpha:mygfs.0: fatal: assertion "gfs2_glock_is_held_excl(gl)" failed
GFS2: fsid=alpha:mygfs.0:   function = glock_lo_after_commit, file = 
fs/gfs2/lops.c, line = 63
GFS2: fsid=alpha:mygfs.0: about to withdraw this file system
GFS2: fsid=alpha:mygfs.0: telling LM to withdraw

In our case glock type is LM_TYPE_INODE too, and its state is LM_ST_UNLOCKED.

Comment 7 Steve Whitehouse 2007-03-16 16:08:05 UTC

Created attachment 150247 [details]
A test patch to try and narrow down the cause of this bug

Here is a test patch whose aim is to gather information to help solve this
particular bug. In order for it to be useful, I'll need the full output once
the bug has triggered, and also, since one of the items is a pid, I'd like to
know which process it relates to. Since the main culprits here are the gfs2/dlm
daemons, a "ps" after mounting gfs2, but before running the test which triggers
this should be almost as good as data gathered afterwards (in case thats not
possible).

I have been through all the glock and lops code as it applied to the inode
glocks looking for the cause of this bug and its rather elusive it seems. We
know (from other asserts, not triggered) that the glock was certainly in the
exclusive mode when it was added to the transaction. The aim of the patch is to
catch the act of lock demotion of the lock thats still in the transaction and
thus find out which process is responsible.

The patch should apply cleanly to the current GFS2 -nmw git tree which has
recently been rebased against 2.6.21-rc4.

I have had reports that in some cases, there are glocks which have a zero ref
count (which should be impossible) triggering an assert elsewhere in the code
and I have a strong suspicion that the two are linked, but this seems the
easier symptom to use to track down the problem.

Comment 8 Andrey Mirkin 2007-03-16 17:15:15 UTC

Do you plan to fix this bug in RHEL5 (2.6.18-8.el5) kernel?

Comment 9 Steve Whitehouse 2007-03-16 17:20:21 UTC

It will be fixed for release 5.1.0 but it needs to (and can) make it into the
upstream kernel first, or at least as far as my -nmw git tree anyway. So the
answer is yes, it will be fixed, but I'm not sure whether it will be in -8 or a
later one.

Comment 10 Abhijith Das 2007-03-24 03:47:20 UTC

Steve,
I hit this assert with your debug patch in place. This is with the latest code
from the nmw tree.
Below is the output from the debug patch, followed by 'ps' under kdb and finally
the backtrace('btp') of the active process, lock_dlm1. Let me know if you need
more info.

Glock 0xffff81007b7a28b8 (2, 5248259)
  gl_flags = 
  gl_ref = 4
  gl_state = 0
  gl_owner = xdoio
  gl_ip = 18446744071696625455
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  le = yes
  reclaim = no
  aspace = 0xffff810069c02040 nrpages = 0
  ail = 0
  Inode:
    num = 65/5248259
    type = 8
    i_flags = 0 

------------[ cut here ]------------
kernel BUG at fs/gfs2/glock.c:123!
invalid opcode: 0000 [1] SMP 

Entering kdb (current=0xffff81007d362280, pid 6738) on processor 1 Oops: <NULL
>
due to oops @ 0xffffffff880515b4
     r15 = 0x0000000000000001      r14 = 0xffffffff88067880 
     r13 = 0x0000000000000000      r12 = 0xffffffff88069399 
     rbp = 0x0000000000000000      rbx = 0xffff81007b7a28b8 
     r11 = 0x0000000000000000      r10 = 0x000000000000001f 
      r9 = 0x0000000000000020       r8 = 0x000000000000000d 
     rax = 0x0000000000000000      rcx = 0x000000000000635c 
     rdx = 0x00000000ffffffff      rsi = 0xffffffff807c3780 
     rdi = 0xffffffff806b9e7c orig_rax = 0xffffffffffffffff 
     rip = 0xffffffff880515b4       cs = 0x0000000000000010 
  eflags = 0x0000000000010246      rsp = 0xffff810071309e00 
      ss = 0x0000000000000000 &regs = 0xffff810071309d68


[1]kdb> ps
49 sleeping system daemon (state M) processes suppressed
Task Addr               Pid   Parent [*] cpu State Thread             Command
0xffff8100717b6760     7303     7302  1    0   R  0xffff8100717b6a20  xdoio
0xffff81007d362280     6738       11  1    1   R  0xffff81007d362540 *lock_dlm1

0xffff810037d814e0        1        0  0    1   S  0xffff810037d817a0  init
0xffff81007f2089a0      950       11  0    1   R  0xffff81007f208c60  kjournald
0xffff81007f3594e0      999        1  0    1   S  0xffff81007f3597a0  udevd
0xffff81007f2130c0     5874        1  0    0   S  0xffff81007f213380  dhclient
0xffff81007f362080     5975        1  0    0   D  0xffff81007f362340  syslogd
0xffff81007f209080     5978        1  0    0   S  0xffff81007f209340  klogd
0xffff81007c7b03c0     5989        1  0    0   S  0xffff81007c7b0680  irqbalance
0xffff81007c001560     6009        1  0    0   S  0xffff81007c001820  portmap
0xffff81007f39ef00     6029        1  0    1   S  0xffff81007f39f1c0  rpc.statd
0xffff81007d26b760     6103        1  0    0   S  0xffff81007d26ba20  ccsd
0xffff810002ffcf00     6104        1  0    0   S  0xffff810002ffd1c0  ccsd
0xffff81007f362760     6109        1  0    0   S  0xffff81007f362a20  aisexec
0xffff81007f358040     6110        1  0    1   S  0xffff81007f358300  aisexec
0xffff81007f394ec0     6111        1  0    1   S  0xffff81007f395180  aisexec
0xffff81007cf10860     6118        1  0    0   S  0xffff81007cf10b20  aisexec
0xffff81007f1c2f80     6119        1  0    0   S  0xffff81007f1c3240  aisexec
0xffff81007c2655a0     6140        1  0    1   S  0xffff81007c265860  aisexec
[1]more>  
Only 'q' or 'Q' are processed at more prompt, input ignored
0xffff81007f1c21c0     6141        1  0    0   S  0xffff81007f1c2480  aisexec
0xffff81007c024400     6142        1  0    0   S  0xffff81007c0246c0  aisexec
0xffff81007f244b20     6143        1  0    0   S  0xffff81007f244de0  aisexec
0xffff81007f18e820     6144        1  0    0   S  0xffff81007f18eae0  aisexec
0xffff81007f279520     6145        1  0    0   S  0xffff81007f2797e0  aisexec
0xffff81007f36d560     6178        1  0    0   S  0xffff81007f36d820  aisexec
0xffff81007f2137a0     6179        1  0    0   S  0xffff81007f213a60  aisexec
0xffff81007f244440     6735        1  0    0   S  0xffff81007f244700  aisexec
0xffff81007f245200     6736        1  0    0   S  0xffff81007f2454c0  aisexec
0xffff81007cf11620     6742        1  0    0   S  0xffff81007cf118e0  aisexec
0xffff81007d26a9a0     6744        1  0    0   S  0xffff81007d26ac60  aisexec
0xffff81007c2ec140     6117        1  0    0   S  0xffff81007c2ec400  groupd
0xffff81007cf10f40     6125        1  0    0   S  0xffff81007cf11200  fenced
0xffff81007f24f4e0     6131        1  0    1   S  0xffff81007f24f7a0  dlm_controld
0xffff81007d363040     6137        1  0    1   R  0xffff81007d363300  gfs_controld
0xffff81007f36ce80     6157        1  0    0   S  0xffff81007f36d140  dbus-daemon
0xffff81007f36c7a0     6168        1  0    0   S  0xffff81007f36ca60  clvmd
0xffff810037c62240     6185        1  0    0   S  0xffff810037c62500  clvmd
0xffff81007f358720     6186        1  0    1   S  0xffff81007f3589e0  clvmd
0xffff81007f362e40     6278        1  0    1   S  0xffff81007f363100  pcscd
0xffff81007f1ff720     6310        1  0    0   S  0xffff81007f1ff9e0  pcscd
0xffff81007f1c3660     6301        1  0    0   S  0xffff81007f1c3920  automount
[1]more> 
Only 'q' or 'Q' are processed at more prompt, input ignored
0xffff81007f36c0c0     6302        1  0    0   S  0xffff81007f36c380  automount
0xffff81007f0b0860     6303        1  0    1   S  0xffff81007f0b0b20  automount
0xffff81007f18f5e0     6306        1  0    0   S  0xffff81007f18f8a0  automount
0xffff81007f1c28a0     6309        1  0    0   S  0xffff81007f1c2b60  automount
0xffff81007f1cc8e0     6320        1  0    1   S  0xffff81007f1ccba0  acpid
0xffff81007f24e720     6330        1  0    1   S  0xffff81007f24e9e0  cupsd
0xffff81007f278e40     6343        1  0    0   S  0xffff81007f279100  sshd
0xffff81007f39f5e0     6354        1  0    0   S  0xffff81007f39f8a0  xinetd
0xffff81007c000e80     6367        1  0    1   S  0xffff81007c001140  ntpd
0xffff810037c63000     6386        1  0    0   S  0xffff810037c632c0  sendmail
0xffff81007f2458e0     6393        1  0    1   S  0xffff81007f245ba0  sendmail
0xffff81007c264ec0     6409        1  0    1   S  0xffff81007c265180  gpm
0xffff81007f1cc200     6421        1  0    1   S  0xffff81007f1cc4c0  crond
0xffff81007f358e00     6447        1  0    1   S  0xffff81007f3590c0  xfs
0xffff81007f18ef00     6466        1  0    1   S  0xffff81007f18f1c0  atd
0xffff81007c264100     6494        1  0    1   S  0xffff81007c2643c0  yum-updatesd
0xffff81007f3955a0     6505        1  0    0   S  0xffff81007f395860  avahi-daemon
0xffff81007f3947e0     6506     6505  0    0   S  0xffff81007f394aa0  avahi-daemon
0xffff81007d2097a0     6516        1  0    0   S  0xffff81007d209a60  hald
0xffff81007c0258a0     6517     6516  0    1   S  0xffff81007c025b60  hald-runner
0xffff81007f209760     6523     6517  0    1   S  0xffff81007f209a20 
hald-addon-acpi
0xffff81007c2ec820     6524     6517  0    0   S  0xffff81007c2ecae0 
hald-addon-cpuf
[1]more> 
Only 'q' or 'Q' are processed at more prompt, input ignored
0xffff81007c2ecf00     6530     6517  0    0   S  0xffff81007c2ed1c0 
hald-addon-keyb
0xffff81007c0000c0     6532     6517  0    1   S  0xffff81007c000380 
hald-addon-keyb
0xffff81007cf10180     6569        1  0    0   S  0xffff81007cf10440  smartd
0xffff81007d208300     6572        1  0    1   S  0xffff81007d2085c0  agetty
0xffff81007c0251c0     6573        1  0    1   S  0xffff81007c025480  mingetty
0xffff81007c2647e0     6576        1  0    1   S  0xffff81007c264aa0  mingetty
0xffff81007c7b6e40     6581        1  0    1   S  0xffff81007c7b7100  mingetty
0xffff81007f1855a0     6582        1  0    0   S  0xffff81007f185860  mingetty
0xffff81007f1847e0     6583        1  0    0   S  0xffff81007f184aa0  mingetty
0xffff81007f184100     6584        1  0    0   S  0xffff81007f1843c0  mingetty
0xffff81007cc90a60     6659     6343  0    0   S  0xffff81007cc90d20  sshd
0xffff81007c0cd8e0     6661     6659  0    0   S  0xffff81007c0cdba0  bash
0xffff81007d362280     6738       11  1    1   R  0xffff81007d362540 *lock_dlm1
0xffff81007f212300     7030     6354  0    0   S  0xffff81007f2125c0  btimed
0xffff81007f278080     7299     6354  0    1   S  0xffff81007f278340  qarshd
0xffff81007f34f8e0     7300     7299  0    1   R  0xffff81007f34fba0  xdoio
0xffff8100711ea720     7301     7300  0    0   R  0xffff8100711ea9e0  xdoio
0xffff810037c62920     7302     6354  0    0   S  0xffff810037c62be0  qarshd
0xffff8100717b6760     7303     7302  1    0   R  0xffff8100717b6a20  xdoio
0xffff8100717b6e40     7304     7303  0    0   R  0xffff8100717b7100  xdoio
0xffff81007f2082c0     7305     6354  0    0   S  0xffff81007f208580  qarshd
0xffff81007f24e040     7306     7305  0    1   R  0xffff81007f24e300  xdoio
[1]more> 
nly 'q' or 'Q' are processed at more prompt, input ignored
0xffff81007f24ee00     7307     6354  0    1   S  0xffff81007f24f0c0  qarshd
0xffff81007f39e820     7308     7306  0    1   R  0xffff81007f39eae0  xdoio
0xffff8100717b6080     7309     7307  0    1   R  0xffff8100717b6340  xdoio
0xffff81007c7b6760     7310     7309  0    0   R  0xffff81007c7b6a20  xdoio


[1]kdb> btp 6738
Stack traceback for pid 6738
0xffff81007d362280     6738       11  1    1   R  0xffff81007d362540 *lock_dlm1
rsp                rip                Function (args)
0xffff810071309de8 0xffffffff880515b4 [gfs2]glock_in_wrong_state+0x42
0xffff810071309e28 0xffffffff88052d0b [gfs2]drop_bh+0x14b
0xffff810071309e58 0xffffffff880526e2 [gfs2]gfs2_glock_cb+0xcb
0xffff810071309e78 0xffffffff880c0896 [lock_dlm]gdlm_thread+0x512
0xffff810071309ea0 0xffffffff8021e1a8 default_wake_function
0xffff810071309ef0 0xffffffff880c0384 [lock_dlm]gdlm_thread
0xffff810071309f08 0xffffffff8023410e keventd_create_kthread
0xffff810071309f18 0xffffffff80234365 kthread+0xd1
0xffff810071309f48 0xffffffff8020a298 child_rip+0xa
0xffff810071309f60 0xffffffff8023410e keventd_create_kthread
0xffff810071309fc8 0xffffffff80234294 kthread
0xffff810071309fd8 0xffffffff8020a28e child_rip

Comment 11 Steve Whitehouse 2007-03-26 11:00:06 UTC

Did the:

print_symbol(KERN_INFO "tried to demote glock in transaction from %s\n",
gl->gl_demoter);

line not print out? Also it would be good if you could turn the gl_ip into a
symbol name as its not much use otherwise. A btp of the xdoio process would have
been useful too as its probably been caused by a combination of the two threads.

It seems that my hunch that it should be possible to trap the bug earlier was
correct. The odd thing here is that it appears that the glock has no pages and
is unlocked as you'd expect at this stage in a demotion. It does however still
appear to be part of a transaction which is unexpected, since if its run through
the demotion code (as suggested by the lack of pages) then it should have
flushed the log at the same time.

We do at least now know that its caused by a demote request from a remote node
and that its an exclusive -> unlocked transistion.

I'll keep looking to see if I can narrow down the cause further.

Comment 12 Abhijith Das 2007-03-27 21:58:25 UTC

Hi Steve,
Here's the additional info you requested when I hit the bug again.
It has the print_symbol() stuff in it. backtraces for the lock_dlm/1 process and
the xdoio processes (4 of them). I also did an id on the gl_ip and looks like it
points to gfs2_glock_dq. Let me know if you need more info.
Thanks,
--Abhi


tried to demote glock in transaction from blocking_cb+0x16/0x33 [gfs2]
drop_bh: pid 3561
Glock 0xcc04ccf0 (2, 5243952)
  gl_flags = 
  gl_ref = 4
  gl_state = 0
  gl_owner = xdoio
  gl_ip = 3763384709
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  le = yes
  reclaim = no
  aspace = 0xdbf0ecc0 nrpages = 0
  ail = 0
  Inode:
    num = 41/5243952
    type = 8
    i_flags = 0 
------------[ cut here ]------------
kernel BUG at fs/gfs2/glock.c:123!
invalid opcode: 0000 [#1]
SMP 
Modules linked in: lock_dlm gfs2 dlm configfs lpfc
CPU:    0
EIP:    0060:[<e0509b9b>]    Not tainted VLI
EFLAGS: 00010292   (2.6.20 #1)
EIP is at glock_in_wrong_state+0x3c/0x40 [gfs2]
eax: 00000000   ebx: cc04ccf0   ecx: ffffffff   edx: 00000046
esi: e0520a12   edi: cea25000   ebp: 00000000   esp: dbe8ff18
ds: 007b   es: 007b   ss: 0068
Process lock_dlm1 (pid: 3561, ti=dbe8e000 task=c343ca70 task.ti=dbe8e000)
Stack: e05207ce e0520a12 00000de9 cc04ccf0 00000000 e050afb6 dbe8ffac cea25000
 
       e051f5a0 cea25000 cc04ccf0 dbe8ffac c2a559c0 e050aa7a 00000286 c012a8ee
 
       df724000 fffefffe dbe8ffbc e006284c 00000000 00000000 00000082 df5abf34
 
Call Trace:
 [<e050afb6>] drop_bh+0x118/0x132 [gfs2]
 [<e050aa7a>] gfs2_glock_cb+0xb1/0x116 [gfs2]
 [<c012a8ee>] remove_wait_queue+0x31/0x35
 [<e006284c>] gdlm_thread+0x5af/0x5fc [lock_dlm]
 [<c0116336>] default_wake_function+0x0/0xd
 [<e006229d>] gdlm_thread+0x0/0x5fc [lock_dlm]
 [<c012a6a1>] kthread+0xb0/0xdb
 [<c012a5f1>] kthread+0x0/0xdb
 [<c0104253>] kernel_thread_helper+0x7/0x10
 =======================
Code: e8 33 9d c2 df 8b 43 34 8b 80 a4 00 00 00 89 74 24 04 c7 04 24 ce 07 52 
e0 89 44 24 08 e8 06 15 c1 df 89 da 31 c0 e8 47 fb ff ff <0f> 0b eb fe 55 89 c
5 57 89 d7 56 53 e8 84 f6 ff ff 89 c3 89 c6 
EIP: [<e0509b9b>] glock_in_wrong_state+0x3c/0x40 [gfs2] SS:ESP 0068:dbe8ff18
 
Entering kdb (current=0xc343ca70, pid 3561) on processor 0 Oops: invalid opcod
e
due to oops @ 0xe0509b9b
eax = 0x00000000 ebx = 0xcc04ccf0 ecx = 0xffffffff edx = 0x00000046 
esi = 0xe0520a12 edi = 0xcea25000 esp = 0xdbe8ff18 eip = 0xe0509b9b 
ebp = 0x00000000 xss = 0xc0380068 xcs = 0x00000060 eflags = 0x00010292 
xds = 0x0050007b xes = 0x0000007b origeax = 0xffffffff &regs = 0xdbe8fee0



[0]kdb> ps
37 sleeping system daemon (state M) processes suppressed
Task Addr       Pid   Parent [*] cpu State Thread     Command
0xc343ca70     3561        7  1    0   R  0xc343cc20 *lock_dlm1

0xdf586a70        1        0  0    0   S  0xdf586c20  init
0xdf50b550      960        1  0    0   S  0xdf50b700  udevd
0xdf263550     2658        1  0    0   S  0xdf263700  dhclient
0xdf6aa030     2759        1  0    0   S  0xdf6aa1e0  syslogd
0xdf69b550     2762        1  0    0   R  0xdf69b700  klogd
0xdf407030     2792        1  0    0   S  0xdf4071e0  portmap
0xdf6a6030     2812        1  0    0   S  0xdf6a61e0  rpc.statd
0xdf57d550     2882        1  0    0   S  0xdf57d700  ccsd
0xdf6b0a70     2883        1  0    0   S  0xdf6b0c20  ccsd
0xc14ea030     2888        1  0    0   R  0xc14ea1e0  aisexec
0xdf50ba70     2889        1  0    0   S  0xdf50bc20  aisexec
0xdf407a70     2920        1  0    0   S  0xdf407c20  aisexec
0xdf50b030     2927        1  0    0   S  0xdf50b1e0  aisexec
0xdec5c550     2928        1  0    0   S  0xdec5c700  aisexec
0xc14ea550     2947        1  0    0   S  0xc14ea700  aisexec
0xc1485550     2948        1  0    0   S  0xc1485700  aisexec
0xdf695a70     2949        1  0    0   S  0xdf695c20  aisexec
0xc1480a70     2950        1  0    0   S  0xc1480c20  aisexec
[0]more> 
0xdf595550     2953        1  0    0   S  0xdf595700  aisexec
0xc1480550     2954        1  0    0   S  0xc1480700  aisexec
0xdf40b030     2983        1  0    0   S  0xdf40b1e0  aisexec
0xdf6aa550     2984        1  0    0   S  0xdf6aa700  aisexec
0xcea21a70     3556        1  0    0   S  0xcea21c20  aisexec
0xcea21550     3557        1  0    0   S  0xcea21700  aisexec
0xd0db5030     3570        1  0    0   S  0xd0db51e0  aisexec
0xd0dbfa70     3571        1  0    0   S  0xd0dbfc20  aisexec
0xdf595030     2926        1  0    0   S  0xdf5951e0  groupd
0xdec5ca70     2934        1  0    0   S  0xdec5cc20  fenced
0xc1485a70     2940        1  0    0   S  0xc1485c20  dlm_controld
0xdf695550     2946        1  0    0   S  0xdf695700  gfs_controld
0xdf57da70     2966        1  0    0   S  0xdf57dc20  dbus-daemon
0xdf6f5030     2977        1  0    0   S  0xdf6f51e0  clvmd
0xdf263030     2993        1  0    0   S  0xdf2631e0  clvmd
0xdf263a70     2994        1  0    0   S  0xdf263c20  clvmd
0xdf69b030     3081        1  0    0   S  0xdf69b1e0  pcscd
0xc4b8c550     3086        1  0    0   S  0xc4b8c700  pcscd
0xd6ea2550     3108        1  0    0   S  0xd6ea2700  automount
0xdec5ba70     3109        1  0    0   S  0xdec5bc20  automount
0xd6850a70     3110        1  0    0   S  0xd6850c20  automount
0xd6ea2a70     3113        1  0    0   S  0xd6ea2c20  automount
0xd6ea2030     3116        1  0    0   S  0xd6ea21e0  automount
[0]more> 
0xcd7cea70     3126        1  0    0   S  0xcd7cec20  acpid
0xc69dfa70     3136        1  0    0   S  0xc69dfc20  cupsd
0xcd7ce030     3149        1  0    0   S  0xcd7ce1e0  sshd
0xdd688030     3160        1  0    0   S  0xdd6881e0  xinetd
0xc69df550     3173        1  0    0   S  0xc69df700  ntpd
0xc343c030     3193        1  0    0   S  0xc343c1e0  sendmail
0xc343c550     3201        1  0    0   S  0xc343c700  sendmail
0xdcc23a70     3220        1  0    0   S  0xdcc23c20  gpm
0xdcc2f030     3246        1  0    0   S  0xdcc2f1e0  crond
0xd8cfc550     3270        1  0    0   S  0xd8cfc700  xfs
0xdcc23550     3289        1  0    0   S  0xdcc23700  atd
0xdcc23030     3301        1  0    0   S  0xdcc231e0  yum-updatesd
0xde941550     3312        1  0    0   S  0xde941700  avahi-daemon
0xde941030     3313     3312  0    0   S  0xde9411e0  avahi-daemon
0xcd7ce550     3323        1  0    0   S  0xcd7ce700  hald
0xc69df030     3324     3323  0    0   S  0xc69df1e0  hald-runner
0xdcc2fa70     3330     3324  0    0   S  0xdcc2fc20  hald-addon-acpi
0xd8cfca70     3337     3324  0    0   S  0xd8cfcc20  hald-addon-keyb
0xdd688a70     3339     3324  0    0   S  0xdd688c20  hald-addon-keyb
0xd8cfc030     3348     3324  0    0   S  0xd8cfc1e0  hald-addon-stor
0xdd688550     3390        1  0    0   S  0xdd688700  smartd
0xdf6f2030     3393        1  0    0   S  0xdf6f21e0  agetty
0xd0d31550     3394        1  0    0   S  0xd0d31700  mingetty
[0]more> 
Only 'q' or 'Q' are processed at more prompt, input ignored
0xde941a70     3395        1  0    0   S  0xde941c20  mingetty
0xdcc2f550     3396        1  0    0   S  0xdcc2f700  mingetty
0xd3f97a70     3397        1  0    0   S  0xd3f97c20  mingetty
0xd0d31030     3398        1  0    0   S  0xd0d311e0  mingetty
0xd3f97550     3399        1  0    0   S  0xd3f97700  mingetty
0xd0d31a70     3464     3149  0    0   S  0xd0d31c20  sshd
0xd3f97030     3468     3464  0    0   S  0xd3f971e0  bash
0xc343ca70     3561        7  1    0   R  0xc343cc20 *lock_dlm1
0xccffca70     3562        7  0    0   R  0xccffcc20  lock_dlm2
0xd6850550     3591     3160  0    0   S  0xd6850700  btimed
0xc4b8c030     3664     3160  0    0   S  0xc4b8c1e0  qarshd
0xc4b8ca70     3665     3664  0    0   R  0xc4b8cc20  xdoio
0xdf6aaa70     3666     3665  0    0   R  0xdf6aac20  xdoio
0xd6850030     3671     3160  0    0   S  0xd68501e0  qarshd
0xdf69ba70     3672     3671  0    0   R  0xdf69bc20  xdoio
0xd0db5550     3673     3672  0    0   R  0xd0db5700  xdoio



[0]kdb> btp 3561
Stack traceback for pid 3561
0xc343ca70     3561        7  1    0   R  0xc343cc20 *lock_dlm1
esp        eip        Function (args)
0xdbe8ff0c 0xe0509b9b [gfs2]glock_in_wrong_state+0x3c
0xdbe8ff2c 0xe050afb6 [gfs2]drop_bh+0x118
0xdbe8ff4c 0xe050aa7a [gfs2]gfs2_glock_cb+0xb1
0xdbe8ff54 0xc012a8ee remove_wait_queue+0x31
0xdbe8ff64 0xe006284c [lock_dlm]gdlm_thread+0x5af
0xdbe8ffa0 0xc0116336 default_wake_function
0xdbe8ffc4 0xe006229d [lock_dlm]gdlm_thread
0xdbe8ffcc 0xc012a6a1 kthread+0xb0
0xdbe8ffd8 0xc012a5f1 kthread
0xdbe8ffe4 0xc0104253 kernel_thread_helper+0x7

[0]kdb> btp 3665
Stack traceback for pid 3665
0xc4b8ca70     3665     3664  0    0   R  0xc4b8cc20  xdoio
esp        eip        Function (args)
0xcc063f40 0xc0410033 __sched_text_start+0x863
0xcc063f60 0xc039a83b net_rx_action+0x94
0xcc063f98 0xc011ea32 irq_exit+0x35
0xcc063f9c 0xc01055e3 do_IRQ+0x87
0xcc063fb4 0xc010367e work_resched+0x5

[0]kdb> btp 3666
Stack traceback for pid 3666
0xdf6aaa70     3666     3665  0    0   R  0xdf6aac20  xdoio
esp        eip        Function (args)
0xcc049f30 0xc0410033 __sched_text_start+0x863
0xcc049f48 0xc015c35c locks_free_lock+0x3a
0xcc049f50 0xc015d729 fcntl_setlk64+0x1c4
0xcc049fa4 0xc01163ed sys_sched_yield+0x59
0xcc049fb4 0xc01035a0 sysenter_past_esp+0x5d

[0]kdb> btp 3672
Stack traceback for pid 3672
0xdf69ba70     3672     3671  0    0   R  0xdf69bc20  xdoio
esp        eip        Function (args)
0xc7cb9f40 0xc0410033 __sched_text_start+0x863
0xc7cb9f64 0xc0121bab run_timer_softirq+0x36
0xc7cb9fb4 0xc010367e work_resched+0x5

[0]kdb> btp 3673
Stack traceback for pid 3673
0xd0db5550     3673     3672  0    0   R  0xd0db5700  xdoio
esp        eip        Function (args)
0xdbf53f30 0xc0410033 __sched_text_start+0x863
0xdbf53f48 0xc015c35c locks_free_lock+0x3a
0xdbf53f50 0xc015d729 fcntl_setlk64+0x1c4
0xdbf53fa4 0xc01163ed sys_sched_yield+0x59
0xdbf53fb4 0xc01035a0 sysenter_past_esp+0x5d
0xdbf53fdc 0xc0410033 __sched_text_start+0x863




gl_ip = 3763384709 = 0xe050b185
[0]kdb> id 0xe050b185
0xe050b185 gfs2_glock_dq+0x22:    lea    0x1c(%ebx),%edi
0xe050b188 gfs2_glock_dq+0x25:    mov    %edi,%eax
0xe050b18a gfs2_glock_dq+0x27:    call   0xc041151e _spin_lock
0xe050b18f gfs2_glock_dq+0x2c:    mov    0x4(%esi),%eax
0xe050b192 gfs2_glock_dq+0x2f:    mov    (%esi),%edx
0xe050b194 gfs2_glock_dq+0x31:    mov    %edx,(%eax)
0xe050b196 gfs2_glock_dq+0x33:    mov    %eax,0x4(%edx)
0xe050b199 gfs2_glock_dq+0x36:    lea    0x40(%ebx),%eax
0xe050b19c gfs2_glock_dq+0x39:    mov    %esi,(%esi)
0xe050b19e gfs2_glock_dq+0x3b:    cmp    %eax,0x40(%ebx)
0xe050b1a1 gfs2_glock_dq+0x3e:    mov    %esi,0x4(%esi)
0xe050b1a4 gfs2_glock_dq+0x41:    jne    0xe050b1c4 gfs2_glock_dq+0x61
0xe050b1a6 gfs2_glock_dq+0x43:    movb   $0x1,0x1c(%ebx)
0xe050b1aa gfs2_glock_dq+0x47:    mov    0x18(%ebp),%edx
0xe050b1ad gfs2_glock_dq+0x4a:    test   %edx,%edx
0xe050b1af gfs2_glock_dq+0x4c:    je     0xe050b1b5 gfs2_glock_dq+0x52

Comment 13 Steve Whitehouse 2007-04-02 10:42:32 UTC

Created attachment 151413 [details]
An updated test patch

Sorry, it looks like there was a race in the previous patch I posted. Here is
an updated version which fixes this problem.

I've had a good look through the results though (which may well still be
correct, though it seems as well to be certain) and I still can't see what is
going on here.

Comment 14 Ben Marzinski 2007-04-04 23:03:15 UTC

I have been totally unable to recreate this issue. I've tried various types of
bonnie++ runs, as well as du's in combination with rm's. Are people still able
to reproduce this on the latest gfs2 code? If so, could someone please put a list of
commands necessary to reproduce this into the bugzilla, along with information
about their setup.

Comment 15 Andrey Mirkin 2007-04-05 06:01:50 UTC

bonnie++ version: 1.02
Command line: bonnie++ -d DIR -s 2000 -n 200:50000:0:100 -u root -m server
Cluster config: only one node is present in cluster

<?xml version="1.0"?>
<cluster name="alpha" config_version="2">
<cman>
</cman>
<clusternodes>
<clusternode name="ts25" votes="1" nodeid="1">
        <fence>
                <method name="single">
                        <device name="human" ipaddr="ts25"/>
                </method>
        </fence>
</clusternode>
</clusternodes>
<fencedevices>
        <fencedevice name="human" agent="fence_manual"/>
</fencedevices>
</cluster>


I was not able to test the latest gfs2 code, but this bug can be easy reproduced
on RHEL5 kernel.

Comment 16 Josef Bacik 2007-04-06 17:02:53 UTC

i can still reproduce this by rm -rf'ing the kernel source and then du'ing on 
another node.

Comment 17 Andrey Mirkin 2007-04-09 12:59:13 UTC

Hi Steve,

Can you, please, prepare a debug patch for 2.6.18-el5.8 kernel.

Comment 18 Josef Bacik 2007-04-10 20:07:51 UTC

*** Bug 218984 has been marked as a duplicate of this bug. ***

Comment 19 Ben Marzinski 2007-04-26 17:27:40 UTC

O.k. I've figured out basically what's wrong here. I have a patch that even
stops it. However, I don't understand This chunk of code very well, so I have no
idea if my solution is the correct way to solve the problem.

What is happening is:
One node has a file locked exclusively, and another node requests either a
shared or exclusive lock on the file. In the process of lock_dlm handling this
request, it calls inode_go_sync() to make sure that everything associated with
this glock is synced to disk.  However, at the end of inode_go_sync(), the glock
and some associated buffers are still on the incore log.  Actually, the
gfs2_log_flush() in inode_go_sync() removes it from the log, but the
filemap_fdatawrite() adds it back.

My solution to this is to always flush the log again at the end of
inode_go_sync(), instead of only if there are blocks on the glock's active items
list. This does make sure that the glock is definitely out of the log, but I'm
not sure that this is the most efficient solution, or even the correct place to
fix the problem at. I'll post the patch, and keep looking into the problem to
see if there's a more sensible solution.  If someone else, especially Steve,
could have a look at this and tell me what they think, that would be great.

Comment 20 Ben Marzinski 2007-04-26 22:27:51 UTC

Created attachment 153557 [details]
Force gfs2_ail_empty_gl to flush log

This patch makes the problem go away. However, I'm still not sure if this is
the correct way to fix it.

Comment 21 Steve Whitehouse 2007-04-27 08:25:58 UTC

Its possible that the problem is just that we are flushing things in the wrong
order, in fact for ordered writes, it looks like this is certainly the case as
we ought to ensure that we flush the log _after_ the data has been written to
disk. So in this case the inode_go_sync() should do a flush and a wait on the
data before the log flush is called, so that the problem should disappear due to
the different ordering. For writeback, we can try something which is potentially
a bit more efficient, in that we don't need to wait for the data to be written
out before the log flush, so that (pseudo code) it might look something like this:

filemap_fdatawrite();
if (ordered_data)
        filemap_fdatawait();
gfs2_log_flush();
if (writeback_data)
        filemap_fdatawait();

The other thing that I wonder about is whether in fact the call to
filemap_fdatawrite() should be adding the glock to the transaction, although I
suppose that in the ordered case it has to, otherwise we'd not get proper
flushing of all the data at the correct time.

The changes to the ordering of these operations have a lot of potential for
changing the performance, so I'd suggest trying a few postmark runs before &
after any changes just to see what difference it will make.

Just out of interest, does this bug go away if you mount with writeback rather
than ordered?

Comment 22 Steve Whitehouse 2007-05-02 10:41:01 UTC

Ben,

is there any more progress on this one?

Comment 23 Ben Marzinski 2007-05-04 15:52:07 UTC

Created attachment 154134 [details]
flush after filemap_fdatawrite

This fixes the problem for me. It simply moves the flush until after the inode
filemap_fdatawrite, so that the databufs from it get flushed as well.

Comment 24 Steve Whitehouse 2007-05-04 15:59:54 UTC

I'll push this into -nmw just as soon as Linus pulls the current tree.

Comment 25 Ben Marzinski 2007-05-04 16:28:27 UTC

Just a heads up. I went though and ran postmark on clean filesystems with a
fresh installed kernel for both with and without this patch. I was wrong about
the performance imporvement. Doing five runs on each, it ran faster without the
patch on three, and faster with on two. But there wasn't a huge difference
either way.

Comment 26 Steve Whitehouse 2007-05-08 08:15:55 UTC

Now in the upstream git tree, so its ready to be posted to rhkernel for RHEL at
this stage.

Comment 27 Ben Marzinski 2007-05-08 20:26:31 UTC

*** Bug 232010 has been marked as a duplicate of this bug. ***

Comment 28 Steve Whitehouse 2007-05-10 10:50:14 UTC

I've just posted this to rhkernel-list to try and ensure that it gets into RHEL
5.1 before Don goes off on holiday.

Comment 29 Don Zickus 2007-05-11 22:06:51 UTC

in 2.6.18-19.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 31 Ben Marzinski 2007-07-31 22:33:59 UTC

*** Bug 235462 has been marked as a duplicate of this bug. ***

Comment 33 errata-xmlrpc 2007-11-07 19:43:46 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html