Bug 234086 - dlm locking error from gfs dio/aio during virt machine migration
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: David Teigland
QA Contact: Cluster QE
Duplicates: 236277
Depends On:
Blocks: 238731
Reported: 2007-03-26 18:21 EDT by David Teigland
Modified: 2007-11-30 17:07 EST (History)
6 users

Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Last Closed: 2007-11-07 14:45:19 EST


Attachments
patch posted to rhkernel on 04/19 (4.89 KB, text/plain)
2007-04-19 13:38 EDT, David Teigland
same change as above, but the patch applies against the 5.0 kernel (4.91 KB, text/plain)
2007-04-25 15:24 EDT, David Teigland

Description David Teigland 2007-03-26 18:21:48 EDT
Description of problem:

We don't have a clear picture of the problem yet because there are so many
different things going on and very little debugging is enabled in the default
RHEL5 installation.  Here's what we know:

- Paul had a two node gfs cluster on which he was storing the image files
  for virtual machines that are forming their own cluster (non gfs)

- he attempted to enable the rgmanager virtual service (on cmd line with
  clusvcadm) for one of the virtual machines that was running.

- gfs on one of the physical machines withdrew after the dlm returned
  an unknown error

Mar 26 10:01:58 ask-07 clurgmgrd[19198]: <notice> Recovering failed service vm:pk_4
Mar 26 10:02:00 ask-07 kernel: tap tap-10-51712: 2 getting info
Mar 26 10:02:01 ask-07 kernel: device vif10.0 entered promiscuous mode
Mar 26 10:02:01 ask-07 kernel: ADDRCONF(NETDEV_UP): vif10.0: link is not ready
Mar 26 10:02:01 ask-07 clurgmgrd[19198]: <notice> Service vm:pk_4 started
Mar 26 10:02:05 ask-07 kernel: GFS: fsid=ask_cluster:gfs.1: fatal: assertion
"!(lck_ret & LM_OUT_ERROR)" failed
Mar 26 10:02:05 ask-07 kernel: GFS: fsid=ask_cluster:gfs.1:   function =
gfs_glock_xmote_th
Mar 26 10:02:05 ask-07 kernel: GFS: fsid=ask_cluster:gfs.1:   file =
/builddir/build/BUILD/gfs-kmod-0.1.16/_kmod_build_xen/src/gfs/glock.c, line = 1162
Mar 26 10:02:05 ask-07 kernel: GFS: fsid=ask_cluster:gfs.1:   time = 1174921325
Mar 26 10:02:05 ask-07 kernel: GFS: fsid=ask_cluster:gfs.1: about to withdraw
from the cluster
Mar 26 10:02:05 ask-07 kernel: GFS: fsid=ask_cluster:gfs.1: telling LM to withdraw
Mar 26 10:02:06 ask-07 kernel: GFS: fsid=ask_cluster:gfs.1: withdrawn
Mar 26 10:02:06 ask-07 kernel:  [<ee73b8ae>] gfs_lm_withdraw+0x76/0x82 [gfs]
Mar 26 10:02:06 ask-07 kernel:  [<ee74f812>] gfs_assert_withdraw_i+0x26/0x31 [gfs]
Mar 26 10:02:06 ask-07 kernel:  [<ee7345b4>] gfs_glock_xmote_th+0x138/0x17c [gfs]
Mar 26 10:02:06 ask-07 kernel:  [<ee7330e2>] run_queue+0x236/0x2f9 [gfs]
Mar 26 10:02:06 ask-07 kernel:  [<ee733702>] gfs_glock_nq+0x31d/0x36f [gfs]
Mar 26 10:02:06 ask-07 kernel:  [<ee74353a>] do_read_direct+0x76/0x17f [gfs]
Mar 26 10:02:06 ask-07 kernel:  [<ee7421b0>] walk_vm+0xb5/0x274 [gfs]
Mar 26 10:02:06 ask-07 kernel:  [<ee742499>] __gfs_read+0x43/0x4a [gfs]
Mar 26 10:02:06 ask-07 kernel:  [<ee7434c4>] do_read_direct+0x0/0x17f [gfs]
Mar 26 10:02:06 ask-07 kernel:  [<c047b6b5>] aio_pread+0x25/0x6f
Mar 26 10:02:06 ask-07 kernel:  [<c047b690>] aio_pread+0x0/0x6f
Mar 26 10:02:06 ask-07 kernel:  [<c047c023>] aio_run_iocb+0xbd/0x141
Mar 26 10:02:06 ask-07 kernel:  [<c047c79c>] io_submit_one+0x24b/0x2a1
Mar 26 10:02:06 ask-07 kernel:  [<c047cc39>] sys_io_submit+0x82/0xd8
Mar 26 10:02:06 ask-07 kernel:  [<c0404cff>] syscall_call+0x7/0xb

some of the processes that were running:

 3301 S    xenstored --pid-file /var/r -
 3311 S    python /usr/sbin/xend start wait
 3312 Sl   python /usr/sbin/xend start stext
 3314 Sl   xenconsoled                 stext
 3316 Ssl  blktapctrl                  stext
 3708 Ss   /sbin/dhclient -1 -q -lf /v -
 3840 Ssl  /sbin/ccsd                  stext
 3848 SLl  aisexec                     stext
 3864 Ss   /sbin/groupd                -
 3872 Ss   /sbin/fenced                -
 3878 Ss   /sbin/dlm_controld          -
 3884 Ss   /sbin/gfs_controld          -
 3971 Ssl  clvmd -T20                  stext
 3977 S<   [dlm_astd]                  dlm_astd
 3978 S<   [dlm_scand]                 -
 3979 S<   [dlm_recvd]                 dlm_recvd
 3980 S<   [dlm_sendd]                 dlm_sendd
 3981 S<   [dlm_recoverd]              dlm_recoverd
 4394 S<   [gfs_scand]                 -
 4395 S<   [gfs_glockd]                gfs_glockd
 4398 S<   [gfs_recoverd]              -
 4399 S<   [gfs_logd]                  -
 4400 S<   [gfs_quotad]                -
 4401 S<   [gfs_inoded]                -
 4745 Rs   sshd: root@pts/8            -
 4755 Ss   -bash                       wait
 4918 R+   ps ax -o pid,stat,cmd,wchan -
18461 S<s  ricci -u 100                -
19197 S<s  clurgmgrd                   wait
19198 S<l  clurgmgrd                   stext
19199 S<   [dlm_recoverd]              dlm_recoverd
20496 Sl   tapdisk /dev/xen/tapctrlwri stext
20520 Sl   /usr/lib/xen/bin/xen-vncfb  stext
20587 Sl   tapdisk /dev/xen/tapctrlwri stext
20600 Sl   /usr/lib/xen/bin/xen-vncfb  stext
20676 Sl   tapdisk /dev/xen/tapctrlwri stext
20689 Sl   /usr/lib/xen/bin/xen-vncfb  stext
20795 S<   [xvd 7]                     tap_blkif_schedule
20796 S<   [xvd 8]                     tap_blkif_schedule
20797 S<   [xvd 9]                     tap_blkif_schedule
29559 Ss   sshd: root@pts/0            -
29566 Ss+  -bash                       -
29657 S    /usr/libexec/gconfd-2 8     -
29669 S    python /usr/share/virt-mana -
29931 Ss   sshd: root@pts/4            -
29933 Ss   -bash                       wait
30021 S+   tail -f /var/log/messages   -
31529 Dl   tapdisk /dev/xen/tapctrlwri glock_wait_internal
31553 Sl   /usr/lib/xen/bin/xen-vncfb  stext
31637 S<   [xvd 10]                    tap_blkif_schedule



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
Comment 1 Lon Hohberger 2007-03-27 15:07:38 EDT
Hmm...

This is working on the simpsons cluster - they're x86_64, using GFS to hold the
virtual machines' root disks.

Any hints maybe as to what the VM was doing at the time? (writing to disk?) 
Ryan has seen strange behaviors where he has to rebuild the root disk images; I
wonder if this is related.
Comment 2 Ryan McCabe 2007-03-27 16:07:35 EDT
The behavior I was seeing--apparent filesystem corruption in the guests--was
happening when the guest disk images were LV-backed with the LVs on a clustered
VG. I'm testing now with file-backed guest images on GFS to see if I can
reproduce the behavior. So far, no problems, though.
Comment 3 Wendy Cheng 2007-03-27 18:56:05 EDT
Does this only happen with the direct I/O and AIO combination? What about the
default mode (buffered I/O)?
Comment 4 Lon Hohberger 2007-03-28 11:18:05 EDT
I think the problem is we're not sure exactly what was doing the writing at the
time of the panic... xend?  Xen doing some hypervisor magic?  

This makes it hard to try it with other access methods :(
Comment 5 David Teigland 2007-03-28 11:24:09 EDT
Yeah, I'm surprised the kernel didn't print a pid in the backtrace....
Would it be "tapdisk /dev/xen/tapctrlwri" from the ps?
Comment 6 David Teigland 2007-03-28 11:49:24 EDT
Latest backtrace did not include aio/dio functions:

xenbr0: port 3(vif1.0) entering learning state
xenbr0: topology change detected, propagating
xenbr0: port 3(vif1.0) entering forwarding state
GFS: fsid=ask_cluster:gfs.0: fatal: assertion "!(lck_ret & LM_OUT_ERROR)" failed
GFS: fsid=ask_cluster:gfs.0:   function = gfs_glock_xmote_th
GFS: fsid=ask_cluster:gfs.0:   file = /builddir/build/BUILD/gfs-kmod-0.1.16/_kmo
d_build_xen/src/gfs/glock.c, line = 1162
GFS: fsid=ask_cluster:gfs.0:   time = 1175069757
GFS: fsid=ask_cluster:gfs.0: about to withdraw from the cluster
GFS: fsid=ask_cluster:gfs.0: telling LM to withdraw
GFS: fsid=ask_cluster:gfs.0: withdrawn
 [<ee7388ae>] gfs_lm_withdraw+0x76/0x82 [gfs]
 [<ee74c812>] gfs_assert_withdraw_i+0x26/0x31 [gfs]
 [<ee7315b4>] gfs_glock_xmote_th+0x138/0x17c [gfs]
 [<ee7300e2>] run_queue+0x236/0x2f9 [gfs]
 [<ee730702>] gfs_glock_nq+0x31d/0x36f [gfs]
 [<ee732d6b>] gfs_glock_nq_atime+0x145/0x291 [gfs]
 [<ee740697>] do_read_buf+0x54/0x104 [gfs]
 [<ee73f1b0>] walk_vm+0xb5/0x274 [gfs]
 [<ee7407e7>] gfs_open+0x0/0x126 [gfs]
 [<c0460c76>] __dentry_open+0xea/0x1ab
 [<ee73f499>] __gfs_read+0x43/0x4a [gfs]
 [<ee740643>] do_read_buf+0x0/0x104 [gfs]
 [<ee73f4db>] gfs_read+0x0/0xe [gfs]
 [<ee73f4e6>] gfs_read+0xb/0xe [gfs]
 [<c046285c>] vfs_read+0x9f/0x141
 [<c0462caa>] sys_read+0x3c/0x63
 [<c0404cff>] syscall_call+0x7/0xb
 =======================
Comment 7 David Teigland 2007-04-16 15:54:42 EDT
This is probably related to the "altmode" code in the dlm that's used by
gfs for apps doing direct-io (like blktap).  It may also be related to
the conversion-deadlock resolution code which is the current subject of
bug 236277.


ask-08

Resource ebc532c0 Name (len=24) "       2         1faf64d"  
Local Copy, Master is node 1
Granted Queue
Conversion Queue
000103d0 PR (--) Master:     00010028
Waiting Queue

ask-07

Resource e9846dc0 Name (len=24) "       2         1faf64d"  
Master Copy
Granted Queue
00010028 PR Remote:   2 000103d0
Conversion Queue
Waiting Queue


5111 Dl   tapdisk /dev/xen/tapctrlwri glock_wait_internal
Comment 8 David Teigland 2007-04-19 09:48:50 EDT
There are flags to enable two specialized features in the dlm:

1. CONVDEADLK causes the dlm to resolve conversion deadlocks internally by
changing the granted mode of locks to NL.
2. ALTPR/ALTCW cause the dlm to change the requested mode of locks to PR or CW
to grant them if the main mode can't be granted.

GFS direct i/o exercises both of these features, especially when mixed with
buffered i/o.  The dlm has problems with them.

The first problem is on the master node.  If the master demotes a lock as
part of converting it, the actual step of converting the lock is never done
after the demotion; the lock is just left sitting on the granted queue with a
granted mode of NL.  I think the mistaken assumption was that the call to
grant_pending_locks() would grant it, but that function naturally doesn't
look at locks on the granted queue.

The second problem is on the process node.  If the master either demotes
or gives an altmode, the munging of the gr/rq modes is never done in the
process copy of the lock, leaving the master/process copies out of sync.

Comment 9 David Teigland 2007-04-19 09:50:13 EDT
*** Bug 236277 has been marked as a duplicate of this bug. ***
Comment 10 RHEL Product and Program Management 2007-04-19 13:15:18 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 11 David Teigland 2007-04-19 13:38:02 EDT
Created attachment 153038 [details]
patch posted to rhkernel on 04/19
Comment 13 David Teigland 2007-04-25 15:24:49 EDT
Created attachment 153444 [details]
same change as above, but the patch applies against the 5.0 kernel
Comment 15 Don Zickus 2007-05-01 14:04:39 EDT
in 2.6.18-17.el5
Comment 19 errata-xmlrpc 2007-11-07 14:45:19 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html
