Bug 429546 - lock_dlm: plock device version mismatch: kernel (1.1.0), user (16777216.16777216.0)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.1
Hardware: ppc64 Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: David Teigland
QA Contact: GFS Bugs
Keywords: ZStream
Depends On:
Blocks: Cluster5-ppc 437164
Reported: 2008-01-21 09:36 EST by Nate Straz
Modified: 2009-04-16 18:19 EDT

See Also:
Fixed In Version: RHBA-2008-0347
Doc Type: Bug Fix
Last Closed: 2008-05-21 11:58:40 EDT
Attachments: None
Description Nate Straz 2008-01-21 09:36:54 EST
Description of problem:

While running g2 on GFS, the first single reader multi writer test case produced
the following log entries.

Jan 20 16:04:10 basic gfs_controld[28493]: receive_plock from 1 header 1 info 16777216
Jan 20 16:04:10 basic gfs_controld[28493]: plock result write err -1 errno 22
Jan 20 16:04:10 basic kernel: lock_dlm: plock device version mismatch: kernel (1.1.0), user (16777216.16777216.0)
Jan 20 16:04:10 basic gfs_controld[28493]: receive_plock from 1 header 1 info 16777216
Jan 20 16:04:10 basic gfs_controld[28493]: plock result write err -1 errno 22
Jan 20 16:04:10 basic kernel: lock_dlm: plock device version mismatch: kernel (1.1.0), user (16777216.16777216.0)

Both writers hung in D state with the following backtraces:

 xdoio         D 000000000ff0df40 12368 29572      1               29576 (NOTLB)
 Call Trace:
 [C000000002433590] [C000000002433620] 0xc000000002433620 (unreliable)
 [C000000002433760] [C0000000000106C0] .__switch_to+0x130/0x154
 [C0000000024337F0] [C00000000035B0C8] .schedule+0xae0/0xc70
 [C000000002433900] [D000000000A03138] .gdlm_plock+0x158/0x240 [lock_dlm]
 [C0000000024339E0] [D000000000ACD8AC] .gfs_lm_plock+0x4c/0x68 [gfs]
 [C000000002433A60] [D000000000AD81F0] .gfs_lock+0x144/0x168 [gfs]
 [C000000002433BB0] [C000000000109EEC] .fcntl_setlk+0x180/0x2fc
 [C000000002433CB0] [C000000000105454] .sys_fcntl+0x320/0x3f8
 [C000000002433D50] [C000000000129B90] .compat_sys_fcntl64+0x348/0x514
 [C000000002433E30] [C0000000000086A4] syscall_exit+0x0/0x40
 xdoio         D 000000000ff0df40 12368 29576      1         29572 28994 (NOTLB)
 Call Trace:
 [C0000000769D3590] [C0000000769D3620] 0xc0000000769d3620 (unreliable)
 [C0000000769D3760] [C0000000000106C0] .__switch_to+0x130/0x154
 [C0000000769D37F0] [C00000000035B0C8] .schedule+0xae0/0xc70
 [C0000000769D3900] [D000000000A03138] .gdlm_plock+0x158/0x240 [lock_dlm]
 [C0000000769D39E0] [D000000000ACD8AC] .gfs_lm_plock+0x4c/0x68 [gfs]
 [C0000000769D3A60] [D000000000AD81F0] .gfs_lock+0x144/0x168 [gfs]
 [C0000000769D3BB0] [C000000000109EEC] .fcntl_setlk+0x180/0x2fc
 [C0000000769D3CB0] [C000000000105454] .sys_fcntl+0x320/0x3f8
 [C0000000769D3D50] [C000000000129B90] .compat_sys_fcntl64+0x348/0x514
 [C0000000769D3E30] [C0000000000086A4] syscall_exit+0x0/0x40


Version-Release number of selected component (if applicable):
cman-2.0.73-1.el5_1.4.ppc
gfs-utils-0.1.12-1.el5.ppc
gfs2-utils-0.1.38-1.el5.ppc
kernel-2.6.18-53.el5.ppc64
kernel-2.6.18-53.1.6.el5.ppc64
kmod-gfs-0.1.19-7.el5_1.1.ppc64

How reproducible:
Hit on first try. Will reset and try again.
Comment 1 Nate Straz 2008-01-21 10:06:41 EST
Steps to Reproduce:

1. cd /mnt/gfs
2. export PATH=$PATH:/usr/tests/sts-rhel5.1/bin
3. xiogen -n -i 1 -f buffered -m sequential -s write -F 1000b:single_reader_mw_buffered_sequential | xdoio -vk
Comment 2 Nate Straz 2008-01-21 10:16:54 EST
Some information requested by Dave after starting gfs_controld w/ -P.

[root@doral gfs_vs0]# group_tool dump gfs gfs_vs0
1200928471 listen 1
1200928471 cpg 4
1200928471 groupd 6
1200928471 uevent 7
1200928471 plocks 10
1200928471 setup done
1200928493 process_plocks: no mg id 40001
1200928493 process_plocks: no mg id 40001
1200928521 client 6: dump
[root@doral gfs_vs0]# group_tool
type             level name     id       state       
fence            0     default  00010001 none        
[1 2 3 4]
dlm              1     clvmd    00030001 none        
[1 2 3 4]
dlm              1     gfs_vs0  00050001 none        
[1 2 3 4]
gfs              2     gfs_vs0  00040001 none        
[1 2 3 4]
Comment 3 David Teigland 2008-01-21 14:59:47 EST
This is an alignment problem.  Things work if we do the byte-swapping on the
original structure and then copy it into the final buffer, instead of copying
first and then trying to do the byte-swapping at an offset within the send buffer.
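
Editorial sketch of the pattern described above (this is not the committed patch). The struct layouts, the 20-byte header size, and the names swap32, swap64, info_swap, pack_info and unpack_info are assumptions made for illustration. The sketch swaps unconditionally so that it behaves the same on any host; real code would more likely use endian-conversion helpers that do nothing on little-endian machines. The point is that the swap always runs on an aligned local struct, and only memcpy touches the unaligned offset in the send buffer.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message layout; sizes are chosen so that the payload
 * starts at offset 20, which is not a multiple of 8. */
struct hdr {
    uint8_t bytes[20];
};

struct info {
    uint32_t version[3];
    uint32_t pad;
    uint64_t start;      /* 64-bit field, so the struct wants 8-byte alignment */
};

static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

static uint64_t swap64(uint64_t v)
{
    return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* Swap every field in place. Only ever called on a properly aligned struct. */
static void info_swap(struct info *in)
{
    for (int i = 0; i < 3; i++)
        in->version[i] = swap32(in->version[i]);
    in->start = swap64(in->start);
}

/* Sender: swap an aligned local copy first, then copy the bytes
 * into the buffer at whatever offset follows the header. */
static void pack_info(unsigned char *buf, size_t off, const struct info *src)
{
    struct info tmp = *src;
    info_swap(&tmp);
    memcpy(buf + off, &tmp, sizeof(tmp));
}

/* Receiver: copy the bytes out to an aligned struct, then swap back. */
static void unpack_info(const unsigned char *buf, size_t off, struct info *dst)
{
    memcpy(dst, buf + off, sizeof(*dst));
    info_swap(dst);
}
```

With this sketch, packing a struct whose version is 1.1.0 and unpacking it again yields 1.1.0. A value that misses one of the two swaps shows up as swap32(1) = 16777216 (0x01000000), which is the number reported in the log above, and errno 22 is EINVAL.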
Comment 4 David Teigland 2008-01-21 15:32:26 EST
Fix committed to the HEAD, RHEL5 and RHEL51 branches.
Comment 7 Rob Kenna 2008-03-12 14:05:48 EDT
Marking as 5.1z fix for ppc support.
Comment 10 errata-xmlrpc 2008-05-21 11:58:40 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0347.html
