Description of problem:
While running g2 on GFS, the first single reader multi writer test case produced the following log entries.

Jan 20 16:04:10 basic gfs_controld[28493]: receive_plock from 1 header 1 info 16777216
Jan 20 16:04:10 basic gfs_controld[28493]: plock result write err -1 errno 22
Jan 20 16:04:10 basic kernel: lock_dlm: plock device version mismatch: kernel (1.1.0), user (16777216.16777216.0)
Jan 20 16:04:10 basic gfs_controld[28493]: receive_plock from 1 header 1 info 16777216
Jan 20 16:04:10 basic gfs_controld[28493]: plock result write err -1 errno 22
Jan 20 16:04:10 basic kernel: lock_dlm: plock device version mismatch: kernel (1.1.0), user (16777216.16777216.0)

Both writers hung in D state with the following backtraces:

xdoio D 000000000ff0df40 12368 29572 1 29576 (NOTLB)
Call Trace:
[C000000002433590] [C000000002433620] 0xc000000002433620 (unreliable)
[C000000002433760] [C0000000000106C0] .__switch_to+0x130/0x154
[C0000000024337F0] [C00000000035B0C8] .schedule+0xae0/0xc70
[C000000002433900] [D000000000A03138] .gdlm_plock+0x158/0x240 [lock_dlm]
[C0000000024339E0] [D000000000ACD8AC] .gfs_lm_plock+0x4c/0x68 [gfs]
[C000000002433A60] [D000000000AD81F0] .gfs_lock+0x144/0x168 [gfs]
[C000000002433BB0] [C000000000109EEC] .fcntl_setlk+0x180/0x2fc
[C000000002433CB0] [C000000000105454] .sys_fcntl+0x320/0x3f8
[C000000002433D50] [C000000000129B90] .compat_sys_fcntl64+0x348/0x514
[C000000002433E30] [C0000000000086A4] syscall_exit+0x0/0x40

xdoio D 000000000ff0df40 12368 29576 1 29572 28994 (NOTLB)
Call Trace:
[C0000000769D3590] [C0000000769D3620] 0xc0000000769d3620 (unreliable)
[C0000000769D3760] [C0000000000106C0] .__switch_to+0x130/0x154
[C0000000769D37F0] [C00000000035B0C8] .schedule+0xae0/0xc70
[C0000000769D3900] [D000000000A03138] .gdlm_plock+0x158/0x240 [lock_dlm]
[C0000000769D39E0] [D000000000ACD8AC] .gfs_lm_plock+0x4c/0x68 [gfs]
[C0000000769D3A60] [D000000000AD81F0] .gfs_lock+0x144/0x168 [gfs]
[C0000000769D3BB0] [C000000000109EEC] .fcntl_setlk+0x180/0x2fc
[C0000000769D3CB0] [C000000000105454] .sys_fcntl+0x320/0x3f8
[C0000000769D3D50] [C000000000129B90] .compat_sys_fcntl64+0x348/0x514
[C0000000769D3E30] [C0000000000086A4] syscall_exit+0x0/0x40

Version-Release number of selected component (if applicable):
cman-2.0.73-1.el5_1.4.ppc
gfs-utils-0.1.12-1.el5.ppc
gfs2-utils-0.1.38-1.el5.ppc
kernel-2.6.18-53.el5.ppc64
kernel-2.6.18-53.1.6.el5.ppc64
kmod-gfs-0.1.19-7.el5_1.1.ppc64

How reproducible:
Hit on first try. Will reset and try again.
Steps to Reproduce:
1. cd /mnt/gfs
2. export PATH=$PATH:/usr/tests/sts-rhel5.1/bin
3. xiogen -n -i 1 -f buffered -m sequential -s write -F 1000b:single_reader_mw_buffered_sequential | xdoio -vk
Some information requested by Dave after starting gfs_controld w/ -P.

[root@doral gfs_vs0]# group_tool dump gfs gfs_vs0
1200928471 listen 1
1200928471 cpg 4
1200928471 groupd 6
1200928471 uevent 7
1200928471 plocks 10
1200928471 setup done
1200928493 process_plocks: no mg id 40001
1200928493 process_plocks: no mg id 40001
1200928521 client 6: dump

[root@doral gfs_vs0]# group_tool
type   level  name     id        state
fence  0      default  00010001  none
[1 2 3 4]
dlm    1      clvmd    00030001  none
[1 2 3 4]
dlm    1      gfs_vs0  00050001  none
[1 2 3 4]
gfs    2      gfs_vs0  00040001  none
[1 2 3 4]
This is an alignment problem. Things work if we do the byte-swapping on the original structure and then copy it into the final buffer, instead of copying first and then trying to do the byte-swapping at an offset within the send buffer.
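For readers following along, here is a minimal sketch of the ordering described above: byte-swap an aligned copy of the structure first, then memcpy the result to its offset in the send buffer. The struct layouts and the names msg_header, plock_msg, pack_plock and unpack_plock are hypothetical and are not the gfs_controld definitions. The swap is done unconditionally to keep the example short; real code would convert to a fixed wire byte order. Note that swap32(1) is 16777216, the value that shows up in the log entries above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire structures for illustration only; these are NOT the
 * real gfs_controld definitions. */
struct msg_header {
    uint8_t type;               /* 1-byte header puts the payload at an odd offset */
};

struct plock_msg {
    uint32_t version[3];
    uint32_t nodeid;
    uint64_t start;
    uint64_t end;
};

static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

static uint64_t swap64(uint64_t v)
{
    return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* Swap every field in place. The caller must pass a properly aligned
 * struct, never a pointer into the middle of a byte buffer. */
static void plock_msg_swap(struct plock_msg *m)
{
    m->version[0] = swap32(m->version[0]);
    m->version[1] = swap32(m->version[1]);
    m->version[2] = swap32(m->version[2]);
    m->nodeid = swap32(m->nodeid);
    m->start = swap64(m->start);
    m->end = swap64(m->end);
}

/* Swap an aligned local copy first, then copy the bytes to the payload
 * offset in the send buffer. Returns the number of bytes written. */
size_t pack_plock(char *buf, const struct msg_header *hdr,
                  const struct plock_msg *info)
{
    struct plock_msg tmp;

    assert(buf && hdr && info);
    tmp = *info;                /* aligned copy of the original */
    plock_msg_swap(&tmp);       /* swap while still aligned */
    memcpy(buf, hdr, sizeof(*hdr));
    memcpy(buf + sizeof(*hdr), &tmp, sizeof(tmp));
    return sizeof(*hdr) + sizeof(tmp);
}

/* Receiving side: copy out of the buffer into an aligned struct, then swap. */
void unpack_plock(const char *buf, struct plock_msg *out)
{
    assert(buf && out);
    memcpy(out, buf + sizeof(struct msg_header), sizeof(*out));
    plock_msg_swap(out);
}
```

The approach to avoid is casting buf + sizeof(*hdr) to a struct plock_msg pointer and swapping through it, since that pointer is not guaranteed to be suitably aligned for the 32-bit and 64-bit fields.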
Fix committed to the HEAD, RHEL5 and RHEL51 branches.
Marking as 5.1z fix for ppc support.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0347.html