Bug 201932

Summary: GFS io error on Gulm and Cman/dlm
Product: Red Hat Enterprise Linux 5 Reporter: Pascal Pucci <pascal.pucci>
Component: gfs-kmod Assignee: Kiersten (Kerri) Anderson <kanderso>
Status: CLOSED NOTABUG QA Contact: GFS Bugs <gfs-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.0 CC: pjakobi
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-09-20 15:53:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Pascal Pucci 2006-08-09 20:40:49 UTC
Description of problem:

The GFS file system breaks during an I/O test.
I/O errors occur on a mounted GFS file system, both over GNBD and with CMAN/DLM.

I am trying to use GFS 6.1 on rh4u2 with the kernels:

2.6.9-39/

and

2.6.9-34/

I reproduced the same error, first in a GULM setup with GNBD export:

* 2 GNBD servers connected by FC to a volume.
* 4 client nodes importing the GFS volume via gnbd_import.

I mounted the volume on the I/O servers and on the client nodes, 6 mounts in total.
I then ran several dd processes on each node to generate I/O.

During my tests, I got I/O errors on the GFS servers and on the clients:

/var/log/messages-Aug  8 13:26:55 gfs5 kernel: dd 7149 called gnbd_end_request
with an error
/var/log/messages-Aug  8 13:26:55 gfs5 kernel: end_request: I/O error, dev
gnbd0, sector 208
/var/log/messages-Aug  8 13:26:55 gfs5 kernel: GFS: fsid=genopol:gfs.3: fatal:
I/O error
/var/log/messages-Aug  8 13:26:55 gfs5 kernel: GFS: fsid=genopol:gfs.3:   block = 26
/var/log/messages-Aug  8 13:26:55 gfs5 kernel: GFS: fsid=genopol:gfs.3:  
function = gfs_dreread
/var/log/messages:Aug  8 13:26:55 gfs5 kernel: GFS: fsid=genopol:gfs.3:   file =
/usr/src/build/762247-x86_64/BUILD/gfs-kernel-2.6.9-49/smp/src/gfs/dio.c, line = 576
/var/log/messages-Aug  8 13:26:55 gfs5 kernel: GFS: fsid=genopol:gfs.3:   time =
1155036415
/var/log/messages-Aug  8 13:26:55 gfs5 kernel: GFS: fsid=genopol:gfs.3: about to
withdraw from the cluster
/var/log/messages-Aug  8 13:26:55 gfs5 kernel: GFS: fsid=genopol:gfs.3: waiting
for outstanding I/O
/var/log/messages-Aug  8 13:26:55 gfs5 kernel: GFS: fsid=genopol:gfs.3: telling
LM to withdraw
/var/log/messages-Aug  8 13:26:57 gfs5 lock_gulmd_core[2199]: "GFS Kernel
Interface" is logged out. fd:12
/var/log/messages-Aug  8 13:26:57 gfs5 kernel: GFS: fsid=genopol:gfs.3: withdrawn
/var/log/messages-Aug  8 13:27:11 gfs5 gnbd_monitor[6378]: ERROR
[gnbd_monitor.c:557] gnbd_recvd failed (1)
/var/log/messages-Aug  8 13:26:57 gfs7 kernel: dd 7208 called gnbd_end_request
with an error
/var/log/messages-Aug  8 13:26:57 gfs7 kernel: end_request: I/O error, dev
gnbd0, sector 192
/var/log/messages-Aug  8 13:26:57 gfs7 kernel: GFS: fsid=genopol:gfs.5: fatal:
I/O error
/var/log/messages-Aug  8 13:26:57 gfs7 kernel: GFS: fsid=genopol:gfs.5:   block = 24
/var/log/messages-Aug  8 13:26:57 gfs7 kernel: GFS: fsid=genopol:gfs.5:  
function = gfs_dreread
/var/log/messages:Aug  8 13:26:57 gfs7 kernel: GFS: fsid=genopol:gfs.5:   file =
/usr/src/build/762247-x86_64/BUILD/gfs-kernel-2.6.9-49/smp/src/gfs/dio.c, line = 576
/var/log/messages-Aug  8 13:26:57 gfs7 kernel: GFS: fsid=genopol:gfs.5:   time =
1155036417
/var/log/messages-Aug  8 13:26:57 gfs7 kernel: dd 7208 called gnbd_end_request
with an error
/var/log/messages-Aug  8 13:26:57 gfs7 kernel: end_request: I/O error, dev
gnbd0, sector 208
/var/log/messages-Aug  8 13:26:57 gfs7 kernel: GFS: fsid=genopol:gfs.5: about to
withdraw from the cluster
/var/log/messages-Aug  8 13:26:57 gfs7 kernel: GFS: fsid=genopol:gfs.5: waiting
for outstanding I/O
/var/log/messages-Aug  8 13:26:57 gfs7 kernel: GFS: fsid=genopol:gfs.5: telling
LM to withdraw
/var/log/messages-Aug  8 13:26:59 gfs7 lock_gulmd_core[2368]: "GFS Kernel
Interface" is logged out. fd:11
/var/log/messages-Aug  8 08:42:59 gfs1 gnbd_serv[10355]: server process 12778
exited because of signal 15
/var/log/messages-Aug  8 08:42:59 gfs1 gnbd_serv[10355]: server process 12779
exited with 1
/var/log/messages-Aug  8 08:42:59 gfs1 kernel: GFS: fsid=genopol:gfs.0: fatal:
I/O error
/var/log/messages-Aug  8 08:42:59 gfs1 kernel: GFS: fsid=genopol:gfs.0:   block
= 12587008
/var/log/messages-Aug  8 08:42:59 gfs1 kernel: GFS: fsid=genopol:gfs.0:  
function = gfs_logbh_wait
/var/log/messages:Aug  8 08:42:59 gfs1 kernel: GFS: fsid=genopol:gfs.0:   file =
/usr/src/build/758962-x86_64/BUILD/gfs-kernel-2.6.9-57/smp/src/gfs/dio.c, line = 923
/var/log/messages-Aug  8 08:42:59 gfs1 kernel: GFS: fsid=genopol:gfs.0:   time =
1155019379
/var/log/messages-Aug  8 08:42:59 gfs1 kernel: GFS: fsid=genopol:gfs.0: about to
withdraw from the cluster
/var/log/messages-Aug  8 08:42:59 gfs1 kernel: GFS: fsid=genopol:gfs.0: waiting
for outstanding I/O
/var/log/messages-Aug  8 08:42:59 gfs1 kernel: GFS: fsid=genopol:gfs.0: telling
LM to withdraw
/var/log/messages-Aug  8 08:43:01 gfs1 lock_gulmd_core[2656]: "GFS Kernel
Interface" is logged out. fd:12
/var/log/messages-Aug  8 08:43:01 gfs1 kernel: GFS: fsid=genopol:gfs.0: withdrawn
/var/log/messages-Aug  8 08:43:04 gfs1 gnbd_serv[10355]: opened external connection
--
/var/log/messages-Aug  8 13:05:19 gfs1 kernel: Device sda not ready.
/var/log/messages-Aug  8 13:05:19 gfs1 kernel: end_request: I/O error, dev sda,
sector 36667048
/var/log/messages-Aug  8 13:05:19 gfs1 kernel: GFS: fsid=genopol:gfs.0: fatal:
I/O error
/var/log/messages-Aug  8 13:05:19 gfs1 kernel: GFS: fsid=genopol:gfs.0:   block
= 4583381
/var/log/messages-Aug  8 13:05:19 gfs1 kernel: GFS: fsid=genopol:gfs.0:  
function = gfs_dreread
/var/log/messages:Aug  8 13:05:19 gfs1 kernel: GFS: fsid=genopol:gfs.0:   file =
/usr/src/build/762247-x86_64/BUILD/gfs-kernel-2.6.9-49/smp/src/gfs/dio.c, line = 576
/var/log/messages-Aug  8 13:05:19 gfs1 kernel: GFS: fsid=genopol:gfs.0:   time =
1155035119
/var/log/messages-Aug  8 13:05:19 gfs1 kernel: GFS: fsid=genopol:gfs.0: about to
withdraw from the cluster
/var/log/messages-Aug  8 13:05:19 gfs1 kernel: GFS: fsid=genopol:gfs.0: waiting
for outstanding I/O
/var/log/messages-Aug  8 13:05:19 gfs1 kernel: GFS: fsid=genopol:gfs.0: telling
LM to withdraw
/var/log/messages-Aug  8 13:05:21 gfs1 lock_gulmd_core[2387]: "GFS Kernel
Interface" is logged out. fd:11
/var/log/messages-Aug  8 13:05:21 gfs1 kernel: GFS: fsid=genopol:gfs.0: withdrawn
/var/log/messages-Aug  8 13:05:24 gfs1 gnbd_serv[4843]: opened external connection
/var/log/messages-Aug  8 08:43:21 gfs2 kernel: Buffer I/O error on device
diapered_sda, logical block 655436
/var/log/messages-Aug  8 08:43:21 gfs2 kernel: lost page write due to I/O error
on diapered_sda
/var/log/messages-Aug  8 08:43:21 gfs2 kernel: GFS: fsid=genopol:gfs.1: fatal:
I/O error
/var/log/messages-Aug  8 08:43:21 gfs2 kernel: GFS: fsid=genopol:gfs.1:   block
= 12618483
/var/log/messages-Aug  8 08:43:21 gfs2 kernel: GFS: fsid=genopol:gfs.1:  
function = gfs_logbh_wait
/var/log/messages:Aug  8 08:43:21 gfs2 kernel: GFS: fsid=genopol:gfs.1:   file =
/usr/src/build/758962-x86_64/BUILD/gfs-kernel-2.6.9-57/smp/src/gfs/dio.c, line = 923
/var/log/messages-Aug  8 08:43:21 gfs2 kernel: GFS: fsid=genopol:gfs.1:   time =
1155019401
/var/log/messages-Aug  8 08:43:21 gfs2 kernel: GFS: fsid=genopol:gfs.1: about to
withdraw from the cluster
/var/log/messages-Aug  8 08:43:21 gfs2 kernel: GFS: fsid=genopol:gfs.1: waiting
for outstanding I/O
/var/log/messages-Aug  8 08:43:21 gfs2 kernel: GFS: fsid=genopol:gfs.1: telling
LM to withdraw
/var/log/messages-Aug  8 08:43:21 gfs2 kernel: GFS: fsid=genopol:gfs.1: jid=0:
Trying to acquire journal lock...
/var/log/messages-Aug  8 08:43:23 gfs2 lock_gulmd_core[2631]: "GFS Kernel
Interface" is logged out. fd:12
/var/log/messages-Aug  8 08:43:23 gfs2 kernel: GFS: fsid=genopol:gfs.1: withdrawn
--
/var/log/messages-Aug  8 10:23:37 gfs2 kernel: Device sda not ready.
/var/log/messages-Aug  8 10:23:37 gfs2 kernel: end_request: I/O error, dev sda,
sector 157803344
/var/log/messages-Aug  8 10:23:37 gfs2 kernel: GFS: fsid=genopol:gfs.0: fatal:
I/O error
/var/log/messages-Aug  8 10:23:37 gfs2 kernel: GFS: fsid=genopol:gfs.0:   block
= 19725418
/var/log/messages-Aug  8 10:23:37 gfs2 kernel: GFS: fsid=genopol:gfs.0:  
function = gfs_dreread
/var/log/messages:Aug  8 10:23:37 gfs2 kernel: GFS: fsid=genopol:gfs.0:   file =
/usr/src/build/758962-x86_64/BUILD/gfs-kernel-2.6.9-57/smp/src/gfs/dio.c, line = 576
/var/log/messages-Aug  8 10:23:37 gfs2 kernel: GFS: fsid=genopol:gfs.0:   time =
1155025417
/var/log/messages-Aug  8 10:23:37 gfs2 kernel: GFS: fsid=genopol:gfs.0: about to
withdraw from the cluster
/var/log/messages-Aug  8 10:23:37 gfs2 kernel: GFS: fsid=genopol:gfs.0: waiting
for outstanding I/O
/var/log/messages-Aug  8 10:23:37 gfs2 kernel: GFS: fsid=genopol:gfs.0: telling
LM to withdraw
/var/log/messages-Aug  8 10:23:39 gfs2 lock_gulmd_core[2629]: "GFS Kernel
Interface" is logged out. fd:11
/var/log/messages-Aug  8 10:23:39 gfs2 kernel: GFS: fsid=genopol:gfs.0: withdrawn
/var/log/messages-Aug  8 10:23:49 gfs2 sshd(pam_unix)[6772]: session opened for
user root by (uid=0)
--
/var/log/messages-Aug  8 17:38:40 gfs2 kernel: Device sda not ready.
/var/log/messages-Aug  8 17:38:40 gfs2 kernel: end_request: I/O error, dev sda,
sector 202343632
/var/log/messages-Aug  8 17:38:40 gfs2 kernel: GFS: fsid=genopol:gfs.1: fatal:
I/O error
/var/log/messages-Aug  8 17:38:40 gfs2 kernel: GFS: fsid=genopol:gfs.1:   block
= 25290975
/var/log/messages-Aug  8 17:38:40 gfs2 kernel: GFS: fsid=genopol:gfs.1:  
function = gfs_ail_empty_trans
/var/log/messages:Aug  8 17:38:40 gfs2 kernel: GFS: fsid=genopol:gfs.1:   file =
/usr/src/build/762247-x86_64/BUILD/gfs-kernel-2.6.9-49/smp/src/gfs/dio.c, line = 346
/var/log/messages-Aug  8 17:38:40 gfs2 kernel: GFS: fsid=genopol:gfs.1:   time =
1155051520
/var/log/messages-Aug  8 17:38:40 gfs2 kernel: GFS: fsid=genopol:gfs.1: about to
withdraw from the cluster
/var/log/messages-Aug  8 17:38:40 gfs2 kernel: GFS: fsid=genopol:gfs.1: waiting
for outstanding I/O
/var/log/messages-Aug  8 17:38:41 gfs2 kernel: GFS: fsid=genopol:gfs.1: telling
LM to withdraw
/var/log/messages-Aug  8 17:38:44 gfs2 kernel: lock_dlm: withdraw abandoned memory
/var/log/messages-Aug  8 17:38:44 gfs2 kernel: GFS: fsid=genopol:gfs.1: withdrawn
--
/var/log/messages.1-Aug  5 00:20:51 gfs2 kernel: Device sda not ready.
/var/log/messages.1-Aug  5 00:20:51 gfs2 kernel: end_request: I/O error, dev
sda, sector 100763272
/var/log/messages.1-Aug  5 00:20:51 gfs2 kernel: GFS: fsid=genopol:gfs.0: fatal:
I/O error
/var/log/messages.1-Aug  5 00:20:51 gfs2 kernel: GFS: fsid=genopol:gfs.0:  
block = 12595417
/var/log/messages.1-Aug  5 00:20:51 gfs2 kernel: GFS: fsid=genopol:gfs.0:  
function = gfs_logbh_wait
/var/log/messages.1:Aug  5 00:20:51 gfs2 kernel: GFS: fsid=genopol:gfs.0:   file
= /usr/src/build/758962-x86_64/BUILD/gfs-kernel-2.6.9-57/smp/src/gfs/dio.c, line
= 923
/var/log/messages.1-Aug  5 00:20:51 gfs2 kernel: GFS: fsid=genopol:gfs.0:   time
= 1154730051
/var/log/messages.1-Aug  5 00:20:51 gfs2 kernel: GFS: fsid=genopol:gfs.0: about
to withdraw from the cluster
/var/log/messages.1-Aug  5 00:20:51 gfs2 kernel: GFS: fsid=genopol:gfs.0:
waiting for outstanding I/O
/var/log/messages.1-Aug  5 00:20:51 gfs2 kernel: GFS: fsid=genopol:gfs.0:
telling LM to withdraw
/var/log/messages.1-Aug  5 00:20:51 gfs2 gnbd_serv[13483]: ERROR [gserv.c:68]
failed reading in do_file_read : Input/output error
/var/log/messages.1-Aug  5 00:20:51 gfs2 kernel: Device sda not ready.
/var/log/messages.1-Aug  5 00:20:51 gfs2 kernel: end_request: I/O error, dev
sda, sector 22420776
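
In several of the excerpts above, the GFS block number lines up with the failing sector if one assumes 512-byte sectors and a 4096-byte GFS block size. Both values are assumptions, since the report does not state them. A minimal sketch of that arithmetic, meant only as a consistency check when reading the logs:

```python
SECTOR_SIZE = 512      # assumption: standard 512-byte sectors
GFS_BLOCK_SIZE = 4096  # assumption: GFS block size, not stated in the report


def sector_to_block(sector, sector_size=SECTOR_SIZE, block_size=GFS_BLOCK_SIZE):
    """Map a device sector number to the file system block that contains it."""
    return sector * sector_size // block_size


# (sector, block) pairs copied from the log excerpts above
pairs = [
    (208, 26),              # gfs5, gnbd0
    (192, 24),              # gfs7, gnbd0
    (36667048, 4583381),    # gfs1, sda
    (157803344, 19725418),  # gfs2, sda
]

for sector, block in pairs:
    print(sector, block, sector_to_block(sector) == block)
```

Not every excerpt pairs up this way: under the same assumptions, sector 202343632 maps to block 25292954, not to the reported block 25290975.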

I had to reboot the whole cluster to remount the GFS volume (a umount followed by a
mount caused a kernel panic!).

So I tried the stable kernel, 2.6.9-34, instead of the beta kernel.
Same error.

So I stopped the GNBD tests and continued with CMAN/DLM, using
just 2 servers connected by FC to a volume.

After some concurrent dd runs, I hit the same problem:

end_request: I/O error, dev sda, sector 202311784
Buffer I/O error on device diapered_sda, logical block 25288973
lost page write due to I/O error on diapered_sda
Buffer I/O error on device diapered_sda, logical block 25288974
lost page write due to I/O error on diapered_sda
Buffer I/O error on device diapered_sda, logical block 25288975
lost page write due to I/O error on diapered_sda
Buffer I/O error on device diapered_sda, logical block 25288976
lost page write due to I/O error on diapered_sda
Buffer I/O error on device diapered_sda, logical block 25288977
lost page write due to I/O error on diapered_sda
Buffer I/O error on device diapered_sda, logical block 25288978
lost page write due to I/O error on diapered_sda
Buffer I/O error on device diapered_sda, logical block 25288979
lost page write due to I/O error on diapered_sda
Buffer I/O error on device diapered_sda, logical block 25288980
lost page write due to I/O error on diapered_sda
Buffer I/O error on device diapered_sda, logical block 25288981
lost page write due to I/O error on diapered_sda
Buffer I/O error on device diapered_sda, logical block 25288982
lost page write due to I/O error on diapered_sda
Device sda not ready.
end_request: I/O error, dev sda, sector 202312808
Device sda not ready.
end_request: I/O error, dev sda, sector 202313832
Device sda not ready.
end_request: I/O error, dev sda, sector 202314856
Device sda not ready.
end_request: I/O error, dev sda, sector 202315880
Device sda not ready.
end_request: I/O error, dev sda, sector 202316904
Device sda not ready.
end_request: I/O error, dev sda, sector 202317928
Device sda not ready.
end_request: I/O error, dev sda, sector 202318952
Device sda not ready.
end_request: I/O error, dev sda, sector 202319976
Device sda not ready.
end_request: I/O error, dev sda, sector 202321000
Device sda not ready.
end_request: I/O error, dev sda, sector 202322024
Device sda not ready.
end_request: I/O error, dev sda, sector 202323048
Device sda not ready.
end_request: I/O error, dev sda, sector 202324072
Device sda not ready.
end_request: I/O error, dev sda, sector 202325096
Device sda not ready.
end_request: I/O error, dev sda, sector 202326120
Device sda not ready.
end_request: I/O error, dev sda, sector 202327144
Device sda not ready.
end_request: I/O error, dev sda, sector 202328272
Device sda not ready.
end_request: I/O error, dev sda, sector 202329296
Device sda not ready.
end_request: I/O error, dev sda, sector 202330320
Device sda not ready.
end_request: I/O error, dev sda, sector 202331344
Device sda not ready.
end_request: I/O error, dev sda, sector 202332368
Device sda not ready.
end_request: I/O error, dev sda, sector 202333392
Device sda not ready.
end_request: I/O error, dev sda, sector 202334416
Device sda not ready.
end_request: I/O error, dev sda, sector 202335440
Device sda not ready.
end_request: I/O error, dev sda, sector 202336464
Device sda not ready.
end_request: I/O error, dev sda, sector 202337488
Device sda not ready.
end_request: I/O error, dev sda, sector 202338512
Device sda not ready.
end_request: I/O error, dev sda, sector 202339536
Device sda not ready.
end_request: I/O error, dev sda, sector 202340560
Device sda not ready.
end_request: I/O error, dev sda, sector 202341584
Device sda not ready.
end_request: I/O error, dev sda, sector 202342608
Device sda not ready.
end_request: I/O error, dev sda, sector 202343632
GFS: fsid=genopol:gfs.1: fatal: I/O error
GFS: fsid=genopol:gfs.1:   block = 25290975
GFS: fsid=genopol:gfs.1:   function = gfs_ail_empty_trans
GFS: fsid=genopol:gfs.1:   file =
/usr/src/build/762247-x86_64/BUILD/gfs-kernel-2.6.9-49/smp/src/gfs/dio.c, line = 346
GFS: fsid=genopol:gfs.1:   time = 1155051520
GFS: fsid=genopol:gfs.1: about to withdraw from the cluster
GFS: fsid=genopol:gfs.1: waiting for outstanding I/O
GFS: fsid=genopol:gfs.1: telling LM to withdraw
lock_dlm: withdraw abandoned memory
GFS: fsid=genopol:gfs.1: withdrawn
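
The run of end_request errors above can be condensed into a per-device span, which makes it easier to see how much of the disk was affected before GFS withdrew. A minimal sketch, assuming the log lines are unwrapped as in this excerpt; the function name and the shortened sample are illustrative only:

```python
import re

# Matches kernel lines such as:
#   end_request: I/O error, dev sda, sector 202311784
SECTOR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failed_sector_span(log_text):
    """Return {device: (error_count, lowest_sector, highest_sector)}."""
    spans = {}
    for dev, sector in SECTOR_RE.findall(log_text):
        sector = int(sector)
        count, low, high = spans.get(dev, (0, sector, sector))
        spans[dev] = (count + 1, min(low, sector), max(high, sector))
    return spans


# Shortened sample: first, second, and last error lines of the excerpt above
sample = """\
end_request: I/O error, dev sda, sector 202311784
Device sda not ready.
end_request: I/O error, dev sda, sector 202312808
Device sda not ready.
end_request: I/O error, dev sda, sector 202343632
"""

print(failed_sector_span(sample))
```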

Any help or ideas on how to set up a stable GFS system?

Comment 1 Kiersten (Kerri) Anderson 2006-08-09 22:01:46 UTC
What storage array are you using in this configuration?  Are you able to run
concurrent dd's from both nodes to the exported LUNs?  Start with one LUN per
node and validate that your storage is stable.  Then combine the LUNs with
cluster volume management.  If your storage array is not able to handle
concurrent writes/reads from both nodes at the same time, then the file system
will not be able to get the data it needs to operate.
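
The kind of concurrent write/read check described here can be approximated with a small script. This is a minimal sketch, not a replacement for dd against the raw LUN. The function names are hypothetical, the demo writes to a scratch temp file, not a device, and read-back may be served from the page cache instead of the array:

```python
import os
import tempfile
import threading

CHUNK = 256 * 1024  # bytes per write; arbitrary choice for this sketch


def write_and_verify(path, offset, count, errors):
    """Write `count` chunks at `offset`, then read them back and compare."""
    pattern = bytes([offset // CHUNK % 251 + 1]) * CHUNK
    try:
        with open(path, "r+b") as f:
            f.seek(offset)
            for _ in range(count):
                f.write(pattern)
            f.flush()
            os.fsync(f.fileno())  # push the data down to the device
            f.seek(offset)
            for _ in range(count):
                if f.read(CHUNK) != pattern:
                    errors.append((offset, "data mismatch"))
                    return
    except OSError as exc:  # e.g. EIO reported by the block layer
        errors.append((offset, str(exc)))


def run_test(path, workers=4, count=4):
    """Run `workers` concurrent writers on disjoint regions of `path`."""
    errors = []
    threads = [
        threading.Thread(target=write_and_verify,
                         args=(path, i * count * CHUNK, count, errors))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


# Demo against a scratch file; substitute a dedicated test LUN with care.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    scratch = tmp.name
try:
    print(run_test(scratch) or "no errors")
finally:
    os.unlink(scratch)
```

To follow the advice above, the same check would be started on each node at the same time, first against one LUN per node and only then against the shared volume.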

Comment 2 Kiersten (Kerri) Anderson 2006-09-20 15:53:30 UTC
Closing this as not a bug.  The problems appear to be with the underlying
storage, and with no further information available, it looks like the cluster
software behaved correctly.

Comment 3 Nate Straz 2007-12-13 17:42:33 UTC
Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5 which never existed.

Comment 4 Nate Straz 2007-12-19 20:06:36 UTC
Moving all closed bugs to gfs-kmod to match the rpm name.  GFS-kernel will be
removed.