Bug 164331
Summary: | fatal: filesystem consistency error during umount of GFS | |
---|---|---|---
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Henry Harris <henry.harris>
Component: | gfs | Assignee: | Abhijith Das <adas>
Status: | CLOSED ERRATA | QA Contact: | GFS Bugs <gfs-bugs>
Severity: | medium | Priority: | medium
Version: | 4 | CC: | axel.thimm, kanderso, nobody+wcheng, nstraz, rkenna
Hardware: | All | OS: | Linux
Fixed In Version: | RHBA-2006-0234 | Doc Type: | Bug Fix
Last Closed: | 2006-03-09 19:45:59 UTC | Bug Blocks: | 164915
Description
Henry Harris
2005-07-27 00:22:50 UTC
I appear to have reproduced this on the x86_64 link cluster (on link-01, specifically) while running regression tests. One node (link-08) was shot by link-01 after missing heartbeats, and I then started cleaning up the cluster in order to start the tests again. I attempted to umount the GFS filesystem on link-01 and it then oopsed:

GFS: fsid=LINK_128:vedder.0: fatal: filesystem consistency error
GFS: fsid=LINK_128:vedder.0: function = trans_go_xmote_bh
GFS: fsid=LINK_128:vedder.0: file = /usr/src/build/614138-x86_64/BUILD/gfs-kernel-2.6.9-42/smp/src/gfs/glops.c, line = 542
GFS: fsid=LINK_128:vedder.0: time = 1127213653
GFS: fsid=LINK_128:vedder.0: about to withdraw from the cluster
GFS: fsid=LINK_128:vedder.0: waiting for outstanding I/O
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lm:190
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in: gnbd(U) lock_nolock(U) gfs(U) lock_gulm(U) lock_harness(U) md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core dm_mod ohci_hcd hw_random tg3 floppy ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Pid: 14991, comm: gulm_Cb_Handler Tainted: G M 2.6.9-20.ELsmp
RIP: 0010:[<ffffffffa01ec767>] <ffffffffa01ec767>{:gfs:gfs_lm_withdraw+215}
RSP: 0018:000001002da6fc58 EFLAGS: 00010202
RAX: 0000000000000039 RBX: ffffff00001e48b8 RCX: 0000000100000000
RDX: ffffffff803d7748 RSI: 0000000000000246 RDI: ffffffff803d7740
RBP: ffffff00001ac000 R08: ffffffff803d7748 R09: ffffff00001e48b8
R10: ffffffff8011de14 R11: ffffffff8011de14 R12: 000001003bac67bc
R13: 000001003bac6790 R14: ffffff00001ac000 R15: 0000000000000003
FS: 0000002a95574b00(0000) GS:ffffffff804d2f00(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000004570b6 CR3: 000000003ff38000 CR4: 00000000000006e0
Process gulm_Cb_Handler (pid: 14991, threadinfo 000001002da6e000, task 00000100363a2030)
Stack: 0000003000000030 000001002da6fd68 000001002da6fc78 00000000054c7d70
0000000000000000 0000000000000000 ffffff00001e48b8 ffffff00001e48b8
ffffffffa0205520 ffffff00001e48b8
Call Trace:<ffffffffa0204c60>{:gfs:gfs_consist_i+45} <ffffffffa01e572b>{:gfs:trans_go_xmote_bh+154}
<ffffffffa01e3793>{:gfs:xmote_bh+897} <ffffffffa01e5122>{:gfs:gfs_glock_cb+194}
<ffffffffa01c723a>{:lock_gulm:handler+394} <ffffffff80132dcd>{default_wake_function+0}
<ffffffff80132dcd>{default_wake_function+0} <ffffffff80131bad>{finish_task_switch+55}
<ffffffff80110ca3>{child_rip+8} <ffffffffa01c70b0>{:lock_gulm:handler+0}
<ffffffff80110c9b>{child_rip+0}
Code: 0f 0b 71 86 20 a0 ff ff ff ff be 00 8b 85 98 88 03 00 85 c0
RIP <ffffffffa01ec767>{:gfs:gfs_lm_withdraw+215} RSP <000001002da6fc58>
<0>Sep 20 05:54:13 link-01 sshd(pam_unix)[16671]: session opened for user root by (uid=0)
Sep 20 05:54:13 link-01 kernel: GFS: fsid=LINK_128:vedder.0: fatal: filesystem consistency error
Sep 20 05:54:13 link-01 kernel: GFS: fsid=LINK_128:vedder.0: function = trans_go_xmote_bh
Sep 20 05:54:13 link-01 kernel: GFS: fsid=LINK_128:vedder.0: file = /usr/src/build/614138-x86_64/BUILD/gfs-kernel-2.6.9-42/smp/src/gfs/glops.c, line = 542
Sep 20 05:54:13 link-01 kernel: GFS: fsid=LINK_128:vedder.0: time = 1127213653
Sep 20 05:54:13 link-01 kernel: GFS: fsid=LINK_128:vedder.0: about to withdraw from the cluster
Sep 20 05:54:13 link-01 kernel: GFS: fsid=LINK_128:vedder.0: waiting for outstanding I/O
Message from syslogd@link-01 at Tue Sep 20 05:54:13 2005 ...
link-01 kernel: invalid operand: 0000 [1] SMP
Kernel panic - not syncing: Oops

FYI - hit this oops again today while trying to unmount a GFS filesystem on link-02.

Looks similar to bug #169693.

*** Bug 175539 has been marked as a duplicate of this bug. ***

*** Bug 169693 has been marked as a duplicate of this bug. ***

I hit this again today after finishing up testing on 2.6.9-22.0.2.EL.

Jan 17 23:16:31 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: fatal: filesystem consistency error
Jan 17 23:16:31 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: function = trans_go_xmote_bh
Jan 17 23:16:31 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: file = /usr/src/build/678343-i686/BUILD/gfs-kernel-2.6.9-45/up/src/gfs/glops.c, line = 542
Jan 17 23:16:31 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: time = 1137561391
Jan 17 23:16:31 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: about to withdraw from the cluster
Jan 17 23:16:31 tank-05 kernel: GFS: fsid=tank-cluster:gfs0.4: waiting for outstanding I/O
Jan 17 23:16:31 tank-05 kernel: ------------[ cut here ]------------
Jan 17 23:16:31 tank-05 kernel: kernel BUG at /usr/src/build/678343-i686/BUILD/gfs-kernel-2.6.9-45/up/src/gfs/lm.c:190!
Jan 17 23:16:31 tank-05 kernel: invalid operand: 0000 [#1]
Jan 17 23:16:31 tank-05 kernel: Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) parport_pc lp parport autofs4 i2c_dev i2c_core dlm(U) cman(U) md5 ipv6 sunrpc button battery ac uhci_hcd hw_random shpchp e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Jan 17 23:16:31 tank-05 kernel: CPU: 0
Jan 17 23:16:31 tank-05 kernel: EIP: 0060:[<f8ced988>] Not tainted VLI
Jan 17 23:16:31 tank-05 kernel: EFLAGS: 00010202 (2.6.9-22.0.2.EL)
Jan 17 23:16:31 tank-05 kernel: EIP is at gfs_lm_withdraw+0x50/0xbc [gfs]
Jan 17 23:16:31 tank-05 kernel: eax: 0000003b ebx: f8c81890 ecx: f8d0e755 edx: f7067e40
Jan 17 23:16:31 tank-05 kernel: esi: f8c6d000 edi: 000004a0 ebp: f8c6d000 esp: f7067e54
Jan 17 23:16:31 tank-05 kernel: ds: 007b es: 007b ss: 0068
Jan 17 23:16:31 tank-05 kernel: Process lock_dlm1 (pid: 3639, threadinfo=f7067000 task=f6474cd0)
Jan 17 23:16:31 tank-05 kernel: Stack: f8c6d000 f6ac86e0 f8d0a999 f8c6d000 f8d12aeb f8c81890 f8c81890 f8d0b20c
Jan 17 23:16:31 tank-05 kernel: f8c81890 f8d0d477 0000021e f8c81890 43cdcf2f f8ce4b12 f8d0d477 0000021e
Jan 17 23:16:31 tank-05 kernel: 01161970 00000008 00000000 00000000 00000320 00000000 00000000 00000000
Jan 17 23:16:31 tank-05 kernel: Call Trace:
Jan 17 23:16:31 tank-05 kernel: [<f8d0a999>] gfs_consist_i+0x24/0x28 [gfs]
Jan 17 23:16:31 tank-05 kernel: [<f8ce4b12>] trans_go_xmote_bh+0x86/0xbc [gfs]
Jan 17 23:16:31 tank-05 kernel: [<f8ce00d3>] xmote_bh+0x660/0x7a1 [gfs]
Jan 17 23:16:31 tank-05 kernel: [<f8ce252b>] gfs_glock_cb+0xa2/0x12f [gfs]
Jan 17 23:16:31 tank-05 kernel: [<f8c99ae0>] process_complete+0x3af/0x3b7 [lock_dlm]
Jan 17 23:16:31 tank-05 kernel: [<f8c99e79>] dlm_async+0x391/0x416 [lock_dlm]
Jan 17 23:16:31 tank-05 kernel: [<c011cf22>] default_wake_function+0x0/0xc
Jan 17 23:16:31 tank-05 kernel: [<c030e08f>] schedule+0x43f/0x552
Jan 17 23:16:31 tank-05 kernel: [<c011cf22>] default_wake_function+0x0/0xc
Jan 17 23:16:31 tank-05 kernel: [<f8c99ae8>] dlm_async+0x0/0x416 [lock_dlm]
Jan 17 23:16:31 tank-05 kernel: [<c013972d>] kthread+0x69/0x91
Jan 17 23:16:31 tank-05 kernel: [<c01396c4>] kthread+0x0/0x91
Jan 17 23:16:31 tank-05 kernel: [<c01041d9>] kernel_thread_helper+0x5/0xb
Jan 17 23:16:31 tank-05 kernel: Code: ff 74 24 14 e8 a4 3c 43 c7 53 68 23 e7 d0 f8 e8 88 3c 43 c7 53 68 55 e7 d0 f8 e8 7d 3c 43 c7 83 c4 18 83 be 34 02 00 00 00 74 08 <0f> 0b be 00 59 e6 d0 f8 8b 86 70 48 01 00 85 c0 74 1b b8 00 f0

I hit this after running on 2.6.9-31.ELsmp.

Feb 14 01:17:13 morph-04 kernel: GFS: fsid=morph-cluster:gfs0.0: fatal: filesystem consistency error
Feb 14 01:17:13 morph-04 kernel: GFS: fsid=morph-cluster:gfs0.0: function = trans_go_xmote_bh
Feb 14 01:17:13 morph-04 kernel: GFS: fsid=morph-cluster:gfs0.0: file = /usr/src/build/700436-i686/BUILD/gfs-kernel-2.6.9-48/smp/src/gfs/glops.c, line = 542
Feb 14 01:17:13 morph-04 kernel: GFS: fsid=morph-cluster:gfs0.0: time = 1139901433
Feb 14 01:17:13 morph-04 kernel: GFS: fsid=morph-cluster:gfs0.0: about to withdraw from the cluster
Feb 14 01:17:13 morph-04 kernel: GFS: fsid=morph-cluster:gfs0.0: waiting for outstanding I/O
Feb 14 01:17:13 morph-04 kernel: ------------[ cut here ]------------
Feb 14 01:17:13 morph-04 kernel: kernel BUG at /usr/src/build/700436-i686/BUILD/gfs-kernel-2.6.9-48/smp/src/gfs/lm.c:190!
Feb 14 01:17:13 morph-04 kernel: invalid operand: 0000 [#1]
Feb 14 01:17:13 morph-04 kernel: SMP
Feb 14 01:17:13 morph-04 kernel: Modules linked in: lock_dlm(U) parport_pc lp parport autofs4 i2c_dev i2c_core gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc button battery ac uhci_hcd e7xxx_edac edac_mc hw_random e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Feb 14 01:17:13 morph-04 kernel: CPU: 1
Feb 14 01:17:13 morph-04 kernel: EIP: 0060:[<f8cd14a7>] Not tainted VLI
Feb 14 01:17:13 morph-04 kernel: EFLAGS: 00010202 (2.6.9-31.ELsmp)
Feb 14 01:17:13 morph-04 kernel: EIP is at gfs_lm_withdraw+0x51/0xc0 [gfs]
Feb 14 01:17:13 morph-04 kernel: eax: 0000003c ebx: f8c87730 ecx: f6863e34 edx: f8ced5e5
Feb 14 01:17:13 morph-04 kernel: esi: f8c63000 edi: f6d9eed0 ebp: f7e41f10 esp: f6863e48
Feb 14 01:17:13 morph-04 kernel: ds: 007b es: 007b ss: 0068
Feb 14 01:17:13 morph-04 kernel: Process lock_dlm1 (pid: 4073, threadinfo=f6863000 task=f74c5430)
Feb 14 01:17:13 morph-04 kernel: Stack: f8c63000 f7e41e64 f8ce9ef7 f8c63000 f8cf0b25 f8c87730 f8c87730 f8cea738
Feb 14 01:17:13 morph-04 kernel: f8c87730 f8cec505 0000021e f8c87730 43f183f9 f8cca65c f8cec505 0000021e
Feb 14 01:17:13 morph-04 kernel: 01161970 00000008 00000000 00000000 00000320 00000000 00000000 00000000
Feb 14 01:17:13 morph-04 kernel: Call Trace:
Feb 14 01:17:13 morph-04 kernel: [<f8ce9ef7>] gfs_consist_i+0x24/0x28 [gfs]
Feb 14 01:17:13 morph-04 kernel: [<f8cca65c>] trans_go_xmote_bh+0x86/0xbc [gfs]
Feb 14 01:17:13 morph-04 kernel: [<f8cc7404>] xmote_bh+0x312/0x3ab [gfs]
Feb 14 01:17:13 morph-04 kernel: [<f8cc8adc>] gfs_glock_cb+0xa3/0x131 [gfs]
Feb 14 01:17:13 morph-04 kernel: [<f8c9d6dd>] process_complete+0x3b7/0x3bf [lock_dlm]
Feb 14 01:17:13 morph-04 kernel: [<f8c9d95b>] dlm_async+0x276/0x2ff [lock_dlm]
Feb 14 01:17:13 morph-04 kernel: [<c011e6fb>] default_wake_function+0x0/0xc
Feb 14 01:17:13 morph-04 kernel: [<c011e6fb>] default_wake_function+0x0/0xc
Feb 14 01:17:13 morph-04 kernel: [<f8c9d6e5>] dlm_async+0x0/0x2ff [lock_dlm]
Feb 14 01:17:13 morph-04 kernel: [<c0133ead>] kthread+0x73/0x9b
Feb 14 01:17:13 morph-04 kernel: [<c0133e3a>] kthread+0x0/0x9b
Feb 14 01:17:13 morph-04 kernel: [<c01041f5>] kernel_thread_helper+0x5/0xb
Feb 14 01:17:13 morph-04 kernel: Code: ff 74 24 14 e8 a6 11 45 c7 53 68 b3 d5 ce f8 e8 8a 11 45 c7 53 68 e5 d5 ce f8 e8 7f 11 45 c7 83 c4 18 83 be 34 02 00 00 00 74 08 <0f> 0b be 00 e8 d4 ce f8 8b 86 10 47 02 00 85 c0 74 1c ba 02 00

Wendy's fix for this bug is in CVS. When granting the exclusive lock, gfs_glock_cb() expects that all other threads have relinquished their writes and that the journal has been flushed and shut down. Otherwise it aborts the call and forces a filesystem consistency error. The current umount code (gfs_put_super) doesn't follow this logic: it does flushes without a log shutdown before the exclusive lock is requested. The patch works around this issue by relocating the flushes into the gfs_make_fs_ro() call itself, after the gfs_glock_nq_init().

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0234.html
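
For anyone comparing the oopses quoted in this report against their own logs, here is a minimal Python sketch that pulls the function, file, line and time fields out of the GFS consistency-error header and converts the time value to UTC. The helper name parse_consistency_error and the regular expressions are illustrative assumptions, not part of GFS or of any Red Hat tool; the sample lines are copied from the first oops above.

```python
import re
from datetime import datetime, timezone

# Hypothetical patterns for the GFS header lines quoted in this report.
# search() is used, so an optional syslog prefix ("Sep 20 ... kernel: ") is fine.
PREFIX = r"GFS: fsid=(?P<fsid>\S+): "
FUNCTION_RE = re.compile(PREFIX + r"function = (?P<function>\w+)")
FILE_RE = re.compile(PREFIX + r"file = (?P<file>\S+), line = (?P<line>\d+)")
TIME_RE = re.compile(PREFIX + r"time = (?P<time>\d+)")


def parse_consistency_error(lines):
    """Return a dict with fsid, function, file, line and time (as UTC datetime).

    Fields that do not appear in the input are simply absent from the result.
    """
    report = {}
    for text in lines:
        m = FUNCTION_RE.search(text)
        if m:
            report["fsid"] = m.group("fsid")
            report["function"] = m.group("function")
            continue
        m = FILE_RE.search(text)
        if m:
            report["fsid"] = m.group("fsid")
            report["file"] = m.group("file")
            report["line"] = int(m.group("line"))
            continue
        m = TIME_RE.search(text)
        if m:
            report["fsid"] = m.group("fsid")
            # The time field in these logs is seconds since the Unix epoch.
            report["time"] = datetime.fromtimestamp(
                int(m.group("time")), tz=timezone.utc
            )
    return report


# Header lines copied from the first oops in this report.
SAMPLE = [
    "GFS: fsid=LINK_128:vedder.0: fatal: filesystem consistency error",
    "GFS: fsid=LINK_128:vedder.0: function = trans_go_xmote_bh",
    "GFS: fsid=LINK_128:vedder.0: file = /usr/src/build/614138-x86_64/BUILD/"
    "gfs-kernel-2.6.9-42/smp/src/gfs/glops.c, line = 542",
    "GFS: fsid=LINK_128:vedder.0: time = 1127213653",
]

if __name__ == "__main__":
    for key, value in parse_consistency_error(SAMPLE).items():
        print(f"{key}: {value}")
```

For the sample, the time field decodes to 2005-09-20 10:54:13 UTC, which agrees to the minute and second with the syslog stamp "Sep 20 05:54:13"; the five-hour difference is presumably the local timezone of link-01. All three oopses in this report point at the same function and source line (trans_go_xmote_bh, glops.c line 542), which is what makes them recognizable as duplicates.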