Bug 712139
Summary: GFS2: Update to rhel6.1 broke dovecot writing to a gfs2 filesystem
Product: Red Hat Enterprise Linux 6
Reporter: Aliet <alietsantiesteban>
Component: kernel
Assignee: Steve Whitehouse <swhiteho>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: urgent
Version: 6.1
CC: adas, adellape, adrew, akaiser, anprice, bmarzins, capelle, ccaulfie, cluster-maint, Colin.Simpson, cww, gasmith, hjia, ianauati, jeder, joshua, jwest, lhh, liko, nicolas, nstraz, rdassen, rmitchel, rpeterso, sergey, shiyer, swhiteho, teigland
Target Milestone: rc
Keywords: Regression, ZStream
Hardware: x86_64
OS: Linux
Fixed In Version: kernel-2.6.32-166.el6
Doc Type: Bug Fix
Doc Text:
When responsibility for deallocation is passed from one node to another, the receiving node may not have fully up-to-date inode state. If the sending node has changed the relevant parts of that state in the meantime (block allocation/deallocation), an assert is triggered during deallocation on the receiving node. This is easily fixed with fsck, and the only data lost is the file that was being deallocated in the first place.
There is no workaround beyond avoiding unlinking inodes.
Last Closed: 2011-12-06 13:38:17 UTC
Bug Blocks: 699956, 713949, 714982
Description
Aliet
2011-06-09 15:01:36 UTC
Error message when tested with plock_ownership in cluster.conf:

<dlm enable_plock="1" plock_ownership="1" plock_rate_limit="0"/>
<gfs_controld enable_plock="1" plock_ownership="1" plock_rate_limit="0"/>

We have found this related to: https://bugzilla.redhat.com/show_bug.cgi?id=678585

Jun 8 18:42:20 n02 kernel: INFO: task imap:3731 blocked for more than 120 seconds. Jun 8 18:42:20 n02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 8 18:42:20 n02 kernel: imap D 0000000000000000 0 3731 3274 0x00000084 Jun 8 18:42:20 n02 kernel: ffff880067a3bd98 0000000000000082 ffff880067a3bd60 ffff880067a3bd5c Jun 8 18:42:20 n02 kernel: 0000000000000246 ffff88007f823680 ffff880002015f80 0000000100049fa5 Jun 8 18:42:20 n02 kernel: ffff8800679f26b8 ffff880067a3bfd8 000000000000f598 ffff8800679f26b8 Jun 8 18:42:20 n02 kernel: Call Trace: Jun 8 18:42:20 n02 kernel: [<ffffffff8108e3ee>] ? prepare_to_wait+0x4e/0x80 Jun 8 18:42:20 n02 kernel: [<ffffffff8108e3a1>] ? prepare_to_wait+0x1/0x80 Jun 8 18:42:20 n02 kernel: [<ffffffffa02aab05>] dlm_posix_lock+0x1b5/0x2d0 [dlm] Jun 8 18:42:20 n02 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffffa02e1c6b>] gfs2_lock+0x7b/0xf0 [gfs2] Jun 8 18:42:20 n02 kernel: [<ffffffff811bc243>] vfs_lock_file+0x23/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffff811bc497>] fcntl_setlk+0x177/0x320 Jun 8 18:42:20 n02 kernel: [<ffffffff811845f7>] sys_fcntl+0x197/0x530 Jun 8 18:42:20 n02 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Jun 8 18:42:20 n02 kernel: INFO: task imap:3732 blocked for more than 120 seconds. Jun 8 18:42:20 n02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 8 18:42:20 n02 kernel: imap D 0000000000000000 0 3732 3274 0x00000084 Jun 8 18:42:20 n02 kernel: ffff880067a41d98 0000000000000082 0000000000000000 ffff880067a41d5c Jun 8 18:42:20 n02 kernel: 0000000000000246 ffff88007f823480 ffff880002015f80 0000000100049fa9 Jun 8 18:42:20 n02 kernel: ffff880067a3fa78 ffff880067a41fd8 000000000000f598 ffff880067a3fa78 Jun 8 18:42:20 n02 kernel: Call Trace: Jun 8 18:42:20 n02 kernel: [<ffffffff8108e3ee>] ? prepare_to_wait+0x4e/0x80 Jun 8 18:42:20 n02 kernel: [<ffffffff8108e3a1>] ? prepare_to_wait+0x1/0x80 Jun 8 18:42:20 n02 kernel: [<ffffffffa02aab05>] dlm_posix_lock+0x1b5/0x2d0 [dlm] Jun 8 18:42:20 n02 kernel: [<ffffffff811381c8>] ? handle_mm_fault+0x1d8/0x2a0 Jun 8 18:42:20 n02 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffffa02e1c6b>] gfs2_lock+0x7b/0xf0 [gfs2] Jun 8 18:42:20 n02 kernel: [<ffffffff811bc243>] vfs_lock_file+0x23/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffff811bc497>] fcntl_setlk+0x177/0x320 Jun 8 18:42:20 n02 kernel: [<ffffffff811845f7>] sys_fcntl+0x197/0x530 Jun 8 18:42:20 n02 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Jun 8 18:42:20 n02 kernel: INFO: task imap:3742 blocked for more than 120 seconds. Jun 8 18:42:20 n02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 8 18:42:20 n02 kernel: imap D 0000000000000003 0 3742 3274 0x00000084 Jun 8 18:42:20 n02 kernel: ffff880067bc5d98 0000000000000086 ffff880067bc5d60 ffff880067bc5d5c Jun 8 18:42:20 n02 kernel: 0000000000000246 ffff88007f823680 ffff880002015f80 0000000100049fa9 Jun 8 18:42:20 n02 kernel: ffff880067ba10f8 ffff880067bc5fd8 000000000000f598 ffff880067ba10f8 Jun 8 18:42:20 n02 kernel: Call Trace: Jun 8 18:42:20 n02 kernel: [<ffffffff8108e3ee>] ? prepare_to_wait+0x4e/0x80 Jun 8 18:42:20 n02 kernel: [<ffffffff814da486>] ? 
schedule+0x6/0x3a9 Jun 8 18:42:20 n02 kernel: [<ffffffffa02aab05>] dlm_posix_lock+0x1b5/0x2d0 [dlm] Jun 8 18:42:20 n02 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffffa02e1c6b>] gfs2_lock+0x7b/0xf0 [gfs2] Jun 8 18:42:20 n02 kernel: [<ffffffff811bc243>] vfs_lock_file+0x23/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffff811bc497>] fcntl_setlk+0x177/0x320 Jun 8 18:42:20 n02 kernel: [<ffffffff811845f7>] sys_fcntl+0x197/0x530 Jun 8 18:42:20 n02 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Jun 8 18:42:20 n02 kernel: INFO: task imap:3759 blocked for more than 120 seconds. Jun 8 18:42:20 n02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 8 18:42:20 n02 kernel: imap D 0000000000000003 0 3759 3274 0x00000084 Jun 8 18:42:20 n02 kernel: ffff880065afdd98 0000000000000082 ffff880065afdd60 ffff880065afdd5c Jun 8 18:42:20 n02 kernel: 0000000000000246 ffff88007f823680 ffff880002015f80 0000000100049fc0 Jun 8 18:42:20 n02 kernel: ffff88007953a6b8 ffff880065afdfd8 000000000000f598 ffff88007953a6b8 Jun 8 18:42:20 n02 kernel: Call Trace: Jun 8 18:42:20 n02 kernel: [<ffffffff8108e3ee>] ? prepare_to_wait+0x4e/0x80 Jun 8 18:42:20 n02 kernel: [<ffffffffa02aab05>] dlm_posix_lock+0x1b5/0x2d0 [dlm] Jun 8 18:42:20 n02 kernel: [<ffffffff811381c8>] ? handle_mm_fault+0x1d8/0x2a0 Jun 8 18:42:20 n02 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffffa02e1c6b>] gfs2_lock+0x7b/0xf0 [gfs2] Jun 8 18:42:20 n02 kernel: [<ffffffff811bc243>] vfs_lock_file+0x23/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffff811bc497>] fcntl_setlk+0x177/0x320 Jun 8 18:42:20 n02 kernel: [<ffffffff811845f7>] sys_fcntl+0x197/0x530 Jun 8 18:42:20 n02 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Jun 8 18:42:20 n02 kernel: INFO: task imap:3762 blocked for more than 120 seconds. 
Jun 8 18:42:20 n02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 8 18:42:20 n02 kernel: imap D 0000000000000003 0 3762 3274 0x00000084 Jun 8 18:42:20 n02 kernel: ffff880067973d98 0000000000000086 ffff880067973d60 ffff880067973d5c Jun 8 18:42:20 n02 kernel: 0000000000000246 ffff88007f823680 ffff880002015f80 0000000100049fc0 Jun 8 18:42:20 n02 kernel: ffff8800792db078 ffff880067973fd8 000000000000f598 ffff8800792db078 Jun 8 18:42:20 n02 kernel: Call Trace: Jun 8 18:42:20 n02 kernel: [<ffffffff8108e3ee>] ? prepare_to_wait+0x4e/0x80 Jun 8 18:42:20 n02 kernel: [<ffffffffa02aab05>] dlm_posix_lock+0x1b5/0x2d0 [dlm] Jun 8 18:42:20 n02 kernel: [<ffffffff811381c8>] ? handle_mm_fault+0x1d8/0x2a0 Jun 8 18:42:20 n02 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffffa02e1c6b>] gfs2_lock+0x7b/0xf0 [gfs2] Jun 8 18:42:20 n02 kernel: [<ffffffff811bc243>] vfs_lock_file+0x23/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffff811bc497>] fcntl_setlk+0x177/0x320 Jun 8 18:42:20 n02 kernel: [<ffffffff811845f7>] sys_fcntl+0x197/0x530 Jun 8 18:42:20 n02 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Jun 8 18:42:20 n02 kernel: INFO: task imap:3773 blocked for more than 120 seconds. Jun 8 18:42:20 n02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 8 18:42:20 n02 kernel: imap D 0000000000000003 0 3773 3274 0x00000084 Jun 8 18:42:20 n02 kernel: ffff88006797fd98 0000000000000082 0000000000000001 ffff880065835de0 Jun 8 18:42:20 n02 kernel: 0000000000000246 0000000000000000 ffffffffa02b7120 0000000000000246 Jun 8 18:42:20 n02 kernel: ffff8800658d30b8 ffff88006797ffd8 000000000000f598 ffff8800658d30b8 Jun 8 18:42:20 n02 kernel: Call Trace: Jun 8 18:42:20 n02 kernel: [<ffffffff8108e3ee>] ? prepare_to_wait+0x4e/0x80 Jun 8 18:42:20 n02 kernel: [<ffffffffa02aab05>] dlm_posix_lock+0x1b5/0x2d0 [dlm] Jun 8 18:42:20 n02 kernel: [<ffffffff811381c8>] ? 
handle_mm_fault+0x1d8/0x2a0 Jun 8 18:42:20 n02 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffffa02e1c6b>] gfs2_lock+0x7b/0xf0 [gfs2] Jun 8 18:42:20 n02 kernel: [<ffffffff811bc243>] vfs_lock_file+0x23/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffff811bc497>] fcntl_setlk+0x177/0x320 Jun 8 18:42:20 n02 kernel: [<ffffffff811845f7>] sys_fcntl+0x197/0x530 Jun 8 18:42:20 n02 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Jun 8 18:42:20 n02 kernel: INFO: task imap:3793 blocked for more than 120 seconds. Jun 8 18:42:20 n02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 8 18:42:20 n02 kernel: imap D 0000000000000000 0 3793 3274 0x00000084 Jun 8 18:42:20 n02 kernel: ffff880065835d98 0000000000000086 0000000000000001 ffff880065834000 Jun 8 18:42:20 n02 kernel: 0000000000000080 00000000c0000100 0000000000000002 ffff880065835dc8 Jun 8 18:42:20 n02 kernel: ffff880065833ab8 ffff880065835fd8 000000000000f598 ffff880065833ab8 Jun 8 18:42:20 n02 kernel: Call Trace: Jun 8 18:42:20 n02 kernel: [<ffffffff8108e3ee>] ? prepare_to_wait+0x4e/0x80 Jun 8 18:42:20 n02 kernel: [<ffffffffa02aab05>] dlm_posix_lock+0x1b5/0x2d0 [dlm] Jun 8 18:42:20 n02 kernel: [<ffffffff811381c8>] ? handle_mm_fault+0x1d8/0x2a0 Jun 8 18:42:20 n02 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffffa02e1c6b>] gfs2_lock+0x7b/0xf0 [gfs2] Jun 8 18:42:20 n02 kernel: [<ffffffff811bc243>] vfs_lock_file+0x23/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffff811bc497>] fcntl_setlk+0x177/0x320 Jun 8 18:42:20 n02 kernel: [<ffffffff811845f7>] sys_fcntl+0x197/0x530 Jun 8 18:42:20 n02 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Jun 8 18:42:20 n02 kernel: INFO: task imap:3853 blocked for more than 120 seconds. Jun 8 18:42:20 n02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Jun 8 18:42:20 n02 kernel: imap D 0000000000000000 0 3853 3274 0x00000084 Jun 8 18:42:20 n02 kernel: ffff880063e21d98 0000000000000086 ffff880063e21d60 ffff880063e21d5c Jun 8 18:42:20 n02 kernel: 0000000000000246 ffff88007f823480 ffff880002015f80 0000000100049fb4 Jun 8 18:42:20 n02 kernel: ffff8800679a90b8 ffff880063e21fd8 000000000000f598 ffff8800679a90b8 Jun 8 18:42:20 n02 kernel: Call Trace: Jun 8 18:42:20 n02 kernel: [<ffffffff8108e3ee>] ? prepare_to_wait+0x4e/0x80 Jun 8 18:42:20 n02 kernel: [<ffffffffa02aab05>] dlm_posix_lock+0x1b5/0x2d0 [dlm] Jun 8 18:42:20 n02 kernel: [<ffffffff811381c8>] ? handle_mm_fault+0x1d8/0x2a0 Jun 8 18:42:20 n02 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffffa02e1c6b>] gfs2_lock+0x7b/0xf0 [gfs2] Jun 8 18:42:20 n02 kernel: [<ffffffff811bc243>] vfs_lock_file+0x23/0x40 Jun 8 18:42:20 n02 kernel: [<ffffffff811bc497>] fcntl_setlk+0x177/0x320 Jun 8 18:42:20 n02 kernel: [<ffffffff811845f7>] sys_fcntl+0x197/0x530 Jun 8 18:42:20 n02 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Jun 8 18:44:20 n02 kernel: INFO: task imap:3731 blocked for more than 120 seconds. Jun 8 18:44:20 n02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 8 18:44:20 n02 kernel: imap D 0000000000000000 0 3731 3274 0x00000084 Jun 8 18:44:20 n02 kernel: ffff880067a3bd98 0000000000000082 ffff880067a3bd60 ffff880067a3bd5c Jun 8 18:44:20 n02 kernel: 0000000000000246 ffff88007f823680 ffff880002015f80 0000000100049fa5 Jun 8 18:44:20 n02 kernel: ffff8800679f26b8 ffff880067a3bfd8 000000000000f598 ffff8800679f26b8 Jun 8 18:44:20 n02 kernel: Call Trace: Jun 8 18:44:20 n02 kernel: [<ffffffff8108e3ee>] ? prepare_to_wait+0x4e/0x80 Jun 8 18:44:20 n02 kernel: [<ffffffff8108e3a1>] ? prepare_to_wait+0x1/0x80 Jun 8 18:44:20 n02 kernel: [<ffffffffa02aab05>] dlm_posix_lock+0x1b5/0x2d0 [dlm] Jun 8 18:44:20 n02 kernel: [<ffffffff8108e100>] ? 
autoremove_wake_function+0x0/0x40 Jun 8 18:44:20 n02 kernel: [<ffffffffa02e1c6b>] gfs2_lock+0x7b/0xf0 [gfs2] Jun 8 18:44:20 n02 kernel: [<ffffffff811bc243>] vfs_lock_file+0x23/0x40 Jun 8 18:44:20 n02 kernel: [<ffffffff811bc497>] fcntl_setlk+0x177/0x320 Jun 8 18:44:20 n02 kernel: [<ffffffff811845f7>] sys_fcntl+0x197/0x530 Jun 8 18:44:20 n02 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Jun 8 18:44:20 n02 kernel: INFO: task imap:3732 blocked for more than 120 seconds. Jun 8 18:44:20 n02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 8 18:44:20 n02 kernel: imap D 0000000000000000 0 3732 3274 0x00000084 Jun 8 18:44:20 n02 kernel: ffff880067a41d98 0000000000000082 0000000000000000 ffff880067a41d5c Jun 8 18:44:20 n02 kernel: 0000000000000246 ffff88007f823480 ffff880002015f80 0000000100049fa9 Jun 8 18:44:20 n02 kernel: ffff880067a3fa78 ffff880067a41fd8 000000000000f598 ffff880067a3fa78 Jun 8 18:44:20 n02 kernel: Call Trace: Jun 8 18:44:20 n02 kernel: [<ffffffff8108e3ee>] ? prepare_to_wait+0x4e/0x80 Jun 8 18:44:20 n02 kernel: [<ffffffff8108e3a1>] ? prepare_to_wait+0x1/0x80 Jun 8 18:44:20 n02 kernel: [<ffffffffa02aab05>] dlm_posix_lock+0x1b5/0x2d0 [dlm] Jun 8 18:44:20 n02 kernel: [<ffffffff811381c8>] ? handle_mm_fault+0x1d8/0x2a0 Jun 8 18:44:20 n02 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 Jun 8 18:44:20 n02 kernel: [<ffffffffa02e1c6b>] gfs2_lock+0x7b/0xf0 [gfs2] Jun 8 18:44:20 n02 kernel: [<ffffffff811bc243>] vfs_lock_file+0x23/0x40 Jun 8 18:44:20 n02 kernel: [<ffffffff811bc497>] fcntl_setlk+0x177/0x320 Jun 8 18:44:20 n02 kernel: [<ffffffff811845f7>] sys_fcntl+0x197/0x530 Jun 8 18:44:20 n02 kernel: [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

Hi Aliet, can you confirm whether or not you have a Red Hat support agreement in place? I can't find any record of one here, but maybe something got missed at our end. Can you also tell me which region of the world you are in (and/or what company you work for)?

Yes, we have active support agreements. I will log in to the portal with our company login and request a support ticket for this bug. We are in the Americas (Mexico); our company is Ultra Division Digital...

Further investigating this bug, I have tested all kinds of dovecot configurations, and all of them hang GFS2. I have tested this scenario with a bare-metal hardware cluster, with virtualized cluster guests on VMware ESXi 4.1, and with a test cluster in VMware Workstation, and I can reproduce the problem in all of these environments. We are testing whether dovecot can be deployed on a Red Hat cluster of active-active nodes with user session persistence. This was my latest test; I simplified the scenario to a cluster on my own laptop:

1- Used a two-node RHEL 6.1 cluster, virtualized in VMware Workstation.
2- Used two shared iSCSI devices from a NAS.
3- Used fence_scsi.

cluster.conf:

<?xml version="1.0"?>
<cluster config_version="9" name="MailCluster">
  <clusternodes>
    <clusternode name="node0.local" nodeid="1">
      <fence>
        <method name="fn_mt_scsi">
          <device name="fn_scsi"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="fn_scsi"/>
      </unfence>
    </clusternode>
    <clusternode name="node1.local" nodeid="2">
      <fence>
        <method name="fn_mt_scsi">
          <device name="fn_scsi"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="fn_scsi"/>
      </unfence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_scsi" logfile="/var/log/cluster/fence_scsi.log" name="fn_scsi"/>
  </fencedevices>
</cluster>

4- Used the iSCSI devices for the LVM setup and created the GFS2 filesystems there.
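As a side note, step 4 could be scripted roughly as follows. This sketch is not from the report: the iSCSI device paths (/dev/sdb, /dev/sdc) and volume sizes are assumptions; only the VG/LV names and the MailCluster:indexes / MailCluster:mailbox filesystem IDs come from the configuration shown elsewhere in this report. The script prints the commands rather than executing them, since they would touch real block devices.

```shell
#!/bin/sh
# Print (do not run) a plausible provisioning sequence for the two GFS2
# filesystems. /dev/sdb and /dev/sdc are hypothetical iSCSI devices;
# -j 2 allocates one journal per cluster node.
cat <<'EOF'
pvcreate /dev/sdb /dev/sdc
vgcreate vg_indexes /dev/sdb
vgcreate vg_mailbox /dev/sdc
lvcreate -l 100%FREE -n lv_indexes vg_indexes
lvcreate -l 100%FREE -n lv_mailbox vg_mailbox
mkfs.gfs2 -p lock_dlm -t MailCluster:indexes -j 2 /dev/vg_indexes/lv_indexes
mkfs.gfs2 -p lock_dlm -t MailCluster:mailbox -j 2 /dev/vg_mailbox/lv_mailbox
EOF
```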
fstab fragment:

# GFS2 filesystem
/dev/vg_indexes/lv_indexes /var/vmail/indexes gfs2 noatime,quota=off,errors=withdraw 0 0
/dev/vg_mailbox/lv_mailbox /var/vmail/mailbox gfs2 noatime,quota=off,errors=withdraw 0 0

5- Dovecot configured with users in LDAP. In this case we tested the mbox mailbox format with fcntl locking and mmap_disable=yes; we have also tested all other mailbox formats, with indexes and mailboxes stored on the GFS2 filesystems. Here is the configuration:

[root@node0 ~]# dovecot -n
# 2.0.9: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-131.2.1.el6.x86_64 x86_64 Red Hat Enterprise Linux Server release 6.1 (Santiago) gfs2
auth_default_realm = example.com
auth_mechanisms = plain login
auth_worker_max_count = 60
disable_plaintext_auth = no
listen = *
mail_fsync = always
mail_gid = vmail
mail_location = mbox:/var/vmail/mailbox/%d/%3n/%n:INDEX=/var/vmail/indexes/%d/%3n/%n
mail_nfs_index = yes
mail_nfs_storage = yes
mail_uid = vmail
mbox_write_locks = fcntl
mmap_disable = yes
passdb {
  args = /etc/dovecot/dovecot-ldap.conf.ext
  driver = ldap
}
ssl_cert = </etc/pki/dovecot/certs/dovecot.pem
ssl_key = </etc/pki/dovecot/private/dovecot.pem
userdb {
  args = /etc/dovecot/dovecot-ldap-userdb.conf.ext
  driver = ldap
}
[root@node0 ~]#

6- Started the dovecot service on the nodes.
7- From another host, tested the first node with imaptest:

imaptest host=192.168.164.95 userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123

8- Repeated many times against that node.
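The repeated imaptest runs against each node can be driven by a small script; this is a sketch, not from the report. The node addresses and imaptest arguments are copied verbatim from the commands above; the run count is an assumption (the report says "repeated many times"). With DRY_RUN=1 (the default here) it only prints the commands.

```shell
#!/bin/sh
# Drive the imaptest reproduction loop: several runs against node0,
# then against node1. DRY_RUN=1 prints commands instead of running them.
NODES="192.168.164.95 192.168.164.96"
RUNS=4   # assumed run count
for node in $NODES; do
  n=1
  while [ "$n" -le "$RUNS" ]; do
    cmd="imaptest host=$node userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123"
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$cmd"; else $cmd; fi
    n=$((n + 1))
  done
done
```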
9- Ran the test against the second node:

imaptest host=192.168.164.96 userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123

10- First node hangs:

GFS2: fsid=MailCluster:indexes.0: fatal: filesystem consistency error GFS2: fsid=MailCluster:indexes.0: inode = 468 525144 GFS2: fsid=MailCluster:indexes.0: function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, line = 352 GFS2: fsid=MailCluster:indexes.0: about to withdraw this file system GFS2: fsid=MailCluster:indexes.0: telling LM to unmount GFS2: fsid=MailCluster:indexes.0: withdrawn Pid: 3808, comm: delete_workqueu Not tainted 2.6.32-131.2.1.el6.x86_64 #1 Call Trace: [<ffffffffa064bfd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2] [<ffffffffa0621209>] ? trunc_dealloc+0xa9/0x130 [gfs2] [<ffffffffa064c1dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2] [<ffffffffa0631584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2] [<ffffffffa064a1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2] [<ffffffffa064a0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2] [<ffffffffa064a020>] ? gfs2_delete_inode+0x0/0x280 [gfs2] [<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0 [<ffffffffa062e940>] ? delete_work_func+0x0/0x80 [gfs2] [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80 [<ffffffffa0648c4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2] [<ffffffff8118bf82>] ? iput+0x62/0x70 [<ffffffffa062e994>] ? delete_work_func+0x54/0x80 [gfs2] [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0 [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 [<ffffffff81088660>] ? worker_thread+0x0/0x2a0 [<ffffffff8108dd96>] ? kthread+0x96/0xa0 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20 [<ffffffff8108dd00>] ? kthread+0x0/0xa0 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 no_formal_ino = 468 no_addr = 525144 i_disksize = 65536 blocks = 0 i_goal = 525170 i_diskflags = 0x00000000 i_height = 1 i_depth = 0 i_entries = 0 i_eattr = 0 GFS2: fsid=MailCluster:indexes.0: gfs2_delete_inode: -5

I changed to different mailbox formats; they also hang, only the kernel messages differ slightly from the first post. Any ideas?

Best regards

Today I reproduced the problem again: ran imaptest four times against the first node (node0), then ran it against the second node (node1); GFS2 on node0 hangs.

Jun 12 14:29:43 node0 kernel: GFS2: fsid=MailCluster:indexes.0: fatal: filesystem consistency error Jun 12 14:29:43 node0 kernel: GFS2: fsid=MailCluster:indexes.0: inode = 449 655889 Jun 12 14:29:43 node0 kernel: GFS2: fsid=MailCluster:indexes.0: function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, line = 352 Jun 12 14:29:43 node0 kernel: GFS2: fsid=MailCluster:indexes.0: about to withdraw this file system Jun 12 14:29:43 node0 kernel: GFS2: fsid=MailCluster:indexes.0: telling LM to unmount Jun 12 14:29:43 node0 kernel: GFS2: fsid=MailCluster:indexes.0: withdrawn Jun 12 14:29:43 node0 kernel: Pid: 9310, comm: delete_workqueu Not tainted 2.6.32-131.2.1.el6.x86_64 #1 Jun 12 14:29:43 node0 kernel: Call Trace: Jun 12 14:29:43 node0 kernel: [<ffffffffa0734fd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2] Jun 12 14:29:43 node0 kernel: [<ffffffffa070a209>] ? trunc_dealloc+0xa9/0x130 [gfs2] Jun 12 14:29:43 node0 kernel: [<ffffffffa07351dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2] Jun 12 14:29:43 node0 kernel: [<ffffffffa071a584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2] Jun 12 14:29:43 node0 kernel: [<ffffffffa07331da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2] Jun 12 14:29:43 node0 kernel: [<ffffffffa07330ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2] Jun 12 14:29:43 node0 kernel: [<ffffffffa0733020>] ? gfs2_delete_inode+0x0/0x280 [gfs2] Jun 12 14:29:43 node0 kernel: [<ffffffff8118cfbe>] ?
generic_delete_inode+0xde/0x1d0 Jun 12 14:29:43 node0 kernel: [<ffffffffa0717940>] ? delete_work_func+0x0/0x80 [gfs2] Jun 12 14:29:43 node0 kernel: [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80 Jun 12 14:29:43 node0 kernel: [<ffffffffa0731c4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2] Jun 12 14:29:43 node0 kernel: [<ffffffff8118bf82>] ? iput+0x62/0x70 Jun 12 14:29:43 node0 kernel: [<ffffffffa0717994>] ? delete_work_func+0x54/0x80 [gfs2] Jun 12 14:29:43 node0 kernel: [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0 Jun 12 14:29:43 node0 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 Jun 12 14:29:43 node0 kernel: [<ffffffff81088660>] ? worker_thread+0x0/0x2a0 Jun 12 14:29:43 node0 kernel: [<ffffffff8108dd96>] ? kthread+0x96/0xa0 Jun 12 14:29:43 node0 kernel: [<ffffffff81088660>] ? worker_thread+0x0/0x2a0 Jun 12 14:29:43 node0 kernel: [<ffffffff8100c1ca>] ? child_rip+0xa/0x20 Jun 12 14:29:43 node0 kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0 Jun 12 14:29:43 node0 kernel: [<ffffffff8100c1c0>] ? 
child_rip+0x0/0x20 Jun 12 14:29:43 node0 kernel: no_formal_ino = 449 Jun 12 14:29:43 node0 kernel: no_addr = 655889 Jun 12 14:29:43 node0 kernel: i_disksize = 65536 Jun 12 14:29:43 node0 kernel: blocks = 0 Jun 12 14:29:43 node0 kernel: i_goal = 721647 Jun 12 14:29:43 node0 kernel: i_diskflags = 0x00000000 Jun 12 14:29:43 node0 kernel: i_height = 1 Jun 12 14:29:43 node0 kernel: i_depth = 0 Jun 12 14:29:43 node0 kernel: i_entries = 0 Jun 12 14:29:43 node0 kernel: i_eattr = 0 Jun 12 14:29:43 node0 kernel: GFS2: fsid=MailCluster:indexes.0: gfs2_delete_inode: -5 Jun 12 14:29:43 node0 kernel: gdlm_unlock 5,102d0 err=-22 Jun 12 14:29:43 node0 kernel: gdlm_unlock 5,80021 err=-22 Jun 12 14:29:43 node0 kernel: gdlm_unlock 5,e0010 err=-22 Jun 12 14:29:43 node0 kernel: gdlm_unlock 2,10d79 err=-22 Jun 12 14:29:43 node0 kernel: gdlm_unlock 5,10d79 err=-22 Jun 12 14:29:43 node0 kernel: gdlm_unlock 2,102c8 err=-22 Jun 12 14:29:43 node0 kernel: gdlm_unlock 5,102c8 err=-22 Jun 12 14:29:43 node0 kernel: gdlm_unlock 5,b0058 err=-22 Jun 12 14:29:43 node0 kernel: gdlm_unlock 5,b0021 err=-22 Jun 12 14:29:43 node0 kernel: gdlm_unlock 5,a0013 err=-22 Jun 12 14:29:43 node0 kernel: gdlm_unlock 5,b0056 err=-22 Jun 12 14:29:43 node0 kernel: gdlm_unlock 5,a0211 err=-22

dlm_controld fragment:

Jun 12 14:29:43 dlm_controld uevent: offline@/kernel/dlm/indexes Jun 12 14:29:43 dlm_controld kernel: offline@ indexes Jun 12 14:29:43 dlm_controld dlm:ls:indexes conf 1 0 1 memb 2 join left 1 Jun 12 14:29:43 dlm_controld indexes confchg for our leave Jun 12 14:29:43 dlm_controld indexes stop_kernel cg 0 Jun 12 14:29:43 dlm_controld write "0" to "/sys/kernel/dlm/indexes/control" Jun 12 14:29:43 dlm_controld dir_member 2 Jun 12 14:29:43 dlm_controld dir_member 1 Jun 12 14:29:43 dlm_controld set_members rmdir "/sys/kernel/config/dlm/cluster/spaces/indexes/nodes/2" Jun 12 14:29:43 dlm_controld set_members rmdir "/sys/kernel/config/dlm/cluster/spaces/indexes/nodes/1" Jun 12 14:29:43 dlm_controld set_members lockspace rmdir "/sys/kernel/config/dlm/cluster/spaces/indexes" Jun 12 14:29:43 dlm_controld write "0" to "/sys/kernel/dlm/indexes/event_done" Jun 12 14:29:43 dlm_controld uevent: remove@/kernel/dlm/indexes Jun 12 14:29:43 dlm_controld kernel: remove@ indexes

I noticed this at the top of the dlm_controld log; is it relevant?

Jun 12 13:47:57 dlm_controld found /dev/misc/dlm-control minor 56 Jun 12 13:47:57 dlm_controld found /dev/misc/dlm-monitor minor 55 Jun 12 13:47:57 dlm_controld found /dev/misc/dlm_plock minor 54 Jun 12 13:47:57 dlm_controld /dev/misc/dlm-monitor fd 12 Jun 12 13:47:57 dlm_controld /sys/kernel/config/dlm/cluster/comms: opendir failed: 2 Jun 12 13:47:57 dlm_controld /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2

Have you unmounted on all nodes and run fsck.gfs2 between these runs? If so, what output did you get from fsck?

OK, I can always reproduce this problem; new results with fsck included. Steps:

1- From a test station, ran imaptest against node0 four times:

imaptest host=192.168.164.95 userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123

2- Results OK, no GFS2 hangs or corruption.
3- From a test station, ran only one test against node1 (second node):

imaptest host=192.168.164.96 userfile=userfile port=143 mbox=mail/dovecot-crlf no_tracking logout=0 clients=20 secs=30 seed=123

4- Node0 (first node) reports GFS2 corruption and hangs with withdraw.
5- Restarted both nodes, unmounted the GFS2 filesystems, and ran fsck; output:

[root@node0 ~]# fsck -y /dev/vg_indexes/lv_indexes
fsck from util-linux-ng 2.17.2 Initializing fsck Validating Resource Group index. Level 1 rgrp check: Checking if all rgrp and rindex values are good. (level 1 passed) Error: resource group 65555 (0x10013): free space (64116) does not match bitmap (64117) (1 blocks were reclaimed) The rgrp was fixed.
RGs: Consistent: 19 Inconsistent: 1 Fixed: 1 Total: 20 Starting pass1 Pass1 complete Starting pass1b Pass1b complete Starting pass1c Pass1c complete Starting pass2 Pass2 complete Starting pass3 Pass3 complete Starting pass4 Pass4 complete Starting pass5 Ondisk and fsck bitmaps differ at block 67618 (0x10822) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67619 (0x10823) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67620 (0x10824) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67621 (0x10825) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67628 (0x1082c) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67629 (0x1082d) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67652 (0x10844) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67707 (0x1087b) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67731 (0x10893) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67775 (0x108bf) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67786 (0x108ca) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. 
Ondisk and fsck bitmaps differ at block 67818 (0x108ea) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67850 (0x1090a) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67867 (0x1091b) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67884 (0x1092c) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67918 (0x1094e) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 67940 (0x10964) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. RG #65555 (0x10013) free count inconsistent: is 64117 should be 64134 Inode count inconsistent: is 107 should be 106 Resource group counts updated Ondisk and fsck bitmaps differ at block 720999 (0xb0067) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. RG #720905 (0xb0009) free count inconsistent: is 65491 should be 65492 Resource group counts updated Pass5 complete The statfs file is wrong: Current statfs values: blocks: 1310564 (0x13ff64) free: 1242848 (0x12f6e0) dinodes: 157 (0x9d) Calculated statfs values: blocks: 1310564 (0x13ff64) free: 1242867 (0x12f6f3) dinodes: 156 (0x9c) The statfs file was fixed. Writing changes to disk gfs2_fsck complete [root@node0 ~]# [root@node0 ~]# fsck -y /dev/vg_mailbox/lv_mailbox fsck from util-linux-ng 2.17.2 Initializing fsck Validating Resource Group index. Level 1 rgrp check: Checking if all rgrp and rindex values are good. 
(level 1 passed) Starting pass1 Pass1 complete Starting pass1b Pass1b complete Starting pass1c Pass1c complete Starting pass2 Pass2 complete Starting pass3 Pass3 complete Starting pass4 Pass4 complete Starting pass5 Pass5 complete gfs2_fsck complete [root@node0 ~]#

Logs in node0 (/var/log/messages):

Jun 13 10:28:00 node0 kernel: GFS2: fsid=MailCluster:indexes.0: fatal: filesystem consistency error Jun 13 10:28:00 node0 kernel: GFS2: fsid=MailCluster:indexes.0: inode = 1523 67617 Jun 13 10:28:00 node0 kernel: GFS2: fsid=MailCluster:indexes.0: function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, line = 352 Jun 13 10:28:00 node0 kernel: GFS2: fsid=MailCluster:indexes.0: about to withdraw this file system Jun 13 10:28:00 node0 kernel: GFS2: fsid=MailCluster:indexes.0: telling LM to unmount Jun 13 10:28:00 node0 kernel: GFS2: fsid=MailCluster:indexes.0: withdrawn Jun 13 10:28:00 node0 kernel: Pid: 2210, comm: delete_workqueu Not tainted 2.6.32-131.2.1.el6.x86_64 #1 Jun 13 10:28:00 node0 kernel: Call Trace: Jun 13 10:28:00 node0 kernel: [<ffffffffa0537fd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2] Jun 13 10:28:00 node0 kernel: [<ffffffffa050d209>] ? trunc_dealloc+0xa9/0x130 [gfs2] Jun 13 10:28:00 node0 kernel: [<ffffffffa05381dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2] Jun 13 10:28:00 node0 kernel: [<ffffffffa051d584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2] Jun 13 10:28:00 node0 kernel: [<ffffffffa05361da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2] Jun 13 10:28:00 node0 kernel: [<ffffffffa05360ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2] Jun 13 10:28:00 node0 kernel: [<ffffffffa0536020>] ? gfs2_delete_inode+0x0/0x280 [gfs2] Jun 13 10:28:00 node0 kernel: [<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0 Jun 13 10:28:00 node0 kernel: [<ffffffffa051a940>] ? delete_work_func+0x0/0x80 [gfs2] Jun 13 10:28:00 node0 kernel: [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80 Jun 13 10:28:00 node0 kernel: [<ffffffffa0534c4e>] ?
gfs2_drop_inode+0x2e/0x30 [gfs2] Jun 13 10:28:00 node0 kernel: [<ffffffff8118bf82>] ? iput+0x62/0x70 Jun 13 10:28:00 node0 kernel: [<ffffffffa051a994>] ? delete_work_func+0x54/0x80 [gfs2] Jun 13 10:28:00 node0 kernel: [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0 Jun 13 10:28:00 node0 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 Jun 13 10:28:00 node0 kernel: [<ffffffff81088660>] ? worker_thread+0x0/0x2a0 Jun 13 10:28:00 node0 kernel: [<ffffffff8108dd96>] ? kthread+0x96/0xa0 Jun 13 10:28:00 node0 kernel: [<ffffffff81088660>] ? worker_thread+0x0/0x2a0 Jun 13 10:28:00 node0 kernel: [<ffffffff8100c1ca>] ? child_rip+0xa/0x20 Jun 13 10:28:00 node0 kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0 Jun 13 10:28:00 node0 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 Jun 13 10:28:00 node0 kernel: no_formal_ino = 1523 Jun 13 10:28:00 node0 kernel: no_addr = 67617 Jun 13 10:28:00 node0 kernel: i_disksize = 65536 Jun 13 10:28:00 node0 kernel: blocks = 0 Jun 13 10:28:00 node0 kernel: i_goal = 67940 Jun 13 10:28:00 node0 kernel: i_diskflags = 0x00000000 Jun 13 10:28:00 node0 kernel: i_height = 1 Jun 13 10:28:00 node0 kernel: i_depth = 0 Jun 13 10:28:00 node0 kernel: i_entries = 0 Jun 13 10:28:00 node0 kernel: i_eattr = 0 Jun 13 10:28:00 node0 kernel: GFS2: fsid=MailCluster:indexes.0: gfs2_delete_inode: -5 Jun 13 10:28:00 node0 kernel: gdlm_unlock 5,10821 err=-22 Jun 13 10:30:13 node0 kernel: INFO: task gfs2_logd:3171 blocked for more than 120 seconds. Jun 13 10:30:13 node0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Jun 13 10:30:13 node0 kernel: gfs2_logd     D 0000000000000000     0  3171      2 0x00000080
Jun 13 10:30:13 node0 kernel: ffff88003b9d1dd0 0000000000000046 0000000000000004 00000000790111cc
Jun 13 10:30:13 node0 kernel: ffff88003cb48950 0000000000000441 ffff88003b9d1d70 ffffffff811a281e
Jun 13 10:30:13 node0 kernel: ffff88003cc78678 ffff88003b9d1fd8 000000000000f598 ffff88003cc78678
Jun 13 10:30:13 node0 kernel: Call Trace:
Jun 13 10:30:13 node0 kernel: [<ffffffff811a281e>] ? submit_bh+0x10e/0x150
Jun 13 10:30:13 node0 kernel: [<ffffffff81098c99>] ? ktime_get_ts+0xa9/0xe0
Jun 13 10:30:13 node0 kernel: [<ffffffff814db013>] io_schedule+0x73/0xc0
Jun 13 10:30:13 node0 kernel: [<ffffffffa052016a>] gfs2_log_flush+0x44a/0x6b0 [gfs2]
Jun 13 10:30:13 node0 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
Jun 13 10:30:13 node0 kernel: [<ffffffffa05204b1>] gfs2_logd+0xe1/0x150 [gfs2]
Jun 13 10:30:13 node0 kernel: [<ffffffffa05203d0>] ? gfs2_logd+0x0/0x150 [gfs2]
Jun 13 10:30:13 node0 kernel: [<ffffffff8108dd96>] kthread+0x96/0xa0
Jun 13 10:30:13 node0 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
Jun 13 10:30:13 node0 kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0
Jun 13 10:30:13 node0 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Jun 13 10:32:13 node0 kernel: INFO: task gfs2_logd:3171 blocked for more than 120 seconds.
Jun 13 10:32:13 node0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 13 10:32:13 node0 kernel: gfs2_logd     D 0000000000000000     0  3171      2 0x00000080
Jun 13 10:32:13 node0 kernel: ffff88003b9d1dd0 0000000000000046 0000000000000004 00000000790111cc
Jun 13 10:32:13 node0 kernel: ffff88003cb48950 0000000000000441 ffff88003b9d1d70 ffffffff811a281e
Jun 13 10:32:13 node0 kernel: ffff88003cc78678 ffff88003b9d1fd8 000000000000f598 ffff88003cc78678
Jun 13 10:32:13 node0 kernel: Call Trace:
Jun 13 10:32:13 node0 kernel: [<ffffffff811a281e>] ? submit_bh+0x10e/0x150
Jun 13 10:32:13 node0 kernel: [<ffffffff81098c99>] ? ktime_get_ts+0xa9/0xe0
Jun 13 10:32:13 node0 kernel: [<ffffffff814db013>] io_schedule+0x73/0xc0
Jun 13 10:32:13 node0 kernel: [<ffffffffa052016a>] gfs2_log_flush+0x44a/0x6b0 [gfs2]
Jun 13 10:32:13 node0 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
Jun 13 10:32:13 node0 kernel: [<ffffffffa05204b1>] gfs2_logd+0xe1/0x150 [gfs2]
Jun 13 10:32:13 node0 kernel: [<ffffffffa05203d0>] ? gfs2_logd+0x0/0x150 [gfs2]
Jun 13 10:32:13 node0 kernel: [<ffffffff8108dd96>] kthread+0x96/0xa0
Jun 13 10:32:13 node0 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
Jun 13 10:32:13 node0 kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0
Jun 13 10:32:13 node0 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Jun 13 10:34:13 node0 kernel: INFO: task gfs2_logd:3171 blocked for more than 120 seconds.
Jun 13 10:34:13 node0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 13 10:34:13 node0 kernel: gfs2_logd     D 0000000000000000     0  3171      2 0x00000080
Jun 13 10:34:13 node0 kernel: ffff88003b9d1dd0 0000000000000046 0000000000000004 00000000790111cc
Jun 13 10:34:13 node0 kernel: ffff88003cb48950 0000000000000441 ffff88003b9d1d70 ffffffff811a281e
Jun 13 10:34:13 node0 kernel: ffff88003cc78678 ffff88003b9d1fd8 000000000000f598 ffff88003cc78678
Jun 13 10:34:13 node0 kernel: Call Trace:
Jun 13 10:34:13 node0 kernel: [<ffffffff811a281e>] ? submit_bh+0x10e/0x150
Jun 13 10:34:13 node0 kernel: [<ffffffff81098c99>] ? ktime_get_ts+0xa9/0xe0
Jun 13 10:34:13 node0 kernel: [<ffffffff814db013>] io_schedule+0x73/0xc0
Jun 13 10:34:13 node0 kernel: [<ffffffffa052016a>] gfs2_log_flush+0x44a/0x6b0 [gfs2]
Jun 13 10:34:13 node0 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
Jun 13 10:34:13 node0 kernel: [<ffffffffa05204b1>] gfs2_logd+0xe1/0x150 [gfs2]
Jun 13 10:34:13 node0 kernel: [<ffffffffa05203d0>] ? gfs2_logd+0x0/0x150 [gfs2]
Jun 13 10:34:13 node0 kernel: [<ffffffff8108dd96>] kthread+0x96/0xa0
Jun 13 10:34:13 node0 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
Jun 13 10:34:13 node0 kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0
Jun 13 10:34:13 node0 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Jun 13 10:36:13 node0 kernel: INFO: task gfs2_logd:3171 blocked for more than 120 seconds.
Jun 13 10:36:13 node0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 13 10:36:13 node0 kernel: gfs2_logd     D 0000000000000000     0  3171      2 0x00000080
Jun 13 10:36:13 node0 kernel: ffff88003b9d1dd0 0000000000000046 0000000000000004 00000000790111cc
Jun 13 10:36:13 node0 kernel: ffff88003cb48950 0000000000000441 ffff88003b9d1d70 ffffffff811a281e
Jun 13 10:36:13 node0 kernel: ffff88003cc78678 ffff88003b9d1fd8 000000000000f598 ffff88003cc78678
Jun 13 10:36:13 node0 kernel: Call Trace:
Jun 13 10:36:13 node0 kernel: [<ffffffff811a281e>] ? submit_bh+0x10e/0x150
Jun 13 10:36:13 node0 kernel: [<ffffffff81098c99>] ? ktime_get_ts+0xa9/0xe0
Jun 13 10:36:13 node0 kernel: [<ffffffff814db013>] io_schedule+0x73/0xc0
Jun 13 10:36:13 node0 kernel: [<ffffffffa052016a>] gfs2_log_flush+0x44a/0x6b0 [gfs2]
Jun 13 10:36:13 node0 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
Jun 13 10:36:13 node0 kernel: [<ffffffffa05204b1>] gfs2_logd+0xe1/0x150 [gfs2]
Jun 13 10:36:13 node0 kernel: [<ffffffffa05203d0>] ? gfs2_logd+0x0/0x150 [gfs2]
Jun 13 10:36:13 node0 kernel: [<ffffffff8108dd96>] kthread+0x96/0xa0
Jun 13 10:36:13 node0 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
Jun 13 10:36:13 node0 kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0
Jun 13 10:36:13 node0 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Jun 13 10:38:13 node0 kernel: INFO: task gfs2_logd:3171 blocked for more than 120 seconds.
Jun 13 10:38:13 node0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 13 10:38:13 node0 kernel: gfs2_logd     D 0000000000000000     0  3171      2 0x00000080
Jun 13 10:38:13 node0 kernel: ffff88003b9d1dd0 0000000000000046 0000000000000004 00000000790111cc
Jun 13 10:38:13 node0 kernel: ffff88003cb48950 0000000000000441 ffff88003b9d1d70 ffffffff811a281e
Jun 13 10:38:13 node0 kernel: ffff88003cc78678 ffff88003b9d1fd8 000000000000f598 ffff88003cc78678
Jun 13 10:38:13 node0 kernel: Call Trace:
Jun 13 10:38:13 node0 kernel: [<ffffffff811a281e>] ? submit_bh+0x10e/0x150
Jun 13 10:38:13 node0 kernel: [<ffffffff81098c99>] ? ktime_get_ts+0xa9/0xe0
Jun 13 10:38:13 node0 kernel: [<ffffffff814db013>] io_schedule+0x73/0xc0
Jun 13 10:38:13 node0 kernel: [<ffffffffa052016a>] gfs2_log_flush+0x44a/0x6b0 [gfs2]
Jun 13 10:38:13 node0 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
Jun 13 10:38:13 node0 kernel: [<ffffffffa05204b1>] gfs2_logd+0xe1/0x150 [gfs2]
Jun 13 10:38:13 node0 kernel: [<ffffffffa05203d0>] ? gfs2_logd+0x0/0x150 [gfs2]
Jun 13 10:38:13 node0 kernel: [<ffffffff8108dd96>] kthread+0x96/0xa0
Jun 13 10:38:13 node0 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
Jun 13 10:38:13 node0 kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0
Jun 13 10:38:13 node0 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Jun 13 10:40:13 node0 kernel: INFO: task gfs2_logd:3171 blocked for more than 120 seconds.
Jun 13 10:40:13 node0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 13 10:40:13 node0 kernel: gfs2_logd     D 0000000000000000     0  3171      2 0x00000080
Jun 13 10:40:13 node0 kernel: ffff88003b9d1dd0 0000000000000046 0000000000000004 00000000790111cc
Jun 13 10:40:13 node0 kernel: ffff88003cb48950 0000000000000441 ffff88003b9d1d70 ffffffff811a281e
Jun 13 10:40:13 node0 kernel: ffff88003cc78678 ffff88003b9d1fd8 000000000000f598 ffff88003cc78678
Jun 13 10:40:13 node0 kernel: Call Trace:
Jun 13 10:40:13 node0 kernel: [<ffffffff811a281e>] ? submit_bh+0x10e/0x150
Jun 13 10:40:13 node0 kernel: [<ffffffff81098c99>] ? ktime_get_ts+0xa9/0xe0
Jun 13 10:40:13 node0 kernel: [<ffffffff814db013>] io_schedule+0x73/0xc0
Jun 13 10:40:13 node0 kernel: [<ffffffffa052016a>] gfs2_log_flush+0x44a/0x6b0 [gfs2]
Jun 13 10:40:13 node0 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
Jun 13 10:40:13 node0 kernel: [<ffffffffa05204b1>] gfs2_logd+0xe1/0x150 [gfs2]
Jun 13 10:40:13 node0 kernel: [<ffffffffa05203d0>] ? gfs2_logd+0x0/0x150 [gfs2]
Jun 13 10:40:13 node0 kernel: [<ffffffff8108dd96>] kthread+0x96/0xa0
Jun 13 10:40:13 node0 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
Jun 13 10:40:13 node0 kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0
Jun 13 10:40:13 node0 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Jun 13 10:42:13 node0 kernel: INFO: task gfs2_logd:3171 blocked for more than 120 seconds.
Jun 13 10:42:13 node0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 13 10:42:13 node0 kernel: gfs2_logd     D 0000000000000000     0  3171      2 0x00000080
Jun 13 10:42:13 node0 kernel: ffff88003b9d1dd0 0000000000000046 0000000000000004 00000000790111cc
Jun 13 10:42:13 node0 kernel: ffff88003cb48950 0000000000000441 ffff88003b9d1d70 ffffffff811a281e
Jun 13 10:42:13 node0 kernel: ffff88003cc78678 ffff88003b9d1fd8 000000000000f598 ffff88003cc78678
Jun 13 10:42:13 node0 kernel: Call Trace:
Jun 13 10:42:13 node0 kernel: [<ffffffff811a281e>] ? submit_bh+0x10e/0x150
Jun 13 10:42:13 node0 kernel: [<ffffffff81098c99>] ? ktime_get_ts+0xa9/0xe0
Jun 13 10:42:13 node0 kernel: [<ffffffff814db013>] io_schedule+0x73/0xc0
Jun 13 10:42:13 node0 kernel: [<ffffffffa052016a>] gfs2_log_flush+0x44a/0x6b0 [gfs2]
Jun 13 10:42:13 node0 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
Jun 13 10:42:13 node0 kernel: [<ffffffffa05204b1>] gfs2_logd+0xe1/0x150 [gfs2]
Jun 13 10:42:13 node0 kernel: [<ffffffffa05203d0>] ? gfs2_logd+0x0/0x150 [gfs2]
Jun 13 10:42:13 node0 kernel: [<ffffffff8108dd96>] kthread+0x96/0xa0
Jun 13 10:42:13 node0 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
Jun 13 10:42:13 node0 kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0
Jun 13 10:42:13 node0 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20

/var/log/cluster/dlm_controld.log

Jun 13 10:28:00 dlm_controld uevent: offline@/kernel/dlm/indexes
Jun 13 10:28:00 dlm_controld kernel: offline@ indexes
Jun 13 10:28:00 dlm_controld dlm:ls:indexes conf 1 0 1 memb 2 join left 1
Jun 13 10:28:00 dlm_controld indexes confchg for our leave
Jun 13 10:28:00 dlm_controld indexes stop_kernel cg 0
Jun 13 10:28:00 dlm_controld write "0" to "/sys/kernel/dlm/indexes/control"
Jun 13 10:28:00 dlm_controld dir_member 2
Jun 13 10:28:00 dlm_controld dir_member 1
Jun 13 10:28:00 dlm_controld set_members rmdir "/sys/kernel/config/dlm/cluster/spaces/indexes/nodes/2"
Jun 13 10:28:00 dlm_controld set_members rmdir "/sys/kernel/config/dlm/cluster/spaces/indexes/nodes/1"
Jun 13 10:28:00 dlm_controld set_members lockspace rmdir "/sys/kernel/config/dlm/cluster/spaces/indexes"
Jun 13 10:28:00 dlm_controld write "0" to "/sys/kernel/dlm/indexes/event_done"
Jun 13 10:28:00 dlm_controld uevent: remove@/kernel/dlm/indexes
Jun 13 10:28:00 dlm_controld kernel: remove@ indexes

The Dovecot developers point to a problem in the GFS2 filesystem. I configured Dovecot following the Dovecot developers' directions for this test:

[root@node0 ~]# dovecot -n
# 2.0.9: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-131.2.1.el6.x86_64 x86_64 Red Hat Enterprise Linux Server release 6.1 (Santiago) ext4
auth_default_realm = example.com
auth_mechanisms = plain login
auth_worker_max_count = 60
disable_plaintext_auth = no
listen = *
mail_gid = vmail
mail_location = mbox:/var/vmail/mailbox/%d/%3n/%n:INDEX=/var/vmail/indexes/%d/%3n/%n
mail_uid = vmail
mbox_write_locks = fcntl
mmap_disable = yes
passdb {
  args = /etc/dovecot/dovecot-ldap.conf.ext
  driver = ldap
}
ssl_cert = </etc/pki/dovecot/certs/dovecot.pem
ssl_key = </etc/pki/dovecot/private/dovecot.pem
userdb {
  args = /etc/dovecot/dovecot-ldap-userdb.conf.ext
  driver = ldap
}
[root@node0 ~]#

I have done another test, this time using a clean install from the RHEL 6.0 DVD (not RHEL 6.1), with no updates, and the setup works fine. I cannot reproduce the error with RHEL 6.0, so something has changed that produces this.

Best regards

It works without crashing with the following versions of the components in RHEL 6.0:

lvm2-2.02.72-8.el6.x86_64
lvm2-cluster-2.02.72-8.el6.x86_64
kernel-2.6.32-71.el6.x86_64
cluster-glue-libs-1.0.5-2.el6.x86_64
device-mapper-libs-1.02.53-8.el6.x86_64
device-mapper-event-libs-1.02.53-8.el6.x86_64
modcluster-0.16.2-10.el6.x86_64
lvm2-libs-2.02.72-8.el6.x86_64
fence-agents-3.0.12-8.el6.x86_64
ricci-0.16.2-13.el6.x86_64
resource-agents-3.0.12-15.el6.x86_64
kernel-firmware-2.6.32-71.el6.noarch
gfs2-utils-3.0.12-23.el6.x86_64
corosynclib-1.2.3-21.el6.x86_64
clusterlib-3.0.12-23.el6.x86_64
openais-1.1.1-6.el6.x86_64
fence-virt-0.2.1-5.el6.x86_64
cman-3.0.12-23.el6.x86_64
rgmanager-3.0.12-10.el6.x86_64
corosync-1.2.3-21.el6.x86_64
openaislib-1.1.1-6.el6.x86_64

(In reply to comment #13)
> It works without crashing with the following versions of the components in
> RHEL 6.0:
> [...]

Hello Aliet,

My name is Adam Drew; I'll be serving as the back-line support engineering resource on your support ticket. Now that we're engaged through support, we'll want to keep the updates coming in through the support ticket. Your case number with Red Hat Support is 00487362. You can use that case to provide Red Hat with information on your ticket. We'll then use the bug report to track progress on resolving the issue at the engineering level.

Bugzilla is not a support tool and carries no formal SLA, so please keep the updates flowing in through support via the ticket. Support and engineering will work together internally and through the bug to help resolve your issue as fast as possible.

Thanks very much,
Adam Drew
Software Maintenance Engineer
Red Hat Support

The original assert that was hit indicates that the problem was an incorrect number of blocks in an inode that was being deallocated. The subsequent assert reported the same problem, but at a slightly different time during the deallocation process. The report from fsck shows that some fixing up of the filesystem has occurred, but in general it was fairly minor, and probably restricted to finishing the removal of whatever inode(s) were being unlinked at the time. We need to try to establish when the block count became incorrect: whether during the deallocation itself, or during some prior operation.

Hi, we can confirm this issue, as we had the same incident on Friday with our production cluster. In our case, we use a GFS2 filesystem to store KVM virtual guest images. The cluster has been in production for 2 months now (RHEL 6.0 back then); this is the first time this type of incident has occurred. Version-release numbers of the components are the same as Aliet's.
--
Jun 10 12:33:44 kvm1 kernel: GFS2: fsid=vmcluster:libvirtimages.0: fatal: filesystem consistency error
Jun 10 12:33:44 kvm1 kernel: GFS2: fsid=vmcluster:libvirtimages.0:   inode = 6 62175922
Jun 10 12:33:44 kvm1 kernel: GFS2: fsid=vmcluster:libvirtimages.0:   function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, line = 352
Jun 10 12:33:44 kvm1 kernel: GFS2: fsid=vmcluster:libvirtimages.0: about to withdraw this file system
Jun 10 12:33:44 kvm1 kernel: GFS2: fsid=vmcluster:libvirtimages.0: telling LM to unmount
Jun 10 12:33:44 kvm1 kernel: GFS2: fsid=vmcluster:libvirtimages.0: withdrawn
Jun 10 12:33:44 kvm1 kernel: Pid: 3227, comm: delete_workqueu Not tainted 2.6.32-131.2.1.el6.x86_64 #1
Jun 10 12:33:44 kvm1 kernel: Call Trace:
Jun 10 12:33:44 kvm1 kernel: [<ffffffffa067cfd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
Jun 10 12:33:44 kvm1 kernel: [<ffffffffa0652209>] ? trunc_dealloc+0xa9/0x130 [gfs2]
Jun 10 12:33:44 kvm1 kernel: [<ffffffffa067d1dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2]
Jun 10 12:33:44 kvm1 kernel: [<ffffffffa0662584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2]
Jun 10 12:33:44 kvm1 kernel: [<ffffffffa067b1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2]
Jun 10 12:33:44 kvm1 kernel: [<ffffffffa067b0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2]
Jun 10 12:33:44 kvm1 kernel: [<ffffffffa067b020>] ? gfs2_delete_inode+0x0/0x280 [gfs2]
Jun 10 12:33:44 kvm1 kernel: [<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0
Jun 10 12:33:44 kvm1 kernel: [<ffffffffa065f940>] ? delete_work_func+0x0/0x80 [gfs2]
Jun 10 12:33:44 kvm1 kernel: [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80
Jun 10 12:33:44 kvm1 kernel: [<ffffffffa0679c4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2]
Jun 10 12:33:44 kvm1 kernel: [<ffffffff8118bf82>] ? iput+0x62/0x70
Jun 10 12:33:44 kvm1 kernel: [<ffffffffa065f994>] ? delete_work_func+0x54/0x80 [gfs2]
Jun 10 12:33:44 kvm1 kernel: [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0
Jun 10 12:33:44 kvm1 kernel: type=1400 audit(1307702024.151:5): avc: denied { read } for pid=9569 comm="dmsetup" name="udev.conf" dev=dm-0 ino=130984 scontext=system_u:system_r:gfs_controld_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file
Jun 10 12:33:44 kvm1 kernel: type=1400 audit(1307702024.151:6): avc: denied { open } for pid=9569 comm="dmsetup" name="udev.conf" dev=dm-0 ino=130984 scontext=system_u:system_r:gfs_controld_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file
Jun 10 12:33:44 kvm1 kernel: type=1400 audit(1307702024.151:7): avc: denied { getattr } for pid=9569 comm="dmsetup" path="/etc/udev/udev.conf" dev=dm-0 ino=130984 scontext=system_u:system_r:gfs_controld_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file
Jun 10 12:33:44 kvm1 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
Jun 10 12:33:44 kvm1 kernel: [<ffffffff81088660>] ? worker_thread+0x0/0x2a0
Jun 10 12:33:44 kvm1 kernel: type=1400 audit(1307702024.152:8): avc: denied { getattr } for pid=9569 comm="dmsetup" path="/dev/dm-13" dev=devtmpfs ino=16600 scontext=system_u:system_r:gfs_controld_t:s0 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
Jun 10 12:33:44 kvm1 kernel: [<ffffffff8108dd96>] ? kthread+0x96/0xa0
Jun 10 12:33:44 kvm1 kernel: [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
Jun 10 12:33:44 kvm1 kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0
Jun 10 12:33:44 kvm1 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Jun 10 12:33:44 kvm1 kernel: no_formal_ino = 6
Jun 10 12:33:44 kvm1 kernel: no_addr = 62175922
Jun 10 12:33:44 kvm1 kernel: i_disksize = 10737418240
Jun 10 12:33:44 kvm1 kernel: blocks = 0
Jun 10 12:33:44 kvm1 kernel: i_goal = 64050649
Jun 10 12:33:44 kvm1 kernel: i_diskflags = 0x00000000
Jun 10 12:33:44 kvm1 kernel: i_height = 3
Jun 10 12:33:44 kvm1 kernel: i_depth = 0
Jun 10 12:33:44 kvm1 kernel: i_entries = 0
Jun 10 12:33:44 kvm1 kernel: i_eattr = 0
Jun 10 12:33:44 kvm1 kernel: GFS2: fsid=vmcluster:libvirtimages.0: gfs2_delete_inode: -5
Jun 10 12:33:44 kvm1 kernel: gdlm_unlock 5,3b47a44 err=-22
Jun 10 12:33:44 kvm1 kernel: gdlm_unlock 5,3b4bab2 err=-22
Jun 10 12:33:44 kvm1 kernel: kvm: 9528: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
Jun 10 12:36:35 kvm1 kernel: br0: port 11(vnet15) entering disabled state
Jun 10 12:36:35 kvm1 kernel: device vnet15 left promiscuous mode
Jun 10 12:36:35 kvm1 kernel: br0: port 11(vnet15) entering disabled state
Jun 10 12:36:35 kvm1 libvirtd: 12:36:35.740: 3431: error : qemuSecurityDACRestoreSecurityFileLabel:84 : cannot resolve symlink /var/lib/libvirt/images/template_x86_64_F15/template_x86_64_F15.img: Input/output error
Jun 10 12:36:35 kvm1 libvirtd: 12:36:35.744: 3431: error : qemuSecurityDACRestoreSecurityFileLabel:84 : cannot resolve symlink /var/lib/libvirt/boot/virtinst-vmlinuz.jWAS58: Input/output error
Jun 10 12:36:35 kvm1 libvirtd: 12:36:35.744: 3431: error : qemuSecurityDACRestoreSecurityFileLabel:84 : cannot resolve symlink /var/lib/libvirt/boot/virtinst-initrd.img.dwu6K9: Input/output error
Jun 10 12:36:35 kvm1 libvirtd: 12:36:35.744: 3431: warning : SELinuxRestoreSecurityFileLabel:434 : cannot resolve symlink /var/lib/libvirt/images/template_x86_64_F15/template_x86_64_F15.img: Input/output error
Jun 10 12:36:35 kvm1 libvirtd: 12:36:35.744: 3431: warning : SELinuxRestoreSecurityFileLabel:434 : cannot resolve symlink /var/lib/libvirt/boot/virtinst-vmlinuz.jWAS58: Input/output error
Jun 10 12:36:35 kvm1 libvirtd: 12:36:35.744: 3431: warning : SELinuxRestoreSecurityFileLabel:434 : cannot resolve symlink /var/lib/libvirt/boot/virtinst-initrd.img.dwu6K9: Input/output error
Jun 10 12:38:17 kvm1 kernel: INFO: task gfs2_logd:3258 blocked for more than 120 seconds.
Jun 10 12:38:17 kvm1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 10 12:38:17 kvm1 kernel: gfs2_logd     D 0000000000000004     0  3258      2 0x00000000
Jun 10 12:38:17 kvm1 kernel: ffff880fe398bdd0 0000000000000046 0000000000000031 00000000eb9b2366
Jun 10 12:38:17 kvm1 kernel: ffff880c9502f5a8 0000000000000441 ffff880fe398bd70 ffffffff811a281e
Jun 10 12:38:17 kvm1 kernel: ffff880fe3989b38 ffff880fe398bfd8 000000000000f598 ffff880fe3989b38
Jun 10 12:38:17 kvm1 kernel: Call Trace:
Jun 10 12:38:17 kvm1 kernel: [<ffffffff811a281e>] ? submit_bh+0x10e/0x150
Jun 10 12:38:17 kvm1 kernel: [<ffffffff81098c99>] ? ktime_get_ts+0xa9/0xe0
Jun 10 12:38:17 kvm1 kernel: [<ffffffff814db013>] io_schedule+0x73/0xc0
Jun 10 12:38:17 kvm1 kernel: [<ffffffffa066516a>] gfs2_log_flush+0x44a/0x6b0 [gfs2]
Jun 10 12:38:17 kvm1 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
Jun 10 12:38:17 kvm1 kernel: [<ffffffffa06654b1>] gfs2_logd+0xe1/0x150 [gfs2]
Jun 10 12:38:17 kvm1 kernel: [<ffffffffa06653d0>] ? gfs2_logd+0x0/0x150 [gfs2]
Jun 10 12:38:17 kvm1 kernel: [<ffffffff8108dd96>] kthread+0x96/0xa0
Jun 10 12:38:17 kvm1 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
Jun 10 12:38:17 kvm1 kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0
Jun 10 12:38:17 kvm1 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Jun 10 12:40:17 kvm1 kernel: INFO: task gfs2_logd:3258 blocked for more than 120 seconds.
--

fsck output (truncated, sadly we don't have the whole output):

--
[root@kvm2 ~]# fsck.gfs2 -y /dev/mapper/vg_imagesvm-LogVolLibvirtImages
..
Ondisk and fsck bitmaps differ at block 65577579 (0x3e8a26b)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577580 (0x3e8a26c)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577581 (0x3e8a26d)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577582 (0x3e8a26e)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577583 (0x3e8a26f)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577584 (0x3e8a270)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577585 (0x3e8a271)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577586 (0x3e8a272)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577587 (0x3e8a273)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577588 (0x3e8a274)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577589 (0x3e8a275)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577590 (0x3e8a276)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577591 (0x3e8a277)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577592 (0x3e8a278)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577593 (0x3e8a279)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577594 (0x3e8a27a)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577595 (0x3e8a27b)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577596 (0x3e8a27c)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577597 (0x3e8a27d)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577598 (0x3e8a27e)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577599 (0x3e8a27f)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577600 (0x3e8a280)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577601 (0x3e8a281)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577602 (0x3e8a282)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577603 (0x3e8a283)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577604 (0x3e8a284)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577605 (0x3e8a285)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577606 (0x3e8a286)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577607 (0x3e8a287)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577608 (0x3e8a288)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577609 (0x3e8a289)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577610 (0x3e8a28a)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577611 (0x3e8a28b)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577612 (0x3e8a28c)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65577883 (0x3e8a39b)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65578663 (0x3e8a6a7)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579173 (0x3e8a8a5)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579444 (0x3e8a9b4)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579445 (0x3e8a9b5)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579446 (0x3e8a9b6)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579447 (0x3e8a9b7)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579448 (0x3e8a9b8)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579449 (0x3e8a9b9)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579450 (0x3e8a9ba)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579451 (0x3e8a9bb)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579452 (0x3e8a9bc)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579453 (0x3e8a9bd)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579454 (0x3e8a9be)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579455 (0x3e8a9bf)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579456 (0x3e8a9c0)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579457 (0x3e8a9c1)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579458 (0x3e8a9c2)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579459 (0x3e8a9c3)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579460 (0x3e8a9c4)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579461 (0x3e8a9c5)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579462 (0x3e8a9c6)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579463 (0x3e8a9c7)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579464 (0x3e8a9c8)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579465 (0x3e8a9c9)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579466 (0x3e8a9ca)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579467 (0x3e8a9cb)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579468 (0x3e8a9cc)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579469 (0x3e8a9cd)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579470 (0x3e8a9ce)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579471 (0x3e8a9cf)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579472 (0x3e8a9d0)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579473 (0x3e8a9d1)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579474 (0x3e8a9d2)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579475 (0x3e8a9d3)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579476 (0x3e8a9d4)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579477 (0x3e8a9d5)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579478 (0x3e8a9d6)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579479 (0x3e8a9d7)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579480 (0x3e8a9d8)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579481 (0x3e8a9d9)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579482 (0x3e8a9da)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579483 (0x3e8a9db)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579484 (0x3e8a9dc)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579485 (0x3e8a9dd)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579486 (0x3e8a9de)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579487 (0x3e8a9df)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579488 (0x3e8a9e0)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579489 (0x3e8a9e1)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579490 (0x3e8a9e2)
Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
Metadata type is 0 (free)
Succeeded.
Ondisk and fsck bitmaps differ at block 65579491 (0x3e8a9e3) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 65579492 (0x3e8a9e4) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 65579634 (0x3e8aa72) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 65580144 (0x3e8ac70) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 65580654 (0x3e8ae6e) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 65581164 (0x3e8b06c) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 65581674 (0x3e8b26a) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 65582184 (0x3e8b468) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. Ondisk and fsck bitmaps differ at block 65582694 (0x3e8b666) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Succeeded. RG #65517205 (0x3e7b695) free count inconsistent: is 39677 should be 46736 Resource group counts updated Pass5 complete The statfs file is wrong: Current statfs values: blocks: 208121104 (0xc67ad10) free: 141275491 (0x86bb163) dinodes: 128 (0x80) Calculated statfs values: blocks: 208121104 (0xc67ad10) free: 141338071 (0x86ca5d7) dinodes: 127 (0x7f) The statfs file was fixed. 
Writing changes to disk
gfs2_fsck complete

--
Case opened with Red Hat Support (Case number: 00487747)

Regards
--
CAPELLE Benoit
System / Network Administrator
LaBRI - Universite Bordeaux

I was able to reproduce this in our QE lab:

GFS2: fsid=dash:indexes.0: fatal: filesystem consistency error
GFS2: fsid=dash:indexes.0:   inode = 57 197498
GFS2: fsid=dash:indexes.0:   function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, l
GFS2: fsid=dash:indexes.0: about to withdraw this file system
GFS2: fsid=dash:indexes.0: telling LM to unmount
GFS2: fsid=dash:indexes.0: withdrawn
Pid: 6707, comm: delete_workqueu Not tainted 2.6.32-131.0.15.el6.x86_64 #1
Call Trace:
 [<ffffffffa02f9fd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
 [<ffffffffa02cf209>] ? trunc_dealloc+0xa9/0x130 [gfs2]
 [<ffffffffa02fa1dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2]
 [<ffffffffa02df584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2]
 [<ffffffffa02f81da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2]
 [<ffffffffa02f80ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2]
 [<ffffffffa02f8020>] ? gfs2_delete_inode+0x0/0x280 [gfs2]
 [<ffffffff8118d11e>] ? generic_delete_inode+0xde/0x1d0
 [<ffffffffa02dc940>] ? delete_work_func+0x0/0x80 [gfs2]
 [<ffffffff8118d275>] ? generic_drop_inode+0x65/0x80
 [<ffffffffa02f6c4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2]
 [<ffffffff8118c0e2>] ? iput+0x62/0x70
 [<ffffffffa02dc994>] ? delete_work_func+0x54/0x80 [gfs2]
 [<ffffffff81088830>] ? worker_thread+0x170/0x2a0
 [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0
 [<ffffffff8108ddf6>] ? kthread+0x96/0xa0
 [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
 [<ffffffff8108dd60>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ?
child_rip+0x0/0x20
  no_formal_ino = 57
  no_addr = 197498
  i_disksize = 49152
  blocks = 0
  i_goal = 328157
  i_diskflags = 0x00000000
  i_height = 1
  i_depth = 0
  i_entries = 0
  i_eattr = 0
GFS2: fsid=dash:indexes.0: gfs2_delete_inode: -5
gdlm_unlock 5,3037a err=-22
gdlm_unlock 2,30379 err=-22
gdlm_unlock 5,30379 err=-22

RE: comment #19
Was that with dovecot or with some other reproducer?

(In reply to comment #21)
> RE: comment #19
>
> Was that with dovecot or with some other reproducer?

With dovecot and imaptest as described in the original report.

Reproduced with dovecot on kernel-2.6.32-158.el6.

Looking at the trace, this is the last mention of the glock associated with the problem inode:

delete_workqueu-1981 [000] 927.071760: gfs2_glock_put: 253,2 glock 5:66513 state NL => IV flags:Iq

So that's just dropping the glock, probably after the actual issue occurred. Looking back further we have this:

delete_workqueu-1981 [000] 925.517813: gfs2_demote_rq: 253,2 glock 5:66513 demote EX to NL flags:DIq
delete_workqueu-1981 [000] 925.517817: gfs2_glock_queue: 253,2 glock 5:66513 dequeue EX
delete_workqueu-1981 [000] 925.517826: gfs2_glock_queue: 253,2 glock 2:66513 dequeue EX
glock_workqueue-1980 [000] 925.521237: gfs2_glock_state_change: 253,2 glock 5:66513 state EX to NL tgt:NL dmt:NL flags:lDpIq

So a demote from EX to NL of the type 5 glock, which is normal for deallocation of inodes. This is run from delete_workqueue, which suggests that this inode was found during an attempt to allocate blocks for a different inode. So this may well not be the node which actually tried to unlink this inode originally.
Just prior to that we have:

delete_workqueu-1981 [000] 925.397176: gfs2_block_alloc: 253,2 bmap 66513 alloc 66515/3 free
delete_workqueu-1981 [000] 925.397191: gfs2_pin: 253,2 log pin 131090/4096 inode 131090
delete_workqueu-1981 [000] 925.397192: gfs2_block_alloc: 253,2 bmap 66513 alloc 131095/1 free
delete_workqueu-1981 [000] 925.397193: gfs2_block_alloc: 253,2 bmap 66513 alloc 131102/1 free
delete_workqueu-1981 [000] 925.397194: gfs2_block_alloc: 253,2 bmap 66513 alloc 131104/1 free
delete_workqueu-1981 [000] 925.397196: gfs2_block_alloc: 253,2 bmap 66513 alloc 131106/2 free
delete_workqueu-1981 [000] 925.397197: gfs2_block_alloc: 253,2 bmap 66513 alloc 131109/2 free
delete_workqueu-1981 [000] 925.397198: gfs2_block_alloc: 253,2 bmap 66513 alloc 131112/1 free
delete_workqueu-1981 [000] 925.397199: gfs2_block_alloc: 253,2 bmap 66513 alloc 131114/1 free
delete_workqueu-1981 [000] 925.397200: gfs2_block_alloc: 253,2 bmap 66513 alloc 131116/1 free
delete_workqueu-1981 [000] 925.397202: gfs2_block_alloc: 253,2 bmap 66513 alloc 66601/1 free
delete_workqueu-1981 [000] 925.397203: gfs2_block_alloc: 253,2 bmap 66513 alloc 66604/1 free
delete_workqueu-1981 [000] 925.397208: gfs2_pin: 253,2 log pin 655370/4096 inode 655370
delete_workqueu-1981 [000] 925.397209: gfs2_block_alloc: 253,2 bmap 66513 alloc 655382/1 free
delete_workqueu-1981 [000] 925.397214: gfs2_pin: 253,2 log pin 983045/4096 inode 983045
delete_workqueu-1981 [000] 925.397215: gfs2_block_alloc: 253,2 bmap 66513 alloc 983072/1 free

This is (mostly) just deallocation of the blocks belonging to the inode, so nothing that we wouldn't expect there, since the next step is to deallocate the inode itself.
Prior to that we have:

delete_workqueu-1981 [000] 925.141990: gfs2_block_alloc: 253,2 bmap 66513 alloc 66514/1 free
delete_workqueu-1981 [000] 925.143977: gfs2_pin: 253,2 log pin 66513/4096 inode 66513

So another block being freed - it looks like the first block of the inode, since its block number is directly adjacent to the inode. Odd that there is a gap of 0.35 of a second between this and the other frees that are going on. Perhaps a log flush, or something like that, got in between.

Prior to that we have:

delete_workqueu-1981 [000] 925.100207: gfs2_demote_rq: 253,2 glock 5:66513 demote PR to NL flags:DIq
delete_workqueu-1981 [000] 925.100207: gfs2_glock_queue: 253,2 glock 5:66513 dequeue PR
glock_workqueue-1980 [000] 925.100778: gfs2_glock_state_change: 253,2 glock 5:66513 state PR to NL tgt:NL dmt:NL flags:lDpIq
glock_workqueue-1980 [000] 925.135388: gfs2_glock_state_change: 253,2 glock 5:66513 state NL to EX tgt:EX dmt:EX flags:lIq
glock_workqueue-1980 [000] 925.135391: gfs2_promote: 253,2 glock 5:66513 promote other EX

and this sequence is indicative of the start of the deallocation process, since here we see the type 5 glock move to the EX state. So far nothing out of the ordinary.

Prior to that:

glock_workqueue-1980 [000] 924.804809: gfs2_glock_state_change: 253,2 glock 2:66513 state NL to EX tgt:EX dmt:EX flags:lIq
glock_workqueue-1980 [000] 924.804811: gfs2_promote: 253,2 glock 2:66513 promote first EX

So a request for an EX lock. Nothing appears to have changed after that, so it is a bit of a mystery why this lock was requested.

Prior to that:

dlm_astd-1931 [000] 912.234983: gfs2_demote_rq: 253,2 glock 5:66513 demote PR to NL flags:DIq

So this is a notification that another node is trying to deallocate the same inode. It is harmless, but it is also probably the trigger for this node (node1) to start the deallocation process which led to the problem.
Prior to that:

dlm_astd-1931 [000] 906.278354: gfs2_demote_rq: 253,2 glock 2:66513 demote PR to NL flags:DI
glock_workqueue-1980 [000] 906.278358: gfs2_glock_state_change: 253,2 glock 2:66413 state PR to PR tgt:EX dmt:EX flags:lIq
glock_workqueue-1980 [000] 906.278548: gfs2_glock_state_change: 253,2 glock 2:66413 state PR to NL tgt:EX dmt:EX flags:lIq
glock_workqueue-1980 [000] 906.278576: gfs2_glock_state_change: 253,2 glock 2:66513 state PR to NL tgt:NL dmt:NL flags:lDpI

So this is a request from a remote node to get the inode glock in EX state, which is to be expected when it is trying to deallocate the inode, since the sequence goes inode glock -> iopen glock in this case. This node then drops its inode glock as requested.

So overall, there doesn't appear to be anything out of the ordinary in the trace that I can see. The fact that the deallocation process has got as far as deallocating all the other blocks associated with the inode, bar the inode itself, makes me think that we have an "off by one" error, since if that were not the case, the problem would have occurred earlier in the deallocation sequence.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

I copied the test 6.1.z test kernel to this location:
http://people.redhat.com/rpeterso/Experimental/RHEL6.x/kernel*

I tested with Bob's 2.6.32-131.5.1.el6_1.bz712139.x86_64 and I was not able to reproduce the issue. Starting a regular regression run on the kernel.

*** Bug 699956 has been marked as a duplicate of this bug. ***

*** Bug 713949 has been marked as a duplicate of this bug. ***

Technical note added.
If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
When responsibility for deallocation is passed from one node to another, the receiving node may not have fully up-to-date inode state. If the sending node has changed the important parts of that state in the meantime (block allocation/deallocation), this triggers an assert during the deallocation on the receiving node. This is easily fixed with fsck, and the only data lost will be the file that was being deallocated in the first place.

There is no workaround beyond avoiding unlinking inodes.

Test kernel:
http://download.devel.redhat.com/brewroot/scratch/adas/task_3413358/

Patches included:
patches/rhel6-introduce-ail-lock.patch
patches/rhel6-update-to-ail-list-locking.patch
patches/rhel6-alter-point-of-entry-to-glock-lru-list-for-glocks.patch
patches/rhel6-use-filemap_fdatawrite-to-write-back-the-ail.patch
patches/rhel6-make-write_inode-really-write.patch
patches/rhel6-sync_inode_metadata.patch
patches/rhel6-fsync.patch
patches/rhel6-optimize-glock-lru-and-eol-inodes.patch
patches/rhel6-improve-tracing.patch
patches/rhel6-make-ail-writeback-responsive.patch
patches/rhel6-ail-writeback-tracepoint.patch
patches/rhel6-fix-ail-list-traversal.patch
patches/rhel6-bz676626-debug.patch
patches/rhel6-bz663356.patch
patches/rhel6-bz712139.patch
patches/rhel6-bz697019.patch

This kernel has prevented my regular GFS2 issues with RHEL 6.1... so I consider it a fix for my problems. Thank you Red Hat!

Do you have a release date for this kernel? When running 2.6.32-131.4.1.el6 we were seeing GFS2 failing every day! We have dropped back to the 6.0 kernel, 2.6.32-71.24.1.el6, for the moment. I presume that RH support will not support us running a test kernel. So a bit stuck.
If the new kernel will be released soon, I don't really want to go through the rigmarole of opening a service request, sosreports, etc., when this is clearly the issue. And they will probably just point me back to this test kernel that they won't support.

Sounds like something has gone wrong; you should be supported on the hotfix kernel until the new release is available. Do you have a ticket open for this issue? I'll try and get someone to assist on this issue.

I haven't opened a call as it's pretty obviously this issue. I was wondering if the kernel is round the corner; opening a support call will take several hours of my time.

Again, if it takes several hours, then something is amiss. You should be able to simply put in a ticket asking for the hotfix kernel for this particular bugzilla. If there is any problem with that, then let me know and I'll follow up, but I can't do that while there is no ticket to follow up on.

(In reply to comment #50)
> I haven't opened a call as it's pretty obviously this issue. I was wondering if
> the kernel is round the corner, opening a support call will take several hours
> of my time.

I've got the support case opened on your behalf by Dell and will work on the issue through there.

Patch(es) available on kernel-2.6.32-166.el6

When is a kernel with this fix being released?

Joshua, please contact Red Hat support directly in order to answer your question.

Verified with imaptest to dovecot on GFS2 with kernel-2.6.32-188.el6.x86_64

*** Bug 732826 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html
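As a footnote, the failure mode described in the Doc Text (the receiving node running the final dinode deallocation against stale block accounting) can be sketched with a toy model. This is purely illustrative Python, not GFS2 code: ConsistencyError, dinode_dealloc and the dictionaries below are invented stand-ins for gfs2_consist_inode() and gfs2_dinode_dealloc().

```python
class ConsistencyError(Exception):
    """Stand-in for the gfs2_consist_inode() assert seen in the traces."""

def dinode_dealloc(inode):
    # Toy version of the final sanity check: by the time the dinode itself
    # is deallocated, every data block has been freed and exactly one
    # block (the dinode) should remain accounted.
    if inode["i_blocks"] != 1:
        raise ConsistencyError("fatal: filesystem consistency error")
    inode["i_blocks"] = 0  # free the dinode block itself

# Node A (the sender) unlinks the file, frees its data blocks, and passes
# responsibility for the final deallocation to node B.  Node B, however,
# still holds a cached copy taken before node A's block-accounting update
# reached it -- cf. "blocks = 0" in the inode dump above.
stale_view = {"i_blocks": 0}

try:
    dinode_dealloc(stale_view)
except ConsistencyError as exc:
    print(exc)  # the receiving node then withdraws, as in the traces
```

With an up-to-date view ({"i_blocks": 1}) the same call succeeds, which is why the bug only manifests when deallocation responsibility moves between nodes mid-update.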