Bug 1150244 - glusterfsd hangs on IO when underlying ext4 filesystem corrupts an xattr
Summary: glusterfsd hangs on IO when underlying ext4 filesystem corrupts an xattr
Keywords:
Status: CLOSED DUPLICATE of bug 1100204
Alias: None
Product: GlusterFS
Classification: Community
Component: unclassified
Version: 3.5.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: GlusterFS Bugs list
QA Contact:
URL:
Whiteboard:
Depends On: 1130242
Blocks:
Reported: 2014-10-07 18:51 UTC by rglick
Modified: 2014-10-28 11:43 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-10-28 11:40:58 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description rglick 2014-10-07 18:51:20 UTC
Description of problem:

The glusterfsd process hangs (it does not respond to glusterfs requests but still appears to be running) when the underlying ext4 filesystem gets a corrupted xattr.

IO to the affected brick gets stuck, and the glusterfsd process turns into a zombie when killed. Only a reboot, an fsck, and a subsequent startup of gluster-server resolve the issue.

This may be related to (or a subset of?) https://bugzilla.redhat.com/show_bug.cgi?id=832609

Kernel messages look like this:

Oct  7 05:34:30 ghost9 kernel: [82029.008044] ------------[ cut here ]------------
Oct  7 05:34:30 ghost9 kernel: [82029.008063] WARNING: CPU: 4 PID: 2257 at /build/buildd/linux-lts-saucy-3.11.0/fs/ext4/ext4_jbd2.c:259 __ext4_handle_dirty_metadata+0x1a9/0x1c0()
Oct  7 05:34:30 ghost9 kernel: [82029.008065] Modules linked in: rpcsec_gss_krb5 nfsv4 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep nouveau ttm snd_pcm mei_me snd_timer drm_kms_helper drm psmouse nfsd mei snd eeepc_wmi soundcore asus_wmi lpc_ich snd_page_alloc sparse_keymap i2c_algo_bit mxm_wmi video serio_raw mac_hid wmi lp nfs_acl auth_rpcgss parport nfs fscache lockd sunrpc ixgbe dca ahci libahci e1000e firewire_ohci firewire_core ptp mdio crc_itu_t pps_core
Oct  7 05:34:30 ghost9 kernel: [82029.008104] CPU: 4 PID: 2257 Comm: glusterfsd Not tainted 3.11.0-20-generic #34~precise1-Ubuntu
Oct  7 05:34:30 ghost9 kernel: [82029.008106] Hardware name: System manufacturer System Product Name/P9X79 WS, BIOS 4306 08/22/2013
Oct  7 05:34:30 ghost9 kernel: [82029.008108]  0000000000000103 ffff880fdd365998 ffffffff8173dd2d 0000000000000007
Oct  7 05:34:30 ghost9 kernel: [82029.008111]  0000000000000000 ffff880fdd3659d8 ffffffff8106540c ffff880fdde52180
Oct  7 05:34:30 ghost9 kernel: [82029.008112]  ffff880eb9af5000 00000000ffffff8b ffff8800878b08b0 ffff880fdde52180
Oct  7 05:34:30 ghost9 kernel: [82029.008115] Call Trace:
Oct  7 05:34:30 ghost9 kernel: [82029.008123]  [<ffffffff8173dd2d>] dump_stack+0x46/0x58
Oct  7 05:34:30 ghost9 kernel: [82029.008128]  [<ffffffff8106540c>] warn_slowpath_common+0x8c/0xc0
Oct  7 05:34:30 ghost9 kernel: [82029.008130]  [<ffffffff8106545a>] warn_slowpath_null+0x1a/0x20
Oct  7 05:34:30 ghost9 kernel: [82029.008132]  [<ffffffff8127f7c9>] __ext4_handle_dirty_metadata+0x1a9/0x1c0
Oct  7 05:34:30 ghost9 kernel: [82029.008136]  [<ffffffff81290f03>] ext4_xattr_release_block+0x103/0x1f0
Oct  7 05:34:30 ghost9 kernel: [82029.008138]  [<ffffffff81291524>] ext4_xattr_block_set+0x204/0x710
Oct  7 05:34:30 ghost9 kernel: [82029.008140]  [<ffffffff81292170>] ext4_xattr_set_handle+0x370/0x490
Oct  7 05:34:30 ghost9 kernel: [82029.008143]  [<ffffffff81292329>] ? ext4_xattr_set+0x99/0x140
Oct  7 05:34:30 ghost9 kernel: [82029.008145]  [<ffffffff81292355>] ext4_xattr_set+0xc5/0x140
Oct  7 05:34:30 ghost9 kernel: [82029.008147]  [<ffffffff81292e8d>] ext4_xattr_trusted_set+0x2d/0x30
Oct  7 05:34:30 ghost9 kernel: [82029.008153]  [<ffffffff811d8b6b>] generic_setxattr+0x6b/0x90
Oct  7 05:34:30 ghost9 kernel: [82029.008155]  [<ffffffff811d949b>] __vfs_setxattr_noperm+0x7b/0x1c0
Oct  7 05:34:30 ghost9 kernel: [82029.008159]  [<ffffffff81337d8e>] ? evm_inode_setxattr+0xe/0x10
Oct  7 05:34:30 ghost9 kernel: [82029.008162]  [<ffffffff811d969c>] vfs_setxattr+0xbc/0xc0
Oct  7 05:34:30 ghost9 kernel: [82029.008164]  [<ffffffff811d97de>] setxattr+0x13e/0x1e0
Oct  7 05:34:30 ghost9 kernel: [82029.008170]  [<ffffffff817494fe>] ? _raw_spin_lock+0xe/0x20
Oct  7 05:34:30 ghost9 kernel: [82029.008178]  [<ffffffff811b6ee3>] ? __sb_start_write+0x53/0x110
Oct  7 05:34:30 ghost9 kernel: [82029.008181]  [<ffffffff811d3492>] ? mnt_clone_write+0x12/0x30
Oct  7 05:34:30 ghost9 kernel: [82029.008183]  [<ffffffff811d9c7e>] SyS_fsetxattr+0xbe/0x100
Oct  7 05:34:30 ghost9 kernel: [82029.008187]  [<ffffffff811d9e5d>] ? SyS_fgetxattr+0x7d/0xd0
Oct  7 05:34:30 ghost9 kernel: [82029.008193]  [<ffffffff8175291d>] system_call_fastpath+0x1a/0x1f
Oct  7 05:34:30 ghost9 kernel: [82029.008195] ---[ end trace 655f8cd7683964af ]---
Oct  7 05:34:30 ghost9 kernel: [82029.008198] EXT4-fs: ext4_handle_dirty_xattr_block:167: aborting transaction: error 117 in __ext4_handle_dirty_metadata
Oct  7 05:34:30 ghost9 kernel: [82029.008388] EXT4-fs error (device sda1): ext4_handle_dirty_xattr_block:167: inode #15879459: block 63987149: comm glusterfsd: journal_dirty_metadata failed: handle type 10 started at line 1173, credits 24/24, errcode -117
Oct  7 05:34:30 ghost9 kernel: [82029.008415] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4841: Readonly filesystem
Oct  7 05:34:30 ghost9 kernel: [82029.008464] EXT4-fs error (device sda1) in ext4_dirty_inode:4960: error 117
Oct  7 05:34:30 ghost9 kernel: [82029.008505] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
Oct  7 05:34:30 ghost9 kernel: [82029.008575] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
Oct  7 05:34:30 ghost9 kernel: [82029.008585] IP: [<ffffffff812708c1>] __ext4_error_inode+0x31/0x120
Oct  7 05:34:30 ghost9 kernel: [82029.008598] PGD 0 
Oct  7 05:34:30 ghost9 kernel: [82029.008603] Oops: 0000 [#1] SMP 
Oct  7 05:34:30 ghost9 kernel: [82029.008609] Modules linked in: rpcsec_gss_krb5 nfsv4 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep nouveau ttm snd_pcm mei_me snd_timer drm_kms_helper drm psmouse nfsd mei snd eeepc_wmi soundcore asus_wmi lpc_ich snd_page_alloc sparse_keymap i2c_algo_bit mxm_wmi video serio_raw mac_hid wmi lp nfs_acl auth_rpcgss parport nfs fscache lockd sunrpc ixgbe dca ahci libahci e1000e firewire_ohci firewire_core ptp mdio crc_itu_t pps_core
Oct  7 05:34:30 ghost9 kernel: [82029.008698] CPU: 0 PID: 2257 Comm: glusterfsd Tainted: G        W    3.11.0-20-generic #34~precise1-Ubuntu
Oct  7 05:34:30 ghost9 kernel: [82029.008705] Hardware name: System manufacturer System Product Name/P9X79 WS, BIOS 4306 08/22/2013
Oct  7 05:34:30 ghost9 kernel: [82029.008711] task: ffff880fd8219770 ti: ffff880fdd364000 task.ti: ffff880fdd364000
Oct  7 05:34:30 ghost9 kernel: [82029.008716] RIP: 0010:[<ffffffff812708c1>]  [<ffffffff812708c1>] __ext4_error_inode+0x31/0x120
Oct  7 05:34:30 ghost9 kernel: [82029.008727] RSP: 0018:ffff880fdd365968  EFLAGS: 00010282
Oct  7 05:34:30 ghost9 kernel: [82029.008731] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000003c804f2
Oct  7 05:34:30 ghost9 kernel: [82029.008737] RDX: 0000000000001131 RSI: ffffffff81830eb0 RDI: 0000000000000000
Oct  7 05:34:30 ghost9 kernel: [82029.008745] RBP: ffff880fdd365a08 R08: ffffffff81b23460 R09: 000000000000000a
Oct  7 05:34:30 ghost9 kernel: [82029.008750] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001131
Oct  7 05:34:30 ghost9 kernel: [82029.008755] R13: 0000000000000000 R14: ffff880fdde52180 R15: ffffffff81b23460
Oct  7 05:34:30 ghost9 kernel: [82029.008761] FS:  00007fcb17efe700(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000
Oct  7 05:34:30 ghost9 kernel: [82029.008766] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct  7 05:34:30 ghost9 kernel: [82029.008770] CR2: 0000000000000028 CR3: 0000000fd6d29000 CR4: 00000000001407f0
Oct  7 05:34:30 ghost9 kernel: [82029.008776] Stack:
Oct  7 05:34:30 ghost9 kernel: [82029.008779]  ffff880fdd365988 ffffffff811e8050 ffff880fe4d82000 ffff880fddd4cc98
Oct  7 05:34:30 ghost9 kernel: [82029.008790]  ffff880fdd365998 ffffffff811e8093 ffff880fdde52180 ffffffff81838030
Oct  7 05:34:30 ghost9 kernel: [82029.008801]  ffff880fdd365a08 ffffffff8127f28d ffff880fdd3659e8 ffff880fe4d82000
Oct  7 05:34:30 ghost9 kernel: [82029.008811] Call Trace:
Oct  7 05:34:30 ghost9 kernel: [82029.008821]  [<ffffffff811e8050>] ? __sync_dirty_buffer+0xa0/0xd0
Oct  7 05:34:30 ghost9 kernel: [82029.008828]  [<ffffffff811e8093>] ? sync_dirty_buffer+0x13/0x20
Oct  7 05:34:30 ghost9 kernel: [82029.008836]  [<ffffffff8127f28d>] ? ext4_journal_abort_handle+0x4d/0xe0
Oct  7 05:34:30 ghost9 kernel: [82029.008843]  [<ffffffff8127f737>] __ext4_handle_dirty_metadata+0x117/0x1c0
Oct  7 05:34:30 ghost9 kernel: [82029.008854]  [<ffffffff812913f3>] ? ext4_xattr_block_set+0xd3/0x710
Oct  7 05:34:30 ghost9 kernel: [82029.008865]  [<ffffffff8125444a>] ext4_do_update_inode+0x36a/0x560
Oct  7 05:34:30 ghost9 kernel: [82029.008873]  [<ffffffff81255e47>] ext4_mark_iloc_dirty+0x67/0x90
Oct  7 05:34:30 ghost9 kernel: [82029.008879]  [<ffffffff8129204f>] ext4_xattr_set_handle+0x24f/0x490
Oct  7 05:34:30 ghost9 kernel: [82029.008886]  [<ffffffff81292355>] ext4_xattr_set+0xc5/0x140
Oct  7 05:34:30 ghost9 kernel: [82029.009104]  [<ffffffff81292e8d>] ext4_xattr_trusted_set+0x2d/0x30
Oct  7 05:34:30 ghost9 kernel: [82029.009534]  [<ffffffff811d8b6b>] generic_setxattr+0x6b/0x90
Oct  7 05:34:30 ghost9 kernel: [82029.010056]  [<ffffffff811d949b>] __vfs_setxattr_noperm+0x7b/0x1c0
Oct  7 05:34:30 ghost9 kernel: [82029.010569]  [<ffffffff81337d8e>] ? evm_inode_setxattr+0xe/0x10
Oct  7 05:34:30 ghost9 kernel: [82029.011084]  [<ffffffff811d969c>] vfs_setxattr+0xbc/0xc0
Oct  7 05:34:30 ghost9 kernel: [82029.011604]  [<ffffffff811d97de>] setxattr+0x13e/0x1e0
Oct  7 05:34:30 ghost9 kernel: [82029.012121]  [<ffffffff817494fe>] ? _raw_spin_lock+0xe/0x20
Oct  7 05:34:30 ghost9 kernel: [82029.012648]  [<ffffffff811b6ee3>] ? __sb_start_write+0x53/0x110
Oct  7 05:34:30 ghost9 kernel: [82029.013143]  [<ffffffff811d3492>] ? mnt_clone_write+0x12/0x30
Oct  7 05:34:30 ghost9 kernel: [82029.013631]  [<ffffffff811d9c7e>] SyS_fsetxattr+0xbe/0x100
Oct  7 05:34:30 ghost9 kernel: [82029.014109]  [<ffffffff811d9e5d>] ? SyS_fgetxattr+0x7d/0xd0
Oct  7 05:34:30 ghost9 kernel: [82029.014578]  [<ffffffff8175291d>] system_call_fastpath+0x1a/0x1f
Oct  7 05:34:30 ghost9 kernel: [82029.015037] Code: 48 89 e5 48 81 ec a0 00 00 00 48 89 5d d8 4c 89 65 e0 41 89 d4 4c 89 6d e8 4c 89 75 f0 48 89 fb 4c 89 7d f8 4c 89 4d c8 4d 89 c7 <48> 8b 47 28 48 8b 57 40 49 89 f5 49 89 ce 48 8b 80 50 03 00 00 
Oct  7 05:34:30 ghost9 kernel: [82029.016080] RIP  [<ffffffff812708c1>] __ext4_error_inode+0x31/0x120
Oct  7 05:34:30 ghost9 kernel: [82029.016559]  RSP <ffff880fdd365968>
Oct  7 05:34:30 ghost9 kernel: [82029.017041] CR2: 0000000000000028
Oct  7 05:34:30 ghost9 kernel: [82029.019503] ---[ end trace 655f8cd7683964b0 ]---
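
For triage on other nodes, a minimal sketch of how the "EXT4-fs error" lines above could be pulled out of a syslog file. It assumes the two line shapes seen in this log ("(device X): func:line: ..." and "(device X) in func:line: ..."). The names parse_ext4_errors and affected_devices are hypothetical helpers, not part of GlusterFS or the kernel tooling.

```python
import re

# Matches the two shapes of EXT4-fs error lines seen in the log above, e.g.
#   EXT4-fs error (device sda1): ext4_handle_dirty_xattr_block:167: inode #...
#   EXT4-fs error (device sda1) in ext4_dirty_inode:4960: error 117
# This pattern is an assumption based on this log only; other kernels may differ.
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\)"
    r"(?::| in) (?P<function>\w+):(?P<line>\d+): (?P<detail>.*)"
)


def parse_ext4_errors(log_lines):
    """Return one dict per EXT4-fs error line found in log_lines."""
    errors = []
    for text in log_lines:
        match = EXT4_ERROR.search(text)
        if match:
            entry = match.groupdict()
            entry["line"] = int(entry["line"])  # source line in the kernel file
            errors.append(entry)
    return errors


def affected_devices(log_lines):
    """Return the sorted, de-duplicated device names that reported errors."""
    return sorted({e["device"] for e in parse_ext4_errors(log_lines)})
```

Feeding the lines of a syslog file to affected_devices would list the block devices backing bricks that need an fsck. It only reads log text and does not detect a hung glusterfsd by itself.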

Version-Release number of selected component (if applicable):

3.5.2-ubuntu1~precise1

How reproducible:

Unable to reproduce on demand, but this happens approximately once per week in a 10-node cluster with 20 compute clients.

Steps to Reproduce:
1. NA

Actual results:

Extended attributes corrupted (not sure if this is an ext4 issue or a gluster issue). Brick becomes unresponsive instead of crashing or failing gracefully.

Expected results:

No filesystem corruption.
IO fails, or brick goes down and replica responds.

Additional info:

Comment 1 Niels de Vos 2014-10-07 19:13:01 UTC
When bug 1130242 has its patch merged, we can include it in an upcoming 3.5.x release.

Comment 2 Lalatendu Mohanty 2014-10-28 11:40:58 UTC

*** This bug has been marked as a duplicate of bug 1100204 ***

