Bug 513885
| Field | Value |
|---|---|
| Summary | GFS kernel panic, suid + nfsd with posix ACLs enabled |
| Product | Red Hat Enterprise Linux 5 |
| Component | gfs-kmod |
| Version | 5.3 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | low |
| Reporter | Abraham Alawi <a.alawi> |
| Assignee | Robert Peterson <rpeterso> |
| QA Contact | Cluster QE <mspqa-list> |
| CC | adas, bmarzins, edamato, swhiteho |
| Fixed In Version | gfs-kmod-0.1.34-7.el5 |
| Doc Type | Bug Fix |
| Last Closed | 2010-03-30 08:56:15 UTC |
| Clones | 568089 |
| Bug Blocks | 568089 |
Description
Abraham Alawi
2009-07-26 23:18:16 UTC
Abraham, please confirm whether you have POSIX ACLs enabled. Also, if you still have it, please attach the assert message to this bz, as I think a few lines are missing from the top of the back trace you posted. It looks like the issue is that GFS has tried to open a new transaction (to remove the suid bit) while it already has one open for the write. I suspect that if you don't use POSIX ACLs, you won't hit this particular issue, so that might be a short-term workaround depending on your application.

Created attachment 355319
full dmesg - kernel trace
Thanks Steve. Yes, POSIX ACLs are enabled and in use. Here are the mount options: nodev,nosuid,nouser,rw,dirsync,_netdev,acl. I've also attached the dmesg for the full runtime cycle. Let me know if you need more info.

Hmm, it's odd that the assert message itself doesn't appear in the logs. On the other hand, there is only one assertion in the transaction start function, so it does look like my first suggestion was correct. The question now is how to fix it... We'll be in touch when we have a solution. Thanks for the report.

It occurred again, and this time there's an assertion message:

    Aug 4 13:05:55 charlotte kernel: GFS: fsid=FSC:files.1: fast statfs start time = 1249347860
    Aug 4 13:29:08 charlotte kernel:
    Aug 4 13:29:08 charlotte kernel: Call Trace:
    Aug 4 13:29:08 charlotte kernel: [<ffffffff886948f6>] :gfs:gfs_assert_i+0x5e/0x89
    Aug 4 13:29:08 charlotte kernel: [<ffffffff88693968>] :gfs:gfs_trans_begin_i+0x178/0x1b2
    Aug 4 13:29:08 charlotte kernel: [<ffffffff88670487>] :gfs:gfs_ea_acl_chmod+0x52/0x3c4
    Aug 4 13:29:08 charlotte kernel: [<ffffffff8866fe25>] :gfs:ea_find_i+0x0/0x6b
    Aug 4 13:29:08 charlotte kernel: [<ffffffff88667699>] :gfs:gfs_acl_chmod+0x139/0x184
    Aug 4 13:29:08 charlotte kernel: [<ffffffff88689b3d>] :gfs:gfs_setattr+0x30d/0x371
    Aug 4 13:29:08 charlotte kernel: [<ffffffff8000e00a>] current_fs_time+0x3b/0x40
    Aug 4 13:29:08 charlotte kernel: [<ffffffff8002c5ca>] notify_change+0x145/0x2e0
    Aug 4 13:29:08 charlotte kernel: [<ffffffff800c2316>] __remove_suid+0x15/0x1a
    Aug 4 13:29:08 charlotte kernel: [<ffffffff8002b474>] remove_suid+0x9/0x1c
    Aug 4 13:29:08 charlotte kernel: [<ffffffff800160a1>] __generic_file_aio_write_nolock+0x277/0x3b8
    Aug 4 13:29:08 charlotte kernel: [<ffffffff88669b4d>] :gfs:gfs_dreread+0x72/0xc7
    Aug 4 13:29:08 charlotte kernel: [<ffffffff88691247>] :gfs:gfs_rgrp_read+0xe7/0x226
    Aug 4 13:29:08 charlotte kernel: [<ffffffff800c2ce5>] generic_file_aio_write_nolock+0x20/0x6c
    Aug 4 13:29:08 charlotte kernel: [<ffffffff800c30b1>] generic_file_write_nolock+0x8f/0xa8
    Aug 4 13:29:08 charlotte kernel: [<ffffffff8009db21>] autoremove_wake_function+0x0/0x2e
    Aug 4 13:29:08 charlotte kernel: [<ffffffff886935fb>] :gfs:gfs_trans_add_bh+0xc7/0xd9
    Aug 4 13:29:08 charlotte kernel: [<ffffffff886845b8>] :gfs:gfs_dinode_out+0x162/0x18f
    Aug 4 13:29:08 charlotte kernel: [<ffffffff88686723>] :gfs:do_write_buf+0x443/0x67e
    Aug 4 13:29:08 charlotte kernel: [<ffffffff88685f38>] :gfs:walk_vm+0x10e/0x311
    Aug 4 13:29:08 charlotte kernel: [<ffffffff886862e0>] :gfs:do_write_buf+0x0/0x67e
    Aug 4 13:29:08 charlotte kernel: [<ffffffff800631ac>] wait_for_completion+0x1f/0xa2
    Aug 4 13:29:08 charlotte kernel: [<ffffffff886861e7>] :gfs:__gfs_write+0xac/0xc6
    Aug 4 13:29:08 charlotte kernel: [<ffffffff800dbb02>] do_readv_writev+0x198/0x295
    Aug 4 13:29:08 charlotte kernel: [<ffffffff8868622a>] :gfs:gfs_write+0x0/0x8
    Aug 4 13:29:08 charlotte kernel: [<ffffffff88674214>] :gfs:gfs_glock_dq+0x13c/0x14b
    Aug 4 13:29:08 charlotte kernel: [<ffffffff88687844>] :gfs:gfs_open+0x12c/0x15e
    Aug 4 13:29:08 charlotte kernel: [<ffffffff886ff5f4>] :nfsd:nfsd_vfs_write+0xf2/0x2e1
    Aug 4 13:29:08 charlotte kernel: [<ffffffff88687718>] :gfs:gfs_open+0x0/0x15e
    Aug 4 13:29:08 charlotte kernel: [<ffffffff8001e51a>] __dentry_open+0x101/0x1dc
    Aug 4 13:29:08 charlotte kernel: [<ffffffff886ffe68>] :nfsd:nfsd_write+0xb5/0xd5
    Aug 4 13:29:08 charlotte kernel: [<ffffffff88706986>] :nfsd:nfsd3_proc_write+0xea/0x109
    Aug 4 13:29:08 charlotte kernel: [<ffffffff886fc1db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
    Aug 4 13:29:08 charlotte kernel: [<ffffffff8856448b>] :sunrpc:svc_process+0x454/0x71b
    Aug 4 13:29:08 charlotte kernel: [<ffffffff800646f5>] __down_read+0x12/0x92
    Aug 4 13:29:08 charlotte kernel: [<ffffffff886fc5a1>] :nfsd:nfsd+0x0/0x2cb
    Aug 4 13:29:08 charlotte kernel: [<ffffffff886fc746>] :nfsd:nfsd+0x1a5/0x2cb
    Aug 4 13:29:08 charlotte kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
    Aug 4 13:29:08 charlotte kernel: [<ffffffff886fc5a1>] :nfsd:nfsd+0x0/0x2cb
    Aug 4 13:29:08 charlotte kernel: [<ffffffff886fc5a1>] :nfsd:nfsd+0x0/0x2cb
    Aug 4 13:29:08 charlotte kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
    Aug 4 13:29:08 charlotte kernel:
    Aug 4 13:29:08 charlotte kernel: Kernel panic - not syncing: GFS: fsid=FSC:files.1: assertion "!get_transaction" failed
    Aug 4 13:29:08 charlotte kernel: GFS: fsid=FSC:files.1: function = gfs_trans_begin_i
    Aug 4 13:29:08 charlotte kernel: GFS: fsid=FSC:files.1: file = /builddir/build/BUILD/gfs-kmod-0.1.31/_kmod_build_/src/gfs/trans.c, line = 136
    Aug 4 13:29:08 charlotte kernel: GFS: fsid=FSC:files.1: time = 1249349254
    Aug 4 13:29:08 charlotte kernel:
    Aug 4 13:33:09 charlotte kernel: klogd 1.4.1, log source = /proc/kmsg started.
    Aug 4 13:33:09 charlotte kernel: Linux version 2.6.18-128.el5 (mockbuild.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Dec 17 11:41:38 EST 2008

Based on version information in the comments, this must be RHEL5. Changing Bugzilla version information accordingly.

Created attachment 357965
Patch to fix the problem
This is a RHEL5.3 version of Steve's patch to fix the problem.
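The failure Steve describes (a second `gfs_trans_begin_i` call while the write's transaction is still open, tripping the `!get_transaction` assertion) can be modeled in a few lines of userspace C. This is a minimal sketch under stated assumptions: `trans_begin`, `trans_end`, `acl_chmod_clear_suid`, `write_buggy`, and `write_fixed` are hypothetical names, a single global pointer stands in for the per-task transaction, and `write_fixed` shows only one possible ordering fix (clearing suid before the write transaction opens). It is not a description of what the attached patch actually changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model; none of these names are taken from the GFS source. */
struct trans { const char *owner; };

/* Stands in for the per-task "current transaction" that the
   "!get_transaction" assertion in gfs_trans_begin_i() checks. */
static struct trans *current_trans = NULL;

/* The real kernel panics on a nested begin; the model returns false instead. */
static bool trans_begin(struct trans *tr, const char *owner)
{
    if (current_trans != NULL) {
        fprintf(stderr, "assertion \"!get_transaction\" failed (%s)\n", owner);
        return false;
    }
    tr->owner = owner;
    current_trans = tr;
    return true;
}

static void trans_end(void)
{
    current_trans = NULL;
}

/* Models gfs_setattr -> gfs_acl_chmod -> gfs_ea_acl_chmod, which opens
   its own transaction to rewrite the mode and the ACL extended attribute. */
static bool acl_chmod_clear_suid(unsigned *mode)
{
    struct trans tr;
    if (!trans_begin(&tr, "acl_chmod"))
        return false;
    *mode &= ~04000u; /* 04000 is the S_ISUID bit */
    trans_end();
    return true;
}

/* Buggy ordering: remove_suid() runs while the write transaction is open. */
static bool write_buggy(unsigned *mode)
{
    struct trans tr;
    bool ok;
    if (!trans_begin(&tr, "write"))
        return false;
    ok = acl_chmod_clear_suid(mode); /* nested begin trips the assertion */
    trans_end();
    return ok;
}

/* One possible fix (assumption): drop suid before opening the write transaction. */
static bool write_fixed(unsigned *mode)
{
    struct trans tr;
    if (!acl_chmod_clear_suid(mode))
        return false;
    if (!trans_begin(&tr, "write"))
        return false;
    /* ... data blocks would be journaled here ... */
    trans_end();
    return true;
}
```

Calling `write_buggy()` on a mode of `04755` reports the failed assertion and leaves the suid bit set, while `write_fixed()` clears it to `0755` without ever nesting two transactions.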
Created attachment 357966
GFS rpm test package to try
This is an x86_64 rpm for the previously attached patch.
Please try this version of gfs and tell us if it fixes the
problem.
Setting the NEEDINFO flag until we find out if the previously attached rpm package fixes the problem.

Thanks, I'll try it and let you know if the problem occurs again.

Just to let you know, since deploying the new module the problem hasn't recurred. Thanks!

Since the patch fixes the problem, perhaps we should get this into GFS for 5.5. Requesting ack flags.

This patch was tested by the customer and found to be correct as per comment #13.

This patch was pushed to the master branch of the gfs1-utils git tree and to the STABLE3 and RHEL55 branches of the cluster git tree for inclusion in 5.5. Changing status to POST until a build is done. Here are the git tree commit IDs:

    RHEL55   a07e555 GFS kernel panic, suid + nfsd with posix ACLs enabled
    STABLE3  ac582a1 GFS kernel panic, suid + nfsd with posix ACLs enabled
    master   3f34656 GFS kernel panic, suid + nfsd with posix ACLs enabled

Build 2136058 is complete and successful. This is now fixed in gfs-kmod-0.1.34-7.el5. Changing status to MODIFIED.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0291.html