Bug 513885 - GFS kernel panic, suid + nfsd with posix ACLs enabled
Summary: GFS kernel panic, suid + nfsd with posix ACLs enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs-kmod
Version: 5.3
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Robert Peterson
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 568089
Reported: 2009-07-26 23:18 UTC by Abraham Alawi
Modified: 2010-03-30 08:56 UTC (History)

Fixed In Version: gfs-kmod-0.1.34-7.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 568089 (view as bug list)
Environment:
Last Closed: 2010-03-30 08:56:15 UTC
Target Upstream Version:
Embargoed:


Attachments
full dmesg - kernel trace (46.42 KB, text/plain), 2009-07-27 20:50 UTC, Abraham Alawi
Patch to fix the problem (5.07 KB, patch), 2009-08-19 17:54 UTC, Robert Peterson
GFS rpm test package to try (150.78 KB, application/octet-stream), 2009-08-19 17:56 UTC, Robert Peterson


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2010:0291 | 0 | normal | SHIPPED_LIVE | Moderate: gfs-kmod security, bug fix and enhancement update | 2010-03-29 14:12:22 UTC

Description Abraham Alawi 2009-07-26 23:18:16 UTC
Description of problem:
gfs kernel panic

Version-Release number of selected component (if applicable):

Kernel : 2.6.18-128.el5
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GFS: filename:       /lib/modules/2.6.18-128.el5/extra/gfs/gfs.ko
license:        GPL
author:         Red Hat, Inc.
description:    Global File System 0.1.31-3.el5
srcversion:     7F9FE59FBFC0B8BF89F8C0F
depends:        dlm
vermagic:       2.6.18-128.el5 SMP mod_unload gcc-4.1
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rpm -qi nfs-utils-1.0.9-40.el5
Name        : nfs-utils                    Relocations: (not relocatable)
Version     : 1.0.9                             Vendor: Red Hat, Inc.
Release     : 40.el5                        Build Date: Thu 13 Nov 2008 08:12:27 AM NZDT
Install Date: Thu 12 Mar 2009 04:00:13 PM NZDT      Build Host: hs20-bc1-5.build.redhat.com
Group       : System Environment/Daemons    Source RPM: nfs-utils-1.0.9-40.el5.src.rpm
Size        : 811771                           License: GPL
Signature   : DSA/SHA1, Wed 17 Dec 2008 04:54:57 AM NZDT, Key ID 5326810137017186
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : NFS utlilities and supporting clients and daemons for the kernel NFS server.
Description :
The nfs-utils package provides a daemon for the kernel NFS server and
related tools, which provides a much higher level of performance than the
traditional Linux NFS server used by most users.

This package also contains the showmount program.  Showmount queries the
mount daemon on a remote host for information about the NFS (Network File
System) server on the remote host.  For example, showmount can display the
clients which are mounted on that host.

This package also contains the mount.nfs and umount.nfs program.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
modinfo nfsd
filename:       /lib/modules/2.6.18-128.el5/kernel/fs/nfsd/nfsd.ko
license:        GPL
author:         Olaf Kirch <okir.de>
srcversion:     9CFBCCAF9BD15695B71C743
depends:        auth_rpcgss,sunrpc,exportfs,lockd,nfs_acl
vermagic:       2.6.18-128.el5 SMP mod_unload gcc-4.1
module_sig:	883f35049492f705cdc734e64d24fa1121e409f696e9b208888fa84a377528f910bb535e1042cb09e31481f718a2968a4bc96aa48ec3fd616e8d983
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
lsmod
Module                  Size  Used by
nfsd                  285065  29 
exportfs               38849  1 nfsd
lockd                  99185  2 nfsd
nfs_acl                36673  1 nfsd
auth_rpcgss            81889  1 nfsd
gfs                   324380  2 
aoe                    60257  2 
autofs4                57033  2 
lock_dlm               51425  0 
gfs2                  526829  1 lock_dlm
dlm                   159553  18 gfs,lock_dlm
configfs               62301  2 dlm
sunrpc                197897  14 nfsd,lockd,nfs_acl,auth_rpcgss
ipv6                  424609  27 
xfrm_nalgo             43333  1 ipv6
crypto_api             42945  1 xfrm_nalgo
xt_state               35265  1 
ip_conntrack           91109  1 xt_state
nfnetlink              40457  1 ip_conntrack
ipt_iprange            34881  2 
xt_tcpudp              36289  23 
xt_comment             35009  34 
xt_multiport           36417  5 
iptable_filter         36161  1 
ip_tables              55329  1 iptable_filter
x_tables               50377  6 xt_state,ipt_iprange,xt_tcpudp,xt_comment,xt_multiport,ip_tables
dm_multipath           55257  0 
scsi_dh                41665  1 dm_multipath
video                  53197  0 
hwmon                  36553  0 
backlight              39873  1 video
sbs                    49921  0 
i2c_ec                 38593  1 sbs
i2c_core               56129  1 i2c_ec
button                 40545  0 
battery                43849  0 
asus_acpi              50917  0 
acpi_memhotplug        40133  0 
ac                     38729  0 
parport_pc             62312  0 
lp                     47121  0 
parport                73165  2 parport_pc,lp
tg3                   151621  0 
libphy                 53825  1 tg3
sg                     69993  0 
pcspkr                 36289  0 
dm_raid45              99025  0 
dm_message             36161  1 dm_raid45
dm_region_hash         46145  1 dm_raid45
dm_mem_cache           38977  1 dm_raid45
dm_snapshot            51465  0 
dm_zero                35265  0 
dm_mirror              53065  0 
dm_log                 44865  3 dm_raid45,dm_region_hash,dm_mirror
dm_mod                100369  15 dm_multipath,dm_raid45,dm_snapshot,dm_zero,dm_mirror,dm_log
usb_storage           116129  0 
qla2xxx              1107173  0 
scsi_transport_fc      73801  1 qla2xxx
ata_piix               56901  0 
libata                208721  1 ata_piix
shpchp                 70637  0 
mptsas                 69201  2 
mptscsih               69697  1 mptsas
mptbase               113637  2 mptsas,mptscsih
scsi_transport_sas     66753  1 mptsas
sd_mod                 56385  3 
scsi_mod              196569  10 scsi_dh,sg,usb_storage,qla2xxx,scsi_transport_fc,libata,mptsas,mptscsih,scsi_transport_sas,sd_mod
ext3                  168017  2 
jbd                    94257  1 ext3
uhci_hcd               57433  0 
ohci_hcd               55925  0 
ehci_hcd               65741  0 


How reproducible:
NA

Steps to Reproduce:
NA

Actual results:


Expected results:
NA

Additional info:

Jul 27 10:13:23 charlotte kernel: Call Trace: 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8870d8f6>] :gfs:gfs_assert_i+0x5e/0x89 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8870c968>] :gfs:gfs_trans_begin_i+0x178/0x1b2 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff886e9487>] :gfs:gfs_ea_acl_chmod+0x52/0x3c4 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff886e8e25>] :gfs:ea_find_i+0x0/0x6b 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff886e0699>] :gfs:gfs_acl_chmod+0x139/0x184 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff88702b3d>] :gfs:gfs_setattr+0x30d/0x371 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8022bfe0>] __qdisc_run+0x36/0x1bb 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8000e00a>] current_fs_time+0x3b/0x40 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8002c5ca>] notify_change+0x145/0x2e0 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff800c2316>] __remove_suid+0x15/0x1a 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8002b474>] remove_suid+0x9/0x1c 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff800160a1>] __generic_file_aio_write_nolock+0x277/0x3b8 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff886e2b4d>] :gfs:gfs_dreread+0x72/0xc7 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8870a247>] :gfs:gfs_rgrp_read+0xe7/0x226 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff800c2ce5>] generic_file_aio_write_nolock+0x20/0x6c 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff800c30b1>] generic_file_write_nolock+0x8f/0xa8 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8009db21>] autoremove_wake_function+0x0/0x2e 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8870c5fb>] :gfs:gfs_trans_add_bh+0xc7/0xd9 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff886fd5b8>] :gfs:gfs_dinode_out+0x162/0x18f 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff886ff723>] :gfs:do_write_buf+0x443/0x67e 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff886fef38>] :gfs:walk_vm+0x10e/0x311 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff886ff2e0>] :gfs:do_write_buf+0x0/0x67e 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff800631ac>] wait_for_completion+0x1f/0xa2 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff886ff1e7>] :gfs:__gfs_write+0xac/0xc6 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff800dbb02>] do_readv_writev+0x198/0x295 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff886ff22a>] :gfs:gfs_write+0x0/0x8 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff886ed214>] :gfs:gfs_glock_dq+0x13c/0x14b 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff88700844>] :gfs:gfs_open+0x12c/0x15e 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff887545f4>] :nfsd:nfsd_vfs_write+0xf2/0x2e1 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff88700718>] :gfs:gfs_open+0x0/0x15e 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8001e51a>] __dentry_open+0x101/0x1dc 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff88754e68>] :nfsd:nfsd_write+0xb5/0xd5 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8875b986>] :nfsd:nfsd3_proc_write+0xea/0x109 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff887511db>] :nfsd:nfsd_dispatch+0xd8/0x1d6 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8856348b>] :sunrpc:svc_process+0x454/0x71b 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff800646f5>] __down_read+0x12/0x92 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff887515a1>] :nfsd:nfsd+0x0/0x2cb 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff88751746>] :nfsd:nfsd+0x1a5/0x2cb 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff887515a1>] :nfsd:nfsd+0x0/0x2cb 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff887515a1>] :nfsd:nfsd+0x0/0x2cb 
Jul 27 10:13:23 charlotte kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11

Comment 1 Steve Whitehouse 2009-07-27 08:59:52 UTC
Abraham, please confirm whether you have posix ACLs enabled. Also if you still have it, please attach the assert message to this bz, as I think there are a few lines missing from the top of the back trace you've posted.

It looks like the issue is that GFS has tried to open a new transaction (to remove the suid) while it already has one open for the write. I suspect that if you don't use posix acls, you won't hit this particular issue, so that might be a short-term workaround depending on your application.
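
The failure mode described above can be illustrated with a toy model. This is a minimal Python sketch, not GFS code: the names Task, trans_begin, trans_end, write_with_suid_removal and NestedTransactionError are hypothetical, and the only behaviour taken from this report is that beginning a transaction while one is already open trips an assertion (reported later in this bug as "!get_transaction" in gfs_trans_begin_i).

```python
class NestedTransactionError(AssertionError):
    """Raised when a task begins a transaction while one is already open."""


class Task:
    """Hypothetical stand-in for a kernel thread such as an nfsd worker."""

    def __init__(self):
        self.transaction = None  # label of the open transaction, if any


def trans_begin(task, label):
    # Models the "!get_transaction" assertion: a task may hold at most
    # one open transaction at a time.
    if task.transaction is not None:
        raise NestedTransactionError(
            f"cannot begin {label!r}: {task.transaction!r} is still open"
        )
    task.transaction = label


def trans_end(task):
    task.transaction = None


def write_with_suid_removal(task):
    """Model of the write path seen in the call trace (an assumption about
    ordering): the write opens a transaction, then suid removal reaches the
    ACL chmod code, which tries to open a second one."""
    trans_begin(task, "write")
    try:
        # remove_suid -> notify_change -> gfs_setattr -> gfs_acl_chmod
        trans_begin(task, "acl_chmod")  # fails: "write" is still open
    finally:
        trans_end(task)
```

In this model, calling write_with_suid_removal(Task()) raises NestedTransactionError. In the kernel the equivalent failed assertion panics the node rather than raising an exception.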

Comment 2 Abraham Alawi 2009-07-27 20:50:01 UTC
Created attachment 355319
full dmesg - kernel trace

Comment 3 Abraham Alawi 2009-07-27 20:53:58 UTC
Thanks Steve. Yes, POSIX ACL is enabled and being used. Here are the mount options:
nodev,nosuid,nouser,rw,dirsync,_netdev,acl

Also, I've attached the dmesg for the full runtime cycle. Let me know if you need more info.

Comment 4 Steve Whitehouse 2009-07-28 09:42:48 UTC
Hmm, it's odd that the assert message itself doesn't appear in the logs. On the other hand there is only one in the transaction start function, so it does look like my first suggestion was correct. The question now is how to fix it... we'll be in touch when we have a solution. Thanks for the report.

Comment 7 Abraham Alawi 2009-08-04 05:03:12 UTC
The panic occurred again; this time there is an assertion message:

Aug  4 13:05:55 charlotte kernel: GFS: fsid=FSC:files.1: fast statfs start time = 1249347860 
Aug  4 13:29:08 charlotte kernel:  
Aug  4 13:29:08 charlotte kernel: Call Trace: 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886948f6>] :gfs:gfs_assert_i+0x5e/0x89 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88693968>] :gfs:gfs_trans_begin_i+0x178/0x1b2 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88670487>] :gfs:gfs_ea_acl_chmod+0x52/0x3c4 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff8866fe25>] :gfs:ea_find_i+0x0/0x6b 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88667699>] :gfs:gfs_acl_chmod+0x139/0x184 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88689b3d>] :gfs:gfs_setattr+0x30d/0x371 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff8000e00a>] current_fs_time+0x3b/0x40 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff8002c5ca>] notify_change+0x145/0x2e0 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff800c2316>] __remove_suid+0x15/0x1a 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff8002b474>] remove_suid+0x9/0x1c 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff800160a1>] __generic_file_aio_write_nolock+0x277/0x3b8 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88669b4d>] :gfs:gfs_dreread+0x72/0xc7 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88691247>] :gfs:gfs_rgrp_read+0xe7/0x226 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff800c2ce5>] generic_file_aio_write_nolock+0x20/0x6c 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff800c30b1>] generic_file_write_nolock+0x8f/0xa8 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff8009db21>] autoremove_wake_function+0x0/0x2e 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886935fb>] :gfs:gfs_trans_add_bh+0xc7/0xd9 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886845b8>] :gfs:gfs_dinode_out+0x162/0x18f 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88686723>] :gfs:do_write_buf+0x443/0x67e 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88685f38>] :gfs:walk_vm+0x10e/0x311 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886862e0>] :gfs:do_write_buf+0x0/0x67e 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff800631ac>] wait_for_completion+0x1f/0xa2 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886861e7>] :gfs:__gfs_write+0xac/0xc6 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff800dbb02>] do_readv_writev+0x198/0x295 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff8868622a>] :gfs:gfs_write+0x0/0x8 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88674214>] :gfs:gfs_glock_dq+0x13c/0x14b 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88687844>] :gfs:gfs_open+0x12c/0x15e 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886ff5f4>] :nfsd:nfsd_vfs_write+0xf2/0x2e1 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88687718>] :gfs:gfs_open+0x0/0x15e 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff8001e51a>] __dentry_open+0x101/0x1dc 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886ffe68>] :nfsd:nfsd_write+0xb5/0xd5 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88706986>] :nfsd:nfsd3_proc_write+0xea/0x109 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886fc1db>] :nfsd:nfsd_dispatch+0xd8/0x1d6 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff8856448b>] :sunrpc:svc_process+0x454/0x71b 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff800646f5>] __down_read+0x12/0x92 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886fc5a1>] :nfsd:nfsd+0x0/0x2cb 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886fc746>] :nfsd:nfsd+0x1a5/0x2cb 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886fc5a1>] :nfsd:nfsd+0x0/0x2cb 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886fc5a1>] :nfsd:nfsd+0x0/0x2cb 
Aug  4 13:29:08 charlotte kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11 
Aug  4 13:29:08 charlotte kernel:  
Aug  4 13:29:08 charlotte kernel: Kernel panic - not syncing: GFS: fsid=FSC:files.1: assertion "!get_transaction" failed 
Aug  4 13:29:08 charlotte kernel: GFS: fsid=FSC:files.1:   function = gfs_trans_begin_i 
Aug  4 13:29:08 charlotte kernel: GFS: fsid=FSC:files.1:   file = /builddir/build/BUILD/gfs-kmod-0.1.31/_kmod_build_/src/gfs/trans.c, line = 136 
Aug  4 13:29:08 charlotte kernel: GFS: fsid=FSC:files.1:   time = 1249349254 
Aug  4 13:29:08 charlotte kernel:  
Aug  4 13:33:09 charlotte kernel: klogd 1.4.1, log source = /proc/kmsg started. 
Aug  4 13:33:09 charlotte kernel: Linux version 2.6.18-128.el5 (mockbuild.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Dec 17 11:41:38 EST 2008
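
To compare traces like the two posted in this bug, the frames can be pulled out of the syslog lines mechanically. The following is a minimal Python sketch under the assumption that every frame has the form "[<address>] :module:symbol+0xoff/0xlen" (module prefix optional), as in the logs above; FRAME_RE, parse_frames and SAMPLE are hypothetical names, and the sample lines are copied from the trace in this comment.

```python
import re

# One frame looks like: [<ffffffff88693968>] :gfs:gfs_trans_begin_i+0x178/0x1b2
# Core kernel symbols have no ":module:" prefix.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

SAMPLE = """\
Aug  4 13:29:08 charlotte kernel:  [<ffffffff88693968>] :gfs:gfs_trans_begin_i+0x178/0x1b2
Aug  4 13:29:08 charlotte kernel:  [<ffffffff8002b474>] remove_suid+0x9/0x1c
Aug  4 13:29:08 charlotte kernel:  [<ffffffff886ffe68>] :nfsd:nfsd_write+0xb5/0xd5
"""


def parse_frames(log_text):
    """Return (module, symbol) pairs in the order they appear in the log.

    Frames without a module prefix are reported under the label "kernel".
    """
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("module") or "kernel",
                           match.group("symbol")))
    return frames


if __name__ == "__main__":
    for module, symbol in parse_frames(SAMPLE):
        print(f"{module:8s} {symbol}")
```

Note that a trace in this format can include stale entries left on the stack, so the extracted list is a starting point for reading the call path rather than an exact call chain.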

Comment 8 Robert Peterson 2009-08-19 17:21:42 UTC
Based on version information in the comments, this must be RHEL5.
Changing bugzilla version information accordingly.

Comment 9 Robert Peterson 2009-08-19 17:54:34 UTC
Created attachment 357965
Patch to fix the problem

This is a RHEL5.3 version of Steve's patch to fix the problem.

Comment 10 Robert Peterson 2009-08-19 17:56:06 UTC
Created attachment 357966
GFS rpm test package to try

This is an x86_64 rpm for the previously attached patch.
Please try this version of gfs and tell us if it fixes the
problem.

Comment 11 Robert Peterson 2009-08-19 17:57:15 UTC
Setting the NEEDINFO flag until we find out if the previously
attached rpm package fixes the problem.

Comment 12 Abraham Alawi 2009-08-20 02:33:17 UTC
Thanks, I'll try it and let you know if the problem occurs again.

Comment 13 Abraham Alawi 2009-10-21 21:12:17 UTC
Just to let you know, since deploying the new module the problem has not recurred. Thanks!

Comment 14 Robert Peterson 2009-12-07 15:54:36 UTC
Since the patch fixes the problem, perhaps we should get this into
GFS for 5.5.  Requesting ack flags.

Comment 15 Robert Peterson 2009-12-07 19:12:41 UTC
This patch was tested by the customer and found to be correct
as per comment #13.

This patch was pushed to the master branch of the gfs1-utils
git tree and the STABLE3 and RHEL55 branches of the cluster git
tree for inclusion into 5.5.  Changing status to POST until a
build is done.

Comment 16 Robert Peterson 2009-12-07 19:13:50 UTC
Here are the git tree commit IDs:

RHEL55  a07e555 GFS kernel panic, suid + nfsd with posix ACLs enabled
STABLE3 ac582a1 GFS kernel panic, suid + nfsd with posix ACLs enabled
master  3f34656 GFS kernel panic, suid + nfsd with posix ACLs enabled

Comment 17 Robert Peterson 2009-12-07 23:15:09 UTC
Build 2136058 is complete and successful.  This is now fixed in
gfs-kmod-0.1.34-7.el5.  Changing status to Modified.

Comment 22 errata-xmlrpc 2010-03-30 08:56:15 UTC
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0291.html

