Description of problem:
GFS kernel panic

Version-Release number of selected component (if applicable):
Kernel : 2.6.18-128.el5
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GFS:
filename: /lib/modules/2.6.18-128.el5/extra/gfs/gfs.ko
license: GPL
author: Red Hat, Inc.
description: Global File System 0.1.31-3.el5
srcversion: 7F9FE59FBFC0B8BF89F8C0F
depends: dlm
vermagic: 2.6.18-128.el5 SMP mod_unload gcc-4.1
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rpm -qi nfs-utils-1.0.9-40.el5
Name : nfs-utils Relocations: (not relocatable)
Version : 1.0.9 Vendor: Red Hat, Inc.
Release : 40.el5 Build Date: Thu 13 Nov 2008 08:12:27 AM NZDT
Install Date: Thu 12 Mar 2009 04:00:13 PM NZDT Build Host: hs20-bc1-5.build.redhat.com
Group : System Environment/Daemons Source RPM: nfs-utils-1.0.9-40.el5.src.rpm
Size : 811771 License: GPL
Signature : DSA/SHA1, Wed 17 Dec 2008 04:54:57 AM NZDT, Key ID 5326810137017186
Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary : NFS utlilities and supporting clients and daemons for the kernel NFS server.
Description :
The nfs-utils package provides a daemon for the kernel NFS server and related tools, which provides a much higher level of performance than the traditional Linux NFS server used by most users.

This package also contains the showmount program. Showmount queries the mount daemon on a remote host for information about the NFS (Network File System) server on the remote host. For example, showmount can display the clients which are mounted on that host.

This package also contains the mount.nfs and umount.nfs program.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
modinfo nfsd
filename: /lib/modules/2.6.18-128.el5/kernel/fs/nfsd/nfsd.ko
license: GPL
author: Olaf Kirch <okir.de>
srcversion: 9CFBCCAF9BD15695B71C743
depends: auth_rpcgss,sunrpc,exportfs,lockd,nfs_acl
vermagic: 2.6.18-128.el5 SMP mod_unload gcc-4.1
module_sig: 883f35049492f705cdc734e64d24fa1121e409f696e9b208888fa84a377528f910bb535e1042cb09e31481f718a2968a4bc96aa48ec3fd616e8d983
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
lsmod
Module Size Used by
nfsd 285065 29
exportfs 38849 1 nfsd
lockd 99185 2 nfsd
nfs_acl 36673 1 nfsd
auth_rpcgss 81889 1 nfsd
gfs 324380 2
aoe 60257 2
autofs4 57033 2
lock_dlm 51425 0
gfs2 526829 1 lock_dlm
dlm 159553 18 gfs,lock_dlm
configfs 62301 2 dlm
sunrpc 197897 14 nfsd,lockd,nfs_acl,auth_rpcgss
ipv6 424609 27
xfrm_nalgo 43333 1 ipv6
crypto_api 42945 1 xfrm_nalgo
xt_state 35265 1
ip_conntrack 91109 1 xt_state
nfnetlink 40457 1 ip_conntrack
ipt_iprange 34881 2
xt_tcpudp 36289 23
xt_comment 35009 34
xt_multiport 36417 5
iptable_filter 36161 1
ip_tables 55329 1 iptable_filter
x_tables 50377 6 xt_state,ipt_iprange,xt_tcpudp,xt_comment,xt_multiport,ip_tables
dm_multipath 55257 0
scsi_dh 41665 1 dm_multipath
video 53197 0
hwmon 36553 0
backlight 39873 1 video
sbs 49921 0
i2c_ec 38593 1 sbs
i2c_core 56129 1 i2c_ec
button 40545 0
battery 43849 0
asus_acpi 50917 0
acpi_memhotplug 40133 0
ac 38729 0
parport_pc 62312 0
lp 47121 0
parport 73165 2 parport_pc,lp
tg3 151621 0
libphy 53825 1 tg3
sg 69993 0
pcspkr 36289 0
dm_raid45 99025 0
dm_message 36161 1 dm_raid45
dm_region_hash 46145 1 dm_raid45
dm_mem_cache 38977 1 dm_raid45
dm_snapshot 51465 0
dm_zero 35265 0
dm_mirror 53065 0
dm_log 44865 3 dm_raid45,dm_region_hash,dm_mirror
dm_mod 100369 15 dm_multipath,dm_raid45,dm_snapshot,dm_zero,dm_mirror,dm_log
usb_storage 116129 0
qla2xxx 1107173 0
scsi_transport_fc 73801 1 qla2xxx
ata_piix 56901 0
libata 208721 1 ata_piix
shpchp 70637 0
mptsas 69201 2
mptscsih 69697 1 mptsas
mptbase 113637 2 mptsas,mptscsih
scsi_transport_sas 66753 1 mptsas
sd_mod 56385 3
scsi_mod 196569 10 scsi_dh,sg,usb_storage,qla2xxx,scsi_transport_fc,libata,mptsas,mptscsih,scsi_transport_sas,sd_mod
ext3 168017 2
jbd 94257 1 ext3
uhci_hcd 57433 0
ohci_hcd 55925 0
ehci_hcd 65741 0

How reproducible:
NA

Steps to Reproduce:
NA

Actual results:

Expected results:
NA

Additional info:
Jul 27 10:13:23 charlotte kernel: Call Trace:
Jul 27 10:13:23 charlotte kernel: [<ffffffff8870d8f6>] :gfs:gfs_assert_i+0x5e/0x89
Jul 27 10:13:23 charlotte kernel: [<ffffffff8870c968>] :gfs:gfs_trans_begin_i+0x178/0x1b2
Jul 27 10:13:23 charlotte kernel: [<ffffffff886e9487>] :gfs:gfs_ea_acl_chmod+0x52/0x3c4
Jul 27 10:13:23 charlotte kernel: [<ffffffff886e8e25>] :gfs:ea_find_i+0x0/0x6b
Jul 27 10:13:23 charlotte kernel: [<ffffffff886e0699>] :gfs:gfs_acl_chmod+0x139/0x184
Jul 27 10:13:23 charlotte kernel: [<ffffffff88702b3d>] :gfs:gfs_setattr+0x30d/0x371
Jul 27 10:13:23 charlotte kernel: [<ffffffff8022bfe0>] __qdisc_run+0x36/0x1bb
Jul 27 10:13:23 charlotte kernel: [<ffffffff8000e00a>] current_fs_time+0x3b/0x40
Jul 27 10:13:23 charlotte kernel: [<ffffffff8002c5ca>] notify_change+0x145/0x2e0
Jul 27 10:13:23 charlotte kernel: [<ffffffff800c2316>] __remove_suid+0x15/0x1a
Jul 27 10:13:23 charlotte kernel: [<ffffffff8002b474>] remove_suid+0x9/0x1c
Jul 27 10:13:23 charlotte kernel: [<ffffffff800160a1>] __generic_file_aio_write_nolock+0x277/0x3b8
Jul 27 10:13:23 charlotte kernel: [<ffffffff886e2b4d>] :gfs:gfs_dreread+0x72/0xc7
Jul 27 10:13:23 charlotte kernel: [<ffffffff8870a247>] :gfs:gfs_rgrp_read+0xe7/0x226
Jul 27 10:13:23 charlotte kernel: [<ffffffff800c2ce5>] generic_file_aio_write_nolock+0x20/0x6c
Jul 27 10:13:23 charlotte kernel: [<ffffffff800c30b1>] generic_file_write_nolock+0x8f/0xa8
Jul 27 10:13:23 charlotte kernel: [<ffffffff8009db21>] autoremove_wake_function+0x0/0x2e
Jul 27 10:13:23 charlotte kernel: [<ffffffff8870c5fb>] :gfs:gfs_trans_add_bh+0xc7/0xd9
Jul 27 10:13:23 charlotte kernel: [<ffffffff886fd5b8>] :gfs:gfs_dinode_out+0x162/0x18f
Jul 27 10:13:23 charlotte kernel: [<ffffffff886ff723>] :gfs:do_write_buf+0x443/0x67e
Jul 27 10:13:23 charlotte kernel: [<ffffffff886fef38>] :gfs:walk_vm+0x10e/0x311
Jul 27 10:13:23 charlotte kernel: [<ffffffff886ff2e0>] :gfs:do_write_buf+0x0/0x67e
Jul 27 10:13:23 charlotte kernel: [<ffffffff800631ac>] wait_for_completion+0x1f/0xa2
Jul 27 10:13:23 charlotte kernel: [<ffffffff886ff1e7>] :gfs:__gfs_write+0xac/0xc6
Jul 27 10:13:23 charlotte kernel: [<ffffffff800dbb02>] do_readv_writev+0x198/0x295
Jul 27 10:13:23 charlotte kernel: [<ffffffff886ff22a>] :gfs:gfs_write+0x0/0x8
Jul 27 10:13:23 charlotte kernel: [<ffffffff886ed214>] :gfs:gfs_glock_dq+0x13c/0x14b
Jul 27 10:13:23 charlotte kernel: [<ffffffff88700844>] :gfs:gfs_open+0x12c/0x15e
Jul 27 10:13:23 charlotte kernel: [<ffffffff887545f4>] :nfsd:nfsd_vfs_write+0xf2/0x2e1
Jul 27 10:13:23 charlotte kernel: [<ffffffff88700718>] :gfs:gfs_open+0x0/0x15e
Jul 27 10:13:23 charlotte kernel: [<ffffffff8001e51a>] __dentry_open+0x101/0x1dc
Jul 27 10:13:23 charlotte kernel: [<ffffffff88754e68>] :nfsd:nfsd_write+0xb5/0xd5
Jul 27 10:13:23 charlotte kernel: [<ffffffff8875b986>] :nfsd:nfsd3_proc_write+0xea/0x109
Jul 27 10:13:23 charlotte kernel: [<ffffffff887511db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
Jul 27 10:13:23 charlotte kernel: [<ffffffff8856348b>] :sunrpc:svc_process+0x454/0x71b
Jul 27 10:13:23 charlotte kernel: [<ffffffff800646f5>] __down_read+0x12/0x92
Jul 27 10:13:23 charlotte kernel: [<ffffffff887515a1>] :nfsd:nfsd+0x0/0x2cb
Jul 27 10:13:23 charlotte kernel: [<ffffffff88751746>] :nfsd:nfsd+0x1a5/0x2cb
Jul 27 10:13:23 charlotte kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jul 27 10:13:23 charlotte kernel: [<ffffffff887515a1>] :nfsd:nfsd+0x0/0x2cb
Jul 27 10:13:23 charlotte kernel: [<ffffffff887515a1>] :nfsd:nfsd+0x0/0x2cb
Jul 27 10:13:23 charlotte kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Abraham, please confirm whether you have POSIX ACLs enabled. Also, if you still have it, please attach the assert message to this bz, as I think a few lines are missing from the top of the backtrace you've posted. It looks like the issue is that GFS has tried to open a new transaction (to remove the suid bit) while it already has one open for the write. I suspect that if you don't use POSIX ACLs, you won't hit this particular issue, so that might be a short-term workaround depending on your application.
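To make the suspected failure mode concrete, here is a minimal userspace model in C. It is not GFS code: `struct trans`, `current_trans`, `trans_begin`, `trans_end`, `model_setattr_mode` and `model_write` are all hypothetical names. It assumes, as suggested above, that the write path already holds an open transaction when remove_suid() triggers a mode change, and that only the POSIX ACL chmod path (gfs_acl_chmod, then gfs_ea_acl_chmod) begins a second transaction. Where GFS would fail its "!get_transaction" assertion and panic, the model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GFS transaction; its contents do not matter here. */
struct trans {
    int unused;
};

/* Models a per-task "current transaction" slot (an assumption, not GFS code). */
static struct trans *current_trans = NULL;
static struct trans the_trans;

/* Begin a transaction. Returns false in the situation where GFS would
 * instead fail its "!get_transaction" assertion: one is already open. */
static bool trans_begin(void)
{
    if (current_trans != NULL)
        return false;
    current_trans = &the_trans;
    return true;
}

/* End the currently open transaction. */
static void trans_end(void)
{
    assert(current_trans != NULL);
    current_trans = NULL;
}

/* Models the mode change remove_suid() requests via notify_change().
 * Assumption: only the POSIX ACL path begins its own transaction;
 * the plain (non-ACL) path begins none in this model. */
static bool model_setattr_mode(bool posix_acls)
{
    if (!posix_acls)
        return true;
    if (!trans_begin())
        return false;      /* nested begin: the reported panic */
    trans_end();
    return true;
}

/* Models a write: a transaction is already open when the suid bit is cleared. */
static bool model_write(bool posix_acls, bool file_is_suid)
{
    bool ok = true;

    if (!trans_begin())
        return false;
    if (file_is_suid)
        ok = model_setattr_mode(posix_acls);
    trans_end();
    return ok;
}
```

In this model only the combination of ACLs enabled and a suid file being written fails, which is consistent with the workaround of not using POSIX ACLs; the ACL chmod path on its own, outside a write, succeeds.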
Created attachment 355319: full dmesg - kernel trace
Thanks Steve. Yes, POSIX ACLs are enabled and in use. These are the mount options:
nodev,nosuid,nouser,rw,dirsync,_netdev,acl
Also, I've attached the dmesg for the full runtime cycle. Let me know if you need more info.
Hmm, it's odd that the assert message itself doesn't appear in the logs. On the other hand, there is only one assertion in the transaction start function, so it does look like my first suggestion was correct. The question now is how to fix it... we'll be in touch when we have a solution. Thanks for the report.
It recurred, and this time there is an assertion message:
Aug 4 13:05:55 charlotte kernel: GFS: fsid=FSC:files.1: fast statfs start time = 1249347860
Aug 4 13:29:08 charlotte kernel:
Aug 4 13:29:08 charlotte kernel: Call Trace:
Aug 4 13:29:08 charlotte kernel: [<ffffffff886948f6>] :gfs:gfs_assert_i+0x5e/0x89
Aug 4 13:29:08 charlotte kernel: [<ffffffff88693968>] :gfs:gfs_trans_begin_i+0x178/0x1b2
Aug 4 13:29:08 charlotte kernel: [<ffffffff88670487>] :gfs:gfs_ea_acl_chmod+0x52/0x3c4
Aug 4 13:29:08 charlotte kernel: [<ffffffff8866fe25>] :gfs:ea_find_i+0x0/0x6b
Aug 4 13:29:08 charlotte kernel: [<ffffffff88667699>] :gfs:gfs_acl_chmod+0x139/0x184
Aug 4 13:29:08 charlotte kernel: [<ffffffff88689b3d>] :gfs:gfs_setattr+0x30d/0x371
Aug 4 13:29:08 charlotte kernel: [<ffffffff8000e00a>] current_fs_time+0x3b/0x40
Aug 4 13:29:08 charlotte kernel: [<ffffffff8002c5ca>] notify_change+0x145/0x2e0
Aug 4 13:29:08 charlotte kernel: [<ffffffff800c2316>] __remove_suid+0x15/0x1a
Aug 4 13:29:08 charlotte kernel: [<ffffffff8002b474>] remove_suid+0x9/0x1c
Aug 4 13:29:08 charlotte kernel: [<ffffffff800160a1>] __generic_file_aio_write_nolock+0x277/0x3b8
Aug 4 13:29:08 charlotte kernel: [<ffffffff88669b4d>] :gfs:gfs_dreread+0x72/0xc7
Aug 4 13:29:08 charlotte kernel: [<ffffffff88691247>] :gfs:gfs_rgrp_read+0xe7/0x226
Aug 4 13:29:08 charlotte kernel: [<ffffffff800c2ce5>] generic_file_aio_write_nolock+0x20/0x6c
Aug 4 13:29:08 charlotte kernel: [<ffffffff800c30b1>] generic_file_write_nolock+0x8f/0xa8
Aug 4 13:29:08 charlotte kernel: [<ffffffff8009db21>] autoremove_wake_function+0x0/0x2e
Aug 4 13:29:08 charlotte kernel: [<ffffffff886935fb>] :gfs:gfs_trans_add_bh+0xc7/0xd9
Aug 4 13:29:08 charlotte kernel: [<ffffffff886845b8>] :gfs:gfs_dinode_out+0x162/0x18f
Aug 4 13:29:08 charlotte kernel: [<ffffffff88686723>] :gfs:do_write_buf+0x443/0x67e
Aug 4 13:29:08 charlotte kernel: [<ffffffff88685f38>] :gfs:walk_vm+0x10e/0x311
Aug 4 13:29:08 charlotte kernel: [<ffffffff886862e0>] :gfs:do_write_buf+0x0/0x67e
Aug 4 13:29:08 charlotte kernel: [<ffffffff800631ac>] wait_for_completion+0x1f/0xa2
Aug 4 13:29:08 charlotte kernel: [<ffffffff886861e7>] :gfs:__gfs_write+0xac/0xc6
Aug 4 13:29:08 charlotte kernel: [<ffffffff800dbb02>] do_readv_writev+0x198/0x295
Aug 4 13:29:08 charlotte kernel: [<ffffffff8868622a>] :gfs:gfs_write+0x0/0x8
Aug 4 13:29:08 charlotte kernel: [<ffffffff88674214>] :gfs:gfs_glock_dq+0x13c/0x14b
Aug 4 13:29:08 charlotte kernel: [<ffffffff88687844>] :gfs:gfs_open+0x12c/0x15e
Aug 4 13:29:08 charlotte kernel: [<ffffffff886ff5f4>] :nfsd:nfsd_vfs_write+0xf2/0x2e1
Aug 4 13:29:08 charlotte kernel: [<ffffffff88687718>] :gfs:gfs_open+0x0/0x15e
Aug 4 13:29:08 charlotte kernel: [<ffffffff8001e51a>] __dentry_open+0x101/0x1dc
Aug 4 13:29:08 charlotte kernel: [<ffffffff886ffe68>] :nfsd:nfsd_write+0xb5/0xd5
Aug 4 13:29:08 charlotte kernel: [<ffffffff88706986>] :nfsd:nfsd3_proc_write+0xea/0x109
Aug 4 13:29:08 charlotte kernel: [<ffffffff886fc1db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
Aug 4 13:29:08 charlotte kernel: [<ffffffff8856448b>] :sunrpc:svc_process+0x454/0x71b
Aug 4 13:29:08 charlotte kernel: [<ffffffff800646f5>] __down_read+0x12/0x92
Aug 4 13:29:08 charlotte kernel: [<ffffffff886fc5a1>] :nfsd:nfsd+0x0/0x2cb
Aug 4 13:29:08 charlotte kernel: [<ffffffff886fc746>] :nfsd:nfsd+0x1a5/0x2cb
Aug 4 13:29:08 charlotte kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Aug 4 13:29:08 charlotte kernel: [<ffffffff886fc5a1>] :nfsd:nfsd+0x0/0x2cb
Aug 4 13:29:08 charlotte kernel: [<ffffffff886fc5a1>] :nfsd:nfsd+0x0/0x2cb
Aug 4 13:29:08 charlotte kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Aug 4 13:29:08 charlotte kernel:
Aug 4 13:29:08 charlotte kernel: Kernel panic - not syncing: GFS: fsid=FSC:files.1: assertion "!get_transaction" failed
Aug 4 13:29:08 charlotte kernel: GFS: fsid=FSC:files.1: function = gfs_trans_begin_i
Aug 4 13:29:08 charlotte kernel: GFS: fsid=FSC:files.1: file = /builddir/build/BUILD/gfs-kmod-0.1.31/_kmod_build_/src/gfs/trans.c, line = 136
Aug 4 13:29:08 charlotte kernel: GFS: fsid=FSC:files.1: time = 1249349254
Aug 4 13:29:08 charlotte kernel:
Aug 4 13:33:09 charlotte kernel: klogd 1.4.1, log source = /proc/kmsg started.
Aug 4 13:33:09 charlotte kernel: Linux version 2.6.18-128.el5 (mockbuild.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Dec 17 11:41:38 EST 2008
Based on version information in the comments, this must be RHEL5. Changing bugzilla version information accordingly.
Created attachment 357965: Patch to fix the problem
This is a RHEL5.3 version of Steve's patch to fix the problem.
Created attachment 357966: GFS rpm test package to try
This is an x86_64 rpm for the previously attached patch. Please try this version of gfs and tell us if it fixes the problem.
Setting the NEEDINFO flag until we find out whether the previously attached rpm package fixes the problem.
Thanks, I'll try it and let you know if the problem occurs again.
Just to let you know: since deploying the new module, the problem has not recurred. Thanks!
Since the patch fixes the problem, perhaps we should get this into GFS for 5.5. Requesting ack flags.
This patch was tested by the customer and found to be correct as per comment #13. This patch was pushed to the master branch of the gfs1-utils git tree and the STABLE3 and RHEL55 branches of the cluster git tree for inclusion into 5.5. Changing status to POST until a build is done.
Here are the git tree commit IDs:
RHEL55 a07e555 GFS kernel panic, suid + nfsd with posix ACLs enabled
STABLE3 ac582a1 GFS kernel panic, suid + nfsd with posix ACLs enabled
master 3f34656 GFS kernel panic, suid + nfsd with posix ACLs enabled
Build 2136058 is complete and successful. This is now fixed in gfs-kmod-0.1.34-7.el5. Changing status to Modified.
An advisory has been issued which should resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2010-0291.html