Reported internally.

    static int gfs_lock(struct file *file, int cmd, struct file_lock *fl)
    {
        ..
        if ((ip->i_di.di_mode & (S_ISGID | S_IXGRP)) == S_ISGID)
            return -ENOLCK;
        ..
    }

This is a check for mandatory locking: the GFS locking code refuses the request with -ENOLCK when the file has the setgid bit set and the group-execute bit clear. This is similar to bz 218777, which affected RHEL 4 NFS shares on the client. The reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=218777#c1 (private) can be used to crash a system that mounts a GFS filesystem.

I was able to reproduce this on 2.6.18-164.11.1 with kmod-gfs-0.1.34-2.el5:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/locks.c:2080
invalid opcode: 0000 [1] SMP
last sysfs file: /kernel/dlm/gfs-sachin/id
CPU 0
Modules linked in: gfs(U) lock_dlm gfs2 dlm configfs netloop netbk blktap blkbk ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc iscsi_tcp bnx2i cnic uio cxgb3i cxgb3 8021q libiscsi_tcp ib_iser libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi ib_srp rds ib_sdp ib_ipoib ipoib_helper ipv6 xfrm_nalgo crypto_api rdma_ucm rdma_cm ib_ucm ib_uverbs ib_umad ib_cm iw_cm ib_addr ib_sa ib_mad ib_core loop dm_emc dm_round_robin dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport sr_mod sg joydev pcspkr i5000_edac edac_mc qla2xxx bnx2 ata_piix libata scsi_transport_fc serial_core serio_raw ide_cd cdrom dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hc
Pid: 12585, comm: crash Tainted: G 2.6.18-164.11.1.HOTFIX.el5xen #1
RIP: e030:[<ffffffff80227976>] [<ffffffff80227976>] locks_remove_flock+0xe4/0x124
RSP: e02b:ffff88003ff5de28 EFLAGS: 00010246
RAX: ffff88005275b3f8 RBX: ffff88003fb405b0 RCX: 7fffffffffffffff
RDX: 0000000000000000 RSI: 0000000000000007 RDI: ffffffff8052d800
RBP: ffff8800512d23c0 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88003ff5de28 R11: 00000000000000b0 R12: ffff88003fb404b0
R13: ffff88003fb404b0 R14: ffff8800545af0c0 R15: ffff88003fed64b0
FS: 00002b71ceb65210(0000) GS:ffffffff805ca000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process crash (pid: 12585, threadinfo ffff88003ff5c000, task ffff880060288040)
Stack: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000003129 0000000000000000
0000000000000000 0000000000000000
Call Trace:
[<ffffffff802132d8>] __fput+0x94/0x198
[<ffffffff802240af>] filp_close+0x5c/0x64
[<ffffffff8021e2c7>] sys_close+0x88/0xbd
[<ffffffff802602f9>] tracesys+0xab/0xb6
Explanation (using the latest RHEL 5 code):

1) The file is locked with a POSIX lock. This is supported by GFS.

2) The mode of the file is then set to 02644. This sets the setgid bit but not the group-execute bit, which is the mode used to enforce mandatory locking.

3) When the file is closed, the following code path is called:

    filp_close
      locks_remove_posix
        vfs_lock_file
          return filp->f_op->lock(filp, cmd, fl);

For a file on GFS2 this is gfs2_lock() (gfs_lock(), quoted above, has the same check for GFS):

    static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl)
    {
        ..
        if ((ip->i_inode.i_mode & (S_ISGID | S_IXGRP)) == S_ISGID)
            return -ENOLCK;
        ..
    }

gfs2_lock() sees that the mode set on the file corresponds to mandatory locking and returns -ENOLCK, even though this request is an unlock. The POSIX lock is therefore not cleared at this point, and the close continues:

    filp_close
      fput
        __fput
          locks_remove_flock

locks_remove_flock() walks the file's locks to remove any remaining flocks and leases. It assumes that all POSIX locks have already been removed by the code path above, but it finds the lock that was skipped there and hits BUG():

    void locks_remove_flock(struct file *filp)
    {
        ..
        while ((fl = *before) != NULL) {
            if (fl->fl_file == filp) {
                if (IS_FLOCK(fl)) {
                    locks_delete_lock(before);
                    continue;
                }
                if (IS_LEASE(fl)) {
                    lease_modify(before, F_UNLCK);
                    continue;
                }
                /* What? */
                BUG();    <-- fails here
            }
            before = &fl->fl_next;
        }
        ..
    }
Created attachment 398505: proposed patch

The check for mandatory locks should be skipped for unlock requests. This is similar to the fix that went into the NFS module.
I've verified that the same issue exists upstream.
Created attachment 398546: upstream patch (should also work for RHEL 6)
Fixes are needed for GFS-kernel, gfs-kmod, and gfs2. The upstream kernel will also need the gfs2 fix.
Patch submitted upstream: http://lkml.org/lkml/2010/3/11/269
This issue has been addressed in the following products: Red Hat Enterprise Linux 5, via RHSA-2010:0178 https://rhn.redhat.com/errata/RHSA-2010-0178.html
This issue has been addressed in the following products: Red Hat Enterprise Linux 5, via RHSA-2010:0291 https://rhn.redhat.com/errata/RHSA-2010-0291.html
This issue has been addressed in the following products: GFS for RHEL 3, via RHSA-2010:0330 https://rhn.redhat.com/errata/RHSA-2010-0330.html
This issue has been addressed in the following products: GFS for RHEL 4, via RHSA-2010:0331 https://rhn.redhat.com/errata/RHSA-2010-0331.html
This issue has been addressed in the following products: Red Hat Enterprise Linux 5.4.Z - Server Only, via RHSA-2010:0380 https://rhn.redhat.com/errata/RHSA-2010-0380.html
This issue has been addressed in the following products: Red Hat Enterprise Linux 5.4.Z - Server Only, via RHSA-2010:0521 https://rhn.redhat.com/errata/RHSA-2010-0521.html
Is there any reason to keep this bug record open? It's seen no activity for almost 5 years.
(In reply to Robert Peterson from comment #46)
> Is there any reason to keep this bug record open?
> It's seen no activity for almost 5 years.

No, closing. Thanks Robert.