Bug 567479
Summary: | fasync_helper patch causing problems with GPFS | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Lachlan McIlroy <lmcilroy> | ||||||||
Component: | kernel | Assignee: | Lachlan McIlroy <lmcilroy> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | Eryu Guan <eguan> | ||||||||
Severity: | urgent | Docs Contact: | |||||||||
Priority: | urgent | ||||||||||
Version: | 5.4 | CC: | aviro, bugproxy, cluster-maint, dhoward, dhowells, eguan, esandeen, eteo, jeder, jjarvis, jlayton, jwest, plyons, rwheeler, smfltc, tao, vgaikwad | ||||||||
Target Milestone: | rc | Keywords: | ZStream | ||||||||
Target Release: | --- | ||||||||||
Hardware: | All | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: |
The Red Hat Enterprise Linux 5.5 kernel contained a fix for Bugzilla issue number 548657 which introduced a regression in file locking behavior that presented with the General Parallel File System (GPFS). This update removes the redundant locking code.
|
Story Points: | --- | ||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2011-01-13 21:08:10 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 599730 | ||||||||||
Attachments: |
|
Description
Lachlan McIlroy
2010-02-23 02:06:33 UTC
I don't quite understand how a file_lock can have a NULL fl_file pointer in this stack trace. For __posix_lock_file_conf() to call locks_delete_lock() it means that a lock request overlaps and/or replaces an existing one. The existing one is already attached to the inode and I'm pretty sure it should have a valid file pointer attached. When a file pointer is closed all posix locks that use that file pointer are released so we shouldn't have any posix locks hanging around for files that are not open. This problem was exposed by patch: linux-2.6-fs-fasync-split-fasync_helper-into-separate-add-remove-functions.patch that went into 2.6.18-164.11.1.el5. Prior to this patch the fasync_helper() function didn't de-reference the fl_file pointer so a NULL value passed through without incident. But this function did (and still does) use the fl_file value to filter out which locks to remove and filtering on a NULL value rather than a valid file pointer may not be producing the correct results and in that case it would be leaking memory in the fasync_cache. It appears the only way to queue a locked file onto the async list is via fcntl_setlease(): int fcntl_setlease(unsigned int fd, struct file *filp, long arg) { struct file_lock fl, *flp = &fl; struct dentry *dentry = filp->f_dentry; struct inode *inode = dentry->d_inode; int error; if (IS_NO_LEASES(inode)) return -EINVAL; locks_init_lock(&fl); error = lease_init(filp, arg, &fl); if (error) return error; lock_kernel(); error = setlease(filp, arg, &flp); if (error || arg == F_UNLCK) goto out_unlock; error = fasync_helper(fd, filp, 1, &flp->fl_fasync); ... Where filp is a valid file pointer. int fasync_helper(int fd, struct file *filp, int on, struct fasync_struct **fapp) { if (!on) return fasync_remove_entry(filp, fapp); return fasync_add_entry(fd, filp, fapp); } static int fasync_add_entry(int fd, struct file *filp, struct fasync_struct **fapp) { struct fasync_struct *new, *fa, **fp; int result = 0; new = kmem_cache_alloc(fasync_cache, GFP_KERNEL); if (!new) return -ENOMEM; spin_lock(&filp->f_ep_lock); write_lock_irq(&fasync_lock); for (fp = fapp; (fa = *fp) != NULL; fp = &fa->fa_next) { if (fa->fa_file != filp) continue; fa->fa_fd = fd; kmem_cache_free(fasync_cache, new); goto out; } new->magic = FASYNC_MAGIC; new->fa_file = filp; new->fa_fd = fd; new->fa_next = *fapp; *fapp = new; result = 1; filp->f_flags |= FASYNC; out: write_unlock_irq(&fasync_lock); spin_unlock(&filp->f_ep_lock); return result; } So the fasync_struct gets queue with a non-NULL fa_file field. When deleting locks, fl_file is apparently NULL: static void locks_delete_lock(struct file_lock **thisfl_p) { struct file_lock *fl = *thisfl_p; *thisfl_p = fl->fl_next; fl->fl_next = NULL; list_del_init(&fl->fl_link); fasync_helper(0, fl->fl_file, 0, &fl->fl_fasync); if (fl->fl_fasync != NULL) { printk(KERN_ERR "locks_delete_lock: fasync == %p\n", fl->fl_fasync); fl->fl_fasync = NULL; } ... So we have a NULL filp here: static int fasync_remove_entry(struct file *filp, struct fasync_struct **fapp) { struct fasync_struct *fa, **fp; int result = 0; spin_lock(&filp->f_ep_lock); write_lock_irq(&fasync_lock); for (fp = fapp; (fa = *fp) != NULL; fp = &fa->fa_next) { if (fa->fa_file != filp) continue; *fp = fa->fa_next; kmem_cache_free(fasync_cache, fa); filp->f_flags &= ~FASYNC; result = 1; break; } write_unlock_irq(&fasync_lock); spin_unlock(&filp->f_ep_lock); return result; } And matching on a NULL filp will not find the locked files that were originally queued on the async list. So back in locks_delete_lock() we should be tripping this code: if (fl->fl_fasync != NULL) { printk(KERN_ERR "locks_delete_lock: fasync == %p\n", fl->fl_fasync); fl->fl_fasync = NULL; } And leaking memory from the fasync_cache. The same bug would have existed before the fasync_helper() patch was applied. From the description from the customer it sounds like they are not using lease locks. Since we only queue the locked file for leases then there's no need to dequeue it when we are removing a non-lease lock. So their suggested change would certainly prevent the panics they are seeing. 573 static void locks_delete_lock(struct file_lock **thisfl_p) 574 { 575 struct file_lock *fl = *thisfl_p; 576 577 *thisfl_p = fl->fl_next; 578 fl->fl_next = NULL; 579 list_del_init(&fl->fl_link); 580 +++++ if (IS_LEASE(fl)) 581 fasync_helper(0, fl->fl_file, 0, &fl->fl_fasync); While I think this is a valid optimisation for the RHEL code I'm not convinced it's the correct way to fix their problem because having a lock without a valid fl_file field is indicative of another bug. A test kernel with version kernel-2.6.18-164.11.1.el5.bz567479.1 that includes the proposed fix to locks_delete_lock() will be available from: https://brewweb.devel.redhat.com/taskinfo?taskID=2275587 Created attachment 395654 [details]
Patch requested by customer
(In reply to comment #4) > This problem was exposed by patch: > linux-2.6-fs-fasync-split-fasync_helper-into-separate-add-remove-functions.patch > that went into 2.6.18-164.11.1.el5. > > Prior to this patch the fasync_helper() function didn't de-reference the > fl_file pointer so a NULL value passed through without incident. Ok, interesting. Upstream, the old fasync_helper did this before the change: 625 int fasync_helper(int fd, struct file * filp, int on, struct fasync_struct **fapp) 626 { ... 637 /* 638 * We need to take f_lock first since it's not an IRQ-safe 639 * lock. 640 */ 641 spin_lock(&filp->f_lock); but we didn't have that in RHEL5; the patch in question essentially added this new spinlock which now dereferences filp. Danny's patch essentially pulled in part of 76398425bb06b07cc3a3b1ce169c67dc9d6874ed as well, which moved the locking to this function. But we didn't get the rest of it, let me see what was going on there. So whatever GPFS is doing to end up with the NULL filp, that's why we didn't oops before: 76398425bb06b07cc3a3b1ce169c67dc9d6874ed turned a big lock_kernel() into a spin_lock(filp-> ... ) and we didn't have any of that; now we do, -some- of it came in with the backport of 53281b6d3, which added a new deref of filp - and we still have the old lock_kernel as well. So -maybe- with that old lock_kernel still in place, removing the spin_lock() would eliminate the NULL deref and get back to the old behavior that "worked," but it still doesn't sound like we should be getting here with a NULL filp in any case. Eric, I think you're onto something here. Since the lock_kernel() is still in place then the f_ep_lock seems totally redundant and can be removed. It's not even the correct lock to use. Removing the f_ep_lock instead of the proposed patch will mean we don't have to update mainline, we fix a bug in our own code and (as Al pointed out) we wont hurt ISVs that depend on fasync for non-lease locks. Jeremy, a test kernel with version kernel-2.6.18-164.11.1.el5.bz567479.2 that includes a patch to remove the unneeded f_ep_lock will be available from: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2278674 Created attachment 395873 [details]
Alternative patch to remove unnecessary use of f_ep_lock
------- Comment From sharyath.com 2010-02-23 01:30 EDT------- Brian You already seem to have solution for this problem. I went through the code and found that even in the later code (2.6.32) this check is not present. I would suggest you to mail it to lkml as the distro will be more willing to consider it for acceptance once the mainline acceptance is complete. Can you please mail it to lkml. And one more request please do CC me (sharyath.com) so that it will be easier to track the issue. Regards Sharyathi N Event posted on 2010-02-24 10:40:44am EST by cambach Hi Jeremy, I installed the new kernel kernel-2.6.18-164.11.1.el5.bz567479.2 on a SONAS box and did a quick test. The error did not show up again as with the first test package. Regards, Christian This event sent from IssueTracker by jeder issue 545163 ------- Comment From tpnoonan.com 2010-02-24 16:33 EDT------- HI Red Hat, Any outlook for when this code will go upstream? Thanks it would appear that locks_mandatory_area() can queue a lock with a NULL filp. um... Just to make it clear - NULL fl->fl_file is not valid. We are talking about workaround for RHEL and *only* that. For mainline GPFS will need to figure out how to live without that, the quoted piece of (misguided) defensive programming in the tree nonwithstanding. Created attachment 396472 [details]
Updated patch for removing f_ep_lock
Fixed a comment in this patch to match the code changes.
------- Comment From sharyath.com 2010-03-09 05:31 EDT------- I am moving this bug to Fixed by Distro and to Submitted Re-open if you like to investigate further in kernel-2.6.18-197.el5 You can download this test kernel from http://people.redhat.com/jwilson/el5 Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details. ------- Comment From dixonbp.com 2010-05-13 11:40 EDT------- yes, this can be closed Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: The Red Hat Enterprise Linux 5.5 kernel contained a fix for Bugzilla issue number 548657 which introduced a regression in file locking behavior that presented with the General Parallel File System (GPFS). This update removes the redundant locking code. (In reply to comment #4) > This problem was exposed by patch: > linux-2.6-fs-fasync-split-fasync_helper-into-separate-add-remove-functions.patch > that went into 2.6.18-164.11.1.el5. > > Prior to this patch the fasync_helper() function didn't de-reference the > fl_file pointer so a NULL value passed through without incident. But this > function did (and still does) use the fl_file value to filter out which locks > to remove and filtering on a NULL value rather than a valid file pointer may > not be producing the correct results and in that case it would be leaking > memory in the fasync_cache. QE, while testing this bug, please also use the test case for CVE-2009-4141. Email me if you do not have access to the test case already. Thanks. According to the patch, this has been verified by IBM's GPFS developer Brian Dixon. I have ran test case suggested in comment 37 and fsstress, no issue found. And confirmed patch is applied correctly in -233 kernel An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0017.html |