Description of problem:

The ext2 and ext3 permission functions do not properly handle the return value from ext2/3_get_acl. This is an ERR_PTR but is treated as a plain pointer:

fs/ext3/acl.c:ext3_permission():

    acl = ext3_get_acl(inode, ACL_TYPE_ACCESS);
    if (acl) {
        int error = posix_acl_permission(inode, acl, mask);
        posix_acl_release(acl);
        if (error == -EACCES)
            goto check_capabilities;
        return error;
    } else

In case of an I/O error, acl will have the value 0xfffffffffffffffb (-5) causing the oops when the FOREACH_ACL_ENTRY macro tries to access the acl's a_count or a_entries members.

Version-Release number of selected component (if applicable):

kernel-2.6.9-55.EL

How reproducible:

Difficult - need to have an I/O or out-of-memory error at the time ext2 or ext3 is attempting to validate POSIX ACLs via ext2_permission/ext3_permission. This should be easier to test with the ability to inject errors into the block device (e.g. using dm-flakey).

Steps to Reproduce:

1. Configure a block device so that access to at least one ACL xattr block will return an I/O error
2. Attempt an access to the file system that will require checking of the erroring ACL

Actual results:

EXT3-fs error (device dm-47): ext3_readdir: directory #2 contains a hole at offset 0
inode_doinit_with_dentry: getxattr returned 5 for dev=dm-18 ino=30245480
Unable to handle kernel paging request at ffffffffffffffff RIP:
<ffffffff801a62d4>{posix_acl_permission+2}
PML4 103027 PGD 1051067 PMD 0
Oops: 0000 [1] SMP
CPU 2
Modules linked in: nfsd exportfs parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc ds yenta_socket pcmcia_core ide_dump scsi_dump diskdump zlib_deflate ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler joydev button battery ac uhci_hcd ehci_hcd hw_random e1000 bonding(U) floppy ata_piix libata ipw2100 ieee80211 ieee80211_crypt sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod megaraid_mbox megaraid_mm sd_mod scsi_mod
Pid: 5225, comm: qsmserver Not tainted 2.6.9-42.29.ELsmp
RIP: 0010:[<ffffffff801a62d4>] <ffffffff801a62d4>{posix_acl_permission+2}
RSP: 0018:00000101aa15fdb8 EFLAGS: 00010282
RAX: fffffffffffffffb RBX: fffffffffffffffb RCX: 00000000fffffffb
RDX: 0000000000000004 RSI: fffffffffffffffb RDI: 00000101672439d0
RBP: 000000000000816d R08: 000001001bc97600 R09: 00000101bf474860
R10: ffffffff8015d1c2 R11: ffffffff8015d1c2 R12: 00000101672439d0
R13: 0000000000000004 R14: 0000000000008001 R15: 0000000000000004
FS: 0000000000000000(0000) GS:ffffffff804e7200(005b) knlGS:00000000f7e016c0
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: ffffffffffffffff CR3: 0000000005d6e000 CR4: 00000000000006e0
Process fooserver (pid: 5225, threadinfo 00000101aa15e000, task 00000101791c3030)
Stack: 0000000000008001 ffffffffa007268a 00000101aa15fc98 00000101672439d0
       0000000000000004 000000000000816d 00000101aa15fea8 ffffffff80185fe9
       deaf1eed01000000 00000101672439d0
Call Trace:<ffffffffa007268a>{:ext3:ext3_permission+213} <ffffffff80185fe9>{permission+86}
       <ffffffff80187d29>{may_open+88} <ffffffff8018821e>{open_namei+788}
       <ffffffff80179214>{filp_open+80} <ffffffff801ed9e5>{strncpy_from_user+74}
       <ffffffff801287b1>{sys32_open+54} <ffffffff80126047>{sysenter_do_call+27}

Code: 8b 46 04 49 89 fe 41 55 45 31 ed 41 54 55 89 d5 53 48 8d 5e
RIP <ffffffff801a62d4>{posix_acl_permission+2} RSP <00000101aa15fdb8>
CR2: ffffffffffffffff

In this case, the I/O errors came from a dm snapshot device being invalidated. The "directory contains a hole" and "getxattr returned 5" errors are expected but the oops shouldn't happen.

Expected results:

Some scary looking errors but no oops.

Additional info:

Upstream reorganised ext2/3_permission to use generic_permission in 2.6.10, moving this bug into a new function, ext2_check_acl/ext3_check_acl. This was then fixed by e493073d8d053429fbb42331b57a95dd0d61cadb. The attached patch just does the same thing directly in ext3_permission and ext2_permission.
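For illustration only, here is a minimal userspace sketch of the pattern involved, assuming the fix amounts to testing the returned pointer with IS_ERR before it is dereferenced. It is not the attached patch or the upstream commit. The ERR_PTR/IS_ERR/PTR_ERR helpers are userspace stand-ins modelled on the kernel's linux/err.h, and every sketch_* name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modelled on the kernel's linux/err.h helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* Error codes occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, trimmed-down stand-in for struct posix_acl. */
struct sketch_acl {
    unsigned int a_count;
};

/* Hypothetical switch selecting what the ACL lookup returns. */
enum sketch_mode { SKETCH_NO_ACL, SKETCH_HAS_ACL, SKETCH_IO_ERROR };

static struct sketch_acl sketch_sample_acl = { 1 };

/* Stand-in for ext3_get_acl(): a valid ACL, NULL, or an encoded error. */
static struct sketch_acl *sketch_get_acl(enum sketch_mode mode)
{
    switch (mode) {
    case SKETCH_HAS_ACL:
        return &sketch_sample_acl;
    case SKETCH_IO_ERROR:
        return ERR_PTR(-EIO);   /* 0xfffffffffffffffb on 64-bit */
    default:
        return NULL;
    }
}

/* Stand-in for posix_acl_permission(): dereferences acl, so it must be valid. */
static int sketch_acl_permission(const struct sketch_acl *acl)
{
    assert(acl != NULL && !IS_ERR(acl));
    return acl->a_count > 0 ? 0 : -EACCES;
}

/* Permission check with the error-pointer test in front of the NULL test. */
static int sketch_permission(enum sketch_mode mode)
{
    struct sketch_acl *acl = sketch_get_acl(mode);

    if (IS_ERR(acl))
        return (int)PTR_ERR(acl);   /* propagate -EIO instead of oopsing */
    if (acl)
        return sketch_acl_permission(acl);
    return 0;   /* no ACL: the mode-bit fallback is elided in this sketch */
}
```

Without the IS_ERR test, the `if (acl)` branch is taken for an encoded error as well, because an error pointer is non-NULL. That matches the oops above, where RSI holds fffffffffffffffb on entry to posix_acl_permission.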
Created attachment 155818 [details] patch mirroring upstream fix
Gitweb link: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e493073d8d053429fbb42331b57a95dd0d61cadb
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This request was evaluated by Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.
Putting this back on Bryn's plate - he was planning to follow it all the way through, but it wasn't assigned so I picked it up (jumped the gun a bit). Nice job debugging & finding the root cause & upstream patch, thanks! -Eric
Committed in stream U6 build 55.15. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html