Bug 524520 (CVE-2009-3286) - CVE-2009-3286 kernel: O_EXCL creates on NFSv4 are broken
Summary: CVE-2009-3286 kernel: O_EXCL creates on NFSv4 are broken
Keywords:
Status: CLOSED ERRATA
Alias: CVE-2009-3286
Product: Security Response
Classification: Other
Component: vulnerability
Version: unspecified
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: Red Hat Product Security
QA Contact:
URL:
Whiteboard:
Duplicates: 523797
Depends On: 522163 524521 537293 621428
Blocks:
Reported: 2009-09-21 03:02 UTC by Eugene Teo (Security Response)
Modified: 2021-11-12 20:00 UTC
CC List: 42 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-12-21 17:03:48 UTC
Embargoed:


Attachments
Proposed patch (7.58 KB, patch)
2009-09-21 03:12 UTC, Eugene Teo (Security Response)


Links
Red Hat Product Errata RHSA-2009:1548 (priority: normal, status: SHIPPED_LIVE, last updated 2009-11-03 19:33:33 UTC): Important: kernel security and bug fix update

Description Eugene Teo (Security Response) 2009-09-21 03:02:27 UTC
Description of problem:
Description From Eric Paris (eparis) 2009-09-09 12:02:22 EDT

nfs-utils-1.0.9-42.el5
kernel-2.6.18-164.el5
nfs-utils-lib-1.0.8-7.6.el5

Install RHEL 5.4
Create a directory /export
chown user:user /export
clone a git tree into /export (as user)

Put an entry in /etc/exports like:
/export  *(rw,all_squash,fsid=0,anonuid=500,anongid=500,nohide)

mount the export at /import like so:
mount -t nfs4 -o rw,noatime,nodiratime 127.0.0.1:/ /import

Run a git command in /import
cd /import; git status

Watch the command fail.  Files created by git have very odd metadata.  I
managed to get files that looked like:

----------. 1 paris paris 0 1970-05-27 18:50 .git/index.lock
or like
-r-s-wxr-- 1 paris paris 0 Jan 18  2038 .git/objects/tmp_obj_JVIlCu

Obviously the metadata is totally screwed up....

If I add 'nohide' to the export everything seems to be working correctly.
Something should be done to keep me from getting into this 'broken'
situation...  

Comment #1 From Eric Paris (eparis) 2009-09-09 12:54:16 EDT

Actually it looks like nohide only fixed the problem when I mounted from
another machine.  When I actually mount 127.0.0.1 using NFSv4 even the nohide
doesn't seem to fix the problem..... 

Comment #2 From dijuremo (dijuremo) 2009-09-11 12:08:12 EDT

I can confirm this same problem, I originally found the issue on my file
servers serving nfs and samba along with drbd after updating to 5.4. It all
seems related to the kernel 2.6.18-164.el5. Having all the rpms from 5.4 and
using the older kernel 2.6.18-128.7.1.el5 fixed the problem, so this is most
likely a kernel issue.

I actually kickstarted a new machine, configured it as an nfs server and was
able to reproduce the problem.

I think the priority and severity of this request should be elevated to HIGH!!!

On the server:

[root@phys-ha01 ~]# cat /etc/exports
/               phys41012.physics.gatech.edu(fsid=0,rw,sync,no_root_squash,nohide)
/web            phys41012.physics.gatech.edu(rw,sync,no_root_squash,nohide)
/tmp            phys41012.physics.gatech.edu(rw,sync,no_root_squash,nohide)
/tmp/test       phys41012.physics.gatech.edu(rw,sync,no_root_squash,nohide)
/               phys-ha01.physics.gatech.edu(fsid=0,rw,sync,no_root_squash,nohide)
/web            phys-ha01.physics.gatech.edu(rw,sync,no_root_squash,nohide)
/tmp            phys-ha01.physics.gatech.edu(rw,sync,no_root_squash,nohide)
/tmp/test       phys-ha01.physics.gatech.edu(rw,sync,no_root_squash,nohide)

On the client:
[root@phys41012 ~]# mount -t nfs4 -v -o rw,nosuid,rsize=32768,wsize=32768,hard,intr phys-ha01.physics.gatech.edu:/tmp/test /tmp/non-drbd/
[dr126@phys41012 tmp]$ cd /tmp/non-drbd/
[dr126@phys41012 non-drbd]$ mount | grep drbd
phys-ha01.physics.gatech.edu:/tmp/test on /tmp/non-drbd type nfs4 (rw,nosuid,rsize=32768,wsize=
[dr126@phys41012 non-drbd]$ ls -la
total 24
drwxr-xr-x  3 root  root 4096 Sep 11 11:30 .
drwxrwxrwt 28 root  root 4096 Sep 11 11:30 ..
drwx------  2 dr126 root 4096 Sep 11 11:38 dr126
[dr126@phys41012 non-drbd]$ cd dr126/
[dr126@phys41012 dr126]$ ls -la
total 16
drwx------ 2 dr126 root   4096 Sep 11 11:38 .
drwxr-xr-x 3 root  root   4096 Sep 11 11:30 ..
-rw------- 1 dr126 nobody    0 Sep 11 11:38 test
---------- 1 dr126 nobody    0 Mar 18  1973 .test2.swo
--wSrwSrwT 1 dr126 nobody    0 Mar 18  1973 .test2.swp
---------- 1 dr126 nobody    0 Mar 18  1973 .test.swo
--wSrwS-wt 1 dr126 nobody    0 Mar 18  1973 .test.swp
[dr126@phys41012 dr126]$ rm -rf test .test*
[dr126@phys41012 dr126]$ touch testfile
[dr126@phys41012 dr126]$ ls -la testfile
-rw------- 1 dr126 nobody 0 Sep 11 11:43 testfile
[dr126@phys41012 dr126]$ vi test
[dr126@phys41012 dr126]$ ls -la
total 16
drwx------ 2 dr126 root   4096 Sep 11 11:43 .
drwxr-xr-x 3 root  root   4096 Sep 11 11:30 ..
-rw------- 1 dr126 nobody    0 Sep 11 11:43 testfile
---------- 1 dr126 nobody    0 Mar 22  1973 .test.swo
---------- 1 dr126 nobody    0 Mar 22  1973 .test.swp  

Comment #3 From dijuremo (dijuremo) 2009-09-11 13:53:44 EDT

Not sure whether or not this helps, but if you are writing as root from the
nfs4 client, then the writes work correctly. If you are writing as a non-root
user then the problem manifests.

[root@phys41012 dr126]# cd /
[root@phys41012 /]# cd /tmp/non-drbd/dr126/
[root@phys41012 dr126]# mount | grep non-drbd
phys-ha01.physics.gatech.edu:/tmp/test on /tmp/non-drbd type nfs4 (rw,nosuid,rsize=32768,wsize=32768,hard,intr,addr=130.207.139.16)
[root@phys41012 dr126]# vi test-root
[root@phys41012 dr126]# ls -l
total 4
-rw-r--r-- 1 root root 23 Sep 11 13:49 test-root
[root@phys41012 dr126]# cat test-root
This is a test by root

Now a regular user with the echo command (this works)

[root@phys41012 dr126]# su dr126
id: cannot find name for group ID 2626
[dr126@phys41012 dr126]$ echo "This is a test by a user" > test-dr126
[dr126@phys41012 dr126]$ ls -la
total 24
drwx------ 2 dr126 root   4096 Sep 11 13:50 .
drwxr-xr-x 4 root  root   4096 Sep 11 11:54 ..
-rw------- 1 dr126 nobody   25 Sep 11 13:50 test-dr126
-rw-r--r-- 1 root  root     23 Sep 11 13:49 test-root

Now a regular user with vi (I assume vi opens and writes files differently
from echo, which may also give you more clues as to how to identify the
problem).

[dr126@phys41012 dr126]$ vi test-dr126-using-vi

At this point vi warns of an already existing swap file, I abort and:

[dr126@phys41012 dr126]$ ls -la
total 24
drwx------ 2 dr126 root   4096 Sep 11 13:51 .
drwxr-xr-x 4 root  root   4096 Sep 11 11:54 ..
-rw------- 1 dr126 nobody   25 Sep 11 13:50 test-dr126
---------- 1 dr126 nobody    0 Jun 19  1973 .test-dr126-using-vi.swo
---------- 1 dr126 nobody    0 Jun 19  1973 .test-dr126-using-vi.swp
-rw-r--r-- 1 root  root     23 Sep 11 13:49 test-root

Here are the files with the wrong timestamps.  

Comment #4 From Anders Wäänänen (waananen) 2009-09-13 17:04:37 EDT

I can also confirm this. I believe the problem is related to the O_EXCL
flag to open(2). Try running mktemp (which uses open(2) with O_EXCL) to
generate a temporary file on the filesystem. Creating files without
O_EXCL seems to be OK.

Anders  
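The mktemp test from comment #4 can be turned into a small probe. The sketch below is a minimal, hypothetical helper (`probe_exclusive_create` and its defaults are illustrative, not part of any tool mentioned in this bug): it creates a file with `O_CREAT | O_EXCL`, the same flags mktemp relies on, and then inspects whatever was left behind. On an affected server one would expect the open to fail with a permission error while a file with mode 0 or random bits and a decades-off timestamp remains; on a healthy filesystem the file should carry the requested mode (minus the umask) and a current mtime.

```python
import os
import stat
import time


def probe_exclusive_create(directory, name="excl_probe", mode=0o600, max_skew=300):
    """Create `name` in `directory` with O_CREAT | O_EXCL and inspect the result.

    Returns a dict describing whether open() failed and what metadata the
    file ended up with. `name`, `mode` and `max_skew` are illustrative defaults.
    """
    path = os.path.join(directory, name)
    open_error = None
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, mode)
        os.close(fd)
    except OSError as exc:
        # On an affected server the file could exist even though open() failed.
        open_error = exc

    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return {"open_error": open_error, "exists": False, "sane": False}

    perms = stat.S_IMODE(st.st_mode)
    skew = abs(time.time() - st.st_mtime)
    sane = (
        open_error is None
        and perms != 0
        and (perms & ~mode) == 0  # no bits beyond those requested (umask may clear some)
        and skew <= max_skew      # mtime close to "now", not 1973 or 2038
    )
    return {"open_error": open_error, "exists": True, "perms": perms,
            "mtime_skew": skew, "sane": sane}
```

Run it as an unprivileged user inside the NFSv4 mount, for example `probe_exclusive_create("/import")` for the reproducer in the description; per comment #3, root may not trigger the failure. Use a fresh `name` each time, since a second exclusive create of the same name is expected to fail with `EEXIST` even on a healthy server.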

Comment #5 From Jeff Layton (jlayton) 2009-09-14 09:45:44 EDT

Going ahead and marking this a regression based on comment #2  

Comment #6 From Jeff Layton (jlayton) 2009-09-14 11:57:12 EDT

I've narrowed it down as far as do_open_permission returning an error. We had some
changes in that area in 5.4 (bug 502244) that may be at fault. Still trying to
narrow down the exact problem at this point and how upstream kernels behave.  

Comment #7 From Jeff Layton (jlayton) 2009-09-14 13:10:57 EDT

Here are the results of some of the creates:

---------- 1 testuser testuser         0 Aug 23  1981 testfile_aj6852
---------- 1 testuser testuser         0 Oct  6  1981 testfile_Am8373
---------- 1 testuser testuser         0 Sep 10  1981 testfile_Bf7622
---S-wS--x 1 testuser testuser         0 Sep  2  1981 testfile_ee7185
---------- 1 testuser testuser         0 Sep 10  1981 testfile_ej7627
-rw------- 1 testuser testuser         0 Sep 15 21:42 testfile_EV7164
---------- 1 testuser testuser         0 Sep 16  1981 testfile_hL8008
-rw------- 1 testuser testuser         0 Sep 15 21:55 testfile_Lp7607
---------- 1 testuser testuser         0 Oct  6  1981 testfile_mb8365
---------- 1 testuser testuser         0 Feb 22  1982 testfile_qy9525
--wSrw-r-x 1 testuser testuser         0 Aug 17  1981 testfile_Tt6325
---Sr---w- 1 testuser testuser         0 Aug  2  1981 testfile_Ub6178
---s-wxr-- 1 testuser testuser         0 Aug  4  1981 testfile_vd6236
---------- 1 testuser testuser         0 Jul 31  1981 testfile_VW6132

With an exclusive create, the file should be created with some sort
of sane "default" permissions, and the verifier gets stored in the atime and
mtime fields of the inode. The client is then supposed to go back and reset
those values with a SETATTR call if the create was successful.
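To make the odd timestamps concrete, here is a sketch of how an 8-byte verifier could end up as a file's mtime and atime. It assumes, as the Linux nfsd create path appears to do, that each 32-bit half of the verifier is stored with its top bit cleared; the byte order and the helper names `verifier_to_times` and `still_stamped` are assumptions for illustration. Under that assumption every stamped file must show a date between 1970 and January 2038, which is consistent with the listings above (a masked word of 0x7fffffff is 2038-01-19 03:14:07 UTC, the evening of Jan 18 in US time zones).

```python
import struct
from datetime import datetime, timezone


def verifier_to_times(verifier: bytes):
    """Map an 8-byte exclusive-create verifier to the (mtime, atime) a server would store.

    Assumption: each 32-bit half is stored with the top bit cleared. Big-endian
    unpacking is chosen only for illustration; a server would use host order.
    """
    if len(verifier) != 8:
        raise ValueError("an NFS create verifier is exactly 8 bytes")
    word0, word1 = struct.unpack(">II", verifier)
    mtime = word0 & 0x7FFFFFFF
    atime = word1 & 0x7FFFFFFF
    return (datetime.fromtimestamp(mtime, tz=timezone.utc),
            datetime.fromtimestamp(atime, tz=timezone.utc))


def still_stamped(st_mtime: int, st_atime: int, verifier: bytes) -> bool:
    """True if a file's times still hold the verifier, i.e. no SETATTR ever replaced them."""
    mtime, atime = verifier_to_times(verifier)
    return st_mtime == int(mtime.timestamp()) and st_atime == int(atime.timestamp())
```

A file for which `still_stamped` is true is one where the create happened but the client's follow-up SETATTR did not, which is exactly the state the failed opens in this bug leave behind.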

What's happening here is that the initial create is occurring, but then the
permission check for the open fails. This short-circuits the rest of the
process, causes the open to return an error and leaves the inode in this funky
state. That also explains why we don't see this problem with root...the
permission check is always passing there (provided we're not root squashing).

The modes are a little disturbing too actually. I believe nfsd is generally
passing an uninitialized value for the mode to the vfs_create call. In some
cases, the mode just happens to end up allowing the permission check to pass.
Those are shown by testfile_Lp7607 and testfile_EV7164 in the above list.

If this hypothesis is correct there are 2 possibilities for fixing it:

1) ensure that the create mode is set in such a way that the subsequent
permission check succeeds

2) always allow the permission check to succeed in this case

It's not clear to me whether and how this was fixed upstream, so I may need to
do some similar investigation there.  

Comment #8 From Jeff Layton (jlayton) 2009-09-14 13:12:12 EDT

So to be clear...while this is technically a regression from 5.3, I think the
patch that went into 5.4 is actually correct and just ended up exposing another
bug in this code.  

Comment #9 From Jeff Layton (jlayton) 2009-09-14 15:34:43 EDT

Ok, I see part of the problem...

In the nfsd4_open struct, the verifier and the iattrs are a union. For
exclusive creates, the verifier field is intended to be used. For other
creates, the iattr piece may be used.

The bogus modes come about because we're setting the verifier, and then later
dereferencing the iattr part of that union. This gives us crazy ia_valid values
and possibly modes as well...

This seems to have been inadvertently fixed upstream as part of the NFSv4.1
code merge with commit 79fb54abd285b442e1f30f851902f3ddf58e7704. That patch
separated the union into 2 fields for other reasons, but probably fixed this as
well.
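The union problem described in comment #9 can be reproduced in miniature with Python's ctypes. The layouts below are simplified, hypothetical stand-ins (the real `struct iattr` has more fields and different types), but they show the mechanism: writing only the verifier and then reading the iattr returns the verifier's bytes as `ia_valid` and `ia_mode`, while a layout with the two fields split apart, as in the upstream change, leaves the iattr untouched. `bogus_mode_string` is an illustrative helper that renders an aliased word as `ls`-style permission bits.

```python
import ctypes
import stat


class SimpleIattr(ctypes.Structure):
    # Hypothetical, simplified stand-in for the kernel's struct iattr.
    _fields_ = [("ia_valid", ctypes.c_uint32), ("ia_mode", ctypes.c_uint32)]


class OpenArgsUnion(ctypes.Union):
    # Old layout as described above: verifier and iattr share the same storage.
    _fields_ = [("verifier", ctypes.c_uint32 * 2), ("iattr", SimpleIattr)]


class OpenArgsSeparate(ctypes.Structure):
    # Layout after the union is split: the two fields no longer overlap.
    _fields_ = [("verifier", ctypes.c_uint32 * 2), ("iattr", SimpleIattr)]


def iattr_after_setting_verifier(args_type, word0, word1):
    """Set only the verifier, then read back the iattr fields."""
    args = args_type()  # ctypes zero-initializes the memory
    args.verifier[0] = word0
    args.verifier[1] = word1
    return args.iattr.ia_valid, args.iattr.ia_mode


def bogus_mode_string(word):
    """Render the permission bits an aliased word would imply for a regular file."""
    return stat.filemode(stat.S_IFREG | (word & 0o7777))
```

For example, an aliased word of `0x1234` renders as `----rw-r-T`, the same kind of sticky and setuid noise seen in the file listings in this bug.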

That should fix the bogus modes by making it so that the mode is set to S_IFREG
on create. That will probably not completely fix the problem though...it will
probably still fail the following permission check, but that should at least
make things consistent.  

Comment #11 From Jeff Layton (jlayton) 2009-09-15 09:13:10 EDT

Created an attachment (id=361079)
patchset #2

This set of patches should be a complete fix. It includes the patch that I
posted yesterday, plus two other patches. One is a cosmetic patch to clean up
the indentation in do_open_lookup, and the other makes the permissions check
conditional on whether the file was actually created.
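The ordering problem from comment #7, and the fix described here, can be captured in a toy model. Everything below is hypothetical (`exclusive_open`, `may_write`, and the simplified root/owner check are not kernel code): with `check_after_create=True`, a non-root caller whose freshly created file received a bogus mode is refused and the half-initialized file is left behind, while root passes, and so does a caller whose mode happens to include owner write (like `testfile_Lp7607`). With the check skipped for files the call itself created, the open succeeds and the client can go on to send its SETATTR.

```python
from typing import Optional, Tuple

S_IWUSR = 0o200


def may_write(mode: int, uid: int, owner_uid: int) -> bool:
    """Very simplified permission check: root passes, the owner needs S_IWUSR."""
    if uid == 0:
        return True
    return uid == owner_uid and bool(mode & S_IWUSR)


def exclusive_open(exists: bool, new_mode: int, uid: int,
                   check_after_create: bool) -> Tuple[bool, Optional[str]]:
    """Hypothetical model of an exclusive OPEN; returns (created, error).

    check_after_create=True mimics the broken flow; False mimics the fix,
    which skips the permission check when this call created the file.
    """
    if exists:
        # Simplification: ignores the verifier-match (retransmitted request) case.
        return False, "EEXIST"
    created = True  # inode now exists, with the verifier parked in atime/mtime
    if check_after_create and not may_write(new_mode, uid, owner_uid=uid):
        # OPEN fails, so the client never sends the SETATTR that fixes mode and times.
        return created, "EACCES"
    return created, None
```

The model deliberately keeps the check for files that already existed; only the "we just created it" path is exempted, which mirrors the description of the third patch.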

All 3 patches are backports from upstream, and fix the reproducer I have for
this. Working on building a test kernel now that I'll have up on my people page
later today.  

Comment #12 From Jeff Layton (jlayton) 2009-09-15 15:49:46 EDT

I've added these patches to my test kernels here:

    http://people.redhat.com/jlayton/

...could those who have reported seeing these issues test them and report
whether they fix the problems for you too?  

Comment #17 From dijuremo (dijuremo) 2009-09-16 09:52:25 EDT

Yep, the patches fix the problem.

Server
[root@phys-ha01 ~]# uname -a
Linux phys-ha01.physics.gatech.edu 2.6.18-165.el5.jtltest.87 #1 SMP Tue Sep 15 09:37:44 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux


Client still using the broken kernel (which means the problem was only server
side).
[dr126@phys41012 dr126]$ uname -a
Linux phys41012.physics.gatech.edu 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[dr126@phys41012 ~]$ cd /tmp/drbd/
[dr126@phys41012 drbd]$ mount | grep drbd
phys-ha01.physics.gatech.edu:/web on /tmp/drbd type nfs4 (rw,nosuid,rsize=32768,wsize=32768,hard,intr,addr=130.207.139.16)
[dr126@phys41012 drbd]$ ls -la
total 48
drwxr-xr-x  4 root  root  4096 Sep 16 08:58 .
drwxrwxrwt 25 root  root  4096 Sep 16 09:46 ..
-rw-------  1 root  root  6144 Sep 11 15:10 aquota.group
-rw-------  1 root  root  7168 Sep 16 08:41 aquota.user
drwx------  2 dr126 root  4096 Sep 16 08:59 dr126
drwx------  2 root  root 16384 Sep 10 15:16 lost+found
[dr126@phys41012 drbd]$ cd dr126/
[dr126@phys41012 dr126]$ ls -la
total 8
drwx------ 2 dr126 root 4096 Sep 16 08:59 .
drwxr-xr-x 4 root  root 4096 Sep 16 08:58 ..
[dr126@phys41012 dr126]$ touch test
[dr126@phys41012 dr126]$ ls -la test
-rw------- 1 dr126 nobody 0 Sep 16  2009 test
[dr126@phys41012 dr126]$ vi test2
[dr126@phys41012 dr126]$ ls -la
total 12
drwx------ 2 dr126 root   4096 Sep 16  2009 .
drwxr-xr-x 4 root  root   4096 Sep 16 08:58 ..
-rw------- 1 dr126 nobody    0 Sep 16 09:48 test
-rw------- 1 dr126 nobody   20 Sep 16  2009 test2
[dr126@phys41012 dr126]$ cat test2
This is a test file

Thanks for the fix, when do you think this patch will be released to
production?  

Comment #18 From Jeff Layton (jlayton) 2009-09-17 07:36:44 EDT

Thanks for testing it.

It's currently on the proposed list for RHEL5.5.  

Comment #19 From dijuremo (dijuremo) 2009-09-17 07:47:04 EDT

This does not make any sense, RHEL5.5 is months away, and this is a major flaw
that will prevent anybody from using the kernel in 5.4 as an nfs server.  Are
you not planning to release an updated kernel for 5.4 with the nfs fixes before
5.5?  

Comment #21 From Jeff Layton (jlayton) 2009-09-17 08:04:14 EDT

(In reply to comment #19)
> This does not make any sense, RHEL5.5 is months away, and this is a major flaw
> that will prevent anybody from using the kernel in 5.4 as an nfs server.  Are
> you not planning to release an updated kernel for 5.4 with the nfs fixes before
> 5.5?  

I'm not opposed to seeing this fix ship sooner, but in order for that to happen
someone will need to open a support case and request a more immediate fix. If
you do so, please be sure to reference this BZ so that the support folks know
that this is a known problem and that there is a patch that seems to fix it.  

Comment #22 From Jim Perrin (james.l.perrin.mil) 2009-09-17 13:35:59 EDT

Global support services ticket filed. #1953510

I'd very much like to see this fix pushed sooner than 5.5  

Comment #23 From Bruce A. Locke (blocke) 2009-09-17 13:38:41 EDT

I'm afraid I can't open a direct support case as we're using academic licensing
and I'd rather chew off my arm than deal with our hardware vendor's support
team.

What I can say is that I saw a related issue on our systems and the test kernel
above fixes our issue.

We didn't notice the randomized permissions.  In our case many calls in java to
java.io.UnixFileSystem.createFileExclusively failed and rolling the kernel back
or using your test kernel fixes the issue.

Your example shows a major security issue: DAC permissions are randomized,
so there is a good chance that a file uploaded through a webapp could be
granted SUID and execute bits.  That isn't enough of a reason?

Thanks for all your efforts so far.  

Comment #24 From Rob Marti (rjm002) 2009-09-17 13:40:24 EDT

Global support services ticket filed. #1953526

I'd very much like to see this fix pushed sooner than 5.5 as well  

Comment #25 From Tru Huynh (tru) 2009-09-17 15:43:46 EDT

#1953582 added  

Comment #26 From Levente Farkas (lfarkas) 2009-09-18 06:05:05 EDT

Are you kidding?
You only plan to fix this in 5.5? RHN support said they can't do
anything :-(
IMHO it's a serious regression, so you should push a new update; the fix
has already been found. IMHO RH has two choices:
- push a new kernel update for 5.3 that contains all the security fixes
already in the 5.4 kernel, or
- push an updated kernel for 5.4 that fixes the NFS bug ASAP.

Comment #27 From Jeff Layton (jlayton) 2009-09-18 06:21:55 EDT

It looks like several people are being affected by this bug. I'm going to see
what can be done to get it fixed sooner than 5.5. Stay tuned...  

Comment #30 From dijuremo (dijuremo) 2009-09-18 07:35:00 EDT

(In reply to comment #27)
> It looks like several people are being affected by this bug. I'm going to see
> what can be done to get it fixed sooner than 5.5. Stay tuned...  

Isn't it obvious that anybody running nfs4 on RHEL 5.4 will have this problem?
I guess that 5.4 being so new, it has not yet made it into big commercial
installations as they seem to hold off updating, but this is just such a
critical problem with a very commonly used service that as soon as the big
companies start deploying 5.4 they are going to find themselves in a huge mess
with all their nfs servers broken if this is not fixed. I guess at least we can
run with your patched kernel -165, but that one is not really "official".

I have asked my campus fellows with support contracts to open tickets; add
1953460 and 1953404 to the list.

Comment #31 From Levente Farkas (lfarkas) 2009-09-18 08:37:24 EDT

I'm adding 1953458 to the list.

Comment 2 Eugene Teo (Security Response) 2009-09-21 03:12:39 UTC
Created attachment 361856
Proposed patch

Comment 4 Levente Farkas 2009-09-23 09:37:10 UTC
can we get a testing src.rpm?
is the attached patch the final patch?
thanks

Comment 5 Andrew C Aitchison 2009-09-29 16:41:11 UTC
Every release of RHEL 5 seems to include a new serious NFS kernel bug and very few NFS bugs are fixed between releases. Would I be right to assume that no one who pays for RHEL uses NFS seriously, or perhaps that no one who uses NFS seriously buys RHEL?

Comment 6 Jim Perrin 2009-09-29 18:49:05 UTC
The response to my paid support ticket said that the fix was planned for release the first week of November. I'm not entirely sure why it has to be another month from now before we see a fix for this. Any other paid support tickets see movement, similar or otherwise?

Comment 7 dijuremo 2009-09-29 18:59:32 UTC
After getting the notification about the patch having to wait until November, we have requested a "hotfix". However, we have not heard back in almost a week.

Even with paid support this is getting ridiculous: such a critical issue is not being addressed faster even though the appropriate fix is already known...

23-SEP-2009 16:17:50 	**RH Support**
I've requested the hotfix for you and will try to find out an ETA on when it could be fulfilled.

Regards,
**RH Support**

23-SEP-2009 15:09:57 	**Our Support Requestor**
I would like to request a hotfix. This issue is breaking our nfs servers, and another 7 weeks seems an awfully long time to wait for a critical patch. When can I tell my users to expect the hotfix kernel?

Comment 8 Rob Marti 2009-09-29 19:20:36 UTC
I've gotten the exact same response: first week of November. It's been updated every 4 days, so I guess I'm due for an update today. :) Seriously though, a workaround/hotfix or kernel needs to be released sooner than that.

Comment 9 Arkady Kanevsky 2009-10-01 14:27:36 UTC
Jeff,
I would like to get a patch ASAP.
And also I would like to add my voice for hotfix.
Arkady

Comment 10 Jeff Layton 2009-10-01 14:46:21 UTC
Hotfixes are generally handled by our support organization. If you need a hotfix in the interim before this is released, then please file a support ticket and request one.

Comment 11 Bruce A. Locke 2009-10-01 17:20:43 UTC
So is it common for Red Hat to sit on a completely broken protocol with nasty security issues for three months?  November?  Seriously?

Comment 12 Jim Perrin 2009-10-01 17:49:39 UTC
I'm actually a little shocked by this one, since other fixes involving regression issues have been patched much sooner. 

RH also just released another kernel the other day for a KVM issue, so it's not like they appear to be holding off for consolidation here. 

Folks in this bug are actively vocal and generally appear to be paying customers given the quantity of support ticket numbers listed, so what's going on? 

Could someone from RH explain what the delay is about? If there's a legitimate explanation then it should clear up some of the hostility floating around in this bug. 

I'm not asking for a complete disclosure of practices, but a small peek behind the curtain would certainly go a long way to ease some of the tension here.

Comment 13 Ric Wheeler 2009-10-01 18:11:41 UTC
As Jeff said in https://bugzilla.redhat.com/show_bug.cgi?id=524520#c10, please open a specific ticket with Red Hat support if you want/need a hotfix before we do another minor release.

Comment 14 Jim Perrin 2009-10-01 18:40:17 UTC
I have. Mine is the original paid support ticket listed in the post at the top. (original comment #22)

via paid support I was told November. I asked there why it was taking so long and was offered the hotfix early. That's great, but doesn't help other folks. I'm simply asking here why it's taking so long. 

Would you like me to open a separate paid support ticket to ask why it's taking 3 months and lots of pushing to get this fixed?

All I'm asking for is an explanation for the delay. I don't think that's asking for anything overly unreasonable. As I said before, if there's a good reason it should allay the hostility that appears to be in this BZ entry.

Comment 15 dijuremo 2009-10-01 18:45:53 UTC
We have also requested a hot fix via paid support and have not heard anything since Sep 23 as shown in our previous posting:

23-SEP-2009 16:17:50  **RH Support**
I've requested the hotfix for you and will try to find out an ETA on when it
could be fulfilled.

Regards,
**RH Support**

Comment 16 Levente Farkas 2009-10-01 18:48:04 UTC
Many of us have already opened tickets, and we are just waiting. In most other cases RH can release a kernel update much faster, so most of us don't understand how such a basic service bug can go unfixed for months.

Comment 19 dijuremo 2009-10-02 12:12:34 UTC
The following hotfix packages have been made available to us via our support
ticket:

kernel-2.6.18-164.3.1.el5.x86_64.rpm
kernel-2.6.18-164.3.1.el5.i686.rpm

If you have a support contract I assume you can get those as well until
everyone gets the patched kernel in the first week of November.

Comment 20 Jim Perrin 2009-10-02 22:01:34 UTC
Via paid support, the word I got on the delay was that they were going to release a roll-up kernel to fix this and several other issues, and that they try to do this monthly. This bug missed the cycle for the last monthly update.


This to me is a workable explanation that is a middle ground between inundating folks with new kernels and not releasing updates at all. Given that the hotfix is available for those who request it, I'm satisfied with this explanation. 

No idea why this couldn't have been explained here in the BZ though.

Comment 21 Jeff Layton 2009-10-08 18:34:12 UTC
*** Bug 523797 has been marked as a duplicate of this bug. ***

Comment 23 Eugene Teo (Security Response) 2009-10-13 01:34:28 UTC
(In reply to comment #20)
> Via paid support, the word I got on the delay was that they were going to
> release a roll-up kernel to fix this and several other issues, and that they
> try to do this monthly. This bug missed the cycle for the last monthly update.
> 
> 
> This to me is a workable explanation that is a middle ground between inundating
> folks with new kernels and not releasing updates at all. Given that the hotfix
> is available for those who request it, I'm satisfied with this explanation. 
> 
> No idea why this couldn't have been explained here in the BZ though.  

Jim, this bug was reported to us during the QE phase in the last kernel update, so we can only fix it in the next update. I could have explained this in the bug in response to c#5, but I was already on vacation then. Sorry about it. As mentioned in c#19, hotfixes are available, kindly contact Red Hat Support for them. Thanks.

Comment 24 Levente Farkas 2009-10-14 08:50:08 UTC
Can you confirm that the hotfix mentioned in #13 and the coming kernel update in November will contain the same patch as attached in #2?
thanks.

Comment 25 Eugene Teo (Security Response) 2009-10-14 09:02:57 UTC
(In reply to comment #24)
> Can you confirm that the hotfix mentioned in #13 and the coming kernel update in
> November will contain the same patch as attached in #2?
> thanks.  

Yes, of course.

Comment 28 Mike Hanby 2009-10-27 17:49:39 UTC
I'd like to add that I experienced the metadata issues on my 5.4 NFSv4 server (5.4 clients as well). In addition I also got a lot of reports from users where applications couldn't establish a file lock (/home is NFSv4 mounted):

$ ssh -X wkstation01
/usr/bin/xauth:  error in locking authority file /home/mhanby/.Xauthority

As suggested above, booting to kernel 2.6.18-128.7.1 appears to have resolved the issue.

I haven't tried the patched kernel yet. Just figured I'd add the locking issue in case someone searches for it since I did a good bit of searching before an IRC user directed me to this bug.

Comment 29 errata-xmlrpc 2009-11-03 19:33:38 UTC
This issue has been addressed in following products:

  Red Hat Enterprise Linux 5

Via RHSA-2009:1548 https://rhn.redhat.com/errata/RHSA-2009-1548.html

Comment 33 Rudd-O DragonFear 2010-08-04 08:53:32 UTC
This problem happens in Fedora 13, kernel 2.6.33.6-147.2.4.fc13.x86_64 and earlier kernels, if the exported file system is a FUSE file system (REGARDLESS of the kernel in the exporter server) in NFS versions 2 and 3.

Details here:

http://groups.google.com/group/zfs-fuse/tree/browse_frm/thread/767dec5dd628d839/45d14c213613741f?rnum=11&q=nfs+problems&_done=%2Fgroup%2Fzfs-fuse%2Fbrowse_frm%2Fthread%2F767dec5dd628d839%2F45d14c213613741f%3Flnk%3Dgst%26q%3Dnfs%2Bproblems%26#doc_230660389d18e348

Comment 34 Rudd-O DragonFear 2010-08-04 08:54:58 UTC
Sorry, my last comment should say NFSv4. NFSv3 and NFSv2 only produce a metric ton of ESTALE errors in the client.

Comment 36 Eugene Teo (Security Response) 2010-08-05 05:27:02 UTC
(In reply to comment #33)
> This problem happens in Fedora 13, kernel 2.6.33.6-147.2.4.fc13.x86_64 and
> earlier kernels, if the exported file system is a FUSE file system (REGARDLESS
> of the kernel in the exporter server) in NFS versions 2 and 3.
> 
> Details here:
> 
> http://groups.google.com/group/zfs-fuse/tree/browse_frm/thread/767dec5dd628d839/45d14c213613741f?rnum=11&q=nfs+problems&_done=%2Fgroup%2Fzfs-fuse%2Fbrowse_frm%2Fthread%2F767dec5dd628d839%2F45d14c213613741f%3Flnk%3Dgst%26q%3Dnfs%2Bproblems%26#doc_230660389d18e348    

Rudd, please file a new bug for this. If you can provide the steps to reproduce the issue, even better. Thanks.

Comment 38 Jeff Layton 2010-08-05 11:59:25 UTC
This bug was due to a bug introduced in a backport to RHEL5. It should be a problem only in a small subset of RHEL5 kernels and was fixed quite some time ago. I suspect that any problems in Fedora are unrelated to this, even if the symptoms look the same.

Comment 39 Rudd-O DragonFear 2010-08-06 05:43:27 UTC
OK.

