Description of problem:
On RHEL5, if I mount with NFSv4 using the Kerberos krb5p flavor, I get a hang when running the LTP fsx benchmark under this mount in an infinite "while true" loop. The NFSv4 server can be Fedora Core 6 (linux-2.6.19-rc6-CITI_NFS4_ALL-1) or AIX 5.3.

How reproducible:
On RHEL5:
mount -t nfs4 -o sec=krb5p nfs4_gb:/ /mnt/krb5p
cd /mnt/krb5p
while true; do ./fsx-linux -N 50000 /mnt/krb5p/fsx_nfs1_alto_gb; date; sleep 30; done

Cheers
Created attachment 148658 messages.fsx.krb5p.gz
Hi, here is /var/log/messages up to the freeze, captured after running echo t > /proc/sysrq-trigger. Cheers
Hi, I tried the latest release, 2.6.18-8.el5; the problem is still there. Best regards.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
What arguments are you giving to fsx?
(In reply to comment #4)
> What arguments are you giving to fsx?
On RHEL5:
mount -t nfs4 -o sec=krb5p nfs4_gb:/ /mnt/krb5p
cd /mnt/krb5p
while true; do ./fsx-linux -N 50000 /mnt/krb5p/fsx_nfs1_alto_gb; date; sleep 30; done
Cheers
What type of machine are you using? x86_64? SMP? How much memory?
Also how long does the test have to run before hang happens?
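One way to answer the timing question is to wrap the loop so that each fsx run is timestamped in a log on local disk; after a reboot, the last entry shows roughly how long the test survived. This is only a sketch: the function name run_logged and the log path in the usage line are made up for illustration, and the final log lines can be lost if the machine freezes before they are written back to disk.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): run a command COUNT times and
# append start/exit timestamps (seconds since the epoch) to a local log file.
run_logged() {
    log=$1
    count=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "iteration $i start $(date +%s)" >> "$log"
        "$@"
        rc=$?   # save the command's exit status before anything else runs
        echo "iteration $i exit $rc $(date +%s)" >> "$log"
        i=$((i + 1))
    done
}
```

For example, run_logged /var/tmp/fsx-hang.log 1000 ./fsx-linux -N 50000 /mnt/krb5p/fsx_nfs1_alto_gb, with the log kept on a local filesystem rather than on the krb5p mount.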
Using the kernel in http://people.redhat.com/dzickus/el5/28.el5/, I was able to run fsx continuously without any hangs or oopses... There have been a number of NFS patches that could help with this...
(In reply to comment #8)
> Using the kernel in http://people.redhat.com/dzickus/el5/28.el5/, I was able to
> run fsx continuously without any hangs or oopses... There have
> been a number of NFS patches that could help with this...
Hi,
Sorry, I thought I had answered comments #6 and #7.
The client running RHEL5 is an x86_64 machine with four CPUs and 2 GB of memory. It nevertheless runs a 32-bit distribution.
The server is an x86_64 machine with two CPUs and 2 GB of memory. It is running 64-bit Fedora Core 6 with a Linux version 2.6.21-CITI_NFS4_ALL-1 kernel.
I tried el5/28.el5 on the RHEL5 client with fsx on a krb5p mount, run as follows:
while true; do ./fsx-linux -N 50000 /mnt/krb5p/fsx_nfs1_nfs2_gb; date; sleep 30; done
The RHEL5 client machine hangs completely (it needs a reboot) after a few minutes.
Cheers
Ok... thanks for trying... let me find a similar machine to rerun the tests...
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.
Surprisingly enough, v4 seems to work, but I get the following when running with v3,sec=krb5:
~/work/fsx/fsx -N 50000 /mnt/rhelxen/home/tmp/fsx_nfs1_alto_gb
truncating to largest ever: 0x13e76
2 trunc from 0x13e76 to 0x26858
READ BAD DATA: offset = 0xc73e, size = 0xf0c4
OFFSET GOOD BAD RANGE
0x17098 0x02ae 0x0000 0x 4735
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
LOG DUMP (4 total operations):
Segmentation fault
*** Bug 229626 has been marked as a duplicate of this bug. ***
We've also seen hangs when running iozone inside a Xen dom-U which is backed by a file on NFSv4. If the domain image filesystem is mounted using NFSv3 instead, the benchmark finishes without problem. It seems very reproducible, in fact we've not yet been able to finish the NFSv4 benchmarks due to the hangs. Tore
Could this be related to the rpc.idmapd deadlock described in bug 483365?
Does this still happen on more recent RHEL5 kernels? There have been a large number of changes to the NFS code since 5.0...
Yes, it does. See this: https://bugzilla.redhat.com/show_bug.cgi?id=690196 It happens with a very recent kernel.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days