Bug 226756 - Hang with RHEL5 used as NFSV4 client when fsx bench running [NEEDINFO]
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Steve Dickson
QA Contact: Brian Brock
Duplicates: 229626
Depends On:
Blocks:
Reported: 2007-02-01 09:38 EST by Aime Le Rouzic
Modified: 2014-05-21 13:15 EDT

Doc Type: Bug Fix
Last Closed: 2014-05-21 13:15:15 EDT
steved: needinfo? (aime.le-rouzic)


Attachments
messages.fsx.krb5p.gz (20.88 KB, application/postscript)
2007-02-23 05:41 EST, Aime Le Rouzic
Description Aime Le Rouzic 2007-02-01 09:38:15 EST
Description of problem:

On RHEL5, if I mount over NFSv4 using the Kerberos krb5p security flavor, the
client hangs when I run the LTP fsx benchmark on that mount in an infinite
"while true" loop. The NFSv4 server can be either Fedora Core 6
(linux-2.6.19-rc6-CITI_NFS4_ALL-1) or AIX 5.3.

How reproducible:

On RHEL5:
mount -t nfs4 -o sec=krb5p nfs4_gb:/ /mnt/krb5p
cd /mnt/krb5p
while true; do ./fsx-linux -N 50000 /mnt/krb5p/fsx_nfs1_alto_gb; date; sleep 30;
done


Cheers
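
The loop above can also be wrapped so that each iteration is timestamped in a log, which helps answer how long the test ran before the hang. This is only a sketch under stated assumptions: the function name run_loop and the variables LOG, FSX_CMD, MAX_ITER and PAUSE are hypothetical and do not come from the report, and the log is assumed to live on a local disk rather than on the krb5p mount.

```shell
#!/bin/bash
# Hypothetical wrapper around the reproduction loop from the description.
# LOG, FSX_CMD, MAX_ITER and PAUSE are illustrative names, not from the report.
run_loop() {
    local log="${LOG:-/var/tmp/fsx-krb5p.log}"   # keep this off the NFS mount
    local cmd="${FSX_CMD:-./fsx-linux -N 50000 /mnt/krb5p/fsx_nfs1_alto_gb}"
    local max="${MAX_ITER:-0}"                   # 0 = loop forever, as in the report
    local pause="${PAUSE:-30}"
    local i=0 rc

    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        echo "$(date '+%F %T') start iteration $i" >> "$log"
        $cmd                                     # unquoted on purpose: split into words
        rc=$?
        echo "$(date '+%F %T') end iteration $i status $rc" >> "$log"
        sleep "$pause"
    done
}
```

Calling run_loop from /mnt/krb5p with the defaults reproduces the original loop. If the machine freezes completely, the most recent log lines may not have reached the disk, so the last timestamp in the log is only a lower bound on the run time.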
Comment 1 Aime Le Rouzic 2007-02-23 05:41:22 EST
Created attachment 148658
messages.fsx.krb5p.gz

Hi,
Here is /var/log/messages up to the freeze, after doing
echo t > /proc/sysrq-trigger

Cheers
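
For anyone collecting the same kind of trace, a minimal sketch of the capture step follows. The function name capture_task_dump and its arguments are hypothetical. It assumes root privileges and reads the kernel ring buffer with dmesg instead of copying /var/log/messages as was done for the attachment.

```shell
#!/bin/bash
# capture_task_dump OUT [TRIGGER]
# Hypothetical helper: asks the kernel to dump the state of all tasks
# (SysRq 't'), then saves the kernel ring buffer to OUT.
# TRIGGER defaults to /proc/sysrq-trigger and is only a parameter so the
# function can be exercised without touching the real trigger file.
capture_task_dump() {
    local out="$1"
    local trigger="${2:-/proc/sysrq-trigger}"

    echo t > "$trigger" || return 1   # needs root for the real trigger file
    dmesg > "$out"                    # the task dump ends up in the ring buffer
}
```

A call such as capture_task_dump /var/tmp/sysrq-t.txt has to happen while a shell on the client still responds; once the machine is frozen, no shell command can run.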
Comment 2 Aime Le Rouzic 2007-03-09 11:18:27 EST
Hi,
I tried the latest release, 2.6.18-8.el5; the problem is still there.

Best regards.
Comment 3 RHEL Product and Program Management 2007-04-25 17:44:41 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 4 Steve Dickson 2007-06-05 15:34:10 EDT
What arguments are you giving to fsx?
Comment 5 Aime Le Rouzic 2007-06-06 02:40:50 EDT
(In reply to comment #4)
> What arguments are you giving to fsx?

On RHEL5:
mount -t nfs4 -o sec=krb5p nfs4_gb:/ /mnt/krb5p
cd /mnt/krb5p
while true; do ./fsx-linux -N 50000 /mnt/krb5p/fsx_nfs1_alto_gb; date; sleep 30;
done


Cheers
Comment 6 Steve Dickson 2007-06-08 07:59:46 EDT
What type of machine are you using? x86_64? 
SMP? How much memory?
Comment 7 Steve Dickson 2007-06-08 08:02:16 EDT
Also, how long does the test have to run before the hang happens?
Comment 8 Steve Dickson 2007-06-21 13:39:38 EDT
Using the kernel in http://people.redhat.com/dzickus/el5/28.el5/, I was able to
run fsx continuously without any hangs or oopses... There have
been a number of NFS patches that could help with this...
Comment 9 Aime Le Rouzic 2007-06-27 05:59:23 EDT
(In reply to comment #8)
> Using the kernel in http://people.redhat.com/dzickus/el5/28.el5/, I was able to
> run fsx continuously without any hangs or oopses... There have
> been a number of NFS patches that could help with this...

Hi, 
Sorry, I thought I had already answered comments #6 and #7.
The client running RHEL5 is an x86_64 machine with four CPUs and 2 GB of memory.
It nevertheless runs a 32-bit distribution.

The server is an x86_64 machine with two CPUs and 2 GB of memory.
It runs 64-bit Fedora Core 6 with a Linux version
2.6.21-CITI_NFS4_ALL-1 kernel.

I tried the el5/28.el5 kernel on the RHEL5 client, running fsx on a krb5p mount
with the following loop and parameters:
while true; do ./fsx-linux -N 50000 /mnt/krb5p/fsx_nfs1_nfs2_gb; date; sleep 30;
done

The RHEL5 client machine hangs completely (a reboot is needed) after a few minutes.

Cheers
Comment 10 Steve Dickson 2007-06-29 09:28:15 EDT
Ok... thanks for trying... let me find a similar machine to rerun the tests...
Comment 11 RHEL Product and Program Management 2007-09-07 15:55:51 EDT
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.
Comment 12 Steve Dickson 2008-05-22 14:22:50 EDT
Surprisingly enough, v4 seems to work, but I get the following
when running with v3,sec=krb5:
 
~/work/fsx/fsx -N 50000 /mnt/rhelxen/home/tmp/fsx_nfs1_alto_gb
truncating to largest ever: 0x13e76
2 trunc from 0x13e76 to 0x26858
READ BAD DATA: offset = 0xc73e, size = 0xf0c4
OFFSET  GOOD    BAD     RANGE
0x17098 0x02ae  0x0000  0x 4735
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
LOG DUMP (4 total operations):
Segmentation fault
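
As a reading aid for the output above, the hex values can be checked with shell arithmetic. This sketch only relates the numbers to each other and does not identify the cause of the corruption. The variable names and the helper within are hypothetical, and the read end is treated as exclusive.

```shell
#!/bin/bash
# Values copied from the fsx output in this comment.
read_off=$((0xc73e))     # "READ BAD DATA: offset"
read_size=$((0xf0c4))    # "READ BAD DATA: size"
bad_off=$((0x17098))     # first bad offset in the OFFSET column
old_eof=$((0x13e76))     # "trunc from"
new_eof=$((0x26858))     # "trunc ... to"

read_end=$((read_off + read_size))

# within X LO HI: true when LO <= X < HI
within() { [ "$1" -ge "$2" ] && [ "$1" -lt "$3" ]; }

printf 'read covers 0x%x..0x%x\n' "$read_off" "$read_end"
within "$bad_off" "$read_off" "$read_end" &&
    echo "first bad byte lies inside the failing read"
within "$bad_off" "$old_eof" "$new_eof" &&
    echo "first bad byte lies between the old and new truncate sizes"
```

Both checks hold for these values: the failing read ends at 0x1b802, and the first bad offset 0x17098 falls between the two sizes of the truncate operation, which is consistent with the hint from fsx to check HOLE and EXTEND operations.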
Comment 13 Steve Dickson 2008-05-22 14:23:39 EDT
*** Bug 229626 has been marked as a duplicate of this bug. ***
Comment 14 Tore Anderson 2009-10-19 03:32:29 EDT
We've also seen hangs when running iozone inside a Xen dom-U which is backed by a file on NFSv4.  If the domain image filesystem is mounted using NFSv3 instead, the benchmark finishes without problem.

It seems very reproducible, in fact we've not yet been able to finish the NFSv4 benchmarks due to the hangs.

Tore
Comment 15 Nathaniel W. Turner 2009-11-17 17:59:59 EST
Could this be related to the rpc.idmapd deadlock described in bug 483365?
Comment 18 Steve Dickson 2011-01-22 14:00:51 EST
Does this still happen on more recent RHEL5 kernels? There have been a
large number of changes to the NFS code since 5.0...
Comment 19 Ondrej Valousek 2011-03-24 04:08:26 EDT
Yes, it does.
See this:
https://bugzilla.redhat.com/show_bug.cgi?id=690196
It happens with a very recent kernel.
