Bug 226756 - Hang with RHEL5 used as NFSV4 client when fsx bench running
Summary: Hang with RHEL5 used as NFSV4 client when fsx bench running
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 229626
Depends On:
Blocks:
Reported: 2007-02-01 14:38 UTC by Aime Le Rouzic
Modified: 2023-09-14 01:11 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-05-21 17:15:15 UTC
Target Upstream Version:
Embargoed:


Attachments
messages.fsx.krb5p.gz (20.88 KB, application/postscript)
2007-02-23 10:41 UTC, Aime Le Rouzic

Description Aime Le Rouzic 2007-02-01 14:38:15 UTC
Description of problem:

On RHEL5, if I mount with NFSv4 using the Kerberos krb5p flavor and run the
LTP fsx benchmark on that mount in an infinite while true loop, I get a hang.
The NFSv4 server can be either Fedora Core 6 (linux-2.6.19-rc6-CITI_NFS4_ALL-1)
or AIX 5.3.

How reproducible:

On RHEL5:
mount -t nfs4 -o sec=krb5p nfs4_gb:/ /mnt/krb5p
cd /mnt/krb5p
while true; do ./fsx-linux -N 50000 /mnt/krb5p/fsx_nfs1_alto_gb; date; sleep 30;
done
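The reproduction loop above only prints a date after each pass, so the time to failure has to be read off the log. A minimal bash sketch of a wrapper that numbers each pass, prints elapsed seconds, and stops on the first failing pass follows. The function name run_fsx_loop and the FSX_SLEEP variable are hypothetical, and the sketch assumes fsx exits with a non-zero status when it detects bad data. A full client hang like the one reported here will freeze the wrapper too, so the last line it printed is the useful data point.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the reproducer loop from the description.
# Usage: run_fsx_loop MAX_ITER CMD [ARGS...]
#   MAX_ITER=0 loops forever, like the original "while true" loop.
#   FSX_SLEEP (hypothetical name) overrides the 30-second pause between passes.
run_fsx_loop() {
    local max_iter=$1
    shift
    local iter=0
    local start
    start=$(date +%s)
    while [ "$max_iter" -eq 0 ] || [ "$iter" -lt "$max_iter" ]; do
        iter=$((iter + 1))
        if ! "$@"; then
            # Assumption: fsx exits non-zero when it detects a problem; stop and report
            echo "iteration $iter failed after $(( $(date +%s) - start ))s" >&2
            return 1
        fi
        # Number and time-stamp every completed pass, as the original loop does with 'date'
        echo "iteration $iter ok after $(( $(date +%s) - start ))s: $(date)"
        sleep "${FSX_SLEEP:-30}"
    done
    return 0
}

# Example (on the NFS client, from the krb5p mount):
# run_fsx_loop 0 ./fsx-linux -N 50000 /mnt/krb5p/fsx_nfs1_alto_gb
```

Passing the command as arguments, rather than hard-coding fsx-linux, lets the loop be tried with a stand-in command before running it on the krb5p mount.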


Cheers

Comment 1 Aime Le Rouzic 2007-02-23 10:41:22 UTC
Created attachment 148658
messages.fsx.krb5p.gz

Hi,
Here is /var/log/messages up to the freeze, after running
echo t > /proc/sysrq-trigger to dump the task states.

Cheers

Comment 2 Aime Le Rouzic 2007-03-09 16:18:27 UTC
Hi,
I tried the latest release, 2.6.18-8.el5; the problem is still there.

Best regards.

Comment 3 RHEL Program Management 2007-04-25 21:44:41 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Steve Dickson 2007-06-05 19:34:10 UTC
What arguments are you giving to fsx?

Comment 5 Aime Le Rouzic 2007-06-06 06:40:50 UTC
(In reply to comment #4)
> What arguments are you giving to fsx?

On RHEL5:
mount -t nfs4 -o sec=krb5p nfs4_gb:/ /mnt/krb5p
cd /mnt/krb5p
while true; do ./fsx-linux -N 50000 /mnt/krb5p/fsx_nfs1_alto_gb; date; sleep 30;
done


Cheers

Comment 6 Steve Dickson 2007-06-08 11:59:46 UTC
What type of machine are you using? x86_64? 
SMP? How much memory?

Comment 7 Steve Dickson 2007-06-08 12:02:16 UTC
Also, how long does the test have to run before the hang happens?

Comment 8 Steve Dickson 2007-06-21 17:39:38 UTC
Using the kernel in http://people.redhat.com/dzickus/el5/28.el5/, I was able to
continuously run fsx without any hangs or oopses. There have
been a number of NFS patches that could help with this.

Comment 9 Aime Le Rouzic 2007-06-27 09:59:23 UTC
(In reply to comment #8)
> Using the kernel in http://people.redhat.com/dzickus/el5/28.el5/, I was able to
> continuously run fsx without any hangs or oopses. There have
> been a number of NFS patches that could help with this.

Hi, 
Sorry, I thought I had already answered comments #6 and #7.
The RHEL5 client is an x86_64 machine with four CPUs and 2 GB of memory,
but it runs a 32-bit distribution.

The server is an x86_64 machine with two CPUs and 2 GB of memory,
running 64-bit Fedora Core 6 with a 2.6.21-CITI_NFS4_ALL-1 kernel.

On the RHEL5 client I tried the el5/28.el5 kernel, running fsx on a krb5p mount
with the following loop and parameters:
while true; do ./fsx-linux -N 50000 /mnt/krb5p/fsx_nfs1_nfs2_gb; date; sleep 30;
done

The RHEL5 client machine hangs completely (a reboot is needed) after a few minutes.

Cheers

Comment 10 Steve Dickson 2007-06-29 13:28:15 UTC
Ok... thanks for trying... let me find a similar machine to rerun the tests...

Comment 11 RHEL Program Management 2007-09-07 19:55:51 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 12 Steve Dickson 2008-05-22 18:22:50 UTC
Surprisingly enough, v4 seems to work, but I get the following
when running with v3,sec=krb5:
 
~/work/fsx/fsx -N 50000 /mnt/rhelxen/home/tmp/fsx_nfs1_alto_gb
truncating to largest ever: 0x13e76
2 trunc from 0x13e76 to 0x26858
READ BAD DATA: offset = 0xc73e, size = 0xf0c4
OFFSET  GOOD    BAD     RANGE
0x17098 0x02ae  0x0000  0x 4735
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
LOG DUMP (4 total operations):
Segmentation fault
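The READ BAD DATA report above shows where the data fsx read back differed from what it expected (OFFSET, GOOD, BAD and RANGE columns). If a known-good copy of the test file were available outside fsx (an assumption; fsx itself segfaulted during LOG DUMP in this run), the first mismatching byte could be located with cmp. A minimal bash sketch follows; the helper name first_mismatch is hypothetical, and it compares single bytes, whereas fsx's GOOD/BAD columns may be wider.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the first byte at which ACTUAL differs from
# EXPECTED, loosely modeled on fsx's OFFSET/GOOD/BAD columns.
# Usage: first_mismatch EXPECTED_FILE ACTUAL_FILE
# Returns 0 if the common prefix matches, 1 on a differing byte, 2 on bad input.
first_mismatch() {
    local line pos good bad
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "cannot read inputs" >&2
        return 2
    fi
    # cmp -l prints "<1-based byte number> <octal byte of file 1> <octal byte of file 2>"
    # for each differing byte; only the first line is kept.
    line=$(cmp -l "$1" "$2" 2>/dev/null | head -n 1)
    if [ -z "$line" ]; then
        echo "no differing bytes in common length"
        return 0
    fi
    # Split cmp's three whitespace-separated fields (the file arguments are no longer needed)
    set -- $line
    pos=$1 good=$2 bad=$3
    # 0-based hex offset; the leading 0 makes printf read cmp's values as octal
    printf 'OFFSET 0x%x GOOD 0x%02x BAD 0x%02x\n' "$((pos - 1))" "0$good" "0$bad"
    return 1
}
```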


Comment 13 Steve Dickson 2008-05-22 18:23:39 UTC
*** Bug 229626 has been marked as a duplicate of this bug. ***

Comment 14 Tore Anderson 2009-10-19 07:32:29 UTC
We've also seen hangs when running iozone inside a Xen dom-U which is backed by a file on NFSv4.  If the domain image filesystem is mounted using NFSv3 instead, the benchmark finishes without problem.

It seems very reproducible; in fact, we have not yet been able to finish the NFSv4 benchmarks because of the hangs.

Tore

Comment 15 Nathaniel W. Turner 2009-11-17 22:59:59 UTC
Could this be related to the rpc.idmapd deadlock described in bug 483365?

Comment 18 Steve Dickson 2011-01-22 19:00:51 UTC
Does this still happen with more recent RHEL5 kernels? There have been a
large number of changes to the NFS code since 5.0.

Comment 19 Ondrej Valousek 2011-03-24 08:08:26 UTC
Yes, it does.
See this:
https://bugzilla.redhat.com/show_bug.cgi?id=690196
It happens with a very recent kernel.

Comment 20 Red Hat Bugzilla 2023-09-14 01:11:00 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

