Bug 1296607 - have a configurable connection timeout for the rpcgssd service
have a configurable connection timeout for the rpcgssd service
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: nfs-utils (Show other bugs)
6.7
x86_64 Linux
unspecified Severity medium
: rc
: ---
Assigned To: Steve Dickson
Yongcheng Yang
:
Depends On: 1009528
Blocks:
  Show dependency treegraph
 
Reported: 2016-01-07 11:12 EST by Olga Kornieskaia
Modified: 2016-05-10 20:04 EDT (History)
4 users (show)

See Also:
Fixed In Version: nfs-utils-1.2.3-68.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-10 20:04:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
modified patch of the upstream commit (3.17 KB, application/mbox)
2016-01-07 11:12 EST, Olga Kornieskaia
no flags Details

  None (edit)
Description Olga Kornieskaia 2016-01-07 11:12:27 EST
Created attachment 1112491 [details]
modified patch of the upstream commit

Description of problem:
Asking to include the fix for bugzilla 1009528 into RHEL6.x version. It's the same problem that when the server takes too long to respond, it leads gssd to timeout and log something like: 
"Jan  6 05:21:01 scspr0055500001 rpc.gssd[3152]: WARNING: can't create tcp 
rpc_clnt to server ae1nb141541.dmag212.com for user with uid 70120: RPC: Remote system error - Connection timed out"

Then, the kernel fails the operation with "permission denied". This problem has been observed by the NetApp QA on RHEL6.6 nfs-utils-1.2.3-54 version.

Fix is to include upstream commit b7b9e4e69587652b40d543094b133fb8b4881dc1
commit b7b9e4e69587652b40d543094b133fb8b4881dc1
Author: Rinku Kothiya <rkothiya@redhat.com>
Date:   Fri Jul 25 10:50:55 2014 -0400

   gssd: configurable connection timeout for the rpcgssd service

As the patch doesn't cleanly apply, I'm attaching a slightly modified version of that patch that does apply.

Version-Release number of selected component (if applicable):

How reproducible:


Steps to Reproduce:
1. mount with krb5
2. kinit as some user
3. create a test directory and there as user execute a test application (hammer) that has a variety of open, lock, read, write, rename, delete  ops etc
4. mount with krb5i
5. kinit as some user
6. same as 3
7. repeat 1-6 3 more times.

the failure is for when context is getting established for the user.


Actual results:


Expected results:


Additional info:
Comment 3 Steve Dickson 2016-01-08 15:19:06 EST
> What do they have
> to do to reproduce this problem?

From what I understand, the test case does something like
1. mount with krb5
2. kinit as some user
3. create a test directory and there as user execute a test application (hammer) that has a variety of open, lock, read, write, rename, delete  ops etc
4. mount with krb5i
5. kinit as some user
6. same as 3
7. repeat 1-6 3 more times.

the failure is for when context is getting established for the user.
Comment 5 Olga Kornieskaia 2016-01-08 15:31:03 EST
I was also able to reproduce this locally using "tc".

1. sudo mount -t nfs4 -osec=krb5 192.168.1.120:/ /mnt
2. kinit
3. tc qdisc add dev eth0 root handle 1: prio
4. tc filter add dev eth0 parent 1:0 protocol ip pref 55 handle ::55 u32 match ip dst 192.168.1.120 flowid 2:1
5. tc qdisc change dev eth0 parent 1:1 handle 2: netem delay 10000ms 10000ms
6. echo testing > /mnt/test

this fails with "permission denied"

Then run gssd with "-T 60" and see that test succeeds.
Comment 6 Yongcheng Yang 2016-01-11 05:28:01 EST
(In reply to Olga Kornieskaia from comment #5)
> I was also able to reproduce this locally using "tc".
> 
> 1. sudo mount -t nfs4 -osec=krb5 192.168.1.120:/ /mnt
> 2. kinit
> 3. tc qdisc add dev eth0 root handle 1: prio
> 4. tc filter add dev eth0 parent 1:0 protocol ip pref 55 handle ::55 u32
> match ip dst 192.168.1.120 flowid 2:1
> 5. tc qdisc change dev eth0 parent 1:1 handle 2: netem delay 10000ms 10000ms
> 6. echo testing > /mnt/test
> 
> this fails with "permission denied"
> 
> Then run gssd with "-T 60" and see that test succeeds.

Hi Olga,
I cannot reproduce it using "tc" in previous version nfs-utils-1.2.3-64.el6.
Please help to have a look at the following steps:
[18:17:11 root@ ~~]# mount -t nfs4 -o sec=krb5 $HOSTNAME:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local
[18:17:11 root@ ~~]# kinit root <<< 'redhat'
Password for root@RHQE.COM: 
[18:17:11 root@ ~~]# tc qdisc add dev eth0 root handle 1: prio
[18:17:11 root@ ~~]# tc qdisc show dev eth0
qdisc prio 1: root refcnt 9 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
[18:17:11 root@ ~~]# tc filter add dev eth0 parent 1:0 protocol ip pref 55 handle ::55 u32 match ip dst 10.73.4.241 flowid 2:1
[18:17:11 root@ ~~]# tc filter show dev eth0
filter parent 1: protocol ip pref 55 u32 
filter parent 1: protocol ip pref 55 u32 fh 800: ht divisor 1 
filter parent 1: protocol ip pref 55 u32 fh 800::55 order 85 key ht 800 bkt 0 flowid 2:1 
  match 0a4904f1/ffffffff at 16
[18:17:12 root@ ~~]# tc qdisc add dev eth0 parent 1:1 handle 2: netem delay 10000ms 10000ms
^^^^^^^^^^^^^ "tc qdisc change" fail "RTNETLINK answers: No such file or directory"
^^^^^^^^^^^^^ So I replace the "change" to be "add" as someone said
[18:17:12 root@ ~~]# tc qdisc show dev eth0
qdisc prio 1: root refcnt 9 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc netem 2: parent 1:1 limit 1000 delay 10.0s  10.0s
[18:17:12 root@ ~~]# date >/mnt/nfsmp-krb5-common-local/testfile
[18:17:12 root@ ~~]# umount /mnt/nfsmp-krb5-common-local
[18:17:12 root@ ~~]# cat /var/log/messages | grep 'Connection timed out'
[18:17:12 root@ ~~]#
Comment 7 Olga Kornieskaia 2016-01-11 08:32:16 EST
Have you verified that you see a delay happening? Is this a localhost mount? If so you are setting the delay for eth0. Given your time output of the shell: 18:17:12 is your start of the "date" command and then "unmount" again at "18:17:12" so if the delay was 10s the delay definitely didn't happen. that command should have taken some time to complete.
Comment 8 Yongcheng Yang 2016-01-11 20:59:11 EST
(In reply to Olga Kornieskaia from comment #7)
> Have you verified that you see a delay happening? Is this a localhost mount?
> If so you are setting the delay for eth0. Given your time output of the
> shell: 18:17:12 is your start of the "date" command and then "unmount" again
> at "18:17:12" so if the delay was 10s the delay definitely didn't happen.
> that command should have taken some time to complete.

Yes, this is a localhost mount and I have see the delay happened.
I can see the delay between my typing and screen showing, but seems the command exec without delay. I'm wondering whether it's the matter that "tc qdisc add" or "tc qdisc change" (in Comment 6)

[root@hp-dl160g8-01 ~]# rpm -q nfs-utils
nfs-utils-1.2.3-64.el6.x86_64
[root@hp-dl160g8-01 ~]# mount -t nfs4 -o sec=krb5 hp-dl160g8-01.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local
[root@hp-dl160g8-01 ~]# kinit root <<< 'redhat'
Password for root@RHQE.COM: 
[root@hp-dl160g8-01 ~]# tc qdisc add dev eth0 root handle 1: prio
[root@hp-dl160g8-01 ~]# tc filter add dev eth0 parent 1:0 protocol ip pref 55 handle ::55 u32 match ip dst 10.73.4.241 flowid 2:1
[root@hp-dl160g8-01 ~]# tc qdisc add dev eth0 parent 1:1 handle 2: netem delay 10000ms 10000ms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ delay happening after this
[root@hp-dl160g8-01 ~]#  tc qdisc show dev eth0
qdisc prio 1: root refcnt 9 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc netem 2: parent 1:1 limit 1000 delay 10.0s  10.0s
[root@hp-dl160g8-01 ~]# tc filter show dev eth0
filter parent 1: protocol ip pref 55 u32 
filter parent 1: protocol ip pref 55 u32 fh 800: ht divisor 1 
filter parent 1: protocol ip pref 55 u32 fh 800::55 order 85 key ht 800 bkt 0 flowid 2:1 
  match 0a4904f1/ffffffff at 16
[root@hp-dl160g8-01 ~]# date; date >/mnt/nfsmp-krb5-common-local/testfile; date
Tue Jan 12 09:50:29 CST 2016
Tue Jan 12 09:50:29 CST 2016
[root@hp-dl160g8-01 ~]# date; umount /mnt/nfsmp-krb5-common-local; date
Tue Jan 12 09:51:10 CST 2016
Tue Jan 12 09:51:10 CST 2016
[root@hp-dl160g8-01 ~]# tc qdisc del dev eth0 root
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ delay disappeared after this
[root@hp-dl160g8-01 ~]# tc qdisc del dev eth0 root
RTNETLINK answers: No such file or directory
[root@hp-dl160g8-01 ~]# cat /var/log/messages | grep 'Connection timed out'
[root@hp-dl160g8-01 ~]# echo $?
1
[root@hp-dl160g8-01 ~]#
Comment 9 Olga Kornieskaia 2016-01-12 08:25:00 EST
If this is a localhost mount then mount goes over loopback interface, not eth0.
I don't see any delay in your output.
Comment 10 Yongcheng Yang 2016-02-04 04:53:20 EST
(In reply to Olga Kornieskaia from comment #9)
> If this is a localhost mount then mount goes over loopback interface, not
> eth0.
> I don't see any delay in your output.
Thanks Olga, I have been able to reproduce this problem.
Thanks again for your detailed reproducer!

> I was also able to reproduce this locally using "tc".
> 
> 1. sudo mount -t nfs4 -osec=krb5 192.168.1.120:/ /mnt
> 2. kinit
> 3. tc qdisc add dev eth0 root handle 1: prio
> 4. tc filter add dev eth0 parent 1:0 protocol ip pref 55 handle ::55 u32
> match ip dst 192.168.1.120 flowid 2:1
               ^^^^^^^^^^^^^ It's the server's IP, previously I wrongly understand
> 5. tc qdisc change dev eth0 parent 1:1 handle 2: netem delay 10000ms 10000ms
> 6. echo testing > /mnt/test
> 
> this fails with "permission denied"
> 
> Then run gssd with "-T 60" and see that test succeeds.
Comment 12 Yongcheng Yang 2016-02-19 02:41:26 EST
Move to VERIFIED as Comment 11
Comment 14 errata-xmlrpc 2016-05-10 20:04:48 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0868.html

Note You need to log in before you can comment on or make changes to this bug.