| Summary: | have a configurable connection timeout for the rpcgssd service | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Olga Kornieskaia <kolga> | ||||
| Component: | nfs-utils | Assignee: | Steve Dickson <steved> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Yongcheng Yang <yoyang> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.7 | CC: | eguan, fs-qe, kolga, yoyang | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | nfs-utils-1.2.3-68.el6 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2016-05-11 00:04:48 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | 1009528 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
|
||||||
> What do they have
> to do to reproduce this problem?
From what I understand, the test case does something like
1. mount with krb5
2. kinit as some user
3. create a test directory and there as user execute a test application (hammer) that has a variety of open, lock, read, write, rename, delete ops etc
4. mount with krb5i
5. kinit as some user
6. same as 3
7. repeat 1-6 3 more times.
the failure is for when context is getting established for the user.
I was also able to reproduce this locally using "tc". 1. sudo mount -t nfs4 -osec=krb5 192.168.1.120:/ /mnt 2. kinit 3. tc qdisc add dev eth0 root handle 1: prio 4. tc filter add dev eth0 parent 1:0 protocol ip pref 55 handle ::55 u32 match ip dst 192.168.1.120 flowid 2:1 5. tc qdisc change dev eth0 parent 1:1 handle 2: netem delay 10000ms 10000ms 6. echo testing > /mnt/test this fails with "permission denied" Then run gssd with "-T 60" and see that test succeeds. (In reply to Olga Kornieskaia from comment #5) > I was also able to reproduce this locally using "tc". > > 1. sudo mount -t nfs4 -osec=krb5 192.168.1.120:/ /mnt > 2. kinit > 3. tc qdisc add dev eth0 root handle 1: prio > 4. tc filter add dev eth0 parent 1:0 protocol ip pref 55 handle ::55 u32 > match ip dst 192.168.1.120 flowid 2:1 > 5. tc qdisc change dev eth0 parent 1:1 handle 2: netem delay 10000ms 10000ms > 6. echo testing > /mnt/test > > this fails with "permission denied" > > Then run gssd with "-T 60" and see that test succeeds. Hi Olga, I cannot reproduce it using "tc" in previous version nfs-utils-1.2.3-64.el6. Please help to have a look at the following steps: [18:17:11 root@ ~~]# mount -t nfs4 -o sec=krb5 $HOSTNAME:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local [18:17:11 root@ ~~]# kinit root <<< 'redhat' Password for root: [18:17:11 root@ ~~]# tc qdisc add dev eth0 root handle 1: prio [18:17:11 root@ ~~]# tc qdisc show dev eth0 qdisc prio 1: root refcnt 9 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 [18:17:11 root@ ~~]# tc filter add dev eth0 parent 1:0 protocol ip pref 55 handle ::55 u32 match ip dst 10.73.4.241 flowid 2:1 [18:17:11 root@ ~~]# tc filter show dev eth0 filter parent 1: protocol ip pref 55 u32 filter parent 1: protocol ip pref 55 u32 fh 800: ht divisor 1 filter parent 1: protocol ip pref 55 u32 fh 800::55 order 85 key ht 800 bkt 0 flowid 2:1 match 0a4904f1/ffffffff at 16 [18:17:12 root@ ~~]# tc qdisc add dev eth0 parent 1:1 handle 2: netem delay 10000ms 10000ms ^^^^^^^^^^^^^ "tc qdisc change" fail "RTNETLINK answers: No such file or directory" ^^^^^^^^^^^^^ So I replace the "change" to be "add" as someone said [18:17:12 root@ ~~]# tc qdisc show dev eth0 qdisc prio 1: root refcnt 9 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc netem 2: parent 1:1 limit 1000 delay 10.0s 10.0s [18:17:12 root@ ~~]# date >/mnt/nfsmp-krb5-common-local/testfile [18:17:12 root@ ~~]# umount /mnt/nfsmp-krb5-common-local [18:17:12 root@ ~~]# cat /var/log/messages | grep 'Connection timed out' [18:17:12 root@ ~~]# Have you verified that you see a delay happening? Is this a localhost mount? If so you are setting the delay for eth0. Given your time output of the shell: 18:17:12 is your start of the "date" command and then "unmount" again at "18:17:12" so if the delay was 10s the delay definitely didn't happen. that command should have taken some time to complete. (In reply to Olga Kornieskaia from comment #7) > Have you verified that you see a delay happening? Is this a localhost mount? > If so you are setting the delay for eth0. Given your time output of the > shell: 18:17:12 is your start of the "date" command and then "unmount" again > at "18:17:12" so if the delay was 10s the delay definitely didn't happen. > that command should have taken some time to complete. Yes, this is a localhost mount and I have see the delay happened. I can see the delay between my typing and screen showing, but seems the command exec without delay. I'm wondering whether it's the matter that "tc qdisc add" or "tc qdisc change" (in Comment 6) [root@hp-dl160g8-01 ~]# rpm -q nfs-utils nfs-utils-1.2.3-64.el6.x86_64 [root@hp-dl160g8-01 ~]# mount -t nfs4 -o sec=krb5 hp-dl160g8-01.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local [root@hp-dl160g8-01 ~]# kinit root <<< 'redhat' Password for root: [root@hp-dl160g8-01 ~]# tc qdisc add dev eth0 root handle 1: prio [root@hp-dl160g8-01 ~]# tc filter add dev eth0 parent 1:0 protocol ip pref 55 handle ::55 u32 match ip dst 10.73.4.241 flowid 2:1 [root@hp-dl160g8-01 ~]# tc qdisc add dev eth0 parent 1:1 handle 2: netem delay 10000ms 10000ms ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ delay happening after this [root@hp-dl160g8-01 ~]# tc qdisc show dev eth0 qdisc prio 1: root refcnt 9 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc netem 2: parent 1:1 limit 1000 delay 10.0s 10.0s [root@hp-dl160g8-01 ~]# tc filter show dev eth0 filter parent 1: protocol ip pref 55 u32 filter parent 1: protocol ip pref 55 u32 fh 800: ht divisor 1 filter parent 1: protocol ip pref 55 u32 fh 800::55 order 85 key ht 800 bkt 0 flowid 2:1 match 0a4904f1/ffffffff at 16 [root@hp-dl160g8-01 ~]# date; date >/mnt/nfsmp-krb5-common-local/testfile; date Tue Jan 12 09:50:29 CST 2016 Tue Jan 12 09:50:29 CST 2016 [root@hp-dl160g8-01 ~]# date; umount /mnt/nfsmp-krb5-common-local; date Tue Jan 12 09:51:10 CST 2016 Tue Jan 12 09:51:10 CST 2016 [root@hp-dl160g8-01 ~]# tc qdisc del dev eth0 root ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ delay disappeared after this [root@hp-dl160g8-01 ~]# tc qdisc del dev eth0 root RTNETLINK answers: No such file or directory [root@hp-dl160g8-01 ~]# cat /var/log/messages | grep 'Connection timed out' [root@hp-dl160g8-01 ~]# echo $? 1 [root@hp-dl160g8-01 ~]# If this is a localhost mount then mount goes over loopback interface, not eth0. I don't see any delay in your output. (In reply to Olga Kornieskaia from comment #9) > If this is a localhost mount then mount goes over loopback interface, not > eth0. > I don't see any delay in your output. Thanks Olga, I have been able to reproduce this problem. Thanks again for your detailed reproducer! > I was also able to reproduce this locally using "tc". > > 1. sudo mount -t nfs4 -osec=krb5 192.168.1.120:/ /mnt > 2. kinit > 3. tc qdisc add dev eth0 root handle 1: prio > 4. tc filter add dev eth0 parent 1:0 protocol ip pref 55 handle ::55 u32 > match ip dst 192.168.1.120 flowid 2:1 ^^^^^^^^^^^^^ It's the server's IP, previously I wrongly understand > 5. tc qdisc change dev eth0 parent 1:1 handle 2: netem delay 10000ms 10000ms > 6. echo testing > /mnt/test > > this fails with "permission denied" > > Then run gssd with "-T 60" and see that test succeeds. Move to VERIFIED as Comment 11 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0868.html |
Created attachment 1112491 [details] modified patch of the upstream commit Description of problem: Asking to include the fix for bugzilla 1009528 into RHEL6.x version. It's the same problem that when the server takes too long to respond, it leads gssd to timeout and log something like: "Jan 6 05:21:01 scspr0055500001 rpc.gssd[3152]: WARNING: can't create tcp rpc_clnt to server ae1nb141541.dmag212.com for user with uid 70120: RPC: Remote system error - Connection timed out" Then, the kernel fails the operation with "permission denied". This problem has been observed by the NetApp QA on RHEL6.6 nfs-utils-1.2.3-54 version. Fix is to include upstream commit b7b9e4e69587652b40d543094b133fb8b4881dc1 commit b7b9e4e69587652b40d543094b133fb8b4881dc1 Author: Rinku Kothiya <rkothiya> Date: Fri Jul 25 10:50:55 2014 -0400 gssd: configurable connection timeout for the rpcgssd service As the patch doesn't cleanly apply, I'm attaching a slightly modified version of that patch that does apply. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. mount with krb5 2. kinit as some user 3. create a test directory and there as user execute a test application (hammer) that has a variety of open, lock, read, write, rename, delete ops etc 4. mount with krb5i 5. kinit as some user 6. same as 3 7. repeat 1-6 3 more times. the failure is for when context is getting established for the user. Actual results: Expected results: Additional info: