We have an f37 client (kernel 6.1.14-200.fc37.x86_64) connected via NFSv4 to an up-to-date Synology NFS server. Intermittently, a condition develops in which client performance falls off a cliff. Normally, NFS operations are instantaneous, but once this condition develops, an ordinary ls can take on the order of seconds, and Firefox startup can take 30. mount reports the active options thusly:

(rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.1,local_lock=none,addr=192.168.1.21)

Other clients of the same server are not affected at the same time. There are no contemporaneous diagnostics visible on the affected client, the other clients, or the server, so it is hard to identify the point in time at which the condition begins.

A "perf top" on the kernel displays this peculiarity:

Samples: 6M of event 'cycles', 4000 Hz, Event count (approx.): 38276948048 lost: 0/0 drop: 0/0
Overhead  Shared Object  Symbol
  49.85%  [kernel]       [k] nfs_server_reap_expired_delegations
   4.47%  [kernel]       [k] nfs_mark_test_expired_all_delegations
   1.79%  [kernel]       [k] add_interrupt_randomness
   0.87%  [kernel]       [k] check_preemption_disabled
   0.86%  [kernel]       [k] read_tsc

i.e., nfs_server_reap_expired_delegations() is very, very busy. iptraf-ng reports O(1000) packets per second to the server in a relatively quiescent state (no active file activity), most of which appear to be getattr queries/responses, and O(3000)+ such packets if NFS activity is being attempted.
13:21:56.343792 IP (tos 0x2,ECT(0), ttl 64, id 43049, offset 0, flags [DF], proto TCP (6), length 268)
    CLIENT > SERVER: Flags [P.], cksum 0x8465 (incorrect -> 0x0852), seq 74016:74232, ack 34429, win 24571, options [nop,nop,TS val 1821171799 ecr 1844073712], length 216: NFS request xid 2025763571 212 getattr fh 0,1/53
13:21:56.343997 IP (tos 0x2,ECT(0), ttl 64, id 6274, offset 0, flags [DF], proto TCP (6), length 152)
    SERVER > CLIENT: Flags [P.], cksum 0xe7ab (correct), seq 34429:34529, ack 74232, win 24576, options [nop,nop,TS val 1844073713 ecr 1821171799], length 100: NFS reply xid 2025763571 reply ok 96 getattr NON 2 ids 0/-777130909 sz 469762048

A "umount -l" and re-mount of the same filesystem while the machine stays up does not appear to help matters. Once the condition begins, there appears to be no stopping it, even with a server reboot. A client reboot does fix things right up, unsurprisingly. :-)
wireshark indicates these are NFSv4 TEST_STATEID packets:

Frame 8: 282 bytes on wire (2256 bits), 282 bytes captured (2256 bits) on interface br0, id 0
Ethernet II, Src: 86:59:3e:3f:64:1d (86:59:3e:3f:64:1d), Dst: Synology_b2:55:91 (00:11:32:b2:55:91)
Internet Protocol Version 4, Src: 192.168.1.1, Dst: 192.168.1.21
Transmission Control Protocol, Src Port: 678, Dst Port: 2049, Seq: 649, Ack: 401, Len: 216
Remote Procedure Call, Type: Call XID: 0x542ce2f3
Network File System, Ops(2): SEQUENCE, TEST_STATEID

No.  Time         Source        Destination   Protocol  Info                               Length
9    0.001713316  192.168.1.21  192.168.1.1   NFS       V4 Reply (Call In 8) TEST_STATEID  166

Frame 9: 166 bytes on wire (1328 bits), 166 bytes captured (1328 bits) on interface br0, id 0
Ethernet II, Src: Synology_b2:55:91 (00:11:32:b2:55:91), Dst: 86:59:3e:3f:64:1d (86:59:3e:3f:64:1d)
Internet Protocol Version 4, Src: 192.168.1.21, Dst: 192.168.1.1
Transmission Control Protocol, Src Port: 2049, Dst Port: 678, Seq: 401, Ack: 865, Len: 100
Remote Procedure Call, Type: Reply XID: 0x542ce2f3
Network File System, Ops(2): SEQUENCE, TEST_STATEID

No.  Time         Source        Destination   Protocol  Info                               Length
10   0.001966063  192.168.1.1   192.168.1.21  NFS       V4 Call (Reply In 11) TEST_STATEID  282
Setting fs.leases-enable=0 via a sysctl.d configuration does not prevent this problem.
The Fedora 38 kernel 6.3.8-200 does not fix this issue. Once it hits (after something like 3 weeks of uptime, with no other known changes on the network), "nfsstat -c" indicates 2,000-30,000 test_stateid messages per second, and the [IP-manager] kernel thread occupies the bulk of CPU time.
The above is expected to occur if/when the NFSv4.1 server indicates that stateids have been administratively revoked, or that the lease has expired, by returning one of the errors NFS4ERR_DELEG_REVOKED, NFS4ERR_ADMIN_REVOKED, or NFS4ERR_EXPIRED. It can also occur if the server sets one of the SEQUENCE status flags SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED or SEQ4_STATUS_ADMIN_STATE_REVOKED. In all those cases, the client is required by the spec to iterate through its list of known stateids, calling TEST_STATEID to ascertain whether each stateid is still valid.
There appears to be no trace of such a message coming from the server: not in its own logs, not in the client's, and no abnormality on any of the other clients. Further, once this situation begins, the iteration through the list of known stateids appears to be perpetual and continuous until the machine is rebooted. It does not look like a one-pass query.
Do you have logs to post?
> Do you have logs to post?

I was going to say no, because the client logged nothing. However! Yet another peer NFS client (not 192.168.1.1), which continued performing normally, dmesg'd this about the NFS server:

[379489.954753] NFS: server HOSTNAME error: fileid changed fsid 0:60: expected fileid 0x9fbe8fe, got 0x9fbe903
[380399.949813] NFS: server HOSTNAME error: fileid changed fsid 0:60: expected fileid 0x9fbede6, got 0x9fbede7
[380403.949741] NFS: server HOSTNAME error: fileid changed fsid 0:60: expected fileid 0x9fbedeb, got 0x9fbeded
[380428.949556] NFS: server HOSTNAME error: fileid changed fsid 0:60: expected fileid 0x9fbee06, got 0x9fbee0a
[380518.949379] NFS: server HOSTNAME error: fileid changed fsid 0:60: expected fileid 0x9fbee52, got 0x9fbee53
[380548.949321] NFS: server HOSTNAME error: fileid changed fsid 0:60: expected fileid 0x9fbee69, got 0x9fbee6a
[384159.931317] NFS: server HOSTNAME error: fileid changed fsid 0:60: expected fileid 0x9fbfd6d, got 0x9fbfd6e

and the NFS server turns out to have dmesg'd this:

[4259331.428885] nfsd4_validate_stateid: 443 callbacks suppressed
[4259331.434869] NFSD: client 192.168.1.1 testing state ID with incorrect client ID
[4259331.443101] NFSD: client 192.168.1.1 testing state ID with incorrect client ID
[4259331.451501] NFSD: client 192.168.1.1 testing state ID with incorrect client ID
[4259331.459626] NFSD: client 192.168.1.1 testing state ID with incorrect client ID
[4259331.467707] NFSD: client 192.168.1.1 testing state ID with incorrect client ID
[4259331.475864] NFSD: client 192.168.1.1 testing state ID with incorrect client ID
[4259331.483976] NFSD: client 192.168.1.1 testing state ID with incorrect client ID
[4259331.492077] NFSD: client 192.168.1.1 testing state ID with incorrect client ID
[4259331.500249] NFSD: client 192.168.1.1 testing state ID with incorrect client ID
[4259331.508330] NFSD: client 192.168.1.1 testing state ID with incorrect client ID

These may have been at around the same time.
There were no other log entries, just these ~10, but the problem persisted for several hours, until a reboot. If the problem should occur again, is there some tracing or debugging data you would suggest gathering, from either the client or the server?
Do these clients have unique hostnames and/or do they have NFSv4 uniquefiers set?
(In reply to Trond Myklebust from comment #8)
> Do these clients have unique hostnames and/or do they have NFSv4 uniquefiers set?

Unique hostnames & IP addresses, yes, and a nearby arpwatch doesn't detect any shenanigans. By uniquefier, do you mean a manually set /sys/fs/nfs/net/nfs_client/identifier? No, just the default "(null)" appears.
Well... 'NFS: server <wwww> error: fileid changed fsid <xxxx>: expected fileid <yyyy>, got <zzzz>' implies that the server is having trouble resolving filehandles correctly: it is returning files with inode numbers that do not match those which were originally returned when the file was looked up by name. This will obviously confuse clients, since they expect filehandles to always be unique.

The message 'NFSD: client <aaaa> testing state ID with incorrect client ID' was removed by commit 663e36f07666 a couple of years ago, since (as I said above) it is not unexpected for clients to have to test their state after a network partition causes the lease to be lost. So that would be the smoking gun for the NFS4ERR_EXPIRED return value from the server.
Correction: the 'testing state ID with incorrect client ID' case will lead to an NFS4ERR_BAD_STATEID being returned. That has the same effect as NFS4ERR_EXPIRED.
Thanks for your help. I've added modprobe.d nfs nfs4_unique_id=FOO configurations to all the clients, in the hope of avoiding identifier collisions, if that is somehow the cause at present.
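For reference, a sketch of the configuration added on each client (the filename is arbitrary and FOO is a placeholder; each client gets its own distinct string):

```
# /etc/modprobe.d/nfs-unique-id.conf  (hypothetical filename)
options nfs nfs4_unique_id=FOO
```

The setting takes effect the next time the nfs module is loaded (in practice, at reboot); the active value should then be visible under /sys/module/nfs/parameters/nfs4_unique_id.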
If you have unique hostnames on the clients, then the nfs4_unique_id setting is not necessary.

However, this report has me wondering about the whole nfsd4_validate_stateid thingy in the server. If the server is testing the stateid for consistency with the new client ID before it has even tried to call find_stateid_locked(), then how are we ever going to clear out those stateids? The client is not supposed to call FREE_STATEID if the TEST_STATEID call returns NFS4ERR_BAD_STATEID.

So this might explain the loop: the server is expecting a FREE_STATEID in order to free up the no-longer-valid stateid, but that will never happen, because the result from TEST_STATEID told the client that the stateid is bad. That in turn means the server cannot clear the SEQUENCE status flags, and so we're in for another round of TEST_STATEID. Lather, rinse, repeat...
I think there's a server bug that can cause this, but I've been unable to find a way for the client to tickle it in practice:
https://lore.kernel.org/linux-nfs/c0fe2b35900938943048340ef70fc12282fe1af8.1691160604.git.bcodding@redhat.com/T/#u

It sure would be nice to see a wire capture of the client getting into this state. In a different report of this problem (bug 2217103), the client had 75k delegations and multiple network partitions, but we weren't able to examine the server's state or see what happened on the wire to create this situation.
(In reply to Frank Ch. Eigler from comment #0)
> Once the condition begins, there appears to be no stopping it, even with a server reboot. A client reboot does fix things right up, unsurprisingly. :-)

Oh, never mind on comment 14. A server reboot should fix the problem that the patch in comment 14 also fixes.
The same or a related problem appears to be plaguing Fedora 39's NFS client. In a configuration where a RHEL9 nfs-server is dealing with an F39 NFS client, we're getting into a state where a kernel thread is spinning and NFS traffic is slow. With kernel 6.6.4-200.fc39.x86_64, perf top shows:

 96.17%  [kernel]  [k] nfs_server_return_marked_delegations
  1.49%  [kernel]  [k] nfs4_is_valid_delegation.part.0
  0.11%  [kernel]  [k] update_sd_lb_stats.constprop.0
  0.09%  [kernel]  [k] native_write_msr
  0.07%  [kernel]  [k] cpuidle_enter_state
  0.07%  [kernel]  [k] nfs_expire_unused_delegation_types
  0.05%  [kernel]  [k] native_irq_return_iret
  0.04%  [kernel]  [k] menu_select
This should be fixed by upstream commit a9b8d90f8726 ("NFSv4: fairly test all delegations on a SEQ4_ revocation") in the v6.7 kernel.