Bug 526494
Summary: | nfs mount for anaconda kickstart file takes a long time | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Orion Poplawski <orion> | ||||
Component: | nfs-utils | Assignee: | Steve Dickson <steved> | ||||
Status: | CLOSED DUPLICATE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | low | ||||||
Version: | rawhide | CC: | anaconda-maint-list, steved | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2009-10-23 12:58:43 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Orion Poplawski
2009-09-30 15:31:40 UTC
would there be any way for you to post a bzip2-ed tshark network trace? Created attachment 363360 [details]
tshark trace dump - raw
Only 16k. This is just port 2049. Let me know if you want other ports.
1 0.000000 192.168.0.191 -> 192.168.0.12 TCP wpgs > nfs [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=4294683552 TSER=0 WS=6
2 0.000063 192.168.0.12 -> 192.168.0.191 TCP nfs > wpgs [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=89252403 TSER=4294683552 WS=6
3 0.000600 192.168.0.191 -> 192.168.0.12 TCP wpgs > nfs [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=4294683554 TSER=89252403
4 0.000903 192.168.0.191 -> 192.168.0.12 NFS V4 NULL Call
5 0.000939 192.168.0.12 -> 192.168.0.191 TCP nfs > wpgs [ACK] Seq=1 Ack=45 Win=5824 Len=0 TSV=89252404 TSER=4294683554
6 0.000985 192.168.0.12 -> 192.168.0.191 NFS V4 NULL Reply (Call In 4)
7 0.001148 192.168.0.191 -> 192.168.0.12 TCP wpgs > nfs [ACK] Seq=45 Ack=29 Win=5888 Len=0 TSV=4294683555 TSER=89252404
8 35.067646 192.168.0.191 -> 192.168.0.12 NFS V4 COMP Call <EMPTY> PUTROOTFH PUTROOTFH;GETFH GETFH;GETATTR GETATTR
9 35.081904 192.168.0.12 -> 192.168.0.191 NFS V4 COMP Reply (Call In 8) <EMPTY> PUTROOTFH PUTROOTFH;GETFH GETFH;GETATTR GETATTR
syslog:
RPC: Registered udp transport module
RPC: Registered tcp transport module
Slow work thread pool: Starting up
Slow work thread pool: Ready
FS-Cache: Loaded
FS-Cache: Netfs 'nfs' registered for caching
rpcbind: server localhost not responding, timed out
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
mount.nfs used greatest stack depth: 4164 bytes left
Maybe it just takes that long to get the nfsv4 stack up. Maybe it could get started up earlier? Maybe rpcbind is not configured to listen to localhost and needs to be. In fact, rpcbind is not running, maybe it needs to be?
It's not a big deal, but I noticed it when the mount started failing because I needed to change the mount point to correspond to the nfsv4 root and it took a while to respond when before it was instantaneous.
Hmm.... The trace definitely shows a 35sec delay between the mount and the initial traffic... Does this happen all the time? Yes. Is the "NULL" command the mount? I can duplicate on a running system if I stop rpcbind. Messages show: Oct 1 10:30:45 localhost kernel: rpcbind: server localhost not responding, timed out Oct 1 10:30:45 localhost rpc.idmapd[1235]: nss_getpwnam: name 'root.com' does not map into domain 'localdomain' So it looks like nfsv4 mounts need to talk to rpc.idmapd, which isn't configured in the anaconda environment? strace on mount shows the hang here: [pid 2542] mount("saga:/data1", "/mnt", "nfs", 0, "addr=192.168.0.12,vers=4,clienta"...) = 0 <35.020464> > Is the "NULL" command the mount? Yes, we check to see if V4 is support and is up and running before do the actual mount... If neither one of those is true we roll back to v3. > So it looks like nfsv4 mounts need to talk to rpc.idmapd, which isn't > configured in the anaconda environment? Yes, this what is causing the delay... One of the main differences in V4 is the uids and gids of a file are no longer passed between the server and client. Instead, strings are passed back and forth in the format of username@domain Then user string is converted into uid and gid on both the server and client. This is done by the kernel doing an "upcall" to rpc.idmapd. So I'm guessing there is a 35sec time out on that upcall. >Oct 1 10:30:45 localhost kernel: rpcbind: server localhost not responding, > timed out hmm... I wonder who is trying to talk to rpcbind and why?? > Oct 1 10:30:45 localhost rpc.idmapd[1235]: nss_getpwnam: name > 'root.com' does not map into domain 'localdomain' In /etc/idmapd.conf, try setting 'Domain = cora.nwra.com' to see if this goes away... Question: How many users are needed for an install? One, root, correct? (In reply to comment #6) > > Oct 1 10:30:45 localhost rpc.idmapd[1235]: nss_getpwnam: name > > 'root.com' does not map into domain 'localdomain' > In /etc/idmapd.conf, try setting 'Domain = cora.nwra.com' > to see if this goes away... It makes both messages go away (rpcbind,rpc.idmapd), though not the 35 sec delay. Just to be clear, you are starting up rpc.idmapd with the domain set to cora.nwra.com and you are still seeing the delay?? (In reply to comment #8) > Just to be clear, you are starting up rpc.idmapd with the domain > set to cora.nwra.com and you are still seeing the delay?? Yes. I thought the rpcbind message went away, but it looks like it is still there: Oct 1 13:49:52 localhost kernel: rpcbind: server localhost not responding, timed out Start rpcbind again and the above and the delay goes away. It appears the v4 mount is getting caught in some legacy sunrpc code.... I've started the upstream discussion as to how we should handle this... http://linux-nfs.org/pipermail/nfsv4/2009-October/011378.html *** This bug has been marked as a duplicate of bug 528537 *** |