Description of problem:
This may be a kernel or other issue, but I'm starting with anaconda as it is an install problem. We use a central NFS server for all of our Fedora installs and have not seen problems with any other systems. I'm trying to install FC4 onto two dual-Opteron machines that use the e100 driver. The install hangs with the following messages:

nfs warning: mount version older than kernel
nfs: server alexandria not responding, still trying
[repeats]

The anaconda log shows:

* host is alexandria, dir is /export/data1/fedora/cora/4/x86_64/os
* mounting nfs path alexandria:/export/data1/fedora/cora/4/x86_64/os
* mounted alexandria:/export/data1/fedora/cora/4/x86_64/os on /mnt/source
* can access /mnt/source/Fedora/base/stage2.img
* mntloop loop0 on /mnt/runtime as /mnt/source/Fedora/base/stage2.img fs is 26

How reproducible:
Every time
Created attachment 117495 [details]
tcpdump -w dump

An updated install image (built from the current updated FC4) exhibits the same problem. This is a tcpdump packet capture of the network traffic between the install machine and the NFS server.
Well, it's not just x86_64. I just saw it on our dual-Xeon server, which also uses an e100 NIC. I also tried an e1000 NIC in that machine and it timed out as well; it got as far as running anaconda, but that was the last message before the timeout. The other commonality is that these machines are all on the same switch as the server (SMC8508T), while other machines are at least a hop away on our Cisco switch stack. I have managed to install on other (single-processor) 32-bit e100 machines.
hmm... it appears the server is going off the deep end... Would it be possible to get a system trace from 'alexandria' by doing an 'echo t > /proc/sysrq-trigger'?
Created attachment 118999 [details]
Gzipped system trace

The server is fine except for these particular FC4 upgrade attempts. It is one of our main NFS file servers and we generally don't have problems with it. Here is the trace taken while the client (FC4 upgrade) is stalling. Let me know what else I can do/send.
2.6.14-1.1637_FC4 has been released as an update for FC4. Please retest with this update, as a large amount of code has been changed in this release, which may have fixed your problem. Thank you.
This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.
Just ran into this again installing FC4 x86_64 from our FC4 (2.6.15-1.1831_FC4smp) NFS install server. Here's the rub though: the systems are on the same SMC8508T switch. If I move the client (install target) to another switch, the install works fine. The client has a 100 Mbit NIC and the server a 1 Gbit NIC. And this happens only during install. Once installed, I move the wire back to the SMC switch and everything is fine. Although now that I think about it, NFS traffic normally travels over a separate gigabit network. I just did a basic test mounting the partition over the 100 Mbit NIC and it can copy data off of it just fine.
This sounds more like a network problem than an NFS problem... When the system hangs, can you see (via Ethereal or tcpdump) any traffic at all?
[This comment added as part of a mass-update to all open FC4 kernel bugs] FC4 has now transitioned to the Fedora legacy project, which will continue to release security related updates for the kernel. As this bug is not security related, it is unlikely to be fixed in an update for FC4, and has been migrated to FC5. Please retest with Fedora Core 5. Thank you.
This does appear to be fixed in FC5.
Seeing this again trying to install current rawhide on a laptop. Our nfs server is now running FC6.
Created attachment 153364 [details]
tcpdump -s 1500 trace of data from client to server

This is kind of odd, from the end of the trace:

10:41:09.507179 IP (tos 0xc0, ttl 64, id 29724, offset 0, flags [none], proto: ICMP (1), length: 576) cynosure.cora.nwra.com > saga.cora.nwra.com: ICMP ip reassembly time exceeded, length 556
    IP (tos 0x0, ttl 64, id 54448, offset 0, flags [+], proto: UDP (17), length: 1500) saga.cora.nwra.com.nfs > cynosure.cora.nwra.com.2729529335: reply ok 1472 read REG 100644 ids 537/537 sz 94142464 nlink 1 rdev 0/0 fsid 1605 fileid e94006 a/m/ctime 1177424659.000000 1177403744.000000 1177424927.000000 16384 bytes
    MPLS extension v4 packet not supported
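For anyone reading the trace above: "offset 0, flags [+]" is how tcpdump prints the first piece of a fragmented IP datagram (More Fragments set, offset zero), and the "ip reassembly time exceeded" ICMP means the client gave up waiting for the remaining pieces. Below is a minimal Python sketch of how the 16-bit IPv4 flags/fragment-offset field breaks down. The function name and the sample values are made up for illustration and are not taken from the attachment.

```python
def decode_frag_field(value):
    """Split the 16-bit IPv4 flags/fragment-offset field.

    Returns (dont_fragment, more_fragments, offset_in_bytes).
    The offset is stored in units of 8 bytes in the low 13 bits.
    """
    dont_fragment = bool(value & 0x4000)
    more_fragments = bool(value & 0x2000)
    offset_bytes = (value & 0x1FFF) * 8
    return dont_fragment, more_fragments, offset_bytes


# Hypothetical sample values, not read from the capture:
print(decode_frag_field(0x2000))  # first fragment: MF set, offset 0
print(decode_frag_field(0x20B9))  # a later fragment: MF set, offset 185 * 8
```

A fragment with MF set and offset 0 is only the head of the datagram. Unless every later fragment also arrives, the receiver cannot hand the UDP payload up to NFS.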
If I tell anaconda to use TCP (--opts=tcp in the kickstart file), everything works fine. Looking further in the logs, it looks like the server is sending lots of IP fragments (buffer size of around 16k?), but the client is not receiving them and is resending the request. This could be a problem on our network, I suppose, as we don't normally use UDP for NFS and moving connections to different switches has helped at times. But in general, I don't see that many problems on our network.
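To put rough numbers on the fragment theory, here is a back-of-the-envelope Python sketch. It assumes a 1500-byte MTU, a 20-byte IP header with no options, and an 8-byte UDP header, and it ignores the RPC/NFS reply header. The 1% per-fragment loss rate is purely a hypothetical figure, not something measured on this network.

```python
import math


def fragment_count(udp_payload, mtu=1500, ip_header=20, udp_header=8):
    """Number of IP fragments needed to carry one UDP datagram."""
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((udp_payload + udp_header) / per_fragment)


def datagram_loss(fragment_loss, fragments):
    """Chance the whole datagram is lost if any single fragment is dropped."""
    return 1 - (1 - fragment_loss) ** fragments


n = fragment_count(16384)          # one 16k NFS read reply
print(n)
print(datagram_loss(0.01, n))      # assumed 1% loss per fragment
```

Under these assumptions a single 16k read reply takes 12 fragments, and losing any one of them forces the client to time out and resend the whole request. With TCP only the missing segment is retransmitted, which would fit with --opts=tcp making the hang go away.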
It could be that you're not seeing network problems because not many applications use UDP these days. But going to NFS over TCP is definitely the correct solution, and I'm not sure why --opts=tcp is not the default...