Red Hat Bugzilla – Bug 165229
nfs install hangs
Last modified: 2007-11-30 17:11:11 EST
Description of problem:
This may be a kernel or other issue, but I'm starting with anaconda as it is an
We use a central NFS server for all of our Fedora installs. We have not seen
problems with any other systems. I'm trying to install FC4 onto two
dual-Opteron machines that use the e100 driver. The install hangs with the following:
nfs warning: mount version older than kernel
nfs: server alexandria not responding, still trying
anaconda log shows:
* host is alexandria, dir is /export/data1/fedora/cora/4/x86_64/os
* mounting nfs path alexandria:/export/data1/fedora/cora/4/x86_64/os
* mounted alexandria:/export/data1/fedora/cora/4/x86_64/os on /mnt/source
* can access /mnt/source/Fedora/base/stage2.img
* mntloop loop0 on /mnt/runtime as /mnt/source/Fedora/base/stage2.img fs is 26
Created attachment 117495
tcpdump -w dump
An updated install image (from the current updated FC4) exhibits the same problem.
This is a tcpdump packet capture of network traffic between the install machine
and the NFS server.
Well, it's not just x86_64. I just saw it on our dual-Xeon server, which also uses
an e100 NIC. I also tried an e1000 NIC in that machine and it timed out
as well; it got as far as running anaconda, but that was the last message
before the timeout.
The other thing they have in common is that they are all on the same switch as the server
(SMC8508T), while the other machines are at least one hop away on our Cisco switch stack.
I have managed to install on other (single-processor) 32-bit e100 machines.
Hmm... it appears the server is going off the deep
end. Would it be possible to get a system trace
from 'alexandria' by doing an 'echo t > sysrq-trigger'?
Created attachment 118999
Gzipped system trace
The server is fine except for these particular FC4 upgrade attempts. It is one
of our main NFS file servers and we generally don't have problems with it.
Here is the trace, taken while the client (FC4 upgrade) was stalling.
Let me know what else I can do/send.
2.6.14-1.1637_FC4 has been released as an update for FC4.
Please retest with this update, as a large amount of code has been changed in
this release, which may have fixed your problem.
This is a mass-update to all currently open kernel bugs.
A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.
This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.
Just ran into this again installing FC4 x86_64 from our FC4
(2.6.15-1.1831_FC4smp) NFS install server. Here's the rub, though: the systems
are on the same SMC8508T switch. If I move the client (install target) to
another switch, the install works fine. The client has a 100Mbit NIC and the
server a gigabit NIC.
And this happens only during installation. Once the system is installed, I move the cable back to the
SMC switch and everything is fine. Although, now that I think about it, NFS
traffic travels over a separate gigabit network. I just did a basic test mounting
the partition over the 100Mbit NIC, and I can copy data off of it just fine.
This sounds more like a network problem than an NFS problem... When
the system hangs, can you see (via Ethereal or tcpdump)
any traffic at all?
[This comment added as part of a mass-update to all open FC4 kernel bugs]
FC4 has now transitioned to the Fedora legacy project, which will continue to
release security related updates for the kernel. As this bug is not security
related, it is unlikely to be fixed in an update for FC4, and has been migrated to FC5.
Please retest with Fedora Core 5.
This does appear to be fixed in FC5.
I'm seeing this again while trying to install current Rawhide on a laptop. Our NFS server
is now running FC6.
Created attachment 153364
tcpdump -s 1500 trace of data from client to server
This is kind of odd, from the end of the trace:
10:41:09.507179 IP (tos 0xc0, ttl 64, id 29724, offset 0, flags [none], proto: ICMP (1), length: 576) cynosure.cora.nwra.com > saga.cora.nwra.com: ICMP ip reassembly time exceeded, length 556
IP (tos 0x0, ttl 64, id 54448, offset 0, flags [+], proto: UDP (17), length: 1500) saga.cora.nwra.com.nfs > cynosure.cora.nwra.com.2729529335: reply ok 1472 read REG 100644 ids 537/537 sz 94142464 nlink 1 rdev 0/0 fsid 1605 fileid e94006 a/m/ctime 1177424659.000000 1177403744.000000 1177424927.000000
MPLS extension v4 packet not supported
If I tell anaconda to use TCP (--opts=tcp in the kickstart file), everything works fine.
Looking more closely at the logs, it looks like the server is sending lots of IP fragments (a buffer size of around 16k?),
but the client is not receiving them and keeps resending the request.
I suppose this could be a problem with our network, since we don't normally use NFS over UDP, and
moving connections to different switches has helped at times. But in general, I don't see many
problems on our network.
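To put rough numbers on why lost fragments stall a UDP mount, here is a minimal Python sketch, assuming a 1500-byte Ethernet MTU and 20-byte IPv4 headers without options, and ignoring the RPC/NFS header overhead. The 16 KiB reply size is only the guess from the comment above, not a measured value, and the function names are hypothetical. It counts how many IP fragments one read reply needs and how likely it is that all of them arrive when each fragment is lost independently with probability `loss`. If any single fragment is dropped, the receiver discards the whole datagram once its reassembly timer expires (the "ip reassembly time exceeded" ICMP in the trace), and the client has to retransmit the entire request.

```python
import math

IP_HEADER = 20   # IPv4 header without options (assumption)
UDP_HEADER = 8   # UDP header, carried only in the first fragment


def fragment_count(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram (hypothetical helper)."""
    # Each fragment carries at most (mtu - IP_HEADER) bytes of IP payload,
    # rounded down to a multiple of 8 as required by the fragment-offset field.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    ip_payload = UDP_HEADER + udp_payload
    return math.ceil(ip_payload / per_fragment)


def delivery_probability(fragments: int, loss: float) -> float:
    """Chance that every fragment arrives, assuming independent per-fragment loss."""
    # Losing any one fragment means the whole datagram is never reassembled.
    return (1.0 - loss) ** fragments


if __name__ == "__main__":
    # Hypothetical figures: a 16 KiB read reply over a 1500-byte MTU.
    n = fragment_count(16384)
    print("fragments per reply:", n)
    for loss in (0.001, 0.01, 0.05):
        print(f"loss {loss}: reply delivered with probability {delivery_probability(n, loss):.3f}")
```

Under these assumptions a 16 KiB reply needs 12 fragments, so even 1% fragment loss costs roughly one reply in nine. Over TCP only the lost segment is retransmitted, which fits the observation that --opts=tcp works.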
It could be that you're not seeing network problems
because not many applications use UDP these days.
But going to NFS over TCP is definitely the correct solution,
and I'm not sure why --opts=tcp is not the default...