Bug 97610 - netfs seems to start too soon after network
Status: CLOSED DEFERRED
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: initscripts
Version: 2.1
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Bill Nottingham
QA Contact: Brock Organ
Duplicates: 107999 116711
Reported: 2003-06-18 08:35 EDT by Patrick C. F. Ernzer
Modified: 2014-03-16 22:36 EDT
CC: 6 users

Doc Type: Bug Fix
Last Closed: 2005-09-21 17:12:05 EDT

Attachments
cluster host 1 (1.71 KB, text/plain) - 2003-07-11 02:11 EDT, ilja lunev
cluster host 1 (64 bytes, text/plain) - 2003-07-11 02:11 EDT, ilja lunev
patch to /etc/init.d/netfs (862 bytes, patch) - 2004-04-05 19:11 EDT, Eric Eisenhart
simple retry nfs mount on startup (1.99 KB, patch) - 2005-07-06 19:29 EDT, Vilius Puidokas

Description Patrick C. F. Ernzer 2003-06-18 08:35:40 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
RHN System ID 1002961687 sees the following behaviour:

netfs does a 'mount -a -t nfs' more or less directly after the network is
brought up.

Although the interface reports OK (static config, BTW), we are unable to mount
NFS filesystems. Adding a 'sleep 15' makes the problem less severe (only the first
mount fails); changing the priority from 25 to 90 makes the problem go away.

If the machine tries to resolve the NFS server name via DNS, we get a 'cannot
resolve'; if we have said server in /etc/hosts, we cannot route to it. It looks to
me like the network is not fully up at that point. No idea if it is the local
network infrastructure or not.

What I'd like to see is either netfs starting later (is there a problem with my
dirty hack to start it at prio 90? BTW, the script with the changed prio is named
netfs-patched) or some test to see if the network is really working.
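The priority hack described above amounts to renaming the netfs start symlink in the runlevel directory, since SysV init runs S-links in numeric order. A minimal sketch, demonstrated in a scratch directory rather than the real /etc/rc3.d:

```shell
#!/bin/sh
# Sketch of the "prio 90" workaround: re-linking netfs from S25 to
# S90 makes init start it much later in the boot sequence.
# Demonstrated in a scratch directory, not the real /etc/rc3.d.
RCDIR=$(mktemp -d)
ln -s ../init.d/netfs "$RCDIR/S25netfs"   # stock priority 25
mv "$RCDIR/S25netfs" "$RCDIR/S90netfs"    # the reporter's hack: priority 90
ls "$RCDIR"
```

On a live system the same effect would normally come from chkconfig, or from editing the '# chkconfig:' header in the init script, which is presumably what the netfs-patched copy does.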

This is being entered for a customer who will be on cc; please address all
questions (regarding tests to be done on that specific machine) to him.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. have an nfs type entry in /etc/fstab
2. have the netfs service enabled
3. reboot
    

Actual Results:  at least the first nfs mount will fail

Expected Results:  all nfs type mounts to succeed.

Additional info:

doing a 'service netfs stop;service netfs start' after boot mounts all NFS
filesystems just fine.
Comment 1 Patrick C. F. Ernzer 2003-06-18 08:37:11 EDT
note, see Issue Tracker #21184, Event posted 06-18-2003 06:57am as well (if you
have access, that is).

Philips tech who was on the phone for the manipulation is Mr Lunev
Comment 2 Bill Nottingham 2003-06-18 15:05:06 EDT
What network card, attached to what sort of switch? It's probably spending a
very large amount of time negotiating.
Comment 3 Patrick C. F. Ernzer 2003-06-30 04:55:36 EDT
Bill,

it's a 'Dell PowerEdge 2650', which should make the NIC 'BROADCOM Corporation
NetXtreme BCM5701 Gigabit Ethernet (rev 15)', I can check with the customer if
you need this info.

My question was actually aiming at why netfs is started so shortly after
network. Spending a very large amount of time negotiating is not that uncommon,
so if there is no reason to start netfs this early after network, I'd like this
bug to be considered an RFE to start network-dependent services a tad later. Is
that acceptable?

RU

PCFE
Comment 4 Bill Nottingham 2003-06-30 12:30:36 EDT
Not really. Too many other things rely on netfs being finished. I presume you're
using static IPs?
Comment 5 Bill Nottingham 2003-06-30 17:59:34 EDT
Yes, you are using static IPs, after rereading.
Comment 6 Bill Nottingham 2003-06-30 18:07:13 EDT
How are you handling DNS on these boxes?
Comment 7 Patrick C. F. Ernzer 2003-07-01 16:02:45 EDT
Bill,

"If the machine tries to resolve the NFS server name via DNS, we get a 'cannot
resolve', if we have said server in /etc/hosts, we cannot route to it", so the DNS part was covered. 
Or is there something else you'd like to check?

As we cannot start netfs later, can we add a test to the script to make sure the network really is up
before attempting to mount network filesystems?
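The kind of test asked for here could be a small poll loop ahead of the mount. A sketch with the reachability check stubbed out so it is self-contained; 'server_reachable' stands in for something like 'ping -c 1 -w 1 $SERVER' or a route check:

```shell
#!/bin/sh
# Poll until the NFS server answers before running 'mount -a -t nfs'.
# The check is stubbed; a real script would ping the server instead.
tries=0
server_reachable() {
    tries=$((tries + 1))
    [ "$tries" -ge 3 ]          # stub: pretend the net is up on try 3
}
n=0
while [ "$n" -lt 10 ]; do
    if server_reachable; then
        echo "network up after $n retries"
        break                   # safe to run 'mount -a -t nfs' now
    fi
    n=$((n + 1))
    sleep 0                     # a real script would sleep 1-2 seconds
done
```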
Comment 8 Bill Nottingham 2003-07-01 16:05:10 EDT
Yes, it would be good to know how your DNS is configured. The fact
that you get a 'cannot resolve' error implies something else is wrong,
entirely outside of initscripts/autonegotiation interactions.
Comment 9 Patrick C. F. Ernzer 2003-07-03 05:48:33 EDT
Bill,

so, as things do work later in bootup, you mean to say that the network may be up but DNS
problematic?!? (Even though taking DNS out of the equation by using an /etc/hosts entry gave me
routing errors.)

Anyway, I'll make a separate posting for the admin to post resolv.conf and details on the DNS.
Comment 10 Patrick C. F. Ernzer 2003-07-03 05:49:45 EDT
uxadm: can you please post your /etc/resolv.conf and all details you know about your DNS for Mr 
Nottingham to this bug please?
Comment 11 ilja lunev 2003-07-03 06:39:22 EDT
[root@bblxc12a rhn]# cat /etc/resolv.conf
nameserver 130.143.87.243
search bbl.ms.philips.com philips.com
-----------------------------------------

[root@bblxc12a rhn]# dig depbblhps1ms000.bbl.ms.philips.com.

; <<>> DiG 9.2.1 <<>> depbblhps1ms000.bbl.ms.philips.com.
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35039
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;depbblhps1ms000.bbl.ms.philips.com. IN A

;; ANSWER SECTION:
depbblhps1ms000.bbl.ms.philips.com. 86400 IN A  130.143.87.243

;; Query time: 0 msec
;; SERVER: 130.143.87.243#53(130.143.87.243)
;; WHEN: Thu Jul  3 12:38:00 2003
;; MSG SIZE  rcvd: 68

-----------------------------------------------------------
[root@bblxc12a rhn]# dig 130.143.87.243

; <<>> DiG 9.2.1 <<>> 130.143.87.243
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 34867
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;130.143.87.243.                        IN      A

;; AUTHORITY SECTION:
.                       86400   IN      SOA     ns0.philips.com.
dns.philips.com. 2003021700 10800 3600 604800 604800

;; Query time: 394 msec
;; SERVER: 130.143.87.243#53(130.143.87.243)
;; WHEN: Thu Jul  3 12:38:30 2003
;; MSG SIZE  rcvd: 86
Comment 12 Bill Nottingham 2003-07-03 11:47:26 EDT
What I'm saying is that if the machine is autonegotiating, you should *not* get
'cannot resolve', or associated errors. It should just wait. The problem implies
some more fundamental issue with the network config.
Comment 13 Patrick C. F. Ernzer 2003-07-10 02:39:30 EDT
Bill,

so the next step would be a tcpdump I guess, or would you like other info prior
to doing a tcpdump from another machine that sits on a hub with the problematic
server?

RU

PCFE
Comment 14 Bill Nottingham 2003-07-10 14:57:56 EDT
Can you post your /etc/resolv.conf and /etc/nsswitch.conf?
Comment 15 ilja lunev 2003-07-11 02:11:18 EDT
Created attachment 92874 [details]
cluster host 1

hi,

here is my nsswitch.conf
Comment 16 ilja lunev 2003-07-11 02:11:56 EDT
Created attachment 92875 [details]
cluster host 1

here is my resolv.conf
Comment 17 Bill Nottingham 2003-07-11 11:17:20 EDT
Are you actually using nisplus?
Comment 18 ilja lunev 2003-07-14 02:45:01 EDT
We never use nisplus.
Comment 19 Bill Nottingham 2003-07-14 11:26:09 EDT
What happens if you remove all the nisplus entries from /etc/nsswitch.conf?
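Stripping those entries amounts to deleting the 'nisplus' token from each lookup line. A minimal sketch with sed, run against a scratch copy rather than the live /etc/nsswitch.conf:

```shell
#!/bin/sh
# Remove the nisplus source from every nsswitch.conf lookup line.
# A scratch copy is used here; on a real box, back up and then edit
# /etc/nsswitch.conf itself.
NSS=$(mktemp)
printf 'hosts: files nisplus dns\npasswd: files nisplus\n' > "$NSS"
sed -i 's/ nisplus//g' "$NSS"
cat "$NSS"
```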
Comment 20 ilja lunev 2003-07-15 11:07:25 EDT
I removed all nisplus entries from /etc/nsswitch.conf.
No success.
Comment 21 Bill Nottingham 2003-07-15 11:58:17 EDT
Hm, this is still something we've never seen in any other testing or reports.

What does your /etc/sysconfig/network and /etc/sysconfig/network-scripts/ifcfg-*
look like?
Comment 22 ilja lunev 2003-07-16 03:29:52 EDT
[root@bblxc11a root]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=bblxc11a.bbl.ms.philips.com

[root@bblxc11a root]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=130.143.87.181
NETMASK=255.255.255.0
GATEWAY=130.143.87.1

[root@bblxc11a root]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.0.0.1
NETMASK=255.255.255.0



Comment 23 Eric Eisenhart 2003-12-10 15:14:28 EST
I believe this is another duplicate of 107999.
Comment 24 Bill Nottingham 2003-12-10 15:17:27 EST
*** Bug 107999 has been marked as a duplicate of this bug. ***
Comment 25 Bill Nottingham 2004-02-26 16:00:14 EST
*** Bug 116711 has been marked as a duplicate of this bug. ***
Comment 26 Eric Eisenhart 2004-04-05 19:11:26 EDT
Created attachment 99129 [details]
patch to /etc/init.d/netfs

This patch from ticket 107999 seems to have been overlooked.
Comment 28 Vilius Puidokas 2005-07-06 19:29:05 EDT
Created attachment 116444 [details]
simple retry nfs mount on startup

retry mounting; adds up to 25 sec to boot time.

For my boxes I did bump sshd's priority, in case 'mount' would hang netfs
for too long; thus, netfs announces itself in /etc/motd.

Probably, to be included in a distro it should look for options in sysconfig
and be turned off by default.

I believe this is based on netfs from initscripts-6.40-1.

happy hacking
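The attached patch is not reproduced here, but the retry idea can be sketched as follows. This is a hypothetical reconstruction, not the actual attachment: 'try_mount' stands in for 'mount -a -t nfs', and the delays are chosen so the worst case sums to the 25 seconds mentioned above:

```shell
#!/bin/sh
# Retry the NFS mounts a few times with growing delays instead of
# failing on the first attempt; worst case adds 1+2+4+8+10 = 25s.
try_mount() { false; }          # stand-in for: mount -a -t nfs
total=0
for delay in 1 2 4 8 10; do
    if try_mount; then
        echo "mounted after ${total}s"
        exit 0
    fi
    total=$((total + delay))
    sleep 0                     # a real script would 'sleep $delay'
done
echo "gave up after ${total}s"
```

As the comment suggests, a distro version would read the retry count and delays from a sysconfig file and default to the old single-attempt behaviour.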
Comment 29 Bill Nottingham 2005-09-21 17:12:05 EDT
With the goal of minimizing risk of change for deployed systems, and in response
to customer and partner requirements, Red Hat takes a conservative approach when
evaluating changes for inclusion in maintenance updates for currently deployed
products. The primary objectives of update releases are to enable new hardware
platform support and to resolve critical defects. At this stage, this behavior
isn't going to be changed for RHEL2.1/RHEL 3/RHEL 4.

Closing as deferred.
