Bug 475133 - S14nfslock hangs on new fedora 10 release install
Summary: S14nfslock hangs on new fedora 10 release install
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: nfs-utils
Version: 10
Hardware: x86_64
OS: Linux
low
medium
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 493039 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2008-12-08 03:38 UTC by Peter Glassenbury
Modified: 2009-07-16 07:32 UTC (History)
4 users (show)

Fixed In Version: 1.2.0-3.fc11
Clone Of:
Environment:
Last Closed: 2009-07-16 07:32:24 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)

Description Peter Glassenbury 2008-12-08 03:38:15 UTC
Description of problem:
Installed fedora 10 release (ALL packages) via kickstart 
first reboot (and any others) hangs at S14nfsboot

Version-Release number of selected component (if applicable):
updated to nfs-utils-1.1.4-2.fc10.x86_64.rpm
no change

How reproducible: Always
Install fedora10 release; boot

Steps to Reproduce:
1.Install fedora10 release
2.boot
3.
  
Actual results:
Hang; can't Cntrl C to break out of it; can't get into another screen
(CntrlAltF2); Ctrl Alt DEL will shutdown and reboot machine

Expected results:
Boot to runlevel 3 or 5 depending on inittab

Additional info:
added autofs-5.0.3-36.x86_64.rpm, we needed this package but the hang occurs
with or without it.
updated to nfs-utils-1.1.4-2.fc10.x86_64.rpm (latest)- no change

Have a "workaround" for testing but not sure if it is valid or will cause
further problems in active use. Fine for my system testing at the moment
"cd /etc/rc5.d ; mv S14nfslock S27nfslock"
Shifting to after the S27Netmanager in the boot sequence, gives a 20 second delay but then gets through nfslock and starts X normally. Don't know if this
has a valid nfslock or not.

Comment 1 Peter Glassenbury 2008-12-08 03:58:50 UTC
in a Hanging boot I get messages in the logs
Dec  8 15:58:06 cosc7123 rpc.statd[2017]: nss_ldap: failed to bind to LDAP server ldap://Ldap_IP_address1/: Can't contact LDAP server
Dec  8 15:58:06 cosc7123 rpc.statd[2017]: nss_ldap: failed to bind to LDAP server ldap://Ldap_IP_address2/: Can't contact LDAP server
Dec  8 15:58:06 cosc7123 rpc.statd[2017]: nss_ldap: failed to bind to LDAP server ldap://Ldap_IP_address1/: Can't contact LDAP server
Dec  8 15:58:06 cosc7123 rpc.statd[2017]: nss_ldap: failed to bind to LDAP server ldap://Ldap_IP_address2/: Can't contact LDAP server
Dec  8 15:58:06 cosc7123 rpc.statd[2017]: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)...
It then repeats 3 or so time until sleeping 32 seconds.


With the workaround in place with the nfslock down in S27 with Netmanager, the logs are....As follows.... Then after that normal completion of boot....

Dec  8 15:58:06 cosc7123 NetworkManager: <info>  starting...
Dec  8 15:58:06 cosc7123 NetworkManager: <WARN>  nm_generic_enable_loopback(): error -17 returned from rtnl_addr_add():#0
12Sucess#012
Dec  8 15:58:06 cosc7123 NetworkManager: <info>  eth0: driver is 'e1000e'.
Dec  8 15:58:06 cosc7123 NetworkManager: <info>  Found new Ethernet device 'eth0'.
Dec  8 15:58:06 cosc7123 NetworkManager: <info>  (eth0): exported as /org/freedesktop/Hal/devices/net_00_19_d1_e4_98_8a
Dec  8 15:58:06 cosc7123 NetworkManager: <info>  Trying to start the supplicant...
Dec  8 15:58:06 cosc7123 NetworkManager: <info>  Trying to start the system settings daemon...
Dec  8 15:58:06 cosc7123 rpc.statd[2017]: Version 1.1.4 Starting
Dec  8 15:58:06 cosc7123 rpc.statd[2017]: nss_ldap: failed to bind to LDAP server ldap:///Ldap_IP_address1: Can't contact LDAP server
Dec  8 15:58:06 cosc7123 rpc.statd[2017]: nss_ldap: failed to bind to LDAP server ldap://Ldap_IP_address2/: Can't contact LDAP server
Dec  8 15:58:06 cosc7123 rpc.statd[2017]: nss_ldap: failed to bind to LDAP server ldap://Ldap_IP_address1/: Can't contact LDAP server
Dec  8 15:58:06 cosc7123 rpc.statd[2017]: nss_ldap: failed to bind to LDAP server ldap://Ldap_IP_address2/: Can't contact LDAP server
Dec  8 15:58:06 cosc7123 rpc.statd[2017]: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)...
Dec  8 15:58:06 cosc7123 nm-system-settings: Loaded plugin ifcfg-fedora: (c) 2007 - 2008 Red Hat, Inc.  To report bugs please use the NetworkManager mailing list.
Dec  8 15:58:07 cosc7123 nm-system-settings:    ifcfg-fedora: parsing /etc/sysconfig/network-scripts/ifcfg-eth0 ... 
Dec  8 15:58:07 cosc7123 nm-system-settings:    ifcfg-fedora:     read connection 'System eth0'
Dec  8 15:58:07 cosc7123 nm-system-settings:    ifcfg-fedora: parsing /etc/sysconfig/network-scripts/ifcfg-lo ... 
Dec  8 15:58:07 cosc7123 nm-system-settings:    ifcfg-fedora:     error: Ignoring loopback device config.
Dec  8 15:58:10 cosc7123 NetworkManager: <info>  (eth0): device state change: 1 -> 2
Dec  8 15:58:10 cosc7123 NetworkManager: <info>  (eth0): bringing up device.
Dec  8 15:58:10 cosc7123 kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Dec  8 15:58:10 cosc7123 NetworkManager: <info>  (eth0): preparing device.
Dec  8 15:58:10 cosc7123 NetworkManager: <info>  (eth0): deactivating device (reason: 2).
Dec  8 15:58:10 cosc7123 rpc.statd[2017]: nss_ldap: failed to bind to LDAP server ldap://Ldap_IP_address1/: Can't contact LDAP server
Dec  8 15:58:10 cosc7123 rpc.statd[2017]: nss_ldap: failed to bind to LDAP server ldap://Ldap_IP_address2/: Can't contact LDAP server
Dec  8 15:58:10 cosc7123 rpc.statd[2017]: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
Dec  8 15:58:11 cosc7123 kernel: 0000:00:19.0: eth0: Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
Dec  8 15:58:11 cosc7123 kernel: 0000:00:19.0: eth0: 10/100 speed: disabling TSO
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  (eth0): carrier now ON (device state 2)
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  (eth0): device state change: 2 -> 3
Dec  8 15:58:11 cosc7123 kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  Activation (eth0) starting connection 'System eth0'
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  (eth0): device state change: 3 -> 4
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 1 of 5 (Device Prepare) scheduled...
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 1 of 5 (Device Prepare) started...
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) scheduled...
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 1 of 5 (Device Prepare) complete.
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) starting...
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  (eth0): device state change: 4 -> 5
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) successful.
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) scheduled.
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) complete.
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) started...
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  (eth0): device state change: 5 -> 7
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  Activation (eth0) Beginning DHCP transaction.
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  dhclient started with pid 2036
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) complete.
Dec  8 15:58:11 cosc7123 dhclient: Internet Systems Consortium DHCP Client 4.0.0
Dec  8 15:58:11 cosc7123 dhclient: Copyright 2004-2007 Internet Systems Consortium.
Dec  8 15:58:11 cosc7123 dhclient: All rights reserved.
Dec  8 15:58:11 cosc7123 dhclient: For info, please visit http://www.isc.org/sw/dhcp/
Dec  8 15:58:11 cosc7123 dhclient: 
Dec  8 15:58:11 cosc7123 NetworkManager: <info>  DHCP: device eth0 state changed (null) -> preinit
Dec  8 15:58:11 cosc7123 dhclient: Listening on LPF/eth0/00:19:d1:e4:98:8a
Dec  8 15:58:11 cosc7123 dhclient: Sending on   LPF/eth0/00:19:d1:e4:98:8a
Dec  8 15:58:11 cosc7123 dhclient: Sending on   Socket/fallback
Dec  8 15:58:14 cosc7123 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4
Dec  8 15:58:14 cosc7123 dhclient: DHCPOFFER from Dhcp_IP_address
Dec  8 15:58:14 cosc7123 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Dec  8 15:58:14 cosc7123 dhclient: DHCPACK from Dhcp_IP_address
Dec  8 15:58:14 cosc7123 NetworkManager: <info>  DHCP: device eth0 state changed preinit -> bound
Dec  8 15:58:14 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 4 of 5 (IP Configure Get) scheduled...
Dec  8 15:58:14 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 4 of 5 (IP Configure Get) started...
Dec  8 15:58:14 cosc7123 NetworkManager: <info>    address My_IP
Dec  8 15:58:14 cosc7123 dhclient: bound to My_IP -- renewal in 33526 seconds.
Dec  8 15:58:14 cosc7123 NetworkManager: <info>    prefix 21 (255.255.248.0)
Dec  8 15:58:14 cosc7123 NetworkManager: <info>    gateway GatewayIP_address
Dec  8 15:58:14 cosc7123 NetworkManager: <info>    hostname 'cosc7123'
Dec  8 15:58:14 cosc7123 NetworkManager: <info>    nameserver 'nameserver1IP'
Dec  8 15:58:14 cosc7123 NetworkManager: <info>    nameserver 'nameserver2IP'
Dec  8 15:58:14 cosc7123 NetworkManager: <info>    domain name 'our.domain'
Dec  8 15:58:14 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 5 of 5 (IP Configure Commit) scheduled...
Dec  8 15:58:14 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 4 of 5 (IP Configure Get) complete.
Dec  8 15:58:14 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 5 of 5 (IP Configure Commit) started...
Dec  8 15:58:15 cosc7123 NetworkManager: <info>  (eth0): device state change: 7 -> 8
Dec  8 15:58:15 cosc7123 NetworkManager: <info>  Policy set 'System eth0' (eth0) as default for routing and DNS.
Dec  8 15:58:15 cosc7123 NetworkManager: <info>  Activation (eth0) successful, device activated.
Dec  8 15:58:15 cosc7123 NetworkManager: <info>  Activation (eth0) Stage 5 of 5 (IP Configure Commit) complete.
Dec  8 15:58:19 cosc7123 rpc.statd[2017]: nss_ldap: reconnected to LDAP server ldap://LDAP_IPAddress/ after 3 attempts

Comment 2 Steve Dickson 2008-12-08 12:31:07 UTC
It appears the reason nfslock is hanging is due to the fact the
LDAP server is non-responsive. I would contend you fix the non-responsive 
LDAP server and the hang will go a way...

Comment 3 Peter Glassenbury 2008-12-08 23:02:24 UTC
Except that the LDAP server DOES respond when the nfslock is later in the 
boot cycle (no other changes) and as shown in the workaround logs,
the client seems to need S27Networkmanager
to have completed before it will connect properly to the LDAP server

Comment 4 Steve Dickson 2008-12-09 01:38:51 UTC
Okay... I guess I missed the fact you having NetworkManager bring up your
network... sorry about that...  And yes it seems NetworkManager is 
started very late in the boot process.

Just to ensure that is truly is the problem, use system-config-network
to turn off then NetworkManager from bring up the the interfaces,
make sure the interfaces are brought up during boot and finally
uses 'chkconfig network on' to ensure the boot process does indeed 
bring up the interfaces... 

If all that works, we will move this bug to the NetworkManager  world
since thats where the problem truly lies... tia..

Comment 5 Peter Glassenbury 2008-12-10 20:50:04 UTC
Sorry for the delay. I am not sure if I have the right information.
When I go into system-config-network, the ethernet device is does NOT have
the tick in the "Controlled by NetNanager" line. It has"Activate device when computer starts".
In the kickstart file we do a install with the following lines relevant to networking..
network --device=eth0 --bootproto=dhcp
bootloader --md5pass=xxx --location=mbr --driveorder=sda --append="noipv6"

chkconfig has the following settings
/sbin/chkconfig --list |grep -i net
netconsole      0:off   1:off   2:off   3:off   4:off   5:off   6:off
netfs           0:off   1:off   2:off   3:on    4:on    5:on    6:off
netplugd        0:off   1:off   2:off   3:off   4:off   5:off   6:off
network         0:off   1:off   2:off   3:off   4:off   5:off   6:off
NetworkManager  0:off   1:off   2:on    3:on    4:on    5:on    6:off

It appears that I have the system-config-network not controlling via networkmanager and the service startup files starting it..
Is that correct?.. At this point in time I should be running an almost
vanilla F10 .

Comment 6 Steve Dickson 2008-12-10 20:59:14 UTC
I believe 
    "network         0:off   1:off   2:off   3:off   4:off   5:off   6:off"
is the problem... 'service network start' is not happening at boot time.

Try 'chkconfig network on' and then reboot.

Comment 7 Peter Glassenbury 2008-12-10 21:23:04 UTC
OK.. from your last two comments I have 
    chkconfig network on
    chkconfig NetManager off
    mv nfslock back to S14nfslock
    reboot

Now it works. 
Question now becomes "how did the install produce a broken setup"
and should I just leave Netmanager off when it is supposed to be the default
new fedora networking package

Get the messages below in dmesg
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
ADDRCONF(NETDEV_UP): eth0: link is not ready
0000:00:19.0: eth0: Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
0000:00:19.0: eth0: 10/100 speed: disabling TSO
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
RPC: Registered udp transport module.
RPC: Registered tcp transport module.

Comment 8 Bill Nottingham 2008-12-11 16:36:56 UTC
A couple of things here:

- if you go into system-config-network, you need to either make sure that 'Interface is controlled by NetworkManager' is checked before saving any changes, or that you manually enable the network service. Otherwise, you'll have no networking. (This is bug 467946)

- nfslock should probably have a dependency on the 'network' init facility.

Something like:

### BEGIN INIT INFO
# Provides:          nfslock
# Required-Start:    $network
# Required-Stop:     $network
# Description:       <goes here>
### END INIT INFO

in the header of the file. You'd then need to do 'chkconfig nfslock resetpriorities' if you're testing this on an installed system.

Comment 9 Peter Glassenbury 2008-12-16 00:15:16 UTC
Sorry for the delay again...

I have looked at the suggested fixes and have a working INSTALLED machine.

Can you tell me what to do in a kickstart to turn the "network" service on?

The issue is, I don't think I have chkconfig available in the post-install
(still running a memory linux with the disk mounted) and on the first boot
the system hangs very early in the boot. Should I create a hack... say
a S13hack to do a "chkconfig network on ; service network start" just before
going into the S14nfslock. I CAN copy a file into place in the post-install
before firstboot.

The changes in comment#8 are for after I have an installed and running system
so don't apply on install unless I can do a setting somewhere in the 
post-install... or if it is fixed by the dependency, that will be done in the
rpm.

Ideas?

Comment 10 Bill Nottingham 2008-12-16 15:04:19 UTC
You can either run chkconfig by hand in the %post of your kickstart file, or use '%services --disabled NetworkManager --enabled network' (see the kickstart docs)

Comment 11 Steve Dickson 2009-06-11 15:08:48 UTC
*** Bug 493039 has been marked as a duplicate of this bug. ***

Comment 12 Fedora Update System 2009-06-11 18:30:31 UTC
nfs-utils-1.2.0-3.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/nfs-utils-1.2.0-3.fc11

Comment 13 Fedora Update System 2009-06-16 01:22:22 UTC
nfs-utils-1.2.0-3.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update nfs-utils'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-5985

Comment 14 Fedora Update System 2009-07-16 07:32:09 UTC
nfs-utils-1.2.0-3.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.