Bug 157759

Summary: named crashes when NetworkManager started using init.d script
Product: [Fedora] Fedora
Reporter: Rob Kooper <kooper>
Component: NetworkManager
Assignee: Dan Williams <dcbw>
Status: CLOSED WORKSFORME
QA Contact: Ben Levenson <benl>
Severity: medium
Docs Contact:
Priority: medium
Version: 4
CC: dcbw, jvdias
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-06-04 17:09:55 UTC
Type: ---
Bug Depends On:    
Bug Blocks: 136451    
Attachments:
Description                  Flags
NetworkManager-named.conf    none
debugging named script       none

Description Rob Kooper 2005-05-14 16:27:10 UTC
Description of problem:
Starting NetworkManager using the init.d script results in a crash when it
tries to start named. Starting NetworkManager without the script works fine.

I have a wired and a wireless card in my machine; neither is active at bootup.

Version-Release number of selected component (if applicable):
NetworkManager-0.4-10.cvs20050404

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:
named crashes after being started.

Expected results:


Additional info:
Following is the information from /var/log/messages

Starting NetworkManager using /etc/rc.d/init.d script
May 14 11:09:24 gonzo kernel: eth1: New link status: Disconnected (0002)
May 14 11:09:25 gonzo kernel: eth1: New link status: Connected (0001)
May 14 11:09:27 gonzo named[3161]: starting BIND 9.3.1 -f -u named -c /var/named/data/NetworkManager-named.conf
May 14 11:09:27 gonzo named[3161]: found 1 CPU, using 1 worker thread
May 14 11:09:27 gonzo named[3161]: ./main.c:476: unexpected error:
May 14 11:09:27 gonzo named[3161]: ns_taskmgr_create() failed: no available threads
May 14 11:09:27 gonzo named[3161]: create_managers() failed: unexpected error
May 14 11:09:27 gonzo named[3161]: exiting (due to early fatal error)

Starting by hand, just NetworkManager
May 14 11:09:48 gonzo kernel: eth1: New link status: Disconnected (0002)
May 14 11:09:48 gonzo kernel: eth1: New link status: Connected (0001)
May 14 11:09:50 gonzo kernel: eth1: New link status: Disconnected (0002)
May 14 11:09:50 gonzo named[3253]: starting BIND 9.3.1 -f -u named -c /var/named/data/NetworkManager-named.conf
May 14 11:09:50 gonzo named[3253]: found 1 CPU, using 1 worker thread
May 14 11:09:51 gonzo named[3253]: loading configuration from '/var/named/data/NetworkManager-named.conf'
May 14 11:09:51 gonzo named[3253]: listening on IPv4 interface lo, 127.0.0.1#53
May 14 11:09:51 gonzo named[3253]: /var/named/data/NetworkManager-named.conf:7: no forwarders seen; disabling forwarding
May 14 11:09:51 gonzo named[3253]: /var/named/data/NetworkManager-named.conf:7: no forwarders seen; disabling forwarding
May 14 11:09:51 gonzo named[3253]: running
May 14 11:09:52 gonzo kernel: eth1: New link status: Connected (0001)

Comment 1 Dan Williams 2005-05-14 18:40:10 UTC
Over to named... However, it sounds like it could be an SELinux issue, since
we've run into this before where NM dies when started from initscripts but is
fine when run normally as root.


Comment 2 Jason Vas Dias 2005-05-16 13:09:51 UTC
The first run of named shows that pthread_create() failed - this 
can be due to lack of memory.

The second run shows there was an error in the configuration file -
forwarding is disabled because of it, but named continues to run.

Please attach the /var/named/data/NetworkManager-named.conf 
configuration file you are using - this appears to be the source
of the problem.







Comment 3 Rob Kooper 2005-05-16 13:35:29 UTC
Created attachment 114421 [details]
NetworkManager-named.conf

Attached is the NetworkManager-named.conf file. I am running on an IBM T30 with
1 GB of memory.

Comment 4 Jason Vas Dias 2005-05-16 16:44:03 UTC
RE: ns_taskmgr_create() failed: no available threads

Have you changed the stack size ulimit from the default (10240 KB)?
That is, do you issue a "ulimit -s" in your initscripts or /etc/profile?
The only reason I know of why named's pthread_create() might fail
is that named always asks for the pthread_attr_getstacksize(...)
stack for each of its 4 threads. If the stack size rlimit is
unreasonably large, thread creation can fail. What is your
threads-max limit (cat /proc/sys/kernel/threads-max -> 16364)?
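
A minimal sketch for collecting the two values asked about above. It is best run from the same environment that starts NetworkManager (for example, temporarily from the init script), because resource limits are inherited from the parent process and an interactive root shell may report different values. The variable names are illustrative only.

```shell
#!/bin/sh
# Soft stack size limit of the current shell, in KB (or "unlimited").
stack_kb=$(ulimit -s)
echo "stack size limit: ${stack_kb}"

# threads-max is Linux-specific; skip it quietly if /proc is not readable.
if [ -r /proc/sys/kernel/threads-max ]; then
    threads_max=$(cat /proc/sys/kernel/threads-max)
    echo "threads-max: ${threads_max}"
fi
```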

RE: no forwarders seen; disabling forwarding

This means NetworkManager is starting named with an empty
forwarders { ... } clause (i.e. NO nameservers are configured).
NM should not start named if there are no nameservers to forward to.
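
The empty-forwarders condition could be detected before named is launched. The sketch below assumes the generated file contains a BIND-style forwarders { ... } clause with IPv4 addresses; has_forwarders is a hypothetical helper written for this comment, not something NetworkManager or BIND provides, and it ignores comments and IPv6 forwarders.

```shell
#!/bin/sh
# has_forwarders FILE: succeed if FILE has a forwarders { ... } clause
# that lists at least one IPv4 address. Hypothetical helper for illustration.
has_forwarders() {
    # Join lines so a clause spread over several lines is matched as one.
    tr '\n' ' ' < "$1" |
        grep -Eo 'forwarders[[:space:]]*\{[^}]*\}' |
        grep -Eq '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'
}

# Path taken from the log above; the check is skipped if the file is absent.
conf=/var/named/data/NetworkManager-named.conf
if [ -r "$conf" ] && ! has_forwarders "$conf"; then
    echo "no forwarders in $conf; named would have nothing to forward to"
fi
```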


Comment 5 Rob Kooper 2005-05-16 19:40:00 UTC
ulimit -a reports a stack size of 10240, and threads-max = 32750.

I started NetworkManager and, before I connect, the forwarders clause is
indeed empty.

Comment 6 Jason Vas Dias 2005-05-17 20:50:55 UTC
How reproducible is the named crash problem for you?

If the named crash only happened once, then it is likely
to have been caused by transient resource exhaustion - this
can happen to any process that relies on pthread_create()
(NM included) and is not a bug.

The fact that NetworkManager starts named at all with an
empty forwarders clause, nothing else in the config file,
and 127.0.0.1 in resolv.conf is a NetworkManager bug.





Comment 7 Rob Kooper 2005-05-17 22:55:49 UTC
Still happens, even with today's NetworkManager update. I have
not been able to start it correctly. I might try to reinstall FC4T3
later this week and see if the problem still exists after that.


Comment 8 Jason Vas Dias 2005-05-19 15:10:44 UTC
Please try to reproduce this bug with the latest glibc*-2.3.5-6 and
bind-9.3.1-4 from FC4 / rawhide. I cannot reproduce it here.

glibc-2.3.4-19 introduced new threads libraries, which BIND was compiled
to use.

If you can still reproduce this problem with glibc*-2.3.5-6 and bind-9.3.1-4:

-  Do you have SELinux enabled? If so, ensure you are up to date with
   selinux-policy-targeted, libselinux, policycoreutils and libsepol.

   -  Does the problem still occur with SELinux disabled?
      Boot with the "selinux=no" grub boot argument.
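
Before rebooting, the current SELinux mode can be checked from a shell. A small sketch, assuming the getenforce utility from the SELinux userland is installed; the variable name and the fallback strings are illustrative.

```shell
#!/bin/sh
# Report the current SELinux mode: Enforcing, Permissive or Disabled.
if command -v getenforce >/dev/null 2>&1; then
    selinux_mode=$(getenforce 2>/dev/null) || selinux_mode="unknown"
else
    selinux_mode="unknown (getenforce not found)"
fi
echo "SELinux mode: ${selinux_mode}"
```

If the mode is Enforcing, running "setenforce 0" as root switches to permissive mode until the next reboot, where denials are logged but not enforced. That may be a quicker first test than a reboot, though it is not identical to booting with SELinux disabled.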

If the problem is reproducible with the latest glibc, BIND and selinux RPMS:
please download the attached "named" script and do the following:
   # pkill -TERM named
   # mv /usr/sbin/named /usr/sbin/named_exe
   # cp -fp named /usr/sbin
   # restorecon /usr/sbin/named
   # mkdir /tmp/named

Once you have reproduced the problem, then:
   # tar -cpvf - /tmp/named | gzip -9 > /tmp/named.tar.gz
   # mv /usr/sbin/named /usr/sbin/named_dbg
   # mv /usr/sbin/named_exe /usr/sbin/named

and attach the named.tar.gz file to this bug.


Comment 9 Jason Vas Dias 2005-05-19 15:12:00 UTC
Created attachment 114569 [details]
debugging named script

Comment 10 Rob Kooper 2005-05-20 01:47:04 UTC
I think I can reliably reproduce it now. The problem is indeed an SELinux
problem. After I ran restorecon -R /etc, the script works. Here is how I can
break it:

1. Start named using the script: /etc/rc.d/init.d/named start
2. Start network configuration using Desktop->System Settings->Network
3. Set up a wireless connection and activate the interface
4. Surf the web
5. Stop named using /etc/rc.d/init.d/named stop
6. Start NetworkManager using /etc/rc.d/init.d/NetworkManager start

After this, SELinux stops named when NetworkManager starts it, which in turn
makes NetworkManager fail.

Comment 11 Jason Vas Dias 2005-05-20 02:51:59 UTC
Aha! Many thanks for the information.

So there are two problems here:
1. The NetworkManager init script does not get the right SELinux context
   during installation - or did you write to the script after you installed
   the RPM? If not, NM should be setting the context of its initscript
   correctly in the RPM.

2. NetworkManager currently cannot be run when an instance of named is run
   from /etc/rc.d/init.d/named. NetworkManager requires its own dedicated
   named at the moment, and named cannot be used for other purposes on the
   same machine.
   There is work in the pipeline to rectify this - I've completed a version
   of named that provides dynamic management of forwarding zones over D-BUS.
   Then you could run named at boot with its standard named.conf file (or one
   you've customized) using /etc/init.d/named, having it serve authoritative
   zones over external interfaces, while still dynamically configuring the
   forwarding zones used for queries from the localhost interface, and NM
   would not have to start up / shut down named every time it brings an
   interface up / down.
   A version of NM that uses bind-dbus should be out shortly.


Comment 12 Rob Kooper 2005-05-20 04:07:39 UTC
Actually, I think NetworkManager starts named from inside the application. It
seems that using system-config-network changes a file somewhere, so that
SELinux complains the next time I start /etc/rc.d/init.d/NetworkManager. The
only way to fix this is to run restorecon.

Comment 13 Jason Vas Dias 2005-05-20 14:07:35 UTC
Well, there isn't much that the BIND package can do about these issues.

NetworkManager should:
 - ideally not run named at all - it should use the existing named
   started from the initscript (and use D-BUS to manage forwarders).
   Until it does so:
   - NM should either use the /etc/init.d/named script to start named
     or check that no named instance is running before it runs named
   - NM should make the SELinux policy of its initscript compatible with
     running named.
Over to NetworkManager.
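
The "check that no named instance is running" step could look roughly like the sketch below. It assumes pgrep from procps is available; is_running is a hypothetical helper named here for illustration, and NetworkManager itself would do the equivalent check in its own code rather than in shell.

```shell
#!/bin/sh
# is_running NAME: succeed if a process whose name is exactly NAME exists.
# Hypothetical helper; relies on pgrep -x (exact process-name match).
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

if is_running named; then
    echo "named is already running; not starting a second instance"
else
    echo "no named instance found; a dedicated named could be started"
fi
```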

Comment 14 Rob Kooper 2005-06-04 17:09:55 UTC
I tried today to reproduce this bug, but could not. It seems that something
changed enough in the meantime that it no longer triggers. I will close the bug.

The log file still contains the extra message stating that named.conf has no
forwarders in it.