Bug 174745 - avahi startup script does not clean up properly after itself
avahi startup script does not clean up properly after itself
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: avahi (Show other bugs)
rawhide
All Linux
medium Severity medium
: ---
: ---
Assigned To: Jason Vas Dias
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2005-12-01 16:18 EST by Michal Jaegermann
Modified: 2007-11-30 17:11 EST (History)
0 users

See Also:
Fixed In Version: avahi-0.6.1-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-12-20 16:11:19 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
trace from /usr/sbin/avahi-daemon -k with daemon running (7.63 KB, text/plain)
2005-12-05 16:06 EST, Michal Jaegermann
no flags Details
the same as the previous one but this time avahi-daemon does not run (7.80 KB, text/plain)
2005-12-05 16:11 EST, Michal Jaegermann
no flags Details

  None (edit)
Description Michal Jaegermann 2005-12-01 16:18:04 EST
Description of problem:

# service avahi-daemon stop
Shutting down Avahi daemon:
# service avahi-daemon start
Starting Avahi daemon... Process 8149 died: No such process; removing PID file.
(/var/run/avahi-daemon//pid)
                                                           [  OK  ]

and similar things happen with 'service avahi-daemon restart'.  That means
that 'pid' file is not handled properly even if a new process does start.

Version-Release number of selected component (if applicable):
avahi-0.6-2

How reproducible:
always
Comment 1 Jason Vas Dias 2005-12-01 17:20:42 EST
RE: 
> Starting Avahi daemon... Process 8149 died: No such process; removing PID file.

This shows that avahi is unable to start, and that the avahi-daemon process
IS cleaning up after the failed start attempt .

Please ensure that you have updated to the latest selinux-policy-targeted:
  selinux-policy-targeted-2.0.6-2+ 
and avahi packages:
  avahi-*-0.6-3+

I cannot reproduce this problem with the above versions installed.
Comment 2 Michal Jaegermann 2005-12-04 15:33:31 EST
I have here right now avahi-*-0.6-5 packagess and selinux-policy-targeted-2.0.8-1
(although selinux is turned off).  Now entirely skipping a content of
/etc/rc.d/init.d directory after '/usr/sbin/avahi-daemon -D' I see

# ls -l /var/run/avahi-daemon/
total 4
---------- 1 avahi avahi 5 Dec  4 13:24 pid
srwxrwxrwx 1 avahi avahi 0 Dec  4 13:24 socket

So far so good and /var/run/avahi-daemon/pid contains a correct process id.
After '/usr/sbin/avahi-daemon -k' I have

# ls -l /var/run/avahi-daemon/
total 4
---------- 1 avahi avahi 5 Dec  4 13:24 pid

even if the process is gone.  Only after the second '/usr/sbin/avahi-daemon -k'
I see on my screen:

Process 3805 died: No such process; removing PID file. (/var/run/avahi-daemon//pid)
Failed to kill daemon: No such process

and /var/run/avahi-daemon/ is finally empty.  In other words after the
first time 'pid' file was really not removed.  As you can see 'socket' is
gone.

This is what shows up when I am trying to do the same through 'service'.

A difference can be that I am seeing the above on x86_64 and I have
installed both avahi.x86_64 and avahi.i386 packages.  The later is actually
pulled in through a chain of dependencies by openoffice.org packages (although
that does not seem to be relavant to the problem on hands).
Comment 3 Jason Vas Dias 2005-12-05 15:05:04 EST
I'm definitely not seeing this problem with the latest rawhide i386 packages .

Whenever I do a 'service avahi-daemon stop' OR an 'avahi-daemon -k',
BOTH the /var/run/avahi-daemon/pid AND /var/run/avahi-daemon/socket 
are removed. 

It sounds to me as if there is an SELinux permissions issue that prevents
the avahi-daemon from removing the pid file .

There was a problem with upgrading from selinux-policy-targeted-1.27* to
the new "reference policy" selinux-targeted-2.0.x , that required
an SELinux relabel to fix.

With the selinux-policy-targeted-2.0.8-1 package installed, please do, as root:
# touch /.autorelabel
Then reboot .

After SELinux relabelling is complete, do you still see the problem ? 

If so, please clear out your audit log:
 # >/var/log/audit/audit.log
and then reproduce the pid file removal problem.

What is then the output of 
 # audit2allow </var/log/audit/audit.log 
?
Please append the contents, if any, of /var/log/audit/audit.log to this 
bug report.

If there are no audit log contents, please start avahi-daemon, and do :

# strace -vfo /tmp/avahi-daemon-k.strace.log avahi-daemon -k

and compress and append the /tmp/avahi-daemon-k.strace.log to this bug report.
Thanks!
Comment 4 Michal Jaegermann 2005-12-05 16:02:09 EST
I think that you are barking on a wrong tree with SELinux packages.  I wrote
before that SELinux is currently turned _off_ here.  If that affects how
avahi-daemon behaves this is a bug on its own.  BTW - there is no mention
of /var/run/avahi-daemon in whatever happens to be in audit log files.
Besides if that would be a permission problem then the second invocation
of '/usr/sbin/avahi-daemon -k' would not remove that pid file either; but it
does.

Look at this terminal session:

# /usr/sbin/avahi-daemon -D
# pgrep -f -l avahi
24354 avahi-daemon: running [dyna0.local]
24355 avahi-daemon: chroot helper process
# /usr/sbin/avahi-daemon -k
# cat /var/run/avahi-daemon/pid 
24354
# pgrep -f -l avahi
#

Actually '/usr/sbin/avahi-daemon -k' was straced and I attach the whole
trace below.  This trace ends up with:

24367 open("/var/run/avahi-daemon//pid", O_RDWR) = 3
24367 fcntl(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
24367 read(3, "24354\n", 255)           = 6
24367 kill(24354, SIG_0)                = 0
24367 fcntl(3, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
24367 close(3)                          = 0
24367 kill(24354, SIGTERM)              = 0
24367 kill(24354, SIG_0)                = -1 ESRCH (No such process)
24367 exit_group(0)                     = ?


Ooops! The process is already gone and you bail out leaving a 'pid' file
behind.  Most likely this file should be unlinked, no matter what, on
ESRCH.   Was this second 'kill(24354, SIG_0)' meant to check if
'kill(24354, SIGTERM)' was effective?  It appears that it was.
Comment 5 Michal Jaegermann 2005-12-05 16:06:42 EST
Created attachment 121867 [details]
trace from /usr/sbin/avahi-daemon -k with daemon running
Comment 6 Michal Jaegermann 2005-12-05 16:11:00 EST
Created attachment 121869 [details]
the same as the previous one but this time avahi-daemon does not run

See the difference?  This time we have 'unlink("/var/run/avahi-daemon//pid") =
0'
after ESRCH and it succeeds without any problems.
Comment 7 Michal Jaegermann 2005-12-06 01:49:26 EST
I looked at the code and I do not understand how you are not seeing the bug
on i386.  avahi-daemon has the following in main.c:

    } else if (config.command == DAEMON_KILL) {
        if (daemon_pid_file_kill_wait(SIGTERM, 5) < 0) {
            avahi_log_warn("Failed to kill daemon: %s", strerror(errno));
            goto finish;
        }

        r = 0;

    } else if ...

but daemon_pid_file_kill_wait() unlinks pid file only on error.  That is
exactly what I am seeing.  Other calls after 'finish:' label which could
have done unlinking are called only if 'wrote_pid_file' is non-zero.
Therefore one can do above

        r = 0;
        wrote_pid_file = 1;
        goto finish;

and after such change 'avahi-daemon -k' indeed removes its pid file (as
the only function in libdaemon which removes pid file without an error
is 'daemon_pid_file_remove()' ).

Replacing 'r = 0;' with 'r = daemon_pid_file_remove();' does not seem a very
good option because maybe we should have called 'avahi_chroot_helper_shutdown()'.
Comment 8 Jason Vas Dias 2005-12-07 14:30:28 EST
I've still not been able to reproduce this problem at all, eg. with :
$ for ((i=0;i<100;i++)) do service avahi-daemon stop || break; service
avahi-daemon start || break; done

The avahi-daemon -k sends a signal to the running avahi-daemon whose pid is 
in the /var/run/avahi-daemon/pid pid file; once this file is removed, control
over the active avahi-daemon by avahi-daemon is no longer possible as currently
implemented.

Upon receipt of the shutdown signal, the active avahi daemon, running in a 
chroot, is meant to send a message to the avahi-daemon 'chroot helper process'
to remove the pid file. 

I've not seen this scheme fail so far .

I've now updated both libdaemon (0.10) and avahi (0.6.1) in Rawhide .

Please test these new versions and let me know if this is still a problem.

I know there have been problems with yum on the x86_64 platform : it has 
incorrectly installed both i386 and x86_64 versions of some packages .

Please check that you have only one architecture of the avahi-* and libdaemon-*
packages installed .
Comment 9 Michal Jaegermann 2005-12-07 15:02:35 EST
> I've still not been able to reproduce this problem at all ...

The tell me what unlinks /var/run/avahi-daemon/pid on '-k'?  Traces which
I attached and reading through the code show that daemon_pid_file_kill_wait()
does that only when an error condition occured.

> once this file is removed, control over the active avahi-daemon by
> avahi-daemon is no longer possible as currently implemented.

Unless I miss something in libdaemon code daemon_pid_file_remove() will
not remove pid file if a process is still active.  In any case a change I made
makes avahi-daemon to behave with respect to its pid file.

> Upon receipt of the shutdown signal, the active avahi daemon, running in a 
> chroot, is meant to send a message to the avahi-daemon 'chroot helper process'
> to remove the pid file.

It does not seem work too well here.  /var/run/avahi-daemon

> Please test these new versions and let me know if this is still a problem.

With what is available today there is

# rpm -q avahi.x86_64
avahi-0.6-6
# rpm -q libdaemon.x86_64
libdaemon-0.8-1
# pgrep -l -f avahi
2708 avahi-daemon: running [dyna0.local]
2709 avahi-daemon: chroot helper process
# avahi-daemon -k
# pgrep -l -f avahi
# cat /var/run/avahi-daemon/pid
2708
# avahi-daemon -D
Process 2708 died: No such process; removing PID file. (/var/run/avahi-daemon//pid)
# pgrep -l -f avahi
12024 avahi-daemon: running [dyna0.local]
12025 avahi-daemon: chroot helper process
# cat /var/run/avahi-daemon/pid
12024

so at least '-D' works correctly even where is stale pid file.
/var/run/avahi-daemon/socket indeed is going away when expected.

When newer versions will become available I will retest.
Comment 10 Jason Vas Dias 2005-12-07 15:23:21 EST
RE: > tell me what unlinks /var/run/avahi-daemon/pid on '-k'

The unlink is done by the avahi-daemon 'chroot helper process' on receipt of a 
UNLINK message from the active avahi-daemon that was signalled by avahi-daemon -k.
The libdaemon daemon_pid_file_remove() is NOT invoked to remove the pid file.

Please ensure that you have ONLY the x86_64 versions of the avahi and 
libdaemon packages installed:

# rpm -e avahi.i386 avahi-devel.i386 avahi-tools.i386 avahi-glib.i386
avahi-glib-devel.i386 avahi-qt3.i386 avahi-qt3-devel.i386 libdaemon.i386
libdaemon-devel.i386
 
I have tried to reproduce this problem but cannot. 

Please try the updated versions of these packages when they become available.

If you are still able to reproduce this with the new packages, one thing you
could try that would be most helpful is to gather an strace of the chroot 
helper process:

# pgrep -l -f avahi
2708 avahi-daemon: running [dyna0.local]
2709 avahi-daemon: chroot helper process
# strace -vfp 2709 -o /tmp/avahi-chroot-strace.log &
# avahi-daemon -k

Then append the /tmp/avahi-chroot-strace.log to this bug report - thanks .
Comment 11 Michal Jaegermann 2005-12-07 15:58:29 EST
> Please ensure that you have ONLY the x86_64 versions of the avahi and 
> libdaemon packages installed:

I am afraid that currently this is not a practical option.  After
'yum remove avahi.i386' you will see:

Removing:
 avahi                   i386       0.6-6            installed         649 k
Removing for dependencies:
 avahi-glib              i386       0.6-6            installed         9.6 k
 gnome-vfs2              i386       2.13.1-1         installed         4.3 M
 libgsf                  i386       1.13.3-2         installed         232 k
 libwpd                  i386       0.8.3-2          installed         405 k
 openoffice.org-core     i386       1:2.0.1-0.139.1.2  installed         194 M
 openoffice.org-impress  i386       1:2.0.1-0.139.1.2  installed         4.3 M
 openoffice.org-math     i386       1:2.0.1-0.139.1.2  installed         3.1 M
 openoffice.org-writer   i386       1:2.0.1-0.139.1.2  installed         5.8 M

and avahi.i386 pulls in libdaemon.i386.  There is no x86_64 version
of openoffice.  Yes, I can force issue with '--nodeps' but this is likely
not a good idea.  If you are sure that this will not break anything then
possibly you can do a "fake provide" for avahi.i386, and whatever else is
needed, in avahi.x86_64 package? "Obsoletes" can ensure then that
libdaemon.i386 is also gone from a system.  Maybe.

Yes, I know that slightly newer version of openoffice is available.

Still 'file /usr/sbin/avahi-daemon' shows "ELF 64-bit LSB executable, AMD
x86-64" and libraries have different locations anyway.

I will wait for updated packages to show up and will try to revisit.  With
the current rawhide tracing helper on -k gives quite short:

12025 read(6, "\r", 1)                  = 1
12025 unlink("/var/run/avahi-daemon/socket") = 0
12025 write(6, "\0", 1)                 = 1
12025 read(6, "", 1)                    = 0
12025 exit_group(0)                     = ?

which perfectly agrees with what I am seeing here.
Comment 12 Michal Jaegermann 2005-12-08 18:17:49 EST
After an update to avahi-0.6.1-1 and (or?) libdaemon-0.10-1 the problem indeed
disappears.  Tracing what happens on '-k' to chroot helper process shows this
time:

1789  read(7, "\r", 1)                  = 1
1789  unlink("/var/run/avahi-daemon/socket") = 0
1789  write(7, "\0", 1)                 = 1
1789  read(7, "\f", 1)                  = 1
1789  unlink("/var/run/avahi-daemon//pid") = 0
1789  write(7, "\0", 1)                 = 1
1789  read(7, "", 1)                    = 0
1789  exit_group(0)                     = ?

That means that something did change.

Note You need to log in before you can comment on or make changes to this bug.