631620 – ordering cycles exist (+ breaking them deletes wrong services)

Bug 631620 - ordering cycles exist (+ breaking them deletes wrong services)

Summary: ordering cycles exist (+ breaking them deletes wrong services)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	hal
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Richard Hughes
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	632037 (view as bug list)
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2010-09-08 02:05 UTC by Jake Edge
Modified:	2010-09-27 03:56 UTC (History)
CC List:	9 users (show)
Fixed In Version:	hal-0.5.14-5.fc14
Clone Of:
Environment:
Last Closed:	2010-09-27 03:56:20 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
A log of the boot with systemd.log_level=debug systemd.log_target=kmsg (189.49 KB, text/plain) 2010-09-08 23:47 UTC, Jake Edge	no flags	Details
[PATCH] Report correctly which job was deleted (721 bytes, patch) 2010-09-10 14:35 UTC, Michal Schmidt	no flags	Details \| Diff
Contents of /etc/rc.d/init.d/jexe (9.79 KB, text/plain) 2010-09-13 17:43 UTC, Gene Snider	no flags	Details
View All

Description Jake Edge 2010-09-08 02:05:31 UTC

Description of problem:

The battery/power indicator does not work once you log in to KDE.  It works if you boot with init=/sbin/upstart and "it" also works if you log in to GNOME.

Version-Release number of selected component (if applicable):

rawhide updated on the morning of 7 September (with the exception of Firefox and xulrunner -- presumably completely unrelated to this problem -- due to unresolvable dependency problems)

systemd-9-3.fc15.i686

How reproducible:

for me, completely reliably

Steps to Reproduce:
1. boot with systemd
2. log in to KDE
3. battery indicator says "no battery" and/or "not plugged in"
  
Actual results:

battery indicator says "no battery" and/or "not plugged in"

Expected results:

it's plugged into the mains, so not that :)

Additional info:

based on IRC conversations with kay and michich, it would seem that systemd is not starting haldaemon -- it can be started manually with 'systemctl start haldaemon.service'

Comment 1 Bill Nottingham 2010-09-08 02:37:43 UTC

This seems odd... hal isn't a systemd nor a dbus service. What does 'chkconfig --list haldaemon' say?

Comment 2 Jake Edge 2010-09-08 02:54:24 UTC

ah yes, i should have mentioned that:

it said "on" for 3-5

also, before starting it manually, systemctl status haldaemon.service did not show systemd trying to start it, but as i said above, it could be started manually successfully ...

Comment 3 Michal Schmidt 2010-09-08 06:29:59 UTC

On IRC you mentioned a log contained something about a dependency loop involving haldaemon. Please attach the log.
Also on IRC I suggested booting with "systemd.log_level=debug systemd.log_target=kmsg" in order to get useful information in dmesg. Please attach this too.

Comment 4 Jake Edge 2010-09-08 11:49:00 UTC

Here is the relevant section from /var/log/messages:

Sep  7 12:13:33 ouzel kernel: init[1]: systemd 9 running in system mode. (+PAM +LIBWRAP -AUDIT +SELINUX)
Sep  7 12:13:33 ouzel kernel: NET: Registered protocol family 10
Sep  7 12:13:33 ouzel kernel: lo: Disabled Privacy Extensions
Sep  7 12:13:33 ouzel kernel: init[1]: Set hostname to <ouzel>.
Sep  7 12:13:33 ouzel kernel: init[1]: Found ordering cycle on NetworkManager.service/start
Sep  7 12:13:33 ouzel kernel: init[1]: Walked on cycle path to haldaemon.service/start
Sep  7 12:13:33 ouzel kernel: init[1]: Walked on cycle path to openct.service/start
Sep  7 12:13:33 ouzel kernel: init[1]: Walked on cycle path to network.target/start
Sep  7 12:13:33 ouzel kernel: init[1]: Walked on cycle path to NetworkManager.service/start
Sep  7 12:13:33 ouzel kernel: init[1]: Breaking ordering cycle by deleting job NetworkManager.service/start
Sep  7 12:13:33 ouzel kernel: udev[414]: starting version 161

I'll reboot with those params and grab that log later today and attach it.

Comment 5 Lennart Poettering 2010-09-08 23:19:19 UTC

Ah, a dep cycle loop. Looks like a problem that always existed in the SysV scripts but never mattered much. With systemd this now becomes visible.

Hmm, so the necessary fix here is probably that one of these ordering dependencies need to go away. The chain is NetworkManager AFTER haldaemon AFTER openct AFTER network AFTER NetworkManager. Of this chain it makes me wonder why openct is placed before hal and after network. That looks very wrong. I wonder where those ordering deps come from.

Comment 6 Jake Edge 2010-09-08 23:47:18 UTC

Created attachment 446105 [details]
A log of the boot with systemd.log_level=debug systemd.log_target=kmsg

Comment 7 Michal Schmidt 2010-09-09 08:56:39 UTC

Also, the log says:

Sep  8 17:31:37 ouzel kernel: init[1]: Breaking ordering cycle by deleting job NetworkManager.service/start

but NetworkManager is in fact started afterwards:

Sep  8 17:31:38 ouzel kernel: init[1]: About to execute: /usr/sbin/NetworkManager --no-daemon
Sep  8 17:31:38 ouzel kernel: init[1]: Forked /usr/sbin/NetworkManager as 1197
Sep  8 17:31:38 ouzel kernel: init[1]: NetworkManager.service changed dead -> start


While the only two lines about haldaemon are:

Sep  8 17:31:37 ouzel kernel: init[1]: Added job haldaemon.service/start to transaction
...
Sep  8 17:31:37 ouzel kernel: init[1]: Walked on cycle path to haldaemon.service/start


It looks like it was not really NetworkManager.service/start which was deleted, but haldaemon.service/start.

Comment 8 Bill Nottingham 2010-09-09 15:52:30 UTC

Well, if you're breaking the chain, remove the hal dep from NM. NM doesn't use HAL.

Comment 9 Bill Nottingham 2010-09-09 15:53:37 UTC

... although, looking at the NM init script, it doesn't set a dep on hal.

Comment 10 Michal Schmidt 2010-09-09 22:46:55 UTC

*** Bug 632037 has been marked as a duplicate of this bug. ***

Comment 11 Michal Schmidt 2010-09-09 22:52:12 UTC

Bug 632037 had another case where a cycle was detected. A different service was removed than was claimed too.
Obviously, haldaemon.service/start could not have been removed 4 times in a row. And the reporter was missing his cpuspeed service (which is also involved in the loop):

init[1]: Found ordering cycle on haldaemon.service/start
init[1]: Walked on cycle path to irqbalance.service/start
init[1]: Walked on cycle path to restorecond.service/start
init[1]: Walked on cycle path to jexec.service/start
init[1]: Walked on cycle path to haldaemon.service/start
init[1]: Breaking ordering cycle by deleting job haldaemon.service/start
init[1]: Found ordering cycle on haldaemon.service/start
init[1]: Walked on cycle path to cpuspeed.service/start
init[1]: Walked on cycle path to restorecond.service/start
init[1]: Walked on cycle path to jexec.service/start
init[1]: Walked on cycle path to haldaemon.service/start
init[1]: Breaking ordering cycle by deleting job haldaemon.service/start
init[1]: Found ordering cycle on haldaemon.service/start
init[1]: Walked on cycle path to restorecond.service/start
init[1]: Walked on cycle path to jexec.service/start
init[1]: Walked on cycle path to haldaemon.service/start
init[1]: Breaking ordering cycle by deleting job haldaemon.service/start
init[1]: Found ordering cycle on haldaemon.service/start
init[1]: Walked on cycle path to jexec.service/start
init[1]: Walked on cycle path to haldaemon.service/start
init[1]: Breaking ordering cycle by deleting job haldaemon.service/start

Comment 12 Gene Snider 2010-09-09 23:34:16 UTC

Bug 632037 was reported by me.  After looking at Comment 11, I decided to check out three other services.  Each service immediately following the line:

init[1]: Found ordering cycle on haldaemon.service/start

was listed as inactive (dead).  For example:
[gene@Mobile-PC ~]$ sudo service irqbalance status
irqbalance.service - LSB: start and stop irqbalance daemon
	  Loaded: loaded (/etc/rc.d/init.d/irqbalance)
	  Active: inactive (dead)
	  CGroup: name=systemd:/systemd-1/irqbalance.service

I had to manually start irqbalance.service, cpuspeed.service (as reported in bug 632037), restorecond.service, and jexec.service.  It looks like removing haldaemon.service/start actually removes the next service?

Gene

Comment 13 Michal Schmidt 2010-09-10 11:13:44 UTC

I cannot reproduce any of the ordering cycles.
Do you have any modifications in the SysV initscripts? Try:
  rpm -Vf /etc/rc.d/init.d/*

Gene, where does jexec.service come from? I found a jexec binary in java-1.6.0-openjdk, but no initscript of this name.

Comment 14 Jake Edge 2010-09-10 13:36:33 UTC

Here is my ouput from that command:

[root@ouzel jake]# rpm -Vf /etc/rc.d/init.d/*                                   
.M.......  c /etc/cups/subscriptions.conf                                       
.......T.  c /etc/dnsmasq.conf                                                  
S.5....T.  c /etc/inittab                                                       
prelink: /usr/libexec/hal-storage-cleanup-all-mountpoints: at least one of file's dependencies has changed since prelinking                                     
S.?......    /usr/libexec/hal-storage-cleanup-all-mountpoints                   
prelink: /usr/libexec/hal-storage-closetray: at least one of file's dependencies has changed since prelinking                                                   
S.?......    /usr/libexec/hal-storage-closetray                                 
S.5....T.  c /etc/inittab                                                       
S.5....T.  c /etc/inittab                                                       
file /etc/rc.d/init.d/livesys is not owned by any package                       
file /etc/rc.d/init.d/livesys-late is not owned by any package                  
S.5....T.  c /etc/inittab                                                       
S.5....T.  c /etc/inittab                                                       
S.5....T.  c /etc/inittab                                                       
S.5....T.  c /etc/inittab                                                       
S.5....T.  c /etc/inittab                                                       
S.5....T.    /etc/cron.d/smolt

Comment 15 Bill Nottingham 2010-09-10 14:29:14 UTC

Jake/Gene - are you running from the livecd, from a system installed from the livecd, or from a system installed from the basic install media?

Comment 16 Michal Schmidt 2010-09-10 14:35:52 UTC

Created attachment 446524 [details]
[PATCH] Report correctly which job was deleted

This patch fixes the lesser problem of incorrect reporting which job was deleted.

Comment 17 Jake Edge 2010-09-10 14:40:21 UTC

For me:

Version-Release number of selected component (if applicable):

rawhide updated on the morning of 7 September (with the exception of Firefox
and xulrunner -- presumably completely unrelated to this problem -- due to
unresolvable dependency problems)

systemd-9-3.fc15.i686

Comment 18 Gene Snider 2010-09-10 18:15:42 UTC

My system started as F13 alpha, and was updated to F14 rawhide a while ago.  The F13 install was made from normal install media, not livecd.  It looks like the jexec init script is installed with Sun java:

[gene@Mobile-PC ~]$ head -n20 /etc/rc.d/init.d/jexec
#!/bin/sh
#
# Copyright 2006 Sun Microsystems, Inc. All rights reserved.
# SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
#
# @(#)jexec     1.1 05/05/24
#
# A script for supporting the direct execution of binary formats through
# binfmt_misc.
#
# More information about binfmt_misc is available at the binfmt_misc homepage:
#
#     http://www.tat.physik.uni-tuebingen.de/~rguenth/linux/binfmt_misc.html

I modified the jexec init script slightly to clean up the logger output, but that had no effect on the service starting prior to systemd.  Here's the output you asked for in Comment 13:

[gene@Mobile-PC ~]$ sudo rpm -Vf /etc/rc.d/init.d/*
S.5....T.  c /etc/sysconfig/cpuspeed
.......T.  c /etc/dnsmasq.conf
..5....T.  c /etc/inittab
..5....T.  c /etc/inittab
S.5....T.  c /etc/sysconfig/ip6tables-config
S.5....T.  c /etc/sysconfig/iptables-config
file /etc/rc.d/init.d/jexec.save is not owned by any package
..5....T.  c /etc/inittab
S.5....T.  c /etc/sysconfig/lm_sensors
..5....T.  c /etc/inittab
..5....T.  c /etc/inittab
..5....T.  c /etc/inittab
..5....T.  c /etc/inittab
..5....T.  c /etc/inittab
S.5....T.    /etc/cron.d/smolt

Gene

Comment 19 Lennart Poettering 2010-09-11 00:21:51 UTC

I have now merged Michal's patch. Thanks!

The HAL dep cycle I will fix in the HAL package, by turning it into a bus-activatable service. This will fix the issue as a side effect but also be the next step in our HALsectomy. This might appear as a big change, but I think it is relatively without risk as Ubuntu has already shippe dmultiple releases with a bus-activated HAL.

Comment 20 Lennart Poettering 2010-09-12 19:46:05 UTC

Jake, can I bribe you into testing whether installing this koji build fixes the dep cycles for you?

http://koji.fedoraproject.org/koji/taskinfo?taskID=2463262 (for x86_64)
http://koji.fedoraproject.org/koji/taskinfo?taskID=2463263 (for i386)

If so, then I'll file a bodhi ticket for this.

I'd be very thankful if you could find the time to give this a quick test. If not please report back, then I'll submit it to Bodhi right-away!

(Reboot recommended after upgrading the rpm, to test HAL autospawning)

Thanks a lot!

Comment 21 Jake Edge 2010-09-13 15:02:34 UTC

well i fail to see the bribe, but i went ahead and tested it anyway :)

i installed the hal packages and all seems well ... the battery indicator seems to work, no complaints about ordering cycles, etc.

Comment 22 Gene Snider 2010-09-13 17:19:51 UTC

I tested the hal package, and things improved, but are not entirely fixed.  Here's the dep cycles I still see:

Sep 13 10:08:53 Mobile-PC kernel: init[1]: Set hostname to <Mobile-PC>.
Sep 13 10:08:53 Mobile-PC kernel: init[1]: Found ordering cycle on restorecond.service/start
Sep 13 10:08:53 Mobile-PC kernel: init[1]: Walked on cycle path to jexec.service/start
Sep 13 10:08:53 Mobile-PC kernel: init[1]: Walked on cycle path to restorecond.service/start
Sep 13 10:08:53 Mobile-PC kernel: init[1]: Breaking ordering cycle by deleting job restorecond.service/start

It looks like jexec.service is being deleted, because I have to manually start it after the boot.  Also, I'm now seeing very odd behaviour when I run ps in a terminal:

[gene@Mobile-PC ~]$ ps
Unknown HZ value! (87) Assume 1024.
  PID TTY          TIME CMD
 3302 pts/0    00:00:00 bash
 5664 pts/0    00:00:00 ps

The value isn't always 87, if that helps.  It also seems to happen only for the ps command.  I haven't found any info about that error in the error logs yet.

Is there a shutdown log I can check? Saving the random seed is failing on shutdown, and I don't know why.

Thanks,
Gene

Comment 23 Michal Schmidt 2010-09-13 17:39:29 UTC

Gene, since it's not found in Fedora, please attach the script /etc/init.d/jexec so we can see what dependencies it declares.

Comment 24 Gene Snider 2010-09-13 17:43:11 UTC

Created attachment 447003 [details]
Contents of /etc/rc.d/init.d/jexe

Here's the init script you asked for.

Gene

Comment 25 Michal Schmidt 2010-09-13 17:49:35 UTC

Now this surely is a mistake:
# Default-Start: 1 2 3 4 5
# Default-Start: 0 6

If you change the second "Start" to "Stop", does it change anything?

Or if you remove this?:
# Provides: binfmt_misc

Comment 26 Gene Snider 2010-09-13 18:26:03 UTC

Replacing the second Start with Stop fixed the dep cycle problem, thanks!  
Should anything be done about the "Provides: binfmt_misc" line?  Also, I checked the original script from Sun, and that section of the script is identical to the one I attached in comment 24.  Would you like me to attach the original script?

I'm sure a number of users are running Sun java, so should someone let upstream know about their typo?

Thanks,
Gene

Comment 27 Fedora Update System 2010-09-13 19:28:59 UTC

hal-0.5.14-5.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/hal-0.5.14-5.fc14

Comment 28 Fedora Update System 2010-09-14 04:29:42 UTC

hal-0.5.14-5.fc14 has been pushed to the Fedora 14 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update hal'.  You can provide feedback for this update here: https://admin.fedoraproject.org/updates/hal-0.5.14-5.fc14

Comment 29 Adam Williamson 2010-09-15 14:51:37 UTC

Given that we're reverting to sequential upstart for F14, I believe this bug won't occur and shouldn't block beta. Please test with upstart and re-apply blocker status if it does.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 30 Fedora Update System 2010-09-27 03:56:15 UTC

hal-0.5.14-5.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.