635066 – systemd is not allowing the system to reboot

Summary: systemd is not allowing the system to reboot

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	systemd
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Lennart Poettering
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	636757
Depends On:
Blocks:	F14Target

Reported:	2010-09-17 17:51 UTC by Nicolas Mailhot
Modified:	2010-11-18 23:27 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2010-11-18 23:27:43 UTC
Type:	---
Embargoed:
Dependent Products:

Description Nicolas Mailhot 2010-09-17 17:51:19 UTC

Sep 17 19:39:59  systemd[1]: Activating special unit ctrl-alt-del.target
Sep 17 19:39:59  systemd[1]: Found ordering cycle on cups.service/stop
Sep 17 19:39:59  systemd[1]: Walked on cycle path to named.service/stop
Sep 17 19:39:59  systemd[1]: Walked on cycle path to network.target/stop
Sep 17 19:39:59  systemd[1]: Walked on cycle path to NetworkManager.service/stop
Sep 17 19:39:59  systemd[1]: Walked on cycle path to udev-post.service/stop
Sep 17 19:39:59  systemd[1]: Walked on cycle path to cups.service/stop
Sep 17 19:39:59  systemd[1]: Unable to break cycle
Sep 17 19:39:59  systemd[1]: Requested transaction contains an unfixable cyclic ordering dependency: Transaction order is cyclic. See system logs for details.
Sep 17 19:39:59 arekh systemd[1]: Failed to enqueue ctrl-alt-del.target job: Transaction order is cyclic. See system logs for details.

Is this really a sufficient reason to stop shutdown? I had to use reset and risk data loss to restart the system.

cups-1.4.4-10.fc14.x86_64
NetworkManager-0.8.1-6.git20100831.fc14.x86_64
NetworkManager-glib-0.8.1-6.git20100831.fc14.x86_64
NetworkManager-gnome-0.8.1-6.git20100831.fc14.x86_64
systemd-gtk-10-1.fc14.x86_64
systemd-sysvinit-10-1.fc14.x86_64
systemd-units-10-1.fc14.x86_64
udev-161-2.fc14.x86_64
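The log above lists the stop jobs systemd walked through before giving up. A cyclic ordering means those jobs admit no topological order, so no valid execution sequence exists. As a rough illustration only (not systemd's own code), the sketch below treats each consecutive pair of jobs in the "Walked on cycle path" walk as one ordering edge. This is an assumption: the log does not say which unit declared which dependency. It then feeds the pairs to the standard `tsort` utility, which exits non-zero when its input contains a loop. The last step drops every edge touching one arbitrarily chosen job. This loosely mirrors what the "Unable to break cycle" line implies systemd attempted: removing a job from the transaction so the rest can be ordered.

```shell
# Hypothetical reconstruction: each line "A B" is one hop of the cycle walk
# from the log above. The real edge directions are assumed; only the fact
# that the walk closes into a loop matters here.
edges='cups.service/stop named.service/stop
named.service/stop network.target/stop
network.target/stop NetworkManager.service/stop
NetworkManager.service/stop udev-post.service/stop
udev-post.service/stop cups.service/stop'

# tsort reports the loop on stderr and exits non-zero when the pairs
# admit no total order; both streams are discarded, only the status is used.
if printf '%s\n' "$edges" | tsort >/dev/null 2>&1; then
    echo "full transaction: ordering is acyclic"
else
    echo "full transaction: ordering cycle detected"
fi

# Drop every edge that touches one job (udev-post.service/stop, chosen
# arbitrarily for illustration); the remaining jobs now sort cleanly.
printf '%s\n' "$edges" | grep -v 'udev-post.service/stop' | tsort
```

Any POSIX shell with `tsort` and `grep` should run this. GNU coreutils' `tsort` also names the loop members on stderr, which the sketch discards.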

Comment 1 Michal Schmidt 2010-09-23 07:55:30 UTC

(In reply to comment #0)
> Is this really a sufficient reason to stop shutdown? I had to use reset and
> risk data loss to restart the system.

There's nothing special about shutdown or reboot in systemd. They're just other targets. BTW, "sync && reboot -f" would have been somewhat safer than a hard reset.

Having an ordering cycle is definitely a problem. I'll see if I can reproduce it.

Comment 2 Michal Schmidt 2010-09-23 08:02:19 UTC

I see you also filed bug 636757 where there is an ordering cycle involving named.
Am I right in guessing that you had to work around the problem of named not starting during boot by starting it manually afterwards? That would surely cause the very ordering cycle that originally prevented named from starting to show up again during shutdown.

Comment 3 Nicolas Mailhot 2010-09-23 08:13:21 UTC

(In reply to comment #1)
> (In reply to comment #0)
> > Is this really a sufficient reason to stop shutdown? I had to use reset and
> > risk data loss to restart the system.
> 
> There's nothing special about shutdown or reboot in systemd. They're just
> other targets.

But functionally they are not just *another target*. Shutdown and reboot are
used in emergencies. If systemd cannot honor them reliably, it will force
situations where the risk of data loss is real.

> BTW, "sync && reboot -f" would have been somewhat safer than a
> hard reset.

Thanks, but really, there was never a need for reboot -f on Fedora before, and
it is a regression if users/admins need to learn it now.

> Having an ordering cycle is definitely a problem. I'll see if I can reproduce
> it.

I can provide more info if needed (for example, the local named instance is the
only way for the box to do name resolution, and the link to the Internet is
DHCP-managed by NetworkManager *without* allowing the ISP to set its own resolvers).

(In reply to comment #2)
> I see you also filed bug 636757 where there is an ordering cycle involving
> named.
> Am I right guessing that you had to work around the problem with named not
> starting during boot by starting it afterwards manually?

That bug is the startup part, and yes, I had to start named manually today. (This part is new: last week the startup was broken differently, and named was started, but not at the right time. Now it is not started at all. Either way, network users are very unhappy.)

> That would surely
> cause the very same ordering cycle which originally prevented the start of
> named to show up during shutdown again.

systemd needs to figure out how to manage the services in that case. This is not a new setup; it has been running reliably on Fedora for years. The new part is the addition of systemd at F14 branch time.

Comment 4 Michal Schmidt 2010-09-23 10:49:07 UTC

(In reply to comment #3)
> Thanks, but really, there never was a need for reboot -f on Fedora before, and
> it is a regression if users/admins need to learn it now

I was not suggesting it as a common practice for admins. It was merely friendly advice for you, in case you get into such a situation again.

I did acknowledge there was a bug in systemd. No need to convince me about it.

I have now reproduced the bug on Rawhide.

Comment 5 Nicolas Mailhot 2010-09-23 13:01:30 UTC

(In reply to comment #4)
> (In reply to comment #3)
> > Thanks, but really, there never was a need for reboot -f on Fedora before, and
> > it is a regression if users/admins need to learn it now
> 
> I was not suggesting it as a common practice for admins. It was merely
> friendly advice for you, in case you get into such a situation again.

Sorry, that came out badly. I should not respond to Bugzilla while answering the phone.

Is sync + reboot -f adding any security over sync + hard reset?

Comment 6 Michal Schmidt 2010-09-23 16:38:15 UTC

(In reply to comment #5)
> Is sync + reboot -f adding any security over sync + hard reset?

At least in my case the former allows the kernel to print "md: stopping all md devices" which gives me a soothing feeling when I have md RAID arrays :-)


Your dependency cycle is caused by NetworkManager.
Try disabling its SysV script:  chkconfig NetworkManager off
Note that NetworkManager should still be enabled using its systemd native unit. Do "systemctl enable NetworkManager.service" to make sure it is so.

Comment 7 Michal Schmidt 2010-09-23 16:42:05 UTC

*** Bug 636757 has been marked as a duplicate of this bug. ***

Comment 8 Nicolas Mailhot 2010-09-23 18:20:14 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > Is sync + reboot -f adding any security over sync + hard reset?
> 
> At least in my case the former allows the kernel to print "md: stopping all md
> devices" which gives me a soothing feeling when I have md RAID arrays :-)

That's a good point, md resync has been killing me

> Your dependency cycle is caused by NetworkManager.
> Try disabling its SysV script:  chkconfig NetworkManager off
> Note that NetworkManager should still be enabled using its systemd native unit.
> Do "systemctl enable NetworkManager.service" to make sure it is so.

That works (both on boot and reboot). However, if I now switch to Upstart, the system is broken.

Comment 9 Nicolas Mailhot 2010-09-24 17:31:13 UTC

(In reply to comment #8)
> (In reply to comment #6)
> > (In reply to comment #5)

> > Your dependency cycle is caused by NetworkManager.
> > Try disabling its SysV script:  chkconfig NetworkManager off
> > Note that NetworkManager should still be enabled using its systemd native unit.
> > Do "systemctl enable NetworkManager.service" to make sure it is so.
> 
> That works (both on boot and reboot). However, now if I switch to upstart, the
> system is broken

Actually, I was overhasty in replying. I now see this in maillog:

Sep 24 19:20:25 arekh postfix/postfix-script[1557]: starting the Postfix mail system
Sep 24 19:20:25 arekh postfix/master[1558]: fatal: bind 192.168.0.4 port 25: Cannot assign requested address

So NetworkManager/bind ordering is still not OK with respect to other network services. Postfix is started too early.

Comment 10 Nicolas Mailhot 2010-09-24 17:33:52 UTC

Likewise for the bip service:

24-09-2010 19:20:28 [freenode] Connecting user '' using server irc.freenode.net:7070
24-09-2010 19:20:28 ERROR: getaddrinfo(irc.freenode.net): Name or service not known
24-09-2010 19:20:28 [freenode] Cannot connect.
24-09-2010 19:20:28 ERROR: [freenode] reconnecting in 120 seconds
24-09-2010 19:20:28 [gimp] Connecting user '' using server irc.gimp.net:6667
24-09-2010 19:20:28 ERROR: getaddrinfo(irc.gimp.net): Name or service not known
24-09-2010 19:20:28 [gimp] Cannot connect.
24-09-2010 19:20:28 ERROR: [gimp] reconnecting in 120 seconds

Thankfully, bip at least retries until the network is fixed.

Comment 11 Matthias Clasen 2010-10-08 22:46:32 UTC

Moving systemd bugs to f15, since the systemd feature got delayed.

Comment 12 Lennart Poettering 2010-11-18 23:27:43 UTC

systemd in F15 is now able to fix shutdown transactions if they are cyclic. This should fix the original issue reported here.
