576662 – unclean shutdown from shutdown button

Summary: unclean shutdown from shutdown button

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	upstart
Sub Component:
Version:	13
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Casey Dahlin
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	588929

Reported:	2010-03-24 18:25 UTC by Tom Horsley
Modified:	2014-06-18 08:47 UTC
CC List:	6 users
Fixed In Version:	upstart-0.6.5-5.fc13
Clone Of:
Clones:	588929
Environment:
Last Closed:	2010-05-06 06:56:44 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
re-exec on SIGTERM (1.95 KB, patch) 2010-05-01 01:54 UTC, Bill Nottingham

Description Tom Horsley 2010-03-24 18:25:02 UTC

Description of problem:

I'm going back and forth between my main fedora 12 partition and my
fedora 13 alpha partition, rebooting to test f13 from time to time.

Several times now when I have used the normal shutdown button from
the gnome menus while logged into a fedora 13 gnome session, when I
boot back to fedora 12, it says "recovering journal" on the fedora 13
partition, and has to clean up 1 or 2 orphan inodes.

If the standard shutdown button is supposed to shutdown cleanly, it
doesn't appear to be doing so.

Version-Release number of selected component (if applicable):
upstart-0.6.5-3.fc13.x86_64
kernel-2.6.33-1.fc13.x86_64

How reproducible:
Intermittent: I have also shut down from f13 and come back to f12
with no journal problems.

Steps to Reproduce:
1. See above.
  
Actual results:
recovering journal

Expected results:
clean shutdown

Additional info:
I have rhgb turned off, so I can watch the messages during boot and shutdown,
and I can't remember seeing anything unusual during shutdown (though the
messages don't stay on the screen long :-).

Since this is f13, I'm often doing updates just before rebooting. I sometimes
wonder if updates can install changes incompatible with what was there
when I booted, and cause an unclean shutdown just that one time.

I submit this against upstart just as a guess for a good component to
use in the bug report.

Comment 1 Tom Horsley 2010-03-24 18:45:18 UTC

There is an associated thread in the test list on this as well:

http://lists.fedoraproject.org/pipermail/test/2010-March/089600.html

Comment 2 Andre Robatino 2010-03-24 19:00:19 UTC

Running in runlevel 3, shutting down with the halt command, I sometimes (maybe 10-20% of the time) see

mount: / is busy

just before power down.  On the next reboot there are often a few orphan inodes.  It's been happening for quite a while.

Comment 3 Bill Nottingham 2010-03-24 20:00:52 UTC

If you stuff a 'lsof' in /etc/init.d/halt near the end, what does it say is staying open?

What sort of partitioning do you have?
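A full lsof dump can run to many pages. One way to narrow it is to list only the files that are still open or memory-mapped after being unlinked. These are the likely reason / stays busy and the likely source of the orphan inodes. This is a sketch, not something the real halt script contains. For open files, lsof can make this selection itself with its +L1 option. The hypothetical list_deleted_open_files function below reads the same information straight from /proc, so it does not need lsof. It also checks memory mappings, such as shared libraries. It assumes a Linux /proc and should be run as root to see every process.

```shell
# Hypothetical diagnostic (not part of Fedora's /etc/init.d/halt).
# Prints "PID fd PATH" or "PID map PATH" for every file that a process
# still has open or memory-mapped after the file was unlinked.
# Assumes Linux /proc; run as root to see other users' processes.
list_deleted_open_files() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # Open descriptors: readlink shows "PATH (deleted)" once unlinked.
        for fd in "$dir"/fd/*; do
            target=$(readlink "$fd" 2>/dev/null) || continue
            case $target in
                *' (deleted)')
                    path=${target%' (deleted)'}
                    echo "$pid fd $path"
                    ;;
            esac
        done
        # Mappings (executables, shared libraries): field 6 of
        # /proc/PID/maps is the path, followed by "(deleted)".
        # Paths containing spaces are truncated at the first space;
        # unreadable maps files are skipped.
        awk -v pid="$pid" '/ \(deleted\)$/ { print pid, "map", $6 }' \
            "$dir/maps" 2>/dev/null || true
    done | sort -u
}
```

If it is called near the end of the halt script in place of a bare lsof, it should print a short list of the offending processes and paths rather than every open file on the system.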

Comment 4 Andre Robatino 2010-03-24 21:00:03 UTC

I saw this again in Rawhide after applying today's updates (I think it happens in both Rawhide and F13, and may even have happened before F12 Final).  It seems to be strongly correlated with applying a lot of updates.  Where exactly should the lsof go - before the $kexec_command attempt, or before exec $command?  And will the list be short enough that it's actually readable within the second or so that the message appears before power down?

My partitioning is the simplest possible - I'm using VirtualBox, and when installing I told it to use the entire drive.

Comment 5 Bill Nottingham 2010-03-24 21:11:19 UTC

Either place is fine; it's debatable how long it will take. You can halt without powering off if you just want to leave the messages up.

An interesting thing to test would be if you can duplicate it by upgrading glibc while the system is up, and not duplicate it otherwise.

Comment 6 Andre Robatino 2010-03-24 21:40:09 UTC

I included the lsof just before the $kexec_command attempt.  Upon running "halt -f", there was just one line printed saying "System halted.", and no apparent output from lsof.  (This seems to be the only way to avoid the guest window closing - without "-f", it powers off and closes the window.)  I tried to downgrade glibc but it said there was no downgrade available.  I'll use "halt -f" from now on in both F13 and Rawhide and see if anything appears.

Comment 7 Tom Horsley 2010-03-24 22:34:23 UTC

Meanwhile, I have rebooted my f13 partition several times
recently while fiddling with configuring bridge networking, virtualization,
and wot-not, and have not seen the journal errors. I also have not done
any updates before any of these shutdowns, so maybe this is something that
only happens due to certain updates.

Comment 8 Andre Robatino 2010-03-24 23:05:19 UTC

I don't believe "halt -f" is invoking /etc/init.d/halt at all, since nothing is ever printed except "System halted.".  I can use regular halt, and pause the guest when the lsof messages appear.  There are many pages of them with no way to view them all.  I can reproduce the "mount: / is busy" message by either "yum downgrade glibc*" and "halt", or "yum update glibc*" and "halt".  It always prints

Unmounting pipe file systems:
Unmounting file systems:
mount: / is busy
Halting system...

after I removed the lsof from the halt script.

Comment 9 Bill Nottingham 2010-03-25 00:50:18 UTC

Andre - can you reproduce it when a glibc upgrade (or downgrade) is *not* involved?

Comment 10 Andre Robatino 2010-03-25 00:56:56 UTC

Well, I just did today's batch of F13 updates, not including glibc, and there was no such message.  I'll have to watch it for the next few weeks (and Rawhide as well).

Comment 11 Andre Robatino 2010-03-25 01:48:06 UTC

I just did a text install of x86_64 Beta.TC1 in VirtualBox.  Then I enabled the network, installed yum-presto, and updated everything except glibc*, then halted.  The "mount: / is busy" warning appeared again, so it can appear without a glibc update.  Then I booted again, and updated glibc*, then halted, and saw the warning again.  So it can happen without a glibc version change, but seems to happen reliably with a glibc change.

Comment 12 Andre Robatino 2010-03-25 03:16:06 UTC

Further checking shows that updates or downgrades in either dbus-libs or glibc* trigger the problem, no other packages in the minimal install of Beta.TC1 seem to be involved.

Comment 13 Felix Miata 2010-03-25 07:20:26 UTC

FWIW, this is not unique to Fedora. I have a lot of multiboot systems. It happens to me with Mandriva and openSUSE as well, not only after updating, but sometimes after booting a distro that has since had its / mounted by some other distro.

Comment 14 Bill Nottingham 2010-03-25 13:44:47 UTC

OK, so earlier in /etc/init.d/halt we have:

# Tell init to re-exec itself.
kill -TERM 1

This is supposed to make upstart's init re-exec itself against the newly upgraded libdbus and libc, so that it is no longer holding open the inodes of the old, deleted library files. Obviously, it's not working. This will require more serious debugging. Thanks for helping narrow this down!
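You can check whether that re-exec actually happened by looking at init's memory map. The kernel adds a "(deleted)" suffix to a mapped file once it has been unlinked. An init still running against the pre-upgrade libc or libdbus therefore shows those libraries with that suffix. The hypothetical deleted_mappings helper below is a sketch of that check under these assumptions. It is not an upstart or initscripts command, and reading /proc/1/maps normally requires root.

```shell
# Hypothetical helper (not an upstart or initscripts command).
# Prints each unlinked file still mapped by a process, given its maps
# file; defaults to init's /proc/1/maps, which normally only root can read.
deleted_mappings() {
    maps=${1:-/proc/1/maps}
    # Field 6 is the pathname; the kernel appends " (deleted)" after the
    # file has been unlinked, e.g. when a package update replaced it.
    # A library maps several segments, so duplicates are collapsed.
    awk '/ \(deleted\)$/ { print $6 }' "$maps" | sort -u
}
```

Run it as root after the "kill -TERM 1" step. If it still lists the old libraries, init did not re-exec and will keep those inodes alive through the unmount.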

Comment 15 Tom Horsley 2010-03-26 20:14:05 UTC

I just got this same error rebooting fedora 12, and I notice in the last
batch of fedora 12 updates was an update to upstart itself, so I guess that's
another thing that can cause a problem in addition to the libraries.

Comment 16 Andre Robatino 2010-03-28 14:36:10 UTC

After applying today's Rawhide updates including gcc and related packages, I saw the problem again, so gcc is probably another package that triggers it.  (I wasn't able to do a gcc downgrade to be sure.)

Comment 17 Bill Nottingham 2010-04-30 21:25:24 UTC

This is because upstream upstart dropped the SIGTERM handler, so the "kill -TERM 1" in /etc/init.d/halt no longer makes init re-exec itself.

Comment 18 Bill Nottingham 2010-05-01 01:54:10 UTC

Created attachment 410629
re-exec on SIGTERM

Here's a forward-ported patch from 0.3.x to re-exec on SIGTERM.

It does not attempt to save state. Passes very minimal testing.
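The idea behind the patch can be illustrated in shell. This is an analogy, not the attached C code. The script is a process that, on SIGTERM, exec()s a fresh copy of itself instead of exiting. Because exec replaces the program image without creating a new process, the PID stays the same, which is what init needs in order to remain PID 1. Meanwhile the old mappings, including any deleted libraries, are dropped. Like the patch, it saves no real state. The only thing carried across is one environment variable, DEMO_GENERATION, a hypothetical name used only to tell the two runs apart.

```shell
# Hypothetical analogy of "re-exec on SIGTERM" (not the attached patch):
# a shell process that, on SIGTERM, exec()s a fresh copy of itself
# instead of exiting, printing its PID before and after.
reexec_on_term_demo() {
    script='
if [ -z "$DEMO_GENERATION" ]; then
    echo "before: $$"
    # On SIGTERM, exec a fresh shell running this same script ($0).
    # exec keeps the PID; shell variables are lost unless exported.
    trap "export DEMO_GENERATION=2; exec sh -c \"\$0\" \"\$0\"" TERM
    kill -TERM $$
    echo "not reached"
else
    echo "after: $$ (generation $DEMO_GENERATION)"
fi'
    # The child shell's $0 is the script text itself, so it can re-exec it.
    DEMO_GENERATION= sh -c "$script" "$script"
}
```

Calling reexec_on_term_demo prints a "before:" line and an "after:" line carrying the same PID. The "not reached" line is skipped because the shell runs the trap, and therefore the exec, before the next command.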

Comment 19 Fedora Update System 2010-05-04 20:38:59 UTC

upstart-0.6.5-5.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/upstart-0.6.5-5.fc13

Comment 20 Fedora Update System 2010-05-05 07:23:20 UTC

upstart-0.6.5-5.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update upstart'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/upstart-0.6.5-5.fc13

Comment 21 Fedora Update System 2010-05-06 06:56:35 UTC

upstart-0.6.5-5.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.
