Red Hat Bugzilla – Bug 72043
glibc-2.2.90-24: system hangs during shutdown
Last modified: 2014-03-16 22:30:13 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020809
Description of problem:
Prevents system from unmounting "/usr" partition during shutdown. The system
needs a hard reset to reboot!
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Shut down system
2. Watch procedure continue until the system partitions are about to be unmounted.
Actual Results: System issues the following message:
"Unmounting file systems: umount2: Device or resource busy"
"umount: /dev/sda5 : not mounted"
"umount : /usr : Illegal seek"
The system is then stuck.
Expected Results: The system should shut down normally.
The system is a PR440FX based dual Pentium Pro workstation with SCSI peripherals
only. The reported issue has never been observed in the past. Downgrading to
"glibc-2.2.90-17" from "Limbo 2" cures this problem.
*** Bug 71559 has been marked as a duplicate of this bug. ***
This works fine with -23. Presumably something to do with locale archives?
Yes, the question is what programs are running at umount /usr time with
If it is just some program started from halt script or something similar
that late, it should be easy enough to fix - export LOCPATH=/usr/lib/locale
(this means locales still work but locale-archive is not used).
If it is bash running halt script, we could add
if [ -z "$LOCPATH" ]; then export LOCPATH=/usr/lib/locale; exec /etc/rc.d/init.d/halt; fi
I'll try putting fuser -v /usr/lib/locale/locale-archive
before the umount commands in /etc/rc.d/init.d/halt
I don't have /usr mounted separately and no space to install that though...
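Where fuser is not available that late in shutdown, roughly the same information can be read from the per-process maps files. The following is only a sketch: the function name pids_mapping and its optional second argument (an alternative proc root, used for testing) are hypothetical and not part of any initscript.

```shell
# Hypothetical helper: print the PIDs whose memory maps mention FILE.
# Usage: pids_mapping FILE [PROC_ROOT]   (PROC_ROOT defaults to /proc)
# Assumption: the pathname is the sixth whitespace-separated field of a
# maps line, so paths containing spaces are not handled.
pids_mapping() (
    file=$1
    proc_root=${2:-/proc}
    for maps in "$proc_root"/[0-9]*/maps; do
        # Skip unreadable entries and the unexpanded glob pattern.
        [ -r "$maps" ] || continue
        if awk -v f="$file" '$6 == f { found = 1 } END { exit !found }' "$maps" 2>/dev/null; then
            pid=${maps%/maps}
            echo "${pid##*/}"
        fi
    done
)
```

Calling pids_mapping /usr/lib/locale/locale-archive just before the umount commands would then list the processes that still have the archive mapped.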
I reproduced this without a separate /usr partition by moving
/usr/lib/locale/locale-archive to a small partition (presumably a -o loop
filesystem would work too) and replacing it with a symlink to that.
Shutdown now barfs on unmounting /foobar as reported for /usr.
I would not expect LOCPATH=/usr/lib/locale to behave any differently
because in that case it will mmap the individual files and have the
same issues with the filesystem. But obviously that didn't happen before.
The difference must be MAP_SHARED vs MAP_PRIVATE in the mmap of the archive.
I had it in mind that it didn't matter which under PROT_READ, but in fact the
kernel has to hold on to the file in case we ever did mprotect to PROT_WRITE.
I have checked in a fix to glibc mainline to use MAP_PRIVATE for the archive,
which should make it behave the same as mmap'ing the individual files has done.
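To check which kind of mapping a process actually holds, the permission field in /proc/<pid>/maps can be inspected: its fourth character is "s" for a MAP_SHARED mapping and "p" for a MAP_PRIVATE one. A minimal sketch follows; the helper name mapping_kinds is hypothetical.

```shell
# Hypothetical helper: read a maps listing on stdin and, for every line
# that maps FILE, print the permission field and whether it is shared.
# Usage: mapping_kinds FILE < /proc/PID/maps
mapping_kinds() (
    awk -v f="$1" '$6 == f {
        # Fourth character of the perms field: "s" = shared, "p" = private.
        kind = (substr($2, 4, 1) == "s") ? "shared" : "private"
        print $2, kind
    }'
)
```

For example, mapping_kinds /usr/lib/locale/locale-archive < /proc/$$/maps reports how the current shell has the archive mapped, if at all.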
Still doesn't work in -26, FWIW.
*** Bug 72949 has been marked as a duplicate of this bug. ***
Works ok with 2.2.91-1.
*** Bug 73152 has been marked as a duplicate of this bug. ***
When will 2.2.91-1 finally hit the public rawhide tree?
The same problem appears with glibc-2.2.93-5.
So this Bug really should not be considered closed.
This failure mode does indeed persist in 8.0.
I cannot see how it is glibc's fault, though.
It seems like the kernel's fault for not letting the filesystem
be unmounted when the only references to it are read-only mmap's
(the file descriptors are already closed). If the kernel is not
supposed to let you unmount the partition, then I think the halt
script needs to work around the fact that /usr may still be referenced.
I think it's trying to do that with NOLOCALE=1 before /etc/init.d/functions.
Adding "unset LANG" at the top of /etc/init.d/halt fixes it for me.
I suspect that should be done in /etc/init.d/functions in the NOLOCALE case.
Probably this bug should be reopened and reassigned to initscripts.
The weird thing is that before locale-archive the individual LC_ files were mapped
exactly like that, r--p in /proc/<pid>/maps.
In fact, if I:
dd if=/dev/zero of=localefs bs=1024k count=100
echo y | mke2fs -m 0 localefs
mount -o loop localefs /mnt/floppy
cp -a /usr/lib/locale/en_US /usr/lib/locale/locale-archive /mnt/floppy
and on another vt
LC_ALL=en_US LOCPATH=/mnt/floppy /bin/sh
then /mnt/floppy cannot be umounted.
Which means I don't understand why this ever worked.
Concerning /etc/init.d/halt, /etc/rc.d/rc is already supposed to unset it:
if [ "$subsys" = "halt" -o "$subsys" = "reboot" ]; then
exec $i start
It would be obviously better to export LC_ALL=C, not unset those two vars,
so that even in presence of some other LC_ variable it doesn't use locale-archive
or locale files.
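The reasoning rests on the usual locale precedence: LC_ALL overrides any per-category LC_ variable, which in turn overrides LANG. The sketch below illustrates that order only; the function name effective_locale is hypothetical, and it mimics the lookup rather than calling into glibc.

```shell
# Hypothetical helper: print the locale name that would apply to one
# category, using the precedence LC_ALL > LC_<category> > LANG > "C".
# Usage: effective_locale LC_CTYPE
# Assumption: the argument is a trusted variable name (it is passed to eval).
effective_locale() (
    category=$1
    eval "specific=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"
    elif [ -n "$specific" ]; then
        echo "$specific"
    elif [ -n "${LANG:-}" ]; then
        echo "$LANG"
    else
        echo C
    fi
)
```

With LANG and LC_ALL unset but LC_CTYPE=en_US still in the environment, effective_locale LC_CTYPE prints en_US, so locale data would still be loaded. With LC_ALL=C exported it prints C regardless of the other variables.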
I could have sworn the "unset LANG" was what made it work, but I think
I was confused by something else at the time. I can no longer reproduce
this with a real partition for /usr/lib/locale. My tests using a loopback
filesystem turned out to be a red herring, because /etc/init.d/netfs would
try to unmount it and lose before it got to /etc/init.d/halt.
I think at this point someone other than me and Jakub should try to reproduce it.
Setting LC_ALL=C in rc done in 7.03-1; this *should* solve the problem.
*** Bug 75700 has been marked as a duplicate of this bug. ***