Description of problem:
I use ZFS (through zfs-fuse), and others use ntfs-3g. When killall5 is executed at shutdown, it kills the ZFS/ntfs-3g userspace process, and the shutdown fails.

We need something like Ubuntu's sendsigs protocol for the killall5 command. It works like this: a /var/run/sendsigs.omit.d directory is created where processes can drop their PID files, and killall5 is executed with an additional -o <pid> argument for each PID read from those files.

The Ubuntu bugs with the information and code are here:
https://bugs.launchpad.net/ubuntu/+source/sysvinit/+bug/87763
Patch to killall5 so that omitted processes it has stopped are properly sent SIGCONT:
https://bugs.launchpad.net/ubuntu/+source/sysvinit/+bug/151580

Only after everything else has finished, no files are open, and the filesystems have been remounted read-only is it safe to rerun killall5 -TERM against the remaining processes. That would be at the very end of the shutdown process.

How reproducible:
Put the root or /usr filesystem on FUSE. Reboot. Total catastrophe.

Additional info:
Here is a script that starts ZFS on my computer:
--------------------------------------------
rudd-o@karen: /sbin $ cat zfsctl
#! /bin/sh
PIDFILE=/var/run/zfs-fuse.pid
LOCKFILE=/var/lock/zfs/zfs_lock
export PATH=/sbin:/bin
unset LANG
ulimit -v unlimited
ulimit -c 512000

log_action_begin_msg() {
	true # echo $*
}

log_action_end_msg() {
	true # echo $*
}

do_start() {
	test -x /sbin/zfs-fuse || exit 0
	PID=`cat "$PIDFILE" 2> /dev/null`
	if [ "$PID" != "" ]
	then
		if kill -0 $PID 2> /dev/null
		then
			echo "ZFS-FUSE is already running"
			exit 3
		else
			# pid file is stale, we clean up shit
			log_action_begin_msg "Cleaning up stale ZFS-FUSE PID files"
			rm -f "$PIDFILE" # /var/run/sendsigs.omit.d/zfs-fuse
			log_action_end_msg 0
		fi
	fi

	log_action_begin_msg "Starting ZFS-FUSE process"
	zfs-fuse -p "$PIDFILE"
	ES_TO_REPORT=$?
	if [ 0 = "$ES_TO_REPORT" ]
	then
		true
	else
		log_action_end_msg 1 "code $ES_TO_REPORT"
		exit 3
	fi

	for a in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
	do
		PID=`cat "$PIDFILE" 2> /dev/null`
		[ "$PID" != "" ] && break
		sleep 1
	done

	if [ "$PID" = "" ]
	then
		log_action_end_msg 1 "ZFS-FUSE did not start or create $PIDFILE"
		exit 3
	else
		log_action_end_msg 0
	fi

	log_action_begin_msg "Immunizing ZFS-FUSE against OOM kills and sendsigs signals"
	# mkdir -p /var/run/sendsigs.omit.d
	# cp "$PIDFILE" /var/run/sendsigs.omit.d/zfs-fuse
	echo -17 > "/proc/$PID/oom_adj"
	ES_TO_REPORT=$?
	if [ 0 = "$ES_TO_REPORT" ]
	then
		log_action_end_msg 0
	else
		log_action_end_msg 1 "code $ES_TO_REPORT"
		exit 3
	fi

	log_action_begin_msg "Mounting ZFS filesystems"
	sleep 1
	zfs mount -a
	ES_TO_REPORT=$?
	if [ 0 = "$ES_TO_REPORT" ]
	then
		log_action_end_msg 0
	else
		log_action_end_msg 1 "code $ES_TO_REPORT"
		#echo "Dropping into a shell for debugging. Post_mountall pending."
		#bash
		#post_mountall
		exit 3
	fi

	if [ -x /usr/bin/renice ] ; then
		log_action_begin_msg "Increasing ZFS-FUSE priority"
		/usr/bin/renice -15 -g $PID > /dev/null
		ES_TO_REPORT=$?
		if [ 0 = "$ES_TO_REPORT" ]
		then
			log_action_end_msg 0
		else
			log_action_end_msg 1 "code $ES_TO_REPORT"
			exit 3
		fi
		true
	fi
}

do_stop () {
	test -x /sbin/zfs-fuse || exit 0
	PID=`cat "$PIDFILE" 2> /dev/null`
	if [ "$PID" = "" ] ; then
		# no pid file, we exit
		exit 0
	elif kill -0 $PID 2> /dev/null; then
		# pid file and killable, we continue
		true
	else
		# pid file is stale, we clean up shit
		log_action_begin_msg "Cleaning up stale ZFS-FUSE PID files"
		rm -f "$PIDFILE" # /var/run/sendsigs.omit.d/zfs-fuse
		log_action_end_msg 0
		exit 0
	fi

	log_action_begin_msg "Syncing disks"
	sync
	log_action_end_msg 0

	log_action_begin_msg "Unmounting ZFS filesystems"
	zfs unmount -a
	ES_TO_REPORT=$?
	if [ 0 = "$ES_TO_REPORT" ]
	then
		log_action_end_msg 0
	else
		log_action_end_msg 1 "code $ES_TO_REPORT"
		exit 3
	fi

	log_action_begin_msg "Terminating ZFS-FUSE process gracefully"
	kill -TERM $PID
	for a in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
	do
		kill -0 $PID 2> /dev/null
		[ "$?" != "0" ] && break
		sleep 1
	done
	if kill -0 $PID 2> /dev/null
	then
		log_action_end_msg 1 "ZFS-FUSE refused to die after 15 seconds"
		exit 3
	else
		rm -f "$PIDFILE" # /var/run/sendsigs.omit.d/zfs-fuse
		log_action_end_msg 0
	fi

	log_action_begin_msg "Syncing disks again"
	sync
	log_action_end_msg 0
}

case "$1" in
	start)
		do_start
		;;
	stop)
		do_stop
		;;
	status)
		PID=`cat "$PIDFILE" 2> /dev/null`
		if [ "$PID" = "" ] ; then
			echo "ZFS-FUSE is not running"
			exit 3
		else
			if kill -0 $PID
			then
				echo "ZFS-FUSE is running, pid $PID"
				zpool status
				exit 0
			else
				echo "ZFS-FUSE died, PID files stale"
				exit 3
			fi
		fi
		;;
	restart|reload|force-reload)
		echo "Error: argument '$1' not supported" >&2
		exit 3
		;;
	*)
		echo "Usage: $0 start|stop|status" >&2
		exit 3
		;;
esac

:
--------------------------------

And here is how I invoke it. Because it needs to start early in the boot process, I run it from rc.sysinit after the other filesystems have been mounted. This is a snippet from rc.sysinit; look at the line after the comment "# Rudd-O":
--------------------------------
# Enter mounted filesystems into /etc/mtab
mount -f /
mount -f /proc >/dev/null 2>&1
mount -f /sys >/dev/null 2>&1
mount -f /dev/pts >/dev/null 2>&1
mount -f /proc/bus/usb >/dev/null 2>&1

# Mount all other filesystems (except for NFS and /proc, which is already
# mounted). Contrary to standard usage,
# filesystems are NOT unmounted in single user mode.
action $"Mounting local filesystems: " mount -a -t nonfs,nfs4,smbfs,ncpfs,cifs,gfs,gfs2 -O no_netdev

# Rudd-O
action $"Mounting ZFS filesystems: " /sbin/zfsctl start

# Update quotas if necessary
if [ X"$_RUN_QUOTACHECK" = X1 -a -x /sbin/quotacheck ]; then
	action $"Checking local filesystem quotas: " /sbin/quotacheck -anug
fi

if [ -x /sbin/quotaon ]; then
	action $"Enabling local filesystem quotas: " /sbin/quotaon -aug
fi

# Check to see if a full relabel is needed
if [ -n "$SELINUX_STATE" -a "$READONLY" != "yes" ]; then
	if [ -f /.autorelabel ] || strstr "$cmdline" autorelabel ; then
		relabel_selinux
	fi
else
---------------------------------

So basically, in /etc/rc6.d/S01reboot you may need to add something like "/sbin/zfsctl stop" somewhere around here. It should be conditional, of course, on /sbin/zfsctl itself not being on a FUSE filesystem:
------------------------------
# First, try kexec. If that fails, fall back to rebooting the old way.
[ -n "$kexec_command" ] && $kexec_command -e -x >& /dev/null

HALTARGS="-d"
[ -f /poweroff -o ! -f /halt ] && HALTARGS="$HALTARGS -p"

exec $command $HALTARGS
----------------------------------
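To illustrate the sendsigs-style omission described in the problem statement, here is a minimal shell sketch. It assumes that daemons drop their PID files into /var/run/sendsigs.omit.d, the directory used by the commented-out lines in zfsctl. The function name build_omit_args and the OMIT_DIR variable are hypothetical and are not taken from sysvinit or Ubuntu's scripts. The sketch only builds the -o arguments and never calls killall5 itself.

```shell
#!/bin/sh
# Hypothetical helper: collect "-o PID" arguments for killall5 from a
# directory of PID files (Ubuntu-style sendsigs.omit.d convention, assumed).
OMIT_DIR=${OMIT_DIR:-/var/run/sendsigs.omit.d}

build_omit_args() {
    dir=$1
    args=""
    for pidfile in "$dir"/*; do
        # An empty directory leaves the glob unexpanded; skip non-files.
        [ -f "$pidfile" ] || continue
        pid=$(cat "$pidfile" 2>/dev/null)
        case "$pid" in
            ''|*[!0-9]*) continue ;;          # empty or non-numeric content
        esac
        # Skip stale PID files whose process no longer exists.
        kill -0 "$pid" 2>/dev/null || continue
        args="$args -o $pid"
    done
    # Drop the leading space so the result is "-o PID -o PID ...".
    printf '%s\n' "${args# }"
}
```

In a halt script, the output would be expanded unquoted so that each -o and each PID become separate words, for example killall5 -15 $(build_omit_args "$OMIT_DIR"). killall5 accepts -o more than once (see killall5(8)). Keeping the omission list in plain PID files means any daemon can opt out without the halt script knowing about it in advance.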
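One way to check the S01reboot condition (run "/sbin/zfsctl stop" only if /sbin/zfsctl is not on a FUSE filesystem) is sketched below. It rests on several assumptions:

- It reads a mount table in /proc/mounts format.
- It treats the filesystem types fuse, fuse.* and fuseblk as FUSE.
- It picks the longest mount point that contains the path.

The helper names fs_type_of and is_on_fuse are hypothetical. The sketch does not resolve symlinks, and it does not decode the octal escapes (such as \040 for a space) that /proc/mounts uses in mount points.

```shell
#!/bin/sh
# Hypothetical helpers: decide whether a path lives on a FUSE filesystem by
# reading a mounts table ("device mountpoint fstype options dump pass").

# Print the filesystem type of the mount that contains PATH.
# Usage: fs_type_of PATH [MOUNTS_FILE]   (MOUNTS_FILE defaults to /proc/mounts)
fs_type_of() {
    awk -v p="$1" '
        BEGIN { best = -1; type = "" }
        {
            mp = $2
            # mp contains p if it is "/", equals p, or is a directory
            # prefix of p (so /tank matches /tank/x but not /tankx).
            if (mp == "/" || p == mp || index(p, mp "/") == 1) {
                # ">=" lets a later entry at the same mount point win,
                # since later mounts stack on top of earlier ones.
                if (length(mp) >= best) { best = length(mp); type = $3 }
            }
        }
        END { print type }' "${2:-/proc/mounts}"
}

# Exit status 0 if PATH is on a FUSE filesystem, 1 otherwise.
is_on_fuse() {
    case "$(fs_type_of "$1" "$2")" in
        fuse|fuse.*|fuseblk) return 0 ;;
        *) return 1 ;;
    esac
}
```

In the halt script this might be used as: if [ -x /sbin/zfsctl ] && ! is_on_fuse /sbin/zfsctl; then /sbin/zfsctl stop; fi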
Instead of the whole script, could you please provide the shortest possible test case that fails for you?
No, no test case. Read the text of this bug and of the Launchpad bugs too, and then you'll understand why asking for a test case is dumb.
Rudd-O, does this only apply when the partition is '/' or '/usr', or does it also happen with other, less critical partitions, e.g. /mnt/disk? I really doubt that having ntfs-3g or ZFS as '/' or '/usr' is recommended or supported.
> I really doubt that having ntfs-3g or zfs as '/' or '/usr' is recommended or supported.

Well, that's the point: to MAKE IT supported. Maybe not at install time, but having the base OS work fine when booted from a userspace filesystem is the WHOLE POINT of this. The Ubuntu guys have made great strides in that direction, and my patch posted on Launchpad completes the circle (otherwise userspace filesystems hang when killall5 runs). So it's worthwhile to bring those improvements into Fedora as well.
OK, I'm leaving it to the maintainers of this component to decide what they want to do. Nothing further to do for bug triage.
This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This can probably be closed now, thanks to systemd's newer handling of processes started from the initramfs: they now only get killed after the final pivot_root.