Bug 753335

Summary: mdadm starts resync on imsm raid even in Normal state
Product: [Fedora] Fedora
Reporter: Peter Bieringer <pb>
Component: mdadm
Assignee: Jes Sorensen <Jes.Sorensen>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
CC: agk, dledford, harald, Jes.Sorensen, lukasz.dorau, madasafan, mads, mbroz, ncjeffgus, rhbug, zadeluca
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Clone Of: 748986
Last Closed: 2012-10-27 13:38:43 UTC

Attachments:

Description	Flags
halt, unfinished sync	none
halt after sync finished - but result in start of resync after pushing reset	none
patch for /usr/lib/dracut/modules.d/90mdraid/md-shutdown.sh	none

Description Peter Bieringer 2011-11-11 22:09:57 UTC

Version-Release number of selected component (if applicable):

FC16:

kernel-PAE-3.1.0-7.fc16.i686
mdadm-3.2.2-13.fc16.i686


+++ This bug was initially created as a clone of Bug #748986 +++

Description of problem:
mdadm starts a resync on an imsm RAID even though the BIOS reports the "Normal" state (this state was the result of a successful sync before the last poweroff).

Version-Release number of selected component (if applicable):
kernel-PAE-2.6.40.6-0.fc15.i686
mdadm-3.2.2-12.fc15.i686

(initramfs created using shown mdadm version)

How reproducible:
Always


Steps to Reproduce:
1. boot
2. let resync finish
3. poweroff
4. poweron
  
Actual results:
Resync starts again


Expected results:

# cat /proc/mdstat 
Personalities : [raid1] 
md127 : active raid1 sda[1] sdb[0]
      156288000 blocks super external:/md0/0 [2/2] [UU]
      [>....................]  resync =  4.9% (7779584/156288132) finish=64.6min speed=38294K/sec
      
md0 : inactive sdb[1](S) sda[0](S)
      5544 blocks super external:imsm
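
A quick way to check for this condition after each boot is to scan /proc/mdstat for a resync or recovery line. The following is a minimal sketch; the function name `resyncing_arrays` is made up for illustration, and it assumes the mdstat layout shown above.

```sh
#!/bin/sh
# Print the name of every md array that mdstat-style input (stdin)
# reports as resyncing or recovering.
# Illustrative helper only; it is not part of mdadm.
resyncing_arrays() {
    awk '
        /^md[0-9]+ :/          { array = $1 }   # remember the current array
        /(resync|recovery) *=/ { if (array != "") print array }
    '
}
```

Usage would be `resyncing_arrays < /proc/mdstat`; an empty result means no array is currently resyncing. Note that lines such as `resync=DELAYED` would also match.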
       

Additional info:

related dmesg report:

[    3.311658] dracut: Autoassembling MD Raid
[    3.324529] md: md0 stopped.
[    3.326070] md: bind<sda>
[    3.326143] md: bind<sdb>
[    3.326280] dracut: mdadm: Container /dev/md0 has been assembled with 2 drives
[    3.334575] md: md127 stopped.
[    3.334724] md: bind<sdb>
[    3.337308] md: bind<sda>
[    3.341079] md: raid1 personality registered for level 1
[    3.343813] bio: create slab <bio-1> at 1
[    3.346517] md/raid1:md127: not clean -- starting background reconstruction
[    3.349168] md/raid1:md127: active with 2 out of 2 mirrors
[    3.351725] md127: detected capacity change from 0 to 160038912000
[    3.355530]  md127: p1 p2 p3 p4 < p5 p6 p7 p8 >
[    3.482773] md: md127 switched to read-write mode.
[    3.485360] md: resync of RAID array md127
[    3.487793] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[    3.490229] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[    3.492691] md: using 128k window, over a total of 156288132k.
[    3.496305] dracut: mdadm: Started /dev/md127 with 2 devices
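
To spot the same condition from the kernel log, one can filter for the "not clean" message. This is only a sketch: `unclean_arrays` is a hypothetical helper name, and the pattern assumes the message format seen in the excerpt above.

```sh
#!/bin/sh
# Print the arrays that a kernel log (stdin) flagged as "not clean".
# The message format is taken from the dmesg excerpt in this report.
unclean_arrays() {
    sed -n 's/.*md\/raid[0-9]*:\(md[0-9]*\): not clean.*/\1/p'
}
```

Typical use would be `dmesg | unclean_arrays`.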

Is there any way to get more debug information on why mdadm detects md127 as not clean while the BIOS reported it as "Normal"?

--- Additional comment from Jes.Sorensen on 2011-10-26 07:08:57 EDT ---

Does this happen both if you are using the RAID for booting from and/or
if you have the RAID as a secondary device and boot off a regular partition?

--- Additional comment from pb on 2011-10-26 14:48:03 EDT ---

The RAID is my primary device, I can't easily configure it as second device...no other bootable disks in there.

But it looks like it has to do with the shutdown procedure.

Yesterday I needed to power off via Alt-SysRq (the system was hanging during "unmount", probably related to an autofs problem). Booting today (BIOS status: Normal), no resync was triggered. Rebooting again now will result in a resync (array detected as "not clean").

My system is too fast for me to follow the last messages before booting, but I think I saw something like "raid status stored" on Alt-SysRq poweroff, while nothing like that was seen on reboot.

--- Additional comment from Jes.Sorensen on 2011-10-26 15:11:05 EDT ---

Interesting - I was wondering if it was related to shutdown. The problem with
BIOS RAID is that the kernel cannot do the full shutdown on its own, so if
something is hanging and preventing mdadm/mdmon from doing their job, it could
result in some of the RAID metadata not being written out as it should.

--- Additional comment from pb on 2011-10-26 15:53:33 EDT ---

So, next normal shutdown triggered and power-on again -> resync starts again.

To me it looks like only an Alt-SysRq poweroff leaves a synced RAID in a proper state for the next power-on.

Normal power-off or reboot won't. Any suggestions for debugging this deeper?

Comment 1 Jes Sorensen 2011-12-07 15:24:01 UTC

Jeff,

You mentioned in the F15 version of this bug that you are also seeing this
problem. Could you provide details on your system please? kernel version,
how many drives, /proc/mdstat output?

Thanks,
Jes

Comment 2 Jeff Gustafson 2011-12-07 22:12:30 UTC

Jes Sorensen,

I initially had the problem with F15. I decided to bump to F16. The BIOS screen was showing "Verify" for the drives. I let the drives re-sync in Linux, and if I reboot the BIOS shows "Normal". Unfortunately, once Linux reboots it insists on resyncing the drives. If I reboot in the middle of the re-sync the BIOS goes back to "Verify".

I am running the current versions of everything. The exception is mdadm which is from the Fedora testing repo.

I found a handful of issues in Bugzilla that I think are the same as this bug. They all involve the isw driver.

Comment 3 Jes Sorensen 2011-12-14 12:02:20 UTC

Jeff,

Sorry for being dense here, but which driver are you referring to as the
'isw' driver?

Thanks,
Jes

Comment 4 Jeff Gustafson 2011-12-15 05:37:45 UTC

heh... my mistake. I meant imsm. Isn't isw a wireless driver!? ;)
BTW, the dracut in testing has made the situation bearable since the system doesn't get stuck on boot waiting for the RAID to become clean. The question is still why the RAID gets marked "Verify" in the first place.

Comment 5 Jes Sorensen 2011-12-15 10:37:30 UTC

Jeff,

No worries, I am still trying to recover from the flu here so my brain is a
bit slow :)

The fact that the drives get marked verify makes me think something isn't
being shut down correctly, so the metadata isn't written out to disk before
the reboot.

Why this is, I really don't know.

One thing that would be interesting to track down is whether this only
happens for people who upgraded from F15, or if it also happens with fresh
F16 installs - I haven't been able to reproduce it here myself so far.

Cheers,
Jes

Comment 6 Jeff Gustafson 2011-12-15 23:16:38 UTC

Jes,
If I let Linux finish syncing the disks and then shutdown, the BIOS will show "Normal". As soon as Linux boots, the drives will start syncing again. If I reboot in the middle of the process, the BIOS will show "Verify" which is the correct status since I interrupted the syncing.

I hope that helps.

Comment 7 Doug Ledford 2011-12-16 02:47:35 UTC

This is interesting: that the BIOS shows "Normal" and linux starts a resync anyway. The reason your problem is baffling is because of this.

So, let's see if we have the various conditions right:

1) Reboot using alt-sysrq (implies unclean reboot as no shutdown, but if the disks were quiet before the alt-sysrq was sent for more than about 5 seconds, then it also implies that mdmon would have already written out the clean bits and so it actually would be a clean shutdown as far as the BIOS IMSM RAID code is concerned), BIOS shows Normal, no resync when linux comes up active.

2) Reboot normally when the array is clean, the shutdown procedure should trigger mdmon to write out that the device is clean, BIOS reads Normal which seems to imply that mdmon did its job, but we start a resync once mdmon comes up on the next reboot.

3) Reboot normally when the array is busy resyncing, shutdown procedure checkpoints the array resync state, BIOS detects the resync in progress and prints Verify, mdmon restarts the resync once the device is brought up under linux.

#1 and #3 are correct behaviors. The only one that isn't is #2. And so in order to debug this further, what we really need is a basic dump of the IMSM superblock from between when the system was shut down and prior to the next boot up, which means booting off of CD/DVD and then using mdadm to get a dump of the superblock (I think the best we can do right now is mdadm -E /dev/sda and put the output in this bug report, although I hear one possible new feature of an upcoming mdadm is the ability to dump raw superblocks to a file and then restore them from file as well). That information should help shed some light on this issue.

Comment 8 Peter Bieringer 2011-12-18 15:59:34 UTC

Have extended initramfs with some scriptlets and found at least the reason for the resync:

scriptlet 1 shows superblock before RAID start:

# cat lib/dracut/hooks/pre-trigger/90mdraid-superblock.sh 
#!/bin/sh
# -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
# ex: ts=8 sw=4 sts=4 et filetype=sh

type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh

info "Show RAID superblock of /dev/sda"
info "$(mdadm -E /dev/sda)"
info "Show RAID superblock of /dev/sdb"
info "$(mdadm -E /dev/sdb)"
info "Sleep 40 seconds"
sleep 40


scriptlet 2 shows raid status:

# cat lib/dracut/hooks/pre-mount/10mdraid-info.sh 
#!/bin/sh
# -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
# ex: ts=8 sw=4 sts=4 et filetype=sh

type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh

info "/proc/mdstat information (sleep 20 seconds afterwards):"
info "$(cat /proc/mdstat)"
sleep 20


Analysis:

1. booted into runlevel 1
2. wait for resync finished
3. check mdadm -E /dev/sda and mdadm -E /dev/sdb
     Migrate State: idle
     Dirty State: clean (but sometimes also "dirty" for some seconds)
4. reboot (which hangs because of BZ#752593) finally with ALT-SYSRQ (sync, umount, boot)
5. reboot again in runlevel 1
=> everything is fine
6. reboot again (with ALT-SYSRQ) in runlevel 5
=> everything is fine
7. reboot again (triggered via GUI, works without ALT-SYSRQ)
=> RAID syncs again, "Migrate State" is now in "repair" state

Scriptlet 1 shows that "Dirty State" of both drives is "dirty", while "Migrate State" is "idle".

So it looks like "Migrate State" is the information shown by the BIOS, which here means "Normal".

But if "Dirty State" is "dirty", the resync starts and sets "Migrate State" to "repair".

Can it be that the shutdown triggered from runlevel 5 is too fast and leaves the RAID superblock in the "dirty" state?

Any hints on where I can add a "sleep" to create some idle time, or a check that waits until "Dirty State" is "clean" before the final reboot?

Here is the output of my properly synced RAID:

/dev/sda:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.1.00
    Orig Family : 92633fce
         Family : 92633fce
     Generation : 00041339
     Attributes : All supported
           UUID : ee***
       Checksum : 97b844f6 correct
    MPB Sectors : 1
          Disks : 2
   RAID Devices : 1

  Disk00 Serial : WD-WCANM2406979
          State : active
             Id : 00000000
    Usable Size : 312576264 (149.05 GiB 160.04 GB)


[OS]:
           UUID : 6b****
     RAID Level : 1
        Members : 2
          Slots : [UU]
    Failed disk : none
      This Slot : 0
     Array Size : 312576000 (149.05 GiB 160.04 GB)
   Per Dev Size : 312576264 (149.05 GiB 160.04 GB)
  Sector Offset : 0
    Num Stripes : 1221000
     Chunk Size : 64 KiB
       Reserved : 0
  Migrate State : idle
      Map State : normal
    Dirty State : clean
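
The kind of check asked about above could be sketched as follows: extract one field from `mdadm -E` output and poll until "Dirty State" reads "clean". Both function names are hypothetical, the field layout is assumed to match the dump above, and nothing in this report establishes that such a delay would actually cure the resync.

```sh
#!/bin/sh
# Print the value of one "Name : value" field from `mdadm -E` output (stdin).
# Only the first matching field is printed.
imsm_field() {
    awk -F' : ' -v name="$1" '
        { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
        key == name {
            val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val)
            print val; exit
        }
    '
}

# Poll a command until its output reports "Dirty State : clean".
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
# Returns 0 once clean, 1 if the attempts are exhausted.
wait_until_clean() {
    tries=$1; delay=$2; shift 2
    while [ "$tries" -gt 0 ]; do
        state=$("$@" | imsm_field "Dirty State")
        [ "$state" = "clean" ] && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

A call such as `wait_until_clean 30 1 mdadm -E /dev/sda` would wait up to roughly 30 seconds; `mdadm -E /dev/sda | imsm_field "Migrate State"` prints the migrate state on its own.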

Comment 9 Jes Sorensen 2012-02-17 13:27:43 UTC

Peter,

Sorry for letting this slip through the cracks. The fact that you hit
this problem during normal reboot, makes me think it could be related
to the unrolling back to the initramfs bug that we have discussed in
other BZs. It would be interesting to see if this issue goes away with
the fixes we pushed into rawhide/f17 for that.

I will try to see if I can do a test install to try those bits out
soonish.

Cheers,
Jes

Comment 10 Jes Sorensen 2012-03-28 15:59:40 UTC

Peter,

Harald has pushed dracut-013-22.fc16 into updates testing. I am hopeful
it should make this problem go away combined with the latest mdadm and
systemd packages.

https://admin.fedoraproject.org/updates/dracut-013-22.fc16

Please give it a spin if you can.

Cheers,
Jes

Comment 11 Peter Bieringer 2012-04-15 20:07:59 UTC

After changing dracut from dracut-017-1.fc17 to dracut-013-22.fc16 (I did not run tests after that update) and a normal kernel update to 3.3.1-5.fc16.i686.PAE, the problem reappeared.

After crossgrading to dracut-017-1.fc17 again and updating the ramdisk using "dracut --force", the problem disappeared after the next reboot.

=> dracut-013-22.fc16 did not solve the issue

Comment 12 Peter Bieringer 2012-04-24 18:51:52 UTC

Downgrading to dracut-017-1.fc17 did not really help anymore; I have also tried dracut-017-62.git20120322.fc17.

I can't remember what I did between the beginning of April (kernel update to 3.3.0-8) and April 7 (kernel update to 3.3.1-2), but since then the always-resync on boot has struck back.

At least no updates via yum were done.

Today's scenario:

systemd-37-19.fc16.i686
mdadm-3.2.3-6.fc16.i686
dracut-017-1.fc17
kernel update to 3.3.2-6.fc16.i686.PAE

wait until sync is finished -> reboot -> resync started again because of "md/raid1:md127: not clean -- starting background reconstruction" (which is wrongly detected because BIOS still reports "NORMAL").

Any new hints available?

Comment 13 Jes Sorensen 2012-04-25 15:24:11 UTC

Peter,

This is really odd - are you sure you're booting with the right newly
updated initramfs?

What does mdmon look like if you do 'ps -aux|grep mdmon' post boot? It should
have a @ in the process name.

Cheers,
Jes

Comment 14 Peter Bieringer 2012-04-25 17:14:38 UTC

Yes, initramfs is freshly built:

# rpm -qa --last |more
kernel-tools-3.3.2-6.fc16                     Di 24 Apr 2012 20:07:25 CEST
ruby-libs-1.8.7.358-1.fc16                    Di 24 Apr 2012 20:07:24 CEST
kernel-PAE-3.3.2-6.fc16                       Di 24 Apr 2012 20:07:19 CEST
sssd-1.8.2-10.fc16                            Di 24 Apr 2012 20:07:10 CEST
libpng-devel-1.2.49-1.fc16                    Di 24 Apr 2012 20:07:09 CEST
sssd-client-1.8.2-10.fc16                     Di 24 Apr 2012 20:07:08 CEST
libpng-1.2.49-1.fc16                          Di 24 Apr 2012 20:07:08 CEST
libipa_hbac-1.8.2-10.fc16                     Di 24 Apr 2012 20:07:07 CEST
kernel-headers-3.3.2-6.fc16                   Di 24 Apr 2012 20:07:06 CEST
kernel-PAE-devel-3.3.2-6.fc16                 Di 24 Apr 2012 20:04:08 CEST
yum-plugin-fastestmirror-1.1.31-2.fc16        Di 24 Apr 2012 20:00:50 CEST
dracut-017-1.fc17                             Di 24 Apr 2012 19:04:49 CEST
mdadm-3.2.3-6.fc16                            Di 24 Apr 2012 18:58:20 CEST
hardlink-1.0-12.fc16                          Mo 23 Apr 2012 21:50:27 CEST
systemd-sysv-37-19.fc16                       So 22 Apr 2012 22:16:19 CEST
systemd-37-19.fc16                            So 22 Apr 2012 22:16:18 CEST
systemd-units-37-19.fc16                      So 22 Apr 2012 22:16:16 CEST

# ll /boot/initramfs-3.3.2-6.fc16.i686.PAE.img 
-rw-r--r-- 1 root root 16977314 24. Apr 20:08 /boot/initramfs-3.3.2-6.fc16.i686.PAE.img


And ps -aux|grep mdmon shows nothing, but

# ps axwww|grep md
   47 ?        S<     0:00 [md]
   57 ?        SN     0:00 [ksmd]
  432 ?        S      0:02 [md127_raid1]
  436 ?        SLsl   0:00 @dmon --offroot md0
  445 ?        D      0:01 [md127_resync]
  623 ?        S      0:00 [jbd2/md127p1-8]
  699 ?        Ss     0:00 /lib/systemd/systemd-stdout-syslog-bridge
 1212 ?        S      0:00 [md1_raid1]
 1338 ?        S      0:00 [jbd2/md1-8]
 1358 ?        S      0:00 [jbd2/md127p8-8]
 1434 ?        Ss     0:00 /lib/systemd/systemd-logind
 1458 ?        Ss     0:00 /bin/dbus-daemon --system --address=systemd: --nofork --systemd-activation
 2598 pts/0    S+     0:00 grep --color=auto md

=> mdmon is shown as @dmon
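
Because the renamed process no longer matches a plain `grep mdmon`, a slightly looser pattern is needed. The sketch below assumes that mdmon replaces the first character of its process name with "@" (so `mdmon` shows up as `@dmon`, and a full path would show up as e.g. `@sbin/mdmon`); that reading and the function name are assumptions, not something confirmed in this thread.

```sh
#!/bin/sh
# Succeed if `ps` output on stdin contains an mdmon process whose name
# starts with "@", as in "@dmon --offroot md0" above.
mdmon_marked_offroot() {
    grep -Eq '(^|[[:space:]])@[^[:space:]]*dmon([[:space:]]|$)'
}
```

For example: `ps axww | mdmon_marked_offroot || echo "no @-prefixed mdmon found"`.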

Comment 15 Harald Hoyer 2012-04-26 11:15:08 UTC

(In reply to comment #14)
> Yes, initramfs is freshly built:

Do you see at shutdown:

Unmounted ....
Unmounted ....
Unmounted /oldroot/sys
Unmounted /oldroot/dev
Unmounted /oldroot

???

You might have to disable plymouth to see the console output:

# for i in halt kexec poweroff reboot; do systemctl mask plymouth-${i}.serice;done

Comment 16 Harald Hoyer 2012-04-26 11:16:17 UTC

(In reply to comment #15)
> (In reply to comment #14)
> > Yes, initramfs is freshly built:
> 
> Do you see at shutdown:
> 
> Unmounted ....
> Unmounted ....
> Unmounted /oldroot/sys
> Unmounted /oldroot/dev
> Unmounted /oldroot
> 
> ???
> 
> You might have to disable plymouth to see the console output:
> 
> # for i in halt kexec poweroff reboot; do systemctl mask
> plymouth-${i}.serice;done

# for i in halt kexec poweroff reboot; do \
     systemctl mask plymouth-${i}.service; \
 done

Comment 17 Peter Bieringer 2012-04-26 18:54:07 UTC

Wait until resync finished.

Used now "halt" instead of "poweroff" to eye-copy last lines to here:

Umounting file systems
Unmounted /sys/fs/fuse/connections
Unmounted /proc/fs/nfsd
Unmounted /var/lib/nfs/rpc_pipefs
Unmounted /sys/kernel/debug
Unmounted /dev/mqueue
Unmounted /sys/kernel/security
Unmounted /dev/hugepages
[ time] EXT4-fs (md127p1): re-mounted. Opts: (null)
Disabling swaps
Detaching loop devices
Detaching DM devices
[ time] sd .... Synchronizing SCSI cache
[ time] sd .... Stopping disk

So I do not see any "Unmount /oldroot..." at all currently

Hit reset button, BIOS reports "Normal"

System comes up, starts resync again.

Use "reboot" to see any differences: nothing except that the RAID sync checkpoint is somehow stored.

After this reboot, BIOS reports "VERIFY" (expected)

Comment 18 Harald Hoyer 2012-04-27 09:30:39 UTC

(In reply to comment #17)
> Wait until resync finished.
> 
> Used now "halt" instead of "poweroff" to eye-copy last lines to here:
> 
> Umounting file systems
> Unmounted /sys/fs/fuse/connections
> Unmounted /proc/fs/nfsd
> Unmounted /var/lib/nfs/rpc_pipefs
> Unmounted /sys/kernel/debug
> Unmounted /dev/mqueue
> Unmounted /sys/kernel/security
> Unmounted /dev/hugepages
> [ time] EXT4-fs (md127p1): re-mounted. Opts: (null)
> Disabling swaps
> Detaching loop devices
> Detaching DM devices
> [ time] sd .... Synchronizing SCSI cache
> [ time] sd .... Stopping disk
> 
> So I do not see any "Unmount /oldroot..." at all currently

oh... what's the output of:

# ls -l /run/initramfs/shutdown
# for i in /etc/dracut.conf /etc/dracut.conf.d/*; do \
     echo; echo $i; echo; cat $i; \
  done

Comment 19 Harald Hoyer 2012-04-27 09:31:50 UTC

and
# ls -l /lib/systemd/system/shutdown.target.wants/

Comment 20 Peter Bieringer 2012-04-29 09:31:52 UTC

# LC_ALL=C ls -l /run/initramfs/shutdown
ls: cannot access /run/initramfs/shutdown: No such file or directory
 

# LC_ALL=C ls -l /run/initramfs/
total 4
-rw-r--r-- 1 root root 2 Apr 29  2012 root-fsck


# ls -l /lib/systemd/system/shutdown.target.wants/
insgesamt 4
-rw-r--r-- 1 root root 207 31. Jan 09:53 alsa-store.service
lrwxrwxrwx 1 root root  35 22. Apr 22:16 systemd-random-seed-save.service -> ../systemd-random-seed-save.service
lrwxrwxrwx 1 root root  39 22. Apr 22:16 systemd-update-utmp-shutdown.service -> ../systemd-update-utmp-shutdown.service


# rpm -qf /etc/dracut.conf
dracut-017-1.fc17.noarch


# rpm -V dracut
(no result)


# for i in /etc/dracut.conf /etc/dracut.conf.d/*; do      echo; echo $i; echo; cat $i;   done

/etc/dracut.conf

# Sample dracut config file

logfile=/var/log/dracut.log
fileloglvl=6

# Exact list of dracut modules to use.  Modules not listed here are not going
# to be included.  If you only want to add some optional modules use
# add_dracutmodules option instead.
#dracutmodules+=""

# Dracut modules to omit
#omit_dracutmodules+=""

# Dracut modules to add to the default
#add_dracutmodules+=""

# additional kernel modules to the default
#add_drivers+=""

# list of kernel filesystem modules to be included in the generic initramfs
#filesystems+=""

# build initrd only to boot current hardware
#hostonly="yes"
#

# install local /etc/mdadm.conf
mdadmconf="yes"

# install local /etc/lvm/lvm.conf
lvmconf="yes"

# A list of fsck tools to install. If it's not specified, module's hardcoded
# default is used, currently: "umount mount /sbin/fsck* xfs_db xfs_check
# xfs_repair e2fsck jfs_fsck reiserfsck btrfsck". The installation is
# opportunistic, so non-existing tools are just ignored.
#fscks=""

# inhibit installation of any fsck tools
#nofscks="yes"

/etc/dracut.conf.d/01-dist.conf

# Dracut config file customized for RedHat/Fedora.

# i18n
i18n_vars="/etc/sysconfig/keyboard:KEYTABLE-KEYMAP /etc/sysconfig/i18n:SYSFONT-FONT,FONTACM-FONT_MAP,FONT_UNIMAP"
add_dracutmodules+=" rpmversion "
omit_dracutmodules+=" dash "
omit_drivers+=" .*/fs/ocfs/.* "
stdloglvl=3
realinitpath="/usr/lib/systemd/systemd"
install_items+=" vi /etc/virc ps grep cat rm openvt "

/etc/dracut.conf.d/1

#!/bin/bash

#set -x

if [ -z "$1" ]; then
	echo "ERROR : missing version (arg1)"
	exit 1
fi

version="$1"

initramfs="/boot/initramfs-$version.img"

if [ ! -f "$initramfs" ]; then
	echo "ERROR : given version has no initramfs: $initramfs"
	exit 1
fi

dir_initramfs="/tmp/initramfs-$version"

if [ -d "$dir_initramfs" ]; then
	echo "ERROR : directory already exists: $dir_initramfs"
	exit 1
fi

echo "INFO  : extract $initramfs to $dir_initramfs"

mkdir "$dir_initramfs" || exit 1

pushd "$dir_initramfs" || exit 1

zcat "$initramfs" | cpio -i || exit 1

popd || exit 1

Comment 21 Peter Bieringer 2012-04-29 10:39:17 UTC

Downgraded to dracut-013-22.fc16.noarch, rebuilt the initramfs, and rebooted 2 times; the issue is still the same. Took 2 photos of the "halt" screen, perhaps this helps.

Comment 22 Peter Bieringer 2012-04-29 10:40:56 UTC

Created attachment 581037 [details]
halt, unfinished sync

Comment 23 Peter Bieringer 2012-04-29 10:42:15 UTC

Created attachment 581045 [details]
halt after sync finished - but result in start of resync after pushing reset

Comment 24 Peter Bieringer 2012-05-02 05:13:59 UTC

next strange case:
- booted today (after yesterday shutdown)
  => raid still in sync
- reboot
  => raid start syncing again

=> can it be that there is a timing issue on shutdown?

Comment 25 Harald Hoyer 2012-05-02 11:24:11 UTC

Please try:

https://admin.fedoraproject.org/updates/FEDORA-2012-6603/dracut-018-26.git20120424.fc17

Comment 26 Peter Bieringer 2012-05-02 17:46:42 UTC

dracut-018-26.git20120424.fc17 does not fix the issue. The "halt" output looks fine (none of the problems shown in the screenshot), but after pushing the reset button, while the BIOS still reports the RAID status as Normal, the resync starts again.

I did the following:

1. boot
2. update dracut
3. rebuild initramfs
4. reboot (raid still syncing)
5. wait until raid sync finished
6. halt
7. push reset button
8. raid resync after boot starts again

Are there any more debugging options available?

To me it looks like fast debugging turnarounds are no longer possible now that most shutdown tasks are programmed in C :-(

Comment 27 Harald Hoyer 2012-05-03 12:04:51 UTC

This should give you a shell after systemd shutdown and pivot to the /run/initramfs was done:

# mkdir -p /run/initramfs/etc/cmdline.d
# echo "rd.break=pre-shutdown rd.break=shutdown" >> /run/initramfs/etc/cmdline.d/shutdown.conf

After you exit the shell in pre-shutdown, you will get another shell after the /oldroot was unmounted and all devices are disassembled.

Comment 28 Peter Bieringer 2012-05-04 18:00:29 UTC

(In reply to comment #27)
> This should give you a shell after systemd shutdown and pivot to the
> /run/initramfs was done:
> 
> # mkdir -p /run/initramfs/etc/cmdline.d
> # echo "rd.break=pre-shutdown rd.break=shutdown" >>
> /run/initramfs/etc/cmdline.d/shutdown.conf
> 
> After you exit the shell in pre-shutdown, you will get another shell after the
> /oldroot was unmounted and all devices are disassembled.

This does not seem to work, neither with the latest fc16 dracut nor with the fc17 dracut from above... does this not work with "halt"? And how can it also be enabled for "reboot"?

Comment 29 Peter Bieringer 2012-05-05 08:35:16 UTC

Rescue shells are now activated by adding rd.break=pre-shutdown rd.break=shutdown to the grub kernel options.

1) on dracut-013-22 (f16) and dracut-018, the shell now also appears during boot... minor bug?

2) on dracut-018, the rescue shell did not appear, and the hooks are not executed (they are missing in /run/initramfs)

=> is this caused by incompatibilities with util-linux 2.20?

So with dracut-013-22:

3) /dev/md127p1 is still mounted (ro), also in the second rescue shell; mdmon is still running and has a lot of files still open.

So now I am wondering what should really happen. As long as mdmon is running, the device probably can't be unmounted at all.

Is there a description of how it should work in theory?

Can you provide an updated dracut for fc16?

Comment 30 Harald Hoyer 2012-05-07 07:24:44 UTC

(In reply to comment #29)
> 2) on dracut-018, rescue shell did not appear, also the hooks are not executed
> (they are missing in /run/initramfs)
> 
> => is this caused by incompatibilities with util-linux 2.20?)

Well, do you use systemd? It works for me on my F17. I tested it before posting it here.

$ rpm -q systemd util-linux
systemd-44-8.fc17.x86_64
util-linux-2.21.1-1.fc17.x86_64


oh, older util-linux might still have the setsid bug.

Comment 31 Jes Sorensen 2012-05-30 06:54:37 UTC

Peter,

Any chance you could answer Harald's question in comment #30?

Thanks,
Jes

Comment 32 Peter Bieringer 2012-05-31 20:58:07 UTC

Now that F17 is released, I will first try to upgrade a second time and see whether the issue is gone then... The first upgrade attempt did not work properly because grub2 is too big to be stored inside the partition... and the system is a multi-boot system, so I cannot easily introduce a dedicated /boot partition :-(

Comment 33 Peter Bieringer 2012-06-05 20:12:54 UTC

Happens on a freshly installed F17 as well :-(

Comment 34 Peter Bieringer 2012-06-27 05:27:00 UTC

The problem is still present on an updated, freshly installed F17. Are there any new hints on how to debug or fix it?

Comment 35 Jes Sorensen 2012-06-27 06:13:44 UTC

Peter,

You didn't answer Harald's previous question - are you using systemd or
sysvinit?

Thanks,
Jes

Comment 36 Peter Bieringer 2012-06-27 06:40:01 UTC

(In reply to comment #35)
> Peter,
> 
> You didn't answer Harald's previous question - are you using systemd or
> sysvinit?

F17 default = systemd

# rpm -q systemd util-linux
systemd-44-14.fc17.x86_64
util-linux-2.21.2-1.fc17.x86_64

# ps ax |grep systemd
    1 ?        Ss     0:06 /usr/lib/systemd/systemd
  523 ?        Ss     0:00 /usr/lib/systemd/systemd-journald
 1119 ?        Ss     0:00 /usr/lib/systemd/systemd-logind
 1141 ?        Ssl    0:00 /bin/dbus-daemon --system --address=systemd: --nofork --systemd-activation
 5959 pts/1    S+     0:00 grep --color=auto systemd

But is it possible in F17 to switch from systemd to sysvinit? I don't need this fast startup... which currently always results in a slow system for > 1h because of the RAID resync...

Comment 37 Jes Sorensen 2012-06-27 08:16:27 UTC

Peter,

Do you happen to have other storage controllers in the system, like an
mpt one etc?

Harald, any ideas? It looks like systemd is in place correctly, and
mdmon is launched with --offroot as expected.

Thanks,
Jes

Comment 38 Harald Hoyer 2012-06-27 10:05:06 UTC

Created attachment 594742 [details]
patch for /usr/lib/dracut/modules.d/90mdraid/md-shutdown.sh

please update to 
https://admin.fedoraproject.org/updates/FEDORA-2012-9847/dracut-018-78.git20120622.fc17

and apply the patch to /usr/lib/dracut/modules.d/90mdraid/md-shutdown.sh

then recreate the initramfs:

# dracut -f

Might help with the shutdown logic.

Comment 39 Peter Bieringer 2012-06-29 16:55:02 UTC

The latest suggestions don't help; it looks like even with a fresh F17 install I'm still stuck on the same issue as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=753335#c17

The md-shutdown.sh scriptlet is not executed at all. I put a sleep into the script and searched for info messages... there was no sleep during reboot and nothing was seen on halt.

I've inspected the initramfs; the modified script is inside.

# LANG=C ls -l /run/initramfs/shutdown
ls: cannot access /run/initramfs/shutdown: No such file or directory

# LANG=C ls -l /lib/systemd/system/shutdown.target.wants/
total 0
lrwxrwxrwx. 1 root root 21  5. Jun  21:14 alsa-store.service -> ../alsa-store.service
lrwxrwxrwx. 1 root root 26 28. Jun  07:44 dracut-shutdown.service -> ../dracut-shutdown.service
lrwxrwxrwx. 1 root root 35 21. Jun  07:59 systemd-random-seed-save.service -> ../systemd-random-seed-save.service
lrwxrwxrwx. 1 root root 39 21. Jun  07:59 systemd-update-utmp-shutdown.service -> ../systemd-update-utmp-shutdown.service

Something really strange here.

Comment 40 Harald Hoyer 2012-07-05 09:23:02 UTC

(In reply to comment #39)
> The latest suggestions don't help; it looks like even with a fresh F17 install I'm
> still stuck on the same issue as mentioned in
> https://bugzilla.redhat.com/show_bug.cgi?id=753335#c17
> 
> the md-shutdown.sh scriptlet is not executed at all, I've put a sleep into
> the script and search for infos...no sleep during reboot and nothing seen on
> halt.
> 
> I've investigated initramfs, modified script is inside.
> 
> # LANG=C ls -l /run/initramfs/shutdown
> ls: cannot access /run/initramfs/shutdown: No such file or directory
> 
> # LANG=C ls -l /lib/systemd/system/shutdown.target.wants/
> total 0
> lrwxrwxrwx. 1 root root 21  5. Jun  21:14 alsa-store.service ->
> ../alsa-store.service
> lrwxrwxrwx. 1 root root 26 28. Jun  07:44 dracut-shutdown.service ->
> ../dracut-shutdown.service
> lrwxrwxrwx. 1 root root 35 21. Jun  07:59 systemd-random-seed-save.service
> -> ../systemd-random-seed-save.service
> lrwxrwxrwx. 1 root root 39 21. Jun  07:59
> systemd-update-utmp-shutdown.service ->
> ../systemd-update-utmp-shutdown.service
> 
> Something really strange here.

Well, with the new dracut we unpack the initramfs to /run/initramfs later on, if needed, with the dracut-shutdown.service.

you should have "/run/initramfs/.need_shutdown" now

Comment 41 rhbug 2012-08-04 20:58:51 UTC

been having this same issue

> you should have "/run/initramfs/.need_shutdown" now

doesn't get created for me but doing so manually fixes the resync problem for me:

echo @reboot root touch /run/initramfs/.need_shutdown >> /etc/crontab

Comment 42 Peter Bieringer 2012-08-05 15:14:42 UTC

(In reply to comment #41)
> been having this same issue
> 
> > you should have "/run/initramfs/.need_shutdown" now
> 
> doesn't get created for me but doing so manually fixes the resync problem
> for me:
> 
> echo @reboot root touch /run/initramfs/.need_shutdown >> /etc/crontab

Thank you for this wonderful workaround. I'm now using a slightly different one to avoid modifying /etc/crontab:

# cat /etc/cron.d/initramfs 
@reboot root /bin/touch /run/initramfs/.need_shutdown

BTW: this workaround won't help on F16.
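
Before relying on the workaround, it may help to check whether the marker file is already present on a running system. A minimal sketch, assuming the path named in comment 40; the function name and the optional directory argument are illustrative only.

```sh
#!/bin/sh
# Succeed if dracut's shutdown marker exists under the given directory
# (defaults to /run/initramfs, the location mentioned in comment 40).
need_shutdown_present() {
    [ -e "${1:-/run/initramfs}/.need_shutdown" ]
}
```

For example: `need_shutdown_present || echo "marker missing, shutdown hooks may not run"`.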

Comment 43 Jes Sorensen 2012-10-24 13:41:51 UTC

There has been no activity on this bug for a couple of months now - does this
mean the issue has been resolved with the latest dracut updates?

Thanks,
Jes

Comment 44 rhbug 2012-10-27 10:39:02 UTC

After removing previously mentioned workaround I can confirm this has now been fixed for me (running dracut-018-105.git20120927.fc17.noarch).

thanks!

Comment 45 Peter Bieringer 2012-10-27 12:16:26 UTC

(In reply to comment #44)
> After removing previously mentioned workaround I can confirm this has now
> been fixed for me (running dracut-018-105.git20120927.fc17.noarch).

I can also confirm that after disabling the @reboot cron entry everything still works fine on F17 with all updates applied.

Comment 46 zadeluca 2013-12-02 16:45:39 UTC

Hi,

I'm having the same problem with Fedora 19, should I continue here or create a new bug? (I also found https://bugzilla.redhat.com/show_bug.cgi?id=996475 but I am not using a livecd)

I don't fully understand what happened in comments 41-42, do I need to do something to fix this or was it expected to be fixed in an update? My system is currently up to date on all packages.

Thanks,
Zach