Bug 1058111

Summary: Kernel panic at power down when rootfs on NBD device
Product: [Fedora] Fedora
Reporter: Dag <den.mail>
Component: nbd
Assignee: Christopher Meng <i>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 20
CC: enslaver, gansalmon, harald, i, itamar, johannbg, jonathan, kernel-maint, lnykryn, madhu.chinakonda, msekleta, plautrba, systemd-maint, vpavlin, wtogami, xjakub, zbyszek
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1098499
Environment:
Last Closed: 2014-05-16 15:14:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1098499

Description Dag 2014-01-27 03:18:06 UTC
Description of problem:
My computer has no hard drive, so it boots via PXE over the network, and the root filesystem is then mounted over NBD at boot (via dracut-network's NBD support in the initramfs).
When I type "poweroff", the computer shuts down all services ... until it kills the nbd-client service, which takes the root filesystem with it. This leads to a kernel panic, and the computer then stays on forever with the kernel panic trace on screen.
This did not happen on Fedora 18 i686 PAE.
It seems that nbd-client gets killed too early in the poweroff procedure.


Version-Release number of selected component (if applicable):
nbd-3.6-1.fc20.i686
dracut-network-034-64.git20131205.fc20.1.i686
kernel-PAE-3.12.8-300.fc20.i686


PXE parameters:
label Fedora20_i686
menu label Fedora20 i686 PAE
kernel vmlinuz-3.12.8-300.fc20.i686+PAE
initrd initramfs-3.12.8-300.fc20.i686+PAE.img
append rw netroot=nbd:192.168.56.100:2014:ext3 root=UUID=8ad5f702-4ecb-4e5f-ac8a-912641c7d8f9 ip=dhcp vga=865 quiet selinux=0
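The `netroot=` value on the `append` line packs several settings into one colon-separated string. Assuming the field order given for NBD in dracut's `dracut.cmdline(7)` documentation (`nbd:<server>:<port/exportname>[:<fstype>[:<mountopts>[:<nbdopts>]]]`), the following minimal C sketch shows how that value breaks down. `parse_nbd_netroot()` and `struct nbd_netroot` are hypothetical names for illustration only; dracut does this parsing in its own shell scripts, not with this code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the pieces of an nbd netroot value.
 * Field order assumed from dracut.cmdline(7):
 *   nbd:<server>:<port/exportname>[:<fstype>[:<mountopts>[:<nbdopts>]]] */
struct nbd_netroot {
    char server[64];
    char port[64];       /* a port number or an export name */
    char fstype[32];     /* empty string if omitted */
    char mountopts[128]; /* empty string if omitted */
    char nbdopts[128];   /* empty string if omitted */
};

/* Copy the next ':'-delimited field from *p into dst (truncated to fit),
 * then advance *p past the delimiter, or set it to NULL at end of input.
 * Returns 1 if a field was read, 0 if the input was already exhausted. */
static int next_field(const char **p, char *dst, size_t size)
{
    if (*p == NULL)
        return 0;
    const char *colon = strchr(*p, ':');
    size_t len = colon ? (size_t)(colon - *p) : strlen(*p);
    if (len >= size)
        len = size - 1;
    memcpy(dst, *p, len);
    dst[len] = '\0';
    *p = colon ? colon + 1 : NULL;
    return 1;
}

/* Parse "nbd:server:port[:fstype[:mountopts[:nbdopts]]]".
 * Returns 0 on success, -1 if the prefix or a required field is missing.
 * Assumes a hostname or IPv4 address (no bracketed IPv6 literal). */
static int parse_nbd_netroot(const char *value, struct nbd_netroot *out)
{
    assert(value != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    if (strncmp(value, "nbd:", 4) != 0)
        return -1;
    const char *p = value + 4;
    if (!next_field(&p, out->server, sizeof out->server) || out->server[0] == '\0')
        return -1;
    if (!next_field(&p, out->port, sizeof out->port) || out->port[0] == '\0')
        return -1;
    /* Optional trailing fields: any that are absent stay empty strings. */
    next_field(&p, out->fstype, sizeof out->fstype);
    next_field(&p, out->mountopts, sizeof out->mountopts);
    next_field(&p, out->nbdopts, sizeof out->nbdopts);
    return 0;
}
```

Fed the value from the stanza above, `nbd:192.168.56.100:2014:ext3`, this yields server `192.168.56.100`, port `2014` and filesystem type `ext3`, with no mount or NBD options.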

How reproducible:
Always

Steps to Reproduce:
1. install Fedora on a mounted nbd target (ext3-formatted in my case)
2. chroot into the newly installed system and run: yum install dracut-network nbd
3. put the rootfs (nbd) UUID in /etc/fstab
4. dracut --kver 3.12.8-300.fc20.i686+PAE -f /boot/initramfs-3.12.8-300.fc20.i686+PAE.img
5. prepare another machine to serve TFTP and DHCP for PXE booting
6. prepare the PXE menu (as described above) on this other machine
7. boot the computer via PXE, it should find its NBD root device and mount it
8. login as root, then type poweroff

Actual results:
The computer shuts down all services until it kills nbd-client, thereby cutting off its root filesystem.
It then kernel panics with:

[ok] Reached target Unmount all filesystems
[ok] Stopped monitoring of LVM2 [...]
     Stopping LVM2 metadata daemon ...
[ok] Stopped LVM2 metadata daemon
[ok] starting restore /run/initramfs
[ok] reached target shutdown
[kernel.time] block nbd0: receive control failed (result -4)
[kernel.time] block nbd0: attempted send on closed socket
[kernel.time] end_request: I/O error, dev nbd0, sector 6680
[kernel.time] ---------[ cut here ] -------
[kernel.time] kernel bug at fs/buffer.c:3015!
[kernel.time] invalid opcode: 0000 [#1] SMP
[... kernel trace ...]
[kernel.time] ---------[ end trace ] --------
[kernel.time] kernel panic - not syncing: attempted to kill init! exitcode=0x0000000b


Expected results:
The system should shut down cleanly (disconnecting NBD at the very end of the procedure), and the computer should then power off.


Additional info:

Comment 1 Josh Boyer 2014-01-27 14:51:44 UTC
The kernel panicked because init died.  The init process is responsible for shutdown order, so perhaps the systemd guys have a suggestion here.
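The last line of the trace already records how init died. Assuming the `exitcode` printed in "attempted to kill init!" uses the same layout as a wait(2) status (this matches how the Linux kernel reports it, but treat it as an assumption here), the standard `<sys/wait.h>` macros can decode it. `init_death_signal()` is a hypothetical helper name used only for this sketch.

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

/* Decode the exitcode from "attempted to kill init! exitcode=0x...".
 * Assumption: the value is laid out like a wait(2) status, so the
 * WIFSIGNALED/WTERMSIG macros apply.
 * Returns the terminating signal number, or 0 for a normal exit. */
static int init_death_signal(int exitcode)
{
    /* A wait-style status fits in 16 bits: exit code in bits 8-15,
       signal number in bits 0-6, core-dump flag in bit 7. */
    assert(exitcode >= 0 && exitcode <= 0xffff);
    if (WIFSIGNALED(exitcode))
        return WTERMSIG(exitcode);
    return 0;
}
```

With the value from the trace, `init_death_signal(0x0000000b)` returns 11, i.e. `SIGSEGV`: PID 1 was killed by a signal rather than exiting on its own. Either way the kernel panics once init is gone, which is why the fix has to come from shutdown ordering rather than from the kernel.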

Comment 3 Lennart Poettering 2014-01-28 00:44:53 UTC
Well, if nbd wants to survive the final killing spree, then it needs to be run from the initrd and mark itself appropriately.

http://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/

Otherwise it will be killed, and thus the root fs goes away, and thus the init process might crash, which causes the kernel to panic.

Reassigning to nbd.
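The opt-out Lennart links to is keyed on argv[0]: per the RootStorageDaemons page, systemd's final killing spree skips a process whose argv[0] begins with '@'. The extra condition that only root-owned processes are honoured is an assumption drawn from systemd's source of that era, not from this thread. A minimal C sketch for checking a running process against that rule via /proc/<pid>/cmdline follows; `has_root_storage_marker()` and `spared_at_shutdown()` are hypothetical names, and this is not systemd's own code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Returns 1 if argv[0] of process `pid` starts with '@', 0 if it does not,
 * and -1 if /proc/<pid>/cmdline cannot be opened (no such process, /proc
 * not mounted, ...). The file holds argv as NUL-separated strings, so its
 * first byte is argv[0][0]. */
static int has_root_storage_marker(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/cmdline", (long)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int c = fgetc(f); /* EOF for kernel threads, whose cmdline is empty */
    fclose(f);
    return c == '@';
}

/* Sketch of the exemption rule as assumed above: the process must be
 * owned by root (uid supplied by the caller) and carry the '@' marker. */
static bool spared_at_shutdown(const char *argv0, uid_t uid)
{
    assert(argv0 != NULL);
    return uid == 0 && argv0[0] == '@';
}
```

On the affected machine, running the first helper against nbd-client's PID just before shutdown would return 0 as long as nbd-client is started without the marker, which is consistent with the behavior reported here.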

Comment 4 Christopher Meng 2014-01-28 09:16:57 UTC
I will backport a fix later, maybe on Thursday.

Comment 5 Fedora Update System 2014-02-04 03:08:27 UTC
nbd-3.7-2.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/nbd-3.7-2.fc20

Comment 6 Christopher Meng 2014-02-04 03:11:33 UTC
Sorry for the belated update; I have been celebrating Chinese New Year.

After this update, please look at the -m option and its man page entry, test whether it still crashes, and then leave karma in Bodhi. Or tell me directly here if you don't have a Fedora account.

Thanks!

Comment 7 Fedora Update System 2014-02-05 03:41:15 UTC
Package nbd-3.7-2.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing nbd-3.7-2.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-2024/nbd-3.7-2.fc20
then log in and leave karma (feedback).

Comment 8 Dag 2014-02-12 05:37:04 UTC
Hi,


I updated to nbd-3.7-2.fc20 from the Fedora 20 testing repository.

Then I updated the kernel to 3.12.10-300.fc20.i686+PAE.
Then I rebooted on the new kernel/initramfs.

I then ran "dracut --force -vv" to make sure the updated nbd-client was in the initramfs.

I then rebooted on the new initramfs: I still get the very same error on shutdown (poweroff/reboot).


So this update didn't fix the issue: nbd-client seemingly gets killed too early when shutting down, resulting in a kernel panic.


Regards,
Daggett

Comment 9 Harald Hoyer 2014-03-14 10:37:54 UTC
nbd-client is started in the dracut-initqueue.service from a shell script in the initramfs.

So a dracut patch is needed to add the "-m" option to the nbd-client call.

You can also mirror the behavior of mdmon.

        if (in_initrd()) {
                /*
                 * set first char of argv[0] to @. This is used by
                 * systemd to signal that the task was launched from
                 * initrd/initramfs and should be preserved during shutdown
                 */
                argv[0][0] = '@';
        }

...

int in_initrd(void)
{
        /* This is based on similar function in systemd. */
        struct statfs s;
        return  statfs("/", &s) >= 0 &&
                ((unsigned long)s.f_type == TMPFS_MAGIC ||
                 (unsigned long)s.f_type == RAMFS_MAGIC);
}

Comment 10 Fedora Update System 2014-03-21 02:29:39 UTC
nbd-3.8-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/nbd-3.8-1.fc20

Comment 11 Fedora Update System 2014-04-02 09:11:46 UTC
nbd-3.8-1.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 12 Dag 2014-04-06 12:23:18 UTC
Hi,

I updated the system with the new nbd package (nbd-3.8-1.fc20).

Then I updated the kernel to 3.13.8-200.fc20.i686+PAE.

Then I rebooted on the new kernel/initramfs.

Then I ran "dracut --force -vv" to make sure the updated nbd-client was in the initramfs.

I then rebooted on the new initramfs: I still get the very same error on shutdown (poweroff/reboot).


So this update didn't fix the issue: nbd-client seemingly gets killed too early when shutting down, resulting in a kernel panic.


As Harald Hoyer said, dracut may need an update.


I filed a new bug report, as I believe this is NOT an nbd issue anymore:

https://bugzilla.redhat.com/show_bug.cgi?id=1084763


Maybe this bug report could be closed as "fixed" for nbd?
Or reassigned to systemd?
Or reassigned to dracut?


regards,
Dag

Comment 13 Dag 2014-05-16 15:14:34 UTC
This problem was solved by Harald Hoyer in a commit on git.kernel.org.

see https://bugzilla.redhat.com/show_bug.cgi?id=1098499


Thanks, Harald!

Dag