Bug 1122431 - PXE-booted system with readonly rootfs on iscsi-target hangs on shutdown/reboot
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Reported: 2014-07-23 08:43 UTC by martinjilg
Modified: 2015-06-29 21:42 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-29 21:42:47 UTC


Attachments
Collection of logfiles (47.70 KB, application/zip), 2014-07-23 08:43 UTC, martinjilg
Logfile of the boot process (48.07 KB, text/plain), 2014-07-23 08:45 UTC, martinjilg
Command line used when creating the logs (275 bytes, text/plain), 2014-07-23 08:46 UTC, martinjilg
dmesg logs (35.41 KB, text/plain), 2014-07-23 08:47 UTC, martinjilg
messages logfile (103.16 KB, text/plain), 2014-07-23 08:47 UTC, martinjilg
Logfile of the shutdown process (42.17 KB, text/plain), 2014-07-23 08:48 UTC, martinjilg
Output of systemctl -l (20.39 KB, text/plain), 2014-07-23 08:49 UTC, martinjilg

Description martinjilg 2014-07-23 08:43:35 UTC
Created attachment 920167 [details]
Collection of logfiles

Description of problem:
We use PXE-boot and dracut-network functionalities to mount a read-only iscsi-target as rootfs for a diskless workstation.
The iscsi-target is a disk partition exported via the LIO-target on a Debian machine.
The kernel and initramfs used for the PXE-boot have been created by using the mentioned rootfs, which is Fedora 20.

Since the release of the first 3.13.** kernel, the system hangs on shutdown once all available updates are installed. The reported error messages seem to be related to the iSCSI initiator ("connection1:0: detected conn error (1011)"). Even after waiting more than 10 minutes, the system does not shut down.
After investigating the problem and following the advice for debugging shutdown problems, we believe that the problem is related to systemd.


Version-Release number of selected component (if applicable):
The last systemd/kernel combination that works properly for us is:
systemd-208-9.fc20
kernel-3.12.8-300.fc20.x86_64
Any newer versions show the described behavior.


How reproducible:
always

Steps to Reproduce:
1. Install Fedora 20 on a free disk partition
2. Provide disk partition as read-only iscsi-device
3. Run "yum remove NetworkManager" and "yum update"
4. Build initramfs for iscsi-support using dracut (hostonly=no)
5. Provide kernel and initramfs via pxelinux
6. PXE-boot client machine with iscsi target as rootfs
7. Shutdown client machine

Actual results:
The system hangs and won't shut down even after waiting several minutes.


Expected results:
The system properly shuts down after several seconds.


Additional info:

The problem also occurs when initiating a reboot sequence. 
Currently, we use "ExecStart=/usr/bin/systemctl --force poweroff -f" in the
"/usr/lib/systemd/system/systemd-poweroff.service" file as a workaround.

The commands 
sync && reboot -f
sync && poweroff -f
work as expected.

In the Fedora 20 rootfs, we use network.service instead of NetworkManager (uninstalled) to manage the network devices of the diskless workstation.

When enabling the debug-shell.service (which was active when creating the provided log files), the mentioned error message ("connection1:0: detected conn error (1011)") appears earlier, and the system eventually shuts down, usually after 3 - 4 minutes.
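
To compare when the error message shows up with and without debug-shell.service, the attached logs can be scanned for it. The following is a minimal sketch, not part of the original report: the function name find_conn_errors is hypothetical, and the pattern is derived only from the single message quoted above, so other iSCSI error messages will not be matched.

```python
import re

# Pattern built from the one message quoted in this report:
#   connection1:0: detected conn error (1011)
# The "1:0" part is kept as an opaque connection name; no meaning is
# assigned to the two numbers here.
CONN_ERROR = re.compile(r"connection(\d+:\d+): detected conn error \((\d+)\)")


def find_conn_errors(lines):
    """Return a list of (line_number, connection_name, error_code) tuples.

    lines -- any iterable of log lines, for example an open text file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = CONN_ERROR.search(line)
        if match:
            hits.append((number, match.group(1), int(match.group(2))))
    return hits
```

Passing an open log file (for instance the shutdown log attached to this bug) to find_conn_errors yields the line numbers at which the error was logged, which makes it easier to see how early in the shutdown sequence the connection was lost.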

The logs have been collected in the serial console of a qemu virtual machine which has been PXE-booted in a virtual network (hypervisor was the Debian machine exporting the iscsi target).

The Fedora 20 installation is maintained in a virtual machine hosted on the mentioned Debian machine. In this VM, the Fedora 20 rootfs is mounted read-write. We pass "readonlyroot" and the MAC address of the boot interface on the kernel command line of the diskless client during PXE boot.

Comment 1 martinjilg 2014-07-23 08:45:26 UTC
Created attachment 920169 [details]
Logfile of the boot process

Comment 2 martinjilg 2014-07-23 08:46:30 UTC
Created attachment 920170 [details]
Command line used when creating the logs

Comment 3 martinjilg 2014-07-23 08:47:10 UTC
Created attachment 920171 [details]
dmesg logs

Comment 4 martinjilg 2014-07-23 08:47:54 UTC
Created attachment 920172 [details]
messages logfile

Comment 5 martinjilg 2014-07-23 08:48:20 UTC
Created attachment 920173 [details]
Logfile of the shutdown process

Comment 6 martinjilg 2014-07-23 08:49:09 UTC
Created attachment 920174 [details]
Output of systemctl -l

Comment 7 Jóhann B. Guðmundsson 2014-07-23 09:56:38 UTC
AFAIK, "conn error (1011)" is a network-related problem of one form or another in the iSCSI world, and that not working properly can lead to a hang on shutdown. Anyway, have you narrowed the last known working systemd release down to exactly "systemd-208-9.fc20"? That update introduced two patches for #1026860, which were then removed again in 208-10 because the LVM rules had been updated. So 208-10 should just work, since it should be exactly like 208-8 (which would point to the updated LVM rules potentially being the culprit).

Comment 8 martinjilg 2014-07-23 12:01:16 UTC
We are currently using "systemd-208-19.fc20" and the problem persists, as it does for various other versions between 208-9 and 208-19. Unfortunately, we can't tell whether 208-10 specifically works, as we don't have this version of the package available, but we will check the behavior of 208-20.

Comment 9 Jóhann B. Guðmundsson 2014-07-23 12:04:26 UTC
You can fetch that release (208-10) and other releases in between from koji
http://koji.fedoraproject.org/koji/packageinfo?packageID=10477

Comment 10 martinjilg 2014-08-26 07:28:27 UTC
In the meantime we verified that everything works fine using
systemd 208-11 (208-10 was not available on koji) and systemd 208-20, 
so in contrast to our initial guess, the problem seems NOT to be caused 
by systemd (at least not by systemd alone).
Additionally, we found out that an update of the following packages 
does NOT reproduce our problem:

audit,clutter,dbus,device-mapper,dhclient,dracut,gtk3,ibus,initscripts,
kernel,libmount,libuuid,mutter,mutter-wayland,selinux-policy,xorg-x11,
iscsi-initiator-utils-iscsiuio

so it must be related to some other package. Next, we will check the 
behavior with the newest updates and when booting into the multiuser 
instead of the graphical target and see what's happening.
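
The package-by-package elimination described above could be sped up by halving the candidate list on each test run. The following is a minimal sketch under stated assumptions, not the procedure actually used here: find_culprit and the hangs_with callback are hypothetical names, and the approach only works if exactly one package triggers the hang on its own. If the hang needs a combination of packages (as "not by systemd alone" may suggest), the result is not reliable.

```python
def find_culprit(packages, hangs_with):
    """Narrow a list of candidate packages down to a single suspect.

    packages   -- names of packages whose update has not been ruled out
    hangs_with -- hypothetical callback: given a list of package names,
                  update only those on a test image, shut the client
                  down, and return True if the hang reproduces
    Assumes exactly one package in the list triggers the hang by itself.
    Returns None for an empty list.
    """
    candidates = list(packages)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # Test only the first half; if it does not hang, the suspect
        # must be in the second half under the single-culprit assumption.
        candidates = first if hangs_with(first) else second
    return candidates[0] if candidates else None
```

With this scheme a list of N remaining packages needs roughly log2(N) test boots instead of N, at the cost of the single-culprit assumption stated above.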

Comment 11 martinjilg 2014-09-02 06:40:04 UTC
Good news: Last Thursday (2014-08-28), we installed all updates and the problem vanished.

Comment 12 Fedora End Of Life 2015-05-29 12:26:56 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Fedora End Of Life 2015-06-29 21:42:47 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

