Bug 700010 - libvirt does not logout of iscsi targets, causing system hang on shutdown
Summary: libvirt does not logout of iscsi targets, causing system hang on shutdown
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Eric Blake
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-04-27 09:36 UTC by Hans de Goede
Modified: 2013-03-04 13:30 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-04 13:30:23 UTC


Attachments (Terms of Use)
Diff, showing the additional startup dependency. (339 bytes, patch)
2013-02-28 06:05 UTC, Fritz Elfert

Description Hans de Goede 2011-04-27 09:36:05 UTC
Hi,

It took me a while to figure out what exactly is going on here. I export 2 "disks" on my system as iSCSI targets through tgtd. I also have my system log in to these, so that they are available as:
/dev/disk/by-path/ip-192.168.1.100:3260-iscsi-iqn.localdomain.shalem:...

This way they have identical paths on all my systems, which is handy for running virtual machines on them and migrating them.

With the new libvirt support for SPICE, I've switched to using libvirt + virt-manager for starting my VMs rather than a bunch of scripts. So I've added the 2 iSCSI targets to my libvirt storage pool.

As mentioned, my system was already configured to log in to these disks. This is done by the iscsi service (a SysV initscript), which logs in to all
nodes in the local iscsidb whose node.startup setting is automatic.

After adding the 2 disks to the libvirt storage pool, my system would no longer shut down (it would hang at the end instead). This is caused by the combination of systemd and libvirt doing some (IMHO) undesirable things with respect to iSCSI.

There are 2 different possible scenarios here:
1) The iscsi service is started first, then the libvirtd service. In this case
   everything works: the iscsi service logs into the iSCSI targets, libvirt
   can use them but does not otherwise touch them, and on shutdown the iscsi
   service logs out of them again.

2) The libvirtd service is started first. In this case libvirt logs into
   the targets itself and *changes* their node.startup setting, so that it
   is no longer automatic.

   Stopping libvirt, however, does not log out of the nodes (nor restore
   their node.startup setting).

   Because of the changed node.startup setting, the iscsi service does not
   create /var/lock/subsys/iscsi: there are no nodes with
   node.startup=automatic, and thus there is nothing for it to do when
   stopped.

   On shutdown, libvirt does not log out of the nodes, as mentioned above,
   and neither does the iscsi service, because there is neither a
   /var/lock/subsys/iscsi nor any target with node.startup=automatic.

   So the iSCSI nodes stay logged in. When asked to stop, the tgtd SysV
   script exits with an error because the node still has users, at which
   point the system hangs, with systemd waiting indefinitely for the tgtd
   process to exit.
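A system that is already in the scenario 2 state can be cleaned up by hand before shutdown (after the guests using the disks have been stopped). The following is a minimal bash sketch, assuming the standard open-iscsi `iscsiadm` tool. The function name `restore_node`, the `ISCSIADM` override variable, and the target/portal values are hypothetical; the real IQN is elided in this report, and the report does not say which non-automatic value libvirt left in node.startup. The sketch sets node.startup back to automatic, so the iscsi initscript manages the node again, and then logs out of the session libvirt left open.

```bash
#!/usr/bin/env bash
# Sketch: undo the scenario 2 leftovers for a single iSCSI node.
# ISCSIADM can be overridden (e.g. with a stub) for dry runs; it
# defaults to the open-iscsi command-line tool.
ISCSIADM=${ISCSIADM:-iscsiadm}

# restore_node TARGET PORTAL
#   TARGET: target IQN (a hypothetical placeholder in the usage below)
#   PORTAL: ip:port of the portal, e.g. 192.168.1.100:3260
restore_node() {
    local target=$1 portal=$2

    # Hand the node back to the iscsi initscript, which only logs in to
    # (and on shutdown logs out of) nodes with node.startup=automatic.
    "$ISCSIADM" -m node -T "$target" -p "$portal" \
        -o update -n node.startup -v automatic || return

    # Log out of the session libvirt opened but never closed, so tgtd
    # no longer has users when it is asked to stop.
    "$ISCSIADM" -m node -T "$target" -p "$portal" --logout
}
```

`iscsiadm -m session` lists the sessions that are still logged in; each can then be passed to the helper as root, e.g. `restore_node iqn.example:disk1 192.168.1.100:3260` (hypothetical IQN).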


In scenario 2 there is, IMHO, a bug in libvirt: if it has logged into the iSCSI
nodes, it should also log out of them. It also should not change the
node.startup setting.

My system was actually hitting scenario 2, because systemd seems to parallelize the starting of classic SysV init scripts unless their LSB headers have Required-Start or Should-Start entries indicating that another service should be started first.

I've fixed things on my system for now by adding the following lines to the LSB header of the libvirtd init script:
Should-Start: iscsi 
Should-Start: tgtd
Should-Stop: iscsi
Should-Stop: tgtd

This switches things to scenario 1. Given the large difference in startup priorities in the pre-systemd world, I assume that is how things are supposed to work. Note that I also added Should-Start/Should-Stop: tgtd in case the iscsi service is not used for logging in and out and locally hosted iSCSI targets are used. I needed to do the same for the iscsi init script (which I more or less maintain); otherwise the system would hang while trying to log in before tgtd was started.
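For orientation, here is a sketch of where such entries sit in an init script's LSB comment block. Only the Should-Start/Should-Stop entries come from this report; every other field is an illustrative placeholder, not the header libvirt actually shipped. The LSB init-script comment conventions list several facilities on one keyword line, separated by spaces, which is the form used here.

```bash
# Top of a libvirtd SysV init script (illustrative; only the
# Should-Start/Should-Stop lines are taken from this bug report).
### BEGIN INIT INFO
# Provides: libvirtd
# Required-Start: $network
# Required-Stop: $network
# Should-Start: iscsi tgtd
# Should-Stop: iscsi tgtd
# Default-Start: 3 4 5
# Default-Stop: 0 1 2 6
# Short-Description: daemon for the libvirt virtualization API
### END INIT INFO
```

With these entries, libvirtd is ordered after iscsi and tgtd on startup, which is the scenario 1 ordering described above.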

Even with these lines added to the libvirt initscript LSB header, I still believe that the scenario 2 behavior is a (separate) bug which also needs to be fixed.

Regards,

Hans

Comment 1 Hans de Goede 2011-07-18 10:53:12 UTC
Ping? After upgrading to libvirt-0.8.8-7.fc15, which overwrote my modified libvirt init script, I once again had a system that would not reboot or shut down.

Please at least add:
Should-Start: iscsi
Should-Start: tgtd
Should-Stop: iscsi
Should-Stop: tgtd

to the LSB header. That works around the issue of libvirt not logging out of nodes it logged in to, which hangs the system on halt.
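Because a package update can silently replace a locally edited init script, as happened here, it may help to re-check the header after each upgrade. A minimal bash sketch; the function names `has_lsb_dep` and `check_libvirtd_header` are hypothetical, and the script path in the usage note is an assumption (Fedora kept SysV scripts under /etc/rc.d/init.d, reachable as /etc/init.d).

```bash
#!/usr/bin/env bash
# has_lsb_dep FILE KEYWORD FACILITY
# Succeeds if the LSB header of init script FILE lists FACILITY on a
# "# KEYWORD:" line. Several KEYWORD lines are accepted, and each may
# list several space-separated facilities.
has_lsb_dep() {
    local file=$1 keyword=$2 facility=$3
    # Only look between the LSB header markers.
    sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$file" |
        grep -E "^#[[:space:]]*${keyword}:" |
        # Strip the "# Keyword:" prefix, put one facility per line,
        # then require an exact (whole-line) match.
        sed -E "s/^#[[:space:]]*${keyword}:[[:space:]]*//" |
        tr -s ' \t' '\n\n' |
        grep -qx -- "$facility"
}

# check_libvirtd_header FILE
# Reports every dependency from this bug that is missing; returns 1 if any is.
check_libvirtd_header() {
    local file=$1 rc=0 kw fac
    for kw in Should-Start Should-Stop; do
        for fac in iscsi tgtd; do
            if ! has_lsb_dep "$file" "$kw" "$fac"; then
                echo "missing: $kw: $fac" >&2
                rc=1
            fi
        done
    done
    return "$rc"
}
```

For example, `check_libvirtd_header /etc/init.d/libvirtd` after a libvirt update prints one line per dependency that the update dropped.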

Comment 2 Fedora Admin XMLRPC Client 2011-09-22 17:52:12 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 3 Fedora Admin XMLRPC Client 2011-09-22 17:55:24 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 4 Fedora Admin XMLRPC Client 2011-11-30 20:01:29 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 5 Fedora Admin XMLRPC Client 2011-11-30 20:01:54 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 6 Fedora Admin XMLRPC Client 2011-11-30 20:06:33 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 7 Fedora Admin XMLRPC Client 2011-11-30 20:06:40 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 8 Cole Robinson 2012-06-07 00:14:39 UTC
Hans, sorry for the lack of response. I'm pretty sure the libvirt iscsi code hasn't changed probably up through rawhide, so this may still be an issue. Reassigning to F16 for now.

Hans, can you confirm that is still an issue on F16+?

Comment 9 Hans de Goede 2012-06-07 08:48:36 UTC
Hi,

(In reply to comment #8)
> Hans, sorry for the lack of response. I'm pretty sure the libvirt iscsi code
> hasn't changed probably up through rawhide, so this may still be an issue.
> Reassigning to F16 for now.
> 
> Hans, can you confirm that is still an issue on F16+?

I'm afraid I've moved away from using iSCSI as backing storage for VMs in the meantime, so I cannot test this (without a lot of effort).

Regards,

Hans

Comment 10 Cole Robinson 2012-10-21 00:28:19 UTC
Since there hasn't been much noise about this with respect to Fedora, I'm moving it to the upstream tracker.

dallan, is there anyone on the libvirt team with an iSCSI setup who can confirm whether this issue still exists?

Comment 11 Dave Allan 2012-10-22 18:16:54 UTC
I'm asking around.

Comment 12 Fritz Elfert 2013-02-28 06:04:09 UTC
Hi guys,

I just stumbled over this Bugzilla entry when googling for my reboot problems with Fedora 17 and 18.
I had the very same behavior (with sessions of a libvirt iSCSI storage pool).
What I noticed here on F18:
During shutdown, iscsid is stopped *before* libvirtd. This prevents libvirtd from properly logging out of the iSCSI target it uses. I solved this in a similar manner to the OP (now with systemd):

Adding a startup-dependency on iscsid.service in libvirtd.service solved the problem. I'm attaching a diff of my custom /etc/systemd/system/libvirtd.service vs. the original /usr/lib/systemd/system/libvirtd.service.

BTW: The fact that the system behaves properly after fixing the startup/shutdown sequence proves that the culprit is NOT libvirtd but the init scripts (or rather, the service units) included in the package.

Cheers
 -Fritz
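The attached diff is not reproduced in this report. Assuming the added dependency is an `After=iscsid.service` line in the `[Unit]` section (systemd stops units in the reverse of their start order, which matches the effect described in the commit message below), the same change can be made without copying the whole unit file, on systemd releases that support unit drop-in directories. A minimal bash sketch; the function name and the drop-in file name `iscsid-ordering.conf` are hypothetical.

```bash
#!/usr/bin/env bash
# write_libvirtd_dropin [UNIT_DIR]
# Writes a drop-in that orders libvirtd.service after iscsid.service.
# UNIT_DIR defaults to /etc/systemd/system (needs root); pass another
# directory to try it out without touching the system.
write_libvirtd_dropin() {
    local unit_dir=${1:-/etc/systemd/system}
    local dropin_dir="$unit_dir/libvirtd.service.d"

    mkdir -p "$dropin_dir" || return
    cat > "$dropin_dir/iscsid-ordering.conf" <<'EOF'
[Unit]
# Start libvirtd after iscsid. On shutdown systemd reverses the order,
# so libvirtd is stopped (and can log out of its iSCSI sessions)
# before iscsid is stopped.
After=iscsid.service
EOF
}
```

As root, `write_libvirtd_dropin && systemctl daemon-reload` would install it; the unit shipped under /usr/lib/systemd/system stays untouched, so a package update does not overwrite the local change.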

Comment 13 Fritz Elfert 2013-02-28 06:05:43 UTC
Created attachment 703811
Diff, showing the additional startup dependency.

Comment 14 Dave Allan 2013-02-28 14:16:14 UTC
(In reply to comment #12)
> Adding a startup-dependency on iscsid.service in libvirtd.service solved the
> problem. I'm attaching a diff of my custom
> /etc/systemd/system/libvirtd.service vs. the original
> /usr/lib/systemd/system/libvirtd.service.

Hi Fritz, thanks for the patch; would you mind sending it to libvir-list for discussion?

Comment 15 Fritz Elfert 2013-02-28 20:47:35 UTC
(In reply to comment #14)
> (In reply to comment #12)
> > Adding a startup-dependency on iscsid.service in libvirtd.service solved the
> > problem. I'm attaching a diff of my custom
> > /etc/systemd/system/libvirtd.service vs. the original
> > /usr/lib/systemd/system/libvirtd.service.
> 
> Hi Fritz, thanks for the patch; would you mind sending it to libvir-list for
> discussion?

Not at all. Just posted it there.
 -Fritz

Comment 16 Fritz Elfert 2013-03-04 11:42:39 UTC
Appears to be accepted. See:
https://www.redhat.com/archives/libvir-list/2013-February/msg01727.html

Comment 17 Eric Blake 2013-03-04 13:30:23 UTC
Will appear in 1.0.3.

commit 443ec5c8c36e05819eae6157211b3691bebfe970
Author: Fritz Elfert <fritz@fritz-elfert.de>
Date:   Thu Feb 28 21:46:19 2013 +0100

    libvirt does not logout of iscsi targets, causing system hang on shutdown
    
    There's a quite old bug entry here:
    
    https://bugzilla.redhat.com/show_bug.cgi?id=700010
    
    I just stumbled over that very issue on F18. Doing a little bit of
    debugging of the shutdown sequence, it turns out that, at least on my
    F18 installation, libvirtd is shut down *after* iscsid, which makes it
    impossible for libvirt to perform the logout of the iscsi session properly.
    
    This patch simply adds another startup dependency on iscsid.service,
    which in turn delays iscsid shutdown until after libvirtd has stopped.
    Having that applied, the system shuts down properly again.
    
    Signed-off-by: Eric Blake <eblake@redhat.com>
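Whether a given installation already has this ordering (from libvirt 1.0.3 onward, per the comment above, or from a local override such as the one in comment 12) can be checked with `systemctl show -p After libvirtd.service`, which prints a single `After=...` line. A minimal bash sketch; the function name and the `SYSTEMCTL` override hook are hypothetical, the latter added only so the logic can be exercised without systemd.

```bash
#!/usr/bin/env bash
# Sketch: succeed if libvirtd.service is ordered after iscsid.service.
# SYSTEMCTL is a hypothetical override hook; it defaults to systemctl.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

libvirtd_after_iscsid() {
    local line
    # Expected output: "After=unit1 unit2 ..." on one line.
    line=$("$SYSTEMCTL" show -p After libvirtd.service) || return
    line=${line#After=}
    # Pad with spaces so iscsid.service only matches as a whole unit name.
    case " $line " in
        *" iscsid.service "*) return 0 ;;
        *) return 1 ;;
    esac
}
```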

