Bug 967493

Summary: Lockfailure action Ignore will lead to sanlock rem_lockspace stuck
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.0
Reporter: Luwen Su <lsu>
Assignee: Jiri Denemark <jdenemar>
QA Contact: Virtualization Bugs <virt-bugs>
CC: ajia, cwei, dyuan, fsimonce, jdenemar, mzhan, rbalakri, shyu, teigland
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Fixed In Version: libvirt-1.2.7-1.el7
Clone Of: 905280
Bug Depends On: 905280
Last Closed: 2015-03-05 07:20:38 UTC

Description Luwen Su 2013-05-27 09:46:56 UTC
+++ This bug was initially created as a clone of Bug #905280 +++

Description of problem:
This bug was created to track the issue left unresolved in Bug 832156:
the lock failure action "ignore" leads to sanlock rem_lockspace getting stuck.
I rewrote the reproduction steps for further testing, since both libvirt and sanlock have changed a lot.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-17.el6.x86_64
sanlock.x86_64 0:2.6-2.el6      

How reproducible:
100%

Steps to Reproduce:
1.Libvirt configuration
# service libvirtd stop

# getsebool -a | grep sanlock
sanlock_use_fusefs --> off
sanlock_use_nfs --> on
sanlock_use_samba --> off
virt_use_sanlock --> on

# tail -5 /etc/libvirt/qemu-sanlock.conf 
user = "sanlock"
group = "sanlock"
host_id = 1
auto_disk_leases = 0
/* auto disk leases would interfere with the lock failure action, so disable them and create the lockspace and lease file manually */
disk_lease_dir = "/var/lib/libvirt/sanlock"

#tail -1 /etc/libvirt/qemu.conf 
lock_manager = "sanlock"
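
An optional sanity check (not part of the original steps): confirm that the sanlock lock driver plugin is installed before restarting libvirtd. The package name and plugin path below are the usual RHEL locations and may differ on other installations.

/* assumed package name and plugin path; adjust to your installation */
# rpm -q libvirt-lock-sanlock sanlock
# ls /usr/lib64/libvirt/lock-driver/sanlock.so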

2.Sanlock configuration

/*create lockspace file*/
#truncate -s 1M /var/lib/libvirt/sanlock/TEST_LS
#sanlock direct init -s TEST_LS:0:/var/lib/libvirt/sanlock/TEST_LS:0

/* change the ownership to sanlock, since the default owner is root */
#chown sanlock:sanlock /var/lib/libvirt/sanlock/TEST_LS

/*add lockspace to sanlock */
#service wdmd start
#service sanlock start
#sanlock client add_lockspace -s TEST_LS:1:/var/lib/libvirt/sanlock/TEST_LS:0

/*you can use this command to check the status*/
#sanlock client host_status -s TEST_LS

/*create lease file*/
#truncate -s 1M /var/lib/libvirt/sanlock/test-disk-resource-lock
#sanlock direct init -r TEST_LS:test-disk-resource-lock:/var/lib/libvirt/sanlock/test-disk-resource-lock:0

/* change the ownership to sanlock */
#chown sanlock:sanlock /var/lib/libvirt/sanlock/test-disk-resource-lock
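
Optionally (not in the original steps), sanlock can print the on-disk structures it just initialized, which makes it easy to confirm that the lockspace and resource files were written correctly; the exact output format is version dependent.

/* dump the initialized lockspace and resource headers */
# sanlock direct dump /var/lib/libvirt/sanlock/TEST_LS
# sanlock direct dump /var/lib/libvirt/sanlock/test-disk-resource-lock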

3.Guest XML setting
For a guest that is shut down normally, add
....
<on_lockfailure>ignore</on_lockfailure>
....
<lease>
<lockspace>TEST_LS</lockspace>
<key>test-disk-resource-lock</key>
<target path='/var/lib/libvirt/sanlock/test-disk-resource-lock'/>
</lease>
...

to the guest.

#service libvirtd restart
/* sometimes libvirtd will hang for a while; this is expected */
#virsh start $domain
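
At this point it is worth confirming that the guest really acquired the lease (not part of the original steps, but plain sanlock commands): the TEST_LS lockspace and the test-disk-resource-lock resource should show up in the status output, with the qemu pid as the holder.

/* the lockspace and the resource should both be listed, held by the qemu pid */
# sanlock client status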


4.Try to remove the lockspace while the guest is using it.
#sanlock client rem_lockspace -s TEST_LS:1:/var/lib/libvirt/sanlock/TEST_LS:0
/* the command will get stuck here; libvirtd just keeps repeating the same event over and over */
/* if there is a better way to make the lockspace "lost", please correct me :) thanks */


Actual results:
The sanlock rem_lockspace command hangs and never returns.

Expected results:
The lock failure action should be handled without leaving sanlock stuck.

Additional info:
To make this easier to review, I have collected the comments from Bug 832156 related to the ignore action here for further discussion.
-----------------------------------------------------------------------------
Federico Simoncelli 2013-01-24 08:22:48 EST Comment 38 

The on_lockfailure policies to check are:

poweroff
========
If I understand correctly this is currently working: the VM is shut down.

restart
=======
I don't think we'll ever see this working with sanlock. Once you have removed the lockspace, you are not able to start the VM. Anyway, this depends on the implementation of "restart": if libvirt is actually killing/shutting down the qemu process, then my assumption is correct. If the restart is handled in some way so that the qemu process remains the same, then this would appear (from sanlock's point of view) as an "ignore" (see below).

pause
=====
If I understand correctly this is currently working: the VM is paused and the sanlock resource is released (double-check this).

ignore
======
This is not supposed to work with sanlock. If the qemu process is ignoring the request and it's not releasing the resources then sanlock should escalate to kill, kill -9 and eventually rebooting the host.

From what I saw, the escalation is not happening on the sanlock side. David, do you want to take a look? Thanks.
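
To double-check the pause behaviour described above, something along these lines should work ($domain is a placeholder for the guest defined in the reproduction steps; the commands are standard virsh/sanlock, but the exact status output differs between versions):

/* after the lock failure is triggered, the guest should be reported as paused ... */
# virsh domstate $domain
/* ... and the test-disk-resource-lock resource should no longer be listed as held by the qemu pid */
# sanlock client status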

----------------------------------------------------------------------------
David Teigland 2013-01-24 10:31:04 EST       Comment 40 

Sorry I couldn't follow all the discussion above very well, so I'll probably repeat some obvious background to make sure that we're all expecting the same things.

pause
-----
You need to pass sanlock the path to a kill script/program that sanlock will run against the vm when the lock fails.  In the libvirt case we expect this program to result in the following (probably done within libvirtd):
1. pause/suspend the vm
2. inquire and save the lease state from sanlock
3. release the sanlock leases for the vm

When the sanlock daemon sees that the leases are gone, it will no longer trigger the watchdog reset.
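
As a rough illustration of what such a killpath program would have to do for the "pause" case, here is a minimal shell sketch. It is not libvirt's actual helper: the argument passing, the saved-state location and the resource spec (taken from the reproduction steps above) are all assumptions, and the exact sanlock client syntax may differ between versions.

#!/bin/sh
# Illustrative sketch only -- not libvirt's real sanlock kill script.
# Assumes the caller passes the domain name and the qemu pid.
DOMAIN=$1
QEMU_PID=$2

# 1. pause the guest so it stops issuing I/O to the lease-protected disks
virsh suspend "$DOMAIN"

# 2. record the lease state held by the qemu process (for a later resume)
sanlock client inquire -p "$QEMU_PID" > "/var/run/${DOMAIN}.leases"

# 3. release the lease so the sanlock daemon stops escalating
sanlock client release -p "$QEMU_PID" \
    -r TEST_LS:test-disk-resource-lock:/var/lib/libvirt/sanlock/test-disk-resource-lock:0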

ignore
------
You should not set killpath if you don't want sanlock to use it.  In this case, sanlock will use SIGTERM and SIGKILL against the vm when its lock fails.  If the pid does not exit from either of those, then the host will be reset by the watchdog.  If this is not happening, could you run "sanlock client log_dump > log.txt" and send that to me?

Finally, I'm not sure what rem_lockspace is being used for above; it should probably not be used to test lock failure.  The way I usually simulate lock failures is by using dmsetup to load the error target under the leases lv.
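
For reference, the dmsetup approach can look roughly like this. It assumes the lease files live on a dedicated device-mapper device ($DEV below is a placeholder name) and swaps its live table for the error target so that every I/O to the leases fails:

/* $DEV is a placeholder for the device-mapper name of the device holding the lease files */
# DEV=leases_lv
/* save the original table so the device can be restored afterwards */
# dmsetup table $DEV > /tmp/leases.table
# SIZE=$(blockdev --getsz /dev/mapper/$DEV)
/* swap in the error target: all I/O to the leases now fails, simulating lost lease storage */
# dmsetup suspend $DEV
# echo "0 $SIZE error" | dmsetup load $DEV
# dmsetup resume $DEV
/* to recover: dmsetup suspend $DEV ; dmsetup load $DEV /tmp/leases.table ; dmsetup resume $DEV */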

------------------------------------------------------------------------------

Federico Simoncelli 2013-01-24 13:11:53 EST         Comment 41 

(In reply to comment #40)
> Sorry I couldn't follow all the discussion above very well, so I'll probably

I think we have a misconception here: the "ignore" policy was implemented (Jiri, correct me if I'm wrong) by "ignoring" the fact that sanlock is requesting to release the resource. In this situation sanlock should escalate anyway ("ignore" == forced reboot in the sanlock implementation).


> If the pid does not exit from either of those, then the host will be
> reset by the watchdog.  If this is not happening, could you run "sanlock
> client log_dump > log.txt" and send that to me?

# sanlock client log_dump
2013-01-24 18:43:42+0800 6085 [2735]: sanlock daemon started 2.6 host 7d9676dc-9af3-4d63-bc91-dc5ba9e50a7e.intel-8400
2013-01-24 18:43:50+0800 6094 [2739]: cmd_add_lockspace 2,9 TEST_LS:1:var/lib/libvirt/sanlock/TEST_LS:0 flags 0 timeout 0
2013-01-24 18:43:50+0800 6094 [2739]: s1 lockspace TEST_LS:1:var/lib/libvirt/sanlock/TEST_LS:0
2013-01-24 18:43:50+0800 6094 [2849]: s1 delta_acquire begin TEST_LS:1
2013-01-24 18:43:51+0800 6094 [2849]: s1 delta_acquire write 1 1 6094 7d9676dc-9af3-4d63-bc91-dc5ba9e50a7e.intel-8400
2013-01-24 18:43:51+0800 6094 [2849]: s1 delta_acquire delta_short_delay 20
2013-01-24 18:44:11+0800 6114 [2849]: s1 delta_acquire done 1 1 6094
2013-01-24 18:44:11+0800 6115 [2739]: s1 add_lockspace done
2013-01-24 18:44:11+0800 6115 [2739]: cmd_add_lockspace 2,9 done 0
2013-01-24 18:46:30+0800 6253 [2735]: cmd_register ci 2 fd 9 pid 2913
2013-01-24 18:46:30+0800 6253 [2740]: cmd_killpath 2,9,2913 flags 0
2013-01-24 18:46:31+0800 6254 [2735]: cmd_restrict ci 2 fd 9 pid 2913 flags 1
2013-01-24 18:46:31+0800 6254 [2739]: cmd_acquire 2,9,2913 ci_in 3 fd 12 count 1
2013-01-24 18:46:31+0800 6254 [2739]: s1:r1 resource TEST_LS:sles11sp2-disk-resource-lock:/var/lib/libvirt/sanlock/sles11sp2-disk-resource-lock:0 for 2,9,2913
2013-01-24 18:46:31+0800 6254 [2739]: r1 paxos_acquire begin 0 0 0
2013-01-24 18:46:31+0800 6254 [2739]: r1 paxos_acquire leader 0 owner 0 0 0 max mbal[1999] 0 our_dblock 0 0 0 0 0 0
2013-01-24 18:46:31+0800 6254 [2739]: r1 paxos_acquire leader 0 free
2013-01-24 18:46:31+0800 6254 [2739]: r1 ballot 1 phase1 mbal 1
2013-01-24 18:46:31+0800 6254 [2739]: r1 ballot 1 phase2 bal 1 inp 1 1 6254 q_max -1
2013-01-24 18:46:31+0800 6254 [2739]: r1 ballot 1 commit self owner 1 1 6254
2013-01-24 18:46:31+0800 6254 [2739]: r1 acquire_disk rv 1 lver 1 at 6254
2013-01-24 18:46:31+0800 6254 [2739]: cmd_acquire 2,9,2913 result 0 pid_dead 0
2013-01-25 02:04:10+0800 32513 [2740]: cmd_rem_lockspace 3,12 TEST_LS flags 0
2013-01-25 02:04:10+0800 32513 [2735]: s1 set killing_pids check 0 remove 1
2013-01-25 02:04:10+0800 32513 [2735]: s1:r1 client_using_space pid 2913
2013-01-25 02:04:10+0800 32513 [2735]: s1 kill 2913 sig 100 count 1
2013-01-25 02:05:49+0800 32612 [2735]: s1 killing pids stuck 1
<...nothing else, 5 minutes passed...>

------------------------------------------------------------------------------

Federico Simoncelli 2013-01-24 13:14:49 EST        Comment 42

(In reply to comment #41)
> (In reply to comment #40)
> I think that we have a misconception here, the "ignore" policy was
> implemented (Jiri correct me if I'm wrong) "ignoring" the fact that sanlock
> is requesting to release the resource. In this situation sanlock should
> escalate anyway ("ignore" == forced reboot in the sanlock implementation).

Actually, let me correct myself: "ignore" == the VM is abruptly killed (and eventually we might escalate to the reboot).
--------------------------------------------------------------------------

David Teigland 2013-01-24 14:05:47 EST              Comment 43

The first problem is as I mentioned above: rem_lockspace is not equivalent to a failed lock, and should not be used to test that. (This does reveal a possible problem with a forced rem_lockspace, though, which I will look into.)

There might also be a problem with the killpath program, because the lease is not removed or the pid does not exit. We'd expect one of those results from running killpath. (If the lockspace had actually failed, then sanlock would have escalated when the killpath did not do anything.)

Comment 2 David Teigland 2013-05-28 15:57:57 UTC
I believe that the libvirt "autoleases" capability should be removed.
If not removed, I believe it should be disabled by Red Hat.
There is no case in which Red Hat wants to use or support this capability.

Comment 3 Jiri Denemark 2014-03-25 08:26:36 UTC
This is fixed upstream by v1.2.2-340-ge3dd35e:

commit e3dd35e881614e6f08a35e1e714336268764d5ba
Author: Jiri Denemark <jdenemar>
Date:   Mon Mar 24 14:22:36 2014 +0100

    sanlock: Forbid VIR_DOMAIN_LOCK_FAILURE_IGNORE
    
    https://bugzilla.redhat.com/show_bug.cgi?id=905280
    https://bugzilla.redhat.com/show_bug.cgi?id=967493
    
    Sanlock expects that the configured kill script either kills the PID on
    lock failure or removes all locks the PID owns. If none of the two
    options happen, sanlock will reboot the host. Although IGNORE action is
    supposed to ignore the request to kill the PID or remove all leases,
    it's certainly not designed to cause the host to be rebooted. That said,
    IGNORE action is incompatible with sanlock and should be forbidden by
    libvirt.
    
    Signed-off-by: Jiri Denemark <jdenemar>

Comment 6 Shanzhi Yu 2014-12-01 09:25:57 UTC
Verified this bug with libvirt-1.2.8-9.el7.x86_64

Steps:

1. Define a guest with the options below
..
<on_lockfailure>ignore</on_lockfailure>
..

2. Try to start the guest

# virsh start r7
error: Failed to start domain r7
error: internal error: Process exited prior to exec: libvirt: Lock Driver error : unsupported configuration: Failure action ignore is not supported by sanlock


The on_lockfailure option "ignore" is no longer supported, so this result is expected.

Comment 8 errata-xmlrpc 2015-03-05 07:20:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html