Hello!

sanlock --version
sanlock 3.2.4 (built Mar 31 2016 15:30:42)

sanlock can't be stopped correctly after a lockspace is created:

# To disable use of watchdog via wdmd and disable high priority features
SANLOCKOPTS="-U sanlock -G sanlock -w 0 -h 0"

[root@zabbix sysconfig]# systemctl restart sanlock
[root@zabbix sysconfig]# systemctl status sanlock
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-04-13 14:18:06 SAMT; 6s ago
  Process: 18304 ExecStop=/lib/systemd/systemd-sanlock stop (code=exited, status=0/SUCCESS)
  Process: 18401 ExecStart=/lib/systemd/systemd-sanlock start (code=exited, status=0/SUCCESS)
 Main PID: 18407 (sanlock)
   CGroup: /user.slice/user-0.slice/session-1091.scope/system.slice/sanlock.service
           ├─18407 sanlock daemon -U sanlock -G sanlock -w 0 -h 0
           └─18408 sanlock daemon -U sanlock -G sanlock -w 0 -h 0

Apr 13 14:18:06 zabbix.p98.belkam.com systemd[1]: Starting Shared Storage Lease Manager...
Apr 13 14:18:06 zabbix.p98.belkam.com systemd-sanlock[18401]: Starting sanlock: [  OK  ]
Apr 13 14:18:06 zabbix.p98.belkam.com systemd[1]: Started Shared Storage Lease Manager.
[root@zabbix lib64]# touch /tmp/test.slk && sanlock direct init -s test:0:/tmp/test.slk:0
init done 0
[root@zabbix lib64]# chown sanlock:sanlock /tmp/test.slk
[root@zabbix lib64]# time sanlock client add_lockspace -s test:1:/tmp/test.slk:0
add_lockspace
add_lockspace done 0

real    0m21.005s
user    0m0.001s
sys     0m0.002s
[root@zabbix lib64]# systemctl stop sanlock
[root@zabbix lib64]# systemctl status sanlock
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Wed 2016-04-13 14:20:59 SAMT; 11s ago
  Process: 19140 ExecStop=/lib/systemd/systemd-sanlock stop (code=exited, status=1/FAILURE)
  Process: 18401 ExecStart=/lib/systemd/systemd-sanlock start (code=exited, status=0/SUCCESS)
 Main PID: 18407 (code=killed, signal=KILL)

Apr 13 14:19:19 zabbix.p98.belkam.com systemd[1]: Stopping Shared Storage Lease Manager...
Apr 13 14:19:19 zabbix.p98.belkam.com systemd-sanlock[19140]: Sending stop signal sanlock (18407): [  OK  ]
Apr 13 14:19:29 zabbix.p98.belkam.com systemd-sanlock[19140]: Waiting for sanlock (18407) to stop: [FAILED]
Apr 13 14:19:29 zabbix.p98.belkam.com systemd[1]: sanlock.service: control process exited, code=exited status=1
Apr 13 14:19:29 zabbix.p98.belkam.com sanlock[18407]: 2016-04-13 14:19:29+0400 5530701 [18407]: helper pid 18408 term signal 15
Apr 13 14:20:59 zabbix.p98.belkam.com systemd[1]: sanlock.service stop-sigterm timed out. Killing.
Apr 13 14:20:59 zabbix.p98.belkam.com systemd[1]: sanlock.service: main process exited, code=killed, status=9/KILL
Apr 13 14:20:59 zabbix.p98.belkam.com systemd[1]: Stopped Shared Storage Lease Manager.
Apr 13 14:20:59 zabbix.p98.belkam.com systemd[1]: Unit sanlock.service entered failed state.
Apr 13 14:20:59 zabbix.p98.belkam.com systemd[1]: sanlock.service failed.
[root@zabbix lib64]#

As you can see, sanlock is ultimately killed with SIGKILL. In my tests this problem is always reproducible.
Also, add_lockspace takes 21 seconds. Maybe that is expected, but it seems too slow to me; it looks like something is wrong here... Thank you!
There are new native systemd unit files that make sanlock work better with systemd. They do not stop sanlock while lockspaces exist. Before stopping sanlock, stop the applications that use sanlock so that they remove their lockspaces. Alternatively, you can run 'sanlock shutdown -f 1', which causes sanlock to remove its lockspaces first. By default, the add_lockspace delay should be at least 20 seconds: twice the default I/O timeout (10 seconds).
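One way to wire 'sanlock shutdown -f 1' (mentioned above) into the service's stop path is a systemd drop-in that overrides ExecStop. This is only a sketch: the drop-in filename and the sanlock binary path are assumptions to verify on your system, and the forced shutdown removes lockspaces on the assumption that no leases remain in them.

```ini
# /etc/systemd/system/sanlock.service.d/graceful-stop.conf  (hypothetical path)
[Service]
# Clear the packaged ExecStop, then ask the daemon to remove its
# lockspaces and exit; this assumes no leases are still held in them.
ExecStop=
ExecStop=/usr/sbin/sanlock shutdown -f 1
```

After creating the drop-in, run 'systemctl daemon-reload' so systemd picks it up.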
Hello! Thank you! As I can see, the native unit files set SendSIGKILL=no, so sanlock will not be killed by systemd. But there are no options to "manually" create and delete lockspaces :-( Unfortunately, libvirt's automatic option doesn't work for us: we want to place VM disks on glusterfs, and libvirt doesn't create automatic locks in that case. We'll create our own systemd units for sanlock start and stop, then... I'd also like to mention that sometimes lockspace creation takes far longer than 20 seconds:

[root@zabbix tmp]# systemctl stop sanlock
[root@zabbix tmp]# systemctl start sanlock
[root@zabbix tmp]# time sanlock client add_lockspace -s test:1:/tmp/test.slk:0
add_lockspace
add_lockspace done 0

real    2m41.019s
user    0m0.002s
sys     0m0.002s
[root@zabbix tmp]#

I don't know how to reproduce this, though; it happens only sometimes, and that is all I know. Could you tell me whether this is expected behaviour? Thank you!
Just reproduced it:

[root@zabbix tmp]# systemctl stop sanlock

(systemd killed sanlock here, because a lockspace existed)

[root@zabbix tmp]# systemctl start sanlock
[root@zabbix tmp]# time sanlock client add_lockspace -s test:1:/tmp/test.slk:0
add_lockspace
add_lockspace done 0

real    2m41.019s
user    0m0.000s
sys     0m0.003s
[root@zabbix tmp]#

I guess this may be correct behaviour because of the locks... If it is, then please close this issue: there is no bug here, just a lack of infrastructure for manual locks... Thank you!
If the lockspace is not cleanly removed, then the long timeout is needed the next time it's added. (It sounds like you're reinventing ovirt/rhev.)
Hello! Thank you! Just a note: no, we are not reinventing it. We want to use libvirt and really don't need automatic HA, although we plan to try implementing it. In any case, we need something far simpler and easier to support than ovirt...
I'm having similar symptoms. No matter what I try, my systems won't reboot/shutdown cleanly. My impression is that this happens because other hosts hold locks in the same lockspace, and those locks correspond to storage that is accessible through the same path on the local host as on the remote hosts. If that were the case, the local host would never shut down properly until every domain/pool in the whole cluster released its locks. I think if I were using by-path targets I wouldn't have this issue, but I am using LVM VGs as pools, and as far as I know I can't change the target.
Start by shutting things down manually to find a sequence of steps that produces a clean result. In this manual process, make sure that any application using sanlock has been shut down cleanly first. After those applications have stopped, run 'sanlock status' to check that no lockspaces exist. If none do, you should be able to run 'systemctl stop sanlock' to cleanly shut down sanlock.

If there are applications that used sanlock but did not remove their lockspaces when they shut down, you could introduce a new step in the shutdown process to remove those lockspaces. This new step could be run from a new systemd unit file, which should run after the applications stop and before sanlock is stopped. That unit could do something like this:

for i in `sanlock gets | awk '{ print $2 }'`; do
    sanlock rem_lockspace -s $i
done

Or, as a shortcut, just run 'sanlock shutdown -f 1', which automatically removes lockspaces (assuming that no leases exist in them). Ideally, the application(s) that use sanlock should have their own methods (or unit files) for removing any lockspaces they use, so that you don't need to clean up after them.
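One possible shape for such a cleanup unit, as a sketch: the unit name, the application unit it orders against (libvirtd.service), and the binary paths are assumptions, not tested values. Note that in unit files '$$' escapes a literal '$' for the shell.

```ini
# sanlock-cleanup.service  (hypothetical unit name)
[Unit]
Description=Remove sanlock lockspaces before sanlock stops
# At shutdown, units stop in reverse dependency order, so ordering this
# unit After=sanlock.service makes its ExecStop run before sanlock stops,
# and Before= the application makes it run after the application stops.
After=sanlock.service
Before=libvirtd.service

[Service]
# oneshot + RemainAfterExit keeps the unit "active" after boot so that
# its ExecStop actually runs during shutdown.
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
# Remove any remaining lockspaces; assumes no leases are still held.
ExecStop=/bin/sh -c "for i in $$(sanlock gets | awk '{ print $$2 }'); do sanlock rem_lockspace -s $$i; done"

[Install]
WantedBy=multi-user.target
```

Enable it with 'systemctl enable sanlock-cleanup.service' so it is started at boot and therefore stopped, running the cleanup, at shutdown.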