Bug 1298461 - Host under maintenance still has sanlock lockspaces, which prevents the upgrade of the sanlock package
Summary: Host under maintenance still has sanlock lockspaces, which prevents the upgrade of the sanlock package
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: 3.5.7
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ovirt-3.5.8
Target Release: 3.5.8
Assignee: Simone Tiraboschi
QA Contact: Ilanit Stein
URL:
Whiteboard:
Depends On: 1282187
Blocks: 1294353
 
Reported: 2016-01-14 07:45 UTC by Sandro Bonazzola
Modified: 2016-02-29 14:49 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, on service stop, ovirt-ha-agent was not properly cleaning up its sanlock lockspaces, and upgrading the sanlock package with active lockspaces can result in a reboot in the middle of the upgrade. Now, ovirt-ha-agent correctly cleans up its lockspaces on exit.
Clone Of: 1282187
Environment:
Last Closed: 2016-02-29 14:49:36 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:0318 0 normal SHIPPED_LIVE ovirt-hosted-engine-ha bug fix update 2016-02-29 19:48:16 UTC
oVirt gerrit 49387 0 None None None 2016-01-14 07:45:21 UTC
oVirt gerrit 49598 0 None None None 2016-01-14 07:45:21 UTC
oVirt gerrit 50009 0 None None None 2016-01-14 07:45:21 UTC
oVirt gerrit 50127 0 None None None 2016-01-14 07:45:21 UTC
oVirt gerrit 51792 0 ovirt-hosted-engine-ha-1.2 MERGED Stop monitoring domain during maintenance / shutdown 2016-01-19 14:22:45 UTC

Description Sandro Bonazzola 2016-01-14 07:45:22 UTC
+++ This bug was initially created as a clone of Bug #1282187 +++

Description of problem:
I have a hosted-engine environment with two hosts. After I put one of the hosts into maintenance via the engine, the host still has two sanlock lockspaces, which can prevent a correct upgrade of the sanlock package.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-1.2.8-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted engine on two hosts
2. Add a storage domain to the environment and create a test VM
3. Put one of the hosts into maintenance

Actual results:
The host still has two sanlock lockspaces

Expected results:
The host should not hold any sanlock lockspaces (see the check sketched after the status output below)

Additional info:
The output of sanlock client status -D is attached as script_output.txt.

See bug https://bugzilla.redhat.com/show_bug.cgi?id=1278369 for additional details.

I can see that the host is under maintenance both in the engine and via hosted-engine --vm-status:
--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : alma07.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
Local maintenance                  : True
Host timestamp                     : 242733
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=242733 (Sun Nov 15 11:11:17 2015)
        host-id=1
        score=0
        maintenance=True
        state=LocalMaintenance
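
As a quick way to express the expected result above, here is a minimal sketch of a pre-upgrade check (not part of the product; it only assumes the "sanlock client status" output format quoted later in this bug, where lockspace entries are the lines starting with "s "):

#!/usr/bin/env python
# Hypothetical pre-upgrade check: refuse to upgrade the sanlock package while
# the daemon still reports any lockspace ("s ..." lines in "sanlock client status").
import subprocess
import sys

def held_lockspaces():
    out = subprocess.check_output(["sanlock", "client", "status"]).decode()
    return [line for line in out.splitlines() if line.startswith("s ")]

if __name__ == "__main__":
    spaces = held_lockspaces()
    if spaces:
        print("sanlock still holds %d lockspace(s); not safe to upgrade:" % len(spaces))
        for space in spaces:
            print("  " + space)
        sys.exit(1)
    print("no sanlock lockspaces held; the sanlock package can be upgraded")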

--- Additional comment from Artyom on 2015-11-15 06:38 EST ---



--- Additional comment from Nir Soffer on 2015-11-16 13:26:52 EST ---

Martin, can you explain what the lockspaces we see in attachment 1094386 [details] are?

Does stopping hosted engine agent remove these lockspaces?

--- Additional comment from Nir Soffer on 2015-11-16 13:28:20 EST ---

Artyom, what is this mount?

10.35.64.11\:_vol_RHEV_virt_alukiano__HE__upgrade/

If the host is in maintenance, I don't expect to find this mount point
active, and sanlock should not use it.

--- Additional comment from Nir Soffer on 2015-11-16 13:59:25 EST ---

Artyom, the logs and the output of the script are conflicting.

The output of the script suggests that on Sun Nov 15 10:04:06 IST 2015
the host was in maintenance. But according to the logs, the host 
was in maintenance from 2015-11-15 11:05:27 to 2015-11-15 11:11:13:

Thread-20::INFO::2015-11-15 11:05:27,886::logUtils::47::dispatcher::(wrapper) Run and protect: disconnectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': 'a04857b8-7375-4c5d-a834-d0d20ef00946'}]}

Thread-18::INFO::2015-11-15 11:11:13,612::logUtils::44::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=1, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'protocol_version': 3, 'connection': '10.35.64.11:/vol/RHEV/virt/alukiano_HE_upgrade', 'user': 'kvm', 'id': 'a04857b8-7375-4c5d-a834-d0d20ef00946'}], options=None)

On 10:04:06, the host was connected to storage:

Thread-83::DEBUG::2015-11-15 10:04:05,851::fileSD::262::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n500 bytes (500 B) copied, 0.000449901 s, 1.1 MB/s\n'; <rc> = 0
Detector thread::DEBUG::2015-11-15 10:04:06,056::protocoldetector::187::vds.MultiProtocolAcceptor::(_add_connection) Adding connection from 127.0.0.1:56009
Detector thread::DEBUG::2015-11-15 10:04:06,061::protocoldetector::201::vds.MultiProtocolAcceptor::(_remove_connection) Connection removed from 127.0.0.1:56009
Detector thread::DEBUG::2015-11-15 10:04:06,061::protocoldetector::247::vds.MultiProtocolAcceptor::(_handle_connection_read) Detected protocol xml from 127.0.0.1:56009
Detector thread::DEBUG::2015-11-15 10:04:06,061::BindingXMLRPC::1173::XmlDetector::(handleSocket) xml over http detected from ('127.0.0.1', 56009)
Thread-1457::DEBUG::2015-11-15 10:04:06,062::BindingXMLRPC::1133::vds::(wrapper) client [127.0.0.1]::call vmGetStats with ('8408e3bd-22bc-497e-b98f-cbd50ca754b3',) {}
Thread-1457::DEBUG::2015-11-15 10:04:06,063::BindingXMLRPC::1140::vds::(wrapper) return vmGetStats with {'status': {'message': 'Virtual machine does not exist', 'code': 1}}
Detector thread::DEBUG::2015-11-15 10:04:06,081::protocoldetector::187::vds.MultiProtocolAcceptor::(_add_connection) Adding connection from 127.0.0.1:56010
Detector thread::DEBUG::2015-11-15 10:04:06,085::protocoldetector::201::vds.MultiProtocolAcceptor::(_remove_connection) Connection removed from 127.0.0.1:56010
Detector thread::DEBUG::2015-11-15 10:04:06,085::protocoldetector::247::vds.MultiProtocolAcceptor::(_handle_connection_read) Detected protocol xml from 127.0.0.1:56010
Detector thread::DEBUG::2015-11-15 10:04:06,085::BindingXMLRPC::1173::XmlDetector::(handleSocket) xml over http detected from ('127.0.0.1', 56010)
Thread-1458::DEBUG::2015-11-15 10:04:06,087::task::595::Storage.TaskManager.Task::(_updateState) Task=`5e933a40-bf10-4c23-bc83-cc41699813f1`::moving from state init -> state preparing
Thread-1458::INFO::2015-11-15 10:04:06,087::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1458::INFO::2015-11-15 10:04:06,087::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {'5a6cc8b0-07f7-45c3-8bb5-7ba630170130': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000449901', 'lastCheck': '0.2', 'valid': True}}
Thread-1458::DEBUG::2015-11-15 10:04:06,087::task::1191::Storage.TaskManager.Task::(prepare) Task=`5e933a40-bf10-4c23-bc83-cc41699813f1`::finished: {'5a6cc8b0-07f7-45c3-8bb5-7ba630170130': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000449901', 'lastCheck': '0.2', 'valid': True}}
Thread-1458::DEBUG::2015-11-15 10:04:06,087::task::595::Storage.TaskManager.Task::(_updateState) Task=`5e933a40-bf10-4c23-bc83-cc41699813f1`::moving from state preparing -> state finished
Thread-1458::DEBUG::2015-11-15 10:04:06,087::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1458::DEBUG::2015-11-15 10:04:06,087::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1458::DEBUG::2015-11-15 10:04:06,088::task::993::Storage.TaskManager.Task::(_decref) Task=`5e933a40-bf10-4c23-bc83-cc41699813f1`::ref 0 aborting False
Detector thread::DEBUG::2015-11-15 10:04:07,054::protocoldetector::187::vds.MultiProtocolAcceptor::(_add_connection) Adding connection from 127.0.0.1:56011

Maybe we are getting different timestamps from the script and from
the logs?

To make sure we are testing the right thing, let's repeat this test with
some more log markers (a consolidated sketch of these steps follows the list):

1. Have a running hosted engine setup on this environment

2. Add message to vdsm log:
   echo "==== `date` maintenance - start ====" >> /var/log/vdsm/vdsm.log

3. Move host to maintenance

4. When the host is in maintenance mode in the engine, add the next log
   to the vdsm log:
   echo "==== `date` maintenance - finish ====" >> /var/log/vdsm/vdsm.log

5. Add log before stopping hosted engine agent:
   echo "==== `date` stopping he agent - start ====" >> /var/log/vdsm/vdsm.log

6. Stop hosted engine agent

7. Add log after stopping hosted engine agent:
   echo "==== `date` stopping he agent - finish ====" >> /var/log/vdsm/vdsm.log

8. Run the suggested script again so we can see sanlock state.
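
For convenience, a rough consolidation of steps 2-8 as a single helper (a sketch only, not the script Artyom ran; it assumes the vdsm log path and the ovirt-ha-agent service name used elsewhere in this bug, and step 3 is still done manually from the engine):

#!/usr/bin/env python
# Hypothetical wrapper around the log-marker procedure above.
import subprocess
import time

VDSM_LOG = "/var/log/vdsm/vdsm.log"

def mark(text):
    # Same idea as the "echo >> vdsm.log" markers in the steps above.
    with open(VDSM_LOG, "a") as log:
        log.write("==== %s %s ====\n" % (time.ctime(), text))

try:
    prompt = raw_input          # Python 2 (RHEL 7 era)
except NameError:
    prompt = input              # Python 3

mark("maintenance - start")
prompt("Move the host to maintenance in the engine, then press Enter... ")
mark("maintenance - finish")

mark("stopping he agent - start")
subprocess.check_call(["systemctl", "stop", "ovirt-ha-agent"])
mark("stopping he agent - finish")

# Capture the sanlock state so it can be attached to the bug.
with open("script_output.txt", "w") as out:
    subprocess.check_call(["sanlock", "client", "status", "-D"], stdout=out)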

--- Additional comment from Nir Soffer on 2015-11-16 14:41:24 EST ---

Artyom, the attached vdsm.log looks like a wrong combination
of different log files:

MainThread::INFO::2015-11-15 11:39:22,892::vdsm::145::vds::(run) <WorkerThread(Thread-8, started daemon 140012301575936)>
VM Channels Listener::INFO::2015-11-15 11:39:23,801::vmchannels::183::vds::(run) VM channels listener thread has ended.
MainThread::INFO::2015-11-15 09:40:09,395::vdsm::132::vds::(run) (PID: 24404) I am the actual vdsm 4.16.29-1.el7ev alma07.qa.lab.tlv.redhat.com (3.10.0-229.24.2.el7.x86_64)
MainThread::DEBUG::2015-11-15 09:40:09,396::resourceManager::421::Storage.ResourceManager::(registerNamespace) Registering namespace 'Storage'

Please avoid these modifications of the logs. If two log files
are needed, attach unmodified logs.

--- Additional comment from Artyom on 2015-11-17 06:32 EST ---

1) It is the storage domain that I used when deploying hosted engine.
2) Attached new logs and script outputs.
3) I did edit the vdsm log; to me it looks like the vdsm log has different dates, before the host was added to the engine and after (not sure about it).
I hope it helps you.

--- Additional comment from Artyom on 2015-11-17 06:34:48 EST ---

3) I did not edit the vdsm log

--- Additional comment from Tal Nisan on 2015-11-17 07:31:39 EST ---

Nir, is this indeed a blocker?

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-11-17 07:31:44 EST ---

This bug is not marked for z-stream, yet the milestone is for a z-stream version, therefore the milestone has been reset.
Please set the correct milestone or add the z-stream flag.

--- Additional comment from Nir Soffer on 2015-11-17 08:36:59 EST ---

Martin, thanks for passing the bug to me, but I don't know what you are
doing in the hosted engine setup, so I will need your answers.
See comment 2.

--- Additional comment from Martin Sivák on 2015-11-18 04:31:47 EST ---

Uh what? Tal moved the bug to you.

And with regards to lockspaces: 

hosted-engine:1:/var/run/vdsm/storage/5a6cc8b0-07f7-45c3-8bb5-7ba630170130/a66d5806-f79c-4195-bd9b-37f954282ed5/fca4a3b2-717e-400e-aea0-62c9972eed2e:0

hosted-engine:1:/var/run/vdsm/storage/5a6cc8b0-07f7-45c3-8bb5-7ba630170130/a66d5806-f79c-4195-bd9b-37f954282ed5/fca4a3b2-717e-400e-aea0-62c9972eed2e:0

Those two will go away when the agent is stopped.

I have no idea about the rest; none of those belong to the hosted engine agent. They seem to be related to VMs or SPM operation.

The hosted engine agent's lockspaces always end with ha_agent.lockspace on NFS, and they seem to start with hosted-engine on iSCSI.

--- Additional comment from Tal Nisan on 2015-11-18 08:40:22 EST ---

This is an upgrade issue, so pushing it out.

--- Additional comment from Nir Soffer on 2015-11-20 08:04:24 EST ---

Tal, please avoid setting needinfo unless you need info from me.

--- Additional comment from Nir Soffer on 2015-11-20 08:45:27 EST ---

Thanks Artyom for the detailed logs.

Looking at the new logs:


1. Connecting to storage pool

Thread-126566::INFO::2015-11-17 13:07:12,867::logUtils::44::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID=u'00000002-0002-0002-0002-0000000001c6', hostID=1, msdUUID=u'34da7c89-06e3-49a3-9a44-912c0ff0b6e7', masterVersion=1, domainsMap={u'34da7c89-06e3-49a3-9a44-912c0ff0b6e7': u'active'}, options=None)


2. connection finished

Thread-126566::INFO::2015-11-17 13:07:13,094::logUtils::47::dispatcher::(wrapper) Run and protect: connectStoragePool, Return response: True


3. Disconnecting from storage pool

==== Tue Nov 17 13:08:27 IST 2015 maintenance - start ====
Thread-126682::INFO::2015-11-17 13:08:34,627::logUtils::44::dispatcher::(wrapper) Run and protect: disconnectStoragePool(spUUID=u'00000002-0002-0002-0002-0000000001c6', hostID=1, remove=False, options=None)
Thread-126682::INFO::2015-11-17 13:08:39,070::logUtils::47::dispatcher::(wrapper) Run and protect: disconnectStoragePool, Return response: True

==== Tue Nov 17 13:08:53 IST 2015 maintenance - finish ====


4. Stopping hosted engine agent

==== Tue Nov 17 13:11:18 IST 2015 stopping he agent - start ====

# agent.log
MainThread::INFO::2015-11-17 13:11:26,854::agent::121::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

==== Tue Nov 17 13:11:53 IST 2015 stopping he agent - finish ====


5. According to script output, we still have 2 lockspaces, from
   13:12:06 until 13:12:56 when the script ends.

s hosted-engine:1:/var/run/vdsm/storage/5a6cc8b0-07f7-45c3-8bb5-7ba630170130/a66d5806-f79c-4195-bd9b-37f954282ed5/fca4a3b2-717e-400e-aea0-62c9972eed2e:0

s 5a6cc8b0-07f7-45c3-8bb5-7ba630170130:1:/rhev/data-center/mnt/10.35.64.11\:_vol_RHEV_virt_alukiano__HE__upgrade/5a6cc8b0-07f7-45c3-8bb5-7ba630170130/dom_md/ids:0


6. Hosted engine agent starts up again

# agent.log
MainThread::INFO::2015-11-17 13:14:24,984::agent::57::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.2.8 started


According to comment 11 this lockspace should be removed when stopping the agent:

hosted-engine:1:/var/run/vdsm/storage/5a6cc8b0-07f7-45c3-8bb5-7ba630170130/a66d5806-f79c-4195-bd9b-37f954282ed5/fca4a3b2-717e-400e-aea0-62c9972eed2e:0

Moving back to Martin to investigate the hosted engine side.

In the meantime, I will check what the second lockspace is. I think it is also related to
hosted engine, but I need to investigate this further.

--- Additional comment from Nir Soffer on 2015-11-20 09:04:03 EST ---

Regarding the second lockspace:

s 5a6cc8b0-07f7-45c3-8bb5-7ba630170130:1:/rhev/data-center/mnt/10.35.64.11\:_vol_RHEV_virt_alukiano__HE__upgrade/5a6cc8b0-07f7-45c3-8bb5-7ba630170130/dom_md/ids:0

This is a lockspace on the domain 5a6cc8b0-07f7-45c3-8bb5-7ba630170130

This domain is not part of the pool - when we connect to the storage
pool, we can see that vdsm is managing only one domain:

Thread-126566::INFO::2015-11-17 13:07:12,867::logUtils::44::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID=u'00000002-0002-0002-0002-0000000001c6', hostID=1, 
msdUUID=u'34da7c89-06e3-49a3-9a44-912c0ff0b6e7', masterVersion=1, domainsMap={u'34da7c89-06e3-49a3-9a44-912c0ff0b6e7': u'active'}, options=None)

Domain 34da7c89-06e3-49a3-9a44-912c0ff0b6e7 is owned by vdsm. When we
disconnect from the storage pool, vdsm stops monitoring this domain:

Thread-126682::INFO::2015-11-17 13:08:37,068::domainMonitor::115::Storage.DomainMonitor::(stopMonitoring) Stop monitoring 34da7c89-06e3-49a3-9a44-912c0ff0b6e7
Thread-126570::DEBUG::2015-11-17 13:08:37,068::domainMonitor::192::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for 34da7c89-06e3-49a3-9a44-912c0ff0b6e7
Thread-126570::INFO::2015-11-17 13:08:37,069::clusterlock::242::Storage.SANLock::(releaseHostId) Releasing host id for domain 34da7c89-06e3-49a3-9a44-912c0ff0b6e7 (id: 1)

I suspect that this domain is owned by the hosted engine agent.

Vdsm supports monitoring external storage domains. External programs can start
monitoring these domains using the startMonitoringDomain verb. When they are
done, they must invoke the stopMonitoringDomain verb to stop monitoring the domain.

See https://github.com/oVirt/vdsm/blob/master/lib/api/vdsmapi-schema.json#L2242
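
For illustration only, the verb pair looks roughly like this when driven from Python through vdsm's local client; the verb names come from the schema linked above, but the exact client wiring shown here is an assumption, not code taken from hosted engine:

# Hedged sketch: not the hosted-engine code, just the shape of the calls.
from vdsm import vdscli   # vdsm's local client (assumed available on the host)

SD_UUID = "5a6cc8b0-07f7-45c3-8bb5-7ba630170130"   # the domain discussed in this comment
HOST_ID = 1

server = vdscli.connect()

# The caller that starts monitoring owns the domain...
server.startMonitoringDomain(SD_UUID, HOST_ID)

# ...and must stop it again when done; this also releases the sanlock
# host id held on that domain's "ids" file.
server.stopMonitoringDomain(SD_UUID, HOST_ID)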

We need to get older log files showing when vdsm started to monitor
this domain.

Artyom, can you find and attach the log with this line:

    Run and protect: startMonitoringDomain

--- Additional comment from Artyom on 2015-11-22 04:36 EST ---

I found lines matching the requested message in one of the vdsm logs:
grep "Run and protect: startMonitoringDomain" vdsm.log*
vdsm.log.83:Thread-77::INFO::2015-11-15 11:17:12,286::logUtils::44::dispatcher::(wrapper) Run and protect: startMonitoringDomain(sdUUID='5a6cc8b0-07f7-45c3-8bb5-7ba630170130', hostID=1, options=None)
vdsm.log.83:Thread-77::INFO::2015-11-15 11:17:12,291::logUtils::47::dispatcher::(wrapper) Run and protect: startMonitoringDomain, Return response: None
vdsm.log.83:Thread-82::INFO::2015-11-15 09:41:24,323::logUtils::44::dispatcher::(wrapper) Run and protect: startMonitoringDomain(sdUUID='5a6cc8b0-07f7-45c3-8bb5-7ba630170130', hostID=1, options=None)
vdsm.log.83:Thread-82::INFO::2015-11-15 09:41:24,327::logUtils::47::dispatcher::(wrapper) Run and protect: startMonitoringDomain, Return response: None

--- Additional comment from Nir Soffer on 2015-11-22 06:11:59 EST ---

(In reply to Artyom from comment #16)
> Created attachment 1097510 [details]
> vdsm.log.83

> vdsm.log.83:Thread-82::INFO::2015-11-15
> 09:41:24,323::logUtils::44::dispatcher::(wrapper) Run and protect:
> startMonitoringDomain(sdUUID='5a6cc8b0-07f7-45c3-8bb5-7ba630170130',
> hostID=1, options=None)
> vdsm.log.83:Thread-82::INFO::2015-11-15
> 09:41:24,327::logUtils::47::dispatcher::(wrapper) Run and protect:
> startMonitoringDomain, Return response: None

Artyom, is this the time when hosted engine was installed?

We must find the program that started monitoring - it owns this
domain and must stop monitoring it when it is not needed.

--- Additional comment from Artyom on 2015-11-22 06:36:38 EST ---

I believe yes, because it is the oldest one. If you need any additional information, ping me and I will provide whatever you need, or give you access to my host.

--- Additional comment from Nir Soffer on 2015-11-22 09:13:50 EST ---

Simone, is it possible that hosted engine setup starts monitoring for
the hosted engine vm, imports the vm into the engine, and does not stop monitoring
the domain after the import?

--- Additional comment from Simone Tiraboschi on 2015-11-23 06:04:22 EST ---

On upgrade you don't re-run hosted-engine-setup at all.

--- Additional comment from Martin Sivák on 2015-11-23 07:48:08 EST ---

The agent is calling startMonitoringDomain, it seems. But it always needs that unless it is being killed... what happens to domain monitoring when vdsm dies first? Is it restored on VDSM restart?

--- Additional comment from Nir Soffer on 2015-11-23 08:34:54 EST ---

(In reply to Martin Sivák from comment #21)
> The agent is calling startMonitoringDomain it seems. But it always needs
> that unless it is being killed.. what happens to domain monitoring when vdsm
> dies first? Is it restored on VDSM restart?

External domains are owned by the caller; vdsm does not manage them. If you
start monitoring you should stop it, and if vdsm restarts you should start
monitoring again.

This verb is also deprecated; it seems like a temporary solution added for
hosted engine.

Once the hosted engine is imported into the dc, hosted engine can stop
monitoring this domain.

Note that stopping monitoring will release the host id on that domain, which may cause failures in the imported domain until it acquires the host
id again.

--- Additional comment from Martin Sivák on 2015-11-23 08:48:33 EST ---

I think we were using it as there was no other way to make sure vdsm already knows about the storage. Hosted engine storage was supposed to be mounted by vdsm (prepareImage).. but it was probably a bit complicated without a SD.

--- Additional comment from Simone Tiraboschi on 2015-11-23 08:55:41 EST ---

If all the hosts in the data center go down at the same time due to a power outage, we still have to properly handle the distributed process of starting the engine.

--- Additional comment from Nir Soffer on 2015-11-29 17:57:14 EST ---

The attached patch does not fix this issue, but it allows the hosted engine
agent to stop monitoring the hosted engine storage domain when entering 
local maintenance in a race-free way.

Martin, what do you think?

--- Additional comment from Yaniv Dary on 2015-12-01 07:32:57 EST ---

QE reports this is a test blocker for them. Moving back to 3.6.1.
Upgrade should work.

--- Additional comment from Allon Mureinik on 2015-12-01 11:40:47 EST ---

Nir, doesn't your last series of patches fix this issue too?

--- Additional comment from Nir Soffer on 2015-12-01 12:32:42 EST ---

(In reply to Allon Mureinik from comment #27)
> Nir, doesn't your last series of patches fix this issue too?

No, but now the hosted engine agent can safely stop monitoring the hosted engine
storage domain, which will release the host id and allow an upgrade.

--- Additional comment from Yaniv Dary on 2015-12-07 08:50:45 EST ---

What is the status of this bug?

--- Additional comment from Artyom on 2015-12-13 03:23:55 EST ---

Hi Martin, I see that a host under maintenance still has one sanlock lockspace; is this the desired behavior?

before maintenance:
# sanlock client status
daemon 162c27d5-da7c-4cfc-a6e2-0e49eaf4da2c.master-vds
p -1 helper
p -1 listener
p -1 status
s hosted-engine:2:/var/run/vdsm/storage/7c5171bf-bf2b-46cd-b75a-41bbc3a969ba/a5e71225-71cd-43a8-8f6b-8cc68c9537a3/01b08932-0c39-4db6-a190-6a4577a26853:0
s 7c5171bf-bf2b-46cd-b75a-41bbc3a969ba:2:/rhev/data-center/mnt/10.35.64.11\:_vol_RHEV_Virt_alukiano__HE__upgrade/7c5171bf-bf2b-46cd-b75a-41bbc3a969ba/dom_md/ids:0 ADD
s cec30bfe-9640-4da6-94e9-3118f13d0af0:2:/rhev/data-center/mnt/10.35.64.11\:_vol_RHEV_Virt_alukiano__HE__upgrade__nfs__0/cec30bfe-9640-4da6-94e9-3118f13d0af0/dom_md/ids:0 ADD


after maintenance:
# sanlock client status
daemon 162c27d5-da7c-4cfc-a6e2-0e49eaf4da2c.master-vds
p -1 helper
p -1 listener
p -1 status
s hosted-engine:2:/var/run/vdsm/storage/7c5171bf-bf2b-46cd-b75a-41bbc3a969ba/a5e71225-71cd-43a8-8f6b-8cc68c9537a3/01b08932-0c39-4db6-a190-6a4577a26853:0

--- Additional comment from Martin Sivák on 2015-12-14 03:48:00 EST ---

Yes, this is the expected behaviour. The hosted engine agent has to be stopped to release its lock. It needs the lock to publish statuses to other nodes, and it does that even in maintenance mode.
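
In practical terms, the sanlock upgrade path on a hosted-engine host is therefore: stop the agent, confirm no lockspaces remain, upgrade, restart. A minimal sketch of that sequence (the service, command, and package names are the ones used elsewhere in this bug; the wrapper itself is hypothetical):

#!/usr/bin/env python
# Hypothetical upgrade wrapper following the explanation above.
import subprocess

# Stop the agent so it releases its hosted-engine lockspace.
subprocess.check_call(["systemctl", "stop", "ovirt-ha-agent"])

# Confirm sanlock no longer holds any lockspace ("s ..." lines).
status = subprocess.check_output(["sanlock", "client", "status"]).decode()
if any(line.startswith("s ") for line in status.splitlines()):
    raise SystemExit("sanlock still holds lockspaces; aborting the upgrade")

# Upgrade sanlock and bring the agent back.
subprocess.check_call(["yum", "-y", "update", "sanlock"])
subprocess.check_call(["systemctl", "start", "ovirt-ha-agent"])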

--- Additional comment from Artyom on 2015-12-14 04:46:21 EST ---

Ok thanks Martin
Verified on ovirt-hosted-engine-ha-1.3.3.3-1.el7ev.noarch

After maintenance:
# sanlock client status
daemon 162c27d5-da7c-4cfc-a6e2-0e49eaf4da2c.master-vds
p -1 helper
p -1 listener
p -1 status
s hosted-engine:2:/var/run/vdsm/storage/7c5171bf-bf2b-46cd-b75a-41bbc3a969ba/a5e71225-71cd-43a8-8f6b-8cc68c9537a3/01b08932-0c39-4db6-a190-6a4577a26853:0

After stopping ovirt-ha-agent:
# sanlock client status
daemon 162c27d5-da7c-4cfc-a6e2-0e49eaf4da2c.master-vds
p -1 helper
p -1 listener
p -1 status

--- Additional comment from Sandro Bonazzola on 2015-12-16 07:19:16 EST ---

According to verification status and target milestone this issue should be fixed in oVirt 3.6.1. Closing current release.

Comment 2 Artyom 2016-01-28 13:37:37 UTC
Verified on ovirt-hosted-engine-ha-1.2.9-1.el7ev.noarch
# hosted-engine --set-maintenance --mode=global

# sanlock client status
daemon d9f2f03e-2679-45ab-89fe-c4387b40ba56.rose05.qa.
p -1 helper
p -1 listener
p 31126 HostedEngine
p -1 status
s hosted-engine:1:/var/run/vdsm/storage/c8b5968c-68e8-4ed6-b309-163fc9e148bb/24ee7f20-f2eb-4734-9741-ead1eb66bcc2/b209bc7f-3154-430f-b9ef-0d3bd584e589:0
s c8b5968c-68e8-4ed6-b309-163fc9e148bb:1:/rhev/data-center/mnt/10.35.64.11\:_vol_RHEV_Virt_alukiano__HE__upgrade/c8b5968c-68e8-4ed6-b309-163fc9e148bb/dom_md/ids:0
r c8b5968c-68e8-4ed6-b309-163fc9e148bb:c8394374-b882-4ca4-8461-bfc38fcdd4ee:/rhev/data-center/mnt/10.35.64.11\:_vol_RHEV_Virt_alukiano__HE__upgrade/c8b5968c-68e8-4ed6-b309-163fc9e148bb/images/ce92d50c-d04f-4f22-82cd-f64bf3635a35/c8394374-b882-4ca4-8461-bfc38fcdd4ee.lease:0:2 p 31126

# hosted-engine --vm-poweroff
# sanlock client status
daemon d9f2f03e-2679-45ab-89fe-c4387b40ba56.rose05.qa.
p -1 helper
p -1 listener
p -1 status
s hosted-engine:1:/var/run/vdsm/storage/c8b5968c-68e8-4ed6-b309-163fc9e148bb/24ee7f20-f2eb-4734-9741-ead1eb66bcc2/b209bc7f-3154-430f-b9ef-0d3bd584e589:0
s c8b5968c-68e8-4ed6-b309-163fc9e148bb:1:/rhev/data-center/mnt/10.35.64.11\:_vol_RHEV_Virt_alukiano__HE__upgrade/c8b5968c-68e8-4ed6-b309-163fc9e148bb/dom_md/ids:0

# systemctl stop ovirt-ha-agent
# sanlock client status
daemon d9f2f03e-2679-45ab-89fe-c4387b40ba56.rose05.qa.
p -1 helper
p -1 listener
p -1 status

Comment 4 errata-xmlrpc 2016-02-29 14:49:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0318.html

