Bug 1282187

| Field | Value |
| --- | --- |
| Summary | Host under maintenance still has sanlock lockspaces, which prevents the upgrade of the sanlock package |
| Product | [oVirt] ovirt-hosted-engine-ha |
| Component | Agent |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | urgent |
| Version | 1.2.8 |
| Target Milestone | ovirt-3.6.1 |
| Target Release | 1.3.3.2 |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | sla |
| Reporter | Artyom <alukiano> |
| Assignee | Martin Sivák <msivak> |
| QA Contact | Artyom <alukiano> |
| CC | alukiano, amureini, bugs, dfediuck, gklein, mavital, msivak, nsoffer, sbonazzo, stirabos, tnisan, ylavi |
| Keywords | TestBlocker, Triaged |
| Flags | rule-engine: ovirt-3.6.z+, rule-engine: blocker+, ylavi: planning_ack+, dfediuck: devel_ack+, mavital: testing_ack+ |
| Doc Type | Bug Fix |
| Clones | 1298461 (view as bug list) |
| Last Closed | 2015-12-16 12:19:16 UTC |
| Type | Bug |
| oVirt Team | SLA |
| Bug Blocks | 1284954, 1285700, 1294353, 1298461 |
Description (Artyom, 2015-11-15 11:37:50 UTC)
Created attachment 1094386 [details]
output of script
Martin, can you explain what the lockspaces we see in attachment 1094386 [details] are?
Does stopping the hosted engine agent remove these lockspaces?
Artyom, what is this mount? 10.35.64.11\:_vol_RHEV_virt_alukiano__HE__upgrade/ If the host is in maintenance, I don't expect to find this mount point active, and sanlock should not use it.

Artyom, the logs and the output of the script are conflicting. The output of the script suggests that on Sun Nov 15 10:04:06 IST 2015 the host was in maintenance. But according to the logs, the host was in maintenance from 2015-11-15 11:05:27 to 2015-11-15 11:11:13:

Thread-20::INFO::2015-11-15 11:05:27,886::logUtils::47::dispatcher::(wrapper) Run and protect: disconnectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': 'a04857b8-7375-4c5d-a834-d0d20ef00946'}]}
Thread-18::INFO::2015-11-15 11:11:13,612::logUtils::44::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=1, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'protocol_version': 3, 'connection': '10.35.64.11:/vol/RHEV/virt/alukiano_HE_upgrade', 'user': 'kvm', 'id': 'a04857b8-7375-4c5d-a834-d0d20ef00946'}], options=None)

At 10:04:06, the host was connected to storage:

Thread-83::DEBUG::2015-11-15 10:04:05,851::fileSD::262::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n500 bytes (500 B) copied, 0.000449901 s, 1.1 MB/s\n'; <rc> = 0
Detector thread::DEBUG::2015-11-15 10:04:06,056::protocoldetector::187::vds.MultiProtocolAcceptor::(_add_connection) Adding connection from 127.0.0.1:56009
Detector thread::DEBUG::2015-11-15 10:04:06,061::protocoldetector::201::vds.MultiProtocolAcceptor::(_remove_connection) Connection removed from 127.0.0.1:56009
Detector thread::DEBUG::2015-11-15 10:04:06,061::protocoldetector::247::vds.MultiProtocolAcceptor::(_handle_connection_read) Detected protocol xml from 127.0.0.1:56009
Detector thread::DEBUG::2015-11-15 10:04:06,061::BindingXMLRPC::1173::XmlDetector::(handleSocket) xml over http detected from ('127.0.0.1', 56009)
Thread-1457::DEBUG::2015-11-15 10:04:06,062::BindingXMLRPC::1133::vds::(wrapper) client [127.0.0.1]::call vmGetStats with ('8408e3bd-22bc-497e-b98f-cbd50ca754b3',) {}
Thread-1457::DEBUG::2015-11-15 10:04:06,063::BindingXMLRPC::1140::vds::(wrapper) return vmGetStats with {'status': {'message': 'Virtual machine does not exist', 'code': 1}}
Detector thread::DEBUG::2015-11-15 10:04:06,081::protocoldetector::187::vds.MultiProtocolAcceptor::(_add_connection) Adding connection from 127.0.0.1:56010
Detector thread::DEBUG::2015-11-15 10:04:06,085::protocoldetector::201::vds.MultiProtocolAcceptor::(_remove_connection) Connection removed from 127.0.0.1:56010
Detector thread::DEBUG::2015-11-15 10:04:06,085::protocoldetector::247::vds.MultiProtocolAcceptor::(_handle_connection_read) Detected protocol xml from 127.0.0.1:56010
Detector thread::DEBUG::2015-11-15 10:04:06,085::BindingXMLRPC::1173::XmlDetector::(handleSocket) xml over http detected from ('127.0.0.1', 56010)
Thread-1458::DEBUG::2015-11-15 10:04:06,087::task::595::Storage.TaskManager.Task::(_updateState) Task=`5e933a40-bf10-4c23-bc83-cc41699813f1`::moving from state init -> state preparing
Thread-1458::INFO::2015-11-15 10:04:06,087::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1458::INFO::2015-11-15 10:04:06,087::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {'5a6cc8b0-07f7-45c3-8bb5-7ba630170130': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000449901', 'lastCheck': '0.2', 'valid': True}}
Thread-1458::DEBUG::2015-11-15 10:04:06,087::task::1191::Storage.TaskManager.Task::(prepare) Task=`5e933a40-bf10-4c23-bc83-cc41699813f1`::finished: {'5a6cc8b0-07f7-45c3-8bb5-7ba630170130': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000449901', 'lastCheck': '0.2', 'valid': True}}
Thread-1458::DEBUG::2015-11-15 10:04:06,087::task::595::Storage.TaskManager.Task::(_updateState) Task=`5e933a40-bf10-4c23-bc83-cc41699813f1`::moving from state preparing -> state finished
Thread-1458::DEBUG::2015-11-15 10:04:06,087::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1458::DEBUG::2015-11-15 10:04:06,087::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1458::DEBUG::2015-11-15 10:04:06,088::task::993::Storage.TaskManager.Task::(_decref) Task=`5e933a40-bf10-4c23-bc83-cc41699813f1`::ref 0 aborting False
Detector thread::DEBUG::2015-11-15 10:04:07,054::protocoldetector::187::vds.MultiProtocolAcceptor::(_add_connection) Adding connection from 127.0.0.1:56011

Maybe we are getting different timestamps from the script and from the logs? To make sure we are testing the right thing, let's repeat this test with some more logs:

1. Have a running hosted engine setup on this environment.
2. Add a message to the vdsm log:
   echo "==== `date` maintenance - start ====" >> /var/log/vdsm/vdsm.log
3. Move the host to maintenance.
4. When the host is in maintenance mode in the engine, add the next message to the vdsm log:
   echo "==== `date` maintenance - finish ====" >> /var/log/vdsm/vdsm.log
5. Add a message before stopping the hosted engine agent:
   echo "==== `date` stopping he agent - start ====" >> /var/log/vdsm/vdsm.log
6. Stop the hosted engine agent.
7. Add a message after stopping the hosted engine agent:
   echo "==== `date` stopping he agent - finish ====" >> /var/log/vdsm/vdsm.log
8. Run the suggested script again so we can see the sanlock state. (A consolidated sketch of steps 2-8 appears below, after the attachment.)

Artyom, the attached vdsm.log looks like a wrong combination of different log files:

MainThread::INFO::2015-11-15 11:39:22,892::vdsm::145::vds::(run) <WorkerThread(Thread-8, started daemon 140012301575936)>
VM Channels Listener::INFO::2015-11-15 11:39:23,801::vmchannels::183::vds::(run) VM channels listener thread has ended.
MainThread::INFO::2015-11-15 09:40:09,395::vdsm::132::vds::(run) (PID: 24404) I am the actual vdsm 4.16.29-1.el7ev alma07.qa.lab.tlv.redhat.com (3.10.0-229.24.2.el7.x86_64)
MainThread::DEBUG::2015-11-15 09:40:09,396::resourceManager::421::Storage.ResourceManager::(registerNamespace) Registering namespace 'Storage'

Please avoid these modifications of the logs. If two log files are needed, attach unmodified logs.

Created attachment 1095437 [details]
logs and script outputs
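For convenience, steps 2-8 above can be driven from a single shell script. This is a minimal sketch, not the script actually used in this bug: moving the host to maintenance is still done manually in the engine, and ./sanlock-sample.sh is a hypothetical stand-in for the attached sampling script.

#!/bin/bash
# Sketch of the log-marker procedure from the steps above.
VDSM_LOG=/var/log/vdsm/vdsm.log

echo "==== $(date) maintenance - start ====" >> "$VDSM_LOG"
read -r -p "Move the host to maintenance in the engine, then press Enter"
echo "==== $(date) maintenance - finish ====" >> "$VDSM_LOG"

echo "==== $(date) stopping he agent - start ====" >> "$VDSM_LOG"
systemctl stop ovirt-ha-agent
echo "==== $(date) stopping he agent - finish ====" >> "$VDSM_LOG"

# Hypothetical name for the attached script that samples sanlock state.
./sanlock-sample.sh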
1) It is the storage domain that I use when deploying hosted engine.
2) Attached new logs and script outputs.
3) I did edit the vdsm log; to me it looks like the vdsm log has different dates, from before the host was added to the engine and after (not sure about it).
I hope it will help you.
3) I did not edit the vdsm log.

Nir, is this a blocker indeed?

This bug is not marked for z-stream, yet the milestone is for a z-stream version, therefore the milestone has been reset. Please set the correct milestone or add the z-stream flag.

Martin, thanks for passing the bug to me, but I don't know what you are doing in the hosted engine setup, so I will need your answers. See comment 2.

Uh what? Tal moved the bug to you. And with regards to lockspaces:

hosted-engine:1:/var/run/vdsm/storage/5a6cc8b0-07f7-45c3-8bb5-7ba630170130/a66d5806-f79c-4195-bd9b-37f954282ed5/fca4a3b2-717e-400e-aea0-62c9972eed2e:0
hosted-engine:1:/var/run/vdsm/storage/5a6cc8b0-07f7-45c3-8bb5-7ba630170130/a66d5806-f79c-4195-bd9b-37f954282ed5/fca4a3b2-717e-400e-aea0-62c9972eed2e:0

Those two will go away when the agent is stopped. I have no idea about the rest; none of those belong to the hosted engine agent. They seem to be related to VMs or SPM operation. The hosted engine agent's lockspace always ends with ha_agent.lockspace on NFS, and it seems to start with hosted-engine on iSCSI.

This is an upgrade issue, so pushing out.

Tal, please avoid setting needinfo unless you need info from me.

Thanks Artyom for the detailed logs. Looking at the new logs:

1. Connecting to the storage pool:
Thread-126566::INFO::2015-11-17 13:07:12,867::logUtils::44::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID=u'00000002-0002-0002-0002-0000000001c6', hostID=1, msdUUID=u'34da7c89-06e3-49a3-9a44-912c0ff0b6e7', masterVersion=1, domainsMap={u'34da7c89-06e3-49a3-9a44-912c0ff0b6e7': u'active'}, options=None)

2. Connection finished:
Thread-126566::INFO::2015-11-17 13:07:13,094::logUtils::47::dispatcher::(wrapper) Run and protect: connectStoragePool, Return response: True

3. Disconnecting from the storage pool:
==== Tue Nov 17 13:08:27 IST 2015 maintenance - start ====
Thread-126682::INFO::2015-11-17 13:08:34,627::logUtils::44::dispatcher::(wrapper) Run and protect: disconnectStoragePool(spUUID=u'00000002-0002-0002-0002-0000000001c6', hostID=1, remove=False, options=None)
Thread-126682::INFO::2015-11-17 13:08:39,070::logUtils::47::dispatcher::(wrapper) Run and protect: disconnectStoragePool, Return response: True
==== Tue Nov 17 13:08:53 IST 2015 maintenance - finish ====

4. Stopping the hosted engine agent:
==== Tue Nov 17 13:11:18 IST 2015 stopping he agent - start ====
# agent.log
MainThread::INFO::2015-11-17 13:11:26,854::agent::121::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
==== Tue Nov 17 13:11:53 IST 2015 stopping he agent - finish ====

5. According to the script output, we still have 2 lockspaces, from 13:12:06 until 13:12:56 when the script ends:
s hosted-engine:1:/var/run/vdsm/storage/5a6cc8b0-07f7-45c3-8bb5-7ba630170130/a66d5806-f79c-4195-bd9b-37f954282ed5/fca4a3b2-717e-400e-aea0-62c9972eed2e:0
s 5a6cc8b0-07f7-45c3-8bb5-7ba630170130:1:/rhev/data-center/mnt/10.35.64.11\:_vol_RHEV_virt_alukiano__HE__upgrade/5a6cc8b0-07f7-45c3-8bb5-7ba630170130/dom_md/ids:0

6. The hosted engine agent starts up again:
# agent.log
MainThread::INFO::2015-11-17 13:14:24,984::agent::57::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 1.2.8 started

According to comment 11, this lockspace should be removed when stopping the agent:
hosted-engine:1:/var/run/vdsm/storage/5a6cc8b0-07f7-45c3-8bb5-7ba630170130/a66d5806-f79c-4195-bd9b-37f954282ed5/fca4a3b2-717e-400e-aea0-62c9972eed2e:0

Moving back to Martin to investigate the hosted engine side. In the meantime, I will check what the second lockspace is.
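For reference, the "suggested script" whose output is quoted throughout this bug pairs wall-clock timestamps with sanlock state. A minimal equivalent, assuming nothing beyond the stock sanlock CLI, would be:

#!/bin/bash
# Print a timestamped view of registered sanlock lockspaces once per second.
while true; do
    date
    sanlock client status | grep '^s ' || echo "(no lockspaces)"
    sleep 1
done

Lockspace lines in 'sanlock client status' output start with "s ", which is why the outputs quoted in this bug carry that prefix.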
I think it is also related to hosted engine, but I need to investigate this further.

Regarding the second lockspace:

s 5a6cc8b0-07f7-45c3-8bb5-7ba630170130:1:/rhev/data-center/mnt/10.35.64.11\:_vol_RHEV_virt_alukiano__HE__upgrade/5a6cc8b0-07f7-45c3-8bb5-7ba630170130/dom_md/ids:0

This is a lockspace on the domain 5a6cc8b0-07f7-45c3-8bb5-7ba630170130. This domain is not part of the pool; when we connect to the storage pool, we can see that vdsm is managing only one domain:

Thread-126566::INFO::2015-11-17 13:07:12,867::logUtils::44::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID=u'00000002-0002-0002-0002-0000000001c6', hostID=1, msdUUID=u'34da7c89-06e3-49a3-9a44-912c0ff0b6e7', masterVersion=1, domainsMap={u'34da7c89-06e3-49a3-9a44-912c0ff0b6e7': u'active'}, options=None)

Domain 34da7c89-06e3-49a3-9a44-912c0ff0b6e7 is owned by vdsm. When we disconnect from the storage pool, vdsm stops monitoring this domain:

Thread-126682::INFO::2015-11-17 13:08:37,068::domainMonitor::115::Storage.DomainMonitor::(stopMonitoring) Stop monitoring 34da7c89-06e3-49a3-9a44-912c0ff0b6e7
Thread-126570::DEBUG::2015-11-17 13:08:37,068::domainMonitor::192::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for 34da7c89-06e3-49a3-9a44-912c0ff0b6e7
Thread-126570::INFO::2015-11-17 13:08:37,069::clusterlock::242::Storage.SANLock::(releaseHostId) Releasing host id for domain 34da7c89-06e3-49a3-9a44-912c0ff0b6e7 (id: 1)

I suspect that this domain is owned by the hosted engine agent. Vdsm supports monitoring external storage domains. External programs can start monitoring these domains using the startMonitoringDomain verb; when they are done, they must invoke the stopMonitoringDomain verb to stop monitoring the domain. See https://github.com/oVirt/vdsm/blob/master/lib/api/vdsmapi-schema.json#L2242 (a sketch of these calls follows the grep results below).

We need older log files showing when vdsm started to monitor this domain. Artyom, can you find and attach the log with this line:

Run and protect: startMonitoringDomain

Created attachment 1097510 [details]
vdsm.log.83
I found matching lines in one of the vdsm logs:
grep "Run and protect: startMonitoringDomain" vdsm.log*
vdsm.log.83:Thread-77::INFO::2015-11-15 11:17:12,286::logUtils::44::dispatcher::(wrapper) Run and protect: startMonitoringDomain(sdUUID='5a6cc8b0-07f7-45c3-8bb5-7ba630170130', hostID=1, options=None)
vdsm.log.83:Thread-77::INFO::2015-11-15 11:17:12,291::logUtils::47::dispatcher::(wrapper) Run and protect: startMonitoringDomain, Return response: None
vdsm.log.83:Thread-82::INFO::2015-11-15 09:41:24,323::logUtils::44::dispatcher::(wrapper) Run and protect: startMonitoringDomain(sdUUID='5a6cc8b0-07f7-45c3-8bb5-7ba630170130', hostID=1, options=None)
vdsm.log.83:Thread-82::INFO::2015-11-15 09:41:24,327::logUtils::47::dispatcher::(wrapper) Run and protect: startMonitoringDomain, Return response: None
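For completeness, the start/stop pairing described above looks roughly like this from a shell. This is a hedged sketch: whether vdsClient exposes these schema verbs under exactly these names is an assumption, and the UUID is the one from this bug.

# Assumption: vdsClient exposes the startMonitoringDomain/stopMonitoringDomain
# verbs from the schema linked above; '-s 0' talks to the local vdsm over SSL.
SD_UUID=5a6cc8b0-07f7-45c3-8bb5-7ba630170130
HOST_ID=1

vdsClient -s 0 startMonitoringDomain "$SD_UUID" "$HOST_ID"
# ... the caller owns the monitor and the host id on the domain's lockspace ...
vdsClient -s 0 stopMonitoringDomain "$SD_UUID"

# Every start should eventually have a matching stop; compare the counts:
grep -h "Run and protect: startMonitoringDomain(" /var/log/vdsm/vdsm.log* | wc -l
grep -h "Run and protect: stopMonitoringDomain(" /var/log/vdsm/vdsm.log* | wc -l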
(In reply to Artyom from comment #16)
> Created attachment 1097510 [details]
> vdsm.log.83
> vdsm.log.83:Thread-82::INFO::2015-11-15 09:41:24,323::logUtils::44::dispatcher::(wrapper) Run and protect: startMonitoringDomain(sdUUID='5a6cc8b0-07f7-45c3-8bb5-7ba630170130', hostID=1, options=None)
> vdsm.log.83:Thread-82::INFO::2015-11-15 09:41:24,327::logUtils::47::dispatcher::(wrapper) Run and protect: startMonitoringDomain, Return response: None

Artyom, is this the time when hosted engine was installed? We must find the program that started monitoring - it owns this domain and must stop monitoring it when it is not needed.

I believe yes, because it is the oldest one. If you need some additional information, ping me and I will provide you all you need or will give you access to my host.

Simone, is it possible that hosted engine setup starts monitoring the hosted engine vm, imports the vm into the engine, and does not stop monitoring the domain after import?

On upgrade you don't re-run hosted-engine-setup at all.

The agent is calling startMonitoringDomain, it seems. But it always needs that unless it is being killed... what happens to domain monitoring when vdsm dies first? Is it restored on VDSM restart?

(In reply to Martin Sivák from comment #21)
> The agent is calling startMonitoringDomain it seems. But it always needs
> that unless it is being killed.. what happens to domain monitoring when vdsm
> dies first? Is it restored on VDSM restart?

External domains are owned by the caller; vdsm does not manage them. If you start monitoring you should stop it, and if vdsm restarts you should start monitoring again. This verb is also deprecated; it seems like a temporary solution added for hosted engine. Once the hosted engine is imported into the DC, hosted engine can stop monitoring this domain. Note that stopping monitoring will release the host id on that domain, which may cause failures in the imported domain until it acquires the host id again.

I think we were using it as there was no other way to make sure vdsm already knows about the storage. Hosted engine storage was supposed to be mounted by vdsm (prepareImage), but it was probably a bit complicated without a SD. If all the hosts in the data center go down at the same time due to a power outage, we still have to properly handle the distributed process of starting the engine.

The attached patch does not fix this issue, but it allows the hosted engine agent to stop monitoring the hosted engine storage domain when entering local maintenance in a race-free way. Martin, what do you think?

QE reports this is a test blocker for them; moving back to 3.6.1. Upgrade should work.

Nir, doesn't your last series of patches fix this issue too?

(In reply to Allon Mureinik from comment #27)
> Nir, doesn't your last series of patches fix this issue too?

No, but now the hosted engine agent can safely stop monitoring the hosted engine storage domain, which will release the host id and allow an upgrade.

What is the status of this bug?

Hi Martin, I see that hosts under maintenance still have one sanlock lockspace; is this the desired behavior?
Before maintenance:

# sanlock client status
daemon 162c27d5-da7c-4cfc-a6e2-0e49eaf4da2c.master-vds
p -1 helper
p -1 listener
p -1 status
s hosted-engine:2:/var/run/vdsm/storage/7c5171bf-bf2b-46cd-b75a-41bbc3a969ba/a5e71225-71cd-43a8-8f6b-8cc68c9537a3/01b08932-0c39-4db6-a190-6a4577a26853:0
s 7c5171bf-bf2b-46cd-b75a-41bbc3a969ba:2:/rhev/data-center/mnt/10.35.64.11\:_vol_RHEV_Virt_alukiano__HE__upgrade/7c5171bf-bf2b-46cd-b75a-41bbc3a969ba/dom_md/ids:0 ADD
s cec30bfe-9640-4da6-94e9-3118f13d0af0:2:/rhev/data-center/mnt/10.35.64.11\:_vol_RHEV_Virt_alukiano__HE__upgrade__nfs__0/cec30bfe-9640-4da6-94e9-3118f13d0af0/dom_md/ids:0 ADD

After maintenance:

# sanlock client status
daemon 162c27d5-da7c-4cfc-a6e2-0e49eaf4da2c.master-vds
p -1 helper
p -1 listener
p -1 status
s hosted-engine:2:/var/run/vdsm/storage/7c5171bf-bf2b-46cd-b75a-41bbc3a969ba/a5e71225-71cd-43a8-8f6b-8cc68c9537a3/01b08932-0c39-4db6-a190-6a4577a26853:0

Yes, this is the expected behaviour. The hosted engine agent has to be stopped to release its lock. It needs it to publish statuses to other nodes, and it does that even in maintenance mode.

Ok, thanks Martin.

Verified on ovirt-hosted-engine-ha-1.3.3.3-1.el7ev.noarch.

After maintenance:

# sanlock client status
daemon 162c27d5-da7c-4cfc-a6e2-0e49eaf4da2c.master-vds
p -1 helper
p -1 listener
p -1 status
s hosted-engine:2:/var/run/vdsm/storage/7c5171bf-bf2b-46cd-b75a-41bbc3a969ba/a5e71225-71cd-43a8-8f6b-8cc68c9537a3/01b08932-0c39-4db6-a190-6a4577a26853:0

After stopping ovirt-ha-agent:

# sanlock client status
daemon 162c27d5-da7c-4cfc-a6e2-0e49eaf4da2c.master-vds
p -1 helper
p -1 listener
p -1 status

According to the verification status and target milestone, this issue should be fixed in oVirt 3.6.1. Closing as CURRENTRELEASE.
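Given the verified behaviour, the upgrade path on a host boils down to: enter maintenance, stop the agent, confirm sanlock holds no lockspaces, then upgrade. A minimal sketch of that final check (the service name ovirt-ha-agent is the one used in the verification above; the yum invocation is illustrative):

#!/bin/bash
# Pre-upgrade guard: refuse to update sanlock while lockspaces are registered.
systemctl stop ovirt-ha-agent   # releases the hosted-engine lockspace

if sanlock client status | grep -q '^s '; then
    echo "sanlock still holds lockspaces; aborting upgrade" >&2
    exit 1
fi
yum update -y sanlock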