Bug 1173951 - [hosted-engine] [iSCSI support] Hosted-engine VM pauses on EIO when deactivating an iSCSI domain in the setup
Summary: [hosted-engine] [iSCSI support] Hosted-engine VM pauses on EIO when deactivat...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Liron Aravot
QA Contact: Elad
URL:
Whiteboard: storage
Depends On: 1171452
Blocks: rhev35rcblocker rhev35gablocker
 
Reported: 2014-12-14 12:37 UTC by Elad
Modified: 2016-02-10 18:57 UTC (History)

Fixed In Version: ovirt-engine-3.5.0_vt13.4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-16 19:10:58 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
logs from host and engine (6.55 MB, application/x-gzip)
2014-12-14 12:37 UTC, Elad
sosreport and db dump (7.10 MB, application/x-xz)
2014-12-16 08:35 UTC, Elad


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 36212 0 master MERGED core: avoid saving duplicate records when saving LUN connections Never
oVirt gerrit 36236 0 ovirt-engine-3.5 MERGED core: avoid saving duplicate records when saving LUN connections Never

Description Elad 2014-12-14 12:37:54 UTC
Created attachment 968457 [details]
logs from host and engine

Description of problem:

While testing the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1157239, I did not notice that putting an iSCSI domain into maintenance causes the hosted-engine VM to pause. I missed it because the engine itself was not functioning while its VM was paused, so no event was reported for it.

I tested the scenario again, and this time I saw that the VM on which the engine is installed stops functioning for a brief time while an iSCSI domain is put into maintenance. I checked the logs on the host and saw that the VM moved to paused and then resumed. I tried the scenario one more time, and then the VM did not resume from paused at all; I had to resume it manually using virsh.

Version-Release number of selected component (if applicable):
RHEL6.6 installed on the host
rhev 3.5 vt13.1
vdsm-4.16.8.1-2.el6ev.x86_64
ovirt-hosted-engine-setup-1.2.1-7.el6ev.noarch
ovirt-hosted-engine-ha-1.2.4-2.el6ev.noarch
libvirt-0.10.2-46.el6_6.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
selinux-policy-3.7.19-260.el6.noarch
sanlock-2.8-1.el6.x86_64

rhevm-3.5.0-0.23.beta.el6ev.noarch


How reproducible:
Always

Steps to Reproduce: 
The same steps from https://bugzilla.redhat.com/show_bug.cgi?id=1157239:

1. Deploy hosted-engine using iSCSI
2. Create an iSCSI storage domain using a LUN from the same storage server where the engine VM's disk is located. Create one more storage domain (NFS)
3. Put the iSCSI domain into maintenance


Actual results:
The VM gets paused on EIO. The event is not reported by the engine because the machine (VM) on which the engine is installed is paused. I noticed a delay in the engine's response when browsing between tabs in webadmin.

iSCSI domain moved to maintenance:

2014-12-14 11:57:48,427 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (org.ovirt.thread.pool-7-thread-22) [2f516469] START, DisconnectStorageServerVDSCommand(HostName = hosted_engine_1, HostId = 7ce081ee-cf0d-486d-9d76-7a6e63a0f3d6, storagePoolId = 00000002-0002-0002-0002-0000000002e8, storageType = ISCSI, connectionList = [{ id: 40caad41-6985-402f-90a6-00e3d1a3fe13, connection: 10.35.146.129, iqn: iqn.2008-05.com.xtremio:001e675b8ee0, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: b1c3058b-1a60-44a8-af0e-0a0302d6262d, connection: 10.35.146.161, iqn: iqn.2008-05.com.xtremio:001e675b8ee1, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: fc73b008-1af8-4fca-975f-526fa55364bd, connection: 10.35.146.193, iqn: iqn.2008-05.com.xtremio:001e675ba170, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 44d36dd7-cfc7-4804-b206-2e91819701a3, connection: 10.35.146.225, iqn: iqn.2008-05.com.xtremio:001e675ba171, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 6937c373


**There is a time difference of 2 hours between the engine and host logs**
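
To line up the two log excerpts in this report, the engine timestamp has to be shifted by the 2-hour offset. The sketch below assumes the host clock is the one running ahead (the report only states the size of the difference, not its direction), and the function name is made up for illustration:

```python
from datetime import datetime, timedelta

LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# Assumption: the host clock is 2 hours ahead of the engine clock.
HOST_AHEAD_OF_ENGINE = timedelta(hours=2)


def engine_to_host_time(engine_stamp: str) -> datetime:
    """Convert an engine.log timestamp to the host (vdsm.log) clock."""
    return datetime.strptime(engine_stamp, LOG_FORMAT) + HOST_AHEAD_OF_ENGINE


# Timestamp of DisconnectStorageServerVDSCommand START in engine.log.
disconnect_on_host = engine_to_host_time("2014-12-14 11:57:48,427")

# Timestamp of the vmGetStats reply in vdsm.log that shows the VM as Paused.
paused_seen = datetime.strptime("2014-12-14 13:58:38,500", LOG_FORMAT)

delta_seconds = (paused_seen - disconnect_on_host).total_seconds()
print(disconnect_on_host, delta_seconds)
```

Under that assumption, the paused state is reported on the host roughly 50 seconds after the engine starts the disconnect.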

I checked vdsm.log and saw that the VM is reported with 'status': 'Paused' and 'pauseCode': 'EIO':


Thread-646873::DEBUG::2014-12-14 13:58:38,500::BindingXMLRPC::1149::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'displayInfo': [{'tlsPort': u'5900', 'ipAddress': '0', 'type': 'spice', 'port': '-1'}], 'memUsage': '0', 'acpiEnable': 'true', 'guestFQDN': '', 'pid': '25041', 'session': 'Unknown', 'displaySecurePort': u'5900', 'timeOffset': '0', 'balloonInfo': {}, 'pauseCode': 'EIO', 'network': {u'vnet0': {'macAddr': '00:16:3E:76:D5:D5', 'rxDropped': '0', 'rxErrors': '0', 'txDropped': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'unknown', 'speed': '1000', 'name': u'vnet0'}}, 'vmType': 'kvm', 'cpuUser': '0.33', 'elapsedTime': '418576', 'vmJobs': {}, 'cpuSys': '0.25', 'appsList': [], 'displayType': 'qxl', 'vcpuCount': '2', 'clientIp': '', 'hash': '8422439248032234059', 'vmId': '3d9ba108-351e-4c07-ae06-95b4457b799e', 'displayIp': '0', 'vcpuPeriod': 100000L, 'displayPort': '-1', 'vcpuQuota': '-1', 'kvmEnable': 'true', 'disks': {u'vda': {'readLatency': '0', 'apparentsize': '26843545600', 'writeLatency': '0', 'imageID': '1897de25-698e-4f49-aca3-516afd1c02a5', 'flushLatency': '0', 'readRate': '0.00', 'truesize': '26843545600', 'writeRate': '0.00'}, u'hdc': {'readLatency': '0', 'apparentsize': '0', 'writeLatency': '0', 'flushLatency': '0', 'readRate': '0.00', 'truesize': '0', 'writeRate': '0.00'}}, 'monitorResponse': '0', 'statsAge': '0.30', 'username': 'Unknown', 'status': 'Paused', 'guestCPUCount': -1, 'ioTune': [], 'guestIPs': ''}]}
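
For anyone checking their own host for the same symptom, the two fields that matter in the reply above are 'status' and 'pauseCode'. A minimal sketch, assuming the reply has already been obtained as a Python dict with the shape shown in the log; the helper name is hypothetical and is not part of vdsm:

```python
def paused_on_eio(stats_reply: dict) -> list:
    """Return the vmId of every VM reported as Paused with pauseCode EIO."""
    return [
        vm.get("vmId")
        for vm in stats_reply.get("statsList", [])
        if vm.get("status") == "Paused" and vm.get("pauseCode") == "EIO"
    ]


# Trimmed copy of the reply logged above; only the relevant keys are kept.
reply = {
    "status": {"message": "Done", "code": 0},
    "statsList": [
        {
            "vmId": "3d9ba108-351e-4c07-ae06-95b4457b799e",
            "status": "Paused",
            "pauseCode": "EIO",
        }
    ],
}

print(paused_on_eio(reply))
```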



Expected results:
The hosted engine VM should not pause on EIO when deactivating an iSCSI domain in the setup.


Additional info:
logs from host and engine

Comment 1 Allon Mureinik 2014-12-14 21:06:43 UTC
Liron, why are we disconnecting from the target? Shouldn't we check that no LUN disc uses this connection?

Sandro - are we registering the HE's DirectLUN's connection? If not, I don't see how this can be avoided.

Comment 2 Sandro Bonazzola 2014-12-15 12:16:39 UTC
Allon, we register the disc used by the Hosted Engine VM as direct LUN.
We had a BZ requiring that and as far as I can tell it works, see https://bugzilla.redhat.com/show_bug.cgi?id=1157239#c11.
If I've understood correctly, the iSCSI domain Elad is deactivating is not the same one used by the Hosted Engine; it's just an iSCSI domain on the same host providing the storage domain for Hosted Engine.

Elad can you confirm?

Comment 3 Liron Aravot 2014-12-15 13:09:52 UTC
(In reply to Allon Mureinik from comment #1)
> Liron, why are we disconnecting from the target? Shouldn't we check that no
> LUN disc uses this connection?
> 

That should be checked before disconnecting.

Elad, the engine log you pasted is incorrect.
Please attach the correct log and a database dump.

thanks.
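
The check discussed in comments 1 and 3 amounts to a guard of roughly the following shape. This is only an illustrative sketch, not the ovirt-engine code: the function and variable names are hypothetical, and which connection the hosted-engine LUN uses is assumed for the example.

```python
def connections_safe_to_disconnect(domain_connection_ids, lun_connection_ids):
    """Keep only the connections that no direct LUN disk still relies on."""
    in_use = set(lun_connection_ids)
    return [cid for cid in domain_connection_ids if cid not in in_use]


# Two of the connection ids from the DisconnectStorageServerVDSCommand log line.
domain_connections = [
    "40caad41-6985-402f-90a6-00e3d1a3fe13",
    "b1c3058b-1a60-44a8-af0e-0a0302d6262d",
]

# Assumption for illustration: the hosted-engine direct LUN uses the first one.
lun_connections = ["40caad41-6985-402f-90a6-00e3d1a3fe13"]

to_disconnect = connections_safe_to_disconnect(domain_connections, lun_connections)
print(to_disconnect)
```

A guard like this can only work if the domain and the LUN disk refer to the same connection record. If the same target is stored twice under different ids, the comparison finds no overlap and the shared session is dropped.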

Comment 4 Elad 2014-12-16 08:35:06 UTC
Created attachment 969456 [details]
sosreport and db dump

sosreport and db dump attached.

Liron, it's possible you won't see any related event in engine log since the engine doesn't operate while the machine it is running on is paused.

Comment 5 Elad 2014-12-16 09:42:44 UTC
(In reply to Sandro Bonazzola from comment #2)
> Allon, we register the disc used by the Hosted Engine VM as direct LUN.
> We had a BZ requiring that and as far as I can tell it works, see
> https://bugzilla.redhat.com/show_bug.cgi?id=1157239#c11.
> If I've understood correctly, the iSCSI domain Elad is deactivating is not
> the same used by the Hosted Engine, it's just an iSCSI domain on the same
> host providing the storage domain for Hosted Engine.
> 
> Elad can you confirm?

It's a domain located on the same storage server where the hosted-engine VM's disk is. It is connected via the same iSCSI target.

Comment 6 Liron Aravot 2014-12-21 12:37:11 UTC
Adding the requires_release_note? flag after talking with amureini.
After the solution to this bug, if a connection with the same details (same username/target/port/portal) already exists in the system, it will be used as the connection for the newly added LUN, even if the password is different.
That will be handled in bug https://bugzilla.redhat.com/show_bug.cgi?id=1176402 for 3.5.1.
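
The behaviour described in this comment can be sketched as a lookup keyed on username/target/port/portal, with the password deliberately left out of the key. This is an illustration only, not the code from the gerrit patches; the class, field, and function names are hypothetical, and port 3260 and the credentials are assumed values.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Connection(NamedTuple):
    id: str
    portal: str              # the "connection" field in the engine log
    port: str
    target: str              # the iqn
    username: Optional[str]
    password: Optional[str]


def identity(conn: Connection) -> Tuple[Optional[str], str, str, str]:
    """Fields that decide whether two records are the same connection."""
    return (conn.username, conn.target, conn.port, conn.portal)


def find_or_register(existing: Dict[tuple, Connection], new_conn: Connection) -> Connection:
    """Reuse a stored connection with the same identity instead of saving a duplicate."""
    return existing.setdefault(identity(new_conn), new_conn)


stored: Dict[tuple, Connection] = {}
first = Connection("40caad41-6985-402f-90a6-00e3d1a3fe13", "10.35.146.129", "3260",
                   "iqn.2008-05.com.xtremio:001e675b8ee0", "user", "secret-a")
# Same target reached again while adding a LUN, with a new id and another password.
second = first._replace(id="hypothetical-second-id", password="secret-b")

used_for_first = find_or_register(stored, first)
used_for_second = find_or_register(stored, second)
print(used_for_second.id, len(stored))
```

Because the password is not part of the key, the second request silently reuses the first record, which is the limitation this comment defers to bug 1176402.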

Comment 7 Elad 2014-12-21 15:57:10 UTC
Cannot be tested due to https://bugzilla.redhat.com/show_bug.cgi?id=1171452

Comment 8 Elad 2014-12-24 16:10:32 UTC
The hosted-engine VM doesn't pause when the last iSCSI domain, which is located on the storage server where the engine image is deployed, is moved to maintenance.


Verified using rhev 3.5 vt13.5
rhevm-3.5.0-0.27.el6ev.noarch

Comment 9 Allon Mureinik 2015-02-16 19:10:58 UTC
RHEV-M 3.5.0 has been released, closing this bug.

