Created attachment 968457 [details]
logs from host and engine

Description of problem:
While testing the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1157239, I did not notice at first that putting an iSCSI domain in maintenance causes the hosted engine VM to pause. I missed it because the engine itself wasn't functioning while its VM was paused, so no event was reported.
I tested the scenario again, and this time I saw that the VM the engine is installed on stops functioning for a brief time while the iSCSI domain is put in maintenance. I checked the logs on the host and saw that the VM moved to paused and then resumed.
I tried the scenario one more time, and that time the VM was not resumed from paused at all; I had to resume it manually using virsh.

Version-Release number of selected component (if applicable):
RHEL6.6 installed on the host
rhev 3.5 vt13.1
vdsm-4.16.8.1-2.el6ev.x86_64
ovirt-hosted-engine-setup-1.2.1-7.el6ev.noarch
ovirt-hosted-engine-ha-1.2.4-2.el6ev.noarch
libvirt-0.10.2-46.el6_6.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.448.el6.x86_64
selinux-policy-3.7.19-260.el6.noarch
sanlock-2.8-1.el6.x86_64
rhevm-3.5.0-0.23.beta.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
The same steps as in https://bugzilla.redhat.com/show_bug.cgi?id=1157239:
1. Deploy hosted-engine using iSCSI
2. Create an iSCSI storage domain using a LUN from the same storage server where the engine's VM disk is located. Create one more storage domain (NFS)
3. Put the iSCSI domain in maintenance

Actual results:
The VM gets paused on EIO. The event is not reported by the engine because the VM that the engine is installed on is paused. I noticed a delay in the engine's response when browsing between tabs in webadmin.
iSCSI domain moved to maintenance:

2014-12-14 11:57:48,427 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (org.ovirt.thread.pool-7-thread-22) [2f516469] START, DisconnectStorageServerVDSCommand(HostName = hosted_engine_1, HostId = 7ce081ee-cf0d-486d-9d76-7a6e63a0f3d6, storagePoolId = 00000002-0002-0002-0002-0000000002e8, storageType = ISCSI, connectionList = [{ id: 40caad41-6985-402f-90a6-00e3d1a3fe13, connection: 10.35.146.129, iqn: iqn.2008-05.com.xtremio:001e675b8ee0, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: b1c3058b-1a60-44a8-af0e-0a0302d6262d, connection: 10.35.146.161, iqn: iqn.2008-05.com.xtremio:001e675b8ee1, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: fc73b008-1af8-4fca-975f-526fa55364bd, connection: 10.35.146.193, iqn: iqn.2008-05.com.xtremio:001e675ba170, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 44d36dd7-cfc7-4804-b206-2e91819701a3, connection: 10.35.146.225, iqn: iqn.2008-05.com.xtremio:001e675ba171, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 6937c373

**There is a time difference of 2 hours between engine and host**

I checked vdsm.log and saw the following ('status': 'Paused'):

Thread-646873::DEBUG::2014-12-14 13:58:38,500::BindingXMLRPC::1149::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'displayInfo': [{'tlsPort': u'5900', 'ipAddress': '0', 'type': 'spice', 'port': '-1'}], 'memUsage': '0', 'acpiEnable': 'true', 'guestFQDN': '', 'pid': '25041', 'session': 'Unknown', 'displaySecurePort': u'5900', 'timeOffset': '0', 'balloonInfo': {}, 'pauseCode': 'EIO', 'network': {u'vnet0': {'macAddr': '00:16:3E:76:D5:D5', 'rxDropped': '0', 'rxErrors': '0', 'txDropped': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'unknown', 'speed': '1000', 'name': u'vnet0'}}, 'vmType': 'kvm', 'cpuUser': '0.33', 'elapsedTime': '418576', 'vmJobs': {}, 'cpuSys': '0.25', 'appsList': [], 'displayType': 'qxl', 'vcpuCount': '2', 'clientIp': '', 'hash': '8422439248032234059', 'vmId': '3d9ba108-351e-4c07-ae06-95b4457b799e', 'displayIp': '0', 'vcpuPeriod': 100000L, 'displayPort': '-1', 'vcpuQuota': '-1', 'kvmEnable': 'true', 'disks': {u'vda': {'readLatency': '0', 'apparentsize': '26843545600', 'writeLatency': '0', 'imageID': '1897de25-698e-4f49-aca3-516afd1c02a5', 'flushLatency': '0', 'readRate': '0.00', 'truesize': '26843545600', 'writeRate': '0.00'}, u'hdc': {'readLatency': '0', 'apparentsize': '0', 'writeLatency': '0', 'flushLatency': '0', 'readRate': '0.00', 'truesize': '0', 'writeRate': '0.00'}}, 'monitorResponse': '0', 'statsAge': '0.30', 'username': 'Unknown', 'status': 'Paused', 'guestCPUCount': -1, 'ioTune': [], 'guestIPs': ''}]}

Expected results:
The hosted engine VM should not pause on EIO when deactivating an iSCSI domain in the setup.

Additional info:
logs from host and engine
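For anyone triaging similar logs, here is a minimal Python sketch of the two checks done by hand above: spotting a VM reported as paused with pauseCode EIO in a vmGetStats-style statsList, and lining up the engine and host timestamps. The function names are hypothetical, the stats entry is trimmed to the three relevant keys, and the offset is assumed to be exactly 2 hours with the host ahead of the engine, as the timestamps in this report suggest.

```python
from datetime import datetime, timedelta

# Timestamp layout used by both engine.log and vdsm.log lines quoted above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def paused_on_eio(stats_list):
    """Return the vmId of every entry reported as Paused with pauseCode EIO."""
    return [
        vm.get("vmId")
        for vm in stats_list
        if vm.get("status") == "Paused" and vm.get("pauseCode") == "EIO"
    ]


def seconds_after_engine_event(engine_ts, host_ts, host_ahead=timedelta(hours=2)):
    """Seconds between an engine event and a host log line.

    Assumption: the host clock is ahead of the engine clock by `host_ahead`.
    """
    engine_time = datetime.strptime(engine_ts, LOG_TIME_FORMAT) + host_ahead
    host_time = datetime.strptime(host_ts, LOG_TIME_FORMAT)
    return (host_time - engine_time).total_seconds()


# Trimmed copy of the vmGetStats entry from vdsm.log (only the relevant keys).
stats_list = [{
    "vmId": "3d9ba108-351e-4c07-ae06-95b4457b799e",
    "status": "Paused",
    "pauseCode": "EIO",
}]

print(paused_on_eio(stats_list))
print(seconds_after_engine_event("2014-12-14 11:57:48,427",
                                 "2014-12-14 13:58:38,500"))
```

Under that offset assumption, the paused status shows up in vdsm.log roughly 50 seconds after the engine issued DisconnectStorageServerVDSCommand.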
Liron, why are we disconnecting from the target? Shouldn't we check that no LUN disc uses this connection? Sandro, are we registering the HE's direct LUN connection? If not, I don't see how this can be avoided.
Allon, we register the disc used by the Hosted Engine VM as a direct LUN. We had a BZ requiring that, and as far as I can tell it works; see https://bugzilla.redhat.com/show_bug.cgi?id=1157239#c11. If I've understood correctly, the iSCSI domain Elad is deactivating is not the one used by the Hosted Engine; it's just an iSCSI domain on the same host that provides the storage domain for Hosted Engine. Elad, can you confirm?
(In reply to Allon Mureinik from comment #1)
> Liron, why are we disconnecting from the target? Shouldn't we check that no
> LUN disc uses this connection?

That should be checked before disconnecting.

Elad, the engine log you pasted is incorrect. Please attach the correct log and a database dump. Thanks.
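To make the check being discussed concrete, here is a rough Python sketch of the idea: before disconnecting the storage server connections of a domain going to maintenance, drop any connection that a LUN disk still uses. This is only an illustration of the logic, not the engine's actual code; the function name and the connection IDs are hypothetical.

```python
def connections_safe_to_disconnect(domain_connections, connections_in_use):
    """Return the connections of the deactivated domain that nothing else uses.

    domain_connections: connection IDs of the domain moving to maintenance.
    connections_in_use: connection IDs still needed elsewhere, for example by
        a direct LUN disk such as the hosted engine VM disk.
    """
    in_use = set(connections_in_use)
    # Preserve the original order so the result is easy to compare with logs.
    return [conn_id for conn_id in domain_connections if conn_id not in in_use]


# Hypothetical IDs: the domain uses two connections, and a direct LUN disk
# shares the first one.
domain_connections = ["conn-a", "conn-b"]
lun_disk_connections = ["conn-a"]

print(connections_safe_to_disconnect(domain_connections, lun_disk_connections))
```

With such a filter in place, a connection shared with the hosted engine disk would stay connected while the remaining ones are dropped.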
Created attachment 969456 [details]
sosreport and db dump

sosreport and db dump attached. Liron, you may not see any related event in the engine log, since the engine doesn't operate while the machine it runs on is paused.
(In reply to Sandro Bonazzola from comment #2)
> Allon we register the disc used by the Hosted Engine VM as direct LUN.
> We had a BZ requiring that and as far as I can tell it works, see
> https://bugzilla.redhat.com/show_bug.cgi?id=1157239#c11.
> If I've understood correctly, the iSCSI domain Elad is disactivating is not
> the same used by the Hosted Engine, it's just an iSCSI domain on the same
> host providing the storage domain for Hosted Engine.
>
> Elad can you confirm?

It's a domain located on the same storage server where the hosted-engine VM's disk is. It is connected via the same iSCSI target.
Adding the requires_release_note? flag after talking with amureini. After the fix for this bug, if a connection with the same details (same username/target/port/portal) already exists in the system, it will be used as the connection for the newly added LUN, even if the password is different. That case will be handled in bug https://bugzilla.redhat.com/show_bug.cgi?id=1176402 for 3.5.1.
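The behavior described in this comment can be sketched in Python as a lookup keyed on username/target/port/portal, with the password deliberately left out of the key. The class and method names are hypothetical and this is not the engine's implementation; the portal and IQN are taken from the log in the description, while port 3260 (the default iSCSI port) and the credentials are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class IscsiConnection:
    portal: str
    port: int
    target: str
    username: Optional[str]
    password: Optional[str]

    def identity(self) -> Tuple[str, int, str, Optional[str]]:
        # The password is intentionally not part of the identity.
        return (self.portal, self.port, self.target, self.username)


class ConnectionRegistry:
    def __init__(self) -> None:
        self._by_identity: Dict[tuple, IscsiConnection] = {}

    def get_or_register(self, requested: IscsiConnection) -> IscsiConnection:
        """Reuse an existing connection with the same details, else store this one."""
        return self._by_identity.setdefault(requested.identity(), requested)


registry = ConnectionRegistry()
first = registry.get_or_register(IscsiConnection(
    "10.35.146.129", 3260, "iqn.2008-05.com.xtremio:001e675b8ee0",
    "user", "secret-a"))
second = registry.get_or_register(IscsiConnection(
    "10.35.146.129", 3260, "iqn.2008-05.com.xtremio:001e675b8ee0",
    "user", "secret-b"))

# The second request gets the existing connection, stored password included.
print(second is first, second.password)
```

The sketch also shows why the differing-password case needs separate handling: the newly supplied password is silently ignored in favor of the stored one.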
Cannot be tested due to https://bugzilla.redhat.com/show_bug.cgi?id=1171452
The hosted engine VM doesn't pause when the last iSCSI domain, which is located on the storage server where the engine image is deployed, is moved to maintenance. Verified using rhev 3.5 vt13.5 rhevm-3.5.0-0.27.el6ev.noarch
RHEV-M 3.5.0 has been released, closing this bug.