Bug 1261996

Summary: [hosted-engine] Storage pool UUID is modified when importing the HE storage domain to the setup. Hence, the HE VM dies
Product: [oVirt] ovirt-engine
Reporter: Elad <ebenahar>
Component: BLL.HostedEngine
Assignee: Sandro Bonazzola <sbonazzo>
Status: CLOSED CURRENTRELEASE
QA Contact: Elad <ebenahar>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.6.0
CC: acanan, bugs, ebenahar, gklein, lsurette, mgoldboi, mlipchuk, pstehlik, rgolan, sbonazzo, sherold, stirabos, ykaul
Target Milestone: ovirt-3.6.0-rc
Keywords: Regression
Target Release: 3.6.0
Flags: rule-engine: ovirt-3.6.0+
mgoldboi: blocker+
sherold: planning_ack+
rgolan: devel_ack+
pstehlik: testing_ack+
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-10 12:55:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1269768, 1291634    
Bug Blocks:    
Attachments:
  vdsm.log (flags: none)
  Relevant log file (flags: none)
  qemu.log (flags: none)

Description Elad 2015-09-10 14:50:24 UTC
Created attachment 1072237 [details]
vdsm.log

Description of problem:
The storage pool UUID is modified when importing the storage domain on which the hosted-engine's disk is located. This should not happen; vdsm should leave this field in the storage domain metadata as it is.

Version-Release number of selected component (if applicable):
vdsm-4.17.5-1.el7ev.noarch
vdsm-python-4.17.5-1.el7ev.noarch
vdsm-infra-4.17.5-1.el7ev.noarch
vdsm-yajsonrpc-4.17.5-1.el7ev.noarch
vdsm-jsonrpc-4.17.5-1.el7ev.noarch
vdsm-cli-4.17.5-1.el7ev.noarch
vdsm-xmlrpc-4.17.5-1.el7ev.noarch
ovirt-host-deploy-1.4.0-0.0.5.master.el7ev.noarch
ovirt-hosted-engine-setup-1.3.0-0.5.beta.git2cd8189.el7ev.noarch
ovirt-vmconsole-host-1.0.0-0.0.1.master.el7ev.noarch
libgovirt-0.3.3-1.el7.x86_64
ovirt-vmconsole-1.0.0-0.0.1.master.el7ev.noarch
ovirt-hosted-engine-ha-1.3.0-0.4.beta.git1bcfe28.el7ev.noarch
ovirt-setup-lib-1.0.0-0.1.master.git6a54bc0.el7ev.noarch
libvirt-daemon-1.2.17-6.el7.x86_64
qemu-kvm-rhev-2.3.0-22.el7.x86_64
sanlock-3.2.4-1.el7.x86_64
python-2.7.5-31.el7.x86_64
selinux-policy-3.13.1-46.el7.noarch


How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted-engine (over glusterfs in my case)
2. Once setup is up, add a new storage domain to the DC
3. Import the HE storage domain to the DC

Actual results:
The SP UUID in the HE domain's metadata is changed to the SP UUID of the DC into which it is imported.

Expected results:
The SP UUID in the SD metadata should not be modified when importing the HE domain.

Additional info:
vdsm.log

Comment 1 Elad 2015-09-10 14:52:00 UTC
[root@green-vdsb dom_md]# cat metadata 
CLASS=Data
DESCRIPTION=hosted_storage
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
POOL_UUID=00000001-0001-0001-0001-000000000257
REMOTE_PATH=10.35.160.6:/elad1
ROLE=Regular
SDUUID=2cfaf5e4-34f3-4caf-bd92-0f22c866b51e
TYPE=GLUSTERFS
VERSION=3
_SHA_CKSUM=d5ed78f189334394fd5f323332d9b96121b049bf
[root@green-vdsb dom_md]# pwd
/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/dom_md
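
For reference, the metadata above is a flat KEY=VALUE file. Below is a minimal sketch (not vdsm's own parser; the path is the one from this report) for reading the POOL_UUID it records:

# Minimal sketch: read the KEY=VALUE pairs from a storage domain's
# dom_md/metadata file and print the pool UUID recorded there.
# Adjust the path for your environment.
metadata_path = ("/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/"
                 "2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/dom_md/metadata")

fields = {}
with open(metadata_path) as f:
    for line in f:
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value

print("SDUUID   :", fields.get("SDUUID"))
print("POOL_UUID:", fields.get("POOL_UUID"))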

Comment 2 Simone Tiraboschi 2015-09-10 15:07:09 UTC
The engine explicitly attaches the hosted-engine storage domain to the target data center's storage pool:

Thread-4658::DEBUG::2015-09-10 13:53:03,143::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'StorageDomain.attach' in bridge with {u'storagepoolID': u'00000001-0001-0001-0001-000000000257', u'storagedomainID': u'2cfaf5e4-34f3-4caf-bd92-0f22c866b51e'}
Thread-4658::DEBUG::2015-09-10 13:53:03,144::task::595::Storage.TaskManager.Task::(_updateState) Task=`5bc25541-fcba-4f3c-9cab-8636f8ea1bd2`::moving from state init -> state preparing
Thread-4658::INFO::2015-09-10 13:53:03,145::logUtils::48::dispatcher::(wrapper) Run and protect: attachStorageDomain(sdUUID=u'2cfaf5e4-34f3-4caf-bd92-0f22c866b51e', spUUID=u'00000001-0001-0001-0001-000000000257', options=None)
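
For illustration, the attach call above can be reconstructed as a plain JSON-RPC 2.0 request. The method and parameter names are taken from the logged bridge call; the envelope and request id are assumptions, not how the engine actually builds it:

import json
import uuid

# Illustrative reconstruction of the 'StorageDomain.attach' request seen in
# the vdsm log above: the HE storage domain is attached to the target
# data center's storage pool.
request = {
    "jsonrpc": "2.0",
    "id": str(uuid.uuid4()),
    "method": "StorageDomain.attach",
    "params": {
        "storagedomainID": "2cfaf5e4-34f3-4caf-bd92-0f22c866b51e",  # HE domain
        "storagepoolID": "00000001-0001-0001-0001-000000000257",    # target DC pool
    },
}
print(json.dumps(request, indent=2))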

Comment 3 Simone Tiraboschi 2015-09-10 15:17:44 UTC
Then sanlock releases its lock on the hosted-engine storage domain:

Thread-4658::INFO::2015-09-10 13:53:03,409::clusterlock::339::Storage.SANLock::(release) Releasing cluster lock for domain 2cfaf5e4-34f3-4caf-bd92-0f22c866b51e
Thread-4658::DEBUG::2015-09-10 13:53:03,476::clusterlock::349::Storage.SANLock::(release) Cluster lock for domain 2cfaf5e4-34f3-4caf-bd92-0f22c866b51e successfully released
Thread-4658::INFO::2015-09-10 13:53:03,477::clusterlock::245::Storage.SANLock::(releaseHostId) Releasing host id for domain 2cfaf5e4-34f3-4caf-bd92-0f22c866b51e (id: 1)
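
A quick way to confirm the release on the host is to look for the domain UUID in sanlock's own status output. A sketch, assuming the sanlock CLI is available and the script runs as root:

import subprocess

# Dump the lockspaces/resources sanlock currently holds on this host and
# look for the hosted-engine domain UUID. After the release logged above,
# the domain should no longer appear in the output.
DOMAIN = "2cfaf5e4-34f3-4caf-bd92-0f22c866b51e"
status = subprocess.run(["sanlock", "client", "status"],
                        capture_output=True, text=True, check=True).stdout
matches = [line for line in status.splitlines() if DOMAIN in line]
if matches:
    print("sanlock still holds entries for %s:" % DOMAIN)
    for line in matches:
        print("  " + line)
else:
    print("no sanlock lockspace/resource held for %s" % DOMAIN)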

Comment 4 Simone Tiraboschi 2015-09-10 15:22:47 UTC
And the hosted-engine VM went down:

93049:Thread-4660::DEBUG::2015-09-10 13:53:03,618::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.getVMFullList' in bridge with {u'vmList': [u'01b3ce29-e575-47ef-89d2-335e4da7db36']}
93050:Thread-4660::DEBUG::2015-09-10 13:53:03,620::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.getVMFullList' in bridge with [{'guestFQDN': '', 'emulatedMachine': 'rhel6.5.0', 'pid': '22369', 'guestDiskMapping': {}, 'displaySecurePort': '-1', 'cpuType': 'Conroe', 'pauseCode': 'NOERR', 'smp': '2', 'vmType': 'kvm', 'memSize': 4096, 'vmName': 'HostedEngine', 'username': 'Unknown', 'vmId': '01b3ce29-e575-47ef-89d2-335e4da7db36', 'displayIp': '0', 'displayPort': u'5900', 'guestIPs': '', 'spiceSecureChannels': 'smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir', 'nicModel': 'rtl8139,pv', 'devices': [{'alias': 'console0', 'specParams': {}, 'deviceType': 'console', 'deviceId': '82d903bd-5cc4-455b-bafd-cf90ce386bb4', 'device': 'console', 'type': 'console'}, {'device': 'memballoon', 'specParams': {'model': 'none'}, 'deviceType': 'balloon', 'type': 'balloon', 'alias': u'balloon0'}, {'alias': u'scsi0', 'deviceType': 'controller', 'address': {u'slot': u'0x04', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, 'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'device': 'vnc', 'specParams': {'spiceSecureChannels': 'smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir', 'displayIp': '0'}, 'deviceType': 'graphics', 'type': 'graphics', 'port': u'5900'}, {'nicModel': 'pv', 'macAddr': '00:16:3E:76:D5:D5', 'linkActive': True, 'network': 'ovirtmgmt', 'specParams': {}, 'filter': 'vdsm-no-mac-spoofing', 'alias': u'net0', 'deviceType': 'interface', 'deviceId': '6083a9cd-afdf-41a9-b79f-42498ee624dc', 'address': {u'slot': u'0x03', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, 'device': 'bridge', 'type': 'interface', 'name': u'vnet0'}, {'index': '2', 'iface': 'ide', 'name': u'hdc', 'alias': u'ide0-1-0', 'specParams': {}, 'readonly': 'True', 'deviceType': 'disk', 'deviceId': 'e6cdce3f-08fc-48cb-8daa-1576dbdcff49', 'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'}, {'poolID': '00000000-0000-0000-0000-000000000000', 'volumeInfo': {'domainID': '2cfaf5e4-34f3-4caf-bd92-0f22c866b51e', 'volType': 'path', 'leaseOffset': 0, 'volumeID': u'88ee5207-8d75-4cc4-8636-4fce645ccff4', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/images/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4.lease', 'imageID': '3418c4a9-d1c2-4571-babf-ca675efaa661', 'path': u'/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/images/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4'}, 'index': '0', 'iface': 'virtio', 'apparentsize': '26843545600', 'specParams': {}, 'imageID': '3418c4a9-d1c2-4571-babf-ca675efaa661', 'readonly': 'False', 'deviceType': 'disk', 'shared': 'exclusive', 'truesize': '5138464768', 'type': 'disk', 'domainID': '2cfaf5e4-34f3-4caf-bd92-0f22c866b51e', 'reqsize': '0', 'format': 'raw', 'deviceId': '3418c4a9-d1c2-4571-babf-ca675efaa661', 'address': {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, 'device': 'disk', 'path': u'/var/run/vdsm/storage/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4', 'propagateErrors': 'off', 'optional': 'false', 'name': u'vda', 'bootOrder': u'1', 'volumeID': '88ee5207-8d75-4cc4-8636-4fce645ccff4', 'alias': 
u'virtio-disk0', 'volumeChain': [{'domainID': '2cfaf5e4-34f3-4caf-bd92-0f22c866b51e', 'volType': 'path', 'leaseOffset': 0, 'volumeID': u'88ee5207-8d75-4cc4-8636-4fce645ccff4', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/images/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4.lease', 'imageID': '3418c4a9-d1c2-4571-babf-ca675efaa661', 'path': u'/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/images/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4'}]}, {'device': u'usb', 'alias': u'usb', 'deviceType': 'controller', 'type': 'controller', 'address': {u'slot': u'0x01', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x2'}}, {'device': u'ide', 'alias': u'ide', 'deviceType': 'controller', 'type': 'controller', 'address': {u'slot': u'0x01', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x1'}}, {'device': u'virtio-serial', 'alias': u'virtio-serial0', 'deviceType': 'controller', 'type': 'controller', 'address': {u'slot': u'0x05', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}}, {'device': u'unix', 'alias': u'channel0', 'deviceType': u'channel', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'1'}}, {'device': u'unix', 'alias': u'channel1', 'deviceType': u'channel', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'2'}}, {'device': u'unix', 'alias': u'channel2', 'deviceType': u'channel', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'3'}}, {'device': '', 'alias': u'video0', 'deviceType': u'video', 'type': u'video', 'address': {u'slot': u'0x02', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}}], 'status': 'Up', 'clientIp': '', 'statusTime': '4476662650', 'display': 'vnc'}]
93055:VM Channels Listener::DEBUG::2015-09-10 13:53:04,467::guestagent::218::virt.vm::(_connect) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::Attempting connection to /var/lib/libvirt/qemu/channels/01b3ce29-e575-47ef-89d2-335e4da7db36.com.redhat.rhevm.vdsm
93056:VM Channels Listener::DEBUG::2015-09-10 13:53:04,468::guestagent::232::virt.vm::(_connect) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::Failed to connect to /var/lib/libvirt/qemu/channels/01b3ce29-e575-47ef-89d2-335e4da7db36.com.redhat.rhevm.vdsm with 111
93057:libvirtEventLoop::DEBUG::2015-09-10 13:53:04,838::vm::4422::virt.vm::(onLibvirtLifecycleEvent) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::event Stopped detail 5 opaque None
93058:libvirtEventLoop::INFO::2015-09-10 13:53:04,838::vm::781::virt.vm::(_onQemuDeath) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::underlying process disconnected
93059:libvirtEventLoop::INFO::2015-09-10 13:53:04,838::vm::3685::virt.vm::(releaseVm) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::Release VM resources
93082:periodic/0::WARNING::2015-09-10 13:53:05,955::periodic::253::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['01b3ce29-e575-47ef-89d2-335e4da7db36']
93120:libvirtEventLoop::WARNING::2015-09-10 13:53:06,489::utils::144::root::(rmFile) File: /var/lib/libvirt/qemu/channels/01b3ce29-e575-47ef-89d2-335e4da7db36.com.redhat.rhevm.vdsm already removed
93121:libvirtEventLoop::WARNING::2015-09-10 13:53:06,490::utils::144::root::(rmFile) File: /var/lib/libvirt/qemu/channels/01b3ce29-e575-47ef-89d2-335e4da7db36.org.qemu.guest_agent.0 already removed
93123:libvirtEventLoop::INFO::2015-09-10 13:53:06,491::logUtils::48::dispatcher::(wrapper) Run and protect: inappropriateDevices(thiefId='01b3ce29-e575-47ef-89d2-335e4da7db36')
93130:libvirtEventLoop::INFO::2015-09-10 13:53:06,495::vm::1276::virt.vm::(setDownStatus) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::Changed state to Down: Lost connection with qemu process (code=2)
93131:libvirtEventLoop::DEBUG::2015-09-10 13:53:06,496::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"notify_time": 4476665530, "01b3ce29-e575-47ef-89d2-335e4da7db36": {"status": "Down", "exitReason": 2, "exitMessage": "Lost connection with qemu process", "exitCode": 1}}, "jsonrpc": "2.0", "method": "|virt|VM_status|01b3ce29-e575-47ef-89d2-335e4da7db36"}
93139:Thread-4662::DEBUG::2015-09-10 13:53:06,940::bindingxmlrpc::1256::vds::(wrapper) client [127.0.0.1]::call vmGetStats with ('01b3ce29-e575-47ef-89d2-335e4da7db36',) {}
93140:Thread-4662::DEBUG::2015-09-10 13:53:06,941::bindingxmlrpc::1263::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'exitMessage': 'Lost connection with qemu process', 'statusTime': '4476665970', 'vmId': '01b3ce29-e575-47ef-89d2-335e4da7db36', 'exitReason': 2, 'exitCode': 1}]}

Comment 5 Simone Tiraboschi 2015-09-10 15:32:32 UTC
Created attachment 1072266 [details]
Relevant log file

Attaching the relevant log file

Comment 6 Roy Golan 2015-09-10 16:16:27 UTC
can you attach the qemu log so we would see the exact reason for the process to crash?

Importing the SD changes the mount link, so I guess that could be the reason. Does VDSM pass the path for the disks with /rhev/datacenter/mnt/$SD or with /rhev/datacenter/$DC/$SD? What happens to the old drive path during the import?

Comment 7 Elad 2015-09-10 17:54:42 UTC
Created attachment 1072310 [details]
qemu.log

(In reply to Roy Golan from comment #6)
> can you attach the qemu log so we would see the exact reason for the process
> to crash?
qemu.log attached
> Importing the SD changes the mount link so I guess that could be the reason.
> Does VDSM pass the path for the disks with /rhev/datacenter/mnt/$SD or with
> /rhev/datacenter/$DC/$SD ?
Actually, it's using the /var/run/vdsm/storage/%spuuid%/%sduuid%/%imageuuid%/ path
>   what happens to the old drive path during the
> import?

This is the old path (from qemu.log):

/var/run/vdsm/storage/8e9e186f-b8f9-4202-aedf-273ac5bac204/9a693f7b-ddb0-4d44-ab52-1ac020c07511/faf31400-6208-46ee-bff1-8a6c86b273c2


This is the new path (from qemu.log):

/var/run/vdsm/storage/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4



Here is the content of the old path:


[root@green-vdsb 8e9e186f-b8f9-4202-aedf-273ac5bac204]# pwd
/var/run/vdsm/storage/8e9e186f-b8f9-4202-aedf-273ac5bac204
[root@green-vdsb 8e9e186f-b8f9-4202-aedf-273ac5bac204]# ll
total 16
lrwxrwxrwx. 1 vdsm kvm 131 Sep  8 18:06 4b9ec765-952f-458f-b38a-039fde48237d -> /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/8e9e186f-b8f9-4202-aedf-273ac5bac204/images/4b9ec765-952f-458f-b38a-039fde48237d
lrwxrwxrwx. 1 vdsm kvm 131 Sep  8 18:06 9a693f7b-ddb0-4d44-ab52-1ac020c07511 -> /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/8e9e186f-b8f9-4202-aedf-273ac5bac204/images/9a693f7b-ddb0-4d44-ab52-1ac020c07511
lrwxrwxrwx. 1 vdsm kvm 131 Sep  8 18:06 e5e78c3c-d47f-47e9-b0cb-642ccb465ecd -> /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/8e9e186f-b8f9-4202-aedf-273ac5bac204/images/e5e78c3c-d47f-47e9-b0cb-642ccb465ecd
lrwxrwxrwx. 1 vdsm kvm 131 Sep  8 18:06 f18aae49-560e-4fb0-b23d-16725832ad14 -> /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/8e9e186f-b8f9-4202-aedf-273ac5bac204/images/f18aae49-560e-4fb0-b23d-16725832ad14

Comment 8 Elad 2015-09-10 17:56:23 UTC
Sorry, it's using the /var/run/vdsm/storage/%sduuid%/%imageuuid%/ path (without spuuid)
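
A minimal sketch for inspecting those links on a host, using the layout just described (sdUUID/imageUUID, no spUUID). This is purely diagnostic; vdsm creates and manages the links itself:

import os

# List the per-image symlinks vdsm keeps under /var/run/vdsm/storage/<sdUUID>
# and show where each one currently resolves, matching the 'll' output above.
sd_uuid = "2cfaf5e4-34f3-4caf-bd92-0f22c866b51e"  # the imported HE domain
base = os.path.join("/var/run/vdsm/storage", sd_uuid)

for image_uuid in sorted(os.listdir(base)):
    link = os.path.join(base, image_uuid)
    print("%s -> %s" % (image_uuid, os.path.realpath(link)))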

Comment 9 Sandro Bonazzola 2015-09-11 09:37:54 UTC
Simone, is the sp changed by VDSM? By the engine? Who's changing it?

Comment 10 Simone Tiraboschi 2015-09-11 09:40:34 UTC
The engine explicitly sends attachStorageDomain using the storagePool of the target datacenter.
See:
https://bugzilla.redhat.com/show_bug.cgi?id=1261996#c2

Comment 11 Elad 2015-09-16 07:20:02 UTC
In this situation, as described in the bug, the VM does not exist in the engine DB and cannot be registered from the HE storage domain, since importing that storage domain causes the setup to die with no possible workaround. Therefore, we cannot test the basic flow of HE VM migration.

Comment 12 Elad 2015-10-07 10:20:08 UTC
Blocked by a libvirt bug: libvirt cannot be started as part of vdsm configuration. This happens on RHEL 7.2, as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1269176

Comment 13 Elad 2015-11-05 10:03:23 UTC
Simone, as we've seen, importing the hosted-engine SD still causes the VM to die since we have to change the SPUUID in /etc/ovirt-hosted-engine/hosted-engine.conf
(reported in 1275210 - which is closed as dup of 1269768)
Therefore, I can't verify this bug.
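
A small sketch of that check: compare the spUUID recorded in hosted-engine.conf with the POOL_UUID now stored in the domain metadata. The spUUID key name is the one referenced above; the exact key casing and file locations should be verified against the deployment (paths are the ones from this report):

# Diagnostic sketch only: read both KEY=VALUE files and print the pool UUID
# each of them records, so a mismatch after the import is easy to spot.
def read_kv(path):
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    return values

he_conf = read_kv("/etc/ovirt-hosted-engine/hosted-engine.conf")
dom_md = read_kv("/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/"
                 "2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/dom_md/metadata")

print("hosted-engine.conf spUUID :", he_conf.get("spUUID"))
print("domain metadata POOL_UUID :", dom_md.get("POOL_UUID"))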

Comment 14 Simone Tiraboschi 2015-11-05 10:49:54 UTC
(In reply to Elad from comment #13)
> Simone, as we've seen, importing the hosted-engine SD still causes the VM to
> die since we have to change the SPUUID in
> /etc/ovirt-hosted-engine/hosted-engine.conf
> (reported in 1275210 - which is closed as dup of 1269768)
> Therefore, I can't verify this bug.

We investigated it further: the spUUID change on storage domain import should not, by itself, stop the VM.
The issue appears to be that sanlock kills qemu; Roy is looking into that.