Bug 1261996 - [hosted-engine] Storage pool UUID is modified when importing the HE storage domain to the setup. Hence, the HE VM dies
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.HostedEngine
Version: 3.6.0
Hardware: x86_64
OS: Unspecified
Severity: urgent
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assigned To: Sandro Bonazzola
QA Contact: Elad
Keywords: Regression
Depends On: 1269768 1291634
Blocks:
Reported: 2015-09-10 10:50 EDT by Elad
Modified: 2016-02-10 07:55 EST
CC: 13 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-10 07:55:53 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-3.6.0+
mgoldboi: blocker+
sherold: planning_ack+
rgolan: devel_ack+
pstehlik: testing_ack+


Attachments
vdsm.log (16.79 MB, text/plain)
2015-09-10 10:50 EDT, Elad
Relevant log file (893.68 KB, application/x-xz)
2015-09-10 11:32 EDT, Simone Tiraboschi
qemu.log (25.39 KB, text/plain)
2015-09-10 13:54 EDT, Elad


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 46234 master MERGED core: Don't disconnect while importing hosted engine storage domains Never
oVirt gerrit 46714 ovirt-engine-3.6 MERGED core: Don't disconnect while importing hosted engine storage domains Never
oVirt gerrit 46715 ovirt-engine-3.6.0 MERGED core: Don't disconnect while importing hosted engine storage domains Never

Description Elad 2015-09-10 10:50:24 EDT
Created attachment 1072237 [details]
vdsm.log

Description of problem:
The storage pool UUID is modified when importing the storage domain where the hosted-engine's disk is located. This should not happen; VDSM should leave this field in the storage domain metadata as it is.

Version-Release number of selected component (if applicable):
vdsm-4.17.5-1.el7ev.noarch
vdsm-python-4.17.5-1.el7ev.noarch
vdsm-infra-4.17.5-1.el7ev.noarch
vdsm-yajsonrpc-4.17.5-1.el7ev.noarch
vdsm-jsonrpc-4.17.5-1.el7ev.noarch
vdsm-cli-4.17.5-1.el7ev.noarch
vdsm-xmlrpc-4.17.5-1.el7ev.noarch
ovirt-host-deploy-1.4.0-0.0.5.master.el7ev.noarch
ovirt-hosted-engine-setup-1.3.0-0.5.beta.git2cd8189.el7ev.noarch
ovirt-vmconsole-host-1.0.0-0.0.1.master.el7ev.noarch
libgovirt-0.3.3-1.el7.x86_64
ovirt-vmconsole-1.0.0-0.0.1.master.el7ev.noarch
ovirt-hosted-engine-ha-1.3.0-0.4.beta.git1bcfe28.el7ev.noarch
ovirt-setup-lib-1.0.0-0.1.master.git6a54bc0.el7ev.noarch
libvirt-daemon-1.2.17-6.el7.x86_64
qemu-kvm-rhev-2.3.0-22.el7.x86_64
sanlock-3.2.4-1.el7.x86_64
python-2.7.5-31.el7.x86_64
selinux-policy-3.13.1-46.el7.noarch


How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted-engine (over glusterfs in my case)
2. Once setup is up, add a new storage domain to the DC
3. Import the HE storage domain to the DC

Actual results:
The SP UUID in the HE domain metadata is modified to the SP UUID of the DC it is imported into.

Expected results:
The SP UUID in the SD metadata should not be modified when importing the HE domain.

Additional info:
vdsm.log
Comment 1 Elad 2015-09-10 10:52:00 EDT
[root@green-vdsb dom_md]# cat metadata 
CLASS=Data
DESCRIPTION=hosted_storage
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
POOL_UUID=00000001-0001-0001-0001-000000000257
REMOTE_PATH=10.35.160.6:/elad1
ROLE=Regular
SDUUID=2cfaf5e4-34f3-4caf-bd92-0f22c866b51e
TYPE=GLUSTERFS
VERSION=3
_SHA_CKSUM=d5ed78f189334394fd5f323332d9b96121b049bf
[root@green-vdsb dom_md]# pwd
/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/dom_md
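The metadata file above is a flat `KEY=VALUE` text file. As a minimal sketch (illustrative only, not VDSM code), one can parse it and check whether `POOL_UUID` was rewritten to a different pool's UUID; the parsing helpers below are hypothetical names:

```python
# Illustrative sketch: parse an oVirt storage-domain metadata file
# (flat KEY=VALUE lines, as shown in the comment above) and detect
# whether POOL_UUID no longer matches the pool the domain is
# expected to belong to.

def parse_sd_metadata(text):
    """Return the metadata as a dict, skipping lines without '='."""
    meta = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            meta[key.strip()] = value.strip()
    return meta

def pool_uuid_changed(meta, expected_pool_uuid):
    """True if the domain's POOL_UUID differs from the expected one."""
    return meta.get("POOL_UUID") != expected_pool_uuid

# Values taken from the metadata dump above.
metadata_text = """\
CLASS=Data
DESCRIPTION=hosted_storage
POOL_UUID=00000001-0001-0001-0001-000000000257
SDUUID=2cfaf5e4-34f3-4caf-bd92-0f22c866b51e
TYPE=GLUSTERFS
VERSION=3
"""

meta = parse_sd_metadata(metadata_text)
print(pool_uuid_changed(meta, "00000001-0001-0001-0001-000000000257"))  # False
```

In the reported case, `POOL_UUID` holds the target DC's pool UUID instead of the value the hosted-engine tools originally wrote, which is exactly what such a check would flag.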
Comment 2 Simone Tiraboschi 2015-09-10 11:07:09 EDT
The engine explicitly attaches the storage domain to the target datacenter's storage pool:

Thread-4658::DEBUG::2015-09-10 13:53:03,143::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'StorageDomain.attach' in bridge with {u'storagepoolID': u'00000001-0001-0001-0001-000000000257', u'storagedomainID': u'2cfaf5e4-34f3-4caf-bd92-0f22c866b51e'}
Thread-4658::DEBUG::2015-09-10 13:53:03,144::task::595::Storage.TaskManager.Task::(_updateState) Task=`5bc25541-fcba-4f3c-9cab-8636f8ea1bd2`::moving from state init -> state preparing
Thread-4658::INFO::2015-09-10 13:53:03,145::logUtils::48::dispatcher::(wrapper) Run and protect: attachStorageDomain(sdUUID=u'2cfaf5e4-34f3-4caf-bd92-0f22c866b51e', spUUID=u'00000001-0001-0001-0001-000000000257', options=None)
Comment 3 Simone Tiraboschi 2015-09-10 11:17:44 EDT
Then sanlock releases its lock on the hosted-engine storage domain:

Thread-4658::INFO::2015-09-10 13:53:03,409::clusterlock::339::Storage.SANLock::(release) Releasing cluster lock for domain 2cfaf5e4-34f3-4caf-bd92-0f22c866b51e
Thread-4658::DEBUG::2015-09-10 13:53:03,476::clusterlock::349::Storage.SANLock::(release) Cluster lock for domain 2cfaf5e4-34f3-4caf-bd92-0f22c866b51e successfully released
Thread-4658::INFO::2015-09-10 13:53:03,477::clusterlock::245::Storage.SANLock::(releaseHostId) Releasing host id for domain 2cfaf5e4-34f3-4caf-bd92-0f22c866b51e (id: 1)
Comment 4 Simone Tiraboschi 2015-09-10 11:22:47 EDT
And the hosted-engine VM went down:

93049:Thread-4660::DEBUG::2015-09-10 13:53:03,618::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.getVMFullList' in bridge with {u'vmList': [u'01b3ce29-e575-47ef-89d2-335e4da7db36']}
93050:Thread-4660::DEBUG::2015-09-10 13:53:03,620::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.getVMFullList' in bridge with [{'guestFQDN': '', 'emulatedMachine': 'rhel6.5.0', 'pid': '22369', 'guestDiskMapping': {}, 'displaySecurePort': '-1', 'cpuType': 'Conroe', 'pauseCode': 'NOERR', 'smp': '2', 'vmType': 'kvm', 'memSize': 4096, 'vmName': 'HostedEngine', 'username': 'Unknown', 'vmId': '01b3ce29-e575-47ef-89d2-335e4da7db36', 'displayIp': '0', 'displayPort': u'5900', 'guestIPs': '', 'spiceSecureChannels': 'smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir', 'nicModel': 'rtl8139,pv', 'devices': [{'alias': 'console0', 'specParams': {}, 'deviceType': 'console', 'deviceId': '82d903bd-5cc4-455b-bafd-cf90ce386bb4', 'device': 'console', 'type': 'console'}, {'device': 'memballoon', 'specParams': {'model': 'none'}, 'deviceType': 'balloon', 'type': 'balloon', 'alias': u'balloon0'}, {'alias': u'scsi0', 'deviceType': 'controller', 'address': {u'slot': u'0x04', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, 'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'device': 'vnc', 'specParams': {'spiceSecureChannels': 'smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir', 'displayIp': '0'}, 'deviceType': 'graphics', 'type': 'graphics', 'port': u'5900'}, {'nicModel': 'pv', 'macAddr': '00:16:3E:76:D5:D5', 'linkActive': True, 'network': 'ovirtmgmt', 'specParams': {}, 'filter': 'vdsm-no-mac-spoofing', 'alias': u'net0', 'deviceType': 'interface', 'deviceId': '6083a9cd-afdf-41a9-b79f-42498ee624dc', 'address': {u'slot': u'0x03', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, 'device': 'bridge', 'type': 'interface', 'name': u'vnet0'}, {'index': '2', 'iface': 'ide', 'name': u'hdc', 'alias': u'ide0-1-0', 'specParams': {}, 'readonly': 'True', 'deviceType': 'disk', 'deviceId': 'e6cdce3f-08fc-48cb-8daa-1576dbdcff49', 'address': {u'bus': u'1', u'controller': u'0', 
u'type': u'drive', u'target': u'0', u'unit': u'0'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'}, {'poolID': '00000000-0000-0000-0000-000000000000', 'volumeInfo': {'domainID': '2cfaf5e4-34f3-4caf-bd92-0f22c866b51e', 'volType': 'path', 'leaseOffset': 0, 'volumeID': u'88ee5207-8d75-4cc4-8636-4fce645ccff4', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/images/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4.lease', 'imageID': '3418c4a9-d1c2-4571-babf-ca675efaa661', 'path': u'/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/images/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4'}, 'index': '0', 'iface': 'virtio', 'apparentsize': '26843545600', 'specParams': {}, 'imageID': '3418c4a9-d1c2-4571-babf-ca675efaa661', 'readonly': 'False', 'deviceType': 'disk', 'shared': 'exclusive', 'truesize': '5138464768', 'type': 'disk', 'domainID': '2cfaf5e4-34f3-4caf-bd92-0f22c866b51e', 'reqsize': '0', 'format': 'raw', 'deviceId': '3418c4a9-d1c2-4571-babf-ca675efaa661', 'address': {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, 'device': 'disk', 'path': u'/var/run/vdsm/storage/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4', 'propagateErrors': 'off', 'optional': 'false', 'name': u'vda', 'bootOrder': u'1', 'volumeID': '88ee5207-8d75-4cc4-8636-4fce645ccff4', 'alias': u'virtio-disk0', 'volumeChain': [{'domainID': '2cfaf5e4-34f3-4caf-bd92-0f22c866b51e', 'volType': 'path', 'leaseOffset': 0, 'volumeID': u'88ee5207-8d75-4cc4-8636-4fce645ccff4', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/images/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4.lease', 'imageID': '3418c4a9-d1c2-4571-babf-ca675efaa661', 'path': 
u'/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/images/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4'}]}, {'device': u'usb', 'alias': u'usb', 'deviceType': 'controller', 'type': 'controller', 'address': {u'slot': u'0x01', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x2'}}, {'device': u'ide', 'alias': u'ide', 'deviceType': 'controller', 'type': 'controller', 'address': {u'slot': u'0x01', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x1'}}, {'device': u'virtio-serial', 'alias': u'virtio-serial0', 'deviceType': 'controller', 'type': 'controller', 'address': {u'slot': u'0x05', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}}, {'device': u'unix', 'alias': u'channel0', 'deviceType': u'channel', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'1'}}, {'device': u'unix', 'alias': u'channel1', 'deviceType': u'channel', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'2'}}, {'device': u'unix', 'alias': u'channel2', 'deviceType': u'channel', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'3'}}, {'device': '', 'alias': u'video0', 'deviceType': u'video', 'type': u'video', 'address': {u'slot': u'0x02', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}}], 'status': 'Up', 'clientIp': '', 'statusTime': '4476662650', 'display': 'vnc'}]
93055:VM Channels Listener::DEBUG::2015-09-10 13:53:04,467::guestagent::218::virt.vm::(_connect) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::Attempting connection to /var/lib/libvirt/qemu/channels/01b3ce29-e575-47ef-89d2-335e4da7db36.com.redhat.rhevm.vdsm
93056:VM Channels Listener::DEBUG::2015-09-10 13:53:04,468::guestagent::232::virt.vm::(_connect) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::Failed to connect to /var/lib/libvirt/qemu/channels/01b3ce29-e575-47ef-89d2-335e4da7db36.com.redhat.rhevm.vdsm with 111
93057:libvirtEventLoop::DEBUG::2015-09-10 13:53:04,838::vm::4422::virt.vm::(onLibvirtLifecycleEvent) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::event Stopped detail 5 opaque None
93058:libvirtEventLoop::INFO::2015-09-10 13:53:04,838::vm::781::virt.vm::(_onQemuDeath) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::underlying process disconnected
93059:libvirtEventLoop::INFO::2015-09-10 13:53:04,838::vm::3685::virt.vm::(releaseVm) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::Release VM resources
93082:periodic/0::WARNING::2015-09-10 13:53:05,955::periodic::253::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['01b3ce29-e575-47ef-89d2-335e4da7db36']
93120:libvirtEventLoop::WARNING::2015-09-10 13:53:06,489::utils::144::root::(rmFile) File: /var/lib/libvirt/qemu/channels/01b3ce29-e575-47ef-89d2-335e4da7db36.com.redhat.rhevm.vdsm already removed
93121:libvirtEventLoop::WARNING::2015-09-10 13:53:06,490::utils::144::root::(rmFile) File: /var/lib/libvirt/qemu/channels/01b3ce29-e575-47ef-89d2-335e4da7db36.org.qemu.guest_agent.0 already removed
93123:libvirtEventLoop::INFO::2015-09-10 13:53:06,491::logUtils::48::dispatcher::(wrapper) Run and protect: inappropriateDevices(thiefId='01b3ce29-e575-47ef-89d2-335e4da7db36')
93130:libvirtEventLoop::INFO::2015-09-10 13:53:06,495::vm::1276::virt.vm::(setDownStatus) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::Changed state to Down: Lost connection with qemu process (code=2)
93131:libvirtEventLoop::DEBUG::2015-09-10 13:53:06,496::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"notify_time": 4476665530, "01b3ce29-e575-47ef-89d2-335e4da7db36": {"status": "Down", "exitReason": 2, "exitMessage": "Lost connection with qemu process", "exitCode": 1}}, "jsonrpc": "2.0", "method": "|virt|VM_status|01b3ce29-e575-47ef-89d2-335e4da7db36"}
93139:Thread-4662::DEBUG::2015-09-10 13:53:06,940::bindingxmlrpc::1256::vds::(wrapper) client [127.0.0.1]::call vmGetStats with ('01b3ce29-e575-47ef-89d2-335e4da7db36',) {}
93140:Thread-4662::DEBUG::2015-09-10 13:53:06,941::bindingxmlrpc::1263::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'exitMessage': 'Lost connection with qemu process', 'statusTime': '4476665970', 'vmId': '01b3ce29-e575-47ef-89d2-335e4da7db36', 'exitReason': 2, 'exitCode': 1}]}
Comment 5 Simone Tiraboschi 2015-09-10 11:32:32 EDT
Created attachment 1072266 [details]
Relevant log file

Attaching the relevant log file
Comment 6 Roy Golan 2015-09-10 12:16:27 EDT
can you attach the qemu log so we would see the exact reason for the process to crash?

Importing the SD changes the mount link so I guess that could be the reason. Does VDSM pass the path for the disks with /rhev/datacenter/mnt/$SD or with /rhev/datacenter/$DC/$SD ?   what happens to the old drive path during the import?
Comment 7 Elad 2015-09-10 13:54:42 EDT
Created attachment 1072310 [details]
qemu.log

(In reply to Roy Golan from comment #6)
> can you attach the qemu log so we would see the exact reason for the process
> to crash?
qemu.log attached
> Importing the SD changes the mount link so I guess that could be the reason.
> Does VDSM pass the path for the disks with /rhev/datacenter/mnt/$SD or with
> /rhev/datacenter/$DC/$SD ?
Actually, it's using the /var/run/vdsm/storage/%spuuid%/%sduuid%/%imageuuid%/ path
>   what happens to the old drive path during the
> import?

This is the old path (from qemu.log):

/var/run/vdsm/storage/8e9e186f-b8f9-4202-aedf-273ac5bac204/9a693f7b-ddb0-4d44-ab52-1ac020c07511/faf31400-6208-46ee-bff1-8a6c86b273c2


This is the new path (from qemu.log):

/var/run/vdsm/storage/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4



Here is the content of the old path:


[root@green-vdsb 8e9e186f-b8f9-4202-aedf-273ac5bac204]# pwd
/var/run/vdsm/storage/8e9e186f-b8f9-4202-aedf-273ac5bac204
[root@green-vdsb 8e9e186f-b8f9-4202-aedf-273ac5bac204]# ll
total 16
lrwxrwxrwx. 1 vdsm kvm 131 Sep  8 18:06 4b9ec765-952f-458f-b38a-039fde48237d -> /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/8e9e186f-b8f9-4202-aedf-273ac5bac204/images/4b9ec765-952f-458f-b38a-039fde48237d
lrwxrwxrwx. 1 vdsm kvm 131 Sep  8 18:06 9a693f7b-ddb0-4d44-ab52-1ac020c07511 -> /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/8e9e186f-b8f9-4202-aedf-273ac5bac204/images/9a693f7b-ddb0-4d44-ab52-1ac020c07511
lrwxrwxrwx. 1 vdsm kvm 131 Sep  8 18:06 e5e78c3c-d47f-47e9-b0cb-642ccb465ecd -> /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/8e9e186f-b8f9-4202-aedf-273ac5bac204/images/e5e78c3c-d47f-47e9-b0cb-642ccb465ecd
lrwxrwxrwx. 1 vdsm kvm 131 Sep  8 18:06 f18aae49-560e-4fb0-b23d-16725832ad14 -> /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/8e9e186f-b8f9-4202-aedf-273ac5bac204/images/f18aae49-560e-4fb0-b23d-16725832ad14
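The listing above shows that the entries under `/var/run/vdsm/storage/<sdUUID>` are symlinks into the `/rhev/data-center/mnt/...` mount tree. As an illustrative check (not part of VDSM; the function name is hypothetical), one could walk those links and report any whose target no longer resolves, which is the situation a changed mount path would create:

```python
import os

# Illustrative sketch: find dangling image symlinks under a
# storage domain's run directory, e.g.
# /var/run/vdsm/storage/8e9e186f-b8f9-4202-aedf-273ac5bac204

def stale_image_links(sd_run_dir):
    """Return (link name, target) pairs for symlinks whose target is gone."""
    stale = []
    for name in sorted(os.listdir(sd_run_dir)):
        link = os.path.join(sd_run_dir, name)
        # os.path.exists() follows the symlink, so a dangling link
        # is one that is a link but whose target does not exist.
        if os.path.islink(link) and not os.path.exists(link):
            stale.append((name, os.readlink(link)))
    return stale
```

A healthy domain, like the one in the listing above, would return an empty list; a dangling link would indicate the qemu process lost its disk path.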
Comment 8 Elad 2015-09-10 13:56:23 EDT
Sorry, it's using the /var/run/vdsm/storage/%sduuid%/%imageuuid%/ path (without spuuid)
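The corrected layout from comment 8 can be sketched as a simple path join (a hypothetical helper, not a VDSM function): because the run path contains no `spUUID` component, it is insensitive to the storage pool UUID itself and depends only on where the symlink under `/var/run/vdsm/storage/<sdUUID>` points.

```python
import os.path

# Path layout per comment 8:
# /var/run/vdsm/storage/<sdUUID>/<imgUUID>/<volUUID>  (no spUUID component)
VDSM_RUN_STORAGE = "/var/run/vdsm/storage"

def volume_run_path(sd_uuid, img_uuid, vol_uuid):
    """Build the run-time volume path qemu is given (illustrative)."""
    return os.path.join(VDSM_RUN_STORAGE, sd_uuid, img_uuid, vol_uuid)

# UUIDs taken from the "new path" in comment 7.
print(volume_run_path(
    "2cfaf5e4-34f3-4caf-bd92-0f22c866b51e",
    "3418c4a9-d1c2-4571-babf-ca675efaa661",
    "88ee5207-8d75-4cc4-8636-4fce645ccff4"))
# → /var/run/vdsm/storage/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4
```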
Comment 9 Sandro Bonazzola 2015-09-11 05:37:54 EDT
Simone, is the sp changed by VDSM? By the engine? Who's changing it?
Comment 10 Simone Tiraboschi 2015-09-11 05:40:34 EDT
The engine explicitly sends attachStorageDomain using the storagePool of the target datacenter.
See:
https://bugzilla.redhat.com/show_bug.cgi?id=1261996#c2
Comment 11 Elad 2015-09-16 03:20:02 EDT
In such a situation, as described in this bug, the VM doesn't exist in the engine DB and cannot be registered from the HE storage domain, since importing this storage domain causes the setup to die with no possible workaround. Therefore, we cannot test the basic flow of HE VM migration.
Comment 12 Elad 2015-10-07 06:20:08 EDT
Blocked due to a libvirt bug: libvirt cannot be started as part of VDSM configuration. This happens on RHEL 7.2, as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1269176
Comment 13 Elad 2015-11-05 05:03:23 EST
Simone, as we've seen, importing the hosted-engine SD still causes the VM to die since we have to change the SPUUID in /etc/ovirt-hosted-engine/hosted-engine.conf
(reported in 1275210 - which is closed as dup of 1269768)
Therefore, I can't verify this bug.
Comment 14 Simone Tiraboschi 2015-11-05 05:49:54 EST
(In reply to Elad from comment #13)
> Simone, as we've seen, importing the hosted-engine SD still causes the VM to
> die since we have to change the SPUUID in
> /etc/ovirt-hosted-engine/hosted-engine.conf
> (reported in 1275210 - which is closed as dup of 1269768)
> Therefore, I can't verify this bug.

We investigated it further: the spUUID change on storage domain import shouldn't stop the VM.
The issue appears to be sanlock killing qemu; Roy is on that.
