Bug 1261996
Summary: | [hosted-engine] Storage pool UUID is modified when importing the HE storage domain to the setup. Hence, the HE VM dies
---|---
Product: | [oVirt] ovirt-engine
Component: | BLL.HostedEngine
Version: | 3.6.0
Status: | CLOSED CURRENTRELEASE
Severity: | urgent
Priority: | urgent
Reporter: | Elad <ebenahar>
Assignee: | Sandro Bonazzola <sbonazzo>
QA Contact: | Elad <ebenahar>
Docs Contact: |
CC: | acanan, bugs, ebenahar, gklein, lsurette, mgoldboi, mlipchuk, pstehlik, rgolan, sbonazzo, sherold, stirabos, ykaul
Target Milestone: | ovirt-3.6.0-rc
Target Release: | 3.6.0
Keywords: | Regression
Flags: | rule-engine: ovirt-3.6.0+; mgoldboi: blocker+; sherold: planning_ack+; rgolan: devel_ack+; pstehlik: testing_ack+
Hardware: | x86_64
OS: | Unspecified
Whiteboard: |
Fixed In Version: |
Doc Type: | Bug Fix
Doc Text: |
Story Points: | ---
Clone Of: |
Environment: |
Last Closed: | 2016-02-10 12:55:53 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | SLA
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: | 1269768, 1291634
Bug Blocks: |
Attachments: |
```
[root@green-vdsb dom_md]# cat metadata
CLASS=Data
DESCRIPTION=hosted_storage
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
POOL_UUID=00000001-0001-0001-0001-000000000257
REMOTE_PATH=10.35.160.6:/elad1
ROLE=Regular
SDUUID=2cfaf5e4-34f3-4caf-bd92-0f22c866b51e
TYPE=GLUSTERFS
VERSION=3
_SHA_CKSUM=d5ed78f189334394fd5f323332d9b96121b049bf
[root@green-vdsb dom_md]# pwd
/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/dom_md
```

The engine explicitly attaches the storage domain to the target datacenter's storage pool:

```
Thread-4658::DEBUG::2015-09-10 13:53:03,143::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'StorageDomain.attach' in bridge with {u'storagepoolID': u'00000001-0001-0001-0001-000000000257', u'storagedomainID': u'2cfaf5e4-34f3-4caf-bd92-0f22c866b51e'}
Thread-4658::DEBUG::2015-09-10 13:53:03,144::task::595::Storage.TaskManager.Task::(_updateState) Task=`5bc25541-fcba-4f3c-9cab-8636f8ea1bd2`::moving from state init -> state preparing
Thread-4658::INFO::2015-09-10 13:53:03,145::logUtils::48::dispatcher::(wrapper) Run and protect: attachStorageDomain(sdUUID=u'2cfaf5e4-34f3-4caf-bd92-0f22c866b51e', spUUID=u'00000001-0001-0001-0001-000000000257', options=None)
```

Then sanlock releases its lock on the hosted-engine storage domain:

```
Thread-4658::INFO::2015-09-10 13:53:03,409::clusterlock::339::Storage.SANLock::(release) Releasing cluster lock for domain 2cfaf5e4-34f3-4caf-bd92-0f22c866b51e
Thread-4658::DEBUG::2015-09-10 13:53:03,476::clusterlock::349::Storage.SANLock::(release) Cluster lock for domain 2cfaf5e4-34f3-4caf-bd92-0f22c866b51e successfully released
Thread-4658::INFO::2015-09-10 13:53:03,477::clusterlock::245::Storage.SANLock::(releaseHostId) Releasing host id for domain 2cfaf5e4-34f3-4caf-bd92-0f22c866b51e (id: 1)
```

And the hosted-engine VM went down:

```
93049:Thread-4660::DEBUG::2015-09-10 13:53:03,618::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.getVMFullList' in bridge with {u'vmList': [u'01b3ce29-e575-47ef-89d2-335e4da7db36']}
93050:Thread-4660::DEBUG::2015-09-10 13:53:03,620::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.getVMFullList' in bridge with [{'guestFQDN': '', 'emulatedMachine': 'rhel6.5.0', 'pid': '22369', 'guestDiskMapping': {}, 'displaySecurePort': '-1', 'cpuType': 'Conroe', 'pauseCode': 'NOERR', 'smp': '2', 'vmType': 'kvm', 'memSize': 4096, 'vmName': 'HostedEngine', 'username': 'Unknown', 'vmId': '01b3ce29-e575-47ef-89d2-335e4da7db36', 'displayIp': '0', 'displayPort': u'5900', 'guestIPs': '', 'spiceSecureChannels': 'smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir', 'nicModel': 'rtl8139,pv', 'devices': [{'alias': 'console0', 'specParams': {}, 'deviceType': 'console', 'deviceId': '82d903bd-5cc4-455b-bafd-cf90ce386bb4', 'device': 'console', 'type': 'console'}, {'device': 'memballoon', 'specParams': {'model': 'none'}, 'deviceType': 'balloon', 'type': 'balloon', 'alias': u'balloon0'}, {'alias': u'scsi0', 'deviceType': 'controller', 'address': {u'slot': u'0x04', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, 'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'device': 'vnc', 'specParams': {'spiceSecureChannels': 'smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir', 'displayIp': '0'}, 'deviceType': 'graphics', 'type': 'graphics', 'port': u'5900'}, {'nicModel': 'pv', 'macAddr': '00:16:3E:76:D5:D5', 'linkActive': True, 'network': 'ovirtmgmt', 'specParams': {}, 'filter': 
'vdsm-no-mac-spoofing', 'alias': u'net0', 'deviceType': 'interface', 'deviceId': '6083a9cd-afdf-41a9-b79f-42498ee624dc', 'address': {u'slot': u'0x03', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, 'device': 'bridge', 'type': 'interface', 'name': u'vnet0'}, {'index': '2', 'iface': 'ide', 'name': u'hdc', 'alias': u'ide0-1-0', 'specParams': {}, 'readonly': 'True', 'deviceType': 'disk', 'deviceId': 'e6cdce3f-08fc-48cb-8daa-1576dbdcff49', 'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'}, {'poolID': '00000000-0000-0000-0000-000000000000', 'volumeInfo': {'domainID': '2cfaf5e4-34f3-4caf-bd92-0f22c866b51e', 'volType': 'path', 'leaseOffset': 0, 'volumeID': u'88ee5207-8d75-4cc4-8636-4fce645ccff4', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/images/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4.lease', 'imageID': '3418c4a9-d1c2-4571-babf-ca675efaa661', 'path': u'/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/images/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4'}, 'index': '0', 'iface': 'virtio', 'apparentsize': '26843545600', 'specParams': {}, 'imageID': '3418c4a9-d1c2-4571-babf-ca675efaa661', 'readonly': 'False', 'deviceType': 'disk', 'shared': 'exclusive', 'truesize': '5138464768', 'type': 'disk', 'domainID': '2cfaf5e4-34f3-4caf-bd92-0f22c866b51e', 'reqsize': '0', 'format': 'raw', 'deviceId': '3418c4a9-d1c2-4571-babf-ca675efaa661', 'address': {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, 'device': 'disk', 'path': u'/var/run/vdsm/storage/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4', 'propagateErrors': 'off', 'optional': 'false', 'name': u'vda', 'bootOrder': u'1', 'volumeID': '88ee5207-8d75-4cc4-8636-4fce645ccff4', 'alias': u'virtio-disk0', 'volumeChain': [{'domainID': '2cfaf5e4-34f3-4caf-bd92-0f22c866b51e', 'volType': 'path', 'leaseOffset': 0, 'volumeID': u'88ee5207-8d75-4cc4-8636-4fce645ccff4', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/images/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4.lease', 'imageID': '3418c4a9-d1c2-4571-babf-ca675efaa661', 'path': u'/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/images/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4'}]}, {'device': u'usb', 'alias': u'usb', 'deviceType': 'controller', 'type': 'controller', 'address': {u'slot': u'0x01', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x2'}}, {'device': u'ide', 'alias': u'ide', 'deviceType': 'controller', 'type': 'controller', 'address': {u'slot': u'0x01', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x1'}}, {'device': u'virtio-serial', 'alias': u'virtio-serial0', 'deviceType': 'controller', 'type': 'controller', 'address': {u'slot': u'0x05', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}}, {'device': u'unix', 'alias': u'channel0', 'deviceType': u'channel', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'1'}}, {'device': u'unix', 'alias': u'channel1', 'deviceType': u'channel', 'type': u'channel', 'address': 
{u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'2'}}, {'device': u'unix', 'alias': u'channel2', 'deviceType': u'channel', 'type': u'channel', 'address': {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port': u'3'}}, {'device': '', 'alias': u'video0', 'deviceType': u'video', 'type': u'video', 'address': {u'slot': u'0x02', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}}], 'status': 'Up', 'clientIp': '', 'statusTime': '4476662650', 'display': 'vnc'}]
93055:VM Channels Listener::DEBUG::2015-09-10 13:53:04,467::guestagent::218::virt.vm::(_connect) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::Attempting connection to /var/lib/libvirt/qemu/channels/01b3ce29-e575-47ef-89d2-335e4da7db36.com.redhat.rhevm.vdsm
93056:VM Channels Listener::DEBUG::2015-09-10 13:53:04,468::guestagent::232::virt.vm::(_connect) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::Failed to connect to /var/lib/libvirt/qemu/channels/01b3ce29-e575-47ef-89d2-335e4da7db36.com.redhat.rhevm.vdsm with 111
93057:libvirtEventLoop::DEBUG::2015-09-10 13:53:04,838::vm::4422::virt.vm::(onLibvirtLifecycleEvent) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::event Stopped detail 5 opaque None
93058:libvirtEventLoop::INFO::2015-09-10 13:53:04,838::vm::781::virt.vm::(_onQemuDeath) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::underlying process disconnected
93059:libvirtEventLoop::INFO::2015-09-10 13:53:04,838::vm::3685::virt.vm::(releaseVm) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::Release VM resources
93082:periodic/0::WARNING::2015-09-10 13:53:05,955::periodic::253::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['01b3ce29-e575-47ef-89d2-335e4da7db36']
93120:libvirtEventLoop::WARNING::2015-09-10 13:53:06,489::utils::144::root::(rmFile) File: /var/lib/libvirt/qemu/channels/01b3ce29-e575-47ef-89d2-335e4da7db36.com.redhat.rhevm.vdsm already removed
93121:libvirtEventLoop::WARNING::2015-09-10 13:53:06,490::utils::144::root::(rmFile) File: /var/lib/libvirt/qemu/channels/01b3ce29-e575-47ef-89d2-335e4da7db36.org.qemu.guest_agent.0 already removed
93123:libvirtEventLoop::INFO::2015-09-10 13:53:06,491::logUtils::48::dispatcher::(wrapper) Run and protect: inappropriateDevices(thiefId='01b3ce29-e575-47ef-89d2-335e4da7db36')
93130:libvirtEventLoop::INFO::2015-09-10 13:53:06,495::vm::1276::virt.vm::(setDownStatus) vmId=`01b3ce29-e575-47ef-89d2-335e4da7db36`::Changed state to Down: Lost connection with qemu process (code=2)
93131:libvirtEventLoop::DEBUG::2015-09-10 13:53:06,496::__init__::206::jsonrpc.Notification::(emit) Sending event {"params": {"notify_time": 4476665530, "01b3ce29-e575-47ef-89d2-335e4da7db36": {"status": "Down", "exitReason": 2, "exitMessage": "Lost connection with qemu process", "exitCode": 1}}, "jsonrpc": "2.0", "method": "|virt|VM_status|01b3ce29-e575-47ef-89d2-335e4da7db36"}
93139:Thread-4662::DEBUG::2015-09-10 13:53:06,940::bindingxmlrpc::1256::vds::(wrapper) client [127.0.0.1]::call vmGetStats with ('01b3ce29-e575-47ef-89d2-335e4da7db36',) {}
93140:Thread-4662::DEBUG::2015-09-10 13:53:06,941::bindingxmlrpc::1263::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'exitMessage': 'Lost connection with qemu process', 'statusTime': '4476665970', 'vmId': '01b3ce29-e575-47ef-89d2-335e4da7db36', 'exitReason': 2, 'exitCode': 1}]}
```

Created attachment 1072266 [details]
Relevant log file
Attaching the relevant log file
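
For reference, here is a minimal, hypothetical helper (not part of the original report) that parses a storage domain's `dom_md/metadata` file — the plain KEY=VALUE format in the dump at the top of this thread — and prints the two UUID fields this bug revolves around. The hard-coded path is the example from this report and would need adjusting for another environment.

```python
#!/usr/bin/env python
# Hedged sketch: read a storage domain's dom_md/metadata file (KEY=VALUE
# lines, as in the dump above) and print the UUID fields relevant here.
# The path is the example from this report; adjust for your environment.

METADATA = ("/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/"
            "2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/dom_md/metadata")

def read_metadata(path):
    fields = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and "=" in line:
                key, _, value = line.partition("=")
                fields[key] = value
    return fields

if __name__ == "__main__":
    md = read_metadata(METADATA)
    # After the import described in this bug, POOL_UUID holds the target
    # datacenter's pool UUID (00000001-0001-0001-0001-000000000257 above).
    print("SDUUID    = %s" % md.get("SDUUID"))
    print("POOL_UUID = %s" % md.get("POOL_UUID"))
```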
Can you attach the qemu log, so we can see the exact reason for the process crash? Importing the SD changes the mount link, so I guess that could be the reason. Does VDSM pass the path for the disks with /rhev/datacenter/mnt/$SD or with /rhev/datacenter/$DC/$SD? What happens to the old drive path during the import?

Created attachment 1072310 [details]
qemu.log

(In reply to Roy Golan from comment #6)
> can you attach the qemu log so we would see the exact reason for the process
> to crash?

qemu.log attached

> Importing the SD changes the mount link so I guess that could be the reason.
> Does VDSM pass the path for the disks with /rhev/datacenter/mnt/$SD or with
> /rhev/datacenter/$DC/$SD ?

Actually, it's using the /var/run/vdsm/storage/%spuuid%/%sduuid%/%imageuuid%/ path

> what happens to the old drive path during the
> import?

This is the old path (from qemu.log):
/var/run/vdsm/storage/8e9e186f-b8f9-4202-aedf-273ac5bac204/9a693f7b-ddb0-4d44-ab52-1ac020c07511/faf31400-6208-46ee-bff1-8a6c86b273c2

This is the new path (from qemu.log):
/var/run/vdsm/storage/2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/3418c4a9-d1c2-4571-babf-ca675efaa661/88ee5207-8d75-4cc4-8636-4fce645ccff4

Here is the content of the old path (a small sketch for checking these links follows after this comment thread):

```
[root@green-vdsb 8e9e186f-b8f9-4202-aedf-273ac5bac204]# pwd
/var/run/vdsm/storage/8e9e186f-b8f9-4202-aedf-273ac5bac204
[root@green-vdsb 8e9e186f-b8f9-4202-aedf-273ac5bac204]# ll
total 16
lrwxrwxrwx. 1 vdsm kvm 131 Sep 8 18:06 4b9ec765-952f-458f-b38a-039fde48237d -> /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/8e9e186f-b8f9-4202-aedf-273ac5bac204/images/4b9ec765-952f-458f-b38a-039fde48237d
lrwxrwxrwx. 1 vdsm kvm 131 Sep 8 18:06 9a693f7b-ddb0-4d44-ab52-1ac020c07511 -> /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/8e9e186f-b8f9-4202-aedf-273ac5bac204/images/9a693f7b-ddb0-4d44-ab52-1ac020c07511
lrwxrwxrwx. 1 vdsm kvm 131 Sep 8 18:06 e5e78c3c-d47f-47e9-b0cb-642ccb465ecd -> /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/8e9e186f-b8f9-4202-aedf-273ac5bac204/images/e5e78c3c-d47f-47e9-b0cb-642ccb465ecd
lrwxrwxrwx. 1 vdsm kvm 131 Sep 8 18:06 f18aae49-560e-4fb0-b23d-16725832ad14 -> /rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/8e9e186f-b8f9-4202-aedf-273ac5bac204/images/f18aae49-560e-4fb0-b23d-16725832ad14
```

Sorry, it's using the /var/run/vdsm/storage/%sduuid%/%imageuuid%/ path (without spuuid).

Simone, is the SP changed by VDSM? By the engine? Who's changing it?

The engine explicitly sends attachStorageDomain using the storagePool of the target datacenter. See: https://bugzilla.redhat.com/show_bug.cgi?id=1261996#c2

In this situation, as described in the bug, the VM doesn't exist in the engine DB and cannot be registered from the HE storage domain, since the storage domain import action causes the setup to die with no possible workaround. Therefore, we cannot test the basic flow of HE VM migration.

Blocked due to a bug with libvirt: it cannot be started as part of vdsm configuration. This happens on RHEL 7.2, as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1269176

Simone, as we've seen, importing the hosted-engine SD still causes the VM to die, since we have to change the SPUUID in /etc/ovirt-hosted-engine/hosted-engine.conf (reported in bug 1275210, which was closed as a duplicate of bug 1269768). Therefore, I can't verify this bug.
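
To illustrate the "old drive path" question above, here is a hedged sketch (assuming the /var/run/vdsm/storage layout shown in the listing) that resolves each image symlink under the old domain's run-time path and flags dangling targets, i.e. paths a still-running qemu might be holding open:

```python
import os

# Hedged sketch: walk the run-time storage path of the old domain (UUID
# taken from the listing above) and report each image symlink and whether
# its target still resolves after the import.
SD_UUID = "8e9e186f-b8f9-4202-aedf-273ac5bac204"
BASE = os.path.join("/var/run/vdsm/storage", SD_UUID)

for name in sorted(os.listdir(BASE)):
    link = os.path.join(BASE, name)
    if os.path.islink(link):
        target = os.readlink(link)
        state = "ok" if os.path.exists(target) else "DANGLING"
        print("%s -> %s [%s]" % (name, target, state))
```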
(In reply to Elad from comment #13)
> Simone, as we've seen, importing the hosted-engine SD still causes the VM to
> die since we have to change the SPUUID in
> /etc/ovirt-hosted-engine/hosted-engine.conf
> (reported in 1275210 - which is closed as dup of 1269768)
> Therefore, I can't verify this bug.

We investigated it further: the spUUID change on storage domain import shouldn't stop the VM. The issue appears to be sanlock killing qemu; Roy is on that.

Moving to VERIFIED based on:
https://bugzilla.redhat.com/show_bug.cgi?id=1269768#c53
https://bugzilla.redhat.com/show_bug.cgi?id=1269768#c54
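
The quoted comment points at the spUUID recorded in /etc/ovirt-hosted-engine/hosted-engine.conf. Below is a hedged sketch of the mismatch check it implies; the `spUUID` key name and the key=value format of that conf file are assumptions here, and the metadata path is the example from this report.

```python
# Hedged sketch: compare the pool UUID that hosted-engine.conf carries with
# the POOL_UUID now written into the domain metadata. Assumes both files are
# plain key=value text (shown above for the metadata; assumed for the conf,
# whose spUUID key name is also an assumption).

HE_CONF = "/etc/ovirt-hosted-engine/hosted-engine.conf"
METADATA = ("/rhev/data-center/mnt/glusterSD/10.35.160.6:_elad1/"
            "2cfaf5e4-34f3-4caf-bd92-0f22c866b51e/dom_md/metadata")

def kv(path):
    out = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                out[key] = value
    return out

conf_sp = kv(HE_CONF).get("spUUID")
md_pool = kv(METADATA).get("POOL_UUID")
print("hosted-engine.conf spUUID: %s" % conf_sp)
print("domain metadata POOL_UUID: %s" % md_pool)
if conf_sp != md_pool:
    print("MISMATCH: the import changed the pool UUID under hosted-engine")
```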
Created attachment 1072237 [details]
vdsm.log

Description of problem:
The storage pool UUID is modified when importing the storage domain on which the hosted-engine's disk is located. This should not happen; vdsm should leave this field in the storage domain metadata as it is.

Version-Release number of selected component (if applicable):
vdsm-4.17.5-1.el7ev.noarch
vdsm-python-4.17.5-1.el7ev.noarch
vdsm-infra-4.17.5-1.el7ev.noarch
vdsm-yajsonrpc-4.17.5-1.el7ev.noarch
vdsm-jsonrpc-4.17.5-1.el7ev.noarch
vdsm-cli-4.17.5-1.el7ev.noarch
vdsm-xmlrpc-4.17.5-1.el7ev.noarch
ovirt-host-deploy-1.4.0-0.0.5.master.el7ev.noarch
ovirt-hosted-engine-setup-1.3.0-0.5.beta.git2cd8189.el7ev.noarch
ovirt-vmconsole-host-1.0.0-0.0.1.master.el7ev.noarch
libgovirt-0.3.3-1.el7.x86_64
ovirt-vmconsole-1.0.0-0.0.1.master.el7ev.noarch
ovirt-hosted-engine-ha-1.3.0-0.4.beta.git1bcfe28.el7ev.noarch
ovirt-setup-lib-1.0.0-0.1.master.git6a54bc0.el7ev.noarch
libvirt-daemon-1.2.17-6.el7.x86_64
qemu-kvm-rhev-2.3.0-22.el7.x86_64
sanlock-3.2.4-1.el7.x86_64
python-2.7.5-31.el7.x86_64
selinux-policy-3.13.1-46.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy hosted-engine (over glusterfs in my case)
2. Once the setup is up, add a new storage domain to the DC
3. Import the HE storage domain to the DC

Actual results:
The SP UUID in the HE domain metadata is modified to the SP UUID of the DC it is imported into.

Expected results:
The SP UUID in the SD metadata should not be modified when importing the HE domain (a verification sketch follows below).

Additional info:
vdsm.log
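
To make the expected result mechanically checkable, here is a hedged verification sketch (the script name and invocation are hypothetical): record POOL_UUID before step 3, then re-run with that value after the import.

```python
import sys

# Hedged verification sketch for the expected result: POOL_UUID in the SD
# metadata must be identical before and after importing the HE domain.
# Hypothetical usage:
#   python check_pool_uuid.py <metadata-path>             # before step 3
#   python check_pool_uuid.py <metadata-path> <old-uuid>  # after step 3

def pool_uuid(path):
    with open(path) as f:
        for line in f:
            if line.startswith("POOL_UUID="):
                return line.strip().partition("=")[2]
    return None

if __name__ == "__main__":
    current = pool_uuid(sys.argv[1])
    print("POOL_UUID = %s" % current)
    if len(sys.argv) > 2 and current != sys.argv[2]:
        sys.exit("FAIL: POOL_UUID changed (was %s)" % sys.argv[2])
```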