Bug 1295427
Summary: | hosted engine doesn't start - fails during storage server upgrade | ||||||
---|---|---|---|---|---|---|---|
Product: | [oVirt] ovirt-hosted-engine-ha | Reporter: | Qiong Wu <qiong.wu> | ||||
Component: | Agent | Assignee: | Simone Tiraboschi <stirabos> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | sefi litmanovich <slitmano> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 1.3.3.6 | CC: | alukiano, bmcclain, bugs, dfediuck, mavital, qiong.wu, sbonazzo, slitmano, ylavi | ||||
Target Milestone: | ovirt-3.6.3 | Flags: | rule-engine: ovirt-3.6.z+, rule-engine: exception+, rule-engine: planning_ack+, sbonazzo: devel_ack+, mavital: testing_ack+ |
Target Release: | 1.3.4.3 | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: |
Cause:
The 3.5 -> 3.6 upgrade procedure was incorrectly checking the maintenance status.
Consequence:
An ambiguous error (Error: 'unhashable type: 'dict'') was reported.
Fix:
The maintenance status is now checked correctly.
Result:
The agent now reports:
Unable to upgrade while not in maintenance mode: please put this host into maintenance mode from the engine, and manually restart this service when ready
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2016-03-11 07:22:43 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | Integration | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1284954 | ||||||
Attachments: |
Description
Qiong Wu
2016-01-04 13:06:22 UTC
I just checked vdsClient list and got this:

[root@localhost ~]# vdsClient -s localhost list
5a034fba-b54e-41fe-b65a-20cd069334b7
Status = Down
emulatedMachine = pc
guestDiskMapping = {}
displaySecurePort = -1
cpuType = Westmere
devices = [{'device': 'console', 'specParams': {}, 'type': 'console', 'deviceId': '413084b1-841a-4b87-96a0-6bba242d6491', 'alias': 'console0'}, {'device': 'memballoon', 'specParams': {'model': 'none'}, 'type': 'balloon'}, {'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'device': 'vnc', 'specParams': {'spiceSecureChannels': 'smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir', 'displayIp': '0'}, 'type': 'graphics'}, {'nicModel': 'pv', 'macAddr': '00:16:3e:48:44:e4', 'linkActive': 'true', 'network': 'ovirtmgmt', 'filter': 'vdsm-no-mac-spoofing', 'specParams': {}, 'deviceId': 'cbb23bc4-7070-4c8d-8473-aacd99faffea', 'address': {'slot': '0x03', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'bridge', 'type': 'interface'}, {'index': '2', 'iface': 'ide', 'specParams': {}, 'readonly': 'true', 'deviceId': '3abe57e9-16f2-44d7-a978-ced738b9463e', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'device': 'cdrom', 'shared': 'false', 'path': '/home/tmp/centos.iso', 'type': 'disk'}, {'poolID': '00000000-0000-0000-0000-000000000000', 'reqsize': '0', 'index': '0', 'iface': 'virtio', 'apparentsize': '26843545600', 'imageID': 'c261320f-1dc0-43db-8b6c-dd49f74b8007', 'readonly': 'false', 'shared': 'exclusive', 'truesize': '4974497792', 'type': 'disk', 'domainID': '48fb7be2-d8eb-44e4-8690-7770ccaf3766', 'volumeInfo': {'domainID': '48fb7be2-d8eb-44e4-8690-7770ccaf3766', 'volType': 'path', 'leaseOffset': 0, 'volumeID': '5c579441-98e0-43fb-8e35-8c3d619e8998', 'leasePath': '/rhev/data-center/mnt/ovirt-nfs.labtest.lab:_engine/48fb7be2-d8eb-44e4-8690-7770ccaf3766/images/c261320f-1dc0-43db-8b6c-dd49f74b8007/5c579441-98e0-43fb-8e35-8c3d619e8998.lease', 'imageID': 'c261320f-1dc0-43db-8b6c-dd49f74b8007', 'path': '/rhev/data-center/mnt/ovirt-nfs.labtest.lab:_engine/48fb7be2-d8eb-44e4-8690-7770ccaf3766/images/c261320f-1dc0-43db-8b6c-dd49f74b8007/5c579441-98e0-43fb-8e35-8c3d619e8998'}, 'format': 'raw', 'deviceId': 'c261320f-1dc0-43db-8b6c-dd49f74b8007', 'address': {'slot': '0x06', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'disk', 'path': '/var/run/vdsm/storage/48fb7be2-d8eb-44e4-8690-7770ccaf3766/c261320f-1dc0-43db-8b6c-dd49f74b8007/5c579441-98e0-43fb-8e35-8c3d619e8998', 'propagateErrors': 'off', 'optional': 'false', 'bootOrder': '1', 'volumeID': '5c579441-98e0-43fb-8e35-8c3d619e8998', 'specParams': {}, 'volumeChain': [{'domainID': '48fb7be2-d8eb-44e4-8690-7770ccaf3766', 'volType': 'path', 'leaseOffset': 0, 'volumeID': '5c579441-98e0-43fb-8e35-8c3d619e8998', 'leasePath': '/rhev/data-center/mnt/ovirt-nfs.labtest.lab:_engine/48fb7be2-d8eb-44e4-8690-7770ccaf3766/images/c261320f-1dc0-43db-8b6c-dd49f74b8007/5c579441-98e0-43fb-8e35-8c3d619e8998.lease', 'imageID': 'c261320f-1dc0-43db-8b6c-dd49f74b8007', 'path': '/rhev/data-center/mnt/ovirt-nfs.labtest.lab:_engine/48fb7be2-d8eb-44e4-8690-7770ccaf3766/images/c261320f-1dc0-43db-8b6c-dd49f74b8007/5c579441-98e0-43fb-8e35-8c3d619e8998'}]}]
smp = 2
vmType = kvm
memSize = 4096
vmName = HostedEngine
exitMessage = Failed to acquire lock: No space left on device
pid = 0
displayIp = 0
displayPort = -1
clientIp =
exitCode = 1
nicModel = rtl8139,pv
exitReason = 1
spiceSecureChannels = smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
statusTime = 4299084070
display = vnc

Also, from sanlock.log:

2016-01-04 16:03:01+0100 3712 [1182]: r2 cmd_acquire 2,8,18691 invalid lockspace found -1 failed 0 name 48fb7be2-d8eb-44e4-8690-7770ccaf3766
2016-01-05 00:10:52+0100 15454 [1154]: s69 lockspace hosted-engine:1:/var/run/vdsm/storage/48fb7be2-d8eb-44e4-8690-7770ccaf3766/86438929-9f4e-4873-a141-3f061b2edee2/86f47014-dd1f-43f1-84ca-f41e02f88f58:0
2016-01-05 00:10:52+0100 15454 [3371]: verify_leader 1 wrong magic 0 /var/run/vdsm/storage/48fb7be2-d8eb-44e4-8690-7770ccaf3766/86438929-9f4e-4873-a141-3f061b2edee2/86f47014-dd1f-43f1-84ca-f41e02f88f58
2016-01-05 00:10:52+0100 15454 [3371]: leader1 delta_acquire_begin error -223 lockspace hosted-engine host_id 1
2016-01-05 00:10:52+0100 15454 [3371]: leader2 path /var/run/vdsm/storage/48fb7be2-d8eb-44e4-8690-7770ccaf3766/86438929-9f4e-4873-a141-3f061b2edee2/86f47014-dd1f-43f1-84ca-f41e02f88f58 offset 0
2016-01-05 00:10:52+0100 15454 [3371]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
2016-01-05 00:10:52+0100 15454 [3371]: leader4 sn rn ts 0 cs 0
2016-01-05 00:10:53+0100 15455 [1154]: s69 add_lockspace fail result -223

Just to be sure, did you use the same upgrade procedure described under:
http://www.ovirt.org/Hosted_Engine_Howto#Upgrade_Hosted_Engine

Yes, I managed to get things running again in the meanwhile. I tracked it down to problems with the storage pool id, so I changed the storage pool id in hosted-engine.conf and on my gluster volume in dom_md/metadata to a new value, and the machine booted again. Then, being able to log in to the engine again, I repaired the hosted engine storage and things started working again.

Is this a deployment over gluster storage? I did not encounter this specific problem over NFS and iSCSI storage.

Yeah, I set everything up according to http://community.redhat.com/blog/2014/10/up-and-running-with-ovirt-3-5/

Simone, this is NFS over Gluster in a hyperconverged setup.

Qiong Wu, can you please upload a sos report somewhere we can look at? (yum install sos ; sosreport)

Understood: as for comment 1, a VM was there when you tried the upgrade, and we weren't correctly parsing the output of vdscli.list, which returns a list of dictionaries, so it ended with Error: 'unhashable type: 'dict'' when trying to restart the agent. This doesn't affect the regular flow.
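The diagnosis above (vdscli.list returns a list of dictionaries, which the upgrade code mishandled) can be illustrated with a minimal sketch. The response shape below is modeled on the vdsClient output quoted earlier in this report, and the parsing code is hypothetical, not the actual ovirt-hosted-engine-ha source:

```python
# Minimal sketch of the parsing bug described above. The response shape is
# an assumption modeled on the vdsClient output quoted in this report; the
# parsing code is hypothetical, not the real ovirt-hosted-engine-ha code.
response = {
    'status': {'code': 0, 'message': 'Done'},
    'vmList': [
        {'vmId': '5a034fba-b54e-41fe-b65a-20cd069334b7', 'status': 'Down'},
    ],
}

# Buggy handling: treating the VM entries as hashable scalars and putting
# them straight into a set fails, because each entry is a dict.
try:
    vms = set(response['vmList'])
except TypeError as err:
    print(err)  # unhashable type: 'dict'

# Correct handling: iterate the list of dictionaries and extract the
# hashable 'vmId' field from each entry.
vm_ids = {vm['vmId'] for vm in response['vmList']}
print(vm_ids)
```

This reproduces exactly the ambiguous "unhashable type: 'dict'" message reported in the Doc Text, which is why the error only appeared when a VM was running at upgrade time (an empty vmList never triggers it).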
Bug tickets must have version flags set prior to targeting them to a release. Please ask the maintainer to set the correct version flags and only then set the target milestone.

Verified with the following flow:
1. On two RHEL 7.2 hosts, install the latest 3.5 packages with: ovirt-hosted-engine-ha-1.2.10-1.el7ev.noarch.rpm, ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch.rpm
2. hosted-engine --deploy from one of the hosts + install the engine on the created vm.
3. hosted-engine --deploy from the second host to add it to the engine.
4. Add a storage domain to the engine and create vms to run on host 2.
5. Set maintenance mode to global.
6. Upgrade the engine from rhevm-3.5.8-0.1.el6ev.noarch to rhevm-3.6.3.3-0.1.el6.noarch.
7. Disable global maintenance.
8. Put the 1st host into maintenance on the engine.
9. Update the host with 3.6 repos -> ovirt-ha-agent stop -> yum update.
10. Restart vdsm, ovirt-ha-agent/broker.
11. Start the host on the engine.
12. Do step 9 on the second host without moving it to maintenance (3 vms still running on it).
13. Restart ovirt-ha-agent on the second host.

Result: the restart crashes and the hosted engine vm goes down (then starts on the first host).
In agent.log I get (full log is attached):

MainThread::INFO::2016-02-25 15:28:37,794::hosted_engine::757::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Acquired lock on host id 2
MainThread::INFO::2016-02-25 15:28:37,807::upgrade::977::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Upgrading to current version
MainThread::INFO::2016-02-25 15:28:37,813::upgrade::831::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_in_engine_maintenance) This host is connected to other storage pools: ['00000002-0002-0002-0002-00000000003c']
MainThread::ERROR::2016-02-25 15:28:37,813::upgrade::980::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Unable to upgrade while not in maintenance mode: please put this host into maintenance mode from the engine, and manually restart this service when ready

Created attachment 1130539 [details]
agent log for host that did not move to maintenance