Description of problem:

The hosted-engine fails to upgrade / auto-import the storage domain.

Version-Release number of selected component (if applicable):

(on host) rpm -qa ovirt* vdsm*
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch
vdsm-yajsonrpc-4.17.18-1.el7.noarch
vdsm-cli-4.17.18-1.el7.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7.centos.noarch
ovirt-release36-003-1.noarch
vdsm-xmlrpc-4.17.18-1.el7.noarch
vdsm-4.17.18-1.el7.noarch
ovirt-host-deploy-1.4.1-1.el7.centos.noarch
ovirt-hosted-engine-setup-1.3.2.3-1.el7.centos.noarch
vdsm-python-4.17.18-1.el7.noarch
vdsm-hook-vmfex-dev-4.17.18-1.el7.noarch
ovirt-release35-006-1.noarch
ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.2.1-1.el7.centos.noarch
vdsm-infra-4.17.18-1.el7.noarch
vdsm-jsonrpc-4.17.18-1.el7.noarch

How reproducible:

Upgrade hosts from el6 to el7 and upgrade oVirt to 3.6.

Steps to Reproduce:
1. Upgrade oVirt to 3.6 on el6
2. Upgrade hosts to el7 (vdsmd is not updated on el6 anymore)
3. Deploy the reinstalled el7 hosts to oVirt
4. Restart ovirt-ha-agent (the host is in maintenance mode!)
5.
upgrade should start, but it fails.

Actual results:

/var/log/ovirt-hosted-engine-ha/agent.log:

MainThread::INFO::2016-02-11 20:06:19,191::hosted_engine::744::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Acquired lock on host id 2
MainThread::INFO::2016-02-11 20:06:19,192::upgrade::947::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade) Upgrading to current version
MainThread::INFO::2016-02-11 20:06:19,592::upgrade::819::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_in_engine_maintenance) This host is connected to other storage pools
MainThread::ERROR::2016-02-11 20:06:19,592::upgrade::950::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade) Unable to upgrade while not in maintenance mode: please put this host into maintenance mode from the engine, and manually restart this service when ready

On the hosted-engine, /var/log/ovirt-engine/engine.log:

2016-02-11 20:06:19,304 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-19) [66a28e10] START, FullListVDSCommand(HostName = , FullListVDSCommandParameters:{runAsync='true', hostId='41894d95-ef99-45a8-bd5d-c59d6e4c5e2e', vds='Host[,41894d95-ef99-45a8-bd5d-c59d6e4c5e2e]', vmIds='[8cb6bafc-abd7-49d8-b781-a6f37e63430a]'}), log id: 7084c699
2016-02-11 20:06:20,311 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-19) [66a28e10] FINISH, FullListVDSCommand, return: [{status=Up, nicModel=rtl8139,pv, emulatedMachine=pc, guestDiskMapping={QEMU_DVD-ROM_={name=/dev/sr0}, 842b979e-9c0a-4337-b={name=/dev/vda}}, vmId=8cb6bafc-abd7-49d8-b781-a6f37e63430a, pid=385, devices=[Ljava.lang.Object;@1d0aa085, smp=2, vmType=kvm, displayIp=0, display=vnc, displaySecurePort=-1, memSize=8194, displayPort=5900, cpuType=Westmere, spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir, statusTime=4461066420, vmName=HostedEngine, clientIp=, pauseCode=NOERR}], log id: 7084c699
2016-02-11 20:06:20,322 INFO [org.ovirt.engine.core.bll.storage.GetExistingStorageDomainListQuery] (org.ovirt.thread.pool-8-thread-35) [] START, GetExistingStorageDomainListQuery(GetExistingStorageDomainListParameters:{refresh='true', filtered='false'}), log id: 7d08449
2016-02-11 20:06:20,323 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCommand] (org.ovirt.thread.pool-8-thread-35) [] START, HSMGetStorageDomainsListVDSCommand(HostName = geisha-2.pazion.nl, HSMGetStorageDomainsListVDSCommandParameters:{runAsync='true', hostId='41894d95-ef99-45a8-bd5d-c59d6e4c5e2e', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='null', storageDomainType='Data', path='null'}), log id: a9cd8a1
2016-02-11 20:06:21,837 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCommand] (org.ovirt.thread.pool-8-thread-35) [] FINISH, HSMGetStorageDomainsListVDSCommand, return: [7ebbf9af-f4aa-4639-be31-ee4aa38ccea6, 8744765f-729f-4687-905e-1edd3546a16e, 88b69eba-ef4f-4dbe-ba53-20dadd424d0e], log id: a9cd8a1
2016-02-11 20:06:21,862 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand] (org.ovirt.thread.pool-8-thread-35) [] START, HSMGetStorageDomainInfoVDSCommand(HostName = geisha-2.pazion.nl, HSMGetStorageDomainInfoVDSCommandParameters:{runAsync='true', hostId='41894d95-ef99-45a8-bd5d-c59d6e4c5e2e', storageDomainId='88b69eba-ef4f-4dbe-ba53-20dadd424d0e'}), log id: 38c26d07
2016-02-11 20:06:22,877 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand] (org.ovirt.thread.pool-8-thread-35) [] FINISH, HSMGetStorageDomainInfoVDSCommand, return: <StorageDomainStatic:{name='hostedengine_nfs', id='88b69eba-ef4f-4dbe-ba53-20dadd424d0e'}, 499b208c-9de9-4a2a-97de-30f410b4e6d4>, log id: 38c26d07
2016-02-11 20:06:22,877 INFO [org.ovirt.engine.core.bll.storage.GetExistingStorageDomainListQuery] (org.ovirt.thread.pool-8-thread-35) [] FINISH, GetExistingStorageDomainListQuery, log id: 7d08449
2016-02-11 20:06:22,877 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-35) [46900451] Lock Acquired to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
2016-02-11 20:06:22,896 WARN [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-35) [46900451] CanDoAction of action 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons: VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_NOT_EXIST
2016-02-11 20:06:22,896 INFO [org.ovirt.engine.core.bll.ImportHostedEngineStorageDomainCommand] (org.ovirt.thread.pool-8-thread-35) [46900451] Lock freed to object 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'

Expected results:

An upgraded hosted-engine with the storage domain available in the webGUI.

Additional info:

The oVirt install was created in 3.4 and has been upgraded multiple times with success. The hosted engine is on NFSv3; the master storage is on FC.

I had serious problems with this upgrade path (hosts el6 to el7) - https://www.mail-archive.com/users@ovirt.org/msg30964.html :
- during hosted-engine --deploy, /var/run/vdsm/storage was missing
- problems with spUUID= in hosted-engine.conf (no connection), so I reset it to the 0000-0000 value in hosted-engine.conf as suggested ( http://screencast.com/t/n0yFcgd5gC )

Finally the hosted-engine is working, and I was able to set the Cluster compatibility in the webGUI to 3.6 and save.
The only issue besides the failed upgrade is that I have commented out conf_volume_UUID; if I don't, I get these in the logs:

MainThread::INFO::2016-02-11 20:40:01,424::config::205::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::WARNING::2016-02-11 20:40:01,425::ovf_store::105::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
MainThread::ERROR::2016-02-11 20:40:01,425::config::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
MainThread::ERROR::2016-02-11 20:40:01,425::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Path to volume 125f858b-6e2a-444b-bbfa-e45a328075f6 not found in /rhev/data-center/mnt' - trying to restart agent

I tried both local and global maintenance mode. I tried to remove any existing connection to the host and did:

service ovirt-ha-broker stop
service ovirt-ha-agent stop
sanlock client shutdown -f 1
service sanlock stop
service vdsmd restart
umount /rhev/data-center/mnt/hostedstorage.pazion.nl\:_opt_hosted-engine/
service sanlock start
service ovirt-ha-broker start

I tried to restart the host and trigger the upgrade. I also tried to shut down the hosted-engine and restart ovirt-ha-agent. Still the same error, which makes me believe I hit a bug... I keep getting the "connected to storage pools" error and the "host is not in maintenance mode" error.
I found https://bugzilla.redhat.com/show_bug.cgi?id=1271771 and https://bugzilla.redhat.com/show_bug.cgi?id=1294457, so I tried to set HostedEngineStorageDomainName and restart ovirt-ha-agent, but this had no effect either. I got the same "connected storage pools" and "put host in maintenance" errors again.
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1269768
(In reply to Paul from comment #0)
> Description of problem:
>
> The hosted-engine fails to upgrade / auto import the storage domain.
>
> Steps to Reproduce:
>
> 1. Upgrade ovirt to 3.6 on el6
> 2. Upgrade hosts to el7 ( vdsmd not upated on el6 anymore )

Hi, the above is not the proper upgrade path, since you are moving between operating systems. The right way to upgrade is while still on 3.5. Here's a list of all the steps to properly upgrade your HE setup:

Assumptions
------------
- All hosts are running the latest 3.5 on el6.
- Using a 3.5 hosted engine.
- The 3.5 cluster has a redundancy of at least one host (since we need to be able to take down a host and have its VMs evacuated).

Phase 1: el6 to el7 (must be done in 3.5)
-----------------------------------------
- Move one host to maintenance.
- Reinstall the host as el7.
- Create a new 3.5 cluster for el7.
- Add the new host to the new cluster (in 3.5 compatibility mode: the engine is still 3.5).
- Run hosted-engine deploy on the new host.
- Migrate VMs directly to the new host.
- Repeat the above for the other hosts until you get to the last host running the HE VM.
- Stop the HE VM running in the el6 cluster. It should be automatically started in the el7 cluster.
- Take the last el6 host down to maintenance, reinstall it, and add it to the new el7 cluster.

Phase 2: 3.5 to 3.6
--------------------
- Move the HE VM to global maintenance.
- Upgrade the engine RPMs to 3.6.
- Start the engine.
- Upgrade all the hosts from 3.5 to 3.6 (you must move the host to maintenance from the UI to ensure it is seen as in maintenance mode by the engine, and also put it into local maintenance mode).
- Once everything is stable and running, change the cluster compatibility mode to 3.6.
- Make sure you have an additional SD. If not, add one to ensure the VM is properly imported.
Please re-open if you find any issues while following the instructions in comment 3.
Upgrade path followed:

I had problems during the upgrade, so what I did was downgrade one host to oVirt 3.5 and then run the upgrade path described in phase 1. The problem is that the whole host upgrade was done because I found out 3.6 was not supported on el6, so by this time I had already upgraded my engine. In the quest to get everything properly upgraded, I set the DC compatibility to 3.6 as well as the Cluster's.

I have tried different options and also have a problem with the storage domain being imported (is this what you meant with "Make sure you have an additional SD. If not, add it to ensure the VM is properly imported."?): http://lists.ovirt.org/pipermail/users/2016-February/038023.html

Possible solution?:

I am a bit stuck in the middle. Besides the above upgrade problem, my hosted_storage domain is named differently and is only imported (and locked) when the name is set to hostedengine_nfs. These might be closely related, though.

So my main question is: how can I upgrade my setup to 3.6? Is there a way to downgrade the hosted-engine to 3.5 (and the DC/cluster compatibility) and rerun the upgrade? Or is there some fix in 3.6.3 which might provide a solution? Or is my only option to create a new hosted-engine (clean install from 3.6) and then disconnect the old master storage domain and import the storage domain on the new hosted-engine?
I'm for re-opening it: currently, trying to add a fresh el7 host to a 3.5 hosted-engine instance using hosted-engine-setup from 3.6 fails with:

[ INFO ] Stage: Setup validation
[ ERROR ] Failed to execute stage 'Setup validation': Unable to prepare image: Unknown pool id, pool not connected: ('f8eee402-e5f8-4325-9f5c-81acc65e57aa',)
[ INFO ] Stage: Clean up

which is not that clear. Really supporting it (a direct 3.5 on el6 -> 3.6 on el7 upgrade) is definitely not worth it, since the proposed way (upgrading from el6 -> el7 while on 3.5, and only after that upgrading 3.5 -> 3.6) works correctly. But at least we should provide a clear error message when the user is not on the supported path, and in particular when the user tries to add a 3.6 host to an existing 3.5 instance. The severity is not that high, since it's just about providing a clear error message instead of an internal one.
I upgraded the hosted-engine to 3.6.3 and the storage domain became active in the web interface, where it had first been locked. From there things started moving: I was able to put hosts in maintenance, the OVF stores were created, and hosted-engine.conf got updated.

Looking back, I think there were several (related) issues I encountered that somehow got fixed in 3.6.3:
- the pool UUID problem, with the "not found" and "still connected" errors
- the differently named hosted_storage

So after quite a struggle, and thanks to the new 3.6.3 version, my hosts have been upgraded to el7 and my platform is upgraded to 3.6. Thanks to everyone for the help, and especially to Simone Tiraboschi for answering my posts on the oVirt users list. Thanks!
Upgrade 3.4 el6 -> 3.5 el6 -> 3.5 el7 -> 3.6 el7

3.4 el6 - Versions
================================================
ovirt-hosted-engine-setup-1.1.5-1.el6ev.noarch
ovirt-hosted-engine-ha-1.1.6-3.el6ev.noarch
vdsm-python-zombiereaper-4.14.18-7.el6ev.noarch
vdsm-cli-4.14.18-7.el6ev.noarch
vdsm-4.14.18-7.el6ev.x86_64
vdsm-xmlrpc-4.14.18-7.el6ev.noarch
vdsm-python-4.14.18-7.el6ev.x86_64

3.5 el6 - Versions:
================================================
kernel-2.6.32-573.el6.x86_64
ovirt-hosted-engine-setup-1.2.6.1-1.el6ev.noarch
ovirt-hosted-engine-ha-1.2.10-1.el6ev.noarch
vdsm-jsonrpc-4.16.36-1.el6ev.noarch
vdsm-python-4.16.36-1.el6ev.noarch
vdsm-cli-4.16.36-1.el6ev.noarch
vdsm-yajsonrpc-4.16.36-1.el6ev.noarch
vdsm-4.16.36-1.el6ev.x86_64
vdsm-python-zombiereaper-4.16.36-1.el6ev.noarch
vdsm-xmlrpc-4.16.36-1.el6ev.noarch

3.5 el7 - Versions:
================================================
kernel-3.10.0-327.13.1.el7.x86_64
ovirt-hosted-engine-ha-1.2.10-1.el7ev.noarch
ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch
vdsm-4.16.36-1.el7ev.x86_64
vdsm-python-zombiereaper-4.16.36-1.el7ev.noarch
vdsm-xmlrpc-4.16.36-1.el7ev.noarch
vdsm-jsonrpc-4.16.36-1.el7ev.noarch
vdsm-hook-ethtool-options-4.16.36-1.el7ev.noarch
vdsm-python-4.16.36-1.el7ev.noarch
vdsm-yajsonrpc-4.16.36-1.el7ev.noarch
vdsm-cli-4.16.36-1.el7ev.noarch

3.6 el7 - Versions:
================================================
kernel-3.10.0-327.13.1.el7.x86_64
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5-1.el7ev.noarch
vdsm-python-4.17.23.1-0.el7ev.noarch
vdsm-4.17.23.1-0.el7ev.noarch
vdsm-cli-4.17.23.1-0.el7ev.noarch
vdsm-hook-ethtool-options-4.17.23.1-0.el7ev.noarch
vdsm-jsonrpc-4.17.23.1-0.el7ev.noarch
vdsm-infra-4.17.23.1-0.el7ev.noarch
vdsm-yajsonrpc-4.17.23.1-0.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.23.1-0.el7ev.noarch
vdsm-xmlrpc-4.17.23.1-0.el7ev.noarch
================================================

Steps 3.4 el6 -> 3.5 el6:
1) Start with the 3.4 engine (el6) and two hosts (el6) - the environment has an NFS storage domain and two additional running VMs
2) Set global maintenance
3) Upgrade the engine to 3.5
4) Put the first host to maintenance
5) Upgrade the first host to 3.5 (el6)
6) Activate the first host
7) Put the second host to maintenance
8) Upgrade the second host to 3.5 (el6)
9) Activate the host
10) Change the cluster and datacenter compatibility version to 3.5
PASS

Steps 3.5 el6 -> 3.5 el7:
1) Put the first host to maintenance and remove it from the engine
2) Reprovision the first host to 3.5 el7
3) Redeploy the first host via the hosted-engine tool (use the second host as the source for the config file and the W/A from https://bugzilla.redhat.com/show_bug.cgi?id=1308962)
4) Put the second host to maintenance and remove it from the engine
5) Reprovision the second host to 3.5 el7
6) Redeploy the second host via the hosted-engine tool (use the first host as the source for the config file and the W/A from https://bugzilla.redhat.com/show_bug.cgi?id=1308962)
PASS

Steps 3.5 el7 -> 3.6 el7:
1) Set global maintenance
2) Upgrade the engine to 3.6
3) Put the first host to maintenance
4) Upgrade the first host to 3.6
5) Activate the first host
6) Put the second host to maintenance
7) Upgrade the second host to 3.6
8) Activate the host
9) Change the cluster and datacenter compatibility version to 3.6
PASS

Deploy additional 3.6 host:
### Please upgrade the existing HE hosts to current release before adding this host.
### Please check the log file for more details.
***Q:STRING OVEHOSTED_PREVENT_MIXING_HE_35_CURRENT
### Replying "No" will abort Setup.
### Continue? (Yes, No) [No] Yes
Deploy succeeded.
!!! ovirt-ha-agent dropped to a failed state !!!
agent.log
=============================
MainThread::DEBUG::2016-03-20 14:12:13,181::brokerlink::273::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate) Sending request: notify time=1458475933.18 type=state_transition detail=StartState-ReinitializeFSM hostname='alma07.qa.lab.tlv.redhat.com'
MainThread::DEBUG::2016-03-20 14:12:13,181::util::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(socket_readline) socket_readline with 30.0 seconds timeout
MainThread::DEBUG::2016-03-20 14:12:13,182::brokerlink::282::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate) Full response: failure <type 'exceptions.RuntimeError'>
MainThread::DEBUG::2016-03-20 14:12:13,182::brokerlink::258::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_checked_communicate) Failed response from socket
MainThread::ERROR::2016-03-20 14:12:13,183::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to start monitor <type 'type'>, options {'hostname': 'alma07.qa.lab.tlv.redhat.com'}: Request failed: <type 'exceptions.RuntimeError'>' - trying to restart agent

broker.log
=============================
Thread-32979::ERROR::2016-03-20 14:12:18,978::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: "notify time=1458475938.98 type=state_transition detail=StartState-ReinitializeFSM hostname='alma07.qa.lab.tlv.redhat.com'"
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 302, in _dispatch
    if notifications.notify(**options):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 54, in notify
    archive_fname=constants.NOTIFY_CONF_FILE_ARCHIVE_FNAME,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/env/config.py", line 243, in refresh_local_conf_file
    conf_vol_id,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/heconflib.py", line 273, in get_volume_path
    vol_uuid=vol_uuid,
RuntimeError: Path to volume None not found in /rhev/data-center/mnt

It looks like this is because the answer file from the first host does not have values for:
OVEHOSTED_STORAGE/confImageUUID
OVEHOSTED_STORAGE/confVolUUID

so we end up with these values in hosted-engine.conf:
conf_volume_UUID=None
conf_image_UUID=None

After I added values to the hosted-engine.conf file (taken from the other host) and restarted the host, I see that ovirt-ha-agent manages to come up, but:

MainThread::ERROR::2016-03-20 14:12:18,806::upgrade::980::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Unable to upgrade while not in maintenance mode: please put this host into maintenance mode from the engine, and manually restart this service when ready

I put the host to maintenance via the engine and restarted ovirt-ha-agent, but again I encounter an error message:

MainThread::ERROR::2016-03-20 15:01:07,890::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Unable to connect SP: Wrong Master domain or its version: 'SD=3ac831d6-6124-4b42-a060-f89c64be09a1, pool=52313664-01fd-4df9-b2ae-d87cf7e2c81a'' - trying to restart agent

From vdsm:

Thread-3066::ERROR::2016-03-20 15:04:18,304::task::866::Storage.TaskManager.Task::(_setError) Task=`86520e58-b47d-4a10-8a5a-c8f69a6f7533`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1036, in connectStoragePool
    spUUID, hostID, msdUUID, masterVersion, domainsMap)
  File "/usr/share/vdsm/storage/hsm.py", line 1101, in _connectStoragePool
    res = pool.connect(hostID, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 657, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1231, in __rebuild
    self.setMasterDomain(msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1447, in setMasterDomain
    raise se.StoragePoolWrongMaster(self.spUUID, msdUUID)
StoragePoolWrongMaster: Wrong Master domain or its version: 'SD=3ac831d6-6124-4b42-a060-f89c64be09a1, pool=52313664-01fd-4df9-b2ae-d87cf7e2c81a'

For some reason
# vdsClient -s 0 getImagesList 3ac831d6-6124-4b42-a060-f89c64be09a1
returns an empty list, while on the other host:
# vdsClient -s 0 getImagesList 3ac831d6-6124-4b42-a060-f89c64be09a1
9ecd4e5f-bb24-4fd6-8c20-c442425b59b6
b6b637a4-37be-48e9-aacb-e3d4a6be29cc
995171f0-1abb-488b-9b18-3e17aad0c3de
4d1915e1-a9f7-4bca-b666-0997adec5ef4
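As a side note on the conf_volume_UUID=None / conf_image_UUID=None symptom above: the bogus values can be spotted before restarting the agent with a small standalone check. This is a minimal sketch, not part of ovirt-hosted-engine-ha; parse_conf and missing_uuids are hypothetical helpers for a hosted-engine.conf-style key=value file:

```python
# Hypothetical helpers (not part of ovirt-hosted-engine-ha): parse a
# hosted-engine.conf-style key=value file and flag UUID options that
# are absent or were written out as the literal string "None".
REQUIRED_UUID_KEYS = ("conf_volume_UUID", "conf_image_UUID")

def parse_conf(text):
    """Return a dict of key=value pairs, ignoring blanks and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf

def missing_uuids(conf):
    """Required keys that are absent or carry the bogus literal 'None'."""
    return [k for k in REQUIRED_UUID_KEYS
            if conf.get(k) in (None, "", "None")]

# Values as generated from the incomplete answer file in this report.
sample = """\
host_id=2
conf_volume_UUID=None
conf_image_UUID=None
"""
print(missing_uuids(parse_conf(sample)))
# → ['conf_volume_UUID', 'conf_image_UUID']
```

Running this against a healthy config (both keys carrying real UUIDs) returns an empty list, so a non-empty result is a cheap signal that the deploy answer file was missing OVEHOSTED_STORAGE/confImageUUID and OVEHOSTED_STORAGE/confVolUUID.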
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
(In reply to Artyom from comment #8)
> from some reason
> # vdsClient -s 0 getImagesList 3ac831d6-6124-4b42-a060-f89c64be09a1
> return empty list, when on other host
> # vdsClient -s 0 getImagesList 3ac831d6-6124-4b42-a060-f89c64be09a1
> 9ecd4e5f-bb24-4fd6-8c20-c442425b59b6
> b6b637a4-37be-48e9-aacb-e3d4a6be29cc
> 995171f0-1abb-488b-9b18-3e17aad0c3de
> 4d1915e1-a9f7-4bca-b666-0997adec5ef4

The issue is just here: the images are available on the NFS share

[root@alma07 images]# pwd
/rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_alukiano__HE__upgrade/3ac831d6-6124-4b42-a060-f89c64be09a1/images
[root@alma07 images]# ls -l
total 16
drwxr-xr-x. 2 vdsm kvm 4096 17 mar 15.36 4d1915e1-a9f7-4bca-b666-0997adec5ef4
drwxr-xr-x. 2 vdsm kvm 4096 18 mar 00.52 995171f0-1abb-488b-9b18-3e17aad0c3de
drwxr-xr-x. 2 vdsm kvm 4096 20 mar 15.40 9ecd4e5f-bb24-4fd6-8c20-c442425b59b6
drwxr-xr-x. 2 vdsm kvm 4096 18 mar 00.52 b6b637a4-37be-48e9-aacb-e3d4a6be29cc

but VDSM is not reporting them. I reproduced it with a small Python script:

from vdsm import vdscli

sdUUID = '3ac831d6-6124-4b42-a060-f89c64be09a1'
cli = vdscli.connect(timeout=60)
result = cli.getImagesList(sdUUID)
print(result)
result = cli.getConnectedStoragePoolsList()
print(result)

That prints:

{'status': {'message': 'OK', 'code': 0}, 'imageslist': []}
{'status': {'message': 'OK', 'code': 0}, 'poollist': []}

From the past we know that VDSM's getImagesList wasn't working on NFS when not connected to a storage pool (see rhbz#1274622), but in that case it was returning:

{'status': {'message': 'list index out of range', 'code': 100}}

and we implemented a workaround for that. But now it's still not working (imageslist = [] when the images are there), and it also returns a wrong error code of 0, hiding the issue, so our workaround doesn't trigger.
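To illustrate the detection gap described above: a sketch of a stricter fallback check (should_fallback is a hypothetical helper, not the actual ovirt-hosted-engine-ha code) that would also treat a successful status with an empty imageslist as suspect, using VDSM-style response dicts taken from the comment:

```python
# Hypothetical helper (not the actual ovirt-hosted-engine-ha code):
# decide whether to fall back to scanning the storage domain directly.
# The old workaround only fired on a non-zero VDSM status code; after
# the regression described above, getImagesList returns code 0 with an
# empty imageslist, so the check must also treat that as suspect.
def should_fallback(result, expect_images=True):
    status = result.get("status", {})
    if status.get("code", 0) != 0:
        # e.g. the old 'list index out of range' response, code 100
        return True
    if expect_images and not result.get("imageslist"):
        # code 0 but no images reported: the regressed case
        return True
    return False

# Regressed response from this comment: OK status, empty list.
regressed = {"status": {"message": "OK", "code": 0}, "imageslist": []}
# Healthy response, as seen on the other host.
healthy = {"status": {"message": "OK", "code": 0},
           "imageslist": ["9ecd4e5f-bb24-4fd6-8c20-c442425b59b6"]}

print(should_fallback(regressed), should_fallback(healthy))
# → True False
```

The expect_images flag matters: an empty list is only evidence of the bug when the share is known to contain images (as the ls -l output above shows), so a caller that genuinely expects an empty domain would pass expect_images=False.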
So it's a regression on vdsm. Please open a BZ against VDSM and let's make it blocking for 3.6.4. Moving back this bug to QA, please make the new BZ blocking this one too.
Created attachment 1138524 [details] VDSM logs on getImagesList
*** Bug 1316143 has been marked as a duplicate of this bug. ***
Verified.

Upgrade 3.4 el6 -> 3.5 el6 -> 3.5 el7 -> 3.6 el7

3.4 el6 - Versions
================================================
ovirt-hosted-engine-setup-1.1.5-1.el6ev.noarch
ovirt-hosted-engine-ha-1.1.6-3.el6ev.noarch
vdsm-python-zombiereaper-4.14.18-7.el6ev.noarch
vdsm-cli-4.14.18-7.el6ev.noarch
vdsm-4.14.18-7.el6ev.x86_64
vdsm-xmlrpc-4.14.18-7.el6ev.noarch
vdsm-python-4.14.18-7.el6ev.x86_64

3.5 el6 - Versions:
================================================
kernel-2.6.32-573.el6.x86_64
ovirt-hosted-engine-setup-1.2.6.1-1.el6ev.noarch
ovirt-hosted-engine-ha-1.2.10-1.el6ev.noarch
vdsm-jsonrpc-4.16.36-1.el6ev.noarch
vdsm-python-4.16.36-1.el6ev.noarch
vdsm-cli-4.16.36-1.el6ev.noarch
vdsm-yajsonrpc-4.16.36-1.el6ev.noarch
vdsm-4.16.36-1.el6ev.x86_64
vdsm-python-zombiereaper-4.16.36-1.el6ev.noarch
vdsm-xmlrpc-4.16.36-1.el6ev.noarch

3.5 el7 - Versions:
================================================
kernel-3.10.0-327.13.1.el7.x86_64
ovirt-hosted-engine-ha-1.2.10-1.el7ev.noarch
ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch
vdsm-4.16.36-1.el7ev.x86_64
vdsm-python-zombiereaper-4.16.36-1.el7ev.noarch
vdsm-xmlrpc-4.16.36-1.el7ev.noarch
vdsm-jsonrpc-4.16.36-1.el7ev.noarch
vdsm-hook-ethtool-options-4.16.36-1.el7ev.noarch
vdsm-python-4.16.36-1.el7ev.noarch
vdsm-yajsonrpc-4.16.36-1.el7ev.noarch
vdsm-cli-4.16.36-1.el7ev.noarch

3.6 el7 - Versions:
================================================
kernel-3.10.0-327.13.1.el7.x86_64
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch
vdsm-python-4.17.23.1-0.el7ev.noarch
vdsm-4.17.23.1-0.el7ev.noarch
vdsm-cli-4.17.23.1-0.el7ev.noarch
vdsm-hook-ethtool-options-4.17.23.1-0.el7ev.noarch
vdsm-jsonrpc-4.17.23.1-0.el7ev.noarch
vdsm-infra-4.17.23.1-0.el7ev.noarch
vdsm-yajsonrpc-4.17.23.1-0.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.23.1-0.el7ev.noarch
vdsm-xmlrpc-4.17.23.1-0.el7ev.noarch
================================================

Steps 3.4 el6 -> 3.5 el6:
1) Start with the 3.4 engine (el6) and two hosts (el6) - the environment has an NFS storage domain and two additional running VMs
2) Set global maintenance
3) Upgrade the engine to 3.5
4) Put the first host to maintenance
5) Upgrade the first host to 3.5 (el6)
6) Activate the first host
7) Put the second host to maintenance
8) Upgrade the second host to 3.5 (el6)
9) Activate the host
10) Change the cluster and datacenter compatibility version to 3.5
PASS

Steps 3.5 el6 -> 3.5 el7:
1) Put the first host to maintenance and remove it from the engine
2) Reprovision the first host to 3.5 el7
3) Redeploy the first host via the hosted-engine tool (use the second host as the source for the config file and the W/A from https://bugzilla.redhat.com/show_bug.cgi?id=1308962)
4) Put the second host to maintenance and remove it from the engine
5) Reprovision the second host to 3.5 el7
6) Redeploy the second host via the hosted-engine tool (use the first host as the source for the config file and the W/A from https://bugzilla.redhat.com/show_bug.cgi?id=1308962)
PASS

Steps 3.5 el7 -> 3.6 el7:
1) Set global maintenance
2) Upgrade the engine to 3.6
3) Put the first host to maintenance
4) Upgrade the first host to 3.6
5) Activate the first host
6) Put the second host to maintenance
7) Upgrade the second host to 3.6
8) Activate the host
9) Change the cluster and datacenter compatibility version to 3.6
PASS

Deploy additional 3.6 host:
### Please upgrade the existing HE hosts to current release before adding this host.
### Please check the log file for more details.
***Q:STRING OVEHOSTED_PREVENT_MIXING_HE_35_CURRENT
### Replying "No" will abort Setup.
### Continue? (Yes, No) [No] Yes
Deploy succeeded; the agent and broker services started without any trouble.

NOTE: After the host deploy, you will need to put the host to maintenance via the engine and restart ovirt-ha-agent to finish the HE upgrade process.