Bug 920636
| Summary: | engine: second host become non-operational when adding it to pool during createStoragePool | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> | ||||
| Component: | ovirt-engine | Assignee: | Roy Golan <rgolan> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | Artyom <alukiano> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.2.0 | CC: | acathrow, bazulay, iheim, jkt, lpeer, pstehlik, Rhev-m-bugs, yeylon, yzaslavs | ||||
| Target Milestone: | --- | Keywords: | Triaged | ||||
| Target Release: | 3.3.0 | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | infra | ||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-10-20 17:08:48 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1019461 | ||||||

Also reproduced when activating the master domain and activating a host at the same time.

Reproduction:
1. Have two hosts, one SPM and one HSM, with 1 NFS domain.
2. Put the HSM host in maintenance.
3. Put the master domain in maintenance.
4. Activate the master domain and activate the host.

connectStoragePool should not be run until createStoragePool is finished. Sounds like infra should add the createStoragePool to the queue, and then connect would hold until create is finished. Michael, thoughts about this?

Agree, or there should be a new event, or we should prevent activation of the host until creation of the pool (AddStoragePoolWithStoragesCommand) has finished. By the way, auto-recovery succeeds in activating the host after that.

Also, the auto-recovery will activate the host after 3 minutes and it will probably be up.

Isn't it better to succeed adding and installing the host (which is not a quick operation) first, and then the system will recover for us from an intermediate state while we create the pool?

(In reply to Roy Golan from comment #5)
> Also the auto-recovery will activate the host after 3 minutes and it will
> probably be up.
>
> Isn't it better to succeed adding and installing the host (which is not a
> quick operation) first and then the system will recover for us from an
> intermediate state while we create the pool?

Agree

Per comments #4, #5 and #6, moving this bug to CLOSED NOTABUG.
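
The suggestion above, to hold connectStoragePool until createStoragePool has finished, can be illustrated with a minimal sketch. This is not engine code: `StoragePoolGate`, its methods, and the host name `hsm-host` are hypothetical names chosen for illustration, and the real engine is written in Java. The sketch only shows the ordering idea, using a `threading.Event` that the create step sets and the connect step waits on.

```python
import threading
import time


class StoragePoolGate:
    """Hypothetical gate: connect requests wait until pool creation is done."""

    def __init__(self):
        # Set once the (simulated) createStoragePool flow has completed.
        self._pool_created = threading.Event()

    def create_storage_pool(self, work_seconds=0.0):
        # Stand-in for the real creation work, which is not modeled here.
        time.sleep(work_seconds)
        self._pool_created.set()

    def connect_storage_pool(self, host, timeout=None):
        # Block until creation has finished, or give up after `timeout` seconds.
        if not self._pool_created.wait(timeout):
            return f"{host}: deferred, pool not created yet"
        return f"{host}: connected"


def demo():
    """Activate a host while the pool is still being created."""
    gate = StoragePoolGate()
    results = []
    connector = threading.Thread(
        target=lambda: results.append(
            gate.connect_storage_pool("hsm-host", timeout=5)
        )
    )
    connector.start()                            # host activation starts first
    gate.create_storage_pool(work_seconds=0.05)  # creation finishes later
    connector.join()
    return results
```

With a gate like this the connect call is only delayed, never sent early. The trade-off raised in the thread still applies: auto-recovery reaches the same end state without blocking host activation.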
Created attachment 708966
logs

Description of problem:
I have two hosts: one was in maintenance and the other was up. I created a new pool, and while the pool was being created I activated the second host. The engine sends ConnectStoragePool to vdsm although the master domain is not yet created, which causes the HSM host to become non-operational.

Version-Release number of selected component (if applicable):
sf10
vdsm-4.10.2-11.0.el6ev.x86_64

How reproducible:
Looks like a race.

Steps to Reproduce:
1. Create a new pool with one host active and one in maintenance.
2. Activate the second host during createStoragePool.

Actual results:
The engine already has the master domain in the DB, but the domain has not yet finished being created. When we add the second host, the engine sends connectStoragePool with the master domain UUID from the DB; vdsm on the HSM host fails with a "no master domain" error, and the host becomes non-operational.

Expected results:
The host should not become non-operational (maybe by creating a temporary flag for the new master domain in the DB before we actually know it was created and active).

Additional info:
logs

2013-03-12 02:27:34,127 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-3-thread-46) START, ConnectStoragePoolVDSCommand(HostName = gold-vdsd, HostId = 83834e1f-9e60-41b5-a9cc-16460a8a2fe2, storagePoolId = 5e1e9d7a-ba64-48cd-84b1-a7e3e67829b7, vds_spm_id = 2, masterDomainId = b37a4aef-09dd-4a4e-ae1e-a3bdb12c4ba5, masterVersion = 1), log id: 4ddc0a3f
2013-03-12 02:27:37,690 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-46) Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=5e1e9d7a-ba64-48cd-84b1-a7e3e67829b7, msdUUID=b37a4aef-09dd-4a4e-ae1e-a3bdb12c4ba5'
2013-03-12 02:27:37,713 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-20) [10904fbb] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: 83834e1f-9e60-41b5-a9cc-16460a8a2fe2 Type: VDS
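
The "temporary flag" idea from the expected results could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `DomainStatus` values, the `on_host_activation` function, and the returned action names are hypothetical and do not correspond to actual engine classes or database columns. It only shows the decision the reporter describes: do not send connectStoragePool while the master domain is recorded but not yet confirmed as created.

```python
from enum import Enum


class DomainStatus(Enum):
    # Hypothetical states for a master domain row in the engine DB.
    PENDING = "pending"  # recorded in the DB, creation not yet confirmed
    ACTIVE = "active"    # creation finished and the domain is active


def on_host_activation(master_domain_status):
    """Pick the action for a host that is activated in this pool."""
    if master_domain_status is DomainStatus.PENDING:
        # Sending connectStoragePool now would fail with "no master domain".
        return "postpone_connect"
    return "send_connect_storage_pool"
```

Under this sketch, a host activated during pool creation would be left waiting rather than moved to non-operational.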