| Summary: | vdsm reports that the storage domain is active, when in fact it's missing a link to it | | |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Natalie Gavrielov <ngavrilo> |
| Component: | General | Assignee: | Idan Shaby <ishaby> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.17.8 | CC: | amureini, bugs, danken, laravot, ngavrilo, nsoffer, sbonazzo, tnisan, ylavi |
| Target Milestone: | ovirt-3.6.3 | Flags: | rule-engine: ovirt-3.6.z+, rule-engine: exception+, ylavi: planning_ack+, tnisan: devel_ack+, rule-engine: testing_ack+ |
| Target Release: | 4.17.20 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-03-10 12:49:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 1227665, 1269982 | | |
| Attachments: | | | |
Description
Natalie Gavrielov
2015-10-14 16:18:45 UTC
*** Bug 1271772 has been marked as a duplicate of this bug. ***

*** Bug 1271773 has been marked as a duplicate of this bug. ***

*** Bug 1271775 has been marked as a duplicate of this bug. ***

The unfetched domains errors look unrelated, but I agree that after unblocking the storage, you should be able to create a disk. Nir, can you take a look, please?

Does putting the domain in maintenance and activating it fix the problem?

Created attachment 1084327 [details]
vdsm.log, engine.log

Yes, it seems to solve the problem.
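The symptom reported here is a domain that vdsm considers active while its symlink under /rhev/data-center is absent. As a rough illustration only (this is not vdsm code; the function name, the pool UUID, and the domain UUIDs are all placeholders), a check for that inconsistency could look like:

```python
import os

def missing_domain_links(sp_uuid, active_domains, root="/rhev/data-center"):
    """Return the active domain UUIDs that have no symlink under
    /rhev/data-center/<sp_uuid> - the mismatch this bug describes."""
    pool_dir = os.path.join(root, sp_uuid)
    return [sd for sd in active_domains
            if not os.path.islink(os.path.join(pool_dir, sd))]
```

Per the workaround confirmed above, putting the affected domain into maintenance and activating it again recreates the missing link.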
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

(In reply to Natalie Gavrielov from comment #6)
> Created attachment 1084327 [details]
> vdsm.log, engine.log
>
> Yes, it seems to solve the problem.

Lowering the severity, as this has an easy workaround.

Liron, I remember that you worked on this in the past - maybe this is a duplicate?

Natalie, please note that you didn't attach the logs from the SPM host that the task was executed on; please attach them.

Nir, I didn't work on that issue, but I've found out what happened. First, this bug is a duplicate of bug https://bugzilla.redhat.com/show_bug.cgi?id=1091030. BZ 1093924 was opened as a clone of BZ 1091030 to handle the scenario of hosts that are activated when there is an unreachable domain (as I understand it). Later on, during the work and reviews on the BZ 1091030 patches, it was decided that the solution would be merged to the ovirt-3.4 branch only (https://gerrit.ovirt.org/#/c/27466/), while the patch for 3.5 was abandoned (https://gerrit.ovirt.org/#/c/27334/), stating that it would also be fixed on the engine side in BZ 1093924. But that was never clarified on BZ 1093924, so the scenario described in this bug should be relevant for versions >= 3.5. We can take https://gerrit.ovirt.org/#/c/27466/ and use it - what was the reasoning against using this vdsm-side solution on all versions?

Dan, can you explain why the patch merged in 3.4 was not merged into master? See comment 10 for the details.

In oVirt, testing is done on a single release by default. Therefore I'm removing the 4.0 flag. If you think this bug must be tested in 4.0 as well, please re-add the flag. Please note we might not have the testing resources to handle the 4.0 clone.

Liron, the SPM was on host aqua-vds4. The attachments include its vdsm logs.
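The vdsm log excerpts quoted in the next comment carry the SPM status inside a task-finished line. A minimal sketch (not vdsm code; `spm_status` is a hypothetical helper, assuming only the `Thread-N::DEBUG::...Task=\`uuid\`::finished:` format shown in those excerpts) for pulling the status out of such a line:

```python
import re

# Matches task-finished lines like:
# ...Task=`0ef3b007-...`::finished: {'spm_st': {'spmId': 1, 'spmStatus': 'SPM', 'spmLver': 4L}}
SPM_RE = re.compile(
    r"Task=`(?P<task>[0-9a-f-]+)`::finished:.*'spmStatus': '(?P<status>\w+)'")

def spm_status(line):
    """Return (task_id, spm_status) if the line reports SPM state, else None."""
    m = SPM_RE.search(line)
    return (m.group("task"), m.group("status")) if m else None
```

A regex is used rather than evaluating the dict, because the `4L` long literal in the logs is Python 2 syntax and will not parse under Python 3.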
From the first attachment (2015-10-14), file /aqua-vds4/vdsm.log.1.xz:

Thread-832::DEBUG::2015-10-14 16:51:03,753::task::1191::Storage.TaskManager.Task::(prepare) Task=`0ef3b007-1f51-47c9-9306-f352011c7ca4`::finished: {'spm_st': {'spmId': 1, 'spmStatus': 'SPM', 'spmLver': 4L}}

From the second attachment (2015-10-19), file /aqua-vds4/vdsm.log:

Thread-183127::DEBUG::2015-10-19 12:33:46,941::task::1191::Storage.TaskManager.Task::(prepare) Task=`17bd7549-ba0b-490b-a588-13f98a0e3983`::finished: {'spm_st': {'spmId': 1, 'spmStatus': 'SPM', 'spmLver': 4L}}

(In reply to Nir Soffer from comment #11)
> Dan, can you explain why the patch merged in 3.4 was not merged into master?
>
> See comment 10 for the details.

So what's the resolution here? Is this patch missing from master by sheer mistake?

(In reply to Allon Mureinik from comment #14)
> (In reply to Nir Soffer from comment #11)
> > Dan, can you explain why the patch merged in 3.4 was not merged into master?
> >
> > See comment 10 for the details.
>
> So what's the resolution here?
> Is this patch missing from master by sheer mistake?

Dan should explain why it is missing from master. I think we should use the same patch. After fixing master, 3.6 and 3.5, we can think about a better solution for 4.0 if needed.

I am afraid that I fail to recall the details, beyond my then-understanding that solving bug 1093924 would make the vdsm-side hack (with its slow sdCache.produce() and raceful-by-design response to events) redundant. Apparently it did not.

Idan, note comment #10 by Liron when you start to fix this bug.

Instructions for testing:
1. Add a host with two storage domains - iscsi domain A and file domain B.
2. Take the host down to maintenance.
3. Block the host from connecting to domain A/B (two scenarios).
4. Activate the host and wait until it becomes the SPM.
5. Unblock the host from connecting to the domain.
6. Watch the missing link being added under /rhev/data-center/SPUUID.
7. Add a disk on that domain and watch it be added successfully.

Followed the scenario described in comment #18 for both scenarios of step #3 (block and file). For both scenarios, the symlink under /rhev/data-center was created after connectivity to the storage resumed, and for both, the image was created successfully once the domain became accessible.

Verified using:
rhevm-3.6.3.2-0.1.el6.noarch
vdsm-4.17.21-0.el7ev.noarch
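Step 6 of the test instructions ("watch the missing link being added") can be automated in a verification harness. A sketch only, assuming the link appears at /rhev/data-center/<spUUID>/<sdUUID> once connectivity resumes (`wait_for_domain_link` is a hypothetical helper, not part of vdsm or the verification above):

```python
import os
import time

def wait_for_domain_link(sp_uuid, sd_uuid, timeout=60, interval=2,
                         root="/rhev/data-center"):
    """Poll until the domain symlink appears under the pool directory,
    returning True if it shows up before the timeout expires."""
    link = os.path.join(root, sp_uuid, sd_uuid)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.islink(link):
            return True
        time.sleep(interval)
    return os.path.islink(link)
```

Polling is the simple-minded approach; it matches the manual "watch the link appear" step without depending on any vdsm event mechanism.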