| Summary: | Reconstruct fails and host fails to activate when blocking connection from host to nfs server | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Liron Aravot <laravot> | ||||
| Component: | BLL.Storage | Assignee: | Liron Aravot <laravot> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Lilach Zitnitski <lzitnits> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 4.1.0 | CC: | amureini, bugs, lzitnits, ratamir, tnisan | ||||
| Target Milestone: | ovirt-4.1.0-alpha | Keywords: | Regression | ||||
| Target Release: | --- | Flags: | rule-engine: ovirt-4.1+, rule-engine: blocker+, rule-engine: planning_ack+, amureini: devel_ack+, ratamir: testing_ack+ | ||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | storage | ||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | 1397861 | ||||||
| : | 1455273 (view as bug list) | Environment: | |||||
| Last Closed: | 2017-02-15 15:00:23 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1455273 | ||||||
| Attachments: | ||||||
|
Description
Liron Aravot
2016-11-27 17:58:49 UTC
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Retargeting to 4.1. While the original bug 1397861 is still under investigation, V4 was only introduced in oVirt 4.1, so this is clearly not a zstream candidate.
The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If it is not included, please move the bug back to MODIFIED.
--------------------------------------
Tested with the following code:
----------------------------------------
ovirt-engine-4.1.0-0.2.master.20161203231307.gitd7d920b.el7.centos.noarch
vdsm-4.18.999-1162.gite95442e.el7.centos.x86_64
Tested with the following scenario:
Steps to Reproduce:
1. Make sure the master domain is NFS (for an unknown reason, this happens only with an NFS master)
2. Block the connection from the host to the master domain
Actual results:
The whole environment is down: the host fails to come back up and the master cannot be reconstructed.
Expected results:
The host should come back up after a few minutes, the master should be reconstructed, and the DC should be active.
Additional info:
When reconstruct is executed, it succeeds; the issue here is that the reconstruct is not even executed. The host becomes non-responsive and fails to activate.
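The blocking step in the reproduction above can be sketched as follows. The verification comments mention iptables, but the exact rule used is not recorded in this report, so the `OUTPUT`/`DROP` rule, the helper name `build_block_commands`, and the address `192.0.2.10` are assumptions for illustration only. The sketch only builds the commands; it does not run them.

```python
import ipaddress


def build_block_commands(nfs_server_ip):
    """Return (block, unblock) iptables argument lists for one server IP.

    Hypothetical helper: the rule shape is an assumption, not taken
    from the bug report.
    """
    # Validate the address so a typo does not produce a malformed rule.
    ip = str(ipaddress.ip_address(nfs_server_ip))
    rule = ["OUTPUT", "-d", ip, "-j", "DROP"]
    block = ["iptables", "-A"] + rule    # append a DROP rule for outgoing traffic
    unblock = ["iptables", "-D"] + rule  # delete the same rule to restore access
    return block, unblock


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-range placeholder, not the real NFS server.
    block, unblock = build_block_commands("192.0.2.10")
    print(" ".join(block))
    print(" ".join(unblock))
```

Running the block command as root on the host would presumably reproduce the "block connection" step, and the unblock command restores connectivity afterwards.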
engine.log
2016-12-14 08:57:30,382+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-35) [1d7f1b29] Correlation ID: 1d7f1b29, Call Stack: null, Custom Event ID: -1, Message: Failed to connect Host blond-vdsf to Storage Servers
vdsm.log
2016-12-14 08:57:00,144 ERROR (monitor/6d7e184) [storage.StorageDomainCache] domain 6d7e184e-fc0b-4e44-baa6-eab7e129426e not found (sdc:157)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 185, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('6d7e184e-fc0b-4e44-baa6-eab7e129426e',)
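When triaging a longer vdsm.log, it may help to list which domains the monitor reported as missing and when. The sketch below is one way to do that. The regular expression is an assumption derived only from the single error line quoted above, not from any vdsm log-format specification, and `missing_domains` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one sample line in this report (an assumption).
NOT_FOUND = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR "
    r".*\[storage\.StorageDomainCache\] domain (?P<uuid>[0-9a-f-]{36}) not found"
)


def missing_domains(lines):
    """Yield (timestamp, domain UUID) for each 'domain ... not found' error."""
    for line in lines:
        match = NOT_FOUND.match(line)
        if match:
            yield match.group("timestamp"), match.group("uuid")


# The sample is the vdsm.log line from this report, split only for line length.
sample = [
    "2016-12-14 08:57:00,144 ERROR (monitor/6d7e184) [storage.StorageDomainCache] "
    "domain 6d7e184e-fc0b-4e44-baa6-eab7e129426e not found (sdc:157)",
]

if __name__ == "__main__":
    for timestamp, uuid in missing_domains(sample):
        print(timestamp, uuid)
```

For a real log, the file object can be passed directly, for example `missing_domains(open("vdsm.log"))`, since the function only iterates over lines.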
Created attachment 1232181
host fails logs
engine.log
vdsm.log
--------------------------------------
Tested with the following code:
----------------------------------------
vdsm-4.18.999-1162.gite95442e.el7.centos.x86_64
ovirt-engine-4.1.0-0.2.master.20161203231307.gitd7d920b.el7.centos.noarch
Tested with the following scenario:
Steps to Reproduce:
1. Block the connection from the host to the master storage domain
Actual results:
The master domain is reconstructed and the DC is active again.
Expected results:
Moving to VERIFIED!
--------------------------------------
Tested with the following code:
----------------------------------------
ovirt-engine-4.1.0-0.2.master.20161203231307.gitd7d920b.el7.centos.noarch
vdsm-4.18.999-1184.git090267e.el7.centos.x86_64
Tested with the following scenario:
Steps to Reproduce:
1. From the hosts, use iptables to block the connection to the storage domain that is currently the master storage domain
2. Wait for another storage domain to become master
Actual results:
A new storage domain becomes master.
Moving to VERIFIED! |