Bug 979553
Summary: | remoteFileHandler.py timeout causes host to go non-operational | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | wdaniel | ||||
Component: | vdsm | Assignee: | Saggi Mizrahi <smizrahi> | ||||
Status: | CLOSED NOTABUG | QA Contact: | Barak Dagan <bdagan> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 3.2.0 | CC: | acathrow, bazulay, benglish, hateya, iheim, jkt, lpeer, pstehlik, wdaniel | ||||
Target Milestone: | --- | Keywords: | Triaged | ||||
Target Release: | 3.3.0 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | infra | ||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2013-09-12 10:15:08 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
Description
wdaniel
2013-06-28 19:56:27 UTC
Please attach an SOS report.

Created attachment 767917
sosreport 6/14
Everything seems to be working as expected. If the host has problems communicating with the NFS server, things will obviously start to break. We need to figure out why the NFS server is not responding and fix that. As long as VDSM can't read the storage metadata, it can't connect to the pool.

Just to be clear: the host attempts to connect to a storagePool, but since it cannot read the master domain (the read timed out), it fails to connect to the pool. At this point the engine moves the host to non-operational (this is standard), because this host is required to be connected to this pool. So this happens due to the host's inability to read the master domain. In the standard flow, the host recovery mechanism kicks in every 5 minutes and tries to reconnect to the storagePool.

So a few questions:
- Does it happen on different hosts? (It looks like yes.)
- Does it always happen with the same NFS server?
- Did you try to access the NFS server manually from the host after it happened?
- Did you try restarting VDSM instead of rebooting? If that fixes the problem, it may indicate a VDSM problem.
- How long did you wait before rebooting (to see whether host recovery kicked in and the presumed NFS problem went away)?

Basically, it looks like a connectivity problem to the NFS server, and this can happen for many reasons.
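To make the failure mode described above concrete, here is a minimal Python sketch of the two behaviours involved: reading a metadata file with a timeout, and retrying the pool connection on a fixed interval. Everything in it is a hypothetical illustration. `read_with_timeout`, `connect_with_recovery` and the retry parameters are assumptions, not code from VDSM's remoteFileHandler.py or from the actual host recovery mechanism. The read runs in a child process so that a hung NFS read blocks the child rather than the caller, which also makes it usable as a rough manual check of whether a file on the NFS mount is readable from the host.

```python
import subprocess
import sys
import time
from typing import Callable

# Hypothetical helper, not VDSM's API: a tiny child-process program that
# copies the file named in argv[1] to stdout.
_READER = "import sys; sys.stdout.buffer.write(open(sys.argv[1], 'rb').read())"


def read_with_timeout(path: str, timeout: float) -> bytes:
    """Read `path` in a child process; raise TimeoutError if it does not finish in time."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _READER, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # A child stuck in uninterruptible NFS I/O may not die right away,
        # so we signal it but deliberately do not wait for it here.
        proc.kill()
        raise TimeoutError(f"reading {path} timed out after {timeout}s") from None
    if proc.returncode != 0:
        raise OSError(err.decode(errors="replace").strip())
    return out


def connect_with_recovery(
    connect: Callable[[], None],
    interval: float = 300.0,  # assumed: the "every 5 minutes" mentioned above
    max_attempts: int = 3,    # assumed cap, only so the sketch terminates
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry `connect` until it succeeds; return the attempt number that worked."""
    for attempt in range(1, max_attempts + 1):
        try:
            connect()
            return attempt
        except OSError as exc:  # TimeoutError is a subclass of OSError
            print(f"attempt {attempt} failed: {exc}; host stays non-operational")
            if attempt < max_attempts:
                sleep(interval)
    raise RuntimeError("storage pool still unreachable after all attempts")
```

For a manual check from the host, a call such as `read_with_timeout("/path/on/nfs/mount/metadata", 10.0)` (placeholder path) would raise `TimeoutError` if the mount is hung and return the file contents otherwise. The `sleep` parameter is injectable only so the retry loop can be exercised without actually waiting 5 minutes.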