Bug 923227
| Summary: | [engine-backend] VM stays in up state after VDSM crashed and failed to initialize storage | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Elad <ebenahar> | ||||
| Component: | ovirt-engine | Assignee: | Michal Skrivanek <michal.skrivanek> | ||||
| Status: | CLOSED WORKSFORME | QA Contact: | Elad <ebenahar> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.2.0 | CC: | acathrow, dyasny, hateya, iheim, lpeer, mbetak, Rhev-m-bugs, yeylon, ykaul | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | virt | ||||||
| Fixed In Version: | | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-03-27 13:27:31 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | ||||||
Hi, could you please specify at what point you blocked the connection to storage? I tried to reproduce this several times, blocking the connection both during system boot and after the system had started (up to the login screen). In both cases the machine appeared in the engine as paused due to storage problems (after a while, because of the engine refresh delay; until then it was reported as "UP"). The machine even resumed and continued working properly after I restarted iptables and vdsmd (and then reactivated the storage domain in the engine).
My VDSM failed with
Thread-22::ERROR::2013-03-27 12:30:28,724::domainMonitor::223::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain a7e5f59c-2877-475b-8afc-f760ba63defb monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 200, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/nfsSD.py", line 108, in selftest
    fileSD.FileStorageDomain.selftest(self)
  File "/usr/share/vdsm/storage/fileSD.py", line 481, in selftest
    self.oop.os.statvfs(self.domaindir)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 275, in callCrabRPCFunction
    *args, **kwargs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 180, in callCrabRPCFunction
    rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 146, in _recvAll
    raise Timeout()
Timeout
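The Timeout above is raised while reading a length-prefixed reply from the remote file handler process that never arrives once storage is blocked. As a hedged illustration of that receive-with-deadline pattern (not VDSM's actual remoteFileHandler code; names here are my own), a minimal `recv_all` might look like:

```python
import select
import socket


class Timeout(Exception):
    """Raised when the peer does not deliver the full message in time."""
    pass


def recv_all(sock, length, timeout):
    """Read exactly `length` bytes from `sock`, raising Timeout if the
    peer stalls. Sketch of the pattern behind _recvAll in the traceback
    above; the real VDSM implementation differs in detail."""
    data = b""
    while len(data) < length:
        # Wait up to `timeout` seconds for the socket to become readable.
        ready, _, _ = select.select([sock], [], [], timeout)
        if not ready:
            raise Timeout()
        chunk = sock.recv(length - len(data))
        if not chunk:
            # Peer closed before the full message arrived.
            raise Timeout()
        data += chunk
    return data
```

When iptables REJECTs traffic to the storage server, the helper never answers, select() times out, and the Timeout propagates up to the domain monitor, matching the log above.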
and shortly after that
MainThread::ERROR::2013-03-27 12:31:25,399::misc::173::Storage.Misc::(panic) Panic: Couldn't connect to supervdsm
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsm.py", line 195, in launch
    utils.retry(self._connect, Exception, timeout=60, tries=3)
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 934, in retry
    return func()
  File "/usr/share/vdsm/supervdsm.py", line 181, in _connect
    self._manager.connect()
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 474, in connect
    conn = Client(self._address, authkey=self._authkey)
  File "/usr/lib64/python2.6/multiprocessing/connection.py", line 143, in Client
    c = SocketClient(address)
  File "/usr/lib64/python2.6/multiprocessing/connection.py", line 263, in SocketClient
    s.connect(address)
  File "<string>", line 1, in connect
error: [Errno 2] No such file or directory
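The `utils.retry(self._connect, Exception, timeout=60, tries=3)` call in the traceback above retries the supervdsm connection a few times before `panic` gives up. As a hedged sketch of that retry pattern (not VDSM's actual `vdsm.utils.retry`, which also enforces a `timeout` budget; `FlakyConnector` is a made-up stand-in):

```python
import time


def retry(func, expected_exception=Exception, tries=3, delay=1):
    """Call func() up to `tries` times, sleeping `delay` seconds
    between attempts; re-raise the last exception if all attempts fail.
    Illustrative only -- the real vdsm.utils.retry has more options."""
    for attempt in range(1, tries + 1):
        try:
            return func()
        except expected_exception:
            if attempt == tries:
                raise
            time.sleep(delay)


class FlakyConnector:
    """Fails twice, then succeeds -- simulates a socket that is briefly
    unavailable while supervdsm restarts."""
    def __init__(self):
        self.calls = 0

    def connect(self):
        self.calls += 1
        if self.calls < 3:
            raise OSError("No such file or directory")
        return "connected"
```

If every attempt fails, as in the log above (the supervdsm socket file never appears), the final exception propagates and triggers the `Panic: Couldn't connect to supervdsm` message.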
Could you also specify exactly how you blocked the connection to storage? Mine was:
# iptables -A OUTPUT -d 10.34.63.204 -j REJECT
Thank you
Hi Martin, I also wasn't able to reproduce this. I'm closing the bug for now.
Created attachment 712652 [details] vdsm+rhevm logs

Description of problem:
The VM status is reported as Up although its host crashed. It happened to me when I blocked the connection between VDSM and the storage domain.

Version-Release number of selected component (if applicable):
rhevm-backend-3.2.0-10.14.beta1.el6ev.noarch
vdsm-4.10.2-11.0.el6ev.x86_64
libvirt-0.10.2-18.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Have one host and one iSCSI domain
2. Run a VM with one or more disks
3. Block the connection between the host and the domain using iptables
4. VDSM will try to initialize the connection to the domain and will fail; the host will then become Non Operational

Actual results:
The engine reports that the VM is Up although it is down.

Expected results:
The engine should report the VM status as Unknown.

Additional info:
on VDSM:
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 395, in _recoverExistingVms
    not self.irs.getConnectedStoragePoolsList()['poollist']:
AttributeError: 'NoneType' object has no attribute 'getConnectedStoragePoolsList'
VM Channels Listener::DEBUG::2013-03-19 14:23:06,655::vmChannels::60::vds::(_handle_timeouts) Timeout on fileno 17.

on RHEVM:
2013-03-19 15:02:23,533 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-96) [79788000] Command GetCapabilitiesVDS execution failed. Error: VDSRecoveringException: Failed to initialize storage

See the attached VDSM and RHEVM logs.
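The AttributeError in the VDSM log comes from `_recoverExistingVms` dereferencing `self.irs` while the storage subsystem is still uninitialized, so `irs` is None. A hedged sketch of the kind of guard that avoids the crash (function and class names here are illustrative, not the actual VDSM fix):

```python
def connected_pools(irs):
    """Return the list of connected storage pools, or None when the
    storage interface (irs) has not been initialized yet.

    Illustrative sketch: treating 'storage not ready' as a recoverable
    state lets the caller report VM status as Unknown instead of
    crashing with AttributeError during VM recovery.
    """
    if irs is None:
        return None  # storage not initialized; defer recovery
    return irs.getConnectedStoragePoolsList()['poollist']


class FakeIRS:
    """Minimal stand-in for the storage interface, for illustration."""
    def getConnectedStoragePoolsList(self):
        return {'poollist': ['a7e5f59c-2877-475b-8afc-f760ba63defb']}
```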