Description of problem:

Could be split between virt and infra; targeting infra for the time being.

Experienced only during recovery, and only while tinkering with containers
(https://github.com/mojaves/ovirt-site/blob/container-support/source/feature/container-support.html.md).
I believe this is triggered by recovery being much faster with containers; I am
not sure it is a real risk with VMs.

We need to wait for the JSONRPC binding to be available before starting
recovery, otherwise VM recovery will fail at the final step because:

Thread-12::DEBUG::2016-04-05 10:11:22,733::__init__::207::jsonrpc.Notification::(emit) Sending event {"params": {"notify_time": 4376331650, "dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5": {"status": "Up", "displayInfo": [{"tlsPort": "-1", "ipAddress": "0", "type": "vnc", "port": "-1"}], "hash": "-8241192262902672289", "cpuUser": "0.00", "displayIp": "0", "monitorResponse": "0", "elapsedTime": "630", "displayType": "vnc", "cpuSys": "0.00", "displayPort": "-1", "displaySecurePort": "-1", "timeOffset": "0", "clientIp": ""}}, "jsonrpc": "2.0", "method": "|virt|VM_status|dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5"}
Thread-12::ERROR::2016-04-05 10:11:22,733::vm::759::virt.vm::(_startUnderlyingVm) vmId=`dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 740, in _startUnderlyingVm
    self.send_status_event(**self._getRunningVmStats())
  File "/usr/share/vdsm/virt/vm.py", line 349, in send_status_event
    self._notify('VM_status', stats)
  File "/usr/share/vdsm/virt/vm.py", line 353, in _notify
    self.cif.notify(sub_id, **{self.id: params})
  File "/usr/share/vdsm/clientIF.py", line 160, in notify
    notification.emit(**kwargs)
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 208, in emit
    self._cb(notification)
  File "/usr/share/vdsm/clientIF.py", line 163, in _send_notification
    self.bindings['jsonrpc'].reactor.server.send(message,
KeyError: 'jsonrpc'

The binding will be booted only later:

MainThread::DEBUG::2016-04-05 10:11:22,835::protocoldetector::206::vds.MultiProtocolAcceptor::(add_detector) Adding detector <yajsonrpc.stompreactor.StompDetector instance at 0x1696830>
MainThread::DEBUG::2016-04-05 10:11:22,836::executor::69::Executor::(start) Starting executor
MainThread::DEBUG::2016-04-05 10:11:22,837::executor::161::Executor::(__init__) Starting worker jsonrpc.Executor/0
MainThread::DEBUG::2016-04-05 10:11:22,837::executor::161::Executor::(__init__) Starting worker jsonrpc.Executor/1

Version-Release number of selected component (if applicable):
Vdsm master

How reproducible:
I've only seen this while using containers.

Steps to Reproduce:
Hopefully there are shorter ways, but this is how I do it:
1. Patch Vdsm to run containers: https://github.com/mojaves/ovirt-site/blob/container-support/source/feature/container-support.html.md#how-to-try-the-feature
2. Run one or more containers
3. Restart Vdsm

Actual results:
Vdsm fails to recover, with the stack trace above.

Expected results:
Vdsm recovers successfully.

Additional info:
Please check the full Vdsm log attached.
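For clarity, the crash comes from _send_notification unconditionally indexing
self.bindings['jsonrpc'] while that binding has not been registered yet. Below
is a minimal, hypothetical sketch (class and parameter names such as
ClientIFSketch and destination are mine, not VDSM's actual code) of one
defensive option: skip the status event while the binding is missing instead of
letting the KeyError abort _startUnderlyingVm. The real fix may instead be to
delay recovery until the bindings are up, as suggested above.

import logging

log = logging.getLogger("vds")


class ClientIFSketch(object):
    def __init__(self):
        # The bindings dict is populated only when the JSON-RPC acceptor is
        # started, which during recovery happens after the VMs are recovered.
        self.bindings = {}

    def _send_notification(self, message, destination):
        binding = self.bindings.get('jsonrpc')
        if binding is None:
            # Hypothetical guard: drop the event rather than raise KeyError
            # while the 'jsonrpc' binding is not registered yet.
            log.warning("jsonrpc binding not available, dropping event: %r",
                        message)
            return
        binding.reactor.server.send(message, destination)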
Created attachment 1143709 [details] full vdsm log attempting container recovery
Francesco, trying to understand the impact: is it only that the event gets lost, or are there additional issues?
(In reply to Oved Ourfali from comment #2)
> Francesco, trying to understand the impact: is it only that the event gets
> lost, or are there additional issues?

Should we hit this bug, the recovery of VMs fails altogether:

Thread-12::INFO::2016-04-05 10:11:22,804::vm::1291::virt.vm::(setDownStatus) vmId=`dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5`::Changed state to Down: 'jsonrpc' (code=1)
Thread-12::INFO::2016-04-05 10:11:22,805::guestagent::345::virt.vm::(stop) vmId=`dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5`::Stopping connection
Thread-12::DEBUG::2016-04-05 10:11:22,805::vmchannels::234::vds::(unregister) Delete fileno 24 from listener.
Thread-12::DEBUG::2016-04-05 10:11:22,806::vmchannels::66::vds::(_unregister_fd) Failed to unregister FD from epoll (ENOENT): 24
Thread-12::DEBUG::2016-04-05 10:11:22,833::__init__::207::jsonrpc.Notification::(emit) Sending event {"params": {"notify_time": 4376331750, "dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5": {"status": "Down", "timeOffset": "0", "exitReason": 1, "exitMessage": "'jsonrpc'", "exitCode": 1}}, "jsonrpc": "2.0", "method": "|virt|VM_status|dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5"}
Thread-12::ERROR::2016-04-05 10:11:22,833::utils::374::root::(wrapper) Unhandled exception

And the VMs are thus destroyed by Engine:

jsonrpc.Executor/7::DEBUG::2016-04-05 10:11:37,813::__init__::511::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.destroy' in bridge with {u'vmID': u'dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5'}
jsonrpc.Executor/7::DEBUG::2016-04-05 10:11:37,815::API::310::vds::(destroy) About to destroy VM dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5
jsonrpc.Executor/7::DEBUG::2016-04-05 10:11:37,816::vm::3960::virt.vm::(destroy) vmId=`dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5`::destroy Called
jsonrpc.Executor/7::INFO::2016-04-05 10:11:37,816::vm::3875::virt.vm::(releaseVm) vmId=`dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5`::Release VM resources
jsonrpc.Executor/7::WARNING::2016-04-05 10:11:37,816::vm::335::virt.vm::(_set_lastStatus) vmId=`dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5`::trying to set state to Powering down when already Down
jsonrpc.Executor/7::INFO::2016-04-05 10:11:37,817::guestagent::345::virt.vm::(stop) vmId=`dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5`::Stopping connection
jsonrpc.Executor/7::INFO::2016-04-05 10:11:37,817::vm::3907::virt.vm::(_destroyVm) vmId=`dd38b452-0a9a-4e9e-a349-2f4ca73cc8c5`::_destroyVmGraceful attempt #0

On the other hand, I'd like to remark that I have never encountered this misbehaviour without my experimental container patches: I have never seen it happen on plain Vdsm master.
Closed due to capacity. If it still reproduces, please reopen.