Bug 869258 - [Engine] - error message "VdcBLLException: null" when engine tries to run VM and operation is failing due to network timeout exception
Summary: [Engine] - error message "VdcBLLException: null" when engine tries to run VM and operation is failing due to network timeout exception
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Michal Skrivanek
QA Contact: Pavel Stehlik
URL:
Whiteboard: virt
Depends On:
Blocks: 915537
 
Reported: 2012-10-23 12:23 UTC by David Botzer
Modified: 2014-01-14 00:04 UTC
CC: 12 users

Fixed In Version: sf2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments
engine (219.15 KB, application/x-gzip), 2012-10-23 12:23 UTC, David Botzer

Description David Botzer 2012-10-23 12:23:28 UTC
Created attachment 632063 [details]
engine

Description of problem:
Error message "VdcBLLException: null" appears when libvirt & vdsm are in a deadlock.

Version-Release number of selected component (if applicable):
3.1/si21

How reproducible:
always

Steps to Reproduce:
1. Installed rhevm+dwh+reports
2. Created a setup - DC, host, storage (iSCSI), VMs
3. Changed the engine & vds date to 31/12/12 (see the example command below)
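
For reference, the date change in step 3 can be done with a standard date(1) call (the exact command was not recorded in the report; this is one plausible way, run as root on both the engine machine and the host):

date -s '2012-12-31 12:00:00'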
  
Actual results:
All VMs show "Not Responding"
Host CPU jumps to 99%
libvirt is in a deadlock, trying to destroy the qemu process while it cannot communicate with the process's monitor socket

Expected results:
The engine log should report the underlying exception (the network timeout) instead of "VdcBLLException: null"

Additional info:
logs

Comment 1 David Botzer 2012-10-23 12:35:47 UTC
The problem is that when the engine tries to connect to vdsm while vdsm is in a deadlock, there is a timeout.
But the engine also logs an exception with a NULL message, which it shouldn't.

error message "VdcBLLException: null" when engine tries to run VM and operation is failing due to network timeout exception

Comment 2 Itamar Heim 2012-10-24 05:08:52 UTC
I'd also rename VdcBllException...

Comment 3 Andrew Cathrow 2012-12-03 11:36:08 UTC
Is there a BZ for the deadlock, which is more important than the error message that we can clean up in 3.2?

Comment 4 Haim 2012-12-04 07:37:41 UTC
(In reply to comment #3)
> Is there a BZ for the deadlock, which is more important than the error
> message that we can clean up in 3.2?

The deadlock originated in libvirt and was already solved in libvirt-0.9.10-21.el6_3.6.x86_64.

Comment 5 Roy Golan 2012-12-06 15:27:07 UTC
Better error reporting was merged [1], so now you should see something like:

2012-12-06 17:23:14,424 ERROR [org.ovirt.engine.core.bll.StopVmCommand] (pool-10-thread-48) [7c2d1e0a] Command org.ovirt.engine.core.bll.StopVmCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.util.concurrent.TimeoutException

[1] http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=6b7ae33a8ef9682adf00bf2495487d3617ffc99b
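
As a rough illustration of the pattern behind this output (hypothetical class names; the real change is in the commit above), chaining the cause into the superclass constructor makes getMessage() default to the cause's toString(), so the whole exception chain appears in a single log line instead of "null":

import java.util.concurrent.TimeoutException;

// Hypothetical stand-ins; the real classes live under org.ovirt.engine.core.*.
class VDSNetworkExceptionSketch extends RuntimeException {
    VDSNetworkExceptionSketch(Throwable cause) {
        super(cause); // message defaults to cause.toString()
    }
}

class VdcBLLExceptionSketch extends RuntimeException {
    VdcBLLExceptionSketch(Throwable cause) {
        super(cause);
    }
}

public class ChainedMessageDemo {
    public static void main(String[] args) {
        RuntimeException e = new VdcBLLExceptionSketch(
                new VDSNetworkExceptionSketch(new TimeoutException()));
        // Prints the full chain, e.g.:
        // VdcBLLException: VDSNetworkExceptionSketch: java.util.concurrent.TimeoutException
        System.out.println("VdcBLLException: " + e.getMessage());
    }
}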

Comment 7 Roy Golan 2012-12-30 12:49:57 UTC
To test the change, you can block the host's vdsm port (54321) and verify that the underlying exception is now printed:

host shell:
iptables -I INPUT --proto tcp --dport 54321 -j REJECT

engine:
2012-12-30 14:37:15,282 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-13) ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = ab416c3a-4374-43b1-961b-008897d74b87 : suz, VDS Network Error, continuing.
java.net.NoRouteToHostException: No route to host
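
To restore connectivity after the test, the same rule can be removed again (standard iptables usage; this cleanup step is not part of the original comment):

iptables -D INPUT --proto tcp --dport 54321 -j REJECT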

Comment 9 Libor Spevak 2013-02-24 14:37:29 UTC
Already solved and verified.

Comment 11 Itamar Heim 2013-06-11 08:55:04 UTC
3.2 has been released


