Bug 1294025

Summary: Spice vm console fails because servlet pki-resource is unavailable
Product: Red Hat Enterprise Virtualization Manager Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine Assignee: Martin Perina <mperina>
Status: CLOSED ERRATA QA Contact: Ondra Machacek <omachace>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.5.6 CC: achareka, didi, gklein, gveitmic, joallen, lsurette, mgoldboi, michal.skrivanek, mperina, oourfali, pstehlik, rbalakri, Rhev-m-bugs, sbonazzo, yeylon
Target Milestone: ovirt-3.6.3 Keywords: ZStream
Target Release: 3.6.3   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1302776 Environment:
Last Closed: 2016-03-09 21:15:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1302776    

Description Germano Veit Michel 2015-12-24 05:46:21 UTC
Description of problem:

- Cannot open Spice Console to VM in RHEV-M web GUI.

2015-12-21 13:31:49,001 WARN  [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/ovirt-engine/services].[pki-resource]] (ajp-/127.0.0.1:8702-13) JBWEB000234: Servlet pki-resource is currently unavailable

# wget -O rhevm.cer http://rhevm/ca.crt
Resolving rhevm... A.B.C.D
Connecting to rhevm|A.B.C.D|:80... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2015-12-22 12:06:21 ERROR 503: Service Unavailable.

- It's failing because the pki servlet is unavailable.

How reproducible:
100% (customer site)

Actual results:
Error 500

Expected results:
Open VM Spice Console

Additional info:

The servlet is marked as unavailable because it failed initialization. The initial problem is this:
--
2015-12-19 11:34:56,114 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/ovirt-engine/services].[pki-resource]] (ajp-/127.0.0.1:8702-4) JBWEB000235: Allocate exception for servlet pki-resource: java.lang.NullPointerException
	at org.ovirt.engine.core.common.config.Config.getValue(Config.java:22) [common.jar:]
	at org.ovirt.engine.core.common.config.Config.getValue(Config.java:18) [common.jar:]
	at org.ovirt.engine.core.utils.PKIResources$Resource.<clinit>(PKIResources.java:84) [utils.jar:]
	at org.ovirt.engine.core.services.PKIResourceServlet.<clinit>(PKIResourceServlet.java:26) [classes:]
--

<clinit> is a class's static initializer. If a static initializer throws an exception, the class is unusable for as long as it stays loaded: the JVM marks it as failed, and every later attempt to use it throws NoClassDefFoundError. There is no way to recover from this short of reloading the class.
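To illustrate the static initializer behaviour described above, here is a minimal standalone sketch. The class names (SketchSettings, FragileResource, StaticInitDemo) are hypothetical stand-ins and not the oVirt classes; the point is only that the second attempt still fails even after the missing value has been supplied.

```java
// Hypothetical stand-in for configuration that is not yet available at first use.
class SketchSettings {
    static String value = null;
}

// Hypothetical stand-in for a class whose static initializer reads configuration.
class FragileResource {
    // Evaluated inside <clinit>; throws NullPointerException while value is null.
    static final int LENGTH = SketchSettings.value.length();

    static int length() {
        return LENGTH;
    }
}

class StaticInitDemo {
    // Returns "ok", or the simple name of whatever Throwable the call raised.
    static String attempt() {
        try {
            FragileResource.length();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // First use: <clinit> runs and fails, wrapped in ExceptionInInitializerError.
        System.out.println(attempt());

        // Supplying the value afterwards does not help: <clinit> is never re-run.
        SketchSettings.value = "configured";

        // Second use: the class is already marked as failed (NoClassDefFoundError).
        System.out.println(attempt());
    }
}
```

This matches the symptom in this report, where the servlet stays unavailable after the rest of the engine has come up.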

A NullPointerException in Config.getValue() means that getConfigUtils() returned null, so either Config.setConfigUtils() has not been called or was called with null.

This may be a race condition where configuration is used before Config.setConfigUtils() is called.

The only setConfigUtils() call outside of test code is called from an @Singleton @Startup EJB.
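The ordering problem can be sketched in isolation as follows. SketchConfig and SketchConfigUtils are hypothetical, simplified stand-ins (the real Config class is not shown in this report), and the key and path are made up; the sketch only assumes that getValue() dereferences a static holder without a null check, as the stack trace suggests.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for the object that startup code is expected to install.
class SketchConfigUtils {
    private final Map<String, String> values;

    SketchConfigUtils(Map<String, String> values) {
        this.values = values;
    }

    String get(String key) {
        return values.get(key);
    }
}

// Hypothetical stand-in for a static configuration facade.
class SketchConfig {
    // Stays null until startup code calls setConfigUtils().
    private static volatile SketchConfigUtils configUtils;

    static void setConfigUtils(SketchConfigUtils utils) {
        configUtils = utils;
    }

    static String getValue(String key) {
        // No null check: a caller that arrives too early gets a NullPointerException.
        return configUtils.get(key);
    }
}

class ConfigOrderDemo {
    // Returns the configured value, or "NPE" if the holder was not set yet.
    static String read(String key) {
        try {
            return SketchConfig.getValue(key);
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        // A request that arrives before startup has finished.
        System.out.println(read("CaCertPath"));

        // Startup code finally installs the configuration (made-up key and path).
        SketchConfig.setConfigUtils(new SketchConfigUtils(
                Collections.singletonMap("CaCertPath", "/hypothetical/ca.pem")));

        // The same request now succeeds.
        System.out.println(read("CaCertPath"));
    }
}
```

In this plain sketch the early failure is harmless because the caller can simply retry. It becomes permanent only when the early read happens inside a static initializer.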

So this looks like a simple race condition between a Servlet and an EJB starting up. The best solution is to fix the design, but a simple solution would be to make the servlet depend on the EJB, for example by putting this in PKIResourceServlet:
 @EJB BackendLocal backend;
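
As one possible reading of "fix the design", the configuration lookup could be moved out of the static initializer and performed lazily, so that an early failure is not cached and a later request can retry. The sketch below is only an assumption about what such a change might look like, not the fix that was actually shipped; LazyResource, LazyInitDemo and the loader are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical holder that loads its value on first successful access.
class LazyResource {
    private final Supplier<String> loader;
    private volatile String value;

    LazyResource(Supplier<String> loader) {
        this.loader = loader;
    }

    String get() {
        String v = value;
        if (v == null) {
            synchronized (this) {
                v = value;
                if (v == null) {
                    // If the loader throws, nothing is cached and the next call retries.
                    v = loader.get();
                    value = v;
                }
            }
        }
        return v;
    }
}

class LazyInitDemo {
    // Stand-in for state that a startup component sets once it is ready.
    static volatile String backend = null;

    // Returns the loaded value, or "unavailable" if configuration is not ready yet.
    static String tryGet(LazyResource resource) {
        try {
            return resource.get();
        } catch (IllegalStateException e) {
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        LazyResource resource = new LazyResource(() -> {
            if (backend == null) {
                throw new IllegalStateException("configuration not ready");
            }
            return "cert from " + backend;
        });

        // Early request: fails, but only this one request is affected.
        System.out.println(tryGet(resource));

        // Startup completes.
        backend = "engine";

        // Later request: succeeds without restarting anything.
        System.out.println(tryGet(resource));
    }
}
```

Compared with the @EJB dependency, this trades a guaranteed startup order for the ability to recover from an early request.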

Comment 4 Michal Skrivanek 2016-01-15 11:34:15 UTC
Something wrong with the deployment, perhaps? I suppose http://rhevm/ca.crt should always work.
Oved, PKI issues are your area, moving to infra.

Comment 5 Oved Ourfali 2016-01-15 14:07:56 UTC
If that's in deployment, then it is integration. Didi, can you take a look? Move back to infra if needed.

Comment 6 Yedidyah Bar David 2016-01-17 09:58:37 UTC
Can't see an integration problem.

If ca.crt is unreadable, that's noted in server.log, and current log does not mention that.

Comment 13 Martin Perina 2016-01-19 15:38:11 UTC
Hi Germano,

Your initial analysis was correct: if you try to access PKIResourceServlet before the Backend EJB is initialized, PKIResourceServlet is inaccessible until the engine instance is restarted. I was able to reproduce it even on the latest master using these steps:

1. Start engine service

2. At the same time as step 1, execute the following script to access PKIResourceServlet as soon as it is available:

  for i in {1..8192} ; do wget -O rhevm.cer http://localhost/ca.crt ; done

3. Even after the engine has started up successfully, PKIResourceServlet is inaccessible

Until a proper fix is posted, the only workaround is either "don't access the engine until it's properly started" or to block HTTP access completely by:

  1. Stop Apache service
       - so nothing can access engine

  2. Start ovirt-engine service and wait until it's properly started

  3. Start Apache service

Thanks

Martin

Comment 14 Germano Veit Michel 2016-01-20 00:32:10 UTC
Yedidyah Bar David,

Sorry for not responding to your questions right away.

1) I suppose it's still failing every single time he tries it. It did fail 100% during troubleshooting. Do you have anything in mind that might also be failing? Customer seems happy using VNC and did not come back.

2) Yes, always the same

3) We were also unable to understand why it suddenly started failing.

4) We checked these permissions, they were the same as in our labs (working).

Martin,

It was actually James' analysis, I just asked him to check this since I don't know much Java. I'm glad you could reproduce this. According to 1075013, step (2) returns before the engine is properly started, so this is a bit tricky.

From what I understand 1075013 will not fix this (unless apache service depends on engine, which I am not sure is a good idea).

Cheers,
Germano

Comment 15 Martin Perina 2016-01-20 09:54:26 UTC
(In reply to Germano Veit Michel from comment #14)
> It was actually James analysis, I just asked him to check this since I don't
> know much Java. I'm glad you could reproduce this. According to 1075013 step
> (2) returns before engine is properly started so this is a bit tricky.
> 
> From what I understand 1075013 will not fix this (unless apache service
> depends on engine, which I am not sure is a good idea).
> 
> Cheers,
> Germano

Hi,

There is no easy or reliable way to detect whether a J2EE application has finished its deployment successfully, so I doubt we could fix 1075013.

But regarding this bug, the fix is not that complex, because it's not only about the PKIResourceServlet <-> Backend dependency but also about improper usage of an internal API, and that can be fixed easily.

Martin

Comment 17 Ondra Machacek 2016-02-22 14:59:49 UTC
ok in rhevm-3.6.3.2-0.1.el6.noarch

Comment 19 errata-xmlrpc 2016-03-09 21:15:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0376.html