Bug 1294025 - Spice vm console fails because servlet pki-resource is unavailable
Summary: Spice vm console fails because servlet pki-resource is unavailable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-3.6.3
Target Release: 3.6.3
Assignee: Martin Perina
QA Contact: Ondra Machacek
URL:
Whiteboard:
Depends On:
Blocks: 1302776
Reported: 2015-12-24 05:46 UTC by Germano Veit Michel
Modified: 2019-09-12 09:41 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1302776
Environment:
Last Closed: 2016-03-09 21:15:02 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1075013 | 0 | low | CLOSED | 'service ovirt-engine start' should return only after it is started | 2022-02-25 08:23:33 UTC
Red Hat Product Errata | RHEA-2016:0376 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise Virtualization Manager 3.6.0 | 2016-03-10 01:20:52 UTC
oVirt gerrit | 52764 | 0 | master | MERGED | core: Reimplement PKIResources.Resource as class | 2016-01-28 14:42:46 UTC
oVirt gerrit | 52868 | 0 | ovirt-engine-3.6 | MERGED | core: Reimplement PKIResources.Resource as class | 2016-01-28 20:29:37 UTC
oVirt gerrit | 52870 | 0 | ovirt-engine-3.6.3 | MERGED | core: Reimplement PKIResources.Resource as class | 2016-01-28 20:58:32 UTC

Internal Links: 1075013

Description Germano Veit Michel 2015-12-24 05:46:21 UTC
Description of problem:

- Cannot open Spice Console to VM in RHEV-M web GUI.

2015-12-21 13:31:49,001 WARN  [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/ovirt-engine/services].[pki-resource]] (ajp-/127.0.0.1:8702-13) JBWEB000234: Servlet pki-resource is currently unavailable

# wget -O rhevm.cer http://rhevm/ca.crt
Resolving rhevm... A.B.C.D
Connecting to rhevm|A.B.C.D|:80... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2015-12-22 12:06:21 ERROR 503: Service Unavailable.

- It's failing because the pki servlet is unavailable.

How reproducible:
100% (customer site)

Actual results:
Error 500

Expected results:
Open VM Spice Console

Additional info:

The servlet is marked as unavailable because it failed initialization. The initial problem is this:
--
2015-12-19 11:34:56,114 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/ovirt-engine/services].[pki-resource]] (ajp-/127.0.0.1:8702-4) JBWEB000235: Allocate exception for servlet pki-resource: java.lang.NullPointerException
	at org.ovirt.engine.core.common.config.Config.getValue(Config.java:22) [common.jar:]
	at org.ovirt.engine.core.common.config.Config.getValue(Config.java:18) [common.jar:]
	at org.ovirt.engine.core.utils.PKIResources$Resource.<clinit>(PKIResources.java:84) [utils.jar:]
	at org.ovirt.engine.core.services.PKIResourceServlet.<clinit>(PKIResourceServlet.java:26) [classes:]
--

<clinit> is a class's static initializer. If a static initializer throws an exception, the class is unusable for the lifetime of its class loader: the JVM marks the class as failed and there is no way to recover from that.
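
The behaviour described above can be reproduced outside the engine. The following is only a minimal sketch, not engine code: StaticInitDemo, Broken and lookup() are hypothetical names, and the null dereference stands in for the NullPointerException seen in Config.getValue().

```java
// Minimal sketch: a static initializer that fails once leaves the class
// unusable for as long as its class loader lives.
class StaticInitDemo {

    // Hypothetical stand-in for a class whose static initializer hits a null.
    static class Broken {
        // Not a compile-time constant, so reading it triggers <clinit>,
        // which throws NullPointerException because lookup() returns null.
        static final String VALUE = lookup().trim();

        static String lookup() {
            return null;
        }
    }

    // Touches Broken and returns the simple class name of whatever is thrown.
    static String touch() {
        try {
            return Broken.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // First access runs <clinit> and fails with ExceptionInInitializerError.
        System.out.println("first access:  " + touch());
        // Every later access fails with NoClassDefFoundError; <clinit> is not retried.
        System.out.println("second access: " + touch());
    }
}
```

The first access wraps the NullPointerException in an ExceptionInInitializerError. Later accesses get NoClassDefFoundError without the initializer being run again, which matches a servlet that stays unavailable until the JVM is restarted.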

A NullPointerException in Config.getValue() means that getConfigUtils() returned null, so either Config.setConfigUtils() has not been called or was called with null.

This may be a race condition in which the configuration is used before Config.setConfigUtils() is called.

The only setConfigUtils() call outside of test code is made from an @Singleton @Startup EJB.

So this looks like a simple race condition between a servlet and an EJB starting up. The best solution is to fix the design, but a simple solution would be to make the servlet depend on the EJB, for example by putting this in PKIResourceServlet:
 @EJB BackendLocal backend;
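
As an illustration of the "fix the design" alternative, the configuration lookup can be deferred from class initialization to first use, so an early request fails only once instead of poisoning the class. This is a hedged sketch and not the merged patch: LazyConfigDemo, the simplified Config, LazyResource and the path value are all hypothetical.

```java
import java.util.function.Supplier;

// Sketch: read configuration on each use rather than in a static initializer.
class LazyConfigDemo {

    // Hypothetical, simplified stand-in for the engine's Config holder.
    // The provider stays null until startup code installs it.
    static class Config {
        private static volatile Supplier<String> provider;

        static void setProvider(Supplier<String> p) {
            provider = p;
        }

        static String getValue() {
            Supplier<String> p = provider;
            if (p == null) {
                throw new IllegalStateException("configuration is not initialized yet");
            }
            return p.get();
        }
    }

    // Hypothetical resource: nothing is read from Config at class-init time.
    static class LazyResource {
        String path() {
            return Config.getValue();
        }
    }

    // Returns the path, or a short message if configuration is not ready.
    static String tryPath(LazyResource r) {
        try {
            return r.path();
        } catch (IllegalStateException e) {
            return "unavailable: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        LazyResource r = new LazyResource();
        // Early request: fails, but the class itself stays usable.
        System.out.println(tryPath(r));
        // Startup completes later (hypothetical value).
        Config.setProvider(() -> "/hypothetical/ca.pem");
        // The same object now works without a restart.
        System.out.println(tryPath(r));
    }
}
```

With this shape a request that arrives too early gets a one-off error, and requests made after startup succeed without restarting the engine.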

Comment 4 Michal Skrivanek 2016-01-15 11:34:15 UTC
Something wrong with the deployment, perhaps? I suppose http://rhevm/ca.crt should always work.
Oved, PKI issues are your area, so I'm moving this to infra.

Comment 5 Oved Ourfali 2016-01-15 14:07:56 UTC
If that's in deployment, then it is integration. Didi, can you take a look? Move it back to infra if needed.

Comment 6 Yedidyah Bar David 2016-01-17 09:58:37 UTC
Can't see an integration problem.

If ca.crt is unreadable, that is noted in server.log, and the current log does not mention it.

Comment 13 Martin Perina 2016-01-19 15:38:11 UTC
Hi Germano,

your initial analysis was correct: if PKIResourceServlet is accessed before the Backend EJB is initialized, it stays inaccessible until the engine instance is restarted. I was able to reproduce it even on the latest master using these steps:

1. Start the engine service.

2. While step 1 is still running, execute the following script to access PKIResourceServlet as soon as it becomes available:

  for i in {1..8192} ; do wget -O rhevm.cer http://localhost/ca.crt ; done

3. Even after the engine has started up successfully, PKIResourceServlet remains inaccessible.

Until a proper fix is posted, the only workarounds are either "don't access the engine until it has properly started" or blocking HTTP access completely:

  1. Stop the Apache service
       - so nothing can access the engine

  2. Start the ovirt-engine service and wait until it has properly started

  3. Start the Apache service

Thanks

Martin

Comment 14 Germano Veit Michel 2016-01-20 00:32:10 UTC
Yedidyah Bar David,

Sorry for not responding to your questions right away.

1) I suppose it's still failing every single time he tries it. It did fail 100% of the time during troubleshooting. Do you have anything in mind that might also be failing? The customer seems happy using VNC and did not come back.

2) Yes, always the same.

3) We were also unable to understand why it suddenly started failing.

4) We checked these permissions, they were the same as in our labs (working).

Martin,

It was actually James's analysis; I just asked him to check this since I don't know much Java. I'm glad you could reproduce this. According to 1075013, step (2) returns before the engine is properly started, so this is a bit tricky.

From what I understand, 1075013 will not fix this (unless the Apache service depends on the engine, which I am not sure is a good idea).

Cheers,
Germano

Comment 15 Martin Perina 2016-01-20 09:54:26 UTC
(In reply to Germano Veit Michel from comment #14)
> It was actually James analysis, I just asked him to check this since I don't
> know much Java. I'm glad you could reproduce this. According to 1075013 step
> (2) returns before engine is properly started so this is a bit tricky.
> 
> From what I understand 1075013 will not fix this (unless apache service
> depends on engine, which I am not sure is a good idea).
> 
> Cheers,
> Germano

Hi,

there is no easy or reliable way to detect whether a J2EE application has finished its deployment successfully, so I doubt we could fix 1075013.

But the fix for this bug is not that complex, because it's not only about the PKIResourceServlet <-> Backend dependency but also about improper usage of an internal API, and that can be fixed easily.

Martin

Comment 17 Ondra Machacek 2016-02-22 14:59:49 UTC
ok in rhevm-3.6.3.2-0.1.el6.noarch

Comment 19 errata-xmlrpc 2016-03-09 21:15:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0376.html

