Bug 1345853 - engine fails to restart after failing to create a pool with name already existing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.0.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ovirt-4.0.0-rc3
Target Release: 4.0.0
Assignee: Arik
QA Contact: sefi litmanovich
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-06-13 10:31 UTC by sefi litmanovich
Modified: 2016-07-05 07:50 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-05 07:50:16 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.0.0+
rule-engine: blocker+
rule-engine: planning_ack+
michal.skrivanek: devel_ack+
rule-engine: testing_ack+


Attachments
engine + server logs (730.87 KB, application/x-gzip)
2016-06-13 10:31 UTC, sefi litmanovich


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 59096 0 master MERGED core: fix possible NPE on pool creation/update 2016-06-14 10:22:19 UTC
oVirt gerrit 59097 0 master MERGED core: fix serialization of pool related parameter classes 2016-06-14 10:43:29 UTC
oVirt gerrit 59143 0 ovirt-engine-4.0 MERGED core: fix possible NPE on pool creation/update 2016-06-14 12:41:12 UTC
oVirt gerrit 59144 0 ovirt-engine-4.0 MERGED core: fix serialization of pool related parameter classes 2016-06-14 15:55:35 UTC

Description sefi litmanovich 2016-06-13 10:31:24 UTC
Created attachment 1167416
engine + server logs

Description of problem:

After an attempt to create a pool with the same name as an existing pool in the cluster fails (as expected), restarting ovirt-engine fails:

2016-06-13 11:47:27,593 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (ServerService Thread Pool -- 60) [] Error in invocating CTOR of command 'AddVmPoolWithVms': null
2016-06-13 11:47:27,594 ERROR [org.ovirt.engine.core.bll.InitBackendServicesOnStartupBean] (ServerService Thread Pool -- 60) [] Failed to initialize backend: org.jboss.weld.exceptions.WeldException: WELD-000049: Unable to invoke private void org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.init() on org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller@642601aa
............ (see full engine log attached)
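
To locate these two errors in a full engine log, a small scan along the following lines may help. This is only a sketch: the function name scan_engine_log is hypothetical, the matched fragments are copied from the excerpt above, and the sample lines are shortened copies of that excerpt rather than additional log data.

```python
import re

# Message fragments copied from the log excerpt in this report.
CTOR_ERROR = "Error in invocating CTOR of command"
INIT_ERROR = "Failed to initialize backend"

# Captures the command name quoted in the CTOR error line.
COMMAND_NAME = re.compile(r"CTOR of command '([^']+)'")


def scan_engine_log(lines):
    """Return (failed_commands, backend_init_failed) for an iterable of log lines."""
    failed_commands = []
    backend_init_failed = False
    for line in lines:
        if CTOR_ERROR in line:
            match = COMMAND_NAME.search(line)
            if match:
                failed_commands.append(match.group(1))
        if INIT_ERROR in line:
            backend_init_failed = True
    return failed_commands, backend_init_failed


# Shortened copies of the two ERROR lines quoted in the description.
sample = [
    "2016-06-13 11:47:27,593 ERROR [org.ovirt.engine.core.bll.CommandsFactory] "
    "Error in invocating CTOR of command 'AddVmPoolWithVms': null",
    "2016-06-13 11:47:27,594 ERROR "
    "[org.ovirt.engine.core.bll.InitBackendServicesOnStartupBean] "
    "Failed to initialize backend: org.jboss.weld.exceptions.WeldException",
]

commands, init_failed = scan_engine_log(sample)
print(commands, init_failed)
```

For a real log, an open file object can be passed in place of the sample list, since the function only iterates over lines.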

The relevant entries need to be removed from async_tasks and command_entities before the engine can be restarted properly again.


Version-Release number of selected component (if applicable):
rhevm-4.0.0.2-0.1.el7ev.noarch

How reproducible:
always

Steps to Reproduce:
1. Create a vm pool with some name e.g. 'test'.
2. Attempt to create another pool with the same name 'test' - this will fail as expected.
3. Restart engine.

Expected results:
engine restarts successfully.

Actual results:
engine starts and is shown as active, but the service entered failed state:

● ovirt-engine.service - oVirt Engine
   Loaded: loaded (/usr/lib/systemd/system/ovirt-engine.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-06-13 13:11:45 IDT; 1min 50s ago
 Main PID: 9399 (ovirt-engine.py)
   CGroup: /system.slice/ovirt-engine.service
           ├─9399 /usr/bin/python /usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.py --redirect-output --systemd=notify start
           └─9430 ovirt-engine -server -XX:+TieredCompilation -Xms1024M -Xmx1024M -Djava.awt.headless=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djsse.enableSNIExtension=false -XX:+HeapDumpOnO...

Jun 13 13:11:42 slitmano-rhevm.scl.lab.tlv.redhat.com systemd[1]: ovirt-engine.service: main process exited, code=exited, status=1/FAILURE
Jun 13 13:11:42 slitmano-rhevm.scl.lab.tlv.redhat.com systemd[1]: Unit ovirt-engine.service entered failed state.
Jun 13 13:11:42 slitmano-rhevm.scl.lab.tlv.redhat.com systemd[1]: ovirt-engine.service failed.
Jun 13 13:11:42 slitmano-rhevm.scl.lab.tlv.redhat.com systemd[1]: Starting oVirt Engine...
Jun 13 13:11:45 slitmano-rhevm.scl.lab.tlv.redhat.com systemd[1]: Started oVirt Engine.

In the DB, the command_entities table has a command of type 304 stuck with status ENDED_WITH_FAILURE.
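
The workaround mentioned in the description (removing the relevant rows from async_tasks and command_entities) might look roughly like the sketch below. It is not taken from this report or from the engine sources: the column names command_id, command_type and status, and the link between the two tables through command_id, are assumptions about the schema. The demo runs against an in-memory SQLite stand-in, not the engine's real database, and the helper names are hypothetical.

```python
import sqlite3

STUCK_COMMAND_TYPE = 304              # command type reported in this bug
STUCK_STATUS = "ENDED_WITH_FAILURE"   # status reported in this bug


def remove_stuck_commands(conn, command_type=STUCK_COMMAND_TYPE, status=STUCK_STATUS):
    """Delete stuck command rows and any async tasks that reference them.

    Assumed columns: command_entities(command_id, command_type, status)
    and async_tasks(command_id). Placeholders use sqlite3's '?' style;
    a PostgreSQL driver such as psycopg2 expects '%s' instead.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT command_id FROM command_entities "
        "WHERE command_type = ? AND status = ?",
        (command_type, status),
    )
    command_ids = [row[0] for row in cur.fetchall()]
    for command_id in command_ids:
        cur.execute("DELETE FROM async_tasks WHERE command_id = ?", (command_id,))
        cur.execute("DELETE FROM command_entities WHERE command_id = ?", (command_id,))
    conn.commit()
    return command_ids


def make_demo_db():
    """Build a simplified stand-in schema with one stuck and one healthy command."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE command_entities (
            command_id TEXT PRIMARY KEY, command_type INTEGER, status TEXT);
        CREATE TABLE async_tasks (task_id TEXT PRIMARY KEY, command_id TEXT);
        INSERT INTO command_entities VALUES ('c1', 304, 'ENDED_WITH_FAILURE');
        INSERT INTO command_entities VALUES ('c2', 304, 'ACTIVE');
        INSERT INTO async_tasks VALUES ('t1', 'c1');
        INSERT INTO async_tasks VALUES ('t2', 'c2');
        """
    )
    return conn


demo = make_demo_db()
print(remove_stuck_commands(demo))
```

Selecting the affected command_id values first keeps the two deletes consistent with each other. Anything like this would presumably be run with the engine stopped and a database backup at hand.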

Additional info:

Comment 1 Red Hat Bugzilla Rules Engine 2016-06-15 09:42:41 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Eyal Edri 2016-06-23 13:58:02 UTC
included in the last released build of ovirt-4.0.0-rc3

Comment 3 sefi litmanovich 2016-06-27 16:36:04 UTC
verified with rhevm-4.0.0.6-0.1.el7ev.noarch according to steps in description.

Comment 4 Sandro Bonazzola 2016-07-05 07:50:16 UTC
oVirt 4.0.0 has been released, closing current release.

