Bug 2107985 - Live migration of disk cause a database error (engine won't start anymore)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.5.1.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-4.5.2
Target Release: ---
Assignee: Benny Zlotnik
QA Contact: Evelina Shames
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-18 07:50 UTC by Giulio Casella
Modified: 2022-08-30 08:47 UTC (History)

Fixed In Version: ovirt-engine-4.5.2
Clone Of:
Environment:
Last Closed: 2022-08-30 08:47:42 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?
ahadas: blocker+


Attachments
engine log files (relevant part) (19.65 KB, application/gzip)
2022-07-18 07:50 UTC, Giulio Casella
Content of command_entities table (335.95 KB, application/gzip)
2022-07-18 07:55 UTC, Giulio Casella


Links
System ID Private Priority Status Summary Last Updated
Github oVirt ovirt-engine pull 526 0 None Merged core: do not use double-brace initialization for parameter value 2022-07-18 08:06:10 UTC
Red Hat Issue Tracker RHV-47712 0 None None None 2022-07-18 07:53:06 UTC

Description Giulio Casella 2022-07-18 07:50:52 UTC
Created attachment 1897890 [details]
engine log files (relevant part)

Description of problem: Following a live disk migration from one storage domain to another, ovirt-engine is broken (i.e. unavailable) and won't start or restart.
Digging through engine.log, I find:

ERROR [org.ovirt.engine.core.utils.serialization.json.JsonObjectDeserializer] (ServerService Thread Pool -- 45) [] Cannot deserialize {
"@class" : "org.ovirt.engine.core.common.action.CreateSnapshotDiskParameters",
  "commandId" : [ "org.ovirt.engine.core.compat.Guid", {
    "uuid" : "6ae544f6-b608-4d8d-9f99-eabd5d5db0ad"
  } ],
[...cut...]
"domain" : "my.dom.ain"[truncated 5971 chars]; line: 72, column: 89] (through reference chain: org.ovirt.engine.core.common.action.CreateSnapshotDiskParameters["diskImagesMap"])
2022-07-13 09:57:48,315+02 ERROR [org.ovirt.engine.core.bll.InitBackendServicesOnStartupBean] (ServerService Thread Pool -- 45) [] Failed to initialize backend: org.jboss.weld.exceptions.WeldException: WELD-000049: Unable to invoke public void org.ovirt.engine.core.bll.tasks.CommandContextsCacheImpl.initContextsMap() on org.ovirt.engine.core.bll.tasks.CommandContextsCacheImpl@3f52ccce
[...cut...] 




Version-Release number of selected component (if applicable):
ovirt-engine-4.5.1.3-1.el8.noarch

How reproducible:

Cannot reproduce; it happened only once across many live disk image migrations.

Actual results:

ovirt-engine (and the admin portal) hangs and won't start or restart.

Expected results:

Disk live migration completes successfully.

Additional info:

Comment 1 Giulio Casella 2022-07-18 07:54:59 UTC
The command_entities table contains 65 rows, 2 of which reference the disk live migration task. No rows in the job table reference their correlationId, so I removed those 2 rows from command_entities, after which the engine started again.
The content of command_entities is attached.

Comment 2 Giulio Casella 2022-07-18 07:55:40 UTC
Created attachment 1897891 [details]
Content of command_entities table

Comment 3 RHEL Program Management 2022-07-18 08:06:17 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 4 RHEL Program Management 2022-07-19 06:59:11 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 5 Arik 2022-07-19 07:00:45 UTC
caused by the fix for bz 1958032
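
For context on the linked pull request title ("core: do not use double-brace initialization for parameter value"): the sketch below is not the engine's code. It assumes, without confirmation from this report, that the value behind diskImagesMap was built with double-brace initialization and that the JSON serializer records each object's concrete class name (as the "@class" entries in the log suggest). The class DoubleBraceDemo and the key "diskId" are hypothetical. It only shows that double-brace initialization yields an anonymous subclass rather than a plain HashMap, so a recorded class name may be one that a deserializer cannot instantiate later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of ovirt-engine.
class DoubleBraceDemo {

    // Double-brace initialization: the outer braces declare an anonymous
    // subclass of HashMap, the inner braces are its instance initializer.
    static Map<String, String> doubleBrace() {
        return new HashMap<String, String>() {{
            put("diskId", "example");
        }};
    }

    // Plain construction: the runtime class is exactly java.util.HashMap.
    static Map<String, String> plain() {
        Map<String, String> map = new HashMap<>();
        map.put("diskId", "example");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> a = doubleBrace();
        Map<String, String> b = plain();

        // The anonymous subclass gets a compiler-generated name (Outer$N),
        // which is what a type-recording serializer would write out.
        System.out.println("double-brace class: " + a.getClass().getName());
        System.out.println("is anonymous:       " + a.getClass().isAnonymousClass());
        System.out.println("plain class:        " + b.getClass().getName());

        // The contents are equal; only the runtime type differs.
        System.out.println("equal contents:     " + a.equals(b));
    }
}
```

The two maps compare equal, so the difference stays invisible until something inspects the runtime class, for example a serializer that stores type information next to the data.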

Comment 6 Evelina Shames 2022-07-24 10:13:11 UTC
Verification flow:
1. Start LSM
2. Restart ovirt-engine during LSM

---> Service is up, no 'Cannot deserialize' error in logs.
Version: ovirt-engine-4.5.2-0.3.el8ev


But LSM gets stuck, a new bug was opened for this issue: bug 2110186

Comment 7 Arik 2022-07-24 11:10:05 UTC
(In reply to Evelina Shames from comment #6)
> But LSM gets stuck, a new bug was opened for this issue: bug 2110186

Right, we restarted the engine to ensure that the parameters get deserialized. In light of the issue found when doing that (bz 2110186), though, I guess that's not what the user did; the parameters were probably deserialized for a different reason (not necessarily at engine startup: we clear the cached parameters from time to time and then deserialize them from the database as well). It therefore makes sense to verify this bug and treat the issue during engine restart as a separate one.

Comment 8 Sandro Bonazzola 2022-08-30 08:47:42 UTC
This bugzilla is included in oVirt 4.5.2 release, published on August 10th 2022.
Since the problem described in this bug report should be resolved in oVirt 4.5.2 release, it has been closed with a resolution of CURRENT RELEASE.
If the solution does not work for you, please open a new bug report.

