Bug 1651874
Summary: | Live merge failed with NPE on endAction of DestroyImage | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Germano Veit Michel <gveitmic> |
Component: | ovirt-engine | Assignee: | Dana <delfassy> |
Status: | CLOSED ERRATA | QA Contact: | Avihai <aefrat> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 4.2.7 | CC: | aefrat, bzlotnik, delfassy, eshenitz, gwatson, lsvaty, mperina, mtessun, nashok, pelauter, Rhev-m-bugs, tnisan |
Target Milestone: | ovirt-4.3.5 | Keywords: | ZStream |
Target Release: | 4.3.5 | Flags: | lsvaty: testing_plan_complete- |
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | ovirt-engine-4.3.5.1 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-08-12 11:53:27 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Germano Veit Michel
2018-11-21 05:50:24 UTC
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Martin, we cannot seem to reproduce it, but it appears the NPE occurs while serializing to JSON. Can someone from Infra please have a look?

From the stacktrace it seems that either the DestroyImageParameters [1] or the StorageDomainParametersBase [2] class is not serializable to JSON; my guess is that the problem is the Phase enum in StorageDomainParametersBase. If so, we need to write a MixIn class similar to those we already have in [3]. Ravi, could you please take a look and suggest the fix?

[1] https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/action/DestroyImageParameters.java
[2] https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/action/StorageDomainParametersBase.java
[3] https://github.com/oVirt/ovirt-engine/tree/master/backend/manager/modules/utils/src/main/java/org/ovirt/engine/core/utils/serialization/json

Just a note: this bug happens at around the same time every time, 03:35:35, which is when the CommandEntityCleanupManager runs. You can see it running in 4.1 (Gordon's logs) but not in 4.2 and later, probably because of the migration from Quartz to JBoss scheduled threads. While CommandEntityCleanupManager should not clean up commands younger than 30 days, perhaps there is some issue with the reload of the commands entity cache?

*** Bug 1694888 has been marked as a duplicate of this bug. ***

See also bug 1694888 for newer logs.

Hi,

We (QE) need a clear reproduction scenario so we can qa_ack/nack (if we do not have HW/capacity) this bug and later verify it.

If this is something that is purely code-level, as it seems to be a serialization issue with JSON, and the user/QE cannot reproduce it with a user flow, you have the green light to verify this as DEV.
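The 30-day rule mentioned in the note about CommandEntityCleanupManager boils down to an age filter on the stored commands. The sketch below is a hypothetical illustration, not the engine's CommandEntityCleanupManager: the `CommandEntity` row type, the command names and the 30-day retention are assumptions. It shows why the deletion step on its own should never remove an in-flight live merge, which is what shifts suspicion to the cache reload that accompanies the cleanup.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class CleanupRetentionSketch {
    // Hypothetical stand-in for a row of the command_entities table.
    static class CommandEntity {
        final String commandId;
        final Instant createdAt;

        CommandEntity(String commandId, Instant createdAt) {
            this.commandId = commandId;
            this.createdAt = createdAt;
        }

        @Override
        public String toString() {
            return commandId;
        }
    }

    // Selects the rows a retention-based cleanup may delete:
    // only those created strictly before (now - retention).
    static List<CommandEntity> expired(List<CommandEntity> rows, Instant now, Duration retention) {
        Instant cutoff = now.minus(retention);
        List<CommandEntity> result = new ArrayList<>();
        for (CommandEntity row : rows) {
            if (row.createdAt.isBefore(cutoff)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2018-11-21T03:35:35Z");
        List<CommandEntity> rows = List.of(
                new CommandEntity("running-live-merge", now.minus(Duration.ofMinutes(5))),
                new CommandEntity("old-finished-command", now.minus(Duration.ofDays(45))));
        // Only the 45-day-old command is eligible; the running live merge is kept.
        System.out.println(expired(rows, now, Duration.ofDays(30))); // [old-finished-command]
    }
}
```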
(In reply to Avihai from comment #20)
> Hi,
>
> We/QE needs a clear reproduction scenario so we can qa_ack/Nack(if we do not
> have HW/capacity) this bug and later verify.
>
> If this is something that is pure code, as it seems its serialization issue
> with JSON and the user/QE cannot reproduce it with a user flow you have the
> green light to verify this as DEV.

We don't have clear reproduction steps; we fixed the code according to the provided stacktrace. From the code we know this issue can happen when you execute DestroyImageCommand and the engine saves the parameters of this command as a JSON object into the database.

(In reply to Martin Perina from comment #21)
> (In reply to Avihai from comment #20)
> > Hi,
> >
> > We/QE needs a clear reproduction scenario so we can qa_ack/Nack(if we do not
> > have HW/capacity) this bug and later verify.
> >
> > If this is something that is pure code, as it seems its serialization issue
> > with JSON and the user/QE cannot reproduce it with a user flow you have the
> > green light to verify this as DEV.
>
> We don't have a clear reproducing steps, we have fixed the code according to
> the provided stacktrace. According to the code we know this issue can happen
> when you execute DestroyImageCommand and engine saves parameters of this
> command as JSON object into database.

I was able to reproduce the issue (with breakpoints) after Moti asked me to verify the patch.

The issue here is that CommandEntityCleanupManager forces a refresh of the CommandsCache, which retrieves the commands from the command_entities table. Since DestroyImageParameters is persisted without the parent parameters, this refresh reloads them without the parent parameters, which causes the NPE later on. This also explains why the issue only happens when CommandEntityCleanupManager runs: it is the only thing that forces a refresh.
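The sequence described in the comment above can be illustrated with a toy model in plain Java. Everything in it is hypothetical: the `Params` class, its fields, and the two maps standing in for the command_entities table and the CommandsCache are simplifications, not engine code. The point is only the order of events: the in-memory parameters carry a parent reference, the persisted form does not, and a forced reload from the table silently drops the parent until an endAction-style callback dereferences it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class CommandsCacheRefreshSketch {
    // Hypothetical, heavily simplified parameters object; not the real
    // DestroyImageParameters / StorageDomainParametersBase classes.
    static class Params {
        final String commandType;
        // Kept in memory only: the persisted form below does not include it.
        final Params parentParameters;

        Params(String commandType, Params parentParameters) {
            this.commandType = commandType;
            this.parentParameters = parentParameters;
        }
    }

    // Stand-in for the command_entities table: command id -> persisted form.
    static final Map<UUID, String> commandEntities = new HashMap<>();
    // Stand-in for the CommandsCache: command id -> live parameters object.
    static final Map<UUID, Params> commandsCache = new HashMap<>();

    // Persist a command; only its own type is stored, the parent is dropped.
    static UUID save(Params params) {
        UUID id = UUID.randomUUID();
        commandEntities.put(id, params.commandType);
        commandsCache.put(id, params);
        return id;
    }

    // A forced refresh rebuilds every cache entry from the persisted rows,
    // so the rebuilt objects have no parent parameters.
    static void refreshCache() {
        commandsCache.clear();
        commandEntities.forEach((id, type) -> commandsCache.put(id, new Params(type, null)));
    }

    // endAction-style callback that assumes the parent parameters are present.
    static String endAction(UUID id) {
        Params params = commandsCache.get(id);
        return params.parentParameters.commandType; // NPE once the parent is lost
    }

    public static void main(String[] args) {
        UUID id = save(new Params("DestroyImage", new Params("LiveMerge", null)));
        System.out.println(endAction(id)); // prints "LiveMerge"
        refreshCache();                    // e.g. triggered by the nightly cleanup
        try {
            endAction(id);
        } catch (NullPointerException e) {
            System.out.println("NPE in endAction after cache refresh");
        }
    }
}
```

The thread does not describe how the patch restores the parent parameters, so no fix is sketched here.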
To reproduce, one could set CommandEntityCleanupTime to some other time and have multiple concurrent live merge commands run at the same time.

> I was able to reproduce the issue (with breakpoints) after Moti asked me to
> verify the patch:
> The issue is here being that CommandEntityCleanupManager forces a refresh of
> the CommandsCache which retrieves the commands from the command_entities
> table.
> Since the DestroyImageParameters is persisted without the parent parameters,
> this refresh causes them to be reloaded without it and causes the NPE later
> on,
> this also explains why this issue only happens when
> CommandEntityCleanupManager runs as it's the only things the forces a
> refresh.
> To reproduce one could set CommandEntityCleanupTime to some other time and
> have multiple concurrent live merge commands run at the same time
So there is no way to reproduce this without using breakpoints in the engine, right?
If so, since you have already verified this fix, can you please verify this bug?
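Both the observation that the NPE always shows up around 03:35:35 and the reproduction hint about moving CommandEntityCleanupTime follow from the cleanup being a once-a-day job at a fixed time of day. A minimal sketch of such a schedule with plain `java.util.concurrent`, assuming a hypothetical `cleanupTime` value that stands in for CommandEntityCleanupTime; the engine's actual scheduler code is not reproduced here.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DailyCleanupScheduleSketch {
    // Delay from 'now' until the next occurrence of 'runAt' (later today or tomorrow).
    static Duration delayUntil(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    public static void main(String[] args) {
        // Hypothetical value mirroring the 03:35:35 runs seen in the logs;
        // moving it close to "now" makes the cache refresh easy to hit in a test.
        LocalTime cleanupTime = LocalTime.parse("03:35:35");
        Duration initial = delayUntil(LocalDateTime.now(), cleanupTime);

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("cleanup + CommandsCache refresh"),
                initial.toMillis(), TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        System.out.println("first run in " + initial);
        scheduler.shutdown(); // demo only: cancels the pending periodic run
    }
}
```

Because the job fires once per day at a fixed wall-clock time, a live merge only hits the refresh if it is in flight at that moment, which is consistent with the bug being rare in practice and hard to trigger without breakpoints.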
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2431