Bug 1789389 - "Internal Engine error" appears when using legacy affinity labels
Summary: "Internal Engine error" appears when using legacy affinity labels
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.5.2
Target Release: ---
Assignee: Shmuel Melamud
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-09 13:29 UTC by Marko Myllynen
Modified: 2022-08-30 08:47 UTC
CC: 6 users

Fixed In Version: ovirt-engine-4.5.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-30 08:47:42 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.5?
ahadas: planning_ack?
ahadas: devel_ack+
pm-rhel: testing_ack+




Links
System ID Private Priority Status Summary Last Updated
Github oVirt ovirt-engine pull 519 0 None open core: Return validation messages from SchedulingManager 2022-07-07 00:25:54 UTC

Description Marko Myllynen 2020-01-09 13:29:45 UTC
Description of problem:
In RHV 4.3.4, setting a Host to Maintenance Mode while it runs a VM with a single-host Affinity Label (i.e., the VM is allowed to run only on that Host) neither succeeds nor fails cleanly; it leaves the Host in an undesired state. The expected behavior would be that the Host cannot be set to maintenance mode when a running VM has no other place to migrate to. Although that was the result in the end, after selecting the Host and putting it into Maintenance Mode the other VMs on that Host started to migrate away as if the Maintenance Mode would go through. However, the manager is unable to migrate the VM with the Affinity Label rule, and the Maintenance Mode operation never completes in either direction.
 
In 4.3.7 the Affinity Label function has changed and by default no longer works as before. With the new functionality, all VMs (including the one with the label) migrate away from the Host without issue and the Host is properly set to Maintenance Mode.

The issue with 4.3.7 is that it still allows enforcing the legacy Affinity Label behavior through the "Implicit" option introduced when creating/modifying an Affinity Label. When this option is enabled and the same scenario is run again, the issue reproduces.

It looks like this legacy/implicit mode should either be dropped or fixed.
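For illustration, the legacy behavior is toggled per label. A minimal sketch of flipping it via the REST API follows; the engine URL, label id, and the `has_implicit_affinity_group` field name are assumptions, not confirmed by this report, and the command is printed rather than executed:

```shell
# Hypothetical engine URL and affinity label id -- substitute your own.
ENGINE=https://engine.example.com/ovirt-engine/api
LABEL=00000000-0000-0000-0000-000000000000

# Payload enabling the legacy "Implicit" behavior on an affinity label.
# The field name is an assumption; check your engine's API model.
PAYLOAD='<affinity_label><has_implicit_affinity_group>true</has_implicit_affinity_group></affinity_label>'

# Print the curl invocation instead of running it against a live engine.
CMD="curl -k -u admin@internal:password -H 'Content-Type: application/xml' -X PUT -d \"$PAYLOAD\" $ENGINE/affinitylabels/$LABEL"
echo "$CMD"
```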

Version-Release number of selected component (if applicable):
RHV 4.3.7

Comment 1 Ryan Barry 2020-01-10 00:22:15 UTC
Why is this a problem? "Fixing" affinity so hosts can go into maintenance was important to customers, but leaving the legacy mode provides a way to keep the old behavior. This is desired behavior.

Comment 2 Marko Myllynen 2020-01-10 07:47:33 UTC
The problem is that the Host ends up in limbo, not in Maintenance but not fully operational either. I would have expected a notification stating it can't be put into Maintenance. But if this behavior is expected by users then please feel free to close this BZ. Thanks.

Comment 3 Andrej Krejcir 2020-01-13 15:54:55 UTC
Affinity is only one possible cause for this issue.

When a host is put into maintenance mode and some of the VMs cannot be migrated away for any reason (an affinity constraint, or simply no other host they fit on), the host is stuck in 'PreparingForMaintenance'. It is up to the user to either manually resolve the conflicts and migrate the VMs, or activate the host again. An error message is displayed, but it is not helpful: "Internal Engine error".
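One way out of the stuck state, besides the UI, is re-activating the host through the REST API. A minimal sketch with a hypothetical engine URL and host id; the command is printed rather than executed:

```shell
# Hypothetical values -- substitute your own engine URL and host id.
ENGINE=https://engine.example.com/ovirt-engine/api
HOST=00000000-0000-0000-0000-000000000000

# Activating a host is a POST of an empty <action/> to its activate
# sub-resource; print the invocation instead of hitting a live engine.
CMD="curl -k -u admin@internal:password -H 'Content-Type: application/xml' -X POST -d '<action/>' $ENGINE/hosts/$HOST/activate"
echo "$CMD"
```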

There is a comment in the code suggesting that this behavior is expected.


The same cause also underlies Bug 1660837.

Comment 4 Arik 2020-06-08 15:11:28 UTC
We will investigate the "Internal Engine error".
As for the original report - we'll keep it as is, as that's the intended behaviour.

Comment 5 Michal Skrivanek 2022-04-11 09:33:09 UTC
no update for a while, didn't make it into 4.5, closing

Comment 6 Arik 2022-04-13 07:19:19 UTC
Polina, do we still get the "Internal Engine error" message?

Comment 7 Polina 2022-04-13 09:02:26 UTC
Hi Arik, today we have the following behavior for such a scenario:

In the UI the user gets a popup window saying:
"Error while executing action Move Host to Maintenance mode: Internal Engine Error"
I think this is what the reporter refers to.

In Events we have a good event message:
"Host host_mixed_1 cannot change into maintenance mode - not all Vms have been migrated successfully. Consider manual intervention: stopping/migrating Vms running on this host. (User: admin@internal-authz)."

In engine.log:
2022-04-13 11:56:34,924+03 WARN  [org.ovirt.engine.core.bll.MigrateMultipleVmsCommand] (default task-3) [88dcb62] VM 'golden_env_mixed_virtio_1' cannot be migrated.
2022-04-13 11:56:34,924+03 WARN  [org.ovirt.engine.core.bll.MigrateMultipleVmsCommand] (default task-3) [88dcb62] VM 'golden_env_mixed_virtio_2' cannot be migrated.
2022-04-13 11:56:34,931+03 ERROR [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (default task-3) [88dcb62] Failed to migrate one or more VMs.

Comment 8 Polina 2022-07-20 13:46:42 UTC
Verified on ovirt-engine-4.5.1.3-0.36.el8ev.noarch

1. Create an affinity group with a hard vm-to-host rule, like:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<affinity_groups>
    <affinity_group href="/ovirt-engine/api/clusters/20fb0bee-17e0-414a-b3b9-337473f8c310/affinitygroups/b30f663d-b819-44ec-826d-e17a8286fcf3" id="b30f663d-b819-44ec-826d-e17a8286fcf3">
        <name>A</name>
        <link href="/ovirt-engine/api/clusters/20fb0bee-17e0-414a-b3b9-337473f8c310/affinitygroups/b30f663d-b819-44ec-826d-e17a8286fcf3/vms" rel="vms"/>
        <link href="/ovirt-engine/api/clusters/20fb0bee-17e0-414a-b3b9-337473f8c310/affinitygroups/b30f663d-b819-44ec-826d-e17a8286fcf3/hosts" rel="hosts"/>
        <link href="/ovirt-engine/api/clusters/20fb0bee-17e0-414a-b3b9-337473f8c310/affinitygroups/b30f663d-b819-44ec-826d-e17a8286fcf3/vmlabels" rel="vmlabels"/>
        <link href="/ovirt-engine/api/clusters/20fb0bee-17e0-414a-b3b9-337473f8c310/affinitygroups/b30f663d-b819-44ec-826d-e17a8286fcf3/hostlabels" rel="hostlabels"/>
        <broken>false</broken>
        <enforcing>false</enforcing>
        <hosts_rule>
            <enabled>true</enabled>
            <enforcing>true</enforcing>
            <positive>true</positive>
        </hosts_rule>
        <positive>true</positive>
        <priority>1</priority>
        <vms_rule>
            <enabled>true</enabled>
            <enforcing>false</enforcing>
            <positive>true</positive>
        </vms_rule>
        <cluster href="/ovirt-engine/api/clusters/20fb0bee-17e0-414a-b3b9-337473f8c310" id="20fb0bee-17e0-414a-b3b9-337473f8c310"/>
        <host_labels/>
        <hosts>
            <host href="/ovirt-engine/api/hosts/bf771def-9820-4d1d-8bcd-c05331faa43f" id="bf771def-9820-4d1d-8bcd-c05331faa43f"/>
        </hosts>
        <vm_labels/>
        <vms>
            <vm href="/ovirt-engine/api/vms/73b15ce5-a245-45f0-9e9f-17dafbe9c56d" id="73b15ce5-a245-45f0-9e9f-17dafbe9c56d"/>
            <vm href="/ovirt-engine/api/vms/b322f945-60cf-4207-8015-5c7964249f23" id="b322f945-60cf-4207-8015-5c7964249f23"/>
        </vms>
    </affinity_group>
</affinity_groups>
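A group like the response above could be created with a POST to the cluster's affinitygroups collection. A minimal sketch; the engine URL and credentials are hypothetical, the payload fields simply mirror the response shown above, and the command is printed rather than executed:

```shell
# Request payload for a hard (enforcing) vm-to-host affinity rule,
# mirroring the <affinity_group> response above.
cat > /tmp/affinity_group.xml <<'EOF'
<affinity_group>
    <name>A</name>
    <hosts_rule>
        <enabled>true</enabled>
        <enforcing>true</enforcing>
        <positive>true</positive>
    </hosts_rule>
    <vms_rule>
        <enabled>true</enabled>
        <enforcing>false</enforcing>
        <positive>true</positive>
    </vms_rule>
</affinity_group>
EOF

# Hypothetical engine URL; cluster id taken from the response above.
ENGINE=https://engine.example.com/ovirt-engine/api
CLUSTER=20fb0bee-17e0-414a-b3b9-337473f8c310

CMD="curl -k -u admin@internal:password -H 'Content-Type: application/xml' -X POST -d @/tmp/affinity_group.xml $ENGINE/clusters/$CLUSTER/affinitygroups"
echo "$CMD"
```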

2. Set the host to maintenance.
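Step 2 maps to the host's deactivate sub-resource in the REST API. A minimal sketch with a hypothetical engine URL (the host id is taken from the affinity group above); the command is printed rather than executed:

```shell
# Hypothetical engine URL; host id from the affinity group above.
ENGINE=https://engine.example.com/ovirt-engine/api
HOST=bf771def-9820-4d1d-8bcd-c05331faa43f

# Moving a host to maintenance is a POST of an empty <action/> to its
# deactivate sub-resource; print the invocation instead of sending it.
CMD="curl -k -u admin@internal:password -H 'Content-Type: application/xml' -X POST -d '<action/>' $ENGINE/hosts/$HOST/deactivate"
echo "$CMD"
```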

Result:

Now an 'Operation Canceled' popup message is shown:
Error while executing action:

Cannot maintenance Host. There is no host that satisfies current scheduling constraints. See below for details:
The host host_mixed_2 did not satisfy internal filter VmToHostsAffinityGroups because it did not match positive affinity rules .
The host host_mixed_3 did not satisfy internal filter VmToHostsAffinityGroups because it did not match positive affinity rules .

and also an Event:
Host host_mixed_1 cannot change into maintenance mode - not all Vms have been migrated successfully. Consider manual intervention: stopping/migrating Vms running on this host. (User: admin@internal-authz).

Comment 9 Sandro Bonazzola 2022-08-30 08:47:42 UTC
This bugzilla is included in oVirt 4.5.2 release, published on August 10th 2022.
Since the problem described in this bug report should be resolved in oVirt 4.5.2 release, it has been closed with a resolution of CURRENT RELEASE.
If the solution does not work for you, please open a new bug report.

