Bug 2092816 - Unable to enable maintenance mode for host due to "image transfer in progress" which is not the case.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.5.0.8
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.5.3
Assignee: Benny Zlotnik
QA Contact: sshmulev
URL:
Whiteboard:
Depends On:
Blocks: 2123141
Reported: 2022-06-02 10:13 UTC by Andreas Bleischwitz
Modified: 2022-11-16 06:54 UTC
CC: 7 users

Fixed In Version: ovirt-engine-4.5.3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-19 14:31:42 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?


Attachments
Error reported by UI (20.27 KB, image/png) - 2022-06-02 10:13 UTC, Andreas Bleischwitz


Links
Github oVirt ovirt-engine pull 92 (Draft): core: validate transfer phase - last updated 2022-06-27 11:32:29 UTC
Red Hat Issue Tracker: RHV-46348 - last updated 2022-06-02 10:16:17 UTC

Description Andreas Bleischwitz 2022-06-02 10:13:12 UTC
Created attachment 1886027 [details]
Error reported by UI

Description of problem:
Unable to enable maintenance mode for any host in the cluster after updating to RHV-4.5.0.7-0.9.el8ev.

Version-Release number of selected component (if applicable):
RHV-4.5.0.7-0.9.el8ev 
Hosts with RHEL-8.5 and RHEL-8.6 - both suffer from the same issue.

How reproducible:
Always

Steps to Reproduce:
1. Select host, click on "Management -> Maintenance"
2. The error dialog shown in the attachment appears

Actual results:
Maintenance mode fails

Expected results:
Maintenance mode activates

Additional info:

Two different log entries for my hosts:

Host rhev-node1:

2022-06-02 12:02:39,812+02 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-2) [e20adb84-d43c-4b47-8d39-6587c0138661] Lock Acquired to object 'EngineLock:{exclusiveLocks='', sharedLocks='[d50e06b8-18e8-407b-b0d4-e2bab8d60480=POOL]'}'
2022-06-02 12:02:40,005+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [e20adb84-d43c-4b47-8d39-6587c0138661] EVENT_ID: GENERIC_ERROR_MESSAGE(14,001), Cannot switch Host rhev-node1.bleischwitz.org to Maintenance mode. Image transfer is in progress for the following (1) disks: 

        null 

Please wait for the operations to complete and try again.
2022-06-02 12:02:40,005+02 WARN  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-2) [e20adb84-d43c-4b47-8d39-6587c0138661] Validation of action 'MaintenanceNumberOfVdss' failed for user andreas@openidc-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_HOST_WITH_RUNNING_IMAGE_TRANSFERS,$host rhev-node1.bleischwitz.org,$disks    null,$disks_COUNTER 1
2022-06-02 12:02:40,006+02 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-2) [e20adb84-d43c-4b47-8d39-6587c0138661] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[d50e06b8-18e8-407b-b0d4-e2bab8d60480=POOL]'}'


Host rhev-node2:
2022-06-02 11:06:13,293+02 ERROR [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-26) [3d058938-d72a-4416-9bdb-c2c40765e4bf] Error during ValidateFailure.: java.lang.NullPointerException
        at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:52)
        at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
        at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
        at java.base/java.util.TimSort.sort(TimSort.java:220)
        at java.base/java.util.Arrays.sort(Arrays.java:1515)
        at java.base/java.util.ArrayList.sort(ArrayList.java:1750)
        at java.base/java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:392)
        at java.base/java.util.stream.Sink$ChainedReference.end(Sink.java:258)
        at java.base/java.util.stream.Sink$ChainedReference.end(Sink.java:258)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand.validateNoActiveImageTransfers(MaintenanceNumberOfVdssCommand.java:488)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand.validate(MaintenanceNumberOfVdssCommand.java:378)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.internalValidateInTransaction(CommandBase.java:824)
        at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:140)

A dump of the database will be attached to this BZ as well.

Comment 2 Andreas Bleischwitz 2022-06-02 10:28:41 UTC
Based on the discussion with Nir Soffer, I checked the image_transfers table of the engine-db:

# select phase,last_updated from image_transfers;
┌───────┬────────────────────────────┐
│ phase │        last_updated        │
├───────┼────────────────────────────┤
│     7 │ 2022-05-22 21:05:47.404+02 │
│     7 │ 2022-05-15 21:06:24.531+02 │
│     7 │ 2022-05-08 21:04:31.984+02 │
│     7 │ 2022-05-03 19:54:42.426+02 │
└───────┴────────────────────────────┘

I was also told that those entries should be deleted 15 minutes after the task completes (phase 7 is final). I deleted them and was then able to put my host into maintenance mode.

Comment 3 Nir Soffer 2022-06-02 10:31:46 UTC
Andreas, can you also attach an engine log showing the image transfers that you found
in the db? (grep the transfer id).

Comment 4 Andreas Bleischwitz 2022-06-02 10:51:39 UTC
Adding the logs from the image-transfers:

# select command_id, last_updated from image_transfers;
┌──────────────────────────────────────┬────────────────────────────┐
│              command_id              │        last_updated        │
├──────────────────────────────────────┼────────────────────────────┤
│ c29584ab-3314-4a19-ad60-c210342cf5cc │ 2022-05-22 21:05:47.404+02 │
│ c7669063-6faa-4e1f-9335-71d74e01e350 │ 2022-05-15 21:06:24.531+02 │
│ 3cacc8e9-b3e2-4063-a772-8a118d16f22b │ 2022-05-08 21:04:31.984+02 │
│ 49a9fa67-de6f-4eca-9f6b-db73895f6643 │ 2022-05-03 19:54:42.426+02 │
└──────────────────────────────────────┴────────────────────────────┘

# zgrep c29584ab-3314-4a19-ad60-c210342cf5cc engine.log-20220526.gz
2022-05-22 21:00:50,410+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-3) [f130e960-4b35-4099-845e-607ace11d6bc] Creating ImageTransfer entity for command 'c29584ab-3314-4a19-ad60-c210342cf5cc', proxyEnabled: true
2022-05-22 21:00:50,628+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-3) [f130e960-4b35-4099-845e-607ace11d6bc] Successfully added Download disk 'lb_Disk1' (disk id: '3f410b77-a29d-44d7-8782-90534f7b9e2c', image id: 'f921d642-02d0-456b-b103-a85eb2181a4d') for image transfer command 'c29584ab-3314-4a19-ad60-c210342cf5cc'
2022-05-22 21:00:52,858+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand] (default task-3) [f130e960-4b35-4099-845e-607ace11d6bc] START, AddImageTicketVDSCommand(HostName = rhev-node2.bleischwitz.org, AddImageTicketVDSCommandParameters:{hostId='d9f07c93-130d-49c6-b48d-a025c92422a6', ticketId='86009e2e-2b01-462b-9b12-95427815b1e8', timeout='300', operations='[read]', size='49392123904', url='file:///rhev/data-center/mnt/blockSD/19d4c5de-82f2-4c7b-9419-e064ca6b7749/images/3f410b77-a29d-44d7-8782-90534f7b9e2c/f921d642-02d0-456b-b103-a85eb2181a4d', filename='null', sparse='false', transferId='c29584ab-3314-4a19-ad60-c210342cf5cc', dirty='false'}), log id: 38aa02b4
2022-05-22 21:00:52,899+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-3) [f130e960-4b35-4099-845e-607ace11d6bc] Started transfer session with transfer id c29584ab-3314-4a19-ad60-c210342cf5cc, timeout 300 seconds
2022-05-22 21:00:52,922+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-3) [f130e960-4b35-4099-845e-607ace11d6bc] Updating image transfer c29584ab-3314-4a19-ad60-c210342cf5cc phase to Transferring
2022-05-22 21:00:52,925+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-3) [f130e960-4b35-4099-845e-607ace11d6bc] Returning from proceedCommandExecution after starting transfer session for image transfer command 'c29584ab-3314-4a19-ad60-c210342cf5cc'
2022-05-22 21:04:06,844+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-67) [f130e960-4b35-4099-845e-607ace11d6bc] Updating image transfer c29584ab-3314-4a19-ad60-c210342cf5cc phase to Cancelled
2022-05-22 21:04:16,858+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-64) [f130e960-4b35-4099-845e-607ace11d6bc] Updating image transfer c29584ab-3314-4a19-ad60-c210342cf5cc phase to Finalizing Failure
2022-05-22 21:04:27,995+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [f130e960-4b35-4099-845e-607ace11d6bc] Failed to transfer disk 'f921d642-02d0-456b-b103-a85eb2181a4d' (command id 'c29584ab-3314-4a19-ad60-c210342cf5cc')
2022-05-22 21:04:27,998+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [f130e960-4b35-4099-845e-607ace11d6bc] Updating image transfer c29584ab-3314-4a19-ad60-c210342cf5cc phase to Finished Failure
2022-05-22 21:05:47,307+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-3) [4fc1f025-4334-43e9-898a-b993e00e3e94] Updating image transfer c29584ab-3314-4a19-ad60-c210342cf5cc phase to Finalizing Success
2022-05-22 21:05:47,404+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-3) [ece84046-ec0f-45bc-9648-66d5a99d0bef] Updating image transfer c29584ab-3314-4a19-ad60-c210342cf5cc phase to Finalizing Success


# zgrep c7669063-6faa-4e1f-9335-71d74e01e350 engine.log-20220519.gz
2022-05-15 21:00:56,331+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-64) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Creating ImageTransfer entity for command 'c7669063-6faa-4e1f-9335-71d74e01e350', proxyEnabled: true
2022-05-15 21:00:57,128+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-64) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Successfully added Download disk 'lb_Disk1' (disk id: '3f410b77-a29d-44d7-8782-90534f7b9e2c', image id: 'f921d642-02d0-456b-b103-a85eb2181a4d') for image transfer command 'c7669063-6faa-4e1f-9335-71d74e01e350'
2022-05-15 21:00:59,209+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand] (default task-64) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] START, AddImageTicketVDSCommand(HostName = rhev-node2.bleischwitz.org, AddImageTicketVDSCommandParameters:{hostId='d9f07c93-130d-49c6-b48d-a025c92422a6', ticketId='fcb83ed8-df96-4497-9e08-8d8568880281', timeout='300', operations='[read]', size='45097156608', url='file:///rhev/data-center/mnt/blockSD/19d4c5de-82f2-4c7b-9419-e064ca6b7749/images/3f410b77-a29d-44d7-8782-90534f7b9e2c/f921d642-02d0-456b-b103-a85eb2181a4d', filename='null', sparse='false', transferId='c7669063-6faa-4e1f-9335-71d74e01e350', dirty='false'}), log id: 7f1f8379
2022-05-15 21:00:59,228+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-64) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Started transfer session with transfer id c7669063-6faa-4e1f-9335-71d74e01e350, timeout 300 seconds
2022-05-15 21:00:59,250+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-64) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Updating image transfer c7669063-6faa-4e1f-9335-71d74e01e350 phase to Transferring
2022-05-15 21:00:59,254+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-64) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Returning from proceedCommandExecution after starting transfer session for image transfer command 'c7669063-6faa-4e1f-9335-71d74e01e350'
2022-05-15 21:06:02,096+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-55) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Updating image transfer c7669063-6faa-4e1f-9335-71d74e01e350 phase to Cancelled
2022-05-15 21:06:12,108+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-27) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Updating image transfer c7669063-6faa-4e1f-9335-71d74e01e350 phase to Finalizing Failure
2022-05-15 21:06:23,208+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-32) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Failed to transfer disk 'f921d642-02d0-456b-b103-a85eb2181a4d' (command id 'c7669063-6faa-4e1f-9335-71d74e01e350')
2022-05-15 21:06:23,210+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-32) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Updating image transfer c7669063-6faa-4e1f-9335-71d74e01e350 phase to Finished Failure
2022-05-15 21:06:24,465+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-64) [9679edb3-59d7-4b65-955d-fdb4dd08aa0b] Updating image transfer c7669063-6faa-4e1f-9335-71d74e01e350 phase to Finalizing Success
2022-05-15 21:06:24,531+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-64) [a1749af0-2190-435a-b403-7f3ee11cf60e] Updating image transfer c7669063-6faa-4e1f-9335-71d74e01e350 phase to Finalizing Success

Adding the full engine log, covering at least two of the transfers, to the case.

As I'm not aware of triggering any image transfer manually, could oVirt-CSI be involved? I did enable it for the OpenShift cluster running on this RHV.

Comment 6 Benny Zlotnik 2022-06-02 12:11:58 UTC
Hi,

Was this a clean 4.5 install? Or upgraded from 4.4?
If upgraded, were these transfers done in 4.4?

Comment 7 Andreas Bleischwitz 2022-06-02 12:58:49 UTC
This was an upgrade from 4.4, and I did the upgrade to 4.5 just this week - so the transfers were probably started on 4.4.

Comment 8 Benny Zlotnik 2022-06-02 13:11:33 UTC
Thanks, the NPE should be fixed for new transfers since bug 2043984 was fixed, because we always set the disk_id since 4.5.0; it is missing in the provided db dump.
The other issue is that the client is allowed to change the transfer's phase from finalizing_failure to finalizing_success.
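
For context, the stack trace in the description dies inside a natural-order sort: as soon as one element of the sorted stream is null, Comparators$NaturalOrderComparator throws. The following is a minimal, self-contained reproduction of that failure mode, not the actual engine code; the disk aliases and the "<unknown disk>" fallback are illustrative assumptions.

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Minimal reproduction of the failure mode in the stack trace above; this is
// not the actual engine code.
public class NullSortRepro {
    public static void main(String[] args) {
        // Hypothetical disk aliases resolved from image_transfers rows; a row
        // created before 4.5.0 has no disk_id, so its alias resolves to null.
        List<String> diskAliases = Arrays.asList("lb_Disk1", null);

        // This mirrors the failing pipeline and throws NullPointerException
        // inside Comparators$NaturalOrderComparator.compare:
        // diskAliases.stream().sorted().collect(Collectors.toList());

        // A null-safe variant reports the unknown disk instead of crashing:
        List<String> sorted = diskAliases.stream()
                .map(alias -> alias == null ? "<unknown disk>" : alias)
                .sorted(Comparator.naturalOrder())
                .collect(Collectors.toList());
        System.out.println(sorted); // prints [<unknown disk>, lb_Disk1]
    }
}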

Comment 9 Benny Zlotnik 2022-06-07 14:06:11 UTC
To follow up, we have two issues to address:
1. Not letting clients set invalid phases for the transfer (i.e. moving from FINISHED_FAILURE to FINALIZING_SUCCESS) - see the sketch below
2. Blocking the upgrade of the engine if there are active transfers; it makes little sense to carry a transfer started in one version over to a newer one, especially given that it may not be possible to continue it (if parameters changed, for example)
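
As a rough illustration of item 1 (the draft PR "core: validate transfer phase" linked above is the actual fix), the sketch below shows one way a phase-transition guard could reject the FINISHED_FAILURE -> FINALIZING_SUCCESS jump seen in the engine log. The Phase enum values and the transition table are assumptions for illustration, not oVirt's real ImageTransferPhase definitions.

import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: the phase names and the transition table are
// illustrative assumptions, not the engine's actual ImageTransferPhase logic.
public class TransferPhaseGuard {

    enum Phase {
        TRANSFERRING, CANCELLED,
        FINALIZING_SUCCESS, FINALIZING_FAILURE,
        FINISHED_SUCCESS, FINISHED_FAILURE
    }

    // Transitions a client request would be allowed to make; terminal phases
    // (FINISHED_*) accept none, so a finished transfer can never be "revived".
    private static final Map<Phase, Set<Phase>> ALLOWED = Map.of(
            Phase.TRANSFERRING, EnumSet.of(Phase.CANCELLED, Phase.FINALIZING_SUCCESS, Phase.FINALIZING_FAILURE),
            Phase.CANCELLED, EnumSet.of(Phase.FINALIZING_FAILURE),
            Phase.FINALIZING_SUCCESS, EnumSet.of(Phase.FINISHED_SUCCESS),
            Phase.FINALIZING_FAILURE, EnumSet.of(Phase.FINISHED_FAILURE),
            Phase.FINISHED_SUCCESS, EnumSet.noneOf(Phase.class),
            Phase.FINISHED_FAILURE, EnumSet.noneOf(Phase.class));

    static boolean isValidTransition(Phase current, Phase requested) {
        return ALLOWED.getOrDefault(current, Set.of()).contains(requested);
    }

    public static void main(String[] args) {
        // The jump seen in this bug's engine log would be rejected:
        System.out.println(isValidTransition(Phase.FINISHED_FAILURE, Phase.FINALIZING_SUCCESS)); // false
    }
}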

Comment 10 Casper (RHV QE bot) 2022-09-13 19:31:16 UTC
This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align the relevant severity, flags and keywords to raise PM_Score, or use one of the Bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords to raise its PM_Score above the verification threshold (1000).

Comment 11 Casper (RHV QE bot) 2022-09-19 14:31:42 UTC
This bug has low overall severity, passed an automated regression suite, and is not going to be further verified by QE. If you believe special care is required, feel free to re-open to ON_QA status.

