Created attachment 1886027 [details]
Error reported by UI

Description of problem:
Unable to enable maintenance mode for any host in the cluster after updating to RHV-4.5.0.7-0.9.el8ev.

Version-Release number of selected component (if applicable):
RHV-4.5.0.7-0.9.el8ev
Hosts with RHEL-8.5 and RHEL-8.6 - both suffer from the same issue.

How reproducible:
Always

Steps to Reproduce:
1. Select a host, click on "Management -> Maintenance"
2. The dialog shown in the attachment appears

Actual results:
Maintenance mode fails

Expected results:
Maintenance mode activates

Additional info:
Two different log entries for my hosts:

Host rhev-node1:

2022-06-02 12:02:39,812+02 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-2) [e20adb84-d43c-4b47-8d39-6587c0138661] Lock Acquired to object 'EngineLock:{exclusiveLocks='', sharedLocks='[d50e06b8-18e8-407b-b0d4-e2bab8d60480=POOL]'}'
2022-06-02 12:02:40,005+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [e20adb84-d43c-4b47-8d39-6587c0138661] EVENT_ID: GENERIC_ERROR_MESSAGE(14,001), Cannot switch Host rhev-node1.bleischwitz.org to Maintenance mode. Image transfer is in progress for the following (1) disks: null Please wait for the operations to complete and try again.
2022-06-02 12:02:40,005+02 WARN  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-2) [e20adb84-d43c-4b47-8d39-6587c0138661] Validation of action 'MaintenanceNumberOfVdss' failed for user andreas@openidc-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_HOST_WITH_RUNNING_IMAGE_TRANSFERS,$host rhev-node1.bleischwitz.org,$disks null,$disks_COUNTER 1
2022-06-02 12:02:40,006+02 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-2) [e20adb84-d43c-4b47-8d39-6587c0138661] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[d50e06b8-18e8-407b-b0d4-e2bab8d60480=POOL]'}'

Host rhev-node2:

2022-06-02 11:06:13,293+02 ERROR [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-26) [3d058938-d72a-4416-9bdb-c2c40765e4bf] Error during ValidateFailure.: java.lang.NullPointerException
        at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:52)
        at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
        at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
        at java.base/java.util.TimSort.sort(TimSort.java:220)
        at java.base/java.util.Arrays.sort(Arrays.java:1515)
        at java.base/java.util.ArrayList.sort(ArrayList.java:1750)
        at java.base/java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:392)
        at java.base/java.util.stream.Sink$ChainedReference.end(Sink.java:258)
        at java.base/java.util.stream.Sink$ChainedReference.end(Sink.java:258)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand.validateNoActiveImageTransfers(MaintenanceNumberOfVdssCommand.java:488)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand.validate(MaintenanceNumberOfVdssCommand.java:378)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.internalValidateInTransaction(CommandBase.java:824)
        at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:140)

Dump of database will be attached to this BZ as well.
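Editor's note: the stack trace above boils down to sorting disk names that can be null - a stale transfer row that no longer references a disk resolves to a null name, and natural-order sorting then throws. A minimal, self-contained sketch of that failure mode follows; the class name and the disk names are hypothetical, only the JDK behaviour matches the trace.

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical reproduction of the NullPointerException pattern in the trace above:
// sorting a stream with natural ordering fails as soon as one element is null, which
// is what happens when a stale image_transfers row resolves to a null disk name.
public class NullSortExample {
    public static void main(String[] args) {
        List<String> diskNames = Arrays.asList("lb_Disk1", null);

        // Comparators$NaturalOrderComparator.compare(null, "lb_Disk1") throws the NPE
        // seen in validateNoActiveImageTransfers during the maintenance validation.
        String joined = diskNames.stream()
                .sorted()
                .collect(Collectors.joining(", "));

        System.out.println(joined);
    }
}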
Based on the discussion with Nir Soffer, I checked the image_transfers table of the engine-db:

# select phase,last_updated from image_transfers;
┌───────┬────────────────────────────┐
│ phase │        last_updated        │
├───────┼────────────────────────────┤
│     7 │ 2022-05-22 21:05:47.404+02 │
│     7 │ 2022-05-15 21:06:24.531+02 │
│     7 │ 2022-05-08 21:04:31.984+02 │
│     7 │ 2022-05-03 19:54:42.426+02 │
└───────┴────────────────────────────┘

I was also told that those entries should get deleted 15 minutes after the task completed (phase 7 is final). I deleted those and was able to set my host into maintenance mode.
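Editor's note: for reference, a minimal sketch of the same manual cleanup, assuming direct PostgreSQL access to the engine database. The connection details are placeholders; only the image_transfers table, its phase and last_updated columns, phase 7 being final, and the 15-minute cleanup window come from the comment above. This is not a supported procedure - back up the database first.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Hypothetical sketch of the manual cleanup described above: remove finished
// (phase 7) image transfers that the engine should have cleaned up 15 minutes
// after completion. The JDBC URL and credentials are placeholders.
public class StaleImageTransferCleanup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/engine", "engine", "secret")) {
            String sql = "DELETE FROM image_transfers"
                    + " WHERE phase = 7"
                    + " AND last_updated < now() - interval '15 minutes'";
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                int deleted = stmt.executeUpdate();
                System.out.println("Removed " + deleted + " stale image transfer rows");
            }
        }
    }
}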
Andreas, can you also attach the engine log showing the image transfers that you found in the db? (grep for the transfer id).
Adding the logs from the image transfers:

# select command_id, last_updated from image_transfers;
┌──────────────────────────────────────┬────────────────────────────┐
│              command_id              │        last_updated        │
├──────────────────────────────────────┼────────────────────────────┤
│ c29584ab-3314-4a19-ad60-c210342cf5cc │ 2022-05-22 21:05:47.404+02 │
│ c7669063-6faa-4e1f-9335-71d74e01e350 │ 2022-05-15 21:06:24.531+02 │
│ 3cacc8e9-b3e2-4063-a772-8a118d16f22b │ 2022-05-08 21:04:31.984+02 │
│ 49a9fa67-de6f-4eca-9f6b-db73895f6643 │ 2022-05-03 19:54:42.426+02 │
└──────────────────────────────────────┴────────────────────────────┘

# zgrep c29584ab-3314-4a19-ad60-c210342cf5cc engine.log-20220526.gz
2022-05-22 21:00:50,410+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-3) [f130e960-4b35-4099-845e-607ace11d6bc] Creating ImageTransfer entity for command 'c29584ab-3314-4a19-ad60-c210342cf5cc', proxyEnabled: true
2022-05-22 21:00:50,628+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-3) [f130e960-4b35-4099-845e-607ace11d6bc] Successfully added Download disk 'lb_Disk1' (disk id: '3f410b77-a29d-44d7-8782-90534f7b9e2c', image id: 'f921d642-02d0-456b-b103-a85eb2181a4d') for image transfer command 'c29584ab-3314-4a19-ad60-c210342cf5cc'
2022-05-22 21:00:52,858+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand] (default task-3) [f130e960-4b35-4099-845e-607ace11d6bc] START, AddImageTicketVDSCommand(HostName = rhev-node2.bleischwitz.org, AddImageTicketVDSCommandParameters:{hostId='d9f07c93-130d-49c6-b48d-a025c92422a6', ticketId='86009e2e-2b01-462b-9b12-95427815b1e8', timeout='300', operations='[read]', size='49392123904', url='file:///rhev/data-center/mnt/blockSD/19d4c5de-82f2-4c7b-9419-e064ca6b7749/images/3f410b77-a29d-44d7-8782-90534f7b9e2c/f921d642-02d0-456b-b103-a85eb2181a4d', filename='null', sparse='false', transferId='c29584ab-3314-4a19-ad60-c210342cf5cc', dirty='false'}), log id: 38aa02b4
2022-05-22 21:00:52,899+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-3) [f130e960-4b35-4099-845e-607ace11d6bc] Started transfer session with transfer id c29584ab-3314-4a19-ad60-c210342cf5cc, timeout 300 seconds
2022-05-22 21:00:52,922+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-3) [f130e960-4b35-4099-845e-607ace11d6bc] Updating image transfer c29584ab-3314-4a19-ad60-c210342cf5cc phase to Transferring
2022-05-22 21:00:52,925+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-3) [f130e960-4b35-4099-845e-607ace11d6bc] Returning from proceedCommandExecution after starting transfer session for image transfer command 'c29584ab-3314-4a19-ad60-c210342cf5cc'
2022-05-22 21:04:06,844+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-67) [f130e960-4b35-4099-845e-607ace11d6bc] Updating image transfer c29584ab-3314-4a19-ad60-c210342cf5cc phase to Cancelled
2022-05-22 21:04:16,858+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-64) [f130e960-4b35-4099-845e-607ace11d6bc] Updating image transfer c29584ab-3314-4a19-ad60-c210342cf5cc phase to Finalizing Failure
2022-05-22 21:04:27,995+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [f130e960-4b35-4099-845e-607ace11d6bc] Failed to transfer disk 'f921d642-02d0-456b-b103-a85eb2181a4d' (command id 'c29584ab-3314-4a19-ad60-c210342cf5cc')
2022-05-22 21:04:27,998+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [f130e960-4b35-4099-845e-607ace11d6bc] Updating image transfer c29584ab-3314-4a19-ad60-c210342cf5cc phase to Finished Failure
2022-05-22 21:05:47,307+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-3) [4fc1f025-4334-43e9-898a-b993e00e3e94] Updating image transfer c29584ab-3314-4a19-ad60-c210342cf5cc phase to Finalizing Success
2022-05-22 21:05:47,404+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-3) [ece84046-ec0f-45bc-9648-66d5a99d0bef] Updating image transfer c29584ab-3314-4a19-ad60-c210342cf5cc phase to Finalizing Success

# zgrep c7669063-6faa-4e1f-9335-71d74e01e350 engine.log-20220519.gz
2022-05-15 21:00:56,331+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-64) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Creating ImageTransfer entity for command 'c7669063-6faa-4e1f-9335-71d74e01e350', proxyEnabled: true
2022-05-15 21:00:57,128+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-64) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Successfully added Download disk 'lb_Disk1' (disk id: '3f410b77-a29d-44d7-8782-90534f7b9e2c', image id: 'f921d642-02d0-456b-b103-a85eb2181a4d') for image transfer command 'c7669063-6faa-4e1f-9335-71d74e01e350'
2022-05-15 21:00:59,209+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.AddImageTicketVDSCommand] (default task-64) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] START, AddImageTicketVDSCommand(HostName = rhev-node2.bleischwitz.org, AddImageTicketVDSCommandParameters:{hostId='d9f07c93-130d-49c6-b48d-a025c92422a6', ticketId='fcb83ed8-df96-4497-9e08-8d8568880281', timeout='300', operations='[read]', size='45097156608', url='file:///rhev/data-center/mnt/blockSD/19d4c5de-82f2-4c7b-9419-e064ca6b7749/images/3f410b77-a29d-44d7-8782-90534f7b9e2c/f921d642-02d0-456b-b103-a85eb2181a4d', filename='null', sparse='false', transferId='c7669063-6faa-4e1f-9335-71d74e01e350', dirty='false'}), log id: 7f1f8379
2022-05-15 21:00:59,228+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-64) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Started transfer session with transfer id c7669063-6faa-4e1f-9335-71d74e01e350, timeout 300 seconds
2022-05-15 21:00:59,250+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-64) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Updating image transfer c7669063-6faa-4e1f-9335-71d74e01e350 phase to Transferring
2022-05-15 21:00:59,254+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (default task-64) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Returning from proceedCommandExecution after starting transfer session for image transfer command 'c7669063-6faa-4e1f-9335-71d74e01e350'
2022-05-15 21:06:02,096+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-55) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Updating image transfer c7669063-6faa-4e1f-9335-71d74e01e350 phase to Cancelled
2022-05-15 21:06:12,108+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-27) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Updating image transfer c7669063-6faa-4e1f-9335-71d74e01e350 phase to Finalizing Failure
2022-05-15 21:06:23,208+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-32) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Failed to transfer disk 'f921d642-02d0-456b-b103-a85eb2181a4d' (command id 'c7669063-6faa-4e1f-9335-71d74e01e350')
2022-05-15 21:06:23,210+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-32) [e480ef89-4dd3-4a18-8bcc-c6dfecafcc2f] Updating image transfer c7669063-6faa-4e1f-9335-71d74e01e350 phase to Finished Failure
2022-05-15 21:06:24,465+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-64) [9679edb3-59d7-4b65-955d-fdb4dd08aa0b] Updating image transfer c7669063-6faa-4e1f-9335-71d74e01e350 phase to Finalizing Success
2022-05-15 21:06:24,531+02 INFO  [org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater] (default task-64) [a1749af0-2190-435a-b403-7f3ee11cf60e] Updating image transfer c7669063-6faa-4e1f-9335-71d74e01e350 phase to Finalizing Success

Adding the full engine log covering at least two transfers to the case. As I'm not aware of having triggered any image transfer manually, could oVirt-CSI be involved? I did enable it for the OpenShift cluster running on this RHV.
Hi,

Was this a clean 4.5 install, or upgraded from 4.4?
If upgraded, were these transfers done in 4.4?
This was an upgrade from 4.4, and I did the upgrade to 4.5 just this week - so the transfers were probably started on 4.4.
Thanks, the NPE should be fixed for new transfers since bug 2043984 was fixed, because we always set the disk_id since 4.5.0; it is missing in the provided db dump.

The other issue is allowing the client to change the transfer's phase from finalizing_failure to finalizing_success.
To follow up, we have two open issues:
1. Not letting clients set invalid phases for the transfer (e.g. moving from FINISHED_FAILED to FINALIZING_SUCCESS)
2. Blocking the upgrade of the engine if there are active transfers; it makes little sense to move a transfer started in one version to a newer version, especially given that it may not be possible to continue it (if parameters changed, for example)
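Editor's note: to make item 1 above concrete, a minimal sketch of a server-side guard on phase changes follows. The enum values and method names are illustrative assumptions, not the engine's actual API; only the FINISHED_FAILED -> FINALIZING_SUCCESS example comes from the list above.

import java.util.EnumSet;
import java.util.Set;

// Illustrative guard for follow-up item 1: reject client-requested phase changes
// once a transfer has reached a terminal phase. The enum and method names are
// hypothetical placeholders.
public class ImageTransferPhaseGuard {

    enum Phase { TRANSFERRING, CANCELLED, FINALIZING_SUCCESS, FINALIZING_FAILURE,
                 FINISHED_SUCCESS, FINISHED_FAILED }

    // Terminal phases from which no further client-driven transition should be allowed.
    private static final Set<Phase> TERMINAL =
            EnumSet.of(Phase.FINISHED_SUCCESS, Phase.FINISHED_FAILED);

    static boolean isValidTransition(Phase current, Phase requested) {
        // Once the transfer is finished, any requested phase (e.g. FINALIZING_SUCCESS)
        // must be rejected instead of silently overwriting the final state. A fuller
        // implementation would also consult a per-phase transition table for 'requested'.
        return !TERMINAL.contains(current);
    }

    public static void main(String[] args) {
        System.out.println(isValidTransition(Phase.FINISHED_FAILED, Phase.FINALIZING_SUCCESS)); // false
        System.out.println(isValidTransition(Phase.TRANSFERRING, Phase.CANCELLED));             // true
    }
}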
This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align relevant severity, flags and keywords to raise PM_Score, or use one of the Bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords to raise its PM_Score above the verification threshold (1000).
This bug has low overall severity, passed an automated regression suite, and is not going to be further verified by QE. If you believe special care is required, feel free to re-open to ON_QA status.