Bug 2008898 - failed to move host to maintenance exit on error : "Image transfer is in progress" although no operations are in progress
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Nobody
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-29 12:58 UTC by Tzahi Ashkenazi
Modified: 2021-10-03 16:37 UTC
CC: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-03 16:37:57 UTC
oVirt Team: Storage
Target Upstream Version:


Attachments
host (292.64 KB, image/png), 2021-09-29 12:58 UTC, Tzahi Ashkenazi


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43721 0 None None None 2021-09-29 12:59:08 UTC

Description Tzahi Ashkenazi 2021-09-29 12:58:19 UTC
Created attachment 1827348 [details]
host


Description of problem:
While trying to move the last host of a 10-host cluster to maintenance,
the operation failed with an error related to "Image transfer is in progress",
although no image transfers are in progress and all the disks are marked with status OK.

The error from the UI:

Error while executing action: Cannot switch Host f16-h33-000-r640.rdu2.scalelab.redhat.com to Maintenance mode. Image transfer is in progress for the following (13) disks:

c32314ab-77d4-40ec-9962-eb53b9f90f58,
d71c52c0-4154-4070-a0f8-b51cca2ce0c7,
ea1a1782-dac9-4409-9a8d-0d737a213ffe,
f4e6933e-f360-4c60-90be-51e42cf6c027,
f4e6933e-f360-4c60-90be-51e42cf6c027,
...

Please wait for the operations to complete and try again.


Version-Release number of selected component (if applicable):
rhv-release-4.4.8-5-001.noarch


From the engine log:
2021-09-29 08:54:40,267-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-947) [e0255ab2-e7b0-4ffb-8b03-8109e5cc3963] EVENT_ID: GENERIC_ERROR_MESSAGE(14,001), Cannot switch Host f16-h33-000-r640.rdu2.scalelab.redhat.com to Maintenance mode. Image transfer is in progress for the following (13) disks:

 ${disks}

Please wait for the operations to complete and try again.,$disks 	c32314ab-77d4-40ec-9962-eb53b9f90f58,
	d71c52c0-4154-4070-a0f8-b51cca2ce0c7,
	ea1a1782-dac9-4409-9a8d-0d737a213ffe,
	f4e6933e-f360-4c60-90be-51e42cf6c027,
	f4e6933e-f360-4c60-90be-51e42cf6c027,
	...
2021-09-29 08:54:40,267-04 WARN  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-947) [e0255ab2-e7b0-4ffb-8b03-8109e5cc3963] Validation of action 'MaintenanceNumberOfVdss' failed for user admin@internal-authz. Reasons: VAR__TYPE__HOST,VAR__ACTION__MAINTENANCE,VDS_CANNOT_MAINTENANCE_HOST_WITH_RUNNING_IMAGE_TRANSFERS,$host f16-h33-000-r640.rdu2.scalelab.redhat.com,$disks 	c32314ab-77d4-40ec-9962-eb53b9f90f58,
	d71c52c0-4154-4070-a0f8-b51cca2ce0c7,
	ea1a1782-dac9-4409-9a8d-0d737a213ffe,
	f4e6933e-f360-4c60-90be-51e42cf6c027,
	f4e6933e-f360-4c60-90be-51e42cf6c027,
	...,$disks_COUNTER 13
2021-09-29 08:54:40,267-04 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (default task-947) [e0255ab2-e7b0-4ffb-8b03-8109e5cc3963] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[94b5e416-efef-41dc-949a-c4df7c78d63f=POOL]'}'
2021-09-29 08:54:42,918-04 INFO  [org.ovirt.engine.core.sso.service.AuthenticationService] (default task-947) [] User admin@internal-authz with profile [internal] successfully logged in with scopes: ovirt-app-api ovirt-ext=token-info:authz-search ovirt-ext=token-info:public-authz-search ovirt-ext=token-info:validate ovirt-ext=token:password-access
2021-09-29 08:54:42,930-04 INFO  [org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default task-946) [56f33456] Running command: CreateUserSessionCommand internal: false.
2021-09-29 08:54:42,937-04 INFO  [org.ovirt.engine.core.bll.aaa.LogoutSessionCommand] (default task-946) [4eb58174] Running command: LogoutSessionCommand internal: false.

P.S.

1. Nine hosts in the same cluster (L0_Group_4) were moved to maintenance successfully and then removed from the UI without any issues.
2. The severity is set to high because there is no workaround to bypass the message and put the host into maintenance.

The vdsm logs and the engine log can be found here:
https://drive.google.com/drive/folders/1pOouooLC8kD9M8JGmLsDSyxZQ0dKsLwj?usp=sharing

Comment 1 Eyal Shenitzky 2021-09-29 14:13:33 UTC
This bug is the same as bug 1987295, which should be fixed in 4.4.8-5.
Can you confirm the engine version?

Comment 2 Tzahi Ashkenazi 2021-09-29 14:18:45 UTC
Engine version:
  ovirt-engine-4.4.8.4-0.7.el8ev.noarch

Comment 3 Tzahi Ashkenazi 2021-09-29 14:51:14 UTC
Is there any workaround to bypass the message and put the host into maintenance?

Comment 4 Eyal Shenitzky 2021-09-30 07:00:03 UTC
(In reply to Tzahi Ashkenazi from comment #3)
> any workaround to by pass the message and put the host to maintenance ??

Yes, you just need to wait a while (15 minutes at most) until the DB cleaner thread removes the image_transfer entity from the DB.
Or, you can remove it manually.
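The manual removal mentioned above could look like the following sketch. This is an illustration only, not an official procedure: the table and column names are taken from the queries quoted later in this report, and the host-name lookup via vds_static is an assumption. Back up the engine DB before deleting anything.

```sql
-- Sketch only (tables/columns as seen elsewhere in this report; not an
-- official cleanup procedure). First inspect the stale transfer rows for
-- the affected host, then remove them.
SELECT command_id, disk_id, phase, last_updated
  FROM image_transfers
 WHERE vds_id = (SELECT vds_id FROM vds_static
                 WHERE vds_name = 'f16-h33-000-r640.rdu2.scalelab.redhat.com');

DELETE FROM image_transfers
 WHERE vds_id = (SELECT vds_id FROM vds_static
                 WHERE vds_name = 'f16-h33-000-r640.rdu2.scalelab.redhat.com');
```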

Closing as a duplicate of bug 1987295.

*** This bug has been marked as a duplicate of bug 1987295 ***

Comment 5 Tzahi Ashkenazi 2021-10-03 12:50:28 UTC
Manual intervention was required in order to move the host to maintenance; the recommendation (by Eyal) to wait 15 minutes did not work in this case.
The workaround on the engine DB:

select vds_id from vds_static where vds_name='f16-h33-000-r640.rdu2.scalelab.redhat.com';  
select status from vds_dynamic where vds_id='060f71f9-ebdf-4e04-8fe5-8ac538172b12';  
update vds_dynamic set status=2 where vds_id='060f71f9-ebdf-4e04-8fe5-8ac538172b12';  

P.S.
status=2 is Maintenance.
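For convenience, the three statements above can be collapsed into a single statement (a sketch using the same tables and values quoted above; as with any direct DB change, back up the engine DB first):

```sql
-- One-step form of the workaround above: look up the host's vds_id by name
-- and set its dynamic status to 2 (Maintenance).
UPDATE vds_dynamic
   SET status = 2
 WHERE vds_id = (SELECT vds_id
                   FROM vds_static
                  WHERE vds_name = 'f16-h33-000-r640.rdu2.scalelab.redhat.com');
```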

Comment 6 Eyal Shenitzky 2021-10-03 13:27:55 UTC
(In reply to Tzahi Ashkenazi from comment #5)
> manual intervention was required in order to move the host to maintenance
> the recommendation to wait 15 min didn't work in this case ( by Eyal )
> the workaround on the engine DB : 
> 
> select vds_id from vds_static where
> vds_name='f16-h33-000-r640.rdu2.scalelab.redhat.com';  
> select status from vds_dynamic where
> vds_id='060f71f9-ebdf-4e04-8fe5-8ac538172b12';  
> update vds_dynamic set status=2 where
> vds_id='060f71f9-ebdf-4e04-8fe5-8ac538172b12';  
> 
> p.s 
> status=2 is maintenance

That's strange; this version should have the DbEntityCleanupManager thread, which cleans up finished/failed image transfer sessions.
Can you check the DB for any existing Image transfer entity?

select * from image_transfers;

And add the engine logs.

Comment 7 Tzahi Ashkenazi 2021-10-03 13:58:12 UTC
engine=# select * from image_transfers;

All 13 rows share the same values for the remaining columns: command_type=1024, phase=7, message=(empty), vds_id=060f71f9-ebdf-4e04-8fe5-8ac538172b12, imaged_ticket_id=(empty), proxy_uri=https://rhev-red-01.rdu2.scalelab.redhat.com:54323/images, bytes_total=53687091200, type=1, daemon_uri=https://f16-h33-000-r640.rdu2.scalelab.redhat.com:54322/images, client_inactivity_timeout=60, image_format=5, backend=1, backup_id=(empty), client_type=2, shallow=f, timeout_policy=legacy.

              command_id              |        last_updated        |               disk_id                | bytes_sent  | active
--------------------------------------+----------------------------+--------------------------------------+-------------+--------
 910ed2c3-44ca-4502-851f-6b065113f650 | 2021-08-24 07:27:25.582+00 | 6134bbd3-51c6-4b74-96a8-b091d4e5c5ea | 19537068032 | t
 65e8a5e9-266c-4533-ab9f-e176c2bf3041 | 2021-08-24 07:27:22.707+00 | d71c52c0-4154-4070-a0f8-b51cca2ce0c7 | 10687086592 | t
 bad05882-3795-4350-8922-acd54dcee6c8 | 2021-08-23 10:31:38.583+00 | 73b80669-c21f-403f-8496-ad2d99d11c27 | 19998441472 | t
 a8b7395d-87bb-4591-983c-17fa8f562b32 | 2021-08-23 10:31:38.979+00 | f4e6933e-f360-4c60-90be-51e42cf6c027 | 50843353088 | t
 f7b07d64-202d-49e8-a7c8-b980ce0b18bc | 2021-08-23 10:31:34.927+00 | 59850072-f493-440f-8724-8568583ba1e9 | 34191966208 | t
 aa020bcb-ac75-4c7b-b051-4534842044e1 | 2021-08-23 10:31:39.213+00 | 3dd79e70-bf55-4a39-b7b7-f4e439e54c80 | 23077060608 | t
 66736bc1-5406-4fd8-bbcd-99260a6453a2 | 2021-08-23 10:31:39.258+00 | 6134bbd3-51c6-4b74-96a8-b091d4e5c5ea | 21760049152 | t
 4e204816-eeac-464d-b44b-d36260b22dee | 2021-08-23 10:31:35.11+00  | ea1a1782-dac9-4409-9a8d-0d737a213ffe | 42840621056 | t
 ce36488f-5b12-4a8a-92ca-30cff34bfee3 | 2021-08-23 10:31:36.47+00  | c32314ab-77d4-40ec-9962-eb53b9f90f58 | 26508001280 | t
 b43856fe-2a98-40e4-a976-c8374ddbeb3e | 2021-08-24 07:27:21.666+00 | 73b80669-c21f-403f-8496-ad2d99d11c27 | 20887633920 | t
 b6771f89-b358-4c30-99bc-f3b129a04eb8 | 2021-08-24 07:27:23.1+00   | f4e6933e-f360-4c60-90be-51e42cf6c027 | 42991616000 | t
 d77894e6-7179-4ad7-8fd0-a364b8e7a624 | 2021-08-23 12:36:31.868+00 | 59850072-f493-440f-8724-8568583ba1e9 | 53687091200 | f
 7f2d7f41-5a16-4e2a-9b89-b3c0c9db5261 | 2021-08-24 07:36:43.474+00 | 348fcbb4-4dd0-4e62-9428-f935a9e6e037 | 26667384832 | t
(13 rows)

Comment 8 Eyal Shenitzky 2021-10-03 16:37:57 UTC
It seems like your environment still has ongoing image transfer sessions.

We can see that all the image transfers are in phase 7, which is FINALIZING_SUCCESS.

The host can be set to maintenance only when the phase is PAUSED_SYSTEM (4), PAUSED_USER (5), FINISHED_SUCCESS (9), or FINISHED_FAILURE (10).
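Given the phase values listed above, the transfers that still block maintenance can be listed with a query like the following sketch (the phase-number mapping is taken from this comment; the vds_id is the one quoted earlier in this report):

```sql
-- Sketch: list transfers that still block maintenance for this host,
-- i.e. whose phase is NOT one of PAUSED_SYSTEM=4, PAUSED_USER=5,
-- FINISHED_SUCCESS=9, FINISHED_FAILURE=10.
SELECT command_id, disk_id, phase, last_updated
  FROM image_transfers
 WHERE vds_id = '060f71f9-ebdf-4e04-8fe5-8ac538172b12'
   AND phase NOT IN (4, 5, 9, 10);
```

In this report all 13 rows have phase 7, so all of them would match such a query, which is consistent with the validation error.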

I am not sure what caused them to stay in that phase, but the behavior reported in this bug is the expected one.

