Bug 1780910 - Using "Ignore OVF update failure" on maintenance puts SD into Inactive state
Summary: Using "Ignore OVF update failure" on maintenance puts SD into Inactive state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.3.7.2
Hardware: Unspecified
OS: Linux
Target Milestone: ovirt-4.4.1
Assignee: shani
QA Contact: Evelina Shames
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-08 12:43 UTC by Amit Bawer
Modified: 2020-07-08 08:24 UTC (History)
7 users

Fixed In Version: ovirt-engine-4.4.1.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-08 08:24:42 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.4+


Attachments
engine log (21.60 KB, text/plain)
2019-12-08 12:52 UTC, Amit Bawer
vdsm log (28.78 KB, text/plain)
2019-12-08 12:52 UTC, Amit Bawer
engine log for maintenance with good ovf store (43.87 KB, text/plain)
2019-12-08 17:18 UTC, Amit Bawer
vdsm log for maintenance with good ovf store (95.81 KB, text/plain)
2019-12-08 17:19 UTC, Amit Bawer
vdsm log for maintenance VG checksum error (7.65 MB, text/plain)
2019-12-10 17:57 UTC, Amit Bawer
engine log for maintenance VG checksum error (1.21 MB, text/plain)
2019-12-10 17:58 UTC, Amit Bawer


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 105490 0 master ABANDONED core: moving domain to inactive only when in use 2020-10-15 04:58:30 UTC
oVirt gerrit 109438 0 master MERGED core: clear compensation data after deactivating SD 2020-10-15 04:58:30 UTC

Internal Links: 1768821

Description Amit Bawer 2019-12-08 12:43:40 UTC
Description of problem:

When 'Ignore OVF update failure' is selected and an error occurs in one of the OVF update steps, the storage domain should still enter maintenance mode despite the error; instead, the SD state transitions to 'Inactive'.

Version-Release number of selected component (if applicable): 4.3.7.2


How reproducible: 100%


Steps to Reproduce:
1. Create VM on SD
2. Modify the VM disk attributes (for example, its name).
3. Make the VM OVF store inaccessible for updates (a hedged sketch follows these steps):
chmod 000 /rhev/data-center/<uuid1>/<uuid2>/images/<uuid3>
4. Attempt to put SD on Maintenance with 'ignore OVF update failure' checked.
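
A hedged, consolidated sketch of steps 3-4 for a file-based SD; the mount-path layout follows the chmod command above, and locating the OVF_STORE image by grepping the volume .meta files is an assumption, not a verified procedure:

# Hedged sketch; the UUIDs are placeholders.
SD_MNT=/rhev/data-center/<pool-uuid>/<domain-uuid>

# Assumption: on a file-based SD the OVF_STORE volumes carry an "OVF_STORE"
# description in their .meta files, so this locates the image directory.
grep -l 'OVF_STORE' "$SD_MNT"/images/*/*.meta

# Make that image directory unreadable so the OVF update fails.
chmod 000 "$SD_MNT"/images/<ovf-store-image-uuid>

# Then, in the UI, put the SD into Maintenance with 'Ignore OVF update failure' checked.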

Actual results:

SD state transitions into 'inactive'.

Expected results:

When 'Ignore OVF update failure' is selected and an error occurs in one of the OVF update steps, the storage domain should still enter maintenance mode despite the error.

Additional info:

Comment 1 Amit Bawer 2019-12-08 12:52:15 UTC
Created attachment 1643011 [details]
engine log

engine log for maintenance with "ignore ovf update" set and a bad ovf store.

Comment 2 Amit Bawer 2019-12-08 12:52:53 UTC
Created attachment 1643012 [details]
vdsm log

vdsm log for maintenance with "ignore ovf update" set and a bad ovf store.

Comment 3 Nir Soffer 2019-12-08 13:15:05 UTC
Amit, can you add your analysis here, explaining the normal flow and the flow
in the case when the OVF disk cannot be accessed?

Also, which engine version was tested? Which vdsm version?

Finally, since the original case came from FC system, can you reproduce
this on block storage by corrupting the vg metadata?

Comment 4 Amit Bawer 2019-12-08 17:16:40 UTC
(In reply to Nir Soffer from comment #3)
> Amit, can you add your analysis here, explaining the normal flow and the flow
> in the case when the OVF disk cannot be accessed?

Attached are the engine and vdsm logs for failing to update the OVF store with the "ignore" option set.
Also adding logs for the good flow, where we drop to maintenance without an OVF store error.

The difference between the runs is that in the OVF update error flow we enter the fallback code,
running DeactivateStorageDomainVDSCommand and DisconnectStoragePoolVDSCommand from the fallback context:

2019-12-08 04:57:07,976-05 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-12) [2dc810a8-30cc-4310-95ea-8b7971e12b7b] Command 'UpdateOvfStoreForStorageDomain' id: '073345ec-e6f6-4dd0-a659-9146fe4e1e22' with children [] failed when attempting to perform the next operation, marking as 'FAILED'
2019-12-08 04:57:07,976-05 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-12) [2dc810a8-30cc-4310-95ea-8b7971e12b7b] EngineException: ENGINE (Failed with error ENGINE and code 5001): org.ovirt.engine.core.common.errors.EngineException: EngineException: ENGINE (Failed with error ENGINE and code 5001)
    at org.ovirt.engine.core.bll.storage.domain.UpdateOvfStoreForStorageDomainCommand.executeNextOperation(UpdateOvfStoreForStorageDomainCommand.java:124) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.domain.UpdateOvfStoreForStorageDomainCommand.performNextOperation(UpdateOvfStoreForStorageDomainCommand.java:112) [bll.jar:]
    at org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:32) [bll.jar:]
    at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:77) [bll.jar:]
    at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175) [bll.jar:]
    at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109) [bll.jar:]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_232]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_232]
    at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent-1.0.jar:]
    at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent-1.0.jar:]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_232]
    at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_232]
    at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]

2019-12-08 04:57:07,977-05 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-12) [2dc810a8-30cc-4310-95ea-8b7971e12b7b] Command 'UpdateOvfStoreForStorageDomain' id: '073345ec-e6f6-4dd0-a659-9146fe4e1e22' child commands '[]' executions were completed, status 'FAILED'
2019-12-08 04:57:07,977-05 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-12) [2dc810a8-30cc-4310-95ea-8b7971e12b7b] Command 'UpdateOvfStoreForStorageDomain' id: '073345ec-e6f6-4dd0-a659-9146fe4e1e22' Updating status to 'FAILED', The command end method logic will be executed by one of its parent commands.
2019-12-08 04:57:08,991-05 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-35) [2dc810a8-30cc-4310-95ea-8b7971e12b7b] Command 'ProcessOvfUpdateForStorageDomain' id: 'af0c100a-1fbd-436f-8da5-6549014d82d5' child commands '[]' executions were completed, status 'SUCCEEDED'
2019-12-08 04:57:08,991-05 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-35) [2dc810a8-30cc-4310-95ea-8b7971e12b7b] Command 'ProcessOvfUpdateForStorageDomain' id: 'af0c100a-1fbd-436f-8da5-6549014d82d5' Updating status to 'SUCCEEDED', The command end method logic will be executed by one of its parent commands.


In the error flow we do not finish with the regular disconnection execution flow, seen here in the good run:

ed-Thread-93) [26449e85-eabb-4b4a-96bb-9aab5207b93f] Command 'DeactivateStorageDomainWithOvfUpdate' id 'ce10da79-fb47-4d72-ada7-67c0de7ea2e9' executing step 'COMPLETE'
2019-12-08 04:22:08,381-05 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-93) [26449e85-eabb-4b4a-96bb-9aab5207b93f] Command 'DeactivateStorageDomainWithOvfUpdate' id: 'ce10da79-fb47-4d72-ada7-67c0de7ea2e9' child commands '[a6e2f935-bb13-4fea-80bf-a0f946880eaa, 86b1ecac-e4e5-41dd-9569-f4b4de644e92]' executions were completed, status 'SUCCEEDED'
2019-12-08 04:22:09,395-05 INFO  [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainWithOvfUpdateCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [26449e85-eabb-4b4a-96bb-9aab5207b93f] Ending command 'org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainWithOvfUpdateCommand' successfully.
2019-12-08 04:22:09,423-05 INFO  [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainWithOvfUpdateCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [26449e85-eabb-4b4a-96bb-9aab5207b93f] Lock freed to object 'EngineLock:{exclusiveLocks='[49b02aa4-6ad8-4685-ae25-4bb9bf85547d=STORAGE, 293333d8-1753-11ea-906e-525400a1903a=POOL]', sharedLocks=''}'


I am not familiar with the Engine flows, but someone who is can probably see the difference.


> 
> Also, which engine version was tested? which vdsm version?

engine 4.3.7.2-1 el7
vdsm 4.30.38-1 el7

> 
> Finally, since the original case came from FC system, can you reproduce
> this on block storage by corrupting the vg metadata?

I had no luck generating a "checksum error" for a VG without breaking the SD entirely.

Comment 5 Amit Bawer 2019-12-08 17:18:12 UTC
Created attachment 1643092 [details]
engine log for maintenance with good ovf store

Comment 6 Amit Bawer 2019-12-08 17:19:22 UTC
Created attachment 1643093 [details]
vdsm log for maintenance with good ovf store

Comment 7 Amit Bawer 2019-12-10 17:49:40 UTC
Also reproduced with a setup closer to BZ 1768821:

1. Create iSCSI SD (not master).
2. Add VM to SD.
3. Add disk to VM.
4. Edit VM disk attributes (name).
5. Create VG checksum error for the SD:
5.1. dd if=/dev/mapper/36001405c156d7ad535044d79debe40fd  of=/tmp/md   bs=128M  count=1 conv=fsync iflag=direct
5.2. Edit the /tmp/md file and overwrite one of its "LVM2" text records with short random data (keep a copy of /tmp/md first so the VG can be restored later); a consolidated sketch follows these steps.
5.3. dd of=/dev/mapper/36001405c156d7ad535044d79debe40fd  if=/tmp/md   bs=128M  count=1 conv=fsync iflag=direct
5.4. Verify VG checksum is broken: 

[root@localhost log]# vgs
  /dev/mapper/36001405c156d7ad535044d79debe40fd: Checksum error at offset 45568
  Couldn't read volume group metadata from /dev/mapper/36001405c156d7ad535044d79debe40fd.
  Metadata location on /dev/mapper/36001405c156d7ad535044d79debe40fd at 45568 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/36001405c156d7ad535044d79debe40fd
  Failed to scan VG from /dev/mapper/36001405c156d7ad535044d79debe40fd

6. Put SD into maintenance with "Ignore OVF update failure".

After 5 minutes the SD appears as "Inactive".
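
A hedged consolidation of step 5 above; the grep-based offset lookup for an "LVM2" text record is an assumption (the reporter edited /tmp/md by hand), and the multipath device is the one from this reproduction:

# Hedged sketch of step 5; adjust the device name to your setup.
DEV=/dev/mapper/36001405c156d7ad535044d79debe40fd
dd if="$DEV" of=/tmp/md bs=128M count=1 iflag=direct conv=fsync
cp /tmp/md /tmp/md.orig      # pristine copy, used later to restore the VG
# Assumption: overwriting the last "LVM2" occurrence in the dump lands inside the
# metadata text area and breaks its checksum; depending on which record is hit,
# vgs may report a checksum error or a label/scan error instead.
OFF=$(grep -abo 'LVM2' /tmp/md | tail -1 | cut -d: -f1)
printf 'XXXX' | dd of=/tmp/md bs=1 seek="$OFF" conv=notrunc
dd if=/tmp/md of="$DEV" bs=128M count=1 oflag=direct conv=fsync
vgs                          # should now fail as shown in step 5.4
# To restore the VG afterwards:
#   dd if=/tmp/md.orig of="$DEV" bs=128M count=1 oflag=direct conv=fsync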



Adding engine_1768821.log and vdsm_1768821.log files for this reproduction.

$ egrep  "iscsi_4"\|"505bfaad-bdaf-45fa-a586-a842fdab75fd" engine_1768821.log


2019-12-10 11:25:46,778-05 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-85) [de82d593-e6c5-4c09-b9b5-90961bc54f14] EVENT_ID: UPDATE_OVF_FOR_STORAGE_DOMAIN_FAILED(190), Failed to update VMs/Templates OVF data for Storage Domain iscsi_4 in Data Center Default.
2019-12-10 11:25:48,978-05 INFO  [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [75c73d04] Running command: DeactivateStorageDomainCommand internal: true. Entities affected :  ID: 505bfaad-bdaf-45fa-a586-a842fdab75fd Type: StorageAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN
2019-12-10 11:25:48,986-05 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [75c73d04] START, DeactivateStorageDomainVDSCommand( DeactivateStorageDomainVDSCommandParameters:{storagePoolId='293333d8-1753-11ea-906e-525400a1903a', ignoreFailoverLimit='false', storageDomainId='505bfaad-bdaf-45fa-a586-a842fdab75fd', masterDomainId='00000000-0000-0000-0000-000000000000', masterVersion='45'}), log id: 22b90056
2019-12-10 11:25:51,023-05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engine-Thread-629) [75c73d04] Executing with domain map: {49b02aa4-6ad8-4685-ae25-4bb9bf85547d=active, 505bfaad-bdaf-45fa-a586-a842fdab75fd=attached}
2019-12-10 11:25:52,949-05 INFO  [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [75c73d04] Domain '505bfaad-bdaf-45fa-a586-a842fdab75fd' will remain in 'PreparingForMaintenance' status until deactivated on all hosts
2019-12-10 11:25:53,017-05 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [75c73d04] EVENT_ID: USER_DEACTIVATED_STORAGE_DOMAIN(968), Storage Domain iscsi_4 (Data Center Default) was deactivated and has moved to 'Preparing for maintenance' until it will no longer be accessed by any Host of the Data Center.
2019-12-10 11:25:53,029-05 INFO  [org.ovirt.engine.core.bll.CommandCompensator] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [75c73d04] Command [id=d5808492-60bf-4bd2-9f67-9322c56d08a2]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot:{id='StoragePoolIsoMapId:{storagePoolId='293333d8-1753-11ea-906e-525400a1903a', storageId='505bfaad-bdaf-45fa-a586-a842fdab75fd'}', status='Unknown'}.
2019-12-10 11:25:53,043-05 INFO  [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainWithOvfUpdateCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [75c73d04] Lock freed to object 'EngineLock:{exclusiveLocks='[505bfaad-bdaf-45fa-a586-a842fdab75fd=STORAGE]', sharedLocks='[293333d8-1753-11ea-906e-525400a1903a=POOL]'}'
2019-12-10 11:26:03,875-05 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-631) [] domain '505bfaad-bdaf-45fa-a586-a842fdab75fd:iscsi_4' in problem 'NOT_REPORTED'. vds: 'host'
2019-12-10 11:31:03,877-05 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-671) [] starting processDomainRecovery for domain '505bfaad-bdaf-45fa-a586-a842fdab75fd:iscsi_4'.
2019-12-10 11:31:03,880-05 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-671) [] Domain '505bfaad-bdaf-45fa-a586-a842fdab75fd:iscsi_4' was reported by all hosts in status UP as problematic. Moving the domain to NonOperational.
2019-12-10 11:31:03,953-05 INFO  [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engine-Thread-671) [534445f] Lock Acquired to object 'EngineLock:{exclusiveLocks='[505bfaad-bdaf-45fa-a586-a842fdab75fd=STORAGE]', sharedLocks='[293333d8-1753-11ea-906e-525400a1903a=POOL]'}'
2019-12-10 11:31:03,978-05 INFO  [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engine-Thread-671) [534445f] Running command: DeactivateStorageDomainCommand internal: true. Entities affected :  ID: 505bfaad-bdaf-45fa-a586-a842fdab75fd Type: StorageAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN
2019-12-10 11:31:03,986-05 INFO  [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engine-Thread-671) [534445f] DeactivateStorageDomainVDS is skipped '505bfaad-bdaf-45fa-a586-a842fdab75fd'
2019-12-10 11:31:03,986-05 INFO  [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedThreadFactory-engine-Thread-671) [534445f] Lock freed to object 'EngineLock:{exclusiveLocks='[505bfaad-bdaf-45fa-a586-a842fdab75fd=STORAGE]', sharedLocks='[293333d8-1753-11ea-906e-525400a1903a=POOL]'}'
2019-12-10 11:31:04,028-05 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-671) [534445f] EVENT_ID: SYSTEM_DEACTIVATED_STORAGE_DOMAIN(970), Storage Domain iscsi_4 (Data Center Default) was deactivated by system because it's not visible by any of the hosts.
2019-12-10 11:35:00,004-05 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (EE-ManagedThreadFactory-engineScheduled-Thread-61) [] Autorecovering storage domains id: 505bfaad-bdaf-45fa-a586-a842fdab75fd 
2019-12-10 11:35:00,017-05 INFO  [org.ovirt.engine.core.bll.storage.connection.ConnectDomainToStorageCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-61) [5977b8a5] Running command: ConnectDomainToStorageCommand internal: true. Entities affected :  ID: 505bfaad-bdaf-45fa-a586-a842fdab75fd Type: Storage
2019-12-10 11:39:05,144-05 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-738) [] Adding domain '505bfaad-bdaf-45fa-a586-a842fdab75fd' to the domains in maintenance cache
2019-12-10 12:18:36,618-05 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-43) [4c999455] Lock Acquired to object 'EngineLock:{exclusiveLocks='[505bfaad-bdaf-45fa-a586-a842fdab75fd=STORAGE]', sharedLocks='[293333d8-1753-11ea-906e-525400a1903a=OVF_UPDATE]'}'
2019-12-10 12:18:36,622-05 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-43) [4c999455] Lock freed to object 'EngineLock:{exclusiveLocks='[505bfaad-bdaf-45fa-a586-a842fdab75fd=STORAGE]', sharedLocks='[293333d8-1753-11ea-906e-525400a1903a=OVF_UPDATE]'}'


$ grep "vdsm.api" vdsm_1768821.log
...
2019-12-10 11:25:45,931-0500 INFO  (jsonrpc/6) [vdsm.api] START setVolumeDescription(sdUUID=u'505bfaad-bdaf-45fa-a586-a842fdab75fd', spUUID=u'293333d8-1753-11ea-906e-525400a1903a', imgUUID=u'9758e6cf-0d82-4982-a86b-39255ad1e3c3', volUUID=u'93a5add7-a3bc-44cd-bb50-f070d410afa9', description=u'{"Updated":true,"Size":20480,"Last Updated":"Tue Dec 10 11:25:43 EST 2019","Storage Domains":[{"uuid":"505bfaad-bdaf-45fa-a586-a842fdab75fd"}],"Disk Description":"OVF_STORE"}', options=None) from=::ffff:10.35.18.254,42726, flow_id=de82d593-e6c5-4c09-b9b5-90961bc54f14, task_id=1386daab-a2cd-4823-9135-36c82593ac1a (api:48)
2019-12-10 11:25:46,692-0500 INFO  (jsonrpc/6) [vdsm.api] FINISH setVolumeDescription error=Failed reload: 93a5add7-a3bc-44cd-bb50-f070d410afa9 from=::ffff:10.35.18.254,42726, flow_id=de82d593-e6c5-4c09-b9b5-90961bc54f14, task_id=1386daab-a2cd-4823-9135-36c82593ac1a (api:52)
...
2019-12-10 11:25:48,991-0500 INFO  (jsonrpc/0) [vdsm.api] START deactivateStorageDomain(sdUUID=u'505bfaad-bdaf-45fa-a586-a842fdab75fd', spUUID=u'293333d8-1753-11ea-906e-525400a1903a', msdUUID=u'00000000-0000-0000-0000-000000000000', masterVersion=45, options=None) from=::ffff:10.35.18.254,42726, flow_id=75c73d04, task_id=13059f1a-85c0-45ff-99e5-e2cee82ddd7c (api:48)
2019-12-10 11:25:50,999-0500 INFO  (jsonrpc/0) [vdsm.api] FINISH deactivateStorageDomain return=None from=::ffff:10.35.18.254,42726, flow_id=75c73d04, task_id=13059f1a-85c0-45ff-99e5-e2cee82ddd7c (api:54)
...
2019-12-10 11:25:52,271-0500 INFO  (jsonrpc/1) [vdsm.api] START disconnectStorageServer(domType=3, spUUID=u'293333d8-1753-11ea-906e-525400a1903a', conList=[{u'port': u'3260', u'connection': u'10.35.19.23', u'iqn': u'iqn.2003-01.org.hera03.iqn1', u'user': u'', u'tpgt': u'1', u'ipv6_enabled': u'false', u'password': '********', u'id': u'5950940e-0f2f-4ad0-9b33-49a436c4d944'}], options=None) from=::ffff:10.35.18.254,42724, flow_id=75c73d04, task_id=e1825db1-a198-4395-a7e2-c07ecfc3ac60 (api:48)
2019-12-10 11:25:52,944-0500 INFO  (jsonrpc/1) [vdsm.api] FINISH disconnectStorageServer return={'statuslist': [{'status': 0, 'id': u'5950940e-0f2f-4ad0-9b33-49a436c4d944'}]} from=::ffff:10.35.18.254,42724, flow_id=75c73d04, task_id=e1825db1-a198-4395-a7e2-c07ecfc3ac60 (api:54)
...

Comment 8 Amit Bawer 2019-12-10 17:57:25 UTC
Created attachment 1643701 [details]
vdsm log for maintenance VG checksum error

Comment 9 Amit Bawer 2019-12-10 17:58:05 UTC
Created attachment 1643702 [details]
engine log for maintenance VG checksum error

Comment 10 Eyal Shenitzky 2020-05-03 13:13:07 UTC
Shani, please note that an old patch was written, but it doesn't solve the issue - https://gerrit.ovirt.org/#/c/105490/.

Comment 14 Amit Bawer 2020-05-19 15:01:11 UTC
The original issue from 4.3.7 (unable to put a corrupted SD into maintenance with or without ignoring the OVF update failure) does not persist in 4.4;
for an SD with a corrupted VG in 4.4, the only error indication in the UI is "Failed to determine the metadata devices of Storage Domain UUID";
other than that, the domain can be put into maintenance and back to active as normal.

If this should be fixed for 4.3, it reproduces on 4.3 as seen earlier and in the later attempts of comment 13.
For 4.4 I would say there is not much of the original issue left to fix, aside from the fact that the Engine can move the domain between states regardless of its actual status, as pointed out in the previous comment.

Comment 15 shani 2020-05-20 07:47:27 UTC
The second issue is related to the lvm2 version:
the new lvm2-2.03.08-3.el8.x86_64 automatically uses vgck --updatemetadata, which repairs the "broken metadata",
and therefore there is no need to "ignore OVF update failures".

root@host44 ~ # vgs
  WARNING: Metadata location on /dev/mapper/3600140590d860f65aad4c9a8e591a89b at 48128 begins with invalid VG name.
  WARNING: bad metadata text on /dev/mapper/3600140590d860f65aad4c9a8e591a89b in mda1
  WARNING: scanning /dev/mapper/3600140590d860f65aad4c9a8e591a89b mda1 failed to read metadata summary.
  WARNING: repair VG metadata on /dev/mapper/3600140590d860f65aad4c9a8e591a89b with vgck --updatemetadata.
  WARNING: Metadata location on /dev/mapper/3600140590d860f65aad4c9a8e591a89b at 48128 begins with invalid VG name.
  WARNING: bad metadata text on /dev/mapper/3600140590d860f65aad4c9a8e591a89b in mda1
  WARNING: scanning /dev/mapper/3600140590d860f65aad4c9a8e591a89b mda1 failed to read metadata summary.
  WARNING: repair VG metadata on /dev/mapper/3600140590d860f65aad4c9a8e591a89b with vgck --updatemetadata.


This is why the operation succeeded on master, while it fails on 4.3, as the corrupted metadata fails the vgs operation (comment 13):

root@host43 ~ # vgs
  Metadata location on /dev/mapper/3600140590d860f65aad4c9a8e591a89b at 48128 begins with invalid VG name.
  Failed to read metadata summary from /dev/mapper/3600140590d860f65aad4c9a8e591a89b
  Failed to scan VG from /dev/mapper/3600140590d860f65aad4c9a8e591a89b

The same corrupted data acts differently on 4.3 and 4.4.
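
For reference, the repair the 4.4 warnings point to can presumably be run by hand as well (hedged example; <vg-name> is a placeholder for the SD's VG):

# Hedged example: manual metadata repair with lvm2 2.03+, as suggested by the
# warnings above.
vgck --updatemetadata <vg-name>
vgs <vg-name>    # the warnings should be gone if the repair succeeded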

Eyal, no need to open a new bug for the second issue.
Can we close this one?

Comment 16 Eyal Shenitzky 2020-05-20 08:49:18 UTC
I don't think so; it seems to me that the issue is still there, and we just need to find another way to reproduce it.

Comment 18 Sandro Bonazzola 2020-06-19 09:41:04 UTC
This bug is in modified state and targeting 4.4.3. Can this be re-targeted to 4.4.1?

Comment 19 Evelina Shames 2020-07-01 10:17:30 UTC
Verified on engine-4.4.1.5-0.17.el8ev

Comment 20 Sandro Bonazzola 2020-07-08 08:24:42 UTC
This bugzilla is included in oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

