+++ This bug was initially created as a clone of Bug #1911597 +++

Description of problem:

Since the ability to assign the master SD with the 'SwitchMasterStorageDomainCommand' command was introduced (BZ 1576923), there are two ways of assigning a new master SD:

1. With the help of the RESTful API, which invokes 'SwitchMasterStorageDomainCommand' on the engine side:

POST /ovirt-engine/api/datacenters/123/setmaster

<action>
  <storage_domain id="456"/>
</action>

2. The old way: putting the master SD into maintenance and waiting until some other SD takes over the master role.

When, using the first way, we re-assign the 'master' role to another SD of the same storage type (GlusterFS), the operation fails. The engine initiates the 'SwitchMasterStorageDomainCommand' [1], apparently fails to complete it [2], and triggers the reconstruction of storage domains [3]. Then a chain of ERRORs is triggered on the engine [4] and on the SPM host [7]. Note that at this stage the SPM host has an orphaned task [5], and the reconstruction won't end until we clear this task [6]. After the engine finishes reconstructing the SDs, there is still an orphaned async task in the engine's DB [8].

The same issue can be reproduced with the help of the 2nd way as well, but it requires some additional effort to hit it again by activating and deactivating the domains of the GlusterFS type.

[1]:
2020-12-30 10:23:28,962+02 INFO [org.ovirt.engine.core.bll.storage.pool.SwitchMasterStorageDomainCommand] (default task-35) [d4c45b3d-6d41-473a-b5c5-8a1b2e2c7aeb] Lock Acquired to object 'EngineLock:{exclusiveLocks='[7b7f56b4-3957-41d2-9c27-da539d6af836=STORAGE, 2a0d3c24-3357-4677-b9b2-35486af464a3=STORAGE, dfad0b2f-c1e9-4a0d-9ede-30414f6bee36=POOL]', sharedLocks=''}'
2020-12-30 10:23:28,970+02 INFO [org.ovirt.engine.core.bll.storage.pool.SwitchMasterStorageDomainCommand] (default task-35) [d4c45b3d-6d41-473a-b5c5-8a1b2e2c7aeb] Running command: SwitchMasterStorageDomainCommand internal: false. Entities affected : ID: dfad0b2f-c1e9-4a0d-9ede-30414f6bee36 Type: StoragePoolAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN
2020-12-30 10:23:28,970+02 INFO [org.ovirt.engine.core.bll.storage.pool.SwitchMasterStorageDomainCommand] (default task-35) [d4c45b3d-6d41-473a-b5c5-8a1b2e2c7aeb] Locking the following storage domains: test_gluster_0, test_gluster_1

[2]:
2020-12-30 10:23:32,305+02 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyImpl] (EE-ManagedThreadFactory-engine-Thread-63727) [] Master domain is not in sync between DB and VDSM. Domain test_gluster_1 marked as master in DB and not in the storage
2020-12-30 10:23:32,312+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-63727) [] EVENT_ID: SYSTEM_MASTER_DOMAIN_NOT_IN_SYNC(990), Sync Error on Master Domain between Host host_mixed_1 and oVirt Engine. Domain: test_gluster_1 is marked as Master in oVirt Engine database but not on the Storage side. Please consult with Support on how to fix this issue.

[3]:
2020-12-30 10:23:32,330+02 INFO [org.ovirt.engine.core.bll.storage.pool.ReconstructMasterDomainCommand] (EE-ManagedThreadFactory-engine-Thread-63727) [541ee6e7] Running command: ReconstructMasterDomainCommand internal: true. Entities affected : ID: 2a0d3c24-3357-4677-b9b2-35486af464a3 Type: Storage

[4]:
2020-12-30 10:23:32,392+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engine-Thread-63727) [541ee6e7] Failed in 'HSMGetAllTasksStatusesVDS' method
2020-12-30 10:23:32,397+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-63727) [541ee6e7] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command HSMGetAllTasksStatusesVDS failed: value=(1, 0, b'', b'') abortedcode=100
2020-12-30 10:23:32,398+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (EE-ManagedThreadFactory-engine-Thread-63727) [541ee6e7] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_1', pool id 'dfad0b2f-c1e9-4a0d-9ede-30414f6bee36' as there are uncleared tasks 'Task '29240337-9e23-4500-ac1a-abe495ae2b68', status 'finished''
2020-12-30 10:23:32,402+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-63727) [541ee6e7] EVENT_ID: VDS_ALERT_NOT_STOPPING_SPM_UNCLEARED_TASKS(9,030), Not stopping SPM on vds host_mixed_1, pool id dfad0b2f-c1e9-4a0d-9ede-30414f6bee36 as there are uncleared tasks Task '29240337-9e23-4500-ac1a-abe495ae2b68', status 'finished'

[5]:
[root@storage-ge5-vdsm1 ~]# vdsm-client Host getAllTasksStatuses
{
    "63855d34-1358-4cc8-b44d-363616ed42d7": {
        "code": 100,
        "message": "value=(1, 0, b'', b'') abortedcode=100",
        "taskID": "63855d34-1358-4cc8-b44d-363616ed42d7",
        "taskResult": "cleanSuccess",
        "taskState": "finished"
    }
}

[6]:
[root@storage-ge5-vdsm1 ~]# vdsm-client Task clear taskID=63855d34-1358-4cc8-b44d-363616ed42d7
true

[7]:
2020-12-30 10:23:31,468+0200 ERROR (tasks/2) [storage.StoragePool] Migration to new master 2a0d3c24-3357-4677-b9b2-35486af464a3 failed (sp:903)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 891, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 71, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
vdsm.storage.fileUtils.TarCopyFailed: (1, 0, b'', b'')

[8]:
engine=# select * from async_tasks;
-[ RECORD 1 ]---+-------------------------------------
task_id         | 6f8549f0-b62a-4c15-86d3-3cbafa8d7b5d
action_type     | 1048
status          | 2
result          | 0
step_id         | aa47a5b8-61bf-4bba-8e41-f7a2510e082f
command_id      | 1ae62c43-39c3-476f-a23b-e99ee613d65f
started_at      | 2020-12-30 10:23:29.42+02
storage_pool_id | dfad0b2f-c1e9-4a0d-9ede-30414f6bee36
task_type       | 19
vdsm_task_id    | 29240337-9e23-4500-ac1a-abe495ae2b68
root_command_id | 1ae62c43-39c3-476f-a23b-e99ee613d65f
user_id         | d0f3cb14-48fd-11eb-bf93-001a4a231747

Version-Release number of selected component (if applicable):
rhv-release-4.4.4-6

How reproducible:
The 1st way always reproduces the issue; the 2nd way requires some additional tries.

Steps to Reproduce:
1. Have an env with more than one GlusterFS SD.
2. Assign master from GlusterFS SD A to GlusterFS SD B.
3. If you haven't seen the issue at this point, assign master to another SD of the same GlusterFS type.

Actual results:
Assigning the new master SD fails.
Orphaned tasks in the engine DB and on the SPM host lead to an infinite (or very long) reconstruction of the SDs.

Expected results:
The master role should be re-assigned to the other SD.

Additional info:
Attaching the engine log, the SPM log and the rest of the vdsm hosts' logs.
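For reference, the setmaster action used in the first way can be invoked with a short Python sketch using plain HTTP (illustrative only; the engine URL, credentials, CA file and IDs below are placeholders, not values from this report):

import requests

ENGINE = "https://engine.example.com/ovirt-engine/api"  # placeholder
DC_ID = "123"   # data center ID, placeholder
SD_ID = "456"   # storage domain ID, placeholder

body = """<action>
    <storage_domain id="%s"/>
</action>""" % SD_ID

# POST the setmaster action on the data center; this is what triggers
# SwitchMasterStorageDomainCommand on the engine side.
resp = requests.post(
    "%s/datacenters/%s/setmaster" % (ENGINE, DC_ID),
    data=body,
    headers={"Content-Type": "application/xml", "Accept": "application/xml"},
    auth=("admin@internal", "password"),
    verify="ca.pem",  # engine CA certificate, placeholder path
)
print(resp.status_code)
print(resp.text)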
--- Additional comment from Ilan Zuckerman on 2020-12-30 11:25:11 IST ---

Looks like the problem related to the 2nd way of moving the master role (with maintenance) appeared in the past:
https://bugzilla.redhat.com/show_bug.cgi?id=1360456, https://bugzilla.redhat.com/show_bug.cgi?id=1298724

--- Additional comment from Tal Nisan on 2020-12-30 20:07:40 IST ---

Shir, please share your recent findings.

--- Additional comment from Evelina Shames on 2020-12-31 16:04:00 IST ---

(In reply to Tal Nisan from comment #2)
> Shir, please share your recent findings.

Reproduced on 4.3 as well:
ovirt-engine-4.3.11.4-0.1.el7.noarch
vdsm-4.30.50-1.el7ev.x86_64

--- Additional comment from Ilan Zuckerman on 2021-01-07 08:14:47 IST ---

Hi Shani, can you please explain why this is on QA? Was the issue described in the description of this bug fixed? I can only find a patch which adds a validation to avoid switching the master storage domain from/to gluster domains. So there is no real fix here, if I understand correctly - just a workaround.

--- Additional comment from shani on 2021-01-07 15:14:31 IST ---

Maybe it was moved automatically to ON_QA after merging the validation patch. Moving back to ASSIGNED as the issue is still under investigation by the Gluster team.
Hi, I don't have an answer to your question, but I tried again today to put the gluster domain into maintenance and for some reason I couldn't, although the service is running on the host. Ritesh, can you please have a look? Shir, was something changed on the host?
Shani and I talked about this issue privately. I didn't change anything on our environment. The solution that came from the Gluster team - installing the gluster packages on our SPM host - did not resolve the issue. Installing the gluster packages only resolved the issue momentarily on our environment; when we checked again, we saw that the bug was reproduced again. It doesn't seem like an ideal solution. We are still waiting for a response from Shani and the Gluster team: what other solution does the Gluster team offer, instead of installing the packages, which did not resolve the issue described in this bug? I can provide another 4.4 environment for investigation if needed.
Hello Shir,

When I checked the host, the gluster service was not running, so I tried installing it and it worked at the time, so I thought that was the solution.

Later, when you tested it and it did not work, it was clearly a different issue, so I have to dive deeper into it.

Can you share the details of the remote server where the gluster service is running?

As of now I don't have an answer to your question of why it's failing, so I will keep the needinfo on me.
(In reply to Ritesh Chikatwar from comment #6)
> Hello Shir,
>
> When I checked the host, the gluster service was not running, so I tried
> installing it and it worked at the time, so I thought that was the solution.
>
> Later, when you tested it and it did not work, it was clearly a different
> issue, so I have to dive deeper into it.
>
> Can you share the details of the remote server where the gluster service is
> running?
>
> As of now I don't have an answer to your question of why it's failing, so I
> will keep the needinfo on me.

Of course, I gave you the details via Gchat.
Moving this to 4.4.6 because I am still not able to find the RCA for this.
Created attachment 1761947 [details]
Tested switching the master domain locally - working as expected
Hello Shir,

I created a local setup to test this and it was working as expected. I attached a video of how I tested it. Shir, can you provide me an environment where I can investigate this? If the environment whose details you shared earlier still exists, let me know whether I can use it.

Package details:
Software Version: 4.4.5.8_master
glusterfs-8.3-1.el8.x86_64

Shani, as per Nir, the TarCopyFailed error is part of the deactivate storage domain flow. Switching the master domain is not this flow, and should not include deactivating the old storage domain. When I checked this, I could see that all the other gluster domains are locked for a while and then they seem to come back up. This should not happen, I guess. Have you observed this behavior?
Hi Ritesh,
This one is very weird, as the switch master operation should be blocked for GlusterFS domains (bug https://bugzilla.redhat.com/1911597):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>[Cannot switch master storage domain. Switch master storage domain is not supported for gluster-based domains.]</detail>
    <reason>Operation Failed</reason>
</fault>

The operation should also be grayed out in the UI for Gluster domains (Storage Domains > select a domain > 3 dots menu on the right > Set as master storage domain).

Can you try to re-build your env and try again?

Once the switchMaster operation is blocked, in order to force calling 'master migration', we need to put the current master domain into maintenance and verify that the new domain was elected, without any errors in the VDSM log.
Can you please give it a try as well?
Shani,
Sure, I will try and update you.

(In reply to shani from comment #11)
> Hi Ritesh,
> This one is very weird, as the switch master operation should be blocked
> for GlusterFS domains (bug https://bugzilla.redhat.com/1911597):
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <fault>
>     <detail>[Cannot switch master storage domain. Switch master storage
> domain is not supported for gluster-based domains.]</detail>
>     <reason>Operation Failed</reason>
> </fault>
>
> The operation should also be grayed out in the UI for Gluster domains
> (Storage Domains > select a domain > 3 dots menu on the right > Set as
> master storage domain).

Yes, it did block me, so locally I commented out the condition and tried to switch the master domain.

> Can you try to re-build your env and try again?
>
> Once the switchMaster operation is blocked, in order to force calling
> 'master migration', we need to put the current master domain into
> maintenance and verify that the new domain was elected, without any errors
> in the VDSM log.
> Can you please give it a try as well?

Let me give this a try from the UI. I will keep the needinfo on me as I have to verify this from the UI.
Hello Shani, I tried from the UI and it seems it's working. I am attaching the video for reference.
Created attachment 1762250 [details]
Tested from UI
Hi Ritesh,
I agree, it looks like it is working on your env. We still need to figure out what the difference is between your env and the QE's. Here are some thoughts:
- different versions of gluster/any related component
- maybe the fact that the QE are using a remote gluster server affects the scenario
- why there's a TarCopy failure in the QE env logs

Once the QE have an env, those can be tested, but maybe some research can be done by looking at the code. What do you think?
Yeah, let me take a look at the code until we get the setup.
(In reply to Ritesh Chikatwar from comment #16)
> Yeah, let me take a look at the code until we get the setup.

Do we have any progress with this bug?
I have not found anything suspicious in the code; maybe the environment will help to find the exact root cause. Can you provide me a fresh setup so I can dig more into it?
Shani/Shir, can you please assist?
This bug was reproduced via both the UI and the REST API by trying to switch the master SD from one Gluster SD to another Gluster SD. The operation fails in both cases. The reconstruction happens via the UI on the second try to reproduce the issue.

[1] REST API:

https://<hostname>/ovirt-engine/api/datacenters/<datacenterID>/setmaster

<action>
  <storage_domain id="<storage_domain>"/>
</action>

Response:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>[Cannot switch master storage domain. Switch master storage domain is not supported for gluster-based domains.]</detail>
    <reason>Operation Failed</reason>
</fault>

Engine log after the REST API request:

2021-04-28 15:25:52,550+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-191) [] Operation Failed: [Cannot switch master storage domain. Switch master storage domain is not supported for gluster-based domains.]

[2] On the UI:

1. Have an env with more than one GlusterFS SD (3 Gluster SDs).
2. Assign master from GlusterFS A to GlusterFS B.

2021-04-28 15:28:57,269+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [50024de0] Failed in 'DeactivateStorageDomainVDS' method
2021-04-28 15:28:57,283+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [50024de0] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command DeactivateStorageDomainVDS failed: Error in storage domain action: ('sdUUID=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, spUUID=bde80e57-4ece-44bd-b884-46099c442133, msdUUID=ad26e577-0099-43f2-b43a-5e3c3eeee4d1, masterVersion=5',)
2021-04-28 15:28:57,332+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [50024de0] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: Failed to DeactivateStorageDomainVDS, error = Error in storage domain action: ('sdUUID=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, spUUID=bde80e57-4ece-44bd-b884-46099c442133, msdUUID=ad26e577-0099-43f2-b43a-5e3c3eeee4d1, masterVersion=5',), code = 350
2021-04-28 15:29:03,279+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command ConnectStoragePoolVDS failed: Wrong Master domain or its version: 'SD=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, pool=bde80e57-4ece-44bd-b884-46099c442133'
2021-04-28 15:29:03,344+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] Command 'ConnectStoragePoolVDSCommand(HostName = host_mixed_2, ConnectStoragePoolVDSCommandParameters:{hostId='bdb5f6cd-f0fb-4352-bd0d-25856b633a91', vdsId='bdb5f6cd-f0fb-4352-bd0d-25856b633a91', storagePoolId='bde80e57-4ece-44bd-b884-46099c442133', masterVersion='5'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, pool=bde80e57-4ece-44bd-b884-46099c442133'
2021-04-28 15:29:03,344+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, pool=bde80e57-4ece-44bd-b884-46099c442133'
2021-04-28 15:29:03,375+03 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] DeactivateStorageDomainVDS failed 'e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, pool=bde80e57-4ece-44bd-b884-46099c442133' (Failed with error StoragePoolWrongMaster and code 324)
2021-04-28 15:29:03,379+03 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] Failed to deactivate storage domain 'e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd'
2021-04-28 15:29:03,438+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-65) [] Command 'HSMGetAllTasksStatusesVDSCommand(HostName = host_mixed_2, VdsIdVDSCommandParametersBase:{hostId='bdb5f6cd-f0fb-4352-bd0d-25856b633a91'})' execution failed: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM: ()
2021-04-28 15:29:04,255+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain test_gluster_2 (Data Center golden_env_mixed).

Versions:
ovirt-engine-4.4.6.5-447.gd80dda7.9.el8ev.noarch
vdsm-4.40.60.4-1.el8ev.x86_64
glusterfs-6.0-49.1.el8.x86_64
Created attachment 1776597 [details]
Logs
*** Bug 1956263 has been marked as a duplicate of this bug. ***
(In reply to Ritesh Chikatwar from comment #26)
> Regarding the TarCopyFailed exception.
>
> This is coming from this line:
> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/fileUtils.py#L59
> It's returning return code 1 because tar is running on a directory that is
> changing under it, and because of this it's throwing that error.
> I tried running that command on one of my setups to see what I get:
>
> [root@rhsqa-grafton1-nic2 ~]# /usr/bin/tar cf -f --exclude=./lost+found -C
> /rhev/data-center/mnt/glusterSD/<hostname>:_vmstore .
> /usr/bin/tar:
> ./e5a79957-bb2b-414f-bf6f-cbb2a4b98d0b/images/0b63e72a-ec4a-43e1-ba75-
> a480c58b54c9: file changed as we read it

The issue is that some file in the master mount changed during the copy.

This is a real issue in vdsm. It must use a lock to prevent tasks from writing task state while switching the master storage domain from one domain to the other.

Normally every task writes to its own directory, so multiple tasks can write together without any locking. But when we copy the entire directory, we need to lock the entire master mount while we copy.

Moving the bug to vdsm; this is not related to engine or gluster.
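To illustrate the kind of locking described above (this is only a sketch, not the actual vdsm patch): each task takes a shared lock while persisting its state, and the master migration takes the lock exclusively around the copy. The names below (SharedExclusiveLock, persist_task_state, tar_copy_master) are illustrative, not vdsm APIs:

import threading
import contextlib

class SharedExclusiveLock:
    """Minimal shared/exclusive lock: many tasks may hold it shared while
    saving their own state; copying the master filesystem takes it
    exclusively so no task can write during the copy."""

    def __init__(self):
        self._cond = threading.Condition()
        self._shared = 0
        self._exclusive = False

    @contextlib.contextmanager
    def shared(self):
        with self._cond:
            while self._exclusive:
                self._cond.wait()
            self._shared += 1
        try:
            yield
        finally:
            with self._cond:
                self._shared -= 1
                self._cond.notify_all()

    @contextlib.contextmanager
    def exclusive(self):
        with self._cond:
            while self._exclusive or self._shared:
                self._cond.wait()
            self._exclusive = True
        try:
            yield
        finally:
            with self._cond:
                self._exclusive = False
                self._cond.notify_all()

master_lock = SharedExclusiveLock()

def persist_task_state(task):
    # Each task writes only to its own directory, so tasks can run
    # concurrently with each other - the shared lock is enough.
    with master_lock.shared():
        task.save()

def tar_copy_master(copy, src_master_dir, dst_master_dir):
    # 'copy' stands for the real copy routine (vdsm's tarCopy); it must not
    # race with task writes, so the lock is taken exclusively for the
    # duration of the whole directory copy.
    with master_lock.exclusive():
        copy(src_master_dir, dst_master_dir)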
Evelina, do we know how to reproduce this bug? If we have a good way to reproduce this, it will help if we do early testing before we merge this. The fix is also risky - adding locking in very dark corners of vdsm, which may cause deadlocks. I would like to make sure all QE tiers work with the fix.
Reproduced with an NFS storage domain, with a patch improving error reporting in tarCopy:

2021-09-14 00:33:44,369+0300 ERROR (tasks/1) [storage.taskmanager.task] (Task='928b287c-fdd0-487f-bec5-f98d63b63ba2') Unexpected error (task:877)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 989, in switchMaster
    self.masterMigrate(oldMasterUUID, newMasterUUID, masterVersion)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 914, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 101, in tarCopy
    raise se.TarCommandError(errors)
vdsm.storage.exception.TarCommandError: Tar command failed: ({'reader': {'cmd': ['/usr/bin/tar', 'cf', '-', '--exclude=./lost+found', '-C', '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master', '.'], 'rc': 1, 'err': '/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76: file changed as we read it\n/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp: File removed before we read it\n/usr/bin/tar: ./tasks: file changed as we read it\n'}},)

The issues are:
- the task directory changed during the copy
- the temporary task directory was removed while reading
Created attachment 1822863 [details]
vdsm.log for reproduction on NFS storage domain
Log during the tarCopy failure:

Starting tarCopy:

2021-09-14 00:33:44,305+0300 INFO (tasks/1) [storage.fileutils] Copying '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master' to '/rhev/data-center/mnt/sparse:_export_01/0c065db4-feca-4179-801b-b51be9499943/master' (fileUtils:55)

Starting reader:

2021-09-14 00:33:44,305+0300 DEBUG (tasks/1) [storage.fileutils] Starting reader: ['/usr/bin/tar', 'cf', '-', '--exclude=./lost+found', '-C', '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master', '.'] (fileUtils:62)

While the master filesystem is being copied, tasks are modifying the /tasks/ directory:

Task 1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 creates a temporary directory:

2021-09-14 00:33:44,326+0300 DEBUG (jsonrpc/1) [storage.taskmanager.task] (Task='1b1ef2f3-6e2b-4dde-9e07-f720569ccd76') _save: orig /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 temp /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp (task:781)

Task 1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 modifies the temporary directory:

2021-09-14 00:33:44,335+0300 DEBUG (jsonrpc/1) [storage.taskmanager.task] (Task='1b1ef2f3-6e2b-4dde-9e07-f720569ccd76') _save: orig /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 temp /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp (task:781)

Task 1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 modifies the temporary directory again:

2021-09-14 00:33:44,347+0300 DEBUG (tasks/5) [storage.taskmanager.task] (Task='1b1ef2f3-6e2b-4dde-9e07-f720569ccd76') _save: orig /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 temp /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp (task:781)

tarCopy ends:

2021-09-14 00:33:44,365+0300 DEBUG (tasks/1) [storage.fileutils] Child processes terminated (fileUtils:79)
2021-09-14 00:33:44,365+0300 ERROR (tasks/1) [storage.storagepool] Migration to new master 0c065db4-feca-4179-801b-b51be9499943 failed (sp:926)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 914, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 101, in tarCopy
    raise se.TarCommandError(errors)
vdsm.storage.exception.TarCommandError: Tar command failed: ({'reader': {'cmd': ['/usr/bin/tar', 'cf', '-', '--exclude=./lost+found', '-C', '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master', '.'], 'rc': 1, 'err': '/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76: file changed as we read it\n/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp: File removed before we read it\n/usr/bin/tar: ./tasks: file changed as we read it\n'}},)

tarCopy failed because the task directory was modified while it was being read, and the task temporary directory was deleted (as part of the task saving flow).

So we need to make sure tasks cannot modify storage (write data, create/remove directories/files) during tarCopy.
Created attachment 1822866 [details]
reproduction log

Log showing how to reproduce this issue using the new change_master.py SDK example script.
To reproduce the issue, you can use the change_master.py example script:
https://github.com/oVirt/python-ovirt-engine-sdk4/pull/12

1. Create 10 1g empty preallocated disks.

2. Start a switching-master loop:

$ for i in $(seq 100); do
    ./change_master.py -c engine-dev dc1 sd-1
    ./change_master.py -c engine-dev dc1 sd-2
done

3. Move all disks between 2 other storage domains (e.g. sd-3, sd-4).

Using the same storage domains used as master will not reproduce the issue, since a disk copy takes a lock on the storage domain, and this will block the switch master operation during the copy.

When starting a move, the system creates 10 SPM tasks. When ending the move, the system creates 10 SPM tasks to delete the original disks. When all disks have moved, move them back to the original domain. Repeat while switch master is running. This step should be automated (see the sketch after this comment).

Expected results:

Without the fix, switch master may fail after many iterations (see comment 35). With the fix, switch master should complete successfully. In the failing test (attachment 1822866 [details]), it took 111 change_master.py runs to fail.

Notes:

It is expected to see these messages:

Setting storage domain sparse-nfs-00 as master
Operation failed: [Cannot switch master storage domain. The storage pool has running tasks.]
Trying again in 10 seconds...

This means that SPM tasks are running, and change_master will retry the operation.

It may be easier to reproduce with Gluster storage, but for this we must have an engine version that does not block the switch master operation for Gluster.
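A rough sketch for automating step 3 with the oVirt Python SDK (illustrative only, not the script used for the reported runs; the engine URL, credentials, disk names and storage domain names are placeholders):

import time
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)
disks_service = connection.system_service().disks_service()
disk_names = ['test-disk-%d' % i for i in range(10)]  # assumed disk names

def move_all(target_sd):
    # Start a move for every disk; each move creates an SPM task, and
    # completing it creates another SPM task that deletes the source image.
    for name in disk_names:
        disk = disks_service.list(search='name=%s' % name)[0]
        disks_service.disk_service(disk.id).move(
            storage_domain=types.StorageDomain(name=target_sd))
    # Wait until all disks are back in OK state before moving them again.
    for name in disk_names:
        while disks_service.list(search='name=%s' % name)[0].status != types.DiskStatus.OK:
            time.sleep(5)

# Keep moving the disks back and forth while the switch-master loop runs;
# stop with Ctrl+C when done.
while True:
    move_all('sd-4')
    move_all('sd-3')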
Ritesh, if you have a Gluster environment available and you know how to reproduce this quickly, it will help to reproduce again with current vdsm master, including this commit:

commit 3cad5b9237ce2030861132173d4fc6bb9782fc08
    fileUtils: Fix error handling and logging in tarCopy

This commit improves the error logs when tarCopy fails, so we will have a better understanding of why it fails on Gluster. You will get a detailed exception like the one in comment 35 (tested on NFS).

You should be able to reproduce this by putting the master storage domain into maintenance, as described in comment 0, section 2.
Ritesh, we can also test the fix on Gluster with these repos: https://jenkins.ovirt.org/job/ovirt-engine_standard-check-patch/14264/artifact/build-artifacts.el8stream.x86_64/ https://jenkins.ovirt.org/job/vdsm_standard-check-patch/29778/artifact/build-artifacts.build-py3.el8stream.x86_64/
@nsoffer, as of now I don't have a Gluster environment available; I have to create one and test this. I will update the status with this change.
Hey Nir,

I tried reproducing this issue on my local setup but I couldn't - it looks like your change is working. If you are interested in doing some of the testing, I can provide the env; please ping me over chat.
(In reply to Ritesh Chikatwar from comment #43)
> I tried reproducing this issue on my local setup but I couldn't - it looks
> like your change is working.

Did you reproduce first with current master? Otherwise we cannot tell whether the change fixed the issue, or you just did not reproduce the issue for another reason.

You can downgrade vdsm to the latest release and run the flow that reproduced this issue in the past. The change_master.py example should make it easy to reproduce, see comment 39.
We reproduced the issue with gluster, with the vdsm locking fix:
https://gerrit.ovirt.org/c/vdsm/+/116413
and with an additional fix syncing the master fs before copying it:
https://gerrit.ovirt.org/c/vdsm/+/117047

Neither fix helps; copying the master filesystem with tar always fails on gluster with:

/usr/bin/tar: .: file changed as we read it

This looks exactly like bug 1104618, but that bug seems to be fixed in glusterfs-3.6.0.44-1, while we use:

# rpm -q glusterfs
glusterfs-9.4-1.el8s.x86_64

So this looks like a gluster regression.

I could reproduce this issue without RHV:

# cat reproduce.sh
set -e
mp=${1:?Usage: sh reproduce.sh MOUNTPOINT}
task=`uuidgen`
mkdir $mp/test
mkdir $mp/test/tasks
mkdir $mp/test/tasks/$task
echo "task data" > $mp/test/tasks/$task/$task.job.0
sync --file-system $mp/test
tar cf test.tar -C $mp/test .

Reproducing example:

# mount | grep gluster
gluster01.lab.eng.tlv2.redhat.com:/GE_storage7_volume01 on /rhev/data-center/mnt/glusterSD/gluster01.lab.eng.tlv2.redhat.com:_GE__storage7__volume01 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

# echo "testing tar copy" >> /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com\:_GE__storage7__volume01.log

# for i in $(seq 10); do rm -rf $M/test; echo iteration $i; sh reproduce.sh $M; echo '----'; done
iteration 1
----
iteration 2
tar: ./tasks/fd4c4d2c-5d55-41a8-b499-ade2172f7442: file changed as we read it
tar: ./tasks: file changed as we read it
----
iteration 3
tar: ./tasks/b23d7fc2-5e7c-408e-9c65-e0cc27dc33d0: file changed as we read it
tar: ./tasks: file changed as we read it
----
iteration 4
tar: ./tasks: file changed as we read it
----
iteration 5
----
iteration 6
tar: ./tasks: file changed as we read it
tar: .: file changed as we read it
----
iteration 7
tar: ./tasks/dd042dbd-8569-4257-b2c0-7a7b613ecd6b: file changed as we read it
tar: ./tasks: file changed as we read it
tar: .: file changed as we read it
----
iteration 8
tar: ./tasks: file changed as we read it
tar: .: file changed as we read it
----
iteration 9
tar: ./tasks/40790646-c689-42f4-8ae6-6ca66c4556f2: file changed as we read it
tar: ./tasks: file changed as we read it
----
iteration 10
tar: ./tasks: file changed as we read it
----

There are no errors in the glusterfs logs:

# tail -1 /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com\:_GE__storage7__volume01.log
testing tar copy

We should move this bug to gluster, or create a new gluster bug, but I don't have the info needed to open the bug (package versions, volume info, etc).

Gobinda, who can help with this?
Move info I could get from: from one of the gluster servers: # rpm -qa | grep gluster glusterfs-libs-6.0-37.el8rhgs.x86_64 glusterfs-6.0-37.el8rhgs.x86_64 glusterfs-server-6.0-37.el8rhgs.x86_64 glusterfs-client-xlators-6.0-37.el8rhgs.x86_64 glusterfs-fuse-6.0-37.el8rhgs.x86_64 glusterfs-api-6.0-37.el8rhgs.x86_64 glusterfs-cli-6.0-37.el8rhgs.x86_64 python3-gluster-6.0-37.el8rhgs.x86_64 glusterfs-geo-replication-6.0-37.el8rhgs.x86_64 # gluster v info GE_storage9_volume01 Volume Name: GE_storage9_volume01 Type: Replicate Volume ID: 39feddf0-a521-4977-85bd-d5d2d5d19c65 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: gluster01.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage9_volume01 Brick2: gluster02.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage9_volume01 Brick3: gluster03.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage9_volume01 Options Reconfigured: storage.owner-uid: 36 storage.owner-gid: 36 transport.address-family: inet6 storage.fips-mode-rchecksum: on nfs.disable: on performance.client-io-threads: off # gluster v status GE_storage9_volume01 Status of volume: GE_storage9_volume01 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick gluster01.lab.eng.tlv2.redhat.com:/gl uster_volumes/GE_storage9_volume01 49368 0 Y 3739 Brick gluster02.lab.eng.tlv2.redhat.com:/gl uster_volumes/GE_storage9_volume01 49368 0 Y 3800 Brick gluster03.lab.eng.tlv2.redhat.com:/gl uster_volumes/GE_storage9_volume01 49368 0 Y 6991 Self-heal Daemon on localhost N/A N/A Y 3186417 Self-heal Daemon on gluster03.lab.eng.tlv2. redhat.com N/A N/A Y 3183458 Self-heal Daemon on gluster02.lab.eng.tlv2. redhat.com N/A N/A Y 2389346 Task Status of Volume GE_storage9_volume01 ------------------------------------------------------------------------------ There are no active volume tasks # gluster v get GE_storage9_volume01 all Option Value ------ ----- cluster.lookup-unhashed on cluster.lookup-optimize on cluster.min-free-disk 10% cluster.min-free-inodes 5% cluster.rebalance-stats off cluster.subvols-per-directory (null) cluster.readdir-optimize off cluster.rsync-hash-regex (null) cluster.extra-hash-regex (null) cluster.dht-xattr-name trusted.glusterfs.dht cluster.randomize-hash-range-by-gfid off cluster.rebal-throttle normal cluster.lock-migration off cluster.force-migration off cluster.local-volume-name (null) cluster.weighted-rebalance on cluster.switch-pattern (null) cluster.entry-change-log on cluster.read-subvolume (null) cluster.read-subvolume-index -1 cluster.read-hash-mode 1 cluster.background-self-heal-count 8 cluster.metadata-self-heal off cluster.data-self-heal off cluster.entry-self-heal off cluster.self-heal-daemon on cluster.heal-timeout 600 cluster.self-heal-window-size 1 cluster.data-change-log on cluster.metadata-change-log on cluster.data-self-heal-algorithm (null) cluster.eager-lock on disperse.eager-lock on disperse.other-eager-lock on disperse.eager-lock-timeout 1 disperse.other-eager-lock-timeout 1 cluster.quorum-type auto cluster.quorum-count (null) cluster.choose-local true cluster.self-heal-readdir-size 1KB cluster.post-op-delay-secs 1 cluster.ensure-durability on cluster.consistent-metadata no cluster.heal-wait-queue-length 128 cluster.favorite-child-policy none cluster.full-lock yes diagnostics.latency-measurement off diagnostics.dump-fd-stats off diagnostics.count-fop-hits off diagnostics.brick-log-level INFO diagnostics.client-log-level INFO 
diagnostics.brick-sys-log-level CRITICAL diagnostics.client-sys-log-level CRITICAL diagnostics.brick-logger (null) diagnostics.client-logger (null) diagnostics.brick-log-format (null) diagnostics.client-log-format (null) diagnostics.brick-log-buf-size 5 diagnostics.client-log-buf-size 5 diagnostics.brick-log-flush-timeout 120 diagnostics.client-log-flush-timeout 120 diagnostics.stats-dump-interval 0 diagnostics.fop-sample-interval 0 diagnostics.stats-dump-format json diagnostics.fop-sample-buf-size 65535 diagnostics.stats-dnscache-ttl-sec 86400 performance.cache-max-file-size 0 performance.cache-min-file-size 0 performance.cache-refresh-timeout 1 performance.cache-priority performance.cache-size 32MB performance.io-thread-count 16 performance.high-prio-threads 16 performance.normal-prio-threads 16 performance.low-prio-threads 16 performance.least-prio-threads 1 performance.enable-least-priority on performance.iot-watchdog-secs (null) performance.iot-cleanup-disconnected-reqsoff performance.iot-pass-through false performance.io-cache-pass-through false performance.cache-size 128MB performance.qr-cache-timeout 1 performance.quick-read-cache-invalidationfalse performance.ctime-invalidation false performance.flush-behind on performance.nfs.flush-behind on performance.write-behind-window-size 1MB performance.resync-failed-syncs-after-fsyncoff performance.nfs.write-behind-window-size1MB performance.strict-o-direct off performance.nfs.strict-o-direct off performance.strict-write-ordering off performance.nfs.strict-write-ordering off performance.write-behind-trickling-writeson performance.aggregate-size 128KB performance.nfs.write-behind-trickling-writeson performance.lazy-open yes performance.read-after-open yes performance.open-behind-pass-through false performance.read-ahead-page-count 4 performance.read-ahead-pass-through false performance.readdir-ahead-pass-through false performance.md-cache-pass-through false performance.md-cache-timeout 1 performance.cache-swift-metadata true performance.cache-samba-metadata false performance.cache-capability-xattrs true performance.cache-ima-xattrs true performance.md-cache-statfs off performance.xattr-cache-list performance.nl-cache-pass-through false features.encryption off network.frame-timeout 1800 network.ping-timeout 42 network.tcp-window-size (null) client.ssl off network.remote-dio disable client.event-threads 2 client.tcp-user-timeout 0 client.keepalive-time 20 client.keepalive-interval 2 client.keepalive-count 9 network.tcp-window-size (null) network.inode-lru-limit 16384 auth.allow * auth.reject (null) transport.keepalive 1 server.allow-insecure on server.root-squash off server.all-squash off server.anonuid 65534 server.anongid 65534 server.statedump-path /var/run/gluster server.outstanding-rpc-limit 64 server.ssl off auth.ssl-allow * server.manage-gids off server.dynamic-auth on client.send-gids on server.gid-timeout 300 server.own-thread (null) server.event-threads 2 server.tcp-user-timeout 42 server.keepalive-time 20 server.keepalive-interval 2 server.keepalive-count 9 transport.listen-backlog 1024 transport.address-family inet6 performance.write-behind on performance.read-ahead off performance.readdir-ahead off performance.io-cache off performance.open-behind on performance.quick-read on performance.nl-cache off performance.stat-prefetch on performance.client-io-threads off performance.nfs.write-behind on performance.nfs.read-ahead off performance.nfs.io-cache off performance.nfs.quick-read off performance.nfs.stat-prefetch off 
performance.nfs.io-threads off performance.force-readdirp true performance.cache-invalidation false performance.global-cache-invalidation true features.uss off features.snapshot-directory .snaps features.show-snapshot-directory off features.tag-namespaces off network.compression off network.compression.window-size -15 network.compression.mem-level 8 network.compression.min-size 0 network.compression.compression-level -1 network.compression.debug false features.default-soft-limit 80% features.soft-timeout 60 features.hard-timeout 5 features.alert-time 86400 features.quota-deem-statfs off geo-replication.indexing off geo-replication.indexing off geo-replication.ignore-pid-check off geo-replication.ignore-pid-check off features.quota off features.inode-quota off features.bitrot disable debug.trace off debug.log-history no debug.log-file no debug.exclude-ops (null) debug.include-ops (null) debug.error-gen off debug.error-failure (null) debug.error-number (null) debug.random-failure off debug.error-fops (null) nfs.enable-ino32 no nfs.mem-factor 15 nfs.export-dirs on nfs.export-volumes on nfs.addr-namelookup off nfs.dynamic-volumes off nfs.register-with-portmap on nfs.outstanding-rpc-limit 16 nfs.port 2049 nfs.rpc-auth-unix on nfs.rpc-auth-null on nfs.rpc-auth-allow all nfs.rpc-auth-reject none nfs.ports-insecure off nfs.trusted-sync off nfs.trusted-write off nfs.volume-access read-write nfs.export-dir nfs.disable on nfs.nlm on nfs.acl on nfs.mount-udp off nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab nfs.rpc-statd /sbin/rpc.statd nfs.server-aux-gids off nfs.drc off nfs.drc-size 0x20000 nfs.read-size (1 * 1048576ULL) nfs.write-size (1 * 1048576ULL) nfs.readdir-size (1 * 1048576ULL) nfs.rdirplus on nfs.event-threads 2 nfs.exports-auth-enable (null) nfs.auth-refresh-interval-sec (null) nfs.auth-cache-ttl-sec (null) features.read-only off features.worm off features.worm-file-level off features.worm-files-deletable on features.default-retention-period 120 features.retention-mode relax features.auto-commit-period 180 storage.linux-aio off storage.batch-fsync-mode reverse-fsync storage.batch-fsync-delay-usec 0 storage.owner-uid 36 storage.owner-gid 36 storage.node-uuid-pathinfo off storage.health-check-interval 30 storage.build-pgfid off storage.gfid2path on storage.gfid2path-separator : storage.reserve 1 storage.health-check-timeout 20 storage.fips-mode-rchecksum on storage.force-create-mode 0000 storage.force-directory-mode 0000 storage.create-mask 0777 storage.create-directory-mask 0777 storage.max-hardlinks 100 features.ctime off config.gfproxyd off cluster.server-quorum-type off cluster.server-quorum-ratio 51 changelog.changelog off changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs changelog.encoding ascii changelog.rollover-time 15 changelog.fsync-interval 5 changelog.changelog-barrier-timeout 120 changelog.capture-del-path off features.barrier disable features.barrier-timeout 120 features.trash off features.trash-dir .trashcan features.trash-eliminate-path (null) features.trash-max-filesize 5MB features.trash-internal-op off cluster.enable-shared-storage disable cluster.write-freq-threshold 0 cluster.read-freq-threshold 0 cluster.tier-pause off cluster.tier-promote-frequency 120 cluster.tier-demote-frequency 3600 cluster.watermark-hi 90 cluster.watermark-low 75 cluster.tier-mode cache cluster.tier-max-promote-file-size 0 cluster.tier-max-mb 4000 cluster.tier-max-files 10000 cluster.tier-query-limit 100 cluster.tier-compact on cluster.tier-hot-compact-frequency 604800 
cluster.tier-cold-compact-frequency 604800 features.ctr-enabled off features.record-counters off features.ctr-record-metadata-heat off features.ctr_link_consistency off features.ctr_lookupheal_link_timeout 300 features.ctr_lookupheal_inode_timeout 300 features.ctr-sql-db-cachesize 12500 features.ctr-sql-db-wal-autocheckpoint 25000 locks.trace off locks.mandatory-locking off cluster.disperse-self-heal-daemon enable cluster.quorum-reads no client.bind-insecure (null) features.shard off features.shard-block-size 64MB features.shard-lru-limit 16384 features.shard-deletion-rate 100 features.scrub-throttle lazy features.scrub-freq biweekly features.scrub false features.expiry-time 120 features.cache-invalidation off features.cache-invalidation-timeout 60 ganesha.enable off features.leases off features.lease-lock-recall-timeout 60 disperse.background-heals 8 disperse.heal-wait-qlength 128 cluster.heal-timeout 600 dht.force-readdirp on disperse.read-policy gfid-hash cluster.shd-max-threads 1 cluster.shd-wait-qlength 1024 cluster.locking-scheme full cluster.granular-entry-heal no features.locks-revocation-secs 0 features.locks-revocation-clear-all false features.locks-revocation-max-blocked 0 features.locks-monkey-unlocking false features.locks-notify-contention yes features.locks-notify-contention-delay 5 disperse.shd-max-threads 1 disperse.shd-wait-qlength 1024 disperse.cpu-extensions auto disperse.self-heal-window-size 1 cluster.use-compound-fops off performance.parallel-readdir off performance.rda-request-size 131072 performance.rda-low-wmark 4096 performance.rda-high-wmark 128KB performance.rda-cache-limit 10MB performance.nl-cache-positive-entry false performance.nl-cache-limit 10MB performance.nl-cache-timeout 60 cluster.brick-multiplex disable glusterd.vol_count_per_thread 100 cluster.max-bricks-per-process 250 disperse.optimistic-change-log on disperse.stripe-cache 4 cluster.halo-enabled False cluster.halo-shd-max-latency 99999 cluster.halo-nfsd-max-latency 5 cluster.halo-max-latency 5 cluster.halo-max-replicas 99999 cluster.halo-min-replicas 2 features.selinux off cluster.daemon-log-level INFO debug.delay-gen off delay-gen.delay-percentage 10% delay-gen.delay-duration 100000 delay-gen.enable disperse.parallel-writes on disperse.quorum-count 0 features.sdfs off features.cloudsync off features.ctime off ctime.noatime on features.cloudsync-storetype (null) features.enforce-mandatory-lock off features.cloudsync-remote-read off features.cloudsync-store-id (null) features.cloudsync-product-id (null) Based on bug 1104618, this issue is mitigated by volume option: cluster.consistent-metadata on Currently we have: # gluster v get GE_storage9_volume01 all | grep cluster.consistent-metadata cluster.consistent-metadata no I tried to enable this option: # gluster v set GE_storage9_volume01 cluster.consistent-metadata on volume set: success But the this does not fix the issue. Maybe we need to unmount and remount to see the effect?
Based on https://bugzilla.redhat.com/show_bug.cgi?id=1104618#c16 we need to remount the volume to see the change.
Correction: I queried the wrong volume before (GE_storage9_volume01) Here are the details of the right volume: # gluster v info GE_storage7_volume01 Volume Name: GE_storage7_volume01 Type: Replicate Volume ID: 60a2d971-9833-48b0-86f9-320a65f2f1d0 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: gluster01.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage7_volume01 Brick2: gluster02.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage7_volume01 Brick3: gluster03.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage7_volume01 Options Reconfigured: storage.owner-uid: 36 storage.owner-gid: 36 transport.address-family: inet6 storage.fips-mode-rchecksum: on nfs.disable: on performance.client-io-threads: off # gluster v status GE_storage7_volume01 Status of volume: GE_storage7_volume01 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick gluster01.lab.eng.tlv2.redhat.com:/gl uster_volumes/GE_storage7_volume01 49354 0 Y 3585 Brick gluster02.lab.eng.tlv2.redhat.com:/gl uster_volumes/GE_storage7_volume01 49354 0 Y 3646 Brick gluster03.lab.eng.tlv2.redhat.com:/gl uster_volumes/GE_storage7_volume01 49354 0 Y 6837 Self-heal Daemon on localhost N/A N/A Y 3186417 Self-heal Daemon on gluster02.lab.eng.tlv2. redhat.com N/A N/A Y 2389346 Self-heal Daemon on gluster03.lab.eng.tlv2. redhat.com N/A N/A Y 3183458 Task Status of Volume GE_storage7_volume01 ------------------------------------------------------------------------------ There are no active volume tasks # gluster v get GE_storage7_volume01 all Option Value ------ ----- cluster.lookup-unhashed on cluster.lookup-optimize on cluster.min-free-disk 10% cluster.min-free-inodes 5% cluster.rebalance-stats off cluster.subvols-per-directory (null) cluster.readdir-optimize off cluster.rsync-hash-regex (null) cluster.extra-hash-regex (null) cluster.dht-xattr-name trusted.glusterfs.dht cluster.randomize-hash-range-by-gfid off cluster.rebal-throttle normal cluster.lock-migration off cluster.force-migration off cluster.local-volume-name (null) cluster.weighted-rebalance on cluster.switch-pattern (null) cluster.entry-change-log on cluster.read-subvolume (null) cluster.read-subvolume-index -1 cluster.read-hash-mode 1 cluster.background-self-heal-count 8 cluster.metadata-self-heal off cluster.data-self-heal off cluster.entry-self-heal off cluster.self-heal-daemon on cluster.heal-timeout 600 cluster.self-heal-window-size 1 cluster.data-change-log on cluster.metadata-change-log on cluster.data-self-heal-algorithm (null) cluster.eager-lock on disperse.eager-lock on disperse.other-eager-lock on disperse.eager-lock-timeout 1 disperse.other-eager-lock-timeout 1 cluster.quorum-type auto cluster.quorum-count (null) cluster.choose-local true cluster.self-heal-readdir-size 1KB cluster.post-op-delay-secs 1 cluster.ensure-durability on cluster.consistent-metadata no cluster.heal-wait-queue-length 128 cluster.favorite-child-policy none cluster.full-lock yes diagnostics.latency-measurement off diagnostics.dump-fd-stats off diagnostics.count-fop-hits off diagnostics.brick-log-level INFO diagnostics.client-log-level INFO diagnostics.brick-sys-log-level CRITICAL diagnostics.client-sys-log-level CRITICAL diagnostics.brick-logger (null) diagnostics.client-logger (null) diagnostics.brick-log-format (null) diagnostics.client-log-format (null) diagnostics.brick-log-buf-size 5 diagnostics.client-log-buf-size 5 diagnostics.brick-log-flush-timeout 120 
diagnostics.client-log-flush-timeout 120 diagnostics.stats-dump-interval 0 diagnostics.fop-sample-interval 0 diagnostics.stats-dump-format json diagnostics.fop-sample-buf-size 65535 diagnostics.stats-dnscache-ttl-sec 86400 performance.cache-max-file-size 0 performance.cache-min-file-size 0 performance.cache-refresh-timeout 1 performance.cache-priority performance.cache-size 32MB performance.io-thread-count 16 performance.high-prio-threads 16 performance.normal-prio-threads 16 performance.low-prio-threads 16 performance.least-prio-threads 1 performance.enable-least-priority on performance.iot-watchdog-secs (null) performance.iot-cleanup-disconnected-reqsoff performance.iot-pass-through false performance.io-cache-pass-through false performance.cache-size 128MB performance.qr-cache-timeout 1 performance.quick-read-cache-invalidationfalse performance.ctime-invalidation false performance.flush-behind on performance.nfs.flush-behind on performance.write-behind-window-size 1MB performance.resync-failed-syncs-after-fsyncoff performance.nfs.write-behind-window-size1MB performance.strict-o-direct off performance.nfs.strict-o-direct off performance.strict-write-ordering off performance.nfs.strict-write-ordering off performance.write-behind-trickling-writeson performance.aggregate-size 128KB performance.nfs.write-behind-trickling-writeson performance.lazy-open yes performance.read-after-open yes performance.open-behind-pass-through false performance.read-ahead-page-count 4 performance.read-ahead-pass-through false performance.readdir-ahead-pass-through false performance.md-cache-pass-through false performance.md-cache-timeout 1 performance.cache-swift-metadata true performance.cache-samba-metadata false performance.cache-capability-xattrs true performance.cache-ima-xattrs true performance.md-cache-statfs off performance.xattr-cache-list performance.nl-cache-pass-through false features.encryption off network.frame-timeout 1800 network.ping-timeout 42 network.tcp-window-size (null) client.ssl off network.remote-dio disable client.event-threads 2 client.tcp-user-timeout 0 client.keepalive-time 20 client.keepalive-interval 2 client.keepalive-count 9 network.tcp-window-size (null) network.inode-lru-limit 16384 auth.allow * auth.reject (null) transport.keepalive 1 server.allow-insecure on server.root-squash off server.all-squash off server.anonuid 65534 server.anongid 65534 server.statedump-path /var/run/gluster server.outstanding-rpc-limit 64 server.ssl off auth.ssl-allow * server.manage-gids off server.dynamic-auth on client.send-gids on server.gid-timeout 300 server.own-thread (null) server.event-threads 2 server.tcp-user-timeout 42 server.keepalive-time 20 server.keepalive-interval 2 server.keepalive-count 9 transport.listen-backlog 1024 transport.address-family inet6 performance.write-behind on performance.read-ahead off performance.readdir-ahead off performance.io-cache off performance.open-behind on performance.quick-read on performance.nl-cache off performance.stat-prefetch on performance.client-io-threads off performance.nfs.write-behind on performance.nfs.read-ahead off performance.nfs.io-cache off performance.nfs.quick-read off performance.nfs.stat-prefetch off performance.nfs.io-threads off performance.force-readdirp true performance.cache-invalidation false performance.global-cache-invalidation true features.uss off features.snapshot-directory .snaps features.show-snapshot-directory off features.tag-namespaces off network.compression off network.compression.window-size -15 
network.compression.mem-level 8 network.compression.min-size 0 network.compression.compression-level -1 network.compression.debug false features.default-soft-limit 80% features.soft-timeout 60 features.hard-timeout 5 features.alert-time 86400 features.quota-deem-statfs off geo-replication.indexing off geo-replication.indexing off geo-replication.ignore-pid-check off geo-replication.ignore-pid-check off features.quota off features.inode-quota off features.bitrot disable debug.trace off debug.log-history no debug.log-file no debug.exclude-ops (null) debug.include-ops (null) debug.error-gen off debug.error-failure (null) debug.error-number (null) debug.random-failure off debug.error-fops (null) nfs.enable-ino32 no nfs.mem-factor 15 nfs.export-dirs on nfs.export-volumes on nfs.addr-namelookup off nfs.dynamic-volumes off nfs.register-with-portmap on nfs.outstanding-rpc-limit 16 nfs.port 2049 nfs.rpc-auth-unix on nfs.rpc-auth-null on nfs.rpc-auth-allow all nfs.rpc-auth-reject none nfs.ports-insecure off nfs.trusted-sync off nfs.trusted-write off nfs.volume-access read-write nfs.export-dir nfs.disable on nfs.nlm on nfs.acl on nfs.mount-udp off nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab nfs.rpc-statd /sbin/rpc.statd nfs.server-aux-gids off nfs.drc off nfs.drc-size 0x20000 nfs.read-size (1 * 1048576ULL) nfs.write-size (1 * 1048576ULL) nfs.readdir-size (1 * 1048576ULL) nfs.rdirplus on nfs.event-threads 2 nfs.exports-auth-enable (null) nfs.auth-refresh-interval-sec (null) nfs.auth-cache-ttl-sec (null) features.read-only off features.worm off features.worm-file-level off features.worm-files-deletable on features.default-retention-period 120 features.retention-mode relax features.auto-commit-period 180 storage.linux-aio off storage.batch-fsync-mode reverse-fsync storage.batch-fsync-delay-usec 0 storage.owner-uid 36 storage.owner-gid 36 storage.node-uuid-pathinfo off storage.health-check-interval 30 storage.build-pgfid off storage.gfid2path on storage.gfid2path-separator : storage.reserve 1 storage.health-check-timeout 20 storage.fips-mode-rchecksum on storage.force-create-mode 0000 storage.force-directory-mode 0000 storage.create-mask 0777 storage.create-directory-mask 0777 storage.max-hardlinks 100 features.ctime off config.gfproxyd off cluster.server-quorum-type off cluster.server-quorum-ratio 51 changelog.changelog off changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs changelog.encoding ascii changelog.rollover-time 15 changelog.fsync-interval 5 changelog.changelog-barrier-timeout 120 changelog.capture-del-path off features.barrier disable features.barrier-timeout 120 features.trash off features.trash-dir .trashcan features.trash-eliminate-path (null) features.trash-max-filesize 5MB features.trash-internal-op off cluster.enable-shared-storage disable cluster.write-freq-threshold 0 cluster.read-freq-threshold 0 cluster.tier-pause off cluster.tier-promote-frequency 120 cluster.tier-demote-frequency 3600 cluster.watermark-hi 90 cluster.watermark-low 75 cluster.tier-mode cache cluster.tier-max-promote-file-size 0 cluster.tier-max-mb 4000 cluster.tier-max-files 10000 cluster.tier-query-limit 100 cluster.tier-compact on cluster.tier-hot-compact-frequency 604800 cluster.tier-cold-compact-frequency 604800 features.ctr-enabled off features.record-counters off features.ctr-record-metadata-heat off features.ctr_link_consistency off features.ctr_lookupheal_link_timeout 300 features.ctr_lookupheal_inode_timeout 300 features.ctr-sql-db-cachesize 12500 features.ctr-sql-db-wal-autocheckpoint 25000 
locks.trace off locks.mandatory-locking off cluster.disperse-self-heal-daemon enable cluster.quorum-reads no client.bind-insecure (null) features.shard off features.shard-block-size 64MB features.shard-lru-limit 16384 features.shard-deletion-rate 100 features.scrub-throttle lazy features.scrub-freq biweekly features.scrub false features.expiry-time 120 features.cache-invalidation off features.cache-invalidation-timeout 60 ganesha.enable off features.leases off features.lease-lock-recall-timeout 60 disperse.background-heals 8 disperse.heal-wait-qlength 128 cluster.heal-timeout 600 dht.force-readdirp on disperse.read-policy gfid-hash cluster.shd-max-threads 1 cluster.shd-wait-qlength 1024 cluster.locking-scheme full cluster.granular-entry-heal no features.locks-revocation-secs 0 features.locks-revocation-clear-all false features.locks-revocation-max-blocked 0 features.locks-monkey-unlocking false features.locks-notify-contention yes features.locks-notify-contention-delay 5 disperse.shd-max-threads 1 disperse.shd-wait-qlength 1024 disperse.cpu-extensions auto disperse.self-heal-window-size 1 cluster.use-compound-fops off performance.parallel-readdir off performance.rda-request-size 131072 performance.rda-low-wmark 4096 performance.rda-high-wmark 128KB performance.rda-cache-limit 10MB performance.nl-cache-positive-entry false performance.nl-cache-limit 10MB performance.nl-cache-timeout 60 cluster.brick-multiplex disable glusterd.vol_count_per_thread 100 cluster.max-bricks-per-process 250 disperse.optimistic-change-log on disperse.stripe-cache 4 cluster.halo-enabled False cluster.halo-shd-max-latency 99999 cluster.halo-nfsd-max-latency 5 cluster.halo-max-latency 5 cluster.halo-max-replicas 99999 cluster.halo-min-replicas 2 features.selinux off cluster.daemon-log-level INFO debug.delay-gen off delay-gen.delay-percentage 10% delay-gen.delay-duration 100000 delay-gen.enable disperse.parallel-writes on disperse.quorum-count 0 features.sdfs off features.cloudsync off features.ctime off ctime.noatime on features.cloudsync-storetype (null) features.enforce-mandatory-lock off features.cloudsync-remote-read off features.cloudsync-store-id (null) features.cloudsync-product-id (null) Relevant options related to bug 1104618: # gluster v get GE_storage7_volume01 cluster.consistent-metadata Option Value ------ ----- cluster.consistent-metadata no # gluster v set GE_storage7_volume01 cluster.consistent-metadata on volume set: success # gluster v get GE_storage7_volume01 cluster.consistent-metadata Option Value ------ ----- cluster.consistent-metadata on After putting all hosts in maintenance, and activating SPM host, the issue cannot be reproduced. # for i in $(seq 100); do rm -rf $M/test; echo iteration $i; sh reproduce.sh $M; echo '----'; done No failure in 100 runs. So the issue is wrong volume option. Avihai: 1. Check that you run the latest gluster version 2. Modify this option in all existing volumes # gluster v set volume-name cluster.consistent-metadata on 3. Check that when creating new volume cluster.consistent-metadata is enabled. If not, we need a gluster bug to include this option in the virt group. 4. Someone need to automatic handle upgrade of old volumes, detecting the incompatible volume option "cluster.consistent-metadata no", and modifying the option to "cluster.consistent-metadata on".
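For item 4 above, a possible starting point is a small script that walks all gluster volumes and enables the option where it is still off, using the same gluster CLI commands shown above. This is an illustrative sketch to run on a gluster server node, not a finished upgrade tool; error handling and output parsing are simplified:

import subprocess

OPTION = "cluster.consistent-metadata"

def gluster(*args):
    # Run a gluster CLI command and return its stdout as text.
    return subprocess.run(
        ("gluster",) + args, check=True, capture_output=True, text=True
    ).stdout

volumes = gluster("volume", "list").split()
for vol in volumes:
    # 'gluster volume get VOL OPTION' prints a header followed by
    # "OPTION  value"; the last whitespace-separated token is the value.
    value = gluster("volume", "get", vol, OPTION).split()[-1]
    if value != "on":
        print("Enabling %s on volume %s" % (OPTION, vol))
        gluster("volume", "set", vol, OPTION, "on")
        # Note: clients must remount the volume to pick up the change
        # (see bug 1104618 comment 16).
    else:
        print("Volume %s already has %s on" % (vol, OPTION))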
Summary - we have 2 issues:

1. Wrong configuration on the gluster volume, causing a very likely failure (~80%) when switching master from a gluster domain.

2. Missing locking in vdsm when reading and writing task data on the master file system. A failure may happen when an SPM task (e.g. create disk, delete disk) runs while switching master. The chance of failure is very low; I could reproduce only one failure when switching master 200 times while moving disks between storage domains at the same time.

I will open another bug for the locking issue.
Created attachment 1831592 [details]
Example script for testing change master

The script uses the change_master.py example from:
https://github.com/oVirt/python-ovirt-engine-sdk4/pull/12

It should run inside the examples directory:

# sh ~/change-master.sh > ~/change-master.log 2>&1

The dc and storage domain names should be modified for the test setup.
Created attachment 1831593 [details]
change-master test log

Log generated using change-master.sh, testing vdsm master:
https://gerrit.ovirt.org/c/vdsm/+/116982
with the engine patch enabling switch master with gluster storage.
Moving back from vdsm/storage to ovirt-engine/gluster, see comment 48.
(In reply to Nir Soffer from comment #49)
> 2. Missing locking in vdsm, when reading and writing tasks data to the
> master file system ...

I filed bug 2012843 for this.
Closing this as not a bug, as this is a configuration issue. Per comment 48, switching the master SD works as expected when the volume option cluster.consistent-metadata is set to on.