Bug 1913764

| Summary: | GlusterFS - Failing to assign new Master SD within the same storage type leads to infinite (or long) reconstruction | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Tal Nisan <tnisan> |
| Component: | BLL.Gluster | Assignee: | Ritesh Chikatwar <rchikatw> |
| Status: | CLOSED NOTABUG | QA Contact: | SATHEESARAN <sasundar> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.4.4 | CC: | aefrat, ahadas, aoconnor, bugs, bzlotnik, dfodor, eshames, eshenitz, godas, lsvaty, michal.skrivanek, nsoffer, rchikatw, sasundar, sfishbai, sleviim, sshmulev, tnisan |
| Target Milestone: | --- | Flags: | rchikatw: needinfo-, rchikatw: needinfo-, rchikatw: needinfo-, sbonazzo: ovirt-4.5-, lsvaty: blocker- |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1911597 | Environment: | |
| Last Closed: | 2021-10-18 05:56:06 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Gluster | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Tal Nisan
2021-01-07 14:37:49 UTC
Hi, I don't have an answer to your question, but for some reason I tried again today to put the gluster domain into maintenance and I couldn't, although the service is running on the host. Ritesh, can you please have a look? Shir, was something changed on the host?

Shani and I talked about this issue privately. I didn't change anything on our environment. The solution that came from the Gluster team - installing the gluster packages on our SPM host - did not resolve the issue. Installing the gluster packages only resolved the issue momentarily on our environment; when we checked it again, we saw that the bug reproduced again, so it does not look like an ideal solution. We are still waiting for a response from Shani and the Gluster team: what other solution does the Gluster team offer, instead of installing the packages, which did not resolve the issue described in this bug? I can provide another 4.4 environment for investigation if needed.

Hello Shir, when I checked the host, the gluster service was not running, so I tried installing it and it did work at the time, so I thought that was the solution. Later, when you tested it, it did not work, so it is surely a different issue and I have to dive deeper into it. Can you share the details of the remote server where the gluster service is running? As of now I don't have an answer to your question why it's failing, so I will keep the needinfo on me.

(In reply to Ritesh Chikatwar from comment #6)
> Hello Shir,
>
> When I checked the host, the gluster service was not running, so I tried
> installing it and it did work at the time, so I thought that was the
> solution.
>
> Later, when you tested it, it did not work, so it is surely a different
> issue and I have to dive deeper into it.
>
> Can you share the details of the remote server where the gluster service is
> running?
>
> As of now I don't have an answer to your question why it's failing, so I
> will keep the needinfo on me.

Of course, I gave you the details via Gchat.

Moving this to 4.4.6 because I am still not able to find the RCA for this.

Created attachment 1761947 [details]
tested switching the master domain locally - working as expected
Hello Shir, I created a local setup to test this and it worked as expected. I attached a video of how I tested it. Shir, can you provide me an environment where I can investigate this? If the details you shared earlier still exist, let me know whether I can use them.

Package details:
Software Version: 4.4.5.8_master
glusterfs-8.3-1.el8.x86_64

Shani, as per Nir, the TarCopyFailed error is part of the deactivate storage domain flow. Switching the master domain is not that flow and should not include deactivating the old storage domain. When I checked this, I could see that all the other gluster domains were locked for a while and then came up again. I guess this should not happen. Have you observed this behavior?

Hi Ritesh, this one is very weird, as the switch master operation should be blocked for GlusterFS domains (bug https://bugzilla.redhat.com/1911597):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>[Cannot switch master storage domain. Switch master storage domain is not supported for gluster-based domains.]</detail>
    <reason>Operation Failed</reason>
</fault>

The operation should also be grayed out in the UI for Gluster domains (Storage Domains > select a domain > 3-dots menu on the right > Set as master storage domain).

Can you try to rebuild your env and try again?

Once the switchMaster operation is blocked, in order to force calling 'master migration', we need to put the current master domain into maintenance and verify that the new domain was elected, without any errors in the VDSM log. Can you please give that a try as well?

Shani, sure, will try and update you.

(In reply to shani from comment #11)
> Hi Ritesh,
> This one is very weird, as the switch master operation should be blocked for
> GlusterFS domains (bug https://bugzilla.redhat.com/1911597):
> ...
> The operation should also be grayed out in the UI for Gluster domains
> (Storage Domains > select a domain > 3-dots menu on the right > Set as
> master storage domain).

Yeah, it did block me, so locally I commented out the condition and tried to switch the master domain.

> Can you try to rebuild your env and try again?
>
> Once the switchMaster operation is blocked, in order to force calling
> 'master migration', we need to put the current master domain into
> maintenance and verify that the new domain was elected, without any errors
> in the VDSM log. Can you please give that a try as well?

Let me try this from the UI. I will keep the needinfo on me as I have to verify this from the UI.

Hello Shani, I tried from the UI and it seems to be working. I am attaching a video for reference.

Created attachment 1762250 [details]
Tested from UI
Hi Ritesh, I agree, it looks like it is working on your env. We still need to figure out what the difference is between your env and the QE's. Here are some thoughts:
- different versions of gluster/any related component
- maybe the fact that the QE are using a remote gluster server affects the scenario
- why there is a TarCopy failure in the QE env logs

Once the QE have an env, those can be tested, but maybe some research can be done by looking at the code. What do you think?

Yeah, let me take a look at the code until we get the setup.

(In reply to Ritesh Chikatwar from comment #16)
> Yeah, let me take a look at the code until we get the setup.

Do we have any progress with the bug?

I have not found anything suspicious in the code; maybe an environment will help to find the exact root cause. Can you provide me a fresh setup so I can dig more into it? Shani/Shir, can you please assist?

This bug reproduced on both the UI and the REST API by trying to switch the master SD from one Gluster SD to another Gluster SD.
The operation fails in both cases.
The reconstruction happens via the UI on the second try to reproduce the issue.
[1]
Rest API:
https://<hostname>/ovirt-engine/api/datacenters/<datacenterID>/setmaster
<action>
<storage_domain id="<storage_domain>"/>
</action>
response:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
<detail>[Cannot switch master storage domain. Switch master storage domain is not supported for gluster-based domains.]</detail>
<reason>Operation Failed</reason>
</fault>
Engine log after the REST API request:
2021-04-28 15:25:52,550+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-191) [] Operation Failed: [Cannot switch master storage domain. Switch master storage domain is not supported for gluster-based domains.]
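For completeness, the same request can be scripted; below is a minimal sketch using python-requests, not a script attached to this bug. The engine hostname, credentials, and CA path are placeholders, and the UUIDs are only the example values that appear in the logs in this bug.

# Minimal sketch (not from the bug): sending the same setmaster request with
# python-requests. Hostname, credentials, CA path, and UUIDs are placeholders.
import requests

ENGINE = "https://engine.example.com"            # placeholder hostname
DC_ID = "bde80e57-4ece-44bd-b884-46099c442133"   # data center UUID (example from the logs)
SD_ID = "e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd"   # new master SD UUID (example from the logs)

body = '<action><storage_domain id="{}"/></action>'.format(SD_ID)

resp = requests.post(
    "{}/ovirt-engine/api/datacenters/{}/setmaster".format(ENGINE, DC_ID),
    data=body,
    headers={"Content-Type": "application/xml", "Accept": "application/xml"},
    auth=("admin@internal", "password"),         # placeholder credentials
    verify="ca.pem",                             # engine CA certificate (placeholder path)
)

print(resp.status_code)
# With a gluster-based master SD this returns the "Operation Failed" fault shown above.
print(resp.text)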
[2]
On UI:
1. Have an env with more than one GlusterFS SD (3 Gluster SDs).
2. Assign master from GlusterFS SD A to GlusterFS SD B.
2021-04-28 15:28:57,269+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [50024de0] Failed in 'DeactivateStorageDomainVDS' method
2021-04-28 15:28:57,283+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [50024de0] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command DeactivateStorageDomainVDS failed: Error in storage domain action: ('sdUUID=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, spUUID=bde80e57-4ece-44bd-b884-46099c442133, msdUUID=ad26e577-0099-43f2-b43a-5e3c3eeee4d1, masterVersion=5',)
2021-04-28 15:28:57,332+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [50024de0] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: Failed to DeactivateStorageDomainVDS, error = Error in storage domain action: ('sdUUID=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, spUUID=bde80e57-4ece-44bd-b884-46099c442133, msdUUID=ad26e577-0099-43f2-b43a-5e3c3eeee4d1, masterVersion=5',), code = 350
2021-04-28 15:29:03,279+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command ConnectStoragePoolVDS failed: Wrong Master domain or its version: 'SD=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, pool=bde80e57-4ece-44bd-b884-46099c442133'
2021-04-28 15:29:03,344+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] Command 'ConnectStoragePoolVDSCommand(HostName = host_mixed_2, ConnectStoragePoolVDSCommandParameters:{hostId='bdb5f6cd-f0fb-4352-bd0d-25856b633a91', vdsId='bdb5f6cd-f0fb-4352-bd0d-25856b633a91', storagePoolId='bde80e57-4ece-44bd-b884-46099c442133', masterVersion='5'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, pool=bde80e57-4ece-44bd-b884-46099c442133'
2021-04-28 15:29:03,344+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, pool=bde80e57-4ece-44bd-b884-46099c442133'
2021-04-28 15:29:03,375+03 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] DeactivateStorageDomainVDS failed 'e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, pool=bde80e57-4ece-44bd-b884-46099c442133' (Failed with error StoragePoolWrongMaster and code 324)
2021-04-28 15:29:03,379+03 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] Failed to deactivate storage domain 'e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd'
2021-04-28 15:29:03,438+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-65) [] Command 'HSMGetAllTasksStatusesVDSCommand(HostName = host_mixed_2, VdsIdVDSCommandParametersBase:{hostId='bdb5f6cd-f0fb-4352-bd0d-25856b633a91'})' execution failed: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM: ()
2021-04-28 15:29:04,255+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain test_gluster_2 (Data Center golden_env_mixed).
versions:
ovirt-engine-4.4.6.5-447.gd80dda7.9.el8ev.noarch
vdsm-4.40.60.4-1.el8ev.x86_64
glusterfs-6.0-49.1.el8.x86_64
Created attachment 1776597 [details]
Logs
*** Bug 1956263 has been marked as a duplicate of this bug. ***

(In reply to Ritesh Chikatwar from comment #26)
> Regarding TarCopyFailed exception.
>
> this is coming from this line
> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/fileUtils.py#L59
> it's returning return code as 1, this is because tar is running on dynamic
> directory because of this it's throwing that error.
> i tried running that command on one of my setup to see what i get
>
> [root@rhsqa-grafton1-nic2 ~]# /usr/bin/tar cf -f --exclude=./lost+found -C
> /rhev/data-center/mnt/glusterSD/<hostname>:_vmstore .
> /usr/bin/tar:
> ./e5a79957-bb2b-414f-bf6f-cbb2a4b98d0b/images/0b63e72a-ec4a-43e1-ba75-
> a480c58b54c9: file changed as we read it

The issue is that some file in the master mount changed during the copy. This is a real issue in vdsm. It must use a lock to prevent tasks from writing task state while switching the master storage domain from one domain to the other.

Normally every task writes to its own directory, so multiple tasks can write together without any locking. But when we copy the entire directory, we need to lock the entire master mount while we copy.

Moving the bug to vdsm; this is not related to engine or gluster.

Evelina, do we know how to reproduce this bug? If we have a good way to reproduce it, it will help to do early testing before we merge the fix. The fix is also risky - adding locking in very dark corners of vdsm, which may cause deadlocks. I would like to make sure all QE tiers work with the fix.

Reproduced with an NFS storage domain, with a patch improving error reporting in tarCopy:
2021-09-14 00:33:44,369+0300 ERROR (tasks/1) [storage.taskmanager.task] (Task='928b287c-fdd0-487f-bec5-f98d63b63ba2') Unexpected error (task:877)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
return fn(*args, **kargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
return self.cmd(*self.argslist, **self.argsdict)
File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
return method(self, *args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 989, in switchMaster
self.masterMigrate(oldMasterUUID, newMasterUUID, masterVersion)
File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
return method(self, *args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 914, in masterMigrate
exclude=('./lost+found',))
File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 101, in tarCopy
raise se.TarCommandError(errors)
vdsm.storage.exception.TarCommandError: Tar command failed: ({'reader': {'cmd': ['/usr/bin/tar', 'cf', '-', '--exclude=./lost+found', '-C', '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master', '.'], 'rc': 1, 'err': '/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76: file changed as we read it\n/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp: File removed before we read it\n/usr/bin/tar: ./tasks: file changed as we read it\n'}},)
The issues are:
- task directory changed during copy
- temporary task directory removed while reading
Created attachment 1822863 [details]
vdsm.log for reproduction on NFS storage domain
Log during tarCopy failure:
Starting tarCopy:
2021-09-14 00:33:44,305+0300 INFO (tasks/1) [storage.fileutils] Copying '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master' to '/rhev/data-center/mnt/sparse:_export_01/0c065db4-feca-4179-801b-b51be9499943/master' (fileUtils:55)
Starting reader:
2021-09-14 00:33:44,305+0300 DEBUG (tasks/1) [storage.fileutils] Starting reader: ['/usr/bin/tar', 'cf', '-', '--exclude=./lost+found', '-C', '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master', '.'] (fileUtils:62)
While copying the master filesystem, tasks are modifying the tasks/ directory:
Task 1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 creates temporary directory:
2021-09-14 00:33:44,326+0300 DEBUG (jsonrpc/1) [storage.taskmanager.task] (Task='1b1ef2f3-6e2b-4dde-9e07-f720569ccd76') _save: orig /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 temp /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp (task:781)
Task 1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 modifies temporary directory:
2021-09-14 00:33:44,335+0300 DEBUG (jsonrpc/1) [storage.taskmanager.task] (Task='1b1ef2f3-6e2b-4dde-9e07-f720569ccd76') _save: orig /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 temp /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp (task:781)
Task 1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 modifies temporary directory:
2021-09-14 00:33:44,347+0300 DEBUG (tasks/5) [storage.taskmanager.task] (Task='1b1ef2f3-6e2b-4dde-9e07-f720569ccd76') _save: orig /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 temp /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp (task:781)
tarCopy ends:
2021-09-14 00:33:44,365+0300 DEBUG (tasks/1) [storage.fileutils] Child processes terminated (fileUtils:79)
2021-09-14 00:33:44,365+0300 ERROR (tasks/1) [storage.storagepool] Migration to new master 0c065db4-feca-4179-801b-b51be9499943 failed (sp:926)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 914, in masterMigrate
exclude=('./lost+found',))
File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 101, in tarCopy
raise se.TarCommandError(errors)
vdsm.storage.exception.TarCommandError: Tar command failed: ({'reader': {'cmd': ['/usr/bin/tar', 'cf', '-', '--exclude=./lost+found', '-C', '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master', '.'], 'rc': 1, 'err': '/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76: file changed as we read it\n/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp: File removed before we read it\n/usr/bin/tar: ./tasks: file changed as we read it\n'}},)
tarCopy failed because the task directory was modified while reading it, and the task
temporary directory was deleted (part of task saving flow).
So we need to make sure tasks cannot modify storage (write data, create/remove directories/files)
during tarCopy.
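To make the needed locking concrete, here is a minimal illustrative sketch of that rule: task-state writes and the master filesystem copy must never overlap. This is not the actual vdsm implementation; the class and method names are invented for illustration only.

# Illustrative sketch only - not vdsm code. It shows the rule derived above:
# task state writes under .../master/tasks/ and the tarCopy of the master
# filesystem must never run at the same time.
import threading
from contextlib import contextmanager

class MasterFsGuard:
    """Hypothetical guard serializing writers of the master filesystem."""

    def __init__(self):
        self._lock = threading.Lock()

    @contextmanager
    def writing_task_state(self):
        # Taken by a task while it persists state (create <id>.temp, rename over <id>).
        with self._lock:
            yield

    @contextmanager
    def copying_master(self):
        # Taken for the whole duration of tarCopy() during masterMigrate(), so
        # no task directory can change or disappear while tar reads the tree.
        with self._lock:
            yield

# Usage sketch:
guard = MasterFsGuard()

def save_task_state(persist):
    with guard.writing_task_state():
        persist()                # the task's own state-saving callback

def migrate_master(tar_copy, src, dst):
    with guard.copying_master():
        tar_copy(src, dst, exclude=("./lost+found",))

A plain mutex like this would also serialize unrelated tasks against each other, which is not needed; a read-write lock (tasks take it shared, the copy takes it exclusive) would keep the current no-locking behavior between tasks while still excluding them during the copy.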
Created attachment 1822866 [details]
reproduction log
Log showing how to reproduce this issue using the new change_master.py SDK example script.

To reproduce the issue, you can use the change_master.py example script:
https://github.com/oVirt/python-ovirt-engine-sdk4/pull/12

1. Create 10 1g empty preallocated disks.

2. Start a switching-master loop:

$ for i in $(seq 100); do
      ./change_master.py -c engine-dev dc1 sd-1
      ./change_master.py -c engine-dev dc1 sd-2
  done

3. Move all disks between 2 other storage domains (e.g. sd-3, sd-4). Using the same storage domains used as master will not reproduce the issue, since a disk copy takes a lock on the storage domain, and this blocks the switch master operation during the copy. When starting a move, the system creates 10 SPM tasks. When ending the move, the system creates 10 SPM tasks to delete the original disks. When all disks have moved, move them back to the original domain. Repeat while switch master is running. This step should be automated (a rough sketch is included after the commit note below).

Expected results:
Without the fix, switch master may fail after many iterations (see comment 35).
With the fix, switch master should complete successfully.

In the failing test (attachment 1822866 [details]), it took 111 change_master.py runs to fail.

Notes:

It is expected to see these messages:

    Setting storage domain sparse-nfs-00 as master
    Operation failed: [Cannot switch master storage domain. The storage pool has running tasks.]
    Trying again in 10 seconds...

This means that SPM tasks are running, and change_master will retry the operation.

It may be easier to reproduce with Gluster storage, but for this we must have an engine version that does not block the switch master operation for Gluster.

Ritesh, if you have a Gluster environment available and you know how to
reproduce this quickly, it would help to reproduce again with current
vdsm master, including this commit:
commit 3cad5b9237ce2030861132173d4fc6bb9782fc08
fileUtils: Fix error handling and logging in tarCopy
This commit improves the error logs when tarCopy fails, so we will have a
better understanding of why it fails in Gluster. You will get a detailed
exception like the one in comment 35 (tested on NFS).
You should be able to reproduce this by putting the master storage domain into
maintenance, as described in section 2 of comment 0.
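The disk-move part of step 3 above can be scripted with the Python SDK. The sketch below is not a script attached to this bug: the connection details and the disk-name prefix are placeholders, sd-3/sd-4 are just the example names from the steps, and it relies on the SDK's DiskService.move() call, which should be verified against the SDK version in use.

# Rough sketch (not an attached script): keep moving a set of test disks back
# and forth between sd-3 and sd-4 while the change_master.py loop runs in
# another shell. Connection details and the disk-name prefix are placeholders.
import time
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",
    username="admin@internal",
    password="password",
    ca_file="ca.pem",
)

disks_service = connection.system_service().disks_service()
test_disks = [d for d in disks_service.list()
              if d.name and d.name.startswith("switch-test-")]

def move_all(target_sd_name):
    # Start the moves (each one creates SPM copy tasks, then delete tasks).
    for disk in test_disks:
        disks_service.disk_service(disk.id).move(
            storage_domain=types.StorageDomain(name=target_sd_name))
    # Wait until every disk is unlocked before moving in the other direction.
    for disk in test_disks:
        while disks_service.disk_service(disk.id).get().status != types.DiskStatus.OK:
            time.sleep(5)

try:
    for _ in range(100):
        move_all("sd-4")
        move_all("sd-3")
finally:
    connection.close()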
Ritesh, we can also test the fix on Gluster with these repos:
https://jenkins.ovirt.org/job/ovirt-engine_standard-check-patch/14264/artifact/build-artifacts.el8stream.x86_64/
https://jenkins.ovirt.org/job/vdsm_standard-check-patch/29778/artifact/build-artifacts.build-py3.el8stream.x86_64/

@nsoffer, as of now I don't have a Gluster environment available; I have to create one and test this. I will update the status with this change.

Hey Nir, I tried reproducing this issue on my local setup but I couldn't; it looks like your change is working. If you are interested in doing some of the testing, I can provide the env - please ping me over chat.

(In reply to Ritesh Chikatwar from comment #43)
> I tried reproducing this issue on my local setup but I couldn't; it looks
> like your change is working.

Did you reproduce first with current master? Otherwise we cannot tell if the change fixed the issue, or you just did not reproduce the issue for another reason. You can downgrade vdsm to the latest released version and run the flow that reproduced this issue in the past. The change_master.py example should make it easy to reproduce, see comment 39.

We reproduced the issue with gluster with the vdsm locking fix:
https://gerrit.ovirt.org/c/vdsm/+/116413
and with an additional fix syncing the master fs before copying it:
https://gerrit.ovirt.org/c/vdsm/+/117047

Both fixes do not help; copying the master filesystem with tar always fails on gluster with:

/usr/bin/tar: .: file changed as we read it

This looks exactly like bug 1104618, but that bug seems to be fixed in glusterfs-3.6.0.44-1, while we use:

# rpm -q glusterfs
glusterfs-9.4-1.el8s.x86_64

So this looks like a gluster regression.

I could reproduce this issue without RHV:

# cat reproduce.sh
set -e

mp=${1:?Usage: sh reproduce.sh MOUNTPOINT}
task=`uuidgen`

mkdir $mp/test
mkdir $mp/test/tasks
mkdir $mp/test/tasks/$task
echo "task data" > $mp/test/tasks/$task/$task.job.0

sync --file-system $mp/test
tar cf test.tar -C $mp/test .
Reproducing example:

# mount | grep gluster
gluster01.lab.eng.tlv2.redhat.com:/GE_storage7_volume01 on /rhev/data-center/mnt/glusterSD/gluster01.lab.eng.tlv2.redhat.com:_GE__storage7__volume01 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

# echo "testing tar copy" >> /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com\:_GE__storage7__volume01.log

# for i in $(seq 10); do rm -rf $M/test; echo iteration $i; sh reproduce.sh $M; echo '----'; done
iteration 1
----
iteration 2
tar: ./tasks/fd4c4d2c-5d55-41a8-b499-ade2172f7442: file changed as we read it
tar: ./tasks: file changed as we read it
----
iteration 3
tar: ./tasks/b23d7fc2-5e7c-408e-9c65-e0cc27dc33d0: file changed as we read it
tar: ./tasks: file changed as we read it
----
iteration 4
tar: ./tasks: file changed as we read it
----
iteration 5
----
iteration 6
tar: ./tasks: file changed as we read it
tar: .: file changed as we read it
----
iteration 7
tar: ./tasks/dd042dbd-8569-4257-b2c0-7a7b613ecd6b: file changed as we read it
tar: ./tasks: file changed as we read it
tar: .: file changed as we read it
----
iteration 8
tar: ./tasks: file changed as we read it
tar: .: file changed as we read it
----
iteration 9
tar: ./tasks/40790646-c689-42f4-8ae6-6ca66c4556f2: file changed as we read it
tar: ./tasks: file changed as we read it
----
iteration 10
tar: ./tasks: file changed as we read it
----

There are no errors in the glusterfs logs:

# tail -1 /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com\:_GE__storage7__volume01.log
testing tar copy

We should move this bug to gluster, or create a new gluster bug, but I don't have the info needed to open the bug (package versions, volume info, etc.). Gobinda, who can help with this?

More info I could get from one of the gluster servers:
# rpm -qa | grep gluster
glusterfs-libs-6.0-37.el8rhgs.x86_64
glusterfs-6.0-37.el8rhgs.x86_64
glusterfs-server-6.0-37.el8rhgs.x86_64
glusterfs-client-xlators-6.0-37.el8rhgs.x86_64
glusterfs-fuse-6.0-37.el8rhgs.x86_64
glusterfs-api-6.0-37.el8rhgs.x86_64
glusterfs-cli-6.0-37.el8rhgs.x86_64
python3-gluster-6.0-37.el8rhgs.x86_64
glusterfs-geo-replication-6.0-37.el8rhgs.x86_64
# gluster v info GE_storage9_volume01
Volume Name: GE_storage9_volume01
Type: Replicate
Volume ID: 39feddf0-a521-4977-85bd-d5d2d5d19c65
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster01.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage9_volume01
Brick2: gluster02.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage9_volume01
Brick3: gluster03.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage9_volume01
Options Reconfigured:
storage.owner-uid: 36
storage.owner-gid: 36
transport.address-family: inet6
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
# gluster v status GE_storage9_volume01
Status of volume: GE_storage9_volume01
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster01.lab.eng.tlv2.redhat.com:/gl
uster_volumes/GE_storage9_volume01 49368 0 Y 3739
Brick gluster02.lab.eng.tlv2.redhat.com:/gl
uster_volumes/GE_storage9_volume01 49368 0 Y 3800
Brick gluster03.lab.eng.tlv2.redhat.com:/gl
uster_volumes/GE_storage9_volume01 49368 0 Y 6991
Self-heal Daemon on localhost N/A N/A Y 3186417
Self-heal Daemon on gluster03.lab.eng.tlv2.
redhat.com N/A N/A Y 3183458
Self-heal Daemon on gluster02.lab.eng.tlv2.
redhat.com N/A N/A Y 2389346
Task Status of Volume GE_storage9_volume01
------------------------------------------------------------------------------
There are no active volume tasks
# gluster v get GE_storage9_volume01 all
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize on
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.force-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal off
cluster.data-self-heal off
cluster.entry-self-heal off
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock on
disperse.eager-lock on
disperse.other-eager-lock on
disperse.eager-lock-timeout 1
disperse.other-eager-lock-timeout 1
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.full-lock yes
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.stats-dump-format json
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 16
performance.least-prio-threads 1
performance.enable-least-priority on
performance.iot-watchdog-secs (null)
performance.iot-cleanup-disconnected-reqsoff
performance.iot-pass-through false
performance.io-cache-pass-through false
performance.cache-size 128MB
performance.qr-cache-timeout 1
performance.quick-read-cache-invalidationfalse
performance.ctime-invalidation false
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsyncoff
performance.nfs.write-behind-window-size1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.write-behind-trickling-writeson
performance.aggregate-size 128KB
performance.nfs.write-behind-trickling-writeson
performance.lazy-open yes
performance.read-after-open yes
performance.open-behind-pass-through false
performance.read-ahead-page-count 4
performance.read-ahead-pass-through false
performance.readdir-ahead-pass-through false
performance.md-cache-pass-through false
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs true
performance.md-cache-statfs off
performance.xattr-cache-list
performance.nl-cache-pass-through false
features.encryption off
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
client.ssl off
network.remote-dio disable
client.event-threads 2
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive 1
server.allow-insecure on
server.root-squash off
server.all-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
server.ssl off
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 2
server.tcp-user-timeout 42
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 1024
transport.address-family inet6
performance.write-behind on
performance.read-ahead off
performance.readdir-ahead off
performance.io-cache off
performance.open-behind on
performance.quick-read on
performance.nl-cache off
performance.stat-prefetch on
performance.client-io-threads off
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation false
performance.global-cache-invalidation true
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
features.tag-namespaces off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.enable-ino32 no
nfs.mem-factor 15
nfs.export-dirs on
nfs.export-volumes on
nfs.addr-namelookup off
nfs.dynamic-volumes off
nfs.register-with-portmap on
nfs.outstanding-rpc-limit 16
nfs.port 2049
nfs.rpc-auth-unix on
nfs.rpc-auth-null on
nfs.rpc-auth-allow all
nfs.rpc-auth-reject none
nfs.ports-insecure off
nfs.trusted-sync off
nfs.trusted-write off
nfs.volume-access read-write
nfs.export-dir
nfs.disable on
nfs.nlm on
nfs.acl on
nfs.mount-udp off
nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd /sbin/rpc.statd
nfs.server-aux-gids off
nfs.drc off
nfs.drc-size 0x20000
nfs.read-size (1 * 1048576ULL)
nfs.write-size (1 * 1048576ULL)
nfs.readdir-size (1 * 1048576ULL)
nfs.rdirplus on
nfs.event-threads 2
nfs.exports-auth-enable (null)
nfs.auth-refresh-interval-sec (null)
nfs.auth-cache-ttl-sec (null)
features.read-only off
features.worm off
features.worm-file-level off
features.worm-files-deletable on
features.default-retention-period 120
features.retention-mode relax
features.auto-commit-period 180
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid 36
storage.owner-gid 36
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.gfid2path on
storage.gfid2path-separator :
storage.reserve 1
storage.health-check-timeout 20
storage.fips-mode-rchecksum on
storage.force-create-mode 0000
storage.force-directory-mode 0000
storage.create-mask 0777
storage.create-directory-mask 0777
storage.max-hardlinks 100
features.ctime off
config.gfproxyd off
cluster.server-quorum-type off
cluster.server-quorum-ratio 51
changelog.changelog off
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
cluster.write-freq-threshold 0
cluster.read-freq-threshold 0
cluster.tier-pause off
cluster.tier-promote-frequency 120
cluster.tier-demote-frequency 3600
cluster.watermark-hi 90
cluster.watermark-low 75
cluster.tier-mode cache
cluster.tier-max-promote-file-size 0
cluster.tier-max-mb 4000
cluster.tier-max-files 10000
cluster.tier-query-limit 100
cluster.tier-compact on
cluster.tier-hot-compact-frequency 604800
cluster.tier-cold-compact-frequency 604800
features.ctr-enabled off
features.record-counters off
features.ctr-record-metadata-heat off
features.ctr_link_consistency off
features.ctr_lookupheal_link_timeout 300
features.ctr_lookupheal_inode_timeout 300
features.ctr-sql-db-cachesize 12500
features.ctr-sql-db-wal-autocheckpoint 25000
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
features.shard off
features.shard-block-size 64MB
features.shard-lru-limit 16384
features.shard-deletion-rate 100
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
ganesha.enable off
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy gfid-hash
cluster.shd-max-threads 1
cluster.shd-wait-qlength 1024
cluster.locking-scheme full
cluster.granular-entry-heal no
features.locks-revocation-secs 0
features.locks-revocation-clear-all false
features.locks-revocation-max-blocked 0
features.locks-monkey-unlocking false
features.locks-notify-contention yes
features.locks-notify-contention-delay 5
disperse.shd-max-threads 1
disperse.shd-wait-qlength 1024
disperse.cpu-extensions auto
disperse.self-heal-window-size 1
cluster.use-compound-fops off
performance.parallel-readdir off
performance.rda-request-size 131072
performance.rda-low-wmark 4096
performance.rda-high-wmark 128KB
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry false
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60
cluster.brick-multiplex disable
glusterd.vol_count_per_thread 100
cluster.max-bricks-per-process 250
disperse.optimistic-change-log on
disperse.stripe-cache 4
cluster.halo-enabled False
cluster.halo-shd-max-latency 99999
cluster.halo-nfsd-max-latency 5
cluster.halo-max-latency 5
cluster.halo-max-replicas 99999
cluster.halo-min-replicas 2
features.selinux off
cluster.daemon-log-level INFO
debug.delay-gen off
delay-gen.delay-percentage 10%
delay-gen.delay-duration 100000
delay-gen.enable
disperse.parallel-writes on
disperse.quorum-count 0
features.sdfs off
features.cloudsync off
features.ctime off
ctime.noatime on
features.cloudsync-storetype (null)
features.enforce-mandatory-lock off
features.cloudsync-remote-read off
features.cloudsync-store-id (null)
features.cloudsync-product-id (null)
Based on bug 1104618, this issue is mitigated by volume option:
cluster.consistent-metadata on
Currently we have:
# gluster v get GE_storage9_volume01 all | grep cluster.consistent-metadata
cluster.consistent-metadata no
I tried to enable this option:
# gluster v set GE_storage9_volume01 cluster.consistent-metadata on
volume set: success
But this does not fix the issue. Maybe we need to unmount and remount to
see the effect?
Based on https://bugzilla.redhat.com/show_bug.cgi?id=1104618#c16, we need to remount the volume to see the change.

Correction: I queried the wrong volume before (GE_storage9_volume01).
Here are the details of the right volume:
# gluster v info GE_storage7_volume01
Volume Name: GE_storage7_volume01
Type: Replicate
Volume ID: 60a2d971-9833-48b0-86f9-320a65f2f1d0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster01.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage7_volume01
Brick2: gluster02.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage7_volume01
Brick3: gluster03.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage7_volume01
Options Reconfigured:
storage.owner-uid: 36
storage.owner-gid: 36
transport.address-family: inet6
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
# gluster v status GE_storage7_volume01
Status of volume: GE_storage7_volume01
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster01.lab.eng.tlv2.redhat.com:/gl
uster_volumes/GE_storage7_volume01 49354 0 Y 3585
Brick gluster02.lab.eng.tlv2.redhat.com:/gl
uster_volumes/GE_storage7_volume01 49354 0 Y 3646
Brick gluster03.lab.eng.tlv2.redhat.com:/gl
uster_volumes/GE_storage7_volume01 49354 0 Y 6837
Self-heal Daemon on localhost N/A N/A Y 3186417
Self-heal Daemon on gluster02.lab.eng.tlv2.
redhat.com N/A N/A Y 2389346
Self-heal Daemon on gluster03.lab.eng.tlv2.
redhat.com N/A N/A Y 3183458
Task Status of Volume GE_storage7_volume01
------------------------------------------------------------------------------
There are no active volume tasks
# gluster v get GE_storage7_volume01 all
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize on
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.force-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal off
cluster.data-self-heal off
cluster.entry-self-heal off
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock on
disperse.eager-lock on
disperse.other-eager-lock on
disperse.eager-lock-timeout 1
disperse.other-eager-lock-timeout 1
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.full-lock yes
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.stats-dump-format json
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 16
performance.least-prio-threads 1
performance.enable-least-priority on
performance.iot-watchdog-secs (null)
performance.iot-cleanup-disconnected-reqsoff
performance.iot-pass-through false
performance.io-cache-pass-through false
performance.cache-size 128MB
performance.qr-cache-timeout 1
performance.quick-read-cache-invalidationfalse
performance.ctime-invalidation false
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsyncoff
performance.nfs.write-behind-window-size1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.write-behind-trickling-writeson
performance.aggregate-size 128KB
performance.nfs.write-behind-trickling-writeson
performance.lazy-open yes
performance.read-after-open yes
performance.open-behind-pass-through false
performance.read-ahead-page-count 4
performance.read-ahead-pass-through false
performance.readdir-ahead-pass-through false
performance.md-cache-pass-through false
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs true
performance.md-cache-statfs off
performance.xattr-cache-list
performance.nl-cache-pass-through false
features.encryption off
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
client.ssl off
network.remote-dio disable
client.event-threads 2
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive 1
server.allow-insecure on
server.root-squash off
server.all-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
server.ssl off
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 2
server.tcp-user-timeout 42
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 1024
transport.address-family inet6
performance.write-behind on
performance.read-ahead off
performance.readdir-ahead off
performance.io-cache off
performance.open-behind on
performance.quick-read on
performance.nl-cache off
performance.stat-prefetch on
performance.client-io-threads off
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation false
performance.global-cache-invalidation true
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
features.tag-namespaces off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.enable-ino32 no
nfs.mem-factor 15
nfs.export-dirs on
nfs.export-volumes on
nfs.addr-namelookup off
nfs.dynamic-volumes off
nfs.register-with-portmap on
nfs.outstanding-rpc-limit 16
nfs.port 2049
nfs.rpc-auth-unix on
nfs.rpc-auth-null on
nfs.rpc-auth-allow all
nfs.rpc-auth-reject none
nfs.ports-insecure off
nfs.trusted-sync off
nfs.trusted-write off
nfs.volume-access read-write
nfs.export-dir
nfs.disable on
nfs.nlm on
nfs.acl on
nfs.mount-udp off
nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd /sbin/rpc.statd
nfs.server-aux-gids off
nfs.drc off
nfs.drc-size 0x20000
nfs.read-size (1 * 1048576ULL)
nfs.write-size (1 * 1048576ULL)
nfs.readdir-size (1 * 1048576ULL)
nfs.rdirplus on
nfs.event-threads 2
nfs.exports-auth-enable (null)
nfs.auth-refresh-interval-sec (null)
nfs.auth-cache-ttl-sec (null)
features.read-only off
features.worm off
features.worm-file-level off
features.worm-files-deletable on
features.default-retention-period 120
features.retention-mode relax
features.auto-commit-period 180
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid 36
storage.owner-gid 36
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.gfid2path on
storage.gfid2path-separator :
storage.reserve 1
storage.health-check-timeout 20
storage.fips-mode-rchecksum on
storage.force-create-mode 0000
storage.force-directory-mode 0000
storage.create-mask 0777
storage.create-directory-mask 0777
storage.max-hardlinks 100
features.ctime off
config.gfproxyd off
cluster.server-quorum-type off
cluster.server-quorum-ratio 51
changelog.changelog off
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
cluster.write-freq-threshold 0
cluster.read-freq-threshold 0
cluster.tier-pause off
cluster.tier-promote-frequency 120
cluster.tier-demote-frequency 3600
cluster.watermark-hi 90
cluster.watermark-low 75
cluster.tier-mode cache
cluster.tier-max-promote-file-size 0
cluster.tier-max-mb 4000
cluster.tier-max-files 10000
cluster.tier-query-limit 100
cluster.tier-compact on
cluster.tier-hot-compact-frequency 604800
cluster.tier-cold-compact-frequency 604800
features.ctr-enabled off
features.record-counters off
features.ctr-record-metadata-heat off
features.ctr_link_consistency off
features.ctr_lookupheal_link_timeout 300
features.ctr_lookupheal_inode_timeout 300
features.ctr-sql-db-cachesize 12500
features.ctr-sql-db-wal-autocheckpoint 25000
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
features.shard off
features.shard-block-size 64MB
features.shard-lru-limit 16384
features.shard-deletion-rate 100
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
ganesha.enable off
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy gfid-hash
cluster.shd-max-threads 1
cluster.shd-wait-qlength 1024
cluster.locking-scheme full
cluster.granular-entry-heal no
features.locks-revocation-secs 0
features.locks-revocation-clear-all false
features.locks-revocation-max-blocked 0
features.locks-monkey-unlocking false
features.locks-notify-contention yes
features.locks-notify-contention-delay 5
disperse.shd-max-threads 1
disperse.shd-wait-qlength 1024
disperse.cpu-extensions auto
disperse.self-heal-window-size 1
cluster.use-compound-fops off
performance.parallel-readdir off
performance.rda-request-size 131072
performance.rda-low-wmark 4096
performance.rda-high-wmark 128KB
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry false
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60
cluster.brick-multiplex disable
glusterd.vol_count_per_thread 100
cluster.max-bricks-per-process 250
disperse.optimistic-change-log on
disperse.stripe-cache 4
cluster.halo-enabled False
cluster.halo-shd-max-latency 99999
cluster.halo-nfsd-max-latency 5
cluster.halo-max-latency 5
cluster.halo-max-replicas 99999
cluster.halo-min-replicas 2
features.selinux off
cluster.daemon-log-level INFO
debug.delay-gen off
delay-gen.delay-percentage 10%
delay-gen.delay-duration 100000
delay-gen.enable
disperse.parallel-writes on
disperse.quorum-count 0
features.sdfs off
features.cloudsync off
features.ctime off
ctime.noatime on
features.cloudsync-storetype (null)
features.enforce-mandatory-lock off
features.cloudsync-remote-read off
features.cloudsync-store-id (null)
features.cloudsync-product-id (null)
Relevant options related to bug 1104618:
# gluster v get GE_storage7_volume01 cluster.consistent-metadata
Option Value
------ -----
cluster.consistent-metadata no
# gluster v set GE_storage7_volume01 cluster.consistent-metadata on
volume set: success
# gluster v get GE_storage7_volume01 cluster.consistent-metadata
Option Value
------ -----
cluster.consistent-metadata on
After putting all hosts into maintenance and activating the SPM host,
the issue cannot be reproduced:
# for i in $(seq 100); do rm -rf $M/test; echo iteration $i; sh reproduce.sh $M; echo '----'; done
No failure in 100 runs.
So the issue is a wrong volume option.
Avihai:
1. Check that you run the latest gluster version
2. Modify this option in all existing volumes
# gluster v set volume-name cluster.consistent-metadata on
3. Check that when creating a new volume, cluster.consistent-metadata is
enabled. If not, we need a gluster bug to include this option in the
virt group.
4. Someone needs to automatically handle the upgrade of old volumes, detecting
the incompatible volume option "cluster.consistent-metadata no" and
changing it to "cluster.consistent-metadata on" (a rough sketch of such a check follows below).
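A rough sketch of the check described in item 4, shelling out to the gluster CLI (the same "gluster volume get/set" commands used earlier in this bug, plus "gluster volume list", which is assumed to be available). This is not an existing oVirt or vdsm tool, and the output parsing is deliberately simplistic.

# Rough sketch for item 4 above - not an existing tool. Detect volumes still
# configured with "cluster.consistent-metadata no" and switch them to "on".
# Clients must remount the volume (or hosts must be deactivated/activated,
# as done above) for the change to take effect.
import subprocess

OPTION = "cluster.consistent-metadata"

def run(*cmd):
    return subprocess.check_output(cmd, universal_newlines=True)

for volume in run("gluster", "volume", "list").split():
    # "gluster volume get" prints a header, then "<option>  <value>";
    # the last whitespace-separated token is the current value.
    value = run("gluster", "volume", "get", volume, OPTION).split()[-1].lower()
    if value != "on":
        print("{}: {} is '{}', setting it to 'on'".format(volume, OPTION, value))
        run("gluster", "volume", "set", volume, OPTION, "on")
    else:
        print("{}: already on".format(volume))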
Summary - we have 2 issues:

1. A wrong configuration on the gluster volume, causing a very likely failure (~80%) when switching master from a gluster domain.

2. Missing locking in vdsm when reading and writing task data to the master file system. A failure may happen when an SPM task (e.g. create disk, delete disk) runs while switching master. The chance of failure is very low; I could reproduce only one failure when switching master 200 times while moving disks between storage domains at the same time.

I will open another bug for the locking issue.

Created attachment 1831592 [details]
Example script for testing change master

The script uses the change_master.py example from:
https://github.com/oVirt/python-ovirt-engine-sdk4/pull/12

It should run inside the examples directory:

# sh ~/change-master.sh > ~/change-master.log 2>&1

The dc and storage domain names should be modified for the test setup.

Created attachment 1831593 [details]
change-master test log

Log generated using change-master.sh.

Testing vdsm master:
https://gerrit.ovirt.org/c/vdsm/+/116982
with an engine patch enabling switch master with gluster storage.

Moving back from vdsm/storage to ovirt-engine/gluster, see comment 48.

(In reply to Nir Soffer from comment #49)
> 2. Missing locking in vdsm, when reading and writing task data to the
> master file system ...

I filed bug 2012843 for this.

Closing this as not a bug, as this is a configuration issue. From comment 48, switching the master SD works as expected when setting the volume option cluster.consistent-metadata to on.