Bug 1913764

| Summary: | GlusterFS - Failing to assign new Master SD within the same storage type leads to infinite (or long) reconstruction | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Tal Nisan <tnisan> |
| Component: | BLL.Gluster | Assignee: | Ritesh Chikatwar <rchikatw> |
| Status: | CLOSED NOTABUG | QA Contact: | SATHEESARAN <sasundar> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.4.4 | CC: | aefrat, ahadas, aoconnor, bugs, bzlotnik, dfodor, eshames, eshenitz, godas, lsvaty, michal.skrivanek, nsoffer, rchikatw, sasundar, sfishbai, sleviim, sshmulev, tnisan |
| Target Milestone: | --- | Flags: | rchikatw: needinfo-, rchikatw: needinfo-, rchikatw: needinfo-, sbonazzo: ovirt-4.5-, lsvaty: blocker- |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1911597 | Environment: | |
| Last Closed: | 2021-10-18 05:56:06 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Gluster | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Tal Nisan
2021-01-07 14:37:49 UTC
Hi, I don't have an answer to your question, but for some reason I tried again today to put the gluster domain into maintenance and I couldn't, although the service is running on the host. Ritesh, can you please have a look? Shir, was something changed on the host?

Shani and I talked about this issue privately. I didn't change anything on our environment. The solution that came from the Gluster team - installing the gluster packages on our SPM host - did not resolve the issue. Installing the gluster packages only resolved the issue momentarily on our environment; when we checked it again, we saw that the bug reproduced again, so it does not look like an ideal solution. We are still waiting for a response from Shani and the Gluster team: what other solution does the Gluster team offer, instead of installing the packages, which did not resolve the issue described in this bug? I can provide another 4.4 environment for investigation if needed.

Hello Shir, when I checked the host, the gluster service was not running, so I tried installing it and it did work at the time, so I thought that was the solution. Later, when you tested it, it did not work, so it is surely a different issue and I have to dive deeper into it. Can you share the details of the remote server where the gluster service is running? As of now I don't have an answer to your question why it's failing, so I will keep the needinfo on me.

(In reply to Ritesh Chikatwar from comment #6)
> Hello Shir,
>
> When I checked the host, the gluster service was not running, so I tried
> installing it and it did work at the time, so I thought that was the
> solution.
>
> Later, when you tested it, it did not work, so it is surely a different
> issue and I have to dive deeper into it.
>
> Can you share the details of the remote server where the gluster service is
> running?
>
> As of now I don't have an answer to your question why it's failing, so I
> will keep the needinfo on me.

Of course, I gave you the details via Gchat.

Moving this to 4.4.6 because I am still not able to find the RCA for this.

Created attachment 1761947 [details]
tested switching the master domain locally - working as expected
Hello Shir, I created a local setup to test this and it worked as expected. I attached a video of how I tested it. Shir, can you provide me an environment where I can investigate this? If the details you shared earlier still exist, let me know whether I can use them.

Package details:
Software Version: 4.4.5.8_master
glusterfs-8.3-1.el8.x86_64

Shani, as per Nir, the TarCopyFailed error is part of the deactivate storage domain flow. Switching the master domain is not that flow and should not include deactivating the old storage domain. When I checked this, I could see that all the other gluster domains were locked for a while and then came up again. I guess this should not happen. Have you observed this behavior?

Hi Ritesh, this one is very weird, as the switch master operation should be blocked for GlusterFS domains (bug https://bugzilla.redhat.com/1911597):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>[Cannot switch master storage domain. Switch master storage domain is not supported for gluster-based domains.]</detail>
    <reason>Operation Failed</reason>
</fault>

The operation should also be grayed out in the UI for Gluster domains (Storage Domains > select a domain > 3-dots menu on the right > Set as master storage domain).

Can you try to rebuild your env and try again?

Once the switchMaster operation is blocked, in order to force calling 'master migration', we need to put the current master domain into maintenance and verify that the new domain was elected, without any errors in the VDSM log. Can you please give that a try as well?

Shani, sure, will try and update you.

(In reply to shani from comment #11)
> Hi Ritesh,
> This one is very weird, as the switch master operation should be blocked for
> GlusterFS domains (bug https://bugzilla.redhat.com/1911597):
> ...
> The operation should also be grayed out in the UI for Gluster domains
> (Storage Domains > select a domain > 3-dots menu on the right > Set as
> master storage domain).

Yeah, it did block me, so locally I commented out the condition and tried to switch the master domain.

> Can you try to rebuild your env and try again?
>
> Once the switchMaster operation is blocked, in order to force calling
> 'master migration', we need to put the current master domain into
> maintenance and verify that the new domain was elected, without any errors
> in the VDSM log. Can you please give that a try as well?

Let me try this from the UI. I will keep the needinfo on me as I have to verify this from the UI.

Hello Shani, I tried from the UI and it seems to be working. I am attaching a video for reference.

Created attachment 1762250 [details]
Tested from UI
Hi Ritesh, I agree, it looks like it is working on your env. We still need to figure out what the difference is between your env and the QE's. Here are some thoughts:
- different versions of gluster/any related component
- maybe the fact that the QE are using a remote gluster server affects the scenario
- why there is a TarCopy failure in the QE env logs

Once the QE have an env, those can be tested, but maybe some research can be done by looking at the code. What do you think?

Yeah, let me take a look at the code until we get the setup.

(In reply to Ritesh Chikatwar from comment #16)
> Yeah, let me take a look at the code until we get the setup.

Do we have any progress with the bug?

I have not found anything suspicious in the code; maybe an environment will help to find the exact root cause. Can you provide me a fresh setup so I can dig more into it? Shani/Shir, can you please assist?

This bug reproduced on both the UI and the REST API by trying to switch the master SD from one Gluster SD to another Gluster SD.
The operation fails in both cases.
The reconstruction happens via the UI on the second try to reproduce the issue.
[1]
Rest API:
https://<hostname>/ovirt-engine/api/datacenters/<datacenterID>/setmaster
<action>
<storage_domain id="<storage_domain>"/>
</action>
response:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
<detail>[Cannot switch master storage domain. Switch master storage domain is not supported for gluster-based domains.]</detail>
<reason>Operation Failed</reason>
</fault>
Engine log after the REST API request:
2021-04-28 15:25:52,550+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-191) [] Operation Failed: [Cannot switch master storage domain. Switch master storage domain is not supported for gluster-based domains.]
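For completeness, the same request can be scripted; below is a minimal sketch using python-requests, not a script attached to this bug. The engine hostname, credentials, and CA path are placeholders, and the UUIDs are only the example values that appear in the logs in this bug.

# Minimal sketch (not from the bug): sending the same setmaster request with
# python-requests. Hostname, credentials, CA path, and UUIDs are placeholders.
import requests

ENGINE = "https://engine.example.com"            # placeholder hostname
DC_ID = "bde80e57-4ece-44bd-b884-46099c442133"   # data center UUID (example from the logs)
SD_ID = "e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd"   # new master SD UUID (example from the logs)

body = '<action><storage_domain id="{}"/></action>'.format(SD_ID)

resp = requests.post(
    "{}/ovirt-engine/api/datacenters/{}/setmaster".format(ENGINE, DC_ID),
    data=body,
    headers={"Content-Type": "application/xml", "Accept": "application/xml"},
    auth=("admin@internal", "password"),         # placeholder credentials
    verify="ca.pem",                             # engine CA certificate (placeholder path)
)

print(resp.status_code)
# With a gluster-based master SD this returns the "Operation Failed" fault shown above.
print(resp.text)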
[2]
On UI:
1. Have an env with more than one GlusterFS SD (3 Gluster SDs).
2. Assign master from GlusterFS SD A to GlusterFS SD B.
2021-04-28 15:28:57,269+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [50024de0] Failed in 'DeactivateStorageDomainVDS' method
2021-04-28 15:28:57,283+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [50024de0] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command DeactivateStorageDomainVDS failed: Error in storage domain action: ('sdUUID=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, spUUID=bde80e57-4ece-44bd-b884-46099c442133, msdUUID=ad26e577-0099-43f2-b43a-5e3c3eeee4d1, masterVersion=5',)
2021-04-28 15:28:57,332+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [50024de0] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: Failed to DeactivateStorageDomainVDS, error = Error in storage domain action: ('sdUUID=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, spUUID=bde80e57-4ece-44bd-b884-46099c442133, msdUUID=ad26e577-0099-43f2-b43a-5e3c3eeee4d1, masterVersion=5',), code = 350
2021-04-28 15:29:03,279+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command ConnectStoragePoolVDS failed: Wrong Master domain or its version: 'SD=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, pool=bde80e57-4ece-44bd-b884-46099c442133'
2021-04-28 15:29:03,344+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] Command 'ConnectStoragePoolVDSCommand(HostName = host_mixed_2, ConnectStoragePoolVDSCommandParameters:{hostId='bdb5f6cd-f0fb-4352-bd0d-25856b633a91', vdsId='bdb5f6cd-f0fb-4352-bd0d-25856b633a91', storagePoolId='bde80e57-4ece-44bd-b884-46099c442133', masterVersion='5'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, pool=bde80e57-4ece-44bd-b884-46099c442133'
2021-04-28 15:29:03,344+03 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] IrsBroker::Failed::DeactivateStorageDomainVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, pool=bde80e57-4ece-44bd-b884-46099c442133'
2021-04-28 15:29:03,375+03 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] DeactivateStorageDomainVDS failed 'e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd': org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: 'SD=e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd, pool=bde80e57-4ece-44bd-b884-46099c442133' (Failed with error StoragePoolWrongMaster and code 324)
2021-04-28 15:29:03,379+03 ERROR [org.ovirt.engine.core.bll.storage.domain.DeactivateStorageDomainCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] Failed to deactivate storage domain 'e3c9d7a4-a53f-4fb8-a9a2-2c74c693e6bd'
2021-04-28 15:29:03,438+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-65) [] Command 'HSMGetAllTasksStatusesVDSCommand(HostName = host_mixed_2, VdsIdVDSCommandParametersBase:{hostId='bdb5f6cd-f0fb-4352-bd0d-25856b633a91'})' execution failed: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM: ()
2021-04-28 15:29:04,255+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-86) [7644af65] EVENT_ID: USER_DEACTIVATE_STORAGE_DOMAIN_FAILED(969), Failed to deactivate Storage Domain test_gluster_2 (Data Center golden_env_mixed).
versions:
ovirt-engine-4.4.6.5-447.gd80dda7.9.el8ev.noarch
vdsm-4.40.60.4-1.el8ev.x86_64
glusterfs-6.0-49.1.el8.x86_64
Created attachment 1776597 [details]
Logs
*** Bug 1956263 has been marked as a duplicate of this bug. ***

(In reply to Ritesh Chikatwar from comment #26)
> Regarding TarCopyFailed exception.
>
> this is coming from this line
> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/fileUtils.py#L59
> it's returning return code as 1, this is because tar is running on dynamic
> directory because of this it's throwing that error.
> i tried running that command on one of my setup to see what i get
>
> [root@rhsqa-grafton1-nic2 ~]# /usr/bin/tar cf -f --exclude=./lost+found -C
> /rhev/data-center/mnt/glusterSD/<hostname>:_vmstore .
> /usr/bin/tar:
> ./e5a79957-bb2b-414f-bf6f-cbb2a4b98d0b/images/0b63e72a-ec4a-43e1-ba75-
> a480c58b54c9: file changed as we read it

The issue is that some file in the master mount changed during the copy. This is a real issue in vdsm. It must use a lock to prevent tasks from writing task state while switching the master storage domain from one domain to the other.

Normally every task writes to its own directory, so multiple tasks can write together without any locking. But when we copy the entire directory, we need to lock the entire master mount while we copy.

Moving the bug to vdsm; this is not related to engine or gluster.

Evelina, do we know how to reproduce this bug? If we have a good way to reproduce it, it will help to do early testing before we merge the fix. The fix is also risky - adding locking in very dark corners of vdsm, which may cause deadlocks. I would like to make sure all QE tiers work with the fix.

Reproduced with an NFS storage domain, with a patch improving error reporting in tarCopy:
2021-09-14 00:33:44,369+0300 ERROR (tasks/1) [storage.taskmanager.task] (Task='928b287c-fdd0-487f-bec5-f98d63b63ba2') Unexpected error (task:877)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 884, in _run
return fn(*args, **kargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
return self.cmd(*self.argslist, **self.argsdict)
File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
return method(self, *args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 989, in switchMaster
self.masterMigrate(oldMasterUUID, newMasterUUID, masterVersion)
File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
return method(self, *args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 914, in masterMigrate
exclude=('./lost+found',))
File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 101, in tarCopy
raise se.TarCommandError(errors)
vdsm.storage.exception.TarCommandError: Tar command failed: ({'reader': {'cmd': ['/usr/bin/tar', 'cf', '-', '--exclude=./lost+found', '-C', '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master', '.'], 'rc': 1, 'err': '/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76: file changed as we read it\n/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp: File removed before we read it\n/usr/bin/tar: ./tasks: file changed as we read it\n'}},)
The issues are:
- task directory changed during copy
- temporary task directory removed while reading
Created attachment 1822863 [details]
vdsm.log for reproduction on NFS storage domain
Log during tarCopy failure:
Starting tarCopy:
2021-09-14 00:33:44,305+0300 INFO (tasks/1) [storage.fileutils] Copying '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master' to '/rhev/data-center/mnt/sparse:_export_01/0c065db4-feca-4179-801b-b51be9499943/master' (fileUtils:55)
Starting reader:
2021-09-14 00:33:44,305+0300 DEBUG (tasks/1) [storage.fileutils] Starting reader: ['/usr/bin/tar', 'cf', '-', '--exclude=./lost+found', '-C', '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master', '.'] (fileUtils:62)
While copying the master filesystem, tasks are modifying the tasks/ directory:
Task 1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 creates temporary directory:
2021-09-14 00:33:44,326+0300 DEBUG (jsonrpc/1) [storage.taskmanager.task] (Task='1b1ef2f3-6e2b-4dde-9e07-f720569ccd76') _save: orig /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 temp /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp (task:781)
Task 1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 modifies temporary directory:
2021-09-14 00:33:44,335+0300 DEBUG (jsonrpc/1) [storage.taskmanager.task] (Task='1b1ef2f3-6e2b-4dde-9e07-f720569ccd76') _save: orig /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 temp /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp (task:781)
Task 1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 modifies temporary directory:
2021-09-14 00:33:44,347+0300 DEBUG (tasks/5) [storage.taskmanager.task] (Task='1b1ef2f3-6e2b-4dde-9e07-f720569ccd76') _save: orig /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76 temp /rhev/data-center/3f8e08e6-5485-429e-a686-63754906b27b/mastersd/master/tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp (task:781)
tarCopy ends:
2021-09-14 00:33:44,365+0300 DEBUG (tasks/1) [storage.fileutils] Child processes terminated (fileUtils:79)
2021-09-14 00:33:44,365+0300 ERROR (tasks/1) [storage.storagepool] Migration to new master 0c065db4-feca-4179-801b-b51be9499943 failed (sp:926)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 914, in masterMigrate
exclude=('./lost+found',))
File "/usr/lib/python3.6/site-packages/vdsm/storage/fileUtils.py", line 101, in tarCopy
raise se.TarCommandError(errors)
vdsm.storage.exception.TarCommandError: Tar command failed: ({'reader': {'cmd': ['/usr/bin/tar', 'cf', '-', '--exclude=./lost+found', '-C', '/rhev/data-center/mnt/sparse:_export_00/60edfd87-a97c-497c-85a2-2f044993bc2e/master', '.'], 'rc': 1, 'err': '/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76: file changed as we read it\n/usr/bin/tar: ./tasks/1b1ef2f3-6e2b-4dde-9e07-f720569ccd76.temp: File removed before we read it\n/usr/bin/tar: ./tasks: file changed as we read it\n'}},)
tarCopy failed because the task directory was modified while reading it, and the task
temporary directory was deleted (part of task saving flow).
So we need to make sure tasks cannot modify storage (write data, create/remove directories/files)
during tarCopy.
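To make the needed locking concrete, here is a minimal illustrative sketch of that rule: task-state writes and the master filesystem copy must never overlap. This is not the actual vdsm implementation; the class and method names are invented for illustration only.

# Illustrative sketch only - not vdsm code. It shows the rule derived above:
# task state writes under .../master/tasks/ and the tarCopy of the master
# filesystem must never run at the same time.
import threading
from contextlib import contextmanager

class MasterFsGuard:
    """Hypothetical guard serializing writers of the master filesystem."""

    def __init__(self):
        self._lock = threading.Lock()

    @contextmanager
    def writing_task_state(self):
        # Taken by a task while it persists state (create <id>.temp, rename over <id>).
        with self._lock:
            yield

    @contextmanager
    def copying_master(self):
        # Taken for the whole duration of tarCopy() during masterMigrate(), so
        # no task directory can change or disappear while tar reads the tree.
        with self._lock:
            yield

# Usage sketch:
guard = MasterFsGuard()

def save_task_state(persist):
    with guard.writing_task_state():
        persist()                # the task's own state-saving callback

def migrate_master(tar_copy, src, dst):
    with guard.copying_master():
        tar_copy(src, dst, exclude=("./lost+found",))

A plain mutex like this would also serialize unrelated tasks against each other, which is not needed; a read-write lock (tasks take it shared, the copy takes it exclusive) would keep the current no-locking behavior between tasks while still excluding them during the copy.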
Created attachment 1822866 [details]
reproduction log
Log showing how to reproduce this issue using the new change_master.py SDK example script.

To reproduce the issue, you can use the change_master.py example script:
https://github.com/oVirt/python-ovirt-engine-sdk4/pull/12

1. Create 10 1g empty preallocated disks.

2. Start a switching-master loop:

$ for i in $(seq 100); do
      ./change_master.py -c engine-dev dc1 sd-1
      ./change_master.py -c engine-dev dc1 sd-2
  done

3. Move all disks between 2 other storage domains (e.g. sd-3, sd-4). Using the same storage domains used as master will not reproduce the issue, since a disk copy takes a lock on the storage domain, and this blocks the switch master operation during the copy. When starting a move, the system creates 10 SPM tasks. When ending the move, the system creates 10 SPM tasks to delete the original disks. When all disks have moved, move them back to the original domain. Repeat while switch master is running. This step should be automated (a rough sketch is included after the commit note below).

Expected results:
Without the fix, switch master may fail after many iterations (see comment 35).
With the fix, switch master should complete successfully.

In the failing test (attachment 1822866 [details]), it took 111 change_master.py runs to fail.

Notes:

It is expected to see these messages:

    Setting storage domain sparse-nfs-00 as master
    Operation failed: [Cannot switch master storage domain. The storage pool has running tasks.]
    Trying again in 10 seconds...

This means that SPM tasks are running, and change_master will retry the operation.

It may be easier to reproduce with Gluster storage, but for this we must have an engine version that does not block the switch master operation for Gluster.

Ritesh, if you have a Gluster environment available and you know how to
reproduce this quickly, it would help to reproduce again with current
vdsm master, including this commit:
commit 3cad5b9237ce2030861132173d4fc6bb9782fc08
fileUtils: Fix error handling and logging in tarCopy
This commit improves the error logs when tarCopy fails, so we will have a
better understanding of why it fails in Gluster. You will get a detailed
exception like the one in comment 35 (tested on NFS).
You should be able to reproduce this by putting the master storage domain into
maintenance, as described in section 2 of comment 0.
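The disk-move part of step 3 above can be scripted with the Python SDK. The sketch below is not a script attached to this bug: the connection details and the disk-name prefix are placeholders, sd-3/sd-4 are just the example names from the steps, and it relies on the SDK's DiskService.move() call, which should be verified against the SDK version in use.

# Rough sketch (not an attached script): keep moving a set of test disks back
# and forth between sd-3 and sd-4 while the change_master.py loop runs in
# another shell. Connection details and the disk-name prefix are placeholders.
import time
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",
    username="admin@internal",
    password="password",
    ca_file="ca.pem",
)

disks_service = connection.system_service().disks_service()
test_disks = [d for d in disks_service.list()
              if d.name and d.name.startswith("switch-test-")]

def move_all(target_sd_name):
    # Start the moves (each one creates SPM copy tasks, then delete tasks).
    for disk in test_disks:
        disks_service.disk_service(disk.id).move(
            storage_domain=types.StorageDomain(name=target_sd_name))
    # Wait until every disk is unlocked before moving in the other direction.
    for disk in test_disks:
        while disks_service.disk_service(disk.id).get().status != types.DiskStatus.OK:
            time.sleep(5)

try:
    for _ in range(100):
        move_all("sd-4")
        move_all("sd-3")
finally:
    connection.close()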
Ritesh, we can also test the fix on Gluster with these repos:
https://jenkins.ovirt.org/job/ovirt-engine_standard-check-patch/14264/artifact/build-artifacts.el8stream.x86_64/
https://jenkins.ovirt.org/job/vdsm_standard-check-patch/29778/artifact/build-artifacts.build-py3.el8stream.x86_64/

@nsoffer, as of now I don't have a Gluster environment available; I have to create one and test this. I will update the status with this change.

Hey Nir, I tried reproducing this issue on my local setup but I couldn't; it looks like your change is working. If you are interested in doing some of the testing, I can provide the env - please ping me over chat.

(In reply to Ritesh Chikatwar from comment #43)
> I tried reproducing this issue on my local setup but I couldn't; it looks
> like your change is working.

Did you reproduce first with current master? Otherwise we cannot tell if the change fixed the issue, or you just did not reproduce the issue for another reason. You can downgrade vdsm to the latest released version and run the flow that reproduced this issue in the past. The change_master.py example should make it easy to reproduce, see comment 39.

We reproduced the issue with gluster with the vdsm locking fix:
https://gerrit.ovirt.org/c/vdsm/+/116413
and with an additional fix syncing the master fs before copying it:
https://gerrit.ovirt.org/c/vdsm/+/117047

Both fixes do not help; copying the master filesystem with tar always fails on gluster with:

/usr/bin/tar: .: file changed as we read it

This looks exactly like bug 1104618, but that bug seems to be fixed in glusterfs-3.6.0.44-1, while we use:

# rpm -q glusterfs
glusterfs-9.4-1.el8s.x86_64

So this looks like a gluster regression.

I could reproduce this issue without RHV:

# cat reproduce.sh
set -e

mp=${1:?Usage: sh reproduce.sh MOUNTPOINT}
task=`uuidgen`

mkdir $mp/test
mkdir $mp/test/tasks
mkdir $mp/test/tasks/$task
echo "task data" > $mp/test/tasks/$task/$task.job.0

sync --file-system $mp/test
tar cf test.tar -C $mp/test .
Reproducing example:

# mount | grep gluster
gluster01.lab.eng.tlv2.redhat.com:/GE_storage7_volume01 on /rhev/data-center/mnt/glusterSD/gluster01.lab.eng.tlv2.redhat.com:_GE__storage7__volume01 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

# echo "testing tar copy" >> /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com\:_GE__storage7__volume01.log

# for i in $(seq 10); do rm -rf $M/test; echo iteration $i; sh reproduce.sh $M; echo '----'; done
iteration 1
----
iteration 2
tar: ./tasks/fd4c4d2c-5d55-41a8-b499-ade2172f7442: file changed as we read it
tar: ./tasks: file changed as we read it
----
iteration 3
tar: ./tasks/b23d7fc2-5e7c-408e-9c65-e0cc27dc33d0: file changed as we read it
tar: ./tasks: file changed as we read it
----
iteration 4
tar: ./tasks: file changed as we read it
----
iteration 5
----
iteration 6
tar: ./tasks: file changed as we read it
tar: .: file changed as we read it
----
iteration 7
tar: ./tasks/dd042dbd-8569-4257-b2c0-7a7b613ecd6b: file changed as we read it
tar: ./tasks: file changed as we read it
tar: .: file changed as we read it
----
iteration 8
tar: ./tasks: file changed as we read it
tar: .: file changed as we read it
----
iteration 9
tar: ./tasks/40790646-c689-42f4-8ae6-6ca66c4556f2: file changed as we read it
tar: ./tasks: file changed as we read it
----
iteration 10
tar: ./tasks: file changed as we read it
----

There are no errors in the glusterfs logs:

# tail -1 /var/log/glusterfs/rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com\:_GE__storage7__volume01.log
testing tar copy

We should move this bug to gluster, or create a new gluster bug, but I don't have the info needed to open the bug (package versions, volume info, etc.). Gobinda, who can help with this?

More info I could get from one of the gluster servers:
# rpm -qa | grep gluster
glusterfs-libs-6.0-37.el8rhgs.x86_64
glusterfs-6.0-37.el8rhgs.x86_64
glusterfs-server-6.0-37.el8rhgs.x86_64
glusterfs-client-xlators-6.0-37.el8rhgs.x86_64
glusterfs-fuse-6.0-37.el8rhgs.x86_64
glusterfs-api-6.0-37.el8rhgs.x86_64
glusterfs-cli-6.0-37.el8rhgs.x86_64
python3-gluster-6.0-37.el8rhgs.x86_64
glusterfs-geo-replication-6.0-37.el8rhgs.x86_64
# gluster v info GE_storage9_volume01
Volume Name: GE_storage9_volume01
Type: Replicate
Volume ID: 39feddf0-a521-4977-85bd-d5d2d5d19c65
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster01.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage9_volume01
Brick2: gluster02.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage9_volume01
Brick3: gluster03.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage9_volume01
Options Reconfigured:
storage.owner-uid: 36
storage.owner-gid: 36
transport.address-family: inet6
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
# gluster v status GE_storage9_volume01
Status of volume: GE_storage9_volume01
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster01.lab.eng.tlv2.redhat.com:/gl
uster_volumes/GE_storage9_volume01 49368 0 Y 3739
Brick gluster02.lab.eng.tlv2.redhat.com:/gl
uster_volumes/GE_storage9_volume01 49368 0 Y 3800
Brick gluster03.lab.eng.tlv2.redhat.com:/gl
uster_volumes/GE_storage9_volume01 49368 0 Y 6991
Self-heal Daemon on localhost N/A N/A Y 3186417
Self-heal Daemon on gluster03.lab.eng.tlv2.
redhat.com N/A N/A Y 3183458
Self-heal Daemon on gluster02.lab.eng.tlv2.
redhat.com N/A N/A Y 2389346
Task Status of Volume GE_storage9_volume01
------------------------------------------------------------------------------
There are no active volume tasks
# gluster v get GE_storage9_volume01 all
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize on
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.force-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal off
cluster.data-self-heal off
cluster.entry-self-heal off
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock on
disperse.eager-lock on
disperse.other-eager-lock on
disperse.eager-lock-timeout 1
disperse.other-eager-lock-timeout 1
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.full-lock yes
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.stats-dump-format json
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 16
performance.least-prio-threads 1
performance.enable-least-priority on
performance.iot-watchdog-secs (null)
performance.iot-cleanup-disconnected-reqsoff
performance.iot-pass-through false
performance.io-cache-pass-through false
performance.cache-size 128MB
performance.qr-cache-timeout 1
performance.quick-read-cache-invalidationfalse
performance.ctime-invalidation false
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsyncoff
performance.nfs.write-behind-window-size1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.write-behind-trickling-writeson
performance.aggregate-size 128KB
performance.nfs.write-behind-trickling-writeson
performance.lazy-open yes
performance.read-after-open yes
performance.open-behind-pass-through false
performance.read-ahead-page-count 4
performance.read-ahead-pass-through false
performance.readdir-ahead-pass-through false
performance.md-cache-pass-through false
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs true
performance.md-cache-statfs off
performance.xattr-cache-list
performance.nl-cache-pass-through false
features.encryption off
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
client.ssl off
network.remote-dio disable
client.event-threads 2
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive 1
server.allow-insecure on
server.root-squash off
server.all-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
server.ssl off
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 2
server.tcp-user-timeout 42
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 1024
transport.address-family inet6
performance.write-behind on
performance.read-ahead off
performance.readdir-ahead off
performance.io-cache off
performance.open-behind on
performance.quick-read on
performance.nl-cache off
performance.stat-prefetch on
performance.client-io-threads off
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation false
performance.global-cache-invalidation true
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
features.tag-namespaces off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.enable-ino32 no
nfs.mem-factor 15
nfs.export-dirs on
nfs.export-volumes on
nfs.addr-namelookup off
nfs.dynamic-volumes off
nfs.register-with-portmap on
nfs.outstanding-rpc-limit 16
nfs.port 2049
nfs.rpc-auth-unix on
nfs.rpc-auth-null on
nfs.rpc-auth-allow all
nfs.rpc-auth-reject none
nfs.ports-insecure off
nfs.trusted-sync off
nfs.trusted-write off
nfs.volume-access read-write
nfs.export-dir
nfs.disable on
nfs.nlm on
nfs.acl on
nfs.mount-udp off
nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd /sbin/rpc.statd
nfs.server-aux-gids off
nfs.drc off
nfs.drc-size 0x20000
nfs.read-size (1 * 1048576ULL)
nfs.write-size (1 * 1048576ULL)
nfs.readdir-size (1 * 1048576ULL)
nfs.rdirplus on
nfs.event-threads 2
nfs.exports-auth-enable (null)
nfs.auth-refresh-interval-sec (null)
nfs.auth-cache-ttl-sec (null)
features.read-only off
features.worm off
features.worm-file-level off
features.worm-files-deletable on
features.default-retention-period 120
features.retention-mode relax
features.auto-commit-period 180
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid 36
storage.owner-gid 36
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.gfid2path on
storage.gfid2path-separator :
storage.reserve 1
storage.health-check-timeout 20
storage.fips-mode-rchecksum on
storage.force-create-mode 0000
storage.force-directory-mode 0000
storage.create-mask 0777
storage.create-directory-mask 0777
storage.max-hardlinks 100
features.ctime off
config.gfproxyd off
cluster.server-quorum-type off
cluster.server-quorum-ratio 51
changelog.changelog off
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
cluster.write-freq-threshold 0
cluster.read-freq-threshold 0
cluster.tier-pause off
cluster.tier-promote-frequency 120
cluster.tier-demote-frequency 3600
cluster.watermark-hi 90
cluster.watermark-low 75
cluster.tier-mode cache
cluster.tier-max-promote-file-size 0
cluster.tier-max-mb 4000
cluster.tier-max-files 10000
cluster.tier-query-limit 100
cluster.tier-compact on
cluster.tier-hot-compact-frequency 604800
cluster.tier-cold-compact-frequency 604800
features.ctr-enabled off
features.record-counters off
features.ctr-record-metadata-heat off
features.ctr_link_consistency off
features.ctr_lookupheal_link_timeout 300
features.ctr_lookupheal_inode_timeout 300
features.ctr-sql-db-cachesize 12500
features.ctr-sql-db-wal-autocheckpoint 25000
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
features.shard off
features.shard-block-size 64MB
features.shard-lru-limit 16384
features.shard-deletion-rate 100
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
ganesha.enable off
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy gfid-hash
cluster.shd-max-threads 1
cluster.shd-wait-qlength 1024
cluster.locking-scheme full
cluster.granular-entry-heal no
features.locks-revocation-secs 0
features.locks-revocation-clear-all false
features.locks-revocation-max-blocked 0
features.locks-monkey-unlocking false
features.locks-notify-contention yes
features.locks-notify-contention-delay 5
disperse.shd-max-threads 1
disperse.shd-wait-qlength 1024
disperse.cpu-extensions auto
disperse.self-heal-window-size 1
cluster.use-compound-fops off
performance.parallel-readdir off
performance.rda-request-size 131072
performance.rda-low-wmark 4096
performance.rda-high-wmark 128KB
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry false
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60
cluster.brick-multiplex disable
glusterd.vol_count_per_thread 100
cluster.max-bricks-per-process 250
disperse.optimistic-change-log on
disperse.stripe-cache 4
cluster.halo-enabled False
cluster.halo-shd-max-latency 99999
cluster.halo-nfsd-max-latency 5
cluster.halo-max-latency 5
cluster.halo-max-replicas 99999
cluster.halo-min-replicas 2
features.selinux off
cluster.daemon-log-level INFO
debug.delay-gen off
delay-gen.delay-percentage 10%
delay-gen.delay-duration 100000
delay-gen.enable
disperse.parallel-writes on
disperse.quorum-count 0
features.sdfs off
features.cloudsync off
features.ctime off
ctime.noatime on
features.cloudsync-storetype (null)
features.enforce-mandatory-lock off
features.cloudsync-remote-read off
features.cloudsync-store-id (null)
features.cloudsync-product-id (null)
Based on bug 1104618, this issue is mitigated by volume option:
cluster.consistent-metadata on
Currently we have:
# gluster v get GE_storage9_volume01 all | grep cluster.consistent-metadata
cluster.consistent-metadata no
I tried to enable this option:
# gluster v set GE_storage9_volume01 cluster.consistent-metadata on
volume set: success
But this does not fix the issue. Maybe we need to unmount and remount to
see the effect?
Based on https://bugzilla.redhat.com/show_bug.cgi?id=1104618#c16, we need to remount the volume to see the change.

Correction: I queried the wrong volume before (GE_storage9_volume01).
Here are the details of the right volume:
# gluster v info GE_storage7_volume01
Volume Name: GE_storage7_volume01
Type: Replicate
Volume ID: 60a2d971-9833-48b0-86f9-320a65f2f1d0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster01.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage7_volume01
Brick2: gluster02.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage7_volume01
Brick3: gluster03.lab.eng.tlv2.redhat.com:/gluster_volumes/GE_storage7_volume01
Options Reconfigured:
storage.owner-uid: 36
storage.owner-gid: 36
transport.address-family: inet6
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
# gluster v status GE_storage7_volume01
Status of volume: GE_storage7_volume01
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster01.lab.eng.tlv2.redhat.com:/gl
uster_volumes/GE_storage7_volume01 49354 0 Y 3585
Brick gluster02.lab.eng.tlv2.redhat.com:/gl
uster_volumes/GE_storage7_volume01 49354 0 Y 3646
Brick gluster03.lab.eng.tlv2.redhat.com:/gl
uster_volumes/GE_storage7_volume01 49354 0 Y 6837
Self-heal Daemon on localhost N/A N/A Y 3186417
Self-heal Daemon on gluster02.lab.eng.tlv2.
redhat.com N/A N/A Y 2389346
Self-heal Daemon on gluster03.lab.eng.tlv2.
redhat.com N/A N/A Y 3183458
Task Status of Volume GE_storage7_volume01
------------------------------------------------------------------------------
There are no active volume tasks
# gluster v get GE_storage7_volume01 all
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize on
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.force-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal off
cluster.data-self-heal off
cluster.entry-self-heal off
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock on
disperse.eager-lock on
disperse.other-eager-lock on
disperse.eager-lock-timeout 1
disperse.other-eager-lock-timeout 1
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.full-lock yes
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.stats-dump-format json
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 16
performance.least-prio-threads 1
performance.enable-least-priority on
performance.iot-watchdog-secs (null)
performance.iot-cleanup-disconnected-reqsoff
performance.iot-pass-through false
performance.io-cache-pass-through false
performance.cache-size 128MB
performance.qr-cache-timeout 1
performance.quick-read-cache-invalidationfalse
performance.ctime-invalidation false
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsyncoff
performance.nfs.write-behind-window-size1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.write-behind-trickling-writeson
performance.aggregate-size 128KB
performance.nfs.write-behind-trickling-writeson
performance.lazy-open yes
performance.read-after-open yes
performance.open-behind-pass-through false
performance.read-ahead-page-count 4
performance.read-ahead-pass-through false
performance.readdir-ahead-pass-through false
performance.md-cache-pass-through false
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs true
performance.md-cache-statfs off
performance.xattr-cache-list
performance.nl-cache-pass-through false
features.encryption off
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
client.ssl off
network.remote-dio disable
client.event-threads 2
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive 1
server.allow-insecure on
server.root-squash off
server.all-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
server.ssl off
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 2
server.tcp-user-timeout 42
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 1024
transport.address-family inet6
performance.write-behind on
performance.read-ahead off
performance.readdir-ahead off
performance.io-cache off
performance.open-behind on
performance.quick-read on
performance.nl-cache off
performance.stat-prefetch on
performance.client-io-threads off
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation false
performance.global-cache-invalidation true
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
features.tag-namespaces off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.enable-ino32 no
nfs.mem-factor 15
nfs.export-dirs on
nfs.export-volumes on
nfs.addr-namelookup off
nfs.dynamic-volumes off
nfs.register-with-portmap on
nfs.outstanding-rpc-limit 16
nfs.port 2049
nfs.rpc-auth-unix on
nfs.rpc-auth-null on
nfs.rpc-auth-allow all
nfs.rpc-auth-reject none
nfs.ports-insecure off
nfs.trusted-sync off
nfs.trusted-write off
nfs.volume-access read-write
nfs.export-dir
nfs.disable on
nfs.nlm on
nfs.acl on
nfs.mount-udp off
nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd /sbin/rpc.statd
nfs.server-aux-gids off
nfs.drc off
nfs.drc-size 0x20000
nfs.read-size (1 * 1048576ULL)
nfs.write-size (1 * 1048576ULL)
nfs.readdir-size (1 * 1048576ULL)
nfs.rdirplus on
nfs.event-threads 2
nfs.exports-auth-enable (null)
nfs.auth-refresh-interval-sec (null)
nfs.auth-cache-ttl-sec (null)
features.read-only off
features.worm off
features.worm-file-level off
features.worm-files-deletable on
features.default-retention-period 120
features.retention-mode relax
features.auto-commit-period 180
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid 36
storage.owner-gid 36
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.gfid2path on
storage.gfid2path-separator :
storage.reserve 1
storage.health-check-timeout 20
storage.fips-mode-rchecksum on
storage.force-create-mode 0000
storage.force-directory-mode 0000
storage.create-mask 0777
storage.create-directory-mask 0777
storage.max-hardlinks 100
features.ctime off
config.gfproxyd off
cluster.server-quorum-type off
cluster.server-quorum-ratio 51
changelog.changelog off
changelog.changelog-dir {{ brick.path }}/.glusterfs/changelogs
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
cluster.write-freq-threshold 0
cluster.read-freq-threshold 0
cluster.tier-pause off
cluster.tier-promote-frequency 120
cluster.tier-demote-frequency 3600
cluster.watermark-hi 90
cluster.watermark-low 75
cluster.tier-mode cache
cluster.tier-max-promote-file-size 0
cluster.tier-max-mb 4000
cluster.tier-max-files 10000
cluster.tier-query-limit 100
cluster.tier-compact on
cluster.tier-hot-compact-frequency 604800
cluster.tier-cold-compact-frequency 604800
features.ctr-enabled off
features.record-counters off
features.ctr-record-metadata-heat off
features.ctr_link_consistency off
features.ctr_lookupheal_link_timeout 300
features.ctr_lookupheal_inode_timeout 300
features.ctr-sql-db-cachesize 12500
features.ctr-sql-db-wal-autocheckpoint 25000
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
features.shard off
features.shard-block-size 64MB
features.shard-lru-limit 16384
features.shard-deletion-rate 100
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
ganesha.enable off
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy gfid-hash
cluster.shd-max-threads 1
cluster.shd-wait-qlength 1024
cluster.locking-scheme full
cluster.granular-entry-heal no
features.locks-revocation-secs 0
features.locks-revocation-clear-all false
features.locks-revocation-max-blocked 0
features.locks-monkey-unlocking false
features.locks-notify-contention yes
features.locks-notify-contention-delay 5
disperse.shd-max-threads 1
disperse.shd-wait-qlength 1024
disperse.cpu-extensions auto
disperse.self-heal-window-size 1
cluster.use-compound-fops off
performance.parallel-readdir off
performance.rda-request-size 131072
performance.rda-low-wmark 4096
performance.rda-high-wmark 128KB
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry false
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60
cluster.brick-multiplex disable
glusterd.vol_count_per_thread 100
cluster.max-bricks-per-process 250
disperse.optimistic-change-log on
disperse.stripe-cache 4
cluster.halo-enabled False
cluster.halo-shd-max-latency 99999
cluster.halo-nfsd-max-latency 5
cluster.halo-max-latency 5
cluster.halo-max-replicas 99999
cluster.halo-min-replicas 2
features.selinux off
cluster.daemon-log-level INFO
debug.delay-gen off
delay-gen.delay-percentage 10%
delay-gen.delay-duration 100000
delay-gen.enable
disperse.parallel-writes on
disperse.quorum-count 0
features.sdfs off
features.cloudsync off
features.ctime off
ctime.noatime on
features.cloudsync-storetype (null)
features.enforce-mandatory-lock off
features.cloudsync-remote-read off
features.cloudsync-store-id (null)
features.cloudsync-product-id (null)
Relevant options related to bug 1104618:
# gluster v get GE_storage7_volume01 cluster.consistent-metadata
Option Value
------ -----
cluster.consistent-metadata no
# gluster v set GE_storage7_volume01 cluster.consistent-metadata on
volume set: success
# gluster v get GE_storage7_volume01 cluster.consistent-metadata
Option Value
------ -----
cluster.consistent-metadata on
After putting all hosts into maintenance and activating the SPM host,
the issue cannot be reproduced:
# for i in $(seq 100); do rm -rf $M/test; echo iteration $i; sh reproduce.sh $M; echo '----'; done
No failure in 100 runs.
So the issue is a wrong volume option.
Avihai:
1. Check that you run the latest gluster version
2. Modify this option in all existing volumes
# gluster v set volume-name cluster.consistent-metadata on
3. Check that when creating a new volume, cluster.consistent-metadata is
enabled. If not, we need a gluster bug to include this option in the
virt group.
4. Someone needs to automatically handle the upgrade of old volumes, detecting
the incompatible volume option "cluster.consistent-metadata no" and
changing it to "cluster.consistent-metadata on" (a rough sketch of such a check follows below).
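A rough sketch of the check described in item 4, shelling out to the gluster CLI (the same "gluster volume get/set" commands used earlier in this bug, plus "gluster volume list", which is assumed to be available). This is not an existing oVirt or vdsm tool, and the output parsing is deliberately simplistic.

# Rough sketch for item 4 above - not an existing tool. Detect volumes still
# configured with "cluster.consistent-metadata no" and switch them to "on".
# Clients must remount the volume (or hosts must be deactivated/activated,
# as done above) for the change to take effect.
import subprocess

OPTION = "cluster.consistent-metadata"

def run(*cmd):
    return subprocess.check_output(cmd, universal_newlines=True)

for volume in run("gluster", "volume", "list").split():
    # "gluster volume get" prints a header, then "<option>  <value>";
    # the last whitespace-separated token is the current value.
    value = run("gluster", "volume", "get", volume, OPTION).split()[-1].lower()
    if value != "on":
        print("{}: {} is '{}', setting it to 'on'".format(volume, OPTION, value))
        run("gluster", "volume", "set", volume, OPTION, "on")
    else:
        print("{}: already on".format(volume))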
Summary - we have 2 issues:

1. A wrong configuration on the gluster volume, causing a very likely failure (~80%) when switching master from a gluster domain.

2. Missing locking in vdsm when reading and writing task data to the master file system. A failure may happen when an SPM task (e.g. create disk, delete disk) runs while switching master. The chance of failure is very low; I could reproduce only one failure when switching master 200 times while moving disks between storage domains at the same time.

I will open another bug for the locking issue.

Created attachment 1831592 [details]
Example script for testing change master

The script uses the change_master.py example from:
https://github.com/oVirt/python-ovirt-engine-sdk4/pull/12

It should run inside the examples directory:

# sh ~/change-master.sh > ~/change-master.log 2>&1

The dc and storage domain names should be modified for the test setup.

Created attachment 1831593 [details]
change-master test log

Log generated using change-master.sh.

Testing vdsm master:
https://gerrit.ovirt.org/c/vdsm/+/116982
with an engine patch enabling switch master with gluster storage.

Moving back from vdsm/storage to ovirt-engine/gluster, see comment 48.

(In reply to Nir Soffer from comment #49)
> 2. Missing locking in vdsm, when reading and writing task data to the
> master file system ...

I filed bug 2012843 for this.

Closing this as not a bug, as this is a configuration issue. From comment 48, switching the master SD works as expected when setting the volume option cluster.consistent-metadata to on.