Bug 1568282 - Remove-brick on nodes failed with error "inodelk failed on subvol [Read-only file system]" in rebalance logs
Summary: Remove-brick on nodes failed with error "inodelk failed on subvol [Read-only file system]" in rebalance logs
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Susant Kumar Palai
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-17 07:03 UTC by Prasad Desala
Modified: 2020-01-16 06:59 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-16 06:59:24 UTC
Target Upstream Version:



Description Prasad Desala 2018-04-17 07:03:02 UTC
Description of problem:
======================
Remove-brick on nodes failed with error "inodelk failed on subvol [Read-only file system]" in rebalance logs.

gluster v remove-brick 1413005 10.70.42.167:/bricks/brick3/1413005-b3 10.70.42.177:/bricks/brick3/1413005-b3 10.70.42.173:/bricks/brick3/1413005-b3 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             1             0               failed        0:00:00
       dhcp42-177.lab.eng.blr.redhat.com                0        0Bytes             0             1             0               failed        0:00:00
       dhcp42-173.lab.eng.blr.redhat.com                0        0Bytes             0             0             0          in progress        0:00:04


rebalance logs snip:

[2018-04-17 06:38:55.911624] I [dht-rebalance.c:4411:gf_defrag_start_crawl] 0-1413005-dht: gf_defrag_start_crawl using commit hash 3601642346
[2018-04-17 06:38:55.935521] I [MSGID: 109081] [dht-common.c:5554:dht_setxattr] 0-1413005-dht: fixing the layout of /
[2018-04-17 06:38:55.953209] E [MSGID: 109119] [dht-lock.c:1051:dht_blocking_inodelk_cbk] 0-1413005-dht: inodelk failed on subvol 1413005-replicate-1, gfid:00000000-0000-0000-0000-000000000001 [Read-only file system]
[2018-04-17 06:38:55.957373] E [MSGID: 109026] [dht-rebalance.c:4455:gf_defrag_start_crawl] 0-1413005-dht: fix layout on / failed [Read-only file system]
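
For reference, these messages come from the per-volume rebalance log on the node where the remove-brick failed; assuming the default log directory, the path for this volume (named 1413005) would be:

# Rebalance/remove-brick migration activity is logged per volume
# (path assumed from the default log directory):
less /var/log/glusterfs/1413005-rebalance.log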

Version-Release number of selected component (if applicable):
3.12.2-7.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create an x3 (replica 3) volume and start it.
2) FUSE mount the volume on multiple clients and create two subdirectories, say <mountpoint>/Terminal{1..2}.
3) From client-1:
Terminal-1 --> start a script that creates files and directories inside the Terminal-1 directory
Terminal-2 --> start a script that creates files and directories inside the Terminal-2 directory
From client-2: start continuous lookups --> while true; do ls -lRt; done
4) While step 3 is in progress, add a few bricks to the volume.
5) Immediately remove the bricks added in step 4 and wait until the remove-brick completes (a condensed shell sketch of these steps follows).
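
A condensed, hedged shell sketch of the reproduction steps above. The volume name, server names and brick paths are placeholders (not the ones from this report), and the file-creation "script" from step 3 is reduced to a simple loop; the gluster CLI commands themselves are standard.

# Step 1: create and start a replica-3 volume (names and paths are placeholders).
VOL=testvol
gluster volume create $VOL replica 3 \
    server1:/bricks/brick1/$VOL-b1 server2:/bricks/brick1/$VOL-b1 server3:/bricks/brick1/$VOL-b1 force
gluster volume start $VOL

# Step 2: FUSE mount on the clients and create the two directories.
mkdir -p /mnt/$VOL
mount -t glusterfs server1:/$VOL /mnt/$VOL
mkdir -p /mnt/$VOL/Terminal1 /mnt/$VOL/Terminal2

# Step 3, client-1 (run one such loop per Terminal directory):
cd /mnt/$VOL/Terminal1 && while true; do n=$RANDOM; mkdir -p dir.$n; touch dir.$n/file.$n; done
# Step 3, client-2: continuous lookups.
cd /mnt/$VOL && while true; do ls -lRt; done

# Steps 4 and 5: while the above is still running, add a replica set of
# bricks, then immediately start removing those same bricks and watch
# the remove-brick status.
gluster volume add-brick $VOL \
    server1:/bricks/brick2/$VOL-b2 server2:/bricks/brick2/$VOL-b2 server3:/bricks/brick2/$VOL-b2 force
gluster volume remove-brick $VOL \
    server1:/bricks/brick2/$VOL-b2 server2:/bricks/brick2/$VOL-b2 server3:/bricks/brick2/$VOL-b2 start
gluster volume remove-brick $VOL \
    server1:/bricks/brick2/$VOL-b2 server2:/bricks/brick2/$VOL-b2 server3:/bricks/brick2/$VOL-b2 status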

Actual results:
===============
Remove-brick on nodes failed with error "inodelk failed on subvol [Read-only file system]" in rebalance logs

Expected results:
=================
Remove-brick on the nodes should complete without any issues.

Comment 3 Raghavendra G 2018-04-17 10:32:19 UTC
> [2018-04-17 06:38:55.953209] E [MSGID: 109119] [dht-lock.c:1051:dht_blocking_inodelk_cbk] 0-1413005-dht: inodelk failed on subvol 1413005-replicate-1, gfid:00000000-0000-0000-0000-000000000001 [Read-only file system]

It's a timing issue.

* One child of AFR comes online.
* AFR sends CHILD_UP.
* DHT starts its crawl and issues a fix-layout on /.
* An inodelk on / is sent. However, quorum is not met in AFR (at least 2 children have to be up), so it fails the inodelk with EROFS.

From the logs, I could see only client-4 up. The other children of replicate-1 - client-3 and client-5 - are still down (I didn't see a setvolume_cbk log for these clients).
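
To make the quorum arithmetic concrete, here is a hedged sketch in plain shell (not gluster code) of why the inodelk comes back as EROFS when only one of the three children is up:

# As described above: only client-4 is up (1 of 3), but at least 2
# children must be up for quorum, so AFR refuses the write-type fop
# (inodelk) with EROFS. Illustrative numbers only, not glusterfs APIs.
replica_count=3
up_children=1
if [ $((up_children * 2)) -gt "$replica_count" ]; then
    echo "quorum met: wind the inodelk to the up children"
else
    echo "quorum not met: fail the inodelk with EROFS (Read-only file system)"
fi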

Comment 5 Raghavendra G 2018-04-18 06:44:57 UTC
Prasad ran another test with a new volume, killing two bricks from each replica set.

Below are relevant logs:
[2018-04-17 14:02:55.351052] T [MSGID: 0] [dht-lock.c:1096:dht_blocking_inodelk_rec] 0-stack-trace: stack-address: 0x7f115c00d9b0, winding from lock-dht to lock-replicate-0
[2018-04-17 14:02:55.351157] T [MSGID: 0] [afr-common.c:3581:afr_fop_lock_wind] 0-stack-trace: stack-address: 0x7f115c00d9b0, winding from lock-replicate-0 to lock-client-2
[2018-04-17 14:02:55.351222] T [rpc-clnt.c:1496:rpc_clnt_record] 0-lock-client-2: Auth Info: pid: 4294967293, uid: 0, gid: 0, owner: 7028005c117f0000
[2018-04-17 14:02:55.351261] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 148, payload: 80, rpc hdr: 68
[2018-04-17 14:02:55.351522] T [rpc-clnt.c:1699:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x10 Program: GlusterFS 3.3, ProgVers: 330, Proc: 29) to rpc-transport (lock-client-2)
[2018-04-17 14:02:55.369676] T [rpc-clnt.c:675:rpc_clnt_reply_init] 0-lock-client-2: received rpc message (RPC XID: 0x10 Program: GlusterFS 3.3, ProgVers: 330, Proc: 29) from rpc-transport (lock-client-2)
[2018-04-17 14:02:55.369841] T [MSGID: 0] [client-rpc-fops.c:1511:client3_3_inodelk_cbk] 0-stack-trace: stack-address: 0x7f115c00d9b0, lock-client-2 returned 0
[2018-04-17 14:02:55.370008] T [MSGID: 0] [afr-common.c:3581:afr_fop_lock_wind] 0-stack-trace: stack-address: 0x7f115c00d9b0, winding from lock-replicate-0 to lock-client-2
[2018-04-17 14:02:55.370088] T [rpc-clnt.c:1496:rpc_clnt_record] 0-lock-client-2: Auth Info: pid: 4294967293, uid: 0, gid: 0, owner: 7028005c117f0000
[2018-04-17 14:02:55.370146] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 148, payload: 80, rpc hdr: 68
[2018-04-17 14:02:55.370447] T [rpc-clnt.c:1699:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x11 Program: GlusterFS 3.3, ProgVers: 330, Proc: 29) to rpc-transport (lock-client-2)
[2018-04-17 14:02:55.373928] T [rpc-clnt.c:675:rpc_clnt_reply_init] 0-lock-client-2: received rpc message (RPC XID: 0x11 Program: GlusterFS 3.3, ProgVers: 330, Proc: 29) from rpc-transport (lock-client-2)
[2018-04-17 14:02:55.374027] T [MSGID: 0] [client-rpc-fops.c:1511:client3_3_inodelk_cbk] 0-stack-trace: stack-address: 0x7f115c00d9b0, lock-client-2 returned 0
[2018-04-17 14:02:55.374104] D [MSGID: 0] [afr-common.c:3548:afr_fop_lock_unwind] 0-stack-trace: stack-address: 0x7f115c00d9b0, lock-replicate-0 returned -1 error: Read-only file system [Read-only file system]
[2018-04-17 14:02:55.374171] E [MSGID: 109119] [dht-lock.c:1051:dht_blocking_inodelk_cbk] 0-lock-dht: inodelk failed on subvol lock-replicate-0, gfid:00000000-0000-0000-0000-000000000001 [Read-only file system]
[2018-04-17 14:02:55.374316] D [MSGID: 0] [dht-common.c:3565:dht_fix_layout_setxattr_cbk] 0-stack-trace: stack-address: 0x7f115c002870, lock-dht returned -1 error: Read-only file system [Read-only file system]
[2018-04-17 14:02:55.374555] E [MSGID: 109026] [dht-rebalance.c:4455:gf_defrag_start_crawl] 0-lock-dht: fix layout on / failed [Read-only file system]
[2018-04-17 14:02:55.376604] I [MSGID: 109028] [dht-rebalance.c:5044:gf_defrag_status_get] 0-lock-dht: Rebalance is failed. Time taken is 0.00 secs

Some observations:

* It's AFR which failed the inodelk with EROFS.
* AFR wound the inodelk only to lock-client-2, not to lock-client-0 and lock-client-1, because brick-1 and brick-2 were killed:
Brick 10.70.42.167:/bricks/brick0/lock-b0   N/A       N/A        N       N/A  
Brick 10.70.42.177:/bricks/brick0/lock-b0   N/A       N/A        N       N/A  

1: volume lock-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host 10.70.42.167
  5:     option remote-subvolume /bricks/brick0/lock-b0
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username f3d1a17f-8016-4dd8-b0f5-b759a632a906
  9:     option password 9f49374c-ee8f-4860-8e52-8b20e485c59e
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14: end-volume
 15:  
 16: volume lock-client-1
 17:     type protocol/client
 18:     option ping-timeout 42
 19:     option remote-host 10.70.42.177
 20:     option remote-subvolume /bricks/brick0/lock-b0
 21:     option transport-type socket
 22:     option transport.address-family inet
 23:     option username f3d1a17f-8016-4dd8-b0f5-b759a632a906
 24:     option password 9f49374c-ee8f-4860-8e52-8b20e485c59e
 25:     option transport.tcp-user-timeout 0
 26:     option transport.socket.keepalive-time 20
 27:     option transport.socket.keepalive-interval 2
 28:     option transport.socket.keepalive-count 9
 29: end-volume


So, I think AFR is failing the inodelk with EROFS because quorum is not met. This proves the hypothesis in comment #3.
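
As a side note, the T (trace) entries above need a raised client log level; a hedged example of how to collect them on a re-run (the volume name "lock" is taken from the vol file snippet, and the rebalance process should pick this up since it runs client-side):

# Raise the client-side log level before starting the remove-brick so the
# rebalance log carries trace entries like the ones quoted above, then
# restore the default afterwards.
gluster volume set lock diagnostics.client-log-level TRACE
# ... reproduce the remove-brick / rebalance failure ...
gluster volume reset lock diagnostics.client-log-level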

Comment 7 Raghavendra G 2018-04-19 05:22:47 UTC
> So, I think AFR is failing the inodelk with EROFS because quorum is not met. This proves the hypothesis in comment #3.

One loophole in this argument, though. The part about the inodelk failing with EROFS because two subvols of replicate-1 - client-3 and client-5 - were down is correct. However, nothing in the log indicates whether the connections from client-3 and client-5 to the bricks succeeded or failed. I also didn't observe any logs indicating a reconfigure of ports, which means there was no attempt to connect to the bricks on these nodes; instead, the connection to the glusterds on the nodes running these bricks itself didn't yield any result - neither success nor failure.

DHT waits for an event (it doesn't matter which) from all of its subvols before starting the rebalance, and the event propagation logic in AFR is similar. The puzzling part is that without hearing from client-3 and client-5, replicate-1 shouldn't have propagated _any_ event to its parent, and without hearing from replicate-1, DHT shouldn't have proceeded with the rebalance. But that gating didn't happen here. So there is still more to be discovered in the RCA.
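
Below is a hedged, purely illustrative model of that gating in plain shell (not the actual afr/dht notify code); it only shows that a parent should stay silent until every child has reported something, which is exactly the step that appears to have been skipped here:

# replicate-1 has three children; record the last event heard from each.
# In this run only client-4 had reported, so nothing should have been
# propagated upward yet. Names and variables are illustrative only.
heard_client_3=""
heard_client_4="CHILD_UP"
heard_client_5=""

pending=0
for ev in "$heard_client_3" "$heard_client_4" "$heard_client_5"; do
    if [ -z "$ev" ]; then
        pending=1
    fi
done

if [ "$pending" -eq 0 ]; then
    echo "all children reported: replicate-1 may notify DHT and the crawl may start"
else
    echo "still waiting on some children: no event should reach DHT yet"
fi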

Comment 10 Sahina Bose 2019-11-25 07:33:30 UTC
Marking as medium priority since there has been no activity for a year. Do we need to keep this bug open, or can it be closed?

Comment 11 Raghavendra G 2019-11-25 09:55:46 UTC
(In reply to Sahina Bose from comment #10)
> Marking as medium priority since there has been no activity for a year. Do we
> need to keep this bug open, or can it be closed?

This is not a common scenario, so if there is no bandwidth, the bug can be closed as WONTFIX.

