Bug 1749352

Summary:	Failures in remove-brick due to [Input/output error] errors
Product:	[Community] GlusterFS	Reporter:	Mohammed Rafi KC <rkavunga>
Component:	replicate	Assignee:	bugs <bugs>
Status:	CLOSED NEXTRELEASE	QA Contact:
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	5	CC:	bugs, ksubrahm, nchilaka, rhs-bugs, rkavunga, saraut, spalai, storage-qa-internal
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1728770	Environment:
Last Closed:	2019-09-06 08:22:20 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1726673, 1728770
Bug Blocks:	1749305, 1749307, 1759832

Description Mohammed Rafi KC 2019-09-05 12:31:47 UTC

+++ This bug was initially created as a clone of Bug #1728770 +++

+++ This bug was initially created as a clone of Bug #1726673 +++

Description of problem:
While performing remove-brick to convert a 3X3 volume to a 2X3 volume, the remove-brick rebalance reported failures due to the following error: " E [MSGID: 114031] [client-rpc-fops_v2.c:2540:client4_0_opendir_cbk] 0-vol4-client-8: remote operation failed. Path: /dir1/thread0/level03/level13/level23/level33/level43 (69e97af3-d2d7-450a-881e-0c4ef6ac1355) [Input/output error] "

Version-Release number of selected component (if applicable):
6.0.7

How reproducible:
1/1

Steps to Reproduce:
1. Create a 1X3 volume.
2. FUSE-mount the volume and start I/O on it.
3. Convert it into a 2X3 volume and trigger rebalance.
4. Let the rebalance complete, then convert it into a 3X3 volume and trigger rebalance.
5. After that, start a remove-brick operation on the volume to convert it back into a 2X3 volume.
6. Check the remove-brick status.

Actual results:
There are failures in remove-brick rebalance.
Errors from rebalance logs:
E [MSGID: 114031] [client-rpc-fops_v2.c:2540:client4_0_opendir_cbk] 0-vol4-client-2: remote operation failed. Path: /dir1/thread0/level03/level13/level23/level33/level43 (69e97af3-d2d7-450a-881e-0c4ef6ac1355) [Input/output error]

E [MSGID: 114031] [client-rpc-fops_v2.c:2540:client4_0_opendir_cbk] 0-vol4-client-8: remote operation failed. Path: /dir1/thread0/level03/level13/level23/level33/level43 (69e97af3-d2d7-450a-881e-0c4ef6ac1355) [Input/output error]

W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-vol4-client-8: remote operation failed. Path: /dir1/thread0/level03/level13/level23/level33/level43/level53/5d1b1579%%P3TRO7PG35 (558423e2-478e-40e9-9958-31c710e50b89) [Input/output error]

W [MSGID: 114031] [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-vol4-client-2: remote operation failed. Path: /dir1/thread0/level03/level13/level23/level33/level43 (69e97af3-d2d7-450a-881e-0c4ef6ac1355) [Input/output error]


Expected results:
Remove-brick should complete successfully.


Remove-brick rebalance status:
==============================
# gluster v remove-brick vol4 replica 3 10.70.47.88:/bricks/brick2/vol4-b2 10.70.47.190:/bricks/brick2/vol4-b2 10.70.47.5:/bricks/brick2/vol4-b2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                            10.70.47.190             3463         3.5MB         18425            23             0            completed        0:37:14
                              10.70.47.5             3308         3.7MB         21920           136             0            completed        0:32:59
                               localhost             3397         3.3MB         21977           138             0            completed        0:33:35



On checking the volume status, it showed that two bricks are down:
=================================================================
# gluster v status vol4
Status of volume: vol4
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.88:/bricks/brick2/vol4-b1    49159     0          Y       30394
Brick 10.70.47.190:/bricks/brick2/vol4-b1   49159     0          Y       29191
Brick 10.70.47.5:/bricks/brick2/vol4-b1     N/A       N/A        N       N/A  
Brick 10.70.46.246:/bricks/brick2/vol4-b1   49158     0          Y       22598
Brick 10.70.47.188:/bricks/brick2/vol4-b1   49158     0          Y       22865
Brick 10.70.46.63:/bricks/brick2/vol4-b1    49158     0          Y       21036
Brick 10.70.47.88:/bricks/brick2/vol4-b2    49160     0          Y       5938 
Brick 10.70.47.190:/bricks/brick2/vol4-b2   49160     0          Y       4825 
Brick 10.70.47.5:/bricks/brick2/vol4-b2     N/A       N/A        N       N/A  
Self-heal Daemon on localhost               N/A       N/A        Y       6330 
Self-heal Daemon on 10.70.46.246            N/A       N/A        Y       5672 
Self-heal Daemon on 10.70.47.5              N/A       N/A        Y       5600 
Self-heal Daemon on 10.70.46.63             N/A       N/A        Y       4593 
Self-heal Daemon on 10.70.47.188            N/A       N/A        Y       4501 
Self-heal Daemon on 10.70.47.190            N/A       N/A        Y       5352 
 
Task Status of Volume vol4
------------------------------------------------------------------------------
Task                 : Remove brick        
ID                   : 273f04c3-b8bb-4613-a403-0c655de86ca3
Removed bricks:     
10.70.47.88:/bricks/brick2/vol4-b2
10.70.47.190:/bricks/brick2/vol4-b2
10.70.47.5:/bricks/brick2/vol4-b2
Status               : completed           


dmesg:
=====

[161039.214245] XFS (dm-66): Metadata CRC error detected at xfs_dir3_block_read_verify+0x5e/0x110 [xfs], xfs_dir3_block block 0x1dd8568
[161039.214912] XFS (dm-66): Unmount and run xfs_repair
[161039.215126] XFS (dm-66): First 64 bytes of corrupted metadata buffer:
[161039.215426] ffffbb1db27a6000: 20 20 20 20 20 23 20 51 75 69 63 6b 20 4d 61 69       # Quick Mai
[161039.215729] ffffbb1db27a6010: 6c 20 54 72 61 6e 73 66 65 72 20 50 72 6f 74 6f  l Transfer Proto
[161039.216110] ffffbb1db27a6020: 63 6f 6c 0a 71 6d 74 70 20 20 20 20 20 20 20 20  col.qmtp
[161039.216527] ffffbb1db27a6030: 20 20 20 20 32 30 39 2f 75 64 70 20 20 20 20 20      209/udp
[161039.217200] XFS (dm-66): metadata I/O error: block 0x1dd8568 ("xfs_trans_read_buf_map") error 74 numblks 16
[161039.217937] XFS (dm-66): xfs_do_force_shutdown(0x1) called from line 370 of file fs/xfs/xfs_trans_buf.c.  Return address = 0xffffffffc057de9a
[161039.344196] XFS (dm-66): I/O Error Detected. Shutting down filesystem
[161039.344495] XFS (dm-66): Please umount the filesystem and rectify the problem(s)


---> Because of the brick issue, one brick is down in each of two replica sets of the volume. However, since this is a distributed-replicated volume, there should not be failures in rebalance.
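
The expectation above can be checked against the volume status with a small sketch. It assumes that consecutive bricks in the status listing form one replica set of three, and it uses a deliberately simplified rule (at least one brick online) that is not AFR's actual quorum logic. The names bricks_up_in_set, set_can_serve and vol4_online are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REPLICA_COUNT 3

/* Count the bricks that are online in one replica set.
 * `online` holds one flag per brick, in volume-status order
 * (assumption: consecutive bricks form one replica set). */
int
bricks_up_in_set(const bool *online, size_t set_index)
{
    int up = 0;

    assert(online != NULL);
    for (size_t i = 0; i < REPLICA_COUNT; i++) {
        if (online[set_index * REPLICA_COUNT + i])
            up++;
    }
    return up;
}

/* Simplified rule for this sketch only: a replica set can still
 * answer a lookup if at least one of its bricks is online. */
bool
set_can_serve(const bool *online, size_t set_index)
{
    return bricks_up_in_set(online, set_index) >= 1;
}

/* Online flags copied from the "gluster v status vol4" output. */
const bool vol4_online[9] = {
    true, true, false, /* vol4-b1 on 10.70.47.88, .190, .5 */
    true, true, true,  /* vol4-b1 on 10.70.46.246, 10.70.47.188, 10.70.46.63 */
    true, true, false, /* vol4-b2 on 10.70.47.88, .190, .5 */
};
```

Calling bricks_up_in_set(vol4_online, 0) and bricks_up_in_set(vol4_online, 2) shows that the two affected sets still have two bricks online each, which is why a valid response was expected.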


Failure reason:

"[2019-07-02 08:32:01.514139] W [MSGID: 109023] [dht-rebalance.c:626:__is_file_migratable] 0-vol4-dht: Migrate file failed:/dir1/thread0/level04/level14/level24/level34/level44/level54/level64/level74/level84/symlink_to_files/5d1b15ed%%XS3OMQKQBN: Unable to get lock count for file"




The key GLUSTERFS_POSIXLK_COUNT is used to get the lock count from the posix-lock translator. This information is used to decide whether to migrate the file or not.
In the current scenario, as Sayalee mentioned, one disk is corrupted on server *.5, rendering both participating bricks from that server unresponsive (all operations lead to I/O errors). Given that only one brick in each of the two replica sets was down, DHT should have received a valid response. Instead, the key was entirely missing from the dictionary itself.
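
To illustrate why a missing key is reported as a migration failure rather than as "no locks", here is a minimal sketch of the decision. It is not the code of __is_file_migratable. The toy_dict type, toy_dict_get_i32, toy_is_file_migratable and the enum values are hypothetical, and the key string is a placeholder because the real value of GLUSTERFS_POSIXLK_COUNT is not shown in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder string; the real key value is not shown in this report. */
#define LOCK_COUNT_KEY "GLUSTERFS_POSIXLK_COUNT"

/* Hypothetical stand-in for the xdata dictionary returned by lookup. */
struct toy_pair {
    const char *key;
    int32_t value;
};

struct toy_dict {
    const struct toy_pair *pairs;
    size_t count;
};

/* Return 0 and store the value if the key exists, -1 otherwise. */
int
toy_dict_get_i32(const struct toy_dict *dict, const char *key, int32_t *out)
{
    assert(key != NULL && out != NULL);
    if (dict == NULL)
        return -1;
    for (size_t i = 0; i < dict->count; i++) {
        if (strcmp(dict->pairs[i].key, key) == 0) {
            *out = dict->pairs[i].value;
            return 0;
        }
    }
    return -1;
}

enum migrate_decision {
    MIGRATE_ALLOWED,
    MIGRATE_SKIP_LOCKED,
    MIGRATE_FAILED_NO_LOCK_COUNT,
};

/* Assumed decision for this sketch: a missing key is a failure,
 * a non-zero lock count means the file is skipped. */
enum migrate_decision
toy_is_file_migratable(const struct toy_dict *lookup_xdata)
{
    int32_t lock_count = 0;

    if (toy_dict_get_i32(lookup_xdata, LOCK_COUNT_KEY, &lock_count) != 0)
        return MIGRATE_FAILED_NO_LOCK_COUNT; /* "Unable to get lock count" */
    return lock_count > 0 ? MIGRATE_SKIP_LOCKED : MIGRATE_ALLOWED;
}
```

With an empty dictionary the function returns MIGRATE_FAILED_NO_LOCK_COUNT, which corresponds to the case seen here: the response arrived, but without the key.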

Moving to AFR component for analysis.


Adding a needinfo on Rafi, as he had done some investigation on the same.

--- Additional comment from Mohammed Rafi KC on 2019-07-10 16:08:22 UTC ---

RCA:

As mentioned in comment 6, the migration failed because the lookup could not return the lock count requested through GLUSTERFS_POSIXLK_COUNT. While processing afr_lookup_cbk, if a name heal is required, we perform it in afr_lookup_selfheal_wrap, which wipes all the current lookup data. After the heal, we do a fresh lookup and return the fresh data. However, this lookup does not pass the request xdata (xattr_req), so posix does not populate the lock count.

<code>

int
afr_lookup_selfheal_wrap(void *opaque)
{
    int ret = 0;
    call_frame_t *frame = opaque;
    afr_local_t *local = NULL;
    xlator_t *this = NULL;
    inode_t *inode = NULL;
    uuid_t pargfid = {
        0,
    };

    local = frame->local;
    this = frame->this;
    loc_pargfid(&local->loc, pargfid);

    ret = afr_selfheal_name(frame->this, pargfid, local->loc.name,
                            &local->cont.lookup.gfid_req, local->xattr_req);
    if (ret == -EIO)
        goto unwind;

    /* Wipe the replies collected by the original lookup. */
    afr_local_replies_wipe(local, this->private);

    /* Fresh lookup after the heal: the last argument is NULL, so the
     * request xdata is not passed on. */
    inode = afr_selfheal_unlocked_lookup_on(frame, local->loc.parent,
                                            local->loc.name, local->replies,
                                            local->child_up, NULL);
    if (inode)
        inode_unref(inode);

    afr_lookup_metadata_heal_check(frame, this);
    return 0;

unwind:
    AFR_STACK_UNWIND(lookup, frame, -1, EIO, NULL, NULL, NULL, NULL);
    return 0;
}
</code>
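
The effect described in the RCA can be reproduced with a toy model. This is a sketch under stated assumptions, not AFR or posix code: the brick is assumed to fill in the lock count only when the request asks for it, and toy_request, toy_reply, toy_brick_lookup and toy_lookup_with_name_heal are hypothetical names. The forward_request flag stands for the difference between passing NULL and passing the original request to the second lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the request xdata. */
struct toy_request {
    bool want_lock_count;
};

/* Hypothetical stand-in for one lookup reply. */
struct toy_reply {
    bool has_lock_count;
    int lock_count;
};

/* Assumption: the brick only returns what the request asks for. */
void
toy_brick_lookup(const struct toy_request *req, struct toy_reply *reply)
{
    assert(reply != NULL);
    memset(reply, 0, sizeof(*reply));
    if (req != NULL && req->want_lock_count) {
        reply->has_lock_count = true;
        reply->lock_count = 0; /* no locks held in this toy model */
    }
}

/* Lookup followed by a name heal: the first reply is wiped and a
 * second lookup is issued. forward_request == false models passing
 * NULL to the second lookup, true models passing the request on. */
void
toy_lookup_with_name_heal(const struct toy_request *req,
                          bool forward_request, struct toy_reply *reply)
{
    toy_brick_lookup(req, reply);     /* original lookup */
    /* ... the name heal would run here ... */
    memset(reply, 0, sizeof(*reply)); /* wipe the old replies */
    toy_brick_lookup(forward_request ? req : NULL, reply);
}
```

With forward_request set to false the final reply has no lock count even though the caller asked for it. With forward_request set to true the lock count is present again.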

--- Additional comment from Worker Ant on 2019-07-10 16:22:14 UTC ---

REVIEW: https://review.gluster.org/23024 (afr/lookup: Pass xattr_req in while doing a selfheal in lookup) posted (#1) for review on master by mohammed rafi  kc

--- Additional comment from Worker Ant on 2019-09-05 09:53:57 UTC ---

REVIEW: https://review.gluster.org/23024 (afr/lookup: Pass xattr_req in while doing a selfheal in lookup) merged (#15) on master by Ravishankar N

Comment 1 Worker Ant 2019-09-05 12:45:28 UTC

REVIEW: https://review.gluster.org/23365 (afr/lookup: Pass xattr_req in while doing a selfheal in lookup) posted (#2) for review on release-5 by mohammed rafi  kc

Comment 2 Worker Ant 2019-09-06 08:22:20 UTC

REVIEW: https://review.gluster.org/23365 (afr/lookup: Pass xattr_req in while doing a selfheal in lookup) merged (#2) on release-5 by mohammed rafi  kc