Bug 1336295 - Replace brick causes vm to pause and /.shard is always present in the heal info
Summary: Replace brick causes vm to pause and /.shard is always present in the heal info
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Anuradha
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks: Gluster-HC-1 1311817
 
Reported: 2016-05-16 06:46 UTC by RamaKasturi
Modified: 2016-09-20 02:00 UTC
CC: 11 users

Fixed In Version: glusterfs-3.7.9-6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-23 05:23:27 UTC
Embargoed:


Attachments


Links
System ID: Red Hat Product Errata RHBA-2016:1240
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Gluster Storage 3.1 Update 3
Last Updated: 2016-06-23 08:51:28 UTC

Description RamaKasturi 2016-05-16 06:46:43 UTC
Description of problem:
Did a replace-brick by running the command 'gluster volume replace-brick data zod.lab.eng.blr.redhat.com:/rhgs/data/data-brick3 <old brick> zod.lab.eng.blr.redhat.com:/rhgs/data/data-brick4 <new brick> commit force'. I see that some of my VMs go to paused state and 'gluster vol heal data info' always reports /.shard. Output from 'gluster vol heal data info':

[root@zod data]# gluster vol heal data info
Brick sulphur.lab.eng.blr.redhat.com:/rhgs/data/data-brick1
/.shard 
Status: Connected
Number of entries: 1

Brick tettnang.lab.eng.blr.redhat.com:/rhgs/data/data-brick2
/.shard 
Status: Connected
Number of entries: 1

Brick zod.lab.eng.blr.redhat.com:/rhgs/data/data-brick4
Status: Connected
Number of entries: 0


Version-Release number of selected component (if applicable):
glusterfs-3.7.9-4.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Have I/O running on your VMs.
2. Run the command to replace the brick: gluster volume replace-brick data zod.lab.eng.blr.redhat.com:/rhgs/data/data-brick3 <old brick> zod.lab.eng.blr.redhat.com:/rhgs/data/data-brick4 <new brick> commit force (see the sketch after these steps).
3.
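
A minimal shell sketch of the reproduction flow, assuming the hosts, volume name "data", and brick paths given in this report (adjust to your own setup):

# Step 1: keep I/O running inside the VMs (for example fio or dd against their disks).

# Step 2: replace the failed brick in one shot (commit force):
gluster volume replace-brick data \
    zod.lab.eng.blr.redhat.com:/rhgs/data/data-brick3 \
    zod.lab.eng.blr.redhat.com:/rhgs/data/data-brick4 \
    commit force

# Then watch self-heal: /.shard should eventually drop out of the output
# and the VMs should stay running:
gluster volume heal data info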

Actual results:
VMs go to paused state and /.shard is always reported in the heal info output.

Expected results:
VMs should not go to paused state, and heal info should not persistently report /.shard in the output.

Additional info:

Comment 2 RamaKasturi 2016-05-16 06:48:33 UTC
Failures seen in mount log of data :
=======================================

[2016-05-16 05:34:53.939937] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 2-data-client-2: remote operation failed [No such file or directory]
The message "W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 2-data-client-2: remote operation failed [No such file or directory]" repeated 9 times betwe
en [2016-05-16 05:34:53.939937] and [2016-05-16 05:34:53.941235]
[2016-05-16 05:34:53.941673] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-05-16 05:34:53.941891] W [fuse-bridge.c:2305:fuse_writev_cbk] 0-glusterfs-fuse: 5022670: WRITE => -1 (Input/output error)
[2016-05-16 05:34:53.941942] W [fuse-bridge.c:2305:fuse_writev_cbk] 0-glusterfs-fuse: 5022675: WRITE => -1 (Input/output error)
[2016-05-16 05:34:53.941961] W [fuse-bridge.c:2305:fuse_writev_cbk] 0-glusterfs-fuse: 5022668: WRITE => -1 (Input/output error)
[2016-05-16 05:34:53.944489] W [fuse-bridge.c:2305:fuse_writev_cbk] 0-glusterfs-fuse: 5022673: WRITE => -1 (Input/output error)
[2016-05-16 05:34:53.944924] W [fuse-bridge.c:2305:fuse_writev_cbk] 0-glusterfs-fuse: 5022671: WRITE => -1 (Input/output error)
[2016-05-16 05:34:53.945027] W [fuse-bridge.c:2305:fuse_writev_cbk] 0-glusterfs-fuse: 5022674: WRITE => -1 (Input/output error)
[2016-05-16 05:34:53.945172] W [fuse-bridge.c:2305:fuse_writev_cbk] 0-glusterfs-fuse: 5022669: WRITE => -1 (Input/output error)
[2016-05-16 05:34:53.945234] W [fuse-bridge.c:2305:fuse_writev_cbk] 0-glusterfs-fuse: 5022672: WRITE => -1 (Input/output error)
[2016-05-16 05:34:53.945305] W [fuse-bridge.c:2305:fuse_writev_cbk] 0-glusterfs-fuse: 5022677: WRITE => -1 (Input/output error)
[2016-05-16 05:34:53.945402] W [fuse-bridge.c:2305:fuse_writev_cbk] 0-glusterfs-fuse: 5022676: WRITE => -1 (Input/output error)
[2016-05-16 05:34:53.949119] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-05-16 05:34:53.950339] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-05-16 05:34:53.952624] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-05-16 05:34:53.956553] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-05-16 05:34:53.957084] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-05-16 05:34:54.310983] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-05-16 05:34:54.316845] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-05-16 05:34:54.317162] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-05-16 05:34:54.317443] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-05-16 05:34:54.321221] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-05-16 05:34:54.322126] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2016-05-16 05:34:54.323285] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 2-data-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]

Comment 3 SATHEESARAN 2016-05-17 03:24:15 UTC
Adding blocker flag to the release, as replace-brick results in VM pauses.

Replacing the brick is the only way to remove failed hard drives from the volume configuration.

Comment 4 Anuradha 2016-05-17 10:27:04 UTC
Regarding /.shard always being in heal info output :

Some files were created separately on the backend in the /.shard directory, meaning these files were not created from the mount.

When self-heal was triggered, the lookup on this file failed with ENODATA, so undo pending was not done. As a result, the pending markers on /.shard were not cleared, which is why it kept showing up in the heal info output even though no heal was needed.

Given that these files were created directly on the backend and not from the mount, this case is not valid.
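
As a reference, a small sketch (assuming the volume name "data" and a brick path from this report) of how the pending markers and the missing gfid can be checked directly on a brick:

# Run on each replica brick hosting the /.shard directory.
# Non-zero trusted.afr.data-client-* values are the pending markers that
# keep /.shard listed in heal info until undo pending clears them.
getfattr -d -m . -e hex /rhgs/data/data-brick1/.shard

# A file created directly on the backend (not through the mount) typically
# carries no trusted.gfid xattr, consistent with the ENODATA lookup failure
# described above:
getfattr -d -m . -e hex /rhgs/data/data-brick1/.shard/<backend-created-file>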

Kasturi,
I'm adding a needinfo on you to confirm that the "/.shard always shows up in heal info" part of this bug corresponds to the same issue I debugged yesterday. The VM pause on replace-brick still needs to be looked into.

Comment 5 RamaKasturi 2016-05-17 11:51:16 UTC
Yes Anuradha, I agree with you. /.shard was always seen in the heal info output because of the out.txt file present on the backend. Now heal info shows zero entries, which means the heal is completed. But I am unable to boot one of the VMs which was in paused state; the VM says no bootable device, and I do see some entries in the heal info output for the brick which was replaced.

Comment 7 Atin Mukherjee 2016-05-18 10:05:08 UTC
RCA is still in progress, as per what I heard from the AFR team.

Comment 8 Pranith Kumar K 2016-05-19 11:04:45 UTC
Just updating with the discussion our team had about this issue:
http://review.gluster.org/14369 should most probably fix this issue as well, since similar symptoms are seen when that issue is observed.

Comment 11 Pranith Kumar K 2016-05-23 10:07:06 UTC
Kasturi,
   Could you confirm whether the issue is seen on the latest release as well, which has the fix mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1336295#c8?

Pranith

Comment 12 Atin Mukherjee 2016-05-23 11:46:11 UTC
Downstream patch https://code.engineering.redhat.com/gerrit/74759 has made it into rhgs-3.1.3. As per comment 8, since dev feels that this issue has been taken care of by this patch, moving the bug state to ON_QA.

Comment 13 RamaKasturi 2016-06-03 07:32:40 UTC
Verified and works fine with build glusterfs-3.7.9-6.el7rhgs.x86_64.

Did a replace-brick commit force from the old brick to the new one while I/O was happening on the volume. The replace-brick operation went through successfully and I did not see any VM pauses.
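
A small sketch of how heal progress can be tracked during such a verification run (standard gluster CLI assumed; volume name "data" as in this report):

# Repeat until every brick reports "Number of entries: 0", i.e. the
# replaced brick has fully caught up:
watch -n 10 'gluster volume heal data info'

# Optional per-brick summary of outstanding heals:
gluster volume heal data statistics heal-count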

I did observe some behavior on the VMs while I performed the above exercise: they report "kernel:BUG: soft lockup - CPU#1 stuck for 22s! [fio:22835]", and event messages on the UI say "storage domain data experienced high latency of 9.466 seconds from host, this may cause performance and functional issues. Please consult your storage administrator". There is a patch for addressing the latency; I will verify the soft lockup there again.

Since the actual problem of VMs going to paused state while replace-brick is happening got fixed, moving this to on_qa. Will reopen this if I see the issue again.

Comment 14 RamaKasturi 2016-06-03 07:38:25 UTC
Since the actual problem of VMs going to paused state while replace-brick is happening got fixed, moving this to verified. Will reopen this if it happens again.

Comment 15 SATHEESARAN 2016-06-03 07:39:41 UTC
(In reply to RamaKasturi from comment #13)

> I did observe some behavior on the vms while i performed the above exercise,
> it says  "kernel:BUG: soft lockup - CPU#1 stuck for 22s! [fio:22835]"  and
> event messages on UI says "storage domain data experienced high latency of
> 9.466 seconds from host, this may cause performance and functional
> issues.Please consult your storage administrator". There is patch for
> addressing latency, will verify for the soft lock there again. 
> 
For what it's worth:
The latency issue was tracked under a separate bug - https://bugzilla.redhat.com/show_bug.cgi?id=1339136 - for which Krutika was provided a private build with O_DIRECT enabled.

Comment 17 errata-xmlrpc 2016-06-23 05:23:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

