+++ This bug was initially created as a clone of Bug #1384297 +++

Description of problem:
For a replicate volume, if a character device file exists on only one brick, it cannot be healed to the other brick. The errors below are logged.

1. glusterfs server side log:
bricks/mnt-bricks-export-brick.log:[2016-08-29 06:25:55.380204] E [posix.c:1145:posix_mknod] 0-export-posix: mknod on /mnt/bricks/export/brick/myzero failed: Invalid argument
bricks/mnt-bricks-export-brick.log:[2016-08-29 06:25:55.380223] I [server-rpc-fops.c:522:server_mknod_cbk] 0-export-server: 27: MKNOD /myzero (00000000-0000-0000-0000-000000000001/myzero) ==> (Invalid argument)

2. glusterfs client side log:
glusterfs/glustershd.log:[2016-08-29 06:25:56.481530] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-export-client-1: remote operation failed: Invalid argument. Path: (null)

Version-Release number of selected component (if applicable):
3.6.9

How reproducible:
For a replicate volume:
1. Shut down one brick of the volume.
2. Write a character device file in the volume: mknod myzero c 1 5
3. Start up the volume.
4. Check whether the character device file is healed.

Additional info:
I printed the parameters of mknod; they are not correct.
[2016-08-29 08:44:48.015571] E [posix.c:1150:posix_mknod] 0-export-posix: mknod on /mnt/bricks/export/brick/myzero failed: Invalid argument
[2016-08-29 08:44:48.015589] I [server-rpc-fops.c:522:server_mknod_cbk] 0-export-server: 2950: MKNOD /myzero (00000000-0000-0000-0000-000000000001/myzero) ==> (Invalid argument)
[2016-08-29 08:45:33.330540] E [posix.c:1129:posix_mknod] 0-export-posix: mknod on /mnt/bricks/export/brick/myzero , mode: 0x21a4, dev: 0x5, major 16777216, minor 5

--- Additional comment from xiaopwu on 2016-10-12 23:31:23 EDT ---

The root cause of the issue is as below:

--- old/afr-self-heal-entry.c
+++ new/afr-self-heal-entry.c
@@ -142,8 +142,10 @@
         ret = dict_set_int32 (xdata, GLUSTERFS_INTERNAL_FOP_KEY, 1);
         if (ret)
                 goto out;
+
         ret = syncop_mknod (priv->children[dst], &loc, mode,
-                            iatt->ia_rdev, xdata, &newent);
+                            makedev (ia_major (iatt->ia_rdev), ia_minor (iatt->ia_rdev)), xdata, &newent);
+
         if (ret == 0 && newent.ia_nlink == 1) {
                 /* New entry created. Mark @dst pending on all sources */
                 newentry[dst] = 1;

--- Additional comment from Pranith Kumar K on 2016-10-25 08:48:23 EDT ---

This is a very good catch. We have the same bug in EC too. I will send out the patches. Thanks a lot!!

--- Additional comment from xiaopwu on 2016-10-25 21:15:06 EDT ---

Could you merge the patch to glusterfs 3.6.9?

--- Additional comment from Pranith Kumar K on 2016-10-25 21:20:31 EDT ---

Hi, 3.6.x is nearing EOL. I will make sure the patch reaches 3.9.x, 3.8.x and 3.7.x.

Pranith

--- Additional comment from xiaopwu on 2016-10-25 21:23:02 EDT ---

OK, thanks.

--- Additional comment from Worker Ant on 2016-10-25 21:49:40 EDT ---

REVIEW: http://review.gluster.org/15728 (afr,ec: Heal device files with correct major, minor numbers) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-10-25 22:47:39 EDT ---

REVIEW: http://review.gluster.org/15728 (afr,ec: Heal device files with correct major, minor numbers) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)
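For reference, a minimal standalone C sketch of the encoding mismatch the patch above addresses. The ia_major()/ia_minor() helpers below are local copies for illustration only, assuming GlusterFS's convention of packing the major number in the upper 32 bits of ia_rdev and the minor number in the lower 32 bits; the kernel's dev_t uses a different layout, so the raw ia_rdev value has to be re-packed with makedev() before it reaches mknod().

/* Sketch: gluster-style ia_rdev vs. kernel dev_t (assumed encoding, for illustration) */
#include <stdint.h>
#include <stdio.h>
#include <sys/sysmacros.h>   /* makedev(), major(), minor() */
#include <sys/types.h>

/* GlusterFS-style helpers -- local illustrative copies, not the library's own */
static uint64_t ia_major (uint64_t ia_rdev) { return ia_rdev >> 32; }
static uint64_t ia_minor (uint64_t ia_rdev) { return ia_rdev & 0xffffffff; }

int main (void)
{
        /* "mknod myzero c 1 5" stored in the assumed gluster iatt encoding */
        uint64_t ia_rdev = ((uint64_t)1 << 32) | 5;

        /* Wrong: treat the gluster-encoded value directly as a kernel dev_t */
        dev_t raw = (dev_t)ia_rdev;
        printf ("raw ia_rdev as dev_t          -> major %u, minor %u\n",
                major (raw), minor (raw));

        /* What the patch does: re-pack the numbers with makedev() */
        dev_t fixed = makedev (ia_major (ia_rdev), ia_minor (ia_rdev));
        printf ("makedev(ia_major, ia_minor)    -> major %u, minor %u\n",
                major (fixed), minor (fixed));
        return 0;
}

Passing the raw value, as the old syncop_mknod() call did, gives the brick-side mknod() garbled major/minor numbers, which is what shows up as the Invalid argument error in the posix_mknod log above.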
QATP:
====
TC#1: Server side healing
For replicate volume.
1. Shut down one brick of the volume.
2. Write a character dev file in the volume: mknod myzero c 1 5
3. Force start the volume.
4. Check if the character dev file is healed. The char file must be healed.

TC#2: Client side healing
====
For replicate volume.
1. Turn off the heal daemon and shut down one brick of the volume.
2. Write a character dev file in the volume: mknod myzero c 1 5
3. Force start the volume.
4. Check if the character dev file is healed. The char file must be healed.
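One possible way to perform step 4 concretely (a sketch; the brick path below is an assumption taken from the logs in the original report, substitute the brick that was taken down): stat the entry directly on that brick and compare its device numbers against the mknod myzero c 1 5 issued on the mount.

/* Sketch: verify the healed entry on the brick that was down (path is assumed) */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

int main (void)
{
        /* Hypothetical brick path; replace with the brick that was offline */
        const char *path = "/mnt/bricks/export/brick/myzero";
        struct stat st;

        if (stat (path, &st) != 0) {
                perror ("stat");     /* entry not healed yet, or brick not mounted */
                return 1;
        }
        if (!S_ISCHR (st.st_mode)) {
                fprintf (stderr, "%s: not a character device\n", path);
                return 1;
        }
        printf ("%s: major %u, minor %u (expected 1, 5)\n",
                path, major (st.st_rdev), minor (st.st_rdev));
        return 0;
}

If healing worked, the entry exists as a character device with major 1 and minor 5; without the fix, the brick-side mknod fails with Invalid argument, as in the brick log above.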
Validation:
==========
TC#1 --> passed
TC#2 --> entry heal passed, but metadata heal failed (discussed with Pranith). This is because client-side healing is kept optimal so as not to stress the client; the same behaviour can be seen even with a regular volume. Hence moving to verified on 3.8.4-5.
Also checked on x3 (3-way replicate) volumes.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html