Bug 1475266

Summary: I/O error hit when a manual heal is triggered between bringing down the redundant bricks
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: disperse
Assignee: Ashish Pandey <aspandey>
Status: CLOSED WORKSFORME
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: medium
Docs Contact:
Priority: low
Version: rhgs-3.3
CC: pkarampu, rhs-bugs, sheggodu, storage-qa-internal, ubansal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-15 10:00:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 2 Nag Pavan Chilakam 2017-07-26 10:35:11 UTC


[root@dhcp35-214 ~]# gluster v status
Status of volume: ec
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.192:/rhs/brick1/ec           49154     0          Y       25954
Brick 10.70.35.214:/rhs/brick1/ec           N/A       N/A        N       N/A
===>(above brought down first)
Brick 10.70.35.215:/rhs/brick1/ec           49152     0          Y       28857
Brick 10.70.35.192:/rhs/brick2/ec           49153     0          Y       25307
Brick 10.70.35.214:/rhs/brick2/ec           N/A       N/A        N       N/A  
===>(above brought down post triggering manual heal)
Brick 10.70.35.215:/rhs/brick2/ec           49153     0          Y       28876
Self-heal Daemon on localhost               N/A       N/A        Y       1572 
Self-heal Daemon on 10.70.35.215            N/A       N/A        Y       29296
Self-heal Daemon on dhcp35-192.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       25986
 
Task Status of Volume ec
------------------------------------------------------------------------------
There are no active volume tasks

xattrs of one of the parent directories where file creates started to fail:
(note: b2 is missing because it was the first brick brought down)



b1:
[root@dhcp35-192 ~]# getfattr -d -m . -e hex /rhs/brick*/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000950000000000000097
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

b4:
# file: rhs/brick2/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000950000000000000097
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff



b5:
[root@dhcp35-214 ~]# getfattr -d -m . -e hex /rhs/brick*/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000000000000000000000
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff



b3:
[root@dhcp35-215 ~]# getfattr -d -m . -e hex /rhs/brick*/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000950000000000000097
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

b6:
# file: rhs/brick2/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000950000000000000097
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
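
To make the xattr comparison above easier to read, here is a minimal sketch (not part of the original report) that decodes the trusted.ec.version values. It assumes each value packs two big-endian 64-bit counters, data version first and metadata version second. The name decode_ec_pair and the brick labels are illustrative only.

```python
import struct
from collections import Counter


def decode_ec_pair(hex_value):
    """Split a 16-byte EC xattr into two big-endian 64-bit counters.

    Assumption: the first counter is the data version and the second
    is the metadata version.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    return struct.unpack(">QQ", raw)


# trusted.ec.version values copied from the getfattr output above
# (b2 is absent because its xattrs were not captured).
versions = {
    "b1": "0x00000000000000950000000000000097",
    "b3": "0x00000000000000950000000000000097",
    "b4": "0x00000000000000950000000000000097",
    "b5": "0x00000000000000000000000000000000",
    "b6": "0x00000000000000950000000000000097",
}

decoded = {brick: decode_ec_pair(value) for brick, value in versions.items()}

# Treat the most common (data, metadata) pair as the reference and
# list the bricks that disagree with it.
majority = Counter(decoded.values()).most_common(1)[0][0]
stale = sorted(brick for brick, pair in decoded.items() if pair != majority)

for brick in sorted(decoded):
    print(brick, decoded[brick])
print("bricks differing from the majority:", stale)
```

With the values in this comment, b5 is the only brick whose version pair differs from the other four captured bricks.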

Comment 3 Nag Pavan Chilakam 2017-07-26 10:36:27 UTC
[root@dhcp35-214 ~]# gluster v info
 
Volume Name: ec
Type: Disperse
Volume ID: be892fe9-7daa-40f8-a52f-428e3d396cfa
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.35.192:/rhs/brick1/ec
Brick2: 10.70.35.214:/rhs/brick1/ec
Brick3: 10.70.35.215:/rhs/brick1/ec
Brick4: 10.70.35.192:/rhs/brick2/ec
Brick5: 10.70.35.214:/rhs/brick2/ec
Brick6: 10.70.35.215:/rhs/brick2/ec
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
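
For reference, a 4 + 2 disperse volume is expected to keep serving I/O with up to two bricks down, which is why the I/O error in this bug is unexpected. A small sketch of that arithmetic follows. can_serve_io is a hypothetical helper, not a gluster API, and it only counts online bricks; it does not account for bricks that are online but still pending heal.

```python
def can_serve_io(data_bricks, redundancy_bricks, bricks_down):
    """Return True if enough bricks remain online to reconstruct data.

    A disperse subvolume needs at least `data_bricks` fragments, so it
    tolerates at most `redundancy_bricks` bricks being unavailable.
    """
    online = data_bricks + redundancy_bricks - bricks_down
    return online >= data_bricks


# Layout from "Number of Bricks: 1 x (4 + 2) = 6", with the two
# bricks on 10.70.35.214 down as in the status output.
print(can_serve_io(4, 2, 2))  # two bricks down
print(can_serve_io(4, 2, 3))  # one more brick than the redundancy
```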

Comment 4 Nag Pavan Chilakam 2017-07-26 10:36:46 UTC
[root@dhcp35-192 ~]# gluster v heal ec info|grep ntries
Number of entries: 2483
Number of entries: -
Number of entries: 2483
Number of entries: 2483
Number of entries: -
Number of entries: 2483
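
A minimal sketch for summarising the heal info counts above. It assumes that "-" means the brick could not be queried, and that the lines appear in the same order as the bricks in the volume info. parse_heal_counts is an illustrative name, not part of any gluster tooling.

```python
def parse_heal_counts(text):
    """Return one entry per 'Number of entries' line.

    Numeric counts become ints; anything else (such as '-') becomes
    None to mark a brick whose count is unavailable.
    """
    counts = []
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep or label.strip() != "Number of entries":
            continue
        value = value.strip()
        counts.append(int(value) if value.isdigit() else None)
    return counts


# Output copied from the command above.
sample = """\
Number of entries: 2483
Number of entries: -
Number of entries: 2483
Number of entries: 2483
Number of entries: -
Number of entries: 2483
"""

counts = parse_heal_counts(sample)
# Brick numbers are 1-based to match the "BrickN" labels in volume info.
unavailable = [i + 1 for i, c in enumerate(counts) if c is None]
print("pending entries per brick:", counts)
print("bricks without a count:", unavailable)
```

Under those assumptions, the bricks without a count are Brick2 and Brick5, the two bricks shown as offline in the volume status.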

Comment 5 Nag Pavan Chilakam 2017-07-26 10:56:14 UTC
sosreports and logs at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1475266/

Comment 6 Nag Pavan Chilakam 2017-07-26 12:23:13 UTC
I tried to reproduce this another 4 times, but could not hit it again.