Bug 1475266 - IO Error hit with triggering a manual heal in between bringing down of redundant bricks
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ashish Pandey
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-07-26 10:31 UTC by Nag Pavan Chilakam
Modified: 2018-11-15 10:00 UTC (History)

Last Closed: 2018-11-15 10:00:53 UTC

Comment 2 Nag Pavan Chilakam 2017-07-26 10:35:11 UTC


[root@dhcp35-214 ~]# gluster v status
Status of volume: ec
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.192:/rhs/brick1/ec           49154     0          Y       25954
Brick 10.70.35.214:/rhs/brick1/ec           N/A       N/A        N       N/A  ===>(this brick was brought down first)
Brick 10.70.35.215:/rhs/brick1/ec           49152     0          Y       28857
Brick 10.70.35.192:/rhs/brick2/ec           49153     0          Y       25307
Brick 10.70.35.214:/rhs/brick2/ec           N/A       N/A        N       N/A  ===>(this brick was brought down after triggering the manual heal)
Brick 10.70.35.215:/rhs/brick2/ec           49153     0          Y       28876
Self-heal Daemon on localhost               N/A       N/A        Y       1572 
Self-heal Daemon on 10.70.35.215            N/A       N/A        Y       29296
Self-heal Daemon on dhcp35-192.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       25986
 
Task Status of Volume ec
------------------------------------------------------------------------------
There are no active volume tasks

xattrs of one of the parent directories where file creates started to fail
(note: there is no entry for b2, which was the first brick brought down):



b1:
[root@dhcp35-192 ~]# getfattr -d -m . -e hex /rhs/brick*/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000950000000000000097
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

b4:
# file: rhs/brick2/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000950000000000000097
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff



b5:
[root@dhcp35-214 ~]# getfattr -d -m . -e hex /rhs/brick*/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000000000000000000000
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff



b3:
[root@dhcp35-215 ~]# getfattr -d -m . -e hex /rhs/brick*/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000950000000000000097
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

b6:
# file: rhs/brick2/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000950000000000000097
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
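
The xattr values listed above can be compared mechanically. The following is a minimal Python sketch, not part of the original report. It assumes that trusted.ec.version and trusted.ec.dirty each hold two big-endian 64-bit counters (data first, then metadata). The helper names decode_ec_pair and odd_bricks are hypothetical; the hex values are copied from the getfattr output above.

```python
from collections import Counter


def decode_ec_pair(hex_value):
    """Split a 16-byte EC xattr (hex, optional 0x prefix) into (data, metadata)."""
    text = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    # Assumption: two big-endian 64-bit counters, data then metadata.
    return int.from_bytes(raw[:8], "big"), int.from_bytes(raw[8:], "big")


def odd_bricks(xattrs):
    """Return the bricks whose decoded value differs from the most common one."""
    decoded = {brick: decode_ec_pair(value) for brick, value in xattrs.items()}
    majority, _count = Counter(decoded.values()).most_common(1)[0]
    return sorted(brick for brick, value in decoded.items() if value != majority)


# trusted.ec.version per brick, as reported above (b2 has no entry).
versions = {
    "b1": "0x00000000000000950000000000000097",
    "b3": "0x00000000000000950000000000000097",
    "b4": "0x00000000000000950000000000000097",
    "b5": "0x00000000000000000000000000000000",
    "b6": "0x00000000000000950000000000000097",
}

print(decode_ec_pair(versions["b1"]))  # (149, 151)
print(odd_bricks(versions))            # ['b5']
```

Under that assumed layout, only b5 disagrees with the other bricks on trusted.ec.version, while trusted.ec.dirty decodes to (1, 1) on every brick that has the directory.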

Comment 3 Nag Pavan Chilakam 2017-07-26 10:36:27 UTC
[root@dhcp35-214 ~]# gluster v info
 
Volume Name: ec
Type: Disperse
Volume ID: be892fe9-7daa-40f8-a52f-428e3d396cfa
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.35.192:/rhs/brick1/ec
Brick2: 10.70.35.214:/rhs/brick1/ec
Brick3: 10.70.35.215:/rhs/brick1/ec
Brick4: 10.70.35.192:/rhs/brick2/ec
Brick5: 10.70.35.214:/rhs/brick2/ec
Brick6: 10.70.35.215:/rhs/brick2/ec
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
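
For reference, the "1 x (4 + 2) = 6" layout means one disperse subvolume with four data bricks and two redundancy bricks, so up to two bricks may be offline before I/O is expected to fail. A minimal Python sketch of that arithmetic follows; the function name parse_disperse_layout is hypothetical, and the brick count of two comes from the status output in comment 2.

```python
import re


def parse_disperse_layout(line):
    """Parse 'Number of Bricks: S x (D + R) = T' into a dict of integers."""
    match = re.search(r"(\d+)\s*x\s*\((\d+)\s*\+\s*(\d+)\)\s*=\s*(\d+)", line)
    if match is None:
        raise ValueError("not a disperse layout line: %r" % line)
    subvols, data, redundancy, total = map(int, match.groups())
    if subvols * (data + redundancy) != total:
        raise ValueError("inconsistent brick count in %r" % line)
    return {"subvols": subvols, "data": data,
            "redundancy": redundancy, "total": total}


layout = parse_disperse_layout("Number of Bricks: 1 x (4 + 2) = 6")
bricks_down = 2  # Brick2 and Brick5, per the status output in comment 2

# Remaining failures the subvolume can absorb; 0 means it is at its limit
# but should still be able to serve I/O.
print(layout["redundancy"] - bricks_down)  # 0
```

With two bricks down the volume is exactly at its redundancy limit, which is why an I/O error at this point was reported as unexpected.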

Comment 4 Nag Pavan Chilakam 2017-07-26 10:36:46 UTC
[root@dhcp35-192 ~]# gluster v heal ec info|grep ntries
Number of entries: 2483
Number of entries: -
Number of entries: 2483
Number of entries: 2483
Number of entries: -
Number of entries: 2483
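
The filtered heal info output above can be summarized with a short script. This is a minimal Python sketch with a hypothetical helper name, parse_entry_counts. It assumes that "-" marks a brick whose entry count could not be read (here, the offline bricks) and that the lines appear in the same order as the bricks in gluster v info.

```python
def parse_entry_counts(output):
    """Return one item per 'Number of entries' line: an int, or None for '-'."""
    counts = []
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        if not sep or label.strip() != "Number of entries":
            continue
        value = value.strip()
        counts.append(int(value) if value.isdigit() else None)
    return counts


heal_info = """\
Number of entries: 2483
Number of entries: -
Number of entries: 2483
Number of entries: 2483
Number of entries: -
Number of entries: 2483
"""

counts = parse_entry_counts(heal_info)
# 1-based brick positions with no readable count (assumed volume order).
unavailable = [i + 1 for i, c in enumerate(counts) if c is None]

print(counts)       # [2483, None, 2483, 2483, None, 2483]
print(unavailable)  # [2, 5]
```

Under the ordering assumption, positions 2 and 5 correspond to Brick2 and Brick5, the two bricks that were brought down, and each of the four online bricks reports 2483 entries pending heal.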

Comment 5 Nag Pavan Chilakam 2017-07-26 10:56:14 UTC
sosreports and logs at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1475266/

Comment 6 Nag Pavan Chilakam 2017-07-26 12:23:13 UTC
I tried the same steps another four times but could not reproduce the issue.

