Bug 1475266 - IO Error hit with triggering a manual heal in between bringing down of redundant bricks
Status: ASSIGNED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: disperse
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Ashish Pandey
QA Contact: nchilaka
Depends On:
Blocks:
 
Reported: 2017-07-26 06:31 EDT by nchilaka
Modified: 2017-07-28 05:01 EDT
CC List: 4 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Comment 2 nchilaka 2017-07-26 06:35:11 EDT


[root@dhcp35-214 ~]# gluster v status
Status of volume: ec
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.192:/rhs/brick1/ec           49154     0          Y       25954
Brick 10.70.35.214:/rhs/brick1/ec           N/A       N/A        N       N/A  ===>(above brought down first)
Brick 10.70.35.215:/rhs/brick1/ec           49152     0          Y       28857
Brick 10.70.35.192:/rhs/brick2/ec           49153     0          Y       25307
Brick 10.70.35.214:/rhs/brick2/ec           N/A       N/A        N       N/A  ===>(above brought down after triggering manual heal)
Brick 10.70.35.215:/rhs/brick2/ec           49153     0          Y       28876
Self-heal Daemon on localhost               N/A       N/A        Y       1572 
Self-heal Daemon on 10.70.35.215            N/A       N/A        Y       29296
Self-heal Daemon on dhcp35-192.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       25986
 
Task Status of Volume ec
------------------------------------------------------------------------------
There are no active volume tasks
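When triaging similar reports, the brick rows of the status output above can be checked mechanically. The Python sketch below is illustrative only: it assumes the column order shown in the header (TCP Port, RDMA Port, Online, Pid), skips the Self-heal Daemon rows, and uses a hypothetical helper name, offline_bricks, that is not part of any Gluster tooling.

```python
import re

# Brick rows copied from the `gluster v status` output above (annotations removed).
STATUS = """\
Brick 10.70.35.192:/rhs/brick1/ec           49154     0          Y       25954
Brick 10.70.35.214:/rhs/brick1/ec           N/A       N/A        N       N/A
Brick 10.70.35.215:/rhs/brick1/ec           49152     0          Y       28857
Brick 10.70.35.192:/rhs/brick2/ec           49153     0          Y       25307
Brick 10.70.35.214:/rhs/brick2/ec           N/A       N/A        N       N/A
Brick 10.70.35.215:/rhs/brick2/ec           49153     0          Y       28876
"""

# Assumed row layout: "Brick <host:path> <tcp port> <rdma port> <Y|N> <pid>".
BRICK_ROW = re.compile(r"^Brick\s+(\S+)\s+(\S+)\s+(\S+)\s+([YN])\s+(\S+)")


def offline_bricks(status_text: str) -> list[str]:
    """Return the bricks whose Online column is 'N' (hypothetical helper)."""
    down = []
    for line in status_text.splitlines():
        match = BRICK_ROW.match(line)
        # Non-brick rows (headers, Self-heal Daemon lines) do not match.
        if match and match.group(4) == "N":
            down.append(match.group(1))
    return down


if __name__ == "__main__":
    for brick in offline_bricks(STATUS):
        print("offline:", brick)
```

Run against the sample, it reports the two bricks on 10.70.35.214 as offline, which matches the annotations in the status output.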

xattrs of one of the parent directories in which file creation started to fail
(note that b2, the first brick brought down, is missing from the output):



b1:
[root@dhcp35-192 ~]# getfattr -d -m . -e hex /rhs/brick*/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000950000000000000097
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

b4:
# file: rhs/brick2/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000950000000000000097
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff



b5:
[root@dhcp35-214 ~]# getfattr -d -m . -e hex /rhs/brick*/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000000000000000000000
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff



b3:
[root@dhcp35-215 ~]# getfattr -d -m . -e hex /rhs/brick*/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000950000000000000097
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

b6:
# file: rhs/brick2/ec/dir1/linux-4.10.4/Documentation/media/uapi/v4l
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.version=0x00000000000000950000000000000097
trusted.gfid=0x731951d33b0848009c0fa2e07ca17329
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
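The trusted.ec.version and trusted.ec.dirty values above are 16-byte hex blobs. The Python sketch below reads them under an assumption about the disperse translator's on-disk layout: two big-endian unsigned 64-bit counters, the data counter first and the metadata counter second. The helper names are hypothetical, and the majority comparison is a simplification for reading this dump, not the translator's own good/bad brick selection logic.

```python
import struct
from collections import Counter


def decode_ec_counters(hex_value: str) -> tuple[int, int]:
    """Decode a 16-byte EC xattr into (data, metadata) counters.

    Assumption: two big-endian unsigned 64-bit integers, data first.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    data, metadata = struct.unpack(">QQ", raw)
    return data, metadata


# trusted.ec.version per brick, copied from the getfattr output above.
# b2 is absent from the dump, as noted in the report.
VERSIONS = {
    "b1": "0x00000000000000950000000000000097",
    "b3": "0x00000000000000950000000000000097",
    "b4": "0x00000000000000950000000000000097",
    "b5": "0x00000000000000000000000000000000",
    "b6": "0x00000000000000950000000000000097",
}


def lagging_bricks(versions: dict[str, str]) -> list[str]:
    """Bricks whose counters differ from the most common value.

    Simplified majority check for reading the dump only.
    """
    decoded = {brick: decode_ec_counters(v) for brick, v in versions.items()}
    majority, _ = Counter(decoded.values()).most_common(1)[0]
    return sorted(b for b, v in decoded.items() if v != majority)


if __name__ == "__main__":
    for brick, value in VERSIONS.items():
        data, metadata = decode_ec_counters(value)
        print(f"{brick}: data={data} metadata={metadata}")
    print("lagging:", lagging_bricks(VERSIONS))
```

Under this reading, b1, b3, b4 and b6 agree on a data version of 149 (0x95) and a metadata version of 151 (0x97), b5 reports 0 for both, and every brick shows a dirty count of 1 for data and for metadata.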
Comment 3 nchilaka 2017-07-26 06:36:27 EDT
[root@dhcp35-214 ~]# gluster v info
 
Volume Name: ec
Type: Disperse
Volume ID: be892fe9-7daa-40f8-a52f-428e3d396cfa
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.35.192:/rhs/brick1/ec
Brick2: 10.70.35.214:/rhs/brick1/ec
Brick3: 10.70.35.215:/rhs/brick1/ec
Brick4: 10.70.35.192:/rhs/brick2/ec
Brick5: 10.70.35.214:/rhs/brick2/ec
Brick6: 10.70.35.215:/rhs/brick2/ec
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
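The "Number of Bricks" line encodes the disperse geometry as subvolumes x (data + redundancy). By design, a disperse subvolume keeps serving I/O as long as no more than its redundancy count of bricks is unavailable, so this 4 + 2 volume should tolerate the two offline bricks reported above. The Python sketch below, with hypothetical helper names, parses that line and performs the check.

```python
import re

# The brick-count line copied from the `gluster v info` output above.
INFO_LINE = "Number of Bricks: 1 x (4 + 2) = 6"

# Matches "<subvols> x (<data> + <redundancy>) = <total>".
LAYOUT = re.compile(
    r"Number of Bricks:\s*(\d+)\s*x\s*\(\s*(\d+)\s*\+\s*(\d+)\s*\)\s*=\s*(\d+)"
)


def parse_disperse_layout(line: str) -> tuple[int, int, int]:
    """Return (subvolumes, data bricks, redundancy bricks) per subvolume."""
    match = LAYOUT.search(line)
    if match is None:
        raise ValueError(f"not a disperse brick-count line: {line!r}")
    subvols, data, redundancy, total = (int(g) for g in match.groups())
    if subvols * (data + redundancy) != total:
        raise ValueError("brick total does not match the layout")
    return subvols, data, redundancy


def tolerates(offline_in_subvol: int, redundancy: int) -> bool:
    """True if a disperse subvolume should still serve I/O by design."""
    return offline_in_subvol <= redundancy


if __name__ == "__main__":
    subvols, data, redundancy = parse_disperse_layout(INFO_LINE)
    print(f"{subvols} subvolume(s), {data} data + {redundancy} redundancy bricks")
    # Brick2 and Brick5 were offline in the status output of Comment 2.
    print("within redundancy:", tolerates(2, redundancy))
```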
Comment 4 nchilaka 2017-07-26 06:36:46 EDT
[root@dhcp35-192 ~]# gluster v heal ec info|grep ntries
Number of entries: 2483
Number of entries: -
Number of entries: 2483
Number of entries: 2483
Number of entries: -
Number of entries: 2483
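The heal info counts can be tabulated the same way. In this Python sketch (hypothetical helper name), a "-" is treated as a brick that could not be queried. It assumes the counts are printed in brick order, which is consistent with the "-" values appearing in positions 2 and 5, the two offline bricks.

```python
from typing import Optional

# Lines copied from the `gluster v heal ec info | grep ntries` output above.
HEAL_INFO = """\
Number of entries: 2483
Number of entries: -
Number of entries: 2483
Number of entries: 2483
Number of entries: -
Number of entries: 2483
"""


def parse_heal_counts(text: str) -> list[Optional[int]]:
    """Return one pending-heal count per brick, None where none was reported."""
    counts: list[Optional[int]] = []
    for line in text.splitlines():
        if not line.startswith("Number of entries:"):
            continue
        value = line.split(":", 1)[1].strip()
        # '-' is treated here as a brick that could not be queried.
        counts.append(None if value == "-" else int(value))
    return counts


if __name__ == "__main__":
    for index, count in enumerate(parse_heal_counts(HEAL_INFO), start=1):
        print(f"Brick{index}:", "unreachable" if count is None else count)
```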
Comment 5 nchilaka 2017-07-26 06:56:14 EDT
sosreports and logs at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1475266/
Comment 6 nchilaka 2017-07-26 08:23:13 EDT
I have tried to reproduce it another 4 times, but couldn't hit it.
