+++ This bug was initially created as a clone of Bug #1431955 +++

Description of problem:

When EC opens a file, it gets an fd from all the bricks; if a brick is down at that time, it will not have an fd for that subvolume. If the brick comes up before a write is sent on that fd, we should open the fd on that brick as well, so that the write reaches it and an unnecessary heal is avoided later.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

--- Additional comment from Pranith Kumar K on 2017-03-27 01:28:29 EDT ---

Sunil,
    When you do a dd on a file, for as long as the file is open you see something like the following in the statedump of the client:

[xlator.protocol.client.ec2-client-0.priv]
fd.0.remote_fd=0
connecting=0
connected=1
total_bytes_read=7288220
ping_timeout=42
total_bytes_written=11045016
ping_msgs_sent=3
msgs_sent=19812

An entry like this should be present for each open fd under each client xlator, so with a 3 = 2+1 configuration we will have one per client xlator. But if a brick was down at the time the file was opened, the entry won't be present for that brick. After bringing the brick back up and operating on the file, the file should be opened on that brick again. I think at the moment the operation gets converted to an anonymous-fd based operation, so it may not fail, but it is important to open the file again for all operations (lk etc.) to function properly.

--- Additional comment from Sunil Kumar Acharya on 2017-03-27 07:56:27 EDT ---

Steps to re-create/test:

1. Created and mounted an EC (2+1) volume. Heal disabled.

[root@server3 ~]# gluster volume info

Volume Name: ec-vol
Type: Disperse
Volume ID: b676891f-392d-49a6-891c-8e7e3790658d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: server1:/LAB/store/ec-vol
Brick2: server2:/LAB/store/ec-vol
Brick3: server3:/LAB/store/ec-vol
Options Reconfigured:
cluster.disperse-self-heal-daemon: disable   <<<<<
transport.address-family: inet
nfs.disable: on
disperse.background-heals: 0                 <<<<<
[root@server3 ~]#

2. Touched a file on the mount point.

# touch file

3. Brought down one of the brick processes.

4. Opened a file descriptor for the file.

# exec 30<> file

5. Brought up the brick process that was down.

6. Wrote to the FD.

# echo "abc" >&30

7. File status on client and bricks after the write completes:

Client:

[root@varada mount]# ls -lh file
-rw-r--r--. 1 root root 4 Mar 27 17:11 file
[root@varada mount]# du -kh file
1.0K    file
[root@varada mount]#

Bricks:

[root@server1 ~]# du -kh /LAB/store/ec-vol/file
4.0K    /LAB/store/ec-vol/file
[root@server1 ~]# ls -lh /LAB/store/ec-vol/file
-rw-r--r-- 2 root root 0 Mar 27 17:08 /LAB/store/ec-vol/file
[root@server1 ~]# cat /LAB/store/ec-vol/file
[root@server1 ~]#

[root@server2 ~]# du -kh /LAB/store/ec-vol/file
8.0K    /LAB/store/ec-vol/file
[root@server2 ~]# ls -lh /LAB/store/ec-vol/file
-rw-r--r-- 2 root root 512 Mar 27 17:11 /LAB/store/ec-vol/file
[root@server2 ~]# cat /LAB/store/ec-vol/file
abc
[root@server2 ~]#

[root@server3 ~]# du -kh /LAB/store/ec-vol/file
8.0K    /LAB/store/ec-vol/file
[root@server3 ~]# ls -lh /LAB/store/ec-vol/file
-rw-r--r-- 2 root root 512 Mar 27 17:11 /LAB/store/ec-vol/file
[root@server3 ~]# cat /LAB/store/ec-vol/file
abc
abc
[root@server3 ~]#
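To confirm whether the fd from step 4 was actually opened on every brick, the statedump check Pranith describes above can be scripted roughly as below. This is a minimal sketch, assuming the volume name ec-vol, that it runs on the client machine where the volume is FUSE-mounted, and that dumps land in the default /var/run/gluster directory; adjust names and paths to the actual setup.

# Trigger a statedump of the FUSE client (assumes the only process matching
# this pattern on the client machine is the ec-vol mount process).
kill -USR1 $(pgrep -f 'glusterfs.*ec-vol')

# List each client xlator section together with any remote_fd entries.
# A brick that was down when the file was opened shows no fd.N.remote_fd line.
grep -E '^\[xlator\.protocol\.client\.|remote_fd' /var/run/gluster/glusterdump.*.dump.*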
--- Additional comment from Ashish Pandey on 2017-04-09 07:08:05 EDT ---

We will also need to change some performance options so that an FD is actually opened in step 4 on those bricks which are UP:

1 - gluster v set vol performance.lazy-open no
2 - gluster v set vol performance.read-after-open yes

[root@apandey /]# gluster v info

Volume Name: vol
Type: Disperse
Volume ID: d007c6c2-98da-4cd9-8d5e-99e0e3f37012
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: apandey:/home/apandey/bricks/gluster/vol-1
Brick2: apandey:/home/apandey/bricks/gluster/vol-2
Brick3: apandey:/home/apandey/bricks/gluster/vol-3
Options Reconfigured:
disperse.background-heals: 0
cluster.disperse-self-heal-daemon: disable
performance.read-after-open: yes
performance.lazy-open: no
transport.address-family: inet
nfs.disable: on

[root@apandey glusterfs]# gluster v status
Status of volume: vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1                                       49152     0          Y       6297
Brick apandey:/home/apandey/bricks/gluster/
vol-2                                       49153     0          Y       5865
Brick apandey:/home/apandey/bricks/gluster/
vol-3                                       49154     0          Y       5884

Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks

After bringing the brick vol-1 UP and writing data on the FD:

[root@apandey glusterfs]# cat /home/apandey/bricks/gluster/vol-1/dir/file
[root@apandey glusterfs]# cat /home/apandey/bricks/gluster/vol-2/dir/file
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
[root@apandey glusterfs]# cat /home/apandey/bricks/gluster/vol-3/dir/file
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
[root@apandey glusterfs]#

[root@apandey glusterfs]# getfattr -m. -d -e hex /home/apandey/bricks/gluster/vol-*/dir/file
getfattr: Removing leading '/' from absolute path names
# file: home/apandey/bricks/gluster/vol-1/dir/file
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x000000000000000b000000000000000b
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000001
trusted.gfid=0xf8cf475afa5e4873bf2274f45278f74f

# file: home/apandey/bricks/gluster/vol-2/dir/file
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
trusted.bit-rot.version=0x020000000000000058ea10fd0005a7ee
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x000000000000000c000000000000000c
trusted.ec.size=0x000000000000002c
trusted.ec.version=0x000000000000000c000000000000000d
trusted.gfid=0xf8cf475afa5e4873bf2274f45278f74f

# file: home/apandey/bricks/gluster/vol-3/dir/file
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
trusted.bit-rot.version=0x020000000000000058ea110100063c59
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x000000000000000c000000000000000c
trusted.ec.size=0x000000000000002c
trusted.ec.version=0x000000000000000c000000000000000d
trusted.gfid=0xf8cf475afa5e4873bf2274f45278f74
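Read against the trusted.ec.* xattrs above, the missed reopen is visible: vol-2 and vol-3 advanced their trusted.ec.version counters and record trusted.ec.size=0x2c (44 bytes, consistent with eleven 4-byte writes of "abc"), while vol-1 still shows size 0, a version that barely moved past the create, and a non-zero trusted.ec.dirty, so it missed the writes and needs heal. A minimal decode sketch of the size xattr, reusing the brick paths from the comment above (the loop and output format are illustrative, not from the original report):

# Convert trusted.ec.size from hex to decimal for each brick copy of the file.
for b in vol-1 vol-2 vol-3; do
    f=/home/apandey/bricks/gluster/$b/dir/file
    sz=$(getfattr --absolute-names -n trusted.ec.size -e hex "$f" |
         awk -F= '/trusted.ec.size/ {print $2}')
    printf '%s: trusted.ec.size=%s -> %d bytes\n' "$b" "$sz" "$((sz))"
done
# Expected here: vol-1 reports 0 bytes, vol-2 and vol-3 report 44 bytes.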
--- Additional comment from Worker Ant on 2017-04-18 11:40:54 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#1) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-04-18 11:47:46 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#2) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-04-20 09:21:48 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#3) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-04-27 07:39:20 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#4) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-05-03 09:22:11 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#5) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-05-16 02:29:34 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#6) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-05-16 10:06:24 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#7) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-05-30 06:50:05 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#8) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-05-31 11:08:27 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#9) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-06-05 11:36:58 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#10) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-06-06 07:58:57 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#11) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-06-08 14:18:03 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#12) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-07-20 09:51:07 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#13) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-08-24 08:10:49 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#14) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-09-12 09:05:28 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#15) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-09-22 07:38:17 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#16) for review on master by Sunil Kumar Acharya (sheggodu)

--- Additional comment from Worker Ant on 2017-10-11 11:42:36 EDT ---

REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal implementation for EC) posted (#17) for review on master by Sunil Kumar Acharya (sheggodu)
Upstream Patch : https://review.gluster.org/17077
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607