Description of problem:
==========================
When Bala (bmekala) ran the in-service upgrade test case for an EC volume, from RHGS 3.3.1 to 3.5.0 (i.e. glusterfs 3.8.4-54.15 to 6.0.21), the client started to see Input/output errors from the linux untar workload after 4 nodes had been upgraded.

### Below is the info ###

While performing the in-service upgrade, during the upgrade of the 4th node I saw Input/output errors on the client side with linux untar on the disperse and distributed-disperse volumes. I had turned off disperse.optimistic-change-log and disperse.eager-lock on both volumes before starting the upgrade. After this I did not proceed with the upgrade on the remaining two nodes; the upgrade was stopped. Please look into it.

#############
tar: linux-4.20/tools/testing/selftests/powerpc/stringloops: Cannot utime: Input/output error
tar: linux-4.20/tools/testing/selftests/powerpc/stringloops: Cannot change ownership to uid 0, gid 0: Input/output error
tar: linux-4.20/tools/testing/selftests/powerpc/stringloops: Cannot change mode to rwxrwxr-x: Input/output error
tar: linux-4.20/tools/testing/selftests/powerpc/primitives/asm: Cannot utime: Input/output error
tar: linux-4.20/tools/testing/selftests/powerpc/primitives/asm: Cannot change ownership to uid 0, gid 0: Input/output error
tar: linux-4.20/tools/testing/selftests/powerpc/primitives/asm: Cannot change mode to rwxrwxr-x: Input/output error
tar: Exiting with failure status due to previous errors
#############

Cluster details:
Credentials: root/1

Upgraded nodes:
10.70.35.150
10.70.35.210
10.70.35.107
10.70.35.164

Nodes yet to be upgraded:
10.70.35.119
10.70.35.46

Clients:
10.70.35.198
10.70.35.147

All the above machines are hosted on "tettnang.lab.eng.blr.redhat.com".

Regards,
Bala

Version-Release number of selected component (if applicable):
=============
RHGS 3.3.1 -> 3.5.0 (i.e. glusterfs 3.8.4-54.15 -> 6.0.21)

How reproducible:
=================
Hit once, on 2 different EC volumes on the same cluster

Steps to Reproduce:
1. Create an EC volume and a distributed EC volume on 3.3.1; turn off eager-lock and optimistic-change-log (see the example commands after these steps).
2. Mount both volumes on 2 clients and run linux untar I/O.
3. Start upgrading one node at a time.
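For reference, a minimal sketch of disabling the two disperse options before starting the upgrade, assuming the standard gluster CLI; the volume names ecvolume and distecvol are placeholders based on the steps above and may differ on the test cluster:

# Disable the options on both volumes before any node is upgraded
# (volume names are placeholders):
gluster volume set ecvolume disperse.eager-lock off
gluster volume set ecvolume disperse.optimistic-change-log off
gluster volume set distecvol disperse.eager-lock off
gluster volume set distecvol disperse.optimistic-change-log off

# Confirm the settings took effect:
gluster volume get ecvolume disperse.eager-lock
gluster volume get distecvol disperse.optimistic-change-log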
https://code.engineering.redhat.com/gerrit/184178
Verified with the RHGS 3.5.1 interim build (glusterfs-6.0-24.el7rhgs) using the following steps:

1. Created a 6-node trusted storage pool (gluster cluster) with RHGS 3.3.1 (glusterfs-3.8.4-54.15.el7rhgs).
2. Created 1x(4+2) and 2x(4+2) disperse volumes.
3. Disabled disperse.eager-lock and disperse.optimistic-change-log.
4. Mounted the volumes from 2 clients.
5. Started the kernel untar workload.
6. Killed the glusterfsd (brick) and glusterfs processes and stopped glusterd on node1 (# pkill glusterfsd; pkill glusterfs; systemctl stop glusterd).
7. Performed the upgrade to glusterfs-6.0-24.el7rhgs.
8. After the successful upgrade, started glusterd.
9. Waited for self-heal to complete on both disperse volumes.
10. Repeated steps 6 to 9 on the other nodes, monitoring the progress of the kernel untar workload after each node's upgrade completed (see the command sketch after this comment).

Observation: the kernel untar workload remained in progress with no interruption.

With these steps, marking this bug as verified.

After upgrading the servers, the op-version was also bumped up to 70000; the clients were then unmounted, upgraded, and the disperse volumes remounted.
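For reference, a rough per-node sketch of steps 6-9 and the post-upgrade op-version bump, assuming the standard gluster CLI and a yum-based package update; the exact upgrade command and volume names are illustrative and may differ on the test setup:

# On the node being upgraded: stop the brick, client, and management processes
pkill glusterfsd; pkill glusterfs; systemctl stop glusterd

# Upgrade the gluster packages on that node (illustrative; follow the RHGS upgrade guide for the actual repos/packages)
yum update glusterfs-server

# Bring the node back and wait for self-heal to finish on both volumes
systemctl start glusterd
gluster volume heal <volname> info    # repeat for each volume until no entries are listed

# After all nodes (and clients) are upgraded, bump the cluster op-version
gluster volume set all cluster.op-version 70000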
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0288