Description of problem:
When Bala (bmekala) ran the in-service upgrade test case for an EC volume, from RHGS 3.3.1 to 3.5.0 (i.e. glusterfs 3.8.4-54.15 to 6.0.21), the client started hitting Input/Output errors during a Linux kernel untar after 4 nodes had been upgraded.
#### Below is the info from the reporter ####
While performing the in-service upgrade, during the upgrade of the 4th node I saw Input/Output errors on the client side with the Linux kernel untar on the disperse and distributed-disperse volumes.
I had turned off the disperse.optimistic-change-log and disperse.eager-lock options before starting the upgrade on the disperse and distributed-disperse volumes.
After this I did not proceed with the upgrade on the remaining two nodes; I stopped the upgrade. Please look into it.
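The pre-upgrade option changes described above can be sketched as a dry run, assuming the volume names ecvolume and distecvol from the reproduction steps (commands are echoed, not executed):

```shell
# Dry-run sketch: print the option changes made before starting the upgrade.
# Volume names are assumed from the reproduction steps in this report.
for vol in ecvolume distecvol; do
    echo "gluster volume set $vol disperse.optimistic-change-log off"
    echo "gluster volume set $vol disperse.eager-lock off"
done
```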
tar: linux-4.20/tools/testing/selftests/powerpc/stringloops: Cannot utime: Input/output error
tar: linux-4.20/tools/testing/selftests/powerpc/stringloops: Cannot change ownership to uid 0, gid 0: Input/output error
tar: linux-4.20/tools/testing/selftests/powerpc/stringloops: Cannot change mode to rwxrwxr-x: Input/output error
tar: linux-4.20/tools/testing/selftests/powerpc/primitives/asm: Cannot utime: Input/output error
tar: linux-4.20/tools/testing/selftests/powerpc/primitives/asm: Cannot change ownership to uid 0, gid 0: Input/output error
tar: linux-4.20/tools/testing/selftests/powerpc/primitives/asm: Cannot change mode to rwxrwxr-x: Input/output error
tar: Exiting with failure status due to previous errors
Cluster Details: Credentials root/1
The upgraded nodes are listed below.
The following nodes are yet to be upgraded.
All the above machines are hosted on "tettnang.lab.eng.blr.redhat.com"
Version-Release number of selected component (if applicable):
RHGS 3.3.1 -> 3.5.0 (i.e. glusterfs-3.8.4-54.15 -> 6.0.21)
Hit once on 2 different EC volumes on the same cluster.
Steps to Reproduce:
1. Create an EC volume (ecvolume) and a distributed EC volume (distecvol) on 3.3.1; turn off eager-lock and optimistic-change-log
2. Mount on 2 clients and run Linux kernel untar I/O
3. Start upgrading one node at a time
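One per-node cycle of the upgrade can be sketched as a dry run (commands are echoed rather than executed; the yum package name is an illustrative assumption, as real clusters pull from the RHGS channels):

```shell
# Dry-run sketch of one in-service upgrade cycle on a single node.
# The package name "glusterfs-server" is an assumption for illustration.
upgrade_node() {
    echo "pkill glusterfsd"                  # stop the brick processes
    echo "pkill glusterfs"                   # stop auxiliary daemons (self-heal, etc.)
    echo "systemctl stop glusterd"           # stop the management daemon
    echo "yum -y update glusterfs-server"    # move the node to glusterfs 6.0.x
    echo "systemctl start glusterd"          # bricks restart and self-heal begins
}
upgrade_node
```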
Verified with the RHGS 3.5.1 interim build (glusterfs-6.0-24.el7rhgs) using the following steps:
1. Created a 6 node trusted storage pool ( gluster cluster ) with RHGS 3.3.1 ( glusterfs-3.8.4-54.15.el7rhgs )
2. Created 1x(4+2) and 2x(4+2) disperse volumes
3. Disabled disperse.eager-lock and disperse.optimistic-change-log
4. Mounted the volumes from 2 clients
5. Started the kernel untar workload
6. Killed the glusterfsd (brick), glusterfs and glusterd processes on node1 ( # pkill glusterfsd; pkill glusterfs; systemctl stop glusterd )
7. Performed the upgrade to glusterfs-6.0-24.el7rhgs
8. After the successful upgrade, started glusterd
9. Waited for self-heal to complete on both disperse volumes
10. Repeated steps 6 to 9 on the other nodes and monitored the progress of the kernel untar workload after each node's upgrade completed
Result: the kernel untar workload kept running throughout with no interruption.
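The self-heal wait performed before moving to the next node can be sketched as a dry run, assuming the volume names from this report (commands echoed, not executed; a real run would repeat the heal-info query until every brick reports zero entries):

```shell
# Dry-run sketch of the self-heal completion check done before upgrading
# the next node. Volume names are assumed from the reproduction steps.
for vol in ecvolume distecvol; do
    # In a real run, repeat until every brick shows "Number of entries: 0".
    echo "gluster volume heal $vol info"
done
```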
With these steps, marking this bug as verified.
After upgrading the servers, I also bumped up the op-version to 70000,
and unmounted the client, upgraded it, and remounted the disperse volumes.
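The post-upgrade op-version bump and client remount can be sketched as a dry run (commands echoed, not executed; the server name and mount point are hypothetical):

```shell
# Dry-run sketch of the cluster-wide op-version bump and client remount.
# "server1" and "/mnt/ecvolume" are illustrative names, not from the report.
echo "gluster volume set all cluster.op-version 70000"    # cluster-wide bump
echo "umount /mnt/ecvolume"                               # unmount before client upgrade
echo "yum -y update glusterfs-fuse"                       # upgrade the client packages
echo "mount -t glusterfs server1:/ecvolume /mnt/ecvolume" # remount the volume
```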
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.