Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1658472

Summary: Mount point not accessible for a few seconds when bricks are brought down to max redundancy after reset-brick
Product: [Community] GlusterFS Reporter: Upasana <ubansal>
Component: disperse    Assignee: Ashish Pandey <aspandey>
Status: CLOSED UPSTREAM QA Contact:
Severity: low Docs Contact:
Priority: low    
Version: 3.12    CC: bugs, pasik
Target Milestone: ---    Keywords: Automation, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1658451 Environment:
Last Closed: 2020-03-18 05:58:57 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1658451    
Bug Blocks:    

Description Upasana 2018-12-12 08:43:23 UTC
+++ This bug was initially created as a clone of Bug #1658451 +++

Description of problem:
========================
Wrote an automation script for reset-brick on an EC volume; it was failing 2 out of 11 times when getting the arequal checksum after bringing bricks down to max redundancy.
So an ls -lrt /mnt was added before getting the arequal at this point, and the logs show that the mount point is not accessible:

2018-12-12 12:28:11,941 INFO (run) root.35.11 (cp): ls -lrt /mnt
2018-12-12 12:28:11,941 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.35.11

2018-12-12 12:28:12,432 INFO (_log_results) RETCODE (root.35.11): 1
2018-12-12 12:28:12,433 INFO (_log_results) STDOUT (root.35.11)...
total 0
d?????????? ? ?    ?    ?            ? testvol_dispersed_glusterfs
drwxr-xr-x. 2 root root 6 Dec 11 02:22 tmp

2018-12-12 12:28:12,433 INFO (_log_results) STDERR (root.35.11)...
ls: cannot access /mnt/testvol_dispersed_glusterfs: Transport endpoint is not connected




Version-Release number of selected component (if applicable):
=============================================================
3.4


How reproducible:
================
Downstream - 2/11
Upstream - 2/2

Steps to Reproduce:
===================
Create an EC volume and mount it (a rough command-level sketch of the core steps follows this list)
        - Create IO on dir2 of the volume mount point
        - Reset brick start
        - Check if the brick is offline
        - Reset brick with destination same as source, with force, while IOs are running
        - Validate IOs and wait for them to complete on dir2
        - Remove dir2
        - Create 5 directories and 5 files in dir1 of the mount point
        - Rename all files inside dir1 at the mount point
        - Create softlinks and hardlinks of files in dir1 of the mount point
        - Delete all files in one of the directories inside dir1
        - Change permissions, owner and group (chmod, chown, chgrp)
        - Create tiny, small, medium and large files
        - Create IOs
        - Validate IOs and wait for them to complete
        - Calculate arequal before killing the brick
        - Get a brick from the volume
        - Reset brick
        - Check if the brick is offline
        - Reset brick giving a different source and destination node --> fails (expected)
        - Reset brick with destination and source the same, without force --> fails (expected)
        - Obtain the hostname
        - Reset brick with destination and source the same, with force, using the hostname --> succeeds
        - Monitor heal completion
        - Bring down other bricks to max redundancy
        - Get arequal after bringing down the bricks
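
For reference, a rough command-level sketch of the core reset-brick sequence is given below. The volume name, hostnames, brick paths and the 4+2 disperse layout are assumptions for illustration; the actual run is driven by the glusto-tests automation.

# Create and start a dispersed (4+2) volume, then mount it on the client
gluster volume create testvol_dispersed disperse 6 redundancy 2 \
    server{1..6}:/bricks/brick1/testvol_dispersed force
gluster volume start testvol_dispersed
mount -t glusterfs server1:/testvol_dispersed /mnt/testvol_dispersed_glusterfs

# Reset-brick cycle with source and destination the same, committed with force
gluster volume reset-brick testvol_dispersed server1:/bricks/brick1/testvol_dispersed start
gluster volume status testvol_dispersed        # the brick should now show as offline
gluster volume reset-brick testvol_dispersed \
    server1:/bricks/brick1/testvol_dispersed \
    server1:/bricks/brick1/testvol_dispersed commit force
gluster volume heal testvol_dispersed info     # wait for heal to complete

# Bring down additional bricks up to the redundancy count by killing their
# brick processes (PIDs taken from 'gluster volume status'), then check the mount
kill -15 <brick-pid>
ls -lrt /mnt
arequal-checksum -p /mnt/testvol_dispersed_glusterfs -i .trashcan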


Actual results:
================
Getting the arequal checksum fails with the error below:
2018-12-12 12:28:12,435 INFO (run_async) root.35.11 (cp): arequal-checksum -p /mnt/testvol_dispersed_glusterfs -i .trashcan
2018-12-12 12:28:12,436 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.35.11
2018-12-12 12:28:13,117 INFO (_log_results) RETCODE (root.35.11): 1
2018-12-12 12:28:13,119 INFO (_log_results) STDERR (root.35.11)...
ftw (-p) returned -1 (Transport endpoint is not connected), terminating
2018-12-12 12:28:13,119 ERROR (collect_mounts_arequal) Collecting arequal-checksum failed on 10.70.35.11:/mnt/testvol_dispersed_glusterfs
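
If this is hit again, the fuse client log on the mount may show the disconnect window. Assuming the usual glusterfs log naming (mount path with '/' replaced by '-'), something like:

# Look for client-side disconnect / ENOTCONN messages around the failure time
grep -iE "disconnect|transport endpoint" /var/log/glusterfs/mnt-testvol_dispersed_glusterfs.log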


Expected results:
=================
Getting the arequal checksum should succeed; the mount point should remain accessible when bricks are brought down only up to the maximum redundancy.


Additional info:
=================
The issue is seen only for a few seconds, after which the mount point becomes accessible again, so it is very difficult to reproduce manually.

Tried a couple of times, but was not able to reproduce the issue manually on downstream.
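
Since the window lasts only a few seconds, one way to try to catch it manually is to poll the mount in a tight loop right after bringing the bricks down (mount path as in the logs above; this loop is only a suggestion, not part of the automated run):

# Probe the mount once per second for a minute and log any inaccessible window
for i in $(seq 1 60); do
    if ! ls /mnt/testvol_dispersed_glusterfs >/dev/null 2>&1; then
        echo "$(date +%T): mount not accessible"
    fi
    sleep 1
done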

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-12-12 07:32:52 UTC ---

This bug is automatically being proposed for a Z-stream release of Red Hat Gluster Storage 3 under active development and open for bug fixes, by setting the release flag 'rhgs-3.4.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Upasana on 2018-12-12 08:32:33 UTC ---

CentOS logs - https://ci.centos.org/job/gluster_glusto-patch-check/1019/artifact/glustomain.log/*view*/

--- Additional comment from Upasana on 2018-12-12 08:39:01 UTC ---

sosreport from the downstream run's setup - http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/ubansal/1658451/ (10.70.35.11 was also used as the client)

Comment 1 Worker Ant 2020-03-18 05:58:57 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/1122 and will be tracked there from now on. Visit the GitHub issue URL for further details.