Description of problem:
=======================
Wrote an automation script for reset brick on an EC (dispersed) volume. It failed 2 out of 11 times while collecting arequal after bringing down bricks to max redundancy. Added an "ls -lrt /mnt" before the arequal call at that point, and the logs show that the mount point is not accessible:

2018-12-12 12:28:11,941 INFO (run) root.35.11 (cp): ls -lrt /mnt
2018-12-12 12:28:11,941 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.35.11
2018-12-12 12:28:12,432 INFO (_log_results) RETCODE (root.35.11): 1
2018-12-12 12:28:12,433 INFO (_log_results) STDOUT (root.35.11)...
total 0
d?????????? ? ?    ?    ? ? testvol_dispersed_glusterfs
drwxr-xr-x. 2 root root 6 Dec 11 02:22 tmp
2018-12-12 12:28:12,433 INFO (_log_results) STDERR (root.35.11)...
ls: cannot access /mnt/testvol_dispersed_glusterfs: Transport endpoint is not connected

Version-Release number of selected component (if applicable):
=============================================================
3.4

How reproducible:
=================
Downstream - 2/11
Upstream - 2/2

Steps to Reproduce:
===================
- Create an EC volume and mount it
- Start IO on dir2 of the volume mount point
- Reset brick start
- Check that the brick is offline
- Reset brick with destination same as source, with force, while IOs are running
- Validate the IOs and wait for them to complete on dir2
- Remove dir2
- Create 5 directories and 5 files in dir1 of the mount point
- Rename all files inside dir1 at the mount point
- Create softlinks and hardlinks of the files in dir1 of the mount point
- Delete all files in one of the directories inside dir1
- Change permissions, ownership and group (chmod, chown, chgrp)
- Create tiny, small, medium and large files
- Start IOs
- Validate the IOs and wait for them to complete
- Calculate arequal before killing the brick
- Get a brick from the volume
- Reset brick start
- Check that the brick is offline
- Reset brick giving different source and destination nodes --> fails (expected)
- Reset brick with destination and source the same, without force --> fails (expected)
- Obtain the hostname
- Reset brick with destination and source the same, with force, using the hostname --> succeeds
- Monitor heal completion
- Bring down other bricks up to max redundancy
- Get arequal after bringing down the bricks

Actual results:
===============
Getting arequal fails with the below error:

2018-12-12 12:28:12,435 INFO (run_async) root.35.11 (cp): arequal-checksum -p /mnt/testvol_dispersed_glusterfs -i .trashcan
2018-12-12 12:28:12,436 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.35.11
2018-12-12 12:28:13,117 INFO (_log_results) RETCODE (root.35.11): 1
2018-12-12 12:28:13,119 INFO (_log_results) STDERR (root.35.11)...
ftw (-p) returned -1 (Transport endpoint is not connected), terminating
2018-12-12 12:28:13,119 ERROR (collect_mounts_arequal) Collecting arequal-checksum failed on 10.70.35.11:/mnt/testvol_dispersed_glusterfs

Expected results:
=================
The arequal collection should pass.

Additional info:
================
The mount point is inaccessible only for a few seconds, after which it becomes accessible again, so the issue is very difficult to reproduce manually. Tried a couple of times but could not reproduce it manually on downstream.
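The reset-brick portion of the steps above can be sketched with the gluster CLI; the volume and brick names here are placeholders, not the actual test setup:

```shell
# Hypothetical names; substitute the real volume and brick path.
VOL=testvol_dispersed
BRICK=server1:/bricks/brick0

# Start the reset; this takes the brick offline.
gluster volume reset-brick $VOL $BRICK start

# Commit with the same source and destination, without force
# --> fails (expected).
gluster volume reset-brick $VOL $BRICK $BRICK commit

# Commit with the same source and destination, with force
# --> succeeds.
gluster volume reset-brick $VOL $BRICK $BRICK commit force
```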
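Since the mount is only briefly in the "Transport endpoint is not connected" state, one possible workaround in the automation script is to poll the mount point for accessibility before collecting arequal. A minimal sketch; the function name, timeout, and interval are assumptions, not taken from the test code:

```python
import os
import time


def wait_for_mount(mountpoint, timeout=60, interval=2):
    """Poll until the mount point is accessible again.

    Hypothetical helper: works around the transient
    "Transport endpoint is not connected" window seen after
    bringing down bricks to max redundancy.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # listdir raises OSError while the FUSE mount is disconnected
            os.listdir(mountpoint)
            return True
        except OSError:
            time.sleep(interval)
    return False
```

The test could call this right before collect_mounts_arequal and fail with a clearer message if the mount never comes back within the timeout, instead of failing inside arequal-checksum itself.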