Bug 1658451
| Summary: | Mountpoint not accessible for few seconds when bricks are brought down to max redundancy after reset brick | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Upasana <ubansal> | |
| Component: | disperse | Assignee: | Ashish Pandey <aspandey> | |
| Status: | CLOSED WORKSFORME | QA Contact: | Nag Pavan Chilakam <nchilaka> | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | rhgs-3.4 | CC: | pkarampu, rhinduja, rhs-bugs, storage-qa-internal, ubansal, vavuthu | |
| Target Milestone: | --- | Keywords: | Automation, ZStream | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1658472 | Environment: | ||
| Last Closed: | 2020-02-04 06:18:56 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1658472 | |||
Description of problem:
========================
I wrote an automation script for reset brick on an EC volume. It failed 2 out of 11 times while getting arequal after bringing bricks down to max redundancy. I therefore added an `ls -lrt /mnt` just before getting arequal at that point, and the logs show that the mount point is not accessible:

2018-12-12 12:28:11,941 INFO (run) root.35.11 (cp): ls -lrt /mnt
2018-12-12 12:28:11,941 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.35.11
2018-12-12 12:28:12,432 INFO (_log_results) RETCODE (root.35.11): 1
2018-12-12 12:28:12,433 INFO (_log_results) STDOUT (root.35.11)...
total 0
d?????????? ? ? ? ? ? testvol_dispersed_glusterfs
drwxr-xr-x. 2 root root 6 Dec 11 02:22 tmp
2018-12-12 12:28:12,433 INFO (_log_results) STDERR (root.35.11)...
ls: cannot access /mnt/testvol_dispersed_glusterfs: Transport endpoint is not connected

Version-Release number of selected component (if applicable):
=============================================================
3.4

How reproducible:
=================
Downstream - 2/11
Upstream - 2/2

Steps to Reproduce:
===================
- Create an EC volume and mount the volume
- Create IO on dir2 of the volume mountpoint
- Reset brick start
- Check if the brick is offline
- Reset brick with destination same as source, with force, while IO is running
- Validate IO and wait for it to complete on dir2
- Remove dir2
- Create 5 directories and 5 files in dir1 of the mountpoint
- Rename all files inside dir1 at the mountpoint
- Create softlinks and hardlinks of files in dir1 of the mountpoint
- Delete all files in one of the directories inside dir1
- Change permissions, owner, and group (chmod, chown, chgrp)
- Create tiny, small, medium, and large files
- Create IO
- Validate IO and wait for it to complete
- Calculate arequal before killing the brick
- Get brick from the volume
- Reset brick
- Check if the brick is offline
- Reset brick by giving a different source and dst node --> fails (Expected)
- Reset brick by giving dst and source same without force --> fails (Expected)
- Obtain hostname
- Reset brick with dst and source same, with force, using hostname --> successful
- Monitor heal completion
- Bring down other bricks to max redundancy
- Get arequal after bringing down bricks

Actual results:
================
Getting arequal fails with the error below:

2018-12-12 12:28:12,435 INFO (run_async) root.35.11 (cp): arequal-checksum -p /mnt/testvol_dispersed_glusterfs -i .trashcan
2018-12-12 12:28:12,436 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.35.11
2018-12-12 12:28:13,117 INFO (_log_results) RETCODE (root.35.11): 1
2018-12-12 12:28:13,119 INFO (_log_results) STDERR (root.35.11)...
ftw (-p) returned -1 (Transport endpoint is not connected), terminating
2018-12-12 12:28:13,119 ERROR (collect_mounts_arequal) Collecting arequal-checksum failed on 10.70.35.11:/mnt/testvol_dispersed_glusterfs

Expected results:
=================
Getting arequal should pass.

Additional info:
=================
The issue is seen only for a few seconds, after which the mountpoint becomes accessible again, so it is very difficult to reproduce manually. I tried a couple of times but was not able to reproduce the issue manually on downstream.
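
Because the failure window lasts only a few seconds, it may help triage to measure how long the mount stays inaccessible right after the bricks are brought down. The following is a minimal sketch of such a probe, not part of the original automation: the function name `time_until_accessible` and its parameters are hypothetical, and it assumes the check runs on the client where the volume is mounted. It polls the path until a directory listing succeeds and records the errno of each failed attempt (for example `ENOTCONN`, "Transport endpoint is not connected").

```python
import errno
import os
import time


def time_until_accessible(path, timeout=30.0, interval=0.5,
                          probe=os.listdir, sleep=time.sleep,
                          clock=time.monotonic):
    """Poll `path` until `probe(path)` succeeds.

    Returns (seconds_waited, errnos_seen). Raises TimeoutError if the
    path is still failing once `timeout` seconds have elapsed.
    All names here are illustrative, not from the reporter's script.
    """
    start = clock()
    errnos_seen = []
    while True:
        try:
            probe(path)  # e.g. os.listdir on the mountpoint
            return clock() - start, errnos_seen
        except OSError as exc:
            # Record why the attempt failed (ENOTCONN in this report).
            errnos_seen.append(exc.errno)
            if clock() - start >= timeout:
                raise TimeoutError(
                    f"{path} still inaccessible after {timeout}s"
                ) from exc
            sleep(interval)


if __name__ == "__main__":
    # Hypothetical usage against the mountpoint from the logs above.
    mount = "/mnt/testvol_dispersed_glusterfs"
    waited, errs = time_until_accessible(mount)
    names = [errno.errorcode.get(e, str(e)) for e in errs]
    print(f"accessible after {waited:.2f}s; failed attempts: {names}")
```

Logging the measured window alongside the arequal step would show whether the failure is a short transient after the bricks go down or a longer outage, without hiding the underlying problem.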