Bug 1451602
Summary: | Brick Multiplexing:Even clean Deleting of the brick directories of base volume is resulting in posix health check errors(just as we see in ungraceful delete methods) | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> | |
Component: | core | Assignee: | Mohit Agrawal <moagrawa> | |
Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> | |
Severity: | urgent | Docs Contact: | ||
Priority: | unspecified | |||
Version: | rhgs-3.3 | CC: | amukherj, rhs-bugs, storage-qa-internal | |
Target Milestone: | --- | |||
Target Release: | RHGS 3.3.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | brick-multiplexing | |||
Fixed In Version: | glusterfs-3.8.4-28 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1459781 | Environment: | ||
Last Closed: | 2017-09-21 04:43:23 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1417151, 1457219, 1459781, 1461647 |
Description
Nag Pavan Chilakam
2017-05-17 06:46:06 UTC
upstream patch : https://review.gluster.org/17356
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/108021/

I am still seeing this issue on 3.8.4-27.
I deleted the directory of the base volume after deleting the base volume, and saw posix errors:

Broadcast message from systemd-journald.eng.blr.redhat.com (Wed 2017-06-07 19:12:25 IST):
rhs-brick30-test3_30[23121]: [2017-06-07 13:42:25.016770] M [MSGID: 113075] [posix-helpers.c:1905:posix_health_check_thread_proc] 0-test3_30-posix: health-check failed, going down

Message from syslogd@localhost at Jun 7 19:12:25 ...
rhs-brick30-test3_30[23121]:[2017-06-07 13:42:25.016770] M [MSGID: 113075] [posix-helpers.c:1905:posix_health_check_thread_proc] 0-test3_30-posix: health-check failed, going down

I was testing 1451598 - Brick Multiplexing: Deleting brick directories of the base volume must gracefully detach from glusterfsd without impacting other volumes IO(currently seeing transport end point error)
Hence moving to failed_qa.

RCA:
Going by the log message you can call this failed-QA, but from a functionality point of view it has not failed.

Why?
Earlier, when you raised the bugzilla, the main issue was that the thread was not cleaned up properly even after the brick was brought down gracefully, and that was a huge memory leak. That issue was resolved by the patch https://review.gluster.org/17458. The remaining issue is that the message is still shown after the brick is removed. It is not high priority, but we are in the development phase, so I will post a patch for it.

Why was it missed in our testing?
Below is the code that monitors the health-check file. It sleeps for 30 seconds before starting the activity and then defers the cancel signal. In my testing I stopped the volume just after starting it and removed the brick from the backend. The test steps finished before the function that monitors the file was called, so it passed in my testing.
>>>>>>>>>>>>>>>
while (1) {
        /* aborting sleep() is a request to exit this thread, sleep()
         * will normally not return when cancelled */
        ret = sleep (interval);
        if (ret > 0)
                break;
        /* prevent thread errors while doing the health-check(s) */
        pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, NULL);

        /* Do the health-check.*/
        ret = posix_fs_health_check (this);
        if (ret < 0)
                goto abort;
        pthread_setcancelstate (PTHREAD_CANCEL_ENABLE, NULL);
}
>>>>>>>>>>>>>>>>

Regards
Mohit Agrawal

Upstream patch link:
REVIEW: https://review.gluster.org/17492 (glusterfsd: Deletion of brick dir throw emerg msgs after stop volume) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

downstream patch : https://code.engineering.redhat.com/gerrit/#/c/108719/

onqa validation:
not seeing posix warnings when repeating above testcase (as mentioned in description), hence moving to verified
testversion: 3.8.4-32

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:2774
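
Illustration of the timing gap described in the RCA above. This is a minimal standalone sketch, not GlusterFS code: `struct checker`, `fs_health_check` and `run_scenario` are hypothetical names, a `stat()` on a temporary directory stands in for `posix_fs_health_check`, and a short interval is assumed in place of the 30 seconds mentioned in the RCA. It only shows why a test that finishes before the first sleep interval elapses never sees a failed check, while a longer observation does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical state for this sketch; not a GlusterFS structure. */
struct checker {
    char path[64];      /* temporary directory standing in for the brick */
    unsigned interval;  /* seconds slept before every check */
    int checks_run;     /* read by the caller only after pthread_join() */
    int failed;         /* 1 once a check has failed */
};

/* Stand-in for posix_fs_health_check(): fails once the directory is gone. */
static int fs_health_check(const char *path)
{
    struct stat st;
    return stat(path, &st);
}

static void *health_check_thread(void *arg)
{
    struct checker *c = arg;

    for (;;) {
        /* sleep() is a cancellation point, so a pending pthread_cancel()
         * terminates the thread here, before any check runs. */
        if (sleep(c->interval) > 0)
            break;

        /* No cancellation while a check is in progress. */
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
        c->checks_run++;
        if (fs_health_check(c->path) < 0) {
            c->failed = 1;  /* the real thread logs "health-check failed" here */
            break;
        }
        pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    }
    return NULL;
}

/* Start the checker, delete the directory at once, keep observing for
 * observe_s seconds, then cancel and join the thread.
 * Returns 1 if a failed check was seen, 0 if not, -1 on setup error. */
static int run_scenario(unsigned interval, unsigned observe_s)
{
    struct checker c = { "/tmp/hc-sketch-XXXXXX", interval, 0, 0 };
    pthread_t tid;

    assert(interval > 0);   /* sleep(0) would turn the loop into a busy loop */

    if (mkdtemp(c.path) == NULL)
        return -1;
    if (pthread_create(&tid, NULL, health_check_thread, &c) != 0) {
        rmdir(c.path);
        return -1;
    }
    rmdir(c.path);          /* "remove the brick from the backend" */
    sleep(observe_s);       /* how long the test keeps watching */
    pthread_cancel(tid);    /* harmless if the thread has already returned */
    pthread_join(tid, NULL);
    return c.failed;
}
```

With a 1-second interval, `run_scenario(1, 0)` returns 0 because the thread is cancelled inside its first `sleep()`, and `run_scenario(1, 2)` returns 1 because the first check runs against the already deleted directory. Build with `-pthread`.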