Bug 1302199
| Summary: | Scrubber crash (list corruption) | |||
|---|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Venky Shankar <vshankar> | |
| Component: | bitrot | Assignee: | bugs <bugs> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | ||
| Severity: | unspecified | Docs Contact: | bugs <bugs> | |
| Priority: | unspecified | |||
| Version: | 3.7.7 | CC: | bugs, khiremat, manu, rabhat, vbellur | |
| Target Milestone: | --- | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | glusterfs-3.7.7 | Doc Type: | Bug Fix | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1302201 | Environment: | ||
| Last Closed: | 2016-04-19 07:53:11 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1302201 | |||
_br_scrubber_find_scrubbable_entry() does a pthread_cond_wait(...) to get signalled when ->scrublist is non-empty:
```c
if (list_empty (&fsscrub->scrublist))
        pthread_cond_wait (&fsscrub->cond, &fsscrub->mutex);
```
pthread_cond_wait() is prone to spurious wakeups, as mentioned in the pthread_cond_wait(3) man page, and callers are expected to re-check the condition after it returns. In the above case, if pthread_cond_wait() returns prematurely, ->scrublist may still be empty, so accessing its first element with list_entry() yields garbage.
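Why is the result garbage rather than NULL? Here is a minimal sketch. It assumes libglusterfs's list.h follows the Linux-kernel-style circular list. In that design an empty head points to itself, and list_first_entry() is list_entry() applied to head->next. The types struct scrub_entry and struct fs_scrub and the function empty_first_entry_is_head() are hypothetical stand-ins, not the scrubber's real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified kernel-style circular list (assumed to match the shape of
 * libglusterfs/src/list.h; not copied from it). */
struct list_head {
        struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
        head->next = head->prev = head;   /* empty list: head points to itself */
}

/* Recover the enclosing structure from a pointer to its list member. */
#define list_entry(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* First element = list_entry() applied to head->next. */
#define list_first_entry(head, type, member) \
        list_entry((head)->next, type, member)

/* Hypothetical stand-ins: a queued item, and the structure owning the head. */
struct scrub_entry {
        int id;
        struct list_head list;
};

struct fs_scrub {
        int other_state;
        struct list_head scrublist;
};

/* Returns 1 when list_first_entry() on an empty list yields a pointer whose
 * list member is the head itself, i.e. not a real entry. */
int empty_first_entry_is_head(void)
{
        struct fs_scrub fs;
        struct scrub_entry *bogus;

        INIT_LIST_HEAD(&fs.scrublist);
        bogus = list_first_entry(&fs.scrublist, struct scrub_entry, list);

        /* Only compare addresses. Reading bogus->id would reinterpret the
         * bytes around the head as an entry, i.e. garbage. */
        return (struct list_head *)((char *)bogus +
                                    offsetof(struct scrub_entry, list)) ==
               &fs.scrublist;
}
```

On an empty list, the computed "entry" is the list head reinterpreted as a struct scrub_entry. Any field read through that pointer is meaningless, so a missed emptiness check is silent rather than an immediate NULL dereference.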
REVIEW: http://review.gluster.org/13307 (features / bitrot: Prevent spurious pthread_cond_wait() wakeup) posted (#1) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)

COMMIT: http://review.gluster.org/13307 committed in release-3.7 by Venky Shankar (vshankar)

------

commit f5ff11159544aa16b44965a0ce4500e9c615895d
Author: Venky Shankar <vshankar>
Date: Wed Jan 27 17:04:18 2016 +0530

features / bitrot: Prevent spurious pthread_cond_wait() wakeup

Backport of http://review.gluster.org/13302

pthread_cond_wait() is prone to spurious wakeups and it's absolutely necessary to check a boolean predicate for thread continuation. See man(3) pthread_cond_wait() for details.

The following is done in the bitrot scrubber:

if (list_empty (&fsscrub->scrublist))
pthread_cond_wait (&fsscrub->cond, &fsscrub->mutex);

followed by:

list_first_entry (&fsscrub->scrublist, ...)

A spurious wakeup from pthread_cond_wait() in the absence of a list_empty() re-check causes list_first_entry() to return garbage.

BUG: 1302199
Change-Id: I60151eabb8af257a35acd8e7c117876388166a0e
Signed-off-by: Venky Shankar <vshankar>
Reviewed-on: http://review.gluster.org/13307
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
Tested-by: Pranith Kumar Karampuri <pkarampu>
Smoke: Gluster Build System <jenkins.com>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.com>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.7, please open a new bug report.

glusterfs-3.7.7 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-February/025292.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
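The fix described in the commit message amounts to replacing the if with a predicate loop. What follows is a minimal sketch, not the actual bit-rot-scrub.c code. It assumes a simplified kernel-style list. The types struct fs_scrub and struct scrub_entry and the helpers pick_entry(), add_entry() and demo_pick() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified kernel-style circular list (assumption, not libglusterfs code). */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }
static void list_add_tail(struct list_head *n, struct list_head *h)
{
        n->prev = h->prev;
        n->next = h;
        h->prev->next = n;
        h->prev = n;
}
static void list_del_init(struct list_head *e)
{
        e->prev->next = e->next;
        e->next->prev = e->prev;
        INIT_LIST_HEAD(e);
}
#define list_first_entry(head, type, member) \
        ((type *)((char *)(head)->next - offsetof(type, member)))

/* Hypothetical stand-ins for the scrubber's entry and fsscrub types. */
struct scrub_entry { int id; struct list_head list; };
struct fs_scrub {
        pthread_mutex_t mutex;
        pthread_cond_t cond;
        struct list_head scrublist;
};

/* Consumer: wait until "scrublist is non-empty" holds, then detach the
 * first entry. 'while', not 'if': the predicate is re-checked after every
 * return from pthread_cond_wait(), spurious or not. */
static struct scrub_entry *pick_entry(struct fs_scrub *fs)
{
        struct scrub_entry *e;

        pthread_mutex_lock(&fs->mutex);
        while (list_empty(&fs->scrublist))
                pthread_cond_wait(&fs->cond, &fs->mutex);
        assert(!list_empty(&fs->scrublist));   /* guaranteed by the loop */
        e = list_first_entry(&fs->scrublist, struct scrub_entry, list);
        list_del_init(&e->list);
        pthread_mutex_unlock(&fs->mutex);
        return e;
}

/* Producer: enqueue under the mutex, then signal the waiter. */
static void add_entry(struct fs_scrub *fs, struct scrub_entry *e)
{
        pthread_mutex_lock(&fs->mutex);
        list_add_tail(&e->list, &fs->scrublist);
        pthread_cond_signal(&fs->cond);
        pthread_mutex_unlock(&fs->mutex);
}

static void *consumer(void *arg)
{
        return pick_entry(arg);
}

/* Start a consumer on an empty list, enqueue one entry, return its id. */
int demo_pick(void)
{
        struct fs_scrub fs;
        struct scrub_entry item = { .id = 42 };
        pthread_t t;
        void *picked;

        pthread_mutex_init(&fs.mutex, NULL);
        pthread_cond_init(&fs.cond, NULL);
        INIT_LIST_HEAD(&fs.scrublist);

        pthread_create(&t, NULL, consumer, &fs);
        add_entry(&fs, &item);
        pthread_join(t, &picked);

        pthread_cond_destroy(&fs.cond);
        pthread_mutex_destroy(&fs.mutex);
        return ((struct scrub_entry *)picked)->id;
}
```

demo_pick() returns 42. The consumer blocks until the entry is enqueued, and the while loop ensures list_first_entry() only runs on a non-empty list. Build with -pthread.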
Description of problem:
Emmanuel reported a scrubber crash on NetBSD. The backtrace shows list corruption when the bitrot scrubber tries to fetch an item to scrub from a set of bricks.

Backtrace:
(gdb) bt
#0 0xbb213b74 in list_del_init (old=0x0) at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/libglusterfs/src/list.h:87
#1 0xbb21682f in _br_scrubber_get_entry (child=0xbb106924, fsentry=0xb84fcfc0) at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:1033
#2 0xbb2168b0 in _br_scrubber_find_scrubbable_entry (fsscrub=0xbb106cf0, fsentry=0xb84fcfc0) at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:1055
#3 0xbb216959 in br_scrubber_pick_entry (fsscrub=0xbb106cf0, fsentry=0xb84fcfc0) at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:1077
#4 0xbb216b0f in br_scrubber_proc (arg=<error reading variable: Cannot access memory at address 0xb84fcfd8>) at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c:1153

Version-Release number of selected component (if applicable):
3.7

How reproducible:
Intermittently

Steps to Reproduce:
Run the following test case: ./tests/bitrot/br-state-check.t

Actual results:
The test case fails at times and the scrubber crashes.

Expected results:
The test case should pass (and generate no cores).

Additional info:
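A premature wakeup depends on timing, which fits the intermittent failure. One way to exercise that path deterministically is to wake the waiter repeatedly while the list is still empty. To the consumer, a broadcast with no state change looks the same as a spurious wakeup. The harness below is a hypothetical sketch, not taken from ./tests/bitrot/br-state-check.t. The names demo_injected_wakeups(), waiting, empty_wakeups and progress are illustrative, and the list is a simplified kernel-style assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }
static void list_add_tail(struct list_head *n, struct list_head *h)
{
        n->prev = h->prev; n->next = h; h->prev->next = n; h->prev = n;
}
static void list_del_init(struct list_head *e)
{
        e->prev->next = e->next; e->next->prev = e->prev; INIT_LIST_HEAD(e);
}
#define list_first_entry(head, type, member) \
        ((type *)((char *)(head)->next - offsetof(type, member)))

struct scrub_entry { int id; struct list_head list; };

/* Hypothetical queue with test-only bookkeeping. */
struct fs_scrub {
        pthread_mutex_t mutex;
        pthread_cond_t cond;       /* "scrublist became non-empty" */
        pthread_cond_t progress;   /* test-only: consumer state changed */
        struct list_head scrublist;
        int waiting;               /* consumer is parked in cond wait */
        int empty_wakeups;         /* wakeups that still found the list empty */
};

static void *consumer(void *arg)
{
        struct fs_scrub *fs = arg;
        struct scrub_entry *e;

        pthread_mutex_lock(&fs->mutex);
        while (list_empty(&fs->scrublist)) {
                fs->waiting = 1;
                pthread_cond_signal(&fs->progress);
                pthread_cond_wait(&fs->cond, &fs->mutex);
                fs->waiting = 0;
                /* With a bare 'if', this case would fall through to
                 * list_first_entry() on an empty list. */
                if (list_empty(&fs->scrublist))
                        fs->empty_wakeups++;
        }
        e = list_first_entry(&fs->scrublist, struct scrub_entry, list);
        list_del_init(&e->list);
        pthread_mutex_unlock(&fs->mutex);
        return e;
}

/* Inject n wakeups while the list is empty, then enqueue one real entry.
 * Returns the picked id; *empty_wakeups receives the observed count. */
int demo_injected_wakeups(int n, int *empty_wakeups)
{
        struct fs_scrub fs = { .waiting = 0, .empty_wakeups = 0 };
        struct scrub_entry item = { .id = 7 };
        pthread_t t;
        void *picked;
        int i;

        pthread_mutex_init(&fs.mutex, NULL);
        pthread_cond_init(&fs.cond, NULL);
        pthread_cond_init(&fs.progress, NULL);
        INIT_LIST_HEAD(&fs.scrublist);
        pthread_create(&t, NULL, consumer, &fs);

        pthread_mutex_lock(&fs.mutex);
        for (i = 0; i < n; i++) {
                /* Wait until the consumer has handled i wakeups and is parked. */
                while (!(fs.waiting && fs.empty_wakeups >= i))
                        pthread_cond_wait(&fs.progress, &fs.mutex);
                pthread_cond_broadcast(&fs.cond);   /* predicate still false */
        }
        while (fs.empty_wakeups < n)
                pthread_cond_wait(&fs.progress, &fs.mutex);
        list_add_tail(&item.list, &fs.scrublist);
        pthread_cond_signal(&fs.cond);
        pthread_mutex_unlock(&fs.mutex);

        pthread_join(t, &picked);
        *empty_wakeups = fs.empty_wakeups;
        pthread_cond_destroy(&fs.progress);
        pthread_cond_destroy(&fs.cond);
        pthread_mutex_destroy(&fs.mutex);
        return ((struct scrub_entry *)picked)->id;
}
```

demo_injected_wakeups(5, &w) returns 7 and leaves w at 5 or more. Genuine spurious wakeups can only raise the count. The second condition variable only sequences the test. It makes sure each broadcast reaches a consumer that is actually blocked in pthread_cond_wait().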