Bug 1627610
Summary: | glusterd crash in regression build | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Sanju <srakonde>
Component: | glusterd | Assignee: | Sanju <srakonde>
Status: | CLOSED CURRENTRELEASE | QA Contact: |
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | mainline | CC: | amukherj, bugs
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | glusterfs-6.0 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1631418 1633552 (view as bug list) | Environment: |
Last Closed: | 2019-03-25 16:30:38 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1631418, 1633552 | |
Description
Sanju
2018-09-11 02:45:01 UTC
Root Cause:

From Thread 7:

```
#10 0x00007f50dd2801f9 in glusterd_store_volinfo (volinfo=0x902290, ac=GLUSTERD_VOLINFO_VER_AC_NONE) at /home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-store.c:1806
```

From Thread 1:

```
#10 0x00007f50dd2801f9 in glusterd_store_volinfo (volinfo=0x902290, ac=GLUSTERD_VOLINFO_VER_AC_NONE) at /home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-store.c:1806
```

From the above snippets of the "t a a bt" output, we can see that Thread 7 and Thread 1 are pointing to the same volinfo structure.

Source code of glusterd_store_volinfo_write:

```c
int32_t
glusterd_store_volinfo_write (int fd, glusterd_volinfo_t *volinfo)
{
        int32_t              ret = -1;
        gf_store_handle_t   *shandle = NULL;

        GF_ASSERT (fd > 0);
        GF_ASSERT (volinfo);
        GF_ASSERT (volinfo->shandle);

        shandle = volinfo->shandle;
        ret = glusterd_volume_exclude_options_write (fd, volinfo);
        if (ret)
                goto out;

        shandle->fd = fd;
        dict_foreach (volinfo->dict, _storeopts, shandle);
        dict_foreach (volinfo->gsync_slaves, _storeslaves, shandle);
        shandle->fd = 0;
out:
        gf_msg_debug (THIS->name, 0, "Returning %d", ret);
        return ret;
}
```

At Thread 1:

```
#8 0x00007f50dd27e211 in glusterd_store_volinfo_write (fd=8, volinfo=0x902290) at /home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-store.c:1157
```

glusterd_store_volinfo_write calls _storeopts, which in turn calls gf_store_save_value. _storeopts also has an assertion check for fd > 0. At glusterd_store_volinfo_write the fd value is 8.

```
#4 0x00007f50e882b341 in gf_store_save_value (fd=0, key=0x91bff0 "performance.client-io-threads", value=0x8bcc40 "off") at /home/jenkins/root/workspace/regression-test-burn-in/libglusterfs/src/store.c:344
```

From the above we can see that the fd value is 0.

At Thread 7:

```
#8  0x00007f50dd27edbf in glusterd_store_brickinfos (volinfo=0x902290, vol_fd=16) at /home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-store.c:1373
#9  0x00007f50dd27fa35 in glusterd_store_perform_volume_store (volinfo=0x902290) at /home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-store.c:1613
#10 0x00007f50dd2801f9 in glusterd_store_volinfo (volinfo=0x902290, ac=GLUSTERD_VOLINFO_VER_AC_NONE) at /home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-store.c:1806
#11 0x00007f50dd258a76 in glusterd_restart_bricks (opaque=0x0) at /home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-utils.c:6422
#12 0x00007f50e883111e in synctask_wrap () at /home/jenkins/root/workspace/regression-test-burn-in/libglusterfs/src/syncop.c:375
#13 0x00007f50e6e42030 in ?? () from ./lib64/libc.so.6
#14 0x0000000000000000 in ?? ()
```

In this stack, we can see glusterd_store_perform_volume_store calling glusterd_store_brickinfos. Before calling glusterd_store_brickinfos, glusterd_store_perform_volume_store calls glusterd_store_volinfo_write, which resets shandle->fd to 0. So Thread 7 reset the fd value to 0 while Thread 1 was still expecting fd > 0. This happens because glusterd_restart_bricks now runs in a separate synctask; glusterd_restart_bricks is visible in the Thread 7 backtrace.

A possible solution is to acquire a lock before writing, so that the store path becomes a critical section. The exact solution needs more exploration; a toy illustration of the race follows below.
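To make the race easier to see, here is a minimal, self-contained toy program. The names (store_handle, save_value, store_volinfo_write) are hypothetical stand-ins, not GlusterFS code: two threads share one handle, and each publishes its fd, "writes", and then resets fd to 0, mirroring how the restart-bricks synctask and the regular store path both drove glusterd_store_volinfo_write against the same volinfo->shandle.

```c
/* Toy illustration of the race described above (hypothetical names, not
 * GlusterFS code). Compile with: gcc -pthread race_sketch.c -o race_sketch
 */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

struct store_handle {
    int fd;                   /* shared and unprotected: the source of the race */
};

static struct store_handle shandle = { 0 };

/* Mimics _storeopts()/gf_store_save_value(): expects a valid descriptor. */
static void save_value(struct store_handle *h)
{
    /* In the real crash this was GF_ASSERT (fd > 0) firing with fd == 0. */
    assert(h->fd > 0 && "fd was reset to 0 by the other thread");
}

/* Mimics glusterd_store_volinfo_write(): publish fd, write, reset fd to 0. */
static void *store_volinfo_write(void *arg)
{
    int my_fd = (int)(long)arg;

    for (int i = 0; i < 100000; i++) {
        shandle.fd = my_fd;   /* thread A may run here ...                 */
        save_value(&shandle); /* ... after thread B already set fd back to 0 */
        shandle.fd = 0;
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    /* Two writers, like the glusterd_restart_bricks synctask racing with
     * the regular store path on the same volinfo. */
    pthread_create(&t1, NULL, store_volinfo_write, (void *)(long)8);
    pthread_create(&t2, NULL, store_volinfo_write, (void *)(long)16);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    puts("no assertion hit this run; the race is timing dependent");
    return 0;
}
```

As with the crash seen in the regression run, whether the assertion actually fires depends on thread scheduling, which is why the core shows up only occasionally.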
Link to the regression build in which the core was generated: https://build.gluster.org/job/regression-test-burn-in/4085/

REVIEW: https://review.gluster.org/21150 (glusterd: acquire write lock to update volinfo structure) posted (#1) for review on master by Sanju Rakonde

COMMIT: https://review.gluster.org/21150 committed in master by "Atin Mukherjee" <amukherj> with the commit message:

glusterd: acquire lock to update volinfo structure

Problem: With commit cb0339f92, we are using a separate synctask for restart_bricks. There can be a situation where two threads access the same volinfo structure at the same time and update it. This can leave volinfo with inconsistent values and cause assertion failures because of unexpected values.

Solution: While updating the volinfo structure, acquire a store_volinfo_lock, and release the lock only when the thread has completed its critical section (see the sketch at the end of this report).

Fixes: bz#1627610
Signed-off-by: Sanju Rakonde <srakonde>
Change-Id: I545e4e2368e3285d8f7aa28081ff4448abb72f5d

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/
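For reference, below is a minimal sketch of the locking pattern described in the commit message, under assumed names (volinfo_sketch_t, store_volinfo_lock, perform_volume_store, store_volinfo_locked); consult the review link above for the exact implementation in glusterd.

```c
/* Minimal sketch (assumed names, not the exact patch) of the approach in the
 * commit message: take a per-volinfo lock before entering the store path and
 * release it only after the critical section, so only one thread updates and
 * persists a given volinfo at a time. */
#include <pthread.h>

typedef struct {
    pthread_mutex_t store_volinfo_lock;   /* assumed: one lock per volume */
    /* ... rest of the volume info ... */
} volinfo_sketch_t;

/* Assumed initialisation point: done once when the volinfo is created. */
static void volinfo_sketch_init(volinfo_sketch_t *volinfo)
{
    pthread_mutex_init(&volinfo->store_volinfo_lock, NULL);
}

/* Stand-in for the real persistence work done under the lock. */
static int perform_volume_store(volinfo_sketch_t *volinfo)
{
    (void)volinfo;
    return 0;
}

static int store_volinfo_locked(volinfo_sketch_t *volinfo)
{
    int ret;

    pthread_mutex_lock(&volinfo->store_volinfo_lock);
    /* Critical section: the version bump and volinfo/brickinfo writes happen
     * here, so a concurrent restart-bricks synctask cannot reset shandle->fd
     * underneath another writer. */
    ret = perform_volume_store(volinfo);
    pthread_mutex_unlock(&volinfo->store_volinfo_lock);

    return ret;
}

int main(void)
{
    volinfo_sketch_t vol;

    volinfo_sketch_init(&vol);
    return store_volinfo_locked(&vol);
}
```

Serialising the whole store path per volume keeps readers and writers of the shared handle from interleaving, at the cost of briefly blocking a concurrent synctask that stores the same volume.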