Bug 1528558
| Summary: | /usr/sbin/glusterfs crashing on Red Hat OpenShift Container Platform node | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Raghavendra G <rgowdapp> |
| Component: | write-behind | Assignee: | Raghavendra G <rgowdapp> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | mainline | CC: | atumball, bkunal, csaba, ravishankar, rgowdapp, rhinduja, rhs-bugs, sreber, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-4.0.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1516638 | | |
| : | 1529094 1529095 1529096 (view as bug list) | Environment: | |
| Last Closed: | 2018-03-15 11:24:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1516638 | | |
| Bug Blocks: | 1529094, 1529095, 1529096 | | |
Comment 1
Raghavendra G
2017-12-22 06:26:00 UTC
```
(gdb) thr 1
[Switching to thread 1 (Thread 0x7fba6ffff700 (LWP 64279))]
#0  0x00007fba76602b17 in __wb_preprocess_winds (wb_inode=wb_inode@entry=0x7fba6434cd60) at write-behind.c:1325
1325        offset_expected = holder->stub->args.offset
(gdb) p *holder
$13 = {all = {next = 0x7fba65756520, prev = 0x7fba65756520}, todo = {next = 0x7fba643e8240, prev = 0x7fba6434cd88},
  lie = {next = 0x7fba65756540, prev = 0x7fba65756540}, winds = {next = 0x7fba65756550, prev = 0x7fba65756550},
  unwinds = {next = 0x7fba65756560, prev = 0x7fba65756560}, wip = {next = 0x7fba65756570, prev = 0x7fba65756570},
  stub = 0x0, write_size = 20480, orig_size = 131072, total_size = 0, op_ret = 131072, op_errno = 0, refcount = 0,
  wb_inode = 0x7fba6434cd60, fop = GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' <repeats 1023 times>},
  iobref = 0x0, gen = 9, fd = 0x7fba648d99d0, wind_count = 9,
  ordering = {size = 131072, off = 1179648, append = 0, tempted = -1, lied = -1, fulfilled = -1, go = 0}}
(gdb) p &holder->todo
$14 = (list_head_t *) 0x7fba65756530
(gdb) p holder->lie
$15 = {next = 0x7fba65756540, prev = 0x7fba65756540}
(gdb) p &holder->lie
$16 = (list_head_t *) 0x7fba65756540
(gdb) p holder->todo
$17 = {next = 0x7fba643e8240, prev = 0x7fba6434cd88}
(gdb) p &holder->todo
$18 = (list_head_t *) 0x7fba65756530
(gdb) p holder->winds
$19 = {next = 0x7fba65756550, prev = 0x7fba65756550}
(gdb) p &holder->winds
$20 = (list_head_t *) 0x7fba65756550
(gdb) p holder->unwinds
$21 = {next = 0x7fba65756560, prev = 0x7fba65756560}
(gdb) p &holder->unwinds
$22 = (list_head_t *) 0x7fba65756560
(gdb) p holder->wip
$23 = {next = 0x7fba65756570, prev = 0x7fba65756570}
(gdb) p &holder->wip
$24 = (list_head_t *) 0x7fba65756570
(gdb) p holder->stub
$25 = (call_stub_t *) 0x0
```

Some observations:

* holder->stub is NULL, which is what causes the crash (the faulting line dereferences holder->stub).
* holder->refcount is 0, so "holder" has most likely already been freed.
* holder->todo is not an empty list, yet holder is not part of any of the other lists.

Note that wind_count for holder is 9. This means the request was retried multiple times while syncing its content to the backend, which can happen for two reasons:

* previous syncs failed, or
* previous syncs succeeded only partially, resulting in a short write.

Note that holder->orig_size is 131072, but holder->write_size is 20480, which is less than orig_size. So we ran into case 2 above.

Looking through how write-behind handles short writes, I saw this code:

```c
list_for_each_entry_safe (req, next, &head->winds, winds) {
        accounted_size = __wb_fulfill_short_write (req, size, &fulfilled);

        size -= accounted_size;

        if (size == 0) {
                if (fulfilled)
                        req = next;

                break;
        }
}
```

If the flag "fulfilled" is true, the entire content of the request has been written to the backend and the request can potentially be freed. Since its contents are already on disk, there is also no need to "retry" that request, so we move the "head" (tracked by the variable req) of the "requests to be retried" to the next request in the list.

However, the variable "fulfilled" is never reset for each member of the list. This means that if _any_ of the previous requests was fulfilled, then when we break out of the loop (if (size == 0)), the "head" is moved to "next".
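To make the bookkeeping error concrete, here is a minimal, self-contained sketch of the same pattern. This is hypothetical standalone code, not the actual write-behind structures or helpers; it only models two queued requests and a short write that fully covers the first one:

```c
/*
 * Hypothetical illustration of the stale "fulfilled" flag described above.
 * None of this is gluster code: the struct, sizes and fulfill_short_write()
 * are made up purely to show the bookkeeping error in isolation.
 */
#include <stdio.h>
#include <stddef.h>

struct req {
        int         id;
        size_t      pending;   /* bytes of this request not yet on disk */
        struct req *next;      /* circular list, like head->winds       */
};

/* Stand-in for __wb_fulfill_short_write(): consume up to 'size' bytes of
 * 'req' and set *fulfilled only when the request is completely written. */
static size_t
fulfill_short_write (struct req *req, size_t size, int *fulfilled)
{
        size_t accounted = (size < req->pending) ? size : req->pending;

        req->pending -= accounted;
        if (req->pending == 0)
                *fulfilled = 1;   /* the buggy caller never clears this */

        return accounted;
}

int
main (void)
{
        struct req  r0 = { .id = 0, .pending = 4096 };
        struct req  r1 = { .id = 1, .pending = 4096 };
        struct req *head = &r0, *req, *next;
        size_t      size = 4096 + 1024;  /* short write: all of r0, part of r1 */
        int         fulfilled = 0;       /* BUG: initialised once, outside the
                                            loop, just like the original code  */

        r0.next = &r1;
        r1.next = &r0;                   /* circular, like head->winds */

        for (req = head; ; req = next) {
                next = req->next;
                size -= fulfill_short_write (req, size, &fulfilled);
                if (size == 0) {
                        /* fulfilled is still 1 from r0, so the retry head is
                         * advanced even though r1 is only partly synced     */
                        if (fulfilled)
                                req = next;
                        break;
                }
        }

        /* req now points back at r0, which is fully written (and in the real
         * translator could already have been freed); r1 is silently skipped. */
        printf ("retry resumes at request %d; request 1 still has %zu bytes pending\n",
                req->id, r1.pending);
        return 0;
}
```

Because the break happens on the last (circular) member while the flag is still set from the first one, the retry pointer wraps back to the already-fulfilled head, which is exactly the situation the consequences below describe.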
This has some interesting consequences:

* If we break from the loop while processing the last member of the list head->winds, req is reset to head, since the list is circular. However, head is already fulfilled and can potentially have been freed, so we end up adding a freed request to the wb_inode->todo list. This is the RCA for the crash tracked by this bug (note that the "holder" we found in the todo list had been freed).
* If we break from the loop while processing any member other than the last one, req is set to the next member in the list, skipping the current request even though it has not been entirely synced. This can lead to data corruption.

The fix is very simple: change the code so that "fulfilled" reflects only whether the current request is fulfilled, and does not carry over the history of previous requests in the list.

REVIEW: https://review.gluster.org/19064 (performance/write-behind: fix bug while handling short writes) posted (#1) for review on master by Raghavendra G

COMMIT: https://review.gluster.org/19064 committed in master by "Raghavendra G" <rgowdapp> with a commit message:

    performance/write-behind: fix bug while handling short writes

    The variable "fulfilled" in wb_fulfill_short_write is not reset to 0
    while handling every member of the list. This has some interesting
    consequences:

    * If we break from the loop while processing last member of the list
      head->winds, req is reset to head as the list is a circular one.
      However, head is already fulfilled and can potentially be freed. So,
      we end up adding a freed request to wb_inode->todo list. This is the
      RCA for the crash tracked by the bug associated with this patch
      (Note that we saw "holder" which is freed in todo list).

    * If we break from the loop while processing any of the last but one
      member of the list head->winds, req is set to next member in the
      list, skipping the current request, even though it is not entirely
      synced. This can lead to data corruption.

    The fix is very simple and we've to change the code to make sure
    "fulfilled" reflects whether the current request is fulfilled or not
    and it doesn't carry history of previous requests in the list.

    Change-Id: Ia3d6988175a51c9e08efdb521a7b7938b01f93c8
    BUG: 1528558
    Signed-off-by: Raghavendra G <rgowdapp>

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html

[2] https://www.gluster.org/pipermail/gluster-users/

*** Bug 1434332 has been marked as a duplicate of this bug. ***
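For reference, the change described in the commit message amounts to resetting the flag at the top of every iteration so that it describes only the request handled in that pass. The following is a reconstruction based on the commit message above, not a copy of the merged patch; see the Gerrit change for the authoritative diff:

```c
/* Sketch of the corrected loop in wb_fulfill_short_write(), reconstructed
 * from the commit message; consult https://review.gluster.org/19064 for the
 * actual patch. */
list_for_each_entry_safe (req, next, &head->winds, winds) {
        fulfilled = 0;   /* reset per request: the flag must reflect only the
                            request handled in this iteration                */

        accounted_size = __wb_fulfill_short_write (req, size, &fulfilled);

        size -= accounted_size;

        if (size == 0) {
                if (fulfilled)
                        req = next;

                break;
        }
}
```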