Bug 1456385

| Field | Value |
|---|---|
| Summary | glusterfs client crash on io-cache.so(__ioc_page_wakeup+0x44) |
| Product | [Community] GlusterFS |
| Component | io-cache |
| Version | mainline |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Reporter | Nithya Balachandran <nbalacha> |
| Assignee | Nithya Balachandran <nbalacha> |
| CC | amukherj, bugs, csaba, moagrawa, nbalacha, rhinduja, rhs-bugs, rnalakka, sbhaloth |
| Hardware | All |
| OS | All |
| Fixed In Version | glusterfs-3.12.0 |
| Clone Of | 1435357 |
| Clones | 1457054, 1457058 (view as bug list) |
| Bug Blocks | 1435357, 1457054, 1457058 |
| Type | Bug |
| Last Closed | 2017-09-05 17:32:07 UTC |
Description
Nithya Balachandran
2017-05-29 09:39:48 UTC
Core was generated by `/usr/sbin/glusterfs --volfile-server=redhatstorage.web.skynas.local --volfile-i'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f45e525e5b4 in __ioc_page_wakeup (page=0x7f43246e1500, page@entry=0x7f45f17d0d64, op_errno=0) at page.c:960
960             gf_msg_trace (page->inode->table->xl->name, 0,
Missing separate debuginfos, use: debuginfo-install libgcc-4.8.5-4.el7.x86_64

(gdb) bt
#0  0x00007f45e525e5b4 in __ioc_page_wakeup (page=0x7f43246e1500, page@entry=0x7f45f17d0d64, op_errno=0) at page.c:960
#1  0x00007f45e525ffa4 in ioc_inode_wakeup (frame=0x7f45e00396c8, frame@entry=0x7f45f17d0d64, ioc_inode=ioc_inode@entry=0x7f45e0e62160, stbuf=stbuf@entry=0x7f45e69cca10) at ioc-inode.c:119
#2  0x00007f45e5257b2b in ioc_cache_validate_cbk (frame=0x7f45f17d0d64, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=<optimized out>, stbuf=<optimized out>, xdata=0x0) at io-cache.c:402
#3  0x00007f45e566edfa in ra_attr_cbk (frame=0x7f45f17e22e0, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, buf=0x7f45e69cca10, xdata=0x0) at read-ahead.c:721
#4  0x00007f45f3c47ada in default_fstat_cbk (frame=0x7f45f17b7188, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, buf=0x7f45e69cca10, xdata=0x0) at defaults.c:1053
#5  0x00007f45e5aea505 in dht_file_attr_cbk (frame=0x7f45f17ba090, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, stbuf=<optimized out>, xdata=0x0) at dht-inode-read.c:214
#6  0x00007f45e5d27de1 in afr_fstat_cbk (frame=0x7f45f17562d8, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0, buf=0x7f45e69cca10, xdata=0x0) at afr-inode-read.c:291
#7  0x00007f45e5fa7f8e in client3_3_fstat_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f45f17e1c28) at client-rpc-fops.c:1574
#8  0x00007f45f3a0c990 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f45e03547c0, pollin=pollin@entry=0x7f45e1033480) at rpc-clnt.c:764
#9  0x00007f45f3a0cc4f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f45e03547f0, event=<optimized out>, data=0x7f45e1033480) at rpc-clnt.c:905
#10 0x00007f45f3a08793 in rpc_transport_notify (this=<optimized out>, event=<optimized out>, data=<optimized out>) at rpc-transport.c:546
#11 0x00007f45e86a19b4 in socket_event_poll_in (this=0x7f45e0364440) at socket.c:2355
#12 0x00007f45e86a45f4 in socket_event_handler (fd=<optimized out>, idx=8, data=0x7f45e0364440, poll_in=1, poll_out=0, poll_err=0) at socket.c:2469
#13 0x00007f45f3cacc0a in event_dispatch_epoll_handler (event=0x7f45e69cce80, event_pool=0x7f45f507c350) at event-epoll.c:570
#14 event_dispatch_epoll_worker (data=0x7f45f50d2ff0) at event-epoll.c:678
#15 0x00007f45f2aa6dc5 in start_thread (arg=0x7f45e69cd700) at pthread_create.c:308
#16 0x00007f45f23ebced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

(gdb) p *page
$1 = {page_lru = {next = 0xbabebabe, prev = 0xcafecafe}, inode = 0x0, priority = 0x0, dirty = 0 '\000', ready = 1 '\001', vector = 0x0, count = 1, offset = 356384768, size = 131072, waitq = 0x0, iobref = 0x7f45d235fe40, page_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}, op_errno = 0, stale = 1 '\001'}

This segfaults in gf_msg_trace:

(gdb) p *page->inode
Cannot access memory at address 0x0

(gdb) f 1
#1  0x00007f45e525ffa4 in ioc_inode_wakeup (frame=0x7f45e00396c8, frame@entry=0x7f45f17d0d64, ioc_inode=ioc_inode@entry=0x7f45e0e62160, stbuf=stbuf@entry=0x7f45e69cca10) at ioc-inode.c:119
119                             page_waitq =
(gdb) l
114                 if (waiter_page) {
115                         if (cache_still_valid) {
116                                 /* cache valid, wake up page */
117                                 ioc_inode_lock (ioc_inode);
118                                 {
119                                         page_waitq =
120                                                 __ioc_page_wakeup (waiter_page,
121                                                                    waiter_page->op_errno);
122                                 }
123                                 ioc_inode_unlock (ioc_inode);
(gdb) p waiter_page
$2 = (ioc_page_t *) 0x7f45f17d0d64
(gdb) p *waiter_page
$3 = {page_lru = {next = 0x7f45f15a95ec, prev = 0x0}, inode = 0x7f45f15a9c54, priority = 0x7f45f17e22f0, dirty = -56 '\310', ready = -106 '\226', vector = 0x7f45e0029c20, count = 0, offset = 4294967296, size = 0, waitq = 0x0, iobref = 0x0, page_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, op_errno = 0, stale = 0 '\000'}

(gdb) f 0
(gdb) p page->inode
$49 = (struct ioc_inode *) 0x0
(gdb) p waitq
$50 = (ioc_waitq_t *) 0x0
No symbol "inode" in current context.
(gdb) l
955                 waitq = page->waitq;
956                 page->waitq = NULL;
957
958                 page->ready = 1;
959
960                 gf_msg_trace (page->inode->table->xl->name, 0,
961                               "page is %p && waitq = %p", page, waitq);
962
963                 for (trav = waitq; trav; trav = trav->next) {
964                         frame = trav->data;
(gdb) p page
$51 = (ioc_page_t *) 0x7f43246e1500
(gdb) p *page
$52 = {page_lru = {next = 0xbabebabe, prev = 0xcafecafe}, inode = 0x0, priority = 0x0, dirty = 0 '\000', ready = 1 '\001', vector = 0x0, count = 1, offset = 356384768, size = 131072, waitq = 0x0, iobref = 0x7f45d235fe40, page_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}, op_errno = 0, stale = 1 '\001'}

This page has already been freed.

REVIEW: https://review.gluster.org/17410 (perf/ioc: Fix race causing crash when accessing freed page) posted (#1) for review on master by N Balachandran (nbalacha)

From Mohit:
(gdb) f 0
#0 0x00007f45e525e5b4 in __ioc_page_wakeup (page=0x7f43246e1500, page@entry=0x7f45f17d0d64, op_errno=0) at page.c:960
960 gf_msg_trace (page->inode->table->xl->name, 0,
(gdb) p page->inode
$1 = (struct ioc_inode *) 0x0
(gdb) f 1
#1 0x00007f45e525ffa4 in ioc_inode_wakeup (frame=0x7f45e00396c8, frame@entry=0x7f45f17d0d64,
ioc_inode=ioc_inode@entry=0x7f45e0e62160, stbuf=stbuf@entry=0x7f45e69cca10) at ioc-inode.c:119
119 page_waitq =
(gdb) p waiter
$2 = (ioc_waitq_t *) 0x7f4325f60140
(gdb) p ioc_inode
$3 = (ioc_inode_t *) 0x7f45e0e62160
(gdb) p *ioc_inode
$4 = {table = 0x7f45e0038f10, ia_size = 1037687610, cache = {page_table = 0x7f4324fe6940, page_lru = {next = 0x7f45d124aad0,
prev = 0x7f45d124aad0}, mtime = 1480448949, mtime_nsec = 697732971, tv = {tv_sec = 1490221892, tv_usec = 31395}},
inode_list = {next = 0x7f45d805d5b8, prev = 0x7f45dbfec4b8}, inode_lru = {next = 0x7f45e0039020, prev = 0x7f45e0774068},
waitq = 0x0, inode_lock = {__data = {__lock = 2, __count = 0, __owner = 11655, __nusers = 1, __kind = 0, __spins = 0,
__list = {__prev = 0x0, __next = 0x0}},
__size = "\002\000\000\000\000\000\000\000\207-\000\000\001", '\000' <repeats 26 times>, __align = 2}, weight = 1,
inode = 0x7f45d78719a8}
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f45d7498700 (LWP 11659))]
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 2: movl %edx, %eax
(gdb) f 3
#3 0x00007f45e525d181 in ioc_prune (table=0x7f45e0038f10) at page.c:221
221 ioc_inode_lock (curr);
(gdb) p curr
$5 = (ioc_inode_t *) 0x7f45e0e62160
(gdb) p index
$6 = 1
>>>>>>>>>>>>>>>>>>>>>>>>
IMO, ioc_inode_wakeup does not fetch the waiter queue from the ioc_inode correctly; I think we should fetch the waiter page inside the while loop, at the point where __ioc_page_wakeup is called.
Regards
Mohit Agrawal
REVIEW: https://review.gluster.org/17410 (perf/ioc: Fix race causing crash when accessing freed page) posted (#2) for review on master by N Balachandran (nbalacha)

REVIEW: https://review.gluster.org/17410 (perf/ioc: Fix race causing crash when accessing freed page) posted (#3) for review on master by N Balachandran (nbalacha)

COMMIT: https://review.gluster.org/17410 committed in master by Jeff Darcy (jeff.us)

------

commit 6b6162f7ff93bccef0e615cb490e881168827e1d
Author: N Balachandran <nbalacha>
Date:   Mon May 29 15:21:39 2017 +0530

    perf/ioc: Fix race causing crash when accessing freed page

    ioc_inode_wakeup does not lock the ioc_inode for the duration
    of the operation, leaving a window where ioc_prune could find
    a NULL waitq and hence free the page which ioc_inode_wakeup
    later tries to access.

    Thanks to Mohit for the analysis.

    credit: moagrawa

    Change-Id: I54b064857e2694826d0c03b23f8014e3984a3330
    BUG: 1456385
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17410
    Reviewed-by: Raghavendra G <rgowdapp>
    Tested-by: Raghavendra G <rgowdapp>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailinglists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/