| Summary: | NFS crash in nfs_fop_fsync_cbk | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Krishna Srinivas <krishna> |
| Component: | nfs | Assignee: | Shehjar Tikoo <shehjart> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | nfs-alpha | CC: | gluster-bugs, vijay |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | RTP | Mount Type: | nfs |
How to reproduce? How often does it happen?

Is this mainline or nfs-beta branch?

Try exporting from gnfs using the trusted-write option as a work-around.

Yes, you're right, it may just be the nfl access after mem-put. Checking it out..

(In reply to comment #1)
> How to reproduce? How often does it happen?

This is a customer crash. Highly critical. It happened when there was a lot of I/O. No other known trigger.

> Is this mainline or nfs-beta branch?

This is rc8.

> Try exporting from gnfs using the trusted-write option as a work-around.

Are you sure this will fix the problem? Because we will still be accessing invalid memory after mem_put().

> Yes, you're right, it may just be the nfl access after mem-put. Checking it out..

It is definitely due to that, see this:

(gdb) p *nfl
Cannot access memory at address 0x1d73bc40

(In reply to comment #2)
> (In reply to comment #1)
...
...
> > Try exporting from gnfs using the trusted-write option as a work-around.
>
> Are you sure this will fix the problem? because we will still be accessing
> invalid memory after mem_put()

Yes, at least for fsync, because the client will not send COMMIT requests, which translate to the fsync fop.

PATCH: http://patches.gluster.com/patch/4422 in master (nfs: Free fop local only after inode checks)

Mem-pool starts CALLOCing when the pool overflows. For this data structure, the pool overflows only under very high load, and only then does the dereference hit a freed area. Keeping unresolved till I figure out a way to reproduce without the need for very high load.
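The mem-pool remark in the comments can be pictured with a toy model. The sketch below is not the GlusterFS mem-pool code; `toy_pool`, `toy_pool_get`, `toy_pool_put`, `toy_pool_owns`, the slot count, and the object size are all hypothetical names and values chosen for illustration. It only assumes what the comment states: objects come from a fixed pool until the pool is exhausted, after which they are CALLOCed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_POOL_SLOTS 4   /* hypothetical pool capacity */
#define TOY_OBJ_SIZE   64  /* hypothetical object size in bytes */

/* Toy fixed-size pool; zero-initialise before use. */
struct toy_pool {
    unsigned char slots[TOY_POOL_SLOTS][TOY_OBJ_SIZE];
    int           in_use[TOY_POOL_SLOTS];
};

/* Returns 1 if ptr points into the pool's own slot storage. */
int toy_pool_owns(const struct toy_pool *p, const void *ptr)
{
    uintptr_t addr = (uintptr_t)ptr;
    uintptr_t lo   = (uintptr_t)p->slots;

    return addr >= lo && addr < lo + sizeof(p->slots);
}

/* Hands out a pool slot while one is free, else falls back to calloc(). */
void *toy_pool_get(struct toy_pool *p)
{
    for (int i = 0; i < TOY_POOL_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            memset(p->slots[i], 0, TOY_OBJ_SIZE);
            return p->slots[i];
        }
    }
    /* Pool overflow: this object lives on the heap instead. */
    return calloc(1, TOY_OBJ_SIZE);
}

/* Returns an object: pool slots are only marked free, heap objects are free()d. */
void toy_pool_put(struct toy_pool *p, void *ptr)
{
    if (toy_pool_owns(p, ptr)) {
        size_t idx = ((uintptr_t)ptr - (uintptr_t)p->slots) / TOY_OBJ_SIZE;
        p->in_use[idx] = 0;   /* memory stays mapped inside the pool */
    } else {
        free(ptr);            /* memory goes back to the allocator */
    }
}
```

In this model a stale pointer to a pool slot still refers to mapped memory, so reading it after `toy_pool_put()` goes unnoticed, while a stale pointer to an overflow object refers to memory that has been handed back to the allocator. That would be consistent with the crash showing up only under heavy I/O.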
Customer crash. backtrace:

Core was generated by `/opt/glusterfs/sbin/glusterfs -f /etc/glusterfs/nfs.vol'.
Program terminated with signal 11, Segmentation fault.
#0 0x00002b3ea92e6cc4 in nfs_fop_fsync_cbk (frame=0x1d69bd18, cookie=0x1d204850, this=0x1d201060, op_ret=0, op_errno=0, prebuf=0x7fff966c1d90, postbuf=0x7fff966c1d20) at nfs-fops.c:1170
1170 nfs_fop_restore_root_ino (nfl, prebuf, postbuf, NULL, NULL);
(gdb) bt
#0 0x00002b3ea92e6cc4 in nfs_fop_fsync_cbk (frame=0x1d69bd18, cookie=0x1d204850, this=0x1d201060, op_ret=0, op_errno=0, prebuf=0x7fff966c1d90, postbuf=0x7fff966c1d20) at nfs-fops.c:1170
#1 0x00002b3ea90c6820 in iot_fsync_cbk (frame=0x1d6bb2e0, cookie=0x2aaac00de840, this=0x1d201060, op_ret=0, op_errno=0, prebuf=0x7fff966c1d90, postbuf=0x7fff966c1d20) at io-threads.c:893
#2 0x00002b3ea8eb0019 in client_fsync_cbk (frame=0x2aaac00de840, hdr=0x2aaac8006e70, hdrlen=268, iobuf=0x0) at client-protocol.c:4324
#3 0x00002b3ea8eb5af8 in protocol_client_interpret (this=0x1d1f9c00, trans=0x2aaaac0048e0, hdr_p=0x2aaac8006e70 "", hdrlen=268, iobuf=0x0) at client-protocol.c:6137
#4 0x00002b3ea8eb67be in protocol_client_pollin (this=0x1d1f9c00, trans=0x2aaaac0048e0) at client-protocol.c:6435
#5 0x00002b3ea8eb6e35 in notify (this=0x1d1f9c00, event=2, data=0x2aaaac0048e0) at client-protocol.c:6554
#6 0x00002b3ea83c5b7c in xlator_notify (xl=0x1d1f9c00, event=2, data=0x2aaaac0048e0) at xlator.c:919
#7 0x00002aaaaaf09e96 in socket_event_poll_in (this=0x2aaaac0048e0) at socket.c:731
#8 0x00002aaaaaf0a18b in socket_event_handler (fd=16, idx=8, data=0x2aaaac0048e0, poll_in=1, poll_out=0, poll_err=0) at socket.c:831
#9 0x00002b3ea83ec2b9 in event_dispatch_epoll_handler (event_pool=0x1d1f18b0, events=0x2aaaac009b20, i=0) at event.c:804
#10 0x00002b3ea83ec48e in event_dispatch_epoll (event_pool=0x1d1f18b0) at event.c:867
#11 0x00002b3ea83ec7a4 in event_dispatch (event_pool=0x1d1f18b0) at event.c:975
#12 0x0000000000406344 in main (argc=3, argv=0x7fff966c29f8) at glusterfsd.c:1494
(gdb) p *nfl
Cannot access memory at address 0x1d73bc40

Looking at the code:

nfs_fop_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                   int32_t op_ret, int32_t op_errno, struct iatt *prebuf,
                   struct iatt *postbuf)
{
        struct nfs_fop_local *nfl = NULL;
        fop_fsync_cbk_t progcbk = NULL;

        nfl_to_prog_data (nfl, progcbk, frame);
        nfs_fop_restore_root_ino (nfl, prebuf, postbuf, NULL, NULL);

nfl_to_prog_data() does mem_put() of nfl, after which nfl is accessed in nfs_fop_restore_root_ino(), which might cause the segfault.
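The ordering problem described above, and the ordering suggested by the patch title ("Free fop local only after inode checks"), can be sketched in isolation. This is a minimal stand-alone model, not the actual patch: every `toy_*` type and function is hypothetical, plain `calloc()`/`free()` stand in for the mem-pool calls, and the root inode number 1 is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nfs_fop_local. */
struct toy_local {
    int   rootinode;   /* non-zero if the fop targeted the export root */
    void *progdata;    /* caller's private data */
};

/* Hypothetical stand-ins for call_frame_t and struct iatt. */
struct toy_frame { struct toy_local *local; };
struct toy_iatt  { unsigned long ino; };

/* Allocates a local; calloc() stands in for the pool allocation. */
struct toy_local *toy_local_new(int rootinode)
{
    struct toy_local *nfl = calloc(1, sizeof(*nfl));

    if (nfl)
        nfl->rootinode = rootinode;
    return nfl;
}

/* Reads from nfl, so it must run while nfl is still valid. */
void toy_restore_root_ino(const struct toy_local *nfl, struct toy_iatt *buf)
{
    if (nfl->rootinode && buf)
        buf->ino = 1;   /* assumed root inode number for this sketch */
}

/* Releases the local; free() stands in for mem_put(). */
void toy_local_wipe(struct toy_frame *frame)
{
    free(frame->local);
    frame->local = NULL;
}

/* Callback with the safe ordering: inode checks first, release last. */
unsigned long toy_fsync_cbk(struct toy_frame *frame, struct toy_iatt *postbuf)
{
    struct toy_local *nfl = frame->local;
    void *progdata;

    assert(nfl != NULL);
    progdata = nfl->progdata;             /* copy out what is needed later */
    toy_restore_root_ino(nfl, postbuf);   /* 1. use nfl while it is valid  */
    toy_local_wipe(frame);                /* 2. only now release it        */
    (void)progdata;                       /* would be handed to the caller */
    return postbuf->ino;
}
```

Swapping steps 1 and 2 reproduces the reported pattern: the local is released first and then read, which is undefined behavior even when it happens not to crash.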