Bug 1519315
| Field | Value |
|---|---|
| Summary | glusterfs 3.12.3 crashes with segmentation fault in glusterd_submit_request or rpcsvc_dump |
| Product | [Community] GlusterFS |
| Component | rpc |
| Version | 3.12 |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | unspecified |
| Reporter | Alain Zscheile <zseri.devel> |
| Assignee | Mohit Agrawal <moagrawa> |
| CC | bugs, chewi, kkeithle, kvigor, rgowdapp |
| Keywords | Reopened |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | glusterfs-4.1.3 (or later) |
| Last Closed | 2018-08-29 03:36:09 UTC |
| Type | Bug |
| Bug Depends On | 1521004 |

Description
Alain Zscheile
2017-11-30 15:22:07 UTC
Possibly related to: https://bugs.gentoo.org/635172

Created attachment 1360977: output of `emerge --info`
Compiled sources (with applied Gentoo patches): http://ezscheile.bplaced.net/glusterd-segv-work.tar.gz

libtirpc version: 1.0.2-r1 (Gentoo). There might be a bug in xdr_sizeof, which sets up the x_ops structure but doesn't set x_ops->x_getint32.

Relevant code snippets:

Parts from glibc xdr_intXX_t.c:
---- __GI_xdr_uint64_t
/* XDR 64bit integers */
bool_t
xdr_int64_t (XDR *xdrs, int64_t *ip)
{
  int32_t t1, t2;

  switch (xdrs->x_op)
    {
    case XDR_ENCODE:
      t1 = (int32_t) ((*ip) >> 32);
      t2 = (int32_t) (*ip);
      return (XDR_PUTINT32(xdrs, &t1) && XDR_PUTINT32(xdrs, &t2));
    case XDR_DECODE:
      /*** SEGFAULT HERE ***/
      if (!XDR_GETINT32(xdrs, &t1) || !XDR_GETINT32(xdrs, &t2))
        return FALSE;
      *ip = ((int64_t) t1) << 32;
      *ip |= (uint32_t) t2; /* Avoid sign extension. */
      return TRUE;
    case XDR_FREE:
      return TRUE;
    default:
      return FALSE;
    }
}
libc_hidden_nolink_sunrpc (xdr_int64_t, GLIBC_2_1_1)

bool_t
xdr_quad_t (XDR *xdrs, quad_t *ip)
{
  return xdr_int64_t (xdrs, (int64_t *) ip);
}
libc_hidden_nolink_sunrpc (xdr_quad_t, GLIBC_2_3_4)
----

Parts from xdr_sizeof.c (libtirpc-1.0.2/src/ and glusterfs-3.12.3/contrib/sunrpc/):
---- xdr_sizeof
unsigned long
xdr_sizeof (xdrproc_t func, void *data)
{
        XDR x;
        struct xdr_ops ops;
        bool_t stat;

#ifdef GF_DARWIN_HOST_OS
        typedef bool_t (*dummyfunc1) (XDR *, int *);
#else
        typedef bool_t (*dummyfunc1) (XDR *, long *);
#endif
        typedef bool_t (*dummyfunc2) (XDR *, caddr_t, u_int);

        ops.x_putlong = x_putlong;
        ops.x_putbytes = x_putbytes;
        ops.x_inline = x_inline;
        ops.x_getpostn = x_getpostn;
        ops.x_setpostn = x_setpostn;
        ops.x_destroy = x_destroy;

        /* the other harmless ones */
        ops.x_getlong = (dummyfunc1) harmless;
        ops.x_getbytes = (dummyfunc2) harmless;
        /*** ops.x_getint32 NOT SET ***/

        x.x_op = XDR_ENCODE;
        x.x_ops = &ops;
        x.x_handy = 0;
        x.x_private = (caddr_t) NULL;
        x.x_base = (caddr_t) 0;

        stat = func (&x, data, 0);
        if (x.x_private)
                free (x.x_private);
        return (stat == TRUE ? (unsigned) x.x_handy : 0);
}
----

NOTE: glibc source taken from https://code.woboq.org/userspace/glibc/sunrpc/xdr_intXX_t.c.html

glusterd compiled without libtirpc works.

This bug only happens when glusterfs-3.12.3 is compiled against libtirpc-1.0.2-r1. It works with libtirpc-1.0.1-r1. So this is a bug in libtirpc.

I contributed the patch for explicitly using libtirpc to master and backported it to 3.12.3 for Gentoo. I had been under the impression that libtirpc is just a drop-in replacement for glibc's RPC, but after investigating this report, I have found that it segfaults unless you give --with-ipv6-default, which is new in 3.13.0. More specifically, the crash is avoided if I change addr_family in rpc_transport_inet_options_build (rpc/rpc-lib/src/rpc-transport.c) from inet to inet6. I don't understand why the former causes a crash. Is it not possible to use libtirpc without IPv6? Our libtirpc package allows you to build it with IPv6 support disabled, though this doesn't seem to make any difference to the crash. Do I also have to change the other instances of inet to inet6, or is the new --with-libtirpc flag effectively redundant because all the --with-ipv6-default code is actually required?

Sorry for being slightly clueless here. I have used Gluster occasionally but I'm not the official Gentoo package maintainer, just a dev who thought he'd give the package some attention. I'm not yet familiar with IPv6 either. I am concerned that the flag I've added to master is effectively only good for causing segfaults, so I'd like to resolve this before it ends up in a release.

CC'ing Kevin Vigor because he added the --with-ipv6-default flag.

Backlink to a new backtrace, which happens with the patch applied: https://bugs.gentoo.org/639838#c16

The bug is probably in the part which uses RPC / XDR to communicate with peers, and depends on libtirpc, the availability of peers, and IPv4/IPv6.
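The reporter's concern about xdr_sizeof is that a stack-allocated table of function pointers gets only some of its slots assigned, so any call through an unassigned slot jumps to an indeterminate address. The sketch below illustrates that general hazard and one defensive pattern (zero-initialise the table, fill every slot, check for NULL before dispatch). All names in it (`demo_ops`, `demo_stream`, `demo_get_int32`, and so on) are hypothetical stand-ins, not the real `struct xdr_ops` or any libtirpc API, and nothing in this thread confirms that the unset slot is the actual cause of the crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an XDR-style stream; NOT the real XDR type. */
typedef struct demo_stream demo_stream;

/* Hypothetical operations table, loosely modelled on the idea of x_ops. */
struct demo_ops {
    int (*put_int32)(demo_stream *s, const int *v);
    int (*get_int32)(demo_stream *s, int *v); /* the slot that is easy to forget */
};

struct demo_stream {
    const struct demo_ops *ops;
    size_t handy; /* running byte count, similar in spirit to x_handy */
};

/* Sizing "encoder": counts 4 bytes per value instead of writing anything. */
static int count_put_int32(demo_stream *s, const int *v)
{
    (void)v;
    s->handy += 4;
    return 1;
}

/* "Harmless" decoder: always reports failure and touches no memory. */
static int harmless_get_int32(demo_stream *s, int *v)
{
    (void)s;
    (void)v;
    return 0;
}

/* Zero the whole table first, then assign every slot explicitly, so a
 * slot added to the struct later starts out as NULL rather than garbage. */
static void demo_ops_init(struct demo_ops *ops)
{
    assert(ops != NULL);
    *ops = (struct demo_ops){0};
    ops->put_int32 = count_put_int32;
    ops->get_int32 = harmless_get_int32;
}

/* Defensive dispatch: a missing slot fails cleanly instead of crashing. */
static int demo_put_int32(demo_stream *s, const int *v)
{
    if (s == NULL || s->ops == NULL || s->ops->put_int32 == NULL)
        return 0;
    return s->ops->put_int32(s, v);
}

static int demo_get_int32(demo_stream *s, int *v)
{
    if (s == NULL || s->ops == NULL || s->ops->get_int32 == NULL)
        return 0;
    return s->ops->get_int32(s, v);
}
```

A caller would run `demo_ops_init` on a local `struct demo_ops`, point a `demo_stream` at it, and go through `demo_put_int32` / `demo_get_int32` rather than calling the slots directly. The zero-initialisation matters more than the NULL checks: without it, a forgotten slot holds whatever was on the stack, and a NULL check cannot detect that.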
Following Erik's testing, we have determined that changing rpc_transport_inet_options_build alone is not sufficient. Still looking for guidance here.

REVIEW: https://review.gluster.org/19334 (build: Fix redefinitions when using libtirpc without IPv6 by default) posted (#1) for review on master by James Le Cuirot

I initially thought the above commit would fix this issue. It doesn't, but it is still related. This is actually fixed by https://review.gluster.org/#/c/19330.

This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.