Bug 950121
Summary: gluster doesn't like Oracle's FSINFO RPC call
Product: [Community] GlusterFS
Reporter: Michael Brown <michael>
Component: nfs
Assignee: Niels de Vos <ndevos>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Version: 3.3.1
CC: gluster-bugs, kaushal, ndevos
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Story Points: ---
: 955753
Environment:
2 × gluster servers: 2×E5-2670, 128GB RAM, RHEL 6.4 64-bit, glusterfs-server-3.3.1-1.el6.x86_64 (from EPEL)
4 × NFS clients: 2×E5-2660, 128GB RAM, RHEL 5.7 64-bit, glusterfs-3.3.1-11.el5 (from kkeithley's repo, only used for testing)
bricks are 400GB SSDs with ext4 (and dir_index off)
common network is 10GbE, replication between servers happens over direct 10GbE link.
gluster> volume info gv0
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 20117b48-7f88-4f16-9490-a0349afacf71
Status: Started
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: fearless1:/export/bricks/500117310007a6d8/glusterdata
Brick2: fearless2:/export/bricks/500117310007a674/glusterdata
Brick3: fearless1:/export/bricks/500117310007a714/glusterdata
Brick4: fearless2:/export/bricks/500117310007a684/glusterdata
Brick5: fearless1:/export/bricks/500117310007a7dc/glusterdata
Brick6: fearless2:/export/bricks/500117310007a694/glusterdata
Brick7: fearless1:/export/bricks/500117310007a7e4/glusterdata
Brick8: fearless2:/export/bricks/500117310007a720/glusterdata
Brick9: fearless1:/export/bricks/500117310007a7ec/glusterdata
Brick10: fearless2:/export/bricks/500117310007a74c/glusterdata
Brick11: fearless1:/export/bricks/500117310007a838/glusterdata
Brick12: fearless2:/export/bricks/500117310007a814/glusterdata
Brick13: fearless1:/export/bricks/500117310007a850/glusterdata
Brick14: fearless2:/export/bricks/500117310007a84c/glusterdata
Brick15: fearless1:/export/bricks/500117310007a858/glusterdata
Brick16: fearless2:/export/bricks/500117310007a8f8/glusterdata
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.disable: off
Last Closed: 2014-04-17 11:41:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Description
Michael Brown
2013-04-09 17:00:23 UTC
Created attachment 733286
Good FSINFO RPC from Linux
Created attachment 733287
Text summary of failed FSINFO RPC
Created attachment 733288
Text summary of successful FSINFO RPC
Niels de Vos <ndevos> points out in http://lists.gnu.org/archive/html/gluster-devel/2013-04/msg00050.html:

« XDR (http://tools.ietf.org/html/rfc4506, the encoding used for the RPC protocol) uses 'blocks' for alignment. A fhandle byte array that is 34-bytes long, needs to be (34 / 4 + 1)*4 = 36 bytes in size. The 'length' given in the structure tells the consumer to ignore the two tailing bytes. The NFSv3 specification (http://tools.ietf.org/html/rfc1813#page-21) defines the nfs_fh3 as a opaque (not bytes) structure. My guess is that this (untested) change would fix it, can you try that? »

It didn't :)

Niels seems to have identified the problem, but it still needs to be fixed.

New proposal sent to Michael with gluster-devel@ on CC:

    bool_t
    xdr_nfs_fh3 (XDR *xdrs, nfs_fh3 *objp)
    {
            uint32_t size;  /* never set from objp->data.data_len before encoding */

            if (!xdr_int (xdrs, &size))
                    return FALSE;
            if (!xdr_opaque (xdrs, (u_int *)&objp->data.data_val, size))
                    return FALSE;
            return TRUE;
    }

Created attachment 735043
Proposed patch for testing
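The padding rule in Niels's explanation can be checked with a few helpers. This is a sketch, not gluster code; the names `xdr_padded_len`, `xdr_fill_bytes` and `xdr_var_opaque_wire_len` are hypothetical. It rounds up with `(n + 3) & ~3`, which gives 36 for the 34-byte handle discussed here. The `(n / 4 + 1) * 4` shortcut in the quote agrees for 34, but would add a needless extra block (n + 4) whenever n is already a multiple of 4.

```c
#include <stdint.h>

/* Round an opaque length up to the next multiple of the 4-byte XDR unit
 * (RFC 4506); this is how many bytes the data occupies on the wire. */
static uint32_t xdr_padded_len(uint32_t n)
{
    return (n + 3u) & ~3u;
}

/* Number of zero fill bytes the sender appends after n bytes of data. */
static uint32_t xdr_fill_bytes(uint32_t n)
{
    return xdr_padded_len(n) - n;
}

/* A variable-length opaque (such as nfs_fh3) is sent as a 4-byte length
 * word followed by the padded data. */
static uint32_t xdr_var_opaque_wire_len(uint32_t n)
{
    return 4u + xdr_padded_len(n);
}
```

For the 34-byte handle, `xdr_var_opaque_wire_len(34)` is 40: a 4-byte length word, 34 bytes of handle and 2 fill bytes. A receiver must consume all 40 bytes even though the length word says 34.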
23:51 < ndevos> Supermathie: ah, I've thought of the error in my
suggestion - that function is used to encode and decode
23:52 < ndevos> which means, that the size parameter must be set
correctly - the .data_len attribute contain the size when encoding,
and should be overwritten when decoding
23:53 < ndevos> KERBOOM happens when an idea is only half looked at :-/
Maybe something like the attached patch works better? It should encode/decode
both the length and the fhandle value. Compile tested only.
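ndevos's point is that one XDR routine serves both directions: when encoding, the length must come from `data_len`; when decoding, the same call must overwrite `data_len` with the value read from the wire. The sketch below shows that pattern without depending on the Sun RPC headers, and it also rejects oversized or truncated handles instead of overrunning a buffer, which is the kind of robustness requested later in this report. All names (`struct xdr_buf`, `struct fh3`, `xb_u32`, `xb_fh3`) are hypothetical stand-ins, not gluster's or libtirpc's API; the 64-byte limit is NFS3_FHSIZE from RFC 1813. In Sun RPC code the same idea is usually expressed with `xdr_bytes()`, which takes pointers to both the data and its length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FHSIZE3 64 /* RFC 1813: maximum NFSv3 file handle size */

enum xdr_dir { XDR_DIR_ENCODE, XDR_DIR_DECODE };

/* Hypothetical minimal stream standing in for the real XDR handle. */
struct xdr_buf {
    enum xdr_dir dir;
    uint8_t *buf;
    size_t len; /* capacity when encoding, bytes available when decoding */
    size_t pos; /* invariant: pos <= len */
};

/* Hypothetical stand-in for nfs_fh3: a length plus the handle bytes. */
struct fh3 {
    uint32_t data_len;
    uint8_t data_val[FHSIZE3];
};

/* Encode or decode one big-endian 32-bit word, depending on x->dir. */
static bool xb_u32(struct xdr_buf *x, uint32_t *v)
{
    if (x->len - x->pos < 4)
        return false;
    uint8_t *p = x->buf + x->pos;
    if (x->dir == XDR_DIR_ENCODE) {
        p[0] = (uint8_t)(*v >> 24);
        p[1] = (uint8_t)(*v >> 16);
        p[2] = (uint8_t)(*v >> 8);
        p[3] = (uint8_t)*v;
    } else {
        *v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
             ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    }
    x->pos += 4;
    return true;
}

/* Encode or decode a variable-length opaque file handle. */
static bool xb_fh3(struct xdr_buf *x, struct fh3 *fh)
{
    /* Encoding reads fh->data_len; decoding overwrites it from the wire. */
    if (!xb_u32(x, &fh->data_len))
        return false;
    /* Reject oversized handles instead of overrunning data_val. */
    if (fh->data_len > FHSIZE3)
        return false;
    size_t padded = ((size_t)fh->data_len + 3) & ~(size_t)3;
    if (x->len - x->pos < padded)
        return false; /* truncated input or too little room to encode */
    uint8_t *p = x->buf + x->pos;
    if (x->dir == XDR_DIR_ENCODE) {
        memcpy(p, fh->data_val, fh->data_len);
        memset(p + fh->data_len, 0, padded - fh->data_len); /* zero fill */
    } else {
        memcpy(fh->data_val, p, fh->data_len); /* fill bytes are skipped */
    }
    x->pos += padded;
    return true;
}
```

Calling `xb_fh3()` on an encode stream and then on a decode stream over the same bytes round-trips a 34-byte handle in 40 wire bytes, and the decoded `data_len` is 34 regardless of what the caller put there beforehand.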
Created attachment 735301
Updated patch
This patch does not break the Linux NFS client. I wonder if this makes it
possible to use the Oracle DNFS client.
What happens when gluster accepts the bad RPC in the FSINFO handler is that processing continues, but the same badly aligned XDR keeps coming in and eventually crashes the glusterfs NFS daemon. Test cases need to be added to gluster, and it needs to handle this situation more robustly.

Regarding Oracle, I'm able to work around the problem by expanding the size of the FD (file handle) so that its length happens to be congruent to 0 mod 4 bytes: https://github.com/Supermathie/glusterfs/commit/95880cf71375cb4b04a1b645598c7570c5087de7

I'm morally opposed to submitting this for inclusion in Gluster, however; Oracle needs to fix their code! I'm inclined to leave this bug open as a request for more robust handling of bad XDR encoding in incoming RPCs; such RPCs shouldn't be able to crash Gluster's NFS server.

REVIEW: http://review.gluster.org/4918 (Expand gluster's NFS FD header to 4 bytes) posted (#2) for review on master by Anand Avati (avati)

COMMIT: http://review.gluster.org/4918 committed in master by Anand Avati (avati)
------
commit 39a1eaf38d64f66dfa74c6843dc9266f40dd4645
Author: Michael Brown <michael>
Date:   Tue Apr 30 11:34:57 2013 -0400

    Expand gluster's NFS FD header to 4 bytes

    * https://bugzilla.redhat.com/show_bug.cgi?id=950121
    * Oracle's DNFS does not properly XDR encoding on NFS FDs that are not
      congruent to 0mod4 bytes long
    * This patch is a workaround to support Oracle's buggy code

    Change-Id: Ic621e2cd679a86aa9a06ed9ca684925e1e0ec43f
    BUG: 950121
    Signed-off-by: Michael Brown <michael>
    Reviewed-on: http://review.gluster.org/4918
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
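Michael's workaround (commit 39a1eaf) makes the handle itself a multiple of 4 bytes, so no XDR fill bytes are ever needed. The sketch below assumes, without confirmation from this report, that the 34-byte handle is a 2-byte identifier followed by two 16-byte UUIDs; the struct names, `UUID_BYTES` and `xdr_fill()` are hypothetical. Widening the identifier to 4 bytes, as the commit title describes, yields a 36-byte handle with zero fill.

```c
#include <stddef.h>
#include <stdint.h>

#define UUID_BYTES 16

/* Hypothetical original layout: 2 + 16 + 16 = 34 bytes (byte arrays only,
 * so common ABIs add no struct padding). */
struct fh_ident2 {
    uint8_t ident[2];
    uint8_t exportid[UUID_BYTES];
    uint8_t gfid[UUID_BYTES];
};

/* Hypothetical layout after widening the header to 4 bytes: 36 bytes. */
struct fh_ident4 {
    uint8_t ident[4];
    uint8_t exportid[UUID_BYTES];
    uint8_t gfid[UUID_BYTES];
};

/* Zero fill bytes XDR appends after n bytes of opaque data. */
static size_t xdr_fill(size_t n)
{
    return (4 - n % 4) % 4;
}
```

Because a 36-byte handle has no fill bytes, a client that mishandles XDR padding never encounters any. This sidesteps the Oracle bug rather than fixing it, which matches Michael's own description of the patch as a workaround.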