Bug 1334985 - Under high read load, sometimes the message "XDR decoding failed" appears in the logs and read fails
Summary: Under high read load, sometimes the message "XDR decoding failed" appears in ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: RHGS 3.1.3
Assignee: Raghavendra G
QA Contact: Ambarish
URL:
Whiteboard: rpc, socket
Depends On: 1334285
Blocks: 1311817 1331502 1334287
TreeView+ depends on / blocked
 
Reported: 2016-05-11 03:43 UTC by Raghavendra G
Modified: 2016-06-23 05:22 UTC (History)
11 users (show)

Fixed In Version: glusterfs-3.7.9-5
Doc Type: Bug Fix
Doc Text:
Clone Of: 1334285
Environment:
Last Closed: 2016-06-23 05:22:17 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1240 0 normal SHIPPED_LIVE Red Hat Gluster Storage 3.1 Update 3 2016-06-23 08:51:28 UTC

Description Raghavendra G 2016-05-11 03:43:53 UTC
+++ This bug was initially created as a clone of Bug #1334285 +++

+++ This bug was initially created as a clone of Bug #1331502 +++

Description of problem:

Under some circumstances, the following errors appear in the logs:

[2016-04-28 16:48:58.710106] E [MSGID: 114030] [client-rpc-fops.c:3022:client3_3_readv_cbk] 0-test-client-2: XDR decoding failed [Invalid argument]
[2016-04-28 16:48:58.710223] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-test-client-2: remote operation failed [Invalid argument]

This causes the read to fail.

Apparently there are no problems in the bricks.

Version-Release number of selected component (if applicable): 3.7.11


How reproducible:

It appears randomly but quite consistently with the following steps.

Steps to Reproduce:
1. create a distributed-disperse 4*(4+2) volume
2. export it with ganesha using gfapi
3. mount it from another machine using NFSv4
4. iozone -t2 -s10g -r1024k -i0 -w -F <mount>/iozone{1..2}.dat
5. echo 3 >/proc/sys/vm/drop_caches
6. iozone -t2 -s10g -r1024k -i1 -w -F <mount>/iozone{1..2}.dat

Generally the problem appears soon after the step 6 begins.

Actual results:

The iozone read test fails.

Expected results:

The iozone test should complete successfully.

Additional info:

This seems caused by an incorrect handling of incoming TCP data. This info is at the moment that the error happens:

Breakpoint 1, client3_3_readv_cbk (req=0x7f540e64106c, iov=0x7f540e6410ac, count=<optimized out>, myframe=0x7f54259a4d54) at client-rpc-fops.c:3021
3021                    gf_msg (this->name, GF_LOG_ERROR, EINVAL,
(gdb) print *iov
$1 = {iov_base = 0x7f53e994e018, iov_len = 116}
(gdb) x/116xb 0x7f53e994e018
0x7f53e994e018: 0x00    0x00    0x80    0x00    0x00    0x00    0x00    0x00
0x7f53e994e020: 0xa8    0xbf    0xa3    0xe0    0x5f    0x48    0x4c    0x1e
0x7f53e994e028: 0x80    0xa3    0x8a    0xd8    0x9d    0xa1    0x1c    0x75
0x7f53e994e030: 0x80    0xa3    0x8a    0xd8    0x9d    0xa1    0x1c    0x75
0x7f53e994e038: 0x00    0x00    0x00    0x00    0x00    0x00    0x08    0x00
0x7f53e994e040: 0x00    0x00    0x81    0xa0    0x00    0x00    0x00    0x01
0x7f53e994e048: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7f53e994e050: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7f53e994e058: 0x00    0x00    0x00    0x00    0xa0    0x00    0x00    0x00
0x7f53e994e060: 0x00    0x00    0x10    0x00    0x00    0x00    0x00    0x00
0x7f53e994e068: 0x00    0x50    0x00    0x00    0x57    0x22    0x04    0x1f
0x7f53e994e070: 0x25    0x38    0x92    0x91    0x57    0x22    0x04    0xb3
0x7f53e994e078: 0x03    0x53    0x1b    0x13    0x57    0x22    0x04    0xb3
0x7f53e994e080: 0x06    0xf5    0xe1    0x99    0x00    0x00    0x80    0x00
0x7f53e994e088: 0x00    0x00    0x00    0x26
(gdb) x/154xb 0x7f53e994e018
0x7f53e994e018: 0x00    0x00    0x80    0x00    0x00    0x00    0x00    0x00
0x7f53e994e020: 0xa8    0xbf    0xa3    0xe0    0x5f    0x48    0x4c    0x1e
0x7f53e994e028: 0x80    0xa3    0x8a    0xd8    0x9d    0xa1    0x1c    0x75
0x7f53e994e030: 0x80    0xa3    0x8a    0xd8    0x9d    0xa1    0x1c    0x75
0x7f53e994e038: 0x00    0x00    0x00    0x00    0x00    0x00    0x08    0x00
0x7f53e994e040: 0x00    0x00    0x81    0xa0    0x00    0x00    0x00    0x01
0x7f53e994e048: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7f53e994e050: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7f53e994e058: 0x00    0x00    0x00    0x00    0xa0    0x00    0x00    0x00
0x7f53e994e060: 0x00    0x00    0x10    0x00    0x00    0x00    0x00    0x00
0x7f53e994e068: 0x00    0x50    0x00    0x00    0x57    0x22    0x04    0x1f
0x7f53e994e070: 0x25    0x38    0x92    0x91    0x57    0x22    0x04    0xb3
0x7f53e994e078: 0x03    0x53    0x1b    0x13    0x57    0x22    0x04    0xb3
0x7f53e994e080: 0x06    0xf5    0xe1    0x99    0x00    0x00    0x80    0x00
0x7f53e994e088: 0x00    0x00    0x00    0x26    0x00    0x00    0x00    0x01
0x7f53e994e090: 0x00    0x00    0x00    0x17    0x00    0x00    0x00    0x02
0x7f53e994e098: 0x67    0x6c    0x75    0x73    0x74    0x65    0x72    0x66
0x7f53e994e0a0: 0x73    0x2e    0x69    0x6e    0x6f    0x64    0x65    0x6c
0x7f53e994e0a8: 0x6b    0x2d    0x63    0x6f    0x75    0x6e    0x74    0x00
0x7f53e994e0b0: 0x31    0x00
(gdb) bt
#0  client3_3_readv_cbk (req=0x7f540e64106c, iov=0x7f540e6410ac, count=<optimized out>, myframe=0x7f54259a4d54) at client-rpc-fops.c:3021
#1  0x00007f542a677ab0 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f54101cdef0, pollin=pollin@entry=0x7f54000091f0) at rpc-clnt.c:764
#2  0x00007f542a677d6f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f54101cdf20, event=<optimized out>, data=0x7f54000091f0) at rpc-clnt.c:925
#3  0x00007f542a673853 in rpc_transport_notify (this=this@entry=0x7f54101ddb70, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f54000091f0) at rpc-transport.c:546
#4  0x00007f541d881666 in socket_event_poll_in (this=this@entry=0x7f54101ddb70) at socket.c:2237
#5  0x00007f541d8842c4 in socket_event_handler (fd=fd@entry=30, idx=idx@entry=20, data=0x7f54101ddb70, poll_in=1, poll_out=0, poll_err=0) at socket.c:2350
#6  0x00007f542a90aa4a in event_dispatch_epoll_handler (event=0x7f540effc540, event_pool=0xd3d9a0) at event-epoll.c:575
#7  event_dispatch_epoll_worker (data=0x7f54100d0890) at event-epoll.c:678
#8  0x00007f542d0fbdc5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f542c7bc28d in clone () from /lib64/libc.so.6

The fragment to decode has 116 bytes, but it should have 154 bytes (otherwise the xdata encoded is cut). I have verified that the remaining 38 bytes are stored in the beggining of the second iovec entry (iov[1]).

A network capture of this exact packet reveals the following data:

Frame 334067: 53754 bytes on wire (430032 bits), 53754 bytes captured (430032 bits)
    Encapsulation type: Ethernet (1)
    Arrival Time: Apr 28, 2016 18:00:47.706023000 Hora de verano romance
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1461859247.706023000 seconds
    [Time delta from previous captured frame: 0.000130000 seconds]
    [Time delta from previous displayed frame: 0.002272000 seconds]
    [Time since reference or first frame: 5.865347000 seconds]
    Frame Number: 334067
    Frame Length: 53754 bytes (430032 bits)
    Capture Length: 53754 bytes (430032 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ip:tcp:rpc:glusterfs:data:rpc]
Ethernet II, Src: 00:1b:21:91:aa:30, Dst: 00:1b:21:91:aa:17
    Destination: 00:1b:21:91:aa:17
        Address: 00:1b:21:91:aa:17
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: 00:1b:21:91:aa:30
        Address: 00:1b:21:91:aa:30
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 193.109.174.232, Dst: 193.109.174.230
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 53740
    Identification: 0x517a (20858)
    Flags: 0x02 (Don't Fragment)
        0... .... = Reserved bit: Not set
        .1.. .... = Don't fragment: Set
        ..0. .... = More fragments: Not set
    Fragment offset: 0
    Time to live: 64
    Protocol: TCP (6)
    Header checksum: 0x36e7 [validation disabled]
        [Good: False]
        [Bad: False]
    Source: 193.109.174.232
    Destination: 193.109.174.230
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 49152 (49152), Dst Port: 65205 (65205), Seq: 383891321, Ack: 1959945, Len: 53688
    Source Port: 49152
    Destination Port: 65205
    [Stream index: 11]
    [TCP Segment Len: 53688]
    Sequence number: 383891321    (relative sequence number)
    [Next sequence number: 383945009    (relative sequence number)]
    Acknowledgment number: 1959945    (relative ack number)
    Header Length: 32 bytes
    Flags: 0x010 (ACK)
        000. .... .... = Reserved: Not set
        ...0 .... .... = Nonce: Not set
        .... 0... .... = Congestion Window Reduced (CWR): Not set
        .... .0.. .... = ECN-Echo: Not set
        .... ..0. .... = Urgent: Not set
        .... ...1 .... = Acknowledgment: Set
        .... .... 0... = Push: Not set
        .... .... .0.. = Reset: Not set
        .... .... ..0. = Syn: Not set
        .... .... ...0 = Fin: Not set
        [TCP Flags: *******A****]
    Window size value: 17518
    [Calculated window size: 17518]
    [Window size scaling factor: -1 (unknown)]
    Checksum: 0xb289 [validation disabled]
        [Good Checksum: False]
        [Bad Checksum: False]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
        No-Operation (NOP)
            Type: 1
                0... .... = Copy on fragmentation: No
                .00. .... = Class: Control (0)
                ...0 0001 = Number: No-Operation (NOP) (1)
        No-Operation (NOP)
            Type: 1
                0... .... = Copy on fragmentation: No
                .00. .... = Class: Control (0)
                ...0 0001 = Number: No-Operation (NOP) (1)
        Timestamps: TSval 192225006, TSecr 192258004
            Kind: Time Stamp Option (8)
            Length: 10
            Timestamp value: 192225006
            Timestamp echo reply: 192258004
    [SEQ/ACK analysis]
        [Bytes in flight: 53688]
    TCP segment data (32836 bytes)
    TCP segment data (20852 bytes)
[2 Reassembled TCP Segments (32952 bytes): #334064(116), #334067(32836)]
    [Frame: 334064, payload: 0-115 (116 bytes)]
    [Frame: 334067, payload: 116-32951 (32836 bytes)]
    [Segment count: 2]
    [Reassembled TCP length: 32952]
    [Reassembled TCP Data: 800080b40000896800000001000000000000000000000000...]
Remote Procedure Call, Type:Reply XID:0x00008968
    Fragment header: Last fragment, 32948 bytes
        1... .... .... .... .... .... .... .... = Last Fragment: Yes
        .000 0000 0000 0000 1000 0000 1011 0100 = Fragment Length: 32948
    XID: 0x00008968 (35176)
    Message Type: Reply (1)
    [Program: GlusterFS (1298437)]
    [Program Version: 330]
    [Procedure: READ (12)]
    Reply State: accepted (0)
    [This is a reply to a request in frame 333872]
    [Time from request: 0.002272000 seconds]
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
    Accept State: RPC executed successfully (0)
GlusterFS
    [Program Version: 330]
    [GlusterFS: READ (12)]
    Return value: 32768
    Errno: 0 (No error)
    IATT
        GFID: c38e35f0-31a1-4501-8a21-064b084c59df
        Inode: 9953243570797042143
        Device: 0x0000000000000800
        Mode: 0100640, S_IRUSR, S_IWUSR, S_IRGRP
            .... .... .... .... .... 0... .... .... = S_ISUID: Not set
            .... .... .... .... .... .0.. .... .... = S_ISGID: Not set
            .... .... .... .... .... ..0. .... .... = S_ISVTX: Not set
            .... .... .... .... .... ...1 .... .... = S_IRUSR: Set
            .... .... .... .... .... .... 1... .... = S_IWUSR: Set
            .... .... .... .... .... .... .0.. .... = S_IXUSR: Not set
            .... .... .... .... .... .... ..1. .... = S_IRGRP: Set
            .... .... .... .... .... .... ...0 .... = S_IWGRP: Not set
            .... .... .... .... .... .... .... 0... = S_IXGRP: Not set
            .... .... .... .... .... .... .... .0.. = S_IROTH: Not set
            .... .... .... .... .... .... .... ..0. = S_IWOTH: Not set
            .... .... .... .... .... .... .... ...0 = S_IXOTH: Not set
        Number of hard links: 1
        UID: 0
        GID: 0
        Root device: 0x0000000000000000
        Size: 2684354560
        Block size: 4096
        Blocks: 5242880
        Time of last access: Apr 28, 2016 17:58:23.604815960 Hora de verano romance
        Time of last modification: Apr 28, 2016 18:00:33.873544532 Hora de verano romance
        Time of last status change: Apr 28, 2016 18:00:33.874544506 Hora de verano romance
    Size: 32768
    Dict, contains 1 item
        [Size: 38 (40 bytes inc. RPC-roundup)]
        Items: 1
        glusterfs.inodelk-count: 1
        [RPC-roundup bytes: 00005c5c5c5c5c5c5c5c0000000000000000000000000000...]
    Data (32768 bytes)
        <raw data stripped>
        Data: 5c5c5c5c5c5c5c5c00000000000000000000000000000000...
        [Length: 32768]

We can see the answer has been reassembled from 2 TCP segments, the first one being of 116 bytes. Probably this is the cause of the problem.

Maybe the XDR decoding should make use of additional data in the second iovec entry if needed ?

--- Additional comment from Vijay Bellur on 2016-05-09 07:03:42 EDT ---

REVIEW: http://review.gluster.org/14270 (socket: Fix incorrect handling of partial reads) posted (#1) for review on master by Xavier Hernandez (xhernandez)

Comment 2 Raghavendra G 2016-05-16 04:59:32 UTC
https://code.engineering.redhat.com/gerrit/74342

Comment 5 Ambarish 2016-06-02 18:04:16 UTC
***********
TEST STEPS
***********

A) The way Xavier reported it : 

iozone -t2 -s10g -r1024k -i1 -w -F <mount>/iozone{1..2}.dat

after creating the data set.

B) In a distributed-multithreaded fashion from 4 clients and 16 workload threads: iozone -+m <Config file> -+h <one o the hostnames> -C -w -c -e -i 1 -+n -r 64k -s 8g -t 16

Either way,did not hit the reported issue.

Verified.

Comment 7 errata-xmlrpc 2016-06-23 05:22:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240


Note You need to log in before you can comment on or make changes to this bug.