Bug 1450004 - [nl-cache] Remove-brick rebalance on a node failed; log says "Readdirp failed. Aborting data migration for directory: / [File descriptor in bad state]"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: samba
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Milind Changire
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks: 1417151
 
Reported: 2017-05-11 11:08 UTC by Prasad Desala
Modified: 2017-09-21 04:41 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.8.4-25
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-21 04:41:45 UTC
Embargoed:




Links
  System:       Red Hat Product Errata
  ID:           RHBA-2017:2774
  Private:      0
  Priority:     normal
  Status:       SHIPPED_LIVE
  Summary:      glusterfs bug fix and enhancement update
  Last Updated: 2017-09-21 08:16:29 UTC

Description Prasad Desala 2017-05-11 11:08:00 UTC
Description of problem:
=======================
Remove-brick rebalance on a node failed; log says "Readdirp failed. Aborting data migration for directory: / [File descriptor in bad state]"

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-24.el7rhgs.x86_64
samba-4.6.3-0.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create a distributed-replicate volume and start it.
2) Enable nl-cache settings.
3) CIFS-mount the volume on multiple clients.
4) Perform the tasks below simultaneously from multiple clients:
Client-1: Start creating 6000 files
for i in {1..6000}; do cat /etc/redhat-release > f$i; done
Continuous lookup - for i in {1..4000}; do ll f$i fl$i; done
Client-2: Start creating hardlinks
for i in {1..4000}; do ln f$i fl$i; done
Continuous lookup - for i in {1..4000}; do ll f$i fl$i; done
5) While step 4 is in progress, start removing a few bricks from the volume.
Keep monitoring the remove-brick status. (A consolidated sketch of these steps follows.)
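
Consolidated sketch of the flow above. The volume name "distrep" is taken from the rebalance log; the hosts (server1/server2), brick paths, mount point, Samba share/user names, and the exact nl-cache options are assumptions, since the report does not list the full setup, so adjust them to the actual test environment.

# Reproduction sketch. Assumed names (not from the report): hosts server1/server2,
# brick paths /bricks/b1..b3, mount point /mnt/distrep, CIFS user "testuser".

# 1) Create and start a 3x2 distributed-replicate volume
gluster volume create distrep replica 2 \
    server1:/bricks/b1 server2:/bricks/b1 \
    server1:/bricks/b2 server2:/bricks/b2 \
    server1:/bricks/b3 server2:/bricks/b3
gluster volume start distrep

# 2) Enable nl-cache (the exact option set used in the test is not recorded in
#    the report; these are the usual nl-cache related options)
gluster volume set distrep features.cache-invalidation on
gluster volume set distrep performance.nl-cache on

# 3) CIFS-mount the volume on each client (RHGS/Samba exports it as gluster-<volname>)
mount -t cifs //server1/gluster-distrep /mnt/distrep -o username=testuser

# 4) Client-1: create 6000 files while doing continuous lookups (run concurrently)
cd /mnt/distrep
for i in {1..6000}; do cat /etc/redhat-release > f$i; done &
for i in {1..4000}; do ls -l f$i fl$i; done        # the report uses the 'll' alias

#    Client-2: create hardlinks while doing continuous lookups
cd /mnt/distrep
for i in {1..4000}; do ln f$i fl$i; done &
for i in {1..4000}; do ls -l f$i fl$i; done

# 5) While the client I/O is running, remove one replica pair and keep
#    monitoring the remove-brick status
gluster volume remove-brick distrep server1:/bricks/b3 server2:/bricks/b3 start
gluster volume remove-brick distrep server1:/bricks/b3 server2:/bricks/b3 status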

Rebalance logs:
===============
[2017-05-11 10:17:59.827740] I [MSGID: 109022] [dht-rebalance.c:1998:dht_migrate_file] 0-distrep-dht: completed migration of /f2577 from subvolume distrep-replicate-9 to distrep-replicate-3
[2017-05-11 10:17:59.832095] I [MSGID: 109069] [dht-common.c:1322:dht_lookup_unlink_cbk] 0-distrep-dht: lookup_unlink returned with op_ret -> 0 and op-errno -> 0 for /fl341
[2017-05-11 10:17:59.842942] W [MSGID: 114061] [client-common.c:1198:client_pre_readdirp] 0-distrep-client-6:  (00000000-0000-0000-0000-000000000001) remote_fd is -1. EBADFD [File descriptor in bad state]
[2017-05-11 10:17:59.843917] W [MSGID: 109021] [dht-rebalance.c:2684:gf_defrag_get_entry] 0-distrep-dht: Readdirp failed. Aborting data migration for directory: / [File descriptor in bad state]
[2017-05-11 10:17:59.843984] W [dht-rebalance.c:3142:gf_defrag_process_dir] 0-distrep-dht: Found error from gf_defrag_get_entry
[2017-05-11 10:17:59.844404] E [MSGID: 109111] [dht-rebalance.c:3439:gf_defrag_fix_layout] 0-distrep-dht: gf_defrag_process_dir failed for directory: /
[2017-05-11 10:17:59.844614] I [dht-rebalance.c:2563:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 3
[2017-05-11 10:17:59.844745] I [dht-rebalance.c:2563:gf_defrag_task] 0-DHT: Thread wokeup. defrag->current_thread_count: 4

NOTE: Only smb packages are installed on the system and CTDB is **NOT** configured.

Actual results:
===============
Remove-brick rebalance on a node failed; log says "Readdirp failed. Aborting data migration for directory: / [File descriptor in bad state]"

Expected results:
================
Rebalance should complete without any failures.

Comment 3 Poornima G 2017-05-12 04:46:42 UTC
nl-cache mostly has nothing to do with readdirp or rebalance. Can you please try the same with nl-cache disabled? And is it consistently seen only with nl-cache enabled?
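
(A minimal sketch of that retest, assuming the volume name distrep from the rebalance log; performance.nl-cache is the standard toggle, and "gluster volume reset" reverts an option to its default:)

gluster volume set distrep performance.nl-cache off
# or revert the option to its default value entirely
gluster volume reset distrep performance.nl-cache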

Comment 4 Poornima G 2017-05-12 04:47:04 UTC
Need info on comment 3

Comment 7 Poornima G 2017-05-17 06:42:47 UTC
Also, I see that parallel-readdir was not enabled on this volume. Hence this looks like neither an nl-cache nor a parallel-readdir bug.
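
(For reference, a sketch of how the option state can be confirmed, assuming the volume name distrep from the rebalance log:)

gluster volume get distrep performance.parallel-readdir
gluster volume get distrep performance.nl-cache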

Comment 8 Nithya Balachandran 2017-05-17 07:04:48 UTC
Assigning this to myself while I check if this is a DHT issue. I am not changing the component just yet.

Comment 11 Nithya Balachandran 2017-05-17 14:42:58 UTC
CCing Atin on this BZ as the crash looks like it may be related to brick mux.

Comment 16 Soumya Koduri 2017-05-29 10:07:10 UTC
From the core -->

(gdb) f 1
#1  0x00007f74e9995dfa in server_process_event_upcall (this=0x7f74ec0283d0, data=data@entry=0x7f74b8571ef0)
    at server.c:1353
1353	                        if (!client || strcmp(client->client_uid, client_uid))
(gdb) p client
$1 = <optimized out>
(gdb) p client
$2 = <optimized out>
(gdb) l
1348	                list_for_each_entry (xprt, &conf->xprt_list, list) {
1349	                        client = xprt->xl_private;
1350	
1351	                        /* 'client' is not atomically added during xprt entry
1352	                         * addition to the list. */
1353	                        if (!client || strcmp(client->client_uid, client_uid))
1354	                                continue;
1355	
1356	                        ret = rpcsvc_request_submit (conf->rpc, xprt,
1357	                                                     &server_cbk_prog,
(gdb) p xprt
$3 = (rpc_transport_t *) 0x7f74ec2f8b90
(gdb) p xprt->xl_private
$4 = (struct _client_t *) 0x7f74c0009600
(gdb) p *$4
$5 = {scratch_ctx = {lock = {spinlock = -1073729472, mutex = {__data = {__lock = -1073729472, __count = 32628, 
          __owner = -1073700176, __nusers = 32628, __kind = 0, __spins = 0, __list = {__prev = 0x7f74c00086b0, 
            __next = 0x7f74c000a4d0}}, 
        __size = "@0\000\300t\177\000\000\260\242\000\300t\177\000\000\000\000\000\000\000\000\000\000\260\206\000\300t\177\000\000ะค\000\300t\177\000", __align = 140139414171712}}, count = 45680, ctx = 0x7f74c000a3a0}, bind = {
    cnt = 140139414200704, lk = {spinlock = -335454016, mutex = {__data = {__lock = -335454016, __count = 32628, 
          __owner = -347170848, __nusers = 32628, __kind = -347171040, __spins = 32628, __list = {
            __prev = 0x7f74eb4e9780 <dumpops>, __next = 0x7f74c000a400}}, 
        __size = "\300`\001\354t\177\000\000\340\227N\353t\177\000\000 \227N\353t\177\000\000\200\227N\353t\177\000\000\000\244\000\300t\177\000", __align = 140140152447168}}}, count = {cnt = 140139414201344, lk = {
      spinlock = -349325120, mutex = {__data = {__lock = -349325120, __count = 32628, __owner = -349325760, 
          __nusers = 32628, __kind = -349325824, __spins = 32628, __list = {
            __prev = 0x7f74eb2da660 <mem_acct_init>, __next = 0x7f74fe5c2970 <default_notify>}}, 
        __size = "\300\270-\353t\177\000\000@\266-\353t\177\000\000\000\266-\353t\177\000\000`\246-\353t\177\000\000p)\\\376t\177\000", __align = 140140138576064}}}, bound_xl = 0x0, this = 0x0, tbl_index = 0, client_uid = 0x0, 
  auth = {flavour = 0, len = 0, data = 0x0, username = 0x0, passwd = 0x0}}
(gdb) 


Upcalls are sent (by default) to all the clients connected to the brick process, except the one which is actually performing the fop (i.e., the one that triggered the upcall).

Hence, in 'server_process_event_upcall', we scan through the existing transport objects and match their corresponding clients' client_uid against that of the client which requested the fop. From the core, it looks like the rpc transport object holds a reference to a client object which is either invalid or uninitialized.

Requesting Du to take a look and provide his comments.

Comment 17 Raghavendra G 2017-06-08 06:13:25 UTC
looks like a DISCONNECT event on a transport came to server even before an ACCEPT event was delivered, leading to a stale xprt object in conf->xprt_list. Hence likely a duplicate of bz 1442535 and fixed by patch [1]

Note that the patch [1] went in 3.8.4-25, but this bug is filed under 3.8.4-24.

[1] https://code.engineering.redhat.com/gerrit/105653

Comment 18 Atin Mukherjee 2017-06-08 06:21:09 UTC
Based on comment 17, moving this bug to ON_QA.

Comment 19 Soumya Koduri 2017-06-08 07:19:46 UTC
(In reply to Raghavendra G from comment #17)
> looks like a DISCONNECT event on a transport came to server even before an
> ACCEPT event was delivered, leading to a stale xprt object in
> conf->xprt_list. Hence likely a duplicate of bz 1442535 and fixed by patch
> [1]
> 
> Note that the patch [1] went in 3.8.4-25, but this bug is filed under
> 3.8.4-24.
> 
> [1] https://code.engineering.redhat.com/gerrit/105653

Thanks Raghavendra.

Comment 20 Prasad Desala 2017-07-05 06:16:25 UTC
This issue is not reproduced on glusterfs version 3.8.4-32.el7rhgs.x86_64. 
Moving this BZ to Verified.

Comment 22 errata-xmlrpc 2017-09-21 04:41:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

