Description of problem:
========================
Writes are in progress on files from an NFS mount of a 2x2 distributed-replicate volume. After attaching a tier, all writes and ls are hung on the NFS mount.

Version-Release number of selected component (if applicable):
================================================================
glusterfs-3.7.5-19.el7rhgs.x86_64

How reproducible:
==================
Every time

Steps to Reproduce:
====================
1. Create a 2x2 distributed-replicate volume. Start the volume. Mount the volume (NFS mount).
2. Open fds on files and start writes on those fds.
3. Attach a tier (2x2) to the volume.
4. Try listing the files on the mount.
(See the command-level sketch after the additional info below.)

Actual results:
===============
Both writes and ls are hung.

Expected results:
==================
attach-tier shouldn't affect the I/O on the mount.

Additional info:
==================
[root@rhsauto019 ~]# gluster v info

Volume Name: testvol
Type: Tier
Volume ID: 52063226-a694-4352-8b84-774a6d6ad98b
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: rhsauto022.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier3
Brick2: rhsauto021.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier2
Brick3: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier1
Brick4: rhsauto019.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier0
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: rhsauto019.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick0
Brick6: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick1
Brick7: rhsauto021.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick2
Brick8: rhsauto022.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick3
Options Reconfigured:
cluster.watermark-hi: 40
cluster.watermark-low: 2
cluster.tier-mode: cache
features.ctr-enabled: on
performance.quick-read: off
performance.io-cache: off
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
[root@rhsauto019 ~]#

'ls' output before attaching the tier
=====================================
[root@skoduri ~]# ls -lh /mnt/nfs
total 244K
-rwxr-xr-x. 1 root root  54K Feb 12 13:16 file_dir_ops.sh
-rw-r--r--. 1 root root 6.0K Feb 12 13:18 testfile_1
-rw-r--r--. 1 root root 6.4K Feb 12 13:18 testfile_10
-rw-r--r--. 1 root root 6.2K Feb 12 13:18 testfile_11
-rw-r--r--. 1 root root 6.9K Feb 12 13:18 testfile_12
-rw-r--r--. 1 root root 6.9K Feb 12 13:18 testfile_13
-rw-r--r--. 1 root root 6.4K Feb 12 13:18 testfile_14
-rw-r--r--. 1 root root 5.9K Feb 12 13:18 testfile_15
-rw-r--r--. 1 root root 6.7K Feb 12 13:18 testfile_16
-rw-r--r--. 1 root root 6.2K Feb 12 13:18 testfile_17
-rw-r--r--. 1 root root 6.4K Feb 12 13:18 testfile_18
-rw-r--r--. 1 root root 5.5K Feb 12 13:18 testfile_19
-rw-r--r--. 1 root root 6.6K Feb 12 13:18 testfile_2
-rw-r--r--. 1 root root 6.6K Feb 12 13:18 testfile_20
-rw-r--r--. 1 root root 6.0K Feb 12 13:18 testfile_21
-rw-r--r--. 1 root root 6.0K Feb 12 13:18 testfile_22
-rw-r--r--. 1 root root 6.6K Feb 12 13:18 testfile_23
-rw-r--r--. 1 root root 6.2K Feb 12 13:18 testfile_24
-rw-r--r--. 1 root root 6.1K Feb 12 13:18 testfile_25
-rw-r--r--. 1 root root 6.5K Feb 12 13:18 testfile_26
-rw-r--r--. 1 root root 6.3K Feb 12 13:18 testfile_27
-rw-r--r--. 1 root root 6.0K Feb 12 13:18 testfile_28
-rw-r--r--. 1 root root 6.2K Feb 12 13:18 testfile_29
-rw-r--r--. 1 root root 6.2K Feb 12 13:18 testfile_3
-rw-r--r--. 1 root root 6.6K Feb 12 13:18 testfile_4
-rw-r--r--. 1 root root 6.6K Feb 12 13:18 testfile_5
-rw-r--r--. 1 root root 7.1K Feb 12 13:18 testfile_6
-rw-r--r--. 1 root root 6.5K Feb 12 13:18 testfile_7
-rw-r--r--. 1 root root 6.2K Feb 12 13:18 testfile_8
-rw-r--r--. 1 root root 6.2K Feb 12 13:18 testfile_9

After attach-tier, the files are not migrated to the hot tier:
==============================================================
[2016-02-12 07:49:23.794702] I [MSGID: 109038] [tier.c:170:tier_check_watermark] 0-testvol-tier-dht: Tier watermark now 1
[2016-02-12 08:40:07.217141] I [MSGID: 109028] [dht-rebalance.c:3619:gf_defrag_status_get] 0-glusterfs: Rebalance is in progress. Time taken is 3049.00 secs
[2016-02-12 08:40:07.217194] I [MSGID: 109028] [dht-rebalance.c:3623:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 0, failures: 0, skipped: 0
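For reference, a rough command-level sketch of the reproduction steps above (hostnames, brick paths and mount options here are placeholders, not the exact ones from this setup):

    # 1. create and start a 2x2 dist-rep volume, then NFS-mount it (Gluster/NFS, NFSv3)
    gluster volume create testvol replica 2 \
        host1:/bricks/brick0/b0 host2:/bricks/brick0/b1 \
        host3:/bricks/brick0/b2 host4:/bricks/brick0/b3
    gluster volume start testvol
    mount -t nfs -o vers=3 host1:/testvol /mnt/nfs

    # 2. open fds on files under /mnt/nfs and keep writing to them

    # 3. attach a 2x2 hot tier while the writes are in progress
    gluster volume attach-tier testvol replica 2 \
        host1:/bricks/brick1/t0 host2:/bricks/brick1/t1 \
        host3:/bricks/brick1/t2 host4:/bricks/brick1/t3

    # 4. try listing the files on the mount
    ls -lh /mnt/nfs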
1. How reproducible is this problem? Does it happen every time you run this test?
2. Is there a distaf testcase that we can run to easily reproduce this problem?
3. Does it only happen when mounting over NFS, or also with FUSE, Samba, Ganesha?
4. Did you try to reproduce it upstream? Has it already been fixed there?
1) and 3) This happens every time we run this against an NFS mount. It does not happen on FUSE.
2) This is a distaf testcase and can be easily recreated. (But the testcase is not in the git repo yet.)
4) No, we haven't tried to reproduce it upstream.
When checked against the socket number, we found that fd 21 is opened for that socket by the gluster-nfs process. From the core:

(gdb) p ev_data
$4 = (struct event_thread_data *) 0x7ff5790c1b10
(gdb) p ev_data->event_indec
There is no member named event_indec.
(gdb) p ev_data->event_index
$5 = 1
(gdb) p event_pool->activethreadcount
$6 = 3
(gdb) p event_pool->eventthreadcount
$7 = 2
(gdb) p *event_pool
$8 = {ops = 0x7ff578ac8700 <event_ops_epoll>, fd = 3, breaker = {0, 0}, count = 16384, reg = 0x0,
  ereg = {0x7ff5790b1980, 0x0 <repeats 1023 times>}, slots_used = {18, 0 <repeats 1023 times>},
  used = 0, changed = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
      __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__lock = 0, __futex = 0,
      __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0,
      __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, evcache = 0x0,
  evcache_size = 0, eventthreadcount = 2, pollers = {140692049266432, 140691903022848,
    0 <repeats 30 times>}, destroy = 0, activethreadcount = 3}
(gdb) p ctx
No symbol "ctx" in current context.
(gdb) p *event_pool->slots_used
$9 = 18
(gdb) p event_pool->slots_used
$10 = {18, 0 <repeats 1023 times>}
(gdb) p event_pool->ereg[0]
$11 = (struct event_slot_epoll *) 0x7ff5790b1980
(gdb) p event_pool->ereg[1]
$12 = (struct event_slot_epoll *) 0x0
(gdb) p *event_pool->ereg[0]
$13 = {fd = 7, events = 1073741827, gen = 1, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff57908f5d0, handler = 0x7ff56d4baaf0 <socket_server_event_handler>, lock = 1}
(gdb) p event_pool->slots_used[0]
$14 = 18
(gdb) p *event_pool->ereg[1]
Cannot access memory at address 0x0
(gdb) p *event_pool->ereg[2]
Cannot access memory at address 0x0
(gdb) p *event_pool->ereg[3]
Cannot access memory at address 0x0
(gdb) p $11
$15 = (struct event_slot_epoll *) 0x7ff5790b1980
(gdb) p $11[0]
$16 = {fd = 7, events = 1073741827, gen = 1, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff57908f5d0, handler = 0x7ff56d4baaf0 <socket_server_event_handler>, lock = 1}
(gdb) p $11[1]
$17 = {fd = 9, events = 1073741827, gen = 1, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff5790bf7b0, handler = 0x7ff56d4be240 <socket_event_handler>, lock = 1}
(gdb) p $11[2]
$18 = {fd = 11, events = 1073741827, gen = 1, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff56404a9f0, handler = 0x7ff56d4baaf0 <socket_server_event_handler>, lock = 1}
(gdb) p $11[3]
$19 = {fd = 12, events = 1073741827, gen = 1, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff56404be30, handler = 0x7ff56d4baaf0 <socket_server_event_handler>, lock = 1}
(gdb) p $11[4]
$20 = {fd = 13, events = 1073741827, gen = 1, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff56404d200, handler = 0x7ff56d4baaf0 <socket_server_event_handler>, lock = 1}
(gdb) p $11[5]
$21 = {fd = 14, events = 1073741827, gen = 1, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff56404e6a0, handler = 0x7ff56d4baaf0 <socket_server_event_handler>, lock = 1}
(gdb) p $11[6]
$22 = {fd = 15, events = 1073741827, gen = 1, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff56404fda0, handler = 0x7ff56d4baaf0 <socket_server_event_handler>, lock = 1}
(gdb) p $11[7]
$23 = {fd = 18, events = 1073741827, gen = 13, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff5641f5b90, handler = 0x7ff56d4be240 <socket_event_handler>, lock = 1}
(gdb) p $11[8]
$24 = {fd = 19, events = 1073741827, gen = 4, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff5641a8130, handler = 0x7ff56d4be240 <socket_event_handler>, lock = 1}
(gdb) p $11[9]
$25 = {fd = 20, events = 1073741827, gen = 1, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff564181400, handler = 0x7ff56d4be240 <socket_event_handler>, lock = 1}
(gdb) p $11[10]
$26 = {fd = 10, events = 1073741827, gen = 13, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff56407f7b0, handler = 0x7ff56d4be240 <socket_event_handler>, lock = 1}
(gdb) p $11[11]
$27 = {fd = 22, events = 1073741827, gen = 1, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff56415a6d0, handler = 0x7ff56d4be240 <socket_event_handler>, lock = 1}
(gdb) p $11[12]
$28 = {fd = 23, events = 1073741827, gen = 1, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff5641339a0, handler = 0x7ff56d4be240 <socket_event_handler>, lock = 1}
(gdb) p $11[13]
$29 = {fd = 21, events = 1073741826, gen = 4, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff564002140, handler = 0x7ff56d4be240 <socket_event_handler>, lock = 1}
(gdb) p event_pool->slots_used[13]
$30 = 0
(gdb) p event_pool->slots_used[0][13]
cannot subscript something of type `int'
(gdb) p event_pool->ereg[0][13]
$31 = {fd = 21, events = 1073741826, gen = 4, ref = 1, do_close = 0, in_handler = 0, data = 0x7ff564002140, handler = 0x7ff56d4be240 <socket_event_handler>, lock = 1}
(gdb) p event_pool->ereg[0][13]->events
$32 = 1073741826
(gdb) p *event_pool->ereg[0][13]->events
Cannot access memory at address 0x40000002
(gdb) p &event_pool->ereg[0][13]->events
$33 = (int *) 0x7ff5790b1bf4
(gdb) p/x event_pool->ereg[0][13]->events     <<<<<<<<<<<<< EPOLLIN is not enabled
$34 = 0x40000002
(gdb) p/u 1u << 30
$35 = 1073741824
(gdb) p/x 1u << 30
$36 = 0x40000000
(gdb) p/x ($34|0x001)
$37 = 0x40000003
(gdb) p/x event_pool->ereg[0][1]->events
$38 = 0x40000003
(gdb) p/x event_pool->ereg[0][0]->events
$39 = 0x40000003
(gdb) p/x event_pool->ereg[0][2]->events
$40 = 0x40000003
(gdb) p/x event_pool->ereg[0][3]->events
$41 = 0x40000003
(gdb) p/x event_pool->ereg[0][4]->events
$42 = 0x40000003
(gdb) p/x event_pool->ereg[0][5]->events
$43 = 0x40000003
(gdb) p/x event_pool->ereg[0][6]->events
$44 = 0x40000003
(gdb) p/x event_pool->ereg[0][7]->events
$45 = 0x40000003
(gdb) p/x event_pool->ereg[0][8]->events
$46 = 0x40000003
(gdb) p/x event_pool->ereg[0][9]->events
$47 = 0x40000003
(gdb) p/x event_pool->ereg[0][10]->events
$48 = 0x40000003
(gdb) p/x event_pool->ereg[0][11]->events
$49 = 0x40000003
(gdb) p/x event_pool->ereg[0][12]->events
$50 = 0x40000003
(gdb) p/x event_pool->ereg[0][13]->events
$51 = 0x40000002
(gdb) p/x event_pool->ereg[0][14]->events
$52 = 0x40000003

As suspected, there is no EPOLLIN event enabled for this particular fd. We need to analyse the possible scenarios that could lead to this state.
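To make the bitmask comparison at the end of the gdb session explicit: with the standard Linux epoll flag values, 0x40000003 decodes to EPOLLONESHOT | EPOLLPRI | EPOLLIN, while the slot for fd 21 shows 0x40000002, i.e. the same mask without EPOLLIN. A small standalone decode, only as an illustration (the masks are copied from the dump above):

    #include <stdio.h>
    #include <sys/epoll.h>

    int main (void)
    {
            /* event masks copied from the core dump above */
            unsigned int healthy_slot = 0x40000003;  /* e.g. ereg[0][12] */
            unsigned int hung_slot    = 0x40000002;  /* ereg[0][13], fd 21 */

            printf ("EPOLLIN=0x%x EPOLLPRI=0x%x EPOLLONESHOT=0x%x\n",
                    EPOLLIN, EPOLLPRI, EPOLLONESHOT);
            printf ("healthy slot: EPOLLIN %s\n",
                    (healthy_slot & EPOLLIN) ? "set" : "missing");
            printf ("fd 21 slot  : EPOLLIN %s\n",
                    (hung_slot & EPOLLIN) ? "set" : "missing");
            return 0;
    }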
Created attachment 1127291 [details] nfs-hang-from-server.pcap
If the request is served from the DRC, we never decrement the outstanding request count, because rpcsvc_request_destroy is missing on that path. This causes the outstanding request count to end up permanently above the outstanding-request limit, resulting in this issue. I'll send a patch to fix it.

        /* DRC */
        if (rpcsvc_need_drc (req)) {
                drc = req->svc->drc;

                LOCK (&drc->lock);
                {
                        reply = rpcsvc_drc_lookup (req);

                        /* retransmission of completed request, send cached reply */
                        if (reply && reply->state == DRC_OP_CACHED) {
                                gf_log (GF_RPCSVC, GF_LOG_INFO, "duplicate request:"
                                        " XID: 0x%x", req->xid);
                                ret = rpcsvc_send_cached_reply (req, reply);
                                drc->cache_hits++;
                                UNLOCK (&drc->lock);
>>>>>>>>>>>>>>>                 we don't destroy the request here, causing a request leak
                                and making the outstanding request count off by 1
                                (for every request served from the DRC).
                                goto out;

                        /* retransmitted request, original op in transit, drop it */
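A minimal sketch of the fix proposed above, only for illustration: it assumes rpcsvc_request_destroy() is the right helper to release the request on the cache-hit path and that it is safe to call after the cached reply has been sent (the placement is hypothetical, not the actual patch):

                        /* retransmission of a completed request: send the cached reply */
                        if (reply && reply->state == DRC_OP_CACHED) {
                                gf_log (GF_RPCSVC, GF_LOG_INFO, "duplicate request:"
                                        " XID: 0x%x", req->xid);
                                ret = rpcsvc_send_cached_reply (req, reply);
                                drc->cache_hits++;
                                UNLOCK (&drc->lock);
                                /* hypothetical fix: release the request so the
                                 * outstanding request count is decremented again */
                                rpcsvc_request_destroy (req);
                                goto out;
                        }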
I thought DRC is disabled by default. Is that only for Gluster/NFS and not for the Gluster client? If it really is disabled, we should not be able to reach this piece of code...
I am not so familiar with NFS. Soumya should be able to answer this.
As Niels mentioned, DRC seems to be disabled by default, and hence it's unlikely we hit the above-mentioned code path.
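For completeness, whether DRC is active for Gluster/NFS can be checked or toggled via a volume option; the option names below (nfs.drc, nfs.drc-size) are an assumption for this build and should be verified against the installed version:

    # show the effective value (it is not listed in 'gluster v info' unless reconfigured)
    gluster volume get testvol nfs.drc

    # explicitly disable it
    gluster volume set testvol nfs.drc off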
(In reply to Soumya Koduri from comment #10)
> As Niels mentioned, DRC seems to be disabled by default, and hence it's
> unlikely we hit the above-mentioned code path.

In that case, my analysis is invalid.
This issue is seen with the following workload too:

1) Create a 4x2 dist-rep cold tier.
2) Perform an NFS mount of this volume and create larger files using dd:
   for i in {1..10};do dd if=/dev/urandom of=file$i bs=1024 count=700000;echo $?;done
3) Attach a tier (2x2) and allow attach-tier to complete.
4) Detach the tier and allow detach-tier to complete.
5) Attach a tier once again with a different or cleaned-up brick path.
(A rough command sketch of steps 3-5 follows below.)

I/O hangs while writing 'file-3'.

sosreports from all 4 nodes and the client are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1306930/
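A rough command sketch of steps 3-5 above (brick paths are placeholders, and the exact status sub-commands may differ between builds):

    # 3. attach a 2x2 hot tier and wait for the attach-tier rebalance to complete
    gluster volume attach-tier testvol replica 2 <hot-brick1> <hot-brick2> <hot-brick3> <hot-brick4>

    # 4. detach the tier and commit once the demotion has finished
    gluster volume detach-tier testvol start
    gluster volume detach-tier testvol status
    gluster volume detach-tier testvol commit

    # 5. attach once again, with different or cleaned-up brick paths
    gluster volume attach-tier testvol replica 2 <new-hot-brick1> <new-hot-brick2> <new-hot-brick3> <new-hot-brick4>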