Bug 1172938 - Excessive logging - 'trying duplicate remote fd set' on fuse mount logfile - after rebalance completion
Summary: Excessive logging - 'trying duplicate remote fd set' on fuse mount logfile - ...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Raghavendra G
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard: dht-rca-unknown, dht-rebalance-file, ...
Depends On:
Blocks: 1367265 1367283
 
Reported: 2014-12-11 06:53 UTC by SATHEESARAN
Modified: 2019-10-14 08:24 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1367265 (view as bug list)
Environment:
Last Closed: 2019-10-14 08:23:15 UTC
Embargoed:


Attachments (Terms of Use)
fuse mount logs (1.11 MB, text/plain) - 2014-12-11 07:01 UTC, SATHEESARAN
sosreport from RHSS NODE1 (14.66 MB, application/x-xz) - 2014-12-11 07:24 UTC, SATHEESARAN
sosreport from RHSS NODE2 (10.97 MB, application/x-xz) - 2014-12-11 07:54 UTC, SATHEESARAN

Description SATHEESARAN 2014-12-11 06:53:23 UTC
Description of problem:
-----------------------
After rebalance of the gluster volume backing the VM images completed, I could see a large number of "trying duplicate remote fd set" messages in the fuse mount log file.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
glusterfs-3.6.0.36-1.el6rhs

How reproducible:
-----------------
Always

Steps to Reproduce:
--------------------
1. Configure the gluster volume as the image store in RHEVM
2. Create App VMs with their disk images on the gluster volume
3. Install an OS on the App VMs and run some minimal load on the VMs using the 'dd' command
4. Add 2 more bricks to the volume
5. Trigger rebalance on the volume.
Observation: I could see that the rebalance was in progress and one of the VM image files was being migrated
6. After rebalance completes, take a look at the fuse mount log file

Actual results:
---------------
Too many informational messages related to "trying duplicate remote fd set" are logged.

Expected results:
-----------------
Excessive logging should be avoided so that the log files do not consume too much space under the root filesystem.

Additional info:
----------------
# grep -i "trying duplicate remote fd set" /var/log/glusterfs/rhev-data-center-mnt-glusterSD-dhcp37-44.lab.eng.blr.redhat.com\:volume-imgstore.log | wc -l
3800

# ls -lh /var/log/glusterfs/rhev-data-center-mnt-glusterSD-dhcp37-44.lab.eng.blr.redhat.com\:volume-imgstore.log
-rw-------. 1 root root 1.2M Dec  9 03:32 /var/log/glusterfs/rhev-data-center-mnt-glusterSD-dhcp37-44.lab.eng.blr.redhat.com:volume-imgstore.log

Exact messages as seen in the logs:
-----------------------------------
[2014-12-09 08:32:38.180961] I [client-helpers.c:99:this_fd_set_ctx] 2-volume-imgstore-client-4: /ebbf10c8-4b58-4b65-8f53-87c44058d19a/images/36357154-e609-47c0-9f1e-b77e89cef998/e4f33100-034c-46be-84c4-5a76a3952017 (367f266e-f306-4498-b920-5e113afa95f9): trying duplicate remote fd set. 
[2014-12-09 08:32:38.181036] I [client-helpers.c:99:this_fd_set_ctx] 2-volume-imgstore-client-5: /ebbf10c8-4b58-4b65-8f53-87c44058d19a/images/36357154-e609-47c0-9f1e-b77e89cef998/e4f33100-034c-46be-84c4-5a76a3952017 (367f266e-f306-4498-b920-5e113afa95f9): trying duplicate remote fd set. 
[2014-12-09 08:32:38.192022] I [client-helpers.c:99:this_fd_set_ctx] 2-volume-imgstore-client-5: /ebbf10c8-4b58-4b65-8f53-87c44058d19a/images/36357154-e609-47c0-9f1e-b77e89cef998/e4f33100-034c-46be-84c4-5a76a3952017 (367f266e-f306-4498-b920-5e113afa95f9): trying duplicate remote fd set. 
[2014-12-09 08:32:38.192059] I [client-helpers.c:99:this_fd_set_ctx] 2-volume-imgstore-client-4: /ebbf10c8-4b58-4b65-8f53-87c44058d19a/images/36357154-e609-47c0-9f1e-b77e89cef998/e4f33100-034c-46be-84c4-5a76a3952017 (367f266e-f306-4498-b920-5e113afa95f9): trying duplicate remote fd set.

Comment 1 SATHEESARAN 2014-12-11 06:56:57 UTC
Other relevant information
---------------------------
1. Volume Type      : Distributed-Replicate, 2x2 (3x2 after the add-brick)
2. Number of AppVMs : 4 (each with ~21GB of disk size)
3. Gluster peer, volume, and rebalance status
[root@dhcp37-44 ~]# gluster pe s
Number of Peers: 1

Hostname: dhcp37-201.lab.eng.blr.redhat.com
Uuid: 98d2952d-8ee4-4de2-a6d9-786e7a0435a9
State: Peer in Cluster (Connected)

[root@dhcp37-44 ~]# gluster pool list
UUID					Hostname	State
98d2952d-8ee4-4de2-a6d9-786e7a0435a9	dhcp37-201.lab.eng.blr.redhat.com	Connected 
a28b5a64-4f47-419a-a373-9d2777df7038	localhost	Connected 

[root@dhcp37-44 ~]# gluster volume info
 
Volume Name: volume-imgstore
Type: Distributed-Replicate
Volume ID: 761a2895-ae83-4cc3-8a18-9478f8cf43b8
Status: Started
Snap Volume: no
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: dhcp37-44.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick2: dhcp37-201.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick3: dhcp37-44.lab.eng.blr.redhat.com:/rhs/brick2/b2
Brick4: dhcp37-201.lab.eng.blr.redhat.com:/rhs/brick2/b2
Brick5: dhcp37-44.lab.eng.blr.redhat.com:/rhs/brick3/b3
Brick6: dhcp37-201.lab.eng.blr.redhat.com:/rhs/brick3/b3
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: enable
nfs.disable: off
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

[root@dhcp37-44 ~]# gluster volume status
Status of volume: volume-imgstore
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick dhcp37-44.lab.eng.blr.redhat.com:/rhs/brick1/b1	49152	Y	12048
Brick dhcp37-201.lab.eng.blr.redhat.com:/rhs/brick1/b1	49152	Y	9841
Brick dhcp37-44.lab.eng.blr.redhat.com:/rhs/brick2/b2	49153	Y	12059
Brick dhcp37-201.lab.eng.blr.redhat.com:/rhs/brick2/b2	49153	Y	9852
Brick dhcp37-44.lab.eng.blr.redhat.com:/rhs/brick3/b3	49154	Y	14227
Brick dhcp37-201.lab.eng.blr.redhat.com:/rhs/brick3/b3	49154	Y	20689
NFS Server on localhost					2049	Y	14240
Self-heal Daemon on localhost				N/A	Y	14247
NFS Server on dhcp37-201.lab.eng.blr.redhat.com		2049	Y	20701
Self-heal Daemon on dhcp37-201.lab.eng.blr.redhat.com	N/A	Y	20708
 
Task Status of Volume volume-imgstore
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : c37950fc-2ca5-4752-99b8-23e4ee74a22a
Status               : completed           
 
[root@dhcp37-44 ~]# gluster volume rebalance volume-imgstore status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                9        22.0GB            31             0             0            completed            7522.00
       dhcp37-201.lab.eng.blr.redhat.com                0        0Bytes            22             0             0            completed               1.00
volume rebalance: volume-imgstore: success:

Comment 2 SATHEESARAN 2014-12-11 07:01:02 UTC
Created attachment 967096 [details]
fuse mount logs

Fuse mount logs on the client machine (Hypervisor)

Comment 4 SATHEESARAN 2014-12-11 07:24:15 UTC
Created attachment 967101 [details]
sosreport from RHSS NODE1

sosreport from NODE1

Comment 5 SATHEESARAN 2014-12-11 07:54:34 UTC
Created attachment 967110 [details]
sosreport from RHSS NODE2

Comment 6 Raghavendra G 2016-07-01 11:35:29 UTC
These logs indicate duplicate opens on the fds during rebalance. Apart from the logging noise, this needs to be fixed to avoid fd leaks.
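
Roughly, the check that emits this message is the fd-context helper in protocol/client. Below is a simplified sketch of it (not the verbatim client-helpers.c code; only the generic fd_ctx_get()/fd_ctx_set(), gf_log() and uuid_utoa() helpers from libglusterfs are assumed). The point is that each client xlator keeps a single remote-fd context per fd, so when a second open for the same fd reaches the same brick, one of the two remote fds never gets released, which is the fd leak mentioned above.

/* Simplified sketch of the check behind "trying duplicate remote fd set";
 * not the verbatim client-helpers.c code. */
#include "xlator.h"
#include "fd.h"
#include "logging.h"
#include "common-utils.h"

static void
remote_fd_set_sketch (fd_t *fd, xlator_t *this, loc_t *loc, uint64_t new_ctx)
{
        uint64_t old_ctx = 0;

        if ((fd_ctx_get (fd, this, &old_ctx) == 0) && old_ctx) {
                /* A remote fd is already recorded for this fd on this
                 * subvolume: a second open reached the same brick. One
                 * of the two remote fds will never see a release. */
                gf_log (this->name, GF_LOG_INFO,
                        "%s (%s): trying duplicate remote fd set.",
                        loc->path, uuid_utoa (loc->inode->gfid));
        }

        fd_ctx_set (fd, this, new_ctx);
}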

Comment 8 Raghavendra G 2016-09-07 11:56:12 UTC
Patch [1] largely solves the in_progress/completion task races. However, the fix is not complete, as:
1. the traversal of the list of open fds on an inode is not a single atomic operation.
2. [1] doesn't track in-progress opens, and hence more than one open can be issued on the same fd by racing in-progress-check/rebalance-completion-detection tasks.

I can think of two possible solutions to this problem:
1. As mentioned in point 2 above, track in-progress opens and wait for them to complete before issuing another open. However, this requires pausing the current task until the in-progress open completes (as the current fop cannot proceed until the fd is open).
2. Let the redundant opens be issued, but make sure protocol/client detects duplicate opens _and_ cleans up the fdctx, closing the old fd on the brick by calling client_fdctx_destroy on the old fdctx.

To be safe, we can implement both 1 and 2 (a rough sketch of 2 follows).
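
To make 2 concrete, something along these lines in the open/opendir callback path (a schematic sketch only, not a patch; this_fd_get_ctx()/this_fd_set_ctx() and client_fdctx_destroy() are the existing protocol/client helpers, but their exact signatures and the locking are assumptions here):

/* Schematic of approach 2: publish the new remote fd and release the
 * older one instead of leaking it. Helper signatures and locking are
 * assumed, not copied from the code. */
#include "client.h"   /* protocol/client types: clnt_conf_t, clnt_fd_ctx_t */

static void
client_dedup_open_sketch (xlator_t *this, clnt_conf_t *conf,
                          fd_t *fd, loc_t *loc, clnt_fd_ctx_t *newctx)
{
        clnt_fd_ctx_t *oldctx = NULL;

        pthread_mutex_lock (&conf->lock);
        {
                /* the check and the set must be one atomic step */
                oldctx = this_fd_get_ctx (fd, this);
                this_fd_set_ctx (fd, this, loc, newctx);
        }
        pthread_mutex_unlock (&conf->lock);

        if (oldctx) {
                /* duplicate open detected: send a release for the older
                 * remote fd on the brick and free its context */
                client_fdctx_destroy (this, oldctx);
        }
}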

@Nithya and others,

Your inputs are needed on the possible approaches.

Comment 9 Raghavendra G 2016-09-07 11:57:38 UTC
(In reply to Raghavendra G from comment #8)
> Patch [1] largely solves the in_progress/completion task races.

[1] http://review.gluster.org/12985

Comment 10 Raghavendra G 2016-09-14 06:28:13 UTC
(In reply to Raghavendra G from comment #8)
> Patch [1] largely solves the in_progress/completion task races. However, the
> fix is not complete, as:
> 1. the traversal of the list of open fds on an inode is not a single atomic
> operation.
> 2. [1] doesn't track in-progress opens, and hence more than one open can be
> issued on the same fd by racing
> in-progress-check/rebalance-completion-detection tasks.
> 
> I can think of two possible solutions to this problem:
> 1. As mentioned in point 2 above, track in-progress opens and wait for them
> to complete before issuing another open. However, this requires pausing the
> current task until the in-progress open completes (as the current fop cannot
> proceed until the fd is open).

This seems a bit complex to implement. However, since the in-progress/completion checks already run as synctasks, we might have a way out. I'll try exploring synctask_yield etc. to implement this; a rough sketch of the idea is at the end of this comment. If it turns out to be too complex, I propose that approach 2 alone should be good enough.

> 2. Let the redundant opens be issued, but make sure protocol/client detects
> duplicate opens _and_ cleans up the fdctx, closing the old fd on the brick
> by calling client_fdctx_destroy on the old fdctx.
> 
> To be safe, we can implement both 1 and 2.
> 
> @Nithya and others,
> 
> Your inputs are needed on the possible approaches.
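
For the record, the synctask-based pausing for approach 1 could look roughly like the sketch below. This is purely illustrative and assumes the check runs inside a synctask: only synctask_get(), synctask_yield() and synctask_wake() are existing syncop primitives; the tracking structure and its fields are invented for illustration, real code would keep a list of waiters rather than a single pointer, and it would also have to close the window between registering as a waiter and yielding.

#include "glusterfs.h"
#include "locking.h"
#include "syncop.h"

/* Hypothetical per-fd tracking of an in-progress open (approach 1). */
typedef struct {
        gf_boolean_t      open_in_progress;
        struct synctask  *waiter;     /* real code: a waitq list */
        gf_lock_t         lock;       /* initialised with LOCK_INIT */
} dht_open_track_t;

/* Called by a fop that needs the fd opened on the migration target.
 * Returns _gf_true if the caller should perform the open itself;
 * otherwise the task yields until the opener wakes it up. */
static gf_boolean_t
dht_open_begin_or_wait (dht_open_track_t *track)
{
        struct synctask *task   = synctask_get ();
        gf_boolean_t     opener = _gf_false;

        LOCK (&track->lock);
        {
                if (!track->open_in_progress) {
                        track->open_in_progress = _gf_true;
                        opener = _gf_true;
                } else {
                        track->waiter = task;
                }
        }
        UNLOCK (&track->lock);

        if (!opener)
                synctask_yield (task);  /* parked till dht_open_end() */

        return opener;
}

/* Called by the opener once the open on the new subvolume completes. */
static void
dht_open_end (dht_open_track_t *track)
{
        struct synctask *waiter = NULL;

        LOCK (&track->lock);
        {
                track->open_in_progress = _gf_false;
                waiter = track->waiter;
                track->waiter = NULL;
        }
        UNLOCK (&track->lock);

        if (waiter)
                synctask_wake (waiter);
}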

Comment 11 Atin Mukherjee 2016-11-07 13:29:26 UTC
upstream mainline patch http://review.gluster.org/15462 posted for review.

