Bug 1172938
| Summary: | Excessive logging - 'trying duplicate remote fd set' on fuse mount logfile - after rebalance completion | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | SATHEESARAN <sasundar> | ||||||||
| Component: | distribute | Assignee: | Raghavendra G <rgowdapp> | ||||||||
| Status: | CLOSED WONTFIX | QA Contact: | storage-qa-internal <storage-qa-internal> | ||||||||
| Severity: | medium | Docs Contact: | |||||||||
| Priority: | medium | ||||||||||
| Version: | rhgs-3.0 | CC: | amukherj, kramdoss, nbalacha, pkarampu, rcyriac, rgowdapp, rhinduja, rhs-bugs | ||||||||
| Target Milestone: | --- | Keywords: | ZStream | ||||||||
| Target Release: | --- | ||||||||||
| Hardware: | x86_64 | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | dht-rca-unknown, dht-rebalance-file, dht-log, dht-3.2.0-proposed, dht-retest | ||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | |||||||||||
| : | 1367265 (view as bug list) | Environment: | |||||||||
| Last Closed: | 2019-10-14 08:23:15 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Bug Depends On: | |||||||||||
| Bug Blocks: | 1367265, 1367283 | ||||||||||
Description
SATHEESARAN
2014-12-11 06:53:23 UTC
Other relevant information
---------------------------
1. Volume Type : Distributed-Replicate with 2X2
2. Number of AppVMs : 4 ( each with ~21GB of disk size )
3. Gluster Peer, Volume,rebalance status
[root@dhcp37-44 ~]# gluster pe s
Number of Peers: 1
Hostname: dhcp37-201.lab.eng.blr.redhat.com
Uuid: 98d2952d-8ee4-4de2-a6d9-786e7a0435a9
State: Peer in Cluster (Connected)
[root@dhcp37-44 ~]# gluster pool list
UUID Hostname State
98d2952d-8ee4-4de2-a6d9-786e7a0435a9 dhcp37-201.lab.eng.blr.redhat.com Connected
a28b5a64-4f47-419a-a373-9d2777df7038 localhost Connected
[root@dhcp37-44 ~]# gluster volume info
Volume Name: volume-imgstore
Type: Distributed-Replicate
Volume ID: 761a2895-ae83-4cc3-8a18-9478f8cf43b8
Status: Started
Snap Volume: no
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: dhcp37-44.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick2: dhcp37-201.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick3: dhcp37-44.lab.eng.blr.redhat.com:/rhs/brick2/b2
Brick4: dhcp37-201.lab.eng.blr.redhat.com:/rhs/brick2/b2
Brick5: dhcp37-44.lab.eng.blr.redhat.com:/rhs/brick3/b3
Brick6: dhcp37-201.lab.eng.blr.redhat.com:/rhs/brick3/b3
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: enable
nfs.disable: off
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root@dhcp37-44 ~]# gluster volume status
Status of volume: volume-imgstore
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick dhcp37-44.lab.eng.blr.redhat.com:/rhs/brick1/b1 49152 Y 12048
Brick dhcp37-201.lab.eng.blr.redhat.com:/rhs/brick1/b1 49152 Y 9841
Brick dhcp37-44.lab.eng.blr.redhat.com:/rhs/brick2/b2 49153 Y 12059
Brick dhcp37-201.lab.eng.blr.redhat.com:/rhs/brick2/b2 49153 Y 9852
Brick dhcp37-44.lab.eng.blr.redhat.com:/rhs/brick3/b3 49154 Y 14227
Brick dhcp37-201.lab.eng.blr.redhat.com:/rhs/brick3/b3 49154 Y 20689
NFS Server on localhost 2049 Y 14240
Self-heal Daemon on localhost N/A Y 14247
NFS Server on dhcp37-201.lab.eng.blr.redhat.com 2049 Y 20701
Self-heal Daemon on dhcp37-201.lab.eng.blr.redhat.com N/A Y 20708
Task Status of Volume volume-imgstore
------------------------------------------------------------------------------
Task : Rebalance
ID : c37950fc-2ca5-4752-99b8-23e4ee74a22a
Status : completed
[root@dhcp37-44 ~]# gluster volume rebalance volume-imgstore status
Node                               Rebalanced-files  size    scanned  failures  skipped  status     run time in secs
---------------------------------  ----------------  ------  -------  --------  -------  ---------  ----------------
localhost                          9                 22.0GB  31       0         0        completed  7522.00
dhcp37-201.lab.eng.blr.redhat.com  0                 0Bytes  22       0         0        completed  1.00
volume rebalance: volume-imgstore: success:
Created attachment 967096 [details]
fuse mount logs
Fuse mount logs on the client machine (Hypervisor)
Created attachment 967101 [details]
sosreport from RHSS NODE1
sosreport from NODE1
Created attachment 967110 [details]
sosreport from RHSS NODE2
These logs indicate duplicate opens on the fds during rebalance. Apart from the log noise, this needs to be fixed to avoid fd leaks.

patch [1] largely solves the in_progress/completion task races. However the fix is not complete as
1. the traversal of list of open fds on an inode is not a single atomic operation.
2. [1] doesn't track in-progress open and hence more than one open can be issued on same fd by racing in-progress-check/rebalance-completion-detection tasks.

I can think of two possible solutions to this problem:
1. As mentioned in 2, tracking in-progress opens and waiting for them to complete before issuing open. However this requires to pause current task till in-progress open completes (as current fop cannot proceed till the fd is open).
2. Let the redundant opens be issued, but make sure protocol/client detect duplicate opens _and_ cleanup the fdctx, close the oldfd on brick, by calling client_fdctx_destroy on oldfdctx.

We can implement both 1 and 2 to be more careful.

@Nithya and others,
Your inputs are required on possible approach.

(In reply to Raghavendra G from comment #8)
> patch [1] largely solves the in_progress/completion task races.

[1] http://review.gluster.org/12985

(In reply to Raghavendra G from comment #8)
> patch [1] largely solves the in_progress/completion task races. However the
> fix is not complete as
> 1. the traversal of list of open fds on an inode is not a single atomic
> operation.
> 2. [1] doesn't track in-progress open and hence more than one open can be
> issued on same fd by racing in-progress-check/rebalance-completion-detection
> tasks.
>
> I can think of two possible solutions to this problem:
> 1. As mentioned in 2, tracking in-progress opens and waiting for them to
> complete before issuing open. However this requires to pause current task
> till in-progress open completes (as current fop cannot proceed till the fd
> is open).

This seems a bit complex to implement. However, since in-progress/completion-check are already sync-tasks, we might have a way out. I'll try exploring synctask_yield etc. to implement this. If it turns out too complex, I propose 2 should be fine enough.

> 2. Let the redundant opens be issued, but make sure protocol/client detect
> duplicate opens _and_ cleanup the fdctx, close the oldfd on brick, by
> calling client_fdctx_destroy on oldfdctx.
>
> We can implement both 1 and 2 to be more careful.
>
> @Nithya and others,
>
> Your inputs are required on possible approach.

upstream mainline patch http://review.gluster.org/15462 posted for review.
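For readers following the discussion in comment #8, the two proposed approaches can be modelled with a toy sketch. This is not GlusterFS code: `Brick`, `FdCtx`, `ensure_open` and `set_remote_fd` are hypothetical names invented for illustration, Python threads with a condition variable stand in for synctasks, and closing the old remote fd stands in for what the comment describes as calling client_fdctx_destroy on the old fdctx.

```python
import threading


class Brick:
    """Hypothetical stand-in for a brick: hands out remote fd numbers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0
        self.open_fds = set()  # remote fds currently open on the brick

    def open(self):
        with self._lock:
            self._next += 1
            self.open_fds.add(self._next)
            return self._next

    def close(self, remote_fd):
        with self._lock:
            self.open_fds.discard(remote_fd)


class FdCtx:
    """Hypothetical client-side context for one fd."""

    def __init__(self, brick):
        self.brick = brick
        self.remote_fd = None
        self.opening = False  # True while an open is in flight
        self.cond = threading.Condition()

    def ensure_open(self):
        """Approach 1: track the in-progress open; racing callers wait for it."""
        with self.cond:
            while self.opening:
                self.cond.wait()  # pause until the in-flight open completes
            if self.remote_fd is not None:
                return self.remote_fd
            self.opening = True
        try:
            new_fd = self.brick.open()  # the "network" call, made outside the lock
        except Exception:
            with self.cond:
                self.opening = False
                self.cond.notify_all()
            raise
        with self.cond:
            self.remote_fd = new_fd
            self.opening = False
            self.cond.notify_all()
        return new_fd

    def set_remote_fd(self, new_fd):
        """Approach 2: tolerate a duplicate set, but release the old remote fd."""
        with self.cond:
            old_fd, self.remote_fd = self.remote_fd, new_fd
        if old_fd is not None and old_fd != new_fd:
            self.brick.close(old_fd)  # avoids leaking the fd on the brick


if __name__ == "__main__":
    # Approach 1: eight racing tasks result in a single open on the brick.
    brick = Brick()
    ctx = FdCtx(brick)
    tasks = [threading.Thread(target=ctx.ensure_open) for _ in range(8)]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    print(len(brick.open_fds))
```

In this model, approach 1 prevents the redundant open from being issued at all, at the cost of making callers wait; approach 2 lets the redundant open happen and only guarantees that the superseded remote fd is closed. Combining both, as the comment suggests, would mean the cleanup in `set_remote_fd` acts as a safety net if the in-progress tracking is ever bypassed.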