Bug 1632960
Summary: | Hundreds of gvfsd-trash processes are spawned when user runs Xsession/Gnome after an NFS session failed | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Robert Verstandig <r.verstandig> | |
Component: | gvfs | Assignee: | Ondrej Holy <oholy> | |
Status: | CLOSED ERRATA | QA Contact: | Desktop QE <desktop-qa-list> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 7.5 | CC: | jwright, mboisver, r.verstandig, tpelka | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | gvfs-1.36.2-2.el7 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1739117 | Environment: | |
Last Closed: | 2019-08-06 12:57:59 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1656436, 1739117 |
Description
Robert Verstandig
2018-09-25 23:02:55 UTC
Thanks for your detailed report. I think we have enough info here to reproduce the problem and propose a fix, thanks to the fact that it relates to NFS failures. This was fixed upstream for some time simply by ignoring network filesystems (because they were marked as system-internal). However, that upstream change was recently reverted, so this bug has to be fixed in another way... Actually, it is still not clear what is causing the large number of requests to the trash backend. I don't see it in my environment; it is probably a bug in some client application and it would be nice to fix it, but I don't see any easy way to find the culprit. That said, this does not block fixing the bug in the gvfs infrastructure that allows such a large number of gvfsd-trash processes to be spawned...

I am still trying to find the source of the requests. You are talking about hundreds of gvfsd-trash processes, but can you please also tell me how many user sessions are running on that server, or rather how many processes are spawned per user session?

The NFS hangs seemed to be completely random. During the semester we have around 20-30 user Turbo VNC sessions running. These seem to generate around 80 processes each, including the main application the students run during classes. Right now there are a little over 2300 processes active on the frontend and only around the equivalent of 3 CPUs in use. There is little actual processing load on the frontend node, as most of the processes are idle. In Ganglia, even during the busiest periods, there are around 4-6 of the 28 CPUs in use at any one time. The nodes have 320 GB RAM with about 40 GB in use. The remaining memory is buffered/cached, with around 3 GB completely unused. There is 8 GB of swap installed; however, this is usually 100% free. The only time I have seen it change is after this issue occurs and the thousands of trash processes are generated. At that point the whole server is compromised anyway...
The actual processing workload is distributed across the 15 worker nodes via the Torque PBS batch queuing system, so only these nodes are under load. It is the end of semester now, so classes are over. I will go through and clean up the leftover student sessions early next month. It is difficult to find a time window when the cluster can be restarted, as it is still quite busy with several researchers running heavy workloads over long periods of time across the worker nodes. We will be decommissioning the VNX shared storage over the next two months and replacing it with new storage, which will be connected directly to the frontend. The old RHEL5 node that currently provides the VNX shares will be decommissioned, so I expect (am hoping) that this problem will go away. Let me know if there is anything else you need.

Thanks for the info. I meant how many gvfsd-trash processes were spawned per user session, not processes in general. I'm sorry that I didn't write that clearly at first. Do you know that? Anyway, the proposed fix ensures that at most one gvfsd-trash process is spawned per user.

OK, when the VNX NFS shares crashed, every user session began to spawn the trash processes until the frontend crashed, i.e., hundreds... I tried killing them all off via a search and destroy script, but they just kept coming back. From memory, one would spawn around every 30 seconds for each user. Multiply that by 30 users... Well, hopefully your fix will stop that from occurring. I still have no idea why the NFS VNX shares crashed on the frontend only, though.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:2145
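
For anyone hitting the same symptom, the per-user count asked about in this thread (how many gvfsd-trash processes each user has) can be gathered with a short script. The following is only a minimal sketch and is not part of the gvfs fix: the helper names `count_by_user` and `snapshot` are made up for illustration, and it assumes a procps-style `ps` that accepts `-eo user:32,comm`.

```python
import subprocess
from collections import Counter


def count_by_user(ps_output: str, name: str = "gvfsd-trash") -> Counter:
    """Count processes named `name` per user in `ps -eo user:32,comm` output.

    Hypothetical helper for illustration. The header row printed by ps is
    harmless because its command column never equals `name`.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 1)  # -> [user, command name]
        if len(fields) == 2 and fields[1].strip() == name:
            counts[fields[0]] += 1
    return counts


def snapshot() -> str:
    """Return the current process list (assumes a procps-style ps).

    The user column is widened to 32 characters so that long login
    names are less likely to be abbreviated.
    """
    result = subprocess.run(
        ["ps", "-eo", "user:32,comm"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `count_by_user(snapshot()).most_common()` on the affected host lists users ordered by their number of gvfsd-trash processes. With the fixed package, the expectation from the discussion above is that no user shows more than one.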