Description of problem:

vSphere Problem Detector may be leaking sessions in vSphere because the client is not logged out between syncs.

Version-Release number of selected component (if applicable):

4.10 and master

How reproducible:

Always

Steps to Reproduce:

1. Monitor sessions using `govc session.ls` while vsphere-problem-detector is running.

Actual results:

The session count increases from 1 to 14 in about 17 minutes. In the output below, I was running a build from https://github.com/openshift/vsphere-problem-detector/pull/58.

$ while true; do date; govc session.ls | grep vsphere-prob; sleep 5m ; done

Thu Oct 7 10:20:26 AM MDT 2021
523e179e-4ead-fd7b-b51a-4f51ad4789a0  VSPHERE.LOCAL\rbost  2021-10-07 16:20  12s     x.x.x.x  vsphere-problem-detector/v0.0.0-unknown

Thu Oct 7 10:25:26 AM MDT 2021
5237d10f-eb55-be7b-08f3-199397fec29d  VSPHERE.LOCAL\rbost  2021-10-07 16:23  2m8s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
523e179e-4ead-fd7b-b51a-4f51ad4789a0  VSPHERE.LOCAL\rbost  2021-10-07 16:20  5m12s   x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
528a51b6-142d-9e56-afbd-2e19b384b975  VSPHERE.LOCAL\rbost  2021-10-07 16:24  1m7s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
528f6e0d-da37-ffe0-247a-11f9bff5a2de  VSPHERE.LOCAL\rbost  2021-10-07 16:22  3m10s   x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52f4a6c1-c603-bdf5-1922-0f3129bc0196  VSPHERE.LOCAL\rbost  2021-10-07 16:21  4m11s   x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52facf01-0f12-d658-077a-2485d25b602e  VSPHERE.LOCAL\rbost  2021-10-07 16:25  7s      x.x.x.x  vsphere-problem-detector/v0.0.0-unknown

Thu Oct 7 10:30:26 AM MDT 2021
5212af06-f896-be51-8088-4c881459226f  VSPHERE.LOCAL\rbost  2021-10-07 16:26  4m5s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
522071e9-0ab3-d843-e02e-d44924644ea2  VSPHERE.LOCAL\rbost  2021-10-07 16:28  2m3s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
5237d10f-eb55-be7b-08f3-199397fec29d  VSPHERE.LOCAL\rbost  2021-10-07 16:23  7m9s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
523e179e-4ead-fd7b-b51a-4f51ad4789a0  VSPHERE.LOCAL\rbost  2021-10-07 16:20  10m13s  x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
528a51b6-142d-9e56-afbd-2e19b384b975  VSPHERE.LOCAL\rbost  2021-10-07 16:24  6m8s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
528f6e0d-da37-ffe0-247a-11f9bff5a2de  VSPHERE.LOCAL\rbost  2021-10-07 16:22  8m10s   x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52b62109-7618-33f1-12ab-0eccf6c46c77  VSPHERE.LOCAL\rbost  2021-10-07 16:27  3m4s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52d08b6b-9be2-b4de-c7bc-4aae330242ab  VSPHERE.LOCAL\rbost  2021-10-07 16:29  1m2s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52df60e9-7dd4-666f-711c-99d872e0a85a  VSPHERE.LOCAL\rbost  2021-10-07 16:30  2s      x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52f4a6c1-c603-bdf5-1922-0f3129bc0196  VSPHERE.LOCAL\rbost  2021-10-07 16:21  9m11s   x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52facf01-0f12-d658-077a-2485d25b602e  VSPHERE.LOCAL\rbost  2021-10-07 16:25  5m7s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown

Thu Oct 7 10:35:27 AM MDT 2021
5211905e-4c06-e20e-edc2-6d7debaf3af8  VSPHERE.LOCAL\rbost  2021-10-07 16:31  4m0s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
5212af06-f896-be51-8088-4c881459226f  VSPHERE.LOCAL\rbost  2021-10-07 16:26  9m5s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
522071e9-0ab3-d843-e02e-d44924644ea2  VSPHERE.LOCAL\rbost  2021-10-07 16:28  7m3s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
5234bcde-f49b-f53f-edcf-fcdd0db91de5  VSPHERE.LOCAL\rbost  2021-10-07 16:32  2m59s   x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
5237d10f-eb55-be7b-08f3-199397fec29d  VSPHERE.LOCAL\rbost  2021-10-07 16:23  12m9s   x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
523e179e-4ead-fd7b-b51a-4f51ad4789a0  VSPHERE.LOCAL\rbost  2021-10-07 16:20  15m13s  x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
528a51b6-142d-9e56-afbd-2e19b384b975  VSPHERE.LOCAL\rbost  2021-10-07 16:24  11m8s   x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
528f6e0d-da37-ffe0-247a-11f9bff5a2de  VSPHERE.LOCAL\rbost  2021-10-07 16:22  13m10s  x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52b62109-7618-33f1-12ab-0eccf6c46c77  VSPHERE.LOCAL\rbost  2021-10-07 16:27  8m4s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52d08b6b-9be2-b4de-c7bc-4aae330242ab  VSPHERE.LOCAL\rbost  2021-10-07 16:29  6m2s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52db6e7f-d860-dc5a-109c-e2a936be556e  VSPHERE.LOCAL\rbost  2021-10-07 16:34  56s     x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52df60e9-7dd4-666f-711c-99d872e0a85a  VSPHERE.LOCAL\rbost  2021-10-07 16:30  5m1s    x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52ebd12c-0084-4ee6-e6b5-339337f7a3d1  VSPHERE.LOCAL\rbost  2021-10-07 16:33  1m57s   x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52f4a6c1-c603-bdf5-1922-0f3129bc0196  VSPHERE.LOCAL\rbost  2021-10-07 16:21  14m11s  x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
52facf01-0f12-d658-077a-2485d25b602e  VSPHERE.LOCAL\rbost  2021-10-07 16:25  10m7s   x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
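For context on the suspected mechanism: the detector creates a new vCenter session on every sync and never logs it out, so each sync adds one more entry to `govc session.ls` until vCenter eventually expires them. The sketch below is a minimal, hypothetical illustration of the per-sync login/logout pattern the fix implies, using the govmomi client; runCheck and the credential handling are illustrative only, not the actual vsphere-problem-detector code.

package main

import (
	"context"
	"fmt"
	"net/url"

	"github.com/vmware/govmomi"
)

// runCheck is a hypothetical per-sync check: it creates a vCenter session,
// does its work, and always logs out so the session is not leaked.
func runCheck(ctx context.Context, vcURL, user, pass string) error {
	u, err := url.Parse(vcURL) // e.g. "https://vcenter.example.com/sdk"
	if err != nil {
		return err
	}
	u.User = url.UserPassword(user, pass)

	// NewClient logs in and creates a session on the vCenter side.
	client, err := govmomi.NewClient(ctx, u, true /* insecure */)
	if err != nil {
		return err
	}
	// Without this Logout, every sync would leave another live session
	// behind in `govc session.ls` -- the leak shown in the output above.
	defer client.Logout(ctx)

	fmt.Println("connected to", client.ServiceContent.About.FullName)
	// ... run the actual health checks here ...
	return nil
}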
I did some testing on vSphere 6.7, where we have admin access to check the sessions, with the nightly 4.10.0-0.nightly-2021-10-16-173656.

1. In the running cluster, I did not see any live vsphere-problem-detector session:

$ govc session.ls | wc -l
25
$ govc session.ls | grep vsphere-prob | wc -l
0

2. Running "while true; do govc session.ls | grep vsphere-prob; done" while restarting the vsphere-problem-detector pod, I still could not see a live vsphere-problem-detector session.

From my side, it seems there is no session leak anymore. @Robert What do you think?

BTW, where did you find this issue? I understand it is not in VMC, since you could check the sessions in the bug description:

"
$ while true; do date; govc session.ls | grep vsphere-prob; sleep 5m ; done
Thu Oct 7 10:20:26 AM MDT 2021
523e179e-4ead-fd7b-b51a-4f51ad4789a0  VSPHERE.LOCAL\rbost  2021-10-07 16:20  12s  x.x.x.x  vsphere-problem-detector/v0.0.0-unknown
"

I'm wondering if you could help verify on your environment (no need on VMC) to double confirm. Thanks.
> $ govc session.ls | grep vsphere-prob | wc -l

The `grep vsphere-prob` would only be correct if https://github.com/openshift/vsphere-problem-detector/pull/58 were merged or you had a build containing that change. Otherwise, the user agent will not be set the way you expect.

> BTW, where did you find this issue? I understand it is not in VMC, since you could check the sessions in the bug description.

I was testing in a vSphere installation in IBM Cloud where full vSphere admin privileges are given, and I'm happy to test there again. However, this issue may be blocked until https://github.com/openshift/vsphere-problem-detector/pull/58 can be merged; otherwise we cannot really isolate which sessions belong to vsphere-problem-detector and which do not.
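For reference, this is why the user agent matters: unless the client advertises a distinctive user agent, its sessions appear in `govc session.ls` with a generic govmomi/Go user agent and cannot be isolated with grep. Below is a rough, hypothetical sketch of setting a user agent on a govmomi SOAP client (assuming a govmomi version whose soap.Client exposes a UserAgent field); it is not the exact code from the PR above.

package main

import (
	"context"
	"net/url"

	"github.com/vmware/govmomi/session"
	"github.com/vmware/govmomi/vim25"
	"github.com/vmware/govmomi/vim25/soap"
)

func newClient(ctx context.Context, vcURL, user, pass, version string) (*vim25.Client, error) {
	u, err := soap.ParseURL(vcURL)
	if err != nil {
		return nil, err
	}
	u.User = url.UserPassword(user, pass)

	sc := soap.NewClient(u, true /* insecure */)
	// Advertise who we are so the session is listed as
	// "vsphere-problem-detector/<version>" by `govc session.ls`.
	sc.UserAgent = "vsphere-problem-detector/" + version

	vc, err := vim25.NewClient(ctx, sc)
	if err != nil {
		return nil, err
	}

	// Log in explicitly; the matching Logout must run once the session
	// is no longer needed (see the earlier sketch) to avoid the leak.
	m := session.NewManager(vc)
	if err := m.Login(ctx, u.User); err != nil {
		return nil, err
	}
	return vc, nil
}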
@Robert Thanks for the info.
With regard to https://issues.redhat.com/browse/RFE-2123, I suggest we backport the fix everywhere vsphere-problem-detector is installed by default. Here is a report from 4.7: https://access.redhat.com/support/cases/#/case/02988444 - they fixed their credentials and the case is closed, but it looks related.
Verified as passed on 4.10.0-0.nightly-2021-12-06-201335.

1. Checked on the cluster after it had been running for a while; there is no session from vsphere-problem-detector:

$ govc session.ls | grep vsphere-prob | wc -l
0

2. Used the following command to monitor the sessions while restarting the vsphere-problem-detector-operator pod; we see a new session launched and released immediately:

$ while true; do date; govc session.ls | grep vsphere-prob; done
$ oc -n openshift-cluster-storage-operator delete pod vsphere-problem-detector-operator-8794689bc-kqm7l
(restart vsphere-problem-detector-operator pod)

Tue 14 Dec 2021 02:25:32 AM UTC
Tue 14 Dec 2021 02:25:32 AM UTC
Tue 14 Dec 2021 02:25:32 AM UTC
Tue 14 Dec 2021 02:25:32 AM UTC
Tue 14 Dec 2021 02:25:32 AM UTC
Tue 14 Dec 2021 02:25:32 AM UTC
5227b896-edbb-5049-6544-327c122e5689  VSPHERE.LOCAL\openshift-qe-machineset  2021-12-14 02:29  0s  10.8.30.2  vsphere-problem-detector/4.10.0-202111222329.p0.gbda2d97.assembly.stream-bda2d97
Tue 14 Dec 2021 02:25:32 AM UTC
5227b896-edbb-5049-6544-327c122e5689  VSPHERE.LOCAL\openshift-qe-machineset  2021-12-14 02:29  0s  10.8.30.2  vsphere-problem-detector/4.10.0-202111222329.p0.gbda2d97.assembly.stream-bda2d97
Tue 14 Dec 2021 02:25:32 AM UTC
5227b896-edbb-5049-6544-327c122e5689  VSPHERE.LOCAL\openshift-qe-machineset  2021-12-14 02:29  0s  10.8.30.2  vsphere-problem-detector/4.10.0-202111222329.p0.gbda2d97.assembly.stream-bda2d97
Tue 14 Dec 2021 02:25:32 AM UTC
5227b896-edbb-5049-6544-327c122e5689  VSPHERE.LOCAL\openshift-qe-machineset  2021-12-14 02:29  0s  10.8.30.2  vsphere-problem-detector/4.10.0-202111222329.p0.gbda2d97.assembly.stream-bda2d97
Tue 14 Dec 2021 02:25:32 AM UTC
5227b896-edbb-5049-6544-327c122e5689  VSPHERE.LOCAL\openshift-qe-machineset  2021-12-14 02:29  0s  10.8.30.2  vsphere-problem-detector/4.10.0-202111222329.p0.gbda2d97.assembly.stream-bda2d97
Tue 14 Dec 2021 02:25:32 AM UTC
5227b896-edbb-5049-6544-327c122e5689  VSPHERE.LOCAL\openshift-qe-machineset  2021-12-14 02:29  0s  10.8.30.2  vsphere-problem-detector/4.10.0-202111222329.p0.gbda2d97.assembly.stream-bda2d97
Tue 14 Dec 2021 02:25:32 AM UTC
5227b896-edbb-5049-6544-327c122e5689  VSPHERE.LOCAL\openshift-qe-machineset  2021-12-14 02:29  0s  10.8.30.2  vsphere-problem-detector/4.10.0-202111222329.p0.gbda2d97.assembly.stream-bda2d97
Tue 14 Dec 2021 02:25:32 AM UTC
Tue 14 Dec 2021 02:25:32 AM UTC
Tue 14 Dec 2021 02:25:32 AM UTC
Tue 14 Dec 2021 02:25:32 AM UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056