Bug 1523292 - sos plugin is generating Exception during plugin-setup
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.1.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.1
Target Release: 4.2.0
Assigned To: Ala Hino
QA Contact: Kevin Alon Goldblatt
Depends On:
Blocks:
Reported: 2017-12-07 11:33 EST by Steffen Froemer
Modified: 2018-05-15 13:53 EDT

See Also:
Fixed In Version: v4.20.13
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-15 13:52:46 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---



External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 85958 | master | MERGED | sosplugin: Add checkStatus=False when getting storage domains info | 2018-01-10 10:15 EST
Red Hat Product Errata | RHEA-2018:1489 | None | None | None | 2018-05-15 13:53 EDT

Description Steffen Froemer 2017-12-07 11:33:22 EST
Description of problem:
Collecting a sosreport on a RHEL 7.4 hypervisor throws an exception during the plugin setup routine.

Version-Release number of selected component (if applicable):
vdsm-4.19.31-1.el7ev.x86_64

How reproducible:


Steps to Reproduce:
I was not able to reproduce this in any way. It may depend on the RHV environment.


Actual results:
 Setting up archive ...
 Setting up plugins ...
caught exception in plugin method "vdsm.setup()"     <<======
writing traceback to sos_logs/vdsm-plugin-errors.txt
 Running plugins. Please wait ...



Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1252, in setup
    plug.setup()
  File "/usr/lib/python2.7/site-packages/sos/plugins/vdsm.py", line 159, in setup
    sd_uuids = cli.Host.getStorageDomains()
  File "/usr/lib/python2.7/site-packages/vdsm/client.py", line 252, in _call
    raise TimeoutError(method, kwargs, timeout)
TimeoutError: Request Host.getStorageDomains with args {} timed out after 60 seconds



Expected results:
no exception 

Additional info:
Comment 2 Dan Kenigsberg 2017-12-07 12:55:32 EST
It would be helpful if you attach the vdsm.log from the time of the failed getStorageDomains command.
Comment 11 Nir Soffer 2017-12-24 08:33:34 EST
Steffen, if collecting info from vdsm timed out, what do you expect to see in
the sosreport instead of the traceback?
Comment 13 Steffen Froemer 2017-12-31 13:29:54 EST
(In reply to Nir Soffer from comment #11)
> Steffen, if collecting info from vdsm timed out, what do you expect to see in
> the sosreport instead of the traceback?

I would expect not to see this error, because I want the expected information to be in the sosreport.
If this error occurs every time, data needed for analysis could be missing.
Comment 14 Nir Soffer 2017-12-31 14:04:21 EST
(In reply to Steffen Froemer from comment #13)
> (In reply to Nir Soffer from comment #11)
> > Steffen, if collecting info from vdsm timed out, what do you expect to see in
> > the sosreport instead of the traceback?
> 
> I would expect not to see this error, because I want the expected
> information to be in the sosreport.
> If this error occurs every time, data needed for analysis could be
> missing.

sosreport cannot guarantee that the information will be in the sosreport. If vdsm
is not responsive, information from vdsm cannot be in the sosreport.

I think we have multiple issues:

1. sosreport is using an incorrect timeout for requests that can take a long time.

We should use different timeouts for different requests, so we can get results
on a system with a lot of LUNs.

2. sosreport is using getDeviceList incorrectly:

            self.collectVdsmCommand(
                "Host.getDeviceList", cli.Host.getDeviceList)

getDeviceList must be called with checkStatus=False. Otherwise it will try to
check the status of every LUN, which can take many minutes with hundreds of LUNs.

3. sosreport is collecting data in the setup phase

It should collect data in the collection phase. I am not sure what the correct way
to implement this with sosreport is.

4. sosreport is failing after the first timeout

It should continue with the next request. In the worst case, some requests will
never complete and we will not have the data for these requests.

I suggest opening a new bug for each item.
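
Items 1 and 2 can be illustrated with a minimal Python sketch. It uses a stand-in client, not the real vdsm client API: `FakeHost`, `REQUEST_TIMEOUTS`, `DEFAULT_TIMEOUT`, `timeout_for`, and `collect_device_list` are hypothetical names chosen for illustration, and the timeout values are placeholders. Only the request names and the `checkStatus=False` argument come from this report. How a timeout is actually handed to the real client is not shown here.

```python
# Hypothetical per-request timeouts in seconds; the values are placeholders.
DEFAULT_TIMEOUT = 60
REQUEST_TIMEOUTS = {
    "Host.getDeviceList": 300,
}


class FakeHost(object):
    """Stand-in for the vdsm client namespace; it only records calls."""

    def __init__(self):
        self.calls = []

    def getDeviceList(self, **kwargs):
        self.calls.append(("Host.getDeviceList", kwargs))
        return []

    def getStorageDomains(self, **kwargs):
        self.calls.append(("Host.getStorageDomains", kwargs))
        return []


def timeout_for(name):
    """Return the timeout for a request, falling back to the default."""
    return REQUEST_TIMEOUTS.get(name, DEFAULT_TIMEOUT)


def collect_device_list(host):
    """Ask for the device list without checking the status of every LUN."""
    return host.getDeviceList(checkStatus=False)


host = FakeHost()
collect_device_list(host)
print(host.calls)
print(timeout_for("Host.getDeviceList"))
print(timeout_for("Host.getStorageDomains"))
```

Keeping the timeouts in one table means a slow request can be given more time without changing the default for every other request.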
Comment 15 Nir Soffer 2018-01-10 10:57:36 EST
Ala, the attached patch is fixing only issue 2. What about the other issues?

I think we need a new bug for each issue, or an explanation of how they are resolved.
Comment 16 Ala Hino 2018-01-10 11:04:21 EST
The original bug is about the error that is fixed in the referenced patch.

I will ask Steffen to open new bugs for the other issues.
Comment 17 Raz Tamir 2018-01-11 04:20:39 EST
Ala,

Please provide steps to reproduce when you have them.

Thanks
Comment 18 Steffen Froemer 2018-01-11 04:42:27 EST
Nir and Ala,

Fixing issue 2 is fine for me. I can't say whether we hit the other issues as well.
As a first step, I would use the patched vdsm module and ask the customer to test it. If they see further issues, I will open a new bugzilla for them. Otherwise we're fine.

Is the patch available somewhere? I would like to use a test version in the customer environment.

Thanks,
Steffen
Comment 19 RHV Bugzilla Automation and Verification Bot 2018-01-12 09:39:41 EST
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com
Comment 20 Ala Hino 2018-01-15 06:30:20 EST
(In reply to Steffen Froemer from comment #18)
> Nir and Ala,
> 
> Fixing issue 2 is fine for me. I can't say whether we hit the other
> issues as well.
> As a first step, I would use the patched vdsm module and ask the
> customer to test it. If they see further issues, I will open a new
> bugzilla for them. Otherwise we're fine.
> 
> Is the patch available somewhere? I would like to use a test version in
> the customer environment.

The patch is available in Vdsm 4.20.13.
> 
> Thanks,
> Steffen
Comment 21 Ala Hino 2018-01-15 06:32:35 EST
(In reply to Raz Tamir from comment #17)
> Ala,
> 
> Please provide steps to reproduce when you have it
> 
> Thanks

Add as many devices as you can (30 or more), and generate the sos report on the host by executing the `sosreport` command. No timeout error should be raised during the report generation.

You can also verify that when the storage server is down, there is a timeout but the report is still generated.
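
The second check (a timeout occurs, but the report is still generated) corresponds to collecting each request independently. A minimal sketch, assuming a hypothetical `collect_all` helper and a locally defined `RequestTimeout` exception standing in for the client's `TimeoutError`; this is not the sos plugin's actual code:

```python
class RequestTimeout(Exception):
    """Stand-in for the timeout error raised by the vdsm client."""


def collect_all(requests):
    """Run every request; record a failure instead of aborting on it.

    `requests` is a list of (name, callable) pairs. Returns two dicts:
    results keyed by request name, and error messages keyed by request name.
    """
    results = {}
    errors = {}
    for name, func in requests:
        try:
            results[name] = func()
        except RequestTimeout as e:
            # Keep going: one unresponsive request must not block the rest.
            errors[name] = str(e)
    return results, errors


def timed_out():
    """Simulate a request that never gets an answer from the storage server."""
    raise RequestTimeout("timed out after 60 seconds")


results, errors = collect_all([
    ("Host.getStorageDomains", timed_out),
    ("Host.getDeviceList", lambda: ["lun-1"]),
])
print(results)
print(errors)
```

With this structure, the timed-out request ends up in `errors` while the data from the remaining requests is still available for the report.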
Comment 22 RHV Bugzilla Automation and Verification Bot 2018-01-18 12:39:24 EST
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com
Comment 23 RHV Bugzilla Automation and Verification Bot 2018-01-24 17:07:56 EST
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com
Comment 24 RHV Bugzilla Automation and Verification Bot 2018-01-30 06:22:56 EST
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com
Comment 27 Kevin Alon Goldblatt 2018-02-01 15:31:12 EST
Verified with the following code:
-------------------------------------------------
ovirt-engine-4.2.1.3-0.1.el7.noarch
vdsm-4.20.17-11.gite2d6775.el7.centos.x86_64


Verified with the following scenario:
-------------------------------------------------
1. Create a system with more than 30 storage domains
2. Run 'ovirt-log-collector' on the engine


The report is generated. No exceptions are thrown.

Moving to VERIFIED
Comment 28 RHV Bugzilla Automation and Verification Bot 2018-02-02 17:05:35 EST
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com
Comment 31 errata-xmlrpc 2018-05-15 13:52:46 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1489
