Bug 1564442

Summary: virt-who hangs on check_report_state
Product: Red Hat Satellite
Component: Subscriptions - virt-who
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Version: 6.3.0
Target Milestone: Unspecified
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard: scale_lab
Last Closed: 2018-04-09 14:36:09 UTC
Reporter: sbadhwar
Assignee: Barnaby Court <bcourt>
QA Contact: Katello QA List <katello-qa-list>
CC: csnyder, hsun, jhutar, khowell, mmccune, psuriset, sbadhwar, sthirugn, tstrachota
Type: Bug

Description sbadhwar 2018-04-06 09:57:16 UTC
Description of problem:
While testing virt-who at scale, we discovered that virt-who frequently hangs at the check_report_state step for the created jobs.

Virt-who was run in one-shot mode with debug messages enabled, reporting the hosts running in a VMware ESXi environment. The setup reported 100 hosts with 6000 virtual machines.

The debug output is as follows:
2018-04-06 05:40:09,619 [rhsm.connection DEBUG] MainProcess(1213):Thread-3 @connection.py:_request:516 - Making request: POST /rhsm/hypervisors/Default_Organization?reporter_id=gprfc023.sbu.lab.eng.bos.redhat.com-e542c5cdcd5a42b9afddad8b4529e928&cloaked=False&env=Library
2018-04-06 05:40:12,391 [rhsm.connection INFO] MainProcess(1213):Thread-3 @connection.py:_request:552 - Response: status=200, request="POST /rhsm/hypervisors/Default_Organization?reporter_id=gprfc023.sbu.lab.eng.bos.redhat.com-e542c5cdcd5a42b9afddad8b4529e928&cloaked=False&env=Library"
2018-04-06 05:40:12,398 [virtwho.destination_8759358284855132892 DEBUG] MainProcess(1213):Thread-3 @subscriptionmanager.py:_connect:128 - Authenticating with RHSM username admin
2018-04-06 05:40:12,398 [rhsm.connection INFO] MainProcess(1213):Thread-3 @connection.py:__init__:822 - Connection built: host=gprfc018.sbu.lab.eng.bos.redhat.com port=443 handler=/rhsm auth=basic username=admin
2018-04-06 05:40:12,399 [rhsm.connection DEBUG] MainProcess(1213):Thread-3 @connection.py:_request:516 - Making request: GET /rhsm/status/
2018-04-06 05:40:12,444 [rhsm.connection INFO] MainProcess(1213):Thread-3 @connection.py:_request:552 - Response: status=200, request="GET /rhsm/status/"
2018-04-06 05:40:12,444 [virtwho.destination_8759358284855132892 DEBUG] MainProcess(1213):Thread-3 @subscriptionmanager.py:check_report_state:263 - Checking status of job hypervisor_update_9e23fe58-4f70-4902-9ac7-326ae9c8f6f8
2018-04-06 05:40:12,446 [rhsm.connection DEBUG] MainProcess(1213):Thread-3 @connection.py:_request:516 - Making request: GET /rhsm/jobs/hypervisor_update_9e23fe58-4f70-4902-9ac7-326ae9c8f6f8?result_data=True
2018-04-06 05:40:12,498 [rhsm.connection INFO] MainProcess(1213):Thread-3 @connection.py:_request:552 - Response: status=200, request="GET /rhsm/jobs/hypervisor_update_9e23fe58-4f70-4902-9ac7-326ae9c8f6f8?result_data=True"
2018-04-06 05:40:12,498 [virtwho.destination_8759358284855132892 DEBUG] MainProcess(1213):Thread-3 @subscriptionmanager.py:check_report_state:277 - Job hypervisor_update_9e23fe58-4f70-4902-9ac7-326ae9c8f6f8 not finished
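
For context, the following is a minimal sketch of the kind of status-polling loop that check_report_state performs against the jobs endpoint seen in the log above. It is not virt-who's actual implementation: the base URL, credentials, helper name, terminal-state set, and the use of the requests library are all assumptions for illustration.

import time
import requests

# Illustrative sketch only -- not virt-who's actual check_report_state code.
# The job endpoint matches the GET /rhsm/jobs/<job_id>?result_data=True
# request in the log above; BASE_URL, auth, and TERMINAL_STATES are assumed.
BASE_URL = "https://satellite.example.com/rhsm"
TERMINAL_STATES = {"FINISHED", "CANCELED", "FAILED"}

def wait_for_job(job_id, auth, poll_interval=10, timeout=600):
    """Poll the job status endpoint until the job reaches a terminal state."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(
            "%s/jobs/%s" % (BASE_URL, job_id),
            params={"result_data": "true"},
            auth=auth,
            verify=False,  # lab setup; verify the CA in a real deployment
        )
        resp.raise_for_status()
        job = resp.json()
        # If this check never treats the reported state ("FINISHED") as
        # terminal, the loop never exits -- the symptom reported in this bug.
        if job.get("done") or job.get("state") in TERMINAL_STATES:
            return job
        time.sleep(poll_interval)
    raise RuntimeError("Timed out waiting for job %s" % job_id)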

Querying the API endpoint for the same job returned the following output:

{
  "id": "hypervisor_update_9e23fe58-4f70-4902-9ac7-326ae9c8f6f8",
  "state": "FINISHED",
  "startTime": "2018-04-06T09:40:12+0000",
  "finishTime": "2018-04-06T09:40:16+0000",
  "result": "Created: 0, Updated: 0, Unchanged:100, Failed: 0",
  "principalName": "foreman_admin",
  "targetType": "owner",
  "targetId": "Default_Organization",
  "ownerId": "Default_Organization",
  "correlationId": "6b0d058c",
  "resultData": "[cloaked]",
  "statusPath": "/jobs/hypervisor_update_9e23fe58-4f70-4902-9ac7-326ae9c8f6f8",
  "done": true,
  "group": "async group",
  "created": "2018-04-06T09:40:12+0000",
  "updated": "2018-04-06T09:40:16+0000"
}

This indicates that the job had already finished.
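
For reference, that manual check can be reproduced with a short query against the same endpoint. The hostname and credentials below are placeholders, and using the requests library is an assumption about tooling, not part of the original report.

import requests

# Hypothetical one-off query of the job shown above; the hostname and
# credentials are placeholders for the lab setup.
job_id = "hypervisor_update_9e23fe58-4f70-4902-9ac7-326ae9c8f6f8"
resp = requests.get(
    "https://satellite.example.com/rhsm/jobs/%s" % job_id,
    params={"result_data": "true"},
    auth=("admin", "password"),
    verify=False,
)
job = resp.json()
print(job["state"], job["done"])  # FINISHED True, per the payload above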

Actual results:
virt-who remained stuck waiting on the job state check, even though the job had already finished.

Expected results:
virt-who in one-shot mode should finish automatically once the job completes.

Comment 1 Marek Hulan 2018-04-06 10:24:34 UTC
This does not seem to be related to the configuration plugin, but to virt-who itself. Moving off the configuration plugin component. Chris, could you please take a look?

Comment 2 Eko 2018-04-09 06:27:04 UTC
What is your virt-who package version? The latest virt-who for RHEL 7.5 is virt-who-0.21.5-1.el7.

Comment 3 sbadhwar 2018-04-09 06:33:32 UTC
(In reply to Eko from comment #2)
> What is your virt-who package version? the latest virt-who for RHEL7.5 is
> virt-who-0.21.5-1.el7

The virt-who package version is virt-who-0.20.4-1.el7sat.noarch, running on RHEL 7.4.

Comment 6 Kevin Howell 2018-04-09 14:36:09 UTC
Closing CURRENTRELEASE per comment 5.

Comment 7 sthirugn@redhat.com 2018-04-09 19:57:20 UTC
Satellite 6.3 currently ships virt-who version 0.20.4-1. To resolve the specific bug reported here, use virt-who version 0.21.5-1.el7, which is shipped with RHEL 7.5.