Bug 1367473 - SmartState Analysis not working for container images
Summary: SmartState Analysis not working for container images
Keywords:
Status: CLOSED DUPLICATE of bug 1366143
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: SmartState Analysis
Version: 5.6.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: GA
Target Release: 5.7.0
Assignee: Rich Oliveri
QA Contact: Dave Johnson
URL:
Whiteboard: container
Depends On:
Blocks:
 
Reported: 2016-08-16 13:35 UTC by Prasad Mukhedkar
Modified: 2020-04-15 14:36 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-19 08:00:42 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:



Description Prasad Mukhedkar 2016-08-16 13:35:37 UTC
SmartState Analysis for container images is not working. The fleecing task gets stuck in the "waiting_to_start" state indefinitely. In the log I see the following:

 [----] W, [2016-08-11T09:34:32.938155 #3020:b09988]  WARN -- : Q-task_id([job_dispatcher]) MIQ(JobProxyDispatcher#dispatch_to_ems) SKIPPING remaining Container Image scan jobs for Ext Management System [99000000000001] in dispatch since there are [3] active scans in zone [default]

This is what I see in the database:


vmdb_production=# select guid,state,status,message,name,dispatch_status from jobs where dispatch_status='active';
                 guid                 |      state      | status |      message      |           name           | dispatch_status 
--------------------------------------+-----------------+--------+-------------------+--------------------------+-----------------
 7e4d8d06-48de-11e6-9c8c-005056957282 | waiting_to_scan | ok     | process initiated | Container image analysis | active
 7e497702-48de-11e6-9c8c-005056957282 | waiting_to_scan | ok     | process initiated | Container image analysis | active
 7e4bbc2e-48de-11e6-9c8c-005056957282 | waiting_to_scan | ok     | process initiated | Container image analysis | active
(3 rows)


vmdb_production=# select guid,state,status,message,name,dispatch_status from jobs where dispatch_status!='active';
 00759e7c-5a5f-11e6-872e-005056957282 | waiting_to_start | ok     | process initiated | Container image analysis | pending
 29434312-5a60-11e6-872e-005056957282 | waiting_to_start | ok     | process initiated | Container image analysis | pending
(394 rows)

Other errors in the logs:

[----] I, [2016-08-12T06:27:35.161662 #8285:b09988]  INFO -- : MIQ(MiqGenericWorker::Runner) ID [99000000031743] PID [8285] GUID [7fcc106a-6041-11e6-872e-005056957282] Exit request received. Worker exiting.
------------------------

[----] I, [2016-08-11T08:26:11.311503 #25633:b09988]  INFO -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::MetricsCollectorWorker::Runner) ID [99000000028351] PID [25633] GUID [6ece4a2c-5f8c-11e6-872e-005056957282] Exit request received. Worker exiting.


----------

[----] E, [2016-08-11T07:13:18.040908 #11818:b09988] ERROR -- : MIQ(Job.check_jobs_for_timeout) Couldn't find VmOrTemplate with 'id'=99000000000003
[----] I, [2016-08-11T07:14:10.374479 #11845:b09988]  INFO -- : MIQ(MiqQueue.put) Message id: [99000002812075],  id: [], Zone: [default], Role: [], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [Job.check_jobs_for_timeout], Timeout: [600], Priority: [90], State: [ready], Deliver On: [], Data: [], Args: []

Can we remove the jobs from the database? Will that help?
We don't have conclusive information in the logs to understand
why the execution of the active tasks is failing. I don't see
any timeout either.

Customer database restored on: 10.65.200.236 (root:smartvm)

Comment 2 Mooli Tayer 2016-08-17 09:04:41 UTC
Prasad, is this a clone of https://bugzilla.redhat.com/show_bug.cgi?id=1366143 ?

That happens if we have three failed jobs already stuck in the queue but their status isn't reported correctly.
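
To see which jobs the dispatcher counts as active, the same data can be pulled from the appliance Rails console (cd /var/www/miq/vmdb; source /etc/default/evm; bin/rails c). This is a minimal sketch that only uses the columns already shown in the SQL output above:

# List the jobs the dispatcher treats as "active"
Job.where(:dispatch_status => 'active').each do |job|
  puts [job.guid, job.state, job.status, job.message, job.name].join(' | ')
end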

Comment 4 Mooli Tayer 2016-08-17 12:32:35 UTC
Quick fix[1]:
cd /var/www/miq/vmdb/
source /etc/default/evm
bin/rails c

irb(main):016:0> Job.update(:state => 'finished')
irb(main):014:0> Job.destroy_all

[1] since only "finished" or "waiting_to_start" jobs can be deleted.

Comment 6 Mooli Tayer 2016-08-17 12:41:08 UTC
(In reply to Mooli Tayer from comment #4)
> Quick fix[1]:
> cd /var/www/miq/vmdb/
> source /etc/default/evm
> bin/rails c
> 
> irb(main):016:0> Job.update(:state => 'finished')
> irb(main):014:0> Job.destroy_all
> 
> [1] since only "finished" or "waiting_to_start" jobs can be deleted.

Actually, that's very bad. I copied it from what I provided to QE.

We don't want to delete all of a customer's job history.
Just update and delete the jobs that are stuck.
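
For reference, a more targeted variant (a sketch only; it assumes the stuck jobs are exactly the rows with dispatch_status 'active' and state 'waiting_to_scan' shown in the description, so adjust the conditions before running it):

# Update and delete only the stuck scan jobs, leaving the rest of the job history intact
stuck = Job.where(:dispatch_status => 'active', :state => 'waiting_to_scan').to_a
stuck.each do |job|
  job.update(:state => 'finished')  # only 'finished' or 'waiting_to_start' jobs can be deleted
  job.destroy
end

Once the stuck jobs are gone, the dispatcher should be able to start the pending 'waiting_to_start' scans again.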

