Created attachment 1207892 [details]
SSA Scan fail

Description of problem:
After running a compliance test on an image, there's an error in the Tasks menu.

How reproducible:
Always

Steps to Reproduce:
1. Enable SSA in EVM --> Configuration
2. Add a policy to the provider
3. Select a Container Image and run Smart State Analysis on it

Actual results:
The SSA fails and there's an error in the Tasks menu that says:
job timed out after 355.310125967 seconds of inactivity. Inactivity threshold [300 seconds]

Expected results:
The SSA should pass successfully

Additional info:
Log file with further info attached
I haven't been able to get it to work either using my go-to VMs for this. Here is the output for Ubuntu, RHEL, and Windows. All of these worked in 5.6.1.

Status = Error 10/11/16 16:43:58 UTC 10/11/16 15:52:54 UTC 10/11/16 15:52:42 UTC finished job timed out after 3048.5864924 seconds of inactivity. Inactivity threshold [3000 seconds] Scan from Vm RHEL72SSA admin EVM

Status = Error 10/11/16 15:49:58 UTC 10/11/16 14:59:21 UTC 10/11/16 14:59:18 UTC finished job timed out after 3022.1714621 seconds of inactivity. Inactivity threshold [3000 seconds] Scan from Vm WS2012R2SSA admin EVM

Status = Error 10/11/16 15:28:53 UTC 10/11/16 14:38:16 UTC 10/11/16 14:38:07 UTC finished job timed out after 3022.2359251 seconds of inactivity. Inactivity threshold [3000 seconds] Scan from Vm UBU1404 admin EVM
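For comparing several failed tasks like the ones above, it can help to pull the elapsed time and the threshold out of each message. Below is a minimal Python sketch, assuming the message always has the form quoted in these task rows. `parse_timeout` is a hypothetical helper written for illustration, not part of CFME.

```python
import re

# Matches the task message format quoted in this report (assumed stable).
TIMEOUT_RE = re.compile(
    r"job timed out after (?P<elapsed>\d+(?:\.\d+)?) seconds of inactivity\.\s+"
    r"Inactivity threshold \[(?P<threshold>\d+) seconds\]"
)


def parse_timeout(message):
    """Return (elapsed_seconds, threshold_seconds), or None if there is no match."""
    match = TIMEOUT_RE.search(message)
    if match is None:
        return None
    return float(match.group("elapsed")), int(match.group("threshold"))
```

Calling it on each task row gives a pair of numbers, which makes it easy to see how far past the threshold each job ran before it was marked as timed out.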
Just to be clear, you're trying to perform SSA on a container, not a VM, correct?
My three were Azure VMs
Jeff, yes but the original description says "container image", so it might not be related to what you're seeing on Azure. Azure will throttle requests based on usage, which can cause this problem. Errors in the log may shed more light on it. If the problems are related, it would have to be at a very high level (like an appliance issue) because the 2 code paths are very different.
My attempts were on a Container Image
I do not remember, but I will try to test it again
Created attachment 1220366 [details] OpenSCAP Fail 5.7.10
Erez, I added an updated log from an image scan on 5.7.10
I checked it again on 5.7.0.10 with a new OpenShift setup and got this while trying to scan an nginx image I got from Docker.io:

[----] E, [2016-11-16T04:41:39.731551 #2871:899144] ERROR -- : Q-task_id([d34c1cc2-abe0-11e6-a2c0-001a4a1697bb]) MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::Scanning::Job#process_abort) job aborting, cannot analyze non docker images
[----] E, [2016-11-16T04:41:39.756713 #2871:899144] ERROR -- : Q-task_id([d34c1cc2-abe0-11e6-a2c0-001a4a1697bb]) MIQ(MiqQueue#deliver) Message id: [24904], Error: [undefined method `[]' for nil:NilClass]
[----] E, [2016-11-16T04:41:39.756921 #2871:899144] ERROR -- : Q-task_id([d34c1cc2-abe0-11e6-a2c0-001a4a1697bb]) [NoMethodError]: undefined method `[]' for nil:NilClass Method:[rescue in deliver]
[----] E, [2016-11-16T04:41:39.757103 #2871:899144] ERROR -- : Q-task_id([d34c1cc2-abe0-11e6-a2c0-001a4a1697bb]) /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/scanning/job.rb:195:in `cleanup'
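The ERROR entries above all carry the same Q-task_id, so grouping log lines by that id is a quick way to see the abort and the follow-up NoMethodError together. Below is a rough Python sketch, assuming one log entry per line in the format quoted here. `errors_by_task` is a hypothetical helper, not an existing ManageIQ tool.

```python
import re
from collections import defaultdict

# Captures the task id and the remainder of the entry (format assumed from the quoted log).
TASK_RE = re.compile(r"Q-task_id\(\[(?P<task>[0-9a-f-]+)\]\)\s*(?P<rest>.*)")


def errors_by_task(lines):
    """Group the text of ERROR entries by their Q-task_id."""
    grouped = defaultdict(list)
    for line in lines:
        if "ERROR -- :" not in line:
            continue  # skip non-error entries
        match = TASK_RE.search(line)
        if match:
            grouped[match.group("task")].append(match.group("rest").strip())
    return dict(grouped)
```

Passing the lines of the attached log to `errors_by_task` returns a dictionary keyed by task id, with the error messages for each task in the order they were logged.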
I am not sure how relevant the following findings are to this bug, or whether they belong to a different one, but after starting the SmartState Analysis, the CPU load average of the VM running OpenShift jumps from 0.5 to around 30. I saw the image_inspector process at the top of `top` sorted by CPU%. The VM stays laggy for quite a while (1 minute or more) after the image_inspector process seems to be gone; kswap and loop processes are active during that time.
Created attachment 1233371 [details]
OSE VM findings

It seems that the high load is created by the image extraction process. Maybe the timeout of the image scan in CFME is caused by the VM being overloaded by the I/O of the image scanning or extraction process.