Bug 1458797

Summary: Validation error: ems/core not defined while ContainerGroups in the "Pending" state
Product: Red Hat CloudForms Management Engine Reporter: Gellert Kis <gekis>
Component: C&U Capacity and Utilization Assignee: Yaacov Zamir <yzamir>
Status: CLOSED CURRENTRELEASE QA Contact: Einat Pacifici <epacific>
Severity: high Docs Contact:
Priority: high    
Version: 5.7.0 CC: cpelland, fsimonce, gekis, jhardy, obarenbo
Target Milestone: GA Keywords: TestOnly, ZStream
Target Release: 5.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: container:c&u
Fixed In Version: 5.9.0.1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1461522 Environment:
Last Closed: 2018-03-06 15:47:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: Container Management Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1461522    

Comment 2 Federico Simoncelli 2017-06-05 13:56:18 UTC
Yaacov, those Containers and ContainerGroups that are not associated with a node yet (e.g. still Pending) are throwing this exception at metrics collection time:

  Validation error: cores not defined


I think that one possible solution would be to transform that into a warning with something like:


      if @target.respond_to?(:hardware)
        @node_hardware = @target.hardware
      else
        @node_hardware = @target.try(:container_node).try(:hardware)
      end

...

    def validate_target
      raise TargetValidationError, "Validation error: ems not defined" unless @ext_management_system
      raise TargetValidationWarning, "Warning: object not scheduled to a node yet" unless @node_hardware
      ...
    end


Not sure if it would be enough, though, to make this less scary. Another option is to filter out these targets altogether if they don't have a "container_node" association, but I feel it would be riskier to have no info in the logs about why an object's metrics collection didn't happen.
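For illustration only, here is a self-contained sketch of the idea above in plain Ruby. The Struct models, the class name CaptureContextSketch, and the exception class definitions are hypothetical stand-ins (the real ManageIQ classes are not shown in this bug), and `&.` is used in place of ActiveSupport's `try` so the snippet runs without Rails:

```ruby
# Stand-in exception classes; the real ManageIQ definitions are not shown in this bug.
class TargetValidationError < RuntimeError; end
class TargetValidationWarning < RuntimeError; end

# Hypothetical stand-ins for the models involved.
Hardware  = Struct.new(:cpu_total_cores)
Node      = Struct.new(:hardware, :ext_management_system)
Container = Struct.new(:container_node, :ext_management_system)

class CaptureContextSketch
  def initialize(target)
    @target = target
    @ext_management_system = target.ext_management_system
    # Nodes carry hardware directly; containers reach it through their node.
    @node_hardware =
      if target.respond_to?(:hardware)
        target.hardware
      else
        # Plain-Ruby equivalent of try(:container_node).try(:hardware)
        target.container_node&.hardware
      end
  end

  def validate_target
    raise TargetValidationError, "Validation error: ems not defined" unless @ext_management_system
    # A target that is still Pending has no node, hence no hardware: warn instead of failing.
    raise TargetValidationWarning, "Warning: object not scheduled to a node yet" unless @node_hardware
    true
  end
end
```

With these stand-ins, a container whose container_node is nil raises TargetValidationWarning, while one attached to a node with hardware passes validation.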

Comment 3 Federico Simoncelli 2017-06-05 14:05:28 UTC
Gellert, do you think it is acceptable to transform the error:

 ERROR -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture#perf_collect_metrics) Container(25987) is not valid: Validation error: cores not defined

into a warning with a proper message?

 WARN -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture#perf_collect_metrics) Container(25987) is not scheduled on any node yet


I am worried about removing these messages completely.
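A rough sketch of how the two severities could be kept apart at the call site, using only Ruby's standard Logger. The helper name log_validation and the exception class definitions are assumptions made for this example, not the actual perf_collect_metrics code:

```ruby
require "logger"
require "stringio"

# Stand-in exception classes; the real ManageIQ definitions are not shown in this bug.
class TargetValidationError < RuntimeError; end
class TargetValidationWarning < RuntimeError; end

# Hypothetical helper: runs a validation block and logs the outcome at a
# severity that matches the exception class. Returns true when validation passes.
def log_validation(logger, target_name)
  yield
  true
rescue TargetValidationWarning => e
  # Expected while a pod is Pending: keep the message, but only as a warning.
  logger.warn("#{target_name} #{e.message}")
  false
rescue TargetValidationError => e
  logger.error("#{target_name} is not valid: #{e.message}")
  false
end

out = StringIO.new
logger = Logger.new(out)

log_validation(logger, "Container(25987)") do
  raise TargetValidationWarning, "is not scheduled on any node yet"
end
log_validation(logger, "Container(25987)") do
  raise TargetValidationError, "Validation error: ems not defined"
end

puts out.string
```

Either way the message stays in the log, so there is still a record of why metrics collection was skipped for a target.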

Comment 4 Yaacov Zamir 2017-06-05 14:20:33 UTC
Submitted upstream:

https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/33

Comment 5 Yaacov Zamir 2017-06-05 16:00:25 UTC
In https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/33 the warning format is:

[----] W, [2017-06-05T18:36:41.441675 #29072:2aec7bb4af7c] WARN -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture#perf_collect_metrics) Container(68) has no hardware: State may be pending

Comment 6 Gellert Kis 2017-06-06 06:09:00 UTC
It is certainly worth keeping the message at a lower log level rather than removing it completely at all levels.

Comment 7 Yaacov Zamir 2017-06-08 11:23:50 UTC
Merged upstream:
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/33