Yaacov, those Containers and ContainerGroups that are not associated with a node yet (e.g. still Pending) are throwing this exception at metrics collection time:

    Validation error: cores not defined

I think that one possible solution would be to transform that into a warning with something like:

    if @target.respond_to?(:hardware)
      @node_hardware = @target.hardware
    else
      @node_hardware = @target.try(:container_node).try(:hardware)
    end
    ...
    def validate_target
      raise TargetValidationError, "Validation error: ems not defined" unless @ext_management_system
      raise TargetValidationWarning, "Warning: object not scheduled to a node yet" unless @node_hardware
      ...
    end

I am not sure whether that would be enough to make this less scary, though. Another option is to filter out these targets altogether if they don't have a "container_node" association, but I feel it would be riskier to have no information in the logs about why an object's metrics collection didn't happen.
Gellert, do you think it is acceptable to transform the error:

    ERROR -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture#perf_collect_metrics) Container(25987) is not valid: Validation error: cores not defined

into a warning with a proper message?

    WARN -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture#perf_collect_metrics) Container(25987) is not scheduled on any node yet

I am wary of removing these messages completely.
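For illustration, here is a minimal sketch of how a caller could map the two exception types to different log levels. It is not the actual MetricsCapture code: `CaptureSketch`, `collectable?` and the "skipped" message wording are hypothetical, and the two exception classes are redefined locally as stand-ins for the ones named in the proposal.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-ins for the exception classes named in the proposal.
class TargetValidationError < RuntimeError; end
class TargetValidationWarning < RuntimeError; end

# Hypothetical, simplified capture object (assumption: not the real class).
class CaptureSketch
  def initialize(target_name, ems:, node_hardware:)
    @target_name = target_name
    @ext_management_system = ems
    @node_hardware = node_hardware
  end

  def validate_target
    raise TargetValidationError, "Validation error: ems not defined" unless @ext_management_system
    raise TargetValidationWarning, "Warning: object not scheduled to a node yet" unless @node_hardware
  end

  # Returns true when collection may proceed, false when the target is skipped.
  # A warning-class failure is logged at WARN, anything else at ERROR.
  def collectable?(logger)
    validate_target
    true
  rescue TargetValidationWarning => e
    logger.warn("#{@target_name} skipped: #{e.message}")
    false
  rescue TargetValidationError => e
    logger.error("#{@target_name} is not valid: #{e.message}")
    false
  end
end

# Usage: a container that has an EMS but no node hardware yet (still Pending).
log_output = StringIO.new
logger = Logger.new(log_output)
pending = CaptureSketch.new("Container(25987)", ems: :some_ems, node_hardware: nil)
pending_result = pending.collectable?(logger)
```

With this shape the pending container is still skipped, but the reason stays visible in the log at a less alarming level.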
submitted upstream https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/33
In https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/33 the warning format is:

    [----] W, [2017-06-05T18:36:41.441675 #29072:2aec7bb4af7c] WARN -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::MetricsCapture#perf_collect_metrics) Container(68) has no hardware: State may be pending
It is certainly worth keeping the message at a lower log level rather than removing it completely at all levels.
merged upstream https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/33