Bug 1765261

Summary: Logging operator adds "kubernetes.io/os: linux" to nodeSelectors, breaking deployment on OCP 4.1.20
Product: OpenShift Container Platform
Reporter: Naveen Malik <nmalik>
Component: Logging
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.1.z
CC: aos-bugs, bparees, okashi, rmeggins
Target Milestone: ---
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Mismatch between the cluster version and the ClusterLogging version.
Consequence: ClusterLogging doesn't deploy because no node carries the label that the node selector requires.
Fix: Enforce a minimum kubeversion that supports the deployed ClusterLogging version.
Result: ClusterLogging deploys.
Story Points: ---
Clone Of:
: 1766343
Environment:
Last Closed: 2020-01-23 11:09:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1766343    

Description Naveen Malik 2019-10-24 16:37:04 UTC
Description of problem:
On 4.1.20 the logging operator is adding to nodeSelectors "kubernetes.io/os: linux" BUT not a single node has this label.  They have "beta.kubernetes.io/os: linux".
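
A nodeSelector only matches a node when every key/value pair in the selector is present in the node's labels. The following is a minimal sketch of that rule, not the scheduler's actual code; the function name `selector_matches` is hypothetical, and the label sets contain only the labels quoted in this report.

```python
def selector_matches(node_labels: dict, node_selector: dict) -> bool:
    """Return True when every selector key/value pair is present on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Label reported on the 4.1.20 nodes (all other labels omitted for brevity).
node_labels = {"beta.kubernetes.io/os": "linux"}

# Selector entry that the logging operator adds, as described above.
operator_selector = {"kubernetes.io/os": "linux"}

print(selector_matches(node_labels, operator_selector))                   # False
print(selector_matches(node_labels, {"beta.kubernetes.io/os": "linux"}))  # True
```

Because the two keys differ, the first check fails for every node, which is consistent with the logging pods never being scheduled.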

In OSD we manage compute with Hive.  Hive does not allow edits to labels.


Version-Release number of selected component (if applicable):
4.1.20


How reproducible:
Every time.

Steps to Reproduce:
1. Provision 4.1.20 cluster.
2. Attempt to configure ClusterLogging (https://docs.openshift.com/dedicated/4/logging/dedicated-efk-deploying.html)


Actual results:
Logging cannot be deployed.


Expected results:
Logging stack is deployed.


Additional info:
The docs for OSD do not include any variant of the kubernetes.io/os label in the ClusterLogging CR.  This is *added* by the logging operator.  My workaround at this time is to recreate compute with the expected label.

Possibly

Comment 2 Ben Parees 2019-10-24 17:06:44 UTC
Which channel did you install logging from: preview (which is 4.1) or 4.2?

Comment 3 Ben Parees 2019-10-24 17:10:51 UTC
(for a 4.1 cluster you should be installing from the preview channel).
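
As a rough illustration of the channel guidance in this thread, the check below maps a cluster version to the channel it should subscribe to. It is only a sketch: the mapping covers the two channels named here ("preview" for 4.1, and "4.2", assumed to be for 4.2 clusters), and the helper names `expected_channel` and `channel_is_compatible` are hypothetical.

```python
from typing import Optional

# Channel guidance taken from this thread; other versions are deliberately absent.
CHANNEL_FOR_OCP_MINOR = {"4.1": "preview", "4.2": "4.2"}


def expected_channel(ocp_version: str) -> Optional[str]:
    """Return the channel for an OCP version such as '4.1.20', or None if unknown."""
    parts = ocp_version.split(".")
    if len(parts) < 2:
        return None
    return CHANNEL_FOR_OCP_MINOR.get(f"{parts[0]}.{parts[1]}")


def channel_is_compatible(ocp_version: str, subscribed_channel: str) -> bool:
    """True only when the subscribed channel is the one expected for the cluster."""
    expected = expected_channel(ocp_version)
    return expected is not None and expected == subscribed_channel


print(channel_is_compatible("4.1.20", "4.2"))      # False
print(channel_is_compatible("4.1.20", "preview"))  # True
```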

Comment 4 Naveen Malik 2019-10-24 17:22:10 UTC
Customers install the logging and elasticsearch operators from OperatorHub.  The channel selected by default at this time is "4.2".

On this cluster it is "4.2".

$ oc get subscription -n openshift-logging cluster-logging -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  creationTimestamp: "2019-10-24T15:41:06Z"
  generation: 1
  labels:
    csc-owner-name: installed-redhat-openshift-logging
    csc-owner-namespace: openshift-marketplace
  name: cluster-logging
  namespace: openshift-logging
  resourceVersion: "75469575"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-logging/subscriptions/cluster-logging
  uid: b14458fb-f674-11e9-bf9d-02af53d1c26e
spec:
  channel: "4.2"
  installPlanApproval: Automatic
  name: cluster-logging
  source: installed-redhat-openshift-logging
  sourceNamespace: openshift-logging
  startingCSV: clusterlogging.4.2.0-201910101614
status:
  currentCSV: clusterlogging.4.2.0-201910101614
  installPlanRef:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-h6p6l
    namespace: openshift-logging
    resourceVersion: "75469490"
    uid: b1c51009-f674-11e9-bf6e-0a8edca4ee4c
  installedCSV: clusterlogging.4.2.0-201910101614
  installplan:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-h6p6l
    uuid: b1c51009-f674-11e9-bf6e-0a8edca4ee4c
  lastUpdated: "2019-10-24T15:41:12Z"
  state: AtLatestKnown

Comment 5 Ben Parees 2019-10-24 17:29:59 UTC
please change the subscription to preview if you are installing in a 4.1 cluster.

We can pursue how to enforce this (OLM operators can specify a minimum kubeversion, sounds like the v4.2 logging operator should specify a minimum kubeversion that ensures it only gets installed on v4.2+ clusters), but manually choosing the correct channel should get you past this.
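
The effect of a minimum kubeversion can be sketched as a simple version comparison. This is an illustration of the idea only, not OLM's implementation: the helper names are hypothetical, the cluster version strings are made up, and "1.14.0" is used as an example because it is the minKubeVersion value quoted from the CSV later in this report.

```python
import re


def parse_kube_version(version: str) -> tuple:
    """Parse strings like 'v1.13.4+abc1234' or '1.14.0' into (major, minor, patch)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_min_kube_version(cluster_version: str, min_kube_version: str) -> bool:
    """True when the cluster's Kubernetes version is at least the required minimum."""
    return parse_kube_version(cluster_version) >= parse_kube_version(min_kube_version)


MIN_KUBE_VERSION = "1.14.0"  # example value, as quoted from the CSV in this report

# Hypothetical cluster versions, for illustration only.
print(meets_min_kube_version("v1.13.4+abc1234", MIN_KUBE_VERSION))  # False
print(meets_min_kube_version("v1.14.6+def5678", MIN_KUBE_VERSION))  # True
```

With a check like this in place, an operator version that needs a newer Kubernetes would be refused on an older cluster instead of installing and then failing to schedule its pods.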

Comment 6 Naveen Malik 2019-10-24 17:31:31 UTC
I have deleted the cluster-logging operator and installed from the "preview" channel and it works.

I hadn't tried selecting a specific channel before this and honestly wasn't sure what the UI allowed until I poked it.  How does a customer know what version of an operator can be deployed on what version of OCP?

Comment 7 Ben Parees 2019-10-24 17:33:08 UTC
see my comment 5.  I think we need to enforce a minKubeVersion in the logging operator to help avoid this problem in the future.  We can use this bug to get that done.

Comment 8 Ben Parees 2019-10-24 17:36:48 UTC
also if the nodeselector in question was on the operand (not the operator) then we should definitely enhance the logging api to allow the user to set their own nodeselector on the operand (i.e. we should also fix that as part of resolving this bug)

Comment 9 Ben Parees 2019-10-24 17:42:06 UTC
> In OSD we manage compute with Hive.  Hive does not allow edits to labels.

also it would be worth taking this up with the Hive team... i'd be interested to know if this is an intentional restriction in Hive, or a missing feature.

Maybe send a BZ/RFE their way also?

Comment 10 Jeff Cantrill 2019-10-25 13:14:32 UTC
(In reply to Ben Parees from comment #8)
> also if the nodeselector in question was on the operand (not the operator)
> then we should definitely enhance the logging api to allow the user to set
> their own nodeselector on the operand (i.e. we should also fix that as part
> of resolving this bug)

You can define a nodeSelector in the CR, but this one in particular is fixed: intentionally, it cannot be changed or overridden.
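
One way to picture this behavior is a merge in which the fixed entry always wins. The sketch below is an assumption about the general shape of such a merge, not the operator's actual code; `effective_node_selector` and the example user label are hypothetical.

```python
# The fixed selector entry described in this report.
FIXED_SELECTOR = {"kubernetes.io/os": "linux"}


def effective_node_selector(user_selector: dict) -> dict:
    """Merge the nodeSelector from the CR with the fixed entry; the fixed entry wins."""
    return {**user_selector, **FIXED_SELECTOR}


# A user-defined selector is kept, and the fixed entry is added alongside it.
print(effective_node_selector({"node-role.kubernetes.io/infra": ""}))

# An attempt to override the fixed key has no effect.
print(effective_node_selector({"kubernetes.io/os": "windows"}))
```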

Comment 11 Jeff Cantrill 2019-10-25 13:15:28 UTC
(In reply to Ben Parees from comment #7)
> see my comment 5.  I think we need to enforce a minKubeVersion in the
> logging operator to help avoid this problem in the future.  We can use this
> bug to get that done.

Do we know how to enforce it?

Comment 12 Ben Parees 2019-10-25 13:30:14 UTC
It's a csv setting, Evan should be able to show you how to do it.

Comment 15 okashi 2019-10-31 19:15:16 UTC
Please note that when installing Cluster Logging you must also be in the "openshift-logging" project and it is not enough to just select the "openshift-logging" namespace as described in the docs.

Comment 16 Anping Li 2019-11-07 09:05:29 UTC
CSV 4.3.0-201910311526 still has minKubeVersion: 1.14.0. Waiting for another image.

  # The version value is substituted by the ART pipeline
  version: 4.3.0-201910311526
  displayName: Cluster Logging
  minKubeVersion: 1.14.0

Comment 18 Anping Li 2019-11-22 10:15:06 UTC
The PR wasn't merged to the 4.3 branch.

Comment 19 Jeff Cantrill 2019-11-25 21:07:22 UTC
This BZ is for the 4.3 branch which was merged here https://github.com/openshift/cluster-logging-operator/pull/263

Comment 20 Anping Li 2019-12-03 02:42:43 UTC
Verified in 4.3. The Logging eventroute message can be gathered and transformed to JSON.

Comment 22 errata-xmlrpc 2020-01-23 11:09:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062