Bug 2228805 - rsh failed for new prometheus-adapter pod which is running in openshift-storage NS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.14.0
Assignee: Naveen Paul
QA Contact: Vijay Avuthu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-08-03 10:25 UTC by Vijay Avuthu
Modified: 2023-11-08 18:54 UTC
CC: 6 users

Fixed In Version: 4.14.0-110
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 18:53:30 UTC
Embargoed:


Attachments: none


Links
System ID | Status | Summary | Last Updated
Github noobaa/noobaa-operator pull 1190 | Merged | Skip hpav2 resources if metrics type is Resource | 2023-08-10 08:45:39 UTC
Github noobaa/noobaa-operator pull 1194 | open | [backport to 5.14] Skip hpav2 resources if metrics type is Resource | 2023-08-10 08:45:41 UTC
Red Hat Product Errata RHSA-2023:6832 | None | None | 2023-11-08 18:54:29 UTC

Description Vijay Avuthu 2023-08-03 10:25:00 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

A new "prometheus-adapter" pod is running in the openshift-storage namespace, and rsh into it fails.

Version of all relevant components (if applicable):
ODF 4.14.0-96


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2/2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Not tried

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install ODF using ocs-ci.
2. A new "prometheus-adapter" pod shows up, and it is not possible to rsh into it.


Actual results:

$ oc rsh prometheus-adapter-6cb87d55b8-46qvl
ERRO[0000] exec failed: unable to start container process: exec: "/bin/sh": stat /bin/sh: no such file or directory 
command terminated with exit code 255

(The upstream registry.k8s.io prometheus-adapter image appears to be a minimal, shell-less image, so there is no /bin/sh for rsh to exec.)

Expected results:

rsh should work


Additional info:

$ oc get pods | grep -i prom
prometheus-adapter-6cb87d55b8-46qvl                               1/1     Running     0          31m

$ oc get pod prometheus-adapter-6cb87d55b8-46qvl -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.128.2.33/23"],"mac_address":"0a:58:0a:80:02:21","gateway_ips":["10.128.2.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.128.2.1"},{"dest":"172.30.0.0/16","nextHop":"10.128.2.1"},{"dest":"100.64.0.0/16","nextHop":"10.128.2.1"}],"ip_address":"10.128.2.33/23","gateway_ip":"10.128.2.1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.128.2.33"
          ],
          "mac": "0a:58:0a:80:02:21",
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: restricted-v2
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2023-08-03T09:41:26Z"
  generateName: prometheus-adapter-6cb87d55b8-
  labels:
    app.kubernetes.io/component: metrics-adapter
    app.kubernetes.io/name: prometheus-adapter
    app.kubernetes.io/version: 0.10.0
    pod-template-hash: 6cb87d55b8
  name: prometheus-adapter-6cb87d55b8-46qvl
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: prometheus-adapter-6cb87d55b8
    uid: 209d4e01-50e1-4fd0-a38d-81b7ef16d94c
  resourceVersion: "74594"
  uid: 32ec7844-4d72-4a7c-9aea-5bd1d5ad5981
spec:
  automountServiceAccountToken: true
  containers:
  - args:
    - --v=6
    - --config=/etc/adapter/config.yaml
    - --logtostderr=true
    - --metrics-relist-interval=1m
    - --secure-port=6443
    - --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
    - --prometheus-url=https://prometheus-k8s.openshift-monitoring.svc:9091
    - --prometheus-auth-config=/etc/prometheus-config/prometheus-config.yaml
    - --tls-cert-file=/var/run/serving-cert/tls.crt
    - --tls-private-key-file=/var/run/serving-cert/tls.key
    image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.10.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 5
      httpGet:
        path: /livez
        port: https
        scheme: HTTPS
      initialDelaySeconds: 30
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 1
    name: prometheus-adapter
    ports:
    - containerPort: 6443
      name: https
      protocol: TCP
    readinessProbe:
      failureThreshold: 5
      httpGet:
        path: /readyz
        port: https
        scheme: HTTPS
      initialDelaySeconds: 30
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: 102m
        memory: 180Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 1000670000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /tmp
      name: tmpfs
      readOnly: true
    - mountPath: /var/run/serving-cert
      name: volume-serving-cert
    - mountPath: /etc/adapter
      name: config
      readOnly: true
    - mountPath: /etc/prometheus-config
      name: prometheus-adapter-prometheus-config
    - mountPath: /etc/ssl/certs
      name: serving-certs-ca-bundle
      readOnly: true
    - mountPath: /var/run/empty/serving-cert
      name: volume-empty-serving-cert
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-bgs4q
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: custom-metrics-prometheus-adapter-dockercfg-72hmj
  nodeName: compute-0
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000670000
    seLinuxOptions:
      level: s0:c26,c10
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: custom-metrics-prometheus-adapter
  serviceAccountName: custom-metrics-prometheus-adapter
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - emptyDir: {}
    name: tmpfs
  - name: volume-serving-cert
    secret:
      defaultMode: 420
      optional: true
      secretName: prometheus-adapter-serving-cert
  - configMap:
      defaultMode: 420
      name: adapter-config
      optional: true
    name: config
  - configMap:
      defaultMode: 420
      name: prometheus-adapter-prometheus-config
      optional: true
    name: prometheus-adapter-prometheus-config
  - configMap:
      defaultMode: 420
      items:
      - key: service-ca.crt
        path: service-ca.crt
      name: serving-certs-ca-bundle
      optional: true
    name: serving-certs-ca-bundle
  - emptyDir: {}
    name: volume-empty-serving-cert
  - name: kube-api-access-bgs4q
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-08-03T09:41:26Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-08-03T09:42:02Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-08-03T09:42:02Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-08-03T09:41:26Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://e099e46d491a8c1814a7e5b5d6ba8df6e937295a51e8602ec4aac0bc58b6bdc9
    image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.10.0
    imageID: registry.k8s.io/prometheus-adapter/prometheus-adapter@sha256:276d59929ae0429b2ce765f814a2ac3c4bd6e315f1a59ccd3315b0da097dcf1e
    lastState: {}
    name: prometheus-adapter
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-08-03T09:41:28Z"
  hostIP: 10.1.112.199
  phase: Running
  podIP: 10.128.2.33
  podIPs:
  - ip: 10.128.2.33
  qosClass: Burstable
  startTime: "2023-08-03T09:41:26Z"

The owning object's ownerReferences point to the NooBaa CR:
  ownerReferences:
  - apiVersion: noobaa.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: NooBaa
    name: noobaa
    uid: 6615a5e2-d7d3-4495-b08b-b53b7099da4d
  resourceVersion: "74598"
  uid: 7165cc08-7cb2-44eb-9f49-f802236e917e


job: https://url.corp.redhat.com/29c3460
must gather: https://url.corp.redhat.com/7b80ee0

Also, could you provide some info on why this pod is useful and how it differs from the
"prometheus-adapter" in the "openshift-monitoring" namespace?

Comment 2 Naveen Paul 2023-08-07 05:22:31 UTC
@vijay The `prometheus-adapter` in `openshift-storage` is similar to the one in `openshift-monitoring`. The difference is that the adapter in openshift-storage pulls metrics from NooBaa-specific metrics endpoints and makes them available to the NooBaa HPAv2 (HorizontalPodAutoscaler v2) as custom metrics.
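
For context on the fix: per the linked noobaa-operator PRs ("Skip hpav2 resources if metrics type is Resource"), the operator now skips deploying the prometheus-adapter and the related HPAv2 custom-metrics resources when the autoscaler only uses Resource-type metrics, since CPU/memory metrics are already served by the cluster's standard metrics pipeline. Below is a minimal Go sketch of that decision, assuming an illustrative helper name (needsCustomMetricsAdapter is not the operator's actual identifier):

package main

import (
	"fmt"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
)

// needsCustomMetricsAdapter reports whether an HPAv2 spec references any
// metric that is NOT of type Resource. Resource metrics (CPU/memory) come
// from the cluster's standard metrics pipeline, so a NooBaa-specific
// prometheus-adapter is only needed for other metric types.
// Illustrative sketch only, not the noobaa-operator's actual code.
func needsCustomMetricsAdapter(spec autoscalingv2.HorizontalPodAutoscalerSpec) bool {
	for _, m := range spec.Metrics {
		if m.Type != autoscalingv2.ResourceMetricSourceType {
			return true
		}
	}
	return false
}

func main() {
	// An HPA that scales only on Resource metrics: no adapter required.
	cpuOnly := autoscalingv2.HorizontalPodAutoscalerSpec{
		Metrics: []autoscalingv2.MetricSpec{
			{Type: autoscalingv2.ResourceMetricSourceType},
		},
	}
	fmt.Println(needsCustomMetricsAdapter(cpuOnly)) // false -> skip adapter resources
}

With a check like this in place, a CPU/memory-only autoscaling configuration no longer brings up the extra prometheus-adapter pod in openshift-storage, which matches the verification in comment 10.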

Comment 10 Vijay Avuthu 2023-08-17 10:22:31 UTC
verified with ocs-registry:4.14.0-110

job: https://url.corp.redhat.com/9e3924e
logs: https://url.corp.redhat.com/cb00547

As expected, the prometheus-adapter pod is no longer present after installation:

$ oc get csv
NAME                                         DISPLAY                       VERSION             REPLACES   PHASE
mcg-operator.v4.14.0-110.stable              NooBaa Operator               4.14.0-110.stable              Succeeded
ocs-operator.v4.14.0-110.stable              OpenShift Container Storage   4.14.0-110.stable              Succeeded
odf-csi-addons-operator.v4.14.0-110.stable   CSI Addons                    4.14.0-110.stable              Succeeded
odf-operator.v4.14.0-110.stable              OpenShift Data Foundation     4.14.0-110.stable              Succeeded

$ oc get pods | grep -i prom
$

Comment 12 Naveen Paul 2023-09-21 06:35:43 UTC
No doc update is needed for this BZ.

Comment 14 errata-xmlrpc 2023-11-08 18:53:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832

