Bug 1911016
| Summary: | Prometheus unable to mount NFS volumes after upgrading to 4.6 | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Yash Chouksey <ychoukse> | 
| Component: | Node | Assignee: | Peter Hunt <pehunt> | 
| Node sub component: | CRI-O | QA Contact: | MinLi <minmli> | 
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | alegrand, anpicker, aos-bugs, dkulkarn, erooth, hekumar, kakkoyun, ksathe, lcosic, minmli, pehunt, pkrupa, rbost, rdomnu, rphillips, schoudha, surbania, wking, xingli | 
| Version: | 4.6 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | runc-1.0.0-82.rhaos4.6.git086e841.el8 | Doc Type: | If docs needed, set a value | 
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-02-24 15:48:29 UTC | Type: | Bug | 
| Regression: | --- | Mount Type: | --- | 
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||

Description: Yash Chouksey, 2020-12-27 01:19:46 UTC

The upgrade from 4.6.16 to 4.7.0-0.nightly-2021-02-09-192846 succeeded, and after the upgrade the prometheus pods run well and mount the NFS volume normally.
@Peter Hunt, I am not sure whether we can verify the fix by upgrading from 4.6 to 4.7, because in 4.6 the user ID in the prometheus pod is already 65534, and it stays the same after the upgrade to 4.7. In the original bug, the user ID was 0 (root) in 4.5 and changed to 65534 after the upgrade to 4.6.
Can you confirm this? If yes, I think this bug is verified.
before upgrade ================
    volumeMounts:
    - mountPath: /prometheus
      name: prometheus-k8s-db
      subPath: prometheus-db
  
volumes:
  - name: prometheus-k8s-db
    persistentVolumeClaim:
      claimName: prometheus-k8s-db-prometheus-k8s-0
$ oc get pvc 
NAME                                 STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-k8s-db-prometheus-k8s-0   Bound    nfspv6   20Gi       RWO,ROX,RWX    nfs215         82s
prometheus-k8s-db-prometheus-k8s-1   Bound    nfspv5   20Gi       RWO,ROX,RWX    nfs215         82s
$ oc rsh prometheus-k8s-0
Defaulting container name to prometheus.
Use 'oc describe pod/prometheus-k8s-0 -n openshift-monitoring' to see all of the containers in this pod.
sh-4.4$ id 
uid=65534(nobody) gid=65534(nobody) groups=65534(nobody)
sh-4.4$ pwd
/prometheus
sh-4.4$ ls
chunks_head  queries.active  wal
sh-4.4$ ls -lR
.:
total 20
drwxr-xr-x. 2 nobody nobody     6 Feb 10 09:22 chunks_head
-rw-r--r--. 1 nobody nobody 20001 Feb 10 09:27 queries.active
drwxr-xr-x. 2 nobody nobody    22 Feb 10 09:22 wal
after upgrade ============
prometheus-k8s-0                               7/7     Running   1          37m
prometheus-k8s-1                               7/7     Running   1          42m
$ oc rsh prometheus-k8s-0
Defaulting container name to prometheus.
Use 'oc describe pod/prometheus-k8s-0 -n openshift-monitoring' to see all of the containers in this pod.
sh-4.4$ id 
uid=65534(nobody) gid=65534(nobody) groups=65534(nobody)
sh-4.4$ pwd
/prometheus
sh-4.4$ ls
chunks_head  queries.active  wal
sh-4.4$ ls -lR
.:
total 20
drwxr-xr-x. 2 nobody nobody    48 Feb 10 10:20 chunks_head
-rw-r--r--. 1 nobody nobody 20001 Feb 10 10:52 queries.active
drwxr-xr-x. 2 nobody nobody    54 Feb 10 10:15 wal
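In both listings above, every entry under /prometheus is owned by nobody (65534), the same uid the container runs as, so the owner write bit applies. As a rough illustration of why that matters, here is a minimal sketch of the classic POSIX mode-bit check. The function name `can_write` and the root-owned comparison case are hypothetical and not taken from this report, and the sketch ignores ACLs, SELinux, and NFS export options such as root_squash.

```python
import stat


def can_write(mode, owner_uid, owner_gid, uid, gids):
    """Simplified POSIX write check based only on mode bits.

    Assumption: no ACLs, no SELinux, no NFS squash options are modelled.
    """
    if uid == 0:
        # Local root bypasses mode bits (an NFS export may still restrict it).
        return True
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Directory as shown in the listings: drwxr-xr-x nobody nobody
owned_by_nobody = can_write(0o755, 65534, 65534, uid=65534, gids={65534})

# Hypothetical comparison: the same mode, but owned by root (uid 0, gid 0)
owned_by_root = can_write(0o755, 0, 0, uid=65534, gids={65534})
```

Under these assumptions, a process running as 65534 can write to the directory it owns but not to an otherwise identical directory owned by root, because only the "other" bits apply there.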
I believe the only reason the upgrade from 4.5 to 4.6 failed was that the directory permissions weren't correctly handled for the new ID, not necessarily that the ID changed. That is, the old ID (root) had permission but the new one did not. Thus, I think the successful upgrade from 4.6 to 4.7 verifies that the bug is fixed, as this ID does have the correct permissions.

Thanks, Peter, for the confirmation; marking it verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633