Bug 1623426
| Field | Value |
|---|---|
| Summary | [3.9] Recycler refuses to recycle PVs when a pod has a different PV using the same PVC |
| Product | OpenShift Container Platform |
| Component | Storage |
| Version | 3.7.0 |
| Target Release | 3.9.z |
| Status | CLOSED ERRATA |
| Severity | low |
| Priority | unspecified |
| Reporter | Priyanka Kanthale <pkanthal> |
| Assignee | Tomas Smetana <tsmetana> |
| QA Contact | Liang Xia <lxia> |
| CC | aos-bugs, aos-storage-staff, bchilds, lxia, pkanthal, pwallend |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | No Doc Update |
| Type | Bug |
| Cloned As | 1663145 (view as bug list) |
| Bug Depends On | 1663145, 1663260 |
| Last Closed | 2019-02-20 08:46:56 UTC |
Description
Priyanka Kanthale
2018-08-29 10:39:50 UTC
I will try to reproduce the problem. The code to invoke the recycler is the same in the newest openshift/kubernetes versions, so it's possible this is not only a 3.7 bug.

I think I know what is causing the problem: this method is not checking the PVC-to-PV binding and may incorrectly consider a PV to be still in use if the PVC was bound to another PV in the meantime: https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/controller/volume/persistentvolume/pv_controller.go#L1270

The fix is quite simple. I still have trouble reproducing the issue, though: it's quite timing-sensitive.

QE tried to verify the fix with the version and steps below, but the volume failed to mount, which blocks bug verification. @Tomas, could you check whether QE is doing anything incorrectly?

==================================================================================================================================================
# oc version
oc v3.9.64
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://preserve-qe-lxia-39-master-etcd-nfs-1:8443
openshift v3.9.64
kubernetes v1.9.1+a0ce1bc657
==================================================================================================================================================
# oc get pv,pvc,pod
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                 STORAGECLASS   REASON   AGE
pv/pv1            10G        RWX            Recycle          Bound       p1/mypvc                                      13m
pv/pv2            10G        RWX            Recycle          Available                                                 12m
pv/regpv-volume   17G        RWX            Retain           Bound       default/regpv-claim                           1d

NAME        STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc/mypvc   Bound    pv1      10G        RWX                           13m

NAME         READY   STATUS              RESTARTS   AGE
po/dynamic   0/1     ContainerCreating   0          12m
==================================================================================================================================================
Node logs:

Jan 08 04:27:20 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[123994]: I0108 04:27:20.182726  124020 volume_host.go:218] using default mounter/exec for kubernetes.io/nfs
Jan 08 04:27:20 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[123994]: I0108 04:27:20.282939  124020 operation_executor.go:895] Starting operationExecutor.MountVolume for volume "pv1" (UniqueName: "kubernetes.io/nfs/e0477a21-1326-11e9-aff2-fa163ed08e39-pv1") pod "dynamic" (UID: "e0477a21-1326-11e9-aff2-fa163ed08e39")
Jan 08 04:27:20 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[123994]: I0108 04:27:20.282995  124020 volume_host.go:218] using default mounter/exec for kubernetes.io/nfs
Jan 08 04:27:20 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[123994]: I0108 04:27:20.283022  124020 reconciler.go:262] operationExecutor.MountVolume started for volume "pv1" (UniqueName: "kubernetes.io/nfs/e0477a21-1326-11e9-aff2-fa163ed08e39-pv1") pod "dynamic" (UID: "e0477a21-1326-11e9-aff2-fa163ed08e39")
Jan 08 04:27:20 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[123994]: I0108 04:27:20.283111  124020 nsenter_mount.go:167] findmnt: directory /var/lib/origin/openshift.local.volumes/pods/e0477a21-1326-11e9-aff2-fa163ed08e39/volumes/kubernetes.io~nfs/pv1 does not exist
Jan 08 04:27:20 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[123994]: I0108 04:27:20.283142  124020 nsenter.go:107] Running nsenter command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- realpath -e /var/lib/origin/openshift.local.volumes/pods/e0477a21-1326-11e9-aff2-fa163ed08e39/volumes/kubernetes.io~nfs/pv1]
Jan 08 04:27:20 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[123994]: I0108 04:27:20.286812  124020 nsenter.go:151] failed to resolve symbolic links on /var/lib/origin/openshift.local.volumes/pods/e0477a21-1326-11e9-aff2-fa163ed08e39/volumes/kubernetes.io~nfs/pv1: exit status 1
Jan 08 04:27:20 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[123994]: I0108 04:27:20.286830  124020 nfs.go:237] NFS mount set up: /var/lib/origin/openshift.local.volumes/pods/e0477a21-1326-11e9-aff2-fa163ed08e39/volumes/kubernetes.io~nfs/pv1 false exit status 1
Jan 08 04:27:20 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[123994]: E0108 04:27:20.286937  124020 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/nfs/e0477a21-1326-11e9-aff2-fa163ed08e39-pv1\" (\"e0477a21-1326-11e9-aff2-fa163ed08e39\")" failed. No retries permitted until 2019-01-08 04:27:52.286909929 -0500 EST m=+79.248905481 (durationBeforeRetry 32s). Error: "MountVolume.SetUp failed for volume \"pv1\" (UniqueName: \"kubernetes.io/nfs/e0477a21-1326-11e9-aff2-fa163ed08e39-pv1\") pod \"dynamic\" (UID: \"e0477a21-1326-11e9-aff2-fa163ed08e39\") : exit status 1"
Jan 08 04:27:20 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[123994]: I0108 04:27:20.286959  124020 server.go:290] Event(v1.ObjectReference{Kind:"Pod", Namespace:"p1", Name:"dynamic", UID:"e0477a21-1326-11e9-aff2-fa163ed08e39", APIVersion:"v1", ResourceVersion:"136061", FieldPath:""}): type: 'Warning' reason: 'FailedMount' MountVolume.SetUp failed for volume "pv1" : exit status 1
Jan 08 04:27:20 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[123994]: I0108 04:27:20.287161  124020 request.go:881] Request Body: {"count":7,"lastTimestamp":"2019-01-08T09:27:20Z","message":"MountVolume.SetUp failed for volume \"pv1\" : exit status 1"}
==================================================================================================================================================

This looks to be unrelated to the fix in question.
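The suspected root cause (the in-use check matching pods by PVC name without verifying that the PVC is still bound to this particular PV) can be sketched as a simplified model. The type and function names below are hypothetical illustrations, not the actual pv_controller.go code:

```go
package main

import "fmt"

// Hypothetical, simplified model of the recycler's "is this PV still
// used?" decision. PV, PVC, Pod and both check functions are illustrative
// names, not the real pv_controller.go types.

type PVC struct {
	BoundVolume string // name of the PV this claim is currently bound to
}

type Pod struct {
	Namespace string
	ClaimName string // PVC the pod mounts
}

type PV struct {
	Name           string
	ClaimNamespace string // claimRef recorded on the PV
	ClaimName      string
}

// buggyIsVolumeUsed mirrors the reported flaw: it only compares the pod's
// PVC name against the PV's claimRef, so a pod whose PVC has since been
// re-bound to a *different* PV still blocks recycling.
func buggyIsVolumeUsed(pv PV, pods []Pod) bool {
	for _, p := range pods {
		if p.Namespace == pv.ClaimNamespace && p.ClaimName == pv.ClaimName {
			return true
		}
	}
	return false
}

// fixedIsVolumeUsed additionally checks that the PVC is actually bound to
// this PV before treating the volume as in use.
func fixedIsVolumeUsed(pv PV, pods []Pod, claims map[string]PVC) bool {
	for _, p := range pods {
		if p.Namespace != pv.ClaimNamespace || p.ClaimName != pv.ClaimName {
			continue
		}
		if claim, ok := claims[p.Namespace+"/"+p.ClaimName]; ok && claim.BoundVolume == pv.Name {
			return true
		}
	}
	return false
}

func main() {
	// pv1 was released; mypvc was deleted, recreated, and is now bound to pv2.
	pv1 := PV{Name: "pv1", ClaimNamespace: "default", ClaimName: "mypvc"}
	pods := []Pod{{Namespace: "default", ClaimName: "mypvc"}}
	claims := map[string]PVC{"default/mypvc": {BoundVolume: "pv2"}}

	fmt.Println(buggyIsVolumeUsed(pv1, pods))         // pv1 wrongly considered in use
	fmt.Println(fixedIsVolumeUsed(pv1, pods, claims)) // pv1 free to be recycled
}
```

With the claim re-bound to pv2, the name-only check still reports pv1 as in use and the recycler refuses to run, while the binding-aware check lets pv1 be recycled, matching the Released-to-Available transition seen in the verification below.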
Might be another instance of this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1663260

==================================================================================================================================================
# oc version
oc v3.9.65
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://qe-lxia-39-nfs-master-etcd-nfs-1:8443
openshift v3.9.65
kubernetes v1.9.1+a0ce1bc657
==================================================================================================================================================
# oc get pv pv1 pv2 ; oc get pvc mypvc
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM           STORAGECLASS   REASON   AGE
pv1       5G         RWX            Recycle          Bound       default/mypvc                           26m
pv2       5G         RWX            Recycle          Available                                           53s

NAME      STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mypvc     Bound    pv1      5G         RWX                           1m

# oc delete pvc mypvc ; oc create -f pvc.yaml
persistentvolumeclaim "mypvc" deleted
persistentvolumeclaim "mypvc" created

# oc get pv pv1 pv2 ; oc get pvc mypvc
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM           STORAGECLASS   REASON   AGE
pv1       5G         RWX            Recycle          Released   default/mypvc                           26m
pv2       5G         RWX            Recycle          Bound      default/mypvc                           1m

NAME      STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mypvc     Bound    pv2      5G         RWX                           3s

# oc get pv pv1 pv2 ; oc get pvc mypvc
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM           STORAGECLASS   REASON   AGE
pv1       5G         RWX            Recycle          Available                                           26m
pv2       5G         RWX            Recycle          Bound       default/mypvc                           1m

NAME      STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mypvc     Bound    pv2      5G         RWX                           10s
==================================================================================================================================================

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0331