Description of problem:
It takes about 10 minutes for the PVC to bind when the PVC is created before the PV it is pre-bound to.

Version-Release number of selected component (if applicable):
openshift v3.6.140
kubernetes v1.6.1+5115d708d7

How reproducible:
Always

Steps to Reproduce:
1. Create a PVC pre-bound to a PV (via spec.volumeName):

$ cat preboundpvc-rwo.yaml | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfsc
spec:
  volumeName: pv-oq83a
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

2. Create the PV it is pre-bound to:

$ cat nfs.json | oc create -f -
{
  "apiVersion": "v1",
  "kind": "PersistentVolume",
  "metadata": {
    "name": "pv-oq83a"
  },
  "spec": {
    "capacity": {
      "storage": "5Gi"
    },
    "accessModes": [
      "ReadWriteOnce"
    ],
    "nfs": {
      "path": "/jhou",
      "server": "10.66.79.133"
    }
  }
}

3. Check the PVC.

Actual results:
It takes a long time, about 10 minutes, to bind:

$ oc get pvc
NAME                STATUS    VOLUME     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-prebind-oq83a   Pending   pv-oq83a   0                                       3m
$ oc get pvc
NAME                STATUS    VOLUME     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-prebind-oq83a   Pending   pv-oq83a   0                                       4m
$ oc get pvc
NAME                STATUS    VOLUME     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-prebind-oq83a   Pending   pv-oq83a   0                                       4m
$ oc get pvc
NAME                STATUS    VOLUME     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-prebind-oq83a   Pending   pv-oq83a   0                                       4m
$ oc get pvc
NAME                STATUS    VOLUME     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-prebind-oq83a   Pending   pv-oq83a   0                                       4m
$ oc get pvc
NAME                STATUS    VOLUME     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-prebind-oq83a   Pending   pv-oq83a   0                                       6m
$ oc get pvc
NAME                STATUS    VOLUME     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-prebind-oq83a   Pending   pv-oq83a   0                                       6m
$ oc get pvc
NAME                STATUS    VOLUME     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-prebind-oq83a   Pending   pv-oq83a   0                                       8m
$ oc get pvc
NAME                STATUS    VOLUME     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-prebind-oq83a   Bound     pv-oq83a   5Gi        RWO                          10m

Expected results:
The PVC and PV should be bound within 60s.

Master Log:

Jul 12 07:39:54 storage-master-nfs-1 atomic-openshift-master: I0712 07:39:54.768371 7690 admission.go:104] no storage class for claim pvc-prebind-oq83a (generate: )
Jul 12 07:39:54 storage-master-nfs-1 atomic-openshift-master: I0712 07:39:54.770245 7690 wrap.go:42] POST /api/v1/namespaces/oq83a/persistentvolumeclaims: (8.081657ms) 201 [[oc/v1.6.1+5115d708d7 (linux/amd64) kubernetes/5115d70] 119.254.120.72:50091]
Jul 12 07:39:54 storage-master-nfs-1 atomic-openshift-master: I0712 07:39:54.770803 7690 graph_builder.go:457] GraphBuilder process object: v1/PersistentVolumeClaim, namespace oq83a, name pvc-prebind-oq83a, uid 4c0705df-66d5-11e7-b39c-000d3a1a322e, event type 0
Jul 12 07:39:54 storage-master-nfs-1 atomic-openshift-master: I0712 07:39:54.770851 7690 pv_controller_base.go:180] enqueued "oq83a/pvc-prebind-oq83a" for sync
Jul 12 07:39:54 storage-master-nfs-1 atomic-openshift-master: I0712 07:39:54.770908 7690 pv_controller_base.go:360] claimWorker[oq83a/pvc-prebind-oq83a]
Jul 12 07:39:54 storage-master-nfs-1 atomic-openshift-master: I0712 07:39:54.770923 7690 pv_controller_base.go:530] storeObjectUpdate: adding claim "oq83a/pvc-prebind-oq83a", version 13444
Jul 12 07:39:54 storage-master-nfs-1 atomic-openshift-master: I0712 07:39:54.770958 7690 pv_controller.go:219] synchronizing PersistentVolumeClaim[oq83a/pvc-prebind-oq83a]: phase: Pending, bound to: "pv-oq83a", bindCompleted: false, boundByController: false
Jul 12 07:39:54 storage-master-nfs-1 atomic-openshift-master: I0712 07:39:54.770964 7690 pv_controller.go:274] synchronizing unbound PersistentVolumeClaim[oq83a/pvc-prebind-oq83a]: volume "pv-oq83a" requested
Jul 12 07:39:54 storage-master-nfs-1 atomic-openshift-master: I0712 07:39:54.770973 7690 pv_controller.go:283] synchronizing unbound PersistentVolumeClaim[oq83a/pvc-prebind-oq83a]: volume "pv-oq83a" requested and not found, will try again next time
Jul 12 07:39:54 storage-master-nfs-1 atomic-openshift-master: I0712 07:39:54.770979 7690 pv_controller.go:566] updating PersistentVolumeClaim[oq83a/pvc-prebind-oq83a] status: set phase Pending
Jul 12 07:39:54 storage-master-nfs-1 atomic-openshift-master: I0712 07:39:54.770992 7690 pv_controller.go:614] updating PersistentVolumeClaim[oq83a/pvc-prebind-oq83a] status: phase Pending already set
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.317623 7690 pv_controller_base.go:180] enqueued "pv-oq83a" for sync
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.317654 7690 pv_controller_base.go:302] volumeWorker[pv-oq83a]
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.317668 7690 pv_controller_base.go:530] storeObjectUpdate: adding volume "pv-oq83a", version 13449
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.317707 7690 pv_controller.go:407] synchronizing PersistentVolume[pv-oq83a]: phase: Pending, bound to: "", boundByController: false
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.317717 7690 pv_controller.go:412] synchronizing PersistentVolume[pv-oq83a]: volume is unused
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.317721 7690 pv_controller.go:663] updating PersistentVolume[pv-oq83a]: set phase Available
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.327392 7690 handler.go:146] kube-apiserver: PUT "/api/v1/persistentvolumes/pv-oq83a/status" satisfied by gorestful with webservice /api/v1
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.329114 7690 wrap.go:42] PUT /api/v1/persistentvolumes/pv-oq83a/status: (10.713973ms) 200 [[openshift/v1.6.1+5115d708d7 (linux/amd64) kubernetes/fff65cf/system:serviceaccount:kube-system:persistent-volume-binder] 52.179.12.254:58410]
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.329314 7690 graph_builder.go:457] GraphBuilder process object: v1/PersistentVolume, namespace , name pv-oq83a, uid 52505f23-66d5-11e7-b39c-000d3a1a322e, event type 1
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.329383 7690 pv_controller_base.go:180] enqueued "pv-oq83a" for sync
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.329545 7690 pv_controller_base.go:558] storeObjectUpdate updating volume "pv-oq83a" with version 13450
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.329563 7690 pv_controller.go:692] volume "pv-oq83a" entered phase "Available"
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.329587 7690 pv_controller_base.go:302] volumeWorker[pv-oq83a]
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.329596 7690 pv_controller_base.go:558] storeObjectUpdate updating volume "pv-oq83a" with version 13450
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.329605 7690 pv_controller.go:407] synchronizing PersistentVolume[pv-oq83a]: phase: Available, bound to: "", boundByController: false
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.329614 7690 pv_controller.go:412] synchronizing PersistentVolume[pv-oq83a]: volume is unused
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.329617 7690 pv_controller.go:663] updating PersistentVolume[pv-oq83a]: set phase Available
Jul 12 07:40:05 storage-master-nfs-1 atomic-openshift-master: I0712 07:40:05.329622 7690 pv_controller.go:666] updating PersistentVolume[pv-oq83a]: phase Available already set
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.118432 7690 graph_builder.go:457] GraphBuilder process object: v1/PersistentVolume, namespace , name pv-oq83a, uid 52505f23-66d5-11e7-b39c-000d3a1a322e, event type 1
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.118466 7690 pv_controller_base.go:180] enqueued "pv-oq83a" for sync
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.118558 7690 pv_controller_base.go:558] storeObjectUpdate updating volume "pv-oq83a" with version 13693
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.118574 7690 pv_controller.go:692] volume "pv-oq83a" entered phase "Bound"
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.118581 7690 pv_controller.go:788] updating PersistentVolumeClaim[oq83a/pvc-prebind-oq83a]: binding to "pv-oq83a"
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.118596 7690 pv_controller.go:828] volume "pv-oq83a" bound to claim "oq83a/pvc-prebind-oq83a"
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.123735 7690 store.go:345] GuaranteedUpdate of /kubernetes.io/persistentvolumes/pv-oq83a failed because of a conflict, going to retry
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.123950 7690 wrap.go:42] PUT /api/v1/persistentvolumes/pv-oq83a/status: (16.552712ms) 409 [[openshift/v1.6.1+5115d708d7 (linux/amd64) kubernetes/fff65cf/system:serviceaccount:kube-system:persistent-volume-binder] 52.179.12.254:58410]
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.123965 7690 handler.go:146] kube-apiserver: PUT "/api/v1/namespaces/default/endpoints/router" satisfied by gorestful with webservice /api/v1
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.124258 7690 pv_controller.go:684] updating PersistentVolume[pv-oq83a]: set phase Bound failed: Operation cannot be fulfilled on persistentvolumes "pv-oq83a": the object has been modified; please apply your changes to the latest version and try again
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.124295 7690 pv_controller_base.go:215] could not sync volume "pv-oq83a": Operation cannot be fulfilled on persistentvolumes "pv-oq83a": the object has been modified; please apply your changes to the latest version and try again
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.124309 7690 pv_controller_base.go:302] volumeWorker[pv-oq83a]
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.124341 7690 pv_controller_base.go:558] storeObjectUpdate updating volume "pv-oq83a" with version 13693
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.124359 7690 pv_controller.go:407] synchronizing PersistentVolume[pv-oq83a]: phase: Bound, bound to: "oq83a/pvc-prebind-oq83a (uid: 4c0705df-66d5-11e7-b39c-000d3a1a322e)", boundByController: true
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.124364 7690 pv_controller.go:432] synchronizing PersistentVolume[pv-oq83a]: volume is bound to claim oq83a/pvc-prebind-oq83a
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.124373 7690 pv_controller.go:449] synchronizing PersistentVolume[pv-oq83a]: claim oq83a/pvc-prebind-oq83a found: phase: Pending, bound to: "pv-oq83a", bindCompleted: false, boundByController: false
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.124380 7690 pv_controller.go:502] synchronizing PersistentVolume[pv-oq83a]: all is bound
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.124383 7690 pv_controller.go:663] updating PersistentVolume[pv-oq83a]: set phase Bound
Jul 12 07:49:40 storage-master-nfs-1 atomic-openshift-master: I0712 07:49:40.124388 7690 pv_controller.go:666] updating PersistentVolume[pv-oq83a]: phase Bound already set

Additional info:
This also happened when the PV was pre-bound to the PVC (the reverse direction).
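To clarify the reverse case: pre-binding the PV to the PVC means setting spec.claimRef on the PV. Below is a minimal Go sketch (client-go types) that builds the equivalent of the JSON above plus a claimRef pointing at the claim from this report; it only prints the object, which could then be fed to oc create -f -. It is an illustration, not a manifest taken from the original report.

package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A PV pre-bound to a claim via spec.claimRef, mirroring volumeName on the PVC side.
	pv := corev1.PersistentVolume{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "PersistentVolume"},
		ObjectMeta: metav1.ObjectMeta{Name: "pv-oq83a"},
		Spec: corev1.PersistentVolumeSpec{
			Capacity: corev1.ResourceList{
				corev1.ResourceStorage: resource.MustParse("5Gi"),
			},
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			PersistentVolumeSource: corev1.PersistentVolumeSource{
				NFS: &corev1.NFSVolumeSource{Path: "/jhou", Server: "10.66.79.133"},
			},
			// Pre-bind: the PV points at the claim it should bind to.
			ClaimRef: &corev1.ObjectReference{
				Namespace: "oq83a",
				Name:      "pvc-prebind-oq83a",
			},
		},
	}

	out, _ := json.MarshalIndent(pv, "", "  ")
	fmt.Println(string(out)) // pipe this into `oc create -f -` to reproduce the variant
}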
Reproduced locally. So far I have observed it only on fresh "installations" of locally compiled OpenShift (tag v3.6.140-1):

$ rm -rf openshift.local.etcd
$ openshift start --loglevel=5 |& tee log
$ oc create -f pvc.yaml
  (any PVC is fine as long as it does not trigger dynamic provisioning)
$ grep synchronizing log
11:05:47.192983 12268 pv_controller.go:219] synchronizing PersistentVolumeClaim[default/nfsc]: phase: Pending, ...

This line should repeat every 15 seconds. *Sometimes* it does not repeat at all. Restarting openshift-master usually fixes the problem.

Digging deeper, I noticed these lines in the log:

I0713 11:47:59.704986 17163 client_builder.go:233] Verified credential for persistent-volume-binder/kube-system
...
W0713 11:47:59.705900 17163 shared_informer.go:298] resyncPeriod 15000000000 is smaller than resyncCheckPeriod 600000000000 and the informer has already started. Changing it to 600000000000
W0713 11:47:59.705951 17163 shared_informer.go:298] resyncPeriod 15000000000 is smaller than resyncCheckPeriod 600000000000 and the informer has already started. Changing it to 600000000000

They come from PV controller startup, where it calls AddEventHandlerWithResyncPeriod() on the PV and PVC shared informers, requesting a resync every 15 seconds. Instead it gets 600 seconds, i.e. 10 minutes.

There seems to be a race: the bug reproduces when OpenShift is busy initializing everything (namespaces, roles, ...) and does not reproduce (or is much harder to reproduce) when it starts with an already pre-populated etcd.
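For reference, here is a minimal Go paraphrase of the clamping rule behind the shared_informer.go warnings above. It is a toy stand-in, not the upstream code (names are illustrative): once the informer has started, a handler asking for a resync shorter than the informer's resyncCheckPeriod is silently bumped up to that period.

package main

import (
	"fmt"
	"time"
)

// sharedInformer is a toy stand-in for client-go's sharedIndexInformer,
// reduced to the fields relevant to the clamping seen in the logs.
type sharedInformer struct {
	started           bool
	resyncCheckPeriod time.Duration // set from the factory's default resync
}

// addHandlerResync mimics how AddEventHandlerWithResyncPeriod treats the
// requested resync period: after Start(), a shorter request can no longer
// lower resyncCheckPeriod, so it gets clamped.
func (s *sharedInformer) addHandlerResync(requested time.Duration) time.Duration {
	if requested < s.resyncCheckPeriod {
		if s.started {
			fmt.Printf("W: resyncPeriod %d is smaller than resyncCheckPeriod %d and the informer has already started. Changing it to %d\n",
				requested, s.resyncCheckPeriod, s.resyncCheckPeriod)
			return s.resyncCheckPeriod
		}
		// Before Start(), the informer simply lowers its check period.
		s.resyncCheckPeriod = requested
	}
	return requested
}

func main() {
	inf := &sharedInformer{resyncCheckPeriod: 10 * time.Minute}

	// Handler registered before Start(): the 15s resync is honoured.
	fmt.Println("before start:", inf.addHandlerResync(15*time.Second))

	// Handler registered after Start(): the 15s request becomes 10 minutes,
	// which matches the ~10 minute bind delay in this bug.
	inf.started = true
	fmt.Println("after start: ", inf.addHandlerResync(15*time.Second))
}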
Bisect revealed the culprit:

commit f80e65ca2976add1bc994f3b65532e53408ed120 (HEAD, refs/bisect/bad)
Author: deads2k <deads>
Date:   Tue Jun 20 09:17:20 2017 -0400

    refactor openshift start to separate controllers and apiserver

Steps to reproduce:
1. rm -rf openshift.local.etcd (this makes the bug more reproducible, as it slows down openshift startup)
2. openshift start --loglevel=5 |& tee log
3. wait ~1 minute
4. grep resyncPeriod log

Actual result (in ~50% of cases):

shared_informer.go:298] resyncPeriod 300000000000 is smaller than resyncCheckPeriod 600000000000 and the informer has already started. Changing it to 600000000000
shared_informer.go:298] resyncPeriod 15000000000 is smaller than resyncCheckPeriod 600000000000 and the informer has already started. Changing it to 600000000000
shared_informer.go:298] resyncPeriod 30000000000 is smaller than resyncCheckPeriod 600000000000 and the informer has already started. Changing it to 600000000000
shared_informer.go:298] resyncPeriod 120000000000 is smaller than resyncCheckPeriod 600000000000 and the informer has already started. Changing it to 600000000000
shared_informer.go:298] resyncPeriod 120000000000 is smaller than resyncCheckPeriod 600000000000 and the informer has already started. Changing it to 600000000000

Expected result:

W0713 12:13:30.171812 20020 shared_informer.go:298] resyncPeriod 120000000000 is smaller than resyncCheckPeriod 600000000000 and the informer has already started. Changing it to 600000000000

[For some reason there was always one complaining informer, even before the patch. That's worth investigating too; maybe it breaks something.]

Looking at the code, Master.Start() starts both the API server and the controller-manager this way:

    if controllersEnabled {
        ...
        go func() {
            controllerPlug.WaitForStart()
            startControllers()
            informers.Start(utilwait.NeverStop)
        }()
    }

    if m.api {
        ...
        informers.Start(utilwait.NeverStop)
    }

So we have two goroutines that eventually start `informers`. Can the "api" one start the informers before the "controller" one has initialized everything?
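To illustrate the ordering constraint that race would violate, here is a minimal client-go sketch (using the fake clientset so it runs without a cluster; the handler is illustrative, and this is not necessarily what the eventual fix does): handlers that want a 15s resync must be registered before the shared informer factory is started, otherwise they are clamped to the factory default.

package main

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes/fake"
	"k8s.io/client-go/tools/cache"
)

func main() {
	client := fake.NewSimpleClientset()

	// Factory default resync of 10 minutes, matching the
	// resyncCheckPeriod of 600000000000 ns in the warnings above.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	pvInformer := factory.Core().V1().PersistentVolumes().Informer()

	stopCh := make(chan struct{})
	defer close(stopCh)

	// Correct order: register the 15s handler first...
	pvInformer.AddEventHandlerWithResyncPeriod(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			_ = newObj.(*corev1.PersistentVolume) // a PV controller would re-sync the volume here
		},
	}, 15*time.Second)

	// ...then start the factory. If these two steps are swapped -- or raced
	// from another goroutine, as in Master.Start() above -- the handler ends
	// up resyncing every 10 minutes instead of every 15 seconds.
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
}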
Opened https://github.com/kubernetes/kubernetes/issues/48941 upstream to fix the PV controller's dependence on a 15-second resync.
Is this only occurring in an "all-in-one" setup (with controllers and apiserver running in the same process)?
fixed by https://github.com/openshift/origin/pull/15217
I tried this on an HA OCP cluster, where the controllers and the apiserver run as separate services. The issue does not reproduce there. Version:

openshift v3.6.151
kubernetes v1.6.1+5115d708d7

# for num in {0..10} ; do oc get pvc ; done
pvc-w8bdw   Pending                                                              19s
NAME        STATUS    VOLUME             CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-w8bdw   Pending   pv-prebind-w8bdw   0                                       19s
NAME        STATUS    VOLUME             CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-w8bdw   Pending   pv-prebind-w8bdw   0                                       20s
NAME        STATUS    VOLUME             CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-w8bdw   Bound     pv-prebind-w8bdw   1Gi        RWO                          20s
NAME        STATUS    VOLUME             CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-w8bdw   Bound     pv-prebind-w8bdw   1Gi        RWO                          20s
NAME        STATUS    VOLUME             CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-w8bdw   Bound     pv-prebind-w8bdw   1Gi        RWO                          20s
NAME        STATUS    VOLUME             CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-w8bdw   Bound     pv-prebind-w8bdw   1Gi        RWO                          20s
NAME        STATUS    VOLUME             CAPACITY   ACCESSMODES   STORAGECLASS   AGE
pvc-w8bdw   Bound     pv-prebind-w8bdw   1Gi        RWO                          21s
> Is this only occurring in an "all-in-one" setup (with controllers and apiserver running in the same process)?

Yes, the PV controller is fine in a standalone controller-manager process.

On the other hand, I still see two informers started too early on Friday's(?) master:

github.com/openshift/origin/pkg/service/controller/servingcert.NewServiceServingCertController: resyncPeriod 120000000000 is smaller than resyncCheckPeriod 600000000000 and the informer has already started
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/namespace.NewNamespaceController: resyncPeriod 300000000000 is smaller than resyncCheckPeriod 600000000000 and the informer has already started.

I am not sure whether it has any impact on OpenShift stability; feel free to open a separate bug if you think it's worth fixing.
I'm not concerned about those
fixed in 3.6.148-1
Tested on the version below:

openshift v3.6.153
kubernetes v1.6.1+5115d708d7

This bug is fixed.
This was fixed in 3.6.0