Description of problem:
When creating a PV-backed app, persistent volumes take about 7-8 minutes to bind to the PVC. It takes even longer for the PV to get an AWS volume ID.

Version-Release number of selected component (if applicable):
atomic-openshift-3.3.1.3-1.git.0.86dc49a.el7.x86_64
oso-volume-provisioner:v3.3.0.0-4

How reproducible:
100%

Steps to Reproduce:
1. oadm new-project <projectname>
2. oc new-app cakephp-mysql-example -n <projectname>
3. oc get pvc -n <projectname> -w

Actual results:
It takes about 7-8 minutes for the PVC to bind, and the PV is bound without a real AWS volume ID.

Expected results:
The PVC should bind within about a minute and have a real AWS volume ID.

Additional info:

# These logs show the process of creating a new project and app.
# Even though I've never used this project name before, it said it already exists. This particular behavior is not very reproducible, but the overall behavior here (PVs are slow to provision/bind) is.
[root@preview-master-e69da ~]# oadm new-project dakinitestpvs
Error from server: namespaces "dakinitestpvs" already exists
[root@preview-master-e69da ~]# oc new-app cakephp-mysql-example -n dakinitestpvs

# After 8 minutes, the PV still does not have an AWS volume ID.
[root@preview-master-e69da ~]# oc get pvc -n dakinitestpvs -w
NAME      STATUS    VOLUME         CAPACITY   ACCESSMODES   AGE
mysql     Pending                                           54s
NAME      STATUS    VOLUME         CAPACITY   ACCESSMODES   AGE
mysql     Pending   pv-aws-vy3qj   0                        5m
mysql     Bound     pv-aws-vy3qj   1Gi        RWO           5m
[root@preview-master-e69da ~]# oc get pvc -n dakinitestpvs -w
NAME      STATUS    VOLUME         CAPACITY   ACCESSMODES   AGE
mysql     Bound     pv-aws-vy3qj   1Gi        RWO           8m
[root@preview-master-e69da ~]# oc describe pv pv-aws-vy3qj
Name:            pv-aws-vy3qj
Labels:          <none>
Status:          Bound
Claim:           dakinitestpvs/mysql
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        1Gi
Message:
Source:
    Type:        AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:    placeholder-for-provisioning
    FSType:      ext4
    Partition:   0
    ReadOnly:    false
No events.

# After 10-11 minutes I noticed it finally has an AWS volume ID.
[root@preview-master-e69da ~]# oc get pvc -n dakinitestpvs -w
The connection to the server internal.api.preview.openshift.com was refused - did you specify the right host or port?
[root@preview-master-e69da ~]# oc get pvc -n dakinitestpvs
NAME      STATUS    VOLUME         CAPACITY   ACCESSMODES   AGE
mysql     Bound     pv-aws-vy3qj   1Gi        RWO           11m
[root@preview-master-e69da ~]# oc describe pv pv-aws-vy3qj
Name:            pv-aws-vy3qj
Labels:          failure-domain.beta.kubernetes.io/region=us-east-1
                 failure-domain.beta.kubernetes.io/zone=us-east-1c
Status:          Bound
Claim:           dakinitestpvs/mysql
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        1Gi
Message:
Source:
    Type:        AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:    aws://us-east-1c/vol-01472e33d2fc71a03
    FSType:      ext4
    Partition:   0
    ReadOnly:    false
No events.

# PV provisioner pod logs
[root@preview-master-e69da ~]# oc logs --tail=500 online-volume-provisioner-4-kq3o0 -n openshift-infra -f |grep dakinitestpvs
E1107 19:08:10.189364 1 provisioner.go:362] PersistentVolumeClaim[dakinitestpvs/mysql] error updating claim.
This could lead to a duplicate volumes being provisioned: Operation cannot be fulfilled on persistentvolumeclaims "mysql": the object has been modified; please apply your changes to the latest version and try again
I1107 19:08:10.189504 1 event.go:211] Event(api.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"dakinitestpvs", Name:"mysql", UID:"9b972792-a546-11e6-8c72-0ebeb1070c7f", APIVersion:"v1", ResourceVersion:"322792491", FieldPath:""}): type: 'Normal' reason: 'ProvisioningStarted' Volume pv-aws-vy3qj is being created and provisioned for this claim.
E1107 19:08:10.541872 1 event.go:193] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"mysql.1484e9d7ab882fcf", GenerateName:"", Namespace:"dakinitestpvs", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"dakinitestpvs", Name:"mysql", UID:"9b972792-a546-11e6-8c72-0ebeb1070c7f", APIVersion:"v1", ResourceVersion:"322792491", FieldPath:""}, Reason:"ProvisioningStarted", Message:"Volume pv-aws-vy3qj is being created and provisioned for this claim.", Source:api.EventSource{Component:"online-volume-provisioner", Host:""}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63614160490, nsec:189434831, loc:(*time.Location)(0x279ed20)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63614160490, nsec:189434831, loc:(*time.Location)(0x279ed20)}}, Count:1, Type:"Normal"}': 'User "system:serviceaccount:openshift-infra:volume-provisioner" cannot create events in project "dakinitestpvs"' (will not retry!)

# at this point, about 5 minutes after the app was created, the placeholder-for-provisioning is still present in the PV
I1107 19:13:10.383860 1 event.go:211] Event(api.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"dakinitestpvs", Name:"mysql", UID:"9b972792-a546-11e6-8c72-0ebeb1070c7f", APIVersion:"v1", ResourceVersion:"322832521", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSuccessful' A volume has been provisioned for this claim.
E1107 19:13:10.394895 1 event.go:193] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"mysql.1484ea1d9081a9b2", GenerateName:"", Namespace:"dakinitestpvs", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"dakinitestpvs", Name:"mysql", UID:"9b972792-a546-11e6-8c72-0ebeb1070c7f", APIVersion:"v1", ResourceVersion:"322832521", FieldPath:""}, Reason:"ProvisioningSuccessful", Message:"A volume has been provisioned for this claim.", Source:api.EventSource{Component:"online-volume-provisioner", Host:""}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63614160790, nsec:383733170, loc:(*time.Location)(0x279ed20)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63614160790, nsec:383733170, loc:(*time.Location)(0x279ed20)}}, Count:1, Type:"Normal"}': 'User "system:serviceaccount:openshift-infra:volume-provisioner" cannot create events in project "dakinitestpvs"' (will not retry!)

# controller logs
[sedgar@use-tower1 ~]$ opssh -i -c preview "journalctl -lu atomic-openshift-master-controllers --since=today |grep dakinitestpvs" --v3 -t master
[1] 19:22:30 [FAILURE] 54.211.248.61 Exited with error code 1
[2] 19:22:30 [SUCCESS] 54.83.159.229
Nov 08 00:01:37 ip-172-31-10-25.ec2.internal atomic-openshift-master-controllers[57968]: I1108 00:01:37.076227 57968 vnids_master.go:119] Allocated netid 3370675 for namespace "dakinitestpvs"
Nov 08 00:02:35 ip-172-31-10-25.ec2.internal atomic-openshift-master-controllers[57968]: I1108 00:02:35.575129 57968 factory.go:212] Replication controller "dakinitestpvs/" has been deleted
Nov 08 00:03:10 ip-172-31-10-25.ec2.internal atomic-openshift-master-controllers[57968]: I1108 00:03:10.085651 57968 factory.go:212] Replication controller "dakinitestpvs/" has been deleted
[3] 19:22:30 [FAILURE] 54.173.224.22 Exited with error code 1

# related pods never start up
[root@preview-master-e69da ~]# oc get pods -n dakinitestpvs
NAME                            READY     STATUS    RESTARTS   AGE
cakephp-mysql-example-1-build   0/1       Error     0          20m
mysql-1-deploy                  0/1       Error     0          20m

# this could be related to some controller crashes
[sedgar@use-tower1 ~]$ opssh -i -c preview "journalctl -lu atomic-openshift-master-controllers --since=today |grep 'failed state'" --v3 -t master
[1] 19:40:59 [SUCCESS] 54.211.248.61
Nov 08 00:22:20 ip-172-31-10-23.ec2.internal systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
Nov 08 00:26:42 ip-172-31-10-23.ec2.internal systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
[...]
[sedgar@use-tower1 ~]$ opssh -i -c preview "journalctl -lu atomic-openshift-master-controllers --since 2016-11-07 --until 2016-11-08 |grep -c 'failed state'" --v3 -t master
[1] 19:47:02 [SUCCESS] 54.211.248.61
101
[2] 19:47:05 [SUCCESS] 54.173.224.22
112
[3] 19:47:08 [SUCCESS] 54.83.159.229
108
This is the raw data for the API DoS issue in Preview. It appears to be mostly caused by DescribeInstances calls. DescribeVolumes is also happening often enough to cause issues.

[dakini@samadhi ~]$ for error in $(grep errorCode ~/preview_cloudtrail_20161108.txt|sort -u|awk '{print $2}'|tr -d \":,) ; do echo "-------- $error ---------" ; grep errorCode ~/preview_cloudtrail_20161108.txt |grep -c $error ; done
-------- AccessPointNotFoundException ---------
1625
-------- Client.InvalidParameterValue ---------
21
-------- Client.InvalidVolume.NotFound ---------
1
-------- Client.RequestLimitExceeded ---------
269900
-------- Client.UnauthorizedOperation ---------
20
-------- Client.VolumeInUse ---------
102747
-------- Server.InternalError ---------
116
-------- Server.Unavailable ---------
9

[dakini@samadhi ~]$ for event in $(grep eventName ~/preview_cloudtrail_20161108.txt|sort -u|awk '{print $2}'|tr -d \":,) ; do echo "-------- $event ---------" ; grep eventName ~/preview_cloudtrail_20161108.txt |grep -c $event ; done
-------- AssumeRoleWithSAML ---------
1
-------- AttachVolume ---------
59691
-------- CheckMfa ---------
1
-------- ConsoleLogin ---------
1
-------- CreateSnapshot ---------
1374
-------- CreateTags ---------
2922
-------- CreateVolume ---------
133
-------- DeleteAccessKey ---------
1
-------- DeleteLoginProfile ---------
1
-------- DeleteSnapshot ---------
66
-------- DeleteUser ---------
1
-------- DeleteVolume ---------
46528
-------- DeregisterInstancesFromLoadBalancer ---------
5
-------- DescribeConfigurationRecorders ---------
3
-------- DescribeConfigurationRecorderStatus ---------
3
-------- DescribeInstanceHealth ---------
15
-------- DescribeInstances ---------
622988
-------- DescribeLoadBalancers ---------
1643
-------- DescribeReservedInstances ---------
1
-------- DescribeSnapshots ---------
23
-------- DescribeSubnets ---------
20
-------- DescribeTrails ---------
8
-------- DescribeVolumes ---------
211991
-------- DetachVolume ---------
207
-------- GetBucketLocation ---------
2
-------- GetBucketVersioning ---------
1
-------- GetUser ---------
1
-------- ListAccessKeys ---------
2
-------- ListBuckets ---------
4
-------- LookupEvents ---------
3
-------- RegisterInstancesWithLoadBalancer ---------
10
-------- RemoveUserFromGroup ---------
1

If we can find a way to reduce either of those calls, Preview should stabilize. But I'm not sure how to do that offhand. I'll have to look through more logs to see at what points openshift is calling DescribeInstances. I might be able to reduce the DescribeVolumes calls by finding and deleting any volumes that haven't been properly cleaned up by the cluster.
I wasn't able to work around the issue by detaching volumes. It turns out the affected volumes are attaching repeatedly and sometimes succeeding. So we need to figure out why they're attempting to attach so often, even when their previous attach attempts were successful.

This shows one volume where the attach succeeded 73 times today, and only failed 11 times. Why does it keep trying to attach? Maybe due to node evacuations?

[root@preview-master-e69da ~]# journalctl -lu atomic-openshift-master-controllers --since=today |grep vol-08030aec4f6f0e0e3 |grep 'Attach succeeded' -c
73
[root@preview-master-e69da ~]# journalctl -lu atomic-openshift-master-controllers --since=today |grep vol-08030aec4f6f0e0e3 |grep 'Attach failed' -c
11

The number of attach attempts seems to line up with the events listed in the project:

[root@preview-master-e69da ~]# oc get events -w -n swd-test-project
LASTSEEN               FIRSTSEEN              COUNT     NAME              KIND      SUBOBJECT   TYPE      REASON        SOURCE                                  MESSAGE
2016-11-10T00:22:01Z   2016-11-09T21:35:42Z   76        jenkins-1-pftp6   Pod                   Warning   FailedMount   {kubelet ip-172-31-2-89.ec2.internal}   Unable to mount volumes for pod "jenkins-1-pftp6_swd-test-project(2fa9aed3-a6c4-11e6-8286-0ebeb1070c7f)": timeout expired waiting for volumes to attach/mount for pod "jenkins-1-pftp6"/"swd-test-project". list of unattached/unmounted volumes=[jenkins-data]
2016-11-10T00:22:01Z   2016-11-09T21:35:42Z   76        jenkins-1-pftp6   Pod                   Warning   FailedSync    {kubelet ip-172-31-2-89.ec2.internal}   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "jenkins-1-pftp6"/"swd-test-project". list of unattached/unmounted volumes=[jenkins-data]

In the default project, I see a lot of node NotReady events and tons of RegisteredNode events, but no evacuations yet.

In general, etcd is struggling to keep up on this cluster. I noticed this around the same time we started having trouble with the PVs, though I'm not sure if it's related. We could try adding a couple more etcd nodes (or replace the storage on our current nodes) and see if it does a better job keeping up with all the events on the cluster. Here are some etcd errors I'm seeing often:

etcd[1675]: failed to send out heartbeat on time (deadline exceeded for 44.589112ms)
etcd[1675]: server is likely overloaded
etcd[1675]: etcdserver: request timed out, possibly due to previous leader failure [merged 17 repeated lines in 1s]
I ran this on dev-preview-int; the PVC is created and bound to a PV quickly, in 42 seconds.

Version-Release number for dev-preview-int:
openshift v3.4.0.21+ca4702d
kubernetes v1.4.0+776c994

1. [root@ymsun origin]# oc new-project dakinitestpvs
2. [root@ymsun origin]# oc new-app cakephp-mysql-example
--> Deploying template "openshift/cakephp-mysql-example" to project dakinitestpvs
.... <skips>
3. [root@ymsun origin]# oc get pvc -n dakinitestpvs -w
NAME      STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
mysql     Bound     pvc-e4bcaf0f-a733-11e6-9c35-0a30d6fc46dc   1Gi        RWO           42s
Just adding more numbers. Statistics for the entire day look like this:

Count: 86 Key: CreateVolume
Count: 2 Key: ListAliases
Count: 4 Key: LookupEvents
Count: 6 Key: DescribeServices
Count: 1019850 Key: DescribeInstances
Count: 4 Key: DescribeConfigurationRecorders
Count: 445 Key: DeleteSnapshot
Count: 4 Key: DescribeConfigurationRecorderStatus
Count: 9 Key: DescribeTrails
Count: 6 Key: DescribeCases
Count: 6 Key: DescribeAlarms
Count: 1 Key: DescribeReservedInstances
Count: 2401 Key: DescribeLoadBalancers
Count: 1 Key: CreateCase
Count: 1 Key: CreateNetworkInterface
Count: 4 Key: DescribeSeverityLevels
Count: 1 Key: ConsoleLogin
Count: 41 Key: DetachVolume
Count: 211972 Key: AttachVolume
Count: 25 Key: DescribeSnapshots
Count: 2 Key: ListBuckets
Count: 10703 Key: DeleteVolume
Count: 2101 Key: CreateTags
Count: 246051 Key: DescribeVolumes
Count: 3 Key: DeleteNetworkInterface
Count: 1 Key: CheckMfa
Count: 1 Key: DescribeAccountLimits
Count: 997 Key: CreateSnapshot

That is for around 20 hours of logging in CloudTrail, which works out to 20.7601 API calls per second (roughly 1.49 million calls over about 72,000 seconds). This does not include the API calls that were rejected because of rate limiting.
AWS support says we should implement an exponential backoff if we don't have one already:

http://docs.aws.amazon.com/general/latest/gr/api-retries.html

We should look at the controller logic and see if something like this can be implemented for our PV attaches.
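For illustration, here is a minimal Go sketch of the retry-with-exponential-backoff-and-jitter pattern that the AWS document describes. attachVolume, isThrottled, the IDs, and the delay values are all placeholders, not the actual controller or provisioner code:

// Hypothetical sketch of exponential backoff with jitter around a throttled AWS call.
// attachVolume stands in for the real ec2.AttachVolume call; isThrottled stands in
// for the Client.RequestLimitExceeded error check.
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

var errThrottled = errors.New("Client.RequestLimitExceeded")

func attachVolume(volumeID, instanceID string) error {
	// Placeholder for the real AWS SDK call.
	return errThrottled
}

func isThrottled(err error) bool {
	return errors.Is(err, errThrottled)
}

func attachWithBackoff(volumeID, instanceID string, maxRetries int) error {
	delay := 500 * time.Millisecond
	for attempt := 0; ; attempt++ {
		err := attachVolume(volumeID, instanceID)
		if err == nil || !isThrottled(err) || attempt == maxRetries {
			return err
		}
		// Sleep for the current delay plus random jitter, then double the delay,
		// capping it so a long outage does not produce multi-minute sleeps.
		jitter := time.Duration(rand.Int63n(int64(delay)))
		time.Sleep(delay + jitter)
		if delay < 30*time.Second {
			delay *= 2
		}
	}
}

func main() {
	if err := attachWithBackoff("vol-01472e33d2fc71a03", "i-0123456789abcdef0", 5); err != nil {
		fmt.Println("attach failed:", err)
	}
}

The key property is that a burst of throttled callers spreads itself out over time instead of hammering the API in lockstep, which is exactly the failure mode the RequestLimitExceeded counts above suggest.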
*** Bug 1394428 has been marked as a duplicate of this bug. ***
There are a couple of things here.

1) OpenShift Online uses its own provisioner and not the OpenShift/Kubernetes one. Mark Turansky should know more about whether it has exponential backoff or not.

2) Looking at the API call numbers, DescribeVolumes and AttachVolume are called a lot. AttachVolume attaches a volume to a node, and DescribeVolumes checks whether the volume is attached there. Maybe we could tune the timeouts here to ask less often; right now we ask AWS every 10 seconds. Or is there an event that would tell Kubernetes that a volume got attached?
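To make the second point concrete, here is a rough sketch of an attach-status poll with the interval pulled out as a tunable value. describeVolumeAttached is a stand-in for the real DescribeVolumes call, and the 10s/30s figures are only examples based on the interval mentioned above, not the actual controller configuration:

// Hypothetical sketch: poll attachment state at a configurable interval
// instead of a hard-coded 10 seconds. Raising the interval directly reduces
// the number of DescribeVolumes calls made per attach operation.
package main

import (
	"errors"
	"fmt"
	"time"
)

// describeVolumeAttached is a placeholder for an ec2.DescribeVolumes call that
// checks whether volumeID is attached to instanceID.
func describeVolumeAttached(volumeID, instanceID string) (bool, error) {
	return false, nil
}

// waitForAttach polls until the volume reports attached or the timeout expires.
func waitForAttach(volumeID, instanceID string, pollInterval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		attached, err := describeVolumeAttached(volumeID, instanceID)
		if err != nil {
			return err
		}
		if attached {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for volume to attach")
		}
		time.Sleep(pollInterval)
	}
}

func main() {
	// Polling every 30s instead of every 10s means roughly a third of the
	// DescribeVolumes traffic for the same attach, at the cost of slower detection.
	err := waitForAttach("vol-08030aec4f6f0e0e3", "i-0123456789abcdef0", 30*time.Second, 5*time.Minute)
	fmt.Println(err)
}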
Today I did some more testing and showed some new apps to Mark, who determined that we have a race condition between the storage controller and the PV provisioning pod. My understanding is that the controller moves the PVC into the Bound state before it's ready, which updates the object's resourceVersion, which in turn prevents the provisioning pod from provisioning the volume because it's always one resourceVersion behind. The controller is also trying to attach the PV before it has a valid AWS VolumeID, so it's trying to mount a volume that doesn't exist yet, which also updates the resourceVersion and keeps the race going. Eventually the provisioning pod is able to provision the volume, but it can take 10-20+ minutes before it finally wins the race against the controller.

The race condition is causing extra AWS API calls and contributing to our AWS API throttling, which in turn contributes to the instability of the controller (we're still seeing frequent controller restarts). So next we either need to come up with a potential bug fix for the race condition, or get 3.4 out to prod so we can use the built-in provisioning instead.
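For reference, the usual way to cope with the "object has been modified" conflicts seen in the provisioner logs is to re-read the object before every update attempt instead of retrying with a stale copy. Below is a minimal sketch using a recent client-go (retry.RetryOnConflict); the 3.3-era Online provisioner is structured differently, and annotateClaim is a hypothetical helper, not its actual code:

// Hypothetical helper: update a PVC annotation, re-fetching the claim on each
// attempt so a concurrent update by the PV binder controller does not leave us
// permanently one resourceVersion behind.
package provisionerfix

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

func annotateClaim(clientset kubernetes.Interface, namespace, name, key, value string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Always start from the latest resourceVersion.
		claim, err := clientset.CoreV1().PersistentVolumeClaims(namespace).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if claim.Annotations == nil {
			claim.Annotations = map[string]string{}
		}
		claim.Annotations[key] = value
		// On a 409 Conflict, RetryOnConflict re-runs this whole function,
		// so the claim is re-read rather than resubmitted unchanged.
		_, err = clientset.CoreV1().PersistentVolumeClaims(namespace).Update(context.TODO(), claim, metav1.UpdateOptions{})
		return err
	})
}

This does not remove the race between the controller and the provisioning pod, but it keeps each conflict to a single cheap re-read instead of a long loop of failed updates and extra AWS calls.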
*** Bug 1392189 has been marked as a duplicate of this bug. ***
Testing on dev-preview-stg, the PVC is bound to a PV quickly.

Version-Release number for dev-preview-stg:
openshift v3.3.1.4
kubernetes v1.3.0+52492b4

[root@ymsun default-stg]# oc get pvc
NAME      STATUS    VOLUME         CAPACITY   ACCESSMODES   AGE
mysql     Bound     pv-aws-ry30z   1Gi        RWO           8s
Testing on Dev_preview_Prod, the PVC is bound to a PV quickly.

Version-Release number for Dev_preview_Prod:
openshift v3.3.1.3
kubernetes v1.3.0+52492b4

[root@ymsun online-test]# oc get pvc
NAME      STATUS    VOLUME                                             CAPACITY   ACCESSMODES   AGE
mysql     Bound     pv-1-preview-master-afbb8-vol-04750624d98803d7c   1Gi        RWO           9s