On AWS, OpenShift generates device names that are invalid and are rejected by the AWS API.

Version-Release number of selected component (if applicable):
3.4.1.x

The following is the original bug report, copied from Stefanie's message in https://bugzilla.redhat.com/show_bug.cgi?id=1404811#c39:

We have installed 3.4.1.2 in Preview prod, but we're still seeing the same behavior as before.

To reproduce:
1. Create a PV-backed app. In this case, we were installing metrics, and the cassandra pod is backed by a PV.
2. Scale the app down to 0 replicas.
3. Scale back up to 1 replica.
4. Watch the EBS volume get stuck in 'attaching' state in the AWS web console. At this point, we get an error like this:
   timeout expired waiting for volumes to attach/mount for pod "hawkular-cassandra-2-dcq1r"/"openshift-infra"
5. Force-detach the volume in the web console and repeat steps 2, 3 and 4. The app never succeeds in scaling up.

Level4 node logs during the scale up.
http://paste-platops.itos.redhat.com/p03x0rmms/wbv2qn/raw

Container is creating. Node says the PV is "in use" but not "attached".
http://paste-platops.itos.redhat.com/pcuvksblp/kxhdbk/raw

Events log showing the timeout. And 'oc describe pv' for the affected PV.
http://paste-platops.itos.redhat.com/pmpbmpon6/uqauex/raw

AWS shows the volume in 'available' state after repeatedly scaling up and down. Sometimes it gets stuck in 'attaching' state, but mostly only after deleting the PV and app and recreating it all from scratch.

In the controller logs, I'm now seeing messages like this for various other volumes:

Feb 14 22:58:01 ip-172-31-10-24.ec2.internal atomic-openshift-master-controllers[64870]: E0214 22:58:01.864781 64870 attacher.go:72] Error attaching volume "aws://us-east-1c/vol-0fbcf15804f98f8e9": Error attaching EBS volume: InvalidParameterValue: Value (/dev/xvdfh) for parameter device is invalid. /dev/xvdfh is not a valid EBS device name.
Feb 14 20:36:49 ip-172-31-10-24.ec2.internal atomic-openshift-master-controllers[64870]: E0214 20:36:49.972282 64870 attacher.go:72] Error attaching volume "aws://us-east-1c/vol-0ffea06983cd98900": Error attaching EBS volume: InvalidParameterValue: Value (/dev/xvddz) for parameter device is invalid. /dev/xvddz is not a valid EBS device name.

More controller logs are here:
http://paste-platops.itos.redhat.com/pjqz8puvn/fdnxt4/raw
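
For illustration only, here is a minimal Python sketch of the kind of check that would have caught the two rejected names before the attach call. It assumes that the accepted names are limited to /dev/xvd[b-c][a-z]. That range is an assumption, not something stated in this report, although it is consistent with /dev/xvdfh and /dev/xvddz being rejected. The helper names (is_valid_ebs_device, candidate_devices, next_free_device) are hypothetical and are not taken from the OpenShift or Kubernetes code.

```python
import re
import string

# Assumption: only /dev/xvd[b-c][a-z] is accepted for EBS attachments.
# This matches the rejections seen in the controller logs
# (/dev/xvdfh and /dev/xvddz), but it is not confirmed by this report.
VALID_DEVICE = re.compile(r"/dev/xvd[bc][a-z]")


def is_valid_ebs_device(name):
    """Return True if the name falls inside the assumed valid range."""
    return VALID_DEVICE.fullmatch(name) is not None


def candidate_devices():
    """Yield every name in the assumed valid range, in order."""
    for first in "bc":
        for second in string.ascii_lowercase:
            yield "/dev/xvd" + first + second


def next_free_device(in_use):
    """Return the first candidate not already in use, or None if exhausted."""
    for device in candidate_devices():
        if device not in in_use:
            return device
    return None
```

With this sketch, is_valid_ebs_device("/dev/xvdfh") is False, and next_free_device only ever returns names from the assumed range, so an allocator built this way could not hand out a name such as /dev/xvddz.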
The upstream PR is https://github.com/kubernetes/kubernetes/pull/41455. I am waiting for it to be merged before I start cherry-picking it.
*** Bug 1422457 has been marked as a duplicate of this bug. ***
The fix for 3.4 has been included in v3.4.1.8. I will hold off on putting this BZ in QA since the fix for 3.3 isn't merged yet.