Description of problem:

Issue reported by the customer in an OpenShift Dedicated cluster. Deployments are failing randomly in this cluster; the message displayed by the deploy job is:

--> Scaling syndesis-prometheus-1 to 1
error: update acceptor rejected syndesis-prometheus-1: acceptAvailablePods failed watching for ReplicationController proj442639/syndesis-prometheus-1: received event ERROR

It manifests in different DeploymentConfigs at random (including customer applications). If you retry by deleting the ReplicationController, it gets recreated fine and the rollout works.

Version-Release number of selected component (if applicable):

OpenShift/Kubernetes version:
openshift v3.11.43
kubernetes v1.11.0+d4cacc0

master-api and master-controller pod images: ose-control-plane:v3.11.43

How reproducible:

The issue is random, but appears in different projects across the cluster.

Steps to Reproduce:
1. Create an application deployment.

Sample DeploymentConfig yaml:

apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2019-07-15T18:08:35Z
  generation: 3
  labels:
    app: syndesis
    syndesis.io/app: syndesis
    syndesis.io/component: syndesis-oauthproxy
    syndesis.io/type: infrastructure
  name: syndesis-oauthproxy
  namespace: proj435427
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    app: syndesis
    syndesis.io/app: syndesis
    syndesis.io/component: syndesis-oauthproxy
  strategy:
    activeDeadlineSeconds: 21600
    recreateParams:
      timeoutSeconds: 600
    resources:
      limits:
        memory: 256Mi
      requests:
        memory: 20Mi
    type: Recreate
  template:
    metadata:
      annotations:
        openshift.io/generated-by: OpenShiftNewApp
      creationTimestamp: null
      labels:
        app: syndesis
        syndesis.io/app: syndesis
        syndesis.io/component: syndesis-oauthproxy
        syndesis.io/type: infrastructure
    spec:
      containers:
      - args:
        - --provider=openshift
        - --client-id=system:serviceaccount:proj435427:syndesis-oauth-client
        - --client-secret=<REDACTED>
        - --upstream=http://syndesis-server/api/
        - --upstream=http://syndesis-server/mapper/
        - --upstream=http://syndesis-ui/
        - --tls-cert=/etc/tls/private/tls.crt
        - --tls-key=/etc/tls/private/tls.key
        - --cookie-secret=$(OAUTH_COOKIE_SECRET)
        - --pass-access-token
        - --skip-provider-button
        - --skip-auth-regex=/logout
        - --skip-auth-regex=/[^/]+\.(png|jpg|eot|svg|ttf|woff|woff2)
        - --skip-auth-regex=/api/v1/swagger.*
        - --skip-auth-regex=/api/v1/index.html
        - --skip-auth-regex=/api/v1/credentials/callback
        - --skip-auth-regex=/api/v1/version
        - --skip-auth-preflight
        - --openshift-ca=/etc/pki/tls/certs/ca-bundle.crt
        - --openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        - --openshift-sar={"namespace":"proj435427","resource":"pods","verb":"get"}
        env:
        - name: OAUTH_COOKIE_SECRET
          valueFrom:
            secretKeyRef:
              key: <REDACTED>
              name: <REDACTED>
        image: docker-registry.default.svc:5000/<REDACTED>/oauth-proxy@sha256:<SHA>
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /oauth/healthz
            port: 8443
            scheme: HTTPS
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
        name: syndesis-oauthproxy
        ports:
        - containerPort: 8443
          name: public
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /oauth/healthz
            port: 8443
            scheme: HTTPS
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
        resources:
          limits:
            memory: 200Mi
          requests:
            memory: 20Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/tls/private
          name: syndesis-oauthproxy-tls
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: syndesis-oauth-client
      serviceAccountName: syndesis-oauth-client
      terminationGracePeriodSeconds: 30
      volumes:
      - name: syndesis-oauthproxy-tls
        secret:
          defaultMode: 420
          secretName: syndesis-oauthproxy-tls
  test: false
  triggers:
  - imageChangeParams:
      automatic: true
      containerNames:
      - syndesis-oauthproxy
      from:
        kind: ImageStreamTag
        name: oauth-proxy:v1.1.0
        namespace: <REDACTED>
      lastTriggeredImage: docker-registry.default.svc:5000/<REDACTED>/oauth-proxy@sha256:<SHA>
    type: ImageChange
  - type: ConfigChange

Actual results:

1) The deploy pod fails with the log below after a couple of seconds (it is not a timeout waiting for the pod to come up):

--> Scaling syndesis-oauthproxy-2 to 1
error: update acceptor rejected syndesis-oauthproxy-2: acceptAvailablePods failed watching for ReplicationController proj435427/syndesis-oauthproxy-2: received event ERROR

2) The DeploymentConfig object receives the following status:

status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2019-08-07T09:05:48Z
    lastUpdateTime: 2019-08-07T09:05:48Z
    message: replication controller "syndesis-oauthproxy-2" has failed progressing
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-08-07T09:06:17Z
    lastUpdateTime: 2019-08-07T09:06:17Z
    message: Deployment config has minimum availability.
    status: "True"
    type: Available
  details:
    causes:
    - imageTrigger:
        from:
          kind: DockerImage
          name: docker-registry.default.svc:5000/<REDACTED>/oauth-proxy@sha256:<SHA>
      type: ImageChange
    message: image change
  latestVersion: 2
  observedGeneration: 3
  readyReplicas: 1
  replicas: 1
  unavailableReplicas: 0
  updatedReplicas: 0

Expected results:

The deploy job should provision the pod as expected.

Additional info:
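For context, the "received event ERROR" wording suggests the deployer's availability acceptor gives up as soon as the watch it opened on the new ReplicationController delivers an ERROR event, which would explain why the deploy pod fails after a couple of seconds rather than at the 600s strategy timeout. Below is a minimal sketch of that watch pattern using client-go from the 3.11/1.11 era (pre-context Watch signature); the function name watchRCUntilAvailable and the availability check are assumptions for illustration, not the actual deployer code:

// Hypothetical sketch of an availability check that aborts as soon as the
// watch on a ReplicationController returns an ERROR event. Illustrative only;
// this is not the OpenShift deployer source.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

// watchRCUntilAvailable (hypothetical name) watches a single RC and returns
// an error immediately when the watch channel delivers an ERROR event.
func watchRCUntilAvailable(client kubernetes.Interface, namespace, name string) error {
	w, err := client.CoreV1().ReplicationControllers(namespace).Watch(metav1.ListOptions{
		FieldSelector: "metadata.name=" + name,
	})
	if err != nil {
		return fmt.Errorf("failed watching for ReplicationController %s/%s: %v", namespace, name, err)
	}
	defer w.Stop()

	for event := range w.ResultChan() {
		switch event.Type {
		case watch.Error:
			// The underlying cause is carried in event.Object but is dropped
			// here, mirroring the opaque message seen in the deploy pod log.
			return fmt.Errorf("received event ERROR")
		case watch.Added, watch.Modified:
			rc := event.Object.(*corev1.ReplicationController)
			if rc.Spec.Replicas != nil && rc.Status.AvailableReplicas >= *rc.Spec.Replicas {
				return nil // the new pods became available
			}
		}
	}
	return fmt.Errorf("watch closed before %s/%s became available", namespace, name)
}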
Yeah, we should log the actual error in the deployers, but it should also be logged in the apiserver logs - can you please collect those logs and attach them to this BZ?
Just to close the loop on my previous comment - I opened https://bugzilla.redhat.com/show_bug.cgi?id=1741103 and a fix to show the actual error.
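For anyone following along, a hedged sketch of what "show the actual error" could look like (illustrative only; describeWatchError is a made-up helper, and this is not the actual patch from bug 1741103): a watch ERROR event usually carries a metav1.Status whose Reason and Message identify the real cause (for example an expired resourceVersion), which the bare "received event ERROR" text currently drops.

// Hypothetical illustration of surfacing the underlying error carried by a
// watch ERROR event instead of discarding it. Sketch under assumptions.
package main

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
)

// describeWatchError (hypothetical helper) turns a watch ERROR event into a
// readable error that includes the apiserver's reason and message.
func describeWatchError(event watch.Event) error {
	if status, ok := event.Object.(*metav1.Status); ok {
		// Typical reasons: "Expired" (too old resource version), "InternalError", etc.
		return fmt.Errorf("received event ERROR: %s (reason=%s, code=%d)",
			status.Message, status.Reason, status.Code)
	}
	// Fall back to the generic conversion provided by apimachinery.
	return fmt.Errorf("received event ERROR: %v", apierrors.FromObject(event.Object))
}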
This bug hasn't had any engineering activity in the last ~30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale". If you have further information on the current state of the bug, please update it and remove the "LifecycleStale" keyword, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.
This bug hasn't had any activity in the 7 days since it was marked as LifecycleStale, so we are closing it as WONTFIX. If you consider this bug still valuable, please reopen it or create a new bug.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days