Description of problem:

Issue reported by the customer in an OpenShift Dedicated cluster. Deployments are failing randomly in this cluster; the message displayed by the deploy job is:

--> Scaling syndesis-prometheus-1 to 1
error: update acceptor rejected syndesis-prometheus-1: acceptAvailablePods failed watching for ReplicationController proj442639/syndesis-prometheus-1: received event ERROR

It manifests in different DeploymentConfigs at random (including customer applications). If you retry by deleting the ReplicationController, it gets recreated fine and the rollout works.

Version-Release number of selected component (if applicable):

OpenShift/Kubernetes version:
openshift v3.11.43
kubernetes v1.11.0+d4cacc0

master-api and master-controller pod images: ose-control-plane:v3.11.43

How reproducible:

The issue is random, but appears in different projects across the cluster.

Steps to Reproduce:
1. Create an application deployment.

Sample DeploymentConfig yaml:

apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2019-07-15T18:08:35Z
  generation: 3
  labels:
    app: syndesis
    syndesis.io/app: syndesis
    syndesis.io/component: syndesis-oauthproxy
    syndesis.io/type: infrastructure
  name: syndesis-oauthproxy
  namespace: proj435427
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    app: syndesis
    syndesis.io/app: syndesis
    syndesis.io/component: syndesis-oauthproxy
  strategy:
    activeDeadlineSeconds: 21600
    recreateParams:
      timeoutSeconds: 600
    resources:
      limits:
        memory: 256Mi
      requests:
        memory: 20Mi
    type: Recreate
  template:
    metadata:
      annotations:
        openshift.io/generated-by: OpenShiftNewApp
      creationTimestamp: null
      labels:
        app: syndesis
        syndesis.io/app: syndesis
        syndesis.io/component: syndesis-oauthproxy
        syndesis.io/type: infrastructure
    spec:
      containers:
      - args:
        - --provider=openshift
        - --client-id=system:serviceaccount:proj435427:syndesis-oauth-client
        - --client-secret=<REDACTED>
        - --upstream=http://syndesis-server/api/
        - --upstream=http://syndesis-server/mapper/
        - --upstream=http://syndesis-ui/
        - --tls-cert=/etc/tls/private/tls.crt
        - --tls-key=/etc/tls/private/tls.key
        - --cookie-secret=$(OAUTH_COOKIE_SECRET)
        - --pass-access-token
        - --skip-provider-button
        - --skip-auth-regex=/logout
        - --skip-auth-regex=/[^/]+\.(png|jpg|eot|svg|ttf|woff|woff2)
        - --skip-auth-regex=/api/v1/swagger.*
        - --skip-auth-regex=/api/v1/index.html
        - --skip-auth-regex=/api/v1/credentials/callback
        - --skip-auth-regex=/api/v1/version
        - --skip-auth-preflight
        - --openshift-ca=/etc/pki/tls/certs/ca-bundle.crt
        - --openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        - --openshift-sar={"namespace":"proj435427","resource":"pods","verb":"get"}
        env:
        - name: OAUTH_COOKIE_SECRET
          valueFrom:
            secretKeyRef:
              key: <REDACTED>
              name: <REDACTED>
        image: docker-registry.default.svc:5000/<REDACTED>/oauth-proxy@sha256:<SHA>
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /oauth/healthz
            port: 8443
            scheme: HTTPS
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
        name: syndesis-oauthproxy
        ports:
        - containerPort: 8443
          name: public
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /oauth/healthz
            port: 8443
            scheme: HTTPS
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
        resources:
          limits:
            memory: 200Mi
          requests:
            memory: 20Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/tls/private
          name: syndesis-oauthproxy-tls
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: syndesis-oauth-client
      serviceAccountName: syndesis-oauth-client
      terminationGracePeriodSeconds: 30
      volumes:
      - name: syndesis-oauthproxy-tls
        secret:
          defaultMode: 420
          secretName: syndesis-oauthproxy-tls
  test: false
  triggers:
  - imageChangeParams:
      automatic: true
      containerNames:
      - syndesis-oauthproxy
      from:
        kind: ImageStreamTag
        name: oauth-proxy:v1.1.0
        namespace: <REDACTED>
      lastTriggeredImage: docker-registry.default.svc:5000/<REDACTED>/oauth-proxy@sha256:<SHA>
    type: ImageChange
  - type: ConfigChange

Actual results:

1) The deploy pod fails with the log below after a couple of seconds (it is not a timeout waiting for the pod to come up):

--> Scaling syndesis-oauthproxy-2 to 1
error: update acceptor rejected syndesis-oauthproxy-2: acceptAvailablePods failed watching for ReplicationController proj435427/syndesis-oauthproxy-2: received event ERROR

2) The DeploymentConfig object receives the following status:

status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2019-08-07T09:05:48Z
    lastUpdateTime: 2019-08-07T09:05:48Z
    message: replication controller "syndesis-oauthproxy-2" has failed progressing
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-08-07T09:06:17Z
    lastUpdateTime: 2019-08-07T09:06:17Z
    message: Deployment config has minimum availability.
    status: "True"
    type: Available
  details:
    causes:
    - imageTrigger:
        from:
          kind: DockerImage
          name: docker-registry.default.svc:5000/<REDACTED>/oauth-proxy@sha256:<SHA>
      type: ImageChange
    message: image change
  latestVersion: 2
  observedGeneration: 3
  readyReplicas: 1
  replicas: 1
  unavailableReplicas: 0
  updatedReplicas: 0

Expected results:

The deploy job should provision the pod as expected.

Additional info:
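For context, the "received event ERROR" wording suggests the deployer's availability acceptor gives up as soon as the watch it opened on the new ReplicationController delivers an ERROR event, which would explain why the deploy pod fails after a couple of seconds rather than at the 600s strategy timeout. Below is a minimal sketch of that watch pattern using client-go from the 3.11/1.11 era (pre-context Watch signature); the function name watchRCUntilAvailable and the availability check are assumptions for illustration, not the actual deployer code:

// Hypothetical sketch of an availability check that aborts as soon as the
// watch on a ReplicationController returns an ERROR event. Illustrative only;
// this is not the OpenShift deployer source.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

// watchRCUntilAvailable (hypothetical name) watches a single RC and returns
// an error immediately when the watch channel delivers an ERROR event.
func watchRCUntilAvailable(client kubernetes.Interface, namespace, name string) error {
	w, err := client.CoreV1().ReplicationControllers(namespace).Watch(metav1.ListOptions{
		FieldSelector: "metadata.name=" + name,
	})
	if err != nil {
		return fmt.Errorf("failed watching for ReplicationController %s/%s: %v", namespace, name, err)
	}
	defer w.Stop()

	for event := range w.ResultChan() {
		switch event.Type {
		case watch.Error:
			// The underlying cause is carried in event.Object but is dropped
			// here, mirroring the opaque message seen in the deploy pod log.
			return fmt.Errorf("received event ERROR")
		case watch.Added, watch.Modified:
			rc := event.Object.(*corev1.ReplicationController)
			if rc.Spec.Replicas != nil && rc.Status.AvailableReplicas >= *rc.Spec.Replicas {
				return nil // the new pods became available
			}
		}
	}
	return fmt.Errorf("watch closed before %s/%s became available", namespace, name)
}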
Yeah, we should log the actual error in the deployers, but it should also be logged in the apiserver logs - can you please collect those logs and attach them to this BZ?
Just to close the loop on my previous comment - I opened https://bugzilla.redhat.com/show_bug.cgi?id=1741103 and a fix to show the actual error.
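For anyone following along, a hedged sketch of what "show the actual error" could look like (illustrative only; describeWatchError is a made-up helper, and this is not the actual patch from bug 1741103): a watch ERROR event usually carries a metav1.Status whose Reason and Message identify the real cause (for example an expired resourceVersion), which the bare "received event ERROR" text currently drops.

// Hypothetical illustration of surfacing the underlying error carried by a
// watch ERROR event instead of discarding it. Sketch under assumptions.
package main

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
)

// describeWatchError (hypothetical helper) turns a watch ERROR event into a
// readable error that includes the apiserver's reason and message.
func describeWatchError(event watch.Event) error {
	if status, ok := event.Object.(*metav1.Status); ok {
		// Typical reasons: "Expired" (too old resource version), "InternalError", etc.
		return fmt.Errorf("received event ERROR: %s (reason=%s, code=%d)",
			status.Message, status.Reason, status.Code)
	}
	// Fall back to the generic conversion provided by apimachinery.
	return fmt.Errorf("received event ERROR: %v", apierrors.FromObject(event.Object))
}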
This bug hasn't had any engineering activity in the last ~30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale". If you have further information on the current state of the bug, please update it and remove the "LifecycleStale" keyword, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.
This bug hasn't had any activity in the 7 days since it was marked as LifecycleStale, so we are closing it as WONTFIX. If you consider this bug still valuable, please reopen it or create a new bug.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days