Bug 1476433
| Field | Value |
|---|---|
| Summary | [free-stg] repeated invocations of oadm drain unable to drain node |
| Product | OpenShift Container Platform |
| Component | Containers |
| Version | 3.6.0 |
| Status | CLOSED DUPLICATE |
| Severity | urgent |
| Priority | unspecified |
| Reporter | Justin Pierce <jupierce> |
| Assignee | Antonio Murdaca <amurdaca> |
| QA Contact | DeShuai Ma <dma> |
| CC | aos-bugs, eparis, jhonce, jligon, jokerman, jupierce, mmccomas, mwoodson, sdodson, sjenning |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |
| Last Closed | 2017-08-01 15:19:34 UTC |
| Bug Depends On | 1460729 |
| Attachments | Backtrace in readable form (attachment 1307158) |
Description (Justin Pierce, 2017-07-29 01:55:16 UTC)
Similar to https://bugzilla.redhat.com/show_bug.cgi?id=1473777, but killing the process and retrying does not clear the issue. Description of the pods still associated with the node:
[root@free-stg-master-03fb6 ~]# oc get pods --all-namespaces | grep -e oso-clamd-57h3b -e mongodb-8-jj7xp -e database-2-5fcz6
management-infra oso-clamd-57h3b 1/2 Error 1153 32d
xyz mongodb-8-jj7xp 0/1 Terminating 0 8d
yasun-1-1 database-2-5fcz6 1/1 Terminating 0 8d
[root@free-stg-master-03fb6 ~]# oc describe -n management-infra oso-clamd-57h3b
the server doesn't have a resource type "oso-clamd-57h3b"
[root@free-stg-master-03fb6 ~]# oc describe -n management-infra pod oso-clamd-57h3b
Name: oso-clamd-57h3b
Namespace: management-infra
Security Policy: privileged
Node: ip-172-31-75-193.us-east-2.compute.internal/172.31.75.193
Start Time: Mon, 26 Jun 2017 15:36:50 +0000
Labels: name=oso-clamd
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"management-infra","name":"oso-clamd","uid":"1e9f6f94-57b4-11e7-a29c-0203ad7dfcd7",...
openshift.io/scc=privileged
Status: Running
IP: 10.128.5.52
Controllers: DaemonSet/oso-clamd
Containers:
oso-clamd:
Container ID: docker://dce4c8ac7c293079496c939794019b760df86246e9f3e015970e2a4e5190ac72
Image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd:latest
Image ID: docker-pullable://172.30.44.192:5000/management-infra/oso-rhel7-clamd@sha256:90ba009c48d28494aac15240f41d311a5384db3e97649f5eeffc1cefb055b9e6
Port:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 29 Jul 2017 01:59:17 +0000
Finished: Sat, 29 Jul 2017 01:59:26 +0000
Ready: False
Restart Count: 1141
Environment:
OO_PAUSE_ON_START: false
Mounts:
/var/lib/clamav from clamsigs (ro)
/var/run/clamd.scan from clamsock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from clamd-deployer-token-dwzg1 (ro)
oso-clamd-update:
Container ID: docker://631352cbb39726531fab65c7b5621a513d570e47035a4a0ba8e049f8d730b713
Image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd-update:latest
Image ID: docker-pullable://172.30.44.192:5000/management-infra/oso-rhel7-clamd-update@sha256:8bf75d87ffe3b31cd9c591590d17bd6a8d29855a5e796a1cb1a8c374fdad2ea9
Port:
State: Running
Started: Mon, 24 Jul 2017 21:35:45 +0000
Last State: Terminated
Reason: Error
Message: Error on reading termination log /var/lib/origin/openshift.local.volumes/pods/456773b5-5a85-11e7-ba4c-02306c0cdc4b/containers/oso-clamd-update/d05df9c2: open /var/lib/origin/openshift.local.volumes/pods/456773b5-5a85-11e7-ba4c-02306c0cdc4b/containers/oso-clamd-update/d05df9c2: no such file or directory
Exit Code: 1
Started: Wed, 19 Jul 2017 23:14:02 +0000
Finished: Mon, 24 Jul 2017 20:22:48 +0000
Ready: True
Restart Count: 12
Environment:
OO_PAUSE_ON_START: false
Mounts:
/token from token (rw)
/usr/bin/oc from usr-bin-oc (rw)
/var/lib/clamav from clamsigs (rw)
/var/run/clamd.scan from clamsock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from clamd-deployer-token-dwzg1 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
clamsock:
Type: HostPath (bare host directory volume)
Path: /var/run/clamd.scan
clamsigs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
token:
Type: Secret (a volume populated by a Secret)
SecretName: clamd-image-watcher-token
Optional: false
usr-bin-oc:
Type: HostPath (bare host directory volume)
Path: /usr/bin/oc
clamd-deployer-token-dwzg1:
Type: Secret (a volume populated by a Secret)
SecretName: clamd-deployer-token-dwzg1
Optional: false
QoS Class: BestEffort
Node-Selectors: type=compute
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
3d 1h 19107 kubelet, ip-172-31-75-193.us-east-2.compute.internal spec.containers{oso-clamd} Warning BackOff Back-off restarting failed container
3d 37m 847 kubelet, ip-172-31-75-193.us-east-2.compute.internal spec.containers{oso-clamd} Normal Pulling pulling image "172.30.44.192:5000/management-infra/oso-rhel7-clamd:latest"
3d 2m 19475 kubelet, ip-172-31-75-193.us-east-2.compute.internal Warning FailedSync Error syncing pod
[root@free-stg-master-03fb6 ~]# oc describe pod mongodb-8-jj7xp -n xyz
Name: mongodb-8-jj7xp
Namespace: xyz
Security Policy: restricted
Node: ip-172-31-75-193.us-east-2.compute.internal/172.31.75.193
Start Time: Thu, 20 Jul 2017 19:23:48 +0000
Labels: deployment=mongodb-8
deploymentconfig=mongodb
name=mongodb
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"xyz","name":"mongodb-8","uid":"3efa6e37-6cd6-11e7-bb9e-02306c0cdc4b","...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container mongodb; cpu limit for container mongodb
openshift.io/deployment-config.latest-version=8
openshift.io/deployment-config.name=mongodb
openshift.io/deployment.name=mongodb-8
openshift.io/scc=restricted
Status: Terminating (expires Fri, 28 Jul 2017 21:24:55 +0000)
Termination Grace Period: 30s
IP:
Controllers: ReplicationController/mongodb-8
Containers:
mongodb:
Container ID: docker://ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d
Image: registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece
Image ID: docker-pullable://registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece
Port: 27017/TCP
State: Running
Started: Mon, 24 Jul 2017 21:02:39 +0000
Ready: False
Restart Count: 0
Limits:
cpu: 1
memory: 512Mi
Requests:
cpu: 60m
memory: 307Mi
Liveness: tcp-socket :27017 delay=30s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [/bin/sh -i -c mongo 127.0.0.1:27017/$MONGODB_DATABASE -u $MONGODB_USER -p $MONGODB_PASSWORD --eval="quit()"] delay=3s timeout=1s period=10s #success=1 #failure=3
Environment:
MONGODB_USER: <set to the key 'database-user' in secret 'nodejs-mongo-persistent'> Optional: false
MONGODB_PASSWORD: <set to the key 'database-password' in secret 'nodejs-mongo-persistent'> Optional: false
MONGODB_DATABASE: sampledb
MONGODB_ADMIN_PASSWORD: <set to the key 'database-admin-password' in secret 'nodejs-mongo-persistent'> Optional: false
Mounts:
/var/lib/mongodb/data from mongodb-data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-rbzg6 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
mongodb-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mongodb
ReadOnly: false
default-token-rbzg6:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-rbzg6
Optional: false
QoS Class: Burstable
Node-Selectors: type=compute
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
4h 4s 519 kubelet, ip-172-31-75-193.us-east-2.compute.internal spec.containers{mongodb} Normal Killing Killing container with id docker://mongodb:Need to kill Pod
[root@free-stg-master-03fb6 ~]# oc describe pod database-2-5fcz6 -n yasun-1-1
Name: database-2-5fcz6
Namespace: yasun-1-1
Security Policy: restricted
Node: ip-172-31-75-193.us-east-2.compute.internal/172.31.75.193
Start Time: Thu, 20 Jul 2017 19:23:48 +0000
Labels: deployment=database-2
deploymentconfig=database
name=database
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"yasun-1-1","name":"database-2","uid":"3cc07aeb-6cd6-11e7-bb9e-02306c0c...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container mysql; cpu limit for container mysql
openshift.io/deployment-config.latest-version=2
openshift.io/deployment-config.name=database
openshift.io/deployment.name=database-2
openshift.io/scc=restricted
Status: Terminating (expires Fri, 28 Jul 2017 21:24:56 +0000)
Termination Grace Period: 30s
IP:
Controllers: ReplicationController/database-2
Containers:
mysql:
Container ID: docker://5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24
Image: registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec
Image ID: docker-pullable://registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec
Port: 3306/TCP
State: Running
Started: Mon, 24 Jul 2017 21:08:44 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 512Mi
Requests:
cpu: 60m
memory: 307Mi
Liveness: tcp-socket :3306 delay=30s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [/bin/sh -i -c MYSQL_PWD='aE6hTcmx' mysql -h 127.0.0.1 -u userVCI -D sampledb -e 'SELECT 1'] delay=5s timeout=1s period=10s #success=1 #failure=3
Environment:
MYSQL_USER: <set to the key 'database-user' in secret 'dancer-mysql-persistent'> Optional: false
MYSQL_PASSWORD: <set to the key 'database-password' in secret 'dancer-mysql-persistent'> Optional: false
MYSQL_DATABASE: sampledb
Mounts:
/var/lib/mysql/data from database-data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qqx38 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
database-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: database
ReadOnly: false
default-token-qqx38:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qqx38
Optional: false
QoS Class: Burstable
Node-Selectors: type=compute
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
4h 37s 519 kubelet, ip-172-31-75-193.us-east-2.compute.internal spec.containers{mysql} Normal Killing Killing container with id docker://mysql:Need to kill Pod
[root@free-stg-master-03fb6 ~]#
YAML for the pods:
[root@free-stg-master-03fb6 ~]# oc get pod database-2-5fcz6 -n yasun-1-1 -o=yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/created-by: |
{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"yasun-1-1","name":"database-2","uid":"3cc07aeb-6cd6-11e7-bb9e-02306c0cdc4b","apiVersion":"v1","resourceVersion":"34780432"}}
kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container
mysql; cpu limit for container mysql'
openshift.io/deployment-config.latest-version: "2"
openshift.io/deployment-config.name: database
openshift.io/deployment.name: database-2
openshift.io/scc: restricted
creationTimestamp: 2017-07-20T19:23:43Z
deletionGracePeriodSeconds: 30
deletionTimestamp: 2017-07-28T21:24:56Z
generateName: database-2-
labels:
deployment: database-2
deploymentconfig: database
name: database
name: database-2-5fcz6
namespace: yasun-1-1
ownerReferences:
- apiVersion: v1
blockOwnerDeletion: true
controller: true
kind: ReplicationController
name: database-2
uid: 3cc07aeb-6cd6-11e7-bb9e-02306c0cdc4b
resourceVersion: "36769986"
selfLink: /api/v1/namespaces/yasun-1-1/pods/database-2-5fcz6
uid: f17b1c10-6d80-11e7-91e2-02306c0cdc4b
spec:
containers:
- env:
- name: MYSQL_USER
valueFrom:
secretKeyRef:
key: database-user
name: dancer-mysql-persistent
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
key: database-password
name: dancer-mysql-persistent
- name: MYSQL_DATABASE
value: sampledb
image: registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
tcpSocket:
port: 3306
timeoutSeconds: 1
name: mysql
ports:
- containerPort: 3306
protocol: TCP
readinessProbe:
exec:
command:
- /bin/sh
- -i
- -c
- MYSQL_PWD='aE6hTcmx' mysql -h 127.0.0.1 -u userVCI -D sampledb -e 'SELECT
1'
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: "1"
memory: 512Mi
requests:
cpu: 60m
memory: 307Mi
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- NET_RAW
- SETGID
- SETUID
- SYS_CHROOT
privileged: false
runAsUser: 1005000000
seLinuxOptions:
level: s0:c71,c15
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/mysql/data
name: database-data
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-qqx38
readOnly: true
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: default-dockercfg-24g9f
nodeName: ip-172-31-75-193.us-east-2.compute.internal
nodeSelector:
type: compute
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1005000000
seLinuxOptions:
level: s0:c71,c15
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
volumes:
- name: database-data
persistentVolumeClaim:
claimName: database
- name: default-token-qqx38
secret:
defaultMode: 420
secretName: default-token-qqx38
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2017-07-20T19:23:48Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2017-07-25T22:22:13Z
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2017-07-20T19:23:43Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24
image: registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec
imageID: docker-pullable://registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:76554b1dfd6a018b834be932f5e8dc5cf7088c4170877cfed42b117d85aee9ec
lastState: {}
name: mysql
ready: true
restartCount: 0
state:
running:
startedAt: 2017-07-24T21:08:44Z
hostIP: 172.31.75.193
phase: Running
qosClass: Burstable
startTime: 2017-07-20T19:23:48Z
[root@free-stg-master-03fb6 ~]# oc get pod mongodb-8-jj7xp -n xyz -o=yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/created-by: |
{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"xyz","name":"mongodb-8","uid":"3efa6e37-6cd6-11e7-bb9e-02306c0cdc4b","apiVersion":"v1","resourceVersion":"34780694"}}
kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container
mongodb; cpu limit for container mongodb'
openshift.io/deployment-config.latest-version: "8"
openshift.io/deployment-config.name: mongodb
openshift.io/deployment.name: mongodb-8
openshift.io/scc: restricted
creationTimestamp: 2017-07-20T19:23:43Z
deletionGracePeriodSeconds: 30
deletionTimestamp: 2017-07-28T21:24:55Z
generateName: mongodb-8-
labels:
deployment: mongodb-8
deploymentconfig: mongodb
name: mongodb
name: mongodb-8-jj7xp
namespace: xyz
ownerReferences:
- apiVersion: v1
blockOwnerDeletion: true
controller: true
kind: ReplicationController
name: mongodb-8
uid: 3efa6e37-6cd6-11e7-bb9e-02306c0cdc4b
resourceVersion: "36770066"
selfLink: /api/v1/namespaces/xyz/pods/mongodb-8-jj7xp
uid: f1c9f7b6-6d80-11e7-91e2-02306c0cdc4b
spec:
containers:
- env:
- name: MONGODB_USER
valueFrom:
secretKeyRef:
key: database-user
name: nodejs-mongo-persistent
- name: MONGODB_PASSWORD
valueFrom:
secretKeyRef:
key: database-password
name: nodejs-mongo-persistent
- name: MONGODB_DATABASE
value: sampledb
- name: MONGODB_ADMIN_PASSWORD
valueFrom:
secretKeyRef:
key: database-admin-password
name: nodejs-mongo-persistent
image: registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
tcpSocket:
port: 27017
timeoutSeconds: 1
name: mongodb
ports:
- containerPort: 27017
protocol: TCP
readinessProbe:
exec:
command:
- /bin/sh
- -i
- -c
- mongo 127.0.0.1:27017/$MONGODB_DATABASE -u $MONGODB_USER -p $MONGODB_PASSWORD
--eval="quit()"
failureThreshold: 3
initialDelaySeconds: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: "1"
memory: 512Mi
requests:
cpu: 60m
memory: 307Mi
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- NET_RAW
- SETGID
- SETUID
- SYS_CHROOT
privileged: false
runAsUser: 1000130000
seLinuxOptions:
level: s0:c11,c10
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/mongodb/data
name: mongodb-data
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-rbzg6
readOnly: true
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: default-dockercfg-2zp9s
nodeName: ip-172-31-75-193.us-east-2.compute.internal
nodeSelector:
type: compute
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1000130000
seLinuxOptions:
level: s0:c11,c10
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
volumes:
- name: mongodb-data
persistentVolumeClaim:
claimName: mongodb
- name: default-token-rbzg6
secret:
defaultMode: 420
secretName: default-token-rbzg6
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2017-07-20T19:23:48Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2017-07-25T22:22:10Z
message: 'containers with unready status: [mongodb]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2017-07-20T19:23:43Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d
image: registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece
imageID: docker-pullable://registry.access.redhat.com/rhscl/mongodb-32-rhel7@sha256:48e323b31f38ca23bf6c566756c08e7b485d19e5cbee3507b7dd6cbf3b1a9ece
lastState: {}
name: mongodb
ready: false
restartCount: 0
state:
running:
startedAt: 2017-07-24T21:02:39Z
hostIP: 172.31.75.193
phase: Running
qosClass: Burstable
startTime: 2017-07-20T19:23:48Z
[root@free-stg-master-03fb6 ~]# oc get -n management-infra pod oso-clamd-57h3b -o=yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/created-by: |
{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"management-infra","name":"oso-clamd","uid":"1e9f6f94-57b4-11e7-a29c-0203ad7dfcd7","apiVersion":"extensions","resourceVersion":"22329955"}}
openshift.io/scc: privileged
creationTimestamp: 2017-06-26T15:36:49Z
generateName: oso-clamd-
labels:
name: oso-clamd
name: oso-clamd-57h3b
namespace: management-infra
ownerReferences:
- apiVersion: extensions/v1beta1
blockOwnerDeletion: true
controller: true
kind: DaemonSet
name: oso-clamd
uid: 1e9f6f94-57b4-11e7-a29c-0203ad7dfcd7
resourceVersion: "36799993"
selfLink: /api/v1/namespaces/management-infra/pods/oso-clamd-57h3b
uid: 456773b5-5a85-11e7-ba4c-02306c0cdc4b
spec:
containers:
- env:
- name: OO_PAUSE_ON_START
value: "false"
image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd:latest
imagePullPolicy: Always
name: oso-clamd
resources: {}
securityContext:
capabilities:
drop:
- AUDIT_WRITE
- CHOWN
- DAC_OVERRIDE
- FOWNER
- FSETID
- KILL
- MKNOD
- NET_BIND_SERVICE
- NET_RAW
- SETFCAP
- SETGID
- SETPCAP
- SETUID
- SYS_CHROOT
privileged: true
runAsUser: 10000
seLinuxOptions:
level: s0
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/clamav
name: clamsigs
readOnly: true
- mountPath: /var/run/clamd.scan
name: clamsock
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: clamd-deployer-token-dwzg1
readOnly: true
- env:
- name: OO_PAUSE_ON_START
value: "false"
image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd-update:latest
imagePullPolicy: Always
name: oso-clamd-update
resources: {}
securityContext:
privileged: false
runAsUser: 10001
seLinuxOptions:
level: s0
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/clamav
name: clamsigs
- mountPath: /token
name: token
- mountPath: /usr/bin/oc
name: usr-bin-oc
- mountPath: /var/run/clamd.scan
name: clamsock
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: clamd-deployer-token-dwzg1
readOnly: true
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: clamd-deployer-dockercfg-zffvz
nodeName: ip-172-31-75-193.us-east-2.compute.internal
nodeSelector:
type: compute
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
seLinuxOptions:
level: s0
serviceAccount: clamd-deployer
serviceAccountName: clamd-deployer
terminationGracePeriodSeconds: 10
volumes:
- hostPath:
path: /var/run/clamd.scan
name: clamsock
- emptyDir: {}
name: clamsigs
- name: token
secret:
defaultMode: 420
secretName: clamd-image-watcher-token
- hostPath:
path: /usr/bin/oc
name: usr-bin-oc
- name: clamd-deployer-token-dwzg1
secret:
defaultMode: 420
secretName: clamd-deployer-token-dwzg1
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2017-06-26T15:36:50Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2017-07-29T01:59:26Z
message: 'containers with unready status: [oso-clamd]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2017-06-26T15:36:56Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://dce4c8ac7c293079496c939794019b760df86246e9f3e015970e2a4e5190ac72
image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd:latest
imageID: docker-pullable://172.30.44.192:5000/management-infra/oso-rhel7-clamd@sha256:90ba009c48d28494aac15240f41d311a5384db3e97649f5eeffc1cefb055b9e6
lastState:
terminated:
containerID: docker://dce4c8ac7c293079496c939794019b760df86246e9f3e015970e2a4e5190ac72
exitCode: 1
finishedAt: 2017-07-29T01:59:26Z
reason: Error
startedAt: 2017-07-29T01:59:17Z
name: oso-clamd
ready: false
restartCount: 1141
state:
waiting:
message: Back-off 5m0s restarting failed container=oso-clamd pod=oso-clamd-57h3b_management-infra(456773b5-5a85-11e7-ba4c-02306c0cdc4b)
reason: CrashLoopBackOff
- containerID: docker://631352cbb39726531fab65c7b5621a513d570e47035a4a0ba8e049f8d730b713
image: 172.30.44.192:5000/management-infra/oso-rhel7-clamd-update:latest
imageID: docker-pullable://172.30.44.192:5000/management-infra/oso-rhel7-clamd-update@sha256:8bf75d87ffe3b31cd9c591590d17bd6a8d29855a5e796a1cb1a8c374fdad2ea9
lastState:
terminated:
containerID: docker://6a8654f423a0f6292225943579e9074fd1e0c54b372b0d570a7adbf7326f6419
exitCode: 1
finishedAt: 2017-07-24T20:22:48Z
message: 'Error on reading termination log /var/lib/origin/openshift.local.volumes/pods/456773b5-5a85-11e7-ba4c-02306c0cdc4b/containers/oso-clamd-update/d05df9c2:
open /var/lib/origin/openshift.local.volumes/pods/456773b5-5a85-11e7-ba4c-02306c0cdc4b/containers/oso-clamd-update/d05df9c2:
no such file or directory'
reason: Error
startedAt: 2017-07-19T23:14:02Z
name: oso-clamd-update
ready: true
restartCount: 12
state:
running:
startedAt: 2017-07-24T21:35:45Z
hostIP: 172.31.75.193
phase: Running
podIP: 10.128.5.52
qosClass: BestEffort
startTime: 2017-06-26T15:36:50Z
Created attachment 1307158 [details]
Backtrace in readable form
Unlike the previously referenced issues, this one does not appear to be an oadm drain issue; it is either a docker or a kubelet issue. This is yet another "pods stuck terminating" problem, which can have nearly infinite causes.

Quick triage:

database-2-5fcz6 (5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24):
Container 5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24 failed to exit within 30 seconds of signal 15 - using the force
container kill failed because of 'container not found' or 'no such process': Cannot kill container 5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24: rpc error: code = 2 desc = containerd: container not found

mongodb-8-jj7xp (ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d):
Container ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d failed to exit within 30 seconds of signal 15 - using the force
container kill failed because of 'container not found' or 'no such process': Cannot kill container ffcccf58ec0c6353c2835c79aae462013af7f6598f868d3f29c8f3fec28ceb5d: rpc error: code = 2 desc = containerd: container not found

oso-clamd-57h3b (6a8654f423a0f6292225943579e9074fd1e0c54b372b0d570a7adbf7326f6419):
Error removing mounted layer 6a8654f423a0f6292225943579e9074fd1e0c54b372b0d570a7adbf7326f6419: failed to remove device 6af5352dd32d0c60a7bdf47dd956af1cd8f9175263cfae113b7cd666d89324db: Device is Busy
Handler for DELETE /v1.24/containers/6a8654f423a0 returned error: Driver devicemapper failed to remove root filesystem 6a8654f423a0f6292225943579e9074fd1e0c54b372b0d570a7adbf7326f6419: failed to remove device 6af5352dd32d0c60a7bdf47dd956af1cd8f9175263cfae113b7cd666d89324db: Device is Busy

Removal of database-2-5fcz6 and mongodb-8-jj7xp is failing because containerd reports the containers don't exist; removal of oso-clamd-57h3b is failing because of a busy mount point.

The docker log is mostly a loop of the same errors over and over. There are three types:
- 34 unique device mapper ids produce recurring "Error removing mounted layer" errors.
- 48 unique container ids produce recurring "No such container" errors.
- database-2-5fcz6 and mongodb-8-jj7xp are the only two that produce recurring "Cannot kill container ... containerd: container not found" messages.

While the first two error types generate a lot of logging noise and are messy, I think the last one, "containerd: container not found", is indicative of the issue. I imagine that containerd is successfully deleting the container, but the kubelet isn't getting notified of the success and thus keeps trying to delete a container that containerd no longer knows about. Not sure how that would happen yet.

Which build of docker are you running?

$ rpm -q -a | grep docker

Thanks!

Why would you make that comment private?

# rpm -qa \*docker\*
docker-client-1.12.6-30.git97ba2c0.el7.x86_64
docker-common-1.12.6-30.git97ba2c0.el7.x86_64
docker-1.12.6-30.git97ba2c0.el7.x86_64
docker-rhel-push-plugin-1.12.6-30.git97ba2c0.el7.x86_64
atomic-openshift-docker-excluder-3.6.170-1.git.0.9eec78e.el7.noarch

I didn't notice before that 'docker ps' _does_ show the containers of which containerd has no knowledge. I'm not sure the kubelet can be blamed if docker itself is also under the impression that the container still exists. Unfortunately, the logs don't go back far enough to see what happened the first time the container stop/delete was attempted.
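The cross-checks described above can be scripted. The following is a minimal triage sketch, not part of the original report: it assumes docker 1.12 with the devicemapper storage driver, and the container and device IDs are placeholders copied from the log excerpts above.

```bash
#!/bin/bash
# Illustrative triage sketch: compare what docker believes about a stuck
# container with what is actually running, and look for holders of a
# "Device is Busy" devicemapper device. IDs are placeholders.

CONTAINER_ID=5b193c9c4c7217768aae5a59961b583f013700b3f2804a7f801f0fe056a38b24
DM_DEVICE=6af5352dd32d0c60a7bdf47dd956af1cd8f9175263cfae113b7cd666d89324db

# 1) Does the docker daemon still list the container at all?
docker ps -a --no-trunc | grep "$CONTAINER_ID"

# 2) If docker reports a PID for a "running" container, verify the process
#    actually exists; a missing PID suggests docker and containerd disagree.
PID=$(docker inspect --format '{{.State.Pid}}' "$CONTAINER_ID" 2>/dev/null)
if [ -n "$PID" ] && [ "$PID" != "0" ] && [ ! -d "/proc/$PID" ]; then
    echo "docker reports PID $PID for $CONTAINER_ID, but no such process exists"
fi

# 3) For "Device is Busy" removal failures, list processes whose mount
#    namespaces still reference the devicemapper thin device.
grep -l "$DM_DEVICE" /proc/[0-9]*/mountinfo 2>/dev/null
```

If step 3 turns up a live process, that process's mount namespace is what keeps the layer pinned and the device "busy".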
Assigning to Containers to figure out why there is a disconnect between docker and containerd. Might also be a dup, but I'll let Containers decide.

Antonio, does this not look like a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1460729 ?

(In reply to Eric Paris from comment #10)
> Antonio, does this not look like a dup of
> https://bugzilla.redhat.com/show_bug.cgi?id=1460729 ?

From my POV, this bug is the same as the bug you linked previously. For reference, that bug was caused by building docker from the wrong docker and containerd commits. I referenced the commits here: https://bugzilla.redhat.com/show_bug.cgi?id=1460729#c41

Containerd, specifically, was missing this commit: https://github.com/projectatomic/containerd/commit/988be67cb9f5cf55a2591ab50ba25f4d9055b3ac

I backported that containerd commit from upstream, then aligned the docker commit to f55a11849bb4b654c511f0d9cfe7e25e833c2bed (which is part of a set of devicemapper fixes that should have gone in together, but for some reason we built from a different commit). The combination of the above, and more importantly the containerd backport, fixed this issue, as shown in https://bugzilla.redhat.com/show_bug.cgi?id=1460729#c54 (which is indeed VERIFIED now).

The commits listed in https://bugzilla.redhat.com/show_bug.cgi?id=1460729#c41 (the referenced bug) are:

docker: https://github.com/projectatomic/docker/commit/f55a11849bb4b654c511f0d9cfe7e25e833c2bed
containerd: https://github.com/projectatomic/containerd/commit/fa8fb3d455e1baf716f3131581f0ed8b07c573a6

*** This bug has been marked as a duplicate of bug 1460729 ***
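For anyone hitting the same symptom, a quick way to tell whether a node is still on an affected build is to compare the installed package NVR and the git commit the daemon reports against the fixed build referenced in bug 1460729. A minimal sketch, assuming the RHEL-packaged docker 1.12.6 shown in this report; the fixed build number itself lives in the duplicate bug and is not repeated here.

```bash
# Check which docker build and git commit a node is running before deciding
# whether it predates the containerd backport from bug 1460729.
rpm -q docker docker-common docker-client      # installed build, e.g. docker-1.12.6-30.git97ba2c0.el7
docker version | grep -i 'git commit'          # commit the client and daemon were built from
rpm -q --changelog docker | head -n 20         # recent package changelog entries
```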