Bug 1415076 - unable to delete dead container with service account filesystem
Summary: unable to delete dead container with service account filesystem
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Derek Carr
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-01-20 07:49 UTC by Jaspreet Kaur
Modified: 2020-02-14 18:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-31 22:21:00 UTC
Target Upstream Version:
Embargoed:



Description Jaspreet Kaur 2017-01-20 07:49:29 UTC
Description of problem: The docker-cleanup script is unable to delete dead containers because the serviceaccount filesystem mounted inside each container directory is busy.

Error response from daemon: Unable to remove filesystem for 1c2ae6b10fe6fe66af762e4177e57f748995837b0d2f61695181e0573263f37c: remove /var/lib/docker/containers/1c2ae6b10fe6fe66af762e4177e57f748995837b0d2f61695181e0573263f37c/secrets/kubernetes.io/serviceaccount: device or resource busy
Error response from daemon: Unable to remove filesystem for 7048c117f5a8887c49e561188db76a10b38dc1ba986d553be45244948bc91125: remove /var/lib/docker/containers/7048c117f5a8887c49e561188db76a10b38dc1ba986d553be45244948bc91125/secrets/kubernetes.io/serviceaccount: device or resource busy
Error response from daemon: Unable to remove filesystem for e965cc54c38e5332cde6d6e175f3e662451ab2756a10302d9daee49cc51d5234: remove /var/lib/docker/containers/e965cc54c38e5332cde6d6e175f3e662451ab2756a10302d9daee49cc51d5234/secrets/kubernetes.io/serviceaccount: device or resource busy


Tried the steps here: https://access.redhat.com/solutions/2840311

The steps in the above article fail to kill the processes.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Seth Jennings 2017-01-26 23:28:28 UTC
Some first level triage.

All the processes in the support ticket that are holding the mount points are in uninterruptible sleep, as indicated by D in the STAT column:

# ps -f 5380 5381 15850 109932
UID         PID   PPID  C STIME TTY      STAT   TIME CMD
root       5380      1  0 Jan18 ?        Dl     0:00 java -Xmx256m -Djava.library.path=/opt/draios/lib -Dsun.rmi.transport.connectionTimeout=2000 -Dsun.rmi.transport.tc
root       5381      1  0 Jan18 ?        D      0:00 statsite -f /opt/draios/etc/statsite.ini
1000090+  15850 107147  0 Jan18 ?        D      0:00 du -sh _state indices node.lock
1000090+ 109932 109905 13  2016 ?        Dl   20105:32 /bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccu

This is why they are not dying in response to a kill -9.

I'm not sure if the customer system is still in this state, but it is likely that an NFS server or block device is not responding.

"cat /proc/<pid>/stack" might shed some light on where in the kernel the processes are stuck.
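
For anyone triaging a similar case, the check above can be scripted. The following is a minimal sketch, not something taken from the support ticket: it assumes a Linux host with /proc mounted, and the function names (parse_stat_line, uninterruptible_processes, kernel_stack) are made up for illustration. It lists every process currently in state D and, where permitted, prints the contents of /proc/<pid>/stack. Reading the stack file normally requires root.

```python
import os


def parse_stat_line(stat_line):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')' instead of whitespace.
    left, _, right = stat_line.rpartition(")")
    pid_str, _, comm = left.partition("(")
    state = right.split()[0]
    return int(pid_str), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Yield (pid, comm) for every process whose state is 'D'."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except OSError:
            continue  # process exited or is not readable
        if state == "D":
            yield pid, comm


def kernel_stack(pid, proc_root="/proc"):
    """Return the text of /proc/<pid>/stack, or None if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "stack")) as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(f"{pid} ({comm}) is in uninterruptible sleep")
        print(kernel_stack(pid) or "  (stack not readable; try as root)")
```

A process can pass through state D briefly during normal I/O, so a single hit is not conclusive. Processes that show up on repeated runs, like the four listed above, are the ones worth inspecting.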

Comment 2 Derek Carr 2017-01-31 22:21:00 UTC
Closing bug as customer case is resolved.

