Description of problem:
The docker-cleanup script is unable to delete containers because the serviceaccount secrets filesystem is busy:

Error response from daemon: Unable to remove filesystem for 1c2ae6b10fe6fe66af762e4177e57f748995837b0d2f61695181e0573263f37c: remove /var/lib/docker/containers/1c2ae6b10fe6fe66af762e4177e57f748995837b0d2f61695181e0573263f37c/secrets/kubernetes.io/serviceaccount: device or resource busy
Error response from daemon: Unable to remove filesystem for 7048c117f5a8887c49e561188db76a10b38dc1ba986d553be45244948bc91125: remove /var/lib/docker/containers/7048c117f5a8887c49e561188db76a10b38dc1ba986d553be45244948bc91125/secrets/kubernetes.io/serviceaccount: device or resource busy
Error response from daemon: Unable to remove filesystem for e965cc54c38e5332cde6d6e175f3e662451ab2756a10302d9daee49cc51d5234: remove /var/lib/docker/containers/e965cc54c38e5332cde6d6e175f3e662451ab2756a10302d9daee49cc51d5234/secrets/kubernetes.io/serviceaccount: device or resource busy

Tried the steps here: https://access.redhat.com/solutions/2840311
The steps in that article fail to kill the process.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
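As a triage aid, here is a minimal sketch of how one might find which processes (possibly in other mount namespaces) still hold the serviceaccount secret mount for a failing container. The container ID below is taken from the first error above; this is a generic approach and may differ from the exact steps in the linked article.

# Container ID from the "Unable to remove filesystem" error.
CID=1c2ae6b10fe6fe66af762e4177e57f748995837b0d2f61695181e0573263f37c

# List the mountinfo files (and therefore the PIDs) of processes that still
# have the container's secrets directory mounted in their namespace.
grep -l "$CID/secrets/kubernetes.io/serviceaccount" /proc/*/mountinfo 2>/dev/null

# Cross-check against the host's mount table.
findmnt -o TARGET,SOURCE | grep "$CID"

Any PID that turns up here can then be inspected further, which is what the triage below does for the PIDs reported in the support ticket.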
Some first-level triage. All the processes in the support ticket that are holding the mount points are in uninterruptible sleep, as indicated by the D in the STAT column:

# ps -f 5380 5381 15850 109932
UID        PID     PPID   C  STIME TTY STAT TIME     CMD
root       5380    1      0  Jan18 ?   Dl   0:00     java -Xmx256m -Djava.library.path=/opt/draios/lib -Dsun.rmi.transport.connectionTimeout=2000 -Dsun.rmi.transport.tc
root       5381    1      0  Jan18 ?   D    0:00     statsite -f /opt/draios/etc/statsite.ini
1000090+   15850   107147 0  Jan18 ?   D    0:00     du -sh _state indices node.lock
1000090+   109932  109905 13 2016  ?   Dl   20105:32 /bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccu

This is why they are not dying in response to a kill -9. I'm not sure whether the customer system is still in this state, but it is likely that an NFS server or block device is not responding. "cat /proc/<pid>/stack" might shed some light on where in the kernel the processes are stuck.
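A quick sketch of those checks, assuming root access on the affected node; the PIDs are the ones from the ps output above, and on the customer system the actual stuck PIDs would be substituted.

# Find all processes currently in uninterruptible sleep (STAT contains "D"),
# including the kernel function they are waiting in.
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'

# Dump the kernel stack of each stuck PID to see where it is blocked
# (e.g. in NFS or block-layer code).
for pid in 5380 5381 15850 109932; do
    echo "=== PID $pid ==="
    cat /proc/$pid/stack
done

If the stacks show the processes waiting on NFS or block I/O, the hang is on the storage side and the containers cannot be cleaned up until that I/O completes or the node is rebooted.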
Closing bug as customer case is resolved.