3. What is the nature and description of the request? The local docker registry can get corrupt. This should not happen, it should stay stable in all situations.

4. Why does the customer need this? (List the business requirements here) To have a stable environment. To have lower cost in operations.

5. How would the customer like to achieve this? (List the functional requirements here) I don't know the details of how the managing of the local docker registry is implemented. But before starting operations on the local docker registry, it should be checked for available space. If there is not enough space available and no space can be freed, an error should be reported instead of making it corrupt.

6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented. Decrease the size of the local docker registry on a node, so it is possible to fill it quickly. Play around with some pods (new-app, delete pod). The registry should never get corrupt (e.g. pvs does not show warnings).

7. Is there already an existing RFE upstream or in Red Hat Bugzilla? not yet

8. Does the customer have any specific timeline dependencies and which release would they like to target? ASAP. Every customer would like to have STABLE environments.

9. Is the sales team involved in this request and do they have any additional input? Red Hat Consultant on site, account team fully aware of the request. no

10. List any affected packages or components. docker, local docker registry

11. Would the customer be able to assist in testing this functionality if implemented? Yes
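The behaviour requested in item 5 (check space first, try to free some, and report an error rather than proceed) could look roughly like the sketch below. It is only an illustration of the requested flow, not how docker or the registry manages storage; `ensure_space`, `InsufficientSpaceError`, and the `get_free_bytes` and `reclaim` hooks are hypothetical names introduced here.

```python
class InsufficientSpaceError(RuntimeError):
    """Raised when an operation would not fit in the available storage."""


def ensure_space(required_bytes, get_free_bytes, reclaim=None):
    """Return the free byte count if the operation fits, otherwise raise.

    get_free_bytes: hypothetical callable returning the currently free bytes.
    reclaim: optional hypothetical callable that tries to free space,
             for example by removing unused images.
    """
    free = get_free_bytes()
    if free < required_bytes and reclaim is not None:
        # Not enough room: try to free space once, then measure again.
        reclaim()
        free = get_free_bytes()
    if free < required_bytes:
        # Still not enough: fail loudly instead of starting the operation.
        raise InsufficientSpaceError(
            f"need {required_bytes} bytes, only {free} free; refusing to start"
        )
    return free
```

The point of the design is that the failure happens before any write begins, so the storage is never left half-written.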
This seems to be pointed more at the docker registry than at docker.
Couldn't you have just removed some container images, and then it would start working again?
Hello, No, it doesn't help once it is corrupted, and it is not an effective way, as it might prevent deployment of applications that need those images for an existing project. Regards, Jaspreet
My point is that there are probably a lot of junk images that you don't even know about.

atomic image prune

should get rid of dangling images which nothing is using. It would temporarily get you out of this situation and get your containers working again. Being able to expand the disk image would supposedly also help.
Does a reboot solve the problem? If the thin pool is full, xfs can hang indefinitely; to solve that, one needs to add more storage to the thin pool, and as of now the system needs to be rebooted to get rid of the unkillable IO thread. I think after a reboot, one can also first try to delete some images, and hopefully that will work. If not, we first need to add more storage to the thin pool, make sure it grows successfully, and then do further docker operations.
Once you run into the situation, please attach the following:
- journalctl output
- Preferably run the docker daemon in debug mode (-D)
- output of the commands "lvs" and "vgs"
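For anyone who wants to watch thin pool usage before it gets this far, here is a minimal sketch that reads the data and metadata fill levels from `lvs`. It assumes the standard `lvs` reporting options (`--noheadings`, `--separator`, `-o` with the `lv_name`, `data_percent`, and `metadata_percent` fields) and a locale that prints decimals with a dot; the helper names and the sample pool name are hypothetical.

```python
import subprocess

# Assumed invocation; the "|" separator avoids clashing with decimal commas.
LVS_CMD = ["lvs", "--noheadings", "--separator", "|",
           "-o", "lv_name,data_percent,metadata_percent"]


def parse_lvs(text):
    """Map each thin pool or thin volume name to (data %, metadata %)."""
    usage = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 3 or not fields[1]:
            continue  # blank line, or a volume that reports no data percentage
        name, data, meta = fields
        usage[name] = (float(data), float(meta) if meta else None)
    return usage


def collect_usage():
    """Run lvs (normally needs root) and return the parsed usage."""
    result = subprocess.run(LVS_CMD, check=True, capture_output=True, text=True)
    return parse_lvs(result.stdout)
```

`parse_lvs` can be tried on saved output without root access, for example `parse_lvs("  docker-pool|99.50|12.25\n  root||\n")`, where both volume names are made-up sample data.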
We need to document a better way to get out of this state. You need to reboot.

atomic images purge

Now if you still need more space, list the docker images and see if there are other images that can be removed. Longer term, we have patches for docker-1.10 that will block docker pull and docker create when the system is 90% used up.
No, I am not saying that this will not happen in docker-1.10; we are just taking steps to make it less likely, giving users 10% of disk space to figure out that they have a problem. This will block new containers and images from being installed but will not prevent existing containers from growing.
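As a rough illustration of the 90% rule described above, the check amounts to a simple gate on new images and containers. This is a sketch of the idea only, not the actual docker-1.10 patch; the function name and parameters are hypothetical.

```python
def may_add_image_or_container(used_bytes, total_bytes, threshold=0.90):
    """Return True while storage usage is below the threshold.

    Only new pulls and creates are gated by this check. Containers that
    already exist can keep growing into the remaining space, so the
    storage can still fill up completely.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes < threshold
```

With the default threshold, a pool at 89% still accepts a new image while one at 90% refuses it, which leaves the last 10% as a warning margin rather than a guarantee.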
Thanks, Daniel, for the information. But if docker gets corrupted after containers grow, there should be an easy way to get it back to a ready state. A reboot will not be an option for any of the users. The only concern is that even if they take precautions and still hit the corrupt state, there should be a resolution for that.
xfs going wild is a kernel issue that we cannot fix. I believe there is a kernel bug on it. The only way to fix this with current kernels is to reboot.
Hello Daniel, Can you please share the kernel Bugzilla for this? Regards, Jaspreet
https://bugzilla.redhat.com/show_bug.cgi?id=1244300
https://github.com/docker/docker/issues/20707
Vivek, I could not google up a bugzilla on the kernel for this. Do you know of any?
Dan, The following is one of the bugs that discussed xfs filling up and leading to a hang: https://bugzilla.redhat.com/show_bug.cgi?id=1240437
Since we are now shipping docker-1.10, I am going to close this as fixed in the current release.