Bug 1339164 - HTTP Error" err="Cannot start container <hash>: [8] System error: read parent: connection reset by peer" statusCode=500
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Antonio Murdaca
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-05-24 09:46 UTC by Laurent Rineau
Modified: 2023-09-14 03:23 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When reading from the sync pipe between docker and libcontainer, a newline was left behind unread.
Consequence: Containers failed to start with "error: read parent: connection reset by peer".
Fix: Read all bytes from the sync pipe.
Result: Containers start successfully.
Clone Of:
Environment:
Last Closed: 2016-06-23 16:18:57 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Extraction of logs from journalctl (30.29 KB, text/plain)
2016-05-24 09:46 UTC, Laurent Rineau


Links
System: Red Hat Product Errata
ID: RHBA-2016:1274
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: docker bug fix and enhancement update
Last Updated: 2016-06-23 20:12:28 UTC

Description Laurent Rineau 2016-05-24 09:46:50 UTC
Created attachment 1160968
Extraction of logs from journalctl

Description of problem:

== TL;DR ==

One image is started every day, along with others. On 4 random days out of 20, Docker logged:

level=error msg="HTTP Error" err="Cannot start container d48905e2cf7edf696e0bbc23a99ab93cb8d31a8055ea7b2c2f9b04e30a5a754a: [8] System error: read parent: connection reset by peer" statusCode=500

I attached the result of:
sudo journalctl -b SYSLOG_IDENTIFIER=kernel \+ SYSLOG_IDENTIFIER=systemd \+ _EXE=/usr/bin/forward-journald --since '2016-05-24 00:19' --until '2016-05-24 00:20'
That is, the docker/systemd/kernel logs around the most recent occurrence of the issue, today (with the user ID, hostname, and email address obfuscated, each replaced by a random string). Docker was recently switched to debug mode to help report this issue.
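
The attached journal extract can be searched for this failure by matching the error text quoted above. A minimal Go sketch, assuming each occurrence sits on one line in the same format as that quote; `parseStartError` is a hypothetical helper, not part of Docker:

```go
package main

import (
	"fmt"
	"regexp"
)

// startErr matches the "HTTP Error" line quoted in this report and captures
// the 64-hex-digit container ID and the HTTP status code.
var startErr = regexp.MustCompile(
	`err="Cannot start container ([0-9a-f]{64}): \[\d+\] System error: read parent: connection reset by peer" statusCode=(\d+)`)

// parseStartError returns the container ID and status code if line is an
// occurrence of this failure, and ok=false otherwise.
func parseStartError(line string) (id, status string, ok bool) {
	m := startErr.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	line := `level=error msg="HTTP Error" err="Cannot start container d48905e2cf7edf696e0bbc23a99ab93cb8d31a8055ea7b2c2f9b04e30a5a754a: [8] System error: read parent: connection reset by peer" statusCode=500`
	if id, status, ok := parseStartError(line); ok {
		fmt.Println(id[:12], status) // d48905e2cf7e 500
	}
}
```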

== Longer explanation ==
The CGAL open source project has a Python script that starts and stops containers to run CPU-intensive tests. Three containers run in parallel. Every day, about 20 different images are tested.

Over about four weeks, there have been four incidents that I cannot reproduce: on most days all images are tested successfully, and the failure occurred on only four days.

I have tried several versions of Docker.
docker-1.8.2-10.el7.centos.x86_64 works fine, but I hit the issue with:
  docker-1.9.1-25.el7.centos.x86_64
and
  docker-1.9.1-40.el7.centos.x86_64

As you can see, my system is CentOS 7, and not RHEL 7.


Version-Release number of selected component (if applicable):

Here is the latest tested version:

cgal ~ $ docker version
Client:
 Version:         1.9.1
 API version:     1.21
 Package version: docker-common-1.9.1-40.el7.centos.x86_64
 Go version:      go1.4.2
 Git commit:      ab77bde/1.9.1
 Built:           
 OS/Arch:         linux/amd64

Server:
 Version:         1.9.1
 API version:     1.21
 Package version: docker-common-1.9.1-40.el7.centos.x86_64
 Go version:      go1.4.2
 Git commit:      ab77bde/1.9.1
 Built:           
 OS/Arch:         linux/amd64

cgal ~ $ rpm -qa \*docker\*
python-docker-py-1.7.2-1.el7.noarch
docker-common-1.9.1-40.el7.centos.x86_64
docker-forward-journald-1.9.1-40.el7.centos.x86_64
docker-1.9.1-40.el7.centos.x86_64
docker-selinux-1.9.1-40.el7.centos.x86_64


How reproducible: I cannot reproduce it on demand; I just have to wait a few days.

Comment 1 Laurent Rineau 2016-05-24 09:53:20 UTC
I think the bug is in Docker or systemd, but just in case, here is the Python script that starts the Docker containers:

https://github.com/CGAL/cgal-testsuite-dockerfiles/blob/358e7e833297b1c3d2a0094e8f038320781ccd13/test_cgal.py

Comment 3 Antonio Murdaca 2016-05-24 15:23:55 UTC
Fixed in docker-1.10. Could you try docker-latest?
Upstream issue: https://github.com/docker/docker/issues/14203

Otherwise, the fix is https://github.com/opencontainers/runc/pull/515/commits/ddcee3cc2a2ffb3ab8c630fd62689fd14ce82e07 and it could be backported to docker-1.9 (Mrunal could probably do it; there are a lot of conflicts after the container state refactor in libcontainer, which I don't know about).

Comment 4 Laurent Rineau 2016-05-27 09:20:32 UTC
Well, docker is in EPEL and RHEL Extras. I use the package from EPEL instead of installing Docker myself because I trust the EPEL packagers more than myself to integrate docker properly with the rest of the system (in particular with systemd, journald, and SELinux).

So please fix the bug in RHEL and EPEL.

For testing, and to help you fix the bug, what would be the correct way to install docker-1.10 or later on my system without breaking package management and the integration with systemd, journald, and SELinux? Is there an SRPM that I could build locally? Or would it be better to install another version of docker in /usr/local/? In that case, I know how to deal with SELinux issues, but I am not sure of the procedure for integrating with systemd/journald.

Can docker-1.9 and 1.10 share the same storage (an LVM volume in my case), if they are never run at the same time?

Comment 6 Luwen Su 2016-06-13 16:39:40 UTC
I can ONLY confirm that the patch is in docker-1.10.3-40.el7.x86_64.
* I could neither find a cgl account nor get enough resources to run the script (the script makes my VM's disk run out of space...)

If anyone gets a chance to trigger the problem again, feel free to open the bug.

Comment 12 errata-xmlrpc 2016-06-23 16:18:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1274

Comment 13 Red Hat Bugzilla 2023-09-14 03:23:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

