Bug 1608505
Summary: | oc cluster up fails with error Error: failed to start the web console server
---|---
Product: | [Fedora] Fedora
Component: | origin
Version: | 29
Status: | CLOSED ERRATA
Severity: | unspecified
Priority: | unspecified
Hardware: | Unspecified
OS: | Unspecified
Reporter: | Lukas Slebodnik <lslebodn>
Assignee: | Jakub Čajka <jcajka>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | adimania, admiller, amurdaca, dwalsh, ealfassa, fkluknav, ichavero, jcajka, joe, lnykryn, lsm5, marianne, mpatel, msekleta, nalin, santiago, ssahani, s, systemd-maint, tdawson, vbatts, zbyszek
Fixed In Version: | origin-3.11.0-0.alpha1.0.fc30 origin-3.11.0-0.alpha1.0.fc29
Last Closed: | 2018-10-09 00:05:21 UTC
Type: | Bug
Bug Blocks: | 1598406

Description
Lukas Slebodnik
2018-07-25 16:27:03 UTC
Created attachment 1470541
Output of journalctl -u docker

This seems to affect only rawhide (f27 and f28 with the rawhide origin (3.9) are not affected); I have managed to reproduce it. It seems that the docker daemon fails to pull openshift/origin-pod:v3.9.0, based on `time="2018-07-25T12:13:05.554183434-04:00" level=error msg="Handler for GET /v1.26/images/openshift/origin-pod:v3.9.0/json returned error: No such image: openshift/origin-pod:v3.9.0"` in the log, although the image is available and pullable on the host.

As I'm planning to do the rebase to 3.10, I will revisit this issue after the rebase.

(In reply to Jakub Čajka from comment #2)
> This seems to affect only rawhide(f27, f28 with the rawhide origin(3.9) is
> not affected), I have managed to reproduce it. It seems that the docker
> daemon fails to pull openshift/origin-pod:v3.9.0 based on
> `time="2018-07-25T12:13:05.554183434-04:00" level=error msg="Handler for GET
> /v1.26/images/openshift/origin-pod:v3.9.0/json returned error: No such
> image: openshift/origin-pod:v3.9.0"` in log, although the image is available
> and pull-able on the host.
>
> As I'm planning to do the rebase to 3.10, I will revisit this issue after
> the rebase.

Is there any ETA?

I have managed to reproduce it on rawhide with the origin 3.10 alpha release. It seems that something breaks networking/image pulling on the machine. This happens even with the same version of docker as runs on an unaffected f28 machine (anecdotally, the same as mentioned by the reporter) and with the firewall and SELinux disabled. It even happens with the default docker configuration (not using the commands from the reproducer). I'm kind of out of ideas about what I can do to debug this or what can influence the networking/pulling. Trying docker now, although I don't believe that it is the root cause. Folks, do you have any ideas?

Errors observed in the log (that are not present with a successful oc up):

Aug 01 14:24:15 localhost.localdomain dockerd-current[810]: time="2018-08-01T14:24:15.749839355+02:00" level=error msg="Handler for GET /v1.26/images/openshift/origin-pod:v3.10/json returned error: No such image: openshift/origin-pod:v3.10"
Aug 01 14:24:15 localhost.localdomain dockerd-current[810]: time="2018-08-01T14:24:15.750494849+02:00" level=error msg="Handler for GET /v1.26/images/openshift/origin-pod:v3.10/json returned error: No such image: openshift/origin-pod:v3.10"
Aug 01 14:24:15 localhost.localdomain dockerd-current[810]: time="2018-08-01T14:24:15.767248526+02:00" level=warning msg="failed to retrieve docker-init version: unknown output format: tini version 0.18.0\n

This bug appears to have been reported against 'rawhide' during the Fedora 29 development cycle. Changing version to '29'.

Hm... It seems that there is an issue with systemd. Updating systemd on f28 to the version from rawhide results in a timeout waiting for "https://127.0.0.1:8443/healthz?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused ()". It seems to be unreachable/closed (mind that the firewall is disabled along with SELinux).

Systemd folks, have there been changes in rawhide/f29 systemd that this could be attributed to?

(In reply to Jakub Čajka from comment #6)
> Hm... Seems that there is issue with systemd. Updating the systemd on the
> f28 to the version from rawhide results in timeout waiting for
> "https://127.0.0.1:8443/healthz?timeout=32s: dial tcp 127.0.0.1:8443:
> connect: connection refused ()". It seems to be unreachable/closed(mind that
> firewall is disabled along with selinux)
>
> Systemd folks, have there been changes in rawhide/f29 systemd that this
> could be attributed to?

I can confirm that it works well with systemd-238-9.git0e0aa59.fc29.x86_64.

(In reply to Lukas Slebodnik from comment #7)
> (In reply to Jakub Čajka from comment #6)
> > Hm... Seems that there is issue with systemd. Updating the systemd on the
> > f28 to the version from rawhide results in timeout waiting for
> > "https://127.0.0.1:8443/healthz?timeout=32s: dial tcp 127.0.0.1:8443:
> > connect: connection refused ()". It seems to be unreachable/closed(mind that
> > firewall is disabled along with selinux)
> >
> > Systemd folks, have there been changes in rawhide/f29 systemd that this
> > could be attributed to?
>
> I can confirm that it works well with systemd-238-9.git0e0aa59.fc29.x86_64.

And it doesn't with 239-3.fc29, right?

I can confirm that it doesn't work with 239-3.fc29. Downgrading to systemd-238-9.git0e0aa59.fc29.x86_64 allows origin 3.10 to install correctly.

This might be related to https://bugzilla.redhat.com/show_bug.cgi?id=1568594 and https://bugzilla.redhat.com/show_bug.cgi?id=1558425

For the record, steps to reproduce:
1. Clean f28 (or f29) install
2. Update to rawhide/f29 systemd 239 (downgrade f29 to 238)
3. systemctl disable firewalld
4. reboot
5. dnf install origin-clients docker
6. create /etc/docker/daemon.json with contents
{ "insecure-registries" : [ "172.30.0.0/16" ] }
7. systemctl start docker
8. oc cluster up
It should fail as mentioned above.

*** Bug 1629431 has been marked as a duplicate of this bug. ***

Based on the discussion in https://pagure.io/atomic-wg/issue/510, it is an origin/runc issue.

origin-3.11.0-0.alpha1.0.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-7ed03f9dcf

origin-3.11.0-0.alpha1.0.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-7ed03f9dcf

origin-3.11.0-0.alpha1.0.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.
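
For anyone who wants to re-run the reproducer from this report, steps 5 to 8 can be sketched as a small script. This is only a sketch under assumptions: steps 1 to 4 (clean install, systemd 239 in place, firewalld disabled, reboot) are assumed to have been done by hand, and the names `run`, `daemon_json`, `reproduce` and the `REPRO_RUN` variable are hypothetical helpers introduced here, not part of origin or docker. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of reproducer steps 5-8 from this bug report.
# Dry run by default: commands are printed, not executed.
# Set REPRO_RUN=1 (as root, on a disposable Fedora VM) to execute them.
# REPRO_RUN, run, daemon_json and reproduce are hypothetical names.

run() {
    if [ "${REPRO_RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# Step 6: contents of /etc/docker/daemon.json, as given in the report.
daemon_json() {
    printf '{ "insecure-registries" : [ "172.30.0.0/16" ] }\n'
}

reproduce() {
    run dnf install -y origin-clients docker            # step 5
    if [ "${REPRO_RUN:-0}" = "1" ]; then
        daemon_json > /etc/docker/daemon.json           # step 6
    else
        printf '+ write /etc/docker/daemon.json: %s\n' "$(daemon_json)"
    fi
    run systemctl start docker                          # step 7
    run oc cluster up                                   # step 8
}

reproduce
```

Checking `rpm -q systemd` before and after should show whether the machine is on 238 or 239, which is the variable the comments above narrowed the failure down to.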