Bug 2218577
| Summary: | [OSP16.1] Systemd can't start/restart Chronyd in some nodes on OSP | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Ricardo Ramos Thomas <riramos> |
| Component: | openvswitch | Assignee: | RHOSP:NFV_Eng <rhosp-nfv-int> |
| Status: | CLOSED DUPLICATE | QA Contact: | Eran Kuris <ekuris> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 16.1 (Train) | CC: | apevec, chrisw, rjarry |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-07-19 12:20:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
*** This bug has been marked as a duplicate of bug 1903091 ***
Description of problem:
After a deployment that added new Ceph nodes, the customer hit an issue where systemd cannot start chronyd:

"fatal: [xxx-controller-0]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
"fatal: [xxx-computeovsdpdkhtoff-0]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
"fatal: [xxx-controller-1]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
"fatal: [xxx-computesriov-1]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
"fatal: [xxx-computesriovhtoff-1]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
"fatal: [xxx-computesriov-2]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
"fatal: [xxx-computesriovhtoff-0]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
"fatal: [xxx-controller-2]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
"NO MORE HOSTS LEFT *************************************************************",
"PLAY RECAP *********************************************************************",
"xxx-cephstorage-0 : ok=39 changed=2 unreachable=0 failed=0 skipped=87 rescued=0 ignored=0 ",
"xxx-cephstorage-1 : ok=39 changed=2 unreachable=0 failed=0 skipped=87 rescued=0 ignored=0 ",
"xxx-cephstorage-2 : ok=39 changed=2 unreachable=0 failed=0 skipped=87 rescued=0 ignored=0 ",
"xxx-cephstorage-3 : ok=39 changed=2 unreachable=0 failed=0 skipped=87 rescued=0 ignored=0 ",
"xxx-computeovsdpdk-0 : ok=29 changed=2 unreachable=0 failed=0 skipped=85 rescued=0 ignored=0 ",
"xxx-computeovsdpdk-1 : ok=29 changed=2 unreachable=0 failed=0 skipped=85 rescued=0 ignored=0 ",
"xxx-computeovsdpdkhtoff-0 : ok=28 changed=1 unreachable=0 failed=1 skipped=85 rescued=0 ignored=0 ",
"xxx-computesriov-0 : ok=29 changed=2 unreachable=0 failed=0 skipped=85 rescued=0 ignored=0 ",
"xxx-computesriov-1 : ok=28 changed=1 unreachable=0 failed=1 skipped=85 rescued=0 ignored=0 ",
"xxx-computesriov-2 : ok=28 changed=1 unreachable=0 failed=1 skipped=85 rescued=0 ignored=0 ",
"xxx-computesriov-3 : ok=29 changed=2 unreachable=0 failed=0 skipped=85 rescued=0 ignored=0 ",
"xxx-computesriov-4 : ok=29 changed=2 unreachable=0 failed=0 skipped=85 rescued=0 ignored=0 ",
"xxx-computesriovhtoff-0 : ok=28 changed=1 unreachable=0 failed=1 skipped=85 rescued=0 ignored=0 ",
"xxx-computesriovhtoff-1 : ok=28 changed=1 unreachable=0 failed=1 skipped=85 rescued=0 ignored=0 ",
"xxx-computesriovhtoff-2 : ok=29 changed=2 unreachable=0 failed=0 skipped=85 rescued=0 ignored=0 ",
"xxx-computesriovhtoff-3 : ok=29 changed=2 unreachable=0 failed=0 skipped=85 rescued=0 ignored=0 ",
"xxx-computesriovhtoff-4 : ok=29 changed=2 unreachable=0 failed=0 skipped=85 rescued=0 ignored=0 ",
"xxx-controller-0 : ok=43 changed=2 unreachable=0 failed=1 skipped=86 rescued=0 ignored=0 ",
"xxx-controller-1 : ok=33 changed=1 unreachable=0 failed=1 skipped=81 rescued=0 ignored=0 ",
"xxx-controller-2 : ok=33 changed=1 unreachable=0 failed=1 skipped=81 rescued=0 ignored=0 ",
"Monday 19 June 2023 17:39:22 +0200 (0:01:42.411) 0:04:54.795 *********** ",

We tried to restart chronyd manually, but it failed with the same result.

From strace and the sos report we noticed the following:

~~~
$ grep /openvswitch proc/1/mountinfo | wc -l
8191

entries are similar to

82888 82887 0:23 /openvswitch /run/systemd/unit-root/run/openvswitch rw,nosuid,nodev master
~~~

and

~~~
27798 26 0:23 /openvswitch /run/openvswitch rw,nosuid,nodev shared:26 - tmpfs tmpfs rw,seclabel,mode=755
~~~

Why are these mount points being created? A reboot appears to have resolved the issue.

Version-Release number of selected component (if applicable):
RHOSP 16.1.3 (Train)

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Chronyd fails to start, and PID 1's mountinfo contains about 8k /openvswitch mount entries from OVS.

Expected results:
Chronyd starts normally.

Additional info:
SOS reports, strace output, and further information are available on the case.
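
The mountinfo check from the description (counting `/openvswitch` entries with `grep | wc -l`) could be generalized to show which mount roots are piling up. The following is only a minimal sketch, not a tool used on the case: the function name `count_mounts_by_root` and the `SAMPLE` variable are hypothetical, and the sample holds just the two entries quoted in this report. It relies on the standard mountinfo field order, where the third field is the device (major:minor) and the fourth is the mount root.

```python
from collections import Counter


def count_mounts_by_root(mountinfo_text):
    """Count mountinfo entries per (device, root) pair.

    Hypothetical helper: a large count for one pair hints at
    accumulated bind mounts of the same directory.
    """
    counts = Counter()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        device, root = fields[2], fields[3]
        counts[(device, root)] += 1
    return counts


# The two entries quoted in the description (the first one is truncated
# in the report and is kept as-is).
SAMPLE = """\
82888 82887 0:23 /openvswitch /run/systemd/unit-root/run/openvswitch rw,nosuid,nodev master
27798 26 0:23 /openvswitch /run/openvswitch rw,nosuid,nodev shared:26 - tmpfs tmpfs rw,seclabel,mode=755
"""

if __name__ == "__main__":
    # Print the most frequent (device, root) pairs first.
    for (device, root), n in count_mounts_by_root(SAMPLE).most_common(5):
        print(n, device, root)
```

On an affected node the same function could be fed the contents of `/proc/1/mountinfo` (or `proc/1/mountinfo` inside an extracted sos report) instead of `SAMPLE`.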