Bug 2218577 - [OSP16.1] Systemd can't start/restart Chronyd in some nodes on OSP
Summary: [OSP16.1] Systemd can't start/restart Chronyd in some nodes on OSP
Keywords:
Status: CLOSED DUPLICATE of bug 1903091
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 16.1 (Train)
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: RHOSP:NFV_Eng
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-06-29 14:17 UTC by Ricardo Ramos Thomas
Modified: 2023-07-19 12:20 UTC (History)
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-19 12:20:09 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-26225 0 None None None 2023-06-29 14:18:22 UTC

Description Ricardo Ramos Thomas 2023-06-29 14:17:24 UTC
Description of problem:

After a deployment that added new Ceph nodes, the customer hit an issue where systemd cannot start chronyd:

"fatal: [xxx-controller-0]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
        "fatal: [xxx-computeovsdpdkhtoff-0]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
        "fatal: [xxx-controller-1]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
        "fatal: [xxx-computesriov-1]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
        "fatal: [xxx-computesriovhtoff-1]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
        "fatal: [xxx-computesriov-2]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
        "fatal: [xxx-computesriovhtoff-0]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
        "fatal: [xxx-controller-2]: FAILED! => {\"changed\": false, \"msg\": \"Unable to start service chronyd: Job for chronyd.service failed because a timeout was exceeded.\\nSee \\\"systemctl status chronyd.service\\\" and \\\"journalctl -xe\\\" for details.\\n\"}",
        "NO MORE HOSTS LEFT *************************************************************",
        "PLAY RECAP *********************************************************************",
        "xxx-cephstorage-0        : ok=39   changed=2    unreachable=0    failed=0    skipped=87   rescued=0    ignored=0   ",
        "xxx-cephstorage-1        : ok=39   changed=2    unreachable=0    failed=0    skipped=87   rescued=0    ignored=0   ",
        "xxx-cephstorage-2        : ok=39   changed=2    unreachable=0    failed=0    skipped=87   rescued=0    ignored=0   ",
        "xxx-cephstorage-3        : ok=39   changed=2    unreachable=0    failed=0    skipped=87   rescued=0    ignored=0   ",
        "xxx-computeovsdpdk-0     : ok=29   changed=2    unreachable=0    failed=0    skipped=85   rescued=0    ignored=0   ",
        "xxx-computeovsdpdk-1     : ok=29   changed=2    unreachable=0    failed=0    skipped=85   rescued=0    ignored=0   ",
        "xxx-computeovsdpdkhtoff-0 : ok=28   changed=1    unreachable=0    failed=1    skipped=85   rescued=0    ignored=0   ",
        "xxx-computesriov-0       : ok=29   changed=2    unreachable=0    failed=0    skipped=85   rescued=0    ignored=0   ",
        "xxx-computesriov-1       : ok=28   changed=1    unreachable=0    failed=1    skipped=85   rescued=0    ignored=0   ",
        "xxx-computesriov-2       : ok=28   changed=1    unreachable=0    failed=1    skipped=85   rescued=0    ignored=0   ",
        "xxx-computesriov-3       : ok=29   changed=2    unreachable=0    failed=0    skipped=85   rescued=0    ignored=0   ",
        "xxx-computesriov-4       : ok=29   changed=2    unreachable=0    failed=0    skipped=85   rescued=0    ignored=0   ",
        "xxx-computesriovhtoff-0  : ok=28   changed=1    unreachable=0    failed=1    skipped=85   rescued=0    ignored=0   ",
        "xxx-computesriovhtoff-1  : ok=28   changed=1    unreachable=0    failed=1    skipped=85   rescued=0    ignored=0   ",
        "xxx-computesriovhtoff-2  : ok=29   changed=2    unreachable=0    failed=0    skipped=85   rescued=0    ignored=0   ",
        "xxx-computesriovhtoff-3  : ok=29   changed=2    unreachable=0    failed=0    skipped=85   rescued=0    ignored=0   ",
        "xxx-computesriovhtoff-4  : ok=29   changed=2    unreachable=0    failed=0    skipped=85   rescued=0    ignored=0   ",
        "xxx-controller-0         : ok=43   changed=2    unreachable=0    failed=1    skipped=86   rescued=0    ignored=0   ",
        "xxx-controller-1         : ok=33   changed=1    unreachable=0    failed=1    skipped=81   rescued=0    ignored=0   ",
        "xxx-controller-2         : ok=33   changed=1    unreachable=0    failed=1    skipped=81   rescued=0    ignored=0   ",
        "Monday 19 June 2023  17:39:22 +0200 (0:01:42.411)       0:04:54.795 *********** ",


We tried to restart chronyd manually, but it failed the same way.
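
For reference, the manual attempt amounted to something like the following (illustrative sketch; the exact invocations on the customer nodes may have differed):

~~~
# Restart attempt; this times out the same way the deploy task did.
systemctl restart chronyd

# Inspect the failure, as suggested by the error message above.
systemctl status chronyd.service
journalctl -u chronyd.service -xe
~~~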

From the strace output and the sos report we noticed the following:

~~~
$ grep /openvswitch proc/1/mountinfo | wc -l
8191
entries are similar to
82888 82887 0:23 /openvswitch /run/systemd/unit-root/run/openvswitch rw,nosuid,nodev master
~~~
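
A minimal sketch of how the duplicated entries can be quantified on an affected node (assuming shell access; same mountinfo paths as above):

~~~
# Count the openvswitch mounts visible to PID 1, including the copies
# replicated under /run/systemd/unit-root.
grep -c '/openvswitch' /proc/1/mountinfo

# Group the entries by mount point (field 5 of mountinfo) to see where
# the duplicates accumulate.
grep '/openvswitch' /proc/1/mountinfo | awk '{print $5}' | sort | uniq -c | sort -rn | head
~~~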


and

~~~
27798 26 0:23 /openvswitch /run/openvswitch rw,nosuid,nodev shared:26 - tmpfs tmpfs rw,seclabel,mode=755
~~~

Why do these mount points appear?
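
One hedged line of investigation: the shared:26 marker above suggests /run/openvswitch sits in a shared propagation peer group, so copies of the mount can propagate into new mount namespaces, including the per-unit sandbox systemd builds under /run/systemd/unit-root when it starts a service such as chronyd. The propagation flags can be checked directly with findmnt (part of util-linux):

~~~
# Show the propagation mode of /run and of the openvswitch tmpfs (sketch).
findmnt -o TARGET,FSTYPE,PROPAGATION /run
findmnt -o TARGET,FSTYPE,PROPAGATION /run/openvswitch
~~~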


A reboot appears to have resolved the issue.


Version-Release number of selected component (if applicable):

RHOSP 16.1.3 (Train)

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
chronyd fails to start, and roughly 8k openvswitch-related mount entries are present in /proc/1/mountinfo.

Expected results:

chronyd starts normally.

Additional info:

SOS reports, strace output, and further details are available on the case.

Comment 6 Robin Jarry 2023-07-19 12:20:09 UTC

*** This bug has been marked as a duplicate of bug 1903091 ***

