Bug 2140893 - systemd[1]: Failed to start Initial cloud-init job after reboot system via sysrq 'b' [RHEL-9]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: cloud-init
Version: 9.1
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Emanuele Giuseppe Esposito
QA Contact: xiachen
URL:
Whiteboard:
Depends On:
Blocks: 2162258 2165942
Reported: 2022-11-08 06:06 UTC by Frank Liang
Modified: 2023-05-09 08:18 UTC
CC List: 15 users

Fixed In Version: cloud-init-22.1-8.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned As: 2162258 2165942
Environment:
Last Closed: 2023-05-09 07:30:25 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
cloud-init.log (387.60 KB, text/plain), attached 2022-11-08 06:06 UTC by Frank Liang


Links
- Gitlab redhat/centos-stream/src cloud-init merge_requests 41 (status: opened): "cc_set_hostname: ignore /var/lib/cloud/data/set-hostname if it's empty (#1967)"; last updated 2023-01-19 08:54:30 UTC
- Red Hat Issue Tracker RHELPLAN-138632; last updated 2022-11-08 06:12:23 UTC
- Red Hat Product Errata RHBA-2023:2183; last updated 2023-05-09 07:30:38 UTC

Description Frank Liang 2022-11-08 06:06:08 UTC
Created attachment 1922984
cloud-init.log

Description of problem:

After starting an instance running RHEL-9.1 on an AWS t4g.large system and rebooting it via sysrq 'b', the system is not accessible via SSH and the cloud-init service fails to start.
$ cat cloud-init.service.log
× cloud-init.service - Initial cloud-init job (metadata service crawler)
     Loaded: loaded (/usr/lib/systemd/system/cloud-init.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Tue 2022-11-08 02:33:59 UTC; 19min ago
    Process: 703 ExecStart=/usr/bin/cloud-init init (code=exited, status=1/FAILURE)
   Main PID: 703 (code=exited, status=1/FAILURE)
        CPU: 424ms

Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]:     return _default_decoder.decode(s)
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]:   File "/usr/lib64/python3.9/json/decoder.py", line 337, in decode
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]:     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]:   File "/usr/lib64/python3.9/json/decoder.py", line 355, in raw_decode
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]:     raise JSONDecodeError("Expecting value", s, err.value) from None
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal cloud-init[801]: ------------------------------------------------------------
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: cloud-init.service: Main process exited, code=exited, status=1/FAILURE
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: cloud-init.service: Failed with result 'exit-code'.
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: Failed to start Initial cloud-init job (metadata service crawler).
$ cat journal.log|grep sshd
Nov 08 02:33:57 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: Created slice Slice /system/sshd-keygen.
Nov 08 02:33:57 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: Reached target sshd-keygen.target.
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal sshd[949]: sshd: no hostkeys available -- exiting.
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: sshd.service: Main process exited, code=exited, status=1/FAILURE
Nov 08 02:33:59 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: sshd.service: Failed with result 'exit-code'.
Nov 08 02:34:41 ip-10-22-1-50.us-west-2.compute.internal systemd[1]: sshd.service: Scheduled restart job, restart counter is at 1.
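The last line of the cloud-init traceback above is exactly the error Python's json module raises when it is asked to parse an empty string, which is consistent with the empty "/var/lib/cloud/data/set-hostname" file identified in comment 16. A minimal standalone check (the helper name decode_error_for is illustrative only):

```python
import json


def decode_error_for(text: str) -> str:
    """Return the JSONDecodeError line produced when parsing `text`, or '' if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # Format the message the same way it appears in the journal.
        return f"json.decoder.JSONDecodeError: {err}"
    return ""


# Reading an empty file yields "", which reproduces the logged error.
print(decode_error_for(""))
# -> json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

# Well-formed JSON parses without error.
print(repr(decode_error_for('{"hostname": "example"}')))
# -> ''
```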
RHEL Version:
RHEL-9.1 (5.14.0-162.6.1.el9_1.x86_64)

How reproducible:
50%

Steps to Reproduce:
1. Create an AWS t4g.large instance using the RHEL-9.1.0_HVM-20221101 image.
2. Trigger a system reboot: 'echo b > /proc/sysrq-trigger & echo b > /proc/sysrq-trigger'
3. Repeat steps 1 and 2 if the issue does not reproduce.
4. Optionally, reproduce it automatically:
$ os-tests --user ec2-user --keyfile /home/virtqe_s1.pem --platform_profile /home/aws.yaml -p test_reboot_simultaneous
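When a rebooted instance turns out to be unreachable, one way to confirm it hit this particular failure is to inspect the cached hostname file from a rescue environment. The sketch below assumes cloud-init's default data directory (/var/lib/cloud/data); the helpers read_cache and classify_set_hostname are hypothetical, not part of cloud-init.

```python
import json
import os
from typing import Optional

# Path named in this report; assumes cloud-init's default cloud_dir (/var/lib/cloud).
SET_HOSTNAME_PATH = "/var/lib/cloud/data/set-hostname"


def read_cache(path: str = SET_HOSTNAME_PATH) -> Optional[bytes]:
    """Return the raw bytes of the cache file, or None if it does not exist."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as fh:
        return fh.read()


def classify_set_hostname(content: Optional[bytes]) -> str:
    """Classify cached hostname data as 'missing', 'empty', 'invalid-json' or 'ok'."""
    if content is None:
        return "missing"
    if len(content) == 0:
        # Zero-length file: the condition shown in comment 16 to trigger the exception.
        return "empty"
    try:
        json.loads(content)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return "invalid-json"
    return "ok"


if __name__ == "__main__":
    try:
        print(classify_set_hostname(read_cache()))
    except OSError as err:
        print(f"cannot read {SET_HOSTNAME_PATH}: {err}")
```

A result of "empty" matches the condition described in this bug; "missing" or "ok" point to a different cause.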


Actual results:
The system cannot be accessed via SSH after it boots up.

Expected results:
The system boots up and can be accessed normally.

Additional info:
- N/A

Comment 8 Eduardo Otubo 2022-12-06 09:00:36 UTC
Raising the priority, as VMs becoming inaccessible is not acceptable. @eesposit, can you follow up upstream? Thanks!

Comment 13 Eduardo Otubo 2022-12-19 13:43:26 UTC
Is this bug a regression or does it happen on other branches too?

Comment 14 Emanuele Giuseppe Esposito 2022-12-19 14:02:08 UTC
Frank can correct me, but it seems to affect 9.1, and therefore also 9.2/8.7/8.8.

Comment 16 Frank Liang 2022-12-28 04:01:05 UTC
(In reply to Emanuele Giuseppe Esposito from comment #14)
> Frank can correct me, but it seems to affect 9.1, and therefore also 9.2/8.7/8.8.

Yes. We can also trigger this exception by manually emptying "/var/lib/cloud/data/set-hostname".

Comment 19 Emanuele Giuseppe Esposito 2023-01-17 08:37:40 UTC
upstream PR: https://github.com/canonical/cloud-init/pull/1967
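The upstream change is titled "cc_set_hostname: ignore /var/lib/cloud/data/set-hostname if it's empty". The following is a minimal sketch of that idea, not the actual patch: a missing or zero-length cache file is treated as "no previous hostname recorded", so the caller proceeds instead of crashing in the JSON parser. The function names load_previous_hostname and demo, and the "hostname"/"fqdn" keys, are assumptions for illustration.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def load_previous_hostname(path: str) -> Dict[str, Any]:
    """Return previously recorded hostname data, or {} when there is none.

    Hypothetical helper: a missing or zero-length cache file is treated as
    "no previous hostname" instead of being passed to the JSON parser.
    Non-empty but malformed content still raises, since only the empty
    case is covered by the upstream change's title.
    """
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def demo() -> List[Dict[str, Any]]:
    """Run the helper against an empty and then a populated cache file."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "set-hostname")
        # Zero-length file, mirroring the manual reproduction in comment 16.
        open(path, "w").close()
        results.append(load_previous_hostname(path))
        # Populated file using the assumed keys.
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"hostname": "example", "fqdn": "example.internal"}, fh)
        results.append(load_previous_hostname(path))
    return results


if __name__ == "__main__":
    print(demo())
```

Whether a malformed but non-empty file should also be ignored is a separate design decision; this sketch deliberately keeps the narrower behavior described by the merge request title.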

Comment 32 errata-xmlrpc 2023-05-09 07:30:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (cloud-init bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2183

