Bug 1866738

Summary: [OSP] OCP failed to install with rhcos-46.82.202008030340-0
Product: OpenShift Container Platform
Component: RHCOS
Version: 4.6
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Reporter: weiwei jiang <wjiang>
Assignee: Benjamin Gilbert <bgilbert>
QA Contact: Michael Nguyen <mnguyen>
CC: achernet, bbreard, bgilbert, dsanzmor, gfontana, imcleod, jialiu, jligon, miabbott, nstielau, rpecora, smaitra
Keywords: TestBlocker, TestBlockerForLayeredProduct
Whiteboard: Telco
Last Closed: 2020-10-27 16:25:22 UTC
Type: Bug
Attachments (description: flags):
rhcos boot log: none
rhcos boot log on baremetal: none

Description weiwei jiang 2020-08-06 08:51:12 UTC
Description of problem: 
After the RHCOS bump to rhcos-46.82.202008030340-0, OCP on OSP no longer installs.
We cannot provide more detail because we cannot SSH into the servers.

The RHCOS server boot log is attached.

rhcos-46.82.202007212240-0 works well.

 
Version-Release number of selected component (if applicable): 
rhcos-46.82.202008030340-0 
./openshift-install 4.6.0-0.nightly-2020-08-06-062308 
built from commit 1da6208796da9e0fa4cd8a4fa1467094342c6168 
release image registry.svc.ci.openshift.org/ocp/release@sha256:eebe346a5c0bc115c6dcea5e7c4cf2203c3d70bb72afb186f8a805fb6c80bcc5 
 
How reproducible: 
Always 
 
Steps to Reproduce: 
1. Try to set up an OCP cluster on OSP with the target RHCOS, using either IPI or UPI.
 
Actual results: 
The install fails with both IPI and UPI.
 
Expected results: 
The install should succeed.
 
Additional info:

Comment 1 weiwei jiang 2020-08-06 08:53:35 UTC
Created attachment 1710627 [details]
rhcos boot log

Comment 2 David Sanz 2020-08-06 11:24:48 UTC
Also failing on baremetal.

Last lines of console are (entire log attached):

[  235.444464] EDAC MC0: Giving out device to module amd64_edac controller F17h_M30h: DEV 0000:00:18.3 )
[  235.444476] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 00)
[  235.444476] AMD64 EDAC driver v3.5.0
[  235.445249] Console: switching to colour dummy device 80x25
[  235.453002] ipmi_si IPI0001:00: The BMC does not support setting the recv irq bit, compensating, but.
[  235.459752] [TTM] Zone  kernel: Available graphics memory: 32768880 KiB
[  235.484067] ipmi_si IPI0001:00: Using irq 10
[  235.487179] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[  235.487180] [TTM] Initializing pool allocator
[  235.518043] ipmi_si IPI0001:00: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20)
[  235.520634] [TTM] Initializing DMA pool allocator
[  235.595246] ipmi_si IPI0001:00: IPMI kcs interface initialized
[  235.602624] fbcon: mgag200drmfb (fb0) is primary device
[  235.602665] Console: switching to colour frame buffer device 128x48
[  235.604194] ipmi_ssif: IPMI SSIF Interface driver
[  236.003995] mgag200 0000:c2:00.0: fb0: mgag200drmfb frame buffer device
[  236.012013] [drm] Initialized mgag200 1.0.0 20110418 for 0000:c2:00.0 on minor 0
[  236.167040] i40e: Registered client i40iw
[  236.296748] Rounding down aligned max_sectors from 4294967295 to 4294967288
[  236.303896] db_root: cannot open: /etc/target
[  236.324538] iscsi: registered transport (iser)
[  236.398804] RPC: Registered named UNIX socket transport module.
[  236.398806] RPC: Registered udp transport module.
[  236.398807] RPC: Registered tcp transport module.
[  236.398808] RPC: Registered tcp NFSv4.1 backchannel transport module.
[  236.449055] RPC: Registered rdma transport module.
[  236.449056] RPC: Registered rdma backchannel transport module.

Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Press Enter to continue.

[  738.522839] kauditd_printk_skb: 235 callbacks suppressed
[  738.522840] audit: type=1400 audit(1596712624.001:247): avc:  denied  { unlink } for  pid=2044 comm=0

Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Press Enter to continue.






Cannot SSH into the server; the connection is refused.

Comment 3 David Sanz 2020-08-06 11:25:47 UTC
Created attachment 1710642 [details]
rhcos boot log on baremetal

Comment 4 Micah Abbott 2020-08-06 20:13:41 UTC
In the bare metal case, the bootstrap.log has its messages truncated, which makes debugging a bit difficult. The big thing standing out is the SELinux denials all over the place. There are some ACPI errors in there, but I'm not sure if they are fatal.

@dsanzmor Is it possible to get the bootstrap.log without the lines truncated?  The bare metal failure may be better tracked in a separate BZ.


For the OSP case, the attached log shows that the node appears to have booted successfully with an IP and a login prompt.  The package diff between the two versions, 46.82.202007212240-0 and 46.82.202008030340-0, shows a number of changes, but without additional information, it is hard to understand what is failing.  @wjiang can you provide any other information to help investigation?


```
$ ./differ.py -fe api.ci -fr rhcos-4.6/46.82.202007212240-0 -se api.ci -sr rhcos-4.6/46.82.202008030340-0                                                                                                        
{                                                                                                         
    "sources": {
        "rhcos-4.6/46.82.202007212240-0": "https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202007212240-0/x86_64/commitmeta.json",                                              
        "rhcos-4.6/46.82.202008030340-0": "https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.6/46.82.202008030340-0/x86_64/commitmeta.json"                                               
    },                                                                                                    
    "diff": {
        "NetworkManager": {  
            "rhcos-4.6/46.82.202007212240-0": "NetworkManager-1.22.8-5.el8_2.x86_64",                     
            "rhcos-4.6/46.82.202008030340-0": "NetworkManager-1.22.8-6.el8_2.x86_64"   
        },
        "NetworkManager-libnm": {
            "rhcos-4.6/46.82.202007212240-0": "NetworkManager-libnm-1.22.8-5.el8_2.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "NetworkManager-libnm-1.22.8-6.el8_2.x86_64"
        },
        "NetworkManager-ovs": {
            "rhcos-4.6/46.82.202007212240-0": "NetworkManager-ovs-1.22.8-5.el8_2.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "NetworkManager-ovs-1.22.8-6.el8_2.x86_64"
        },
        "NetworkManager-team": {
            "rhcos-4.6/46.82.202007212240-0": "NetworkManager-team-1.22.8-5.el8_2.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "NetworkManager-team-1.22.8-6.el8_2.x86_64"
        },
        "NetworkManager-tui": {
            "rhcos-4.6/46.82.202007212240-0": "NetworkManager-tui-1.22.8-5.el8_2.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "NetworkManager-tui-1.22.8-6.el8_2.x86_64"
        },
        "conmon": {
            "rhcos-4.6/46.82.202007212240-0": "conmon-2.0.17-1.rhaos4.5.el8.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "conmon-2.0.20-1.rhaos4.6.el8.x86_64"
        },
        "containers-common": {
            "rhcos-4.6/46.82.202007212240-0": "containers-common-1.0.0-1.module+el8.2.1+6676+604e1b26.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "containers-common-1.1.1-2.rhaos4.6.el8.x86_64"
        },
        "coreos-installer": {
            "rhcos-4.6/46.82.202007212240-0": "coreos-installer-0.2.0-4.rhaos4.6.el8.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "coreos-installer-0.5.0-1.rhaos4.6.el8.x86_64"
        },
        "coreos-installer-systemd": {
            "rhcos-4.6/46.82.202007212240-0": "coreos-installer-systemd-0.2.0-4.rhaos4.6.el8.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "Not present"
        },
        "cri-o": {
            "rhcos-4.6/46.82.202007212240-0": "cri-o-1.19.0-41.rhaos4.6.git988f60e.el8.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "cri-o-1.19.0-61.rhaos4.6.git79c1228.el8.x86_64"
        },
        "grub2-common": {
            "rhcos-4.6/46.82.202007212240-0": "grub2-common-2.02-82.el8_2.1.noarch",
            "rhcos-4.6/46.82.202008030340-0": "grub2-common-2.02-87.el8_2.noarch"
        },
        "grub2-efi-x64": {
            "rhcos-4.6/46.82.202007212240-0": "grub2-efi-x64-2.02-82.el8_2.1.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "grub2-efi-x64-2.02-87.el8_2.x86_64"
        },
        "grub2-pc": {
            "rhcos-4.6/46.82.202007212240-0": "grub2-pc-2.02-82.el8_2.1.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "grub2-pc-2.02-87.el8_2.x86_64"
        },
        "grub2-pc-modules": {
            "rhcos-4.6/46.82.202007212240-0": "grub2-pc-modules-2.02-82.el8_2.1.noarch",
            "rhcos-4.6/46.82.202008030340-0": "grub2-pc-modules-2.02-87.el8_2.noarch"
        },
        "grub2-tools": {
            "rhcos-4.6/46.82.202007212240-0": "grub2-tools-2.02-82.el8_2.1.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "grub2-tools-2.02-87.el8_2.x86_64"
        },
        "grub2-tools-extra": {
            "rhcos-4.6/46.82.202007212240-0": "grub2-tools-extra-2.02-82.el8_2.1.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "grub2-tools-extra-2.02-87.el8_2.x86_64"
        },
        "grub2-tools-minimal": {
            "rhcos-4.6/46.82.202007212240-0": "grub2-tools-minimal-2.02-82.el8_2.1.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "grub2-tools-minimal-2.02-87.el8_2.x86_64"
        },
        "ignition": {
            "rhcos-4.6/46.82.202007212240-0": "ignition-2.3.0-1.rhaos4.6.gitee616d5.el8.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "ignition-2.5.0-1.rhaos4.6.git0d6f3e5.el8.x86_64"
        },
        "openshift-clients": {
            "rhcos-4.6/46.82.202007212240-0": "openshift-clients-4.6.0-202007212120.p0.git.3658.e2f0cb0.el8.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "openshift-clients-4.6.0-202008011451.p0.git.3685.3939f2f.el8.x86_64"
        },
        "openshift-hyperkube": {
            "rhcos-4.6/46.82.202007212240-0": "openshift-hyperkube-4.6.0-202007110420.p1.git.0.4de1d1d.el8.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "openshift-hyperkube-4.6.0-202008011154.p0.git.93402.577b186.el8.x86_64"
        },
        "openvswitch2.13": {
            "rhcos-4.6/46.82.202007212240-0": "openvswitch2.13-2.13.0-39.el8fdp.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "openvswitch2.13-2.13.0-49.el8fdp.x86_64"
        },
        "shim-x64": {
            "rhcos-4.6/46.82.202007212240-0": "shim-x64-15-11.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "shim-x64-15-15.el8_2.x86_64"
        },
        "skopeo": {
            "rhcos-4.6/46.82.202007212240-0": "skopeo-1.0.0-1.module+el8.2.1+6676+604e1b26.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "skopeo-1.1.1-2.rhaos4.6.el8.x86_64"
        },
        "toolbox": {
            "rhcos-4.6/46.82.202007212240-0": "toolbox-0.0.7-1.rhaos4.5.el8.noarch",
            "rhcos-4.6/46.82.202008030340-0": "toolbox-0.0.8-1.rhaos4.6.el8.noarch"
        },
        "coreos-installer-bootinfra": {
            "rhcos-4.6/46.82.202007212240-0": "Not present",
            "rhcos-4.6/46.82.202008030340-0": "coreos-installer-bootinfra-0.5.0-1.rhaos4.6.el8.x86_64"
        },
        "openssl-pkcs11": {
            "rhcos-4.6/46.82.202007212240-0": "Not present",
            "rhcos-4.6/46.82.202008030340-0": "openssl-pkcs11-0.4.10-2.el8.x86_64"
        }
    }
}
```
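A diff this long is easier to scan as a flat list of before/after pairs. The following is only a sketch, assuming the JSON shape shown above (a `sources` map listing the older build first, and a `diff` map keyed by package name). The function names `changed_packages` and `presence_changes` are hypothetical and are not part of `differ.py`.

```python
import json

# Two-entry excerpt in the same shape as the differ.py output above.
# The source URLs are replaced with placeholders for brevity.
SAMPLE = json.dumps({
    "sources": {
        "rhcos-4.6/46.82.202007212240-0": "commitmeta.json URL of the older build",
        "rhcos-4.6/46.82.202008030340-0": "commitmeta.json URL of the newer build",
    },
    "diff": {
        "ignition": {
            "rhcos-4.6/46.82.202007212240-0": "ignition-2.3.0-1.rhaos4.6.gitee616d5.el8.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "ignition-2.5.0-1.rhaos4.6.git0d6f3e5.el8.x86_64",
        },
        "coreos-installer-systemd": {
            "rhcos-4.6/46.82.202007212240-0": "coreos-installer-systemd-0.2.0-4.rhaos4.6.el8.x86_64",
            "rhcos-4.6/46.82.202008030340-0": "Not present",
        },
    },
})


def changed_packages(differ_output):
    """Return (package, old, new) tuples sorted by package name.

    Assumes the first key under "sources" is the older build.
    """
    report = json.loads(differ_output)
    old_build, new_build = list(report["sources"])
    return [
        (name, versions[old_build], versions[new_build])
        for name, versions in sorted(report["diff"].items())
    ]


def presence_changes(rows):
    """Keep only packages that were added or removed between the builds."""
    return [row for row in rows if "Not present" in (row[1], row[2])]


if __name__ == "__main__":
    for name, old, new in changed_packages(SAMPLE):
        print(f"{name}: {old} -> {new}")
```

In this report the Ignition jump from 2.3.0 to 2.5.0 is the entry that turned out to matter.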

Comment 5 weiwei jiang 2020-08-07 02:05:26 UTC
I tried again and got the same result. Even though the boot log shows a login prompt, I cannot SSH into the server.
Is there anything I can try in order to collect more details?

# openstack server list --name wj46                                                                                                                                                                               
+--------------------------------------+-----------------------------+--------+--------------------------------------------------------+-------------------------+-----------+
| ID                                   | Name                        | Status | Networks                                               | Image                   | Flavor    |
+--------------------------------------+-----------------------------+--------+--------------------------------------------------------+-------------------------+-----------+
| 641fd4bd-a4e3-4497-a7f1-595a6d787390 | wj46ios807a-q8ht2-bootstrap | ACTIVE | wj46ios807a-q8ht2-openshift=192.168.0.211, 10.0.103.46 | wj46ios807a-q8ht2-rhcos | m1.xlarge |
| 453e0f5e-5f00-4f49-bfce-57bbf7c9f8f9 | wj46ios807a-q8ht2-master-2  | ACTIVE | wj46ios807a-q8ht2-openshift=192.168.1.2                | wj46ios807a-q8ht2-rhcos | m1.xlarge |
| e5fb6bfc-79cb-4e27-b90a-91a10d7fa002 | wj46ios807a-q8ht2-master-1  | ACTIVE | wj46ios807a-q8ht2-openshift=192.168.2.220              | wj46ios807a-q8ht2-rhcos | m1.xlarge |
| 6f4fdc91-71a7-4293-bc85-656891def5dd | wj46ios807a-q8ht2-master-0  | ACTIVE | wj46ios807a-q8ht2-openshift=192.168.3.150              | wj46ios807a-q8ht2-rhcos | m1.xlarge |
+--------------------------------------+-----------------------------+--------+--------------------------------------------------------+-------------------------+-----------+

$ ssh -i ~/.ssh/openshift-qe.pem core@10.0.103.46 -v
OpenSSH_8.1p1, OpenSSL 1.1.1g FIPS  21 Apr 2020
debug1: Reading configuration data /home/wjiang/.ssh/config
debug1: /home/wjiang/.ssh/config line 12: Applying options for 10.*
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug1: /etc/ssh/ssh_config.d/05-redhat.conf line 8: Applying options for *
debug1: Connecting to 10.0.103.46 [10.0.103.46] port 22.
debug1: fd 4 clearing O_NONBLOCK
debug1: Connection established.
debug1: identity file /home/wjiang/.ssh/openshift-qe.pem type -1
debug1: identity file /home/wjiang/.ssh/openshift-qe.pem-cert type -1
debug1: identity file /home/wjiang/.ssh/libra-new.pem type -1
debug1: identity file /home/wjiang/.ssh/libra-new.pem-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_8.1
debug1: Remote protocol version 2.0, remote software version OpenSSH_8.0
debug1: match: OpenSSH_8.0 pat OpenSSH* compat 0x04000000
debug1: Authenticating to 10.0.103.46:22 as 'core'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: aes256-gcm MAC: <implicit> compression: none
debug1: kex: client->server cipher: aes256-gcm MAC: <implicit> compression: none
debug1: kex: curve25519-sha256 need=32 dh_need=32
debug1: kex: curve25519-sha256 need=32 dh_need=32
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:RDYSsCy6DvPzso9rZrUgjjdB7RoCClCf/1CARjQXkS4
Warning: Permanently added '10.0.103.46' (ECDSA) to the list of known hosts.
debug1: rekey out after 4294967296 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 4294967296 blocks
debug1: Will attempt key: /home/wjiang/.ssh/openshift-qe.pem  explicit
debug1: Will attempt key: /home/wjiang/.ssh/libra-new.pem  explicit
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: Next authentication method: publickey
debug1: Trying private key: /home/wjiang/.ssh/openshift-qe.pem
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: Trying private key: /home/wjiang/.ssh/libra-new.pem
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: No more authentication methods to try.
core@10.0.103.46: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

Comment 6 weiwei jiang 2020-08-07 02:10:10 UTC
If the server booted successfully, how can I check whether the Ignition config was injected into the server? The 4.6 output differs from 4.5.

Comment 7 weiwei jiang 2020-08-07 02:13:21 UTC
[    7.784431] ignition[719]: Ignition 2.5.0
[    7.787822] ignition[719]: Stage: fetch-offline
[    7.791479] ignition[719]: fetched base config from "system"
[    7.795587] ignition[719]: reading system config file "/usr/lib/ignition/base.ign"
[    7.801315] ignition[719]: no config URL provided
[    7.804568] ignition[719]: reading system config file "/usr/lib/ignition/user.ign"
[    7.810299] ignition[719]: no config at "/usr/lib/ignition/user.ign"
[    7.814545] ignition[719]: failed to fetch config from metadata service: resource requires networking
[*     ] A start job is running for Ignition (fetch-offline) (8s / no limit) ... (32s / no limit)
[   37.786130] ignition[719]: neither config drive nor metadata service were available in time. Continuing without a config...
[   37.795374] ignition[719]: not a config (empty): provider config was empty, continuing with empty cache config
[   37.801472] ignition[719]: timed out while fetching config from config drive (CONFIG-2)
[  OK  ] Started Ignition (fetch-offline).
[   37.808937] systemd[1]: Started Ignition (fetch-offline).
[   37.812515] ignition[719]: timed out while fetching config from config drive (config-2)
         Starting Copy CoreOS Firstboot Networking Config...
[   37.819200] systemd[1]: Starting Copy CoreOS Firstboot Networking Config...
         Starting Check for FIPS mode...
[   37.824334] ignition[719]: fetch-offline: fetch-offline passed


Does this mean the Ignition fetch failed?

Comment 8 Johnny Liu 2020-08-07 04:11:08 UTC
I am running a bare metal install using the same boot image to launch VMs, and I hit the same issue.

Bootstrap failed, and I cannot even SSH into the bootstrap VMs. This is also blocking QE's bare metal install.

Comment 10 Micah Abbott 2020-08-07 14:07:32 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1867091 for the bare metal issue


For the OSP case, it looks like the Ignition fetch failed:

```
[   37.786130] ignition[719]: neither config drive nor metadata service were available in time. Continuing without a config...
[   37.795374] ignition[719]: not a config (empty): provider config was empty, continuing with empty cache config
[   37.801472] ignition[719]: timed out while fetching config from config drive (CONFIG-2)
[  OK  ] Started Ignition (fetch-offline).
[   37.808937] systemd[1]: Started Ignition (fetch-offline).
[   37.812515] ignition[719]: timed out while fetching config from config drive (config-2)
```

Earlier it looks like networking isn't available:

`[    7.814545] ignition[719]: failed to fetch config from metadata service: resource requires networking`


@slowrie can you investigate this further?

Comment 11 Benjamin Gilbert 2020-08-07 16:24:41 UTC
This is https://github.com/coreos/ignition/issues/1056, which is fixed upstream and needs a backport to RHCOS.

Comment 12 Benjamin Gilbert 2020-08-08 05:53:16 UTC
Repro: launch an RHCOS instance in an OpenStack cluster that exposes userdata via the OpenStack metadata service.  If the Ignition config is not applied, and you see

    failed to fetch config from metadata service: resource requires networking
    ...
    neither config drive nor metadata service were available in time. Continuing without a config...

...you have the bug.
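
The check described above can be sketched as a small log scan. This is a minimal sketch, assuming the console log is available as plain text; the function name `shows_metadata_fetch_bug` is hypothetical, and the two message strings are copied from the log lines quoted in this report.

```python
# Message fragments copied from the console output quoted in this bug.
FETCH_FAILED = "failed to fetch config from metadata service: resource requires networking"
NO_CONFIG = "neither config drive nor metadata service were available in time"


def shows_metadata_fetch_bug(console_log):
    """Return True if the log has both messages, in the order described above."""
    first = console_log.find(FETCH_FAILED)
    if first == -1:
        return False
    # The "continuing without a config" message must appear after the failure.
    return console_log.find(NO_CONFIG, first) != -1


if __name__ == "__main__":
    log = (
        "[    7.814545] ignition[719]: failed to fetch config from metadata service: "
        "resource requires networking\n"
        "[   37.786130] ignition[719]: neither config drive nor metadata service were "
        "available in time. Continuing without a config...\n"
    )
    print(shows_metadata_fetch_bug(log))
```

A match only indicates the symptom. Whether the Ignition config was actually applied still has to be confirmed on the instance.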

Comment 15 Micah Abbott 2020-08-08 13:53:21 UTC
Additional details:

- the linked fix was included in upstream Ignition 2.6.0
- a downstream package of that version was made: `ignition-2.6.0-1.rhaos4.6.git947598e.el8`

Comment 16 weiwei jiang 2020-08-10 04:55:21 UTC
Checked OCP on OSP with both IPI and UPI, using rhcos-46.82.202008080704-0 and ignition-2.6.0-1.rhaos4.6.git947598e.el8.

Also checked IPI on bare metal with OpenStack; it works well now.

Comment 17 Johnny Liu 2020-08-10 06:08:21 UTC
If everything is okay, please bump the RHCOS version in data/data/rhcos.json so that QE can verify this bug.

Comment 18 Micah Abbott 2020-08-10 20:22:45 UTC
This should be the installer PR - https://github.com/openshift/installer/pull/4036

Comment 19 weiwei jiang 2020-08-11 02:13:45 UTC
(In reply to Micah Abbott from comment #18)
> This should be the installer PR -
> https://github.com/openshift/installer/pull/4036

Hi, the RHCOS build in the PR does not contain the fix. Please update the RHCOS version to rhcos-46.82.202008080704 or later. Thanks
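
Whether a pinned build is new enough can be checked by comparing build IDs. The sketch below assumes that the third dotted field of an RHCOS build ID is a 12-digit build timestamp, and that builds in the same stream (here `46.82`) sort by that timestamp. Both are assumptions drawn from the IDs quoted in this report. The names `parse_build` and `has_fix` are hypothetical.

```python
import re

# Matches IDs such as "rhcos-46.82.202008080704-0" or "46.82.202008080704".
BUILD_RE = re.compile(r"^(?:rhcos-)?(\d+)\.(\d+)\.(\d{12})(?:-(\d+))?$")

# First build reported as verified in comment 16.
FIRST_VERIFIED = "rhcos-46.82.202008080704-0"


def parse_build(build_id):
    """Split a build ID into (major, minor, timestamp, revision) integers."""
    match = BUILD_RE.match(build_id)
    if match is None:
        raise ValueError(f"unrecognized RHCOS build ID: {build_id!r}")
    major, minor, stamp, revision = match.groups()
    return int(major), int(minor), int(stamp), int(revision or 0)


def has_fix(build_id, minimum=FIRST_VERIFIED):
    """Return True if build_id is at or after the minimum build of the same stream."""
    got = parse_build(build_id)
    want = parse_build(minimum)
    if got[:2] != want[:2]:
        raise ValueError("builds belong to different streams; not comparable")
    return got[2:] >= want[2:]


if __name__ == "__main__":
    print(has_fix("rhcos-46.82.202008030340-0"))  # the failing build from this report
    print(has_fix("rhcos-46.82.202008080704-0"))  # the build verified in comment 16
```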

Comment 20 weiwei jiang 2020-08-11 02:20:22 UTC
(In reply to weiwei jiang from comment #16)
> Checked OCP on OSP both IPI and UPI with rhcos-46.82.202008080704-0 and
> ignition-2.6.0-1.rhaos4.6.git947598e.el8
> 
> Also checked with IPI on Baremetal with OpenStack, work well now.

Moving to VERIFIED per comment 16. I will open another BZ for the installer RHCOS bindings.

Comment 21 weiwei jiang 2020-08-11 02:32:05 UTC
The BZ for installer RHCOS binding - https://bugzilla.redhat.com/show_bug.cgi?id=1867853

Comment 23 errata-xmlrpc 2020-10-27 16:25:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 24 Rafael Pécora 2021-02-02 18:39:50 UTC
This bug appears to be closed, but I hit an issue with bare metal PXE boot on IBM Power 9 (ppc64le architecture).
After PXE boot started the RHCOS installation (OCP 4.6.13), the installation program got stuck at this message:

[   15.573440] RPC: Registered rdma backchannel transport module.


Steps to reproduce:

1. A TFTP boot setup configured and reachable on Red Hat Enterprise Linux 8.
2. A GRUB2 PXE menu and grub2.conf on the TFTP server.
3. A live image ISO of CoreOS (rhcos-live.ppc64le.iso).
4. An initrd and rootfs image available on the TFTP server.
5. An httpd server hosting the ISO, images, and Ignition config file.


Start the VM from the HMC (Hardware Management Console) so that it runs the PXE boot menu. The machine reaches the TFTP server, gets an IP address leased by DHCP, and starts the GRUB menu with options similar to those below:

###########
##grub2.cfg

default=0
fallback=1
timeout=1
menuentry "Bootstrap CoreOS (BIOS)" {
echo "Loading kernel Bootstrap"
linux "/rhcos-live-kernel-ppc64le" rd.neednet=1 ip=dhcp console=tty0 console=ttyS0 coreos.inst=yes coreos.inst.install_dev=sda coreos.inst.image_url=http://10.19.7.81/rhcos/rhcos-metal.ppc64le.raw.gz coreos.inst.ignition_url=http://10.19.7.81/bootstrap.ign
echo "Loading initrd"
initrd "/rhcos-4.3.18-ppc64le-installer-initramfs.ppc64le.img"
}
##########

The installation starts and then gets stuck at the message described above.


Expected Results:

RHCOS installation succeeds.

Actual Results:

The installation freezes.


Part of the installation error log:


[   13.138381] systemd[1]: Starting Create Static Device Nodes in /dev...            
[   13.181268] synth uevent: /devices/vio: failed to send uevent                    
[   13.181270] vio vio: uevent: failed to send synthetic uevent                      
[   13.182784] synth uevent: /devices/vio/4000: failed to send uevent                
[   13.182785] vio 4000: uevent: failed to send synthetic uevent                    
[   13.182836] synth uevent: /devices/vio/4001: failed to send uevent                
[   13.182837] vio 4001: uevent: failed to send synthetic uevent                    
[   13.182887] synth uevent: /devices/vio/4002: failed to send uevent                
[   13.182888] vio 4002: uevent: failed to send synthetic uevent                    
[   13.182938] synth uevent: /devices/vio/4004: failed to send uevent                
[   13.182939] vio 4004: uevent: failed to send synthetic uevent                    
[   13.267097] systemd-journald[1224]: Received request to flush runtime journal from PID 1
[   13.522803] pseries_rng: Registering IBM pSeries RNG driver                      
[   15.458025] Rounding down aligned max_sectors from 4294967295 to 4294967168      
[   15.458229] db_root: cannot open: /etc/target                                    
[   15.476879] iscsi: registered transport (iser)                                    
[   15.553414] RPC: Registered named UNIX socket transport module.                  
[   15.553462] RPC: Registered udp transport module.                                
[   15.553487] RPC: Registered tcp transport module.                                
[   15.553511] RPC: Registered tcp NFSv4.1 backchannel transport module.            
[   15.573410] RPC: Registered rdma transport module.                                
[   15.573440] RPC: Registered rdma backchannel transport module.

Comment 25 Benjamin Gilbert 2021-02-02 19:22:14 UTC
That appears to be a different issue from the one reported here.  Please open a new bug.