Bug 1779755

Summary: RHCOS Installer kernel panics during pxeboot
Product: OpenShift Container Platform
Reporter: Scott Dodson <sdodson>
Component: RHCOS
Assignee: Colin Walters <walters>
Status: CLOSED WORKSFORME
QA Contact: Michael Nguyen <mnguyen>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.3.0
CC: bbreard, dustymabe, imcleod, jligon, miabbott, nstielau, rnoriega
Target Milestone: ---   
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-12-12 19:08:29 UTC
Type: Bug
Bug Blocks: 1773108, 1776011    
Attachments: serial console dump (Flags: none)

Description Scott Dodson 2019-12-04 16:29:11 UTC
Created attachment 1642124
serial console dump

PXE-booting the RHCOS installer fails with a kernel panic, and the kernel seemingly never attempts to decompress the initrd. The full output is attached; an excerpt follows.


iPXE 1.0.255+(3fe683e) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP HTTPS iSCSI TFTP SRP VLAN AoE EFI Menu

net0: 98:03:9b:4a:c8:2c using ConnectX-4Lx on 0000:01:00.0 (open)
  [Link:up, TX:0 TXE:0 RX:0 RXE:0]
Configuring (net0 98:03:9b:4a:c8:2c).................. ok
net0: 147.75.90.146/255.255.255.252 gw 147.75.90.145
net0: fe80::9a03:9bff:fe4a:c82c/64
net1: fe80::9a03:9bff:fe4a:c82d/64 (inaccessible)
Next server: 147.75.200.3
Filename: http://147.75.200.3/auto.ipxe
http://147.75.200.3/auto.ipxe... ok
auto.ipxe : 433 bytes [script]
Packet.net Baremetal - iPXE boot
http://147.75.200.3/phone-home... ok
http://http-matchbox.svc.ci.openshift.org/ipxe... ok
https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.3/43.81.201911221453.0/x86_64/rhcos-43.81.201911221453.0-installer-kernel-x86_64..... ok                              
https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.3/43.81.201911221453.0/x86_64/rhcos-43.81.201911221453.0-installer-initramfs.x86_64.img... ok 
[    0.000000] Linux version 4.18.0-147.0.3.el8_1.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20190507 (Red Hat 8.3.1-4) (GCC)) #1 SMP Mon Nov 11 12:58:36 UTC 2019
[    0.000000] Command line: rhcos-43.81.201911221453.0-installer-kernel-x86_64 console=tty0 console=ttyS1,115200n8 rd.neednet=1 coreos.inst=yes coreos.inst.image_url=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.3/43.81.201911221453.0/x86_64/rhcos-43.81.201911221453.0-metal.x86_64.raw.gz coreos.inst.skip_media_check coreos.inst.ignition_url=http://http-matchbox.svc.ci.openshift.org/ignition?cluster_id=ci-op-t04x6260-d3e37&role=bootstrap

...

[    4.924261] md: Waiting for all devices to be available before autodetect
[    4.931008] usb 3-1.4: new high-speed USB device number 4 using xhci_hcd
[    4.937640] md: If you don't use raid, use raid=noautodetect
[    4.964064] md: Autodetecting RAID arrays.
[    4.975729] md: autorun ...
[    4.985627] md: ... autorun DONE.
[    4.996045] List of all partitions:
[    5.006466] No filesystem could mount root, tried: 
[    5.006466] 
[    5.026509] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    5.027926] usb 3-1.4: New USB device found, idVendor=1604, idProduct=10c0, bcdDevice= 0.00
[    5.041691] CPU: 39 PID: 1 Comm: swapper/0 Not tainted 4.18.0-147.0.3.el8_1.x86_64 #1
[    5.041692] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019
[    5.041694] Call Trace:
[    5.057107] usb 3-1.4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    5.071913]  dump_stack+0x5c/0x80
[    5.119662]  panic+0xe7/0x247
[    5.129144]  mount_block_root+0x2c2/0x2e6
[    5.139688]  ? do_early_param+0x91/0x91
[    5.149814] hub 3-1.4:1.0: USB hub found
[    5.150103]  prepare_namespace+0x135/0x16b
[    5.150105]  kernel_init_freeable+0x22e/0x258
[    5.150109]  ? rest_init+0xaa/0xaa
[    5.160800] hub 3-1.4:1.0: 4 ports detected
[    5.170997]  kernel_init+0xa/0xff
[    5.171000]  ret_from_fork+0x22/0x40
[    5.181797] Kernel Offset: 0x25e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    5.236524] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
[    5.251953] ------------[ cut here ]------------
[    5.262314] sched: Unexpected reschedule of offline CPU#0!
[    5.273634] WARNING: CPU: 39 PID: 1 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x34/0x40
[    5.288805] Modules linked in:
[    5.297708] CPU: 39 PID: 1 Comm: swapper/0 Not tainted 4.18.0-147.0.3.el8_1.x86_64 #1
[    5.311513] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019
[    5.325189] RIP: 0010:native_smp_send_reschedule+0x34/0x40
[    5.336766] Code: 05 51 8b 3b 01 73 15 48 8b 05 e8 aa 0f 01 be fd 00 00 00 48 8b 40 30 e9 da 4e bb 00 89 fe 48 c7 c7 78 7f e7 a7 e8 c6 40 06 00 <0f> 0b c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 83 ec 20
[    5.368263] RSP: 0018:ffff9491af443ec8 EFLAGS: 00010082
[    5.379918] RAX: 0000000000000000 RBX: ffff9489c6582f80 RCX: ffffffffa805a308
[    5.393621] RDX: 0000000000000001 RSI: 0000000000000092 RDI: 0000000000000046
[    5.407394] RBP: 0000000000000000 R08: 00000000000004f1 R09: 0000000000aaaaaa
[    5.421175] R10: 0000000000000000 R11: ffffaeba4eaff020 R12: ffffaeba40077cf8
[    5.434865] R13: ffff9491af45cf80 R14: ffffffffa6f460a0 R15: ffff9491af45d0b8
[    5.448484] FS:  0000000000000000(0000) GS:ffff9491af440000(0000) knlGS:0000000000000000
[    5.463102] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.475365] CR2: 0000000000000000 CR3: 00000002e680a000 CR4: 00000000003406e0
[    5.489044] Call Trace:
[    5.498073]  <IRQ>
[    5.506585]  update_process_times+0x4f/0x60
[    5.517339]  tick_sched_handle+0x22/0x60
[    5.527838]  tick_sched_timer+0x37/0x70
[    5.538185]  __hrtimer_run_queues+0x100/0x280
[    5.548979]  hrtimer_interrupt+0x100/0x220
[    5.559382]  smp_apic_timer_interrupt+0x6a/0x140
[    5.570209]  apic_timer_interrupt+0xf/0x20
[    5.580424]  </IRQ>
[    5.588502] RIP: 0010:panic+0x201/0x247
[    5.598266] Code: eb a6 83 3d 5f 1b 98 01 00 74 05 e8 08 5c 02 00 48 c7 c6 20 22 83 a8 48 c7 c7 a8 2a e8 a7 e8 c3 66 06 00 fb 66 0f 1f 44 00 00 <31> db e8 27 a9 0d 00 4c 39 eb 7c 1d 41 83 f4 01 48 8b 05 07 1b 98
[    5.629980] RSP: 0018:ffffaeba40077da8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[    5.644498] RAX: 000000000000005c RBX: ffff9491ae991000 RCX: ffffffffa805a308
[    5.658449] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000046
[    5.673138] RBP: ffffaeba40077e18 R08: 00000000000004ef R09: 0000000000aaaaaa
[    5.686902] R10: 0000000000000000 R11: ffffaeba4eaff020 R12: 0000000000000000
[    5.700814] R13: 0000000000000000 R14: ffffecb141ba6440 R15: 0000000000008001
[    5.714550]  ? panic+0x1fa/0x247
[    5.724356]  mount_block_root+0x2c2/0x2e6
[    5.734908]  ? do_early_param+0x91/0x91
[    5.745414]  prepare_namespace+0x135/0x16b
[    5.755968]  kernel_init_freeable+0x22e/0x258
[    5.766661]  ? rest_init+0xaa/0xaa
[    5.776840]  kernel_init+0xa/0xff
[    5.787685]  ret_from_fork+0x22/0x40
[    5.797056] ---[ end trace 06110e23a3e8843f ]---
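The description notes that the kernel seemingly never tries to unpack the initrd, and the command line logged above contains no `initrd=` argument (the 4.2 log in comment 1 has none either). When iPXE hands off to a kernel's EFI stub (this iPXE build lists `EFI` among its features), kernels of this era generally locate the initramfs through an `initrd=` argument naming the loaded image. This report does not say whether that is what installer PR 2776 changed, so treat it only as a hypothesis. A minimal sketch for triaging a saved serial console dump follows; `inspect_console`, `PANIC_SIGNATURE`, and `CMDLINE_RE` are hypothetical names, not part of any existing tool.

```python
import re
import sys

# Hypothetical triage helper for illustration only.
# Panic text copied from the console dump above; the kernel prints it when
# no root filesystem could be mounted and no initramfs took over.
PANIC_SIGNATURE = "VFS: Unable to mount root fs on unknown-block(0,0)"

# The kernel echoes its command line early in boot as "Command line: ...".
CMDLINE_RE = re.compile(r"Command line:\s*(.*)")


def inspect_console(text: str) -> dict:
    """Summarize a serial console dump.

    Assumes the kernel command line fits on one line of the dump; a
    wrapped line must be rejoined first.
    """
    cmdline = None
    for line in text.splitlines():
        match = CMDLINE_RE.search(line)
        if match:
            cmdline = match.group(1).strip()
            break
    args = cmdline.split() if cmdline else []
    return {
        "cmdline": cmdline,
        # True if any argument names an initrd, e.g. "initrd=main".
        "has_initrd_arg": any(arg.startswith("initrd=") for arg in args),
        "root_mount_panic": PANIC_SIGNATURE in text,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_console.py serial-console.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(inspect_console(f.read()))
```

For the excerpt above, `has_initrd_arg` would be `False` and `root_mount_panic` would be `True`.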

Comment 1 Scott Dodson 2019-12-04 18:39:10 UTC
FWIW, the same panic is observed with 4.2 on these Packet c2.medium instances.

https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.2/42.80.20191002.0/rhcos-42.80.20191002.0-installer-kernel.... ok                           
https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.2/42.80.20191002.0/rhcos-42.80.20191002.0-installer-initramfs.img... ok 
[    0.000000] Linux version 4.18.0-80.11.2.el8_0.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 8.2.1 20180905 (Red Hat 8.2.1-3) (GCC)) #1 SMP Sun Sep 15 11:24:21 UTC 2019
[    0.000000] Command line: rhcos-42.80.20191002.0-installer-kernel console=tty0 console=ttyS1,115200n8 rd.neednet=1 coreos.inst=yes coreos.inst.image_url=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.2/42.80.20191002.0/rhcos-42.80.20191002.0-metal-bios.raw.gz coreos.inst.install_dev=sda coreos.inst.skip_media_check coreos.inst.ignition_url=http://http-matchbox.svc.ci.openshift.org/ignition?cluster_id=ci-op-8q2g8gz2-c3585&role=bootstrap
...
[    4.904341] List of all partitions:
[    4.911657] hub 3-1.1:1.0: USB hub found
[    4.914693] No filesystem could mount root, tried: 
[    4.925566] 
[    4.925731] hub 3-1.1:1.0: 4 ports detected
[    4.937464] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    4.959238] random: fast init done
[    4.972273] CPU: 33 PID: 1 Comm: swapper/0 Not tainted 4.18.0-80.11.2.el8_0.x86_64 #1
[    4.997611] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.5.6 08/17/2018
[    5.012262] Call Trace:
[    5.021623]  dump_stack+0x5c/0x80
[    5.031665]  panic+0xe7/0x247
[    5.034008] usb 3-1.4: new high-speed USB device number 4 using xhci_hcd
[    5.041123]  mount_block_root+0x2bd/0x2e1
[    5.041127]  ? do_early_param+0x91/0x91
[    5.075184]  prepare_namespace+0x135/0x16b
[    5.085875]  kernel_init_freeable+0x22e/0x258
[    5.096720]  ? rest_init+0xaa/0xaa
[    5.106421]  kernel_init+0xa/0x106
[    5.115935]  ret_from_fork+0x22/0x40
[    5.125947] Kernel Offset: 0x26c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    5.142961] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
[    5.158701] ------------[ cut here ]------------
[    5.169484] sched: Unexpected reschedule of offline CPU#0!
[    5.181047] WARNING: CPU: 33 PID: 1 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x34/0x40
[    5.196355] Modules linked in:
[    5.205484] CPU: 33 PID: 1 Comm: swapper/0 Not tainted 4.18.0-80.11.2.el8_0.x86_64 #1
[    5.219621] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.5.6 08/17/2018
[    5.233490] RIP: 0010:native_smp_send_reschedule+0x34/0x40
[    5.245246] Code: 05 81 92 3b 01 73 15 48 8b 05 58 bf 10 01 be fd 00 00 00 48 8b 40 30 e9 7a 94 bb 00 89 fe 48 c7 c7 90 6c c8 a8 e8 76 29 06 00 <0f> 0b c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 83 ec 20
[    5.277099] RSP: 0018:ffff8d272f603ec8 EFLAGS: 00010082
[    5.288966] RAX: 0000000000000000 RBX: ffff8d1fc62097c0 RCX: ffffffffa8e59d68
[    5.303005] RDX: 0000000000000001 RSI: 0000000000000092 RDI: 0000000000000046
[    5.316992] RBP: 0000000000000000 R08: 00000000000004f3 R09: 0000000000aaaaaa
[    5.330855] R10: 0000000000000000 R11: ffffa5bfceaff020 R12: ffffa5bfc0073cf8
[    5.344672] R13: ffff8d272f61cf00 R14: ffffffffa7d3cc50 R15: ffff8d272f61d038
[    5.358402] FS:  0000000000000000(0000) GS:ffff8d272f600000(0000) knlGS:0000000000000000
[    5.373132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.385377] CR2: 0000000000000000 CR3: 000000044040a000 CR4: 00000000003406e0
[    5.399090] Call Trace:
[    5.408285]  <IRQ>
[    5.416642]  update_process_times+0x4f/0x60
[    5.427270]  tick_sched_handle+0x22/0x60
[    5.437620]  tick_sched_timer+0x37/0x70
[    5.447830]  __hrtimer_run_queues+0x100/0x280
[    5.458557]  hrtimer_interrupt+0x100/0x220
[    5.469046]  smp_apic_timer_interrupt+0x6a/0x130
[    5.480149]  apic_timer_interrupt+0xf/0x20
[    5.490505]  </IRQ>
[    5.498644] RIP: 0010:panic+0x201/0x247
[    5.508456] Code: eb a6 83 3d 5f b8 96 01 00 74 05 e8 28 51 02 00 48 c7 c6 20 62 61 a9 48 c7 c7 48 16 c9 a8 e8 d3 33 06 00 fb 66 0f 1f 44 00 00 <31> db e8 17 41 0d 00 4c 39 eb 7c 1d 41 83 f4 01 48 8b 05 07 b8 96
[    5.539849] RSP: 0018:ffffa5bfc0073da8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[    5.553907] RAX: 000000000000005c RBX: ffff8d2b2ec79000 RCX: ffffffffa8e59d68
[    5.567507] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000046
[    5.581075] RBP: ffffa5bfc0073e18 R08: 00000000000004f1 R09: 0000000000aaaaaa
[    5.594737] R10: 0000000000000000 R11: ffffa5bfceaff020 R12: 0000000000000000
[    5.608434] R13: 0000000000000000 R14: ffffe7abb1bb1e40 R15: 0000000000008001
[    5.624096]  ? panic+0x1fa/0x247
[    5.634559]  mount_block_root+0x2bd/0x2e1
[    5.645222]  ? do_early_param+0x91/0x91
[    5.656982]  prepare_namespace+0x135/0x16b
[    5.667569]  kernel_init_freeable+0x22e/0x258
[    5.678136]  ? rest_init+0xaa/0xaa
[    5.687595]  kernel_init+0xa/0x106
[    5.697269]  ret_from_fork+0x22/0x40
[    5.706651] ---[ end trace 2eb047d6dec0c26d ]---
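Comment 1 reports the same panic with a 4.2 image, which suggests the failure is not specific to the 4.3 build. Comparing the two logged kernel command lines shows what did differ between the runs; for example, only the 4.2 boot passes `coreos.inst.install_dev=sda`. A minimal sketch of that comparison follows, using hypothetical helpers `parse_cmdline` and `diff_cmdlines`. Each argument is split on its first `=` only, so URLs with query strings such as `coreos.inst.ignition_url` stay intact.

```python
# Hypothetical helpers for illustration; not part of any OpenShift tooling.


def parse_cmdline(cmdline: str) -> dict:
    """Map each kernel argument to its value; bare flags map to None.

    Only the first '=' separates key from value, so a value such as
    "http://host/ignition?cluster_id=x&role=bootstrap" is kept whole.
    """
    args = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else None
    return args


def diff_cmdlines(old: str, new: str) -> dict:
    """Report argument keys present in only one command line, or changed."""
    a, b = parse_cmdline(old), parse_cmdline(new)
    return {
        "only_old": sorted(a.keys() - b.keys()),
        "only_new": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

The kernel image name in the first position differs between every run, so it always shows up in `only_old` or `only_new` as a bare flag; the wrapped 4.2 command line has to be rejoined into one line before parsing.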

Comment 3 Scott Dodson 2019-12-12 19:08:29 UTC
With the changes from https://github.com/openshift/installer/pull/2776, I no longer see this problem.