Bug 1326538

Summary: Introspection fails, drops to dracut prompt
Product: Red Hat OpenStack
Reporter: Dan Yocum <dyocum>
Component: openstack-ironic-discoverd
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Raviv Bar-Tal <rbartal>
Severity: high
Docs Contact:
Priority: urgent
Version: 8.0 (Liberty)
CC: apevec, dtantsur, dyocum, ggillies, lhh, mburns, mcornea, rhel-osp-director-maint, sbaker
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-22 15:08:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description            Flags
node rdsosreport.txt   none
journalctl output      none
node dmesg output      none
log file               none

Description Dan Yocum 2016-04-13 01:30:40 UTC
Created attachment 1146689 [details]
node rdsosreport.txt

Description of problem:

After the node boots using the discovery ramdisk and kernel, introspection fails to complete and the system drops to the dracut# prompt.

Version-Release number of selected component (if applicable):

[stack@ops2 tftpboot]$ rpm -qa | grep 'ironic\|director-images' | sort
openstack-ironic-api-4.2.2-4.el7ost.noarch
openstack-ironic-common-4.2.2-4.el7ost.noarch
openstack-ironic-conductor-4.2.2-4.el7ost.noarch
openstack-ironic-inspector-2.2.5-2.el7ost.noarch
python-ironicclient-0.8.1-1.el7ost.noarch
python-ironic-inspector-client-1.2.0-6.el7ost.noarch
rhosp-director-images-8.0-20160401.2.el7ost.noarch
rhosp-director-images-ipa-8.0-20160401.2.el7ost.noarch


How reproducible:

Every time

Steps to Reproduce:
1. start introspection on a node
2. open a console on the node
3. wait for it to fail

Actual results:

dracut#

Expected results:

success!

Additional info:

rdsosreport.txt, journalctl.txt, dmesg.txt output attached

Comment 2 Dan Yocum 2016-04-13 01:31:28 UTC
Created attachment 1146690 [details]
journalctl output

Comment 3 Dan Yocum 2016-04-13 01:32:15 UTC
Created attachment 1146691 [details]
node dmesg output

Comment 4 Dan Yocum 2016-04-13 01:37:37 UTC
Hardware is Dell R630 and R730xd with BIOS 1.3.6, fw version 2.15.10.10. Intel CPUs and Intel(R) 2P X520/2P I350 rNDC NIC.

Might be related to BZ https://bugzilla.redhat.com/show_bug.cgi?id=1316550.

Comment 5 Dmitry Tantsur 2016-04-18 08:04:05 UTC
There is a file called "log" (or "logs") in the root of the ramdisk. Could you please grab it as well?

It would also be great to confirm whether the problem is the same as in bug 1316550.

Comment 6 Dan Yocum 2016-04-18 19:24:34 UTC
This is now blocked by this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1328237

Once I can re-create the inspector.sqlite db schema, then yes, I will get the log file from the ramdisk.

Comment 7 Dan Yocum 2016-04-19 00:12:53 UTC
Created attachment 1148292 [details]
log file

Comment 8 Steve Baker 2016-04-19 01:25:25 UTC
Is 01-ec-f4-bb-e7-06-cc em3 the expected BOOTIF interface? If not, then you may benefit from a newer iPXE, as per bug #1326086.
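
For anyone checking the same thing: a BOOTIF value is normally the hardware type (01 for Ethernet) followed by the NIC's MAC address with dashes. The sketch below converts such a value to a colon-separated MAC and looks up the matching interface under /sys/class/net. The function names bootif_to_mac and iface_for_mac are made up for illustration and are not part of the ramdisk or any Ironic tooling.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the discovery ramdisk.

# Convert a BOOTIF value such as 01-ec-f4-bb-e7-06-cc to a MAC address.
# Strips the leading "01-" hardware-type prefix, lowercases hex digits,
# and replaces dashes with colons.
bootif_to_mac() {
    printf '%s\n' "${1#01-}" | tr 'A-F' 'a-f' | sed 's/-/:/g'
}

# Print the name of the network interface whose MAC address equals $1.
# Returns non-zero if no interface matches.
iface_for_mac() {
    for dev in /sys/class/net/*; do
        [ -r "$dev/address" ] || continue
        if [ "$(cat "$dev/address")" = "$1" ]; then
            basename "$dev"
            return 0
        fi
    done
    return 1
}
```

Run from a shell on the node, `iface_for_mac "$(bootif_to_mac 01-ec-f4-bb-e7-06-cc)"` should print the interface that owns that MAC, which can then be compared with the NIC that is supposed to PXE boot.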

Comment 9 Dan Yocum 2016-04-21 15:00:50 UTC
I can verify that the following ramdisk images allow introspection to complete successfully on Dell R630 and R730xd systems with Intel X520/I350 NICs:

[root@ops2 ~]# rpm -qa | grep director-images
rhosp-director-images-ipa-8.0-20160415.1.el7ost.noarch
rhosp-director-images-8.0-20160415.1.el7ost.noarch

Additionally, I can verify that iPXE booting now works using these boot ROMs:

[root@ops2 ~]# rpm -qa | grep ipxe
ipxe-bootimgs-20160127-1.git6366fa7a.el7.noarch
ipxe-roms-qemu-20160127-1.git6366fa7a.el7.noarch
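
A quick way to check whether an undercloud has at least the image build reported working above is to compare the RPM release strings. This is only a sketch: release_ge is a made-up helper, it relies on GNU sort's -V option, and treating 20160415.1 as the threshold is an assumption based solely on this comment, not a confirmed fix boundary.

```shell
#!/bin/sh
# Hypothetical helper: succeed if release string $1 is the same as or
# newer than $2, using GNU sort's version ordering (-V).
release_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage on the undercloud (left commented out so the snippet
# can be sourced on a machine without the package installed):
#   installed=$(rpm -q --qf '%{RELEASE}\n' rhosp-director-images)
#   if release_ge "$installed" "20160415.1.el7ost"; then
#       echo "images are at or beyond the build reported working"
#   fi
```

Remember that the images shipped in the RPM still have to be uploaded to the undercloud before a new introspection run will use them.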

Comment 10 Dmitry Tantsur 2016-04-22 12:04:57 UTC
Judging by comment 9, I assume the problem got fixed by one of the puddle updates, right? If so, please close this bug. Otherwise, please clarify what you hit now.

Comment 11 Dan Yocum 2016-04-22 14:07:28 UTC
I'm using GA rpms, nothing from the puddles.

With that in mind, can you give a summary of what bash issue was preventing RHEL 7.2 ramdisks from completing introspection?

Comment 12 Dmitry Tantsur 2016-04-22 14:52:18 UTC
We've fixed a bunch of different problems in OSPd8 (some of them still hit us in OSPd7). I can't recall one big specific problem.

Comment 13 Dan Yocum 2016-04-22 15:08:23 UTC
Per our conversation and the lack of a concrete cause for the failure, and since the error has now been resolved, possibly due to changes in the upstream RHEL ramdisk, I'm closing this BZ.