| Summary: | rhel-osp-director: Node introspection fails and nodes get to #DRACUT mode (machine type: HP DL165 G7 with Intel X520 10G dual-port NIC) |
|---|---|
| Product: | Red Hat OpenStack |
| Component: | rhosp-director |
| Status: | CLOSED NOTABUG |
| Severity: | high |
| Priority: | high |
| Version: | 7.0 (Kilo) |
| Target Release: | 7.0 (Kilo) |
| Hardware: | x86_64 |
| OS: | Linux |
| Reporter: | Omri Hochman <ohochman> |
| Assignee: | Angus Thomas <athomas> |
| QA Contact: | Tzach Shefi <tshefi> |
| CC: | aschultz, bfournie, dbecker, dtantsur, dyocum, lmartins, mburns, mcornea, morazi, ohochman, rhel-osp-director-maint, sasha, sclewis, tshefi |
| Doc Type: | Bug Fix |
| Type: | Bug |
| Last Closed: | 2017-09-19 20:33:01 UTC |
Description
Omri Hochman
2016-03-10 13:07:54 UTC
Created attachment 1134889 [details]
dracut-screen-shot
Please look for the exact error in /run/initramfs/rdsosreport.txt. Also, if it's OSPd7, there should be a /logs file; please take a look at it as well. (All these files are on the ramdisk, not on the undercloud.) Boot errors can vary a lot: it can be missing drivers, network problems (since in the bash ramdisk the code runs as pid 1), and so on.

There's a file that dracut generates once you drop there. Can you see any hints about what happened when you run the command below?

$ cat /run/initramfs/rdsosreport.txt

Another thing you can do to try to figure out what went wrong is to append "rd.debug" to the kernel command line. If you do that, the logs will go to journald and you can check them by running "journalctl -ab". If there's no systemd, you can find the debug logs in "dmesg" and in the file "/run/initramfs/init.log".

You can add the "rd.debug" parameter to the command line by editing the /etc/ironic/ironic.conf file and appending the parameter there to the "pxe_append_params" config option, e.g.:

[pxe]
pxe_append_params=nofb nomodeset vga=normal rd.debug

Restart the openstack-ironic-conductor service after editing it to apply the changes and try to deploy again.

Hope that helps,
Lucas

Created attachment 1134903 [details]
Dracut logs
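For anyone following along, the rd.debug change Lucas describes can be sketched as below. This is a sketch only: it runs against a throwaway copy of the config rather than the real /etc/ironic/ironic.conf, and the starting value of pxe_append_params is taken from Lucas's example.

```shell
# Emulate the /etc/ironic/ironic.conf edit on a throwaway copy.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[pxe]
pxe_append_params=nofb nomodeset vga=normal
EOF

# Append rd.debug to the existing pxe_append_params line.
sed -i '/^pxe_append_params=/ s/$/ rd.debug/' "$conf"

grep '^pxe_append_params=' "$conf"
# On the real undercloud you would then restart the conductor:
# sudo systemctl restart openstack-ironic-conductor
```

With rd.debug in place, the ramdisk logs land in journald (or dmesg and /run/initramfs/init.log when there is no systemd), as described above.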
Attached the requested logs above, plus others I found along the way:

discovery-log
ens1f0.log (provisioning eth)
ens1f1.log
log
rdsosreport.txt

Looking over these logs myself as well. Maybe it's this "no route to host", in the log file at line 1048:
WARNING: log file /run/initramfs/rdsosreport.txt does not exist
ERROR: ('Connection aborted.', error(113, 'No route to host')) when calling to discoverd
///lib/dracut/hooks/pre-mount/50-init.sh@475(source): give_up 'Failed to discover hardware'
///lib/dracut/hooks/pre-mount/50-init.sh@144(give_up): log 'Failed to discover hardware'
///lib/dracut/hooks/pre-mount/50-init.sh@136(log): echo 'Failed to discover hardware'
Failed to discover hardware
///lib/dracut/hooks/pre-mount/50-init.sh@146(give_up): case "$ONFAILURE" in
///lib/dracut/hooks/pre-mount/50-init.sh@152(give_up): log 'ONFAILURE=console, launching an interactive shell'
///lib/dracut/hooks/pre-mount/50-init.sh@136(log): echo 'ONFAILURE=console, launching an interactive shell'
ONFAILURE=console, launching an interactive shell
Created attachment 1134933 [details]
Discovery fails
Shooting in the dark here, but could an AMD CPU cause this issue? Asking as both of my affected servers are AMD based; all the other servers are Intel based and don't exhibit this error.
On line 2691 of rdsosreport.txt:
[ 42.717425] localhost mcelog[804]: ERROR: AMD Processor family 21: mcelog does not support this processor. Please use the edac_mce_amd module instead.
: No such file or directory
[ 48.242185] localhost lldpad[1155]: config file failed to load,
Started reading about mcelog, found these links:
https://access.redhat.com/solutions/158503
https://bugzilla.redhat.com/show_bug.cgi?id=1166978
(In reply to Tzach Shefi from comment #9)
> Shooting in the dark here, could AMD CPU cause this issue? Asking as both of
> my effected server having this issue are AMD based, all the other servers
> Intel based and don't exhibit this error.
>
> On line 2691 of rdsosreport.txt:
> [ 42.717425] localhost mcelog[804]: ERROR: AMD Processor family 21: mcelog
> does not support this processor. Please use the edac_mce_amd module instead.
> : No such file or directory
> [ 48.242185] localhost lldpad[1155]: config file failed to load,
>
> Started reading about mcelog, found these links:
> https://access.redhat.com/solutions/158503
> https://bugzilla.redhat.com/show_bug.cgi?id=1166978

That's interesting... I don't think that error in particular would cause the node boot to fail, but the fact that the AMD machines are not working while the Intel ones are makes me think about the version of the kernel used for the deployment. I believe the host OS we use for the distributed images is RHEL, right? Can you take a look at the kernel version it is using, please?

Also, I think it would be worth trying to create an image based on a distro with a newer kernel and seeing if that works, so that we can isolate the problem. Can you generate a deploy ramdisk/kernel with Fedora and see if that works, please? You can use the command below to create the image:

$ ramdisk-image-create -o fedora-deploy fedora deploy-ironic dracut-ramdisk

Once it's created we need to upload it to Glance:

$ glance image-create --name my-kernel --is-public True --disk-format aki --container-format aki < fedora-deploy.vmlinuz
$ glance image-create --name my-image.initrd --is-public True --disk-format ari --container-format ari < fedora-deploy.initrd

And now set the new images on the Ironic node:

$ ironic node-update <node uuid or name> add driver_info/deploy_ramdisk=<glance uuid> driver_info/deploy_kernel=<glance uuid>

Start the deployment again to see if that works.
uname -a under dracut returns:

Linux localhost 3.10.0-327.10.1.el7.x86_64 #1 SMP Sat Jan 23 ...

We created Fedora-based discovery images:

export ELEMENTS_PATH=/usr/share/instack-undercloud:/usr/share/diskimage-builder/elements:/usr/share/tripleo-image-elements
export DELOREAN_TRUNK_REPO="http://trunk.rdoproject.org/f22/current"
export DELOREAN_REPO_URL=$DELOREAN_TRUNK_REPO

disk-image-create -a amd64 -o fedora-discover fedora ironic-agent delorean-repo -p python-hardware-detect 2>&1 | tee disk1.log

Then replaced the files under /httpboot/:

sudo mv fedora-discover.initramfs /httpboot/discovery.ramdisk
sudo mv fedora-discover.vmlinuz /httpboot/discovery.kernel

chown ironic:ironic on both files, and setenforce 0 so as not to fight SELinux.

Introspected the AMD node; this time it didn't get stuck in dracut, and after a while I got a login screen:

Fedora release 21..
Kernel 4.1.13-100..
localhost login:

When I checked introspection status:

[stack@undercloud72 ~]$ openstack baremetal introspection status 13b102df-8ccb-42e8-abf0-7eda3f48181a
+----------+-------+
| Field    | Value |
+----------+-------+
| error    | None  |
| finished | False |
+----------+-------+

Gave it 15-20 minutes, same status; it doesn't look like it's going to reach finished=true.

(In reply to Tzach Shefi from comment #12)
> uname -a under dracut returns:
>
> Linux localhost 3.10.0-327.10.1.el7d /x86_64 #1 SMP Sat Jan23 ...
>
> We created fedora based discovery images:
> export ELEMENTS_PATH=/usr/share/instack-undercloud:/usr/share/diskimage-builder/elements:/usr/share/tripleo-image-elements
> export DELOREAN_TRUNK_REPO="http://trunk.rdoproject.org/f22/current"
> export DELOREAN_REPO_URL=$DELOREAN_TRUNK_REPO
>
> disk-image-create -a amd64 -o fedora-discover fedora ironic-agent
> delorean-repo -p python-hardware-detect 2>&1 | tee disk1.log

Oh, so you don't seem to be using the discovery element to create the ramdisk. The "ironic-agent" element works for inspection, but only on OSP 8.0+, which uses the IPA ramdisk for deploy and inspection.
For 7.0 you have to use the ironic-discoverd-ramdisk-instack[0] element when creating the image.

[0] https://github.com/rdo-management/instack-undercloud/tree/master/elements/ironic-discoverd-ramdisk-instack

I see 2 problems in this report:
1. ERROR: ('Connection aborted.', error(113, 'No route to host')) when calling to discoverd
2. "No node found for MAC blah-blah".
Which one are we debugging right now? I don't think mcelog is somehow involved here.
Adding info: I have deployed OSPD 7.3 on one of the AMD nodes, created RHEL images on it, and tested introspection of the second AMD node; same dracut error. So we know this is a discovery-image issue with AMD CPUs. Trying to make Fedora-based images on the same OSPD, not having much luck creating them.

I still see no direct link to the CPU manufacturer. Seems like the case of the "no route to host" error we saw on some configurations. Just dumping my findings from the logs:

[    0.000000] localhost kernel: Command line: discoverd_callback_url=http://10.35.20.1:5050/v1/continue RUNBENCH=0 ip=10.35.20.29:10.35.20.1:10.35.20.1:255.255.255.0 BOOTIF=a0:36:9f:22:e8:78

///lib/dracut/hooks/pre-mount/50-init.sh@465(source): ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens1f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether a0:36:9f:22:e8:78 brd ff:ff:ff:ff:ff:ff
    inet 10.35.20.29 peer 10.35.20.1/32 scope global ens1f0
       valid_lft forever preferred_lft forever
    inet 10.35.20.29/24 brd 10.35.20.255 scope global dynamic ens1f0
       valid_lft 106sec preferred_lft 106sec
3: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether a0:36:9f:22:e8:7a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a236:9fff:fe22:e87a/64 scope link
       valid_lft forever preferred_lft forever

///lib/dracut/hooks/pre-mount/50-init.sh@473(source): ironic-discoverd-ramdisk --use-hardware-detect --bootif a0:36:9f:22:e8:78 -L /run/initramfs/rdsosreport.txt -L /log http://10.35.20.1:5050/v1/continue
ERROR: ('Connection aborted.', error(113, 'No route to host')) when calling to discoverd
2016-03-10 13:46:36,529 INFO: ironic-discoverd-ramdisk: posting collected data to http://10.35.20.1:5050/v1/continue
2016-03-10 13:46:36,550 INFO: requests.packages.urllib3.connectionpool: Starting new HTTP connection (1): 10.35.20.1
2016-03-10 13:46:54,585 ERROR: ironic-discoverd-ramdisk: ('Connection aborted.', error(113, 'No route to host')) when calling to discoverd

Let's assume it's a network issue, "no route to host": then I shouldn't be able to ping the undercloud from the node and vice versa, right? Also notice this on ens1f0: "mq state DOWN qlen 1000". Yet when I check the node in dracut mode, "ip a" shows ens1f0 up with IP address x.y.20.29. So the logs might show an issue, but it's either incorrect or not recent. I'm adding logs from the latest attempt, which I did on the same node with the same network switches, just after installing OSPD on one of the AMD nodes.

Created attachment 1137056 [details]
Dracut logs
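As a side note for anyone triaging similar "no route to host" failures: one quick sanity check is to confirm which endpoint the ramdisk is actually trying to reach, by pulling the callback host out of the kernel command line. A sketch in plain sh; the cmdline string below is copied from the rdsosreport.txt excerpt in this bug, and in a real dracut shell you would read /proc/cmdline instead:

```shell
# Kernel command line as seen in the dracut logs above.
cmdline='discoverd_callback_url=http://10.35.20.1:5050/v1/continue RUNBENCH=0 ip=10.35.20.29:10.35.20.1:10.35.20.1:255.255.255.0 BOOTIF=a0:36:9f:22:e8:78'

# Split on spaces and keep only the discoverd_callback_url value.
url=$(echo "$cmdline" | tr ' ' '\n' | sed -n 's/^discoverd_callback_url=//p')

host=${url#http://}   # strip the scheme
host=${host%%/*}      # drop the path
host=${host%%:*}      # drop the port

echo "callback host: $host"
# In the dracut shell you could then test reachability with, e.g.:
# ping -c1 "$host"; ip route get "$host"
```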
These lines surprise me:
inet 10.35.20.29 peer 10.35.20.1/32 scope global ens1f0
valid_lft forever preferred_lft forever
inet 10.35.20.29/24 brd 10.35.20.255 scope global dynamic ens1f0
valid_lft 106sec preferred_lft 106sec
Maybe that's how dracut is expected to work, but I'm not sure. Could you please try editing /httpboot/discoverd.ipxe by hand, removing the "ip=${ip}:${next-server}:${gateway}:${netmask}" parameter, then restart the introspection. If it fails, please also provide the ramdisk logs.
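For anyone repeating this step, the suggested edit can be sketched as below. It operates on a throwaway copy; the real file is /httpboot/discoverd.ipxe on the undercloud, and the sample kernel line is only a guess at the pre-edit layout (the quoted heredoc keeps the iPXE ${...} variables literal):

```shell
# Throwaway stand-in for /httpboot/discoverd.ipxe.
ipxe=$(mktemp)
cat > "$ipxe" <<'EOF'
#!ipxe
dhcp
kernel http://10.35.20.1:8088/discovery.kernel discoverd_callback_url=http://10.35.20.1:5050/v1/continue ip=${ip}:${next-server}:${gateway}:${netmask} RUNBENCH=0 BOOTIF=${mac}
initrd http://10.35.20.1:8088/discovery.ramdisk
boot
EOF

# Drop the ip=... parameter so the ramdisk configures its NIC by itself.
sed -i 's/ ip=[^ ]*//' "$ipxe"

grep '^kernel' "$ipxe"
```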
Tested this tip as well: modified /httpboot/discoverd.ipxe to remove the "ip=${ip}:${next-server}:${gateway}:${netmask}" parameter from the command line, restarted the discovery service, and attempted to introspect: same dracut issue.
Please provide logs for this issue. I need to see how different it was this time.

Created attachment 1137097 [details]
Dracut logs after modifying iPXE parameters
These are the logs after making the change below and restarting the ironic services:
/httpboot/discoverd.ipxe with the "ip=${ip}:${next-server}:${gateway}:${netmask}" parameter removed.
I still see ip=10.35.20.29:10.35.20.1:10.35.20.1:255.255.255.0 in the logs. Could you please paste your discoverd.ipxe?

[stack@localhost httpboot]$ cat discoverd.ipxe
#!ipxe

dhcp

kernel http://10.35.20.1:8088/discovery.kernel discoverd_callback_url=http://10.35.20.1:5050/v1/continue RUNBENCH=0 BOOTIF=${mac}
initrd http://10.35.20.1:8088/discovery.ramdisk
boot

That's weird; I wonder how this "ip" variable is set then. Did you change any PXE/iPXE configuration previously? Could you check the httpd logs to ensure that it is indeed serving the modified file?

Created attachment 1138187 [details]
http logs Tigris01

Remember, the first time we hit this issue OSPD was installed on seal13 (Intel), on which we also changed ipxe/pxe, which didn't help. Then we built a Fedora discovery image, which booted up OK but reached the login prompt without doing any discovery. To further debug the issue, I took Tigris01 (AMD), installed the same OSPD on it as on seal13, built images on it, and then ran introspection of Tigris02 (AMD). Figuring that building images on the same AMD hardware might help: it didn't, we are stuck on the same issue. To answer your question, on the recent OSPD (Tigris01) all I'd changed was discoverd.ipxe as listed in comment #26. Attached http logs from Tigris01.

Hi Tzach, since the bug is now targeting OSP8, can we re-test this bug on it?

Hi Lucas, installed OSPD8 today, versions below:

openstack-tripleo-0.0.7-1.el7ost.noarch
openstack-tripleo-common-0.3.0-3.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.5-1.el7ost.noarch
openstack-tripleo-image-elements-0.9.9-1.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.12-2.el7ost.noarch
python-tripleoclient-0.3.1-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.12-2.el7ost.noarch

Installed OSPD8 on the Tigris02 server (AMD) and attempted to introspect the Tigris01 (AMD) and Gizmo (Intel) servers. This way I can update for both versions: if Tigris01 is booted from disk, check OSPD7; if Tigris02 is booted from disk, OSPD8.
Don't worry, I fixed the boot order and instack.json files as needed. Anyway, during the introspection process I hit a new bug: https://bugzilla.redhat.com/show_bug.cgi?id=1320962. I can't say whether OSPD8 resolved this current AMD issue or not, as I didn't finish introspection. On the new bug I posted a screenshot of a node; it happens to be Gizmo, but Tigris01 looks the same. If this step is past our DRACUT point, then OSPD8 might have resolved this bug, but without completing introspection I can't say for sure.

(In reply to Tzach Shefi from comment #29)
> Hi Lucas, installed OPSD8 today, version below:
> openstack-tripleo-0.0.7-1.el7ost.noarch
> openstack-tripleo-common-0.3.0-3.el7ost.noarch
> openstack-tripleo-puppet-elements-0.0.5-1.el7ost.noarch
> openstack-tripleo-image-elements-0.9.9-1.el7ost.noarch
> openstack-tripleo-heat-templates-kilo-0.8.12-2.el7ost.noarch
> python-tripleoclient-0.3.1-1.el7ost.noarch
> openstack-tripleo-heat-templates-0.8.12-2.el7ost.noarch

Can you check the version of the ironic-python-agent package as well?

> Installed OSPD8 on Tigris02 server (AMD), attempted to introspect Tigris01
> (AMD) and Gizmo (Intel) server.
>
> Any way during introspection process I hit a new bug
> https://bugzilla.redhat.com/show_bug.cgi?id=1320962

This new error seems very similar to the one described in https://bugzilla.redhat.com/show_bug.cgi?id=1308981. The fix for that was merged downstream yesterday; I wonder if that would also fix this new error you are seeing there.

Ironic versions; forgot to mention:
openstack-ironic-conductor-4.2.2-4.el7ost.noarch
openstack-ironic-api-4.2.2-4.el7ost.noarch
openstack-ironic-common-4.2.2-4.el7ost.noarch
openstack-ironic-inspector-2.2.5-1.el7ost.noarch
python-ironic-inspector-client-1.2.0-6.el7ost.noarch
python-ironicclient-0.8.1-1.el7ost.noarch

Looking at bz1308981, it says fixed in ironic-python-agent-doc-1.1.0-8.el7ost.noarch.rpm. I don't have any such component installed; is this normal?

So, the bug you mention seems to be swift-related. Could you please temporarily disable storing introspection data in swift? Set "store_data" to "none" in /etc/ironic-inspector/inspector.conf, then restart openstack-ironic-inspector, then retry. Confirming that this bug affects OSPd8 would help a lot, as the ramdisk in OSPd8 is much easier to debug.

Hmm, ignore me. The swift failure happens at a much later stage, when data has already been received from the ramdisk. So OSPd8 does not seem to be affected by this bug. Do you plan on getting back to OSPd7?

So I ran a fresh install on Tigris02 (AMD); it failed to introspect.

openstack-ironic-conductor-4.2.2-4.el7ost.noarch
openstack-ironic-api-4.2.2-4.el7ost.noarch
openstack-ironic-common-4.2.2-4.el7ost.noarch
openstack-ironic-inspector-2.2.5-2.el7ost.noarch
python-ironic-inspector-client-1.2.0-6.el7ost.noarch
python-ironicclient-0.8.1-1.el7ost.noarch
openstack-tripleo-common-0.3.1-1.el7ost.noarch
openstack-tripleo-0.0.7-1.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.5-1.el7ost.noarch
openstack-tripleo-image-elements-0.9.9-1.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-1.el7ost.noarch
python-tripleoclient-0.3.4-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.14-1.el7ost.noarch

Introspection didn't finish; again it looks like bug 1320962.

[stack@localhost ~]$ openstack baremetal introspection bulk start
Setting nodes for introspection to manageable...
Starting introspection of node: a3674785-5e3a-4b27-a898-626e42210118
Starting introspection of node: b05cff45-5c2f-4694-b166-4d3b25feedcb
Waiting for introspection to finish...
Introspection for UUID b05cff45-5c2f-4694-b166-4d3b25feedcb finished with error: Unexpected exception ConnectionError during processing: ('Connection aborted.', error(111, 'ECONNREFUSED'))
Introspection didn't finish for nodes a3674785-5e3a-4b27-a898-626e42210118
Setting manageable nodes to available...
Introspection completed with errors:
b05cff45-5c2f-4694-b166-4d3b25feedcb: Unexpected exception ConnectionError during processing: ('Connection aborted.', error(111, 'ECONNREFUSED'))
[stack@localhost ~]$ Write failed: Broken pipe

Noticed some swift services were down; restarted them and deleted the ironic nodes. Added the name amdcpu in the instack.json file, imported again, and restarted introspection. Now it got something else: it reported finished but had errors: http://pastebin.test.redhat.com/361236

Dmitry, let me know how you wish to proceed: should I try without using swift as the store for introspection data, or just reboot the undercloud and retry? BTW I only used the AMD node, as my Intel one (the second node) is down due to HW issues.

Created attachment 1142082 [details]
Logs for #34
I've now updated ironic-inspector ("store_data" to "none") as per #32.
Deleted the node and re-imported it; the output looks the same as the pastebin from #34.
I've saved it in a new pastebin just in case:
http://pastebin.test.redhat.com/361246
Node status is available, but what about these errors on the way?
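For reference, the store_data change from comment #32 can be sketched like this, against a throwaway copy of the config (assuming the option lives under the [processing] section, as in stock ironic-inspector; the real file is /etc/ironic-inspector/inspector.conf):

```shell
# Throwaway stand-in for /etc/ironic-inspector/inspector.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[processing]
store_data = swift
EOF

# Disable storing introspection data in swift.
sed -i 's/^store_data *=.*/store_data = none/' "$conf"

grep '^store_data' "$conf"
# Then on the undercloud:
# sudo systemctl restart openstack-ironic-inspector
```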
These errors are actually warnings; I don't know why they get displayed to you. Introspection actually finished successfully for you. Mind reporting a new bugzilla for these scary warnings? And let's get back to OSPd7 if you feel like it, as it seems OSPd8 is not affected.

Opened a bug for OSPD8 and its warnings: https://bugzilla.redhat.com/show_bug.cgi?id=1323444

I'll reinstall OSPD7 and report once I've got it up.

Installed OSPD7 from scratch; it's looking better. I sent it to introspect the same AMD hardware, and this time it worked: no DRACUT, though it gave some warnings, see below.

rhos-release 7-director -p 2016-03-09.1
rhos-release 7 -p 2016-03-24.2

Do we still need the deployment? Can I reuse the hardware, or do you wish to further debug/check this?

[stack@localhost ~]$ openstack baremetal introspection bulk start
Setting available nodes to manageable...
Starting introspection of node: 245baee0-f4c9-4385-87e4-006cc5fc2dd5
Waiting for discovery to finish...
Discovery for UUID 245baee0-f4c9-4385-87e4-006cc5fc2dd5 finished successfully.
Setting manageable nodes to available...
WARNING: ironicclient.common.http Request returned failure status.
WARNING: ironicclient.common.http Error contacting Ironic server: Node 245baee0-f4c9-4385-87e4-006cc5fc2dd5 is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409). Attempt 1 of 61
WARNING: ironicclient.common.http Request returned failure status.
WARNING: ironicclient.common.http Error contacting Ironic server: Node 245baee0-f4c9-4385-87e4-006cc5fc2dd5 is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409). Attempt 2 of 61
WARNING: ironicclient.common.http Request returned failure status.
WARNING: ironicclient.common.http Error contacting Ironic server: Node 245baee0-f4c9-4385-87e4-006cc5fc2dd5 is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409).
Attempt 3 of 61
WARNING: ironicclient.common.http Request returned failure status.
WARNING: ironicclient.common.http Error contacting Ironic server: Node 245baee0-f4c9-4385-87e4-006cc5fc2dd5 is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409). Attempt 4 of 61
WARNING: ironicclient.common.http Request returned failure status.
WARNING: ironicclient.common.http Error contacting Ironic server: Node 245baee0-f4c9-4385-87e4-006cc5fc2dd5 is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409). Attempt 5 of 61
WARNING: ironicclient.common.http Request returned failure status.
WARNING: ironicclient.common.http Error contacting Ironic server: Node 245baee0-f4c9-4385-87e4-006cc5fc2dd5 is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409). Attempt 6 of 61
WARNING: ironicclient.common.http Request returned failure status.
WARNING: ironicclient.common.http Error contacting Ironic server: Node 245baee0-f4c9-4385-87e4-006cc5fc2dd5 is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409). Attempt 7 of 61
Node 245baee0-f4c9-4385-87e4-006cc5fc2dd5 has been set to available.
Discovery completed.

Created attachment 1143350 [details]
Logs for #39
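As background on the "Attempt N of 61" warnings: the client simply keeps retrying while the node is locked (HTTP 409) and proceeds once the lock clears. A hypothetical sketch of that retry loop; fake_node_update is a stand-in for the real Ironic call and "unlocks" on the third try, and the attempt cap of 61 matches the warnings above:

```shell
# Counter file lets the mock "fail" a fixed number of times.
state=$(mktemp); echo 0 > "$state"
fake_node_update() {
  n=$(( $(cat "$state") + 1 )); echo "$n" > "$state"
  [ "$n" -ge 3 ]   # fail (as if HTTP 409) until the third attempt
}

attempt=1; max=61
until fake_node_update; do
  echo "locked (HTTP 409), attempt $attempt of $max"
  attempt=$((attempt + 1))
  if [ "$attempt" -gt "$max" ]; then break; fi
  sleep 0   # the real client waits between attempts
done
echo "node update succeeded on attempt $attempt"
```

This is why the warnings are harmless when the operation eventually succeeds: they are just the visible side of the retry loop.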
Disable PXE boot on all NICs that are not on the provisioning network, and try again. ggillies discovered this on the OS1 Public Prime cloud a while back, where I've got the Intel X520 cards too.

Hmmm... my previous comment may be a red herring (but it certainly can't hurt). Tzach, see this bug regarding the "locked by host" errors: https://bugzilla.redhat.com/show_bug.cgi?id=1232997. Supposedly this was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1233452, but it's rearing its ugly head again; I'm experiencing this error too.

I'm PXE booting off a 1G NIC for this, not the Intel X520 (had done so on #39 as well). I don't want to disable PXE booting on the Intel 10G NIC, as it's a pain to enable/disable with Intel's boot utility, plus I need to PXE from the 10G NIC once done with this bug. I did, however, disconnect the 10G cables this time and introspected again: same results as comment #39. Not sure disconnecting cables is 100% equivalent to disabling PXE on the 10Gs, but it's easy to do/undo. Let me know if I should check/test anything else.

> I did however disconnect the 10G cables this time, introspected again, same results as comment #39
So, does it mean that with the 2nd NIC disabled, introspection actually works, but gives away scary warnings?
I can verify that the following ramdisk images allow introspection to complete successfully on Dell R630 and R730xd systems with Intel X520 and i350 NICs:

[root@ops2 ~]# rpm -qa | grep director-images
rhosp-director-images-ipa-8.0-20160415.1.el7ost.noarch
rhosp-director-images-8.0-20160415.1.el7ost.noarch

Ok, we can assume OSPd8 works. What about OSPd7? I'm trying to clarify where we are now. So introspection does work if only one NIC is left enabled, right? Or only if a non-X520 NIC is enabled?

OSPd7 introspected OK with the onboard 1G NIC PXE enabled. The 10G Intel X520's cable was physically disconnected during the process (but the NIC was not PXE disabled). I'm guessing a disconnected 10G NIC is equivalent to PXE-disabling it. So yes, introspection worked with only one NIC PXE enabled. It still spat out the warnings from #39, but completed OK.

As it appears there is a workaround for OSP-7, we'll close this one unless there are objections.