Bug 1699832

Summary: nodes in ironic still point to older ram and kernel after images update in glance
Product: Red Hat OpenStack Reporter: Nilesh <nchandek>
Component: python-tripleoclient Assignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG QA Contact: Sasha Smolyak <ssmolyak>
Severity: medium Docs Contact:
Priority: medium    
Version: 10.0 (Newton) CC: bfournie, cylopez, hbrock, ipetrova, jslagle, lmarsh, mburns, rhos-maint, sasha, schhabdi, srevivo
Target Milestone: --- Keywords: Reopened, Triaged, ZStream
Target Release: ---   
Hardware: All   
OS: All   
Clone Of: 1572379 Environment:
Last Closed: 2019-07-01 18:17:02 UTC Type: ---
Bug Depends On: 1572379    
Bug Blocks:    

Comment 2 Bob Fournier 2019-04-15 13:11:47 UTC
Please check the original bug - https://bugzilla.redhat.com/show_bug.cgi?id=1572379.  The fact that the nodes are not updated when the image changes is by design.  The fix for the original bug is simply to print an informational message, not to update the nodes to use the new image.

The case seems to indicate that the nodes should be updated, but that will not change: the nodes will still point to the older image.

Comment 3 Bob Fournier 2019-04-15 13:16:26 UTC
Note also that to update the image that the node is using, the "openstack overcloud node configure" command should be rerun.

Comment 6 Bob Fournier 2019-04-17 11:13:34 UTC
Does it give the same error about the incorrectly configured driver - Node ... has an incorrectly configured driver_info/deploy_kernel. Expected .. but got ... ??
Can you provide the error message for this node?

Comment 9 Bob Fournier 2019-05-06 18:22:14 UTC
Will have the mistral team look at this particular error for "openstack overcloud node configure" in OSP-10; it's strange that the error happens on only one system and not on others.

As the real issue is just that the nodes need to be updated to use the new images, I recommend deleting and re-adding the nodes on this system.

Comment 10 Bob Fournier 2019-05-06 18:41:51 UTC
Ah I remember now, the issue is this - https://bugs.launchpad.net/tripleo/+bug/1710717.  Can you try running the "openstack overcloud node configure" command with the names instead of the IDs? i.e.:

openstack overcloud node configure --deploy-kernel bm-deploy-kernel --deploy-ramdisk bm-deploy-ramdisk 

I don't believe the bug fix to use the IDs has been backported to OSP-10.

Comment 11 Cyril Lopez 2019-05-07 12:36:38 UTC
Hello,

This is not a bug for me; it is just that when a baremetal node has a Provisioning State of available, openstack overcloud node configure will not make the change.

openstack baremetal node show mynode | grep driver_info
driver_info            | {u'ipmi_port': u'6233', u'ipmi_username': u'admin', u'deploy_kernel': u'e5875dc0-a152-4e29-9a04-a96fd0a22aac', u'ipmi_address': u'192.168.1.254', u'deploy_ramdisk': u'16fc51c1-3206-4b77-a9a7-0fa5e6a77a83', u'ipmi_password': u'******'} |

openstack image list
+--------------------------------------+------------------------+--------+
| ID                                   | Name                   | Status |
+--------------------------------------+------------------------+--------+
| 2f26aba1-e2b1-4b2c-82a4-ac4fa23218dd | bm-deploy-kernel       | active |
| 9185cc42-e304-460f-950a-13e6b8342257 | bm-deploy-ramdisk      | active |
| ce381686-6e43-4896-acd5-5a8bdd1fd5e7 | overcloud-full         | active |
| b28d3e29-c0f7-4ff7-82ce-e67a6a048041 | overcloud-full-initrd  | active |
| aa6c10cb-5caa-4e70-9f89-1a89401e14f5 | overcloud-full-vmlinuz | active |
+--------------------------------------+------------------------+--------+

The IDs in driver_info do not match the current image IDs, so to solve this:
openstack baremetal node manage mynode
openstack overcloud node configure --all-manageable
openstack baremetal node provide mynode

Check:
openstack baremetal node show mynode | grep driver_info 
| driver_info            | {u'ipmi_port': u'6230', u'ipmi_username': u'admin', u'deploy_kernel': u'2f26aba1-e2b1-4b2c-82a4-ac4fa23218dd', u'ipmi_address': u'192.168.1.254', u'deploy_ramdisk': u'9185cc42-e304-460f-950a-13e6b8342257', u'ipmi_password': u'******'} |

We will write a KCS for this.
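
The check in Comment 11 can be scripted. The following is a minimal sketch, not part of the original report: the helper names extract_driver_field and driver_field_matches are hypothetical, and it assumes the client prints driver_info in the u'key': u'value' form shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of any OpenStack client).

# extract_driver_field FIELD
# Reads "openstack baremetal node show" driver_info text on stdin and
# prints the value stored under FIELD (for example deploy_kernel).
extract_driver_field() {
    local field="$1"
    # Join wrapped lines first, then pull out u'FIELD': u'VALUE'.
    tr -d '\n' | sed -n "s/.*u'${field}': u'\([^']*\)'.*/\1/p"
}

# driver_field_matches EXPECTED FIELD
# Succeeds only if FIELD in the driver_info text on stdin equals EXPECTED.
driver_field_matches() {
    local expected="$1" field="$2" actual
    actual="$(extract_driver_field "$field")"
    [ "$actual" = "$expected" ]
}

# Example use on the undercloud (image and node names taken from Comment 11):
#   kernel_id="$(openstack image show bm-deploy-kernel -f value -c id)"
#   openstack baremetal node show mynode | grep driver_info \
#       | driver_field_matches "$kernel_id" deploy_kernel || echo "mynode is stale"
```

A non-zero exit status from driver_field_matches would indicate that the node still references an older image and needs the manage/configure/provide sequence above.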

Comment 12 Bob Fournier 2019-05-09 16:08:42 UTC
Nilesh - please try using the image names instead of the IDs in the openstack overcloud node configure command and verify that the nodes use the proper images.  Also, as Cyril noted, the nodes must be manageable to run that command.
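
Putting Comments 10 to 12 together, the sequence for one node might be scripted as below. This is only a sketch under stated assumptions: the function name refresh_deploy_images and the DRY_RUN variable are hypothetical, the node is assumed to be in the available state as in Comment 11, and --all-manageable reconfigures every manageable node, not only the one named.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of python-tripleoclient).
# refresh_deploy_images NODE
# Moves NODE to manageable, reconfigures manageable nodes to use the
# deploy images by name, then makes NODE available again.
# Set DRY_RUN=1 to print the commands instead of running them.
refresh_deploy_images() {
    local node="$1"
    # Expands to "echo" when DRY_RUN is set and non-empty, else to nothing.
    local run="${DRY_RUN:+echo}"

    $run openstack baremetal node manage "$node" &&
    $run openstack overcloud node configure --all-manageable \
        --deploy-kernel bm-deploy-kernel --deploy-ramdisk bm-deploy-ramdisk &&
    $run openstack baremetal node provide "$node"
}

# Example: preview the commands for the node from Comment 11.
#   DRY_RUN=1 refresh_deploy_images mynode
```

The commands are chained with && so that a failure in one step stops the sequence before the node is returned to the available state.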

Comment 13 Bob Fournier 2019-05-22 20:21:44 UTC
Any update on this?

Comment 14 Bob Fournier 2019-05-23 15:31:04 UTC
Closing this (the case is also closed), as it's expected that the nodes will need to be updated when the images are changed; this is by design.

When using "openstack overcloud node configure", the image name needs to be used instead of the image ID.  This is fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1452979 for OSP-13, but there is no plan to backport it to OSP-10.

Comment 15 Irina Petrova 2019-06-28 11:35:47 UTC
Not OK, here's the context.

This happens (as reported by the customer) on a Minor update [1], i.e. these are already deployed nodes (all of them):
## during a Minor update, we do...

* upgrade of the UC
* reboot of the UC
* update of the OC (overcloud) images
* update of the OC *plan* (`openstack overcloud deploy --update-plan-only ...`)  
* update of the OC *stack* (`openstack overcloud update stack -i overcloud`)     << here, on this step:

[stack@... ~]$ ./deploy.sh
Node uuid=e7e66eed-b11e-4b2a-a5a7-154ba0a139d9 has an incorrectly configured driver_info/deploy_ramdisk. Expected "b2bbfd98-93d7-432f-bc85-152c50fb1e72" but got "cbaf0cb7-6963-4e3d-8835-b3ff05c1acbf".
Node uuid=e7e66eed-b11e-4b2a-a5a7-154ba0a139d9 has an incorrectly configured driver_info/deploy_kernel. Expected "7411033b-725a-4de6-a175-eee74b4d59e2" but got "b03810d2-d262-475b-bee6-28b5168299d7".
Node uuid=9543948d-2760-4070-8357-cf07bab9d0cf has an incorrectly configured driver_info/deploy_ramdisk. Expected "b2bbfd98-93d7-432f-bc85-152c50fb1e72" but got "cbaf0cb7-6963-4e3d-8835-b3ff05c1acbf".
Node uuid=9543948d-2760-4070-8357-cf07bab9d0cf has an incorrectly configured driver_info/deploy_kernel. Expected "7411033b-725a-4de6-a175-eee74b4d59e2" but got "b03810d2-d262-475b-bee6-28b5168299d7".
Node uuid=84830735-519a-4c97-81bf-cb0eaf23de68 has an incorrectly configured driver_info/deploy_ramdisk. Expected "b2bbfd98-93d7-432f-bc85-152c50fb1e72" but got "cbaf0cb7-6963-4e3d-8835-b3ff05c1acbf".
Node uuid=84830735-519a-4c97-81bf-cb0eaf23de68 has an incorrectly configured driver_info/deploy_kernel. Expected "7411033b-725a-4de6-a175-eee74b4d59e2" but got "b03810d2-d262-475b-bee6-28b5168299d7".
Node uuid=a37cec19-e050-4046-8b15-a4e4c57f7cc9 has an incorrectly configured driver_info/deploy_ramdisk. Expected "b2bbfd98-93d7-432f-bc85-152c50fb1e72" but got "cbaf0cb7-6963-4e3d-8835-b3ff05c1acbf".
Node uuid=a37cec19-e050-4046-8b15-a4e4c57f7cc9 has an incorrectly configured driver_info/deploy_kernel. Expected "7411033b-725a-4de6-a175-eee74b4d59e2" but got "b03810d2-d262-475b-bee6-28b5168299d7".
Node uuid=b985a772-2b49-40a7-aa72-056c0d5278f5 has an incorrectly configured driver_info/deploy_ramdisk. Expected "b2bbfd98-93d7-432f-bc85-152c50fb1e72" but got "cbaf0cb7-6963-4e3d-8835-b3ff05c1acbf".
Node uuid=b985a772-2b49-40a7-aa72-056c0d5278f5 has an incorrectly configured driver_info/deploy_kernel. Expected "7411033b-725a-4de6-a175-eee74b4d59e2" but got "b03810d2-d262-475b-bee6-28b5168299d7".
Node uuid=9e30a755-4001-4c2a-935d-96c9362e67cd has an incorrectly configured driver_info/deploy_ramdisk. Expected "b2bbfd98-93d7-432f-bc85-152c50fb1e72" but got "cbaf0cb7-6963-4e3d-8835-b3ff05c1acbf".
Node uuid=9e30a755-4001-4c2a-935d-96c9362e67cd has an incorrectly configured driver_info/deploy_kernel. Expected "7411033b-725a-4de6-a175-eee74b4d59e2" but got "b03810d2-d262-475b-bee6-28b5168299d7".
Node uuid=872b71a3-2085-4534-916b-c60e96c5618b has an incorrectly configured driver_info/deploy_ramdisk. Expected "b2bbfd98-93d7-432f-bc85-152c50fb1e72" but got "cbaf0cb7-6963-4e3d-8835-b3ff05c1acbf".
Node uuid=872b71a3-2085-4534-916b-c60e96c5618b has an incorrectly configured driver_info/deploy_kernel. Expected "7411033b-725a-4de6-a175-eee74b4d59e2" but got "b03810d2-d262-475b-bee6-28b5168299d7".
Node uuid=ec06c8dd-e395-4537-947d-8b45084f54d3 has an incorrectly configured driver_info/deploy_ramdisk. Expected "b2bbfd98-93d7-432f-bc85-152c50fb1e72" but got "cbaf0cb7-6963-4e3d-8835-b3ff05c1acbf".
Node uuid=ec06c8dd-e395-4537-947d-8b45084f54d3 has an incorrectly configured driver_info/deploy_kernel. Expected "7411033b-725a-4de6-a175-eee74b4d59e2" but got "b03810d2-d262-475b-bee6-28b5168299d7".
Node uuid=0215ed7b-24c5-4dca-a645-e454d9cbe64c has an incorrectly configured driver_info/deploy_ramdisk. Expected "b2bbfd98-93d7-432f-bc85-152c50fb1e72" but got "cbaf0cb7-6963-4e3d-8835-b3ff05c1acbf".
Node uuid=0215ed7b-24c5-4dca-a645-e454d9cbe64c has an incorrectly configured driver_info/deploy_kernel. Expected "7411033b-725a-4de6-a175-eee74b4d59e2" but got "b03810d2-d262-475b-bee6-28b5168299d7".
Node uuid=c166ea31-f810-4e54-bc2a-fd108df67bc9 has an incorrectly configured driver_info/deploy_ramdisk. Expected "b2bbfd98-93d7-432f-bc85-152c50fb1e72" but got "cbaf0cb7-6963-4e3d-8835-b3ff05c1acbf".
Node uuid=c166ea31-f810-4e54-bc2a-fd108df67bc9 has an incorrectly configured driver_info/deploy_kernel. Expected "7411033b-725a-4de6-a175-eee74b4d59e2" but got "b03810d2-d262-475b-bee6-28b5168299d7".
Node uuid=afa26bbd-35b3-4e55-a5ee-c3f795d70a3e has an incorrectly configured driver_info/deploy_ramdisk. Expected "b2bbfd98-93d7-432f-bc85-152c50fb1e72" but got "cbaf0cb7-6963-4e3d-8835-b3ff05c1acbf".
Node uuid=afa26bbd-35b3-4e55-a5ee-c3f795d70a3e has an incorrectly configured driver_info/deploy_kernel. Expected "7411033b-725a-4de6-a175-eee74b4d59e2" but got "b03810d2-d262-475b-bee6-28b5168299d7".
Node uuid=1e4d7b46-741f-4769-805d-97e8b96dc192 has an incorrectly configured driver_info/deploy_ramdisk. Expected "b2bbfd98-93d7-432f-bc85-152c50fb1e72" but got "cbaf0cb7-6963-4e3d-8835-b3ff05c1acbf".
Node uuid=1e4d7b46-741f-4769-805d-97e8b96dc192 has an incorrectly configured driver_info/deploy_kernel. Expected "7411033b-725a-4de6-a175-eee74b4d59e2" but got "b03810d2-d262-475b-bee6-28b5168299d7".
Configuration has 24 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.

real    0m34.861s
user    0m0.801s
sys     0m0.316s
[stack@... ~]$

IMO, this is a mistral bug: why would a MINOR UPDATE fail because Director cannot push the new kernel/ramdisk images to the already-deployed nodes? Especially since by design (i.e. explicitly!) we don't want to do that?

Additionally, it seems the problem can be worked around by running `openstack baremetal configure boot` after the image upload, i.e.:
~~~
$ openstack overcloud image upload --update-existing --image-path /home/stack/images/
$ openstack baremetal configure boot
~~~

But we're deprecating that (command), aren't we?


I'm re-opening the BZ for clarification.


[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html-single/upgrading_red_hat_openstack_platform/index#sect-Updating_the_Environment
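
The validation output in Comment 15 reports each affected node twice, once for deploy_ramdisk and once for deploy_kernel. A small sketch for reducing such output to a list of node UUIDs follows; the function name nodes_with_stale_images is hypothetical, and the message format is assumed to be exactly the one pasted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any OpenStack client).
# nodes_with_stale_images
# Reads pre-deployment validation output on stdin and prints each node UUID
# that has an "incorrectly configured driver_info" message, once per node.
nodes_with_stale_images() {
    sed -n 's/^Node uuid=\([0-9a-f-]*\) has an incorrectly configured driver_info\/.*/\1/p' \
        | sort -u
}

# Example: capture both output streams of the deploy script.
#   ./deploy.sh 2>&1 | nodes_with_stale_images
```

For the log in Comment 15, the 24 errors correspond to 12 distinct nodes.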

Comment 17 Bob Fournier 2019-07-01 13:36:23 UTC
To clarify:
1. When the images are updated, the nodes must be updated with the new images via the "openstack baremetal configure" command with the image names.  From the case, it looks like this succeeds when they do it.
2. This command has NOT been deprecated.  In later releases the "overcloud" syntax is used instead of "baremetal", as with other commands, but the command is still there.

In OSP-13+
(undercloud) [stack@host01 ~]$ openstack overcloud node configure --help
usage: openstack overcloud node configure [-h] [--all-manageable]
                                          [--deploy-kernel DEPLOY_KERNEL]
                                          [--deploy-ramdisk DEPLOY_RAMDISK]
                                          [--instance-boot-option {local,netboot}]
                                          [--root-device ROOT_DEVICE]
                                          [--root-device-minimum-size ROOT_DEVICE_MINIMUM_SIZE]
                                          [--overwrite-root-device-hints]
                                          [<node_uuid> [<node_uuid> ...]]

Comment 18 Irina Petrova 2019-07-01 18:17:02 UTC
Thanks, Bob. Re-closing the BZ.