Bug 1381699

Summary: [RFE] Ability to provide kernel boot parameters for compute nodes
Product: Red Hat OpenStack
Reporter: bigswitch <rhosp-bugs-internal>
Component: rhosp-director
Assignee: Dmitry Tantsur <dtantsur>
Status: CLOSED WONTFIX
QA Contact: Omri Hochman <ohochman>
Severity: high
Priority: high
Version: 9.0 (Mitaka)
CC: atelang, bfournie, dbecker, dtantsur, fbaudin, mburns, morazi, rhel-osp-director-maint, skramaja, vchundur, yrachman
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Flags: vchundur: needinfo+
Hardware: Unspecified
OS: Unspecified
Story Points: ---
Last Closed: 2018-08-29 14:24:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description bigswitch 2016-10-04 18:45:29 UTC
Description of problem:
We need special kernel boot parameters, e.g. 'isolcpus=X-Y hugepagesz=1G iommu=pt intel_iommu=on', to be set on the compute nodes so that our NFVSwitch compute nodes function properly. Without these, the NFVSwitch fails to start, and the overcloud deployment fails because the API/Storage/etc. management interfaces are attached to the vswitch that failed to start.

We want the ability to specify these kernel-boot parameters during the overcloud deployment stage to have the compute nodes boot with them.

Comment 1 Mike Burns 2016-10-05 18:12:21 UTC
Dmitry, can you comment on this?

Comment 2 Dmitry Tantsur 2016-10-06 08:55:09 UTC
Hi! We've got an RFE against Ironic for that, but unfortunately, it was rejected upstream. We need to find ways of doing that outside of Ironic. Two immediate options come to my mind: 1. bake arguments into images (pretty bad UX), 2. update kernel arguments with a post-deployment script, then reboot (longer).
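
A minimal sketch of what option 2 could look like: a post-deployment script that appends the missing arguments to GRUB_CMDLINE_LINUX and is safe to run more than once. The helper name append_kernel_args is hypothetical, and the sketch assumes a RHEL-style /etc/default/grub with a double-quoted value and arguments that contain no '|' or '&' characters. Regenerating the GRUB configuration and rebooting are only indicated in comments.

```shell
#!/usr/bin/env bash
# Sketch only: append_kernel_args is a hypothetical helper, not part of
# Ironic or TripleO. It edits the file passed as its first argument.

append_kernel_args() {
    local grub_file=$1; shift
    local current updated arg tmp
    # Make sure there is a line to edit.
    if ! grep -q '^GRUB_CMDLINE_LINUX=' "$grub_file"; then
        echo 'GRUB_CMDLINE_LINUX=""' >> "$grub_file"
    fi
    # Read the current value between the double quotes.
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX="\(.*\)"$/\1/p' "$grub_file" | head -n 1)
    updated=$current
    for arg in "$@"; do
        # Add the argument only if it is not already present as a whole word.
        case " $updated " in
            *" $arg "*) ;;
            *) updated="${updated:+$updated }$arg" ;;
        esac
    done
    if [[ $updated != "$current" ]]; then
        tmp=$(mktemp)
        # '|' is the sed delimiter because kernel arguments may contain '/'.
        sed "s|^GRUB_CMDLINE_LINUX=\".*\"\$|GRUB_CMDLINE_LINUX=\"$updated\"|" "$grub_file" > "$tmp"
        cat "$tmp" > "$grub_file"
        rm -f "$tmp"
    fi
}

# Example (as root on the compute node, followed by a reboot):
#   append_kernel_args /etc/default/grub hugepagesz=1G iommu=pt intel_iommu=on
#   grub2-mkconfig -o /boot/grub2/grub.cfg && reboot
```

Checking for each argument as a whole word keeps the script idempotent, so rerunning it after the reboot does not duplicate arguments.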

Comment 3 Saravanan KR 2016-10-06 09:13:10 UTC
We faced the same issue while working on SR-IOV and DPDK integration with TripleO. The proposed Ironic spec [1] was not well received in the community. We ended up writing a first-boot (user data) script [2] in THT (tripleo-heat-templates) to set the kernel args and reboot the node. Our requirement was that the kernel args be updated before os-net-config starts. Detailed information is available on the mailing list [3].

We had to add a lot of workarounds to the first-boot script to achieve this; it is not a clean solution, but it works.

[1] https://review.openstack.org/#/c/331564/
[2] https://gist.github.com/krsacme/1234bf024ac917c74913827298840c1c
[3] http://lists.openstack.org/pipermail/openstack-dev/2016-September/104168.html

Comment 4 bigswitch 2016-10-06 18:00:01 UTC
(In reply to Dmitry Tantsur from comment #2)
> Hi! We've got an RFE against Ironic for that, but unfortunately, it was
> rejected upstream. We need to find ways of doing that outside of Ironic.

Do you think the RFE will see the light of day in the Newton/Ocata release? Or has it been discarded altogether?

Comment 5 bigswitch 2016-10-06 18:04:07 UTC
(In reply to Saravanan KR from comment #3)
> We faced the same issue while working on SR-IOV and DPDK integration with
> TripleO. The proposed ironic spec was not well received [1] in the
> community. We ended on writing first-boot (user data) script [2] in THT, to
> provide the kernel args and reboot the node. Our criteria was the kernel
> args has to be updated before os-net-config starts. Detailed information is
> available on mailing list[3].
> 
> We had to do lot of workarounds in the first-boot script to achieve it,
> which is not a cleaner solution but works.
> 
> [1] https://review.openstack.org/#/c/331564/
> [2] https://gist.github.com/krsacme/1234bf024ac917c74913827298840c1c
> [3]
> http://lists.openstack.org/pipermail/openstack-dev/2016-September/104168.html

Thank you. We also tried the first-boot approach, but were unable to differentiate between the Controller and Compute nodes (we want the kernel boot parameters applied only to the Computes). Could you help us decipher the HOSTNAME for the nodes, i.e. what is the expected HOSTNAME pattern on Compute nodes versus Controller nodes during the first-boot process?

Comment 6 Saravanan KR 2016-10-07 05:49:10 UTC
(In reply to bigswitch from comment #5)
> (In reply to Saravanan KR from comment #3)

> > [1] https://review.openstack.org/#/c/331564/
> > [2] https://gist.github.com/krsacme/1234bf024ac917c74913827298840c1c
> > [3]
> > http://lists.openstack.org/pipermail/openstack-dev/2016-September/104168.html
> 
> Thank you. We also tried going the first-boot way, but were unable to
> differentiate between the Controller and Compute node (we want the kernel
> boot-params to be applied only to the Computes). Could you help with
> decrypting the HOSTNAME for the nodes. i.e. what is the expected pattern of
> the HOSTNAME on Compute vs that on the Controller nodes during the
> first-boot process?

If the compute hostname is not provided externally, it follows the default format defined in [4] (as of the stable/mitaka branch), which is "%stackname%-novacompute-%index%". Here %stackname% is replaced with the stack name, which is generally "overcloud", and %index% is replaced with the index of the compute node in deployment order. The string "novacompute" is fixed. You can match this string against the hostname in the first-boot script so that the change applies only to compute nodes. That is what I have done in the gist [2] at line 38, where I compare the hostname against "dpdkd" (in your case, it should be "novacompute").

Note that the default value of ComputeHostnameFormat in [4] can be overridden with an environment file. If it is set, the value can be passed as input to the first-boot script for the comparison. In the gist [2], see line 16, where I take the hostname format from ComputeDpdkHostnameFormat; in your case, it should be ComputeHostnameFormat.

The script also contains a workaround for the reboot logic. Please refer to the comments on the gist [2].

[4] https://github.com/openstack/tripleo-heat-templates/blob/stable/mitaka/overcloud.yaml#L838
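
A minimal sketch of the comparison described above, assuming the hostname format is passed to the first-boot script as a string. The helper name matches_hostname_format is hypothetical; it simply turns every %placeholder% into a shell wildcard and matches the short hostname against the result.

```shell
#!/usr/bin/env bash
# Sketch only: matches_hostname_format is a hypothetical helper name.

matches_hostname_format() {
    local host=$1 format=$2 pattern
    # Turn each %placeholder% into a shell wildcard, so that
    # "%stackname%-novacompute-%index%" becomes "*-novacompute-*".
    pattern=$(printf '%s' "$format" | sed 's/%[a-z]*%/*/g')
    # Compare the short hostname only (drop any domain suffix).
    # $pattern is deliberately unquoted so that it is treated as a glob.
    [[ ${host%%.*} == $pattern ]]
}

# Example use in a first-boot script:
#   if matches_hostname_format "$(hostname)" '%stackname%-novacompute-%index%'; then
#       ...compute-only steps...
#   fi
```

Deriving the pattern from the format string, rather than hard-coding "novacompute", keeps the check in step with an overridden ComputeHostnameFormat.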

Comment 7 bigswitch 2016-10-07 06:07:04 UTC
> Compute name, if not provided as an input externally, then the default
> hostname format is defined in [4] (as per stable/mitaka branch), which is
> "%stackname%-novacompute-%index%".

Thank you Saravanan for your input. This should help us with a temporary workaround to get things moving.

Comment 8 Dmitry Tantsur 2016-10-07 10:03:51 UTC
> Do you think the RFE will see light in Newton/Ocata release? Or has it been discarded altogether?

It was discarded. The reason is that changing kernel arguments is highly OS-specific (upstream supports more than just RHEL, of course) and crosses into the configuration of user instances, a line that Ironic tries not to cross.

That being said, there is some effort to land a "deploy steps" framework in Ironic in Ocata. It might potentially (no promises here) allow extending what happens during deployment. If it lands, it may give us some freedom in what exactly we do during deployment. But this is very vague for now.

Comment 10 Mike Burns 2016-11-04 17:16:36 UTC
Franck,

Is there a blueprint or spec for this?

Comment 11 Franck Baudin 2016-11-10 12:51:02 UTC
No, there isn't. As of now, the Ironic "deploy steps" feature, which dtantsur mentioned in this BZ, is still at a very early stage. At this point, we have no technical solution defined, so there is no BP.

Comment 13 Yariv 2016-11-24 12:49:50 UTC
(In reply to Saravanan KR from comment #6)
> which generally is overcloud and %index% will be replaced
> with the count of this compute node in the order to deployment. But the
> string "novacompute" will be fixed string.

Matching on the hostname string is a weak reference; it could cause many user mistakes during deployment.

> Note there is a chance to override the default value mentioned in [4]
> ComputeHostnameFormat, with environment file , which can be taken as input
> to the first-boot script to compare, if set. In the gist [2] refer to the
> line 16, where i have take the hostname format as ComputeDpdkHostnameFormat,
> for your case, it should be ComputeHostnameFormat.

Is the RFE valid only for OVS+DPDK? What about guest DPDK + SR-IOV?

Comment 14 Yariv 2016-11-24 13:22:06 UTC
(In reply to Yariv from comment #13)
With composable roles, the bash script would need lots of if/else branches.
The solution should be implemented through Heat templates.

Comment 16 Saravanan KR 2017-01-18 08:12:04 UTC
In Ocata (OSP11), a new resource, PreNetworkConfig, has been added; it is invoked before the NetworkDeployment. PreNetworkConfig groups all the configs that require a reboot of the overcloud node. More details on how to use it are in the upstream documentation [1] (documentation in progress).

[1] https://review.openstack.org/#/c/395431/3/doc/source/advanced_deployment/ovs_dpdk_config.rst

Comment 17 bigswitch 2017-01-23 19:40:41 UTC
(In reply to Saravanan KR from comment #16)
> In Ocata (OSP11), a new resource PreNetworkConfig has been added, which will
> be invoked before the NetworkDeployment. The PreNetworkConfig will group all
> the configs which require reboot of the overcloud node. More details on how
> to use it is in the upstream documentation [1] (documentation in progress).
> 
> [1]
> https://review.openstack.org/#/c/395431/3/doc/source/advanced_deployment/
> ovs_dpdk_config.rst

Thank you for the update Saravanan. The above looks like a clean solution to get the required effect.

For Mitaka/Newton, we went with your previous suggestion of checking the HOSTNAME in the first-boot script to identify Compute nodes. We often observed a race condition in which the first-boot script is invoked BEFORE the hostname has been updated on the node by systemd-hostnamed. As a result, the HOSTNAME check fails to classify the node as a compute node.

Notes: (first-boot: 13:11:18; hostname update:13:11:42)

Jan 12 13:11:18 localhost firstboot.sh: localhost.localdomain
Jan 12 13:11:18 localhost systemd: Started virt-sysprep firstboot service.
Jan 12 13:11:42 localhost dbus-daemon: dbus[1179]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'
Jan 12 13:11:42 localhost dbus[1179]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'
Jan 12 13:11:42 localhost systemd: Starting Hostname Service...
Jan 12 13:11:42 localhost dbus[1179]: [system] Successfully activated service 'org.freedesktop.hostname1'
Jan 12 13:11:42 localhost dbus-daemon: dbus[1179]: [system] Successfully activated service 'org.freedesktop.hostname1'
Jan 12 13:11:42 localhost systemd: Started Hostname Service.
Jan 12 13:11:42 localhost systemd-hostnamed: Changed static host name to 'overcloud-compute-nfv-0.localdomain'
Jan 12 13:11:42 localhost NetworkManager[1197]: <info>  [1484244702.3600] settings: hostname changed from "localhost.localdomain" to "overcloud-compute-nfv-0.localdomain"
Jan 12 13:11:42 localhost systemd-hostnamed: Changed host name to 'overcloud-compute-nfv-0.localdomain'

Our first-boot script uses the following check to detect compute nodes:
if [[ $(hostname) == *compute* ]]; then
    :  # apply the kernel boot parameters (compute nodes only)
fi

Is there a way to ensure that first-boot scripts are executed after the hostname update? Alternatively, could you point us to the file where the HOSTNAME is saved, which systemd-hostnamed presumably uses to update the node name? (Running cat /etc/hostname during boot gives 'localhost.localdomain', i.e. it has not yet been updated with the name.)
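
One conceivable way around the race, sketched here under assumptions rather than taken from the thread, is to poll until the hostname is no longer the image default. The helper name wait_for_real_hostname, its arguments, and the default attempt count and delay are all hypothetical. The sketch also assumes that the first-boot script does not itself block the service that sets the hostname; if it does, polling only delays the failure.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_real_hostname and its arguments are hypothetical.

wait_for_real_hostname() {
    # $1: number of attempts, $2: delay in seconds between attempts,
    # $3: command that prints the current hostname (defaults to hostname).
    local attempts=${1:-30} delay=${2:-2} get_name=${3:-hostname} name i
    for ((i = 0; i < attempts; i++)); do
        name=$("$get_name")
        if [[ -n $name && $name != localhost* ]]; then
            printf '%s\n' "$name"
            return 0
        fi
        sleep "$delay"
    done
    return 1   # still the default name after all attempts
}

# Example use in a first-boot script:
#   if name=$(wait_for_real_hostname) && [[ $name == *compute* ]]; then
#       ...compute-only steps...
#   fi
```

The hostname command is passed in as an argument so that the loop can be exercised with a stub instead of a real boot.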

Comment 18 Saravanan KR 2017-01-24 10:29:28 UTC
(In reply to bigswitch from comment #17)
> (In reply to Saravanan KR from comment #16)
> could you point us to the file that would have the HOSTNAME
> saved, which is probably used by systemd-hostnamed to update the node name
> (cat /etc/hostname during boot gives 'localhost.localdomain', i.e. not
> updated with name)

There is a way to get the hostname by querying the metadata from the undercloud. For example, if you execute "curl -m 10 http://169.254.169.254/openstack/latest/meta_data.json | python -m json.tool" on the overcloud compute node, you get the metadata of that node, in which the "hostname" field gives the actual hostname of the overcloud node. The first-boot script can query this to verify the host before applying the change.

Note that the network may not be ready yet when the query runs, so the query can fail. I have a workaround for that: https://gist.github.com/krsacme/1234bf024ac917c74913827298840c1c
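
A minimal sketch of the metadata query with a simple retry loop for the network-readiness problem. It is not the workaround from the gist; the helper names, the attempt count, and the delay are hypothetical, and the hostname is extracted with a grep/sed shortcut rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and retry defaults are hypothetical.

METADATA_URL=http://169.254.169.254/openstack/latest/meta_data.json

# Default fetcher: the curl call from the comment above, plus -s (quiet)
# and -f (treat HTTP errors as failures).
fetch_metadata() {
    curl -sf -m 10 "$METADATA_URL"
}

# Print the value of the first "hostname" key found in JSON on stdin.
# This is a grep/sed shortcut, not a full JSON parser.
extract_hostname() {
    grep -o '"hostname"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
        sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}

metadata_hostname() {
    # $1: attempts, $2: delay in seconds, $3: fetch command.
    local attempts=${1:-10} delay=${2:-5} fetch=${3:-fetch_metadata} json name i
    for ((i = 0; i < attempts; i++)); do
        if json=$("$fetch") &&
           name=$(printf '%s' "$json" | extract_hostname) &&
           [[ -n $name ]]; then
            printf '%s\n' "$name"
            return 0
        fi
        sleep "$delay"   # network or metadata service not ready yet
    done
    return 1
}

# Example use in a first-boot script:
#   if name=$(metadata_hostname) && [[ $name == *compute* ]]; then
#       ...compute-only steps...
#   fi
```

Because the answer comes from the metadata service rather than from the local hostname, the check does not depend on when systemd-hostnamed updates the node name.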

Comment 19 bigswitch 2017-01-24 22:38:37 UTC
(In reply to Saravanan KR from comment #18)
> There is way to get the hostname via querying the metadata from the
> undercloud. Like, if you execute "curl -m 10
> http://169.254.169.254/openstack/latest/meta_data.json | python -m
> json.tool", from the compute overcloud node, the metadata of the overcloud
> node can be obtained, in which "hostname" field will give the actual host of
> the overcloud node. In the first boot, this can be queried to verify the
> host for applying the change.
> 
> Note, there is an issue in network readiness for the query to be successful,
> I have a workaround for it -
> https://gist.github.com/krsacme/1234bf024ac917c74913827298840c1c

Thank you so much for your inputs Saravanan!

Comment 20 Bob Fournier 2018-08-29 14:24:53 UTC
No code changes are planned for this; the suggested workaround has been provided and tested. Closing.