rhosp-director: overcloud deployment fails with:
Error: modprobe nf_conntrack_proto_sctp returned 1 instead of one of [0]
Error: /Stage[main]/Tripleo::Profile::Base::Kernel/Kmod::Load[nf_conntrack_proto_sctp]/Exec[modprobe nf_conntrack_proto_sctp]/returns

Environment:
openstack-puppet-modules-10.0.0-0.20170315222135.0333c73.el7.1.noarch
instack-undercloud-7.0.0-0.20170503001109.el7ost.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170512193554.el7ost.noarch

Steps to reproduce:
1. Before deploying overcloud, apply workarounds for bugs: #1448482 #1450370 #1452082
2. openstack overcloud deploy --templates --libvirt-type kvm -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml -e {{ templates_dir }}/docker-osp12.yaml -e {{ templates_dir }}/nodes_data.yaml --log-file overcloud_deployment_0.log

Result:
2017-05-24 16:22:01.118 8419 WARNING tripleoclient.plugin [ admin] Waiting for messages on queue '2ed5b8a2-341d-43da-8230-ab40dbcd0f23' with no timeout.
2017-05-24 16:37:16.240 8419 ERROR openstack [ admin] Heat Stack create failed.

(undercloud) [stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 42c6254b-0aeb-4a72-bfc5-475ae33769c7
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
  deploy_stdout: |
    ...
    Notice: /Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[aodh_api]/Tripleo::Firewall::Rule[128 aodh-api]/Firewall[128 aodh-api ipv6]/ensure: created
    Notice: /Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[panko_api]/Tripleo::Firewall::Rule[140 panko-api]/Firewall[140 panko-api ipv4]/ensure: created
    Notice: /Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[panko_api]/Tripleo::Firewall::Rule[140 panko-api]/Firewall[140 panko-api ipv6]/ensure: created
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]/seltype: seltype changed 'etc_t' to 'system_conf_t'
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]/seltype: seltype changed 'etc_t' to 'system_conf_t'
    Notice: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/File[/etc/haproxy/haproxy.cfg]/content: content changed '{md5}1f337186b0e1ba5ee82760cb437fb810' to '{md5}acaf582bcaa099d2974602529bfbd2cc'
    Notice: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/File[/etc/haproxy/haproxy.cfg]/mode: mode changed '0644' to '0640'
    Notice: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/File[/etc/haproxy/haproxy.cfg]/seluser: seluser changed 'unconfined_u' to 'system_u'
    Notice: /Stage[main]/Tripleo::Profile::Base::Haproxy/Exec[haproxy-reload]: Triggered 'refresh' from 1 events
    Notice: Applied catalog in 90.63 seconds
    (truncated, view all with --long)
  deploy_stderr: |
    ...
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.ipv6.conf.default.accept_redirects]/Sysctl[net.ipv6.conf.default.accept_redirects]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.ipv6.conf.default.accept_redirects]/Sysctl_runtime[net.ipv6.conf.default.accept_redirects]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.ipv6.conf.default.autoconf]/Sysctl[net.ipv6.conf.default.autoconf]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.ipv6.conf.default.autoconf]/Sysctl_runtime[net.ipv6.conf.default.autoconf]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.ipv6.conf.default.disable_ipv6]/Sysctl[net.ipv6.conf.default.disable_ipv6]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.ipv6.conf.default.disable_ipv6]/Sysctl_runtime[net.ipv6.conf.default.disable_ipv6]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.netfilter.nf_conntrack_max]/Sysctl[net.netfilter.nf_conntrack_max]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.netfilter.nf_conntrack_max]/Sysctl_runtime[net.netfilter.nf_conntrack_max]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.nf_conntrack_max]/Sysctl[net.nf_conntrack_max]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.nf_conntrack_max]/Sysctl_runtime[net.nf_conntrack_max]: Skipping because of failed dependencies
    (truncated, view all with --long)
overcloud.AllNodesDeploySteps.ControllerDockerConfigJsonStartupDataDeployment:
  resource_type: OS::Heat::SoftwareDeploymentGroup
  physical_resource_id: e8fd8f3e-803e-4fac-a858-fc93e58345c2
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
overcloud.AllNodesDeploySteps.ControllerHostPrepDeployment:
  resource_type: OS::Heat::SoftwareDeploymentGroup
  physical_resource_id: cb760e31-7686-4321-b9d8-ee31ee5c9f36
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted

Further debugging reveals this error:
Error: modprobe nf_conntrack_proto_sctp returned 1 instead of one of [0]
Error: /Stage[main]/Tripleo::Profile::Base::Kernel/Kmod::Load[nf_conntrack_proto_sctp]/Exec[modprobe nf_conntrack_proto_sctp]/returns: change from notrun to 0 failed: modprobe nf_conntrack_proto_sctp returned 1 instead of one of [0]

May 24 20:36:44 overcloud-controller-0.localdomain os-collect-config[1672]: Error: modprobe nf_conntrack_proto_sctp returned 1 instead of one of [0]
May 24 20:36:44 overcloud-controller-0.localdomain os-collect-config[1672]: Error: /Stage[main]/Tripleo::Profile::Base::Kernel/Kmod::Load[nf_conntrack_proto_sctp]/Exec[modprobe nf_conntrack_proto_sctp]/returns: change from notrun to 0 failed: modprobe nf_conntrack_proto_sctp returned 1 instead of one of [0]
May 24 20:36:44 overcloud-controller-0.localdomain os-collect-config[1672]: [2017-05-24 20:36:43,974] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/e292f004-f611-4838-8b54-85a1c5c65482.pp. [6]
Further in the sosreport this can be seen:
Exec[modprobe nf_conntrack_proto_sctp]/returns: modprobe: FATAL: Module nf_conntrack_proto_sctp not found
So we're trying to modprobe a module which isn't installed. This modprobe comes from the OS::TripleO::Services::Kernel composable service, which we do not override in docker.yaml to a containerized variant. This means that it's the same as for baremetal deployments, and it should still be deployed on baremetal. Do we hit this in non-containerized deployments as well? (I'd expect it to be so.) Can you provide the output of find /lib/modules/ -name '*conntrack*' on the overcloud controller machine? Just as points of interest, here are some upstream commits related to the module: where we introduced it [1] and where we renamed it [2].
[1] https://github.com/openstack/tripleo-heat-templates/commit/135cc2962d8cee920ddc4ff9bf9bb373c62ea8c5
[2] https://github.com/openstack/tripleo-heat-templates/commit/e811bb2efc3168ef5bec415c2099dedc8b98afe6
I reproduced that issue!

overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 21f4d96a-5aa6-4866-a2ca-4d69085b0179
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
  deploy_stdout: |
    ...
    Notice: /Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[aodh_api]/Tripleo::Firewall::Rule[128 aodh-api]/Firewall[128 aodh-api ipv6]/ensure: created
    Notice: /Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[panko_api]/Tripleo::Firewall::Rule[140 panko-api]/Firewall[140 panko-api ipv4]/ensure: created
    Notice: /Stage[main]/Tripleo::Firewall/Tripleo::Firewall::Service_rules[panko_api]/Tripleo::Firewall::Rule[140 panko-api]/Firewall[140 panko-api ipv6]/ensure: created
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]/seltype: seltype changed 'etc_t' to 'system_conf_t'
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]/seltype: seltype changed 'etc_t' to 'system_conf_t'
    Notice: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/File[/etc/haproxy/haproxy.cfg]/content: content changed '{md5}1f337186b0e1ba5ee82760cb437fb810' to '{md5}621f346435892e702a2cf4a48934e3cc'
    Notice: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/File[/etc/haproxy/haproxy.cfg]/mode: mode changed '0644' to '0640'
    Notice: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/File[/etc/haproxy/haproxy.cfg]/seluser: seluser changed 'unconfined_u' to 'system_u'
    Notice: /Stage[main]/Tripleo::Profile::Base::Haproxy/Exec[haproxy-reload]: Triggered 'refresh' from 1 events
    Notice: Applied catalog in 86.60 seconds
    (truncated, view all with --long)
  deploy_stderr: |
    ...
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.ipv6.conf.default.accept_redirects]/Sysctl[net.ipv6.conf.default.accept_redirects]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.ipv6.conf.default.accept_redirects]/Sysctl_runtime[net.ipv6.conf.default.accept_redirects]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.ipv6.conf.default.autoconf]/Sysctl[net.ipv6.conf.default.autoconf]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.ipv6.conf.default.autoconf]/Sysctl_runtime[net.ipv6.conf.default.autoconf]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.ipv6.conf.default.disable_ipv6]/Sysctl[net.ipv6.conf.default.disable_ipv6]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.ipv6.conf.default.disable_ipv6]/Sysctl_runtime[net.ipv6.conf.default.disable_ipv6]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.netfilter.nf_conntrack_max]/Sysctl[net.netfilter.nf_conntrack_max]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.netfilter.nf_conntrack_max]/Sysctl_runtime[net.netfilter.nf_conntrack_max]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.nf_conntrack_max]/Sysctl[net.nf_conntrack_max]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Profile::Base::Kernel/Sysctl::Value[net.nf_conntrack_max]/Sysctl_runtime[net.nf_conntrack_max]: Skipping because of failed dependencies
    (truncated, view all with --long)
overcloud.AllNodesDeploySteps.ControllerHostPrepDeployment:
  resource_type: OS::Heat::SoftwareDeploymentGroup
  physical_resource_id: 4a50f3df-f71f-44ea-8d54-e5f3f5a672df
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted

Heat Stack create failed.
Output of:
[heat-admin@overcloud-controller-0 ~]$ find /lib/modules/ -name '*conntrack*'
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_sip.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_amanda.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_broadcast.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_pptp.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_netlink.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_ftp.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_h323.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_netbios_ns.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_tftp.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_sane.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/xt_conntrack.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_proto_gre.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_irc.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/netfilter/nf_conntrack_snmp.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/ipv4/netfilter/nf_conntrack_ipv4.ko.xz
/lib/modules/3.10.0-663.el7.x86_64/kernel/net/ipv6/netfilter/nf_conntrack_ipv6.ko.xz
So indeed it looks like we're missing the kernel module that is being pulled in by puppet/services/kernel.yaml, line 68 (https://github.com/openstack/tripleo-heat-templates/blob/ede56b7a8e2db78513b64996b0a0f5a1ce1904db/puppet/services/kernel.yaml#L68). This is very unlikely to have anything to do with containers; I think we'll need the networking folks to assess the BZ further (whether we need the module, etc.).
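For anyone repeating this check, the find above can be narrowed to a single module name. The following is only a minimal sketch: the function name module_file_present is hypothetical, and it assumes module files are named <module>.ko with an optional compression suffix, as in the listing above.

```shell
#!/bin/sh
# module_file_present MODULES_ROOT MODULE
# Returns 0 if a file named MODULE.ko or MODULE.ko.<suffix> exists under
# MODULES_ROOT, non-zero otherwise. (Hypothetical helper, POSIX sh + find.)
module_file_present() {
    modules_root=$1
    module=$2
    found=$(find "$modules_root" -type f \
        \( -name "${module}.ko" -o -name "${module}.ko.*" \) 2>/dev/null | head -n 1)
    [ -n "$found" ]
}

# Example use on a node:
#   module_file_present "/lib/modules/$(uname -r)" nf_conntrack_proto_sctp \
#       || echo "no module file for nf_conntrack_proto_sctp"
```

Note that a missing file only says the module is not shipped as a loadable module; it does not tell whether the feature is absent or compiled into the kernel.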
It seems that this 3.10.0-663.el7.x86_64 RHEL 7 kernel isn't compiled with SCTP connection tracking support. Does Red Hat support SCTP connection tracking? Why do we treat SCTP specially?
Hi! Any news?
@Itzik, this was originally introduced upstream as ip_conntrack_proto_sctp and later changed to nf_conntrack_proto_sctp. Was either version tested on a RHEL system?
Adding urgent: this breaks osp12-downstream-container-ci.
Checking the kernel configuration, SCTP was built in the kernel and not as a module:
[heat-admin@overcloud-controller-0 ~]$ grep CONFIG_NF_CT_PROTO_SCTP /boot/config-3.10.0-663.el7.x86_64
CONFIG_NF_CT_PROTO_SCTP=y
I'm leaning toward a wrong assumption inside a puppet module.
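The same grep can be wrapped in a small helper that names the three possible states of a config option. This is a rough sketch rather than anything shipped with TripleO; the function name kconfig_state is hypothetical, and the config path in the usage comment is the one from the output above.

```shell
#!/bin/sh
# kconfig_state CONFIG_FILE OPTION
# Prints "builtin" when OPTION=y, "module" when OPTION=m, and "unset"
# otherwise (including when CONFIG_FILE cannot be read).
# (Hypothetical helper name; only grep and POSIX sh are assumed.)
kconfig_state() {
    config_file=$1
    option=$2
    if grep -q "^${option}=y\$" "$config_file" 2>/dev/null; then
        echo builtin
    elif grep -q "^${option}=m\$" "$config_file" 2>/dev/null; then
        echo module
    else
        echo unset
    fi
}

# Example use on a node:
#   kconfig_state /boot/config-3.10.0-663.el7.x86_64 CONFIG_NF_CT_PROTO_SCTP
```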
I looked at the kernel packaging, and it seems the system in use is RHEL 7.4, which differs from CentOS 7.3. As per the commit, built-in support for SCTP conntrack was enabled in RHEL 7.4; see bug 1387537.
(In reply to Martin André from comment #10)
> Checking the kernel configuration, SCTP was built in the kernel and not as a
> module:
>
> [heat-admin@overcloud-controller-0 ~]$ grep CONFIG_NF_CT_PROTO_SCTP
> /boot/config-3.10.0-663.el7.x86_64
> CONFIG_NF_CT_PROTO_SCTP=y
>
> I'm leaning toward a wrong assumption inside a puppet module.

Yes, it seems for 7.3 we need to load the module but we must not load it for 7.4 as it's a built-in.
As a workaround for RHEL 7.4 you can delete the line from t-h-t that requires the nf_conntrack_proto_sctp module: http://paste.openstack.org/show/610993/ Ideally we should make the upstream puppet module not fail if the requested module is built in the kernel, either in puppet-tripleo [1] or in the upstream kmod module [2].
[1] https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/kernel.pp
[2] https://github.com/camptocamp/puppet-kmod
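The "do not fail when the feature is built in" idea can be illustrated in plain shell by consulting the kernel config before calling modprobe. This is a sketch under assumptions, not the upstream puppet-tripleo or puppet-kmod change: the function name load_unless_builtin and the MODPROBE override variable are hypothetical, and it assumes the config option (CONFIG_NF_CT_PROTO_SCTP here) reliably reflects whether the feature is compiled in.

```shell
#!/bin/sh
# load_unless_builtin CONFIG_FILE OPTION MODULE
# Skips modprobe when OPTION=y in CONFIG_FILE (feature compiled into the
# kernel); otherwise tries to load MODULE and returns modprobe's exit status.
# MODPROBE may be overridden (for example MODPROBE=echo) for a dry run.
# (Hypothetical helper; the names are illustrative only.)
load_unless_builtin() {
    config_file=$1
    option=$2
    module=$3
    if grep -q "^${option}=y\$" "$config_file" 2>/dev/null; then
        echo "skip ${module}: ${option} is built in"
        return 0
    fi
    ${MODPROBE:-modprobe} "$module"
}

# Example use on a node (run as root):
#   load_unless_builtin "/boot/config-$(uname -r)" \
#       CONFIG_NF_CT_PROTO_SCTP nf_conntrack_proto_sctp
```

With this shape, a 7.4 kernel reporting CONFIG_NF_CT_PROTO_SCTP=y returns success without touching modprobe, while a kernel that ships the module still gets a real load attempt and a real failure if the load does not work.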
(In reply to Martin André from comment #13)
> As a workaround for RHEL 7.4 you can delete the line from t-h-t that
> requires the nf_conntrack_proto_sctp module:
>
> http://paste.openstack.org/show/610993/
>
> Ideally we should make the upstream puppet module not fail if the requested
> module is built in the kernel, either in puppet-tripleo [1] or in the
> upstream kmod module [2].

I think that "ideally" is setting a low bar here; I wouldn't consider this bug fixed unless TripleO deployed successfully on both 7.3 and 7.4.

> [1] https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/kernel.pp
> [2] https://github.com/camptocamp/puppet-kmod
(In reply to Martin André from comment #13)
> As a workaround for RHEL 7.4 you can delete the line from t-h-t that
> requires the nf_conntrack_proto_sctp module:
>
> http://paste.openstack.org/show/610993/

Tested: the workaround above is valid.
(In reply to Brent Eagles from comment #8)
> @Itzik, this was originally introduced upstream as ip_conntrack_proto_sctp
> and later changed to nf_conntrack_proto_sctp. Was either version tested on a
> RHEL system?

Yes, it was tested on RHEL. See BZ#1426912.
The puppet change likely also needs to be backported to 11.
TL;DR summary:
The RHEL 7.4 kernel moved nf_conntrack_proto_sctp from a module into the kernel.
This causes puppet-kmod to fail when testing with the RHEL 7.4 kernel.
The current working approach is to guard loading this module, or to ignore the failure, possibly in puppet-tripleo.
An easy workaround is to remove the entry from tripleo-heat-templates; see comment #c15.
The upstream puppet-tripleo patch looks promising.
Verified:
Environment: puppet-tripleo-7.1.1-0.20170615141731.el7ost.noarch
The reported issue doesn't reproduce.
Or, we know OSP started loading the sctp module in earlier Director versions. This means that once those versions attempt to run RHEL 7.4 we'll hit the same issue. Can you please backport the fix to every appropriate OSP version?
No problem, will start working on that.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462