OC deployment with ipv6+vlan fails: invalid header field value \\"oci runtime error: container_linux.go:247 Environment: openstack-tripleo-heat-templates-7.0.3-18.el7ost.noarch openstack-puppet-modules-11.0.0-1.el7ost.noarch instack-undercloud-7.4.3-5.el7ost.noarch Steps to reproduce: Attempt a deploy with ipv6: openstack overcloud deploy --templates \ --libvirt-type kvm \ -e /home/stack/templates/nodes_data.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \ -e /home/stack/virt/network/network-environment-v6.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-tls.yaml \ -e /home/stack/virt/public_vip.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \ -e /home/stack/inject-trust-anchor-hiera.yaml \ -e /home/stack/rhos12.yaml Looking for errors in heat, I see the following: \"Error running ['docker', 'run', '--name', 'rabbitmq_image_tag', '--label', 'config_id=tripleo_step1', '--label', 'container_name=rabbitmq_image_tag', '--label', 'managed_by=paunch', '--label', 'config_data={\\"start_order\\": 1, \\"command\\": [\\"/bin/bash\\", \\"-c\\", \\"/usr/bin/docker tag \\'192.168.24.1:8787/rhosp12/openstack-rabbitmq:12.0-20171201.1\\' \\'192.168.24.1:8787/rhosp12/openstack-rabbitmq:pcmklatest\\'\\"], \\"user\\": \\"root\\", \\"volumes\\": [\\"/etc/hosts:/etc/hosts:ro\\", \\"/etc/localtime:/etc/localtime:ro\\", \\"/dev/shm:/dev/shm:rw\\", \\"/etc/sysconfig/docker:/etc/sysconfig/docker:ro\\", \\"/usr/bin:/usr/bin:ro\\", \\"/var/run/docker.sock:/var/run/docker.sock:rw\\"], \\"image\\": \\"192.168.24.1:8787/rhosp12/openstack-rabbitmq:12.0-20171201.1\\", \\"detach\\": false, \\"net\\": \\"host\\"}', '--net=host', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/dev/shm:/dev/shm:rw', 
'--volume=/etc/sysconfig/docker:/etc/sysconfig/docker:ro', '--volume=/usr/bin:/usr/bin:ro', '--volume=/var/run/docker.sock:/var/run/docker.sock:rw', '192.168.24.1:8787/rhosp12/openstack-rabbitmq:12.0-20171201.1', '/bin/bash', '-c', \\"/usr/bin/docker tag '192.168.24.1:8787/rhosp12/openstack-rabbitmq:12.0-20171201.1' '192.168.24.1:8787/rhosp12/openstack-rabbitmq:pcmklatest'\\"]. [125]\", \"/usr/bin/docker-current: Error response from daemon: invalid header field value \\"oci runtime error: container_linux.go:247: starting container process caused \\\\"process_linux.go:258: applying cgroup configuration for process caused \\\\\\\\"write /sys/fs/cgroup/pids/system.slice/docker-0642d71adf65f90fac83693d33be8857e9b1c4a5c69254357ea04fdeadf10c49.scope/cgroup.procs: no such device\\\\\\\\"\\\\"\\n\\".\", Checking the logs on controller I see the following error message: Dec 06 22:26:25 overcloud-controller-0 oci-umount[33118]: umounthook <error>: 3fa2cdcfe1e6: Failed to read directory /usr/share/oci-umount/oci-umount.d: No such file or directory
The "umounthook <error>: 40d5622b04b3: Failed to read directory /usr/share/oci-umount/oci-umount.d: No such file or directory" error is certainly concerning, but I do not think it is the cause of the issue we're seeing, since it also shows up on "healthy" nodes. According to https://github.com/moby/moby/issues/17653 there seems to be a known issue with the systemd cgroups driver, and it is now recommended to use the cgroupfs driver (with native.cgroupdriver=cgroupfs), but I have absolutely no idea of the implications. Some more info:
[heat-admin@overcloud-controller-2 ~]$ sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0fc1ea1995c2 192.168.24.1:8787/rhosp12/openstack-mariadb:12.0-20171201.1 "/bin/bash -c '/usr/b" 10 hours ago Exited (0) 10 hours ago mysql_image_tag
d953b2a5373b 192.168.24.1:8787/rhosp12/openstack-memcached:12.0-20171201.1 "/bin/bash -c 'source" 10 hours ago Up 10 hours memcached
fa989369dbf3 192.168.24.1:8787/rhosp12/openstack-haproxy:12.0-20171201.1 "/bin/bash -c '/usr/b" 10 hours ago Exited (0) 10 hours ago haproxy_image_tag
cb7c0b3c844c 192.168.24.1:8787/rhosp12/openstack-mariadb:12.0-20171201.1 "bash -ecx 'if [ -e /" 10 hours ago Exited (0) 10 hours ago mysql_bootstrap
ad27bd9c00d5 192.168.24.1:8787/rhosp12/openstack-redis:12.0-20171201.1 "/bin/bash -c '/usr/b" 10 hours ago Exited (0) 10 hours ago redis_image_tag
0642d71adf65 192.168.24.1:8787/rhosp12/openstack-rabbitmq:12.0-20171201.1 "/bin/bash -c '/usr/b" 10 hours ago Created rabbitmq_image_tag
b6933a2f5745 192.168.24.1:8787/rhosp12/openstack-rabbitmq:12.0-20171201.1 "kolla_start" 10 hours ago Exited (0) 10 hours ago rabbitmq_bootstrap
25bea91ba36c 192.168.24.1:8787/rhosp12/openstack-memcached:12.0-20171201.1 "/bin/bash -c 'source" 10 hours ago Exited (0) 10 hours ago memcached_init_logs
a46e74f2f80a 192.168.24.1:8787/rhosp12/openstack-mariadb:12.0-20171201.1 "chown -R mysql: /var" 10 hours ago Exited (0) 10 hours ago mysql_data_ownership
[heat-admin@overcloud-controller-2
~]$ sudo docker logs rabbitmq_image_tag container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"write /sys/fs/cgroup/pids/system.slice/docker-0642d71adf65f90fac83693d33be8857e9b1c4a5c69254357ea04fdeadf10c49.scope/cgroup.procs: no such device\"" [heat-admin@overcloud-controller-2 ~]$ sudo docker info Containers: 9 Running: 1 Paused: 0 Stopped: 8 Images: 18 Server Version: 1.12.6 Storage Driver: overlay2 Backing Filesystem: xfs Native Overlay Diff: true Logging Driver: journald Cgroup Driver: systemd Plugins: Volume: local Network: null host bridge overlay Authorization: rhel-push-plugin Swarm: inactive Runtimes: docker-runc runc Default Runtime: docker-runc Security Options: seccomp Kernel Version: 3.10.0-693.11.1.el7.x86_64 Operating System: Red Hat Enterprise Linux Server 7.4 (Maipo) OSType: linux Architecture: x86_64 Number of Docker Hooks: 3 CPUs: 8 Total Memory: 31.26 GiB Name: overcloud-controller-2 ID: Z7Y7:QR7R:Q35Z:JN57:JT5B:Q4CB:QLKM:4TFY:L54O:P23C:TDJE:4RKR Docker Root Dir: /var/lib/docker Debug Mode (client): false Debug Mode (server): false Registry: https://registry.access.redhat.com/v1/ Insecure Registries: 192.168.24.1:8787 127.0.0.0/8 Registries: registry.access.redhat.com (secure), docker.io (secure)
Got same issue during minor update ... u' "Trying to pull repository 192.168.24.1:8787/rhosp12/openstack-ceilometer-notification-docker ... ", ', u' "12.0-20171201.1: Pulling from 192.168.24.1:8787/rhosp12/openstack-ceilometer-notification-docker", ', u' "243dc7b9e786: Already exists", ', u' "550516fb1c76: Already exists", ', u' "d0b13a963636: Already exists", ', u' "9e15370858a9: Already exists", ', u' "5b5d4699b9fb: Already exists", ', u' "a554773d8409: Pulling fs layer", ', u' "a554773d8409: Verifying Checksum", ', u' "a554773d8409: Download complete", ', u' "a554773d8409: Pull complete", ', u' "Digest: sha256:4c793db2cbaaa8d506e5ae46ce3b3a77f2e8a3230021815f6152e7253bb966fd", ', u' "Error running [\'docker\', \'run\', \'--name\', \'horizon\', \'--label\', \'config_id=tripleo_step3\', \'--label\', \'container_name=horizon\', \'--label\', \'managed_by=paunch\', \'--label\', \'conf ig_data={\\"environment\\": [\\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\\", \\"ENABLE_IRONIC=yes\\", \\"ENABLE_MANILA=yes\\", \\"ENABLE_SAHARA=yes\\", \\"ENABLE_CLOUDKITTY=no\\", \\"ENABLE_FREEZER=no\\", \\"ENABLE_FWA AS=no\\", \\"ENABLE_KARBOR=no\\", \\"ENABLE_DESIGNATE=no\\", \\"ENABLE_MAGNUM=no\\", \\"ENABLE_MISTRAL=no\\", \\"ENABLE_MURANO=no\\", \\"ENABLE_NEUTRON_LBAAS=no\\", \\"ENABLE_SEARCHLIGHT=no\\", \\"ENABLE_SENLIN= no\\", \\"ENABLE_SOLUM=no\\", \\"ENABLE_TACKER=no\\", \\"ENABLE_TROVE=no\\", \\"ENABLE_WATCHER=no\\", \\"ENABLE_ZAQAR=no\\", \\"ENABLE_ZUN=no\\", \\"TRIPLEO_CONFIG_HASH=00aefaf228b0ca7aa445b3952d87fbca\\"], \\"v olumes\\": [\\"/etc/hosts:/etc/hosts:ro\\", \\"/etc/localtime:/etc/localtime:ro\\", \\"/etc/puppet:/etc/puppet:ro\\", \\"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\\", \\"/etc/pki/tls/certs/ca-bu ndle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\\", \\"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\\", \\"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\\", \\"/dev/log:/dev/log \\", 
\\"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\\", \\"/var/lib/kolla/config_files/horizon.json:/var/lib/kolla/config_files/config.json:ro\\", \\"/var/lib/config-data/puppet-generated/horizon/:/var/ lib/kolla/config_files/src:ro\\", \\"/var/log/containers/horizon:/var/log/horizon\\", \\"/var/log/containers/httpd/horizon:/var/log/httpd\\", \\"\\", \\"\\"], \\"image\\": \\"192.168.24.1:8787/rhosp12/openstack- horizon-docker:12.0-20171201.1Waiting for messages on queue '29bb1c58-b3b9-47f3-ac49-b459e0374747' with no timeout. inor update failed with: {u'status': u'FAILED', u'execution': {u'name': u'tripleo.package_update.v1.update_nodes', u'created_at': u'2017-12-07 08:59:23', u'id': u'2182486a-e6eb-4c14-8700-ea7d582c962f', u'params ': {u'namespace': u''}, u'input': {u'inventory_file': u"[undercloud]\nlocalhost\n\n[undercloud:vars]\nusername = admin\novercloud_keystone_url = https://10.0.0.101:13000/v2.0\nproject_name = admin\nundercloud_se rvice_list = ['openstack-nova-compute', 'openstack-heat-engine', 'openstack-ironic-conductor', 'openstack-swift-container', 'openstack-swift-object', 'openstack-mistral-engine']\novercloud_horizon_url = https:// 10.0.0.101:443/dashboard\nos_auth_token = gAAAAABaKQLl-DlR0O8mB56fcrYZ0ruDdEOnpcNw1K58ExR_BJCIDQMZmP7HoqIIzgltYinKc4zxYTUVPzxYfc3dQlmzuYRC4MKH6kajYrYLraqsVsGzB0AfXp-YkUkF2zmHbIuZN2WpWuXY8mUhpdBBKOy-nAIp79c-ykE1E IXSQKNwhywb4c4\novercloud_admin_password = Vd3cvaRM9W2h2UrFpfhERmB9K\nauth_url = https://192.168.24.2:13000/\nansible_connection = local\nundercloud_swift_url = https://192.168.24.2:13808/v1/AUTH_6a052c6cf58a4a6 0a3c6e4519477f42d\nplan = overcloud\n\n[controller-0]\n192.168.24.15\n\n[controller-0:vars]\ndeploy_server_id = fdcf3353-4513-4c23-8607-7143ca197ca9\n\n[controller-1]\n192.168.24.19\n\n[controller-1:vars]\ndeplo y_server_id = 9d97088d-b495-4a31-9c33-51ab0be09eec\n\n[controller-2]\n192.168.24.14\n\n[controller-2:vars]\ndeploy_server_id = 
73bc306c-214a-45c1-8a9b-4ba37e9ab213\n\n[Controller:vars]\nrole_name = Controller\na nsible_ssh_user = heat-admin\nbootstrap_server_id = fdcf3353-4513-4c23-8607-7143ca197ca9\n\n[Controller:children]\ncontroller-0\ncontroller-1\ncontroller-2\n\n[compute-0]\n192.168.24.11\n\n[compute-0:vars]\ndepl oy_server_id = 6836ca7e-3f8e-43ef-91dd-50be4124379c\n\n[compute-1]\n192.168.24.7\n\n[compute-1:vars]\ndeploy_server_id = 5712379a-44ed-4075-ab7e-cb1c53487725\n\n[Compute:vars]\nrole_name = Compute\nansible_ssh_u ser = heat-admin\nbootstrap_server_id = fdcf3353-4513-4c23-8607-7143ca197ca9\n\n[Compute:children]\ncompute-0\ncompute-1\n\n[ceph-0]\n192.168.24.9\n\n[ceph-0:vars]\ndeploy_server_id = a63f8d94-7aad-46b6-8fc2-bd4 ff501bbeb\n\n[ceph-1]\n192.168.24.16\n\n[ceph-1:vars]\ndeploy_server_id = 4e9a327a-1948-47fc-927b-8e58e0d19aff\n\n[ceph-2]\n192.168.24.10\n\n[ceph-2:vars]\ndeploy_server_id = 6fe90fbb-36b8-4fe8-8e79-3b844a7bd8bc \n\n[CephStorage:vars]\nrole_name = CephStorage\nansible_ssh_user = heat-admin\nbootstrap_server_id = fdcf3353-4513-4c23-8607-7143ca197ca9\n\n[CephStorage:children]\nceph-0\nceph-1\nceph-2\n\n[overcloud:children ]\nCephStorage\nCompute\nController\n\n[aodh_evaluator:vars]\nansible_ssh_user = heat-admin\n\n[aodh_evaluator:children]\nController\n\n[kernel:vars]\nansible_ssh_user = heat-admin\n\n[kernel:children]\nCephStor age\nCompute\nController\n\n[neutron_metadata:vars]\nansible_ssh_user = heat-admin\n\n[neutron_metadata:children]\nController\n\n[pacemaker:vars]\nansible_ssh_user = heat-admin\n\n[pacemaker:children]\nControlle r\n\n[nova_placement:vars]\nansible_ssh_user = heat-admin\n\n[nova_placement:children]\nController\n\n[snmp:vars]\nansible_ssh_user = heat-admin\n\n[snmp:children]\nCephStorage\nCompute\nController\n\n[heat_api: vars]\nansible_ssh_user = heat-admin\n\n[heat_api:children]\nController\n\n[cinder_api:vars]\nansible_ssh_user = heat-admin\n\n[cinder_api:children]\nController\n\n[ceph_client:vars]\nansible_ssh_user = heat-adm 
in\n\n[ceph_client:children]\nCompute\n\n[ceph_mon:vars]\nansible_ssh_user = heat-admin\n\n[ceph_mon:children]\nController\n\n[aodh_listener:vars]\nansible_ssh_user = heat-admin\n\n[aodh_listener:children]\nCont roller\n\n[swift_ringbuilder:vars]\nansible_ssh_user = heat-admin\n\n[swift_ringbuilder:children]\nController\n\n[neutron_dhcp:vars]\nansible_ssh_user = heat-admin\n\n[neutron_dhcp:children]\nController\n\n[gnoc chi_api:vars]\nansible_ssh_user = heat-admin\n\n[gnocchi_api:children]\nController\n\n[timezone:vars]\nansible_ssh_user = heat-admin\n\n[timezone:children]\nCephStorage\nCompute\nController\n\n[ceilometer_agent_ central:vars]\nansible_ssh_user = heat-admin\n\n[ceilometer_agent_central:children]\nController\n\n[heat_api_cloudwatch_disabled:vars]\nansible_ssh_user = heat-admin\n\n[heat_api_cloudwatch_disabled:children]\nC ontroller\n\n[aodh_notifier:vars]\nansible_ssh_user = heat-admin\n\n[aodh_notifier:children]\nController\n\n[tripleo_firewall:vars]\nansible_ssh_user = heat-admin\n\n[tripleo_firewall:children]\nCephStorage\nCom pute\nController\n\n[swift_storage:vars]\nansible_ssh_user = heat-admin\n\n[swift_storage:children]\nController\n\n[redis:vars]\nansible_ssh_user = heat-admin\n\n[redis:children]\nController\n\n[gnocchi_statsd:v ars]\nansible_ssh_user = heat-admin\n\n[gnocchi_statsd:children]\nController\n\n[iscsid:vars]\nansible_ssh_user = heat-admin\n\n[iscsid:children]\nCompute\nController\n\n[nova_conductor:vars]\nansible_ssh_user = heat-admin\n\n[nova_conductor:children]\nController\n\n[mysql_client:vars]\nansible_ssh_user = heat-admin\n\n[mysql_client:children]\nCephStorage\nCompute\nController\n\n[nova_consoleauth:vars]\nansible_ssh_use r = heat-admin\n\n[nova_consoleauth:children]\nController\n\n[glance_api:vars]\nansible_ssh_user = heat-admin\n\n[glance_api:children]\nController\n\n[keystone:vars]\nansible_ssh_user = heat-admin\n\n[keystone:c hildren]\nController\n\n[cinder_volume:vars]\nansible_ssh_user = 
heat-admin\n\n[cinder_volume:children]\nController\n\n[ceilometer_collector_disabled:vars]\nansible_ssh_user = heat-admin\n\n[ceilometer_collector _disabled:children]\nController\n\n[ceilometer_agent_notification:vars]\nansible_ssh_user = heat-admin\n\n[ceilometer_agent_notification:children]\nController\n\n[memcached:vars]\nansible_ssh_user = heat-admin\n \n[memcached:children]\nController\n\n[haproxy:vars]\nansible_ssh_user = heat-admin\n\n[haproxy:children]\nController\n\n[mongodb_disabled:vars]\nansible_ssh_user = heat-admin\n\n[mongodb_disabled:children]\nCon troller\n\n[neutron_plugin_ml2:vars]\nansible_ssh_user = heat-admin\n\n[neutron_plugin_ml2:children]\nCompute\nController\n\n[nova_api:vars]\nansible_ssh_user = heat-admin\n\n[nova_api:children]\nController\n\n[ aodh_api:vars]\nansible_ssh_user = heat-admin\n\n[aodh_api:children]\nController\n\n[nova_metadata:vars]\nansible_ssh_user = heat-admin\n\n[nova_metadata:children]\nController\n\n[heat_engine:vars]\nansible_ssh_ user = heat-admin\n\n[heat_engine:children]\nController\n\n[ntp:vars]\nansible_ssh_user = heat-admin\n\n[ntp:children]\nCephStorage\nCompute\nController\n\n[ceilometer_expirer_disabled:vars]\nansible_ssh_user = heat-admin\n\n[ceilometer_expirer_disabled:children]\nController\n\n[ceilometer_api_disabled:vars]\nansible_ssh_user = heat-admin\n\n[ceilometer_api_disabled:children]\nController\n\n[nova_migration_target:vars] \nansible_ssh_user = heat-admin\n\n[nova_migration_target:children]\nCompute\n\n[cinder_scheduler:vars]\nansible_ssh_user = heat-admin\n\n[cinder_scheduler:children]\nController\n\n[gnocchi_metricd:vars]\nansibl e_ssh_user = heat-admin\n\n[gnocchi_metricd:children]\nController\n\n[tripleo_packages:vars]\nansible_ssh_user = heat-admin\n\n[tripleo_packages:children]\nCephStorage\nCompute\nController\n\n[nova_scheduler:vars]\nansible_ssh_user = heat-admin\n\n[nova_scheduler:children]\nController\n\n[nova_compute:vars]\nansible_ssh_user = 
heat-admin\n\n[nova_compute:children]\nCompute\n\n[ceph_osd:vars]\nansible_ssh_user = heat-admin\n\n[ceph_osd:children]\nCephStorage\n\n[logrotate_crond:vars]\nansible_ssh_user = heat-admin\n\n[logrotate_crond:children]\nCephStorage\nCompute\nController\n\n[neutron_ovs_agent:vars]\nansible_ssh_user = heat-admin\n\n[neutron_ovs_agent:children]\nCompute\nController\n\n[swift_proxy:vars]\nansible_ssh_user = heat-admin\n\n[swift_proxy:children]\nController\n\n[sshd:vars]\nansible_ssh_user = heat-admin\n\n[sshd:children]\nCephStorage\nCompute\nController\n\n[mysql:vars]\nansible_ssh_user = heat-admin\n\n[mysql:children]\nController\n\n[ceilometer_agent_compute:vars]\nansible_ssh_user = heat-admin\n\n[ceilometer_agent_compute:children]\nCompute\n\n[neutron_l3:vars]\nansible_ssh_user = heat-admin\n\n[neutron_l3:children]\nController\n\n[nova_libvirt:vars]\nansible_ssh_user = heat-admin\n\n[nova_libvirt:children]\nCompute\n\n[rabbitmq:vars]\nansible_ssh_user = heat-admin\n\n[rabbitmq:children]\nController\n\n[tuned:vars]\nansible_ssh_user = heat-admin\n\n[tuned:children]\nCephStorage\nCompute\nController\n\n[panko_api:vars]\nansible_ssh_user = heat-admin\n\n[panko_api:children]\nController\n\n[horizon:vars]\nansible_ssh_user = heat-admin\n\n[horizon:children]\nController\n\n[neutron_api:vars]\nansible_ssh_user = heat-admin\n\n[neutron_api:children]\nController\n\n[ca_certs:vars]\nansible_ssh_user = heat-admin\n\n[ca_certs:children]\nCephStorage\nCompute\nController\n\n[heat_api_cfn:vars]\nansible_ssh_user = heat-admin\n\n[heat_api_cfn:children]\nController\n\n[docker:vars]\nansible_ssh_user = heat-admin\n\n[docker:children]\nCephStorage\nCompute\nController\n\n[nova_vnc_proxy:vars]\nansible_ssh_user = heat-admin\n\n[nova_vnc_proxy:children]\nController\n\n[clustercheck:vars]\nansible_ssh_user = heat-admin\n\n[clustercheck:children]\nController\n\n", u'queue_name': u'29bb1c58-b3b9-47f3-ac49-b459e0374747', u'playbook': u'update_steps_playbook.yaml', 
u'ansible_extra_env_variables': {u'ANSIBLE_HOST_KEY_CHECKING': u'False'}, u'module_path': u'/usr/share/ansible-modules', u'nodes': u'Controller', u'node_user': u'heat-admin', u'ansible_queue_name': u'update'}, u'spec': {u'tasks': {u'node_update': {u'name': u'node_update', u'on-error': u'node_update_failed', u'on-success': [{u'node_update_passed': u'<% task().result.returncode = 0 %>'}, {u'node_update_failed': u'<% task().result.returncode != 0 %>'}], u'publish': {u'output': u'<% task(node_update).result %>'}, u'version': u'2.0', u'action': u'tripleo.ansible-playbook', u'input': {u'remote_user': u'<% $.node_user %>', u'become_user': u'root', u'ssh_private_key': u'<% $.private_key %>', u'verbosity': 0, u'queue_name': u'<% $.ansible_queue_name %>', u'extra_env_variables': u'<% $.ansible_extra_env_variables %>', u'inventory': u'<% $.inventory_file %>', u'module_path': u'<% $.module_path %>', u'become': True, u'limit_hosts': u'<% $.nodes %>', u'playbook': u'<% $.tmp_path %>/<% $.playbook %>'}, u'type': u'direct'}, u'get_private_key': {u'name': u'get_private_key', u'on-success': u'node_update', u'publish': {u'private_key': u'<% task(get_private_key).result %>'}, u'version': u'2.0', u'action': u'tripleo.validations.get_privkey', u'type': u'direct'}, u'node_update_failed': {u'version': u'2.0', u'type': u'direct', u'name': u'node_update_failed', u'publish': {u'status': u'FAILED', u'message': u'Failed to update nodes - <% $.nodes %>, please see the logs.'}, u'on-success': u'notify_zaqar'}, u'node_update_passed': {u'version': u'2.0', u'type': u'direct', u'name': u'node_update_passed', u'publish': {u'status': u'SUCCESS', u'message': u'Updated nodes - <% $.nodes %>'}, u'on-success': u'notify_zaqar'}, u'notify_zaqar': {u'retry': u'count=5 delay=1', u'name': u'notify_zaqar', u'on-success': [{u'fail': u'<% $.get(\'status\') = "FAILED" %>'}], u'version': u'2.0', u'action': u'zaqar.queue_post', u'input': {u'queue_name': u'<% $.queue_name %>', u'messages': {u'body': {u'type': 
u'tripleo.package_update.v1.update_nodes', u'payload': {u'status': u'<% $.status %>', u'execution': u'<% execution() %>'}}}}, u'type': u'direct'}, u'download_config': {u'name': u'download_config', u'on-error': u'node_update_failed', u'on-success': u'get_private_key', u'publish': {u'tmp_path': u'<% task(download_config).result %>'}, u'version': u'2.0', u'action': u'tripleo.config.download_config', u'type': u'direct'}}, u'name': u'update_nodes', u'tags': [u'tripleo-common-managed'], u'version': u'2.0', u'input': [{u'node_user': u'heat-admin'}, u'nodes', u'playbook', u'inventory_file', {u'queue_name': u'tripleo'}, {u'ansible_queue_name': u'tripleo'}, {u'module_path': u'/usr/share/ansible-modules'}, {u'ansible_extra_env_variables': {u'ANSIBLE_HOST_KEY_CHECKING': u'False'}}], u'description': u'Take a container and perform an update nodes by nodes'}}} \\", \\"net\\": \\"host\\", \\"restart\\": \\"always\\", \\"privileged\\": false}\', \'--detach=true\', \'--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\', \'--env=ENABLE_IRONIC=yes\', \'--env=ENABLE_MANILA=yes\', \'--env=ENABLE_SAHARA=yes\', \'--env=ENABLE_CLOUDKITTY=no\', \'--env=ENABLE_FREEZER=no\', \'--env=ENABLE_FWAAS=no\', \'--env=ENABLE_KARBOR=no\', \'--env=ENABLE_DESIGNATE=no\', \'--env=ENABLE_MAGNUM=no\', \'--env=ENABLE_MISTRAL=no\', \'--env=ENABLE_MURANO=no\', \'--env=ENABLE_NEUTRON_LBAAS=no\', \'--env=ENABLE_SEARCHLIGHT=no\', \'--env=ENABLE_SENLIN=no\', \'--env=ENABLE_SOLUM=no\', \'--env=ENABLE_TACKER=no\', \'--env=ENABLE_TROVE=no\', \'--env=ENABLE_WATCHER=no\', \'--env=ENABLE_ZAQAR=no\', \'--env=ENABLE_ZUN=no\', \'--env=TRIPLEO_CONFIG_HASH=00aefaf228b0ca7aa445b3952d87fbca\', \'--net=host\', \'--privileged=false\', \'--restart=always\', \'--volume=/etc/hosts:/etc/hosts:ro\', \'--volume=/etc/localtime:/etc/localtime:ro\', \'--volume=/etc/puppet:/etc/puppet:ro\', \'--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\', \'--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\', 
\'--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\', \'--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\', \'--volume=/dev/log:/dev/log\', \'--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\', \'--volume=/var/lib/kolla/config_files/horizon.json:/var/lib/kolla/config_files/config.json:ro\', \'--volume=/var/lib/config-data/puppet-generated/horizon/:/var/lib/kolla/config_files/src:ro\', \'--volume=/var/log/containers/horizon:/var/log/horizon\', \'--volume=/var/log/containers/httpd/horizon:/var/log/httpd\', \'192.168.24.1:8787/rhosp12/openstack-horizon-docker:12.0-20171201.1\']. [125]", ', u' "stdout: 6b51b137c2539503f5c7403600390a32fe5ad1b4ff3551349d80ae235353923e", ', u' "stderr: /usr/bin/docker-current: Error response from daemon: invalid header field value \\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\"process_linux.go:258: applying cgroup configuration for process caused \\\\\\\\\\\\\\"write /sys/fs/cgroup/pids/system.slice/docker-6b51b137c2539503f5c7403600390a32fe5ad1b4ff3551349d80ae235353923e.scope/cgroup.procs: no such device\\\\\\\\\\\\\\"\\\\\\"\\\\n\\".", ', u' "stdout: 0b5d54fd2499da7608c08b3c834452266b22dd76c0d472260bf4c496b230335e", ', u' "stderr: Unable to find image \'192.168.24.1:8787/rhosp12/openstack-swift-account-docker:12.0-20171201.1\' locally", ', docker logs horizon container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"write /sys/fs/cgroup/pids/system.slice/docker-6b51b137c2539503f5c7403600390a32fe5ad1b4ff3551349d80ae235353923e.scope/cgroup.procs: no such device\"" Packages: ------------------------------------------------------- docker-rhel-push-plugin-1.12.6-68.gitec8512b.el7.x86_64 python-docker-pycreds-1.10.6-3.el7.noarch docker-common-1.12.6-68.gitec8512b.el7.x86_64 python-heat-agent-docker-cmd-1.4.0-1.el7ost.noarch docker-client-1.12.6-68.gitec8512b.el7.x86_64 
docker-1.12.6-68.gitec8512b.el7.x86_64 python-docker-py-1.10.6-3.el7.noarch libcgroup-0.41-13.el7.x86_64 libcgroup-tools-0.41-13.el7.x86_64
Not sure if related, but while playing with docker I ran into the same error when I tried to start a container from an image built with a 'VOLUME /' instruction. "[root@undercloud-0 docker]# docker run -d --name nisim exportedroot 26c65ccd8e6172185260f24899e1cf59b2e55a9913df427f107229378cb74216 /usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"open /proc/self/fd: no such file or directory\"\n". " Once I set the VOLUME to a particular directory inside the image and rebuilt it, I was able to run containers from it.
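A minimal sketch of that reproducer. The base image name is an arbitrary assumption, and the build/run commands are left as comments since they need a docker daemon; the snippet itself only generates the two Dockerfile variants described above:

```shell
# Sketch of the 'VOLUME /' reproducer (base image is an assumption).
# Building requires a docker daemon, so only the files are generated here.
work=$(mktemp -d)

# The variant that failed to start: a volume over the whole root fs.
cat > "$work/Dockerfile.bad" <<'EOF'
FROM registry.access.redhat.com/rhel7
VOLUME /
EOF

# The variant that worked: the volume scoped to a specific directory.
cat > "$work/Dockerfile.good" <<'EOF'
FROM registry.access.redhat.com/rhel7
VOLUME /data
EOF

# docker build -t exportedroot -f "$work/Dockerfile.good" "$work"
# docker run -d --name nisim exportedroot
grep '^VOLUME' "$work"/Dockerfile.*
```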
Hello Dan, we'll need help from your team here. We've seen containers failing with: container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"write /sys/fs/cgroup/pids/system.slice/docker-0642d71adf65f90fac83693d33be8857e9b1c4a5c69254357ea04fdeadf10c49.scope/cgroup.procs: no such device\"" According to https://github.com/moby/moby/issues/17653 this is a known issue with the systemd cgroups driver, and people recommend using the cgroupfs driver instead. Do you think that is a valid workaround? Is there anything we need to know if we switch from the systemd driver to cgroupfs?
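For reference, if we did switch drivers, the change on a RHEL 7 / docker 1.12 node would presumably look like the sketch below. The OPTIONS line is an assumed stock layout of /etc/sysconfig/docker, and the edit is done on a scratch copy so the snippet is side-effect free; on a real node you would edit the file in place and restart docker:

```shell
# Hedged sketch: switching docker 1.12 to the cgroupfs cgroup driver.
# A scratch copy stands in for /etc/sysconfig/docker (assumed contents).
conf=$(mktemp)
cat > "$conf" <<'EOF'
OPTIONS='--selinux-enabled --log-driver=journald'
EOF

# docker 1.12 takes the driver via --exec-opt native.cgroupdriver=...
sed -i "s|^OPTIONS='\(.*\)'|OPTIONS='\1 --exec-opt native.cgroupdriver=cgroupfs'|" "$conf"

grep 'cgroupdriver' "$conf"
```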
Raising severity/priority, as this bug was also reported by CI_TEAM in a non-IPv6 VLAN environment. Possible duplicate bug reported -> https://bugzilla.redhat.com/show_bug.cgi?id=1525229
*** Bug 1525229 has been marked as a duplicate of this bug. ***
Looks similar to https://github.com/openshift/origin/issues/16246 Adding Vikas also
python-docker-py is being pulled from RHEL for the overcloud-full images: python-docker-py-1.10.6-3.el7
Vikas took a look and basically said: 1) this is a very rare race condition (see https://github.com/openshift/origin/issues/16246) that has only ever been reported as reproduced in CI; 2) they are not yet sure of a fix, so it will be some time before any fix is available in RHEL. Based on that, I think we need to proceed, but with very clear advice for customers on what to do next if they hit this.
Note, in bug #1514511, it looks like Marian Krcmarik encountered this issue when trying to reproduce the other systemd/containers issue: https://bugzilla.redhat.com/show_bug.cgi?id=1514511#c7
Let's disable oci-register-machine for now. Set in /etc/oci-register-machine.conf: # Disable oci-register-machine by setting the disabled field to true disabled : true This will stop it from running and failing. It is only needed if you are running systemd in a container, and even in that case it is not strictly needed. You could also just remove the oci-register-machine package from the host.
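Applying that workaround is just writing the two lines into the config file. Sketched here against a scratch path so it is side-effect free; on an overcloud node the target is /etc/oci-register-machine.conf, and per a later comment in this thread the file is re-read for every container, so no service restart should be needed:

```shell
# Sketch of the oci-register-machine workaround from the comment above.
# A temp file stands in for /etc/oci-register-machine.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Disable oci-register-machine by setting the disabled field to true
disabled : true
EOF

grep 'disabled' "$conf"
```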
oci-register-machine-0-3.14.gitcd1e331.el7_4 from bug #1514511 disables oci-register-machine If I understand Dan correctly, that's our fix ...
Wait ... we also disabled it in puppet-tripleo already: https://code.engineering.redhat.com/gerrit/#/c/124023/ and this fix should be in puppet-tripleo-7.4.3-9.el7ost and later. https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=627532
And yet, according to the logs in: https://bugzilla.redhat.com/show_bug.cgi?id=1525229#c1 we have apparently reproduced this with puppet-tripleo-7.4.3-11.el7ost.noarch
also confirmed in https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-network-neutron-12_director-rhel-virthost-3cont_2comp-ipv4-vxlan/122/artifact/ that /etc/oci-register-machine.conf contains: # Disable oci-register-machine by setting the disabled field to true disabled : true
(In reply to Daniel Walsh from comment #15) > Lets disable oci-register-machine for now. > > Set > /etc/oci-register-machine.conf > # Disable oci-register-machine by setting the disabled field to true > disabled : true > > Which will stop it from running and failing. > > This is only needed if you are running systemd in a container. Even in that > case it is not fully needed. > > You could also just remove oci-register-machine package from the host. In our environment we are already setting /etc/oci-register-machine.conf's disabled: true. In order to be able to remove the oci-register-machine package entirely we'd need to update the RPM deps. Currently I get this: error: Failed dependencies: oci-register-machine >= 1:0-3.10 is needed by (installed) docker-common-2:1.12.6-61.git85d7426.el7.x86_64
Ah. So now I'm wondering if perhaps the missing piece here on our side is a 'systemctl restart systemd-machined'. The puppet-tripleo patch currently would restart only the docker service, not systemd-machined.
Could you confirm if we need to restart anything along with the /etc/oci-register-machine.conf settings change? systemctl restart systemd-machined? ---- Also, see the RPM dependency issue with regards to removing the oci-register-machine package.
No, the configuration for oci-register-machine is read for every container; you should not need to restart any services.
Here is the detailed analysis of this issue https://github.com/openshift/origin/issues/16246#issuecomment-355852817
https://github.com/lnykryn/systemd-rhel/issues/180
Awesome, bug #1532586 is the underlying systemd bug but we think we don't need this fix because we have disabled oci-register-machine already
It is reproduced upstream as a promotion blocker https://bugs.launchpad.net/tripleo/+bug/1744954
It's not a promotion blocker (we promoted this morning), but it is definitely a bug (possibly a race condition).
We are having a similar, if not the same, situation in the TripleO gate at this time: https://bugs.launchpad.net/tripleo/+bug/1746298 The issue is critical, as it makes our jobs fail randomly and blocks the OSP13 production chain at this time. Note that oci-register-machine is already disabled. AFAIU the comments, we shouldn't need the systemd fix, but can someone confirm?
Dan, thanks for disabling the hook by default: https://github.com/projectatomic/oci-register-machine/commit/66691c3d0805c41e7336a364934e3e144e97a20f but it seems we still hit a similar issue at this time, see https://bugs.launchpad.net/tripleo/+bug/1746298 A full journal can be found here: http://logs.openstack.org/46/538346/1/gate/tripleo-ci-centos-7-scenario004-multinode-oooq-container/c0e6264/logs/subnode-2/var/log/journal.txt.gz (the failures can be found by grepping "oci runtime error"). I was wondering if we missed something and if you could help. Thanks a lot
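For anyone triaging the linked journal dump, the failures can be pulled out with a zgrep like the one below. To keep the snippet self-contained, a fabricated sample line is gzipped into a temp file; against the real logs you would zgrep the downloaded journal.txt.gz directly:

```shell
# Sketch: locating the failures in a gzipped journal dump.
# The sample line is fabricated so the snippet runs stand-alone.
log=$(mktemp --suffix=.txt.gz)
printf '%s\n' \
  'docker-current: Error response from daemon: invalid header field value "oci runtime error: ..."' \
  | gzip > "$log"

# Same pattern the comment above suggests grepping for.
zgrep -c 'oci runtime error' "$log"
```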
For some reason, another bug report was created on Launchpad: https://bugs.launchpad.net/tripleo/+bug/1744954 One patch was proposed, but I'm not sure it will really help: https://review.openstack.org/539537 Note: we don't have the latest version of oci-register-machine in TripleO CI yet.
I realized I wanted Dan's thoughts but missed the NEEDINFO flag.
> Note: we don't have the latest version of oci-register-machine in TripleO CI > yet. Sorry for the misleading info in LP. I looked closer, and the oci-register-machine version we have does have disabled: true https://bugs.launchpad.net/tripleo/+bug/1744954/comments/7 tl;dr: disabling oci-register-machine does not help in the current case, and I'm not sure how it helped back in December...
This looks weird. Jan 30 16:26:42 centos-7-citycloud-sto2-0002270221 oci-umount[12679]: umounthook <error>: 4e2c2bba7cb3: Failed to read directory /usr/share/oci-umount/oci-umount.d: No such file or directory Does the oci-umount package include this directory? If you create this directory, does everything work?
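The suggested check amounts to recreating the directory the hook complains about. Sketched here under a scratch root so it is side-effect free; the real path, taken from the error message above, is /usr/share/oci-umount/oci-umount.d:

```shell
# Sketch of the suggestion: create the directory oci-umount is missing.
# On a real node this would simply be:
#   mkdir -p /usr/share/oci-umount/oci-umount.d
# A scratch root is used here for illustration.
root=$(mktemp -d)
mkdir -p "$root/usr/share/oci-umount/oci-umount.d"
ls -d "$root/usr/share/oci-umount/oci-umount.d"
```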
(In reply to Mark McLoughlin from comment #36) > Awesome, bug #1532586 is the underlying systemd bug but we think we don't > need this fix because we have disabled oci-register-machine already @Mark, I doubt that. The race between runc and systemd is not related to enabling/disabling oci-register-machine. If runc is the runtime, then one should make sure that this fix, https://github.com/opencontainers/runc/pull/1683, is included, or that systemd has the fix for bug #1532586. Disabling oci-register-machine will, most probably, help with a different race.
There are two different races. The one related to the pids cgroup join, IMO, cannot be worked around by disabling oci-register-machine.
dwalsh: not sure yet but I'll try to create the directory and see how that works. Vikas: can you please look at https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset017-master/d559096/undercloud/home/jenkins/overcloud_prep_containers.log.txt.gz#_2018-02-02_22_01_51 A lot of logs are available here: https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset017-master/d559096/undercloud/var/log/ And the rpm versions: https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset017-master/d559096/rpm-qa.txt.gz Thanks
(In reply to Emilien Macchi from comment #48) > dwalsh: not sure yet but I'll try to create the directory and see how that > works. > > Vikas: can you please look at > https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7- > multinode-1ctlr-featureset017-master/d559096/undercloud/home/jenkins/ > overcloud_prep_containers.log.txt.gz#_2018-02-02_22_01_51 > > A lot of logs are available here: > https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7- > multinode-1ctlr-featureset017-master/d559096/undercloud/var/log/ > > And the rpm versions: > https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7- > multinode-1ctlr-featureset017-master/d559096/rpm-qa.txt.gz > > Thanks Verified from the logs you shared that the docker in use is 1.12.6, which uses runc at this commit: https://github.com/projectatomic/runc/commit/c5d311627d39439c5b1cc35c67a51c9c6ccda648 The fix from opencontainers/runc, https://github.com/opencontainers/runc/pull/1683, is not there. Therefore, as I said in my previous comment, to avoid this failure the mentioned fix should be backported to projectatomic/runc.
@run
Antonio, can you see if we can get this patch backported to docker-runc?
FYI I cloned this to RHEL/docker bug 1543575
*** Bug 1550588 has been marked as a duplicate of this bug. ***
The new docker version addresses this. See bug 1543575