Bug 1562035

Summary: DPDK: Deployment fails intermittently only when tuned profile "cpu-partitioning" is activated
Product: Red Hat OpenStack Reporter: Saravanan KR <skramaja>
Component: openstack-tripleo-heat-templates    Assignee: Saravanan KR <skramaja>
Status: CLOSED CURRENTRELEASE QA Contact: Yariv <yrachman>
Severity: high Docs Contact:
Priority: high    
Version: 13.0 (Queens)    CC: aschultz, atelang, atheurer, dbecker, ealcaniz, emacchi, fherrman, houyatao, jianzzha, jraju, jschluet, jskarvad, ksundara, mbabushk, mburns, morazi, noama, openstack, owalsh, pablo.iranzo, rhel-osp-director-maint, sbaker, skramaja, vchundur, wdec, yrachman, yroblamo
Target Milestone: z1    Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: NFV deployments failed intermittently when starting a container, with an nsenter "invalid argument" error. Consequence: A timing issue between the tuned cpu-partitioning profile and the unshare kernel call made by the container runtime exposed an existing kernel bug in the handling of the mm_users count. Fix: Previously, the unshare(CLONE_NEWPID) call could fail when racing with get_task_mm(), because the CLONE_NEWPID flag incorrectly implied the CLONE_THREAD flag and thus also the CLONE_VM flag. The kernel update fixes sys_unshare() so that CLONE_NEWPID no longer implies CLONE_THREAD, which resolves the bug. Result: Kernel 3.10.0-862.6.3.el7.x86_64 is required to fix this issue.
Story Points: ---
Clone Of:
: 1577745 (view as bug list) Environment:
Last Closed: 2018-07-24 06:14:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1577745    
Bug Blocks: 1566561    
Attachments:
Description                                        Flags
stack failure log with tuned enabled deployment    none
stack_failures_log                                 none

Description Saravanan KR 2018-03-29 12:15:37 UTC
Created attachment 1414748 [details]
stack failure log with tuned enabled deployment

Description of problem:
Deployment fails intermittently because docker run fails with the error log below:

-----------------------------
"2018-03-21 06:57:02,719 ERROR: 12318 -- Failed running docker-puppet.py for iscsid", 
"2018-03-21 06:57:02,719 ERROR: 12318 -- nsenter: failed to unshare namespaces: Invalid argument", 
"container_linux.go:247: starting container process caused \"process_linux.go:245: running exec setns process for init caused \\\"exit status 29\\\"\"", 
"/usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:245: running exec setns process for init caused \\\"exit status 29\\\"\".", 
"time=\"2018-03-21T06:57:02Z\" level=error msg=\"error getting events from daemon: net/http: request canceled\" ", 
-----------------------------

* Happens only on RHEL; verified on CentOS, where it does not occur.
* It is not always the same docker image; it occurs randomly on different containers and at different steps of the deployment.
* Sometimes the deployment succeeds when deployed again (stack update), but not always.
* Deployment is successful when the tuned profile is removed from the parameters (2 out of 2 passed).

Comment 1 Emilien Macchi 2018-03-29 19:09:41 UTC
From the logs, it looks similar to this known issue: https://github.com/moby/moby/issues/34971 - the reporter is on CentOS7 though. I wonder if the kernel parameters for namespaces in RHEL are different from CentOS.

Could you please give the version of your rpms? and the command used to deploy so we can try to reproduce.

Thanks

Comment 2 Saravanan KR 2018-04-02 11:56:07 UTC
(In reply to Emilien Macchi from comment #1)
> From the logs, it looks similar to this known issue:
> https://github.com/moby/moby/issues/34971 - the reporter is on CentOS7
> though. I wonder if the kernel parameters for namespaces in RHEL are
> different from CentOS.
> 
> Could you please give the version of your rpms? and the command used to
> deploy so we can try to reproduce.
> 
> Thanks

Able to deploy successfully by reducing the docker puppet instances to 1 and with Tuned profile enabled.

  DockerPuppetProcessCount: 1

I am re-deploying few times to ensure that it works always. I will update the BZ.

Comment 3 Saravanan KR 2018-04-02 12:31:23 UTC
(In reply to Saravanan KR from comment #2)
> (In reply to Emilien Macchi from comment #1)
> Able to deploy successfully by reducing the docker puppet instances to 1 and
> with Tuned profile enabled.
> 
>   DockerPuppetProcessCount: 1
> 
> I am re-deploying few times to ensure that it works always. I will update
> the BZ.

It failed with the same error for the 2nd run.

Tested with OSP-13 puddle 2018-03-16.1.

DPDK specific parameters:
--------------------------------
parameter_defaults:
  ComputeOvsDpdkParameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=64 intel_iommu=on iommu=pt isolcpus=10-87"
    TunedProfileName: "cpu-partitioning"
    IsolCpusList: "10-87"

  OvsDpdkSocketMemory: "2048,2048"
  OvsDpdkMemoryChannels: "4"
  NovaReservedHostMemory: 4096

  OvsPmdCoreList: "10,11,22,23"
  OvsDpdkCoreList: "0,1,2,3,4,5,6,7,8,9"
  NovaVcpuPinSet: ['12-21','24-87']
--------------------------------

Comment 4 Steve Baker 2018-04-08 22:38:39 UTC
Just to make a general comment, the purpose of docker-puppet.py is to use puppet to write configuration files, so it is optimised for concurrency.

This means if docker-puppet.py modifies any state on the host there is a risk that a race will be triggered, resulting in unpredictable errors.

Since you've tried with DockerPuppetProcessCount:1 that may not be the cause of this particular issue, but it sounds like the host *is* being modified, and we should make the required change to do this in a separate "puppet apply" call rather than using docker-puppet.py.

Looking at https://github.com/moby/moby/issues/34971, I suppose we could be hitting some kind of namespace limit. I'll keep investigating.
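
As a quick way to test the namespace-limit theory, a minimal sketch follows (not from this bug; it assumes a RHEL 7 host with /proc mounted, and the user.max_user_namespaces sysctl may be absent on older 7.x kernels). It compares the number of user namespaces in use against the configured limit:

-----------------------------
#!/usr/bin/env python
# Sketch: compare user namespaces in use with the configured limit.
import glob
import os

def current_user_namespaces():
    """Count distinct user namespaces by inode across /proc/<pid>/ns/user."""
    inodes = set()
    for path in glob.glob('/proc/[0-9]*/ns/user'):
        try:
            inodes.add(os.stat(path).st_ino)
        except OSError:  # process exited while we were scanning
            pass
    return len(inodes)

def max_user_namespaces():
    """Read sysctl user.max_user_namespaces; may not exist on older kernels."""
    try:
        with open('/proc/sys/user/max_user_namespaces') as f:
            return int(f.read())
    except IOError:
        return None

print('user namespaces in use:', current_user_namespaces(),
      'limit:', max_user_namespaces())
-----------------------------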

Comment 5 Steve Baker 2018-04-08 23:17:00 UTC
According to this[1] it might be worth adding this to your KernelArgs.

  user_namespace.enable=1

Could you please try this with the DockerPuppetProcessCount:1 parameter and post any failed stack log?

[1] https://github.com/opencontainers/runc/issues/1343

Comment 6 Saravanan KR 2018-04-13 10:43:42 UTC
(In reply to Steve Baker from comment #5)
> According to this[1] it might be worth adding this to your KernelArgs.
> 
>   user_namespace.enable=1
> 
> Could you please try this with the DockerPuppetProcessCount:1 parameter and
> post any failed stack log?
> 
> [1] https://github.com/opencontainers/runc/issues/1343

Yes, I have tried with both the kernel args and the process count, but it doesn't help.

-------------------------------------
parameter_defaults:
  ComputeOvsDpdkParameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=64 intel_iommu=on iommu=pt isolcpus=10-87 user_namespace.enable=1"

  ExtraSysctlSettings:
      user.max_user_namespaces:
        value: 15000
-------------------------------------

It also happens on RHEL 7.5, even with the latest docker packages in RHEL.

This issue never occurs in CentOS-based deployments.

If I disable the TunedProfileName alone, the deployment is successful.

Comment 7 Steve Baker 2018-04-16 20:20:27 UTC
Assigning to DF:NFV dradaz and marking as untriaged. Once the issue is diagnosed, it can be reassigned back to us if the fix involves changing the container config.

Comment 8 Yariv 2018-04-23 14:07:48 UTC
*** Bug 1570412 has been marked as a duplicate of this bug. ***

Comment 9 Saravanan KR 2018-05-02 07:07:06 UTC
On trying with puddle 2018-04-19.2, I am unable to reproduce this issue. Asked for confirmation from QE.

The root cause of this BZ is still unknown. All that is known is that it is related to user namespaces, but the solutions suggested above do not work. If it is reproduced again, we will have to work with the container and tuned teams to fix this issue. Moving it to ON_QA to ensure that QE validates this scenario on the latest puddle.

Comment 10 Yariv 2018-05-02 09:40:27 UTC
Verified with SRIOV on puddle 2018-04-26.1.

Comment 12 Maxim Babushkin 2018-05-06 10:14:43 UTC
The issue appears back in the latest 2018-05-04.1 puddle.

Comment 13 Saravanan KR 2018-05-07 10:30:58 UTC
(In reply to Maxim Babushkin from comment #12)
> The issue appears back in the latest 2018-05-04.1 puddle.

Can you paste the error logs?
  openstack stack failures list --long overcloud

Comment 16 Maxim Babushkin 2018-05-10 05:35:13 UTC
Created attachment 1434177 [details]
stack_failures_log

Comment 17 Ollie Walsh 2018-05-10 21:24:17 UTC
Looks like this kernel issue:  https://lkml.org/lkml/2015/7/28/833

Looking at the source for the kernel on that deployment (from kernel/fork.c):

static int check_unshare_flags(unsigned long unshare_flags)
{
        if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
                                CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
                                CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|
                                CLONE_NEWUSER|CLONE_NEWPID))
                return -EINVAL;
        /*
         * Not implemented, but pretend it works if there is nothing to
         * unshare. Note that unsharing CLONE_THREAD or CLONE_SIGHAND
         * needs to unshare vm.
         */
        if (unshare_flags & (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM)) {
                /* FIXME: get_task_mm() increments ->mm_users */
                if (atomic_read(&current->mm->mm_users) > 1)
                        return -EINVAL;
        }

        return 0;
}

While https://github.com/moby/moby/issues/34971 shows the flags are CLONE_NEWUTS|CLONE_NEWPID|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWIPC, we don't avoid the issue, because CLONE_NEWPID implicitly sets CLONE_THREAD:

SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
{
        struct fs_struct *fs, *new_fs = NULL;
        struct files_struct *fd, *new_fd = NULL;
        struct cred *new_cred = NULL;
        struct nsproxy *new_nsproxy = NULL;
        int do_sysvsem = 0;
        int err;

        /*
         * If unsharing a user namespace must also unshare the thread.
         */
        if (unshare_flags & CLONE_NEWUSER)
                unshare_flags |= CLONE_THREAD | CLONE_FS;
        /*
         * If unsharing a pid namespace must also unshare the thread.
         */
        if (unshare_flags & CLONE_NEWPID)
                unshare_flags |= CLONE_THREAD;
        /*
         * If unsharing a thread from a thread group, must also unshare vm.
         */
        if (unshare_flags & CLONE_THREAD)
                unshare_flags |= CLONE_VM;
        /*
         * If unsharing vm, must also unshare signal handlers.
         */
        if (unshare_flags & CLONE_VM)
                unshare_flags |= CLONE_SIGHAND;
        /*
         * If unsharing namespace, must also unshare filesystem information.
         */
        if (unshare_flags & CLONE_NEWNS)
                unshare_flags |= CLONE_FS;

        err = check_unshare_flags(unshare_flags);
        if (err)
                goto bad_unshare_out;

Maybe setting pid=host in the docker run command line would remove CLONE_NEWPID from the unshare flags used and avoid the issue...
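
To make that implication chain concrete, here is a small Python model of the quoted check_unshare_flags()/sys_unshare() logic (illustrative only, not kernel code; the constants are the standard CLONE_* values from linux/sched.h, and the mm_users count is supplied by hand rather than read from a live task):

-----------------------------
# Model of the unshare flag propagation quoted above (kernel/fork.c).
CLONE_VM      = 0x00000100
CLONE_FS      = 0x00000200
CLONE_SIGHAND = 0x00000800
CLONE_THREAD  = 0x00010000
CLONE_NEWNS   = 0x00020000
CLONE_NEWUTS  = 0x04000000
CLONE_NEWIPC  = 0x08000000
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID  = 0x20000000
CLONE_NEWNET  = 0x40000000

def effective_unshare_flags(flags):
    """Apply the same implications as sys_unshare() above."""
    if flags & CLONE_NEWUSER:
        flags |= CLONE_THREAD | CLONE_FS
    if flags & CLONE_NEWPID:
        flags |= CLONE_THREAD      # the problematic implication
    if flags & CLONE_THREAD:
        flags |= CLONE_VM
    if flags & CLONE_VM:
        flags |= CLONE_SIGHAND
    if flags & CLONE_NEWNS:
        flags |= CLONE_FS
    return flags

def hits_mm_users_check(flags, mm_users):
    """check_unshare_flags() returns -EINVAL when one of these bits is set
    and more than one task currently shares the mm (mm_users > 1)."""
    flags = effective_unshare_flags(flags)
    return bool(flags & (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM)) and mm_users > 1

# Flags from the moby issue: because CLONE_NEWPID is present, a transient
# mm_users bump (e.g. something calling get_task_mm() on the process) turns
# the unshare() into the "Invalid argument" error seen in the logs.
runc_flags = CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWIPC
print(hits_mm_users_check(runc_flags, mm_users=2))                    # True
print(hits_mm_users_check(runc_flags & ~CLONE_NEWPID, mm_users=2))    # False
-----------------------------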

Comment 23 Saravanan KR 2018-05-11 15:01:21 UTC
I see similar failures when paunch starts containers as well, at overcloud.AllNodesDeploySteps.ComputeOvsDpdkDeployment_Step4.0:

            "Error running ['docker', 'run', '--name', 'ceilometer_agent_compute', '--label', 'config_id=tripleo_step4', '--label', 'container_name=ceilometer_agent_compute', '--label', 'managed_by=paunch', '--label', 'config_data={\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=6b3b5b2d9dd55807977f98e19c7c715d\"], \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/ceilometer_agent_compute.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/ceilometer/:/var/lib/kolla/config_files/src:ro\", \"/var/run/libvirt:/var/run/libvirt:ro\", \"/var/log/containers/ceilometer:/var/log/ceilometer\"], \"image\": \"192.0.10.1:8787/rhosp13/openstack-ceilometer-compute:2018-05-07.2\", \"net\": \"host\", \"restart\": \"always\", \"privileged\": false}', '--detach=true', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=TRIPLEO_CONFIG_HASH=6b3b5b2d9dd55807977f98e19c7c715d', '--net=host', '--privileged=false', '--restart=always', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/kolla/config_files/ceilometer_agent_compute.json:/var/lib/kolla/config_files/config.json:ro', '--volume=/var/lib/config-data/puppet-generated/ceilometer/:/var/lib/kolla/config_files/src:ro', '--volume=/var/run/libvirt:/var/run/libvirt:ro', '--volume=/var/log/containers/ceilometer:/var/log/ceilometer', '192.0.10.1:8787/rhosp13/openstack-ceilometer-compute:2018-05-07.2']. [125]", 
            "stdout: 811a9940b8afa258ccb467cd765a9b24edbce059e106ef125dc1a79686df3e32", 
            "stderr: /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:245: running exec setns process for init caused \\\"exit status 29\\\"\".",

Comment 24 Wojciech Dec 2018-05-14 09:39:46 UTC
We're seeing this issue also in Pike.

Comment 25 Saravanan KR 2018-05-15 09:46:50 UTC
Reproduced it standalone by running the docker-puppet script directly on the compute nodes. Out of 5 trials, the same error occurred once. Trying to figure out whether it can be reproduced directly with a docker command.

export PROCESS_COUNT=3
export NET_HOST='true'
python /var/lib/docker-puppet/docker-puppet.py

http://paste.openstack.org/show/720984/

Comment 26 Alex Schultz 2018-05-15 16:03:09 UTC
*** Bug 1578355 has been marked as a duplicate of this bug. ***

Comment 27 Saravanan KR 2018-05-16 13:46:21 UTC
cat <<EOF>/home/stack/nsenter_error_tht.patch
diff --git a/docker/docker-puppet.py b/docker/docker-puppet.py
index 599ed14b..89e92ce4 100755
--- a/docker/docker-puppet.py
+++ b/docker/docker-puppet.py
@@ -275,6 +275,7 @@ def mp_puppet_config((config_volume, puppet_tags, manifest, config_image, volume
         pull_image(config_image)
 
         dcmd = ['/usr/bin/docker', 'run',
+                '--pid', 'host',
                 '--user', 'root',
                 '--name', 'docker-puppet-%s' % config_volume,
                 '--env', 'PUPPET_TAGS=%s' % puppet_tags,
EOF

sudo patch  -d /usr/share/openstack-tripleo-heat-templates/ -p1 < /home/stack/nsenter_error_tht.patch

sudo yum install libguestfs-tools -y
cd /home/stack/images
sudo su
mkdir -p mount_image
export LIBGUESTFS_BACKEND=direct
guestmount -a overcloud-full.qcow2 -m /dev/sda mount_image
patch -d mount_image/usr/lib/python2.7/site-packages/ -p1 < /home/stack/nsenter_error_paunch.patch
exit
openstack overcloud image upload --update-existing

Comment 31 Saravanan KR 2018-05-22 11:49:58 UTC
cat <<EOF>/home/stack/nsenter_error_paunch.patch
diff --git a/paunch/builder/compose1.py b/paunch/builder/compose1.py
index 5472d37..a041bc0 100644
--- a/paunch/builder/compose1.py
+++ b/paunch/builder/compose1.py
@@ -144,6 +144,7 @@ class ComposeV1Builder(object):
         for v in cconfig.get('environment', []):
             if v:
                 cmd.append('--env=%s' % v)
+        cmd.append('--pid=host')
         if cconfig.get('remove', False):
             cmd.append('--rm')
         if cconfig.get('interactive', False):
EOF

Comment 37 Alex Schultz 2018-05-23 15:01:14 UTC
*** Bug 1581711 has been marked as a duplicate of this bug. ***

Comment 38 Saravanan KR 2018-05-30 13:37:54 UTC
*** Bug 1566561 has been marked as a duplicate of this bug. ***

Comment 46 Yariv 2018-07-12 11:58:03 UTC
Verified with kernel 3.10.0-862.6.3.el7.x86_64.
With Tuned enabled; please refer to the NFV CI THT, which is already merged:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/nfv/view/nfv/

Comment 47 Saravanan KR 2018-07-24 06:13:10 UTC
As per comment #46, this issue has been verified by QE, and the fixed kernel is available as part of the OSP 13 update of 19th July. Moving to the verified state.

Comment 48 Saravanan KR 2018-07-24 06:14:09 UTC
Closing, as the required kernel version is already part of the 19th July 2018 release of OSP 13.