Description of problem:
TrilioVault's datamover container mounts "/var/lib/nova" with the "shared,z" options. Here is the heat template code for it:
https://github.com/shyam-biradar/triliovault-cfg-scripts/blob/master/redhat-director-scripts/docker/services/trilio-datamover-osp16.yaml#L176
The Trilio Datamover service then mounts an NFS share under "/var/lib/nova" inside the datamover container. The goal is to make Trilio's NFS share available to the 'nova_compute' and 'nova_libvirt' containers. Because "/var/lib/nova" is mounted and shared between nova_compute and nova_libvirt, we achieved this by mounting the NFS share under "/var/lib/nova". This worked fine up to RHOSP14, but on RHOSP16 we are hitting an issue during overcloud deployment with the TrilioVault containers. The first overcloud deployment with the TrilioVault containers works fine, but subsequent upgrades of the Trilio containers fail with the following relabelling error:

"fatal: [overcloud-novacompute-0]: FAILED! => {"ansible_job_id": "692027900534.93212", "attempts": 2, "changed": false, "finished": 1, "msg": "Paunch failed with config_id tripleo_step3", "rc": 126, "stderr": "Did not find container with \"['podman', 'ps', '-a', '--filter', 'label=container_name=nova_statedir_owner', '--filter', 'label=config_id=tripleo_step3', '--format', '{{.Names}}']\" - retrying without config_id\nDid not find container with \"['podman', 'ps', '-a', '--filter', 'label=container_name=nova_statedir_owner', '--format', '{{.Names}}']\"\nError running ['podman', 'run', '--name', 'nova_statedir_owner', '--label', 'config_id=tripleo_step3', '--label', 'container_name=nova_statedir_owner', '--label', 'managed_by=tripleo-Compute', '--label', 'config_data={\"command\": \"/container-config-scripts/pyshim.sh /container-config-scripts/nova_statedir_ownership.py\", \"detach\": false, \"environment\": {\"TRIPLEO_DEPLOY_IDENTIFIER\": \"1585050745\", \"__OS_DEBUG\": \"false\"}, \"image\": 
\"devundercloud.ctlplane.localdomain:8787/rhosp-rhel8/openstack-nova-compute:16.0-83\", \"net\": \"none\", \"privileged\": false, \"user\": \"root\", \"volumes\": [\"/var/lib/nova:/var/lib/nova:shared,z\", \"/var/lib/container-config-scripts/:/container-config-scripts/:z\"]}', '--conmon-pidfile=/var/run/nova_statedir_owner.pid', '--log-driver', 'k8s-file', '--log-opt', 'path=/var/log/containers/stdouts/nova_statedir_owner.log', '--env=TRIPLEO_DEPLOY_IDENTIFIER=1585050745', '--env=__OS_DEBUG=false', '--net=none', '--privileged=false', '--user=root', '--volume=/var/lib/nova:/var/lib/nova:shared,z', '--volume=/var/lib/container-config-scripts/:/container-config-scripts/:z', '--cpuset-cpus=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15', 'devundercloud.ctlplane.localdomain:8787/rhosp-rhel8/openstack-nova-compute:16.0-83', '/container-config-scripts/pyshim.sh', '/container-config-scripts/nova_statedir_ownership.py']. [126]\n\nstdout: \nstderr: Error: relabel failed \"/var/lib/nova\": operation not supported\n\n", "stderr_lines": ["Did not find container with \"['podman', 'ps', '-a', '--filter', 'label=container_name=nova_statedir_owner', '--filter', 'label=config_id=tripleo_step3', '--format', '{{.Names}}']\" - retrying without config_id", "Did not find container with \"['podman', 'ps', '-a', '--filter', 'label=container_name=nova_statedir_owner', '--format', '{{.Names}}']\"", "Error running ['podman', 'run', '--name', 'nova_statedir_owner', '--label', 'config_id=tripleo_step3', '--label', 'container_name=nova_statedir_owner', '--label', 'managed_by=tripleo-Compute', '--label', 'config_data={\"command\": \"/container-config-scripts/pyshim.sh /container-config-scripts/nova_statedir_ownership.py\", \"detach\": false, \"environment\": {\"TRIPLEO_DEPLOY_IDENTIFIER\": \"1585050745\", \"__OS_DEBUG\": \"false\"}, \"image\": \"devundercloud.ctlplane.localdomain:8787/rhosp-rhel8/openstack-nova-compute:16.0-83\", \"net\": \"none\", \"privileged\": false, \"user\": \"root\", \"volumes\": 
[\"/var/lib/nova:/var/lib/nova:shared,z\", \"/var/lib/container-config-scripts/:/container-config-scripts/:z\"]}', '--conmon-pidfile=/var/run/nova_statedir_owner.pid', '--log-driver', 'k8s-file', '--log-opt', 'path=/var/log/containers/stdouts/nova_statedir_owner.log', '--env=TRIPLEO_DEPLOY_IDENTIFIER=1585050745', '--env=__OS_DEBUG=false', '--net=none', '--privileged=false', '--user=root', '--volume=/var/lib/nova:/var/lib/nova:shared,z', '--volume=/var/lib/container-config-scripts/:/container-config-scripts/:z', '--cpuset-cpus=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15', 'devundercloud.ctlplane.localdomain:8787/rhosp-rhel8/openstack-nova-compute:16.0-83', '/container-config-scripts/pyshim.sh', '/container-config-scripts/nova_statedir_ownership.py']. [126]", "", "stdout: ", "stderr: Error: relabel failed \"/var/lib/nova\": operation not supported", ""], "stdout": "", "stdout_lines": []} "

Version-Release number of selected component (if applicable):
RHOSP16

How reproducible:
Always reproducible.

Steps to Reproduce:
1. Deploy RHOSP16.
2. Deploy TrilioVault 4.0 through the overcloud deploy command.
3. Upgrade the TrilioVault build on RHOSP16 through the overcloud deploy command.
It fails with the above error.

Actual results:
Overcloud deploy fails.

Expected results:
Overcloud deploy should succeed.

Additional info:
If there is any SELinux option/label for the Trilio NFS share that makes this work, that would be great.
Hi, please post your /var/log/audit/audit.log (grep for AVC denials) from overcloud-novacompute-0 (or a full sosreport) so we can help identify which rule is missing.
Some additional context, since we don't have everything here. This issue was also discussed on #tripleo this week, and owalsh stepped in with some proposals. The situation is:
- the Trilio datamover mounts an NFS share in a subdirectory of /var/lib/nova
- this prevents the relabelling, because the share is NFS
- this solution was apparently approved at some point by Red Hat engineering

Solution:
- one of the proposals was to mount that NFS share elsewhere in the containers (there are, iirc, 3 containers, including triliovault)
- to do so, a new param would be needed for the libvirt container, something like NovaLibvirtOptVolumes
- NovaComputeOptVolumes could then be used in addition

Doing so would allow TrilioVault to work as expected, but would require some work to get the NFS share name right (there's apparently some kind of hashing involved in the name).

The current solution (mounting the share directly in /var/lib/nova) worked with Docker because SELinux separation was deactivated back then. The move to podman enforces SELinux separation, so we had to add flags to the volumes, such as "z", which requires a recursive relabelling of the volume. NFS doesn't really support SELinux. We can pass a "context" (mount -t nfs -o context="...") but this won't make the relabelling work (recursion)... Moving the share elsewhere is probably the best move at this point.

@Shyam: would the proposed solution, a new param to add extra volumes to the libvirt container, be OK for you? I know this means you'll need to recompute the hash to provide the right path at deploy time, but it shouldn't be that complicated, right? Maybe you can even compute the hash once and reuse it during the whole deploy (i.e. as a param).

Thank you for your feedback.
Cheers,
C. (aka Tengu on #tripleo)
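To see why the recursive relabel fails, it helps to check whether any NFS filesystem is mounted at or below /var/lib/nova, since a "z" relabel of that directory would have to descend into it. The Python sketch below is only an illustration (the function name nfs_mounts_under is hypothetical and is not part of TripleO, paunch, or TrilioVault); it parses text in the /proc/<pid>/mountinfo line format described in proc(5).

```python
"""Illustrative sketch: list NFS mounts at or below a directory."""
import os
import re
from typing import List

# mountinfo encodes space, tab, newline and backslash as 3-digit octal escapes.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field: str) -> str:
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def nfs_mounts_under(base: str, mountinfo_text: str) -> List[str]:
    """Return mount points of NFS filesystems at or below ``base``.

    ``mountinfo_text`` uses the /proc/<pid>/mountinfo line format:
    ID PARENT MAJ:MIN ROOT MOUNT_POINT OPTIONS [OPTIONAL...] - FSTYPE SOURCE SUPER_OPTS
    """
    base = os.path.normpath(base)
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # blank or malformed line
        sep = fields.index("-")  # ends the variable-length optional fields
        if sep < 6 or sep + 1 >= len(fields):
            continue  # too few fields to be a valid mountinfo line
        mount_point = _unescape(fields[4])
        fstype = fields[sep + 1]
        if not fstype.startswith("nfs"):  # matches both nfs and nfs4
            continue
        if mount_point == base or mount_point.startswith(base + "/"):
            found.append(mount_point)
    return found
```

On a compute node it could be fed the contents of /proc/self/mountinfo, e.g. nfs_mounts_under("/var/lib/nova", open("/proc/self/mountinfo").read()). Any result means a ":z" bind mount of /var/lib/nova would try to relabel files on NFS.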
Hi Team,
Moving the NFS share somewhere else would be very difficult. In that case we would need to achieve several things at the same time:
1. Make the NFS share available to the nova_compute container.
2. Make the NFS share available to the nova_libvirt container.
3. The mount point of the NFS share is static: our datamover service calculates a hash for the given NFS share and uses it as the mount point.
4. This approach makes it less dynamic.
owalsh is proposing something else. Here is the review owalsh raised for this: https://review.opendev.org/#/c/715015/
It has more details on the approach owalsh is using.
Thank you. Let me know if you need additional information.
Hello Shyam, Thank you for your feedback. I've indeed seen Oliver's proposal, and it seems to solve your issue (among others). I'll let Oliver handle this BZ as well (I set him as assignee yesterday). Cheers, C.
Thank you, Cedric.
*** Bug 1813941 has been marked as a duplicate of this bug. ***
According to our records, this should be resolved by openstack-tripleo-heat-templates-11.3.2-0.20200405044625.ec9970c.el8ost. This build is available now.
Hi,
With this fix, our trilio_datamover container does not start. It stays in the 'Created' state. When I try to start it with the 'podman start' command, it fails with the following error:

[root@overcloud-novacompute-0 heat-admin]# podman ps --all | grep trilio
70464925affe  devundercloud.ctlplane.localdomain:8787/trilio/trilio-datamover:4.0.91-rhosp16  kolla_start  18 hours ago  Created  trilio_datamover

[root@overcloud-novacompute-0 heat-admin]# podman start trilio_datamover
Error: unable to start container "trilio_datamover": relabel failed "/var/lib/nova": operation not supported

When this happens:
We deployed the 4.0.90 TrilioVault containers and they worked fine: the trilio_datamover container started and the NFS share was mounted under '/var/lib/nova/...'. But when I tried to upgrade the cloud with the 4.0.91 TrilioVault containers, the 4.0.91 trilio_datamover container did not start. Overcloud deployment fails intermittently.

Let us know your thoughts.
Hi, the following error was found in the ansible logs. Deployment failed at step 5 while starting the 'trilio_datamover' container.
------------------------------------------------------------------------------------
2020-05-26 13:41:53,780 p=25153 u=mistral | FAILED - RETRYING: Wait for containers to start for step 5 using paunch (1189 retries left).
2020-05-26 13:41:57,105 p=25153 u=mistral | FAILED - RETRYING: Wait for containers to start for step 5 using paunch (1188 retries left).
2020-05-26 13:42:00,377 p=25153 u=mistral | ok: [overcloud-controller-0] => {"action": ["Applying config_id tripleo_step5"], "ansible_job_id": "439545201027.925672", "attempts": 14, "changed": false, "finished": 1, "rc": 0, "stderr": "Did not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=cinder_volume_init_bundle', '--filter', 'label=config_id=tripleo_step5', '--format', '{{.Names}}']" - retrying without config_id\nDid not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=cinder_volume_init_bundle', '--format', '{{.Names}}']"\nRemoved /etc/systemd/system/multi-user.target.wants/tripleo_trilio_dmapi.service.\nDid not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=trilio_dmapi', '--filter', 'label=config_id=tripleo_step5', '--format', '{{.Names}}']" - retrying without config_id\nDid not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=trilio_dmapi', '--format', '{{.Names}}']"\nCreated symlink /etc/systemd/system/multi-user.target.wants/tripleo_trilio_dmapi.service → /etc/systemd/system/tripleo_trilio_dmapi.service.\n", "stderr_lines": ["Did not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=cinder_volume_init_bundle', '--filter', 'label=config_id=tripleo_step5', '--format', '{{.Names}}']" - retrying without config_id", "Did not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=cinder_volume_init_bundle', '--format', 
'{{.Names}}']"", "Removed /etc/systemd/system/multi-user.target.wants/tripleo_trilio_dmapi.service.", "Did not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=trilio_dmapi', '--filter', 'label=config_id=tripleo_step5', '--format', '{{.Names}}']" - retrying without config_id", "Did not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=trilio_dmapi', '--format', '{{.Names}}']"", "Created symlink /etc/systemd/system/multi-user.target.wants/tripleo_trilio_dmapi.service → /etc/systemd/system/tripleo_trilio_dmapi.service."], "stdout": "Info: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: Loading facts\nInfo: ------------------------------------------------------------------------------------
(In reply to Shyam from comment #13)
> Hi,
>
> With this fix, our trilio_datamover container is not getting started. It's
> remained in 'Created' state.
> I tried to start this container using 'podman start' command, it's failing
> following error.
>
> [root@overcloud-novacompute-0 heat-admin]# podman ps --all | grep trilio
> 70464925affe devundercloud.ctlplane.localdomain:8787/trilio/trilio-datamover:4.0.91-rhosp16 kolla_start 18 hours ago Created trilio_datamover
>
> [root@overcloud-novacompute-0 heat-admin]# podman start trilio_datamover
> Error: unable to start container "trilio_datamover": relabel failed "/var/lib/nova": operation not supported

This suggests /var/lib/nova is bind mounted with SELinux relabelling enabled, e.g. /var/lib/nova:/var/lib/nova:shared,z
Change this to /var/lib/nova:/var/lib/nova:shared instead.

> When this is happening:
> We deployed 4.0.90 containers of Triliovault, it worked fine.
> trilio_datamover container started well. NFS share mounted under
> '/var/lib/nova/...'.
> But when I tried to upgrade the cloud with 4.0.91 containers of triliovault,
> 4.0.91 trilio_datamover container is not getting started.

I doubt the version matters; what matters is that the NFS mounts exist when the container is restarted.

> Overcloud deployment intermittently failing.
>
> Let us know your thoughts.
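The suggested change amounts to dropping the "z" option from the volume string while keeping "shared". As an illustration only, here is a hypothetical Python helper (strip_relabel is not part of paunch, TripleO, or the TrilioVault templates) that removes the SELinux relabel options "z" and "Z" from a src:dst[:options] volume spec.

```python
"""Illustrative sketch: drop SELinux relabel flags from a volume spec."""

RELABEL_FLAGS = {"z", "Z"}  # podman/docker volume options that trigger relabelling


def strip_relabel(volume: str) -> str:
    """Remove 'z'/'Z' from a 'src:dst[:opt,opt,...]' volume string.

    Other options (e.g. 'shared', 'ro') are kept in their original order.
    If no options remain, the options field is dropped entirely.
    """
    parts = volume.split(":", 2)
    if len(parts) < 3:
        return volume  # no options field, nothing to strip
    src, dst, opts = parts
    kept = [opt for opt in opts.split(",") if opt and opt not in RELABEL_FLAGS]
    return ":".join([src, dst, ",".join(kept)]) if kept else f"{src}:{dst}"
```

For example, strip_relabel("/var/lib/nova:/var/lib/nova:shared,z") gives "/var/lib/nova:/var/lib/nova:shared". Without "z", podman does not attempt to relabel that bind mount, so the nested NFS share is left alone, while "shared" mount propagation is unaffected.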
Hi,
Yes, we use the 'shared,z' flags when mounting '/var/lib/nova'. Here is the code:
https://github.com/trilioData/triliovault-cfg-scripts/blob/stable/4.0/redhat-director-scripts/docker/services/trilio-datamover-osp16.yaml#L158
Let me try removing the 'z' flag.
Thank you.
(In reply to Shyam from comment #16)
> Hi,
>
> Yes, we use 'shared,z' flags while mounting the '/var/lib/nova'.
> Here is the code.
> https://github.com/trilioData/triliovault-cfg-scripts/blob/stable/4.0/redhat-director-scripts/docker/services/trilio-datamover-osp16.yaml#L158
>
> Let me try by removing 'z' flag.
>
> Thank you.

Hi,

I can see this was committed in https://github.com/trilioData/triliovault-cfg-scripts/commit/55db096d5ab7496797dfed5c9f23670f119a74ba, so I assume it works and am closing the BZ.

Thanks,
Ollie