Description of problem:

Intermittently, Nova fails to mount an NFS volume from a NetApp backend when FIPS is enabled. This puddle has been tested multiple times and the failure occurred once today. We want to confirm whether the NetApp backend needs to be updated or whether there is a potential race in the deployment.

Error produced:

2023-08-15 03:14:03.612 2 DEBUG nova.virt.libvirt.volume.mount [req-2cb867f8-80e6-419d-9986-26309b832a92 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] _HostMountState.mount(fstype=nfs, export=192.168.1.50:/vol_cinder, vol_name=volume-dbd2fc42-50f6-46f4-bed8-75034fcb0a33, /var/lib/nova/mnt/724dab229d80c6a1a1e49a71c8356eed, options=[]) generation 0 mount /usr/lib/python3.9/site-packages/nova/virt/libvirt/volume/mount.py:287
2023-08-15 03:14:03.613 2 DEBUG nova.virt.libvirt.volume.mount [req-2cb867f8-80e6-419d-9986-26309b832a92 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Mounting /var/lib/nova/mnt/724dab229d80c6a1a1e49a71c8356eed generation 0 mount /usr/lib/python3.9/site-packages/nova/virt/libvirt/volume/mount.py:299
2023-08-15 03:14:03.615 2 DEBUG oslo_concurrency.lockutils [req-21a9119f-5093-4314-9041-6b4eac9791ce 7e39a6fa25e24604a9b0b0a454bbbf91 f9aa00d658424550990f75f23f187541 - default default] Releasing lock "refresh_cache-70e4550d-c03c-4f90-bdb2-fb6e082cfc90" lock /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:282
2023-08-15 03:14:03.621 18 DEBUG oslo_concurrency.lockutils [req-bcb0888d-60f7-43bb-a03e-feb4303a74a2 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Lock "4f13d7ce-d27d-4d6b-8c7e-99343707fbe9" acquired by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:355
2023-08-15 03:14:03.622 18 DEBUG oslo_concurrency.lockutils [req-bcb0888d-60f7-43bb-a03e-feb4303a74a2 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Lock "4f13d7ce-d27d-4d6b-8c7e-99343707fbe9" released by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: held 0.001s inner /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:367
2023-08-15 03:14:03.623 18 DEBUG oslo_concurrency.lockutils [req-bcb0888d-60f7-43bb-a03e-feb4303a74a2 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Lock "4f13d7ce-d27d-4d6b-8c7e-99343707fbe9" acquired by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:355
2023-08-15 03:14:03.623 18 DEBUG oslo_concurrency.lockutils [req-bcb0888d-60f7-43bb-a03e-feb4303a74a2 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Lock "4f13d7ce-d27d-4d6b-8c7e-99343707fbe9" released by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: held 0.000s inner /usr/lib/python3.9/site-packages/oslo_concurrency/lockutils.py:367
2023-08-15 03:14:03.688 15 DEBUG neutron.wsgi [-] (15) accepted ('172.17.1.29', 54120) server /usr/lib/python3.9/site-packages/eventlet/wsgi.py:992
2023-08-15 03:14:03.697 21 DEBUG neutron_lib.db.api [req-86bca735-e7dc-44bc-9947-1a1aea16f2ad - - - - -] Retry wrapper got retriable exception: UPDATE statement on table 'standardattributes' expected to update 1 row(s); 0 were matched. wrapped /usr/lib/python3.9/site-packages/neutron_lib/db/api.py:183
2023-08-15 03:14:03.697 21 DEBUG oslo_db.api [req-86bca735-e7dc-44bc-9947-1a1aea16f2ad - - - - -] Performing DB retry for function neutron.api.rpc.handlers.l3_rpc.L3RpcCallback.update_floatingip_statuses wrapper /usr/lib/python3.9/site-packages/oslo_db/api.py:155
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [req-2cb867f8-80e6-419d-9986-26309b832a92 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Setting instance vm_state to ERROR: TypeError: catching classes that do not inherit from BaseException is not allowed
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Traceback (most recent call last):
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/volume/mount.py", line 305, in mount
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     nova.privsep.fs.mount(fstype, export, mountpoint, options)
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/oslo_privsep/priv_context.py", line 253, in _wrap
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     return self.channel.remote_call(name, args, kwargs)
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/oslo_privsep/daemon.py", line 226, in remote_call
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     raise exc_type(*result[2])
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Command: mount -t nfs 192.168.1.50:/vol_cinder /var/lib/nova/mnt/724dab229d80c6a1a1e49a71c8356eed
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Exit code: 32
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Stdout: ''
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Stderr: 'Failed to connect to bus: No data available\nmount.nfs: access denied by server while mounting 192.168.1.50:/vol_cinder\n'
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] During handling of the above exception, another exception occurred:
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Traceback (most recent call last):
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 10256, in _error_out_instance_on_exception
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     yield
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 5851, in _finish_resize_helper
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     network_info = self._finish_resize(context, instance, migration,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 5789, in _finish_resize
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     self._set_instance_info(instance, old_flavor)
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     self.force_reraise()
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     raise self.value
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 5772, in _finish_resize
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     self.driver.finish_migration(context, migration, instance,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 11308, in finish_migration
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     xml = self._get_guest_xml(context, instance, network_info,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7182, in _get_guest_xml
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     conf = self._get_guest_config(instance, network_info, image_meta,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 6789, in _get_guest_config
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     storage_configs = self._get_guest_storage_config(context,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 5395, in _get_guest_storage_config
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     self._connect_volume(context, connection_info, instance)
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 1883, in _connect_volume
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     vol_driver.connect_volume(connection_info, instance)
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/volume/fs.py", line 113, in connect_volume
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     mount.mount(self.fstype, export, vol_name, mountpoint, instance,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/volume/mount.py", line 407, in mount
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     mount_state.mount(fstype, export, vol_name, mountpoint, instance,
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/volume/mount.py", line 306, in mount
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]     except processutils.ProcessExecutionError():
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] TypeError: catching classes that do not inherit from BaseException is not allowed
2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90]
2023-08-15 03:14:03.769 15 DEBUG neutron.pecan_wsgi.hooks.policy_enforcement [req-1bc380d3-ff31-4719-b699-b3bcffdba79a 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Attributes excluded by policy engine: ['standard_attr_id', 'binding:profile', 'binding:host_id', 'binding:vif_type', 'binding:vif_details', 'resource_request'] _exclude_attributes_by_policy /usr/lib/python3.9/site-packages/neutron/pecan_wsgi/hooks/policy_enforcement.py:259
2023-08-15 03:14:03.771 15 INFO neutron.wsgi [req-1bc380d3-ff31-4719-b699-b3bcffdba79a 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] 172.17.1.100,172.17.1.29 "GET /v2.0/ports?device_id=70e4550d-c03c-4f90-bdb2-fb6e082cfc90 HTTP/1.1" status: 200 len: 1148 time: 0.0815990
2023-08-15 03:14:03.826 15 INFO neutron.wsgi [req-92242980-60a4-43c8-aceb-c8383f1798c8 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] 172.17.1.100,172.17.1.29 "GET /v2.0/security-groups?id=706a223d-2fa5-4380-9a2e-82efbcba23a8&fields=id&fields=name HTTP/1.1" status: 200 len: 289 time: 0.0494747
2023-08-15 03:14:03.829 18 DEBUG nova.policy [req-bcb0888d-60f7-43bb-a03e-feb4303a74a2 44e407e8e5224e45b926d3b4924144fd 0f7fbf19b06245e7ad31f7e235fd312f - default default] Policy check for os_compute_api:os-extended-server-attributes failed with credentials {'is_admin': False, 'user_id': '44e407e8e5224e45b926d3b4924144fd', 'user_domain_id': 'default', 'system_scope': None, 'domain_id': None, 'project_id': '0f7fbf19b06245e7ad31f7e235fd312f', 'project_domain_id': 'default', 'roles': ['creator', 'swiftoperator', 'reader', 'member'], 'is_admin_project': True, 'service_user_id': None, 'service_user_domain_id': None, 'service_project_id': None, 'service_project_domain_id': None, 'service_roles': []} authorize /usr/lib/python3.9/site-packages/nova/policy.py:199

Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230802.n.1

How reproducible:
~10%

Steps to Reproduce:
1. Create a deployment with NetApp as the backend and deploy with FIPS enabled
2. Create a volume-backed guest and resize the guest to a new flavor

Actual results:
When resizing the guest, the volume fails to mount and the resize fails

Expected results:
Resize completes as expected

Additional info:
Relevant CI/log info provided in a follow-up comment.
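The TypeError that ends up in vm_state comes from the `except processutils.ProcessExecutionError():` line visible in the traceback: the trailing `()` instantiates the exception class, and Python only accepts classes (or tuples of classes) in an except clause. A minimal standalone reproduction of the mechanics, using a stand-in exception class rather than the real oslo one:

```python
# Stand-in for oslo_concurrency.processutils.ProcessExecutionError; the
# pitfall is independent of which exception class is involved.
class ProcessExecutionError(Exception):
    pass

def broken_mount():
    try:
        # Simulates the privsep mount call failing with exit code 32.
        raise ProcessExecutionError("mount.nfs: access denied by server")
    # BUG: "()" makes this an *instance*; Python rejects non-class objects
    # in an except clause while matching the in-flight exception.
    except ProcessExecutionError():
        pass

try:
    broken_mount()
except TypeError as exc:
    # The original mount failure survives as __context__, which is why the
    # nova-compute log shows both tracebacks chained with "During handling
    # of the above exception, another exception occurred".
    print(type(exc.__context__).__name__)  # ProcessExecutionError
    print(exc)  # catching classes that do not inherit from BaseException is not allowed
```

The net effect is that the real failure (access denied by the NFS server) is masked, and the instance is errored out with an unrelated TypeError instead.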
There is an actual bug here (TypeError), but do we know why the share didn't mount?

2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Stderr: 'Failed to connect to bus: No data available\nmount.nfs: access denied by server while mounting 192.168.1.50:/vol_cinder\n'
(In reply to Eric Harney from comment #4)
> There is an actual bug here (TypeError), but do we know why the share didn't mount?

So that's a legit Nova bug in terms of error handling that should be easy enough to fix, but it won't address the root cause, which is below.

> 2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Stderr: 'Failed to connect to bus: No data available\nmount.nfs: access denied by server while mounting 192.168.1.50:/vol_cinder\n'

I'm guessing we'd need to look at the server side to figure this out? But... the server side is NetApp, so...?
(In reply to Artom Lifshitz from comment #5)
> (In reply to Eric Harney from comment #4)
> > 2023-08-15 03:14:03.764 2 ERROR nova.compute.manager [instance: 70e4550d-c03c-4f90-bdb2-fb6e082cfc90] Stderr: 'Failed to connect to bus: No data available\nmount.nfs: access denied by server while mounting 192.168.1.50:/vol_cinder\n'
>
> I'm guessing we'd need to look at the server-side to figure this out? But... the server-side is Netapp, so...?

That, and it's also worth looking into whether FIPS mode changes anything in NFS that could affect how NFS shares are mounted, etc.
I will go ahead and propose an upstream fix for the broken except clause in Nova shortly; it indeed looks like a trivial fix to me. I can also try to reproduce this problem manually and collect extra data for NetApp if needed, but I need access to the worker node and VMs.
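To illustrate why the fix is trivial, here is a hedged sketch of the corrected error handling (the `mount`, `do_mount`, and `log` names below are stand-ins for illustration, not Nova's real privsep API or the actual upstream patch): dropping the parentheses makes the except clause reference the class itself, so the genuine mount failure is logged and re-raised instead of being masked by a TypeError.

```python
class ProcessExecutionError(Exception):
    """Stand-in for oslo_concurrency.processutils.ProcessExecutionError."""
    def __init__(self, stderr="", exit_code=1):
        super().__init__(stderr)
        self.stderr = stderr
        self.exit_code = exit_code

def mount(do_mount, log):
    """Sketch of the mount error handling after the one-character fix."""
    try:
        do_mount()
    except ProcessExecutionError as exc:  # was: except ProcessExecutionError():
        # The real failure now reaches the handler, so the operator sees
        # "access denied by server" instead of an unrelated TypeError.
        log("mount failed (exit %d): %s" % (exc.exit_code, exc.stderr))
        raise

def failing_mount():
    # Simulates the mount command exiting 32, as in the log above.
    raise ProcessExecutionError("mount.nfs: access denied by server",
                                exit_code=32)

messages = []
try:
    mount(failing_mount, messages.append)
except ProcessExecutionError:
    pass
print(messages[0])  # mount failed (exit 32): mount.nfs: access denied by server
```

Fixing the except clause only restores correct error reporting; the underlying "access denied by server" from the NetApp side still needs its own root-cause analysis.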
Upstream master review: https://review.opendev.org/c/openstack/nova/+/892496
In fact, Melanie already reported this upstream and proposed a fix:

https://bugs.launchpad.net/nova/+bug/1984736
https://review.opendev.org/c/openstack/nova/+/852795

The upstream fix is currently blocked by CI (a nova-lvm failure). With that being said, should we re-assign this BZ?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHOSP 17.1.4 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:9974