Bug 1777726 - [4.4.0-4] failed to deploy HE over NFS on RHEL-8.1 in probe_block_size
Summary: [4.4.0-4] failed to deploy HE over NFS on RHEL-8.1 in probe_block_size
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.40.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: ---
Assignee: bugs@ovirt.org
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-11-28 07:50 UTC by Yedidyah Bar David
Modified: 2019-11-28 11:12 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-28 11:02:44 UTC
oVirt Team: Storage
Embargoed:


Attachments (Terms of Use)

Description Yedidyah Bar David 2019-11-28 07:50:03 UTC
Description of problem:

Tried to deploy hosted-engine with NFS storage, it failed.

This is in ovirt-hosted-engine-setup-ansible-create_storage_domain-2019102891136-02bez8.log:

2019-11-28 09:11:52,269+0200 DEBUG var changed: host "localhost" var "otopi_storage_domain_details_nfs" type "<class 'dict'>" value: "{                                                      
    "changed": false,
    "exception": "Traceback (most recent call last):\n  File \"/tmp/ansible_ovirt_storage_domain_payload_56jzsnju/ansible_ovirt_storage_domain_payload.zip/ansible/modules/cloud/ovirt/ovirt_storage_domain.py\", line 792, in main\n  File \"/tmp/ansible_ovirt_storage_domain_payload_56jzsnju/ansible_ovirt_storage_domain_payload.zip/ansible/module_utils/ovirt.py\", line 621, in create\n    **kwargs\n  File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py\", line 25168, in add\n    return self._internal_add(storage_domain, headers, query, wait)\n  File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 232, in _internal_add\n    return future.wait() if wait else future\n  File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 55, in wait\n    return self._code(response)\n  File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 229, in callback\n    self._check_fault(response)\n  File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 132, in _check_fault\n    self._raise_error(response, body)\n  File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 118, in _raise_error\n    raise error\novirtsdk4.Error: Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". HTTP response code is 400.\n",  
    "failed": true,
    "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". HTTP response code is 400."                                                         
}"

This is in vdsm.log:

2019-11-28 09:11:51,677+0200 INFO  (ioprocess/2782) [IOProcess] (156cd2e7-8ce8-440a-8b35-e7eb13523442) Starting ioprocess (__init__:434)
2019-11-28 09:11:51,685+0200 INFO  (jsonrpc/1) [vdsm.api] FINISH createStorageDomain error=[Errno 1] Operation not permitted from=::ffff:192.168.222.91,36782, flow_id=2f9c8796, task_id=92ecf
591-69d5-4bd0-8b7b-2eb016a0311d (api:52)
2019-11-28 09:11:51,685+0200 ERROR (jsonrpc/1) [storage.TaskManager.Task] (Task='92ecf591-69d5-4bd0-8b7b-2eb016a0311d') Unexpected error (task:889)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-123>", line 2, in createStorageDomain
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2639, in createStorageDomain
    max_hosts=max_hosts)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/nfsSD.py", line 97, in create
    storage_block_size = cls._detect_block_size(sdUUID, mntPoint)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 860, in _detect_block_size                                                                                            
    block_size = iop.probe_block_size(mountpoint)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 370, in probe_block_size                                                                                        
    return self._ioproc.probe_block_size(dir_path)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
    "probe_block_size", {"dir": dir_path}, self.timeout)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
    raise OSError(errcode, errstr)
PermissionError: [Errno 1] Operation not permitted
2019-11-28 09:11:51,685+0200 INFO  (jsonrpc/1) [storage.TaskManager.Task] (Task='92ecf591-69d5-4bd0-8b7b-2eb016a0311d') aborting: Task is aborted: 'value=[Errno 1] Operation not permitted abortedcode=100' (task:1200)
2019-11-28 09:11:51,685+0200 ERROR (jsonrpc/1) [storage.Dispatcher] FINISH createStorageDomain error=[Errno 1] Operation not permitted (dispatcher:87)                                       
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 110, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 1208, in prepare
    raise self.error
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-123>", line 2, in createStorageDomain
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2639, in createStorageDomain                                                                                             
    max_hosts=max_hosts)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/nfsSD.py", line 97, in create
    storage_block_size = cls._detect_block_size(sdUUID, mntPoint)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 860, in _detect_block_size
    block_size = iop.probe_block_size(mountpoint)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 370, in probe_block_size
    return self._ioproc.probe_block_size(dir_path)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
    "probe_block_size", {"dir": dir_path}, self.timeout)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
    raise OSError(errcode, errstr)
PermissionError: [Errno 1] Operation not permitted
2019-11-28 09:11:51,685+0200 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call StorageDomain.create failed (error 351) in 0.24 seconds (__init__:312)

Tried to find relevant logs from ioprocess, failed.

Version-Release number of selected component (if applicable):

vdsm-4.40.0-141.gitb9d2120.el8ev.x86_64
ioprocess-1.3.0-1.el8ev.x86_64

How reproducible:
Not sure. On my system, always (tried 3 times).

Steps to Reproduce:
1. Deploy hosted-engine with nfs
2.
3.

Actual results:

Fails as above

Expected results:

Succeeds

Additional info:

I currently have the machine on which I tried this, can provide access if needed

Comment 1 Yedidyah Bar David 2019-11-28 11:02:44 UTC
The nfs server I used seems to have squashed all UIDs to nobody. I set permissions on the root directory to be mounted to be 1777. This made it possible for everyone to create files, but only for the file owner to remove them. Since all were squashed to nobody, vdsm failed to remove files after it created them. Setting permissions to 777 made it succeed.

Closing for now. If anyone decides to reopen, the fix should probably be to add more logging, to make it clear what failed exactly.

Comment 2 Vojtech Juranek 2019-11-28 11:12:04 UTC
RFE for improving logging is tracked under BZ #1777805


Note You need to log in before you can comment on or make changes to this bug.