Bug 1777726

Summary: [4.4.0-4] failed to deploy HE over NFS on RHEL-8.1 in probe_block_size
Product: [oVirt] vdsm Reporter: Yedidyah Bar David <didi>
Component: CoreAssignee: bugs <bugs>
Status: CLOSED NOTABUG QA Contact: Avihai <aefrat>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.40.0CC: bugs, vjuranek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-28 11:02:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yedidyah Bar David 2019-11-28 07:50:03 UTC
Description of problem:

Tried to deploy hosted-engine with NFS storage, it failed.

This is in ovirt-hosted-engine-setup-ansible-create_storage_domain-2019102891136-02bez8.log:

2019-11-28 09:11:52,269+0200 DEBUG var changed: host "localhost" var "otopi_storage_domain_details_nfs" type "<class 'dict'>" value: "{                                                      
    "changed": false,
    "exception": "Traceback (most recent call last):\n  File \"/tmp/ansible_ovirt_storage_domain_payload_56jzsnju/ansible_ovirt_storage_domain_payload.zip/ansible/modules/cloud/ovirt/ovirt_storage_domain.py\", line 792, in main\n  File \"/tmp/ansible_ovirt_storage_domain_payload_56jzsnju/ansible_ovirt_storage_domain_payload.zip/ansible/module_utils/ovirt.py\", line 621, in create\n    **kwargs\n  File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py\", line 25168, in add\n    return self._internal_add(storage_domain, headers, query, wait)\n  File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 232, in _internal_add\n    return future.wait() if wait else future\n  File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 55, in wait\n    return self._code(response)\n  File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 229, in callback\n    self._check_fault(response)\n  File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 132, in _check_fault\n    self._raise_error(response, body)\n  File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 118, in _raise_error\n    raise error\novirtsdk4.Error: Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". HTTP response code is 400.\n",  
    "failed": true,
    "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". HTTP response code is 400."                                                         
}"

This is in vdsm.log:

2019-11-28 09:11:51,677+0200 INFO  (ioprocess/2782) [IOProcess] (156cd2e7-8ce8-440a-8b35-e7eb13523442) Starting ioprocess (__init__:434)
2019-11-28 09:11:51,685+0200 INFO  (jsonrpc/1) [vdsm.api] FINISH createStorageDomain error=[Errno 1] Operation not permitted from=::ffff:192.168.222.91,36782, flow_id=2f9c8796, task_id=92ecf
591-69d5-4bd0-8b7b-2eb016a0311d (api:52)
2019-11-28 09:11:51,685+0200 ERROR (jsonrpc/1) [storage.TaskManager.Task] (Task='92ecf591-69d5-4bd0-8b7b-2eb016a0311d') Unexpected error (task:889)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-123>", line 2, in createStorageDomain
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2639, in createStorageDomain
    max_hosts=max_hosts)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/nfsSD.py", line 97, in create
    storage_block_size = cls._detect_block_size(sdUUID, mntPoint)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 860, in _detect_block_size                                                                                            
    block_size = iop.probe_block_size(mountpoint)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 370, in probe_block_size                                                                                        
    return self._ioproc.probe_block_size(dir_path)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
    "probe_block_size", {"dir": dir_path}, self.timeout)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
    raise OSError(errcode, errstr)
PermissionError: [Errno 1] Operation not permitted
2019-11-28 09:11:51,685+0200 INFO  (jsonrpc/1) [storage.TaskManager.Task] (Task='92ecf591-69d5-4bd0-8b7b-2eb016a0311d') aborting: Task is aborted: 'value=[Errno 1] Operation not permitted abortedcode=100' (task:1200)
2019-11-28 09:11:51,685+0200 ERROR (jsonrpc/1) [storage.Dispatcher] FINISH createStorageDomain error=[Errno 1] Operation not permitted (dispatcher:87)                                       
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 110, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 1208, in prepare
    raise self.error
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 896, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-123>", line 2, in createStorageDomain
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2639, in createStorageDomain                                                                                             
    max_hosts=max_hosts)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/nfsSD.py", line 97, in create
    storage_block_size = cls._detect_block_size(sdUUID, mntPoint)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 860, in _detect_block_size
    block_size = iop.probe_block_size(mountpoint)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 370, in probe_block_size
    return self._ioproc.probe_block_size(dir_path)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
    "probe_block_size", {"dir": dir_path}, self.timeout)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
    raise OSError(errcode, errstr)
PermissionError: [Errno 1] Operation not permitted
2019-11-28 09:11:51,685+0200 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call StorageDomain.create failed (error 351) in 0.24 seconds (__init__:312)

Tried to find relevant logs from ioprocess, failed.

Version-Release number of selected component (if applicable):

vdsm-4.40.0-141.gitb9d2120.el8ev.x86_64
ioprocess-1.3.0-1.el8ev.x86_64

How reproducible:
Not sure. On my system, always (tried 3 times).

Steps to Reproduce:
1. Deploy hosted-engine with nfs
2.
3.

Actual results:

Fails as above

Expected results:

Succeeds

Additional info:

I currently have the machine on which I tried this, can provide access if needed

Comment 1 Yedidyah Bar David 2019-11-28 11:02:44 UTC
The nfs server I used seems to have squashed all UIDs to nobody. I set permissions on the root directory to be mounted to be 1777. This made it possible for everyone to create files, but only for the file owner to remove them. Since all were squashed to nobody, vdsm failed to remove files after it created them. Setting permissions to 777 made it succeed.

Closing for now. If anyone decides to reopen, the fix should probably be to add more logging, to make it clear what failed exactly.

Comment 2 Vojtech Juranek 2019-11-28 11:12:04 UTC
RFE for improving logging is tracked under BZ #1777805