Bug 1574900

Summary: VDSErrorException: Failed to CreateStorageDomainVDS, error = Error creating a storage domain: ('storageType=7, ..), code = 351
Product: [oVirt] ovirt-engine
Reporter: Petr Balogh <pbalogh>
Component: BLL.Storage
Assignee: Maor <mlipchuk>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Elad <ebenahar>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.2.2
CC: bugs, kdhananj, mlipchuk, pbalogh, sabose, tnisan
Target Milestone: ovirt-4.3.0
Keywords: Automation
Target Release: ---
Flags: rule-engine: ovirt-4.3+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-07-18 09:15:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com:_GE__he2__volume01 (flags: none)
rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com:_GE__he2__volume02 (flags: none)
rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com:_GE__he2__volume03 (flags: none)
mountpoint (flags: none)
cli log (flags: none)

Description Petr Balogh 2018-05-04 09:59:24 UTC
Description of problem:
When adding glusterfs storage domains in parallel via Ansible modules, I got an SD error on the last one:
Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". HTTP response code is 400.
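
(Illustrative sketch only: the run used Ansible oVirt modules, but the equivalent parallel flow through the oVirt Python SDK (ovirtsdk4) looks roughly like the following. The engine URL, credentials, host name, gluster address and volume paths are placeholders, not values from this environment.)

from concurrent.futures import ThreadPoolExecutor

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

VOLUMES = ['/GE_he2_volume01', '/GE_he2_volume02', '/GE_he2_volume03']

def add_gluster_sd(name, path):
    # One connection per worker thread; nothing is shared between threads.
    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',  # placeholder
        username='admin@internal',
        password='secret',                                   # placeholder
        ca_file='ca.pem',
    )
    try:
        sds_service = connection.system_service().storage_domains_service()
        return sds_service.add(
            types.StorageDomain(
                name=name,
                type=types.StorageDomainType.DATA,
                host=types.Host(name='host_mixed_1'),
                storage=types.HostStorage(
                    type=types.StorageType.GLUSTERFS,
                    address='gluster01.example.com',         # placeholder
                    path=path,
                ),
            ),
        )
    finally:
        connection.close()

# Add all three SDs in parallel, mimicking the concurrent Ansible tasks.
with ThreadPoolExecutor(max_workers=len(VOLUMES)) as pool:
    futures = [pool.submit(add_gluster_sd, 'test_gluster_%d' % i, path)
               for i, path in enumerate(VOLUMES, start=1)]
    for f in futures:
        f.result()  # raises sdk.Error if the engine returns the 400 fault above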

In the oVirt engine log I see:
2018-05-04 11:24:37,389+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-46) [5ea48b7d] Command 'CreateStorageDomainVDSCommand(HostName = host_mixed_1, CreateStorageDomainVDSCommandParameters:{hostId='2a5f8f74-d7de-419f-a43a-b612932d7411', storageDomain='StorageDomainStatic:{name='test_gluster_2', id='03cc8bca-218d-47ac-801e-327f966c1df7'}', args='storage.server.com:/GE_he2_volume03'})' execution failed: VDSGenericException: VDSErrorException: Failed to CreateStorageDomainVDS, error = Error creating a storage domain: ('storageType=7, sdUUID=03cc8bca-218d-47ac-801e-327f966c1df7, domainName=test_gluster_2, domClass=1, typeSpecificArg=storage.server.com:/GE_he2_volume03 domVersion=4',), code = 351

For the rest, see the attached log files.

When I later tried to add this SD from the UI, I didn't hit the issue again.

Version-Release number of selected component (if applicable):
rhv 4.2.3-5

How reproducible:
Not sure whether it's easily reproducible, but as mentioned above, we hit this issue when adding these glusterfs SDs in parallel.

Actual results:
Storage domain creation fails with the error above.

Expected results:
All SDs are added successfully.

Additional info:
Logs attached

Comment 3 Yaniv Kaul 2018-05-07 00:57:47 UTC
Have you seen anything on VDSM? Gluster logs?

Comment 4 Allon Mureinik 2018-05-07 11:48:25 UTC
Maor, can you take a look please?

Comment 5 Maor 2018-05-07 12:37:31 UTC
It seems like the VDSM logs from the time the exception occurred (2018-05-04 11:24:37) are missing.
Can you please add the VDSM logs of host_mixed_2 and host_mixed_1 covering that entire hour?

From what we have so far, I can tell that the reason you probably succeeded in creating the SD from the GUI is that host_mixed_2 was used to create the storage domain.
The exception that occurred at 11:24 happened when host_mixed_1 was used to create the storage domain.

Comment 8 Petr Balogh 2018-05-09 15:49:07 UTC
I reproduced the issue when I ran the test again on the same env, so I hope I have now attached all the logs you need.

Comment 9 Maor 2018-05-10 06:58:34 UTC
It seems like the problem is that the glusterfs mount is in read-only mode (see [1]); that is why the host can't create the storage domain.


[1]
2018-05-09 17:26:15,020+0300 ERROR (jsonrpc/4) [storage.TaskManager.Task] (Task='a1c7ef58-1570-4b89-a885-74a5c4f4a1e5') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in createStorageDomain
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2591, in createStorageDomain
    storageType, domVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 83, in create
    version)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 50, in _preCreateValidation
    fileSD.validateFileSystemFeatures(sdUUID, domPath)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 104, in validateFileSystemFeatures
    oop.getProcessPool(sdUUID).directTouch(testFilePath)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 320, in directTouch
    ioproc.touch(path, flags, mode)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 567, in touch
    self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 451, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 30] Read-only file system
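
(For reference, a simplified illustration, not vdsm's or ioprocess's actual code, of the direct-I/O touch that validateFileSystemFeatures performs on the test file: on a mount that has gone read-only the open fails with EROFS, i.e. the "[Errno 30] Read-only file system" above. The helper names here are made up for illustration.)

import errno
import os

def direct_touch(path):
    # Create/open the file with O_DIRECT, as the domain validation does,
    # then close it; the content doesn't matter, only that the open succeeds.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o644)
    os.close(fd)

def check_domain_mount(domain_path):
    test_file = os.path.join(domain_path, '__DIRECT_IO_TEST__')
    try:
        direct_touch(test_file)
    except OSError as e:
        if e.errno == errno.EROFS:
            print('mount is read-only:', domain_path)
        elif e.errno == errno.EINVAL:
            print('filesystem does not support direct I/O:', domain_path)
        raise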

Comment 10 Maor 2018-05-10 06:59:40 UTC
Created attachment 1434239 [details]
rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com:_GE__he2__volume01

Comment 11 Maor 2018-05-10 07:00:34 UTC
Created attachment 1434240 [details]
rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com:_GE__he2__volume02

Comment 12 Maor 2018-05-10 07:01:02 UTC
Created attachment 1434241 [details]
rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com:_GE__he2__volume03

Comment 13 Maor 2018-05-10 07:01:39 UTC
Created attachment 1434242 [details]
mountpoint

Comment 14 Maor 2018-05-10 07:02:03 UTC
Created attachment 1434244 [details]
cli log

Comment 15 Maor 2018-05-10 07:05:14 UTC
I've attached the gluster logs from Petr's attachment.
Sahina, can you please help us understand why the glusterfs mount became read-only?

Comment 16 Sahina Bose 2018-05-18 11:41:29 UTC
Krutika, could you take a look please?
There are messages of the form "failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running." in the mount logs

Comment 17 Krutika Dhananjay 2018-05-21 06:09:53 UTC
(In reply to Sahina Bose from comment #16)
> Krutika, could you take a look please?
> There are messages of the form "failed to get the port number for remote
> subvolume. Please run 'gluster volume status' on server to see if brick
> process is running." in the mount logs

Sure. The funny thing is that there is no log from GE_he2_volume03-client-1 indicating whether it successfully connected to the client or not. And a FUSE client is not supposed to process any IO (such as the "create" on __DIRECT_IO_TEST__) from the application until AFR has heard from all of its children. In this case, AFR either hasn't heard from GE_he2_volume03-client-1 at all and still notified FUSE to go ahead with IO, or there is some codepath in which a CONNECT/DISCONNECT event from GE_he2_volume03-client-1 is NOT logged.

Let me dig into client translator code and get back.
Keeping the needinfo intact until then.

-Krutika

Comment 18 Krutika Dhananjay 2018-05-21 07:30:50 UTC
Could you please attach glusterd logs as well from all 3 hosts?

You'll find them under /var/log/glusterfs and named glusterd.log.

-Krutika

Comment 19 Krutika Dhananjay 2018-05-21 07:31:33 UTC
(In reply to Krutika Dhananjay from comment #18)
> Could you please attach glusterd logs as well from all 3 hosts?
> 
> You'll find them under /var/log/glusterfs and named glusterd.log.
> 
> -Krutika

Also attach all glusterd.log* files in case they got rotated over the weekend.

-Krutika

Comment 20 Krutika Dhananjay 2018-05-21 08:28:20 UTC
And also the log files under /var/log/glusterfs/bricks of all 3 hosts please ...

Comment 21 Maor 2018-05-29 13:43:20 UTC
Seems like this issue is more glusterfs-oriented, and it is currently waiting for needinfo; therefore postponing it to 4.3.

Comment 22 Petr Balogh 2018-06-04 12:25:48 UTC
(In reply to Krutika Dhananjay from comment #18)
> Could you please attach glusterd logs as well from all 3 hosts?
> 
> You'll find them under /var/log/glusterfs and named glusterd.log.
> 
> -Krutika

Hey, we cannot reproduce this issue with the latest builds, so we cannot provide any additional logs, and the env has already been re-provisioned multiple times...

All logs from the first host were attached in Comment 8; you can find a copy of the whole /var/log/glusterfs folder in the logs under ./HostsLogs/tmp/ovirt-logs-hypervisor/glusterfs, but I see there is no glusterd.log there.

Comment 23 Tal Nisan 2018-07-18 09:15:39 UTC
Since the bug could not be reproduced and, because of that, no additional logs are available, closing as INSUFFICIENT_DATA. Please reopen if you manage to reproduce it in the future.