Bug 1726834

Summary: ioprocess readfile(direct=True) does not use direct I/O
Product: [oVirt] ovirt-distribution Reporter: Nir Soffer <nsoffer>
Component: ioprocessAssignee: Nir Soffer <nsoffer>
ioprocess sub component: General QA Contact: Pavol Brilla <pbrilla>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: high    
Priority: unspecified CC: aefrat, lleistne
Version: 4.4.0Keywords: ZStream
Target Milestone: ovirt-4.3.6Flags: sbonazzo: ovirt-4.3?
Target Release: 4.3.6   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: vdsm-4.30.25 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-09-26 19:43:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nir Soffer 2019-07-03 20:30:21 UTC
Description of problem:

IOProcess.readfile(direct=True) is expected to read a file using direct I/O,
bypassing the host page cache. Due to unfortunate error, it does not use 
direct I/O:

The underlying file is opened here:

668     fd = open(path->str, flags);

But the O_DIRECT flag is added after file was opened:

696     if (direct) {
697         flags |= O_DIRECT;
698     }

There error was introduce in:

commit 0428d6867ee9bfd22b26739989e19dffa58351d4
Author: Saggi Mizrahi <ficoos>
Date:   Tue Dec 31 13:02:56 2013 +0200

    Initial Commit
    
    Signed-off-by: Saggi Mizrahi <ficoos>


Version-Release number of selected component (if applicable):
Any.

How reproducible:
Always

Steps to Reproduce:
1. call IOProcess.readfile(direct=True)

Not using direct I/O may return stale data from the page cache when another
host was modifying the file data on another host.

I'm not sure if this is practically possible.

We had similar bug for IOProcess.writefile(direct=True) (bug 1535401) closed
as wontfix, since we don't need direct I/O for writing.

Comment 1 Nir Soffer 2019-07-03 20:36:31 UTC
This affects one call in vdsm:

326 def directReadLines(ioproc, path):
327     fileStr = ioproc.readfile(path, direct=True)                                                                                                                         
328     return fileStr.splitlines(True)

We have 2 callers of this code:

136     def readlines(self):
137         try:
138             return stripNewLines(self._oop.directReadLines(self._metafile))                                                                                              
139         except (IOError, OSError) as e:
140             if e.errno != errno.ENOENT:
141                 raise
142             return []

This is used for reading storage domain metadata on all file based domains
(NFS, GlusterFS, localfs, POSIX).

144     def getMetadata(self, metaId=None):
145         """
146         Get Meta data array of key,values lines
147         """
148         if not metaId:
149             metaId = self.getMetadataId()
150 
151         volPath, = metaId
152         metaPath = self.getMetaVolumePath(volPath)
153 
154         try:
155             lines = self.oop.directReadLines(metaPath)                                                                                                                   
156         except Exception as e:
157             self.log.error(e, exc_info=True)
158             raise se.VolumeMetadataReadError("%s: %s" % (metaId, e))
159 
160         md = VolumeMetadata.from_lines(lines)
161         return md

This is for reading volume metadata in all file based storage domains
(NFS, GlusterFS, localfs, POSIX).

This may cause storage domain or volume metadata to be stale, missing updates done
on another host, for example, storage job updating volume metadata, or domain upgraded
to new version.

Comment 2 Nir Soffer 2019-07-11 20:54:28 UTC
Should be available in 4.3.6. ioprocess including this fix was merged,
vdsm patch requiring it is posted.

Comment 3 Avihai 2019-08-26 11:55:16 UTC
Nir, please provide a clear reproduction/verification steps

Comment 4 Nir Soffer 2019-08-26 20:46:50 UTC
I don't think QE can reproduce this, and this is fix is covered
by automated tests, so the only thing QE can do is to run the standard
automated tests to ensure that we don't have any regressions.

Comment 5 Pavol Brilla 2019-09-02 09:56:32 UTC
Verified on build rhv-4.3.6-5/ vdsm-4.30.29-2.el7ev.x86_64.

Comment 7 Sandro Bonazzola 2019-09-26 19:43:15 UTC
This bugzilla is included in oVirt 4.3.6 release, published on September 26th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.6 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.