Description of problem: IOProcess.readfile(direct=True) is expected to read a file using direct I/O, bypassing the host page cache. Due to unfortunate error, it does not use direct I/O: The underlying file is opened here: 668 fd = open(path->str, flags); But the O_DIRECT flag is added after file was opened: 696 if (direct) { 697 flags |= O_DIRECT; 698 } There error was introduce in: commit 0428d6867ee9bfd22b26739989e19dffa58351d4 Author: Saggi Mizrahi <ficoos> Date: Tue Dec 31 13:02:56 2013 +0200 Initial Commit Signed-off-by: Saggi Mizrahi <ficoos> Version-Release number of selected component (if applicable): Any. How reproducible: Always Steps to Reproduce: 1. call IOProcess.readfile(direct=True) Not using direct I/O may return stale data from the page cache when another host was modifying the file data on another host. I'm not sure if this is practically possible. We had similar bug for IOProcess.writefile(direct=True) (bug 1535401) closed as wontfix, since we don't need direct I/O for writing.
This affects one call in vdsm: 326 def directReadLines(ioproc, path): 327 fileStr = ioproc.readfile(path, direct=True) 328 return fileStr.splitlines(True) We have 2 callers of this code: 136 def readlines(self): 137 try: 138 return stripNewLines(self._oop.directReadLines(self._metafile)) 139 except (IOError, OSError) as e: 140 if e.errno != errno.ENOENT: 141 raise 142 return [] This is used for reading storage domain metadata on all file based domains (NFS, GlusterFS, localfs, POSIX). 144 def getMetadata(self, metaId=None): 145 """ 146 Get Meta data array of key,values lines 147 """ 148 if not metaId: 149 metaId = self.getMetadataId() 150 151 volPath, = metaId 152 metaPath = self.getMetaVolumePath(volPath) 153 154 try: 155 lines = self.oop.directReadLines(metaPath) 156 except Exception as e: 157 self.log.error(e, exc_info=True) 158 raise se.VolumeMetadataReadError("%s: %s" % (metaId, e)) 159 160 md = VolumeMetadata.from_lines(lines) 161 return md This is for reading volume metadata in all file based storage domains (NFS, GlusterFS, localfs, POSIX). This may cause storage domain or volume metadata to be stale, missing updates done on another host, for example, storage job updating volume metadata, or domain upgraded to new version.
Should be available in 4.3.6. ioprocess including this fix was merged, vdsm patch requiring it is posted.
Nir, please provide a clear reproduction/verification steps
I don't think QE can reproduce this, and this is fix is covered by automated tests, so the only thing QE can do is to run the standard automated tests to ensure that we don't have any regressions.
Verified on build rhv-4.3.6-5/ vdsm-4.30.29-2.el7ev.x86_64.
This bugzilla is included in oVirt 4.3.6 release, published on September 26th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.6 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.