Bug 1086725 - Disk I/O errors in WAL tests
Summary: Disk I/O errors in WAL tests
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: sqlite
Version: rawhide
Hardware: ppc64le
OS: Linux
Target Milestone: ---
Assignee: Jan Staněk
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F-ExcludeArch-ppc64le, PPC64LETracker
Reported: 2014-04-11 11:39 UTC by Éric Fintzel
Modified: 2014-09-11 08:32 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-11 08:32:14 UTC
Type: Bug
Embargoed:


Attachments
sqlite.spec.ppc64le.ignore.test.error.patch (752 bytes, patch)
2014-04-25 12:55 UTC, Michel Normand

Description Éric Fintzel 2014-04-11 11:39:32 UTC
Description of problem:
Disk I/O errors in some WAL tests of the sqlite build for ppc64le.
Problem also occurs with ppc64.

Version-Release number of selected component (if applicable):
sqlite-3.8.4.2-2.fc21.src.rpm

How reproducible:
Build with make check active.

Actual results:
"Error: disk I/O error" message on some tests.

Additional info:

Trying to build and make checks on the ppc64le arch.
Got some disk I/O errors on the WAL (Write-Ahead Logging) tests.

Reproduced the problem with the sqlite3 command-line tool. The disk I/O error seems to occur once the -shm file associated with the database file reaches 65536 bytes (its last byte is at offset 65535). Here is the corresponding log:

sqlite> .log stdout
sqlite> vacuum;
(5386) os_unix.c:28099: (22) mmap(/tmp/mydb-shm) - 
(5386) statement aborts at 5: [ATTACH '' AS vacuum_db;] disk I/O error
(10) statement aborts at 2: [vacuum;] disk I/O error
Error: disk I/O error
sqlite>


The strace tool was used to capture some more information:

$ strace sqlite3 mydb 'VACUUM;' 2>&1 | tee mydb-vacuum.strace
...
lseek(5, 53247, SEEK_SET)               = 53247
write(5, "\0", 1)                       = 1
lseek(5, 57343, SEEK_SET)               = 57343
write(5, "\0", 1)                       = 1
lseek(5, 61439, SEEK_SET)               = 61439
write(5, "\0", 1)                       = 1
lseek(5, 65535, SEEK_SET)               = 65535
write(5, "\0", 1)                       = 1
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0x8000) = -1 EINVAL (Invalid argument)
fcntl(5, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=121, len=7}) = 0
fcntl(5, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=120, len=1}) = 0
write(2, "Error: disk I/O error\n", 22Error: disk I/O error
) = 22
exit_group(10)                          = ?
+++ exited with 10 +++


Looking at the os_unix.c file, in the unixShmMap() function:
...
    pShmNode->apRegion = apNew;
    while(pShmNode->nRegion<=iRegion){
      void *pMem;
      if( pShmNode->h>=0 ){
        pMem = osMmap(0, szRegion,
            pShmNode->isReadonly ? PROT_READ : PROT_READ|PROT_WRITE,
            MAP_SHARED, pShmNode->h, szRegion*(i64)pShmNode->nRegion
        );
        if( pMem==MAP_FAILED ){
4410      rc = unixLogError(SQLITE_IOERR_SHMMAP, "mmap", pShmNode->zFilename);
          goto shmpage_out;
        }
      }else{
        pMem = sqlite3_malloc(szRegion);
        if( pMem==0 ){
          rc = SQLITE_NOMEM;
          goto shmpage_out;
        }
        memset(pMem, 0, szRegion);
      }
      pShmNode->apRegion[pShmNode->nRegion] = pMem;
      pShmNode->nRegion++;
    }
  }
...

At line 4410, the error message regarding mmap is displayed because the call to osMmap() failed. The value of the SQLITE_IOERR_SHMMAP symbol (defined in sqlite3.h) is 5386, which matches the code shown in the sqlite log, and the (22) next to it is errno, i.e. EINVAL.


osMmap() is defined in the same os_unix.c file as:
/*                                                                              
** Many system calls are accessed through pointer-to-functions so that          
** they may be overridden at runtime to facilitate fault injection during       
** testing and sandboxing.  The following array holds the names and pointers    
** to all overrideable system calls.                                            
*/
static struct unix_syscall {
  const char *zName;            /* Name of the system call */
  sqlite3_syscall_ptr pCurrent; /* Current value of the system call */
  sqlite3_syscall_ptr pDefault; /* Default value */
} aSyscall[] = {
...
#if !defined(SQLITE_OMIT_WAL) || SQLITE_MAX_MMAP_SIZE>0
  { "mmap",       (sqlite3_syscall_ptr)mmap,     0 },
#define osMmap ((void*(*)(void*,size_t,int,int,int,off_t))aSyscall[21].pCurrent)
...
}; /* End of the overrideable system calls */


Failing call:
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0x8000) = -1 EINVAL (Invalid argument)

Looking at the mmap() man page, errno can be set to EINVAL in the following three cases:
...
       EINVAL We don't like addr, length, or offset (e.g., they are too large,
              or not aligned on a page boundary).

       EINVAL (since Linux 2.6.12) length was 0.

       EINVAL flags  contained neither MAP_PRIVATE or MAP_SHARED, or contained
              both of these values.
...

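The first case is probably the relevant one: length is nonzero and flags contains only MAP_SHARED, which leaves addr, length, or offset. The offset 0x8000 (32768) would not be page-aligned on a system whose page size is larger than 32 KiB.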


Note that this problem does not seem to be specific to ppc64le, since it also occurs with a ppc64 build, see:

http://ppc.koji.fedoraproject.org/kojifiles/packages/sqlite/3.8.4/1.fc21/data/logs/ppc64/build.log

Comment 1 Michel Normand 2014-04-25 12:55:27 UTC
Created attachment 889744
sqlite.spec.ppc64le.ignore.test.error.patch

Comment 2 Michel Normand 2014-04-25 12:58:18 UTC
I suggest modifying the spec file as per the above sqlite.spec.ppc64le.ignore.test.error.patch so that ppc64le gets the same bypass already applied to other architectures, as tracked by bug 1041279.

Comment 3 Michel Normand 2014-06-02 14:32:56 UTC
(In reply to Michel Normand from comment #2)
> I suggest to modify the spec file as per above
> sqlite.spec.ppc64le.ignore.test.error.patch to have same bypass for ppc64le
> as already done for other archi as tracked by bug 1041279.

The latest available build, sqlite-3.8.4.3-3.fc21, still fails on ppc64le and still needs the spec file update suggested in the previous comment.
===
$ git diff
diff --git a/sqlite.spec b/sqlite.spec
index df042ea..140a350 100644
--- a/sqlite.spec
+++ b/sqlite.spec
@@ -164,7 +164,7 @@ rm -f $RPM_BUILD_ROOT/%{_libdir}/*.{la,a}
 # XXX shell tests are broken due to loading system libsqlite3, work around...
 export LD_LIBRARY_PATH=`pwd`/.libs
 export MALLOC_CHECK_=3
-%ifarch s390 s390x ppc ppc64 %{sparc}
+%ifarch s390 s390x ppc %{power64} %{sparc}
 make test || :
 %else
 make test
===

Comment 4 Michel Normand 2014-09-03 12:34:09 UTC
With sqlite-3.8.6-2, the latest available version, sqlite builds and passes its tests on ppc64le. On ppc64 there are 7 errors, but the current spec file bypasses them.
So it should be possible to close this bug.

http://ppc.koji.fedoraproject.org/koji/buildinfo?buildID=259821
=== ppc64
7 errors out of 212993 tests
Failures on these tests: fts3conf-3.1 fts3conf-3.2 fts3conf-3.3 fts3conf-3.4 fts3conf-3.5 fts3conf-3.6 fts3conf-3.8
=== ppc64le
0 errors out of 212994 tests
===

Comment 5 Jan Staněk 2014-09-11 08:32:14 UTC
After checking the logs, I agree that this bug is probably fixed - closing.

If problems persist, feel free to re-open.

