Bug 567113
Summary: | ext4 journal_data open O_SYNC etc. fails with EINVAL, blocks postgresql wal default sync method | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Bruno Wolff III <bruno> |
Component: | kernel | Assignee: | Eric Sandeen <esandeen> |
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Priority: | high |
Version: | 13 | CC: | anton, devrim, dougsland, esandeen, fche, gansalmon, itamar, jonathan, kernel-maint, tgl |
Keywords: | Regression | Doc Type: | Bug Fix |
Hardware: | All | OS: | Linux |
Last Closed: | 2010-05-05 19:25:32 UTC | | |
Description
Bruno Wolff III
2010-02-21 18:07:13 UTC
Hm, I tried to install f13 so I could investigate this, but today's boot.iso freezes at the welcome screen :-(. Could you provide a bit more detail about how you installed this? Also, what filesystem did you pick?

I did yum updates for the two systems I had been seeing this on. Both have /var/lib/pgsql on ext3. However, this got me to try it on another system where /var/lib/pgsql is on an ext4 file system and it does not exhibit this problem. This machine was also yum updated, though it doesn't go as far back as the other 2. One of the problem machines had been running the corresponding f12 version of postgres and once I had things going again the existing data was usable. Maybe there was a change to ext3 support that broke the default sync type (open_datasync according to the comments). These 3 machines are all i686. Tomorrow I'll have access to an x86_64 machine (I am locked out right now due to an openssh/selinux issue) and I'll report what I find there.

Other notes. I tried switching selinux to permissive and checking the audit log and didn't find any reason to believe selinux is causing the problem. I tried using both existing data directories (from the corresponding f12 version of postgres) and fresh initdb's. The results didn't seem to be related to that. I haven't tried doing initdb's on different file systems on the same machine, though I think I have one that has both. If you want I could look at that? I also haven't tried any specific fsync types other than fsync. If you think it would be useful, I could try all of the different possible ones and see what results I get?

Yeah, please, on both points. I think this is almost certainly a kernel regression, but we ought to try to narrow it down as much as possible before bugging those folk about it.

I did some more testing but the results seem mysterious to me. Whether open_sync and open_datasync worked seemed to be tied to the machine, but not the filesystem type.
All three machines should have packages at the same version, though not all had the same set of packages installed. 2 and 3 are mostly identical hardware, but 2 has a couple of extra pata drives thrown in. None are using lvm, all are using raid 1, and machines 1, 3 and 4 use luks. However machines 1, 2 and 4 use noatime and barrier=1 as mount options and machine 3 doesn't. I'll be checking that shortly.

Machine | Arch | Filesystem | Sync method | Result |
---|---|---|---|---|
machine1 | x86_64 | ext3 | fsync | OK |
machine1 | x86_64 | ext3 | fsync_writethrough | invalid |
machine1 | x86_64 | ext3 | open_sync | fail |
machine2 | i686 | ext3 | default | fail |
machine2 | i686 | ext3 | open_datasync | fail |
machine2 | i686 | ext3 | fdatasync | OK |
machine2 | i686 | ext3 | fsync | OK |
machine2 | i686 | ext3 | fsync_writethrough | invalid |
machine2 | i686 | ext3 | open_sync | fail |
machine2 | i686 | ext4 | default | fail |
machine2 | i686 | ext4 | open_datasync | fail |
machine2 | i686 | ext4 | fdatasync | OK |
machine2 | i686 | ext4 | fsync | OK |
machine2 | i686 | ext4 | fsync_writethrough | invalid |
machine2 | i686 | ext4 | open_sync | fail |
machine3 | i686 | ext4 | default | OK |
machine3 | i686 | ext4 | open_datasync | OK |
machine3 | i686 | ext4 | fdatasync | OK |
machine3 | i686 | ext4 | fsync | OK |
machine3 | i686 | ext4 | fsync_writethrough | invalid |
machine3 | i686 | ext4 | open_sync | OK |

It doesn't look like either noatime or barrier=1 makes a difference. I am going to check journalling type next.

Setting the default mount option journal_data (with tune2fs) triggers the behavior.

OK, I finally got f-13 alpha rc4 installed here, and I confirm it: things are fine by default, but after you set tune2fs -o journal_data /dev/sdaX and reboot, the database no longer starts. strace'ing the postmaster shows it fails here:

    [pid 1662] open("pg_xlog/000000010000000000000000", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = -1 EINVAL (Invalid argument)

I suspect the kernel is unhappy about either O_SYNC or O_DIRECT, but in any case this is a regression from previous behavior. Reassigning to kernel to see what they say about it.
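The strace output above leaves open which of O_SYNC and O_DIRECT the kernel rejects. A minimal sketch of how one might test each flag separately from userspace follows. The helper name `probe_open_flags` and the scratch-file naming are hypothetical, introduced here only for illustration; they are not part of PostgreSQL or of anything used in this bug.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch file in `dir`, reopen it with
 * O_RDWR | extra_flags, and return 0 on success or the errno value that
 * mkstemp() or open() reported. The scratch file is removed afterwards. */
int probe_open_flags(const char *dir, int extra_flags)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/probe-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return ENAMETOOLONG;

    int fd = mkstemp(path);     /* creates the file with a unique name */
    if (fd < 0)
        return errno;
    close(fd);

    /* Reopen with only the flags under test added to O_RDWR. */
    fd = open(path, O_RDWR | extra_flags);
    int err = (fd < 0) ? errno : 0;
    if (fd >= 0)
        close(fd);

    unlink(path);
    return err;
}
```

Calling `probe_open_flags(dir, O_SYNC)` and `probe_open_flags(dir, O_DIRECT)` for a directory on the affected filesystem would show which flag produces EINVAL; the return value can be passed to `strerror()` for a readable message.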
Reproduced here with: kernel-PAE-2.6.33-1.fc13.i686

If this isn't fixed by final release, it would probably be good to put something in the release notes about it under postgresql.

Sorry for not seeing this bug sooner. There is no ->direct_IO method for ext3 or ext4 in journalled data mode - never has been, AFAIK. In my testing, ext3 & ext4 fail the same way; it's not unique to ext4.

    open("/mnt/test/blah", O_WRONLY|O_CREAT|O_TRUNC|O_DIRECT, 0666) = -1 EINVAL (Invalid argument)

__dentry_open() does this:

    /* NB: we're sure to have correct a_ops only after f_op->open */
    if (f->f_flags & O_DIRECT) {
            if (!f->f_mapping->a_ops ||
                ((!f->f_mapping->a_ops->direct_IO) &&
                (!f->f_mapping->a_ops->get_xip_mem))) {
                    fput(f);
                    f = ERR_PTR(-EINVAL);
            }
    }

so, you get EINVAL. If this "worked" before, I suspect that at a minimum you weren't actually getting direct IO... (spot-checks RHEL5 ... same deal) Did this really use to work? I'm doubtful.

-Eric

It worked in that postgres used to run with that setting. I have no idea if direct IO was really used or not, as I didn't test on that level. Maybe the error code being returned changed or something of that kind.

I'm pretty sure that postgres' behavior didn't change in this respect between f12 and f13. Are you sure that you had the filesystem set to journal_data mode before?

I don't believe I touched anything related to the journal mode setting at the time I started seeing the postgresql server stop working. It's possible that some other behavior changed and that while I had requested journal_data, I wasn't really getting it. I am running software raid 1 and I remember that for a long time I was getting errors related to that and write barriers. So maybe journalling was being disabled because of the lack of support for write barriers with software raid?

Barriers shouldn't be related. From what I see, it's not that DIO would fail, it's that O_DIRECT open would fail as well, at least since 2.6.18 kernels.
-Eric

If this is the expected behavior, then it's not a regression and we can close this? Probably not too many people will run into this. It might make sense to document this somewhere in postgres.

Well, I do not think it is a regression, I just wish we knew for sure why it seemed to work for you before. :) Fine by me to close it though; there isn't any ->direct_IO for data=journal ... not even in rhel4 (just checked)

-Eric

All of my machines are running F13 now. I have one that I can wipe, but I am short on time. Rebuilding that machine with F11 or early F12 will be low on my priority list. If I can get postgres to work with the direct IO setting and data journaling, I'll try upgrading the kernel and see if that changes things. But considering the low impact, I won't be rushing to do this relative to all of the other Fedora stuff I am supposed to be working on.
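Since the thread concludes that an O_DIRECT open is expected to fail with EINVAL on data=journal filesystems, an application that wants direct IO only opportunistically could retry without the flag. The sketch below illustrates that idea under stated assumptions: `open_with_fallback` is a hypothetical helper, and nothing here claims to describe what PostgreSQL itself does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* lets callers pass O_DIRECT on glibc */
#endif
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: open `path` with base_flags | optional_flags. If the
 * kernel rejects that combination with EINVAL, retry with base_flags only.
 * *used_optional is set to 1 when the optional flags are in effect and to 0
 * otherwise. Returns the file descriptor, or -1 with errno set. */
int open_with_fallback(const char *path, int base_flags, int optional_flags,
                       int *used_optional)
{
    *used_optional = 0;

    /* The mode argument is only consulted when base_flags has O_CREAT. */
    int fd = open(path, base_flags | optional_flags, 0600);
    if (fd >= 0) {
        *used_optional = 1;
        return fd;
    }
    if (errno != EINVAL)
        return -1;              /* unrelated failure: report it unchanged */

    /* EINVAL: assume the optional flags are unsupported here and drop them. */
    return open(path, base_flags, 0600);
}
```

A caller would pass `O_DIRECT` as `optional_flags` and check `used_optional` afterwards to learn whether direct IO is actually active. Note that EINVAL can have other causes, so treating it as "flag unsupported" is an assumption of this sketch.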