Bug 1281994
Summary: | Slow SQL replication under the 4.2 kernels | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | LukasH <k-rh-bugzilla> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 22 | CC: | gansalmon, hhorak, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2016-02-15 11:47:36 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
LukasH
2015-11-14 01:24:29 UTC
Which filesystem is in use and are the processes hung or sleeping?

The filesystem is ext4, with nothing unusual in the FS flags (or in the Fedora install at all; it is a normal "F22 Server Edition" with SELinux not enabled). But I was mistaken about the 4.2 kernels; the problem is elsewhere. It looks like the replication delay starts at some point regardless of kernel version, and that point is probably `systemctl restart mariadb.service`. A reboot (again, regardless of kernel version) fixes it. I have two F22 SQL slaves. I observed this on the first one and will try to verify the same behaviour on the second one this evening.

In any case, I'm pretty sure that the same action (`systemctl restart mariadb.service`) works fine, without any effect on replication delay, on F21; the configuration of the replication processes and of MariaDB as a whole was exactly the same many months ago. So this bug is very likely related to something in fc22, but I really don't know what or where, or what I should trace or observe to make this bug report more specific.

Some additional info: on 4.3 kernels, the replication situation is the same or slightly worse. SQL replication is almost certainly not the cause, just a side effect of occasional huge load and a huge number of forks/processes on the server(s). So it is probably kernel (4.2/4.3) and/or systemd related. I found similar trouble tickets about systemd (a too-small limit on processes/tasks/threads per user) on the Arch Linux forum, but I tried their recommended "stress test", and this is probably not the case on F23 with current versions of systemd, the kernel, etc.

It is definitely not a hardware fault, as I observe exactly the same problem (huge CPU/fork load and lagging replication) on several servers with various architectures (nVidia, AMD, Intel based, one VM under VBox...).
And finally: exactly the same configuration (SQL replication and everything else) works like a charm on an F21 / 4.1.x kernel instance...

Problem solved. The cause was heavy iowait/I/O load from the jbd2 journal process, and, on the SQL side, it was resolved with these settings: innodb_flush_log_at_trx_commit=2 and innodb_flush_log_at_timeout=30 (instead of the default innodb_flush_log_at_trx_commit=1, which forces a sync to disk after every commit). In any case, something must be different in the ext4/jbd2 code in the kernel, because the same configuration, SQL traffic and everything else works fine under 4.1.x kernels.
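
For anyone wanting to see why the flush setting matters for the journal, here is a rough sketch of the two flushing policies. It is not InnoDB code and not a benchmark of this bug: the function `write_commits` and its parameters are hypothetical names made up for illustration. It only assumes that each "commit" is a small append to a log file, and compares calling `fsync` after every append (roughly what innodb_flush_log_at_trx_commit=1 asks for) with calling it only after a time interval has passed (roughly what the 2 / timeout combination asks for).

```python
import os
import tempfile
import time


def write_commits(path, n_commits, fsync_every_commit, flush_interval_s=1.0):
    """Append n_commits small records to path.

    Returns (fsync_calls, elapsed_seconds). Hypothetical helper, only meant
    to illustrate per-commit versus periodic flushing.
    """
    fsync_calls = 0
    start = time.monotonic()
    last_flush = start
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for i in range(n_commits):
            os.write(fd, b"commit %d\n" % i)
            now = time.monotonic()
            # Policy 1: force the data to disk after every single commit.
            # Policy 2: only force it once the flush interval has elapsed.
            if fsync_every_commit or now - last_flush >= flush_interval_s:
                os.fsync(fd)
                fsync_calls += 1
                last_flush = now
        if not fsync_every_commit:
            # Periodic policy: flush whatever is still pending at the end.
            os.fsync(fd)
            fsync_calls += 1
    finally:
        os.close(fd)
    return fsync_calls, time.monotonic() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for label, every in (("fsync per commit", True), ("periodic fsync", False)):
            calls, secs = write_commits(
                os.path.join(tmp, label.replace(" ", "_") + ".log"),
                n_commits=500,
                fsync_every_commit=every,
                flush_interval_s=30.0,
            )
            print(f"{label}: {calls} fsync calls, {secs:.3f} s")
```

On a journaling filesystem each `fsync` generally forces journal activity, so the per-commit policy issues one such request per transaction while the periodic policy issues very few. The trade-off is durability: with periodic flushing, commits from the last interval can be lost if the machine crashes or loses power.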