Bug 862917
| Summary: | Varnish 3.0.3 hangs on startup | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Tom Moertel <tom> | ||||
| Component: | selinux-policy | Assignee: | Miroslav Grepl <mgrepl> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 17 | CC: | dominick.grift, dwalsh, ingvar, mgrepl, tore | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | 861493 | Environment: | |||||
| Last Closed: | 2012-12-20 15:37:07 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 861493 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
|
||||||
Could it be related to using a private /tmp directory in recent varnish packages? I know that PrivateTmp=true was introduced in varnish-3.0.2-2.fc17 [1], which works, but perhaps some change introduced in 3.0.3-1.fc17 depends upon normal /tmp? [1] https://bugzilla.redhat.com/show_bug.cgi?id=782539 Created attachment 624289 [details]
Command dump showing a successful start of varnishd
I'm not able to reproduce this on fedora 17/x86_64 with selinux enforcing. See attached command dump.
Ingvar
Thanks for taking a look at this problem. When I reproduced it, I did so on a nearly bare Amazon EC2 m1.small x86_64 instance running Fedora 17 (Fedora Provided ami-2ea50247 [1]) after a yum update and reboot. Is it possible that varnish-3.0.3-1.fc17 has some additional, undeclared dependency that exists within your test environment but not in a bare environment in which a person yum installs varnish-3.0.3-1.fc17? Links: [1] https://fedoraproject.org/wiki/Cloud_images By Ingvar's request:
type=AVC msg=audit(1349863214.323:1515870): avc: denied { execmem } for pid=11549 comm="varnishd" scontext=system_u:system_r:varnishd_t:s0 tcontext=system_u:system_r:varnishd_t:s0 tclass=process
type=SYSCALL msg=audit(1349863214.323:1515870): arch=c000003e syscall=9 success=no exit=-13 a0=0 a1=10000 a2=7 a3=22 items=0 ppid=11546 pid=11549 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="varnishd" exe="/usr/sbin/varnishd" subj=system_u:system_r:varnishd_t:s0 key=(null)
type=AVC msg=audit(1349863214.493:1515871): avc: denied { execmem } for pid=11563 comm="varnishd" scontext=system_u:system_r:varnishd_t:s0 tcontext=system_u:system_r:varnishd_t:s0 tclass=process
type=SYSCALL msg=audit(1349863214.493:1515871): arch=c000003e syscall=9 success=no exit=-13 a0=0 a1=10000 a2=7 a3=22 items=0 ppid=11562 pid=11563 auid=4294967295 uid=991 gid=987 euid=991 suid=991 fsuid=991 egid=987 sgid=987 fsgid=987 tty=(none) ses=4294967295 comm="varnishd" exe="/usr/sbin/varnishd" subj=system_u:system_r:varnishd_t:s0 key=(null)
Tore
Tom, see the selinux audit message above. Could this be related to your problem? Ingvar selinux-policy, please have a look at this. Ingvar Is varnishd using some kind of java or JIT operation here. Execmem is caused when you have both executable and writeable memory at the same time. Here is further explanation. http://www.akkadia.org/drepper/selinux-mem.html Daniel, There is no java involved, but varnish does something that may look like jit. varnishd calls gcc to compile its vcl configuration to a shared library, which in turn is loaded. Both these operations happens at run time. For those who do not know varnish, this may seem a bit strange, but it is among the things that makes varnish very fast. FAQ: https://www.varnish-cache.org/docs/3.0/faq/general.html#why Ingvar Ingvar, yes I get the same audit message when running 3.0.3-1.fc17 while tailing /var/log/audit/audit.log:
[root@localhost ~]# rpm -q varnish
varnish-3.0.3-1.fc17.x86_64
[root@localhost ~]# systemctl start varnish.service
type=AVC msg=audit(1349878121.947:28109): avc: denied { execmem } for pid=11974 comm="varnishd" scontext=system_u:system_r:varnishd_t:s0 tcontext=system_u:system_r:varnishd_t:s0 tclass=process
type=SYSCALL msg=audit(1349878121.947:28109): arch=c000003e syscall=9 success=no exit=-13 a0=0 a1=10000 a2=7 a3=22 items=0 ppid=11971 pid=11974 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="varnishd" exe="/usr/sbin/varnishd" subj=system_u:system_r:varnishd_t:s0 key=(null)
But the audit message is not produced under 3.0.2-2.fc17:
# yum remove varnish varnish-libs
# yum install varnish-3.0.2-2.fc17.x86_64
[root@localhost ~]# rpm -q varnish
varnish-3.0.2-2.fc17.x86_64
[root@localhost ~]# systemctl start varnish.service
type=SERVICE_START msg=audit(1349878217.984:28120): pid=0 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg=' comm="varnish" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
So I would say that we have probably identified the cause of the problem :-)
Upstream reports that 3.0.3 has started using jit-functionality in pcre, so this might also be related. selinux-policy, advice please. Ingvar We added execmem to F18. Fixed.
commit d436068f0dd4e3df288cf534a5e183ff9d79d8b2
Author: Miroslav Grepl <mgrepl>
Date: Wed Oct 17 09:00:48 2012 +0200
Allow varnish execmem
selinux-policy-3.10.0-156.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/selinux-policy-3.10.0-156.fc17 Package selinux-policy-3.10.0-156.fc17: * should fix your issue, * was pushed to the Fedora 17 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing selinux-policy-3.10.0-156.fc17' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-16347/selinux-policy-3.10.0-156.fc17 then log in and leave karma (feedback). selinux-policy-3.10.0-156.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report. |
Description of problem: The varnish-3.0.3-1.fc17 package causes varnish to hang when started by systemd if you have any custom VCL rules (which every non-trivial use will have). Downgrading to 3.0.2-2.fc17 solves the problem, as does starting varnish by hand instead of via systemd. Steps to Reproduce: 1. Install varnish-3.0.3-1.fc17.x86_64. 2. Edit /etc/varnish/default.vcl to add any logic you want, as long as it's not a no-op. For example: # cat >>/etc/varnish/default.vcl <<EOF sub vcl_recv { set req.url = regsub(req.url, " +$", ""); } EOF 3. Try to start varnish: # systemctl start varnish.service Actual results: The startup process for varnishd hangs, and systemd eventually reports the following in /var/log/messages: systemd[1]: varnish.service operation timed out. Terminating. systemd[1]: Unit varnish.service entered failed state. Expected results: Varnishd starts promptly and reliably. Additional info: You can make the test recipe produce the expected results by doing any of the following: * Removing all VCL logic from /etc/varnish/default.vcl. (Obviously, this is not a desirable solution since any non-trival use of Varnish will require custom VCL rules.) * Downgrading to the 3.0.2-2.fc17 package. * Starting varnishd by hand (as root), using the exact same ExecStart command that systemd uses: # /usr/sbin/varnishd -P /var/run/varnish.pid -f /etc/varnish/default.vcl -a :80 -T 127.0.0.1:6082 -t 120 -w 1,1000,120 -u varnish -g varnish -S /etc/varnish/secret -s file,/var/lib/varnish/varnish_storage.bin,1G These three pieces of evidence suggest that something changed between 3.0.2-2.fc17 and 3.0.3-1.fc17 that affects how Varnish compiles VCL rules into C and ultimately into a binary. That the 3.0.3-1.fc17 varnishd actually starts as expected when started by hand, but not by systemctl, suggests that the problematic part of the change is triggered by some constraint in the systemd environment that is loosened in the root user's environment. (SELinux?)