Bug 1383668

Summary: Incomplete Kernel installation in CentOS6 / CentOS7
Product: [Community] Spacewalk
Reporter: jochen
Component: Clients
Assignee: Jan Dobes <jdobes>
Status: CLOSED CURRENTRELEASE
QA Contact: Red Hat Satellite QA List <satqe-list>
Severity: high
Docs Contact:
Priority: unspecified
Version: 2.5
CC: eherget, jdobes, martin.matuska, mmkl2005, tschweikle, xdmoon
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: rhnsd-5.0.27-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1469682 1475039
Environment:
Last Closed: 2017-09-27 19:24:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1469682, 1484117    
Attachments:
Pull request 554 patch

Description jochen 2016-10-11 12:07:04 UTC
If a scheduled kernel installation is run via rhnsd, the kernel installation may be incomplete.

This has also been reported on the mailing list in:

https://www.redhat.com/archives/spacewalk-list/2016-September/msg00031.html

This was caused by commit: 4c50c2e6b98fddae7c750caae847ac88d0b262b3

When started as a daemon from init/systemd, rhnsd has no file descriptors open. Without the above commit, the syslog socket ends up on fd=0.

With the above commit, fd=0 is attached to /dev/null and fd=1 is the syslog socket.
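These descriptor numbers follow from the POSIX rule that open(), socket() and pipe() return the lowest-numbered free descriptor. The sketch below is a minimal illustration, assuming Linux: it runs in a forked child so the caller's descriptors are untouched, it uses an unconnected AF_UNIX datagram socket as a stand-in for the socket syslog opens to /dev/log, and `fd_layout_demo` is a hypothetical name, not rhnsd code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose POSIX/socket declarations under strict -std flags */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the layout experiment in a forked child so the caller's own
 * descriptors stay untouched. Returns 0 if the descriptor numbers match
 * the report, 1 if they do not, -1 if fork/wait failed. */
static int fd_layout_demo(int with_devnull)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Mimic a daemon started with no descriptors open
         * (256 is an arbitrary bound chosen for this sketch). */
        for (int fd = 0; fd < 256; fd++)
            close(fd);

        int devnull = -1;
        if (with_devnull)
            devnull = open("/dev/null", O_RDWR); /* /dev/null on fd 0, as after the commit */

        /* Stand-in for the socket syslog opens to /dev/log (assumption). */
        int logsock = socket(AF_UNIX, SOCK_DGRAM, 0);

        int fds[2];
        if (pipe(fds) != 0)
            _exit(1);

        int ok = with_devnull
                     ? (devnull == 0 && logsock == 1 && fds[0] == 2 && fds[1] == 3)
                     : (logsock == 0 && fds[0] == 1 && fds[1] == 2);
        _exit(ok ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Both `fd_layout_demo(1)` and `fd_layout_demo(0)` should return 0. Without /dev/null on fd 0, the logging socket takes fd 0 and the pipe takes fds 1/2, so redirecting stdout onto fd 1 never touches the socket; with it, the socket sits exactly on the descriptor that later becomes stdout.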

Later, this code is executed:

pipe(fds) creates the pipe on fd=2 (read end) and fd=3 (write end). After fork(), the child runs:

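	close(fds[0]);

	/* redirect stdout */
	/* fds[1] is fd 3 here, while fd 1 still holds the syslog socket;
	 * dup2() closes that socket without syslog noticing */
	if (fds[1] != STDOUT_FILENO) {
	    dup2(fds[1], STDOUT_FILENO);
	    close(fds[1]);
	}

	/* make sure this child has a stderr */
	dup2(STDOUT_FILENO, STDERR_FILENO);

	/* syslog for safekeeping (syslog still believes its socket is fd 1) */
	syslog(LOG_DEBUG, "running program %s", RHN_CHECK);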

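syslog() fails here because the socket on fd=1 has been replaced by the write end of the pipe. syslog then closes fd=1 and creates a new socket, which lands on fd=1 again (the lowest free descriptor), so stdout now points at the syslog socket instead of the pipe, causing further trouble...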

Here are the relevant parts of the strace output:

[pid  9249] close(2)                    = 0
[pid  9249] dup2(3, 1)                  = 1
[pid  9249] close(3)                    = 0
[pid  9249] dup2(1, 2)                  = 2
[pid  9249] sendto(1, "<31>Oct 11 15:25:18 rhnsd[9249]: running program /usr/sbin/rhn_check", 68, MSG_NOSIGNAL, NULL, 0) = -1 ENOTSOCK (Socket operation on non-socket)
[pid  9249] close(1)                    = 0
[pid  9249] socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 1
[pid  9249] connect(1, {sa_family=AF_LOCAL, sun_path="/dev/log"}, 110) = 0
[pid  9249] sendto(1, "<31>Oct 11 15:25:18 rhnsd[9249]: running program /usr/sbin/rhn_check", 68, MSG_NOSIGNAL, NULL, 0) = 68
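The failure can be replayed without syslog itself. The sketch below assumes Linux; `buggy_sequence_demo` and its helpers are hypothetical names, and an unconnected AF_UNIX socket stands in for the real /dev/log connection. It recreates the layout from the report, runs the quoted child code, and then imitates what the strace shows syslog doing: the send() on fd 1 fails with ENOTSOCK, fd 1 is closed, and a new socket takes fd 1 again.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* S_ISSOCK, MSG_NOSIGNAL under strict -std flags */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int buggy_is_fifo(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 && S_ISFIFO(st.st_mode);
}

static int buggy_is_socket(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 && S_ISSOCK(st.st_mode);
}

/* Replays the pre-patch child sequence in a throwaway child process.
 * Returns 0 if every step matches the strace, the number (1-5) of the
 * first step that deviates, or -1 if fork/wait failed. */
static int buggy_sequence_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        for (int fd = 0; fd < 256; fd++)
            close(fd);
        /* Layout from the report: /dev/null on 0, stand-in logging socket on 1. */
        if (open("/dev/null", O_RDWR) != 0 || socket(AF_UNIX, SOCK_DGRAM, 0) != 1)
            _exit(1);
        int fds[2];
        if (pipe(fds) != 0 || fds[0] != 2 || fds[1] != 3)
            _exit(1);

        /* The child-side code quoted in the report. */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            dup2(fds[1], STDOUT_FILENO); /* silently closes the socket on fd 1 */
            close(fds[1]);
        }
        dup2(STDOUT_FILENO, STDERR_FILENO);

        /* fd 1 is now the pipe, so a send() meant for syslog fails. */
        if (!buggy_is_fifo(STDOUT_FILENO))
            _exit(2);
        if (send(STDOUT_FILENO, "x", 1, MSG_NOSIGNAL) != -1 || errno != ENOTSOCK)
            _exit(3);

        /* Imitate the recovery in the strace: close(1), then a new socket,
         * which takes the lowest free descriptor: fd 1 again. */
        close(STDOUT_FILENO);
        if (socket(AF_UNIX, SOCK_DGRAM, 0) != STDOUT_FILENO)
            _exit(4);

        /* stdout is a socket again; only stderr still reaches the pipe. */
        if (!buggy_is_socket(STDOUT_FILENO) || !buggy_is_fifo(STDERR_FILENO))
            _exit(5);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

`buggy_sequence_demo()` should return 0; a nonzero value names the step that deviated. The end state matches the description: stderr still reaches the pipe, but stdout is a socket.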

Later on, commands may fail because they don't expect a socket on STDOUT:

/bin/sed: couldn't close stdout: Ungültiger Dateideskriptor
warning: %post(kernel-2.6.32-642.6.1.el6.x86_64) scriptlet failed, exit status 4

("Ungültiger Dateideskriptor" is the German-locale message for "Bad file descriptor".)

Comment 1 Thomas Schweikle 2016-11-17 11:47:09 UTC
Seen with kernels from Oracle Linux (UEK, UEKR3, UEKR4) and ELRepo (unpatched www.kernel.org kernels, both kernel types: kernel-ml and kernel-lt). The scripts that update the GRUB configuration are not called. The kernels are installed, but you can't boot them without editing grub.cfg/menu.cfg by hand.

Comment 2 Thomas Schweikle 2016-12-14 08:20:10 UTC
An incomplete kernel configuration has consequences: new kernels won't boot. If a kernel update contains bug fixes, those fixes never make it into the running system, because the system won't boot the new kernel!
This makes it a really bad bug, because it leaves systems vulnerable to kernel bugs. The new kernels are installed but never booted, so the vulnerable ones stay online even after reboots (GRUB doesn't have the configuration entries to boot the newly installed kernel).

Comment 3 Thomas Schweikle 2016-12-14 08:23:41 UTC
It is not only CentOS 6/7; also affected:
- RHEL 6/7
- Oracle Linux 6/7, both standard and UEK kernels
- Fedora
- openSUSE

It seems all RPM-based distributions are affected.

Comment 4 Thomas Schweikle 2017-01-04 20:52:21 UTC
This seems to be a side effect of disabling, or not allowing, scripts to run. If that is not allowed, won't the scripts inside the RPMs execute?

Comment 5 Martin Matuska 2017-06-30 10:32:26 UTC
I can confirm this issue on our site.

Comment 6 Martin Matuska 2017-07-05 10:23:45 UTC
I have created a pull request with a possible solution:
https://github.com/spacewalkproject/spacewalk/pull/554

Comment 7 Martin Matuska 2017-07-05 10:24:46 UTC
Created attachment 1294537
Pull request 554 patch

Comment 8 Jan Dobes 2017-07-11 15:40:52 UTC
Fixed in spacewalk.git (master):

92b28b8f3bedd4f90334c8ffcf6558c905450b5d

What happens before the patch:
fd 0 (/dev/null) is opened
fd 1 (syslog) is opened
...
fd 2 (pipe[0]) is opened
fd 3 (pipe[1]) is opened
fork, then in child:
fd 2 (pipe[0]) is closed
fd 3 (pipe[1]) duplicated to fd 1 (where we want STDOUT), syslog socket is replaced
fd 3 (pipe[1]) is closed
fd 1 (STDOUT) duplicated to fd 2 (STDERR)
...
then the next call to syslog() puts the syslog socket back on fd 1, and STDOUT is lost


What happens after the patch:
fd 0 (/dev/null) is opened
fd 1 (syslog) is opened
...
fd 2 (pipe[0]) is opened
fd 3 (pipe[1]) is opened
fork, then in child:
fd 2 (pipe[0]) is closed
fd 1 (syslog) is closed
fd 3 (pipe[1]) duplicated to fd 1 (where we want STDOUT)
fd 3 (pipe[1]) is closed
fd 1 (STDOUT) duplicated to fd 2 (STDERR)
fd 3 (syslog) is opened
...
both STDOUT and syslog should be fine
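The post-patch walkthrough can be checked the same way: closing fd 1 before the dup2() leaves the new logging socket on fd 3. This is a sketch assuming Linux; `fixed_sequence_demo` is a hypothetical name, and a plain close() stands in for however the patch releases the syslog descriptor. In a real program glibc's syslog state would also have to be reset (for example with closelog()); that is an assumption here, since the patch itself is not reproduced in this report.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* S_ISSOCK under strict -std flags */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int fixed_is_fifo(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 && S_ISFIFO(st.st_mode);
}

static int fixed_is_socket(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 && S_ISSOCK(st.st_mode);
}

/* Replays the post-patch child sequence in a throwaway child process.
 * Returns 0 if stdout/stderr keep the pipe and the new logging socket
 * lands on fd 3, a step number (1-3) on deviation, -1 if fork/wait failed. */
static int fixed_sequence_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        for (int fd = 0; fd < 256; fd++)
            close(fd);
        /* Same starting layout: /dev/null on 0, stand-in logging socket on 1. */
        if (open("/dev/null", O_RDWR) != 0 || socket(AF_UNIX, SOCK_DGRAM, 0) != 1)
            _exit(1);
        int fds[2];
        if (pipe(fds) != 0 || fds[0] != 2 || fds[1] != 3)
            _exit(1);

        close(fds[0]);
        /* Extra step from the walkthrough: release the logging descriptor
         * on fd 1 first (plain close() as a stand-in). */
        close(STDOUT_FILENO);
        if (fds[1] != STDOUT_FILENO) {
            dup2(fds[1], STDOUT_FILENO); /* fd 1 is free, nothing is clobbered */
            close(fds[1]);
        }
        dup2(STDOUT_FILENO, STDERR_FILENO);

        /* The next syslog() reconnects; the lowest free descriptor is now 3. */
        if (socket(AF_UNIX, SOCK_DGRAM, 0) != 3)
            _exit(2);

        if (!fixed_is_fifo(STDOUT_FILENO) || !fixed_is_fifo(STDERR_FILENO) ||
            !fixed_is_socket(3))
            _exit(3);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

`fixed_sequence_demo()` should return 0: stdout and stderr both stay on the pipe, and logging gets its own descriptor.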

Comment 9 stevemayster 2017-07-17 13:37:44 UTC
Will this update be added to spacewalk-client.repo?
Packages in spacewalk-client-nightly.repo are not signed at all, which I believe is by design.

Comment 10 Jan Dobes 2017-07-17 16:15:43 UTC
This fix will be part of the Spacewalk 2.7 client repo release, which should be out in a few weeks.

Comment 11 Eric Herget 2017-09-27 19:24:50 UTC
Spacewalk 2.7 has been released.

https://github.com/spacewalkproject/spacewalk/wiki/ReleaseNotes27