Bug 1154518 - When the os-prober script is run (on my system) the newns program in /usr/libexec enters an infinite loop.
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: os-prober
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Hedayat Vatankhah
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-10-20 04:54 UTC by Peter Trenholme
Modified: 2014-12-23 19:50 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-23 19:50:17 UTC
Type: Bug
Embargoed:



Description Peter Trenholme 2014-10-20 04:54:07 UTC
Description of problem:
I run my rawhide system from a RAID-1 md device. Since I think the standard GRUB2 menus look nicer than the "grubby" ones, I always run grub2-mkconfig after a kernel update when my "yum update" finishes. That script calls os-prober, and the os-prober script hangs at the 'newns "$@"' command, which never returns. (/usr/libexec/newns is part of the os-prober rpm package.)

Version-Release number of selected component (if applicable):
The os-prober rpm is version 1.58=1.0.fc22.x86_64
 
How reproducible:
Every time

Steps to Reproduce:
1. Open a su terminal
2. Enter the os-prober command
3. See a few lines, and then wait "forever" (well, I waited an hour, but
   all the os-prober processes were in the "S+" state.)

Actual results:
Hung process

Expected results:
List of all installed systems

Additional info: 

Here's some terminal output:

Running os-prober:
[root ~]# os-prober 
/dev/sda1:Windows 7 (loader):Windows:chain
/dev/sda3:Windows Recovery Environment (loader):Windows1:chain
/dev/sdb1:Fedora release 20 (Heisenbug):Fedora:linux
=========================
(Note that the RAID 1 array components are reported in the "raided-map" file in /tmp, as shown below, in addition to the actual mount (/dev/md127) shown in the mounted-map file.)

Speculation: Could the newns program be trying to re-mount a mounted RAID component instead of using the already mounted array? In fact, why would os-prober need to know anything about a RAID device except its logical name? Unless it's looking for bootable OSes on unmounted RAID devices, and it's hung trying to mount the RAID array containing the working system in, e.g., /var/lib/os-prober/mount to look for the rawhide OS. (My system is all in a single ext4 file system.)

From another terminal:
[root ~]# ps -aux | egrep '(19648)|(19725)|(19726)'
root     19648  0.0  0.0 113824  3256 pts/3    S+   20:52   0:00 /bin/sh /bin/os-prober
root     19725  0.0  0.0 113824  1540 pts/3    S+   20:52   0:00 /bin/sh /bin/os-prober
root     19726  0.0  0.0 113824  2312 pts/3    S+   20:52   0:00 /bin/sh /bin/os-prober
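
For reference, "S" in the STAT column means interruptible sleep and "+" means the process is in the foreground process group, so these processes are waiting on something rather than spinning. A minimal sketch, assuming a Linux /proc file system, of reading that state directly; the helper names parse_stat_state and proc_state are illustrative and not part of os-prober:

```python
def parse_stat_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take everything after the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def proc_state(pid):
    """Read the current state letter of a running process (Linux only)."""
    with open("/proc/{}/stat".format(pid)) as fh:
        return parse_stat_state(fh.read())
```

Calling proc_state(19648) while the hang is in progress would be expected to give the same "S" that ps reports, without the "+" suffix, which ps derives separately.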

Here's the /tmp files that were generated before the hang. (They all seem correct):
[root ~]# ls /tmp/os-prober.MxQGsY/
btrfs-vols  mounted-map  raided-map  swaps-map
==========================
[root ~]# cd /tmp/os-prober.MxQGsY/
==========================
[root os-prober.MxQGsY]# cat btrfs-vols 
da1e3dad-5460-4a45-95b6-a1c3d43f760d
=========================
[root os-prober.MxQGsY]# cat mounted-map 
/dev/md127 / ext4 /dev/md127
/dev/sdb1 /Fedora ext4 /dev/sdb1
/dev/sdc1 /Backups btrfs /dev/sdc1
/dev/sda2 /Win7 fuseblk /dev/sda2
/dev/sda1 /Win7/System fuseblk /dev/sda1
/dev/sda3 /Win7/HP_Recover fuseblk /dev/sda3
========================
[root os-prober.MxQGsY]# cat raided-map 
/dev/sda5
/dev/sdb2
========================
[root os-prober.MxQGsY]# cat swaps-map 
/dev/sdc2 swap
=======================
[root ~]# kill -hup 19648 19725 19726

(The os-prober script traps the hangup and removes the /tmp files.)
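
One way to test the speculation above is to cross-check the two map files: if a RAID component listed in raided-map also showed up as a mounted device in mounted-map, something would be mounting a member directly instead of the array. The following is a rough sketch only. The field layout of mounted-map (device, mount point, file system type, device) is inferred from the listing above, not from the os-prober source, and the function names are hypothetical:

```python
def parse_mounted_map(text):
    """Map each mounted device to its mount point."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: device, mount point, fs type, device.
        if len(fields) >= 2:
            mounts[fields[0]] = fields[1]
    return mounts


def raid_members_mounted_directly(mounted_text, raided_text):
    """Return RAID member devices that also appear as mounted devices."""
    mounts = parse_mounted_map(mounted_text)
    members = [ln.strip() for ln in raided_text.splitlines() if ln.strip()]
    return [dev for dev in members if dev in mounts]


# Data copied from the listings in this report.
MOUNTED = """/dev/md127 / ext4 /dev/md127
/dev/sdb1 /Fedora ext4 /dev/sdb1
/dev/sdc1 /Backups btrfs /dev/sdc1
/dev/sda2 /Win7 fuseblk /dev/sda2
/dev/sda1 /Win7/System fuseblk /dev/sda1
/dev/sda3 /Win7/HP_Recover fuseblk /dev/sda3
"""
RAIDED = "/dev/sda5\n/dev/sdb2\n"

overlap = raid_members_mounted_directly(MOUNTED, RAIDED)
```

For the files shown here the overlap is empty: neither /dev/sda5 nor /dev/sdb2 is mounted on its own, which is consistent with the map files themselves being correct.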

Comment 1 Peter Trenholme 2014-10-20 14:25:50 UTC
After additional thought (and a night's sleep), I realized that I should try the obvious: I rebooted using 3.18.0-0.rc0.git6.1.fc22.x86_64 #1 SMP instead of the newer 3.18.0-0.rc0.git9.4.fc22.x86_64 #1 SMP version, and os-prober worked with no problem.

So, there is an incompatibility between os-prober and the git9.1 (and git9.4) versions of the 3.18 kernel.

(By the way, google-chrome also fails using the git9 kernels.)

Anyhow, I think this bug should either be closed or moved to a kernel bug. (I hadn't reported the google-chrome bug because I'm using a hacked version.)

Comment 2 Hedayat Vatankhah 2014-10-20 17:59:42 UTC
Thanks for the report. However, newns does almost nothing! It just execs the command given to it after calling "unshare(CLONE_NEWNS)". So maybe the semantics of that call have changed somehow, or it might be a kernel bug, or something else. Let's see if this problem persists in future kernel snapshots.
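
To illustrate how little is involved, here is a rough Python rendering of that behavior. It is a sketch only: the real newns is a small C program, the function name exec_in_new_mount_ns is made up for this example, and loading libc through ctypes assumes a Linux system with glibc.

```python
import ctypes
import os

CLONE_NEWNS = 0x00020000  # flag value from <linux/sched.h>


def exec_in_new_mount_ns(argv):
    """Detach into a private mount namespace, then exec argv (no return)."""
    if not argv:
        raise ValueError("need a command to run")
    libc = ctypes.CDLL(None, use_errno=True)
    # unshare(2) gives this process its own copy of the mount table.
    if libc.unshare(CLONE_NEWNS) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # Replace the current process image; only returns by raising on failure.
    os.execvp(argv[0], argv)
```

The unshare call needs root (CAP_SYS_ADMIN), so an unprivileged caller would get an OSError instead of reaching the exec. Nothing in this path loops or waits, which is why a hang at this point suggests the kernel side.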

Comment 3 Peter Trenholme 2014-10-21 20:31:43 UTC
Well, 3.18.0 rc1 was just posted, and the problem is still there. Can you (or I) bring this report to the kernel people's attention? (I don't know how to do that, but I would if I could.)

Some change made between kernel 3.18.0-0.rc0.git6.1 and 3.18.0-0.rc0.git9.1 has impacted os-prober (and google-chrome) and perhaps other applications.

As an interim measure, I think I'll post a bug against the kernel referencing this thread.

Comment 4 Hedayat Vatankhah 2014-12-21 19:17:45 UTC
Would you please try again with 3.18.1-1.fc22.x86_64 kernel? Seems to be fixed there.

Comment 5 Peter Trenholme 2014-12-23 19:19:29 UTC
O.K., no problem with 3.18.1-2.fc22.x86_64. In fact, I'd forgotten this since it disappeared after 3.18.0 rc2 (if I recollect correctly), and I thought this had been closed then.

Comment 6 Hedayat Vatankhah 2014-12-23 19:50:17 UTC
Thanks, closing then.

