Bug 998678 - top reports libnuma problem "/sys not mounted or invalid"
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: procps-ng
Version: 19
Hardware: i686
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jaromír Cápík
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-08-19 18:42 UTC by Rick
Modified: 2016-02-01 01:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-18 14:05:18 UTC
Type: Bug
Embargoed:
seppo.yli-olli: needinfo-


Description Rick 2013-08-19 18:42:26 UTC
Description of problem:

After typing "top" at the command line, the following warning is printed:

libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory


Version-Release number of selected component (if applicable):

procps-ng-3.3.8-8.fc19.i686

How reproducible:

Every time

Steps to Reproduce:
1.  Type "top" at the command line
2.  Observe the warning listed above

Actual results:

On the first screen draw, the libnuma warning overwrites part of top's output.  The warning is itself overwritten on top's next update.

Expected results:

No errors/warnings.

Additional info:

Linux steelers.net 3.10.7-200.fc19.i686.PAE #1 SMP Fri Aug 16 00:22:51 UTC 2013 i686 i686 i386 GNU/Linux

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 14
model name      : Genuine Intel(R) CPU           T2250  @ 1.73GHz
stepping        : 8
microcode       : 0x39
cpu MHz         : 1733.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fdiv_bug        : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor est tm2 xtpr pdcm dtherm
bogomips        : 3459.19
clflush size    : 64
cache_alignment : 64
address sizes   : 32 bits physical, 32 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 14
model name      : Genuine Intel(R) CPU           T2250  @ 1.73GHz
stepping        : 8
microcode       : 0x39
cpu MHz         : 1733.000
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fdiv_bug        : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor est tm2 xtpr pdcm dtherm
bogomips        : 3459.19
clflush size    : 64
cache_alignment : 64
address sizes   : 32 bits physical, 32 bits virtual
power management:

Comment 1 Jaromír Cápík 2013-08-20 15:40:00 UTC
Hello Rick.

In fact the warning is not printed by the 'top' tool. It's printed by the libnuma library. I found a similar bug reported against numactl in Fedora 11 (#499633). I'm going to switch the component to numactl and we'll see.

Regards,
Jaromir.

Comment 2 Jaromír Cápík 2013-08-20 15:52:12 UTC
I believe that libraries should avoid printing warnings and errors directly. They should return an error code to the calling function and provide functions for translating the error codes to readable strings. Alternatively, they should provide a way to suppress warnings, for example by setting some kind of 'quiet' flag.
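
The design described in this comment can be sketched roughly as follows. This is only an illustration: every name in it (topo_status, topo_strerror, topo_set_quiet, topo_check_path) is hypothetical and none of it is part of libnuma.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical status codes; not part of libnuma. */
enum topo_status {
    TOPO_OK = 0,
    TOPO_ERR_BAD_ARG,
    TOPO_ERR_NOT_FOUND
};

/* Library-wide 'quiet' flag; diagnostics are printed only while it is 0. */
static int topo_quiet = 0;

void topo_set_quiet(int on)
{
    topo_quiet = on;
}

/* Translate a status code into a readable string. */
const char *topo_strerror(enum topo_status st)
{
    switch (st) {
    case TOPO_OK:            return "success";
    case TOPO_ERR_BAD_ARG:   return "invalid argument";
    case TOPO_ERR_NOT_FOUND: return "path not found";
    }
    return "unknown error";
}

/* Check that a path exists. The result is always reported through the
 * return value; a message goes to stderr only if the quiet flag is off. */
enum topo_status topo_check_path(const char *path)
{
    enum topo_status st;

    if (path == NULL)
        st = TOPO_ERR_BAD_ARG;
    else if (access(path, F_OK) != 0)
        st = TOPO_ERR_NOT_FOUND;
    else
        st = TOPO_OK;

    if (st != TOPO_OK && !topo_quiet)
        fprintf(stderr, "topo: warning: %s\n", topo_strerror(st));
    return st;
}
```

A full-screen program such as top would call topo_set_quiet(1) once at startup and then decide for itself where, or whether, to show the text returned by topo_strerror().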

Comment 3 Rick 2013-08-20 16:38:46 UTC
Hello Jaromir,

Thank you for tracking it down.

I thought my numactl-libs version might help as well:

numactl-libs-2.0.8-4.fc19.i686

and:

# numactl -s
physcpubind: 0 1 
No NUMA support available on this system.

Regards,
Rick

Comment 4 Nuno Lopes 2013-10-17 09:28:45 UTC
I can reproduce this problem in the latest FC19.

Comment 5 Seppo Yli-Olli 2013-10-30 17:36:46 UTC
From the libnuma.c file:
/* Next two can be overwritten by the application for different error handling */
This refers to numa_error and numa_warn. The message shown in top comes from numa_warn, and it sounds like the application is expected to silence it by overriding numa_warn. That strongly suggests the bug is in top, which uses libnuma incorrectly, rather than in libnuma itself.

Comment 6 Petr Holasek 2013-10-30 22:45:55 UTC
Agreed, numa_warn and numa_error are defined with the weak attribute and can be (and should be) overridden by the calling application. I also think that redefining those two functions in top would be the proper solution.

Jaromir, do you agree?
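
A minimal sketch of such an override, for an application that is linked against libnuma in the ordinary way (-lnuma). The two signatures are written from memory of libnuma's numa.h and should be checked against the installed header; the counter and its accessor are hypothetical additions for illustration.

```c
/* Number of libnuma diagnostics swallowed so far (hypothetical bookkeeping,
 * so the application can still report them in its own way later). */
static int suppressed_diagnostics = 0;

/* Replaces libnuma's weak numa_warn(). Assumed signature:
 *   void numa_warn(int num, char *fmt, ...);
 * Nothing is printed, so a full-screen display is left untouched. */
void numa_warn(int num, char *fmt, ...)
{
    (void)num;
    (void)fmt;
    suppressed_diagnostics++;
}

/* Replaces libnuma's weak numa_error(). Assumed signature:
 *   void numa_error(char *where); */
void numa_error(char *where)
{
    (void)where;
    suppressed_diagnostics++;
}

/* Accessor for the counter above. */
int numa_suppressed_count(void)
{
    return suppressed_diagnostics;
}
```

Because the strong definitions live in the application, the linker is expected to prefer them over the library's weak ones. Whether the same happens when the library is loaded at run time is a separate question.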

Comment 7 Jaromír Cápík 2013-10-31 09:07:52 UTC
Thanks for the analysis, guys. In that case I'll ask Jim to fix that, or dig deeper myself and write a patch.
Taking the bug ...

Comment 8 Trevor Cordes 2013-11-02 08:44:59 UTC
This bug is a bit more annoying than just a visible error message.  If you run top and don't press any keys you get two (i.e. a duplicated) "KiB Mem:" lines at the top, and completely lose the swap line that should be in the place of the 2nd one.  The only way to see the swap line is to do some sort of manual refresh, like hitting spacebar.

It's because the error at the beginning is confusing curses, I do believe, from which it never recovers.

Even if no one addresses the library error output, perhaps someone can have top do a complete curses refresh after startup to make sure the swap line is shown.

Sure, it's not a big deal for experienced users, but a newbie using top may get frustrated.

Comment 9 Jaromír Cápík 2013-11-04 13:14:35 UTC
Hello Trevor.

Do you experience the issue on a 32-bit or 64-bit system? We're trying to find a way to reliably reproduce the issue.

Thanks in advance.

Regards,
Jaromir.

Comment 10 Jaromír Cápík 2013-11-04 15:48:51 UTC
Hello guys.

We call the libnuma functions via dlopen/dlsym and it seems the weak functions can't be easily overridden that way (or at least we haven't found any way to do that).

Anyway, I still believe the library shouldn't print any warnings by default and should be redesigned.

Please, reconsider that and let me know.

Regards,
Jaromir.
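
If the weak hooks cannot be replaced, one possible workaround is to point stderr at /dev/null for the duration of the library call and restore it afterwards. This is an assumption for illustration, not something proposed in this thread. In the sketch, run_with_stderr_silenced and noisy_probe are hypothetical names, and noisy_probe merely stands in for a libnuma call that prints a warning.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Run fn(arg) with stderr redirected to /dev/null, then restore stderr.
 * Returns 0 on success, -1 if stderr could not be redirected or restored.
 * fn is not called when the redirection itself fails. */
int run_with_stderr_silenced(void (*fn)(void *), void *arg)
{
    int saved, devnull, rc;

    fflush(stderr);
    saved = dup(STDERR_FILENO);          /* remember the real stderr */
    if (saved < 0)
        return -1;
    devnull = open("/dev/null", O_WRONLY);
    if (devnull < 0) {
        close(saved);
        return -1;
    }
    if (dup2(devnull, STDERR_FILENO) < 0) {
        close(devnull);
        close(saved);
        return -1;
    }
    close(devnull);

    fn(arg);                             /* anything written to stderr is dropped */

    fflush(stderr);
    rc = dup2(saved, STDERR_FILENO) < 0 ? -1 : 0;
    close(saved);
    return rc;
}

/* Hypothetical stand-in for a library call that writes a warning. */
void noisy_probe(void *arg)
{
    fprintf(stderr, "libexample: Warning: something is missing\n");
    *(int *)arg = 1;                     /* record that the probe ran */
}
```

The obvious cost is that any other diagnostics written during the call are lost as well, so the silenced region should be kept as small as possible.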

Comment 11 Trevor Cordes 2013-11-05 07:20:13 UTC
I see this bug every single time I run top on at least 2 of my systems.  They are both 32-bit.

kernel-PAE-3.11.6-200.fc19.i686
procps-ng-3.3.8-10.fc19.i686

one system's cpu is:
Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz

which obviously is SMP, but not NUMA.

Strange, but I just tried to reproduce the bad behaviour I noted in comment #8, but it's not doing it now.  I did recently yum update and reboot, so maybe something was changed?  To be clear: I do still see the original bug (error message when first starting top) every time, but I do not see the additional bug in comment #8.  I will report back if it comes back (perhaps it was a time-sensitive or terminal issue).

Comment 12 Seppo Yli-Olli 2013-11-05 19:31:30 UTC
I do not represent libnuma so I probably should not be added as needinfo. My stake here is that I did a quick analysis of this problem after bumping into it on a 32-bit Fedora 19 on a CPU without NUMA support.

Comment 13 Petr Holasek 2013-11-11 19:18:15 UTC
Hi Jaromir,

Sorry for the delay, I was on PTO last week.

I agree that the __weak__ functions were a bad design decision with respect to dlopen() use. But I don't know exactly how you would like the library to be redesigned.
The most elegant solution would be redirecting warnings to syslog, but then libnuma users would need either the right capabilities or root.
If we remove the warnings, users would have to rely on unclear errno values alone.

Please go ahead and explain your problem with dlopen() and the broken ncurses layout to the upstream developers on the linux-numa.org list. I will follow up with patches if you agree on some reasonable solution.

thanks,
Petr

Comment 14 Seppo Yli-Olli 2014-02-12 22:35:00 UTC
Has there been any update on this?
If not, Petr,
would you consider it too ugly a design to have a shim for libnuma that is exactly the same as the current library, just with the weak functions overridden to be silent, so that applications could choose which one to load? Both behaviours would be available that way.

Comment 15 Christian Kujau 2014-03-23 04:37:33 UTC
Same here on F20 (32 bit Atom CPU, procps-ng-3.3.8-16.fc20.i686):

$ top -b -n 1 > /dev/null 
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory

strace says:

2561  open("/lib/libnuma.so.1", O_RDONLY|O_CLOEXEC) = 3
[...]
2561  open("/proc/stat", O_RDONLY)      = 6
[...]
2561  open("/sys/devices/system/node/node0/cpumap", O_RDONLY) = -1 ENOENT (No such file or directory)
2561  write(2, "libnuma: Warning: ", 18) = 18
2561  write(2, "/sys not mounted or invalid. Assuming one node: No such file or directory", 73) = 73
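
The trace shows the warning being written right after the node0 lookup under /sys/devices/system/node fails. A caller that wants to avoid triggering it could check for node directories itself before touching libnuma. A minimal sketch, assuming that a missing or empty node directory means the kernel exposes no NUMA topology; count_numa_nodes is a hypothetical helper, not a libnuma function.

```c
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/* Count entries named node<N> in node_dir (normally /sys/devices/system/node).
 * Returns 0 when the directory is missing or unreadable, so the caller can
 * silently fall back to treating the machine as a single node. */
int count_numa_nodes(const char *node_dir)
{
    DIR *dir = opendir(node_dir);
    struct dirent *ent;
    int count = 0;

    if (dir == NULL)
        return 0;
    while ((ent = readdir(dir)) != NULL) {
        const char *p = ent->d_name;

        /* Accept only "node" followed by one or more digits. */
        if (strncmp(p, "node", 4) != 0 || !isdigit((unsigned char)p[4]))
            continue;
        for (p += 5; isdigit((unsigned char)*p); p++)
            ;
        if (*p == '\0')
            count++;
    }
    closedir(dir);
    return count;
}
```

With this check, count_numa_nodes("/sys/devices/system/node") returning 0 would tell the application to skip its NUMA code path entirely.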

Comment 16 Trevor Cordes 2015-01-09 08:46:46 UTC
I upgraded to F21 x86_64 (from F19 i686) and this bug has not shown up in the 5 days since.  I think this bug is fixed, at least in F21 and x86_64.

This bug still shows up on my other F19 i686 (PAE) computers with the latest (last) F19 updates.

When I upgrade my other F19 i686 computers to F20 i686 (soon!) I will report back if it happens there.

Comment 17 Fedora End Of Life 2015-01-09 22:15:39 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 18 Fedora End Of Life 2015-02-18 14:05:18 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

