RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 821090 - Spontaneous reboots occurring while testing RHEL-7.0-20120508.n.0 on x86_64.
Summary: Spontaneous reboots occurring while testing RHEL-7.0-20120508.n.0 on x86_64.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: acpid
Version: 7.0
Hardware: x86_64
OS: Linux
high
high
Target Milestone: alpha
: ---
Assignee: Jaroslav Škarvada
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-05-11 20:28 UTC by Barry Donahue
Modified: 2012-05-15 13:35 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-05-15 13:35:41 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Barry Donahue 2012-05-11 20:28:56 UTC
I'm trying to run the pNATE regression suite on RHEL 7. While pNATE is still in the process of configuring the client systems, they reboot on their own. Following are the only clues I see. This is very reproducible.

The last messages out of the console are:

May 11 15:56:58 pnate-client-01 abrtd: Got signal 15, exiting
May 11 15:56:58 pnate-client-01 acpid: exiting
May 11 15:56:58 pnate-client-01 smartd[2033]: smartd received signal 15: Terminated
May 11 15:56:58 pnate-client-01 kernel: Kernel logging (proc) stopped.
May 11 15:56:58 pnate-client-01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="2204" x-info="http://www.rsyslog.com"] exiting on signal 15.

Broadcast message from root.bos.redhat.com on pts/0 (Fri, 11 May 2012 15:56:58 -0400):

The system is going down for reboot NOW!

The only interesting things in dmesg are:

[   76.423414] ACPI Error: No handler for Region [IPMI] (ffff880220114320) [IPMI] (20120111/evregion-376)
[   76.423421] ACPI Error: Region IPMI (ID=7) has no handler (20120111/exfldio-306)
[   76.423495] ACPI Error: Method parse/execution failed [\_SB_.PMI0._GHL] (Node ffff880220126b20), AE_NOT_EXIST (20120111/psparse-536)
[   76.423817] ACPI Error: Method parse/execution failed [\_SB_.PMI0._PMC] (Node ffff880220126e00), AE_NOT_EXIST (20120111/psparse-536)
[   76.424331] ACPI Exception: AE_NOT_EXIST, Evaluating _PMC (20120111/power_meter-773)
[   76.740760] EDAC MC: Ver: 2.1.0
[   76.757410] EDAC MC0: Giving out device to 'i7core_edac.c' 'i7 core #0': DEV 0000:fe:03.0
[   76.763334] EDAC PCI0: Giving out device to module 'i7core_edac' controller 'EDAC PCI controller': DEV '0000:fe:03.0' (POLLED)
[   76.764363] 
[   76.764372] ===============================
[   76.764381] [ INFO: suspicious RCU usage. ]
[   76.764388] 3.3.0-0.12.el7.x86_64 #1 Not tainted
[   76.764395] -------------------------------
[   76.764402] arch/x86/kernel/cpu/mcheck/mce.c:194 suspicious rcu_dereference_index_check() usage!
[   76.764414] 
[   76.764414] other info that might help us debug this:
[   76.764416] 
[   76.764425] 
[   76.764426] rcu_scheduler_active = 1, debug_locks = 0
[   76.764435] 3 locks held by udevd/436:
[   76.764441]  #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff8141c0ab>] __driver_attach+0x5b/0xb0
[   76.764462]  #1:  (&__lockdep_no_validate__){......}, at: [<ffffffff8141c0b9>] __driver_attach+0x69/0xb0
[   76.764479]  #2:  (i7core_edac_lock){+.+.+.}, at: [<ffffffffa008db26>] i7core_probe+0x22/0xd8f [i7core_edac]
[   76.764498] 
[   76.764499] stack backtrace:
[   76.764507] Pid: 436, comm: udevd Not tainted 3.3.0-0.12.el7.x86_64 #1
[   76.764515] Call Trace:
[   76.764523]  [<ffffffff810cbe45>] lockdep_rcu_suspicious+0xe5/0x100
[   76.764533]  [<ffffffff810301c0>] mce_register_decode_chain+0x1c0/0x1e0
[   76.764545]  [<ffffffff814e2707>] ? dmi_walk+0x67/0xa0
[   76.764554]  [<ffffffffa008e746>] i7core_probe+0xc42/0xd8f [i7core_edac]
[   76.764567]  [<ffffffff810a5f15>] ? sched_clock_cpu+0xc5/0x120
[   76.764578]  [<ffffffff81354a5c>] local_pci_probe+0x5c/0xd0
[   76.764586]  [<ffffffff81354be1>] pci_device_probe+0x111/0x120
[   76.764595]  [<ffffffff8141bdf6>] driver_probe_device+0x96/0x2f0
[   76.764604]  [<ffffffff8141c0fb>] __driver_attach+0xab/0xb0
[   76.764613]  [<ffffffff8141c050>] ? driver_probe_device+0x2f0/0x2f0
[   76.764622]  [<ffffffff81419ff5>] bus_for_each_dev+0x55/0x90
[   76.764631]  [<ffffffffa0102000>] ? 0xffffffffa0101fff
[   76.764640]  [<ffffffff8141b8de>] driver_attach+0x1e/0x20
[   76.764648]  [<ffffffff8141b5e8>] bus_add_driver+0x1b8/0x2b0
[   76.764656]  [<ffffffffa0102000>] ? 0xffffffffa0101fff
[   76.764663]  [<ffffffff8141c8d7>] driver_register+0x77/0x160
[   76.764670]  [<ffffffffa0102000>] ? 0xffffffffa0101fff
[   76.764678]  [<ffffffff81353833>] __pci_register_driver+0x73/0xf0
[   76.764686]  [<ffffffffa0102000>] ? 0xffffffffa0101fff
[   76.764694]  [<ffffffffa010203f>] i7core_init+0x3f/0x1000 [i7core_edac]
[   76.764705]  [<ffffffff8100212a>] do_one_initcall+0x12a/0x180
[   76.764713]  [<ffffffff810dee00>] sys_init_module+0xc0/0x220
[   76.764724]  [<ffffffff8166b8e9>] system_call_fastpath+0x16/0x1b
[   76.764731] EDAC i7core: Driver loaded, 1 memory controller(s) found.


   This may or may not be related to the pNATE tests trying to start up the ntpd service to sync up the logs. It could be a pure coincidence.

# systemctl status ntpd.service
ntpd.service
	  Loaded: error (Reason: No such file or directory)
	  Active: inactive (dead)

Comment 1 Jaroslav Škarvada 2012-05-14 09:43:19 UTC
The RCU related backtrace maybe dupe of bug 789644 but AFAIK it shouldn't cause reboots.

It doesn't seem to be related to acpid component, because it seems it got SIGTERM like others due to previously initiated reboot:

> May 11 15:56:58 pnate-client-01 abrtd: Got signal 15, exiting
> May 11 15:56:58 pnate-client-01 acpid: exiting
> May 11 15:56:58 pnate-client-01 smartd[2033]: smartd received signal 15:
> Terminated

Comment 2 Jaroslav Škarvada 2012-05-14 10:11:00 UTC
Is there anything more before (maybe in /var/log/messages):
May 11 15:56:58 pnate-client-01 abrtd: Got signal 15, exiting
?

Comment 3 Barry Donahue 2012-05-14 13:41:58 UTC
These are the messages that precede the earlier messages posted.

May 11 15:49:04 pnate-client-02 systemd-logind[2046]: New session 1 of user root.
May 11 15:51:18 pnate-client-02 systemd-logind[2046]: New session 3 of user root.



May 11 15:54:59 pnate-client-02 systemd-logind[2046]: Removed session 1.
May 11 15:56:19 pnate-client-02 systemd-logind[2046]: New session 4 of user root.
May 11 15:56:30 pnate-client-02 kernel: [  732.865363] sanlun used greatest stack depth: 2944 bytes left
May 11 15:56:49 pnate-client-02 systemd-logind[2046]: New session 5 of user root.
May 11 15:56:54 pnate-client-02 systemd-logind[2046]: Removed session 5.
May 11 15:56:58 pnate-client-02 acpid: exiting

Comment 6 Jaroslav Škarvada 2012-05-14 15:06:00 UTC
The kernel patch from bug 789644 doesn't seem to be in RHEL-7 kernel. It should at least quiet the RCU backtrace.

Could you run acpid with -ld command line parameters? It would be more verbose about events.

Comment 7 Jaroslav Škarvada 2012-05-14 15:11:31 UTC
Cloned bug 789644 as bug 821464.


Note You need to log in before you can comment on or make changes to this bug.