Bug 1245181 - Sanlock fail to set scheduler to SCHED_RR
Summary: Sanlock fail to set scheduler to SCHED_RR
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: sanlock
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: David Teigland
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1243935
 
Reported: 2015-07-21 12:08 UTC by Nir Soffer
Modified: 2015-07-26 15:28 UTC
CC List: 3 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-07-26 15:28:08 UTC
Type: Bug
Embargoed:


Attachments
sanlock.log (120.14 KB, text/plain)
2015-07-21 12:10 UTC, Nir Soffer
no flags

Description Nir Soffer 2015-07-21 12:08:50 UTC
Description of problem:

When sanlock starts up, it fails in sched_setscheduler():

2015-07-13 12:03:03+0300 1622 [13612]: sanlock daemon started 3.2.2 host ade0d225-5bd3-424b-bed5-ca739c40b7dd.bamba.tlv.
2015-07-13 12:03:03+0300 1622 [13612]: set scheduler RR|RESET_ON_FORK priority 99 failed: Operation not permitted
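The failing call can be reproduced outside sanlock. The sketch below uses Python's `os.sched_setscheduler`, a thin wrapper over sched_setscheduler(2). It is an illustration, not sanlock's code, and the helper name `set_rr_reset_on_fork` is hypothetical. It requests the same policy, flag and priority the log line names and formats a failure the same way.

```python
import os
from typing import Optional


def set_rr_reset_on_fork(priority: int = 99) -> Optional[str]:
    """Try to move the calling process to SCHED_RR with SCHED_RESET_ON_FORK.

    Returns None on success, or an error line shaped like the one in
    sanlock.log on failure. (Hypothetical helper, not sanlock's code.)
    """
    # Same combination the log line names: RR policy plus reset-on-fork.
    policy = os.SCHED_RR | os.SCHED_RESET_ON_FORK
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, policy, os.sched_param(priority))
    except OSError as e:
        # An unprivileged caller (no CAP_SYS_NICE and an RLIMIT_RTPRIO
        # below `priority`) gets EPERM, "Operation not permitted".
        return (f"set scheduler RR|RESET_ON_FORK priority {priority} "
                f"failed: {os.strerror(e.errno)}")
    return None
```

Run without CAP_SYS_NICE and with the default RLIMIT_RTPRIO of 0, this returns the same `... failed: Operation not permitted` text seen above; with the capability it returns None and the calling process is left at RR 99.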

Version-Release number of selected component (if applicable):
3.2.2

How reproducible:
Always

Steps to Reproduce:
1. Start sanlock service

This failure does not happen on RHEL 7.1, Fedora 20, or Fedora 21.

This looks like a kernel issue in Fedora 22.

We suspect that not running with SCHED_RR may lead to I/O timeouts and unneeded
fencing of the SPM, which fails any operation running on the SPM.

Comment 1 Nir Soffer 2015-07-21 12:10:19 UTC
Created attachment 1054317
sanlock.log

Comment 2 David Teigland 2015-07-21 15:17:40 UTC
This seems a likely cause for the timeouts.

The wdmd daemon performs the same scheduler setup steps, so I'd expect the same errors from wdmd to be in /var/log/messages. If not, could you check whether wdmd was able to set its scheduling policy successfully?

Running:
ps ax -o pid,stat,cmd,class,rtprio | grep wdmd

Should show this:
14282 SLs  wdmd                        RR      99
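The `class` and `rtprio` columns come from sched_getscheduler(2) and sched_getparam(2), so the same check can be made programmatically. A minimal Python sketch, assuming Linux; `sched_class` is a hypothetical helper, and the abbreviations follow the CLS column of ps(1).

```python
import os
from typing import Optional, Tuple

# Abbreviations used by the CLS column of ps(1).
_CLASS_NAMES = {
    os.SCHED_OTHER: "TS",
    os.SCHED_FIFO: "FF",
    os.SCHED_RR: "RR",
    os.SCHED_BATCH: "B",
    os.SCHED_IDLE: "IDL",
}


def sched_class(pid: int) -> Tuple[str, Optional[int]]:
    """Return (class, rtprio) for `pid`, like `ps -o class,rtprio`.

    rtprio is None (shown as "-" by ps) for non-real-time policies.
    """
    # Drop the reset-on-fork bit if the kernel reports it with the policy.
    policy = os.sched_getscheduler(pid) & ~os.SCHED_RESET_ON_FORK
    name = _CLASS_NAMES.get(policy, "?")
    if policy in (os.SCHED_FIFO, os.SCHED_RR):
        return name, os.sched_getparam(pid).sched_priority
    return name, None
```

On a host where wdmd started correctly, calling `sched_class` with wdmd's pid would be expected to give `("RR", 99)`, matching the ps line above.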

Comment 3 Nir Soffer 2015-07-21 19:11:05 UTC
(In reply to David Teigland from comment #2)
> This seems a likely cause for the timeouts.
> 
> The wdmd daemon also does the same scheduler steps, so I'd expect the same
> errors from wdmd to be in /var/log/messages.  If not, could you check if
> wdmd was able to set its scheduling successfully?

I see:

# ps axf -o pid,stat,cmd,class,rtprio
  721 SLs  wdmd -G sanlock             RR      99
  723 SLsl sanlock daemon -U sanlock - RR      99
  724 S     \_ sanlock daemon -U sanlo TS       -
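In this output wdmd and the main sanlock process run RR 99 while the forked `sanlock daemon` child shows TS. That appears consistent with the RESET_ON_FORK flag named in the original error: sched(7) documents that children forked by an RR process with that flag start as SCHED_OTHER (TS). The Python sketch below illustrates the kernel behaviour only, not sanlock's code; `child_policy` and `rr_reset_on_fork_child_policy` are hypothetical helpers, and the demonstration needs CAP_SYS_NICE (or a sufficient RLIMIT_RTPRIO) to do anything.

```python
import os
from typing import Optional


def child_policy() -> int:
    """Fork a child and return the scheduling policy the child observes."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report our own policy to the parent, then exit at once.
        try:
            os.close(r)
            os.write(w, str(os.sched_getscheduler(0)).encode())
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 64)
    os.close(r)
    os.waitpid(pid, 0)
    return int(data)


def rr_reset_on_fork_child_policy(priority: int = 99) -> Optional[int]:
    """Switch to RR|RESET_ON_FORK, fork, and return the child's policy.

    Returns None if the switch is not permitted.
    """
    try:
        os.sched_setscheduler(0, os.SCHED_RR | os.SCHED_RESET_ON_FORK,
                              os.sched_param(priority))
    except PermissionError:
        return None
    try:
        return child_policy()
    finally:
        # Drop back to SCHED_OTHER but keep the reset-on-fork bit:
        # clearing it again would require CAP_SYS_NICE.
        os.sched_setscheduler(0, os.SCHED_OTHER | os.SCHED_RESET_ON_FORK,
                              os.sched_param(0))
```

When permitted, the returned value equals `os.SCHED_OTHER`, which ps reports as TS.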

I also do not see any errors after 01:30 yesterday; maybe the issue
disappeared after a reboot?

Comment 4 Nir Soffer 2015-07-21 19:18:18 UTC
I rebooted the host, and I see:

(reboot)

# sanlock.log
2015-07-21 22:10:31+0300 11 [747]: sanlock daemon started 3.2.2 host 708de246-f98b-4f9a-b9b2-de8d8a10a291.bamba.tlv.
2015-07-21 22:10:53+0300 33 [752]: cmd_add_lockspace 3,9 f4f54f47-9ccf-4978-a9a7-12a6d89bf94e:2:/rhev/data-center/mnt/multipass.eng.lab.tlv.redhat.com:_export_images_rnd_ahadas

# ps axf -o pid,stat,cmd,class,rtprio
  742 SLs  wdmd -G sanlock             RR      99
  747 SLsl sanlock daemon -U sanlock - RR      99
  748 S     \_ sanlock daemon -U sanlo TS       -

Running the test program also works now.

This seems like a transient failure that I cannot reproduce now.

How do you suggest we proceed?

Comment 5 David Teigland 2015-07-22 14:21:45 UTC
That's good and bad, I suppose. I have no idea what could have happened.

Comment 6 Nir Soffer 2015-07-26 15:28:08 UTC
Since the sched_setscheduler(2) issue disappeared, we cannot do much about
this. Closing until we have more data.

