Re: [bug] very high non-preempt latency in context_struct_compute_av()



On Monday, June 4 2007 7:27:45 am Ingo Molnar wrote:
a simple ssh login triggers a ~130 msecs non-preemptible latency even
with CONFIG_PREEMPT enabled, on a fast Core2Duo CPU (!).

the latency is caused by a _very_ long loop in the SELinux code:

sshd-4828 0.N.. 465894us : avtab_search_node (context_struct_compute_av)
sshd-4828 0.N.. 465895us : cond_compute_av (context_struct_compute_av)
sshd-4828 0.N.. 465895us : avtab_search_node (cond_compute_av)
sshd-4828 0.N.. 465895us : avtab_search_node (context_struct_compute_av)
sshd-4828 0.N.. 465896us : cond_compute_av (context_struct_compute_av)
sshd-4828 0.N.. 465896us : avtab_search_node (cond_compute_av)
sshd-4828 0.N.. 465896us : avtab_search_node (context_struct_compute_av)
sshd-4828 0.N.. 465896us : cond_compute_av (context_struct_compute_av)
sshd-4828 0.N.. 465896us : avtab_search_node (cond_compute_av)

it is triggered like this:

sshd-4828 0..s. 462986us : tasklet_action (__do_softirq)
sshd-4828 0..s. 462986us : rcu_process_callbacks (tasklet_action)
sshd-4828 0..s. 462986us : __rcu_process_callbacks (rcu_process_callbacks)
sshd-4828 0..s. 462987us : __rcu_process_callbacks (rcu_process_callbacks)
sshd-4828 0D.s. 462987us : _local_bh_enable (__do_softirq)
sshd-4828 0DN.. 462987us : idle_cpu (irq_exit)
sshd-4828 0.N.. 462988us : avtab_search_node (context_struct_compute_av)
sshd-4828 0.N.. 462989us : cond_compute_av (context_struct_compute_av)

{snip}

The distribution is Fedora 7, v2.6.21 (but also happens in recent -git)
and a simple 'ssh localhost' login is enough to trigger this. It
triggers every time and this is causing audio skipping in certain apps.
It is even visible in glxgears smoothness: a small 'bump' is visible in
the otherwise smooth rotation of glxgears. Enabling CONFIG_PREEMPT does
not fix this issue as the function runs under spinlocks. (enabling
CONFIG_PREEMPT_RT in -rt fixes the issue - but that still leaves us with
the huge 130 msecs cost of that function.)

I'm not an expert on the SELinux security server guts like the other people on
the To/CC line of this thread, but here are my two cents on the issue above.

From what I can tell, the nasty loop that is taking so long is the actual
access vector lookup, which determines whether the subject has access to the
object (i.e. can user/application X access resource Y on the system). While it
may be possible to optimize this code, I wonder if a quicker/easier solution
would be to refactor the lock. At present SELinux uses a read/write spinlock to
protect the policy stored in the kernel, with macros to take and release the
lock: POLICY_{RD,WR}LOCK and POLICY_{RD,WR}UNLOCK. From personal observation,
as well as a quick check of the code, it appears that most of the time we only
want to read-lock the policy, not write-lock it; a spinlock, even a read/write
spinlock, seems a bit expensive here.

If we were to convert from a read/write spinlock to an RCU locking mechanism,
would this solve the preemption problem (I'm not a lock expert either)? If
so, can anyone think of any reasons why converting the policy lock to RCU is
a bad idea (James, Stephen, the other James)?

--
paul moore
linux security @ hp
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/