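A few days ago a colleague asked me why, in their logs, swapper always showed up as non-preemptible (preempt_count > 0).
That does sound strange, right? If the idle task can't be preempted, doesn't that mean nobody can ever take the CPU away from it, and the CPU just sits there zoning out forever XD
So, time to trace the code!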
Once a CPU comes up, it ends up here and then runs cpu_idle_loop() forever:
```c
void cpu_startup_entry(enum cpuhp_state state)
{
	/*
	 * This #ifdef needs to die, but it's too late in the cycle to
	 * make this generic (arm and sh have never invoked the canary
	 * init for the non boot cpus!). Will be fixed in 3.11
	 */
#ifdef CONFIG_X86
	/*
	 * If we're the non-boot CPU, nothing set the stack canary up
	 * for us. The boot CPU already has it initialized but no harm
	 * in doing it again. This is a good place for updating it, as
	 * we wont ever return from this function (so the invalid
	 * canaries already on the stack wont ever trigger).
	 */
	boot_init_stack_canary();
#endif
	arch_cpu_idle_prepare();
	cpuhp_online_idle(state);
	cpu_idle_loop();
}
```
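Inside cpu_idle_loop(), the inner while loop keeps calling need_resched() to check whether a reschedule is needed.
As soon as one is, it falls out of that loop and sets the PREEMPT_NEED_RESCHED bit via preempt_set_need_resched(). As the comment in the code explains, polling idle loops don't get an IPI that would fold this state in for them.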
```c
/*
 * Generic idle loop implementation
 *
 * Called with polling cleared.
 */
static void cpu_idle_loop(void)
{
	int cpu = smp_processor_id();

	while (1) {
		/*
		 * If the arch has a polling bit, we maintain an invariant:
		 *
		 * Our polling bit is clear if we're not scheduled (i.e. if
		 * rq->curr != rq->idle). This means that, if rq->idle has
		 * the polling bit set, then setting need_resched is
		 * guaranteed to cause the cpu to reschedule.
		 */
		__current_set_polling();
		quiet_vmstat();
		tick_nohz_idle_enter();

		while (!need_resched()) {
			check_pgt_cache();
			rmb();

			if (cpu_is_offline(cpu)) {
				cpuhp_report_idle_dead();
				arch_cpu_idle_dead();
			}

			local_irq_disable();
			arch_cpu_idle_enter();

			/*
			 * In poll mode we reenable interrupts and spin.
			 *
			 * Also if we detected in the wakeup from idle
			 * path that the tick broadcast device expired
			 * for us, we don't want to go deep idle as we
			 * know that the IPI is going to arrive right
			 * away
			 */
			if (cpu_idle_force_poll || tick_check_broadcast_expired())
				cpu_idle_poll();
			else
				cpuidle_idle_call();

			arch_cpu_idle_exit();
		}

		/*
		 * Since we fell out of the loop above, we know
		 * TIF_NEED_RESCHED must be set, propagate it into
		 * PREEMPT_NEED_RESCHED.
		 *
		 * This is required because for polling idle loops we will
		 * not have had an IPI to fold the state for us.
		 */
		preempt_set_need_resched();
		tick_nohz_idle_exit();
		__current_clr_polling();

		/*
		 * We promise to call sched_ttwu_pending and reschedule
		 * if need_resched is set while polling is set. That
		 * means that clearing polling needs to be visible
		 * before doing these things.
		 */
		smp_mb__after_atomic();

		sched_ttwu_pending();
		schedule_preempt_disabled();
	}
}
```
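To convince myself that the idle task can still give up the CPU while preempt_count > 0, here is a tiny user-space model of the loop above. It's only a sketch: `struct idle_model`, `idle_loop_pass()` and the `wake_after` parameter are made-up names, not kernel APIs, and the "wakeup" is just a counter standing in for another CPU setting TIF_NEED_RESCHED. What it shows is that the inner loop polls with preemption disabled the whole time, yet the task still reaches schedule() as soon as the flag shows up, because it checks the flag itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the idle task's state.
 * None of these names are kernel APIs; they only mirror the
 * flags that cpu_idle_loop() cares about.
 */
struct idle_model {
    int  preempt_count;    /* the idle task sits at >= 1 here */
    bool need_resched;     /* stands in for TIF_NEED_RESCHED */
    int  idle_iterations;  /* passes through the inner while loop */
    int  schedule_calls;   /* voluntary hand-offs via schedule() */
};

/*
 * Models one pass of the outer while (1) in cpu_idle_loop().
 * wake_after (>= 1): after how many idle iterations a simulated
 * wakeup from another CPU sets need_resched.
 */
static void idle_loop_pass(struct idle_model *m, int wake_after)
{
    int spins = 0;

    assert(wake_after >= 1);
    while (!m->need_resched) {
        /* Preemption stays disabled while we poll the flag. */
        assert(m->preempt_count > 0);
        m->idle_iterations++;          /* cpuidle_idle_call() would go here */
        if (++spins == wake_after)
            m->need_resched = true;    /* simulated remote wakeup */
    }

    /*
     * schedule_preempt_disabled(): the task yields on its own, so no
     * preemption was needed. Picking the next task clears the flag,
     * and preempt_count is back to its old value when we return.
     */
    m->schedule_calls++;
    m->need_resched = false;
}
```

With `wake_after = 3` the model spins through three idle iterations, hands off the CPU exactly once, and comes back with preempt_count still at 1. If need_resched is already set on entry, it skips the inner loop and yields immediately.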
The last call in the loop is schedule_preempt_disabled().
It ends up in schedule() and hands the CPU over to another task; what happens after that is a whole other story.
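Then, some day down the road, when the idle task (swapper) gets the CPU back after one of those schedule() calls, the preempt_disable() right after schedule() disables preemption again!!!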
```c
void __sched schedule_preempt_disabled(void)
{
	sched_preempt_enable_no_resched();
	schedule();
	preempt_disable();
}
```
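Why `sched_preempt_enable_no_resched()` instead of a plain `preempt_enable()`? My reading, which is a guess and not something the code above states, is that a plain `preempt_enable()` may already reschedule when the count drops to 0 with a resched pending, while `schedule()` is called on the very next line anyway. The user-space model below uses made-up `model_*` names (not kernel APIs) and a heavily simplified "scheduler" that only counts how often it is entered. It shows the count going 1 to 0 to 1 around the hand-off, and the extra scheduler entry the plain-enable variant would take.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; field and function names are illustrative only. */
struct task_model {
    int  preempt_count;
    bool need_resched;
    int  scheduler_entries;  /* how many times we entered the "scheduler" */
};

static void model_schedule(struct task_model *t)
{
    t->scheduler_entries++;
    t->need_resched = false;  /* picking the next task clears the flag */
}

/* Like preempt_enable(): may reschedule when the count drops to 0. */
static void model_preempt_enable(struct task_model *t)
{
    assert(t->preempt_count > 0);
    if (--t->preempt_count == 0 && t->need_resched)
        model_schedule(t);
}

/* Like sched_preempt_enable_no_resched(): only decrements the count. */
static void model_preempt_enable_no_resched(struct task_model *t)
{
    assert(t->preempt_count > 0);
    t->preempt_count--;
}

static void model_preempt_disable(struct task_model *t)
{
    t->preempt_count++;
}

/* Same shape as schedule_preempt_disabled(). */
static void model_schedule_preempt_disabled(struct task_model *t)
{
    model_preempt_enable_no_resched(t);
    model_schedule(t);
    model_preempt_disable(t);
}

/* The same sequence with a plain preempt_enable(), for comparison. */
static void model_schedule_with_plain_enable(struct task_model *t)
{
    model_preempt_enable(t);
    model_schedule(t);
    model_preempt_disable(t);
}
```

Starting from preempt_count = 1 with a resched pending, the `_no_resched` version enters the scheduler once, while the plain-enable version enters it twice. Both return with preempt_count back at 1, which is exactly the non-preemptible state the colleague saw in the logs.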
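So it seems fair to conclude that this code keeps the idle task busy with one question: "should I give the CPU to someone else yet?" It doesn't need to be preempted to give the CPU up, because it checks need_resched() itself and calls schedule() voluntarily as soon as that flag is set.
That makes being non-preemptible seem perfectly reasonable to me: getting snatched away halfway through would be weird... wouldn't that mean the "should I give the CPU to someone else" logic above was broken? XD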