Kernel preemption and realtime linux kernel with preempt_rt, part 1

Operating System Design: Scheduling

Only one process per CPU can run at any one time, so multitasking operating systems use a concept called multiprogramming to schedule time for each process to run on a CPU. A scheduler is responsible for giving each process time on the CPU. When the current time slice expires, the scheduler suspends the current process and gives CPU time to the next one. Some scheduling schemes include:

First Come First Served, Shortest Process Next, Shortest Remaining Time, Round Robin Scheduling, Preemption, Priority Scheduling
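
Round Robin scheduling from the list above can be sketched in a few lines of shell. This is only an illustration, not how a real scheduler is implemented: the function name round_robin and the name:burst task notation are made up for this example, and "time" is just a counter.

```shell
# round_robin QUANTUM NAME:BURST...
# Hypothetical helper: simulates Round Robin over tasks given as name:burst pairs.
# Prints one line per time slice.
round_robin() {
    quantum=$1
    shift
    while [ "$#" -gt 0 ]; do
        task=$1
        shift
        name=${task%%:*}    # part before the colon
        left=${task##*:}    # remaining run time of this task
        if [ "$left" -gt "$quantum" ]; then
            echo "$name ran $quantum"
            # Time slice expired: put the task back at the tail of the queue.
            set -- "$@" "$name:$((left - quantum))"
        else
            echo "$name ran $left (done)"
        fi
    done
}
```

For example, round_robin 2 A:3 B:2 lets A run for one slice, then B to completion, then A again for its remaining time unit.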


In computing, preemption is the act of temporarily interrupting a task being carried out by a computer system, without requiring its cooperation, and with the intention of resuming the task at a later time. Such a change is known as a context switch. It is normally carried out by a privileged task or part of the system known as a preemptive scheduler, which has the power to preempt, or interrupt, and later resume, other tasks in the system.

In the Linux kernel, the scheduler is called after each timer interrupt (that is, quite a few times per second). It determines what process to run next based on a variety of factors, including priority, time already run, etc.

Pros/Cons: Making a scheduler preemptible has the advantage of better system responsiveness and scalability, but comes with the disadvantage of race conditions. (A race occurs when the correctness of the program depends on one thread reaching point x before another thread reaches point y.)

Preemptive Multitasking

The term preemptive multitasking is used to distinguish a multitasking operating system, which permits preemption of tasks, from a cooperative multitasking system wherein processes or tasks must be explicitly programmed to yield when they do not need system resources.

Although multitasking techniques were originally developed to allow multiple users to share a single machine, it soon became apparent that multitasking was useful regardless of the number of users. Many operating systems, from mainframes down to single-user personal computers and no-user control systems (like those in robotic spacecraft), have recognized the usefulness of multitasking support for a variety of reasons. Multitasking makes it possible for a single user to run multiple applications at the same time, or to run “background” processes while retaining control of the computer.


A kernel is a computer program that manages input/output requests from software, and translates them into data processing instructions for the central processing unit and other electronic components of a computer.

Monolithic kernels, which have traditionally been used by Unix-like operating systems, contain all the operating system core functions and the device drivers (small programs that allow the operating system to interact with hardware devices, such as disk drives, video cards and printers). A monolithic kernel is one single program that contains all of the code necessary to perform every kernel-related task.

The main disadvantage of monolithic kernels is the dependencies between system components: a bug in a device driver might crash the entire system.


A microkernel runs most of the operating system’s background processes in user space, to make the operating system more modular and, therefore, easier to maintain.

The main criticisms of monolithic kernels from microkernel advocates are:

  • A device driver can enter an infinite loop or other unrecoverable state, crashing the whole system
  • Some drivers and system calls on monolithic kernels are slow to execute, and can’t return control of the processor to the scheduler or other program until they complete execution.

Kernel preemption

Kernel preemption is a method used mainly in monolithic and hybrid kernels where all or most device drivers are run in kernel space, whereby the scheduler is permitted to forcibly perform a context switch (i.e. preemptively schedule; on behalf of a runnable and higher priority process) on a driver or other part of the kernel during its execution, rather than co-operatively wait for the driver or kernel function (such as a system call) to complete its execution and return control of the processor to the scheduler.


Problem in Kernel preemption: Priority Inversion

A lower-priority process effectively blocks a higher-priority one: the lower-priority process holds a lock that the higher-priority process needs, so the higher-priority process cannot run.




The solution to priority inversion is priority inheritance: temporarily raise the priority of the process that holds the lock to that of the highest-priority process waiting for it.
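
The core rule of priority inheritance can be sketched as a small shell function. This is an illustrative model only: the name effective_priority is made up, and the sketch assumes that a larger number means a higher priority.

```shell
# effective_priority OWNER_PRIO [WAITER_PRIO...]
# Hypothetical helper: prints the priority a lock owner runs at while other
# tasks wait for its lock. Assumption: larger number = higher priority.
effective_priority() {
    prio=$1
    shift
    for waiter in "$@"; do
        # The owner inherits the priority of any higher-priority waiter.
        if [ "$waiter" -gt "$prio" ]; then
            prio=$waiter
        fi
    done
    echo "$prio"
}
```

With an owner at priority 10 and waiters at 50 and 30, the owner temporarily runs at 50, so a medium-priority task can no longer delay the release of the lock.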

Linux kernel, preemption in linux kernel

Kernel mode: The Linux kernel System Call Interface, Process scheduling subsystem, IPC subsystem, Memory management subsystem, Virtual files subsystem, Network subsystem, Other components

User mode: System daemons (daemon: in multitasking computer operating systems, a daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user), Windowing system, C standard library, Other libraries.

Priority inheritance support available since Linux 2.6.18.

The Linux kernel is a preemptive operating system. When a task runs in user-space mode and gets interrupted by an interrupt, if the interrupt handler wakes up another task, this task can be scheduled as soon as we return from the interrupt handler.


However, when the interrupt comes while the task is executing a system call, this system call has to finish before another task can be scheduled. By default, the Linux kernel does not do kernel preemption.

This means that the delay before the scheduler can run another task is unbounded.


Realtime Linux kernel

Hard real-time systems: required to complete a critical task within a guaranteed amount of time.

Soft real-time systems: require that critical processes receive priority over less fortunate ones.

Real-time applications have operational deadlines between some triggering event and the application’s response to that event. To meet these operational deadlines, programmers use real-time operating systems (RTOS) on which the maximum response time can be calculated or measured reliably for the given application and environment. A typical RTOS uses priorities. The highest priority task wanting the CPU always gets the CPU within a fixed amount of time after the event waking the task has taken place.

Traditionally, the Linux kernel allows one process to preempt another only under certain circumstances:

  • When the CPU is running user-mode code
  • When kernel code returns from a system call or an interrupt back to user space
  • When kernel code blocks on a mutex, or explicitly yields control to another process

If kernel code is executing when some event takes place that requires a high priority thread to start executing, the high priority thread cannot preempt the running kernel code until the kernel code explicitly yields control. In the worst case, the latency could potentially be hundreds of milliseconds or more.

The Linux 2.6 configuration option CONFIG_PREEMPT_VOLUNTARY introduces checks to the most common causes of long latencies, so that the kernel can voluntarily yield control to a higher priority task waiting to execute. This can be helpful, but while it reduces the occurrences of long latencies (hundreds of milliseconds to potentially seconds or more), it does not eliminate them. However, unlike CONFIG_PREEMPT (discussed below), CONFIG_PREEMPT_VOLUNTARY has a much lower impact on the overall throughput of the system. (As always, there is a classical tradeoff between throughput, that is, the overall efficiency of the system, and latency. With the faster CPUs of modern-day systems, it often makes sense to trade off throughput for lower latencies, but server class systems that do not need minimum latency guarantees may very well choose to use either CONFIG_PREEMPT_VOLUNTARY, or to stick with the traditional non-preemptible kernel design.)

The 2.6 Linux kernel has an additional configuration option, CONFIG_PREEMPT, which causes all kernel code outside of spinlock-protected regions and interrupt handlers to be eligible for non-voluntary preemption by higher priority kernel threads. With this option, worst case latency drops to (around) single digit milliseconds, although some device drivers can have interrupt handlers that will introduce latency much worse than that. If a real-time Linux application requires latencies smaller than single-digit milliseconds, use of the CONFIG_PREEMPT_RT patch is highly recommended.

The RT-Preempt patch converts Linux into a fully preemptible kernel. The magic is done with:

  • Making in-kernel locking primitives (using spinlocks) preemptible through reimplementation with rtmutexes.
  • Critical sections protected by e.g. spinlock_t and rwlock_t are now preemptible. The creation of non-preemptible sections (in kernel) is still possible with raw_spinlock_t (same APIs as spinlock_t).
  • Implementing priority inheritance for in-kernel spinlocks and semaphores.
  • Converting interrupt handlers into preemptible kernel threads: the RT-Preempt patch runs soft interrupt handlers in kernel thread context, which is represented by a task_struct like a common user-space process. However, it is also possible to register an IRQ in kernel context.
  • Converting the old Linux timer API into separate infrastructures for high resolution kernel timers plus one for timeouts, leading to user space POSIX timers with high resolution.

Architectures supported by the CONFIG_PREEMPT_RT patch:

There are systems representing the x86, x86_64, ARM, MIPS, and Power architectures using the CONFIG_PREEMPT_RT patch. However, in many ways this is the wrong question. Support for real-time is not just about the instruction set architecture, but also about supporting the high resolution timer provided by the CPU and/or CPU support chipset, the device drivers for the system being well behaved, etc.

Please refer to the "platforms tested and in use with CONFIG_PREEMPT_RT" section in this wiki for a list of platforms that members of the -rt community have used successfully.

Disadvantages of preempt_rt patch

The normal Linux kernel allows preemption of a task by a higher priority task only when the user space code is getting executed.

In order to reduce the latency, the CONFIG_PREEMPT_RT patch forces the kernel to non-voluntarily preempt the task at hand on the arrival of a higher priority kernel task. This is bound to cause a reduction in the overall throughput of the system, since there will be several context switches and the lower priority tasks won’t get much of a chance to get through.

This is the current status of Realtime Linux using the Realtime Preempt patches (aka PREEMPT_RT):


1) Since kernel 2.6.24 in mainline Linux
2) Since kernel 2.6.25 in mainline Linux
3) Realtime-Preempt patches or higher required
4) Since kernel 2.6.30 in mainline Linux
5) Not yet adapted to generic interrupt code
6) Since kernel 2.6.33 in mainline Linux
7) Since kernel 2.6.39 in mainline Linux

Installation of preempt_rt patch to linux kernel and compiling new custom linux kernel


  • or via ketchup: automatically download various kernel patches and update kernels.

ketchup 3.12.6
ketchup -s 2.6-rt #find latest RT revision

  • After downloading, unpack the kernel tarball and change into the kernel source directory. Patch the kernel with patch level p1:
bzcat ../patch- | patch -p1
  • Configure the kernel and save the .config file. Check the ‘Preemption mode’ configurations:
make menuconfig


  • Activate the High-Resolution-Timer option (attention: the number of platforms supported by the HR timer is still very limited. Right now the option is only supported on x86 systems; PowerPC and ARM support are however in the queue).
  • Disable all Power Management options like ACPI or APM.
  • Further interesting options can be found under the “Kernel Hacking” menu entry. This menu lists options for system debugging and performance measurement. Keep in mind that the debug options may either increase the kernel size or cause higher latencies. If you do not want to debug the kernel or get some automatically produced histograms, make sure that you don’t activate any unnecessary options here. If you have activated any latency critical options the kernel will warn at boot time.
  • Compile & install
make
make modules
make modules_install
make install

preempt_rt patch, ‘Preemption mode’ configurations
  • No Forced Preemption (Server)  CONFIG_PREEMPT_NONE:

This is the traditional Linux preemption model, geared towards throughput. It will still provide good latencies most of the time, but there are no guarantees and occasional longer delays are possible. Select this option if you are building a kernel for a server or scientific/computation system, or if you want to maximize the raw processing power of the kernel, irrespective of scheduling latencies.

  • Voluntary Kernel Preemption (Desktop) CONFIG_PREEMPT_VOLUNTARY:

This option reduces the latency of the kernel by adding more “explicit preemption points” to the kernel code. These new preemption points have been selected to reduce the maximum latency of rescheduling, providing faster application reactions, at the cost of slightly lower throughput. This allows reaction to interactive events by allowing a low priority process to voluntarily preempt itself even if it is in kernel mode executing a system call. This allows applications to run more ‘smoothly’ even when the system is under load. Select this if you are building a kernel for a desktop system.

  • Preemptible Kernel (Low-Latency Desktop) CONFIG_PREEMPT__LL:

This option reduces the latency of the kernel by making all kernel code (that is not executing in a critical section) preemptible.  This allows reaction to interactive events by permitting a low priority process to be preempted involuntarily even if it is in kernel mode executing a system call and would otherwise not be about to reach a natural preemption point. This allows applications to run more ‘smoothly’ even when the system is under load, at the cost of slightly lower throughput and a slight runtime overhead to kernel code. Select this if you are building a kernel for a desktop or embedded system with latency requirements in the milliseconds range.

  • Preemptible Kernel (Basic RT) CONFIG_PREEMPT_RTB:  

This option is basically the same as (Low-Latency Desktop) but enables changes which are preliminary for the full preemptible RT kernel.

  • Fully Preemptible Kernel (RT) CONFIG_PREEMPT_RT_FULL:

All and everything.
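
To see which of the preemption models listed above a kernel was configured with, its .config file can be searched for the option names. The following is a minimal sketch: the function name preempt_model is made up, the option names are the ones listed above, and it assumes the usual OPTION=y line format of a kernel .config file.

```shell
# preempt_model CONFIG_FILE
# Hypothetical helper: prints the first preemption option from the list above
# that is set to "y" in the given kernel config file, or "unknown".
preempt_model() {
    for opt in CONFIG_PREEMPT_NONE CONFIG_PREEMPT_VOLUNTARY \
               CONFIG_PREEMPT__LL CONFIG_PREEMPT_RTB CONFIG_PREEMPT_RT_FULL; do
        # Match only enabled options, not "# CONFIG_... is not set" lines.
        if grep -q "^${opt}=y\$" "$1"; then
            echo "$opt"
            return 0
        fi
    done
    echo "unknown"
    return 1
}
```

It can be pointed at the .config in the kernel source directory after make menuconfig, for example preempt_model .config. Where a running system keeps its kernel config depends on the distribution.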

There have been two approaches to bringing realtime capability to the Linux kernel:
  1. PREEMPT_RT, described above.
  2. Linux (realtime) extensions: add an extra layer between the hardware and the Linux kernel, below the normal kernel, to manage real-time tasks separately. There have been three generations: RTLinux, RTAI, and Xenomai.

The content of this text is mainly compiled from various resources.

Display panel and touchscreen driver support for Pandaboard ES

The Chipsee Pandaboard expansion set comes with an ubuntu-precise kernel. Even though the modified files are already listed in the readme files, as far as I know there isn’t any vanilla kernel integration of the expansion set drivers.

To achieve that, I analyzed some Linux kernel patches* from version 3.2 through version 3.10.80, and then modified both the Chipsee-provided files and the kernel files.

These are the changed files:

+ drivers/video/omap2/displays/panel-chipsee-dpi.c
M drivers/video/omap2/displays/Kconfig
M drivers/video/omap2/displays/Makefile
M drivers/input/touchscreen/ads7846.c
M arch/arm/mach-omap2/board-omap4panda.c
M arch/arm/mach-omap2/dss-common.c

And this is the related commit:

Although it is not proper to implement platform specifics in dss-common.c, I didn’t hesitate to do so since this is experimental work.

After patching the kernel, the touchscreen should be calibrated. Tslib is the hardware handling utility, and it provides the ts_calibrate tool for that. However, before executing ts_calibrate, some environment variables should be set:

export TSLIB_CONFFILE=/etc/ts.conf
export TSLIB_FBDEVICE=/dev/fb0
export TSLIB_TSDEVICE=/dev/input/event0
export TSLIB_PLUGINDIR=/usr/lib/ts
export TSLIB_CALIBFILE=/etc/pointercal

Moreover, in the file that TSLIB_CONFFILE points to, the module_raw input line has to be uncommented.

After all of that, to run a Qt application on the Linux framebuffer, these arguments should be passed:

-platform linuxfb -plugin evdevkeyboard:/dev/input/eventX -plugin evdevmouse:/dev/input/eventY -plugin tslib:/dev/input/eventZ

The X, Y and Z numbers can be looked up in /sys/class/input.
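
Looking up the event numbers can be scripted. The sketch below assumes the usual sysfs layout, in which every eventN directory under /sys/class/input has a device/name file; the function name list_input_devices is made up for this example.

```shell
# list_input_devices [BASE_DIR]
# Hypothetical helper: prints "eventN: <device name>" for every event node
# found under BASE_DIR (default: /sys/class/input).
list_input_devices() {
    base=${1:-/sys/class/input}
    for dir in "$base"/event*; do
        # Skip entries without a readable name (also covers "no match at all").
        [ -r "$dir/device/name" ] || continue
        echo "${dir##*/}: $(cat "$dir/device/name")"
    done
}
```

Running list_input_devices on the target then shows which eventN belongs to the keyboard, the mouse and the touchscreen.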

*: Here are the kernel patches that I analyzed:

build error on bash 4.3.30

While building bash 4.3.30 with buildroot, the following errors may occur:

bashline.o: In function `bash_event_hook':
bashline.c:(.text+0x2328): undefined reference to `rl_signal_event_hook'
bashline.o: In function `bash_execute_unix_command':
bashline.c:(.text+0x2d5c): undefined reference to `rl_executing_keyseq'
bashline.o: In function `bashline_set_event_hook':
bashline.c:(.text+0x3734): undefined reference to `rl_signal_event_hook'
bashline.o: In function `bashline_reset_event_hook':
bashline.c:(.text+0x3748): undefined reference to `rl_signal_event_hook'
bashline.o: In function `initialize_readline':
bashline.c:(.text+0x4464): undefined reference to `rl_filename_stat_hook'
bashline.o: In function `attempt_shell_completion':
bashline.c:(.text+0x4b0c): undefined reference to `rl_filename_stat_hook'
bashline.o: In function `bashline_reset':
bashline.c:(.text+0x4bb8): undefined reference to `rl_filename_stat_hook'
bashline.c:(.text+0x4bc0): undefined reference to `rl_signal_event_hook'
bashline.o: In function `command_word_completion_function':
bashline.c:(.text+0x572c): undefined reference to `rl_filename_stat_hook'
collect2: error: ld returned 1 exit status
make[2]: *** [bash] Error 1

This error may depend on your external toolchain; however, the solution is as easy as making a symlink. The source of the error is the readline library: it seems that readline 6.2 & 6 (afaik) do not have the above symbols.
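
Whether a particular readline library really lacks these symbols can be checked from its symbol table. The following is a sketch under assumptions: the function name missing_readline_symbols is made up, it reads nm output on standard input, and the three symbol names are taken from the linker errors above.

```shell
# missing_readline_symbols
# Hypothetical helper: reads `nm` output on stdin and prints every symbol
# from the linker errors above that does not appear in it.
missing_readline_symbols() {
    symbols=$(cat)
    for sym in rl_signal_event_hook rl_executing_keyseq rl_filename_stat_hook; do
        # -w matches the symbol only as a whole word.
        if ! printf '%s\n' "$symbols" | grep -qw "$sym"; then
            echo "$sym"
        fi
    done
}
```

A possible invocation is nm -D --defined-only on the readline shared library in the sysroot, piped into missing_readline_symbols, using the nm that belongs to the cross toolchain. Empty output means all three symbols are defined.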

In buildroot sysroot directory,

$BUILDROOT_DIR/output/host/usr/arm-buildroot-linux-gnueabihf/sysroot/ exists in usr/lib/arm-linux-gnueabihf and linked to lib/arm-linux-gnueabihf/ which also linked to Changing this linkage to

cp usr/lib/ lib/arm-linux-gnueabihf/
cd usr/lib
ln -sf

should solve the build problem.

libcgroup cross-compile error: undefined reference to rpl_realloc

libcgroup is an abstraction layer over Linux cgroups. The standard cross-compile configuration looks sufficient:

./configure --host=arm-linux-gnueabihf

During compilation, this error happens:

make[3]: Entering directory `/home/arcelik/1511/tools/libcgroup-0.41/src/daemon'
CC cgrulesengd.o
CCLD cgrulesengd
cgrulesengd.o: In function `cgre_store_unchanged_process':
/home/arcelik/1511/tools/libcgroup-0.41/src/daemon/cgrulesengd.c:310: undefined reference to `rpl_realloc'
cgrulesengd.o: In function `cgre_store_parent_info':
/home/arcelik/1511/tools/libcgroup-0.41/src/daemon/cgrulesengd.c:223: undefined reference to `rpl_realloc'
../../src/.libs/ undefined reference to `rpl_malloc'
collect2: error: ld returned 1 exit status

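The undefined references to rpl_malloc and rpl_realloc can be fixed by explicitly telling the configure script that working malloc and realloc functions exist. When cross-compiling, the script cannot run its malloc and realloc checks on the target, so it typically assumes the worst and substitutes rpl_malloc and rpl_realloc, which nothing provides at link time.

However, the target platform has to be a glibc system to avoid runtime errors in the future.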

ac_cv_func_malloc_0_nonnull=yes ac_cv_func_realloc_0_nonnull=yes ./configure --host=arm-linux-gnueabihf

Note: an extension of a linaro toolchain 2014.09 is used while cross-compiling.

SELinux mode configuration details

Several blog posts and articles about SELinux mode configuration already exist, and they can easily be found by searching on Google: “how to enable/disable selinux, how to configure selinux”. Moreover, The SELinux Notebook 4th edition has information about SELinux modes and global configuration files, in chapters 2.15 and 3.2.1 respectively. However, I think SELinux mode configuration has some confusing aspects, and none of these resources is good enough to clear them up.

First of all, the Linux kernel has some configuration options which determine whether SELinux can be disabled or enabled. These options are:

  • CONFIG_SECURITY_SELINUX_BOOTPARAM: This option allows SELinux to be disabled at boot. If this option is selected, SELinux functionality can be disabled with selinux=0 on the kernel command line. Moreover, SELinux boot parameter default value can be changed by setting CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE to 0 or 1, respectively disabling or enabling SELinux at bootup.
  • CONFIG_SECURITY_SELINUX_DISABLE: This option enables writing to a selinuxfs node ‘disable’, which allows SELinux to be disabled at runtime prior to the policy load. That means the kernel will be “capable” of disabling SELinux at runtime. Stephen Smalley states that /sys/fs/selinux/disable can be triggered by setting SELINUX=disabled in /etc/selinux/config. However, modifications to /etc/selinux/config do not affect the system until the next boot-up. This is the first confusing thing, since the kernel help text for this option says: “SELinux will remain disabled until the next boot”. I still do not know how to disable SELinux (disabled, not permissive) at runtime with the help of this kernel configuration.
  • CONFIG_SECURITY_SELINUX_DEVELOP: With this option enabled, the kernel will start in permissive mode unless you specify enforcing=1 on the kernel command line.

So far, we know that these arguments can be passed to kernel:

  • selinux=1 /* enabled */
  • selinux=0 /* disabled */
  • enforcing=1 /* enabled & enforcing */
  • enforcing=0 /* enabled & permissive */

Additionally, from Dan Walsh’s blogpost, there is also

  • autorelabel=1

boot argument to relabel the system with exact security contexts.

Furthermore, another way to configure SELinux modes is the global configuration file: /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.

According to these options, SELinux modes can be changed. However which option overrides the other one is confusing.

Temporarily switching between permissive and enforcing

There are two ways to switch between permissive and enforcing at runtime. Note that after a reboot these settings will be overridden by the system defaults. Moreover, switching to permissive/enforcing mode is only possible if SELinux is not disabled.

  • Using selinuxfs
# switching to enforcing
echo 1 > /sys/fs/selinux/enforce
# switching to permissive
echo 0 > /sys/fs/selinux/enforce
  • Using setenforce utility
# switching to enforcing
setenforce 1    # or: setenforce Enforcing
# switching to permissive
setenforce 0    # or: setenforce Permissive
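
After switching, the current mode can be read back from the same selinuxfs node. The getenforce utility normally does this; the sketch below only illustrates the idea. The function name selinux_mode is made up, and treating a missing enforce file as "disabled" is an assumption of this sketch.

```shell
# selinux_mode [ENFORCE_FILE]
# Hypothetical helper: prints the current SELinux mode based on the selinuxfs
# enforce node (default: /sys/fs/selinux/enforce).
selinux_mode() {
    file=${1:-/sys/fs/selinux/enforce}
    if [ ! -r "$file" ]; then
        # Assumption: no enforce node means selinuxfs is absent, i.e. disabled.
        echo "disabled"
        return 0
    fi
    case $(cat "$file") in
        1) echo "enforcing" ;;
        0) echo "permissive" ;;
        *) echo "unknown" ;;
    esac
}
```

Passing the file as an argument keeps the function usable on a machine without SELinux, for example against a test file.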

Permanently switching SELinux modes

The second confusing thing about SELinux mode configuration is that even though kernel boot parameters override the config file, the exact opposite is also possible. To clarify these options, I ran some tests on my ARM platform. Note that I compiled the kernel with these configs:

  • No boot parameters and no config file: the system boots in Permissive mode. The CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE and CONFIG_SECURITY_SELINUX_DEVELOP configurations are the reason for that. In effect, selinux=1 is passed by default.
  • None or selinux=1 as boot parameter and SELINUX=enforcing in the config file: the system boots in Enforcing mode.
  • None or selinux=1 as boot parameter and SELINUX=permissive in the config file: the system boots in Permissive mode.
  • None or selinux=1 as boot parameter and SELINUX=disabled in the config file: SELinux:  Disabled at runtime.
  • If selinux=0 is passed as kernel boot parameter, SELinux will be disabled whatever is set in the config file. However, it is not recommended.
  • enforcing=0 as boot parameter and SELINUX=enforcing in the config file: the system boots in Permissive mode.
  • enforcing=0 as boot parameter and SELINUX=disabled in the config file: SELinux:  Disabled at runtime.
  • enforcing=1 as boot parameter and SELINUX=permissive in the config file: the system boots in Enforcing mode.
  • enforcing=1 as boot parameter and SELINUX=disabled in the config file: unexpectedly, SELinux:  Disabled at runtime, and a kernel panic occurs since the SELinux policy can’t be loaded. I do not know the exact reason for this error. I actually tried to relabel the filesystem before enforcing the system, so the expected result of this action was an Enforcing system boot-up.

So, the boot argument selinux=0 overrides /etc/selinux/config, whereas selinux=1 does not. Moreover, when enforcing=0 is passed, the system starts in permissive mode even though /etc/selinux/config contains SELINUX=enforcing. However, if the config file contains SELINUX=disabled, the system starts without SELinux. I think this is confusing and kind of inconsistent, but there should be a good rationale for it. Most people hardly ever get confronted with these situations.
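
The test results above can be condensed into a small lookup function. It only encodes what was observed on this one platform and kernel configuration and is not a statement about SELinux in general. The function name observed_selinux_mode is made up, and the combinations that were not tested above (enforcing=0 or enforcing=1 without a config file, enforcing=1 with SELINUX=enforcing, enforcing=0 with SELINUX=permissive) are filled in by assumption.

```shell
# observed_selinux_mode BOOT_PARAM CONFIG_VALUE
#   BOOT_PARAM:   none | selinux=1 | selinux=0 | enforcing=0 | enforcing=1
#   CONFIG_VALUE: none | enforcing | permissive | disabled
#                 (value of SELINUX= in /etc/selinux/config, "none" = no file)
# Hypothetical helper: prints the mode observed in the tests above.
observed_selinux_mode() {
    boot=$1
    conf=$2
    case "$boot" in
        selinux=0)
            # The boot parameter wins, whatever the config file says.
            echo "disabled" ;;
        enforcing=1)
            case "$conf" in
                disabled) echo "panic" ;;       # policy cannot be loaded
                *) echo "enforcing" ;;          # untested combinations assumed
            esac ;;
        enforcing=0)
            case "$conf" in
                disabled) echo "disabled" ;;
                *) echo "permissive" ;;         # untested combinations assumed
            esac ;;
        none|selinux=1)
            case "$conf" in
                none) echo "permissive" ;;      # kernel default on this build
                *) echo "$conf" ;;              # the config file decides
            esac ;;
        *)
            echo "unknown"
            return 1 ;;
    esac
}
```

For example, observed_selinux_mode enforcing=0 enforcing prints permissive, matching the sixth test case.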

Switching from disabled to enabled

As mentioned in the gentoo:selinux tutorials, if the system was booted with SELinux disabled, the filesystem needs to be relabeled before SELinux can be enabled again. After disabling SELinux, switching straight back to enforcing mode causes a kernel crash, since relabeling is not possible in enforcing mode. Therefore, switching from disabled to enabled is only possible via permissive mode. After booting in permissive mode:

fixfiles relabel

can be used to relabel filesystem. Also, kernel boot parameter


will force the system to relabel, too. In my experiments, after disabling SELinux, passing enforcing=1 as kernel parameter causes a kernel panic (as expected). Here are the logs:

[ 8.245513] SELinux: Disabled at runtime.
[ 8.474853] type=1404 audit(946686015.345:2): selinux=0 auid=4294967295 ses=4294967295
can't load SELinux Policy. Machine is in enforcing mode. Halting now.

So, what’s the conclusion?

If your intention is to disable SELinux permanently and never enable it again, passing selinux=0 as kernel boot parameter is the best option, even though it is not recommended. In this case, the kernel boot argument overrides the options in /etc/selinux/config. However, unless SELinux is intended to be disabled, the proper action is to pass selinux=1 or nothing as kernel boot parameter and to modify /etc/selinux/config according to your intention.

redirection vs pipe

Redirection (>, <) and the pipe (|) are both used to route the input or output of a process. However, there is one fundamental difference between them: redirection connects a process to a stream, typically a file, whereas a pipe passes the output of one process to another process.

For example, this command does that:

dmesg | grep selinux > temp.log
  1. executes dmesg
  2. passes the output of dmesg to the grep process
  3. executes grep on the output of dmesg, taking selinux as the pattern
  4. then redirects the output of grep to a file named temp.log, creating it if it does not exist.

In fact, the command above can also be written like this:

dmesg > dmesg.out && grep selinux < dmesg.out > temp.log

This does the same job, except that it also writes the dmesg output to a file named dmesg.out. Compared with this double redirection, using a pipe makes the job easier.
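
The equivalence of the two variants can be checked with a small script. In this sketch, a made-up produce function with invented sample lines stands in for dmesg so that it runs without special privileges, and all files go to a temporary directory.

```shell
# produce: stand-in for dmesg (hypothetical sample lines, for illustration only)
produce() {
    printf '%s\n' 'audit: selinux=0' 'usb 1-1: new device' 'selinux: policy loaded'
}

workdir=$(mktemp -d)

# Variant 1: pipe, no intermediate file.
produce | grep selinux > "$workdir/pipe.log"

# Variant 2: double redirection through an intermediate file.
produce > "$workdir/produce.out" && grep selinux < "$workdir/produce.out" > "$workdir/redirect.log"

# Both variants produce the same result file.
cmp -s "$workdir/pipe.log" "$workdir/redirect.log" && echo "identical"
```

The only lasting difference is the extra produce.out file that the second variant leaves behind.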