Operating System Design: Scheduling
Only one process per CPU can run at any one time, multitasking operating systems use a concept called multiprogramming to schedule time for each process to run on a CPU. A scheduler is responsible for giving each process time on the CPU. When the current time slice expires, the scheduler puts the current process to sleep and the next process is given CPU time. Some scheduling systems include:
First Come First Served, Shortest Process Next, Shortest Remaining Time, Round Robin Scheduling, Preemption, Priority Scheduling
In computing, preemption is the act of temporarily interrupting a task being carried out by a computer system, without requiring its cooperation, and with the intention of resuming the task at a later time. Such a change is known as a context switch. It is normally carried out by a privileged task or part of the system known as a preemptive scheduler, which has the power to preempt, or interrupt, and later resume, other tasks in the system.
In the Linux kernel, the scheduler is called after each timer interrupt (that is, quite a few times per second). It determines what process to run next based on a variety of factors, including priority, time already run, etc.
Pros/Cons: Making a scheduler preemptible has the advantage of better system responsiveness and scalability, but comes with the disadvantage of race conditions (A race occurs when correctness of the program depends on one thread reaching point x before another thread reaches point y.).
The term preemptive multitasking is used to distinguish a multitasking operating system, which permits preemption of tasks, from a cooperative multitasking system wherein processes or tasks must be explicitly programmed to yield when they do not need system resources.
Although multitasking techniques were originally developed to allow multiple users to share a single machine, it soon became apparent that multitasking was useful regardless of the number of users. Many operating systems, from mainframes down to single-user personal computers and no-user control systems (like those in robotic spacecraft), have recognized the usefulness of multitasking support for a variety of reasons. Multitasking makes it possible for a single user to run multiple applications at the same time, or to run “background” processes while retaining control of the computer.
Kernel is a computer program that manages input/output requests from software, and translates them into data processing instructions for the central processing unit and other electronic components of a computer.
Monolithic kernels, which have traditionally been used by Unix-like operating systems, contain all the operating system core functions and the device drivers (small programs that allow the operating system to interact with hardware devices, such as disk drives, video cards and printers). This is the traditional design of UNIX systems. A monolithic kernel is one single program that contains all of the code necessary to perform every kernel related task.
The main disadvantages of monolithic kernels are the dependencies between system components – a bug in a device driver might crash the entire system.
A microkernel runs most of the operating system’s background processes in user space, to make the operating system more modular and, therefore, easier to maintain.
Main criticisms of monolithic kernels from microkernel advocates, which is that;
- A device driver can enter an infinite loop or other unrecoverable state, crashing the whole system
- Some drivers and system calls on monolithic kernels are slow to execute, and can’t return control of the processor to the scheduler or other program until they complete execution.
Kernel preemption is a method used mainly in monolithic and hybrid kernels where all or most device drivers are run in kernel space, whereby the scheduler is permitted to forcibly perform a context switch (i.e. preemptively schedule; on behalf of a runnable and higher priority process) on a driver or other part of the kernel during its execution, rather than co-operatively wait for the driver or kernel function (such as a system call) to complete its execution and return control of the processor to the scheduler.
Problem in Kernel preemption: Priority Inversion
Lower-priority process effectively blocks a higher-priority one. Lower-priority process’s ownership of lock prevents higher-priority process from running.
Solution to priority inversion is Priority Inheritance: Temporarily increase process’s priority when it acquires a lock.
Linux kernel, preemption in linux kernel
Kernel mode: The Linux kernel System Call Interface, Process scheduling subsystem, IPC subsystem, Memory management subsystem, Virtual files subsystem, Network subsystem, Other components
User mode: System daemons(deamon: In multitasking computer operating systems, a daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user.), Windowing system, C standard library, Other libraries.
Priority inheritance support available since Linux 2.6.18.
Linux kernel is a preemptive operating system. When a task runs in user-space mode and gets interrupted by an interruption, if the interrupt handler wakes up another task, this task can be scheduled as soon as we return from the interrupt handler.
However, when the interrupt comes while the task is executing a system call, this system call has to finish before another task can be scheduled. By default, the Linux kernel does not do kernel preemption.
This means that the time before which the scheduler will be called to schedule another task is unbounded.
Realtime Linux kernel
Hard real-time systems: required to complete a critical task within a guaranteed amount of time.
Soft real-time systems: requires that critical processes receive priority over less fortunate ones.
Real-time applications have operational deadlines between some triggering event and the application’s response to that event. To meet these operational deadlines, programmers use real-time operating systems (RTOS) on which the maximum response time can be calculated or measured reliably for the given application and environment. A typical RTOS uses priorities. The highest priority task wanting the CPU always gets the CPU within a fixed amount of time after the event waking the task has taken place.
Traditionally, the Linux kernel will only allow one process to preempt another only under certain circumstances:
- When the CPU is running user-mode code
- When kernel code returns from a system call or an interrupt back to user space
- When kernel code code blocks on a mutex, or explicitly yields control to another process
If kernel code is executing when some event takes place that requires a high priority thread to start executing, the high priority thread can not preempt the running kernel code, until the kernel code explicitly yields control. In the worst case, the latency could potentially be hundreds milliseconds or more.
The Linux 2.6 configuration option CONFIG_PREEMPT_VOLUNTARY introduces checks to the most common causes of long latencies, so that the kernel can voluntarily yield control to a higher priority task waiting to execute. This can be helpful, but while it reduces the occurences of long latencies (hundreds of milliseconds to potentially seconds or more), it does not eliminate them. However unlike CONFIG_PREEMPT (discussed below), CONFIG_PREEMPT_VOLUNTARY has a much lower impact on the overall throughput of the system. (As always, there is a classical tradeoff between throughput — the overall efficiency of the system — and latency. With the faster CPU’s of modern-day systems, it often makes sense to trade off throughput for lower latencies, but server class systems that do not need minimum latency guarantees may very well choose to use either CONFIG_PREEMPT_VOLUNTARY, or to stick with the traditional non-preemptible kernel design.)
The 2.6 Linux kernel has an additional configuration option, CONFIG_PREEMPT, which causes all kernel code outside of spinlock-protected regions and interrupt handlers to be eligible for non-voluntary preemption by higher priority kernel threads. With this option, worst case latency drops to (around) single digit milliseconds, although some device drivers can have interrupt handlers that will introduce latency much worse than that. If a real-time Linux application requires latencies smaller than single-digit milliseconds, use of the CONFIG_PREEMPT_RT patch is highly recommended.
The RT-Preempt patch converts Linux into a fully preemptible kernel. The magic is done with:
- Making in-kernel locking-primitives (using spinlocks) preemptible though reimplementation with rtmutexes.
- Critical sections protected by i.e. spinlock_t and rwlock_t are now preemptible. The creation of non-preemptible sections (in kernel) is still possible with raw_spinlock_t (same APIs like spinlock_t).
- Implementing priority inheritance for in-kernel spinlocks and semaphores.
- Converting interrupt handlers into preemptible kernel threads: The RT-Preempt patch treats soft interrupt handlers in kernel thread context, which is represented by a task_struct like a common user space process. However it is also possible to register an IRQ in kernel context.
- Converting the old Linux timer API into separate infrastructures for high resolution kernel timers plus one for timeouts, leading to user space POSIX timers with high resolution.
Architectures, CONFIG_PREEMPT_RT patch does the support:
There are systems representing the x86, x86_64, ARM, MIPS, and Power architectures using the CONFIG_PREEMPT_RT patch. However, in many ways this is the wrong question. Support for real-time is not just about the instruction set architecture, but also about supporting the high resolution timer provided by the CPU and/or CPU support chipset, the device drivers for the system being well behaved, etc.
Please refer to platforms tested and in use with CONFIG_PREEMT_RT section in this wiki for a list of platforms that members of the -rt community have used successfully.
Disadvantages of preempt_rt patch
The normal Linux kernel allows preemption of a task by a higher priority task only when the user space code is getting executed.
In order to reduce the latency, the CONFIG_PREEMPT_RT patch forces the kernel to non-voluntarily preempt the task at hand, at the arrival of a higher proiority kernel task. This is bound to cause a reduction in the overall throughput of the system since there will be several context switches and also the lower priority tasks won’t be getting much a chance to get through.
This is the current status of Realtime Linux using the Realtime Preempt patches (aka PREEMPT_RT):
1) Since kernel 2.6.24 in mainline Linux
2) Since kernel 2.6.25 in mainline Linux
3) Realtime-Preempt patches 184.108.40.206-rt15 or higher required
4) Since kernel 2.6.30 in mainline Linux
5) Not yet adapted to generic interrupt code
6) Since kernel 2.6.33 in mainline Linux
7) Since kernel 2.6.39 in mainline Linux
Installation of preempt_rt patch to linux kernel and compiling new custom linux kernel
- Download kernel from kernel.org. Also, the realtime preemption patch can be downloaded here: http://www.kernel.org/pub/linux/kernel/projects/rt/. Make sure that the RT-Preempt version fits the kernel version you intend to use.
- or via ketchup: automatically download various kernel patches and update kernels.
ketchup 3.12.6 ketchup -s 2.6-rt #find latest RT revision
- After downloading, unpack the kernel tarball and change into the kernel source directory. Patch the kernel with patch level p1:
bzcat ../patch-220.127.116.11-rt11.bz2 | patch -p1
- Configure kernel, save .config file. Check: ‘Preemption mode’ configurations
- activated the High-Resolution-Timer Option (Attention, the amount of supported platforms by the HR timer is still very limited. Right now the option is only supported on x86 systems, PowerPC and ARM Support are however in queue.)
- disabled all Power Management Options like ACPI or APM
- Further interesting options can be found under the “Kernel Hacking” menu entry. This menu lists options for system debugging and performance measurement. Keep in mind that the debug options may either increase the kernel size or cause higher latencies. If you do not want to debug the kernel or get some automatically produced histograms, make sure that you don’t activate any unnecessary options here. If you have activated any latency critical options the kernel will warn at boot time.
- Compile & install
make make modules make modules_install make install
preempt_rt patch, ‘Preemption mode’ configurations
- No Forced Preemption (Server) CONFIG_PREEMPT_NONE:
This is the traditional Linux preemption model, geared throughput. It will still provide good latencies most of the time, but there are no guarantees and occasional longer delays are possible. Select this option if you are building a kernel for a server or scientific/computation system, or if you want to maximize the raw processing power of the kernel, irrespective of scheduling latencies.
- Voluntary Kernel Preemption (Desktop) CONFIG_PREEMPT_VOLUNTARY:
This option reduces the latency of the kernel by adding more”explicit preemption points” to the kernel code. These new preemption points have been selected to reduce the maximum latency of rescheduling, providing faster application reactions, at the cost of slightly lower throughput. This allows reaction to interactive events by allowing a low priority process to voluntarily preempt itself even if it is in kernel mode executing a system call. This allows applications to run more ‘smoothly’ even when the system is under load. Select this if you are building a kernel for a desktop system.
- Preemptible Kernel (Low-Latency Desktop) CONFIG_PREEMPT__LL:
This option reduces the latency of the kernel by making all kernel code (that is not executing in a critical section) preemptible. This allows reaction to interactive events by permitting a low priority process to be preempted involuntarily even if it is in kernel mode executing a system call and would otherwise not be about to reach a natural preemption point. This allows applications to run more ‘smoothly’ even when the system is under load, at the cost of slightly lower throughput and a slight runtime overhead to kernel code. Select this if you are building a kernel for a desktop or embedded system with latency requirements in the milliseconds range.
- Preemptible Kernel (Basic RT) CONFIG_PREEMPT_RTB:
This option is basically the same as (Low-Latency Desktop) but enables changes which are preliminary for the full preemptible RT kernel.
- Fully Preemptible Kernel (RT) CONFIG_PREEMPT_RT_FULL:
All and everything.
There have been two approaches to bring realtime capability to linux kernel:
- Mentioned PREEMPT_RT and other one is
- Linux (realtime) extensions: Add extra layer between hardware and the Linux kernel to manage real-time tasks separately. Below the normal kernel, there have been three generations: RTLinux, RTAI, and Xenomai.
The content of this text is mainly compiled from various resources.