4.7.3. Synchronization primitives, locks, and semaphores

When porting software from a single-core environment to a multi-core cluster, you might need to modify code to enforce a specific order of execution or to control concurrent access to shared resources.

The Linux kernel, like other operating systems, provides a number of different synchronization primitives for this purpose. Most such primitives are implemented using the same architectural features as application-level threading libraries like Pthreads.

Understanding which of these is best suited to a particular case can improve software performance. Serialization, and multiple threads contending for a resource, can reduce the performance benefit that multiple cores provide. In all cases, keeping critical sections as short as possible gives the best performance.


Completions

Completions are a feature provided by the Linux kernel. You can use them to serialize task execution. They are a lightweight mechanism that uses a flag to signal the completion of an event from one task to another.

The task that is waiting can sleep until it receives the signal, using wait_for_completion(struct completion *comp). The task that is sending the signal typically uses either of the following:

  • complete(struct completion *comp), which wakes up one waiting process.

  • complete_all(struct completion *comp), which wakes all processes that are waiting for the event.

Kernel version 2.6.11 added support for completions that can time out and for interruptible completions.
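The wait and signal pattern described above can be sketched in user space. The following is only an illustrative analogue built on a Pthreads mutex and condition variable, not the kernel implementation. All demo_ names are hypothetical, and the use of UINT_MAX to mark "wake everyone" is an assumption made for this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a completion. */
struct demo_completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    unsigned int    done;   /* number of pending signals */
};

void demo_init_completion(struct demo_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

/* Sleep until the event has been signalled, then consume one signal. */
void demo_wait_for_completion(struct demo_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)
        pthread_cond_wait(&c->cond, &c->lock);
    if (c->done != UINT_MAX)        /* UINT_MAX marks "complete all" */
        c->done--;
    pthread_mutex_unlock(&c->lock);
}

/* Wake one waiter. */
void demo_complete(struct demo_completion *c)
{
    pthread_mutex_lock(&c->lock);
    if (c->done != UINT_MAX)
        c->done++;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Wake every current and future waiter. */
void demo_complete_all(struct demo_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = UINT_MAX;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static struct demo_completion result_ready;
static int result;

static void *demo_producer(void *arg)
{
    (void)arg;
    result = 42;                    /* do the work ... */
    demo_complete(&result_ready);   /* ... then signal that it is done */
    return NULL;
}

/* Start a producer thread, wait for its signal, and return its result. */
int demo_completion_example(void)
{
    pthread_t tid;
    int rc, value;

    demo_init_completion(&result_ready);
    rc = pthread_create(&tid, NULL, demo_producer, NULL);
    assert(rc == 0);
    demo_wait_for_completion(&result_ready);
    value = result;   /* ordered after the write by the mutex in the completion */
    pthread_join(tid, NULL);
    return value;
}
```

The waiter sleeps inside pthread_cond_wait() rather than polling, which mirrors the sleeping behavior that the text describes for the waiting task.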


Spinlocks

A spinlock provides a simple binary locking mechanism to protect critical sections. It is implemented as a busy-wait loop. A spinlock is a generic synchronization primitive that any number of threads can use.

More than one thread might spin while trying to obtain the lock, but only one thread can hold it at a time. A thread acquires the lock with spin_lock(spinlock_t *lock) and releases it with spin_unlock(spinlock_t *lock). Waiting threads spin rather than sleep, and preemption is disabled while the lock is held.
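To illustrate the busy-wait idea, the following user-space sketch builds a spinlock from a C11 atomic_flag. It is not the kernel's spinlock_t: it does not disable preemption, and the demo_ names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space spinlock built on a single atomic flag. */
typedef struct {
    atomic_flag locked;
} demo_spinlock_t;

static inline void demo_spin_lock(demo_spinlock_t *l)
{
    /* Busy-wait until this thread is the one that sets the flag. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;
}

static inline void demo_spin_unlock(demo_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

static demo_spinlock_t counter_lock = { ATOMIC_FLAG_INIT };
static long counter;

static void *demo_spin_worker(void *arg)
{
    long iters = *(long *)arg;

    for (long i = 0; i < iters; i++) {
        demo_spin_lock(&counter_lock);
        counter++;                  /* short critical section */
        demo_spin_unlock(&counter_lock);
    }
    return NULL;
}

/* Run up to 16 threads that each increment the shared counter. */
long demo_spin_count(int nthreads, long iters)
{
    pthread_t tid[16];

    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, demo_spin_worker, &iters);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

Because waiters burn processor cycles while spinning, this approach only pays off when the critical section is very short, as it is here.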


Semaphores

Semaphores are a widely used method of controlling access to shared resources. You can also use them to serialize execution. They provide a counting locking mechanism that can cope with multiple threads attempting to take the lock.

They can be used to protect critical sections and are useful when there is no fixed latency requirement. However, where there is significant contention for a semaphore, performance is reduced. The Linux kernel provides a straightforward API, with the functions down(struct semaphore *sem) and up(struct semaphore *sem) to lower and raise the semaphore.

Unlike spinlocks, which spin in a busy-wait loop, semaphores have a queue of pending tasks. When a task tries to take a semaphore that is not available, the task yields so that other tasks can run. Semaphores can be binary (in which case they behave as mutexes) or counting.
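The counting behavior can be sketched in user space with POSIX semaphores, where sem_wait() and sem_post() play roles comparable to lowering and raising the semaphore. This is an analogue for illustration only. The demo_ names and the peak-tracking logic are assumptions added to make the effect observable.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stddef.h>

static sem_t pool;              /* counts free slots in a shared resource */
static atomic_int inside;       /* threads currently holding a slot */
static atomic_int peak;         /* highest value of 'inside' observed */

static void *demo_sem_worker(void *arg)
{
    (void)arg;
    while (sem_wait(&pool) != 0)    /* lower; sleeps if no slot is free */
        ;                           /* retry if interrupted by a signal */

    int now = atomic_fetch_add(&inside, 1) + 1;
    int old = atomic_load(&peak);
    while (now > old && !atomic_compare_exchange_weak(&peak, &old, now))
        ;                           /* record a new peak if we set one */

    atomic_fetch_sub(&inside, 1);
    sem_post(&pool);                /* raise; wakes one queued waiter */
    return NULL;
}

/* Run up to 32 threads against 'slots' permits; return peak concurrency. */
int demo_sem_peak(unsigned int slots, int nthreads)
{
    pthread_t tid[32];
    int rc;

    if (nthreads > 32)
        nthreads = 32;
    atomic_store(&inside, 0);
    atomic_store(&peak, 0);
    rc = sem_init(&pool, 0, slots);
    assert(rc == 0);
    for (int i = 0; i < nthreads; i++) {
        rc = pthread_create(&tid[i], NULL, demo_sem_worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    sem_destroy(&pool);
    return atomic_load(&peak);
}
```

With slots set to 1 the semaphore is binary and behaves like a mutex. With a larger count, up to that many threads can be inside the protected region at once.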

Lock-free synchronization

If you have multiple readers and writers to a shared resource, using a mutex might not be efficient. A mutex would prevent concurrent read access to the shared resource because only a single thread is permitted inside the critical section.

The use of lock-free data structures, such as circular buffers, can avoid the overheads associated with spinlocks or semaphores. The Linux kernel also provides the following synchronization mechanisms that are lock-free:


Read-Copy-Update (RCU)

Read-Copy-Update (RCU) can help in the case where the shared resource is mainly accessed by readers. Reader threads execute with little synchronization overhead. A thread that writes the shared resource has a much higher overhead, but is executed relatively infrequently. The writer thread must make a copy of the shared resource, and access to shared resources must be granted through pointers.

When the update is complete, the writer publishes the new data structure, so that it is visible to all readers. The original copy is preserved until the next context switch on all cores. This ensures that all current read operations can complete. RCUs are more complex to use than standard mutexes and are typically used only when traditional solutions are not suitable.
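The copy, update, and publish steps can be sketched in user space with a C11 atomic pointer. This is a deliberately simplified illustration, not the kernel's RCU. In particular, it replaces the context-switch-based grace period with a crude reader counter, which real RCU avoids because it makes readers write to shared state. It also assumes a single writer at a time. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct demo_config {
    int timeout_ms;
    int retries;
};

/* Readers reach the shared data only through this pointer. */
static _Atomic(struct demo_config *) current_cfg;

/* Simplification: counts readers that might still hold an old pointer. */
static atomic_int active_readers;

void demo_init(int timeout_ms, int retries)
{
    struct demo_config *c = malloc(sizeof *c);

    assert(c != NULL);
    c->timeout_ms = timeout_ms;
    c->retries = retries;
    atomic_store(&current_cfg, c);
}

/* Reader: no lock is taken, only the pointer is dereferenced. */
int demo_read_timeout(void)
{
    atomic_fetch_add(&active_readers, 1);       /* enter read-side section */
    struct demo_config *c = atomic_load(&current_cfg);
    int value = c->timeout_ms;
    atomic_fetch_sub(&active_readers, 1);       /* leave read-side section */
    return value;
}

/* Writer: copy, update the copy, publish, then reclaim the old version. */
void demo_update_timeout(int timeout_ms)
{
    struct demo_config *old = atomic_load(&current_cfg);
    struct demo_config *fresh = malloc(sizeof *fresh);

    assert(fresh != NULL);
    *fresh = *old;                              /* copy */
    fresh->timeout_ms = timeout_ms;             /* update the private copy */
    atomic_store(&current_cfg, fresh);          /* publish to new readers */

    /* Crude stand-in for a grace period: wait until no reader is active,
     * so that nobody can still be using the old version. */
    while (atomic_load(&active_readers) != 0)
        ;
    free(old);
}
```

Readers that started before the publish step keep using the old copy safely, and readers that start afterwards see the new one. In the kernel, the corresponding steps are expressed with rcu_read_lock(), rcu_dereference(), rcu_assign_pointer(), and synchronize_rcu().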


Seqlocks

Seqlocks provide quick access to shared resources without making readers take a lock. They are optimized for short critical sections. Readers can access the shared resource with very little overhead, but must explicitly check for a conflict with a write and retry if one occurred. Writes still require exclusive access to the shared resource.

Seqlocks were originally developed to handle things like system time, a global variable that can be read by many processes and is written frequently, but only by a timer-based interrupt. Using a seqlock instead of a mutex enables many readers to share access without locking the writer out of the critical section.
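The check-and-retry protocol for readers can be sketched with C11 atomics. This is a minimal user-space illustration that assumes a single writer, in the spirit of the timer interrupt example. It is not the kernel's seqlock_t, and the demo_ names are hypothetical. The sequence counter is odd while a write is in progress and even otherwise.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical time value protected by a sequence counter. */
struct demo_time {
    atomic_uint seq;    /* odd: write in progress, even: data is stable */
    atomic_long sec;
    atomic_long nsec;
};

void demo_time_init(struct demo_time *t)
{
    atomic_init(&t->seq, 0);
    atomic_init(&t->sec, 0);
    atomic_init(&t->nsec, 0);
}

/* Writer: must be the only writer (assumption of this sketch). */
void demo_time_write(struct demo_time *t, long sec, long nsec)
{
    assert((atomic_load(&t->seq) & 1u) == 0);   /* no write in progress */
    atomic_fetch_add(&t->seq, 1);               /* now odd: readers retry */
    atomic_store(&t->sec, sec);
    atomic_store(&t->nsec, nsec);
    atomic_fetch_add(&t->seq, 1);               /* even again: data stable */
}

/* Reader: takes no lock, but retries if a write overlapped the read. */
void demo_time_read(struct demo_time *t, long *sec, long *nsec)
{
    unsigned int start;

    do {
        start = atomic_load(&t->seq);
        *sec = atomic_load(&t->sec);
        *nsec = atomic_load(&t->nsec);
        /* Retry if a write was in progress or completed in between. */
    } while ((start & 1u) || atomic_load(&t->seq) != start);
}
```

Readers never block the writer. The cost of a conflict is paid by the reader, which repeats its read. The kernel exposes the same pattern through read_seqbegin() and read_seqretry() for readers, and write_seqlock() and write_sequnlock() for writers.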

Copyright © 2014 ARM. All rights reserved. ARM DAI0425