2.7 Timing in the Linux Kernel
In the Linux kernel, clocks "tick" slightly differently than they do in the real world. Time does not progress continuously, but in increments of 10 ms (milliseconds) each; one such increment is called a tick. This means that time virtually stands still between any two ticks. The number of ticks since the system started is recorded in a kernel variable called jiffies, which the timer interrupt increments each time it occurs. The terms ticks and jiffies are often used interchangeably.
The frequency of the timer interrupt is initialized to the value of the constant HZ (include/asm/param.h), so the jiffies variable is incremented every 1/HZ seconds (10 ms for HZ = 100). This resolution is entirely sufficient for normal applications, because a higher interrupt frequency would only mean a higher load on the system due to too many unnecessary interruptions [RuCo01]. However, there are certain situations where a high timer resolution is required, especially to measure smaller time increments or to run actions at specific points in time [WeRi00]. In networks, such requirements often arise in protocol instances, for example protocol instances that have to calculate packet run times or traffic shapers that have to measure minimum time intervals in the microsecond range.
Most of these tasks require clocks with a resolution that is at least in the microsecond range. For example, to implement a traffic shaper [Tane97], you have to calculate the number of bytes that could be sent within a specific interval. The jiffies time measurement with a resolution of 100 Hz is not suitable for this purpose: at a rate of 2 Mbits/s, an interval of 10 ms already corresponds to a packet with a length of 2500 bytes.
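The arithmetic behind this figure can be checked with a small helper. The following is only an illustrative user-space sketch; the function name bytes_per_interval() and its parameters are assumptions made for this example and are not part of the kernel.

```c
#include <stdint.h>

/*
 * Number of bytes a link with a rate of rate_bps (bits per second)
 * can carry during interval_us microseconds.
 * Hypothetical helper for illustration, not a kernel API.
 */
uint64_t bytes_per_interval(uint64_t rate_bps, uint64_t interval_us)
{
    /* bits = rate * interval; then microseconds -> seconds, bits -> bytes */
    return rate_bps * interval_us / 1000000ULL / 8ULL;
}
```

For a 2-Mbit/s link and one 10-ms tick, bytes_per_interval(2000000, 10000) yields 2500, whereas the same link carries less than one byte within a single microsecond.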
To avoid this problem, most modern processors (Pentium, Alpha, etc.) have registers that count processor clock cycles. They have been added to those processors mainly to allow system performance measurements and less for traffic shaping in networks. But, since they are present, they are widely used. In the Pentium processor and its successors (and most of its clones), this is a 64-bit-wide TSC (Time Stamp Counter) register; its content is incremented by one on each processor clock cycle. The content of this register shows the number of clock cycles elapsed since system start.
The TSC register is actually nothing more than a hardware variant of jiffies, except that its resolution is higher by a factor of between 10^6 and 10^8. This means, for example, that you can measure intervals with an accuracy of 0.001 µs (1 ns) in a Pentium processor with a clock rate of 1 GHz.
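The relation between the processor clock and the tick rate can be made concrete with two small conversions. This is a user-space sketch under stated assumptions: the function names are hypothetical, and the clock rate is passed in as a parameter instead of being read from the hardware.

```c
#include <stdint.h>

/* How many TSC increments fall into one jiffies tick (hypothetical helper). */
uint64_t tsc_cycles_per_tick(uint64_t cpu_hz, uint64_t hz)
{
    return cpu_hz / hz;
}

/*
 * Convert a number of clock cycles to nanoseconds.
 * Whole seconds and the remainder are converted separately so that
 * the multiplication does not overflow for large cycle counts.
 */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t cpu_hz)
{
    uint64_t seconds = cycles / cpu_hz;
    uint64_t rest    = cycles % cpu_hz;

    return seconds * 1000000000ULL + rest * 1000000000ULL / cpu_hz;
}
```

With a 1-GHz processor clock and HZ = 100, tsc_cycles_per_tick(1000000000, 100) gives ten million TSC increments per tick, and a single cycle corresponds to one nanosecond.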
Nevertheless, there is a certain inaccuracy when measuring with the TSC register, because it takes a few clock cycles (approx. ten) to read the register. The reason is the main memory access that occurs after the register value has been read. This access runs only at the bus frequency, which is a fraction of the CPU frequency. In addition, effects in the first-level and second-level caches can easily distort measurements. However, the error introduced by reading the TSC register is negligible for normal measurements, because most of them cover only relatively long intervals (in the 1-ms range). The function get_cycles() (defined in <asm/timex.h>) can be used to read the content of the TSC register.
2.7.1 Standard Timers
In addition to measuring intervals in the microsecond range, we also need a way to run a function at a specific point in time to implement a traffic shaper [WeRi00], which sends packets at specific points in time. The resolution of such a timer should be at least in the 100-µs range. However, a PC has only one timer component, so this is the only one available. As described above, its interrupt is triggered HZ times per second. In addition to updating jiffies, Linux uses the timer interrupt to run functions at specific points in time (i.e., the timer handlers).
A timer queue can be used when a kernel function should run at a specific point in time (e.g., switching off the floppy motor). At each occurrence of a timer interrupt, the timer interrupt routine updates the jiffies variable and also checks the timer queue for timer handling routines that may be due. Each timer_list structure within the timer queue stands for one function (timer handling routine), which is to run at a specific point in time (expires). The exact processing of the timer interrupt and the subsequent checking of the timer queue are described in [RuCo01].
The following functions are available to manage the timer queue:
add_timer() adds a timer_list structure to the timer queue according to the time specified by expires. A timer_list structure represents a timer handling routine (i.e., a function to be executed). The kernel runs this function at the specified time.
Note, however, that the timer interrupt is triggered only HZ times per second. This means that the function can run only when jiffies reaches the value of expires. Therefore, there is a small difference between the time t when the function should theoretically run and the next possible value of jiffies. This difference can be as large as 1/HZ seconds, i.e., 10 ms. But 10 ms is too long to allow reasonable traffic shaping.
del_timer() deletes a timer_list structure from the timer queue. The corresponding function will then no longer run.
init_timer() initializes a timer_list structure. This function should always be called when a timer_list structure was created.
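The delay caused by the tick granularity of the standard timer can be illustrated with a simplified user-space model. The sketch assumes HZ = 100 (one tick every 10 ms) and a handler that runs at the first tick at or after the desired time; the names TICK_US, next_tick_us(), and tick_delay_us() are hypothetical and exist only in this example.

```c
#include <stdint.h>

#define TICK_US 10000ULL  /* assumed tick length: 10 ms, i.e., HZ = 100 */

/* First tick boundary (in microseconds since start) at or after t_us. */
uint64_t next_tick_us(uint64_t t_us)
{
    return (t_us + TICK_US - 1) / TICK_US * TICK_US;
}

/* Delay between the desired time t_us and the tick at which the handler can run. */
uint64_t tick_delay_us(uint64_t t_us)
{
    return next_tick_us(t_us) - t_us;
}
```

A handler that should run 1 µs after a tick boundary is thus delayed by almost a full tick: tick_delay_us(10001) returns 9999.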
2.7.2 Using the APIC for High-resolution Timers
The current Linux kernel does not support any freely programmable timers with an accuracy in the microsecond range. As explained above, such high-resolution timers are required to support various functionalities (e.g., traffic shaping in high-speed networks and synchronizing the playback of multimedia content), and additional uses are conceivable. The underlying problem is that modern processors become ever faster, while the accuracy of timers remains at the state of the eighties for downward compatibility with vintage PCs.
There are two basic usages for high-resolution timers:
Periodic shot: A timer with a specific interval is initialized and then periodically triggers an interrupt when this interval expires. This corresponds to the behavior of the timer interrupt in the Linux kernel, which always triggers an interrupt after 10 ms.
This type of timer is suitable for all scenarios where actions have to run frequently and normally after fixed intervals. If the accuracy of these intervals is within the range of milliseconds, then the standard timers described in the previous sections can be used.
One shot: Exactly one action needs to run at a specific time, regardless of other events. Examples of such an action are sending a packet at a precalculated time or displaying a frame of a video.
Until recently, one-shot and periodic-shot timers had been available only on the basis of timer interrupts, offering an accuracy of not more than 1/HZ seconds. The timer functionality introduced next is based on the APIC component (in short APIC timer) to avoid the problems outlined above. The UKA-APIC timer was developed at the Institute for Telematics at the University of Karlsruhe, Germany, and can be downloaded from [ObWe01].
Technical Basis of the APIC Timer
Intel's x86 processor family originally used the PIC 8259A Programmable Interrupt Controller to manage triggered interrupts. It has been in use since the first personal computer at the beginning of the eighties and has fulfilled its tasks without problems. However, multiprocessor capability requires triggered interrupts to be distributed among the processors of an SMP computer. For this reason, Intel introduced the so-called APIC (Advanced Programmable Interrupt Controller). More specifically, there are the following two different chips, as shown in Figure 2-9:
The local APIC has been integrated in all Pentium processors (since Pentium P54C), and cooperates with the I/O APIC described below in multiprocessor systems. In addition to communicating with the I/O APIC and handling of incoming interrupts, a local APIC offers interesting possibilities, so it will be described here in more detail. Each local APIC has several 32-bit registers, an internal clock, an internal timer, 240 interrupt vectors, and two additional interrupt lines that can be used for interrupts generated locally.
The I/O APIC is a separate component, collecting external interrupts and distributing them to the set of processors of a system. An I/O APIC is generally present only in multiprocessor systems; such systems may use more than one I/O APIC, which Linux has supported since Version 2.4 [BoCe00]. The I/O APIC connects to the local APIC components of each installed processor over an Interrupt Controller Communication (ICC) bus.
Figure 2-9. Use of an Advanced Programmable Interrupt Controller (APIC) in multiprocessor systems.
The internal timer of a local APIC is the most interesting part for the tasks discussed in this section. The internal timer works in bus-clock accuracy and can be initialized to a specific value. Subsequently, the value of the timer is decremented at each bus clock, and an interrupt is triggered when zero is reached. This means that the internal timer of the APIC component can be used to implement a high-resolution timer with almost bus-clock accuracy.
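The start value for such a countdown can be estimated from the desired delay and the bus clock. The following user-space sketch is only an approximation under stated assumptions: the function name apic_initial_count() is hypothetical, the bus clock is passed in as a parameter, and any clock divider configured in the APIC is ignored. The result is limited to 32 bits because the registers of the local APIC are 32 bits wide.

```c
#include <stdint.h>

/*
 * Number of bus clock ticks that correspond to delay_ns nanoseconds
 * at a bus clock of bus_hz (rounded down, saturated at 32 bits).
 * Hypothetical helper; assumes delay_ns * bus_hz fits into 64 bits.
 */
uint32_t apic_initial_count(uint64_t delay_ns, uint64_t bus_hz)
{
    uint64_t ticks = delay_ns * bus_hz / 1000000000ULL;

    return ticks > UINT32_MAX ? UINT32_MAX : (uint32_t)ticks;
}
```

At a bus clock of 100 MHz, one tick lasts 10 ns, so a delay of 2 µs corresponds to a start value of 200.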
In contrast to multiprocessor systems, single-processor systems do not include I/O APIC components, and in most operating systems their local APICs are not activated when the system starts. In older P5 processors, you could activate the local APIC component only when the system started, and hardware manipulation was the only way to initialize it again. Since the P6 processor generation (Pentium Pro and successors), you can also activate the integrated local APIC during operation by use of software commands. This means that it can be used to implement a high-resolution timer.
Functionality of the UKA-APIC Timer
We emphasize here once more that the local APIC can be used for a freely programmable timer only in single-processor computers, because the timer of the local APIC in multiprocessor systems is used for interprocessor synchronization.
Some versions of the Linux kernel 2.3 allowed you to reactivate the local APIC component over a module. Unfortunately, this module is no longer present in the 2.4 versions. However, there is a patch [Pett01] you can use to activate the local APIC in single-processor systems at runtime. Based on an activated local APIC and its integrated timer, a high-resolution timer support was developed, featuring a programming interface similar to that of the standard timer of the Linux kernel [ObWe01, WeRi00].
The APIC timer consists of a patch, which integrates the required interfaces into the kernel, and of a kernel module that manages the timers. One of the goals set when developing the APIC timer was to pack as much functionality as possible into one module to keep understanding and maintenance simple. Unfortunately, there is no way around changes to the kernel for two reasons: First, you have to activate the APIC component; second, there is no interface to register an interrupt handling routine for the APIC timer interrupt; request_irq() does not help either. For this reason, the APIC timer handling routine, smp_apic_timer_interrupt(), normally used in an SMP configuration, is replaced by another one, which allows the APIC timer to be used as a freely programmable timer (set_apic_timer_up_handler()). This method can be used only to set a new handling routine for the APIC timer interrupt.
The UKA APIC Timer Module
The UKA APIC Timer module offers the interface required to register individual handling routines. The module consists mainly of management functions for the timer and methods to achieve as high a timer accuracy as possible.
Registered handling routines are managed in a linked list, similar to the management of the standard timer of the Linux kernel. The individual elements are structured as follows:
struct apic_timer_list {
    struct apic_timer_list *next, *prev;
    unsigned long long expires;
    unsigned long data;
    void (*function)(unsigned long long, unsigned long);
};
next and prev are used to link the apic_timer_list entries.
The variable expires contains a value for the timestamp counter register, which specifies the time when the handling routine should run. Note that the TSC register operates with the processor clock and not with the bus clock (like the local APIC). The linked list is ordered by trigger points (expires) for performance reasons.
data is a value that can hold a pointer to private data of the handling routine. This can be useful for reentrant functions to point to a specific instance.
function is the function pointer pointing to the handling routine to be executed. function() is called as soon as the time specified by expires is reached. The parameter data is also passed at this point in time.
The UKA-APIC timer module offers the following interface to the outside. The header file uka_apic_timer.h should be included to use this interface. To make things simpler for the programmer, the structure of the UKA-APIC timer interface is almost identical to the interface of the standard kernel timer:
init_apic_timer(struct apic_timer_list *timer) initializes the passed structure of type apic_timer_list. Currently, only pointers for the linking are set to NULL.
add_apic_timer(struct apic_timer_list *timer) registers a structure of type apic_timer_list and adds it to the linked list of the registered timers. The handling method timer->function() runs when the timer reaches timer->expires.
del_apic_timer(struct apic_timer_list *timer) removes an apic_timer_list structure from the list of registered timers. This means that the handling routine will no longer run when the timer reaches expires.
mod_apic_timer(struct apic_timer_list *timer, unsigned long long expires) modifies the time when a registered timer should run. This change can mean that the apic_timer_list structure may have to be put in another place within the list.
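How a list ordered by expires can be maintained is sketched below as a user-space model. It is not the code of the UKA-APIC timer module: the names model_timer, model_queue, model_init(), model_add(), model_del(), and model_mod() are hypothetical, and the model covers only the linking, not the programming of the APIC or the calling of handlers.

```c
#include <stddef.h>

/* Simplified timer entry, modeled on the fields next, prev, and expires. */
struct model_timer {
    struct model_timer *next, *prev;
    unsigned long long expires;
};

/* The queue is a doubly linked list whose head expires first. */
struct model_queue {
    struct model_timer *head;
};

void model_init(struct model_timer *t)
{
    t->next = t->prev = NULL;
}

/* Insert t so that the list stays sorted by ascending expires. */
void model_add(struct model_queue *q, struct model_timer *t)
{
    struct model_timer *cur = q->head, *last = NULL;

    /* Skip all entries that expire no later than t (ties keep their order). */
    while (cur && cur->expires <= t->expires) {
        last = cur;
        cur = cur->next;
    }
    t->prev = last;
    t->next = cur;
    if (cur)
        cur->prev = t;
    if (last)
        last->next = t;
    else
        q->head = t;
}

/* Unlink t from the list; does nothing if t is not linked. */
void model_del(struct model_queue *q, struct model_timer *t)
{
    if (t->prev)
        t->prev->next = t->next;
    else if (q->head == t)
        q->head = t->next;
    else
        return;
    if (t->next)
        t->next->prev = t->prev;
    t->next = t->prev = NULL;
}

/* Change the expiry time; the entry may move to another position. */
void model_mod(struct model_queue *q, struct model_timer *t,
               unsigned long long expires)
{
    model_del(q, t);
    t->expires = expires;
    model_add(q, t);
}
```

Keeping the list sorted means that only the head has to be inspected to find the next timer that is due; the price is a linear search on each insertion.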
The following code fragment is a simple example to show how you can use the UKA-APIC timer. The following steps are required to register the handling routine test_timer_handler() so that it will run two microseconds later:
#include <asm/timex.h>          /* get_cycles() */
#include "uka_apic_timer.h"     /* UKA-APIC timer interface */

#define SYS_CLOCK 500000000ULL  /* processor clock: 500 MHz */

static struct apic_timer_list test_timer;
unsigned long long timestamp;
static struct egal_daten data1;

void test_timer_handler(unsigned long long exp, unsigned long data)
{
    /* Do here what you think you have to do,
     * e.g., use hard_start_xmit to send a packet */
}

/* ... This is a routine, in which the timer is activated ... */

    /* Initialize the apic_timer_list structure */
    init_apic_timer(&test_timer);
    /* Read the current time (status of the TSC register) */
    timestamp = get_cycles();
    /* Set the values... */
    test_timer.function = test_timer_handler;
    /* Two microseconds in processor clock cycles; divide first, because
     * the integer expression 2 / 1000000 would evaluate to 0 */
    test_timer.expires = timestamp + (SYS_CLOCK / 1000000) * 2;
    test_timer.data = (unsigned long) &data1;
    /* Register the timer */
    add_apic_timer(&test_timer);
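The conversion from a delay in microseconds to a number of TSC cycles deserves some care. In C, an expression such as 2 / 1000000 is evaluated in integer arithmetic and yields 0, so the order of the operations matters. The following user-space sketch shows one way to do the conversion; the function name us_to_cycles() is hypothetical and not part of the UKA-APIC timer interface.

```c
/*
 * Convert a delay of us microseconds to clock cycles at clock_hz.
 * The clock rate is split into whole MHz and the remainder, so that
 * no fraction is lost to integer division and the intermediate
 * products stay small.
 */
unsigned long long us_to_cycles(unsigned long long clock_hz,
                                unsigned long long us)
{
    unsigned long long mhz  = clock_hz / 1000000ULL;
    unsigned long long rest = clock_hz % 1000000ULL;

    return mhz * us + rest * us / 1000000ULL;
}
```

For the 500-MHz clock assumed in the example, us_to_cycles(500000000ULL, 2) yields 1000 cycles, which is the offset to add to the current TSC value.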