
24.5 Timer Management In TCP

In closing, this section briefly discusses how timers are managed in TCP. Timers are used at several points within the TCP protocol to control retransmissions and to limit the time spent waiting for missing packets.

24.5.1 Data Structures

struct timer_list

include/linux/timer.h


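struct timer_list {
        struct list_head list;              /* links the timer into the kernel's timer lists */
        unsigned long expires;              /* expiry time, in jiffies */
        unsigned long data;                 /* argument passed to function */
        void (*function) (unsigned long);   /* behavior function invoked on expiry */
        volatile int running;               /* set while the behavior function is running */
};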

The basis for each timer is the jiffies variable. As described in Chapter 2, it is updated by Linux every 10 ms.

A timer structure includes a function pointer that is set to a behavior function when the timer is initialized. This function is invoked when the timer expires. The expiry time is given by the expires field, which holds an absolute time in jiffies, computed as the current time (jiffies) plus an offset (also in jiffies). The behavior function is invoked when jiffies reaches this value.
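The relationship between jiffies, the expires field, and the behavior function can be illustrated with a small userspace sketch. It is not kernel code: the names mini_timer, mini_timer_set(), mini_timer_tick(), and record_expiry() are hypothetical, and the jiffies variable here is a plain counter that stands in for the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's jiffies counter. */
static unsigned long jiffies = 0;

/* Simplified timer holding only the fields discussed in the text. */
struct mini_timer {
    unsigned long expires;              /* absolute expiry time in jiffies */
    unsigned long data;                 /* argument passed to function */
    void (*function)(unsigned long);    /* behavior function */
    int pending;                        /* 1 while the timer is armed */
};

/* Example behavior function: remembers the data value it was called with. */
static unsigned long last_fired = 0;

static void record_expiry(unsigned long data)
{
    last_fired = data;
}

/* Arm the timer so that it expires 'offset' ticks from now. */
static void mini_timer_set(struct mini_timer *t, unsigned long offset)
{
    assert(t->function != NULL);
    t->expires = jiffies + offset;
    t->pending = 1;
}

/* Advance the clock by one tick and run the timer if it is due.
 * Returns 1 if the behavior function was invoked, 0 otherwise.
 * The signed difference keeps the comparison correct when the
 * counter wraps around. */
static int mini_timer_tick(struct mini_timer *t)
{
    jiffies++;
    if (t->pending && (long)(jiffies - t->expires) >= 0) {
        t->pending = 0;
        t->function(t->data);
        return 1;
    }
    return 0;
}
```

A timer armed with mini_timer_set(&t, 3) does nothing on the first two calls to mini_timer_tick() and invokes its behavior function on the third.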

TCP maintains seven timers for each connection:

  • SYNACK timer This timer is used when a TCP instance changes its state from LISTEN to SYN_RECV. The TCP instance of the server initially waits three seconds for an ACK. If no ACK arrives within this time, then the connection request is considered outdated.

  • Retransmit timer Because the TCP protocol uses only positive acknowledgements, the sending TCP instance has to detect for itself whether a segment was lost. It does this by means of the retransmit timer; its expiry indicates that a segment may have been lost and triggers a retransmission.

    The exponential backoff method assumes that retransmissions are caused by congestion. Each time a segment is retransmitted, the timer value is increased exponentially, so that the sender waits longer before it assumes another loss.

    The retransmit timer determines when packets have to be retransmitted during the data transmission phase. Its value depends on the round-trip time and normally lies in the range from 200 ms to two minutes.

    This timer is also used during the establishment of a connection, where it is initialized to three seconds. Upon expiry, the backoff mechanism is applied up to five times.

  • Delayed ACK timer This timer delays the transmission of ACK packets. The value is smaller than 200 ms.

  • Keepalive timer This timer is used to test whether a connection is still up. By default, it fires for the first time after the connection has been idle for two hours. Subsequently, up to nine probes are sent at intervals of 75 seconds. If all probes fail, the connection is reset.

  • Probe timer This timer is used to test, at defined intervals, whether the zero window still applies. The interval depends on the round-trip time.

  • FinWait2 timer The expiry of this timer switches the connection from the FIN_WAIT2 state into the CLOSED state, if no FIN packet from the partner arrives.

  • TWKill timer This timer manages the interval in the TIME_WAIT state. The value is twice the maximum segment lifetime (MSL), which amounts to 60 seconds.

24.5.2 Functions

This section first introduces all general timer functions.

tcp_init_xmit_timers()

net/ipv4/tcp_timer.c


tcp_init_xmit_timers(sk) initializes the set of timers for a connection. The timer_list structures are initialized, and their function pointers are set to the respective behavior functions.

tcp_reset_xmit_timer()

include/net/tcp.h


The function tcp_reset_xmit_timer(sk, what, when) sets the timer specified in what to the time when (i.e., to the time jiffies + when).

tcp_clear_xmit_timer()

include/net/tcp.h


The function tcp_clear_xmit_timer(sk) removes all timers set for a connection from the linked list of timer_list structures.

SYNACK Timer

The actions of the SYNACK timer are implemented in the function tcp_synack_timer(sk) (net/ipv4/tcp_timer.c). This function walks through the list of all connections with unacknowledged SYNACK packets and deletes every connection for which the timeout value min((TCP_TIMEOUT_INIT << req->retrans), TCP_RTO_MAX) has expired. Subsequently, the keepalive timer is started for a new connection and initialized with TCP_SYNQ_INTERVAL.
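The timeout expression used for pending connection requests can be evaluated in isolation. The following is a minimal sketch, assuming HZ = 100 (one jiffy every 10 ms) and TCP_RTO_MAX = 120 * HZ (the two-minute upper bound mentioned earlier). The helpers synack_timeout() and synack_expired() are hypothetical names, not kernel functions.

```c
#include <assert.h>

#define HZ               100          /* assumed tick rate: 10 ms per jiffy */
#define TCP_TIMEOUT_INIT (3 * HZ)     /* three seconds, as stated in the text */
#define TCP_RTO_MAX      (120 * HZ)   /* assumed upper bound: two minutes */

/* Timeout (in jiffies) for a connection request whose SYNACK
 * has already been retransmitted 'retrans' times. */
static unsigned long synack_timeout(unsigned int retrans)
{
    assert(retrans < 16);   /* keep the shift well inside unsigned long */
    unsigned long t = (unsigned long)TCP_TIMEOUT_INIT << retrans;
    return t < TCP_RTO_MAX ? t : TCP_RTO_MAX;
}

/* Wrap-safe check: has a request whose SYNACK was sent at time 'sent'
 * expired by time 'now'? Both times are given in jiffies. */
static int synack_expired(unsigned long now, unsigned long sent,
                          unsigned int retrans)
{
    return (long)(now - (sent + synack_timeout(retrans))) >= 0;
}
```

Under these assumptions the timeout starts at 300 jiffies (3 s), doubles with each retransmission, and is capped at 12000 jiffies (120 s) from the sixth retransmission onward.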

There are various functions to manage the keepalive timer:

tcp_delete_keepalive_timer()

net/ipv4/tcp_timer.c


The function tcp_delete_keepalive_timer(sk) removes the keepalive timer from the list of timer_list structures.

tcp_reset_keepalive_timer()

net/ipv4/tcp_timer.c


The function tcp_reset_keepalive_timer(sk, len) sets the timer to the value jiffies + len.

tcp_keepalive_timer()

net/ipv4/tcp_timer.c


The function tcp_keepalive_timer(data) is the behavior function for the keepalive timer. When this function is invoked, the state of the connection is checked to decide whether this connection should be terminated. This function implements various logically separated TCP timers. In addition to the SYNACK timer described here, it implements the timeout in the FIN_WAIT_2 state and the actual keepalive timer.

Retransmit Timer

The client's TCP instance sends a SYN packet to the server and waits for an answer (with SYN and ACK set) while the connection-establishment phase is active. During this phase, tcp_connect() (in net/ipv4/tcp_output.c) calls tcp_reset_xmit_timer(sk, TCP_TIME_RETRANS, tp->rto) to set the retransmit timer to the value TCP_TIMEOUT_INIT. TCP_TIMEOUT_INIT is set to the value 3*HZ in the file include/net/tcp.h.

The retransmit timer is also used when the connection is established and running. The duration of the timeout is doubled, in the tcp_retransmit_timer() function (net/ipv4/tcp_timer.c), upon each retransmission, until it has arrived at the maximum and the connection is reset:

       tp->rto = min(tp->rto << 1, TCP_RTO_MAX);
       tcp_reset_xmit_timer(sk, TCP_TIME_RETRANS, tp->rto);
       if (tp->retransmits > sysctl_tcp_retries1)
               __sk_dst_reset(sk);

The retransmit timer is initialized to the retransmission timeout (RTO), which is recalculated by use of various helper functions:

tcp_set_rto()

net/ipv4/tcp_input.c


This function computes the RTO (retransmission timeout) from the round-trip time values.

tcp_bound_rto()

net/ipv4/tcp_input.c


The function tcp_bound_rto(tp) limits the value range for RTO to a fixed interval.
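The text does not give the formula used to derive the RTO. As an illustration, the sketch below uses the standard estimator from RFC 6298 (smoothed round-trip time plus four times the smoothed deviation) and then clamps the result. It is not the kernel's fixed-point implementation. The names rtt_est, rtt_update(), and rto_ms() are hypothetical, and the bounds are taken from the range of 200 ms to two minutes mentioned earlier.

```c
#include <assert.h>

#define RTO_MIN_MS    200L     /* lower bound from the range given in the text */
#define RTO_MAX_MS 120000L     /* upper bound: two minutes */

struct rtt_est {
    long srtt_ms;     /* smoothed round-trip time */
    long rttvar_ms;   /* smoothed mean deviation of the round-trip time */
    int  valid;       /* 0 until the first sample has been processed */
};

/* Feed one round-trip time sample into the estimator
 * (RFC 6298: gain 1/8 for SRTT, 1/4 for RTTVAR). */
static void rtt_update(struct rtt_est *e, long sample_ms)
{
    assert(sample_ms >= 0);
    if (!e->valid) {
        e->srtt_ms = sample_ms;
        e->rttvar_ms = sample_ms / 2;
        e->valid = 1;
        return;
    }
    long err = sample_ms - e->srtt_ms;
    if (err < 0)
        err = -err;
    /* RTTVAR is updated first, using the old SRTT. */
    e->rttvar_ms += (err - e->rttvar_ms) / 4;
    e->srtt_ms += (sample_ms - e->srtt_ms) / 8;
}

/* Compute the RTO and limit it to the interval [RTO_MIN_MS, RTO_MAX_MS]. */
static long rto_ms(const struct rtt_est *e)
{
    long rto = e->srtt_ms + 4 * e->rttvar_ms;

    if (rto < RTO_MIN_MS)
        rto = RTO_MIN_MS;
    if (rto > RTO_MAX_MS)
        rto = RTO_MAX_MS;
    return rto;
}
```

The first function plays the role described for tcp_set_rto(), the clamping at the end the role described for tcp_bound_rto(). Integer division makes the arithmetic slightly coarser than the floating-point formulas in the RFC.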

tcp_ack_saw_tstamp()

net/ipv4/tcp_input.c


The function tcp_ack_saw_tstamp(sk, tp, seq, ack, flag) computes and sets the RTO and terminates the retransmission mode, if applicable.

tcp_ack_packets_out()

net/ipv4/tcp_input.c


In the retransmission mode, the function tcp_ack_packets_out(sk, tp) continues to send packets from the retransmission queue and updates the retransmit timer.

Delayed ACK Timer

The Delayed ACK timer is implemented in the function tcp_delack_timer().

tcp_delack_timer()

net/ipv4/tcp_timer.c


This function is invoked when the TCP_TIME_DACK timer expires. It resets the Delayed ACK timer and sends an ACK packet.

tcp_send_delayed_ack()

net/ipv4/tcp_output.c


This function is invoked by tcp_ack_snd_check() (net/ipv4/tcp_input.c) when an incoming packet should be acknowledged but no immediate ACK is required.

The function tcp_send_delayed_ack() uses the call mod_timer(&tp->delack_timer, timeout) to set the Delayed ACK timer. When this timer expires, tcp_delack_timer() sends the delayed ACK packet, and the Delayed ACK timer is reset.

Keepalive Timer

The actual keepalive timer is implemented in the function tcp_keepalive_timer(), which serves to test a connection that has not been used over a lengthy period of time. When this timer expires, the function tcp_write_wakeup(sk) uses the function tcp_xmit_probe_skb() (both in net/ipv4/tcp_output.c) to send a probe packet. Subsequently, the variable tp->probes_out is incremented until the maximum number of probes, defined in sysctl_tcp_keepalive_probes, is reached.

Probe Timer

Zero-window probing was described in Section 24.4.1; we mention it here only for the sake of completeness.

FinWait2 Timer

The keepalive timer is also used to implement the timeout when waiting for a FIN packet in the FIN_WAIT2 state. In this case, it is used to compute the timeout duration. When the function tcp_time_wait() is called, the connection state changes to TIME_WAIT. The connection is closed, at the latest, when the timeout expires.

TWKill Timer

The timeout in the TIME_WAIT state is implemented by the function tcp_tw_schedule() (net/ipv4/tcp_minisocks.c), which is invoked by the function tcp_time_wait() when the connection is torn down.

24.5.3 Configuration

To be able to use the TCP/IP support, we have to activate the option TCP/IP networking in the kernel-configuration menu.

In addition, when creating a socket, you can use optional settings to influence its behavior. These settings are defined as constants in the file include/linux/tcp.h. All available settings are listed in Table 24-1.

Table 24-1. Socket options for the Transmission Control Protocol.

Socket option          Description

TCP_NODELAY            Disables the Nagle algorithm.
TCP_MAXSEG             Limits the maximum segment size.
TCP_CORK               Only segments of maximum size are sent.
TCP_KEEPIDLE           Idle time before the first keepalive probe.
TCP_KEEPINTVL          Interval between keepalive probes.
TCP_KEEPCNT            Number of keepalive probes.
TCP_SYNCNT             Number of SYN retransmissions.
TCP_LINGER2            Timeout duration in the FIN_WAIT2 state.
TCP_DEFER_ACCEPT       Notify only when data is received.
TCP_WINDOW_CLAMP       Limits the receive window.
TCP_INFO               Information about the current connection.
TCP_QUICKACK           Activates or deactivates quick ACKs.
TCP_OPT_TIMESTAMPS     Activates or deactivates the timestamp option.
TCP_OPT_WSCALE         Activates or deactivates the window scaling option.
TCP_OPT_ECN            Activates or deactivates ECN (Explicit Congestion Notification).
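As an example of how such options are applied, the following sketch sets the three keepalive parameters on an existing TCP socket with setsockopt(). It assumes a Linux system, where the interval option is spelled TCP_KEEPINTVL in the header, and it also enables SO_KEEPALIVE, without which the per-socket values have no effect. The wrapper set_keepalive() is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Enable keepalive on a TCP socket and set its three parameters.
 * idle_s:  idle time in seconds before the first probe
 * intvl_s: interval in seconds between probes
 * count:   number of unanswered probes before the connection is reset
 * Returns 0 on success, -1 if any setsockopt() call fails. */
static int set_keepalive(int fd, int idle_s, int intvl_s, int count)
{
    int on = 1;

    assert(idle_s > 0 && intvl_s > 0 && count > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
        return -1;
    return 0;
}
```

A call such as set_keepalive(fd, 600, 30, 4) asks for a first probe after ten idle minutes, followed by up to four probes at 30-second intervals. The values can be read back with getsockopt() to confirm that they were accepted.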
