Home
小杰的博客 Prev Page Prev Page
?
Main Page
Table of content
Copyright
Addison-Wesley Professional Computing Series
Foreword
Preface
Introduction
Changes from the Second Edition
Using This Book
Source Code and Errata Availability
Acknowledgments
Part 1: Introduction and TCP/IP
Chapter 1. Introduction
1.1 Introduction
1.2 A Simple Daytime Client
1.3 Protocol Independence
1.4 Error Handling: Wrapper Functions
1.5 A Simple Daytime Server
1.6 Roadmap to Client/Server Examples in the Text
1.7 OSI Model
1.8 BSD Networking History
1.9 Test Networks and Hosts
1.10 Unix Standards
1.11 64-Bit Architectures
1.12 Summary
Exercises
Chapter 2. The Transport Layer: TCP, UDP, and SCTP
2.1 Introduction
2.2 The Big Picture
2.3 User Datagram Protocol (UDP)
2.4 Transmission Control Protocol (TCP)
2.5 Stream Control Transmission Protocol (SCTP)
2.6 TCP Connection Establishment and Termination
2.7 TIME_WAIT State
2.8 SCTP Association Establishment and Termination
2.9 Port Numbers
2.10 TCP Port Numbers and Concurrent Servers
2.11 Buffer Sizes and Limitations
2.12 Standard Internet Services
2.13 Protocol Usage by Common Internet Applications
2.14 Summary
Exercises
Part 2: Elementary Sockets
Chapter 3. Sockets Introduction
3.1 Introduction
3.2 Socket Address Structures
3.3 Value-Result Arguments
3.4 Byte Ordering Functions
3.5 Byte Manipulation Functions
3.6 'inet_aton', 'inet_addr', and 'inet_ntoa' Functions
3.7 'inet_pton' and 'inet_ntop' Functions
3.8 'sock_ntop' and Related Functions
3.9 'readn', 'writen', and 'readline' Functions
3.10 Summary
Exercises
Chapter 4. Elementary TCP Sockets
4.1 Introduction
4.2 'socket' Function
4.3 'connect' Function
4.4 'bind' Function
4.5 'listen' Function
4.6 'accept' Function
4.7 'fork' and 'exec' Functions
4.8 Concurrent Servers
4.9 'close' Function
4.10 'getsockname' and 'getpeername' Functions
4.11 Summary
Exercises
Chapter 5. TCP Client/Server Example
5.1 Introduction
5.2 TCP Echo Server: 'main' Function
5.3 TCP Echo Server: 'str_echo' Function
5.4 TCP Echo Client: 'main' Function
5.5 TCP Echo Client: 'str_cli' Function
5.6 Normal Startup
5.7 Normal Termination
5.8 POSIX Signal Handling
5.9 Handling 'SIGCHLD' Signals
5.10 'wait' and 'waitpid' Functions
5.11 Connection Abort before 'accept' Returns
5.12 Termination of Server Process
5.13 'SIGPIPE' Signal
5.14 Crashing of Server Host
5.15 Crashing and Rebooting of Server Host
5.16 Shutdown of Server Host
5.17 Summary of TCP Example
5.18 Data Format
5.19 Summary
Exercises
Chapter 6. I/O Multiplexing: The 'select' and 'poll' Functions
6.1 Introduction
6.2 I/O Models
6.3 'select' Function
6.4 'str_cli' Function (Revisited)
6.5 Batch Input and Buffering
6.6 'shutdown' Function
6.7 'str_cli' Function (Revisited Again)
6.8 TCP Echo Server (Revisited)
6.9 'pselect' Function
6.10 'poll' Function
6.11 TCP Echo Server (Revisited Again)
6.12 Summary
Exercises
Chapter 7. Socket Options
7.1 Introduction
7.2 'getsockopt' and 'setsockopt' Functions
7.3 Checking if an Option Is Supported and Obtaining the Default
7.4 Socket States
7.5 Generic Socket Options
7.6 IPv4 Socket Options
7.7 ICMPv6 Socket Option
7.8 IPv6 Socket Options
7.9 TCP Socket Options
7.10 SCTP Socket Options
7.11 'fcntl' Function
7.12 Summary
Exercises
Chapter 8. Elementary UDP Sockets
8.1 Introduction
8.2 'recvfrom' and 'sendto' Functions
8.3 UDP Echo Server: 'main' Function
8.4 UDP Echo Server: 'dg_echo' Function
8.5 UDP Echo Client: 'main' Function
8.6 UDP Echo Client: 'dg_cli' Function
8.7 Lost Datagrams
8.8 Verifying Received Response
8.9 Server Not Running
8.10 Summary of UDP Example
8.11 'connect' Function with UDP
8.12 'dg_cli' Function (Revisited)
8.13 Lack of Flow Control with UDP
8.14 Determining Outgoing Interface with UDP
8.15 TCP and UDP Echo Server Using 'select'
8.16 Summary
Exercises
Chapter 9. Elementary SCTP Sockets
9.1 Introduction
9.2 Interface Models
9.3 'sctp_bindx' Function
9.4 'sctp_connectx' Function
9.5 'sctp_getpaddrs' Function
9.6 'sctp_freepaddrs' Function
9.7 'sctp_getladdrs' Function
9.8 'sctp_freeladdrs' Function
9.9 'sctp_sendmsg' Function
9.10 'sctp_recvmsg' Function
9.11 'sctp_opt_info' Function
9.12 'sctp_peeloff' Function
9.13 'shutdown' Function
9.14 Notifications
9.15 Summary
Exercises
Chapter 10. SCTP Client/Server Example
10.1 Introduction
10.2 SCTP One-to-Many-Style Streaming Echo Server: 'main' Function
10.3 SCTP One-to-Many-Style Streaming Echo Client: 'main' Function
10.4 SCTP Streaming Echo Client: 'str_cli' Function
10.5 Exploring Head-of-Line Blocking
10.6 Controlling the Number of Streams
10.7 Controlling Termination
10.8 Summary
Exercises
Chapter 11. Name and Address Conversions
11.1 Introduction
11.2 Domain Name System (DNS)
11.3 'gethostbyname' Function
11.4 'gethostbyaddr' Function
11.5 'getservbyname' and 'getservbyport' Functions
11.6 'getaddrinfo' Function
11.7 'gai_strerror' Function
11.8 'freeaddrinfo' Function
11.9 'getaddrinfo' Function: IPv6
11.10 'getaddrinfo' Function: Examples
11.11 'host_serv' Function
11.12 'tcp_connect' Function
11.13 'tcp_listen' Function
11.14 'udp_client' Function
11.15 'udp_connect' Function
11.16 'udp_server' Function
11.17 'getnameinfo' Function
11.18 Re-entrant Functions
11.19 'gethostbyname_r' and 'gethostbyaddr_r' Functions
11.20 Obsolete IPv6 Address Lookup Functions
11.21 Other Networking Information
11.22 Summary
Exercises
Part 3: Advanced Sockets
Chapter 12. IPv4 and IPv6 Interoperability
12.1 Introduction
12.2 IPv4 Client, IPv6 Server
12.3 IPv6 Client, IPv4 Server
12.4 IPv6 Address-Testing Macros
12.5 Source Code Portability
12.6 Summary
Exercises
Chapter 13. Daemon Processes and the 'inetd' Superserver
13.1 Introduction
13.2 'syslogd' Daemon
13.3 'syslog' Function
13.4 'daemon_init' Function
13.5 'inetd' Daemon
13.6 'daemon_inetd' Function
13.7 Summary
Exercises
Chapter 14. Advanced I/O Functions
14.1 Introduction
14.2 Socket Timeouts
14.3 'recv' and 'send' Functions
14.4 'readv' and 'writev' Functions
14.5 'recvmsg' and 'sendmsg' Functions
14.6 Ancillary Data
14.7 How Much Data Is Queued?
14.8 Sockets and Standard I/O
14.9 Advanced Polling
14.10 Summary
Exercises
Chapter 15. Unix Domain Protocols
15.1 Introduction
15.2 Unix Domain Socket Address Structure
15.3 'socketpair' Function
15.4 Socket Functions
15.5 Unix Domain Stream Client/Server
15.6 Unix Domain Datagram Client/Server
15.7 Passing Descriptors
15.8 Receiving Sender Credentials
15.9 Summary
Exercises
Chapter 16. Nonblocking I/O
16.1 Introduction
16.2 Nonblocking Reads and Writes: 'str_cli' Function (Revisited)
16.3 Nonblocking 'connect'
16.4 Nonblocking 'connect:' Daytime Client
16.5 Nonblocking 'connect:' Web Client
16.6 Nonblocking 'accept'
16.7 Summary
Exercises
Chapter 17. 'ioctl' Operations
17.1 Introduction
17.2 'ioctl' Function
17.3 Socket Operations
17.4 File Operations
17.5 Interface Configuration
17.6 'get_ifi_info' Function
17.7 Interface Operations
17.8 ARP Cache Operations
17.9 Routing Table Operations
17.10 Summary
Exercises
Chapter 18. Routing Sockets
18.1 Introduction
18.2 Datalink Socket Address Structure
18.3 Reading and Writing
18.4 'sysctl' Operations
18.5 'get_ifi_info' Function (Revisited)
18.6 Interface Name and Index Functions
18.7 Summary
Exercises
Chapter 19. Key Management Sockets
19.1 Introduction
19.2 Reading and Writing
19.3 Dumping the Security Association Database (SADB)
19.4 Creating a Static Security Association (SA)
19.5 Dynamically Maintaining SAs
19.6 Summary
Exercises
Chapter 20. Broadcasting
20.1 Introduction
20.2 Broadcast Addresses
20.3 Unicast versus Broadcast
20.4 'dg_cli' Function Using Broadcasting
20.5 Race Conditions
20.6 Summary
Exercises
Chapter 21. Multicasting
21.1 Introduction
21.2 Multicast Addresses
21.3 Multicasting versus Broadcasting on a LAN
21.4 Multicasting on a WAN
21.5 Source-Specific Multicast
21.6 Multicast Socket Options
21.7 'mcast_join' and Related Functions
21.8 'dg_cli' Function Using Multicasting
21.9 Receiving IP Multicast Infrastructure Session Announcements
21.10 Sending and Receiving
21.11 Simple Network Time Protocol (SNTP)
21.12 Summary
Exercises
Chapter 22. Advanced UDP Sockets
22.1 Introduction
22.2 Receiving Flags, Destination IP Address, and Interface Index
22.3 Datagram Truncation
22.4 When to Use UDP Instead of TCP
22.5 Adding Reliability to a UDP Application
22.6 Binding Interface Addresses
22.7 Concurrent UDP Servers
22.8 IPv6 Packet Information
22.9 IPv6 Path MTU Control
22.10 Summary
Exercises
Chapter 23. Advanced SCTP Sockets
23.1 Introduction
23.2 An Autoclosing One-to-Many-Style Server
23.3 Partial Delivery
23.4 Notifications
23.5 Unordered Data
23.6 Binding a Subset of Addresses
23.7 Determining Peer and Local Address Information
23.8 Finding an Association ID Given an IP Address
23.9 Heartbeating and Address Failure
23.10 Peeling Off an Association
23.11 Controlling Timing
23.12 When to Use SCTP Instead of TCP
23.13 Summary
Exercises
Chapter 24. Out-of-Band Data
24.1 Introduction
24.2 TCP Out-of-Band Data
24.3 'sockatmark' Function
24.4 TCP Out-of-Band Data Recap
24.5 Summary
Exercises
Chapter 25. Signal-Driven I/O
25.1 Introduction
25.2 Signal-Driven I/O for Sockets
25.3 UDP Echo Server Using 'SIGIO'
25.4 Summary
Exercises
Chapter 26. Threads
26.1 Introduction
26.2 Basic Thread Functions: Creation and Termination
26.3 'str_cli' Function Using Threads
26.4 TCP Echo Server Using Threads
26.5 Thread-Specific Data
26.6 Web Client and Simultaneous Connections (Continued)
26.7 Mutexes: Mutual Exclusion
26.8 Condition Variables
26.9 Web Client and Simultaneous Connections (Continued)
26.10 Summary
Exercises
Chapter 27. IP Options
27.1 Introduction
27.2 IPv4 Options
27.3 IPv4 Source Route Options
27.4 IPv6 Extension Headers
27.5 IPv6 Hop-by-Hop Options and Destination Options
27.6 IPv6 Routing Header
27.7 IPv6 Sticky Options
27.8 Historical IPv6 Advanced API
27.9 Summary
Exercises
Chapter 28. Raw Sockets
28.1 Introduction
28.2 Raw Socket Creation
28.3 Raw Socket Output
28.4 Raw Socket Input
28.5 'ping' Program
28.6 'traceroute' Program
28.7 An ICMP Message Daemon
28.8 Summary
Exercises
Chapter 29. Datalink Access
29.1 Introduction
29.2 BSD Packet Filter (BPF)
29.3 Datalink Provider Interface (DLPI)
29.4 Linux: 'SOCK_PACKET' and 'PF_PACKET'
29.5 'libpcap': Packet Capture Library
29.6 'libnet': Packet Creation and Injection Library
29.7 Examining the UDP Checksum Field
29.8 Summary
Exercises
Chapter 30. Client/Server Design Alternatives
30.1 Introduction
30.2 TCP Client Alternatives
30.3 TCP Test Client
30.4 TCP Iterative Server
30.5 TCP Concurrent Server, One Child per Client
30.6 TCP Preforked Server, No Locking Around 'accept'
30.7 TCP Preforked Server, File Locking Around 'accept'
30.8 TCP Preforked Server, Thread Locking Around 'accept'
30.9 TCP Preforked Server, Descriptor Passing
30.10 TCP Concurrent Server, One Thread per Client
30.11 TCP Prethreaded Server, per-Thread 'accept'
30.12 TCP Prethreaded Server, Main Thread 'accept'
30.13 Summary
Exercises
Chapter 31. Streams
31.1 Introduction
31.2 Overview
31.3 'getmsg' and 'putmsg' Functions
31.4 'getpmsg' and 'putpmsg' Functions
31.5 'ioctl' Function
31.6 Transport Provider Interface (TPI)
31.7 Summary
Exercises
Appendix A. IPv4, IPv6, ICMPv4, and ICMPv6
A.1 Introduction
A.2 IPv4 Header
A.3 IPv6 Header
A.4 IPv4 Addresses
A.5 IPv6 Addresses
A.6 Internet Control Message Protocols (ICMPv4 and ICMPv6)
Appendix B. Virtual Networks
B.1 Introduction
B.2 The MBone
B.3 The 6bone
B.4 IPv6 Transition: 6to4
Appendix C. Debugging Techniques
C.1 System Call Tracing
C.2 Standard Internet Services
C.3 'sock' Program
C.4 Small Test Programs
C.5 'tcpdump' Program
C.6 'netstat' Program
C.7 'lsof' Program
Appendix D. Miscellaneous Source Code
D.1 'unp.h' Header
D.2 'config.h' Header
D.3 Standard Error Functions
Appendix E. Solutions to Selected Exercises
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17
Chapter 18
Chapter 20
Chapter 21
Chapter 22
Chapter 24
Chapter 25
Chapter 26
Chapter 27
Chapter 28
Chapter 29
Chapter 30
Chapter 31
Bibliography
?
[ Team LiB ] Previous Section Next Section

4.5 listen Function

The listen function is called only by a TCP server and it performs two actions:

  1. When a socket is created by the socket function, it is assumed to be an active socket, that is, a client socket that will issue a connect. The listen function converts an unconnected socket into a passive socket, indicating that the kernel should accept incoming connection requests directed to this socket. In terms of the TCP state transition diagram (Figure 2.4), the call to listen moves the socket from the CLOSED state to the LISTEN state.

  2. The second argument to this function specifies the maximum number of connections the kernel should queue for this socket.

#include <sys/socket.h>

#int listen (int sockfd, int backlog);

Returns: 0 if OK, -1 on error

This function is normally called after both the socket and bind functions and must be called before calling the accept function.

To understand the backlog argument, we must realize that for a given listening socket, the kernel maintains two queues:

  1. An incomplete connection queue, which contains an entry for each SYN that has arrived from a client for which the server is awaiting completion of the TCP three-way handshake. These sockets are in the SYN_RCVD state (Figure 2.4).

  2. A completed connection queue, which contains an entry for each client with whom the TCP three-way handshake has completed. These sockets are in the ESTABLISHED state (Figure 2.4).

Figure 4.7 depicts these two queues for a given listening socket.

Figure 4.7. The two queues maintained by TCP for a listening socket.

graphics/04fig07.gif

When an entry is created on the incomplete queue, the parameters from the listen socket are copied over to the newly created connection. The connection creation mechanism is completely automatic; the server process is not involved. Figure 4.8 depicts the packets exchanged during the connection establishment with these two queues.

Figure 4.8. TCP three-way handshake and the two queues for a listening socket.

graphics/04fig08.gif

When a SYN arrives from a client, TCP creates a new entry on the incomplete queue and then responds with the second segment of the three-way handshake: the server's SYN with an ACK of the client's SYN (Section 2.6). This entry will remain on the incomplete queue until the third segment of the three-way handshake arrives (the client's ACK of the server's SYN), or until the entry times out. (Berkeley-derived implementations have a timeout of 75 seconds for these incomplete entries.) If the three-way handshake completes normally, the entry moves from the incomplete queue to the end of the completed queue. When the process calls accept, which we will describe in the next section, the first entry on the completed queue is returned to the process, or if the queue is empty, the process is put to sleep until an entry is placed onto the completed queue.

There are several points to consider regarding the handling of these two queues.

  • The backlog argument to the listen function has historically specified the maximum value for the sum of both queues.

    There has never been a formal definition of what the backlog means. The 4.2BSD man page says that it "defines the maximum length the queue of pending connections may grow to." Many man pages and even the POSIX specification copy this definition verbatim, but this definition does not say whether a pending connection is one in the SYN_RCVD state, one in the ESTABLISHED state that has not yet been accepted, or either. The historical definition in this bullet is the Berkeley implementation, dating back to 4.2BSD, and copied by many others.

  • Berkeley-derived implementations add a fudge factor to the backlog: It is multiplied by 1.5 (p. 257 of TCPv1 and p. 462 of TCPV2). For example, the commonly specified backlog of 5 really allows up to 8 queued entries on these systems, as we show in Figure 4.10.

    The reason for adding this fudge factor appears lost to history [Joy 1994]. But if we consider the backlog as specifying the maximum number of completed connections that the kernel will queue for a socket ([Borman 1997b], as discussed shortly), then the reason for the fudge factor is to take into account incomplete connections on the queue.

  • Do not specify a backlog of 0, as different implementations interpret this differently (Figure 4.10). If you do not want any clients connecting to your listening socket, close the listening socket.

  • Assuming the three-way handshake completes normally (i.e., no lost segments and no retransmissions), an entry remains on the incomplete connection queue for one RTT, whatever that value happens to be between a particular client and server. Section 14.4 of TCPv3 shows that for one Web server, the median RTT between many clients and the server was 187 ms. (The median is often used for this statistic, since a few large values can noticeably skew the mean.)

  • Historically, sample code always shows a backlog of 5, as that was the maximum value supported by 4.2BSD. This was adequate in the 1980s when busy servers would handle only a few hundred connections per day. But with the growth of the World Wide Web (WWW), where busy servers handle millions of connections per day, this small number is completely inadequate (pp. 187鈥?92 of TCPv3). Busy HTTP servers must specify a much larger backlog, and newer kernels must support larger values.

    Many current systems allow the administrator to modify the maximum value for the backlog.

  • A problem is: What value should the application specify for the backlog, since 5 is often inadequate? There is no easy answer to this. HTTP servers now specify a larger value, but if the value specified is a constant in the source code, to increase the constant requires recompiling the server. Another method is to assume some default but allow a command-line option or an environment variable to override the default. It is always acceptable to specify a value that is larger than supported by the kernel, as the kernel should silently truncate the value to the maximum value that it supports, without returning an error (p. 456 of TCPv2).

    We can provide a simple solution to this problem by modifying our wrapper function for the listen function. Figure 4.9 shows the actual code. We allow the environment variable LISTENQ to override the value specified by the caller.

    Figure 4.9 Wrapper function for listen that allows an environment variable to specify backlog.

    lib/wrapsock.c

    137 void
    138 Listen (int fd, int backlog)
    139 {
    140     char    *ptr;
    
    141         /* can override 2nd argument with environment variable */
    142     if ( (ptr = getenv("LISTENQ")) != NULL)
    143         backlog = atoi (ptr);
    
    144     if (listen (fd, backlog) < 0)
    145         err_sys ("listen error");
    146 }
    
  • Manuals and books have historically said that the reason for queuing a fixed number of connections is to handle the case of the server process being busy between successive calls to accept. This implies that of the two queues, the completed queue should normally have more entries than the incomplete queue. Again, busy Web servers have shown that this is false. The reason for specifying a large backlog is because the incomplete connection queue can grow as client SYNs arrive, waiting for completion of the three-way handshake.

  • If the queues are full when a client SYN arrives, TCP ignores the arriving SYN (pp. 930鈥?31 of TCPv2); it does not send an RST. This is because the condition is considered temporary, and the client TCP will retransmit its SYN, hopefully finding room on the queue in the near future. If the server TCP immediately responded with an RST, the client's connect would return an error, forcing the application to handle this condition instead of letting TCP's normal retransmission take over. Also, the client could not differentiate between an RST in response to a SYN meaning "there is no server at this port" versus "there is a server at this port but its queues are full."

    Some implementations do send an RST when the queue is full. This behavior is incorrect for the reasons stated above, and unless your client specifically needs to interact with such a server, it's best to ignore this possibility. Coding to handle this case reduces the robustness of the client and puts more load on the network in the normal RST case, where the port really has no server listening on it.

  • Data that arrives after the three-way handshake completes, but before the server calls accept, should be queued by the server TCP, up to the size of the connected socket's receive buffer.

Figure 4.10 shows the actual number of queued connections provided for different values of the backlog argument for the various operating systems in Figure 1.16. For seven different operating systems there are five distinct columns, showing the variety of interpretations about what backlog means!

Figure 4.10. Actual number of queued connections for values of backlog.

graphics/04fig10.gif

AIX and MacOS have the traditional Berkeley algorithm, and Solaris seems very close to that algorithm as well. FreeBSD just adds one to backlog.

The program to measure these values is shown in the solution for Exercise 15.4.

As we said, historically the backlog has specified the maximum value for the sum of both queues. During 1996, a new type of attack was launched on the Internet called SYN flooding [CERT 1996b]. The hacker writes a program to send SYNs at a high rate to the victim, filling the incomplete connection queue for one or more TCP ports. (We use the term hacker to mean the attacker, as described in [Cheswick, Bellovin, and Rubin 2003].) Additionally, the source IP address of each SYN is set to a random number (this is called IP spoofing) so that the server's SYN/ACK goes nowhere. This also prevents the server from knowing the real IP address of the hacker. By filling the incomplete queue with bogus SYNs, legitimate SYNs are not queued, providing a denial of service to legitimate clients. There are two commonly used methods of handling these attacks, summarized in [Borman 1997b]. But what is most interesting in this note is revisiting what the listen backlog really means. It should specify the maximum number of completed connections for a given socket that the kernel will queue. The purpose of having a limit on these completed connections is to stop the kernel from accepting new connection requests for a given socket when the application is not accepting them (for whatever reason). If a system implements this interpretation, as does BSD/OS 3.0, then the application need not specify huge backlog values just because the server handles lots of client requests (e.g., a busy Web server) or to provide protection against SYN flooding. The kernel handles lots of incomplete connections, regardless of whether they are legitimate or from a hacker. But even with this interpretation, scenarios do occur where the traditional value of 5 is inadequate.

[ Team LiB ] Previous Section Next Section
Converted from CHM to HTML with chm2web Pro 2.85 (unicode)