
6.3 select Function

This function allows the process to instruct the kernel to wait for any one of multiple events to occur and to wake up the process only when one or more of these events occurs or when a specified amount of time has passed.

As an example, we can call select and tell the kernel to return only when:

  • Any of the descriptors in the set {1, 4, 5} are ready for reading

  • Any of the descriptors in the set {2, 7} are ready for writing

  • Any of the descriptors in the set {1, 4} have an exception condition pending

  • 10.2 seconds have elapsed

That is, we tell the kernel what descriptors we are interested in (for reading, writing, or an exception condition) and how long to wait. The descriptors in which we are interested are not restricted to sockets; any descriptor can be tested using select.
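As a rough sketch, the example above could be turned into arguments for select as follows. The structure example_args and the function build_example are hypothetical names used only for illustration; the descriptor-set macros and the timeval structure are described later in this section.

```c
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical container for the arguments of the example call. */
struct example_args {
    fd_set rset, wset, eset;
    struct timeval timeout;
    int maxfdp1;
};

/* Fill in the sets {1,4,5}, {2,7}, {1,4} and a 10.2-second timeout. */
void build_example(struct example_args *a)
{
    FD_ZERO(&a->rset);
    FD_SET(1, &a->rset);
    FD_SET(4, &a->rset);
    FD_SET(5, &a->rset);

    FD_ZERO(&a->wset);
    FD_SET(2, &a->wset);
    FD_SET(7, &a->wset);

    FD_ZERO(&a->eset);
    FD_SET(1, &a->eset);
    FD_SET(4, &a->eset);

    a->timeout.tv_sec = 10;        /* whole seconds */
    a->timeout.tv_usec = 200000;   /* 0.2 second expressed in microseconds */

    a->maxfdp1 = 7 + 1;            /* largest descriptor of interest (7) plus one */
}
```

The call itself would then be select(a.maxfdp1, &a.rset, &a.wset, &a.eset, &a.timeout).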

Berkeley-derived implementations have always allowed I/O multiplexing with any descriptor. SVR3 originally limited I/O multiplexing to descriptors that were STREAMS devices (Chapter 31), but this limitation was removed with SVR4.

#include <sys/select.h>

#include <sys/time.h>

int select(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, const struct timeval *timeout);

Returns: positive count of ready descriptors, 0 on timeout, -1 on error

We start our description of this function with its final argument, which tells the kernel how long to wait for one of the specified descriptors to become ready. A timeval structure specifies the number of seconds and microseconds.


struct timeval  {
  long   tv_sec;          /* seconds */
  long   tv_usec;         /* microseconds */
};

There are three possibilities:

  1. Wait forever: Return only when one of the specified descriptors is ready for I/O. For this, we specify the timeout argument as a null pointer.

  2. Wait up to a fixed amount of time: Return when one of the specified descriptors is ready for I/O, but do not wait beyond the number of seconds and microseconds specified in the timeval structure pointed to by the timeout argument.

  3. Do not wait at all: Return immediately after checking the descriptors. This is called polling. To specify this, the timeout argument must point to a timeval structure and the timer value (the number of seconds and microseconds specified by the structure) must be 0.
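The three possibilities can be sketched with a single helper. The name wait_readable and its convention of encoding the mode in a millisecond count are assumptions made for this example, not part of the select interface.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: wait for fd to become readable.
 *   wait_ms <  0 : wait forever (null timeout pointer)
 *   wait_ms == 0 : poll, return immediately
 *   wait_ms >  0 : wait at most wait_ms milliseconds
 * Returns whatever select returns. */
int wait_readable(int fd, long wait_ms)
{
    fd_set rset;
    struct timeval tv;
    struct timeval *tvp = NULL;              /* case 1: wait forever */

    if (wait_ms >= 0) {                      /* cases 2 and 3 */
        tv.tv_sec = wait_ms / 1000;
        tv.tv_usec = (wait_ms % 1000) * 1000;
        tvp = &tv;                           /* both fields 0 means polling */
    }

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    return select(fd + 1, &rset, NULL, NULL, tvp);
}
```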

The wait in the first two scenarios is normally interrupted if the process catches a signal and returns from the signal handler.

Berkeley-derived kernels never automatically restart select (p. 527 of TCPv2), while SVR4 will if the SA_RESTART flag is specified when the signal handler is installed. This means that for portability, we must be prepared for select to return an error of EINTR if we are catching signals.
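One way to be prepared for EINTR is to call select again when it occurs. The following is a minimal sketch; the name wait_readable_retry is hypothetical, and the choice to restart with the full timeout is an assumption made to keep the example short.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical wrapper: wait up to `seconds` for fd to become readable,
 * calling select again whenever a caught signal interrupts it. */
int wait_readable_retry(int fd, long seconds)
{
    for (;;) {
        fd_set rset;
        struct timeval tv;
        int n;

        /* Rebuild both arguments on every pass instead of relying on
         * their contents after a failed call. */
        FD_ZERO(&rset);
        FD_SET(fd, &rset);
        tv.tv_sec = seconds;
        tv.tv_usec = 0;

        n = select(fd + 1, &rset, NULL, NULL, &tv);
        if (n >= 0 || errno != EINTR)
            return n;        /* ready count, timeout, or a real error */
        /* errno == EINTR: a signal handler ran, so try again */
    }
}
```

Because every retry starts a new timer, the total wait can exceed the requested number of seconds when signals arrive; a caller that needs a hard deadline has to track the elapsed time itself.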

Although the timeval structure lets us specify a resolution in microseconds, the actual resolution supported by the kernel is often more coarse. For example, many Unix kernels round the timeout value up to a multiple of 10 ms. There is also a scheduling latency involved, meaning it takes some time after the timer expires before the kernel schedules this process to run.

On some systems, select will fail with EINVAL if the tv_sec field in the timeout is over 100 million seconds. Of course, that's a very large timeout (over three years) and likely not very useful, but the point is that the timeval structure can represent values that are not supported by select.

The const qualifier on the timeout argument means it is not modified by select on return. For example, if we specify a time limit of 10 seconds, and select returns before the timer expires with one or more of the descriptors ready or with an error of EINTR, the timeval structure is not updated with the number of seconds remaining when the function returns. If we wish to know this value, we must obtain the system time before calling select, and then again when it returns, and subtract the two (any robust program will take into account that the system time may be adjusted by either the administrator or by a daemon like ntpd occasionally).

Some Linux versions modify the timeval structure. Therefore, for portability, assume the timeval structure is undefined upon return, and initialize it before each call to select. POSIX specifies the const qualifier.

The three middle arguments, readset, writeset, and exceptset, specify the descriptors that we want the kernel to test for reading, writing, and exception conditions. There are only two exception conditions currently supported:

  1. The arrival of out-of-band data for a socket. We will describe this in more detail in Chapter 24.

  2. The presence of control status information to be read from the master side of a pseudo-terminal that has been put into packet mode. We do not talk about pseudo-terminals in this book.

A design problem is how to specify one or more descriptor values for each of these three arguments. select uses descriptor sets, typically an array of integers, with each bit in each integer corresponding to a descriptor. For example, using 32-bit integers, the first element of the array corresponds to descriptors 0 through 31, the second element of the array corresponds to descriptors 32 through 63, and so on. All the implementation details are irrelevant to the application and are hidden in the fd_set datatype and the following four macros:
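To make the layout concrete, here is a toy version of such a descriptor set built on 32-bit integers. It is only an illustration of the idea: the toy_ names and the capacity of 256 are invented for this sketch, and a real fd_set may be implemented differently.

```c
#include <stdint.h>
#include <string.h>

#define TOY_SETSIZE 256     /* hypothetical capacity, in descriptors */
#define TOY_NBITS   32      /* bits per array element */

/* Toy descriptor set: one bit per descriptor, packed into 32-bit words. */
typedef struct {
    uint32_t bits[TOY_SETSIZE / TOY_NBITS];
} toy_fd_set;

/* Clear all bits. */
void toy_zero(toy_fd_set *s)
{
    memset(s, 0, sizeof(*s));
}

/* Turn on the bit for fd: element fd/32, bit fd%32. */
void toy_set(int fd, toy_fd_set *s)
{
    s->bits[fd / TOY_NBITS] |= (uint32_t)1 << (fd % TOY_NBITS);
}

/* Turn off the bit for fd. */
void toy_clr(int fd, toy_fd_set *s)
{
    s->bits[fd / TOY_NBITS] &= ~((uint32_t)1 << (fd % TOY_NBITS));
}

/* Return 1 if the bit for fd is on, 0 otherwise. */
int toy_isset(int fd, const toy_fd_set *s)
{
    return (s->bits[fd / TOY_NBITS] >> (fd % TOY_NBITS)) & 1u;
}
```

With this layout, descriptor 33 lives in the second array element (index 1) at bit position 1.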

void FD_ZERO(fd_set *fdset);

/* clear all bits in fdset */

void FD_SET(int fd, fd_set *fdset);

/* turn on the bit for fd in fdset */

void FD_CLR(int fd, fd_set *fdset);

/* turn off the bit for fd in fdset */

int FD_ISSET(int fd, fd_set *fdset);

/* is the bit for fd on in fdset ? */

We allocate a descriptor set of the fd_set datatype, we set and test the bits in the set using these macros, and we can also copy one descriptor set to another with the C assignment operator (=).

What we are describing, an array of integers using one bit per descriptor, is just one possible way to implement select. Nevertheless, it is common to refer to the individual descriptors within a descriptor set as bits, as in "turn on the bit for the listening descriptor in the read set."

We will see in Section 6.10 that the poll function uses a completely different representation: a variable-length array of structures with one structure per descriptor.

For example, to define a variable of type fd_set and then turn on the bits for descriptors 1, 4, and 5, we write


fd_set rset;

FD_ZERO(&rset);          /* initialize the set: all bits off */
FD_SET(1, &rset);        /* turn on bit for fd 1 */
FD_SET(4, &rset);        /* turn on bit for fd 4 */
FD_SET(5, &rset);        /* turn on bit for fd 5 */

It is important to initialize the set, since unpredictable results can occur if the set is allocated as an automatic variable and not initialized.

Any of the middle three arguments to select, readset, writeset, or exceptset, can be specified as a null pointer if we are not interested in that condition. Indeed, if all three pointers are null, then we have a higher precision timer than the normal Unix sleep function (which sleeps for multiples of a second). The poll function provides similar functionality. Figures C.9 and C.10 of APUE show a sleep_us function implemented using both select and poll that sleeps for multiples of a microsecond.
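A minimal sketch of such a timer built on select follows. It is written for this section and is not the code from the APUE figures; the unsigned long parameter type is an assumption.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Sleep for about usec microseconds using select with no descriptor sets.
 * Returns 0 when the timer expires, -1 on error (for example EINTR). */
int sleep_us(unsigned long usec)
{
    struct timeval tv;

    tv.tv_sec = usec / 1000000UL;
    tv.tv_usec = usec % 1000000UL;
    return select(0, NULL, NULL, NULL, &tv);
}
```

As noted earlier, the actual resolution depends on the kernel, so the sleep may last longer than requested.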

The maxfdp1 argument specifies the number of descriptors to be tested. Its value is the maximum descriptor to be tested plus one (hence our name of maxfdp1). The descriptors 0, 1, 2, up through and including maxfdp1 - 1 are tested.

The constant FD_SETSIZE, defined by including <sys/select.h>, is the number of descriptors in the fd_set datatype. Its value is often 1024, but few programs use that many descriptors. The maxfdp1 argument forces us to calculate the largest descriptor that we are interested in and then tell the kernel this value. For example, given the previous code that turns on the indicators for descriptors 1, 4, and 5, the maxfdp1 value is 6. The reason it is 6 and not 5 is that we are specifying the number of descriptors, not the largest value, and descriptors start at 0.
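The calculation can be sketched as a small helper; compute_maxfdp1 is a hypothetical name, and passing the descriptors as an array is an assumption of this example.

```c
#include <stddef.h>

/* Hypothetical helper: return the maxfdp1 argument for an array of
 * descriptors, or 0 for an empty array. */
int compute_maxfdp1(const int *fds, size_t nfds)
{
    int maxfd = -1;
    size_t i;

    for (i = 0; i < nfds; i++)
        if (fds[i] > maxfd)
            maxfd = fds[i];
    return maxfd + 1;    /* a count of descriptors, not the largest value */
}
```

For the descriptors 1, 4, and 5 used above, this returns 6.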

The reason this argument exists, along with the burden of calculating its value, is purely for efficiency. Although each fd_set has room for many descriptors, typically 1,024, this is much more than the number used by a typical process. The kernel gains efficiency by not copying unneeded portions of the descriptor set between the process and the kernel, and by not testing bits that are always 0 (Section 16.13 of TCPv2).

select modifies the descriptor sets pointed to by the readset, writeset, and exceptset pointers. These three arguments are value-result arguments. When we call the function, we specify the values of the descriptors that we are interested in, and on return, the result indicates which descriptors are ready. We use the FD_ISSET macro on return to test a specific descriptor in an fd_set structure. Any descriptor that is not ready on return will have its corresponding bit cleared in the descriptor set. To handle this, we turn on all the bits in which we are interested in all the descriptor sets each time we call select.
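One common way to turn the bits back on before each call is to keep the descriptors of interest in a separate set and copy it into the set passed to select. The sketch below assumes two descriptors and a polling timeout; count_readable_rounds is a hypothetical name chosen for the example.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: call select `rounds` times on two descriptors and
 * count how often fd_a is reported readable. The descriptors of interest
 * are kept in `allset` and copied into `rset` before every call, because
 * select overwrites `rset` with its result. */
int count_readable_rounds(int fd_a, int fd_b, int rounds)
{
    fd_set allset, rset;
    int maxfdp1 = (fd_a > fd_b ? fd_a : fd_b) + 1;
    int hits = 0;
    int i;

    FD_ZERO(&allset);
    FD_SET(fd_a, &allset);
    FD_SET(fd_b, &allset);

    for (i = 0; i < rounds; i++) {
        struct timeval tv = { 0, 0 };    /* poll; reinitialized each pass */

        rset = allset;                   /* structure assignment restores the bits */
        if (select(maxfdp1, &rset, NULL, NULL, &tv) < 0)
            return -1;
        if (FD_ISSET(fd_a, &rset))       /* bit survives only if fd_a is ready */
            hits++;
    }
    return hits;
}
```

Without the assignment from allset, a descriptor that was not ready in one round would have its bit cleared and would never be tested again.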

The two most common programming errors when using select are to forget to add one to the largest descriptor number and to forget that the descriptor sets are value-result arguments. The second error results in select being called with a bit set to 0 in the descriptor set, when we think that bit is 1.

The return value from this function indicates the total number of bits that are ready across all the descriptor sets. If the timer value expires before any of the descriptors are ready, a value of 0 is returned. A return value of -1 indicates an error (which can happen, for example, if the function is interrupted by a caught signal).
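The sketch below shows the return value counting bits across more than one set. The helper poll_pair is hypothetical; it polls one descriptor for reading and another for writing.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: poll rfd for reading and wfd for writing.
 * Stores the individual results in *can_read and *can_write and
 * returns whatever select returns. */
int poll_pair(int rfd, int wfd, int *can_read, int *can_write)
{
    fd_set rset, wset;
    struct timeval tv = { 0, 0 };
    int maxfdp1 = (rfd > wfd ? rfd : wfd) + 1;
    int n;

    FD_ZERO(&rset);
    FD_SET(rfd, &rset);
    FD_ZERO(&wset);
    FD_SET(wfd, &wset);

    n = select(maxfdp1, &rset, &wset, NULL, &tv);
    *can_read = (n > 0) && FD_ISSET(rfd, &rset);
    *can_write = (n > 0) && FD_ISSET(wfd, &wset);
    return n;    /* > 0: ready bits in all sets; 0: timeout; -1: error */
}
```

With the two ends of a pipe that holds unread data and still has room for more, we would expect a return value of 2: one bit in the read set and one in the write set.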

Early releases of SVR4 had a bug in their implementation of select: If the same bit was on in multiple sets, say a descriptor was ready for both reading and writing, it was counted only once. Current releases fix this bug.

Under What Conditions Is a Descriptor Ready?

We have been talking about waiting for a descriptor to become ready for I/O (reading or writing) or to have an exception condition pending on it (out-of-band data). While readability and writability are obvious for descriptors such as regular files, we must be more specific about the conditions that cause select to return "ready" for sockets (Figure 16.52 of TCPv2).

  1. A socket is ready for reading if any of the following four conditions is true:

    1. The number of bytes of data in the socket receive buffer is greater than or equal to the current size of the low-water mark for the socket receive buffer. A read operation on the socket will not block and will return a value greater than 0 (i.e., the data that is ready to be read). We can set this low-water mark using the SO_RCVLOWAT socket option. It defaults to 1 for TCP and UDP sockets.

    2. The read half of the connection is closed (i.e., a TCP connection that has received a FIN). A read operation on the socket will not block and will return 0 (i.e., EOF).

    3. The socket is a listening socket and the number of completed connections is nonzero. An accept on the listening socket will normally not block, although we will describe a timing condition in Section 16.6 under which the accept can block.

    4. A socket error is pending. A read operation on the socket will not block and will return an error (-1) with errno set to the specific error condition. These pending errors can also be fetched and cleared by calling getsockopt and specifying the SO_ERROR socket option.

  2. A socket is ready for writing if any of the following four conditions is true:

    1. The number of bytes of available space in the socket send buffer is greater than or equal to the current size of the low-water mark for the socket send buffer and either: (i) the socket is connected, or (ii) the socket does not require a connection (e.g., UDP). This means that if we set the socket to nonblocking (Chapter 16), a write operation will not block and will return a positive value (e.g., the number of bytes accepted by the transport layer). We can set this low-water mark using the SO_SNDLOWAT socket option. This low-water mark normally defaults to 2048 for TCP and UDP sockets.

    2. The write half of the connection is closed. A write operation on the socket will generate SIGPIPE (Section 5.13).

    3. A socket using a nonblocking connect has completed the connection, or the connect has failed.

    4. A socket error is pending. A write operation on the socket will not block and will return an error (-1) with errno set to the specific error condition. These pending errors can also be fetched and cleared by calling getsockopt with the SO_ERROR socket option.

  3. A socket has an exception condition pending if there is out-of-band data for the socket or the socket is still at the out-of-band mark. (We will describe out-of-band data in Chapter 24.)

    Our definitions of "readable" and "writable" are taken directly from the kernel's soreadable and sowriteable macros on pp. 530-531 of TCPv2. Similarly, our definition of the "exception condition" for a socket is from the soo_select function on these same pages.

Notice that when an error occurs on a socket, it is marked as both readable and writable by select.
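When select reports a socket as readable or writable because of an error, the pending error can be fetched with the SO_ERROR option, as in this minimal sketch (pending_socket_error is a hypothetical name):

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: fetch and clear the pending error on a socket.
 * Returns the pending error (0 if none), or -1 if getsockopt itself
 * fails, in which case errno describes the failure. */
int pending_socket_error(int sockfd)
{
    int error = 0;
    socklen_t len = sizeof(error);

    if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &error, &len) < 0)
        return -1;
    return error;
}
```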

The purpose of the receive and send low-water marks is to give the application control over how much data must be available for reading or how much space must be available for writing before select returns a readable or writable status. For example, if we know that our application has nothing productive to do unless at least 64 bytes of data are present, we can set the receive low-water mark to 64 to prevent select from waking us up if less than 64 bytes are ready for reading.
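Setting the receive low-water mark to 64 might look like the following sketch. The helper name set_rcv_lowat is hypothetical, and the code assumes the platform lets the application change this option; where it does not, setsockopt fails and the helper returns -1.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: set the receive low-water mark and return the
 * value the kernel reports afterwards, or -1 on failure. */
int set_rcv_lowat(int sockfd, int nbytes)
{
    int val = nbytes;
    socklen_t len = sizeof(val);

    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVLOWAT, &val, sizeof(val)) < 0)
        return -1;
    if (getsockopt(sockfd, SOL_SOCKET, SO_RCVLOWAT, &val, &len) < 0)
        return -1;
    return val;
}
```

A call such as set_rcv_lowat(sockfd, 64) would be made once, before the socket is added to the read set.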

As long as the send low-water mark for a UDP socket is less than the send buffer size (which should always be the default relationship), the UDP socket is always writable, since a connection is not required.

Figure 6.7 summarizes the conditions just described that cause a socket to be ready for select.

Figure 6.7. Summary of conditions that cause a socket to be ready for select.


Maximum Number of Descriptors for select

We said earlier that most applications do not use lots of descriptors. It is rare, for example, to find an application that uses hundreds of descriptors. But, such applications do exist, and they often use select to multiplex the descriptors. When select was originally designed, the OS normally had an upper limit on the maximum number of descriptors per process (the 4.2BSD limit was 31), and select just used this same limit. But, current versions of Unix allow for a virtually unlimited number of descriptors per process (often limited only by the amount of memory and any administrative limits), so the question is: How does this affect select?

Many implementations have declarations similar to the following, which are taken from the 4.4BSD <sys/types.h> header:


/*
 * Select uses bitmasks of file descriptors in longs. These macros
 * manipulate such bit fields (the filesystem macros use chars).
 * FD_SETSIZE may be defined by the user, but the default here should
 * be enough for most uses.
 */
#ifndef FD_SETSIZE
#define FD_SETSIZE      256
#endif

This makes us think that we can just #define FD_SETSIZE to some larger value before including this header to increase the size of the descriptor sets used by select. Unfortunately, this normally does not work.

To see what is wrong, notice that Figure 16.53 of TCPv2 declares three descriptor sets within the kernel and also uses the kernel's definition of FD_SETSIZE as the upper limit. The only way to increase the size of the descriptor sets is to increase the value of FD_SETSIZE and then recompile the kernel. Changing the value without recompiling the kernel is inadequate.

Some vendors are changing their implementation of select to allow the process to define FD_SETSIZE to a larger value than the default. BSD/OS has changed the kernel implementation to allow larger descriptor sets, and it also provides four new FD_xxx macros to dynamically allocate and manipulate these larger sets. From a portability standpoint, however, beware of using large descriptor sets.
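Since a descriptor whose value is FD_SETSIZE or greater does not fit in an fd_set, one defensive measure is to check the value before turning on its bit. The wrapper below is a sketch of that idea; fd_set_checked is a hypothetical name and is not one of the vendor-provided macros mentioned above.

```c
#include <sys/select.h>

/* Hypothetical wrapper around FD_SET: refuse descriptors that do not fit
 * in an fd_set instead of writing outside the set.
 * Returns 0 on success, -1 if fd is out of range. */
int fd_set_checked(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    FD_SET(fd, set);
    return 0;
}
```

A program that can hold more than FD_SETSIZE descriptors open could treat a return of -1 as a signal to close the descriptor or to handle it by some other means, such as poll (Section 6.10).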
