Home
小杰的博客 Prev Page Prev Page
?
Main Page
Table of content
Copyright
Addison-Wesley Professional Computing Series
Foreword
Preface
Introduction
Changes from the Second Edition
Using This Book
Source Code and Errata Availability
Acknowledgments
Part 1: Introduction and TCP/IP
Chapter 1. Introduction
1.1 Introduction
1.2 A Simple Daytime Client
1.3 Protocol Independence
1.4 Error Handling: Wrapper Functions
1.5 A Simple Daytime Server
1.6 Roadmap to Client/Server Examples in the Text
1.7 OSI Model
1.8 BSD Networking History
1.9 Test Networks and Hosts
1.10 Unix Standards
1.11 64-Bit Architectures
1.12 Summary
Exercises
Chapter 2. The Transport Layer: TCP, UDP, and SCTP
2.1 Introduction
2.2 The Big Picture
2.3 User Datagram Protocol (UDP)
2.4 Transmission Control Protocol (TCP)
2.5 Stream Control Transmission Protocol (SCTP)
2.6 TCP Connection Establishment and Termination
2.7 TIME_WAIT State
2.8 SCTP Association Establishment and Termination
2.9 Port Numbers
2.10 TCP Port Numbers and Concurrent Servers
2.11 Buffer Sizes and Limitations
2.12 Standard Internet Services
2.13 Protocol Usage by Common Internet Applications
2.14 Summary
Exercises
Part 2: Elementary Sockets
Chapter 3. Sockets Introduction
3.1 Introduction
3.2 Socket Address Structures
3.3 Value-Result Arguments
3.4 Byte Ordering Functions
3.5 Byte Manipulation Functions
3.6 'inet_aton', 'inet_addr', and 'inet_ntoa' Functions
3.7 'inet_pton' and 'inet_ntop' Functions
3.8 'sock_ntop' and Related Functions
3.9 'readn', 'writen', and 'readline' Functions
3.10 Summary
Exercises
Chapter 4. Elementary TCP Sockets
4.1 Introduction
4.2 'socket' Function
4.3 'connect' Function
4.4 'bind' Function
4.5 'listen' Function
4.6 'accept' Function
4.7 'fork' and 'exec' Functions
4.8 Concurrent Servers
4.9 'close' Function
4.10 'getsockname' and 'getpeername' Functions
4.11 Summary
Exercises
Chapter 5. TCP Client/Server Example
5.1 Introduction
5.2 TCP Echo Server: 'main' Function
5.3 TCP Echo Server: 'str_echo' Function
5.4 TCP Echo Client: 'main' Function
5.5 TCP Echo Client: 'str_cli' Function
5.6 Normal Startup
5.7 Normal Termination
5.8 POSIX Signal Handling
5.9 Handling 'SIGCHLD' Signals
5.10 'wait' and 'waitpid' Functions
5.11 Connection Abort before 'accept' Returns
5.12 Termination of Server Process
5.13 'SIGPIPE' Signal
5.14 Crashing of Server Host
5.15 Crashing and Rebooting of Server Host
5.16 Shutdown of Server Host
5.17 Summary of TCP Example
5.18 Data Format
5.19 Summary
Exercises
Chapter 6. I/O Multiplexing: The 'select' and 'poll' Functions
6.1 Introduction
6.2 I/O Models
6.3 'select' Function
6.4 'str_cli' Function (Revisited)
6.5 Batch Input and Buffering
6.6 'shutdown' Function
6.7 'str_cli' Function (Revisited Again)
6.8 TCP Echo Server (Revisited)
6.9 'pselect' Function
6.10 'poll' Function
6.11 TCP Echo Server (Revisited Again)
6.12 Summary
Exercises
Chapter 7. Socket Options
7.1 Introduction
7.2 'getsockopt' and 'setsockopt' Functions
7.3 Checking if an Option Is Supported and Obtaining the Default
7.4 Socket States
7.5 Generic Socket Options
7.6 IPv4 Socket Options
7.7 ICMPv6 Socket Option
7.8 IPv6 Socket Options
7.9 TCP Socket Options
7.10 SCTP Socket Options
7.11 'fcntl' Function
7.12 Summary
Exercises
Chapter 8. Elementary UDP Sockets
8.1 Introduction
8.2 'recvfrom' and 'sendto' Functions
8.3 UDP Echo Server: 'main' Function
8.4 UDP Echo Server: 'dg_echo' Function
8.5 UDP Echo Client: 'main' Function
8.6 UDP Echo Client: 'dg_cli' Function
8.7 Lost Datagrams
8.8 Verifying Received Response
8.9 Server Not Running
8.10 Summary of UDP Example
8.11 'connect' Function with UDP
8.12 'dg_cli' Function (Revisited)
8.13 Lack of Flow Control with UDP
8.14 Determining Outgoing Interface with UDP
8.15 TCP and UDP Echo Server Using 'select'
8.16 Summary
Exercises
Chapter 9. Elementary SCTP Sockets
9.1 Introduction
9.2 Interface Models
9.3 'sctp_bindx' Function
9.4 'sctp_connectx' Function
9.5 'sctp_getpaddrs' Function
9.6 'sctp_freepaddrs' Function
9.7 'sctp_getladdrs' Function
9.8 'sctp_freeladdrs' Function
9.9 'sctp_sendmsg' Function
9.10 'sctp_recvmsg' Function
9.11 'sctp_opt_info' Function
9.12 'sctp_peeloff' Function
9.13 'shutdown' Function
9.14 Notifications
9.15 Summary
Exercises
Chapter 10. SCTP Client/Server Example
10.1 Introduction
10.2 SCTP One-to-Many-Style Streaming Echo Server: 'main' Function
10.3 SCTP One-to-Many-Style Streaming Echo Client: 'main' Function
10.4 SCTP Streaming Echo Client: 'str_cli' Function
10.5 Exploring Head-of-Line Blocking
10.6 Controlling the Number of Streams
10.7 Controlling Termination
10.8 Summary
Exercises
Chapter 11. Name and Address Conversions
11.1 Introduction
11.2 Domain Name System (DNS)
11.3 'gethostbyname' Function
11.4 'gethostbyaddr' Function
11.5 'getservbyname' and 'getservbyport' Functions
11.6 'getaddrinfo' Function
11.7 'gai_strerror' Function
11.8 'freeaddrinfo' Function
11.9 'getaddrinfo' Function: IPv6
11.10 'getaddrinfo' Function: Examples
11.11 'host_serv' Function
11.12 'tcp_connect' Function
11.13 'tcp_listen' Function
11.14 'udp_client' Function
11.15 'udp_connect' Function
11.16 'udp_server' Function
11.17 'getnameinfo' Function
11.18 Re-entrant Functions
11.19 'gethostbyname_r' and 'gethostbyaddr_r' Functions
11.20 Obsolete IPv6 Address Lookup Functions
11.21 Other Networking Information
11.22 Summary
Exercises
Part 3: Advanced Sockets
Chapter 12. IPv4 and IPv6 Interoperability
12.1 Introduction
12.2 IPv4 Client, IPv6 Server
12.3 IPv6 Client, IPv4 Server
12.4 IPv6 Address-Testing Macros
12.5 Source Code Portability
12.6 Summary
Exercises
Chapter 13. Daemon Processes and the 'inetd' Superserver
13.1 Introduction
13.2 'syslogd' Daemon
13.3 'syslog' Function
13.4 'daemon_init' Function
13.5 'inetd' Daemon
13.6 'daemon_inetd' Function
13.7 Summary
Exercises
Chapter 14. Advanced I/O Functions
14.1 Introduction
14.2 Socket Timeouts
14.3 'recv' and 'send' Functions
14.4 'readv' and 'writev' Functions
14.5 'recvmsg' and 'sendmsg' Functions
14.6 Ancillary Data
14.7 How Much Data Is Queued?
14.8 Sockets and Standard I/O
14.9 Advanced Polling
14.10 Summary
Exercises
Chapter 15. Unix Domain Protocols
15.1 Introduction
15.2 Unix Domain Socket Address Structure
15.3 'socketpair' Function
15.4 Socket Functions
15.5 Unix Domain Stream Client/Server
15.6 Unix Domain Datagram Client/Server
15.7 Passing Descriptors
15.8 Receiving Sender Credentials
15.9 Summary
Exercises
Chapter 16. Nonblocking I/O
16.1 Introduction
16.2 Nonblocking Reads and Writes: 'str_cli' Function (Revisited)
16.3 Nonblocking 'connect'
16.4 Nonblocking 'connect:' Daytime Client
16.5 Nonblocking 'connect:' Web Client
16.6 Nonblocking 'accept'
16.7 Summary
Exercises
Chapter 17. 'ioctl' Operations
17.1 Introduction
17.2 'ioctl' Function
17.3 Socket Operations
17.4 File Operations
17.5 Interface Configuration
17.6 'get_ifi_info' Function
17.7 Interface Operations
17.8 ARP Cache Operations
17.9 Routing Table Operations
17.10 Summary
Exercises
Chapter 18. Routing Sockets
18.1 Introduction
18.2 Datalink Socket Address Structure
18.3 Reading and Writing
18.4 'sysctl' Operations
18.5 'get_ifi_info' Function (Revisited)
18.6 Interface Name and Index Functions
18.7 Summary
Exercises
Chapter 19. Key Management Sockets
19.1 Introduction
19.2 Reading and Writing
19.3 Dumping the Security Association Database (SADB)
19.4 Creating a Static Security Association (SA)
19.5 Dynamically Maintaining SAs
19.6 Summary
Exercises
Chapter 20. Broadcasting
20.1 Introduction
20.2 Broadcast Addresses
20.3 Unicast versus Broadcast
20.4 'dg_cli' Function Using Broadcasting
20.5 Race Conditions
20.6 Summary
Exercises
Chapter 21. Multicasting
21.1 Introduction
21.2 Multicast Addresses
21.3 Multicasting versus Broadcasting on a LAN
21.4 Multicasting on a WAN
21.5 Source-Specific Multicast
21.6 Multicast Socket Options
21.7 'mcast_join' and Related Functions
21.8 'dg_cli' Function Using Multicasting
21.9 Receiving IP Multicast Infrastructure Session Announcements
21.10 Sending and Receiving
21.11 Simple Network Time Protocol (SNTP)
21.12 Summary
Exercises
Chapter 22. Advanced UDP Sockets
22.1 Introduction
22.2 Receiving Flags, Destination IP Address, and Interface Index
22.3 Datagram Truncation
22.4 When to Use UDP Instead of TCP
22.5 Adding Reliability to a UDP Application
22.6 Binding Interface Addresses
22.7 Concurrent UDP Servers
22.8 IPv6 Packet Information
22.9 IPv6 Path MTU Control
22.10 Summary
Exercises
Chapter 23. Advanced SCTP Sockets
23.1 Introduction
23.2 An Autoclosing One-to-Many-Style Server
23.3 Partial Delivery
23.4 Notifications
23.5 Unordered Data
23.6 Binding a Subset of Addresses
23.7 Determining Peer and Local Address Information
23.8 Finding an Association ID Given an IP Address
23.9 Heartbeating and Address Failure
23.10 Peeling Off an Association
23.11 Controlling Timing
23.12 When to Use SCTP Instead of TCP
23.13 Summary
Exercises
Chapter 24. Out-of-Band Data
24.1 Introduction
24.2 TCP Out-of-Band Data
24.3 'sockatmark' Function
24.4 TCP Out-of-Band Data Recap
24.5 Summary
Exercises
Chapter 25. Signal-Driven I/O
25.1 Introduction
25.2 Signal-Driven I/O for Sockets
25.3 UDP Echo Server Using 'SIGIO'
25.4 Summary
Exercises
Chapter 26. Threads
26.1 Introduction
26.2 Basic Thread Functions: Creation and Termination
26.3 'str_cli' Function Using Threads
26.4 TCP Echo Server Using Threads
26.5 Thread-Specific Data
26.6 Web Client and Simultaneous Connections (Continued)
26.7 Mutexes: Mutual Exclusion
26.8 Condition Variables
26.9 Web Client and Simultaneous Connections (Continued)
26.10 Summary
Exercises
Chapter 27. IP Options
27.1 Introduction
27.2 IPv4 Options
27.3 IPv4 Source Route Options
27.4 IPv6 Extension Headers
27.5 IPv6 Hop-by-Hop Options and Destination Options
27.6 IPv6 Routing Header
27.7 IPv6 Sticky Options
27.8 Historical IPv6 Advanced API
27.9 Summary
Exercises
Chapter 28. Raw Sockets
28.1 Introduction
28.2 Raw Socket Creation
28.3 Raw Socket Output
28.4 Raw Socket Input
28.5 'ping' Program
28.6 'traceroute' Program
28.7 An ICMP Message Daemon
28.8 Summary
Exercises
Chapter 29. Datalink Access
29.1 Introduction
29.2 BSD Packet Filter (BPF)
29.3 Datalink Provider Interface (DLPI)
29.4 Linux: 'SOCK_PACKET' and 'PF_PACKET'
29.5 'libpcap': Packet Capture Library
29.6 'libnet': Packet Creation and Injection Library
29.7 Examining the UDP Checksum Field
29.8 Summary
Exercises
Chapter 30. Client/Server Design Alternatives
30.1 Introduction
30.2 TCP Client Alternatives
30.3 TCP Test Client
30.4 TCP Iterative Server
30.5 TCP Concurrent Server, One Child per Client
30.6 TCP Preforked Server, No Locking Around 'accept'
30.7 TCP Preforked Server, File Locking Around 'accept'
30.8 TCP Preforked Server, Thread Locking Around 'accept'
30.9 TCP Preforked Server, Descriptor Passing
30.10 TCP Concurrent Server, One Thread per Client
30.11 TCP Prethreaded Server, per-Thread 'accept'
30.12 TCP Prethreaded Server, Main Thread 'accept'
30.13 Summary
Exercises
Chapter 31. Streams
31.1 Introduction
31.2 Overview
31.3 'getmsg' and 'putmsg' Functions
31.4 'getpmsg' and 'putpmsg' Functions
31.5 'ioctl' Function
31.6 Transport Provider Interface (TPI)
31.7 Summary
Exercises
Appendix A. IPv4, IPv6, ICMPv4, and ICMPv6
A.1 Introduction
A.2 IPv4 Header
A.3 IPv6 Header
A.4 IPv4 Addresses
A.5 IPv6 Addresses
A.6 Internet Control Message Protocols (ICMPv4 and ICMPv6)
Appendix B. Virtual Networks
B.1 Introduction
B.2 The MBone
B.3 The 6bone
B.4 IPv6 Transition: 6to4
Appendix C. Debugging Techniques
C.1 System Call Tracing
C.2 Standard Internet Services
C.3 'sock' Program
C.4 Small Test Programs
C.5 'tcpdump' Program
C.6 'netstat' Program
C.7 'lsof' Program
Appendix D. Miscellaneous Source Code
D.1 'unp.h' Header
D.2 'config.h' Header
D.3 Standard Error Functions
Appendix E. Solutions to Selected Exercises
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17
Chapter 18
Chapter 20
Chapter 21
Chapter 22
Chapter 24
Chapter 25
Chapter 26
Chapter 27
Chapter 28
Chapter 29
Chapter 30
Chapter 31
Bibliography
?
[ Team LiB ] Previous Section Next Section

2.11 Buffer Sizes and Limitations

Certain limits affect the size of IP datagrams. We first describe these limits and then tie them all together with regard to how they affect the data an application can transmit.

  • The maximum size of an IPv4 datagram is 65,535 bytes, including the IPv4 header. This is because of the 16-bit total length field in Figure A.1.

  • The maximum size of an IPv6 datagram is 65,575 bytes, including the 40-byte IPv6 header. This is because of the 16-bit payload length field in Figure A.2. Notice that the IPv6 payload length field does not include the size of the IPv6 header, while the IPv4 total length field does include the header size.

    IPv6 has a jumbo payload option, which extends the payload length field to 32 bits, but this option is supported only on datalinks with a maximum transmission unit (MTU) that exceeds 65,535. (This is intended for host-to-host interconnects, such as HIPPI, which often have no inherent MTU.)

  • Many networks have an MTU which can be dictated by the hardware. For example, the Ethernet MTU is 1,500 bytes. Other datalinks, such as point-to-point links using the Point-to-Point Protocol (PPP), have a configurable MTU. Older SLIP links often used an MTU of 1,006 or 296 bytes.

    The minimum link MTU for IPv4 is 68 bytes. This permits a maximum-sized IPv4 header (20 bytes of fixed header, 30 bytes of options) and minimum-sized fragment (the fragment offset is in units of 8 bytes). The minimum link MTU for IPv6 is 1,280 bytes. IPv6 can run over links with a smaller MTU, but requires link-specific fragmentation and reassembly to make the link appear to have an MTU of at least 1,280 bytes (RFC 2460 [Deering and Hinden 1998]).

  • The smallest MTU in the path between two hosts is called the path MTU. Today, the Ethernet MTU of 1,500 bytes is often the path MTU. The path MTU need not be the same in both directions between any two hosts because routing in the Internet is often asymmetric [Paxson 1996]. That is, the route from A to B can differ from the route from B to A.

  • When an IP datagram is to be sent out an interface, if the size of the datagram exceeds the link MTU, fragmentation is performed by both IPv4 and IPv6. The fragments are not normally reassembled until they reach the final destination. IPv4 hosts perform fragmentation on datagrams that they generate and IPv4 routers perform fragmentation on datagrams that they forward. But with IPv6, only hosts perform fragmentation on datagrams that they generate; IPv6 routers do not fragment datagrams that they are forwarding.

    We must be careful with our terminology. A box labeled as an IPv6 router may indeed perform fragmentation, but only on datagrams that the router itself generates, never on datagrams that it is forwarding. When this box generates IPv6 datagrams, it is really acting as a host. For example, most routers support the Telnet protocol and this is used for router configuration by administrators. The IP datagrams generated by the router's Telnet server are generated by the router, not forwarded by the router.

    You may notice that fields exist in the IPv4 header (Figure A.1) to handle IPv4 fragmentation, but there are no fields in the IPv6 header (Figure A.2) for fragmentation. Since fragmentation is the exception, rather than the rule, IPv6 contains an option header with the fragmentation information.

    Certain firewalls, which usually act as routers, may reassemble fragmented packets to allow inspection of the entire packet contents. This allows the prevention of certain attacks at the cost of additional complexity in the firewall device. It also requires the firewall device to be part of the only path to the network, reducing the opportunities for redundancy.

  • If the "don't fragment" (DF) bit is set in the IPv4 header (Figure A.1), it specifies that this datagram must not be fragmented, either by the sending host or by any router. A router that receives an IPv4 datagram with the DF bit set whose size exceeds the outgoing link's MTU generates an ICMPv4 "destination unreachable, fragmentation needed but DF bit set" error message (Figure A.15).

    Since IPv6 routers do not perform fragmentation, there is an implied DF bit with every IPv6 datagram. When an IPv6 router receives a datagram whose size exceeds the outgoing link's MTU, it generates an ICMPv6 "packet too big" error message (Figure A.16).

    The IPv4 DF bit and its implied IPv6 counterpart can be used for path MTU discovery (RFC 1191 [Mogul and Deering 1990] for IPv4 and RFC 1981 [McCann, Deering, and Mogul 1996] for IPv6). For example, if TCP uses this technique with IPv4, then it sends all its datagrams with the DF bit set. If some intermediate router returns an ICMP "destination unreachable, fragmentation needed but DF bit set" error, TCP decreases the amount of data it sends per datagram and retransmits. Path MTU discovery is optional with IPv4, but IPv6 implementations all either support path MTU discovery or always send using the minimum MTU.

    Path MTU discovery is problematic in the Internet today; many firewalls drop all ICMP messages, including the fragmentation required message, meaning that TCP never gets the signal that it needs to decrease the amount of data it is sending. As of this writing, an effort is beginning in the IETF to define another method for path MTU discovery that does not rely on ICMP errors.

  • IPv4 and IPv6 define a minimum reassembly buffer size, the minimum datagram size that we are guaranteed any implementation must support. For IPv4, this is 576 bytes. IPv6 raises this to 1,500 bytes. With IPv4, for example, we have no idea whether a given destination can accept a 577-byte datagram or not. Therefore, many IPv4 applications that use UDP (e.g., DNS, RIP, TFTP, BOOTP, SNMP) prevent applications from generating IP datagrams that exceed this size.

  • TCP has a maximum segment size (MSS) that announces to the peer TCP the maximum amount of TCP data that the peer can send per segment. We saw the MSS option on the SYN segments in Figure 2.5. The goal of the MSS is to tell the peer the actual value of the reassembly buffer size and to try to avoid fragmentation. The MSS is often set to the interface MTU minus the fixed sizes of the IP and TCP headers. On an Ethernet using IPv4, this would be 1,460, and on an Ethernet using IPv6, this would be 1,440. (The TCP header is 20 bytes for both, but the IPv4 header is 20 bytes and the IPv6 header is 40 bytes.)

    The MSS value in the TCP MSS option is a 16-bit field, limiting the value to 65,535. This is fine for IPv4, since the maximum amount of TCP data in an IPv4 datagram is 65,495 (65,535 minus the 20-byte IPv4 header and minus the 20-byte TCP header). But with the IPv6 jumbo payload option, a different technique is used (RFC 2675 [Borman, Deering, and Hinden 1999]). First, the maximum amount of TCP data in an IPv6 datagram without the jumbo payload option is 65,515 (65,535 minus the 20-byte TCP header). Therefore, the MSS value of 65,535 is considered a special case that designates "infinity." This value is used only if the jumbo payload option is being used, which requires an MTU that exceeds 65,535. If TCP is using the jumbo payload option and receives an MSS announcement of 65,535 from the peer, the limit on the datagram sizes that it sends is just the interface MTU. If this turns out to be too large (i.e., there is a link in the path with a smaller MTU), then path MTU discovery will determine the smaller value.

  • SCTP keeps a fragmentation point based on the smallest path MTU found to all the peer's addresses. This smallest MTU size is used to split large user messages into smaller pieces that can be sent in one IP datagram. The SCTP_MAXSEG socket option can influence this value, allowing the user to request a smaller fragmentation point.

TCP Output

Given all these terms and definitions, Figure 2.15 shows what happens when an application writes data to a TCP socket.

Figure 2.15. Steps and buffers involved when an application writes to a TCP socket.

graphics/02fig15.gif

Every TCP socket has a send buffer and we can change the size of this buffer with the SO_SNDBUF socket option (Section 7.5). When an application calls write, the kernel copies all the data from the application buffer into the socket send buffer. If there is insufficient room in the socket buffer for all the application's data (either the application buffer is larger than the socket send buffer, or there is already data in the socket send buffer), the process is put to sleep. This assumes the normal default of a blocking socket. (We will talk about nonblocking sockets in Chapter 16.) The kernel will not return from the write until the final byte in the application buffer has been copied into the socket send buffer. Therefore, the successful return from a write to a TCP socket only tells us that we can reuse our application buffer. It does not tell us that either the peer TCP has received the data or that the peer application has received the data. (We will talk about this more with the SO_LINGER socket option in Section 7.5.)

TCP takes the data in the socket send buffer and sends it to the peer TCP based on all the rules of TCP data transmission (Chapter 19 and 20 of TCPv1). The peer TCP must acknowledge the data, and as the ACKs arrive from the peer, only then can our TCP discard the acknowledged data from the socket send buffer. TCP must keep a copy of our data until it is acknowledged by the peer.

TCP sends the data to IP in MSS-sized or smaller chunks, prepending its TCP header to each segment, where the MSS is the value announced by the peer, or 536 if the peer did not send an MSS option. IP prepends its header, searches the routing table for the destination IP address (the matching routing table entry specifies the outgoing interface), and passes the datagram to the appropriate datalink. IP might perform fragmentation before passing the datagram to the datalink, but as we said earlier, one goal of the MSS option is to try to avoid fragmentation and newer implementations also use path MTU discovery. Each datalink has an output queue, and if this queue is full, the packet is discarded and an error is returned up the protocol stack: from the datalink to IP and then from IP to TCP. TCP will note this error and try sending the segment later. The application is not told of this transient condition.

UDP Output

Figure 2.16 shows what happens when an application writes data to a UDP socket.

Figure 2.16. Steps and buffers involved when an application writes to a UDP socket.

graphics/02fig16.gif

This time, we show the socket send buffer as a dashed box because it doesn't really exist. A UDP socket has a send buffer size (which we can change with the SO_SNDBUF socket option, Section 7.5), but this is simply an upper limit on the maximum-sized UDP datagram that can be written to the socket. If an application writes a datagram larger than the socket send buffer size, EMSGSIZE is returned. Since UDP is unreliable, it does not need to keep a copy of the application's data and does not need an actual send buffer. (The application data is normally copied into a kernel buffer of some form as it passes down the protocol stack, but this copy is discarded by the datalink layer after the data is transmitted.)

UDP simply prepends its 8-byte header and passes the datagram to IP. IPv4 or IPv6 prepends its header, determines the outgoing interface by performing the routing function, and then either adds the datagram to the datalink output queue (if it fits within the MTU) or fragments the datagram and adds each fragment to the datalink output queue. If a UDP application sends large datagrams (say 2,000-byte datagrams), there is a much higher probability of fragmentation than with TCP, because TCP breaks the application data into MSS-sized chunks, something that has no counterpart in UDP.

The successful return from a write to a UDP socket tells us that either the datagram or all fragments of the datagram have been added to the datalink output queue. If there is no room on the queue for the datagram or one of its fragments, ENOBUFS is often returned to the application.

Unfortunately, some implementations do not return this error, giving the application no indication that the datagram was discarded without even being transmitted.

SCTP Output

Figure 2.17 shows what happens when an application writes data to an SCTP socket.

Figure 2.17. Steps and buffers involved when an application writes to an SCTP socket.

graphics/02fig17.gif

SCTP, since it is a reliable protocol like TCP, has a send buffer. As with TCP, an application can change the size of this buffer with the SO_SNDBUF socket option (Section 7.5). When the application calls write, the kernel copies all the data from the application buffer into the socket send buffer. If there is insufficient room in the socket buffer for all of the application's data (either the application buffer is larger than the socket send buffer, or there is already data in the socket send buffer), the process is put to sleep. This sleeping assumes the normal default of a blocking socket. (We will talk about nonblocking sockets in Chapter 16.) The kernel will not return from the write until the final byte in the application buffer has been copied into the socket send buffer. Therefore, the successful return from a write to an SCTP socket only tells the sender that it can reuse the application buffer. It does not tell us that either the peer SCTP has received the data, or that the peer application has received the data.

SCTP takes the data in the socket send buffer and sends it to the peer SCTP based on all the rules of SCTP data transmission (for details of data transfer, see Chapter 5 of [Stewart and Xie 2001]). The sending SCTP must await a SACK in which the cumulative acknowledgment point passes the sent data before that data can be removed from the socket buffer.

[ Team LiB ] Previous Section Next Section
Converted from CHM to HTML with chm2web Pro 2.85 (unicode)