3.9 readn, writen, and readline Functions
Stream sockets (e.g., TCP sockets) exhibit a
behavior with the read and write functions that
differs from normal file I/O. A read or write on
a stream socket might input or output fewer bytes than requested,
but this is not an error condition. The reason is that buffer
limits might be reached for the socket in the kernel. All that is
required to input or output the remaining bytes is for the caller
to invoke the read or write function again. Some
versions of Unix also exhibit this behavior when writing more than
4,096 bytes to a pipe. This scenario is always a possibility on a
stream socket with read, but is normally seen with
write only if the socket is nonblocking. Nevertheless, we
always call our writen function instead of write,
in case the implementation returns a short count.
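Here is a minimal demonstration of a short count. It assumes a Unix-domain stream socket pair (socketpair with AF_UNIX) as a stand-in for a TCP connection, and the function name short_read_demo is hypothetical. Only three bytes are queued, so a read that asks for 64 returns as soon as those three are available instead of waiting for more:

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: returns the byte count from one read() that asks
   for more bytes than are queued on a stream socket (-1 on setup error). */
static ssize_t short_read_demo(void)
{
    int sv[2];
    char buf[64];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], "abc", 3) != 3) {      /* only 3 bytes are queued */
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    n = read(sv[1], buf, sizeof(buf));      /* asks for 64, gets what is there */
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

The short count is not an error. The caller simply has fewer bytes than it asked for, which is the situation readn and writen hide.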
We provide the following three functions that we
use whenever we read from or write to a stream socket:
#include "unp.h"

ssize_t readn(int filedes, void *buff, size_t nbytes);
ssize_t writen(int filedes, const void *buff, size_t nbytes);
ssize_t readline(int filedes, void *buff, size_t maxlen);

All return: number of bytes read or written, -1 on error
Figure 3.15 shows the readn function, Figure 3.16 shows the writen
function, and Figure 3.17 shows the readline function.
Figure 3.15 readn function: Read n bytes from a descriptor.
lib/readn.c
#include "unp.h"

ssize_t                         /* Read "n" bytes from a descriptor. */
readn(int fd, void *vptr, size_t n)
{
    size_t  nleft;
    ssize_t nread;
    char   *ptr;

    ptr = vptr;
    nleft = n;
    while (nleft > 0) {
        if ( (nread = read(fd, ptr, nleft)) < 0) {
            if (errno == EINTR)
                nread = 0;      /* and call read() again */
            else
                return (-1);
        } else if (nread == 0)
            break;              /* EOF */

        nleft -= nread;
        ptr += nread;
    }
    return (n - nleft);         /* return >= 0 */
}
Figure 3.16 writen function: Write n bytes to a descriptor.
lib/writen.c
#include "unp.h"

ssize_t                         /* Write "n" bytes to a descriptor. */
writen(int fd, const void *vptr, size_t n)
{
    size_t      nleft;
    ssize_t     nwritten;
    const char *ptr;

    ptr = vptr;
    nleft = n;
    while (nleft > 0) {
        if ( (nwritten = write(fd, ptr, nleft)) <= 0) {
            if (nwritten < 0 && errno == EINTR)
                nwritten = 0;   /* and call write() again */
            else
                return (-1);    /* error */
        }

        nleft -= nwritten;
        ptr += nwritten;
    }
    return (n);
}
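The sketch below shows the one case in which readn legitimately returns less than requested. It pairs copies of readn and writen (with standard headers in place of unp.h) with a hypothetical helper, readn_after_close. The helper writes some bytes over a Unix-domain socket pair and then closes the writing end. Under these assumptions, readn returns the full count when enough data arrives and a short count only when it reaches EOF:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* readn() and writen() as in Figures 3.15 and 3.16, minus unp.h. */
static ssize_t readn(int fd, void *vptr, size_t n)
{
    size_t nleft = n;
    ssize_t nread;
    char *ptr = vptr;

    while (nleft > 0) {
        if ((nread = read(fd, ptr, nleft)) < 0) {
            if (errno == EINTR)
                nread = 0;              /* interrupted: call read() again */
            else
                return -1;
        } else if (nread == 0)
            break;                      /* EOF: report the short count */
        nleft -= nread;
        ptr += nread;
    }
    return n - nleft;
}

static ssize_t writen(int fd, const void *vptr, size_t n)
{
    size_t nleft = n;
    ssize_t nwritten;
    const char *ptr = vptr;

    while (nleft > 0) {
        if ((nwritten = write(fd, ptr, nleft)) <= 0) {
            if (nwritten < 0 && errno == EINTR)
                nwritten = 0;           /* interrupted: call write() again */
            else
                return -1;
        }
        nleft -= nwritten;
        ptr += nwritten;
    }
    return n;
}

/* Hypothetical demo: the peer writes `sent` bytes and closes its end;
   readn() is then asked for `want` bytes. Returns readn()'s result,
   or -2 if the demo itself could not be set up. */
static ssize_t readn_after_close(size_t sent, size_t want)
{
    char out[64], in[64];
    int sv[2];
    ssize_t n;

    if (sent > sizeof(out) || want > sizeof(in))
        return -2;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -2;
    memset(out, 'x', sent);
    if (writen(sv[0], out, sent) != (ssize_t) sent) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }
    close(sv[0]);                       /* the reader will now see EOF */
    n = readn(sv[1], in, want);
    close(sv[1]);
    return n;
}
```

A caller that needs exactly n bytes should therefore compare readn's return value with n, not just test for -1.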
Figure 3.17 readline function: Read a text line from a descriptor, one byte at a time.
test/readline1.c
#include "unp.h"

/* PAINFULLY SLOW VERSION -- example only */
ssize_t
readline(int fd, void *vptr, size_t maxlen)
{
    ssize_t n, rc;
    char    c, *ptr;

    ptr = vptr;
    for (n = 1; n < maxlen; n++) {
      again:
        if ( (rc = read(fd, &c, 1)) == 1) {
            *ptr++ = c;
            if (c == '\n')
                break;          /* newline is stored, like fgets() */
        } else if (rc == 0) {
            *ptr = 0;
            return (n - 1);     /* EOF, n - 1 bytes were read */
        } else {
            if (errno == EINTR)
                goto again;
            return (-1);        /* error, errno set by read() */
        }
    }

    *ptr = 0;                   /* null terminate like fgets() */
    return (ptr - (char *) vptr);   /* bytes stored, excluding the null */
}
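The following sketch exercises the slow readline on a Unix-domain socket pair to show its fgets-like conventions. The newline is stored, the result is null-terminated, a final line with no newline is still returned at EOF, and EOF itself yields 0. The final return reports the number of bytes actually stored (ptr - vptr). Returning n instead would overcount by one when the loop stops because the buffer is full. The helper make_feed is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Slow readline() in the style of Figure 3.17, without unp.h. */
static ssize_t readline(int fd, void *vptr, size_t maxlen)
{
    ssize_t n, rc;
    char c, *ptr = vptr;

    for (n = 1; n < (ssize_t) maxlen; n++) {
      again:
        if ((rc = read(fd, &c, 1)) == 1) {
            *ptr++ = c;
            if (c == '\n')
                break;                  /* newline is stored, like fgets() */
        } else if (rc == 0) {
            *ptr = 0;
            return n - 1;               /* EOF, n - 1 bytes were read */
        } else {
            if (errno == EINTR)
                goto again;
            return -1;                  /* error, errno set by read() */
        }
    }
    *ptr = 0;                           /* null terminate like fgets() */
    return ptr - (char *) vptr;         /* bytes stored, excluding the null */
}

/* Hypothetical helper: returns a descriptor from which `data` can be
   read, followed by EOF; -1 on failure. */
static int make_feed(const char *data)
{
    int sv[2];
    size_t len = strlen(data);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], data, len) != (ssize_t) len) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[0]);                       /* EOF follows the data */
    return sv[1];
}
```

Each byte costs one read system call here, which is what the "PAINFULLY SLOW" comment in Figure 3.17 warns about.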
Our three functions look for the error
EINTR (the system call was interrupted by a caught signal,
which we will discuss in more detail in Section 5.9) and
continue reading or writing if the error occurs. We handle the
error here, instead of forcing the caller to call readn or
writen again, since the purpose of these three functions
is to prevent the caller from having to handle a short count.
In Section 14.3, we
will mention that the MSG_WAITALL flag can be used with
the recv function to replace the need for a separate
readn function.
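Here is a sketch of the MSG_WAITALL alternative. It assumes a child process on a Unix-domain socket pair that sends 10 bytes in two pieces, 100 ms apart, and recv_demo is a hypothetical name. With MSG_WAITALL, a single recv waits for all 10 bytes. With flags set to 0, it would normally return after the first 4. POSIX still allows a MSG_WAITALL receive to return fewer bytes if a signal is caught, the connection is terminated, or an error is pending, so a careful caller still checks the count:

```c
#define _POSIX_C_SOURCE 200809L   /* for fork(), nanosleep(), waitpid() */
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo: a child sends "abcd", pauses 100 ms, then sends
   "efghij". The parent makes one recv() call for 10 bytes with the given
   flags and returns its result (-2 if the demo could not be set up). */
static ssize_t recv_demo(int flags)
{
    int sv[2];
    char buf[10];
    ssize_t n;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -2;
    if ((pid = fork()) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }
    if (pid == 0) {                             /* child: the slow sender */
        struct timespec gap = { 0, 100000000L };   /* 100 ms */
        close(sv[1]);
        if (write(sv[0], "abcd", 4) != 4)
            _exit(1);
        nanosleep(&gap, NULL);
        if (write(sv[0], "efghij", 6) != 6)
            _exit(1);
        _exit(0);
    }
    close(sv[0]);
    n = recv(sv[1], buf, sizeof(buf), flags);   /* one call, 10 bytes asked */
    close(sv[1]);
    waitpid(pid, NULL, 0);
    return n;
}
```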
Note that our readline function calls
the system's read function once for every byte of data.
This is very inefficient, which is why we've commented the code to state
that it is "PAINFULLY SLOW." When faced with the desire to read lines
from a socket, it is quite tempting to turn to the standard I/O
library (referred to as "stdio"). We will discuss this approach at
length in Section 14.8, but
it can be a dangerous path. The same stdio buffering that solves
this performance problem creates numerous logistical problems that
can lead to well-hidden bugs in your application. The reason is
that the state of the stdio buffers is not exposed. To explain this
further, consider a line-based protocol between a client and a
server, where several clients and servers using that protocol may
be implemented over time (really quite common; for example, there
are many Web browsers and Web servers independently written to the
HTTP specification). Good "defensive programming" techniques
require these programs not only to expect their counterparts to
follow the network protocol, but also to check for unexpected network
traffic. Such protocol violations should be reported as
errors so that bugs are noticed and fixed (and malicious attempts
are detected as well), and also so that network applications can
recover from problem traffic and continue working if possible.
Using stdio to buffer data for performance flies in the face of
these goals since the application has no way to tell if unexpected
data is being held in the stdio buffers at any given time.
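The following sketch makes the hidden-state problem concrete. It assumes the usual full buffering of a stdio stream opened with fdopen on a socket, and the function name stdio_hides_data is hypothetical. The peer sends two lines at once. fgets returns the first, but stdio has already pulled both lines into its buffer, so poll on the descriptor reports nothing to read even though a complete second line is waiting:

```c
#define _POSIX_C_SOURCE 200809L   /* for fdopen() */
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: the peer sends two lines in one write. fgets() returns
   the first (copied into `first`); then poll() checks, without waiting,
   whether the descriptor has anything to read. Returns poll()'s result,
   where 0 means "nothing readable", or -1 on failure. */
static int stdio_hides_data(char *first, int firstlen)
{
    static const char msg[] = "line1\nline2\n";
    struct pollfd pfd;
    FILE *fp;
    int sv[2], rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], msg, sizeof(msg) - 1) != (ssize_t) (sizeof(msg) - 1) ||
        (fp = fdopen(sv[1], "r")) == NULL) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (fgets(first, firstlen, fp) == NULL) {   /* stdio fills its buffer */
        fclose(fp);
        close(sv[0]);
        return -1;
    }
    pfd.fd = sv[1];
    pfd.events = POLLIN;
    rc = poll(&pfd, 1, 0);      /* "line2\n" is in fp's buffer, not the socket */

    fclose(fp);                 /* also closes sv[1] */
    close(sv[0]);
    return rc;
}
```

A program that waits in poll (or select) before calling fgets again would block at this point, even though the line it wants is already in its own address space.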
There are many line-based network protocols such
as SMTP, HTTP, the FTP control connection protocol, and finger. So,
the desire to operate on lines comes up again and again. But our
advice is to think in terms of buffers and not lines. Write your
code to read buffers of data, and if a line is expected, check the
buffer to see if it contains that line.
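The sketch below shows one way to follow this advice. Received bytes are appended to an accumulation buffer, and a line is taken out only after its newline has arrived. Any bytes after the newline stay in the buffer for the next call. The struct linebuf type, its 256-byte capacity, and the lb_ function names are hypothetical:

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical accumulation buffer: bytes are appended as read() returns
   them, and complete lines are removed only once their '\n' has arrived. */
struct linebuf {
    char   data[256];
    size_t len;                     /* bytes currently held */
};

/* Append n received bytes; returns -1 if they do not fit. */
static int lb_append(struct linebuf *lb, const void *p, size_t n)
{
    if (n > sizeof(lb->data) - lb->len)
        return -1;
    memcpy(lb->data + lb->len, p, n);
    lb->len += n;
    return 0;
}

/* If a complete line is buffered, copy it (with its '\n') into out,
   null-terminate it, remove it from lb, and return its length.
   Returns 0 if no complete line has arrived, -1 if out is too small. */
static ssize_t lb_take_line(struct linebuf *lb, char *out, size_t outlen)
{
    char *nl = memchr(lb->data, '\n', lb->len);
    size_t linelen;

    if (nl == NULL)
        return 0;
    linelen = (size_t) (nl - lb->data) + 1;
    if (linelen + 1 > outlen)
        return -1;
    memcpy(out, lb->data, linelen);
    out[linelen] = '\0';
    lb->len -= linelen;
    memmove(lb->data, lb->data + linelen, lb->len);   /* keep leftovers */
    return (ssize_t) linelen;
}
```

The leftover bytes stay in lb.data with an explicit length, so the program can always see how much unprocessed input it holds. lb_append's failure gives it a place to treat an overlong line as a protocol violation.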
Figure 3.18 shows a faster version of the readline function,
which uses its own buffering rather than stdio buffering. Most
importantly, the state of readline's internal buffer is
exposed, so callers have visibility into exactly what has been
received. Even with this feature, readline can be
problematic, as we'll see in Section 6.3. System
functions like select still won't know about
readline's internal buffer, so a carelessly written
program could easily find itself waiting in select for
data already received and stored in readline's buffers.
For that matter, mixing readn and readline calls
will not work as expected unless readn is modified to
check the internal buffer as well.
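Here is a sketch of the check a careful program makes before blocking. It assumes the caller passes in the number of bytes it already holds in user space, for example the return value of readlinebuf from Figure 3.18. The name input_ready is hypothetical. The helper asks select about the descriptor only when nothing is buffered:

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper. `buffered` counts bytes already held in user
   space; they must be handled first, because select() only reports on
   the kernel's socket receive buffer. Returns 1 if input is available,
   0 on timeout, -1 on error. */
static int input_ready(int fd, ssize_t buffered, struct timeval *tv)
{
    fd_set rset;
    int n;

    if (buffered > 0)
        return 1;                   /* do not block: data is already here */

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    n = select(fd + 1, &rset, NULL, NULL, tv);
    return n < 0 ? -1 : (n > 0 && FD_ISSET(fd, &rset));
}
```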
Figure 3.18 Better version of readline function.
lib/readline.c
 1 #include "unp.h"
 2 static int read_cnt;
 3 static char *read_ptr;
 4 static char read_buf[MAXLINE];
 5 static ssize_t
 6 my_read(int fd, char *ptr)
 7 {
 8     if (read_cnt <= 0) {
 9       again:
10         if ( (read_cnt = read(fd, read_buf, sizeof(read_buf))) < 0) {
11             if (errno == EINTR)
12                 goto again;
13             return (-1);
14         } else if (read_cnt == 0)
15             return (0);
16         read_ptr = read_buf;
17     }
18     read_cnt--;
19     *ptr = *read_ptr++;
20     return (1);
21 }
22 ssize_t
23 readline(int fd, void *vptr, size_t maxlen)
24 {
25     ssize_t n, rc;
26     char    c, *ptr;
27     ptr = vptr;
28     for (n = 1; n < maxlen; n++) {
29         if ( (rc = my_read(fd, &c)) == 1) {
30             *ptr++ = c;
31             if (c == '\n')
32                 break;      /* newline is stored, like fgets() */
33         } else if (rc == 0) {
34             *ptr = 0;
35             return (n - 1); /* EOF, n - 1 bytes were read */
36         } else
37             return (-1);    /* error, errno set by read() */
38     }
39     *ptr = 0;               /* null terminate like fgets() */
40     return (ptr - (char *) vptr);   /* bytes stored, excluding the null */
41 }
42 ssize_t
43 readlinebuf(void **vptrptr)
44 {
45     if (read_cnt)
46         *vptrptr = read_ptr;
47     return (read_cnt);
48 }
2-21
The internal function my_read reads up to MAXLINE
characters at a time and then returns them, one at a time.
29
The only change to the readline function itself is to call
my_read instead of read.
42-48
A new function, readlinebuf, exposes the internal buffer
state so that callers can check and see if more data was received
beyond a single line.
Unfortunately, by using static
variables in readline.c to maintain the state information
across successive calls, the functions are not re-entrant or thread-safe. We will discuss this in Sections 11.18
and 26.5. We will
develop a thread-safe version using thread-specific data in
Figure 26.11.
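Figure 26.11 solves the problem with thread-specific data. The sketch here uses a different, commonly used technique: it moves the state that Figure 3.18 keeps in static variables into an object owned by the caller, one per descriptor. The names struct rline, rline_init, rline_getc, rline_read, and rline_buffered, and the RL_BUFSIZE constant (standing in for MAXLINE), are all hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define RL_BUFSIZE 4096             /* assumption: stands in for MAXLINE */

/* The state Figure 3.18 keeps in static variables, owned by the caller
   instead: one object per descriptor being read. */
struct rline {
    int     fd;
    ssize_t cnt;                    /* unread bytes left in buf */
    char   *ptr;                    /* next unread byte in buf */
    char    buf[RL_BUFSIZE];
};

static void rline_init(struct rline *rl, int fd)
{
    rl->fd = fd;
    rl->cnt = 0;
    rl->ptr = rl->buf;
}

/* Like my_read(): refill buf when empty, then hand out one byte.
   Returns 1 with a byte, 0 on EOF, -1 on error. */
static ssize_t rline_getc(struct rline *rl, char *c)
{
    if (rl->cnt <= 0) {
        do {
            rl->cnt = read(rl->fd, rl->buf, sizeof(rl->buf));
        } while (rl->cnt < 0 && errno == EINTR);
        if (rl->cnt <= 0)
            return rl->cnt;
        rl->ptr = rl->buf;
    }
    rl->cnt--;
    *c = *rl->ptr++;
    return 1;
}

/* Like readline(): returns the number of bytes stored (newline included,
   null excluded), 0 at EOF (or if maxlen < 2), or -1 on error. */
static ssize_t rline_read(struct rline *rl, char *out, size_t maxlen)
{
    size_t n = 0;
    ssize_t rc;
    char c;

    while (n + 1 < maxlen) {        /* leave room for the null byte */
        if ((rc = rline_getc(rl, &c)) < 0)
            return -1;              /* errno set by read() */
        if (rc == 0)
            break;                  /* EOF */
        out[n++] = c;
        if (c == '\n')
            break;                  /* newline is stored, like fgets() */
    }
    if (maxlen > 0)
        out[n] = '\0';
    return (ssize_t) n;
}

/* Like readlinebuf(): how many received bytes are still buffered. */
static ssize_t rline_buffered(const struct rline *rl)
{
    return rl->cnt > 0 ? rl->cnt : 0;
}
```

This changes the calling interface, because every call must pass the state object, whereas a thread-specific-data approach can keep the original readline signature. A single rline object still must not be used by two threads at once without a lock.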