Re: select()/write() semantics



David Schwartz <davids@xxxxxxxxxxxxx> writes:
On Jun 19, 5:04 am, Rainer Weikusat <rweiku...@xxxxxxxxxxx> wrote:

Exactly. Meanwhile 'select' queries the I/O subsystem,

There is no such thing as 'an I/O subsystem'. 'I/O readiness' is a
per-descriptor property which is implemented by a struct file method.

You are confusing attributes of some particular piece of code with
attributes of the functions that code implements.

No. I am writing about Linux 2.6 and Linux 2.6 does not (currently)
have a general abstraction named 'I/O subsystem'.

typically about network connections.

That something happens 'typically' is different from 'something
happens always'. Specifically, FIFOs are not network connections, and
select behaviour on FIFOs was discussed.

Exactly. So nothing about network connections turns into a *guarantee*
about what select will do.

Since the original question was about FIFOs, the behaviour of
(hypothetical) network connections is not part of the answer.

At a minimum, these connections have a remote end, which can change
the status of those connections at any time.

For connected sockets, the remote end can either send data or
terminate the connection. It cannot modify anything about already
received data.

How do you know that? Where is that written?

It is written in the kernel source and the relevant protocol
specifications.

[...]

Below is the kernel implementation for polling on FIFOs:

static unsigned int
pipe_poll(struct file *filp, poll_table *wait)
{
        unsigned int mask;
        struct inode *inode = filp->f_path.dentry->d_inode;
        struct pipe_inode_info *pipe = inode->i_pipe;
        int nrbufs;

        poll_wait(filp, &pipe->wait, wait);

        /* Reading only -- no need for acquiring the semaphore. */
        nrbufs = pipe->nrbufs;
        mask = 0;
        if (filp->f_mode & FMODE_READ) {
                mask = (nrbufs > 0) ? POLLIN | POLLRDNORM : 0;
                if (!pipe->writers && filp->f_version != pipe->w_counter)
                        mask |= POLLHUP;
        }

        if (filp->f_mode & FMODE_WRITE) {
                mask |= (nrbufs < PIPE_BUFFERS) ? POLLOUT | POLLWRNORM : 0;
                /*
                 * Most Unices do not set POLLERR for FIFOs but on Linux they
                 * behave exactly like pipes for poll().
                 */
                if (!pipe->readers)
                        mask |= POLLERR;
        }

        return mask;
}

[../linux/fs/pipe.c]
(for a general description, try <URL:http://linuxdriver.co.il/ldd3/>)
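
The POLLHUP and POLLERR branches in the routine above can be observed
from user space. What follows is a minimal sketch, assuming a Linux
system; poll_mask is a hypothetical helper name, not something from the
kernel source or from this thread.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the revents mask poll() reports for fd
 * right now (timeout 0, so it never sleeps), or -1 if poll() failed.
 */
int poll_mask(int fd, short events)
{
    struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };

    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return pfd.revents;
}
```

Calling it on the write end of a pipe whose read end has been closed
should show POLLERR, and calling it on the read end of a pipe whose
write end has been closed should show POLLHUP, matching the two
branches of pipe_poll.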

You mean the implementation in one specific version of one particular
operating system.

Exactly. I am writing about FIFOs as implemented in Linux in a
newsgroup whose topic is (supposedly) development of applications for
Linux.

The routine returns 'ready for reading' if pipe->nrbufs is larger
than zero and 'ready for writing' if pipe->nrbufs is smaller than
PIPE_BUFFERS. The pipe->nrbufs value is only modified in pipe_read
(which decrements it if a buffer was consumed) and pipe_write (which
increments it if a buffer was added). pipe_write blocks only if
pipe->nrbufs becomes equal to PIPE_BUFFERS before it has written all
the requested data to a set of pipe buffers. Each pipe buffer has a
size of PIPE_BUF, so at least one can be added if pipe_poll returns
'writable', and nobody adds pipe buffers except pipe_write. Hence
pipe_write will not block when writing <= PIPE_BUF octets after poll
has returned writable, provided only a single process can write to
the pipe.

BTW, that I have looked this up in the kernel source was basically for
sport. The 'room in the buffer' condition of a pipe does not change
except if data is added to the buffer.

It can't change from external memory pressure? Where is that
written?

In the code.


        if (!page) {
                page = alloc_page(GFP_HIGHUSER);
                if (unlikely(!page)) {
                        ret = ret ? : -ENOMEM;
                        break;
                }
                pipe->tmp_page = page;
        }
[pipe_write/ ../linux/fs/pipe.c]

The 'room in the buffer' condition actually means 'pipe_write may
allocate more memory for this pipe', and if allocating more memory
isn't possible, the routine returns failure. It may sleep waiting for
disk I/O if pages need to be reclaimed, but that isn't what is usually
meant by 'blocking', namely 'wait for an indefinite amount of time
until an unrelated application has done something'.
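
The 'ret = ret ? : -ENOMEM' line in the excerpt above reports the
octets already written if there are any and the error otherwise. A
user-space loop can follow the same convention. This is a sketch with
a hypothetical helper name (write_all), not code from the kernel or
from any library.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling write() until len octets have
 * been written. On error, return the partial count if anything was
 * written, else -1 (errno is left as set by write()).
 */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, just retry */
            return done ? (ssize_t)done : -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```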

The same would trivially be true for a TCP socket descriptor, for
instance. If there is room in the socket write buffer, this room
will remain available until consumed.

How do you know this?

Strictly speaking, not at all. But that would be the answer to the
question 'How do you know that Schroedinger's cat will be dead after
ten years in its box', and the topic of discussion would then be a
philosophical one. I am willing to assume 'cause and effect' and the
possibility of deductions that correctly predict future effects as
given.

That is precisely what you cannot do.

Ergo: It is impossible to develop software, because its behaviour
cannot be predicted.

Since I am able to type this particular sentence, this conclusion is
wrong.

It would even be incorrect to say, for example, "if 'access' says
you can write to a file, a subsequent open for writing will not fail
so long as the permissions are not modified". Why is that wrong?

Because somebody could unlink the file.

Because there could be many other ways the subsequent 'open' could
fail, and even if you can't think of any, that doesn't mean they
don't exist.

The problem here is that the original claim is too general. If a
process having the necessary permissions tries to open an existing
file for writing, this open will fail neither because the file does
not exist nor because the process lacks the necessary permissions to
open it. That's part of the defined semantics of 'open(2)', and an
open which 'may or may not open a file, depending on random
circumstances' wouldn't be particularly useful.

It does not mean that the cleaning lady will not pull the power cord
at the same time and it does not mean that an evil alien with a gamma
ray cannon could not just erase some parts of the contents of the
system's RAM.
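
The access/open exchange above, including the 'somebody could unlink
the file' case, can be reproduced with a few lines. This is a sketch
with a hypothetical helper (check_then_open); the unlink_between flag
merely simulates another process removing the file between the two
calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: check write permission with access(), then
 * open the file for writing. If unlink_between is non-zero, the file
 * is removed between the two calls to simulate a concurrent unlink.
 * Returns 0 if the open succeeded, else the errno of the failing call.
 */
int check_then_open(const char *path, int unlink_between)
{
    int fd;

    if (access(path, W_OK) != 0)
        return errno;
    if (unlink_between && unlink(path) != 0)
        return errno;

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno;

    close(fd);
    return 0;
}
```

With the file left alone, the open succeeds after a positive access()
answer. With the unlink in between, it fails with ENOENT even though
the permissions were never modified.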

[...]

At least two times in the past, people have listened to nonsense
exactly like the nonsense you are spouting

I am still writing (mostly) of the behaviour of FIFOs on Linux. And
nothing else. I would add the additional claim that the behaviour of
each other type of descriptor can be determined, too, provided one is
willing to limit oneself to a specific part of the observable
reality. Which I am.

and real-world code has *BROKEN* because of it.

This code was broken to begin with, because it was written under
assumptions that contradicted even the documented behaviour.