Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite



Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

On Thu, 3 Sep 2009, OGAWA Hirofumi wrote:

If I'm not missing, I think it doesn't have big change with old
code. But I would need to check more deeply.

The thing is, the old pty code pushed _directly_ to the receiving ldisc,
with no buffering.

Yes.

I'm not entirely sure why Alan felt it needed changing,
but moving over to the generic tty buffering code did get rid of some
duplicate logic, and the locking is now done in one place, so that's
probably the main reason.

IIRC, ppp had the locking issue without that patch?

Anyway, the old pty code would be entirely synchronous, and would do the

ld->ops->receive_buf(to, buf, NULL, c);

to push the data all the way to the receive side frm pty_write(). So with
the old code, the destination "receive_room" was always accurate, because
both the reading side and the writing side basically accessed it directly.

With the new code, it all goes through tty_buffer.c, and the bugs have
been mostly about the receiving side not seeing all the data in the
buffers. And those buffers simply didn't use to exist before.

Yes. However, pty_write() checks tty_buffer instead of receive_room. So
I thought, the change of write side is mainly buffer size (receive_room
size + tty_buffer size). It will stop after filling tty_buffer, not
receive_room.

And (I hope) the read side guarantees to consume both buffers. If it is
right, I guessed the change is timing issues with more larger buffer
size.

Um.., If "receive_room == 0 && tty->read_cnt == 0" is possible, I wonder
why reverting buffer handling fixes the problem.

In the old code, if 'receive_room' was zero, then the writer would simply
stop writing (no buffers in between). So in the old code, you could never
get into a situation where receive_room was zero and there was still
pending data.

At least that's how I read the situation.

Another possibility in my guess is the change of pty_flush_buffer() and
pty_chars_in_buffer(). I'm not sure at all though, especially, I'm
suspecting pty_flush_buffer() may change the behaviors.

If I'm right, I'm hoping that the patch I sent out fixes it, and if so,
we'll do that for 2.6.31 (and then after that maybe re-think whether the
extra buffering is worth all this pain).

I also hope it works.

And if it _doesn't_ fix it, then I think we'll just have to revert the
commits in question. We won't have time to root-cause it if the above
isn't it.

At least for me, it sounds like good if revert works. I have no
preference about it.

FWIW, meanwhile, I'll just try to see the root-cause of this as
another/fallback solution.

Thanks.
--
OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages