Re: [PATCH] OpenBSD Networking-related randomization port

linux_at_horizon.com
Date: 01/29/05

  • Next message: John Richard Moser: "Re: Patch 4/6 randomize the stack pointer"
    Date:	29 Jan 2005 07:24:29 -0000
    To: lorenzo@gnu.org
    
    

    > It adds support for advanced networking-related randomization, in
    > concrete it adds support for TCP ISNs randomization

    Er... did you read the existing Linux TCP ISN generation code?
    Which is quite thoroughly randomized already?

    I'm not sure how the OpenBSD code is better in any way. (Notice that it
    uses the same "half_md4_transform" as Linux; you just added another copy.)
    Is there a design note on how the design was chosen?

    I don't wish to be *too* discouraging to someone who's *trying* to help,
    but could you *please* check a little more carefully in future to
    make sire it's actually an improvement?

    I fear there's some ignorance of what the TCP ISN does, why it's chosen
    the way it is, and what the current Linux algorithm is designed to do.
    So here's a summary of what's going on. But even as a summary, it's
    pretty long...

    First, a little background on the selection of the TCP ISN...

    TCP is designed to work in an environment where packets are delayed.
    If a packet is delayed enough, TCP will retransmit it. If one of
    the copies floats around the Internet for long enough and then arrives
    long after it is expected, this is a "delayed duplicate".

    TCP connections are between (host, port, host port) quadruples, and
    packets that don't match some "current connection" in all four fields
    will have no effect on the current connection. This is why systems try
    to avoid re-using source port numbers when making connections to
    well-known destination ports.

    However, sometimes the source port number is explicitly specified and
    must be reused. The problem then arises, how do we avoid having any
    possible delayed packets from the previous use of this address pair show
    up during the current connection and confuse the heck out of things by
    acknowledging data that was never received, or shutting down a connection
    that's supposed to stay open, or something like that?

    First of all, protocols assume a maximum packet lifetime in the Internet.
    The "Maximum Segment Lifetime" was originally specified as 120 seconds,
    but many implementations optimize this to 60 or 30 seconds. The longest
    time that a response can be delayed is 2*MSL - one delay for the packet
    eliciting the response, and another for the response.

    In truth, there are few really-hard guarantees on how long a packet can
    be delayed. IP does have a TTL field, and a requirement that a packet's
    TTL field be decremented for each hop between routers *or each second of
    delay within a router*, but that latter portion isn't widely implemented.
    Still, it is an identified design goal, and is pretty reliable in
    practice.

    The solution is twofold: First, refuse to accept packets whose
    acks aren't in the current transmission window. That is, if the
    last ack I got was for byte 1000, and I have sent 1100 bytes
    (numbers 0 through 1099), then if the incoming packet's ack isn't
    somewhere between 1000 and 1100, it's not relevant. If it's
    950, it might be an old ack from the current connection (which
    doesn't include anything interesting), but in any case it can be
    safely ignored, and should be.

    The only remaining issue is, how to choose the first sequence number
    to use in a connection, the Initial Sequence Number (ISN)?

    If you start every connection at zero, then you have the risk that
    packets from an old connection between the same endpoints will
    show up at a bad time, with in-range sequence numbers, and confuse
    the current connection.

    So what you do is, start at a sequence number higher than the
    last one used in the old connection. Then there can't be any
    confusion. But this requires remembering the last sequence number
    used on every connection ever. And there are at least 2^48 addresses
    allowed to connect to each port on the local machine. At 4 bytes
    per sequence number, that's a Petabyte of storage...

    Well, first of all, after 2*MSL, you can forget about it and use
    whatever sequence number you like, because you know that there won't
    be any old packets floating around to crash the party.

    But still, it can be quite a burden on a busy web server. And you might
    crash and lose all your notes. Do you want to have to wait 2*MSL before
    rebooting?

    So the TCP designers (I'm not on page 27 of RFC 793, if you want to follow
    along) specified a time of day based ISN. If you use a clock to generate
    an ISN which counts up faster than your network connection can send
    data (and thus crank up its sequence numbers), you can be sure that your
    ISN is always higher than the last one used by an old connection without
    having to remember it explicitly.

    RFC 793 specifies a 250,000 bytes/second counting rate. Most
    implementations since Ethernet used a 1,000,000 byte/second counting
    rate, which matches the capabilities of 10base5 and 10base2 quite well,
    and is easy to get from the gettimeofday() call.

    Note that there are two risks with this. First, if the connection actually
    manages to go faster than the ISN clock, the next connection's ISN will
    be in the middle of the space the earlier connection used. Fortunately,
    the kind of links where significant routing delay appear are generally
    slower ones where 1 Mbyte/sec is a not too unreasonable limit. Your
    gigabit LAN isn't going to be delaying packets by seconds.

    The second is that a connection will be made and do nothing for 4294
    seconds until the ISN clock is about to wrap around and then start
    doing packets "ahead of" the ISN clock. If it then closes the connection
    and a new one opens, once again you have sequence number overlap.

    If you read old networking papers, there were a bunch of proposals for
    occasional sequence number renegotiation to solve this problem, but in the
    end people decided to not worry about it, and it hasn't been a problem
    in practice.

    Anyway... fast forward out of the peace and love decade and welcome to
    the modern Internet, with people *trying* to mess up TCP connections.
    This kind of attack from within was, unfortunately, not one of the
    scenarios that the initial Internet designers considered, and it's
    been a bit of a problem since.

    All this careful worry about packets left over from an old connection
    randomly showing up and messing things up, when we have people *creating*
    packets deliberately crafted to mess things up! A whole separate problem.
    In particular, using the simple timer-based algorithm, I can connect to
    a server, look at the ISN it offers me, and know that thats the same
    ISN it's offering to other people connecting at the same time. So I
    can create packets with a forged source address and approximately-valid
    sequence numbers and bombard the connection with them, cutting off that
    server's connection to some third party. Even if I can't see any of
    the traffic on the connection.

    So people sat down and did some thinking. How to deal with this?
    Harder yet, how to deal with this without redesigning TCP from scratch?

    Well, if a person wants to mess up their *own* connections, we can't
    stop them. The fundamental problem is that an attacker A can figure
    out the sequence numbers that machines B and C are using to talk to
    each other. So we'd like to make the sequence numbers for every
    connection unique and not related to the sequence numbers used on any
    other connections. So A can talk to B and A can talk to C and still not
    be able to figure out the sequence numbers that B and C are using between
    themselves.

    Fortunately, it is entirely possible to combine this with the clock-based
    algorithm and get the best of both worlds! All we need is a random offset,
    unique for every (address, port, address, port) quadruple, to add to
    the clock value, and we have all of the clock-based guarantees preserved
    within every pair of endpoints, but unrelated endpoints have their ISNs
    at some unpredictable offset relative to each other.

    And for generating such a random offset, we can use cryptography.
    Keep a single secret key, and hash together the endpoint addresses,
    and you can generate a random 32-bit ISN offset. Add that to the
    current time, and everything is golden. A can connect to B and
    see and ISN, but would need to do some serious cryptanalysis to
    figure out what ISN B is using to talk to C.

    Linux actually adds one refinement. For speed, it uses a very
    stripped-down cryptographic hash function. To guard against that
    being broken, it generates a new secret every 5 minutes. So an
    attacker only has 5 minutes to break it.

    The cryptographic offset is divided into 2 parts. The high 8 bits are
    a sequence number, incremented every time the secret is regenerated.
    The low 24 bits are produced by the hash. So 5 minutes after booting,
    the secret offset changes from 0x00yyyyyy to 0x01zzzzzz. This is at
    least +1, and at most +0x1ffffff. On average, the count is going up
    by 2^24 = 16 million every 300 seconds. Which just adds a bit to the
    basic "1 million per second" ISN timer.

    The cost is that the per-connection part of the ISN offset is limited
    to 24 and not 32 bits, but a cryptanalytic attack is pretty much
    precluded by the every-5-minutes thing. The rekey time and the number of
    really-unpredictable bits have to add up to not wrapping the ISN space
    too fast. (Although the 24 bits could be increased to 28 bits without
    quite doubling the ISN counting speed. Or 27 bits if you want plenty
    of margin. Could I suggest that as an improvement?)

    --- drivers/char/random.c 2004-12-04 09:24:19.000000000 +0000
    +++ drivers/char/random.c 2005-01-29 07:20:37.000000000 +0000
    @@ -2183,26 +2183,26 @@
     #define REKEY_INTERVAL (300*HZ)
     /*
      * Bit layout of the tcp sequence numbers (before adding current time):
    - * bit 24-31: increased after every key exchange
    - * bit 0-23: hash(source,dest)
    + * bit 27-31: increased after every key exchange
    + * bit 0-26: hash(source,dest)
      *
      * The implementation is similar to the algorithm described
      * in the Appendix of RFC 1185, except that
      * - it uses a 1 MHz clock instead of a 250 kHz clock
      * - it performs a rekey every 5 minutes, which is equivalent
    - * to a (source,dest) tulple dependent forward jump of the
    + * to a (source,dest) tuple dependent forward jump of the
      * clock by 0..2^(HASH_BITS+1)
      *
    - * Thus the average ISN wraparound time is 68 minutes instead of
    - * 4.55 hours.
    + * Thus the average ISN wraparound time is 49 minutes instead of
    + * 4.77 hours.
      *
      * SMP cleanup and lock avoidance with poor man's RCU.
      * Manfred Spraul <manfred@colorfullife.com>
      *
      */
    -#define COUNT_BITS 8
    +#define COUNT_BITS 5
     #define COUNT_MASK ( (1<<COUNT_BITS)-1)
    -#define HASH_BITS 24
    +#define HASH_BITS 27
     #define HASH_MASK ( (1<<HASH_BITS)-1 )
     
     static struct keydata {

    Anyway, in comparison, the algorithm in your patch (and presumably
    OpenBSD, although I haven't personally compared it) uses a clock
    offset generated fresh for each connection. There's a global counter
    (tcp_rndiss_cnt; I notice you don't have any SMP locking on it) which
    is incremented every time an ISN is needed. It's rekeyed periodically,
    and the high bit (tcp_rndiss_msb) of the delta is used like the COUNT_BITS
    in the Linux algorithm.

    The ISN is generated as the high sequence bit, then 15 bits of encrypted
    count (with some homebrew cipher I don't recognize), then a constant
    zero bit (am I reading that right), then the 15 low-order bits are
    purely random.

    It's a slightly different algorithm, but it does a very similar function.
    The main downsides are that the sequence number can easily go backwards
    (there's no guarantee that consecutive calls will return increasing
    numbers since tcp_rndiss_encrypt scrambles the high 15 bits), and
    that it's not SMP-safe. Two processors could read and use the same
    tcp_rndiss_cnt value at the same time, or (more likely) both call
    tcp_rndiss_init at the same time and end up toggling tcp_rndiss_msb twice,
    thereby destroying the no-rollback property it's trying to achieve.

    Oh, and the single sequence bit in the offsets means that the
    TCP ISN will wrap around very fast. Every 10 minutes, or every
    60000 TCP connections, whichever comes first.

    Regarding the first issue, it's possible that the OpenBSD network stack
    takes care to remember all connections for 2*MSL and continues the
    sequence number if the endpoints are reused, thereby avoiding a call to
    ip_randomisn entirely.

    But the second deserves some attention. The Linux code takes some care
    to avoid having to lock anything in the common case. The global count
    makes that difficult.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: John Richard Moser: "Re: Patch 4/6 randomize the stack pointer"

    Relevant Pages

    • Re: Pros & cons with stealth?
      ... I used a number of different Windows raw TCP connect tools (Essential Net ... few others), to establish a connection with TESTBED, first denying, then ... 3 attempts were made by each client to establish the ... Each transaction of 3 SYN and RST ACK packets was 372 bytes. ...
      (comp.security.firewalls)
    • Re: Simulating smaller MTU? ie sending small packets.
      ... This is due to the fact that TCP ... If you want smaller packets, ... >> set there as the MSS is announced by the receiver during the ... Yes, per connection. ...
      (comp.lang.perl.misc)
    • Re: Pros & cons with stealth?
      ... "Upon receiving the ACK/RST client from the target host, ... Winsock implementation of TCP, a pending connection will keep attempting to ... I captured all packets to ...
      (comp.security.firewalls)
    • Re: ISP Redundancy Configuration
      ... So I can establish the connection and when I try from the ... Ack sequence numbers do not and should not increment ... after that the end points will be send an ack packet with data, ... In line 2 of figure 7, TCP A begins by sending a SYN segment ...
      (comp.security.firewalls)
    • Re: ISP Redundancy Configuration
      ... What I posted is directly from RFC793 describing how the TCP connection is ... Ack sequence numbers do not and should not increment ...
      (comp.security.firewalls)