Re: multiple link aggregation questions: LAG /LACP /IEEE 802.3ad/ etc.



Rahul <nospam@xxxxxxxxxxxxxx> wrote:
Rick Jones <rick.jones2@xxxxxx> wrote in
news:ga4fl7$ndc$1@usenet01.boi.hp.com:

I think so but my point of view may not be shared by others. IIRC the
only mode that will spread the _outbound_ traffic of a single
connection/flow across multiple links in the bond/trunk/aggregate is
mode-rr aka round-robin.
I've never been terribly fond of that mode because it leads to
out-of-order TCP segments, a resulting increase in ACKs and,
depending on the number of links in the bond/trunk/aggregate, spurious
TCP retransmissions.

Interesting. Any downsides to mode-rr?

It leads to out-of-order TCP segments, which leads to an increase in
the number of ACKs, which will increase CPU utilization per KB
transferred (service demand in netperf-speak) and, on the larger link
counts in a single aggregate, to spurious TCP retransmissions, which
will waste bandwidth and suppress the congestion window.
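
To illustrate the reordering, here is a minimal Python sketch of a
toy model, not of the bonding driver itself. It assumes one segment is
sent per time unit, striped round-robin over the links, and that each
link adds a fixed delay. The function names and the delay values are
made up for the example.

```python
def round_robin_arrival_order(num_segments, link_delays):
    """Return segment numbers in the order the receiver sees them.

    Toy model (an assumption, not real driver behaviour): segment i is
    sent at time i on link i % len(link_delays) and arrives at time
    i + link_delays[link]. Ties are broken by segment number.
    """
    arrivals = []
    for seq in range(num_segments):
        link = seq % len(link_delays)
        arrivals.append((seq + link_delays[link], seq))
    return [seq for _, seq in sorted(arrivals)]


def count_out_of_order(order):
    """Count segments that arrive after a higher-numbered segment."""
    highest_seen = -1
    late = 0
    for seq in order:
        if seq < highest_seen:
            late += 1
        else:
            highest_seen = seq
    return late


if __name__ == "__main__":
    # Two links with equal delay: nothing is reordered.
    print(round_robin_arrival_order(6, [0, 0]))
    # One link three time units slower than the other.
    order = round_robin_arrival_order(6, [3, 0])
    print(order, count_out_of_order(order))
```

In this model any delay difference larger than the send interval is
enough to reorder segments; a receiver would answer each of those with
an immediate duplicate ACK.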

Is it transmit-side load balancing only?

Yes.

Also, why do you think that some of the other "smarter" modes (alb /
802.3ad) do not achieve a bandwidth multiplier, can I ask? Just a
personal preference or anything fundamentally iffy about those
modes?

Unless I've really misunderstood what is going on, the modes playing
tricks with ARP cannot on first principles affect a single flow. They
get traffic to flow over different links by handing out different MAC
addresses to queries for their one local IP. Even if we assume that
every segment sent on a TCP connection does an ARP cache lookup, the
only way it could get a new MAC address each time would be if there
was an ARP update between every TCP segment. I cannot imagine any of
the modes in Linux bonding doing something sooo terribly inefficient.
It would make mode-rr look positively pristine in comparison.

The point of link aggregation was to increase aggregate throughput and
provide a modicum of HA. Increasing the speed of a single flow was
not part of the design center.

I am not familiar with any switch with a similar round-robin mode for
the inbound traffic. Doesn't mean they don't exist mind you...

I thought a LAG was the same idea. If a switch cannot distinguish between
two similar links and clubs them together, doesn't that achieve the same
effect? Maybe I am wrong.

All depends on what the switch does. My experience with other
switches (non-Dell) has been that when presented with an aggregate the
switch will hash on some addressing in the frame to pick the link on
which it will place the frame. Sometimes this is simply the MAC,
sometimes it may include the IP. I've heard unconfirmed rumours that
some switches may even go so far as to look at TCP/UDP port numbers.
However, none of that would result in traffic for a single flow
flowing over multiple links in parallel.
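
A minimal Python sketch of that kind of hash-based link selection
follows. The function name pick_link, the choice of CRC32, and the
sample addresses are all assumptions for illustration; real switches
use their own, usually undocumented, hash functions.

```python
import zlib


def pick_link(num_links, src_mac, dst_mac, src_ip=None, dst_ip=None,
              src_port=None, dst_port=None):
    """Pick an egress link index by hashing frame addressing fields.

    Hypothetical helper: MAC addresses are always hashed; IP addresses
    and TCP/UDP ports are included only when supplied.
    """
    fields = (src_mac, dst_mac, src_ip, dst_ip, src_port, dst_port)
    key = "|".join(str(f) for f in fields if f is not None)
    return zlib.crc32(key.encode("ascii")) % num_links


if __name__ == "__main__":
    # Every frame of one flow carries the same addressing, so the
    # hash, and therefore the chosen link, never changes.
    links = {
        pick_link(4, "00:11:22:33:44:55", "66:77:88:99:aa:bb",
                  "10.0.0.1", "10.0.0.2", 5001, 80)
        for _ in range(1000)
    }
    print(links)
```

Whatever fields go into the hash, they are constant for the life of a
connection, which is why a single flow stays on a single link.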

Those adaptive modes which are doing clever things with MAC
addresses are (probably) doing them for different destinations (IP
addresses). It would be necessary to _constantly_ be sending ARP
refreshes (as in an ARP frame for virtually every frame carrying a
TCP segment) to get traffic between a single pair of IPs to spread
across different MAC addresses.

Right. Which is why mode=6 (alb) will only (IMO) give a bandwidth
multiplier when speaking to *at least* two different peers. When
talking to a single peer (single IP) no advantage.
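
The per-peer behaviour can be sketched with a toy model in Python.
This is not the Linux bonding driver's actual algorithm: the class
ToyAlbBond, its round-robin assignment on first contact, and the
sample addresses are assumptions made only to show why one peer maps
to one link.

```python
class ToyAlbBond:
    """Toy model of an ARP-based balancing mode.

    Assumption: each peer IP is told about exactly one local MAC
    (one link) the first time it is seen, and keeps it afterwards.
    """

    def __init__(self, link_macs):
        self.link_macs = list(link_macs)
        self.peer_to_mac = {}

    def mac_for_peer(self, peer_ip):
        # New peers are spread over the links in turn; known peers
        # keep the MAC they were already given.
        if peer_ip not in self.peer_to_mac:
            index = len(self.peer_to_mac) % len(self.link_macs)
            self.peer_to_mac[peer_ip] = self.link_macs[index]
        return self.peer_to_mac[peer_ip]

    def links_used(self, peer_ips):
        """Return the set of local MACs used for the given traffic."""
        return {self.mac_for_peer(ip) for ip in peer_ips}


if __name__ == "__main__":
    bond = ToyAlbBond(["mac0", "mac1"])
    # A thousand frames from one peer still use a single link.
    print(bond.links_used(["10.0.0.2"] * 1000))
    # A second peer is what brings the second link into play.
    print(bond.links_used(["10.0.0.2", "10.0.0.3"]))
```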

Right, and you said you needed an increase for comms to a single peer,
right?

IMO the best-if-not-only way to get > 1Gbit/s for a single TCP
connection is to use a 10G link.

Too expensive for a university-research cluster! :)

How did the line go in "The Right Stuff?" "No bucks, no Buck Rogers."
:)

rick jones
--
a wide gulf separates "what if" from "if only"
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...