[RFC] implement "blackhole" option for TCP and UDP

From: Michael Tokarev (mjt_at_tls.msk.ru)
Date: 05/31/05

    Date:	Tue, 31 May 2005 19:06:22 +0400
    To: linux-kernel@vger.kernel.org
    
    
    

    The patch below is an RFC only. It implements a "blackhole"
    interface option for the IPv4 (for now) TCP and UDP protocols,
    adding a boolean "blackhole" in /proc/sys/net/ipv4/conf/*/
    (per-interface). The intent is to be able to ignore common
    "invalid" IP traffic on (usually heavily loaded) internet
    servers, to stop such machines from participating in several
    kinds of [D]DoS attacks and to reduce load and bandwidth
    utilisation.

    Before someone asks: no, this feature isn't easily doable
    using iptables, because of the following reasons:

     o iptables is somewhat heavyweight, especially for
       high-traffic sites

     o with iptables it is not possible to let outgoing connections
       pass without the ip_conntrack module

     o ip_conntrack has a lot of limitations of its own, again
       especially for high-traffic sites

    The patch below adds a simple check to the non-speed-critical
    path: the point where, after finding that nothing is listening on
    the destination port, the stack generates an ICMP port-unreachable
    (for UDP) or an RST (for TCP) packet.
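
As a rough illustration of the decision described above, here is a small user-space model in C. It is only a sketch: `struct iface_conf`, `reply_for_closed_port()` and both enums are hypothetical stand-ins rather than kernel types, and the `global_blackhole` argument models the global `ipv4_devconf` value that the patch's IN_DEV_BLACKHOLE() macro ORs with the per-interface flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
enum proto { PROTO_TCP, PROTO_UDP };
enum reply { REPLY_NONE, REPLY_TCP_RST, REPLY_ICMP_PORT_UNREACH };

struct iface_conf {
    bool blackhole;             /* models the per-interface flag */
};

/*
 * Decide what to send back for a packet whose destination port has no
 * listener. Mirrors the shape of IN_DEV_BLACKHOLE(): either the global
 * flag or the receiving interface's flag suppresses the reply.
 * A NULL in_if is treated as "no per-interface setting".
 */
enum reply reply_for_closed_port(bool global_blackhole,
                                 const struct iface_conf *in_if,
                                 enum proto proto)
{
    assert(proto == PROTO_TCP || proto == PROTO_UDP);

    if (global_blackhole || (in_if != NULL && in_if->blackhole))
        return REPLY_NONE;

    return proto == PROTO_TCP ? REPLY_TCP_RST : REPLY_ICMP_PORT_UNREACH;
}
```

With the flag clear, a TCP packet to a closed port yields `REPLY_TCP_RST` and a UDP packet yields `REPLY_ICMP_PORT_UNREACH`; with the flag set on the input interface (or globally), both yield `REPLY_NONE`.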

    This feature is similar to the one found on e.g. FreeBSD, with
    several differences:

     o FreeBSD has global flags, while this patch implements the
       feature per-interface

     o FreeBSD uses two different flags for TCP and UDP, and distinguishes
       between TCP SYN packets (initial TCP handshake) and the rest,
       while this patch has one boolean (per interface) for both
       UDP and TCP.
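
The difference between the two policies can be sketched in user-space C. This is an assumption-laden model, not code from either system: the three-level TCP setting (0 = off, 1 = silent for SYN only, 2 = silent for every segment) reflects how FreeBSD's TCP blackhole knob is commonly documented, and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * FreeBSD-style model (assumed semantics): separate global knobs for
 * TCP and UDP, and the TCP knob distinguishes SYN from other segments.
 * Returns true when no RST should be sent for a closed port.
 */
bool fbsd_tcp_silent(int tcp_blackhole, bool is_syn)
{
    assert(tcp_blackhole >= 0 && tcp_blackhole <= 2);

    if (tcp_blackhole == 2)
        return true;            /* silent for any segment */
    if (tcp_blackhole == 1)
        return is_syn;          /* silent for the initial SYN only */
    return false;               /* 0: normal behaviour, send RST */
}

/* Returns true when no ICMP port-unreachable should be sent. */
bool fbsd_udp_silent(bool udp_blackhole)
{
    return udp_blackhole;
}

/*
 * Model of this patch: one boolean per interface, applied to TCP and
 * UDP alike, with no distinction between SYN and other segments.
 */
bool patch_silent(bool iface_blackhole)
{
    return iface_blackhole;
}
```

The single per-interface boolean in the patch behaves like the "silent for every segment" level for TCP combined with the UDP flag, but scoped to one interface instead of the whole host.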

    There's also an implementation question I have not answered.
    When, given an skb pointer, we check whether the corresponding
    (input) interface has the 'blackhole' option set, we have to
    go through skb->input_dev. All accesses to this variable are
    done with a lock held, using in_dev_get(). In this patch I used
    the lockless __in_dev_get() variant, which does not seem to be
    right. Is using in_dev_get() and in_dev_put() the only way here?
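
For readers unfamiliar with the get/put pattern the question refers to, here is a single-threaded user-space model in C. It does not answer the question and is not kernel code: the `model_*` names and both structs are hypothetical, and the model assumes only that in_dev_get() hands back a counted reference which in_dev_put() must release. The lock that the real helper takes around the pointer load is represented by a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-device IPv4 state and the device. */
struct model_in_dev {
    atomic_int refcnt;
    bool blackhole;
};

struct model_net_dev {
    struct model_in_dev *ip_ptr;    /* may be NULL: no IPv4 state */
};

/* Take a counted reference, or return NULL if there is nothing to take. */
struct model_in_dev *model_in_dev_get(struct model_net_dev *dev)
{
    /* The locked accessor would hold its lock around this load and
     * increment; this model is single-threaded, so it is omitted. */
    struct model_in_dev *in_dev = dev != NULL ? dev->ip_ptr : NULL;

    if (in_dev != NULL)
        atomic_fetch_add(&in_dev->refcnt, 1);
    return in_dev;
}

/* Drop a reference previously taken with model_in_dev_get(). */
void model_in_dev_put(struct model_in_dev *in_dev)
{
    int old = atomic_fetch_sub(&in_dev->refcnt, 1);

    assert(old > 0);                /* catches unbalanced puts */
    (void)old;
}

/* The check from the patch, rewritten with a get/put pair around it. */
bool model_should_blackhole(struct model_net_dev *dev)
{
    struct model_in_dev *in_dev = model_in_dev_get(dev);
    bool silent;

    if (in_dev == NULL)
        return false;               /* no per-interface state: reply */
    silent = in_dev->blackhole;
    model_in_dev_put(in_dev);
    return silent;
}
```

The point of the pairing is that the flag is read only while a reference is held, so the object cannot be freed under the reader; the cost is the extra atomic operations (and, in the real helper, the lock) on every such packet.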

    Thanks.

    /mjt

    
    

    This patch adds a /proc/sys/net/ipv4/conf/*/blackhole
    boolean option. When set, it makes the
    interface in question a "black hole":
    the interface does not reply to network
    packets sent to ports that have no
    listener (and that do not correspond to
    an established connection).
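
A minimal sketch of how the option could be toggled from a C program, assuming a kernel with this patch applied so that the per-interface `blackhole` file exists. The helper names `blackhole_path()` and `set_blackhole()` are hypothetical; on an unpatched kernel the open simply fails.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the path of the per-interface blackhole setting into buf.
 * Returns 0 on success, -1 if the name is empty, contains '/', or
 * the result does not fit in len bytes.
 */
int blackhole_path(char *buf, size_t len, const char *ifname)
{
    int n;

    if (ifname == NULL || ifname[0] == '\0' || strchr(ifname, '/') != NULL)
        return -1;

    n = snprintf(buf, len, "/proc/sys/net/ipv4/conf/%s/blackhole", ifname);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "1" or "0" to the setting for ifname.
 * Returns 0 on success, -1 on any failure (for example when the file
 * does not exist because the kernel lacks the patch).
 */
int set_blackhole(const char *ifname, int enable)
{
    char path[128];
    FILE *f;
    int ok;

    if (blackhole_path(path, sizeof path, ifname) != 0)
        return -1;

    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, `set_blackhole("eth0", 1)` would try to write `1` to `/proc/sys/net/ipv4/conf/eth0/blackhole`; the interface name `eth0` is only an example.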

    BOLD NOTE: the patch uses the __in_dev_get()
    routine incorrectly (proper usage, with
    in_dev_get(), requires locking in a fast
    path) -- it can oops an SMP kernel if a
    device gets removed right when we're
    deciding whether to send an ICMP/RST.

    Another note: probably should test if
    __in_dev_get() returned something != NULL,
    too.

    --- linux-2.6.11.orig/include/linux/inetdevice.h Wed Mar 2 10:38:13 2005
    +++ linux-2.6.11/include/linux/inetdevice.h Tue May 17 19:21:11 2005
    @@ -29,6 +29,7 @@ struct ipv4_devconf
             int no_xfrm;
             int no_policy;
             int force_igmp_version;
    +        int blackhole;
             void *sysctl;
     };
     
    @@ -71,6 +72,7 @@ struct in_device
     #define IN_DEV_SEC_REDIRECTS(in_dev) (ipv4_devconf.secure_redirects || (in_dev)->cnf.secure_redirects)
     #define IN_DEV_IDTAG(in_dev) ((in_dev)->cnf.tag)
     #define IN_DEV_MEDIUM_ID(in_dev) ((in_dev)->cnf.medium_id)
    +#define IN_DEV_BLACKHOLE(in_dev) (ipv4_devconf.blackhole || ((in_dev) && (in_dev)->cnf.blackhole))
     
     #define IN_DEV_RX_REDIRECTS(in_dev) \
             ((IN_DEV_FORWARD(in_dev) && \
    --- linux-2.6.11.orig/include/linux/sysctl.h Tue May 17 19:04:21 2005
    +++ linux-2.6.11/include/linux/sysctl.h Tue May 17 19:22:58 2005
    @@ -399,6 +399,7 @@ enum
             NET_IPV4_CONF_FORCE_IGMP_VERSION=17,
             NET_IPV4_CONF_ARP_ANNOUNCE=18,
             NET_IPV4_CONF_ARP_IGNORE=19,
    +        NET_IPV4_CONF_BLACKHOLE=20,
     };
     
     /* /proc/sys/net/ipv4/netfilter */
    --- linux-2.6.11.orig/net/ipv4/devinet.c Wed Mar 2 10:37:50 2005
    +++ linux-2.6.11/net/ipv4/devinet.c Tue May 17 20:20:21 2005
    @@ -1212,7 +1212,7 @@ int ipv4_doint_and_flush_strategy(ctl_ta
     
     static struct devinet_sysctl_table {
             struct ctl_table_header *sysctl_header;
    -        ctl_table devinet_vars[20];
    +        ctl_table devinet_vars[21];
             ctl_table devinet_dev[2];
             ctl_table devinet_conf_dir[2];
             ctl_table devinet_proto_dir[2];
    @@ -1373,6 +1373,14 @@ static struct devinet_sysctl_table {
                             .mode = 0644,
                             .proc_handler = &ipv4_doint_and_flush,
                             .strategy = &ipv4_doint_and_flush_strategy,
    +                },
    +                {
    +                        .ctl_name = NET_IPV4_CONF_BLACKHOLE,
    +                        .procname = "blackhole",
    +                        .data = &ipv4_devconf.blackhole,
    +                        .maxlen = sizeof(int),
    +                        .mode = 0644,
    +                        .proc_handler = &proc_dointvec,
                     },
             },
             .devinet_dev = {
    --- linux-2.6.11.orig/net/ipv4/tcp_ipv4.c Wed Mar 2 10:37:54 2005
    +++ linux-2.6.11/net/ipv4/tcp_ipv4.c Tue May 17 20:34:13 2005
    @@ -1166,6 +1166,9 @@ static void tcp_v4_send_reset(struct sk_
             if (((struct rtable *)skb->dst)->rt_type != RTN_LOCAL)
                     return;
     
    +        if (IN_DEV_BLACKHOLE(__in_dev_get(skb->input_dev)))
    +                return;
    +
             /* Swap the send and the receive. */
             memset(&rth, 0, sizeof(struct tcphdr));
             rth.dest = th->source;
    --- linux-2.6.11.orig/net/ipv4/udp.c Wed Mar 2 10:37:49 2005
    +++ linux-2.6.11/net/ipv4/udp.c Tue May 17 20:23:42 2005
    @@ -1168,7 +1168,8 @@ int udp_rcv(struct sk_buff *skb)
                     goto csum_error;
     
             UDP_INC_STATS_BH(UDP_MIB_NOPORTS);
    -        icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
    +        if (!IN_DEV_BLACKHOLE(__in_dev_get(skb->input_dev)))
    +                icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
     
             /*
              * Hmm. We got an UDP packet to a port to which we


