Introduction

JVNVU#91510483, "Denial-of-service (DoS) vulnerability in multiple TCP implementations",
has been announced, so let's look into what kind of vulnerability it is.

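Resource exhaustion vulnerability (CWE-400) - CVE-2018-5390
It has been reported that a denial-of-service (DoS) attack against the Linux kernel is possible by sending packets crafted so that tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() are executed for every incoming packet.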

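An attacker exploits this vulnerability by sending crafted packets within a TCP session. To sustain the DoS condition, the attacker has to keep the target processing crafted packets, so the attack cannot be carried out with a spoofed source IP address.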

Affected systems
Linux kernel version 4.9 and later

The source of the patch for this vulnerability is in the merge commit "Merge branch 'tcp-robust-ooo'".

TCP header

https://elixir.bootlin.com/linux/v4.9/source/include/uapi/linux/tcp.h#L24

include/uapi/linux/tcp.h

struct tcphdr {
    __be16  source;
    __be16  dest;
    __be32  seq;
    __be32  ack_seq;
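    /* Bitfield order shown is the __BIG_ENDIAN_BITFIELD variant; the
     * header lists these fields in a different order under
     * __LITTLE_ENDIAN_BITFIELD.
     */
    __u16   doff:4,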
        res1:4,
        cwr:1,
        ece:1,
        urg:1,
        ack:1,
        psh:1,
        rst:1,
        syn:1,
        fin:1;
    __be16  window;
    __sum16 check;
    __be16  urg_ptr;
};
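
Since the out-of-order queue discussed below is keyed on the seq field of this header, it may help to see how that field sits on the wire. The following is only a userspace sketch, assuming a raw header buffer of at least 20 bytes; the helper names (tcp_raw_seq and so on) are made up for illustration and are not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers (not kernel APIs): read fields from a raw TCP
 * header in network byte order without relying on bitfield layout.
 */

/* Sequence number: bytes 4..7 of the header, big-endian. */
static uint32_t tcp_raw_seq(const uint8_t *hdr)
{
    return ((uint32_t)hdr[4] << 24) | ((uint32_t)hdr[5] << 16) |
           ((uint32_t)hdr[6] << 8) | (uint32_t)hdr[7];
}

/* Header length in bytes: doff is the upper 4 bits of byte 12 and
 * counts 32-bit words.
 */
static size_t tcp_raw_header_len(const uint8_t *hdr)
{
    return (size_t)(hdr[12] >> 4) * 4;
}

/* Flag test on byte 13: FIN=0x01, SYN=0x02, RST=0x04, PSH=0x08,
 * ACK=0x10, URG=0x20, ECE=0x40, CWR=0x80.
 */
static int tcp_raw_flag_set(const uint8_t *hdr, uint8_t mask)
{
    return (hdr[13] & mask) != 0;
}
```

Reading the bytes by hand avoids the endian-dependent bitfield order of struct tcphdr, which is why the sketch does not cast the buffer to the struct.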

Call path to the vulnerable functions

The vulnerable functions are tcp_collapse_ofo_queue() and tcp_prune_ofo_queue(). They are reached through the following call path.

tcp_v4_rcv()→tcp_rcv_established()→tcp_data_queue()→tcp_data_queue_ofo()→tcp_try_rmem_schedule()

The two vulnerable functions are called from tcp_try_rmem_schedule().
1) tcp_collapse_ofo_queue()
tcp_try_rmem_schedule()→tcp_prune_queue()→tcp_collapse_ofo_queue()

2) tcp_prune_ofo_queue()
tcp_try_rmem_schedule()→tcp_prune_queue()→tcp_prune_ofo_queue()
and
tcp_try_rmem_schedule()→tcp_prune_ofo_queue()

tcp_collapse_ofo_queue()

https://elixir.bootlin.com/linux/v4.9/source/net/ipv4/tcp_input.c#L4901

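"ofo" stands for out-of-order.
TCP segments may arrive in any order, and the receiver must be able to put them back into the original order.
Segments whose sequence numbers are not yet in order are held in the out-of-order queue, a red-black tree keyed by sequence number.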
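
The source further down compares sequence numbers with before() and after(). Sequence numbers are 32-bit and wrap around, so a plain < would give the wrong answer near the wrap point. Below is a userspace restatement of that comparison as a minimal sketch; the names seq_before and seq_after are my own, chosen so as not to clash with the kernel helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "seq1 comes before seq2".
 * The unsigned difference is reinterpreted as signed, so the result is
 * correct as long as the two values are less than 2^31 apart.
 * (The uint32_t -> int32_t conversion is implementation-defined in C,
 * but behaves as two's complement on mainstream compilers.)
 */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* "seq1 comes after seq2" is the same test with the arguments swapped. */
static bool seq_after(uint32_t seq1, uint32_t seq2)
{
    return seq_before(seq2, seq1);
}
```

For example, seq_before(0xFFFFFFF0, 0x10) is true even though 0xFFFFFFF0 is numerically larger, because 0x10 lies just past the wrap point.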

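tcp_collapse_ofo_queue() walks this red-black tree looking for runs of skbs whose sequence ranges are contiguous or overlapping, and calls tcp_collapse() on each run to repack it into fewer skbs and rebuild that part of the tree.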

If packets are sent with sequence numbers chosen so that tcp_collapse() runs every time a packet arrives, the system apparently ends up in a DoS state.

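The patch appears to tell attack traffic apart by looking not only at the sequence numbers but also at the size of the queued data: a range that consists of a single tiny skb is no longer collapsed.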

Below is the source with the patch shown as a diff (+ added, - removed).

net/ipv4/tcp_input.c

/* Collapse ofo queue. Algorithm: select contiguous sequence of skbs
 * and tcp_collapse() them until all the queue is collapsed.
 */
static void tcp_collapse_ofo_queue(struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);
+   u32 range_truesize, sum_tiny = 0;
    struct sk_buff *skb, *head;
    struct rb_node *p;
    u32 start, end;

    p = rb_first(&tp->out_of_order_queue);
    skb = rb_entry_safe(p, struct sk_buff, rbnode);
new_range:
    if (!skb) {
        p = rb_last(&tp->out_of_order_queue);
        /* Note: This is possible p is NULL here. We do not
         * use rb_entry_safe(), as ooo_last_skb is valid only
         * if rbtree is not empty.
         */
        tp->ooo_last_skb = rb_entry(p, struct sk_buff, rbnode);
        return;
    }
    start = TCP_SKB_CB(skb)->seq;
    end = TCP_SKB_CB(skb)->end_seq;
+   range_truesize = skb->truesize;

    for (head = skb;;) {
        skb = skb_rb_next(skb);

        /* Range is terminated when we see a gap or when
         * we are at the queue end.
         */
        if (!skb ||
            after(TCP_SKB_CB(skb)->seq, end) ||
            before(TCP_SKB_CB(skb)->end_seq, start)) {
-           tcp_collapse(sk, NULL, &tp->out_of_order_queue,
-                    head, skb, start, end);
+           /* Do not attempt collapsing tiny skbs */
+           if (range_truesize != head->truesize ||
+               end - start >= SKB_WITH_OVERHEAD(SK_MEM_QUANTUM)) {
+               tcp_collapse(sk, NULL, &tp->out_of_order_queue,
+                        head, skb, start, end);
+           } else {
+               sum_tiny += range_truesize;
+               if (sum_tiny > sk->sk_rcvbuf >> 3)
+                   return;
+           }
            goto new_range;
        }

+       range_truesize += skb->truesize;
        if (unlikely(before(TCP_SKB_CB(skb)->seq, start)))
            start = TCP_SKB_CB(skb)->seq;
        if (after(TCP_SKB_CB(skb)->end_seq, end))
            end = TCP_SKB_CB(skb)->end_seq;
    }
}
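
To make the added condition easier to read, here is a small userspace model of the decision the patch introduces. It is only a sketch: struct model_range, the function names and MODEL_TINY_LIMIT are hypothetical, and the limit is a placeholder for SKB_WITH_OVERHEAD(SK_MEM_QUANTUM), whose real value depends on the kernel configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: placeholder standing in for
 * SKB_WITH_OVERHEAD(SK_MEM_QUANTUM); the real value is build-dependent.
 */
#define MODEL_TINY_LIMIT 3776u

/* One candidate range found while walking the out-of-order queue. */
struct model_range {
    uint32_t start;          /* lowest sequence number in the range */
    uint32_t end;            /* highest end sequence number in the range */
    uint32_t head_truesize;  /* truesize of the first skb */
    uint32_t range_truesize; /* summed truesize of all skbs in the range */
};

/* Mirrors the patched condition: collapse only if the range holds more
 * than one skb (sum differs from the head) or covers enough sequence
 * space to be worth the work.
 */
static bool model_worth_collapsing(const struct model_range *r)
{
    return r->range_truesize != r->head_truesize ||
           r->end - r->start >= MODEL_TINY_LIMIT;
}

/* Mirrors the sum_tiny budget: returns true when the walk should stop
 * because skipped tiny ranges exceed 1/8 of the receive buffer.
 */
static bool model_account_tiny(uint32_t *sum_tiny, uint32_t range_truesize,
                               uint32_t rcvbuf)
{
    *sum_tiny += range_truesize;
    return *sum_tiny > (rcvbuf >> 3);
}
```

In this model a lone skb carrying one byte of payload is skipped, while two adjacent skbs, or a single skb spanning more than the limit, still get collapsed.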

tcp_prune_ofo_queue()

https://elixir.bootlin.com/linux/v4.9/source/net/ipv4/tcp_input.c#L4954

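"Prune" means to trim, as in pruning a tree. This function frees memory by removing skbs from the red-black tree and dropping them, starting from the tail of the out-of-order queue (the highest sequence numbers).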

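The patch appears to defend against the attack by dropping at least one eighth (12.5 %) of the receive buffer size in each batch before the memory check is repeated, instead of running that check after every single skb.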

Below is the source with the patch shown as a diff.

net/ipv4/tcp_input.c

/*
 * Clean the out-of-order queue to make room.
 * We drop high sequences packets to :
 * 1) Let a chance for holes to be filled.
 * 2) not add too big latencies if thousands of packets sit there.
 *    (But if application shrinks SO_RCVBUF, we could still end up
 *     freeing whole queue here)
+ * 3) Drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks.
 *
 * Return true if queue has shrunk.
 */
static bool tcp_prune_ofo_queue(struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);
    struct rb_node *node, *prev;
+   int goal;

    if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
        return false;

    NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
+   goal = sk->sk_rcvbuf >> 3; /* 1/8 (12.5 %) of the receive buffer size, in bytes */

    node = &tp->ooo_last_skb->rbnode;
    do {
        prev = rb_prev(node);
        rb_erase(node, &tp->out_of_order_queue);
+       goal -= rb_to_skb(node)->truesize;
        tcp_drop(sk, rb_to_skb(node));
-       sk_mem_reclaim(sk);
-       if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
-           !tcp_under_memory_pressure(sk))
-           break;
+       if (!prev || goal <= 0) {
+           sk_mem_reclaim(sk);
+           if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
+               !tcp_under_memory_pressure(sk))
+               break;
+           goal = sk->sk_rcvbuf >> 3;
+       }
        node = prev;
    } while (node);
    tp->ooo_last_skb = rb_entry(prev, struct sk_buff, rbnode);

    /* Reset SACK state.  A conforming SACK implementation will
     * do the same at a timeout based retransmit.  When a connection
     * is in a sad state like this, we care only about integrity
     * of the connection not performance.
     */
    if (tp->rx_opt.sack_ok)
        tcp_sack_reset(&tp->rx_opt);
    return true;
}
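
To get a feel for what the goal variable changes, the loop can be modelled in userspace. This is a rough sketch under simplifying assumptions: every skb has the same truesize, tcp_under_memory_pressure() is ignored, and the names (struct prune_result, model_prune_old, model_prune_new) are invented for illustration.

```c
#include <stddef.h>

struct prune_result {
    size_t dropped; /* skbs removed from the tail of the queue */
    size_t checks;  /* times the reclaim + limit check ran */
};

/* Pre-patch behaviour: reclaim and re-check after every dropped skb. */
static struct prune_result model_prune_old(size_t nskb, long truesize,
                                           long rmem_alloc, long rcvbuf)
{
    struct prune_result r = {0, 0};

    while (r.dropped < nskb) {
        rmem_alloc -= truesize;
        r.dropped++;
        r.checks++;
        if (rmem_alloc <= rcvbuf)
            break;
    }
    return r;
}

/* Post-patch behaviour: only re-check once at least rcvbuf/8 bytes
 * have been dropped, or when the queue is empty.
 */
static struct prune_result model_prune_new(size_t nskb, long truesize,
                                           long rmem_alloc, long rcvbuf)
{
    struct prune_result r = {0, 0};
    long goal = rcvbuf >> 3;

    while (r.dropped < nskb) {
        rmem_alloc -= truesize;
        goal -= truesize;
        r.dropped++;
        if (r.dropped == nskb || goal <= 0) {
            r.checks++;
            if (rmem_alloc <= rcvbuf)
                break;
            goal = rcvbuf >> 3;
        }
    }
    return r;
}
```

With 64 queued skbs of truesize 256, a receive buffer of 8192 and 24576 bytes allocated, the old model runs the check 64 times and the new one 16 times. When the allocation is only one skb over the limit (8448), the old model drops a single skb while the new one drops four, so an attacker would need more packets to trigger the next prune.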