net/ipv4/tcp_ipv4.c defines struct inet_hashinfo tcp_hashinfo, which keeps track of all TCP sockets.

inet_hashinfo

struct inet_hashinfo {
	/* This is for sockets with full identity only.  Sockets here will
	 * always be without wildcards and will have the following invariant:
	 *
	 *          TCP_ESTABLISHED <= sk->sk_state < TCP_CLOSE
	 *
	 */
	struct inet_ehash_bucket	*ehash;
	spinlock_t			*ehash_locks;
	unsigned int			ehash_mask;
	unsigned int			ehash_locks_mask;

	/* Ok, let's try this, I give up, we do need a local binding
	 * TCP hash as well as the others for fast bind/connect.
	 */
	struct kmem_cache		*bind_bucket_cachep;
	struct inet_bind_hashbucket	*bhash;
	unsigned int			bhash_size;

	/* The 2nd listener table hashed by local port and address */
	unsigned int			lhash2_mask;
	struct inet_listen_hashbucket	*lhash2;

	/* All the above members are written once at bootup and
	 * never written again _or_ are predominantly read-access.
	 *
	 * Now align to a new cache line as all the following members
	 * might be often dirty.
	 */
	/* All sockets in TCP_LISTEN state will be in listening_hash.
	 * This is the only table where wildcard'd TCP sockets can
	 * exist.  listening_hash is only hashed by local port number.
	 * If lhash2 is initialized, the same socket will also be hashed
	 * to lhash2 by port and address.
	 */
	struct inet_listen_hashbucket	listening_hash[INET_LHTABLE_SIZE]
					____cacheline_aligned_in_smp;
};

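The definition of inet_hashinfo shows that, apart from the bind hash bhash used for fast bind/connect, the socket tables come in three parts:

ehash, the hash table of sockets with a full identity (the four-tuple), such as established TCP connections;
listening_hash, the hash table of TCP sockets in the LISTEN state, hashed only by listening port;
lhash2, a second table of LISTEN-state sockets, hashed by both listening port and address.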

ehash

1. Bucket definition

For TCP sockets whose connection is fully established, each inet_ehash_bucket in ehash is defined as:

struct inet_ehash_bucket {
	struct hlist_nulls_head chain;
};

Next, the list head inside each bucket:

struct hlist_nulls_head {
	struct hlist_nulls_node *first;
};

And each node of the list:

struct hlist_nulls_node {
	struct hlist_nulls_node *next, **pprev;
};

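This is an ordinary list node definition, except that the name contains "nulls": the end of the list is not a NULL pointer but a so-called nulls marker. If the lowest bit of a next pointer is 1, the value is not a valid pointer to a list node but a nulls marker, and shifting it right by one bit recovers a value that can be put to other uses. In ehash, that value is the index of the bucket the chain belongs to.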
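To make the nulls-marker encoding concrete, here is a minimal user-space sketch. The `toy_` names are hypothetical and only mirror the idea behind the kernel's hlist_nulls helpers: a chain ends in an odd "pointer" whose remaining bits carry a value. In ehash, a lockless reader can compare that end value with the bucket it started from and restart the lookup if they differ, because that means the socket it was following moved to another chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of struct hlist_nulls_node / _head. */
struct toy_nulls_node {
	struct toy_nulls_node *next, **pprev;
};

struct toy_nulls_head {
	struct toy_nulls_node *first;
};

/* Encode a value as an odd "pointer": bit 0 set means end of chain. */
static inline struct toy_nulls_node *toy_nulls_marker(uintptr_t value)
{
	assert(value <= (UINTPTR_MAX >> 1));	/* must survive the shift */
	return (struct toy_nulls_node *)((value << 1) | 1u);
}

static inline int toy_is_a_nulls(const struct toy_nulls_node *ptr)
{
	return ((uintptr_t)ptr & 1u) != 0;
}

/* Shift right by one to recover the value stored in the marker. */
static inline uintptr_t toy_get_nulls_value(const struct toy_nulls_node *ptr)
{
	return (uintptr_t)ptr >> 1;
}

static inline void toy_init_head(struct toy_nulls_head *h, uintptr_t nulls)
{
	h->first = toy_nulls_marker(nulls);
}

static inline void toy_add_head(struct toy_nulls_node *n, struct toy_nulls_head *h)
{
	struct toy_nulls_node *first = h->first;

	n->next = first;
	n->pprev = &h->first;
	if (!toy_is_a_nulls(first))
		first->pprev = &n->next;
	h->first = n;
}

/* Count the nodes in a chain and report the value found in its end marker. */
static size_t toy_walk(const struct toy_nulls_head *h, uintptr_t *end_value)
{
	const struct toy_nulls_node *pos;
	size_t count = 0;

	for (pos = h->first; !toy_is_a_nulls(pos); pos = pos->next)
		count++;
	*end_value = toy_get_nulls_value(pos);
	return count;
}
```

The marker is never dereferenced; only its bits are inspected, which is why an odd address can safely stand in for "end of list".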

2. Hash algorithm

The bucket is chosen by ANDing the given hash value with the table's ehash_mask:

static inline struct inet_ehash_bucket *inet_ehash_bucket(
	struct inet_hashinfo *hashinfo,
	unsigned int hash)
{
	return &hashinfo->ehash[hash & hashinfo->ehash_mask];
}
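Because the number of buckets is a power of two, hash & mask selects the same bucket as hash % size. The sketch below uses hypothetical helper names to show the equivalence, and why it breaks when the size is not a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bucket index computed the way inet_ehash_bucket() does. */
static inline uint32_t toy_bucket_index(uint32_t hash, uint32_t mask)
{
	return hash & mask;
}

/* True when size is a nonzero power of two, so size - 1 is all low bits set. */
static inline int toy_is_pow2(uint32_t size)
{
	return size != 0 && (size & (size - 1)) == 0;
}

/* For a power-of-two size, masking with size - 1 equals hash % size. */
static inline int toy_mask_equals_mod(uint32_t hash, uint32_t size)
{
	assert(toy_is_pow2(size));
	return toy_bucket_index(hash, size - 1) == hash % size;
}
```

With size 1000 the "mask" 999 is not all ones in binary, so 1000 & 999 gives 992 instead of 1000 % 1000 = 0; this is why the table size is kept at a power of two.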

The hash value passed in is computed by:

static u32 inet_ehashfn(const struct net *net, const __be32 laddr,
			const __u16 lport, const __be32 faddr,
			const __be16 fport)
{
	static u32 inet_ehash_secret __read_mostly;

	net_get_random_once(&inet_ehash_secret, sizeof(inet_ehash_secret));

	return __inet_ehashfn(laddr, lport, faddr, fport,
			      inet_ehash_secret + net_hash_mix(net));
}

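inet_ehash_secret is a random number that net_get_random_once() initializes only once. net_hash_mix(net) mixes the network namespace into the seed. The hash function that is ultimately used is: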

static inline unsigned int __inet_ehashfn(const __be32 laddr,
					  const __u16 lport,
					  const __be32 faddr,
					  const __be16 fport,
					  u32 initval)
{
	return jhash_3words((__force __u32) laddr,
			    (__force __u32) faddr,
			    ((__u32) lport) << 16 | (__force __u32)fport,
			    initval);
}

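I won't go further into the hash algorithm itself; in short, the value is computed from three inputs: the TCP connection's four-tuple, inet_ehash_secret, and the network namespace.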
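For a feel of how the three inputs combine, here is a user-space sketch of jhash_3words plus an inet_ehashfn-like wrapper. The mixing constants follow my reading of include/linux/jhash.h in recent kernels and are reproduced from memory, so treat this as an illustration of the structure rather than a bit-exact reference. `toy_ehashfn`, `secret` and `net_mix` are hypothetical stand-ins for inet_ehashfn, inet_ehash_secret and net_hash_mix(net). As in __inet_ehashfn, the two ports are packed into one 32-bit word (local port in the high half); the kernel passes the addresses and the foreign port in network byte order without swapping, and the sketch simply treats them as plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left; every call site below uses a shift in 1..31. */
static inline uint32_t toy_rol32(uint32_t word, unsigned int shift)
{
	assert(shift > 0 && shift < 32);	/* shift 0 would make word >> 32 undefined */
	return (word << shift) | (word >> (32 - shift));
}

/* Modeled on jhash_3words() and __jhash_final() (assumed constants). */
static inline uint32_t toy_jhash_3words(uint32_t a, uint32_t b, uint32_t c,
					uint32_t initval)
{
	/* 0xdeadbeef plays the role of JHASH_INITVAL; (3 << 2) folds in the length. */
	initval += 0xdeadbeefu + (3u << 2);
	a += initval;
	b += initval;
	c += initval;

	c ^= b; c -= toy_rol32(b, 14);
	a ^= c; a -= toy_rol32(c, 11);
	b ^= a; b -= toy_rol32(a, 25);
	c ^= b; c -= toy_rol32(b, 16);
	a ^= c; a -= toy_rol32(c, 4);
	b ^= a; b -= toy_rol32(a, 14);
	c ^= b; c -= toy_rol32(b, 24);
	return c;
}

/* Hypothetical inet_ehashfn(): the seed is secret + namespace mix. */
static uint32_t toy_ehashfn(uint32_t laddr, uint16_t lport,
			    uint32_t faddr, uint16_t fport,
			    uint32_t secret, uint32_t net_mix)
{
	return toy_jhash_3words(laddr, faddr,
				((uint32_t)lport << 16) | fport,
				secret + net_mix);
}
```

Since the namespace only enters through the seed, the same four-tuple in two namespaces with different net_hash_mix values usually lands in different buckets, while an attacker who does not know inet_ehash_secret cannot predict the bucket.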

3. Initializing ehash

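tcp_init in tcp.c initializes the whole tcp_hashinfo table, and alloc_large_system_hash allocates ehash and bhash. Its implementation shows that the table size is always a power of two and that ehash_mask is that size minus one. So ehash_mask doubles as the capacity (the table has ehash_mask + 1 buckets), and hash & ehash_mask is simply the hash modulo the table size.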
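A minimal sketch of that initialization, assuming a requested bucket count instead of the memory-based sizing that alloc_large_system_hash actually performs. The `toy_` names are hypothetical. The loop that stores the bucket index i in each chain's nulls marker mirrors what tcp_init does with INIT_HLIST_NULLS_HEAD.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_nulls_node { struct toy_nulls_node *next, **pprev; };
struct toy_ehash_bucket { struct toy_nulls_node *first; };

struct toy_ehash {
	struct toy_ehash_bucket *buckets;
	unsigned int mask;	/* bucket count is mask + 1, a power of two */
};

/* Round n (1 <= n <= 2^31) up to the next power of two. */
static unsigned int toy_roundup_pow2(unsigned int n)
{
	unsigned int size = 1;

	assert(n >= 1 && n <= (1u << 31));	/* larger n would overflow size */
	while (size < n)
		size <<= 1;
	return size;
}

/* Allocate the table; bucket i starts as an empty chain ending in nulls marker i. */
static int toy_ehash_init(struct toy_ehash *t, unsigned int wanted)
{
	unsigned int size = toy_roundup_pow2(wanted ? wanted : 1);
	unsigned int i;

	t->buckets = calloc(size, sizeof(*t->buckets));
	if (!t->buckets)
		return -1;
	t->mask = size - 1;
	for (i = 0; i <= t->mask; i++)
		t->buckets[i].first =
			(struct toy_nulls_node *)(((uintptr_t)i << 1) | 1u);
	return 0;
}
```

Asking for 1000 buckets yields 1024 and a mask of 1023, so every bucket index fits in the low ten bits of the hash.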

4. Modifying ehash

a. Inserting sockets into ehash

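Working backward from the functions that remove sockets from ehash shows that it is tcp_conn_request that inserts into the hash table, but what it inserts is a struct request_sock *. After the three-way handshake completes, tcp_v4_syn_recv_sock inserts the new struct sock * and removes the previously inserted struct request_sock *.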
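The swap from request socket to full socket can be sketched as a replace within one bucket: both share the same four-tuple, hence the same hash and the same chain. This is a single-threaded model with hypothetical `toy_` types, and for brevity it ends chains with plain NULL instead of a nulls marker. Following my reading of inet_ehash_insert() in recent kernels (reached from tcp_v4_syn_recv_sock via inet_ehash_nolisten), the old entry is unlinked first and the new one is added only if the old one was still hashed; the kernel does this under the bucket's lock from ehash_locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct request_sock and struct sock. */
enum toy_kind { TOY_REQUEST, TOY_FULL };

struct toy_sock {
	struct toy_sock *next, **pprev;	/* pprev == NULL means "not hashed" */
	enum toy_kind kind;
	uint32_t hash;
};

struct toy_bucket {
	struct toy_sock *first;		/* plain NULL terminates the chain here */
};

static void toy_add_head(struct toy_sock *s, struct toy_bucket *b)
{
	s->next = b->first;
	s->pprev = &b->first;
	if (b->first)
		b->first->pprev = &s->next;
	b->first = s;
}

/* Unlink s if it is hashed; report whether it was. */
static bool toy_del_init(struct toy_sock *s)
{
	if (!s->pprev)
		return false;
	*s->pprev = s->next;
	if (s->next)
		s->next->pprev = s->pprev;
	s->next = NULL;
	s->pprev = NULL;
	return true;
}

/*
 * Replace old_sk (the request) with new_sk (the full socket). If old_sk was
 * already unhashed, for example by its timer, new_sk is not inserted.
 */
static bool toy_ehash_replace(struct toy_bucket *b, struct toy_sock *new_sk,
			      struct toy_sock *old_sk)
{
	assert(new_sk->hash == old_sk->hash);	/* same four-tuple, same bucket */
	if (!toy_del_init(old_sk))
		return false;
	toy_add_head(new_sk, b);
	return true;
}
```

Checking the return value of the unlink is what keeps a request that already expired from being "revived" as a full socket.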

b. Removing sockets from ehash

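reqsk_queue_unlink removes a struct request_sock * and also clears that request_sock's timer;
inet_unhash removes a socket; it handles not only sockets in ehash but also sockets in the listening tables, but unlike reqsk_queue_unlink it does not clear any associated timer;
inet_twsk_hashdance removes a socket when it enters TIME_WAIT, replacing it in ehash with the timewait socket;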
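All of these paths end up unlinking a node through its pprev field, which points at whatever pointer currently refers to the node, so removal needs neither the bucket head nor a search. A minimal single-threaded sketch with hypothetical `toy_` names, loosely following the kernel's hlist_nulls_del_init_rcu: the removed node's next pointer is deliberately left alone so that a lockless reader already standing on it can still walk forward to the end marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_nulls_node {
	struct toy_nulls_node *next, **pprev;	/* pprev == NULL: unhashed */
};

struct toy_nulls_head {
	struct toy_nulls_node *first;
};

static inline int toy_is_a_nulls(const struct toy_nulls_node *p)
{
	return ((uintptr_t)p & 1u) != 0;
}

static inline void toy_init_head(struct toy_nulls_head *h, uintptr_t nulls)
{
	assert(nulls <= (UINTPTR_MAX >> 1));
	h->first = (struct toy_nulls_node *)((nulls << 1) | 1u);
}

static inline void toy_add_head(struct toy_nulls_node *n, struct toy_nulls_head *h)
{
	n->next = h->first;
	n->pprev = &h->first;
	if (!toy_is_a_nulls(h->first))
		h->first->pprev = &n->next;
	h->first = n;
}

/*
 * Unlink n without knowing its bucket: *n->pprev is the pointer that
 * referred to n. n->next is kept so a concurrent reader on n can move on.
 */
static inline void toy_del_init(struct toy_nulls_node *n)
{
	struct toy_nulls_node *next = n->next;

	if (!n->pprev)
		return;				/* already unhashed */
	*n->pprev = next;
	if (!toy_is_a_nulls(next))
		next->pprev = n->pprev;
	n->pprev = NULL;
}
```

The double pointer is what makes the first node and a middle node look the same to the delete routine: for the first node, pprev points at the head's first field; otherwise it points at the previous node's next field.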

Computers · December 29, 2021

TCP hash tables in the kernel
