Discussion:
[Bridge] [PATCH net-next v5 0/3] bridge: neigh msg proxy and flood suppression support
Roopa Prabhu
2017-10-06 18:34:37 UTC
Permalink
From: Roopa Prabhu <***@cumulusnetworks.com>

This series implements arp and nd suppression in the bridge
driver for ethernet vpns. It implements rfc7432, section 10
https://tools.ietf.org/html/rfc7432#section-10
for ethernet VPN deployments. It is similar to the existing
BR_PROXYARP* flags but has a few semantic differences to conform
to EVPN standard. Unlike the existing flags, this new flag suppresses
flood of all neigh discovery packets (arp and nd) to tunnel ports.
Supports both vlan filtering and non-vlan filtering bridges.

In case of EVPN, it is mainly used to avoid flooding
of arp and nd packets to tunnel ports like vxlan.

v2 : rebase to latest + address some optimization feedback from Nikolay.
v3 : fix kbuild reported build errors with CONFIG_INET off
v4 : simplify port flag mask as suggested by stephen
v5 : address some feedback from Toshiaki

Roopa Prabhu (3):
bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd
flood
neigh arp suppress first
bridge: suppress nd messages from going to BR_NEIGH_SUPPRESS ports

include/linux/if_bridge.h | 1 +
include/uapi/linux/if_link.h | 1 +
net/bridge/Makefile | 2 +-
net/bridge/br_arp_nd_proxy.c | 492 +++++++++++++++++++++++++++++++++++++++++++
net/bridge/br_device.c | 18 ++
net/bridge/br_forward.c | 3 +-
net/bridge/br_if.c | 5 +
net/bridge/br_input.c | 73 ++-----
net/bridge/br_netlink.c | 16 +-
net/bridge/br_private.h | 9 +
net/bridge/br_sysfs_if.c | 2 +
11 files changed, 561 insertions(+), 61 deletions(-)
create mode 100644 net/bridge/br_arp_nd_proxy.c
--
2.1.4
Roopa Prabhu
2017-10-06 18:34:38 UTC
Permalink
From: Roopa Prabhu <***@cumulusnetworks.com>

This patch adds a new bridge port flag BR_NEIGH_SUPPRESS to
suppress arp and nd flood on bridge ports. It implements
rfc7432, section 10.
https://tools.ietf.org/html/rfc7432#section-10
for ethernet VPN deployments. It is similar to the existing
BR_PROXYARP* flags but has a few semantic differences to conform
to EVPN standard. Unlike the existing flags, this new flag suppresses
flood of all neigh discovery packets (arp and nd) to tunnel ports.
Supports both vlan filtering and non-vlan filtering bridges.

In case of EVPN, it is mainly used to avoid flooding
of arp and nd packets to tunnel ports like vxlan.

This patch adds netlink and sysfs support to set this bridge port
flag.

Signed-off-by: Roopa Prabhu <***@cumulusnetworks.com>
---
include/linux/if_bridge.h | 1 +
include/uapi/linux/if_link.h | 1 +
net/bridge/Makefile | 2 +-
net/bridge/br_arp_nd_proxy.c | 32 ++++++++++++++++++++++++++++++++
net/bridge/br_forward.c | 2 +-
net/bridge/br_if.c | 5 +++++
net/bridge/br_netlink.c | 10 +++++++++-
net/bridge/br_private.h | 2 ++
net/bridge/br_sysfs_if.c | 2 ++
9 files changed, 54 insertions(+), 3 deletions(-)
create mode 100644 net/bridge/br_arp_nd_proxy.c

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 3cd18ac..316ee11 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -49,6 +49,7 @@ struct br_ip_list {
#define BR_MULTICAST_TO_UNICAST BIT(12)
#define BR_VLAN_TUNNEL BIT(13)
#define BR_BCAST_FLOOD BIT(14)
+#define BR_NEIGH_SUPPRESS BIT(15)

#define BR_DEFAULT_AGEING_TIME (300 * HZ)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index cd580fc..b037e0a 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -327,6 +327,7 @@ enum {
IFLA_BRPORT_VLAN_TUNNEL,
IFLA_BRPORT_BCAST_FLOOD,
IFLA_BRPORT_GROUP_FWD_MASK,
+ IFLA_BRPORT_NEIGH_SUPPRESS,
__IFLA_BRPORT_MAX
};
#define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/Makefile b/net/bridge/Makefile
index 40b1ede..4aee55f 100644
--- a/net/bridge/Makefile
+++ b/net/bridge/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_BRIDGE) += bridge.o
bridge-y := br.o br_device.o br_fdb.o br_forward.o br_if.o br_input.o \
br_ioctl.o br_stp.o br_stp_bpdu.o \
br_stp_if.o br_stp_timer.o br_netlink.o \
- br_netlink_tunnel.o
+ br_netlink_tunnel.o br_arp_nd_proxy.o

bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o

diff --git a/net/bridge/br_arp_nd_proxy.c b/net/bridge/br_arp_nd_proxy.c
new file mode 100644
index 0000000..f889ad5
--- /dev/null
+++ b/net/bridge/br_arp_nd_proxy.c
@@ -0,0 +1,32 @@
+/*
+ * Handle bridge arp/nd proxy/suppress
+ *
+ * Copyright (C) 2017 Cumulus Networks
+ * Copyright (c) 2017 Roopa Prabhu <***@cumulusnetworks.com>
+ *
+ * Authors:
+ * Roopa Prabhu <***@cumulusnetworks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include "br_private.h"
+
+void br_recalculate_neigh_suppress_enabled(struct net_bridge *br)
+{
+ struct net_bridge_port *p;
+ bool neigh_suppress = false;
+
+ list_for_each_entry(p, &br->port_list, list) {
+ if (p->flags & BR_NEIGH_SUPPRESS) {
+ neigh_suppress = true;
+ break;
+ }
+ }
+
+ br->neigh_suppress_enabled = neigh_suppress;
+}
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 48fb174..b4eed11 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -204,7 +204,7 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
/* Do not flood to ports that enable proxy ARP */
if (p->flags & BR_PROXYARP)
continue;
- if ((p->flags & BR_PROXYARP_WIFI) &&
+ if ((p->flags & (BR_PROXYARP_WIFI | BR_NEIGH_SUPPRESS)) &&
BR_INPUT_SKB_CB(skb)->proxyarp_replied)
continue;

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 59a74a4..ae38547 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -310,6 +310,8 @@ void br_dev_delete(struct net_device *dev, struct list_head *head)
del_nbp(p);
}

+ br_recalculate_neigh_suppress_enabled(br);
+
br_fdb_delete_by_port(br, NULL, 0, 1);

cancel_delayed_work_sync(&br->gc_work);
@@ -660,4 +662,7 @@ void br_port_flags_change(struct net_bridge_port *p, unsigned long mask)

if (mask & BR_AUTO_MASK)
nbp_update_port_count(br);
+
+ if (mask & BR_NEIGH_SUPPRESS)
+ br_recalculate_neigh_suppress_enabled(br);
}
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index dea88a2..f0e8268 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -138,6 +138,7 @@ static inline size_t br_port_info_size(void)
+ nla_total_size(1) /* IFLA_BRPORT_PROXYARP */
+ nla_total_size(1) /* IFLA_BRPORT_PROXYARP_WIFI */
+ nla_total_size(1) /* IFLA_BRPORT_VLAN_TUNNEL */
+ + nla_total_size(1) /* IFLA_BRPORT_NEIGH_SUPPRESS */
+ nla_total_size(sizeof(struct ifla_bridge_id)) /* IFLA_BRPORT_ROOT_ID */
+ nla_total_size(sizeof(struct ifla_bridge_id)) /* IFLA_BRPORT_BRIDGE_ID */
+ nla_total_size(sizeof(u16)) /* IFLA_BRPORT_DESIGNATED_PORT */
@@ -210,7 +211,9 @@ static int br_port_fill_attrs(struct sk_buff *skb,
nla_put_u8(skb, IFLA_BRPORT_CONFIG_PENDING, p->config_pending) ||
nla_put_u8(skb, IFLA_BRPORT_VLAN_TUNNEL, !!(p->flags &
BR_VLAN_TUNNEL)) ||
- nla_put_u16(skb, IFLA_BRPORT_GROUP_FWD_MASK, p->group_fwd_mask))
+ nla_put_u16(skb, IFLA_BRPORT_GROUP_FWD_MASK, p->group_fwd_mask) ||
+ nla_put_u8(skb, IFLA_BRPORT_NEIGH_SUPPRESS,
+ !!(p->flags & BR_NEIGH_SUPPRESS)))
return -EMSGSIZE;

timerval = br_timer_value(&p->message_age_timer);
@@ -785,6 +788,11 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
p->group_fwd_mask = fwd_mask;
}

+ err = br_set_port_flag(p, tb, IFLA_BRPORT_NEIGH_SUPPRESS,
+ BR_NEIGH_SUPPRESS);
+ if (err)
+ return err;
+
br_port_flags_change(p, old_flags ^ p->flags);
return 0;
}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index ab4df24..00fa371 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -404,6 +404,7 @@ struct net_bridge {
#ifdef CONFIG_NET_SWITCHDEV
int offload_fwd_mark;
#endif
+ bool neigh_suppress_enabled;
};

struct br_input_skb_cb {
@@ -1139,4 +1140,5 @@ static inline void br_switchdev_frame_unmark(struct sk_buff *skb)
}
#endif /* CONFIG_NET_SWITCHDEV */

+void br_recalculate_neigh_suppress_enabled(struct net_bridge *br);
#endif
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 9110d5e..0a1fa9c 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -191,6 +191,7 @@ BRPORT_ATTR_FLAG(proxyarp, BR_PROXYARP);
BRPORT_ATTR_FLAG(proxyarp_wifi, BR_PROXYARP_WIFI);
BRPORT_ATTR_FLAG(multicast_flood, BR_MCAST_FLOOD);
BRPORT_ATTR_FLAG(broadcast_flood, BR_BCAST_FLOOD);
+BRPORT_ATTR_FLAG(neigh_suppress, BR_NEIGH_SUPPRESS);

#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
static ssize_t show_multicast_router(struct net_bridge_port *p, char *buf)
@@ -241,6 +242,7 @@ static const struct brport_attribute *brport_attrs[] = {
&brport_attr_multicast_flood,
&brport_attr_broadcast_flood,
&brport_attr_group_fwd_mask,
+ &brport_attr_neigh_suppress,
NULL
};
--
2.1.4
Roopa Prabhu
2017-10-06 18:34:39 UTC
Permalink
From: Roopa Prabhu <***@cumulusnetworks.com>

This patch avoids flooding and proxies arp packets
for BR_NEIGH_SUPPRESS ports.

Moves existing br_do_proxy_arp to br_do_proxy_suppress_arp
to support both proxy arp and neigh suppress.

Signed-off-by: Roopa Prabhu <***@cumulusnetworks.com>
---
net/bridge/br_arp_nd_proxy.c | 188 +++++++++++++++++++++++++++++++++++++++++++
net/bridge/br_device.c | 9 +++
net/bridge/br_input.c | 63 ++-------------
net/bridge/br_private.h | 3 +
4 files changed, 205 insertions(+), 58 deletions(-)

diff --git a/net/bridge/br_arp_nd_proxy.c b/net/bridge/br_arp_nd_proxy.c
index f889ad5..a79c182 100644
--- a/net/bridge/br_arp_nd_proxy.c
+++ b/net/bridge/br_arp_nd_proxy.c
@@ -14,6 +14,14 @@
*/

#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/neighbour.h>
+#include <net/arp.h>
+#include <linux/if_vlan.h>
+#include <linux/inetdevice.h>
+#include <net/addrconf.h>
+
#include "br_private.h"

void br_recalculate_neigh_suppress_enabled(struct net_bridge *br)
@@ -30,3 +38,183 @@ void br_recalculate_neigh_suppress_enabled(struct net_bridge *br)

br->neigh_suppress_enabled = neigh_suppress;
}
+
+#if IS_ENABLED(CONFIG_INET)
+static void br_arp_send(struct net_bridge *br, struct net_bridge_port *p,
+ struct net_device *dev, __be32 dest_ip, __be32 src_ip,
+ const unsigned char *dest_hw,
+ const unsigned char *src_hw,
+ const unsigned char *target_hw,
+ __be16 vlan_proto, u16 vlan_tci)
+{
+ struct net_bridge_vlan_group *vg;
+ struct sk_buff *skb;
+ u16 pvid;
+
+ netdev_dbg(dev, "arp send dev %s dst %pI4 dst_hw %pM src %pI4 src_hw %pM\n",
+ dev->name, &dest_ip, dest_hw, &src_ip, src_hw);
+
+ if (!vlan_tci) {
+ arp_send(ARPOP_REPLY, ETH_P_ARP, dest_ip, dev, src_ip,
+ dest_hw, src_hw, target_hw);
+ return;
+ }
+
+ skb = arp_create(ARPOP_REPLY, ETH_P_ARP, dest_ip, dev, src_ip,
+ dest_hw, src_hw, target_hw);
+ if (!skb)
+ return;
+
+ if (p)
+ vg = nbp_vlan_group_rcu(p);
+ else
+ vg = br_vlan_group_rcu(br);
+ pvid = br_get_pvid(vg);
+ if (pvid == (vlan_tci & VLAN_VID_MASK))
+ vlan_tci = 0;
+
+ if (vlan_tci)
+ __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
+
+ if (p) {
+ arp_xmit(skb);
+ } else {
+ skb_reset_mac_header(skb);
+ __skb_pull(skb, skb_network_offset(skb));
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+ skb->pkt_type = PACKET_HOST;
+
+ netif_rx_ni(skb);
+ }
+}
+
+static int br_chk_addr_ip(struct net_device *dev, void *data)
+{
+ __be32 ip = *(__be32 *)data;
+ struct in_device *in_dev;
+ __be32 addr = 0;
+
+ in_dev = __in_dev_get_rcu(dev);
+ if (in_dev)
+ addr = inet_confirm_addr(dev_net(dev), in_dev, 0, ip,
+ RT_SCOPE_HOST);
+
+ if (addr == ip)
+ return 1;
+
+ return 0;
+}
+
+static bool br_is_local_ip(struct net_device *dev, __be32 ip)
+{
+ if (br_chk_addr_ip(dev, &ip))
+ return true;
+
+ /* check if ip is configured on upper dev */
+ if (netdev_walk_all_upper_dev_rcu(dev, br_chk_addr_ip, &ip))
+ return true;
+
+ return false;
+}
+
+void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br,
+ u16 vid, struct net_bridge_port *p)
+{
+ struct net_device *dev = br->dev;
+ struct net_device *vlandev = dev;
+ struct neighbour *n;
+ struct arphdr *parp;
+ u8 *arpptr, *sha;
+ __be32 sip, tip;
+
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = false;
+
+ if ((dev->flags & IFF_NOARP) ||
+ !pskb_may_pull(skb, arp_hdr_len(dev)))
+ return;
+
+ parp = arp_hdr(skb);
+
+ if (parp->ar_pro != htons(ETH_P_IP) ||
+ parp->ar_hln != dev->addr_len ||
+ parp->ar_pln != 4)
+ return;
+
+ arpptr = (u8 *)parp + sizeof(struct arphdr);
+ sha = arpptr;
+ arpptr += dev->addr_len; /* sha */
+ memcpy(&sip, arpptr, sizeof(sip));
+ arpptr += sizeof(sip);
+ arpptr += dev->addr_len; /* tha */
+ memcpy(&tip, arpptr, sizeof(tip));
+
+ if (ipv4_is_loopback(tip) ||
+ ipv4_is_multicast(tip))
+ return;
+
+ if (br->neigh_suppress_enabled) {
+ if (p && (p->flags & BR_NEIGH_SUPPRESS))
+ return;
+ if (ipv4_is_zeronet(sip) || sip == tip) {
+ /* prevent flooding to neigh suppress ports */
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ return;
+ }
+ }
+
+ if (parp->ar_op != htons(ARPOP_REQUEST))
+ return;
+
+ if (vid != 0) {
+ vlandev = __vlan_find_dev_deep_rcu(br->dev, skb->vlan_proto,
+ vid);
+ if (!vlandev)
+ return;
+ }
+
+ if (br->neigh_suppress_enabled && br_is_local_ip(vlandev, tip)) {
+ /* its our local ip, so don't proxy reply
+ * and don't forward to neigh suppress ports
+ */
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ return;
+ }
+
+ n = neigh_lookup(&arp_tbl, &tip, vlandev);
+ if (n) {
+ struct net_bridge_fdb_entry *f;
+
+ if (!(n->nud_state & NUD_VALID)) {
+ neigh_release(n);
+ return;
+ }
+
+ f = br_fdb_find_rcu(br, n->ha, vid);
+ if (f) {
+ bool replied = false;
+
+ if ((p && (p->flags & BR_PROXYARP)) ||
+ (f->dst && (f->dst->flags & (BR_PROXYARP_WIFI |
+ BR_NEIGH_SUPPRESS)))) {
+ if (!vid)
+ br_arp_send(br, p, skb->dev, sip, tip,
+ sha, n->ha, sha, 0, 0);
+ else
+ br_arp_send(br, p, skb->dev, sip, tip,
+ sha, n->ha, sha,
+ skb->vlan_proto,
+ skb_vlan_tag_get(skb));
+ replied = true;
+ }
+
+ /* If we have replied or as long as we know the
+ * mac, indicate to arp replied
+ */
+ if (replied || br->neigh_suppress_enabled)
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ }
+
+ neigh_release(n);
+ }
+}
+#endif
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 7acb77c..eb30c6a 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -39,6 +39,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
struct pcpu_sw_netstats *brstats = this_cpu_ptr(br->stats);
const struct nf_br_ops *nf_ops;
const unsigned char *dest;
+ struct ethhdr *eth;
u16 vid = 0;

rcu_read_lock();
@@ -57,11 +58,19 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
BR_INPUT_SKB_CB(skb)->brdev = dev;

skb_reset_mac_header(skb);
+ eth = eth_hdr(skb);
skb_pull(skb, ETH_HLEN);

if (!br_allowed_ingress(br, br_vlan_group_rcu(br), skb, &vid))
goto out;

+ if (IS_ENABLED(CONFIG_INET) &&
+ (eth->h_proto == htons(ETH_P_ARP) ||
+ eth->h_proto == htons(ETH_P_RARP)) &&
+ br->neigh_suppress_enabled) {
+ br_do_proxy_suppress_arp(skb, br, vid, NULL);
+ }
+
dest = eth_hdr(skb)->h_dest;
if (is_broadcast_ether_addr(dest)) {
br_flood(br, skb, BR_PKT_BROADCAST, false, true);
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 7cb6137..4b8d2ec 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -71,62 +71,6 @@ static int br_pass_frame_up(struct sk_buff *skb)
br_netif_receive_skb);
}

-static void br_do_proxy_arp(struct sk_buff *skb, struct net_bridge *br,
- u16 vid, struct net_bridge_port *p)
-{
- struct net_device *dev = br->dev;
- struct neighbour *n;
- struct arphdr *parp;
- u8 *arpptr, *sha;
- __be32 sip, tip;
-
- BR_INPUT_SKB_CB(skb)->proxyarp_replied = false;
-
- if ((dev->flags & IFF_NOARP) ||
- !pskb_may_pull(skb, arp_hdr_len(dev)))
- return;
-
- parp = arp_hdr(skb);
-
- if (parp->ar_pro != htons(ETH_P_IP) ||
- parp->ar_op != htons(ARPOP_REQUEST) ||
- parp->ar_hln != dev->addr_len ||
- parp->ar_pln != 4)
- return;
-
- arpptr = (u8 *)parp + sizeof(struct arphdr);
- sha = arpptr;
- arpptr += dev->addr_len; /* sha */
- memcpy(&sip, arpptr, sizeof(sip));
- arpptr += sizeof(sip);
- arpptr += dev->addr_len; /* tha */
- memcpy(&tip, arpptr, sizeof(tip));
-
- if (ipv4_is_loopback(tip) ||
- ipv4_is_multicast(tip))
- return;
-
- n = neigh_lookup(&arp_tbl, &tip, dev);
- if (n) {
- struct net_bridge_fdb_entry *f;
-
- if (!(n->nud_state & NUD_VALID)) {
- neigh_release(n);
- return;
- }
-
- f = br_fdb_find_rcu(br, n->ha, vid);
- if (f && ((p->flags & BR_PROXYARP) ||
- (f->dst && (f->dst->flags & BR_PROXYARP_WIFI)))) {
- arp_send(ARPOP_REPLY, ETH_P_ARP, sip, skb->dev, tip,
- sha, n->ha, sha);
- BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
- }
-
- neigh_release(n);
- }
-}
-
/* note: already called with rcu_read_lock */
int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
@@ -171,8 +115,11 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb

BR_INPUT_SKB_CB(skb)->brdev = br->dev;

- if (IS_ENABLED(CONFIG_INET) && skb->protocol == htons(ETH_P_ARP))
- br_do_proxy_arp(skb, br, vid, p);
+ if (IS_ENABLED(CONFIG_INET) &&
+ (skb->protocol == htons(ETH_P_ARP) ||
+ skb->protocol == htons(ETH_P_RARP))) {
+ br_do_proxy_suppress_arp(skb, br, vid, p);
+ }

switch (pkt_type) {
case BR_PKT_MULTICAST:
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 00fa371..4e6b25b 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -1140,5 +1140,8 @@ static inline void br_switchdev_frame_unmark(struct sk_buff *skb)
}
#endif /* CONFIG_NET_SWITCHDEV */

+/* br_arp_nd_proxy.c */
void br_recalculate_neigh_suppress_enabled(struct net_bridge *br);
+void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br,
+ u16 vid, struct net_bridge_port *p);
#endif
--
2.1.4
Roopa Prabhu
2017-10-06 18:34:40 UTC
Permalink
From: Roopa Prabhu <***@cumulusnetworks.com>

This patch avoids flooding and proxies ndisc packets
for BR_NEIGH_SUPPRESS ports.

Signed-off-by: Roopa Prabhu <***@cumulusnetworks.com>
---
net/bridge/br_arp_nd_proxy.c | 249 +++++++++++++++++++++++++++++++++++++++++++
net/bridge/br_device.c | 11 ++
net/bridge/br_input.c | 11 ++
net/bridge/br_private.h | 3 +
4 files changed, 274 insertions(+)

diff --git a/net/bridge/br_arp_nd_proxy.c b/net/bridge/br_arp_nd_proxy.c
index a79c182..6b7440d 100644
--- a/net/bridge/br_arp_nd_proxy.c
+++ b/net/bridge/br_arp_nd_proxy.c
@@ -21,6 +21,9 @@
#include <linux/if_vlan.h>
#include <linux/inetdevice.h>
#include <net/addrconf.h>
+#if IS_ENABLED(CONFIG_IPV6)
+#include <net/ip6_checksum.h>
+#endif

#include "br_private.h"

@@ -218,3 +221,249 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br,
}
}
#endif
+
+#if IS_ENABLED(CONFIG_IPV6)
+struct nd_msg *br_is_nd_neigh_msg(struct sk_buff *skb, struct nd_msg *msg)
+{
+ struct nd_msg *m;
+
+ m = skb_header_pointer(skb, skb_network_offset(skb) +
+ sizeof(struct ipv6hdr), sizeof(*msg), msg);
+ if (!m)
+ return NULL;
+
+ if (m->icmph.icmp6_code != 0 ||
+ (m->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION &&
+ m->icmph.icmp6_type != NDISC_NEIGHBOUR_ADVERTISEMENT))
+ return NULL;
+
+ return m;
+}
+
+static void br_nd_send(struct net_bridge_port *p, struct sk_buff *request,
+ struct neighbour *n, __be16 vlan_proto, u16 vlan_tci,
+ struct nd_msg *ns)
+{
+ struct net_device *dev = request->dev;
+ struct sk_buff *reply;
+ struct nd_msg *na;
+ struct ipv6hdr *pip6;
+ u8 *daddr;
+ int na_olen = 8; /* opt hdr + ETH_ALEN for target */
+ int ns_olen;
+ int i, len;
+
+ if (!dev)
+ return;
+
+ len = LL_RESERVED_SPACE(dev) + sizeof(struct ipv6hdr) +
+ sizeof(*na) + na_olen + dev->needed_tailroom;
+
+ reply = alloc_skb(len, GFP_ATOMIC);
+ if (!reply)
+ return;
+
+ reply->protocol = htons(ETH_P_IPV6);
+ reply->dev = dev;
+ skb_reserve(reply, LL_RESERVED_SPACE(dev));
+ skb_push(reply, sizeof(struct ethhdr));
+ skb_set_mac_header(reply, 0);
+
+ daddr = eth_hdr(request)->h_source;
+
+ /* Do we need option processing ? */
+ ns_olen = request->len - (skb_network_offset(request) +
+ sizeof(struct ipv6hdr)) - sizeof(*ns);
+ for (i = 0; i < ns_olen - 1; i += (ns->opt[i + 1] << 3)) {
+ if (ns->opt[i] == ND_OPT_SOURCE_LL_ADDR) {
+ daddr = ns->opt + i + sizeof(struct nd_opt_hdr);
+ break;
+ }
+ }
+
+ /* Ethernet header */
+ ether_addr_copy(eth_hdr(reply)->h_dest, daddr);
+ ether_addr_copy(eth_hdr(reply)->h_source, n->ha);
+ eth_hdr(reply)->h_proto = htons(ETH_P_IPV6);
+ reply->protocol = htons(ETH_P_IPV6);
+
+ skb_pull(reply, sizeof(struct ethhdr));
+ skb_set_network_header(reply, 0);
+ skb_put(reply, sizeof(struct ipv6hdr));
+
+ /* IPv6 header */
+ pip6 = ipv6_hdr(reply);
+ memset(pip6, 0, sizeof(struct ipv6hdr));
+ pip6->version = 6;
+ pip6->priority = ipv6_hdr(request)->priority;
+ pip6->nexthdr = IPPROTO_ICMPV6;
+ pip6->hop_limit = 255;
+ pip6->daddr = ipv6_hdr(request)->saddr;
+ pip6->saddr = *(struct in6_addr *)n->primary_key;
+
+ skb_pull(reply, sizeof(struct ipv6hdr));
+ skb_set_transport_header(reply, 0);
+
+ na = (struct nd_msg *)skb_put(reply, sizeof(*na) + na_olen);
+
+ /* Neighbor Advertisement */
+ memset(na, 0, sizeof(*na) + na_olen);
+ na->icmph.icmp6_type = NDISC_NEIGHBOUR_ADVERTISEMENT;
+ na->icmph.icmp6_router = 0; /* XXX: should be 1 ? */
+ na->icmph.icmp6_override = 1;
+ na->icmph.icmp6_solicited = 1;
+ na->target = ns->target;
+ ether_addr_copy(&na->opt[2], n->ha);
+ na->opt[0] = ND_OPT_TARGET_LL_ADDR;
+ na->opt[1] = na_olen >> 3;
+
+ na->icmph.icmp6_cksum = csum_ipv6_magic(&pip6->saddr,
+ &pip6->daddr,
+ sizeof(*na) + na_olen,
+ IPPROTO_ICMPV6,
+ csum_partial(na, sizeof(*na) + na_olen, 0));
+
+ pip6->payload_len = htons(sizeof(*na) + na_olen);
+
+ skb_push(reply, sizeof(struct ipv6hdr));
+ skb_push(reply, sizeof(struct ethhdr));
+
+ reply->ip_summed = CHECKSUM_UNNECESSARY;
+
+ if (p) {
+ struct net_bridge_vlan_group *vg;
+ u16 pvid;
+
+ vg = nbp_vlan_group_rcu(p);
+ pvid = br_get_pvid(vg);
+ if (pvid && pvid == vlan_tci)
+ vlan_tci = 0;
+ }
+
+ if (vlan_tci != 0) {
+ reply = vlan_insert_tag_set_proto(reply, vlan_proto, vlan_tci);
+ if (!reply) {
+ net_err_ratelimited("evpn: failed to insert VLAN tag\n");
+ return;
+ }
+ }
+
+ netdev_dbg(dev, "nd send dev %s dst %pI6 dst_hw %pM src %pI6 src_hw %pM\n",
+ dev->name, &pip6->daddr, daddr, &pip6->saddr, n->ha);
+
+ dev_queue_xmit(reply);
+}
+
+static int br_chk_addr_ip6(struct net_device *dev, void *data)
+{
+ struct in6_addr *addr = (struct in6_addr *)data;
+
+ if (ipv6_chk_addr(dev_net(dev), addr, dev, 0))
+ return 1;
+
+ return 0;
+}
+
+static bool br_is_local_ip6(struct net_device *dev, struct in6_addr *addr)
+
+{
+ if (br_chk_addr_ip6(dev, addr))
+ return true;
+
+ /* check if ip is configured on upper dev */
+ if (netdev_walk_all_upper_dev_rcu(dev, br_chk_addr_ip6, addr))
+ return true;
+
+ return false;
+}
+
+void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br,
+ u16 vid, struct net_bridge_port *p, struct nd_msg *msg)
+{
+ struct net_device *dev = br->dev;
+ struct net_device *vlandev = NULL;
+ struct in6_addr *saddr, *daddr;
+ struct ipv6hdr *iphdr;
+ struct inet6_dev *in6_dev;
+ struct neighbour *n;
+
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = false;
+
+ if (p && (p->flags & BR_NEIGH_SUPPRESS))
+ return;
+
+ if (msg->icmph.icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT &&
+ !msg->icmph.icmp6_solicited) {
+ /* prevent flooding to neigh suppress ports */
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ return;
+ }
+
+ if (msg->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION)
+ return;
+
+ in6_dev = __in6_dev_get(dev);
+ if (!in6_dev)
+ return;
+
+ iphdr = ipv6_hdr(skb);
+ saddr = &iphdr->saddr;
+ daddr = &iphdr->daddr;
+
+ if (ipv6_addr_any(saddr) || !ipv6_addr_cmp(saddr, daddr)) {
+ /* prevent flooding to neigh suppress ports */
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ return;
+ }
+
+ if (vid != 0) {
+ /* build neigh table lookup on the vlan device */
+ vlandev = __vlan_find_dev_deep_rcu(br->dev, skb->vlan_proto,
+ vid);
+ if (!vlandev)
+ return;
+ } else {
+ vlandev = dev;
+ }
+
+ if (br_is_local_ip6(vlandev, &msg->target)) {
+ /* its our own ip, so don't proxy reply
+ * and don't forward to arp suppress ports
+ */
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ return;
+ }
+
+ n = neigh_lookup(ipv6_stub->nd_tbl, &msg->target, vlandev);
+ if (n) {
+ struct net_bridge_fdb_entry *f;
+
+ if (!(n->nud_state & NUD_VALID)) {
+ neigh_release(n);
+ return;
+ }
+
+ f = br_fdb_find_rcu(br, n->ha, vid);
+ if (f) {
+ bool replied = false;
+
+ if (f->dst && (f->dst->flags & BR_NEIGH_SUPPRESS)) {
+ if (vid != 0)
+ br_nd_send(p, skb, n, skb->vlan_proto,
+ skb_vlan_tag_get(skb), msg);
+ else
+ br_nd_send(p, skb, n, 0, 0, msg);
+ replied = true;
+ }
+
+ /* If we have replied or as long as we know the
+ * mac, indicate to NEIGH_SUPPRESS ports that we
+ * have replied
+ */
+ if (replied || br->neigh_suppress_enabled)
+ BR_INPUT_SKB_CB(skb)->proxyarp_replied = true;
+ }
+ neigh_release(n);
+ }
+}
+#endif
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index eb30c6a..28bb221 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -69,6 +69,17 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
eth->h_proto == htons(ETH_P_RARP)) &&
br->neigh_suppress_enabled) {
br_do_proxy_suppress_arp(skb, br, vid, NULL);
+ } else if (IS_ENABLED(CONFIG_IPV6) &&
+ skb->protocol == htons(ETH_P_IPV6) &&
+ br->neigh_suppress_enabled &&
+ pskb_may_pull(skb, sizeof(struct ipv6hdr) +
+ sizeof(struct nd_msg)) &&
+ ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) {
+ struct nd_msg *msg, _msg;
+
+ msg = br_is_nd_neigh_msg(skb, &_msg);
+ if (msg)
+ br_do_suppress_nd(skb, br, vid, NULL, msg);
}

dest = eth_hdr(skb)->h_dest;
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 4b8d2ec..a096d3e 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -119,6 +119,17 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
(skb->protocol == htons(ETH_P_ARP) ||
skb->protocol == htons(ETH_P_RARP))) {
br_do_proxy_suppress_arp(skb, br, vid, p);
+ } else if (IS_ENABLED(CONFIG_IPV6) &&
+ skb->protocol == htons(ETH_P_IPV6) &&
+ br->neigh_suppress_enabled &&
+ pskb_may_pull(skb, sizeof(struct ipv6hdr) +
+ sizeof(struct nd_msg)) &&
+ ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) {
+ struct nd_msg *msg, _msg;
+
+ msg = br_is_nd_neigh_msg(skb, &_msg);
+ if (msg)
+ br_do_suppress_nd(skb, br, vid, p, msg);
}

switch (pkt_type) {
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 4e6b25b..fa0039f 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -1144,4 +1144,7 @@ static inline void br_switchdev_frame_unmark(struct sk_buff *skb)
void br_recalculate_neigh_suppress_enabled(struct net_bridge *br);
void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br,
u16 vid, struct net_bridge_port *p);
+void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br,
+ u16 vid, struct net_bridge_port *p, struct nd_msg *msg);
+struct nd_msg *br_is_nd_neigh_msg(struct sk_buff *skb, struct nd_msg *m);
#endif
--
2.1.4
Roopa Prabhu
2017-10-07 02:35:27 UTC
Permalink
Post by Roopa Prabhu
This series implements arp and nd suppression in the bridge
driver for ethernet vpns. It implements rfc7432, section 10
https://tools.ietf.org/html/rfc7432#section-10
for ethernet VPN deployments. It is similar to the existing
BR_PROXYARP* flags but has a few semantic differences to conform
to EVPN standard. Unlike the existing flags, this new flag suppresses
flood of all neigh discovery packets (arp and nd) to tunnel ports.
Supports both vlan filtering and non-vlan filtering bridges.
In case of EVPN, it is mainly used to avoid flooding
of arp and nd packets to tunnel ports like vxlan.
v2 : rebase to latest + address some optimization feedback from Nikolay.
v3 : fix kbuild reported build errors with CONFIG_INET off
v4 : simplify port flag mask as suggested by stephen
v5 : address some feedback from Toshiaki
Looks like I missed applying a cleanup done in v5 to the ipv6 nd path.
Dave, I see these patches 'under review' in patchworks. If its not too
late, pls drop them. I will spin v6 in a few hrs. thanks.
David Miller
2017-10-07 05:46:56 UTC
Permalink
From: Roopa Prabhu <***@cumulusnetworks.com>
Date: Fri, 6 Oct 2017 19:35:27 -0700
Post by Roopa Prabhu
Post by Roopa Prabhu
This series implements arp and nd suppression in the bridge
driver for ethernet vpns. It implements rfc7432, section 10
https://tools.ietf.org/html/rfc7432#section-10
for ethernet VPN deployments. It is similar to the existing
BR_PROXYARP* flags but has a few semantic differences to conform
to EVPN standard. Unlike the existing flags, this new flag suppresses
flood of all neigh discovery packets (arp and nd) to tunnel ports.
Supports both vlan filtering and non-vlan filtering bridges.
In case of EVPN, it is mainly used to avoid flooding
of arp and nd packets to tunnel ports like vxlan.
v2 : rebase to latest + address some optimization feedback from Nikolay.
v3 : fix kbuild reported build errors with CONFIG_INET off
v4 : simplify port flag mask as suggested by stephen
v5 : address some feedback from Toshiaki
Looks like I missed applying a cleanup done in v5 to the ipv6 nd path.
Dave, I see these patches 'under review' in patchworks. If its not too
late, pls drop them. I will spin v6 in a few hrs. thanks.
Ok.

Loading...