Age | Commit message (Collapse) | Author |
|
Packet socket receive buffers default to the sysctl value of
net.core.rmem_default and are capped by net.core.rmem_max both
which are usually set to 208KB on most systems.
Since we can't expect every gVisor user to bump these we use
SO_RCVBUFFORCE to exceed the limit. This is possible as runsc runs
with CAP_NET_ADMIN outside the sandbox and can do this before
the FD is passed to the sentry inside the sandbox.
Updates #211
iperf output w/ 4MB buffer.
iperf3 -c 172.17.0.2 -t 100
Connecting to host 172.17.0.2, port 5201
[ 4] local 172.17.0.1 port 40378 connected to 172.17.0.2 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.02 MBytes
[ 4] 1.00-2.00 sec 1.18 GBytes 10.2 Gbits/sec 0 1.02 MBytes
[ 4] 2.00-3.00 sec 965 MBytes 8.09 Gbits/sec 0 1.02 MBytes
[ 4] 3.00-4.00 sec 942 MBytes 7.90 Gbits/sec 0 1.02 MBytes
[ 4] 4.00-5.00 sec 952 MBytes 7.99 Gbits/sec 0 1.02 MBytes
[ 4] 5.00-6.00 sec 1.14 GBytes 9.81 Gbits/sec 0 1.02 MBytes
[ 4] 6.00-7.00 sec 1.13 GBytes 9.68 Gbits/sec 0 1.02 MBytes
[ 4] 7.00-8.00 sec 930 MBytes 7.80 Gbits/sec 0 1.02 MBytes
[ 4] 8.00-9.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.02 MBytes
[ 4] 9.00-10.00 sec 938 MBytes 7.87 Gbits/sec 0 1.02 MBytes
[ 4] 10.00-11.00 sec 737 MBytes 6.18 Gbits/sec 0 1.02 MBytes
[ 4] 11.00-12.00 sec 1.16 GBytes 9.93 Gbits/sec 0 1.02 MBytes
[ 4] 12.00-13.00 sec 917 MBytes 7.69 Gbits/sec 0 1.02 MBytes
[ 4] 13.00-14.00 sec 1.19 GBytes 10.2 Gbits/sec 0 1.02 MBytes
[ 4] 14.00-15.00 sec 1.01 GBytes 8.70 Gbits/sec 0 1.02 MBytes
[ 4] 15.00-16.00 sec 1.20 GBytes 10.3 Gbits/sec 0 1.02 MBytes
[ 4] 16.00-17.00 sec 1.14 GBytes 9.80 Gbits/sec 0 1.02 MBytes
^C[ 4] 17.00-17.60 sec 718 MBytes 10.1 Gbits/sec 0 1.02 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-17.60 sec 18.4 GBytes 8.98 Gbits/sec 0 sender
[ 4] 0.00-17.60 sec 0.00 Bytes 0.00 bits/sec receiver
PiperOrigin-RevId: 245470590
Change-Id: I1c08c5ee8345de6ac070513656a4703312dc3c00
|
|
This CL fixes the following bugs:
- Uses atomic to set/read status instead of binary.LittleEndian.PutUint32 etc
which are not atomic.
- Increments ringOffsets for frames that are truncated (i.e status is
tpStatusCopy)
- Does not ignore frames with tpStatusLost bit set as they are valid frames and
only indicate that there some frames were lost before this one and metrics can
be retrieved with a getsockopt call.
- Adds checks to make sure blockSize is a multiple of page size. This is
required as the kernel allocates in pages per block and rejects sizes that are
not page aligned with an EINVAL.
Updates #210
PiperOrigin-RevId: 244959464
Change-Id: I5d61337b7e4c0f8a3063dcfc07791d4c4521ba1f
|
|
It is possible to create a listening socket which will accept
IPv4 and IPv6 connections. In this case, we set IPv6ProtocolNumber
for all accepted endpoints, even if they handle IPv4 connections.
This means that we can't use endpoint.netProto to set gso.L3HdrLen.
PiperOrigin-RevId: 244227948
Change-Id: I5e1863596cb9f3d216febacdb7dc75651882eef1
|
|
RELNOTES: n/a
PiperOrigin-RevId: 244031742
Change-Id: Id0cdb73194018fb5979e67b58510ead19b5a2b81
|
|
PiperOrigin-RevId: 242704699
Change-Id: I87db368ca343b3b4bf4f969b17d3aa4ce2f8bd4f
|
|
The linux packet socket can handle GSO packets, so we can segment packets to
64K instead of the MTU which is usually 1500.
Here are numbers for the nginx-1m test:
runsc: 579330.01 [Kbytes/sec] received
runsc-gso: 1794121.66 [Kbytes/sec] received
runc: 2122139.06 [Kbytes/sec] received
and for tcp_benchmark:
$ tcp_benchmark --duration 15 --ideal
[ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal
[ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal --gso 65536
[ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec
PiperOrigin-RevId: 240809103
Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a
|
|
Previous memory allocation was excessive (80 MB). Changed
it to use 2 MB instead. There is no drop in perfomance due
to this change:
ab -n 100 -c 10 http://server/latin10m.txt ==> 10 MB file
80 MB: 178 MB/s
2 MB: 181 MB/s
PiperOrigin-RevId: 238321594
Change-Id: I1c8aed13cad5d75f4506d2b406b305117055fbe5
|
|
HandleLocal is very similar conceptually to MULTICAST_LOOP, so we can unify
the implementations. This has the benefit of making HandleLocal apply even when
the fdbased link endpoint isn't in use.
In addition, move looping logic to route creation so that it doesn't need to be
run for each packet. This should improve performance.
PiperOrigin-RevId: 238099480
Change-Id: I72839f16f25310471453bc9d3fb8544815b25c23
|
|
Also exposes ipv4.MaxTotalSize since it is a generally useful constant.
PiperOrigin-RevId: 235799755
Change-Id: I1fa8d5294bf355acf5527cfdf274b3687d3c8b13
|
|
PACKET_RX_RING allows the use of an mmapped buffer to receive packets from the
kernel. This should cut down the number of host syscalls that need to be made
to receive packets when the underlying fd is a socket of the AF_PACKET type.
PiperOrigin-RevId: 233834998
Change-Id: I8060025c6ced206986e94cc46b8f382b81bfa47f
|
|
PiperOrigin-RevId: 233139020
Change-Id: I2e7089fa25d20e5662eb941054a684d41f5d3e12
|
|
PiperOrigin-RevId: 232937200
Change-Id: I5c3709cc8f1313313ff618a45e48c14a3a111cb4
|
|
Nothing reads them and they can simply get stale.
Generated with:
$ sed -i "s/licenses(\(.*\)).*/licenses(\1)/" **/BUILD
PiperOrigin-RevId: 231818945
Change-Id: Ibc3f9838546b7e94f13f217060d31f4ada9d4bf0
|
|
This should reduce the number of syscalls required to process packets
significantly and improve throughputs.
PiperOrigin-RevId: 231366886
Change-Id: I8b38077262bf9c53176bc4a94b530188d3d7c0ca
|
|
PiperOrigin-RevId: 228979583
Change-Id: I69bd82def48ceb19bc8558c890622b8528d98764
|
|
...to (remote, local), reflecting the (correct) names in the implementation of
DeliverNetworkPacket (see tcpip/stack/nic.go).
Also trim the names in DeliverNetworkPacket and elsewhere to avoid stuttering;
since the type is tcpip.LinkAddress, there's no need to include "LinkAddr" in
the parameter names.
Note that every callsite passes arguments in the order (src, dst).
PiperOrigin-RevId: 221514396
Change-Id: I3637454ad0d6e62a19e4dcbc2a16493798bd0f09
|
|
Replacing map lookups with slice indexing is higher performance.
PiperOrigin-RevId: 219569901
Change-Id: I9b7cd22abd4b95383025edbd5a80d1c1a4496936
|
|
This change also adds extensive testing to the p9 package via mocks. The sanity
checks and type checks are moved from the gofer into the core package, where
they can be more easily validated.
PiperOrigin-RevId: 218296768
Change-Id: I4fc3c326e7bf1e0e140a454cbacbcc6fd617ab55
|
|
PiperOrigin-RevId: 217951017
Change-Id: Ie08bf6987f98467d07457bcf35b5f1ff6e43c035
|
|
Currently, in the face of FileMem fragmentation and a large sendmsg or
recvmsg call, host sockets may pass > 1024 iovecs to the host, which
will immediately cause the host to return EMSGSIZE.
When we detect this case, use a single intermediate buffer to pass to
the kernel, copying to/from the src/dst buffer.
To avoid creating unbounded intermediate buffers, enforce message size
checks and truncation w.r.t. the send buffer size. The same
functionality is added to netstack unix sockets for feature parity.
PiperOrigin-RevId: 216590198
Change-Id: I719a32e71c7b1098d5097f35e6daf7dd5190eff7
|
|
...by increasing the allotted timeout and using direct comparison rather than
reflect.DeepEqual (which should be faster).
PiperOrigin-RevId: 214027024
Change-Id: I0a2690e65c7e14b4cc118c7312dbbf5267dc78bc
|
|
This allows a NetworkDispatcher to implement transparent bridging,
assuming all implementations of LinkEndpoint.WritePacket call eth.Encode
with header.EthernetFields.SrcAddr set to the passed
Route.LocalLinkAddress, if it is provided.
PiperOrigin-RevId: 213686651
Change-Id: I446a4ac070970202f0724ef796ff1056ae4dd72a
|
|
It is the same as buffer.Prependable.View.
PiperOrigin-RevId: 213064166
Change-Id: Ib33b8a2c4da864209d9a0be0a1c113be10b520d3
|
|
PiperOrigin-RevId: 213053370
Change-Id: I60ea89572b4fca53fd126c870fcbde74fcf52562
|
|
PiperOrigin-RevId: 212757571
Change-Id: I04200df9e45c21eb64951cd2802532fa84afcb1a
|
|
Makes it possible to avoid copying or allocating in cases where DeliverNetworkPacket (rx)
needs to turn around and call WritePacket (tx) with its VectorisedView.
Also removes the restriction on having VectorisedViews with multiple views in the write path.
PiperOrigin-RevId: 211728717
Change-Id: Ie03a65ecb4e28bd15ebdb9c69f05eced18fdfcff
|
|
PiperOrigin-RevId: 211525182
Change-Id: I462c20328955c77ecc7bfd8ee803ac91f15858e6
|
|
PiperOrigin-RevId: 211156845
Change-Id: Ie28011d7eb5f45f3a0158dbee2a68c5edf22f6e0
|
|
This CL does NDP link-address discovery for IPv6.
It includes several small changes necessary to get linux to talk to
this implementation. In particular, a hop limit of 255 is necessary
for ICMPv6.
PiperOrigin-RevId: 211103930
Change-Id: If25370ab84c6b1decfb15de917f3b0020f2c4e0e
|
|
PiperOrigin-RevId: 210442599
Change-Id: I9498351f461dc69c77b7f815d526c5693bec8e4a
|
|
PiperOrigin-RevId: 209819644
Change-Id: I329d054bf8f4999e7db0dcd95b13f7793c65d4e2
|
|
The previous use of non-blocking writes could result in corrupt PCAP files if a
partial write occurs. Using (*os.File).Write solves this problem by not
allowing partial writes. This change does not increase allocations (in one path
it actually reduces them), but does add additional copying.
PiperOrigin-RevId: 209652974
Change-Id: I4b1cf2eda4cfd7f237a4245aceb7391b3055a66c
|
|
Closes #94
PiperOrigin-RevId: 207997580
Change-Id: I19b426f1586b5ec12f8b0cd5884d5b401d334924
|
|
Add option to redirect packet back to netstack if it's destined to itself.
This fixes the problem where connecting to the local NIC address would
not work, e.g.:
echo bar | nc -l -p 8080 &
echo foo | nc 192.168.0.2 8080
PiperOrigin-RevId: 207995083
Change-Id: I17adc2a04df48bfea711011a5df206326a1fb8ef
|
|
PiperOrigin-RevId: 206659972
Change-Id: I5e0e035f97743b6525ad36bed2c802791609beaf
|
|
We have been unnecessarily creating too many savable types implicitly.
PiperOrigin-RevId: 206334201
Change-Id: Idc5a3a14bfb7ee125c4f2bb2b1c53164e46f29a8
|
|
PiperOrigin-RevId: 204930314
Change-Id: Ifc4c41ed28616cd57fafbf7c92e87141a945c41f
|
|
PiperOrigin-RevId: 204196916
Change-Id: If632750fc6368acb835e22cfcee0ae55c8a04d16
|
|
PiperOrigin-RevId: 204006237
Change-Id: Ica8402ab54d9dd7d11cc41c6d74aacef51d140b7
|
|
Fixes #27
PiperOrigin-RevId: 203825288
Change-Id: Ie9f3a2b2c1e296b026b024f75c07da1a7e118633
|
|
PiperOrigin-RevId: 203499064
Change-Id: I2cd5189638e94ce926f1e82c1264a8d3ece9dfa5
|
|
Add option to redirect packet back to netstack if it's destined to itself.
This fixes the problem where connecting to the local NIC address would
not work, e.g.:
echo bar | nc -l -p 8080 &
echo foo | nc 192.168.0.2 8080
PiperOrigin-RevId: 203157739
Change-Id: I31c9f7c501e3f55007f25e1852c27893a16ac6c4
|
|
Establishes a way of communicating interface flags between netstack and
epsocket. More flags can be added over time.
PiperOrigin-RevId: 197616669
Change-Id: I230448c5fb5b7d2e8d69b41a451eb4e1096a0e30
|
|
PiperOrigin-RevId: 195035056
Change-Id: Ib32188b62188ddc2059debd52610e7904e1438c6
|
|
PiperOrigin-RevId: 194676372
Change-Id: I2e7fce03e70a45f2a872160c33e05f822ae2d866
|
|
PiperOrigin-RevId: 194583126
Change-Id: Ica1d8821a90f74e7e745962d71801c598c652463
|