Age | Commit message (Collapse) | Author |
|
This change adds explicit support for honoring the 2MSL timeout
for sockets in TIME_WAIT state. It also adds support for the
TCP_LINGER2 option that allows modification of the FIN_WAIT2
state timeout duration for a given socket.
It also adds an option to modify the Stack wide TIME_WAIT timeout
but this is only for testing. On Linux this is fixed at 60s.
Further, we also now correctly process RST's in CLOSE_WAIT and
close the socket similar to linux without moving it to error
state.
We also now handle SYN in ESTABLISHED state as per
RFC5961#section-4.1. Earlier we would just drop these SYNs.
Which can result in some tests that pass on linux to fail on
gVisor.
Netstack now honors TIME_WAIT correctly as well as handles the
following cases correctly.
- TCP RSTs in TIME_WAIT are ignored.
- A duplicate TCP FIN during TIME_WAIT extends the TIME_WAIT
and a dup ACK is sent in response to the FIN as the dup FIN
indicates potential loss of the original final ACK.
- An out of order segment during TIME_WAIT generates a dup ACK.
- A new SYN w/ a sequence number > the highest sequence number
in the previous connection closes the TIME_WAIT early and
opens a new connection.
Further to make the SYN case work correctly the ISN (Initial
Sequence Number) generation for Netstack has been updated to
be as per RFC. Its not a pure random number anymore and follows
the recommendation in https://tools.ietf.org/html/rfc6528#page-3.
The current hash used is not a cryptographically secure hash
function. A separate change will update the hash function used
to Siphash similar to what is used in Linux.
PiperOrigin-RevId: 279106406
|
|
NETLINK_KOBJECT_UEVENT sockets send udev-style messages for device events.
gVisor doesn't have any device events, so our sockets don't need to do anything
once created.
systemd's device manager needs to be able to create one of these sockets. It
also wants to install a BPF filter on the socket. Since we'll never send any
messages, the filter would never be invoked, thus we just fake it out.
Fixes #1117
Updates #1119
PiperOrigin-RevId: 278405893
|
|
Since we only supporting sending messages from the kernel, the peer is always
the kernel, simplifying handling.
There are currently no known users of SO_PASSCRED that would actually receive
messages from gVisor, but adding full support is barely more work than stubbing
out fake support.
Updates #1117
Fixes #1119
PiperOrigin-RevId: 277981465
|
|
In the future this will replace DanglingEndpoints. DanglingEndpoints must be
kept for now due to issues with save/restore.
This is arguably a cleaner design and allows the stack to know which transport
endpoints might still be using its link endpoints.
Updates #837
PiperOrigin-RevId: 277386633
|
|
DelayOption is set on all new endpoints in gVisor.
PiperOrigin-RevId: 276746791
|
|
PiperOrigin-RevId: 276380008
|
|
Like (AF_INET, SOCK_RAW) sockets, AF_PACKET sockets require CAP_NET_RAW. With
runsc, you'll need to pass `--net-raw=true` to enable them.
Binding isn't supported yet.
PiperOrigin-RevId: 275909366
|
|
PiperOrigin-RevId: 275139066
|
|
PiperOrigin-RevId: 275114157
|
|
Netstack has its own stats, we use this to fill /proc/net/snmp.
Note that some metrics are not recorded in Netstack, which will be shown
as 0 in the proc file.
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: Ie0089184507d16f49bc0057b4b0482094417ebe1
|
|
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
|
|
For hostinet, we inherit the data from host procfs. To to that, we
cache the fds for these files for later reads.
Fixes #506
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: I2f81215477455b9c59acf67e33f5b9af28ee0165
|
|
PiperOrigin-RevId: 274700093
|
|
This allows for peeking at the length of the next message on a netlink socket
without pulling it off the socket's buffer/queue, allowing tools like 'ip' to
work.
This CL also fixes an issue where dump_done_errno was not included in the
NLMSG_DONE messages payload.
Issue #769
PiperOrigin-RevId: 274068637
|
|
Strengthen the header.IPv4.IsValid check to correctly check
for IHL/TotalLength fields. Also add a check to make sure
fragmentOffsets + size of the fragment do not cause a wrap
around for the end of the fragment.
PiperOrigin-RevId: 274049313
|
|
PiperOrigin-RevId: 273861936
|
|
Also change the default TTL to 64 to match Linux.
PiperOrigin-RevId: 273430341
|
|
PiperOrigin-RevId: 273365058
|
|
PiperOrigin-RevId: 271644926
|
|
PiperOrigin-RevId: 271442321
|
|
PiperOrigin-RevId: 270763208
|
|
PiperOrigin-RevId: 270680704
|
|
PiperOrigin-RevId: 270114317
|
|
This also allows the tee(2) implementation to be enabled, since dup can now be
properly supported via WriteTo.
Note that this change necessitated some minor restructoring with the
fs.FileOperations splice methods. If the *fs.File is passed through directly,
then only public API methods are accessible, which will deadlock immediately
since the locking is already done by fs.Splice. Instead, we pass through an
abstract io.Reader or io.Writer, which elide locks and use the underlying
fs.FileOperations directly.
PiperOrigin-RevId: 268805207
|
|
They are no-ops, so the standard rule works fine.
PiperOrigin-RevId: 268776264
|
|
Ioctl was returning just the buffer size from epsocket.endpoint
and it was not considering data from epsocket.SocketOperations
that was read from the endpoint, but not yet sent to the caller.
PiperOrigin-RevId: 266485461
|
|
PiperOrigin-RevId: 266229756
|
|
For SOCK_STREAM type unix socket, we shall return ECONNRESET if peer is
closed with data not read.
We explictly set a flag when closing one end, to differentiate from
just shutdown (where zero shall be returned).
Fixes: #735
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
|
|
Previously, recvmsg() on a unix stream socket with its peer closed will
never return, with goroutine call trace like this:
...
2 in gvisor.dev/gvisor/pkg/sentry/kernel.(*Task).block
at pkg/sentry/kernel/task_block.go:124
3 in gvisor.dev/gvisor/pkg/sentry/kernel.(*Task).BlockWithDeadline
at pkg/sentry/kernel/task_block.go:69
4 in gvisor.dev/gvisor/pkg/sentry/socket/unix.(*SocketOperations).RecvMsg
at pkg/sentry/socket/unix/unix.go:612
5 in gvisor.dev/gvisor/pkg/sentry/syscalls/linux.recvFrom
at pkg/sentry/syscalls/linux/sys_socket.go:885
6 in gvisor.dev/gvisor/pkg/sentry/syscalls/linux.RecvFrom
at pkg/sentry/syscalls/linux/sys_socket.go:910
...
The issue is caused by that ErrClosedForReceive returned by
unix/transport.queue is turned into nil in
unix.(*EndpointReader).ReadToBlocks():
err.ToError()
As a result, in unix.(*SocketOperations).RecvMsg():
n == 0 and err == nil
We shall differentiate it from another case - no data to read where
ErrWouldBlock shall be returned; and return 0 immediately.
Fixes: #734
Reported-by: chenglang.hy <chenglang.hy@antfin.com>
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
|
|
This is the first step in replacing some of the redundant types with the
standard library equivalents.
PiperOrigin-RevId: 264706552
|
|
We wrongly parses output interface as gateway address.
The fix is straightforward.
Fixes #638
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: Ia4bab31f3c238b0278ea57ab22590fad00eaf061
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gvisor/pull/684 from tanjianfeng:fix-638 b940e810367ad1273519bfa594f4371bdd293e83
PiperOrigin-RevId: 264211336
|
|
PiperOrigin-RevId: 264180125
|
|
Linux allows to call connect for ANY and the zero port.
PiperOrigin-RevId: 263892534
|
|
This is in accordance with newer parts of the standard library.
PiperOrigin-RevId: 263449916
|
|
SendMsg before this change would copy all the data over into a
new slice even if the underlying socket could only accept a
small amount of data. This is really inefficient with non-blocking
sockets and under high throughput where large writes could get
ErrWouldBlock or if there was say a timeout associated with the sendmsg()
syscall.
With this change we delay copying bytes in till they are needed and only
copy what can be potentially sent/held in the socket buffer. Reducing
the need to repeatedly copy data over.
Also a minor fix to change state FIN-WAIT-1 when shutdown(..., SHUT_WR) is called
instead of when we transmit the actual FIN. Otherwise the socket could remain in
CONNECTED state even though the user has called shutdown() on the socket.
Updates #627
PiperOrigin-RevId: 263430505
|
|
Now if a process sends an unsupported netlink requests,
an error is returned from the send system call.
The linux kernel works differently in this case. It returns errors in the
nlmsgerr netlink message.
Reported-by: syzbot+571d99510c6f935202da@syzkaller.appspotmail.com
PiperOrigin-RevId: 262690453
|
|
Previously we were representing socket addresses as an interface{},
which allowed any type which could be binary.Marshal()ed to be used as
a socket address. This is fine when the address is passed to userspace
via the linux ABI, but is problematic when used from within the sentry
such as by networking procfs files.
PiperOrigin-RevId: 262460640
|
|
Endpoint protocol goroutines were previously started as part of
loading the endpoint. This is potentially too soon, as resources used
by these goroutine may not have been loaded. Protocol goroutines may
perform meaningful work as soon as they're started (ex: incoming
connect) which can cause them to indirectly access resources that
haven't been loaded yet.
This CL defers resuming all protocol goroutines until the end of
restore.
PiperOrigin-RevId: 262409429
|
|
PiperOrigin-RevId: 262402929
|
|
syscall.EPOLLET has been defined with different values on amd64 and
arm64(-0x80000000 on amd64, and 0x80000000 on arm64), while unix.EPOLLET
has been unified this value to 0x80000000(golang/go#5328). ref #63
Signed-off-by: Haibo Xu <haibo.xu@arm.com>
Change-Id: Id97d075c4e79d86a2ea3227ffbef02d8b00ffbb8
|
|
PiperOrigin-RevId: 261413396
|
|
PiperOrigin-RevId: 261373749
|
|
Export some readily-available fields for TCP_INFO and stub out the rest.
PiperOrigin-RevId: 261191548
|
|
Implements support for RTM_GETROUTE requests for netlink sockets.
Fixes #507
PiperOrigin-RevId: 261051045
|
|
This allows the user code to add a network address with a subnet prefix length.
The prefix length value is stored in the network endpoint and provided back to
the user in the ProtocolAddress type.
PiperOrigin-RevId: 259807693
|
|
PiperOrigin-RevId: 258859507
|
|
tcpdump creates these.
PiperOrigin-RevId: 258611829
|
|
This proc file reports the stats of interfaces. We could use ifconfig
command to check the result.
Signed-off-by: Jianfeng Tan <henry.tjf@antfin.com>
Change-Id: Ia7c1e637f5c76c30791ffda68ee61e861b6ef827
COPYBARA_INTEGRATE_REVIEW=https://gvisor-review.googlesource.com/c/gvisor/+/18282/
PiperOrigin-RevId: 258303936
|
|
iptables also relies on IPPROTO_RAW in a way. It opens such a socket to
manipulate the kernel's tables, but it doesn't actually use any of the
functionality. Blegh.
PiperOrigin-RevId: 257903078
|
|
Adds support to set/get the TCP_MAXSEG value but does not
really change the segment sizes emitted by netstack or
alter the MSS advertised by the endpoint. This is currently
being added only to unblock iperf3 on gVisor. Plumbing
this correctly requires a bit more work which will come
in separate CLs.
PiperOrigin-RevId: 257859112
|