Age | Commit message (Collapse) | Author |
|
The implementation is similar to linux where we track the number of bytes
consumed by the application to grow the receive buffer of a given TCP endpoint.
This ensures that the advertised window grows at a reasonable rate to accomodate
for the sender's rate and prevents large amounts of data being held in stack
buffers if the application is not actively reading or not reading fast enough.
The original paper that was used to implement the linux receive buffer auto-
tuning is available @ https://public.lanl.gov/radiant/pubs/drs/lacsi2001.pdf
NOTE: Linux does not implement DRS as defined in that paper, it's just a good
reference to understand the solution space.
Updates #230
PiperOrigin-RevId: 253168283
|
|
All functions which allocate objects containing AtomicRefCounts will soon need
a context.
PiperOrigin-RevId: 253147709
|
|
The deadlock can occur when both ends of a connected Unix socket which has
FIOASYNC enabled on at least one end are closed at the same time. One end
notifies that it is closing, calling (*waiter.Queue).Notify which takes
waiter.Queue.mu (as a read lock) and then calls (*FileAsync).Callback, which
takes FileAsync.mu. The other end tries to unregister for notifications by
calling (*FileAsync).Unregister, which takes FileAsync.mu and calls
(*waiter.Queue).EventUnregister which takes waiter.Queue.mu.
This is fixed by moving the calls to waiter.Waitable.EventRegister and
waiter.Waitable.EventUnregister outside of the protection of any mutex used
in (*FileAsync).Callback.
The new test is related, but does not cover this particular situation.
Also fix a data race on FileAsync.e.Callback. (*FileAsync).Callback checked
FileAsync.e.Callback under the protection of FileAsync.mu, but the waiter
calling (*FileAsync).Callback could not and did not. This is fixed by making
FileAsync.e.Callback immutable before passing it to the waiter for the first
time.
Fixes #346
PiperOrigin-RevId: 253138340
|
|
SO_TYPE was already implemented for everything but netlink sockets.
PiperOrigin-RevId: 253138157
|
|
PiperOrigin-RevId: 253137638
|
|
This can be merged after:
https://github.com/google/gvisor-website/pull/77
or
https://github.com/google/gvisor-website/pull/78
PiperOrigin-RevId: 253132620
|
|
PiperOrigin-RevId: 253122166
|
|
runsc will now set the HOME environment variable as required by POSIX. The
user's home directory is retrieved from the /etc/passwd file located on the
container's file system during boot.
PiperOrigin-RevId: 253120627
|
|
PiperOrigin-RevId: 253096085
|
|
PiperOrigin-RevId: 253061432
|
|
PiperOrigin-RevId: 252918338
|
|
This CL also cleans up the error returned for setting congestion
control which was incorrectly returning EINVAL instead of ENOENT.
PiperOrigin-RevId: 252889093
|
|
PiperOrigin-RevId: 252869983
|
|
PiperOrigin-RevId: 252855280
|
|
'--rootless' flag lets a non-root user execute 'runsc do'.
The drawback is that the sandbox and gofer processes will
run as root inside a user namespace that is mapped to the
caller's user, intead of nobody. And network is defaulted
to '--network=host' inside the root network namespace. On
the bright side, it's very convenient for testing:
runsc --rootless do ls
runsc --rootless do curl www.google.com
PiperOrigin-RevId: 252840970
|
|
For sendfile(2), we propagate a TCP error through the system call layer.
This should be eaten if there is a partial result. This change also adds
a test to ensure that there is no panic in this case, for both TCP sockets
and unix domain sockets.
PiperOrigin-RevId: 252746192
|
|
PiperOrigin-RevId: 252724255
|
|
oh-my-zsh aliases ... to ../.. [1]. Add an explicit reference
to workspace root to work around the alias.
[1] https://github.com/robbyrussell/oh-my-zsh/blob/master/lib/directories.zsh
Fixes #341
PiperOrigin-RevId: 252720590
|
|
Parse annotations containing 'gvisor.dev/spec/mount' that gives
hints about how mounts are shared between containers inside a
pod. This information can be used to better inform how to mount
these volumes inside gVisor. For example, a volume that is shared
between containers inside a pod can be bind mounted inside the
sandbox, instead of being two independent mounts.
For now, this information is used to allow the same tmpfs mounts
to be shared between containers which wasn't possible before.
PiperOrigin-RevId: 252704037
|
|
It prints formatted to the log.
PiperOrigin-RevId: 252699551
|
|
(11:34:09) ERROR: /tmpfs/src/github/repo/runsc/BUILD:82:1: Couldn't build file runsc/version.txt: Executing genrule //runsc:deb-version failed (Broken pipe): bash failed: error executing command
PiperOrigin-RevId: 252691902
|
|
Otherwise it's hard to find a directory for a specific test case.
PiperOrigin-RevId: 252636901
|
|
Adds simple introspection for syscall compatibility information to Linux/AMD64.
Syscalls registered in the syscall table now have associated metadata like
name, support level, notes, and URLs to relevant issues.
Syscall information can be exported as a table, JSON, or CSV using the new
'runsc help syscalls' command. Users can use this info to debug and get info
on the compatibility of the version of runsc they are running or to generate
documentation.
PiperOrigin-RevId: 252558304
|
|
PiperOrigin-RevId: 252501653
|
|
Changes netstack to confirm to current linux behaviour where if the backlog is
full then we drop the SYN and do not send a SYN-ACK. Similarly we allow upto
backlog connections to be in SYN-RCVD state as long as the backlog is not full.
We also now drop a SYN if syn cookies are in use and the backlog for the
listening endpoint is full.
Added new tests to confirm the behaviour.
Also reverted the change to increase the backlog in TcpPortReuseMultiThread
syscall test.
Fixes #236
PiperOrigin-RevId: 252500462
|
|
Store enough information in the kernel socket table to distinguish
between different types of sockets. Previously we were only storing
the socket family, but this isn't enough to classify sockets. For
example, TCPv4 and UDPv4 sockets are both AF_INET, and ICMP sockets
are SOCK_DGRAM sockets with a particular protocol.
Instead of creating more sub-tables, flatten the socket table and
provide a filtering mechanism based on the socket entry.
Also generate and store a socket entry index ("sl" in linux) which
allows us to output entries in a stable order from procfs.
PiperOrigin-RevId: 252495895
|
|
PiperOrigin-RevId: 252124156
|
|
This also ensures BUILD files are correctly formatted.
PiperOrigin-RevId: 251990267
|
|
PiperOrigin-RevId: 251965598
|
|
When set sends log messages to the error log:
sudo ./runsc --logtostderr do ls
I0531 17:59:58.105064 144564 x:0] ***************************
I0531 17:59:58.105087 144564 x:0] Args: [runsc --logtostderr do ls]
I0531 17:59:58.105112 144564 x:0] PID: 144564
I0531 17:59:58.105125 144564 x:0] UID: 0, GID: 0
[...]
PiperOrigin-RevId: 251964377
|
|
Almost (?) all uses of CopyStringIn are via linux.copyInPath(), which
passes maxlen = linux.PATH_MAX = 4096. Pre-allocating a buffer of this
size is measurably inefficient in most cases: most paths will not be
this long, 4 KB is a lot of bytes to zero, and as of this writing the Go
runtime allocator maps only two 4 KB objects to each 8 KB span,
necessitating a call to runtime.mcache.refill() on ~every other call.
Limit the initial buffer size to 256 B instead, and geometrically
reallocate if necessary.
PiperOrigin-RevId: 251960441
|
|
SockType isn't specific to unix domain sockets, and the current
definition basically mirrors the linux ABI's definition.
PiperOrigin-RevId: 251956740
|
|
Moves the build badge to just below the logo and adds the gitter badge next to
it for consistency.
PiperOrigin-RevId: 251956383
|
|
Overlayfs was expecting the parent to exist when bind(2)
was called, which may not be the case. The fix is to copy
the parent directory to the upper layer before binding
the UDS.
There is not good place to add tests for it. Syscall tests
would be ideal, but it's hard to guarantee that the
directory where the socket is created hasn't been touched
before (and thus copied the parent to the upper layer).
Added it to runsc integration tests for now. If it turns
out we have lots of these kind of tests, we can consider
moving them somewhere more appropriate.
PiperOrigin-RevId: 251954156
|
|
We still only advertise a single NUMA node, and ignore mempolicy
accordingly, but mbind() at least now succeeds and has effects reflected
by get_mempolicy().
Also fix handling of nodemasks: round sizes to unsigned long (as
documented and done by Linux), and zero trailing bits when copying them
out.
PiperOrigin-RevId: 251950859
|
|
PiperOrigin-RevId: 251950660
|
|
runsc supports UDS over gofer mounts and tmpfs is
not needed for this test.
PiperOrigin-RevId: 251944870
|
|
This is necessary for implementing network diagnostic interfaces like
/proc/net/{tcp,udp,unix} and sock_diag(7).
For pass-through endpoints such as hostinet, we obtain the socket
state from the backend. For netstack, we add explicit tracking of TCP
states.
PiperOrigin-RevId: 251934850
|
|
PiperOrigin-RevId: 251929314
|
|
PiperOrigin-RevId: 251928000
|
|
PiperOrigin-RevId: 251902567
|
|
Containerd uses the last error message sent to the log to
print as failure cause for create/exec. This required a
few changes in the logging logic for runsc:
- cmd.Errorf/Fatalf: now writes a message with 'error'
level to containerd log, in addition to stderr and
debug logs, like before.
- log.Infof/Warningf/Fatalf: are not sent to containerd
log anymore. They are mostly used for debugging and not
useful to containerd. In most cases, --debug-log is
enabled and this avoids the logs messages from being
duplicated.
- stderr is not used as default log destination anymore.
Some commands assume stdio is for the container/process
running inside the sandbox and it's better to never use
it for logging. By default, logs are supressed now.
PiperOrigin-RevId: 251881815
|
|
This allows an fdbased endpoint to have multiple underlying fd's from which
packets can be read and dispatched/written to.
This should allow for higher throughput as well as better scalability of the
network stack as number of connections increases.
Updates #231
PiperOrigin-RevId: 251852825
|
|
PiperOrigin-RevId: 251788534
|
|
This is required to make the shutdown visible to peers outside the
sandbox.
The readClosed / writeClosed fields were dropped, as they were
preventing a shutdown socket from reading the remainder of queued bytes.
The host syscalls will return the appropriate errors for shutdown.
The control message tests have been split out of socket_unix.cc to make
the (few) remaining tests accessible to testing inherited host UDS,
which don't support sending control messages.
Updates #273
PiperOrigin-RevId: 251763060
|
|
In case of GSO, a segment can container more than one packet
and we need to use the pCount() helper to get a number of packets.
PiperOrigin-RevId: 251743020
|
|
Multicast packets are special in that their destination address does not
identify a specific interface. When sending out such a packet the multicast
address is the remote address, but for incoming packets it is the local
address. Hence, when looping a multicast packet, the route needs to be
tweaked to reflect this.
PiperOrigin-RevId: 251739298
|
|
PiperOrigin-RevId: 251737069
|
|
PiperOrigin-RevId: 251716439
|
|
We don't actually support core dumps, but some applications want to
get/set dumpability, which still has an effect in procfs.
Lack of support for set-uid binaries or fs creds simplifies things a
bit.
As-is, processes started via CreateProcess (i.e., init and sentryctl
exec) have normal dumpability. I'm a bit torn on whether sentryctl exec
tasks should be dumpable, but at least since they have no parent normal
UID/GID checks should protect them.
PiperOrigin-RevId: 251712714
|