Age | Commit message (Collapse) | Author |
|
Some tcp unit tests are affected by this change:
- Some retransmission tests assumed RTO=1s when the connection is established.
This is no longer true because minRTO was set to 3s in tests, so RTO now
becomes 3s after the first updateRTO call. Set minRTO=1s for these tests.
- Some RACK-enabled tests are affected because RTT is now initialized and the
estimated RTT is quite small, so a spurious TLP might be sent and cause
flakes. Introduce an artificial delay in these tests so that the estimated
RTT is larger.
PiperOrigin-RevId: 392768725
|
|
PiperOrigin-RevId: 392554743
|
|
PiperOrigin-RevId: 392523879
|
|
Right now, the first slot starts at the address of a memory region and its size is faultBlockSize,
but the second slot starts at (physicalStart + faultBlockSize) & faultBlockMask.
This means they will overlap if the start address of a memory region is not aligned to faultBlockSize.
The kernel does not allow adding overlapping regions, but we ignore the EEXIST error.
Signed-off-by: Andrei Vagin <avagin@google.com>
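The overlap described above can be reproduced with plain address arithmetic. The sketch below uses a small hypothetical block size (0x1000) and assumes the mask clears the low bits; `blockSize`, `blockMask`, `firstTwoSlots` and `overlapBytes` are illustrative names, not the KVM platform code.

```go
package main

import "fmt"

const (
	// blockSize is a hypothetical stand-in for faultBlockSize.
	blockSize = uintptr(0x1000)
	// blockMask clears the low bits (assumed meaning of faultBlockMask).
	blockMask = ^(blockSize - 1)
)

// slot is a half-open physical address range [start, end).
type slot struct{ start, end uintptr }

// firstTwoSlots computes the first two slots the way the message describes:
// the first starts at physicalStart, the second at the aligned-down address
// of physicalStart + blockSize.
func firstTwoSlots(physicalStart uintptr) (slot, slot) {
	first := slot{physicalStart, physicalStart + blockSize}
	secondStart := (physicalStart + blockSize) & blockMask
	return first, slot{secondStart, secondStart + blockSize}
}

// overlapBytes returns how many bytes two slots have in common.
func overlapBytes(a, b slot) uintptr {
	lo, hi := a.start, a.end
	if b.start > lo {
		lo = b.start
	}
	if b.end < hi {
		hi = b.end
	}
	if hi <= lo {
		return 0
	}
	return hi - lo
}

func main() {
	for _, start := range []uintptr{0x2000, 0x1800} {
		a, b := firstTwoSlots(start)
		fmt.Printf("start=%#x overlap=%#x\n", start, overlapBytes(a, b))
	}
}
```

An aligned start yields adjacent slots, while an unaligned start makes the second slot begin inside the first one.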
|
|
PiperOrigin-RevId: 392102898
|
|
The old implementation was mostly correct but error-prone, which made way for
the issue in question here. In its error path, it would leak the intermediate
file being walked. Each return/break needed explicit cleanup.
This change implements a cleaner way of cleaning up intermediate directories.
If the code were to evolve to be more complex, it would still work.
PiperOrigin-RevId: 392102826
|
|
We cannot hold mm.aioManager.mu while calling MUnmap, because MUnmap attempts
to acquire mm.mappingMu. This violates the lock order as documented in mm/mm.go.
PiperOrigin-RevId: 392102472
|
|
PiperOrigin-RevId: 392078690
|
|
Add an LRU cache to cache verity dentries when their ref count drops to 0. This
way we don't need to hash and verify previously opened files or
directories each time.
PiperOrigin-RevId: 391880157
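The idea of keeping zero-ref entries in an LRU cache can be sketched as follows. This is a minimal illustration keyed by name, assuming a fixed capacity; `lruCache`, `release` and `acquire` are hypothetical names and do not reflect the verity dentry implementation.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache holds names of entries whose ref count dropped to zero.
// The front of the list is the most recently released entry.
type lruCache struct {
	capacity int
	order    *list.List
	items    map[string]*list.Element
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// release records that an entry's ref count reached zero. It returns the name
// of the entry evicted to make room, or "" if nothing was evicted.
func (c *lruCache) release(name string) string {
	if e, ok := c.items[name]; ok {
		c.order.MoveToFront(e)
		return ""
	}
	c.items[name] = c.order.PushFront(name)
	if c.order.Len() <= c.capacity {
		return ""
	}
	oldest := c.order.Back()
	c.order.Remove(oldest)
	victim := oldest.Value.(string)
	delete(c.items, victim)
	return victim
}

// acquire removes an entry from the cache when it is referenced again and
// reports whether it was still cached (so re-verification can be skipped).
func (c *lruCache) acquire(name string) bool {
	e, ok := c.items[name]
	if !ok {
		return false
	}
	c.order.Remove(e)
	delete(c.items, name)
	return true
}

func main() {
	c := newLRUCache(2)
	c.release("a")
	c.release("b")
	fmt.Println("evicted:", c.release("c"))
	fmt.Println("b cached:", c.acquire("b"))
	fmt.Println("a cached:", c.acquire("a"))
}
```

Only an entry that has been evicted would have to be hashed and verified again on its next use.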
|
|
Read all data into memory in one Read call and verify it block by
block instead of reading each block during verification. This improves
performance by avoiding multiple syscalls.
PiperOrigin-RevId: 391877937
|
|
The rationale given for using buffered copies is still valid, but it's unclear
whether holding MM locks or allocating buffers is better in practice, and the
former is at least consistent with gofer.regularFileFD (and VFS1), making
performance easier to reason about.
PiperOrigin-RevId: 391877913
|
|
...to match Linux behaviour.
We can see evidence of Linux representing loopback as an ethernet-based
device below:
```
# EUI-48 based MAC addresses.
$ ip link show lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# tcpdump showing ethernet frames when sniffing loopback and logging the
# link-type as EN10MB (Ethernet).
$ sudo tcpdump -i lo -e -c 2 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
03:09:05.002034 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.9557 > 127.0.0.1.36828: Flags [.], ack 3562800815, win 15342, options [nop,nop,TS val 843174495 ecr 843159493], length 0
03:09:05.002094 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.36828 > 127.0.0.1.9557: Flags [.], ack 1, win 6160, options [nop,nop,TS val 843174496 ecr 843159493], length 0
2 packets captured
116 packets received by filter
0 packets dropped by kernel
```
Wireshark shows a similar result as the tcpdump example above.
Linux's loopback setup: https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d4772966/drivers/net/loopback.c#L162
PiperOrigin-RevId: 391836719
|
|
Also fix an option parsing error in checker.TCPTimestampChecker while I am here.
PiperOrigin-RevId: 391828329
|
|
Use different secrets for different purposes (port picking,
ISN generation, tsOffset generation) and move the secrets
from stack.Stack to tcp.protocol.
PiperOrigin-RevId: 391641238
|
|
Also plumb the controls through runsc.
PiperOrigin-RevId: 391594318
|
|
PiperOrigin-RevId: 391416650
|
|
Add support for msgctl and enable tests.
Fixes #135
|
|
For IPCInfo, update the value of the MSGSEG constant in abi to avoid overflow in
MsgInfo.MsgSeg. MSGSEG was originally simplified in abi, and is unused
(by us and within the kernel), so updating it is okay.
Updates #135
|
|
Set provides the functionality of {sem,shm,msg}ctl(IPC_SET).
|
|
Removes package syserror and moves still relevant code to either linuxerr
or to syserr (to be later removed).
Internal errors are converted from random types to *errors.Error types used
in linuxerr. Internal errors are in linuxerr/internal.go.
PiperOrigin-RevId: 390724202
|
|
Add Event controls and implement "stream" commands.
PiperOrigin-RevId: 390691702
|
|
tcpip.Endpoint.Close is documented to free all resources associated
with an endpoint so we don't need to create an empty map to clear
the multicast memberships.
PiperOrigin-RevId: 390609826
|
|
Add Usage controls and implement "usage/usagefd" commands.
PiperOrigin-RevId: 390507423
|
|
Convert remaining public errors (e.g. EINTR) from syserror to linuxerr.
PiperOrigin-RevId: 390471763
|
|
PiperOrigin-RevId: 390463819
|
|
PiperOrigin-RevId: 390318725
|
|
The send buffer size in TCP indicates the number of bytes available for the
sender to transmit. This change allows TCP to update the send buffer size when:
- TCP enters the established state.
- An ACK is received.
Auto-tuning is disabled when the send buffer size is set with the
SO_SNDBUF option.
PiperOrigin-RevId: 390312274
|
|
Previously, PacketBuffer.Clone() would do a shallow top-level copy of the packet
buffer, which involved sharing the *buffer.Buffer between packets. Reading
or writing to the buffer in one packet would impact the other.
This caused modifications in one packet to affect the other's pkt.Views(), which
is not desired. Change the clone to do a deeper copy of the underlying buffer
list and buffer pointers. The payload buffers (which are immutable) are still
shared. This change makes the Clone() operation more expensive, as we now need
to allocate the entire buffer list.
Add a unit test to check the integrity of packet data after cloning.
Reported-by: syzbot+7ffff9a82a227b8f2e31@syzkaller.appspotmail.com
Reported-by: syzbot+7d241de0d9072b2b6075@syzkaller.appspotmail.com
Reported-by: syzbot+212bc4d75802fa461521@syzkaller.appspotmail.com
PiperOrigin-RevId: 390277713
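The difference between the two kinds of clone can be sketched with simplified types. This is only an illustration under the assumption that a packet points at a list of views; `buffer`, `packet`, `shallowClone`, `deepClone` and `dropFirstView` are hypothetical and much simpler than the real PacketBuffer.

```go
package main

import "fmt"

// buffer is a hypothetical stand-in for a list of views.
type buffer struct{ views [][]byte }

// packet refers to a buffer by pointer, so copying the struct shares the list.
type packet struct{ buf *buffer }

// shallowClone copies only the top-level struct; both packets share one list.
func (p *packet) shallowClone() *packet { return &packet{buf: p.buf} }

// deepClone allocates a new list but still shares the (immutable) payload
// byte slices that the list entries point at.
func (p *packet) deepClone() *packet {
	views := make([][]byte, len(p.buf.views))
	copy(views, p.buf.views)
	return &packet{buf: &buffer{views: views}}
}

// dropFirstView mutates the list, for example when a header is consumed.
func (p *packet) dropFirstView() { p.buf.views = p.buf.views[1:] }

// size returns the total number of bytes across all views.
func (p *packet) size() int {
	n := 0
	for _, v := range p.buf.views {
		n += len(v)
	}
	return n
}

func main() {
	orig := &packet{buf: &buffer{views: [][]byte{[]byte("hdr"), []byte("payload")}}}

	deep := orig.deepClone()
	deep.dropFirstView()
	fmt.Println("after deep clone edit:", orig.size())

	shallow := orig.shallowClone()
	shallow.dropFirstView()
	fmt.Println("after shallow clone edit:", orig.size())
}
```

The deep clone costs one extra list allocation per clone, which is the trade-off the message mentions.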
|
|
The dentry for each file/directory can be created/destroyed multiple
times during sandbox lifetime. We should not clear the Merkle file each
time a dentry is created.
PiperOrigin-RevId: 390277107
|
|
We were relying on children adding their names to the parent's dentry to
populate the parent's children list. However, this may not work since the
parent dentry could be destroyed if its reference count drops to zero.
In that case, a new dentry will be created when enabling the parent and
it does not contain the children's name info. Therefore we need to
populate the child names list again to avoid missing children in the
directory.
PiperOrigin-RevId: 390270227
|
|
Allow creation and management of subcontainers through cgroupfs
directory syscalls. Also add a mechanism to specify a default root
container to start new jobs in.
This implements the filesystem support for subcontainers, but doesn't
implement hierarchical resource accounting or task migration.
PiperOrigin-RevId: 390254870
|
|
Right now, it contains the code:
origState := atomic.LoadUint32(&c.state)
atomicbitops.AndUint32(&c.state, ^vCPUUser)
The problem here is that vCPU.bounce, called from another thread, can add
vCPUWaiter after origState has been read but before vCPUUser is cleared. In
this case, vCPU.unlock doesn't notify other threads about changes and c.bounce
will be stuck in the futex_wait call.
PiperOrigin-RevId: 389697411
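One way to avoid this kind of window is to make the read and the bit clear a single atomic step. The sketch below uses a compare-and-swap loop; it is an assumption about a possible approach, not necessarily what the change does, and the constant values and `clearUser` are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical bit values for the vCPU state word.
const (
	vCPUUser uint32 = 1 << iota
	vCPUGuest
	vCPUWaiter
)

// clearUser clears vCPUUser and returns the state observed at the moment the
// bit was cleared. Because the swap only succeeds if the state is unchanged
// since the load, a concurrently added vCPUWaiter cannot be missed.
func clearUser(state *uint32) uint32 {
	for {
		old := atomic.LoadUint32(state)
		if atomic.CompareAndSwapUint32(state, old, old&^vCPUUser) {
			return old
		}
	}
}

func main() {
	state := vCPUUser | vCPUWaiter
	old := clearUser(&state)
	fmt.Println("waiter seen:", old&vCPUWaiter != 0)
	fmt.Println("user cleared:", state&vCPUUser == 0)
}
```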
|
|
Reported-by: syzbot+63bde04529f701c76168@syzkaller.appspotmail.com
Reported-by: syzbot+69866b9a16ec29993e6a@syzkaller.appspotmail.com
PiperOrigin-RevId: 389084629
|
|
Go's dev.typeparams branch already claims to be Go 1.18, so our !go1.18 build
tags break testing gVisor with that branch.
Normally I would not want to bump the build tags this early, but I plan to
extend checklinkname to check the assumptions in these files and remove the
build tags ASAP. So we just go ahead and bump the tags until then to unblock
testing.
PiperOrigin-RevId: 389037239
|
|
PiperOrigin-RevId: 389035388
|
|
PiperOrigin-RevId: 388985968
|
|
AlignedAtomicUint64 is 15 bytes and it takes 16 bytes in structures. On
32-bit systems, variables and structure fields are guaranteed to be
32-bit aligned and this means that we need only 12 bytes to find 8
contiguous bytes.
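The byte counts above follow from simple address arithmetic, sketched here without touching real memory. `alignedOffset` is a hypothetical helper that returns how far into a backing array the first 8-byte-aligned address lies.

```go
package main

import "fmt"

// alignedOffset returns the number of bytes to skip from base to reach the
// next address that is a multiple of 8.
func alignedOffset(base uintptr) uintptr {
	return (8 - base%8) % 8
}

func main() {
	// With 4-byte alignment the offset is 0 or 4, so 4 + 8 = 12 bytes suffice.
	for _, base := range []uintptr{0x1000, 0x1004} {
		off := alignedOffset(base)
		fmt.Printf("base=%#x offset=%d bytes needed=%d\n", base, off, off+8)
	}
	// With no alignment guarantee the offset can be up to 7, so 15 are needed.
	off := alignedOffset(0x1001)
	fmt.Printf("base=%#x offset=%d bytes needed=%d\n", uintptr(0x1001), off, off+8)
}
```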
|
|
In this case, the task is already a subreaper, so setting this bit is a noop.
Updates #2323
PiperOrigin-RevId: 388828034
|
|
Add Fs controls and implement "cat" command.
PiperOrigin-RevId: 388812540
|
|
Also change runsc pause/resume cmd to access Lifecycle instead of
containerManager.
PiperOrigin-RevId: 388534928
|
|
Implement Queue.Copy and add more tests for it.
Updates #135
|
|
Add support for msgsnd and msgrcv and enable syscall tests.
Updates #135
|
|
Receive implements the behaviour of msgrcv(2) without the MSG_COPY flag.
Updates #135
|
|
Send implements the functionality of msgsnd(2).
Updates #135
|
|
PiperOrigin-RevId: 388129112
|
|
This change will allow us to remove the default link in a packetimpact test so
we can reduce nondeterministic behavior as required in https://fxbug.dev/78430.
This will also help with testing #1388.
Updates #578, #1388.
PiperOrigin-RevId: 387896847
|
|
PiperOrigin-RevId: 387885663
|
|
This CL introduces a 'checklinkname' analyzer, which provides rudimentary
type-checking that verifies that function signatures on the local and remote
sides of //go:linkname directives match expected values.
If the Go standard library changes the definitions of any of these functions,
checklinkname will flag the change as a finding, providing an error informing
the gVisor team to adapt to the upstream changes. This allows us to eliminate
the majority of gVisor's forward-looking negative build tags, as we can catch
mismatches in testing [1].
The remaining forward-looking negative build tags are covering shared struct
definitions, which I hope to add to checklinkname in a future CL.
[1] Of course, semantics/requirements can change without the signature
changing, so we still must be careful, but this covers the common case.
PiperOrigin-RevId: 387873847
|
|
PiperOrigin-RevId: 387513118
|
|
For comparison:
```
$ docker run --rm -it ubuntu:focal bash -c 'cat /proc/self/status'
Name: cat
Umask: 0022
State: R (running)
Tgid: 1
Ngid: 0
Pid: 1
PPid: 0
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 64
Groups:
NStgid: 1
NSpid: 1
NSpgid: 1
NSsid: 1
VmPeak: 2660 kB
VmSize: 2660 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 528 kB
VmRSS: 528 kB
...
$ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal bash -c 'cat /proc/self/status'
Name: cat
State: R (running)
Tgid: 1
Pid: 1
PPid: 0
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 4
Groups:
VmSize: 10708 kB
VmRSS: 3124 kB
VmData: 316 kB
...
```
Fixes #6374
PiperOrigin-RevId: 387465655
|