summaryrefslogtreecommitdiffhomepage
path: root/pkg
AgeCommit message (Collapse)Author
2021-08-24Measure RTT during handshake since Linux does the sameZeling Feng
Some tcp unit tests are affected by this change: - Some retransmission tests assumed RTO=1s when connection is established. This is no longer true because minRTO was set to 3s in tests so now RTO becomes 3s after the first updateRTO call. Set minRTO=1s for these tests. - Some RACK enabled tests are affected because now that RTT is initialized, and the estimated RTT is quite small, spurious TLP might be sent out and causing flakes, introduce an artificial delay for these tests so that the estimated RTT is larger. PiperOrigin-RevId: 392768725
2021-08-23Merge pull request #6491 from avagin:kvm-mem-slot-overlapgVisor bot
PiperOrigin-RevId: 392554743
2021-08-23Internal change.Chong Cai
PiperOrigin-RevId: 392523879
2021-08-21platform/kvm: set physical slots without overlappingAndrei Vagin
Right now, the first slot starts with an address of a memory region and its size is faultBlockSize, but the second slot starts with (physicalStart + faultBlockSize) & faultBlockMask. It means they will overlap if a start address of a memory region are not aligned to faultBlockSize. The kernel doesn't allow to add overlapped regions, but we ignore the EEXIST error. Signed-off-by: Andrei Vagin <avagin@google.com>
2021-08-20Allow gofer.specialFileFDs to be mmapped with a host FD.Jamie Liu
PiperOrigin-RevId: 392102898
2021-08-20[op] Prevent file leak in MultiGetAttr's error path.Ayush Ranjan
The old implementation was mostly correct but error prone - making way for the issue in question here. In its error path, it would leak the intermediate file being walked. Each return/break needed explicit cleanup. This change implements a more clean way to cleaning up intermediate directories. If the code were to evolve to be more complex, it would still work. PiperOrigin-RevId: 392102826
2021-08-20Fix lock ordering violation introduced in cl/347704347.Nicolas Lacasse
We cannot hold mm.aioManager.mu while calling MUnmap, because MUnmap attempts to aquire mm.mappingMu. This violates the lock order as documented in mm/mm.go. PiperOrigin-RevId: 392102472
2021-08-20Remove experimental warning in the VFS2 README.Jamie Liu
PiperOrigin-RevId: 392078690
2021-08-19Cache verity dentriesChong Cai
Add an LRU cache to cache verity dentries when ref count drop to 0. This way we don't need to hash and verify the previous opened files or directories each time. PiperOrigin-RevId: 391880157
2021-08-19Merge Read calls in verity merkle treeChong Cai
Read all data into memory in one Read call and verify them block by block instead of read each block during verification. This is for performance purpose to avoid invoking multiple syscalls. PiperOrigin-RevId: 391877937
2021-08-19Use MM-mapped I/O instead of buffered copies in gofer.specialFileFD.Jamie Liu
The rationale given for using buffered copies is still valid, but it's unclear whether holding MM locks or allocating buffers is better in practice, and the former is at least consistent with gofer.regularFileFD (and VFS1), making performance easier to reason about. PiperOrigin-RevId: 391877913
2021-08-19Add loopback interface as an ethernet-based deviceGhanan Gowripalan
...to match Linux behaviour. We can see evidence of Linux representing loopback as an ethernet-based device below: ``` # EUI-48 based MAC addresses. $ ip link show lo 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 # tcpdump showing ethernet frames when sniffing loopback and logging the # link-type as EN10MB (Ethernet). $ sudo tcpdump -i lo -e -c 2 -n tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes 03:09:05.002034 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.9557 > 127.0.0.1.36828: Flags [.], ack 3562800815, win 15342, options [nop,nop,TS val 843174495 ecr 843159493], length 0 03:09:05.002094 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 66: 127.0.0.1.36828 > 127.0.0.1.9557: Flags [.], ack 1, win 6160, options [nop,nop,TS val 843174496 ecr 843159493], length 0 2 packets captured 116 packets received by filter 0 packets dropped by kernel ``` Wireshark shows a similar result as the tcpdump example above. Linux's loopback setup: https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d4772966/drivers/net/loopback.c#L162 PiperOrigin-RevId: 391836719
2021-08-19Use a hash function to generate tcp timestamp offsetZeling Feng
Also fix an option parsing error in checker.TCPTimestampChecker while I am here. PiperOrigin-RevId: 391828329
2021-08-18Split TCP secrets from Stack to tcp.protocolZeling Feng
Use different secrets for different purposes (port picking, ISN generation, tsOffset generation) and moved the secrets from stack.Stack to tcp.protocol. PiperOrigin-RevId: 391641238
2021-08-18Add control configsChong Cai
Also plumber the controls through runsc PiperOrigin-RevId: 391594318
2021-08-17Merge pull request #6262 from sudo-sturbia:msgqueue/syscalls3gVisor bot
PiperOrigin-RevId: 391416650
2021-08-17Implement stub for msgctl(2).Zyad A. Ali
Add support for msgctl and enable tests. Fixes #135
2021-08-17Implement control operations on msgqueue.Zyad A. Ali
For IPCInfo, update value of MSGSEG constant in abi to avoid overflow in MsgInfo.MsgSeg. MSGSEG was originaly simplified in abi, and is unused (by us and within the kernel), so updating it is okay. Updates #135
2021-08-17Implement ipc.Object.Set and use it in ipc mechanisms.Zyad A. Ali
Set provides functionality of {sem,shm,msg}ctl(IPC_SET).
2021-08-13[syserror] Remove pkg syserror.Zach Koopmans
Removes package syserror and moves still relevant code to either linuxerr or to syserr (to be later removed). Internal errors are converted from random types to *errors.Error types used in linuxerr. Internal errors are in linuxerr/internal.go. PiperOrigin-RevId: 390724202
2021-08-13Add Event controlsChong Cai
Add Event controls and implement "stream" commands. PiperOrigin-RevId: 390691702
2021-08-13Free multicastMemberships on UDP endpoint closeGhanan Gowripalan
tcpip.Endpoint.Close is documented to free all resources associated with an endpoint so we don't need to create an empty map to clear the multicast memberships. PiperOrigin-RevId: 390609826
2021-08-12Add Usage controlsChong Cai
Add Usage controls and implement "usage/usagefd" commands. PiperOrigin-RevId: 390507423
2021-08-12[syserror] Convert remaining syserror definitions to linuxerr.Zach Koopmans
Convert remaining public errors (e.g. EINTR) from syserror to linuxerr. PiperOrigin-RevId: 390471763
2021-08-12fix typoKevin Krakauer
PiperOrigin-RevId: 390463819
2021-08-12Internal change.gVisor bot
PiperOrigin-RevId: 390318725
2021-08-12Add support for TCP send buffer auto tuning.Nayana Bidari
Send buffer size in TCP indicates the amount of bytes available for the sender to transmit. This change will allow TCP to update the send buffer size when - TCP enters established state. - ACK is received. The auto tuning is disabled when the send buffer size is set with the SO_SNDBUF option. PiperOrigin-RevId: 390312274
2021-08-11[op] Make PacketBuffer Clone() do a deeper copy.Ayush Ranjan
Earlier PacketBuffer.Clone() would do a shallow top level copy of the packet buffer - which involved sharing the *buffer.Buffer between packets. Reading or writing to the buffer in one packet would impact the other. This caused modifications in one packet to affect the other's pkt.Views() which is not desired. Change the clone to do a deeper copy of the underlying buffer list and buffer pointers. The payload buffers (which are immutable) are still shared. This change makes the Clone() operation more expensive as we now need to allocate the entire buffer list. Added unit test to test integrity of packet data after cloning. Reported-by: syzbot+7ffff9a82a227b8f2e31@syzkaller.appspotmail.com Reported-by: syzbot+7d241de0d9072b2b6075@syzkaller.appspotmail.com Reported-by: syzbot+212bc4d75802fa461521@syzkaller.appspotmail.com PiperOrigin-RevId: 390277713
2021-08-11Do not clear merkle files when creating dentryChong Cai
The dentry for each file/directory can be created/destroyed multiple times during sandbox lifetime. We should not clear the Merkle file each time a dentry is created. PiperOrigin-RevId: 390277107
2021-08-11Popluate verity directory children namesChong Cai
We were relying on children adding its name to parent's dentry to populate parent's children list. However, this may not work since the parent dentry could be destroyed if its reference count drops to zero. In that case, a new dentry will be created when enabling the parent and it does not contain the children names info. Therefore we need to populate the child names list again to avoid missing children in the directory. PiperOrigin-RevId: 390270227
2021-08-11Initial cgroupfs support for subcontainersRahat Mahmood
Allow creation and management of subcontainers through cgroupfs directory syscalls. Also add a mechanism to specify a default root container to start new jobs in. This implements the filesystem support for subcontainers, but doesn't implement hierarchical resource accounting or task migration. PiperOrigin-RevId: 390254870
2021-08-09platform/kvm: fix a race condition in vCPU.unlock()Andrei Vagin
Right now, it contains the code: origState := atomic.LoadUint32(&c.state) atomicbitops.AndUint32(&c.state, ^vCPUUser) The problem here is that vCPU.bounce that is called from another thread can add vCPUWaiter when origState has been read but vCPUUser isn't cleared yet. In this case, vCPU.unlock doesn't notify other threads about changes and c.bounce will be stuck in the futex_wait call. PiperOrigin-RevId: 389697411
2021-08-05Correctly handle interruptions in blocking msgqueue syscalls.Rahat Mahmood
Reported-by: syzbot+63bde04529f701c76168@syzkaller.appspotmail.com Reported-by: syzbot+69866b9a16ec29993e6a@syzkaller.appspotmail.com PiperOrigin-RevId: 389084629
2021-08-05Bump gVisor build tags to go1.19Michael Pratt
Go's dev.typeparams branch already claims to be Go 1.18, so our !go1.18 build tags breaking testing gVisor with that branch. Normally I would not want to bump the build tags this early, but I plan to extend checklinkname to check the assumptions in these files and remove the build tags ASAP. So we just go ahead and bump the tags until then to unblock testing. PiperOrigin-RevId: 389037239
2021-08-05Automated rollback of changelist 384508720Kevin Krakauer
PiperOrigin-RevId: 389035388
2021-08-05Merge pull request #6372 from avagin:AlignedAtomicgVisor bot
PiperOrigin-RevId: 388985968
2021-08-04Reduce overhead of AlignedAtomic typesAndrei Vagin
AlignedAtomicUint64 is 15 bytes and it takes 16 bytes in structures. On 32-bit systems, variables and structure fields is guaranteed to be 32-bit aligned and this means that we need only 12 bytes to find 8 contiguous bytes.
2021-08-04Implement PR_SET_CHILD_SUBREAPER when the calling task is PID 1.Nicolas Lacasse
In this case, the task is already a subreaper, so setting this bit is a noop. Updates #2323 PiperOrigin-RevId: 388828034
2021-08-04Add Fs controlsChong Cai
Add Fs controls and implement "cat" command. PiperOrigin-RevId: 388812540
2021-08-03Add Lifecycle controlsChong Cai
Also change runsc pause/resume cmd to access Lifecycle instead of containerManager. PiperOrigin-RevId: 388534928
2021-08-03Implement MSG_COPY option for msgrcv(2).Zyad A. Ali
Implement Queue.Copy and add more tests for it. Updates #135
2021-08-03Implement stubs for msgsnd(2) and msgrcv(2).Zyad A. Ali
Add support for msgsnd and msgrcv and enable syscall tests. Updates #135
2021-08-03Implement Queue.Receive.Zyad A. Ali
Receive implements the behaviour of msgrcv(2) without the MSG_COPY flag. Updates #135
2021-08-03Implement Queue.Send.Zyad A. Ali
Send implements the functionality of msgsnd(2). Updates #135
2021-08-01Merge pull request #6350 from sudo-sturbia:cgroupfsgVisor bot
PiperOrigin-RevId: 388129112
2021-07-30Support RTM_DELLINKZeling Feng
This change will allow us to remove the default link in a packetimpact test so we can reduce indeterministic behaviors as required in https://fxbug.dev/78430. This will also help with testing #1388. Updates #578, #1388. PiperOrigin-RevId: 387896847
2021-07-30Merge pull request #6257 from zhlhahaha:2193-1gVisor bot
PiperOrigin-RevId: 387885663
2021-07-30checklinkname: rudimentary type-checking of linkname directivesMichael Pratt
This CL introduces a 'checklinkname' analyzer, which provides rudimentary type-checking that verifies that function signatures on the local and remote sides of //go:linkname directives match expected values. If the Go standard library changes the definitions of any of these function, checklinkname will flag the change as a finding, providing an error informing the gVisor team to adapt to the upstream changes. This allows us to eliminate the majority of gVisor's forward-looking negative build tags, as we can catch mismatches in testing [1]. The remaining forward-looking negative build tags are covering shared struct definitions, which I hope to add to checklinkname in a future CL. [1] Of course, semantics/requirements can change without the signature changing, so we still must be careful, but this covers the common case. PiperOrigin-RevId: 387873847
2021-07-28Explicitly encode the pcap packet headers to reduce CPU cost of pcap generation.gVisor bot
PiperOrigin-RevId: 387513118
2021-07-28Add Uid/Gid/Groups fields to VFS2 /proc/[pid]/status.Jamie Liu
For comparison: ``` $ docker run --rm -it ubuntu:focal bash -c 'cat /proc/self/status' Name: cat Umask: 0022 State: R (running) Tgid: 1 Ngid: 0 Pid: 1 PPid: 0 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 64 Groups: NStgid: 1 NSpid: 1 NSpgid: 1 NSsid: 1 VmPeak: 2660 kB VmSize: 2660 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 528 kB VmRSS: 528 kB ... $ docker run --runtime=runsc-vfs2 --rm -it ubuntu:focal bash -c 'cat /proc/self/status' Name: cat State: R (running) Tgid: 1 Pid: 1 PPid: 0 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 4 Groups: VmSize: 10708 kB VmRSS: 3124 kB VmData: 316 kB ... ``` Fixes #6374 PiperOrigin-RevId: 387465655