summaryrefslogtreecommitdiffhomepage
path: root/pkg
AgeCommit message (Collapse)Author
2020-02-20Initial network namespace support.gVisor bot
TCP/IP will work with netstack networking. hostinet doesn't work, and sockets will have the same behavior as it is now. Before the userspace is able to create device, the default loopback device can be used to test. /proc/net and /sys/net will still be connected to the root network stack; this is the same behavior now. Issue #1833 PiperOrigin-RevId: 296309389
2020-02-20Support disabling a NICgVisor bot
- Disabled NICs will have their associated NDP state cleared. - Disabled NICs will not accept incoming packets. - Writes through a Route with a disabled NIC will return an invalid endpoint state error. - stack.Stack.FindRoute will not return a route with a disabled NIC. - NIC's Running flag will report the NIC's enabled status. Tests: - stack_test.TestDisableUnknownNIC - stack_test.TestDisabledNICsNICInfoAndCheckNIC - stack_test.TestRoutesWithDisabledNIC - stack_test.TestRouteWritePacketWithDisabledNIC - stack_test.TestStopStartSolicitingRouters - stack_test.TestCleanupNDPState - stack_test.TestAddRemoveIPv4BroadcastAddressOnNICEnableDisable - stack_test.TestJoinLeaveAllNodesMulticastOnNICEnableDisable PiperOrigin-RevId: 296298588
2020-02-20Remove bytes read/written from marshal.Marshallable API.gVisor bot
Users of the API only care about whether the copy in/out succeeds in their entirety, which is already signalled by the returned error. PiperOrigin-RevId: 296297843
2020-02-20Better strace logging for epoll syscalls.gVisor bot
Example: epoll_ctl(0x3 anon_inode:[eventpoll], EPOLL_CTL_ADD, 0x6 anon_inode:[eventfd], 0x7efe2fd92a80 {events=EPOLLIN|EPOLLOUT data=0x10203040506070a}) = 0x0 (4.411µs) epoll_wait(0x3 anon_inode:[eventpoll], 0x7efe2fd92b50 {{events=EPOLLOUT data=0x102030405060708}{events=EPOLLOUT data=0x102030405060708}{events=EPOLLOUT data=0x102030405060708}}, 0x3, 0xffffffff) = 0x3 (29.891µs) PiperOrigin-RevId: 296258146
2020-02-20Re-add atomicbitops_arm64.s to BUILD.gVisor bot
This was inadverently dropped by cl/295811743. PiperOrigin-RevId: 296254482
2020-02-20VFS2: Support memory mapping in tmpfs.gVisor bot
tmpfs.fileDescription now implements ConfigureMMap. And tmpfs.regularFile implement memmap.Mappable. The methods are mostly unchanged from VFS1 tmpfs. PiperOrigin-RevId: 296234557
2020-02-19Internal change.gVisor bot
PiperOrigin-RevId: 296088213
2020-02-18Fix mis-named comment.Kevin Krakauer
2020-02-18Enable IPV6_RECVTCLASS socket option for datagram socketsgVisor bot
Added the ability to get/set the IP_RECVTCLASS socket option on UDP endpoints. If enabled, traffic class from the incoming Network Header passed as ancillary data in the ControlMessages. Adding Get/SetSockOptBool to decrease the overhead of getting/setting simple options. (This was absorbed in a CL that will be landing before this one). Test: * Added unit test to udp_test.go that tests getting/setting as well as verifying that we receive expected TOS from incoming packet. * Added a syscall test for verifying getting/setting * Removed test skip for existing syscall test to enable end to end test. PiperOrigin-RevId: 295840218
2020-02-18Add //pkg/syncevent.gVisor bot
Package syncevent is intended to subsume ~all uses of channels in the sentry (including //pkg/waiter), as well as //pkg/sleep. Compared to channels: - Delivery of events to a syncevent.Receiver allows *synchronous* execution of an arbitrary callback, whereas delivery of events to a channel requires a goroutine to receive from that channel, resulting in substantial scheduling overhead. (This is also part of the motivation for the waiter package.) - syncevent.Waiter can wait on multiple event sources without the high O(N) overhead of select. (This is the same motivation as for the sleep package.) Compared to the waiter package: - syncevent.Waiters are intended to be persistent (i.e. per-kernel.Task), and syncevent.Broadcaster (analogous to waiter.Queue) is a hash table rather than a linked list, such that blocking is (usually) allocation-free. - syncevent.Source (analogous to waiter.Waitable) does not include an equivalent to waiter.Waitable.Readiness(), since this is inappropriate for transient events (see e.g. //pkg/sentry/kernel/time.ClockEventSource). Compared to the sleep package: - syncevent events are represented by bits in a bitmask rather than discrete sleep.Waker objects, reducing overhead and making it feasible to broadcast events to multiple syncevent.Receivers. - syncevent.Receiver invokes an arbitrary callback, which is required by the sentry's epoll implementation. (syncevent.Waiter, which is analogous to sleep.Sleeper, pairs a syncevent.Receiver with a callback that wakes a waiting goroutine; the implementation of this aspect is nearly identical to that of sleep.Sleeper, except that it represents *runtime.g as unsafe.Pointer rather than uintptr.) - syncevent.Waiter.Wait (analogous to sleep.Sleeper.Fetch(block=true)) does not automatically un-assert returned events. This is useful in cases where the path for handling an event is not the same as the path that observes it, such as for application signals (a la Linux's TIF_SIGPENDING). - Unlike sleep.Sleeper, which Fetches Wakers in the order that they were Asserted, the event bitmasks used by syncevent.Receiver have no way of preserving event arrival order. (This is similar to select, which goes out of its way to randomize event ordering.) The disadvantage of the syncevent package is that, since events are represented by bits in a uint64 bitmask, each syncevent.Receiver can "only" multiplex between 64 distinct events; this does not affect any known use case. Benchmarks: BenchmarkBroadcasterSubscribeUnsubscribe BenchmarkBroadcasterSubscribeUnsubscribe-12 45133884 26.3 ns/op BenchmarkMapSubscribeUnsubscribe BenchmarkMapSubscribeUnsubscribe-12 28504662 41.8 ns/op BenchmarkQueueSubscribeUnsubscribe BenchmarkQueueSubscribeUnsubscribe-12 22747668 45.6 ns/op BenchmarkBroadcasterSubscribeUnsubscribeBatch BenchmarkBroadcasterSubscribeUnsubscribeBatch-12 31609177 37.8 ns/op BenchmarkMapSubscribeUnsubscribeBatch BenchmarkMapSubscribeUnsubscribeBatch-12 17563906 62.1 ns/op BenchmarkQueueSubscribeUnsubscribeBatch BenchmarkQueueSubscribeUnsubscribeBatch-12 26248838 46.6 ns/op BenchmarkBroadcasterBroadcastRedundant BenchmarkBroadcasterBroadcastRedundant/0 BenchmarkBroadcasterBroadcastRedundant/0-12 100907563 11.8 ns/op BenchmarkBroadcasterBroadcastRedundant/1 BenchmarkBroadcasterBroadcastRedundant/1-12 85103068 13.3 ns/op BenchmarkBroadcasterBroadcastRedundant/4 BenchmarkBroadcasterBroadcastRedundant/4-12 52716502 22.3 ns/op BenchmarkBroadcasterBroadcastRedundant/16 BenchmarkBroadcasterBroadcastRedundant/16-12 20278165 58.7 ns/op BenchmarkBroadcasterBroadcastRedundant/64 BenchmarkBroadcasterBroadcastRedundant/64-12 5905428 205 ns/op BenchmarkMapBroadcastRedundant BenchmarkMapBroadcastRedundant/0 BenchmarkMapBroadcastRedundant/0-12 87532734 13.5 ns/op BenchmarkMapBroadcastRedundant/1 BenchmarkMapBroadcastRedundant/1-12 28488411 36.3 ns/op BenchmarkMapBroadcastRedundant/4 BenchmarkMapBroadcastRedundant/4-12 19628920 60.9 ns/op BenchmarkMapBroadcastRedundant/16 BenchmarkMapBroadcastRedundant/16-12 6026980 192 ns/op BenchmarkMapBroadcastRedundant/64 BenchmarkMapBroadcastRedundant/64-12 1640858 754 ns/op BenchmarkQueueBroadcastRedundant BenchmarkQueueBroadcastRedundant/0 BenchmarkQueueBroadcastRedundant/0-12 96904807 12.0 ns/op BenchmarkQueueBroadcastRedundant/1 BenchmarkQueueBroadcastRedundant/1-12 73521873 16.3 ns/op BenchmarkQueueBroadcastRedundant/4 BenchmarkQueueBroadcastRedundant/4-12 39209468 31.2 ns/op BenchmarkQueueBroadcastRedundant/16 BenchmarkQueueBroadcastRedundant/16-12 10810058 105 ns/op BenchmarkQueueBroadcastRedundant/64 BenchmarkQueueBroadcastRedundant/64-12 2998046 376 ns/op BenchmarkBroadcasterBroadcastAck BenchmarkBroadcasterBroadcastAck/1 BenchmarkBroadcasterBroadcastAck/1-12 44472397 26.4 ns/op BenchmarkBroadcasterBroadcastAck/4 BenchmarkBroadcasterBroadcastAck/4-12 17653509 69.7 ns/op BenchmarkBroadcasterBroadcastAck/16 BenchmarkBroadcasterBroadcastAck/16-12 4082617 260 ns/op BenchmarkBroadcasterBroadcastAck/64 BenchmarkBroadcasterBroadcastAck/64-12 1220534 1027 ns/op BenchmarkMapBroadcastAck BenchmarkMapBroadcastAck/1 BenchmarkMapBroadcastAck/1-12 26760705 44.2 ns/op BenchmarkMapBroadcastAck/4 BenchmarkMapBroadcastAck/4-12 11495636 100 ns/op BenchmarkMapBroadcastAck/16 BenchmarkMapBroadcastAck/16-12 2937590 343 ns/op BenchmarkMapBroadcastAck/64 BenchmarkMapBroadcastAck/64-12 861037 1344 ns/op BenchmarkQueueBroadcastAck BenchmarkQueueBroadcastAck/1 BenchmarkQueueBroadcastAck/1-12 19832679 55.0 ns/op BenchmarkQueueBroadcastAck/4 BenchmarkQueueBroadcastAck/4-12 5618214 189 ns/op BenchmarkQueueBroadcastAck/16 BenchmarkQueueBroadcastAck/16-12 1569980 713 ns/op BenchmarkQueueBroadcastAck/64 BenchmarkQueueBroadcastAck/64-12 437672 2814 ns/op BenchmarkWaiterNotifyRedundant BenchmarkWaiterNotifyRedundant-12 650823090 1.96 ns/op BenchmarkSleeperNotifyRedundant BenchmarkSleeperNotifyRedundant-12 619871544 1.61 ns/op BenchmarkChannelNotifyRedundant BenchmarkChannelNotifyRedundant-12 298903778 3.67 ns/op BenchmarkWaiterNotifyWaitAck BenchmarkWaiterNotifyWaitAck-12 68358360 17.8 ns/op BenchmarkSleeperNotifyWaitAck BenchmarkSleeperNotifyWaitAck-12 25044883 41.2 ns/op BenchmarkChannelNotifyWaitAck BenchmarkChannelNotifyWaitAck-12 29572404 40.2 ns/op BenchmarkSleeperMultiNotifyWaitAck BenchmarkSleeperMultiNotifyWaitAck-12 16122969 73.8 ns/op BenchmarkWaiterTempNotifyWaitAck BenchmarkWaiterTempNotifyWaitAck-12 46111489 25.8 ns/op BenchmarkSleeperTempNotifyWaitAck BenchmarkSleeperTempNotifyWaitAck-12 15541882 73.6 ns/op BenchmarkWaiterNotifyWaitMultiAck BenchmarkWaiterNotifyWaitMultiAck-12 65878500 18.2 ns/op BenchmarkSleeperNotifyWaitMultiAck BenchmarkSleeperNotifyWaitMultiAck-12 28798623 41.5 ns/op BenchmarkChannelNotifyWaitMultiAck BenchmarkChannelNotifyWaitMultiAck-12 11308468 101 ns/op BenchmarkWaiterNotifyAsyncWaitAck BenchmarkWaiterNotifyAsyncWaitAck-12 2475387 492 ns/op BenchmarkSleeperNotifyAsyncWaitAck BenchmarkSleeperNotifyAsyncWaitAck-12 2184507 518 ns/op BenchmarkChannelNotifyAsyncWaitAck BenchmarkChannelNotifyAsyncWaitAck-12 2120365 562 ns/op BenchmarkWaiterNotifyAsyncWaitMultiAck BenchmarkWaiterNotifyAsyncWaitMultiAck-12 2351247 494 ns/op BenchmarkSleeperNotifyAsyncWaitMultiAck BenchmarkSleeperNotifyAsyncWaitMultiAck-12 2205799 522 ns/op BenchmarkChannelNotifyAsyncWaitMultiAck BenchmarkChannelNotifyAsyncWaitMultiAck-12 1238079 928 ns/op Updates #1074 PiperOrigin-RevId: 295834087
2020-02-18cpuid: cache the maximum size of xsave stategVisor bot
perf shows that ExtendedStateSize cosumes more than 20% of cpu: 23.61% 23.61% [.] pkg/cpuid/cpuid.HostID PiperOrigin-RevId: 295813263
2020-02-18atomicbitops package cleanupsgVisor bot
- Redocument memory ordering from "no ordering" to "acquire-release". (No functional change: both LOCK WHATEVER on x86, and LDAXR/STLXR loops on ARM64, already have this property.) - Remove IncUnlessZeroInt32 and DecUnlessOneInt32, which were only faster than the equivalent loops using sync/atomic before the Go compiler inlined non-unsafe.Pointer atomics many releases ago. PiperOrigin-RevId: 295811743
2020-02-18Merge pull request #1850 from kevinGC:jump2gVisor bot
PiperOrigin-RevId: 295785052
2020-02-18ring0/pagetables: fix typogVisor bot
PiperOrigin-RevId: 295770717
2020-02-14Remove linux.EpollEvent.Fd.gVisor bot
glibc defines struct epoll_event in such a way that epoll_event.data.fd exists. However, the kernel's definition of struct epoll_event makes epoll_event.data an opaque uint64, so naming half of it "fd" just introduces confusion. Remove the Fd field, and make Data a [2]int32 to compensate. Also add required padding to linux.EpollEvent on ARM64. PiperOrigin-RevId: 295250424
2020-02-14Synchronize signalling with S/RgVisor bot
This is to fix a data race between sending an external signal to a ThreadGroup and kernel saving state for S/R. PiperOrigin-RevId: 295244281
2020-02-14Allow vfs.IterDirentsCallback.Handle() to return an error.gVisor bot
This is easier than storing errors from e.g. CopyOut in the callback. PiperOrigin-RevId: 295230021
2020-02-14Enable automated marshalling for RSeqCriticalSection.gVisor bot
PiperOrigin-RevId: 295226468
2020-02-14Inline vfs.VirtualFilesystem in Kernel structgVisor bot
This saves one pointer dereference per VFS access. Updates #1623 PiperOrigin-RevId: 295216176
2020-02-14Un-export p9 message encode/decode functions.gVisor bot
These are not used outside of the p9 package. PiperOrigin-RevId: 295200052
2020-02-14Enable automated marshalling for struct stat.gVisor bot
This requires fixing a few build issues for non-am64 platforms. PiperOrigin-RevId: 295196922
2020-02-14Plumb VFS2 inside the SentrygVisor bot
- Added fsbridge package with interface that can be used to open and read from VFS1 and VFS2 files. - Converted ELF loader to use fsbridge - Added VFS2 types to FSContext - Added vfs.MountNamespace to ThreadGroup Updates #1623 PiperOrigin-RevId: 295183950
2020-02-14Fix various issues related to enabling go-marshal.gVisor bot
- Add missing build tags to files in the abi package. - Add the marshal package as a sentry dependency, allowed by deps_test. - Fix an issue with our top-level go_library BUILD rule, which incorrectly shadows the variable containing the input set of source files. This caused the expansion for the go_marshal clause to silently omit input files. - Fix formatting when copying build tags to gomarshal-generated files. - Fix a bug with import statement collision detection in go-marshal. PiperOrigin-RevId: 295112284
2020-02-13Add FileExec flag to OpenOptionsgVisor bot
This allow callers to say whether the file is being opened to be executed, so that the proper checks can be done from FilesystemImpl.OpenAt() Updates #1623 PiperOrigin-RevId: 295042595
2020-02-13We can now create and jump in iptables. For example:Kevin Krakauer
$ iptables -N foochain $ iptables -A INPUT -j foochain
2020-02-13Merge pull request #1791 from kevinGC:uchainsgVisor bot
PiperOrigin-RevId: 294957297
2020-02-13Internal change.gVisor bot
PiperOrigin-RevId: 294952610
2020-02-12iptables: User chainsKevin Krakauer
- Adds creation of user chains via `-N <chainname>` - Adds `-j RETURN` support for built-in chains, which triggers the chain's underflow rule (usually the default policy). - Adds tests for chain creation, default policies, and `-j RETURN' from built-in chains.
2020-02-11Simplify atomic operationsgVisor bot
PiperOrigin-RevId: 294582802
2020-02-11Ensure fsimpl/gofer.dentryPlatformFile.hostFileMapper is initialized.gVisor bot
Fixes #1812. (The more direct cause of the deadlock is panic unsafety because the historically high cost of defer means that we avoid it in hot paths, including much of MM; defer is much cheaper as of Go 1.14, but still a measurable overhead.) PiperOrigin-RevId: 294560316
2020-02-11Disallow duplicate NIC names.gVisor bot
PiperOrigin-RevId: 294500858
2020-02-11Prevent DATA RACE in UnstableAttr.Adin Scannell
The slaveInodeOperations is currently copying the object when truncate is called (which is a no-op). This may result in a (unconsequential) data race when being modified concurrently. PiperOrigin-RevId: 294484276
2020-02-11Move Align{Up,Down} into binary package.gVisor bot
PiperOrigin-RevId: 294477647
2020-02-10Merge pull request #1775 from kevinGC:tcp-matchers-submitgVisor bot
PiperOrigin-RevId: 294340468
2020-02-10Cleanup internal package group.Adin Scannell
PiperOrigin-RevId: 294339229
2020-02-10Refactor getxattr.Dean Deng
Put most of the logic for getxattr in one place for clarity. This simplifies FGetXattr and getXattrFromPath, which are just wrappers for getXattr. PiperOrigin-RevId: 294308332
2020-02-10Add context to note.Adin Scannell
PiperOrigin-RevId: 294300040
2020-02-10Add context to comments.Adin Scannell
PiperOrigin-RevId: 294295852
2020-02-10Add contextual comment.Adin Scannell
PiperOrigin-RevId: 294289066
2020-02-10Add contextual note.Adin Scannell
PiperOrigin-RevId: 294285723
2020-02-10Document MinimumTotalMemoryBytes.Adin Scannell
PiperOrigin-RevId: 294273559
2020-02-10Redirect FIXME to gvisor.devFabricio Voznika
PiperOrigin-RevId: 294272755
2020-02-10Move x86 state definition to its own file.Brad Burlage
PiperOrigin-RevId: 294271541
2020-02-10Update visibility.Adin Scannell
PiperOrigin-RevId: 294265019
2020-02-10Merge pull request #1453 from xiaobo55x:cpuidgVisor bot
PiperOrigin-RevId: 294257911
2020-02-10Enable pkg/cpuid support on arm64.Haibo Xu
Fixes #1255 Signed-off-by: Haibo Xu <haibo.xu@arm.com> Change-Id: I8614e6f3ee321c2989567e4e712aa8f28cc9db14
2020-02-07Support listxattr and removexattr syscalls.Dean Deng
Note that these are only implemented for tmpfs, and other impls will still return EOPNOTSUPP. PiperOrigin-RevId: 293899385
2020-02-07Log level, optname, optval and optlen in getsockopt/setsockopt in strace.Ian Gudger
Log 8, 16, and 32 int optvals and dump the memory of other sizes. Updates #1782 PiperOrigin-RevId: 293889388
2020-02-07Address GH comments.Kevin Krakauer
2020-02-06Send DAD event when DAD resolves immediatelyGhanan Gowripalan
Previously, a DAD event would not be sent if DAD was disabled. This allows integrators to do some work when an IPv6 address is bound to a NIC without special logic that checks if DAD is enabled. Without this change, integrators would need to check if a NIC has DAD enabled when an address is auto-generated. If DAD is enabled, it would need to delay the work until the DAD completion event; otherwise, it would need to do the work in the address auto-generated event handler. Test: stack_test.TestDADDisabled PiperOrigin-RevId: 293732914