summaryrefslogtreecommitdiffhomepage
AgeCommit message (Collapse)Author
2020-08-06amd64: implement KPTI for gvisorLai Jiangshan
Actually, gvisor has KPTI (Kernel PageTable Isolation) between gr0 and gr3. But the upper half of the userCR3 contains the whole sentry kernel which makes the kernel vulnerable to gr3 APP through CPU bugs. This patch implement full KPTI functionality for gvisor. It doesn't map the whole kernel in the upper. It maps only the text section of the binary and the entry area required by the ISA. The entry area contains the global idt, the percpu gdt/tss etc. The entry area packs all these together which is less than 350k for 512 vCPUs. The text section is normally nonsensitive. It is possible to map only the entry functions (interrupt handler etc.) only. But it requires some hacks. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-05amd64: introduce kernelEntryLai Jiangshan
kernelEntry is split from CPU that contains minimal CPU-specific arch state that can be mapped at the upper of the address space. It is prepared for KPTI for gvisor. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-05amd64: don't check vcpu in bluepill()Lai Jiangshan
m.Get() has guaranteed that if any OS thread TID is in guest, m.vCPUs[TID] points to the vCPU in which the OS thread TID is running. So if m.Get() returns with the corrent context in guest, the vCPU of it must be the same as what Get() returns. So bluepill() doesn't need to check if the vCPU is matched or not. The check need to access to %gs register which will not points to vCPU later when KPTI for gvisor is enabled. We can still fetch the vCPU pointer from %gs later (when %gs points to kernelEntry), but it needs the ENTRY_CPU_SELF which is generated by ring0/offset_amd64.go. So we just simply remove the check. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-04amd64: less code and data in the upper halfLai Jiangshan
Call jumpToKernel() in sysret()/iret() so that there is less code and data in the upper half, and, especially, current goroutine's stack and user regs will not be accessed from the upper half (also with the help from previous patches which make less code in userCR3 context). jumpToUser() will not be needed, because current goroutine's stack and return value in the stack is lower half address. It is prepared for KPTI for gvisor. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-04amd64: switch to kernelCR3 when just return to gr0Lai Jiangshan
KernelCR3 takes effect as early as possible so that less code is in the userCR3 environment. It is prepared for the next patches that make less code and data in the upper half, which is prepared for KPTI for gvisor. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-08-04amd64: switch to userCR3 just before return to gr3Lai Jiangshan
UserCR3 takes effect as late as possible so that less code is in the userCR3 environment. It is prepared for the next patches that make less code and data in the upper half, which is prepared for KPTI for gvisor. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antfin.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
2020-07-31Merge pull request #3300 from lubinszARM:pr_fpsimd_usrgVisor bot
PiperOrigin-RevId: 324309862
2020-07-31Fix PHONY target typosKevin Krakauer
PiperOrigin-RevId: 324305107
2020-07-31test/socket_netlink_route: check that there is a route on local or main tablesAndrei Vagin
A new network namespace has only the local route table. PiperOrigin-RevId: 324303629
2020-07-31Use proper set-output syntax.Adin Scannell
PiperOrigin-RevId: 324302828
2020-07-31[runtime tests] Enhance java runtime test.Ayush Ranjan
- Added a bunch of helpful options which help in speeding up the test and providing useful output. - Unexcluded passing tests and updated bugs. Excluded tests which were failing. - Increased the batch size for java tests so that we can take advantage of the shared JVMs. The running time of the tests decreased from 3+ hours (I don't know the exact running time because this test has always timed out after 3 hours) to 1 hour 15 minutes. We can reliably run this a CI kokoro job. PiperOrigin-RevId: 324301503
2020-07-31Support fragments from different sourcesGhanan Gowripalan
Prevent fragments with different source-destination pairs from conflicting with each other. Test: - ipv6_test.TestReceiveIPv6Fragments - ipv4_test.TestReceiveIPv6Fragments PiperOrigin-RevId: 324283246
2020-07-31Merge pull request #3348 from kevinGC:so-orig-dstgVisor bot
PiperOrigin-RevId: 324279280
2020-07-31Internal change.gVisor bot
PiperOrigin-RevId: 324259991
2020-07-31s/github.dev/gvisor.devKevin Krakauer
PiperOrigin-RevId: 324249991
2020-07-31iptables: support SO_ORIGINAL_DSTKevin Krakauer
Envoy (#170) uses this to get the original destination of redirected packets.
2020-07-31Merge pull request #3420 from ↵gVisor bot
google:dependabot/bundler/benchmarks/workloads/ruby/activesupport-6.0.3.2 PiperOrigin-RevId: 324238154
2020-07-31Clean up vfs2 fallocate.Dean Deng
Move to setstat.go and add a FileDescription wrapper method. PiperOrigin-RevId: 324165277
2020-07-30Fix TCP CurrentConnected counter updates.Mithun Iyer
CurrentConnected counter is incorrectly decremented on close of an endpoint which is still not connected. Fixes #3443 PiperOrigin-RevId: 324155171
2020-07-30Port nginx and move parsers to own package.Zach Koopmans
This change: - Ports the nginx benchmark. - Switches the Httpd benchmark to use 'hey' as a client. - Moves all parsers to their own package 'tools'. Parsers are moved to their own package because 1) parsing output of a command is often dependent on the format of the command (e.g. 'fio --json'), 2) to enable easier reuse, and 3) clean up and simplify actual running benchmarks (no TestParser functions and ugly sample output in benchmark files). PiperOrigin-RevId: 324144165
2020-07-30Merge pull request #3448 from lubinszARM:pr_tls_testsgVisor bot
PiperOrigin-RevId: 324127810
2020-07-30Merge pull request #3028 from lubinszARM:pr_kvm_hello1gVisor bot
PiperOrigin-RevId: 324125938
2020-07-30Merge pull request #3179 from jinmouil:fuse_initgVisor bot
PiperOrigin-RevId: 324100220
2020-07-30Call lseek(0, SEEK_CUR) unconditionally in runsc fsgofer's Readdir(offset=0).Jamie Liu
9P2000.L is silent as to how readdir RPCs interact with directory mutation. The most performant option is for Treaddir with offset=0 to restart iteration, avoiding needing to walk+open+clunk a new directory fid between invocations of getdents64(2), and the VFS2 gofer client assumes this is the case. Make this actually true for the runsc fsgofer. Fixes #3344, #3345, #3355 PiperOrigin-RevId: 324090384
2020-07-30Revert change to default buffer size.Bhasker Hariharan
In https://github.com/google/gvisor/commit/ca6bded95dbce07f9683904b4b768dfc2d4a09b2 we reduced the default buffer size to 32KB. This mostly works fine except at high throughput where we hit zero window very quickly and the TCP receive buffer moderation is not able to grow the window. This can be seen in the benchmarks where with a 32KB buffer and 100 connections downloading a 10MB file we get about 30 requests/s vs the 1MB buffer gives us about 53 requests/s. A proper fix requires a few changes to when we send a zero window as well as when we decide to send a zero window update. Today we consider available space below 1MSS as zero and send an update when it crosses 1MSS of available space. This is way too low and results in the window staying very small once we hit a zero window condition as we keep sending updates with size barely over 1MSS. Linux and BSD are smarter about this and use different thresholds. We should separately update our logic to match linux or BSD so that we don't send window updates that are really tiny or wait until we drop below 1MSS to advertise a zero window. PiperOrigin-RevId: 324087019
2020-07-30Enforce fragment block size and validate argsGhanan Gowripalan
Allow configuring fragmentation.Fragmentation with a fragment block size which will be enforced when processing fragments. Also validate arguments when processing fragments. Test: - fragmentation.TestErrors - ipv6_test.TestReceiveIPv6Fragments - ipv4_test.TestReceiveIPv6Fragments PiperOrigin-RevId: 324081521
2020-07-30Implement overlayfs_stale_read for vfs2.Jamie Liu
PiperOrigin-RevId: 324080111
2020-07-30Allocate a pseudo-tty for exec.Adin Scannell
Otherwise Ctrl-C will kill the 'docker exec' as opposed to killing the bazel command being run inside the container. PiperOrigin-RevId: 324079339
2020-07-30Add runsc build benchmark.Zach Koopmans
PiperOrigin-RevId: 324071377
2020-07-30Implement neighbor unreachability detection for ARP and NDP.Sam Balana
This change implements the Neighbor Unreachability Detection (NUD) state machine, as per RFC 4861 [1]. The state machine operates on a single neighbor in the local network. This requires the state machine to be implemented on each entry of the neighbor table. This change also adds, but does not expose, several APIs. The first API is for performing basic operations on the neighbor table: - Create a static entry - List all entries - Delete all entries - Remove an entry by address The second API is used for changing the NUD protocol constants on a per-NIC basis to allow Neighbor Discovery to operate over links with widely varying performance characteristics. See [RFC 4861 Section 10][2] for the list of constants. Finally, the last API is for allowing users to subscribe to NUD state changes. See [RFC 4861 Appendix C][3] for the list of edges. [1]: https://tools.ietf.org/html/rfc4861 [2]: https://tools.ietf.org/html/rfc4861#section-10 [3]: https://tools.ietf.org/html/rfc4861#appendix-C Tests: pkg/tcpip/stack:stack_test - TestNeighborCacheAddStaticEntryThenOverflow - TestNeighborCacheClear - TestNeighborCacheClearThenOverflow - TestNeighborCacheConcurrent - TestNeighborCacheDuplicateStaticEntryWithDifferentLinkAddress - TestNeighborCacheDuplicateStaticEntryWithSameLinkAddress - TestNeighborCacheEntry - TestNeighborCacheEntryNoLinkAddress - TestNeighborCacheGetConfig - TestNeighborCacheKeepFrequentlyUsed - TestNeighborCacheNotifiesWaker - TestNeighborCacheOverflow - TestNeighborCacheOverwriteWithStaticEntryThenOverflow - TestNeighborCacheRemoveEntry - TestNeighborCacheRemoveEntryThenOverflow - TestNeighborCacheRemoveStaticEntry - TestNeighborCacheRemoveStaticEntryThenOverflow - TestNeighborCacheRemoveWaker - TestNeighborCacheReplace - TestNeighborCacheResolutionFailed - TestNeighborCacheResolutionTimeout - TestNeighborCacheSetConfig - TestNeighborCacheStaticResolution - TestEntryAddsAndClearsWakers - TestEntryDelayToProbe - TestEntryDelayToReachableWhenSolicitedOverrideConfirmation - TestEntryDelayToReachableWhenUpperLevelConfirmation - TestEntryDelayToStaleWhenConfirmationWithDifferentAddress - TestEntryDelayToStaleWhenProbeWithDifferentAddress - TestEntryFailedGetsDeleted - TestEntryIncompleteToFailed - TestEntryIncompleteToIncompleteDoesNotChangeUpdatedAt - TestEntryIncompleteToReachable - TestEntryIncompleteToReachableWithRouterFlag - TestEntryIncompleteToStale - TestEntryInitiallyUnknown - TestEntryProbeToFailed - TestEntryProbeToReachableWhenSolicitedConfirmationWithSameAddress - TestEntryProbeToReachableWhenSolicitedOverrideConfirmation - TestEntryProbeToStaleWhenConfirmationWithDifferentAddress - TestEntryProbeToStaleWhenProbeWithDifferentAddress - TestEntryReachableToStaleWhenConfirmationWithDifferentAddress - TestEntryReachableToStaleWhenConfirmationWithDifferentAddressAndOverride - TestEntryReachableToStaleWhenProbeWithDifferentAddress - TestEntryReachableToStaleWhenTimeout - TestEntryStaleToDelay - TestEntryStaleToReachableWhenSolicitedOverrideConfirmation - TestEntryStaleToStaleWhenOverrideConfirmation - TestEntryStaleToStaleWhenProbeUpdateAddress - TestEntryStaysDelayWhenOverrideConfirmationWithSameAddress - TestEntryStaysProbeWhenOverrideConfirmationWithSameAddress - TestEntryStaysReachableWhenConfirmationWithRouterFlag - TestEntryStaysReachableWhenProbeWithSameAddress - TestEntryStaysStaleWhenProbeWithSameAddress - TestEntryUnknownToIncomplete - TestEntryUnknownToStale - TestEntryUnknownToUnknownWhenConfirmationWithUnknownAddress pkg/tcpip/stack:stack_x_test - TestDefaultNUDConfigurations - TestNUDConfigurationFailsForNotSupported - TestNUDConfigurationsBaseReachableTime - TestNUDConfigurationsDelayFirstProbeTime - TestNUDConfigurationsMaxMulticastProbes - TestNUDConfigurationsMaxRandomFactor - TestNUDConfigurationsMaxUnicastProbes - TestNUDConfigurationsMinRandomFactor - TestNUDConfigurationsRetransmitTimer - TestNUDConfigurationsUnreachableTime - TestNUDStateReachableTime - TestNUDStateRecomputeReachableTime - TestSetNUDConfigurationFailsForBadNICID - TestSetNUDConfigurationFailsForNotSupported [1]: https://tools.ietf.org/html/rfc4861 [2]: https://tools.ietf.org/html/rfc4861#section-10 [3]: https://tools.ietf.org/html/rfc4861#appendix-C Updates #1889 Updates #1894 Updates #1895 Updates #1947 Updates #1948 Updates #1949 Updates #1950 PiperOrigin-RevId: 324070795
2020-07-30Use brodcast MAC for broadcast IPv4 packetsGhanan Gowripalan
When sending packets to a known network's broadcast address, use the broadcast MAC address. Test: - stack_test.TestOutgoingSubnetBroadcast - udp_test.TestOutgoingSubnetBroadcast PiperOrigin-RevId: 324062407
2020-07-30Have dockerutil.Wait* respect the context deadlineKevin Krakauer
PiperOrigin-RevId: 324044634
2020-07-30Fix SETOWN_EX return value.Dean Deng
Return on success should be 0, not size of the struct copied out. PiperOrigin-RevId: 324029193
2020-07-30[runtime tests] go language test enhancementAyush Ranjan
- Unexported some passing tests. This will increase the testing surface and will be especially helpful when this is enabled for vfs2. - Run tool tests with -v (verbose output). We only print the output when a test fails so this should not clutter the output. - Run tool tests with "-no-rebuild" flag. - Surround test name with appropriate regex, i.e. ^testname$. This will ensure that only that test is run. Earlier running go_test:os would also run go_test:os/exec, go_test:os/signal, go_test:os/user. This should help speed up the tests as we do not run the same test multiple times anymore. - Updated bugs. Updates #3191 PiperOrigin-RevId: 324028878
2020-07-30Update call in Node benchmark.Zach Koopmans
PiperOrigin-RevId: 324028183
2020-07-30Override the test timeout for runtimes.Adin Scannell
PiperOrigin-RevId: 324026021
2020-07-30Fix merge flow for Go branch.Adin Scannell
PiperOrigin-RevId: 324024075
2020-07-30Drop complex awk step.Adin Scannell
PiperOrigin-RevId: 324023425
2020-07-30Double the number of jobs used by RBE.Adin Scannell
PiperOrigin-RevId: 324022546
2020-07-30Disable consistently failing test.Adin Scannell
PiperOrigin-RevId: 324017310
2020-07-30add usr-tls test cases for Arm64Bin Lu
Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-29Add FUSE_INITJinmou Li
This change allows the sentry to send FUSE_INIT request and process the reply. It adds the corresponding structs, employs the fuse device to send and read the message, and stores the results of negotiation in corresponding places (inside connection struct). It adds a CallAsync() function to the FUSE connection interface: - like Call(), but it's for requests that do not expect immediate response (init, release, interrupt etc.) - will block if the connection hasn't initialized, which is the same for Call()
2020-07-29Force registration for EPOLLHUP, not EPOLLRDHUP, in vfs2's epoll.Jamie Liu
Compare Linux's fs/eventpoll.c:do_epoll_ctl(). I don't know where EPOLLRDHUP came from. PiperOrigin-RevId: 323874419
2020-07-29Port fio benchmarkZach Koopmans
PiperOrigin-RevId: 323810654
2020-07-29Bump activesupport from 5.2.3 to 6.0.3.2 in /benchmarks/workloads/rubydependabot[bot]
Bumps [activesupport](https://github.com/rails/rails) from 5.2.3 to 6.0.3.2. - [Release notes](https://github.com/rails/rails/releases) - [Changelog](https://github.com/rails/rails/blob/v6.0.3.2/activesupport/CHANGELOG.md) - [Commits](https://github.com/rails/rails/compare/v5.2.3...v6.0.3.2) Signed-off-by: dependabot[bot] <support@github.com>
2020-07-29Port node benchmark.Zach Koopmans
PiperOrigin-RevId: 323810235
2020-07-29load/store user fpsimd on Arm64Bin Lu
full context switch: add fpsimd load/store support to container application. Signed-off-by: Bin Lu <bin.lu@arm.com>
2020-07-29Test UDP socket bound to ANY can receive unicastJay Zhuang
PiperOrigin-RevId: 323773771
2020-07-28Redirect TODO to GitHub issuesFabricio Voznika
PiperOrigin-RevId: 323715260
2020-07-28Merge pull request #3025 from kevinGC:ipv6-iptables-testing2gVisor bot
PiperOrigin-RevId: 323692144