Age | Commit message | Author
2019-04-04 | gvisor: Add support for the MS_NOEXEC mount option | Andrei Vagin
https://github.com/google/gvisor/issues/145 PiperOrigin-RevId: 242044115 Change-Id: I8f140fe05e32ecd438b6be218e224e4b7fe05878
2019-04-04 | Remove defer from trivial ThreadID methods | Michael Pratt
In particular, ns.IDOfTask and tg.ID are used for gettid and getpid, respectively, where removing defer saves ~100ns. This may be a small improvement to application logging, which may call gettid/getpid frequently. PiperOrigin-RevId: 242039616 Change-Id: I860beb62db3fe077519835e6bafa7c74cba6ca80
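A minimal Go sketch of the pattern, assuming a hypothetical `threadGroup` type whose ID is read under a `sync.Mutex` (this is not gVisor's actual `ThreadGroup`). When the critical section is a single field read that cannot panic or return early, copying the value and unlocking explicitly gives the same result as `defer` without the deferred-call bookkeeping. The ~100ns saving is the commit's own measurement; Go 1.14 later introduced open-coded defers, so the gap is likely smaller on newer toolchains.

```go
package main

import (
	"fmt"
	"sync"
)

// threadGroup is a hypothetical stand-in for a type whose ID is read under a lock.
type threadGroup struct {
	mu sync.Mutex
	id int32
}

// IDDeferred releases the lock with defer.
func (tg *threadGroup) IDDeferred() int32 {
	tg.mu.Lock()
	defer tg.mu.Unlock()
	return tg.id
}

// ID copies the field and unlocks explicitly. This is safe only because
// nothing between Lock and Unlock can panic or return early.
func (tg *threadGroup) ID() int32 {
	tg.mu.Lock()
	id := tg.id
	tg.mu.Unlock()
	return id
}

func main() {
	tg := &threadGroup{id: 42}
	fmt.Println(tg.ID(), tg.IDDeferred())
}
```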
2019-04-04 | BUILD: Add useful go_path target | Adin Scannell
Change-Id: Ibd6d8a1a63826af6e62a0f0669f8f0866c8091b4 PiperOrigin-RevId: 242037969
2019-04-04 | Format workspace | Adin Scannell
Change-Id: Ibb77656c46942eb123cd6cff8b471a526468d2dd PiperOrigin-RevId: 242007583
2019-04-03 | Internal change. | Googler
PiperOrigin-RevId: 241867632 Change-Id: I29459f2758ac4835882b491ff25c6aca9a37d41d
2019-04-03 | Only CopyOut CPU when it changes | Michael Pratt
This will save copies when preemption is not caused by a CPU migration. PiperOrigin-RevId: 241844399 Change-Id: I2ba3b64aa377846ab763425bd59b61158f576851
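A hypothetical Go sketch of the idea, assuming a per-task record of the last CPU value copied to user memory. The names `cpuReporter`, `onPreempt`, and `copyOut` are illustrative, not gVisor's, and the user-memory write is modeled as a plain field assignment. Preemptions that do not migrate the task leave the cached value unchanged, so no copy is issued.

```go
package main

import "fmt"

// cpuReporter is hypothetical: it mirrors a task's current CPU into
// user-visible memory, modeled here as a plain field.
type cpuReporter struct {
	lastCPU  int32 // -1 until the first copy
	copies   int   // number of copy-outs performed
	userCell int32 // stand-in for the user-memory location
}

func newCPUReporter() *cpuReporter { return &cpuReporter{lastCPU: -1} }

// copyOut simulates writing the CPU number to user memory.
func (r *cpuReporter) copyOut(cpu int32) {
	r.userCell = cpu
	r.copies++
}

// onPreempt runs on every preemption but copies only after a CPU migration.
func (r *cpuReporter) onPreempt(cpu int32) {
	if cpu == r.lastCPU {
		return // no migration: user memory already holds the right value
	}
	r.copyOut(cpu)
	r.lastCPU = cpu
}

func main() {
	r := newCPUReporter()
	for _, cpu := range []int32{0, 0, 0, 1, 1, 0} {
		r.onPreempt(cpu)
	}
	fmt.Println(r.copies, r.userCell) // 3 0
}
```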
2019-04-03 | Don't release d.mu in checks for child-existence. | Nicolas Lacasse
Dirent.exists() is called in Create to check whether a child with the given name already exists. Dirent.exists() calls walk(), and before this CL allowed walk() to drop d.mu while calling d.Inode.Lookup. During this existence check, a racing Rename() can acquire d.mu and create a new child of the dirent with the same name. (Note that the source and destination of the rename must be in the same directory, otherwise renameMu will be taken, preventing the race.) In this case, d.exists() can return false, even though a child with the same name actually does exist.

This CL changes d.exists() so that it does not release d.mu while walking, thus preventing the race with Rename.

It also adds comments noting that lockForRename may not take renameMu if the source and destination are in the same directory, as this is a bit surprising (at least it was to me).

PiperOrigin-RevId: 241842579 Change-Id: I56524870e39dfcd18cab82054eb3088846c34813
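The locking discipline can be sketched in Go with a hypothetical, heavily simplified `dirent` (not gVisor's `fs.Dirent`; `lookup` stands in for `Inode.Lookup`). Because `exists` holds `d.mu` for the whole walk, including the potentially slow lookup, a same-directory `rename`, which also takes `d.mu`, cannot insert a child between the check and its result.

```go
package main

import (
	"fmt"
	"sync"
)

// dirent is a hypothetical directory entry used only to show the locking rule.
type dirent struct {
	mu       sync.Mutex
	children map[string]*dirent
	// lookup stands in for an Inode.Lookup call that may be slow.
	lookup func(name string) bool
}

// walkLocked resolves name without ever releasing d.mu. Callers must hold d.mu.
func (d *dirent) walkLocked(name string) bool {
	if _, ok := d.children[name]; ok {
		return true
	}
	// Dropping d.mu around this call is what allowed the race described above.
	return d.lookup(name)
}

// exists reports whether a child named name exists, holding d.mu throughout.
func (d *dirent) exists(name string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.walkLocked(name)
}

// rename moves a child within the same directory under d.mu, so it cannot
// interleave with exists.
func (d *dirent) rename(oldName, newName string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if c, ok := d.children[oldName]; ok {
		delete(d.children, oldName)
		d.children[newName] = c
	}
}

func main() {
	d := &dirent{
		children: map[string]*dirent{"a": {}},
		lookup:   func(string) bool { return false },
	}
	d.rename("a", "b")
	fmt.Println(d.exists("a"), d.exists("b")) // false true
}
```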
2019-04-03 | Cache ThreadGroups in PIDNamespace | Michael Pratt
If there are thousands of threads, ThreadGroupsAppend becomes very expensive as it must iterate over all Tasks to find the ThreadGroup leaders. Reduce the cost by maintaining a map of ThreadGroups which can be used to grab them all directly.

The one somewhat visible change is to convert PID namespace init children zapping to a group-directed SIGKILL, as Linux did in 82058d668465 "signal: Use group_send_sig_info to kill all processes in a pid namespace".

In a benchmark that creates N threads which sleep for two minutes, we see approximately this much CPU time in ThreadGroupsAppend:

Threads | Before | After
1 | 0ms | 0ms
1024 | 30ms - 9130ms | 0ms
4096 | 50ms - 2000ms | 0ms
8192 | 18160ms | 0ms
16384 | 17210ms | 0ms

The profiling is actually extremely noisy (likely due to cache effects), as some runs show almost no samples at 1024, 4096 threads, but obviously this does not scale to lots of threads.

PiperOrigin-RevId: 241828039 Change-Id: I17827c90045df4b3c49b3174f3a05bca3026a72c
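A hypothetical Go sketch of the caching idea, using simplified stand-ins for `Task`, `ThreadGroup`, and `PIDNamespace` with no locking. Scanning every task to find leaders costs O(tasks); keeping a set of thread groups updated as leaders are added costs O(groups) to enumerate. A real implementation must also remove entries when a group exits, which this sketch omits.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical, simplified types; not gVisor's kernel package.
type threadGroup struct{ id int }

type task struct {
	tid    int
	tg     *threadGroup
	leader bool
}

type pidNamespace struct {
	tasks        []*task
	threadGroups map[*threadGroup]struct{} // cache: one entry per thread group
}

func newPIDNamespace() *pidNamespace {
	return &pidNamespace{threadGroups: map[*threadGroup]struct{}{}}
}

// addTask registers t and caches its thread group when t is the leader.
func (ns *pidNamespace) addTask(t *task) {
	ns.tasks = append(ns.tasks, t)
	if t.leader {
		ns.threadGroups[t.tg] = struct{}{}
	}
}

// groupIDsByScan visits every task to find leaders: O(number of tasks).
func (ns *pidNamespace) groupIDsByScan() []int {
	var ids []int
	for _, t := range ns.tasks {
		if t.leader {
			ids = append(ids, t.tg.id)
		}
	}
	sort.Ints(ids)
	return ids
}

// groupIDsFromCache visits only cached groups: O(number of thread groups).
func (ns *pidNamespace) groupIDsFromCache() []int {
	ids := make([]int, 0, len(ns.threadGroups))
	for tg := range ns.threadGroups {
		ids = append(ids, tg.id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	ns := newPIDNamespace()
	tid := 1
	for g := 1; g <= 2; g++ {
		tg := &threadGroup{id: g}
		for i := 0; i < 1000; i++ { // one leader plus 999 other threads per group
			ns.addTask(&task{tid: tid, tg: tg, leader: i == 0})
			tid++
		}
	}
	fmt.Println(ns.groupIDsByScan(), ns.groupIDsFromCache()) // [1 2] [1 2]
}
```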
2019-04-03 | Fix index out of bounds in tty implementation. | Kevin Krakauer
The previous implementation revolved around runes instead of bytes, which caused weird behavior when converting between the two. For example, peekRune would read the byte 0xff from a buffer, convert it to a rune, then return it. As rune is an alias of int32, 0xff was 0-padded to int32(255), which is the code point U+00FF (ÿ). However, peekRune also returned the length of the byte (1). When calling utf8.EncodeRune, we only allocated 1 byte, but tried to write the 2-byte character ÿ.

tl;dr: I apparently didn't understand runes when I wrote this.

PiperOrigin-RevId: 241789081 Change-Id: I14c788af4d9754973137801500ef6af7ab8a8727
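A small, self-contained illustration of the pitfall using only the standard `unicode/utf8` package (this is not the tty code itself; the helper names are hypothetical). Any byte at or above 0x80, converted with `rune(b)`, becomes a code point that needs two bytes in UTF-8, so a destination sized from the source byte's length is too small.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeLenOfByte reports how many bytes the UTF-8 encoding of rune(b) needs.
// Bytes >= 0x80 map to U+0080..U+00FF, which take 2 bytes each.
func runeLenOfByte(b byte) int {
	return utf8.RuneLen(rune(b))
}

// encodeByteAsRune sizes the destination from the rune itself. Allocating a
// single byte instead (the length of the source byte) would make
// utf8.EncodeRune panic with an index out of range for b >= 0x80.
func encodeByteAsRune(b byte) []byte {
	r := rune(b)
	out := make([]byte, utf8.RuneLen(r))
	n := utf8.EncodeRune(out, r)
	return out[:n]
}

func main() {
	fmt.Println(runeLenOfByte('a'), runeLenOfByte(0xff)) // 1 2
	fmt.Printf("% x\n", encodeByteAsRune(0xff))          // c3 bf
}
```

For a tty, which moves raw bytes rather than text, working on `[]byte` directly sidesteps the conversion altogether, which matches the direction the commit describes.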
2019-04-03 | Addresses data race in tty implementation. | Kevin Krakauer
Also makes the safemem reading and writing inline, as it makes it easier to see what locks are held. PiperOrigin-RevId: 241775201 Change-Id: Ib1072f246773ef2d08b5b9a042eb7e9e0284175c
2019-04-03 | Add syscall annotations for unimplemented syscalls | Ian Lewis
Added syscall annotations for unimplemented syscalls for later generation into reference docs. Annotations are of the form:

@Syscall(<name>, <key:value>, ...)

Supported args and values are:

- arg: A syscall option. This entry only applies to the syscall when given this option.
- support: Indicates support level
  - UNIMPLEMENTED: Unimplemented (implies returns:ENOSYS)
  - PARTIAL: Partial support. Details should be provided in note.
  - FULL: Full support
- returns: Indicates a known return value. Values are syscall errors. This is treated as a string so you can use something like "returns:EPERM or ENOSYS".
- issue: A GitHub issue number.
- note: A note

Example:

// @Syscall(mmap, arg:MAP_PRIVATE, support:FULL, note:Private memory fully supported)
// @Syscall(mmap, arg:MAP_SHARED, support:UNIMPLEMENTED, issue:123, note:Shared memory not supported)
// @Syscall(setxattr, returns:ENOTSUP, note:Requires file system support)

Annotations should be placed as close to their implementation as possible (preferably as part of a supporting function's Godoc) and should be updated as syscall support changes.

PiperOrigin-RevId: 241697482 Change-Id: I7a846135db124e1271dc5057d788cba82ca312d4
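A doc generator has to turn these comments into structured data. The following is a hypothetical Go parser for a single annotation line (`parseAnnotation` and `syscallAnnotation` are illustrative names, not gVisor's tooling). It assumes values never contain commas, since fields are split on ",", and treats the last ")" on the line as the closing parenthesis.

```go
package main

import (
	"fmt"
	"strings"
)

// syscallAnnotation is a hypothetical parsed form of one @Syscall comment.
type syscallAnnotation struct {
	Name string
	Args map[string]string // e.g. "support" -> "FULL", "issue" -> "123"
}

// parseAnnotation parses "@Syscall(<name>, <key:value>, ...)" from a comment line.
func parseAnnotation(line string) (syscallAnnotation, error) {
	_, rest, ok := strings.Cut(line, "@Syscall(")
	if !ok {
		return syscallAnnotation{}, fmt.Errorf("no @Syscall annotation in %q", line)
	}
	end := strings.LastIndex(rest, ")")
	if end < 0 {
		return syscallAnnotation{}, fmt.Errorf("unterminated annotation in %q", line)
	}
	fields := strings.Split(rest[:end], ",")
	ann := syscallAnnotation{Name: strings.TrimSpace(fields[0]), Args: map[string]string{}}
	if ann.Name == "" {
		return syscallAnnotation{}, fmt.Errorf("missing syscall name in %q", line)
	}
	for _, f := range fields[1:] {
		// Split on the first ':' only, so "returns:EPERM or ENOSYS" keeps its value.
		key, val, ok := strings.Cut(strings.TrimSpace(f), ":")
		if !ok {
			return syscallAnnotation{}, fmt.Errorf("field %q is not key:value", f)
		}
		ann.Args[key] = val
	}
	return ann, nil
}

func main() {
	a, err := parseAnnotation("// @Syscall(mmap, arg:MAP_SHARED, support:UNIMPLEMENTED, issue:123, note:Shared memory not supported)")
	if err != nil {
		panic(err)
	}
	fmt.Println(a.Name, a.Args["support"], a.Args["issue"])
}
```

Because one syscall can carry several annotations (mmap has two in the example), a generator built on this would collect results per name rather than assume one annotation per syscall.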
2019-04-02 | Set options on the correct Task in PTRACE_SEIZE. | Jamie Liu
$ docker run --rm --runtime=runsc -it --cap-add=SYS_PTRACE debian bash -c "apt-get update && apt-get install strace && strace ls"
...
Setting up strace (4.15-2) ...
execve("/bin/ls", ["ls"], [/* 6 vars */]) = 0
brk(NULL) = 0x5646d8c1e000
uname({sysname="Linux", nodename="114ef93d2db3", ...}) = 0
...

PiperOrigin-RevId: 241643321 Change-Id: Ie4bce27a7fb147eef07bbae5895c6ef3f529e177
2019-04-02 | Add build rule for raw socket tests so they are runnable via: | Kevin Krakauer
bazel test test/syscalls:raw_socket_ipv4_test_{native,runsc_ptrace,runsc_kvm} PiperOrigin-RevId: 241640049 Change-Id: Iac4dbdd7fd1827399a472059ac7d85fb6b506577
2019-04-02 | Add test that symlinking over a directory returns EEXIST. | Nicolas Lacasse
Also remove comments in InodeOperations that required implementations of some Create* operations to ensure that the name does not already exist, since these checks are all centralized in the Dirent. PiperOrigin-RevId: 241637335 Change-Id: Id098dc6063ff7c38347af29d1369075ad1e89a58
2019-04-02 | Remove obsolete TODO. | Kevin Krakauer
PiperOrigin-RevId: 241637164 Change-Id: I65476a739cf38f1818dc47f6ce60638dec8b77a8
2019-04-02 | Fix more data races in shm debug messages. | Rahat Mahmood
PiperOrigin-RevId: 241630409 Change-Id: Ie0df5f5a2f20c2d32e615f16e2ba43c88f963181
2019-04-02 | device: fix device major/minor | Wei Zhang
Current gVisor doesn't give devices correct major and minor numbers. When testing Go support in gVisor, I ran the test case below:

```
$ docker run -ti --runtime runsc golang:1.12.1 bash -c "cd /usr/local/go/src && ./run.bash "
```

It reports some errors; one of them is:

"
--- FAIL: TestDevices (0.00s)
--- FAIL: TestDevices//dev/null_1:3 (0.00s)
dev_linux_test.go:45: for /dev/null Major(0x0) == 0, want 1
dev_linux_test.go:48: for /dev/null Minor(0x0) == 0, want 3
dev_linux_test.go:51: for /dev/null Mkdev(1, 3) == 0x103, want 0x0
--- FAIL: TestDevices//dev/zero_1:5 (0.00s)
dev_linux_test.go:45: for /dev/zero Major(0x0) == 0, want 1
dev_linux_test.go:48: for /dev/zero Minor(0x0) == 0, want 5
dev_linux_test.go:51: for /dev/zero Mkdev(1, 5) == 0x105, want 0x0
--- FAIL: TestDevices//dev/random_1:8 (0.00s)
dev_linux_test.go:45: for /dev/random Major(0x0) == 0, want 1
dev_linux_test.go:48: for /dev/random Minor(0x0) == 0, want 8
dev_linux_test.go:51: for /dev/random Mkdev(1, 8) == 0x108, want 0x0
--- FAIL: TestDevices//dev/full_1:7 (0.00s)
dev_linux_test.go:45: for /dev/full Major(0x0) == 0, want 1
dev_linux_test.go:48: for /dev/full Minor(0x0) == 0, want 7
dev_linux_test.go:51: for /dev/full Mkdev(1, 7) == 0x107, want 0x0
--- FAIL: TestDevices//dev/urandom_1:9 (0.00s)
dev_linux_test.go:45: for /dev/urandom Major(0x0) == 0, want 1
dev_linux_test.go:48: for /dev/urandom Minor(0x0) == 0, want 9
dev_linux_test.go:51: for /dev/urandom Mkdev(1, 9) == 0x109, want 0x0
"

So I think we'd better assign them correct major/minor numbers following the Linux spec.

Signed-off-by: Wei Zhang <zhangwei198900@gmail.com> Change-Id: I4521ee7884b4e214fd3a261929e3b6dac537ada9 PiperOrigin-RevId: 241609021
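The failing checks compare against Linux's `dev_t` layout. This standard-library-only Go sketch reimplements the glibc-style encoding (the same bit layout `golang.org/x/sys/unix` uses on Linux) so the expected values can be reproduced; the device numbers come from the test output above. It is an illustration, not gVisor's device code.

```go
package main

import "fmt"

// Mkdev packs major/minor numbers using the Linux (glibc) dev_t layout:
// minor bits 0-7, major bits 8-19, minor bits 20-43, major bits 44-63.
func Mkdev(major, minor uint32) uint64 {
	dev := (uint64(major) & 0x00000fff) << 8
	dev |= (uint64(major) & 0xfffff000) << 32
	dev |= (uint64(minor) & 0x000000ff) << 0
	dev |= (uint64(minor) & 0xffffff00) << 12
	return dev
}

// Major extracts the major number from a dev_t.
func Major(dev uint64) uint32 {
	major := uint32((dev & 0x00000000000fff00) >> 8)
	major |= uint32((dev & 0xfffff00000000000) >> 32)
	return major
}

// Minor extracts the minor number from a dev_t.
func Minor(dev uint64) uint32 {
	minor := uint32((dev & 0x00000000000000ff) >> 0)
	minor |= uint32((dev & 0x00000ffffff00000) >> 12)
	return minor
}

func main() {
	// Device numbers taken from the failing test output above.
	devs := []struct {
		name         string
		major, minor uint32
	}{
		{"/dev/null", 1, 3}, {"/dev/zero", 1, 5}, {"/dev/full", 1, 7},
		{"/dev/random", 1, 8}, {"/dev/urandom", 1, 9},
	}
	for _, d := range devs {
		dev := Mkdev(d.major, d.minor)
		fmt.Printf("%s %d:%d %#x\n", d.name, Major(dev), Minor(dev), dev)
	}
}
```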
2019-04-02 | Change bug number for duplicate bug. | Kevin Krakauer
PiperOrigin-RevId: 241567897 Change-Id: I580eac04f52bb15f4aab7df9822c4aa92e743021
2019-04-02 | Add a raw socket transport endpoint and use it for raw ICMP sockets. | Kevin Krakauer
Having raw socket code together will make it easier to add support for other raw network protocols. Currently, only ICMP uses the raw endpoint. However, adding support for other protocols such as UDP shouldn't be much more difficult than adding a few switch cases. PiperOrigin-RevId: 241564875 Change-Id: I77e03adafe4ce0fd29ba2d5dfdc547d2ae8f25bf
2019-04-01 | Automated rollback of changelist 240657604 | Fabricio Voznika
PiperOrigin-RevId: 241434161 Change-Id: I9ec734e50cef5b39203e8bf37de2d91d24943f1e
2019-04-01 | Add release hook and version flag | Adin Scannell
PiperOrigin-RevId: 241421671 Change-Id: Ic0cebfe3efd458dc42c49f7f812c13318705199a
2019-04-01 | Save/restore simple devices. | Rahat Mahmood
We weren't saving simple devices' last allocated inode numbers, which caused inode number reuse across S/R. PiperOrigin-RevId: 241414245 Change-Id: I964289978841ef0a57d2fa48daf8eab7633c1284
2019-04-01 | Trim trailing newline when reading /proc/[pid]/{uid,gid}_map in test. | Jamie Liu
This reveals a bug in the tests that require CAP_SET{UID,GID}: After the child process enters the new user namespace, it ceases to have the relevant capability in the parent user namespace, so the privileged write must be done by the parent process. Change tests accordingly. PiperOrigin-RevId: 241412765 Change-Id: I587c1f24aa6f2180fb2e5e5c0162691ba5bac1bc
2019-04-01 | gofer: ignore unsupported files | Liu Hua
'ls' hangs if there is a FIFO in the path being listed. So return EPERM when an unsupported file is encountered, and add the NONBLOCK flag when opening files to avoid blocking on a FIFO. Signed-off-by: Liu Hua <sdu.liu@huawei.com> Change-Id: I8b9a2a48322118d8ad531dd226395438123eb047 PiperOrigin-RevId: 241406726
2019-04-01 | Don't expand COW-break on executable VMAs. | Jamie Liu
PiperOrigin-RevId: 241403847 Change-Id: I4631ca05734142da6e80cdfa1a1d63ed68aa05cc
2019-04-01 | gvisor: convert ilist to ilist:generic_list | Andrei Vagin
ilist:generic_list works faster (cl/240185278) and the code looks cleaner without type casting. PiperOrigin-RevId: 241381175 Change-Id: I8487ab1d73637b3e9733c253c56dce9e79f0d35f
2019-04-01 | Internal change. | Googler
PiperOrigin-RevId: 241350917 Change-Id: Ieacaa9ce2e41e22f1bae8900170879f549606782
2019-04-01 | Fix MemfdTest_OtherProcessCanOpenFromProcfs. | Jamie Liu
- Make the body of InForkedProcess async-signal-safe.
- Pass the correct path to open().

PiperOrigin-RevId: 241348774 Change-Id: I753dfa36e4fb05521e659c173e3b7db0c7fc159b
2019-03-29 | gvisor/runsc: enable generic segmentation offload (GSO) | Andrei Vagin
The linux packet socket can handle GSO packets, so we can segment packets to 64K instead of the MTU which is usually 1500.

Here are numbers for the nginx-1m test:
runsc: 579330.01 [Kbytes/sec] received
runsc-gso: 1794121.66 [Kbytes/sec] received
runc: 2122139.06 [Kbytes/sec] received

and for tcp_benchmark:
$ tcp_benchmark --duration 15 --ideal
[ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal
[ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal --gso 65536
[ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec

PiperOrigin-RevId: 241072403 Change-Id: I20b03063a1a6649362b43609cbbc9b59be06e6d5
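The relative gains implied by these results can be checked with a few lines of Go; the only inputs are the figures quoted in this commit message, and the ratios are simple quotients, not new measurements.

```go
package main

import "fmt"

// ratio returns a/b.
func ratio(a, b float64) float64 { return a / b }

func main() {
	// nginx-1m throughput in Kbytes/sec, copied from the results above.
	runsc, runscGSO, runc := 579330.01, 1794121.66, 2122139.06
	fmt.Printf("nginx-1m: %.2fx over runsc, %.0f%% of runc\n",
		ratio(runscGSO, runsc), 100*ratio(runscGSO, runc))

	// tcp_benchmark --client throughput in Mbits/sec, without and with --gso 65536.
	fmt.Printf("tcp_benchmark: %.2fx\n", ratio(10825, 1214))
}
```

Run as-is, it reports about a 3.10x gain for nginx-1m (roughly 85% of runc's throughput) and about 8.92x for the tcp_benchmark client.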
2019-03-29 | Use kernel.Task.CopyScratchBuffer in syscalls/linux where possible. | Jamie Liu
PiperOrigin-RevId: 241072126 Change-Id: Ib4d9f58f550732ac4c5153d3cf159a5b1a9749da
2019-03-29 | Set container.CreatedAt in Create(). | Nicolas Lacasse
PiperOrigin-RevId: 241056805 Change-Id: I13ea8f5dbfb01ca02a3b0ab887b8c3bdf4d556a6
2019-03-29 | Treat fsync errors during save as SaveRejection errors. | Nicolas Lacasse
PiperOrigin-RevId: 241055485 Change-Id: I70259e9fef59bdf9733b35a2cd3319359449dd45
2019-03-29 | Drop reference on shared anon mappable | Michael Pratt
We call NewSharedAnonMappable simply to use it for Mappable/MappingIdentity for shared anon mmap. From MMapOpts.MappingIdentity: "If MMapOpts is used to successfully create a memory mapping, a reference is taken on MappingIdentity." mm.createVMALocked (below) takes this additional reference, so we don't need the reference returned by NewSharedAnonMappable. Holding it leaks the mappable. PiperOrigin-RevId: 241038108 Change-Id: I78ee3af78e0cc7aac4063b274b30d0e41eb5677d
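The reference accounting can be sketched with a hypothetical reference-counted object in Go (not gVisor's `SharedAnonMappable`; it omits atomics, locking, and freeing at zero for brevity). The constructor hands the caller one reference, creating the mapping takes a second, so the caller must drop its own reference or the count can never reach zero after unmap.

```go
package main

import "fmt"

// mappable is a hypothetical reference-counted object.
type mappable struct{ refs int }

// newMappable returns an object holding one reference, owned by the caller.
func newMappable() *mappable { return &mappable{refs: 1} }

func (m *mappable) IncRef() { m.refs++ }
func (m *mappable) DecRef() { m.refs-- } // a real type would release resources at zero

// createMapping models a VMA that takes its own reference on success.
func createMapping(m *mappable) { m.IncRef() }

// mmapSharedAnon creates the mappable, lets the mapping take its reference,
// then drops the creation reference so the mapping is the sole owner.
func mmapSharedAnon() *mappable {
	m := newMappable()
	createMapping(m)
	m.DecRef() // without this, refs stays at 2 and never reaches zero after unmap
	return m
}

func main() {
	m := mmapSharedAnon()
	fmt.Println(m.refs) // 1: held only by the mapping
	m.DecRef()          // unmap
	fmt.Println(m.refs) // 0
}
```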
2019-03-29 | Return srclen in proc.idMapFileOperations.Write. | Jamie Liu
PiperOrigin-RevId: 241037926 Change-Id: I4b0381ac1c7575e8b861291b068d3da22bc03850
2019-03-29 | Treat ENOSPC as a state-file error during save. | Nicolas Lacasse
PiperOrigin-RevId: 241028806 Change-Id: I770bf751a2740869a93c3ab50370a727ae580470
2019-03-29 | Fix incorrect checksums in TCP and UDP tests. | Bhasker Hariharan
PiperOrigin-RevId: 241025361 Change-Id: I292e7aea9a4b294b11e4f736e107010d9524586b
2019-03-28 | Fix Panic in SACKScoreboard.Delete. | Bhasker Hariharan
The panic was caused by modifying the tree while iterating which invalidated the iterator. Also fixes another bug in SACKScoreboard.Insert() which was causing blocks to be merged incorrectly. PiperOrigin-RevId: 240895053 Change-Id: Ia72b8244297962df5c04283346da5226434740af
2019-03-28 | set task's name when fork | chris.zn
When forking a child process, the name field of TaskContext is not set. As a result, when we cat /proc/{pid}/status, the Name field is empty, like this:

Name:
State: S (sleeping)
Tgid: 28
Pid: 28
PPid: 26
TracerPid: 0
FDSize: 8
VmSize: 89712 kB
VmRSS: 6648 kB
Threads: 1
CapInh: 00000000a93d35fb
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a93d35fb
Seccomp: 0

Change-Id: I5d469098c37cedd19da16b7ffab2e546a28a321e PiperOrigin-RevId: 240893304
2019-03-28 | Setting timestamps should trigger an inotify event. | Nicolas Lacasse
PiperOrigin-RevId: 240850187 Change-Id: I1458581b771a1031e47bba439e480829794927b8
2019-03-28 | Add ICMP stats | Bert Muthalaly
PiperOrigin-RevId: 240848882 Change-Id: I23dd4599f073263437aeab357c3f767e1a432b82
2019-03-28 | Internal change. | Googler
PiperOrigin-RevId: 240842801 Change-Id: Ibbd6f849f9613edc1b1dd7a99a97d1ecdb6e9188
2019-03-28 | Clean up gofer handle caching. | Jamie Liu
- Document fsutil.CachedFileObject.FD() requirements on access permissions, and change gofer.inodeFileState.FD() to honor them. Fixes #147.
- Combine gofer.inodeFileState.readonly and gofer.inodeFileState.readthrough, and simplify handle caching logic.
- Inline gofer.cachePolicy.cacheHandles into gofer.inodeFileState.setSharedHandles, because users with access to gofer.inodeFileState don't necessarily have access to the fs.Inode (predictably, this is a save/restore problem).

Before this CL:

$ docker run --runtime=runsc-d -v $(pwd)/gvisor/repro:/root/repro -it ubuntu bash
root@34d51017ed67:/# /root/repro/runsc-b147
mmap: 0x7f3c01e45000
Segmentation fault

After this CL:

$ docker run --runtime=runsc-d -v $(pwd)/gvisor/repro:/root/repro -it ubuntu bash
root@d3c3cb56bbf9:/# /root/repro/runsc-b147
mmap: 0x7f78987ec000
o

PiperOrigin-RevId: 240818413 Change-Id: I49e1d4a81a0cb9177832b0a9f31a10da722a896b
2019-03-28 | gofer: some fixes in setupRootFS | Liu Hua
1. Use root instead of spec.Root.path as the mountpoint.
2. Move the read-only remount logic earlier to avoid device-busy errors.

Signed-off-by: Liu Hua <sdu.liu@huawei.com> Change-Id: I9222b4695f917136a97b0898ac6f75fcff296e5d PiperOrigin-RevId: 240818182
2019-03-28 | netstack/fdbased: add generic segmentation offload (GSO) support | Andrei Vagin
The linux packet socket can handle GSO packets, so we can segment packets to 64K instead of the MTU which is usually 1500.

Here are numbers for the nginx-1m test:
runsc: 579330.01 [Kbytes/sec] received
runsc-gso: 1794121.66 [Kbytes/sec] received
runc: 2122139.06 [Kbytes/sec] received

and for tcp_benchmark:
$ tcp_benchmark --duration 15 --ideal
[ 4] 0.0-15.0 sec 86647 MBytes 48456 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal
[ 4] 0.0-15.0 sec 2173 MBytes 1214 Mbits/sec
$ tcp_benchmark --client --duration 15 --ideal --gso 65536
[ 4] 0.0-15.0 sec 19357 MBytes 10825 Mbits/sec

PiperOrigin-RevId: 240809103 Change-Id: I2637f104db28b5d4c64e1e766c610162a195775a
2019-03-27 | Add rsslim field in /proc/pid/stat. | Nicolas Lacasse
PiperOrigin-RevId: 240681675 Change-Id: Ib214106e303669fca2d5c744ed5c18e835775161
2019-03-27 | Automated rollback of changelist 240502097 | Fabricio Voznika
PiperOrigin-RevId: 240657604 Change-Id: Ida15dee83337867c560427eae0b4b9ce1051dbb8
2019-03-27 | Avoid mutating memory passed to DeliverTransportPacket | Tamir Duberstein
PiperOrigin-RevId: 240642903 Change-Id: I16625015123a827d267d60b328a202057264bbd6
2019-03-27 | Add start time to /proc/<pid>/stat. | Nicolas Lacasse
The start time is the number of clock ticks between the boot time and application start time. PiperOrigin-RevId: 240619475 Change-Id: Ic8bd7a73e36627ed563988864b0c551c052492a5
2019-03-27 | gvisor/runsc: address typos from github | Andrei Vagin
Fixes: https://github.com/google/gvisor/issues/143 Fixes #143 PiperOrigin-RevId: 240600719 Change-Id: Id1731b9969f98e32e52e144a6643e12b0b70f168
2019-03-27 | Dev device methods should take pointer receiver. | Nicolas Lacasse
PiperOrigin-RevId: 240600504 Change-Id: I7dd5f27c8da31f24b68b48acdf8f1c19dbd0c32d