tinyproxy - Light-weight HTTP/HTTPS proxy daemon for POSIX operating systems

Age	Commit message	Author
2020-09-30	print linenumber from all conf-emitted warnings	rofl0r

2020-09-30	log: print timestamps with millisecond precision	rofl0r
	this allows easier time measurements for benchmarks.
2020-09-30	change loglevel of "Not running as root" message to INFO	rofl0r
	there's no reason to display this as warning.
2020-09-30	conf: remove bogus support for hex literals	rofl0r
	the INT regex macro supported a 0x prefix (used e.g. for port numbers), but only decimal digits were accepted after it, not the full range of hex digits. it's unlikely this was used, so remove it. note that the () expression is kept, so we don't have to adjust match indices all over the place.
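The index argument can be illustrated with POSIX regcomp()/regexec(). Removing a parenthesized group renumbers every group after it, while an empty "()" placeholder keeps the numbering stable. The three patterns below are hypothetical stand-ins for a directive built with the INT macro, not tinyproxy's actual regexes, and the sketch assumes a regex engine that accepts an empty group (glibc's does).

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical directive patterns, not tinyproxy's real ones.
   PORT_OLD:     number with the 0x prefix; groups: 1=number, 2=prefix, 3=flag
   PORT_NEW:     prefix removed, empty "()" kept as a placeholder; same numbering
   PORT_DROPPED: group removed outright; the flag silently becomes group 2 */
#define PORT_OLD     "^Port ((0x)?[[:digit:]]+) (on|off)$"
#define PORT_NEW     "^Port (()[[:digit:]]+) (on|off)$"
#define PORT_DROPPED "^Port ([[:digit:]]+) (on|off)$"

/* Matches s against the ERE pattern and returns the offset at which capture
   group `group` starts, or -1 if there is no match or no such group. */
int group_start(const char *pattern, const char *s, size_t group)
{
    regex_t re;
    regmatch_t m[8];
    int ret = -1;

    if (group >= 8 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (group <= re.re_nsub && regexec(&re, s, 8, m, 0) == 0)
        ret = (int)m[group].rm_so;
    regfree(&re);
    return ret;
}
```

With "Port 8080 on", group 3 starts at offset 10 under both PORT_OLD and PORT_NEW, whereas PORT_DROPPED has no group 3 at all. PORT_OLD also rejects "Port 0x1f on", which is exactly the missing hex-digit support described above.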
2020-09-30	speed up build by only including regex.h where needed	rofl0r

2020-09-27	add conf-tokens.gperf to EXTRA_DIST	rofl0r
	otherwise it will be missing in `make dist`-generated tarballs.
2020-09-18	transparent: workaround old glibc bug on RHEL7	rofl0r
	it's been reported[0] that on RHEL7 getsockname() fails to set the length parameter to the size of the struct sockaddr type actually returned, and instead always hands back the length passed in if it is big enough. the SOCKADDR_UNION_* macros originate from my microsocks[1] project, and facilitate handling of the sockaddr mess without nasty casts.
	[0]: https://github.com/tinyproxy/tinyproxy/issues/45#issuecomment-694594990
	[1]: https://github.com/rofl0r/microsocks
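A minimal sketch of the workaround idea: derive the address length from the address family instead of trusting the length getsockname() reports. The union and function names below are illustrative and are not the actual SOCKADDR_UNION_* macros from microsocks.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical union covering the address types we care about. */
union sockaddr_union {
    struct sockaddr sa;
    struct sockaddr_in v4;
    struct sockaddr_in6 v6;
};

/* Length derived from the family, independent of what the kernel reported. */
socklen_t sockaddr_union_len(const union sockaddr_union *u)
{
    switch (u->sa.sa_family) {
    case AF_INET:  return sizeof u->v4;
    case AF_INET6: return sizeof u->v6;
    default:       return 0;
    }
}

/* getsockname() wrapper that ignores the returned length on purpose,
   working around the RHEL7 behaviour described above. */
int local_address(int fd, union sockaddr_union *u, socklen_t *len)
{
    socklen_t l = sizeof *u;

    if (getsockname(fd, &u->sa, &l) == -1)
        return -1;
    *len = sockaddr_union_len(u);
    return *len ? 0 : -1;
}
```

Putting all variants in one union also gives a buffer large enough for any of them, so no casts between differently sized structs are needed at the call site.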
2020-09-17	child_kill_children(): use method that actually works	rofl0r
	it turned out that close()ing an fd behind the back of a thread doesn't actually cause blocking operations to get a read/write event, because the fd stays valid for in-progress operations.
2020-09-17	tune error messages to show select or poll depending on what is used	rofl0r

2020-09-16	add autoconf test and fallback code for systems without gperf	rofl0r

2020-09-16	main: print error when config_init() fails	rofl0r

2020-09-16	speed up big config parsing by 2x using gperf	rofl0r

2020-09-16	conf.c: simplify the huge IPV6 regex	rofl0r
	even though the existing IPV6 regex caught (almost?) all invalid ipv6 addresses, it did so with a huge performance penalty. parsing a file with 32K allow or deny statements took 30 seconds in a test setup; after this change it takes less than 3. the new regex is sufficient to recognize all valid ipv6 addresses, and delegates detection of corner cases to the system's inet_pton() function, which is called e.g. from insert_acl(), which now prints a warning to the log if a seemingly valid address is in fact invalid. the new regex has been tested with 486 test cases from http://download.dartware.com/thirdparty/test-ipv6-regex.pl and accepts all valid ones and rejects most of the invalid ones. note that the IPV4 regex already did a similar thing and only checked whether the ip looks like [0-9]+.[0-9]+.[0-9]+.[0-9]+ without pedantry.
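The split of responsibilities (a cheap, permissive regex first, then inet_pton() as the final arbiter) can be sketched as follows. LOOSE_IPV6 is a deliberately simplistic placeholder pattern, not the regex tinyproxy adopted.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <regex.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical loose pattern: hex digits, colons and dots only (dots allow
   an embedded IPv4 tail such as ::ffff:1.2.3.4). Not tinyproxy's regex. */
#define LOOSE_IPV6 "^[0-9a-fA-F:.]+$"

/* Stage 1: cheap shape check. Returns 1 on match, 0 on no match,
   -1 if the pattern fails to compile. */
int looks_like_ipv6(const char *s)
{
    regex_t re;
    int ok;

    if (regcomp(&re, LOOSE_IPV6, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

/* Stage 2: let the system parser reject the corner cases. */
int is_valid_ipv6(const char *s)
{
    struct in6_addr addr;

    return looks_like_ipv6(s) == 1 && inet_pton(AF_INET6, s, &addr) == 1;
}
```

For example, "1::2::3" passes the loose pattern but is rejected by inet_pton(), because "::" may appear only once in an address.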
2020-09-16	acl.c: detect invalid ipv6 string	rofl0r

2020-09-16	conf.c: warn when encountering invalid address	rofl0r

2020-09-16	conf: use cpp stringification for STDCONF macro	rofl0r

2020-09-16	conf: merge upstream/upstream_none into single regex/handler	rofl0r

2020-09-16	move config reload message to reload_config()	rofl0r
	move it before logging is disabled, so that a message with the correct timestamp is printed if logging was already enabled. also add a message when loading has finished, so the timestamps show how long it took. note that this only works on a real config reload triggered by SIGHUP/SIGUSR1, because on startup we don't yet know where to log to.
2020-09-16	remove vector remains	rofl0r

2020-09-16	log_message_storage: use sblist	rofl0r

2020-09-16	listen_addrs: use sblist	rofl0r

2020-09-16	basicauth: use sblist	rofl0r

2020-09-16	connect_ports: use sblist	rofl0r

2020-09-16	add_header: use sblist	rofl0r
	note that the old code inserted added headers at the beginning of the list, for unknown reasons. this seems counter-intuitive, as the headers would end up in the request in the reverse order they were added, but it made no difference: the headers were first put into the hashmap hashofheaders before being sent to the client, and since the hashmap didn't preserve ordering, they appeared in random order anyway.
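To illustrate why appending to an sblist-style list preserves header order, here is a minimal growable array in the same spirit: items are stored contiguously and added at the end, so iteration order equals insertion order. The gblist_* names and layout are hypothetical and do not reproduce sblist's actual API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical contiguous list of fixed-size items. */
struct gblist {
    size_t itemsize, count, capa;
    char *items;
};

void gblist_init(struct gblist *l, size_t itemsize)
{
    l->itemsize = itemsize;
    l->count = 0;
    l->capa = 0;
    l->items = NULL;
}

/* Copies *item to the end of the list, doubling capacity when full.
   Returns 0 on success, -1 if realloc() fails. */
int gblist_add(struct gblist *l, const void *item)
{
    if (l->count == l->capa) {
        size_t ncapa = l->capa ? l->capa * 2 : 8;
        char *p = realloc(l->items, ncapa * l->itemsize);
        if (!p)
            return -1;
        l->items = p;
        l->capa = ncapa;
    }
    memcpy(l->items + l->count * l->itemsize, item, l->itemsize);
    l->count++;
    return 0;
}

/* Returns a pointer to item idx, or NULL when out of range. */
void *gblist_get(const struct gblist *l, size_t idx)
{
    return idx < l->count ? l->items + idx * l->itemsize : NULL;
}

void gblist_free(struct gblist *l)
{
    free(l->items);
    gblist_init(l, l->itemsize);
}
```

Storing `const char *` header lines and reading them back with gblist_get(&l, 0), gblist_get(&l, 1), ... yields them in the order they were added.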
2020-09-16	listen_fds: use sblist	rofl0r

2020-09-15	free a mem leak by statically allocating global statsbuf	rofl0r

2020-09-15	main: include loop header	rofl0r

2020-09-15	free() loop records too	rofl0r

2020-09-15	use poll() where available	rofl0r

2020-09-15	prepare transition to poll()	rofl0r
	using select() is inefficient (because a huge fd_set array has to be initialized on each call) and insecure (because an fd >= FD_SETSIZE causes out-of-bounds accesses in the FD_*SET macros, and a system can be configured via ulimit to allow more fds than that). for the moment we added a poll()-like wrapper that still runs select() underneath, both to test for regressions and so we have fallback code for systems without poll().
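A poll()-shaped wrapper over select() might look roughly like this. The sel_* names and SEL_IN/SEL_OUT flags are hypothetical, not tinyproxy's wrapper. The sketch rebuilds both fd_sets on every call (the inefficiency mentioned above) and refuses any fd >= FD_SETSIZE instead of letting FD_SET write out of bounds.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical poll()-like interface backed by select(). */
#define SEL_IN  1
#define SEL_OUT 2

struct sel_pollfd {
    int fd;
    short events;   /* SEL_IN and/or SEL_OUT, requested by the caller */
    short revents;  /* filled in by sel_poll() */
};

/* Returns the number of entries with nonzero revents, 0 on timeout, or -1
   on error. timeout_ms < 0 waits indefinitely, as with poll(). */
int sel_poll(struct sel_pollfd *fds, size_t n, int timeout_ms)
{
    fd_set rfds, wfds;              /* rebuilt from scratch on every call */
    struct timeval tv, *tvp = NULL;
    int maxfd = -1, ret;
    size_t i;

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    for (i = 0; i < n; i++) {
        fds[i].revents = 0;
        /* FD_SET with fd >= FD_SETSIZE would write past the end of the set */
        if (fds[i].fd < 0 || fds[i].fd >= FD_SETSIZE) {
            errno = EINVAL;
            return -1;
        }
        if (fds[i].events & SEL_IN)
            FD_SET(fds[i].fd, &rfds);
        if (fds[i].events & SEL_OUT)
            FD_SET(fds[i].fd, &wfds);
        if (fds[i].fd > maxfd)
            maxfd = fds[i].fd;
    }
    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }
    ret = select(maxfd + 1, &rfds, &wfds, NULL, tvp);
    if (ret <= 0)
        return ret;
    ret = 0;
    for (i = 0; i < n; i++) {
        if (FD_ISSET(fds[i].fd, &rfds))
            fds[i].revents |= SEL_IN;
        if (FD_ISSET(fds[i].fd, &wfds))
            fds[i].revents |= SEL_OUT;
        if (fds[i].revents)
            ret++;
    }
    return ret;
}
```

Because callers only see the poll()-like structure, switching the body to a real poll() call later does not touch any call site.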
2020-09-15	refactor conns.[ch], put conn_s into child struct	rofl0r
	this allows the main thread, which handles the children, to access the conn member, and also simplifies the code.
2020-09-15	hsearch: add seed to prevent another CVE-2012-3505 instance	rofl0r
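The idea behind the seed: an attacker who knows the hash function can craft many keys that land in the same bucket (the hash-collision denial of service behind CVE-2012-3505). Mixing an unpredictable per-process value into the hash makes such collisions hard to precompute. The sketch uses seeded FNV-1a purely as an illustration; it is an assumption and not necessarily the function used in hsearch.c.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* FNV-1a over a NUL-terminated string, with the offset basis perturbed
   by a per-process seed. Illustrative only. */
uint32_t seeded_hash(const char *s, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;   /* FNV-1a offset basis, mixed with seed */

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                /* FNV-1a 32-bit prime */
    }
    return h;
}

/* Picks the seed once at startup from /dev/urandom; the time() fallback
   is predictable and only marginally better than a fixed constant. */
uint32_t make_seed(void)
{
    uint32_t seed = 0;
    FILE *f = fopen("/dev/urandom", "rb");

    if (f) {
        if (fread(&seed, sizeof seed, 1, f) != 1)
            seed = 0;
        fclose(f);
    }
    if (seed == 0)
        seed = (uint32_t)time(NULL);
    return seed;
}
```

Each FNV-1a step is a bijection on 32-bit values, so different seeds always yield different hashes for the same string. When resistance to deliberate collisions really matters, a keyed hash such as SipHash is the sturdier choice.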

2020-09-15	replace leftover users of hashmap with htab	rofl0r
	also fixes a bug where the ErrorFile directive would create a new hashmap on every added item, effectively allowing only the last specified error number to be used, and leaking memory on each config reload.
2020-09-15	save headers in an ordered dictionary	rofl0r
	because a hashmap was used to store the headers, their order was not preserved when relaying them to the other side. even though this is correct from a standards point of view, it caused issues with various programs, and it allows the use of tinyproxy to be fingerprinted. to implement this, i imported the MIT-licensed hsearch.[ch] from https://github.com/rofl0r/htab which was originally taken from musl libc. it's a simple and efficient hashtable implementation with far better performance characteristics than the one previously used by tinyproxy. additionally, its API is much better suited for this purpose. orderedmap.[ch] was implemented from scratch to address this issue. behind the scenes it uses an sblist to store string values, and a htab to store keys and the indices into the sblist. this allows us to iterate linearly over the sblist and then find the corresponding key in the hash table, so the headers can be reproduced in the order they were received. closes #73
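The same principle (entries kept in an array in insertion order, plus a hash table mapping each key to its array index) can be sketched in a self-contained way. This fixed-capacity omap is hypothetical and much simpler than orderedmap.[ch]; strings are borrowed rather than copied.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size ordered map, not tinyproxy's orderedmap. */
#define OM_CAP   64     /* max entries, keeps the sketch short */
#define OM_SLOTS 128    /* hash slots; power of two and > OM_CAP */

struct omap {
    const char *keys[OM_CAP];      /* insertion order */
    const char *values[OM_CAP];
    size_t count;
    int slot_to_index[OM_SLOTS];   /* -1 = empty slot */
};

static size_t om_hash(const char *s)
{
    size_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

void omap_init(struct omap *m)
{
    size_t i;
    m->count = 0;
    for (i = 0; i < OM_SLOTS; i++)
        m->slot_to_index[i] = -1;
}

/* Linear probing: returns the slot holding key, or the empty slot where it
   would be inserted. Terminates because OM_SLOTS > OM_CAP. */
static size_t om_slot(const struct omap *m, const char *key)
{
    size_t s = om_hash(key) & (OM_SLOTS - 1);
    while (m->slot_to_index[s] != -1 &&
           strcmp(m->keys[m->slot_to_index[s]], key) != 0)
        s = (s + 1) & (OM_SLOTS - 1);
    return s;
}

/* Appends a new key, or replaces the value in place (keeping its position)
   if the key exists. Returns 0 on success, -1 when full. */
int omap_set(struct omap *m, const char *key, const char *value)
{
    size_t s = om_slot(m, key);
    if (m->slot_to_index[s] != -1) {
        m->values[m->slot_to_index[s]] = value;
        return 0;
    }
    if (m->count == OM_CAP)
        return -1;
    m->keys[m->count] = key;
    m->values[m->count] = value;
    m->slot_to_index[s] = (int)m->count++;
    return 0;
}

const char *omap_get(const struct omap *m, const char *key)
{
    int idx = m->slot_to_index[om_slot(m, key)];
    return idx == -1 ? NULL : m->values[idx];
}
```

Iterating i from 0 to count - 1 over keys[i]/values[i] reproduces the order in which headers were received, while lookups go through the hash table. The sketch compares keys case-sensitively and omits deletion; HTTP field names are case-insensitive, so a real header map would need both.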
2020-09-15	fix free()ing of config items	rofl0r
	- we need to free the config after it has been successfully loaded, not unconditionally before reloading.
	- we also need to free the config items before exiting from the main program, to get clean valgrind output.
2020-09-15	shutdown: free children from right place	rofl0r

2020-09-15	Revert "childs.c: fix minor memory leak"	rofl0r
	This reverts commit 6dd3806f7d1a337fb89e335e986e1fa4eab8340c.
2020-09-15	childs.c: fix minor memory leak	rofl0r
	this would leak only once on program termination, so it's no big deal apart from having spurious reachable memory in valgrind logs.
2020-09-14	main: orderly shutdown on SIGINT too	rofl0r
	the appropriate code in the signal handler was already set up, but for some reason the signal itself was not being handled.
2020-09-14	conf.c: include common.h	rofl0r

2020-09-13	fix get_request_entity()	rofl0r
	get_request_entity()'s purpose is to drain remaining unread bytes in the request read pipe before handing out an error page and, somewhat surprisingly, also when a connection to the stathost is made. in the stathost case tinyproxy simply skipped proper processing and jumped to the error handler code, remembering in a variable whether a connection to the stathost was desired, and then doing things a bit differently depending on whether it was set. i tried to fix issues with get_request_entity in 88153e944f7d28f57cccc77f3228a3f54f78ce4e (which is basically the right fix for the issue it tried to solve, but incomplete), and, following from that, in 78cc5b72b18a3c0d196126bfbc5d3b6473386da9. the latter fix wasn't quite right, since we're not supposed to check whether the socket is ready for writing, and returning 2 instead of 1 caused some of the if statements not to kick in when they should have. this also broke the stathost page. after in-depth study of the issue i realized that we only need to call get_request_entity() when the headers haven't been completely read, in addition to setting the proper connection timeout as 88153e944f7d28f57cccc77f3228a3f54f78ce4e already implemented. the changes of 78cc5b72b18a3c0d196126bfbc5d3b6473386da9 have been reverted.
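Draining "remaining unread bytes" with a timeout can be sketched as: wait for readability only (not writability), read into a scratch buffer, and stop on EOF, error, or a quiet period. drain_fd() is a hypothetical helper, not tinyproxy's get_request_entity().

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: discard whatever the peer still has queued on fd,
   giving up once no data arrives for timeout_ms. Returns bytes discarded. */
long drain_fd(int fd, int timeout_ms)
{
    char scratch[4096];
    long total = 0;
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;             /* readability only */
    for (;;) {
        ssize_t n;

        pfd.revents = 0;
        if (poll(&pfd, 1, timeout_ms) <= 0)
            break;                   /* timeout or error: stop draining */
        n = read(fd, scratch, sizeof scratch);
        if (n <= 0)
            break;                   /* EOF or read error */
        total += n;
    }
    return total;
}
```

Each successful read restarts the wait, so the timeout only triggers after the peer has been silent for the whole interval.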
2020-09-12	add_new_errorpage(): fix segfault accessing global config	rofl0r
	another fallout of the config refactoring finished by 2e02dce0c3de4a231f74b44c34647406de507768. apparently no one using the ErrorFile directive has used git master in recent months, as there have been no reports about this issue.
2020-09-12	vector.h: missing include <unistd.h> for ssize_t	rofl0r

2020-09-10	handle_connection(): print process_*_headers errno information	rofl0r

2020-09-10	handle_connection: replace "goto fail" with func call	rofl0r
	this makes it possible to see in a backtrace where the error was triggered.
2020-09-10	handle_connection(): factor out failure code	rofl0r
	this allows us in a next step to replace goto fail with a call to that function, so we can see in a backtrace from where the failure was triggered.
2020-09-09	remove bogus custom timeout handling code	rofl0r
	in networking, a timeout is only hit if nothing happens during the interval; whenever anything happens, the timeout is reset. there's no need for custom time calculations, it's perfectly fine to let the kernel handle it via the select() syscall. additionally, the code added in 0b9a74c29036f9215b2b97a301b7b25933054302 ensures that read() and write() syscalls don't block indefinitely but return on the timeout too, so there's no need to switch sockets back and forth between blocking and nonblocking mode.
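One standard way to make blocking read() and write() calls return after a period of inactivity is the SO_RCVTIMEO/SO_SNDTIMEO socket options. Whether the referenced commit uses exactly this mechanism is an assumption, and both helper names are hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: blocking reads and writes on fd give up after
   timeout_ms milliseconds without progress. Returns 0 or -1. */
int set_socket_timeout_ms(int fd, int timeout_ms)
{
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == -1)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) == -1)
        return -1;
    return 0;
}

/* read() wrapper that reports whether a -1 return was the receive timeout
   expiring (EAGAIN/EWOULDBLOCK) rather than a real error. */
ssize_t timed_read(int fd, void *buf, size_t len, int *timed_out)
{
    ssize_t n = read(fd, buf, len);

    *timed_out = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    return n;
}
```

The socket stays in blocking mode throughout. Note that a zero timeval disables these timeouts rather than making calls return immediately.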
2020-09-09	fix negative timeout resulting in select() EINVAL	rofl0r

2020-09-08	get_request_entity: fix regression w/ CONNECT method	rofl0r
	introduced in 88153e944f7d28f57cccc77f3228a3f54f78ce4e. when the CONNECT method is used (HTTPS) and, e.g., a filtered domain is requested, there's no data on readfds, only on writefds. this caused the response to hang until the timeout was hit. previously, in such a scenario a "no entity" message was always produced in the tinyproxy logs.
2020-09-07	make acl lookup 450x faster by using sblist	rofl0r
	tested with 32K acl rules, generated by
	  for x in `seq 128` ; do for y in `seq 255` ; do \
	    echo "Deny 10.$x.$y.0/24" ; done ; done
	after loading the config (which is dog slow too), tinyproxy required 9.5 seconds for the acl check on every request. after switching the list implementation to sblist, a request with the full acl check now takes only 0.025 seconds. the time spent loading the config file is identical for both list implementations, roughly 30 seconds. (in a previous test, 65K acl rules were generated, but every connection required almost 2 minutes to crunch through the list...)
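As a rough illustration of the data layout that benefits here, the sketch below keeps IPv4 rules in one contiguous array and scans it linearly. The rule struct, the first-match semantics and the default-policy parameter are assumptions made for the example, not tinyproxy's acl.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule layout: IPv4 network and netmask in host byte order. */
struct acl_rule {
    uint32_t net;
    uint32_t mask;
    int allow;      /* 1 = Allow, 0 = Deny */
};

/* Prefix length (0..32) to netmask; shifting a 32-bit value by 32 is
   undefined behaviour, so /0 is special-cased. */
uint32_t prefix_to_mask(unsigned prefix)
{
    return prefix == 0 ? 0 : (uint32_t)0xffffffffu << (32 - prefix);
}

/* Builds a host-order address such as 10.5.7.0. */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Linear first-match scan over contiguous memory. Returns the matching
   rule's allow flag, or default_policy if no rule matches. */
int acl_check(const struct acl_rule *rules, size_t n, uint32_t ip,
              int default_policy)
{
    size_t i;

    for (i = 0; i < n; i++)
        if ((ip & rules[i].mask) == rules[i].net)
            return rules[i].allow;
    return default_policy;
}
```

Filling the array with the 128 * 255 = 32640 "Deny 10.x.y.0/24" rules produced by the shell loop above and checking 10.5.7.9 returns the deny result after one pass over adjacent array elements, with no pointer chasing between list nodes.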