path: root/src
Age | Commit message | Author
2020-09-16 | move config reload message to reload_config() | rofl0r
move it to before disabling logging, so a message with the correct timestamp is printed if logging was already enabled. also add a message when loading finished, so one can see from the timestamp how long it took. note that this only works on a real config reload triggered by SIGHUP/SIGUSR1, because on startup we don't know yet where to log to.
2020-09-16 | remove vector remains | rofl0r
2020-09-16 | log_message_storage: use sblist | rofl0r
2020-09-16 | listen_addrs: use sblist | rofl0r
2020-09-16 | basicauth: use sblist | rofl0r
2020-09-16 | connect_ports: use sblist | rofl0r
2020-09-16 | add_header: use sblist | rofl0r
note that the old code inserted added headers at the beginning of the list, reasoning unknown. this seems counter-intuitive as the headers would end up in the request in the reverse order they were added, but this was irrelevant, as the headers were originally first put into the hashmap hashofheaders before sending it to the client. since the hashmap didn't preserve ordering, the headers would appear in random order anyway.
2020-09-16 | listen_fds: use sblist | rofl0r
2020-09-15 | free a mem leak by statically allocating global statsbuf | rofl0r
2020-09-15 | main: include loop header | rofl0r
2020-09-15 | free() loop records too | rofl0r
2020-09-15 | use poll() where available | rofl0r
2020-09-15 | prepare transition to poll() | rofl0r
usage of select() is inefficient (because a huge fd_set array has to be initialized on each call) and insecure (because an fd >= FD_SETSIZE will cause out-of-bounds accesses using the FD_*SET macros, and a system can be set up to allow more than that number of fds using ulimit). for the moment we prepared a poll-like wrapper that still runs select() to test for regressions, and so we have fallback code for systems without poll().
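A minimal sketch of the poll()-based wait this commit is heading towards. The helper name `wait_readable` is hypothetical and not taken from tinyproxy. Unlike select(), poll() takes the descriptor by value, so no fd_set has to be initialized and an fd >= FD_SETSIZE is not a problem.

```c
#include <poll.h>

/* Hypothetical helper (not tinyproxy code): wait until fd is readable
 * or timeout_ms milliseconds have passed.
 * Returns 1 if readable (or hung up/errored), 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int r;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    r = poll(&pfd, 1, timeout_ms);
    if (r <= 0)
        return r; /* 0: timeout, -1: error (see errno) */

    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : 0;
}
```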
2020-09-15 | refactor conns.[ch], put conn_s into child struct | rofl0r
this allows accessing the conn member from the main thread handling the children, and simplifies the code.
2020-09-15 | hsearch: add seed to prevent another CVE-2012-3505 instance | rofl0r
2020-09-15 | replace leftover users of hashmap with htab | rofl0r
also fixes a bug where the ErrorFile directive would create a new hashmap on every added item, effectively allowing only the use of the last specified errornumber, and producing memory leaks on each config reload.
2020-09-15 | save headers in an ordered dictionary | rofl0r
due to the usage of a hashmap to store headers, their order was not preserved when relaying them to the other side. even though correct from a standards point-of-view, this caused issues with various programs, and it allows to fingerprint the use of tinyproxy.

to implement this, i imported the MIT-licensed hsearch.[ch] from https://github.com/rofl0r/htab which was originally taken from musl libc. it's a simple and efficient hashtable implementation with far better performance characteristics than the one previously used by tinyproxy. additionally it has an API much better suited for this purpose.

orderedmap.[ch] was implemented from scratch to address this issue. behind the scenes it uses an sblist to store string values, and a htab to store keys and the indices into the sblist. this allows us to iterate linearly over the sblist and then find the corresponding key in the hash table, so the headers can be reproduced in the order they were received.

closes #73
2020-09-15 | fix free()ing of config items | rofl0r
- we need to free the config after it has been successfully loaded, not unconditionally before reloading.
- we also need to free them before exiting from the main program to have clean valgrind output.
2020-09-15 | shutdown: free children from right place | rofl0r
2020-09-15 | Revert "childs.c: fix minor memory leak" | rofl0r
This reverts commit 6dd3806f7d1a337fb89e335e986e1fa4eab8340c.
2020-09-15 | childs.c: fix minor memory leak | rofl0r
this would leak only once on program termination, so it's no big deal apart from having spurious reachable memory in valgrind logs.
2020-09-14 | main: orderly shutdown on SIGINT too | rofl0r
the appropriate code in the signal handler was already set up, but for some reason the signal itself was not being handled.
2020-09-14 | conf.c: include common.h | rofl0r
2020-09-13 | fix get_request_entity() | rofl0r
get_request_entity()'s purpose is to drain remaining unread bytes in the request read pipe before handing out an error page, and kinda surprisingly, also when connection to the stathost is done. in the stathost case tinyproxy just skipped proper processing and jumped to the error handler code, remembering in a variable whether a connection to the stathost was desired, then doing things a bit differently depending on whether it's set.

i tried to fix issues with get_request_entity in 88153e944f7d28f57cccc77f3228a3f54f78ce4e (which is basically the right fix for the issue it tried to solve, but incomplete), and, resulting from there, in 78cc5b72b18a3c0d196126bfbc5d3b6473386da9. the latter fix wasn't quite right since we're not supposed to check whether the socket is ready for writing, and having a return value of 2 instead of 1 resulted in some of the if statements not kicking in when they should have. this also resulted in the stathost page no longer working.

after in-depth study of the issue i realized that we only need to call get_request_entity() when the headers aren't completely read, in addition to setting the proper connection timeout as 88153e944f7d28f57cccc77f3228a3f54f78ce4e already implemented. the changes of 78cc5b72b18a3c0d196126bfbc5d3b6473386da9 have been reverted.
2020-09-12 | add_new_errorpage(): fix segfault accessing global config | rofl0r
another fallout of the config refactoring finished by 2e02dce0c3de4a231f74b44c34647406de507768. apparently no one using the ErrorFile directive used git master during the last months, as there have been no reports about this issue.
2020-09-12 | vector.h: missing include <unistd.h> for ssize_t | rofl0r
2020-09-10 | handle_connection(): print process_*_headers errno information | rofl0r
2020-09-10 | handle_connection: replace "goto fail" with func call | rofl0r
this makes it possible to see in a backtrace where the error was triggered from.
2020-09-10 | handle_connection(): factor out failure code | rofl0r
this allows us in a next step to replace goto fail with a call to that function, so we can see in a backtrace from where the failure was triggered.
2020-09-09 | remove bogus custom timeout handling code | rofl0r
in networking, hitting a timeout requires that *nothing* happens during the interval. whenever anything happens, the timeout is reset. there's no need to do custom time calculations, it's perfectly fine to let the kernel handle it using the select() syscall. additionally the code added in 0b9a74c29036f9215b2b97a301b7b25933054302 ensures that the read() and write() syscalls don't block indefinitely and return on the timeout too, so there's no need to switch sockets back and forth between blocking/nonblocking.
2020-09-09 | fix negative timeout resulting in select() EINVAL | rofl0r
2020-09-08 | get_request_entity: fix regression w/ CONNECT method | rofl0r
introduced in 88153e944f7d28f57cccc77f3228a3f54f78ce4e. when the CONNECT method is used (HTTPS) and e.g. a filtered domain is requested, there's no data on readfds, only on writefds. this caused the response from the connection to hang until the timeout was hit. in the past, such a scenario always produced a "no entity" response in the tinyproxy logs.
2020-09-07 | make acl lookup 450x faster by using sblist | rofl0r
tested with 32K acl rules, generated by

    for x in `seq 128` ; do for y in `seq 255` ; do \
    echo "Deny 10.$x.$y.0/24" ; done ; done

after loading the config (which is dogslow too), tinyproxy required 9.5 seconds for the acl check on every request. after switching the list implementation to sblist, a request with the full acl check now takes only 0.025 seconds. the time spent for loading the config file is identical for both list implementations, roughly 30 seconds. (in a previous test, 65K acl rules were generated, but every connection required almost 2 minutes to crunch through the list...)
2020-09-07 | acl: typedef access_list to acl_list_t | rofl0r
this makes it easy to switch the underlying implementation.
2020-09-07 | check_acl: do full_inet_pton() only once per ip | rofl0r
if there's a long list of acl's, doing full_inet_pton() over and over with the same IP isn't really efficient.
2020-09-07 | get_request_entity: respect user-set timeout | rofl0r
get_request_entity() is only called on error, for example if a client doesn't pass a check_acl() check. in such a case it's possible that the client fd isn't yet ready to read from. using select() with a timeout timeval of {0,0} causes it to return immediately and return 0 if there's no data ready to be read. this resulted in immediate connection termination rather than returning the 403 access denied error page to the client and a confusing "no entity" message displayed in the proxy log.
2020-09-07 | change loglevel of start/stop/reload messages to NOTICE | rofl0r
this makes them visible even when the verbose INFO loglevel is not desired. closes #78
2020-09-07 | upstream: fix ip/mask calculation for types other than none | rofl0r
the code wrongly processed the site_spec (here: domain) parameter only when PT_TYPE == PT_NONE. re-arranged code to process it correctly whenever passed. additionally the mask is now also applied to the passed subnet/ip, so a site_spec like 127.0.0.1/8 is converted into 127.0.0.0/8. also the case where inet_aton fails now produces a proper error message. note that the code still doesn't process ipv6 addresses and mask. to support it, we should use the existing code in acl.c and refactor it so it can be used from both call sites. closes #83 closes #165
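Applying the mask to the passed address, so that 127.0.0.1/8 becomes 127.0.0.0/8, can be sketched as below. This is an illustration only: `apply_mask` is a hypothetical name, it takes a prefix length rather than whatever mask form the upstream code parses, and it uses inet_pton() where the commit mentions inet_aton().

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical helper (not tinyproxy code): parse an IPv4 address and
 * clear its host bits according to prefix_len (0..32).
 * Returns 0 on success, -1 on a bad address or prefix length. */
int apply_mask(const char *ip, unsigned prefix_len, struct in_addr *net)
{
    struct in_addr addr;
    uint32_t mask;

    if (prefix_len > 32 || inet_pton(AF_INET, ip, &addr) != 1)
        return -1;

    /* a shift by 32 would be undefined, so handle /0 separately */
    mask = prefix_len ? htonl(0xFFFFFFFFu << (32 - prefix_len)) : 0;
    net->s_addr = addr.s_addr & mask;
    return 0;
}
```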
2020-09-07 | html-error: substitute template variables via a regex | rofl0r
previously, in order to detect and insert {variables} into error/stats templates, tinyproxy iterated char-by-char over the input file, and would try to parse anything inside {} pairs and treat it like a variable name. this breaks CSS, and additionally it's dog slow as tinyproxy wrote every single character to the client via a write syscall. now we process line-by-line, and inspect all matches of the regex \{[a-z]{1,32}\}. if the contents of the regex are a known variable name, substitution is taking place. if not, the contents are passed as-is to the client. also the chunks before and after matches are written in a single syscall. closes #108
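A rough sketch of the line-by-line substitution using POSIX regcomp()/regexec() and the regex quoted above. The function names and the one-entry variable table are assumptions made for the example, and the result goes into a buffer here rather than being written to the client.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical variable table: only "cause" is known in this sketch. */
static const char *lookup_var(const char *name, size_t len)
{
    if (len == 5 && strncmp(name, "cause", 5) == 0)
        return "Access denied";
    return NULL;
}

/* Append n bytes of s to out, keeping it NUL-terminated. */
static int append(char *out, size_t *used, size_t outsz,
                  const char *s, size_t n)
{
    if (*used + n + 1 > outsz)
        return -1;
    memcpy(out + *used, s, n);
    *used += n;
    out[*used] = '\0';
    return 0;
}

/* Expand known {variables} of one line into out; unknown ones and
 * anything else in braces (e.g. CSS) are copied unchanged.
 * Returns 0 on success, -1 on regex failure or truncation. */
int expand_line(const char *line, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    size_t used = 0;
    int rc = 0;

    if (outsz == 0 ||
        regcomp(&re, "\\{[a-z]{1,32}\\}", REG_EXTENDED) != 0)
        return -1;
    out[0] = '\0';

    while (rc == 0 && regexec(&re, line, 1, &m, 0) == 0) {
        size_t mlen = (size_t)(m.rm_eo - m.rm_so);
        const char *val = lookup_var(line + m.rm_so + 1, mlen - 2);

        /* chunk before the match, then the value or the raw match */
        rc = append(out, &used, outsz, line, (size_t)m.rm_so);
        if (rc == 0 && val)
            rc = append(out, &used, outsz, val, strlen(val));
        else if (rc == 0)
            rc = append(out, &used, outsz, line + m.rm_so, mlen);
        line += m.rm_eo;
    }
    if (rc == 0)
        rc = append(out, &used, outsz, line, strlen(line));
    regfree(&re);
    return rc;
}
```

A CSS rule such as `p { color: red }` does not match the pattern because of the spaces inside the braces, so it passes through untouched.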
2020-09-07 | Do not give error while storing invalid header | [anp/hsw]
2020-09-07 | config parser: increase possible line length limit | rofl0r
let's use POSIX LINE_MAX (usually 4KB) instead of 1KB. closes #226
2020-09-06 | allow SIGUSR1 to be used as an alternative to SIGHUP | rofl0r
this allows a tinyproxy session in terminal foreground mode to reload its configuration without dropping active connections.
2020-09-06 | main.c: remove set_signal_handler code duplication | rofl0r
2020-09-06 | do not catch SIGHUP in foreground-mode | rofl0r
it's quite unexpected for an application running foreground in a terminal to keep running when the terminal is closed. also in such a case (if file logging is disabled) there's no way to see what's happening to the proxy.
2020-09-06 | send_html_file(): also set empty variables to "(unknown)" | rofl0r
2020-09-06 | transparent: remove usage of inet_ntoa(), make IPv6 ready | rofl0r
inet_ntoa() uses a static buffer and is therefore not threadsafe. additionally it has been deprecated by POSIX. by using inet_ntop() instead the code has been made ipv6 aware. note that this codepath was only entered in the unlikely event that no Host header was passed to the proxy, i.e. pre-HTTP/1.1.
2020-09-05 | filter: reduce memory usage, fix OOM crashes | rofl0r
* check return values of memory allocation and abort gracefully in out-of-memory situations
* use sblist (linear dynamic array) instead of linked list
  - this removes one pointer per filter rule
  - removes need to manually allocate/free every single list item (instead block allocation is used)
  - simplifies code
* remove storage of (unused) input rule
  - removes one char* pointer per filter rule
  - removes storage of the raw bytes of each filter rule
* add line number to display on out-of-memory/invalid regex situation
* replace duplicate filter_domain()/filter_host() code with a single function filter_run()
  - reduces code size and management effort

with these improvements, >1 million regex rules can be loaded with 4 GB of RAM, whereas previously it crashed with about 950K. the list for testing was assembled from http://www.shallalist.de/Downloads/shallalist.tar.gz

closes #20
2020-09-01 | Change loglevel for "Maximum number of connections reached" | Nicolai Søborg
I was hit by this and did not see anything in the log; connections were just hanging. I think warning is a better log level.
2020-08-19 | upstream: allow port 0 to be specified | rofl0r
this is useful for null-routing a specific target domain with the upstream directive, e.g. upstream http 0.0.0.0:0 ".adserver.com"
2020-07-15 | enforce socket timeout on new sockets via setsockopt() | rofl0r
the timeout option set by the config file wasn't respected at all, so it could happen that connections became stale and were never released, which eventually caused tinyproxy to hit the limit of open connections and never accept new ones. addresses #274
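A minimal sketch of enforcing a timeout on a socket with setsockopt(). The commit message does not say which options are set; SO_RCVTIMEO and SO_SNDTIMEO are assumed here, and `set_socket_timeout` is a hypothetical name.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper (not tinyproxy code): make blocking reads and
 * writes on fd give up after the given number of seconds.
 * Returns 0 on success, -1 on error (see errno). */
int set_socket_timeout(int fd, long seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) < 0)
        return -1;
    return 0;
}
```

With these options set, a blocked read() or write() fails with EAGAIN/EWOULDBLOCK once the interval passes, so a stale connection can be detected and released.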