Age | Commit message (Collapse) | Author |
|
|
|
also fixes a bug where the ErrorFile directive would create a
new hashmap on every added item, effectively allowing only
the use of the last specified errornumber, and producing memory
leaks on each config reload.
|
|
due to the usage of a hashmap to store headers, when relaying them
to the other side the order was not prevented.
even though correct from a standards point-of-view, this caused
issues with various programs, and it allows to fingerprint the use
of tinyproxy.
to implement this, i imported the MIT-licensed hsearch.[ch] from
https://github.com/rofl0r/htab which was originally taken from
musl libc. it's a simple and efficient hashtable implementation
with far better performance characteristic than the one previously
used by tinyproxy. additionally it has an API much more well-suited
for this purpose.
orderedmap.[ch] was implemented from scratch to address this issue.
behind the scenes it uses an sblist to store string values, and a htab
to store keys and the indices into the sblist.
this allows us to iterate linearly over the sblist and then find the
corresponding key in the hash table, so the headers can be reproduced
in the order they were received.
closes #73
|
|
- we need to free the config after it has been succesfully loaded,
not unconditionally before reloading.
- we also need to free them before exiting from the main program
to have clean valgrind output.
|
|
|
|
This reverts commit 6dd3806f7d1a337fb89e335e986e1fa4eab8340c.
|
|
this would leak only once on program termination, so it's no big
deal apart from having spurious reachable memory in valgrind logs.
|
|
the appropriate code in the signal handler was already set up,
but for some reason the signal itself not being handled.
|
|
|
|
get_request_entity()'s purpose is to drain remaining unread bytes
in the request read pipe before handing out an error page,
and kinda surprisingly, also when connection to the stathost is
done.
in the stathost case tinyproxy just skipped proper processing and
jumped to the error handler code, and remembering whether a
connection to the stathost was desired in a variable, then doing
things a bit differently depending on whether it's set.
i tried to fix issues with get_request_entity in
88153e944f7d28f57cccc77f3228a3f54f78ce4e (which is basically the
right fix for the issue it tried to solve, but incomplete),
and resulting from there in 78cc5b72b18a3c0d196126bfbc5d3b6473386da9.
the latter fix wasn't quite right since we're not supposed to check
whether the socket is ready for writing, and having a return value
of 2 instead of 1 got resulted in some of the if statements not
kicking in when they should have.
this also resulted in the stathost page no longer working.
after in-depth study of the issue i realized that we only need to
call get_request_entity() when the headers aren't completely read,
additional to setting the proper connection timeout as
88153e944f7d28f57cccc77f3228a3f54f78ce4e already implemented.
the changes of 78cc5b72b18a3c0d196126bfbc5d3b6473386da9 have been
reverted.
|
|
another fallout of the config refactoring finished by
2e02dce0c3de4a231f74b44c34647406de507768.
apparently no one using the ErrorFile directive used git master
during the last months, as there have been no reports about this issue.
|
|
|
|
|
|
this allows to see in a backtrace from where the error was
triggered.
|
|
this allows us in a next step to replace goto fail with a call to that
function, so we can see in a backtrace from where the failure was
triggered.
|
|
in networking, hitting a timeout requires that *nothing* happens during the
interval. whenever anything happens, the timeout is reset.
there's no need to do custom time calculations, it's perfectly fine to let
the kernel handle it using the select() syscall.
additionally the code added in 0b9a74c29036f9215b2b97a301b7b25933054302
assures that read and write syscalls() don't block indefinitely and return
on the timeout too, so there's no need to switch sockets back and forth
between blocking/nonblocking.
|
|
|
|
|
|
since there are numerous changes in HTTP/1.1, the proxyserver will
stick to using HTTP/1.0 for internal usage, however when a connection
is requested with HTTP/1.x from now on we will duplicate the minor revision
the client requested, because apparently some servers refuse to accept
HTTP/1.0
addresses #152.
|
|
introduced in 88153e944f7d28f57cccc77f3228a3f54f78ce4e.
when connect method is used (HTTPS), and e.g. a filtered domain requested,
there's no data on readfds, only on writefds.
this caused the response from the connection to hang until the timeout was
hit. in the past in such scenario always a "no entity" response
was produced in tinyproxy logs.
|
|
tested with 32K acl rules, generated by
for x in `seq 128` ; do for y in `seq 255` ; do \
echo "Deny 10.$x.$y.0/24" ; done ; done
after loading the config (which is dogslow too), tinyproxy
required 9.5 seconds for the acl check on every request.
after switching the list implementation to sblist, a request
with the full acl check now takes only 0.025 seconds.
the time spent for loading the config file is identical for both
list implementations, roughly 30 seconds.
(in a previous test, 65K acl rules were generated, but every
connection required almost 2 minutes to crunch through the list...)
|
|
this allows to switch the underlying implementation easily.
|
|
if there's a long list of acl's, doing full_inet_pton() over
and over with the same IP isn't really efficient.
|
|
get_request_entity() is only called on error, for example if a client
doesn't pass a check_acl() check. in such a case it's possible that
the client fd isn't yet ready to read from.
using select() with a timeout timeval of {0,0} causes it to return
immediately and return 0 if there's no data ready to be read.
this resulted in immediate connection termination rather than returning
the 403 access denied error page to the client and a confusing
"no entity" message displayed in the proxy log.
|
|
this allows to see them when the verbose INFO loglevel is not desired.
closes #78
|
|
the code wrongly processed the site_spec (here: domain) parameter
only when PT_TYPE == PT_NONE.
re-arranged code to process it correctly whenever passed.
additionally the mask is now also applied to the passed subnet/ip,
so a site_spec like 127.0.0.1/8 is converted into 127.0.0.0/8.
also the case where inet_aton fails now produces a proper error
message.
note that the code still doesn't process ipv6 addresses and mask.
to support it, we should use the existing code in acl.c and refactor
it so it can be used from both call sites.
closes #83
closes #165
|
|
previously, in order to detect and insert {variables} into error/stats
templates, tinyproxy iterated char-by-char over the input file, and would
try to parse anything inside {} pairs and treat it like a variable name.
this breaks CSS, and additionally it's dog slow as tinyproxy wrote every
single character to the client via a write syscall.
now we process line-by-line, and inspect all matches of the regex
\{[a-z]{1,32}\}. if the contents of the regex are a known variable name,
substitution is taking place. if not, the contents are passed as-is to
the client. also the chunks before and after matches are written in
a single syscall.
closes #108
|
|
|
|
let's use POSIX LINE_MAX (usually 4KB) instead of 1KB.
closes #226
|
|
this allows a tinyproxy session in terminal foreground mode to reload
its configuration without dropping active connections.
|
|
|
|
it's quite unexpected for an application running foreground in a
terminal to keep running when the terminal is closed.
also in such a case (if file logging is disabled) there's no way to
see what's happening to the proxy.
|
|
|
|
inet_ntoa() uses a static buffer and is therefore not threadsafe.
additionally it has been deprecated by POSIX.
by using inet_ntop() instead the code has been made ipv6 aware.
note that this codepath was only entered in the unlikely event that
no hosts header was being passed to the proxy, i.e. pre-HTTP/1.1.
|
|
* check return values of memory allocation and abort gracefully
in out-of-memory situations
* use sblist (linear dynamic array) instead of linked list
- this removes one pointer per filter rule
- removes need to manually allocate/free every single list item
(instead block allocation is used)
- simplifies code
* remove storage of (unused) input rule
- removes one char* pointer per filter rule
- removes storage of the raw bytes of each filter rule
* add line number to display on out-of-memory/invalid regex situation
* replace duplicate filter_domain()/filter_host() code with a single
function filter_run()
- reduces code size and management effort
with these improvements, >1 million regex rules can be loaded with
4 GB of RAM, whereas previously it crashed with about 950K.
the list for testing was assembled from
http://www.shallalist.de/Downloads/shallalist.tar.gz
closes #20
|
|
I was hit by this, and did not see anything in the log, connections was just hanging.
Think warning is a better log level
|
|
this is useful to use upstream directive to null-route a specific target
domain.
e.g.
upstream http 0.0.0.0:0 ".adserver.com"
|
|
the timeout option set by the config file wasn't respected at all
so it could happen that connections became stale and were never released,
which eventually caused tinyproxy to hit the limit of open connections and
never accepting new ones.
addresses #274
|
|
regression introduced in f6d4da5d81694721bf50b2275621e7ce84e6da30.
this has been overlooked due to the assert macro being optimized out in
non-debug builds.
|
|
getsockname() requires addrlen to be set to the size of the sockaddr struct
passed as the addr, and a check whether the returned addrlen exceeds the
initially passed size (to determine whether the address returned is truncated).
with a request like "GET /\r\n\r\n" where length is 0 this caused the code
to assume success and use the values of the uninitialized sockaddr struct.
|
|
unlike other functions called from the config parser code,
anonymous_insert() accesses the global config variable rather than
passing it as an argument. however the global variable is only set
after successful loading of the entire config.
we fix this by adding a conf argument to each anonymous_* function,
passing the global pointer in calls done from outside the config
parser.
fixes #292
|
|
... in case reloading of it after SIGHUP fails, the old config can
continue working.
(apart from the logging-related issue mentioned in 27d96df99900c5a62ab0fdf2a37565e78f256d6a )
|
|
previously, default values were stored once into a static struct,
then on each reload item by item copied manually into a "new"
config struct.
this has proven to be errorprone, as additions in one of the 2
locations were not propagated to the second one, apart from
being simply a lot of gratuitous code.
we now simply load the default values directly into the config
struct to be used on each reload.
closes #283
|
|
as a side effect of not updating the config pointer when loading
the config file fails, the "FIXME" level comment to take appropriate
action in that case has been removed. the only issue remaining
when receiving a SIGHUP and encountering a malformed config file would
now be the case that output to syslog/logfile won't be resumed, if
initially so configured.
|
|
this is required so we can elegantly swap out an old config for a
new one in the future and remove lots of boilerplate from config
initialization code.
unfortunately this is a quite intrusive change as the config struct
was accessed in numerous places, but frankly it should have been
done via a pointer right from the start.
right now, we simply point to a static struct in main.c, so there
shouldn't be any noticeable changes in behaviour.
|
|
since this is set via command line, we can deal with it easily
from where it is actually needed.
|
|
since this option can't be set via config file, it makes sense
to factor it out and use it only where strictly needed, e.g. in
startup code.
|
|
if daemon mode is used and neither logfile nor syslog options specified,
this is clearly a misconfiguration issue. don't try to be smart and work
around that, so less global state information is required.
also, this case is already checked for in main.c:334.
|
|
LOG_DAEMON isn't specified in POSIX and the gratuitously different
treatment is in the way of a planned cleanup.
|
|
|