Common IRC bot URL title vulnerabilities
Carriage return in title:
http://irc-bot-science.clsr.net/test
Correct responses:
- test
- testQUIT :Look at me, I'm an IRC bot with security holes! (ignore this if it's all in one line)
- test QUIT :Look at me, I'm an IRC bot with security holes! (ignore this if it's all in one line)
Incorrect responses:
- (quitting with the quit message "Look at me, I'm an IRC bot with security holes!")
Solution: strip carriage return and newline characters from the title before printing it
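A minimal Python sketch of that fix. The function name and the sample title are made up for illustration; the real page's title may differ. It replaces every run of whitespace, including CR and LF, with a single space, so the title can never end the current IRC line and start a new command.

```python
def strip_line_breaks(title: str) -> str:
    """Collapse all whitespace runs (including CR and LF) into single spaces."""
    # str.split() with no arguments splits on any whitespace, so "\r" and
    # "\n" disappear and the words are rejoined on one line.
    return " ".join(title.split())


if __name__ == "__main__":
    print(strip_line_breaks("test\r\nQUIT :Look at me, I'm an IRC bot with security holes!"))
```

Apply it to the title right before building the outgoing message, not only when parsing, so that no other code path can reintroduce a line break.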
Valid but uncommon tag formats:
http://irc-bot-science.clsr.net/tags
Correct responses:
- this is a site <title>
Incorrect responses:
- (same as in a page without a <title> tag)
Solution: use a proper HTML parser or a more robust regex (e.g. something like <title[^>]*>([^<]*)</title\s*> in case-insensitive mode), then decode HTML entities in the result; also see hard mode, which probably requires an HTML parser
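One way to follow the parser route, sketched with Python's standard html.parser module. The class and function names are illustrative, and the sample markup in the demo is an assumption about what "uncommon tag formats" might look like, not the actual test page. The parser lowercases tag names, tolerates attributes and whitespace inside tags, and decodes entities in text; the function returns None when no title is found, which also covers pages without a <title> tag.

```python
from html.parser import HTMLParser
from typing import Optional


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self) -> None:
        # convert_charrefs=True makes the parser decode entities such as &lt;
        super().__init__(convert_charrefs=True)
        self._in_title = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> and <title> look the same.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts)

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)


def extract_title(html_text: str) -> Optional[str]:
    """Return the decoded title text, or None if there is no complete title."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title


if __name__ == "__main__":
    print(extract_title('<TITLE lang="en">this is a site &lt;title&gt;</TITLE >'))
```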
No <title> tag:
http://irc-bot-science.clsr.net/notitle
Correct responses:
- (no response)
- [no title] (or equivalent)
Incorrect responses:
- (page content or crashing)
Solution: handle the case where a title tag cannot be found
Long title messages:
http://irc-bot-science.clsr.net/long
Correct responses:
- testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttest... (or other truncated length)
Incorrect responses:
Solution: always truncate the result before sending it in a message (the maximum IRC message size, including source, command, args and the trailing CRLF, is 512 bytes, so the maximum title length is somewhere around 450 bytes)
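Because the limit is in bytes, truncation should count encoded bytes rather than characters. A small sketch, assuming UTF-8 output; the function name and the default budget of 400 bytes are arbitrary choices for illustration, not values from the IRC specification.

```python
def truncate_utf8(text: str, max_bytes: int = 400, ellipsis: str = "...") -> str:
    """Shorten text so its UTF-8 encoding fits in max_bytes, marking the cut."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Reserve room for the ellipsis so the final result still fits.
    budget = max(max_bytes - len(ellipsis.encode("utf-8")), 0)
    # errors="ignore" drops a multi-byte character that was cut in half.
    return encoded[:budget].decode("utf-8", errors="ignore") + ellipsis


if __name__ == "__main__":
    print(truncate_utf8("test" * 200, 83))
```

The budget should leave space for the bot's own prefix (source, PRIVMSG, channel name), which is why it is lower than 512.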
Large file size:
http://irc-bot-science.clsr.net/internet.gz
Correct responses:
- (no response)
- (any variant of representing the number 1263157894736842240 bytes, filetype application/x-gzip or the filename the-internet.gz)
Incorrect responses:
- (stalling forever opening the page)
- (saying that the filename is internet.gz)
Solution: if handling non-HTML pages, use at least 64-bit integers to store filesize, get filenames from Content-Disposition and don't read the page content
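A sketch of describing a file from its response headers alone, without touching the body. It uses Python's standard email.message.Message to parse Content-Disposition; the function name, the output format and the header values in the demo are assumptions for illustration. Python integers are unbounded, so the 64-bit advice matters mainly in languages with fixed-width integers.

```python
from email.message import Message
from typing import Mapping


def describe_download(headers: Mapping[str, str]) -> str:
    """Summarise a non-HTML resource using only its response headers."""
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    # Prefer the server-supplied filename over the last URL path segment.
    filename = msg.get_filename() or "(no filename)"
    length = msg.get("Content-Length", "").strip()
    # int() has no overflow in Python; elsewhere, parse into a 64-bit type.
    if length.isascii() and length.isdigit():
        size = f"{int(length)} bytes"
    else:
        size = "unknown size"
    return f"{filename}: {msg.get_content_type()}, {size}"


if __name__ == "__main__":
    print(describe_download({
        "Content-Type": "application/x-gzip",
        "Content-Length": "1263157894736842240",
        "Content-Disposition": 'attachment; filename="the-internet.gz"',
    }))
```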
IP address:
http://irc-bot-science.clsr.net/ip
Correct responses:
- IP address: (the bot's IP (v4 or v6) address)
- IP address: (something that masks the address)
Incorrect responses:
- (none, this is just to show that the bot IP will be publicly accessible)
Solution: there is no viable way to detect every representation of the IP (e.g. if the string from the above link is hidden, use this link instead and base64-decode the result)
CTCP messages:
http://irc-bot-science.clsr.net/ctcp
Correct responses:
Incorrect responses:
- (performing the CTCP action, shown as "* BotNick is a shit bot")
Solution: strip ASCII SOH (byte 0x01) from the start and end of the message or prefix the title with some string
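A short sketch that applies both suggestions at once. The function name and the "Title: " prefix are arbitrary choices; removing every 0x01 byte rather than only the leading and trailing ones is a slightly stricter variant of the advice above.

```python
def neutralize_ctcp(title: str, prefix: str = "Title: ") -> str:
    """Make sure the outgoing message cannot be parsed as a CTCP request."""
    # Drop all SOH characters, then add a fixed prefix so the message
    # body can never begin with the CTCP delimiter.
    return prefix + title.replace("\x01", "")


if __name__ == "__main__":
    print(neutralize_ctcp("\x01ACTION is a bot\x01"))
```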
Infinite redirect:
http://irc-bot-science.clsr.net/redirect
Correct responses:
- (no response)
- [too many redirects] (or equivalent)
Incorrect responses:
- (following the redirects forever)
Solution: have a limit on the number of followed redirects
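The limit can be a plain counter around the request loop. In this sketch, fetch is a hypothetical callable that performs one request without following redirects and returns (status, Location header or None); the names, the limit of 5 and the example.test URLs are all assumptions. Many HTTP libraries already enforce a redirect cap, so check yours before writing this by hand.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


class TooManyRedirects(Exception):
    """Raised when the redirect chain exceeds the allowed length."""


def resolve_redirects(
    url: str,
    fetch: Callable[[str], Tuple[int, Optional[str]]],
    max_redirects: int = 5,
) -> str:
    """Follow at most max_redirects redirects and return the final URL."""
    for _ in range(max_redirects + 1):
        status, location = fetch(url)
        if status not in REDIRECT_STATUSES or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise TooManyRedirects(f"gave up after {max_redirects} redirects")


if __name__ == "__main__":
    try:
        resolve_redirects("http://example.test/redirect", lambda u: (302, "/redirect"))
    except TooManyRedirects as exc:
        print(f"[too many redirects] {exc}")
```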
1 GiB HTML page (but HEAD returns Content-Length: 42):
http://irc-bot-science.clsr.net/fakelength
Correct responses:
- (no response)
- [page too large] (or equivalent)
Incorrect responses:
- (getting OOM killed)
- congratulations, didn't OOM (should have stopped reading sooner)
Solution: only read some of the page (e.g. 16 KiB); in most sane pages, the title will be at the beginning; also, have a timeout in case the page loads too slowly
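A sketch of a bounded read over any binary file-like object (for example an HTTP response body). The function name and defaults are illustrative. It never trusts Content-Length; it stops at a byte cap or a wall-clock deadline, whichever comes first. The deadline is only checked between chunks, so a socket-level timeout is still needed to interrupt a read that blocks.

```python
import time


def read_limited(stream, max_bytes: int = 16 * 1024,
                 chunk_size: int = 4096, deadline_seconds: float = 10.0) -> bytes:
    """Read at most max_bytes from stream, giving up after deadline_seconds."""
    deadline = time.monotonic() + deadline_seconds
    chunks = []
    remaining = max_bytes
    while remaining > 0 and time.monotonic() < deadline:
        # Never ask for more than the remaining budget.
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # end of stream
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

With the defaults, a 1 GiB body costs the bot at most 16 KiB of memory, whatever the HEAD response claimed.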
Page with title at the beginning, followed by a gigabyte of data:
http://irc-bot-science.clsr.net/large
Correct responses:
- If this title is printed, it works correctly.
Incorrect responses:
- (same as in the 1 GiB page; there is a title within the first 81 bytes, no need to read the whole page)
Solution: only read the start of the page (e.g. 16 KiB) and try to find the <title> tag in that, even if it wasn't the whole page
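Searching a truncated prefix can be sketched with the regex quoted earlier in this list. The function name, the UTF-8 default and the sample page are assumptions. Decoding uses errors="replace" because the cut may land in the middle of a multi-byte character.

```python
import html
import re
from typing import Optional

# The regex suggested for uncommon tag formats, in case-insensitive mode.
TITLE_RE = re.compile(r"<title[^>]*>([^<]*)</title\s*>", re.IGNORECASE)


def title_from_prefix(prefix: bytes, encoding: str = "utf-8") -> Optional[str]:
    """Find a complete <title> element in the first bytes of a page."""
    text = prefix.decode(encoding, errors="replace")
    match = TITLE_RE.search(text)
    if match is None:
        return None  # no title, or the title was cut off by the size limit
    return html.unescape(match.group(1)).strip()


if __name__ == "__main__":
    page = (b"<html><head><title>If this title is printed, it works correctly."
            b"</title></head><body>" + b"A" * 100000)
    print(title_from_prefix(page[:16 * 1024]))
```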
1 GiB of small headers:
http://irc-bot-science.clsr.net/longheaders
Correct responses:
- (no response)
- [page too large] (or equivalent)
Incorrect responses:
- (getting OOM killed)
- Reading a gigabyte of headers surely seems like a waste... (should have stopped reading sooner)
Solution: include headers in your timeout and/or read size limit
Extremely long header:
http://irc-bot-science.clsr.net/bigheader
Correct responses:
- (no response)
- [page too large] (or equivalent)
Incorrect responses:
- (getting OOM killed)
- Congratulations, you just read a billion digits of pi in a header. (should have stopped reading sooner)
Solution: set your limits on actual data read, not just number of headers
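A sketch of a header reader with a single byte budget, which covers both many small headers and one enormous header. It assumes the status line has already been consumed and that stream is a binary file-like object; the names and the 32 KiB default are illustrative. Most HTTP libraries read headers for you, so in practice this means checking which limits your library applies.

```python
class HeadersTooLarge(Exception):
    """Raised when the header block exceeds the byte budget or is cut off."""


def read_header_block(stream, max_total_bytes: int = 32 * 1024) -> list:
    """Read raw header lines up to the blank line, within a total byte budget."""
    lines = []
    remaining = max_total_bytes
    while True:
        # readline(n) returns at most n bytes, so one huge header line
        # cannot consume more than what is left of the budget.
        line = stream.readline(remaining)
        remaining -= len(line)
        if line in (b"\r\n", b"\n", b""):
            return lines  # blank line (or end of stream) ends the headers
        if not line.endswith(b"\n") or remaining <= 0:
            raise HeadersTooLarge("header block too large or truncated")
        lines.append(line.rstrip(b"\r\n"))
```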
Compose a <title> message: