Skip to content

proposal: spec: disallow LTR/RTL characters in string literals? #20209

Open
@karalabe

Description

TL;DR https://play.golang.org/p/LPkPTRF7fC

The above code looks quite plain and obvious, except it does something completely different than you'd expect (feel free to run it). The obvious thing that should happen is that it counts the number of bits set in the given string. The non-obvious thing that happens is that the mask is actually 0, not 0x01.

So, what happened there? The abuse in the above code is around the invisible Unicode characters that mark following text to be right-to-left or left-to-right. Since Go permits arbitrary Unicode characters to be present in string literals, it also allows me to have a string of the form "bla bla blaabc". Since we're dealing with valid Unicode sequences here, any modern editor/website will actually interpret those special characters, causing the content in between the two special marks to be reversed to the end of the line (alas still part of the string literal).

In my playground code this is abused by having the following source code:

str, mask := "Hello, World!<rtl>10x<ltr>", 0

Which will be displayed by all modern editors/websites as:

str, mask := "Hello, World!", 0x01

The security aspect of this issue is social engineering attacks. The "display" line of my sample code is obvious beyond doubt, so noone will ever inspect such a thing; however it managed to flip one bit in a mask (imagine doing this for file permission umasks). The issue is that such code could easily get past reviews and into a final product.

The tricky part is how to avoid these issues. My only meaningful suggestion would be to disallow these two special characters in string literals. This does break Go 1.0 compatibility guarantees, however I think it's worth it (I can't really figure out a meaningful use case for it). Using it in a single line string will screw up the display of the source code, so it doesn't make sense to do it, and using it in a multi line string is questionable at best. I think requiring users to explicitly use \x notation for adding these characters it a good compromise to protect source code sanity.

Activity

changed the title Code obfuscation through invisible right-to-left/left-to-right characters spec: disallow LTR/RTL characters in string literals? on May 2, 2017
changed the title spec: disallow LTR/RTL characters in string literals? proposal: spec: disallow LTR/RTL characters in string literals? on May 2, 2017
added this to the Proposal milestone on May 2, 2017
cznic

cznic commented on May 3, 2017

@cznic
Contributor

String literals must be able to hold any byte sequence whatsoever. Using a string to hold binary data is not uncommon and this proposal will introduce a forbidden byte sequence.

I understand the problem (nice trick BTW), but I don't think this is a way to solve it.

bradfitz

bradfitz commented on May 3, 2017

@bradfitz
Contributor

@cznic,

String literals must be able to hold any byte sequence whatsoever.

Yes, but not in their raw form. Only if they're escaped.

Even today:

      s := "`\"`"

... you need to escape the double quote. We could require that RTL/LTR characters also need to be escaped.

cznic

cznic commented on May 3, 2017

@cznic
Contributor

We could require that RTL/LTR characters also need to be escaped.

Yes, but that's unfortunately only a partial solution: https://play.golang.org/p/ZL_QF7xc-e.

karalabe

karalabe commented on May 3, 2017

@karalabe
ContributorAuthor
bradfitz

bradfitz commented on May 3, 2017

@bradfitz
Contributor

@karalabe, yes.

cznic

cznic commented on May 3, 2017

@cznic
Contributor

It would need to be forbidden from `` as well.

I don't like to say that, but then we are back to square 1 (modulo the backtick). To make things even worse, there's also the Go 1 compatibility promise and I'm quite unsure if such breaking change qualifies under it.

karalabe

karalabe commented on May 3, 2017

@karalabe
ContributorAuthor

That's the catch 22 :)

karalabe

karalabe commented on May 3, 2017

@karalabe
ContributorAuthor

Perhaps a nice compromise (alas it makes for ugly parsing rules and introduces corner cases) is to forbid the two characters in single line strings (irrelevant of the type) and only allow it in multiline strings for lines that are fully inside (i.e. not the same line as the opening or closing tick is on). In reality I think it's enough to disallow for the closing tick's line.

These "forbidden" cases would already even now screw up the code so I cannot imagine anyone benevoletly using the in such a way. Yet by limiting to the closing mark only, we can retain the functionality of rtl characters that some may have in backtick code.

All in all personally I'd just forbid everywhere but I'm not accustomed to rtl use cases so I can't really comment on how people use them in completely valid and meaningful ways.

jimmyfrasche

jimmyfrasche commented on May 3, 2017

@jimmyfrasche
Member

This could be handled by a linter of some kind, but then people need to be aware of that tool and run it.

I know this doesn't match all the criteria for go vet, but adding it there may be best.

If go vet is added to go test per #18084 it would be run by default in a lot of situations making it harder to sneak this into a codebase.

Of course, that doesn't close the hole: it just makes it smaller.

Given how unlikely this is to be done on purpose without malicious intent, breaking Go 1 (which is allowed when it comes to security) and making it illegal across the board probably wouldn't cause anyone any pain.

randall77

randall77 commented on May 3, 2017

@randall77
Contributor

I think solving this problem in the language is the wrong place.
How about in your code review tool? It would be as simple as highlighting weird/unusual UTF8, homographs, ... Then a human can decide whether it is ok or not.

If you're not doing code reviews, you have bigger problems.

karalabe

karalabe commented on May 3, 2017

@karalabe
ContributorAuthor

GitHub flat out stated they don't care about these kinds of attack vectors. Given that most Go code is hosted on GitHub, most reviews will be done there too.

Also these two characters are not homoglyphs, they are invisible.

20 remaining items

rsc

rsc commented on Jun 13, 2017

@rsc
Contributor

Does someone want to come up with a list of what a vet "unicode" check would check?

rsc

rsc commented on Oct 10, 2017

@rsc
Contributor

I think we agree that the first step is to add a vet check and only after building experience with it think about actual language restrictions (or not). Marking this proposal-hold until there is a proposal (a short list would be fine) of what a vet "unicode" check would check.

/cc @nigeltao @mpvl

dolmen

dolmen commented on Aug 13, 2020

@dolmen
Contributor

In #40717 I propose 3 codepoints to add to that list for a vet check: U+115F, U+1160, U+3164

holiman

holiman commented on Nov 1, 2021

@holiman

"‘Trojan Source’ Bug Threatens the Security of All Code": https://krebsonsecurity.com/2021/11/trojan-source-bug-threatens-the-security-of-all-code/

Apparently some researchers with the University of Cambridge have given this (type of) bug a cool-sounding name now.

locked and limited conversation to collaborators on Nov 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

      Participants

      @bradfitz@rsc@dolmen@karalabe@holiman

      Issue actions

        proposal: spec: disallow LTR/RTL characters in string literals? · Issue #20209 · golang/go