Saturday, September 18, 2010

FINDSTR and Unicode strings

Recently I had another look at my old post on finding unused resources in a .NET project, and realized that it would not work if any of the resource strings contained Unicode characters. The culprit turned out to be the FINDSTR command, which doesn't support Unicode text files. To solve the problem, I replaced FINDSTR with FIND.EXE (which supports Unicode). However, that was not enough: my use of FOR to tokenize the input files also stops working with Unicode input. I also tried a few suggestions here and here, to no avail.
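To illustrate the likely cause, here is a minimal Python sketch, assuming the resource file is saved as UTF-16 LE with a byte-order mark (what Windows tools usually mean by "Unicode"). The key name `WelcomeMessage` and the sample line are made up for the example, and the two helper functions are hypothetical; this only models a byte-oriented search versus a decode-then-search approach, not what FINDSTR or FIND actually do internally.

```python
import codecs


def contains_bytewise(data: bytes, needle: str) -> bool:
    """Search the raw bytes for the ASCII form of needle, with no decoding."""
    return needle.encode("ascii") in data


def contains_decoded(data: bytes, needle: str) -> bool:
    """Decode as UTF-16 first (the BOM selects the byte order), then search."""
    return needle in data.decode("utf-16")


# Hypothetical resource line, used only for illustration.
line = "WelcomeMessage=Bienvenue \u00e0 tous\r\n"

# The same line stored two ways: ANSI (cp1252) and UTF-16 LE with a BOM.
ansi_file = line.encode("cp1252")
utf16_file = codecs.BOM_UTF16_LE + line.encode("utf-16-le")

print(contains_bytewise(ansi_file, "WelcomeMessage"))   # True
print(contains_bytewise(utf16_file, "WelcomeMessage"))  # False
print(contains_decoded(utf16_file, "WelcomeMessage"))   # True
```

In the UTF-16 file every ASCII character is followed by a zero byte, so the contiguous byte sequence for the key never appears. A tool that compares raw bytes misses the match, while one that decodes the text first finds it.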

My simple conclusion is that batch scripting was not designed with Unicode in mind. Perhaps Windows PowerShell can do a better job...

2 comments:

  1. Since find.exe will convert Unicode to ANSI, you can pipe find into findstr by specifying /v and a garbage string:

    find /v "ThisIsGarbage" file.unicode | findstr
