I have NUL-delimited output coming from the following command:

some commands | grep -i -c -w -Z 'some regex'

The output consists of records in the format:

[file name]\0[pattern count]\0

I want to use text manipulation tools, such as sed/awk, to change the records to the following format:

[file name]:[pattern count]\0

But it seems that sed/awk usually handle only records delimited by the newline character. I would like to know how sed/awk could be used to achieve this, or, if sed/awk cannot handle this case, what other Linux tool I should use.

Thanks for any suggestion.

Lawrence

    
so how do you look at this file? with a hex editor? How does it know where to 'break' the lines? Why not just convert the '\0' to '\n' and have a nice easy to read file that can be processed using the standard unix paradigm? Otherwise at every step, you'll be fighting the basic law of unix, "each record on its own line" ! ;-) Life is too short, There are much more interesting problems to do battle with. Can you get the original source of output to use '\n' or ... shudder, '\r\n' ? Good luck. – shellter Feb 7 '12 at 3:17
    
The output is not to be displayed, it is piped into another command. I use NUL as separator as Linux file names could have "newline" character in it. I agree that life is only too short for us to figure out all the solutions for our questions. – user1129812 Feb 7 '12 at 3:50
    
But a filename is a different piece of 'data' than the data included in a pipe. The two only meet as and when data is written into a file with a name that may have a '\n' in it. Good luck. – shellter Feb 7 '12 at 4:07
    
I finally figured out that grep -c -Z only places a NUL character after [file name] but places a "newline" character after [pattern count]. I have now chosen not to use the grep -Z option, but TejasP's answer is still helpful for parsing NUL-delimited files with awk in the future. Thanks all. – user1129812 Feb 7 '12 at 6:03
up vote 2 down vote accepted

In awk, the record separator is by default the newline character, which defines a record to be a single line of text. You can use a different character by changing the built-in variable RS. The value of RS is a string that says how to separate records; the default value is "\n", the string containing just a newline character.

 awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
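Applied to the record format described in the question ([file name]\0[pattern count]\0), the same idea might look like the sketch below. It assumes GNU awk (gawk), since RS = "\0" is not portable to other awk implementations, and join_pairs is only a hypothetical helper name used for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU awk (gawk); RS = "\0" is not portable.
# join_pairs is a hypothetical helper name, not part of any tool.
join_pairs() {
  gawk 'BEGIN { RS = "\0"; ORS = "\0" }   # read and write NUL-terminated records
        NR % 2 { name = $0; next }        # odd records: remember the file name
        { print name ":" $0 }'            # even records: emit name:count
}

# Usage: some commands | join_pairs | next-command
```

Because both RS and ORS are NUL, file names containing newlines should pass through unchanged, and only the NUL between a name and its count becomes a colon.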
I have tested that the command awk 'BEGIN { RS = "\0" } ; { print $0 }' does delimit records with the NUL character. But the GNU Awk User's Guide says that RS = "\0" is not portable. Anyway, I could start with this command to try to change the NUL character before the [pattern count] to the ":" character in my case. – user1129812 Feb 7 '12 at 3:21

Since version 4.2.2, GNU sed has had the -z or --null-data option to do exactly this. E.g.:

sed -z 's/old/new/' null_separated_infile
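For the pairs described in the question, a rough sketch along these lines might work. It assumes GNU sed 4.2.2 or later and input that really is [file name]\0[pattern count]\0; join_pairs_sed is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed with -z support.
# join_pairs_sed is a hypothetical helper name.
join_pairs_sed() {
  # -z          : records end in NUL instead of newline
  # N           : append the next record, joined by the NUL delimiter
  # s/\x00/:/   : replace that joining NUL with a colon
  sed -z 'N; s/\x00/:/'
}

# Usage: some commands | join_pairs_sed | next-command
```

Each output record is still terminated by NUL, so a downstream command that expects NUL-separated input should keep working.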

Using sed to replace the NUL characters with spaces:

sed 's/\x0/ /g' infile > outfile

Or make the substitution in place (this keeps a backup of your original file as infile.bak and overwrites the original with the substitutions):

sed -i.bak 's/\x0/ /g' infile

Using tr:

tr -d "\000" < infile > outfile
    
or tr "\000" "\n" < infile > output :-?) – shellter Feb 7 '12 at 3:23
    
@shellter You are right. I was not sure if OP wanted to substitute them with newlines or remove them … :) – jaypal singh Feb 7 '12 at 3:37
    
But my purpose is to only replace the NUL character before the [pattern count], not to replace all NUL characters. – user1129812 Feb 7 '12 at 3:43
    
@user1129812 In that case you can use the sed command and remove the g option from it. g option is for making global substitutions. When removed, it will only make the change on first occurrence on each line. – jaypal singh Feb 7 '12 at 3:58
