Searching for nonascii characters

hypnomarten · 2025-02-10T23:58:18+00:00

I found this in the Info manual's section on character classes: Note that the ‘[’ and ‘]’ characters that enclose the class name are part of the name, so a regular expression using these classes needs one more pair of brackets. For example, a regular expression matching a sequence of one or more letters and digits would be ‘[[:alnum:]]+’, not ‘[:alnum:]+’.

So, try [[:nonascii:]]

Seems to work for me.
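For comparison outside Emacs: Python's `re` module has no POSIX classes like `[:nonascii:]`, so a negated range `[^\x00-\x7f]` is a common stand-in. This is a minimal sketch under that assumption. It works on already-decoded text and groups adjacent non-ASCII characters into runs (the `+`), whereas `[[:nonascii:]]` alone matches one character at a time. The name `find_non_ascii` is made up for illustration.

```python
import re

# Stand-in for Emacs's [[:nonascii:]]+ : one or more characters
# outside the ASCII range U+0000..U+007F.
NON_ASCII = re.compile(r"[^\x00-\x7f]+")

def find_non_ascii(text):
    """Return (start, end, run) for each run of non-ASCII characters."""
    return [(m.start(), m.end(), m.group()) for m in NON_ASCII.finditer(text)]

if __name__ == "__main__":
    for start, end, run in find_non_ascii("naïve café"):
        print(start, end, repr(run))
```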

mmaug · 2025-02-11T02:51:47+00:00

What is the encoding of the buffer? If the buffer is Unicode, then the octal escapes you see are bytes, not character code points. But if the file is actually in a different encoding, making sure the buffer uses that same encoding could significantly change your perception of the file contents.
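To illustrate how much the chosen encoding changes what you "see", here is a small Python sketch (not Emacs-specific) that decodes the same four-plus-one bytes two ways. The sample bytes are an assumption picked for illustration: `\xc3\xa9` is the UTF-8 encoding of "é", but read as Latin-1 it becomes two separate characters.

```python
# The same raw bytes, interpreted under two different encodings.
raw = b"caf\xc3\xa9"

as_utf8 = raw.decode("utf-8")      # \xc3\xa9 decodes to one code point, U+00E9
as_latin1 = raw.decode("latin-1")  # each byte becomes its own code point

for label, text in (("utf-8", as_utf8), ("latin-1", as_latin1)):
    # List only the non-ASCII code points each interpretation produced.
    print(label, repr(text), [hex(ord(c)) for c in text if ord(c) > 0x7F])
```

In Emacs the analogous step is re-reading the file with another coding system, e.g. `C-x RET r` (`revert-buffer-with-coding-system`).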

Locating bytes displayed in octal is a separate problem, and more specifics are needed to definitively recommend a solution. File encoding and Unicode display are incredibly complex topics that are far more difficult than the ASCII model implies.
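One possible starting point for the octal-byte problem, sketched in Python under the assumption that you just want to know where the high bytes sit in the file on disk: read it as raw bytes (no decoding at all) and report every byte above 0x7F, formatted as a backslash plus three octal digits the way Emacs displays undecoded raw bytes. The function name `non_ascii_bytes` is hypothetical.

```python
import sys

def non_ascii_bytes(data: bytes):
    """Yield (offset, escape) for every byte above 0x7F.

    The escape is a backslash plus three octal digits, mirroring how
    Emacs shows undecoded raw bytes (e.g. \\303).
    """
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            yield offset, "\\" + format(byte, "03o")

if __name__ == "__main__":
    # Read the file named on the command line as raw bytes (no decoding);
    # fall back to a small sample when no path is given.
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            data = f.read()
    else:
        data = b"caf\xc3\xa9"
    for offset, escape in non_ascii_bytes(data):
        print(f"byte offset {offset}: {escape}")
```

The offsets are 0-based byte positions in the file. Emacs buffer positions count characters starting at 1, so the two only line up while everything before the byte is plain ASCII.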

mmaug · 2025-02-11T15:47:38+00:00

(You've stirred up my ADHD — I won't rest for weeks…)

A quick DDG search and a look at Wikipedia convince me that, if your file is properly encoded, there will be FOSS tools around for manipulating the data you have. The MARC-8 character encoding and the MARC 21 record format are well documented and have formal specifications. My guess is that you are not the first one to try to make sense of what they hold. Stand on the shoulders of others if you can…

emacs
