Say I am editing some file with vim (or gvim). I have no idea about the file's encoding and I want to know whether it is UTF-8, ISO-8859-1, or something else. Can I somehow tell vim to show me what encoding is used?
The fileencoding setting shows the current buffer's encoding:
:set fileencoding
fileencoding=utf-8
There isn't a general way to determine the encoding of a plain-text file, because that information isn't saved in the file itself. The exception is a file that starts with a byte order mark (BOM), which identifies it as UTF-8, UTF-16 or UTF-32, although the BOM is optional and many UTF-8 files don't have one. This is why XML files carry an encoding declaration and HTML files a charset meta tag.
You can change the encoding a buffer is written with by setting 'fileencoding'; the 'encoding' setting controls the encoding Vim uses internally. See :help encoding and :help fileencoding in Vim for how the editor handles these settings. You can also list several encodings in the 'fileencodings' option in your vimrc to have Vim try each of them in order when it opens a file.
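To make the "try the listed encodings in order" idea concrete, here is a rough Python sketch. It is only an analogy and not Vim's actual implementation: the function name guess_encoding and the default candidate list are made up for illustration. It checks for a BOM first, then returns the first candidate that decodes the bytes without error.

```python
import codecs

# BOM signatures. The UTF-32 marks come first because the UTF-32 LE mark
# starts with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, candidates=("utf-8", "latin-1")):
    """Return a plausible codec name for `data`, or None if nothing fits.

    Hypothetical helper: a BOM wins if present, otherwise the first
    candidate that decodes cleanly is returned.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_encoding(b"abcdef"))               # pure ASCII: first candidate wins
print(guess_encoding("é".encode("latin-1")))   # a lone 0xE9 byte is not valid UTF-8
print(guess_encoding(codecs.BOM_UTF8 + b"x"))  # BOM present
```

Note that latin-1 accepts every possible byte, so it only makes sense as the last entry in the list; anything placed after it would never be tried.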
-
Unfortunately, not correct. Vim cannot find the encoding of the file you're reading, because it is not written in the file. It can only guess based on the characters that appear in the file. For example, a file with the text "abcdef" can be in several encodings, since practically all of them support those characters, but a file with "šđčćž" will likely be in CP1250. So you're not reading the encoding from somewhere, but guessing what the encoding could be and displaying the file based on that guess. – Rook Aug 24 '09 at 14:29
-
What you are doing here is explicitly setting the encoding, based on your observations of the file's contents. If you want vim to try several encodings when opening a file, put several of them in the 'fileencodings' option in your _vimrc. – Rook Aug 24 '09 at 14:32
-
@ldigas, thanks for the feedback, I've updated the answer to be a bit more clear on that (I hope!) – jtimberman Aug 24 '09 at 15:18
-
I only wish that the answer were this easy. It's not, see my answer below for the 'right' way and explanation. – dotancohen Dec 26 '13 at 7:00
-
Probably worth mentioning that BOMs are 1.) not unique to UTF-8, though UTF-8's is distinct from other BOMs, and 2.) not required and often not found in UTF-8 files. – ruffin Oct 16 '14 at 15:09
Note that a file's encoding is not explicitly stated anywhere in the file. Thus, VIM and other applications must guess at the encoding. A common way of doing this is with the chardet application, which can be run from within VIM like so:
:!chardet %
The answer provided by jtimberman shows you the encoding of the current buffer, which may not be the same as the encoding of the file on disk. Thus, you will notice that chardet will sometimes show a different encoding than VIM, especially if you have VIM configured to always use a specific encoding (e.g. UTF-8).
The nice thing about chardet is that it gives a confidence score for its guess, whereas VIM can be (and often is) wrong when guessing the encoding if there are not many characters above \x7F (ASCII 127). For instance, adding a single א to a long file of PHP code makes chardet think that the file is ISO-8859-2 with a confidence of 0.72, whereas adding the slightly longer phrase שלום, עולם! gives UTF-8 with a confidence score of 0.99. In both cases, :set fileencoding? showed UTF-8, not because the file on disk was UTF-8, but because VIM is configured to use UTF-8 internally.
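A small Python sketch of why so few non-ASCII bytes give a detector little to go on. The helper decodable_as and the candidate list are made up for illustration, and this is not how chardet itself works internally. It only lists which encodings can decode the bytes at all; a real detector then has to rank the survivors, presumably on statistical evidence, which is scarce when only one or two bytes are above \x7F.

```python
def decodable_as(data, candidates):
    """Return the candidate encodings under which `data` decodes without error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# Illustrative candidate list, not taken from Vim or chardet.
CANDIDATES = ["ascii", "utf-8", "iso8859-2"]

one_letter = "א".encode("utf-8")        # two bytes, both above 0x7F
phrase = "שלום, עולם!".encode("utf-8")  # many more non-ASCII bytes

print(decodable_as(b"abcdef", CANDIDATES))   # ['ascii', 'utf-8', 'iso8859-2']
print(decodable_as(one_letter, CANDIDATES))  # ['utf-8', 'iso8859-2']
print(decodable_as(phrase, CANDIDATES))      # ['utf-8', 'iso8859-2']
```

Both the single letter and the phrase are valid in UTF-8 and in ISO-8859-2, so decoding alone cannot settle the question. The longer phrase simply gives a detector more bytes on which to judge which reading is the more plausible text.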
-
I suggest that you mention a word about the availability of chardet across OS'es. – Soundararajan Aug 31 '18 at 9:28
-
@Soundararajan: I'm probably not the guy to mention that as I use Debian and CentOS only. You are invited to edit the answer if you have relevant information, though. Thanks! – dotancohen Aug 31 '18 at 12:28
-
I don't see the need to do that inside VIM; better to do it from outside: chardet <file>. Still, good suggestion. – lepe Aug 3 '19 at 7:10
I found this: https://vim.fandom.com/wiki/Reloading_a_file_using_a_different_encoding
You can reload a file using a different encoding if Vim was not able to detect the correct encoding:
:e ++enc=<encoding>
where <encoding> could be cp850, ISO-8859-1, UTF-8, ...
You can use file yourfilename to find the encoding, or chardetect (provided by python-chardet or uchardet, depending on your Linux distribution), as suggested by dotancohen.
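What reloading with another encoding amounts to is that the bytes on disk stay the same and only their interpretation changes. Here is a minimal Python illustration of that idea; the helper name reinterpret is hypothetical, and this is not what Vim runs internally.

```python
def reinterpret(data, encoding):
    """Decode the same raw bytes under a different assumed encoding."""
    return data.decode(encoding, errors="replace")


raw = "é".encode("latin-1")  # the single byte 0xE9

as_latin1 = reinterpret(raw, "latin-1")
as_cp850 = reinterpret(raw, "cp850")
as_utf8 = reinterpret(raw, "utf-8")  # not valid UTF-8: becomes U+FFFD

print(repr(as_latin1), repr(as_cp850), repr(as_utf8))
```

The same byte comes out as a different character under each encoding, which is why picking the wrong one shows up as garbled text rather than as an error.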
-
This doesn't answer the question of how to find out the current encoding. Instead, this command will force some other encoding on the buffer. – Ruslan Aug 9 '19 at 9:55