Excel: Error when CSV file starts with “I” and “D”

mckoss · 6 hours ago

Maybe my bug? I wrote most of the original Excel text-based file import code in 1984 (including the SLYK format, which was the main way we migrated files from Multiplan to Excel).

reply

jeffwass · 5 hours ago

Hi mckoss, awesome to hear one of the main Excel architects is here on HN!

Any interesting stories to share? I think we'd all be interested.

Mind if I ask a question directly related to one of my comments a few days ago? I was lamenting the fact that the ASCII codes 29, 30, 31 (Group, Record, and File separators) never really became widely implemented, as these were specifically designed to delimit data. Ie, one could easily include commas, new lines/carriage returns, etc in data cells without clashing. But instead CSV seems to be the most common standard for tabular data.

Were these ASCII codes ever considered for tabular files?

Thanks.

reply

simonh · 14 minutes ago

You couldn't conveniently type those codes on a keyboard and there was no standard way to display them. You'd need a special editor for such files, eliminating the main advantage of an interchangeable text based file format.

reply

rjbwork · 6 hours ago

This is one of the great things about HN, and indeed the internet at large. Someone posts a link to a little quirk of our world, and then someone is able to pipe up and say "hey that's me! I made that thing!" Even better when it's something like a software bug.

reply

batat · 58 minutes ago

Comment proving HN superiority over other pathetic news websites.

reply

NamTaf · 6 hours ago

This is why I like HN. Opening the comments and seeing "oh maybe this is due to something I did back in the 1980s when I wrote most of that, here's some inside info/explanation" about something as significant as Excel's file handling is something that will never grow old for me.

reply

magoon · 29 minutes ago

32 years ago! What are the chances your code has survived that long in a product like Excel?

reply

hasenj · 5 hours ago

Was it implemented before the generic CSV import was implemented?

It seems like if parsing fails it should throw an exception and fallback to regular csv parsing.

Am I missing something?

reply

ygra · 5 hours ago

That's probably how it has been fixed. I cannot test right now, but since the article doesn't apply to newer versions of Office, I guess they are no longer susceptible to that.

reply

moobs · 5 hours ago

This is so funny, I was literally fixing this bug last night. Still applies to office 2013

reply

cm2187 · 5 hours ago

Another bug with CSV files is that opening a CSV file triggers a full recalculation, even if the calculations are set to manual. This is clearly a wrong behaviour.

reply

Piskvorrr · 1 hour ago

Nope - just no longer admitting to it. Had to redo a bunch of exports because of this recently.

reply

saynsedit · 6 hours ago

This is why HN is cool. You see a Microsoft knowledgebase article documenting an obsolete enterprise software's bug and its workaround, then an anonymous account takes credit for it. This is what I come here for.

reply

JohnDoe365 · 4 hours ago

Kudos for standing up and saying "Hey, may be my fault"

reply

chris_wot · 1 hour ago

Maybe this was written quite a bit later, but can you explain why the following would allow data to be returned in Excel 97?

"If you move your mouse pointer continuously while the data is being returned to Microsoft Excel, the query may not fail. Do not stop moving the mouse until all the data has been returned to Microsoft Excel."

https://support.microsoft.com/en-us/kb/168702

I've always wanted to know why Method 2 works!

reply

ximeng · 3 hours ago

"SYLK": https://en.wikipedia.org/wiki/SYmbolic_LinK_(SYLK) not SLYK I assume

reply

combatentropy · 6 hours ago

I've run into this error many times for a decade, always when I make a CSV file of a table whose first column is "ID".

It seems to me that Microsoft by now could have improved its tests. If the first two letters are "ID", but if "there are no valid SYLK codes after the 'ID' characters," then maybe it was never meant to be an SYLK file. If the file's suffix is .csv, then maybe you should just treat it as a CSV file. If the file's suffix is .txt, then maybe you should treat it as a text file.

reply

coroxout · 3 hours ago

I've run into this bug too and it was mystifying and (iirc) not overridden by manually loading the file using the Text Import wizard. But this...

"If the file's suffix is .csv, then maybe you should just treat it as a CSV file"

doesn't make me any happier, because there are a lot of slight variations on CSV files or a lot of different ways you might want to load a CSV file (perhaps you'd like to specify the character set, or set columns to be text or date format, or not have leading zeroes stripped or numbers in brackets turned into negative values), and if I absentmindedly name them .csv then again I can't use the full loader, because Excel knows best.

And if you want to read a CSV file with a VB macro it's even worse, because you can specify all the parameters but it just silently overrides them all with the CSV defaults just because the file extension is CSV. Hours of debugging spent on that...

reply

mkesper · 3 hours ago

Yes, I always use .txt when exporting/importing CVS from/to Excel. Fun!

reply

13of40 · 4 hours ago

If you're a fan of this sort of thing, try the following in CMD. (It helps if you read it in the voice of a 90's rapping cartoon cat.)

echo MZ is my name my name is MZ. I'm the hippest loader bug from the sea to the sea! >foo.txt

.\foo.txt

reply

Retr0spectrum · 4 hours ago

What does this do? I don't have access to a windows machine.

reply

13of40 · 3 hours ago

For historical reasons, the MZ is interpreted as the header of a 16-bit executable. When you run something under CMD (hopefully I'm getting this right, it's been a few years) it first runs it using the CreateProcess API, which just tries to run the file as an executable, regardless of extension, then it falls back to ShellExecute if it doesn't have an MZ header, which dispatches based on extension.

Basically you get a dialog saying the text file isn't a compatible executable. If you change the MZ to anything else, it opens in notepad.

reply

Piskvorrr · 1 hour ago

Well, you are trying to execute the text file (with ShellExecute as a fallback). Windows does what you ordered it to.

reply

tehmaco · 2 hours ago

Windows 7 64 bit here - it just has the error message box, and the same text in the cmd window, it doesn't open the file though :)

"This version of foo.txt is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need a x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher."

reply

jacobush · 6 hours ago

I can think if things breaking for each of those.

reply

mhd · 4 hours ago

So this would be another problem that could be avoided if "CSV" would actually friggin mean "comma separated values" and not "whatever Windows might consider comma-equivalent depending on locale and/or phase of the moon"?

reply

boulos · 4 hours ago

No. From the detailed description:

> A SYLK file is a text file that begins with "ID" or "ID_xxxx", where xxxx is a text string. The first record of a SYLK file is the ID_Number record. When Excel identifies this text at the beginning of a text file, it interprets the file as a SYLK file. Excel tries to convert the file from the SYLK format, but cannot do so because there are no valid SYLK codes after the "ID" characters. Because Excel cannot convert the file, you receive the error message.

This is usually called "magic string" or "magic number" (https://en.m.wikipedia.org/wiki/Magic_number_(programming)). It has nothing to do with the comma, and everything to do with SYLK using a pretty risky magic string (ID) and Excel not having a "try SYLK, if that fails, try as csv".

tl;dr: this is about guessing the input format and has nothing to do with the delimiter.

reply

mhd · 3 hours ago

It's not just ID, there's also mandatory fields after that, separated by a semicolon. Thus 'ID;P' would be a valid header for SYLK and e.g. for German "CSV"s. If CSV would stick to commas (and boohoo, just escape more fields when this causes locale issues), then even such overly simple heuristics would've sufficed, wouldn't it?

Although you'd just run into other pathetic cases with such an informal format (CSV, not SYLK). But hey, I need to interchange formats with some IT guys and don't have a big IT department (ready) and they don't want to parse XLS(X), so I just use CSV. Now you have two problems.

reply

philh · 1 hour ago

"ID;P" is valid at the start of CSV as well. Fields are allowed to contain semicolons.

reply

Frank2312 · 13 minutes ago

Depending on locale.

I believe in the Canadian French locale (and maybe many other locales), ";" is considered a separator (a "," equivalent).

reply

stinos · 44 minutes ago

Pretty sure any locale-aware OS treats comma different depending on what locale has been used, no? Anyway commas are historically used as seperators in numbers in some countries, and in text, etc, so if you really want to shed your blame all over something, maybe it shouldn't be Windows here but rather the people who came up with comma as a record seperator in the first place. Probably it was ok for them in that time, at that place (US I'm guessing) and for the usage needed back then, but I'd really wish they chose a different seperator.

reply

taneq · 4 hours ago

What's a comma if not a Unicode character that's comma-equivalent in the current locale?

I suspect you're thinking of a comma as ASCII 44. If your source text isn't ASCII then it's just not this simple.

reply

derefr · 3 hours ago

I expect what the parent is truly imagining (but hasn't really thought through the usability implications of) is a format where ASCII newline + ASCII comma are used as delimiters, and then everything else is encoded further so as to basically become opaque octet-strings (whose encoding is such that it will never include '\n' or ',').

In other words, if you take CSV and make it completely useless for hand-editing in a text-editor by taking it 90% of the way to being a length-prefixed binary encoding, you'll have what the programmer's mind intuitively jumps to as "what CSV should be."

reply

supergreg · 4 minutes ago

Why not just use ASCII 31/30 instead of comma/new line. Just give it a new name like ASV (ASCII separated values). What would take to give glyphs to those characters? Just add them to some font?

reply

userbinator · 2 hours ago

Actually newlines and commas are OK, and in fact arbitrary bytes can be put into CSV, because officially[1] CSV supports escaping using quotes (and escaping quotes by doubling them).

[1] https://www.ietf.org/rfc/rfc4180.txt

reply

dmichulke · 3 hours ago

An alternative would have been to not label it "CSV",

Another one would have been to write numbers in a CSV file in the American format and just adapt them to the current locale when reading them.

Both would be a more sensible interpretation of your GP.

reply

mSparks · 2 hours ago

No more sensible than starting with Japansese format and force the Americans to adapt to their locale when reading them....

That'll make the world a better place wouldn't it?.....

reply

reilly3000 · 5 hours ago

Jeez, how many SLYK files out there? I venture to guess there is an order of degrees more attempts at opening CSV files with ID as the first header than attempted opens of unconverted SLYK files in excel.

When I admire Microsoft, it's most often because despite nearly 4 decades of bloat to support, they still can ship reliable software to millions.

reply

lostlogin · 5 hours ago

I kind of wish I could just use a version from a decade or 2 ago. The (4 year old) versions of word and excel that the uni site I work for use are so clunky to navigate. I loved using word in pre OSX says - I wonder how much of that is rose tinted nostalgia.

reply

Piskvorrr · 1 hour ago

Try LibreOffice (Portable if necessary) - it has remained with the conservative UI.

reply

MikusR · 7 hours ago

"A SYLK file is a text file that begins with "ID" or "ID_xxxx", where xxxx is a text string. The first record of a SYLK file is the ID_Number record. When Excel identifies this text at the beginning of a text file, it interprets the file as a SYLK file. Excel tries to convert the file from the SYLK format, but cannot do so because there are no valid SYLK codes after the "ID" characters. Because Excel cannot convert the file, you receive the error message. "

reply

Piskvorrr · 5 hours ago

In other words, Excel has all the information to decide that this is not a SYLK file but a CSV, but just throws an error, because fuck you. Great UX.

That's way up there with the "you can't drag stuff here, should've dragged it a few px further" from Win98.

reply

cm2187 · 6 hours ago

I had to check this is not the 1st of April

I particularly like the step by step explaination of how to add an apostrophe at the begining of the first line with a description of which keyboard key to press...

reply

chias · 6 hours ago

I feel like this is one of those bugs that would have taken less time to fix than it took to document.

You could even do something as trivial as "if Excel fails to open the file as SYLK, try again as CSV" and cut at least 99% of the problem away.

reply

zaidf · 5 hours ago

I feel like this is one of those bugs that would have taken less time to fix than it took to document.

After many years, I still have to slap myself when I catch myself thinking this way. Unless you know the architecture of the software, I've learned that it's often significantly harder than you imagine to fix seemingly simple and obvious bugs without breaking something else.

reply

Dylan16807 · 5 hours ago

Fine, a slight correction then. It should be easy to fix except for terrible software architecture. Terrible software architecture is not a valid excuse when you have plenty of time and billions of dollars. So there should be little forgiveness for this bug still existing.

reply

zaidf · 4 hours ago

As a consumer affected by this bug, sure, you may find it inexcusable. But the basic truth about any mature software product used by tens of millions is that there will be a laundry list of bugs competing for limited resources. So there will always be bugs that you want to fix but simply isn't a high enough priority relative to all the other bugs or features.

reply

Dylan16807 · 3 hours ago

There are limited resources, but there are enough resources to improve the architecture over more than a decade. And if they decided to avoid architecture changes, there would be nothing in the way of throwing lots of siloed software engineers at bugs to get more of them fixed. They're not lacking for money with so many customers. It's possible for them to set a much more inclusive cutoff.

They could have fewer bugs per feature than almost any other product after being stable and popular for so long. But they choose not to make that a goal.

reply

cm2187 · 5 hours ago

Particularly since it's not like if Microsoft was adding new features to Excel since office 2007.

reply

peterwwillis · 4 hours ago

It is harder, more expensive and time-consuming to fix bugs later than sooner.

reply

bshimmin · 4 hours ago

This is basically what happens, at least on my Mac - it complains about it not being a valid SYLK file, then I think, "I have no idea what that means", then I dismiss the error and it opens it up as a CSV.

At least now I know why!

reply

rompic · 6 hours ago

Also reminded me of http://tburette.github.io/blog/2014/05/25/so-you-want-to-wri...

reply

rompic · 6 hours ago

We had a big wtf moment yesterday at work.

Also see https://en.m.wikipedia.org/wiki/SYmbolic_LinK_(SYLK)

reply

garyclarke27 · 5 hours ago

Excel is useless with csv and utf-8. when I need to export to csv, which I regularly do, I always use Libre Office.

reply

Freak_NL · 5 hours ago

Excel is quite often useless with CSV, even if its contents are pure ASCII. Whether or not Excel will open a standard CSV file delimited with comma's (as per RFC 4180) properly also depends on the locale of the computer running Excel. So if you run Excel on Windows with a Dutch locale, it will by default fail to open the CSV normally and jam all its contents in one cell, because it expects a semi-colon as delimiter instead of a comma. To open standard CSV you have to import the file instead.

Someone at some point decided that the Dutch use semi-colons instead of commas to delimit tabulated data, and decided to apply this logic to CSV files as well. I would like to know the history behind such a decision! Switch your locale to English US, and it opens normally.

LibreOffice just opens the CSV and asks the user to confirm the delimiter and what not regardless of locale.

reply

jsproc · 4 hours ago

Yes, this is so horrible. As a workaround I sometimes put "sep=," on the first line of the spreadsheet. This forces Excel to use the comma as a separator, regardless of the locale. The downside of course is that every other tool on the planet shows the extra row, so it's not very useful.

reply

ptman · 3 hours ago

I suffer from the reverse problem. I need to ingest CSV files produced by excel in various locales. Or is there some generally available alternative to excel that would produce valid RFC CSVs?

reply

Piskvorrr · 1 hour ago

There is no such thing as a valid CSV - even the spec is borked, and nobody follows it anyway. If you have a choice, XLSX is a lesser evil.

reply

Freak_NL · 38 minutes ago

RFC 4180¹ has been in existence since 2005 and seems perfectly straight forward. The basic syntax is two pages of very understandable text. I hold that to be the definition of valid CSV.

CSV libraries tend to adhere to it (and often support additional options encountered in the wild as well); e.g., Apache Commons CSV².

1: https://tools.ietf.org/html/rfc4180

2: https://commons.apache.org/proper/commons-csv/

reply

pieter_mj · 1 hour ago

Importing csv is very messy in excel.

A VBA import routine will always run in the US locale. However, csv files containing dates formatted according to other locales will have their days and months switched as long as the day is not bigger than 12.

reply

viraptor · 1 hour ago

Wait... which excel? 2013 definitely gives me a choice of separators when importing.

reply

Freak_NL · 43 minutes ago

> when importing.

Exactly. Unfortunately, double-clicking the file does not trigger that dialogue. You would be surprised how many IT-staffers simply give up at that point („Can I get the raw data?”, „Sure, here is the automatically generated CSV dump.”, „I can't open it…”).

reply

SonOfLilit · 5 hours ago

Surprisingly, Excel can import CSV from UTF8 with BOM (at least new versions of Excel). However, be careful, as Ctrl+S will save it in some arbitrary codepage and you'll probably end up with a document like:

ID,Name 1,??? ???? 2,Kevin Dub?is

so take care to always save such documents as .xmlx.

(If you convert them to your local codepage and import that - when at all relevant - it will be saved in the local codepage, so less risk of data loss.)

reply

masklinn · 4 hours ago

> Surprisingly, Excel can import CSV from UTF8 with BOM (at least new versions of Excel).

A BOM is abnormal and not recommended in UTF-8, it's basically a shitty MS hack: the BOM is necessary to detect endianness differences between the document and the host system, endianness has no impact impact on UTF-8.

And as freak_nl above notes, Excel uses locale-dependent field separators, so in some countries/locales it will try to import "CSV" with semicolon separators.

reply

m_mueller · 5 hours ago

It actually does have CSV and UTF-8, but finding the correct settings is very hidden. I'm always amazed at how unable Microsoft is to improve the base of Excel and instead just keeps piling UI layers on top to hide the ugly places.

reply

douche · 1 hour ago

I refuse to open CSVs in Excel, because so often perfectly valid CSV gets butchered by it. It's easier to just use Notepad++.

reply

code777777 · 6 hours ago

"Applies to Microsoft Excel 2002 Standard Edition, Microsoft Excel 2000 Standard Edition, Microsoft Excel 97 Standard Edition, Microsoft Office Excel 2003 "

reply

BBlarat · 5 hours ago

I still get the error in Excel 2016, so I guess it's not fixed yet.

reply

chetangole · 3 hours ago

We get a warning in Excel 2016, and can open the file.

reply

pskomoroch · 6 hours ago

Weird, I hit this exact error 2 weeks ago

reply

chrismorgan · 5 hours ago

Ditto, two days ago.

reply

flamedoge · 7 hours ago

one of those, how did they miss this?

reply

lifeisstillgood · 6 hours ago

I had to fake a csv file for testing yesterday, and started out with id, then went "ahahah", and used name.

I know that's boring but this is one of those generational pieces of knowledge like "keep your docs up to date" that we need to build into software training somehow. (Or rather, this bug is not important, but the kind of training that imparts this knowledge will be vital in building a real software profession)

But yeah, it's a real WTF

reply

Jaruzel · 5 hours ago

Embedded knowledge like this, is why guys like us can never retire...

reply

ZenoArrow · 1 hour ago

I don't know enough about SLYK to suggest a full solution, but couldn't additional checks be performed that confirm it's a SLYK file? Other than all SLYK files starting with ID, are there any other properties that exist in all SLYK files that could be checked against?

reply