Flash plugin
Posted Nov 10, 2010 19:13 UTC (Wed) by cesarb (subscriber, #6266) [Link]
Flash plugin
Posted Nov 10, 2010 19:20 UTC (Wed) by jwb (guest, #15467) [Link]
Flash plugin
Posted Nov 11, 2010 0:37 UTC (Thu) by marcH (subscriber, #57642) [Link]
Steve Jobs at work in glibc?
Flash plugin
Posted Nov 11, 2010 19:28 UTC (Thu) by iabervon (subscriber, #722) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 19:25 UTC (Wed) by brunowolff (guest, #71160) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 19:52 UTC (Wed) by brunowolff (guest, #71160) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 4:12 UTC (Thu) by plougher (guest, #21620) [Link]
The Red Hat bugzilla doesn't have a description of the memcpy problem that ultimately caused the bug, but my Squashfs CVS commit does:
http://squashfs.cvs.sourceforge.net/viewvc/squashfs/squas...
From my experience hitting this problem I think the people insisting this is merely bad programmers using memcpy where they should have used memmove are missing the point. I quite legitimately wrote code using memcpy where it was known the areas did not overlap, but over the years code changes elsewhere happened which then caused the areas to overlap in certain circumstances, breaking design assumptions made in years-old code. This was obviously a bug, but one which hitherto had been completely hidden by the behaviour of memcpy.
In other words, programmers can make well-meaning mistakes, especially when dealing with old code or with library routines where the underlying implementation is not known. Testing with the old behaviour of memcpy won't show anything is amiss.
Flash is obviously an example where memcpy of overlapping areas occurs frequently and so it has shown up quite quickly. There may be many applications using memcpy which in rare circumstances use overlapping areas, leading to unexplained corruption and data loss, which have not yet shown up.
Glibc change exposing bugs
Posted Nov 12, 2010 6:55 UTC (Fri) by hozelda (guest, #19341) [Link]
I'm not disagreeing with the gist of what you mentioned, but this does point out that software is very complex in that the exact semantics of every interacting component has compounding effects on the overall result. This is why there is a tendency to code to achieve a pass on tests rather than by strict well defined "interface specs". However, having access to source code and working openly means problems can be identified quicker and shared quicker with other projects. If source code was not available (eg, if the library change had happened on a proprietary platform), finding the mistake would be more difficult and costly and there would be more bugs that would only come out under odd scenarios because thorough testing is impossible and certainly more difficult than analyzing even lots of source code.
An ideal wish: I want to see code assumptions be documented better on source code (assert calls and/or prose), even though we do have access to version control, peer review, and sometimes lots of testing with decent feedback. HTML can be generated from a heavily documented code to exclude all the little comments except when you want to see them (eg, before a final release or for inspection/audits). Trying to keep the source clean makes it easier to end up with problems over time. We would benefit from extensive standards for documentation, and those that like simple tools (like a simple text editor) can run simple filters on the project from the make file so that you can have all those notes not pollute a working copy.
[An extensive test suite designed to catch these problems is similar in effect but will leave holes whereas descriptive text can offer an important layer of defense.]
[Many times when you are (I am) learning a new code base, you have to make these notes anyway. Why not just formalize the effort and keep it together with the other code? We should even be able to have tests run from this documentation. Tagging precise spots can be done using any "unique" delimiter and can take an sgml approach. Then git and other tools can identify "conflicts" not just from 3-way merges with overlaps but from simple edits which overlap with a comment's scope.. triggering a requirement to update the comments whose scope were touched.]
[Another approach would be to try and keep something like a git branch of such comments/tests in sync and require that it be run before accepting commits on the main clean branch.]
[It might just be too much effort to do this in terms of bang for buck. Comments can very easily grow stale, though, that is why I suggested automatic conflict identification efforts within the workflow.]
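To make the "assert calls" suggestion above concrete, here is a minimal sketch in C of a memcpy wrapper that writes its no-overlap assumption down as an assertion. The names regions_overlap and checked_memcpy are hypothetical, and comparing addresses through uintptr_t assumes a flat address space, which the C standard does not promise.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if [a, a+n) and [b, b+n) share any byte.
 * Assumes a flat address space where uintptr_t values order like addresses. */
static int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;

    if (n == 0)
        return 0;
    return pa < pb ? (pb - pa) < n : (pa - pb) < n;
}

/* memcpy with its design assumption checked at run time (unless NDEBUG). */
static void *checked_memcpy(void *dst, const void *src, size_t n)
{
    assert(!regions_overlap(dst, src, n) && "memcpy on overlapping areas");
    return memcpy(dst, src, n);
}
```

With a wrapper along these lines, a later change elsewhere that makes the areas overlap would trip the assertion in a debug build instead of silently depending on the copy direction.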
Glibc change exposing bugs
Posted Nov 26, 2010 13:37 UTC (Fri) by SEJeff (subscriber, #51588) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 19:40 UTC (Wed) by clugstj (subscriber, #4020) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 19:46 UTC (Wed) by rodgerd (guest, #58896) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 20:03 UTC (Wed) by corbet (editor, #1) [Link]
That's a germane example, actually. "Poo flinging" notwithstanding, the kernel developers fixed things so that applications would not lose data even if they weren't following standard behavior. Not breaking things was seen as more important than doing something because the posted rules say you can.
I don't believe that Linus (or anybody else) is saying that the broken applications are not buggy. What I'm hearing is that those applications have worked for years and that people should think for a long time before introducing a change which breaks them. Thus, Linus asks: what's the benefit that justifies such a change? I think it's a reasonable question.
Glibc change exposing bugs
Posted Nov 10, 2010 20:10 UTC (Wed) by jwb (guest, #15467) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 20:38 UTC (Wed) by neilbrown (subscriber, #359) [Link]
This all sounds like a very strong recommendation in favour of Rusty Russell's Maxim of API development: APIs should be hard to misuse. memcpy, and apparently ALSA, are easy to misuse.
So implementing memcpy as memmove - which Linus says in the bugzilla threads is largely what the kernel does - sounds very sensible. memmove is much harder to misuse.
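As a small illustration of the "memcpy as memmove" idea, the sketch below shows a hypothetical forgiving_memcpy that simply delegates to memmove, so overlapping arguments give the intuitive result in both directions. It is not the kernel's or glibc's code, only a demonstration of the interface being harder to misuse.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical drop-in: same signature as memcpy, but overlap is allowed
 * because the work is delegated to memmove. */
static void *forgiving_memcpy(void *dst, const void *src, size_t n)
{
    return memmove(dst, src, n);
}

/* Usage: sliding data up or down inside one buffer both work. */
static void forgiving_memcpy_demo(void)
{
    char up[] = "12345678";
    char down[] = "12345678";

    forgiving_memcpy(up + 2, up, 6);      /* destination above source */
    assert(strcmp(up, "12123456") == 0);

    forgiving_memcpy(down, down + 2, 6);  /* destination below source */
    assert(strcmp(down, "34567878") == 0);
}
```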
Glibc change exposing bugs
Posted Nov 13, 2010 1:07 UTC (Sat) by rriggs (subscriber, #11598) [Link]
Which one do you think your average C programmer will choose?
Which one do you think new programmers are taught to use (in schools that still teach C programming)?
Glibc change exposing bugs
Posted Nov 13, 2010 2:52 UTC (Sat) by neilbrown (subscriber, #359) [Link]
Glibc change exposing bugs
Posted Nov 15, 2010 16:43 UTC (Mon) by renox (subscriber, #23785) [Link]
And memcpy should also be named mem_unsafe_copy. But yes, if you tell developers to use the safe function by default and to optimize only when they can show benchmarks proving that the optimisation makes a difference, you'd probably get better software (if a bit slower).
Glibc change exposing bugs
Posted Oct 17, 2013 12:49 UTC (Thu) by jzbiciak (subscriber, #5246) [Link]
You're calling memmove verbose as compared to memcpy? Even Ken Thompson said if he had it to do over, he'd spell creat() with the final 'e'.
Glibc change exposing bugs
Posted Nov 25, 2010 15:13 UTC (Thu) by Spudd86 (guest, #51683) [Link]
The problem is that most apps don't actually need those bits, so they just needlessly break software like pulseaudio (and also break on bluetooth audio too).
Pulseaudio does use those unemulatable APIs, but it also falls back if they don't work, and it has good reasons to use them: it can hand over large chunks of audio data and still be able to change that same data later (if, for example, something else starts playing audio). This saves power because pulse won't wake your CPU as much. But these are APIs that don't emulate well, and until pulse came along nobody had ever tried to do that sort of thing, so it broke.
Glibc change exposing bugs
Posted Nov 25, 2010 16:07 UTC (Thu) by foom (subscriber, #14868) [Link]
Glibc change exposing bugs
Posted Nov 25, 2010 16:39 UTC (Thu) by Spudd86 (guest, #51683) [Link]
Glibc change exposing bugs
Posted Oct 17, 2013 12:42 UTC (Thu) by jzbiciak (subscriber, #5246) [Link]
One major reason the remaining distinction between memcpy and memmove exists in the standard seems to be this:
To write memmove completely within conformant C, you need a malloc and a double-copy. That's because in that mythical Platonic ideal of a language, you cannot compare two pointers that do not point into the same object, and you are not guaranteed that the arguments to memmove point within the same object. That is, a fully compliant memmove would look something like this:
void *memmove(void *dst, const void *src, size_t len)
{
    char *srcc = (char *)src;
    char *dstc = (char *)dst;
    char *temp = malloc(len);
    size_t i;

    /* What if 'malloc' fails? call abort()? Unspecified! */

    for (i = 0; i != len; i++)
        temp[i] = srcc[i];

    for (i = 0; i != len; i++)
        dstc[i] = temp[i];

    free(temp);
    return dst;
}
And, on 16-bit segmented computers or other computers lacking flat memory spaces, both of which are rather from a Platonic ideal, comparing two pointers isn't always as straightforward as you might like. So practically, memcpy offers some noticeable performance benefits on those machines.
Yes, I'm aware that the actual language in the standard says 'as if' the source was first copied to a temporary array. But, as I recall, a fully conformant C program has no other option. The 'as-if' clause allows library writers to avoid such shenanigans, without requiring them to do so. So much hair-splitting...
If it weren't for that, you could make the argument that separate memcpy and memmove were historical accidents, and change the C standard at some point to remove the restrictions on memcpy to make them both equivalent. That new memcpy would then adhere to Rusty's Maxim, or at least come much closer. And, from the thread linked above, that's pretty much what BSD did, it sounds like.
As a half step, you could define memcpy as always copying forward, to make "sliding down" safe, but that just seems a little goofy for a number of reasons.
I'm personally with Linus that the glibc breakage seems gratuitous. I'd lean towards making memcpy and memmove equivalent if their performance is largely indistinguishable. Arguing that the software is broken when it worked for years with the old library reminds me of this silly meme. It's the kind of hair-splitting that only a bureaucrat or chapter-verser could love.
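For contrast with the double-copy version above, here is a sketch of the shortcut that the 'as-if' clause leaves open to library writers on flat-memory machines: compare the two addresses and pick a copy direction. The name flat_memmove is hypothetical and this is not glibc's code. It relies on uintptr_t conversions preserving address order, which is exactly the assumption that strictly conformant C cannot make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name. Assumes a flat address space in which converting
 * pointers to uintptr_t preserves their ordering. */
static void *flat_memmove(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination is below the source: copy forward. */
        for (i = 0; i < len; i++)
            d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* Destination is above the source: copy backward. */
        for (i = len; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}

/* Usage: both overlapping cases match what memmove is required to do. */
static void flat_memmove_demo(void)
{
    char up[] = "abcdefgh";
    char down[] = "abcdefgh";

    flat_memmove(up + 2, up, 6);
    assert(strcmp(up, "ababcdef") == 0);

    flat_memmove(down, down + 2, 6);
    assert(strcmp(down, "cdefghgh") == 0);
}
```

No temporary buffer and no malloc failure path are needed, at the price of stepping outside what the standard guarantees about pointer comparison.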
Glibc change exposing bugs
Posted Oct 17, 2013 12:44 UTC (Thu) by jzbiciak (subscriber, #5246) [Link]
...rather far from a Platonic ideal...
Need. More. Coffee.
Glibc change exposing bugs
Posted Oct 18, 2013 13:31 UTC (Fri) by meuh (subscriber, #22042) [Link]
If we were on "stackoverflow", you would have earned the "Necromancer" badge ;)
Glibc change exposing bugs
Posted Oct 18, 2013 14:08 UTC (Fri) by jzbiciak (subscriber, #5246) [Link]
Glibc change exposing bugs
Posted Oct 21, 2013 20:37 UTC (Mon) by nix (subscriber, #2304) [Link]
Your comment was interesting anyway. This is the relevant guarantee from C89 (C99 and C11 have similar wording):
If two pointers to object or incomplete types compare equal, they point to the same object. If two pointers to functions compare equal, they point to the same function. If two pointers point to the same object or function, they compare equal. If one of the operands is a pointer to an object or incomplete type and the other has type pointer to a qualified or unqualified version of void, the pointer to an object or incomplete type is converted to the type of the other operand.
The problem here is that this does not guarantee that two pointers to the same object always compare equal, but rather that if they compare equal, they are pointers to the same object (and similarly for comparison operators). We can tell if two pointers definitely are pointers within the same object, but if the comparison fails we cannot conclude anything. This is unfortunately the opposite of the guarantee that memmove() needs if it is to transform itself into a memmove() when needed, so (in the absence of a Standard-blessed way to normalize pointers) you are indeed forced to do a double-copy at all times when writing memmove() in Standard C.
Glibc change exposing bugs
Posted Oct 21, 2013 20:49 UTC (Mon) by khim (subscriber, #9252) [Link]
This is unfortunately the opposite of the guarantee that memmove() needs if it is to transform itself into a memmove() when needed, so (in the absence of a Standard-blessed way to normalize pointers) you are indeed forced to do a double-copy at all times when writing memmove() in Standard C.
Note that in the real world there is no such guarantee (hint, hint), thus GLibC's memmove sometimes works and sometimes does not work.
Glibc change exposing bugs
Posted Oct 23, 2013 14:23 UTC (Wed) by nix (subscriber, #2304) [Link]
This behaviour is explicitly permitted by the Standard: segmented architectures like MS-DOS were like this decades ago. The guarantee that a == b returns nonzero only when a and b are pointers to the same object holds nonetheless. It's just a less useful guarantee than we might like.
Glibc change exposing bugs
Posted Oct 23, 2013 15:47 UTC (Wed) by khim (subscriber, #9252) [Link]
My point was that the real-world GLibC-implemented memmove does not actually work when used on a POSIX system. It compares pointers and assumes that if they are different then the underlying memory is also different!
Which means, strictly speaking, that memmove in GLibC is not standards-compliant :-)
Glibc change exposing bugs
Posted Oct 23, 2013 16:47 UTC (Wed) by jzbiciak (subscriber, #5246) [Link]
Glibc change exposing bugs
Posted Oct 23, 2013 18:09 UTC (Wed) by nix (subscriber, #2304) [Link]
... bloody hell, it does. Or many of the assembler versions do anyway. Or, rather, it assumes that distinct addresses cannot alias.
I suppose this is probably safe in practice, because if you *do* use mmap() to set up aliased regions at distinct addresses you are suddenly in hardware-dependent land (due to machines with VIPT caches such as, IIRC, MIPS, not being able to detect such aliasing at the caching level, so you suddenly need to introduce -- necessarily hardware-dependent -- cache flushes and memory barriers) so you have to know what you're doing anyway, and little things like memmove() falling back to memcpy() at unexpected times are things you're supposed to know about.
I hope.
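The aliasing case under discussion can be set up on a POSIX system roughly as follows. This is a sketch only: map_twice and alias_demo are hypothetical names, the temporary file comes from tmpfile(), and error handling is kept to a minimum. It maps the same file page twice with MAP_SHARED, so two distinct addresses refer to the same bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the first `len` bytes of a temporary file twice.
 * Returns 0 and stores two views of the same memory in *a and *b. */
static int map_twice(char **a, char **b, size_t len)
{
    FILE *f = tmpfile();
    void *pa, *pb;
    int fd;

    if (f == NULL)
        return -1;
    fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(f);
        return -1;
    }
    pa = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    pb = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(f);  /* the mappings stay valid after the descriptor is closed */

    if (pa == MAP_FAILED || pb == MAP_FAILED) {
        if (pa != MAP_FAILED)
            munmap(pa, len);
        if (pb != MAP_FAILED)
            munmap(pb, len);
        return -1;
    }
    *a = pa;
    *b = pb;
    return 0;
}

/* Usage: a write through one view is visible through the other. */
static void alias_demo(void)
{
    char *a, *b;

    if (map_twice(&a, &b, 4096) != 0)
        return;  /* no temporary file available; nothing to show */

    strcpy(a, "written through a");
    assert(a != b);                               /* distinct addresses */
    assert(strcmp(b, "written through a") == 0);  /* same underlying bytes */

    munmap(a, 4096);
    munmap(b, 4096);
}
```

A call such as memmove(b + 2, a, 6) on these two views overlaps in the underlying memory even though the address ranges are disjoint, so a direction choice based on comparing addresses cannot detect it.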
Glibc change exposing bugs
Posted Nov 10, 2010 20:56 UTC (Wed) by clugstj (subscriber, #4020) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 22:24 UTC (Wed) by lmb (subscriber, #39048) [Link]
That the behavior is undefined makes one right only on a technicality; it does not imply that silently changing it is good software engineering practice, nor that it is right in terms of software providing a service to users.
Glibc change exposing bugs
Posted Nov 10, 2010 23:58 UTC (Wed) by nix (subscriber, #2304) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 2:29 UTC (Thu) by foom (subscriber, #14868) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 7:29 UTC (Thu) by nix (subscriber, #2304) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 17:30 UTC (Thu) by foom (subscriber, #14868) [Link]
Here we have a new bug in flash which appeared without a new version of the flash binary being uploaded. It's a substantively different situation.
Glibc change exposing bugs
Posted Nov 12, 2010 7:12 UTC (Fri) by hozelda (guest, #19341) [Link]
If you are that worried, you should work off stable versions or off a stable distributor that will manage this for you. You should not change key parts of the system if possible. glibc is a very key part. You should not update for optimizations, at least not without significant tests and only if you think the gains are worthwhile. Stick to security updates or when a crucial problem has been solved.
Anyway, when an important "bug" like this comes up, projects should audit the code. In this case, the possible entry points to potential problems can be identified quickly for many projects (just search for memcpy).
The case of glibc involves well-defined standards. Most libraries do not have such carefully defined semantics, and we must rely on access to source code for the juicy bits.
OK, despite what I just said, if the gains here are not that useful, glibc should revert, at least for the time being. Reverting should not hurt those that adjusted already and will save those that have not. On the other hand, when will be the right time to change? Will people remember to fix this problem or will we just have a repeat later on? [Again, if the gains are negligible, the change in glibc should probably be avoided.]
Glibc change exposing bugs
Posted Nov 11, 2010 0:50 UTC (Thu) by MattPerry (guest, #46341) [Link]
But it is defined. The man page says not to use that function on overlapping regions. That applications ignored that and still functioned for so long is more a matter of good luck. That luck has run out due to their poor implementations and they should now be fixed.
Glibc change exposing bugs
Posted Nov 11, 2010 1:15 UTC (Thu) by Lovechild (guest, #3592) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 2:41 UTC (Thu) by quotemstr (subscriber, #45331) [Link]
> the kernel developers fixed things so that applications would not lose data even if they weren't following standard behavior
What some filesystem developers propose applications do isn't defined by any standard. POSIX, SuS, and so on don't state what happens after a crash, fsync() or not. The argument was over what to do in certain circumstances outside any standard. The argument was much muddled because one side kept claiming that its brand of brain damage was endorsed by the standard. Fortunately, sanity prevailed. Calling fsync() after every rename would have inconvenienced application developers and decreased performance.
memcpy, on the other hand, is clearly described by the relevant standards. Application developers deserve what they get.
Glibc change exposing bugs
Posted Nov 11, 2010 8:07 UTC (Thu) by bojan (subscriber, #14302) [Link]
He, he... Nice try :-)
Nothing could be further from the truth. The problem is that the standard doesn't _specify_ in which order things should happen on the underlying FS, which then gives implementers the ability to implement _any_ order (which they do). Relying on a _particular_ order (which is completely undocumented, of course) by application writers is the problem.
Suggestion about specification not dealing with crashes is irrelevant, because, once again, it doesn't specify _any_ behaviour. In other words, if your FS is hosed completely after a crash, that's OK. If it's half hosed, that's OK too. If it's completely OK, that's OK as well. Obviously, the _interesting_ case is when it's completely OK, in which case the _implemented_ ordering actually makes a difference. And, once again, _any_ ordering is OK, because the standard specifies _none_.
The only difference between this and the memcpy() fiasco is that in the case of rename() folks may get an _impression_ that the operation is atomic on the FS level, because it is atomic as viewed from the processes currently running on the system. Of course, this is documented nowhere, but is a common misreading of the standard.
With memcpy() it is quite clear overlapping regions should be copied with memmove().
Glibc change exposing bugs
Posted Nov 11, 2010 8:32 UTC (Thu) by Mook (guest, #71173) [Link]
Yes, glibc's rename() API guarantees atomic renames. Since normal applications do not make syscalls directly, but call the libc API to do it on their behalf, they are not to blame.
Glibc change exposing bugs
Posted Nov 11, 2010 8:46 UTC (Thu) by bojan (subscriber, #14302) [Link]
The atomicity of rename() refers to a view from the running system and not much else. But it has sure been misread a lot :-)
Glibc change exposing bugs
Posted Nov 11, 2010 9:06 UTC (Thu) by Mook (guest, #71173) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 9:52 UTC (Thu) by bojan (subscriber, #14302) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 13:49 UTC (Thu) by pbonzini (subscriber, #60935) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 23:05 UTC (Thu) by bojan (subscriber, #14302) [Link]
What glibc docs are talking about is that rename() is not implemented by copying content of the oldname to newname. So, if there was newname before rename and the directory commit doesn't go through, the content of newname will not be changed. It is a pure directory operation. On the other hand, if the directory gets committed, there will be just newname there, pointing to whatever content oldname had. All of that is if your FS knows how to survive a crash - otherwise situation is not interesting (well, unless you're the sysadmin recovering the mess :-).
Now note the situation from the ext4 "problem". The oldname content was not fsync()-ed to disk before the rename(). Ergo, when the directory got committed, oldname became newname on disk, pointing to zero bytes, due to delayed allocation. This has nothing to do with the fact that on unsuccessful (i.e. not committed before the crash) rename(), both oldname and newname would remain in the directory.
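The sequence being argued about in this sub-thread (write the new contents under a temporary name, fsync() them, then rename() over the old name) can be sketched in C as below. replace_file is a hypothetical helper, both paths are assumed to be on the same filesystem, and whether the fsync() is required for crash safety is precisely the point in dispute, so treat this as an illustration of the conservative pattern only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `data`, going through `tmp_path`.
 * Returns 0 on success, -1 on error. */
static int replace_file(const char *path, const char *tmp_path,
                        const void *data, size_t len)
{
    const char *p = data;
    int fd;

    assert(path != NULL && tmp_path != NULL);

    fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Push the new contents to disk before the name is switched. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Pure directory operation: running processes see old or new contents. */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

Dropping the fsync() call gives the variant that lost data with delayed allocation: the directory update could reach the disk before the file contents did.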
Glibc change exposing bugs
Posted Nov 12, 2010 7:12 UTC (Fri) by Mook (guest, #71173) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 20:07 UTC (Wed) by dlang (subscriber, #313) [Link]
if a userspace program does things that have been working, even if they weren't supposed to work, that's part of the ABI of the kernel and he is very reluctant to change anything, and will only do so when there is a _very_ compelling reason
Glibc change exposing bugs
Posted Nov 10, 2010 20:36 UTC (Wed) by JoeBuck (guest, #2330) [Link]
The existing memcpy implementation did copying in a forward direction, so it would give the expected result for memcpy(buf, buf + 4, 8) but a wrong result for memcpy(buf, buf - 4, 8). The change (in at least some circumstances) does the reverse, and both ways satisfy the spec, which says that src and dst must not overlap, and if they might, memmove should be used. Linus is apparently calling for the original implementation decision (forward, not backward) to be set in stone, even if a backward-copy might be faster on a particular processor. This doesn't seem right to me. However, it seems reasonable to provide a cleaner workaround until old code can be fixed (it might just be a cleaned-up version of his proposed LD_PRELOAD trick).
An alternative LD_PRELOAD, pointing to a memcpy that crashes for overlapping arguments, could be used to expose accidental misuse of the API.
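Which overlapping case a forward copy gets right can be checked with a hand-written loop. The sketch below uses a hypothetical forward_copy in place of memcpy, because calling the real memcpy on overlapping areas is undefined; it stands in for an implementation that copies from low to high addresses and is not glibc's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-written forward copy: bytes are moved from low to high addresses. */
static void forward_copy(char *dst, const char *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

static void forward_copy_demo(void)
{
    char down[] = "0123456789AB";
    char up[] = "0123456789AB";

    /* Same shape as memcpy(buf, buf + 4, 8): each source byte is read
     * before it is overwritten, so the result matches memmove. */
    forward_copy(down, down + 4, 8);
    assert(strcmp(down, "456789AB89AB") == 0);

    /* Same shape as memcpy(buf, buf - 4, 8): the tail of the source has
     * already been overwritten by the time it is read. */
    forward_copy(up + 4, up, 8);
    assert(strcmp(up, "012301230123") == 0);  /* memmove: "012301234567" */
}
```

A backward copy swaps the two outcomes, which is why code that happened to rely on "sliding down" broke when the direction changed.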
Glibc change exposing bugs
Posted Nov 10, 2010 20:51 UTC (Wed) by clugstj (subscriber, #4020) [Link]
Please don't attack strawmen. Thnx.
Posted Nov 10, 2010 22:47 UTC (Wed) by khim (subscriber, #9252) [Link]
The actual cite:
So in the kernel we have a pretty strict "no regressions" rule, and that if people depend on interfaces we exported having side effects that weren't intentional, we try to fix things so that they still work unless there is a major reason not to.
...
Regardless, it boils down to: we know the glibc change resulted in problems for real users. We do _not_ know that it helped anything at all.
Linus is OK with changes that break buggy programs (it happened before, it'll happen again) but only if there is a "major reason". What's the justification for this particular case?
Please don't attack strawmen. Thnx.
Posted Nov 10, 2010 23:17 UTC (Wed) by bojan (subscriber, #14302) [Link]
Linus couldn't play his favourite YouTube videos ;-)
Please don't attack strawmen. Thnx.
Posted Nov 11, 2010 1:31 UTC (Thu) by jonabbey (guest, #2736) [Link]
It's not, in fact, a bug. It's a feature.
Glibc change exposing bugs
Posted Nov 10, 2010 23:15 UTC (Wed) by charlieb (subscriber, #23340) [Link]
Does it? The man page says:
The memory areas should not overlap.
It does not say:
The memory areas must not overlap.
It also says:
The memcpy() function copies n bytes from memory area src to
memory area dest.
It doesn't say:
The memcpy() function copies n bytes from memory area src to
memory area dest, unless the memory areas overlap.
"should" provisions are not mandatory. Unless you decide to redefine the terminology.
Glibc change exposing bugs
Posted Nov 10, 2010 23:23 UTC (Wed) by bojan (subscriber, #14302) [Link]
> Use memmove(3) if the memory areas do overlap.
Glibc change exposing bugs
Posted Nov 10, 2010 23:24 UTC (Wed) by donwaugaman (subscriber, #4214) [Link]
If copying takes place between objects that overlap, the behavior is undefined.
In the context of standardese, that specifies that exactly anything can happen in the event of overlapping memory areas, with no 'should' or 'must' about it. The standard doesn't set down any rules that a developer must follow, only what will happen under certain conditions (in this case, the result is 'anything').
'must' and 'should' are more in the vein of RFCs.
Glibc change exposing bugs
Posted Nov 11, 2010 0:25 UTC (Thu) by nicooo (guest, #69134) [Link]
Glibc's info page says it's undefined. It's the official documentation but nobody uses info.
Glibc change exposing bugs
Posted Nov 11, 2010 0:28 UTC (Thu) by bojan (subscriber, #14302) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 0:33 UTC (Thu) by charlieb (subscriber, #23340) [Link]
Ideally the linux man-page will be clarified. "should" there seems just a recommendation. Not "your software will eat babies unless you do this".
Glibc change exposing bugs
Posted Nov 11, 2010 2:42 UTC (Thu) by butlerm (guest, #13312) [Link]
It's the official documentation but nobody uses info.
Glibc change exposing bugs
Posted Nov 11, 2010 6:41 UTC (Thu) by HelloWorld (guest, #56129) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 9:46 UTC (Thu) by mpr22 (guest, #60784) [Link]
For large manuals, my experience is that info merely sucks less than a man page; the user interface of both /usr/bin/info and /usr/bin/emacs -f info is horrible. For simple things, man wins by a country mile, because it doesn't slice-and-dice a simple program's documentation into 742 one-paragraph pages.
info considered harmful?
Posted Nov 12, 2010 14:08 UTC (Fri) by jzbiciak (subscriber, #5246) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 22:56 UTC (Thu) by HelloWorld (guest, #56129) [Link]
Glibc change exposing bugs
Posted Nov 12, 2010 18:19 UTC (Fri) by sorpigal (subscriber, #36106) [Link]
I've used pinfo and it helps some in the UI department, but I'd still use man over pinfo for almost every trivial lookup. If your goal is to completely replace man then your system needs to be a drop-in replacement from a user interaction point of view, with the advantages discoverable by users who are interested in learning them.
Glibc change exposing bugs
Posted Nov 12, 2010 18:45 UTC (Fri) by foom (subscriber, #14868) [Link]
Not really: "info" also searches the whole document if you hit /. (although I share the general dislike for the info browser).
Glibc change exposing bugs
Posted Nov 25, 2010 15:22 UTC (Thu) by Spudd86 (guest, #51683) [Link]
It'd be nice to have an info viewer that converts to HTML on the fly and uses webkit to render it.
Glibc change exposing bugs
Posted Nov 25, 2010 22:13 UTC (Thu) by paulj (subscriber, #341) [Link]
Glibc change exposing bugs
Posted Nov 26, 2010 0:31 UTC (Fri) by Spudd86 (guest, #51683) [Link]
Glibc change exposing bugs
Posted Nov 26, 2010 0:40 UTC (Fri) by sfeam (subscriber, #2841) [Link]
Glibc change exposing bugs
Posted Nov 26, 2010 1:33 UTC (Fri) by Spudd86 (guest, #51683) [Link]
Glibc change exposing bugs
Posted Nov 27, 2010 13:26 UTC (Sat) by paulj (subscriber, #341) [Link]
Glibc change exposing bugs
Posted Nov 12, 2010 10:36 UTC (Fri) by marcH (subscriber, #57642) [Link]
Glibc change exposing bugs
Posted Nov 12, 2010 5:01 UTC (Fri) by nicooo (guest, #69134) [Link]
Glibc change exposing bugs
Posted Nov 12, 2010 7:33 UTC (Fri) by paulj (subscriber, #341) [Link]
Glibc change exposing bugs
Posted Nov 12, 2010 10:42 UTC (Fri) by marcH (subscriber, #57642) [Link]
I find it too bad that a not-so-good default user interface is rebuffing users before they even start to see the nice features of the format. The fix is to promote alternative user interfaces, something I keep doing constantly (and which has already been done here).
Glibc change exposing bugs
Posted Nov 12, 2010 13:59 UTC (Fri) by HelloWorld (guest, #56129) [Link]
Glibc change exposing bugs
Posted Nov 12, 2010 20:10 UTC (Fri) by nicooo (guest, #69134) [Link]
Glibc change exposing bugs
Posted Nov 12, 2010 23:32 UTC (Fri) by Wol (guest, #4433) [Link]
Which is why I like man, and like pdf, and just curse profusely every time I'm exhorted to use info!
Cheers,
Wol
Glibc change exposing bugs
Posted Nov 12, 2010 13:52 UTC (Fri) by Wol (guest, #4433) [Link]
At least with man, I can scroll down (or search) until I find what I'm looking for.
info, on the other hand, "you are in maze of twisty little passages all alike". When presented with the instruction to "use info", I give up and use the web. When presented with a 1000-line man page, no problem ... :-)
Cheers,
Wol
Glibc change exposing bugs
Posted Nov 12, 2010 14:06 UTC (Fri) by HelloWorld (guest, #56129) [Link]
> At least with man, I can scroll down (or search) until I find what I'm looking for.
So you can with info. You can search the complete manual with the s key. The fact that you don't know this indicates you don't bother to read documentation at all really.
> info, on the other hand, "you are in maze of twisty little passages all alike".
If you had actually read the headings of the "twisty little passages", you would have found that they're really not alike at all. Alas, you don't seem to have bothered and decided to pointlessly whine about info instead.
Glibc change exposing bugs
Posted Nov 12, 2010 19:11 UTC (Fri) by bronson (subscriber, #4806) [Link]
Take a deep breath dude. Different people like different things.
Glibc change exposing bugs
Posted Nov 12, 2010 23:39 UTC (Fri) by Wol (guest, #4433) [Link]
The problem with that is if I can't articulate what I'm searching for. The number of times I've searched on what I think is the obvious search key, wasted half-an-hour or so doing it, then done a manual scroll through whatever I can find.
I then find what I'm looking for, and discover that it's called something (to me) extremely obscure, and doesn't mention my search term at all, etc etc.
Plus the fact that I'm one of those strange people who actually DOES tend to read documentation, from cover to cover, and likes to have a straight line path through it, not with redirects and jumps and god knows what all over the place. About the only place I can find information on info is in info - and if I find info repellent, how on earth am I going to find out how to use it if I have to use it to find out?
THERE is your problem with info - if you hate it because you can't find out how to use it, it's catch 22. You need to know how to use it to find out how to use it :-)
Cheers,
Wol
Glibc change exposing bugs
Posted Nov 13, 2010 0:42 UTC (Sat) by foom (subscriber, #14868) [Link]
Glibc change exposing bugs
Posted Nov 14, 2010 22:32 UTC (Sun) by nix (subscriber, #2304) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 7:31 UTC (Thu) by nix (subscriber, #2304) [Link]
Glibc change exposing bugs
Posted Nov 12, 2010 7:30 UTC (Fri) by hozelda (guest, #19341) [Link]
If you use Linux, the Linux documentation should be authoritative. Hopefully, it will agree with POSIX and C99 (or whatever is the latest memcpy standard) as much as possible. If there is a reason for a change (or to document a Linux bug) and you use Linux, I would pay attention to the Linux documentation and treat everything else as advisory. If you use Red Hat or whatever other distro, I would treat those docs as authoritative and not whatever other standard you think should apply.
A different matter is arguing about keeping Linux in sync with POSIX, etc, but if you want to build software that will work, short of maintaining your personal set of patches not accepted by upstream, you would probably want to code to "Linux" (at least for the Linux port).
Glibc change exposing bugs
Posted Nov 12, 2010 7:37 UTC (Fri) by hozelda (guest, #19341) [Link]
Glibc change exposing bugs
Posted Nov 14, 2010 21:19 UTC (Sun) by nix (subscriber, #2304) [Link]
(You might need to adjust for bits of older systems that are non-POSIX, but that is really quite rare these days unless you're aiming for some strange emulation layer like Cygwin. Also you might need to do byteorder detection and so forth, but, again, that's stuff which is left unspecified by POSIX. You should not generally have to use Linux-specific stuff unless you really want to, and you normally shouldn't want to.)
Glibc change exposing bugs
Posted Nov 14, 2010 22:03 UTC (Sun) by promotion-account (guest, #70778) [Link]
You should not generally have to use Linux-specific stuff unless you really want to, and you normally shouldn't want to.
I'm sure you know this, but for some applications, POSIX is not really enough. Thus, for example, the need for some portable abstraction libraries like libevent.
Glibc change exposing bugs
Posted Nov 14, 2010 23:08 UTC (Sun) by nix (subscriber, #2304) [Link]
(btw, your account name is... *interesting*.)
Glibc change exposing bugs
Posted Nov 15, 2010 1:37 UTC (Mon) by promotion-account (guest, #70778) [Link]
(btw, your account name is... *interesting*.)
That's descriptive anonymity :)
Readers usually give higher weight to subscribers' opinions here, so this handle honestly states that I'm a promoted guest.
Glibc change exposing bugs
Posted Nov 15, 2010 10:39 UTC (Mon) by nix (subscriber, #2304) [Link]
'Promotion' is a word with many meanings...
Glibc change exposing bugs
Posted Nov 15, 2010 8:13 UTC (Mon) by dlang (subscriber, #313) [Link]
most programs do not start off being written portably, usually portability is something that shows up after the program starts being used when people ask about using it on other platforms (and it's not uncommon for it to wait until those people asking submit patches)
not saying that this is right, just saying that it's the way things are. When Solaris dominated the same thing happened favoring it.
Glibc change exposing bugs
Posted Nov 11, 2010 0:30 UTC (Thu) by charlieb (subscriber, #23340) [Link]
What manpage is that? The memcpy(3) manpage on my CentOS4 box does not say "the behavior is undefined". Ah, I see that the memcpy(3p) one does.
> 'must' and 'should' are more in the vein of RFCs.
OK. But at least those are clear. "should" in the context of an API man page is not.
Glibc change exposing bugs
Posted Nov 11, 2010 12:23 UTC (Thu) by gidoca (subscriber, #62438) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 18:03 UTC (Thu) by donwaugaman (subscriber, #4214) [Link]
"The memory areas may not overlap."
... which sounds a little stronger than "should" to me.
Not sure why CentOS4 differs...
At any rate, arguing over the man pages is irrelevant to the standard - if the man pages don't match the standard, the man pages need to be fixed rather than the standard.
That being said, it would sure be nice to have some kind of formal deprecation of the previous behavior. One of the nice things about the free software world is that it should be more possible to make these kinds of changes because it's easier to change the programs whose assumptions worked OK with the previous behavior but are violated by the new behavior. Of course, with closed-source Flash players, that goes out the window, and it becomes a question of whether it is more important to pacify Adobe users or to give Adobe an incentive to clean up its software.
Glibc change exposing bugs
Posted Nov 11, 2010 0:04 UTC (Thu) by nix (subscriber, #2304) [Link]
> If copying takes place between objects that overlap, the behavior is undefined.
The behaviour of Linux (and Unix) systems in this area is governed by POSIX, not a random manpage. (And in this case POSIX is aligned with ISO C, and even uses the same phrasing.)
Glibc change exposing bugs
Posted Nov 10, 2010 21:33 UTC (Wed) by stijn (guest, #570) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 21:47 UTC (Wed) by jedbrown (guest, #49919) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 22:09 UTC (Wed) by stijn (guest, #570) [Link]
In a communal view of software production and use it seems a bit unthoughtful to push this through and let (less technical) users suffer. It makes the software and the makers look bad. It makes it worse if that is shrugged off in a disdainful manner.
Glibc change exposing bugs
Posted Nov 11, 2010 18:47 UTC (Thu) by oak (guest, #2786) [Link]
And it has been doing it for nearly a decade. And of course many other free memory debugging facilities like Duma (improved version of Electric Fence), mpatrol etc. produce these warnings too. As I would assume proprietary ones (on other platforms) to do also...
One could also define _FORTIFY_SOURCE to turn memcpy() etc into checking, slower versions. For more info, see:
* http://wiki.debian.org/Hardening
* https://wiki.ubuntu.com/CompilerFlags
Glibc change exposing bugs
Posted Nov 17, 2010 14:45 UTC (Wed) by meuh (subscriber, #22042) [Link]
Using -D_FORTIFY_SOURCE only enables checks for overflow when the source and destination lengths are known (or can be computed).
The _chk() variants of memset(), memcpy(), etc. do not check for overlap.
And one should know that GCC provides inline versions of such functions, so valgrind won't be able to overload them and provide stronger argument checking.
Glibc change exposing bugs
Posted Nov 17, 2010 19:08 UTC (Wed) by oak (guest, #2786) [Link]
> And one should know that GCC provides inline versions of such functions
Wasn't this article about Glibc memcpy(), not the GCC (libgcc?) one?
Anyway, AFAIK GCC does that only if code is compiled with optimizations. Valgrind and -O0 compiled code are, speed-wise, a pretty horrible combination though. Then it might be better to use one of the other memory debugging tools that don't do CPU emulation like Valgrind does...
Note that GCC doesn't inline its memcpy() code just for explicit (fixed size) memcpy() calls. The inlined version may also be used for assignments, and developers are able to mess up the addresses of variables used in things like this too:
struct foobar_t *a = arg1, *b = arg2; ... *a = *b;
(I found this issue with implicit GCC memcpy() when my code didn't have correct alignment for one of the above kind of pointers on a platform that required things to be properly aligned. Its triggering of a kernel alignment exception handler bug had me scratching my head until a more knowledgeable colleague came to the rescue... I think with overlapping pointer addresses the results may be even more mysterious, as they show up later.)
Glibc change exposing bugs
Posted Nov 10, 2010 21:51 UTC (Wed) by nix (subscriber, #2304) [Link]
My reaction here is the same as HJ's: if a function as speed-critical as memcpy() is made faster by a change, and it breaks overlapping memcpy()s, the fault is the overlapper for being bloody stupid. (However, if Linus is right that this isn't actually speeding anything up except perhaps in microbenchmarks, then the large size of the 'fast' memcpy() implementation becomes an issue.)
Glibc change exposing bugs
Posted Nov 10, 2010 19:41 UTC (Wed) by mrshiny (subscriber, #4266) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 19:49 UTC (Wed) by jwb (guest, #15467) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 20:01 UTC (Wed) by mrshiny (subscriber, #4266) [Link]
First of all, glibc supports proprietary software, which is why they allow proprietary software to link to it. So punishing certain programs for license choice by making subtle (and sometimes unjustifiable) changes to API seems like a highly passive-aggressive approach to their ideology.
Second of all, there MUST be a way to preserve backwards compatibility AND allow for future progress. Remember: all of your open-source programs which exhibit this bug are just as broken as Flash, except that if someone tracks it down in the Free software it can be fixed for future versions. But that doesn't help anyone who already has the software installed. I just can't imagine that the glibc people couldn't come up with ANY approach that works for everyone. Off the top of my head I can think of several; probably there are reasons why they are problematic, but in that bugzilla entry even Linus Torvalds was unable to prove that the glibc approach was warranted at all, let alone warranted for everyone all the time.
Glibc change exposing bugs
Posted Nov 10, 2010 22:49 UTC (Wed) by cesarb (subscriber, #6266) [Link]
This was not an API change. The memcpy() API has always been that the regions cannot overlap, and has been so for decades. This was just a change in the implementation details.
Glibc change exposing bugs
Posted Nov 11, 2010 0:33 UTC (Thu) by marcH (subscriber, #57642) [Link]
Agreed, let's not confuse API change with "change of undefined behaviour". It may look the same but it is different.
That's what symbol versioning is all about.
Posted Nov 10, 2010 23:03 UTC (Wed) by khim (subscriber, #9252) [Link]
GLibC does have such a mechanism: it's called ELF symbol versioning. But the policy does not cover cases like the one discussed: if it's a bug in a program (and the fact that the regions must not overlap is well documented... heck, it's the reason memmove(3) exists), then there will be no new version of the function.
The question of "do we actually want such a change" is a separate issue.
Glibc change exposing bugs
Posted Nov 10, 2010 20:05 UTC (Wed) by dlang (subscriber, #313) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 20:08 UTC (Wed) by jwb (guest, #15467) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 21:03 UTC (Wed) by fuhchee (guest, #40059) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 23:05 UTC (Wed) by patrick_g (subscriber, #44470) [Link]
>>> Linus doesn't even say what CPU he tested on
Glibc change exposing bugs
Posted Nov 11, 2010 0:26 UTC (Thu) by jwb (guest, #15467) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 5:01 UTC (Thu) by nicooo (guest, #69134) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 6:54 UTC (Thu) by madscientist (subscriber, #16861) [Link]
So it doesn't sound useless to me.
Glibc change exposing bugs
Posted Nov 11, 2010 9:42 UTC (Thu) by dgm (subscriber, #49227) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 12:00 UTC (Thu) by alankila (guest, #47141) [Link]
It would probably be a better idea for the majority of systems to just remove memcpy() and replace it with memmove(), which showed up with 0.17 %. Together, that would add up to 0.5 % at most, I suppose.
Glibc change exposing bugs
Posted Nov 11, 2010 16:26 UTC (Thu) by jedbrown (guest, #49919) [Link]
http://www.reddit.com/r/programming/comments/e4bq0/glibc_...
Glibc change exposing bugs
Posted Nov 11, 2010 0:57 UTC (Thu) by MattPerry (guest, #46341) [Link]
But we don't know if he ran his test on that computer. There's not enough information to tie the two together. It would help if Linus stated what CPU he used to run the test on.
Glibc change exposing bugs
Posted Nov 10, 2010 19:51 UTC (Wed) by mattdm (subscriber, #18) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 20:05 UTC (Wed) by gus3 (guest, #61103) [Link]
I learned the difference between memcpy() and memmove() in my very first C programming class. Adobe should be embarrassed that their programmers can't read documentation.
Glibc change exposing bugs
Posted Nov 11, 2010 12:38 UTC (Thu) by nye (guest, #51576) [Link]
That would be open source utopia I think.
Glibc change exposing bugs
Posted Nov 11, 2010 14:46 UTC (Thu) by nix (subscriber, #2304) [Link]
;}
Glibc change exposing bugs
Posted Nov 10, 2010 20:13 UTC (Wed) by jwb (guest, #15467) [Link]
It could also trap the overlapping memcpy and switch to one that has a lot of sleep() calls in the inner loop. That might alert the ignorant programmers of ghastly Adobe software to their API abuse.
Glibc change exposing bugs
Posted Nov 10, 2010 21:17 UTC (Wed) by lmb (subscriber, #39048) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 21:52 UTC (Wed) by nybble41 (subscriber, #55106) [Link]
The restrictions on memcpy() are hardly unique; *most* APIs do not tolerate overlapping memory regions. The memmove() routine is an exception. If you want a nice "safe" way to copy some data between buffers which may or may not overlap, and don't care so much about performance, just use memmove() everywhere.
While forward compatibility is a good thing in general, it is unreasonable for API developers to feel bound to support obvious *misuse* of their APIs which directly contradicts explicit API documentation, which is exactly what is happening here. Given that any broken applications can be trivially patched with a simple LD_PRELOAD, I see no reason not to permit this change to the internal implementation of memcpy() in glibc.
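To make the memcpy()/memmove() distinction concrete, here is a minimal sketch in C. The helper name shift_right_by_one is hypothetical and not from this thread. It opens a one-byte gap at the front of a buffer, which means the source and destination regions overlap, so memmove() is the call whose behaviour is defined.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: move the first len bytes of buf one position
 * to the right, so buf[1..len] receives the old buf[0..len-1].
 * The caller must provide room for at least len + 1 bytes.
 * Source and destination overlap here, so memmove() is required;
 * calling memcpy() with these arguments would be undefined behaviour. */
static void shift_right_by_one(char *buf, size_t len)
{
    memmove(buf + 1, buf, len);
}
```

With memcpy() in place of memmove() this might appear to work with one libc and corrupt the buffer with another, depending on the direction in which the implementation happens to copy.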
Glibc change exposing bugs
Posted Nov 10, 2010 22:18 UTC (Wed) by lmb (subscriber, #39048) [Link]
But the code worked so far - when users upgrade their glibc, suddenly their programs break, or possibly corrupt the user's data. How's that good?
Yes, it pushes users to complain to the developer (if they still can, and their e-mail/internet bits aren't affected), but it leaves them with a bitter aftertaste for the platform/ecosystem that forces developers to fix bugs at their users' expense.
The code should start with emitting a warning to the logs (once per program run, otherwise it becomes a DoS). The compiler could start warning if it detects the possibility of this happening (or coverity/valgrind etc all can). Possibly taunt developers publicly if you spot those messages in your logs.
But breaking underneath an unsuspecting user? Horribly, horribly wrong.
Glibc change exposing bugs
Posted Nov 10, 2010 22:34 UTC (Wed) by jwb (guest, #15467) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 23:59 UTC (Wed) by lmb (subscriber, #39048) [Link]
Knowingly introducing a change with consequences that aren't just mere crashes, but data corruption, for end users - if you cannot see how that is wrong, I have no idea how to help explain it.
Yes, of course, the performance achievement is worth having. However, not at this cost. Not without first adding some audit logging. Not without giving developers time to fix that. It is an incompatible change in the ABI.
I can read the man page as well as you can. Sure, the applications are buggy; that does not give one the right to corrupt user data. Such changes need to be phased in carefully; not in a "I am more righteous than you" style. It is bad enough when it happens by accident; intentionally, it is malpractice.
Glibc change exposing bugs
Posted Nov 11, 2010 0:05 UTC (Thu) by bojan (subscriber, #14302) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 16:35 UTC (Thu) by jedbrown (guest, #49919) [Link]
Glibc change exposing bugs
Posted Nov 12, 2010 0:02 UTC (Fri) by bojan (subscriber, #14302) [Link]
glibc-2.12.90-18.x86_64
glibc-2.12.90-18.i686
This is an unreleased version of glibc. Fedora does this from time to time - ship early cuts of new glibc (this will be 2.13 one day).
Glibc change exposing bugs
Posted Nov 11, 2010 0:31 UTC (Thu) by jwb (guest, #15467) [Link]
You are proposing that it would be wise to have a test and branch at every entry into memcpy? That is madness.
Glibc change exposing bugs
Posted Nov 11, 2010 1:52 UTC (Thu) by gus3 (guest, #61103) [Link]
if ((p1 + length <= p2) || (p2 + length <= p1)) {
crash_and_burn();
}
It is not a sophisticated test, and the more noise it makes about buggy parameters, the sooner the calling code will get fixed.
Glibc change exposing bugs
Posted Nov 11, 2010 2:05 UTC (Thu) by gus3 (guest, #61103) [Link]
if ((p1 + length >= p2) || (p2 + length >= p1)) {
crash_and_burn();
}
But goofing the test, doesn't mean the test isn't simple.
Glibc change exposing bugs
Posted Nov 11, 2010 2:12 UTC (Thu) by gus3 (guest, #61103) [Link]
I see the actual test in my head, but I can't code it right now due to fatigue. But even with all necessary calculations, being integer math, it'll take no more than a few tens of cycles. Even on a register-starved x86, putting a couple temporary variables on the stack will only pollute the cache, before over-writing the temps anyway. It shouldn't take more than a microsecond to check for overlap.
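For reference, one way the overlap test being attempted above could be written is sketched below. The name regions_overlap is hypothetical, and the sketch assumes a flat address space in which converting pointers to uintptr_t gives comparable integers (that conversion is implementation-defined in ISO C). Two regions of the same length overlap exactly when the distance between their start addresses is smaller than that length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if [a, a+n) and [b, b+n) share a byte.
 * Comparing the distance between the start addresses with n avoids
 * computing a + n or b + n, which could wrap around at the top of
 * the address space. Zero-length regions never overlap. */
static bool regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t gap = pa < pb ? pb - pa : pa - pb;

    return gap < n;
}
```

This is a compare, a subtract and one more compare, which supports the claim that the test itself is cheap; whether even that cost is acceptable on every memcpy() call is the point under dispute in this thread.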
Glibc change exposing bugs
Posted Nov 11, 2010 7:24 UTC (Thu) by nix (subscriber, #2304) [Link]
There is absolutely no chance that the glibc devs would ever accept this except in the -lc_g version of the library (which nobody ever uses).
Glibc change exposing bugs
Posted Nov 12, 2010 23:04 UTC (Fri) by gus3 (guest, #61103) [Link]
Glibc change exposing bugs
Posted Nov 14, 2010 22:29 UTC (Sun) by nix (subscriber, #2304) [Link]
The GNU people should own up to having violated the documentation on their code.
What on earth? The relevant documentation for memcpy() is ISO C, incorporated by reference into all versions of POSIX.1. This clearly states "If copying takes place between objects that overlap, the behavior is undefined."
This isn't an obscure or hard-to-interpret part of the Standard. Undefined, bang, that's it. Perhaps you are operating under the misapprehension that the linux manpages project, a descriptive effort, not a prescriptive one, is in some way binding on glibc? It isn't. It really isn't. It isn't binding on anything.
Glibc change exposing bugs
Posted Nov 15, 2010 0:00 UTC (Mon) by promotion-account (guest, #70778) [Link]
What on earth? The relevant documentation for memcpy() is ISO C, incorporated by reference into all versions of POSIX.1.
Indeed.
Linux man-pages are only authoritative for the kernel system-calls (more precisely, their glibc thin layer). The rest of the APIs are only included for convenience: they are a secondary source to the primary source references residing in the 'CONFORMING TO' section.
Glibc change exposing bugs
Posted Nov 15, 2010 0:31 UTC (Mon) by nix (subscriber, #2304) [Link]
Linux man-pages are only authoritative for the kernel system-calls (more precisely, their glibc thin layer).
No, even those are descriptive. Perhaps the glibc texinfo documentation would be authoritative for that, if it was ever maintained by anyone. As it is, I think only Ulrich and Roland's brains are authoritative for glibc.
Glibc change exposing bugs
Posted Nov 15, 2010 1:27 UTC (Mon) by promotion-account (guest, #70778) [Link]
For good or bad, these manpages are the 'most primary' sources available for such topics, only beside the code.
But unfortunately these man-pages do not always exist. I once had to carefully study the bluez userspace code to know how to best interface with the kernel Bluetooth API (undocumented AF_BLUETOOTH sockets, undocumented netlink interfaces, etc).
Glibc change exposing bugs
Posted Nov 15, 2010 10:38 UTC (Mon) by nix (subscriber, #2304) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 18:52 UTC (Thu) by oak (guest, #2786) [Link]
Glibc change exposing bugs
Posted Nov 15, 2010 0:14 UTC (Mon) by promotion-account (guest, #70778) [Link]
memmove() has this check you're clamoring for... And if the given areas don't overlap, it calls memcpy().
Sometimes even if the areas do overlap, it calls memcpy(). This happens if the library has an internal knowledge about memcpy()'s copying direction.
A common example is having src > dst, copying is forward, and the CPU block transfer unit is smaller than or equal to (src - dst). x86-64 CPUs support copying up to 8-byte blocks in one opcode (movsq), assuming no floating-point ops in use, which is usually the case with kernel code.
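The direction rule described above can be sketched as a byte-at-a-time memmove(). This is an illustration only, not glibc's implementation, and the name sketch_memmove is hypothetical. It also assumes that comparing the two pointers is meaningful, which holds on flat-address-space platforms but is not guaranteed by ISO C for pointers into unrelated objects.

```c
#include <stddef.h>

/* Hypothetical byte-at-a-time memmove() sketch (not glibc's code).
 * A forward copy is safe when the destination starts below the
 * source; a backward copy is safe when it starts above it. */
static void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d < s) {
        /* Each source byte is read before the copy overwrites it. */
        while (n--)
            *d++ = *s++;
    } else if (d > s) {
        /* Start from the end so the tail of src is read first. */
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }
    /* d == s: nothing to do. */
    return dst;
}
```

A real implementation copies in wider units, which is where the condition on the block transfer size relative to (src - dst) comes in: a forward copy in 8-byte units is only safe for overlapping regions if each block is read in full before any of it is overwritten.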
Glibc change exposing bugs
Posted Nov 12, 2010 6:34 UTC (Fri) by jmm82 (guest, #59425) [Link]
Glibc change exposing bugs
Posted Nov 12, 2010 22:56 UTC (Fri) by gus3 (guest, #61103) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 11:13 UTC (Thu) by marcH (subscriber, #57642) [Link]
No it is not, unless "defined-undefined behaviour" has now become part of Interfaces.
By using the wrong name you are trying to sidestep all the nuances of this problem. Unfair tactics lowering your credibility.
Glibc change exposing bugs
Posted Nov 11, 2010 12:52 UTC (Thu) by nye (guest, #51576) [Link]
Deterministic observed behaviour, like it or not, will always be considered a part of the ABI.
This is why the kernel goes out of its way to preserve observed but undocumented behaviour, and one of the reasons Windows is wildly successful despite its numerous design flaws is that Microsoft agrees.
If a change breaks existing software, then it's a regression. Hand-wringing, finger-pointing, and bitter recriminations about 'proprietary crapware' are all irrelevant. Something worked. Now it doesn't.
From the comments on this it sounds like symbol versioning could be used to avoid this problem altogether, while still getting the benefit for newly built applications. Developers don't want to do this because they feel that it will benefit only proprietary software[0]. Of course the only people harmed by this attitude are end users.
This is just yet another case where open source software chooses politics over technical excellence, which is sad but entirely unsurprising.
[0] Disregarding the idea that one might want to use some open source software with a similar bug that hasn't yet been fixed - most developers seem to always want to run the latest bleeding edge version of everything, and don't understand that the rest of the world isn't like that and expects existing software not to break unexpectedly.
Glibc change exposing bugs
Posted Nov 11, 2010 18:27 UTC (Thu) by donwaugaman (subscriber, #4214) [Link]
Oddly enough, I would consider "technical excellence" to mean fixing bugs in software that has them, in this case the Adobe Flash player, whereas "politics" means allowing poorly-written precedent to trump (and in this case penalize) better performance for programs written with an eye to the standard.
It's a shame there's no way to get Adobe to do an 's/memcpy/memmove/' on their codebase. But the fact that they won't let others do it has more to do with their politics (and opposition to software freedom) than about technical excellence.
Glibc change exposing bugs
Posted Nov 11, 2010 18:32 UTC (Thu) by xilun (guest, #50638) [Link]
Nonsense. The definition of the ABI is NOT any random characteristic that would please you by making others responsible for your own errors.
First, if you want to program in a language (and its associated standard library) that is not full of undefined behaviors, then you don't program in C.
If you do program in C, *YOU* are responsible for respecting preconditions. The system will *NOT* magically fix your bugs for you. Glibc developers are not responsible for the Flash software package, and this is not a free software versus non-free software problem; they are also not responsible for other random pieces of free software, even crappy ones.
You are reversing the roles, giving the bad one to the developers of the very high quality, standards-compliant piece of code that is glibc, and the good one to the constant, notable piece of crap that is the Flash player.
And even if you could be a dictator for the glibc project, please explain to us what policy you would then impose to *concretely* solve the general problem of unintended interactions between software components. Nobody has ever solved that. Even Microsoft, which you seem to cherish so much, has for *years* (maybe even decades now) asked third-party developers to ship the MS libc that the third-party developer tested his application with. So of course most of the time you don't have this problem of a suddenly changing libc under MS Windows, because each program has its own libc. Now what happens in case some version contains a security exploitable flaw? Security efforts are duplicated in such an environment. (And this is just an example.)
Imagine a random application depends on a BUG of a particular version of the glibc. Because of that, you are asking for this bug to remain forever? Nonsense. This is what you call "technical excellence"? What a joke. It would mean freezing libraries forever, because that's the only way you can guarantee in the shared library model that the behavior of any random piece of crappy software won't change too much.
If you're a third-party developer who only cares about making your proprietary app sort of work even when you write highly faulty, unacceptable-quality code, please: 1/ ship it with a frozen version of the library it needs, like you do under Windows anyway 2/ leave the developers of libraries who are trying to improve them alone, and especially do not blame YOUR mistakes on them.
I'm not sure nye meant all regressions need to be avoided.
Posted Nov 12, 2010 2:50 UTC (Fri) by gmatht (subscriber, #58961) [Link]
This seems almost a strawman. Criticize Microsoft's policy if you want, but all nye suggested was using symbol versioning for this particular known-to-be-dangerous change. This wouldn't cause any security issues (and may even avoid some).
I agree that it isn't possible to avoid every regression. For example, newer software often has performance regressions on old hardware. However, this seems like a particularly serious regression, so if there was an easy way to stop old versions of software silently corrupting data it may be worth taking.
I'm not sure nye meant all regressions need to be avoided.
Posted Nov 12, 2010 11:13 UTC (Fri) by xilun (guest, #50638) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 0:00 UTC (Thu) by nix (subscriber, #2304) [Link]
The same will happen here.
Glibc change exposing bugs
Posted Nov 11, 2010 0:08 UTC (Thu) by lmb (subscriber, #39048) [Link]
Of course it will cause the code to be fixed. But that the maintainers of the core system library place "I am right" above users' data is a worrying insight.
Glibc change exposing bugs
Posted Nov 11, 2010 0:26 UTC (Thu) by bojan (subscriber, #14302) [Link]
Now you're making glibc maintainers responsible for other people's bugs. They are not.
If this same buggy program was linked against some other library that implements memcpy() similarly to the way latest glibc does, the data would be just as corrupt.
In essence, it is the program that is corrupting the data, not glibc. And it's doing so by clear misuse of a function.
> But that the maintainers of the core system library place "I am right" above users' data is a worrying insight.
I think that's a bit overly dramatic. Fedora 14 is a fresh release, currently carrying a non-released version of glibc. As such, users of it (which includes me) sometimes encounter things that are surprising at first. But the audience is limited and the impact is not earth shattering.
Glibc change exposing bugs
Posted Nov 11, 2010 10:53 UTC (Thu) by nix (subscriber, #2304) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 22:55 UTC (Wed) by clugstj (subscriber, #4020) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 23:56 UTC (Wed) by nix (subscriber, #2304) [Link]
Emitting a warning to the logs is far too expensive: this stuff is so often called that the compiler sometimes open-codes it! Adding a conditional in there that isn't absolutely needed would have horrible effects on performance.
(And as for detecting it at compile time, well, sure! It requires whole-program optimization of every single program and all its shared libraries, and even then detecting it reliably reduces to solving the halting problem. This seems to be rather harder than just valgrinding the bloody thing and learning elementary C before you write it.)
(Sure, there will be actual bugs teased out by this: code that didn't expect to receive overlapping regions when it was written, but that now is. But, guess what? I bet those overlapping copies were causing other bugs, because it is surely rare for code to just memcpy() from region A to *unexpectedly*-overlapping region B and then never do anything with A again.)
Glibc change exposing bugs
Posted Nov 12, 2010 14:05 UTC (Fri) by Wol (guest, #4433) [Link]
So that if any programmer is doing what they should, the system is going to fail under test.
Cheers,
Wol
Glibc change exposing bugs
Posted Nov 10, 2010 20:06 UTC (Wed) by HappyCamp (subscriber, #29230) [Link]
From:
https://bugzilla.redhat.com/show_bug.cgi?id=638477#c74
H.J. Lu 2010-11-10 15:00:40 EST
Comment 74
64bit Fedora 14 is pretty much broken on machines with
SSE4.2. I ran into random crashes with 64bit Fedora 14 on
Intel Core i7. It turns out that 64bit strncasecmp
is broken on machines with SSE 4.2:
Glibc change exposing bugs
Posted Nov 10, 2010 20:21 UTC (Wed) by MisterIO (guest, #36192) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 20:24 UTC (Wed) by jwb (guest, #15467) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 20:51 UTC (Wed) by MisterIO (guest, #36192) [Link]
But anyway, my argument was not just !assembly, it was also not so much assembly. Look at the one proposed by Linus:
void *memcpy(void *dst, const void *src, size_t size)
{
void *orig = dst;
asm volatile("rep ; movsq"
:"=D" (dst), "=S" (src)
:"0" (dst), "1" (src), "c" (size >> 3)
:"memory");
asm volatile("rep ; movsb"
:"=D" (dst), "=S" (src)
:"0" (dst), "1" (src), "c" (size & 7)
:"memory");
return orig;
}
It may not be all that well tested, but it's simple enough to be comprehensible.
Glibc change exposing bugs
Posted Nov 10, 2010 21:15 UTC (Wed) by jwb (guest, #15467) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 21:20 UTC (Wed) by JoeBuck (guest, #2330) [Link]
No, the semantics of the function are that the behavior is not defined if the source and destination strings overlap, as the relevant standards and the man page clearly state. That's why there's an alternative function named memmove. If you write C, call memcpy, and the arguments overlap, you've written a non-portable program.
Glibc change exposing bugs
Posted Nov 10, 2010 21:55 UTC (Wed) by nix (subscriber, #2304) [Link]
(I know you know this, this is really for others reading)
Glibc change exposing bugs
Posted Nov 11, 2010 1:19 UTC (Thu) by tialaramex (subscriber, #21167) [Link]
[ Sadly I don't trust the Adobe developers enough to imagine that a diagnostic from static analysis would have stopped them doing this. I think "Warning: abuse of memcpy()" would have scrolled by with hundreds of other warnings they ignore... ]
In the same way "Don't frabidulate the wugs, she's my uncle" is an English sentence, but it isn't clear what it means. You can parse it, and you can answer some questions about it, e.g. "Are you being asked to frabidulate the wugs?" but there are big unknowns.
Glibc change exposing bugs
Posted Nov 11, 2010 7:21 UTC (Thu) by nix (subscriber, #2304) [Link]
This code doesn't break any of the rules of the C language which would cause it not to parse.
It is true that the standard does not require a diagnostic in this case, and that providing a diagnostic in all cases at compile time is impossible, but that doesn't make it any less 'not C'. C is not just 'what the compiler happens to accept'.
(I completely agree that in this particular case a warning would likely have been useless unless accompanied by a brickbat.)
btw, HJ's trying to fix the underlying problem here.
Glibc change exposing bugs
Posted Nov 10, 2010 21:21 UTC (Wed) by MisterIO (guest, #36192) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 22:04 UTC (Wed) by joib (subscriber, #8541) [Link]
1) Less I$ pollution. You won't see this in a memcpy() benchmark, but what about a more realistic workload?
2) Give some incentive to CPU makers to optimize the simple rep mov instead of requiring ever more fancy unrolled loops written in the latest instruction set extension. :)
Glibc change exposing bugs - a bug in proposed memcpy
Posted Nov 16, 2010 16:45 UTC (Tue) by promotion-account (guest, #70778) [Link]
Look at the one proposed by Linus:
void *memcpy(void *dst, const void *src, size_t size)
{
    void *orig = dst;
    asm volatile("rep ; movsq"
        :"=D" (dst), "=S" (src)
        :"0" (dst), "1" (src), "c" (size >> 3)
        :"memory");
    asm volatile("rep ; movsb"
        :"=D" (dst), "=S" (src)
        :"0" (dst), "1" (src), "c" (size & 7)
        :"memory");
    return orig;
}
For completeness, this should have an "rcx" clobber, or GCC may believe that this important register will not change after each assembly snippet. Such a bug may get triggered if GCC aggressively inlined the code, which occurs in a good number of cases given its optimizer competency.
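One hedged way to express that: GCC does not accept a clobber that overlaps an input operand, so the usual approach is to make the count a read-write operand ("+c") rather than adding a clobber. The sketch below assumes x86-64 with GCC-style inline asm and a clear direction flag (as the ABI requires at function entry); the name `rep_movs_copy`, the local count variables and the byte-loop fallback for other targets are additions for illustration, not part of the proposal above.

```c
#include <stddef.h>

/* Hypothetical name, chosen to avoid clashing with the libc memcpy. */
static void *rep_movs_copy(void *dst, const void *src, size_t size)
{
    void *orig = dst;
#if defined(__x86_64__) && defined(__GNUC__)
    size_t qwords = size >> 3;   /* number of 8-byte moves */
    size_t tail = size & 7;      /* leftover bytes */

    /* "+D", "+S" and "+c" tell the compiler that rdi, rsi and rcx are
     * both read and modified by the instruction. */
    __asm__ volatile("rep ; movsq"
                     : "+D"(dst), "+S"(src), "+c"(qwords)
                     :
                     : "memory");
    __asm__ volatile("rep ; movsb"
                     : "+D"(dst), "+S"(src), "+c"(tail)
                     :
                     : "memory");
#else
    /* Portable fallback so the sketch builds on other targets. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (size--)
        *d++ = *s++;
#endif
    return orig;
}
```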
--Darwish
Glibc change exposing bugs
Posted Nov 10, 2010 20:45 UTC (Wed) by Rubberman (guest, #70320) [Link]
[quote]
The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas
should not overlap. Use memmove(3) if the memory areas do overlap.
[/quote]
Nice timing!
Posted Nov 10, 2010 21:24 UTC (Wed) by proski (subscriber, #104) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 21:25 UTC (Wed) by ikm (subscriber, #493) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 21:47 UTC (Wed) by jwb (guest, #15467) [Link]
Gentoo is fairly mainstream
Posted Nov 10, 2010 23:37 UTC (Wed) by alex (subscriber, #1355) [Link]
Glibc change exposing bugs
Posted Nov 10, 2010 23:56 UTC (Wed) by gerdesj (subscriber, #5446) [Link]
Cheap shot.
gcc version 4.4.5 (Gentoo 4.4.5 p1.0, pie-0.4.5)
But I do get the choice of something else if I want it - not what is rammed down my throat by the "mainstream".
I also get to support it ...
On the bright side, if your statement is true about release by Gentoo then I get a better chance of Flash working than you do - oh look no snags with Youtube audio.
Cheers
Jon
Glibc change exposing bugs
Posted Nov 14, 2010 5:33 UTC (Sun) by dirtyepic (subscriber, #30178) [Link]
(Gentoo toolchain dev)
Glibc change exposing bugs
Posted Nov 10, 2010 21:54 UTC (Wed) by joib (subscriber, #8541) [Link]
Glibc change exposing bugs
Posted Nov 17, 2010 15:02 UTC (Wed) by meuh (subscriber, #22042) [Link]
The real problem here
Posted Nov 10, 2010 23:38 UTC (Wed) by bojan (subscriber, #14302) [Link]
The real problem here
Posted Nov 11, 2010 12:58 UTC (Thu) by nye (guest, #51576) [Link]
Which would of course be a great shame, because avoidably breaking existing applications is wrong, regardless of whether that program has a hidden bug in it or not.
The real problem here
Posted Nov 11, 2010 18:41 UTC (Thu) by xilun (guest, #50638) [Link]
Two schools
Posted Nov 12, 2010 13:01 UTC (Fri) by tialaramex (subscriber, #21167) [Link]
You can get yourself tied in some terrible knots this way. Chen's blog The Old New Thing is currently documenting how starting from [let's not annoy CP/M programmers] got them to [making an OS component optional causes security vulnerabilities in third party programs], over the course of a decade or so. Every step along the way is completely rational but the result is a confusing, insecure mess that's hard to reform.
But the alternative school, where everything not tied down and documented is up for grabs, and the tied down stuff might be cut loose and "deprecated" with relatively little notice, causes its fair share of problems, as we've seen with the thread's topic.
Let me say this: It is very far from clear which of the alternatives here is better for anyone, from users to developers to OS vendors, let alone which would be best for all.
Two schools
Posted Nov 12, 2010 17:48 UTC (Fri) by jzbiciak (subscriber, #5246) [Link]
I can just imagine trying to convince the glibc folks to autodetect SimCity to dynamically change how free() works.
The real problem here
Posted Nov 19, 2010 2:14 UTC (Fri) by linuxrocks123 (subscriber, #34648) [Link]
---linuxrocks123
Glibc change exposing bugs
Posted Nov 11, 2010 0:32 UTC (Thu) by kunitz (subscriber, #3965) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 0:44 UTC (Thu) by bojan (subscriber, #14302) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 7:20 UTC (Thu) by kunitz (subscriber, #3965) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 18:23 UTC (Thu) by oak (guest, #2786) [Link]
What would be the point of slowing down memcpy for all CPUs (by adding an extra check for cached CPU type variable value)? As long as the change doesn't slow down things for other CPUs, and considerably speeds it up on some, it sounds fine...
Glibc change exposing bugs
Posted Nov 12, 2010 0:37 UTC (Fri) by jamesh (guest, #1159) [Link]
Glibc change exposing bugs
Posted Nov 12, 2010 10:51 UTC (Fri) by marcH (subscriber, #57642) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 2:06 UTC (Thu) by gmaxwell (guest, #30048) [Link]
That the error was found in Flash, of all places, is not surprising. It's also scary that no one at Adobe has been running Flash in valgrind (but also not surprising).
Glibc change exposing bugs
Posted Nov 11, 2010 5:33 UTC (Thu) by PaulWay (subscriber, #45600) [Link]
Interesting that no-one's suggested we fix the obviously ambiguous wording in the man page. It seems that trusting the C programmer to know the difference between memcpy and memmove - whose names do _not_ imply anything about their behaviour - is a bad thing. Rusty's Hierarchy of API Design scores another victim, and yet no-one wants to fix either the API, the documentation or the behaviour.
Interesting that the question of why backwards-copying is necessary remains (AFAICS) unanswered. Has anyone actually tested whether the loop can be written with a forwards-copy and whether it performs better or worse than the backwards-copy and/or the Linus brute-force method?
Interesting that everyone who doesn't want to change memcpy to do checks or warn or anything asserts, without much actual evidence, that it would be a Bad Thing. Citation needed, or at least some crude benchmarks or numbers.
In my uninformed opinion it would be better to have one version of memcpy (memmove implies that the memory is absent from the source once completed, which is not true) that does the checks. The tiny overhead will be nothing compared to the page faults you're almost certainly incurring with repeated use. Run some tests, real world or otherwise, to see whether it really makes any difference. If it doesn't, make memmove a defined alias for memcpy, update the documentation, and everyone wins. The API remains the same, badly written applications don't die because of an underlying implementation change, lazy programmers have their arses saved, everyone wins.
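For what it's worth, a rough sketch of what "one version that does the checks" could look like. The names `regions_overlap` and `copy_checked` are hypothetical, and comparing addresses through uintptr_t assumes a flat address space, which holds on the usual Linux targets:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if [a, a+n) and [b, b+n) share at least one byte, else 0. */
static int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;

    if (n == 0)
        return 0;
    return pa < pb ? pb - pa < n : pa - pb < n;
}

/* Hypothetical single entry point: takes the overlap-safe path only
 * when the arguments actually require it. */
static void *copy_checked(void *dst, const void *src, size_t n)
{
    if (regions_overlap(dst, src, n))
        return memmove(dst, src, n);
    return memcpy(dst, src, n);
}
```

The added cost per call is a subtraction and a comparison; whether that is measurable against the copy itself is the thing to benchmark.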
But what do I know? I'm still writing the test.
Have fun,
Paul
Glibc change exposing bugs
Posted Nov 11, 2010 9:54 UTC (Thu) by mpr22 (guest, #60784) [Link]
That particular kind of lazy programmer deserves to lose, unfortunately.
Glibc change exposing bugs
Posted Nov 12, 2010 0:17 UTC (Fri) by PaulWay (subscriber, #45600) [Link]
I personally think we should do more tests with this kind of thing. Use the same LD_PRELOAD trick that Linus used to fix the problem to see if any other applications are assuming that memcpy will work on overlapping regions. See if we can find any other little abuses of standards, or ambiguities in them, that might catch us out in the future. And not just fix the code, but fix the standard. No-one, I hope, is saying that people should be using memcpy as if it were overlap-safe - just that existing code which does is sort of exempt from criticism.
Ah well, maybe we'll all look back on this and laugh.
Have fun,
Paul
Glibc change exposing bugs
Posted Nov 12, 2010 7:41 UTC (Fri) by hozelda (guest, #19341) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 12:15 UTC (Thu) by alankila (guest, #47141) [Link]
But sadly, I don't think it is possible to make people here accept that simply aliasing memcpy() to memmove() is actually the best solution. I bet the difference wouldn't show in anything but carefully constructed microbenchmarks, and yet we would be able to squash a whole class of bugs at once.
However, I do believe that the best use of increased CPU power is to spend it on simplifying the system, because that allows raising the complexity bar somewhere else higher up. (I believe in complexity budget: a finite number of things are possible. You are best off spending that complexity budget on features close to the user than on those close to the metal, so making people not have to care about difference for memmove() vs. memcpy() allows them to spend time caring about something that's far more useful.)
Glibc change exposing bugs
Posted Nov 11, 2010 17:24 UTC (Thu) by RobSeace (subscriber, #4435) [Link]
Well, actually, it IS kind of true... If the regions overlap (which is the sole
point of using memmove()), then it implies that the source region will indeed
no longer contain the data it previously contained, since at least part of it
would've been overwritten by the memmove() to the (overlapping) destination
region...
I side with the glibc people: any C programmer worth a damn knows better than
to use memcpy() on overlapping regions... Anyone that does so is writing known
buggy code that will fail to work on many systems... That it just happened to
have worked by chance until now on glibc doesn't matter a bit... There are
lots of subtle bugs you can make that appear to work fine until something
changes and exposes them... You see it in buffer overflows all the time; if
you overflow just a small amount and there's some meaningless variable there
in memory that you overflow into, no harm done, no crashing, no noticeable bad
behavior at all... But, compile with a different optimization level, or change
the code in a certain way, and BAM!, that variable is no longer there to catch
the overflow, and you end up trashing something important... Now, are you
seriously going to say GCC should support such obviously buggy code by making
sure to always continue laying out variables in memory just as it did the
first time, so that the overflow causes no harm? If not, then how is this
glibc change any different at all?
Glibc change exposing bugs
Posted Nov 11, 2010 11:01 UTC (Thu) by slashdot (guest, #22014) [Link]
If the data is less than 128 bytes, it can be just all read into SSE2 registers and then written out, which handles overlap fine.
Otherwise, you can just check (size_t)(src - dst) >= (size_t)length, which shouldn't be that expensive compared to the copy.
But anyway, why is a backward copy supposed to be faster? It would seem pretty silly to design a CPU such that copies are better done backwards.
Perhaps just converting the new algorithm to a forward copy would give the same improvements?
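A sketch of that single-comparison check, used to pick a copy direction. The name `copy_pick_direction` is hypothetical and plain byte loops stand in for the SSE code under discussion. Here the unsigned difference is taken as dst - src, which is the condition under which a forward copy is safe; the src - dst form is the mirror-image test for a backward copy.

```c
#include <stddef.h>
#include <stdint.h>

static void *copy_pick_direction(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Unsigned distance from src up to dst. It wraps to a huge value
     * when dst is below src, so one comparison covers both "dst below
     * src" and "dst at or beyond the end of src". */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        for (size_t i = 0; i < n; i++)     /* forward copy is safe */
            d[i] = s[i];
    } else {
        for (size_t i = n; i > 0; i--)     /* dst inside [src, src+n) */
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```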
Glibc change exposing bugs
Posted Nov 11, 2010 11:03 UTC (Thu) by slashdot (guest, #22014) [Link]
Glibc change exposing bugs
Posted Nov 11, 2010 12:25 UTC (Thu) by NikLi (guest, #66938) [Link]
There is also a big advantage by doing that: hopefully gcc in some cases can detect the alignment of pointers at compile-time and use even faster variants, which is even more important.
At least we hope that the gcc devs will remain sane (inclusion of "go" frontend is scary knowing that google tends to withdraw services and software without much thought (wave, etc))...
Glibc change exposing bugs
Posted Nov 11, 2010 14:49 UTC (Thu) by nix (subscriber, #2304) [Link]
Go is not just 'google': Go (in GCC) is Ian Lance Taylor, who is a very-long-standing GCC hacker who doesn't have a record for abandonware (hell, he put out a new release of Taylor UUCP not too long ago, and how old is *that*?)
Glibc change exposing bugs
Posted Nov 12, 2010 14:32 UTC (Fri) by Wol (guest, #4433) [Link]
Bear in mind Intel processors are arse-about-face (otherwise known as little-endian). Running on a big-endian processor, there is a clear "top" and "bottom". So we can define forwards and backwards.
But on Intel, let's say I want to write the number 1,234,567,890. And my processor has a 3-digit word size. It actually physically exists in the system as 890,567,234,1 ! So where's the top, bottom, front or back?
The other question, of course, is does the address register increment or decrement faster. There's no reason why those two operations should be equal cost (there's no reason why they shouldn't be, either :-) And if they're different, the result will be a difference in speed going forward or backwards.
Cheers,
Wol
Glibc change exposing bugs
Posted Nov 15, 2010 9:47 UTC (Mon) by mpr22 (guest, #60784) [Link]
I must confess to being utterly boggled by the notion of a backwards block copy (decrementing address) being faster than the forward (incrementing address) version. I mean, doesn't backward copying break the memory controller prefetch?
Glibc change exposing bugs
Posted Nov 15, 2010 11:50 UTC (Mon) by cladisch (✭ supporter ✭, #50193) [Link]
Glibc change exposing bugs
Posted Nov 15, 2010 12:15 UTC (Mon) by slashdot (guest, #22014) [Link]
In particular, won't just doing all reads before all writes ensure no aliasing regardless of CPU operation?
I think there are enough callee-clobbered registers on x86-64 to allow that.
That is, do this:
movq (%rsi), %rax
movq 8(%rsi), %rdx
movq %rax, (%rdi)
movq %rdx, 8(%rdi)
Also, their backward copy obviously aliases if rsi is 0xf00c instead of 0xf004. I'm not sure why either of these cases should be intrinsically more frequent.
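The same idea in C, as a sketch for a fixed 16-byte block. The helper name is hypothetical; the small memcpy calls into locals are how C expresses unaligned loads and stores, and compilers typically turn them into plain moves.

```c
#include <stdint.h>
#include <string.h>

/* Copy 16 bytes by doing both loads before either store, so the result
 * is correct however src and dst overlap. */
static void copy16_loads_first(void *dst, const void *src)
{
    uint64_t lo, hi;

    memcpy(&lo, src, 8);                              /* load  bytes 0..7  */
    memcpy(&hi, (const unsigned char *)src + 8, 8);   /* load  bytes 8..15 */
    memcpy(dst, &lo, 8);                              /* store bytes 0..7  */
    memcpy((unsigned char *)dst + 8, &hi, 8);         /* store bytes 8..15 */
}
```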
This isn't either/or. Phase in such changes instead!
Posted Nov 11, 2010 15:24 UTC (Thu) by dwheeler (guest, #1216) [Link]
This isn't an either/or situation. The glibc folks have a great point that it's absurd to presume that a call preserves some functionality, when it has never guaranteed it and the various documentation available SPECIFICALLY says to not depend on it. But Torvalds also has a point that functionality not officially guaranteed, but depended on by real programs, shouldn't be lightly disregarded.
I think the solution for stuff like this is to phase in major changes, in a slower way. First, clearly document that "it used to work this way in practice, but soon it won't". Implement the new semantics in a "testing" library so that people can test it out before it goes "live", but don't ram it down production systems at first. Document *how* to run the testing situations clearly and obviously; libc_g and friends are essentially impossible to find, even if you know they exist. Then, after some time, switch. Yes, even with all this, somebody will be caught off guard, but the list of impacts will be a lot shorter (and thus more manageable). Also, if you've warned people, many people will be looking for that kind of problem, making it much easier to identify and fix the stragglers.
This isn't either/or. Phase in such changes instead!
Posted Nov 11, 2010 15:36 UTC (Thu) by jwb (guest, #15467) [Link]
This isn't either/or. Phase in such changes instead!
Posted Nov 11, 2010 19:40 UTC (Thu) by sgros (subscriber, #36440) [Link]
There are so many broken programs because someone tested something in a specific environment and it happened to work in that particular case and that test finishes with the broad conclusion it will always work.
Networking is another example. I have heard people writing networking code that directly accesses Ethernet claim that frames smaller than 46 octets are perfectly OK. Yes, they are, until some user starts using that code in a different environment that is strict with respect to the specs.
In the end, I'm not for helping bad programs and lazy programmers (lazy in negative, not positive sense!)
This isn't either/or. Phase in such changes instead!
Posted Nov 11, 2010 19:25 UTC (Thu) by xilun (guest, #50638) [Link]
Standard preconditions that users have to observe _are_ part of the semantics, and neither those nor the correct behavior of glibc has changed when you do observe them, so in no way is this particular memcpy optimization a major change. Actual preconditions are often relaxed in a given implementation, but unless it's documented in an additional standard, the way they are relaxed will never be the same between two implementations or two versions of the same one, so nobody can pretend to reliably take advantage of undocumented relaxed preconditions.
Would that particular memcpy change be considered as a major change, _every_ glibc change would need to be considered as a major change.
In other words, when a language has defined from the beginning of time that trying some operations results in undefined behavior (and has always been consistent about this definition), and when a system does not provide further guarantees, then it does not matter what the observed behavior is with version X of the compiler, Y of the libc, and processor Z with die revision T -- changing any of X, Y, Z or T, or even seemingly unrelated parts of the faulty program, can cause it to violently explode, and eventually will because of Murphy's law. It will still do so even if you blame glibc developers for your own mistakes.
Every C programmer should know the distinction between implementation-defined behavior, undefined behavior, and unspecified behavior -- otherwise he should rather program in another language... You'd also better have some notion of how compilers, sometimes in a way related to associated libraries, can take advantage of explicitly undefined and unspecified behaviors to do some optimizations. Stopping that would be hugely ridiculous, as ridiculous as ceasing to simplify boolean equations by taking advantage of "don't care" outputs, or even ceasing to automatically factor out redundant computations.
If you know the difference, but just don't like that C has undefined behaviors, or that C compilers and other associated system stacks are targeting efficient code, sometimes by taking advantage of explicitly undefined behaviors, well, it's not going to change anytime soon. So in this case you also don't have any choice: use another language.
This isn't either/or. Phase in such changes instead!
Posted Nov 11, 2010 22:34 UTC (Thu) by dafid_b (guest, #67424) [Link]
On one side there are arguments that a change that is made by free software purists that happens to break pre-existing programs is good - because FLASH is one of the broken programs...
On the other side are arguments that users' systems are exposed to corruptions due to changes in the behaviour of a library call made for a marginal optimisation of a utility function.
To put it in perspective: I do not want the software I rely on to have ONE randomly inserted bug activated for a 200% improvement of its overall performance.
That bug could be the one a hacker uses to observe my credit-card details when paying for LWN subscription.
The proposed benefit is 20% of 2%, or 0.4%. The possible cost is my bank account.
I hope that the packagers of the distributions do the sensible thing.
That is: pull that change out and shoot it.
This isn't either/or. Phase in such changes instead!
Posted Nov 11, 2010 22:57 UTC (Thu) by dgm (subscriber, #49227) [Link]
This isn't either/or. Phase in such changes instead!
Posted Nov 11, 2010 23:09 UTC (Thu) by dafid_b (guest, #67424) [Link]
"Actually I think we may have first seen this with squashfs. Problems showed up right before the F14 alpha. Phillip found the cause of the problem was using memcpy instead of memmove."
So there are at least two bugs exposed by this change in Glibc.
There may be more. There is a vast number of applications out there still waiting to be tested.
It is just impolite to cause users to do the testing when you don't have to.
Dave
This isn't either/or. Phase in such changes instead!
Posted Nov 12, 2010 4:31 UTC (Fri) by mrshiny (subscriber, #4266) [Link]
This isn't either/or. Phase in such changes instead!
Posted Nov 17, 2010 15:14 UTC (Wed) by meuh (subscriber, #22042) [Link]
> Remember: this bug exists for all users of glibc even if they compiled their apps a long time ago but recently updated glibc.
It's not a bug. And it doesn't affect all users.
Hopefully, legitimate uses (according to the specification) of memcpy() are not affected by the optimisation in newer glibc.
This isn't either/or. Phase in such changes instead!
Posted Nov 17, 2010 15:52 UTC (Wed) by mrshiny (subscriber, #4266) [Link]
Yes, it is a bug. Sure, the application is responsible for using APIs properly. But here we have a situation where a library has worked one way for years, and then suddenly works a different way. There was no way for the apps in question to detect the bugs because the code worked perfectly before. Now, due to a library upgrade, those apps don't work. In some cases there is data corruption. The corruption might happen silently. There is no way to be sure that this change is not quietly damaging untold amounts of data without auditing every use of memcpy everywhere to ensure that it is doing the right thing.
And this means that not only do you have to fix all source code which is wrong and issue new binaries, but you shouldn't upgrade to this version of Glibc because you might have an app somewhere that wasn't fixed, or isn't fixed in the version you have installed.
Glibc is a critical library in the system. Almost every program uses it. As such, it is their responsibility to treat ABI changes very carefully. Sure, this is not a change in the specification, it is an unintended consequence and it's due to those stupid lazy programmers who didn't read the spec or didn't care or whatever. Or inadvertently introduced errors when their code was changed. Or changed something without realizing that this change would result, somewhere, in a call to overlapping memcpy. Given that the bug was hard to identify (at least for some cases), and given that Glibc has symbol versioning, maybe they should use it?
Your last sentence sums up the problem: "Hopefully legitimate uses are not affected". I think we should expect stronger guarantees from glibc than "hopefully".
This isn't either/or. Phase in such changes instead!
Posted Nov 17, 2010 16:09 UTC (Wed) by meuh (subscriber, #22042) [Link]
This isn't either/or. Phase in such changes instead!
Posted Nov 17, 2010 16:23 UTC (Wed) by xilun (guest, #50638) [Link]
There is also no way to be sure that this change is not _fixing_ untold amounts of data corruption when the memcpy is done backward without auditing every use of memcpy :)
Anyway, C being what it is, it is a little ridiculous to fixate on that particular change, because other changes exposing bugs are made every day, hundreds at a time. So really, you have no way after ANY upgrade to be sure that memory corruption won't mysteriously happen where it previously did not. If that's a problem for you, don't ever update anything => problem magically solved.
> given that Glibc has symbol versioning, maybe they should use it?
Nope. Symbol versioning is for ABI changes, and symbol versioning does not even pretend to automatically solve every problem ABI changes have been shown to cause. The memcpy implementation change is not even an ABI change.
> Your last sentence sums up the problem: "Hopefully legitimate uses are not affected". I think we should expect stronger guarantees from glibc than "hopefully".
There was a problem only in the sentence. The "hopefully" is not needed. Legitimate users of memcpy will not be affected.
This isn't either/or. Phase in such changes instead!
Posted Nov 12, 2010 0:08 UTC (Fri) by xilun (guest, #50638) [Link]
I fail to see how my previous post, which you replied to, is in any way related to free software purists happy to break Flash.
Indeed I think you did not even read it.
So I'll make an executive summary (but with new elements, for those who follow): in http://www.coding-guidelines.com/cbook/cbook1_2.pdf ; read, starting at pdf page 183, 3.4 behavior, 3.4.1 implementation-defined behavior, 3.4.3 undefined behavior, and 3.4.4 unspecified behavior. You will then hopefully understand why it would indeed be *dangerous* for security (not even talking about performance) in the long term if a widely used implementation starts giving guarantees defining "undefined behaviors", or if the maintainers of such implementation start acting like there seems to be some guarantees. (Think about other compliant implementations.)
If you don't like implementation-defined, undefined and unspecified behaviors in programming languages, use Java. I'm indeed starting to wonder if Linus does not secretly dream about writing operating systems in Java -- look at: some of his responses during the NULL-page mapping debacle, GCC adding optimizations taking advantages of undefined behaviors on integers, and his position on this memcpy implementation.
> On the other side are arguments that users systems are exposed to corruptions due to changes in the behaviour of a library call made for a marginal optimisation of a utility function.
Users' systems are exposed to corruptions because developers wrote code having undefined behavior in the first place, and there should be neither surprise nor scandal when code containing faulty constructs having undefined behavior starts to behave in an undefined way, because that's precisely the definition of what "undefined behavior" means.
Undefined behavior could as well change observable behavior depending on your power supply, the phase of the moon, and the fact Linus has been personally annoyed by a random bug (the last cause being the most probable in those examples, which is a little weird from an economic perspective, but oh well). Blame glibc maintainers all that you want, but you'll soon have multiple targets when the next advance in GCC exposes other bugs caused by other undefined behaviors.
> To put it in perspective: I do not want the software I rely on to have ONE randomly inserted bug activated for a 200% improvement of its overall performance.
From which perspective are bugs not "randomly" inserted (given non-malicious intent in the first place)? Would you be OK with "ONE randomly inserted bug activated" because of a change for support of new hardware, or a new feature? Do you realize that even a bug fix can activate other bugs? Do you realize that you can easily avoid all that kind of trouble by NOT upgrading your system library ever, if you really want to? Do you understand that optimizations made at system level follow different economics than optimizations made at application level? Do you understand that compiler/library evolution has contributed to Moore's law, and that your computer would be maybe 4x slower or produce 4x more heat if we were still in the naive compiler era and if low level layers had not been updated to be efficient with modern processor architectures?
> The proposed benefit is 20% of 2%, or 0.4%. The possible cost is my bank account.
Given the nature of the memory corruption, very unlikely to have that kind of security impact (but not 100 % impossible).
What is really funny is that even without the incriminated patch, the memcpy was NOT a memmove (fortunately). This particular Flash call resulted in corrupted data when copying memory in reverse order because the pointers were in a specific order and the areas overlapped. Calling memcpy with the previous glibc implementation, and probably 99% of implementations existing on earth, will still result in data corruption if the memory areas overlap in the other order.
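To make the direction dependence concrete, here is a sketch with two naive byte loops (hypothetical stand-ins, not the glibc code). Each one gives the memmove result for one ordering of overlapping pointers and corrupts the data for the other.

```c
#include <stddef.h>

/* Lowest address first. */
static void copy_fwd(unsigned char *d, const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Highest address first. */
static void copy_bwd(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--)
        d[n] = s[n];
}
```

Starting from "abcdef" and copying 4 bytes from offset 0 to offset 2, copy_bwd yields the memmove result "ababcd" while copy_fwd yields "ababab". Copying 4 bytes from offset 2 to offset 0, copy_fwd yields the memmove result "cdefef" while copy_bwd yields "efefef".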
So I suggest you run your whole system LD_PRELOAD'ing all processes with a library that calls memmove instead of memcpy, if you are worried too much about that.
I also suggest that you immediately start looking for other bugs more likely to have big security impacts than this class, and that you also work around them in weird ways instead of fixing them correctly in the first place.
Maybe it would indeed be easier for you to take a really old distribution, with a compiler that does very few optimizations, and a very simple libc, and stick to it forever. (Well, you'll still have to do the memcpy/memmove replacement trick, but you'll have very few optimizations, so I guess that will make you happy.)
And oh, I forgot to tell you: randomly defining "undefined behaviors" without auditing every component involved in both the system and its construction can sometimes expose bugs with a high security impact. See the NULL page mapping debacle.
> I hope that the packagers of the distributions do the sensible thing.
> That is: pull that change out and shoot it.
Yeah, they all are reading LWN comments, waiting for your enlightenments.
This isn't either/or. Phase in such changes instead!
Posted Nov 12, 2010 1:10 UTC (Fri) by dafid_b (guest, #67424) [Link]
However, I think that the conversation we are having in this thread is a bit disjointed because when I say 'user' I mean a 3rd party to this conversation - not the Flash developer, not the Glibc developer, nor the crushfs developer.. but a simple user:). Whereas I think you read my user as 'developer'.
In the later post I liken (the knowing continued) delivery of this change to Glibc to mugging the person (user) who is near (uses the software written by) a jay-walker(developer who used undefined behaviour that used to work in the past).
That does not seem very fair to the user. It is sure to convince most users to stop being Linux users if the change does cause a security issue to happen - and they find out that it was a deliberate choice.
I think a better policy would be to mug the developer (send the crash reports, mocking messages in the trade press, or whatever).
This could be done by putting an intercept layer between Glibc in system tests that any user could load - at a known performance cost - that logs such violations of API requirements.
I would be happy, ecstatic even, to take part in such a mugging, when I am not doing my banking on the system.
Thanks for pointing out my other mistake - I should have said 'randomly activated bugs' rather than 'randomly inserted bugs' - as from both the end-user and developer perspective that is what is happening.
On your other points, I agree. However I think that the problem the points address is developer behaviour, and the person you mug is the user.
end user should not be punished, depending of distro target
Posted Nov 12, 2010 2:28 UTC (Fri) by xilun (guest, #50638) [Link]
But I would not even be angry at a distribution that makes the choice to not care at all about Flash. I perfectly understand that some can absolutely not care about Flash, in which case an angry user should just do a workaround himself or switch to another distro, if he is indeed not in the target audience of the one he used.
end user should not be punished, depending of distro target
Posted Nov 12, 2010 5:03 UTC (Fri) by dafid_b (guest, #67424) [Link]
end user should not be punished, depending of distro target
Posted Nov 12, 2010 19:28 UTC (Fri) by charlieb (subscriber, #23340) [Link]
And wouldn't it be nice if Fedora 14 were to do this :-)
This isn't either/or. Phase in such changes instead!
Posted Nov 11, 2010 23:04 UTC (Thu) by dgm (subscriber, #49227) [Link]
This isn't either/or. Phase in such changes instead!
Posted Nov 12, 2010 18:26 UTC (Fri) by chad.netzer (subscriber, #4257) [Link]
This isn't either/or. Phase in such changes instead!
Posted Nov 12, 2010 0:10 UTC (Fri) by bojan (subscriber, #14302) [Link]
We are talking about a brand new Fedora release, with a yet-unreleased version of glibc, 2.12.90. At some point software has to get shipped in order to get tested by real users.
This isn't either/or. Phase in such changes instead!
Posted Nov 12, 2010 11:06 UTC (Fri) by marcH (subscriber, #57642) [Link]
This would be far too reasonable and professional an approach. It would carry a very high risk of fewer flamewars.
Glibc change exposing bugs
Posted Nov 11, 2010 23:03 UTC (Thu) by dafid_b (guest, #67424) [Link]
An earlier suggestion was to replace the change with an API violation detector that causes programs to crash rather than corrupt their state.
This is better than a silent corruption - but still antisocial. A program that used to work now fails. Not everyone can fix the cause of the crash. Not all software is unimportant to the user. It is a bit like suggesting that when you see a jay-walker you should just mug the person next to them, as a deterrent for future jay-walkers.
I think it is fine that testing releases use the crash-bad-behaving applications change.
But the system released to users should provide the previously working Glibc.
And the Glibc developers should listen to Linus.
Cause, I really would prefer my software to work.
Selfish? Yes.
Glibc change exposing bugs
Posted Nov 12, 2010 0:15 UTC (Fri) by xilun (guest, #50638) [Link]
Then do some system level tests.
Glibc maintainers are not responsible for your system integration and QA.
> Selfish? Yes.
Indeed.
But considering that the Glibc maintainers are not so generous that they will do your system integration and QA, we have an impedance mismatch here, and they will just do as they want, which in the end seems quite logical.
Glibc change exposing bugs
Posted Nov 12, 2010 1:51 UTC (Fri) by dafid_b (guest, #67424) [Link]
I would like to do that in my spare time - not many hours - and leverage my subsequent relaxation browsing time...
What I am thinking is based on rough understanding, so please pass along any hints.
My idea is to provide a memcpy() that can safely be used in any application with minimal changes to the software behaviour, and yet provide logging of bad usage for proactive corrections.
It is ok if the system is slower.. but it should still work.
I don't really know how to do the logging as it should:
* not interfere with threads, signals etc
* be always available
The memcpy() is pretty simple..
A replacement memcpy() based on a combination of memmove() to test the parameters and the old memcpy() to provide the implementation, for the stability of my software.
When memcpy() is called with bad args that would normally invoke the special logic, an error is logged (by PID?) and the old memcpy() is still invoked to deliver the vanilla experience.
The replacement sounds like building a patched Glibc.
Any suggestions or hints for how to do the logging would be appreciated.
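The parameter test described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the names regions_overlap(), checked_memcpy() and overlap_count are hypothetical, and because the old memcpy() implementation is not available here, the sketch substitutes memmove() for it on the overlapping path, which is defined for overlapping buffers but is not the same code the applications were tested against.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if [dest, dest+n) and [src, src+n)
 * share at least one byte. */
static int regions_overlap(const void *dest, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dest;
    uintptr_t s = (uintptr_t)src;

    if (n == 0)
        return 0;
    /* The ranges overlap when the start addresses are closer than n. */
    return (d > s ? d - s : s - d) < n;
}

/* Hypothetical counter standing in for real logging. */
static unsigned long overlap_count;

/* Hypothetical wrapper: note the bad call, then copy anyway.
 * Assumption: memmove() stands in for the "old" memcpy(). */
static void *checked_memcpy(void *dest, const void *src, size_t n)
{
    if (regions_overlap(dest, src, n)) {
        overlap_count++;
        return memmove(dest, src, n);
    }
    return memcpy(dest, src, n);
}
```

A call such as checked_memcpy(buf + 2, buf, 4) would bump the counter and still produce a well-defined copy; how the counter gets reported is the open logging question.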
Glibc change exposing bugs
Posted Nov 12, 2010 5:00 UTC (Fri) by dafid_b (guest, #67424) [Link]
If this approach is reasonable.. then I just need to link it into memcpy as outlined above to have a trace of the last few hundred errors on the system in shared memory, waiting to be dumped...
Thoughts?
$ dd of=/tmp/data if=/dev/zero count=16
$ g++ test.cpp
$ ./a.out /tmp/data
ret=0x80489de, dest=0x976d008, src=0x8048b4d, len=4, pid=205b
$ cat test.cpp
#include <sys/types.h>
#include <unistd.h>
#include <sys/mman.h>
#include <syscall.h>
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#define SHARED_MEM_SIZE 8192
struct data // trace data for a memcpy() error
{
pid_t pid ;
const void *ret ;
const void *dest ;
const void *src ;
size_t len ;
} ;
struct log
{
long int index ; // sequence or index of last entry used
struct data entry[ SHARED_MEM_SIZE/sizeof(struct data) - 1 ] ; // vector of logging instances
} ;
struct log *pLog ;
#define N_ENTRIES (sizeof(pLog->entry)/sizeof(pLog->entry[0]))
void
displayLog(int i)
{
int j = i % N_ENTRIES ; // restrict index.
fprintf(stderr, "ret=%p, dest=%p, src=%p, len=%ld, pid=%lx\n",
pLog->entry[j].ret,
pLog->entry[j].dest,
pLog->entry[j].src,
(long)pLog->entry[j].len,
(unsigned long)pLog->entry[j].pid
) ;
}
void
capture(const void *dest, const void *src, size_t n)
{
long newLoc, oldLoc ; // same type as pLog->index
do {
oldLoc = pLog->index ;
newLoc = oldLoc + 1 ;
// retry until index is atomically advanced from oldLoc to newLoc
} while ( !__sync_bool_compare_and_swap( &pLog->index, oldLoc, newLoc ) ) ;
int j = newLoc % N_ENTRIES ; // restrict index.
pLog->entry[j].ret = __builtin_return_address(0) ; //__builtin_extract_return_address(ra) ;
pLog->entry[j].dest = dest ;
pLog->entry[j].src = src ;
pLog->entry[j].len = n ;
pLog->entry[j].pid = getpid() ;
}
void *
setup(char *av)
{
void * p = NULL ;
int fd ;
fd = open(av, O_RDWR, 0x777) ; // open the file.
if (fd < 0)
{
fprintf(stderr, "Failed to open file /%s/ errno=%d\n", av, errno) ;
return 0 ;
}
p = mmap(0, SHARED_MEM_SIZE, PROT_WRITE|PROT_READ, MAP_SHARED, fd, 0) ;
if (p == MAP_FAILED) // mmap() reports failure with MAP_FAILED, not NULL
{
fprintf(stderr, "Failed to mmap file /%s/ errno=%d\n", av, errno) ;
close(fd) ;
return 0 ;
}
// have mapping in p of SHARED_MEM_SIZE bytes
fprintf(stderr, "mapped %s to %p on fd %d\n", av, p, fd) ;
return p ;
}
int main(int ac, char **av)
{
int fd ;
if (ac >1)
{
pLog = (struct log*)setup(av[1]) ;
}
else
fprintf(stderr, "map <file>\n") ;
if (pLog)
{
capture(strdup("pete"), "joe", 4 ) ;
displayLog(pLog->index) ;
}
return 0 ;
}
Glibc change exposing bugs
Posted Nov 12, 2010 6:44 UTC (Fri) by cmccabe (guest, #60281) [Link]
This is not correct. You want
> fd = open(av, O_RDWR, 0777)
Yes, it's an octal constant. Or use the symbolic constants.
Also, this is more of a personal preference thing, but bumpyCaps and Hungarian notation are frowned on by most.
In a larger sense, I think you don't want to rebuild glibc. You probably just want to use "the LD_PRELOAD trick".
If I were you, I would print my nastygrams to syslog, using the syslog(3) function. Most sysadmins don't check random areas of shared memory that often. If you do choose to use shm, try shm_open.
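To make the LD_PRELOAD idea concrete, here is a rough sketch of what such a shim might look like. Treat it as an assumption-laden illustration rather than a tested tool: report_overlap() and in_report are made-up names, memmove() is used for the actual copy instead of the old memcpy(), and the per-thread guard only papers over the case where syslog() itself ends up calling memcpy().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <syslog.h>

/* Per-thread guard (hypothetical name) so that a memcpy() issued from
 * inside syslog() is not reported recursively. */
static __thread int in_report;

/* Hypothetical helper: log overlapping arguments via syslog(3).
 * Returns 1 if the two regions overlap, 0 otherwise. */
int report_overlap(const void *dest, const void *src, size_t n,
                   const void *caller)
{
    uintptr_t d = (uintptr_t)dest;
    uintptr_t s = (uintptr_t)src;

    if (n == 0 || (d > s ? d - s : s - d) >= n)
        return 0;
    if (!in_report) {
        in_report = 1;
        syslog(LOG_WARNING,
               "memcpy overlap: dest=%p src=%p len=%zu caller=%p",
               dest, src, n, caller);
        in_report = 0;
    }
    return 1;
}

/* Interposed memcpy(): report bad arguments, then copy with memmove(),
 * which is defined for overlapping buffers. */
void *memcpy(void *dest, const void *src, size_t n)
{
    report_overlap(dest, src, n, __builtin_return_address(0));
    return memmove(dest, src, n);
}
```

Assuming the file is saved as memcpycheck.c (a hypothetical name), something like "gcc -shared -fPIC -fno-builtin -o libmemcpycheck.so memcpycheck.c" followed by "LD_PRELOAD=./libmemcpycheck.so ./app" should load it ahead of glibc. Calls that the compiler expanded inline never reach the shim, so it will not catch everything.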
cheers,
C.
Glibc change exposing bugs
Posted Nov 12, 2010 9:09 UTC (Fri) by dafid_b (guest, #67424) [Link]
Is syslog() safe to call at this point?
It generates formatted output, which seems like it could itself call memcpy() or do other stuff in the library that the app did not allow for in its plan when it called memcpy.
Also, is the system call that sends the message to the log safe, or can it have side effects such as signals and new error codes in errno?
I would be very happy if the answer to the above is: syslog() is safe to call like this, with no side effects.
Glibc change exposing bugs
Posted Nov 13, 2010 3:02 UTC (Sat) by cmccabe (guest, #60281) [Link]
You raise a good issue. glibc's version of syslog is known to call malloc sometimes, which means that you shouldn't use it from within a signal handler. Surprisingly, memcpy isn't on the official list of "async-signal safe" functions, so you could argue that such an implementation would be POSIX conforming :)
But seriously. I think the best thing to do is probably implement your own version of syslog with no memory allocations or calls to memcpy. It's pretty easy to do in a few hundred lines. I had to do it before when writing a good signal handler.
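For what it's worth, the core of such a logger is small. The following is a sketch, not the code I actually used: put_str(), put_hex(), format_overlap() and log_overlap() are names invented for this example, the message layout is arbitrary, and it writes to a caller-supplied file descriptor rather than talking to the syslog daemon. The only library call is write(2), which is on the async-signal-safe list.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Append a NUL-terminated string to buf at pos without calling libc. */
static size_t put_str(char *buf, size_t pos, size_t cap, const char *s)
{
    while (*s && pos < cap)
        buf[pos++] = *s++;
    return pos;
}

/* Append v in lowercase hex (no leading zeros) to buf at pos. */
static size_t put_hex(char *buf, size_t pos, size_t cap, uintptr_t v)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];   /* two hex digits per byte */
    size_t i = 0;

    do {
        tmp[i++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);
    while (i > 0 && pos < cap)
        buf[pos++] = tmp[--i];
    return pos;
}

/* Build one log line in a caller-supplied buffer; returns its length. */
static size_t format_overlap(char *buf, size_t cap, const void *dest,
                             const void *src, size_t n)
{
    size_t pos = 0;

    pos = put_str(buf, pos, cap, "memcpy overlap: dest=0x");
    pos = put_hex(buf, pos, cap, (uintptr_t)dest);
    pos = put_str(buf, pos, cap, " src=0x");
    pos = put_hex(buf, pos, cap, (uintptr_t)src);
    pos = put_str(buf, pos, cap, " len=0x");
    pos = put_hex(buf, pos, cap, (uintptr_t)n);
    pos = put_str(buf, pos, cap, "\n");
    return pos;
}

/* Emit the line with a single write(2): no malloc, no stdio. */
static void log_overlap(int fd, const void *dest, const void *src, size_t n)
{
    char line[128];
    size_t len = format_overlap(line, sizeof line, dest, src, n);
    ssize_t rc = write(fd, line, len);

    (void)rc;   /* nothing sensible to do if logging itself fails */
}
```

Note that an optimizing compiler is free to turn simple copy loops into memcpy() calls, so it is worth checking the generated code if that matters for your case.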
C.
Glibc change exposing bugs
Posted Nov 12, 2010 18:21 UTC (Fri) by chad.netzer (subscriber, #4257) [Link]
http://valgrind.org/docs/manual/mc-manual.html#mc-manual....
Not to deprive you of the experience of doing it yourself, which can be instructive. However, you shouldn't need to reinvent the wheel if all you want is the use of the tool. At the very least, you can see how valgrind does it. As for automatically invoking it, well, that's an exercise for the reader. :)
C99 restrict
Posted Nov 12, 2010 11:12 UTC (Fri) by marcH (subscriber, #57642) [Link]
GCC built-in memcpy
Posted Nov 13, 2010 1:11 UTC (Sat) by rriggs (subscriber, #11598) [Link]
Kernel 2.6.36 broke my CentOS-5 Gnome 2.16 battery info
Posted Nov 25, 2010 17:32 UTC (Thu) by dag- (subscriber, #30207) [Link]
Well, I don't know how general that rule is, because kernel 2.6.36 ripped out an important set of /proc/acpi entries that are still used on older Gnome releases (eg. CentOS-5).
A separate project, named ELRepo, provides backported kernel modules, but also the current mainline kernel built specifically for CentOS-5, which is great for testing/running the latest kernel with a stable and trusted distribution. Since 2.6.36 that is no longer the case, as my laptop couldn't provide proper ACPI information, and as such couldn't suspend/hibernate before running out of power :-(
More information about this, and other breakage is available from:
Kernel 2.6.36 broke my CentOS-5 Gnome 2.16 battery info
Posted Nov 25, 2010 18:10 UTC (Thu) by dag- (subscriber, #30207) [Link]
Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds