Archive:
In this section: Subtopics:
Comments disabled |
Mon, 19 Jun 2017 On Saturday I posted an article explaining how remote branches and remote-tracking branches work in Git. That article is a prerequisite for this one. But here's the quick summary:
We will consider the following typical workflow:
But step 3 fails, saying something like:
In older versions of Git the hint was a little shorter:
Everyone at some point gets one of these messages, and in my experience it is one of the most confusing and distressing things for beginners. It cannot be avoided, worked around, or postponed; it must be understood and dealt with. Not everyone gets a clear explanation. (Reading it over, the actual message seems reasonably clear, but I know many people find it long and frighting and ignore it. It is tough in cases like this to decide how to trade off making the message shorter (and perhaps thereby harder to understand) or longer (and frightening people away). There may be no good solution. But here we are, and I am going to try to explain it myself, with pictures.) In a large project, the remote branch is always moving, as other
people add to it, and they do this without your knowing about it.
Immediately after you do the fetch in step 1 above, the
tracking branch Typical workflowWe were trying to do this:
and the failure occurred in step 3. Let's look at what each of these operations actually does. 1. Fetch the remote
|
Cheryl Burke | Huckleberry |
I thought Cheryl Burke was sufficiently famous, sufficiently recently, that most people might have heard of her. (Even I know who she is!) But I gave a version of the !!Con talk to the Philadelphia Perl Mongers the following Monday and I was the only one in the room who knew. (That version of the talk took around 75 minutes, but we took a lot of time to stroll around and look at the scenery, much of which is in this article.)
I had a struggle finding the right Cheryl Burke picture for the !!Con talk. The usual image searches turned up lots of glamour and fashion pictures and swimsuit pictures. I wanted a picture of her actually dancing and for some reason this was not easy to find. The few I found showed her from the back, or were motion blurred. I was glad when I found the one above.
A few days before the !!Con talk my original anagram-scoring article hit #1 on Hacker News. Hacker News user Pxtl suggested using the Wikipedia article title list as an input lexicon. The article title list is available for download from the Wikimedia Foundation so you don't have to scrape the pages as Pxtl suggested. There are around 13 million titles and I found all the anagrams and scored them; this took around 25 minutes with my current code.
The results were not exactly disappointing, but neither did they deliver anything as awesomely successful as “cinematographer” / “megachiropteran”. The top scorer by far was “ACEEEFFGHHIILLMMNNOORRSSSTUV”, which is the pseudonym of 17th-century German writer Hans Jakob Christoffel von Grimmelshausen. Obviously, Grimmelshausen constructed his pseudonym by sorting the letters of his name into alphabetical order.
(Robert Hooke famously used the same scheme to claim priority for discovery of his spring law without actually revealing it. He published the statement as “ceiiinosssttuv” and then was able to claim, two years later, that this was an anagram of the actual law, which was “ut tensio, sic vis”. (“As the extension, so the force.”) An attendee of my Monday talk wondered if there is some other Latin phrase that Hooke could have claimed to have intended. Perhaps someone else can take the baton from me on this project.)
Anyway, the next few top scorers demonstrate several different problems:
21 Abcdefghijklmnopqrstuvwxyz / Qwertyuiopasdfghjklzxcvbnm
21 Abcdefghijklmnopqrstuvwxyz / Qwertzuiopasdfghjklyxcvbnm
21 Ashland County Courthouse / Odontorhynchus aculeatus
21 Daniel Francois Malherbe / Mindenhall Air Force Base
20 Christine Amongin Aporu / Ethnic groups in Romania
20 Message force multiplier / Petroleum fiscal regimes
19 Cholesterol lowering agent / North West Regional College
19 Louise de Maisonblanche / Schoenobius damienella
19 Scorpaenodes littoralis / Steroidal spirolactones
The “Qwerty” ones are intrinsically uninteresting and anyway we could have predicted ahead of time that they would be there. And the others are just sort of flat. “Odontorhynchus aculeatus” has the usual problems. One can imagine that there could be some delicious irony in “Daniel Francois Malherbe” / “Mindenhall Air Force Base” but as far as I can tell there isn't any and neither was Louise de Maisonblanche killed by an S. damienella. (It's a moth. Mme de Maisonblanche was actually killed by Variola which is not an anagram of anything interesting.)
Wikipedia article titles include many trivial variations. For example, many people will misspell “Winona Ryder” as “Wynona Rider”, so Wikipedia has pages for both, with the real article at the correct spelling and the incorrect one redirecting to it. The anagram detector cheerfully picks these up although they do not get high scores. Similarly:
The anagram scorer often had quite a bit of trouble with items like these because they are long and full of repeated letter pairs. The older algorithm would have done even worse. If you're still wondering about the difference between two exponential algorithms, some of these would make good example cases to consider.
As I mentioned above you can download the Wikipedia anagrams from my web site and check for yourself. My favorite item so far is:
18 Atlantis Casino Resort Spa / Carter assassination plot
Some words appear with surprising frequency and I don't know why. As I mentioned above one of the top scorers was “Ethnic groups in Romania” and for some reason Romania appears in the anagram list over and over again:
20 Christine Amongin Aporu / Ethnic groups in Romania
17 List of Romanian actors / Social transformation
15 Imperial Coronation / Romanian riot police
14 Rakhine Mountains / Romanians in the UK
14 Mindanao rasbora / Romanians abroad
13 Romanian poets / ramosopinnate
13 Aleuron carinatum / Aromanian culture
11 Resita Montana / Romanian state
11 Monte Schiara / The Romaniacs
11 Monetarianism / Romanian Times
11 Marion Barnes / Romanian Serb
11 Maarsen railway station / Romanian State Railways
11 Eilema androconia / Nicolae de Romania
11 Ana Maria Norbis / Arabs in Romania
( 170 more )
Also I had never thought of this before, but Romania appears in this unexpected context:
09 Alicia Morton / Clitoromania
09 Carinito Malo / Clitoromania
(Alicia Morton played Annie in the 1999 film. Carinito Malo is actually Cariñito Malo. I've already discussed the nonequivalence of “n” and “ñ” so I won't beat that horse again.)
Well, this is something I can investigate. For each string of letters, we have here the number of Wikipedia article titles in which the string appears (middle column), the number of anagram pairs in which the string appears (left column; anagrams with score less than 6 are not counted) and the quotient of the two (right column).
romania 110 4106 2.7%
serbia 109 4400 2.5%
croatia 68 3882 1.8%
belarus 24 1810 1.3%
ireland 140 11426 1.2%
andorra 7 607 1.2%
austria 60 5427 1.1%
russia 137 15944 0.9%
macedonia 28 3167 0.9%
france 111 14785 0.8%
spain 64 8880 0.7%
slovenia 18 2833 0.6%
wales 47 9438 0.5%
portugal 17 3737 0.5%
italy 21 4353 0.5%
denmark 19 3698 0.5%
ukraine 12 2793 0.4%
england 37 8719 0.4%
sweden 11 4233 0.3%
scotland 16 4945 0.3%
poland 22 6400 0.3%
montenegro 4 1446 0.3%
germany 16 5733 0.3%
finland 6 2234 0.3%
albania 10 3268 0.3%
slovakia 3 1549 0.2%
norway 9 3619 0.2%
greece 10 8307 0.1%
belgium 3 2414 0.1%
switzerland 0 5439 0.0%
netherlands 1 3522 0.0%
czechia 0 75 0.0%
As we see, Romania and Serbia are substantially ahead of the others.
I suspect that it is a combination of some lexical property (the
interesting part) and the relatively low coverage of those countries
in English Wikipedia. That is, I think if we were to identify the
lexical component, we might well find that russia
has more of it,
but scores lower than romania
because Russia is much more important.
My apologies if I
accidentally omitted your favorite European country.
[ Oh, crap, I just realized I left out Bosnia. ]
Another one of the better high scorers turns out to be the delightful:
16 Lesbian intercourse / Sunrise Celebration
“Lesbian”, like “Romania”, seems to turn up over and over; the next few are:
11 Lesbian erotica / Oreste Bilancia
11 Pitane albicollis / Political lesbian
12 Balearic islands / Radical lesbians
12 Blaise reaction / Lesbian erotica
(43 more)
Wikipedia says:
The Blaise reaction is an organic reaction that forms a β-ketoester from the reaction of zinc metal with a α-bromoester and a nitrile.
A hundred points to anyone who can make a genuinely funny joke out of this.
Oreste Bilancia is an Italian silent-film star, and Pitane albicollis is another moth. I did not know there were so many anagrammatic moths. Christian Bale is an anagram of Birthana cleis, yet another moth.
I ran the same sort of analysis on lesbian
as on romania
, except
that since it wasn't clear what to compare it to, I picked a bunch of
random words.
nosehair 3 3 100.0%
margarine 4 16 25.0%
penis 95 573 16.6%
weasel 11 271 4.1%
phallus 5 128 3.9%
lesbian 26 863 3.0%
center 340 23969 1.4%
flowers 14 1038 1.3%
trumpet 6 487 1.2%
potato 10 941 1.1%
octopus 4 445 0.9%
coffee 12 1531 0.8%
It seems that lesbian
appears with unusually high but not remarkably
high frequency. The unusual part is its participation in so many
anagrams with very high scores. The outstanding item here is
penis
. (The top two being rare outliers.) But penis
still wins
even if I throw away anagrams with scores less than 10 (instead of
less than 6):
margarine 1 16 6.2%
penis 13 573 2.3%
lesbian 8 863 0.9%
trumpet 2 487 0.4%
flowers 4 1038 0.4%
center 69 23969 0.3%
potato 2 941 0.2%
octopus 1 445 0.2%
coffee 1 1531 0.1%
weasel 0 271 0.0%
phallus 0 128 0.0%
nosehair 0 3 0.0%
Since I'm sure you are wondering, here are the anagrams of margarine
and nosehair
:
07 Nosehair / Rehsonia
08 Aso Shrine / Nosehairs
09 Nosehairs / hoariness
04 Margaret Hines / The Margarines
07 Magerrain / margarine
07 Ramiengar / margarine
08 Rae Ingram / margarine
11 Erika Armstrong / Stork margarine
I think “Margaret Hines” / “The Margarines” should score more than 4, and that this exposes a defect in my method.
Here is the graph constructed by the MIS algorithm for the pair “acrididae” / “cidaridae”, which I discussed in an earlier article and also mentioned in my talk.
Each maximum independent set in this graph corresponds to a minimum-chunk mapping between “acrididae” and “cidaridae”. In the earlier article, I claimed:
This one has two maximum independent sets
which is wrong; it has three, yielding three different mappings with five chunks:
My daughter Katara points out that the graphs above resemble grasshoppers. My Gentle Readers will no doubt recall that acrididae is the family of grasshoppers, comprising around 10,000 species. I wanted to find an anagram “grasshopper” / “?????? graph”. There are many anagrams of “eoprs” and “eoprss” but I was not able to find anything good. The best I could do was “spore graphs”.
Thank you, Gentle Readers, for taking this journey with me. I hope nobody walks up to me in the next year to complain that my blog does not feature enough anagram-related material.
[Other articles in category /lang] permanent link
Mon, 08 May 2017
An anagrammatic cautionary tale
I previously claimed that “cinematographer” / “megachiropteran” was the best anagram in English. Scoring all the anagrams in the list of 13 million Wikipedia article titles did not refute this, but it did reveal that “cinematographer” is also an anagram of “Taichang Emperor”.
The Taichang Emperor (泰昌) lived from 1582 to 1620 and was the 14th emperor of the Ming Dynasty. His reign as emperor lasted only 29 days, after which he died of severe diarrhea. Wikipedia says:
According to non-official primary sources, the Taichang Emperor's illness was brought about by excessive sexual indulgence after he was presented with eight maidens by Lady Zheng.
To counteract the diarrhea, the emperor took a “red pill” offered to him by a court official:
It was recorded in official Ming histories that the Taichang Emperor felt much better after taking the red pill, regained his appetite and repeatedly praised Li Kezhuo as a "loyal subject". That same afternoon, the emperor took a second pill and was found dead the next morning.
Surely this tale of Ming China has something to teach us even today.
[Other articles in category /misc] permanent link
Sun, 02 Apr 2017A Unix system administrator of my acquaintance once got curious about
what people were putting into /dev/null
. I think he also may have
had some notion that it would contain secrets or other interesting
material that people wanted thrown away. Both of these ideas are
stupid, but what he did next was even more stupid: he decided to
replace /dev/null
with a plain file so that he could examine its
contents.
The root filesystem quickly filled up and the admin had to be called
back from dinner to fix it. But he found that he couldn't fix it: to
create a Unix device file you use the mknod
command, and its
arguments are the major and minor device numbers of the device to
create. Our friend didn't remember the correct minor device
number. The ls -l
command will tell you the numbers of a device file
but he had removed /dev/null
so he couldn't use that.
Having no other system of the same type with an intact device file to
check, he was forced to restore /dev/null
from the tape backups.
[Other articles in category /Unix] permanent link
Sun, 05 Mar 2017Lately my kids have been interested in puzzles of this type: You are given a sequence of four digits, say 1,2,3,4, and your job is to combine them with ordinary arithmetic operations (+, -, ×, and ÷) in any order to make a target number, typically 24. For example, with 1,2,3,4, you can go with $$((1+2)+3)×4 = 24$$ or with $$4×((2×3)×1) = 24.$$
I said I had found an unusually difficult puzzle of this type, which is to make 2,5,6,6 total to 17. This is rather difficult. (I will reveal the solution later in this article.) Several people independently wrote to advise me that it is even more difficult to make 3,3,8,8 total to 24. They were right; it is amazingly difficult. After a couple of weeks I finally gave up and asked the computer, and when I saw the answer I didn't feel bad that I hadn't gotten it myself. (The solution is here if you want to give up without writing a program.)
From now on I will abbreviate the two puzzles of the previous paragraph as «2 5 6 6 ⇒ 17» and «3 3 8 8 ⇒ 24», and others similarly.
The article also inspired a number of people to write their own solvers and send them to me, and comparing them was interesting. My solver followed the tree search technique that I described in chapter 5 of Higher-Order Perl, and which has become so familiar to me that by now I can implement it without thinking about it very hard:
Invent a data structure that represents the state of a possibly-incomplete search. This is just a list of the stuff one needs to keep track of while searching. (Let's call this a node.)
Build a function which recognizes when a node represents a successful search.
Build a function which takes a node, computes all the ways the search could proceed from that point, and returns a list of nodes for those slightly-more-advanced searches.
Initialize a queue with a node representing a search that has just begun.
Do this:
until ( queue.is_empty() ) {
current_node = queue.get_next()
if ( is_successful( current_node ) ) { print the solution }
queue.push( slightly_more_complete_searches( current_node ) )
}
This is precisely a breadth-first search. To make it into depth-first
search, replace the queue with a stack. To make a heuristically
directed search, replace get_next
with a function that looks at the
queue and chooses the best-looking node from which to proceed. Many
other variations are possible, which is the advantage of this
synthetic approach over letting the search arise organically from a
recursive searcher. (Higher-Order Perl says “Recursive functions
naturally perform depth-first
searches.” (page
203)) In Python or Ruby one would be able to use yield
and would
not have to manage the queue explicitly, but in this case the queue
management is trivial.
In my solver, each node contains a list of available expressions, annotated with its numerical value. Initially, the expressions are single numbers and the values are the same, say
[ [ "2" => 2 ], [ "3" => 3 ], [ "4" => 4 ], [ "6" => 6 ] ]
Whether you represent expressions as strings or as something more structured depends on what you need to do with them at the end. If you just need to print them out, strings are good enough and are easy to handle.
A node represents a successful search if it contains only a single expression and if the expression's value is the target sum, say 24:
[ [ "(((6÷2)+3)×4)" => 24 ] ]
From a node, the search should proceed by selecting two of the expressions, removing them from the node, selecting a legal operation, combining the two expressions into a single expression, and inserting the result back into the node. For example, from the initial node shown above, the search might continue by subtracting the fourth expression from the second:
[ [ "2" => 2 ], [ "4" => 4 ], [ "(3-6)" => -3 ] ]
or by multiplying the second and the third:
[ [ "2" => 2 ], [ "(3×4)" => 12 ], [ "6" => 6 ] ]
When the program encounters that first node it will construct both of these, and many others, and put them all into the queue to be investigated later.
From
[ [ "2" => 2 ], [ "(3×4)" => 12 ], [ "6" => 6 ] ]
the search might proceed by dividing the first expression by the third:
[ [ "(3×4)" => 12 ], [ "(2÷6)" => 1/3 ] ]
Then perhaps by subtracting the first from the second:
[ [ "((2÷6)-(3×4))" => -35/3 ] ]
From here there is no way to proceed, so when this node is removed from the queue, nothing is added to replace it. Had it been a winner, it would have been printed out, but since !!-\frac{35}3!! is not the target value of 24, it is silently discarded.
To solve a puzzle of the «a b c d ⇒ t» sort requires examining a few thousand nodes. On modern hardware this takes approximately zero seconds.
The actual code for my solver is a lot of Perl gobbledygook that may not be of general interest so I will provide a link for people who are interested in deciphering it. It also represents my second attempt: I lost the code that I described in the earlier article and had to rewrite it. It is rather bigger than I would have liked.
People showed me a lot of programs to solve this, and many didn't work. There are a few hard cases that several of them get wrong.
Some puzzles require that some subexpressions have fractional values. Many of the programs people showed me used integer arithmetic (sometimes implicitly and unintentionally) and failed to solve those puzzles. We can detect this by asking for a solution to «2 5 6 6 ⇒ 17», which requires a fraction. The solution is !!6×(2+(5÷6))!!. A program using integer arithmetic will calculate !!5÷6 = 0!! and fail to recognize the solution.
Several people on Twitter made this mistake and then mistakenly claimed that there was no solution at all. Usually it was possible to correct their programs by changing
inputs = [ 2, 2, 5, 6 ]
to
inputs = [ 2.0, 2.0, 5.0, 6.0 ]
or something like that.
Some people also surprised me by claiming that I had lied when I stated that the puzzle could be solved without any “underhanded tricks”, and that the use of intermediate fractions was itself an underhanded trick. Your Honor, I plead not guilty. I originally described the puzzle this way:
You are given a sequence of four digits, say 1,2,3,4, and your job is to combine them with ordinary arithmetic operations (+, -, ×, and ÷) in any order to make a target number, typically 24.
The objectors are implicitly claiming that when you combine 5 and 6 with the “ordinary arithmetic operation” of division, you get something other than !!\frac56!!. This is an indefensible claim.
I wasn't even trying to be tricky! It never occurred to me that fractions were something that some people would consider underhanded, and now that it has been suggested, I reject the suggestion. Folks, the result of division can be a fraction. Fractions are not some sort of obscure mathematical pettifoggery. They have been with us for at least 3,500 years now, so it is time everyone got used to them.
Some programs used floating-point arithmetic to deal with the fractions and then fell foul of floating-point error. I will defer discussion of this to a future article.
I've complained about floating-point numbers on this blog before. ( 1 2 3 4 5 ) God, how I loathe them.
A more subtle error that several programs made was to assume that all expressions can be constructed by combining a previous expression with a single input number. For example, to solve «2 3 5 7 ⇒ 24», you multiply 3 by 7 to get 21, then add 5 to get 26, then subtract 2 to get 24.
But not every puzzle can be solved this way. Consider «2 3 5 7 ⇒ 41». You start by multiplying 2 by 3 to get 6, but if you try to combine the 6 with either 5 or 7 at this point you will lose. The only solution is to put the 6 aside and multiply 5 by 7 to get 35. Then add the 6 and the 35 to get 41.
Another way to put this is that an unordered binary tree with 4 leaves can take two different shapes. (Imagine filling the green circles with numbers and the pink squares with operators.)
The right-hand type of structure is sometimes necessary, as with «2 3 5 7 ⇒ 41». But several of the proposed solutions produced only expressions with structures like that on the left.
Here's Sebastian Fischer's otherwise very elegant Haskell solution, in its entirety:
import Data.List ( permutations )
solution = head
[ (a,x,(b,y,(c,z,d)))
| [a,b,c,d] <- permutations [2,5,6,6],
ops <- permutations [((+),'+'),((-),'-'),((*),'*'),((/),'/')],
let [u,v,w] = map fst $ take 3 ops,
let [x,y,z] = map snd $ take 3 ops,
(a `u` (b `v` (c `w` d))) == 17
]
You can see the problem in the last line. a
, b
, c
, and d
are
numbers, and u
, v
, and w
are operators. The program evaluates
an expression to see if it has the value 17, but the expression always
has the left-hand shape. (The program has another limitation: it
never uses the same operator twice in the expression. That second
permutations
should be (sequence . take 3 . repeat)
or
something. It can still solve «2 5 6 6 ⇒ 17», however.)
Often the way these programs worked was to generate every possible permutation of the inputs and then apply the operators to the input lists stackwise: pop the first two values, combine them, push the result, and repeat. Here's a relevant excerpt from a program by Tim Dierks, this time in Python:
for ordered_values in permutations(values):
for operations in product(ops, repeat=len(values)-1):
result, formula = calc_result(ordered_values, operations)
Here the expression structure is implicit, but the current result is always made by combining one of the input numbers with the old result.
I have seen many people get caught by this and similar traps in the
past. I once posed the problem of enumerating all the strings of
balanced parentheses of a given length,
and several people assumed that all such strings have the form ()S
,
S()
, or (S)
, where S
is a shorter string of the same type. This
seems plausible, and it works up to length 6, but (())(())
does not
have that form.
A less common error exhibited by some programs was a failure to properly deal with division by zero. «2 5 6 6 ⇒ 17» has a solution, and if a program dies while checking !!2+(5÷(6-6))!! and doesn't find the solution, that's a bug.
Ingo Blechschmidt showed me a solution in
Haskell. The code is quite short.
M. Blechschmidt's program defines a synthetic expression type and an
evaluator for it. It defines a function arb
which transforms an
ordered list of numbers into a list of all possible expressions over
those numbers. Reordering the list is taken care of earlier, by
Data.List.permutations
.
By “synthetic expression type” I mean this:
data Exp a
= Lit a
| Sum (Exp a) (Exp a)
| Diff (Exp a) (Exp a)
| Prod (Exp a) (Exp a)
| Quot (Exp a) (Exp a)
deriving (Eq, Show)
Probably 80% of the Haskell programs ever written have something like this in them somewhere. This approach has a lot of boilerplate. For example, M. Blechschmidt's program then continues:
eval :: (Fractional a) => Exp a -> a
eval (Lit x) = x
eval (Sum a b) = eval a + eval b
eval (Diff a b) = eval a - eval b
eval (Prod a b) = eval a * eval b
eval (Quot a b) = eval a / eval b
Having made up our own synonyms for the arithmetic operators (Sum
for
!!+!!, etc.) we now have to explain to Haskell what they mean. (“Not
expressions, but an incredible simulation!”)
I spent a while trying to shorten the code by using a less artificial expression type:
data Exp a
= Lit a
| Op ((a -> a -> a), String) (Exp a) (Exp a)
but I was disappointed; I was only able to cut it down by 18%, from 34 lines to 28. I hope to discuss this in a future article. By the way, “Blechschmidt” is German for “tinsmith”.
Shreevatsa R. showed me a solution in Python. It generates every possible expression and prints it out with its value. If you want to filter the voluminous output for a particular target value, you do that later. Shreevatsa wrote up an extensive blog article about this which also includes a discussion about eliminating duplicate expressions from the output. This is a very interesting topic, and I have a lot to say about it, so I will discuss it in a future article.
Jeff Fowler of the Recurse Center wrote a compact solution in Ruby that he described as “hot garbage”. Did I say something earlier about Perl gobbledygook? It's nice that Ruby is able to match Perl's level of gobbledygookitude. This one seems to get everything right, but it fails mysteriously if I replace the floating-point constants with integer constants. He did provide a version that was not “egregiously minified” but I don't have it handy.
Lindsey Kuper wrote a series of solutions in the Racket dialect of Scheme, and discussed them on her blog along with some other people’s work.
M. Kuper's first draft was 92 lines long (counting whitespace) and when I saw it I said “Gosh, that is way too much code” and tried writing my own in Scheme. It was about the same size. (My Perl solution is also not significantly smaller.)
I saved the best for last. Martin Janecke showed me an almost flawless solution in PHP that uses a completely different approach than anyone else's program. Instead of writing a lot of code for generating permutations of the input, M. Janecke just hardcoded them:
$zahlen = [
[2, 5, 6, 6],
[2, 6, 5, 6],
[2, 6, 6, 5],
[5, 2, 6, 6],
[5, 6, 2, 6],
[5, 6, 6, 2],
[6, 2, 5, 6],
[6, 2, 6, 5],
[6, 5, 2, 6],
[6, 5, 6, 2],
[6, 6, 2, 5],
[6, 6, 5, 2]
]
Then three nested loops generate the selections of operators:
$operatoren = [];
foreach (['+', '-', '*', '/'] as $x) {
foreach (['+', '-', '*', '/'] as $y) {
foreach (['+', '-', '*', '/'] as $z) {
$operatoren[] = [$x, $y, $z];
}
}
}
Expressions are constructed from templates:
$klammern = [
'%d %s %d %s %d %s %d',
'(%d %s %d) %s %d %s %d',
'%d %s (%d %s %d) %s %d',
'%d %s %d %s (%d %s %d)',
'(%d %s %d) %s (%d %s %d)',
'(%d %s %d %s %d) %s %d',
'%d %s (%d %s %d %s %d)',
'((%d %s %d) %s %d) %s %d',
'(%d %s (%d %s %d)) %s %d',
'%d %s ((%d %s %d) %s %d)',
'%d %s (%d %s (%d %s %d))'
];
(I don't think those templates are all necessary, but hey, whatever.)
Finally, another set of nested loops matches each ordering of the
input numbers with each selection of operators, uses sprintf
to plug
the numbers and operators into each possible expression template, and
uses @eval
to evaluate the resulting expression to see if it has the
right value:
foreach ($zahlen as list ($a, $b, $c, $d)) {
foreach ($operatoren as list ($x, $y, $z)) {
foreach ($klammern as $vorlage) {
$term = sprintf ($vorlage, $a, $x, $b, $y, $c, $z, $d);
if (17 == @eval ("return $term;")) {
print ("$term = 17\n");
}
}
}
}
If loving this is wrong, I don't want to be right. It certainly satisfies Larry Wall's criterion of solving the problem before your boss fires you. The same approach is possible in most reasonable languages, and some unreasonable ones, but not in Haskell, which was specifically constructed to make this approach as difficult as possible.
M. Janecke wrote up a blog article about this, in
German. He says “It's not an elegant
program and PHP is probably not an obvious choice for arithmetic
puzzles, but I think it works.” Indeed it does. Note that the use of
@eval
traps the division-by-zero exceptions, but unfortunately falls
foul of floating-point roundoff errors.
Thanks to everyone who discussed this with me. In addition to the people above, thanks to Stephen Tu, Smylers, Michael Malis, Kyle Littler, Jesse Chen, Darius Bacon, Michael Robert Arntzenius, and anyone else I forgot. (If I forgot you and you want me to add you to this list, please drop me a note.)
I have enough material for at least three or four more articles about this that I hope to publish here in the coming weeks.
But the previous article on this subject ended similarly, saying
I hope to write a longer article about solvers in the next week or so.
and that was in July 2016, so don't hold your breath.
[Other articles in category /math] permanent link
Thu, 23 Feb 2017
Miscellaneous notes on anagram scoring
My article on finding the best anagram in English was well-received, and I got a number of interesting comments about it.
A couple of people pointed out that this does nothing to address the issue of multiple-word anagrams. For example it will not discover “I, rearrangement servant / Internet anagram server” True, that is a different problem entirely.
Markian Gooley informed me that “megachiropteran / cinematographer” has been long known to Scrabble players, and Ben Zimmer pointed out that A. Ross Eckler, unimpressed by “cholecystoduodenostomy / duodenocholecystostomy”, proposed a method almost identical to mine for scoring anagrams in an article in Word Ways in 1976. M. Eckler also mentioned that the “remarkable” “megachiropteran / cinematographer” had been published in 1927 and that “enumeration / mountaineer” (which I also selected as a good example) appeared in the Saturday Evening Post in 1879!
The Hacker News comments were unusually pleasant and interesting. Several people asked “why didn't you just use the Levenshtein distance”? I don't remember that it ever occured to me, but if it had I would have rejected it right away as being obviously the wrong thing. Remember that my original chunking idea was motivated by the observation that “cholecystoduodenostomy / duodenocholecystostomy” was long but of low quality. Levenshtein distance measures how far every letter has to travel to get to its new place and it seems clear that this would give “cholecystoduodenostomy / duodenocholecystostomy” a high score because most of the letters move a long way.
Hacker News user
tyingq
tried it
anyway, and reported that it produced a poor
outcome.
The top-scoring pair by Levenshtein distance is
“anatomicophysiologic physiologicoanatomic”, which under the
chunking method gets a score of 3. Repeat offender “cholecystoduodenostomy / duodenocholecystostomy”
only drops to fourth place.
A better idea seems to be Levenshtein score per unit of length,
suggested by lobste.rs user
cooler_ranch
.
A couple of people complained about my “notaries / senorita” example, rightly observing that “senorita” is properly spelled “señorita”. This bothered me also while I was writing the article. I eventually decided although “notaries” and “señorita” are certainly not anagrams in Spanish (even supposing that “notaries” was a Spanish word, which it isn't) that the spelling of “senorita” without the tilde is a correct alternative in English. (Although I found out later that both the Big Dictionary and American Heritage seem to require the tilde.)
Hacker News user
ggambetta
observed
that while ‘é’ and ‘e’, and ‘ó’ and ‘o’ feel interchangeable in
Spanish, ‘ñ’ and ‘n’ do not. I think this is right. The ‘é’ is an
‘e’, but with a mark on it to show you where the stress is in the
word. An ‘ñ’ is not like this. It was originally an abbreviation
for ‘nn’, introduced in the 18th century. So I thought it might
make sense to allow ‘ñ’ to be exchanged for ‘nn’, at least in some
cases.
(An analogous situation in German, which may be more familiar, is that it might be reasonable to treat ‘ö’ and ‘ü’ as if they were ‘oe’ and ‘ue’. Also note that in former times, “w” and “uu” were considered interchangeable in English anagrams.)
Unfortunately my Spanish dictionary is small (7,000 words) and of poor quality and I did not find any anagrams of “señorita”. I wish I had something better for you. Also, “señorita” is not one of the cases where it is appropriate to replace “ñ” with “nn”, since it was never spelled “sennorita”.
I wonder why sometimes this sort of complaint seems to me like useless nitpicking, and other times it seems like a serious problem worthy of serious consideration. I will try to think about this.
Mike Morton, who goes by the anagrammatic nickname of “Mr. Machine Tool”, referred me to his Higgledy-piggledy about megachiropteran / cinematographer, which is worth reading.
Regarding the maximum independent set algorithm I described yesterday, Shreevatsa R. suggested that it might be conceptually simpler to find the maximum clique in the complement graph. I'm not sure this helps, because the complement graph has a lot more edges than the original. Below right is the complement graph for “acrididae / cidaridae”. I don't think I can pick out the 4-cliques in that graph any more than the independent sets in the graph on the lower-left, and this is an unusually favorable example case for the clique version, because the original graph has an unusually large number of edges.
But perhaps the cliques might be easier to see if you know what to look for: in the right-hand diagram the four nodes on the left are one clique, and the four on the right are the other, whereas in the left-hand diagram the two independent sets are all mixed together.
An earlier version of the original article mentioned the putative 11-pointer “endometritria / intermediator”. The word “endometritria” seemed pretty strange, and I did look into it before I published the article, but not carefully enough. When Philip Cohen wrote to me to question it, I investigated more carefully, and discovered that it had been an error in an early WordNet release, corrected (to “endometria”) in version 1.6. I didn't remember that I had used WordNet's word lists, but I am not surprised to discover that I did.
A rare printing of Webster's 2¾th American International Lexican includes the word “endometritriostomoscopiotomous” but I suspect that it may be a misprint.
Philippe Bruhat wrote to inform me of Alain Chevrier’s book notes / sténo, a collection of thematically related anagrams in French. The full text is available online.
Alexandre Muñiz, who has a really delightful blog, and who makes and sells attractive and clever puzzles of his own invention. pointed out that soapstone teaspoons are available. The perfect gift for the anagram-lover in your life! They are not even expensive.
Thanks also to Clinton Weir, Simon Tatham, Jon Reeves, Wei-Hwa Huang, and Philip Cohen for their emails about this.
[ Addendum 20170507: Slides from my !!Con 2017 talk are now available. ]
[ Addendum 20170511: A large amount of miscellaneous related material ]
[Other articles in category /lang] permanent link
Wed, 22 Feb 2017
Moore's law beats a better algorithm
Yesterday I wrote about the project I did in the early 1990s to find the best anagrams. The idea is to give pair of anagram words a score, which is the number of chunks into which you have to divide one word in order to rearrange the chunks to form the other word. This was motivated by the observation that while “cholecysto-duodeno-stomy” and “duodeno-cholecysto-stomy” are very long words that are anagrams of one another, they are not interesting because they require so few chunks that the anagram is obvious. A shorter but much more interesting example is “aspired / diapers”, where the letters get all mixed up.
I wrote:
One could do this with a clever algorithm, if one were available. There is a clever algorithm, based on finding maximum independent sets in a certain graph. I did not find this algorithm at the time; nor did I try. Instead, I used a brute-force search.
I wrote about the brute-force search yesterday. Today I am going to discuss the clever algorithm. (The paper is Avraham Goldstein, Petr Kolman, Jie Zheng “Minimum Common String Partition Problem: Hardness and Approximations”, The Electronic Journal of Combinatorics, 12 (2005).)
The plan is to convert a pair of anagrams into a graph that expresses
the constraints on how the letters can move around when one turns into
the other. Shown below is the graph for comparing
acrididae
(grasshoppers)
with cidaridae
(sea
urchins):
The “2,4” node at the top means that the letters ri
at position
2 in acrididae
match the letters ri
at position 4 in cidaridae
;
the “3,1” node is for the match between the first id
and the first
id
. The two nodes are connected by an edge to show that the two
matchings are incompatible: if you map the ri
to the ri
, you
cannot also map the first id
to the first id
; instead you have to
map the first id
to the second one, represented by the node “3,5”,
which is not connected to “2,4”. A maximum independent set in this
graph is a maximum selection of compatible matchings in the words,
which corresponds to a division into the minimum number of chunks.
Usually the graph is much less complicated than this. For simple cases it is empty and the maximum independent set is trivial. This one has two maximum independent sets, one (3,1; 5,5; 6,6; 7,7) corresponding to the obvious minimum splitting:
and the other (2,4; 3,5; 5,1; 6,2) to this other equally-good splitting:
[ Addendum 20170511: It actually has three maximum independent sets. ]
In an earlier draft of yesterday's post, I wrote:
I should probably do this over again, because my listing seems to be incomplete. For example, it omits “spectrum / crumpets” which would have scored 5, because the Webster's Second list contains crumpet but not crumpets.
I was going to leave it at that, but then I did do it over again, and this time around I implemented the “good” algorithm. It was not that hard. The code is on GitHub if you would like to see it.
To solve the maximum independent set instances, I used a guided brute-force search. Maximum independent set is NP-complete, and so the best known algorithm for it runs in exponential time. But the instances in which we are interested here are small enough that this doesn't matter. The example graph above has 8 nodes, so one needs to check at most 256 possible sets to see which is the maximum independent set.
I collated together all the dictionaries I had handy. (I didn't know yet about SCOWL.) These totaled 275,954 words, which is somewhat more than Webster's Second by itself. One of the new dictionaries did contain crumpets so the result does include “spectrum / crumpets”.
The old scored anagram list that I made in the 1990s contained 23,521 pairs. The new one contains 38,333. Unfortunately most of the new stuff is of poor quality, as one would expect. Most of the new words that were missing from my dictionary the first time around are obscure. Perhaps some people would enjoy discovering that that “basiparachromatin” and “Marsipobranchiata” are anagrams, but I find it of very limited appeal.
But the new stuff is not all junk. It includes:
10 antiparticles paternalistic
10 nectarines transience
10 obscurantist subtractions11 colonialists oscillations
11 derailments streamlined
which I think are pretty good.
I wasn't sure how long the old program had taken to run back in the early nineties, but I was sure it had been at least a couple of hours. The new program processes the 275,954 inputs in about 3.5 seconds. I wished I knew how much of this was due to Moore's law and how much to the improved algorithm, but as I said, the old code was long lost.
But then just as I was finishing up the article, I found the old brute-force code that I thought I had lost! I ran it on the same input, and instead of 3.5 seconds it took just over 4 seconds. So almost all of the gain since the 1990s was from Moore's law, and hardly any was from the “improved” algorithm.
I had written in the earlier article:
In 2016 [ the brute force algorithm ] would probably still [ run ] quicker than implementing the maximum independent set algorithm.
which turned out to be completely true, since implementing the maximum independent set algorithm took me a couple of hours. (Although most of that was building out a graph library because I didn't want to look for one on CPAN.)
But hey, at least the new program is only twice as much code!
[ Addendum: The program had a minor bug: it would disregard
capitalization when deciding if two words were anagrams, but then
compute the scores with capitals and lowercase letters distinct. So
for example Chaenolobus
was considered an anagram of unchoosable
,
but then the Ch
in Chaenolobus
would not be matched to the ch
in
unchoosable
, resulting in a score of 11 instead of 10. I have
corrected the program and the output. Thanks to Philip Cohen for
pointing this out. ]
[ Addendum 20170223: More about this ]
[ Addendum 20170507: Slides from my !!Con 2017 talk are now available. ]
[ Addendum 20170511: A large amount of miscellaneous related material ]
[Other articles in category /lang] permanent link
Tue, 21 Feb 2017
I found the best anagram in English
I planned to publish this last week sometime but then I wrote a line of code with three errors and that took over the blog.
A few years ago I mentioned in passing that in the 1990s I had constructed a listing of all the anagrams in Webster's Second International dictionary. (The Webster's headword list was available online.)
This was easy to do, even at the time, when the word list itself, at 2.5 megabytes, was a file of significant size. Perl and its cousins were not yet common; in those days I used Awk. But the task is not very different in any reasonable language:
# Process word list
while (my $word = <>) {
chomp $word;
my $sorted = join "", sort split //, $word; # normal form
push @{$anagrams{$sorted}}, $word;
}
for my $words (values %anagrams) {
print "@$words\n" if @$words > 1;
}
The key technique is to reduce each word to a normal form so that
two words have the same normal form if and only if they are anagrams
of one another. In this case we do this by sorting the letters into
alphabetical order, so that both megalodon and moonglade become
adeglmnoo
.
Then we insert the words into a (hash | associative array | dictionary), keyed by their normal forms, and two or more words are anagrams if they fall into the same hash bucket. (There is some discussion of this technique in Higher-Order Perl pages 218–219 and elsewhere.)
(The thing you do not want to do is to compute every permutation of the letters of each word, looking for permutations that appear in the word list. That is akin to sorting a list by computing every permutation of the list and looking for the one that is sorted. I wouldn't have mentioned this, but someone on StackExchange actually asked this question.)
Anyway, I digress. This article is about how I was unhappy with the results of the simple procedure above. From the Webster's Second list, which contains about 234,000 words, it finds about 14,000 anagram sets (some with more than two words), consisting of 46,351 pairs of anagrams. The list starts with
aal ala
and ends with
zolotink zolotnik
which exemplify the problems with this simple approach: many of the 46,351 anagrams are obvious, uninteresting or even trivial. There must be good ones in the list, but how to find them?
I looked in the list to find the longest anagrams, but they were also disappointing:
cholecystoduodenostomy duodenocholecystostomy
(Webster's Second contains a large amount of scientific and medical jargon. A cholecystoduodenostomy is a surgical operation to create a channel between the gall bladder (cholecysto-) and the duodenum (duodeno-). A duodenocholecystostomy is the same thing.)
This example made clear at least one of the problems with boring anagrams: it's not that they are too short, it's that they are too simple. Cholecystoduodenostomy and duodenocholecystostomy are 22 letters long, but the anagrammatic relation between them is obvious: chop cholecystoduodenostomy into three parts:
cholecysto duodeno stomy
and rearrange the first two:
duodeno cholecysto stomy
and there you have it.
This gave me the idea to score a pair of anagrams according to how many chunks one had to be cut into in order to rearrange it to make the other one. On this plan, the “cholecystoduodenostomy / duodenocholecystostomy” pair would score 3, just barely above the minimum possible score of 2. Something even a tiny bit more interesting, say “abler / blare” would score higher, in this case 4. Even if this strategy didn't lead me directly to the most interesting anagrams, it would be a big step in the right direction, allowing me to eliminate the least interesting.
This rule would judge both “aal / ala” and “zolotink / zolotnik” as being uninteresting (scores 2 and 4 respectively), which is a good outcome. Note that some other boring-anagram problems can be seen as special cases of this one. For example, short anagrams never need to be cut into many parts: no four-letter anagrams can score higher than 4. The trivial anagramming of a word to itself always scores 1, and nontrivial anagrams always score more than this.
So what we need to do is: for each anagram pair, say
acrididae
(grasshoppers)
and cidaridae
(sea
urchins), find the smallest number of chunks into which we can chop
acrididae
so that the chunks can be rearranged into cidaridae
.
One could do this with a clever algorithm, if one were available. There is a clever algorithm, based on finding maximum independent sets in a certain graph. (More about this tomorrow.) I did not find this algorithm at the time; nor did I try. Instead, I used a brute-force search. Or rather, I used a very small amount of cleverness to reduce the search space, and then used brute-force search to search the reduced space.
Let's consider a example, scoring the anagram “abscise / scabies”.
You do not have to consider every possible permutation of
abscise
. Rather, there are only two possible mappings from the
letters of abscise
to the letters of scabies
. You know that the
C
must map to the C
, the A
must map to the A
, and so
forth. The only question is whether the first S
of abscise
maps to
the first or to the second S
of scabies
. The first mapping gives
us:
and the second gives us
because the S
and the C
no longer go to adjoining positions. So
the minimum number of chunks is 5, and this anagram pair gets a score
of 5.
To fully analyze cholecystoduodenostomy
by this method required considering 7680
mappings. (120 ways to map the five O
's × 2 ways to map the two
C
's × 2 ways to map the two D
's, etc.) In the 1990s this took a
while, but not prohibitively long, and it worked well enough that I
did not bother to try to find a better algorithm. In 2016 it would
probably still run quicker than implementing the maximum independent
set algorithm. Unfortunately I have lost the code that I wrote then
so I can't compare.
Assigning scores in this way produced a scored anagram list which began
2 aal ala
and ended
4 zolotink zolotnik
and somewhere in the middle was
3 cholecystoduodenostomy duodenocholecystostomy
all poor scores. But sorted by score, there were treasures at the end, and the clear winner was
I declare this the single best anagram in English. It is 15 letters
long, and the only letters that stay together are the E
and the R
.
“Cinematographer” is as familiar as a 15-letter word can be, and
“megachiropteran” means a giant bat. GIANT BAT! DEATH FROM
ABOVE!!!
And there is no serious competition. There was another 14-pointer, but both its words are Webster's Second jargon that nobody knows:
14 rotundifoliate titanofluoride
There are no score 13 pairs, and the score 12 pairs are all obscure. So this is the winner, and a deserving winner it is.
I think there is something in the list to make everyone happy. If you are the type of person who enjoys anagrams, the list rewards casual browsing. A few examples:
7 admirer married
7 admires sidearm8 negativism timesaving
8 peripatetic precipitate
8 scepters respects
8 shortened threnodes
8 soapstone teaspoons9 earringed grenadier
9 excitation intoxicate
9 integrals triangles
9 ivoriness revisions
9 masculine calumnies10 coprophagist topographics
10 chuprassie haruspices
10 citronella interlocal11 clitoridean directional
11 dispensable piebaldness
“Clitoridean / directional” has been one of my favorites for years. But my favorite of all, although it scores only 6, is
6 yttrious touristy
I think I might love it just because the word yttrious is so delightful. (What a debt we owe to Ytterby, Sweden!)
I also rather like
5 notaries senorita
which shows that even some of the low-scorers can be worth looking at. Clearly my chunk score is not the end of the story, because “notaries / senorita” should score better than “abets / baste” (which is boring) or “Acephali / Phacelia” (whatever those are), also 5-pointers. The length of the words should be worth something, and the familiarity of the words should be worth even more.
Here are the results:
In former times there was a restaurant in Philadelphia named “Soupmaster”. My best unassisted anagram discovery was noticing that this is an anagram of “mousetraps”.
[ Addendum 20170222: There is a followup article comparing the two algorithms I wrote for computing scores. ]
[ Addendum 20170222: An earlier version of this article mentioned the putative 11-pointer “endometritria / intermediator”. The word “endometritria” seemed pretty strange, and I did look into it before I published the article, but not carefully enough. When Philip Cohen wrote to me to question it, I investigated more carefully, and discovered that it had been an error in an early WordNet release, corrected (to “endometria”) in version 1.6. I didn't remember that I had used WordNet's word lists, but I am not surprised to discover that I did. ]
[ Addendum 20170223: More about this ]
[ Addendum 20170507: Slides from my !!Con 2017 talk are now available. ]
[ Addendum 20170511: A large amount of miscellaneous related material ]
[Other articles in category /lang] permanent link
Thu, 16 Feb 2017
Automatically checking for syntax errors with Git's pre-commit hook
Previous related article
Earlier related article
Over the past couple of days I've written about how I committed a syntax error on a cron script, and a co-worker had to fix it on Saturday morning. I observed that I should have remembered to check the script for syntax errors before committing it, and several people wrote to point out to me that this is the sort of thing one should automate.
(By the way, please don't try to contact me on Twitter. It won't work. I have been on Twitter Vacation for months and have no current plans to return.)
Git has a “pre-commit hook” feature, which means that you can set up a program that will be run every time you attempt a commit, and which can abort the commit if it doesn't like what it sees. This is the natural place to put an automatic syntax check. Some people suggested that it should be part of the CI system, or even the deployment system, but I don't control those, and anyway it is much better to catch this sort of thing as early as possible. I decided to try to implement a pre-commit hook to check syntax.
Unlike some of the git hooks, the pre-commit hook is very simple to use. It gets run when you try to make a commit, and the commit is aborted if the hook exits with a nonzero status.
I made one mistake right off the bat: I wrote the hook in Bourne shell, even though I swore years ago to stop writing shell scripts. Everything that I want to write in shell should be written in Perl instead or in some equivalently good language like Python. But the sample pre-commit hook was written in shell and when I saw it I went into automatic shell scripting mode and now I have yet another shell script that will have to be replaced with Perl when it gets bigger. I wish I would stop doing this.
Here is the hook, which, I should say up front, I have not yet tried in day-to-day use. The complete and current version is on github.
#!/bin/bash
function typeof () {
filename=$1
case $filename in
*.pl | *.pm) echo perl; exit ;;
esac
line1=$(head -1 $1)
case $line1 in '#!'*perl )
echo perl; exit ;;
esac
}
Some of the sample programs people showed me decided which files
needed to be checked based only on the filename. This is not good
enough. My most important Perl programs have filenames with no
extension. This typeof
function decides which set of checks to
apply to each file, and the minimal demonstration version here can do
that based on filename or by looking for the #!...perl
line in the
first line of the file contents. I expect that this function will
expand to include other file types; for example
*.py ) echo python; exit ;;
is an obvious next step.
if [ ! -z $COMMIT_OK ]; then
exit 0;
fi
This block is an escape hatch. One day I will want to bypass the hook
and make a commit without performing the checks, and then I can
COMMIT_OK=1 git commit …
. There is actually a --no-verify
flag to
git-commit
that will skip the hook entirely, but I am unlikely to
remember it.
(I am also unlikely to remember COMMIT_OK=1
. But I know from
experience that I will guess that I might have put an escape hatch
into the hook. I will also guess that there might be a flag to
git-commit
that does what I want, but that will seem less likely to
be true, so I will look in the hook program first. This will be a
good move because my hook is much shorter than the git-commit
man
page. So I will want the escape hatch, I will look for it in the best place,
and I will find it. That is worth two lines of code. Sometimes I feel
like the guy in Memento. I have not yet resorted to tattooing
COMMIT_OK=1
on my chest.)
exec 1>&2
This redirects the standard output of all subsequent commands to go to
standard error instead. It makes it more convenient to issue error
messages with echo
and such like. All the output this hook produces
is diagnostic, so it is appropriate for it to go to standard error.
allOK=true
badFiles=
for file in $(git diff --cached --name-only | sort) ; do
allOK
is true if every file so far has passed its checks.
badFiles
is a list of files that failed their checks. the
git diff --cached --name-only
function interrogates the Git index
for a list of the files that have been staged for commit.
type=$(typeof "$file")
This invokes the typeof
function from above to decide the type of
the current file.
BAD=false
When a check discovers that the current file is bad, it will signal
this by setting BAD
to true
.
echo
echo "## Checking file $file (type $type)"
case $type in
perl )
perl -cw $file || BAD=true
[ -x $file ] || { echo "File is not executable"; BAD=true; }
;;
* )
echo "Unknown file type: $file; no checks"
;;
esac
This is the actual checking. To check Python files, we would add a
python) … ;;
block here. The * )
case is a catchall. The perl
checks run perl -cw
, which does syntax checking without executing
the program. It then checks to make sure the file is executable, which
I am sure is a mistake, because these checks are run for .pm
files,
which are not normally supposed to be executable. But I wanted to
test it with more than one kind of check.
if $BAD; then
allOK=false;
badFiles="$badFiles;$file"
fi
done
If the current file was bad, the allOK
flag is set false, and the
commit will be aborted. The current filename is appended to badFiles
for a later report. Bash has array variables but I don't remember how
they work and the manual made it sound gross. Already I regret not
writing this in a real language.
After the modified files have been checked, the hook exits successfully if they were all okay, and prints a summary if not:
if $allOK; then
exit 0;
else
echo ''
echo '## Aborting commit. Failed checks:'
for file in $(echo $badFiles | tr ';' ' '); do
echo " $file"
done
exit 1;
fi
This hook might be useful, but I don't know yet; as I said, I haven't
really tried it. But I can see ahead of time that it has a couple of
drawbacks. Of course it needs to be built out with more checks. A
minor bug is that I'd like to apply that is-executable check to Perl
files that do not end in .pm
, but that will be an easy fix.
But it does have one serious problem I don't know how to fix yet. The hook checks the versions of the files that are in the working tree, but not the versions that are actually staged for the commit!
The most obvious problem this might cause is that I might try to commit some files, and then the hook properly fails because the files are broken. Then I fix the files, but forget to add the fixes to the index. But because the hook is looking at the fixed versions in the working tree, the checks pass, and the broken files are committed!
A similar sort of problem, but going the other way, is that I might
make several changes to some file, use git add -p
to add the part I
am ready to commit, but then the commit hook fails, even though the
commit would be correct, because the incomplete changes are still in
the working tree.
I did a little tinkering with git stash save -k
to try to stash the
unstaged changes before running the checks, something like this:
git stash save -k "pre-commit stash" || exit 2 trap "git stash pop" EXIT
but I wasn't able to get anything to work reliably. Stashing a modified index has never worked properly for me, perhaps because there is something I don't understand. Maybe I will get it to work in the future. Or maybe I will try a different method; I can think of several offhand:
The hook could copy each file to a temporary file and then run the check on the temporary file. But then the diagnostics emitted by the checks would contain the wrong filenames.
It could move each file out of the way, check out the currently-staged version of the file, check that, and then restore the working tree version. (It can skip this process for files where the staged and working versions are identical.) This is not too complicated, but if it messes up it could catastrophically destroy the unstaged changes in the working tree.
Check out the entire repository and modified index into a fresh working tree and check that, then discard the temporary working tree. This is probably too expensive.
This one is kind of weird. It could temporarily commit the current
index (using --no-verify
), stash the working tree changes, and
check the files. When the checks are finished, it would unstash the
working tree changes, use git-reset --soft
to undo the temporary
commit, and proceed with the real commit if appropriate.
Come to think of it, this last one suggests a much better version of
the same thing: instead of a pre-commit hook, use a post-commit
hook. The post-commit hook will stash any leftover working tree
changes, check the committed versions of the files, unstash the
changes, and, if the checks failed, undo the commit with git-reset
--soft
.
Right now the last one looks much the best but perhaps there's something straightforward that I didn't think of yet.
[ Thanks to Adam Sjøgren, Jeffrey McClelland, and Jack Vickeridge for discussing this with me. Jeffrey McClelland also suggested that syntax checks could be profitably incorporated as a post-receive hook, which is run on the remote side when new commits are pushed to a remote. I said above that running the checks in the CI process seems too late, but the post-receive hook is earlier and might be just the thing. ]
[ Addendum: Daniel Holz wrote to tell me that the Yelp pre-commit frameworkhandles the worrisome case of unstaged working tree changes. The strategy is different from the ones I suggested above. If I'm reading this correctly, it records the unstaged changes in a patch file, which it sticks somewhere, and then checks out the index. If all the checks succeed, it completes the commit and then tries to apply the patch to restore the working tree changes. The checks in Yelp's framework might modify the staged files, and if they do, the patch might not apply; in this case it rolls back the whole commit. Thank you M. Holtz! ]
[Other articles in category /prog] permanent link
Wed, 15 Feb 2017
More thoughts on a line of code with three errors
Yesterday I wrote, in great irritation, about a line of code I had written that contained three errors.
I said:
What can I learn from this? Most obviously, that I should have tested my code before I checked it in.
Afterward, I felt that this was inane, and that the matter required a little more reflection. We do not test every single line of every program we write; in most applications that would be prohibitively expensive, and in this case it would have been excessive.
The change I was making was in the format of the diagnostic that the program emitted as it finished to report how long it had taken to run. This is not an essential feature. If the program does its job properly, it is of no real concern if it incorrectly reports how long it took to run. Two of my errors were in the construction of the message. The third, however, was a syntax error that prevented the program from running at all.
Having reflected on it a little more, I have decided that I am only really upset about the last one, which necessitated an emergency Saturday-morning repair by a co-worker. It was quite acceptable not to notice ahead of time that the report would be wrong, to notice it the following day, and to fix it then. I would have said “oops” and quietly corrected the code without feeling like an ass.
The third problem, however, was serious. And I could have prevented it with a truly minimal amount of effort, just by running:
perl -cw the-script
This would have diagnosed the syntax error, and avoided the main problem at hardly any cost. I think I usually remember to do something like this. Had I done it this time, the modified script would have gone into production, would have run correctly, and then I could have fixed the broken timing calculation on Monday.
In the previous article I showed the test program that I wrote to test the time calculation after the program produced the wrong output. I think it was reasonable to postpone writing this until after program ran and produced the wrong output. (The program's behavior in all other respects was correct and unmodified; it was only its report about its running time that was incorrect.) To have written the test ahead of time might be an excess of caution.
There has to be a tradeoff between cautious preparation and risk. Here I put everything on the side of risk, even though a tiny amount of caution would have eliminated most of the risk. In my haste, I made a bad trade.
[ Addendum 20170216: I am looking into automating the perl -cw
check. ]
[Other articles in category /prog] permanent link
Tue, 14 Feb 2017
How I got three errors into one line of code
At work we had this script that was trying to report how long it had
taken to run, and it was using DateTime::Duration
:
my $duration = $end_time->subtract_datetime($start_time);
my ( $hours, $minutes, $seconds ) =
$duration->in_units( 'hours', 'minutes', 'seconds' );
log_info "it took $hours hours $minutes minutes and $seconds seconds to run"
This looks plausible, but because
DateTime::Duration
is shit,
it didn't work. Typical output:
it took 0 hours 263 minutes and 19 seconds to run
I could explain to you why it does this, but it's not worth your time.
I got tired of seeing 0 hours 263 minutes
show up in my cron email
every morning, so I went to fix it. Here's what I changed it to:
my $duration = $end_time->subtract_datetime_absolute($start_time)->seconds;
my ( $hours, $minutes, $minutes ) = (int(duration/3600), int($duration/60)%60, $duration%3600);
I was at some pains to get that first line right, because getting
DateTime
to produce a useful time interval value is a tricky
proposition. I did get the first line right. But the second line is
just simple arithmetic, I have written it several times before, so I
dashed it off, and it contains a syntax error, that duration/3600
is
missing its dollar sign, which caused the cron job to crash the next
day.
A co-worker got there before I did and fixed it for me. While he was
there he also fixed the $hours, $minutes, $minutes
that should have
been $hours, $minutes, $seconds
.
I came in this morning and looked at the cron mail and it said
it took 4 hours 23 minutes and 1399 seconds to run
so I went back to fix the third error, which is that $duration%3600
should have been $duration%60
. The thrice-corrected line has
my ( $hours, $minutes, $seconds ) = (int($duration/3600), int($duration/60)%60, $duration%60);
What can I learn from this? Most obviously, that I should have tested my code before I checked it in. Back in 2013 I wrote:
Usually I like to draw some larger lesson from this sort of thing. … “Just write the tests, fool!”
This was a “just write the tests, fool!” moment if ever there was one. Madame Experience runs an expensive school, but fools will learn in no other.
I am not completely incorrigible. I did at least test the fixed code before I checked that in. The test program looks like this:
sub dur {
my $duration = shift;
my ($hours, $minutes, $seconds ) = (int($duration/3600), int($duration/60)%60, $duration%60);
sprintf "%d:%02d:%02d", $hours, $minutes, $seconds;
}
use Test::More;
is(dur(0), "0:00:00");
is(dur(1), "0:00:01");
is(dur(59), "0:00:59");
is(dur(60), "0:01:00");
is(dur(62), "0:01:02");
is(dur(122), "0:02:02");
is(dur(3599), "0:59:59");
is(dur(3600), "1:00:00");
is(dur(10000), "2:46:40");
done_testing();
It was not necessary to commit the test program, but it was necessary to write it and to run it. By the way, the test program failed the first two times I ran it.
Three errors in one line isn't even a personal worst. In 2012 I posted here about getting four errors into a one-line program.
[ Addendum 20170215: I have some further thoughts on this. ]
[Other articles in category /oops] permanent link
Tue, 07 Feb 2017
How many 24 puzzles are there?
[ Note: The tables in this article are important, and look unusually crappy if you read this blog through an aggregator. The properly-formatted version on my blog may be easier to follow. ]
A few months ago I wrote about puzzles of the following type: take four digits, say 1, 2, 7, 7, and, using only +, -, ×, and ÷, combine them to make the number 24. Since then I have been accumulating more and more material about these puzzles, which will eventually appear here. But meantime here is a delightful tangent.
In the course of investigating this I wrote programs to enumerate the solutions of all possible puzzles, and these programs were always much faster than I expected at first. It appears as if there are 10,000 possible puzzles, from «0,0,0,0» through «9,9,9,9». But a moment's thought shows that there are considerably fewer, because, for example, the puzzles «7,2,7,1», «1,2,7,7», «7,7,2,1», and «2,7,7,1» are all the same puzzle. How many puzzles are there really?
A back-of-the-envelope estimate is that only about 1 in 24 puzzles is really distinct (because there are typically 24 ways to rearrange the elements of a puzzle) and so there ought to be around !!\frac{10000}{24} \approx 417!! puzzles. This is an undercount, because there are fewer duplicates of many puzzles; for example there are not 24 variations of «1,2,7,7», but only 12. The actual number of puzzles turns out to be 715, which I think is not an obvious thing to guess.
Let's write !!S(d,n)!! for the set of sequences of length !!n!! containing up to !!d!! different symbols, with the duplicates removed: when two sequences are the same except for the order of their symbols, we will consider them the same sequence.
Or more concretely, we may imagine that the symbols are sorted into nondecreasing order, so that !!S(d,n)!! is the set of nondecreasing sequences of length !!n!! of !!d!! different symbols.
Let's also write !!C(d,n)!! for the number of elements of !!S(d,n)!!.
Then !!S(10, 4)!! is the set of puzzles where input is four digits. The claim that there are !!715!! such puzzles is just that !!C(10,4) = 715!!. A tabulation of !!C(\cdot,\cdot)!! reveals that it is closely related to binomial coefficients, and indeed that $$C(d,n)=\binom{n+d-1}{d-1}.\tag{$\heartsuit$}$$
so that the surprising !!715!! is actually !!\binom{13}{9}!!. This is not hard to prove by induction, because !!C(\cdot,\cdot)!! is easily shown to obey the same recurrence as !!\binom\cdot\cdot!!: $$C(d,n) = C(d-1,n) + C(d,n-1).\tag{$\spadesuit$}$$
To see this, observe that an element of !!C(d,n)!! either begins with a zero or with some other symbol. If it begins with a zero, there are !!C(d,n-1)!! ways to choose the remaining !!n-1!! symbols in the sequence. But if it begins with one of the other !!d-1!! symbols it cannot contain any zeroes, and what we really have is a length-!!n!! sequence of the symbols !!1\ldots (d-1)!!, of which there are !!C(d-1, n)!!.
0 0 0 0 | 1 1 1 |
0 0 0 1 | 1 1 2 |
0 0 0 2 | 1 1 3 |
0 0 0 3 | 1 1 4 |
0 0 1 1 | 1 2 2 |
0 0 1 2 | 1 2 3 |
0 0 1 3 | 1 2 4 |
0 0 2 2 | 1 3 3 |
0 0 2 3 | 1 3 4 |
0 0 3 3 | 1 4 4 |
0 1 1 1 | 2 2 2 |
0 1 1 2 | 2 2 3 |
0 1 1 3 | 2 2 4 |
0 1 2 2 | 2 3 3 |
0 1 2 3 | 2 3 4 |
0 1 3 3 | 2 4 4 |
0 2 2 2 | 3 3 3 |
0 2 2 3 | 3 3 4 |
0 2 3 3 | 3 4 4 |
0 3 3 3 | 4 4 4 |
Now we can observe that !!\binom74=\binom73!! (they are both 35) so that !!C(5,3) = C(4,4)!!. We might ask if there is a combinatorial proof of this fact, consisting of a natural bijection between !!S(5,3)!! and !!S(4,4)!!. Using the relation !!(\spadesuit)!! we have:
$$ \begin{eqnarray} C(4,4) & = & C(3, 4) + & C(4,3) \\ C(5,3) & = & & C(4,3) + C(5,2) \\ \end{eqnarray}$$
so part of the bijection, at least, is clear: There are !!C(4,3)!! elements of !!S(4,4)!! that begin with a zero, and also !!C(4,3)!! elements of !!S(5, 3)!! that do not begin with a zero, so whatever the bijection is, it ought to match up these two subsets of size 20. This is perfectly straightforward; simply match up !!«0, a, b, c»!! (blue) with !!«a+1, b+1, c+1»!! (pink), as shown at right.
But finding the other half of the bijection, between !!S(3,4)!! and !!S(5,2)!!, is not so straightforward. (Both have 15 elements, but we are looking for not just any bijection but for one that respects the structure of the elements.) We could apply the recurrence again, to obtain:
$$ \begin{eqnarray} C(3,4) & = \color{darkred}{C(2, 4)} + \color{darkblue}{C(3,3)} \\ C(5,2) & = \color{darkblue}{C(4,2)} + \color{darkred}{C(5,1)} \end{eqnarray}$$
and since $$ \begin{eqnarray} \color{darkred}{C(2, 4)} & = \color{darkred}{C(5,1)} \\ \color{darkblue}{C(3,3)} & = \color{darkblue}{C(4,2)} \end{eqnarray}$$
we might expect the bijection to continue in that way, mapping !!\color{darkred}{S(2,4) \leftrightarrow S(5,1)}!! and !!\color{darkblue}{S(3,3) \leftrightarrow S(4,2)}!!. Indeed there is such a bijection, and it is very nice.
To find the bijection we will take a detour through bitstrings. There is a natural bijection between !!S(d, n)!! and the bit strings that contain !!d-1!! zeroes and !!n!! ones. Rather than explain it with pseudocode, I will give some examples, which I think will make the point clear. Consider the sequence !!«1, 1, 3, 4»!!. Suppose you are trying to communicate this sequence to a computer. It will ask you the following questions, and you should give the corresponding answers:
At each stage the
computer asks about the identity of the next symbol. If the answer is
“yes” the computer has learned another symbol and moves on to the next
element of the sequence. If it is “no” the computer tries guessing a
different symbol. The “yes” answers become ones and “no”
answers become zeroes, so that the resulting bit string is 0 1 1 0 0 1 0 1
.
It sometimes happens that the computer figures out all the elements of the sequence before using up its !!n+d-1!! questions; in this case we pad out the bit string with zeroes, or we can imagine that the computer asks some pointless questions to which the answer is “no”. For example, suppose the sequence is !!«0, 1, 1, 1»!!:
The bit string is 1 0 1 1 1 0 0 0
, where the final three 0
bits are
the padding.
We can reverse the process, simply taking over the role of the
computer. To find the sequence that corresponds to the bit string
0 1 1 0 1 0 0 1
, we ask the questions ourselves and use the bits as the
answers:
We have recovered the sequence !!«1, 1, 2, 4»!! from the
bit string 0 1 1 0 1 0 0 1
.
This correspondence establishes relation !!(\heartsuit)!! in a different way from before: since there is a natural bijection between !!S(d, n)!! and the bit strings with !!d-1!! zeroes and !!n!! ones, there are certainly !!\binom{n+d-1}{d-1}!! of them as !!(\heartsuit)!! says because there are !!n+d-1!! bits and we may choose any !!d-1!! to be the zeroes.
We wanted to see why !!C(5,3) = C(4,4)!!. The detour above shows that there is a simple bijection between
!!S(5,3)!! and the bit strings with 4 zeroes and 3 ones
on one hand, and between
!!S(4,4)!! and the bit strings with 3 zeroes and 4 ones
on the other hand. And of course the bijection between the two sets of bit strings is completely obvious: just exchange the zeroes and the ones.
The table below shows the complete bijection between !!S(4,4)!! and its descriptive bit strings (on the left in blue) and between !!S(5, 3)!! and its descriptive bit strings (on the right in pink) and that the two sets of bit strings are complementary. Furthermore the top portion of the table shows that the !!S(4,3)!! subsets of the two families correspond, as they should—although the correct correspondence is the reverse of the one that was displayed earlier in the article, not the suggested !!«0, a, b, c» \leftrightarrow «a+1, b+1, c+1»!! at all. Instead, in the correct table, the initial digit of the !!S(4,4)!! entry says how many zeroes appear in the !!S(5,3)!! entry, and vice versa; then the increment to the next digit says how many ones, and so forth.
!!S(4,4)!! | (bits) | (complement bits) | !!S(5,3)!! |
---|---|---|---|
0 0 0 0 | 1 1 1 1 0 0 0 | 0 0 0 0 1 1 1 | 4 4 4 |
0 0 0 1 | 1 1 1 0 1 0 0 | 0 0 0 1 0 1 1 | 3 4 4 |
0 0 0 2 | 1 1 1 0 0 1 0 | 0 0 0 1 1 0 1 | 3 3 4 |
0 0 0 3 | 1 1 1 0 0 0 1 | 0 0 0 1 1 1 0 | 3 3 3 |
0 0 1 1 | 1 1 0 1 1 0 0 | 0 0 1 0 0 1 1 | 2 4 4 |
0 0 1 2 | 1 1 0 1 0 1 0 | 0 0 1 0 1 0 1 | 2 3 4 |
0 0 1 3 | 1 1 0 1 0 0 1 | 0 0 1 0 1 1 0 | 2 3 3 |
0 0 2 2 | 1 1 0 0 1 1 0 | 0 0 1 1 0 0 1 | 2 2 4 |
0 0 2 3 | 1 1 0 0 1 0 1 | 0 0 1 1 0 1 0 | 2 2 3 |
0 0 3 3 | 1 1 0 0 0 1 1 | 0 0 1 1 1 0 0 | 2 2 2 |
0 1 1 1 | 1 0 1 1 1 0 0 | 0 1 0 0 0 1 1 | 1 4 4 |
0 1 1 2 | 1 0 1 1 0 1 0 | 0 1 0 0 1 0 1 | 1 3 4 |
0 1 1 3 | 1 0 1 1 0 0 1 | 0 1 0 0 1 1 0 | 1 3 3 |
0 1 2 2 | 1 0 1 0 1 1 0 | 0 1 0 1 0 0 1 | 1 2 4 |
0 1 2 3 | 1 0 1 0 1 0 1 | 0 1 0 1 0 1 0 | 1 2 3 |
0 1 3 3 | 1 0 1 0 0 1 1 | 0 1 0 1 1 0 0 | 1 2 2 |
0 2 2 2 | 1 0 0 1 1 1 0 | 0 1 1 0 0 0 1 | 1 1 4 |
0 2 2 3 | 1 0 0 1 1 0 1 | 0 1 1 0 0 1 0 | 1 1 3 |
0 2 3 3 | 1 0 0 1 0 1 1 | 0 1 1 0 1 0 0 | 1 1 2 |
0 3 3 3 | 1 0 0 0 1 1 1 | 0 1 1 1 0 0 0 | 1 1 1 |
1 1 1 1 | 0 1 1 1 1 0 0 | 1 0 0 0 0 1 1 | 0 4 4 |
1 1 1 2 | 0 1 1 1 0 1 0 | 1 0 0 0 1 0 1 | 0 3 4 |
1 1 1 3 | 0 1 1 1 0 0 1 | 1 0 0 0 1 1 0 | 0 3 3 |
1 1 2 2 | 0 1 1 0 1 1 0 | 1 0 0 1 0 0 1 | 0 2 4 |
1 1 2 3 | 0 1 1 0 1 0 1 | 1 0 0 1 0 1 0 | 0 2 3 |
1 1 3 3 | 0 1 1 0 0 1 1 | 1 0 0 1 1 0 0 | 0 2 2 |
1 2 2 2 | 0 1 0 1 1 1 0 | 1 0 1 0 0 0 1 | 0 1 4 |
1 2 2 3 | 0 1 0 1 1 0 1 | 1 0 1 0 0 1 0 | 0 1 3 |
1 2 3 3 | 0 1 0 1 0 1 1 | 1 0 1 0 1 0 0 | 0 1 2 |
1 3 3 3 | 0 1 0 0 1 1 1 | 1 0 1 1 0 0 0 | 0 1 1 |
2 2 2 2 | 0 0 1 1 1 1 0 | 1 1 0 0 0 0 1 | 0 0 4 |
2 2 2 3 | 0 0 1 1 1 0 1 | 1 1 0 0 0 1 0 | 0 0 3 |
2 2 3 3 | 0 0 1 1 0 1 1 | 1 1 0 0 1 0 0 | 0 0 2 |
2 3 3 3 | 0 0 1 0 1 1 1 | 1 1 0 1 0 0 0 | 0 0 1 |
3 3 3 3 | 0 0 0 1 1 1 1 | 1 1 1 0 0 0 0 | 0 0 0 |
Observe that since !!C(d,n) = \binom{n+d-1}{d-1} = \binom{n+d-1}{n} = C(n+1, d-1)!! we have in general that !!C(d,n) = C(n+1, d-1)!!, which may be surprising. One might have guessed that since !!C(5,3) = C(4,4)!!, the relation was !!C(d,n) = C(d+1, n-1)!! and that !!S(d,n)!! would have the same structure as !!S(d+1, n-1)!!, but it isn't so. The two arguments exchange roles. Following the same path, we can identify many similar ‘coincidences’. For example, there is a simple bijection between the original set of 715 puzzles, which was !!S(10,4)!!, and !!S(5,9)!!, the set of nondecreasing sequences of !!0\ldots 4!! of length !!9!!.
[ Thanks to Bence Kodaj for a correction. ]
[Other articles in category /math] permanent link
Tue, 31 Jan 2017Below, One Liberty Place, the second-tallest building in my home city of Philadelphia. (Completed 1987, height 288 meters.)
Below, Zhongtian International Mansion at Fortune Plaza, the tallest building in Ürümqi, capital city of Xinjiang in northwest China.
(Completed 2007, height 230 meters.)
[ Addendum: Perhaps I should mention that One Liberty Place is itself widely seen as a knockoff of the much more graceful and elegant Chrysler Building in New York City. (Completed 1930, height 319 meters.) ]
[ Addendum: I brought this to the attention of GroJLart, the foulmouthed architecture blogger who knows everything, absolutely everything, about Philadelphia buildings, and he said “Thanks. I wrote an article on the same subject in 2011”. Of course. ]
[Other articles in category /misc] permanent link
Mon, 30 Jan 2017
Digit symbols in the Parshvanatha magic square
In last month's article about the magic square at the Parshvanatha temple, shown at right, I said:
It has come to my attention that the digit symbols in the magic square are not too different from the current forms of the digit symbols in the Gujarati script. The temple is not very close to Gujarat or to the area in which Gujarati is common, so I guess that the digit symbols in Indian languages have evolved in the past thousand years, with the Gujarati versions remaining closest to the ancient forms, or else perhaps Gujarati was spoken more widely a thousand years ago. I would be interested to hear about this from someone who knows.
Shreevatsa R. replied in detail, and his reply was so excellent that, finding no way to improve it by adding or taking away, I begged his permission to republish it without change, which he generously granted.
Am sending this email to say:
The Parshvanatha temple is located in the current state of Madhya Pradesh. Here is the location of the temple within a map of the state:
And here you can see that the above state of Madhya Pradesh (14 in the image below) is adjacent to the state of Gujarat (7):
The states of India are (sort of) organized along linguistic lines, and neighbouring states often have overlap or similarities in their languages. So a priori it shouldn't be too surprising if the language is that of a neighbouring state.
But, as you rightly say, the location of the Parshvanatha temple is actually quite far from the state (7) where Gujarat is spoken; it's closer to 27 in the above map (state named Uttar Pradesh).
Well, the Parshvanatha temple is believed to have been built "during the reign of the Chandela king Dhanga", and the Chandela kings were feudatories (though just beginning to assert sovereignty at the time) of the Gurjara-Pratihara kings, and "Gurjara" is where the name of the language of "Gujarati" comes from. So it's possible that they used the "official" language of the reigning kings, as with colonies. In fact the green area of the Gurjara-Pratihara kings in this map covers the location of the Parshvanatha temple:
But actually this is not a very convincing argument, because the link between Gurjara-Pratiharas and modern Gujarati is not too strong (at least I couldn't find it in a few minutes on Wikipedia :P)
So moving on...
Are the numerals really similar to Gujarati numerals? These are the numbers 1 to 16 from your blog post, ordered according to the usual order:
These are the numerals in a few current Indic scripts (as linked from your blog post):
Look at the first two rows above. Perhaps because of my familiarity with Devanagari, I cannot really see any big difference between the Devanagari and Gujarati symbols except for the 9: the differences are as minor as variation between fonts. (To see how much the symbols can change because of font variation, one can go to Google Fonts' Devanagari page and Google Fonts' Gujarati page and click on one of the sample texts and enter "० १ २ ३ ४ ५ ६ ७ ८ ९" and "૦ ૧ ૨ ૩ ૪ ૫ ૬ ૭ ૮ ૯" respectively, then "Apply to all fonts". Some fonts are bad, though.)
(In fact, even the Gurmukhi and Tibetan are somewhat recognizable, for someone who can read Devanagari.)
So if we decide that the Parshvanatha temple's symbols are actually closer not to modern Gujarati but to modern Devanagari (e.g. the "3" has a tail in the temple symbols which is present in Devanagari but missing in Gujarati), then the mystery disappears: Devanagari is still the script used in the state of Madhya Pradesh (and Uttar Pradesh, etc: it's the script used for Hindi, Marathi, Nepali, Sanskrit, and many other languages).
Finally, for the complete answer, we can turn to history.
The Parshvanatha temple was built during 950 to 970 CE. Languages: Modern Gujarati dates from 1800, Middle Gujarati from ~1500 to 1800, Old Gujarati from ~1100 to 1500. So the temple is older than the earliest language called "Gujarati". (Similarly, modern Hindi is even more recent.) Turning to scripts instead: see under Brahmic scripts.
So at the time the temple was built, neither Gujarati script nor Devanagari proper existed. The article on the Gujarati script traces its origin to the Devanagari script, which itself is a descendant of Nagari script.
At right are the symbols from the Nagari script, which I think are closer in many respects to the temple symbols.
So overall, if we trace the numerals in (a subset of) the family tree of scripts:
Brahmi > Gupta > Nagari > Devanagari > Gujarati
we'll find that the symbols of the temple are somewhere between the "Nagari" and "Devanagari" forms. (Most of the temple digits are the same as in the "Nagari" example above, except for the 5 which is closer to the Devanagari form.)
BTW, your post was about the numerals, but from being able to read modern Devanagari, I can also read some of the words above the square: the first line ends with ".. putra śrī devasarmma" (...पुत्र श्री देवसर्म्म) (Devasharma, son of...), and these words have the top bar which is missing in Gujarati script.
[Other articles in category /lang] permanent link