Confusion regarding PATRICIA [closed]

Question

According to points 3 and 4 of libstdc++ documentation, PATRICIA tries have two types of nodes:

A (PATRICIA) trie is similar to a tree, but with the following differences:

It explicitly views keys as a sequence of elements. E.g., a trie can view a string as a sequence of characters; a trie can view a number as a sequence of bits.

It is not (necessarily) binary. Each node has fan-out n + 1, where n is the number of distinct elements.

It stores values only at leaf nodes.

Internal nodes have the properties that A) each has at least two children, and B) each shares the same prefix with any of its descendant.

The book I've been reading (Algorithms in C, Parts 1-4 by Robert Sedgewick) seems to describe a PATRICIA trie storing n values with only n nodes, using internal nodes to store values:

Like DSTs, patricia tries allow search for N keys in a tree with just N nodes. ... we avoid external nodes via another simple device: we store data in internal nodes and replace links to external nodes with links that point back upwards to the correct internal node in the trie

I guess I'm concerned about the accuracy of my resources (edit: ... and these aren't the only two references in dispute on this topic). Which of these references are correct?

I'm voting to reopen this question because the answer I have accepted is supported by facts, references, or expertise (i.e. the historical document that originally defined the term in question) and that contradicts the closing reason. I would like to give other people a chance to get my accepted answer, providing they can do a better job than I have. — undefined behaviour, Jul 4 at 7:03

undefined behaviour · Accepted Answer · 2013-04-09 14:17:22Z

Though Ivaylo Strandjev's answer does seem to be the common definition, I believe that common definition hasn't done justice by blurring the definition of PATRICIA. After all, there are already umbrella terms for this common definition.

I continued to search for a specific definition from past reputable sources to confirm what I had suspected, and I'm writing to provide my findings. Perhaps the most significant, though the least precise in terms of logical language is the original paper defining PATRICIA, published by DR Morrison in October 1968s "Journal of the ACM":

PATRICIA evolved from "A Library Automaton" [3] and other studies. ... Early in this evolution it was decided that the alphabet should be restricted to a binary one. A theorem which strongly influenced this decision is one which, in another form, is due to Euler. The theorem states that if the alphabet is binary, then the number of branches is exactly one less than the number of ends. Corollaries state that as the library grows, each new end brings into the library with it exactly one new branch, and each branch has exactly two exits. These facts are very useful in the allocation of storage for the index. They imply that the total storage required is completely determined by the number of ends, and all of the storage required will actually be used.

This seems to contradict points 2 and 3 of the libstdc++ reference. There's further evidence in this paper, though not quite as clear, to suggest that point 4 is also inaccurate.

I found a page entitled Patricia Variations that states quite clearly, and to the point, in the section named "Standard Patricia":

Patricia creates a binary tree, with one node per string stored. Each node records a bit offset, and you search for a string by moving down the tree, inspecting the bit with the given offset at each node, and branching left or right, depending on whether the bit inspected is zero or one. The search terminates when the bit offset you are to inspect stops increasing. This means that you have either looped back to revisit the node you have just left, or have moved up from a node to visit one of its ancestors...

This paragraph clearly contradicts points 2 and 3 of the libcstd++ reference. It implies that all nodes are internal nodes, as I suspected, that descendants in a PATRICIA trie won't always share a common prefix (eg. when a branch points back up the trie) which further contradicts point 4(B). Perhaps the most useful aspect of all, in the first paragraph it mentions a very reputable source: "section 6.3 of Knuth Volume 3 (The Art of Computer Programming, by D.E. Knuth)", which I've had a quick glance at. Knuth's description doesn't seem to significantly deviate from Morrison's idea, either.

In conclusion, there's a common, obscure definition which seems to have become a synonym for "radix trie" (and perhaps in some cases eg. libstdc++, "multi-way radix trie"), and a specific, original definition which deviates from the common definition quite significantly. I hope the resources revealed in this answer can be as helpful to other students intrigued by PATRICIA as it has been to me... Peace!

Ivaylo Strandjev · Answer 2 · 2013-04-09 13:02:39Z

up vote 1 down vote

Although both definitions seem to be correct, first one is more detailed and seems better to me. Also have a look at this answer, where I try to depict the difference between a Patricia and regular Trie.

edited Apr 9 '13 at 13:02

answered Apr 9 '13 at 8:02

Ivaylo Strandjev
42.5k54692

I don't trust that they can both be accurate. They use explanations and diagrams that contradict each other. – undefined behaviour Apr 9 '13 at 8:39

@modifiablelvalue I only judge by the text you pasted and I don't see anything obviously wrong in it. Again please have a look at the answer I link to in order to understand what they mean. – Ivaylo Strandjev Apr 9 '13 at 8:40

Do you know what an "internal node" is? Your first example is full of them, and your second example uses them to separate "ha" from "he" and "hat" from "hav". Sedgewick, who was a student of Knuth, by the way, seems to insist that PATRICIA ties don't use that one-way branching. Take a look at this image. Perhaps you should read chapter 15.3 of this book before deciding which of them is more detailed. Of course, I'm not going to write out the entire 19 pages including implementation for the purpose of this question. – undefined behaviour Apr 9 '13 at 11:24

Note how in the image, there is no valueless root, nor is any valueless internal node required to separate 'H' from 'I', and how there are no "leaf nodes" in this diagram; They all point back up into the trie. Neither of your diagrams look anything like that. From the diagrams at your link, I gather that you don't understand this question. – undefined behaviour Apr 9 '13 at 11:27

I seem to be complying to the common understanding of patricia tree also my tree looks pretty much like this one. I don't quite understand what does the image you link to show but it definetely does not seem like a patricia tree to me. – Ivaylo Strandjev Apr 9 '13 at 12:10

| show 2 more comments

asked	2 years ago
viewed	504 times
active	2 years ago

current community

your communities

more stack exchange communities

Confusion regarding PATRICIA [closed]

closed as not constructive by Andrew Barber May 14 '13 at 8:49

2 Answers 2

Not the answer you're looking for? Browse other questions tagged data-structures patricia-trie or ask your own question.

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

Confusion regarding PATRICIA [closed]

closed as not constructive by Andrew Barber May 14 '13 at 8:49

2 Answers 2

Did you find this question interesting? Try our newsletter

Not the answer you're looking for? Browse other questions tagged data-structures patricia-trie or ask your own question.

Linked

Related

Hot Network Questions