I wish people would stop referring to Datomic as datalog. Datomic is many things, but only the query format (Horn clauses with unification of variables, similar to prolog) has anything to do with datalog.
Real datalog is far more interesting since it implicitly encodes recursion allowing you to chain rules. Rule A derives new facts, which rule B uses to derive new facts, which rules A and C use to derive new facts, and so on. Datomic has a notion of rules which are mostly syntax sugar and do not support this sort of recursive reasoning.
Why is that a big deal? When rules are run automatically, you can build live, reactive systems, not just a database that sits around waiting for you to query it. Hellerstein's work at UC Berkeley (https://dsf.berkeley.edu/papers/sigrec10-declimperative.pdf) explores this in some detail.
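To make the rule chaining concrete, here is a minimal sketch of naive bottom-up evaluation in Python. The `parent` facts and the `ancestors` helper are made up for illustration; real engines use far better strategies than re-running every rule until nothing changes.

```python
# Hypothetical example facts: parent(X, Y).
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}

def ancestors(parent):
    """Naive bottom-up evaluation of two rules:
         ancestor(X, Y) :- parent(X, Y).
         ancestor(X, Z) :- ancestor(X, Y), parent(Y, Z).
    """
    ancestor = set(parent)  # first rule: every parent fact is an ancestor fact
    while True:
        # second rule: join everything derived so far with parent
        derived = {(x, z) for (x, y) in ancestor for (y2, z) in parent if y == y2}
        new = derived - ancestor
        if not new:  # fixpoint reached: no rule can derive anything new
            return ancestor
        ancestor |= new

result = ancestors(parent)
```

Facts derived in one round feed the next round, which is the "rule A feeds rule B feeds rule A" behaviour described above.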
> Datomic has a notion of rules which are mostly syntax sugar and do not support this sort of recursive reasoning.
> Why is that a big deal? When rules are run automatically, you can build live, reactive systems, not just a database that sits around waiting for you to query it.
Sounds cool. What's the complexity of running this kind of recursive reasoning? Reasonable? Can you suggest any tools to not have to implement it ourselves?
Souffle and Cozo mentioned below already implement the whole of "traditional" datalog.
Percival (https://github.com/ekzhang/percival) has some very nice examples showing how you can interactively write and test rules on top of a datalog interpreter.
Bud (http://bloom-lang.net/bud/) is Hellerstein's proof of concept playground. It has bit-rotted in the past few years, but the examples are readable even if you can't easily get it working.
The complexity can be quite good. You can syntactically determine when you've written linear recursion (equivalent to a for loop) vs not. Otherwise, the complexity is what you'd expect from incremental view maintenance in a normal SQL database. Which is to say O(n^k) with k being the number of relations joined, but usually much, much less with appropriate indexes and skew in the data. All the usual tricks concerning data normalization and indexes from databases apply.
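As a rough illustration of why linear recursion behaves like a loop, here is a sketch of semi-naive evaluation in Python. The function name and the edge data are hypothetical, and this is only one common evaluation strategy, not what any particular engine does. Each round joins only the facts that were new in the previous round against an index, instead of re-joining everything.

```python
def reachable_seminaive(edges):
    """Semi-naive evaluation of the linear-recursive program:
         reach(X, Y) :- edge(X, Y).
         reach(X, Z) :- reach(X, Y), edge(Y, Z).
    """
    # Index on the join column, like a database index on edge.source.
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)

    reach = set(edges)
    delta = set(edges)  # facts that are new since the previous round
    while delta:
        # Only the delta is joined with edge, never the full reach relation.
        new = {(x, z) for (x, y) in delta for z in succ.get(y, ())}
        delta = new - reach
        reach |= delta
    return reach

chain = [(0, 1), (1, 2), (2, 3), (3, 4)]  # made-up example data
closure = reachable_seminaive(chain)
```

Because the rule body has a single recursive atom, each round is one indexed join over the delta, which is where the "equivalent to a for loop" intuition comes from.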
> We present a novel approach to parallel materialisation (i.e., fixpoint computation) of datalog programs in centralised, main-memory, multi-core RDF systems. Our approach comprises an algorithm that evenly distributes the workload to cores, and an RDF indexing data structure that supports efficient, ‘mostly’ lock-free parallel updates.
> Materialisation is PTIME-complete in data complexity and is thus believed to be inherently sequential. Nevertheless, many practical parallelisation techniques have been developed [...]
Datalog feels so much more intuitive than SQL or any other query language I've used. I'm able to write concise, complex expressions pretty easily. In a SQL-based system, there seems to be a (low) complexity metric where it's easier to write/debug/maintain what was supposed to be a 'declarative' SQL query in a functional/imperative language instead. It feels like datalog is the next evolution of a declarative query language, one that is much more declarative than SQL itself.
In the "day of datomic" videos, there is a segment where Stu debugs a slow query. He does the debugging without even looking at the data model, only by rearranging the clauses. It is really, really impressive, and I can't imagine having that capability in SQL.
I greatly respect what Stu and Rich have done to make Datomic.
However, they made an explicit design decision not to include a query optimizer and to execute the clauses as they were written. This is usually fine since the author has some idea of what the best order is, but there are k! different orderings of k clauses, so doing it by hand will fail at some point (if you want the optimal ordering).
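A toy sketch of why clause order matters when clauses run exactly as written. This is not Datomic's engine; the `match` and `run` helpers and the fact data are hypothetical, and `work` simply counts intermediate bindings.

```python
def match(pattern, fact, binding):
    """Extend binding so that pattern matches fact; return None on mismatch."""
    extended = dict(binding)
    for p, f in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):  # logic variable
            if extended.setdefault(p, f) != f:
                return None
        elif p != f:  # constant that does not match
            return None
    return extended

def run(clauses, facts):
    """Evaluate clauses strictly in the order given; return (results, work)."""
    bindings, work = [{}], 0
    for clause in clauses:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(clause, f, b)) is not None]
        work += len(bindings)  # size of the intermediate result
    return bindings, work

# Made-up data: 100 entities of type "user", exactly one with this email.
facts = [(i, "type", "user") for i in range(100)] + [(7, "email", "a@example.com")]
by_type = ("?e", "type", "user")
by_email = ("?e", "email", "a@example.com")

slow_result, slow_work = run([by_type, by_email], facts)
fast_result, fast_work = run([by_email, by_type], facts)
```

Both orders return the same answer, but putting the selective clause first keeps the intermediate result tiny. That is the kind of rearranging done by hand in the video.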
I asked Rich about his thoughts on query optimizers last year (not in the context of Datomic specifically) and his only reservation was around the practical implications for the operational experience. Specifically, that database systems should always provide the means for developers to control exactly how/when/whether existing (cached) execution plans get re-optimized, otherwise query optimizers can actually be a source of greater problems than they solve, particularly for applications with extremely rapid changes in data.
In my opinion, one of the easiest ways to get started with Datalog is clingo (https://potassco.org/clingo/), which can be pip-installed and has Python bindings. Answer Set Programming goes beyond Datalog, but it includes Datalog semantics as a sublanguage. It is unfortunate that this is not well advertised.
Datalog has some really cool use cases for program analysis (roughly, relations represent an overapproximation of the values that variables can take). The declarative, constrained nature of the language enables incrementalization, composition of analyses, and connection back to other formal aspects like types.
Logseq does! We at Logseq also believe in giving back so we sponsored learndatalogtoday.org when it went down - https://github.com/sponsors/jonase . I'd encourage others to do the same if they appreciate the author for making datalog more accessible
It's a shame that there doesn't seem to be any decent open-source implementation of Datalog. If you go for full Prolog instead of Datalog, there are several (Scryer Prolog being my personal favourite).
1. Datomic - While not open-source, it has an open-source version called Datomic Free, which is a distributed database designed to enable scalable, flexible, and intelligent data storage and queries. Datomic's query language is closely inspired by Datalog.
2. DataScript - An open-source in-memory database and query engine for Clojure, ClojureScript, and JavaScript that is heavily influenced by Datalog and Datomic.
3. Crux (now XTDB) - A bitemporal database with Datalog-inspired querying capabilities. It is designed for efficient querying of historical data and offers ACID transactions.
4. Racket's miniKanren - While not strictly a database, miniKanren is an open-source logic programming extension to the Racket language, which is inspired by Datalog and can be used to manipulate and query data in a manner similar to Prolog.
5. LogicBlox - An open-source platform that combines a database system, a Datalog-based modeling language, and application server facilities. It allows developers to build complex, data-intensive applications.
6. Soufflé - A Datalog-inspired language that is designed for static analysis problems. It can be viewed as a database query language with a focus on performance, allowing for parallel execution of queries.
7. Dedalus - A Datalog-like temporal logic language used to express complex distributed systems. It is primarily a research tool but has informed the design of other Datalog-inspired systems.
8. Flora-2 - An open-source object-oriented knowledge representation and reasoning system that integrates a variant of Datalog with objects and frames.
The top 3 are from the Clojure ecosystem. Additionally, in this same space there are Datalevin and Datahike, among many others.
Is LogicBlox open-source now? I encountered it on a project several years ago and at that point it was very much closed/commercial.
Now the website isn't even loading... has the project been shuttered? I know LogicBlox was acquired by Predictix a long time ago, and recently Infor acquired Predictix. Hoping the project is still a going concern, there was some very cool tech in there.
LogicBlox and Predictix merged. Predictix was just a company created solely to be a customer of LogicBlox. In the end it couldn’t survive on its own. Infor acquired LogicBlox and then reneged on some agreements, leading to a mass exodus. Infor made zero effort to sell LogicBlox technology or solutions, and then essentially threw their hundreds of millions away. It may never see the light of day again.
Cozo is very attractive. I just wish there was a native Rust DSL API for it, so it could be embedded in Rust programs without using datalog queries in strings.
The thing is written in Rust, but it does not expose a Rust query API; you have to query it through Datalog queries in strings. What you shared there just builds those strings from Python. It'd be nice to have a directly native API, with Horn clauses constructed in Rust.
Not only is the join algorithm patented, but my understanding is the original authors of it can't even use it, because the LogicBlox IP was acquired but the people moved on.
But some have since gone on to create new stuff @ RelationalAI
CodeQL is another datalog with the domain of code analysis as its use case. Too bad you cannot create a custom fact database with CodeQL. Otherwise, the implementation of CodeQL is pretty advanced and efficient.
Souffle deserves a high ranking. It really is a nice datalog. If only field reference was possible, instead of excessive tuple pattern matching. (Though, one could write a preprocessor for it, it is just syntax).
Differential datalog is also nice, if you have the use case.
ErgoAI is "an enterprise-level extension of the Flora-2 system", and it was recently open-sourced: https://github.com/ErgoAI . It seems to be well documented.
Also GDL and its variants, but that is more of a domain-specific language for game descriptions and general game-playing runtimes. Still, they refer to Datalog as its basis.
#1 is wrong. Datomic Free is not open source, either. The Apache License they mention applies only to the binaries, which is confusing, if not actually misleading.
Would folks please stop publishing nonsense about simple relational algebra implementations claiming they are Datalog, even though there is no recursion.
This is like seeing an article about how to implement context free parsing and finding one about implementing regular expressions with DFAs.
Mangle https://github.com/google/mangle is an open-source implementation in Go. Making it easy to learn was an explicit goal, meaning that it is easy to recognize the pure datalog part and the syntax follows the good old course material.
I learned a bit of Datalog back in university, too many years ago. It was impressive how powerful the query language is. You could do in a single line what required several lines of SQL, and far more intuitively.
But ... the problem is how many DBs support it, and how useful of a skill it is to know.
For the idiot in the thread, why would I use datalog (which I've never heard of before) over SQL?
Having looked quickly at it just now it seems (Wikipedia article) similar to Web Ontology Language (OWL), though I believe datalog may have been around long before owl.
On a syntax level, parsing, generating, and templating datalog is _much_ simpler than doing the same to SQL. DBT would never exist if every SQL database accepted datalog queries and SQL injection attacks would be rare to non-existent.
The more interesting answer is to think of datalog as making it easy to encode nearly all of your application logic as a bunch of self-referencing, incrementally updated, materialized views. Some examples:
# view of Users table for currently logged in user
LoggedInUserView(name, email, id) :- Users(id: payload["userId"], name, email), Cookies(name: "login", payload).
# view of Users for admin
AdminUserView(name, email, id) :- Users(id, name, email), Cookies(name: "login", payload), payload["isAdmin"] = true.
# posts a user can see
PostsView(title, content, id) :- Posts(id, title, content, public: true).
PostsView(title, content, id) :- Posts(id, title, content, author: payload["userId"]), Cookies(name: "login", payload).
And then you write your UI code to explicitly reference these derived views rather than manually wrapping an API around querying the Posts table and doing the filtering.
The examples above can be neatly replicated in Supabase or Postgraphile (the OG of auto-generated GraphQL over Postgres), but you can do a lot more with datalog as a language. The Hellerstein paper mentioned above is a good starting place.
I can't really understand this without seeing the equivalent SQL as I don't understand datalog.
Is this SQL query similar to the first datalog query you listed? I apologize for how HN formatted the below and my lack of understanding for how to get around it.
Select u.name, u.email, u.id
From Users u
Join Cookies c on u.name = c.name
create view LoggedInUserView as
select u.id as id, u.name as name, u.email as email
from Users u
join Cookies c on c.name = 'login' and u.id = c.payload->>'userId';
where payload is a json blob. (Indent code by 2+ spaces.)
Most people wouldn't design their schema in a SQL database like this with a bunch of special-use relations/views, but datalog encourages you to do so since defining a new relation is the only means of abstraction. In effect, you've created an API endpoint similar to /api/auth/users and, what's more, you can use the LoggedInUserView in other rules to define new relations.
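To show the "use one view to define another" idea without any datalog engine, here is a rough Python sketch. The sample rows, the `posts` columns and the `my_posts_view` rule are all hypothetical and only mirror the shape of the rules above.

```python
# Made-up base relations.
users = [
    {"id": 1, "name": "Ann", "email": "ann@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
]
cookies = [{"name": "login", "payload": {"userId": 1, "isAdmin": False}}]
posts = [
    {"id": 10, "title": "Hello", "author": 1},
    {"id": 11, "title": "Other", "author": 2},
]

def logged_in_user_view(users, cookies):
    # LoggedInUserView(name, email, id) :-
    #     Users(id: payload["userId"], name, email), Cookies(name: "login", payload).
    return [
        {"name": u["name"], "email": u["email"], "id": u["id"]}
        for c in cookies if c["name"] == "login"
        for u in users if u["id"] == c["payload"]["userId"]
    ]

def my_posts_view(users, cookies, posts):
    # Hypothetical second rule built on top of the first view:
    # MyPostsView(title) :- LoggedInUserView(id), Posts(title, author: id).
    return [
        p["title"]
        for v in logged_in_user_view(users, cookies)
        for p in posts if p["author"] == v["id"]
    ]

current_user = logged_in_user_view(users, cookies)
my_posts = my_posts_view(users, cookies, posts)
```

In a datalog system the second rule would just mention `LoggedInUserView` in its body, and the engine would keep both views up to date as `Cookies` changes.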
Datalog can be very effective for expressing certain kinds of problems and for generating efficient solutions to those problems. This applies particularly to anything that is even mildly recursive, and therefore especially to "knowledge graphs" that rely heavily on rules to infer, model and retrieve information. However, if your problem domain amounts to CRUD storage without a need for complex recursion, then mature SQL systems usually have all the advantages (aside from the syntax!). For a more formal answer:
> The intersection of databases, logic, and artificial intelligence gave raise to deductive databases. Deductive database systems are database management systems built around a logical model of data, and their query languages allow expressing logical queries. A deductive database system includes procedures for defining deductive rules which can infer information (in the so-called intensional database) in addition to the facts loaded in the (so-called extensional) database. The logic model for deductive databases is closely related to the relational model and, in particular, with the domain relational calculus. Datalog is the most known deductive query language (which syntactically is a Prolog subset) where constructed terms are not allowed as other non-declarative constructs such as the cut.
> Also following the relational model, relational database systems are well-known and widespread nowadays. Their formal query languages include relational algebra and relational calculi but, in practical systems, the de-facto and ANSI/ISO standard SQL is the language of choice of every relational database vendor. Whilst SQL and relational formal languages implement a limited form of logic, deductive database languages implement advanced forms of logic.
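For comparison, a "mildly recursive" query is possible in SQL through a recursive common table expression. The sketch below uses Python's built-in `sqlite3` module with made-up `parent` data; the table and column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (p TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO parent VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave")],
)

# Datalog equivalent:
#   ancestor(A, D) :- parent(A, D).
#   ancestor(A, D) :- ancestor(A, X), parent(X, D).
rows = conn.execute(
    """
    WITH RECURSIVE ancestor(a, d) AS (
        SELECT p, c FROM parent
        UNION
        SELECT ancestor.a, parent.c
        FROM ancestor JOIN parent ON ancestor.d = parent.p
    )
    SELECT a, d FROM ancestor ORDER BY a, d
    """
).fetchall()
conn.close()
```

The two datalog rules in the comment say the same thing as the whole CTE, which is roughly the syntax advantage mentioned above.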