Learn Datalog Today

brendanyounger · on Jan 21, 2024

I wish people would stop referring to Datomic as datalog. Datomic is many things, but only the query format (Horn clauses with unification of variables, similar to Prolog) has anything to do with datalog.

Real datalog is far more interesting since it implicitly encodes recursion, allowing you to chain rules. Rule A derives new facts, which rule B uses to derive new facts, which rules A and C use to derive new facts, and so on. Datomic has a notion of rules which are mostly syntax sugar and do not support this sort of recursive reasoning.
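A minimal sketch of that kind of chaining, assuming a toy in-memory encoding I made up for illustration (facts as `(relation, x, y)` tuples, rules as plain Python functions; none of these names come from a real engine). Naive bottom-up evaluation applies every rule to the current fact set and repeats until nothing new is derived, so rule B can feed on its own output and rule C on whatever A and B have produced:

```python
def rel(db, name):
    """Return the (x, y) pairs stored under relation `name`."""
    return {(x, y) for (r, x, y) in db if r == name}


def rule_a(db):
    # Rule A: ancestor(X, Y) :- parent(X, Y).
    return {("ancestor", x, y) for (x, y) in rel(db, "parent")}


def rule_b(db):
    # Rule B: ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    # Recursive: consumes ancestor facts produced by rule A and by itself.
    ancestors = rel(db, "ancestor")
    return {("ancestor", x, z)
            for (x, y) in rel(db, "parent")
            for (y2, z) in ancestors if y == y2}


def rule_c(db):
    # Rule C: related(X, Y) :- ancestor(Z, X), ancestor(Z, Y), X != Y.
    # Consumes whatever rules A and B have derived so far.
    ancestors = rel(db, "ancestor")
    return {("related", x, y)
            for (z, x) in ancestors
            for (z2, y) in ancestors if z == z2 and x != y}


def naive_fixpoint(facts, rules):
    """Apply every rule to the whole database until no rule derives a new fact."""
    db = set(facts)
    while True:
        derived = set().union(*(rule(db) for rule in rules))
        new = derived - db
        if not new:
            return db
        db |= new


facts = {("parent", "alice", "bob"),
         ("parent", "bob", "carol"),
         ("parent", "bob", "dave")}
db = naive_fixpoint(facts, [rule_a, rule_b, rule_c])
print(sorted(rel(db, "related")))
```

This naive loop re-runs every rule over the whole database each round; production engines typically use semi-naive evaluation, which only joins facts that were new in the previous round.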

Why is that a big deal? When rules are run automatically, you can build live, reactive systems, not just a database that sits around waiting for you to query it. Hellerstein's work at UC Berkeley (https://dsf.berkeley.edu/papers/sigrec10-declimperative.pdf) explores this in some detail.

refset · on Jan 21, 2024

> Datomic has a notion of rules which are mostly syntax sugar and do not support this sort of recursive reasoning.

> Why is that a big deal? When rules are run automatically, you can build live, reactive systems, not just a database that sits around waiting for you to query it.

There was at least one serious attempt to bring these worlds together: https://github.com/sixthnormal/clj-3df

6gvONxR4sf7o · on Jan 21, 2024

Sounds cool. What's the complexity of running this kind of recursive reasoning? Reasonable? Can you suggest any tools to not have to implement it ourselves?

brendanyounger · on Jan 21, 2024

Souffle and Cozo mentioned below already implement the whole of "traditional" datalog.

Percival (https://github.com/ekzhang/percival) has some very nice examples showing how you can interactively write and test rules on top of a datalog interpreter.

Bud (http://bloom-lang.net/bud/) is Hellerstein's proof of concept playground. It has bit-rotted in the past few years, but the examples are readable even if you can't easily get it working.

The complexity can be quite good. You can syntactically determine when you've written linear recursion (equivalent to a for loop) vs not. Otherwise, the complexity is what you'd expect from incremental view maintenance in a normal SQL database. Which is to say O(n^k) with k being the number of relations joined, but usually much, much less with appropriate indexes and skew in the data. All the usual tricks concerning data normalization and indexes from databases apply.
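For the linear case, the usual technique is semi-naive evaluation: each round joins only the facts that were new in the previous round against the base relation, through an index. A minimal Python sketch for the classic transitive-closure program (function and variable names are my own, not any particular engine's):

```python
from collections import defaultdict


def transitive_closure(edges):
    """Semi-naive evaluation of the linear recursive program
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- edge(X, Y), path(Y, Z).
    """
    # Index edge(X, Y) by Y so each new fact path(Y, Z) finds its X's directly.
    sources_into = defaultdict(list)
    for x, y in edges:
        sources_into[y].append(x)

    path = set(edges)   # everything derived so far
    delta = set(edges)  # facts that were new in the previous round
    while delta:
        # Only the delta is joined with edge: each round is one loop over new facts.
        candidates = {(x, z) for (y, z) in delta for x in sources_into[y]}
        delta = candidates - path
        path |= delta
    return path


print(sorted(transitive_closure({(1, 2), (2, 3)})))  # [(1, 2), (1, 3), (2, 3)]
```

Because the set difference drops already-known facts, the loop also terminates on cyclic graphs.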

refset · on Jan 21, 2024

RDFox offers a rather impressive sounding Datalog inferencing engine: https://www.oxfordsemantic.tech/rdfox

> We present a novel approach to parallel materialisation (i.e., fixpoint computation) of datalog programs in centralised, main-memory, multi-core RDF systems. Our approach comprises an algorithm that evenly distributes the workload to cores, and an RDF indexing data structure that supports efficient, ‘mostly’ lock-free parallel updates.

> Materialisation is PTIME-complete in data complexity and is thus believed to be inherently sequential. Nevertheless, many practical parallelisation techniques have been developed [...]

There have been several papers and patents describing their approach, e.g. http://www.cs.ox.ac.uk/dan.olteanu/papers/mnpho-aaai14.pdf

Clever321 · on Jan 21, 2024

Datalog feels so much more intuitive than SQL or any other query language I've used. I'm able to write concise, complex expressions pretty easily. In a SQL-based system, there seems to be a (low) complexity threshold beyond which it's easier to write/debug/maintain what was supposed to be a 'declarative' SQL query in a functional/imperative language instead. It feels like datalog is the next evolution of declarative query languages, one that is much more declarative than SQL itself.

In the "Day of Datomic" videos, there is a segment where Stu debugs a slow query. He does the debugging without even looking at the data model, only by rearranging the clauses. It is really, really impressive, and I can't imagine having that capability in SQL.

brendanyounger · on Jan 21, 2024

I greatly respect what Stu and Rich have done to make Datomic.

However, they made an explicit design decision to not include a query optimizer and to execute the clauses as they were written. This is usually fine since the author has some idea of what the best order is, but there are k! different orderings of k clauses, so doing it by hand will fail at some point (if you want the optimal ordering).
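A rough Python sketch of why exhaustive ordering stops scaling: it enumerates every ordering of a few clauses under a crude cost model I made up for illustration (an estimated row count per clause, and the assumption that joining on an already-bound variable does not grow the intermediate result). The clause patterns and numbers are hypothetical, not Datomic's statistics or planner:

```python
import math
from itertools import permutations

# Hypothetical clauses: (label, variables the clause binds, estimated matching rows).
clauses = [
    ("name", frozenset({"e", "n"}), 10_000),   # [?e :person/name ?n]
    ("email", frozenset({"e"}), 1),            # [?e :person/email "ada@example.com"]
    ("post", frozenset({"p", "e"}), 50_000),   # [?p :post/author ?e]
]


def estimated_cost(order):
    """Crude cost model: the sum of estimated intermediate result sizes."""
    bound, running, cost = set(), 1, 0
    for _label, variables, rows in order:
        if bound & variables:
            # Joins on an already-bound variable: assume the result does not grow.
            running = min(running, rows)
        else:
            # No shared variable: a cross product with everything so far.
            running *= rows
        bound |= variables
        cost += running
    return cost


def best_order(clauses):
    """Exhaustive search over all k! orderings."""
    return min(permutations(clauses), key=estimated_cost)


best = best_order(clauses)
print([label for label, _, _ in best], estimated_cost(best))
print(math.factorial(len(clauses)), math.factorial(10))
```

Three clauses give 6 orderings, but 10 clauses already give 3,628,800. Practical optimizers avoid full enumeration, for example with dynamic programming over subsets of clauses (on the order of 2^k states) or greedy heuristics.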

refset · on Jan 21, 2024

I asked Rich about his thoughts on query optimizers last year (not in the context of Datomic specifically) and his only reservation was around the practical implications for the operational experience. Specifically, that database systems should always provide the means for developers to control exactly how/when/whether existing (cached) execution plans get re-optimized, otherwise query optimizers can actually be a source of greater problems than they solve, particularly for applications with extremely rapid changes in data.

philzook · on Jan 22, 2024

In my opinion, one of the easiest ways to get started with Datalog is clingo https://potassco.org/clingo/ , which can be pip installed and has Python bindings. Answer Set Programming goes beyond Datalog, but it contains Datalog semantics as a sublanguage. It is unfortunate this is not well advertised.

  pip install clingo

  % path.lp  (run with: clingo path.lp)
  edge(1,2).
  edge(2,3).
  path(X,Y) :- edge(X,Y).
  path(X,Z) :- edge(X,Y), path(Y,Z).
  #show path/2.

Datalog has some really cool use cases for program analysis (roughly, relations represent an overapproximation of the possible values variables can have). The declarative, constrained nature of the language enables incrementalization, composition of analyses, and connections back to other formal aspects like types.
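As an illustration of the overapproximation point, here is a simplified, flow-insensitive points-to analysis in the Andersen style, written as Datalog-like rules and evaluated by naive fixpoint iteration in Python. The relation names (`alloc`, `copy`, `store`, `load`, `heapPointsTo`) and the toy program are my own choices for the sketch, not taken from any particular tool:

```python
def points_to(allocs, copies, stores, loads):
    """Flow-insensitive points-to analysis by naive fixpoint iteration.

    allocs: (v, o)     for v = new o   pointsTo(V, O) :- alloc(V, O).
    copies: (d, s)     for d = s       pointsTo(D, O) :- copy(D, S), pointsTo(S, O).
    stores: (b, f, a)  for b.f = a     heapPointsTo(OB, F, O) :- store(B, F, A), pointsTo(B, OB), pointsTo(A, O).
    loads:  (d, b, f)  for d = b.f     pointsTo(D, O) :- load(D, B, F), pointsTo(B, OB), heapPointsTo(OB, F, O).
    """
    pts = set(allocs)   # pointsTo(var, obj)
    heap = set()        # heapPointsTo(obj, field, obj)
    while True:
        new_pts = {(d, o) for (d, s) in copies for (v, o) in pts if v == s}
        new_pts |= {(d, o)
                    for (d, b, f) in loads
                    for (v, ob) in pts if v == b
                    for (ob2, f2, o) in heap if (ob2, f2) == (ob, f)}
        new_heap = {(ob, f, o)
                    for (b, f, a) in stores
                    for (v1, ob) in pts if v1 == b
                    for (v2, o) in pts if v2 == a}
        if new_pts <= pts and new_heap <= heap:
            return pts, heap
        pts |= new_pts
        heap |= new_heap


# Toy program (statement order is ignored, since the analysis is flow-insensitive):
#   a = new O1;  b = new O2;  c = a;  c = b;  b.f = a;  d = c.f
pts, heap = points_to(
    allocs={("a", "O1"), ("b", "O2")},
    copies={("c", "a"), ("c", "b")},
    stores={("b", "f", "a")},
    loads={("d", "c", "f")},
)
print(sorted(pts))
```

Note that `c` ends up pointing to both `O1` and `O2`: the analysis keeps every value `c` might hold, which is the overapproximation mentioned above. The `heapPointsTo` and `pointsTo` rules depend on each other, so this is genuinely recursive.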

Souffle is very good, and logica has a nice install story. https://colab.research.google.com/drive/1KJ6xKSwpw5FWWkvUOyB... . There are many other interesting datalog systems.

- https://www.philipzucker.com/notes/Languages/datalog/

- https://www.philipzucker.com/notes/Logic/answer-set-programm...

grepexdev · on Jan 21, 2024

I thought that syntax looked familiar! Looks like Logseq uses Datalog for advanced queries.

https://hub.logseq.com/features/av5LyiLi5xS7EFQXy4h4K8/getti...

packetlost · on Jan 21, 2024

More specifically, Logseq uses DataScript, a Datomic-inspired Datalog engine for ClojureScript.

cldwalker · on Jan 21, 2024

Logseq does! We at Logseq also believe in giving back, so we sponsored learndatalogtoday.org when it went down - https://github.com/sponsors/jonase . I'd encourage others to do the same if they appreciate the author for making datalog more accessible.

achileas · on Jan 21, 2024

They do! IIRC they were inspired by Roam’s use of it with Clojurescript

dang · on Jan 21, 2024

Learn Datalog - https://news.ycombinator.com/item?id=19154997 - Feb 2019 (1 comment)

Learn Datalog Today - https://news.ycombinator.com/item?id=17109105 - May 2018 (2 comments)

Learn Datalog Today - https://news.ycombinator.com/item?id=14434457 - May 2017 (10 comments)

Learn Datalog Today Ported to DataScript and Clojure (JVM) - https://news.ycombinator.com/item?id=13037199 - Nov 2016 (1 comment)

Learn Datalog Today – An interactive Datomic query tutorial - https://news.ycombinator.com/item?id=6171722 - Aug 2013 (7 comments)

FreeFull · on Jan 21, 2024

It's a shame that there doesn't seem to be any decent open-source implementation of Datalog. If you go for full Prolog instead of Datalog, there are several (Scryer Prolog being my personal favourite).

kjqgqkejbfefn · on Jan 21, 2024

1. Datomic - While not open-source, it has an open-source version called Datomic Free, which is a distributed database designed to enable scalable, flexible, and intelligent data storage and queries. Datomic's query language is closely inspired by Datalog.

2. DataScript - An open-source in-memory database and query engine for Clojure, ClojureScript, and JavaScript that is heavily influenced by Datalog and Datomic.

3. Crux (now XTDB) - A bitemporal database with Datalog-inspired querying capabilities. It is designed for efficient querying of historical data and offers ACID transactions.

4. Racket's miniKanren - While not strictly a database, miniKanren is an open-source logic programming extension to the Racket language, which is inspired by Datalog and can be used to manipulate and query data in a manner similar to Prolog.

5. LogicBlox - An open-source platform that combines a database system, a Datalog-based modeling language, and application server facilities. It allows developers to build complex, data-intensive applications.

6. Soufflé - A Datalog-inspired language that is designed for static analysis problems. It can be viewed as a database query language with a focus on performance, allowing for parallel execution of queries.

7. Dedalus - A Datalog-like temporal logic language used to express complex distributed systems. It is primarily a research tool but has informed the design of other Datalog-inspired systems.

8. Flora-2 - An open-source object-oriented knowledge representation and reasoning system that integrates a variant of Datalog with objects and frames.

The top 3 are from the Clojure ecosystem. Additionally, in this same space there are Datalevin & Datahike, among many others.

lukev · on Jan 21, 2024

Is LogicBlox open-source now? I encountered it on a project several years ago and at that point it was very much closed/commercial.

Now the website isn't even loading... has the project been shuttered? I know LogicBlox was acquired by Predictix a long time ago, and recently Infor acquired Predictix. Hoping the project is still a going concern, there was some very cool tech in there.

namingisharder · on Jan 21, 2024

LogicBlox and Predictix merged. Predictix was just a company created solely to be a customer of LogicBlox. In the end it couldn’t survive on its own. Infor acquired LogicBlox and then reneged on some agreements, leading to a mass exodus. Infor made zero effort to sell LogicBlox technology or solutions, and then essentially threw their hundreds of millions away. It may never see the light of day again.

persnickety · on Jan 21, 2024

Cozo uses Datalog for queries, and has several backends, including SQLite

cmrdporcupine · on Jan 21, 2024

Cozo is very attractive. I just wish there was a native Rust DSL API for it, so it could be embedded in Rust programs without using datalog queries in strings.

soraki_soladead · on Jan 21, 2024

https://github.com/cozodb/pycozo/blob/main/pycozo/test_build...

Here's the python version of what I think you're looking for. Shouldn't be too difficult to port to rust.

cmrdporcupine · on Jan 21, 2024

ok but that's not what i want.

the thing is written in Rust, but it does not expose a Rust query API; you have to query it through Datalog queries in strings. what you shared there just builds those strings from Python. it'd be nice to have a directly native API, with Horn clauses constructed in Rust.

macmac · on Jan 21, 2024

Ad 1. All versions of Datomic are now free, but none are Open Source.

summarity · on Jan 21, 2024

Also Rego, which is Datalog with structured extensions, in use everywhere where OPA is used (as in many k8s environments)

blagund · on Jan 22, 2024

Note: Rego targets near-linear execution speed, so it can't do recursion in general, except for limited tree-like cases for, say, ACL checks.

That's as intended, just saying don't expect it has full datalog power.

It deals with JSON data nicely though (which my favorite, Souffle, does not).

cmrdporcupine · on Jan 21, 2024

Another one is Differential Datalog, for streaming data.

https://github.com/vmware/differential-datalog

habitue · on Jan 21, 2024

I'm sad their last commit was 2 years ago, seemed like a really cool idea

tylerhou · on Jan 21, 2024

The authors spun it out into a startup, Feldera. A paper describing their idea also won Best Paper at VLDB 2023. The idea is very far from dead.

cmrdporcupine · on Jan 21, 2024

Neat. Had run into them before (the "careers" page was marked as visited in my Firefox history ;-) ), but didn't make the connection.

habitue · on Jan 22, 2024

That's awesome

j-pb · on Jan 21, 2024

Are you sure that LogicBlox is open-source? I couldn't find anything confirming this.

I'd be very surprised if they were, because they even patented their join algorithm.

cmrdporcupine · on Jan 21, 2024

It's definitely not open source.

Not only is the join algorithm patented, but my understanding is the original authors of it can't even use it, because the LogicBlox IP was acquired but the people moved on.

But some have since gone on to create new stuff @ RelationalAI

namingisharder · on Jan 21, 2024

Nine more years to go. But there has been plenty of quality research on worst case optimal joins in the meantime.

odipar · on Jan 21, 2024

CodeQL is another datalog with the domain of code analysis as its use case. Too bad you cannot create a custom fact database with CodeQL. Otherwise, the implementation of CodeQL is pretty advanced and efficient.

infima · on Jan 21, 2024

While not trivial because it is not documented, you can create a database with your own facts. Some of the extractors that create the required files are open source: https://github.com/github/codeql/blob/main/ruby/extractor/sr...

blagund · on Jan 22, 2024

Souffle deserves a high ranking. It really is a nice datalog. If only field reference was possible, instead of excessive tuple pattern matching. (Though, one could write a preprocessor for it, it is just syntax).

Differential datalog is also nice, if you have the use case.

marcle · on Jan 21, 2024

ErgoAI, "an enterprise-level extension of the Flora-2 system", was recently open-sourced: https://github.com/ErgoAI . It seems to be well documented.

kevindamm · on Jan 21, 2024

Also GDL and its variants, but that is more of a domain-specific language for game descriptions and general game-playing runtimes. Still, they refer to Datalog as its basis.

KingMob · on Jan 22, 2024

#1 is wrong. Datomic Free is not open source, either. The Apache License they mention applies only to the binaries, which is confusing, if not actually misleading.

nezaj · on Jan 21, 2024

Here’s a blog post showing you how to roll your own in ~100 lines of JS

https://www.instantdb.com/essays/datalogjs

namingisharder · on Jan 22, 2024

Except that is not remotely Datalog.

Would folks please stop publishing nonsense about simple relational algebra implementations claiming they are Datalog, even though there is no recursion.

This is like seeing an article about how to implement context free parsing and finding one about implementing regular expressions with DFAs.

Jonovono · on Jan 21, 2024

CozoDB: https://github.com/cozodb/cozo

dagipflihax0r · on Jan 21, 2024

Mangle https://github.com/google/mangle is an open-source implementation in Go, and making it easy to learn was an explicit goal. That means the pure datalog part is easy to recognize, and the syntax follows the good old course material.

It was discussed here: https://news.ycombinator.com/item?id=33756800

cmrdporcupine · on Jan 21, 2024

https://en.wikipedia.org/wiki/Souffl%C3%A9_(programming_lang...

https://github.com/souffle-lang/souffle

manu3000 · on Jan 21, 2024

You can use Datalog within Flix https://flix.dev/

refset · on Jan 21, 2024

For comparison, I previously translated that cart parts scheduling example on the Flix homepage to Datomic-style Datalog syntax: https://gist.github.com/refset/21b3fc1dec9a6928943073809e133...

SJC_Hacker · on Jan 21, 2024

I learned a bit of Datalog back in university, too many years ago. It was impressive how powerful the query language is. You could do in a single line what required several lines of SQL, and far more intuitively.

But ... the problem is how many DBs support it, and how useful of a skill it is to know.

account-5 · on Jan 21, 2024

For the idiot in the thread, why would I use datalog (which I've never heard of before) over SQL?

Having looked quickly at it just now it seems (Wikipedia article) similar to Web Ontology Language (OWL), though I believe datalog may have been around long before owl.

brendanyounger · on Jan 21, 2024

On a syntax level, parsing, generating, and templating datalog is _much_ simpler than doing the same to SQL. DBT would never exist if every SQL database accepted datalog queries and SQL injection attacks would be rare to non-existent.

The more interesting answer is to think of datalog as making it easy to encode nearly all of your application logic as a bunch of self-referencing, incrementally updated, materialized views. Some examples:

  # view of Users table for currently logged in user
  LoggedInUserView(name, email, id) :- Users(id: payload["userId"], name, email), Cookies(name: "login", payload).

  # view of Users for admin
  AdminUserView(name, email, id) :- Users(id, name, email), Cookies(name: "login", payload), payload["isAdmin"] = true.

  # posts a user can see
  PostsView(title, content, id) :- Posts(id, title, content, public: true).
  PostsView(title, content, id) :- Posts(id, title, content, author: payload["userId"]), Cookies(name: "login", payload).

And then you write your UI code to explicitly reference these derived views rather than manually wrapping an API around querying the Posts table and doing the filtering.
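To make the semantics of those rules concrete, here is a rough Python sketch that evaluates them over hypothetical in-memory tables (all table contents and helper names are invented for illustration). It recomputes each view from scratch on every call; an engine built around these rules would instead keep the views incrementally up to date as rows change, which this sketch does not attempt.

```python
# Hypothetical in-memory tables; column names follow the rules above.
users = [
    {"id": 1, "name": "ada", "email": "ada@example.com"},
    {"id": 2, "name": "bob", "email": "bob@example.com"},
]
posts = [
    {"id": 10, "title": "hello", "content": "hi all", "public": True, "author": 2},
    {"id": 11, "title": "draft", "content": "wip", "public": False, "author": 1},
    {"id": 12, "title": "notes", "content": "private", "public": False, "author": 2},
]
cookies = [{"name": "login", "payload": {"userId": 1, "isAdmin": False}}]


def logins(cookies):
    """Cookies(name: "login", payload): the payloads of login cookies."""
    return [c["payload"] for c in cookies if c["name"] == "login"]


def logged_in_user_view(users, cookies):
    # LoggedInUserView(name, email, id) :- Users(id: payload["userId"], ...), Cookies(name: "login", payload).
    return {(u["name"], u["email"], u["id"])
            for u in users for p in logins(cookies) if u["id"] == p["userId"]}


def admin_user_view(users, cookies):
    # AdminUserView(name, email, id) :- Users(...), Cookies(name: "login", payload), payload["isAdmin"] = true.
    return {(u["name"], u["email"], u["id"])
            for u in users for p in logins(cookies) if p["isAdmin"]}


def posts_view(posts, cookies):
    # Two rules with the same head: the view is the union of both rule bodies.
    public = {(p["title"], p["content"], p["id"]) for p in posts if p["public"]}
    own = {(p["title"], p["content"], p["id"])
           for p in posts for s in logins(cookies) if p["author"] == s["userId"]}
    return public | own


print(sorted(posts_view(posts, cookies)))
```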

The examples above can be neatly replicated in Supabase or Postgraphile (the OG of auto-generated GraphQL over Postgres), but you can do a lot more with datalog as a language. The Hellerstein paper mentioned above is a good starting place.

7thaccount · on Jan 22, 2024

I can't really understand this without seeing the equivalent SQL as I don't understand datalog.

Is this SQL query similar to the first datalog query you listed? I apologize for how HN formatted the below and my lack of understanding for how to get around it.

Select u.name, u.email, u.id From Users u Join Cookies c on u.name = c.name

brendanyounger · on Jan 22, 2024

Almost. What I wrote is more like:

  create view LoggedInUserView as
  select u.id as id, u.name as name, u.email as email
  from Users u
  join Cookies c on c.name = 'login' and u.id = c.payload->>'userId';

where payload is a json blob. (Indent code by 2+ spaces.)

Most people wouldn't design their schema in a SQL database like this with a bunch of special-use relations/views, but datalog encourages you to do so since defining a new relation is the only means of abstraction. In effect, you've created an API endpoint similar to /api/auth/users and, what's more, you can use the LoggedInUserView in other rules to define new relations.

refset · on Jan 21, 2024

Datalog can be very effective for expressing certain kinds of problems and for generating efficient solutions to those problems. This applies particularly to anything that is even mildly recursive, and therefore especially to "knowledge graphs" that rely heavily on rules to infer, model and retrieve information. However, if your problem domain amounts to CRUD storage without a need for complex recursion, then mature SQL systems usually have all the advantages (aside from the syntax!). For a more formal answer:

> The intersection of databases, logic, and artificial intelligence gave rise to deductive databases. Deductive database systems are database management systems built around a logical model of data, and their query languages allow expressing logical queries. A deductive database system includes procedures for defining deductive rules which can infer information (in the so-called intensional database) in addition to the facts loaded in the (so-called extensional) database. The logic model for deductive databases is closely related to the relational model and, in particular, to the domain relational calculus. Datalog is the most known deductive query language (which syntactically is a Prolog subset) where constructed terms are not allowed, nor are other non-declarative constructs such as the cut.

> Also following the relational model, relational database systems are well-known and widespread nowadays. Their formal query languages include relational algebra and relational calculi but, in practical systems, the de-facto and ANSI/ISO standard SQL is the language of choice of every relational database vendor. Whilst SQL and relational formal languages implement a limited form of logic, deductive database languages implement advanced forms of logic.

https://www.fdi.ucm.es/profesor/fernan/des/html/manual/manua...
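For readers comparing with SQL: linear recursion like the `path` rules in the clingo example can also be written as a recursive common table expression. A small sketch using Python's built-in `sqlite3` module (the table and column names are my own choice):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
conn.executemany("INSERT INTO edge VALUES (?, ?)", [(1, 2), (2, 3)])

# SQL counterpart of:
#   path(X,Y) :- edge(X,Y).
#   path(X,Z) :- edge(X,Y), path(Y,Z).
rows = conn.execute(
    """
    WITH RECURSIVE path(src, dst) AS (
        SELECT src, dst FROM edge
        UNION  -- set semantics: duplicates are dropped, so cycles terminate
        SELECT e.src, p.dst FROM edge AS e JOIN path AS p ON e.dst = p.src
    )
    SELECT src, dst FROM path ORDER BY src, dst
    """
).fetchall()
print(rows)  # [(1, 2), (1, 3), (2, 3)]
```

This works neatly because the recursion is linear (one recursive reference per step). SQLite, like many SQL engines, restricts how a recursive CTE may reference itself, which is roughly where rule sets that depend on each other become more awkward in SQL than in Datalog.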