Looks very promising! We've looked at Cockroach for a particular project, but we were concerned that performance wasn't good enough.
Cockroach performance seems to scale linearly, but single-connection performance, especially for small transactions, seems rather dismal. Some casual stress testing against a 3-node cluster on Kubernetes showed that small transactions modifying a single row could take as much as 7-8 seconds, where Postgres would take just a few milliseconds.
The documentation recommends that you batch as many updates as possible, but obviously that doesn't work for low-latency applications like web frontends that need to be able to do small, fine-grained modifications.
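For reference, the batching the docs recommend looks roughly like this: one multi-row statement per round trip instead of one statement per row. A minimal sketch with psycopg2; the table, columns, and connection string are all made up, and UPSERT is CockroachDB-specific syntax:

```python
# Minimal sketch: one multi-row UPSERT instead of N single-row round trips.
# Table "kv", columns "k"/"v", and the connection string are hypothetical.
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("postgresql://root@localhost:26257/test")
rows = [(i, "value-%d" % i) for i in range(1000)]

with conn, conn.cursor() as cur:
    # One statement and one commit for all 1000 rows,
    # instead of paying commit latency 1000 times.
    execute_values(cur, "UPSERT INTO kv (k, v) VALUES %s", rows)
```

Which is exactly what you can't do when a web frontend needs each small write acknowledged on its own.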
7-8 seconds seems extremely long. Human beings performing the raft consensus algorithm using paper and pencil over Skype wouldn't be much slower than that. Are you sure everything was working correctly?
7-8 seconds? Something definitely sounds misconfigured. I've been running a 1.1.x cluster for quite a while and I've never seen a single-row transaction take that long. Even the slowest queries took at most ~500ms, and that was with:
- Replication factor increased to 5x (rather than the 3x default)
- 8 indexes on the table being modified, which also needed to be updated
- Nodes spread across North America, incurring higher RTT latency between nodes
- Relatively high contention on the data triggering client-side retries (see the retry-loop sketch after this list)
- HDDs as the storage medium (RocksDB is optimized for SSDs)
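For what it's worth, those client-side retries follow the SAVEPOINT-based protocol the CockroachDB docs describe for serialization errors (SQLSTATE 40001). A rough psycopg2 sketch of that loop, adapted from memory of the docs' example rather than copied from it:

```python
import psycopg2
import psycopg2.errorcodes

def run_with_retry(conn, op):
    """Run op(cur) inside a transaction, retrying on 40001 restarts."""
    with conn:  # opens a transaction, commits (or rolls back) on exit
        with conn.cursor() as cur:
            cur.execute("SAVEPOINT cockroach_restart")
            while True:
                try:
                    op(cur)  # the transaction body, e.g. a single-row UPDATE
                    cur.execute("RELEASE SAVEPOINT cockroach_restart")
                    return
                except psycopg2.Error as e:
                    if e.pgcode != psycopg2.errorcodes.SERIALIZATION_FAILURE:
                        raise
                    # Contention forced a restart; rewind and re-run the body.
                    cur.execute("ROLLBACK TO SAVEPOINT cockroach_restart")
```

Under heavy contention on a single row, this loop can spin many times, which is one way a nominally fast transaction can balloon into seconds of observed latency.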
" small transactions modifying a single row could take as much as 7-8 seconds"
That's surprising. I wasn't expecting CockroachDB to be really fast, given the constraints they work within. But that sounds more like a bug or config error. Unless perhaps you mean a really high number of processes trying to update the same row at the same time? Like a global counter or something?
Indeed, the stress test updates just one row, which mirrors certain write patterns in our application. I just started this testing, so we'll see what happens when I extend it to more than one row.
We have a collaborative, Google Docs-like application that currently issues a write every time someone types into a text field. Now, clearly that's suboptimal, and the updates should be batched, but on the other hand, with Postgres we've had zero incentive to make that optimization, because it handles thousands of writes per node in real time with no queuing happening on the client. I don't expect this from Cockroach, but I would definitely want low latency.
Lordy, relational databases are not the way to go for that problem... With a single shared resource (document), you're going to be encountering write conflicts left and right.
Have you explored implementing a CRDT-based solution like WOOT instead?
Definitely. The application is conceptually a transaction log of field/subfield patches, which would lend itself to something like an LSM, and we're looking at possible alternatives.
CRDTs could be a solution, but from what I gather they require too much context information to be viable for a text editing application. Our app currently uses something similar to OT.
> issues a write every time someone types into a text field
With more than a handful of people, this is getting into conflict territory pretty rapidly, especially if the document is structured as a single row (hopefully it's more granular than that). Time for some back-of-the-envelope maths:
Assuming that an average person types at around 200 words per minute (number pulled from https://www.livechatinc.com/typing-speed-test/#/), that's a word every 300ms, or roughly a character every 60ms at the usual ~5 characters per word. With 10 people editing the document, that's a character every 6ms on average, which can easily lead to conflicts if they're all trying to update the same resource.
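Spelled out, with the ~5 characters per word convention (an assumption on my part):

```python
# Back-of-the-envelope keystroke rate for a shared document.
wpm = 200             # typing speed per person (from the test linked above)
chars_per_word = 5    # common convention; an assumption
editors = 10

ms_per_char = 60_000 / (wpm * chars_per_word)  # per person: ~60 ms
ms_per_char_shared = ms_per_char / editors     # across 10 editors: ~6 ms
print("%.0f ms/char per person, %.0f ms/char across %d editors"
      % (ms_per_char, ms_per_char_shared, editors))
```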
Perhaps it's event-sourcing based: every time someone types into a field, it writes a row recording what was typed. Play it back and you have the full document with no conflicts.
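If that's the idea, a hypothetical shape might look like this (inserts only, deletes and concurrent-edit ordering hand-waved away; all names made up):

```python
# Event-sourced field edits: an append-only log, replayed in order.
from dataclasses import dataclass

@dataclass
class KeyEvent:
    seq: int     # global order, e.g. assigned by the database at insert
    field: str   # which text field was edited
    pos: int     # insertion offset within the field
    char: str    # the character typed

def replay(events):
    """Rebuild every field by applying insertions in log order."""
    fields = {}
    for e in sorted(events, key=lambda e: e.seq):
        text = fields.get(e.field, "")
        fields[e.field] = text[:e.pos] + e.char + text[e.pos:]
    return fields

log = [KeyEvent(1, "title", 0, "H"), KeyEvent(2, "title", 1, "i")]
print(replay(log))  # {'title': 'Hi'}
```

The catch is that "no conflicts" only holds once everyone agrees on the event order, which is exactly the part OT and CRDTs exist to solve.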
I like what Cockroach is doing, I'm rooting for them to grow and survive. Unfortunately the only time I hear about it is when they post blogs. I never hear about it from other people.
*raises hand* We're using them extensively. They're our database of choice, which we've paired with Nakama[1], an open-source, distributed server. We have nothing but great things to say about the database itself in terms of performance growth and the team behind it :). They've been great to us since day one.
A couple of our use-cases: good KV access (stored user data, etc.) and listing blocks of data pre-sorted on disk at insert time (leaderboard records, chat message history, etc.). The clustering technology is also particularly useful at scale. We work in the games space with some very large games in production, and clustering lets us spread the load across multiple database nodes and gives us peace of mind regarding redundancy.
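To give a flavor of the pre-sorted listing pattern (a guess at the shape, not our actual schema): CockroachDB stores rows ordered by primary key, so a composite key turns a top-N leaderboard read into a contiguous prefix scan rather than a sort.

```python
# Hypothetical leaderboard schema; all names invented for illustration.
# Rows are laid out on disk in primary-key order, so the top-N query
# below reads a contiguous span. (If your CockroachDB version doesn't
# accept DESC inside the primary key, a DESC secondary index on
# (board_id, score) achieves the same layout.)
DDL = """
CREATE TABLE IF NOT EXISTS leaderboard (
    board_id INT,
    score    INT,
    user_id  STRING,
    PRIMARY KEY (board_id, score DESC, user_id)
)
"""

TOP_N = """
SELECT user_id, score
FROM leaderboard
WHERE board_id = $1
ORDER BY score DESC
LIMIT $2
"""
```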
The thing I really don't get is why CockroachDB avoids benchmarking against its rival TiDB (https://github.com/cockroachdb/docs/issues/1412). TiDB is already pretty mature and used by many big companies (say, Didi, which operates at a similar data scale to Uber, and some banks).
Even though I like CockroachDB's pg-flavored SQL more, it would be helpful to have the comparison/benchmark to show something more.
TiDB looks promising, but it doesn't have serializable transactions at all, which makes it something of an apples-to-oranges comparison at the moment when it comes to OLTP.
TiDB has a weird kind of variation on "read committed" where you get phantom reads (though they're not called that in the documentation, which is actually ambiguous on this point). This is a problem for apps that expect consistency.
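For anyone unfamiliar with the term: a phantom read is when a repeated range query inside one transaction sees rows that another transaction committed in between. A hypothetical two-connection sketch (table and connection string invented; whether the phantom actually shows up depends on the engine's isolation level, which is the point of contention here):

```python
# Hypothetical phantom-read probe. Under serializable isolation the two
# counts must agree; under weaker levels the concurrent insert can appear.
import psycopg2

DSN = "postgresql://app@db-host:5432/test"  # placeholder connection string
a = psycopg2.connect(DSN)                   # the observing transaction
b = psycopg2.connect(DSN)                   # the concurrent writer
b.autocommit = True

with a, a.cursor() as cur_a, b.cursor() as cur_b:
    cur_a.execute("SELECT count(*) FROM orders WHERE total BETWEEN 10 AND 20")
    before = cur_a.fetchone()[0]

    # Another transaction commits a row inside the scanned range.
    cur_b.execute("INSERT INTO orders (total) VALUES (15)")

    cur_a.execute("SELECT count(*) FROM orders WHERE total BETWEEN 10 AND 20")
    after = cur_a.fetchone()[0]

print("phantom read!" if after != before else "no phantom observed")
```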
I guess we'll have to wait. The progress they've made is obviously impressive, but it would really help if one could understand the overhead vs. a conventional RDBMS: 5x might be OK, 20x not so much.
I think you can still derive some insights. I clicked on the TPC-C results you shared and read their executive summaries.
The Oracle on SPARC cluster (at the top, 2010) performs 30.2M qualified tx/min vs the 16K tx/min in this blog post. The Oracle cluster also costs $30M, which is clearly higher than the Cockroach cluster's cost.
That said, the TPC-C benchmark is new to me. Happy to update this comment if I'm misreading the numbers.
Looking at the TPC-C page, all the benchmarks seem quite old and only reflect commercial databases. Do you have any recent TPC-C benchmarks for OLTP databases such as Postgres, MySQL, and Cassandra, so I can compare with CockroachDB?
A short note that the total cost of that SPARC cluster was $30 million. You're not misreading those numbers, but it requires a little context.
We're focusing today on our improvements over CockroachDB 1.1, using a small-ish cluster. We'll be showing some more scalability with larger clusters in the coming weeks. If you've found CockroachDB performance slow in the past, you will be pleasantly surprised with this release!
Sure thing. I was primarily answering the question above - in terms of how the numbers in the TPC-C benchmark fit in. I updated my comment to reflect the cost.
I think what's interesting with TPC-C is that you can sort the results based on performance or price/performance. On the price/performance metric, SPARC looks expensive. Dell has a $20K SQL Anywhere cluster that can do 113K tx/min.
I wonder if anyone tried to run these benchmarks on the cloud and how one would calculate total cost of ownership there now.
Yeah, but 1700 cores' worth. That's still a lot of $300 boxes. Like 53 SPARC T3-2s, for example, which seem to go for $1200 to $2K on eBay. And unsupported, end-of-life, etc.
I'd compare CockroachDB's number to some more recent result with a similar number of cores. (If you can find one)
I'd love to hear from someone who has implemented this in production. Seems like really cool tech, but haven't had a chance to use it on a project yet.
Using it in production currently, with dual-write and dual-read to compare perf. I'll do a write-up showing how Cockroach performs compared to Citus and Cassandra for my use case.
CockroachDB is not (yet) ready for use on OLAP-style workloads. Our performance work has focused on OLTP workloads so far. That said, we do great on OLTP joins (which are stressed in the TPC-C workload).
You're not going to get better performance for OLAP than MemSQL's columnstore and in-memory rowstore for reference tables to join.
Citus is great if you want the Postgres interface but is still using standard rowstore tables. CockroachDB is similar with rowstore performance but with added distributed consensus overhead. They are both much better for OLTP and sharding. CockroachDB also provides easy high-availability and replication.
Yeah, this is what we do. Citus is our single source of truth and powers a few interactive apps and admin panels. These sync hourly to our MemSQL cluster, which is columnstore + reference tables and works amazingly.
Licensed by total RAM of all nodes. $25k/year minimum license now, but you should still talk to them if you're a small company. Regardless of price, I highly recommend the product as one of the most polished data warehouses available for on-prem/self-managed operations.
Sure, but it's far more expensive and not as generally usable as the mysql-flavored MemSQL for common data warehouse scenarios. Performance will be similar but there are differences in functionality like kdb's asof joins that can't really be compared.
kdb+ is much better for numeric/financial analysis apps, especially when used with the integrated query language and interpreter environment.
edit: Looking into it even further, I agree with the co-author's response here that TPC-C is still an appropriate metric. TPC-E is different and newer but still not as widely used.
I don't think it's true to claim that TPC-C is obsolete and subsumed by TPC-E. They are both OLTP benchmarks, with different characteristics: TPC-C is more write-heavy, TPC-E is far more read-heavy. It's true that TPC-E is newer, but that doesn't deprecate TPC-C (the way TPC-A, for instance, is now deprecated).
We chose TPC-C because it's far better understood than TPC-E in 2018. We wanted to provide understandable benchmarks that can be put into context with other databases. Other databases report TPC-C numbers, so we chose to do so as well.
It seems not to be used much anymore. Follow that link (http://www.tpc.org/tpcc/results/tpcc_results.asp?print=false...) and sort by either score or price/performance. The vast majority of top results are a decade old or more. I couldn't find anything less than 5 years old without going to the second/third pages.
And the top results are usually clusters with a crazy-high number of cores. The Sun example was over 1700 cores.
The problem, I think, is that it costs money and red tape to submit results, so vendors run their own, and you kinda have to take their word for it or reproduce the results yourself.
That makes sense. Probably TPC-C died after Oracle basically killed off Sybase and Informix. No more well funded competition to keep up the pace. And no multitude of RISC vendors trying to fend off Linux/X86.
The open source databases didn't play that game, so TPC-C became irrelevant.
Too bad there isn't a good way to directly compare the healthy survivors.
How much and what kind of memory and storage (SATA SSD, NVMe SSD, HDD?) is included in the 3 nodes used for testing? This benchmarking is really interesting, but the next level is to understand the cost per tpmC measured. Memory especially, and storage, are big components of cost these days.
Short answer: 3 n1-highcpu-16 GCE VMs with Local SSDs attached. We're working on a complete disclosure document, with comprehensive reproduction steps to replicate all our numbers. This document should be out in a couple of weeks. We want to walk you through, command by command, on how to reproduce these numbers, and verify the results for yourself.
Thanks for the short answer. It would be good to know how many local SSDs are attached, though, for the 850-warehouse scenario. The TPC-C documentation says each warehouse maintains 100,000 items in its stock, but I can't surmise from that how much storage is required to hold 850 warehouses' worth of data. I'm impatient, though, so let me try to work through the #s myself. I'm using GCP's monthly reserved pricing in the US-Iowa region as a reference, at today's pricing.
A n1-highcpu-16 GCE VM costs $289.84/month. Local SSDs are added at 375GB per drive and cost $30/month at $0.08 per GB. I highly doubt you could fit the ~1250 warehouses (what got you the peak tpmC) on a 375GB local SSD, but I have to make assumptions here! So now you're paying $319.84 per instance per month, or $959.52 for 3 of these instances.
At 16,150 tpmC, you're paying roughly $0.06 per tpmC, or, looking at it the other way, you're getting 16.83 tpmC per dollar spent each month. Is that good? I don't know!
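The arithmetic, spelled out with the assumptions above:

```python
# Cost-per-tpmC back-of-the-envelope, using the GCP figures above.
vm_monthly = 289.84        # n1-highcpu-16, monthly reserved
ssd_monthly = 375 * 0.08   # one 375GB local SSD at $0.08/GB = $30
nodes = 3
tpmc = 16_150

total = (vm_monthly + ssd_monthly) * nodes  # $959.52/month for the cluster
print("$%.4f per tpmC, %.2f tpmC per dollar" % (total / tpmc, tpmc / total))
# -> $0.0594 per tpmC, 16.83 tpmC per dollar per month
```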
Now, the really interesting question is: is tpmC/$ on CRDB 2.0 actually better than tpmC/$ on CRDB 1.1? The answer lies in how many local SSDs you have to provision to reach that peak throughput. Peak is at ~1300 warehouses on CRDB 2.0 and ~800 warehouses on CRDB 1.1.
Does anyone with more knowledge here know how much storage you need per warehouse in the TPC-C test?
Each warehouse requires about 80 megabytes of storage, unreplicated. 1250 warehouses * 80 MB * 3-way replication = 300 GB, which comfortably fits in a 3-node cluster with 1 local SSD each.
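Which indeed pencils out against the SSD size mentioned above:

```python
# Storage math: replicated TPC-C data per node vs one 375GB local SSD.
warehouses = 1250
mb_per_warehouse = 80
replication = 3
nodes = 3

total_gb = warehouses * mb_per_warehouse * replication / 1000  # 300 GB
per_node_gb = total_gb / nodes                                 # 100 GB
print("%.0f GB replicated total, %.0f GB per node (375 GB SSD per node)"
      % (total_gb, per_node_gb))
```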
Since you only have 3 nodes, doesn't that mean every range is replicated to every node? Doesn't that make joins trivial (i.e. no different from non-distributed joins)?
I don't like when companies are not transparent about the pricing of their product. If you have a price page, show the price, so that I can decide if this is relevant for me or not...
Enterprise pricing generally scales with the size of your company/budget and how much trouble they think you'll be worth as a customer.
As a rule of thumb, it starts at just above 1000 USD per unit, and goes up from there.
Many contracts are bespoke orders especially when you're dealing with a small company, so you can't have transparency since there isn't a single product.
People complaining about the name, and how they're never going to be able to use it in production because of how gross cockroaches are, is definitely the most recurring point.
I think it worked well for them, since everyone remembers the name, especially with all the distributed stores coming out lately.
So. For me, personally, I don't care about the name. I generally care that it's great tech, and it clearly has a great team behind it.
However....
If I worked at CockroachDB, and I saw the negative feedback around the name, I'd take it to heart. At the end of the day, the name is marketing for the hard work of their engineers, and marketing for the engineers that want to use this DB (remember, they need to sell it to their managers who may not be technical).
This issue can show up in unexpected ways. For example, for cloud providers like Compose (IBM company), would they be comfortable with putting "CockroachDB" on the front page? They might if it's good enough, but it's at least a consideration (i.e. another meeting, another stakeholder to convince).
Or how about an enterprise company that's going through due diligence, and when their client asks them about their tech stack do they say "CockroachDB" or do they obfuscate the name by saying "It's a high-performance distributed database". That's a crucial moment to market CockroachDB, and it could get lost. As sad as it is, saying that you're using MySQL "because Oracle" is a point of leverage for some sales people.
Cockroaches are considered pretty durable right? In the 80s I remember the line was always that after the nukes landed there would only be cockroaches and twinkies left.
>I can't see serious engineers working on a company named "Cockroach".
You used to work at a company called Yammer <rolls eyes>. God forbid they're not called tech.ai.io-ify.
I think it's really funny that this comes up almost every time there's a post about CockroachDB. There were also a lot of people commenting on https://news.ycombinator.com/item?id=16693253 about foul language and such. I also remember being at a big conference and one of the speakers being a little cavalier and dropping an obscenity for emphasis - in the meetup comments people were so deeply offended. And let's not forget people's constant flagellation around brainfuck.
Make no mistake: this is the flavor of conservatism and hypocrisy that tech is home to: pretend to be liberally minded but lash out whenever something is just slightly divergent.
People have been complaining about "The GIMP" for literally decades... (edit: in case this isn't clear, Spencer Kimball, the CEO of CockroachLabs, also created "The GIMP" at Berkeley).
Sadly, names do matter. Cockroach seems to be a great DB from my poking at it, but there's definitely a visceral reaction some people have to the name (myself included) that has to be overcome first.
On the other hand, someone would have to be astonishingly thick (or at best cavalier about their business) to take that seriously into account when deciding whether or not to use it.
They need a cute mascot, with a name. Then when people go "ew, it has a bug in it's name !!1!" we can say, "Aw, what's wrong with bugs? You're making Ricky cry."
I don't think it's just that it has a bug in its name. I would venture to guess that if the name was Wasp DB, this issue wouldn't exist. It has more to do with the disgust trigger that many people have when they think of cockroaches.
>I also remember being at a big conference and one of the speakers being a little cavalier and dropping an obscenity for emphasis - in the meetup comments people were so deeply offended
That's unfortunate, but I hope your post isn't suggesting they shouldn't have named their product as they did.
Should people who don't like large numbers not use Google? Should people who fear fire not use Firebase? Should people who don't like coffee not use Java? Moreover, should those people suggest the names be changed due to their phobia?
We could number all database servers. Server 1, Server 2, Server 3, Server 4... but 4 is unlucky in China, so we can't use that.
Almost no one fears coffee or large numbers in the same way people fear cockroaches, and fire doesn't generally elicit near the same reaction either (there are even types of fires that people react positively to, like campfires, and many people like the smell of fire). They are allowed to name it whatever they want, but it's hard to imagine a name with more negative feelings attached without getting vulgar or ridiculous.
No, not at all. I'm just giving my feedback on why, sadly, I couldn't use a great product due to my phobia of its name; many people might face the same issue but don't bother to give feedback.
(PS: Yes, if your target customers are Chinese, you should avoid using the number 4, especially in real estate.)
Just so you know, it's possible to overcome a phobia. In fact, my startup Fearless helps people do just that, using VR. We have a module for cockroaches specifically. It goes very gradually, starting with a cartoon drawing, and progresses at your own pace. If you're interested: http://FearlessVR.com
Also, I understand your complaint about the name, as I’ve encountered many, many people with all sorts of phobias at varying levels of extremity. It’s more common than most people think.
On behalf of everyone else who puts up with this constant repeated BS for every story about CockroachDB, we get it. Now shut up already and move on. I am one of the downvoters because at this point literally NO ONE cares about your complaint and this constant background whine contributes nothing. If you can't get over the name of a product then just keep it to yourself.
Granted, phobias are no fun and can be debilitating but seeing as the product has been around for 3 yrs, I don't think they have any plans on renaming it.
Quick question, and don't take it the wrong way, as I am truly trying to understand the extent of your commitment to not using it: what if you were to receive a once-in-a-lifetime job offer, and after you start, the team decided to switch to this database? Would you quit? You're not physically working with any cockroaches, so does the phobia extend to even just hearing and/or saying the term? Thx
Yes, just hearing or reading the word "cockroach" makes me cringe. I tried to google "cockroach phobia" just now, and damn, Google shows a bunch of cockroach images that made me press the back button immediately; I couldn't read a damn thing.
I'm not sure what would happen with my job, but given an equal choice I would choose alternatives, job or database.
Why don't you just write a browser script extension to edit out your trigger word? It might be easier than expecting a database company to change the name of what they make.
I like this idea; I wonder if other individuals with similar phobias would benefit from it. Wasn't there one, or perhaps multiple ones, to change variations of Trump's name at one point?
Came here to say the same! I am sorry but this name alone prevents me from exploring this product. Whoever chose that has probably never encountered a cockroach....flying.