Hacker News
CockroachDB 2.0 Performance Makes Significant Strides (cockroachlabs.com)
300 points by awoods187 10 hours ago | 130 comments





Looks very promising! We've looked at Cockroach for a particular project, and we've been concerned that performance wasn't good enough.

Cockroach performance seems to scale linearly, but single-connection performance, especially for small transactions, seems rather dismal. Some casual stress testing against a 3-node cluster on Kubernetes showed that small transactions modifying a single row could take as much as 7-8 seconds, where Postgres would take just a few milliseconds.

The documentation recommends that you batch as many updates as possible, but obviously that doesn't work for low-latency applications like web frontends that need to be able to do small, fine-grained modifications.


7-8 seconds seems extremely long. Human beings performing the raft consensus algorithm using paper and pencil over Skype wouldn't be much slower than that. Are you sure everything was working correctly?

I don't know about you, but it would take me a lot longer than 7 seconds to perform the raft consensus algorithm with paper and pencil.

Who are you to judge? Did you win the Putnam or something?

I think people are generally allowed to judge themselves without industry credentials...

7-8 seconds? Something definitely sounds misconfigured. I've been running a 1.1.x cluster for quite a while and I've never seen a single row transaction take that long. And even the slowest queries took at most ~500ms, and that was with:

  - Replication factor increased to 5x (rather than the 3x default)
  - 8 indexes on the table being modified which also needed to be updated
  - Nodes spread across North America, incurring higher RTT latency between nodes
  - Relatively high contention on the data triggering client-side retries
  - HDDs as the storage medium (RocksDB is optimized for SSDs)
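
For readers unfamiliar with the client-side retries mentioned in the list above, here is a minimal sketch of the general pattern. It uses no real database driver: `RetryableTxnError`, `run_with_retries`, and `flaky_update` are hypothetical names made up for illustration, and the simulated transaction simply fails twice before succeeding. With a real PostgreSQL-compatible driver the retryable condition is typically reported as a serialization failure (SQLSTATE 40001), but how that surfaces depends on the driver.

```python
import random
import time


class RetryableTxnError(Exception):
    """Hypothetical stand-in for a driver's serialization-failure error."""


def run_with_retries(txn, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call txn() until it succeeds or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except RetryableTxnError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Exponential backoff with jitter so contending clients spread out.
            sleep(base_delay * (2 ** (attempt - 1)) * random.random())


# Simulated transaction: fails twice under "contention", then commits.
attempts = {"count": 0}


def flaky_update():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RetryableTxnError("simulated contention")
    return "committed"


result = run_with_retries(flaky_update, sleep=lambda _: None)
print(result, "after", attempts["count"], "attempts")
```

Each retry adds at least one more round trip, which is one reason high contention on a single row inflates latency.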

" small transactions modifying a single row could take as much as 7-8 seconds"

That's surprising. I wasn't expecting CockroachDB to be really fast, given the constraints they work within. But that sounds more like a bug or config error. Unless perhaps you mean a really high number of processes trying to update the same row at the same time? Like a global counter or something?


Indeed, the stress test updates just one row, which mirrors certain write patterns in our application. I just started this testing, so we'll see what happens when I extend it to more than one row.

Did you use the 2.0 beta version or the latest stable release? They improved performance a lot in the 2.0 beta released this month.

I used 1.1.6. Looking forward to testing 2.0.

> low-latency applications like web frontends

...


We have a collaborative, Google Docs-like application that currently issues a write every time someone types into a text field. Now, clearly it's suboptimal and something that should be optimized to batch the updates, but on the other hand, with Postgres we've had zero incentive to make such an optimization, because it's able to handle thousands of writes per node in real time with no queuing happening on the client. I don't expect this from Cockroach, but I would definitely want low latency.

Lordy, relational databases are not the way to go for that problem... With a single shared resource (document), you're going to be encountering write conflicts left and right.

Have you explored implementing a CRDT based solution like WOOT instead?


Definitely. The application is conceptually a transaction log of field/subfield patches, which would lend itself to something like an LSM, and we're looking at possible alternatives.

CRDTs could be a solution, but from what I gather they require too much context information to be viable for a text editing application. Our app currently uses something similar to OT.


Why write conflicts? Contention, sure, but contention isn't an issue until you have literal tons of it.

From OP's description

  > issues a write every time someone types into a text field
With more than a handful of people, this is getting into conflict territory pretty rapidly, especially if the document is structured as a single row (hopefully it's more granular than that). Time for some back of the envelope maths:

Assuming that an average person types at around 200 characters per minute (number pulled from https://www.livechatinc.com/typing-speed-test/#/), that's a character every 300ms on average. With 10 people editing the document, that's a character every 30ms on average, which can easily lead to conflicts if they're all trying to update the same resource.
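
The back-of-the-envelope numbers can be checked in a few lines. This is only a sketch: it assumes a rate of 200 keystrokes per minute per person (the rate that gives one character every 300 ms) and evenly spread typing, and the function name is made up for illustration.

```python
def ms_between_keystrokes(chars_per_minute: float, editors: int = 1) -> float:
    """Average gap in milliseconds between keystrokes across all editors."""
    return 60_000 / (chars_per_minute * editors)


single = ms_between_keystrokes(200)      # one typist
shared = ms_between_keystrokes(200, 10)  # ten typists on the same document
print(f"{single:.0f} ms per character for one editor")
print(f"{shared:.0f} ms per character for ten editors")
```

Real typing is bursty rather than evenly spaced, so the actual gaps between writes will often be shorter than these averages.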


Perhaps it's event sourcing based. Every time someone types into a field it writes a row that something was typed which is a record of what was typed. Play it back and you have the full document with no conflicts.
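
A minimal sketch of that event-sourcing idea, assuming each keystroke is stored as an insert-only row and the document is rebuilt by replaying the rows in order. The `Edit` record and its fields are hypothetical, not anything from OP's application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    seq: int     # position in the log, assigned at insert time
    field: str   # which text field was edited
    offset: int  # character offset where the text was inserted
    text: str    # the characters typed


def replay(log: list[Edit]) -> dict[str, str]:
    """Rebuild every field's contents by applying edits in log order."""
    fields: dict[str, str] = {}
    for edit in sorted(log, key=lambda e: e.seq):
        current = fields.get(edit.field, "")
        fields[edit.field] = (
            current[:edit.offset] + edit.text + current[edit.offset:]
        )
    return fields


log = [
    Edit(1, "title", 0, "Helo"),
    Edit(2, "title", 3, "l"),  # fix the typo by inserting at offset 3
    Edit(3, "body", 0, "world"),
]
print(replay(log))
```

Appending rows avoids write-write conflicts on a single row, but two users editing at stale offsets can still produce text neither intended, which is the problem OT and CRDTs try to address.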

Poor UI choices these days provide no feedback and simply reload, which makes low latency necessary for users to even tolerate them.

I like what Cockroach is doing, I'm rooting for them to grow and survive. Unfortunately the only time I hear about it is when they post blogs. I never hear about it from other people.

*raises hand* We're using them extensively. They're our database of choice that we've paired with Nakama[1], which is an open-source, distributed server. Have nothing but great things to say about the database itself in terms of growing performance and the team behind it :). They've been great to us since day one.

[1] github.com/heroiclabs/nakama


What kind of workload are you using it for? What's been your biggest win while using it?

A couple of our use-cases include: good KV access (stored user data etc.) and listing blocks of data that has been pre-sorted on disk at insert time (leaderboard records, chat message history etc.). As well, the clustering technology is particularly useful at scale. We work in the games space with some very large games in production, which allows us to spread the load across multiple database nodes and offers us peace of mind regarding redundancy.

The thing I really don't get is why CockroachDB avoids benchmarking against its rival TiDB (https://github.com/cockroachdb/docs/issues/1412). TiDB is already pretty mature and is used by many big companies (for example Didi, which handles data on a scale similar to Uber's, and banks).

Even though I like CockroachDB's PostgreSQL dialect more, it would be helpful to have a comparison/benchmark to show something more.


TiDB looks promising, but it doesn't have serializable transactions at all, which makes it something of an apples-to-oranges comparison at the moment when it comes to OLTP.

TiDB has a weird kind of variation on "read committed" where you get phantom reads (though they're not called that in the documentation, which is actually ambiguous on this point). This is a problem for apps that expect consistency.


TiDB is much more complicated to run with several moving pieces. Also missing a lot of standard relational features as you can see from the roadmap: https://github.com/pingcap/docs/blob/master/ROADMAP.md

Project idea: globally hosted / managed CockroachDB that lets developers quickly start building small apps cheaply or free using this database.

This database has the potential to dethrone Spanner in a major way.


There’s arguably nothing to dethrone; Spanner is too cost-prohibitive for small developers in the first place.


How is this meaningful without detailed setup description? http://www.tpc.org/tpcc/results/tpcc_results.asp?print=false... Looking at this list of results one wonders what those results actually mean?

At the bottom of the article it says that information is coming:

"Note: We have not filed for official certification of our TPC-C results. However, we will post full reproduction steps in a forthcoming whitepaper."


I guess we'll have to wait. The progress they made is obviously impressive, but it would really help if one could understand the overhead vs. a conventional RDBMS. 5X might be OK, 20X not so much.

I think you can still derive some insights. I clicked on the TPC-C results you shared and read their executive summaries.

The Oracle on SPARC cluster (at the top, 2010) performs 30.2M qualified tx/min vs the 16K tx/min in this blog post. The Oracle cluster also costs $30M, which is clearly higher than the Cockroach cluster's cost.

That said, the TPC-C benchmark is new to me. Happy to update this comment if I'm misreading the numbers.

(Edited to incorporate the reply below.)


Looking at the TPC-C page all the benchmarks seem quite old and only reflect commercial databases. Do you have any recent TPC-C benchmarks for OLTP databases such as Postgres, MySQL, and Cassandra so I can compare with CockroachDB?

A short note that the total cost of that SPARC cluster was $30 million. You're not misreading those numbers, but it requires a little context.

We're focusing today on our improvements over CockroachDB 1.1, using a small-ish cluster. We'll be showing some more scalability with larger clusters in the coming weeks. If you've found CockroachDB performance slow in the past, you will be pleasantly surprised with this release!


Sure thing. I was primarily answering the question above - in terms of how the numbers in the TPC-C benchmark fit in. I updated my comment to reflect the cost.

I think what's interesting with TPC-C is that you can sort the results based on performance or price/performance. On the price/performance metric, SPARC looks expensive. Dell has a $20K SQL Anywhere cluster that can do 113K tx/min.

I wonder if anyone tried to run these benchmarks on the cloud and how one would calculate total cost of ownership there now.


You do realize it's ancient hardware that's $300-400 USD on eBay now?

http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=11...

Yeah, but 1700 cores worth. That's still a lot of $300 boxes. Like qty 53 Sparc T3-2's for example. Which seem to be $1200 to $2k on eBay. And unsupported, end of life, etc.

I'd compare CockroachDB's number to some more recent result with a similar number of cores. (If you can find one)


I meant the Dell boxes. SPARC-based boxes retain some value.

Congratulations to the cockroach team for putting out an awesome product :)

Would be great to see how it compares against postgres in similar scenarios.


I'd love to hear from someone who has implemented this in production. Seems like really cool tech, but haven't had a chance to use it on a project yet.

Using it in production currently with dual-write and dual-read to compare perf. I'll do a write-up showing how Cockroach performs compared to Citus and Cassandra for my use case.

We use Citus and Memsql (big data analytics use cases). How does Cockroach handle joins and other OLAP style queries?

CockroachDB is not (yet) ready for use on OLAP-style workloads. Our performance work has focused on OLTP workloads so far. That said, we do great on OLTP joins (which are stressed in the TPC-C workload).

You're not going to get better performance for OLAP than MemSQL's columnstore and in-memory rowstore for reference tables to join.

Citus is great if you want the Postgres interface but is still using standard rowstore tables. CockroachDB is similar with rowstore performance but with added distributed consensus overhead. They are both much better for OLTP and sharding. CockroachDB also provides easy high-availability and replication.


Yeah this is what we do. Citus is our single source of truth and powers a few interactive apps and admin panels. These sync hourly to our Memsql cluster which is cstore + ref tables and works amazingly.

MemSQL is one of those "ask for a quote" products. What are some rule of thumb estimates for what it costs?

Licensed by total RAM of all nodes. $25k/year minimum license now, but you should still talk to them if you're a small company. Regardless of price, I highly recommend the product as one of the most polished data warehouses available for on-prem/self-managed operations.

$25K/year/box from previous comments

kdb+?

Sure, but it's far more expensive and not as generally usable as the mysql-flavored MemSQL for common data warehouse scenarios. Performance will be similar but there are differences in functionality like kdb's asof joins that can't really be compared.

kdb+ is much better for numeric/financial analysis apps, especially when used with the integrated query language and interpreter environment.


I'm using MemSQL's columnstore myself and the performance is nothing short of amazing. I migrated from Citus DB to it for OLAP workloads.

But for what reasons are you using Citus as well? Would like to know if I am missing something or hear another perspective.

Can you explain your use case? Thanks


We use a ClickHouse cluster with 1000 nodes and 50,000 GB of clickstream data.

That's only 50 GB per node. Why do you/ClickHouse need so many nodes?

Please post your write-up on HN! I would love to read about CockroachDB performing in the real world.

Please do, that would be a very popular read I think.

Please do!

Works great, just a tad slow. Hopefully this improves things. Deploying with Kubernetes is pretty seamless as well.

Great stuff. I appreciated being educated about TPC-C, and the whole spirit of not focusing on vanity benchmarks!

Same here, but in educating myself more I found that TPC-C seems to be a somewhat obsolete metric compared to TPC-E (see https://stackoverflow.com/questions/9246939/what-is-the-diff...). Why use the old one here?

edit: Looking into it even further, I agree with the co-author's response here that TPC-C is still an appropriate metric. TPC-E is different and newer but still not as widely used.


I don't think it's true to claim that TPC-C is obsolete and subsumed by TPC-E. They are both different OLTP benchmarks, with different characteristics. TPC-C is more write heavy, TPC-E is far more read heavy. It's true that TPC-E is newer, but doesn't deprecate TPC-C (the way TPC-A, for instance, is now deprecated).

We chose TPC-C because it's far more understood than TPC-E in 2018. We wanted to provide understandable benchmarks that can be put into context with other databases. Other databases report TPC-C numbers, so we choose to do so as well.


It doesn't seem to be used much anymore. Follow that link (http://www.tpc.org/tpcc/results/tpcc_results.asp?print=false...) and sort by either score or price/performance. The vast majority of top results are a decade old or more. I couldn't find anything less than 5 years old without going to the second or third page.

And the top results are usually crazy high number of cores clusters. The Sun example was over 1700 cores.


The problem is that I think it costs money and red tape to submit results and vendors run their own, and you kinda have to take their word on it or reproduce them yourself.

That makes sense. Probably TPC-C died after Oracle basically killed off Sybase and Informix. No more well funded competition to keep up the pace. And no multitude of RISC vendors trying to fend off Linux/X86.

The open source databases didn't play that game, so TPC-C became irrelevant.

Too bad there isn't a good way to directly compare the healthy survivors.


I wonder how far apart those three nodes are and how much the latency between them matters?

How much and what kind of memory and storage (SATA SSD, NVMe SSD, HDD?) is included in the 3 nodes used for testing? This benchmarking is really interesting, but the next level is to understand the cost per tpmC measured. Memory especially, and storage too, are a big component of cost these days.

Short answer: 3 n1-highcpu-16 GCE VMs with Local SSDs attached. We're working on a complete disclosure document, with comprehensive reproduction steps to replicate all our numbers. This document should be out in a couple of weeks. We want to walk you through, command by command, on how to reproduce these numbers, and verify the results for yourself.

Thanks for the short answer. Would be good to know how many local SSDs are attached though for the 850 warehouse scenario. The TPC-C documentation says each warehouse maintains 100,000 items in their stock, but I can't surmise from that how much storage is required to hold 850 warehouses' worth of data. I'm impatient though so let me try to work through the #s myself. I'm using GCP's monthly reserved pricing in the US-Iowa region as a reference as of today's pricing.

An n1-highcpu-16 GCE VM costs $289.84/month. Local SSDs are added at 375GB per drive, and they cost $30/month at $0.08 per GB. I highly doubt you could fit the ~1250 warehouses (what got you the peak tpmC) on 375GB local SSD, but I have to make assumptions here! So, now you're paying $319.84 per instance per month, or $959.52 for 3 of these instances.

At 16,150 TPC-C, you're paying roughly $0.06 per TPC-C, or, looking at it the other way, you're getting 16.83 TPC-C per dollar spent each month. Is that good? I don't know!
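
The cost arithmetic can be reproduced with a short script. This is only a sketch using the prices quoted in this comment (one local SSD per node is the commenter's own assumption); the constant and function names are made up for illustration.

```python
VM_MONTHLY_USD = 289.84        # n1-highcpu-16, price quoted above
SSD_MONTHLY_USD = 375 * 0.08   # one 375 GB local SSD at $0.08 per GB
NODES = 3
TPMC = 16_150                  # throughput figure quoted above


def monthly_cluster_cost(vm_usd: float, ssd_usd: float, nodes: int) -> float:
    """Monthly cost of a cluster where every node has one VM and one SSD."""
    return (vm_usd + ssd_usd) * nodes


cluster = monthly_cluster_cost(VM_MONTHLY_USD, SSD_MONTHLY_USD, NODES)
print(f"cluster cost: ${cluster:.2f} per month")
print(f"${cluster / TPMC:.3f} per unit of throughput per month")
print(f"{TPMC / cluster:.2f} units of throughput per dollar per month")
```

Changing `SSD_MONTHLY_USD` to cover more drives per node shows how sensitive the ratio is to the storage assumption.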

Now, the really interesting question is, is that TPC-C/$ on CRDB 2.0 actually better than TPC-C/$ on CRDB 1.1? The answer lies in how many local SSDs you have to provision to reach that peak throughput. Peak is at ~1300 warehouses on CRDB 2.0, and ~800 warehouses on CRDB 1.1.

Does anyone with more knowledge here know how much storage you need per warehouse in the TPC-C test?


Each warehouse requires about 80 megabytes of storage, unreplicated. 1250 warehouses * 80 MB * 3-way replication = 300 GB, which comfortably fits in a 3-node cluster with 1 local SSD each.
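
The same estimate as a quick sketch, assuming decimal gigabytes (1 GB = 1000 MB) and data spread evenly across nodes. The 80 MB per warehouse figure is the one given above; the names are made up for illustration.

```python
NODES = 3
SSD_GB_PER_NODE = 375  # one local SSD per node


def replicated_storage_gb(
    warehouses: int, mb_per_warehouse: float = 80, replication: int = 3
) -> float:
    """Total storage in GB once every warehouse is replicated."""
    return warehouses * mb_per_warehouse * replication / 1000


total = replicated_storage_gb(1250)
per_node = total / NODES
print(f"total: {total:.0f} GB, per node: {per_node:.0f} GB")
print("fits on one SSD per node:", per_node < SSD_GB_PER_NODE)
```

This ignores indexes, write amplification, and compaction headroom, so actual disk usage would likely be higher.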

Thank you! My Google-fu couldn't find me the answer

Since you only have 3 nodes, doesn't that mean every range is replicated to every node? Doesn't that make joins trivial (i.e. no different from non-distributed joins)?

Yeah, though from what I understand this benchmark is measuring both transactional read and write performance rather than just join performance.

Transactional writes are likely the slowest thing since they need to talk to all replicas.


Actually hmm, do reads need to talk to all replicas in this case (serializable isolation)?

Nice pun there. Cockroaches do indeed have a habit of making significant strides.

I don't like it when companies are not transparent about the pricing of their product. If you have a pricing page, show the price, so that I can decide if this is relevant for me or not ...

It's not relevant to you.

Enterprise pricing generally scales with the size of your company/budget and how much trouble they think you'll be worth as a customer.

As a rule of thumb, it starts at just above 1000 USD per unit, and goes up from there.

Many contracts are bespoke orders especially when you're dealing with a small company, so you can't have transparency since there isn't a single product.


I would usually agree with you, but cockroach is so new, I doubt they have any type of fixed price. They probably work it out on a 1-by-1 basis.

The only thing the Enterprise offering gives you is priority access.

Enterprise allows access to various features like distributed backup and restore.

Ah, my mistake. I stand corrected.

Before clicking the comments link, I always know what to expect in HN comment section for a CDB post announcing their latest milestone or feature:

A lot of congrats and excitement, questions about who uses it in a production environment, very specific use-case questions, and of course the name.

Weird how predictable the response to one company/tech always is.


People complaining about the name, and about how they are never going to be able to use it in production because of how gross cockroaches are, is definitely the most recurring point. I think it worked well for them, since everyone remembers the name, especially with all the distributed stores coming out lately.

Sometimes I wonder whether the evolution of cockroaches' grossness has something to do with their high survivability.

So. For me, personally, I don't care about the name. I generally care that it's great tech, and it clearly has a great team behind it.

However....

If I worked at CockroachDB, and I saw the negative feedback around the name, I'd take it to heart. At the end of the day, the name is marketing for the hard work of their engineers, and marketing for the engineers that want to use this DB (remember, they need to sell it to their managers who may not be technical).

This issue can show up in unexpected ways. For example, for cloud providers like Compose (IBM company), would they be comfortable with putting "CockroachDB" on the front page? They might if it's good enough, but it's at least a consideration (i.e. another meeting, another stakeholder to convince).

Or how about an enterprise company that's going through due diligence, and when their client asks them about their tech stack do they say "CockroachDB" or do they obfuscate the name by saying "It's a high-performance distributed database". That's a crucial moment to market CockroachDB, and it could get lost. As sad as it is, saying that you're using MySQL "because Oracle" is a point of leverage for some sales people.

Is the name worth it? Asking honestly.


came here to comment on the name.

I came here to upvote comments about the name

What drug was the person who drew that graphic on?

Heavy doses of Hieronymus Bosch? https://en.wikipedia.org/wiki/Hieronymus_Bosch


Windows 7 had some similar wallpapers (Scroll down):

https://blogs.msdn.microsoft.com/e7/2009/05/02/a-little-bit-...


[flagged]


We detached this subthread from https://news.ycombinator.com/item?id=16710517 and marked it off-topic.

Cockroaches are considered pretty durable right? In the 80s I remember the line was always that after the nukes landed there would only be cockroaches and twinkies left.

That's not a bad thing to say about a database.


This is the sort of obtuse insistence on narrow denotational semantics that makes everyone avoid engineers at parties ;)

It works like a charm for me, or wait, what's the opposite of a charm?

Denotationally speaking, an engineer?

I think that is where the name comes from. It isn't narrow denotational semantics.

Ignoring the connotation is a textbook case of it!

But you could also call it, "BombproofDB", "NukesafeDB", "GeodistDB" or something that gets the same idea across.

And TwinkieDB would be a copyright violation! ;)

WaterBearDB maybe?

But cockroaches are known for being extremely hardy! That seems like a good quality for a database..

Perhaps you should replace serious with pretentious or shallow?


>I can't see serious engineers working on a company named "Cockroach".

You used to work at a company called Yammer <rolls eyes>. God forbid they're not called tech.ai.io-ify.

I think it's really funny that this comes up almost every time there's a post about CockroachDB. There were also a lot of people commenting on https://news.ycombinator.com/item?id=16693253 about foul language and such. I also remember being at a big conference and one of the speakers being a little cavalier and dropping an obscenity for emphasis - in the meetup comments people were so deeply offended. And let's not forget people's constant flagellation around brainfuck.

Make no mistake: this is the flavor of conservatism and hypocrisy that tech is home to: pretend to be liberally minded but lash out whenever something is just slightly divergent.


Please don't bring in someone's personal details as ammunition in an argument. That breaks HN's civility rule.

https://news.ycombinator.com/newsguidelines.html


People have been complaining about "The GIMP" for literally decades... (edit: in case this isn't clear, Spencer Kimball, the CEO of CockroachLabs, also created "The GIMP" at Berkeley).

Sadly, names do matter. Cockroach seems to be a great DB from my poking at it, but there's definitely a visceral reaction some people have to the name (myself included) that has to be overcome first.


True enough. It's a really bad name.

On the other hand, someone would have to be astonishingly thick (or at best cavalier about their business) to take that seriously into account when deciding whether or not to use it.


They need a cute mascot, with a name. Then when people go "ew, it has a bug in its name !!1!" we can say, "Aw, what's wrong with bugs? You're making Ricky cry."

I don't think it's just that it has a bug in its name. I would venture to guess that if the name were Wasp DB, this issue wouldn't exist. It has more to do with the disgust trigger that many people have when they think of cockroaches.

Ricky can't help that he was born a cockroach. :(

Maybe it's strategic and the eventual not-free commercially supported version will have a more palatable name?

>I also remember being at a big conference and one of the speakers being a little cavalier and dropping an obscenity for emphasis - in the meetup comments people were so deeply offended

Fuck those guys lol


>Fuck those guys lol

indeed


There's nothing wrong with the name. It's quite good, in fact.

[flagged]


That's unfortunate, but I hope your post isn't suggesting they shouldn't have named their product as they did.

Should people who don't like large numbers not use Google? Should people who fear fire not use Firebase? Should people who don't like coffee not use Java? Moreover, should those people suggest the names be changed due to their phobia?

We could number all database servers. Server 1, Server 2, Server 3, Server 4... but 4 is unlucky in China, so we can't use that.


Almost no one fears coffee or large numbers in the same way people fear cockroaches, and fire doesn't generally elicit anywhere near the same reaction either (there are even types of fires that people react positively to, like campfires, and many people like the smell of fire). They are allowed to name it whatever they want, but it's hard to imagine a name with more negative feelings associated with it without getting vulgar or ridiculous.

Are there people or cultures who have a fondness for cockroaches?

Wall-E

No, not at all. I'm just giving my feedback on why, sadly, I couldn't use a great product: my phobia is triggered by its name. Many people might face the same thing but don't bother to give feedback.

(PS: Yes if your target customers are Chinese you should avoid using number 4 especially in real estate)


I think it depends on on how common the negative sentiment is. VomitDB anyone?

Just so you know, it’s possible to overcome a phobia. In fact my startup Fearless helps people do just that, using VR. We have a module for cockroaches specifically. It goes very gradually, starting with a cartoon drawing, and progresses at your own pace. If you’re interested, http://FearlessVR.com

Also, I understand your complaint about the name, as I’ve encountered many, many people with all sorts of phobias at varying levels of extremity. It’s more common than most people think.


On behalf of everyone else who puts up with this constant repeated BS for every story about CockroachDB, we get it. Now shut up already and move on. I am one of the downvoters because at this point literally NO ONE cares about your complaint and this constant background whine contributes nothing. If you can't get over the name of a product then just keep it to yourself.

You have a pretty extreme phobia if just seeing the word used for naming the thing that you're afraid of, is a problem for you.

I share your sentiment on its name..

Granted, phobias are no fun and can be debilitating but seeing as the product has been around for 3 yrs, I don't think they have any plans on renaming it.

Quick question, and don't take it the wrong way, as I am truly trying to understand the extent of your commitment to not using it: what if you were to receive a once-in-a-lifetime job offer, and after you started the team decided to switch to this database? Would you quit? You're not physically working with any cockroaches, so does the phobia extend to even just hearing and/or saying the term? Thx


Yes, just hearing or reading the word “cockroach” makes me cringe. I tried to google “cockroach phobia” just now and damn, Google showed a bunch of cockroach images that made me press the back button immediately, and I couldn’t read a damn thing.

I'm not sure what would happen to my job, but given an equal choice I would choose the alternative, job or database.


Why don't you just write a browser script extension to edit out your trigger word? It might be easier than expecting a database company to change the name of what they make.

I like this idea. I wonder if other individuals with similar phobias would benefit from it. Wasn't there one, or perhaps multiple ones, to change variations of Trump's name at one point?

It can also be called CRDB.

I will not use this product based on its name alone; it gives me the jeepers. Petty? Damn straight it's petty. Doesn't make it less real though.

Came here to say the same! I am sorry but this name alone prevents me from exploring this product. Whoever chose that has probably never encountered a cockroach....flying.

It's just a pun about it being nearly indestructible. Pretty good name imo

Shame. We've got a CockroachDB infestation in both our East and West datacenters and it's been critical in reliably scaling thus far.

Sometimes a CockroachDB egg sac will go down, but we can spawn another which will hatch very quickly.

The only downside is when our ORM burrows deeply into human ears, causing pain and hearing loss.



