shikhar's comments

(Founder) Both S3 Express _One Zone_ appends and Azure's append blobs charge the regular PUT price for appends. That may work for you, but probably not if you want to make smaller writes.

Blob stores will also not let you do tailing reads, like you can with S2.

In AWS, S2's Express storage class takes care of writing to a quorum of 3 zonal buckets for regional durability.
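The quorum write described above can be sketched roughly as follows. This is an illustrative stand-in, not S2's actual implementation: the zone-writer callables and the 2-of-3 ack threshold are assumptions for demonstration.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch of a quorum append across three zonal buckets.
# The zone writers and the 2-of-3 ack threshold are assumptions for
# demonstration, not S2's actual implementation.

def quorum_append(record, zone_writers, required_acks=2):
    """Write `record` to every zone; succeed once a quorum acks."""
    acks = 0
    with ThreadPoolExecutor(max_workers=len(zone_writers)) as pool:
        futures = [pool.submit(w, record) for w in zone_writers]
        for f in futures:
            try:
                if f.result():
                    acks += 1
            except Exception:
                pass  # a single zone failure does not fail the append
    return acks >= required_acks

def ok_zone(record):
    return True  # healthy zone acks the write

def bad_zone(record):
    raise IOError("zone down")  # unavailable zone

# Two healthy zones and one failing zone still form a quorum.
print(quorum_append(b"hello", [ok_zone, ok_zone, bad_zone]))  # True
```

A real implementation would return as soon as the quorum is reached instead of waiting for every zone, but the durability logic is the same shape.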

I doubt object stores will go from operating at the level of blobs and byte ranges to records and sequence numbers. But I could be wrong.


(Founder) We plan to be multi-cloud over time. Kinesis has a pretty low ordered-throughput limit - 1 MBps at the level of a stream shard - if you need higher. With the Express storage class, S2 will be cheaper and faster than Kinesis. S2 also has a more serverless pricing model - closer to S3 - than paying for stream-shard hours.
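To make the shard limit concrete: at 1 MBps of ordered ingest per shard, hitting a higher aggregate rate on Kinesis means spreading writes across shards, and ordering only holds within each individual shard. A quick back-of-envelope:

```python
import math

KINESIS_SHARD_INGEST_MBPS = 1  # per-shard ordered write limit

def shards_needed(ingest_mbps):
    """Shards required to absorb a given aggregate ingest rate."""
    return math.ceil(ingest_mbps / KINESIS_SHARD_INGEST_MBPS)

# Matching the 125 MiBps single-stream append rate claimed elsewhere
# in this thread would take ~125 Kinesis shards, with total order lost
# across them.
print(shards_needed(125))  # 125
```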

Thanks. You are right about those points. One thing to consider is whether serverless provides enough cost savings for most streaming-ingest use cases, which need static provisioning since ingest volumes are unpredictable. Better messaging would be that your serverless model handles bursts well. (For context: I used to sell KDA and KDS at AWS as part of AI solutions.)

(Founder) No, we want to be in the same cloud regions as customers.

(Founder) Nope! We have a FAQ for this ;)

(Founder) We are happy for the S2 API to have alternate implementations, and we are considering open-sourcing an in-memory emulator ourselves - it is not a very complicated API. If you would prefer to stick with the Kafka API but benefit from features like S2's storage classes, a very large number of topics/partitions, or high throughput per partition, we are planning an open-source Kafka compatibility layer that can be self-hosted, with features like client-side encryption so you can have even more peace of mind.

Having a Kafka-compatible API with S3 storage is something I would jump at - the savings over MSK would be huge.

If you had a (paid for) API that sat on top of an S3 API for on-prem, that would be fantastic as well.

Kafka is great, but the whole Java ecosystem, the lack of control over what is in the topics, and coordinating the cluster in ZooKeeper make it a management PITA.


First-class Kafka compatibility could go a long way toward making it a justifiable tech choice. When orgs go heavy on event streaming, that code gets _everywhere_, so a vendor off-ramp is needed.

(Founder) That makes sense. We would eventually host the Kafka layer too - and will be able to avoid a hop by inlining our edge service logic in there.

(Founder) There are definitely some interesting possibilities. Pretty hyped about S3 Table (Iceberg) buckets. An S2 stream can buffer small writes so you can flush decent-sized Parquet files into the table and avoid compaction costs.
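The buffering idea can be sketched as below. The size threshold and the flush callback are illustrative choices; in practice the flush step would write one Parquet file (e.g. via pyarrow) into the Iceberg table.

```python
# Sketch of using a durable stream as a write buffer: accumulate small
# records and only flush once a decent-sized batch has built up, so the
# table receives fewer, larger Parquet files and needs less compaction.
# The 8 MiB default threshold and the flush callback are illustrative.

class StreamBuffer:
    def __init__(self, flush, threshold_bytes=8 * 1024 * 1024):
        self.flush = flush            # e.g. writes one Parquet file
        self.threshold = threshold_bytes
        self.pending = []
        self.pending_bytes = 0

    def append(self, record: bytes):
        self.pending.append(record)
        self.pending_bytes += len(record)
        if self.pending_bytes >= self.threshold:
            self.flush(self.pending)
            self.pending, self.pending_bytes = [], 0

batches = []
buf = StreamBuffer(batches.append, threshold_bytes=10)
for rec in [b"aaaa", b"bbbb", b"cccc", b"dd"]:
    buf.append(rec)
print(len(batches))  # 1: one 12-byte batch flushed, b"dd" still pending
```

Because the stream itself is durable, records buffered but not yet flushed are not at risk; a crashed flusher can resume from its last acknowledged sequence.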

(Founder) We will be using authenticated encryption with per-basin (our term for a bucket) or per-stream keys, but we don't have this yet. This is noted at https://s2.dev/docs/security#encryption

(Founder) Besides the simple API,

- Unlimited streams. Current cloud systems limit you to a few thousand; with dedicated clusters, maybe a few hundred thousand? If you want a stream per user, you are now dealing with multiple clusters.

- Elastic throughput per stream (i.e. per partition, in Kafka terms): 125 MiBps append / 500 MiBps realtime read / unlimited in aggregate for catch-up reads. Current systems cap you at tens. And we may grow that limit yet. We are able to live-migrate streams in milliseconds while keeping pipelined writes flowing, which gives us a lot of flexibility.

- Concurrency control mechanisms (https://s2.dev/docs/stream#concurrency-control)
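The concurrency-control mechanism linked above can be illustrated with a compare-and-append sketch. This shows the general optimistic-concurrency idea at the level of sequence numbers; S2's exact semantics may differ from this in-memory stand-in.

```python
# Sketch of optimistic concurrency control on a record log: an append
# can state which sequence number it expects to land at, and is
# rejected if another writer got there first. Illustrates the general
# compare-and-append idea, not S2's exact API.

class RecordLog:
    def __init__(self):
        self.records = []

    def append(self, record, expected_seq=None):
        """Append; if expected_seq is given, fail on a mismatch."""
        next_seq = len(self.records)
        if expected_seq is not None and expected_seq != next_seq:
            return None  # lost the race: someone else appended first
        self.records.append(record)
        return next_seq

log = RecordLog()
assert log.append(b"a") == 0
# Two writers both saw seq 1 as the next slot; only one can win.
assert log.append(b"b", expected_seq=1) == 1
assert log.append(b"c", expected_seq=1) is None
```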


Forgot to mention storage classes, which let you tune your latency-vs-cost tradeoff. You can even reconfigure them - soon we will make that a live migration.

(Founder) So many possibilities! That's what I love about building building blocks. I think we will create an open-source layer for an IoT protocol such as MQTT over time (unless the community gets to it first). I have to admit I don't know too much about the space.

(Founder) A named pipe that operates at the level of records, is regionally durable, lets you read from any sequence number, and supports concurrency control for writes if you need it.
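The "named pipe of records" shape can be sketched as a log you can read from any sequence number. This in-memory stand-in only shows the read-from-sequence semantics, not durability or S2's actual API.

```python
# Sketch of a record stream: ordered records with sequence numbers,
# readable from any starting sequence (catch-up or tail). In-memory
# stand-in for illustration, not S2's actual API.

class Stream:
    def __init__(self):
        self.records = []

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1  # sequence number assigned

    def read_from(self, seq: int):
        """Return (seq, record) pairs at or after `seq`."""
        return list(enumerate(self.records))[seq:]

s = Stream()
for rec in (b"a", b"b", b"c"):
    s.append(rec)
print(s.read_from(1))  # [(1, b'b'), (2, b'c')]
```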
