conqrr 6 hours ago [-]
Shameless plug: a poor indie hacker's log store based on similar concepts. It writes logs to durable S3/R2 storage as Parquet and queries them with DuckDB: https://github.com/amr8t/blobsearch
No expensive indexing or compute needed.
a012 5 days ago [-]
So it's in the same spirit as ClickHouse. How does VictoriaLogs scale?
dengolius 2 days ago [-]
The answer to your question was deleted, so I'll post it again:
Vertically, on a single machine, the two are quite similar: both fan work out across all CPU cores.
The difference is in scaling out.
ClickHouse scales by making you describe the cluster yourself. You decide how many shards to split the data into, how many copies (replicas) each shard keeps, and which row goes to which shard.
The copies are kept in sync by a consensus system, ClickHouse Keeper. This is flexible, but it also means more work for operators.
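To make the "you design the sharding key" part concrete, here is a minimal sketch of what that decision boils down to. It is not ClickHouse's actual hashing or configuration format; `ClusterLayout`, `shard_for` and `replicas_for` are hypothetical names, and CRC32 is just a convenient stable hash for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    """Operator-chosen layout (hypothetical, for illustration only)."""
    shards: int
    replicas_per_shard: int


def shard_for(sharding_key: str, layout: ClusterLayout) -> int:
    # A stable hash of the sharding key decides which shard owns the row.
    # Picking the key (host, tenant, trace id, ...) is the operator's job.
    return zlib.crc32(sharding_key.encode("utf-8")) % layout.shards


def replicas_for(sharding_key: str, layout: ClusterLayout) -> list[tuple[int, int]]:
    # Every replica of the owning shard must end up with a copy of the row;
    # keeping those copies consistent is what the consensus system is for.
    shard = shard_for(sharding_key, layout)
    return [(shard, replica) for replica in range(layout.replicas_per_shard)]
```

Changing `shards` later changes where most keys land, which is part of why this layout is something you have to plan rather than something the database picks for you.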
VictoriaLogs takes the opposite bet. When logs come in, the inserter just spreads them across all storage nodes on its own, so there is no sharding key for you to design.
When a query runs, the selector asks every storage node in parallel and merges the results. There is no consensus system at all. If you want high availability, you run 2 independent clusters and send your logs to both, rather than having the database copy data internally. So this is simpler, with less of a learning curve.
See more here https://victoriametrics.com/blog/victorialogs-architecture-b...
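A rough sketch of the insert/select flow described above, in plain Python. This is not VictoriaLogs code: `StorageNode`, `Inserter`, `select` and `dual_write` are hypothetical names, and round-robin is only an assumed spreading policy (the comment just says the inserter spreads logs on its own).

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor


class StorageNode:
    """In-memory stand-in for one storage node."""

    def __init__(self, name):
        self.name = name
        self.rows = []  # list of (timestamp, message)

    def append(self, ts, msg):
        self.rows.append((ts, msg))

    def search(self, word):
        # Each node returns its own matches, sorted by timestamp.
        return sorted(row for row in self.rows if word in row[1])


class Inserter:
    """Spreads incoming logs across nodes; no sharding key involved."""

    def __init__(self, nodes):
        self._next_node = itertools.cycle(nodes)  # assumed policy: round-robin

    def insert(self, ts, msg):
        next(self._next_node).append(ts, msg)


def select(nodes, word):
    # Ask every node in parallel, then merge the sorted partial results.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda node: node.search(word), nodes))
    return list(heapq.merge(*partials))


def dual_write(inserters, ts, msg):
    # High availability as described: the client sends each log to
    # independent clusters; the clusters never talk to each other.
    for inserter in inserters:
        inserter.insert(ts, msg)
```

Because any node may hold any row, every query has to touch every node. That is the trade for not having a sharding key or a consensus layer.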
cassianoleal 8 hours ago [-]
Sounds like the same architecture used by Victoria Metrics storage. I ran it for years on a previous platform and it was so incredibly easy to operate and troubleshoot, and the performance is unreal!
swamp_donkey 7 hours ago [-]
Why did you stop using Victoria metrics and what are you using instead?
cassianoleal 6 hours ago [-]
Because I left that team. :D They're still using it and are happy though.
The platform I'm currently working on uses GCP Cloud Metrics, which is all sorts of bad. Funnily enough, today I was troubleshooting something on it and after a good 30-40 min of frustration I decided to ask Gemini.
Gemini not only confirmed that Cloud Metrics is incredibly bad, but it listed 5 different ways in which it's a horrible experience and why. I then added one and it went on a 6-paragraph rant about the ways in which that problem was horrible and frustrating.
I've been advocating for migrating to Victoria Metrics, and I think it's going to happen - there are too many competing priorities at the moment though, so it might take a while.
winrid 10 hours ago [-]
So basically if you have queries that are hard on the query planner, that constant fan out has higher CPU cost than the alternatives.
AlotOfReading 9 hours ago [-]
Taking a look at their LogsQL language, I don't see anything that would be particularly hard on the planner. With the kind of boolean filters they seem to use, you can't get the fan-out that makes fully relational query planning so difficult. Planning should mostly be a matter of sorting by column cardinality and optimizing the query so you aren't doing unnecessary operations.
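As a toy illustration of why ordering boolean filters is cheap planning: for an AND of independent filters, running the most selective one first means later filters see fewer rows. This is only a sketch, not how VictoriaLogs plans queries; `Filter`, `plan` and `run` are hypothetical names, and the selectivity numbers are made-up estimates standing in for whatever column statistics a real planner would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Filter:
    name: str
    predicate: Callable[[dict], bool]
    est_selectivity: float  # estimated fraction of rows that pass (assumed)


def plan(filters):
    # The whole "plan": try the filter expected to reject the most rows first.
    return sorted(filters, key=lambda f: f.est_selectivity)


def run(rows, filters):
    """AND all filters with short-circuiting; count predicate evaluations."""
    evaluations = 0
    matched = []
    for row in rows:
        passed = True
        for f in filters:
            evaluations += 1
            if not f.predicate(row):
                passed = False
                break
        if passed:
            matched.append(row)
    return matched, evaluations


rows = [
    {"level": "error" if i % 10 == 0 else "info",
     "service": "api" if i % 2 == 0 else "web"}
    for i in range(100)
]
filters = [
    Filter("service=api", lambda r: r["service"] == "api", 0.5),
    Filter("level=error", lambda r: r["level"] == "error", 0.1),
]
unplanned_result, unplanned_cost = run(rows, filters)
planned_result, planned_cost = run(rows, plan(filters))
```

Both orders return the same rows; only the number of predicate evaluations differs. There is no join ordering to search over, so the planner's job stays small.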
func25 4 days ago [-]
Vertically, on a single machine, the two are quite similar: both fan work out across all CPU cores.
The difference is in scaling out.
ClickHouse scales by making you describe the cluster yourself. You decide how many shards to split the data into, how many copies (replicas) each shard keeps, and which row goes to which shard. The copies are kept in sync by a consensus system, ClickHouse Keeper. This is flexible, but it also means more work for operators.
VictoriaLogs takes the opposite bet. When logs come in, the inserter just spreads them across all storage nodes on its own, so there is no sharding key for you to design. When a query runs, the selector asks every storage node in parallel and merges the results. There is no consensus system at all. If you want high availability, you run 2 independent clusters and send your logs to both, rather than having the database copy data internally. So this is simpler, with less of a learning curve. See more here https://victoriametrics.com/blog/victorialogs-architecture-b...