Ep11. Designing Data-Intensive Applications - Partitioning (Eng Cafe podcast)

4y ago 33:46

Content provided by Thomas Wang.

In this episode we go through our study notes on the partitioning chapter of Designing Data-Intensive Applications.

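🔴 This episode leans technical, and we use many English terms for technical jargon. Some listeners have told us that mixing Chinese and English makes the show harder to follow, so we ask for your understanding. If anything is inaccurate or out of date, corrections are welcome.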

# Show Notes

📕 Designing Data-Intensive Applications
What is partitioning?
- A partition is a division of a logical database or its constituent elements into distinct independent parts.
Main reason: scalability - the query load can be distributed across many processors.
YouTube / Vitess scaling story
- Single MySQL → Add read replicas → Writes can’t catch up → Partition
How to partition?
Partitioning by Key Range (e.g., Bigtable)
- Assign a continuous range of keys to each partition
- Pro: range scan is easier, data locality
- Con: certain access patterns can lead to hot spots (e.g., timestamp keys)
- Con: finding split points and managing rebalancing is hard
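
A minimal Python sketch of key-range lookup, assuming each partition owns a contiguous, sorted range of string keys. The split points and function names are made up for illustration and are not taken from Bigtable or the book.

```python
from bisect import bisect_right

# Hypothetical split points. Partition 0 holds keys < "g", partition 1 holds
# ["g", "n"), partition 2 holds ["n", "t"), partition 3 holds keys >= "t".
SPLIT_POINTS = ["g", "n", "t"]


def partition_for_key(key, split_points=SPLIT_POINTS):
    """Return the index of the partition whose key range contains `key`."""
    return bisect_right(split_points, key)


def partitions_for_range(start, end, split_points=SPLIT_POINTS):
    """Return the contiguous run of partitions a scan over [start, end] touches."""
    first = partition_for_key(start, split_points)
    last = partition_for_key(end, split_points)
    return range(first, last + 1)
```

A range scan only touches neighboring partitions, which is the data-locality benefit. With timestamp keys, every new write sorts after the last split point, so all current writes land on the final partition; that is the hot spot noted above.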
Partitioning by Hash
- Good hash function: uniformly distribute keys
- Con: no easy range queries
Cassandra does KKV (partitioning key, sort key, value)
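
The KKV idea can be sketched in Python as follows, assuming a fixed number of partitions and an in-memory store. `CompoundKeyStore`, `hash_partition`, and `NUM_PARTITIONS` are hypothetical names, and this is not Cassandra's actual implementation: only the partitioning key is hashed, and rows under one partitioning key stay ordered by sort key.

```python
import hashlib
from bisect import insort

NUM_PARTITIONS = 8  # assumed value, for illustration only


def hash_partition(partition_key, num_partitions=NUM_PARTITIONS):
    """Map a partitioning key to a partition with a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a digest is used here instead.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class CompoundKeyStore:
    """Toy store: hash on the partitioning key, sorted by sort key within it."""

    def __init__(self, num_partitions=NUM_PARTITIONS):
        self.num_partitions = num_partitions
        # One dict per partition: partition_key -> sorted list of (sort_key, value).
        self.partitions = [dict() for _ in range(num_partitions)]

    def put(self, partition_key, sort_key, value):
        p = hash_partition(partition_key, self.num_partitions)
        rows = self.partitions[p].setdefault(partition_key, [])
        insort(rows, (sort_key, value))  # keep rows ordered by sort key

    def range_scan(self, partition_key, low, high):
        """Range query over sort keys; only one partition is consulted."""
        p = hash_partition(partition_key, self.num_partitions)
        rows = self.partitions[p].get(partition_key, [])
        return [(s, v) for s, v in rows if low <= s <= high]
```

A range query across different partitioning keys would still have to visit every partition, which is the "no easy range queries" cost of hashing.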
Hot spots: 3% of Twitter's Servers Dedicated to Justin Bieber
Secondary indexes: Local index
- Efficient write, expensive read
- Elasticsearch
Secondary indexes: Global index
- Efficient read, expensive write
- Using Global Secondary Indexes in DynamoDB (correction: we misspoke in the episode; DynamoDB supports 20 global secondary indexes per table)
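
The read/write trade-off between the two index styles can be sketched in Python. This is a toy model under stated assumptions: documents are partitioned by `doc_id % NUM_PARTITIONS`, the only indexed field is a color, and the class names are hypothetical. It does not reflect how Elasticsearch or DynamoDB implement their indexes. Each method returns how many partitions it had to touch.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed value, for illustration only


def term_partition(term):
    """Partition that owns the global index entry for `term` (stable hash)."""
    return zlib.crc32(term.encode("utf-8")) % NUM_PARTITIONS


class LocalIndexCluster:
    """Each partition indexes only its own documents."""

    def __init__(self):
        self.indexes = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def write(self, doc_id, color):
        self.indexes[doc_id % NUM_PARTITIONS][color].add(doc_id)
        return 1  # only the document's own partition is touched

    def find(self, color):
        hits = set()
        for index in self.indexes:  # scatter/gather over every partition
            hits |= index.get(color, set())
        return hits, NUM_PARTITIONS


class GlobalIndexCluster:
    """The index itself is partitioned by term, independent of the documents."""

    def __init__(self):
        self.index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

    def write(self, doc_id, color):
        # The document's partition and the term's index partition may differ.
        touched = {doc_id % NUM_PARTITIONS, term_partition(color)}
        self.index[term_partition(color)][color].add(doc_id)
        return len(touched)

    def find(self, color):
        return set(self.index[term_partition(color)].get(color, set())), 1
```

With several indexed fields per document, a global-index write can fan out to one index partition per term, which is why writes get more expensive while reads stay cheap.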
Rebalancing partitions
- Move loads to other nodes
Fixed number of partitions
- New node steals partitions from every existing node
Notion: 480 partitions
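
A rough Python sketch of rebalancing with a fixed number of partitions, where a new node takes a fair share of partitions from the existing nodes. The `add_node` helper and the node names are hypothetical; the partition count of 480 is borrowed from the Notion note above purely as an example.

```python
from collections import Counter


def add_node(assignment, new_node):
    """Return a new partition -> node mapping after `new_node` joins.

    `assignment` maps partition id -> node name. The new node repeatedly
    steals one partition from whichever existing node currently holds the most.
    """
    result = dict(assignment)
    node_count = len(set(assignment.values())) + 1
    target = len(assignment) // node_count  # fair share for the new node
    load = Counter(assignment.values())     # load of existing nodes only
    for _ in range(target):
        donor = max(load, key=load.get)
        # Pick the lowest-numbered partition still owned by the donor.
        pid = next(p for p, n in sorted(result.items()) if n == donor)
        result[pid] = new_node
        load[donor] -= 1
    return result


# 480 partitions on 4 nodes (120 each); a fifth node joins.
before = {p: "node%d" % (p % 4) for p in range(480)}
after = add_node(before, "node4")
moved = sum(1 for p in before if before[p] != after[p])
```

Only the partitions handed to the new node move; keys never change partition, so the key-to-partition mapping stays stable during rebalancing.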
Dynamic partitioning
- 📈: split partition into 2
- 📉: merge 2 partitions into 1
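
Split and merge can be sketched in Python as below, modeling each partition as a sorted list of keys. The thresholds and function names are assumptions for illustration; real systems typically decide based on data size rather than key count.

```python
def split_if_large(partition, max_keys):
    """Split a sorted partition into two halves once it exceeds `max_keys`."""
    if len(partition) <= max_keys:
        return [partition]
    mid = len(partition) // 2
    return [partition[:mid], partition[mid:]]


def merge_small(partitions, max_merged_keys):
    """Merge adjacent partitions while the combined size stays small.

    `partitions` must be ordered by key range so that merging neighbors
    keeps every partition a contiguous range.
    """
    merged = []
    for part in partitions:
        if merged and len(merged[-1]) + len(part) <= max_merged_keys:
            merged[-1] = merged[-1] + part
        else:
            merged.append(part)
    return merged
```

For example, `split_if_large([1, 2, 3, 4, 5, 6], 4)` yields two partitions of three keys each, and `merge_small([[1], [2], [3, 4, 5]], 2)` folds the first two partitions into one.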
Fixed number of partitions per node
- https://www.datastax.com/blog/new-token-allocation-algorithm-cassandra-30
Operations: fully automatic (dangerous) / semi-automatic / fully manual (tedious)
Request Routing
- 3 approaches: nodes talk to each other, separate routing tier, smart client
- Separate coordination service such as ZooKeeper
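
A small Python sketch of the separate routing tier approach. `Coordinator` is an in-memory stand-in for a coordination service and does not use ZooKeeper's real API; all class and method names here are hypothetical. The routing tier subscribes to assignment changes and forwards each key to the node that currently owns its partition.

```python
import zlib


class Coordinator:
    """In-memory stand-in for a coordination service (not ZooKeeper's API)."""

    def __init__(self, assignment):
        self._assignment = dict(assignment)  # partition id -> node name
        self._watchers = []

    def watch(self, callback):
        """Register a callback; it gets the current mapping and every update."""
        self._watchers.append(callback)
        callback(dict(self._assignment))

    def reassign(self, partition, node):
        self._assignment[partition] = node
        for callback in self._watchers:
            callback(dict(self._assignment))


class RoutingTier:
    """Partition-aware router that keeps a local copy of the assignment."""

    def __init__(self, coordinator, num_partitions):
        self.num_partitions = num_partitions
        self.table = {}
        coordinator.watch(self._on_change)

    def _on_change(self, assignment):
        self.table = assignment

    def route(self, key):
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        return self.table[partition]
```

A smart client would embed the same lookup table in the client library, while in the nodes-talk-to-each-other approach any node can accept a request and forward it to the owner.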
Notes by xg

# Contact

Website: eng.cafe
WeChat official account: Eng Cafe
Twitter: @engcafefm
YouTube: Eng Cafe
Xiaoyuzhou (小宇宙) podcast app
General-purpose podcast clients: eng.cafe/subscribe
Email: [email protected]
