Ep11. Designing Data-Intensive Applications - Partitioning

Duration: 33:46

Published by Thomas Wang.

In this episode we discuss our study notes on the Partitioning chapter of Designing Data-Intensive Applications.

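🔴 This episode is heavy on technical topics, so we use a lot of English for technical terms. Some listeners have told us that mixing Chinese and English makes the show harder to follow; our apologies to anyone who minds. If anything is inaccurate or out of date, corrections are welcome.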

# Show Notes

  • 📕 Designing Data-Intensive Applications
  • What is partitioning?
    • A partition is a division of a logical database or its constituent elements into distinct independent parts.
  • Main reason: scalability - the query load can be distributed across many processors.
  • YouTube / Vitess scaling story
    • Single MySQL instance → add read replicas → writes can't keep up → partition (shard)
  • How to partition?
  • Partitioning by Key Range (e.g., Bigtable)
    • Assign a continuous range of keys to each partition
    • Pro: range scan is easier, data locality
    • Con: certain access patterns lead to hot spots (e.g., with timestamp keys, all current writes go to one partition)
    • Con: finding split points and managing rebalancing is hard
  • Partitioning by Hash
    • A good hash function distributes keys uniformly across partitions
    • Con: no easy range queries
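  • Cassandra uses a KKV model (partition key, sort key, value): the partition key is hashed to pick a partition, and rows within it are sorted by the sort key (clustering columns)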
  • Hot spots: 3% of Twitter's Servers Dedicated to Justin Bieber
  • Secondary indexes: Local index
    • Efficient write, expensive read
    • Elasticsearch
  • Secondary indexes: Global index
  • Rebalancing partitions
    • Move load from one node to others
  • Fixed number of partitions
    • New node steals partitions from every existing node
  • Notion: 480 partitions
  • Dynamic partitioning
    • 📈: split a partition into two when it grows too large
    • 📉: merge two partitions into one when they shrink too small
  • Fixed number of partitions per node
  • Operations: fully automatic (dangerous) / semi-automatic / fully manual (tedious)
  • Request Routing
    • Three approaches: nodes forward requests to each other, a separate routing tier, or a partition-aware (smart) client
    • Separate coordination service such as ZooKeeper
  • Notes by xg
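The key-range scheme in the notes can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical, hand-picked list of split points (`SPLIT_POINTS`); systems like Bigtable pick and move boundaries themselves. It shows why range scans are cheap: neighbouring keys live in the same or adjacent partitions.

```python
import bisect

# Hypothetical split points. Partition 0 holds keys < "g", partition 1 holds
# ["g", "n"), partition 2 holds ["n", "t"), partition 3 holds keys >= "t".
SPLIT_POINTS = ["g", "n", "t"]


def partition_for(key: str) -> int:
    """Index of the partition whose key range contains `key`."""
    # bisect_right counts the split points that are <= key.
    return bisect.bisect_right(SPLIT_POINTS, key)


def partitions_for_range(start: str, end: str) -> list[int]:
    """Partitions a range scan over [start, end] has to visit."""
    return list(range(partition_for(start), partition_for(end) + 1))


print(partition_for("apple"))          # 0
print(partition_for("mango"))          # 1
print(partitions_for_range("h", "m"))  # [1]: served by a single partition
print(partitions_for_range("b", "p"))  # [0, 1, 2]: only contiguous partitions
```

The same structure explains the timestamp hot spot in the notes: keys that increase over time all sort into the last range, so that one partition receives every new write.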
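Hash partitioning from the notes, as a minimal sketch assuming MD5 as the hash (any function with uniformly distributed output would do) and a hypothetical `NUM_PARTITIONS`. Each partition owns a contiguous range of hash values, as DDIA describes. The output shows both sides of the trade-off: keys spread evenly, but a range of keys is scattered, so a range query has to ask every partition.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical


def key_hash(key: str) -> int:
    """Stable 64-bit hash of a key.

    MD5 is used only for its even distribution, not for security. Python's
    built-in hash() is randomized per process for strings, so it is not
    suitable for choosing partitions.
    """
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def partition_for(key: str) -> int:
    """Split the 64-bit hash space into NUM_PARTITIONS equal contiguous ranges."""
    return (key_hash(key) * NUM_PARTITIONS) >> 64


# Similar-looking keys are spread roughly evenly over all partitions...
counts = Counter(partition_for(f"user:{i}") for i in range(10_000))
print(sorted(counts.items()))

# ...which also means user:0..user:99 is scattered across partitions,
# so a range query over them cannot be sent to just one place.
print(sorted({partition_for(f"user:{i}") for i in range(100)}))
```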
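A toy version of the KKV layout noted above, assuming hypothetical names (`KKVStore`, `put`, `range_query`). The partition key is hashed to choose a partition, and rows are kept sorted by sort key inside it, so a range query over sort keys for one partition key reads a single sorted run in a single partition. This only illustrates the shape of the idea; it is not Cassandra's storage engine or API.

```python
import bisect
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical


def partition_for(partition_key: str) -> int:
    """First K: hash the partition key to pick a partition."""
    h = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest()[:8], "big")
    return (h * NUM_PARTITIONS) >> 64


class KKVStore:
    """Toy store: hashed partition key, rows sorted by sort key inside it."""

    def __init__(self) -> None:
        # partition index -> partition key -> sorted list of (sort_key, value)
        self.partitions = [defaultdict(list) for _ in range(NUM_PARTITIONS)]

    def put(self, pk: str, sk: int, value: str) -> None:
        rows = self.partitions[partition_for(pk)][pk]
        # Second K: keep rows ordered by sort key. (sk,) sorts before (sk, value).
        i = bisect.bisect_left(rows, (sk,))
        if i < len(rows) and rows[i][0] == sk:
            rows[i] = (sk, value)  # overwrite an existing row
        else:
            rows.insert(i, (sk, value))

    def range_query(self, pk: str, lo: int, hi: int) -> list[tuple[int, str]]:
        """Rows of one partition key with lo <= sort_key <= hi, in order."""
        rows = self.partitions[partition_for(pk)].get(pk, [])
        result = []
        for sk, value in rows[bisect.bisect_left(rows, (lo,)):]:
            if sk > hi:
                break
            result.append((sk, value))
        return result


store = KKVStore()
store.put("alice", 3, "post c")
store.put("alice", 1, "post a")
store.put("alice", 2, "post b")
store.put("bob", 1, "hello")
print(store.range_query("alice", 2, 3))  # [(2, 'post b'), (3, 'post c')]
```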
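For hot keys like the Justin Bieber example in the notes, DDIA describes a simple application-level mitigation: append a random suffix to a known-hot key so its writes spread over several keys (and therefore partitions), at the cost of reads having to fetch and combine all of them. A sketch, assuming a hypothetical hard-coded `HOT_KEYS` set and `SALT_BUCKETS = 10`:

```python
import random

SALT_BUCKETS = 10         # hypothetical; DDIA's example uses a two-digit suffix (100 keys)
HOT_KEYS = {"celebrity"}  # hypothetical: hot keys have to be tracked explicitly


def write_key(key: str, rng: random.Random) -> str:
    """Key a write is stored under: hot keys get a random suffix."""
    if key in HOT_KEYS:
        return f"{key}#{rng.randrange(SALT_BUCKETS):02d}"
    return key


def read_keys(key: str) -> list[str]:
    """Every key a reader must fetch and combine to see all writes for `key`."""
    if key in HOT_KEYS:
        return [f"{key}#{i:02d}" for i in range(SALT_BUCKETS)]
    return [key]


store: dict[str, list[int]] = {}
rng = random.Random(42)
for event_id in range(1_000):
    store.setdefault(write_key("celebrity", rng), []).append(event_id)

# Writes are spread across up to SALT_BUCKETS keys, which can live on different partitions...
print(len(store))
# ...but a read now has to gather from all of them.
events = [e for k in read_keys("celebrity") for e in store.get(k, [])]
print(len(events))  # 1000
```

Only keys listed as hot pay the extra read cost; everything else keeps a single key.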
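A sketch of document-partitioned (local) secondary indexes, the kind listed under Elasticsearch in the notes: each partition indexes only its own documents, so a write touches one partition, but a query on the secondary attribute must scatter to every partition and gather the results. The class and function names, and the `doc_id % NUM_PARTITIONS` placement, are hypothetical simplifications.

```python
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical


class Partition:
    """Holds some documents plus a local index over only those documents."""

    def __init__(self) -> None:
        self.docs = {}                       # doc_id -> document
        self.color_index = defaultdict(set)  # color -> doc ids in THIS partition

    def put(self, doc_id: int, doc: dict) -> None:
        # A write only updates this partition's data and its own local index.
        old = self.docs.get(doc_id)
        if old is not None:
            self.color_index[old["color"]].discard(doc_id)
        self.docs[doc_id] = doc
        self.color_index[doc["color"]].add(doc_id)


partitions = [Partition() for _ in range(NUM_PARTITIONS)]


def put(doc_id: int, doc: dict) -> None:
    # Documents are placed by primary key (mod N only to keep the sketch short).
    partitions[doc_id % NUM_PARTITIONS].put(doc_id, doc)


def find_by_color(color: str) -> list[int]:
    # Scatter/gather: every partition must be asked, then results are merged.
    hits = []
    for p in partitions:
        hits.extend(p.color_index.get(color, ()))
    return sorted(hits)


put(1, {"color": "red"})
put(2, {"color": "red"})
put(3, {"color": "blue"})
put(4, {"color": "red"})
print(find_by_color("red"))  # [1, 2, 4]
```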
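For contrast with the local index, a global (term-partitioned) secondary index partitions the index itself by the indexed term. A minimal sketch with hypothetical names and MD5 as the hash: a read goes to exactly one index partition, while a single document write may have to update several index partitions.

```python
import hashlib
from collections import defaultdict

NUM_INDEX_PARTITIONS = 3  # hypothetical


def index_partition_for(term: str) -> int:
    """The index is partitioned by the indexed term itself, not by document."""
    h = int.from_bytes(hashlib.md5(term.encode("utf-8")).digest()[:8], "big")
    return (h * NUM_INDEX_PARTITIONS) >> 64


# Each index partition maps the terms it owns to the ids of matching documents.
index_partitions = [defaultdict(set) for _ in range(NUM_INDEX_PARTITIONS)]


def index_document(doc_id: int, terms: list[str]) -> set[int]:
    """Index one document; return the index partitions this write touched."""
    touched = set()
    for term in terms:
        p = index_partition_for(term)
        index_partitions[p][term].add(doc_id)
        touched.add(p)
    return touched


def lookup(term: str) -> set[int]:
    """A read goes to the single index partition that owns `term`."""
    return set(index_partitions[index_partition_for(term)].get(term, ()))


print(index_document(1, ["color:red", "make:honda"]))  # may span partitions
index_document(2, ["color:red", "make:ford"])
print(lookup("color:red"))  # {1, 2}
```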
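A sketch of rebalancing with a fixed number of partitions, assuming a hypothetical 12 partitions (the notes mention Notion using 480). The partitions are created up front, many more than there are nodes, and a new node steals whole partitions from the busiest existing nodes until load evens out. Keys never change partition; only the partition-to-node assignment does.

```python
from collections import Counter

NUM_PARTITIONS = 12  # hypothetical; chosen once, up front, and never changed


def initial_assignment(nodes: list[str]) -> dict[int, str]:
    """Spread the fixed set of partitions round-robin over the starting nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def add_node(assignment: dict[int, str], new_node: str) -> dict[int, str]:
    """Let `new_node` steal whole partitions from the most-loaded nodes.

    Only the partition -> node mapping changes, and only for moved partitions.
    """
    result = dict(assignment)
    target = NUM_PARTITIONS // (len(set(result.values())) + 1)
    while sum(1 for n in result.values() if n == new_node) < target:
        load = Counter(result.values())
        donor = max(load, key=load.__getitem__)
        # Hand one partition from the most-loaded node to the new node.
        victim = next(p for p in sorted(result) if result[p] == donor)
        result[victim] = new_node
    return result


before = initial_assignment(["A", "B", "C"])
after = add_node(before, "D")
moved = [p for p in before if before[p] != after[p]]
print(len(moved))  # 3: only a quarter of the partitions move
```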
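A sketch of dynamic partitioning over key ranges, assuming hypothetical thresholds counted in keys (`MAX_KEYS`, `MIN_KEYS`); real systems usually measure partition size in bytes. A partition that grows past the limit is split at its median key, and one that shrinks below the minimum is merged into a neighbour.

```python
import bisect

MAX_KEYS = 4  # hypothetical: split a partition holding more keys than this
MIN_KEYS = 1  # hypothetical: merge a partition holding fewer keys than this


class RangePartitions:
    """Key-range partitions whose boundaries change as data grows and shrinks."""

    def __init__(self) -> None:
        # Partition i holds keys in [lower_bounds[i], lower_bounds[i + 1]).
        # The empty string is the smallest str, so partition 0 starts at "".
        self.lower_bounds = [""]
        self.keys = [[]]  # sorted keys stored in each partition

    def _find(self, key: str) -> int:
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def insert(self, key: str) -> None:
        i = self._find(key)
        if key in self.keys[i]:
            return  # keys are unique in this sketch
        bisect.insort(self.keys[i], key)
        if len(self.keys[i]) > MAX_KEYS:
            self._split(i)

    def delete(self, key: str) -> None:
        i = self._find(key)
        self.keys[i].remove(key)  # raises ValueError if the key is absent
        if len(self.keys[i]) < MIN_KEYS and len(self.keys) > 1:
            self._merge(i)

    def _split(self, i: int) -> None:
        # Growth: cut partition i at its median key into two partitions.
        ks = self.keys[i]
        mid = len(ks) // 2
        self.keys[i:i + 1] = [ks[:mid], ks[mid:]]
        self.lower_bounds.insert(i + 1, ks[mid])

    def _merge(self, i: int) -> None:
        # Shrinkage: fold partition i into its left neighbour (right one if i == 0).
        j = i - 1 if i > 0 else 0
        self.keys[j:j + 2] = [self.keys[j] + self.keys[j + 1]]
        del self.lower_bounds[j + 1]


parts = RangePartitions()
for k in ["a", "b", "c", "d", "e"]:
    parts.insert(k)
print(parts.lower_bounds)  # ['', 'c']
print(parts.keys)          # [['a', 'b'], ['c', 'd', 'e']]
```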
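A fixed number of partitions per node can be sketched as a hash ring in the spirit of Cassandra's approach (heavily simplified): each node places a fixed number of random tokens on the ring, the range ending at each token is one partition, and a joining node's tokens split existing ranges. `TOKENS_PER_NODE = 8`, MD5 as the key hash, and all names are hypothetical.

```python
import bisect
import hashlib
import random

TOKENS_PER_NODE = 8  # hypothetical; the per-node partition count stays fixed


def key_hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Hash ring where each node owns TOKENS_PER_NODE randomly placed ranges."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.tokens = []  # sorted positions on the 64-bit ring
        self.owner = {}   # token -> node that owns the range ending at it

    def add_node(self, node: str) -> None:
        # Each random token lands inside some existing range and splits it;
        # the new node takes the part that ends at its token.
        for _ in range(TOKENS_PER_NODE):
            token = self.rng.getrandbits(64)
            bisect.insort(self.tokens, token)
            self.owner[token] = node

    def node_for(self, key: str) -> str:
        # A key belongs to the first token at or after its hash, wrapping around.
        i = bisect.bisect_left(self.tokens, key_hash(key)) % len(self.tokens)
        return self.owner[self.tokens[i]]


ring = TokenRing(seed=1)
for node in ["A", "B", "C"]:
    ring.add_node(node)
keys = [f"key{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("D")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys in ranges the new node split off change owner.
print(f"{len(moved) / len(keys):.1%} of keys moved")
```

Because the number of partitions grows with the number of nodes, partition sizes stay roughly stable as the cluster grows, and keys only ever move to the joining node.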
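A sketch of the separate routing tier backed by a coordination service, assuming a toy in-memory `CoordinationService` that stands in for ZooKeeper (it is not ZooKeeper's API) and CRC32 for mapping keys into a hypothetical fixed number of partitions. The routing tier subscribes to assignment changes, so after a partition moves, later requests are forwarded to the new node.

```python
import zlib
from typing import Callable

NUM_PARTITIONS = 3  # hypothetical fixed partition count

Assignment = dict[int, str]  # partition -> node


class CoordinationService:
    """Toy stand-in for ZooKeeper: stores the authoritative partition -> node
    map and notifies subscribers on every change. Not ZooKeeper's real API."""

    def __init__(self, assignment: Assignment) -> None:
        self._assignment = dict(assignment)
        self._watchers = []

    def subscribe(self, callback: Callable[[Assignment], None]) -> None:
        self._watchers.append(callback)
        callback(dict(self._assignment))  # deliver the current state immediately

    def move_partition(self, partition: int, node: str) -> None:
        self._assignment[partition] = node
        for callback in self._watchers:
            callback(dict(self._assignment))


class RoutingTier:
    """Partition-aware load balancer that clients send all requests to."""

    def __init__(self, coordinator: CoordinationService) -> None:
        self._assignment = {}
        coordinator.subscribe(self._on_change)

    def _on_change(self, assignment: Assignment) -> None:
        self._assignment = assignment

    def route(self, key: str) -> str:
        # The partition count is fixed, so key -> partition never changes;
        # only partition -> node is looked up in the latest assignment.
        partition = zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS
        return self._assignment[partition]


coord = CoordinationService({0: "node-a", 1: "node-b", 2: "node-c"})
router = RoutingTier(coord)
print(router.route("user:42"))
# Rebalancing: the coordinator records the move and the router learns of it.
partition = zlib.crc32(b"user:42") % NUM_PARTITIONS
coord.move_partition(partition, "node-d")
print(router.route("user:42"))  # node-d
```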

# Contact
