BTCD源码分析之database存储

btcd 是一个golang实现的比特币全节点。我曾尝试做一个python版的fork，有些研究。

本文分析其 database 包的基本结构和实现。

基于git代码版本: 7d2daa5bfef28c5e282571bc06416516936115ee

1. database
2. 流程示意图
3. 参考列表
4. End

1 database

database模块提供了bitcoin存储相关功能，通过多层缓存，逻辑上分成metadata和flat block files两部分存储。

1.1 database/*

1.1.1 interface.go

interface.go 定义了database包对外提供的类型和方法主要包括4块: DB, TX, Bucket, Cursor 四个部分。

这4部分的关系是:

DB提供了整体数据库的功能，存储的数据大体上分成元数据和区块数据。
Tx是操作DB获取数据和状态变化的途径（可以提供原子性等），
Bucket是DB存储的元数据的容易，以一些列键值对方式存储。
Cursor是遍历Bucket里键值对的手段。

// DB通过接口和具体实现分隔开

type DB interface {
    // 返回databse driver类型
    Type() string

    // 开始一个数据库事务
    Begin(writable bool) (Tx, error)

    // 在一个只读的事务里执行传入的方法
    View(fn func(tx Tx) error) error

    // 在一个读写的事务里执行传入的方法
    Update(fn func(tx Tx) error) error

    // 关闭数据库
    Close() error
}

// 数据库事务接口
// database在存储数据时，分为metadata数据存储和通常的block文件存储

type Tx interface {
    // 返回元数据
    Metadata() Bucket

    // 存储区块
    StoreBlock(block *btcutil.Block) error

    // 检查区块是否存在
    HasBlock(hash *chainhash.Hash) (bool, error)
    HasBlocks(hashes []chainhash.Hash) ([]bool, error)

    // 获取区块头数据
    FetchBlockHeader(hash *chainhash.Hash) ([]byte, error)
    FetchBlockHeaders(hashes []chainhash.Hash) ([][]byte, error)

    // 获取区块数据
    FetchBlock(hash *chainhash.Hash) ([]byte, error)
    FetchBlocks(hashes []chainhash.Hash) ([][]byte, error)

    // 获取某部分区块数据
    FetchBlockRegion(region *BlockRegion) ([]byte, error)
    FetchBlockRegions(regions []BlockRegion) ([][]byte, error)

    // 提交（元数据和区块存储）事务
    Commit() error

    // 回滚（元数据和区块存储）事务
    Rollback() error
}

Bucket

// Bucket 表示一系列的键值对存储

type Bucket interface {
    // 返回一个子bucket
    Bucket(key []byte) Bucket

    // 创建子bucket
    CreateBucket(key []byte) (Bucket, error)
    CreateBucketIfNotExists(key []byte) (Bucket, error)

    // 删除bucket
    DeleteBucket(key []byte) error

    // 遍历键值对
    ForEach(func(k, v []byte) error) error

    // 遍历子bucket
    ForEachBucket(func(k []byte) error) error

    // 返回键值对之间遍历的游标
    Cursor() Cursor

    // bucket是否可写
    Writable() bool

    // 保存、获取、删除键值对
    Put(key, value []byte) error
    Get(key []byte) []byte
    Delete(key []byte) error
}

Cursor

type Cursor interface {
    // 返回关联的bucket
    Bucket() Bucket

    // 如其名
    Delete() error
    First() bool
    Last() bool
    Next() bool
    Prev() bool
    Seek(seek []byte) bool
    Key() []byte
    Value() []byte
}

1.1.2 driver.go

driver.go定义了用来创建DB的drivers的结构

type Driver struct {
    // 返回driver类型
    DbType string
    // 创建一个DB
    Create func(args ...interface{}) (DB, error)
    // 打开DB
    Open func(args ...interface{}) (DB, error)
    // 日志相关
    UseLogger func(logger btclog.Logger)
}

1.2 database/ffldb

ffldb包是对上面定义的接口的一种实现。

1.2.1 db.go

对 database/interface.go 里 DB,Tx,Bucket,Cursor 接口的实现.

db结构主要是分成两个部分： store 和 cache store部分用文件来保存区块信息， cache部分是用leveldb来保存一些区块和整体的一些元数据。

type db struct {
    writeLock sync.Mutex   // Limit to one write transaction at a time.
    closeLock sync.RWMutex // Make database close block while txns active.
    closed    bool         // Is the database closed?
    store     *blockStore  // Handles read/writing blocks to flat files.
    cache     *dbCache     // Cache layer which wraps underlying leveldb DB.
}

transaction

transation表示了每个db操作的事务的概念，采用 pendingBlocks, pendingBlockData 来缓存事务里的block数据，用 pendingKeys, pendingRemove 来缓存事务里的metadata数据

// transaction represents a database transaction.  It can either be read-only or
// read-write and implements the database.Bucket interface.  The transaction
// provides a root bucket against which all read and writes occur.
type transaction struct {
        managed        bool             // Is the transaction managed?
        closed         bool             // Is the transaction closed?
        writable       bool             // Is the transaction writable?
        db             *db              // DB instance the tx was created from.
        snapshot       *dbCacheSnapshot // Underlying snapshot for txns.
        metaBucket     *bucket          // The root metadata bucket.
        blockIdxBucket *bucket          // The block index bucket.

        // Blocks that need to be stored on commit.  The pendingBlocks map is
        // kept to allow quick lookups of pending data by block hash.
        pendingBlocks    Map[chainhash.Hash]int
        pendingBlockData []pendingBlock

        // Keys that need to be stored or deleted on commit.
        pendingKeys   *treap.Mutable
        pendingRemove *treap.Mutable

        // Active iterators that need to be notified when the pending keys have
        // been updated so the cursors can properly handle updates to the
        // transaction state.
        activeIterLock sync.RWMutex
        activeIters    []*treap.Iterator
}

bucket

bucket通过操作关联的transaction上的 pending_keys, pending_remove 来实现其 Put,Get,Delete,ForEach 等接口方法。

type bucket struct {
    tx *transaction
    id [4]byte
}

bucket通过 Cursor() 方法来生成Cursor对象

func (b *bucket) Cursor() database.Cursor {
        // Ensure transaction state is valid.
        if err := b.tx.checkClosed(); err != nil {
                return &cursor{bucket: b}
        }

        // Create the cursor and setup a runtime finalizer to ensure the
        // iterators are released when the cursor is garbage collected.
        c := newCursor(b, b.id[:], ctFull)
        runtime.SetFinalizer(c, cursorFinalizer)
        return c
}

// newCursor 方法本质是创建键值对的iterators

cursor

cursor是上面bucket创建的各种iterators的组合，然后再其基础上实现 Cursor 接口所需的 First,Last,Next 等遍历方法。

// cursor is an internal type used to represent a cursor over key/value pairs
// and nested buckets of a bucket and implements the database.Cursor interface.
type cursor struct {
    bucket      *bucket
    dbIter      iterator.Iterator
    pendingIter iterator.Iterator
    currentIter iterator.Iterator
}

1.2.2 blockio.go

DB采用文件存储block相关数据的实现。

// blockStore houses information used to handle reading and writing blocks (and
// part of blocks) into flat files with support for multiple concurrent readers.
type blockStore struct {
        // network is the specific network to use in the flat files for each
        // block.
        network wire.BitcoinNet

        // 文件存放目录
        // basePath is the base path used for the flat block files and metadata.
        basePath string

        // 每个文件最大存储blocks数量
        // maxBlockFileSize is the maximum size for each file used to store
        // blocks.  It is defined on the store so the whitebox tests can
        // override the value.
        maxBlockFileSize uint32

        // 一些缓存记录和并发控制
        obfMutex         sync.RWMutex
        lruMutex         sync.Mutex
        openBlocksLRU    *list.List // Contains uint32 block file numbers.
        fileNumToLRUElem map[uint32]*list.Element
        openBlockFiles   map[uint32]*lockableFile

        // 读写文件的油表
        // writeCursor houses the state for the current file and location that
        // new blocks are written to.
        writeCursor *writeCursor

        // 操作底层存储文件的方法
        // These functions are set to openFile, openWriteFile, and deleteFile by
        // default, but are exposed here to allow the whitebox tests to replace
        // them when working with mock files.
        openFileFunc      func(fileNum uint32) (*lockableFile, error)
        openWriteFileFunc func(fileNum uint32) (filer, error)
        deleteFileFunc    func(fileNum uint32) error
}

1.2.3 dbcache.go

dbcache.go 里主要的结构有 dbCache 和 dbCacheSnapshot

dbCache是db关联的存储metadata的结构

type dbCache struct {
    // 底层存储的leveldb
    // ldb is the underlying leveldb DB for metadata.
    ldb *leveldb.DB

    // store is used to sync blocks to flat files.
    store *blockStore

    // 缓存大小和时间相关
    maxSize       uint64
    flushInterval time.Duration
    lastFlush     time.Time

    // 缓存
    cacheLock    sync.RWMutex
    cachedKeys   *treap.Immutable
    cachedRemove *treap.Immutable
}

dbCacheSnapshot是每个Transaction创建时生成的对于dbCache的一个快照:

// dbCacheSnapshot defines a snapshot of the database cache and underlying
// database at a particular point in time.
type dbCacheSnapshot struct {
    dbSnapshot    *leveldb.Snapshot
    pendingKeys   *treap.Immutable
    pendingRemove *treap.Immutable
}

1.2.4 其它

driver.go: 对database/driver.go的里的Driver类型的实现。
ldbtreapiter.go: 相关iterators的包装层
reconcile.go: 保证metadata和flat block files一致性的一些操作

1.3 internal/treap

主要实现了可变的和不可变的treap数据类型，用来存储metadata。

Treap（Tree + Heap）是一种平衡二叉树，不过Treap会记录一个优先级（一般来说是随机生成），即Treap在以关键码构成二叉搜索树的同时，还会按照优先级的高低满足堆的性质。

对于每个结点，该结点的优先级不大于其所有孩子的优先级。Treap引入优先级的原因就是防止BST（二叉搜索树）退化成一条链，从而影响查询效率。

所以对于结点上的关键字来说，它是一颗BST，而对于结点上的优先级来讲，它是一个小顶堆。其平均查找长度为 O(logn) 。

Treap有插入、删除、旋转和查询等基本操作，进而可以实现查询第 k 大和查询关键字 x 排名等功能。

1.3.1 common.go

包含了实现treap的基本数据结构: treapNode 和 parentStack :

// treapNode represents a node in the treap.
type treapNode struct {
    key      []byte
    value    []byte
    priority int
    left     *treapNode
    right    *treapNode
}

type parentStack struct {
    index    int
    items    [staticDepth]*treapNode
    overflow []*treapNode
}

1.3.2 mutable.go, imutable.go

分别实现了可变的treap和不可变的treap。不可变的treap每次插入或者删除均返回新的treap。两者均实现了: Len, Size, Has, Get, Put, Delete, ForEach 等方法用来实现对treap的基本操作

1.3.3 treapiter.go

实现了对treap的遍历方法，包括: First,Last, Next, Prev,Seek,Key, Value, Valid, ForceReseek 等

1.4 cmd/dbtool

提供了命令行操作DB的一些方法，暂时略去

2 流程示意图

一图抵千言:)，下面是我尝试梳理事务流程时画的一张示意图，供参考：

图片较大适合新建窗口浏览：存储结构和流程示意图