rebuild: rocksdb db_bench runs successfully

2026-03-11 04:23:54 +00:00
parent 470412a1c2
commit a153ca5040
12 changed files with 734 additions and 837 deletions

784
README.md

@@ -1,640 +1,218 @@
# ZVFS
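ZVFS is a lightweight user-space file system prototype built on `SPDK Blobstore`.
It intercepts common POSIX file APIs through `LD_PRELOAD` and translates file I/O under the `/zvfs` path into blob I/O.
The goal is to let applications reuse blocking file interfaces with as few changes as possible, while approaching SPDK's performance ceiling at low queue depth (QD≈1).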
## 1. Project structure
```text
zvfs/
├── src/
│   ├── hook/          # POSIX API hook layer (open/read/write/...)
│   ├── fs/            # runtime metadata management for inode/path/fd
│   ├── spdk_engine/   # SPDK Blobstore wrapper
│   ├── common/        # alignment and buffer utility functions
│   ├── config.h       # default configuration (JSON, bdev, xattr key, etc.)
│   └── Makefile       # builds libzvfs.so
├── tests/
│   ├── hook/          # hook API semantics tests
│   ├── ioengine_test/ # Blob engine unit tests
│   └── Makefile
├── scripts/           # helper scripts for db_bench/hook tests
├── spdk/              # SPDK submodule
└── README.md
```
## 2. Core architecture
### 2.1 Layering
Current implementation:
```text
App (open/read/write/fstat/...)
-> LD_PRELOAD Hook (src/hook)
-> ZVFS Runtime Metadata (src/fs)
-> SPDK Engine (src/spdk_engine)
-> SPDK Blobstore
-> bdev (Malloc/NVMe)
```
Target architecture (daemon + IPC):
```text
App (multi-process, e.g. PostgreSQL)
-> LD_PRELOAD Hook Client
-> IPC (Unix Domain Socket)
-> zvfs daemon
-> metadata manager
-> SPDK worker threads
-> SPDK Blobstore / bdev
```
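### 2.2 Target architecture in brief (hook layer + daemon layer)
- `HOOK layer`
  - Intercepts POSIX APIs on `/zvfs` paths and issues IPC requests synchronously.
  - Keeps minimal local state (e.g. `fd -> remote_handle_id`).
  - Passes non-`/zvfs` paths through to the `real_*` syscalls (POSIX passthrough).
- `daemon layer`
  - Exclusively owns the SPDK resources (`spdk_env/blobstore/spdk_thread`).
  - Handles metadata and concurrency control (path/inode/handle) in one place.
  - Receives IPC requests, performs the actual I/O, and returns POSIX-style results and errno.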
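The passthrough decision in the hook layer comes down to a path-prefix test. A minimal sketch, assuming a hypothetical helper `is_zvfs_path` (not taken from the ZVFS source) and an absolute path that has already been normalized; only `/zvfs` itself and paths below it match:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed mount prefix; the README only states that /zvfs paths are intercepted. */
#define ZVFS_PREFIX "/zvfs"

/* Returns true when abspath is "/zvfs" or lies below it.
 * Assumes abspath is absolute and already normalized (no "..", no "//"). */
bool is_zvfs_path(const char *abspath)
{
    size_t n = strlen(ZVFS_PREFIX);

    if (abspath == NULL || strncmp(abspath, ZVFS_PREFIX, n) != 0)
        return false;
    /* Reject look-alikes such as "/zvfs_backup". */
    return abspath[n] == '\0' || abspath[n] == '/';
}
```

A hook would call the real libc function whenever this returns false, so paths such as `/tmp/test.dat` never reach the blob layer.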
### 2.3 Metadata and data mapping
- File data: stored in SPDK blobs.
- File-to-blob mapping: written to the `xattr` of the real file (key: `user.zvfs.blob_id`).
- Three tables are maintained at runtime:
  - `inode_table`: `blob_id -> inode`
  - `path_cache`: `path -> inode`
  - `fd_table`: `fd -> open_file`
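### 2.4 Key points of the current I/O path
- `blob_read/blob_write` always go through DMA buffers aligned to `io_unit_size`.
- An unaligned write triggers a read-modify-write (RMW): the aligned blocks are read first, the affected part is overwritten, and the result is written back.
- `readv/writev` are aggregated in the hook layer to reduce the number of I/O submissions.
- `fsync/fdatasync` call `blob_sync_md` on a zvfs fd; `sync_file_range` returns success directly on zvfs paths.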
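The alignment arithmetic behind the RMW path can be sketched as follows. This is an illustration under stated assumptions, not the ZVFS implementation: `zvfs_span`, `span_for` and `rmw_write` are hypothetical names, and a plain memory buffer stands in for the blob so the example runs without SPDK.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Aligned span, in io units, that covers a byte range (hypothetical type). */
struct zvfs_span {
    uint64_t start_unit;   /* first io unit touched */
    uint64_t unit_offset;  /* byte offset inside the first unit */
    uint64_t num_units;    /* number of io units touched */
    bool     needs_rmw;    /* true when the range is not unit-aligned */
};

struct zvfs_span span_for(uint64_t offset, uint64_t count, uint64_t io_unit_size)
{
    struct zvfs_span s;

    s.start_unit  = offset / io_unit_size;
    s.unit_offset = offset % io_unit_size;
    s.num_units   = (s.unit_offset + count + io_unit_size - 1) / io_unit_size;
    s.needs_rmw   = s.unit_offset != 0 || (count % io_unit_size) != 0;
    return s;
}

/* Read-modify-write against an in-memory "blob". Returns 0 on success, -1 on error. */
int rmw_write(uint8_t *blob, uint64_t blob_size, uint64_t io_unit_size,
              const void *buf, uint64_t offset, uint64_t count)
{
    struct zvfs_span s = span_for(offset, count, io_unit_size);
    uint64_t span_start = s.start_unit * io_unit_size;
    uint64_t span_bytes = s.num_units * io_unit_size;
    uint8_t *bounce;

    if (count == 0)
        return 0;
    if (span_start + span_bytes > blob_size)
        return -1;

    bounce = malloc(span_bytes);                       /* stands in for a DMA buffer */
    if (bounce == NULL)
        return -1;
    if (s.needs_rmw)
        memcpy(bounce, blob + span_start, span_bytes); /* read the aligned units */
    memcpy(bounce + s.unit_offset, buf, count);        /* overlay the user data */
    memcpy(blob + span_start, bounce, span_bytes);     /* write the units back */
    free(bounce);
    return 0;
}
```

With `io_unit_size = 4096`, a 5000-byte write at offset 4196 touches two io units and needs the extra read, which is consistent with the lower throughput reported for unaligned writes.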
## 3. Build
> The commands below assume the repository root is `/home/lian/try/zvfs`.
### 3.1 Initialize and build SPDK
```bash
git submodule update --init --recursive
cd spdk
./scripts/pkgdep.sh
./configure --with-shared
make -j"$(nproc)"
make
# sometimes dd if=/dev/zero of=/dev/nvme0n1 bs=1M count=10
LD_PRELOAD=./libzvfs.so ./func_test
```
## 测试 ### 3.2 构建 ZVFS 与测试
### Summary
Because the goal is to hook blocking APIs, the effective queue depth is 1.
队列深度为1的情况下spdk测试工具spdk_nvme_perf的测试结果 ```bash
1. iosize = 4K100MiB/s cd /home/lian/try/zvfs
2. ioszie = 128K1843MiB/s make -j"$(nproc)"
make test -j"$(nproc)"
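ZVFS results:
1. iosize = 4K: 95 MiB/s
2. iosize = 128K: 1662 MiB/s
This is roughly 90% of the read/write performance of the SPDK test tool.
Compared with system calls:
1. O_DIRECT
   1. Small block (4K): 43 MiB/s
   2. Large block (128K): 724 MiB/s
2. !O_DIRECT
   1. Small block (4K): 1266 MiB/s
   2. Large block (128K): 1460 MiB/s
In the unaligned case write performance is roughly halved, because a read-update-write is required.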
### spdk_nvme_perf baseline benchmark
```shell
cd /home/lian/share/10.1-spdk/spdk
export LD_LIBRARY_PATH=/home/lian/share/10.1-spdk/zvfs/spdk/build/lib:/home/lian/share/10.1-spdk/zvfs/spdk/dpdk/build/lib:$LD_LIBRARY_PATH
export PATH=/home/lian/share/10.1-spdk/zvfs/spdk/build/bin:$PATH
./build/bin/spdk_nvme_perf \
-r 'trtype:PCIe traddr:0000:03:00.0' \
-q 1 -o 4096 -w randwrite -t 5
root@ubuntu:/home/lian/share/10.1-spdk/spdk# ./build/bin/spdk_nvme_perf -r 'trtype:PCIe traddr:0000:03:00.0' -q 1 -o 4096 -w randwrite -t 5
Initializing NVMe Controllers
Attached to NVMe Controller at 0000:03:00.0 [15ad:07f0]
Associating PCIE (0000:03:00.0) NSID 1 with lcore 0
Initialization complete. Launching workers.
========================================================
Latency(us)
Device Information : IOPS MiB/s Average min max
PCIE (0000:03:00.0) NSID 1 from core 0: 25765.92 100.65 38.77 16.58 802.32
========================================================
Total : 25765.92 100.65 38.77 16.58 802.32
./build/bin/spdk_nvme_perf \
-r 'trtype:PCIe traddr:0000:03:00.0' \
-q 32 -o 4096 -w randwrite -t 5
root@ubuntu:/home/lian/share/10.1-spdk/spdk# ./build/bin/spdk_nvme_perf -r 'trtype:PCIe traddr:0000:03:00.0' -q 32 -o 4096 -w randwrite -t 5
Initializing NVMe Controllers
Attached to NVMe Controller at 0000:03:00.0 [15ad:07f0]
Associating PCIE (0000:03:00.0) NSID 1 with lcore 0
Initialization complete. Launching workers.
========================================================
Latency(us)
Device Information : IOPS MiB/s Average min max
PCIE (0000:03:00.0) NSID 1 from core 0: 80122.94 312.98 399.36 36.31 2225.64
========================================================
Total : 80122.94 312.98 399.36 36.31 2225.64
./build/bin/spdk_nvme_perf \
-r 'trtype:PCIe traddr:0000:03:00.0' \
-q 1 -o 131072 -w write -t 5
root@ubuntu:/home/lian/share/10.1-spdk/spdk# ./build/bin/spdk_nvme_perf -r 'trtype:PCIe traddr:0000:03:00.0' -q 1 -o 131072 -w write -t 5
Initializing NVMe Controllers
Attached to NVMe Controller at 0000:03:00.0 [15ad:07f0]
Associating PCIE (0000:03:00.0) NSID 1 with lcore 0
Initialization complete. Launching workers.
========================================================
Latency(us)
Device Information : IOPS MiB/s Average min max
PCIE (0000:03:00.0) NSID 1 from core 0: 14746.80 1843.35 67.79 40.16 4324.96
========================================================
Total : 14746.80 1843.35 67.79 40.16 4324.96
./build/bin/spdk_nvme_perf \
-r 'trtype:PCIe traddr:0000:03:00.0' \
-q 32 -o 131072 -w write -t 5
root@ubuntu:/home/lian/share/10.1-spdk/spdk# ./build/bin/spdk_nvme_perf -r 'trtype:PCIe traddr:0000:03:00.0' -q 32 -o 131072 -w write -t 5
Initializing NVMe Controllers
Attached to NVMe Controller at 0000:03:00.0 [15ad:07f0]
Associating PCIE (0000:03:00.0) NSID 1 with lcore 0
Initialization complete. Launching workers.
========================================================
Latency(us)
Device Information : IOPS MiB/s Average min max
PCIE (0000:03:00.0) NSID 1 from core 0: 21997.40 2749.68 1455.09 96.64 26152.13
========================================================
Total : 21997.40 2749.68 1455.09 96.64 26152.13
```
### System calls
#### no O_DIRECT, small block
```shell
root@ubuntu:/home/lian/share/10.1-spdk/zvfs# ./func_test
=== test_single_file_perf ===
Path : /tmp/test.dat
IO size : 4 KB
Max file: 2048 MB
Duration: 10 sec
WRITE:
total : 12668.9 MB
time : 10.003 sec
IOPS : 324211 ops/sec
BW : 1266.45 MB/s
READ:
total : 7664.5 MB
time : 10.000 sec
IOPS : 196210 ops/sec
BW : 766.44 MB/s
=== all tests PASSED ===
```
#### no O_DIRECT, large block
```shell
root@ubuntu:/home/lian/share/10.1-spdk/zvfs# ./func_test
=== test_single_file_perf ===
Path : /tmp/test.dat
IO size : 128 KB
Max file: 2048 MB
Duration: 10 sec
WRITE:
total : 14609.5 MB
time : 10.000 sec
IOPS : 11688 ops/sec
BW : 1460.95 MB/s
READ:
total : 8138.6 MB
time : 10.000 sec
IOPS : 6511 ops/sec
BW : 813.85 MB/s
=== all tests PASSED ===
``` ```
#### no O_DIRECT 随机 对齐 大块 产物:
```shell
root@ubuntu:/home/lian/share/10.1-spdk/zvfs/zvfs# ./func_test
=== test_single_file_random_perf === - `src/libzvfs.so`
Path : /tmp/test.dat - `tests/bin/hook_api_test`
IO size : 128 KB - `tests/bin/ioengine_single_blob_test`
Range : 2048 MB - `tests/bin/ioengine_multi_blob_test`
Duration: 10 sec - `tests/bin/ioengine_same_blob_mt_test`
RANDOM WRITE: ## 4. 运行与验证
total : 8930.8 MB
time : 10.001 sec
IOPS : 7144 ops/sec
BW : 893.01 MB/s
RANDOM READ: ### 4.1 Hook API 语义测试
total : 8238.9 MB
time : 10.000 sec
IOPS : 6591 ops/sec
BW : 823.89 MB/s
=== all tests PASSED === ```bash
``` mkdir -p /zvfs
#### no O_DIRECT 随机 非对齐 大块 cd /home/lian/try/zvfs
```shell LD_PRELOAD=$PWD/src/libzvfs.so ZVFS_TEST_ROOT=/zvfs ./tests/bin/hook_api_test
root@ubuntu:/home/lian/share/10.1-spdk/zvfs/zvfs# ./func_test
=== test_single_file_random_perf ===
Path : /tmp/test.dat
IO size : 128 KB
Range : 2048 MB
Duration: 10 sec
RANDOM WRITE:
total : 5964.4 MB
time : 10.000 sec
IOPS : 4771 ops/sec
BW : 596.43 MB/s
RANDOM READ:
total : 6607.8 MB
time : 10.000 sec
IOPS : 5286 ops/sec
BW : 660.77 MB/s
=== all tests PASSED ===
``` ```
#### O_DIRECT 小块 覆盖点包括:
```shell
root@ubuntu:/home/lian/share/10.1-spdk/zvfs# ./func_test
=== test_single_file_perf === - `open/openat/rename/unlink`
Path : /tmp/test.dat - `read/write/pread/pwrite/readv/writev/pwritev`
IO size : 4 KB - `fstat/lseek/ftruncate`
Max file: 2048 MB - `fcntl/ioctl(FIONREAD)`
Duration: 10 sec - `fsync/fdatasync`
WRITE: ### 4.2 SPDK 引擎测试
total : 434.5 MB
time : 10.000 sec
IOPS : 11122 ops/sec
BW : 43.45 MB/s
READ: ```bash
total : 373.8 MB cd /home/lian/try/zvfs
time : 10.000 sec SPDK_BDEV_NAME=Malloc0 ./tests/bin/ioengine_single_blob_test
IOPS : 9568 ops/sec SPDK_BDEV_NAME=Malloc0 ./tests/bin/ioengine_multi_blob_test
BW : 37.38 MB/s SPDK_BDEV_NAME=Malloc0 ./tests/bin/ioengine_same_blob_mt_test
=== all tests PASSED ===
```
#### O_DIRECT, large block
```shell
root@ubuntu:/home/lian/share/10.1-spdk/zvfs# ./func_test
=== test_single_file_perf ===
Path : /tmp/test.dat
IO size : 128 KB
Max file: 2048 MB
Duration: 10 sec
WRITE:
total : 7245.4 MB
time : 10.000 sec
IOPS : 5796 ops/sec
BW : 724.53 MB/s
READ:
total : 9006.5 MB
time : 10.000 sec
IOPS : 7205 ops/sec
BW : 900.64 MB/s
=== all tests PASSED ===
``` ```
### SPDK ## 5. 关键环境变量
#### Unaligned
```shell
root@ubuntu:/home/lian/share/10.1-spdk/zvfs# LD_PRELOAD=./libzvfs.so ./func_test /zvfs
=== test_single_file_perf === - `SPDK_BDEV_NAME`:选择后端 bdev默认 `Malloc0`)。
Path : /zvfs/file.dat - `ZVFS_BDEV``zvfs_ensure_init` 使用的 bdev 名称(未设置时使用 `config.h` 默认值)。
IO size : 128 KB - `SPDK_JSON_CONFIG`:覆盖默认 SPDK JSON 配置路径。
Max file: 2048 MB
Duration: 10 sec
WRITE: ## 6. 性能说明(仅保留趋势)
total : 10304.0 MB
time : 10.000 sec
IOPS : 8243 ops/sec
BW : 1030.40 MB/s
READ: `README` 历史压测数据来自旧版本,不能直接当作当前版本结论;但可作为设计趋势参考:
total : 17788.5 MB
time : 10.000 sec
IOPS : 14231 ops/sec
BW : 1778.85 MB/s
=== all tests PASSED === - 目标工作负载为阻塞 API近似 `QD=1`
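- On the old data, ZVFS reaches roughly `90%~95%` of `spdk_nvme_perf` at `QD=1`.
  - 4K: `95 MiB/s` vs `100 MiB/s`
  - 128K: `1662 MiB/s` vs `1843 MiB/s`
- Compared with the `O_DIRECT` path on the same machine, the old data shows about a `2.2x~2.3x` improvement in write bandwidth.
- Unaligned writes incur RMW, so throughput drops noticeably (on the old data, often close to half of aligned writes).
For external reporting, re-run the benchmarks on the current commit with a fixed hardware environment.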
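The ratios quoted in this section follow directly from the old measurements. A small sketch of the arithmetic, assuming hypothetical helper names `percent_of` and `speedup`, and taking the `O_DIRECT` write figures (43 and 724 MiB/s) from the old benchmark summary:

```c
/* Measured bandwidth as a percentage of a baseline (both in MiB/s). */
double percent_of(double measured, double baseline)
{
    return 100.0 * measured / baseline;
}

/* How many times faster `measured` is than `baseline`. */
double speedup(double measured, double baseline)
{
    return measured / baseline;
}
```

`percent_of(95, 100)` is 95 and `percent_of(1662, 1843)` is about 90.2, while `speedup(95, 43)` and `speedup(1662, 724)` come to about 2.2 and 2.3.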
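## 7. Current limitations
- Only `/zvfs` paths are intercepted.
- `mmap` on a zvfs fd currently returns `ENOTSUP` (applications are advised to disable mmap-based reads and writes).
- `dup/dup2/dup3` on a zvfs fd currently return `ENOTSUP`.
- `rename` across `/zvfs` and non-`/zvfs` paths returns `EXDEV`.
- `fallocate(FALLOC_FL_PUNCH_HOLE)` is not implemented.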
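The `EXDEV` rule for `rename` can be pictured as a boundary check on the two paths. A minimal sketch, assuming hypothetical helpers `path_in_zvfs` and `rename_boundary_check` and already-normalized absolute paths; the real hook may structure this differently:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when abspath is "/zvfs" or lies below it (hypothetical helper). */
bool path_in_zvfs(const char *abspath)
{
    static const char prefix[] = "/zvfs";
    size_t n = sizeof(prefix) - 1;

    if (abspath == NULL || strncmp(abspath, prefix, n) != 0)
        return false;
    return abspath[n] == '\0' || abspath[n] == '/';
}

/* Returns 0 when both paths sit on the same side of the /zvfs boundary,
 * otherwise -1 with errno set to EXDEV, as rename(2) does across mounts. */
int rename_boundary_check(const char *oldpath, const char *newpath)
{
    if (path_in_zvfs(oldpath) != path_in_zvfs(newpath)) {
        errno = EXDEV;
        return -1;
    }
    return 0;
}
```

Callers that already handle `EXDEV` from a cross-mount `rename` would typically fall back to copy plus unlink, so this keeps the behaviour familiar.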
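## 8. Next steps
- Complete the mmap path (mmap table + dirty-page writeback).
- Firm up semantics and benchmark baselines under multi-threaded, high-concurrency workloads.
- Add versioned benchmark reports so that historical numbers in the README do not go stale.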
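## 9. Blob Store: hard-won lessons
### Owner thread binding
The blobstore handles concurrency control internally: all metadata operations execute on a single thread, and their callbacks are bound to the thread that created the blobstore. In a multi-threaded model, therefore, the thread a request is sent to is not necessarily the one that can poll its callback.
Correct architecture: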
``` ```
#### 全对齐大块 metadata thread
```shell spdk_bs_load()
root@ubuntu:/home/lian/share/10.1-spdk/zvfs# LD_PRELOAD=./libzvfs.so ./func_test /zvfs resize
delete
sync_md
=== test_single_file_perf === worker thread
Path : /zvfs/file.dat blob_io_read
IO size : 128 KB blob_io_write
Max file: 2048 MB ```
Duration: 10 sec
WRITE: ### spdk_for_each_channel() Barrier
total : 16624.4 MB 某些 metadata 操作非常慢:
time : 10.000 sec
IOPS : 13299 ops/sec
BW : 1662.43 MB/s
READ:
total : 16430.8 MB
time : 10.000 sec
IOPS : 13145 ops/sec
BW : 1643.07 MB/s
=== all tests PASSED ===
``` ```
resize
#### 全对齐小块 delete
```shell unload
root@ubuntu:/home/lian/share/10.1-spdk/zvfs# LD_PRELOAD=./libzvfs.so ./func_test /zvfs snapshot
=== test_single_file_perf ===
Path : /zvfs/file.dat
IO size : 4 KB
Max file: 2048 MB
Duration: 10 sec
WRITE:
total : 944.5 MB
time : 10.000 sec
IOPS : 24179 ops/sec
BW : 94.45 MB/s
READ:
total : 982.8 MB
time : 10.000 sec
IOPS : 25159 ops/sec
BW : 98.28 MB/s
=== all tests PASSED ===
``` ```
这些操作内部会调用spdk_for_each_channel()
#### 对齐随机写(大块) 语义:在所有 io_channel 所属线程执行 callback
```shell
root@ubuntu:/home/lian/share/10.1-spdk/zvfs/zvfs# LD_PRELOAD=./libzvfs.so ./func_test /zvfs
=== test_single_file_random_perf ===
Path : /zvfs/file.dat
IO size : 128 KB
Range : 2048 MB
Duration: 10 sec
RANDOM WRITE:
total : 17461.8 MB
time : 10.000 sec
IOPS : 13969 ops/sec
BW : 1746.17 MB/s
RANDOM READ:
total : 17439.5 MB
time : 10.000 sec
IOPS : 13952 ops/sec
BW : 1743.95 MB/s
=== all tests PASSED ===
```
#### Unaligned random write (large block)
```shell
root@ubuntu:/home/lian/share/10.1-spdk/zvfs/zvfs# LD_PRELOAD=./libzvfs.so ./func_test /zvfs
=== test_single_file_random_perf ===
Path : /zvfs/file.dat
IO size : 128 KB
Range : 2048 MB
Duration: 10 sec
RANDOM WRITE:
total : 7500.2 MB
time : 10.000 sec
IOPS : 6000 ops/sec
BW : 750.02 MB/s
RANDOM READ:
total : 15143.8 MB
time : 10.000 sec
IOPS : 12115 ops/sec
BW : 1514.35 MB/s
=== all tests PASSED ===
```
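## SPDK
1. blob_store: the blob repository; it manages multiple blob objects.
2. blob: a storage object, logically contiguous but not necessarily physically contiguous. Analogous to a file.
3. cluster: the allocation unit. A blob can consist of multiple clusters, and growing it means allocating new clusters. Analogous to a file system block group.
4. page: the I/O unit. One cluster equals multiple pages.
File system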
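## Architecture design
```scss
| Application
| (POSIX API: open/read/write/close)
| LD_PRELOAD interception layer
| (simple path check and forwarding to zvfs)
| zvfs file system layer
| (blob operations)
| SPDK Blobstore
| Block device (Malloc0)
```
### On-disk layout
```scss
BlobStore:
|-- Super Blob (metadata, anchored through SPDK's Super Blob)
    |-- superblock
    |-- directory entries / directory log
|-- Blob 1 (file A...)
|-- Blob 2 (file B...)
|-- Blob N (file C...)
```
### Data structures
#### Super Blob (metadata)
```scss
[Superblock]
- magic_number: 0x5A563146 (ZV1F)
- version: 1
[Directory entry]
- filename[256]: file name
- blob_id: ID of the corresponding data blob
- file_size: actual file size (bytes)
- allocated_clusters: number of clusters allocated
- is_valid: validity flag (used for deletion)
```
Similar to: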
```c ```c
/* 目录项(内存中的目录) */ for each channel:
typedef struct { send_msg(channel->thread)
char filename[256];
spdk_blob_id blob_id;
uint64_t file_size; // 文件逻辑大小(字节)
uint32_t allocated_clusters; // 已分配的cluster数量
bool is_valid; // false 表示已删除
int32_t open_count; // 打开的文件句柄数量
} zvfs_dirent_t;
/* 文件系统全局结构 */
typedef struct zvfs {
struct spdk_blob_store *bs;
struct spdk_io_channel *channel;
struct spdk_blob *super_blob; // 承载目录日志的blob
uint64_t io_unit_size; // page大小单位字节
/* 目录 */
zvfs_dirent_t *dirents; // 目录项数组 #define ZVFS_MAX_FILES 1024
uint32_t dirent_count; // 当前有效项数
/* 伪FD表 */
struct zvfs_file *fd_table[ZVFS_MAX_FD]; // // e.g., #define ZVFS_MAX_FD 64
int fd_base; // 伪FD起始值如10000
int openfd_count;
/* 元数据 */
uint32_t magic; // 0x5A563146 (ZV1F)
uint32_t version; // 1
} zvfs_t;
/* 打开的文件句柄 */
typedef struct zvfs_file {
zvfs_t *fs;
struct spdk_blob *blob;
zvfs_dirent_t *dirent; // 指回目录项 file_size/allocated_clusters
uint64_t current_offset; // 当前读写位置
int flags; // O_RDONLY / O_RDWR / O_CREAT 等
int pseudo_fd;
/* 临时DMA缓冲区可选每个file一个避免每次malloc */
void *dma_buf;
uint64_t dma_buf_size;
} zvfs_file_t;
``` ```
### 工作流程 #### 问题1持有 Channel 的 Thread 不 poll
#### mount 如果所属线程不poll就会卡住。
hook POSIX API没有很好的调用时机单线程目前采用懒加载。 #### 问题2线程退出 Channel 没有释放
```scss 永远卡住
1. [创建块设备]
- spdk_bdev_create_bs_dev_ext
2. [初始化文件系统]
- spdk_bs_init 或者 spdk_bs_load已有数据时
- spdk_bs_get_io_unit_size 获取io单元大小(page)
- spdk_bs_alloc_io_channel 分配blobstore的读写入口
3. [读取元数据]
- spdk_bs_get_super_blob 获取 Super Blob ID
- spdk_bs_open_blob 打开 Super Blob
- 读取超级块,校验 magic
- 读取目录项数组,加载到内存 dirents
4. [创建zvfs_t结构体]
- 创建 zvfs_t 结构体
- 填充 bs/channel/super_blob/dirents 等字段
```
#### open
##### O_RDONLY / O_RDWR
```scss
1. [文件名查找]
- 遍历 dirents匹配 filename 且 is_valid=true
- 找不到返回 -ENOENT
2. [打开blob]
- spdk_bs_open_blob(dirent->blob_id)
- dirent->open_count++
- fs->openfd_count++
3. [分配文件句柄]
- 创建 zvfs_file_tdirent 指针指向目录项
- 分配伪FD写入 fd_table
5. [返回伪FD]
```
##### O_CREAT ### IO 操作的回调行为与 metadata 操作不同
```scss spdk_blob_io_read / spdk_blob_io_write 的回调,是通过传入的 io_channel 投递的,回调回到分配该 channel 的 thread。
1. [文件名查找]
- 遍历 dirents若 filename 已存在且 is_valid=true返回 -EEXIST
- 找一个 is_valid=false 的空槽位没有空槽则追加dirent_count < max_files
2. [创建blob]
- spdk_bs_create_blob → 得到 blob_id
- spdk_bs_open_blob → 得到 blob 句柄
- spdk_blob_resize 初始分配空间
- spdk_blob_sync_md 持久化 cluster 分配
3. [写目录]
- 填充 filename/blob_id/file_size=0/is_valid=true
- dirent->open_count = 1
4. [创建文件句柄]
- 创建 zvfs_file_t
- 分配伪FD写入 fd_table
5. [返回伪FD]
``` ### 超时任务
> 说明目录变更只写内存unmount 时统一持久化。 设置超时就免不了超时后回调成功执行,超时后回调仍会触发,存在 UAF 风险
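### read
Reads and writes are byte-addressed: offset and count are in bytes, and alignment is computed from io_unit_size.
```scss
1. [Parameters]
- fd
- buffer
- count
- offset (implicit)
2. [Bounds check]
- readable = min(count, dirent->file_size - current_offset)
- return 0 if readable is 0
3. [Compute blob position]
- start_page = current_offset / io_unit_size
- page_offset = current_offset % io_unit_size
- num_pages = (page_offset + readable + io_unit_size - 1) / io_unit_size
4. [DMA read]
- Unaligned read (page_offset != 0 || count is not a whole number of pages)
  - needs a temporary DMA buffer (spdk_dma_zmalloc)
  - spdk_blob_io_read(blob, channel, dma_buffer, start_page, num_pages, ...)
  - copy from dma_buffer + page_offset into the user buffer
- Aligned
  - still read through the DMA buffer, then copy into the user buffer
5. [Update offset]
- current_offset += readable
6. [Return the number of bytes actually read]
```
> Note: SPDK requires DMA-capable memory, and user buffers supplied by the application usually do not qualify. Even when aligned, they cannot be submitted directly to spdk_blob_io_*; a DMA buffer should be used as a bounce buffer. Registering a memory pool could later allow direct transfers.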
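### write
```scss
1. [Parameters]
- fd
- buffer
- count
- offset (implicit)
2. [Check for sufficient space]
- required size = current_offset + count
- if it exceeds the capacity of allocated_clusters:
  - spdk_blob_resize to grow
  - spdk_blob_sync_md
  - update dirent->allocated_clusters
3. [Compute write position]
- start_page / page_offset / num_pages (same as read)
4. [DMA write]
- Unaligned write (page_offset != 0 || count is not a whole number of pages)
  - read the first and last pages involved into a temporary DMA buffer
  - modify the data at the corresponding position
  - write back: spdk_blob_io_write(blob, channel, dma_buffer, start_page, num_pages, ...)
- Aligned
  - still submit the write through a DMA buffer
5. [Update state]
- current_offset += count
- dirent->file_size = max(dirent->file_size, current_offset)
6. [Return the number of bytes written]
```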
### close
```scss
1. [Close the blob]
- spdk_blob_close(file->blob)
- dirent->open_count--
- fs->openfd_count--
- if open_count == 0 and is_valid == false (already unlinked): spdk_bs_delete_blob, clear the dirent
- if openfd_count == 0: unmount
2. [Release buffers]
- free dma_buf
- clear fd_table[pseudo_fd]
- free(zvfs_file_t)
3. [Return 0]
```
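### unlink
```scss
1. [Look up the directory entry]
- scan dirents for a matching filename with is_valid=true
- return -ENOENT if not found
2. [Mark as deleted]
- dirent->is_valid = false
3. [Decide whether to delete immediately]
- open_count == 0: spdk_bs_delete_blob, clear the slot
- open_count > 0: defer; the last close performs the deletion
4. [Return 0]
```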
### unmount
```scss
1. [Close the channel]
- spdk_bs_free_io_channel
2. [Close the BlobStore]
- spdk_bs_unload
3. [Release the FS]
- free(fs)
```
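### Alternatives
Instead of hooking with `LD_PRELOAD`, FUSE could be used.\
FUSE is a kernel file system facility that is mounted on a directory; accesses to that directory go through it.\
It forwards requests to a user-space program, which here could be built on SPDK. That way the remaining operations need no special handling.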


@@ -1,5 +1,6 @@
 #!/usr/bin/env bash
 set -euo pipefail
+env -u LD_PRELOAD rm -rf /zvfs/rocksdb_manual || true
 # =========================
 # Manual Config (edit here)
@@ -19,9 +20,11 @@ DB_PATH="/zvfs/rocksdb_manual"
 BENCHMARKS="fillrandom,readrandom"
 # number of keys
-NUM=1000000
+# NUM=1000000
+NUM=50000
 # number of threads
-THREADS=1
+THREADS=2
 # random seed
 SEED=1


@@ -25,6 +25,7 @@
 // waiter
 #define WAITER_MAX_TIME 10000000
+#define ZVFS_WAIT_TIME 5000ULL


@@ -17,6 +17,7 @@
 #include <unistd.h>
 #include <limits.h>
 #include <pthread.h>
+#include <stdio.h>
 /* ------------------------------------------------------------------ */
 /* Internal: core logic of open (path already resolved to an absolute path) */
@@ -44,7 +45,15 @@ zvfs_open_impl(int real_fd, const char *abspath, int flags, mode_t mode)
 /* 1. Create the blob */
 handle = blob_create(0);
-if (!handle) { errno = EIO; goto fail; }
+if (!handle) {
+    int saved = errno;
+    if (saved == 0) saved = EIO;
+    fprintf(stderr,
+            "[zvfs] create blob failed path=%s flags=0x%x errno=%d(%s)\n",
+            abspath, flags, saved, strerror(saved));
+    errno = saved;
+    goto fail;
+}
 blob_id = handle->id;
 /* 2. Write blob_id into the xattr of the real file */
@@ -80,7 +89,7 @@ zvfs_open_impl(int real_fd, const char *abspath, int flags, mode_t mode)
 /* path_cache hit: reuse the cached inode and call blob_open again */
 blob_id = inode->blob_id;
 handle = blob_open(blob_id);
-if (!handle) { errno = EIO; goto fail; }
+if (!handle) { if (errno == 0) errno = EIO; goto fail; }
 /* Shared inode: take another reference */
 atomic_fetch_add(&inode->ref_count, 1);
@@ -101,7 +110,7 @@ zvfs_open_impl(int real_fd, const char *abspath, int flags, mode_t mode)
 } else {
 /* Brand-new inode: mode/size must come from stat on the real file */
 struct stat st;
-if (real_fstat(real_fd, &st) < 0) goto fail;
+if (zvfs_real_fstat(real_fd, &st) < 0) goto fail;
 inode = inode_alloc(blob_id, st.st_mode, ZVFS_ITYPE_FILE);
 if (!inode) { errno = ENOMEM; goto fail; }
@@ -117,7 +126,7 @@ zvfs_open_impl(int real_fd, const char *abspath, int flags, mode_t mode)
 }
 handle = blob_open(blob_id);
-if (!handle) { errno = EIO; goto fail; }
+if (!handle) { if (errno == 0) errno = EIO; goto fail; }
 }
 }
@@ -340,11 +349,14 @@ zvfs_close_impl(int fd)
 return real_close(fd);
 }
-/* ---- openfile refcount reached zero: close the blob handle --------------------- */
+/* ---- openfile refcount reached zero: flush metadata first, then close the blob handle ------ */
 struct zvfs_inode *inode = of->inode;
 struct zvfs_blob_handle *handle = of->handle;
+int sync_failed = 0;
 openfile_free(of);
+if (blob_sync_md(handle) < 0)
+    sync_failed = 1;
 blob_close(handle);
 /* ---- inode ref_count-- --------------------------------------- */
@@ -391,7 +403,14 @@ zvfs_close_impl(int fd)
 inode_free(inode);
 }
-return real_close(fd);
+int rc = real_close(fd);
+if (rc < 0)
+    return -1;
+if (sync_failed) {
+    errno = EIO;
+    return -1;
+}
+return 0;
 }
 int
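
The error-handling changes in this file follow two small patterns: keep an errno already set by a lower layer and fall back to `EIO` only when none was set, and report a deferred metadata-sync failure only after the real `close` has succeeded. A standalone sketch of both, assuming hypothetical helper names `errno_or_default` and `finish_close` that do not appear in the commit:

```c
#include <errno.h>

/* Keep the errno set by a lower layer; use the fallback only when none was set. */
int errno_or_default(int current, int fallback)
{
    return current != 0 ? current : fallback;
}

/* Hypothetical tail of a close path: the close error takes precedence,
 * then a deferred sync failure is surfaced as EIO. */
int finish_close(int close_rc, int sync_failed)
{
    if (close_rc < 0)
        return -1;          /* errno was already set by the real close */
    if (sync_failed) {
        errno = EIO;
        return -1;
    }
    return 0;
}
```

The ordering matters for callers such as databases that treat a failed `close` after buffered writes as a durability error.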


@@ -81,6 +81,15 @@ int (*real_fstatat)(int, const char *, struct stat *, int) = NULL;
 int (*real_fstatat64)(int, const char *, struct stat64 *, int) = NULL;
 int (*real_statx)(int, const char *, int, unsigned int,
                   struct statx *) = NULL;
+int (*real___xstat)(int, const char *, struct stat *) = NULL;
+int (*real___xstat64)(int, const char *, struct stat64 *) = NULL;
+int (*real___fxstat)(int, int, struct stat *) = NULL;
+int (*real___fxstat64)(int, int, struct stat64 *) = NULL;
+int (*real___lxstat)(int, const char *, struct stat *) = NULL;
+int (*real___lxstat64)(int, const char *, struct stat64 *) = NULL;
+int (*real___fxstatat)(int, int, const char *, struct stat *, int) = NULL;
+int (*real___fxstatat64)(int, int, const char *, struct stat64 *,
+                         int) = NULL;
 /* sync */
 int (*real_fsync)(int) = NULL;
@@ -116,10 +125,19 @@ int (*real___open64)(const char *, int, ...) = NULL;
 int (*real___libc_open)(const char *, int, ...) = NULL;
 ssize_t (*real___read)(int, void *, size_t) = NULL;
 ssize_t (*real___libc_read)(int, void *, size_t) = NULL;
+ssize_t (*real___read_nocancel)(int, void *, size_t) = NULL;
+ssize_t (*real___pread64)(int, void *, size_t, off_t) = NULL;
+ssize_t (*real___libc_pread)(int, void *, size_t, off_t) = NULL;
+ssize_t (*real___pread64_nocancel)(int, void *, size_t, off_t) = NULL;
+ssize_t (*real___read_chk)(int, void *, size_t, size_t) = NULL;
+ssize_t (*real___pread_chk)(int, void *, size_t, off_t, size_t) = NULL;
+ssize_t (*real___pread64_chk)(int, void *, size_t, off_t, size_t) = NULL;
 ssize_t (*real___write)(int, const void *, size_t) = NULL;
 ssize_t (*real___libc_write)(int, const void *, size_t) = NULL;
 int (*real___close)(int) = NULL;
 int (*real___libc_close)(int) = NULL;
+size_t (*real_fread_unlocked)(void *, size_t, size_t, FILE *) = NULL;
+size_t (*real_fread)(void *, size_t, size_t, FILE *) = NULL;
 /* ------------------------------------------------------------------ */
 /* dlsym helper macros */
@@ -180,14 +198,14 @@ void zvfs_hook_init(void)
     LOAD_SYM(real_fallocate, "fallocate");
     LOAD_SYM(real_posix_fallocate, "posix_fallocate");
-    LOAD_SYM(real_stat, "stat");
-    LOAD_SYM(real_stat64, "stat64");
-    LOAD_SYM(real_fstat, "fstat");
-    LOAD_SYM(real_fstat64, "fstat64");
-    LOAD_SYM(real_lstat, "lstat");
-    LOAD_SYM(real_lstat64, "lstat64");
-    LOAD_SYM(real_fstatat, "fstatat");
-    LOAD_SYM(real_fstatat64, "fstatat64");
+    LOAD_SYM_OPTIONAL(real_stat, "stat");
+    LOAD_SYM_OPTIONAL(real_stat64, "stat64");
+    LOAD_SYM_OPTIONAL(real_fstat, "fstat");
+    LOAD_SYM_OPTIONAL(real_fstat64, "fstat64");
+    LOAD_SYM_OPTIONAL(real_lstat, "lstat");
+    LOAD_SYM_OPTIONAL(real_lstat64, "lstat64");
+    LOAD_SYM_OPTIONAL(real_fstatat, "fstatat");
+    LOAD_SYM_OPTIONAL(real_fstatat64, "fstatat64");
     LOAD_SYM(real_fsync, "fsync");
     LOAD_SYM(real_fdatasync, "fdatasync");
     LOAD_SYM(real_fcntl, "fcntl");
@@ -215,17 +233,110 @@ void zvfs_hook_init(void)
     LOAD_SYM_OPTIONAL(real___open, "__open");
     LOAD_SYM_OPTIONAL(real___open64, "__open64");
     LOAD_SYM_OPTIONAL(real___libc_open, "__libc_open");
+    LOAD_SYM_OPTIONAL(real___xstat, "__xstat");
+    LOAD_SYM_OPTIONAL(real___xstat64, "__xstat64");
+    LOAD_SYM_OPTIONAL(real___fxstat, "__fxstat");
+    LOAD_SYM_OPTIONAL(real___fxstat64, "__fxstat64");
+    LOAD_SYM_OPTIONAL(real___lxstat, "__lxstat");
+    LOAD_SYM_OPTIONAL(real___lxstat64, "__lxstat64");
+    LOAD_SYM_OPTIONAL(real___fxstatat, "__fxstatat");
+    LOAD_SYM_OPTIONAL(real___fxstatat64, "__fxstatat64");
     LOAD_SYM_OPTIONAL(real___read, "__read");
     LOAD_SYM_OPTIONAL(real___libc_read, "__libc_read");
+    LOAD_SYM_OPTIONAL(real___read_nocancel, "__read_nocancel");
+    LOAD_SYM_OPTIONAL(real___pread64, "__pread64");
+    LOAD_SYM_OPTIONAL(real___libc_pread, "__libc_pread");
+    LOAD_SYM_OPTIONAL(real___pread64_nocancel, "__pread64_nocancel");
+    LOAD_SYM_OPTIONAL(real___read_chk, "__read_chk");
+    LOAD_SYM_OPTIONAL(real___pread_chk, "__pread_chk");
+    LOAD_SYM_OPTIONAL(real___pread64_chk, "__pread64_chk");
     LOAD_SYM_OPTIONAL(real___write, "__write");
     LOAD_SYM_OPTIONAL(real___libc_write, "__libc_write");
     LOAD_SYM_OPTIONAL(real___close, "__close");
     LOAD_SYM_OPTIONAL(real___libc_close, "__libc_close");
+    LOAD_SYM_OPTIONAL(real_fread_unlocked, "fread_unlocked");
+    LOAD_SYM_OPTIONAL(real_fread, "fread");
     /* Initialize the global fs structure */
     zvfs_fs_init();
 }
+#ifndef _STAT_VER
+#define _STAT_VER 0
+#endif
+int
+zvfs_real_stat(const char *path, struct stat *buf)
+{
+    if (real_stat) return real_stat(path, buf);
+    if (real___xstat) return real___xstat(_STAT_VER, path, buf);
+    errno = ENOSYS;
+    return -1;
+}
+int
+zvfs_real_stat64(const char *path, struct stat64 *buf)
+{
+    if (real_stat64) return real_stat64(path, buf);
+    if (real___xstat64) return real___xstat64(_STAT_VER, path, buf);
+    errno = ENOSYS;
+    return -1;
+}
+int
+zvfs_real_fstat(int fd, struct stat *buf)
+{
+    if (real_fstat) return real_fstat(fd, buf);
+    if (real___fxstat) return real___fxstat(_STAT_VER, fd, buf);
+    errno = ENOSYS;
+    return -1;
+}
+int
+zvfs_real_fstat64(int fd, struct stat64 *buf)
+{
+    if (real_fstat64) return real_fstat64(fd, buf);
+    if (real___fxstat64) return real___fxstat64(_STAT_VER, fd, buf);
+    errno = ENOSYS;
+    return -1;
+}
+int
+zvfs_real_lstat(const char *path, struct stat *buf)
+{
+    if (real_lstat) return real_lstat(path, buf);
+    if (real___lxstat) return real___lxstat(_STAT_VER, path, buf);
+    errno = ENOSYS;
+    return -1;
+}
+int
+zvfs_real_lstat64(const char *path, struct stat64 *buf)
+{
+    if (real_lstat64) return real_lstat64(path, buf);
+    if (real___lxstat64) return real___lxstat64(_STAT_VER, path, buf);
+    errno = ENOSYS;
+    return -1;
+}
+int
+zvfs_real_fstatat(int dirfd, const char *path, struct stat *buf, int flags)
+{
+    if (real_fstatat) return real_fstatat(dirfd, path, buf, flags);
+    if (real___fxstatat) return real___fxstatat(_STAT_VER, dirfd, path, buf, flags);
+    errno = ENOSYS;
+    return -1;
+}
+int
+zvfs_real_fstatat64(int dirfd, const char *path, struct stat64 *buf, int flags)
+{
+    if (real_fstatat64) return real_fstatat64(dirfd, path, buf, flags);
+    if (real___fxstatat64) return real___fxstatat64(_STAT_VER, dirfd, path, buf, flags);
+    errno = ENOSYS;
+    return -1;
+}
 /* ------------------------------------------------------------------ */
 /* Path / fd checks */
 /* ------------------------------------------------------------------ */


@@ -7,6 +7,7 @@
 #include <fcntl.h>
 #include <unistd.h>
 #include <stdint.h>
+#include <stdio.h>
 #include "fs/zvfs_sys_init.h"
 /*
@@ -73,6 +74,17 @@ extern int (*real_fstatat)(int dirfd, const char *path, struct stat *buf, int
 extern int (*real_fstatat64)(int dirfd, const char *path, struct stat64 *buf, int flags);
 extern int (*real_statx)(int dirfd, const char *path, int flags,
                          unsigned int mask, struct statx *buf);
+/* glibc xstat fallback */
+extern int (*real___xstat)(int ver, const char *path, struct stat *buf);
+extern int (*real___xstat64)(int ver, const char *path, struct stat64 *buf);
+extern int (*real___fxstat)(int ver, int fd, struct stat *buf);
+extern int (*real___fxstat64)(int ver, int fd, struct stat64 *buf);
+extern int (*real___lxstat)(int ver, const char *path, struct stat *buf);
+extern int (*real___lxstat64)(int ver, const char *path, struct stat64 *buf);
+extern int (*real___fxstatat)(int ver, int dirfd, const char *path,
+                              struct stat *buf, int flags);
+extern int (*real___fxstatat64)(int ver, int dirfd, const char *path,
+                                struct stat64 *buf, int flags);
 /* sync */
 extern int (*real_fsync)(int fd);
@@ -109,10 +121,19 @@ extern int (*real___open64)(const char *path, int flags, ...);
extern int (*real___libc_open)(const char *path, int flags, ...); extern int (*real___libc_open)(const char *path, int flags, ...);
extern ssize_t (*real___read)(int fd, void *buf, size_t count); extern ssize_t (*real___read)(int fd, void *buf, size_t count);
extern ssize_t (*real___libc_read)(int fd, void *buf, size_t count); extern ssize_t (*real___libc_read)(int fd, void *buf, size_t count);
extern ssize_t (*real___read_nocancel)(int fd, void *buf, size_t count);
extern ssize_t (*real___pread64)(int fd, void *buf, size_t count, off64_t offset);
extern ssize_t (*real___libc_pread)(int fd, void *buf, size_t count, off64_t offset);
extern ssize_t (*real___pread64_nocancel)(int fd, void *buf, size_t count, off64_t offset);
extern ssize_t (*real___read_chk)(int fd, void *buf, size_t count, size_t buflen);
extern ssize_t (*real___pread_chk)(int fd, void *buf, size_t count, off_t offset, size_t buflen);
extern ssize_t (*real___pread64_chk)(int fd, void *buf, size_t count, off64_t offset, size_t buflen);
extern ssize_t (*real___write)(int fd, const void *buf, size_t count); extern ssize_t (*real___write)(int fd, const void *buf, size_t count);
extern ssize_t (*real___libc_write)(int fd, const void *buf, size_t count); extern ssize_t (*real___libc_write)(int fd, const void *buf, size_t count);
extern int (*real___close)(int fd); extern int (*real___close)(int fd);
extern int (*real___libc_close)(int fd); extern int (*real___libc_close)(int fd);
extern size_t (*real_fread_unlocked)(void *ptr, size_t size, size_t nmemb, FILE *stream);
extern size_t (*real_fread)(void *ptr, size_t size, size_t nmemb, FILE *stream);
/* Initialize all real_* pointers; called from the constructor */ /* Initialize all real_* pointers; called from the constructor */
void zvfs_hook_init(void); void zvfs_hook_init(void);
@@ -127,4 +148,14 @@ int zvfs_is_zvfs_fd(int fd);
* Returns 0 on success; on failure returns -1 and sets errno. * Returns 0 on success; on failure returns -1 and sets errno.
*/ */
int zvfs_resolve_atpath(int dirfd, const char *path, char *buf, size_t bufsz); int zvfs_resolve_atpath(int dirfd, const char *path, char *buf, size_t bufsz);
/* stat wrappers: prefer real_*, fall back to __xstat* */
int zvfs_real_stat(const char *path, struct stat *buf);
int zvfs_real_stat64(const char *path, struct stat64 *buf);
int zvfs_real_fstat(int fd, struct stat *buf);
int zvfs_real_fstat64(int fd, struct stat64 *buf);
int zvfs_real_lstat(const char *path, struct stat *buf);
int zvfs_real_lstat64(const char *path, struct stat64 *buf);
int zvfs_real_fstatat(int dirfd, const char *path, struct stat *buf, int flags);
int zvfs_real_fstatat64(int dirfd, const char *path, struct stat64 *buf, int flags);
#endif // __ZVFS_HOOK_INIT_H__ #endif // __ZVFS_HOOK_INIT_H__
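Review note on the `zvfs_real_*` stat wrappers declared above: the fallback is likely needed because glibc releases before 2.33 do not export `stat` as a dynamic symbol, so a `dlsym(RTLD_NEXT, "stat")` lookup can return NULL there while `__xstat` still resolves. The following is a minimal sketch of that "prefer the direct pointer, fall back to the versioned one" pattern. `stat_with_fallback`, `stat_fn`, and `xstat_fn` are hypothetical names, not the project's implementation, and the `_STAT_VER` default of 1 is an assumption that only holds for x86_64.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Assumption: 1 is the x86_64 value; other ABIs use a different version. */
#ifndef _STAT_VER
#define _STAT_VER 1
#endif

typedef int (*stat_fn)(const char *path, struct stat *buf);
typedef int (*xstat_fn)(int ver, const char *path, struct stat *buf);

/* Hypothetical helper, not the project's zvfs_real_stat. */
static int stat_with_fallback(stat_fn direct, xstat_fn versioned,
                              const char *path, struct stat *buf)
{
    assert(path != NULL && buf != NULL);
    if (direct)                      /* newer glibc: stat() is exported */
        return direct(path, buf);
    if (versioned)                   /* older glibc: only __xstat() resolves */
        return versioned(_STAT_VER, path, buf);
    errno = ENOSYS;                  /* neither symbol could be resolved */
    return -1;
}
```

Passing the two pointers as parameters keeps the sketch testable without `dlsym`; in a hook library they would more plausibly be globals filled in by the init routine.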


@@ -10,9 +10,14 @@
#include "spdk_engine/io_engine.h" #include "spdk_engine/io_engine.h"
#include <errno.h> #include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h> #include <string.h>
#include <pthread.h> #include <pthread.h>
#include <stdint.h> #include <stdint.h>
#include <limits.h>
#include <unistd.h>
#include <bits/types/struct_FILE.h>
/* ------------------------------------------------------------------ */ /* ------------------------------------------------------------------ */
/* Internal: single-segment pread / pwrite; does not modify of->offset */ /* Internal: single-segment pread / pwrite; does not modify of->offset */
@@ -212,6 +217,56 @@ get_of(int fd)
return of; return of;
} }
static size_t
zvfs_fread_impl(void *ptr, size_t size, size_t nmemb, FILE *stream, int unlocked)
{
ZVFS_HOOK_ENTER();
int fd = stream ? fileno(stream) : -1;
struct zvfs_open_file *of = NULL;
if (ZVFS_IN_HOOK() || fd < 0 || !(of = get_of(fd))) {
size_t r = 0;
if (unlocked) {
if (real_fread_unlocked) r = real_fread_unlocked(ptr, size, nmemb, stream);
else if (real_fread) r = real_fread(ptr, size, nmemb, stream);
else errno = ENOSYS;
} else {
if (real_fread) r = real_fread(ptr, size, nmemb, stream);
else if (real_fread_unlocked) r = real_fread_unlocked(ptr, size, nmemb, stream);
else errno = ENOSYS;
}
ZVFS_HOOK_LEAVE();
return r;
}
zvfs_ensure_init();
if (size == 0 || nmemb == 0) {
ZVFS_HOOK_LEAVE();
return 0;
}
if (nmemb > SIZE_MAX / size) {
errno = EOVERFLOW;
ZVFS_HOOK_LEAVE();
return 0;
}
size_t total = size * nmemb;
ssize_t n = zvfs_pread_impl(of, ptr, total, of->offset);
if (n > 0)
of->offset += (uint64_t)n;
/* Keep stdio state machine consistent for callers that check feof/ferror. */
if (n < 0) {
stream->_flags |= _IO_ERR_SEEN;
} else if ((size_t)n < total) {
stream->_flags |= _IO_EOF_SEEN;
}
ZVFS_HOOK_LEAVE();
return (n <= 0) ? 0 : ((size_t)n / size);
}
/* ------------------------------------------------------------------ */ /* ------------------------------------------------------------------ */
/* read */ /* read */
/* ------------------------------------------------------------------ */ /* ------------------------------------------------------------------ */
@@ -221,7 +276,7 @@ read(int fd, void *buf, size_t count)
{ {
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
struct zvfs_open_file *of; struct zvfs_open_file *of = NULL;
if (ZVFS_IN_HOOK() || !(of = get_of(fd))) { if (ZVFS_IN_HOOK() || !(of = get_of(fd))) {
ssize_t r = real_read(fd, buf, count); ssize_t r = real_read(fd, buf, count);
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
@@ -240,6 +295,15 @@ read(int fd, void *buf, size_t count)
ssize_t __read(int fd, void *buf, size_t count) { return read(fd, buf, count); } ssize_t __read(int fd, void *buf, size_t count) { return read(fd, buf, count); }
ssize_t __libc_read(int fd, void *buf, size_t count) { return read(fd, buf, count); } ssize_t __libc_read(int fd, void *buf, size_t count) { return read(fd, buf, count); }
ssize_t __read_nocancel(int fd, void *buf, size_t count) { return read(fd, buf, count); }
ssize_t __read_chk(int fd, void *buf, size_t count, size_t buflen)
{
if (count > buflen) {
errno = ERANGE;
return -1;
}
return read(fd, buf, count);
}
/* ------------------------------------------------------------------ */ /* ------------------------------------------------------------------ */
/* pread / pread64 */ /* pread / pread64 */
@@ -250,7 +314,7 @@ pread(int fd, void *buf, size_t count, off_t offset)
{ {
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
struct zvfs_open_file *of; struct zvfs_open_file *of = NULL;
if (ZVFS_IN_HOOK() || !(of = get_of(fd))) { if (ZVFS_IN_HOOK() || !(of = get_of(fd))) {
ssize_t r = real_pread(fd, buf, count, offset); ssize_t r = real_pread(fd, buf, count, offset);
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
@@ -269,6 +333,49 @@ ssize_t pread64(int fd, void *buf, size_t count, off_t offset)
return pread(fd, buf, count, offset); return pread(fd, buf, count, offset);
} }
ssize_t __pread64(int fd, void *buf, size_t count, off_t offset)
{
return pread(fd, buf, count, offset);
}
ssize_t __libc_pread(int fd, void *buf, size_t count, off_t offset)
{
return pread(fd, buf, count, offset);
}
ssize_t __pread64_nocancel(int fd, void *buf, size_t count, off_t offset)
{
return pread(fd, buf, count, offset);
}
ssize_t __pread_chk(int fd, void *buf, size_t count, off_t offset, size_t buflen)
{
if (count > buflen) {
errno = ERANGE;
return -1;
}
return pread(fd, buf, count, offset);
}
ssize_t __pread64_chk(int fd, void *buf, size_t count, off_t offset, size_t buflen)
{
if (count > buflen) {
errno = ERANGE;
return -1;
}
return pread(fd, buf, count, offset);
}
size_t fread_unlocked(void *ptr, size_t size, size_t nmemb, FILE *stream)
{
return zvfs_fread_impl(ptr, size, nmemb, stream, 1);
}
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
{
return zvfs_fread_impl(ptr, size, nmemb, stream, 0);
}
/* ------------------------------------------------------------------ */ /* ------------------------------------------------------------------ */
/* readv / preadv / preadv64 / preadv2 */ /* readv / preadv / preadv64 / preadv2 */
/* ------------------------------------------------------------------ */ /* ------------------------------------------------------------------ */
@@ -278,7 +385,7 @@ readv(int fd, const struct iovec *iov, int iovcnt)
{ {
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
struct zvfs_open_file *of; struct zvfs_open_file *of = NULL;
if (ZVFS_IN_HOOK() || !(of = get_of(fd))) { if (ZVFS_IN_HOOK() || !(of = get_of(fd))) {
ssize_t r = real_readv(fd, iov, iovcnt); ssize_t r = real_readv(fd, iov, iovcnt);
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
@@ -300,7 +407,7 @@ preadv(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{ {
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
struct zvfs_open_file *of; struct zvfs_open_file *of = NULL;
if (ZVFS_IN_HOOK() || !(of = get_of(fd))) { if (ZVFS_IN_HOOK() || !(of = get_of(fd))) {
ssize_t r = real_preadv(fd, iov, iovcnt, offset); ssize_t r = real_preadv(fd, iov, iovcnt, offset);
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
@@ -324,7 +431,7 @@ preadv2(int fd, const struct iovec *iov, int iovcnt, off_t offset, int flags)
{ {
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
struct zvfs_open_file *of; struct zvfs_open_file *of = NULL;
if (ZVFS_IN_HOOK() || !(of = get_of(fd))) { if (ZVFS_IN_HOOK() || !(of = get_of(fd))) {
ssize_t r = real_preadv2 ssize_t r = real_preadv2
? real_preadv2(fd, iov, iovcnt, offset, flags) ? real_preadv2(fd, iov, iovcnt, offset, flags)
@@ -358,7 +465,7 @@ write(int fd, const void *buf, size_t count)
{ {
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
struct zvfs_open_file *of; struct zvfs_open_file *of = NULL;
if (ZVFS_IN_HOOK() || !(of = get_of(fd))) { if (ZVFS_IN_HOOK() || !(of = get_of(fd))) {
ssize_t r = real_write(fd, buf, count); ssize_t r = real_write(fd, buf, count);
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
@@ -423,7 +530,7 @@ pwrite(int fd, const void *buf, size_t count, off_t offset)
{ {
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
struct zvfs_open_file *of; struct zvfs_open_file *of = NULL;
if (ZVFS_IN_HOOK() || !(of = get_of(fd))) { if (ZVFS_IN_HOOK() || !(of = get_of(fd))) {
ssize_t r = real_pwrite(fd, buf, count, offset); ssize_t r = real_pwrite(fd, buf, count, offset);
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
@@ -454,7 +561,7 @@ writev(int fd, const struct iovec *iov, int iovcnt)
{ {
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
struct zvfs_open_file *of; struct zvfs_open_file *of = NULL;
if (ZVFS_IN_HOOK() || !(of = get_of(fd))) { if (ZVFS_IN_HOOK() || !(of = get_of(fd))) {
ssize_t r = real_writev(fd, iov, iovcnt); ssize_t r = real_writev(fd, iov, iovcnt);
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
@@ -501,7 +608,7 @@ pwritev(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{ {
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
struct zvfs_open_file *of; struct zvfs_open_file *of = NULL;
if (ZVFS_IN_HOOK() || !(of = get_of(fd))) { if (ZVFS_IN_HOOK() || !(of = get_of(fd))) {
ssize_t r = real_pwritev(fd, iov, iovcnt, offset); ssize_t r = real_pwritev(fd, iov, iovcnt, offset);
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
@@ -525,7 +632,7 @@ pwritev2(int fd, const struct iovec *iov, int iovcnt, off_t offset, int flags)
{ {
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
struct zvfs_open_file *of; struct zvfs_open_file *of = NULL;
if (ZVFS_IN_HOOK() || !(of = get_of(fd))) { if (ZVFS_IN_HOOK() || !(of = get_of(fd))) {
ssize_t r = real_pwritev2 ssize_t r = real_pwritev2
? real_pwritev2(fd, iov, iovcnt, offset, flags) ? real_pwritev2(fd, iov, iovcnt, offset, flags)
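Review note on `zvfs_fread_impl` above: the size arithmetic is the part most worth isolating. The sketch below shows the overflow guard for `size * nmemb` and the conversion of a byte count into complete items, under the assumption that the helper names `fread_total_bytes` and `fread_items_done` are free to invent (they do not appear in the commit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Returns 0 and stores size * nmemb in *total, or -1 if the product overflows. */
static int fread_total_bytes(size_t size, size_t nmemb, size_t *total)
{
    assert(total != NULL);
    if (size == 0 || nmemb == 0) {
        *total = 0;                  /* nothing to read; fread returns 0 */
        return 0;
    }
    if (nmemb > SIZE_MAX / size)     /* division avoids overflowing the check itself */
        return -1;
    *total = size * nmemb;
    return 0;
}

/* Converts a byte count from a pread-style call into complete items. */
static size_t fread_items_done(ssize_t nbytes, size_t size)
{
    if (nbytes <= 0 || size == 0)    /* error or EOF both yield zero items */
        return 0;
    return (size_t)nbytes / size;    /* a trailing partial item is not counted */
}
```

A short read of 10 bytes with 4-byte items therefore reports 2 items, which matches the `(size_t)n / size` return in the hook.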


@@ -4,6 +4,7 @@
#include <sys/types.h> #include <sys/types.h>
#include <sys/uio.h> #include <sys/uio.h>
#include <unistd.h> #include <unistd.h>
#include <stdio.h>
/* /*
* read / write family.
@@ -46,7 +47,16 @@ ssize_t pwritev2(int fd, const struct iovec *iov, int iovcnt, off_t offset,
/* glibc aliases */ /* glibc aliases */
ssize_t __read(int fd, void *buf, size_t count); ssize_t __read(int fd, void *buf, size_t count);
ssize_t __libc_read(int fd, void *buf, size_t count); ssize_t __libc_read(int fd, void *buf, size_t count);
ssize_t __read_nocancel(int fd, void *buf, size_t count);
ssize_t __pread64(int fd, void *buf, size_t count, off_t offset);
ssize_t __libc_pread(int fd, void *buf, size_t count, off_t offset);
ssize_t __pread64_nocancel(int fd, void *buf, size_t count, off_t offset);
ssize_t __read_chk(int fd, void *buf, size_t count, size_t buflen);
ssize_t __pread_chk(int fd, void *buf, size_t count, off_t offset, size_t buflen);
ssize_t __pread64_chk(int fd, void *buf, size_t count, off_t offset, size_t buflen);
ssize_t __write(int fd, const void *buf, size_t count); ssize_t __write(int fd, const void *buf, size_t count);
ssize_t __libc_write(int fd, const void *buf, size_t count); ssize_t __libc_write(int fd, const void *buf, size_t count);
size_t fread_unlocked(void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
#endif // __ZVFS_HOOK_RW_H__ #endif // __ZVFS_HOOK_RW_H__


@@ -106,7 +106,7 @@ stat(const char *path, struct stat *buf)
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
if (ZVFS_IN_HOOK() || !zvfs_is_zvfs_path(path)) { if (ZVFS_IN_HOOK() || !zvfs_is_zvfs_path(path)) {
int r = real_stat(path, buf); int r = zvfs_real_stat(path, buf);
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
return r; return r;
} }
@@ -114,7 +114,7 @@ stat(const char *path, struct stat *buf)
zvfs_ensure_init(); zvfs_ensure_init();
/* Pass through first to get the full stat (mode, ino, dev, nlink, etc.) */ /* Pass through first to get the full stat (mode, ino, dev, nlink, etc.) */
if (real_stat(path, buf) < 0) { if (zvfs_real_stat(path, buf) < 0) {
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
return -1; return -1;
} }
@@ -138,14 +138,14 @@ stat64(const char *path, struct stat64 *buf)
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
if (ZVFS_IN_HOOK() || !zvfs_is_zvfs_path(path)) { if (ZVFS_IN_HOOK() || !zvfs_is_zvfs_path(path)) {
int r = real_stat64(path, buf); int r = zvfs_real_stat64(path, buf);
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
return r; return r;
} }
zvfs_ensure_init(); zvfs_ensure_init();
if (real_stat64(path, buf) < 0) { if (zvfs_real_stat64(path, buf) < 0) {
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
return -1; return -1;
} }
@@ -168,7 +168,7 @@ fstat(int fd, struct stat *buf)
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
/* Pass through first to get mode/ino/dev/nlink/blksize, etc. */ /* Pass through first to get mode/ino/dev/nlink/blksize, etc. */
if (real_fstat(fd, buf) < 0) { if (zvfs_real_fstat(fd, buf) < 0) {
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
return -1; return -1;
} }
@@ -196,7 +196,7 @@ fstat64(int fd, struct stat64 *buf)
{ {
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
if (real_fstat64(fd, buf) < 0) { if (zvfs_real_fstat64(fd, buf) < 0) {
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
return -1; return -1;
} }
@@ -229,14 +229,14 @@ lstat(const char *path, struct stat *buf)
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
if (ZVFS_IN_HOOK() || !zvfs_is_zvfs_path(path)) { if (ZVFS_IN_HOOK() || !zvfs_is_zvfs_path(path)) {
int r = real_lstat(path, buf); int r = zvfs_real_lstat(path, buf);
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
return r; return r;
} }
zvfs_ensure_init(); zvfs_ensure_init();
if (real_lstat(path, buf) < 0) { if (zvfs_real_lstat(path, buf) < 0) {
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
return -1; return -1;
} }
@@ -255,14 +255,14 @@ lstat64(const char *path, struct stat64 *buf)
ZVFS_HOOK_ENTER(); ZVFS_HOOK_ENTER();
if (ZVFS_IN_HOOK() || !zvfs_is_zvfs_path(path)) { if (ZVFS_IN_HOOK() || !zvfs_is_zvfs_path(path)) {
int r = real_lstat64(path, buf); int r = zvfs_real_lstat64(path, buf);
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
return r; return r;
} }
zvfs_ensure_init(); zvfs_ensure_init();
if (real_lstat64(path, buf) < 0) { if (zvfs_real_lstat64(path, buf) < 0) {
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
return -1; return -1;
} }
@@ -292,7 +292,7 @@ fstatat(int dirfd, const char *path, struct stat *buf, int flags)
is_zvfs = zvfs_is_zvfs_path(abspath); is_zvfs = zvfs_is_zvfs_path(abspath);
} }
if (real_fstatat(dirfd, path, buf, flags) < 0) { if (zvfs_real_fstatat(dirfd, path, buf, flags) < 0) {
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
return -1; return -1;
} }
@@ -321,7 +321,7 @@ fstatat64(int dirfd, const char *path, struct stat64 *buf, int flags)
is_zvfs = zvfs_is_zvfs_path(abspath); is_zvfs = zvfs_is_zvfs_path(abspath);
} }
if (real_fstatat64(dirfd, path, buf, flags) < 0) { if (zvfs_real_fstatat64(dirfd, path, buf, flags) < 0) {
ZVFS_HOOK_LEAVE(); ZVFS_HOOK_LEAVE();
return -1; return -1;
} }


@@ -12,13 +12,14 @@
#include <errno.h> #include <errno.h>
#include <pthread.h> #include <pthread.h>
#include <string.h> #include <string.h>
#include <time.h>
struct zvfs_spdk_io_engine g_engine = {0}; struct zvfs_spdk_io_engine g_engine = {0};
static int g_engine_init_rc = -EAGAIN; static int g_engine_init_rc = -EAGAIN;
static pthread_mutex_t g_super_blob_mutex = PTHREAD_MUTEX_INITIALIZER;
static spdk_blob_id g_super_blob_id_cache = SPDK_BLOBID_INVALID;
static __thread struct zvfs_tls_ctx tls = {0}; static __thread struct zvfs_tls_ctx tls = {0};
static pthread_once_t g_tls_cleanup_once = PTHREAD_ONCE_INIT;
static pthread_key_t g_tls_cleanup_key;
// Context for the initialization operation // Context for the initialization operation
struct json_load_ctx { struct json_load_ctx {
@@ -54,9 +55,6 @@ struct md_op_ctx {
struct { // for delete struct { // for delete
spdk_blob_id blob_id; spdk_blob_id blob_id;
} delete; } delete;
struct { // for get/set super
spdk_blob_id blob_id;
} super;
}; };
char *op_name; char *op_name;
}; };
@@ -67,9 +65,37 @@ struct io_completion_ctx {
int rc; int rc;
}; };
struct md_poller_bootstrap_ctx {
const char *bdev_name;
pthread_mutex_t mu;
pthread_cond_t cv;
bool done;
int rc;
};
static uint64_t now_mono_ms(void);
static int open_bdev_and_init_bs(const char *bdev_name);
static void ensure_tls_cleanup_key(void);
static void tls_cleanup_destructor(void *arg);
// metadata poller thread function // metadata poller thread function
static void *md_poller_fn(void *arg) { static void *md_poller_fn(void *arg) {
struct md_poller_bootstrap_ctx *boot = arg;
spdk_set_thread(g_engine.md_thread); spdk_set_thread(g_engine.md_thread);
tls.thread = g_engine.md_thread;
int init_rc = open_bdev_and_init_bs(boot->bdev_name);
pthread_mutex_lock(&boot->mu);
boot->rc = init_rc;
boot->done = true;
pthread_cond_signal(&boot->cv);
pthread_mutex_unlock(&boot->mu);
if (init_rc != 0) {
return NULL;
}
while (true) { while (true) {
spdk_thread_poll(g_engine.md_thread, 0, 0); spdk_thread_poll(g_engine.md_thread, 0, 0);
usleep(1000); usleep(1000);
@@ -77,14 +103,21 @@ static void *md_poller_fn(void *arg) {
return NULL; return NULL;
} }
static uint64_t now_mono_ms(void) {
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return (uint64_t)ts.tv_sec * 1000ULL + (uint64_t)ts.tv_nsec / 1000000ULL;
}
// Forward declarations // Forward declarations
static struct spdk_io_channel *get_current_channel(void); static struct spdk_io_channel *get_current_channel(void);
static int dispatch_md_op(struct md_op_ctx *ctx); static int dispatch_md_op(struct md_op_ctx *ctx);
static int dispatch_md_op_quiet(struct md_op_ctx *ctx);
static void md_op_cb(void *arg); static void md_op_cb(void *arg);
static int open_bdev_and_init_bs(const char *bdev_name);
static int load_json_config(void); static int load_json_config(void);
static int ensure_engine_ready(const char *op); static int ensure_engine_ready(const char *op);
static int ensure_current_spdk_thread(const char *op);
// callbacks // callbacks
static void json_app_load_done(int rc, void *arg); static void json_app_load_done(int rc, void *arg);
@@ -97,8 +130,6 @@ static void blob_sync_md_cb(void *arg, int rc);
static void blob_close_cb(void *arg, int rc); static void blob_close_cb(void *arg, int rc);
static void blob_delete_cb(void *arg, int rc); static void blob_delete_cb(void *arg, int rc);
static void io_completion_cb(void *arg, int rc); static void io_completion_cb(void *arg, int rc);
static void blob_get_super_cb(void *arg, spdk_blob_id blobid, int rc);
static void blob_set_super_cb(void *arg, int rc);
// op functions on metadata // op functions on metadata
static void blob_create_on_md(struct md_op_ctx *ctx); static void blob_create_on_md(struct md_op_ctx *ctx);
@@ -107,8 +138,6 @@ static void blob_resize_on_md(struct md_op_ctx *ctx);
static void blob_sync_md_on_md(struct md_op_ctx *ctx); static void blob_sync_md_on_md(struct md_op_ctx *ctx);
static void blob_close_on_md(struct md_op_ctx *ctx); static void blob_close_on_md(struct md_op_ctx *ctx);
static void blob_delete_on_md(struct md_op_ctx *ctx); static void blob_delete_on_md(struct md_op_ctx *ctx);
static void blob_get_super_on_md(struct md_op_ctx *ctx);
static void blob_set_super_on_md(struct md_op_ctx *ctx);
__attribute__((constructor)) static void preload_init(void) { __attribute__((constructor)) static void preload_init(void) {
const char *auto_init = getenv("ZVFS_AUTO_INIT"); const char *auto_init = getenv("ZVFS_AUTO_INIT");
@@ -116,7 +145,6 @@ __attribute__((constructor)) static void preload_init(void) {
return; return;
} }
printf("\n\n auto init \n\n");
const char *bdev_name = getenv("SPDK_BDEV_NAME") ? getenv("SPDK_BDEV_NAME") : ZVFS_BDEV; const char *bdev_name = getenv("SPDK_BDEV_NAME") ? getenv("SPDK_BDEV_NAME") : ZVFS_BDEV;
g_engine_init_rc = io_engine_init(bdev_name); g_engine_init_rc = io_engine_init(bdev_name);
if (g_engine_init_rc != 0) { if (g_engine_init_rc != 0) {
@@ -125,7 +153,7 @@ __attribute__((constructor)) static void preload_init(void) {
} }
static int wait_done(bool *done_ptr, int *rc_ptr, const char *op) { static int wait_done(bool *done_ptr, int *rc_ptr, const char *op) {
int iter = 0; const uint64_t deadline_ms = now_mono_ms() + ZVFS_WAIT_TIME;
while (!*done_ptr) { while (!*done_ptr) {
if (tls.thread) { if (tls.thread) {
spdk_thread_poll(tls.thread, 0, 0); spdk_thread_poll(tls.thread, 0, 0);
@@ -133,7 +161,8 @@ static int wait_done(bool *done_ptr, int *rc_ptr, const char *op) {
SPDK_ERRLOG("not init tls.thread\n"); SPDK_ERRLOG("not init tls.thread\n");
return -EBADE; return -EBADE;
} }
if (++iter > WAITER_MAX_TIME) {
if (now_mono_ms() >= deadline_ms) {
SPDK_ERRLOG("%s timeout\n", op); SPDK_ERRLOG("%s timeout\n", op);
return -ETIMEDOUT; return -ETIMEDOUT;
} }
@@ -147,15 +176,24 @@ static int wait_done(bool *done_ptr, int *rc_ptr, const char *op) {
} }
static int wait_done_volatile(volatile bool *done_ptr, int *rc_ptr, const char *op) { static int wait_done_volatile(volatile bool *done_ptr, int *rc_ptr, const char *op) {
int iter = 0; const uint64_t deadline_ms = now_mono_ms() + ZVFS_WAIT_TIME;
while (!*done_ptr) { bool logged_no_tls = false;
while (!__atomic_load_n(done_ptr, __ATOMIC_ACQUIRE)) {
if (tls.thread) { if (tls.thread) {
spdk_thread_poll(tls.thread, 0, 0); spdk_thread_poll(tls.thread, 0, 0);
}else{ } else {
SPDK_ERRLOG("not init tls.thread\n"); /*
return -EBADE; * md ops are executed on g_engine.md_thread by md_poller_fn.
* If current worker TLS is not initialized, we still need to wait
* for callback completion; returning early can invalidate stack ctx.
*/
if (!logged_no_tls) {
SPDK_NOTICELOG("%s: tls.thread not initialized, waiting on md thread only\n", op);
logged_no_tls = true;
}
usleep(1000);
} }
if (++iter > WAITER_MAX_TIME) { if (now_mono_ms() >= deadline_ms) {
SPDK_ERRLOG("%s timeout\n", op); SPDK_ERRLOG("%s timeout\n", op);
return -ETIMEDOUT; return -ETIMEDOUT;
} }
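Review note on the `wait_done` / `wait_done_volatile` change above: replacing the iteration counter with a `CLOCK_MONOTONIC` deadline makes the timeout independent of how long each poll takes, and the acquire load pairs with the release store in the completion callbacks. A minimal standalone sketch of that wait loop follows. `mono_ms` and `wait_flag` are hypothetical names, the 1 ms nap is an assumption, and the sketch sleeps where the real code would also call `spdk_thread_poll`. The `__atomic_*` builtins are GCC/Clang-specific, as in the commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds on the monotonic clock; unaffected by wall-clock adjustments. */
static uint64_t mono_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000ULL + (uint64_t)ts.tv_nsec / 1000000ULL;
}

/* Hypothetical helper: wait until *done is set or timeout_ms elapses. */
static int wait_flag(volatile bool *done, uint64_t timeout_ms)
{
    assert(done != NULL);
    const uint64_t deadline = mono_ms() + timeout_ms;
    const struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000000L }; /* 1 ms */

    /* Acquire load: once true is observed, the writer's earlier stores are visible. */
    while (!__atomic_load_n(done, __ATOMIC_ACQUIRE)) {
        if (mono_ms() >= deadline)
            return -ETIMEDOUT;
        nanosleep(&nap, NULL);
    }
    return 0;
}
```

The completing side would publish its result first and then set the flag with `__atomic_store_n(done, true, __ATOMIC_RELEASE)`, as the callbacks in this commit do.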
@@ -168,25 +206,6 @@ static int wait_done_volatile(volatile bool *done_ptr, int *rc_ptr, const char *
return 0; return 0;
} }
// no rc error
static int wait_done_volatile_quiet(volatile bool *done_ptr, int *rc_ptr, const char *op) {
int iter = 0;
while (!*done_ptr) {
if (tls.thread) {
spdk_thread_poll(tls.thread, 0, 0);
} else {
SPDK_ERRLOG("not init tls.thread\n");
return -EBADE;
}
if (++iter > WAITER_MAX_TIME) {
SPDK_ERRLOG("%s timeout\n", op);
return -ETIMEDOUT;
}
}
return *rc_ptr;
}
int io_engine_init(const char *bdev_name) { int io_engine_init(const char *bdev_name) {
if (g_engine_init_rc == 0 && g_engine.bs != NULL && g_engine.md_thread != NULL) { if (g_engine_init_rc == 0 && g_engine.bs != NULL && g_engine.md_thread != NULL) {
return 0; return 0;
@@ -239,22 +258,40 @@ int io_engine_init(const char *bdev_name) {
return g_engine_init_rc; return g_engine_init_rc;
} }
// Start a dedicated poller pthread for md_thread struct md_poller_bootstrap_ctx boot = {
.bdev_name = bdev_name,
.done = false,
.rc = 0,
};
pthread_mutex_init(&boot.mu, NULL);
pthread_cond_init(&boot.cv, NULL);
// Start a dedicated poller pthread for md_thread (bdev/blobstore initialization also runs on that thread)
pthread_t md_poller_tid; pthread_t md_poller_tid;
if (pthread_create(&md_poller_tid, NULL, md_poller_fn, NULL) != 0) { if (pthread_create(&md_poller_tid, NULL, md_poller_fn, &boot) != 0) {
SPDK_ERRLOG("pthread_create for md_poller failed\n"); SPDK_ERRLOG("pthread_create for md_poller failed\n");
pthread_cond_destroy(&boot.cv);
pthread_mutex_destroy(&boot.mu);
g_engine_init_rc = -1; g_engine_init_rc = -1;
return g_engine_init_rc; return g_engine_init_rc;
} }
if (pthread_detach(md_poller_tid) != 0) { if (pthread_detach(md_poller_tid) != 0) {
SPDK_ERRLOG("pthread_detach for md_poller failed\n"); SPDK_ERRLOG("pthread_detach for md_poller failed\n");
pthread_cond_destroy(&boot.cv);
pthread_mutex_destroy(&boot.mu);
g_engine_init_rc = -1; g_engine_init_rc = -1;
return g_engine_init_rc; return g_engine_init_rc;
} }
// init bdev/bs pthread_mutex_lock(&boot.mu);
g_super_blob_id_cache = SPDK_BLOBID_INVALID; while (!boot.done) {
int rc = open_bdev_and_init_bs(bdev_name); pthread_cond_wait(&boot.cv, &boot.mu);
}
int rc = boot.rc;
pthread_mutex_unlock(&boot.mu);
pthread_cond_destroy(&boot.cv);
pthread_mutex_destroy(&boot.mu);
if (rc != 0) { if (rc != 0) {
g_engine_init_rc = rc; g_engine_init_rc = rc;
return rc; return rc;
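Review note on the bootstrap handshake above: moving `open_bdev_and_init_bs` onto the poller thread means the creator has to block until that thread reports its result. The sketch below reduces the pattern to a mutex, a condition variable, and a `done` flag. All names (`struct bootstrap`, `worker_main`, `start_worker_and_wait`, `init_ok`, `init_fail`) are hypothetical. Unlike the commit, the sketch's worker returns right after signalling and is joined rather than detached, which keeps the example short and testable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct bootstrap {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool done;
    int rc;
    int (*init_fn)(void);            /* stand-in for the real initialization step */
};

static void *worker_main(void *arg)
{
    struct bootstrap *boot = arg;
    int rc = boot->init_fn();        /* initialization runs on the worker thread */

    pthread_mutex_lock(&boot->mu);
    boot->rc = rc;
    boot->done = true;
    pthread_cond_signal(&boot->cv);
    pthread_mutex_unlock(&boot->mu);
    /* boot lives on the creator's stack; do not touch it after this point. */
    return NULL;
}

static int start_worker_and_wait(int (*init_fn)(void))
{
    struct bootstrap boot = { .done = false, .rc = 0, .init_fn = init_fn };
    assert(init_fn != NULL);
    pthread_mutex_init(&boot.mu, NULL);
    pthread_cond_init(&boot.cv, NULL);

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker_main, &boot);
    if (rc == 0) {
        pthread_mutex_lock(&boot.mu);
        while (!boot.done)           /* loop guards against spurious wakeups */
            pthread_cond_wait(&boot.cv, &boot.mu);
        rc = boot.rc;
        pthread_mutex_unlock(&boot.mu);
        pthread_join(tid, NULL);
    } else {
        rc = -rc;                    /* pthread_create returns a positive errno */
    }
    pthread_cond_destroy(&boot.cv);
    pthread_mutex_destroy(&boot.mu);
    return rc;
}

static int init_ok(void)   { return 0; }
static int init_fail(void) { return -5; }
```

Setting `done` under the mutex and re-checking it in a loop is what makes the wait safe even if the worker signals before the creator starts waiting.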
@@ -283,19 +320,12 @@ static struct spdk_io_channel *get_current_channel(void) {
return NULL; return NULL;
} }
if (tls.thread) { if (ensure_current_spdk_thread("get_current_channel") != 0) {
spdk_thread_poll(tls.thread, 0, 0); return NULL;
} }
if (!tls.thread) { if (tls.thread) {
char name[32]; spdk_thread_poll(tls.thread, 0, 0);
snprintf(name, sizeof(name), "worker_%lu", pthread_self());
tls.thread = spdk_thread_create(name, NULL);
if (!tls.thread) {
SPDK_ERRLOG("spdk_thread_create failed\n");
return NULL;
}
spdk_set_thread(tls.thread);
} }
if (!tls.channel) { if (!tls.channel) {
@@ -308,33 +338,107 @@ static struct spdk_io_channel *get_current_channel(void) {
return tls.channel; return tls.channel;
} }
static void put_current_channel(struct spdk_io_channel *ch) {
if (!ch) {
return;
}
spdk_put_io_channel(ch);
if (tls.thread) {
spdk_thread_poll(tls.thread, 0, 0);
}
if (tls.channel == ch) {
tls.channel = NULL;
}
}
static void ensure_tls_cleanup_key(void) {
(void)pthread_key_create(&g_tls_cleanup_key, tls_cleanup_destructor);
}
static void tls_cleanup_destructor(void *arg) {
(void)arg;
if (!tls.thread || tls.thread == g_engine.md_thread) {
return;
}
spdk_set_thread(tls.thread);
if (tls.channel) {
spdk_put_io_channel(tls.channel);
tls.channel = NULL;
}
spdk_thread_exit(tls.thread);
const uint64_t deadline_ms = now_mono_ms() + ZVFS_WAIT_TIME;
while (!spdk_thread_is_exited(tls.thread)) {
spdk_thread_poll(tls.thread, 0, 0);
if (now_mono_ms() >= deadline_ms) {
SPDK_ERRLOG("worker tls thread exit timeout\n");
break;
}
usleep(1000);
}
if (spdk_thread_is_exited(tls.thread)) {
spdk_thread_destroy(tls.thread);
}
tls.thread = NULL;
pthread_setspecific(g_tls_cleanup_key, NULL);
}
static int ensure_current_spdk_thread(const char *op) {
pthread_once(&g_tls_cleanup_once, ensure_tls_cleanup_key);
if (!tls.thread) {
char name[32];
snprintf(name, sizeof(name), "worker_%lu", (unsigned long)pthread_self());
tls.thread = spdk_thread_create(name, NULL);
if (!tls.thread) {
SPDK_ERRLOG("%s: spdk_thread_create failed\n", op);
return -ENOMEM;
}
pthread_setspecific(g_tls_cleanup_key, (void *)1);
}
spdk_set_thread(tls.thread);
return 0;
}
// Generic md op dispatch // Generic md op dispatch
static int dispatch_md_op(struct md_op_ctx *ctx) { static int dispatch_md_op(struct md_op_ctx *ctx) {
int rc = ensure_engine_ready(ctx->op_name ? ctx->op_name : "dispatch_md_op"); int rc = ensure_engine_ready(ctx->op_name ? ctx->op_name : "dispatch_md_op");
if (rc != 0) { if (rc != 0) {
return rc; return rc;
} }
rc = ensure_current_spdk_thread(ctx->op_name ? ctx->op_name : "dispatch_md_op");
ctx->done = false;
ctx->rc = 0;
spdk_thread_send_msg(g_engine.md_thread, md_op_cb, ctx);
return wait_done_volatile(&ctx->done, &ctx->rc, ctx->op_name);
}
static int dispatch_md_op_quiet(struct md_op_ctx *ctx) {
int rc = ensure_engine_ready(ctx->op_name ? ctx->op_name : "dispatch_md_op_quiet");
if (rc != 0) { if (rc != 0) {
return rc; return rc;
} }
ctx->done = false; struct md_op_ctx *async_ctx = malloc(sizeof(*async_ctx));
ctx->rc = 0; if (!async_ctx) {
return -ENOMEM;
}
*async_ctx = *ctx;
__atomic_store_n(&async_ctx->done, false, __ATOMIC_RELAXED);
async_ctx->rc = 0;
spdk_thread_send_msg(g_engine.md_thread, md_op_cb, ctx); rc = spdk_thread_send_msg(g_engine.md_thread, md_op_cb, async_ctx);
return wait_done_volatile_quiet(&ctx->done, &ctx->rc, ctx->op_name); if (rc != 0) {
SPDK_ERRLOG("%s: spdk_thread_send_msg failed: %d\n", async_ctx->op_name, rc);
free(async_ctx);
return rc;
}
rc = wait_done_volatile(&async_ctx->done, &async_ctx->rc, async_ctx->op_name);
if (rc == -ETIMEDOUT) {
SPDK_ERRLOG("%s timeout; keep async ctx alive to avoid UAF\n", async_ctx->op_name);
return rc;
}
*ctx = *async_ctx;
free(async_ctx);
return rc;
} }
static int ensure_engine_ready(const char *op) { static int ensure_engine_ready(const char *op) {
@@ -438,111 +542,12 @@ static int open_bdev_and_init_bs(const char *bdev_name) {
return 0; return 0;
} }
static void blob_get_super_cb(void *arg, spdk_blob_id blobid, int rc) {
struct md_op_ctx *ctx = arg;
ctx->rc = rc;
ctx->super.blob_id = blobid;
ctx->done = true;
}
static void blob_set_super_cb(void *arg, int rc) {
struct md_op_ctx *ctx = arg;
ctx->rc = rc;
ctx->done = true;
}
static void blob_get_super_on_md(struct md_op_ctx *ctx) {
spdk_bs_get_super(g_engine.bs, blob_get_super_cb, ctx);
}
static void blob_set_super_on_md(struct md_op_ctx *ctx) {
spdk_bs_set_super(g_engine.bs, ctx->super.blob_id, blob_set_super_cb, ctx);
}
static int bs_get_super_id(spdk_blob_id *blob_id) {
struct md_op_ctx ctx = {
.fn = blob_get_super_on_md,
.op_name = "blob get super",
};
ctx.super.blob_id = SPDK_BLOBID_INVALID;
int rc = dispatch_md_op_quiet(&ctx);
if (rc != 0) {
return rc;
}
*blob_id = ctx.super.blob_id;
return 0;
}
static int bs_set_super_id(spdk_blob_id blob_id) {
struct md_op_ctx ctx = {
.fn = blob_set_super_on_md,
.op_name = "blob set super",
};
ctx.super.blob_id = blob_id;
return dispatch_md_op(&ctx);
}
struct zvfs_blob_handle *blob_get_super(void) {
pthread_mutex_lock(&g_super_blob_mutex);
if (g_super_blob_id_cache != SPDK_BLOBID_INVALID) {
struct zvfs_blob_handle *cached = blob_open(g_super_blob_id_cache);
if (cached) {
pthread_mutex_unlock(&g_super_blob_mutex);
return cached;
}
g_super_blob_id_cache = SPDK_BLOBID_INVALID;
}
spdk_blob_id super_id = SPDK_BLOBID_INVALID;
int rc = bs_get_super_id(&super_id);
if (rc == 0 && super_id != SPDK_BLOBID_INVALID) {
g_super_blob_id_cache = super_id;
struct zvfs_blob_handle *existing = blob_open(super_id);
if (!existing) {
g_super_blob_id_cache = SPDK_BLOBID_INVALID;
}
pthread_mutex_unlock(&g_super_blob_mutex);
return existing;
}
if (rc == 0 && super_id == SPDK_BLOBID_INVALID) {
rc = -ENOENT;
}
if (rc != -ENOENT) {
SPDK_ERRLOG("spdk_bs_get_super failed: %d\n", rc);
pthread_mutex_unlock(&g_super_blob_mutex);
return NULL;
}
struct zvfs_blob_handle *created = blob_create(0);
if (!created) {
pthread_mutex_unlock(&g_super_blob_mutex);
return NULL;
}
rc = bs_set_super_id(created->id);
if (rc != 0) {
spdk_blob_id created_id = created->id;
SPDK_ERRLOG("spdk_bs_set_super failed: %d\n", rc);
blob_close(created);
blob_delete(created_id);
pthread_mutex_unlock(&g_super_blob_mutex);
return NULL;
}
g_super_blob_id_cache = created->id;
pthread_mutex_unlock(&g_super_blob_mutex);
return created;
}
// blob_create // blob_create
static void blob_create_cb(void *arg, spdk_blob_id blobid, int rc) { static void blob_create_cb(void *arg, spdk_blob_id blobid, int rc) {
struct md_op_ctx *ctx = arg; struct md_op_ctx *ctx = arg;
ctx->rc = rc; ctx->rc = rc;
ctx->create.blob_id = blobid; ctx->create.blob_id = blobid;
ctx->done = true; __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE);
} }
static void blob_create_on_md(struct md_op_ctx *ctx) { static void blob_create_on_md(struct md_op_ctx *ctx) {
@@ -556,13 +561,17 @@ struct zvfs_blob_handle *blob_create(uint64_t size_hint) {
if(size_hint == 0) size_hint = g_engine.cluster_size; if(size_hint == 0) size_hint = g_engine.cluster_size;
struct md_op_ctx ctx = {.fn = blob_create_on_md, .create.size_hint = size_hint, .op_name = "blob create"}; struct md_op_ctx ctx = {.fn = blob_create_on_md, .create.size_hint = size_hint, .op_name = "blob create"};
int rc = dispatch_md_op(&ctx); int rc = dispatch_md_op(&ctx);
if (rc) return NULL; if (rc) {
errno = (rc < 0) ? -rc : EIO;
return NULL;
}
struct zvfs_blob_handle *handle = blob_open(ctx.create.blob_id); struct zvfs_blob_handle *handle = blob_open(ctx.create.blob_id);
if (handle && size_hint > 0) { if (handle && size_hint > 0) {
rc = blob_resize(handle, size_hint); // initial resize rc = blob_resize(handle, size_hint); // initial resize
if (rc != 0) { if (rc != 0) {
SPDK_ERRLOG("blob_resize failed after create: %d\n", rc); SPDK_ERRLOG("blob_resize failed after create: %d\n", rc);
errno = (rc < 0) ? -rc : EIO;
blob_close(handle); blob_close(handle);
return NULL; return NULL;
} }
@@ -570,6 +579,7 @@ struct zvfs_blob_handle *blob_create(uint64_t size_hint) {
rc = blob_sync_md(handle); rc = blob_sync_md(handle);
if (rc != 0) { if (rc != 0) {
SPDK_ERRLOG("blob_sync_md failed after resize: %d\n", rc); SPDK_ERRLOG("blob_sync_md failed after resize: %d\n", rc);
errno = (rc < 0) ? -rc : EIO;
blob_close(handle); blob_close(handle);
return NULL; return NULL;
} }
@@ -582,7 +592,7 @@ static void blob_open_cb(void *arg, struct spdk_blob *blob, int rc) {
struct md_op_ctx *ctx = arg; struct md_op_ctx *ctx = arg;
ctx->rc = rc; ctx->rc = rc;
ctx->open.blob = blob; ctx->open.blob = blob;
ctx->done = true; __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE);
} }
static void blob_open_on_md(struct md_op_ctx *ctx) { static void blob_open_on_md(struct md_op_ctx *ctx) {
@@ -594,7 +604,10 @@ static void blob_open_on_md(struct md_op_ctx *ctx) {
struct zvfs_blob_handle *blob_open(uint64_t blob_id) { struct zvfs_blob_handle *blob_open(uint64_t blob_id) {
struct md_op_ctx ctx = {.fn = blob_open_on_md, .open.blob_id = blob_id, .op_name = "blob open"}; struct md_op_ctx ctx = {.fn = blob_open_on_md, .open.blob_id = blob_id, .op_name = "blob open"};
int rc = dispatch_md_op(&ctx); int rc = dispatch_md_op(&ctx);
if (rc) return NULL; if (rc) {
errno = (rc < 0) ? -rc : EIO;
return NULL;
}
struct zvfs_blob_handle *handle = malloc(sizeof(*handle)); struct zvfs_blob_handle *handle = malloc(sizeof(*handle));
if (!handle) return NULL; if (!handle) return NULL;
@@ -628,15 +641,18 @@ int blob_write(struct zvfs_blob_handle *handle, uint64_t offset, const void *buf
spdk_thread_poll(tls.thread, 0, 0); spdk_thread_poll(tls.thread, 0, 0);
} }
if (len == 0) return 0;
struct spdk_io_channel *ch = get_current_channel(); struct spdk_io_channel *ch = get_current_channel();
if (!ch) return -1; if (!ch) return -1;
if (len == 0) return 0; int ret = 0;
// Bounds check // Bounds check
if (offset + len > handle->size) { if (offset + len > handle->size) {
SPDK_ERRLOG("blob_write out of range: offset=%lu len=%zu blob_size=%lu\n", SPDK_ERRLOG("blob_write out of range: offset=%lu len=%zu blob_size=%lu\n",
offset, len, handle->size); offset, len, handle->size);
return -ERANGE; ret = -ERANGE;
goto out;
} }
// Compute the aligned I/O range and the offset within dma_buf // Compute the aligned I/O range and the offset within dma_buf
@@ -646,13 +662,15 @@ int blob_write(struct zvfs_blob_handle *handle, uint64_t offset, const void *buf
int rc = zvfs_calc_io_units(offset, len, g_engine.io_unit_size, &lba_off, &lba_len, &buf_off); int rc = zvfs_calc_io_units(offset, len, g_engine.io_unit_size, &lba_off, &lba_len, &buf_off);
if (rc != 0) { if (rc != 0) {
SPDK_ERRLOG("blob_write calc_io_units failed: %d\n", rc); SPDK_ERRLOG("blob_write calc_io_units failed: %d\n", rc);
return rc; ret = rc;
goto out;
} }
size_t aligned_bytes = lba_len * g_engine.io_unit_size; size_t aligned_bytes = lba_len * g_engine.io_unit_size;
if (aligned_bytes > ZVFS_DMA_BUF_SIZE) { if (aligned_bytes > ZVFS_DMA_BUF_SIZE) {
SPDK_ERRLOG("blob_write aligned_bytes=%zu exceeds ZVFS_DMA_BUF_SIZE\n", aligned_bytes); SPDK_ERRLOG("blob_write aligned_bytes=%zu exceeds ZVFS_DMA_BUF_SIZE\n", aligned_bytes);
return -ENOSPC; ret = -ENOSPC;
goto out;
} }
struct io_completion_ctx io_ctx = {.done = false, .rc = 0}; struct io_completion_ctx io_ctx = {.done = false, .rc = 0};
@@ -662,7 +680,10 @@ int blob_write(struct zvfs_blob_handle *handle, uint64_t offset, const void *buf
rc = wait_done(&io_ctx.done, &io_ctx.rc, "io_write(read phase)"); rc = wait_done(&io_ctx.done, &io_ctx.rc, "io_write(read phase)");
if (rc != 0) return rc; if (rc != 0) {
ret = rc;
goto out;
}
memcpy((uint8_t *)handle->dma_buf + buf_off, buf, len); memcpy((uint8_t *)handle->dma_buf + buf_off, buf, len);
io_ctx.done = false; io_ctx.done = false;
@@ -671,9 +692,15 @@ int blob_write(struct zvfs_blob_handle *handle, uint64_t offset, const void *buf
spdk_blob_io_write(handle->blob, ch, handle->dma_buf, lba_off, lba_len, spdk_blob_io_write(handle->blob, ch, handle->dma_buf, lba_off, lba_len,
io_completion_cb, &io_ctx); io_completion_cb, &io_ctx);
rc = wait_done(&io_ctx.done, &io_ctx.rc, "io_write(write phase)"); rc = wait_done(&io_ctx.done, &io_ctx.rc, "io_write(write phase)");
if (rc != 0) return rc; if (rc != 0) {
ret = rc;
goto out;
}
return io_ctx.rc; ret = io_ctx.rc;
out:
put_current_channel(ch);
return ret;
} }
// blob_read: analogous to blob_write // blob_read: analogous to blob_write
@@ -682,15 +709,18 @@ int blob_read(struct zvfs_blob_handle *handle, uint64_t offset, void *buf, size_
spdk_thread_poll(tls.thread, 0, 0); spdk_thread_poll(tls.thread, 0, 0);
} }
if (len == 0) return 0;
struct spdk_io_channel *ch = get_current_channel(); struct spdk_io_channel *ch = get_current_channel();
if (!ch) return -1; if (!ch) return -1;
if (len == 0) return 0; int ret = 0;
// Bounds check // Bounds check
if (offset + len > handle->size) { if (offset + len > handle->size) {
SPDK_ERRLOG("blob_read out of range: offset=%lu len=%zu blob_size=%lu\n", SPDK_ERRLOG("blob_read out of range: offset=%lu len=%zu blob_size=%lu\n",
offset, len, handle->size); offset, len, handle->size);
return -ERANGE; ret = -ERANGE;
goto out;
} }
@@ -701,14 +731,16 @@ int blob_read(struct zvfs_blob_handle *handle, uint64_t offset, void *buf, size_
int rc = zvfs_calc_io_units(offset, len, g_engine.io_unit_size, &lba_off, &lba_len, &buf_off); int rc = zvfs_calc_io_units(offset, len, g_engine.io_unit_size, &lba_off, &lba_len, &buf_off);
if (rc != 0) { if (rc != 0) {
SPDK_ERRLOG("io_read offset/len not aligned to io_unit_size=%lu\n", g_engine.io_unit_size); SPDK_ERRLOG("io_read offset/len not aligned to io_unit_size=%lu\n", g_engine.io_unit_size);
return rc; ret = rc;
goto out;
} }
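// Read the aligned range into dma_buf, then copy from the correct offset into the user buf // Read the aligned range into dma_buf, then copy from the correct offset into the user buf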
size_t aligned_bytes = lba_len * g_engine.io_unit_size; size_t aligned_bytes = lba_len * g_engine.io_unit_size;
if (aligned_bytes > ZVFS_DMA_BUF_SIZE) { if (aligned_bytes > ZVFS_DMA_BUF_SIZE) {
SPDK_ERRLOG("blob_read aligned_bytes=%zu exceeds ZVFS_DMA_BUF_SIZE\n", aligned_bytes); SPDK_ERRLOG("blob_read aligned_bytes=%zu exceeds ZVFS_DMA_BUF_SIZE\n", aligned_bytes);
return -ENOSPC; ret = -ENOSPC;
goto out;
} }
struct io_completion_ctx io_ctx = {.done = false, .rc = 0}; struct io_completion_ctx io_ctx = {.done = false, .rc = 0};
@@ -717,17 +749,23 @@ int blob_read(struct zvfs_blob_handle *handle, uint64_t offset, void *buf, size_
io_completion_cb, &io_ctx); io_completion_cb, &io_ctx);
rc = wait_done(&io_ctx.done, &io_ctx.rc, "io_read"); rc = wait_done(&io_ctx.done, &io_ctx.rc, "io_read");
if (rc != 0) return rc; if (rc != 0) {
ret = rc;
goto out;
}
memcpy(buf, (uint8_t *)handle->dma_buf + buf_off, len); memcpy(buf, (uint8_t *)handle->dma_buf + buf_off, len);
return io_ctx.rc; ret = io_ctx.rc;
out:
put_current_channel(ch);
return ret;
} }
// blob_resize // blob_resize
static void blob_resize_cb(void *arg, int rc) { static void blob_resize_cb(void *arg, int rc) {
struct md_op_ctx *ctx = arg; struct md_op_ctx *ctx = arg;
ctx->rc = rc; ctx->rc = rc;
ctx->done = true; __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE);
} }
static void blob_resize_on_md(struct md_op_ctx *ctx) { static void blob_resize_on_md(struct md_op_ctx *ctx) {
@@ -736,7 +774,7 @@ static void blob_resize_on_md(struct md_op_ctx *ctx) {
int rc = zvfs_calc_ceil_units(ctx->handle_op.new_size, cluster_size, &new_clusters); int rc = zvfs_calc_ceil_units(ctx->handle_op.new_size, cluster_size, &new_clusters);
if (rc != 0) { if (rc != 0) {
ctx->rc = rc; ctx->rc = rc;
ctx->done = true; __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE);
return; return;
} }
spdk_blob_resize(ctx->handle_op.handle->blob, new_clusters, blob_resize_cb, ctx); spdk_blob_resize(ctx->handle_op.handle->blob, new_clusters, blob_resize_cb, ctx);
@@ -759,7 +797,7 @@ int blob_resize(struct zvfs_blob_handle *handle, uint64_t new_size) {
static void blob_sync_md_cb(void *arg, int rc) { static void blob_sync_md_cb(void *arg, int rc) {
struct md_op_ctx *ctx = arg; struct md_op_ctx *ctx = arg;
ctx->rc = rc; ctx->rc = rc;
ctx->done = true; __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE);
} }
static void blob_sync_md_on_md(struct md_op_ctx *ctx) { static void blob_sync_md_on_md(struct md_op_ctx *ctx) {
@@ -776,7 +814,7 @@ int blob_sync_md(struct zvfs_blob_handle *handle) {
static void blob_close_cb(void *arg, int rc) { static void blob_close_cb(void *arg, int rc) {
struct md_op_ctx *ctx = arg; struct md_op_ctx *ctx = arg;
ctx->rc = rc; ctx->rc = rc;
ctx->done = true; __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE);
} }
static void blob_close_on_md(struct md_op_ctx *ctx) { static void blob_close_on_md(struct md_op_ctx *ctx) {
@@ -798,7 +836,7 @@ int blob_close(struct zvfs_blob_handle *handle) {
static void blob_delete_cb(void *arg, int rc) { static void blob_delete_cb(void *arg, int rc) {
struct md_op_ctx *ctx = arg; struct md_op_ctx *ctx = arg;
ctx->rc = rc; ctx->rc = rc;
ctx->done = true; __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE);
} }
static void blob_delete_on_md(struct md_op_ctx *ctx) { static void blob_delete_on_md(struct md_op_ctx *ctx) {
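Every completion callback in the hunks above now publishes `ctx->done` with `__atomic_store_n(..., __ATOMIC_RELEASE)` instead of a plain store. The sketch below illustrates why that ordering matters, assuming the waiting side pairs it with an acquire load (the waiter, `wait_done`, is not shown in this diff, so that pairing is an assumption). `demo_ctx`, `demo_complete`, `demo_wait`, and `demo_run` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for md_op_ctx: a result plus a completion flag. */
struct demo_ctx {
    int rc;
    bool done;
};

/* Completion side: write the result first, then publish it with a release store. */
static void *demo_complete(void *arg) {
    struct demo_ctx *ctx = arg;
    ctx->rc = 42;
    __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE);
    return NULL;
}

/* Waiting side: spin on an acquire load; once it observes true, rc is visible. */
static int demo_wait(struct demo_ctx *ctx) {
    while (!__atomic_load_n(&ctx->done, __ATOMIC_ACQUIRE)) {
        /* a real poller would do useful work or yield here */
    }
    return ctx->rc;
}

/* Runs one completion on a helper thread and returns the observed result. */
static int demo_run(void) {
    struct demo_ctx ctx = {.rc = 0, .done = false};
    pthread_t tid;
    int err = pthread_create(&tid, NULL, demo_complete, &ctx);
    assert(err == 0);
    int rc = demo_wait(&ctx);
    pthread_join(tid, NULL);
    return rc;
}
```

With a plain `ctx->done = true`, the compiler or CPU would be free to reorder the flag store ahead of the `rc` store, so a waiter on another thread could see `done` set while still reading a stale `rc`.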


@@ -31,7 +31,6 @@ typedef struct zvfs_tls_ctx {
int io_engine_init(const char *bdev_name); int io_engine_init(const char *bdev_name);
struct zvfs_blob_handle *blob_get_super(void);
struct zvfs_blob_handle *blob_create(uint64_t size_hint); // create and open, returns a handle struct zvfs_blob_handle *blob_create(uint64_t size_hint); // create and open, returns a handle
struct zvfs_blob_handle *blob_open(uint64_t blob_id); // open an existing blob, returns a handle struct zvfs_blob_handle *blob_open(uint64_t blob_id); // open an existing blob, returns a handle
int blob_write(struct zvfs_blob_handle *handle, uint64_t offset, const void *buf, size_t len); int blob_write(struct zvfs_blob_handle *handle, uint64_t offset, const void *buf, size_t len);
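`blob_create` and `blob_open` return a pointer, so a failure can only be signalled as `NULL`; this commit additionally sets `errno` from the internal return code using `errno = (rc < 0) ? -rc : EIO`. The following is a minimal sketch of that convention, not the project's code: `demo_rc_to_errno` and `demo_open` are hypothetical helpers, and the integer `rc` parameter stands in for whatever `dispatch_md_op` would return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: map a negative-errno style return code to a positive
 * errno value, falling back to EIO for any other non-zero code. */
static int demo_rc_to_errno(int rc) {
    return (rc < 0) ? -rc : EIO;
}

/* Hypothetical wrapper mimicking a pointer-returning API: on failure it sets
 * errno and returns NULL, on success it returns the handle unchanged. */
static void *demo_open(int rc, void *handle) {
    if (rc != 0) {
        errno = demo_rc_to_errno(rc);
        return NULL;
    }
    return handle;
}
```

A caller in the hook layer can then treat a `NULL` handle like a failed libc call and pass `errno` straight back to the application.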


@@ -7,7 +7,7 @@
"method": "bdev_malloc_create", "method": "bdev_malloc_create",
"params": { "params": {
"name": "Malloc0", "name": "Malloc0",
"num_blocks": 32768, "num_blocks": 262140,
"block_size": 512 "block_size": 512
} }
} }
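The config change above grows `Malloc0` from 32768 to 262140 blocks at a 512-byte block size. As a quick check of what that means in bytes, here is a small sketch; `demo_bdev_bytes` is a hypothetical helper, not part of SPDK or ZVFS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: capacity of a block device in bytes. */
static uint64_t demo_bdev_bytes(uint64_t num_blocks, uint64_t block_size) {
    return num_blocks * block_size;
}
```

The old setting gives 32768 * 512 = 16777216 bytes (16 MiB); the new one gives 262140 * 512 = 134215680 bytes, which is 4 blocks (2048 bytes) short of 128 MiB.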