diff --git a/README.md b/README.md
index 89d28be..2153729 100755
--- a/README.md
+++ b/README.md
@@ -1,101 +1,126 @@
 # ZVFS
-ZVFS is a lightweight user-space file system prototype built on `SPDK Blobstore`.
-It intercepts common POSIX file APIs via `LD_PRELOAD` and turns file I/O under the `/zvfs` path into blob I/O.
+ZVFS is a user-space file system prototype built on SPDK Blobstore. Its goal is to redirect common POSIX file I/O onto a high-performance user-space storage path without modifying application code.
+The core idea is to reuse the Linux file-management machinery (namespaces/directories/metadata) while moving the file data plane into ZVFS.

-The goal is to let upper-layer applications keep using blocking file interfaces with minimal changes, while approaching SPDK's performance ceiling at low queue depth (QD≈1).
+- Hook mechanism: `LD_PRELOAD`
+- Mount prefix: `/zvfs`
+- Architecture: multi-process clients + standalone daemon + SPDK
+- Semantics: synchronous blocking (request-response)

-## 1. Project Layout
+---
+
+## 1. Project Positioning
+
+The point of this project is not just "getting I/O to run", but tying the following engineering problems together:
+
+1. Transparent interception inside multi-threaded/multi-process applications (RocksDB / PostgreSQL).
+2. Preserving POSIX semantics (open/close/dup/fork/append/sync, etc.).
+3. Centralizing SPDK resources in the daemon, avoiding per-process re-initialization.
+4. Completing the protocol, concurrency, and error handling under synchronous blocking semantics.
+
+---
+
+## 2. Architecture
+
+![](zvfs架构图.excalidraw.svg)

```text
-zvfs/
-├── src/
-│   ├── hook/          # POSIX API hook layer (open/read/write/...)
-│   ├── fs/            # inode/path/fd runtime metadata management
-│   ├── spdk_engine/   # SPDK Blobstore wrapper
-│   ├── common/        # alignment and buffer helper functions
-│   ├── config.h       # defaults (JSON, bdev, xattr key, etc.)
-│   └── Makefile       # builds libzvfs.so
-├── tests/
-│   ├── hook/          # hook API semantics tests
-│   ├── ioengine_test/ # blob engine unit tests
-│   └── Makefile
-├── scripts/           # helper scripts for db_bench/hook tests
-├── spdk/              # SPDK submodule
-└── README.md
+App (PostgreSQL / RocksDB / db_bench / pgbench)
+  -> LD_PRELOAD libzvfs.so
+  -> Hook Client (POSIX interception + local state)
+  -> Unix Domain Socket IPC (sync/blocking)
+  -> zvfs_daemon
+  -> protocol deserialization + dispatch
+  -> metadata thread + io threads
+  -> SPDK Blobstore / bdev
```

-## 2. Core Architecture
+### 2.1 Passthrough Strategy

-### 2.1 Layering
+**The control plane reuses Linux; the data plane goes through ZVFS.**

-Current implementation:
+- Control plane (handled by Linux)
+  - Directory/namespace management.
+  - File-node lifecycle and permission semantics (create/open/close/stat/rename/unlink, etc.).
+  - These operations also issue real syscalls under `/zvfs`; ZVFS does not reimplement directory-tree management.

-```text
-App (open/read/write/fstat/...)
- -> LD_PRELOAD Hook (src/hook)
- -> ZVFS Runtime Metadata (src/fs)
- -> SPDK Engine (src/spdk_engine)
- -> SPDK Blobstore
- -> bdev (Malloc/NVMe)
-```
+- Data plane (handled by ZVFS)
+  - File contents are carried by blobs.
+  - The real data path of `read/write` bypasses the Linux file data plane and goes through ZVFS IPC + SPDK.

-Target architecture (daemon + IPC):
+- Key binding mechanism
+  - `create`: create the real Linux file + create a blob in ZVFS + store the `blob_id` in the file's xattr.
+  - `open`: really `open` the Linux file + read the xattr to get the `blob_id` + open the blob in ZVFS.
+  - `write`: after the blob write succeeds, use `ftruncate` to keep the Linux-side `st_size` in sync.

-```text
-App (multi-process, e.g. PostgreSQL)
- -> LD_PRELOAD Hook Client
- -> IPC (Unix Domain Socket)
- -> zvfs daemon
- -> metadata manager
- -> SPDK worker threads
- -> SPDK Blobstore / bdev
-```
+- Engineering payoff
+  - Cuts the implementation effort roughly in half.
+  - Better compatibility: databases can keep their existing file organization unchanged.

-### 2.2 Target Architecture, Short Version (hook layer + daemon layer)
+### 2.2 Layer Responsibilities

-- `Hook layer`
-  - Intercepts POSIX APIs on `/zvfs` paths and issues synchronous IPC requests.
-  - Maintains minimal local state (e.g. `fd -> remote_handle_id`).
-  - Passes non-`/zvfs` paths through to the `real_*` syscalls (POSIX passthrough).
-- `Daemon layer`
-  - Owns the SPDK resources exclusively (`spdk_env/blobstore/spdk_thread`).
-  - Centralizes metadata and concurrency control (path/inode/handle).
-  - Receives IPC requests, performs the actual I/O, and returns POSIX-style results with errno.
+- Client (`src/hook` + `src/spdk_engine/io_engine.c`)
+  - Decides whether a path is under `/zvfs`.
+  - Intercepts POSIX APIs and issues synchronous IPC.
+  - Maintains minimal local state (`fd_table/path_cache/inode_table`).

-### 2.3 Metadata and Data Mapping
+- Daemon (`src/daemon`)
+  - Owns the SPDK environment and threads exclusively.
+  - Centrally executes blob create/open/read/write/resize/sync/delete.
+  - Centrally manages handle/ref_count.

-- File data: stored in SPDK blobs.
-- File-to-blob mapping: written into the real file's `xattr` (key: `user.zvfs.blob_id`).
-- Three runtime tables:
-  - `inode_table`: `blob_id -> inode`
-  - `path_cache`: `path -> inode`
-  - `fd_table`: `fd -> open_file`
+- Protocol layer (`src/proto/ipc_proto.*`)
+  - Common header + per-op body.
+  - Request header: `opcode + payload_len`
+  - Response header: `opcode + status + payload_len`

-### 2.4 I/O Path Highlights of the Current Implementation
+### 2.3 Why Synchronous Blocking IPC

-- `blob_read/blob_write` always go through DMA buffers aligned to `io_unit_size`.
-- Unaligned writes trigger read-modify-write (RMW): read the aligned block, patch it, write it back.
-- `readv/writev` are coalesced in the hook layer to reduce the number of I/O submissions.
-- `fsync/fdatasync` call `blob_sync_md` for zvfs fds; `sync_file_range` on zvfs
paths returns success immediately.
+- Lowest compatibility cost on the application side; easiest way to match POSIX semantics.
+- More direct debugging (one request maps to one response).
+- Get correctness and semantic completeness right first; consider going asynchronous later.

-## 3. Build
+---

-> The commands below assume the repository root is `/home/lian/try/zvfs`.
+## 3. Feature Coverage (Current)

-### 3.1 Initialize and Build SPDK
+### 3.1 Core Interfaces Taken Over
+
+- Control-plane cooperation: `open/openat/creat/rename/unlink/...` (real syscall + ZVFS metadata coordination)
+- Data-plane takeover: `read/write/pread/pwrite/readv/writev/pwritev`
+- Metadata: `fstat/lseek/ftruncate/fallocate`
+- Sync: `fsync/fdatasync/sync_file_range`
+- FD semantics: `dup/dup2/dup3/fork/close_range`
+
+### 3.2 Semantic Highlights
+
+- `write` uses `AUTO_GROW` by default.
+- Out-of-bounds writes without `AUTO_GROW` return `ENOSPC`.
+- `O_APPEND` semantics are guaranteed by the inode `logical_size`.
+- After a successful `write`, the Linux file size is updated (`ftruncate`) so the `stat` view stays consistent.
+- `mmap` currently returns `ENOTSUP` for zvfs fds (non-zvfs fds pass through).
+
+### 3.3 Mapping
+
+- File data lives in SPDK blobs.
+- The file-to-blob mapping goes through the xattr `user.zvfs.blob_id`.
+
+---
+
+## 4. Build and Run
+
+### 4.1 Build

```bash
+cd /home/lian/try/zvfs
 git submodule update --init --recursive
+
 cd spdk
 ./scripts/pkgdep.sh
 ./configure --with-shared
 make -j"$(nproc)"
-```

-### 3.2 Build ZVFS and the Tests
-
-```bash
 cd /home/lian/try/zvfs
 make -j"$(nproc)"
 make test -j"$(nproc)"
@@ -104,115 +129,158 @@ make test -j"$(nproc)"

Artifacts:

- `src/libzvfs.so`
-- `tests/bin/hook_api_test`
-- `tests/bin/ioengine_single_blob_test`
-- `tests/bin/ioengine_multi_blob_test`
-- `tests/bin/ioengine_same_blob_mt_test`
+- `src/daemon/zvfs_daemon`
+- `tests/bin/*`

-## 4. Run and Verify
+### 4.2 Start the Daemon

-### 4.1 Hook API Semantics Tests
+```bash
+cd /home/lian/try/zvfs
+./src/daemon/zvfs_daemon
+```
+
+Optional environment variables:
+
+- `SPDK_BDEV_NAME`
+- `SPDK_JSON_CONFIG`
+- `ZVFS_SOCKET_PATH` / `ZVFS_IPC_SOCKET_PATH`
+
+### 4.3 Quick Verification

```bash
 mkdir -p /zvfs
-cd /home/lian/try/zvfs
-LD_PRELOAD=$PWD/src/libzvfs.so ZVFS_TEST_ROOT=/zvfs ./tests/bin/hook_api_test
+LD_PRELOAD=./src/libzvfs.so ZVFS_TEST_ROOT=/zvfs ./tests/bin/hook_api_test
+./tests/bin/ipc_zvfs_test
```

-Coverage includes:
+---

-- `open/openat/rename/unlink`
-- `read/write/pread/pwrite/readv/writev/pwritev`
-- `fstat/lseek/ftruncate`
-- `fcntl/ioctl(FIONREAD)`
-- `fsync/fdatasync`
+## 5.
Performance Testing

-### 4.2 SPDK Engine Tests
+### 5.1 Test Goals

-```bash
-cd /home/lian/try/zvfs
-SPDK_BDEV_NAME=Malloc0 ./tests/bin/ioengine_single_blob_test
-SPDK_BDEV_NAME=Malloc0 ./tests/bin/ioengine_multi_blob_test
-SPDK_BDEV_NAME=Malloc0 ./tests/bin/ioengine_same_blob_mt_test
-```
+- Target scenario: blocking I/O performance at low queue depth.
+- Baselines: `spdk_nvme_perf` and the kernel path (`O_DIRECT`).

-## 5. Key Environment Variables
+### 5.2 Tools and Scripts

-- `SPDK_BDEV_NAME`: selects the backend bdev (default `Malloc0`).
-- `ZVFS_BDEV`: bdev name used by `zvfs_ensure_init` (falls back to the `config.h` default when unset).
-- `SPDK_JSON_CONFIG`: overrides the default SPDK JSON config path.
+- RocksDB: `scripts/run_db_bench_zvfs.sh`
+- PostgreSQL: `codex/run_pgbench_no_mmap.sh`

-## 6. Performance Notes (Trends Only)
+Recommendation:

-The historical benchmark numbers in this `README` come from an older version and must not be read as conclusions about the current version; they are useful only as design-trend references:
+- When testing PostgreSQL, disable the mmap path (switch shared memory to sysv to keep mmap out of the way).

-- The target workload is blocking APIs, approximately `QD=1`.
-- On the old data, ZVFS reached about `90%~95%` of `spdk_nvme_perf` at `QD=1`.
-  - 4K: about `95 MiB/s` vs `100 MiB/s`
-  - 128K: about `1662 MiB/s` vs `1843 MiB/s`
-- Against the same machine's `O_DIRECT` path, the old data showed roughly `2.2x~2.3x` higher write bandwidth.
-- Unaligned writes incur RMW and throughput drops noticeably (in the old data, often close to half of aligned-write throughput).
+### 5.3 Historical Results

-If the numbers are needed for external reporting, re-run the benchmarks on the current commit with a pinned hardware environment.
+> The results below are from a historical version and are kept to illustrate the design direction.

-## 7. Current Limitations
+- At QD=1, ZVFS reaches roughly `90%~95%` of `spdk_nvme_perf`.
+- Against `O_DIRECT` on the same machine, sequential-write throughput improves by roughly `2.2x~2.3x`.
+- Unaligned writes suffer a clear throughput drop due to RMW overhead.

-- Only `/zvfs` paths are intercepted.
-- `mmap` currently returns `ENOTSUP` for zvfs fds (upper layers should disable mmap reads/writes).
-- `dup/dup2/dup3` currently return `ENOTSUP` for zvfs fds.
-- `rename` across `/zvfs` and non-`/zvfs` paths returns `EXDEV`.
-- `fallocate(FALLOC_FL_PUNCH_HOLE)` is not implemented.
+---

-## 8. Next Steps
+## 6. Key Engineering Problems and Postmortems (Highlights)

-- Complete the mmap path (mmap table + dirty-page writeback).
-- Flesh out semantics and benchmark baselines under multi-threading/high concurrency.
-- Add versioned benchmark reports to keep historical data in the README from going stale.
+This section is the most valuable part of the project: it records the key problems hit on the way from "it runs" to "usable under database workloads".

-## 9.
Blobstore Lessons Learned the Hard Way

+### 6.1 SPDK Metadata Callback Threading Model

-### Owner-Thread Binding
+Problem: dispatching metadata operations to arbitrary threads easily hangs, or the callbacks never come back.

-Blobstore handles concurrency control internally by executing all metadata operations on a single thread; callbacks are pinned to the thread that created the blobstore. So in a multi-threaded model, the thread you send from is not necessarily the one that can poll the callback.
+Root cause:

-Correct architecture:
-```
-metadata thread
-  spdk_bs_load()
-  resize
-  delete
-  sync_md
+- Blobstore metadata operations are bound to the creating thread/channel.
+- `resize/delete/unload` internally go through the `spdk_for_each_channel()` barrier.

-worker thread
-  blob_io_read
-  blob_io_write
-```
+Fix strategy:

-### The spdk_for_each_channel() Barrier
-Some metadata operations are very slow:
-```
-resize
-delete
-unload
-snapshot
-```
-Internally these call: spdk_for_each_channel()
+- Clearly split work between the metadata thread and the io threads.
-Semantics: run the callback on every thread that owns an io_channel.
+- Make sure every thread that holds a channel keeps polling.
-Roughly:
-```c
-for each channel:
-    send_msg(channel->thread)
-```
+- Release channels strictly on thread exit, so a barrier never waits forever.

-#### Problem 1: a channel-holding thread does not poll
-If the owning thread does not poll, everything hangs.
-#### Problem 2: a thread exits without releasing its channel
-Hangs forever.
+### 6.2 Daemon Hangs (Request Received, Processing Stalls)

-### I/O callbacks behave differently from metadata callbacks
-The callbacks of spdk_blob_io_read / spdk_blob_io_write are posted via the io_channel passed in, so they return to the thread that allocated that channel.
+Symptom: request logs stop halfway through, and the benchmark process blocks.

-### Timeout tasks
-Once you add timeouts, the callback can still fire after the timeout has already been handled, which is a UAF risk.
+Root cause:
+
+- The UDS stream reads had no complete message framing.
+- A fixed small buffer made response serialization fail (`serialize resp failed`).
+
+Fix:
+
+- Switched to a per-connection receive buffer, reading in a loop until `EAGAIN`.
+- Consume complete messages only; keep partial messages for the next round.
+- Serialize responses into a dynamic buffer and send via `send_all`.
+
+### 6.3 PostgreSQL Tablespaces Miss the Hook
+
+Symptom: after creating a tablespace, file operations go through `pg_tblspc/...` paths and the daemon logs no requests.
+
+Root cause:
+
+- PostgreSQL accesses tablespaces through symlinks.
+- Matching only the string prefix `/zvfs` misses them.
+
+Fix:
+
+- Path classification now applies `realpath()` before the prefix check.
+- For `O_CREAT` when the file does not exist yet, classify via `realpath(parent)+basename`.
+
+### 6.4 PostgreSQL `Permission denied` (Cross-User Daemon Connection)
+
+Symptom: `CREATE DATABASE ...
TABLESPACE ...` fails with a permission error.

+Root cause:
+
+- The daemon is started as root, and the UDS file permissions are subject to umask.
+- The postgres user cannot `connect(/tmp/zvfs.sock)`.
+
+Fix:
+
+- After `bind`, the daemon explicitly does `chmod(socket, 0666)`.
+
+### 6.5 PostgreSQL `Message too long`
+
+Symptom: some SQL (especially the `CREATE DATABASE` path) fails with `Message too long`.
+
+Root cause:
+
+- It is not a daemon parsing failure: the client exceeds `ZVFS_IPC_BUF_SIZE` while serializing the request.
+- The current hook coalesces `writev` into one large write request, which easily hits the limit.
+
+Current handling:
+
+- Raised `ZVFS_IPC_BUF_SIZE` to `16MB` (`src/common/config.h`).
+
+Future optimization:
+
+- Do transparent chunked sends in the client's `blob_write_ex` (keeping the synchronous blocking semantics).
+
+### 6.6 dup/dup2/fork Semantic Consistency
+
+Problem: when multiple fds point at the same open file description, how do we keep the handle reference count consistent?
+
+Approach:
+
+- Added `ADD_REF` / `ADD_REF_BATCH` to the protocol.
+- The hook explicitly adds references on `dup/dup2/dup3/fork`.
+- Added bounds protection to `close_range` (avoiding an infinite loop in the `UINT_MAX` case).
+
+---
+
+## 7. Current Limitations and Next Steps
+
+### 7.1 Current Limitations
+
+- A single request is still bounded by `ZVFS_IPC_BUF_SIZE`.
+- `mmap` is not yet supported for zvfs fds.
+- `ADD_REF_BATCH` currently prioritizes functionality and does not guarantee atomicity.
+
+### 7.2 Next Steps
+
+1. Implement transparent client-side chunking for `WRITE` to fully remove the single-message cap.
+2. Keep hardening the PostgreSQL scenario (tablespace + pgbench + crash/restart).
+3. Redo a more systematic performance re-test (fixed hardware, fixed parameters, full report).
diff --git a/postgresql.conf b/postgresql.conf
new file mode 100644
index 0000000..115c28b
--- /dev/null
+++ b/postgresql.conf
@@ -0,0 +1,751 @@
+# -----------------------------
+# PostgreSQL configuration file
+# -----------------------------
+#
+# This file consists of lines of the form:
+#
+#   name = value
+#
+# (The "=" is optional.)  Whitespace may be used.  Comments are introduced with
+# "#" anywhere on a line.  The complete list of parameter names and allowed
+# values can be found in the PostgreSQL documentation.
+#
+# The commented-out settings shown in this file represent the default values.
+# Re-commenting a setting is NOT sufficient to revert it to the default value;
+# you need to reload the server.
+#
+# This file is read on server startup and when the server receives a SIGHUP
+# signal.
If you edit the file on a running system, you have to SIGHUP the +# server for the changes to take effect, run "pg_ctl reload", or execute +# "SELECT pg_reload_conf()". Some parameters, which are marked below, +# require a server shutdown and restart to take effect. +# +# Any parameter can also be given as a command-line option to the server, e.g., +# "postgres -c log_connections=on". Some parameters can be changed at run time +# with the "SET" SQL command. +# +# Memory units: B = bytes Time units: us = microseconds +# kB = kilobytes ms = milliseconds +# MB = megabytes s = seconds +# GB = gigabytes min = minutes +# TB = terabytes h = hours +# d = days + + +#------------------------------------------------------------------------------ +# FILE LOCATIONS +#------------------------------------------------------------------------------ + +# The default values of these variables are driven from the -D command-line +# option or PGDATA environment variable, represented here as ConfigDir. + +#data_directory = 'ConfigDir' # use data in another directory + # (change requires restart) +#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file + # (change requires restart) +#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file + # (change requires restart) + +# If external_pid_file is not explicitly set, no extra PID file is written. 
+#external_pid_file = '' # write an extra PID file + # (change requires restart) + + +#------------------------------------------------------------------------------ +# CONNECTIONS AND AUTHENTICATION +#------------------------------------------------------------------------------ + +# - Connection Settings - + +#listen_addresses = 'localhost' # what IP address(es) to listen on; + # comma-separated list of addresses; + # defaults to 'localhost'; use '*' for all + # (change requires restart) +#port = 5432 # (change requires restart) +max_connections = 100 # (change requires restart) +#superuser_reserved_connections = 3 # (change requires restart) +#unix_socket_directories = '/var/run/postgresql' # comma-separated list of directories + # (change requires restart) +#unix_socket_group = '' # (change requires restart) +#unix_socket_permissions = 0777 # begin with 0 to use octal notation + # (change requires restart) +#bonjour = off # advertise server via Bonjour + # (change requires restart) +#bonjour_name = '' # defaults to the computer name + # (change requires restart) + +# - TCP settings - +# see "man 7 tcp" for details + +#tcp_keepalives_idle = 0 # TCP_KEEPIDLE, in seconds; + # 0 selects the system default +#tcp_keepalives_interval = 0 # TCP_KEEPINTVL, in seconds; + # 0 selects the system default +#tcp_keepalives_count = 0 # TCP_KEEPCNT; + # 0 selects the system default +#tcp_user_timeout = 0 # TCP_USER_TIMEOUT, in milliseconds; + # 0 selects the system default + +# - Authentication - + +#authentication_timeout = 1min # 1s-600s +#password_encryption = md5 # md5 or scram-sha-256 +#db_user_namespace = off + +# GSSAPI using Kerberos +#krb_server_keyfile = 'FILE:${sysconfdir}/krb5.keytab' +#krb_caseins_users = off + +# - SSL - + +#ssl = off +#ssl_ca_file = '' +#ssl_cert_file = 'server.crt' +#ssl_crl_file = '' +#ssl_key_file = 'server.key' +#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL' # allowed SSL ciphers +#ssl_prefer_server_ciphers = on +#ssl_ecdh_curve = 'prime256v1' 
+#ssl_min_protocol_version = 'TLSv1' +#ssl_max_protocol_version = '' +#ssl_dh_params_file = '' +#ssl_passphrase_command = '' +#ssl_passphrase_command_supports_reload = off + + +#------------------------------------------------------------------------------ +# RESOURCE USAGE (except WAL) +#------------------------------------------------------------------------------ + +# - Memory - + +shared_buffers = 128MB # min 128kB + # (change requires restart) +#huge_pages = try # on, off, or try + # (change requires restart) +#temp_buffers = 8MB # min 800kB +#max_prepared_transactions = 0 # zero disables the feature + # (change requires restart) +# Caution: it is not advisable to set max_prepared_transactions nonzero unless +# you actively intend to use prepared transactions. +#work_mem = 4MB # min 64kB +#maintenance_work_mem = 64MB # min 1MB +#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem +#max_stack_depth = 2MB # min 100kB +shared_memory_type = sysv # the default is the first option + # supported by the operating system: + # mmap + # sysv + # windows + # (change requires restart) +dynamic_shared_memory_type = sysv # the default is the first option + # supported by the operating system: + # posix + # sysv + # windows + # mmap + # (change requires restart) + +# - Disk - + +#temp_file_limit = -1 # limits per-process temp file space + # in kB, or -1 for no limit + +# - Kernel Resources - + +#max_files_per_process = 1000 # min 25 + # (change requires restart) + +# - Cost-Based Vacuum Delay - + +#vacuum_cost_delay = 0 # 0-100 milliseconds (0 disables) +#vacuum_cost_page_hit = 1 # 0-10000 credits +#vacuum_cost_page_miss = 10 # 0-10000 credits +#vacuum_cost_page_dirty = 20 # 0-10000 credits +#vacuum_cost_limit = 200 # 1-10000 credits + +# - Background Writer - + +#bgwriter_delay = 200ms # 10-10000ms between rounds +#bgwriter_lru_maxpages = 100 # max buffers written/round, 0 disables +#bgwriter_lru_multiplier = 2.0 # 0-10.0 multiplier on buffers scanned/round 
+#bgwriter_flush_after = 512kB # measured in pages, 0 disables + +# - Asynchronous Behavior - + +#effective_io_concurrency = 1 # 1-1000; 0 disables prefetching +#max_worker_processes = 8 # (change requires restart) +#max_parallel_maintenance_workers = 2 # limited by max_parallel_workers +#max_parallel_workers_per_gather = 2 # limited by max_parallel_workers +#parallel_leader_participation = on +#max_parallel_workers = 8 # number of max_worker_processes that + # can be used in parallel operations +#old_snapshot_threshold = -1 # 1min-60d; -1 disables; 0 is immediate + # (change requires restart) +#backend_flush_after = 0 # measured in pages, 0 disables + + +#------------------------------------------------------------------------------ +# WRITE-AHEAD LOG +#------------------------------------------------------------------------------ + +# - Settings - + +#wal_level = replica # minimal, replica, or logical + # (change requires restart) +#fsync = on # flush data to disk for crash safety + # (turning this off can cause + # unrecoverable data corruption) +#synchronous_commit = on # synchronization level; + # off, local, remote_write, remote_apply, or on +#wal_sync_method = fsync # the default is the first option + # supported by the operating system: + # open_datasync + # fdatasync (default on Linux and FreeBSD) + # fsync + # fsync_writethrough + # open_sync +#full_page_writes = on # recover from partial page writes +#wal_compression = off # enable compression of full-page writes +#wal_log_hints = off # also do full page writes of non-critical updates + # (change requires restart) +#wal_init_zero = on # zero-fill new WAL files +#wal_recycle = on # recycle WAL files +#wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers + # (change requires restart) +#wal_writer_delay = 200ms # 1-10000 milliseconds +#wal_writer_flush_after = 1MB # measured in pages, 0 disables + +#commit_delay = 0 # range 0-100000, in microseconds +#commit_siblings = 5 # range 1-1000 + +# - 
Checkpoints - + +#checkpoint_timeout = 5min # range 30s-1d +max_wal_size = 1GB +min_wal_size = 80MB +#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0 +#checkpoint_flush_after = 256kB # measured in pages, 0 disables +#checkpoint_warning = 30s # 0 disables + +# - Archiving - + +#archive_mode = off # enables archiving; off, on, or always + # (change requires restart) +#archive_command = '' # command to use to archive a logfile segment + # placeholders: %p = path of file to archive + # %f = file name only + # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' +#archive_timeout = 0 # force a logfile segment switch after this + # number of seconds; 0 disables + +# - Archive Recovery - + +# These are only used in recovery mode. + +#restore_command = '' # command to use to restore an archived logfile segment + # placeholders: %p = path of file to restore + # %f = file name only + # e.g. 'cp /mnt/server/archivedir/%f %p' + # (change requires restart) +#archive_cleanup_command = '' # command to execute at every restartpoint +#recovery_end_command = '' # command to execute at completion of recovery + +# - Recovery Target - + +# Set these only when performing a targeted recovery. 
+ +#recovery_target = '' # 'immediate' to end recovery as soon as a + # consistent state is reached + # (change requires restart) +#recovery_target_name = '' # the named restore point to which recovery will proceed + # (change requires restart) +#recovery_target_time = '' # the time stamp up to which recovery will proceed + # (change requires restart) +#recovery_target_xid = '' # the transaction ID up to which recovery will proceed + # (change requires restart) +#recovery_target_lsn = '' # the WAL LSN up to which recovery will proceed + # (change requires restart) +#recovery_target_inclusive = on # Specifies whether to stop: + # just after the specified recovery target (on) + # just before the recovery target (off) + # (change requires restart) +#recovery_target_timeline = 'latest' # 'current', 'latest', or timeline ID + # (change requires restart) +#recovery_target_action = 'pause' # 'pause', 'promote', 'shutdown' + # (change requires restart) + + +#------------------------------------------------------------------------------ +# REPLICATION +#------------------------------------------------------------------------------ + +# - Sending Servers - + +# Set these on the master and on any standby that will send replication data. + +#max_wal_senders = 10 # max number of walsender processes + # (change requires restart) +#wal_keep_segments = 0 # in logfile segments; 0 disables +#wal_sender_timeout = 60s # in milliseconds; 0 disables + +#max_replication_slots = 10 # max number of replication slots + # (change requires restart) +#track_commit_timestamp = off # collect timestamp of transaction commit + # (change requires restart) + +# - Master Server - + +# These settings are ignored on a standby server. 
+ +#synchronous_standby_names = '' # standby servers that provide sync rep + # method to choose sync standbys, number of sync standbys, + # and comma-separated list of application_name + # from standby(s); '*' = all +#vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is delayed + +# - Standby Servers - + +# These settings are ignored on a master server. + +#primary_conninfo = '' # connection string to sending server + # (change requires restart) +#primary_slot_name = '' # replication slot on sending server + # (change requires restart) +#promote_trigger_file = '' # file name whose presence ends recovery +#hot_standby = on # "off" disallows queries during recovery + # (change requires restart) +#max_standby_archive_delay = 30s # max delay before canceling queries + # when reading WAL from archive; + # -1 allows indefinite delay +#max_standby_streaming_delay = 30s # max delay before canceling queries + # when reading streaming WAL; + # -1 allows indefinite delay +#wal_receiver_status_interval = 10s # send replies at least this often + # 0 disables +#hot_standby_feedback = off # send info from standby to prevent + # query conflicts +#wal_receiver_timeout = 60s # time that receiver waits for + # communication from master + # in milliseconds; 0 disables +#wal_retrieve_retry_interval = 5s # time to wait before retrying to + # retrieve WAL after a failed attempt +#recovery_min_apply_delay = 0 # minimum delay for applying changes during recovery + +# - Subscribers - + +# These settings are ignored on a publisher. 
+ +#max_logical_replication_workers = 4 # taken from max_worker_processes + # (change requires restart) +#max_sync_workers_per_subscription = 2 # taken from max_logical_replication_workers + + +#------------------------------------------------------------------------------ +# QUERY TUNING +#------------------------------------------------------------------------------ + +# - Planner Method Configuration - + +#enable_bitmapscan = on +#enable_hashagg = on +#enable_hashjoin = on +#enable_indexscan = on +#enable_indexonlyscan = on +#enable_material = on +#enable_mergejoin = on +#enable_nestloop = on +#enable_parallel_append = on +#enable_seqscan = on +#enable_sort = on +#enable_tidscan = on +#enable_partitionwise_join = off +#enable_partitionwise_aggregate = off +#enable_parallel_hash = on +#enable_partition_pruning = on + +# - Planner Cost Constants - + +#seq_page_cost = 1.0 # measured on an arbitrary scale +#random_page_cost = 4.0 # same scale as above +#cpu_tuple_cost = 0.01 # same scale as above +#cpu_index_tuple_cost = 0.005 # same scale as above +#cpu_operator_cost = 0.0025 # same scale as above +#parallel_tuple_cost = 0.1 # same scale as above +#parallel_setup_cost = 1000.0 # same scale as above + +#jit_above_cost = 100000 # perform JIT compilation if available + # and query more expensive than this; + # -1 disables +#jit_inline_above_cost = 500000 # inline small functions if query is + # more expensive than this; -1 disables +#jit_optimize_above_cost = 500000 # use expensive JIT optimizations if + # query is more expensive than this; + # -1 disables + +#min_parallel_table_scan_size = 8MB +#min_parallel_index_scan_size = 512kB +#effective_cache_size = 4GB + +# - Genetic Query Optimizer - + +#geqo = on +#geqo_threshold = 12 +#geqo_effort = 5 # range 1-10 +#geqo_pool_size = 0 # selects default based on effort +#geqo_generations = 0 # selects default based on effort +#geqo_selection_bias = 2.0 # range 1.5-2.0 +#geqo_seed = 0.0 # range 0.0-1.0 + +# - Other Planner 
Options - + +#default_statistics_target = 100 # range 1-10000 +#constraint_exclusion = partition # on, off, or partition +#cursor_tuple_fraction = 0.1 # range 0.0-1.0 +#from_collapse_limit = 8 +#join_collapse_limit = 8 # 1 disables collapsing of explicit + # JOIN clauses +#force_parallel_mode = off +#jit = on # allow JIT compilation +#plan_cache_mode = auto # auto, force_generic_plan or + # force_custom_plan + + +#------------------------------------------------------------------------------ +# REPORTING AND LOGGING +#------------------------------------------------------------------------------ + +# - Where to Log - + +#log_destination = 'stderr' # Valid values are combinations of + # stderr, csvlog, syslog, and eventlog, + # depending on platform. csvlog + # requires logging_collector to be on. + +# This is used when logging to stderr: +#logging_collector = off # Enable capturing of stderr and csvlog + # into log files. Required to be on for + # csvlogs. + # (change requires restart) + +# These are only used if logging_collector is on: +#log_directory = 'log' # directory where log files are written, + # can be absolute or relative to PGDATA +#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern, + # can include strftime() escapes +#log_file_mode = 0600 # creation mode for log files, + # begin with 0 to use octal notation +#log_truncate_on_rotation = off # If on, an existing log file with the + # same name as the new log file will be + # truncated rather than appended to. + # But such truncation only occurs on + # time-driven rotation, not on restarts + # or size-driven rotation. Default is + # off, meaning append to existing files + # in all cases. +#log_rotation_age = 1d # Automatic rotation of logfiles will + # happen after that time. 0 disables. +#log_rotation_size = 10MB # Automatic rotation of logfiles will + # happen after that much log output. + # 0 disables. 
+ +# These are relevant when logging to syslog: +#syslog_facility = 'LOCAL0' +#syslog_ident = 'postgres' +#syslog_sequence_numbers = on +#syslog_split_messages = on + +# This is only relevant when logging to eventlog (win32): +# (change requires restart) +#event_source = 'PostgreSQL' + +# - When to Log - + +#log_min_messages = warning # values in order of decreasing detail: + # debug5 + # debug4 + # debug3 + # debug2 + # debug1 + # info + # notice + # warning + # error + # log + # fatal + # panic + +#log_min_error_statement = error # values in order of decreasing detail: + # debug5 + # debug4 + # debug3 + # debug2 + # debug1 + # info + # notice + # warning + # error + # log + # fatal + # panic (effectively off) + +#log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements + # and their durations, > 0 logs only + # statements running at least this number + # of milliseconds + +#log_transaction_sample_rate = 0.0 # Fraction of transactions whose statements + # are logged regardless of their duration. 1.0 logs all + # statements from all transactions, 0.0 never logs. + +# - What to Log - + +#debug_print_parse = off +#debug_print_rewritten = off +#debug_print_plan = off +#debug_pretty_print = on +#log_checkpoints = off +#log_connections = off +#log_disconnections = off +#log_duration = off +#log_error_verbosity = default # terse, default, or verbose messages +#log_hostname = off +#log_line_prefix = '%m [%p] ' # special values: + # %a = application name + # %u = user name + # %d = database name + # %r = remote host and port + # %h = remote host + # %p = process ID + # %t = timestamp without milliseconds + # %m = timestamp with milliseconds + # %n = timestamp with milliseconds (as a Unix epoch) + # %i = command tag + # %e = SQL state + # %c = session ID + # %l = session line number + # %s = session start timestamp + # %v = virtual transaction ID + # %x = transaction ID (0 if none) + # %q = stop here in non-session + # processes + # %% = '%' + # e.g. 
'<%u%%%d> ' +#log_lock_waits = off # log lock waits >= deadlock_timeout +#log_statement = 'none' # none, ddl, mod, all +#log_replication_commands = off +#log_temp_files = -1 # log temporary files equal or larger + # than the specified size in kilobytes; + # -1 disables, 0 logs all temp files +log_timezone = 'Etc/UTC' + +#------------------------------------------------------------------------------ +# PROCESS TITLE +#------------------------------------------------------------------------------ + +#cluster_name = '' # added to process titles if nonempty + # (change requires restart) +#update_process_title = on + + +#------------------------------------------------------------------------------ +# STATISTICS +#------------------------------------------------------------------------------ + +# - Query and Index Statistics Collector - + +#track_activities = on +#track_counts = on +#track_io_timing = off +#track_functions = none # none, pl, all +#track_activity_query_size = 1024 # (change requires restart) +#stats_temp_directory = 'pg_stat_tmp' + + +# - Monitoring - + +#log_parser_stats = off +#log_planner_stats = off +#log_executor_stats = off +#log_statement_stats = off + + +#------------------------------------------------------------------------------ +# AUTOVACUUM +#------------------------------------------------------------------------------ + +#autovacuum = on # Enable autovacuum subprocess? 'on' + # requires track_counts to also be on. +#log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and + # their durations, > 0 logs only + # actions running at least this number + # of milliseconds. 
+#autovacuum_max_workers = 3 # max number of autovacuum subprocesses + # (change requires restart) +#autovacuum_naptime = 1min # time between autovacuum runs +#autovacuum_vacuum_threshold = 50 # min number of row updates before + # vacuum +#autovacuum_analyze_threshold = 50 # min number of row updates before + # analyze +#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum +#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze +#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum + # (change requires restart) +#autovacuum_multixact_freeze_max_age = 400000000 # maximum multixact age + # before forced vacuum + # (change requires restart) +#autovacuum_vacuum_cost_delay = 2ms # default vacuum cost delay for + # autovacuum, in milliseconds; + # -1 means use vacuum_cost_delay +#autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for + # autovacuum, -1 means use + # vacuum_cost_limit + + +#------------------------------------------------------------------------------ +# CLIENT CONNECTION DEFAULTS +#------------------------------------------------------------------------------ + +# - Statement Behavior - + +#client_min_messages = notice # values in order of decreasing detail: + # debug5 + # debug4 + # debug3 + # debug2 + # debug1 + # log + # notice + # warning + # error +#search_path = '"$user", public' # schema names +#row_security = on +#default_tablespace = '' # a tablespace name, '' uses the default +#temp_tablespaces = '' # a list of tablespace names, '' uses + # only default tablespace +#default_table_access_method = 'heap' +#check_function_bodies = on +#default_transaction_isolation = 'read committed' +#default_transaction_read_only = off +#default_transaction_deferrable = off +#session_replication_role = 'origin' +#statement_timeout = 0 # in milliseconds, 0 is disabled +#lock_timeout = 0 # in milliseconds, 0 is disabled +#idle_in_transaction_session_timeout = 0 # in milliseconds, 0 
is disabled +#vacuum_freeze_min_age = 50000000 +#vacuum_freeze_table_age = 150000000 +#vacuum_multixact_freeze_min_age = 5000000 +#vacuum_multixact_freeze_table_age = 150000000 +#vacuum_cleanup_index_scale_factor = 0.1 # fraction of total number of tuples + # before index cleanup, 0 always performs + # index cleanup +#bytea_output = 'hex' # hex, escape +#xmlbinary = 'base64' +#xmloption = 'content' +#gin_fuzzy_search_limit = 0 +#gin_pending_list_limit = 4MB + +# - Locale and Formatting - + +datestyle = 'iso, mdy' +#intervalstyle = 'postgres' +timezone = 'Etc/UTC' +#timezone_abbreviations = 'Default' # Select the set of available time zone + # abbreviations. Currently, there are + # Default + # Australia (historical usage) + # India + # You can create your own file in + # share/timezonesets/. +#extra_float_digits = 1 # min -15, max 3; any value >0 actually + # selects precise output mode +#client_encoding = sql_ascii # actually, defaults to database + # encoding + +# These settings are initialized by initdb, but they can be changed. 
+lc_messages = 'en_US.UTF-8' # locale for system error message + # strings +lc_monetary = 'en_US.UTF-8' # locale for monetary formatting +lc_numeric = 'en_US.UTF-8' # locale for number formatting +lc_time = 'en_US.UTF-8' # locale for time formatting + +# default configuration for text search +default_text_search_config = 'pg_catalog.english' + +# - Shared Library Preloading - + +#shared_preload_libraries = '' # (change requires restart) +#local_preload_libraries = '' +#session_preload_libraries = '' +#jit_provider = 'llvmjit' # JIT library to use + +# - Other Defaults - + +#dynamic_library_path = '$libdir' + + +#------------------------------------------------------------------------------ +# LOCK MANAGEMENT +#------------------------------------------------------------------------------ + +#deadlock_timeout = 1s +#max_locks_per_transaction = 64 # min 10 + # (change requires restart) +#max_pred_locks_per_transaction = 64 # min 10 + # (change requires restart) +#max_pred_locks_per_relation = -2 # negative values mean + # (max_pred_locks_per_transaction + # / -max_pred_locks_per_relation) - 1 +#max_pred_locks_per_page = 2 # min 0 + + +#------------------------------------------------------------------------------ +# VERSION AND PLATFORM COMPATIBILITY +#------------------------------------------------------------------------------ + +# - Previous PostgreSQL Versions - + +#array_nulls = on +#backslash_quote = safe_encoding # on, off, or safe_encoding +#escape_string_warning = on +#lo_compat_privileges = off +#operator_precedence_warning = off +#quote_all_identifiers = off +#standard_conforming_strings = on +#synchronize_seqscans = on + +# - Other Platforms and Clients - + +#transform_null_equals = off + + +#------------------------------------------------------------------------------ +# ERROR HANDLING +#------------------------------------------------------------------------------ + +#exit_on_error = off # terminate session on any error? 
+#restart_after_crash = on	# reinitialize after backend crash? +#data_sync_retry = off	# retry or panic on failure to fsync + # data? + # (change requires restart) + + +#------------------------------------------------------------------------------ +# CONFIG FILE INCLUDES +#------------------------------------------------------------------------------ + +# These options allow settings to be loaded from files other than the +# default postgresql.conf. Note that these are directives, not variable +# assignments, so they can usefully be given more than once. + +#include_dir = '...'	# include files ending in '.conf' from + # a directory, e.g., 'conf.d' +#include_if_exists = '...'	# include file only if it exists +#include = '...'	# include file + + +#------------------------------------------------------------------------------ +# CUSTOMIZED OPTIONS +#------------------------------------------------------------------------------ + +# Add settings for extensions here diff --git a/scripts/do_pgbench.md b/scripts/do_pgbench.md new file mode 100644 index 0000000..fc57c94 --- /dev/null +++ b/scripts/do_pgbench.md @@ -0,0 +1,77 @@ +```shell +# 1. Install PostgreSQL and pgbench + +sudo apt-get update +sudo apt-get install -y postgresql postgresql-contrib + +# 2. Locate postgresql.conf (on Ubuntu it is usually in this directory) + +ls /etc/postgresql/*/main/postgresql.conf + +# 3. Disable mmap shared memory (edit postgresql.conf) + +shared_memory_type = sysv +dynamic_shared_memory_type = sysv + +# 4. 
Re-initialize and restart PostgreSQL (the commands below wipe the old data directory first) + +sudo systemctl stop postgresql +rm -rf /home/lian/pg/pgdata +rm -rf /zvfs/pg_ts_bench + +sudo chown -R postgres:postgres /home/lian/pg +sudo -u postgres mkdir -p /home/lian/pg/pgdata +sudo chown -R postgres:postgres /home/lian/pg/pgdata + +sudo -u postgres env LD_PRELOAD=/home/lian/try/zvfs/src/libzvfs.so \ + /usr/lib/postgresql/12/bin/initdb -D /home/lian/pg/pgdata + +cp ./postgresql.conf /home/lian/pg/pgdata/ + +sudo -u postgres env LD_PRELOAD=/home/lian/try/zvfs/src/libzvfs.so \ + /usr/lib/postgresql/12/bin/pg_ctl -D /home/lian/pg/pgdata -l /tmp/pg.log start + +sudo -u postgres env LD_PRELOAD=/home/lian/try/zvfs/src/libzvfs.so \ + /usr/lib/postgresql/12/bin/psql + +sudo -u postgres env LD_PRELOAD=/home/lian/try/zvfs/src/libzvfs.so \ + /usr/lib/postgresql/12/bin/pg_ctl -D /home/lian/pg/pgdata -l /tmp/pg.log restart + +# Set up the test environment +sudo -u postgres mkdir -p /zvfs/pg_ts_bench +sudo chown -R postgres:postgres /zvfs/pg_ts_bench +sudo chmod 700 /zvfs/pg_ts_bench + +# Run the following statements inside psql: +CREATE TABLESPACE zvfs_ts LOCATION '/zvfs/pg_ts_bench'; +DROP DATABASE IF EXISTS benchdb; +CREATE DATABASE benchdb TABLESPACE zvfs_ts; + +DROP TABLE IF EXISTS hook_probe; +CREATE TABLE hook_probe(id int) TABLESPACE zvfs_ts; +INSERT INTO hook_probe VALUES (1); +INSERT INTO hook_probe VALUES (2); +INSERT INTO hook_probe VALUES (3); +INSERT INTO hook_probe VALUES (4); +SELECT * FROM hook_probe; +DELETE FROM hook_probe WHERE id = 1; +UPDATE hook_probe SET id = 11 WHERE id = 2; +SELECT * FROM hook_probe; + + +# 5. Verify the configuration took effect +pid=$(pgrep -u postgres -xo postgres) +echo "pid=$pid" +sudo grep libzvfs /proc/$pid/maps + +sudo -u postgres psql -p 5432 -c "show data_directory;" +sudo -u postgres psql -c "SHOW shared_memory_type;" +sudo -u postgres psql -c "SHOW dynamic_shared_memory_type;" + +# 6. Create the test database (if not already created) + +sudo -u postgres createdb benchdb + +# 7. 
Run your bench script + +bash /home/lian/try/zvfs/scripts/run_pgbench_no_mmap.sh +``` \ No newline at end of file diff --git a/scripts/run_db_bench_zvfs.sh b/scripts/run_db_bench_zvfs.sh index 4977c3f..5f6af04 100755 --- a/scripts/run_db_bench_zvfs.sh +++ b/scripts/run_db_bench_zvfs.sh @@ -21,7 +21,7 @@ BENCHMARKS="fillrandom,readrandom" # key数 # NUM=1000000 -NUM=50000 +NUM=500 # 线程数 THREADS=2 diff --git a/scripts/run_pgbench_no_mmap.sh b/scripts/run_pgbench_no_mmap.sh new file mode 100755 index 0000000..a5d534e --- /dev/null +++ b/scripts/run_pgbench_no_mmap.sh @@ -0,0 +1,91 @@ +#!/usr/bin/env bash +set -euo pipefail + +# pgbench-only runner (does not install PostgreSQL, run initdb, start/stop the service, or change configuration). +# +# Prerequisites: +# 1) PostgreSQL is already running. +# 2) The test database already exists (default: benchdb). +# 3) PostgreSQL has already been configured externally to disable mmap shared memory: +# shared_memory_type = sysv +# dynamic_shared_memory_type = sysv +# +# About Malloc0: +# - The current backend is an in-memory virtual device with limited capacity. +# - The default parameters are deliberately small to avoid loading too much data at once. +# +# About LD_PRELOAD: +# - USE_LD_PRELOAD_INIT=1: enable LD_PRELOAD during initialization (pgbench -i) +# - USE_LD_PRELOAD_RUN=1 : enable LD_PRELOAD during the benchmark run +# - Set either to 0 to disable LD_PRELOAD for that phase +# +# Usage: +# bash scripts/run_pgbench_no_mmap.sh +# +# Optional environment variables: +# PG_HOST=127.0.0.1 +# PostgreSQL server address. +# PG_PORT=5432 +# PostgreSQL server port (default: 5432). +# PG_DB=benchdb +# Benchmark database name. +# PG_SCALE=2 +# pgbench initialization scale factor (-s); larger means more initial data. +# PG_TIME=20 +# Benchmark duration in seconds (pgbench -T). +# PG_CLIENTS=2 +# Number of concurrent clients (pgbench -c). +# PG_JOBS=2 +# Number of worker threads (pgbench -j). +# PG_SUPERUSER=postgres +# System user that runs pgbench (usually postgres). +# LD_PRELOAD_PATH=/home/lian/try/zvfs/src/libzvfs.so +# Library to LD_PRELOAD (your zvfs hook .so). +# PG_BIN_DIR=/usr/lib/postgresql/16/bin +# Directory containing pgbench; if unset, it is located via PATH. +# USE_LD_PRELOAD_INIT=1 +# Enable LD_PRELOAD during initialization (pgbench -i): 1=on, 0=off. +# USE_LD_PRELOAD_RUN=1 +# Enable LD_PRELOAD during the benchmark run: 1=on, 0=off. + +PG_HOST="${PG_HOST:-127.0.0.1}" +PG_PORT="${PG_PORT:-5432}" +PG_DB="${PG_DB:-benchdb}" +PG_SCALE="${PG_SCALE:-2}" +PG_TIME="${PG_TIME:-20}" +PG_CLIENTS="${PG_CLIENTS:-2}" +PG_JOBS="${PG_JOBS:-2}"
+PG_SUPERUSER="${PG_SUPERUSER:-postgres}" +LD_PRELOAD_PATH="${LD_PRELOAD_PATH:-/home/lian/try/zvfs/src/libzvfs.so}" +PG_BIN_DIR="${PG_BIN_DIR:-$(dirname "$(command -v pgbench 2>/dev/null || true)")}" +USE_LD_PRELOAD_INIT="${USE_LD_PRELOAD_INIT:-1}" +USE_LD_PRELOAD_RUN="${USE_LD_PRELOAD_RUN:-1}" + +if [[ -z "${PG_BIN_DIR}" || ! -x "${PG_BIN_DIR}/pgbench" ]]; then + echo "pgbench not found; set PG_BIN_DIR or add pgbench to PATH." >&2 + exit 1 +fi + +run_pgbench_cmd() { + local use_preload="$1" + shift + if [[ "${use_preload}" == "1" ]]; then + sudo -u "${PG_SUPERUSER}" env LD_PRELOAD="${LD_PRELOAD_PATH}" "$@" + else + sudo -u "${PG_SUPERUSER}" "$@" + fi +} + +echo "Current parameters:" +echo " host=${PG_HOST} port=${PG_PORT} db=${PG_DB}" +echo " scale=${PG_SCALE} clients=${PG_CLIENTS} jobs=${PG_JOBS} time=${PG_TIME}s" +echo " preload_init=${USE_LD_PRELOAD_INIT} preload_run=${USE_LD_PRELOAD_RUN}" + +echo "[1/2] Initializing data (pgbench -i)" +run_pgbench_cmd "${USE_LD_PRELOAD_INIT}" \ + "${PG_BIN_DIR}/pgbench" -h "${PG_HOST}" -p "${PG_PORT}" -i -s "${PG_SCALE}" "${PG_DB}" + +echo "[2/2] Running benchmark (pgbench -T)" +run_pgbench_cmd "${USE_LD_PRELOAD_RUN}" \ + "${PG_BIN_DIR}/pgbench" -h "${PG_HOST}" -p "${PG_PORT}" \ + -c "${PG_CLIENTS}" -j "${PG_JOBS}" -T "${PG_TIME}" -P 5 "${PG_DB}" diff --git a/scripts/search_libzvfs.sh b/scripts/search_libzvfs.sh new file mode 100755 index 0000000..355e419 --- /dev/null +++ b/scripts/search_libzvfs.sh @@ -0,0 +1,4 @@ +pgrep -u postgres -x postgres | while read p; do + echo "PID=$p" + sudo grep -m1 libzvfs /proc/$p/maps || echo " (no libzvfs)" +done \ No newline at end of file diff --git a/src/Makefile b/src/Makefile index fb9af0c..305ebd6 100755 --- a/src/Makefile +++ b/src/Makefile @@ -6,7 +6,6 @@ SPDK_ROOT_DIR := $(abspath $(CURDIR)/../spdk) include $(SPDK_ROOT_DIR)/mk/spdk.common.mk include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk -include $(SPDK_ROOT_DIR)/mk/spdk.app_vars.mk LIBZVFS := libzvfs.so @@ -18,6 +17,7 @@ C_SRCS := \ fs/zvfs_path_entry.c \ fs/zvfs_open_file.c \ 
fs/zvfs_sys_init.c \ + proto/ipc_proto.c \ hook/zvfs_hook_init.c \ hook/zvfs_hook_fd.c \ hook/zvfs_hook_rw.c \ @@ -28,24 +28,40 @@ C_SRCS := \ hook/zvfs_hook_dir.c \ hook/zvfs_hook_mmap.c \ +# Header search paths +CFLAGS += -I$(abspath $(CURDIR)) -fPIC +# SPDK library dependencies SPDK_LIB_LIST = $(ALL_MODULES_LIST) event event_bdev -LIBS += $(SPDK_LIB_LINKER_ARGS) -CFLAGS += -I$(abspath $(CURDIR)) -LDFLAGS += -shared -rdynamic -Wl,-z,nodelete -Wl,--disable-new-dtags \ +# Linker options +LDFLAGS += -shared -Wl,-soname,$(LIBZVFS) -Wl,-z,nodelete \ + -Wl,--disable-new-dtags \ -Wl,-rpath,$(SPDK_ROOT_DIR)/build/lib \ -Wl,-rpath,$(SPDK_ROOT_DIR)/dpdk/build/lib + +# System libraries SYS_LIBS += -ldl +# Resolve the linker arguments for the SPDK libraries +SPDK_LIBS = $(call spdk_lib_list_to_linker_args,$(SPDK_LIB_LIST)) + +DEPS = $(OBJS:.o=.d) all: $(LIBZVFS) - @: + $(MAKE) -C daemon -$(LIBZVFS): $(OBJS) $(SPDK_LIB_FILES) $(ENV_LIBS) - $(LINK_C) +# Compile object files +$(OBJDIR)/%.o: %.c + $(CC) $(CFLAGS) -c $< -o $@ + +# Link the shared library +$(LIBZVFS): $(OBJS) + $(CC) $(LDFLAGS) -o $@ $^ $(SPDK_LIBS) $(SYS_LIBS) clean: $(CLEAN_C) $(LIBZVFS) + rm -f $(DEPS) $(OBJS) $(LIBZVFS) + $(MAKE) -C daemon clean include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk diff --git a/src/config.h b/src/common/config.h similarity index 52% rename from src/config.h rename to src/common/config.h index 283da44..c2e71fc 100644 --- a/src/config.h +++ b/src/common/config.h @@ -1,33 +1,20 @@ #ifndef __ZVFS_CONFIG_H__ #define __ZVFS_CONFIG_H__ -/** - * ZVFS - */ + #define ZVFS_XATTR_BLOB_ID "user.zvfs.blob_id" -/** - * SPDK - */ + // dev #define SPDK_JSON_PATH "/home/lian/try/zvfs/src/zvfsmalloc.json" // #define ZVFS_BDEV "Nvme0n1" -#ifndef ZVFS_BDEV #define ZVFS_BDEV "Malloc0" -#endif -// super blob -#define ZVFS_SB_MAGIC UINT64_C(0x5A5646535F534200) /* "ZVFS_SB\0" */ -#define ZVFS_SB_VERSION UINT32_C(1) - -// dma -#define ZVFS_DMA_BUF_SIZE (1024 * 1024) - -// waiter -#define WAITER_MAX_TIME 10000000 +#define ZVFS_DMA_BUF_SIZE (1024 * 1024) #define ZVFS_WAIT_TIME 5000ULL +#define ZVFS_IPC_DEFAULT_SOCKET_PATH 
"/tmp/zvfs.sock" +// #define ZVFS_IPC_BUF_SIZE 4096 +#define ZVFS_IPC_BUF_SIZE (16 * 1024 * 1024) #endif // __ZVFS_CONFIG_H__ diff --git a/src/common/utils.c b/src/common/utils.c index 0a11269..691691c 100644 --- a/src/common/utils.c +++ b/src/common/utils.c @@ -50,44 +50,3 @@ int zvfs_calc_ceil_units(uint64_t bytes, } return 0; } - -int buf_init(zvfs_buf_t *b, size_t initial) -{ - b->data = malloc(initial); - if (!b->data) return -1; - b->cap = initial; - b->len = 0; - return 0; -} - -void buf_free(zvfs_buf_t *b) -{ - free(b->data); - b->data = NULL; - b->len = b->cap = 0; -} - -/* - * 确保缓冲区还有 need 字节可用,不够则 realloc 两倍。 - */ -int buf_reserve(zvfs_buf_t *b, size_t need) -{ - if (b->len + need <= b->cap) return 0; - - size_t new_cap = b->cap * 2; - while (new_cap < b->len + need) new_cap *= 2; - - uint8_t *p = realloc(b->data, new_cap); - if (!p) return -1; - b->data = p; - b->cap = new_cap; - return 0; -} - -int buf_append(zvfs_buf_t *b, const void *src, size_t n) -{ - if (buf_reserve(b, n) != 0) return -1; - memcpy(b->data + b->len, src, n); - b->len += n; - return 0; -} diff --git a/src/common/utils.h b/src/common/utils.h index 1d63023..f05984a 100644 --- a/src/common/utils.h +++ b/src/common/utils.h @@ -15,15 +15,4 @@ int zvfs_calc_ceil_units(uint64_t bytes, uint64_t unit_size, uint64_t *units_out); -typedef struct { - uint8_t *data; - size_t cap; - size_t len; -} zvfs_buf_t; - -int buf_init(zvfs_buf_t *b, size_t initial); -void buf_free(zvfs_buf_t *b); -int buf_reserve(zvfs_buf_t *b, size_t need); -int buf_append(zvfs_buf_t *b, const void *src, size_t n); - #endif // __ZVFS_COMMON_UTILS_H__ diff --git a/src/daemon/Makefile b/src/daemon/Makefile new file mode 100644 index 0000000..053b775 --- /dev/null +++ b/src/daemon/Makefile @@ -0,0 +1,20 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright (C) 2017 Intel Corporation +# All rights reserved. 
+# + +SPDK_ROOT_DIR := $(abspath $(CURDIR)/../../spdk) +PROTO_DIR := $(abspath $(CURDIR)/../proto) +COMMON_DIR := $(abspath $(CURDIR)/../common) +include $(SPDK_ROOT_DIR)/mk/spdk.common.mk +include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk + +APP = zvfs_daemon + +CFLAGS += -I$(abspath $(CURDIR)/..) + +C_SRCS := main.c ipc_cq.c ipc_reactor.c spdk_engine.c spdk_engine_wrapper.c $(PROTO_DIR)/ipc_proto.c $(COMMON_DIR)/utils.c + +SPDK_LIB_LIST = $(ALL_MODULES_LIST) event event_bdev + +include $(SPDK_ROOT_DIR)/mk/spdk.app.mk \ No newline at end of file diff --git a/src/daemon/ipc_cq.c b/src/daemon/ipc_cq.c new file mode 100644 index 0000000..6217f70 --- /dev/null +++ b/src/daemon/ipc_cq.c @@ -0,0 +1,61 @@ +#include "ipc_cq.h" +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stdint.h> +#include <pthread.h> + +struct cq *g_cq; + +struct cq *CQ_Create(void) { + struct cq *q = (struct cq*)malloc(sizeof(*q)); + if (!q) return NULL; + q->head = q->tail = NULL; + pthread_mutex_init(&q->lock, NULL); + return q; +} + +void CQ_Destroy(struct cq *q) { + while (q->head) { + struct cq_item *tmp = q->head; + q->head = tmp->next; + free(tmp->resp->data); // free the payload if the resp carries one + free(tmp->resp); + free(tmp); + } + pthread_mutex_destroy(&q->lock); + free(q); +} + +/* Push a response */ +void CQ_Push(struct cq *q, struct zvfs_resp *resp) { + struct cq_item *item = (struct cq_item *)malloc(sizeof(*item)); + if (!item) { + /* allocation failed: drop the response instead of dereferencing NULL */ + free(resp->data); + free(resp); + return; + } + item->resp = resp; + item->next = NULL; + + pthread_mutex_lock(&q->lock); + if (q->tail) { + q->tail->next = item; + q->tail = item; + } else { + q->head = q->tail = item; + } + pthread_mutex_unlock(&q->lock); +} + +/* Pop a response */ +struct zvfs_resp *CQ_Pop(struct cq *q) { + pthread_mutex_lock(&q->lock); + struct cq_item *item = q->head; + if (!item) { + pthread_mutex_unlock(&q->lock); + return NULL; + } + q->head = item->next; + if (!q->head) q->tail = NULL; + pthread_mutex_unlock(&q->lock); + + struct zvfs_resp *resp = item->resp; + free(item); + return resp; +} \ No newline at end of file diff --git a/src/daemon/ipc_cq.h 
b/src/daemon/ipc_cq.h new file mode 100644 index 0000000..9f6660d --- /dev/null +++ b/src/daemon/ipc_cq.h @@ -0,0 +1,26 @@ +#ifndef __ZVFS_IPC_CQ_H__ +#define __ZVFS_IPC_CQ_H__ + +#include "proto/ipc_proto.h" +#include <pthread.h> + + +struct cq_item { + struct zvfs_resp *resp; + struct cq_item *next; +}; + +struct cq { + struct cq_item *head; + struct cq_item *tail; + pthread_mutex_t lock; +}; + +struct cq *CQ_Create(void); +void CQ_Destroy(struct cq *q); +void CQ_Push(struct cq *q, struct zvfs_resp *resp); +struct zvfs_resp *CQ_Pop(struct cq *q); + +extern struct cq *g_cq; + +#endif \ No newline at end of file diff --git a/src/daemon/ipc_reactor.c b/src/daemon/ipc_reactor.c new file mode 100644 index 0000000..c27907f --- /dev/null +++ b/src/daemon/ipc_reactor.c @@ -0,0 +1,309 @@ +#include "ipc_reactor.h" +#include "ipc_cq.h" +#include "common/config.h" + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stdint.h> +#include <unistd.h> +#include <errno.h> +#include <fcntl.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <sys/socket.h> +#include <sys/un.h> +#include <sys/epoll.h> + +static int send_all(int fd, const uint8_t *buf, size_t len) { + size_t off = 0; + + while (off < len) { + ssize_t sent = send(fd, buf + off, len - off, 0); + if (sent > 0) { + off += (size_t)sent; + continue; + } + if (sent < 0 && errno == EINTR) { + continue; + } + if (sent < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) { + /* Functionality first for now: back off briefly until the peer is writable, then retry. */ + usleep(100); + continue; + } + return -1; + } + return 0; +} + +/** ====================================================== */ +/* CQ OP */ +/** ====================================================== */ +static void cq_consume_send(struct cq *q) { + struct zvfs_resp *resp; + while ((resp = CQ_Pop(q)) != NULL) { + struct zvfs_conn *conn = resp->conn; + size_t cap = ZVFS_IPC_BUF_SIZE; + uint8_t *buf = NULL; + + // printf("[resp][%s]\n",cast_opcode2string(resp->opcode)); + + buf = malloc(cap); + if (!buf) { + fprintf(stderr, "serialize resp failed: alloc %zu bytes\n", cap); + free(resp->data); + free(resp); + continue; + } + + size_t n = zvfs_serialize_resp(resp, 
buf, cap); + if (n == 0 && resp->status == 0 && resp->opcode == ZVFS_OP_READ) { + if (resp->length <= SIZE_MAX - 64) { + size_t need = (size_t)resp->length + 64; + uint8_t *bigger = realloc(buf, need); + if (bigger) { + buf = bigger; + cap = need; + n = zvfs_serialize_resp(resp, buf, cap); + } + } + } + + if (n == 0) { + fprintf(stderr, "serialize resp failed: op=%u status=%d len=%lu cap=%zu\n", + resp->opcode, resp->status, resp->length, cap); + free(buf); + free(resp->data); + free(resp); + continue; + } + + if (send_all(conn->fd, buf, n) != 0) { + perror("send"); + free(buf); + free(resp->data); + free(resp); + continue; + } + free(buf); + + // cleanup + if(resp->data) free(resp->data); + free(resp); + } +} + +static int set_nonblock(int fd){ + int flags = fcntl(fd, F_GETFL, 0); + if (flags < 0) + return -1; + + return fcntl(fd, F_SETFL, flags | O_NONBLOCK); +} + +static void epoll_add(struct zvfs_reactor *r, int fd, void *ptr, uint32_t events) +{ + struct epoll_event ev; + + memset(&ev, 0, sizeof(ev)); + ev.events = events; + ev.data.ptr = ptr; + + epoll_ctl(r->epfd, EPOLL_CTL_ADD, fd, &ev); +} + +static void epoll_mod(struct zvfs_reactor *r, int fd, void *ptr, uint32_t events){ + struct epoll_event ev; + + memset(&ev, 0, sizeof(ev)); + ev.events = events; + ev.data.ptr = ptr; + + epoll_ctl(r->epfd, EPOLL_CTL_MOD, fd, &ev); +} + +static void conn_destroy(struct zvfs_conn *c){ + close(c->fd); + free(c); +} + +int zvfs_conn_get_fd(struct zvfs_conn *conn){ + return conn->fd; +} + +void zvfs_conn_set_ctx(struct zvfs_conn *conn, void *ctx){ + conn->user_ctx = ctx; +} + +void *zvfs_conn_get_ctx(struct zvfs_conn *conn){ + return conn->user_ctx; +} + +void zvfs_conn_enable_write(struct zvfs_conn *conn){ + if (conn->want_write) + return; + + conn->want_write = 1; + + struct zvfs_reactor *r = conn->reactor; + + epoll_mod(r, conn->fd, conn, + EPOLLIN | EPOLLOUT | EPOLLET); +} + +void zvfs_conn_disable_write(struct zvfs_conn *conn){ + if (!conn->want_write) + return; + 
conn->want_write = 0; + + struct zvfs_reactor *r = conn->reactor; + + epoll_mod(r, conn->fd, conn, + EPOLLIN | EPOLLET); +} + +void zvfs_conn_close(struct zvfs_conn *conn){ + struct zvfs_reactor *r = conn->reactor; + + if (r->opts.on_close) + r->opts.on_close(conn, r->opts.cb_ctx); + + epoll_ctl(r->epfd, EPOLL_CTL_DEL, conn->fd, NULL); + + conn_destroy(conn); +} + +/** + * AF_UNIX -> Unix domain socket + * SOCK_STREAM -> stream semantics, similar to TCP + * path -> communication goes through a filesystem path + */ +static int create_listen_socket(const char *path, int backlog){ + int fd = socket(AF_UNIX, SOCK_STREAM, 0); + if (fd < 0) + return -1; + + struct sockaddr_un addr; + + memset(&addr, 0, sizeof(addr)); + addr.sun_family = AF_UNIX; + strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1); + + unlink(path); + + if (bind(fd, (struct sockaddr*)&addr, sizeof(addr)) < 0) { + close(fd); + return -1; + } + + /* + * When the daemon is started by root, the socket mode is subject to umask, + * which can make connect() from other users (e.g. postgres) fail with + * EACCES; loosen the mode explicitly. + */ + if (chmod(path, 0666) < 0) { + close(fd); + return -1; + } + + if (listen(fd, backlog) < 0) { + close(fd); + return -1; + } + + set_nonblock(fd); + + return fd; +} + +struct zvfs_reactor *zvfs_reactor_create(const struct zvfs_reactor_opts *opts){ + struct zvfs_reactor *r = calloc(1, sizeof(*r)); + + r->opts = *opts; + + r->epfd = epoll_create1(0); + + r->listen_fd = create_listen_socket( + opts->socket_path, + opts->backlog); + + epoll_add(r, r->listen_fd, NULL, EPOLLIN); + + return r; +} + +static void handle_accept(struct zvfs_reactor *r){ + for (;;) { + + int fd = accept(r->listen_fd, NULL, NULL); + + if (fd < 0) { + + if (errno == EAGAIN || errno == EWOULDBLOCK) + return; + + return; + } + + set_nonblock(fd); + + struct zvfs_conn *conn = calloc(1, sizeof(*conn)); + + conn->fd = fd; + conn->reactor = r; + + epoll_add(r, fd, conn, EPOLLIN | EPOLLET); + + if (r->opts.on_accept) + r->opts.on_accept(conn, r->opts.cb_ctx); + } +} + +int +zvfs_reactor_run(struct zvfs_reactor *r){ + struct epoll_event events[64]; + + r->running = 1; + + while (r->running) { + + int n = 
epoll_wait(r->epfd, events, 64, 0); + + for (int i = 0; i < n; i++) { + + if (events[i].data.ptr == NULL) { + + handle_accept(r); + continue; + } + + struct zvfs_conn *conn = events[i].data.ptr; + + if (events[i].events & (EPOLLHUP | EPOLLERR)) { + + zvfs_conn_close(conn); + continue; + } + + if ((events[i].events & EPOLLIN) && + r->opts.on_read) { + + r->opts.on_read(conn, r->opts.cb_ctx); + } + + if ((events[i].events & EPOLLOUT) && + r->opts.on_write) { + + r->opts.on_write(conn, r->opts.cb_ctx); + } + } + cq_consume_send(g_cq); + } + return 0; +} + +void zvfs_reactor_stop(struct zvfs_reactor *r){ + r->running = 0; +} + + +void zvfs_reactor_destroy(struct zvfs_reactor *r){ + close(r->listen_fd); + close(r->epfd); + free(r); +} + diff --git a/src/daemon/ipc_reactor.h b/src/daemon/ipc_reactor.h new file mode 100644 index 0000000..fc0f49d --- /dev/null +++ b/src/daemon/ipc_reactor.h @@ -0,0 +1,118 @@ +#ifndef __ZVFS_IPC_REACTOR_H__ +#define __ZVFS_IPC_REACTOR_H__ + +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +struct zvfs_reactor_opts; +struct zvfs_conn; +struct zvfs_reactor; + +/* callbacks */ + + +typedef void (*zvfs_on_accept_fn)( + struct zvfs_conn *conn, + void *ctx); + +typedef void (*zvfs_on_read_fn)( + struct zvfs_conn *conn, + void *ctx); + +typedef void (*zvfs_on_write_fn)( + struct zvfs_conn *conn, + void *ctx); + +typedef void (*zvfs_on_close_fn)( + struct zvfs_conn *conn, + void *ctx); + +/* configuration */ + +struct zvfs_reactor_opts { + + const char *socket_path; + + int backlog; + + int max_events; + + zvfs_on_accept_fn on_accept; + + zvfs_on_read_fn on_read; + + zvfs_on_write_fn on_write; + + zvfs_on_close_fn on_close; + + void *cb_ctx; +}; + +struct zvfs_conn { + + int fd; + + int want_write; + + void *user_ctx; + + struct zvfs_reactor *reactor; +}; + +struct zvfs_reactor { + + int epfd; + + int listen_fd; + + int running; + + struct zvfs_reactor_opts opts; +}; + + +/* reactor lifecycle */ + +struct zvfs_reactor * 
+zvfs_reactor_create(const struct zvfs_reactor_opts *opts); + +int +zvfs_reactor_run(struct zvfs_reactor *reactor); + +void +zvfs_reactor_stop(struct zvfs_reactor *reactor); + +void +zvfs_reactor_destroy(struct zvfs_reactor *reactor); + + +/* connection helpers */ + +int +zvfs_conn_get_fd(struct zvfs_conn *conn); + +void +zvfs_conn_close(struct zvfs_conn *conn); + +void +zvfs_conn_enable_write(struct zvfs_conn *conn); + +void +zvfs_conn_disable_write(struct zvfs_conn *conn); + +void +zvfs_conn_set_ctx(struct zvfs_conn *conn, void *ctx); + +void * +zvfs_conn_get_ctx(struct zvfs_conn *conn); + + +#ifdef __cplusplus +} +#endif + +#endif \ No newline at end of file diff --git a/src/daemon/main.c b/src/daemon/main.c new file mode 100644 index 0000000..70ae53b --- /dev/null +++ b/src/daemon/main.c @@ -0,0 +1,259 @@ + +#include "common/config.h" +#include "proto/ipc_proto.h" +#include "ipc_reactor.h" +#include "ipc_cq.h" +#include "spdk_engine_wrapper.h" + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stdint.h> +#include <stddef.h> +#include <unistd.h> +#include <errno.h> +#include <sys/types.h> + +// #define IPC_REACTOR_ECHO + +#define IPC_REACTOR_ZVFS + +extern struct zvfs_spdk_io_engine g_engine; + + +#ifdef IPC_REACTOR_ECHO +static void on_accept(struct zvfs_conn *conn, void *ctx) +{ + printf("client connected fd=%d\n", + zvfs_conn_get_fd(conn)); +} + + +static void on_read(struct zvfs_conn *c, void *ctx) +{ + int fd = zvfs_conn_get_fd(c); + + char buf[4096]; + + ssize_t n = read(fd, buf, sizeof(buf)); + + if (n == 0) { + zvfs_conn_close(c); + return; + } + + if (n < 0) { + + if (errno == EAGAIN || errno == EWOULDBLOCK) + return; + + perror("read"); + zvfs_conn_close(c); + return; + } + + printf("recv %ld bytes: %.*s\n", n, (int)n, buf); + + ssize_t w = write(fd, buf, n); + + if (w < 0) { + perror("write"); + zvfs_conn_close(c); + return; + } +} + + +static void on_write(struct zvfs_conn *conn, void *ctx) +{ + /* the echo server does not need a write queue */ +} + + +static void on_close(struct zvfs_conn *conn, void *ctx) +{ + 
printf("connection closed fd=%d\n", + zvfs_conn_get_fd(conn)); +} + + +int main() +{ + struct zvfs_reactor_opts opts = { + .socket_path = "/tmp/zvfs.sock", + .backlog = 128, + .max_events = 64, + .on_accept = on_accept, + .on_read = on_read, + .on_write = on_write, + .on_close = on_close, + .cb_ctx = NULL + }; + + struct zvfs_reactor *r = zvfs_reactor_create(&opts); + + printf("echo server started: %s\n", opts.socket_path); + + zvfs_reactor_run(r); + + return 0; +} + + +#else +static void on_accept(struct zvfs_conn *conn, void *ctx) +{ + struct { + uint8_t *buf; + size_t len; + size_t cap; + } *rctx = calloc(1, sizeof(*rctx)); + + if (!rctx) { + fprintf(stderr, "[accept] alloc conn ctx failed\n"); + zvfs_conn_close(conn); + return; + } + + rctx->cap = ZVFS_IPC_BUF_SIZE; + rctx->buf = calloc(1, rctx->cap); + if (!rctx->buf) { + fprintf(stderr, "[accept] alloc conn rx buffer failed\n"); + free(rctx); + zvfs_conn_close(conn); + return; + } + zvfs_conn_set_ctx(conn, rctx); + + printf("client connected fd=%d\n", + zvfs_conn_get_fd(conn)); +} + +static void on_read(struct zvfs_conn *c, void *ctx) +{ + int fd = zvfs_conn_get_fd(c); + struct { + uint8_t *buf; + size_t len; + size_t cap; + } *rctx = zvfs_conn_get_ctx(c); + + if (!rctx || !rctx->buf || rctx->cap == 0) { + fprintf(stderr, "[read] invalid conn ctx fd=%d\n", fd); + zvfs_conn_close(c); + return; + } + + for (;;) { + if (rctx->len >= rctx->cap) { + fprintf(stderr, "[read] rx buffer overflow fd=%d len=%zu cap=%zu\n", + fd, rctx->len, rctx->cap); + zvfs_conn_close(c); + return; + } + + ssize_t n = read(fd, rctx->buf + rctx->len, rctx->cap - rctx->len); + if (n == 0) { + fprintf(stderr, "[read] fd=%d closed\n", fd); + zvfs_conn_close(c); + return; + } + + if (n < 0) { + if (errno != EAGAIN && errno != EWOULDBLOCK) { + perror("[read]"); + zvfs_conn_close(c); + return; + } + break; + } + + rctx->len += (size_t)n; + } + + size_t offset = 0; + while (offset < rctx->len) { + struct zvfs_req *req = calloc(1, 
sizeof(*req)); + if (!req) { + fprintf(stderr, "malloc failed\n"); + break; + } + + size_t consumed = zvfs_deserialize_req(rctx->buf + offset, rctx->len - offset, req); + if (consumed == 0) { + free(req); + break; /* wait for more data */ + } + + printf("[req][%s]\n", cast_opcode2string(req->opcode)); + req->conn = c; + offset += consumed; + + if (dispatch_to_worker(req) < 0) { + fprintf(stderr, "[dispatcher] [fd:%d] dispatch error\n", c->fd); + } + } + + if (offset > 0) { + size_t remain = rctx->len - offset; + if (remain > 0) { + memmove(rctx->buf, rctx->buf + offset, remain); + } + rctx->len = remain; + } + + if (rctx->len == rctx->cap) { + fprintf(stderr, "[read] request too large or malformed fd=%d cap=%zu\n", + fd, rctx->cap); + zvfs_conn_close(c); + } +} + +static void on_close(struct zvfs_conn *conn, void *ctx) +{ + struct { + uint8_t *buf; + size_t len; + size_t cap; + } *rctx = zvfs_conn_get_ctx(conn); + + if (rctx) { + free(rctx->buf); + free(rctx); + zvfs_conn_set_ctx(conn, NULL); + } + + printf("connection closed fd=%d\n", + zvfs_conn_get_fd(conn)); +} + + +int main(void){ + + + const char *bdev_name = getenv("SPDK_BDEV_NAME") ? getenv("SPDK_BDEV_NAME") : ZVFS_BDEV; + const char *json_file = getenv("SPDK_JSON_CONFIG") ? 
getenv("SPDK_JSON_CONFIG") : SPDK_JSON_PATH; + + g_cq = CQ_Create(); + + zvfs_engine_init(bdev_name, json_file, 4); + + struct zvfs_reactor_opts opts = { + .socket_path = ZVFS_IPC_DEFAULT_SOCKET_PATH, + .backlog = 128, + .max_events = 64, + .on_accept = on_accept, + .on_read = on_read, + .on_write = NULL, + .on_close = on_close, + .cb_ctx = &g_engine + }; + + struct zvfs_reactor *r = zvfs_reactor_create(&opts); + zvfs_reactor_run(r); + + + + if(g_cq) CQ_Destroy(g_cq); +} +#endif diff --git a/src/daemon/spdk_engine.c b/src/daemon/spdk_engine.c new file mode 100644 index 0000000..2e7b075 --- /dev/null +++ b/src/daemon/spdk_engine.c @@ -0,0 +1,1047 @@ +#include "common/utils.h" +#include "common/config.h" +#include "spdk_engine.h" +#include "ipc_cq.h" + +#include <spdk/stdinc.h> +#include <spdk/env.h> +#include <spdk/event.h> +#include <spdk/thread.h> +#include <spdk/bdev.h> +#include <spdk/blob.h> +#include <spdk/blob_bdev.h> +#include <spdk/init.h> +#include <spdk/rpc.h> +#include <spdk/log.h> +#include <pthread.h> + +/** =========================================================== + * Global engine state + * ============================================================ */ +struct zvfs_spdk_io_engine g_engine = {0}; + +/** =========================================================== + * Internal helper: monotonic clock + * ============================================================ */ +static uint64_t now_mono_ms(void) { + struct timespec ts; + clock_gettime(CLOCK_MONOTONIC, &ts); + return (uint64_t)ts.tv_sec * 1000ULL + (uint64_t)ts.tv_nsec / 1000000ULL; +} + +/** =========================================================== + * Internal helper: on error paths, push a resp and free the req. + * Only used when a normal resp cannot be constructed. + * ============================================================ */ +static void push_err_resp(struct zvfs_req *req, int status) { + struct zvfs_resp *resp = calloc(1, sizeof(*resp)); + if (!resp) { + SPDK_ERRLOG("push_err_resp: calloc failed, op_code=%u\n", req->opcode); + if (req->data) free(req->data); + if (req->add_ref_items) free(req->add_ref_items); + free(req); + return; + } + resp->opcode = req->opcode; + resp->conn = req->conn; + resp->status = status; + if (req->data) 
free(req->data); + if (req->add_ref_items) free(req->add_ref_items); + free(req); + CQ_Push(g_cq, resp); +} + +/** =========================================================== + * bdev / blobstore initialization (runs on the md thread only). + * Synchronous waiting is required during this phase because no + * other thread is available to poll yet. + * ============================================================ */ +struct bs_init_ctx { + bool done; + int rc; + struct spdk_blob_store *bs; +}; + +struct json_load_ctx { + bool done; + int rc; +}; + +static void zvfs_spdk_bdev_event_cb(enum spdk_bdev_event_type type, + struct spdk_bdev *bdev, void *event_ctx) { + (void)event_ctx; + if (type == SPDK_BDEV_EVENT_REMOVE) + SPDK_NOTICELOG("bdev removed: %s\n", spdk_bdev_get_name(bdev)); +} + +static void bs_init_cb(void *arg, struct spdk_blob_store *bs, int bserrno) { + struct bs_init_ctx *ctx = arg; + ctx->rc = bserrno; + ctx->bs = bs; + ctx->done = true; +} + +/** + * Synchronous wait during initialization: poll on the md thread itself + * until the callback completes. Only legal during the io_engine_init phase. + */ +static int wait_done_init(bool *done_ptr, int *rc_ptr, + struct spdk_thread *thread, const char *op) { + const uint64_t deadline_ms = now_mono_ms() + ZVFS_WAIT_TIME; + while (!*done_ptr) { + spdk_thread_poll(thread, 0, 0); + if (now_mono_ms() >= deadline_ms) { + SPDK_ERRLOG("%s timeout\n", op); + return -ETIMEDOUT; + } + } + if (*rc_ptr != 0) { + SPDK_ERRLOG("%s failed: %d\n", op, *rc_ptr); + return *rc_ptr; + } + return 0; +} + +static void json_app_load_done(int rc, void *arg) { + struct json_load_ctx *ctx = arg; + ctx->rc = rc; + ctx->done = true; +} + +static int load_json_config(struct spdk_thread *thread, const char *json_file) { + const char *path = json_file ? 
json_file : getenv("SPDK_JSON_CONFIG"); + if (!path) path = SPDK_JSON_PATH; + + struct json_load_ctx ctx = {.done = false, .rc = 0}; + spdk_subsystem_init_from_json_config(path, SPDK_DEFAULT_RPC_ADDR, + json_app_load_done, &ctx, true); + return wait_done_init(&ctx.done, &ctx.rc, thread, "load_json_config"); +} + +static int open_bdev_and_init_bs(const char *bdev_name, struct spdk_thread *thread) { + struct spdk_bs_dev *bs_dev = NULL; + int rc = spdk_bdev_create_bs_dev_ext(bdev_name, zvfs_spdk_bdev_event_cb, NULL, &bs_dev); + if (rc != 0) { + SPDK_ERRLOG("spdk_bdev_create_bs_dev_ext failed: %d\n", rc); + return rc; + } + g_engine.bs_dev = bs_dev; + + struct bs_init_ctx ctx = {.done = false, .rc = 0, .bs = NULL}; + spdk_bs_load(bs_dev, NULL, bs_init_cb, &ctx); + rc = wait_done_init(&ctx.done, &ctx.rc, thread, "bs_load"); + if (rc != 0) { + SPDK_NOTICELOG("bs_load failed (%d), fallback to bs_init\n", rc); + bs_dev = NULL; + rc = spdk_bdev_create_bs_dev_ext(bdev_name, zvfs_spdk_bdev_event_cb, NULL, &bs_dev); + if (rc != 0) { + SPDK_ERRLOG("spdk_bdev_create_bs_dev_ext(fallback) failed: %d\n", rc); + g_engine.bs_dev = NULL; + return rc; + } + g_engine.bs_dev = bs_dev; + ctx.done = false; ctx.rc = 0; ctx.bs = NULL; + spdk_bs_init(bs_dev, NULL, bs_init_cb, &ctx); + rc = wait_done_init(&ctx.done, &ctx.rc, thread, "bs_init"); + if (rc != 0) { + g_engine.bs_dev = NULL; + return rc; + } + } + + g_engine.bs = ctx.bs; + g_engine.io_unit_size = spdk_bs_get_io_unit_size(ctx.bs); + g_engine.cluster_size = spdk_bs_get_cluster_size(ctx.bs); + SPDK_NOTICELOG("Blobstore ready on bdev=%s io_unit=%lu cluster=%lu\n", + bdev_name, g_engine.io_unit_size, g_engine.cluster_size); + return 0; +} + +/** =========================================================== + * 线程池 bootstrap + * thread_pool[0] = md 线程(元数据操作) + * thread_pool[1..N-1] = io 线程(数据读写,各自独占一个 channel) + * ============================================================ */ +struct thread_bootstrap_ctx { + int idx; + const char 
*bdev_name; /* 仅 md 线程使用 */ + const char *json_file; /* 仅 md 线程使用 */ + pthread_mutex_t mu; + pthread_cond_t cv; + bool done; + int rc; +}; + +/* md 线程:负责 json 加载、bs 初始化,之后持续 poll */ +static void *md_poller_fn(void *arg) { + struct thread_bootstrap_ctx *boot = arg; + struct zvfs_io_thread *slot = &g_engine.thread_pool[0]; + + spdk_set_thread(slot->thread); + + int rc = load_json_config(slot->thread, boot->json_file); + if (rc != 0) { SPDK_ERRLOG("load_json_config failed: %d\n", rc); goto notify; } + + rc = open_bdev_and_init_bs(boot->bdev_name, slot->thread); + +notify: + pthread_mutex_lock(&boot->mu); + boot->rc = rc; + boot->done = true; + pthread_cond_signal(&boot->cv); + pthread_mutex_unlock(&boot->mu); + + if (rc != 0) return NULL; + + slot->ready = true; + /* 持续 poll,处理所有通过 spdk_thread_send_msg 分发的 md 操作 */ + while (true) { + spdk_thread_poll(slot->thread, 0, 0); + usleep(100); + } + return NULL; +} + +/* io 线程:等待 bs 就绪后分配 channel,然后持续 poll */ +static void *io_poller_fn(void *arg) { + struct thread_bootstrap_ctx *boot = arg; + int idx = boot->idx; + struct zvfs_io_thread *slot = &g_engine.thread_pool[idx]; + + spdk_set_thread(slot->thread); + + /* 等待 md 线程完成 blobstore 初始化 */ + const uint64_t deadline_ms = now_mono_ms() + ZVFS_WAIT_TIME * 10; + while (!g_engine.bs) { + if (now_mono_ms() >= deadline_ms) { + SPDK_ERRLOG("io_thread[%d]: wait bs timeout\n", idx); + pthread_mutex_lock(&boot->mu); + boot->rc = -ETIMEDOUT; boot->done = true; + pthread_cond_signal(&boot->cv); + pthread_mutex_unlock(&boot->mu); + return NULL; + } + usleep(1000); + } + + slot->channel = spdk_bs_alloc_io_channel(g_engine.bs); + if (!slot->channel) { + SPDK_ERRLOG("io_thread[%d]: alloc io_channel failed\n", idx); + pthread_mutex_lock(&boot->mu); + boot->rc = -ENOMEM; boot->done = true; + pthread_cond_signal(&boot->cv); + pthread_mutex_unlock(&boot->mu); + return NULL; + } + + pthread_mutex_lock(&boot->mu); + boot->rc = 0; boot->done = true; + pthread_cond_signal(&boot->cv); + 
pthread_mutex_unlock(&boot->mu); + + slot->ready = true; + /* 持续 poll,处理通过 spdk_thread_send_msg 分发的 IO 操作 */ + while (true) { + spdk_thread_poll(slot->thread, 0, 0); + usleep(100); + } + return NULL; +} + +/** =========================================================== + * io_engine_init + * ============================================================ */ +int io_engine_init(const char *bdev_name, const char *json_file, int thread_num) { + if (g_engine.bs && g_engine.thread_pool) return 0; + + /* 1. 初始化 SPDK 环境 */ + struct spdk_env_opts env_opts; + spdk_env_opts_init(&env_opts); + env_opts.name = "zvfs"; + if (spdk_env_init(&env_opts) != 0) { + SPDK_ERRLOG("spdk_env_init failed\n"); + return -1; + } + spdk_log_set_print_level(SPDK_LOG_NOTICE); + spdk_log_set_level(SPDK_LOG_NOTICE); + spdk_log_open(NULL); + + if (spdk_thread_lib_init(NULL, 0) != 0) { + SPDK_ERRLOG("spdk_thread_lib_init failed\n"); + return -1; + } + + /* 2. 分配线程池(至少 1 md + 1 io) */ + if (thread_num < 2) thread_num = 2; + g_engine.thread_count = thread_num; + g_engine.io_thread_count = thread_num - 1; + g_engine.thread_pool = calloc(thread_num, sizeof(struct zvfs_io_thread)); + if (!g_engine.thread_pool) { + SPDK_ERRLOG("calloc thread_pool failed\n"); + return -ENOMEM; + } + + /* 3. 为所有线程预先创建 spdk_thread 对象 */ + for (int i = 0; i < thread_num; i++) { + char name[32]; + if (i == 0) snprintf(name, sizeof(name), "md_thread"); + else snprintf(name, sizeof(name), "io_thread_%d", i); + + g_engine.thread_pool[i].thread = spdk_thread_create(name, NULL); + if (!g_engine.thread_pool[i].thread) { + SPDK_ERRLOG("spdk_thread_create[%d] failed\n", i); + return -ENOMEM; + } + } + + /* 4. 
启动 md 线程,等待其完成 json + bs 初始化 */ + struct thread_bootstrap_ctx md_boot = { + .idx = 0, + .bdev_name = bdev_name, + .json_file = json_file, + .done = false, + .rc = 0, + }; + pthread_mutex_init(&md_boot.mu, NULL); + pthread_cond_init(&md_boot.cv, NULL); + + pthread_t md_tid; + if (pthread_create(&md_tid, NULL, md_poller_fn, &md_boot) != 0) { + SPDK_ERRLOG("pthread_create md_thread failed\n"); + return -1; + } + pthread_detach(md_tid); + g_engine.thread_pool[0].tid = md_tid; + + pthread_mutex_lock(&md_boot.mu); + while (!md_boot.done) pthread_cond_wait(&md_boot.cv, &md_boot.mu); + int md_rc = md_boot.rc; + pthread_mutex_unlock(&md_boot.mu); + pthread_cond_destroy(&md_boot.cv); + pthread_mutex_destroy(&md_boot.mu); + + if (md_rc != 0) { + SPDK_ERRLOG("md init failed: %d\n", md_rc); + return md_rc; + } + + /* 5. 启动所有 io 线程,等待各自就绪 */ + struct thread_bootstrap_ctx *io_boots = + calloc(g_engine.io_thread_count, sizeof(*io_boots)); + if (!io_boots) return -ENOMEM; + + for (int i = 1; i < thread_num; i++) { + struct thread_bootstrap_ctx *b = &io_boots[i - 1]; + b->idx = i; + b->done = false; + b->rc = 0; + pthread_mutex_init(&b->mu, NULL); + pthread_cond_init(&b->cv, NULL); + + pthread_t io_tid; + if (pthread_create(&io_tid, NULL, io_poller_fn, b) != 0) { + SPDK_ERRLOG("pthread_create io_thread[%d] failed\n", i); + free(io_boots); + return -1; + } + pthread_detach(io_tid); + g_engine.thread_pool[i].tid = io_tid; + } + + for (int i = 0; i < g_engine.io_thread_count; i++) { + pthread_mutex_lock(&io_boots[i].mu); + while (!io_boots[i].done) pthread_cond_wait(&io_boots[i].cv, &io_boots[i].mu); + int io_rc = io_boots[i].rc; + pthread_mutex_unlock(&io_boots[i].mu); + pthread_cond_destroy(&io_boots[i].cv); + pthread_mutex_destroy(&io_boots[i].mu); + if (io_rc != 0) { + SPDK_ERRLOG("io_thread[%d] init failed: %d\n", i + 1, io_rc); + free(io_boots); + return io_rc; + } + } + free(io_boots); + + /* 6. 
初始化哈希表锁 */ + g_engine.handle_cache = NULL; + pthread_mutex_init(&g_engine.cache_mu, NULL); + + SPDK_NOTICELOG("io_engine_init done: %d threads (%d io)\n", + thread_num, g_engine.io_thread_count); + return 0; +} + +/** =========================================================== + * 线程选择 + * ============================================================ */ +static struct zvfs_io_thread *get_md_thread(void) { + return &g_engine.thread_pool[0]; +} + +/* Round-robin 选取 io 线程 */ +static struct zvfs_io_thread *pick_io_thread(void) { + if (g_engine.io_thread_count <= 0) return NULL; + static _Atomic uint64_t rr = 0; + uint64_t idx = atomic_fetch_add(&rr, 1); + int slot = (int)(idx % (uint64_t)g_engine.io_thread_count) + 1; /* [1, N-1] */ + return &g_engine.thread_pool[slot]; +} + +/** =========================================================== + * 内部操作上下文定义 + * ============================================================ */ + +/* blob_create 内部需要 create → open → resize → sync_md 四步 */ +struct create_chain_ctx { + struct zvfs_req *req; + spdk_blob_id blob_id; + struct spdk_blob *blob; +}; + +struct open_ctx { + struct zvfs_req *req; + struct spdk_blob *blob; +}; + +struct resize_ctx { + struct zvfs_req *req; + struct zvfs_blob_handle *handle; + uint64_t new_size; +}; + +struct sync_ctx { + struct zvfs_req *req; + struct zvfs_blob_handle *handle; +}; + +struct close_ctx { + struct zvfs_req *req; + struct zvfs_blob_handle *handle; +}; + +struct delete_ctx { + struct zvfs_req *req; +}; + +struct io_ctx { + struct zvfs_req *req; + struct zvfs_blob_handle *handle; + struct spdk_thread *io_thread; + struct spdk_io_channel *channel; + uint64_t lba_off; + uint64_t lba_len; + uint32_t buf_off; +}; + +struct write_autogrow_ctx { + struct io_ctx *ioctx; + uint64_t new_size; +}; + +/** =========================================================== + * blob_create:create → open → resize → sync_md → push resp + * ============================================================ */ +static void 
create_open_cb(void *arg, struct spdk_blob *blob, int bserrno); +static void create_resize_cb(void *arg, int bserrno); +static void create_sync_cb(void *arg, int bserrno); +static void create_open_start_cb(void *arg, spdk_blob_id blobid, int bserrno); + +static void do_blob_create(void *arg) { + struct create_chain_ctx *cctx = arg; + struct spdk_blob_opts opts; + spdk_blob_opts_init(&opts, sizeof(opts)); + spdk_bs_create_blob_ext(g_engine.bs, &opts, create_open_start_cb, cctx); +} + +/* create 完成后立即打开 blob */ +static void create_open_start_cb(void *arg, spdk_blob_id blobid, int bserrno) { + struct create_chain_ctx *cctx = arg; + if (bserrno != 0) { + SPDK_ERRLOG("create_blob failed: %d\n", bserrno); + push_err_resp(cctx->req, bserrno); + free(cctx); + return; + } + cctx->blob_id = blobid; + spdk_bs_open_blob(g_engine.bs, blobid, create_open_cb, cctx); +} + +static void create_open_cb(void *arg, struct spdk_blob *blob, int bserrno) { + struct create_chain_ctx *cctx = arg; + if (bserrno != 0) { + SPDK_ERRLOG("create open_blob failed: %d\n", bserrno); + push_err_resp(cctx->req, bserrno); + free(cctx); + return; + } + cctx->blob = blob; + + uint64_t size_hint = cctx->req->size_hint; + if (size_hint == 0) size_hint = g_engine.cluster_size; + + uint64_t new_clusters = 0; + if (zvfs_calc_ceil_units(size_hint, g_engine.cluster_size, &new_clusters) != 0) { + push_err_resp(cctx->req, -EINVAL); + free(cctx); + return; + } + + spdk_blob_resize(blob, new_clusters, create_resize_cb, cctx); +} + +static void create_resize_cb(void *arg, int bserrno) { + struct create_chain_ctx *cctx = arg; + if (bserrno != 0) { + SPDK_ERRLOG("create resize failed: %d\n", bserrno); + spdk_blob_close(cctx->blob, NULL, NULL); + push_err_resp(cctx->req, bserrno); + free(cctx); + return; + } + spdk_blob_sync_md(cctx->blob, create_sync_cb, cctx); +} + +static void create_sync_cb(void *arg, int bserrno) { + struct create_chain_ctx *cctx = arg; + if (bserrno != 0) { + SPDK_ERRLOG("create sync_md failed: 
%d\n", bserrno); + spdk_blob_close(cctx->blob, NULL, NULL); + push_err_resp(cctx->req, bserrno); + free(cctx); + return; + } + + /* 构造 handle */ + struct zvfs_blob_handle *handle = calloc(1, sizeof(*handle)); + if (!handle) { + spdk_blob_close(cctx->blob, NULL, NULL); + push_err_resp(cctx->req, -ENOMEM); + free(cctx); + return; + } + handle->blob_id = cctx->blob_id; + handle->blob = cctx->blob; + handle->dma_buf_size = ZVFS_DMA_BUF_SIZE; + atomic_init(&handle->ref_count, 1); + handle->dma_buf = spdk_dma_malloc(ZVFS_DMA_BUF_SIZE, g_engine.io_unit_size, NULL); + if (!handle->dma_buf) { + spdk_blob_close(cctx->blob, NULL, NULL); + free(handle); + push_err_resp(cctx->req, -ENOMEM); + free(cctx); + return; + } + + /* 构造响应 */ + struct zvfs_resp *resp = calloc(1, sizeof(*resp)); + if (!resp) { + spdk_dma_free(handle->dma_buf); + spdk_blob_close(cctx->blob, NULL, NULL); + free(handle); + push_err_resp(cctx->req, -ENOMEM); + free(cctx); + return; + } + resp->opcode = cctx->req->opcode; + resp->status = 0; + resp->conn = cctx->req->conn; + resp->blob_id = cctx->blob_id; + + zvfs_handle_id_t handle_id; + if (engine_cache_insert(handle, &handle_id) != 0) { + spdk_dma_free(handle->dma_buf); + spdk_blob_close(cctx->blob, NULL, NULL); + free(handle); + push_err_resp(cctx->req, -ENOMEM); + free(cctx); + return; + } + resp->handle_id = handle_id; + + free(cctx->req); + free(cctx); + CQ_Push(g_cq, resp); +} + +int blob_create(struct zvfs_req *req) { + struct create_chain_ctx *cctx = calloc(1, sizeof(*cctx)); + if (!cctx) { push_err_resp(req, -ENOMEM); return -ENOMEM; } + cctx->req = req; + spdk_thread_send_msg(get_md_thread()->thread, do_blob_create, cctx); + return 0; +} + +/** =========================================================== + * blob_open + * ============================================================ */ +static void blob_open_done_cb(void *arg, struct spdk_blob *blob, int bserrno) { + struct open_ctx *octx = arg; + if (bserrno != 0) { + SPDK_ERRLOG("blob_open failed: 
%d\n", bserrno); + push_err_resp(octx->req, bserrno); + free(octx); + return; + } + + octx->blob = blob; + + struct zvfs_blob_handle *handle = calloc(1, sizeof(*handle)); + if (!handle) { push_err_resp(octx->req, -ENOMEM); free(octx); return; } + handle->blob_id = octx->req->blob_id; + handle->blob = blob; + handle->dma_buf_size = ZVFS_DMA_BUF_SIZE; + atomic_init(&handle->ref_count, 1); + handle->dma_buf = spdk_dma_malloc(ZVFS_DMA_BUF_SIZE, g_engine.io_unit_size, NULL); + if (!handle->dma_buf) { + spdk_blob_close(blob, NULL, NULL); + free(handle); + push_err_resp(octx->req, -ENOMEM); + free(octx); + return; + } + + struct zvfs_resp *resp = calloc(1, sizeof(*resp)); + if (!resp) { + spdk_dma_free(handle->dma_buf); + spdk_blob_close(blob, NULL, NULL); + free(handle); + push_err_resp(octx->req, -ENOMEM); + free(octx); + return; + } + + resp->opcode = octx->req->opcode; + resp->status = 0; + resp->conn = octx->req->conn; + zvfs_handle_id_t handle_id; + if (engine_cache_insert(handle, &handle_id) != 0) { + spdk_dma_free(handle->dma_buf); + spdk_blob_close(blob, NULL, NULL); + free(handle); + push_err_resp(octx->req, -ENOMEM); + free(octx); + return; + } + resp->handle_id = handle_id; + resp->size = spdk_blob_get_num_clusters(octx->blob) * g_engine.cluster_size; + + + free(octx->req); + free(octx); + CQ_Push(g_cq, resp); +} + +static void do_blob_open(void *arg) { + struct open_ctx *octx = arg; + struct spdk_blob_open_opts opts; + spdk_blob_open_opts_init(&opts, sizeof(opts)); + spdk_bs_open_blob_ext(g_engine.bs, octx->req->blob_id, &opts, blob_open_done_cb, octx); +} + +int blob_open(struct zvfs_req *req) { + struct open_ctx *octx = calloc(1, sizeof(*octx)); + if (!octx) { push_err_resp(req, -ENOMEM); return -ENOMEM; } + octx->req = req; + spdk_thread_send_msg(get_md_thread()->thread, do_blob_open, octx); + return 0; +} + +/** =========================================================== + * blob_resize + * ============================================================ */ 
+static void blob_resize_done_cb(void *arg, int bserrno) { + struct resize_ctx *rctx = arg; + + struct zvfs_resp *resp = calloc(1, sizeof(*resp)); + if (!resp) { push_err_resp(rctx->req, -ENOMEM); free(rctx); return; } + resp->opcode = rctx->req->opcode; + resp->status = bserrno; + resp->conn = rctx->req->conn; + + free(rctx->req); + free(rctx); + CQ_Push(g_cq, resp); +} + +static void do_blob_resize(void *arg) { + struct resize_ctx *rctx = arg; + uint64_t new_clusters = 0; + int rc = zvfs_calc_ceil_units(rctx->new_size, g_engine.cluster_size, &new_clusters); + if (rc != 0) { push_err_resp(rctx->req, rc); free(rctx); return; } + spdk_blob_resize(rctx->handle->blob, new_clusters, blob_resize_done_cb, rctx); +} + +int blob_resize(struct zvfs_req *req) { + struct zvfs_blob_handle *handle = req->handle; + if (!handle) { push_err_resp(req, -EINVAL); return -EINVAL; } + + struct resize_ctx *rctx = calloc(1, sizeof(*rctx)); + if (!rctx) { push_err_resp(req, -ENOMEM); return -ENOMEM; } + rctx->req = req; + rctx->handle = handle; + rctx->new_size = req->size_hint; /* 上层通过 size_hint 传入新大小 */ + spdk_thread_send_msg(get_md_thread()->thread, do_blob_resize, rctx); + return 0; +} + +/** =========================================================== + * blob_sync_md + * ============================================================ */ +static void blob_sync_md_done_cb(void *arg, int bserrno) { + struct sync_ctx *sctx = arg; + struct zvfs_resp *resp = calloc(1, sizeof(*resp)); + if (!resp) { push_err_resp(sctx->req, -ENOMEM); free(sctx); return; } + resp->opcode = sctx->req->opcode; + resp->status = bserrno; + resp->conn = sctx->req->conn; + + free(sctx->req); + free(sctx); + CQ_Push(g_cq, resp); +} + +static void do_blob_sync_md(void *arg) { + struct sync_ctx *sctx = arg; + spdk_blob_sync_md(sctx->handle->blob, blob_sync_md_done_cb, sctx); +} + +int blob_sync_md(struct zvfs_req *req) { + struct zvfs_blob_handle *handle = req->handle; + if (!handle) { push_err_resp(req, -EINVAL); 
return -EINVAL; } + + struct sync_ctx *sctx = calloc(1, sizeof(*sctx)); + if (!sctx) { push_err_resp(req, -ENOMEM); return -ENOMEM; } + sctx->req = req; + sctx->handle = handle; + spdk_thread_send_msg(get_md_thread()->thread, do_blob_sync_md, sctx); + return 0; +} + +/** =========================================================== + * blob_close + * ============================================================ */ +static void blob_close_done_cb(void *arg, int bserrno) { + struct close_ctx *cctx = arg; + if (bserrno == 0) { + engine_cache_remove((zvfs_handle_id_t)(uintptr_t)cctx->handle); + spdk_dma_free(cctx->handle->dma_buf); + free(cctx->handle); + } + + struct zvfs_resp *resp = calloc(1, sizeof(*resp)); + if (!resp) { push_err_resp(cctx->req, -ENOMEM); free(cctx); return; } + resp->opcode = cctx->req->opcode; + resp->status = bserrno; + resp->conn = cctx->req->conn; + + free(cctx->req); + free(cctx); + CQ_Push(g_cq, resp); +} + +static void do_blob_close(void *arg) { + struct close_ctx *cctx = arg; + spdk_blob_close(cctx->handle->blob, blob_close_done_cb, cctx); +} + +int blob_close(struct zvfs_req *req) { + struct zvfs_blob_handle *handle = req->handle; + if (!handle) { push_err_resp(req, -EINVAL); return -EINVAL; } + + while (1) { + unsigned int old_ref = atomic_load(&handle->ref_count); + if (old_ref == 0) { + push_err_resp(req, -EINVAL); + return -EINVAL; + } + if (atomic_compare_exchange_weak(&handle->ref_count, &old_ref, old_ref - 1)) { + if (old_ref > 1) { + struct zvfs_resp *resp = calloc(1, sizeof(*resp)); + if (!resp) { + push_err_resp(req, -ENOMEM); + return -ENOMEM; + } + resp->opcode = req->opcode; + resp->status = 0; + resp->conn = req->conn; + free(req); + CQ_Push(g_cq, resp); + return 0; + } + break; + } + } + + struct close_ctx *cctx = calloc(1, sizeof(*cctx)); + if (!cctx) { push_err_resp(req, -ENOMEM); return -ENOMEM; } + cctx->req = req; + cctx->handle = handle; + spdk_thread_send_msg(get_md_thread()->thread, do_blob_close, cctx); + return 0; 
+} + +/** =========================================================== + * blob_delete + * ============================================================ */ +static void blob_delete_done_cb(void *arg, int bserrno) { + struct delete_ctx *dctx = arg; + struct zvfs_resp *resp = calloc(1, sizeof(*resp)); + if (!resp) { push_err_resp(dctx->req, -ENOMEM); free(dctx); return; } + resp->opcode = dctx->req->opcode; + resp->status = bserrno; + resp->conn = dctx->req->conn; + + free(dctx->req); + free(dctx); + CQ_Push(g_cq, resp); +} + +static void do_blob_delete(void *arg) { + struct delete_ctx *dctx = arg; + spdk_bs_delete_blob(g_engine.bs, dctx->req->blob_id, blob_delete_done_cb, dctx); +} + +int blob_delete(struct zvfs_req *req) { + struct delete_ctx *dctx = calloc(1, sizeof(*dctx)); + if (!dctx) { push_err_resp(req, -ENOMEM); return -ENOMEM; } + dctx->req = req; + spdk_thread_send_msg(get_md_thread()->thread, do_blob_delete, dctx); + return 0; +} + +/** =========================================================== + * blob_read + * IO 操作分发到 io 线程,通过该线程专属的 channel 执行 + * ============================================================ */ +static void blob_read_done_cb(void *arg, int bserrno) { + struct io_ctx *ioctx = arg; + if (bserrno != 0) { + push_err_resp(ioctx->req, bserrno); + free(ioctx); + return; + } + + /* 从 dma_buf 拷贝到用户 buf */ + memcpy(ioctx->req->data, + (uint8_t *)ioctx->handle->dma_buf + ioctx->buf_off, + ioctx->req->length); + + struct zvfs_resp *resp = calloc(1, sizeof(*resp)); + if (!resp) { push_err_resp(ioctx->req, -ENOMEM); free(ioctx); return; } + resp->opcode = ioctx->req->opcode; + resp->status = 0; + resp->conn = ioctx->req->conn; + resp->length = ioctx->req->length; + resp->data = ioctx->req->data; + + free(ioctx->req); + free(ioctx); + CQ_Push(g_cq, resp); +} + +static void do_blob_read(void *arg) { + struct io_ctx *ioctx = arg; + struct zvfs_blob_handle *handle = ioctx->handle; + + uint64_t cur_size = spdk_blob_get_num_clusters(handle->blob) * 
g_engine.cluster_size; + if (ioctx->req->offset + ioctx->req->length > cur_size) { + SPDK_ERRLOG("blob_read out of range: offset=%lu len=%lu size=%lu\n", + ioctx->req->offset, ioctx->req->length, cur_size); + push_err_resp(ioctx->req, -ERANGE); + free(ioctx); + return; + } + + uint64_t lba_off = 0, lba_len = 0; + uint32_t buf_off = 0; + int rc = zvfs_calc_io_units(ioctx->req->offset, ioctx->req->length, + g_engine.io_unit_size, &lba_off, &lba_len, &buf_off); + if (rc != 0) { push_err_resp(ioctx->req, rc); free(ioctx); return; } + + if (lba_len * g_engine.io_unit_size > ZVFS_DMA_BUF_SIZE) { + SPDK_ERRLOG("blob_read: aligned size exceeds DMA buf\n"); + push_err_resp(ioctx->req, -ENOSPC); + free(ioctx); + return; + } + + ioctx->lba_off = lba_off; + ioctx->lba_len = lba_len; + ioctx->buf_off = buf_off; + + spdk_blob_io_read(handle->blob, ioctx->channel, handle->dma_buf, + lba_off, lba_len, blob_read_done_cb, ioctx); +} + +int blob_read(struct zvfs_req *req) { + struct zvfs_blob_handle *handle = req->handle; + if (!handle || !req->data) { push_err_resp(req, -EINVAL); return -EINVAL; } + + struct zvfs_io_thread *iot = pick_io_thread(); + if (!iot || !iot->ready || !iot->channel) { + push_err_resp(req, -EIO); return -EIO; + } + + struct io_ctx *ioctx = calloc(1, sizeof(*ioctx)); + if (!ioctx) { push_err_resp(req, -ENOMEM); return -ENOMEM; } + ioctx->req = req; + ioctx->handle = handle; + ioctx->channel = iot->channel; + + spdk_thread_send_msg(iot->thread, do_blob_read, ioctx); + return 0; +} + +/** =========================================================== + * blob_write(read-modify-write) + * ============================================================ */ +static void blob_write_writephase_cb(void *arg, int bserrno) { + struct io_ctx *ioctx = arg; + if (bserrno != 0) { + push_err_resp(ioctx->req, bserrno); + free(ioctx); + return; + } + + struct zvfs_resp *resp = calloc(1, sizeof(*resp)); + if (!resp) { push_err_resp(ioctx->req, -ENOMEM); free(ioctx); return; } + 
resp->opcode = ioctx->req->opcode; + resp->status = 0; + resp->conn = ioctx->req->conn; + resp->bytes_written = ioctx->req->length; + + free(ioctx->req); + free(ioctx); + CQ_Push(g_cq, resp); +} + +static void blob_write_readphase_cb(void *arg, int bserrno) { + struct io_ctx *ioctx = arg; + if (bserrno != 0) { + push_err_resp(ioctx->req, bserrno); + free(ioctx); + return; + } + + /* read-modify: 将用户数据覆盖到 dma_buf 的对应区域 */ + memcpy((uint8_t *)ioctx->handle->dma_buf + ioctx->buf_off, + ioctx->req->data, ioctx->req->length); + + spdk_blob_io_write(ioctx->handle->blob, ioctx->channel, ioctx->handle->dma_buf, + ioctx->lba_off, ioctx->lba_len, blob_write_writephase_cb, ioctx); +} + +static void do_blob_write(void *arg) { + struct io_ctx *ioctx = arg; + struct zvfs_blob_handle *handle = ioctx->handle; + + uint64_t end = 0; + if (__builtin_add_overflow(ioctx->req->offset, ioctx->req->length, &end)) { + push_err_resp(ioctx->req, -EOVERFLOW); + free(ioctx); + return; + } + + uint64_t cur_size = spdk_blob_get_num_clusters(handle->blob) * g_engine.cluster_size; + if (end > cur_size) { + SPDK_ERRLOG("blob_write out of range: offset=%lu len=%lu size=%lu\n", + ioctx->req->offset, ioctx->req->length, cur_size); + push_err_resp(ioctx->req, -ENOSPC); + free(ioctx); + return; + } + + uint64_t lba_off = 0, lba_len = 0; + uint32_t buf_off = 0; + int rc = zvfs_calc_io_units(ioctx->req->offset, ioctx->req->length, + g_engine.io_unit_size, &lba_off, &lba_len, &buf_off); + if (rc != 0) { push_err_resp(ioctx->req, rc); free(ioctx); return; } + + if (lba_len * g_engine.io_unit_size > ZVFS_DMA_BUF_SIZE) { + SPDK_ERRLOG("blob_write: aligned size exceeds DMA buf\n"); + push_err_resp(ioctx->req, -ENOSPC); + free(ioctx); + return; + } + + ioctx->lba_off = lba_off; + ioctx->lba_len = lba_len; + ioctx->buf_off = buf_off; + + /* 先读出完整的对齐块,再 modify,再写回 */ + spdk_blob_io_read(handle->blob, ioctx->channel, handle->dma_buf, + lba_off, lba_len, blob_write_readphase_cb, ioctx); +} + +static void 
write_autogrow_resize_cb(void *arg, int bserrno) { + struct write_autogrow_ctx *wctx = arg; + if (bserrno != 0) { + push_err_resp(wctx->ioctx->req, bserrno); + free(wctx->ioctx); + free(wctx); + return; + } + + spdk_thread_send_msg(wctx->ioctx->io_thread, do_blob_write, wctx->ioctx); + free(wctx); +} + +static void do_write_autogrow_resize(void *arg) { + struct write_autogrow_ctx *wctx = arg; + uint64_t new_clusters = 0; + int rc = zvfs_calc_ceil_units(wctx->new_size, g_engine.cluster_size, &new_clusters); + if (rc != 0) { + push_err_resp(wctx->ioctx->req, rc); + free(wctx->ioctx); + free(wctx); + return; + } + spdk_blob_resize(wctx->ioctx->handle->blob, new_clusters, write_autogrow_resize_cb, wctx); +} + +int blob_write(struct zvfs_req *req) { + struct zvfs_blob_handle *handle = req->handle; + if (!handle || !req->data) { push_err_resp(req, -EINVAL); return -EINVAL; } + + struct zvfs_io_thread *iot = pick_io_thread(); + if (!iot || !iot->ready || !iot->channel) { + push_err_resp(req, -EIO); return -EIO; + } + + struct io_ctx *ioctx = calloc(1, sizeof(*ioctx)); + if (!ioctx) { push_err_resp(req, -ENOMEM); return -ENOMEM; } + ioctx->req = req; + ioctx->handle = handle; + ioctx->io_thread = iot->thread; + ioctx->channel = iot->channel; + + uint64_t end = 0; + if (__builtin_add_overflow(req->offset, req->length, &end)) { + free(ioctx); + push_err_resp(req, -EOVERFLOW); + return -EOVERFLOW; + } + + uint64_t cur_size = spdk_blob_get_num_clusters(handle->blob) * g_engine.cluster_size; + if (end > cur_size) { + if ((req->write_flags & ZVFS_WRITE_F_AUTO_GROW) == 0) { + free(ioctx); + push_err_resp(req, -ENOSPC); + return -ENOSPC; + } + + struct write_autogrow_ctx *wctx = calloc(1, sizeof(*wctx)); + if (!wctx) { + free(ioctx); + push_err_resp(req, -ENOMEM); + return -ENOMEM; + } + wctx->ioctx = ioctx; + wctx->new_size = end; + spdk_thread_send_msg(get_md_thread()->thread, do_write_autogrow_resize, wctx); + return 0; + } + + spdk_thread_send_msg(iot->thread, do_blob_write, 
ioctx);
+    return 0;
+}
diff --git a/src/daemon/spdk_engine.h b/src/daemon/spdk_engine.h
new file mode 100644
index 0000000..27b4620
--- /dev/null
+++ b/src/daemon/spdk_engine.h
@@ -0,0 +1,68 @@
+#ifndef __ZVFS_SPDK_ENGINE_H__
+#define __ZVFS_SPDK_ENGINE_H__
+
+#include "common/uthash.h"
+#include "proto/ipc_proto.h"
+#include <spdk/blob.h>
+#include <spdk/thread.h>
+#include <pthread.h>
+#include <stdatomic.h>
+
+// blob_handle 结构体:底层 blob 信息,不含文件级 size(上层维护)
+typedef struct zvfs_blob_handle {
+    spdk_blob_id blob_id;
+    struct spdk_blob *blob;
+    void *dma_buf;
+    uint64_t dma_buf_size;
+    atomic_uint ref_count;
+} zvfs_blob_handle_t;
+
+struct zvfs_io_thread {
+    struct spdk_thread *thread;
+    struct spdk_io_channel *channel;   // 每个 io 线程独占一个 channel
+    pthread_t tid;
+    bool ready;
+};
+
+typedef uint64_t zvfs_handle_id_t;
+
+struct zvfs_blob_cache_entry {
+    zvfs_handle_id_t handle_id;        // key != blob_id
+    struct zvfs_blob_handle *handle;
+    UT_hash_handle hh;
+};
+
+typedef struct zvfs_spdk_io_engine {
+    struct spdk_bs_dev *bs_dev;
+    struct spdk_blob_store *bs;
+
+    /* 线程池:thread_pool[0] 固定为 md 线程,其余为 io 线程 */
+    struct zvfs_io_thread *thread_pool;           // 线程池
+    int thread_count;                             // 总线程数(由 thread_num 指定)
+    int io_thread_count;                          // io 线程数(= thread_count - 1)
+
+    struct zvfs_blob_cache_entry *handle_cache;   // handle_id -> handle 映射
+    pthread_mutex_t cache_mu;
+
+    uint64_t io_unit_size;
+    uint64_t cluster_size;
+} zvfs_spdk_io_engine_t;
+
+int engine_cache_insert(struct zvfs_blob_handle *handle, zvfs_handle_id_t *out_id);
+struct zvfs_blob_handle *engine_cache_lookup(zvfs_handle_id_t handle_id);
+void engine_cache_remove(zvfs_handle_id_t handle_id);
+
+int io_engine_init(const char *bdev_name, const char *json_file, int thread_num);
+int blob_create(struct zvfs_req *req);
+int blob_open(struct zvfs_req *req);
+int blob_write(struct zvfs_req *req);
+int blob_read(struct zvfs_req *req);
+int blob_resize(struct zvfs_req *req);
+int blob_sync_md(struct zvfs_req *req);
+int blob_close(struct zvfs_req *req);
+int blob_delete(struct zvfs_req *req);
+
+#endif // 
__ZVFS_SPDK_ENGINE_H__
diff --git a/src/daemon/spdk_engine_wrapper.c b/src/daemon/spdk_engine_wrapper.c
new file mode 100644
index 0000000..65b61b8
--- /dev/null
+++ b/src/daemon/spdk_engine_wrapper.c
@@ -0,0 +1,210 @@
+#include "spdk_engine_wrapper.h"
+#include "spdk_engine.h"
+#include "ipc_cq.h"
+#include <spdk/log.h>
+
+extern struct zvfs_spdk_io_engine g_engine;
+
+/** cq op */
+static void push_err_resp(struct zvfs_req *req, int status) {
+    struct zvfs_resp *resp = calloc(1, sizeof(*resp));
+    if (!resp) {
+        SPDK_ERRLOG("push_err_resp: calloc failed, op_code=%u\n", req->opcode);
+        if (req->data) free(req->data);
+        if (req->add_ref_items) free(req->add_ref_items);
+        free(req);
+        return;
+    }
+    resp->opcode = req->opcode;
+    resp->conn = req->conn;
+    resp->status = status;
+    if (req->data) free(req->data);
+    if (req->add_ref_items) free(req->add_ref_items);
+    free(req);
+    CQ_Push(g_cq, resp);
+}
+
+static void push_ok_resp(struct zvfs_req *req) {
+    struct zvfs_resp *resp = calloc(1, sizeof(*resp));
+    if (!resp) {
+        SPDK_ERRLOG("push_ok_resp: calloc failed, op_code=%u\n", req->opcode);
+        if (req->data) free(req->data);
+        if (req->add_ref_items) free(req->add_ref_items);
+        free(req);
+        return;
+    }
+    resp->opcode = req->opcode;
+    resp->conn = req->conn;
+    resp->status = 0;
+    if (req->data) free(req->data);
+    if (req->add_ref_items) free(req->add_ref_items);
+    free(req);
+    CQ_Push(g_cq, resp);
+}
+
+/** hash map op */
+int engine_cache_insert(struct zvfs_blob_handle *handle, zvfs_handle_id_t *out_id) {
+    struct zvfs_blob_cache_entry *entry = calloc(1, sizeof(*entry));
+    if (!entry) return -ENOMEM;
+    entry->handle_id = (zvfs_handle_id_t)(uintptr_t)handle;
+    entry->handle = handle;
+    pthread_mutex_lock(&g_engine.cache_mu);
+    HASH_ADD(hh, g_engine.handle_cache, handle_id, sizeof(zvfs_handle_id_t), entry);
+    pthread_mutex_unlock(&g_engine.cache_mu);
+    *out_id = entry->handle_id;
+    return 0;
+}
+
+struct zvfs_blob_handle *engine_cache_lookup(zvfs_handle_id_t handle_id) {
+    struct 
zvfs_blob_cache_entry *entry = NULL; + pthread_mutex_lock(&g_engine.cache_mu); + HASH_FIND(hh, g_engine.handle_cache, &handle_id, sizeof(zvfs_handle_id_t), entry); + pthread_mutex_unlock(&g_engine.cache_mu); + return entry ? entry->handle : NULL; +} + +void engine_cache_remove(zvfs_handle_id_t handle_id) { + struct zvfs_blob_cache_entry *entry = NULL; + pthread_mutex_lock(&g_engine.cache_mu); + HASH_FIND(hh, g_engine.handle_cache, &handle_id, sizeof(zvfs_handle_id_t), entry); + if (entry) { HASH_DEL(g_engine.handle_cache, entry); free(entry); } + pthread_mutex_unlock(&g_engine.cache_mu); +} + +static int fill_handle(struct zvfs_req *req, const char *op) { + struct zvfs_blob_handle *handle = engine_cache_lookup(req->handle_id); + if (!handle) { + SPDK_ERRLOG("%s: invalid handle_id=%lu\n", op, req->handle_id); + push_err_resp(req, -EBADF); + return -EBADF; + } + req->handle = handle; + return 0; +} + + + +// zvfs wrapper + +int zvfs_engine_init(const char *bdev_name, const char *json_file, int thread_num) { + return io_engine_init(bdev_name, json_file, thread_num); +} + +/* create / open:handle 在 engine 回调里注册,wrapper 直接透传 */ +static int zvfs_create(struct zvfs_req *req) { + return blob_create(req); +} + +static int zvfs_open(struct zvfs_req *req) { + return blob_open(req); +} + +/* delete:只需要 blob_id,无需 handle */ +static int zvfs_delete(struct zvfs_req *req) { + return blob_delete(req); +} + +/* 以下操作需要先填充 handle */ +static int zvfs_write(struct zvfs_req *req) { + if (fill_handle(req, "zvfs_write") != 0) return -EBADF; + return blob_write(req); +} + +static int zvfs_read(struct zvfs_req *req) { + if (fill_handle(req, "zvfs_read") != 0) return -EBADF; + return blob_read(req); +} + +static int zvfs_resize(struct zvfs_req *req) { + if (fill_handle(req, "zvfs_resize") != 0) return -EBADF; + return blob_resize(req); +} + +static int zvfs_sync_md(struct zvfs_req *req) { + if (fill_handle(req, "zvfs_sync_md") != 0) return -EBADF; + return blob_sync_md(req); +} + +/* 
close:fill_handle 之后 engine 回调里会同步 cache_remove */ +static int zvfs_close(struct zvfs_req *req) { + if (fill_handle(req, "zvfs_close") != 0) return -EBADF; + return blob_close(req); +} + +static int zvfs_add_ref(struct zvfs_req *req) { + if (req->ref_delta == 0) { + push_err_resp(req, -EINVAL); + return -EINVAL; + } + if (fill_handle(req, "zvfs_add_ref") != 0) return -EBADF; + atomic_fetch_add(&req->handle->ref_count, req->ref_delta); + push_ok_resp(req); + return 0; +} + +static int zvfs_add_ref_batch(struct zvfs_req *req) { + int rc = 0; + uint32_t i = 0; + + if (req->add_ref_count == 0 || !req->add_ref_items) { + push_err_resp(req, -EINVAL); + return -EINVAL; + } + + /* TODO: 当前为功能优先的非原子批量加引用实现。 */ + for (i = 0; i < req->add_ref_count; i++) { + struct zvfs_add_ref_item *item = &req->add_ref_items[i]; + struct zvfs_blob_handle *handle = NULL; + + if (item->ref_delta == 0) { + rc = -EINVAL; + continue; + } + + handle = engine_cache_lookup(item->handle_id); + if (!handle) { + rc = -EBADF; + continue; + } + + atomic_fetch_add(&handle->ref_count, item->ref_delta); + } + + if (rc != 0) { + push_err_resp(req, rc); + return rc; + } + + push_ok_resp(req); + return 0; +} + +int dispatch_to_worker(struct zvfs_req *req){ + switch (req->opcode) + { + case ZVFS_OP_CREATE: + return zvfs_create(req); + case ZVFS_OP_OPEN: + return zvfs_open(req); + case ZVFS_OP_READ: + return zvfs_read(req); + case ZVFS_OP_WRITE: + return zvfs_write(req); + case ZVFS_OP_RESIZE: + return zvfs_resize(req); + case ZVFS_OP_SYNC_MD: + return zvfs_sync_md(req); + case ZVFS_OP_CLOSE: + return zvfs_close(req); + case ZVFS_OP_DELETE: + return zvfs_delete(req); + case ZVFS_OP_ADD_REF: + return zvfs_add_ref(req); + case ZVFS_OP_ADD_REF_BATCH: + return zvfs_add_ref_batch(req); + default: + break; + } + + return -1; +} diff --git a/src/daemon/spdk_engine_wrapper.h b/src/daemon/spdk_engine_wrapper.h new file mode 100644 index 0000000..2d5f2d9 --- /dev/null +++ b/src/daemon/spdk_engine_wrapper.h @@ -0,0 +1,13 
@@ +#ifndef __ZVFS_ENGINE_H__ +#define __ZVFS_ENGINE_H__ + +#include "proto/ipc_proto.h" + + + +int zvfs_engine_init(const char *bdev_name, const char *json_file, int thread_num); + + +int dispatch_to_worker(struct zvfs_req *req); + +#endif \ No newline at end of file diff --git a/src/daemon/zvfs_daemon b/src/daemon/zvfs_daemon new file mode 100755 index 0000000..cd747a9 Binary files /dev/null and b/src/daemon/zvfs_daemon differ diff --git a/src/fs/zvfs.c b/src/fs/zvfs.c index d9306fe..2075a03 100644 --- a/src/fs/zvfs.c +++ b/src/fs/zvfs.c @@ -1,7 +1,8 @@ #ifndef _GNU_SOURCE #define _GNU_SOURCE #endif -#include "config.h" + +#include "common/config.h" #include "common/utils.h" #include "fs/zvfs.h" #include "fs/zvfs_inode.h" @@ -10,6 +11,7 @@ #include #include +#include struct zvfs_fs g_fs = {0}; /* ------------------------------------------------------------------ */ diff --git a/src/fs/zvfs_inode.c b/src/fs/zvfs_inode.c index b0198e2..55fe736 100644 --- a/src/fs/zvfs_inode.c +++ b/src/fs/zvfs_inode.c @@ -67,10 +67,11 @@ void inode_remove(uint64_t blob_id) { /* size / timestamp helpers (调用方持有 inode->mu) */ /* ------------------------------------------------------------------ */ -void inode_update_size(struct zvfs_inode *inode, int real_fd, uint64_t new_size) { +int inode_update_size(struct zvfs_inode *inode, int real_fd, uint64_t new_size) { inode->logical_size = new_size; if (real_fd >= 0) - ftruncate(real_fd, (off_t)new_size); /* 同步 st_size,忽略错误 */ + return ftruncate(real_fd, (off_t)new_size); /* 同步 st_size,忽略错误 */ + return 0; } void inode_touch_atime(struct zvfs_inode *inode) { diff --git a/src/fs/zvfs_inode.h b/src/fs/zvfs_inode.h index bc4334e..5950aad 100644 --- a/src/fs/zvfs_inode.h +++ b/src/fs/zvfs_inode.h @@ -49,7 +49,7 @@ void inode_remove(uint64_t blob_id); // 更新 logical_size,同时负责调用 ftruncate 同步 st_size // 需持有 inode->mu -void inode_update_size(struct zvfs_inode *inode, int real_fd, uint64_t new_size); +int inode_update_size(struct zvfs_inode *inode, int 
real_fd, uint64_t new_size); // 更新时间戳(需持有 inode->mu) void inode_touch_atime(struct zvfs_inode *inode); diff --git a/src/fs/zvfs_open_file.c b/src/fs/zvfs_open_file.c index 178910c..f328e63 100644 --- a/src/fs/zvfs_open_file.c +++ b/src/fs/zvfs_open_file.c @@ -15,19 +15,18 @@ struct zvfs_open_file *openfile_alloc(int fd, struct zvfs_inode *inode, int flags, - struct zvfs_blob_handle *handle) + uint64_t handle_id) { struct zvfs_open_file *of = calloc(1, sizeof(*of)); if (!of) return NULL; - of->fd = fd; - of->inode = inode; - of->handle = handle; - of->flags = flags; - of->fd_flags = 0; - of->offset = 0; - atomic_init(&of->ref_count, 1); + of->fd = fd; + of->inode = inode; + of->handle_id = handle_id; + of->flags = flags; + of->fd_flags = 0; + of->offset = 0; return of; } @@ -94,4 +93,4 @@ uint64_t openfile_seek(struct zvfs_open_file *of, int64_t offset, int whence) of->offset = (uint64_t)new_off; return of->offset; -} \ No newline at end of file +} diff --git a/src/fs/zvfs_open_file.h b/src/fs/zvfs_open_file.h index de47c1e..7b2fd98 100644 --- a/src/fs/zvfs_open_file.h +++ b/src/fs/zvfs_open_file.h @@ -3,33 +3,26 @@ #include "common/uthash.h" #include "spdk_engine/io_engine.h" -#include #include -#ifndef SPDK_BLOB_ID_DEFINED -typedef uint64_t spdk_blob_id; -#define SPDK_BLOB_ID_DEFINED -#endif - struct zvfs_open_file { int fd; // key,和真实 fd 1:1 struct zvfs_inode *inode; - struct zvfs_blob_handle *handle; + uint64_t handle_id; int flags; int fd_flags; uint64_t offset; // 非 APPEND 模式的当前位置 - atomic_int ref_count; // dup / close 用 UT_hash_handle hh; }; -// 分配 openfile,不插入全局表,ref_count 初始为 1 +// 分配 openfile,不插入全局表 struct zvfs_open_file *openfile_alloc(int fd, struct zvfs_inode *inode, - int flags, struct zvfs_blob_handle *handle); + int flags, uint64_t handle_id); -// 释放内存(调用前确保 ref_count == 0,不负责 blob_close) +// 释放内存 void openfile_free(struct zvfs_open_file *of); // 插入全局表(需持有 fd_mu) @@ -45,4 +38,4 @@ void openfile_remove(int fd); // 需持有 of->inode->mu(读 logical_size) 
 uint64_t openfile_seek(struct zvfs_open_file *of, int64_t offset, int whence);
 
-#endif // __ZVFS_OPEN_FILE_H__
\ No newline at end of file
+#endif // __ZVFS_OPEN_FILE_H__
diff --git a/src/fs/zvfs_sys_init.c b/src/fs/zvfs_sys_init.c
index 6f9375d..57cb53f 100644
--- a/src/fs/zvfs_sys_init.c
+++ b/src/fs/zvfs_sys_init.c
@@ -2,7 +2,8 @@
 #ifndef _GNU_SOURCE
 #define _GNU_SOURCE
 #endif
-#include "config.h"
+
+#include "common/config.h"
 #include "zvfs_sys_init.h"
 #include "fs/zvfs.h"              // zvfs_fs_init
 #include "spdk_engine/io_engine.h"
@@ -17,17 +18,6 @@ static int _init_ok = 0;
 
 static void do_init(void)
 {
-    const char *bdev = getenv("ZVFS_BDEV");
-    if (!bdev) {
-        bdev = ZVFS_BDEV;
-        fprintf(stderr, "[zvfs] ZVFS_BDEV not set, set as (%s)\n", ZVFS_BDEV);
-    }
-
-    if (io_engine_init(bdev) != 0) {
-        fprintf(stderr, "[zvfs] FATAL: io_engine_init(%s) failed\n", bdev);
-        abort();
-    }
-
     _init_ok = 1;
 }
diff --git a/src/hook/zvfs_hook_fcntl.c b/src/hook/zvfs_hook_fcntl.c
index 14646b9..186d11b 100644
--- a/src/hook/zvfs_hook_fcntl.c
+++ b/src/hook/zvfs_hook_fcntl.c
@@ -68,9 +68,19 @@ zvfs_fcntl_impl(int fd, int cmd, va_list ap)
     /* ---- dup 类 -------------------------------------------------- */
     case F_DUPFD:
     case F_DUPFD_CLOEXEC: {
-        (void)va_arg(ap, int);
-        errno = ENOTSUP;
-        return -1;
+        int minfd = va_arg(ap, int);
+        int newfd = real_fcntl(fd, cmd, minfd);
+        if (newfd < 0)
+            return -1;
+
+        int new_fd_flags = (cmd == F_DUPFD_CLOEXEC) ? FD_CLOEXEC : 0;
+        if (zvfs_dup_attach_newfd(fd, newfd, new_fd_flags) < 0) {
+            int saved = errno;
+            real_close(newfd);
+            errno = saved;
+            return -1;
+        }
+        return newfd;
     }
 
     /* ---- 文件锁(不实现,假装无锁)-------------------------------- */
diff --git a/src/hook/zvfs_hook_fd.c b/src/hook/zvfs_hook_fd.c
index 771d5e6..0bb1807 100644
--- a/src/hook/zvfs_hook_fd.c
+++ b/src/hook/zvfs_hook_fd.c
@@ -19,6 +19,91 @@
 #include
 #include
 
+/* ------------------------------------------------------------------ */
+/* 内部:路径判定辅助                                                  */
+/* ------------------------------------------------------------------ */
+
+/**
+ * open/openat 的目标可能经符号链接跳转到 /zvfs 下,只比较原始路径会漏捕获。
+ *
+ * 1. 先判断原始路径是不是 /zvfs
+ * 2. 再判断 realpath 解析结果是不是 /zvfs
+ * 3. 如果带 O_CREAT 且目标不存在,realpath 什么也拿不到。此时先解析父目录,
+ *    再拼接文件名,看结果是否落在 /zvfs
+ */
+static int
+zvfs_classify_path(const char *abspath, int may_create,
+                   char *normalized_out, size_t out_size)
+{
+    char resolved[PATH_MAX];
+    char tmp[PATH_MAX];
+    char parent[PATH_MAX];
+    char candidate[PATH_MAX];
+    const char *name;
+    char *slash;
+    int n;
+
+    if (!abspath || !normalized_out || out_size == 0) {
+        return 0;
+    }
+
+    strncpy(normalized_out, abspath, out_size);
+    normalized_out[out_size - 1] = '\0';
+
+    if (zvfs_is_zvfs_path(abspath)) {
+        return 1;
+    }
+
+    if (realpath(abspath, resolved) != NULL) {
+        if (zvfs_is_zvfs_path(resolved)) {
+            strncpy(normalized_out, resolved, out_size);
+            normalized_out[out_size - 1] = '\0';
+            return 1;
+        }
+        return 0;
+    }
+
+    if (!may_create) {
+        return 0;
+    }
+
+    strncpy(tmp, abspath, sizeof(tmp));
+    tmp[sizeof(tmp) - 1] = '\0';
+    slash = strrchr(tmp, '/');
+    if (!slash) {
+        return 0;
+    }
+
+    name = slash + 1;
+    if (*name == '\0') {
+        return 0;
+    }
+
+    if (slash == tmp) {
+        strcpy(parent, "/");
+    } else {
+        *slash = '\0';
+        strncpy(parent, tmp, sizeof(parent));
+        parent[sizeof(parent) - 1] = '\0';
+    }
+
+    if (realpath(parent, resolved) == NULL) {
+        return 0;
+    }
+
+    n = snprintf(candidate, sizeof(candidate), "%s/%s", resolved, name);
+    if (n <= 0 || (size_t)n >= sizeof(candidate)) {
+        return 0;
+    }
+
+    if (!zvfs_is_zvfs_path(candidate)) {
+        return 0;
+    }
+
+    strncpy(normalized_out, candidate, out_size);
+    normalized_out[out_size - 1] = '\0';
+    return 1;
+}
+
 /* ------------------------------------------------------------------ */
 /* 内部:open 的核心逻辑(路径已解析为绝对路径)                        */
 /* ------------------------------------------------------------------ */
@@ -36,16 +121,15 @@
 static int
 zvfs_open_impl(int real_fd, const char *abspath, int flags, mode_t mode)
 {
-    struct zvfs_inode *inode = NULL;
-    struct zvfs_blob_handle *handle = NULL;
-    uint64_t blob_id = 0;
+    struct zvfs_inode *inode = NULL;
+    uint64_t blob_id = 0;
+    uint64_t handle_id = 0;
 
     if (flags & O_CREAT) {
         /* ---- 创建路径 -------------------------------------------- */
 
         /* 1. 创建 blob */
-        handle = blob_create(0);
-        if (!handle) {
+        if (blob_create(0, &blob_id, &handle_id) != 0) {
             int saved = errno;
             if (saved == 0) saved = EIO;
             fprintf(stderr,
@@ -54,7 +138,6 @@ zvfs_open_impl(int real_fd, const char *abspath, int flags, mode_t mode)
             errno = saved;
             goto fail;
         }
-        blob_id = handle->id;
 
        /* 2. 把 blob_id 写入真实文件的 xattr */
         if (zvfs_xattr_write_blob_id(real_fd, blob_id) < 0)
             goto fail;
@@ -88,8 +171,10 @@ zvfs_open_impl(int real_fd, const char *abspath, int flags, mode_t mode)
         if (inode) {
             /* path_cache 命中:直接用缓存的 inode,重新 blob_open */
             blob_id = inode->blob_id;
-            handle = blob_open(blob_id);
-            if (!handle) { if (errno == 0) errno = EIO; goto fail; }
+            if (blob_open(blob_id, &handle_id) != 0) {
+                if (errno == 0) errno = EIO;
+                goto fail;
+            }
 
             /* 共享 inode,增加引用 */
             atomic_fetch_add(&inode->ref_count, 1);
@@ -106,6 +191,10 @@ zvfs_open_impl(int real_fd, const char *abspath, int flags, mode_t mode)
             pthread_mutex_unlock(&g_fs.inode_mu);
 
             if (inode) {
+                if (blob_open(blob_id, &handle_id) != 0) {
+                    if (errno == 0) errno = EIO;
+                    goto fail;
+                }
                 atomic_fetch_add(&inode->ref_count, 1);
             } else {
                 /* 全新 inode:需从真实文件 stat 获取 mode/size */
@@ -123,15 +212,16 @@ zvfs_open_impl(int real_fd, const char *abspath, int flags, mode_t mode)
                 pthread_mutex_lock(&g_fs.path_mu);
                 path_cache_insert(abspath, inode);
                 pthread_mutex_unlock(&g_fs.path_mu);
+                if (blob_open(blob_id, &handle_id) != 0) {
+                    if (errno == 0) errno = EIO;
+                    goto fail;
+                }
             }
-
-            handle = blob_open(blob_id);
-            if (!handle) { if (errno == 0) errno = EIO; goto fail; }
         }
     }
 
     /* ---- 分配 openfile,插入 fd_table ---------------------------- */
-    struct zvfs_open_file *of = openfile_alloc(real_fd, inode, flags, handle);
+    struct zvfs_open_file *of = openfile_alloc(real_fd, inode, flags, handle_id);
     if (!of) { errno = ENOMEM; goto fail_handle; }
 
     pthread_mutex_lock(&g_fs.fd_mu);
@@ -141,7 +231,9 @@ zvfs_open_impl(int real_fd, const char *abspath, int flags, mode_t mode)
     return real_fd;
 
 fail_handle:
-    blob_close(handle);
+    if (handle_id != 0) {
+        blob_close(handle_id);
+    }
 fail:
     /* inode 若刚分配(ref_count==1)需要回滚 */
     if (inode && atomic_load(&inode->ref_count) == 1) {
@@ -165,6 +257,10 @@ open(const char *path, int flags, ...)
{ ZVFS_HOOK_ENTER(); + char abspath[PATH_MAX]; + char normpath[PATH_MAX]; + int is_zvfs_path = 0; + mode_t mode = 0; if (flags & O_CREAT) { va_list ap; @@ -173,8 +269,13 @@ open(const char *path, int flags, ...) va_end(ap); } + if (zvfs_resolve_atpath(AT_FDCWD, path, abspath, sizeof(abspath)) == 0) { + is_zvfs_path = zvfs_classify_path(abspath, (flags & O_CREAT) != 0, + normpath, sizeof(normpath)); + } + int ret; - if (ZVFS_IN_HOOK() || !zvfs_is_zvfs_path(path)) { + if (ZVFS_IN_HOOK() || !is_zvfs_path) { ret = real_open(path, flags, mode); ZVFS_HOOK_LEAVE(); return ret; @@ -186,7 +287,7 @@ open(const char *path, int flags, ...) int real_fd = real_open(path, flags, mode); if (real_fd < 0) { ZVFS_HOOK_LEAVE(); return -1; } - ret = zvfs_open_impl(real_fd, path, flags, mode); + ret = zvfs_open_impl(real_fd, normpath, flags, mode); if (ret < 0) { int saved = errno; real_close(real_fd); @@ -217,6 +318,9 @@ openat(int dirfd, const char *path, int flags, ...) { ZVFS_HOOK_ENTER(); + char normpath[PATH_MAX]; + int is_zvfs_path = 0; + mode_t mode = 0; if (flags & O_CREAT) { va_list ap; va_start(ap, flags); @@ -230,9 +334,11 @@ openat(int dirfd, const char *path, int flags, ...) ZVFS_HOOK_LEAVE(); return -1; } + is_zvfs_path = zvfs_classify_path(abspath, (flags & O_CREAT) != 0, + normpath, sizeof(normpath)); int ret; - if (ZVFS_IN_HOOK() || !zvfs_is_zvfs_path(abspath)) { + if (ZVFS_IN_HOOK() || !is_zvfs_path) { ret = real_openat(dirfd, path, flags, mode); ZVFS_HOOK_LEAVE(); return ret; @@ -243,7 +349,7 @@ openat(int dirfd, const char *path, int flags, ...) int real_fd = real_openat(dirfd, path, flags, mode); if (real_fd < 0) { ZVFS_HOOK_LEAVE(); return -1; } - ret = zvfs_open_impl(real_fd, abspath, flags, mode); + ret = zvfs_open_impl(real_fd, normpath, flags, mode); if (ret < 0) { int saved = errno; real_close(real_fd); @@ -321,43 +427,23 @@ int __libc_open(const char *path, int flags, ...) 
/* ------------------------------------------------------------------ */ /* - * zvfs_close_impl - zvfs fd 的关闭逻辑。 - * - * 调用方已持有 fd_mu。函数内部会释放 fd_mu 后再处理 inode。 + * zvfs_release_openfile - 释放一个 openfile 对应的 zvfs 资源。 + * 这里只处理 zvfs bookkeeping,不做 real_close(fd)。 */ static int -zvfs_close_impl(int fd) +zvfs_release_openfile(struct zvfs_open_file *of, int do_sync_md) { - /* 持 fd_mu 取出 openfile,从表里摘除 */ - pthread_mutex_lock(&g_fs.fd_mu); - struct zvfs_open_file *of = openfile_lookup(fd); - if (!of) { - pthread_mutex_unlock(&g_fs.fd_mu); - errno = EBADF; - return -1; - } - int new_ref = atomic_fetch_sub(&of->ref_count, 1) - 1; - if (new_ref == 0) - openfile_remove(fd); - pthread_mutex_unlock(&g_fs.fd_mu); - - if (new_ref > 0) { - /* - * 还有其他 dup 出来的 fd 引用同一个 openfile, - * 只关闭真实 fd,不动 blob 和 inode。 - */ - return real_close(fd); - } - - /* ---- openfile 引用归零:先刷 metadata,再关闭 blob handle ------ */ - struct zvfs_inode *inode = of->inode; - struct zvfs_blob_handle *handle = of->handle; - int sync_failed = 0; + int saved_errno = 0; + struct zvfs_inode *inode = of->inode; + uint64_t handle_id = of->handle_id; openfile_free(of); - if (blob_sync_md(handle) < 0) - sync_failed = 1; - blob_close(handle); + if (do_sync_md && handle_id != 0 && blob_sync_md(handle_id) < 0) { + saved_errno = (errno != 0) ? errno : EIO; + } + if (handle_id != 0 && blob_close(handle_id) < 0 && saved_errno == 0) { + saved_errno = (errno != 0) ? errno : EIO; + } /* ---- inode ref_count-- --------------------------------------- */ int inode_ref = atomic_fetch_sub(&inode->ref_count, 1) - 1; @@ -372,8 +458,8 @@ zvfs_close_impl(int fd) do_delete = inode->deleted; pthread_mutex_unlock(&inode->mu); - if (do_delete) - blob_delete(inode->blob_id); + if (do_delete && blob_delete(inode->blob_id) < 0 && saved_errno == 0) + saved_errno = (errno != 0) ? 
errno : EIO; pthread_mutex_lock(&g_fs.inode_mu); inode_remove(inode->blob_id); @@ -403,13 +489,52 @@ zvfs_close_impl(int fd) inode_free(inode); } + if (saved_errno != 0) { + errno = saved_errno; + return -1; + } + return 0; +} + +/* + * zvfs_detach_fd_mapping - 仅摘除 fd -> openfile 映射并释放 zvfs 资源。 + * 不调用 real_close(fd),用于 dup2/dup3 中 newfd 旧值清理。 + */ +static int +zvfs_detach_fd_mapping(int fd, int do_sync_md) +{ + pthread_mutex_lock(&g_fs.fd_mu); + struct zvfs_open_file *of = openfile_lookup(fd); + if (!of) { + pthread_mutex_unlock(&g_fs.fd_mu); + errno = EBADF; + return -1; + } + openfile_remove(fd); + pthread_mutex_unlock(&g_fs.fd_mu); + + return zvfs_release_openfile(of, do_sync_md); +} + +/* + * zvfs_close_impl - close(fd) 的 zvfs 路径: + * 先做 bookkeeping,再做 real_close(fd)。 + */ +static int +zvfs_close_impl(int fd) +{ + int bk_rc = zvfs_detach_fd_mapping(fd, 1); + int bk_errno = (bk_rc < 0) ? errno : 0; + int rc = real_close(fd); if (rc < 0) return -1; - if (sync_failed) { - errno = EIO; + + if (bk_rc < 0) { + errno = bk_errno; return -1; } + return 0; } @@ -436,6 +561,180 @@ close(int fd) int __close(int fd) { return close(fd); } int __libc_close(int fd) { return close(fd); } +/* ------------------------------------------------------------------ */ +/* dup helper */ +/* ------------------------------------------------------------------ */ + +int +zvfs_dup_attach_newfd(int oldfd, int newfd, int new_fd_flags) +{ + struct zvfs_open_file *old_of, *new_of; + int fd_flags; + int rc; + int saved; + + if (oldfd < 0 || newfd < 0) { + errno = EBADF; + return -1; + } + + pthread_mutex_lock(&g_fs.fd_mu); + old_of = openfile_lookup(oldfd); + if (!old_of) { + pthread_mutex_unlock(&g_fs.fd_mu); + errno = EBADF; + return -1; + } + if (openfile_lookup(newfd) != NULL) { + pthread_mutex_unlock(&g_fs.fd_mu); + errno = EEXIST; + return -1; + } + + rc = blob_add_ref(old_of->handle_id, 1); + if (rc != 0) { + pthread_mutex_unlock(&g_fs.fd_mu); + return -1; + } + + new_of = 
openfile_alloc(newfd, old_of->inode, old_of->flags, old_of->handle_id); + if (!new_of) { + saved = (errno != 0) ? errno : ENOMEM; + (void)blob_close(old_of->handle_id); + pthread_mutex_unlock(&g_fs.fd_mu); + errno = saved; + return -1; + } + + new_of->offset = old_of->offset; + fd_flags = (new_fd_flags >= 0) ? new_fd_flags : old_of->fd_flags; + new_of->fd_flags = fd_flags; + + atomic_fetch_add(&old_of->inode->ref_count, 1); + openfile_insert(new_of); + pthread_mutex_unlock(&g_fs.fd_mu); + return 0; +} + +static int +zvfs_add_ref_batch_or_fallback(const uint64_t *handle_ids, + const uint32_t *ref_deltas, + uint32_t count) +{ + uint32_t i; + + if (count == 0) + return 0; + + if (blob_add_ref_batch(handle_ids, ref_deltas, count) == 0) + return 0; + + for (i = 0; i < count; i++) { + if (blob_add_ref(handle_ids[i], ref_deltas[i]) != 0) + return -1; + } + return 0; +} + +static void +zvfs_rollback_added_refs(const uint64_t *handle_ids, uint32_t count) +{ + uint32_t i; + for (i = 0; i < count; i++) { + if (handle_ids[i] != 0) + (void)blob_close(handle_ids[i]); + } +} + +static int +zvfs_snapshot_fd_handles(uint64_t **handle_ids_out, + uint32_t **ref_deltas_out, + uint32_t *count_out) +{ + struct zvfs_open_file *of, *tmp; + uint32_t i = 0; + uint32_t count; + uint64_t *handle_ids = NULL; + uint32_t *ref_deltas = NULL; + + *handle_ids_out = NULL; + *ref_deltas_out = NULL; + *count_out = 0; + + pthread_mutex_lock(&g_fs.fd_mu); + count = (uint32_t)HASH_COUNT(g_fs.fd_table); + if (count == 0) { + pthread_mutex_unlock(&g_fs.fd_mu); + return 0; + } + + handle_ids = calloc(count, sizeof(*handle_ids)); + ref_deltas = calloc(count, sizeof(*ref_deltas)); + if (!handle_ids || !ref_deltas) { + pthread_mutex_unlock(&g_fs.fd_mu); + free(handle_ids); + free(ref_deltas); + errno = ENOMEM; + return -1; + } + + HASH_ITER(hh, g_fs.fd_table, of, tmp) { + if (i >= count) + break; + handle_ids[i] = of->handle_id; + ref_deltas[i] = 1; + i++; + } + pthread_mutex_unlock(&g_fs.fd_mu); + + 
*handle_ids_out = handle_ids; + *ref_deltas_out = ref_deltas; + *count_out = i; + return 0; +} + +static int +zvfs_snapshot_fds_in_range(unsigned int first, unsigned int last, + int **fds_out, uint32_t *count_out) +{ + struct zvfs_open_file *of, *tmp; + uint32_t cap; + uint32_t n = 0; + int *fds = NULL; + + *fds_out = NULL; + *count_out = 0; + + pthread_mutex_lock(&g_fs.fd_mu); + cap = (uint32_t)HASH_COUNT(g_fs.fd_table); + if (cap == 0) { + pthread_mutex_unlock(&g_fs.fd_mu); + return 0; + } + + fds = calloc(cap, sizeof(*fds)); + if (!fds) { + pthread_mutex_unlock(&g_fs.fd_mu); + errno = ENOMEM; + return -1; + } + + HASH_ITER(hh, g_fs.fd_table, of, tmp) { + if (of->fd < 0) { + continue; + } + if ((unsigned int)of->fd < first || (unsigned int)of->fd > last) { + continue; + } + fds[n++] = of->fd; + } + pthread_mutex_unlock(&g_fs.fd_mu); + + *fds_out = fds; + *count_out = n; + return 0; +} + /* ------------------------------------------------------------------ */ /* close_range */ /* ------------------------------------------------------------------ */ @@ -452,32 +751,53 @@ close_range(unsigned int first, unsigned int last, int flags) return ret; } + if (first > last) { + errno = EINVAL; + ZVFS_HOOK_LEAVE(); + return -1; + } + /* - * 遍历范围内所有 fd,zvfs fd 单独走 zvfs_close_impl, - * 其余统一交给 real_close_range(如果内核支持)。 - * 若内核不支持 close_range(< 5.9),逐个 close。 + * 只快照当前 zvfs fd_table 中命中的 fd,避免对 [first,last] 做 + * 全范围扫描(last=UINT_MAX 时会非常慢,且旧逻辑存在回绕风险)。 */ int any_err = 0; int inited = 0; - for (unsigned int fd = first; fd <= last; fd++) { - if (zvfs_is_zvfs_fd((int)fd)) { - if (!inited) { - zvfs_ensure_init(); - inited = 1; - } - if (zvfs_close_impl((int)fd) < 0) any_err = 1; + int *zvfs_fds = NULL; + uint32_t zvfs_fd_count = 0; + if (zvfs_snapshot_fds_in_range(first, last, &zvfs_fds, &zvfs_fd_count) < 0) { + ZVFS_HOOK_LEAVE(); + return -1; + } + + for (uint32_t i = 0; i < zvfs_fd_count; i++) { + if (!inited) { + zvfs_ensure_init(); + inited = 1; + } + if 
(zvfs_close_impl(zvfs_fds[i]) < 0) { + any_err = 1; } } + free(zvfs_fds); /* 让内核处理剩余非 zvfs fd(CLOEXEC 等 flags 也在这里生效) */ if (real_close_range) { if (real_close_range(first, last, flags) < 0 && !any_err) any_err = 1; } else { - /* 降级:逐个 close 非 zvfs fd */ - for (unsigned int fd = first; fd <= last; fd++) { + /* 降级:逐个 close 非 zvfs fd(按 open-max 做上界截断) */ + unsigned int upper = last; + long open_max = sysconf(_SC_OPEN_MAX); + if (open_max > 0 && upper >= (unsigned int)open_max) { + upper = (unsigned int)open_max - 1; + } + + for (unsigned int fd = first; fd <= upper; fd++) { if (!zvfs_is_zvfs_fd((int)fd)) real_close((int)fd); + if (fd == upper) + break; } } @@ -501,14 +821,24 @@ dup(int oldfd) return ret; } - /* - * 当前版本不支持在 zvfs fd 上做 dup。 - * 先明确返回 ENOTSUP,避免暴露错误的 offset 语义。 - */ zvfs_ensure_init(); - errno = ENOTSUP; + + int newfd = real_dup(oldfd); + if (newfd < 0) { + ZVFS_HOOK_LEAVE(); + return -1; + } + + if (zvfs_dup_attach_newfd(oldfd, newfd, 0) < 0) { + int saved = errno; + (void)real_close(newfd); + errno = saved; + ZVFS_HOOK_LEAVE(); + return -1; + } + ZVFS_HOOK_LEAVE(); - return -1; + return newfd; } /* ------------------------------------------------------------------ */ @@ -534,9 +864,32 @@ dup2(int oldfd, int newfd) } zvfs_ensure_init(); - errno = ENOTSUP; + int newfd_was_zvfs = zvfs_is_zvfs_fd(newfd); + + int ret = real_dup2(oldfd, newfd); + if (ret < 0) { + ZVFS_HOOK_LEAVE(); + return -1; + } + + if (newfd_was_zvfs && zvfs_detach_fd_mapping(newfd, 1) < 0) { + int saved = errno; + (void)real_close(newfd); + errno = saved; + ZVFS_HOOK_LEAVE(); + return -1; + } + + if (zvfs_dup_attach_newfd(oldfd, newfd, 0) < 0) { + int saved = errno; + (void)real_close(newfd); + errno = saved; + ZVFS_HOOK_LEAVE(); + return -1; + } + ZVFS_HOOK_LEAVE(); - return -1; + return ret; } /* ------------------------------------------------------------------ */ @@ -561,8 +914,92 @@ dup3(int oldfd, int newfd, int flags) return -1; } + if ((flags & ~O_CLOEXEC) != 0) { + errno = 
EINVAL; + ZVFS_HOOK_LEAVE(); + return -1; + } + zvfs_ensure_init(); - errno = ENOTSUP; + int newfd_was_zvfs = zvfs_is_zvfs_fd(newfd); + + int ret = real_dup3(oldfd, newfd, flags); + if (ret < 0) { + ZVFS_HOOK_LEAVE(); + return -1; + } + + if (newfd_was_zvfs && zvfs_detach_fd_mapping(newfd, 1) < 0) { + int saved = errno; + (void)real_close(newfd); + errno = saved; + ZVFS_HOOK_LEAVE(); + return -1; + } + + int fd_flags = (flags & O_CLOEXEC) ? FD_CLOEXEC : 0; + if (zvfs_dup_attach_newfd(oldfd, newfd, fd_flags) < 0) { + int saved = errno; + (void)real_close(newfd); + errno = saved; + ZVFS_HOOK_LEAVE(); + return -1; + } + ZVFS_HOOK_LEAVE(); - return -1; + return ret; +} + +/* ------------------------------------------------------------------ */ +/* fork */ +/* ------------------------------------------------------------------ */ + +pid_t +fork(void) +{ + ZVFS_HOOK_ENTER(); + + if (ZVFS_IN_HOOK()) { + pid_t ret = real_fork(); + ZVFS_HOOK_LEAVE(); + return ret; + } + + uint64_t *handle_ids = NULL; + uint32_t *ref_deltas = NULL; + uint32_t count = 0; + + if (zvfs_snapshot_fd_handles(&handle_ids, &ref_deltas, &count) < 0) { + ZVFS_HOOK_LEAVE(); + return -1; + } + + if (count > 0) { + zvfs_ensure_init(); + if (zvfs_add_ref_batch_or_fallback(handle_ids, ref_deltas, count) < 0) { + int saved = errno; + free(handle_ids); + free(ref_deltas); + errno = saved; + ZVFS_HOOK_LEAVE(); + return -1; + } + } + + pid_t ret = real_fork(); + if (ret < 0) { + int saved = errno; + if (count > 0) + zvfs_rollback_added_refs(handle_ids, count); + free(handle_ids); + free(ref_deltas); + errno = saved; + ZVFS_HOOK_LEAVE(); + return -1; + } + + free(handle_ids); + free(ref_deltas); + ZVFS_HOOK_LEAVE(); + return ret; } diff --git a/src/hook/zvfs_hook_fd.h b/src/hook/zvfs_hook_fd.h index bbb18a4..750587f 100644 --- a/src/hook/zvfs_hook_fd.h +++ b/src/hook/zvfs_hook_fd.h @@ -12,16 +12,17 @@ * 非 zvfs 路径 → 透传 * * close: - * zvfs fd → openfile ref_count-- - * 归零:blob_close;若 inode->deleted,blob_delete + 
inode_free - * inode ref_count--(归零:path_cache_remove + inode_free) + * zvfs fd → blob_sync_md + blob_close + * inode ref_count--(归零:若 inode->deleted 则 blob_delete,再 inode_free) * real_close * 非 zvfs fd → 透传 * * dup / dup2 / dup3: - * zvfs fd → 新 fd 插入 fd_table,openfile.ref_count++(共享同一 openfile), - * real_dup* 同步执行(内核也要知道这个 fd) + * zvfs fd → real_dup* + daemon ADD_REF + 本地 openfile/inode 引用维护 * 非 zvfs fd → 透传 + * + * fork: + * 子进程会对已继承的 zvfs handle 执行 ADD_REF_BATCH(失败时退化为逐个 ADD_REF) */ /* open 族 */ @@ -40,6 +41,10 @@ int close_range(unsigned int first, unsigned int last, int flags); int dup(int oldfd); int dup2(int oldfd, int newfd); int dup3(int oldfd, int newfd, int flags); +pid_t fork(void); + +/* 给 fcntl(F_DUPFD*) 复用的内部辅助接口 */ +int zvfs_dup_attach_newfd(int oldfd, int newfd, int new_fd_flags); /* glibc 内部别名(与 open/close 实现体共享逻辑,转发即可) */ int __open(const char *path, int flags, ...); @@ -48,4 +53,4 @@ int __libc_open(const char *path, int flags, ...); int __close(int fd); int __libc_close(int fd); -#endif // __ZVFS_HOOK_FD_H__ \ No newline at end of file +#endif // __ZVFS_HOOK_FD_H__ diff --git a/src/hook/zvfs_hook_init.h b/src/hook/zvfs_hook_init.h index 2d97aba..64b81d9 100644 --- a/src/hook/zvfs_hook_init.h +++ b/src/hook/zvfs_hook_init.h @@ -114,6 +114,10 @@ extern void *(*real_mmap64)(void *addr, size_t length, int prot, int flags, extern int (*real_munmap)(void *addr, size_t length); extern int (*real_msync)(void *addr, size_t length, int flags); +/* 进程 */ +extern pid_t (*real_fork)(void); +extern pid_t (*real_vfork)(void); + /* glibc 内部别名 */ extern int (*real___open)(const char *path, int flags, ...); diff --git a/src/hook/zvfs_hook_rw.c b/src/hook/zvfs_hook_rw.c index 9c37cb1..cd1e245 100644 --- a/src/hook/zvfs_hook_rw.c +++ b/src/hook/zvfs_hook_rw.c @@ -7,6 +7,7 @@ #include "fs/zvfs.h" #include "fs/zvfs_open_file.h" #include "fs/zvfs_inode.h" +#include "proto/ipc_proto.h" #include "spdk_engine/io_engine.h" #include @@ -50,7 +51,7 @@ 
zvfs_pread_impl(struct zvfs_open_file *of, if (count == 0) return 0; - if (blob_read(of->handle, offset, buf, count) < 0) { + if (blob_read(of->handle_id, offset, buf, count) < 0) { errno = EIO; return -1; } @@ -74,33 +75,15 @@ zvfs_pwrite_impl(struct zvfs_open_file *of, uint64_t end = offset + count; - /* - * 若写入范围超出 blob 当前物理大小,先 resize。 - * blob_resize 是 SPDK 侧的操作(可能分配新 cluster)。 - */ - pthread_mutex_lock(&of->inode->mu); - uint64_t old_size = of->inode->logical_size; - pthread_mutex_unlock(&of->inode->mu); - - if (end > old_size) { - if (blob_resize(of->handle, end) < 0) { - errno = EIO; - return -1; - } - } - - if (blob_write(of->handle, offset, buf, count) < 0) { - errno = EIO; + if (blob_write_ex(of->handle_id, offset, buf, count, ZVFS_WRITE_F_AUTO_GROW) < 0) { return -1; } /* 更新 logical_size(持锁,inode_update_size 负责 ftruncate) */ - if (end > old_size) { - pthread_mutex_lock(&of->inode->mu); - if (end > of->inode->logical_size) /* double-check */ - inode_update_size(of->inode, of->fd, end); - pthread_mutex_unlock(&of->inode->mu); - } + pthread_mutex_lock(&of->inode->mu); + if (end > of->inode->logical_size) /* double-check */ + inode_update_size(of->inode, of->fd, end); + pthread_mutex_unlock(&of->inode->mu); return (ssize_t)count; } @@ -151,7 +134,7 @@ zvfs_iov_pread(struct zvfs_open_file *of, char *tmp = malloc(total_len); if (!tmp) { errno = ENOMEM; return -1; } - if (blob_read(of->handle, offset, tmp, total_len) < 0) { + if (blob_read(of->handle_id, offset, tmp, total_len) < 0) { free(tmp); errno = EIO; return -1; @@ -477,36 +460,16 @@ write(int fd, const void *buf, size_t count) uint64_t write_off; if (of->flags & O_APPEND) { - /* - * O_APPEND:每次写入位置 = 当前 logical_size(原子操作)。 - * 持 inode->mu 保证 read-then-write 的原子性, - * 防止两个 O_APPEND fd 并发写时覆盖彼此数据。 - */ - /* --- O_APPEND 内联写 -------------------------------------- */ + /* O_APPEND:每次写入位置 = 当前 logical_size。 */ pthread_mutex_lock(&of->inode->mu); write_off = of->inode->logical_size; /* 重新取,防止 TOCTOU */ - 
uint64_t end = write_off + count; - - pthread_mutex_unlock(&of->inode->mu); - - if (blob_resize(of->handle, end) < 0) { - errno = EIO; - ZVFS_HOOK_LEAVE(); - return -1; - } - if (blob_write(of->handle, write_off, buf, count) < 0) { - errno = EIO; - ZVFS_HOOK_LEAVE(); - return -1; - } - - pthread_mutex_lock(&of->inode->mu); - if (end > of->inode->logical_size) - inode_update_size(of->inode, of->fd, end); pthread_mutex_unlock(&of->inode->mu); + ssize_t r = zvfs_pwrite_impl(of, buf, count, write_off); + if (r > 0) + of->offset = write_off + (uint64_t)r; ZVFS_HOOK_LEAVE(); - return (ssize_t)count; + return r; } else { write_off = of->offset; @@ -572,28 +535,14 @@ writev(int fd, const struct iovec *iov, int iovcnt) ssize_t r; if (of->flags & O_APPEND) { - /* - * O_APPEND + writev:和 write 一样需要原子序列。 - * 先计算总字节数,用 iov_pwrite 完成,整个过程持 inode->mu。 - */ - size_t total_len = 0; - for (int i = 0; i < iovcnt; i++) total_len += iov[i].iov_len; - + /* O_APPEND + writev:以当前 logical_size 作为写入起点。 */ pthread_mutex_lock(&of->inode->mu); uint64_t write_off = of->inode->logical_size; - uint64_t end = write_off + total_len; pthread_mutex_unlock(&of->inode->mu); - if (blob_resize(of->handle, end) < 0) { errno = EIO; ZVFS_HOOK_LEAVE(); return -1; } r = zvfs_iov_pwrite(of, iov, iovcnt, write_off); - - if (r > 0) { - pthread_mutex_lock(&of->inode->mu); - uint64_t new_end = write_off + (uint64_t)r; - if (new_end > of->inode->logical_size) - inode_update_size(of->inode, of->fd, new_end); - pthread_mutex_unlock(&of->inode->mu); - } + if (r > 0) + of->offset = write_off + (uint64_t)r; } else { r = zvfs_iov_pwrite(of, iov, iovcnt, of->offset); if (r > 0) of->offset += (uint64_t)r; diff --git a/src/hook/zvfs_hook_seek.c b/src/hook/zvfs_hook_seek.c index 637dde4..ac7ba0d 100644 --- a/src/hook/zvfs_hook_seek.c +++ b/src/hook/zvfs_hook_seek.c @@ -69,21 +69,21 @@ off_t lseek64(int fd, off_t offset, int whence) /* - * zvfs_truncate_by_inode - 对有 handle 的 openfile 做 truncate。 - * 找到任意一个打开该 inode 的 
openfile 取其 handle。 + * zvfs_truncate_by_inode - 对有 handle_id 的 openfile 做 truncate。 + * 找到任意一个打开该 inode 的 openfile 取其 handle_id。 */ static int zvfs_truncate_inode_with_handle(struct zvfs_inode *inode, int real_fd, uint64_t new_size) { - /* 在 fd_table 里找一个指向该 inode 的 openfile 取 handle */ - struct zvfs_blob_handle *handle = NULL; + /* 在 fd_table 里找一个指向该 inode 的 openfile 取 handle_id */ + uint64_t handle_id = 0; pthread_mutex_lock(&g_fs.fd_mu); struct zvfs_open_file *of, *tmp; HASH_ITER(hh, g_fs.fd_table, of, tmp) { (void)tmp; if (of->inode == inode) { - handle = of->handle; + handle_id = of->handle_id; break; } } @@ -93,20 +93,23 @@ zvfs_truncate_inode_with_handle(struct zvfs_inode *inode, uint64_t old_size = inode->logical_size; pthread_mutex_unlock(&inode->mu); - if (new_size != old_size && handle) { - if (blob_resize(handle, new_size) < 0) { + if (new_size != old_size && handle_id != 0) { + if (blob_resize(handle_id, new_size) < 0) { errno = EIO; return -1; } - } else if (new_size != old_size && !handle) { + } else if (new_size != old_size && handle_id == 0) { /* * 文件未被打开:需要临时 blob_open。 * 这种情况下 truncate(path, ...) 
被调用但文件没有 fd。
  */
- handle = blob_open(inode->blob_id);
- if (!handle) { errno = EIO; return -1; }
- int rc = blob_resize(handle, new_size);
- blob_close(handle);
+ uint64_t temp_handle_id = 0;
+ if (blob_open(inode->blob_id, &temp_handle_id) < 0) {
+ errno = EIO;
+ return -1;
+ }
+ int rc = blob_resize(temp_handle_id, new_size);
+ blob_close(temp_handle_id);
 if (rc < 0) { errno = EIO; return -1; }
 }
diff --git a/src/hook/zvfs_hook_sync.c b/src/hook/zvfs_hook_sync.c
index 43e5a05..a97a9c2 100644
--- a/src/hook/zvfs_hook_sync.c
+++ b/src/hook/zvfs_hook_sync.c
@@ -39,7 +39,7 @@ fsync(int fd)
  * zvfs 无写缓冲区,数据已在 blob_write 时落到 SPDK 存储。
  * 调用 blob_sync_md 确保 blob 元数据(size 等)持久化。
  */
- int r = blob_sync_md(of->handle);
+ int r = blob_sync_md(of->handle_id);
 if (r < 0)
 errno = EIO;
 ZVFS_HOOK_LEAVE();
@@ -75,7 +75,7 @@ fdatasync(int fd)
  * 对 zvfs:数据已无缓冲,blob_sync_md 同步 size 元数据即可。
  * 与 fsync 实现相同,如果将来区分数据/元数据可在此分叉。
  */
- int r = blob_sync_md(of->handle);
+ int r = blob_sync_md(of->handle_id);
 if (r < 0)
 errno = EIO;
 ZVFS_HOOK_LEAVE();
diff --git a/src/main.c b/src/main.c
deleted file mode 100644
index e69de29..0000000
diff --git a/src/proto/ipc_proto.c b/src/proto/ipc_proto.c
new file mode 100644
index 0000000..14ee947
--- /dev/null
+++ b/src/proto/ipc_proto.c
@@ -0,0 +1,1056 @@
+#include "ipc_proto.h"
+
+#include <stdlib.h>
+#include <string.h>
+
+#define ZVFS_REQ_HEADER_WIRE_SIZE (sizeof(uint32_t) + sizeof(uint32_t))
+#define ZVFS_RESP_HEADER_WIRE_SIZE (sizeof(uint32_t) + sizeof(int32_t) + sizeof(uint32_t))
+
+static int write_bytes(uint8_t **p, size_t *remaining, const void *src, size_t n) {
+    if (*remaining < n) {
+        return -1;
+    }
+    memcpy(*p, src, n);
+    *p += n;
+    *remaining -= n;
+    return 0;
+}
+
+static int read_bytes(const uint8_t **p, size_t *remaining, void *dst, size_t n) {
+    if (*remaining < n) {
+        return -1;
+    }
+    memcpy(dst, *p, n);
+    *p += n;
+    *remaining -= n;
+    return 0;
+}
+
+static int write_u32(uint8_t **p, size_t *remaining, uint32_t v) {
+    return write_bytes(p, remaining, &v,
sizeof(v)); +} + +static int read_u32(const uint8_t **p, size_t *remaining, uint32_t *v) { + return read_bytes(p, remaining, v, sizeof(*v)); +} + +static int write_s32(uint8_t **p, size_t *remaining, int32_t v) { + return write_bytes(p, remaining, &v, sizeof(v)); +} + +static int read_s32(const uint8_t **p, size_t *remaining, int32_t *v) { + return read_bytes(p, remaining, v, sizeof(*v)); +} + +static int write_u64(uint8_t **p, size_t *remaining, uint64_t v) { + return write_bytes(p, remaining, &v, sizeof(v)); +} + +static int read_u64(const uint8_t **p, size_t *remaining, uint64_t *v) { + return read_bytes(p, remaining, v, sizeof(*v)); +} + +static int valid_opcode(uint32_t opcode) { + return opcode >= ZVFS_OP_CREATE && opcode <= ZVFS_OP_ADD_REF_BATCH; +} + +/* -------------------- header -------------------- */ + +size_t zvfs_serialize_req_header(const struct zvfs_req_header *header, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!header || !buf) { + return 0; + } + if (write_u32(&p, &remaining, header->opcode) != 0) { + return 0; + } + if (write_u32(&p, &remaining, header->payload_len) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_req_header(const uint8_t *buf, size_t buf_len, struct zvfs_req_header *header) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!header || !buf) { + return 0; + } + if (read_u32(&p, &remaining, &header->opcode) != 0) { + return 0; + } + if (read_u32(&p, &remaining, &header->payload_len) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_serialize_resp_header(const struct zvfs_resp_header *header, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!header || !buf) { + return 0; + } + if (write_u32(&p, &remaining, header->opcode) != 0) { + return 0; + } + if (write_s32(&p, &remaining, header->status) != 0) { + return 0; + } + if (write_u32(&p, &remaining, header->payload_len) != 0) { + 
return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_resp_header(const uint8_t *buf, size_t buf_len, struct zvfs_resp_header *header) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!header || !buf) { + return 0; + } + if (read_u32(&p, &remaining, &header->opcode) != 0) { + return 0; + } + if (read_s32(&p, &remaining, &header->status) != 0) { + return 0; + } + if (read_u32(&p, &remaining, &header->payload_len) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +/* -------------------- request body -------------------- */ + +size_t zvfs_serialize_req_create(const struct zvfs_req_create_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (write_u64(&p, &remaining, body->size_hint) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_req_create(const uint8_t *buf, size_t buf_len, struct zvfs_req_create_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (read_u64(&p, &remaining, &body->size_hint) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_serialize_req_open(const struct zvfs_req_open_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (write_u64(&p, &remaining, body->blob_id) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_req_open(const uint8_t *buf, size_t buf_len, struct zvfs_req_open_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (read_u64(&p, &remaining, &body->blob_id) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_serialize_req_read(const struct zvfs_req_read_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if 
(write_u64(&p, &remaining, body->handle_id) != 0 || + write_u64(&p, &remaining, body->offset) != 0 || + write_u64(&p, &remaining, body->length) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_req_read(const uint8_t *buf, size_t buf_len, struct zvfs_req_read_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (read_u64(&p, &remaining, &body->handle_id) != 0 || + read_u64(&p, &remaining, &body->offset) != 0 || + read_u64(&p, &remaining, &body->length) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_serialize_req_write(const struct zvfs_req_write_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (body->length > 0 && !body->data) { + return 0; + } + if (write_u64(&p, &remaining, body->handle_id) != 0 || + write_u64(&p, &remaining, body->offset) != 0 || + write_u64(&p, &remaining, body->length) != 0 || + write_u32(&p, &remaining, body->flags) != 0) { + return 0; + } + if (body->length > 0 && write_bytes(&p, &remaining, body->data, (size_t)body->length) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_req_write(const uint8_t *buf, size_t buf_len, struct zvfs_req_write_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + body->data = NULL; + body->flags = 0; + if (read_u64(&p, &remaining, &body->handle_id) != 0 || + read_u64(&p, &remaining, &body->offset) != 0 || + read_u64(&p, &remaining, &body->length) != 0 || + read_u32(&p, &remaining, &body->flags) != 0) { + return 0; + } + + if (body->length > remaining) { + return 0; + } + + if (body->length > 0) { + body->data = malloc((size_t)body->length); + if (!body->data) { + return 0; + } + memcpy((void *)body->data, p, (size_t)body->length); + p += body->length; + remaining -= (size_t)body->length; + } + + return (size_t)(p - 
buf); +} + +size_t zvfs_serialize_req_resize(const struct zvfs_req_resize_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (write_u64(&p, &remaining, body->handle_id) != 0 || + write_u64(&p, &remaining, body->new_size) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_req_resize(const uint8_t *buf, size_t buf_len, struct zvfs_req_resize_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (read_u64(&p, &remaining, &body->handle_id) != 0 || + read_u64(&p, &remaining, &body->new_size) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_serialize_req_sync_md(const struct zvfs_req_sync_md_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (write_u64(&p, &remaining, body->handle_id) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_req_sync_md(const uint8_t *buf, size_t buf_len, struct zvfs_req_sync_md_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (read_u64(&p, &remaining, &body->handle_id) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_serialize_req_close(const struct zvfs_req_close_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (write_u64(&p, &remaining, body->handle_id) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_req_close(const uint8_t *buf, size_t buf_len, struct zvfs_req_close_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (read_u64(&p, &remaining, &body->handle_id) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t 
zvfs_serialize_req_delete(const struct zvfs_req_delete_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (write_u64(&p, &remaining, body->blob_id) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_req_delete(const uint8_t *buf, size_t buf_len, struct zvfs_req_delete_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (read_u64(&p, &remaining, &body->blob_id) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_serialize_req_add_ref(const struct zvfs_req_add_ref_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (write_u64(&p, &remaining, body->handle_id) != 0 || + write_u32(&p, &remaining, body->ref_delta) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_req_add_ref(const uint8_t *buf, size_t buf_len, struct zvfs_req_add_ref_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (read_u64(&p, &remaining, &body->handle_id) != 0 || + read_u32(&p, &remaining, &body->ref_delta) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_serialize_req_add_ref_batch(const struct zvfs_req_add_ref_batch_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + uint32_t i; + + if (!body || !buf) { + return 0; + } + if (body->item_count > 0 && !body->items) { + return 0; + } + + if (write_u32(&p, &remaining, body->item_count) != 0) { + return 0; + } + for (i = 0; i < body->item_count; i++) { + if (write_u64(&p, &remaining, body->items[i].handle_id) != 0 || + write_u32(&p, &remaining, body->items[i].ref_delta) != 0) { + return 0; + } + } + + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_req_add_ref_batch(const uint8_t *buf, size_t buf_len, 
struct zvfs_req_add_ref_batch_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + uint32_t i = 0; + struct zvfs_add_ref_item *items = NULL; + + if (!body || !buf) { + return 0; + } + body->items = NULL; + body->item_count = 0; + + if (read_u32(&p, &remaining, &body->item_count) != 0) { + return 0; + } + if (body->item_count > 0) { + items = calloc(body->item_count, sizeof(*items)); + if (!items) { + return 0; + } + for (i = 0; i < body->item_count; i++) { + if (read_u64(&p, &remaining, &items[i].handle_id) != 0 || + read_u32(&p, &remaining, &items[i].ref_delta) != 0) { + free(items); + return 0; + } + } + body->items = items; + } + + return (size_t)(p - buf); +} + +/* -------------------- response body -------------------- */ + +size_t zvfs_serialize_resp_create(const struct zvfs_resp_create_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (write_u64(&p, &remaining, body->blob_id) != 0 || + write_u64(&p, &remaining, body->handle_id) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_resp_create(const uint8_t *buf, size_t buf_len, struct zvfs_resp_create_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (read_u64(&p, &remaining, &body->blob_id) != 0 || + read_u64(&p, &remaining, &body->handle_id) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_serialize_resp_open(const struct zvfs_resp_open_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (write_u64(&p, &remaining, body->handle_id) != 0 || + write_u64(&p, &remaining, body->size) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_resp_open(const uint8_t *buf, size_t buf_len, struct zvfs_resp_open_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if 
(!body || !buf) { + return 0; + } + if (read_u64(&p, &remaining, &body->handle_id) != 0 || + read_u64(&p, &remaining, &body->size) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_serialize_resp_read(const struct zvfs_resp_read_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (body->length > 0 && !body->data) { + return 0; + } + if (write_u64(&p, &remaining, body->length) != 0) { + return 0; + } + if (body->length > 0 && write_bytes(&p, &remaining, body->data, (size_t)body->length) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_resp_read(const uint8_t *buf, size_t buf_len, struct zvfs_resp_read_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + body->data = NULL; + + if (read_u64(&p, &remaining, &body->length) != 0) { + return 0; + } + if (body->length > remaining) { + return 0; + } + + if (body->length > 0) { + body->data = malloc((size_t)body->length); + if (!body->data) { + return 0; + } + memcpy(body->data, p, (size_t)body->length); + p += body->length; + remaining -= (size_t)body->length; + } + + return (size_t)(p - buf); +} + +size_t zvfs_serialize_resp_write(const struct zvfs_resp_write_body *body, uint8_t *buf, size_t buf_len) { + uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (write_u64(&p, &remaining, body->bytes_written) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_deserialize_resp_write(const uint8_t *buf, size_t buf_len, struct zvfs_resp_write_body *body) { + const uint8_t *p = buf; + size_t remaining = buf_len; + + if (!body || !buf) { + return 0; + } + if (read_u64(&p, &remaining, &body->bytes_written) != 0) { + return 0; + } + return (size_t)(p - buf); +} + +size_t zvfs_serialize_resp_resize(uint8_t *buf, size_t buf_len) { + (void)buf; + (void)buf_len; + return 0; +} + 
+size_t zvfs_deserialize_resp_resize(const uint8_t *buf, size_t buf_len) { + (void)buf; + (void)buf_len; + return 0; +} + +size_t zvfs_serialize_resp_sync_md(uint8_t *buf, size_t buf_len) { + (void)buf; + (void)buf_len; + return 0; +} + +size_t zvfs_deserialize_resp_sync_md(const uint8_t *buf, size_t buf_len) { + (void)buf; + (void)buf_len; + return 0; +} + +size_t zvfs_serialize_resp_close(uint8_t *buf, size_t buf_len) { + (void)buf; + (void)buf_len; + return 0; +} + +size_t zvfs_deserialize_resp_close(const uint8_t *buf, size_t buf_len) { + (void)buf; + (void)buf_len; + return 0; +} + +size_t zvfs_serialize_resp_delete(uint8_t *buf, size_t buf_len) { + (void)buf; + (void)buf_len; + return 0; +} + +size_t zvfs_deserialize_resp_delete(const uint8_t *buf, size_t buf_len) { + (void)buf; + (void)buf_len; + return 0; +} + +/* -------------------- compatibility wrapper: req -------------------- */ + +size_t zvfs_serialize_req(struct zvfs_req *req, uint8_t *buf, size_t buf_len) { + struct zvfs_req_header header; + size_t body_len = 0; + + if (!req || !buf || !valid_opcode(req->opcode)) { + return 0; + } + if (buf_len < ZVFS_REQ_HEADER_WIRE_SIZE) { + return 0; + } + + switch (req->opcode) { + case ZVFS_OP_CREATE: { + struct zvfs_req_create_body body = { .size_hint = req->size_hint }; + body_len = zvfs_serialize_req_create(&body, buf + ZVFS_REQ_HEADER_WIRE_SIZE, + buf_len - ZVFS_REQ_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_OPEN: { + struct zvfs_req_open_body body = { .blob_id = req->blob_id }; + body_len = zvfs_serialize_req_open(&body, buf + ZVFS_REQ_HEADER_WIRE_SIZE, + buf_len - ZVFS_REQ_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_READ: { + struct zvfs_req_read_body body = { + .handle_id = req->handle_id, + .offset = req->offset, + .length = req->length, + }; + body_len = zvfs_serialize_req_read(&body, buf + ZVFS_REQ_HEADER_WIRE_SIZE, + buf_len - ZVFS_REQ_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_WRITE: { + struct zvfs_req_write_body body = { + .handle_id = 
req->handle_id, + .offset = req->offset, + .length = req->length, + .flags = req->write_flags, + .data = req->data, + }; + body_len = zvfs_serialize_req_write(&body, buf + ZVFS_REQ_HEADER_WIRE_SIZE, + buf_len - ZVFS_REQ_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_RESIZE: { + struct zvfs_req_resize_body body = { + .handle_id = req->handle_id, + .new_size = req->size_hint, + }; + body_len = zvfs_serialize_req_resize(&body, buf + ZVFS_REQ_HEADER_WIRE_SIZE, + buf_len - ZVFS_REQ_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_SYNC_MD: { + struct zvfs_req_sync_md_body body = { .handle_id = req->handle_id }; + body_len = zvfs_serialize_req_sync_md(&body, buf + ZVFS_REQ_HEADER_WIRE_SIZE, + buf_len - ZVFS_REQ_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_CLOSE: { + struct zvfs_req_close_body body = { .handle_id = req->handle_id }; + body_len = zvfs_serialize_req_close(&body, buf + ZVFS_REQ_HEADER_WIRE_SIZE, + buf_len - ZVFS_REQ_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_DELETE: { + struct zvfs_req_delete_body body = { .blob_id = req->blob_id }; + body_len = zvfs_serialize_req_delete(&body, buf + ZVFS_REQ_HEADER_WIRE_SIZE, + buf_len - ZVFS_REQ_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_ADD_REF: { + struct zvfs_req_add_ref_body body = { + .handle_id = req->handle_id, + .ref_delta = req->ref_delta, + }; + body_len = zvfs_serialize_req_add_ref(&body, buf + ZVFS_REQ_HEADER_WIRE_SIZE, + buf_len - ZVFS_REQ_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_ADD_REF_BATCH: { + struct zvfs_req_add_ref_batch_body body = { + .item_count = req->add_ref_count, + .items = req->add_ref_items, + }; + body_len = zvfs_serialize_req_add_ref_batch(&body, buf + ZVFS_REQ_HEADER_WIRE_SIZE, + buf_len - ZVFS_REQ_HEADER_WIRE_SIZE); + break; + } + default: + return 0; + } + + if (body_len == 0) { + return 0; + } + + header.opcode = req->opcode; + header.payload_len = (uint32_t)body_len; + if (zvfs_serialize_req_header(&header, buf, buf_len) != ZVFS_REQ_HEADER_WIRE_SIZE) { + return 0; + } + + 
return ZVFS_REQ_HEADER_WIRE_SIZE + body_len;
+}
+
+size_t zvfs_deserialize_req(uint8_t *buf, size_t buf_len, struct zvfs_req *req) {
+    struct zvfs_req_header header;
+    size_t header_len;
+    size_t total;
+    size_t consumed = 0;
+    const uint8_t *payload;
+
+    if (!buf || !req) {
+        return 0;
+    }
+
+    header_len = zvfs_deserialize_req_header(buf, buf_len, &header);
+    if (header_len == 0 || !valid_opcode(header.opcode)) {
+        return 0;
+    }
+
+    total = header_len + header.payload_len;
+    if (buf_len < total) {
+        return 0;
+    }
+
+    memset(req, 0, sizeof(*req));
+    req->opcode = header.opcode;
+
+    payload = buf + header_len;
+
+    switch (header.opcode) {
+    case ZVFS_OP_CREATE: {
+        struct zvfs_req_create_body body;
+        consumed = zvfs_deserialize_req_create(payload, header.payload_len, &body);
+        if (consumed == header.payload_len) {
+            req->size_hint = body.size_hint;
+        }
+        break;
+    }
+    case ZVFS_OP_OPEN: {
+        struct zvfs_req_open_body body;
+        consumed = zvfs_deserialize_req_open(payload, header.payload_len, &body);
+        if (consumed == header.payload_len) {
+            req->blob_id = body.blob_id;
+        }
+        break;
+    }
+    case ZVFS_OP_READ: {
+        struct zvfs_req_read_body body;
+        consumed = zvfs_deserialize_req_read(payload, header.payload_len, &body);
+        if (consumed == header.payload_len) {
+            req->handle_id = body.handle_id;
+            req->offset = body.offset;
+            req->length = body.length;
+            /* 预分配读缓冲;malloc 失败时显式报错,避免 NULL data 流入下游 */
+            if (req->length > 0) {
+                req->data = malloc(req->length);
+                if (!req->data) {
+                    return 0;
+                }
+            }
+        }
+        break;
+    }
+    case ZVFS_OP_WRITE: {
+        struct zvfs_req_write_body body;
+        consumed = zvfs_deserialize_req_write(payload, header.payload_len, &body);
+        if (consumed == header.payload_len) {
+            req->handle_id = body.handle_id;
+            req->offset = body.offset;
+            req->length = body.length;
+            req->write_flags = body.flags;
+            req->data = (void *)body.data;
+        }
+        break;
+    }
+    case ZVFS_OP_RESIZE: {
+        struct zvfs_req_resize_body body;
+        consumed = zvfs_deserialize_req_resize(payload, header.payload_len, &body);
+        if (consumed == header.payload_len) {
+            req->handle_id = body.handle_id;
+            req->size_hint
= body.new_size; + } + break; + } + case ZVFS_OP_SYNC_MD: { + struct zvfs_req_sync_md_body body; + consumed = zvfs_deserialize_req_sync_md(payload, header.payload_len, &body); + if (consumed == header.payload_len) { + req->handle_id = body.handle_id; + } + break; + } + case ZVFS_OP_CLOSE: { + struct zvfs_req_close_body body; + consumed = zvfs_deserialize_req_close(payload, header.payload_len, &body); + if (consumed == header.payload_len) { + req->handle_id = body.handle_id; + } + break; + } + case ZVFS_OP_DELETE: { + struct zvfs_req_delete_body body; + consumed = zvfs_deserialize_req_delete(payload, header.payload_len, &body); + if (consumed == header.payload_len) { + req->blob_id = body.blob_id; + } + break; + } + case ZVFS_OP_ADD_REF: { + struct zvfs_req_add_ref_body body; + consumed = zvfs_deserialize_req_add_ref(payload, header.payload_len, &body); + if (consumed == header.payload_len) { + req->handle_id = body.handle_id; + req->ref_delta = body.ref_delta; + } + break; + } + case ZVFS_OP_ADD_REF_BATCH: { + struct zvfs_req_add_ref_batch_body body; + consumed = zvfs_deserialize_req_add_ref_batch(payload, header.payload_len, &body); + if (consumed == header.payload_len) { + req->add_ref_count = body.item_count; + req->add_ref_items = (struct zvfs_add_ref_item *)body.items; + } + break; + } + default: + return 0; + } + + if (consumed != header.payload_len) { + if (req->data) { + free(req->data); + req->data = NULL; + } + if (req->add_ref_items) { + free(req->add_ref_items); + req->add_ref_items = NULL; + } + return 0; + } + + return total; +} + +/* -------------------- compatibility wrapper: resp -------------------- */ + +size_t zvfs_serialize_resp(struct zvfs_resp *resp, uint8_t *buf, size_t buf_len) { + struct zvfs_resp_header header; + size_t body_len = 0; + + if (!resp || !buf || !valid_opcode(resp->opcode)) { + return 0; + } + if (buf_len < ZVFS_RESP_HEADER_WIRE_SIZE) { + return 0; + } + + if (resp->status == 0) { + switch (resp->opcode) { + case 
ZVFS_OP_CREATE: { + struct zvfs_resp_create_body body = { + .blob_id = resp->blob_id, + .handle_id = resp->handle_id, + }; + body_len = zvfs_serialize_resp_create(&body, buf + ZVFS_RESP_HEADER_WIRE_SIZE, + buf_len - ZVFS_RESP_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_OPEN: { + struct zvfs_resp_open_body body = { + .handle_id = resp->handle_id, + .size = resp->size, + }; + body_len = zvfs_serialize_resp_open(&body, buf + ZVFS_RESP_HEADER_WIRE_SIZE, + buf_len - ZVFS_RESP_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_READ: { + struct zvfs_resp_read_body body = { + .length = resp->length, + .data = resp->data, + }; + body_len = zvfs_serialize_resp_read(&body, buf + ZVFS_RESP_HEADER_WIRE_SIZE, + buf_len - ZVFS_RESP_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_WRITE: { + struct zvfs_resp_write_body body = { .bytes_written = resp->bytes_written }; + body_len = zvfs_serialize_resp_write(&body, buf + ZVFS_RESP_HEADER_WIRE_SIZE, + buf_len - ZVFS_RESP_HEADER_WIRE_SIZE); + break; + } + case ZVFS_OP_RESIZE: + case ZVFS_OP_SYNC_MD: + case ZVFS_OP_CLOSE: + case ZVFS_OP_DELETE: + case ZVFS_OP_ADD_REF: + case ZVFS_OP_ADD_REF_BATCH: + body_len = 0; + break; + default: + return 0; + } + + if (resp->opcode <= ZVFS_OP_WRITE && body_len == 0) { + return 0; + } + } + + header.opcode = resp->opcode; + header.status = resp->status; + header.payload_len = (uint32_t)body_len; + + if (zvfs_serialize_resp_header(&header, buf, buf_len) != ZVFS_RESP_HEADER_WIRE_SIZE) { + return 0; + } + + return ZVFS_RESP_HEADER_WIRE_SIZE + body_len; +} + +size_t zvfs_deserialize_resp(uint8_t *buf, size_t buf_len, struct zvfs_resp *resp) { + struct zvfs_resp_header header; + size_t header_len; + size_t total; + size_t consumed = 0; + const uint8_t *payload; + + if (!buf || !resp) { + return 0; + } + + header_len = zvfs_deserialize_resp_header(buf, buf_len, &header); + if (header_len == 0 || !valid_opcode(header.opcode)) { + return 0; + } + + total = header_len + header.payload_len; + if (buf_len < total) 
{
+        return 0;
+    }
+
+    memset(resp, 0, sizeof(*resp));
+    resp->opcode = header.opcode;
+    resp->status = header.status;
+
+    if (header.status != 0) {
+        if (header.payload_len != 0) {
+            return 0;
+        }
+        return total;
+    }
+
+    payload = buf + header_len;
+
+    switch (header.opcode) {
+    case ZVFS_OP_CREATE: {
+        struct zvfs_resp_create_body body;
+        consumed = zvfs_deserialize_resp_create(payload, header.payload_len, &body);
+        if (consumed == header.payload_len) {
+            resp->blob_id = body.blob_id;
+            resp->handle_id = body.handle_id;
+        }
+        break;
+    }
+    case ZVFS_OP_OPEN: {
+        struct zvfs_resp_open_body body;
+        consumed = zvfs_deserialize_resp_open(payload, header.payload_len, &body);
+        if (consumed == header.payload_len) {
+            resp->handle_id = body.handle_id;
+            resp->size = body.size;
+        }
+        break;
+    }
+    case ZVFS_OP_READ: {
+        struct zvfs_resp_read_body body;
+        consumed = zvfs_deserialize_resp_read(payload, header.payload_len, &body);
+        if (consumed == header.payload_len) {
+            resp->length = body.length;
+            resp->data = body.data;
+        }
+        break;
+    }
+    case ZVFS_OP_WRITE: {
+        struct zvfs_resp_write_body body;
+        consumed = zvfs_deserialize_resp_write(payload, header.payload_len, &body);
+        if (consumed == header.payload_len) {
+            resp->bytes_written = body.bytes_written;
+        }
+        break;
+    }
+    case ZVFS_OP_RESIZE:
+    case ZVFS_OP_SYNC_MD:
+    case ZVFS_OP_CLOSE:
+    case ZVFS_OP_DELETE:
+    case ZVFS_OP_ADD_REF:
+    case ZVFS_OP_ADD_REF_BATCH:
+        if (header.payload_len != 0) {
+            return 0;
+        }
+        consumed = 0;
+        break;
+    default:
+        return 0;
+    }
+
+    if (consumed != header.payload_len) {
+        if (resp->data) {
+            free(resp->data);
+            resp->data = NULL;
+        }
+        return 0;
+    }
+
+    return total;
+}
diff --git a/src/proto/ipc_proto.h b/src/proto/ipc_proto.h
new file mode 100644
index 0000000..e76659f
--- /dev/null
+++ b/src/proto/ipc_proto.h
@@ -0,0 +1,265 @@
+#ifndef __IPC_PROTO_H__
+#define __IPC_PROTO_H__
+
+#include <stdint.h>
+#include <stddef.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct zvfs_conn;
+struct
zvfs_blob_handle;
+
+enum zvfs_opcode {
+    ZVFS_OP_CREATE = 1,
+    ZVFS_OP_OPEN,
+    ZVFS_OP_READ,
+    ZVFS_OP_WRITE,
+    ZVFS_OP_RESIZE,
+    ZVFS_OP_SYNC_MD,
+    ZVFS_OP_CLOSE,
+    ZVFS_OP_DELETE,
+    ZVFS_OP_ADD_REF,
+    ZVFS_OP_ADD_REF_BATCH
+};
+
+/* 放在头文件中需 static inline,否则 C99 下可能产生未定义的外部符号 */
+static inline const char *cast_opcode2string(uint32_t op)
+{
+    switch (op) {
+    case ZVFS_OP_CREATE:        return "CREATE";
+    case ZVFS_OP_OPEN:          return "OPEN";
+    case ZVFS_OP_READ:          return "READ";
+    case ZVFS_OP_WRITE:         return "WRITE";
+    case ZVFS_OP_RESIZE:        return "RESIZE";
+    case ZVFS_OP_SYNC_MD:       return "SYNC";
+    case ZVFS_OP_CLOSE:         return "CLOSE";
+    case ZVFS_OP_DELETE:        return "DELETE";
+    case ZVFS_OP_ADD_REF:       return "ADD_REF";
+    case ZVFS_OP_ADD_REF_BATCH: return "ADD_REF_BATCH";
+    default:                    return "UNKNOWN";
+    }
+}
+
+#define ZVFS_WRITE_F_AUTO_GROW (1u << 0)
+
+/* 最小固定头(同步阻塞场景,不含 request_id) */
+struct zvfs_req_header {
+    uint32_t opcode;
+    uint32_t payload_len;
+};
+
+struct zvfs_resp_header {
+    uint32_t opcode;
+    int32_t  status;
+    uint32_t payload_len;
+};
+
+/* -------------------- per-op request body -------------------- */
+
+struct zvfs_req_create_body {
+    uint64_t size_hint;
+};
+
+struct zvfs_req_open_body {
+    uint64_t blob_id;
+};
+
+struct zvfs_req_read_body {
+    uint64_t handle_id;
+    uint64_t offset;
+    uint64_t length;
+};
+
+struct zvfs_req_write_body {
+    uint64_t handle_id;
+    uint64_t offset;
+    uint64_t length;
+    uint32_t flags;
+    const void *data;
+};
+
+struct zvfs_req_resize_body {
+    uint64_t handle_id;
+    uint64_t new_size;
+};
+
+struct zvfs_req_sync_md_body {
+    uint64_t handle_id;
+};
+
+struct zvfs_req_close_body {
+    uint64_t handle_id;
+};
+
+struct zvfs_req_delete_body {
+    uint64_t blob_id;
+};
+
+struct zvfs_add_ref_item {
+    uint64_t handle_id;
+    uint32_t ref_delta;
+};
+
+struct zvfs_req_add_ref_body {
+    uint64_t handle_id;
+    uint32_t ref_delta;
+};
+
+struct zvfs_req_add_ref_batch_body {
+    uint32_t item_count;
+    const struct zvfs_add_ref_item *items;
+};
+
+/* -------------------- per-op response body -------------------- */
+ +struct zvfs_resp_create_body { + uint64_t blob_id; + uint64_t handle_id; +}; + +struct zvfs_resp_open_body { + uint64_t handle_id; + uint64_t size; +}; + +struct zvfs_resp_read_body { + uint64_t length; + void *data; +}; + +struct zvfs_resp_write_body { + uint64_t bytes_written; +}; + +/* resize/sync_md/close/delete 成功时 body 为空 */ +size_t zvfs_serialize_resp_resize(uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_resp_resize(const uint8_t *buf, size_t buf_len); +size_t zvfs_serialize_resp_sync_md(uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_resp_sync_md(const uint8_t *buf, size_t buf_len); +size_t zvfs_serialize_resp_close(uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_resp_close(const uint8_t *buf, size_t buf_len); +size_t zvfs_serialize_resp_delete(uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_resp_delete(const uint8_t *buf, size_t buf_len); + +/* -------------------- 兼容旧接口 req/resp -------------------- */ + +struct zvfs_req { + uint32_t opcode; + + uint64_t size_hint; + uint64_t blob_id; + uint64_t handle_id; + + uint64_t offset; + uint64_t length; + uint32_t write_flags; + void *data; + + uint32_t ref_delta; + uint32_t add_ref_count; + struct zvfs_add_ref_item *add_ref_items; + + struct zvfs_conn *conn; + struct zvfs_blob_handle *handle; +}; + +struct zvfs_resp { + uint32_t opcode; + int32_t status; + + uint64_t blob_id; + uint64_t handle_id; + uint64_t size; + + uint64_t length; + void *data; + + uint64_t bytes_written; + + struct zvfs_conn *conn; +}; + +/* -------------------- 头部序列化/反序列化 -------------------- */ + +size_t zvfs_serialize_req_header(const struct zvfs_req_header *header, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_req_header(const uint8_t *buf, size_t buf_len, struct zvfs_req_header *header); + +size_t zvfs_serialize_resp_header(const struct zvfs_resp_header *header, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_resp_header(const uint8_t *buf, size_t buf_len, struct zvfs_resp_header 
*header); + +/* -------------------- request body 序列化/反序列化 -------------------- */ + +size_t zvfs_serialize_req_create(const struct zvfs_req_create_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_req_create(const uint8_t *buf, size_t buf_len, struct zvfs_req_create_body *body); + +size_t zvfs_serialize_req_open(const struct zvfs_req_open_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_req_open(const uint8_t *buf, size_t buf_len, struct zvfs_req_open_body *body); + +size_t zvfs_serialize_req_read(const struct zvfs_req_read_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_req_read(const uint8_t *buf, size_t buf_len, struct zvfs_req_read_body *body); + +size_t zvfs_serialize_req_write(const struct zvfs_req_write_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_req_write(const uint8_t *buf, size_t buf_len, struct zvfs_req_write_body *body); + +size_t zvfs_serialize_req_resize(const struct zvfs_req_resize_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_req_resize(const uint8_t *buf, size_t buf_len, struct zvfs_req_resize_body *body); + +size_t zvfs_serialize_req_sync_md(const struct zvfs_req_sync_md_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_req_sync_md(const uint8_t *buf, size_t buf_len, struct zvfs_req_sync_md_body *body); + +size_t zvfs_serialize_req_close(const struct zvfs_req_close_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_req_close(const uint8_t *buf, size_t buf_len, struct zvfs_req_close_body *body); + +size_t zvfs_serialize_req_delete(const struct zvfs_req_delete_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_req_delete(const uint8_t *buf, size_t buf_len, struct zvfs_req_delete_body *body); + +size_t zvfs_serialize_req_add_ref(const struct zvfs_req_add_ref_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_req_add_ref(const uint8_t *buf, size_t buf_len, struct 
zvfs_req_add_ref_body *body); + +size_t zvfs_serialize_req_add_ref_batch(const struct zvfs_req_add_ref_batch_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_req_add_ref_batch(const uint8_t *buf, size_t buf_len, struct zvfs_req_add_ref_batch_body *body); + +/* -------------------- response body 序列化/反序列化 -------------------- */ + +size_t zvfs_serialize_resp_create(const struct zvfs_resp_create_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_resp_create(const uint8_t *buf, size_t buf_len, struct zvfs_resp_create_body *body); + +size_t zvfs_serialize_resp_open(const struct zvfs_resp_open_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_resp_open(const uint8_t *buf, size_t buf_len, struct zvfs_resp_open_body *body); + +size_t zvfs_serialize_resp_read(const struct zvfs_resp_read_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_resp_read(const uint8_t *buf, size_t buf_len, struct zvfs_resp_read_body *body); + +size_t zvfs_serialize_resp_write(const struct zvfs_resp_write_body *body, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_resp_write(const uint8_t *buf, size_t buf_len, struct zvfs_resp_write_body *body); + +/* -------------------- 兼容封装 -------------------- */ + +size_t zvfs_serialize_req(struct zvfs_req *req, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_req(uint8_t *buf, size_t buf_len, struct zvfs_req *req); + +size_t zvfs_serialize_resp(struct zvfs_resp *resp, uint8_t *buf, size_t buf_len); +size_t zvfs_deserialize_resp(uint8_t *buf, size_t buf_len, struct zvfs_resp *resp); + +#ifdef __cplusplus +} +#endif + +#endif diff --git a/src/spdk_engine/io_engine.c b/src/spdk_engine/io_engine.c index b2c25ac..737e4f5 100644 --- a/src/spdk_engine/io_engine.c +++ b/src/spdk_engine/io_engine.c @@ -1,850 +1,511 @@ #include "spdk_engine/io_engine.h" -#include "config.h" -#include "common/utils.h" -#include -#include -#include -#include -#include -#include -#include +#include 
"common/config.h" +#include "proto/ipc_proto.h" + #include -#include +#include +#include +#include #include -#include +#include +#include +#include -struct zvfs_spdk_io_engine g_engine = {0}; -static int g_engine_init_rc = -EAGAIN; -static __thread struct zvfs_tls_ctx tls = {0}; -static pthread_once_t g_tls_cleanup_once = PTHREAD_ONCE_INIT; -static pthread_key_t g_tls_cleanup_key; -// 初始化操作上下文 -struct json_load_ctx { - bool done; - int rc; +struct ipc_client_ctx { + int fd; + uint8_t rx_buf[ZVFS_IPC_BUF_SIZE]; + size_t rx_len; }; -struct bs_init_ctx { - bool done; - int rc; - struct spdk_blob_store *bs; +static __thread struct ipc_client_ctx g_ipc_tls = { + .fd = -1, + .rx_len = 0, }; -// metadata 操作通用上下文 -struct md_op_ctx { - void (*fn)(struct md_op_ctx *ctx); - volatile bool done; - int rc; - // op-specific fields - union { - struct { // for create - uint64_t size_hint; - spdk_blob_id blob_id; - } create; - struct { // for open - spdk_blob_id blob_id; - struct spdk_blob *blob; - } open; - struct { // for resize/sync/close - struct zvfs_blob_handle *handle; - uint64_t new_size; // for resize - } handle_op; - struct { // for delete - spdk_blob_id blob_id; - } delete; - }; - char *op_name; -}; - -// IO completion 上下文 -struct io_completion_ctx { - bool done; - int rc; -}; - -struct md_poller_bootstrap_ctx { - const char *bdev_name; - pthread_mutex_t mu; - pthread_cond_t cv; - bool done; - int rc; -}; - -static uint64_t now_mono_ms(void); -static int open_bdev_and_init_bs(const char *bdev_name); -static void ensure_tls_cleanup_key(void); -static void tls_cleanup_destructor(void *arg); - -// metadata poller 线程函数 -static void *md_poller_fn(void *arg) { - struct md_poller_bootstrap_ctx *boot = arg; - - spdk_set_thread(g_engine.md_thread); - tls.thread = g_engine.md_thread; - - int init_rc = open_bdev_and_init_bs(boot->bdev_name); - pthread_mutex_lock(&boot->mu); - boot->rc = init_rc; - boot->done = true; - pthread_cond_signal(&boot->cv); - 
pthread_mutex_unlock(&boot->mu); - - if (init_rc != 0) { - return NULL; +static const char *zvfs_ipc_socket_path(void) { + const char *path = getenv("ZVFS_SOCKET_PATH"); + if (path && path[0] != '\0') { + return path; } - while (true) { - spdk_thread_poll(g_engine.md_thread, 0, 0); - usleep(1000); + path = getenv("ZVFS_IPC_SOCKET_PATH"); + if (path && path[0] != '\0') { + return path; } - return NULL; + + return ZVFS_IPC_DEFAULT_SOCKET_PATH; } -static uint64_t now_mono_ms(void) { - struct timespec ts; - clock_gettime(CLOCK_MONOTONIC, &ts); - return (uint64_t)ts.tv_sec * 1000ULL + (uint64_t)ts.tv_nsec / 1000000ULL; +static void ipc_close_conn(struct ipc_client_ctx *ctx) { + if (ctx->fd >= 0) { + close(ctx->fd); + } + ctx->fd = -1; + ctx->rx_len = 0; } -// 前向声明 -static struct spdk_io_channel *get_current_channel(void); -static int dispatch_md_op(struct md_op_ctx *ctx); -static void md_op_cb(void *arg); -static int load_json_config(void); -static int ensure_engine_ready(const char *op); -static int ensure_current_spdk_thread(const char *op); - - - -// callbacks -static void json_app_load_done(int rc, void *arg); -static void zvfs_spdk_bdev_event_cb(enum spdk_bdev_event_type type, struct spdk_bdev *bdev, void *event_ctx); -static void bs_init_cb(void *arg, struct spdk_blob_store *bs, int bserrno); -static void blob_create_cb(void *arg, spdk_blob_id blobid, int rc); -static void blob_open_cb(void *arg, struct spdk_blob *blob, int rc); -static void blob_resize_cb(void *arg, int rc); -static void blob_sync_md_cb(void *arg, int rc); -static void blob_close_cb(void *arg, int rc); -static void blob_delete_cb(void *arg, int rc); -static void io_completion_cb(void *arg, int rc); - -// op functions on matadata -static void blob_create_on_md(struct md_op_ctx *ctx); -static void blob_open_on_md(struct md_op_ctx *ctx); -static void blob_resize_on_md(struct md_op_ctx *ctx); -static void blob_sync_md_on_md(struct md_op_ctx *ctx); -static void blob_close_on_md(struct md_op_ctx *ctx); 
-static void blob_delete_on_md(struct md_op_ctx *ctx); - -__attribute__((constructor)) static void preload_init(void) { - const char *auto_init = getenv("ZVFS_AUTO_INIT"); - if (!auto_init || strcmp(auto_init, "1") != 0) { - return; +static int ipc_connect(struct ipc_client_ctx *ctx) { + int fd = socket(AF_UNIX, SOCK_STREAM, 0); + if (fd < 0) { + return -1; } - const char *bdev_name = getenv("SPDK_BDEV_NAME") ? getenv("SPDK_BDEV_NAME") : ZVFS_BDEV; - g_engine_init_rc = io_engine_init(bdev_name); - if (g_engine_init_rc != 0) { - SPDK_ERRLOG("io_engine_init failed in constructor: %d\n", g_engine_init_rc); + struct sockaddr_un addr; + memset(&addr, 0, sizeof(addr)); + addr.sun_family = AF_UNIX; + strncpy(addr.sun_path, zvfs_ipc_socket_path(), sizeof(addr.sun_path) - 1); + + if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { + int saved = errno; + close(fd); + errno = saved; + return -1; } + + ctx->fd = fd; + ctx->rx_len = 0; + return 0; } -static int wait_done(bool *done_ptr, int *rc_ptr, const char *op) { - const uint64_t deadline_ms = now_mono_ms() + ZVFS_WAIT_TIME; - while (!*done_ptr) { - if (tls.thread) { - spdk_thread_poll(tls.thread, 0, 0); - }else{ - SPDK_ERRLOG("not init tls.thread\n"); - return -EBADE; +static int ipc_ensure_connected(struct ipc_client_ctx *ctx) { + if (ctx->fd >= 0) { + return 0; + } + return ipc_connect(ctx); +} + +static int write_all(int fd, const uint8_t *buf, size_t len) { + size_t off = 0; + while (off < len) { + ssize_t n = write(fd, buf + off, len - off); + if (n > 0) { + off += (size_t)n; + continue; } - if (now_mono_ms() >= deadline_ms) { - SPDK_ERRLOG("%s timeout\n", op); - return -ETIMEDOUT; + if (n < 0 && errno == EINTR) { + continue; } - } - if (*rc_ptr != 0) { - SPDK_ERRLOG("%s failed in callback: %d\n", op, *rc_ptr); - return *rc_ptr; + if (n == 0) { + errno = EPIPE; + } + return -1; } return 0; } -static int wait_done_volatile(volatile bool *done_ptr, int *rc_ptr, const char *op) { - const uint64_t deadline_ms = 
now_mono_ms() + ZVFS_WAIT_TIME; - bool logged_no_tls = false; - while (!__atomic_load_n(done_ptr, __ATOMIC_ACQUIRE)) { - if (tls.thread) { - spdk_thread_poll(tls.thread, 0, 0); - } else { - /* - * md ops are executed on g_engine.md_thread by md_poller_fn. - * If current worker TLS is not initialized, we still need to wait - * for callback completion; returning early can invalidate stack ctx. - */ - if (!logged_no_tls) { - SPDK_NOTICELOG("%s: tls.thread not initialized, waiting on md thread only\n", op); - logged_no_tls = true; - } - usleep(1000); - } - if (now_mono_ms() >= deadline_ms) { - SPDK_ERRLOG("%s timeout\n", op); - return -ETIMEDOUT; - } - } - - if (*rc_ptr != 0) { - SPDK_ERRLOG("%s failed in callback: %d\n", op, *rc_ptr); - return *rc_ptr; - } - return 0; -} - -int io_engine_init(const char *bdev_name) { - if (g_engine_init_rc == 0 && g_engine.bs != NULL && g_engine.md_thread != NULL) { +static int try_pop_resp(struct ipc_client_ctx *ctx, struct zvfs_resp *resp) { + size_t consumed = zvfs_deserialize_resp(ctx->rx_buf, ctx->rx_len, resp); + if (consumed == 0) { return 0; } - struct spdk_env_opts env_opts; - spdk_env_opts_init(&env_opts); - env_opts.name = "zvfs"; - - - if (spdk_env_init(&env_opts) != 0) { - SPDK_ERRLOG("spdk_env_init failed\n"); - g_engine_init_rc = -1; - return g_engine_init_rc; + if (consumed < ctx->rx_len) { + memmove(ctx->rx_buf, ctx->rx_buf + consumed, ctx->rx_len - consumed); } - - spdk_log_set_print_level(SPDK_LOG_NOTICE); - spdk_log_set_level(SPDK_LOG_NOTICE); - spdk_log_open(NULL); - - if (spdk_thread_lib_init(NULL, 0) != 0) { - SPDK_ERRLOG("spdk_thread_lib_init failed\n"); - g_engine_init_rc = -1; - return g_engine_init_rc; - } - - // 为主线程 lazy init(constructor 在主线程跑) - tls.thread = spdk_thread_create("main_thread", NULL); - if (!tls.thread) { - SPDK_ERRLOG("create main_thread failed\n"); - g_engine_init_rc = -1; - return g_engine_init_rc; - } - spdk_set_thread(tls.thread); - - if (load_json_config() != 0) { - SPDK_ERRLOG("Failed 
to load SPDK config\n"); - g_engine_init_rc = -1; - return g_engine_init_rc; - } - - /** - * 这里是因为要让一个线程专门负责poll - */ - // 创建 md_thread - g_engine.md_thread = spdk_thread_create("md_thread", NULL); - if (!g_engine.md_thread) { - SPDK_ERRLOG("create md_thread failed\n"); - g_engine_init_rc = -1; - return g_engine_init_rc; - } - - struct md_poller_bootstrap_ctx boot = { - .bdev_name = bdev_name, - .done = false, - .rc = 0, - }; - pthread_mutex_init(&boot.mu, NULL); - pthread_cond_init(&boot.cv, NULL); - - // 起专用 poller pthread for md_thread(并在该线程完成 bdev/blobstore 初始化) - pthread_t md_poller_tid; - if (pthread_create(&md_poller_tid, NULL, md_poller_fn, &boot) != 0) { - SPDK_ERRLOG("pthread_create for md_poller failed\n"); - pthread_cond_destroy(&boot.cv); - pthread_mutex_destroy(&boot.mu); - g_engine_init_rc = -1; - return g_engine_init_rc; - } - if (pthread_detach(md_poller_tid) != 0) { - SPDK_ERRLOG("pthread_detach for md_poller failed\n"); - pthread_cond_destroy(&boot.cv); - pthread_mutex_destroy(&boot.mu); - g_engine_init_rc = -1; - return g_engine_init_rc; - } - - pthread_mutex_lock(&boot.mu); - while (!boot.done) { - pthread_cond_wait(&boot.cv, &boot.mu); - } - int rc = boot.rc; - pthread_mutex_unlock(&boot.mu); - pthread_cond_destroy(&boot.cv); - pthread_mutex_destroy(&boot.mu); - - if (rc != 0) { - g_engine_init_rc = rc; - return rc; - } - g_engine_init_rc = 0; - return g_engine_init_rc; + ctx->rx_len -= consumed; + return 1; } -static int load_json_config(void) { - const char *path = getenv("SPDK_JSON_CONFIG"); - if(!path) path = SPDK_JSON_PATH; +static int read_into_rx(struct ipc_client_ctx *ctx) { + while (1) { + if (ctx->rx_len >= sizeof(ctx->rx_buf)) { + errno = EOVERFLOW; + return -1; + } + ssize_t n = read(ctx->fd, ctx->rx_buf + ctx->rx_len, sizeof(ctx->rx_buf) - ctx->rx_len); + if (n > 0) { + ctx->rx_len += (size_t)n; + return 0; + } - struct json_load_ctx ctx = { - .done = false, - .rc = 0 - }; - spdk_subsystem_init_from_json_config(path, 
SPDK_DEFAULT_RPC_ADDR, json_app_load_done, - &ctx, true); - return wait_done(&ctx.done, &ctx.rc, "load_json_config"); + if (n == 0) { + errno = ECONNRESET; + return -1; + } + + if (errno == EINTR) { + continue; + } + + return -1; + } } -// lazy get channel -static struct spdk_io_channel *get_current_channel(void) { - if (ensure_engine_ready("get_current_channel") != 0) { - return NULL; - } +static int recv_one_resp(struct ipc_client_ctx *ctx, struct zvfs_resp *resp_out) { + while (1) { + struct zvfs_resp resp; + memset(&resp, 0, sizeof(resp)); - if (ensure_current_spdk_thread("get_current_channel") != 0) { - return NULL; - } + int has_resp = try_pop_resp(ctx, &resp); + if (has_resp == 1) { + *resp_out = resp; + return 0; + } - if (tls.thread) { - spdk_thread_poll(tls.thread, 0, 0); - } + if (read_into_rx(ctx) != 0) { + return -1; + } - if (!tls.channel) { - tls.channel = spdk_bs_alloc_io_channel(g_engine.bs); - if (!tls.channel) { - SPDK_ERRLOG("alloc io_channel failed\n"); - return NULL; + if (ctx->rx_len == sizeof(ctx->rx_buf)) { + struct zvfs_resp probe; + memset(&probe, 0, sizeof(probe)); + if (zvfs_deserialize_resp(ctx->rx_buf, ctx->rx_len, &probe) == 0) { + errno = EPROTO; + return -1; + } + if (probe.data) { + free(probe.data); + } + } + } - } - } } -static void put_current_channel(struct spdk_io_channel *ch) { - if (!ch) { - return; +static int set_errno_by_status(int status) { + if (status == 0) { + return 0; } - spdk_put_io_channel(ch); - if (tls.thread) { - spdk_thread_poll(tls.thread, 0, 0); + + if (status < 0) { + errno = -status; + } else { + errno = status; } - if (tls.channel == ch) { - tls.channel = NULL; + + if (errno == 0) { + errno = EIO; } + return -1; } -static void ensure_tls_cleanup_key(void) { - (void)pthread_key_create(&g_tls_cleanup_key, tls_cleanup_destructor); +/* per-thread TX scratch buffer: the connection in g_ipc_tls is thread-local, so the send buffer must be too, or concurrent threads would corrupt each other's frames */ +static __thread uint8_t tx[ZVFS_IPC_BUF_SIZE]; + +static int ipc_do_req(struct zvfs_req *req, struct zvfs_resp *resp_out) { + struct ipc_client_ctx *ctx = &g_ipc_tls; + + if
(ipc_ensure_connected(ctx) != 0) { + return -1; + } + + size_t tx_len = zvfs_serialize_req(req, tx, sizeof(tx)); + if (tx_len == 0) { + errno = EMSGSIZE; + return -1; + } + + if (write_all(ctx->fd, tx, tx_len) != 0) { + ipc_close_conn(ctx); + return -1; + } + + if (recv_one_resp(ctx, resp_out) != 0) { + ipc_close_conn(ctx); + return -1; + } + + return set_errno_by_status(resp_out->status); } -static void tls_cleanup_destructor(void *arg) { - (void)arg; - if (!tls.thread || tls.thread == g_engine.md_thread) { - return; - } - - spdk_set_thread(tls.thread); - - if (tls.channel) { - spdk_put_io_channel(tls.channel); - tls.channel = NULL; - } - - spdk_thread_exit(tls.thread); - const uint64_t deadline_ms = now_mono_ms() + ZVFS_WAIT_TIME; - while (!spdk_thread_is_exited(tls.thread)) { - spdk_thread_poll(tls.thread, 0, 0); - if (now_mono_ms() >= deadline_ms) { - SPDK_ERRLOG("worker tls thread exit timeout\n"); - break; - } - usleep(1000); - } - - if (spdk_thread_is_exited(tls.thread)) { - spdk_thread_destroy(tls.thread); - } - tls.thread = NULL; - pthread_setspecific(g_tls_cleanup_key, NULL); -} - -static int ensure_current_spdk_thread(const char *op) { - pthread_once(&g_tls_cleanup_once, ensure_tls_cleanup_key); - - if (!tls.thread) { - char name[32]; - snprintf(name, sizeof(name), "worker_%lu", (unsigned long)pthread_self()); - tls.thread = spdk_thread_create(name, NULL); - if (!tls.thread) { - SPDK_ERRLOG("%s: spdk_thread_create failed\n", op); - return -ENOMEM; - } - pthread_setspecific(g_tls_cleanup_key, (void *)1); - } - spdk_set_thread(tls.thread); +int io_engine_init(void) { return 0; } -// 通用 dispatch md op -static int dispatch_md_op(struct md_op_ctx *ctx) { - int rc = ensure_engine_ready(ctx->op_name ? ctx->op_name : "dispatch_md_op"); - if (rc != 0) { - return rc; - } - rc = ensure_current_spdk_thread(ctx->op_name ? 
ctx->op_name : "dispatch_md_op"); - if (rc != 0) { - return rc; +int blob_create(uint64_t size_hint, uint64_t *blob_id_out, uint64_t *handle_id_out) { + if (!blob_id_out || !handle_id_out) { + errno = EINVAL; + return -1; } - struct md_op_ctx *async_ctx = malloc(sizeof(*async_ctx)); - if (!async_ctx) { - return -ENOMEM; - } - *async_ctx = *ctx; - __atomic_store_n(&async_ctx->done, false, __ATOMIC_RELAXED); - async_ctx->rc = 0; + struct zvfs_req req; + memset(&req, 0, sizeof(req)); + req.opcode = ZVFS_OP_CREATE; + req.size_hint = size_hint; - rc = spdk_thread_send_msg(g_engine.md_thread, md_op_cb, async_ctx); + struct zvfs_resp resp; + memset(&resp, 0, sizeof(resp)); - if (rc != 0) { - SPDK_ERRLOG("%s: spdk_thread_send_msg failed: %d\n", async_ctx->op_name, rc); - free(async_ctx); - return rc; + if (ipc_do_req(&req, &resp) != 0) { + return -1; } - rc = wait_done_volatile(&async_ctx->done, &async_ctx->rc, async_ctx->op_name); - if (rc == -ETIMEDOUT) { - SPDK_ERRLOG("%s timeout; keep async ctx alive to avoid UAF\n", async_ctx->op_name); - return rc; + *blob_id_out = resp.blob_id; + *handle_id_out = resp.handle_id; + + if (resp.data) { + free(resp.data); } - - *ctx = *async_ctx; - free(async_ctx); - return rc; -} - -static int ensure_engine_ready(const char *op) { - if (g_engine_init_rc != 0) { - SPDK_ERRLOG("%s: io engine init failed, rc=%d\n", op, g_engine_init_rc); - return g_engine_init_rc; - } - - if (!g_engine.bs || !g_engine.md_thread) { - SPDK_ERRLOG("%s: io engine not ready (bs=%p, md_thread=%p)\n", - op, (void *)g_engine.bs, (void *)g_engine.md_thread); - return -EIO; - } - return 0; } -static void md_op_cb(void *arg) { - struct md_op_ctx *ctx = arg; - ctx->fn(ctx); -} - -void json_app_load_done(int rc, void *arg) { - struct json_load_ctx* ctx = (struct json_load_ctx*)arg; - ctx->done = true; - ctx->rc = rc; -} - -// bdev open + bs init -static void zvfs_spdk_bdev_event_cb(enum spdk_bdev_event_type type, struct spdk_bdev *bdev, - void *event_ctx) { - // 
后续加日志或处理 - switch (type) { - case SPDK_BDEV_EVENT_REMOVE: - SPDK_NOTICELOG("bdev removed: %s\n", spdk_bdev_get_name(bdev)); - break; - default: - break; - } -} - -static void bs_init_cb(void *arg, struct spdk_blob_store *bs, int bserrno) { - struct bs_init_ctx *ctx = (struct bs_init_ctx *)arg; - ctx->rc = bserrno; - ctx->bs = bs; - ctx->done = true; -} - -static int open_bdev_and_init_bs(const char *bdev_name) { - SPDK_NOTICELOG("open_bdev_and_init_bs\n"); - struct spdk_bs_dev *bs_dev = NULL; - int rc = spdk_bdev_create_bs_dev_ext(bdev_name, zvfs_spdk_bdev_event_cb, NULL, &bs_dev); - if (rc != 0) { - SPDK_ERRLOG("spdk_bdev_create_bs_dev_ext failed: %d\n", rc); - return rc; - } - g_engine.bs_dev = bs_dev; - - struct bs_init_ctx ctx = { - .done = false, - .rc = 0, - .bs = NULL - }; - - /* 优先加载已有 blobstore;失败时回退到 init。 */ - spdk_bs_load(bs_dev, NULL, bs_init_cb, &ctx); - rc = wait_done(&ctx.done, &ctx.rc, "bs_load"); - if (rc != 0) { - SPDK_NOTICELOG("spdk_bs_load failed (%d), fallback to spdk_bs_init\n", rc); - - /* - * 注意:spdk_bs_load 失败路径会销毁传入的 dev。 - * 这里必须重新 create 一个新的 bs_dev,不能复用旧指针。 - */ - bs_dev = NULL; - rc = spdk_bdev_create_bs_dev_ext(bdev_name, zvfs_spdk_bdev_event_cb, NULL, &bs_dev); - if (rc != 0) { - SPDK_ERRLOG("spdk_bdev_create_bs_dev_ext(for init fallback) failed: %d\n", rc); - g_engine.bs_dev = NULL; - return rc; - } - g_engine.bs_dev = bs_dev; - - ctx.done = false; - ctx.rc = 0; - ctx.bs = NULL; - - spdk_bs_init(bs_dev, NULL, bs_init_cb, &ctx); - rc = wait_done(&ctx.done, &ctx.rc, "bs_init"); - if (rc != 0) { - g_engine.bs_dev = NULL; - return rc; - } +int blob_open(uint64_t blob_id, uint64_t *handle_id_out) { + if (!handle_id_out) { + errno = EINVAL; + return -1; } - g_engine.bs = ctx.bs; - g_engine.io_unit_size = spdk_bs_get_io_unit_size(ctx.bs); - g_engine.cluster_size = spdk_bs_get_cluster_size(ctx.bs); + struct zvfs_req req; + memset(&req, 0, sizeof(req)); + req.opcode = ZVFS_OP_OPEN; + req.blob_id = blob_id; - SPDK_NOTICELOG("Blobstore 
initialized successfully on bdev: %s\n", bdev_name); + struct zvfs_resp resp; + memset(&resp, 0, sizeof(resp)); + + if (ipc_do_req(&req, &resp) != 0) { + return -1; + } + + *handle_id_out = resp.handle_id; + + if (resp.data) { + free(resp.data); + } return 0; } -// blob_create -static void blob_create_cb(void *arg, spdk_blob_id blobid, int rc) { - struct md_op_ctx *ctx = arg; - ctx->rc = rc; - ctx->create.blob_id = blobid; - __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE); -} - -static void blob_create_on_md(struct md_op_ctx *ctx) { - struct spdk_blob_opts opts; - spdk_blob_opts_init(&opts, sizeof(opts)); - // size_hint 如果需,但 create 不直接 set size,用 resize 后 - spdk_bs_create_blob_ext(g_engine.bs, &opts, blob_create_cb, ctx); -} - -struct zvfs_blob_handle *blob_create(uint64_t size_hint) { - if(size_hint == 0) size_hint = g_engine.cluster_size; - struct md_op_ctx ctx = {.fn = blob_create_on_md, .create.size_hint = size_hint, .op_name = "blob create"}; - int rc = dispatch_md_op(&ctx); - if (rc) { - errno = (rc < 0) ? -rc : EIO; - return NULL; +int blob_write_ex(uint64_t handle_id, uint64_t offset, const void *buf, size_t len, uint32_t write_flags) { + if (len == 0) { + return 0; + } + if (!buf || handle_id == 0) { + errno = EINVAL; + return -1; } - struct zvfs_blob_handle *handle = blob_open(ctx.create.blob_id); - if (handle && size_hint > 0) { - rc = blob_resize(handle, size_hint); // 初始 resize - if (rc != 0) { - SPDK_ERRLOG("blob_resize failed after create: %d\n", rc); - errno = (rc < 0) ? 
-rc : EIO; - blob_close(handle); - return NULL; + struct zvfs_req req; + memset(&req, 0, sizeof(req)); + req.opcode = ZVFS_OP_WRITE; + req.handle_id = handle_id; + req.offset = offset; + req.length = (uint64_t)len; + req.write_flags = write_flags; + req.data = (void *)buf; + + struct zvfs_resp resp; + memset(&resp, 0, sizeof(resp)); + + if (ipc_do_req(&req, &resp) != 0) { + return -1; + } + + if (resp.bytes_written != (uint64_t)len) { + if (resp.data) { + free(resp.data); } + errno = EIO; + return -1; + } - rc = blob_sync_md(handle); - if (rc != 0) { - SPDK_ERRLOG("blob_sync_md failed after resize: %d\n", rc); - errno = (rc < 0) ? -rc : EIO; - blob_close(handle); - return NULL; + if (resp.data) { + free(resp.data); + } + return 0; +} + +int blob_write(uint64_t handle_id, uint64_t offset, const void *buf, size_t len) { + return blob_write_ex(handle_id, offset, buf, len, 0); +} + +int blob_read(uint64_t handle_id, uint64_t offset, void *buf, size_t len) { + if (len == 0) { + return 0; + } + if (!buf || handle_id == 0) { + errno = EINVAL; + return -1; + } + + struct zvfs_req req; + memset(&req, 0, sizeof(req)); + req.opcode = ZVFS_OP_READ; + req.handle_id = handle_id; + req.offset = offset; + req.length = (uint64_t)len; + + struct zvfs_resp resp; + memset(&resp, 0, sizeof(resp)); + + if (ipc_do_req(&req, &resp) != 0) { + return -1; + } + + if (!resp.data || resp.length != (uint64_t)len) { + if (resp.data) { + free(resp.data); } - } - return handle; -} - -// blob_open -static void blob_open_cb(void *arg, struct spdk_blob *blob, int rc) { - struct md_op_ctx *ctx = arg; - ctx->rc = rc; - ctx->open.blob = blob; - __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE); -} - -static void blob_open_on_md(struct md_op_ctx *ctx) { - struct spdk_blob_open_opts opts; - spdk_blob_open_opts_init(&opts, sizeof(opts)); - spdk_bs_open_blob_ext(g_engine.bs, ctx->open.blob_id, &opts, blob_open_cb, ctx); -} - -struct zvfs_blob_handle *blob_open(uint64_t blob_id) { - struct md_op_ctx ctx = 
{.fn = blob_open_on_md, .open.blob_id = blob_id, .op_name = "blob open"}; - int rc = dispatch_md_op(&ctx); - if (rc) { - errno = (rc < 0) ? -rc : EIO; - return NULL; + errno = EPROTO; + return -1; } - struct zvfs_blob_handle *handle = malloc(sizeof(*handle)); - if (!handle) return NULL; - - handle->id = blob_id; - handle->blob = ctx.open.blob; - handle->size = spdk_blob_get_num_clusters(handle->blob) * g_engine.cluster_size; - - // 预分配固定大小的 DMA buf,后续所有 IO 都经过这块缓存,避免每次 IO 动态申请 - // 必须用 spdk_dma_malloc 保证地址对齐到 io_unit_size - handle->dma_buf_size = ZVFS_DMA_BUF_SIZE; - handle->dma_buf = spdk_dma_malloc(ZVFS_DMA_BUF_SIZE, g_engine.io_unit_size, NULL); - if (!handle->dma_buf) { - SPDK_ERRLOG("spdk_dma_malloc failed for blob %lu\n", blob_id); - free(handle); - return NULL; - } - - return handle; + memcpy(buf, resp.data, len); + free(resp.data); + return 0; } -// blob_write -static void io_completion_cb(void *arg, int rc) { - struct io_completion_ctx *ctx = arg; - ctx->rc = rc; - ctx->done = true; +int blob_resize(uint64_t handle_id, uint64_t new_size) { + if (handle_id == 0) { + errno = EINVAL; + return -1; + } + + struct zvfs_req req; + memset(&req, 0, sizeof(req)); + req.opcode = ZVFS_OP_RESIZE; + req.handle_id = handle_id; + req.size_hint = new_size; + + struct zvfs_resp resp; + memset(&resp, 0, sizeof(resp)); + + if (ipc_do_req(&req, &resp) != 0) { + return -1; + } + + if (resp.data) { + free(resp.data); + } + return 0; } -int blob_write(struct zvfs_blob_handle *handle, uint64_t offset, const void *buf, size_t len) { - if (tls.thread) { - spdk_thread_poll(tls.thread, 0, 0); +int blob_sync_md(uint64_t handle_id) { + if (handle_id == 0) { + errno = EINVAL; + return -1; } - if (len == 0) return 0; + struct zvfs_req req; + memset(&req, 0, sizeof(req)); + req.opcode = ZVFS_OP_SYNC_MD; + req.handle_id = handle_id; - struct spdk_io_channel *ch = get_current_channel(); - if (!ch) return -1; - int ret = 0; + struct zvfs_resp resp; + memset(&resp, 0, sizeof(resp)); - // 越界检查 
- if (offset + len > handle->size) { - SPDK_ERRLOG("blob_write out of range: offset=%lu len=%zu blob_size=%lu\n", - offset, len, handle->size); - ret = -ERANGE; - goto out; + if (ipc_do_req(&req, &resp) != 0) { + return -1; } - // 计算对齐后的 IO 范围和 dma_buf 内偏移 - uint64_t lba_off = 0; - uint64_t lba_len = 0; - uint32_t buf_off = 0; - int rc = zvfs_calc_io_units(offset, len, g_engine.io_unit_size, &lba_off, &lba_len, &buf_off); - if (rc != 0) { - SPDK_ERRLOG("blob_write calc_io_units failed: %d\n", rc); - ret = rc; - goto out; + if (resp.data) { + free(resp.data); } - - size_t aligned_bytes = lba_len * g_engine.io_unit_size; - if (aligned_bytes > ZVFS_DMA_BUF_SIZE) { - SPDK_ERRLOG("blob_write aligned_bytes=%zu exceeds ZVFS_DMA_BUF_SIZE\n", aligned_bytes); - ret = -ENOSPC; - goto out; - } - - struct io_completion_ctx io_ctx = {.done = false, .rc = 0}; - - spdk_blob_io_read(handle->blob, ch, handle->dma_buf, lba_off, lba_len, - io_completion_cb, &io_ctx); - - - rc = wait_done(&io_ctx.done, &io_ctx.rc, "io_write(read phase)"); - if (rc != 0) { - ret = rc; - goto out; - } - - memcpy((uint8_t *)handle->dma_buf + buf_off, buf, len); - io_ctx.done = false; - io_ctx.rc = 0; - - spdk_blob_io_write(handle->blob, ch, handle->dma_buf, lba_off, lba_len, - io_completion_cb, &io_ctx); - rc = wait_done(&io_ctx.done, &io_ctx.rc, "io_write(write phase)"); - if (rc != 0) { - ret = rc; - goto out; - } - - ret = io_ctx.rc; -out: - put_current_channel(ch); - return ret; + return 0; } -// blob_read 类似 -int blob_read(struct zvfs_blob_handle *handle, uint64_t offset, void *buf, size_t len) { - if (tls.thread) { - spdk_thread_poll(tls.thread, 0, 0); +int blob_close(uint64_t handle_id) { + if (handle_id == 0) { + errno = EINVAL; + return -1; } - if (len == 0) return 0; + struct zvfs_req req; + memset(&req, 0, sizeof(req)); + req.opcode = ZVFS_OP_CLOSE; + req.handle_id = handle_id; - struct spdk_io_channel *ch = get_current_channel(); - if (!ch) return -1; - int ret = 0; + struct zvfs_resp resp; + 
memset(&resp, 0, sizeof(resp)); - // 越界检查 - if (offset + len > handle->size) { - SPDK_ERRLOG("blob_read out of range: offset=%lu len=%zu blob_size=%lu\n", - offset, len, handle->size); - ret = -ERANGE; - goto out; + if (ipc_do_req(&req, &resp) != 0) { + return -1; } - - // 计算对齐后的 IO 范围和 dma_buf 内偏移 - uint64_t lba_off = 0; - uint64_t lba_len = 0; - uint32_t buf_off = 0; - int rc = zvfs_calc_io_units(offset, len, g_engine.io_unit_size, &lba_off, &lba_len, &buf_off); - if (rc != 0) { - SPDK_ERRLOG("io_read offset/len not aligned to io_unit_size=%lu\n", g_engine.io_unit_size); - ret = rc; - goto out; + if (resp.data) { + free(resp.data); } - - // 读入对齐范围到 dma_buf,再从正确偏移处截取到用户 buf - size_t aligned_bytes = lba_len * g_engine.io_unit_size; - if (aligned_bytes > ZVFS_DMA_BUF_SIZE) { - SPDK_ERRLOG("blob_read aligned_bytes=%zu exceeds ZVFS_DMA_BUF_SIZE\n", aligned_bytes); - ret = -ENOSPC; - goto out; - } - - struct io_completion_ctx io_ctx = {.done = false, .rc = 0}; - - spdk_blob_io_read(handle->blob, ch, handle->dma_buf, lba_off, lba_len, - io_completion_cb, &io_ctx); - - rc = wait_done(&io_ctx.done, &io_ctx.rc, "io_read"); - if (rc != 0) { - ret = rc; - goto out; - } - - memcpy(buf, (uint8_t *)handle->dma_buf + buf_off, len); - ret = io_ctx.rc; -out: - put_current_channel(ch); - return ret; -} - -// blob_resize -static void blob_resize_cb(void *arg, int rc) { - struct md_op_ctx *ctx = arg; - ctx->rc = rc; - __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE); -} - -static void blob_resize_on_md(struct md_op_ctx *ctx) { - uint64_t new_clusters = 0; - uint64_t cluster_size = g_engine.cluster_size; - int rc = zvfs_calc_ceil_units(ctx->handle_op.new_size, cluster_size, &new_clusters); - if (rc != 0) { - ctx->rc = rc; - __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE); - return; - } - spdk_blob_resize(ctx->handle_op.handle->blob, new_clusters, blob_resize_cb, ctx); -} - -int blob_resize(struct zvfs_blob_handle *handle, uint64_t new_size) { - struct md_op_ctx ctx = {.fn = 
blob_resize_on_md, .op_name = "blob resize"};
-    ctx.handle_op.handle = handle;
-    ctx.handle_op.new_size = new_size;
-    int rc = dispatch_md_op(&ctx);
-    if (rc == 0) {
-        uint64_t new_clusters = 0;
-        zvfs_calc_ceil_units(new_size, g_engine.cluster_size, &new_clusters);
-        handle->size = new_clusters * g_engine.cluster_size;
-    }
-    return rc;
-}
-
-// blob_sync_md
-static void blob_sync_md_cb(void *arg, int rc) {
-    struct md_op_ctx *ctx = arg;
-    ctx->rc = rc;
-    __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE);
-}
-
-static void blob_sync_md_on_md(struct md_op_ctx *ctx) {
-    spdk_blob_sync_md(ctx->handle_op.handle->blob, blob_sync_md_cb, ctx);
-}
-
-int blob_sync_md(struct zvfs_blob_handle *handle) {
-    struct md_op_ctx ctx = {.fn = blob_sync_md_on_md, .op_name = "blob sync"};
-    ctx.handle_op.handle = handle;
-    return dispatch_md_op(&ctx);
-}
-
-// blob_close
-static void blob_close_cb(void *arg, int rc) {
-    struct md_op_ctx *ctx = arg;
-    ctx->rc = rc;
-    __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE);
-}
-
-static void blob_close_on_md(struct md_op_ctx *ctx) {
-    spdk_blob_close(ctx->handle_op.handle->blob, blob_close_cb, ctx);
-}
-
-int blob_close(struct zvfs_blob_handle *handle) {
-    struct md_op_ctx ctx = {.fn = blob_close_on_md, .op_name = "blob close"};
-    ctx.handle_op.handle = handle;
-    int rc = dispatch_md_op(&ctx);
-    if (rc == 0) {
-        spdk_dma_free(handle->dma_buf);
-        free(handle);
-    }
-    return rc;
-}
-
-// blob_delete
-static void blob_delete_cb(void *arg, int rc) {
-    struct md_op_ctx *ctx = arg;
-    ctx->rc = rc;
-    __atomic_store_n(&ctx->done, true, __ATOMIC_RELEASE);
-}
-
-static void blob_delete_on_md(struct md_op_ctx *ctx) {
-    spdk_bs_delete_blob(g_engine.bs, ctx->delete.blob_id, blob_delete_cb, ctx);
+    return 0;
 }
 
 int blob_delete(uint64_t blob_id) {
-    struct md_op_ctx ctx = {.fn = blob_delete_on_md, .op_name = "blob delete"};
-    ctx.delete.blob_id = blob_id;
-    return dispatch_md_op(&ctx);
+    struct zvfs_req req;
+    memset(&req, 0, sizeof(req));
+    req.opcode = ZVFS_OP_DELETE;
+    req.blob_id = blob_id;
+
+    struct zvfs_resp resp;
+    memset(&resp, 0, sizeof(resp));
+
+    if (ipc_do_req(&req, &resp) != 0) {
+        return -1;
+    }
+
+    if (resp.data) {
+        free(resp.data);
+    }
+    return 0;
+}
+
+int blob_add_ref(uint64_t handle_id, uint32_t ref_delta) {
+    if (handle_id == 0 || ref_delta == 0) {
+        errno = EINVAL;
+        return -1;
+    }
+
+    struct zvfs_req req;
+    memset(&req, 0, sizeof(req));
+    req.opcode = ZVFS_OP_ADD_REF;
+    req.handle_id = handle_id;
+    req.ref_delta = ref_delta;
+
+    struct zvfs_resp resp;
+    memset(&resp, 0, sizeof(resp));
+
+    if (ipc_do_req(&req, &resp) != 0) {
+        return -1;
+    }
+
+    if (resp.data) {
+        free(resp.data);
+    }
+    return 0;
+}
+
+int blob_add_ref_batch(const uint64_t *handle_ids, const uint32_t *ref_deltas, uint32_t count) {
+    uint32_t i;
+    struct zvfs_add_ref_item *items = NULL;
+
+    if (!handle_ids || !ref_deltas || count == 0) {
+        errno = EINVAL;
+        return -1;
+    }
+
+    items = calloc(count, sizeof(*items));
+    if (!items) {
+        errno = ENOMEM;
+        return -1;
+    }
+
+    for (i = 0; i < count; i++) {
+        if (handle_ids[i] == 0 || ref_deltas[i] == 0) {
+            free(items);
+            errno = EINVAL;
+            return -1;
+        }
+        items[i].handle_id = handle_ids[i];
+        items[i].ref_delta = ref_deltas[i];
+    }
+
+    struct zvfs_req req;
+    memset(&req, 0, sizeof(req));
+    req.opcode = ZVFS_OP_ADD_REF_BATCH;
+    req.add_ref_count = count;
+    req.add_ref_items = items;
+
+    struct zvfs_resp resp;
+    memset(&resp, 0, sizeof(resp));
+
+    if (ipc_do_req(&req, &resp) != 0) {
+        free(items);
+        return -1;
+    }
+
+    free(items);
+    if (resp.data) {
+        free(resp.data);
+    }
+    return 0;
+}
diff --git a/src/spdk_engine/io_engine.h b/src/spdk_engine/io_engine.h
index c5a80d0..8fbcece 100644
--- a/src/spdk_engine/io_engine.h
+++ b/src/spdk_engine/io_engine.h
@@ -2,42 +2,20 @@
 #define __ZVFS_IO_ENGINE_H__
 
 #include <stdint.h>
-#include "spdk/blob.h"
-#include "spdk/thread.h"
+#include <stddef.h>
 
-// blob_handle: low-level blob info; the file-level size is maintained by the upper layer
-typedef struct zvfs_blob_handle {
-    spdk_blob_id id;
-    struct spdk_blob *blob;
-    uint64_t size;
-    void *dma_buf;
-    uint64_t dma_buf_size;
-} zvfs_blob_handle_t;
+int io_engine_init(void);
 
-typedef struct zvfs_spdk_io_engine {
-    struct spdk_bs_dev *bs_dev;
-    struct spdk_blob_store *bs;
-    struct spdk_thread *md_thread;
-    uint64_t io_unit_size;
-    uint64_t cluster_size;
-    int reactor_count;
-
-} zvfs_spdk_io_engine_t;
-
-typedef struct zvfs_tls_ctx {
-    struct spdk_thread *thread;
-    struct spdk_io_channel *channel;
-} zvfs_tls_ctx_t;
-
-int io_engine_init(const char *bdev_name);
-
-struct zvfs_blob_handle *blob_create(uint64_t size_hint); // create and open, return a handle
-struct zvfs_blob_handle *blob_open(uint64_t blob_id);     // open an existing blob, return a handle
-int blob_write(struct zvfs_blob_handle *handle, uint64_t offset, const void *buf, size_t len);
-int blob_read(struct zvfs_blob_handle *handle, uint64_t offset, void *buf, size_t len);
-int blob_resize(struct zvfs_blob_handle *handle, uint64_t new_size);
-int blob_sync_md(struct zvfs_blob_handle *handle);
-int blob_close(struct zvfs_blob_handle *handle);          // close this handle's blob
-int blob_delete(uint64_t blob_id);                        // delete the whole blob (no handle needed)
+int blob_create(uint64_t size_hint, uint64_t *blob_id_out, uint64_t *handle_id_out);
+int blob_open(uint64_t blob_id, uint64_t *handle_id_out);
+int blob_write_ex(uint64_t handle_id, uint64_t offset, const void *buf, size_t len, uint32_t write_flags);
+int blob_write(uint64_t handle_id, uint64_t offset, const void *buf, size_t len);
+int blob_read(uint64_t handle_id, uint64_t offset, void *buf, size_t len);
+int blob_resize(uint64_t handle_id, uint64_t new_size);
+int blob_sync_md(uint64_t handle_id);
+int blob_close(uint64_t handle_id);
+int blob_delete(uint64_t blob_id);
+int blob_add_ref(uint64_t handle_id, uint32_t ref_delta);
+int blob_add_ref_batch(const uint64_t *handle_ids, const uint32_t *ref_deltas, uint32_t count);
 
 #endif // __ZVFS_IO_ENGINE_H__
diff --git a/src/zvfsmalloc.json b/src/zvfsmalloc.json
index 69925d7..e761e99 100755
--- a/src/zvfsmalloc.json
+++ b/src/zvfsmalloc.json
@@ -7,7 +7,7 @@
       "method": "bdev_malloc_create",
       "params": {
         "name": "Malloc0",
-        "num_blocks": 262140,
+        "num_blocks": 1048576,
         "block_size": 512
       }
     }
diff --git a/tests/Makefile b/tests/Makefile
index 765f5f2..6a4f4b4 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -1,4 +1,4 @@
-SUBDIRS := ioengine_test hook
+SUBDIRS := hook_test daemon_test
 
 .PHONY: all clean $(SUBDIRS)
diff --git a/tests/daemon_test/Makefile b/tests/daemon_test/Makefile
new file mode 100644
index 0000000..830872f
--- /dev/null
+++ b/tests/daemon_test/Makefile
@@ -0,0 +1,12 @@
+
+BIN_DIR := $(abspath $(CURDIR)/../bin)
+PROTO_DIR := $(abspath $(CURDIR)/../../src/proto)
+
+CFLAGS := -I$(abspath $(CURDIR)/../../src)
+
+all:
+	gcc -g -o $(BIN_DIR)/ipc_echo_test ipc_echo_test.c
+	gcc -g $(CFLAGS) -o $(BIN_DIR)/ipc_zvfs_test ipc_zvfs_test.c $(PROTO_DIR)/ipc_proto.c
+
+clean:
+	rm -rf $(BIN_DIR)/ipc_echo_test $(BIN_DIR)/ipc_zvfs_test
\ No newline at end of file
diff --git a/tests/daemon_test/ipc_echo_test.c b/tests/daemon_test/ipc_echo_test.c
new file mode 100644
index 0000000..cb8a9c3
--- /dev/null
+++ b/tests/daemon_test/ipc_echo_test.c
@@ -0,0 +1,33 @@
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+
+
+int main()
+{
+    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
+
+    struct sockaddr_un addr;
+
+    memset(&addr, 0, sizeof(addr));
+    addr.sun_family = AF_UNIX;
+    strcpy(addr.sun_path, "/tmp/zvfs.sock");
+
+    connect(fd, (struct sockaddr*)&addr, sizeof(addr));
+
+    char *msg = "hello reactor\n";
+
+    write(fd, msg, strlen(msg));
+
+    char buf[4096];
+
+    int n = read(fd, buf, sizeof(buf));
+
+    printf("echo: %.*s\n", n, buf);
+
+    close(fd);
+
+    return 0;
+}
\ No newline at end of file
diff --git a/tests/daemon_test/ipc_zvfs_test.c b/tests/daemon_test/ipc_zvfs_test.c
new file mode 100644
index 0000000..61b71e9
--- /dev/null
+++ b/tests/daemon_test/ipc_zvfs_test.c
@@ -0,0 +1,265 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include "proto/ipc_proto.h"
+
+#define SOCKET_PATH "/tmp/zvfs.sock"
+#define BUF_SIZE 4096
+
+int connect_to_server() {
+    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
+    if (fd < 0) {
+        perror("socket");
+        return -1;
+    }
+
+    struct sockaddr_un addr;
+    memset(&addr, 0, sizeof(addr));
+    addr.sun_family = AF_UNIX;
+    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path)-1);
+
+    if (connect(fd, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
+        perror("connect");
+        close(fd);
+        return -1;
+    }
+
+    return fd;
+}
+
+// -------------------- operation helpers --------------------
+
+void do_create(int fd) {
+    struct zvfs_req req;
+    memset(&req, 0, sizeof(req));
+    req.opcode = ZVFS_OP_CREATE;
+    req.size_hint = 1024; // 1KB
+
+    uint8_t buf[BUF_SIZE];
+    size_t n = zvfs_serialize_req(&req, buf, sizeof(buf));
+    if (n == 0) { fprintf(stderr,"serialize failed\n"); return; }
+
+    if (write(fd, buf, n) != n) { perror("write"); return; }
+
+    uint8_t resp_buf[BUF_SIZE];
+    ssize_t r = read(fd, resp_buf, sizeof(resp_buf));
+    if (r <= 0) { perror("read"); return; }
+
+    struct zvfs_resp resp;
+    memset(&resp, 0, sizeof(resp));
+    size_t consumed = zvfs_deserialize_resp(resp_buf, r, &resp);
+    if (consumed == 0) { fprintf(stderr, "deserialize failed\n"); return; }
+
+    printf("Received CREATE response: status=%d, blob_id=%lu, handle_id=%lu\n",
+           resp.status, resp.blob_id, resp.handle_id);
+
+    if(resp.data) free(resp.data);
+}
+
+void do_open(int fd, uint64_t blob_id) {
+    struct zvfs_req req;
+    memset(&req,0,sizeof(req));
+    req.opcode = ZVFS_OP_OPEN;
+    req.blob_id = blob_id;
+
+    uint8_t buf[BUF_SIZE];
+    size_t n = zvfs_serialize_req(&req, buf, sizeof(buf));
+    if (n == 0) { fprintf(stderr,"serialize failed\n"); return; }
+
+    if (write(fd, buf, n) != n) { perror("write"); return; }
+
+    uint8_t resp_buf[BUF_SIZE];
+    ssize_t r = read(fd, resp_buf, sizeof(resp_buf));
+    if (r <= 0) { perror("read"); return; }
+
+    struct zvfs_resp resp;
+    memset(&resp,0,sizeof(resp));
+    size_t consumed = zvfs_deserialize_resp(resp_buf, r, &resp);
+    if (consumed == 0) { fprintf(stderr, "deserialize failed\n"); return; }
+
+    printf("Received OPEN response: status=%d, handle_id=%lu, size=%lu\n",
+           resp.status, resp.handle_id, resp.size);
+
+    if(resp.data) free(resp.data);
+}
+
+void do_read(int fd, uint64_t handle_id, uint64_t offset, uint64_t length) {
+    struct zvfs_req req;
+    memset(&req,0,sizeof(req));
+    req.opcode = ZVFS_OP_READ;
+    req.handle_id = handle_id;
+    req.offset = offset;
+    req.length = length;
+
+    uint8_t buf[BUF_SIZE];
+    size_t n = zvfs_serialize_req(&req, buf, sizeof(buf));
+    if (n == 0) { fprintf(stderr,"serialize failed\n"); return; }
+
+    if (write(fd, buf, n) != n) { perror("write"); return; }
+
+    uint8_t resp_buf[BUF_SIZE];
+    ssize_t r = read(fd, resp_buf, sizeof(resp_buf));
+    if (r <= 0) { perror("read"); return; }
+
+    struct zvfs_resp resp;
+    memset(&resp,0,sizeof(resp));
+    size_t consumed = zvfs_deserialize_resp(resp_buf, r, &resp);
+    if (consumed == 0) { fprintf(stderr, "deserialize failed\n"); return; }
+
+    printf("Received READ response: status=%d, length=%lu\n",
+           resp.status, resp.length);
+
+    if(resp.data) {
+        printf("Data: ");
+        for(size_t i=0;i\n read \n write \n writeg \n close \n delete \n resize \n quit\n");
+
+    char line[256];
+    while (1) {
+        printf("> ");
+        if(!fgets(line, sizeof(line), stdin)) break;
+
+        char cmd[32];
+        uint64_t a,b,c;
+        char data[128];
+
+        if (sscanf(line, "%31s", cmd) != 1) continue;
+
+        if (strcmp(cmd,"quit")==0) break;
+        else if (strcmp(cmd,"create")==0) do_create(fd);
+        else if (strcmp(cmd,"open")==0 && sscanf(line,"%*s %lu",&a)==1) do_open(fd,a);
+        else if (strcmp(cmd,"read")==0 && sscanf(line,"%*s %lu %lu %lu",&a,&b,&c)==3) do_read(fd,a,b,c);
+        else if (strcmp(cmd,"write")==0 && sscanf(line,"%*s %lu %lu %127s",&a,&b,data)==3)
+            do_write(fd, a, b, data, strlen(data), 0);
+        else if (strcmp(cmd,"writeg")==0 && sscanf(line,"%*s %lu %lu %127s",&a,&b,data)==3)
+            do_write(fd, a, b, data, strlen(data), ZVFS_WRITE_F_AUTO_GROW);
+        else if (strcmp(cmd,"close")==0 && sscanf(line,"%*s %lu",&a)==1) do_close(fd,a);
+        else if (strcmp(cmd,"delete")==0 && sscanf(line,"%*s %lu",&a)==1) do_delete(fd,a);
+        else if (strcmp(cmd,"resize")==0 && sscanf(line,"%*s %lu %lu",&a,&b)==2) do_resize(fd,a,b);
+        else printf("Unknown or invalid command\n");
+    }
+
+    close(fd);
+    return 0;
+}
diff --git a/tests/hook/Makefile b/tests/hook_test/Makefile
similarity index 100%
rename from tests/hook/Makefile
rename to tests/hook_test/Makefile
diff --git a/tests/hook/hook_api_test.c b/tests/hook_test/hook_api_test.c
similarity index 100%
rename from tests/hook/hook_api_test.c
rename to tests/hook_test/hook_api_test.c
diff --git a/tests/ioengine_test/Makefile b/tests/ioengine_test/Makefile
deleted file mode 100644
index d47dc0d..0000000
--- a/tests/ioengine_test/Makefile
+++ /dev/null
@@ -1,43 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-
-SPDK_ROOT_DIR := $(abspath $(CURDIR)/../../spdk)
-include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
-include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
-include $(SPDK_ROOT_DIR)/mk/spdk.app_vars.mk
-
-# output directory
-BIN_DIR := $(abspath $(CURDIR)/../bin)
-
-TEST_BINS := \
-	ioengine_single_blob_test \
-	ioengine_multi_blob_test \
-	ioengine_same_blob_mt_test
-
-COMMON_SRCS := \
-	test_common.c \
-	../../src/spdk_engine/io_engine.c \
-	../../src/common/utils.c
-
-SPDK_LIB_LIST = $(ALL_MODULES_LIST) event event_bdev
-LIBS += $(SPDK_LIB_LINKER_ARGS)
-
-CFLAGS += -I$(abspath $(CURDIR)/../../src) -I$(CURDIR)
-
-.PHONY: all clean
-all: $(BIN_DIR) $(addprefix $(BIN_DIR)/,$(TEST_BINS))
-
-# create the bin directory
-$(BIN_DIR):
-	mkdir -p $(BIN_DIR)
-
-$(BIN_DIR)/ioengine_single_blob_test: ioengine_single_blob_test.c $(COMMON_SRCS) $(SPDK_LIB_FILES) $(ENV_LIBS)
-	$(CC) $(CFLAGS) -o $@ $< $(COMMON_SRCS) $(LDFLAGS) $(LIBS) $(ENV_LDFLAGS) $(SYS_LIBS)
-
-$(BIN_DIR)/ioengine_multi_blob_test: ioengine_multi_blob_test.c $(COMMON_SRCS) $(SPDK_LIB_FILES) $(ENV_LIBS)
-	$(CC) $(CFLAGS) -o $@ $< $(COMMON_SRCS) $(LDFLAGS) $(LIBS) $(ENV_LDFLAGS) $(SYS_LIBS)
-
-$(BIN_DIR)/ioengine_same_blob_mt_test: ioengine_same_blob_mt_test.c $(COMMON_SRCS) $(SPDK_LIB_FILES) $(ENV_LIBS)
-	$(CC) $(CFLAGS) -o $@ $< $(COMMON_SRCS) $(LDFLAGS) $(LIBS) $(ENV_LDFLAGS) $(SYS_LIBS)
-
-clean:
-	rm -f $(addprefix $(BIN_DIR)/,$(TEST_BINS))
\ No newline at end of file
diff --git a/tests/ioengine_test/ioengine_multi_blob_test.c b/tests/ioengine_test/ioengine_multi_blob_test.c
deleted file mode 100644
index 9d8118f..0000000
--- a/tests/ioengine_test/ioengine_multi_blob_test.c
+++ /dev/null
@@ -1,106 +0,0 @@
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <stdint.h>
-
-#include "spdk_engine/io_engine.h"
-#include "test_common.h"
-
-#define MULTI_BLOB_COUNT 3
-
-int main(void) {
-    int rc = 0;
-    const char *bdev_name = getenv("SPDK_BDEV_NAME");
-    struct zvfs_blob_handle *handles[MULTI_BLOB_COUNT] = {0};
-    uint64_t ids[MULTI_BLOB_COUNT] = {0};
-    uint64_t cluster = 0;
-    void *wbuf = NULL;
-    void *rbuf = NULL;
-    int i = 0;
-
-    if (!bdev_name) {
-        bdev_name = "Malloc0";
-    }
-    if (io_engine_init(bdev_name) != 0) {
-        fprintf(stderr, "TEST2: io_engine_init failed (bdev=%s)\n", bdev_name);
-        return 1;
-    }
-
-    printf("[TEST2] single thread / multi blob\n");
-
-    handles[0] = blob_create(0);
-    if (!handles[0]) {
-        fprintf(stderr, "TEST2: create first blob failed\n");
-        return 1;
-    }
-    ids[0] = handles[0]->id;
-    cluster = handles[0]->size;
-    if (cluster == 0) {
-        fprintf(stderr, "TEST2: invalid cluster size\n");
-        rc = 1;
-        goto out;
-    }
-    if (blob_resize(handles[0], cluster * 2) != 0) {
-        fprintf(stderr, "TEST2: resize first blob failed\n");
-        rc = 1;
-        goto out;
-    }
-
-    for (i = 1; i < MULTI_BLOB_COUNT; i++) {
-        handles[i] = blob_create(cluster * 2);
-        if (!handles[i]) {
-            fprintf(stderr, "TEST2: create blob %d failed\n", i);
-            rc = 1;
-            goto out;
-        }
-        ids[i] = handles[i]->id;
-    }
-
-    if (alloc_aligned_buf(&wbuf, cluster) != 0 || alloc_aligned_buf(&rbuf, cluster) != 0) {
-        fprintf(stderr, "TEST2: alloc aligned buffer failed\n");
-        rc = 1;
-        goto out;
-    }
-
-    for (i = 0; i < MULTI_BLOB_COUNT; i++) {
-        fill_pattern((uint8_t *)wbuf, cluster, (uint8_t)(0x20 + i));
-        memset(rbuf, 0, cluster);
-
-        if (blob_write(handles[i], 0, wbuf, cluster) != 0) {
-            fprintf(stderr, "TEST2: blob_write[%d] failed\n", i);
-            rc = 1;
-            goto out;
-        }
-        if (blob_read(handles[i], 0, rbuf, cluster) != 0) {
-            fprintf(stderr, "TEST2: blob_read[%d] failed\n", i);
-            rc = 1;
-            goto out;
-        }
-        if (memcmp(wbuf, rbuf, cluster) != 0) {
-            fprintf(stderr, "TEST2: blob[%d] readback mismatch\n", i);
-            rc = 1;
-            goto out;
-        }
-    }
-
-out:
-    for (i = 0; i < MULTI_BLOB_COUNT; i++) {
-        if (handles[i]) {
-            (void)blob_close(handles[i]);
-        }
-    }
-    for (i = 0; i < MULTI_BLOB_COUNT; i++) {
-        if (ids[i] != 0) {
-            (void)blob_delete(ids[i]);
-        }
-    }
-    free(wbuf);
-    free(rbuf);
-
-    if (rc == 0) {
-        printf("[TEST2] PASS\n");
-        return 0;
-    }
-    printf("[TEST2] FAIL\n");
-    return 1;
-}
diff --git a/tests/ioengine_test/ioengine_same_blob_mt_test.c b/tests/ioengine_test/ioengine_same_blob_mt_test.c
deleted file mode 100644
index 4754778..0000000
--- a/tests/ioengine_test/ioengine_same_blob_mt_test.c
+++ /dev/null
@@ -1,147 +0,0 @@
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <stdint.h>
-#include <pthread.h>
-
-#include "spdk_engine/io_engine.h"
-#include "test_common.h"
-
-#define THREAD_COUNT 4
-
-struct mt_case_arg {
-    struct zvfs_blob_handle *handle;
-    uint64_t cluster_size;
-    uint64_t offset;
-    uint8_t seed;
-    pthread_barrier_t *barrier;
-    int rc;
-};
-
-static void *mt_case_worker(void *arg) {
-    struct mt_case_arg *ctx = (struct mt_case_arg *)arg;
-    void *wbuf = NULL;
-    void *rbuf = NULL;
-
-    if (alloc_aligned_buf(&wbuf, ctx->cluster_size) != 0 ||
-        alloc_aligned_buf(&rbuf, ctx->cluster_size) != 0) {
-        free(wbuf);
-        free(rbuf);
-        ctx->rc = 1;
-        return NULL;
-    }
-
-    fill_pattern((uint8_t *)wbuf, ctx->cluster_size, ctx->seed);
-    (void)pthread_barrier_wait(ctx->barrier);
-
-    if (blob_write(ctx->handle, ctx->offset, wbuf, ctx->cluster_size) != 0) {
-        ctx->rc = 1;
-        goto out;
-    }
-    if (blob_read(ctx->handle, ctx->offset, rbuf, ctx->cluster_size) != 0) {
-        ctx->rc = 1;
-        goto out;
-    }
-    if (memcmp(wbuf, rbuf, ctx->cluster_size) != 0) {
-        ctx->rc = 1;
-        goto out;
-    }
-
-    ctx->rc = 0;
-
-out:
-    free(wbuf);
-    free(rbuf);
-    return NULL;
-}
-
-int main(void) {
-    int rc = 0;
-    const char *bdev_name = getenv("SPDK_BDEV_NAME");
-    int i = 0;
-    struct zvfs_blob_handle *h = NULL;
-    uint64_t blob_id = 0;
-    uint64_t cluster = 0;
-    pthread_t tids[THREAD_COUNT];
-    struct mt_case_arg args[THREAD_COUNT];
-    pthread_barrier_t barrier;
-    int barrier_inited = 0;
-
-    if (!bdev_name) {
-        bdev_name = "Malloc0";
-    }
-    if (io_engine_init(bdev_name) != 0) {
-        fprintf(stderr, "TEST3: io_engine_init failed (bdev=%s)\n", bdev_name);
-        return 1;
-    }
-
-    printf("[TEST3] multi thread / same blob\n");
-
-    h = blob_create(0);
-    if (!h) {
-        fprintf(stderr, "TEST3: blob_create failed\n");
-        return 1;
-    }
-    blob_id = h->id;
-    cluster = h->size;
-    if (cluster == 0) {
-        fprintf(stderr, "TEST3: invalid cluster size\n");
-        rc = 1;
-        goto out;
-    }
-    if (blob_resize(h, cluster * THREAD_COUNT) != 0) {
-        fprintf(stderr, "TEST3: blob_resize failed\n");
-        rc = 1;
-        goto out;
-    }
-
-    if (pthread_barrier_init(&barrier, NULL, THREAD_COUNT) != 0) {
-        fprintf(stderr, "TEST3: barrier init failed\n");
-        rc = 1;
-        goto out;
-    }
-    barrier_inited = 1;
-
-    for (i = 0; i < THREAD_COUNT; i++) {
-        args[i].handle = h;
-        args[i].cluster_size = cluster;
-        args[i].offset = cluster * (uint64_t)i;
-        args[i].seed = (uint8_t)(0x40 + i);
-        args[i].barrier = &barrier;
-        args[i].rc = 1;
-        if (pthread_create(&tids[i], NULL, mt_case_worker, &args[i]) != 0) {
-            fprintf(stderr, "TEST3: pthread_create[%d] failed\n", i);
-            rc = 1;
-            while (--i >= 0) {
-                pthread_join(tids[i], NULL);
-            }
-            goto out;
-        }
-    }
-
-    for (i = 0; i < THREAD_COUNT; i++) {
-        pthread_join(tids[i], NULL);
-        if (args[i].rc != 0) {
-            fprintf(stderr, "TEST3: worker[%d] failed\n", i);
-            rc = 1;
-        }
-    }
-
-out:
-    if (barrier_inited) {
-        (void)pthread_barrier_destroy(&barrier);
-    }
-    if (h) {
-        (void)blob_close(h);
-    }
-    if (blob_id != 0) {
-        (void)blob_delete(blob_id);
-    }
-
-    if (rc == 0) {
-        printf("[TEST3] PASS\n");
-        return 0;
-    }
-    printf("[TEST3] FAIL\n");
-    return 1;
-}
diff --git a/tests/ioengine_test/ioengine_single_blob_test.c b/tests/ioengine_test/ioengine_single_blob_test.c
deleted file mode 100644
index e2070ef..0000000
--- a/tests/ioengine_test/ioengine_single_blob_test.c
+++ /dev/null
@@ -1,136 +0,0 @@
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <stdint.h>
-
-#include "spdk_engine/io_engine.h"
-#include "test_common.h"
-
-int main(void) {
-    int rc = 0;
-    const char *bdev_name = getenv("SPDK_BDEV_NAME");
-    struct zvfs_blob_handle *h = NULL;
-    struct zvfs_blob_handle *reopen = NULL;
-    uint64_t blob_id = 0;
-    uint64_t cluster = 0;
-    void *wbuf = NULL;
-    void *rbuf = NULL;
-
-    if (!bdev_name) {
-        bdev_name = "Malloc0";
-    }
-    if (io_engine_init(bdev_name) != 0) {
-        fprintf(stderr, "TEST1: io_engine_init failed (bdev=%s)\n", bdev_name);
-        return 1;
-    }
-
-    printf("[TEST1] single thread / single blob\n");
-
-    h = blob_create(0);
-    if (!h) {
-        fprintf(stderr, "TEST1: blob_create failed\n");
-        return 1;
-    }
-    blob_id = h->id;
-    cluster = h->size;
-    if (cluster == 0) {
-        fprintf(stderr, "TEST1: invalid cluster size\n");
-        rc = 1;
-        goto out;
-    }
-
-    rc = blob_resize(h, cluster * 2);
-    if (rc != 0) {
-        fprintf(stderr, "TEST1: blob_resize failed: %d\n", rc);
-        rc = 1;
-        goto out;
-    }
-
-    rc = alloc_aligned_buf(&wbuf, cluster);
-    if (rc != 0) {
-        fprintf(stderr, "TEST1: alloc write buf failed: %d\n", rc);
-        rc = 1;
-        goto out;
-    }
-    rc = alloc_aligned_buf(&rbuf, cluster);
-    if (rc != 0) {
-        fprintf(stderr, "TEST1: alloc read buf failed: %d\n", rc);
-        rc = 1;
-        goto out;
-    }
-    fill_pattern((uint8_t *)wbuf, cluster, 0x11);
-
-    rc = blob_write(h, 0, wbuf, cluster);
-    if (rc != 0) {
-        fprintf(stderr, "TEST1: blob_write failed: %d\n", rc);
-        rc = 1;
-        goto out;
-    }
-
-    rc = blob_read(h, 0, rbuf, cluster);
-    if (rc != 0) {
-        fprintf(stderr, "TEST1: blob_read failed: %d\n", rc);
-        rc = 1;
-        goto out;
-    }
-    if (memcmp(wbuf, rbuf, cluster) != 0) {
-        fprintf(stderr, "TEST1: readback mismatch\n");
-        rc = 1;
-        goto out;
-    }
-
-    rc = blob_sync_md(h);
-    if (rc != 0) {
-        fprintf(stderr, "TEST1: blob_sync_md failed: %d\n", rc);
-        rc = 1;
-        goto out;
-    }
-
-    rc = blob_close(h);
-    if (rc != 0) {
-        fprintf(stderr, "TEST1: blob_close failed: %d\n", rc);
-        rc = 1;
-        goto out;
-    }
-    h = NULL;
-
-    reopen = blob_open(blob_id);
-    if (!reopen) {
-        fprintf(stderr, "TEST1: blob_open(reopen) failed\n");
-        rc = 1;
-        goto out;
-    }
-
-    memset(rbuf, 0, cluster);
-    rc = blob_read(reopen, 0, rbuf, cluster);
-    if (rc != 0) {
-        fprintf(stderr, "TEST1: reopen blob_read failed: %d\n", rc);
-        rc = 1;
-        goto out;
-    }
-    if (memcmp(wbuf, rbuf, cluster) != 0) {
-        fprintf(stderr, "TEST1: reopen readback mismatch\n");
-        rc = 1;
-        goto out;
-    }
-
-out:
-    if (reopen) {
-        (void)blob_close(reopen);
-    }
-    if (h) {
-        (void)blob_close(h);
-    }
-    if (blob_id != 0) {
-        (void)blob_delete(blob_id);
-    }
-    free(wbuf);
-    free(rbuf);
-
-    if (rc == 0) {
-        printf("[TEST1] PASS\n");
-        return 0;
-    }
-    printf("[TEST1] FAIL\n");
-    return 1;
-}
diff --git a/tests/ioengine_test/test_common.c b/tests/ioengine_test/test_common.c
deleted file mode 100644
index 12336be..0000000
--- a/tests/ioengine_test/test_common.c
+++ /dev/null
@@ -1,20 +0,0 @@
-#include "test_common.h"
-
-#include <stdlib.h>
-#include <string.h>
-
-int alloc_aligned_buf(void **buf, size_t len) {
-    int rc = posix_memalign(buf, 4096, len);
-    if (rc != 0) {
-        return -rc;
-    }
-    memset(*buf, 0, len);
-    return 0;
-}
-
-void fill_pattern(uint8_t *buf, size_t len, uint8_t seed) {
-    size_t i = 0;
-    for (i = 0; i < len; i++) {
-        buf[i] = (uint8_t)(seed + (uint8_t)i);
-    }
-}
diff --git a/tests/ioengine_test/test_common.h b/tests/ioengine_test/test_common.h
deleted file mode 100644
index c2f18af..0000000
--- a/tests/ioengine_test/test_common.h
+++ /dev/null
@@ -1,10 +0,0 @@
-#ifndef __IOENGINE_TEST_COMMON_H__
-#define __IOENGINE_TEST_COMMON_H__
-
-#include <stddef.h>
-#include <stdint.h>
-
-int alloc_aligned_buf(void **buf, size_t len);
-void fill_pattern(uint8_t *buf, size_t len, uint8_t seed);
-
-#endif // __IOENGINE_TEST_COMMON_H__
diff --git a/zvfs架构图.excalidraw.svg b/zvfs架构图.excalidraw.svg
new file mode 100644
index 0000000..f6fa23c
--- /dev/null
+++ b/zvfs架构图.excalidraw.svg
@@ -0,0 +1,16 @@
+(excalidraw SVG architecture diagram: base64-encoded scene data omitted)
EyV1x1MDAxNjvLMcHHj8NcYt8lRShd9LDCROBcdTAwMDBcdTAwMTjeKKQ1XHUwMDE2yoxN7GCrXGLjylgkNIdYxaF2Ia1cIlx1MDAxMFx1MDAxZVphXHQ1Ylx07Vx1MDAxM69117j4hDs7VnV4+5bVilx1MDAxMohcdTAwMTPmmNXotcvXvZ7uiduTrfzebmfnLtdcIslAXHUwMDAyKD6dKpaYbZuIX37nXHUwMDAxXGJcdTAwMTLkMFx1MDAxOOipYIPNhZZxQlxcq1x1MDAxYTWuVOaCXCL5LZU6KGqpJzUg0MRFWFpcdTAwMTKwJUaxXHUwMDA16Fx1MDAxM3FLZywwXCKFJ8LNXCKgRY9SxUHZrNScOilil2gkXHUwMDAxXHUwMDFhXHUwMDAxYZaaaO0yeTa8R99cdTAwMTJcdHEtmyqRO6FTXHUwMDA0MPd0wctcdTAwMDdcdTAwMWOp9Dr9Zlx1MDAxMY76+0rpXHUwMDBmg/Xpb7Vv719cckcmXHUwMDEzQMQ4MnFfXYQyRtN/RYFcdTAwMTRhPFx1MDAxY6RiXHUwMDEyVI4yQ0aZOZGQhIA1NJRQajlx8DU7u0CoJy1cdTAwMWWQXHUwMDE4+Fx1MDAxZuXLbvKkXGYkmGeCmNm5fyVihzxcdTAwMTh/8Fx1MDAxMNOp7JQo5eAm166e9O/Vhu3c8lxy0d8s3yVCKeBkteBcdTAwMGKGUoJcIjxcdTAwMGacXHUwMDEyO3GBI0BcbodWwHlcdTAwMDJcdTAwMDRcdTAwMDFvXHUwMDE5XFxdqD1cdTAwMGXiIKWC91x07u+eXHUwMDBlpiTATnhKuFx1MDAwYt1cdTAwMTgphJRBkMI8XG5cdTAwMGVfK43LZySXM6/DuIUzXHUwMDA2SOE00iwqgFvwXHUwMDA2br9lnFx1MDAxM8f8IXekeJdTNeF1XHUwMDE4wLfgXYRwl5vDXGZcdTAwMWX2czNKp+JBnTxTY75cdTAwMDZRWp3770Yk4Vx1MDAxMzVcdTAwMTNcdTAwMTBDXHUwMDAwkeDFRChaNPVCXHUwMDE0XHUwMDAwkdpcdTAwMTNcdTAwMTTsXHUwMDA1kqJcdTAwMTli6SgjXHUwMDFht1x1MDAxOC1KJrSGWJZpXHUwMDEybFx1MDAxYndcdTAwMDBcdTAwMTCI3XCZrKVcbnspuVjO9SZcdTAwMWSnsVxc4lZcXFx1MDAxN/5QwThiyFx1MDAxZm5x9fJ8t5fe3TxcdTAwMWWc9rf26lx1MDAxObV3kruiOf10m1xmf1A878XCXHUwMDFmXHUwMDAxXHSeXHUwMDA3/EjS8ylcclx1MDAwMyeOUoJcdTAwMWLgWbDVQ3iM4X5qI7WFa+DBzcnpZknwlKz4KX7glVx1MDAxZKekcaGZgSBMXHUwMDEyKbDJYub4wy2cMfBcdTAwMDc1nlxyt4pcdTAwMDDyPW2RhoQybYSIN0uzbIKLaFx1MDAwM8El6sQy51x1MDAxZavQRfOUUYJTWyT1ilxyPltiplo078dcdTAwMWaL1lx1MDAwMuLrqFx1MDAxOcMjXHUwMDEzXHUwMDAwhFx1MDAwYo8kaf+ol2rt1+NSpX+cq1x1MDAxN9SpOWqIzH68Ko5cItJTyHHMkLdA+bA+3iTwkVx1MDAxZVhcdTAwMTQuIFCEiIG4lidcYuIxaqRcdTAwMTKSMMaUI3kpPVx1MDAwM3GE4cZcYlx1MDAwMuGRMEtNTVx1MDAwNljQVkqCLFx1MDAwZa74QYVnTJiF4Jj5kc7MXHUwMDExS+2xvkfYaye3lS/p/P399tbjYy1ZtypXXFxcctXz27bhhko2vjJcdTAwMDGhTlx1MDAxM8OEb8qdZVx1MDAwN4b70cXw57i4VUopkFmYYVx1MDAxN8mYXHUwMDE1XHUwMDAxXHUwMDFiI1x1MDAxNdxHXHUwMDBit1RcdTAwMTjm2IDkq5IsXHUwMDEzXG
4xyFOpMXAvnVx1MDAxY85cdTAwMTFcdTAwMGKQjFx1MDAwMbBcbjhymjm6XHR9nUZKOrzWKVx1MDAxYzpOh94sOlPHXHUwMDA0Rzvux/3XXHUwMDE0oXfRK3wj01x1MDAwYpx6RFpcdTAwMWNNVkb5XHUwMDA3JN+9t/KE4Vx1MDAxY9dcdTAwMWSC71x1MDAxNo4pV8ZcctarNbdIXHUwMDFhXHUwMDAwn3BccnxRXHUwMDBmXHUwMDFlMFx1MDAxOGWLXFw7bDn1mpi9w1x1MDAxYVx1MDAwYjE4dVUoabA5e1jv0Nh4O91cdTAwMWHZKb33m9RXpll7qTe38merL6K/fV5M2LpcdFx1MDAwNmZcdTAwMDHyXHJcdTAwMTmKwj/Mxlx1MDAwYjsybFx1MDAwMmIvuODIxCVcdTAwMDZ1ejXxgOGqgq9xJUlcdTAwMTNccoRlXHUwMDBm3p5uWm9vx0KUr8rV9ctTftGlOXf2gOPYioCHwymE3dr4PvRRvMBaOC5PXHUwMDEzmlx1MDAwMVx1MDAwNKNTZjSit9KvjGRZKGVI3kWFsYxcdTAwMGIz7Nv8PCuKJ67h1FxyoZZcdTAwMWKpdVDJ0sVAblx1MDAwNYiBgbjmnrRoZpnROOI0aorBUFuqlcBSXHUwMDE1ROPB3dzGsVx1MDAwM3LZ9Vx1MDAxMYqANFhVYZUzh6vDXHUwMDExXHUwMDEwI1xm3OFcZnhV30sqX0ZAK0iQ8Vx1MDAxZfKvfH9CI7zlY1x1MDAwMmRxXHUwMDAxobFLi9DB/NXB6XP5sPd8tmuNMpc35dfz9XjpXGZcdTAwMDChXHUwMDFlZ9IqzkDJmL+jcaCGhnhcdTAwMTLCXHUwMDExtCrKUlx1MDAwN18gXHUwMDE2kyGWIWDlXHRTUjJHYEKWWplcZlx1MDAwMVFtObhgTV3xSjhcdTAwMDBC4lx1MDAwNWn0PGk/lMheXFydmrXCMcmyzd3nUuX08TJZ9lx1MDAwMiXr+/FPuFx1MDAxY499/evwJDxZkVx1MDAwMnChVEjNjeU4ri1cdTAwMWOMIKDv0lhcIkBQLLbaOIQwXYTgXHUwMDE2kVx1MDAxOFxiIcNcdTAwMTU+XHUwMDExgKRgl4wkY2SmcJ1cdTAwMTNsk3V0XSwhQsSGXHUwMDE5XHUwMDAz4Fx1MDAxMtPXzrgrXHUwMDE0I1x1MDAxMGmUooymT8ZhLE4pfVx1MDAxNSPcvE+I/vSpi4tcdTAwMTEmuHFnsmT00lwitPC12DivP2auNjNHpLK3ccbIXV6kkTMxwiNIzc5ccrKDXHRcdTAwMWRcdTAwMWN4Zdx6XHUwMDE4sFBcdTAwMDVgXHUwMDEyIKhxQFx1MDAwNK49nHCXykgscNJlj0ZCyFx1MDAwMFx1MDAxNtByQZxcdTAwMDWPYFwiZbg3gWA9Xug5Qoad9YvNNX22S0xvda+9pV67xcNmwpSJYvL7IUOGUu1cdTAwMDFw5lhtwuZcdTAwMDc+UvHgzFx1MDAwM22BoEtLYpg1fOLxwlx1MDAxNWVwvHFcdTAwMTVJXHUwMDEzk6SAPFx1MDAxOINcdTAwMTOnXHUwMDA2p1x1MDAxNlxyXFyDWIycXHQh2ljF4H7C7Vx1MDAxNIpbx1DvrFx1MDAxMZBb4mMgIK6ZJ5jmaHxcdCajRklUJZFcdTAwMWXAPUk5wH5qRTBd7UqSLMtE4Y1cdTAwMWZcdTAwMDL0VepcdTAwMTA299CysTWGa5zAT1x1MDAxYlx1MDAwML0nSaYq7X5cdTAwMDCghcmNhFx1MDAxN4kmQJNx3Fx1MDAxMysl0rh+qKvCy6lpXWR5o3JRaZt6L1x1MDAxNtzJcNycXHUwMDA3hopKXHUwMDFjRuB6NDFcdFx1MDAwMVx0qCTWulx1MDAwMS1cdTAwMWJug1x1MDAxZFx1MDAxZVx1MDAxNvBcdTAwMTJcdTAwMGVuM1BNhu
y6QVx1MDAxNVxcJkSSJkTAb4BD1M7sZXhBSGKrIZ/nWpmX68NytrlvznJcdTAwMWI793tvz706P39IXHUwMDA2buA6fXTY31x1MDAwNW5CpXjs21+HXHUwMDFlM02H4PNcdTAwMDfvaS129Cnlo9Kdn/t3y0RcZvfPqCcp2Fx1MDAxOKFBJlxiNWNEXHUwMDFkSFY6wVx1MDAxNC1rJIlcdTAwMTIgOEQlLXPlP3h405hcdTAwMDVcXMmsYDPoXHUwMDEysZyrL1x1MDAxMXVcZpJcdTAwMDRcdTAwMDNqi19FmWSC23amQFx1MDAwMldcdTAwMTehipXdu9JZrrytz07Xqq/lx+u9TpXGzILgYiWBxUohhX9cdTAwMGLzOyrg2KFcclx1MDAxOFx1MDAxMGIgZmRQXHUwMDE1XHUwMDE5hG/SXG6IXFyFXHUwMDA2m0RcdTAwMWSayYVcdTAwMDdcdTAwMWbg8C5cdTAwMTFCXHUwMDExYZadI0nZPFx1MDAxOMTayIToglx04VxuTITlSlx0Nseuz92H547deVtrXHUwMDFhdlI4f7SnXVNM1vWpccfNVGue0k6CSFx1MDAwZmRcdTAwMTaEXHUwMDBme9+pldT//aBMTzxgqKZcZo5cdTAwMTeeVPk6XHUwMDEySaP8XHUwMDAygSCcNyf4XHUwMDFmTiNcdTAwMDbhxkj9XHUwMDA1idJcdTAwMDLSnG5cdTAwMTJcdTAwMDRAkOCaXHUwMDExJTVcdTAwMGWRXHUwMDBiaoMnxbGnnSDDKDdMSEpnXlx1MDAxNXIrQFxmUMT1XHUwMDAw9VxiXHUwMDFjdSGa+DnAXHUwMDA3llx1MDAxON7mUlxuJD7BXHUwMDE2viBFgV6u5E3UOMKZkpRcdTAwMTknKFxub1x1MDAxY1x1MDAwMXWH2JnZ9Fx1MDAxN8tgpoV9aVx1MDAxNmaAXHUwMDE4ulxie75cdTAwMTlcYoXnRCZcdTAwMDBcdTAwMTUnedngilwiVC6bp4Xr++rTdq9T3l3X9dP2WtaxXHUwMDA325lcdTAwMTMh1jNcXFx1MDAxOeyGXHUwMDA2taJcdTAwMDHSU+JcdTAwMTkpwapBnI471Fx1MDAxY1x1MDAxYniDSkclOFx1MDAwN8JcYqWcKoGx5lJcdFx1MDAxM+JcdTAwMWStXHUwMDE4IS7NXHUwMDE0obtcdTAwMTOkwlx1MDAxNmc7T37TXHIqc/n8pr4s5zNmrdW4XFzfv0mWXHUwMDE00cxQslx1MDAwMITs/k9cdTAwMDfFd1x1MDAxZVgkXHUwMDAxmykxODQoLMVcdTAwMDZcXKaYr//2kyaMXHUwMDAxatNcdTAwMDa3XHUwMDA2XHUwMDFiS0VwXHUwMDAyNvXtMFx1MDAwNIRcdTAwMGb8iaJcdTAwMDBcdTAwMDZcdTAwMTmETY5cdTAwMWXWWUNcdTAwMGa3OMaAXHUwMDFlXHUwMDE5QLFIrFx1MDAxNmpcdTAwMDYlQmGFnCdcdTAwMTC4c3/X5Cdz6tA+LDMyMcBcdTAwMDfOO1x1MDAwYiqd+2FCTZzFjbxgN9KnKtNS+aeCp8BcdTAwMWVcdTAwMWSQ0sXmTZ1cdTAwMDBcdTAwMTPGocfnXHUwMDA1RWjcfnZcdTAwMTfuydF69qhPr8/am0fNSu4oLvKQXHUwMDFl1Vx1MDAwNrmsjDBkbN6WXHUwMDFi6ykrtVx1MDAxMMaALfGPdkZcdTAwMDBcdTAwMGbGXHUwMDE1hFx1MDAxMNhcdTAwMWavLOBcdTAwMTVplug/IfBcdTAwMDBcdTAwMGJcYtG8XHTOwlx1MDAwZqwgXHUwMDBiU0z1/lx1MDAxOIcpgvRcdTAwMTRzsLlpsVx1MDAwMEJQzOZcdTAwMDFcdTAwMTBcdTAwMTIwdEF8bFx1MDAwMaxLLHRcIkxwsZtcIq0pU1x1MDAwNEwwXHUwMDA203
ZcdTAwMDJCmOW0bYtcdTAwMWS8UKt7xNbPZety5+q0b+J1sE0wXCKKXHUwMDAxkFx1MDAwMzhcIrlcdTAwMDE4x3zF/882XG5uPInbgOFGaVwiXcnbZUk38Yy+XHUwMDFj7NFzJ2vDu005ZrBcdTAwMDBFznHIr5a57W+S4uFBqXh1unq0azk1Nln8Ylx1MDAxOJ9u3DdV81x1MDAxNCrHY9/+unWaYVF3MFxiJ0FcdTAwMDaEkFx1MDAxNuFcdTAwMTh30DLPPIhwy0SMIEJZb7BccktcdTAwMWFlNJjVMVtE6Fx1MDAwNFtkXHUwMDFkgGZcdTAwMTlBhLd0QVxcXHUwMDBiLsw6R2nCQ1xiJTjuTLUz2OhkLWFfr+m+w+5fQUl3gtt2lnTHLy5CXHUwMDBm72509eXuob1Wv8g3SvfP9YOX6mMsVFx1MDAxMN3XXHUwMDBlgSdusFx1MDAxM0ZQjizwQUWMQ1x1MDAwNcC0ZyVjguHuNmV8d+m3rKdJqI8pU1x1MDAxNm6xXHUwMDEzJIS2tStKsd+HTlx1MDAxM2dMiVx1MDAxMY7z5Yd183xpm49cdTAwMTWiT8xt7/g62f5rg7uTvlx1MDAxZiNcdTAwMDRcbrp6ZIdTQKInXHUwMDFlL5pcdTAwMDcg4nhfRyGpNJBpXFxfSsApM2RcdTAwMWOlMsgkOPe+dkBAmlx1MDAxMkuJQk4la41wIKC5U1x1MDAwMbhVIFx1MDAwNiTiVntcZvdZwFx1MDAwNXHjW3g0MMSce4JcZmZcdTAwMDA1botcdTAwMGXOXHUwMDE3cVx1MDAwN7vhXHUwMDEyXHUwMDExRTS5Y2O5XGaZXHUwMDE1XG4lV6ZUaaVcdTAwMTTjM2hzQ4aBqao7n5CoX/nRXFyEim44XHUwMDBlmlx1MDAwMFRcdTAwMDI46POKXCI07pJcXO2UWye02Xnb2e9dZVx1MDAxZa5ub2Nu3Vx1MDAxZcxcdTAwMWFJxdBCcCrJXHUwMDE4XHUwMDA1mcW8KmXwtjTMOlpLmYfzXHTI2s7BOirJXHUwMDFkQ31Uck9zXHJ2SHKrmFqin8RDfUqAYXSmWcM3WFJCOTij6Xovpi3xPvXtlb5uXdR3y2Qrk33au6juJIM/1Fxu3338LvhcdTAwMTMu1/hcbkj0PNBKkj1cdTAwMTCEXHUwMDAz8OVcdTAwMDSwXHUwMDE0wVZVxoNoRXpEXGJBkC5cdTAwMTi7pNSUW7iT5JlcdTAwMDVcdTAwMDTgIMtcdTAwMDDIOd5WV55cdTAwMTmwipGYXHUwMDExUYwyOvN9VW55jYFVMkwxT1uKgSOgXHUwMDE1uM/jNWCkkbBIXHUwMDFhOGBU8fm1z1x1MDAxYfDQ1C7xymS8ojRcdTAwMDdJNU7uxlxiuKJcdTAwMDQygFORfq3J4F61L1xy5Vx1MDAxNVx1MDAxYq1uebGrwFx1MDAxM6DFOFxcXHUwMDE5XlGE0lXbXHUwMDFiJ3Tvsd7Tu1x1MDAxNXad1WIjv8riwlx1MDAxNeaBXHUwMDAxYVx1MDAxNNfaMWlGJ2EgtPQgPCNcdTAwMTBcYkH8IFx1MDAxZISpztWZXHUwMDEyXHUwMDE0WVxmSI9cYvjMJeVA0lxujuFS+Dm8hnpcdTAwMTncdfuhl1x1MDAwMjdZXHUwMDExoeY4k9etmfpFsbPP9VU/9/S0xo7X2ufJsFx0x0Tht2OTsc2ZY+I7XHUwMDBmKJKo00viuFx1MDAxZKgqwe5hpVxcSyGENbjXUVx1MDAwMfSTbNpO+Fx1MDAwNEsh4DFcdTAwMTJGXHTHpK7FWa9vqFx1MDAxY7nFMVx1MDAxNvSQKsJcYmpMlDCGXHUwMDFjXGY4XHUwMDEyXHUwMDE4tIEu3LFsfFx1MDAwZs+TYLeXXCLuvd0sXHUwMDE0eFxmglx1MDAwM0Zn0HxmuPV7qWlxx3djjvCm91x0+MCJOaI0Te
crveY+adzfXHUwMDFmZrf42WXvKPdcdTAwMWFcdTAwMWJvUM9cdTAwMTCMVlxmVoHYeMuIUZ4yhjNmNVx1MDAxMsBcdTAwMDTzI1pcdIAk2ipsizfKOIZOli0jSVx1MDAwMYfAJXxGXHUwMDA0XHUwMDE5XHUwMDEwXHUwMDExXHUwMDAzhreMXGJJkWV/nvN9O4U7XCJcdTAwMGXrNXlzVro9PVxcXHUwMDE3O7ncblwizGEpzrl8O+ZcYpXjsW9/XHUwMDFkc8y2ZVx1MDAwNJwmUippKsFzMq6CXHUwMDFiMmNcZualzVx1MDAwYuRcdTAwMTSSXHUwMDE4SFx1MDAwMNfg0UH/XHUwMDBirpDwUS29XHUwMDFiJ8k8XHUwMDFhaZwodZRMllx1MDAwYqHCoYBFwVx1MDAxMcI5XHUwMDA0XHUwMDE3XFzL+1x1MDAxOetALEr1XGZWZnPKqY9tbPr1XHUwMDEx71H74M+lcvGmU64sUE9J+HqoXHSO3b1WXCL6WiPU9F5ns5m9rdpR7ziTaedk7TKzXolcdTAwMDcjcDWsNFRxzVx1MDAwNdeGj5Y2x4osKpgqdE7NMU9qpiRcdTAwMGV/XHUwMDE5f9vKb1lrk6yXkFx1MDAwMilLXajehneUoCVVZp5cdTAwMTDiiYOBzzzJx61+R5x0bs7LN7KdXGZCyFx1MDAxMay0XHUwMDEwaYuA+KZcdCHSqKBQLYxkhFx1MDAxMVx1MDAwYmG2kT68466fkCmbPVx1MDAxMtRPcFx1MDAwZsKA2duoXHUwMDAxhlx0XHUwMDBl8nFP4m5cYq6EXHUwMDExXFwpaWa+y9ItnTGwS4Za7pFQm0hcdTAwMDFvXG6Ni8qRuIlcdTAwMDS7PXzcWMssxmToXCJcdFx1MDAxNVx1MDAxNu+lw9yx0JBcdPkymdTpMzpbbJOait3sXHUwMDAzuPSbgy99MzZcdE9iTEBcdTAwMGLj2OTn5URoWunViNrFXbXPXHUwMDFljmXHXFxs31/xmMNzVOGWXHUwMDFkXFxqXGbOSzIqR2uVgem5OPhDXHUwMDEz49HB+lx1MDAxOU5AhdmyqSMp/iCWXHRKqZNmLGrwRVtcYk/9wGXmXHUwMDEw5LLRzJYq+fJTX+Wu83l2eL7W6CeDIINcdTAwMGWDxYIgXHUwMDAxXHSeXHUwMDA3XHUwMDA0SVI5oUaA9cWhQa41k3TSXFxcdTAwMWXiqelASILSXHQ1gD9wry2ANqJ8bP8rc6ucuMUxXHUwMDE25jBcdTAwMDZrI2GGUCO5m0TgXHUwMDA3XHUwMDAyQY1cdTAwMGVcdTAwMWHCJepIljChhlx1MDAxOSyxufnoQ2FcdTAwMDdcdTAwMTX4dKxJv2nDamX1l8ZuXHUwMDE2XHUwMDFkeEzAXHTJgVx1MDAwN2GvtZe3685urX5S294n3Y1zyuNcdTAwMDJcdTAwMGaCPf9cdTAwMTCSXGJcdCje1/L1rm9aelxmVE1cdTAwMWFCccVYcLhGWerhXHUwMDBldlwiXHUwMDA3/anaoX7L8knS8lx0Ulx1MDAwYqPTcCmlXGZn08A2YSPVNMHAtCzKm2frxfXz3dPT1Y3XRnutXHUwMDBiXHUwMDAx5lUy3GGI5N/Pjlx1MDAxOCrGY9/+Ou6YbfWEw/lTK7RcdTAwMWFsQ9ZB3z86XHUwMDAyw1x1MDAxZEmRtEmVnVwiXHUwMDEyXHUwMDAzXHUwMDBi4HhcdTAwMTOVSH3AcG9cdTAwMDdcdTAwMWLLP2hcdTAwMDGWK9I0UeGIipbFk3AsoCniLudKhfBcdTAwMDFcXMpcdTAwMDGkMXhcdTAwMDKp5yA4MdRfMpu2eFJcdTAwMWGczOfypVx1MDAxZs13j/r7WmWlU678+c/kXHUwMDBm341cdTAwMTXCXHUwMDBiKFx1MDAxM1xcu7OAXHUwMDEy43oj9FVl1jeeb4rZU/3SzF
fKdvdG3l7HXHUwMDAyXHUwMDEzUnqAyYk1WoDrYmK068kyg/FcdTAwMTJcdTAwMTXUYkQkXHUwMDFkbU9cdTAwMWEr2FxcXGLMOVx1MDAxYeGofC6RRFIkYbWVmmrhUmpcdTAwMWJaXHUwMDBmNVx1MDAwNlx1MDAxN0jPtYhynLl+rJdcdTAwMGV3My9PzbXt2t1ZdvPqLFx0kuBI0cC+v1x1MDAwZiNMise+vOhAXHUwMDAyolx1MDAwMUU4YDM64EdcdTAwMGXuiOJcdTAwMWUoszZKMFxcN26kg3Eo7bFVp4jEQFx1MDAxMoxzT1x1MDAwM9JkzDCNSyZGXHJcdTAwMTNyXHUwMDBlRVx1MDAxYSa/91tcdTAwMGWCxFx1MDAwMFx1MDAxMlx1MDAxY4k2XHJcdTAwMGLmXHUwMDBm3mFdKJTAjfZWKZn+fiYwXHUwMDBl/v6aaaFEoVTy9yPcfH/vRfhgyFx1MDAwNEfuhFx1MDAwZY7ri1BIvbq2/tTr3/Y7stDJNJr5m6tKzLZNnM3SXGb7b9HEXHUwMDE4PqqSSFNMmJZMSVx1MDAwMnaUxJtcdTAwMTOhUnnIm06ohd+0f1x1MDAwMPq3rKFJ4Fx1MDAwMqgtV1JcdTAwMDZ3qOC3QvVWXGZGXHUwMDAz0l8rXHUwMDBiVoRcdL5wfVx1MDAxMeNSlqZTT6MxgitkjdaWXHUwMDEwXHUwMDFj/NF+XGLy2Vx1MDAxOYFLYCFuxqWigtlcdJ1cdTAwMTGzJFx1MDAwYuxtVDOGnlx1MDAxNOXmzrOSXHUwMDA3+S1ZofmYNlx1MDAwNICWxW1cdCC1hFk7XjM1XHUwMDEwjlxiXHUwMDAx6EQxxWUspnNNtGe0XHUwMDAyPZBcZlwimWXdIKlcdTAwMDFhXHUwMDA0l7JT4epiUKHlXHUwMDA0iltHJFx1MDAwMMk5XHUwMDA2XHUwMDFjpFjtrN9s1Fx1MDAwYp2349ft/b3W/WMyXHUwMDFlII6tu1x1MDAwYjBcYj9WMlx1MDAxZFx1MDAxNeB52KYkXHUwMDE1U468JMooXHUwMDAw4Vx1MDAxY7FewDRRrF9cdTAwMTIqXHUwMDA1pn+MoWxKqvNcdTAwMDRcdTAwMDVTrsBGcmpcdTAwMTmB2M2M0Ch9nlx1MDAxNETQklJuMHyAP1x1MDAwNau4KZtFt3TGiHUyXHUwMDAwqCB4XHUwMDBiNYuDmrpkwjKwishKXHUwMDE0MIvGUcJZZk3Dmc+RwpypYEcq+qDQ4TNcdTAwMDZcIk5cdTAwMTlcYttcZkJcdTAwMWRr9HRm6SPUKfXbP5rN8vNKpfSP//RcdTAwMTFcdTAwMGJ8f6hcdTAwMTOeJZ1cdTAwMDBcIsZDnbDri9DHi5P60cZxpU5KtT1a3GqevFxuptNcYnWM1Fx1MDAxZeio4Fx1MDAxMOxwK2hQXHUwMDFmpVx1MDAxNZ6kXHUwMDE4XHUwMDA3XHRtLFlyXHUwMDFhp8HXY7jA/kaX0kZsZCHYRqynanqYXHUwMDEypahqtXrZvs1nzoqHl4c7by/iYHUzXHUwMDE5StGG8+9HKaFSPPbtr8OU2eZFhVx1MDAwMeSipLFcXFx1MDAxMFx1MDAwNWgmXHUwMDE4XFxRz1xiyVx0t1x1MDAwM+ZcdTAwMTm44JmvaHNcdTAwMGJJXGawQLWEXHUwMDBiYtpcdTAwMTg1mLRcdTAwMWat2Fx1MDAxONxcdFx1MDAxOGmalonRZFiBYmtcdTAwMDEgYPfi2lCwILDHXHUwMDFlZ1x1MDAxZNPHXG5G+iYnflx1MDAxM2nRXHSePFla9Hc/LdUvhXb7tFx1MDAwN/f208r88lQrP69cdTAwMDWl+e8qg1x1MDAxN35/oMwoyeWB9/rL7/7y/1x1MDAwMXjDmGwifQ== + + + + RPC ServerDeamon(Primary metadata 
owner)blob_handle:spdk_blob*blob_idref_count...handleidspdk_thread_groupPasermd threadsio threadcreatedeleteopencloseresizereadwriteref++ref--NVmeOther Process(Secondary)Main ProcessHOOK目录操作Linux文件系统createcreatexattr=blobidfd<>handleidopenfdopenfdfd<>handleidwrite fdftruncate(fd)reszieforkforkfd<>handleidblob_createblob idhandle idblob_openblob idhandle idblob_writehandle idwritesizereadfdblob_readhandle idbufsizeclosefdcloseblob_closeblob_dec_refhandle idunlinkunlinkblob_deleteblob idunlink(if ref==0)blob_add_refhandle_iddupnew fd<>handleidblob_add_refhandle_id \ No newline at end of file