Driver Example¶

Recipe source: driver_example.cc

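Here we show how an ADBC driver is structured when it is built in C++ with the ADBC driver framework library. This is the same library that ADBC uses to build its SQLite and PostgreSQL drivers. It abstracts away the details of the C callables and the catalog/metadata functions, which can be difficult to implement but are essential for making effective use of the rest of the ADBC ecosystem.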

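At a high level, we will build a driver whose "database" is a directory and whose "tables" are files in that directory, each containing an Arrow IPC stream. Tables can be written using the bulk ingest feature and read using a simple query of the form SELECT * FROM (the file).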

Installation¶

This quickstart is actually a literate C++ file. You can clone the repository, build the example, and follow along.

We assume you are using conda-forge for dependencies. CMake, a C++17 compiler, and the ADBC libraries are required. They can be installed as follows:

mamba install cmake compilers libadbc-driver-manager

Building¶

We use CMake here. From a source checkout of the ADBC repository:

mkdir build
cd build
cmake ../docs/source/cpp/recipe_driver -DADBC_DRIVER_EXAMPLE_BUILD_TESTS=ON
cmake --build .
ctest

Building an ADBC Driver Using C++¶

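Let's start with some includes. Notably, we need the driver framework header files and nanoarrow, which we use to create and consume the Arrow C data interface structures in this example driver.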

#include "driver_example.h"

#include <cstdio>
#include <cstring>
#include <string>
#include <string_view>

#include "driver/framework/connection.h"
#include "driver/framework/database.h"
#include "driver/framework/statement.h"

#include "nanoarrow/nanoarrow.hpp"
#include "nanoarrow/nanoarrow_ipc.hpp"

#include "arrow-adbc/adbc.h"

Next, we bring a few essential framework types into the namespace to make the implementation less verbose:

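adbc::driver::Option: Options can be set on an ADBC database, connection, and statement. They can be strings, opaque binary values, doubles, or integers. The Option class abstracts the details of how to get, set, and parse these values.
adbc::driver::Status: Status is the ADBC driver framework's error handling mechanism: functions that have no return value but can fail return a Status. You can use UNWRAP_STATUS(some_call()) as shorthand for Status status = some_call(); if (!status.ok()) return status; to propagate errors succinctly.
adbc::driver::Result: Result<T> is the return type of functions that return a value of type T on success and communicate their error with a Status on failure. You can use UNWRAP_RESULT(some_type value, some_call()) as shorthand for: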
```
some_type value;
Result<some_type> maybe_value = some_call();
if (!maybe_value.status().ok()) {
  return maybe_value.status();
} else {
  value = *maybe_value;
}
```
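
To make the propagation pattern concrete without depending on the framework headers, here is a minimal sketch built on two hypothetical stand-in types, MiniStatus and MiniResult<T>. They are assumptions made for illustration and are much simpler than the real adbc::driver::Status and adbc::driver::Result; ParsePositive() and StoreBatchSize() are likewise invented names.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for adbc::driver::Status (illustration only).
struct MiniStatus {
  bool is_ok = true;
  std::string message;

  bool ok() const { return is_ok; }
  static MiniStatus Ok() { return {}; }
  static MiniStatus Error(std::string msg) { return {false, std::move(msg)}; }
};

// Hypothetical stand-in for adbc::driver::Result<T>: holds either a value or
// the status that explains why no value is available.
template <typename T>
class MiniResult {
 public:
  MiniResult(T value) : value_(std::move(value)) {}
  MiniResult(MiniStatus status) : status_(std::move(status)) {}

  const MiniStatus& status() const { return status_; }
  T& operator*() { return *value_; }

 private:
  MiniStatus status_;
  std::optional<T> value_;
};

// A fallible function that produces a value on success.
MiniResult<int> ParsePositive(const std::string& text) {
  if (text.empty()) return MiniStatus::Error("empty input");
  int value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') return MiniStatus::Error("not a number: " + text);
    value = value * 10 + (c - '0');
  }
  return value;
}

// The caller spells out by hand what an UNWRAP_RESULT-style macro abbreviates:
// return early with the error, otherwise continue with the unwrapped value.
MiniStatus StoreBatchSize(const std::string& text, int* out) {
  MiniResult<int> maybe_value = ParsePositive(text);
  if (!maybe_value.status().ok()) {
    return maybe_value.status();
  }
  *out = *maybe_value;
  return MiniStatus::Ok();
}
```

Calling StoreBatchSize("128", &n) stores 128 and returns an OK status, while StoreBatchSize("12x", &n) leaves n untouched and returns the error produced inside ParsePositive().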

using adbc::driver::Option;
using adbc::driver::Result;
using adbc::driver::Status;

namespace {

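Next, we provide the database implementation. The driver framework uses the Curiously Recurring Template Pattern (CRTP). The framework handles the details of this pattern, but in practice you are still just overriding methods from a base class that takes care of the details.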
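
For readers who have not met CRTP before, the following sketch shows the pattern in isolation. GreeterBase, DefaultGreeter, and ExampleGreeter are hypothetical names invented for this illustration; they are not part of the driver framework, whose base classes are considerably more involved.

```cpp
#include <string>

// Hypothetical CRTP base class: it takes the derived type as a template
// parameter so that it can call into the derived class.
template <typename Derived>
class GreeterBase {
 public:
  // The base class owns the overall workflow and dispatches the
  // customizable step to the derived type.
  std::string Greet() {
    return "Hello from " + static_cast<Derived*>(this)->NameImpl();
  }

  // Default used when the derived class does not provide its own NameImpl().
  std::string NameImpl() { return "base"; }
};

// Inherits the default behavior unchanged.
class DefaultGreeter : public GreeterBase<DefaultGreeter> {};

// Replaces the customizable step by declaring a method with the same name.
class ExampleGreeter : public GreeterBase<ExampleGreeter> {
 public:
  std::string NameImpl() { return "example"; }
};
```

DefaultGreeter().Greet() yields "Hello from base" and ExampleGreeter().Greet() yields "Hello from example": the derived class customizes one step while the base class keeps control of the surrounding logic.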

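Here, our database implementation simply records the uri passed by the user. We interpret it as a file:// uri pointing to a directory that our IPC files are written to and/or read from. This is the role of the database in ADBC: a shared handle to a database that may cache some state shared among connections, while still allowing multiple connections to execute against the database concurrently.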

class DriverExampleDatabase : public adbc::driver::Database<DriverExampleDatabase> {
 public:
  [[maybe_unused]] constexpr static std::string_view kErrorPrefix = "[example]";

  Status SetOptionImpl(std::string_view key, Option value) override {
    // Handle and validate options implemented by this driver
    if (key == "uri") {
      UNWRAP_RESULT(std::string_view uri, value.AsString());

      if (uri.find("file://") != 0) {
        return adbc::driver::status::InvalidArgument(
            "[example] uri must start with 'file://'");
      }

      uri_ = uri;
      return adbc::driver::status::Ok();
    }

    // Defer to the base implementation to handle state managed by the base
    // class (and error for all other options).
    return Base::SetOptionImpl(key, value);
  }

  Result<Option> GetOption(std::string_view key) override {
    // Return the value of options implemented by this driver
    if (key == "uri") {
      return Option(uri_);
    }

    // Defer to the base implementation to handle state managed by the base
    // class (and error for all other options).
    return Base::GetOption(key);
  }

  // This is called when the database is initialized, after zero or more calls
  // to SetOption().
  Status InitImpl() override {
    if (uri_.empty()) {
      return adbc::driver::status::InvalidArgument(
          "[example] Must set uri to a non-empty value");
    }

    return Base::InitImpl();
  }

  // Getters for members needed by the connection and/or statement:
  const std::string& uri() { return uri_; }

 private:
  std::string uri_;
};
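
The uri handling above can also be factored into a small standalone function, which makes it easy to test on its own. The sketch below is one possible shape; DirectoryFromUri() is a hypothetical helper that does not exist in the recipe or the framework, and it only strips the scheme prefix (it makes no attempt to decode percent-escapes or to handle Windows drive letters).

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the recipe): returns the directory portion
// of a "file://" uri, or std::nullopt if the uri does not use that scheme.
std::optional<std::string> DirectoryFromUri(std::string_view uri) {
  constexpr std::string_view kScheme = "file://";

  // Comparing only the leading characters avoids searching the whole string,
  // which is what find(...) != 0 does when the prefix is absent.
  if (uri.substr(0, kScheme.size()) != kScheme) {
    return std::nullopt;
  }

  return std::string(uri.substr(kScheme.size()));
}
```

With this helper, DirectoryFromUri("file:///tmp/db") holds "/tmp/db", and a uri such as "http://example" produces an empty optional that the caller can turn into an InvalidArgument status.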

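Next, we implement the connection. While the role of the database is typically to store or cache information, the role of the connection is to provide handles to resources that may be costly to obtain (for example, negotiating authentication when connecting to a database). Because our example "database" is just a directory, the connection needs very little resource management beyond giving child statements a way to access the database's uri.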

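Another role of the connection is to provide metadata about tables, columns, statistics, and other catalog-like information that a caller might want to know before issuing a query. The driver framework base classes provide helpers for implementing these functions, so that you can mostly implement them using the C++17 standard library (instead of building the C-level arrays yourself).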

class DriverExampleConnection : public adbc::driver::Connection<DriverExampleConnection> {
 public:
  [[maybe_unused]] constexpr static std::string_view kErrorPrefix = "[example]";

  // Get information from the database and/or store a reference if needed.
  Status InitImpl(void* parent) {
    auto& database = *reinterpret_cast<DriverExampleDatabase*>(parent);
    uri_ = database.uri();
    return Base::InitImpl(parent);
  }

  // Getters for members needed by the statement:
  const std::string& uri() { return uri_; }

 private:
  std::string uri_;
};

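Next, we provide the statement implementation. The statement is where query execution is managed. Because our data source is literally Arrow data, we do not have to provide a layer that manages type or value conversion. The SQLite and PostgreSQL drivers both dedicate many lines of code to implementing and testing these conversions efficiently. The nanoarrow library can be used to implement conversions in both directions, which is the subject of a separate article.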

class DriverExampleStatement : public adbc::driver::Statement<DriverExampleStatement> {
 public:
  [[maybe_unused]] constexpr static std::string_view kErrorPrefix = "[example]";

  // Get information from the connection and/or store a reference if needed.
  Status InitImpl(void* parent) {
    auto& connection = *reinterpret_cast<DriverExampleConnection*>(parent);
    uri_ = connection.uri();
    return Base::InitImpl(parent);
  }

  // Our implementation of a bulk ingestion is to write an Arrow IPC stream as a file
  // using the target table as the filename.
  Result<int64_t> ExecuteIngestImpl(IngestState& state) {
    std::string directory = uri_.substr(strlen("file://"));
    std::string filename = directory + "/" + *state.target_table;

    nanoarrow::ipc::UniqueOutputStream output_stream;
    FILE* c_file = std::fopen(filename.c_str(), "wb");
    UNWRAP_ERRNO(Internal, ArrowIpcOutputStreamInitFile(output_stream.get(), c_file,
                                                        /*close_on_release*/ true));

    nanoarrow::ipc::UniqueWriter writer;
    UNWRAP_ERRNO(Internal, ArrowIpcWriterInit(writer.get(), output_stream.get()));

    ArrowError nanoarrow_error;
    ArrowErrorInit(&nanoarrow_error);
    UNWRAP_NANOARROW(nanoarrow_error, Internal,
                     ArrowIpcWriterWriteArrayStream(writer.get(), &bind_parameters_,
                                                    &nanoarrow_error));

    return -1;
  }

  // Our implementation of query execution is to accept a simple query in the form
  // SELECT * FROM (the filename).
  Result<int64_t> ExecuteQueryImpl(QueryState& state, ArrowArrayStream* stream) {
    std::string prefix("SELECT * FROM ");
    if (state.query.find(prefix) != 0) {
      return adbc::driver::status::InvalidArgument(
          "[example] Query must be in the form 'SELECT * FROM filename'");
    }

    std::string directory = uri_.substr(strlen("file://"));
    std::string filename = directory + "/" + state.query.substr(prefix.size());

    nanoarrow::ipc::UniqueInputStream input_stream;
    FILE* c_file = std::fopen(filename.c_str(), "rb");
    UNWRAP_ERRNO(Internal, ArrowIpcInputStreamInitFile(input_stream.get(), c_file,
                                                       /*close_on_release*/ true));

    UNWRAP_ERRNO(Internal,
                 ArrowIpcArrayStreamReaderInit(stream, input_stream.get(), nullptr));
    return -1;
  }

  // This path is taken when the user calls Prepare() first.
  Result<int64_t> ExecuteQueryImpl(PreparedState& state, ArrowArrayStream* stream) {
    QueryState query_state{state.query};
    return ExecuteQueryImpl(query_state, stream);
  }

 private:
  std::string uri_;
};

}  // namespace
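
The query handling in ExecuteQueryImpl() accepts anything after the SELECT * FROM prefix as a file name. The sketch below isolates that parsing step and, as an assumption that goes beyond the recipe, also rejects empty names and names containing path separators so that a query cannot refer to files outside the database directory. TableFileFromQuery() is a hypothetical helper name.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (stricter than the recipe): extracts the table file
// name from a query of the form "SELECT * FROM <name>", or returns
// std::nullopt if the query does not have that shape.
std::optional<std::string> TableFileFromQuery(std::string_view query) {
  constexpr std::string_view kPrefix = "SELECT * FROM ";
  if (query.substr(0, kPrefix.size()) != kPrefix) {
    return std::nullopt;
  }

  std::string_view name = query.substr(kPrefix.size());

  // Extra validation not present in the recipe: the name must be non-empty
  // and must not contain a path separator.
  if (name.empty() || name.find('/') != std::string_view::npos ||
      name.find('\\') != std::string_view::npos) {
    return std::nullopt;
  }

  return std::string(name);
}
```

Like the recipe, this match is case-sensitive, so "select * from example.arrows" is rejected.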

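Finally, we create the driver initializer function, which the driver manager needs in order to obtain implementations of the Adbc**() functions that make up the ADBC C API. The name of this function matters: this file will be built into a shared library named libdriver_example.(so|dll|dylib), so when asked to load the driver "driver_example", the driver manager will look for the symbol AdbcDriverExampleInit() as the default entry point.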

extern "C" AdbcStatusCode AdbcDriverExampleInit(int version, void* raw_driver,
                                                AdbcError* error) {
  using ExampleDriver =
      adbc::driver::Driver<DriverExampleDatabase, DriverExampleConnection,
                           DriverExampleStatement>;
  return ExampleDriver::Init(version, raw_driver, error);
}
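
To illustrate how a library file name relates to the entry point name, here is a simplified approximation of the naming convention. It is a sketch written for this explanation, not the driver manager's actual code, and DefaultEntrypointSketch() is a hypothetical name; the only mapping confirmed by the text above is libdriver_example to AdbcDriverExampleInit.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical approximation of the default entry point naming convention:
// strip a leading "lib" and the file extension, convert snake_case to
// PascalCase, ensure an "Adbc" prefix, and append "Init".
std::string DefaultEntrypointSketch(std::string_view filename) {
  if (filename.substr(0, 3) == "lib") {
    filename.remove_prefix(3);
  }

  // Keep everything before the first '.', or the whole name if there is none.
  filename = filename.substr(0, filename.find('.'));

  std::string pascal;
  bool upper_next = true;
  for (char c : filename) {
    if (c == '_' || c == '-') {
      upper_next = true;
      continue;
    }
    pascal += upper_next
                  ? static_cast<char>(std::toupper(static_cast<unsigned char>(c)))
                  : c;
    upper_next = false;
  }

  if (pascal.rfind("Adbc", 0) != 0) {
    pascal = "Adbc" + pascal;
  }
  return pascal + "Init";
}
```

Under these assumptions, both "libdriver_example.so" and "driver_example.dll" map to "AdbcDriverExampleInit". If a driver uses a different symbol name, the entry point generally has to be given to the driver manager explicitly.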

Low-level testing¶

Recipe source: driver_example_test.cc

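Once we have written a sketch of the driver, the next step is to make sure that the driver manager can load it and that database, connection, and statement instances can be initialized and released.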

First, we include the driver manager and googletest.

#include "driver_example.h"

#include "arrow-adbc/adbc_driver_manager.h"
#include "gtest/gtest.h"

Next, we declare a test case for the basic lifecycle:

TEST(DriverExample, TestLifecycle) {
  struct AdbcError error = ADBC_ERROR_INIT;

  struct AdbcDatabase database;
  ASSERT_EQ(AdbcDatabaseNew(&database, &error), ADBC_STATUS_OK);
  AdbcDriverManagerDatabaseSetInitFunc(&database, &AdbcDriverExampleInit, &error);
  ASSERT_EQ(AdbcDatabaseSetOption(&database, "uri", "file://foofy", &error),
            ADBC_STATUS_OK);
  ASSERT_EQ(AdbcDatabaseInit(&database, &error), ADBC_STATUS_OK);

  struct AdbcConnection connection;
  ASSERT_EQ(AdbcConnectionNew(&connection, &error), ADBC_STATUS_OK);
  ASSERT_EQ(AdbcConnectionInit(&connection, &database, &error), ADBC_STATUS_OK);

  struct AdbcStatement statement;
  ASSERT_EQ(AdbcStatementNew(&connection, &statement, &error), ADBC_STATUS_OK);

  ASSERT_EQ(AdbcStatementRelease(&statement, &error), ADBC_STATUS_OK);
  ASSERT_EQ(AdbcConnectionRelease(&connection, &error), ADBC_STATUS_OK);
  ASSERT_EQ(AdbcDatabaseRelease(&database, &error), ADBC_STATUS_OK);

  if (error.release) {
    error.release(&error);
  }
}

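Drivers that live in the apache/arrow-adbc repository can use the built-in validation library, which implements a generic test suite designed for fully featured SQL databases and provides utilities for testing a range of inputs and outputs.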

High-level testing¶

Recipe source: driver_example.py

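After verifying the basic driver functionality, we can use the built-in dbapi implementation in the adbc_driver_manager Python package to expose a ready-to-use Pythonic database API. This is also useful for high-level testing!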

First, we import pathlib for some path calculations, along with the dbapi module from adbc_driver_manager:

from pathlib import Path

from adbc_driver_manager import dbapi

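Next, we define a connect() function that wraps dbapi.connect() with the location of the shared library we built with cmake in the previous section. For the purposes of this tutorial, that is the CMake build/ directory.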

def connect(uri: str):
    build_dir = Path(__file__).parent / "build"
    for lib in [
        "libdriver_example.dylib",
        "libdriver_example.so",
        "driver_example.dll",
    ]:
        driver_lib = build_dir / lib
        if driver_lib.exists():
            return dbapi.connect(
                driver=str(driver_lib.resolve()), db_kwargs={"uri": uri}
            )

    raise RuntimeError("Can't find driver shared object")

Next, we can give our driver a try! The two pieces we implemented in the driver are the "bulk ingest" feature and "select all from", so let's see whether they work!

if __name__ == "__main__":
    import os

    import pyarrow

    with connect(uri=Path(__file__).parent.as_uri()) as con:
        data = pyarrow.table({"col": [1, 2, 3]})
        with con.cursor() as cur:
            cur.adbc_ingest("example.arrows", data, mode="create")

        with con.cursor() as cur:
            cur.execute("SELECT * FROM example.arrows")
            print(cur.fetchall())

        os.unlink(Path(__file__).parent / "example.arrows")

High-level tests can also be written in R using the adbcdrivermanager package.

library(adbcdrivermanager)

drv <- adbc_driver("build/libdriver_example.dylib")
db <- adbc_database_init(drv, uri = paste0("file://", getwd()))
con <- adbc_connection_init(db)

data.frame(col = 1:3) |> write_adbc(con, "example.arrows")
con |> read_adbc("SELECT * FROM example.arrows") |> as.data.frame()
unlink("example.arrows")