Apache Arrow defines two formats for serializing data for interprocess communication (IPC): a "stream" format and a "file" format, known as Feather. RecordBatchStreamReader and RecordBatchFileReader are interfaces for accessing record batches from input sources in those formats, respectively.
For guidance on how to use these classes, see the examples section.
Factory
The RecordBatchFileReader$create() and RecordBatchStreamReader$create() factory methods instantiate the object and take a single argument, named according to the class:
file
A character file name, raw vector, or Arrow file connection object (e.g. RandomAccessFile).
stream
A raw vector, Buffer, or InputStream.
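To illustrate the stream-based factory, the sketch below writes a single batch to an in-memory buffer in the IPC stream format and reads it back with RecordBatchStreamReader$create(); it assumes the arrow package is attached and uses BufferOutputStream and RecordBatchStreamWriter from the same package.

```r
library(arrow)

# Write one record batch to an in-memory buffer in the IPC stream format
sink <- BufferOutputStream$create()
writer <- RecordBatchStreamWriter$create(sink, schema(x = int32()))
writer$write(record_batch(x = 1:3))
writer$close()

# The stream factory accepts the resulting Buffer directly
reader <- RecordBatchStreamReader$create(sink$finish())
batch <- reader$read_next_batch()
```

The same pattern works with a raw vector or any InputStream as the `stream` argument.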
See also
read_ipc_stream() and read_feather() provide a simpler interface for reading data from these formats and are sufficient for many use cases.
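For the common case of round-tripping a single data frame, those higher-level helpers are usually all you need. A minimal sketch, assuming the arrow package is attached:

```r
library(arrow)

tf <- tempfile()
# write_feather() writes a data frame in the Arrow file (Feather) format
write_feather(chickwts, tf)
# read_feather() reads it back as a data frame
df <- read_feather(tf)
unlink(tf)
```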
Examples
tf <- tempfile()
on.exit(unlink(tf))
batch <- record_batch(chickwts)
# This opens a connection to the file in Arrow
file_obj <- FileOutputStream$create(tf)
# Pass that to a RecordBatchWriter to write data conforming to a schema
writer <- RecordBatchFileWriter$create(file_obj, batch$schema)
writer$write(batch)
# You may write additional batches to the stream, provided that they have
# the same schema.
# Call "close" on the writer to indicate end-of-file/stream
writer$close()
# Then, close the connection--closing the IPC message does not close the file
file_obj$close()
# Now, we have a file we can read from. Same pattern: open file connection,
# then pass it to a RecordBatchReader
read_file_obj <- ReadableFile$create(tf)
reader <- RecordBatchFileReader$create(read_file_obj)
# RecordBatchFileReader knows how many batches it has (StreamReader does not)
reader$num_record_batches
#> [1] 1
# We could consume the Reader by calling $read_next_batch() until all are
# consumed, or we can call $read_table() to pull them all into a Table
tab <- reader$read_table()
# Call as.data.frame to turn that Table into an R data.frame
df <- as.data.frame(tab)
# This should be the same data we sent
all.equal(df, chickwts, check.attributes = FALSE)
#> [1] TRUE
# Unlike the Writers, we don't have to close RecordBatchReaders,
# but we do still need to close the file connection
read_file_obj$close()