Feather File Format

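Feather is a portable file format for storing Arrow tables or data frames (from languages like Python or R) that uses the Arrow IPC format internally. Feather was created early in the Arrow project as a proof of concept for fast, language-agnostic data frame storage for Python (pandas) and R. There are two file format versions for Feather: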

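Version 2 (V2), the default version, which is exactly represented as the Arrow IPC file format on disk. V2 files support storing all Arrow data types as well as compression with LZ4 or ZSTD. V2 was first made available in Apache Arrow 0.17.0.
Version 1 (V1), a legacy version available starting in 2016, replaced by V2. V1 files are distinct from Arrow IPC files and lack many features, such as the ability to store all Arrow data types. V1 files also lack compression support. We intend to maintain read support for V1 for the foreseeable future.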

The pyarrow.feather module contains the read and write functions for the format. write_feather() accepts either a Table or a pandas.DataFrame object:

import pyarrow.feather as feather

# df is an existing pyarrow.Table or pandas.DataFrame
feather.write_feather(df, '/path/to/file')

read_feather() reads a Feather file as a pandas.DataFrame. read_table() reads a Feather file as a Table. Internally, read_feather() simply calls read_table() and converts the result to pandas:

# Result is pandas.DataFrame
read_df = feather.read_feather('/path/to/file')

# Result is pyarrow.Table
read_arrow = feather.read_table('/path/to/file')

These functions can read and write using either file paths or file-like objects. For example:

with open('/path/to/file', 'wb') as f:
    feather.write_feather(df, f)

with open('/path/to/file', 'rb') as f:
    read_df = feather.read_feather(f)

A file input passed to read_feather must support seeking.

Using Compression

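As of Apache Arrow version 0.17.0, Feather V2 files (the default version) support two fast compression libraries: LZ4 (using the frame format) and ZSTD. LZ4 is used by default if it is available, which it should be if you obtained pyarrow through a normal package manager: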

# Uses LZ4 by default
feather.write_feather(df, file_path)

# Use LZ4 explicitly
feather.write_feather(df, file_path, compression='lz4')

# Use ZSTD
feather.write_feather(df, file_path, compression='zstd')

# Do not compress
feather.write_feather(df, file_path, compression='uncompressed')

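Note that the default LZ4 compression generally yields much smaller files without sacrificing much read or write performance. In some cases, LZ4-compressed files may be faster to read and write than uncompressed files because of reduced disk IO requirements.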

Writing Version 1 (V1) Files

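For compatibility with libraries that do not support Version 2 files, you can write the Version 1 format by passing version=1 to write_feather. We intend to maintain read support for V1 for the foreseeable future.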