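Arrow File I/O

Apache Arrow provides file I/O functions so that an application can use Arrow from start to finish. In this article, you will:

Read an Arrow file into a RecordBatch and write it back out
Read a CSV file into a Table and write it back out
Read a Parquet file into a Table and write it back out

Prerequisites

Before continuing, make sure you have:

An Arrow installation, which you can set up by following: Using Arrow C++ in your own project
An understanding of basic Arrow data structures, from: Basic Arrow Data Structures
A directory to run the final application in. This program will generate some files, so be prepared for that.

Setup

Before writing any file I/O, we need to fill in a few gaps:

We need to include the necessary headers.
A main() is needed to glue things together.
We need files to work with.

Includes

Before writing C++ code, we need some includes. We'll get iostream for output, then include Arrow's I/O functionality for each file type used in this article: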

#include <arrow/api.h>
#include <arrow/csv/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>
#include <parquet/arrow/reader.h>
#include <parquet/arrow/writer.h>

#include <iostream>

Main()

For our glue code, we'll use the main() pattern from the previous tutorial on data structures:

int main() {
  arrow::Status st = RunMain();
  if (!st.ok()) {
    std::cerr << st << std::endl;
    return 1;
  }
  return 0;
}
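The main() above works because every fallible step hands back a status and the caller returns early on the first failure. The sketch below imitates that flow without Arrow. MiniStatus, MINI_RETURN_NOT_OK, Step, RunSteps and ExitCodeFor are hypothetical names invented for this illustration; they stand in for arrow::Status and ARROW_RETURN_NOT_OK and are not Arrow's actual implementation.

```cpp
#include <iostream>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::Status, so the sketch compiles without Arrow.
struct MiniStatus {
  bool is_ok;
  std::string message;

  static MiniStatus OK() { return MiniStatus{true, ""}; }
  static MiniStatus Invalid(std::string msg) {
    return MiniStatus{false, std::move(msg)};
  }
  bool ok() const { return is_ok; }
};

// Hypothetical analogue of ARROW_RETURN_NOT_OK: leave the enclosing function
// as soon as a step reports failure.
#define MINI_RETURN_NOT_OK(expr)               \
  do {                                         \
    MiniStatus mini_st_tmp = (expr);           \
    if (!mini_st_tmp.ok()) return mini_st_tmp; \
  } while (false)

// One unit of work; counts how many steps actually ran.
MiniStatus Step(bool succeed, int* steps_run) {
  ++*steps_run;
  return succeed ? MiniStatus::OK() : MiniStatus::Invalid("step failed");
}

// Plays the role of RunMain(): three steps, stopping at the first failure.
MiniStatus RunSteps(bool second_step_succeeds, int* steps_run) {
  MINI_RETURN_NOT_OK(Step(true, steps_run));
  MINI_RETURN_NOT_OK(Step(second_step_succeeds, steps_run));
  MINI_RETURN_NOT_OK(Step(true, steps_run));  // skipped if the second step failed
  return MiniStatus::OK();
}

// Mirrors the body of main(): report the error and map it to an exit code.
int ExitCodeFor(const MiniStatus& st) {
  if (!st.ok()) {
    std::cerr << st.message << std::endl;
    return 1;
  }
  return 0;
}
```

When the second step fails, the third step never runs and the exit code is 1. This is the same behavior the tutorial relies on whenever an ARROW_RETURN_NOT_OK or ARROW_ASSIGN_OR_RAISE line hits an error.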

As before, it is paired with a RunMain():

arrow::Status RunMain() {

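Generating Files for Reading

We need some files to actually work with. In practice, your application will likely have its own input. Here, however, we want to explore I/O for its own sake, so let's generate some files to make this easy to follow. To create them, we'll define a helper function that we run first. Feel free to read through it; the concepts it uses are explained later in this article. Note that we're using the day/month/year data from the previous tutorial. For now, just copy the function in: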

arrow::Status GenInitialFile() {
  // Make a couple 8-bit integer arrays and a 16-bit integer array -- just like
  // basic Arrow example.
  arrow::Int8Builder int8builder;
  int8_t days_raw[5] = {1, 12, 17, 23, 28};
  ARROW_RETURN_NOT_OK(int8builder.AppendValues(days_raw, 5));
  std::shared_ptr<arrow::Array> days;
  ARROW_ASSIGN_OR_RAISE(days, int8builder.Finish());

  int8_t months_raw[5] = {1, 3, 5, 7, 1};
  ARROW_RETURN_NOT_OK(int8builder.AppendValues(months_raw, 5));
  std::shared_ptr<arrow::Array> months;
  ARROW_ASSIGN_OR_RAISE(months, int8builder.Finish());

  arrow::Int16Builder int16builder;
  int16_t years_raw[5] = {1990, 2000, 1995, 2000, 1995};
  ARROW_RETURN_NOT_OK(int16builder.AppendValues(years_raw, 5));
  std::shared_ptr<arrow::Array> years;
  ARROW_ASSIGN_OR_RAISE(years, int16builder.Finish());

  // Get a vector of our Arrays
  std::vector<std::shared_ptr<arrow::Array>> columns = {days, months, years};

  // Make a schema to initialize the Table with
  std::shared_ptr<arrow::Field> field_day, field_month, field_year;
  std::shared_ptr<arrow::Schema> schema;

  field_day = arrow::field("Day", arrow::int8());
  field_month = arrow::field("Month", arrow::int8());
  field_year = arrow::field("Year", arrow::int16());

  schema = arrow::schema({field_day, field_month, field_year});
  // With the schema and data, create a Table
  std::shared_ptr<arrow::Table> table;
  table = arrow::Table::Make(schema, columns);

  // Write out test files in IPC, CSV, and Parquet for the example to use.
  std::shared_ptr<arrow::io::FileOutputStream> outfile;
  ARROW_ASSIGN_OR_RAISE(outfile, arrow::io::FileOutputStream::Open("test_in.arrow"));
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::ipc::RecordBatchWriter> ipc_writer,
                        arrow::ipc::MakeFileWriter(outfile, schema));
  ARROW_RETURN_NOT_OK(ipc_writer->WriteTable(*table));
  ARROW_RETURN_NOT_OK(ipc_writer->Close());

  ARROW_ASSIGN_OR_RAISE(outfile, arrow::io::FileOutputStream::Open("test_in.csv"));
  ARROW_ASSIGN_OR_RAISE(auto csv_writer,
                        arrow::csv::MakeCSVWriter(outfile, table->schema()));
  ARROW_RETURN_NOT_OK(csv_writer->WriteTable(*table));
  ARROW_RETURN_NOT_OK(csv_writer->Close());

  ARROW_ASSIGN_OR_RAISE(outfile, arrow::io::FileOutputStream::Open("test_in.parquet"));
  PARQUET_THROW_NOT_OK(
      parquet::arrow::WriteTable(*table, arrow::default_memory_pool(), outfile, 5));

  return arrow::Status::OK();
}

For the rest of your code to work, make sure to call GenInitialFile() as the very first line of RunMain() to initialize the environment:

  // Generate initial files for each format with a helper function -- don't worry,
  // we'll also write a table in this example.
  ARROW_RETURN_NOT_OK(GenInitialFile());
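If you want to confirm that the helper really left its three input files in the working directory, a small standard-library check can do that without touching Arrow. MissingFiles below is a hypothetical helper written for this sketch, not an Arrow function. It only tests whether each path can be opened for reading.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Arrow): returns every path in `paths`
// that cannot be opened for reading.
std::vector<std::string> MissingFiles(const std::vector<std::string>& paths) {
  std::vector<std::string> missing;
  for (const std::string& path : paths) {
    std::ifstream probe(path, std::ios::binary);
    if (!probe.is_open()) {
      missing.push_back(path);
    }
  }
  return missing;
}
```

After a successful GenInitialFile(), calling MissingFiles({"test_in.arrow", "test_in.csv", "test_in.parquet"}) would be expected to return an empty vector.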

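I/O with Arrow Files

We'll go through this step by step, reading and then writing, as follows:

Reading a file
1. Open the file
2. Bind the file to an ipc::RecordBatchFileReader
3. Read the file into a RecordBatch
Writing a file
1. Get an io::FileOutputStream
2. Write to the file from the RecordBatch

Opening a File

To actually read a file, we need some way to point to it. In Arrow, that means getting an io::ReadableFile object. Much like an ArrayBuilder can be reset and used to build new arrays, this object can be reassigned to new files, so we'll use this one instance throughout the example: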

  // First, we have to set up a ReadableFile object, which just lets us point our
  // readers to the right data on disk. We'll be reusing this object, and rebinding
  // it to multiple files throughout the example.
  std::shared_ptr<arrow::io::ReadableFile> infile;
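Because infile is a std::shared_ptr, "rebinding" it just means assigning a new pointer to the same variable. The sketch below shows that mechanism with std::ifstream standing in for io::ReadableFile, so it compiles without Arrow. FirstHandleReleasedOnRebind is a hypothetical function written for this illustration.

```cpp
#include <fstream>
#include <memory>
#include <string>

// Rebinds one shared handle from `first_path` to `second_path` and reports
// whether the first stream object was destroyed (and so closed) by the rebind.
// std::ifstream is only a stand-in for arrow::io::ReadableFile here.
bool FirstHandleReleasedOnRebind(const std::string& first_path,
                                 const std::string& second_path) {
  std::shared_ptr<std::ifstream> infile =
      std::make_shared<std::ifstream>(first_path);
  // A weak_ptr observes the first handle without keeping it alive.
  std::weak_ptr<std::ifstream> first_handle = infile;
  // Rebind: the variable now owns a handle to the second file.
  infile = std::make_shared<std::ifstream>(second_path);
  // Nobody else owned the first handle, so it is gone.
  return first_handle.expired();
}
```

In this sketch nothing else owns the first handle, so it is released as soon as the variable is reassigned. In the tutorial, a reader created from infile presumably keeps its own shared reference, in which case the old file would stay open until that reader is destroyed as well.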

An io::ReadableFile does little on its own. We actually bind it to a file with io::ReadableFile::Open(). For our purposes here, the default arguments suffice:

  // Get "test_in.arrow" into our file pointer
  ARROW_ASSIGN_OR_RAISE(infile, arrow::io::ReadableFile::Open(
                                    "test_in.arrow", arrow::default_memory_pool()));

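Opening an Arrow File Reader

An io::ReadableFile is too generic to offer everything needed to read an Arrow file. We use it to get an ipc::RecordBatchFileReader object, which implements all the logic needed to read a correctly formatted Arrow file. We get one through ipc::RecordBatchFileReader::Open():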

  // Open up the file with the IPC features of the library, gives us a reader object.
  ARROW_ASSIGN_OR_RAISE(auto ipc_reader, arrow::ipc::RecordBatchFileReader::Open(infile));

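Reading an Open Arrow File to RecordBatch

Arrow files are read as RecordBatches, so we declare a RecordBatch to read into. Once we have that, we can actually read the file. An Arrow file can hold multiple RecordBatches, so we must pass an index. This file has only one, so we pass 0: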

  // Using the reader, we can read Record Batches. Note that this is specific to IPC;
  // for other formats, we focus on Tables, but here, RecordBatches are used.
  std::shared_ptr<arrow::RecordBatch> rbatch;
  ARROW_ASSIGN_OR_RAISE(rbatch, ipc_reader->ReadRecordBatch(0));

Prepare a FileOutputStream

For output, we need an io::FileOutputStream. Just like our io::ReadableFile, we'll be reusing it, so be ready for that. We open files the same way as when reading:

  // Just like with input, we get an object for the output file.
  std::shared_ptr<arrow::io::FileOutputStream> outfile;
  // Bind it to "test_out.arrow"
  ARROW_ASSIGN_OR_RAISE(outfile, arrow::io::FileOutputStream::Open("test_out.arrow"));

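Write Arrow File from RecordBatch

Now we take the RecordBatch we read earlier and use it, along with our target file, to create an ipc::RecordBatchWriter. The ipc::RecordBatchWriter needs two things:

The target file
The Schema of our RecordBatch (in case we need to write more RecordBatches of the same format)

The Schema comes from our existing RecordBatch, and the target file is the output stream we just created.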

  // Set up a writer with the output file and the schema. With both defined here,
  // the writer is ready to go.
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::ipc::RecordBatchWriter> ipc_writer,
                        arrow::ipc::MakeFileWriter(outfile, rbatch->schema()));

We can simply call ipc::RecordBatchWriter::WriteRecordBatch() with our RecordBatch to fill up our file:

  // Write the record batch.
  ARROW_RETURN_NOT_OK(ipc_writer->WriteRecordBatch(*rbatch));

For IPC in particular, the writer has to be closed explicitly, since it anticipates that more than one batch may be written. To do that:

  // Specifically for IPC, the writer needs to be explicitly closed.
  ARROW_RETURN_NOT_OK(ipc_writer->Close());

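Now we've read and written an IPC file!

I/O with CSV

We'll go through this step by step, reading and then writing, as follows:

Reading a file
1. Open the file
2. Prepare a Table
3. Read the file using csv::TableReader
Writing a file
1. Get an io::FileOutputStream
2. Write to the file from the Table

Opening a CSV File

For a CSV file, we need to open an io::ReadableFile, just as we did for the Arrow file, and we reuse our io::ReadableFile object from before to do so: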

  // Bind our input file to "test_in.csv"
  ARROW_ASSIGN_OR_RAISE(infile, arrow::io::ReadableFile::Open("test_in.csv"));

Preparing a Table

A CSV file can be read into a Table, so declare a pointer to a Table:

  std::shared_ptr<arrow::Table> csv_table;

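Read a CSV File to Table

The CSV reader takes several option structs. Luckily, each has defaults that we can pass directly. For reference on the other options, see: File Formats. Our file has no special delimiters and is small, so we can make our reader with the defaults: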

  // The CSV reader has several objects for various options. For now, we'll use defaults.
  ARROW_ASSIGN_OR_RAISE(
      auto csv_reader,
      arrow::csv::TableReader::Make(
          arrow::io::default_io_context(), infile, arrow::csv::ReadOptions::Defaults(),
          arrow::csv::ParseOptions::Defaults(), arrow::csv::ConvertOptions::Defaults()));

With the CSV reader primed, we can use its csv::TableReader::Read() method to fill our Table:

  // Read the table.
  ARROW_ASSIGN_OR_RAISE(csv_table, csv_reader->Read());

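Write a CSV File from Table

Writing a Table to a CSV file looks much like writing a RecordBatch to an IPC file, except that we pass our Table and call ipc::RecordBatchWriter::WriteTable() instead of ipc::RecordBatchWriter::WriteRecordBatch(). Note that the same writer class is used: csv::MakeCSVWriter() gives us an ipc::RecordBatchWriter, and we write with ipc::RecordBatchWriter::WriteTable() because we have a Table. We'll target a file, use our Table's Schema, and then write the Table: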

  // Bind our output file to "test_out.csv"
  ARROW_ASSIGN_OR_RAISE(outfile, arrow::io::FileOutputStream::Open("test_out.csv"));
  // The CSV writer has simpler defaults, review API documentation for more complex usage.
  ARROW_ASSIGN_OR_RAISE(auto csv_writer,
                        arrow::csv::MakeCSVWriter(outfile, csv_table->schema()));
  ARROW_RETURN_NOT_OK(csv_writer->WriteTable(*csv_table));
  // Not necessary, but a safe practice.
  ARROW_RETURN_NOT_OK(csv_writer->Close());

Now we've read and written a CSV file!
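One quick way to sanity-check the round trip is to compare test_in.csv with test_out.csv. The sketch below does a byte-for-byte comparison using only the standard library. SameContents is a hypothetical helper written for this illustration. Whether the two files come out identical is an assumption rather than something the tutorial guarantees: the CSV reader infers column types on its own, although the same small integers would normally serialize to the same text.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not part of Arrow): true only if both files can be
// opened and contain exactly the same bytes.
bool SameContents(const std::string& path_a, const std::string& path_b) {
  std::ifstream a(path_a, std::ios::binary);
  std::ifstream b(path_b, std::ios::binary);
  if (!a.is_open() || !b.is_open()) {
    return false;  // a missing or unreadable file never matches
  }
  // Read each file fully into memory; fine for the tiny files used here.
  const std::string bytes_a((std::istreambuf_iterator<char>(a)),
                            std::istreambuf_iterator<char>());
  const std::string bytes_b((std::istreambuf_iterator<char>(b)),
                            std::istreambuf_iterator<char>());
  return bytes_a == bytes_b;
}
```

Calling SameContents("test_in.csv", "test_out.csv") after the program has run gives a simple yes/no answer. For larger files, comparing in fixed-size chunks would avoid loading everything into memory.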

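File I/O with Parquet

We'll go through this step by step, reading and then writing, as follows:

Reading a file
1. Open the file
2. Prepare a parquet::arrow::FileReader
3. Read the file into a Table
Writing a file
1. Write the Table to a file

Opening a Parquet File

Once more, this file format, Parquet, needs an io::ReadableFile, which we already have, and needs io::ReadableFile::Open() to be called on the file: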

  // Bind our input file to "test_in.parquet"
  ARROW_ASSIGN_OR_RAISE(infile, arrow::io::ReadableFile::Open("test_in.parquet"));

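Setting up a Parquet Reader

As always, we need a reader to actually read the file. So far, we've been getting the reader for each file format from the Arrow namespace. This time, we enter the Parquet namespace to get the parquet::arrow::FileReader: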

  std::unique_ptr<parquet::arrow::FileReader> reader;

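Now, to set up our reader, we call parquet::arrow::OpenFile(). Yes, this is necessary even though we already used io::ReadableFile::Open(). Note that OpenFile() is called here in its Result-returning form, so PARQUET_ASSIGN_OR_THROW unwraps the result and assigns the parquet::arrow::FileReader to reader, rather than the reader being passed in as an output argument:

  // Note that this form of Parquet's OpenFile() returns the reader wrapped in a
  // Result; PARQUET_ASSIGN_OR_THROW unwraps it and assigns it to reader.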
  PARQUET_ASSIGN_OR_THROW(reader,
                          parquet::arrow::OpenFile(infile, arrow::default_memory_pool()));

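Reading a Parquet File to Table

With a prepared parquet::arrow::FileReader in hand, we can read into a Table. Here, ReadTable() does not return the Table. Instead, we pass a pointer to our Table and the reader fills it in: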

  std::shared_ptr<arrow::Table> parquet_table;
  // Read the table.
  PARQUET_THROW_NOT_OK(reader->ReadTable(&parquet_table));

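Writing a Parquet File from Table

For single-shot writes, writing a Parquet file does not need a writer object. Instead, we pass in our Table, the memory pool to use for any necessary allocations, the output stream to write to, and the chunk size to use if the file needs to be broken up at all: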

  // Parquet writing does not need a declared writer object. Just get the output
  // file bound, then pass in the table, memory pool, output, and chunk size for
  // breaking up the Table on-disk.
  ARROW_ASSIGN_OR_RAISE(outfile, arrow::io::FileOutputStream::Open("test_out.parquet"));
  PARQUET_THROW_NOT_OK(parquet::arrow::WriteTable(
      *parquet_table, arrow::default_memory_pool(), outfile, 5));
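The last argument to WriteTable() (5 in the call above) is the chunk size, that is, the maximum number of rows written per row group. As a rough worked example, the number of row groups for a non-empty table is the row count divided by the chunk size, rounded up. RowGroupCount below is a hypothetical helper written for this illustration. It assumes nothing other than the chunk size forces additional splits, which writer properties might do in practice.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Arrow or Parquet): number of row groups
// needed if each group holds at most `chunk_size` rows. Assumes the chunk
// size is the only thing that splits the table.
int64_t RowGroupCount(int64_t num_rows, int64_t chunk_size) {
  assert(num_rows >= 0);
  assert(chunk_size > 0);
  return (num_rows + chunk_size - 1) / chunk_size;  // ceiling division
}
```

For the 5-row table in this tutorial and a chunk size of 5, RowGroupCount(5, 5) is 1, so everything fits in a single row group. A chunk size of 2 would give RowGroupCount(5, 2) == 3.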

Ending Program

Finally, we just return Status::OK(), so main() knows that we're done and that everything went fine, just like in the first tutorial.

  return arrow::Status::OK();
}

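With that, you've read and written IPC, CSV, and Parquet files in Arrow, and can properly load data and write output. In the next article, we move on to processing data with compute functions.

Refer to the below for a copy of the complete code: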

// (Doc section: Includes)
#include <arrow/api.h>
#include <arrow/csv/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>
#include <parquet/arrow/reader.h>
#include <parquet/arrow/writer.h>

#include <iostream>
// (Doc section: Includes)

// (Doc section: GenInitialFile)
arrow::Status GenInitialFile() {
  // Make a couple 8-bit integer arrays and a 16-bit integer array -- just like
  // basic Arrow example.
  arrow::Int8Builder int8builder;
  int8_t days_raw[5] = {1, 12, 17, 23, 28};
  ARROW_RETURN_NOT_OK(int8builder.AppendValues(days_raw, 5));
  std::shared_ptr<arrow::Array> days;
  ARROW_ASSIGN_OR_RAISE(days, int8builder.Finish());

  int8_t months_raw[5] = {1, 3, 5, 7, 1};
  ARROW_RETURN_NOT_OK(int8builder.AppendValues(months_raw, 5));
  std::shared_ptr<arrow::Array> months;
  ARROW_ASSIGN_OR_RAISE(months, int8builder.Finish());

  arrow::Int16Builder int16builder;
  int16_t years_raw[5] = {1990, 2000, 1995, 2000, 1995};
  ARROW_RETURN_NOT_OK(int16builder.AppendValues(years_raw, 5));
  std::shared_ptr<arrow::Array> years;
  ARROW_ASSIGN_OR_RAISE(years, int16builder.Finish());

  // Get a vector of our Arrays
  std::vector<std::shared_ptr<arrow::Array>> columns = {days, months, years};

  // Make a schema to initialize the Table with
  std::shared_ptr<arrow::Field> field_day, field_month, field_year;
  std::shared_ptr<arrow::Schema> schema;

  field_day = arrow::field("Day", arrow::int8());
  field_month = arrow::field("Month", arrow::int8());
  field_year = arrow::field("Year", arrow::int16());

  schema = arrow::schema({field_day, field_month, field_year});
  // With the schema and data, create a Table
  std::shared_ptr<arrow::Table> table;
  table = arrow::Table::Make(schema, columns);

  // Write out test files in IPC, CSV, and Parquet for the example to use.
  std::shared_ptr<arrow::io::FileOutputStream> outfile;
  ARROW_ASSIGN_OR_RAISE(outfile, arrow::io::FileOutputStream::Open("test_in.arrow"));
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::ipc::RecordBatchWriter> ipc_writer,
                        arrow::ipc::MakeFileWriter(outfile, schema));
  ARROW_RETURN_NOT_OK(ipc_writer->WriteTable(*table));
  ARROW_RETURN_NOT_OK(ipc_writer->Close());

  ARROW_ASSIGN_OR_RAISE(outfile, arrow::io::FileOutputStream::Open("test_in.csv"));
  ARROW_ASSIGN_OR_RAISE(auto csv_writer,
                        arrow::csv::MakeCSVWriter(outfile, table->schema()));
  ARROW_RETURN_NOT_OK(csv_writer->WriteTable(*table));
  ARROW_RETURN_NOT_OK(csv_writer->Close());

  ARROW_ASSIGN_OR_RAISE(outfile, arrow::io::FileOutputStream::Open("test_in.parquet"));
  PARQUET_THROW_NOT_OK(
      parquet::arrow::WriteTable(*table, arrow::default_memory_pool(), outfile, 5));

  return arrow::Status::OK();
}
// (Doc section: GenInitialFile)

// (Doc section: RunMain)
arrow::Status RunMain() {
  // (Doc section: RunMain)
  // (Doc section: Gen Files)
  // Generate initial files for each format with a helper function -- don't worry,
  // we'll also write a table in this example.
  ARROW_RETURN_NOT_OK(GenInitialFile());
  // (Doc section: Gen Files)

  // (Doc section: ReadableFile Definition)
  // First, we have to set up a ReadableFile object, which just lets us point our
  // readers to the right data on disk. We'll be reusing this object, and rebinding
  // it to multiple files throughout the example.
  std::shared_ptr<arrow::io::ReadableFile> infile;
  // (Doc section: ReadableFile Definition)
  // (Doc section: Arrow ReadableFile Open)
  // Get "test_in.arrow" into our file pointer
  ARROW_ASSIGN_OR_RAISE(infile, arrow::io::ReadableFile::Open(
                                    "test_in.arrow", arrow::default_memory_pool()));
  // (Doc section: Arrow ReadableFile Open)
  // (Doc section: Arrow Read Open)
  // Open up the file with the IPC features of the library, gives us a reader object.
  ARROW_ASSIGN_OR_RAISE(auto ipc_reader, arrow::ipc::RecordBatchFileReader::Open(infile));
  // (Doc section: Arrow Read Open)
  // (Doc section: Arrow Read)
  // Using the reader, we can read Record Batches. Note that this is specific to IPC;
  // for other formats, we focus on Tables, but here, RecordBatches are used.
  std::shared_ptr<arrow::RecordBatch> rbatch;
  ARROW_ASSIGN_OR_RAISE(rbatch, ipc_reader->ReadRecordBatch(0));
  // (Doc section: Arrow Read)

  // (Doc section: Arrow Write Open)
  // Just like with input, we get an object for the output file.
  std::shared_ptr<arrow::io::FileOutputStream> outfile;
  // Bind it to "test_out.arrow"
  ARROW_ASSIGN_OR_RAISE(outfile, arrow::io::FileOutputStream::Open("test_out.arrow"));
  // (Doc section: Arrow Write Open)
  // (Doc section: Arrow Writer)
  // Set up a writer with the output file and the schema. With both defined here,
  // the writer is ready to go.
  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::ipc::RecordBatchWriter> ipc_writer,
                        arrow::ipc::MakeFileWriter(outfile, rbatch->schema()));
  // (Doc section: Arrow Writer)
  // (Doc section: Arrow Write)
  // Write the record batch.
  ARROW_RETURN_NOT_OK(ipc_writer->WriteRecordBatch(*rbatch));
  // (Doc section: Arrow Write)
  // (Doc section: Arrow Close)
  // Specifically for IPC, the writer needs to be explicitly closed.
  ARROW_RETURN_NOT_OK(ipc_writer->Close());
  // (Doc section: Arrow Close)

  // (Doc section: CSV Read Open)
  // Bind our input file to "test_in.csv"
  ARROW_ASSIGN_OR_RAISE(infile, arrow::io::ReadableFile::Open("test_in.csv"));
  // (Doc section: CSV Read Open)
  // (Doc section: CSV Table Declare)
  std::shared_ptr<arrow::Table> csv_table;
  // (Doc section: CSV Table Declare)
  // (Doc section: CSV Reader Make)
  // The CSV reader has several objects for various options. For now, we'll use defaults.
  ARROW_ASSIGN_OR_RAISE(
      auto csv_reader,
      arrow::csv::TableReader::Make(
          arrow::io::default_io_context(), infile, arrow::csv::ReadOptions::Defaults(),
          arrow::csv::ParseOptions::Defaults(), arrow::csv::ConvertOptions::Defaults()));
  // (Doc section: CSV Reader Make)
  // (Doc section: CSV Read)
  // Read the table.
  ARROW_ASSIGN_OR_RAISE(csv_table, csv_reader->Read());
  // (Doc section: CSV Read)

  // (Doc section: CSV Write)
  // Bind our output file to "test_out.csv"
  ARROW_ASSIGN_OR_RAISE(outfile, arrow::io::FileOutputStream::Open("test_out.csv"));
  // The CSV writer has simpler defaults, review API documentation for more complex usage.
  ARROW_ASSIGN_OR_RAISE(auto csv_writer,
                        arrow::csv::MakeCSVWriter(outfile, csv_table->schema()));
  ARROW_RETURN_NOT_OK(csv_writer->WriteTable(*csv_table));
  // Not necessary, but a safe practice.
  ARROW_RETURN_NOT_OK(csv_writer->Close());
  // (Doc section: CSV Write)

  // (Doc section: Parquet Read Open)
  // Bind our input file to "test_in.parquet"
  ARROW_ASSIGN_OR_RAISE(infile, arrow::io::ReadableFile::Open("test_in.parquet"));
  // (Doc section: Parquet Read Open)
  // (Doc section: Parquet FileReader)
  std::unique_ptr<parquet::arrow::FileReader> reader;
  // (Doc section: Parquet FileReader)
  // (Doc section: Parquet OpenFile)
  // Note that this form of Parquet's OpenFile() returns the reader wrapped in a
  // Result; PARQUET_ASSIGN_OR_THROW unwraps it and assigns it to reader.
  PARQUET_ASSIGN_OR_THROW(reader,
                          parquet::arrow::OpenFile(infile, arrow::default_memory_pool()));
  // (Doc section: Parquet OpenFile)

  // (Doc section: Parquet Read)
  std::shared_ptr<arrow::Table> parquet_table;
  // Read the table.
  PARQUET_THROW_NOT_OK(reader->ReadTable(&parquet_table));
  // (Doc section: Parquet Read)

  // (Doc section: Parquet Write)
  // Parquet writing does not need a declared writer object. Just get the output
  // file bound, then pass in the table, memory pool, output, and chunk size for
  // breaking up the Table on-disk.
  ARROW_ASSIGN_OR_RAISE(outfile, arrow::io::FileOutputStream::Open("test_out.parquet"));
  PARQUET_THROW_NOT_OK(parquet::arrow::WriteTable(
      *parquet_table, arrow::default_memory_pool(), outfile, 5));
  // (Doc section: Parquet Write)
  // (Doc section: Return)
  return arrow::Status::OK();
}
// (Doc section: Return)

// (Doc section: Main)
int main() {
  arrow::Status st = RunMain();
  if (!st.ok()) {
    std::cerr << st << std::endl;
    return 1;
  }
  return 0;
}
// (Doc section: Main)