Arrow Flight RPC#

Arrow Flight 是一個基於 Arrow 資料的高效能資料服務 RPC 框架,建立於 gRPCIPC 格式 之上。

Flight 圍繞 Arrow 記錄批次的串流組織,可以從另一個服務下載或上傳到另一個服務。一組元資料方法提供串流的探索和內省,以及實作應用程式特定方法的能力。

方法和訊息線路格式由 Protobuf 定義,使與可能單獨支援 gRPC 和 Arrow 但不支援 Flight 的客戶端具有互通性。但是,Flight 實作包含進一步的最佳化,以避免 Protobuf 使用中的額外負擔(主要圍繞避免過多的記憶體複製)。

RPC 方法和請求模式#

Flight 定義了一組 RPC 方法,用於上傳/下載資料、檢索關於資料串流的元資料、列出可用的資料串流,以及用於實作應用程式特定的 RPC 方法。Flight 服務實作了這些方法的一些子集,而 Flight 客戶端可以呼叫這些方法中的任何一個。

資料串流由描述符(FlightDescriptor 訊息)識別,描述符可以是路徑或任意二進位命令。例如,描述符可以編碼 SQL 查詢、分散式檔案系統上檔案的路徑,甚至是 pickled Python 物件;應用程式可以根據自己的需要使用此訊息。

因此,一個 Flight 客戶端可以連接到任何服務並執行基本操作。為了促進這一點,Flight 服務預期支援一些常見的請求模式,如下所述。當然,應用程式可以忽略相容性,而只是將 Flight RPC 方法視為用於其自身目的的底層建構區塊。

有關所涉及的方法和訊息的完整詳細資訊,請參閱 Protocol Buffer 定義

下載資料#

希望下載資料的客戶端將會

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% http://www.apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Metadata Server participant Data Server Client->>Metadata Server: GetFlightInfo(FlightDescriptor) Metadata Server->>Client: FlightInfo{endpoints: [FlightEndpoint{ticket: Ticket}, …]} Note over Client, Data Server: This may be parallelized loop for each endpoint in FlightInfo.endpoints Client->>Data Server: DoGet(Ticket) Data Server->>Client: stream of FlightData end

透過 DoGet 檢索資料。#

  1. 建構或取得他們感興趣的資料集的 FlightDescriptor

    客戶端可能已經知道他們想要的描述符,或者他們可以使用像 ListFlights 這樣的方法來發現它們。

  2. 呼叫 GetFlightInfo(FlightDescriptor) 以取得 FlightInfo 訊息。

    Flight 不要求資料與元資料位於同一伺服器上。因此,FlightInfo 包含有關資料位置的詳細資訊,因此客戶端可以從適當的伺服器提取資料。這被編碼為 FlightInfo 內的一系列 FlightEndpoint 訊息。每個端點代表包含部分回應資料的位置。

    端點包含可以從中檢索此資料的位置(伺服器位址)列表,以及 Ticket,這是一個不透明的二進位權杖,伺服器將使用它來識別正在請求的資料。

    如果 FlightInfo.ordered 為 true,則表示來自不同端點的資料之間存在某種順序。客戶端應產生與每個端點傳回的資料以從前到後的順序串連時相同的結果。

    如果 FlightInfo.ordered 為 false,則客戶端可以以任意順序傳回來自任何端點的資料。來自任何特定端點的資料必須按順序傳回,但來自不同端點的資料可以交錯以允許平行提取。

    請注意,由於某些客戶端可能會忽略 FlightInfo.ordered,因此如果排序很重要且無法確保客戶端支援,則伺服器應傳回單個端點。

    回應還包含其他元資料,例如綱要,以及可選的資料集大小估計值。

  3. 使用伺服器傳回的每個端點。

    要使用端點,客戶端應連接到端點中的一個位置,然後使用端點中的票證呼叫 DoGet(Ticket)。這將為客戶端提供 Arrow 記錄批次的串流。

    如果伺服器希望指示資料位於本機伺服器上而不是不同的位置,則它可以傳回空的位置列表。然後,客戶端可以重複使用與原始伺服器的現有連線來提取資料。否則,客戶端必須連接到指示的位置之一。

    伺服器可能會將「自身」列為與其他伺服器位置並列的位置。通常,這需要伺服器知道其公用位址,但它也可以使用特殊的 URI 字串 arrow-flight-reuse-connection://? 來告訴客戶端,他們可以重複使用與同一伺服器的現有連線,而無需能夠命名自身。請參閱下面的 連線重用

    透過這種方式,端點內的位置也可以被認為執行了前瞻性負載平衡或服務發現功能。端點可以代表已分割或以其他方式分散的資料。

    客戶端必須使用所有端點才能檢索完整的資料集。客戶端可以以任何順序使用端點,甚至可以平行使用,或者將端點分發到多台機器上以供使用;這取決於應用程式的實作。客戶端也可以使用 FlightInfo.ordered。請參閱前面的項目以取得 FlightInfo.ordered 的詳細資訊。

    每個端點都可能具有到期時間 (FlightEndpoint.expiration_time)。如果端點具有到期時間,則客戶端可以透過 DoGet 多次取得資料,直到達到到期時間。否則,是否可以重試 DoGet 請求由應用程式定義。到期時間表示為 google.protobuf.Timestamp

    如果到期時間很短,客戶端可以透過 RenewFlightEndpoint 動作延長到期時間。客戶端需要使用 DoAction 以及 RenewFlightEndpoint 動作類型來延長到期時間。Action.body 必須是具有要續訂的 FlightEndpointRenewFlightEndpointRequest

    客戶端可以透過 CancelFlightInfo 動作取消傳回的 FlightInfo。客戶端需要使用 DoAction 以及 CancelFlightInfo 動作類型來取消 FlightInfo

透過執行繁重查詢下載資料#

客戶端可能需要請求繁重查詢才能下載資料。但是,GetFlightInfo 在查詢完成之前不會傳回,因此客戶端會被封鎖。在這種情況下,客戶端可以使用 PollFlightInfo 而不是 GetFlightInfo

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% http://www.apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Metadata Server participant Data Server Client->>Metadata Server: PollFlightInfo(FlightDescriptor) Metadata Server->>Client: PollInfo{descriptor: FlightDescriptor', ...} Client->>Metadata Server: PollFlightInfo(FlightDescriptor') Metadata Server->>Client: PollInfo{descriptor: FlightDescriptor'', ...} Client->>Metadata Server: PollFlightInfo(FlightDescriptor'') Metadata Server->>Client: PollInfo{descriptor: null, info: FlightInfo{endpoints: [FlightEndpoint{ticket: Ticket}, …]} Note over Client, Data Server: This may be parallelized Note over Client, Data Server: Some endpoints may be processed while polling loop for each endpoint in FlightInfo.endpoints Client->>Data Server: DoGet(Ticket) Data Server->>Client: stream of FlightData end

透過 PollFlightInfo 輪詢長時間執行的查詢。#

  1. 如同之前一樣,建構或取得 FlightDescriptor

  2. 呼叫 PollFlightInfo(FlightDescriptor) 以取得 PollInfo 訊息。

    伺服器應在第一次呼叫時盡快回應。因此,客戶端不應等待第一個 PollInfo 回應。

    如果查詢尚未完成,PollInfo.flight_descriptor 具有 FlightDescriptor。客戶端應使用描述符(而不是原始的 FlightDescriptor)來呼叫下一個 PollFlightInfo()。伺服器應辨識 PollInfo.flight_descriptor,即使它不一定是最新的一個,以防客戶端錯過了中間的更新。

    如果查詢已完成,則 PollInfo.flight_descriptor 未設定。

    PollInfo.info 是目前可用的結果。每次都是完整的 FlightInfo,而不僅僅是先前和目前 FlightInfo 之間的差異。伺服器每次都應僅附加到 PollInfo.info 中的端點。因此,即使查詢尚未完成,客戶端也可以使用 PollInfo.info 中的 Ticket 執行 DoGet(Ticket)FlightInfo.ordered 也有效。

    伺服器不應回應,直到結果與上次不同為止。這樣,客戶端可以「長輪詢」更新,而無需不斷發出請求。如果需要,客戶端可以設定短暫的逾時時間以避免封鎖呼叫。

    PollInfo.progress 可能已設定。它代表查詢的進度。如果已設定,則值必須在 [0.0, 1.0] 範圍內。該值不一定是單調或非遞減的。伺服器可以僅透過更新 PollInfo.progress 值來回應,儘管它不應向客戶端發送垃圾郵件般的更新。

    PollInfo.timestamp 是此請求的到期時間。在此時間之後,伺服器可能不再接受輪詢描述符,並且查詢可能會被取消。這可能會在呼叫 PollFlightInfo 時更新。到期時間表示為 google.protobuf.Timestamp

    客戶端可以透過 CancelFlightInfo 動作取消查詢。

    如果查詢失敗,伺服器應傳回錯誤狀態而不是回應。客戶端不應輪詢請求,除非是 TIMED_OUTUNAVAILABLE,它們可能不是源自伺服器。

  3. 如同之前一樣,使用伺服器傳回的每個端點。

上傳資料#

若要上傳資料,客戶端將會

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% http://www.apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Server Note right of Client: The first FlightData includes a FlightDescriptor Client->>Server: DoPut(FlightData) Client->>Server: stream of FlightData Server->>Client: PutResult{app_metadata}

透過 DoPut 上傳資料。#

  1. 如同之前一樣,建構或取得 FlightDescriptor

  2. 呼叫 DoPut(FlightData) 並上傳 Arrow 記錄批次的串流。

    FlightDescriptor 包含在第一條訊息中,以便伺服器可以識別資料集。

DoPut 允許伺服器將回應訊息連同自訂元資料傳送回客戶端。這可用於實作諸如可恢復寫入之類的功能(例如,伺服器可以定期傳送訊息,指示到目前為止已提交了多少列)。

交換資料#

某些使用案例可能需要在單個呼叫中上傳和下載資料。雖然可以使用多個呼叫來模擬,但如果應用程式是有狀態的,這可能會很困難。例如,應用程式可能希望實作一個呼叫,其中客戶端上傳資料,伺服器以該資料的轉換作為回應;如果使用 DoGetDoPut 實作,這將需要是有狀態的。相反,DoExchange 允許將其作為單個呼叫實作。客戶端將會

%% Licensed to the Apache Software Foundation (ASF) under one %% or more contributor license agreements. See the NOTICE file %% distributed with this work for additional information %% regarding copyright ownership. The ASF licenses this file %% to you under the Apache License, Version 2.0 (the %% "License"); you may not use this file except in compliance %% with the License. You may obtain a copy of the License at %% %% http://www.apache.org/licenses/LICENSE-2.0 %% %% Unless required by applicable law or agreed to in writing, %% software distributed under the License is distributed on an %% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY %% KIND, either express or implied. See the License for the %% specific language governing permissions and limitations %% under the License. sequenceDiagram autonumber participant Client participant Server Note right of Client: The first FlightData includes a FlightDescriptor Client->>Server: DoExchange(FlightData) par [Client sends data] Client->>Server: stream of FlightData and [Server sends data] Server->>Client: stream of FlightData end

使用 DoExchange 的複雜資料流。#

  1. 如同之前一樣,建構或取得 FlightDescriptor

  2. 呼叫 DoExchange(FlightData)

    如同 DoPut 一樣,FlightDescriptor 包含在第一條訊息中。此時,客戶端和伺服器都可以同時將資料串流傳輸到另一端。

身份驗證#

Flight 支援各種身份驗證方法,應用程式可以根據自己的需求自訂這些方法。

「交握」身份驗證

這分為兩個部分實作。在連線時,客戶端呼叫 Handshake RPC 方法,應用程式定義的身份驗證處理常式可以與伺服器上的對應方交換任意數量的訊息。然後,處理常式提供二進位權杖。然後,Flight 客戶端將在所有未來呼叫的標頭中包含此權杖,伺服器身份驗證處理常式將驗證該權杖。

應用程式可以使用其中的任何部分;例如,他們可以忽略初始交握,並在每次呼叫時傳送外部取得的權杖(例如,持有者權杖),或者他們可以在交握期間建立信任,並且不驗證每次呼叫的權杖,將連線視為有狀態的(「登入」模式)。

警告

除非在每次呼叫時都驗證權杖,否則此模式不安全,尤其是在存在第 7 層負載平衡器(gRPC 常見的情況)或 gRPC 透明地重新連線客戶端的情況下。

基於標頭/基於中介軟體的身份驗證

客戶端可以在呼叫中包含自訂標頭。然後可以實作自訂中介軟體,以在伺服器端驗證和接受/拒絕呼叫。

相互 TLS (mTLS)

客戶端在連線建立期間提供憑證,伺服器會驗證該憑證。應用程式不需要實作任何身份驗證程式碼,但必須佈建和分發憑證。

這可能僅在某些實作中可用,並且僅在也啟用 TLS 時才可用。

某些 Flight 實作也可能公開底層 gRPC API,在這種情況下,任何 gRPC 支援的身份驗證方法 均可用。

位置 URI#

Flight 主要根據以下 Protobuf 和 gRPC 規範定義,但 Arrow 實作也可能支援替代傳輸(請參閱 Flight RPC)。客戶端和伺服器需要知道給定位置中的 URI 使用哪種傳輸,因此 Flight 實作應針對給定的傳輸使用以下 URI 方案

傳輸

URI 方案

gRPC(純文字)

grpc: 或 grpc+tcp

gRPC (TLS)

grpc+tls

gRPC(Unix 網域套接字)

grpc+unix

(重複使用連線)

arrow-flight-reuse-connection

UCX(純文字)(1)

ucx

註解

  • (1) Flight UCX 傳輸已在 19.0.0 版本中棄用。分離式 IPC 協定 章節提出了一種替代解決方案。

連線重用#

上面的「重複使用連線」不是特定的傳輸。相反,它表示客戶端可以嘗試對同一伺服器(以及透過相同的連線)執行 DoGet,它最初從該伺服器取得 FlightInfo(即,它對其呼叫了 GetFlightInfo)。這與未傳回特定 Location 時的解釋方式相同。

這允許伺服器將「自身」作為一個可能的位置傳回以提取資料,而無需知道其自身的公用位址,這在部署中可能很有用,在這種部署中,知道這一點可能很困難或不可能。例如,開發人員可能會將雲端環境中的遠端服務轉發到其本機;在這種情況下,遠端服務將無法知道正在透過其存取的本機主機名稱和埠。

出於相容性原因,URI 應始終為 arrow-flight-reuse-connection://?,並帶有尾隨的空查詢字串。Java 的 URI 實作不接受 scheme:scheme://,而 C++ 的實作不接受空字串,因此顯而易見的候選者不相容。所選的表示形式可以由這兩種實作以及 Go 的 net/url 和 Python 的 urllib.parse 剖析。

錯誤處理#

Arrow Flight 定義了自己的錯誤代碼集。實作在語言之間有所不同(例如,在 C++ 中,「未實作」是通用的 Arrow 錯誤狀態,而在 Java 中是 Flight 特定的例外),但公開了以下集合

錯誤代碼

描述

UNKNOWN

未知的錯誤。如果沒有其他錯誤適用,則為預設值。

INTERNAL

服務實作內部發生錯誤。

INVALID_ARGUMENT

客戶端將無效的引數傳遞給 RPC。

TIMED_OUT

操作超過了逾時時間或截止期限。

NOT_FOUND

找不到請求的資源(動作、資料串流)。

ALREADY_EXISTS

資源已存在。

CANCELLED

操作已取消(由客戶端或伺服器取消)。

UNAUTHENTICATED

客戶端未通過身份驗證。

UNAUTHORIZED

客戶端已通過身份驗證,但不具有請求操作的權限。

UNIMPLEMENTED

RPC 未實作。

UNAVAILABLE

伺服器不可用。可能會由客戶端針對連線問題發出。

外部資源#

Protocol Buffer 定義#

  1/*
  2 * Licensed to the Apache Software Foundation (ASF) under one
  3 * or more contributor license agreements.  See the NOTICE file
  4 * distributed with this work for additional information
  5 * regarding copyright ownership.  The ASF licenses this file
  6 * to you under the Apache License, Version 2.0 (the
  7 * "License"); you may not use this file except in compliance
  8 * with the License.  You may obtain a copy of the License at
  9 * <p>
 10 * http://www.apache.org/licenses/LICENSE-2.0
 11 * <p>
 12 * Unless required by applicable law or agreed to in writing, software
 13 * distributed under the License is distributed on an "AS IS" BASIS,
 14 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 15 * See the License for the specific language governing permissions and
 16 * limitations under the License.
 17 */
 18
 19syntax = "proto3";
 20import "google/protobuf/timestamp.proto";
 21
 22option java_package = "org.apache.arrow.flight.impl";
 23option go_package = "github.com/apache/arrow-go/arrow/flight/gen/flight";
 24option csharp_namespace = "Apache.Arrow.Flight.Protocol";
 25
 26package arrow.flight.protocol;
 27
 28/*
 29 * A flight service is an endpoint for retrieving or storing Arrow data. A
 30 * flight service can expose one or more predefined endpoints that can be
 31 * accessed using the Arrow Flight Protocol. Additionally, a flight service
 32 * can expose a set of actions that are available.
 33 */
 34service FlightService {
 35
 36  /*
 37   * Handshake between client and server. Depending on the server, the
 38   * handshake may be required to determine the token that should be used for
 39   * future operations. Both request and response are streams to allow multiple
 40   * round-trips depending on auth mechanism.
 41   */
 42  rpc Handshake(stream HandshakeRequest) returns (stream HandshakeResponse) {}
 43
 44  /*
 45   * Get a list of available streams given a particular criteria. Most flight
 46   * services will expose one or more streams that are readily available for
 47   * retrieval. This api allows listing the streams available for
 48   * consumption. A user can also provide a criteria. The criteria can limit
 49   * the subset of streams that can be listed via this interface. Each flight
 50   * service allows its own definition of how to consume criteria.
 51   */
 52  rpc ListFlights(Criteria) returns (stream FlightInfo) {}
 53
 54  /*
 55   * For a given FlightDescriptor, get information about how the flight can be
 56   * consumed. This is a useful interface if the consumer of the interface
 57   * already can identify the specific flight to consume. This interface can
 58   * also allow a consumer to generate a flight stream through a specified
 59   * descriptor. For example, a flight descriptor might be something that
 60   * includes a SQL statement or a Pickled Python operation that will be
 61   * executed. In those cases, the descriptor will not be previously available
 62   * within the list of available streams provided by ListFlights but will be
 63   * available for consumption for the duration defined by the specific flight
 64   * service.
 65   */
 66  rpc GetFlightInfo(FlightDescriptor) returns (FlightInfo) {}
 67
 68  /*
 69   * For a given FlightDescriptor, start a query and get information
 70   * to poll its execution status. This is a useful interface if the
 71   * query may be a long-running query. The first PollFlightInfo call
 72   * should return as quickly as possible. (GetFlightInfo doesn't
 73   * return until the query is complete.)
 74   *
 75   * A client can consume any available results before
 76   * the query is completed. See PollInfo.info for details.
 77   *
 78   * A client can poll the updated query status by calling
 79   * PollFlightInfo() with PollInfo.flight_descriptor. A server
 80   * should not respond until the result would be different from last
 81   * time. That way, the client can "long poll" for updates
 82   * without constantly making requests. Clients can set a short timeout
 83   * to avoid blocking calls if desired.
 84   *
 85   * A client can't use PollInfo.flight_descriptor after
 86   * PollInfo.expiration_time passes. A server might not accept the
 87   * retry descriptor anymore and the query may be cancelled.
 88   *
 89   * A client may use the CancelFlightInfo action with
 90   * PollInfo.info to cancel the running query.
 91   */
 92  rpc PollFlightInfo(FlightDescriptor) returns (PollInfo) {}
 93
 94  /*
 95   * For a given FlightDescriptor, get the Schema as described in Schema.fbs::Schema
 96   * This is used when a consumer needs the Schema of flight stream. Similar to
 97   * GetFlightInfo this interface may generate a new flight that was not previously
 98   * available in ListFlights.
 99   */
100   rpc GetSchema(FlightDescriptor) returns (SchemaResult) {}
101
102  /*
103   * Retrieve a single stream associated with a particular descriptor
104   * associated with the referenced ticket. A Flight can be composed of one or
105   * more streams where each stream can be retrieved using a separate opaque
106   * ticket that the flight service uses for managing a collection of streams.
107   */
108  rpc DoGet(Ticket) returns (stream FlightData) {}
109
110  /*
111   * Push a stream to the flight service associated with a particular
112   * flight stream. This allows a client of a flight service to upload a stream
113   * of data. Depending on the particular flight service, a client consumer
114   * could be allowed to upload a single stream per descriptor or an unlimited
115   * number. In the latter, the service might implement a 'seal' action that
116   * can be applied to a descriptor once all streams are uploaded.
117   */
118  rpc DoPut(stream FlightData) returns (stream PutResult) {}
119
120  /*
121   * Open a bidirectional data channel for a given descriptor. This
122   * allows clients to send and receive arbitrary Arrow data and
123   * application-specific metadata in a single logical stream. In
124   * contrast to DoGet/DoPut, this is more suited for clients
125   * offloading computation (rather than storage) to a Flight service.
126   */
127  rpc DoExchange(stream FlightData) returns (stream FlightData) {}
128
129  /*
130   * Flight services can support an arbitrary number of simple actions in
131   * addition to the possible ListFlights, GetFlightInfo, DoGet, DoPut
132   * operations that are potentially available. DoAction allows a flight client
133   * to do a specific action against a flight service. An action includes
134   * opaque request and response objects that are specific to the type action
135   * being undertaken.
136   */
137  rpc DoAction(Action) returns (stream Result) {}
138
139  /*
140   * A flight service exposes all of the available action types that it has
141   * along with descriptions. This allows different flight consumers to
142   * understand the capabilities of the flight service.
143   */
144  rpc ListActions(Empty) returns (stream ActionType) {}
145}
146
147/*
148 * The request that a client provides to a server on handshake.
149 */
150message HandshakeRequest {
151
152  /*
153   * A defined protocol version
154   */
155  uint64 protocol_version = 1;
156
157  /*
158   * Arbitrary auth/handshake info.
159   */
160  bytes payload = 2;
161}
162
163message HandshakeResponse {
164
165  /*
166   * A defined protocol version
167   */
168  uint64 protocol_version = 1;
169
170  /*
171   * Arbitrary auth/handshake info.
172   */
173  bytes payload = 2;
174}
175
176/*
177 * A message for doing simple auth.
178 */
179message BasicAuth {
180  string username = 2;
181  string password = 3;
182}
183
184message Empty {}
185
186/*
187 * Describes an available action, including both the name used for execution
188 * along with a short description of the purpose of the action.
189 */
190message ActionType {
191  string type = 1;
192  string description = 2;
193}
194
195/*
196 * A service specific expression that can be used to return a limited set
197 * of available Arrow Flight streams.
198 */
199message Criteria {
200  bytes expression = 1;
201}
202
203/*
204 * An opaque action specific for the service.
205 */
206message Action {
207  string type = 1;
208  bytes body = 2;
209}
210
211/*
212 * An opaque result returned after executing an action.
213 */
214message Result {
215  bytes body = 1;
216}
217
218/*
219 * Wrap the result of a getSchema call
220 */
221message SchemaResult {
222  // The schema of the dataset in its IPC form:
223  //   4 bytes - an optional IPC_CONTINUATION_TOKEN prefix
224  //   4 bytes - the byte length of the payload
225  //   a flatbuffer Message whose header is the Schema
226  bytes schema = 1;
227}
228
229/*
230 * The name or tag for a Flight. May be used as a way to retrieve or generate
231 * a flight or be used to expose a set of previously defined flights.
232 */
233message FlightDescriptor {
234
235  /*
236   * Describes what type of descriptor is defined.
237   */
238  enum DescriptorType {
239
240    // Protobuf pattern, not used.
241    UNKNOWN = 0;
242
243    /*
244     * A named path that identifies a dataset. A path is composed of a string
245     * or list of strings describing a particular dataset. This is conceptually
246     *  similar to a path inside a filesystem.
247     */
248    PATH = 1;
249
250    /*
251     * An opaque command to generate a dataset.
252     */
253    CMD = 2;
254  }
255
256  DescriptorType type = 1;
257
258  /*
259   * Opaque value used to express a command. Should only be defined when
260   * type = CMD.
261   */
262  bytes cmd = 2;
263
264  /*
265   * List of strings identifying a particular dataset. Should only be defined
266   * when type = PATH.
267   */
268  repeated string path = 3;
269}
270
271/*
272 * The access coordinates for retrieval of a dataset. With a FlightInfo, a
273 * consumer is able to determine how to retrieve a dataset.
274 */
275message FlightInfo {
276  // The schema of the dataset in its IPC form:
277  //   4 bytes - an optional IPC_CONTINUATION_TOKEN prefix
278  //   4 bytes - the byte length of the payload
279  //   a flatbuffer Message whose header is the Schema
280  bytes schema = 1;
281
282  /*
283   * The descriptor associated with this info.
284   */
285  FlightDescriptor flight_descriptor = 2;
286
287  /*
288   * A list of endpoints associated with the flight. To consume the
289   * whole flight, all endpoints (and hence all Tickets) must be
290   * consumed. Endpoints can be consumed in any order.
291   *
292   * In other words, an application can use multiple endpoints to
293   * represent partitioned data.
294   *
295   * If the returned data has an ordering, an application can use
296   * "FlightInfo.ordered = true" or should return the all data in a
297   * single endpoint. Otherwise, there is no ordering defined on
298   * endpoints or the data within.
299   *
300   * A client can read ordered data by reading data from returned
301   * endpoints, in order, from front to back.
302   *
303   * Note that a client may ignore "FlightInfo.ordered = true". If an
304   * ordering is important for an application, an application must
305   * choose one of them:
306   *
307   * * An application requires that all clients must read data in
308   *   returned endpoints order.
309   * * An application must return the all data in a single endpoint.
310   */
311  repeated FlightEndpoint endpoint = 3;
312
313  // Set these to -1 if unknown.
314  int64 total_records = 4;
315  int64 total_bytes = 5;
316
317  /*
318   * FlightEndpoints are in the same order as the data.
319   */
320  bool ordered = 6;
321
322  /*
323   * Application-defined metadata.
324   *
325   * There is no inherent or required relationship between this
326   * and the app_metadata fields in the FlightEndpoints or resulting
327   * FlightData messages. Since this metadata is application-defined,
328   * a given application could define there to be a relationship,
329   * but there is none required by the spec.
330   */
331  bytes app_metadata = 7;
332}
333
334/*
335 * The information to process a long-running query.
336 */
337message PollInfo {
338  /*
339   * The currently available results.
340   *
341   * If "flight_descriptor" is not specified, the query is complete
342   * and "info" specifies all results. Otherwise, "info" contains
343   * partial query results.
344   *
345   * Note that each PollInfo response contains a complete
346   * FlightInfo (not just the delta between the previous and current
347   * FlightInfo).
348   *
349   * Subsequent PollInfo responses may only append new endpoints to
350   * info.
351   *
352   * Clients can begin fetching results via DoGet(Ticket) with the
353   * ticket in the info before the query is
354   * completed. FlightInfo.ordered is also valid.
355   */
356  FlightInfo info = 1;
357
358  /*
359   * The descriptor the client should use on the next try.
360   * If unset, the query is complete.
361   */
362  FlightDescriptor flight_descriptor = 2;
363
364  /*
365   * Query progress. If known, must be in [0.0, 1.0] but need not be
366   * monotonic or nondecreasing. If unknown, do not set.
367   */
368  optional double progress = 3;
369
370  /*
371   * Expiration time for this request. After this passes, the server
372   * might not accept the retry descriptor anymore (and the query may
373   * be cancelled). This may be updated on a call to PollFlightInfo.
374   */
375  google.protobuf.Timestamp expiration_time = 4;
376}
377
378/*
379 * The request of the CancelFlightInfo action.
380 *
381 * The request should be stored in Action.body.
382 */
383message CancelFlightInfoRequest {
384  FlightInfo info = 1;
385}
386
387/*
388 * The result of a cancel operation.
389 *
390 * This is used by CancelFlightInfoResult.status.
391 */
392enum CancelStatus {
393  // The cancellation status is unknown. Servers should avoid using
394  // this value (send a NOT_FOUND error if the requested query is
395  // not known). Clients can retry the request.
396  CANCEL_STATUS_UNSPECIFIED = 0;
397  // The cancellation request is complete. Subsequent requests with
398  // the same payload may return CANCELLED or a NOT_FOUND error.
399  CANCEL_STATUS_CANCELLED = 1;
400  // The cancellation request is in progress. The client may retry
401  // the cancellation request.
402  CANCEL_STATUS_CANCELLING = 2;
403  // The query is not cancellable. The client should not retry the
404  // cancellation request.
405  CANCEL_STATUS_NOT_CANCELLABLE = 3;
406}
407
408/*
409 * The result of the CancelFlightInfo action.
410 *
411 * The result should be stored in Result.body.
412 */
413message CancelFlightInfoResult {
414  CancelStatus status = 1;
415}
416
417/*
418 * An opaque identifier that the service can use to retrieve a particular
419 * portion of a stream.
420 *
421 * Tickets are meant to be single use. It is an error/application-defined
422 * behavior to reuse a ticket.
423 */
424message Ticket {
425  bytes ticket = 1;
426}
427
428/*
429 * A location where a Flight service will accept retrieval of a particular
430 * stream given a ticket.
431 */
432message Location {
433  string uri = 1;
434}
435
436/*
437 * A particular stream or split associated with a flight.
438 */
439message FlightEndpoint {
440
441  /*
442   * Token used to retrieve this stream.
443   */
444  Ticket ticket = 1;
445
446  /*
447   * A list of URIs where this ticket can be redeemed via DoGet().
448   *
449   * If the list is empty, the expectation is that the ticket can only
450   * be redeemed on the current service where the ticket was
451   * generated.
452   *
453   * If the list is not empty, the expectation is that the ticket can be
454   * redeemed at any of the locations, and that the data returned will be
455   * equivalent. In this case, the ticket may only be redeemed at one of the
456   * given locations, and not (necessarily) on the current service. If one
457   * of the given locations is "arrow-flight-reuse-connection://?", the
458   * client may redeem the ticket on the service where the ticket was
459   * generated (i.e., the same as above), in addition to the other
460   * locations. (This URI was chosen to maximize compatibility, as 'scheme:'
461   * or 'scheme://' are not accepted by Java's java.net.URI.)
462   *
463   * In other words, an application can use multiple locations to
464   * represent redundant and/or load balanced services.
465   */
466  repeated Location location = 2;
467
468  /*
469   * Expiration time of this stream. If present, clients may assume
470   * they can retry DoGet requests. Otherwise, it is
471   * application-defined whether DoGet requests may be retried.
472   */
473  google.protobuf.Timestamp expiration_time = 3;
474
475  /*
476   * Application-defined metadata.
477   *
478   * There is no inherent or required relationship between this
479   * and the app_metadata fields in the FlightInfo or resulting
480   * FlightData messages. Since this metadata is application-defined,
481   * a given application could define there to be a relationship,
482   * but there is none required by the spec.
483   */
484  bytes app_metadata = 4;
485}
486
487/*
488 * The request of the RenewFlightEndpoint action.
489 *
490 * The request should be stored in Action.body.
491 */
492message RenewFlightEndpointRequest {
493  FlightEndpoint endpoint = 1;
494}
495
496/*
497 * A batch of Arrow data as part of a stream of batches.
498 */
499message FlightData {
500
501  /*
502   * The descriptor of the data. This is only relevant when a client is
503   * starting a new DoPut stream.
504   */
505  FlightDescriptor flight_descriptor = 1;
506
507  /*
508   * Header for message data as described in Message.fbs::Message.
509   */
510  bytes data_header = 2;
511
512  /*
513   * Application-defined metadata.
514   */
515  bytes app_metadata = 3;
516
517  /*
518   * The actual batch of Arrow data. Preferably handled with minimal-copies
519   * coming last in the definition to help with sidecar patterns (it is
520   * expected that some implementations will fetch this field off the wire
521   * with specialized code to avoid extra memory copies).
522   */
523  bytes data_body = 1000;
524}
525
526/**
527 * The response message associated with the submission of a DoPut.
528 */
529message PutResult {
530  bytes app_metadata = 1;
531}
532
533/*
534 * EXPERIMENTAL: Union of possible value types for a Session Option to be set to.
535 *
536 * By convention, an attempt to set a valueless SessionOptionValue should
537 * attempt to unset or clear the named option value on the server.
538 */
539message SessionOptionValue {
540  message StringListValue {
541    repeated string values = 1;
542  }
543
544  oneof option_value {
545    string string_value = 1;
546    bool bool_value = 2;
547    sfixed64 int64_value = 3;
548    double double_value = 4;
549    StringListValue string_list_value = 5;
550  }
551}
552
553/*
554 * EXPERIMENTAL: A request to set session options for an existing or new (implicit)
555 * server session.
556 *
557 * Sessions are persisted and referenced via a transport-level state management, typically
558 * RFC 6265 HTTP cookies when using an HTTP transport.  The suggested cookie name or state
559 * context key is 'arrow_flight_session_id', although implementations may freely choose their
560 * own name.
561 *
562 * Session creation (if one does not already exist) is implied by this RPC request, however
563 * server implementations may choose to initiate a session that also contains client-provided
564 * session options at any other time, e.g. on authentication, or when any other call is made
565 * and the server wishes to use a session to persist any state (or lack thereof).
566 */
567message SetSessionOptionsRequest {
568  map<string, SessionOptionValue> session_options = 1;
569}
570
571/*
572 * EXPERIMENTAL: The results (individually) of setting a set of session options.
573 *
574 * Option names should only be present in the response if they were not successfully
575 * set on the server; that is, a response without an Error for a name provided in the
576 * SetSessionOptionsRequest implies that the named option value was set successfully.
577 */
578message SetSessionOptionsResult {
579  enum ErrorValue {
580    // Protobuf deserialization fallback value: The status is unknown or unrecognized.
581    // Servers should avoid using this value. The request may be retried by the client.
582    UNSPECIFIED = 0;
583    // The given session option name is invalid.
584    INVALID_NAME = 1;
585    // The session option value or type is invalid.
586    INVALID_VALUE = 2;
587    // The session option cannot be set.
588    ERROR = 3;
589  }
590
591  message Error {
592    ErrorValue value = 1;
593  }
594
595  map<string, Error> errors = 1;
596}
597
598/*
599 * EXPERIMENTAL: A request to access the session options for the current server session.
600 *
601 * The existing session is referenced via a cookie header or similar (see
602 * SetSessionOptionsRequest above); it is an error to make this request with a missing,
603 * invalid, or expired session cookie header or other implementation-defined session
604 * reference token.
605 */
606message GetSessionOptionsRequest {
607}
608
609/*
610 * EXPERIMENTAL: The result containing the current server session options.
611 */
612message GetSessionOptionsResult {
613    map<string, SessionOptionValue> session_options = 1;
614}
615
616/*
617 * Request message for the "Close Session" action.
618 *
619 * The exiting session is referenced via a cookie header.
620 */
621message CloseSessionRequest {
622}
623
624/*
625 * The result of closing a session.
626 */
627message CloseSessionResult {
628  enum Status {
629    // Protobuf deserialization fallback value: The session close status is unknown or
630    // not recognized. Servers should avoid using this value (send a NOT_FOUND error if
631    // the requested session is not known or expired). Clients can retry the request.
632    UNSPECIFIED = 0;
633    // The session close request is complete. Subsequent requests with
634    // the same session produce a NOT_FOUND error.
635    CLOSED = 1;
636    // The session close request is in progress. The client may retry
637    // the close request.
638    CLOSING = 2;
639    // The session is not closeable. The client should not retry the
640    // close request.
641    NOT_CLOSEABLE = 3;
642  }
643
644  Status status = 1;
645}