Create Arrow data types: data-type • Arrow R package

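These functions create type objects corresponding to Arrow types. Use them when defining a schema() or as inputs to other types, such as struct. Most of these functions take no arguments, but a few do.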
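As a minimal sketch of how these type objects are typically combined, the following builds a schema that mixes simple and nested types. The field names (id, name, created, tags, location) are hypothetical and chosen only for illustration.

```r
library(arrow)

# Hypothetical field names; each value is an Arrow type object.
sch <- schema(
  id = int64(),
  name = utf8(),
  created = timestamp("ms", timezone = "UTC"),
  tags = list_of(utf8()),
  location = struct(lat = float64(), lon = float64())
)

# Inspect the field names and the type of one field.
sch$names
sch$GetFieldByName("tags")$type
```

Nested types such as list_of() and struct() take other type objects as input, which is why the constructors compose this way.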

Usage

int8()

int16()

int32()

int64()

uint8()

uint16()

uint32()

uint64()

float16()

halffloat()

float32()

float()

float64()

boolean()

bool()

utf8()

large_utf8()

binary()

large_binary()

fixed_size_binary(byte_width)

string()

date32()

date64()

time32(unit = c("ms", "s"))

time64(unit = c("ns", "us"))

duration(unit = c("s", "ms", "us", "ns"))

null()

timestamp(unit = c("s", "ms", "us", "ns"), timezone = "")

decimal(precision, scale)

decimal128(precision, scale)

decimal256(precision, scale)

struct(...)

list_of(type)

large_list_of(type)

fixed_size_list_of(type, list_size)

map_of(key_type, item_type, .keys_sorted = FALSE)

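Arguments

byte_width: Byte width for the FixedSizeBinary type.
unit: For time/timestamp types, the time unit. time32() accepts "s" or "ms", while time64() accepts "us" or "ns". timestamp() and duration() accept any of those four values.
timezone: For timestamp(), an optional time zone string.
precision: For decimal(), decimal128(), and decimal256(), the number of significant digits the Arrow decimal type can represent. The maximum precision is 38 significant digits for decimal128() and 76 for decimal256(). decimal() uses it to choose which decimal type to return.
scale: For decimal(), decimal128(), and decimal256(), the number of digits after the decimal point. It may be negative.
...: For struct(), a named list of types that defines the struct fields.
type: For list_of(), large_list_of(), and fixed_size_list_of(), the data type of the list elements.
list_size: List size for the FixedSizeList type.
key_type, item_type: For MapType, the key and item types.
.keys_sorted: Use TRUE to assert that the keys of a MapType are sorted.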
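The constructors that do take arguments can be called as in the sketch below. The specific values (a 16-byte width, a list size of 3, and so on) are arbitrary choices for illustration, not recommendations.

```r
library(arrow)

# Arbitrary illustrative values for each parameterized type.
fsb <- fixed_size_binary(byte_width = 16)            # 16 bytes per value
t32 <- time32("s")                                   # time of day, second resolution
ts  <- timestamp("us", timezone = "UTC")             # timestamp with a time zone
fsl <- fixed_size_list_of(float32(), list_size = 3)  # exactly 3 floats per element
m   <- map_of(utf8(), int32(), .keys_sorted = TRUE)  # string keys, int32 items

# Print each type's string representation.
for (type in list(fsb, t32, ts, fsl, m)) {
  print(type$ToString())
}
```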

Value

An Arrow type object inheriting from DataType.

Details

A few functions have aliases:

utf8() and string()
float16() and halffloat()
float32() and float()
bool() and boolean()
When called inside an arrow function, such as schema() or cast(), double() is also supported as a way of creating a float64()
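One way to confirm that the aliases above produce the same type is to compare them with the type object's Equals() method, as in this short sketch. The struct field name b is hypothetical.

```r
library(arrow)

# Each pair of aliases should compare as equal types.
alias_checks <- c(
  utf8()$Equals(string()),
  float16()$Equals(halffloat()),
  float32()$Equals(float()),
  bool()$Equals(boolean())
)
alias_checks

# Inside an arrow function, base R's double() is treated as float64().
double_as_float64 <- struct(b = double())$Equals(struct(b = float64()))
double_as_float64
```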

date32() creates a datetime type with a "day" unit, like the R Date class. date64() has an "ms" unit.

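The uint32 (32-bit unsigned integer), uint64 (64-bit unsigned integer), and int64 (64-bit signed integer) types may contain values that exceed the range of R's integer type (32-bit signed integer). When these Arrow objects are converted to R objects, uint32 and uint64 are converted to double ("numeric") and int64 is converted to bit64::integer64. For int64, values that all fit in R's integer range are downcast to integer by default; set options(arrow.int64_downcast = FALSE) to disable this downcast so that int64 always yields a bit64::integer64 object.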
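The effect of the arrow.int64_downcast option can be seen by converting the same small int64 array to R twice. This is a sketch that assumes the bit64 package is installed, and it sets the option explicitly in both cases so the result does not depend on the session's current setting.

```r
library(arrow)

# A small int64 array whose values all fit in R's 32-bit integer range.
arr <- Array$create(1:3)$cast(int64())

# With downcasting enabled, the values come back as plain R integers.
old <- options(arrow.int64_downcast = TRUE)
downcast <- arr$as_vector()

# With downcasting disabled, the result is a bit64::integer64 vector.
options(arrow.int64_downcast = FALSE)
kept <- arr$as_vector()

# Restore the previous option value.
options(old)

class(downcast)
class(kept)
```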

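decimal128() creates a Decimal128Type. Arrow decimals are fixed-point decimal numbers encoded as a scaled integer. The precision is the number of significant digits that the decimal type can represent; the scale is the number of digits after the decimal point. For example, the number 1234.567 has a precision of 7 and a scale of 3. Note that scale can be negative.

As an example, decimal128(7, 3) can exactly represent the numbers 1234.567 and -1234.567 (encoded internally as the 128-bit integers 1234567 and -1234567, respectively), but neither 12345.67 nor 123.4567.

decimal128(5, -3) can exactly represent the number 12345000 (encoded internally as the 128-bit integer 12345), but neither 123450000 nor 1234500. The scale can be thought of as an argument that controls rounding. When the scale is negative, the stored integer is multiplied by a power of 10, as in scientific notation.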
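The relationship between the stored integer and the represented value can be written as value = unscaled × 10^(-scale). The sketch below checks the two examples above; decimal_value() is a hypothetical helper defined here for illustration and is not part of arrow.

```r
library(arrow)

# Hypothetical helper (not an arrow function): recover the represented value
# from the stored (unscaled) integer and the type's scale.
decimal_value <- function(unscaled, scale) unscaled * 10^(-scale)

t1 <- decimal128(7, 3)
v1 <- decimal_value(1234567, t1$scale)   # scale 3: three digits after the point

t2 <- decimal128(5, -3)
v2 <- decimal_value(12345, t2$scale)     # scale -3: multiply by 10^3

v1
v2
```

The helper uses ordinary double arithmetic, so it only illustrates the encoding; Arrow itself stores the unscaled integer exactly.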

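decimal256() creates a Decimal256Type, which allows for a higher maximum precision. For most use cases, the maximum precision offered by Decimal128Type is sufficient, and it results in a more compact and more efficient encoding.

decimal() creates either a Decimal128Type or a Decimal256Type depending on the value of precision. If precision is greater than 38, a Decimal256Type is returned; otherwise a Decimal128Type is returned.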
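The threshold described above can be checked by inspecting the class of the returned type object on either side of a precision of 38. This is a small sketch; the scale of 2 is an arbitrary choice.

```r
library(arrow)

# Precision at the Decimal128 limit versus one digit beyond it.
small <- decimal(38, 2)
large <- decimal(39, 2)

class(small)[1]
class(large)[1]
```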

Prefer decimal128() or decimal256(), as these names are more informative than decimal().

See also

dictionary() for creating dictionary (factor-like) types.

Examples

bool()
#> Boolean
#> bool
struct(a = int32(), b = double())
#> StructType
#> struct<a: int32, b: double>
timestamp("ms", timezone = "CEST")
#> Timestamp
#> timestamp[ms, tz=CEST]
time64("ns")
#> Time64
#> time64[ns]

# Use the cast method to change the type of data contained in Arrow objects.
# Please check the documentation of each data object class for details.
my_scalar <- Scalar$create(0L, type = int64()) # int64
my_scalar$cast(timestamp("ns")) # timestamp[ns]
#> Scalar
#> 1970-01-01 00:00:00.000000000

my_array <- Array$create(0L, type = int64()) # int64
my_array$cast(timestamp("s", timezone = "UTC")) # timestamp[s, tz=UTC]
#> Array
#> <timestamp[s, tz=UTC]>
#> [
#>   1970-01-01 00:00:00Z
#> ]

my_chunked_array <- chunked_array(0L, 1L) # int32
my_chunked_array$cast(date32()) # date32[day]
#> ChunkedArray
#> <date32[day]>
#> [
#>   [
#>     1970-01-01
#>   ],
#>   [
#>     1970-01-02
#>   ]
#> ]

# You can also use `cast()` in an Arrow dplyr query.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr, warn.conflicts = FALSE)
  arrow_table(mtcars) %>%
    transmute(
      col1 = cast(cyl, string()),
      col2 = cast(cyl, int8())
    ) %>%
    compute()
}
#> Table
#> 32 rows x 2 columns
#> $col1 <string>
#> $col2 <int8>
#> 
#> See $metadata for additional Schema metadata