Packed Data

Quick Reference

These tables summarize the directives for packing and unpacking.

For Integers

Directive     | Meaning
--------------|---------------------------------------------------------------
C             | 8-bit unsigned (unsigned char)
S             | 16-bit unsigned, native endian (uint16_t)
L             | 32-bit unsigned, native endian (uint32_t)
Q             | 64-bit unsigned, native endian (uint64_t)
J             | pointer width unsigned, native endian (uintptr_t)

c             | 8-bit signed (signed char)
s             | 16-bit signed, native endian (int16_t)
l             | 32-bit signed, native endian (int32_t)
q             | 64-bit signed, native endian (int64_t)
j             | pointer width signed, native endian (intptr_t)

S_ S!         | unsigned short, native endian
I I_ I!       | unsigned int, native endian
L_ L!         | unsigned long, native endian
Q_ Q!         | unsigned long long, native endian
              |   (raises ArgumentError if the platform has no long long type)
J!            | uintptr_t, native endian (same as J)

s_ s!         | signed short, native endian
i i_ i!       | signed int, native endian
l_ l!         | signed long, native endian
q_ q!         | signed long long, native endian
              |   (raises ArgumentError if the platform has no long long type)
j!            | intptr_t, native endian (same as j)

S> s> S!> s!> | each the same as the directive without >, but big endian
L> l> L!> l!> |   S> is the same as n
I!> i!>       |   L> is the same as N
Q> q> Q!> q!> |
J> j> J!> j!> |

S< s< S!< s!< | each the same as the directive without <, but little endian
L< l< L!< l!< |   S< is the same as v
I!< i!<       |   L< is the same as V
Q< q< Q!< q!< |
J< j< J!< j!< |

n             | 16-bit unsigned, network (big-endian) byte order
N             | 32-bit unsigned, network (big-endian) byte order
v             | 16-bit unsigned, VAX (little-endian) byte order
V             | 32-bit unsigned, VAX (little-endian) byte order

U             | UTF-8 character
w             | BER-compressed integer

For Floats

Directive | Meaning
----------|--------------------------------------------------
D d       | double-precision, native format
F f       | single-precision, native format
E         | double-precision, little-endian byte order
e         | single-precision, little-endian byte order
G         | double-precision, network (big-endian) byte order
g         | single-precision, network (big-endian) byte order

For Strings

Directive | Meaning
----------|-----------------------------------------------------------------
A         | arbitrary binary string (remove trailing nulls and ASCII spaces)
a         | arbitrary binary string
Z         | null-terminated string
B         | bit string (MSB first)
b         | bit string (LSB first)
H         | hex string (high nibble first)
h         | hex string (low nibble first)
u         | UU-encoded string
M         | quoted-printable, MIME encoding (see RFC 2045)
m         | base64 encoded string (RFC 2045) (default)
          |   (base64 encoded string (RFC 4648) if followed by 0)
P         | pointer to a structure (fixed-length string)
p         | pointer to a null-terminated string

Other Packing Directives

Directive | Meaning
----------|----------------------------------------------------------------
@         | moves to absolute position
X         | back up a byte
x         | null byte

Other Unpacking Directives

Directive | Meaning
----------|----------------------------------------------------------------
@         | skip to the offset given by the length argument
X         | skip backward one byte
x         | skip forward one byte

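Packing and Unpacking

Certain Ruby core methods deal with packing and unpacking data: Array#pack, String#unpack, and String#unpack1.

Each of these methods accepts a string template consisting of zero or more directive characters, each followed by zero or more modifier characters.

Examples (directive 'C' specifies "unsigned character"):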

[65].pack('C')      # => "A"  # One element, one directive.
[65, 66].pack('CC') # => "AB" # Two elements, two directives.
[65, 66].pack('C')  # => "A"  # Extra element is ignored.
[65].pack('')       # => ""   # No directives.
[65].pack('CC')               # Extra directive raises ArgumentError.

'A'.unpack('C')   # => [65]      # One character, one directive.
'AB'.unpack('CC') # => [65, 66]  # Two characters, two directives.
'AB'.unpack('C')  # => [65]      # Extra character is ignored.
'A'.unpack('CC')  # => [65, nil] # Extra directive generates nil.
'AB'.unpack('')   # => []        # No directives.

The string template may contain any mixture of valid directives (directive 'c' specifies "signed character"):

[65, -1].pack('cC')  # => "A\xFF"
"A\xFF".unpack('cC') # => [65, 255]

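The string template may contain whitespace (which is ignored) and comments, each of which begins with character '#' and continues up to and including the next newline: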

[0,1].pack("  C  #foo \n  C  ")    # => "\x00\x01"
"\0\1".unpack("  C  #foo \n  C  ") # => [0, 1]

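Any directive may be followed by either of these modifiers: '*', which applies the directive as many times as needed, or an integer count, which applies the directive that many times.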
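
As a minimal sketch of the two repeat modifiers, assuming directive 'C' and sample values chosen here for illustration:

```ruby
# '*' applies the directive as many times as needed.
[65, 66].pack('C*') # => "AB"
'AB'.unpack('C*')   # => [65, 66]

# An integer count applies the directive exactly that many times.
[65, 66].pack('C2') # => "AB"
'AB'.unpack('C2')   # => [65, 66]
'AB'.unpack('C3')   # => [65, 66, nil]

# Packing with a count larger than the array raises ArgumentError.
begin
  [65, 66].pack('C3')
rescue ArgumentError => e
  e.class # => ArgumentError
end
```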

If an element does not fit the provided directive, only its least significant bits are encoded:

[257].pack("C").unpack("C") # => [1]

Packing Method

Method Array#pack accepts optional keyword argument buffer, which specifies the target string (instead of a new string):

[65, 66].pack('C*', buffer: 'foo') # => "fooAB"

The method can accept a block:

# Packed string is passed to the block.
[65, 66].pack('C*') {|s| p s }    # => "AB"

Unpacking Methods

Methods String#unpack and String#unpack1 each accept optional keyword argument offset, which specifies an offset into the string:

'ABC'.unpack('C*', offset: 1)  # => [66, 67]
'ABC'.unpack1('C*', offset: 1) # => 66

Both methods can accept a block:

# Each unpacked object is passed to the block.
ret = []
"ABCD".unpack("C*") {|c| ret << c }
ret # => [65, 66, 67, 68]

# The single unpacked object is passed to the block.
'AB'.unpack1('C*') {|ele| p ele } # => 65

Integer Directives

Each integer directive specifies the packing or unpacking for one element in the input or output array.

8-Bit Integer Directives
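
A short sketch of the 8-bit directives 'c' (signed) and 'C' (unsigned) from the quick reference; the sample values are illustrative assumptions:

```ruby
# The same byte 0xFF reads back as -1 when signed and 255 when unsigned.
s = [0, 1, -1].pack('c*') # => "\x00\x01\xFF"
s.unpack('c*')            # => [0, 1, -1]
s.unpack('C*')            # => [0, 1, 255]
```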

16-Bit Integer Directives
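
The following sketch uses the fixed-endian 16-bit directives so that the results do not depend on the host platform; the values are chosen here for illustration only:

```ruby
# 258 is 0x0102, so the byte order is visible in the output.
[1, 258].pack('n*') # => "\x00\x01\x01\x02" (big-endian)
[1, 258].pack('v*') # => "\x01\x00\x02\x01" (little-endian)

# The same two bytes, read as signed and as unsigned big-endian.
bytes = "\xFF\xFE".b
bytes.unpack('s>') # => [-2]
bytes.unpack('S>') # => [65534]

# The native-endian directives 's' and 'S' always occupy two bytes.
[1].pack('S').bytesize # => 2
```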

32-Bit Integer Directives
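
For the 32-bit directives, a small example along the same lines (sample values are assumptions; the all-ones pattern is used for 'l' and 'L' because it is identical in either byte order):

```ruby
[1].pack('N') # => "\x00\x00\x00\x01" (big-endian)
[1].pack('V') # => "\x01\x00\x00\x00" (little-endian)

s = [-1].pack('l') # => "\xFF\xFF\xFF\xFF"
s.unpack('l')      # => [-1]
s.unpack('L')      # => [4294967295]
```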

64-Bit Integer Directives
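
A comparable sketch for the 64-bit directives 'q' and 'Q', again with illustrative values and explicit endian modifiers where the byte order matters:

```ruby
[1].pack('Q>').bytes # => [0, 0, 0, 0, 0, 0, 0, 1]
[1].pack('Q<').bytes # => [1, 0, 0, 0, 0, 0, 0, 0]

s = [-1].pack('q') # eight 0xFF bytes
s.bytesize         # => 8
s.unpack('q')      # => [-1]
s.unpack('Q')      # => [18446744073709551615]
```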

Platform-Dependent Integer Directives
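
Because the width of 'i', 'I', 'j', and 'J' depends on the platform, this sketch measures the sizes at run time instead of assuming them; the variable names are hypothetical:

```ruby
int_size = [0].pack('i').bytesize # size of a C int on this platform
ptr_size = [0].pack('j').bytesize # size of intptr_t on this platform

# Signed and unsigned directives of the same kind share one width.
[0].pack('I').bytesize == int_size # => true
[0].pack('J').bytesize == ptr_size # => true

# -1 round-trips as signed and reads back as the maximum unsigned value.
[-1].pack('i').unpack('i')                              # => [-1]
[-1].pack('i').unpack1('I') == 2**(8 * int_size) - 1    # => true
```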

Other Integer Directives
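
The quick reference lists two further integer directives, 'U' (UTF-8 character) and 'w' (BER-compressed integer). A brief sketch with illustrative values:

```ruby
# 'U' converts between code points and UTF-8 characters.
[65, 8364].pack('U*')    # => "A€"
"A\u20AC".unpack('U*')   # => [65, 8364]

# 'w' stores 7 bits per byte; the high bit marks a continuation byte.
[127, 128].pack('w*')        # => "\x7F\x81\x00"
"\x7F\x81\x00".unpack('w*')  # => [127, 128]
```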

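Modifiers for Integer Directives

For directives 's', 'S', 'i', 'I', 'l', 'L', 'q', 'Q', 'j', and 'J', a '!' or '_' modifier may be appended to select the underlying platform's native size.

Native-size modifiers are silently ignored for directives that are always native size.

The endian modifiers '>' (big-endian) and '<' (little-endian) may also be appended to the directives above.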
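
To illustrate how the modifiers combine, here is a hedged sketch; the size of a native C long varies by platform, so it is measured instead of assumed:

```ruby
[1].pack('l').bytesize              # => 4 (always 32 bits)
long_size = [1].pack('l!').bytesize # native C long on this platform

# '_' and '!' are interchangeable.
[1].pack('l_') == [1].pack('l!') # => true

# 'i' is always native size, so the modifier changes nothing.
[1].pack('i!') == [1].pack('i')  # => true

# Endian modifiers reproduce the fixed-endian directives.
[1].pack('S>') == [1].pack('n')  # => true
[1].pack('L<') == [1].pack('V')  # => true

# Both kinds of modifier can be combined: native long, big-endian.
[1].pack('l!>').bytes.last       # => 1
```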

Float Directives

Each float directive specifies the packing or unpacking for one element in the input or output array.

Single-Precision Float Directives
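
A small sketch of the single-precision directives, using the fixed-endian 'e' and 'g' so the bytes are the same on any host (values are illustrative):

```ruby
# 1.0 as an IEEE 754 single is 0x3F800000.
[1.0].pack('e').bytes # => [0, 0, 128, 63]  (little-endian)
[1.0].pack('g').bytes # => [63, 128, 0, 0]  (big-endian)

# Values exactly representable in 32 bits round-trip unchanged.
[1.5].pack('e').unpack1('e') # => 1.5

# 0.1 is not, so the round trip loses precision.
[0.1].pack('e').unpack1('e') == 0.1 # => false
```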

Double-Precision Float Directives
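
The double-precision directives can be sketched in the same way; Ruby's Float is a double, so the round trip is exact (sample values are assumptions):

```ruby
# 1.0 as an IEEE 754 double is 0x3FF0000000000000.
[1.0].pack('E').bytes # => [0, 0, 0, 0, 0, 0, 240, 63]  (little-endian)
[1.0].pack('G').bytes # => [63, 240, 0, 0, 0, 0, 0, 0]  (big-endian)

[0.1].pack('E').unpack1('E') == 0.1 # => true
[0.1].pack('d').unpack1('d') == 0.1 # => true (native format)
```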

The value for a float directive may be infinity or not-a-number:

inf = 1.0/0.0                  # => Infinity
[inf].pack('f')                # => "\x00\x00\x80\x7F"
"\x00\x00\x80\x7F".unpack('f') # => [Infinity]

nan = inf/inf                  # => NaN
[nan].pack('f')                # => "\x00\x00\xC0\x7F"
"\x00\x00\xC0\x7F".unpack('f') # => [NaN]

String Directives

Each string directive specifies the packing or unpacking for one byte in the input or output string.

Binary String Directives
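
The directives 'a', 'A', and 'Z' differ mainly in padding and in how trailing bytes are handled. A minimal sketch with an illustrative string:

```ruby
# With a count, the string is padded (or truncated) to that width.
['foo'].pack('a4') # => "foo\x00" (null-padded)
['foo'].pack('A4') # => "foo "    (space-padded)
['foo'].pack('a2') # => "fo"      (truncated)

# 'Z*' appends a terminating null; 'a*' does not.
['foo'].pack('Z*') # => "foo\x00"
['foo'].pack('a*') # => "foo"

# Unpacking: 'A' strips trailing spaces and nulls, 'Z' stops at the first null.
"foo  ".unpack('A*')    # => ["foo"]
"foo\0bar".unpack('Z*') # => ["foo"]
```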

Bit String Directives
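
For the bit-string directives, this sketch shows how 'B' (MSB first) and 'b' (LSB first) read the same characters in opposite bit order; the inputs are illustrative:

```ruby
['00000001'].pack('B*').bytes # => [1]   (last character is the lowest bit)
['00000001'].pack('b*').bytes # => [128] (last character is the highest bit)

# "A" is 0x41, binary 01000001.
'A'.unpack('B*') # => ["01000001"]
'A'.unpack('b*') # => ["10000010"]
```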

Hex String Directives
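
The hex-string directives work on nibbles in the same way. A short sketch, assuming the ASCII string "abc" (bytes 0x61, 0x62, 0x63) as sample data:

```ruby
'abc'.unpack('H*') # => ["616263"] (high nibble first)
'abc'.unpack('h*') # => ["162636"] (low nibble first)

['616263'].pack('H*') # => "abc"
['162636'].pack('h*') # => "abc"
```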

Pointer String Directives

Other String Directives
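
The remaining string directives from the quick reference are 'm' (base64), 'M' (quoted-printable), and 'u' (UU-encoding). An illustrative sketch; the UU-encoded text is not shown, only the round trip:

```ruby
# 'm' appends a newline (RFC 2045 style); 'm0' does not (RFC 4648).
['hello'].pack('m')     # => "aGVsbG8=\n"
['hello'].pack('m0')    # => "aGVsbG8="
'aGVsbG8='.unpack1('m') # => "hello"

# 'M' escapes '=' as "=3D" and ends the output with a soft line break.
['a=b'].pack('M')       # => "a=3Db=\n"
"a=3Db=\n".unpack1('M') # => "a=b"

# 'u' round-trips through UU-encoding.
['hello'].pack('u').unpack1('u') # => "hello"
```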

Offset Directives
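
As a sketch of the offset directives '@', 'x', and 'X' described in the quick reference (the byte values are illustrative assumptions):

```ruby
# '@' moves to an absolute position; when packing, the gap is null-filled.
[1, 2].pack('C@3C').bytes          # => [1, 0, 0, 2]
"\x01\x00\x00\x02".unpack('C@3C')  # => [1, 2]

# 'x' writes a null byte when packing and skips one byte when unpacking.
[1, 2].pack('CxC').bytes     # => [1, 0, 2]
"\x01\x00\x02".unpack('CxC') # => [1, 2]

# 'X' backs up one byte: packing drops the last byte, unpacking rereads it.
[1, 2].pack('CCX').bytes  # => [1]
"\x00\x02".unpack('CCXC') # => [0, 2, 2]
```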