Table of Contents

Overview

The following are notes by Emil Lenngren on PostgreSQL wire representation of types:

bool:
	text: t or f
	binary: a byte: 1 or 0

bytea:
	text:
		either \x followed by hex-characters (lowercase by default),
		or plain characters, where non-printable characters (between 0x20 and 0x7e, inclusive) are written as \nnn (octal) and \ is written as \\
	binary: the bytes as they are

char:
	This type holds a single char/byte. (Not to be confused with bpchar (blank-padded char) which is PostgreSQL's alias to the SQL standard's char).
	The char may be the null-character
	text: the char as a byte, encoding seems to be ignored
	binary: the char as a byte

name:
	A null-padded string of NAMEDATALEN (currently 64) bytes (the last byte must be a null-character). Used in pg catalog.
	text: the name as a string
	binary: the name as a string

int2/int4/int8:
	text: text representation in base 10
	binary: binary version of the integer

int2vector:
	non-null elements, 0-indexed, 1-dim
	text: 1 2 3 4
	binary: same as int2[]

oidvector:
	non-null elements, 0-indexed, 1-dim
	text: 1 2 3 4
	binary: same as oid[]

regproc:
	internally just an OID (UInt32)
	text: -, name of procedure, or numeric if not found
	binary: only the OID in binary

regprocedure/regoper/regoperator/regclass/regconfig/regdictionary:
	similar to regproc

text:
	text: the string as it is
	binary: the string as it is

oid:
	A 32-bit unsigned integer used for internal object identification.
	text: the text-representation of this integer in base 10
	binary: the UInt32

tid:
	tuple id
	Internally a tuple of a BlockNumber (UInt32) and an OffsetNumber (UInt16)
	text: (blockNumber,offsetNumber)
	binary: the block number in binary followed by offset number in binary

xid:
	transaction id
	Internally just a TransactionId (UInt32)
	text: the number
	binary: the number in binary

cid:
	command id
	Internally just a CommandId (UInt32)
	text: the number
	binary: the number in binary

json:
	json
	text: the json an text
	binary: the json as text

jsonb:
	json internally stored in an efficient binary format
	text: the json as text
	binary: An Int32 (version number, currently 1), followed by data (currently just json as text)

xml:
	Xml. It is probably most efficient to use the text format, especially when receiving from client.
	text: the xml as text (when sent from the server: encoding removed, when receiving: assuming database encoding)
	binary: the xml as text (when sent from the server: in the client's specified encoding, when receiving: figures out itself)

pg_node_tree:
	used as type for the column typdefaultbin in pg_type
	does not accept input
	text: text
	binary: text

smgr:
	storage manager
	can only have the value "magnetic disk"
	text: magnetic disk
	binary: not available

point:
	A tuple of two float8
	text:
		(x,y)
		The floats are interpreted with the C strtod function.
		The floats are written with the snprintf function, with %.*g format. NaN/-Inf/+Inf can be written, but not interpretability depends on platform. The extra_float_digits setting is honored.
		For linux, NaN, [+-]Infinity, [+-]Inf works, but not on Windows. Windows also have other output syntax for these special numbers. (1.#QNAN for example)
	binary: the two floats

lseg:
	A tuple of two points
	text: [(x1,y1),(x2,y2)] see point for details
	binary: the four floats in the order x1, y1, x2, y2

path:
	A boolean whether the path is opened or closed + a vector of points.
	text: [(x1,y1),...] for open path and ((x1,y1),...) for closed paths. See point for details.
	binary: first a byte indicating open (0) or close (1), then the number of points (Int32), then a vector of points

box:
	A tuple of two points. The coordinates will be reordered so that the first is the upper right and the second is the lower left.
	text: (x1,y1),(x2,y2) see point for details
	binary: the four floats in the order x1, y1, x2, y2 (doesn't really matter since they will be reordered)

polygon:
	Same as path but with two differences: is always closed and internally stores the bounding box.
	text: same as closed path
	binary: the number of points (Int32), then a vector of points

line (version 9.4):
	Ax + By + C = 0. Stored with three float8.
	Constraint: A and B must not both be zero (only checked on text input, not binary).
	text: {A,B,C} see point for details about the string representation of floats. Can also use the same input format as a path with two different points, representing the line between those.
	binary: the three floats

circle:
	<(x,y),r> (center point and radius), stored with three float8.
	text: <(x,y),r> see point for details about the string representation of floats.
	binary: the three floats x, y, r in that order

float4/float8:
	text:
		(leading/trailing whitespace is skipped) interpreted with the C strtod function, but since it has problems with NaN, [+-]Infinity, [+-]Inf, those strings are identified (case-insensitively) separately.
		when outputting: NaN, [+-]Infinity is treated separately, otherwise the string is printed with snprintf %.*g and the extra_float_digits setting is honored.
	binary: the float

abstime:
	A unix timestamp stored as a 32-bit signed integer with seconds-precision (seconds since 1970-01-01 00:00:00), in UTC
	Has three special values: Invalid (2^31-1), infinity (2^31-3), -infinity (-2^31)
	text: same format as timestamptz, or "invalid", "infinity", "-infinity"
	binary: Int32
	

reltime:
	A time interval with seconds-precision (stored as an 32-bit signed integer)
	text: same as interval
	binary: Int32

tinterval:
	Consists of a status (Int32) and two abstimes. Status is valid (1) iff both abstimes are valid, else 0.
	Note that the docs incorrectly states that ' is used as quote instead of "
	text: ["<abstime>" "<abstime>"]
	binary: Int32 (status), Int32 (abstime 1), Int32 (abstime 2)

unknown:
	text: text
	binary: text

money:
	A 64-bit signed integer. For example, $123.45 is stored as the integer 12345. Number of fraction digits is locale-dependent.
	text: a locale-depedent string
	binary: the raw 64-bit integer

macaddr:
	6 bytes
	text: the 6 bytes in hex (always two characters per byte) separated by :
	binary: the 6 bytes appearing in the same order as when written in text

inet/cidr:
	Struct of Family (byte: ipv4=2, ipv6=3), Netmask (byte with number of bits in the netmask), Ipaddr bytes (16)
	Text: The IP-address in text format and /netmask. /netmask is omitted in inet if the netmask is the whole address.
	Binary: family byte, netmask byte, byte (cidr=1, inet=0), number of bytes in address, bytes of the address

aclitem:
	Access list item used in pg_class
	Text: Something like postgres=arwdDxt/postgres
	Binary: not available

bpchar:
	Blank-padded char. The type modifier is used to blank-pad the input.
	text: text
	binary: text

varchar:
	Variable-length char. The type modifier is used to check the input's length.
	text: text
	binary: text

date:
	A signed 32-bit integer of a date. 0 = 2000-01-01.
	Infinity: INT_MAX, -Infinity: INT_MIN
	Text: Date only using the specified date style
	Binary: Int32

time:
	A signed 64-bit integer representing microseconds from 00:00:00.000000. (Legacy uses 64-bit float). Negative values are not allowed.
	Max value is 24:00:00.000000.
	text: hh:mm:ss or hh:mm:ss.ffffff where the fraction part is between 1 and 6 digits (trailing zeros are not written)
	binary: the 64-bit integer

timetz:
	A struct of
	Time: A signed 64-bit integer representing microseconds from 00:00:00.000000. (Legacy uses 64-bit float). Negative values are not allowed.
		Max value is 24:00:00.000000.
	Zone: A signed 32-bit integer representing the zone (in seconds). Note that the sign is inverted. So GMT+1h is stored as -1h.
	text: hh:mm:ss or hh:mm:ss.ffffff where the fraction part is between 1 and 6 digits (trailing zeros are not written)
	binary: the 64-bit integer followed by the 32-bit integer

timestamp:
	A signed 64-bit integer representing microseconds from 2000-01-01 00:00:00.000000
	Infinity is LONG_MAX and -Infinity is LONG_MIN
	(Infinity would be 294277-01-09 04:00:54.775807)
	Earliest possible timestamp is 4714-11-24 00:00:00 BC. Even earlier would be possible, but due to internal calculations those are forbidden.
	text: dependent on date style
	binary: the 64-bit integer

timestamptz:
	A signed 64-bit integer representing microseconds from 2000-01-01 00:00:00.000000 UTC. (Time zone is not stored).
	Infinity is LONG_MAX and -Infinity is LONG_MIN
	text: first converted to the time zone in the db settings, then printed according to the date style
	binary: the 64-bit integer

interval:
	A struct of
	Time (Int64): all time units other than days, months and years (microseconds)
	Day (Int32): days, after time for alignment
	Month (Int32): months and years, after time for alignment
	text: Style dependent, but for example: "-11 mons +15435 days -11111111:53:00"
	binary: all fields in the struct

bit/varbit:
	First a signed 32-bit integer containing the number of bits (negative length not allowed). Then all the bits in big end first.
	So a varbit of length 1 has the first (and only) byte set to either 0x80 or 0x00. Last byte is assumed (and is automatically zero-padded in recv) to be zero-padded.
	text:
		when sending from backend: all the bits, written with 1s and 0s.
		when receiving from client: (optionally b or B followed by) all the bits as 1s and 0s, or a x or X followed by hexadecimal digits (upper- or lowercase), big endian first.
	binary: the 32-bit length followed by the bytes containing the bits

numeric:
	A variable-length numeric value, can be negative.
	text: NaN or first - if it is negative, then the digits with . as decimal separator
	binary:
		first a header of 4 16-bit signed integers:
			number of digits in the digits array that follows (can be 0, but not negative),
			weight of the first digit (10000^weight), can be both negative, positive or 0,
			sign: negative=0x4000, positive=0x0000, NaN=0xC000
			dscale: number of digits (in base 10) to print after the decimal separator
		then the array of digits:
			The digits are stored in base 10000, where each digit is a 16-bit integer.
			Trailing zeros are not stored in this array, to save space.
			The digits are stored such that, if written as base 10000, the decimal separator can be inserted between two digits in base 10000,
				i.e. when this is to be printed in base 10, only the first digit in base 10000 can (possibly) be printed with less than 4 characters.
				Note that this does not apply for the digits after the decimal separator; the digits should be printed out in chunks of 4
				characters and then truncated with the given dscale.

refcursor:
	uses the same routines as text

record:
	Describes a tuple. Is also the "base class" for composite types (i.e. it uses the same i/o functions).
	text:
		( followed by a list of comma-separated text-encoded values followed by ).
		Empty element means null.
		Quoted with " and " if necessary. " is escaped with "" and \ is escaped with \\ (this differs from arrays where " is escaped with \").
		Must be quoted if it is an empty string or contains one of "\,() or a space.
	binary:
		First a 32-bit integer with the number of columns, then for each column:
			An OID indicating the type of the column
			The length of the column (32-bit integer), or -1 if null
			The column data encoded as binary

cstring:
	text/binary: all characters are sent without the trailing null-character

void:
	Used for example as return value in SELECT * FROM func_returning_void()
	text: an empty string
	binary: zero bytes

uuid:
	A 16-byte uuid.
	text: group of 8, 4, 4, 4, 12 hexadecimal lower-case characters, separated by -. The first byte is written first. It is allowed to surround it with {}.
	binary: the 16 bytes

txid_snapshot:
	(txid is a UInt64)
	A struct of
	UInt32 nxip (size of the xip array)
	txid xmin (no values in xip is smaller than this)
	txid xmax (no values in xip is larger than or equal this)
	txid[] xip (is ordered in ascending order)
	text: xmin:xmax:1,2,3,4
	binary: all fields in the structure

tsvector:
	Used for text searching. Example of tsvector: 'a':1,6,10 'on':5 'and':8 'ate':9A 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4
	Max length for each lexeme string is 2046 bytes (excluding the trailing null-char)
	The words are sorted when parsed, and only written once. Positions are also sorted and only written once.
	For some reason, the unique check does not seem to be made for binary input, only text input...
	text: As seen above. ' is escaped with '' and \ is escaped with \\.
	binary:
		UInt32 number of lexemes
		for each lexeme:
			lexeme text in client encoding, null-terminated
			UInt16 number of positions
			for each position:
				UInt16 WordEntryPos, where the most significant 2 bits is weight, and the 14 least significant bits is pos (can't be 0). Weights 3,2,1,0 represent A,B,C,D

tsquery:
	A tree with operands and operators (&, |, !). Operands are strings, with optional weight (bitmask of ABCD) and prefix search (yes/no, written with *).
	text: the tree written in infix notation. Example: ( 'abc':*B | 'def' ) & !'ghi'
	binary: the tree written in prefix notation:
		First the number of tokens (a token is an operand or an operator).
		For each token:
			UInt8 type (1 = val, 2 = oper) followed by
			For val: UInt8 weight + UInt8 prefix (1 = yes / 0 = no) + null-terminated string,
			For oper: UInt8 oper (1 = not, 2 = and, 3 = or, 4 = phrase). 
			In case of phrase oper code, an additional UInt16 field is sent (distance value of operator). Default is 1 for <->, otherwise the n value in '<n>'.

enum:
	Simple text

gtsvector:
	GiST for tsvector. Probably internal type.

int4range/numrange/tsrange/tstzrange/daterange/int8range and user-defined range types:
	/* A range's flags byte contains these bits: */
	#define RANGE_EMPTY         0x01    /* range is empty */
	#define RANGE_LB_INC        0x02    /* lower bound is inclusive */
	#define RANGE_UB_INC        0x04    /* upper bound is inclusive */
	#define RANGE_LB_INF        0x08    /* lower bound is -infinity */
	#define RANGE_UB_INF        0x10    /* upper bound is +infinity */
	#define RANGE_LB_NULL       0x20    /* lower bound is null (NOT USED) */
	#define RANGE_UB_NULL       0x40    /* upper bound is null (NOT USED) */
	#define RANGE_CONTAIN_EMPTY 0x80/* marks a GiST internal-page entry whose
	                                 * subtree contains some empty ranges */
	A range has no lower bound if any of RANGE_EMPTY, RANGE_LB_INF (or RANGE_LB_NULL, not used anymore) is set. The same applies for upper bounds.
	text:
		A range with RANGE_EMPTY is just written as the string "empty".
		Inclusive bounds are written with [ and ], else ( and ) is used.
		The two values are comma-separated.
		Missing bounds are written as an empty string (without quotes).
		Each value is quoted with " if necessary. Quotes are necessary if the string is either the empty string or contains "\,()[] or spaces. " is escaped with "" and \ is escaped with \\.
		Example: [18,21]
	binary: First the flag byte. Then, if has lower bound: 32-bit length + binary-encoded data. Then, if has upper bound: 32-bit length + binary-encoded data.

hstore:
	Key/value-store. Both keys and values are strings.
	text:
		Comma-space separated string, where each item is written as "key"=>"value" or "key"=>NULL. " and \ are escaped as \" and \\.
		Example: "a"=>"b", "c"=>NULL, "d"=>"q"
	binary:
		Int32 count
		for each item:
			Int32 keylen
			string of the key (not null-terminated)
			Int32 length of item (or -1 if null)
			the item as a string

ghstore:
	internal type for indexing hstore

domain types:
	mapped types used in information_schema:
		cardinal_number: int4 (must be nonnegative or null)
		character_data: varchar
		sql_identifier: varchar
		time_stamp: timestamptz
		yes_or_no: varchar(3) (must be "YES" or "NO" or null)
	intnotnull: when an int4 is cast to this type, it is checked that the int4 is not null, but it still returns an int4 and not intnotnull...