Supported arithmetic¶

Rounding directions¶

Some of the classes of operators presented in the following sections are templated by a rounding direction. This is the direction chosen when converting a real number that cannot be exactly represented in the destination format.

There are eleven directions:

zr: toward zero
aw: away from zero
dn: toward minus infinity (down)
up: toward plus infinity
od: to odd mantissas
ne: to nearest, tie breaking to even mantissas
no: to nearest, tie breaking to odd mantissas
nz: to nearest, tie breaking toward zero
na: to nearest, tie breaking away from zero
nd: to nearest, tie breaking toward minus infinity
nu: to nearest, tie breaking toward plus infinity

The rounding directions mandated by the IEEE-754 standard are ne (default mode, rounding to nearest), zr, dn, up, and na (introduced for decimal arithmetic).

Floating-point operators¶

This class of operators covers all the formats whose number sets are \(F(p,d) = \{m \cdot 2^e; |m| < 2^p, e \ge d\}\). In particular, IEEE-754 floating-point formats (with subnormal numbers) are part of this class, if we set apart overflow issues. Both parameters p and d select a particular format. The last parameter selects the rounding direction.

float< precision, minimum_exponent, rounding_direction >(...)

Formats with no minimal exponent (and thus no underflow) are also available:

float< precision, rounding_direction >(...)

Having to remember the precision and minimum exponent parameters may be a bit tedious, so an alternate syntax is provided: instead of these two parameters, a name can be given to the float class.

float< name, rounding_direction >(...)

There are four predefined formats:

ieee_32: IEEE-754 single precision
ieee_64: IEEE-754 double precision
ieee_128: IEEE-754 quadruple precision
x86_80: extended precision on x86-like processors

Fixed-point operators¶

This class of operators covers all the formats whose number sets are \(F(e) = \{m \cdot 2^e\}\). The first parameter selects the weight of the least significant bit. The second parameter selects the rounding direction.

fixed< lsb_weight, rounding_direction >(...)

Rounding to integer is a special case of fixed point rounding of weight 0. A syntactic shortcut is provided.

int< rounding_direction >(...)

Miscellaneous operators¶

The following operators are underspecified and therefore not suitable for formal proofs.

Functions with relative error¶

This set of functions is defined with related theorems on relative error. They can be used to express properties that cannot be directly expressed through unary rounding operators.

{add|sub|mul}_rel< precision [, minimum_exponent] >(..., ...)

If the minimum exponent is not provided, the bound on the relative error is assumed to be valid on the entire domain. Otherwise the interval \([-2^e,2^e]\) is excluded from the domain.

Talking about the expression add_rel<20,-60>(a,b) is like talking about a fresh expression c such that not |a + b| <= 1b-60 -> |c -/ (a + b)| <= 1b-20.

Rounding operators with homogen properties¶

To be written.