Supported arithmetic ==================== Rounding directions ------------------- Some of the classes of operators presented in the following sections are templated by a rounding direction. This is the direction chosen when converting a real number that cannot be exactly represented in the destination format. There are eleven directions: ``zr`` toward zero ``aw`` away from zero ``dn`` toward minus infinity (down) ``up`` toward plus infinity ``od`` to odd mantissas ``ne`` to nearest, tie breaking to even mantissas ``no`` to nearest, tie breaking to odd mantissas ``nz`` to nearest, tie breaking toward zero ``na`` to nearest, tie breaking away from zero ``nd`` to nearest, tie breaking toward minus infinity ``nu`` to nearest, tie breaking toward plus infinity The rounding directions mandated by the IEEE-754 standard are ``ne`` (default mode, rounding to nearest), ``zr``, ``dn``, ``up``, and ``na`` (introduced for decimal arithmetic). Floating-point operators ------------------------ This class of operators covers all the formats whose number sets are :math:`F(p,d) = \{m \cdot 2^e; |m| < 2^p, e \ge d\}`. In particular, IEEE-754 floating-point formats (with subnormal numbers) are part of this class, if we set apart overflow issues. Both parameters p and d select a particular format. The last parameter selects the rounding direction. :: float< precision, minimum_exponent, rounding_direction >(...) Formats with no minimal exponent (and thus no underflow) are also available: :: float< precision, rounding_direction >(...) Having to remember the precision and minimum exponent parameters may be a bit tedious, so an alternate syntax is provided: instead of these two parameters, a name can be given to the ``float`` class. :: float< name, rounding_direction >(...) There are four predefined formats: ``ieee_32`` IEEE-754 single precision ``ieee_64`` IEEE-754 double precision ``ieee_128`` IEEE-754 quadruple precision ``x86_80`` extended precision on x86-like processors Fixed-point operators --------------------- This class of operators covers all the formats whose number sets are :math:`F(e) = \{m \cdot 2^e\}`. The first parameter selects the weight of the least significant bit. The second parameter selects the rounding direction. :: fixed< lsb_weight, rounding_direction >(...) Rounding to integer is a special case of fixed point rounding of weight 0. A syntactic shortcut is provided. :: int< rounding_direction >(...) Miscellaneous operators ----------------------- The following operators are underspecified and therefore not suitable for formal proofs. Functions with relative error ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This set of functions is defined with related theorems on relative error. They can be used to express properties that cannot be directly expressed through unary rounding operators. :: {add|sub|mul}_rel< precision [, minimum_exponent] >(..., ...) If the minimum exponent is not provided, the bound on the relative error is assumed to be valid on the entire domain. Otherwise the interval :math:`[-2^e,2^e]` is excluded from the domain. Talking about the expression ``add_rel<20,-60>(a,b)`` is like talking about a fresh expression ``c`` such that ``not |a + b| <= 1b-60 -> |c -/ (a + b)| <= 1b-20``. Rounding operators with homogen properties ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To be written.