# Supported arithmetic¶

## Rounding directions¶

Some of the classes of operators presented in the following sections are templated by a rounding direction. This is the direction chosen when converting a real number that cannot be exactly represented in the destination format.

There are eleven directions:

`zr`

- toward zero
`aw`

- away from zero
`dn`

- toward minus infinity (down)
`up`

- toward plus infinity
`od`

- to odd mantissas
`ne`

- to nearest, tie breaking to even mantissas
`no`

- to nearest, tie breaking to odd mantissas
`nz`

- to nearest, tie breaking toward zero
`na`

- to nearest, tie breaking away from zero
`nd`

- to nearest, tie breaking toward minus infinity
`nu`

- to nearest, tie breaking toward plus infinity

The rounding directions mandated by the IEEE-754 standard are `ne`

(default mode, rounding to nearest), `zr`

, `dn`

, `up`

, and `na`

(introduced for decimal arithmetic).

## Floating-point operators¶

This class of operators covers all the formats whose number sets are \(F(p,d) = \{m \cdot 2^e; |m| < 2^p, e \ge d\}\). In particular, IEEE-754 floating-point formats (with subnormal numbers) are part of this class, if we set apart overflow issues. Both parameters p and d select a particular format. The last parameter selects the rounding direction.

```
float< precision, minimum_exponent, rounding_direction >(...)
```

Formats with no minimal exponent (and thus no underflow) are also available:

```
float< precision, rounding_direction >(...)
```

Having to remember the precision and minimum exponent parameters may be
a bit tedious, so an alternate syntax is provided: instead of these two
parameters, a name can be given to the `float`

class.

```
float< name, rounding_direction >(...)
```

There are four predefined formats:

`ieee_32`

- IEEE-754 single precision
`ieee_64`

- IEEE-754 double precision
`ieee_128`

- IEEE-754 quadruple precision
`x86_80`

- extended precision on x86-like processors

## Fixed-point operators¶

This class of operators covers all the formats whose number sets are \(F(e) = \{m \cdot 2^e\}\). The first parameter selects the weight of the least significant bit. The second parameter selects the rounding direction.

```
fixed< lsb_weight, rounding_direction >(...)
```

Rounding to integer is a special case of fixed point rounding of weight 0. A syntactic shortcut is provided.

```
int< rounding_direction >(...)
```

## Miscellaneous operators¶

The following operators are underspecified and therefore not suitable for formal proofs.

### Functions with relative error¶

This set of functions is defined with related theorems on relative error. They can be used to express properties that cannot be directly expressed through unary rounding operators.

```
{add|sub|mul}_rel< precision [, minimum_exponent] >(..., ...)
```

If the minimum exponent is not provided, the bound on the relative error is assumed to be valid on the entire domain. Otherwise the interval \([-2^e,2^e]\) is excluded from the domain.

Talking about the expression `add_rel<20,-60>(a,b)`

is like talking
about a fresh expression `c`

such that ```
not |a + b| <= 1b-60
-> |c -/ (a + b)| <= 1b-20
```

.

### Rounding operators with homogen properties¶

To be written.