GMP floating point numbers are stored in objects of type mpf_t and functions operating on them have an mpf_ prefix.
The mantissa of each float has a user-selectable precision, in practice limited only by available memory. Each variable has its own precision, and that can be increased or decreased at any time. This selectable precision is a minimum value; GMP rounds it up to a whole limb.
The accuracy of a calculation is determined by the previously set precision of the destination variable and the numeric values of the input variables. The set precisions of the input variables do not affect calculations (except indirectly, as their values might have been affected when they were assigned).
The exponent of each float has fixed precision, one machine word on most systems. In the current implementation the exponent is a count of limbs, so for example on a 32-bit system this means a range of roughly 2^-68719476768 to 2^68719476736, or on a 64-bit system this will be much greater. Note however that mpf_get_str can only return an exponent which fits an mp_exp_t and currently mpf_set_str doesn’t accept exponents bigger than a long.
Each variable keeps track of the mantissa data actually in use. This means that if a float is exactly represented in only a few bits then only those bits will be used in a calculation, even if the variable’s selected precision is high. This is a performance optimization; it does not affect the numeric results.
Internally, GMP sometimes calculates with higher precision than that of the destination variable in order to limit errors. Final results are always truncated to the destination variable’s precision.
The mantissa is stored in binary. One consequence of this is that decimal fractions like 0.1 cannot be represented exactly. The same is true of plain IEEE double floats. This makes both highly unsuitable for calculations involving money or other values that should be exact decimal fractions. (Suitably scaled integers, or perhaps rationals, are better choices.)
The mpf functions and variables have no special notion of infinity or not-a-number, and applications must take care not to overflow the exponent or results will be unpredictable.
Note that the mpf functions are not intended as a smooth extension to IEEE P754 arithmetic. In particular results obtained on one computer often differ from the results on a computer with a different word size.
New projects should consider using the GMP extension library MPFR (http://mpfr.org) instead. MPFR provides well-defined precision and accurate rounding, and thereby naturally extends IEEE P754.
Set the default precision to be at least prec bits. All subsequent calls to mpf_init will use this precision, but previously initialized variables are unaffected.
Return the default precision actually used.
An mpf_t object must be initialized before storing the first value in it. The functions mpf_init and mpf_init2 are used for that purpose.
Initialize x to 0. Normally, a variable should be initialized once only or at least be cleared, using mpf_clear, between initializations. The precision of x is undefined unless a default precision has already been established by a call to mpf_set_default_prec.
Initialize x to 0 and set its precision to be at least prec bits. Normally, a variable should be initialized once only or at least be cleared, using mpf_clear, between initializations.
Initialize a NULL-terminated list of mpf_t variables, and set their values to 0. The precision of the initialized variables is undefined unless a default precision has already been established by a call to mpf_set_default_prec.
Free the space occupied by x. Make sure to call this function for all mpf_t variables when you are done with them.
Free the space occupied by a NULL-terminated list of mpf_t variables.
Here is an example of how to initialize floating-point variables:
    {
      mpf_t x, y;
      mpf_init (x);           /* use default precision */
      mpf_init2 (y, 256);     /* precision at least 256 bits */
      …
      /* Unless the program is about to exit, do ... */
      mpf_clear (x);
      mpf_clear (y);
    }
The following three functions are useful for changing the precision during a calculation. A typical use would be for adjusting the precision gradually in iterative algorithms like Newton-Raphson, making the computation precision closely match the actual accurate part of the numbers.
Return the current precision of op, in bits.
Set the precision of rop to be at least prec bits. The value in rop will be truncated to the new precision.
This function requires a call to realloc, and so should not be used in a tight loop.
Set the precision of rop to be at least prec bits, without changing the memory allocated.
prec must be no more than the allocated precision for rop, that being the precision when rop was initialized, or in the most recent mpf_set_prec.
The value in rop is unchanged, and in particular if it had a higher precision than prec it will retain that higher precision. New values written to rop will use the new prec.
Before calling mpf_clear or the full mpf_set_prec, another mpf_set_prec_raw call must be made to restore rop to its original allocated precision. Failing to do so will have unpredictable results.
mpf_get_prec can be used before mpf_set_prec_raw to get the original allocated precision. After mpf_set_prec_raw it reflects the prec value set.
mpf_set_prec_raw is an efficient way to use an mpf_t variable at different precisions during a calculation, perhaps to gradually increase precision in an iteration, or just to use various different precisions for different purposes during a calculation.
These functions assign new values to already initialized floats (see Initializing Floats).
Set the value of rop from op.
Set the value of rop from the string in str. The string is of the form ‘M@N’ or, if the base is 10 or less, alternatively ‘MeN’. ‘M’ is the mantissa and ‘N’ is the exponent. The mantissa is always in the specified base. The exponent is either in the specified base or, if base is negative, in decimal. The decimal point expected is taken from the current locale, on systems providing localeconv.
The argument base may be in the ranges 2 to 62, or -62 to -2. Negative values are used to specify that the exponent is in decimal.
For bases up to 36, case is ignored; upper-case and lower-case letters have the same value. For bases 37 to 62, upper-case letters represent the usual 10..35 while lower-case letters represent 36..61.
Unlike the corresponding mpz function, the base will not be determined from the leading characters of the string if base is 0. This is so that numbers like ‘0.23’ are not interpreted as octal.
White space is allowed in the string, and is simply ignored. [This is not really true; white-space is ignored in the beginning of the string and within the mantissa, but not in other places, such as after a minus sign or in the exponent. We are considering changing the definition of this function, making it fail when there is any white-space in the input, since that makes a lot of sense. Please tell us your opinion about this change. Do you really want it to accept "3 14" as meaning 314 as it does now?]
This function returns 0 if the entire string is a valid number in base base. Otherwise it returns -1.
Swap rop1 and rop2 efficiently. Both the values and the precisions of the two variables are swapped.
For convenience, GMP provides a parallel series of initialize-and-set functions which initialize the output and then store the value there. These functions’ names have the form mpf_init_set…
Once the float has been initialized by any of the mpf_init_set… functions, it can be used as the source or destination operand for the ordinary float functions. Don’t use an initialize-and-set function on a variable already initialized!
Initialize rop and set its value from op.
The precision of rop will be taken from the active default precision, as set by mpf_set_default_prec.
Initialize rop and set its value from the string in str. See mpf_set_str above for details on the assignment operation.
Note that rop is initialized even if an error occurs. (I.e., you have to call mpf_clear for it.)
The precision of rop will be taken from the active default precision, as set by mpf_set_default_prec.
Convert op to a double, truncating if necessary (i.e. rounding towards zero).
If the exponent in op is too big or too small to fit a double then the result is system dependent. If it is too big, an infinity is returned when available; if it is too small, 0.0 is normally returned. Hardware overflow, underflow and denorm traps may or may not occur.
Convert op to a double, truncating if necessary (i.e. rounding towards zero), and with an exponent returned separately.
The return value is in the range 0.5<=abs(d)<1 and the exponent is stored to *exp. d * 2^exp is the (truncated) op value. If op is zero, the return is 0.0 and 0 is stored to *exp.
This is similar to the standard C frexp function (see Normalization Functions in The GNU C Library Reference Manual).
Convert op to a long or unsigned long, truncating any fraction part. If op is too big for the return type, the result is undefined.
See also mpf_fits_slong_p and mpf_fits_ulong_p (see Miscellaneous Float Functions).
Convert op to a string of digits in base base. The base argument may vary from 2 to 62 or from -2 to -36. Up to n_digits digits will be generated. Trailing zeros are not returned. No more digits than can be accurately represented by op are ever generated. If n_digits is 0 then that accurate maximum number of digits is generated.
For base in the range 2..36, digits and lower-case letters are used; for -2..-36, digits and upper-case letters are used; for 37..62, digits, upper-case letters, and lower-case letters (in that significance order) are used.
If str is NULL, the result string is allocated using the current allocation function (see Custom Allocation). The block will be strlen(str)+1 bytes, that being exactly enough for the string and null-terminator.
If str is not NULL, it should point to a block of n_digits + 2 bytes, that being enough for the mantissa, a possible minus sign, and a null-terminator. When n_digits is 0 to get all significant digits, an application won’t be able to know the space required, and str should be NULL in that case.
The generated string is a fraction, with an implicit radix point immediately to the left of the first digit. The applicable exponent is written through the expptr pointer. For example, the number 3.1416 would be returned as string "31416" and exponent 1.
When op is zero, an empty string is produced and the exponent returned is 0.
A pointer to the result string is returned, being either the allocated block or the given str.
Set rop to op1 + op2.
Set rop to op1 - op2.
Set rop to op1 times op2.
Division is undefined if the divisor is zero, and passing a zero divisor to the divide functions will make these functions intentionally divide by zero. This lets the user handle arithmetic exceptions in these functions in the same manner as other arithmetic exceptions.
Set rop to op1/op2.
Set rop to the square root of op.
Set rop to op1 raised to the power op2.
Set rop to -op.
Set rop to the absolute value of op.
Set rop to op1 times 2 raised to op2.
Set rop to op1 divided by 2 raised to op2.
Compare op1 and op2. Return a positive value if op1 > op2, zero if op1 = op2, and a negative value if op1 < op2.
mpf_cmp_d can be called with an infinity, but results are undefined for a NaN.
This function is mathematically ill-defined and should not be used.
Return non-zero if the first op3 bits of op1 and op2 are equal, zero otherwise. Note that numbers like e.g., 256 (binary 100000000) and 255 (binary 11111111) will never be equal by this function’s measure, and furthermore that 0 will only be equal to itself.
Compute the relative difference between op1 and op2 and store the result in rop. This is abs(op1-op2)/op1.
Return +1 if op > 0, 0 if op = 0, and -1 if op < 0.
This function is actually implemented as a macro. It evaluates its argument multiple times.
The functions in this section read mpf numbers from, and write them to, stdio streams. Passing a NULL pointer as the stream argument makes the input functions read from stdin and the output functions write to stdout.
When using any of these functions, it is a good idea to include stdio.h before gmp.h, since that will allow gmp.h to define prototypes for these functions.
See also Formatted Output and Formatted Input.
Print op to stream, as a string of digits. Return the number of bytes written, or if an error occurred, return 0.
The mantissa is prefixed with a ‘0.’ and is in the given base, which may vary from 2 to 62 or from -2 to -36. An exponent is then printed, separated by an ‘e’, or if the base is greater than 10 then by an ‘@’. The exponent is always in decimal. The decimal point follows the current locale, on systems providing localeconv.
For base in the range 2..36, digits and lower-case letters are used; for -2..-36, digits and upper-case letters are used; for 37..62, digits, upper-case letters, and lower-case letters (in that significance order) are used.
Up to n_digits digits will be printed from the mantissa, except that no more digits than are accurately representable by op will be printed. n_digits can be 0 to select that accurate maximum.
Read a string in base base from stream, and put the read float in rop. The string is of the form ‘M@N’ or, if the base is 10 or less, alternatively ‘MeN’. ‘M’ is the mantissa and ‘N’ is the exponent. The mantissa is always in the specified base. The exponent is either in the specified base or, if base is negative, in decimal. The decimal point expected is taken from the current locale, on systems providing localeconv.
The argument base may be in the ranges 2 to 36, or -36 to -2. Negative values are used to specify that the exponent is in decimal.
Unlike the corresponding mpz function, the base will not be determined from the leading characters of the string if base is 0. This is so that numbers like ‘0.23’ are not interpreted as octal.
Return the number of bytes read, or if an error occurred, return 0.
Set rop to op rounded to an integer. mpf_ceil rounds to the next higher integer, mpf_floor to the next lower, and mpf_trunc to the integer towards zero.
Return non-zero if op is an integer.
Return non-zero if op would fit in the respective C data type, when truncated to an integer.
Generate a uniformly distributed random float in rop, such that 0 <= rop < 1, with nbits significant bits in the mantissa or less if the precision of rop is smaller.
The variable state must be initialized by calling one of the gmp_randinit functions (Random State Initialization) before invoking this function.
Generate a random float of at most max_size limbs, with long strings of zeros and ones in the binary representation. The exponent of the number is in the interval -exp to exp (in limbs). This function is useful for testing functions and algorithms, since this kind of random number has proven to be more likely to trigger corner-case bugs. Negative random numbers are generated when max_size is negative.