The main differences between the Java floating-point types float and double



To understand the value ranges and computational precision of float and double, we first need to understand how decimal numbers are stored in a computer.

For example, 78.375 is a positive decimal number. To store it in a computer, it must be represented as a floating-point number, and the first step is to convert it to binary.

PS: The binary point works differently from the decimal point. Digits after the binary point stand for negative powers of 2, while digits after the decimal point stand for negative powers of 10. For example, binary 0.011 = 0×2^-1 + 1×2^-2 + 1×2^-3 = 0.25 + 0.125 = 0.375.

1 Binary Conversion of Decimals (Floating Point Numbers)

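The integer part of 78.375 is 78. Dividing it by 2 repeatedly and reading the remainders from last to first gives 1001110 (64 + 8 + 4 + 2 = 78).

The fractional part is 0.375. Multiplying it by 2 repeatedly and taking the integer part of each result in order gives .011 (0.25 + 0.125 = 0.375).

So the binary form of 78.375 is 1001110.011.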
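The two hand-conversion procedures can be sketched in Java as follows. This is a minimal illustration, not a library routine: the class and method names are hypothetical, and the `maxBits` cap is an assumption added because many fractions (such as 0.1) never terminate in binary.

```java
// Hand conversion of a decimal number to binary, as described above:
// integer part by repeated division by 2, fractional part by repeated multiplication by 2.
class BinaryConversion {

    // Integer part: divide by 2 repeatedly; the remainders, read in reverse, are the bits.
    static String integerToBinary(long n) {
        if (n == 0) return "0";
        StringBuilder bits = new StringBuilder();
        while (n > 0) {
            bits.append(n % 2);
            n /= 2;
        }
        return bits.reverse().toString();
    }

    // Fractional part: multiply by 2 repeatedly; each integer part that appears is the next bit.
    // maxBits stops fractions (like 0.1) that never terminate in binary.
    static String fractionToBinary(double frac, int maxBits) {
        StringBuilder bits = new StringBuilder();
        while (frac > 0 && bits.length() < maxBits) {
            frac *= 2;
            if (frac >= 1) {
                bits.append('1');
                frac -= 1;
            } else {
                bits.append('0');
            }
        }
        return bits.length() == 0 ? "0" : bits.toString();
    }

    public static void main(String[] args) {
        double x = 78.375;
        long intPart = (long) x;        // 78
        double fracPart = x - intPart;  // 0.375, exact because 0.375 = 3/8
        System.out.println(integerToBinary(intPart) + "." + fractionToBinary(fracPart, 32));
        // prints 1001110.011
    }
}
```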

Then, writing it in binary scientific notation gives $1001110.011 = 1.001110011 \times 2^{6}$.
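The normalization step can be checked with standard Java calls: `Math.getExponent` returns the unbiased power of two, and `Math.scalb` divides by that power exactly. The sketch below assumes a normal, nonzero, finite input; the class and method names are illustrative only.

```java
// Rewrites a number in binary scientific notation: significand in [1, 2) times 2^e.
// Valid for normal, nonzero, finite values only.
class Normalize {

    // Unbiased exponent e with 2^e <= |x| < 2^(e+1); for 78.375 this is 6.
    static int exponent(double x) {
        return Math.getExponent(x);
    }

    // Significand: x divided by 2^e (Math.scalb scales by a power of two exactly).
    static double significand(double x) {
        return Math.scalb(x, -Math.getExponent(x));
    }

    public static void main(String[] args) {
        double x = 78.375;
        System.out.println(x + " = " + significand(x) + " x 2^" + exponent(x));
        // prints 78.375 = 1.224609375 x 2^6
        // 1.224609375 = 1 + 2^-3 + 2^-4 + 2^-5 + 2^-8 + 2^-9, i.e. binary 1.001110011
    }
}
```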

Note that after conversion, the number in binary scientific notation consists of a base (2), an exponent (6), and a fractional part (001110011). A number represented in this form is called a floating-point number.

2 Storage of Floating Point Numbers in Computers

In a computer, this number is stored in floating-point format, which consists of three parts:

The first part stores the sign bit, which distinguishes positive from negative numbers. Here it is 0, indicating a positive number.

The second part stores the exponent, which here is 6 in decimal.

The third part stores the fraction, which here is 001110011. The leading 1 before the binary point is not stored, because in normalized form it is always 1.
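The three stored fields of a real float can be pulled apart with `Float.floatToIntBits` and bit masks. This is a sketch; the class and helper names are hypothetical. Note that the exponent field prints as 133 rather than 6, because Java stores it with the bias described in section 3.

```java
// Splits a float into its stored fields: 1 sign bit, 8 exponent bits, 23 fraction bits.
class FloatFields {

    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;          // top bit
    }

    static int exponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF; // next 8 bits, still biased
    }

    static int fraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;      // low 23 bits
    }

    // Left-pads a binary string with zeros to the given width.
    static String bin(int value, int width) {
        return String.format("%" + width + "s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    public static void main(String[] args) {
        float f = 78.375f;
        System.out.println("sign     = " + sign(f));                                     // 0
        System.out.println("exponent = " + bin(exponent(f), 8) + " (" + exponent(f) + ")"); // 10000101 (133)
        System.out.println("fraction = " + bin(fraction(f), 23));                        // 00111001100000000000000
    }
}
```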

As shown in the figure below:

The float type, for example, occupies 32 bits and uses the single-precision floating-point format:

The sign bit (sign) occupies 1 bit and indicates whether the number is positive or negative.

The exponent bits (exponent) occupy 8 bits and store the exponent.

The fraction bits (fraction) occupy 23 bits and store the fractional part; any unused low-order bits are filled with 0.

The double type occupies 64 bits and uses the double-precision floating-point format:

The sign bit occupies 1 bit, the exponent bit occupies 11 bits, and the fraction bit occupies 52 bits.
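The same field extraction works for double with `Double.doubleToLongBits` and 64-bit masks. This sketch uses hypothetical names; the exponent field of 78.375 reads 1029 because double stores its exponent with a bias of 1023 (6 + 1023), the double counterpart of the float bias covered in section 3.

```java
// Splits a double into its stored fields: 1 sign bit, 11 exponent bits, 52 fraction bits.
class DoubleFields {

    static long sign(double d) {
        return Double.doubleToLongBits(d) >>> 63;
    }

    static long exponent(double d) {
        return (Double.doubleToLongBits(d) >>> 52) & 0x7FFL;       // 11 bits, still biased
    }

    static long fraction(double d) {
        return Double.doubleToLongBits(d) & 0xFFFFFFFFFFFFFL;      // low 52 bits
    }

    // Left-pads a binary string with zeros to the given width.
    static String bin(long value, int width) {
        return String.format("%" + width + "s", Long.toBinaryString(value)).replace(' ', '0');
    }

    public static void main(String[] args) {
        double d = 78.375;
        System.out.println("sign     = " + sign(d));       // 0
        System.out.println("exponent = " + exponent(d));   // 1029
        System.out.println("fraction = " + bin(fraction(d), 52)); // 001110011 followed by 43 zeros
    }
}
```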

From this layout, we can already see that:

The exponent bits determine the range of values: the more bits the exponent has, the larger the exponents it can hold, and so the larger the magnitudes that can be represented.

The fraction bits determine the computational precision: the more bits the fraction has, the more significant digits can be kept, and so the higher the precision.
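Both effects are visible through constants and methods that Java already provides. In this sketch (class name illustrative), `MAX_EXPONENT` and `MAX_VALUE` reflect the exponent field, while `Math.ulp(1.0)`, the gap between 1.0 and the next representable value, reflects the fraction field.

```java
// Range comes from the exponent field; precision comes from the fraction field.
class RangeAndPrecision {
    public static void main(String[] args) {
        // Range: 8 exponent bits for float, 11 for double.
        System.out.println("float  max exponent = " + Float.MAX_EXPONENT + ", max value = " + Float.MAX_VALUE);
        System.out.println("double max exponent = " + Double.MAX_EXPONENT + ", max value = " + Double.MAX_VALUE);

        // Precision: 23 fraction bits for float, 52 for double.
        System.out.println("float  ulp(1.0) = " + Math.ulp(1.0f)); // equals 2^-23
        System.out.println("double ulp(1.0) = " + Math.ulp(1.0));  // equals 2^-52
    }
}
```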

That may not be entirely clear, so let's look at an example:

The fraction of a float has only 23 bits, so it can distinguish 2^23 = 8,388,608 different values. That number has 7 decimal digits, so strictly speaking a float can only guarantee about 6 significant decimal digits.

The fraction of a double has 52 bits, corresponding to 2^52 = 4,503,599,627,370,496, which has 16 decimal digits, so a double can guarantee about 15 significant decimal digits.
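A quick way to watch these limits is to store an integer with more digits than the type can hold and read it back. This is a sketch with hypothetical helper names; the two specific values are chosen because 123456789 needs 9 significant digits and 2^53 + 1 needs 16, more than float and double can respectively keep.

```java
// Shows digits being lost when a value needs more precision than the fraction field holds.
class DigitsOfPrecision {

    static int roundTripFloat(int n) {
        return (int) (float) n;     // int -> float -> int
    }

    static long roundTripDouble(long n) {
        return (long) (double) n;   // long -> double -> long
    }

    public static void main(String[] args) {
        System.out.println(1 << 23);   // 8388608 (7 digits)
        System.out.println(1L << 52);  // 4503599627370496 (16 digits)

        // float keeps about 7 digits: the nearest float to 123456789 is 123456792.
        System.out.println(roundTripFloat(123456789));       // 123456792

        // double keeps about 16 digits: 2^53 + 1 rounds to 2^53.
        System.out.println(roundTripDouble((1L << 53) + 1)); // 9007199254740992
    }
}
```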

PS: The scientific calculators we commonly see, such as those used in high school, generally support up to 15 digits; beyond that, results are no longer accurate. In practice, double is used more often because it guarantees about 15 significant digits. If higher precision is required, use another type, such as Java's BigDecimal, which supports arbitrary-precision decimal arithmetic.
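The difference shows up with a value as simple as 0.1, which has no exact binary representation. The sketch below (class name illustrative) uses the real `java.math.BigDecimal` API; constructing it from strings keeps the decimal values exact, while `new BigDecimal(0.1)` reveals the binary approximation a double actually stores.

```java
import java.math.BigDecimal;

// Compares binary double arithmetic with exact decimal arithmetic in BigDecimal.
class BigDecimalDemo {

    static BigDecimal exactSum(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b)); // string constructor keeps the decimal exact
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);              // 0.30000000000000004
        System.out.println(exactSum("0.1", "0.2")); // 0.3

        // The exact value of the double closest to 0.1 (slightly more than 0.1).
        System.out.println(new BigDecimal(0.1));
    }
}
```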

3 Exponent Bias and Unsigned Representation

Note that the exponent can be positive or negative; that is, it is a signed integer, and arithmetic on signed integers is more cumbersome than on unsigned integers. To avoid this trouble, the exponent is converted to an unsigned integer before it is stored. How is this done?

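The exponent field of a float is 8 bits wide, so it can hold the unsigned values 0 to 255. Because the stored values 0 and 255 are reserved for special cases (zero and subnormal numbers, and infinity and NaN, respectively), the exponent of a normal float ranges from -126 to +127. To avoid the complications that negative numbers cause in actual computation (comparison, addition, subtraction, and so on), a simple mapping is applied when the exponent is stored: a fixed bias is added to it. For float the bias is 127, so every stored exponent lies between 1 and 254 and is never negative. (For double, the 11-bit exponent uses a bias of 1023.)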
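The bias mapping itself is just an addition and a subtraction. In this sketch the class and method names are hypothetical, while `Float.MIN_EXPONENT`, `Float.MAX_EXPONENT`, `Double.MIN_EXPONENT`, and `Double.MAX_EXPONENT` are real Java constants for the normal exponent range.

```java
// Maps signed exponents onto the unsigned values stored in the exponent field, and back.
class ExponentBias {
    static final int FLOAT_BIAS = 127;   // 2^(8-1) - 1 for the 8-bit float field
    static final int DOUBLE_BIAS = 1023; // 2^(11-1) - 1 for the 11-bit double field

    static int toStored(int exponent, int bias) {
        return exponent + bias;
    }

    static int fromStored(int stored, int bias) {
        return stored - bias;
    }

    public static void main(String[] args) {
        // Normal float exponents -126..127 map to stored values 1..254 (0 and 255 are reserved).
        System.out.println("float:  " + Float.MIN_EXPONENT + ".." + Float.MAX_EXPONENT + " -> "
                + toStored(Float.MIN_EXPONENT, FLOAT_BIAS) + ".." + toStored(Float.MAX_EXPONENT, FLOAT_BIAS));
        // Normal double exponents -1022..1023 map to stored values 1..2046.
        System.out.println("double: " + Double.MIN_EXPONENT + ".." + Double.MAX_EXPONENT + " -> "
                + toStored(Double.MIN_EXPONENT, DOUBLE_BIAS) + ".." + toStored(Double.MAX_EXPONENT, DOUBLE_BIAS));
    }
}
```

Because every stored value is non-negative, two exponents can be compared as plain unsigned integers, which is the simplification the bias is meant to buy.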

For example:

If the exponent is 6, the value actually stored is 6 + 127 = 133, which is written into the field in binary as 10000101.

If the exponent is -3, the value actually stored is -3 + 127 = 124, which is written into the field in binary as 01111100.
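Both examples can be confirmed against real floats: 78.375 = 1.001110011 × 2^6 has exponent 6, and 0.125 = 1.0 × 2^-3 has exponent -3. The class and helper names in this sketch are illustrative; the bit extraction uses `Float.floatToIntBits`.

```java
// Reads the biased exponent field of real floats to confirm the two worked examples.
class BiasExamples {

    static int storedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // Formats a value as an 8-bit binary string with leading zeros.
    static String bin8(int v) {
        return String.format("%8s", Integer.toBinaryString(v)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int a = storedExponent(78.375f); // exponent 6  -> 6 + 127
        int b = storedExponent(0.125f);  // exponent -3 -> -3 + 127
        System.out.println(a + " = " + bin8(a)); // 133 = 10000101
        System.out.println(b + " = " + bin8(b)); // 124 = 01111100
    }
}
```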

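When we need to calculate the actual decimal number represented, we reverse the mapping: subtracting the bias 127 from the stored value recovers the real exponent.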
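A minimal sketch of the full reverse direction, assuming a normal float (not zero, subnormal, infinite, or NaN): subtract the bias from the stored exponent, put the implicit leading 1 back in front of the fraction, and scale by the resulting power of two. The class name `DecodeFloat` is illustrative only; `Float.floatToIntBits` and `Math.scalb` are standard Java APIs.

```java
// Rebuilds a normal float's value from its stored fields:
// value = (-1)^sign * (1 + fraction / 2^23) * 2^(stored - 127)
class DecodeFloat {

    static double decode(float f) {
        int bits = Float.floatToIntBits(f);
        int sign = bits >>> 31;
        int stored = (bits >>> 23) & 0xFF;
        int fraction = bits & 0x7FFFFF;
        // Stored values 0 and 255 are reserved and follow different rules.
        if (stored == 0 || stored == 0xFF) {
            throw new IllegalArgumentException("not a normal float: " + f);
        }
        double significand = 1.0 + fraction / (double) (1 << 23); // restore the implicit leading 1
        double magnitude = Math.scalb(significand, stored - 127);   // remove the bias
        return sign == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        System.out.println(decode(78.375f)); // 78.375
        System.out.println(decode(-0.125f)); // -0.125
    }
}
```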
