The underlying cause of precision loss in Java floating-point calculations, and how to solve it

  • 2020-06-07 04:27:12
  • OfStack

Question:

When you do arithmetic on two double values, the result is sometimes slightly off. For example:


  System.out.println(19.99+20);
  System.out.println(1.0-0.66);
  System.out.println(0.033*100);
  System.out.println(12.3/100);

Output:


39.989999999999995
0.33999999999999997
3.3000000000000003
0.12300000000000001

The primitive floating-point types float and double in Java cannot represent most decimal fractions exactly. This is not a bug in Java: the computer works in base 2, and a floating-point number is only a binary approximation of the decimal value you wrote. Converting a decimal fraction to base 2 usually loses precision, and that small error becomes visible when the result is converted back to base 10 for printing.

The cause of the precision loss can be explained quite simply. First, a positive integer is stored on a computer as a sequence of binary digits such as 01010, and floating-point numbers are no exception.

For example, to convert 11 to base 2: 11 divided by 2 is 5 remainder 1

5 divided by 2 is 2 remainder 1

2 divided by 2 is 1 remainder 0

1 divided by 2 is 0 remainder 1

Reading the remainders from last to first, 11 in base 2 is 1011.
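The repeated division above can be sketched in a few lines of Java. The class and method names here (IntToBinary, toBinary) are made up for illustration, and the sketch assumes a non-negative input:

```java
final class IntToBinary {
    // Hypothetical helper: converts a non-negative int to base 2 by
    // repeatedly dividing by 2 and collecting the remainders.
    static String toBinary(int n) {
        if (n == 0) {
            return "0";
        }
        StringBuilder digits = new StringBuilder();
        while (n > 0) {
            digits.append(n % 2); // each remainder is the next digit, least significant first
            n /= 2;
        }
        return digits.reverse().toString(); // read the remainders from last to first
    }

    public static void main(String[] args) {
        System.out.println(toBinary(11));               // prints 1011
        System.out.println(Integer.toBinaryString(11)); // built-in equivalent, prints 1011
    }
}
```

The loop always stops because n shrinks at every step, which is why every integer has a finite binary form.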

The double type occupies 8 bytes (64 bits): the first bit is the sign bit, the next 11 bits are the exponent, and the remaining 52 bits are the significant digits (the mantissa).
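To see this layout for a concrete value, the three fields can be pulled out of the raw bits with Double.doubleToLongBits. This is only a small sketch; the class name DoubleBits and its helper methods are hypothetical:

```java
final class DoubleBits {
    // Top bit: 0 for positive, 1 for negative.
    static long sign(double d) {
        return Double.doubleToLongBits(d) >>> 63;
    }

    // Next 11 bits: the exponent, stored with a bias of 1023.
    static long exponent(double d) {
        return (Double.doubleToLongBits(d) >>> 52) & 0x7FFL;
    }

    // Low 52 bits: the mantissa (fraction) field.
    static long mantissa(double d) {
        return Double.doubleToLongBits(d) & 0xFFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        double value = 0.99;
        System.out.println("sign     = " + sign(value));
        System.out.println("exponent = " + exponent(value)
                + " (unbiased " + (exponent(value) - 1023) + ")");
        System.out.println("mantissa = " + Long.toBinaryString(mantissa(value)));
    }
}
```

For 0.99 the stored exponent is 1022, that is 2 to the power -1, because 0.99 equals 1.98 times one half.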

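Repeatedly dividing a positive integer by 2 always comes to an end, so every integer has a finite binary form. The fractional part is converted the other way round: multiply it by 2 repeatedly, and the integer part produced at each step (0 or 1) is the next binary digit.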

For example, the binary digits of the fraction 0.99:

0.99 * 2 = 1 + 0.98 --> 1
0.98 * 2 = 1 + 0.96 --> 1
0.96 * 2 = 1 + 0.92 --> 1
0.92 * 2 = 1 + 0.84 --> 1
...

This process never ends, but a double has only a finite number of significant bits, so something has to be dropped. Base 2 therefore cannot represent 0.99 exactly, just as base 10 cannot represent 1/3 exactly.
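The multiply-by-2 procedure can be tried out with a short sketch. It uses BigDecimal for the intermediate values so that the demonstration itself is not affected by rounding. The names FractionToBinary and fractionBits are hypothetical, and the input is assumed to be a decimal string between 0 (inclusive) and 1 (exclusive):

```java
import java.math.BigDecimal;

final class FractionToBinary {
    // Hypothetical helper: returns the first `count` binary digits after the
    // point for a decimal fraction in [0, 1), by repeated multiplication by 2.
    static String fractionBits(String decimalFraction, int count) {
        BigDecimal two = new BigDecimal("2");
        BigDecimal value = new BigDecimal(decimalFraction);
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < count; i++) {
            value = value.multiply(two);
            if (value.compareTo(BigDecimal.ONE) >= 0) {
                bits.append('1');                      // integer part is 1
                value = value.subtract(BigDecimal.ONE); // keep only the fraction
            } else {
                bits.append('0');                      // integer part is 0
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(fractionBits("0.99", 8)); // prints 11111101
        System.out.println(fractionBits("0.5", 8));  // prints 10000000
    }
}
```

For 0.5 the remainder reaches zero after one step, so it is exact in base 2; for 0.99 the remainder never reaches zero, however many digits are requested.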

Solutions:

Effective Java states the principle that float and double are suited to scientific and engineering calculations, while business calculations should use java.math.BigDecimal. The BigDecimal class solves the problems above. First of all, note that constructing a BigDecimal directly from a String loses no precision at all. Constructing it from a double does not help, because the double already holds the binary approximation, and converting a double that already carries an error to a String and then to a BigDecimal carries that error along. So I think the solution is to store such numbers as strings and, whenever a calculation is needed, construct the BigDecimal directly from the string; otherwise precision will be lost.
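The difference between the constructors is easy to check. The sketch below is illustrative only; the class name BigDecimalConstruction and the helper isExactInBinary are made up for this example:

```java
import java.math.BigDecimal;

final class BigDecimalConstruction {
    // Hypothetical helper: true if the decimal string survives a round trip
    // through double unchanged, i.e. it is exactly representable in binary.
    static boolean isExactInBinary(String decimal) {
        BigDecimal exact = new BigDecimal(decimal);
        BigDecimal viaDouble = new BigDecimal(Double.parseDouble(decimal));
        return exact.compareTo(viaDouble) == 0;
    }

    public static void main(String[] args) {
        // From a String: exactly the decimal value that was written.
        System.out.println(new BigDecimal("0.1"));
        // prints 0.1

        // From a double: the exact value of the binary approximation.
        System.out.println(new BigDecimal(0.1));
        // prints 0.1000000000000000055511151231257827021181583404541015625

        // From a double that already carries a calculation error:
        // going through its String form keeps that error.
        System.out.println(BigDecimal.valueOf(19.99 + 20));
        // prints 39.989999999999995

        System.out.println(isExactInBinary("0.5"));  // prints true
        System.out.println(isExactInBinary("0.99")); // prints false
    }
}
```

BigDecimal.valueOf(double) goes through Double.toString, so it gives the short decimal form for a literal such as 0.1, but it cannot undo an error that a double calculation has already introduced.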

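1. Addition


import java.math.BigDecimal;

/**
 * Adds two decimal numbers given as strings.
 * @param doubleValA the first addend, as a decimal string
 * @param doubleValB the second addend, as a decimal string
 * @return the sum, converted to a double
 */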
public static double add(String doubleValA, String doubleValB) { 
  BigDecimal a2 = new BigDecimal(doubleValA); 
  BigDecimal b2 = new BigDecimal(doubleValB); 
  return a2.add(b2).doubleValue(); 
}

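2. Subtraction


/**
 * Subtracts one decimal number from another, both given as strings.
 * @param doubleValA the minuend, as a decimal string
 * @param doubleValB the subtrahend, as a decimal string
 * @return the difference, converted to a double
 */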
public static double sub(String doubleValA, String doubleValB) { 
  BigDecimal a2 = new BigDecimal(doubleValA); 
  BigDecimal b2 = new BigDecimal(doubleValB); 
  return a2.subtract(b2).doubleValue();
}

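3. Multiplication


/**
 * Multiplies two decimal numbers given as strings.
 * @param doubleValA the first factor, as a decimal string
 * @param doubleValB the second factor, as a decimal string
 * @return the product, converted to a double
 */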
public static double mul(String doubleValA, String doubleValB) { 
  BigDecimal a2 = new BigDecimal(doubleValA); 
  BigDecimal b2 = new BigDecimal(doubleValB); 
  return a2.multiply(b2).doubleValue();
}

4. Division


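import java.math.RoundingMode;

/**
 * Divides one decimal number by another, both given as strings.
 * @param doubleValA the dividend, as a decimal string
 * @param doubleValB the divisor, as a decimal string
 * @param scale the number of decimal places to keep when the division does not terminate
 * @return the quotient rounded half up to the given scale, converted to a double
 */
public static double div(String doubleValA, String doubleValB, int scale) {
  BigDecimal a2 = new BigDecimal(doubleValA);
  BigDecimal b2 = new BigDecimal(doubleValB);
  // RoundingMode.HALF_UP replaces the deprecated BigDecimal.ROUND_HALF_UP constant
  return a2.divide(b2, scale, RoundingMode.HALF_UP).doubleValue();
}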

5. Main function call


public static void main(String[] args) {
  String doubleValA = "3.14159267";
  String doubleValB = "2.358";
  System.out.println("add:" + add(doubleValA, doubleValB));
  System.out.println("sub:" + sub(doubleValA, doubleValB));
  System.out.println("mul:" + mul(doubleValA, doubleValB));
  System.out.println("div:" + div(doubleValA, doubleValB, 8));
}

The results are shown as follows:


 add:5.49959267
 sub:0.78359267
 mul:7.40787551586
 div:1.33231241

So the best approach is to drop double altogether and work with String and java.math.BigDecimal.
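Taken literally, dropping double also means not converting the result back with doubleValue(). A minimal sketch of that variant follows, with the four operations returning BigDecimal and the four expressions from the question used as input. The class name ExactDecimalMath is an assumption made for this example and is not part of the code above:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class ExactDecimalMath {
    static BigDecimal add(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    static BigDecimal sub(String a, String b) {
        return new BigDecimal(a).subtract(new BigDecimal(b));
    }

    static BigDecimal mul(String a, String b) {
        return new BigDecimal(a).multiply(new BigDecimal(b));
    }

    // scale: number of decimal places to keep, rounded half up.
    static BigDecimal div(String a, String b, int scale) {
        return new BigDecimal(a).divide(new BigDecimal(b), scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(add("19.99", "20"));   // prints 39.99
        System.out.println(sub("1.0", "0.66"));   // prints 0.34
        System.out.println(mul("0.033", "100"));  // prints 3.300
        System.out.println(div("12.3", "100", 3)); // prints 0.123
    }
}
```

The result stays a BigDecimal until it is displayed or stored as a string, so no binary approximation is reintroduced along the way.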

Java performs float and double operations in accordance with the floating-point representation specified by IEEE 754. The format is a binary form of scientific notation made up of a sign, an exponent, and a mantissa: a floating-point number is represented as the mantissa times 2 raised to the exponent, with a sign. Please keep an eye on my blog for the details of how these numbers are stored and how arithmetic on them works; I will summarize them later.

