

Floating Point Representation Calculator

Converts a decimal number into its IEEE 754 single-precision (32-bit) or double-precision (64-bit) floating point binary representation, including sign, exponent, and mantissa fields.



Formula

V = (−1)^s × 2^(e − bias) × (1 + m₁·2⁻¹ + m₂·2⁻² + … + m_p·2⁻ᵖ)

V is the represented value. s is the sign bit (0 for positive, 1 for negative). e is the stored biased exponent, read as an unsigned integer. bias = 127 for single precision (32-bit) and 1023 for double precision (64-bit). The term (1 + …) is the implicit leading 1 plus the fractional mantissa bits m_i, where p = 23 for single precision and p = 52 for double precision.

Source: IEEE 754-2019 Standard for Floating-Point Arithmetic, IEEE Computer Society.

How it works

The IEEE 754 standard, published by the Institute of Electrical and Electronics Engineers, defines how real numbers are stored in binary hardware. Every floating point number is encoded in three fields: a sign bit (1 bit), a biased exponent (8 bits for single, 11 bits for double), and a mantissa or significand (23 bits for single, 52 bits for double). Together these fields allow a single 32-bit word or 64-bit word to represent an enormous range of values — from roughly 1.2 × 10⁻³⁸ to 3.4 × 10³⁸ for single precision, and far beyond for double precision.

The encoding formula is V = (−1)^s × 2^(e − bias) × (1 + f), where s is the sign bit, e is the stored exponent, the bias is 127 (single) or 1023 (double), and f is the fractional part of the mantissa. The leading 1 in (1 + f) is implicit and never stored — this is called the hidden bit, and it gives the format one extra bit of precision for free. The biased exponent scheme allows the stored value to be an unsigned integer while still representing both very small and very large numbers.
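This encoding can be inspected from any language that exposes a value's raw bytes. A minimal Python sketch using the standard struct module (the helper name float_to_bits is ours):

```python
import struct

def float_to_bits(x: float, double: bool = False) -> str:
    """Pack x into its IEEE 754 bytes and return the raw bit string."""
    if double:
        (word,) = struct.unpack('>Q', struct.pack('>d', x))
        return format(word, '064b')
    (word,) = struct.unpack('>I', struct.pack('>f', x))
    return format(word, '032b')

bits = float_to_bits(-13.75)
# Split into sign (1 bit), exponent (8 bits), mantissa (23 bits).
print(bits[0], bits[1:9], bits[9:])  # 1 10000010 10111000000000000000000
```

Packing with format 'f' forces float32 rounding even though Python's own floats are doubles, so the bit string reflects the single-precision encoding.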

Floating point arithmetic is used everywhere: graphics processing units (GPUs), physics simulation engines, machine learning training loops, financial risk models, and signal processing systems all rely on it. Knowing how a number is represented in memory helps engineers debug precision loss, avoid catastrophic cancellation, choose between float and double types, and understand why some decimal values like 0.1 cannot be represented exactly in binary floating point.

Worked example

Let's encode −13.75 in IEEE 754 single-precision (32-bit) format step by step.

Step 1 — Sign bit: The number is negative, so s = 1.

Step 2 — Convert magnitude to binary: 13 in binary is 1101. The fractional part 0.75 = 0.11 in binary (0.5 + 0.25). So |−13.75| = 1101.11 in binary.

Step 3 — Normalize: Write in the form 1.xxxxx × 2^n. Moving the point three places left gives 1.10111 × 2³. The true exponent is e = 3.

Step 4 — Biased exponent: Add the bias: 3 + 127 = 130. In binary: 10000010.

Step 5 — Mantissa: Strip the implicit leading 1. The fractional bits are 10111, padded to 23 bits: 10111000000000000000000.

Step 6 — Assemble the bit pattern: Sign | Exponent | Mantissa = 1 | 10000010 | 10111000000000000000000, which is the 32-bit hex value 0xC15C0000.

Step 7 — Verify: (−1)¹ × 2^(130−127) × (1 + 0.10111₂) = −1 × 8 × 1.71875 = −13.75. The reconstruction is exact in this case because 0.75 is a sum of powers of two.
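The steps above can be double-checked in code by reassembling the three fields into one 32-bit word and comparing it with the pattern the hardware actually stores; a small Python sketch:

```python
import struct

# Field values taken from the worked example for -13.75.
sign     = 0b1
exponent = 0b10000010                  # 130 = 3 + bias of 127
mantissa = 0b10111000000000000000000   # 23 bits, implicit leading 1 stripped
word = (sign << 31) | (exponent << 23) | mantissa
print(hex(word))                       # 0xc15c0000

# Compare with the bit pattern struct produces for -13.75.
(hw,) = struct.unpack('>I', struct.pack('>f', -13.75))
assert word == hw
```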

Limitations & notes

Not all decimal values can be represented exactly in binary floating point. The number 0.1, for example, has an infinite repeating binary expansion and will always be stored with a tiny rounding error — this is a fundamental property of the format, not a bug in any particular implementation. Single-precision arithmetic is subject to larger rounding errors than double precision; for applications requiring more than about 7 significant decimal digits, double precision is necessary. Special values such as positive infinity, negative infinity, and NaN (Not a Number) are encoded using reserved exponent patterns (all zeros or all ones) and are not handled numerically by this calculator. Denormalized (subnormal) numbers, which fill the gap near zero, use a different encoding where the implicit leading bit is 0 rather than 1, and this calculator does not currently visualize the subnormal encoding path. Very large or very small inputs may exhaust the representable range and produce overflow or underflow. Always verify critical numerical computations against the actual hardware behavior of your target platform.
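The gradual-underflow behavior near zero can be observed directly; a small Python sketch using the standard sys and math modules (double-precision values):

```python
import math
import sys

# Below the smallest normal double, values become subnormal: precision
# degrades gradually until 2**-1074, the smallest positive subnormal.
print(sys.float_info.min)               # 2.2250738585072014e-308 (smallest normal)
smallest_subnormal = 2.0 ** -1074
print(smallest_subnormal > 0.0)         # True
print(smallest_subnormal / 2.0 == 0.0)  # True: underflow to zero
print(math.ulp(1e300) > 1.0)            # True: spacing grows with magnitude
```

The last line shows the flip side of the same design: near the top of the range, adjacent representable doubles are separated by far more than 1.0.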

Frequently asked questions

Why can't 0.1 be represented exactly in IEEE 754 floating point?

The decimal fraction 0.1 equals 1/10, which has no finite binary expansion, just as 1/3 has no finite decimal expansion. In binary it becomes the repeating pattern 0.0001100110011..., which must be rounded to fit the mantissa width, introducing a tiny rounding error. This is why adding 0.1 ten times in most programming languages does not produce exactly 1.0.
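This is easy to observe in any language with double-precision floats; for example, in Python:

```python
# 0.1 is stored as the nearest representable double, so the error becomes
# visible when the value is printed to enough digits or accumulated.
print(f'{0.1:.20f}')           # 0.10000000000000000555
total = sum([0.1] * 10)
print(total == 1.0)            # False
print(total)                   # 0.9999999999999999
```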

What is the difference between single precision and double precision floating point?

Single precision (float) uses 32 bits: 1 sign bit, 8 exponent bits, and 23 mantissa bits, providing roughly 7 significant decimal digits of precision. Double precision (double) uses 64 bits: 1 sign bit, 11 exponent bits, and 52 mantissa bits, providing roughly 15–16 significant decimal digits. Double precision is the default in most scientific and financial software.
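One way to see the difference is to round-trip a value through single precision, which Python's struct module can do even though Python's floats are doubles; a small sketch:

```python
import math
import struct

# Packing through format 'f' applies float32 rounding, so the round-trip
# shows the digits that single precision discards.
pi32 = struct.unpack('>f', struct.pack('>f', math.pi))[0]
print(math.pi)   # 3.141592653589793  (~16 digits survive)
print(pi32)      # 3.1415927410125732 (~7 digits survive)
```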

What is the bias in the IEEE 754 exponent field?

The bias is a fixed offset added to the true exponent before storing it, allowing the exponent field to be an unsigned integer that can be compared and sorted without special handling of negative values. For 32-bit single precision the bias is 127; for 64-bit double precision it is 1023. To recover the true exponent, subtract the bias from the stored value.
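Extracting the biased exponent takes only a few bit operations; a minimal Python sketch for the single-precision case (the helper name fields32 is ours, and the bias subtraction applies to normal numbers only):

```python
import struct

def fields32(x: float):
    """Split a value's float32 encoding into (sign, biased exp, true exp, fraction)."""
    (word,) = struct.unpack('>I', struct.pack('>f', x))
    sign   = word >> 31
    biased = (word >> 23) & 0xFF
    frac   = word & 0x7FFFFF
    return sign, biased, biased - 127, frac  # true exponent valid for normals

print(fields32(13.75))   # (0, 130, 3, 6029312)
```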

What is machine epsilon and why does it matter?

Machine epsilon (ε) is the gap between 1 and the next larger representable number in the given format; equivalently, it is the relative spacing of floating point values near 1. It equals 2⁻²³ ≈ 1.19 × 10⁻⁷ for single precision and 2⁻⁵² ≈ 2.22 × 10⁻¹⁶ for double precision. It is a practical measure of relative rounding error and is often used as a convergence threshold in iterative numerical algorithms.
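Both claims are easy to confirm; a small Python sketch for the double-precision case:

```python
import sys

# Double-precision machine epsilon is 2**-52, the gap from 1.0 to the
# next representable double.
print(sys.float_info.epsilon == 2.0 ** -52)   # True
print(1.0 + 2.0 ** -52 > 1.0)                 # True: a full epsilon registers
print(1.0 + 2.0 ** -53 == 1.0)                # True: half an epsilon rounds away
```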

What are NaN and infinity in floating point, and how are they encoded?

NaN (Not a Number) and infinity are special values defined by IEEE 754 to handle exceptional conditions like division by zero or the square root of a negative number. They are encoded using a reserved all-ones exponent pattern (255 for single, 2047 for double). Infinity has a zero mantissa; NaN has a non-zero mantissa. These values propagate through calculations so that errors are detectable at the end of a computation.
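These encodings can be verified by unpacking the raw bits; a small Python sketch for the single-precision case (the helper name exponent_and_mantissa is ours):

```python
import math
import struct

def exponent_and_mantissa(x: float):
    """Return the float32 biased exponent and mantissa fields of x."""
    (word,) = struct.unpack('>I', struct.pack('>f', x))
    return (word >> 23) & 0xFF, word & 0x7FFFFF

print(exponent_and_mantissa(math.inf))   # (255, 0): all-ones exponent, zero mantissa
e, m = exponent_and_mantissa(math.nan)
print(e == 255 and m != 0)               # True: NaN has a non-zero mantissa
print(math.nan == math.nan)              # False: NaN compares unequal to itself
```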

Last updated: 2025-01-15 · Formula verified against primary sources.