Peter Norvig 22 October 2015, revised 28 October 2015

Beal's Conjecture Revisited

In 1637, Pierre de Fermat wrote in the margin of a book that he had a proof of his famous "Last Theorem":

If $A^n + B^n = C^n$,
where $A, B, C, n$ are positive integers
then $n \le 2$.

Centuries passed before Andrew Beal, a businessman and amateur mathematician, made his conjecture in 1993:

If $A^x + B^y = C^z$,
where $A, B, C, x, y, z$ are positive integers and $x, y, z$ are all greater than $2$,
then $A, B$ and $C$ must have a common prime factor.

Andrew Wiles proved Fermat's theorem in 1995, but Beal's offer of \$1,000,000 for a proof or disproof of his conjecture remains unclaimed. I don't have the mathematical skills of Wiles, so all I can do is write a program to search for counterexamples. I first wrote that program in 2000, and my name got associated with Beal's Conjecture, which means I get a lot of emails with purported proofs or counterexamples (many asking how they can collect their prize money). So far, all the emails have been wrong. This page catalogs some of the more common errors—including two mistakes of my own—and shows an updated program.

How to Not Win A Million Dollars

  • A proof must show that there are no examples that satisfy the conditions. A common error is to show how a certain technique generates an infinite collection of numbers that satisfy $A^x + B^y = C^z$ and then show that in all of these, $A, B, C$ have a common factor. But that's not good enough, unless you can also prove that no other technique can satisfy the equation.

  • It is valid to use proof by contradiction: assume the conjecture is false (that is, that a counterexample exists), and show that this leads to a contradiction. It is not valid to use proof by circular reasoning: assume the conjecture is true, put in some irrelevant steps, and show that it follows that the conjecture is true.

  • A valid counterexample needs to satisfy all four conditions—don't leave one out:

$A, B, C, x, y, z$ are positive integers
$x, y, z > 2$
$A^x + B^y = C^z$
$A, B, C$ have no common prime factor.

(If you think you have a valid counterexample, before you share it with Andrew Beal, or me, or anyone else, you can check it with my Online Beal Counterexample Checker.)

  • One correspondent claimed that $27^4 + 162 ^ 3 = 9 ^ 7$ was a solution, because the first three conditions hold, and the common factor is 9, which isn't a prime. But of course, if $A, B, C$ have 9 as a common factor, then they also have 3, and 3 is prime. The phrase "no common prime factor" means the same thing as "no common factor greater than 1."

  • Another claimed that $2^3+2^3=2^4$ was a counterexample, because all the bases are 2, which is prime, and prime numbers have no prime factors. But that's not true; a prime number has itself as a factor.

  • A creative person offered $1359072^4 - 940896^4 = 137998080^3$, which fails both because $3^3 \cdot 2^5 \cdot 11^2$ is a common factor, and because it has a subtraction rather than an addition (although, as Julius Jacobsen pointed out, that can be rectified by adding $940896^4$ to both sides).

  • Another Beal fan started by saying "Let $C = 43$ and $z = 3$. Since $43 = 21 + 22$, we have $43^3 = (21^3 + 22^3).$" But of course $(a + b)^3 \ne (a^3 + b^3)$. This fallacy is called the freshman's dream (although I remember having different dreams as a freshman).

  • Multiple people proposed answers similar to this one:
In [1]:
from math import gcd #### In Python versions < 3.5, use "from fractions import gcd"
In [2]:
A, B, C = 60000000000000000000, 70000000000000000000, 82376613842809255677

x = y = z = 3.

A ** x + B ** y == C ** z    and    gcd(gcd(A, B), C) == 1
Out[2]:
True

WOW! The result is True! Is this a real counterexample to Beal? And also a disproof of Fermat?

Alas, it is not. Notice the decimal point in "3.", indicating a floating point number, with inexact, limited precision. Change the inexact "3." to an exact "3" and the result changes to "False". Below we see that the two sides of the equation are the same for the first 18 digits, but differ starting with the 19th:

In [3]:
(A ** 3 + B ** 3,
 C ** 3)
Out[3]:
(559000000000000000000000000000000000000000000000000000000000,
 559000000000000000063037470301555182935702892172500189973733)

They say "close" only counts in horseshoes and hand grenades, and if you threw two horseshoes at a stake on the planet Kapteyn-b (a possibly habitable and thus possibly horseshoe-playing exoplanet 12.8 light years from Earth) and the two paths differed in the 19th digit, the horseshoes would end up less than an inch apart. That's really, really close, but close doesn't count in number theory.
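The four conditions from the list above can be checked mechanically with exact integer arithmetic. Here is a minimal sketch; the function name is_beal_counterexample is hypothetical (it is not the code behind the online checker), and it deliberately rejects floating-point arguments such as "3." so that the precision trap just described cannot occur.

```python
from math import gcd

def is_beal_counterexample(A, B, C, x, y, z):
    """Return True only if all four conditions hold, using exact integers.
    (Hypothetical helper for illustration; not the online checker's code.)"""
    values = (A, B, C, x, y, z)
    # Condition 1: positive integers (this also rejects floats such as 3.0).
    if not all(isinstance(v, int) and v > 0 for v in values):
        return False
    # Condition 2: every exponent is greater than 2.
    if min(x, y, z) <= 2:
        return False
    # Condition 3: the equation holds exactly.
    if A ** x + B ** y != C ** z:
        return False
    # Condition 4: no common prime factor, i.e. gcd(A, B, C) == 1.
    return gcd(gcd(A, B), C) == 1

print(is_beal_counterexample(27, 162, 9, 4, 3, 7))   # False: common factor 3
print(is_beal_counterexample(2, 2, 2, 3, 3, 4))      # False: common factor 2
print(is_beal_counterexample(60000000000000000000, 70000000000000000000,
                             82376613842809255677, 3., 3., 3.))  # False: floats
```

All three submissions discussed above are rejected, each for a different reason.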

The Simpsons and Fermat

Speaking of close: in two different episodes of The Simpsons, close counterexamples to Fermat's Last Theorem are shown: $1782^{12} + 1841^{12} = 1922^{12}$ and $3987^{12} + 4365^{12} = 4472^{12}$. These were designed by Simpsons writer David X. Cohen to be correct up to the precision found in most handheld calculators. Cohen found the equations with a program that must have been something like this:

In [4]:
from itertools import combinations

def simpsons(bases, powers):
    """Find the integers (A, B, C, n) that come closest to solving 
    Fermat's equation, A ** n + B ** n == C ** n. 
    Let A, B range over all pairs of bases and n over all powers."""
    equations = ((A, B, iroot(A ** n + B ** n, n), n)
                 for A, B in combinations(bases, 2)
                 for n in powers)
    return min(equations, key=relative_error)

def iroot(i, n): 
    "The integer closest to the nth root of i."
    return int(round(i ** (1./n)))

def relative_error(equation):
    "Error between LHS and RHS of equation, relative to RHS." 
    (A, B, C, n) = equation
    LHS = A ** n + B ** n
    RHS = C ** n
    return abs(LHS - RHS) / RHS
In [5]:
simpsons(range(1000, 2000), [11, 12, 13])
Out[5]:
(1782, 1841, 1922, 12)
In [6]:
simpsons(range(3000, 5000), [12])
Out[6]:
(3987, 4365, 4472, 12)
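To see how close these near misses are without any floating-point rounding, the error can be computed exactly. This is a small sketch of my own (the helper name exact_relative_error is hypothetical), using Fraction so that nothing is rounded until the final display. The first equation can even be rejected by parity alone: an even number plus an odd number is odd, but $1922^{12}$ is even.

```python
from fractions import Fraction

def exact_relative_error(A, B, C, n):
    "Exact relative error |A**n + B**n - C**n| / C**n, as a Fraction."
    lhs, rhs = A ** n + B ** n, C ** n
    return Fraction(abs(lhs - rhs), rhs)

for (A, B, C, n) in [(1782, 1841, 1922, 12), (3987, 4365, 4472, 12)]:
    err = exact_relative_error(A, B, C, n)
    # float(err) is only for display; the comparison with 0 is exact.
    print(A, B, C, n, 'exact match:', err == 0, 'relative error ~', float(err))

# Parity check for the first equation: odd left side, even right side.
print((1782 ** 12 + 1841 ** 12) % 2, (1922 ** 12) % 2)
```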

Back to Beal: beal 2.0 and 2.1

In October 2015 I looked back at my original program from 2000. I ported it from Python 1.5 to 3.5 (by putting parens around the argument to print and adding long = int). It runs 250 times faster today, a tribute to both computer hardware engineers and the developers of the Python interpreter.

I found that I had misunderstood the problem in 2000. I thought that, by definition, $A$ and $B$ could not have a common factor, but actually, the definition of the conjecture only rules out examples where all three of $A, B, C$ share a common factor. I rewrote the program to reflect that, but then Mark Tiefenbruck (and later Edwin P. Berlin Jr. and Shen Lixing) wrote to point out that my original program was actually correct, not by definition, but by derivation: if $A$ and $B$ have a common prime factor $p$, then the sum $A^x + B^y$ must also have that factor $p$, and since $A^x + B^y = C^z$, then $C^z$ and hence $C$ must have the factor $p$. So I was wrong twice—I originally failed to understand the problem completely, and then I failed to recognize the optimization—and that means the original program was correct.
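Spelled out as a short derivation (the names $a$ and $b$ are introduced here only for illustration): if a prime $p$ divides both $A$ and $B$, write $A = pa$ and $B = pb$ for positive integers $a, b$. Then

$$A^x + B^y = p^x a^x + p^y b^y = p \, (p^{x-1} a^x + p^{y-1} b^y) = C^z$$

So $p$ divides $C^z$, and because $p$ is prime, it must divide $C$ itself (Euclid's lemma).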

Mark Tiefenbruck also suggested another optimization: only consider exponents that are odd primes, or 4. The idea is that a number like 512 can be expressed as either $2^9$ or $8^3$, and my program doesn't need to consider both. In general, any time we have a composite exponent, such as $b^{qp}$, where $p$ is prime, we should ignore $A=b, x=qp$, and instead consider only $A=b^q, x=p$. There's one complication to this scheme: 2 is a prime, but 2 is not a valid exponent for a Beal counterexample. So we will allow 4 as an exponent, as well as all odd primes up to max_x.
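The rewriting rule can be sketched as a small function. The name reduce_power is hypothetical and the function is only an illustration of the idea; the program never needs to call it, since it simply generates the reduced forms directly.

```python
def reduce_power(b, x):
    """Rewrite b ** x (with x > 2) as A ** e, where e is an odd prime or 4.
    Hypothetical helper, for illustration only."""
    assert x > 2
    # The smallest odd divisor of x that is >= 3 is necessarily prime.
    for p in range(3, x + 1, 2):
        if x % p == 0:
            return (b ** (x // p), p)
    # No odd prime factor: x is a power of 2 and at least 4.
    return (b ** (x // 4), 4)

print(reduce_power(2, 9))   # (8, 3), because 2 ** 9 == 8 ** 3 == 512
print(reduce_power(2, 8))   # (4, 4), because 2 ** 8 == 4 ** 4 == 256
print(reduce_power(3, 7))   # (3, 7), already an odd prime exponent
```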

Here is the complete, updated program:

In [7]:
from math      import gcd, log
from itertools import combinations, product

def beal(max_A, max_x):
    """See if any A ** x + B ** y equals some C ** z, with gcd(A, B) == 1.
    Consider any 1 <= A,B <= max_A and x,y <= max_x, with x,y prime or 4."""
    Apowers = make_Apowers(max_A, max_x)
    Czroots = make_Czroots(Apowers)
    for (A, B) in combinations(Apowers, 2):
        if gcd(A, B) == 1:
            for (Ax, By) in product(Apowers[A], Apowers[B]):       
                Cz = Ax + By
                if Cz in Czroots:
                    C = Czroots[Cz]
                    x, y, z = exponent(Ax, A), exponent(By, B), exponent(Cz, C)
                    print('{} ** {} + {} ** {} == {} ** {} == {}'
                          .format(A, x, B, y, C, z, C ** z))

def make_Apowers(max_A, max_x): 
    "A dict of {A: [A**3, A**4, ...], ...}."
    exponents = exponents_upto(max_x)
    return {A: [A ** x for x in (exponents if (A != 1) else [3])]
            for A in range(1, max_A+1)}

def make_Czroots(Apowers): return {Cz: C for C in Apowers for Cz in Apowers[C]}            
    
def exponents_upto(max_x):
    "Return all odd primes up to max_x, as well as 4."
    exponents = [3, 4] if max_x >= 4 else [3] if max_x == 3 else []
    for x in range(5, max_x + 1, 2):  # + 1 so that max_x itself is included
        if not any(x % p == 0 for p in exponents):
            exponents.append(x)
    return exponents

def exponent(Cz, C): 
    """Recover z such that C ** z == Cz (or equivalently z = log Cz base C).
    For exponent(1, 1), arbitrarily choose to return 3."""
    return 3 if (Cz == C == 1) else int(round(log(Cz, C)))

It takes less than a second to verify that there are no counterexamples for combinations up to $100^{100}$, a computation that took Andrew Beal thousands of hours on his 1990s-era computers:

In [8]:
%time beal(100, 100)
CPU times: user 352 ms, sys: 2.2 ms, total: 354 ms
Wall time: 354 ms

The execution time goes up roughly with the square of max_A, so with 5 times more A values, this computation takes about 25 times longer:

In [9]:
%time beal(500, 100)
CPU times: user 10.8 s, sys: 143 ms, total: 11 s
Wall time: 11.1 s
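A rough sanity check on that estimate: beal loops over unordered pairs of bases, so the number of pairs is a reasonable first proxy for the work. This sketch assumes the per-pair cost is roughly constant, which ignores the gcd filter and the growing cost of big-integer arithmetic; the helper name pairs is hypothetical.

```python
def pairs(n):
    "Number of unordered pairs that combinations(range(n), 2) would yield."
    return n * (n - 1) // 2

print(pairs(100), pairs(500), round(pairs(500) / pairs(100), 1))
# 4950 124750 25.2
```

The measured wall times above give a ratio of about 31 (11.1 s versus 354 ms), which is in the same ballpark as the predicted 25.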

How beal Works

The function beal first does some precomputation, creating two data structures:

  • Apowers: a dict of the form {A: [A**3, A**4, ...]} giving the nonredundant powers (prime and 4th powers) of each base, A, from 1 to max_A.
  • Czroots: a dict of {C**z : C} pairs, giving the zth root of each power in Apowers.

Then we consider all combinations of two bases, A and B, from Apowers. Here is a very small example Apowers table:

In [10]:
Apowers = make_Apowers(6, 10)
Apowers
Out[10]:
{1: [1],
 2: [8, 16, 32, 128],
 3: [27, 81, 243, 2187],
 4: [64, 256, 1024, 16384],
 5: [125, 625, 3125, 78125],
 6: [216, 1296, 7776, 279936]}

Consider the combination where A is 3 and B is 6. Of course gcd(3, 6) == 3, so the program would not consider them further, but imagine if they did not share a common factor. Then we would look at all possible Ax + By sums, for Ax in [27, 81, 243, 2187] and By in [216, 1296, 7776, 279936]. One of these would be 27 + 216, which sums to 243. We look up 243 in Czroots:

In [11]:
Czroots = make_Czroots(Apowers)
print(Czroots)
Czroots[243]
{128: 2, 1: 1, 1296: 6, 1024: 4, 32: 2, 8: 2, 64: 4, 2187: 3, 78125: 5, 256: 4, 16384: 4, 16: 2, 81: 3, 279936: 6, 243: 3, 3125: 5, 625: 5, 216: 6, 7776: 6, 27: 3, 125: 5}
Out[11]:
3

We see that 243 is in Czroots, with value 3, so this would be a counterexample (except for the common factor). The program uses the exponent function to recover the values of x, y, z, and prints the results.

Is the Program Correct?

Can we gain confidence in the program? It is difficult to test beal, because the expected output is nothing, for all known inputs. One thing we can do is verify that beal finds cases like 3 ** 3 + 6 ** 3 == 3 ** 5 == 243 that would be a counterexample except for the common factor 3. We can test this by temporarily replacing the gcd function with a mock function that always reports no common factors:

In [12]:
def gcd(a, b): return 1

beal(100, 100)
3 ** 3 + 6 ** 3 == 3 ** 5 == 243
7 ** 7 + 49 ** 3 == 98 ** 3 == 941192
8 ** 4 + 16 ** 3 == 2 ** 13 == 8192
8 ** 5 + 32 ** 3 == 16 ** 4 == 65536
9 ** 3 + 18 ** 3 == 9 ** 4 == 6561
16 ** 5 + 32 ** 4 == 8 ** 7 == 2097152
17 ** 4 + 34 ** 4 == 17 ** 5 == 1419857
19 ** 4 + 38 ** 3 == 57 ** 3 == 185193
27 ** 3 + 54 ** 3 == 3 ** 11 == 177147
28 ** 3 + 84 ** 3 == 28 ** 4 == 614656
34 ** 5 + 51 ** 4 == 85 ** 4 == 52200625

Let's make sure all those expressions are true:

In [13]:
{3 ** 3 + 6 ** 3 == 3 ** 5 == 243,
 7 ** 7 + 49 ** 3 == 98 ** 3 == 941192,
 8 ** 4 + 16 ** 3 == 2 ** 13 == 8192,
 8 ** 5 + 32 ** 3 == 16 ** 4 == 65536,
 9 ** 3 + 18 ** 3 == 9 ** 4 == 6561,
 16 ** 5 + 32 ** 4 == 8 ** 7 == 2097152,
 17 ** 4 + 34 ** 4 == 17 ** 5 == 1419857,
 19 ** 4 + 38 ** 3 == 57 ** 3 == 185193,
 27 ** 3 + 54 ** 3 == 3 ** 11 == 177147,
 28 ** 3 + 84 ** 3 == 28 ** 4 == 614656,
 34 ** 5 + 51 ** 4 == 85 ** 4 == 52200625}
Out[13]:
{True}

I get nervous having an incorrect version of gcd around; let's change it back, quick!

In [14]:
from math import gcd

beal(100, 100)

We can also provide some test cases for the subfunctions of beal:

In [15]:
def tests():
    assert make_Apowers(6, 10) == {
         1: [1],
         2: [8, 16, 32, 128],
         3: [27, 81, 243, 2187],
         4: [64, 256, 1024, 16384],
         5: [125, 625, 3125, 78125],
         6: [216, 1296, 7776, 279936]}
    
    assert make_Czroots(make_Apowers(5, 8)) == {
        1: 1, 8: 2, 16: 2, 27: 3, 32: 2, 64: 4, 81: 3,
        125: 5, 128: 2, 243: 3, 256: 4, 625: 5, 1024: 4,
        2187: 3, 3125: 5, 16384: 4, 78125: 5}
    Czroots = make_Czroots(make_Apowers(100, 100))
    assert 3 ** 3 + 6 ** 3 in Czroots
    assert 99 ** 97 in Czroots
    assert 101 ** 100 not in Czroots
    assert Czroots[99 ** 97] == 99
    
    assert exponent(10 ** 5, 10) == 5
    assert exponent(7 ** 3, 7) == 3
    assert exponent(1234 ** 999, 1234) == 999
    assert exponent(12345 ** 6789, 12345) == 6789
    assert exponent(3 ** 10000, 3) == 10000
    assert exponent(1, 1) == 3
    
    assert exponents_upto(2) == []
    assert exponents_upto(3) == [3]
    assert exponents_upto(4) == [3, 4]
    assert exponents_upto(40) == [3, 4, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
    assert exponents_upto(100) == [
        3, 4, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 
        67, 71, 73, 79, 83, 89, 97]
    
    assert gcd(3, 6) == 3
    assert gcd(3, 7) == 1
    assert gcd(861591083269373931, 94815872265407) == 97
    assert gcd(2*3*5*(7**10)*(11**12), 3*(7**5)*(11**13)*17) == 3*(7**5)*(11**12)
    
    return 'tests pass'
    
tests()
Out[15]:
'tests pass'

The program is mostly straightforward, but relies on the correctness of these arguments:

  • Are we justified in taking combinations without replacement from the table? In other words, are we sure there are no solutions of the form $A^x + A^x = C^z$? Yes, we can be sure, because then $2 A^x = C^z$, and all the factors of $A$ would also be factors of $C$. (The one base with no prime factors is $A = 1$, but then the equation becomes $2 = C^z$, which has no solution with $z > 2$.)

  • Are we justified in having a single value for each key in the Czroots table? Consider that $81 = 3^4 = 9^2$. We put {81: 3} in the table and discard {81: 9}, because any number that has 9 as a factor will always have 3 as a factor as well, so 3 is all we need to know. But what if a number could be formed with two bases where neither was a multiple of the other? For example, what if $2^7 = 5^3 = s$; then wouldn't we have to have both 2 and 5 as values for $s$ in the table? Fortunately, that can never happen, because of the fundamental theorem of arithmetic.

  • Could there be a rounding error involving the exponent function that was not caught by the tests? Possibly; but exponent is not used to find counterexamples, only to print them, so any such error wouldn't cause us to miss a counterexample.

  • Are we justified in only considering exponents that are odd primes, or the number 4? In one sense, yes, because when we consider the two terms $A^{qp}$ and $(A^q)^p$, we find they are always equal, and always have the same prime factors (the factors of $A$), so for the purposes of the Beal problem, they are equivalent, and we only need consider one of them. In another sense, there is a difference. With this optimization, when we run beal(6, 10), we are no longer testing $512$ as a value of $A$ or $B$, even though $512 = 2^9$ and both $2$ and $9$ are within range, because the program chooses to express $512$ as $8^3$, and $8$ is not in the specified range. So the program is still correctly searching for counterexamples, but the space that it searches for given max_A and max_x is different with this optimization.

  • Are we really sure that when $A$ and $B$ have a common factor greater than 1, then $C$ also shares that common factor? Yes, because if $p$ is a factor of both $A$ and $B$, then it is a factor of $A^x + B^y$, and since we know this is equal to $C^z$, then $p$ must also be a factor of $C^z$, and thus a factor of $C$.
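On the rounding concern about exponent raised in the list above: if that ever became a worry, the exponent could be recovered with integer arithmetic alone. The following is a sketch of one possible alternative, not part of the program; the name exact_exponent is hypothetical, and it is slower than the log-based version because it performs one big-integer division per unit of the exponent.

```python
def exact_exponent(Cz, C):
    """Return z such that C ** z == Cz, using only integer arithmetic.
    Hypothetical alternative to exponent(); keeps the same convention
    of returning 3 for exact_exponent(1, 1)."""
    if Cz == C == 1:
        return 3
    assert C > 1, 'base must be greater than 1'
    z = 0
    # Divide out one factor of C at a time, counting the divisions.
    while Cz > 1 and Cz % C == 0:
        Cz //= C
        z += 1
    assert Cz == 1, 'Cz is not an exact power of C'
    return z

print(exact_exponent(10 ** 5, 10), exact_exponent(3 ** 10000, 3))
# 5 10000
```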

Faster Arithmetic (mod p)

Arithmetic is slow with integers that have thousands of digits. If we want to explore much further, we'll have to make the program more efficient. An obvious improvement would be to do all the arithmetic modulo some prime number $p$ that fits in one word. Then we know:

$$\mbox{if} ~~ A^x + B^y = C^z ~~ \mbox{then} ~~ A^x (\mbox{mod} ~ p) + B^y (\mbox{mod} ~ p) = C^z \;(\mbox{mod} ~ p)$$

So we can do efficient tests modulo $p$, and then do the full arithmetic only for combinations that work modulo $p$. Unfortunately there will be collisions (two numbers that are distinct, but are equal mod $p$), so the tables will have to have lists of values. Here is a simple, unoptimized implementation:

In [16]:
from math        import gcd
from itertools   import combinations, product
from collections import defaultdict
                            
def beal_modp(max_A, max_x, p=2**31-1):
    """See if any A ** x + B ** y equals some C ** z (mod p), with gcd(A, B) == 1.
    If so, verify that the equation works without the (mod p).
    Consider any 1 <= A,B <= max_A and x,y <= max_x, with x,y prime or 4."""
    assert p >= max_A
    Apowers = make_Apowers_modp(max_A, max_x, p)
    Czroots = make_Czroots_modp(Apowers)
    for (A, B) in combinations(Apowers, 2):
        if gcd(A, B) == 1:
            for (Axp, x), (Byp, y) in product(Apowers[A], Apowers[B]):  
                Czp = (Axp + Byp) % p  # reduce the sum so it can match a key in Czroots
                if Czp in Czroots:
                    lhs = A ** x + B ** y
                    for (C, z) in Czroots[Czp]:
                        if lhs == C ** z:
                            print('{} ** {} + {} ** {} == {} ** {} == {}'
                                  .format(A, x, B, y, C, z, C ** z))                        
                    

def make_Apowers_modp(max_A, max_x, p): 
    "A dict of {A: [(A**3 (mod p), 3), (A**4 (mod p), 4), ...]}."
    exponents = exponents_upto(max_x)
    return {A: [(pow(A, x, p), x) for x in (exponents if (A != 1) else [3])]
            for A in range(1, max_A+1)}

def make_Czroots_modp(Apowers): 
    "A dict of {C**z (mod p): [(C, z),...]}"
    Czroots = defaultdict(list)
    for A in Apowers:
        for (Axp, x) in Apowers[A]:
            Czroots[Axp].append((A, x))
    return Czroots 

Here we see that each entry in the Apowers table is a list of (A**x (mod p), x) pairs. For example, $6^7 = 279,936$, so in our (mod 1000) table we have the pair (936, 7) under 6.

In [17]:
Apowers = make_Apowers_modp(6, 10, 1000)
Apowers
Out[17]:
{1: [(1, 3)],
 2: [(8, 3), (16, 4), (32, 5), (128, 7)],
 3: [(27, 3), (81, 4), (243, 5), (187, 7)],
 4: [(64, 3), (256, 4), (24, 5), (384, 7)],
 5: [(125, 3), (625, 4), (125, 5), (125, 7)],
 6: [(216, 3), (296, 4), (776, 5), (936, 7)]}

And each item in the Czroots table is of the form {C**z (mod p): [(C, z), ...]}. For example, 936: [(6, 7)].

In [18]:
make_Czroots_modp(Apowers)
Out[18]:
defaultdict(list,
            {1: [(1, 3)],
             8: [(2, 3)],
             16: [(2, 4)],
             24: [(4, 5)],
             27: [(3, 3)],
             32: [(2, 5)],
             64: [(4, 3)],
             81: [(3, 4)],
             125: [(5, 3), (5, 5), (5, 7)],
             128: [(2, 7)],
             187: [(3, 7)],
             216: [(6, 3)],
             243: [(3, 5)],
             256: [(4, 4)],
             296: [(6, 4)],
             384: [(4, 7)],
             625: [(5, 4)],
             776: [(6, 5)],
             936: [(6, 7)]})
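A few one-line checks may make the tables easier to trust. This sketch reuses the toy modulus 1000 from the tables above (which is not prime and is chosen only for readability) and relies on Python's built-in three-argument pow.

```python
p = 1000   # same toy modulus as the tables above (not prime; illustration only)

# A true equation stays true after reducing both sides mod p.
assert 3 ** 3 + 6 ** 3 == 3 ** 5
assert (pow(3, 3, p) + pow(6, 3, p)) % p == pow(3, 5, p)

# Collision: distinct powers can share a residue, hence the lists in Czroots.
assert 5 ** 3 != 5 ** 5 and pow(5, 3, p) == pow(5, 5, p) == 125

# The sum of two residues can reach p or more, so it has to be reduced
# again mod p before it can match a key in the table.
total = pow(6, 5, p) + pow(6, 7, p)
print(total, total % p)
# 1712 712
```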

Let's run the program:

In [19]:
%time beal_modp(500, 100)
CPU times: user 9 s, sys: 145 ms, total: 9.14 s
Wall time: 9.27 s

This is a bit faster than the previous version, and the idea is that as we start dealing with much larger integers, this version will be even faster, relatively. I could improve this version by caching certain computations, managing the memory layout better, moving some computations out of loops, considering using multiple primes (as in a Bloom filter), finding a way to parallelize the program, and re-coding in a faster compiled language (such as C++ or Go or Julia). Then I could invest thousands (or millions) of CPU hours searching for counterexamples.
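The multiple-primes idea might look roughly like the sketch below. The helper name may_be_equal is hypothetical, and this is only an outline of the filtering step, not the table-driven search itself: an equation that fails modulo any one prime is certainly false, so each extra prime removes more false positives before the expensive exact check.

```python
def may_be_equal(A, x, B, y, C, z, primes):
    """Necessary (not sufficient) test for A**x + B**y == C**z:
    the equation must hold modulo every prime in primes.
    Hypothetical helper sketching the multiple-primes idea."""
    return all((pow(A, x, p) + pow(B, y, p)) % p == pow(C, z, p)
               for p in primes)

# 2**3 + 3**3 == 35 and 2**5 == 32 differ by 3, so they collide mod 3 ...
print(may_be_equal(2, 3, 3, 3, 2, 5, [3]))      # True  (a false positive)
# ... but adding a second prime exposes the difference.
print(may_be_equal(2, 3, 3, 3, 2, 5, [3, 5]))   # False
# A genuinely true equation passes for every prime.
print(may_be_equal(3, 3, 6, 3, 3, 5, [3, 5, 2**31 - 1]))   # True
```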

But Witold Jarnicki and David Konerding already did that: they wrote a C++ program that built a table of $C^z \;(\mbox{mod} \; p)$ up to $5000^{5000}$, and, in parallel across thousands of machines, searched for $A, B$ up to 200,000 and $x, y$ up to 5,000, but found no counterexamples. On a smaller scale, Edwin P. Berlin Jr. searched all $C^z$ up to $10^{17}$ and also found nothing. So I don't think it is worthwhile to continue on that path.

Conclusion

This was fun, but I can't recommend anyone spend a serious amount of computer time looking for counterexamples to the Beal Conjecture—the money you invest in computer time would be more than the expected value of your prize winnings. I suggest you work on a proof rather than a counterexample, or work on some other interesting problem instead!