[java] Fastest way to determine if an integer's square root is an integer

I'm looking for the fastest way to determine if a long value is a perfect square (i.e. its square root is another integer):

  1. I've done it the easy way, by using the built-in Math.sqrt() function, but I'm wondering if there is a way to do it faster by restricting yourself to integer-only domain.
  2. Maintaining a lookup table is impractical (since there are about 231.5 integers whose square is less than 263).

Here is the very simple and straightforward way I'm doing it now:

public final static boolean isPerfectSquare(long n)
{
  if (n < 0)
    return false;

  long tst = (long)(Math.sqrt(n) + 0.5);
  return tst*tst == n;
}

Note: I'm using this function in many Project Euler problems. So no one else will ever have to maintain this code. And this kind of micro-optimization could actually make a difference, since part of the challenge is to do every algorithm in less than a minute, and this function will need to be called millions of times in some problems.


I've tried the different solutions to the problem:

  • After exhaustive testing, I found that adding 0.5 to the result of Math.sqrt() is not necessary, at least not on my machine.
  • The fast inverse square root was faster, but it gave incorrect results for n >= 410881. However, as suggested by BobbyShaftoe, we can use the FISR hack for n < 410881.
  • Newton's method was a good bit slower than Math.sqrt(). This is probably because Math.sqrt() uses something similar to Newton's Method, but implemented in the hardware so it's much faster than in Java. Also, Newton's Method still required use of doubles.
  • A modified Newton's method, which used a few tricks so that only integer math was involved, required some hacks to avoid overflow (I want this function to work with all positive 64-bit signed integers), and it was still slower than Math.sqrt().
  • Binary chop was even slower. This makes sense because the binary chop will on average require 16 passes to find the square root of a 64-bit number.
  • According to John's tests, using or statements is faster in C++ than using a switch, but in Java and C# there appears to be no difference between or and switch.
  • I also tried making a lookup table (as a private static array of 64 boolean values). Then instead of either switch or or statement, I would just say if(lookup[(int)(n&0x3F)]) { test } else return false;. To my surprise, this was (just slightly) slower. This is because array bounds are checked in Java.

This question is related to java math optimization perfect-square

The answer is


If you do a binary chop to try to find the "right" square root, you can fairly easily detect if the value you've got is close enough to tell:

(n+1)^2 = n^2 + 2n + 1
(n-1)^2 = n^2 - 2n + 1

So having calculated n^2, the options are:

  • n^2 = target: done, return true
  • n^2 + 2n + 1 > target > n^2 : you're close, but it's not perfect: return false
  • n^2 - 2n + 1 < target < n^2 : ditto
  • target < n^2 - 2n + 1 : binary chop on a lower n
  • target > n^2 + 2n + 1 : binary chop on a higher n

(Sorry, this uses n as your current guess, and target for the parameter. Apologise for the confusion!)

I don't know whether this will be faster or not, but it's worth a try.

EDIT: The binary chop doesn't have to take in the whole range of integers, either (2^x)^2 = 2^(2x), so once you've found the top set bit in your target (which can be done with a bit-twiddling trick; I forget exactly how) you can quickly get a range of potential answers. Mind you, a naive binary chop is still only going to take up to 31 or 32 iterations.


If you do a binary chop to try to find the "right" square root, you can fairly easily detect if the value you've got is close enough to tell:

(n+1)^2 = n^2 + 2n + 1
(n-1)^2 = n^2 - 2n + 1

So having calculated n^2, the options are:

  • n^2 = target: done, return true
  • n^2 + 2n + 1 > target > n^2 : you're close, but it's not perfect: return false
  • n^2 - 2n + 1 < target < n^2 : ditto
  • target < n^2 - 2n + 1 : binary chop on a lower n
  • target > n^2 + 2n + 1 : binary chop on a higher n

(Sorry, this uses n as your current guess, and target for the parameter. Apologise for the confusion!)

I don't know whether this will be faster or not, but it's worth a try.

EDIT: The binary chop doesn't have to take in the whole range of integers, either (2^x)^2 = 2^(2x), so once you've found the top set bit in your target (which can be done with a bit-twiddling trick; I forget exactly how) you can quickly get a range of potential answers. Mind you, a naive binary chop is still only going to take up to 31 or 32 iterations.


The sqrt call is not perfectly accurate, as has been mentioned, but it's interesting and instructive that it doesn't blow away the other answers in terms of speed. After all, the sequence of assembly language instructions for a sqrt is tiny. Intel has a hardware instruction, which isn't used by Java I believe because it doesn't conform to IEEE.

So why is it slow? Because Java is actually calling a C routine through JNI, and it's actually slower to do so than to call a Java subroutine, which itself is slower than doing it inline. This is very annoying, and Java should have come up with a better solution, ie building in floating point library calls if necessary. Oh well.

In C++, I suspect all the complex alternatives would lose on speed, but I haven't checked them all. What I did, and what Java people will find usefull, is a simple hack, an extension of the special case testing suggested by A. Rex. Use a single long value as a bit array, which isn't bounds checked. That way, you have 64 bit boolean lookup.

typedef unsigned long long UVLONG
UVLONG pp1,pp2;

void init2() {
  for (int i = 0; i < 64; i++) {
    for (int j = 0; j < 64; j++)
      if (isPerfectSquare(i * 64 + j)) {
    pp1 |= (1 << j);
    pp2 |= (1 << i);
    break;
      }
   }
   cout << "pp1=" << pp1 << "," << pp2 << "\n";  
}


inline bool isPerfectSquare5(UVLONG x) {
  return pp1 & (1 << (x & 0x3F)) ? isPerfectSquare(x) : false;
}

The routine isPerfectSquare5 runs in about 1/3 the time on my core2 duo machine. I suspect that further tweaks along the same lines could reduce the time further on average, but every time you check, you are trading off more testing for more eliminating, so you can't go too much farther on that road.

Certainly, rather than having a separate test for negative, you could check the high 6 bits the same way.

Note that all I'm doing is eliminating possible squares, but when I have a potential case I have to call the original, inlined isPerfectSquare.

The init2 routine is called once to initialize the static values of pp1 and pp2. Note that in my implementation in C++, I'm using unsigned long long, so since you're signed, you'd have to use the >>> operator.

There is no intrinsic need to bounds check the array, but Java's optimizer has to figure this stuff out pretty quickly, so I don't blame them for that.


It's been pointed out that the last d digits of a perfect square can only take on certain values. The last d digits (in base b) of a number n is the same as the remainder when n is divided by bd, ie. in C notation n % pow(b, d).

This can be generalized to any modulus m, ie. n % m can be used to rule out some percentage of numbers from being perfect squares. The modulus you are currently using is 64, which allows 12, ie. 19% of remainders, as possible squares. With a little coding I found the modulus 110880, which allows only 2016, ie. 1.8% of remainders as possible squares. So depending on the cost of a modulus operation (ie. division) and a table lookup versus a square root on your machine, using this modulus might be faster.

By the way if Java has a way to store a packed array of bits for the lookup table, don't use it. 110880 32-bit words is not much RAM these days and fetching a machine word is going to be faster than fetching a single bit.


It ought to be possible to pack the 'cannot be a perfect square if the last X digits are N' much more efficiently than that! I'll use java 32 bit ints, and produce enough data to check the last 16 bits of the number - that's 2048 hexadecimal int values.

...

Ok. Either I have run into some number theory that is a little beyond me, or there is a bug in my code. In any case, here is the code:

public static void main(String[] args) {
    final int BITS = 16;

    BitSet foo = new BitSet();

    for(int i = 0; i< (1<<BITS); i++) {
        int sq = (i*i);
        sq = sq & ((1<<BITS)-1);
        foo.set(sq);
    }

    System.out.println("int[] mayBeASquare = {");

    for(int i = 0; i< 1<<(BITS-5); i++) {
        int kk = 0;
        for(int j = 0; j<32; j++) {
            if(foo.get((i << 5) | j)) {
                kk |= 1<<j;
            }
        }
        System.out.print("0x" + Integer.toHexString(kk) + ", ");
        if(i%8 == 7) System.out.println();
    }
    System.out.println("};");
}

and here are the results:

(ed: elided for poor performance in prettify.js; view revision history to see.)


Don't know about fastest, but the simplest is to take the square root in the normal fashion, multiply the result by itself, and see if it matches your original value.

Since we're talking integers here, the fasted would probably involve a collection where you can just make a lookup.


Here is the simplest and most concise way, although I do not know how it compares in terms of CPU cycles. This works great if you only wish to know if the root is a whole number. If you really care if it is an integer, you can also figure that out. Here is a simple (and pure) function:

private static final MathContext precision = new MathContext(20);

private static final Function<Long, Boolean> isRootWhole = (n) -> {
    long digit = n % 10;
    if (digit == 2 || digit == 3 || digit == 7 || digit == 8) {
        return false;
    }
    return new BigDecimal(n).sqrt(precision).scale() == 0;
};

If you do not need micro-optimization, this answer is better in terms of simplicity and maintainability. If you will be calculating negative numbers, you will need to handle that accordingly, and send the absolute value into the function. I have included a minor optimization because no perfect squares have a tens digit of 2, 3, 7, or 8 due to quadratic residues mod 10.

On my CPU, a run of this algorithm on 0 - 10,000,000 took an average of 1000 - 1100 nanoseconds per calculation.

If you are performing a lesser number of calculations, the earlier calculations take a bit longer.

I had a negative comment that my previous edit did not work for large numbers. The OP mentioned Longs, and the largest perfect square that is a Long is 9223372030926249001, so this method works for all Longs.


If speed is a concern, why not partition off the most commonly used set of inputs and their values to a lookup table and then do whatever optimized magic algorithm you have come up with for the exceptional cases?


I ran my own analysis of several of the algorithms in this thread and came up with some new results. You can see those old results in the edit history of this answer, but they're not accurate, as I made a mistake, and wasted time analyzing several algorithms which aren't close. However, pulling lessons from several different answers, I now have two algorithms that crush the "winner" of this thread. Here's the core thing I do differently than everyone else:

// This is faster because a number is divisible by 2^4 or more only 6% of the time
// and more than that a vanishingly small percentage.
while((x & 0x3) == 0) x >>= 2;
// This is effectively the same as the switch-case statement used in the original
// answer. 
if((x & 0x7) != 1) return false;

However, this simple line, which most of the time adds one or two very fast instructions, greatly simplifies the switch-case statement into one if statement. However, it can add to the runtime if many of the tested numbers have significant power-of-two factors.

The algorithms below are as follows:

  • Internet - Kip's posted answer
  • Durron - My modified answer using the one-pass answer as a base
  • DurronTwo - My modified answer using the two-pass answer (by @JohnnyHeggheim), with some other slight modifications.

Here is a sample runtime if the numbers are generated using Math.abs(java.util.Random.nextLong())

 0% Scenario{vm=java, trial=0, benchmark=Internet} 39673.40 ns; ?=378.78 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=Durron} 37785.75 ns; ?=478.86 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 35978.10 ns; ?=734.10 ns @ 10 trials

benchmark   us linear runtime
 Internet 39.7 ==============================
   Durron 37.8 ============================
DurronTwo 36.0 ===========================

vm: java
trial: 0

And here is a sample runtime if it's run on the first million longs only:

 0% Scenario{vm=java, trial=0, benchmark=Internet} 2933380.84 ns; ?=56939.84 ns @ 10 trials
33% Scenario{vm=java, trial=0, benchmark=Durron} 2243266.81 ns; ?=50537.62 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 3159227.68 ns; ?=10766.22 ns @ 3 trials

benchmark   ms linear runtime
 Internet 2.93 ===========================
   Durron 2.24 =====================
DurronTwo 3.16 ==============================

vm: java
trial: 0

As you can see, DurronTwo does better for large inputs, because it gets to use the magic trick very very often, but gets clobbered compared to the first algorithm and Math.sqrt because the numbers are so much smaller. Meanwhile, the simpler Durron is a huge winner because it never has to divide by 4 many many times in the first million numbers.

Here's Durron:

public final static boolean isPerfectSquareDurron(long n) {
    if(n < 0) return false;
    if(n == 0) return true;

    long x = n;
    // This is faster because a number is divisible by 16 only 6% of the time
    // and more than that a vanishingly small percentage.
    while((x & 0x3) == 0) x >>= 2;
    // This is effectively the same as the switch-case statement used in the original
    // answer. 
    if((x & 0x7) == 1) {

        long sqrt;
        if(x < 410881L)
        {
            int i;
            float x2, y;

            x2 = x * 0.5F;
            y  = x;
            i  = Float.floatToRawIntBits(y);
            i  = 0x5f3759df - ( i >> 1 );
            y  = Float.intBitsToFloat(i);
            y  = y * ( 1.5F - ( x2 * y * y ) );

            sqrt = (long)(1.0F/y);
        } else {
            sqrt = (long) Math.sqrt(x);
        }
        return sqrt*sqrt == x;
    }
    return false;
}

And DurronTwo

public final static boolean isPerfectSquareDurronTwo(long n) {
    if(n < 0) return false;
    // Needed to prevent infinite loop
    if(n == 0) return true;

    long x = n;
    while((x & 0x3) == 0) x >>= 2;
    if((x & 0x7) == 1) {
        long sqrt;
        if (x < 41529141369L) {
            int i;
            float x2, y;

            x2 = x * 0.5F;
            y = x;
            i = Float.floatToRawIntBits(y);
            //using the magic number from 
            //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
            //since it more accurate
            i = 0x5f375a86 - (i >> 1);
            y = Float.intBitsToFloat(i);
            y = y * (1.5F - (x2 * y * y));
            y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate
            sqrt = (long) ((1.0F/y) + 0.2);
        } else {
            //Carmack hack gives incorrect answer for n >= 41529141369.
            sqrt = (long) Math.sqrt(x);
        }
        return sqrt*sqrt == x;
    }
    return false;
}

And my benchmark harness: (Requires Google caliper 0.1-rc5)

public class SquareRootBenchmark {
    public static class Benchmark1 extends SimpleBenchmark {
        private static final int ARRAY_SIZE = 10000;
        long[] trials = new long[ARRAY_SIZE];

        @Override
        protected void setUp() throws Exception {
            Random r = new Random();
            for (int i = 0; i < ARRAY_SIZE; i++) {
                trials[i] = Math.abs(r.nextLong());
            }
        }


        public int timeInternet(int reps) {
            int trues = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < ARRAY_SIZE; j++) {
                    if(SquareRootAlgs.isPerfectSquareInternet(trials[j])) trues++;
                }
            }

            return trues;   
        }

        public int timeDurron(int reps) {
            int trues = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < ARRAY_SIZE; j++) {
                    if(SquareRootAlgs.isPerfectSquareDurron(trials[j])) trues++;
                }
            }

            return trues;   
        }

        public int timeDurronTwo(int reps) {
            int trues = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < ARRAY_SIZE; j++) {
                    if(SquareRootAlgs.isPerfectSquareDurronTwo(trials[j])) trues++;
                }
            }

            return trues;   
        }
    }

    public static void main(String... args) {
        Runner.main(Benchmark1.class, args);
    }
}

UPDATE: I've made a new algorithm that is faster in some scenarios, slower in others, I've gotten different benchmarks based on different inputs. If we calculate modulo 0xFFFFFF = 3 x 3 x 5 x 7 x 13 x 17 x 241, we can eliminate 97.82% of numbers that cannot be squares. This can be (sort of) done in one line, with 5 bitwise operations:

if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false;

The resulting index is either 1) the residue, 2) the residue + 0xFFFFFF, or 3) the residue + 0x1FFFFFE. Of course, we need to have a lookup table for residues modulo 0xFFFFFF, which is about a 3mb file (in this case stored as ascii text decimal numbers, not optimal but clearly improvable with a ByteBuffer and so forth. But since that is precalculation it doesn't matter so much. You can find the file here (or generate it yourself):

public final static boolean isPerfectSquareDurronThree(long n) {
    if(n < 0) return false;
    if(n == 0) return true;

    long x = n;
    while((x & 0x3) == 0) x >>= 2;
    if((x & 0x7) == 1) {
        if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false;
        long sqrt;
        if(x < 410881L)
        {
            int i;
            float x2, y;

            x2 = x * 0.5F;
            y  = x;
            i  = Float.floatToRawIntBits(y);
            i  = 0x5f3759df - ( i >> 1 );
            y  = Float.intBitsToFloat(i);
            y  = y * ( 1.5F - ( x2 * y * y ) );

            sqrt = (long)(1.0F/y);
        } else {
            sqrt = (long) Math.sqrt(x);
        }
        return sqrt*sqrt == x;
    }
    return false;
}

I load it into a boolean array like this:

private static boolean[] goodLookupSquares = null;

public static void initGoodLookupSquares() throws Exception {
    Scanner s = new Scanner(new File("24residues_squares.txt"));

    goodLookupSquares = new boolean[0x1FFFFFE];

    while(s.hasNextLine()) {
        int residue = Integer.valueOf(s.nextLine());
        goodLookupSquares[residue] = true;
        goodLookupSquares[residue + 0xFFFFFF] = true;
        goodLookupSquares[residue + 0x1FFFFFE] = true;
    }

    s.close();
}

Example runtime. It beat Durron (version one) in every trial I ran.

 0% Scenario{vm=java, trial=0, benchmark=Internet} 40665.77 ns; ?=566.71 ns @ 10 trials
33% Scenario{vm=java, trial=0, benchmark=Durron} 38397.60 ns; ?=784.30 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=DurronThree} 36171.46 ns; ?=693.02 ns @ 10 trials

  benchmark   us linear runtime
   Internet 40.7 ==============================
     Durron 38.4 ============================
DurronThree 36.2 ==========================

vm: java
trial: 0

Here is a divide and conquer solution.

If the square root of a natural number (number) is a natural number (solution), you can easily determine a range for solution based on the number of digits of number:

  • number has 1 digit: solution in range = 1 - 4
  • number has 2 digits: solution in range = 3 - 10
  • number has 3 digits: solution in range = 10 - 40
  • number has 4 digits: solution in range = 30 - 100
  • number has 5 digits: solution in range = 100 - 400

Notice the repetition?

You can use this range in a binary search approach to see if there is a solution for which:

number == solution * solution

Here is the code

Here is my class SquareRootChecker

public class SquareRootChecker {

    private long number;
    private long initialLow;
    private long initialHigh;

    public SquareRootChecker(long number) {
        this.number = number;

        initialLow = 1;
        initialHigh = 4;
        if (Long.toString(number).length() % 2 == 0) {
            initialLow = 3;
            initialHigh = 10;
        }
        for (long i = 0; i < Long.toString(number).length() / 2; i++) {
            initialLow *= 10;
            initialHigh *= 10;
        }
        if (Long.toString(number).length() % 2 == 0) {
            initialLow /= 10;
            initialHigh /=10;
        }
    }

    public boolean checkSquareRoot() {
        return findSquareRoot(initialLow, initialHigh, number);
    }

    private boolean findSquareRoot(long low, long high, long number) {
        long check = low + (high - low) / 2;
        if (high >= low) {
            if (number == check * check) {
                return true;
            }
            else if (number < check * check) {
                high = check - 1;
                return findSquareRoot(low, high, number);
            }
            else  {
                low = check + 1;
                return findSquareRoot(low, high, number);
            }
        }
        return false;
    }

}

And here is an example on how to use it.

long number =  1234567;
long square = number * number;
SquareRootChecker squareRootChecker = new SquareRootChecker(square);
System.out.println(square + ": " + squareRootChecker.checkSquareRoot()); //Prints "1524155677489: true"

long notSquare = square + 1;
squareRootChecker = new SquareRootChecker(notSquare);
System.out.println(notSquare + ": " + squareRootChecker.checkSquareRoot()); //Prints "1524155677490: false"

The following simplification of maaartinus's solution appears to shave a few percentage points off the runtime, but I'm not good enough at benchmarking to produce a benchmark I can trust:

long goodMask; // 0xC840C04048404040 computed below
{
    for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i);
}

public boolean isSquare(long x) {
    // This tests if the 6 least significant bits are right.
    // Moving the to be tested bit to the highest position saves us masking.
    if (goodMask << x >= 0) return false;
    // Remove an even number of trailing zeros, leaving at most one.
    x >>= (Long.numberOfTrailingZeros(x) & (-2);
    // Repeat the test on the 6 least significant remaining bits.
    if (goodMask << x >= 0 | x <= 0) return x == 0;
    // Do it in the classical way.
    // The correctness is not trivial as the conversion from long to double is lossy!
    final long tst = (long) Math.sqrt(x);
    return tst * tst == x;
}

It would be worth checking how omitting the first test,

if (goodMask << x >= 0) return false;

would affect performance.


For performance, you very often have to do some compromsies. Others have expressed various methods, however, you noted Carmack's hack was faster up to certain values of N. Then, you should check the "n" and if it is less than that number N, use Carmack's hack, else use some other method described in the answers here.


Here is the simplest and most concise way, although I do not know how it compares in terms of CPU cycles. This works great if you only wish to know if the root is a whole number. If you really care if it is an integer, you can also figure that out. Here is a simple (and pure) function:

private static final MathContext precision = new MathContext(20);

private static final Function<Long, Boolean> isRootWhole = (n) -> {
    long digit = n % 10;
    if (digit == 2 || digit == 3 || digit == 7 || digit == 8) {
        return false;
    }
    return new BigDecimal(n).sqrt(precision).scale() == 0;
};

If you do not need micro-optimization, this answer is better in terms of simplicity and maintainability. If you will be calculating negative numbers, you will need to handle that accordingly, and send the absolute value into the function. I have included a minor optimization because no perfect squares have a tens digit of 2, 3, 7, or 8 due to quadratic residues mod 10.

On my CPU, a run of this algorithm on 0 - 10,000,000 took an average of 1000 - 1100 nanoseconds per calculation.

If you are performing a lesser number of calculations, the earlier calculations take a bit longer.

I had a negative comment that my previous edit did not work for large numbers. The OP mentioned Longs, and the largest perfect square that is a Long is 9223372030926249001, so this method works for all Longs.


Just for the record, another approach is to use the prime decomposition. If every factor of the decomposition is even, then the number is a perfect square. So what you want is to see if a number can be decomposed as a product of squares of prime numbers. Of course, you don't need to obtain such a decomposition, just to see if it exists.

First build a table of squares of prime numbers which are lower than 2^32. This is far smaller than a table of all integers up to this limit.

A solution would then be like this:

boolean isPerfectSquare(long number)
{
    if (number < 0) return false;
    if (number < 2) return true;

    for (int i = 0; ; i++)
    {
        long square = squareTable[i];
        if (square > number) return false;
        while (number % square == 0)
        {
            number /= square;
        }
        if (number == 1) return true;
    }
}

I guess it's a bit cryptic. What it does is checking in every step that the square of a prime number divide the input number. If it does then it divides the number by the square as long as it is possible, to remove this square from the prime decomposition. If by this process, we came to 1, then the input number was a decomposition of square of prime numbers. If the square becomes larger than the number itself, then there is no way this square, or any larger squares, can divide it, so the number can not be a decomposition of squares of prime numbers.

Given nowadays' sqrt done in hardware and the need to compute prime numbers here, I guess this solution is way slower. But it should give better results than solution with sqrt which won't work over 2^54, as says mrzl in his answer.


Don't know about fastest, but the simplest is to take the square root in the normal fashion, multiply the result by itself, and see if it matches your original value.

Since we're talking integers here, the fasted would probably involve a collection where you can just make a lookup.


"I'm looking for the fastest way to determine if a long value is a perfect square (i.e. its square root is another integer)."

The answers are impressive, but I failed to see a simple check :

check whether the first number on the right of the long it a member of the set (0,1,4,5,6,9) . If it is not, then it cannot possibly be a 'perfect square' .

eg.

4567 - cannot be a perfect square.


I ran my own analysis of several of the algorithms in this thread and came up with some new results. You can see those old results in the edit history of this answer, but they're not accurate, as I made a mistake, and wasted time analyzing several algorithms which aren't close. However, pulling lessons from several different answers, I now have two algorithms that crush the "winner" of this thread. Here's the core thing I do differently than everyone else:

// This is faster because a number is divisible by 2^4 or more only 6% of the time
// and more than that a vanishingly small percentage.
while((x & 0x3) == 0) x >>= 2;
// This is effectively the same as the switch-case statement used in the original
// answer. 
if((x & 0x7) != 1) return false;

However, this simple line, which most of the time adds one or two very fast instructions, greatly simplifies the switch-case statement into one if statement. However, it can add to the runtime if many of the tested numbers have significant power-of-two factors.

The algorithms below are as follows:

  • Internet - Kip's posted answer
  • Durron - My modified answer using the one-pass answer as a base
  • DurronTwo - My modified answer using the two-pass answer (by @JohnnyHeggheim), with some other slight modifications.

Here is a sample runtime if the numbers are generated using Math.abs(java.util.Random.nextLong())

 0% Scenario{vm=java, trial=0, benchmark=Internet} 39673.40 ns; ?=378.78 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=Durron} 37785.75 ns; ?=478.86 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 35978.10 ns; ?=734.10 ns @ 10 trials

benchmark   us linear runtime
 Internet 39.7 ==============================
   Durron 37.8 ============================
DurronTwo 36.0 ===========================

vm: java
trial: 0

And here is a sample runtime if it's run on the first million longs only:

 0% Scenario{vm=java, trial=0, benchmark=Internet} 2933380.84 ns; ?=56939.84 ns @ 10 trials
33% Scenario{vm=java, trial=0, benchmark=Durron} 2243266.81 ns; ?=50537.62 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 3159227.68 ns; ?=10766.22 ns @ 3 trials

benchmark   ms linear runtime
 Internet 2.93 ===========================
   Durron 2.24 =====================
DurronTwo 3.16 ==============================

vm: java
trial: 0

As you can see, DurronTwo does better for large inputs, because it gets to use the magic trick very very often, but gets clobbered compared to the first algorithm and Math.sqrt because the numbers are so much smaller. Meanwhile, the simpler Durron is a huge winner because it never has to divide by 4 many many times in the first million numbers.

Here's Durron:

public final static boolean isPerfectSquareDurron(long n) {
    if(n < 0) return false;
    if(n == 0) return true;

    long x = n;
    // This is faster because a number is divisible by 16 only 6% of the time
    // and more than that a vanishingly small percentage.
    while((x & 0x3) == 0) x >>= 2;
    // This is effectively the same as the switch-case statement used in the original
    // answer. 
    if((x & 0x7) == 1) {

        long sqrt;
        if(x < 410881L)
        {
            int i;
            float x2, y;

            x2 = x * 0.5F;
            y  = x;
            i  = Float.floatToRawIntBits(y);
            i  = 0x5f3759df - ( i >> 1 );
            y  = Float.intBitsToFloat(i);
            y  = y * ( 1.5F - ( x2 * y * y ) );

            sqrt = (long)(1.0F/y);
        } else {
            sqrt = (long) Math.sqrt(x);
        }
        return sqrt*sqrt == x;
    }
    return false;
}

And DurronTwo

public final static boolean isPerfectSquareDurronTwo(long n) {
    if(n < 0) return false;
    // Needed to prevent infinite loop
    if(n == 0) return true;

    long x = n;
    while((x & 0x3) == 0) x >>= 2;
    if((x & 0x7) == 1) {
        long sqrt;
        if (x < 41529141369L) {
            int i;
            float x2, y;

            x2 = x * 0.5F;
            y = x;
            i = Float.floatToRawIntBits(y);
            //using the magic number from 
            //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
            //since it more accurate
            i = 0x5f375a86 - (i >> 1);
            y = Float.intBitsToFloat(i);
            y = y * (1.5F - (x2 * y * y));
            y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate
            sqrt = (long) ((1.0F/y) + 0.2);
        } else {
            //Carmack hack gives incorrect answer for n >= 41529141369.
            sqrt = (long) Math.sqrt(x);
        }
        return sqrt*sqrt == x;
    }
    return false;
}

And my benchmark harness: (Requires Google caliper 0.1-rc5)

public class SquareRootBenchmark {
    public static class Benchmark1 extends SimpleBenchmark {
        private static final int ARRAY_SIZE = 10000;
        long[] trials = new long[ARRAY_SIZE];

        @Override
        protected void setUp() throws Exception {
            Random r = new Random();
            for (int i = 0; i < ARRAY_SIZE; i++) {
                trials[i] = Math.abs(r.nextLong());
            }
        }


        public int timeInternet(int reps) {
            int trues = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < ARRAY_SIZE; j++) {
                    if(SquareRootAlgs.isPerfectSquareInternet(trials[j])) trues++;
                }
            }

            return trues;   
        }

        public int timeDurron(int reps) {
            int trues = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < ARRAY_SIZE; j++) {
                    if(SquareRootAlgs.isPerfectSquareDurron(trials[j])) trues++;
                }
            }

            return trues;   
        }

        public int timeDurronTwo(int reps) {
            int trues = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < ARRAY_SIZE; j++) {
                    if(SquareRootAlgs.isPerfectSquareDurronTwo(trials[j])) trues++;
                }
            }

            return trues;   
        }
    }

    public static void main(String... args) {
        Runner.main(Benchmark1.class, args);
    }
}

UPDATE: I've made a new algorithm that is faster in some scenarios, slower in others, I've gotten different benchmarks based on different inputs. If we calculate modulo 0xFFFFFF = 3 x 3 x 5 x 7 x 13 x 17 x 241, we can eliminate 97.82% of numbers that cannot be squares. This can be (sort of) done in one line, with 5 bitwise operations:

if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false;

The resulting index is either 1) the residue, 2) the residue + 0xFFFFFF, or 3) the residue + 0x1FFFFFE. Of course, we need to have a lookup table for residues modulo 0xFFFFFF, which is about a 3mb file (in this case stored as ascii text decimal numbers, not optimal but clearly improvable with a ByteBuffer and so forth. But since that is precalculation it doesn't matter so much. You can find the file here (or generate it yourself):

public final static boolean isPerfectSquareDurronThree(long n) {
    if(n < 0) return false;
    if(n == 0) return true;

    long x = n;
    while((x & 0x3) == 0) x >>= 2;
    if((x & 0x7) == 1) {
        if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false;
        long sqrt;
        if(x < 410881L)
        {
            int i;
            float x2, y;

            x2 = x * 0.5F;
            y  = x;
            i  = Float.floatToRawIntBits(y);
            i  = 0x5f3759df - ( i >> 1 );
            y  = Float.intBitsToFloat(i);
            y  = y * ( 1.5F - ( x2 * y * y ) );

            sqrt = (long)(1.0F/y);
        } else {
            sqrt = (long) Math.sqrt(x);
        }
        return sqrt*sqrt == x;
    }
    return false;
}

I load it into a boolean array like this:

private static boolean[] goodLookupSquares = null;

public static void initGoodLookupSquares() throws Exception {
    Scanner s = new Scanner(new File("24residues_squares.txt"));

    goodLookupSquares = new boolean[0x1FFFFFE];

    while(s.hasNextLine()) {
        int residue = Integer.valueOf(s.nextLine());
        goodLookupSquares[residue] = true;
        goodLookupSquares[residue + 0xFFFFFF] = true;
        goodLookupSquares[residue + 0x1FFFFFE] = true;
    }

    s.close();
}

Example runtime. It beat Durron (version one) in every trial I ran.

 0% Scenario{vm=java, trial=0, benchmark=Internet} 40665.77 ns; ?=566.71 ns @ 10 trials
33% Scenario{vm=java, trial=0, benchmark=Durron} 38397.60 ns; ?=784.30 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=DurronThree} 36171.46 ns; ?=693.02 ns @ 10 trials

  benchmark   us linear runtime
   Internet 40.7 ==============================
     Durron 38.4 ============================
DurronThree 36.2 ==========================

vm: java
trial: 0

I like the idea to use an almost correct method on some of the input. Here is a version with a higher "offset". The code seems to work and passes my simple test case.

Just replace your:

if(n < 410881L){...}

code with this one:

if (n < 11043908100L) {
    //John Carmack hack, converted to Java.
    // See: http://www.codemaestro.com/reviews/9
    int i;
    float x2, y;

    x2 = n * 0.5F;
    y = n;
    i = Float.floatToRawIntBits(y);
    //using the magic number from 
    //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
    //since it more accurate
    i = 0x5f375a86 - (i >> 1);
    y = Float.intBitsToFloat(i);
    y = y * (1.5F - (x2 * y * y));
    y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate

    sqrt = Math.round(1.0F / y);
} else {
    //Carmack hack gives incorrect answer for n >= 11043908100.
    sqrt = (long) Math.sqrt(n);
}

For performance, you very often have to do some compromsies. Others have expressed various methods, however, you noted Carmack's hack was faster up to certain values of N. Then, you should check the "n" and if it is less than that number N, use Carmack's hack, else use some other method described in the answers here.


I want this function to work with all positive 64-bit signed integers

Math.sqrt() works with doubles as input parameters, so you won't get accurate results for integers bigger than 2^53.


You should get rid of the 2-power part of N right from the start.

2nd Edit The magical expression for m below should be

m = N - (N & (N-1));

and not as written

End of 2nd edit

m = N & (N-1); // the lawest bit of N
N /= m;
byte = N & 0x0F;
if ((m % 2) || (byte !=1 && byte !=9))
  return false;

1st Edit:

Minor improvement:

m = N & (N-1); // the lawest bit of N
N /= m;
if ((m % 2) || (N & 0x07 != 1))
  return false;

End of 1st edit

Now continue as usual. This way, by the time you get to the floating point part, you already got rid of all the numbers whose 2-power part is odd (about half), and then you only consider 1/8 of whats left. I.e. you run the floating point part on 6% of the numbers.


Newton's Method with integer arithmetic

If you wish to avoid non-integer operations you could use the method below. It basically uses Newton's Method modified for integer arithmetic.

/**
 * Test if the given number is a perfect square.
 * @param n Must be greater than 0 and less
 *    than Long.MAX_VALUE.
 * @return <code>true</code> if n is a perfect
 *    square, or <code>false</code> otherwise.
 */
public static boolean isSquare(long n)
{
    long x1 = n;
    long x2 = 1L;

    while (x1 > x2)
    {
        x1 = (x1 + x2) / 2L;
        x2 = n / x1;
    }

    return x1 == x2 && n % x1 == 0L;
}

This implementation can not compete with solutions that use Math.sqrt. However, its performance can be improved by using the filtering mechanisms described in some of the other posts.


Square Root of a number, given that the number is a perfect square.

The complexity is log(n)

/**
 * Calculate square root if the given number is a perfect square.
 * 
 * Approach: Sum of n odd numbers is equals to the square root of n*n, given 
 * that n is a perfect square.
 *
 * @param number
 * @return squareRoot
 */

public static int calculateSquareRoot(int number) {

    int sum=1;
    int count =1;
    int squareRoot=1;
    while(sum<number) {
        count+=2;
        sum+=count;
        squareRoot++;
    }
    return squareRoot;
}

You should get rid of the 2-power part of N right from the start.

2nd Edit The magical expression for m below should be

m = N - (N & (N-1));

and not as written

End of 2nd edit

m = N & (N-1); // the lawest bit of N
N /= m;
byte = N & 0x0F;
if ((m % 2) || (byte !=1 && byte !=9))
  return false;

1st Edit:

Minor improvement:

m = N & (N-1); // the lawest bit of N
N /= m;
if ((m % 2) || (N & 0x07 != 1))
  return false;

End of 1st edit

Now continue as usual. This way, by the time you get to the floating point part, you already got rid of all the numbers whose 2-power part is odd (about half), and then you only consider 1/8 of whats left. I.e. you run the floating point part on 6% of the numbers.


It's been pointed out that the last d digits of a perfect square can only take on certain values. The last d digits (in base b) of a number n is the same as the remainder when n is divided by bd, ie. in C notation n % pow(b, d).

This can be generalized to any modulus m, ie. n % m can be used to rule out some percentage of numbers from being perfect squares. The modulus you are currently using is 64, which allows 12, ie. 19% of remainders, as possible squares. With a little coding I found the modulus 110880, which allows only 2016, ie. 1.8% of remainders as possible squares. So depending on the cost of a modulus operation (ie. division) and a table lookup versus a square root on your machine, using this modulus might be faster.

By the way if Java has a way to store a packed array of bits for the lookup table, don't use it. 110880 32-bit words is not much RAM these days and fetching a machine word is going to be faster than fetching a single bit.


I checked all of the possible results when the last n bits of a square is observed. By successively examining more bits, up to 5/6th of inputs can be eliminated. I actually designed this to implement Fermat's Factorization algorithm, and it is very fast there.

public static boolean isSquare(final long val) {
   if ((val & 2) == 2 || (val & 7) == 5) {
     return false;
   }
   if ((val & 11) == 8 || (val & 31) == 20) {
     return false;
   }

   if ((val & 47) == 32 || (val & 127) == 80) {
     return false;
   }

   if ((val & 191) == 128 || (val & 511) == 320) {
     return false;
   }

   // if((val & a == b) || (val & c == d){
   //   return false;
   // }

   if (!modSq[(int) (val % modSq.length)]) {
        return false;
   }

   final long root = (long) Math.sqrt(val);
   return root * root == val;
}

The last bit of pseudocode can be used to extend the tests to eliminate more values. The tests above are for k = 0, 1, 2, 3

  • a is of the form (3 << 2k) - 1
  • b is of the form (2 << 2k)
  • c is of the form (2 << 2k + 2) - 1
  • d is of the form (2 << 2k - 1) * 10

    It first tests whether it has a square residual with moduli of power of two, then it tests based on a final modulus, then it uses the Math.sqrt to do a final test. I came up with the idea from the top post, and attempted to extend upon it. I appreciate any comments or suggestions.

    Update: Using the test by a modulus, (modSq) and a modulus base of 44352, my test runs in 96% of the time of the one in the OP's update for numbers up to 1,000,000,000.


  • Project Euler is mentioned in the tags and many of the problems in it require checking numbers >> 2^64. Most of the optimizations mentioned above don't work easily when you are working with an 80 byte buffer.

    I used java BigInteger and a slightly modified version of Newton's method, one that works better with integers. The problem was that exact squares n^2 converged to (n-1) instead of n because n^2-1 = (n-1)(n+1) and the final error was just one step below the final divisor and the algorithm terminated. It was easy to fix by adding one to the original argument before computing the error. (Add two for cube roots, etc.)

    One nice attribute of this algorithm is that you can immediately tell if the number is a perfect square - the final error (not correction) in Newton's method will be zero. A simple modification also lets you quickly calculate floor(sqrt(x)) instead of the closest integer. This is handy with several Euler problems.


    Don't know about fastest, but the simplest is to take the square root in the normal fashion, multiply the result by itself, and see if it matches your original value.

    Since we're talking integers here, the fasted would probably involve a collection where you can just make a lookup.


    It should be much faster to use Newton's method to calculate the Integer Square Root, then square this number and check, as you do in your current solution. Newton's method is the basis for the Carmack solution mentioned in some other answers. You should be able to get a faster answer since you're only interested in the integer part of the root, allowing you to stop the approximation algorithm sooner.

    Another optimization that you can try: If the Digital Root of a number doesn't end in 1, 4, 7, or 9 the number is not a perfect square. This can be used as a quick way to eliminate 60% of your inputs before applying the slower square root algorithm.


    Just for the record, another approach is to use the prime decomposition. If every factor of the decomposition is even, then the number is a perfect square. So what you want is to see if a number can be decomposed as a product of squares of prime numbers. Of course, you don't need to obtain such a decomposition, just to see if it exists.

    First build a table of squares of prime numbers which are lower than 2^32. This is far smaller than a table of all integers up to this limit.

    A solution would then be like this:

    boolean isPerfectSquare(long number)
    {
        if (number < 0) return false;
        if (number < 2) return true;
    
        for (int i = 0; ; i++)
        {
            long square = squareTable[i];
            if (square > number) return false;
            while (number % square == 0)
            {
                number /= square;
            }
            if (number == 1) return true;
        }
    }
    

    I guess it's a bit cryptic. What it does is checking in every step that the square of a prime number divide the input number. If it does then it divides the number by the square as long as it is possible, to remove this square from the prime decomposition. If by this process, we came to 1, then the input number was a decomposition of square of prime numbers. If the square becomes larger than the number itself, then there is no way this square, or any larger squares, can divide it, so the number can not be a decomposition of squares of prime numbers.

    Given nowadays' sqrt done in hardware and the need to compute prime numbers here, I guess this solution is way slower. But it should give better results than solution with sqrt which won't work over 2^54, as says mrzl in his answer.


    If you want speed, given that your integers are of finite size, I suspect that the quickest way would involve (a) partitioning the parameters by size (e.g. into categories by largest bit set), then checking the value against an array of perfect squares within that range.


    Considering for general bit length (though I have used specific type here), I tried to design simplistic algo as below. Simple and obvious check for 0,1,2 or <0 is required initially. Following is simple in sense that it doesn't try to use any existing maths functions. Most of the operator can be replaced with bit-wise operators. I haven't tested with any bench mark data though. I'm neither expert at maths or computer algorithm design in particular, I would love to see you pointing out problem. I know there is lots of improvement chances there.

    int main()
    {
        unsigned int c1=0 ,c2 = 0;  
        unsigned int x = 0;  
        unsigned int p = 0;  
        int k1 = 0;  
        scanf("%d",&p);  
        if(p % 2 == 0) {  
            x = p/2; 
        }  
        else {  
            x = (p/2) +1;  
        }  
        while(x) 
        {
            if((x*x) > p) {  
                c1 = x;  
                x = x/2; 
            }else {  
                c2 = x;  
                break;  
            }  
        }  
        if((p%2) != 0)  
            c2++;
    
        while(c2 < c1) 
        {  
            if((c2 * c2 ) == p) {  
                k1 = 1;  
                break;  
            }  
            c2++; 
        }  
        if(k1)  
            printf("\n Perfect square for %d", c2);  
        else  
            printf("\n Not perfect but nearest to :%d :", c2);  
        return 0;  
    }  
    

    If you want speed, given that your integers are of finite size, I suspect that the quickest way would involve (a) partitioning the parameters by size (e.g. into categories by largest bit set), then checking the value against an array of perfect squares within that range.


    You should get rid of the 2-power part of N right from the start.

    2nd Edit The magical expression for m below should be

    m = N - (N & (N-1));
    

    and not as written

    End of 2nd edit

    m = N & (N-1); // the lawest bit of N
    N /= m;
    byte = N & 0x0F;
    if ((m % 2) || (byte !=1 && byte !=9))
      return false;
    

    1st Edit:

    Minor improvement:

    m = N & (N-1); // the lawest bit of N
    N /= m;
    if ((m % 2) || (N & 0x07 != 1))
      return false;
    

    End of 1st edit

    Now continue as usual. This way, by the time you get to the floating point part, you already got rid of all the numbers whose 2-power part is odd (about half), and then you only consider 1/8 of whats left. I.e. you run the floating point part on 6% of the numbers.


    Calculating square roots by Newton's method is horrendously fast ... provided that the starting value is reasonable. However there is no reasonable starting value, and in practice we end with bisection and log(2^64) behaviour.
    To be really fast we need a fast way to get at a reasonable starting value, and that means we need to descend into machine language. If a processor provides an instruction like POPCNT in the Pentium, that counts the leading zeroes we can use that to have a starting value with half the significant bits. With care we can find a a fixed number of Newton steps that will always suffice. (Thus foregoing the need to loop and have very fast execution.)

    A second solution is going via the floating point facility, which may have a fast sqrt calculation (like the i87 coprocessor.) Even an excursion via exp() and log() may be faster than Newton degenerated into a binary search. There is a tricky aspect to this, a processor dependant analysis of what and if refinement afterwards is necessary.

    A third solution solves a slightly different problem, but is well worth mentionning because the situation is described in the question. If you want to calculate a great many square roots for numbers that differ slightly, you can use Newton iteration, if you never reinitialise the starting value, but just leave it where the previous calculation left off. I've used this with success in at least one Euler problem.


    It should be much faster to use Newton's method to calculate the Integer Square Root, then square this number and check, as you do in your current solution. Newton's method is the basis for the Carmack solution mentioned in some other answers. You should be able to get a faster answer since you're only interested in the integer part of the root, allowing you to stop the approximation algorithm sooner.

    Another optimization that you can try: If the Digital Root of a number doesn't end in 1, 4, 7, or 9 the number is not a perfect square. This can be used as a quick way to eliminate 60% of your inputs before applying the slower square root algorithm.


    Not sure if this is the fastest way, but this is something I stumbled upon (long time ago in high-school) when I was bored and playing with my calculator during math class. At that time, I was really amazed this was working...

    public static boolean isIntRoot(int number) {
        return isIntRootHelper(number, 1);
    }
    
    private static boolean isIntRootHelper(int number, int index) {
        if (number == index) {
            return true;
        }
        if (number < index) {
            return false;
        }
        else {
            return isIntRootHelper(number - 2 * index, index + 1);
        }
    }
    

    It's been pointed out that the last d digits of a perfect square can only take on certain values. The last d digits (in base b) of a number n is the same as the remainder when n is divided by bd, ie. in C notation n % pow(b, d).

    This can be generalized to any modulus m, ie. n % m can be used to rule out some percentage of numbers from being perfect squares. The modulus you are currently using is 64, which allows 12, ie. 19% of remainders, as possible squares. With a little coding I found the modulus 110880, which allows only 2016, ie. 1.8% of remainders as possible squares. So depending on the cost of a modulus operation (ie. division) and a table lookup versus a square root on your machine, using this modulus might be faster.

    By the way if Java has a way to store a packed array of bits for the lookup table, don't use it. 110880 32-bit words is not much RAM these days and fetching a machine word is going to be faster than fetching a single bit.


    I checked all of the possible results when the last n bits of a square is observed. By successively examining more bits, up to 5/6th of inputs can be eliminated. I actually designed this to implement Fermat's Factorization algorithm, and it is very fast there.

    public static boolean isSquare(final long val) {
       if ((val & 2) == 2 || (val & 7) == 5) {
         return false;
       }
       if ((val & 11) == 8 || (val & 31) == 20) {
         return false;
       }
    
       if ((val & 47) == 32 || (val & 127) == 80) {
         return false;
       }
    
       if ((val & 191) == 128 || (val & 511) == 320) {
         return false;
       }
    
       // if((val & a == b) || (val & c == d){
       //   return false;
       // }
    
       if (!modSq[(int) (val % modSq.length)]) {
            return false;
       }
    
       final long root = (long) Math.sqrt(val);
       return root * root == val;
    }
    

    The last bit of pseudocode can be used to extend the tests to eliminate more values. The tests above are for k = 0, 1, 2, 3

  • a is of the form (3 << 2k) - 1
  • b is of the form (2 << 2k)
  • c is of the form (2 << 2k + 2) - 1
  • d is of the form (2 << 2k - 1) * 10

    It first tests whether it has a square residual with moduli of power of two, then it tests based on a final modulus, then it uses the Math.sqrt to do a final test. I came up with the idea from the top post, and attempted to extend upon it. I appreciate any comments or suggestions.

    Update: Using the test by a modulus, (modSq) and a modulus base of 44352, my test runs in 96% of the time of the one in the OP's update for numbers up to 1,000,000,000.


  • Don't know about fastest, but the simplest is to take the square root in the normal fashion, multiply the result by itself, and see if it matches your original value.

    Since we're talking integers here, the fasted would probably involve a collection where you can just make a lookup.


    Not sure if this is the fastest way, but this is something I stumbled upon (long time ago in high-school) when I was bored and playing with my calculator during math class. At that time, I was really amazed this was working...

    public static boolean isIntRoot(int number) {
        return isIntRootHelper(number, 1);
    }
    
    private static boolean isIntRootHelper(int number, int index) {
        if (number == index) {
            return true;
        }
        if (number < index) {
            return false;
        }
        else {
            return isIntRootHelper(number - 2 * index, index + 1);
        }
    }
    

    I'm not sure if it would be faster, or even accurate, but you could use John Carmack's Magical Square Root, algorithm to solve the square root faster. You could probably easily test this for all possible 32 bit integers, and validate that you actually got correct results, as it's only an appoximation. However, now that I think about it, using doubles is approximating also, so I'm not sure how that would come into play.


    An integer problem deserves an integer solution. Thus

    Do binary search on the (non-negative) integers to find the greatest integer t such that t**2 <= n. Then test whether r**2 = n exactly. This takes time O(log n).

    If you don't know how to binary search the positive integers because the set is unbounded, it's easy. You starting by computing your increasing function f (above f(t) = t**2 - n) on powers of two. When you see it turn positive, you've found an upper bound. Then you can do standard binary search.


    It ought to be possible to pack the 'cannot be a perfect square if the last X digits are N' much more efficiently than that! I'll use java 32 bit ints, and produce enough data to check the last 16 bits of the number - that's 2048 hexadecimal int values.

    ...

    Ok. Either I have run into some number theory that is a little beyond me, or there is a bug in my code. In any case, here is the code:

    public static void main(String[] args) {
        final int BITS = 16;
    
        BitSet foo = new BitSet();
    
        for(int i = 0; i< (1<<BITS); i++) {
            int sq = (i*i);
            sq = sq & ((1<<BITS)-1);
            foo.set(sq);
        }
    
        System.out.println("int[] mayBeASquare = {");
    
        for(int i = 0; i< 1<<(BITS-5); i++) {
            int kk = 0;
            for(int j = 0; j<32; j++) {
                if(foo.get((i << 5) | j)) {
                    kk |= 1<<j;
                }
            }
            System.out.print("0x" + Integer.toHexString(kk) + ", ");
            if(i%8 == 7) System.out.println();
        }
        System.out.println("};");
    }
    

    and here are the results:

    (ed: elided for poor performance in prettify.js; view revision history to see.)


    Regarding the Carmac method, it seems like it would be quite easy just to iterate once more, which should double the number of digits of accuracy. It is, after all, an extremely truncated iterative method -- Newton's, with a very good first guess.

    Regarding your current best, I see two micro-optimizations:

    • move the check vs. 0 after the check using mod255
    • rearrange the dividing out powers of four to skip all the checks for the usual (75%) case.

    I.e:

    // Divide out powers of 4 using binary search
    
    if((n & 0x3L) == 0) {
      n >>=2;
    
      if((n & 0xffffffffL) == 0)
        n >>= 32;
      if((n & 0xffffL) == 0)
          n >>= 16;
      if((n & 0xffL) == 0)
          n >>= 8;
      if((n & 0xfL) == 0)
          n >>= 4;
      if((n & 0x3L) == 0)
          n >>= 2;
    }
    

    Even better might be a simple

    while ((n & 0x03L) == 0) n >>= 2;
    

    Obviously, it would be interesting to know how many numbers get culled at each checkpoint -- I rather doubt the checks are truly independent, which makes things tricky.


    Here is a divide and conquer solution.

    If the square root of a natural number (number) is a natural number (solution), you can easily determine a range for solution based on the number of digits of number:

    • number has 1 digit: solution in range = 1 - 4
    • number has 2 digits: solution in range = 3 - 10
    • number has 3 digits: solution in range = 10 - 40
    • number has 4 digits: solution in range = 30 - 100
    • number has 5 digits: solution in range = 100 - 400

    Notice the repetition?

    You can use this range in a binary search approach to see if there is a solution for which:

    number == solution * solution
    

    Here is the code

    Here is my class SquareRootChecker

    public class SquareRootChecker {
    
        private long number;
        private long initialLow;
        private long initialHigh;
    
        public SquareRootChecker(long number) {
            this.number = number;
    
            initialLow = 1;
            initialHigh = 4;
            if (Long.toString(number).length() % 2 == 0) {
                initialLow = 3;
                initialHigh = 10;
            }
            for (long i = 0; i < Long.toString(number).length() / 2; i++) {
                initialLow *= 10;
                initialHigh *= 10;
            }
            if (Long.toString(number).length() % 2 == 0) {
                initialLow /= 10;
                initialHigh /=10;
            }
        }
    
        public boolean checkSquareRoot() {
            return findSquareRoot(initialLow, initialHigh, number);
        }
    
        private boolean findSquareRoot(long low, long high, long number) {
            long check = low + (high - low) / 2;
            if (high >= low) {
                if (number == check * check) {
                    return true;
                }
                else if (number < check * check) {
                    high = check - 1;
                    return findSquareRoot(low, high, number);
                }
                else  {
                    low = check + 1;
                    return findSquareRoot(low, high, number);
                }
            }
            return false;
        }
    
    }
    

    And here is an example on how to use it.

    long number =  1234567;
    long square = number * number;
    SquareRootChecker squareRootChecker = new SquareRootChecker(square);
    System.out.println(square + ": " + squareRootChecker.checkSquareRoot()); //Prints "1524155677489: true"
    
    long notSquare = square + 1;
    squareRootChecker = new SquareRootChecker(notSquare);
    System.out.println(notSquare + ": " + squareRootChecker.checkSquareRoot()); //Prints "1524155677490: false"
    

    For performance, you very often have to do some compromsies. Others have expressed various methods, however, you noted Carmack's hack was faster up to certain values of N. Then, you should check the "n" and if it is less than that number N, use Carmack's hack, else use some other method described in the answers here.


    If you want speed, given that your integers are of finite size, I suspect that the quickest way would involve (a) partitioning the parameters by size (e.g. into categories by largest bit set), then checking the value against an array of perfect squares within that range.


    May be the best algorithm for the problem is a fast integer square root algorithm https://stackoverflow.com/a/51585204/5191852

    There @Kde claims that three iterations of the Newton method would be sufficient for the accuracy of ±1 for 32 bit integers. Certainly, more iterations are needed for 64-bit integers, may be 6 or 7.


    If speed is a concern, why not partition off the most commonly used set of inputs and their values to a lookup table and then do whatever optimized magic algorithm you have come up with for the exceptional cases?


    This is the fastest Java implementation I could come up with, using a combination of techniques suggested by others in this thread.

    • Mod-256 test
    • Inexact mod-3465 test (avoids integer division at the cost of some false positives)
    • Floating-point square root, round and compare with input value

    I also experimented with these modifications but they did not help performance:

    • Additional mod-255 test
    • Dividing the input value by powers of 4
    • Fast Inverse Square Root (to work for high values of N it needs 3 iterations, enough to make it slower than the hardware square root function.)

    public class SquareTester {
    
        public static boolean isPerfectSquare(long n) {
            if (n < 0) {
                return false;
            } else {
                switch ((byte) n) {
                case -128: case -127: case -124: case -119: case -112:
                case -111: case -103: case  -95: case  -92: case  -87:
                case  -79: case  -71: case  -64: case  -63: case  -60:
                case  -55: case  -47: case  -39: case  -31: case  -28:
                case  -23: case  -15: case   -7: case    0: case    1:
                case    4: case    9: case   16: case   17: case   25:
                case   33: case   36: case   41: case   49: case   57:
                case   64: case   65: case   68: case   73: case   81:
                case   89: case   97: case  100: case  105: case  113:
                case  121:
                    long i = (n * INV3465) >>> 52;
                    if (! good3465[(int) i]) {
                        return false;
                    } else {
                        long r = round(Math.sqrt(n));
                        return r*r == n; 
                    }
                default:
                    return false;
                }
            }
        }
    
        private static int round(double x) {
            return (int) Double.doubleToRawLongBits(x + (double) (1L << 52));
        }
    
        /** 3465<sup>-1</sup> modulo 2<sup>64</sup> */
        private static final long INV3465 = 0x8ffed161732e78b9L;
    
        private static final boolean[] good3465 =
            new boolean[0x1000];
    
        static {
            for (int r = 0; r < 3465; ++ r) {
                int i = (int) ((r * r * INV3465) >>> 52);
                good3465[i] = good3465[i+1] = true;
            }
        }
    
    }
    

    Just for the record, another approach is to use the prime decomposition. If every factor of the decomposition is even, then the number is a perfect square. So what you want is to see if a number can be decomposed as a product of squares of prime numbers. Of course, you don't need to obtain such a decomposition, just to see if it exists.

    First build a table of squares of prime numbers which are lower than 2^32. This is far smaller than a table of all integers up to this limit.

    A solution would then be like this:

    boolean isPerfectSquare(long number)
    {
        if (number < 0) return false;
        if (number < 2) return true;
    
        for (int i = 0; ; i++)
        {
            long square = squareTable[i];
            if (square > number) return false;
            while (number % square == 0)
            {
                number /= square;
            }
            if (number == 1) return true;
        }
    }
    

    I guess it's a bit cryptic. What it does is checking in every step that the square of a prime number divide the input number. If it does then it divides the number by the square as long as it is possible, to remove this square from the prime decomposition. If by this process, we came to 1, then the input number was a decomposition of square of prime numbers. If the square becomes larger than the number itself, then there is no way this square, or any larger squares, can divide it, so the number can not be a decomposition of squares of prime numbers.

    Given nowadays' sqrt done in hardware and the need to compute prime numbers here, I guess this solution is way slower. But it should give better results than solution with sqrt which won't work over 2^54, as says mrzl in his answer.


    This a rework from decimal to binary of the old Marchant calculator algorithm (sorry, I don't have a reference), in Ruby, adapted specifically for this question:

    def isexactsqrt(v)
        value = v.abs
        residue = value
        root = 0
        onebit = 1
        onebit <<= 8 while (onebit < residue)
        onebit >>= 2 while (onebit > residue)
        while (onebit > 0)
            x = root + onebit
            if (residue >= x) then
                residue -= x
                root = x + onebit
            end
            root >>= 1
            onebit >>= 2
        end
        return (residue == 0)
    end
    

    Here's a workup of something similar (please don't vote me down for coding style/smells or clunky O/O - it's the algorithm that counts, and C++ is not my home language). In this case, we're looking for residue == 0:

    #include <iostream>  
    
    using namespace std;  
    typedef unsigned long long int llint;
    
    class ISqrt {           // Integer Square Root
        llint value;        // Integer whose square root is required
        llint root;         // Result: floor(sqrt(value))
        llint residue;      // Result: value-root*root
        llint onebit, x;    // Working bit, working value
    
    public:
    
        ISqrt(llint v = 2) {    // Constructor
            Root(v);            // Take the root 
        };
    
        llint Root(llint r) {   // Resets and calculates new square root
            value = r;          // Store input
            residue = value;    // Initialise for subtracting down
            root = 0;           // Clear root accumulator
    
            onebit = 1;                 // Calculate start value of counter
            onebit <<= (8*sizeof(llint)-2);         // Set up counter bit as greatest odd power of 2 
            while (onebit > residue) {onebit >>= 2; };  // Shift down until just < value
    
            while (onebit > 0) {
                x = root ^ onebit;          // Will check root+1bit (root bit corresponding to onebit is always zero)
                if (residue >= x) {         // Room to subtract?
                    residue -= x;           // Yes - deduct from residue
                    root = x + onebit;      // and step root
                };
                root >>= 1;
                onebit >>= 2;
            };
            return root;                    
        };
        llint Residue() {           // Returns residue from last calculation
            return residue;                 
        };
    };
    
    int main() {
        llint big, i, q, r, v, delta;
        big = 0; big = (big-1);         // Kludge for "big number"
        ISqrt b;                            // Make q sqrt generator
        for ( i = big; i > 0 ; i /= 7 ) {   // for several numbers
            q = b.Root(i);                  // Get the square root
            r = b.Residue();                // Get the residue
            v = q*q+r;                      // Recalc original value
            delta = v-i;                    // And diff, hopefully 0
            cout << i << ": " << q << " ++ " << r << " V: " << v << " Delta: " << delta << "\n";
        };
        return 0;
    };
    

    It should be much faster to use Newton's method to calculate the Integer Square Root, then square this number and check, as you do in your current solution. Newton's method is the basis for the Carmack solution mentioned in some other answers. You should be able to get a faster answer since you're only interested in the integer part of the root, allowing you to stop the approximation algorithm sooner.

    Another optimization that you can try: If the Digital Root of a number doesn't end in 1, 4, 7, or 9 the number is not a perfect square. This can be used as a quick way to eliminate 60% of your inputs before applying the slower square root algorithm.


    I want this function to work with all positive 64-bit signed integers

    Math.sqrt() works with doubles as input parameters, so you won't get accurate results for integers bigger than 2^53.


    Project Euler is mentioned in the tags and many of the problems in it require checking numbers >> 2^64. Most of the optimizations mentioned above don't work easily when you are working with an 80 byte buffer.

    I used java BigInteger and a slightly modified version of Newton's method, one that works better with integers. The problem was that exact squares n^2 converged to (n-1) instead of n because n^2-1 = (n-1)(n+1) and the final error was just one step below the final divisor and the algorithm terminated. It was easy to fix by adding one to the original argument before computing the error. (Add two for cube roots, etc.)

    One nice attribute of this algorithm is that you can immediately tell if the number is a perfect square - the final error (not correction) in Newton's method will be zero. A simple modification also lets you quickly calculate floor(sqrt(x)) instead of the closest integer. This is handy with several Euler problems.


    I like the idea to use an almost correct method on some of the input. Here is a version with a higher "offset". The code seems to work and passes my simple test case.

    Just replace your:

    if(n < 410881L){...}
    

    code with this one:

    if (n < 11043908100L) {
        //John Carmack hack, converted to Java.
        // See: http://www.codemaestro.com/reviews/9
        int i;
        float x2, y;
    
        x2 = n * 0.5F;
        y = n;
        i = Float.floatToRawIntBits(y);
        //using the magic number from 
        //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
        //since it more accurate
        i = 0x5f375a86 - (i >> 1);
        y = Float.intBitsToFloat(i);
        y = y * (1.5F - (x2 * y * y));
        y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate
    
        sqrt = Math.round(1.0F / y);
    } else {
        //Carmack hack gives incorrect answer for n >= 11043908100.
        sqrt = (long) Math.sqrt(n);
    }
    

    For performance, you very often have to do some compromsies. Others have expressed various methods, however, you noted Carmack's hack was faster up to certain values of N. Then, you should check the "n" and if it is less than that number N, use Carmack's hack, else use some other method described in the answers here.


    You'll have to do some benchmarking. The best algorithm will depend on the distribution of your inputs.

    Your algorithm may be nearly optimal, but you might want to do a quick check to rule out some possibilities before calling your square root routine. For example, look at the last digit of your number in hex by doing a bit-wise "and." Perfect squares can only end in 0, 1, 4, or 9 in base 16, So for 75% of your inputs (assuming they are uniformly distributed) you can avoid a call to the square root in exchange for some very fast bit twiddling.

    Kip benchmarked the following code implementing the hex trick. When testing numbers 1 through 100,000,000, this code ran twice as fast as the original.

    public final static boolean isPerfectSquare(long n)
    {
        if (n < 0)
            return false;
    
        switch((int)(n & 0xF))
        {
        case 0: case 1: case 4: case 9:
            long tst = (long)Math.sqrt(n);
            return tst*tst == n;
    
        default:
            return false;
        }
    }
    

    When I tested the analogous code in C++, it actually ran slower than the original. However, when I eliminated the switch statement, the hex trick once again make the code twice as fast.

    int isPerfectSquare(int n)
    {
        int h = n & 0xF;  // h is the last hex "digit"
        if (h > 9)
            return 0;
        // Use lazy evaluation to jump out of the if statement as soon as possible
        if (h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8)
        {
            int t = (int) floor( sqrt((double) n) + 0.5 );
            return t*t == n;
        }
        return 0;
    }
    

    Eliminating the switch statement had little effect on the C# code.


    You should get rid of the 2-power part of N right from the start.

    2nd Edit The magical expression for m below should be

    m = N - (N & (N-1));
    

    and not as written

    End of 2nd edit

    m = N & (N-1); // the lawest bit of N
    N /= m;
    byte = N & 0x0F;
    if ((m % 2) || (byte !=1 && byte !=9))
      return false;
    

    1st Edit:

    Minor improvement:

    m = N & (N-1); // the lawest bit of N
    N /= m;
    if ((m % 2) || (N & 0x07 != 1))
      return false;
    

    End of 1st edit

    Now continue as usual. This way, by the time you get to the floating point part, you already got rid of all the numbers whose 2-power part is odd (about half), and then you only consider 1/8 of whats left. I.e. you run the floating point part on 6% of the numbers.


    I was thinking about the horrible times I've spent in Numerical Analysis course.

    And then I remember, there was this function circling around the 'net from the Quake Source code:

    float Q_rsqrt( float number )
    {
      long i;
      float x2, y;
      const float threehalfs = 1.5F;
    
      x2 = number * 0.5F;
      y  = number;
      i  = * ( long * ) &y;  // evil floating point bit level hacking
      i  = 0x5f3759df - ( i >> 1 ); // wtf?
      y  = * ( float * ) &i;
      y  = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
      // y  = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
    
      #ifndef Q3_VM
      #ifdef __linux__
        assert( !isnan(y) ); // bk010122 - FPE?
      #endif
      #endif
      return y;
    }
    

    Which basically calculates a square root, using Newton's approximation function (cant remember the exact name).

    It should be usable and might even be faster, it's from one of the phenomenal id software's game!

    It's written in C++ but it should not be too hard to reuse the same technique in Java once you get the idea:

    I originally found it at: http://www.codemaestro.com/reviews/9

    Newton's method explained at wikipedia: http://en.wikipedia.org/wiki/Newton%27s_method

    You can follow the link for more explanation of how it works, but if you don't care much, then this is roughly what I remember from reading the blog and from taking the Numerical Analysis course:

    • the * (long*) &y is basically a fast convert-to-long function so integer operations can be applied on the raw bytes.
    • the 0x5f3759df - (i >> 1); line is a pre-calculated seed value for the approximation function.
    • the * (float*) &i converts the value back to floating point.
    • the y = y * ( threehalfs - ( x2 * y * y ) ) line bascially iterates the value over the function again.

    The approximation function gives more precise values the more you iterate the function over the result. In Quake's case, one iteration is "good enough", but if it wasn't for you... then you could add as much iteration as you need.

    This should be faster because it reduces the number of division operations done in naive square rooting down to a simple divide by 2 (actually a * 0.5F multiply operation) and replace it with a few fixed number of multiplication operations instead.


    You'll have to do some benchmarking. The best algorithm will depend on the distribution of your inputs.

    Your algorithm may be nearly optimal, but you might want to do a quick check to rule out some possibilities before calling your square root routine. For example, look at the last digit of your number in hex by doing a bit-wise "and." Perfect squares can only end in 0, 1, 4, or 9 in base 16, So for 75% of your inputs (assuming they are uniformly distributed) you can avoid a call to the square root in exchange for some very fast bit twiddling.

    Kip benchmarked the following code implementing the hex trick. When testing numbers 1 through 100,000,000, this code ran twice as fast as the original.

    public final static boolean isPerfectSquare(long n)
    {
        if (n < 0)
            return false;
    
        switch((int)(n & 0xF))
        {
        case 0: case 1: case 4: case 9:
            long tst = (long)Math.sqrt(n);
            return tst*tst == n;
    
        default:
            return false;
        }
    }
    

    When I tested the analogous code in C++, it actually ran slower than the original. However, when I eliminated the switch statement, the hex trick once again make the code twice as fast.

    int isPerfectSquare(int n)
    {
        int h = n & 0xF;  // h is the last hex "digit"
        if (h > 9)
            return 0;
        // Use lazy evaluation to jump out of the if statement as soon as possible
        if (h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8)
        {
            int t = (int) floor( sqrt((double) n) + 0.5 );
            return t*t == n;
        }
        return 0;
    }
    

    Eliminating the switch statement had little effect on the C# code.


    I'm pretty late to the party, but I hope to provide a better answer; shorter and (assuming my benchmark is correct) also much faster.

    long goodMask; // 0xC840C04048404040 computed below
    {
        for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i);
    }
    
    public boolean isSquare(long x) {
        // This tests if the 6 least significant bits are right.
        // Moving the to be tested bit to the highest position saves us masking.
        if (goodMask << x >= 0) return false;
        final int numberOfTrailingZeros = Long.numberOfTrailingZeros(x);
        // Each square ends with an even number of zeros.
        if ((numberOfTrailingZeros & 1) != 0) return false;
        x >>= numberOfTrailingZeros;
        // Now x is either 0 or odd.
        // In binary each odd square ends with 001.
        // Postpone the sign test until now; handle zero in the branch.
        if ((x&7) != 1 | x <= 0) return x == 0;
        // Do it in the classical way.
        // The correctness is not trivial as the conversion from long to double is lossy!
        final long tst = (long) Math.sqrt(x);
        return tst * tst == x;
    }
    

    The first test catches most non-squares quickly. It uses a 64-item table packed in a long, so there's no array access cost (indirection and bounds checks). For a uniformly random long, there's a 81.25% probability of ending here.

    The second test catches all numbers having an odd number of twos in their factorization. The method Long.numberOfTrailingZeros is very fast as it gets JIT-ed into a single i86 instruction.

    After dropping the trailing zeros, the third test handles numbers ending with 011, 101, or 111 in binary, which are no perfect squares. It also cares about negative numbers and also handles 0.

    The final test falls back to double arithmetic. As double has only 53 bits mantissa, the conversion from long to double includes rounding for big values. Nonetheless, the test is correct (unless the proof is wrong).

    Trying to incorporate the mod255 idea wasn't successful.


    You'll have to do some benchmarking. The best algorithm will depend on the distribution of your inputs.

    Your algorithm may be nearly optimal, but you might want to do a quick check to rule out some possibilities before calling your square root routine. For example, look at the last digit of your number in hex by doing a bit-wise "and." Perfect squares can only end in 0, 1, 4, or 9 in base 16, So for 75% of your inputs (assuming they are uniformly distributed) you can avoid a call to the square root in exchange for some very fast bit twiddling.

    Kip benchmarked the following code implementing the hex trick. When testing numbers 1 through 100,000,000, this code ran twice as fast as the original.

    public final static boolean isPerfectSquare(long n)
    {
        if (n < 0)
            return false;
    
        switch((int)(n & 0xF))
        {
        case 0: case 1: case 4: case 9:
            long tst = (long)Math.sqrt(n);
            return tst*tst == n;
    
        default:
            return false;
        }
    }
    

    When I tested the analogous code in C++, it actually ran slower than the original. However, when I eliminated the switch statement, the hex trick once again make the code twice as fast.

    int isPerfectSquare(int n)
    {
        int h = n & 0xF;  // h is the last hex "digit"
        if (h > 9)
            return 0;
        // Use lazy evaluation to jump out of the if statement as soon as possible
        if (h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8)
        {
            int t = (int) floor( sqrt((double) n) + 0.5 );
            return t*t == n;
        }
        return 0;
    }
    

    Eliminating the switch statement had little effect on the C# code.


    The sqrt call is not perfectly accurate, as has been mentioned, but it's interesting and instructive that it doesn't blow away the other answers in terms of speed. After all, the sequence of assembly language instructions for a sqrt is tiny. Intel has a hardware instruction, which isn't used by Java I believe because it doesn't conform to IEEE.

    So why is it slow? Because Java is actually calling a C routine through JNI, and it's actually slower to do so than to call a Java subroutine, which itself is slower than doing it inline. This is very annoying, and Java should have come up with a better solution, ie building in floating point library calls if necessary. Oh well.

    In C++, I suspect all the complex alternatives would lose on speed, but I haven't checked them all. What I did, and what Java people will find usefull, is a simple hack, an extension of the special case testing suggested by A. Rex. Use a single long value as a bit array, which isn't bounds checked. That way, you have 64 bit boolean lookup.

    typedef unsigned long long UVLONG
    UVLONG pp1,pp2;
    
    void init2() {
      for (int i = 0; i < 64; i++) {
        for (int j = 0; j < 64; j++)
          if (isPerfectSquare(i * 64 + j)) {
        pp1 |= (1 << j);
        pp2 |= (1 << i);
        break;
          }
       }
       cout << "pp1=" << pp1 << "," << pp2 << "\n";  
    }
    
    
    inline bool isPerfectSquare5(UVLONG x) {
      return pp1 & (1 << (x & 0x3F)) ? isPerfectSquare(x) : false;
    }
    

    The routine isPerfectSquare5 runs in about 1/3 the time on my core2 duo machine. I suspect that further tweaks along the same lines could reduce the time further on average, but every time you check, you are trading off more testing for more eliminating, so you can't go too much farther on that road.

    Certainly, rather than having a separate test for negative, you could check the high 6 bits the same way.

    Note that all I'm doing is eliminating possible squares, but when I have a potential case I have to call the original, inlined isPerfectSquare.

    The init2 routine is called once to initialize the static values of pp1 and pp2. Note that in my implementation in C++, I'm using unsigned long long, so since you're signed, you'd have to use the >>> operator.

    There is no intrinsic need to bounds check the array, but Java's optimizer has to figure this stuff out pretty quickly, so I don't blame them for that.


    I'm pretty late to the party, but I hope to provide a better answer; shorter and (assuming my benchmark is correct) also much faster.

    long goodMask; // 0xC840C04048404040 computed below
    {
        for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i);
    }
    
    public boolean isSquare(long x) {
        // This tests if the 6 least significant bits are right.
        // Moving the to be tested bit to the highest position saves us masking.
        if (goodMask << x >= 0) return false;
        final int numberOfTrailingZeros = Long.numberOfTrailingZeros(x);
        // Each square ends with an even number of zeros.
        if ((numberOfTrailingZeros & 1) != 0) return false;
        x >>= numberOfTrailingZeros;
        // Now x is either 0 or odd.
        // In binary each odd square ends with 001.
        // Postpone the sign test until now; handle zero in the branch.
        if ((x&7) != 1 | x <= 0) return x == 0;
        // Do it in the classical way.
        // The correctness is not trivial as the conversion from long to double is lossy!
        final long tst = (long) Math.sqrt(x);
        return tst * tst == x;
    }
    

    The first test catches most non-squares quickly. It uses a 64-item table packed in a long, so there's no array access cost (indirection and bounds checks). For a uniformly random long, there's a 81.25% probability of ending here.

    The second test catches all numbers having an odd number of twos in their factorization. The method Long.numberOfTrailingZeros is very fast as it gets JIT-ed into a single i86 instruction.

    After dropping the trailing zeros, the third test handles numbers ending with 011, 101, or 111 in binary, which are no perfect squares. It also cares about negative numbers and also handles 0.

    The final test falls back to double arithmetic. As double has only 53 bits mantissa, the conversion from long to double includes rounding for big values. Nonetheless, the test is correct (unless the proof is wrong).

    Trying to incorporate the mod255 idea wasn't successful.


    I want this function to work with all positive 64-bit signed integers

    Math.sqrt() works with doubles as input parameters, so you won't get accurate results for integers bigger than 2^53.


    If you do a binary chop to try to find the "right" square root, you can fairly easily detect if the value you've got is close enough to tell:

    (n+1)^2 = n^2 + 2n + 1
    (n-1)^2 = n^2 - 2n + 1
    

    So having calculated n^2, the options are:

    • n^2 = target: done, return true
    • n^2 + 2n + 1 > target > n^2 : you're close, but it's not perfect: return false
    • n^2 - 2n + 1 < target < n^2 : ditto
    • target < n^2 - 2n + 1 : binary chop on a lower n
    • target > n^2 + 2n + 1 : binary chop on a higher n

    (Sorry, this uses n as your current guess, and target for the parameter. Apologise for the confusion!)

    I don't know whether this will be faster or not, but it's worth a try.

    EDIT: The binary chop doesn't have to take in the whole range of integers, either (2^x)^2 = 2^(2x), so once you've found the top set bit in your target (which can be done with a bit-twiddling trick; I forget exactly how) you can quickly get a range of potential answers. Mind you, a naive binary chop is still only going to take up to 31 or 32 iterations.


    I was thinking about the horrible times I've spent in Numerical Analysis course.

    And then I remember, there was this function circling around the 'net from the Quake Source code:

    float Q_rsqrt( float number )
    {
      long i;
      float x2, y;
      const float threehalfs = 1.5F;
    
      x2 = number * 0.5F;
      y  = number;
      i  = * ( long * ) &y;  // evil floating point bit level hacking
      i  = 0x5f3759df - ( i >> 1 ); // wtf?
      y  = * ( float * ) &i;
      y  = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
      // y  = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
    
      #ifndef Q3_VM
      #ifdef __linux__
        assert( !isnan(y) ); // bk010122 - FPE?
      #endif
      #endif
      return y;
    }
    

    Which basically calculates a square root, using Newton's approximation function (cant remember the exact name).

    It should be usable and might even be faster, it's from one of the phenomenal id software's game!

    It's written in C++ but it should not be too hard to reuse the same technique in Java once you get the idea:

    I originally found it at: http://www.codemaestro.com/reviews/9

    Newton's method explained at wikipedia: http://en.wikipedia.org/wiki/Newton%27s_method

    You can follow the link for more explanation of how it works, but if you don't care much, then this is roughly what I remember from reading the blog and from taking the Numerical Analysis course:

    • the * (long*) &y is basically a fast convert-to-long function so integer operations can be applied on the raw bytes.
    • the 0x5f3759df - (i >> 1); line is a pre-calculated seed value for the approximation function.
    • the * (float*) &i converts the value back to floating point.
    • the y = y * ( threehalfs - ( x2 * y * y ) ) line bascially iterates the value over the function again.

    The approximation function gives more precise values the more you iterate the function over the result. In Quake's case, one iteration is "good enough", but if it wasn't for you... then you could add as much iteration as you need.

    This should be faster because it reduces the number of division operations done in naive square rooting down to a simple divide by 2 (actually a * 0.5F multiply operation) and replace it with a few fixed number of multiplication operations instead.


    If you do a binary chop to try to find the "right" square root, you can fairly easily detect if the value you've got is close enough to tell:

    (n+1)^2 = n^2 + 2n + 1
    (n-1)^2 = n^2 - 2n + 1
    

    So having calculated n^2, the options are:

    • n^2 = target: done, return true
    • n^2 + 2n + 1 > target > n^2 : you're close, but it's not perfect: return false
    • n^2 - 2n + 1 < target < n^2 : ditto
    • target < n^2 - 2n + 1 : binary chop on a lower n
    • target > n^2 + 2n + 1 : binary chop on a higher n

    (Sorry, this uses n as your current guess, and target for the parameter. Apologise for the confusion!)

    I don't know whether this will be faster or not, but it's worth a try.

    EDIT: The binary chop doesn't have to take in the whole range of integers, either (2^x)^2 = 2^(2x), so once you've found the top set bit in your target (which can be done with a bit-twiddling trick; I forget exactly how) you can quickly get a range of potential answers. Mind you, a naive binary chop is still only going to take up to 31 or 32 iterations.


    If you want speed, given that your integers are of finite size, I suspect that the quickest way would involve (a) partitioning the parameters by size (e.g. into categories by largest bit set), then checking the value against an array of perfect squares within that range.


    Considering for general bit length (though I have used specific type here), I tried to design simplistic algo as below. Simple and obvious check for 0,1,2 or <0 is required initially. Following is simple in sense that it doesn't try to use any existing maths functions. Most of the operator can be replaced with bit-wise operators. I haven't tested with any bench mark data though. I'm neither expert at maths or computer algorithm design in particular, I would love to see you pointing out problem. I know there is lots of improvement chances there.

    int main()
    {
        unsigned int c1=0 ,c2 = 0;  
        unsigned int x = 0;  
        unsigned int p = 0;  
        int k1 = 0;  
        scanf("%d",&p);  
        if(p % 2 == 0) {  
            x = p/2; 
        }  
        else {  
            x = (p/2) +1;  
        }  
        while(x) 
        {
            if((x*x) > p) {  
                c1 = x;  
                x = x/2; 
            }else {  
                c2 = x;  
                break;  
            }  
        }  
        if((p%2) != 0)  
            c2++;
    
        while(c2 < c1) 
        {  
            if((c2 * c2 ) == p) {  
                k1 = 1;  
                break;  
            }  
            c2++; 
        }  
        if(k1)  
            printf("\n Perfect square for %d", c2);  
        else  
            printf("\n Not perfect but nearest to :%d :", c2);  
        return 0;  
    }  
    

    Newton's Method with integer arithmetic

    If you wish to avoid non-integer operations you could use the method below. It basically uses Newton's Method modified for integer arithmetic.

    /**
     * Test if the given number is a perfect square.
     * @param n Must be greater than 0 and less
     *    than Long.MAX_VALUE.
     * @return <code>true</code> if n is a perfect
     *    square, or <code>false</code> otherwise.
     */
    public static boolean isSquare(long n)
    {
        long x1 = n;
        long x2 = 1L;
    
        while (x1 > x2)
        {
            x1 = (x1 + x2) / 2L;
            x2 = n / x1;
        }
    
        return x1 == x2 && n % x1 == 0L;
    }
    

    This implementation can not compete with solutions that use Math.sqrt. However, its performance can be improved by using the filtering mechanisms described in some of the other posts.


    Calculating square roots by Newton's method is horrendously fast ... provided that the starting value is reasonable. However there is no reasonable starting value, and in practice we end with bisection and log(2^64) behaviour.
    To be really fast we need a fast way to get at a reasonable starting value, and that means we need to descend into machine language. If a processor provides an instruction like POPCNT in the Pentium, that counts the leading zeroes we can use that to have a starting value with half the significant bits. With care we can find a a fixed number of Newton steps that will always suffice. (Thus foregoing the need to loop and have very fast execution.)

    A second solution is going via the floating point facility, which may have a fast sqrt calculation (like the i87 coprocessor.) Even an excursion via exp() and log() may be faster than Newton degenerated into a binary search. There is a tricky aspect to this, a processor dependant analysis of what and if refinement afterwards is necessary.

    A third solution solves a slightly different problem, but is well worth mentionning because the situation is described in the question. If you want to calculate a great many square roots for numbers that differ slightly, you can use Newton iteration, if you never reinitialise the starting value, but just leave it where the previous calculation left off. I've used this with success in at least one Euler problem.


    This a rework from decimal to binary of the old Marchant calculator algorithm (sorry, I don't have a reference), in Ruby, adapted specifically for this question:

    def isexactsqrt(v)
        value = v.abs
        residue = value
        root = 0
        onebit = 1
        onebit <<= 8 while (onebit < residue)
        onebit >>= 2 while (onebit > residue)
        while (onebit > 0)
            x = root + onebit
            if (residue >= x) then
                residue -= x
                root = x + onebit
            end
            root >>= 1
            onebit >>= 2
        end
        return (residue == 0)
    end
    

    Here's a workup of something similar (please don't vote me down for coding style/smells or clunky O/O - it's the algorithm that counts, and C++ is not my home language). In this case, we're looking for residue == 0:

    #include <iostream>  
    
    using namespace std;  
    typedef unsigned long long int llint;
    
    class ISqrt {           // Integer Square Root
        llint value;        // Integer whose square root is required
        llint root;         // Result: floor(sqrt(value))
        llint residue;      // Result: value-root*root
        llint onebit, x;    // Working bit, working value
    
    public:
    
        ISqrt(llint v = 2) {    // Constructor
            Root(v);            // Take the root 
        };
    
        llint Root(llint r) {   // Resets and calculates new square root
            value = r;          // Store input
            residue = value;    // Initialise for subtracting down
            root = 0;           // Clear root accumulator
    
            onebit = 1;                 // Calculate start value of counter
            onebit <<= (8*sizeof(llint)-2);         // Set up counter bit as greatest odd power of 2 
            while (onebit > residue) {onebit >>= 2; };  // Shift down until just < value
    
            while (onebit > 0) {
                x = root ^ onebit;          // Will check root+1bit (root bit corresponding to onebit is always zero)
                if (residue >= x) {         // Room to subtract?
                    residue -= x;           // Yes - deduct from residue
                    root = x + onebit;      // and step root
                };
                root >>= 1;
                onebit >>= 2;
            };
            return root;                    
        };
        llint Residue() {           // Returns residue from last calculation
            return residue;                 
        };
    };
    
    int main() {
        llint big, i, q, r, v, delta;
        big = 0; big = (big-1);         // Kludge for "big number"
        ISqrt b;                            // Make q sqrt generator
        for ( i = big; i > 0 ; i /= 7 ) {   // for several numbers
            q = b.Root(i);                  // Get the square root
            r = b.Residue();                // Get the residue
            v = q*q+r;                      // Recalc original value
            delta = v-i;                    // And diff, hopefully 0
            cout << i << ": " << q << " ++ " << r << " V: " << v << " Delta: " << delta << "\n";
        };
        return 0;
    };
    

    I'm not sure if it would be faster, or even accurate, but you could use John Carmack's Magical Square Root, algorithm to solve the square root faster. You could probably easily test this for all possible 32 bit integers, and validate that you actually got correct results, as it's only an appoximation. However, now that I think about it, using doubles is approximating also, so I'm not sure how that would come into play.


    Regarding the Carmac method, it seems like it would be quite easy just to iterate once more, which should double the number of digits of accuracy. It is, after all, an extremely truncated iterative method -- Newton's, with a very good first guess.

    Regarding your current best, I see two micro-optimizations:

    • move the check vs. 0 after the check using mod255
    • rearrange the dividing out powers of four to skip all the checks for the usual (75%) case.

    I.e:

    // Divide out powers of 4 using binary search
    
    if((n & 0x3L) == 0) {
      n >>=2;
    
      if((n & 0xffffffffL) == 0)
        n >>= 32;
      if((n & 0xffffL) == 0)
          n >>= 16;
      if((n & 0xffL) == 0)
          n >>= 8;
      if((n & 0xfL) == 0)
          n >>= 4;
      if((n & 0x3L) == 0)
          n >>= 2;
    }
    

    Even better might be a simple

    while ((n & 0x03L) == 0) n >>= 2;
    

    Obviously, it would be interesting to know how many numbers get culled at each checkpoint -- I rather doubt the checks are truly independent, which makes things tricky.


    I was thinking about the horrible times I've spent in Numerical Analysis course.

    And then I remember, there was this function circling around the 'net from the Quake Source code:

    float Q_rsqrt( float number )
    {
      long i;
      float x2, y;
      const float threehalfs = 1.5F;
    
      x2 = number * 0.5F;
      y  = number;
      i  = * ( long * ) &y;  // evil floating point bit level hacking
      i  = 0x5f3759df - ( i >> 1 ); // wtf?
      y  = * ( float * ) &i;
      y  = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
      // y  = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
    
      #ifndef Q3_VM
      #ifdef __linux__
        assert( !isnan(y) ); // bk010122 - FPE?
      #endif
      #endif
      return y;
    }
    

    Which basically calculates a square root, using Newton's approximation function (cant remember the exact name).

    It should be usable and might even be faster, it's from one of the phenomenal id software's game!

    It's written in C++ but it should not be too hard to reuse the same technique in Java once you get the idea:

    I originally found it at: http://www.codemaestro.com/reviews/9

    Newton's method explained at wikipedia: http://en.wikipedia.org/wiki/Newton%27s_method

    You can follow the link for more explanation of how it works, but if you don't care much, then this is roughly what I remember from reading the blog and from taking the Numerical Analysis course:

    • the * (long*) &y is basically a fast convert-to-long function so integer operations can be applied on the raw bytes.
    • the 0x5f3759df - (i >> 1); line is a pre-calculated seed value for the approximation function.
    • the * (float*) &i converts the value back to floating point.
    • the y = y * ( threehalfs - ( x2 * y * y ) ) line bascially iterates the value over the function again.

    The approximation function gives more precise values the more you iterate the function over the result. In Quake's case, one iteration is "good enough", but if it wasn't for you... then you could add as much iteration as you need.

    This should be faster because it reduces the number of division operations done in naive square rooting down to a simple divide by 2 (actually a * 0.5F multiply operation) and replace it with a few fixed number of multiplication operations instead.


    It should be much faster to use Newton's method to calculate the Integer Square Root, then square this number and check, as you do in your current solution. Newton's method is the basis for the Carmack solution mentioned in some other answers. You should be able to get a faster answer since you're only interested in the integer part of the root, allowing you to stop the approximation algorithm sooner.

    Another optimization that you can try: If the Digital Root of a number doesn't end in 1, 4, 7, or 9 the number is not a perfect square. This can be used as a quick way to eliminate 60% of your inputs before applying the slower square root algorithm.


    Square Root of a number, given that the number is a perfect square.

    The complexity is log(n)

    /**
     * Calculate square root if the given number is a perfect square.
     * 
     * Approach: Sum of n odd numbers is equals to the square root of n*n, given 
     * that n is a perfect square.
     *
     * @param number
     * @return squareRoot
     */
    
    public static int calculateSquareRoot(int number) {
    
        int sum=1;
        int count =1;
        int squareRoot=1;
        while(sum<number) {
            count+=2;
            sum+=count;
            squareRoot++;
        }
        return squareRoot;
    }
    

    "I'm looking for the fastest way to determine if a long value is a perfect square (i.e. its square root is another integer)."

    The answers are impressive, but I failed to see a simple check :

    check whether the first number on the right of the long it a member of the set (0,1,4,5,6,9) . If it is not, then it cannot possibly be a 'perfect square' .

    eg.

    4567 - cannot be a perfect square.


    May be the best algorithm for the problem is a fast integer square root algorithm https://stackoverflow.com/a/51585204/5191852

    There @Kde claims that three iterations of the Newton method would be sufficient for the accuracy of ±1 for 32 bit integers. Certainly, more iterations are needed for 64-bit integers, may be 6 or 7.


    If speed is a concern, why not partition off the most commonly used set of inputs and their values to a lookup table and then do whatever optimized magic algorithm you have come up with for the exceptional cases?


    It's been pointed out that the last d digits of a perfect square can only take on certain values. The last d digits (in base b) of a number n is the same as the remainder when n is divided by bd, ie. in C notation n % pow(b, d).

    This can be generalized to any modulus m, ie. n % m can be used to rule out some percentage of numbers from being perfect squares. The modulus you are currently using is 64, which allows 12, ie. 19% of remainders, as possible squares. With a little coding I found the modulus 110880, which allows only 2016, ie. 1.8% of remainders as possible squares. So depending on the cost of a modulus operation (ie. division) and a table lookup versus a square root on your machine, using this modulus might be faster.

    By the way if Java has a way to store a packed array of bits for the lookup table, don't use it. 110880 32-bit words is not much RAM these days and fetching a machine word is going to be faster than fetching a single bit.


    The following simplification of maaartinus's solution appears to shave a few percentage points off the runtime, but I'm not good enough at benchmarking to produce a benchmark I can trust:

    long goodMask; // 0xC840C04048404040 computed below
    {
        for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i);
    }
    
    public boolean isSquare(long x) {
        // This tests if the 6 least significant bits are right.
        // Moving the to be tested bit to the highest position saves us masking.
        if (goodMask << x >= 0) return false;
        // Remove an even number of trailing zeros, leaving at most one.
        x >>= (Long.numberOfTrailingZeros(x) & (-2);
        // Repeat the test on the 6 least significant remaining bits.
        if (goodMask << x >= 0 | x <= 0) return x == 0;
        // Do it in the classical way.
        // The correctness is not trivial as the conversion from long to double is lossy!
        final long tst = (long) Math.sqrt(x);
        return tst * tst == x;
    }
    

    It would be worth checking how omitting the first test,

    if (goodMask << x >= 0) return false;
    

    would affect performance.


    I was thinking about the horrible times I've spent in Numerical Analysis course.

    And then I remember, there was this function circling around the 'net from the Quake Source code:

    float Q_rsqrt( float number )
    {
      long i;
      float x2, y;
      const float threehalfs = 1.5F;
    
      x2 = number * 0.5F;
      y  = number;
      i  = * ( long * ) &y;  // evil floating point bit level hacking
      i  = 0x5f3759df - ( i >> 1 ); // wtf?
      y  = * ( float * ) &i;
      y  = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
      // y  = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
    
      #ifndef Q3_VM
      #ifdef __linux__
        assert( !isnan(y) ); // bk010122 - FPE?
      #endif
      #endif
      return y;
    }
    

    Which basically calculates a square root, using Newton's approximation function (cant remember the exact name).

    It should be usable and might even be faster, it's from one of the phenomenal id software's game!

    It's written in C++ but it should not be too hard to reuse the same technique in Java once you get the idea:

    I originally found it at: http://www.codemaestro.com/reviews/9

    Newton's method explained at wikipedia: http://en.wikipedia.org/wiki/Newton%27s_method

    You can follow the link for more explanation of how it works, but if you don't care much, then this is roughly what I remember from reading the blog and from taking the Numerical Analysis course:

    • the * (long*) &y is basically a fast convert-to-long function so integer operations can be applied on the raw bytes.
    • the 0x5f3759df - (i >> 1); line is a pre-calculated seed value for the approximation function.
    • the * (float*) &i converts the value back to floating point.
    • the y = y * ( threehalfs - ( x2 * y * y ) ) line bascially iterates the value over the function again.

    The approximation function gives more precise values the more you iterate the function over the result. In Quake's case, one iteration is "good enough", but if it wasn't for you... then you could add as much iteration as you need.

    This should be faster because it reduces the number of division operations done in naive square rooting down to a simple divide by 2 (actually a * 0.5F multiply operation) and replace it with a few fixed number of multiplication operations instead.


    I'm not sure if it would be faster, or even accurate, but you could use John Carmack's Magical Square Root, algorithm to solve the square root faster. You could probably easily test this for all possible 32 bit integers, and validate that you actually got correct results, as it's only an appoximation. However, now that I think about it, using doubles is approximating also, so I'm not sure how that would come into play.


    If speed is a concern, why not partition off the most commonly used set of inputs and their values to a lookup table and then do whatever optimized magic algorithm you have come up with for the exceptional cases?


    This a rework from decimal to binary of the old Marchant calculator algorithm (sorry, I don't have a reference), in Ruby, adapted specifically for this question:

    def isexactsqrt(v)
        value = v.abs
        residue = value
        root = 0
        onebit = 1
        onebit <<= 8 while (onebit < residue)
        onebit >>= 2 while (onebit > residue)
        while (onebit > 0)
            x = root + onebit
            if (residue >= x) then
                residue -= x
                root = x + onebit
            end
            root >>= 1
            onebit >>= 2
        end
        return (residue == 0)
    end
    

    Here's a workup of something similar (please don't vote me down for coding style/smells or clunky O/O - it's the algorithm that counts, and C++ is not my home language). In this case, we're looking for residue == 0:

    #include <iostream>  
    
    using namespace std;  
    typedef unsigned long long int llint;
    
    class ISqrt {           // Integer Square Root
        llint value;        // Integer whose square root is required
        llint root;         // Result: floor(sqrt(value))
        llint residue;      // Result: value-root*root
        llint onebit, x;    // Working bit, working value
    
    public:
    
        ISqrt(llint v = 2) {    // Constructor
            Root(v);            // Take the root 
        };
    
        llint Root(llint r) {   // Resets and calculates new square root
            value = r;          // Store input
            residue = value;    // Initialise for subtracting down
            root = 0;           // Clear root accumulator
    
            onebit = 1;                 // Calculate start value of counter
            onebit <<= (8*sizeof(llint)-2);         // Set up counter bit as greatest odd power of 2 
            while (onebit > residue) {onebit >>= 2; };  // Shift down until just < value
    
            while (onebit > 0) {
                x = root ^ onebit;          // Will check root+1bit (root bit corresponding to onebit is always zero)
                if (residue >= x) {         // Room to subtract?
                    residue -= x;           // Yes - deduct from residue
                    root = x + onebit;      // and step root
                };
                root >>= 1;
                onebit >>= 2;
            };
            return root;                    
        };
        llint Residue() {           // Returns residue from last calculation
            return residue;                 
        };
    };
    
    int main() {
        llint big, i, q, r, v, delta;
        big = 0; big = (big-1);         // Kludge for "big number"
        ISqrt b;                            // Make q sqrt generator
        for ( i = big; i > 0 ; i /= 7 ) {   // for several numbers
            q = b.Root(i);                  // Get the square root
            r = b.Residue();                // Get the residue
            v = q*q+r;                      // Recalc original value
            delta = v-i;                    // And diff, hopefully 0
            cout << i << ": " << q << " ++ " << r << " V: " << v << " Delta: " << delta << "\n";
        };
        return 0;
    };
    

    Just for the record, another approach is to use the prime decomposition. If every factor of the decomposition is even, then the number is a perfect square. So what you want is to see if a number can be decomposed as a product of squares of prime numbers. Of course, you don't need to obtain such a decomposition, just to see if it exists.

    First build a table of squares of prime numbers which are lower than 2^32. This is far smaller than a table of all integers up to this limit.

    A solution would then be like this:

    boolean isPerfectSquare(long number)
    {
        if (number < 0) return false;
        if (number < 2) return true;
    
        for (int i = 0; ; i++)
        {
            long square = squareTable[i];
            if (square > number) return false;
            while (number % square == 0)
            {
                number /= square;
            }
            if (number == 1) return true;
        }
    }
    

    I guess it's a bit cryptic. What it does is checking in every step that the square of a prime number divide the input number. If it does then it divides the number by the square as long as it is possible, to remove this square from the prime decomposition. If by this process, we came to 1, then the input number was a decomposition of square of prime numbers. If the square becomes larger than the number itself, then there is no way this square, or any larger squares, can divide it, so the number can not be a decomposition of squares of prime numbers.

    Given nowadays' sqrt done in hardware and the need to compute prime numbers here, I guess this solution is way slower. But it should give better results than solution with sqrt which won't work over 2^54, as says mrzl in his answer.


    This a rework from decimal to binary of the old Marchant calculator algorithm (sorry, I don't have a reference), in Ruby, adapted specifically for this question:

    def isexactsqrt(v)
        value = v.abs
        residue = value
        root = 0
        onebit = 1
        onebit <<= 8 while (onebit < residue)
        onebit >>= 2 while (onebit > residue)
        while (onebit > 0)
            x = root + onebit
            if (residue >= x) then
                residue -= x
                root = x + onebit
            end
            root >>= 1
            onebit >>= 2
        end
        return (residue == 0)
    end
    

    Here's a workup of something similar (please don't vote me down for coding style/smells or clunky O/O - it's the algorithm that counts, and C++ is not my home language). In this case, we're looking for residue == 0:

    #include <iostream>  
    
    using namespace std;  
    typedef unsigned long long int llint;
    
    class ISqrt {           // Integer Square Root
        llint value;        // Integer whose square root is required
        llint root;         // Result: floor(sqrt(value))
        llint residue;      // Result: value-root*root
        llint onebit, x;    // Working bit, working value
    
    public:
    
        ISqrt(llint v = 2) {    // Constructor
            Root(v);            // Take the root 
        };
    
        llint Root(llint r) {   // Resets and calculates new square root
            value = r;          // Store input
            residue = value;    // Initialise for subtracting down
            root = 0;           // Clear root accumulator
    
            onebit = 1;                 // Calculate start value of counter
            onebit <<= (8*sizeof(llint)-2);         // Set up counter bit as greatest odd power of 2 
            while (onebit > residue) {onebit >>= 2; };  // Shift down until just < value
    
            while (onebit > 0) {
                x = root ^ onebit;          // Will check root+1bit (root bit corresponding to onebit is always zero)
                if (residue >= x) {         // Room to subtract?
                    residue -= x;           // Yes - deduct from residue
                    root = x + onebit;      // and step root
                };
                root >>= 1;
                onebit >>= 2;
            };
            return root;                    
        };
        llint Residue() {           // Returns residue from last calculation
            return residue;                 
        };
    };
    
    int main() {
        llint big, i, q, r, v, delta;
        big = 0; big = (big-1);         // Kludge for "big number"
        ISqrt b;                            // Make q sqrt generator
        for ( i = big; i > 0 ; i /= 7 ) {   // for several numbers
            q = b.Root(i);                  // Get the square root
            r = b.Residue();                // Get the residue
            v = q*q+r;                      // Recalc original value
            delta = v-i;                    // And diff, hopefully 0
            cout << i << ": " << q << " ++ " << r << " V: " << v << " Delta: " << delta << "\n";
        };
        return 0;
    };
    

    I'm not sure if it would be faster, or even accurate, but you could use John Carmack's Magical Square Root, algorithm to solve the square root faster. You could probably easily test this for all possible 32 bit integers, and validate that you actually got correct results, as it's only an appoximation. However, now that I think about it, using doubles is approximating also, so I'm not sure how that would come into play.


    This is the fastest Java implementation I could come up with, using a combination of techniques suggested by others in this thread.

    • Mod-256 test
    • Inexact mod-3465 test (avoids integer division at the cost of some false positives)
    • Floating-point square root, round and compare with input value

    I also experimented with these modifications but they did not help performance:

    • Additional mod-255 test
    • Dividing the input value by powers of 4
    • Fast Inverse Square Root (to work for high values of N it needs 3 iterations, enough to make it slower than the hardware square root function.)

    public class SquareTester {
    
        public static boolean isPerfectSquare(long n) {
            if (n < 0) {
                return false;
            } else {
                switch ((byte) n) {
                case -128: case -127: case -124: case -119: case -112:
                case -111: case -103: case  -95: case  -92: case  -87:
                case  -79: case  -71: case  -64: case  -63: case  -60:
                case  -55: case  -47: case  -39: case  -31: case  -28:
                case  -23: case  -15: case   -7: case    0: case    1:
                case    4: case    9: case   16: case   17: case   25:
                case   33: case   36: case   41: case   49: case   57:
                case   64: case   65: case   68: case   73: case   81:
                case   89: case   97: case  100: case  105: case  113:
                case  121:
                    long i = (n * INV3465) >>> 52;
                    if (! good3465[(int) i]) {
                        return false;
                    } else {
                        long r = round(Math.sqrt(n));
                        return r*r == n; 
                    }
                default:
                    return false;
                }
            }
        }
    
        private static int round(double x) {
            return (int) Double.doubleToRawLongBits(x + (double) (1L << 52));
        }
    
        /** 3465<sup>-1</sup> modulo 2<sup>64</sup> */
        private static final long INV3465 = 0x8ffed161732e78b9L;
    
        private static final boolean[] good3465 =
            new boolean[0x1000];
    
        static {
            for (int r = 0; r < 3465; ++ r) {
                int i = (int) ((r * r * INV3465) >>> 52);
                good3465[i] = good3465[i+1] = true;
            }
        }
    
    }
    

    You'll have to do some benchmarking. The best algorithm will depend on the distribution of your inputs.

    Your algorithm may be nearly optimal, but you might want to do a quick check to rule out some possibilities before calling your square root routine. For example, look at the last digit of your number in hex by doing a bit-wise "and." Perfect squares can only end in 0, 1, 4, or 9 in base 16, So for 75% of your inputs (assuming they are uniformly distributed) you can avoid a call to the square root in exchange for some very fast bit twiddling.

    Kip benchmarked the following code implementing the hex trick. When testing numbers 1 through 100,000,000, this code ran twice as fast as the original.

    public final static boolean isPerfectSquare(long n)
    {
        if (n < 0)
            return false;
    
        switch((int)(n & 0xF))
        {
        case 0: case 1: case 4: case 9:
            long tst = (long)Math.sqrt(n);
            return tst*tst == n;
    
        default:
            return false;
        }
    }
    

    When I tested the analogous code in C++, it actually ran slower than the original. However, when I eliminated the switch statement, the hex trick once again make the code twice as fast.

    int isPerfectSquare(int n)
    {
        int h = n & 0xF;  // h is the last hex "digit"
        if (h > 9)
            return 0;
        // Use lazy evaluation to jump out of the if statement as soon as possible
        if (h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8)
        {
            int t = (int) floor( sqrt((double) n) + 0.5 );
            return t*t == n;
        }
        return 0;
    }
    

    Eliminating the switch statement had little effect on the C# code.


    I want this function to work with all positive 64-bit signed integers

    Math.sqrt() works with doubles as input parameters, so you won't get accurate results for integers bigger than 2^53.


    An integer problem deserves an integer solution. Thus

    Do binary search on the (non-negative) integers to find the greatest integer t such that t**2 <= n. Then test whether r**2 = n exactly. This takes time O(log n).

    If you don't know how to binary search the positive integers because the set is unbounded, it's easy. You starting by computing your increasing function f (above f(t) = t**2 - n) on powers of two. When you see it turn positive, you've found an upper bound. Then you can do standard binary search.


    Examples related to java

    Under what circumstances can I call findViewById with an Options Menu / Action Bar item? How much should a function trust another function How to implement a simple scenario the OO way Two constructors How do I get some variable from another class in Java? this in equals method How to split a string in two and store it in a field How to do perspective fixing? String index out of range: 4 My eclipse won't open, i download the bundle pack it keeps saying error log

    Examples related to math

    How to do perspective fixing? How to pad a string with leading zeros in Python 3 How can I use "e" (Euler's number) and power operation in python 2.7 numpy max vs amax vs maximum Efficiently getting all divisors of a given number Using atan2 to find angle between two vectors How to calculate percentage when old value is ZERO Finding square root without using sqrt function? Exponentiation in Python - should I prefer ** operator instead of math.pow and math.sqrt? How do I get the total number of unique pairs of a set in the database?

    Examples related to optimization

    Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Measuring execution time of a function in C++ GROUP BY having MAX date How to efficiently remove duplicates from an array without using Set Storing JSON in database vs. having a new column for each key Read file As String How to write a large buffer into a binary file in C++, fast? Is optimisation level -O3 dangerous in g++? Why is processing a sorted array faster than processing an unsorted array? MySQL my.cnf performance tuning recommendations

    Examples related to perfect-square

    How to square all the values in a vector in R? Check if a number is a perfect square Fastest way to determine if an integer's square root is an integer