Just to clarify, this is not a homework problem :)
I wanted to find primes for a math application I am building & came across Sieve of Eratosthenes approach.
I have written an implementation of it in Python. But it's terribly slow. For say, if I want to find all primes less than 2 million. It takes > 20 mins. (I stopped it at this point). How can I speed this up?
def primes_sieve(limit):
limitn = limit+1
primes = range(2, limitn)
for i in primes:
factors = range(i, limitn, i)
for f in factors[1:]:
if f in primes:
primes.remove(f)
return primes
print primes_sieve(2000)
UPDATE: I ended up doing profiling on this code & found that quite a lot of time was spent on removing an element from the list. Quite understandable considering it has to traverse the entire list (worst-case) to find the element & then remove it and then readjust the list (maybe some copy goes on?). Anyway, I chucked out list for dictionary. My new implementation -
def primes_sieve1(limit):
limitn = limit+1
primes = dict()
for i in range(2, limitn): primes[i] = True
for i in primes:
factors = range(i,limitn, i)
for f in factors[1:]:
primes[f] = False
return [i for i in primes if primes[i]==True]
print primes_sieve1(2000000)
This question is related to
python
math
primes
sieve-of-eratosthenes
Using a bit of numpy
, I could find all primes below 100 million in a little over 2 seconds.
There are two key features one should note
i
only for i
up to root of n
i
to False
using x[2*i::i] = False
is much faster than an explicit python for loop.These two significantly speed up your code. For limits below one million, there is no perceptible running time.
import numpy as np
def primes(n):
x = np.ones((n+1,), dtype=np.bool)
x[0] = False
x[1] = False
for i in range(2, int(n**0.5)+1):
if x[i]:
x[2*i::i] = False
primes = np.where(x == True)[0]
return primes
print(len(primes(100_000_000)))
i think this is shortest code for finding primes with eratosthenes method
def prime(r):
n = range(2,r)
while len(n)>0:
yield n[0]
n = [x for x in n if x not in range(n[0],r,n[0])]
print(list(prime(r)))
not sure if my code is efficeient, anyone care to comment?
from math import isqrt
def isPrime(n):
if n >= 2: # cheating the 2, is 2 even prime?
for i in range(3, int(n / 2 + 1),2): # dont waste time with even numbers
if n % i == 0:
return False
return True
def primesTo(n):
x = [2] if n >= 2 else [] # cheat the only even prime
if n >= 2:
for i in range(3, n + 1,2): # dont waste time with even numbers
if isPrime(i):
x.append(i)
return x
def primes2(n): # trying to do this using set methods and the "Sieve of Eratosthenes"
base = {2} # again cheating the 2
base.update(set(range(3, n + 1, 2))) # build the base of odd numbers
for i in range(3, isqrt(n) + 1, 2): # apply the sieve
base.difference_update(set(range(2 * i, n + 1 , i)))
return list(base)
print(primesTo(10000)) # 2 different methods for comparison
print(primes2(10000))
A simple speed hack: when you define the variable "primes," set the step to 2 to skip all even numbers automatically, and set the starting point to 1.
Then you can further optimize by instead of for i in primes, use for i in primes[:round(len(primes) ** 0.5)]. That will dramatically increase performance. In addition, you can eliminate numbers ending with 5 to further increase speed.
I just came up with this. It may not be the fastest, but I'm not using anything other than straight additions and comparisons. Of course, what stops you here is the recursion limit.
def nondivsby2():
j = 1
while True:
j += 2
yield j
def nondivsbyk(k, nondivs):
j = 0
for i in nondivs:
while j < i:
j += k
if j > i:
yield i
def primes():
nd = nondivsby2()
while True:
p = next(nd)
nd = nondivsbyk(p, nd)
yield p
def main():
for p in primes():
print(p)
I realise this isn't really answering the question of how to generate primes quickly, but perhaps some will find this alternative interesting: because python provides lazy evaluation via generators, eratosthenes' sieve can be implemented exactly as stated:
def intsfrom(n):
while True:
yield n
n += 1
def sieve(ilist):
p = next(ilist)
yield p
for q in sieve(n for n in ilist if n%p != 0):
yield q
try:
for p in sieve(intsfrom(2)):
print p,
print ''
except RuntimeError as e:
print e
The try block is there because the algorithm runs until it blows the stack and without the try block the backtrace is displayed pushing the actual output you want to see off screen.
import math
def sieve(n):
primes = [True]*n
primes[0] = False
primes[1] = False
for i in range(2,int(math.sqrt(n))+1):
j = i*i
while j < n:
primes[j] = False
j = j+i
return [x for x in range(n) if primes[x] == True]
My implementation:
import math
n = 100
marked = {}
for i in range(2, int(math.sqrt(n))):
if not marked.get(i):
for x in range(i * i, n, i):
marked[x] = True
for i in range(2, n):
if not marked.get(i):
print i
Much faster:
import time
def get_primes(n):
m = n+1
#numbers = [True for i in range(m)]
numbers = [True] * m #EDIT: faster
for i in range(2, int(n**0.5 + 1)):
if numbers[i]:
for j in range(i*i, m, i):
numbers[j] = False
primes = []
for i in range(2, m):
if numbers[i]:
primes.append(i)
return primes
start = time.time()
primes = get_primes(10000)
print(time.time() - start)
print(get_primes(100))
def eratosthenes(n):
multiples = []
for i in range(2, n+1):
if i not in multiples:
print (i)
for j in range(i*i, n+1, i):
multiples.append(j)
eratosthenes(100)
The fastest implementation I could come up with:
isprime = [True]*N
isprime[0] = isprime[1] = False
for i in range(4, N, 2):
isprime[i] = False
for i in range(3, N, 2):
if isprime[i]:
for j in range(i*i, N, 2*i):
isprime[j] = False
I figured it must be possible to simply use the empty list as the terminating condition for the loop and came up with this:
limit = 100
ints = list(range(2, limit)) # Will end up empty
while len(ints) > 0:
prime = ints[0]
print prime
ints.remove(prime)
i = 2
multiple = prime * i
while multiple <= limit:
if multiple in ints:
ints.remove(multiple)
i += 1
multiple = prime * i
Here's a version that's a bit more memory-efficient (and: a proper sieve, not trial divisions). Basically, instead of keeping an array of all the numbers, and crossing out those that aren't prime, this keeps an array of counters - one for each prime it's discovered - and leap-frogging them ahead of the putative prime. That way, it uses storage proportional to the number of primes, not up to to the highest prime.
import itertools
def primes():
class counter:
def __init__ (this, n): this.n, this.current, this.isVirgin = n, n*n, True
# isVirgin means it's never been incremented
def advancePast (this, n): # return true if the counter advanced
if this.current > n:
if this.isVirgin: raise StopIteration # if this is virgin, then so will be all the subsequent counters. Don't need to iterate further.
return False
this.current += this.n # pre: this.current == n; post: this.current > n.
this.isVirgin = False # when it's gone, it's gone
return True
yield 1
multiples = []
for n in itertools.count(2):
isPrime = True
for p in (m.advancePast(n) for m in multiples):
if p: isPrime = False
if isPrime:
yield n
multiples.append (counter (n))
You'll note that primes()
is a generator, so you can keep the results in a list or you can use them directly. Here's the first n
primes:
import itertools
for k in itertools.islice (primes(), n):
print (k)
And, for completeness, here's a timer to measure the performance:
import time
def timer ():
t, k = time.process_time(), 10
for p in primes():
if p>k:
print (time.process_time()-t, " to ", p, "\n")
k *= 10
if k>100000: return
Just in case you're wondering, I also wrote primes()
as a simple iterator (using __iter__
and __next__
), and it ran at almost the same speed. Surprised me too!
Removing from the beginning of an array (list) requires moving all of the items after it down. That means that removing every element from a list in this way starting from the front is an O(n^2) operation.
You can do this much more efficiently with sets:
def primes_sieve(limit):
limitn = limit+1
not_prime = set()
primes = []
for i in range(2, limitn):
if i in not_prime:
continue
for f in range(i*2, limitn, i):
not_prime.add(f)
primes.append(i)
return primes
print primes_sieve(1000000)
... or alternatively, avoid having to rearrange the list:
def primes_sieve(limit):
limitn = limit+1
not_prime = [False] * limitn
primes = []
for i in range(2, limitn):
if not_prime[i]:
continue
for f in xrange(i*2, limitn, i):
not_prime[f] = True
primes.append(i)
return primes
Using recursion and walrus operator:
def prime_factors(n):
for i in range(2, int(n ** 0.5) + 1):
if (q_r := divmod(n, i))[1] == 0:
return [i] + factor_list(q_r[0])
return [n]
By combining contributions from many enthusiasts (including Glenn Maynard and MrHIDEn from above comments), I came up with following piece of code in python 2:
def simpleSieve(sieveSize):
#creating Sieve.
sieve = [True] * (sieveSize+1)
# 0 and 1 are not considered prime.
sieve[0] = False
sieve[1] = False
for i in xrange(2,int(math.sqrt(sieveSize))+1):
if sieve[i] == False:
continue
for pointer in xrange(i**2, sieveSize+1, i):
sieve[pointer] = False
# Sieve is left with prime numbers == True
primes = []
for i in xrange(sieveSize+1):
if sieve[i] == True:
primes.append(i)
return primes
sieveSize = input()
primes = simpleSieve(sieveSize)
Time taken for computation on my machine for different inputs in power of 10 is:
I prefer NumPy because of speed.
import numpy as np
# Find all prime numbers using Sieve of Eratosthenes
def get_primes1(n):
m = int(np.sqrt(n))
is_prime = np.ones(n, dtype=bool)
is_prime[:2] = False # 0 and 1 are not primes
for i in range(2, m):
if is_prime[i] == False:
continue
is_prime[i*i::i] = False
return np.nonzero(is_prime)[0]
# Find all prime numbers using brute-force.
def isprime(n):
''' Check if integer n is a prime '''
n = abs(int(n)) # n is a positive integer
if n < 2: # 0 and 1 are not primes
return False
if n == 2: # 2 is the only even prime number
return True
if not n & 1: # all other even numbers are not primes
return False
# Range starts with 3 and only needs to go up the square root
# of n for all odd numbers
for x in range(3, int(n**0.5)+1, 2):
if n % x == 0:
return False
return True
# To apply a function to a numpy array, one have to vectorize the function
def get_primes2(n):
vectorized_isprime = np.vectorize(isprime)
a = np.arange(n)
return a[vectorized_isprime(a)]
Check the output:
n = 100
print(get_primes1(n))
print(get_primes2(n))
[ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]
[ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]
Compare the speed of Sieve of Eratosthenes and brute-force on Jupyter Notebook. Sieve of Eratosthenes in 539 times faster than brute-force for million elements.
%timeit get_primes1(1000000)
%timeit get_primes2(1000000)
4.79 ms ± 90.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.58 s ± 31.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Probably the quickest way to have primary numbers is the following:
import sympy
list(sympy.primerange(lower, upper+1))
In case you don't need to store them, just use the code above without conversion to the list
. sympy.primerange
is a generator, so it does not consume memory.
Source: Stackoverflow.com