I have HTML webpages that I am crawling using xpath. The etree.tostring
of a certain node gives me this string:
<script>
<!--
function escramble_758(){
var a,b,c
a='+1 '
b='84-'
a+='425-'
b+='7450'
c='9'
document.write(a+c+b)
}
escramble_758()
//-->
</script>
I just need the output of escramble_758()
. I can write a regex to figure out the whole thing, but I want my code to remain tidy. What is the best alternative?
I am zipping through the following libraries, but I didnt see an exact solution. Most of them are trying to emulate browser, making things snail slow.
it's not yet possible to call a function defined in Javascript
)Edit: An example will be great.. (barebones will do)
This question is related to
javascript
python
screen-scraping
You can use js2py context to execute your js code and get output from document.write with mock document object:
import js2py
js = """
var output;
document = {
write: function(value){
output = value;
}
}
""" + your_script
context = js2py.EvalJs()
context.execute(js)
print(context.output)
You can also use Js2Py which is written in pure python and is able to both execute and translate javascript to python. Supports virtually whole JavaScript even labels, getters, setters and other rarely used features.
import js2py
js = """
function escramble_758(){
var a,b,c
a='+1 '
b='84-'
a+='425-'
b+='7450'
c='9'
document.write(a+c+b)
}
escramble_758()
""".replace("document.write", "return ")
result = js2py.eval_js(js) # executing JavaScript and converting the result to python string
Advantages of Js2Py include portability and extremely easy integration with python (since basically JavaScript is being translated to python).
To install:
pip install js2py
You can use requests-html which will download and use chromium underneath.
from requests_html import HTML
html = HTML(html="<a href='http://www.example.com/'>")
script = """
function escramble_758(){
var a,b,c
a='+1 '
b='84-'
a+='425-'
b+='7450'
c='9'
return a+c+b;
}
"""
val = html.render(script=script, reload=False)
print(val)
# +1 425-984-7450
More on this read here
quickjs should be the best option after quickjs come out. Just pip install quickjs
and you are ready to go.
modify based on the example on README.
from quickjs import Function
js = """
function escramble_758(){
var a,b,c
a='+1 '
b='84-'
a+='425-'
b+='7450'
c='9'
document.write(a+c+b)
escramble_758()
}
"""
escramble_758 = Function('escramble_758', js.replace("document.write", "return "))
print(escramble_758())
One more solution as PyV8 seems to be unmaintained and dependent on the old version of libv8.
PyMiniRacer It's a wrapper around the v8 engine and it works with the new version and is actively maintained.
pip install py-mini-racer
from py_mini_racer import py_mini_racer
ctx = py_mini_racer.MiniRacer()
ctx.eval("""
function escramble_758(){
var a,b,c
a='+1 '
b='84-'
a+='425-'
b+='7450'
c='9'
return a+c+b;
}
""")
ctx.call("escramble_758")
And yes, you have to replace document.write
with return
as others suggested
Source: Stackoverflow.com