
I created a Python script that uses Selenium WebDriver to scrape a website. Now I'm trying to run this script from the web using CGI. To make sure my CGI server works, I first tried this:

import cgi
print 'Content-Type: text/html'
print
list_brand = ['VOLVO','FIAT', 'BMW']
print '<h1>TESTING CGI</h1>'
print '<form>'
print '<select>'
for brand in list_brand:
    print '<option value="' + brand + '">' + brand + '</option>'
print '</select>'
print '</form>'

It worked fine. But when I use Selenium with CGI in this script:

import cgitb
import cgi
from selenium import webdriver

print 'Content-Type: text/html'
print
cgitb.enable(display=0, logdir="C:/path/to/log/directory")
path_to_pjs = 'C:/path/to/phantomjs-2.1.1-windows/bin/phantomjs.exe'
browser = webdriver.PhantomJS(executable_path=path_to_pjs)
#Reaching to URL
url = 'http://www.website.fr/cl/2/products'
browser.get(url)
div_set = browser.find_elements_by_class_name('productname')
print '<form>'
print '<select>'
for div in div_set:
      print '<option value="' + div.find_element_vy_tag_name('h3').text + '">'+ div.find_element_vy_tag_name('h3').text +'</option>'
print '</select>'
print '</form>'

The page keeps loading but never responds. Is it even possible to run Selenium from a CGI script, and if so, why doesn't my server respond?

CC BY-SA 3.0

2 Answers


Well, I found the solution to my problem! First, I hadn't noticed that I wrote vy instead of by in my method calls; the correct name is div.find_element_by_tag_name. Second, I switched to an Apache server: for some reason, the lightweight Python server based on CGIHTTPServer didn't work. So I used XAMPP, modified its httpd.conf file, and finally added the shebang line #!/Python27/python at the top of the script.
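For newer setups: the find_element_by_* helpers were removed in Selenium 4.3 and the PhantomJS driver in Selenium 4, so the same scrape looks different today. Below is a minimal sketch, assuming Python 3, Selenium 4 and headless Chrome swapped in for PhantomJS; the URL and the productname class are the placeholders from the question, and scrape_product_names, render_select and main are hypothetical names. It also wraps the browser in try/finally so driver.quit() runs even if the page fails to load, something the original script never calls.

```python
# Sketch: Python 3 / Selenium 4 version of the question's scraping CGI script.
# Assumptions: headless Chrome replaces PhantomJS, and the URL and "productname"
# class are the placeholders from the question.
import html


def scrape_product_names(url):
    """Return the <h3> text inside every element with class "productname"."""
    # Imported here so render_select() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        divs = driver.find_elements(By.CLASS_NAME, "productname")
        # Selenium 4 spelling of the old find_element_by_tag_name('h3').
        return [div.find_element(By.TAG_NAME, "h3").text for div in divs]
    finally:
        # Always shut the browser down, even if loading or parsing failed.
        driver.quit()


def render_select(names):
    """Return a <form> with one HTML-escaped <option> per name."""
    options = "".join(
        '<option value="{0}">{0}</option>'.format(html.escape(name, quote=True))
        for name in names
    )
    return "<form><select>" + options + "</select></form>"


def main():
    """CGI entry point: header, blank line, then the generated form."""
    names = scrape_product_names("http://www.website.fr/cl/2/products")
    print("Content-Type: text/html")
    print()
    print(render_select(names))
```

The CGI script itself would end with a call to main(). Keeping the HTML rendering separate from the browser code means it can be checked without starting Chrome, and escaping matters because the option text now comes from someone else's page.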

CC BY-SA 4.0

That may have worked in 2017, but in 2024 I can't get a CGI script run by Apache HTTP Server (as the www-data user) to import selenium. With this CGI script:

#!/usr/bin/env python3
import cgi
#from selenium import webdriver
#import selenium
print("Content-type: text/plain")
print()
print("webserver test")

Uncommenting either "from selenium import webdriver" or "import selenium" results in an HTTP 500 Internal Server Error. Yet the same imports run without error from a shell: $ python3 -c "import selenium;from selenium import webdriver;print('test bash')"
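If you hit the same 500, it may help to see the actual exception rather than guess. One plausible cause (an assumption, not verified for this setup) is that selenium was installed with pip for your own user, e.g. under ~/.local, so it is not on sys.path for www-data. The sketch below is a hypothetical diagnostic CGI script (import_report is an illustrative name) that prints the interpreter, sys.path and the import traceback as plain text instead of failing; on Debian-based systems the traceback should also show up in /var/log/apache2/error.log.

```python
#!/usr/bin/env python3
# Diagnostic CGI sketch: show why an import fails instead of returning HTTP 500.
import importlib
import sys
import traceback


def import_report(module_name):
    """Try to import module_name and return a plain-text report."""
    lines = [
        "python: {}".format(sys.executable),
        "version: " + sys.version.split()[0],
        "sys.path:",
    ]
    lines += ["  " + entry for entry in sys.path]
    try:
        module = importlib.import_module(module_name)
        location = getattr(module, "__file__", None) or "no __file__"
        lines.append("import {}: OK ({})".format(module_name, location))
    except Exception:
        # Catch everything so the traceback reaches the browser, not just the error log.
        lines.append("import {}: FAILED".format(module_name))
        lines.append(traceback.format_exc())
    return "\n".join(lines)


if __name__ == "__main__":
    print("Content-type: text/plain")
    print()
    print(import_report("selenium"))
```

Comparing the sys.path printed here with the one from your own shell usually shows whether www-data can see the directory selenium was installed into.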

My current workaround on GNU/Linux is the following. It is far from perfect:

  1. Run $ crontab -e and add the line * * * * * /path/to/run.sh (this runs run.sh every minute).
  2. Make run.sh executable ($ chmod +x run.sh).
  3. Contents of "run.sh":
#!/usr/bin/env bash
export DISPLAY=:0
if [[ $(cat /path/to/run1) == "Yes do it" ]]; then
    python3 -c "from selenium import webdriver;options=webdriver.ChromeOptions();options.binary_location=\"/usr/bin/brave-browser\";driver=webdriver.Chrome(options=options);driver.get(\"$(cat /path/to/run2)\");"
fi
  4. Replace "/path/to/run1" and "/path/to/run2" with the actual paths of two empty text files of your choice. Both files must be writable by the web server user, e.g. with 777 permissions ($ chmod 777 run1 run2) or similar.
  5. Create these two files in "/usr/lib/cgi-bin/": urlon.sh and urloff.sh
  6. Contents of "urlon.sh" (set to executable):
#!/bin/bash
echo "Content-type: text/plain"
echo
url="$(echo -n "$REQUEST_URI" | sed "s/.*?url=//g")"
echo "Yes do it" > /path/to/run1
echo "$url" > /path/to/run2
echo "URL: $url"
  7. Contents of "urloff.sh" (set to executable):
#!/bin/bash
echo "Content-type: text/plain"
echo
echo > /path/to/run1
echo "Disabled"
  8. Usage: $ curl -kL 'https://10.0.0.199/cgi-bin/urlon.sh?url=https://example.com' prints "URL: https://example.com", and $ curl -kL https://10.0.0.199/cgi-bin/urloff.sh prints "Disabled". Remember to disable it afterwards, otherwise the cron job opens the URL again every minute. Also, I'm not sure whether this still works once XScreenSaver or the login screen lock kicks in.
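The same flag-file handshake can also be written in Python, which avoids two quirks of the shell version: the sed call passes the URL through without percent-decoding it, and anything after ?url= (including other parameters) is accepted unchecked. A sketch under those assumptions: FLAG_FILE and URL_FILE stand for the /path/to/run1 and /path/to/run2 placeholders, and extract_url, handle_request and pending_url are hypothetical helper names. Saved as an executable CGI script it would play the role of urlon.sh, and the cron side could call pending_url() before starting the browser.

```python
#!/usr/bin/env python3
# Sketch: the run1/run2 flag-file handshake written in Python.
# FLAG_FILE / URL_FILE stand for /path/to/run1 and /path/to/run2 (placeholders);
# extract_url, handle_request and pending_url are hypothetical helper names.
import os
from pathlib import Path
from urllib.parse import parse_qs, urlsplit

FLAG_FILE = Path("/path/to/run1")
URL_FILE = Path("/path/to/run2")
ENABLED = "Yes do it"


def extract_url(query_string):
    """CGI side (replaces the sed call): decoded http(s) 'url' parameter, or None."""
    values = parse_qs(query_string).get("url")
    if not values:
        return None
    url = values[0]
    # Only accept web URLs, since the value is later handed to a browser.
    return url if urlsplit(url).scheme in ("http", "https") else None


def handle_request(query_string, flag_file=FLAG_FILE, url_file=URL_FILE):
    """Write the flag and URL files like urlon.sh does; return the response body."""
    url = extract_url(query_string)
    if url is None:
        return "No valid url parameter"
    flag_file.write_text(ENABLED + "\n")
    url_file.write_text(url + "\n")
    return "URL: " + url


def pending_url(flag_file=FLAG_FILE, url_file=URL_FILE):
    """Cron side (replaces the [[ ... ]] test in run.sh): URL to open, or None."""
    try:
        enabled = flag_file.read_text().strip() == ENABLED
        url = url_file.read_text().strip()
    except OSError:
        return None
    return url if enabled and url else None


if __name__ == "__main__":
    # Used as a CGI script: header, blank line, then the body.
    print("Content-type: text/plain")
    print()
    print(handle_request(os.environ.get("QUERY_STRING", "")))
```

Because parse_qs treats & as a separator, a target URL that carries its own query string has to be percent-encoded by the caller, for example with curl's -G and --data-urlencode options.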
CC BY-SA 4.0
