
Getting images from a URL with urllib
In this example, we extract image locations from a web page using urllib and regular expressions. A simple way to do this is to download the HTML of the target URL and use the re module to find the src attribute of each img element in it.
You can find the following code in the extract_images_urllib.py file:
#!/usr/bin/env python3
import re
from urllib.parse import urljoin
from urllib.request import urlopen

def download_page(url):
    # Fetch the page and decode it, assuming UTF-8 content
    return urlopen(url).read().decode('utf-8')

def extract_image_locations(page):
    # Capture the src attribute value of every img tag
    img_regex = re.compile('<img[^>]+src=["\'](.*?)["\']',
                           re.IGNORECASE)
    return img_regex.findall(page)

if __name__ == '__main__':
    target_url = 'http://www.packtpub.com'
    packtpub = download_page(target_url)
    image_locations = extract_image_locations(packtpub)
    for src in image_locations:
        # Resolve relative paths against the target URL
        print(urljoin(target_url, src))
In this screenshot, we can see the script execution for the packtpub.com domain:
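To see what the regular expression and urljoin do without making a network request, the same pattern can be tried on a small HTML string. The following is a minimal sketch: the SAMPLE_PAGE fragment, the BASE_URL value, and the absolute_image_urls helper are hypothetical names made up for illustration and are not part of extract_images_urllib.py.

```python
import re
from urllib.parse import urljoin

# Same pattern as the script: capture the src value of each img tag.
IMG_REGEX = re.compile(r'<img[^>]+src=["\'](.*?)["\']', re.IGNORECASE)

# Hypothetical HTML fragment and base URL, invented for this illustration.
SAMPLE_PAGE = '''
<IMG class="logo" src="/images/logo.png">
<img alt="cover" src='covers/book.jpg'>
<img src="https://cdn.example.com/banner.gif" width="600">
'''
BASE_URL = 'http://www.example.com/shop/'

def absolute_image_urls(page, base_url):
    """Return the absolute URL of every img src found in page."""
    return [urljoin(base_url, src) for src in IMG_REGEX.findall(page)]

if __name__ == '__main__':
    for url in absolute_image_urls(SAMPLE_PAGE, BASE_URL):
        print(url)
```

The three tags cover an upper-case tag name, a single-quoted attribute, and an already absolute URL. urljoin resolves the first two against the base URL and leaves the third unchanged. A regular expression like this is adequate for simple pages, but it can miss or misread unusual markup, such as unquoted attribute values.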
