Learning Python Networking (Second Edition)

Getting images from a URL with urllib

This example shows how to extract image locations from a web page using urllib and regular expressions. A simple approach is to download the page with urllib and then use the re module to find the src attribute of each img element in the returned HTML.
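To see what such a pattern captures before involving the network, the following is a minimal sketch that applies the same kind of regular expression to an inline HTML fragment. The fragment, the cdn.example.com host, and the find_image_sources name are assumptions made for illustration only.

```python
import re

# Hypothetical HTML fragment, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <img src="/images/logo.png" alt="Logo">
  <IMG class='banner' SRC='https://cdn.example.com/banner.jpg'>
  <p>No image here</p>
</body></html>
"""

# Capture the value of the src attribute of every img tag, in any letter case.
IMG_REGEX = re.compile('<img[^>]+src=["\'](.*?)["\']', re.IGNORECASE)


def find_image_sources(html):
    """Return the src values of all img tags found in html (hypothetical helper)."""
    return IMG_REGEX.findall(html)


if __name__ == '__main__':
    for src in find_image_sources(SAMPLE_HTML):
        print(src)
```

Because the pattern only looks at the text of the page, it handles single or double quotes and upper or lower case tags, but it can miss images in unusual markup, such as unquoted src values.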

You can find the following code in the extract_images_urllib.py file:

#!/usr/bin/env python3

import re
from urllib.parse import urljoin
from urllib.request import urlopen

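def download_page(url):
    # Fetch the page and decode it, assuming it is UTF-8 encoded
    return urlopen(url).read().decode('utf-8')

def extract_image_locations(page):
    # Capture the src attribute value of every img tag, ignoring case
    img_regex = re.compile('<img[^>]+src=["\'](.*?)["\']',
                           re.IGNORECASE)
    return img_regex.findall(page)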

if __name__ == '__main__':
    target_url = 'http://www.packtpub.com'
    packtpub = download_page(target_url)
    image_locations = extract_image_locations(packtpub)
    # Resolve each src against the page URL to print an absolute URL
    for src in image_locations:
        print(urljoin(target_url, src))
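The final loop relies on urljoin because src values are often relative to the page. As a rough sketch of that step, the following resolves a few src values against the base URL used in the script. The src values and the resolve_sources name are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.packtpub.com'

# Hypothetical src values of the kinds a page may contain.
EXAMPLE_SOURCES = [
    '/images/logo.png',                   # root-relative path
    'static/banner.jpg',                  # path relative to the page
    '//cdn.example.com/icon.svg',         # protocol-relative URL
    'https://cdn.example.com/photo.jpg',  # already absolute
]


def resolve_sources(base_url, sources):
    """Return an absolute URL for each src value (hypothetical helper)."""
    return [urljoin(base_url, src) for src in sources]


if __name__ == '__main__':
    for url in resolve_sources(BASE_URL, EXAMPLE_SOURCES):
        print(url)
```

Relative paths are joined to the base URL, a protocol-relative value takes its scheme from the base, and an absolute URL passes through unchanged. This is why the script can print a usable address for every image regardless of how the page wrote its src attributes.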

The following screenshot shows the output of the extract_images_urllib.py script for the packtpub.com domain: