Could an expert help me look at this Python problem?

2024-10-31 06:16:42

2 users answered

User (1):

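First, just from looking at your regex, it is clear that it won't match exactly the content you want, because regex quantifiers are greedy by default. You can match it like this instead: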

import re

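# I have tested the following against the site you gave, and it matches correctly
# If you need anything else, just ask a follow-up question
# Matches a double-quoted http: URL ending in .jpg or .jpeg
IMG = re.compile(r'"http:[^" ]+\.jpe?g"')
imglist = re.findall(IMG, html)  # assuming html is the content of the page you downloaded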
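
One thing to note about the pattern above: the surrounding double quotes are part of the match, so each result comes back with the quotes attached. A minimal sketch of one way around that, assuming Python 3 and a made-up input string (`sample_html` and `IMG_URL` are illustrative names, not from the original post): put the URL inside a capture group, and `re.findall` returns only the group.

```python
import re

# Hypothetical sample markup, made up for illustration.
sample_html = '<img src="http://example.com/a.jpg"><img src="http://example.com/b.jpg">'

# The parentheses form a capture group; when a pattern has exactly one group,
# re.findall returns the group contents instead of the whole match.
IMG_URL = re.compile(r'"(http:[^" ]+\.jpg)"')

urls = IMG_URL.findall(sample_html)
print(urls)
```

Here `urls` holds the two bare URLs without quotes, which can be passed straight to a download function.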

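# Your download function doesn't look very robust, so here is mine (it imitates a browser
# when sending the request, which helps avoid being blocked; I use it often and it has been stable so far)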
import time
import socket
import urlparse
import urllib2

def dowload(url, trynum=2):
    print 'Downloading:', url
    # Send a browser-like User-Agent so the site is less likely to block the request
    user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'
    headers = {'User-agent': user_agent}
    request = urllib2.Request(url, headers=headers)
    try:
        html = urllib2.urlopen(request, timeout=10).read()
    except (urllib2.URLError, socket.timeout):
        html = None
        # Pause, then retry while attempts remain
        if trynum > 0:
            time.sleep(5)
            return dowload(url, trynum - 1)
    if not html:
        print 'Error: Failed to download the url: %s' % (url)
    return html
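
The function above is written for Python 2 (`urllib2`, `print` statements). A rough Python 3 equivalent follows, offered as a sketch rather than a drop-in replacement. It assumes `urllib.request` as the stand-in for `urllib2`, uses a loop instead of recursion for the retries, and the names `download`, `retries`, `delay`, and `USER_AGENT` are illustrative choices that do not appear in the original post.

```python
import socket
import time
import urllib.error
import urllib.request

USER_AGENT = ('Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 '
              '(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11')


def download(url, retries=2, delay=5):
    """Return the response body as bytes, or None if every attempt fails."""
    request = urllib.request.Request(url, headers={'User-Agent': USER_AGENT})
    for attempt in range(retries + 1):
        print('Downloading:', url)
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                return response.read()
        except (urllib.error.URLError, socket.timeout):
            # Wait before the next attempt, but not after the last one.
            if attempt < retries:
                time.sleep(delay)
    print('Error: Failed to download the url: %s' % url)
    return None
```

Note that in Python 3 the result is `bytes`, so it needs a `.decode(...)` with the page's encoding before a `str` regex can be applied to it.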

User (2):

Your regex uses greedy matching, so it won't match what you want.
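
To see the difference concretely, here is a small sketch in Python 3 using a made-up string (`sample` is illustrative; the asker's actual regex and page are not shown in this thread). A greedy `.+` runs on to the last possible closing quote, while the lazy `.+?` stops at the first one.

```python
import re

# Hypothetical input with two quoted image URLs on one line.
sample = '<img src="http://example.com/a.jpg"><img src="http://example.com/b.jpg">'

# Greedy: .+ grabs as much as it can, so a single match spans both URLs.
greedy = re.findall(r'"http:.+\.jpg"', sample)

# Lazy: .+? grabs as little as it can, so each URL is matched separately.
lazy = re.findall(r'"http:.+?\.jpg"', sample)

print(len(greedy), greedy)
print(len(lazy), lazy)
```

With this input, `greedy` contains one long match running from the first URL through the second, and `lazy` contains the two URLs as separate matches.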