[Code] A full-featured Http_Client class in pure PHP (supports PHP 4 and PHP 5)

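Fetching HTTP content from PHP is simple and can be done in many ways, but this code still has the following distinguishing features:
1. Implemented in pure PHP, with no dependency on any third-party library or extension; compatible with every version from PHP 4.1 onward.
2. Can set or read any HTTP header.
3. Includes full-featured, reasonably smart cookie handling: across multiple requests on the same instance the necessary cookies are sent automatically, and cookie data can be exchanged with external files (import and export).
4. Handles HTTP 301 and 302 redirects automatically.
5. Best of all, it supports Keep-Alive HTTP connections, which suits cases where one run makes many requests to the same host, such as scraping.
6. The built-in Download() method can resume interrupted file downloads.
7. Supports uploading any number of files via POST, sending array fields, and so on.
8. Supports SSL.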
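As an illustration of the array fields mentioned in feature 7, here is a minimal sketch of how a nested PHP array can be flattened into `name[sub][sub2]` style POST field names. The function name `flatten_post_field` is hypothetical and not part of the class; the class does the equivalent work internally in `addPostField()`.

```php
<?php
// Hypothetical standalone helper (not part of Http_Client): flattens a nested
// array into "key[sub][sub2]" => value pairs, the naming used for array
// fields in an HTML form post.
function flatten_post_field($key, $value)
{
    if (!is_array($value)) {
        return array($key => strval($value));
    }
    $fields = array();
    foreach ($value as $subKey => $subValue) {
        // Recurse with the bracketed name; the keys are unique strings,
        // so the array union operator simply appends them in order.
        $fields += flatten_post_field($key . '[' . $subKey . ']', $subValue);
    }
    return $fields;
}

$fields = flatten_post_field('user', array(
    'name' => 'tom',
    'tags' => array('a', 'b'),
));
// $fields now holds the keys user[name], user[tags][0] and user[tags][1]
```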

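For other uses, see the comments in the code. The English comments follow PHPDoc style; the English may not always be accurate or grammatical, so please bear with me.
I hope you find it useful.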

/**
 * Full featured Http Client class in pure PHP (4.1+)
 *
 * API list:
 * Object  $http = new Http_Client([bool $verbose = false]);
 * integer $http->getStatus();
 * string  $http->getTitle();
 * string  $http->getUrl();
 * string  $http->getFilepath();
 * void    $http->setHeader(string $key[, string $value = null]);
 * mixed   $http->getHeader([string $key = null]);
 * void    $http->setCookie(string $key, string $value);
 * mixed   $http->getCookie([string $key = null[, string $host = null]]);
 * bool    $http->saveCookie(string $filepath);
 * bool    $http->loadCookie(string $filepath);
 * void    $http->addPostField(string $key, mixed $value);
 * void    $http->addPostFile(string $key, string $filename[, string $content = '']);
 * string  $http->Get(string $url[, bool $redirect = true]);
 * mixed   $http->Head(string $url[, bool $redirect = false]);
 * string  $http->Post(string $url[, bool $redirect = true]);
 * bool    $http->Download(string $url[, string $filepath = null[, bool $overwrite = false]]);
 *
 * @author hightman
 * @link http://www.hightman.cn/
 * @copyright Copyright © 2008-2010 Twomice Studio
 * @version $Id: http_client.class.php,v 1.22 2010/10/16 16:42:47 hightman Exp $
 */

/**
 * Defines the package name.
 */
define ('HC_PACKAGENAME',	'HttpClient');
/**
 * Defines the package version.
 */
define ('HC_VERSION',		'2.0-beta');
/**
 * Defines how many times an I/O operation is retried on failure (timeout or error).
 * Defaults to 3; it should be greater than 0.
 */
define ('HC_MAX_RETRIES',	3);

/**
 * Http_Client is a full-featured client class for the HTTP protocol.
 *
 * It currently implements part of HTTP/1.x, including the request methods HEAD, GET and POST,
 * and automatic handling of authorization, redirects and cookies.
 *
 * Features include:
 * 1) Pure PHP code; no extensions are required, and any PHP version from 4.1.0 works;
 * 2) Ability to set/get any HTTP request header, such as user-agent, referer, etc.;
 * 3) Full-featured cookie support; the cookie header is sent automatically on later requests when required;
 * 4) Redirected requests (such as HTTP/301, HTTP/302) are handled automatically;
 * 5) Real Keep-Alive connections, reused across multiple requests;
 * 6) Partially downloaded files can be resumed with the Download() method;
 * 7) Multiple file uploads via the POST method, and array-named request variables (arr[]=...);
 * 8) SSL support.
 *
 * The whole library code is open and free; you can use it for any purpose.
 *
 * @author hightman
 * @version 2.0-beta
 */
class Http_Client
{
	/**
	 * local private variables
	 * @access private
	 */
	var $headers, $status, $title, $cookies, $socks, $url, $filepath, $verbose;
	var $post_files, $post_fields;
	
	/** 
	 * Constructor (PHP4-style).
	 * @param boolean whether to display verbose execution messages
	 */
	function Http_Client($verbose = false)
	{
		$this->__construct($verbose);
	}
	
	/** 
	 * Constructor (PHP5).
	 * @param boolean whether to display verbose execution messages
	 */
	function __construct($verbose = false)
	{
		$this->verbose = $verbose;
		$this->cookies = array();
		$this->socks = array();	
		$this->_reset();
	}

	/** 
	 * Destructor (PHP5 only).
	 * Close all opened socket connections.
	 */
	function __destruct()
	{
		foreach ($this->socks as $host => $sock) { @fclose($sock); }
		$this->socks = array();
	}

	/** 
	 * Get the HTTP response status code of the last HTTP request.
	 * @return integer HTTP response status code
	 */
	function getStatus()
	{
		return $this->status;
	}

	/** 
	 * Get the short HTTP response title (reason phrase) of the last HTTP request.
	 * @return string HTTP response short title
	 */
	function getTitle()
	{
		return $this->title;
	}
	
	/** 
	 * Get the real URL of the last HTTP request.
	 * @return string real URL of the last http request after redirecting
	 */	
	function getUrl()
	{
		return $this->url;
	}

	/** 
	 * Get the downloaded file path after calling Download() method.
	 * @return string filepath saved on local disk
	 */	
	function getFilepath()
	{
		return $this->filepath;
	}

	/** 
	 * Set a HTTP header for the next request.
	 * @param string the name of the request header
	 * @param string the value of the request header
	 * If the value is NULL, the header will be dropped.
	 * Note: the special key 'x-server-addr' forces the given address to be used
	 * instead of gethostbyname(host).
	 */
	function setHeader($key, $value = null)
	{
		$this->_reset();
		$key = strtolower($key);
		if (is_null($value)) unset($this->headers[$key]);
		else $this->headers[$key] = strval($value);
	}
	
	/** 
	 * Get one or more HTTP headers of the last request.
	 * @param string the name of the header to be fetched.
	 * If it is NULL, all headers of the last request are returned.
	 * @return mixed the fetched header value, or all headers as a key-value array.
	 * If the header does not exist, NULL is returned.
	 */	
	function getHeader($key = null)
	{
		if (is_null($key)) return $this->headers;
		$key = strtolower($key);
		if (!isset($this->headers[$key])) return null;
		return $this->headers[$key];
	}

	/** 
	 * Add a HTTP cookie sent for the next request.
	 * @param string the name of the cookie to be added
	 * @param string the value of the cookie to be added
	 */
	function setCookie($key, $value)
	{
		$this->_reset();
		if (!isset($this->headers['cookie'])) $this->headers['cookie'] = array();
		$this->headers['cookie'][$key] = $value;
	}

	/** 
	 * Get a HTTP cookie item by name
	 * @param string the name of the cookie to be fetched
	 * If the name is NULL, all matched cookies are returned as a key-value array.
	 * @param string the host whose saved cookies (including expired ones) are searched
	 * If the host is NULL, the cookie is fetched from the last request.
	 * @return mixed the fetched cookie item, or cookies as a key-value array.
	 * Every cookie item is an assoc array whose keys include: value, expires, path, domain
	 * If the cookie does not exist, NULL is returned.
	 */
	function getCookie($key = null, $host = null)
	{
		// fetch from last request
		// (cookie names are case-sensitive and stored as received, so $key is used as given)
		if (is_null($host))
		{
			if (!isset($this->headers['cookie'])) return null;
			if (is_null($key)) return $this->headers['cookie'];
			if (!isset($this->headers['cookie'][$key])) return null;
			return $this->headers['cookie'][$key];
		}
		// fetch from all saved cookies.
		$host = strtolower($host);
		while (true)
		{
			if (isset($this->cookies[$host]))
			{
				if (is_null($key)) return $this->cookies[$host];
				if (isset($this->cookies[$host][$key])) return $this->cookies[$host][$key];
			}
			// search for next sub-domain
			$pos = strpos($host, '.', 1);
			if ($pos === false) break;
			$host = substr($host, $pos);
		}
		return null;
	}

	/** 
	 * Save all cookies to a file.
	 * @param string the file path that cookies will be saved to.
	 * @return boolean save result, true on success and false on failure.
	 * Note: all cookies are serialized before saving.
	 */
	function saveCookie($fpath)
	{
		if (false === ($fd = @fopen($fpath, 'w')))
			return false;
		$data = serialize($this->cookies);
		fwrite($fd, $data);
		fclose($fd);
		return true;
	}

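	/** 
	 * Load cookies from a file
	 * @param string the file path that the cookies were saved to.
	 * The cookie file should be created by the saveCookie() method.
	 * @return boolean true on success and false on failure.
	 */
	function loadCookie($fpath)
	{
		// fixed typo: the function is unserialize(), not unserialzie()
		if (file_exists($fpath) && is_array($cookies = @unserialize(file_get_contents($fpath))))
		{
			$this->cookies = $cookies;
			return true;
		}
		return false;
	}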

	/** 
	 * Add a post field for the next request
	 * @param string the name of the field.
	 * @param mixed the value of the field, can be array or string.
	 * If the value is an array, converted to arr[key][key2] fields automatically.
	 */
	function addPostField($key, $value)
	{
		$this->_reset();
		if (!is_array($value))
			$this->post_fields[$key] = strval($value);
		else
		{
			$value = $this->_format_array_field($value);
			foreach ($value as $tmpk => $tmpv)
			{
				$tmpk = $key . '[' . $tmpk . ']';
				$this->post_fields[$tmpk] = strval($tmpv);
			}
		}
	}

	/**
	 * Add a multipart post file for the next request
	 * @param string the name of the field
	 * @param string the filename or filepath to be uploaded
	 * @param string the file content
	 * If the content is an empty string and fname is a valid filepath,
	 * the content is read from that file.
	 */
	function addPostFile($key, $fname, $content = '')
	{
		$this->_reset();
		if ($content === '' && is_file($fname)) $content = @file_get_contents($fname);
		$this->post_files[$key] = array(basename($fname), $content);
	}

	/**
	 * Do a http request via get method
	 * @param string the absolute URL
	 * @param boolean handle redirected requests automatically or not
	 * @return string response body data, or false if the request fails before the server responds.
	 */
	function Get($url, $redir = true)
	{
		return $this->_do_url($url, 'get', null, $redir);
	}

	/**
	 * Do a http request via head method
	 * @param string the absolute URL
	 * @param boolean handle redirected requests automatically or not	 
	 * @return mixed all response HTTP headers, or false if the request fails before the server responds.
	 */
	function Head($url, $redir = false)
	{
		if ($this->_do_url($url, 'head', null, $redir) !== false)
			return $this->getHeader(null);
		return false;
	}

	/**
	 * Do a http request via post method
	 * @param string the absolute URL
	 * @param boolean handle redirected requests automatically or not
	 * @return string response body data, or false if the request fails before the server responds.
	 * Note: post request variable should be set by ::addPostField() and ::addPostFile()
	 */
	function Post($url, $redir = true)
	{
		$data = '';
		if (count($this->post_files) > 0)
		{
			$boundary = md5($url . microtime());
			foreach ($this->post_fields as $tmpk => $tmpv)
			{
				$data .= "--{$boundary}\r\nContent-Disposition: form-data; name=\"{$tmpk}\"\r\n\r\n{$tmpv}\r\n";
			}
			foreach ($this->post_files as $tmpk => $tmpv)
			{
				$type = 'application/octet-stream';
				$ext = strtolower(substr($tmpv[0], strrpos($tmpv[0],'.')+1));
				if (isset($GLOBALS['___HC_MIMES___'][$ext])) $type = $GLOBALS['___HC_MIMES___'][$ext];
				$data .= "--{$boundary}\r\nContent-Disposition: form-data; name=\"{$tmpk}\"; filename=\"{$tmpv[0]}\"\r\nContent-Type: $type\r\nContent-Transfer-Encoding: binary\r\n\r\n";
				$data .= $tmpv[1] . "\r\n";
			}
			$data .= "--{$boundary}--\r\n";
			$this->setHeader('content-type', 'multipart/form-data; boundary=' . $boundary);
		}
		else
		{
			foreach ($this->post_fields as $tmpk => $tmpv)
			{
				$data .= '&' . rawurlencode($tmpk) . '=' . rawurlencode($tmpv);
			}
			$data = substr($data, 1);
			$this->setHeader('content-type', 'application/x-www-form-urlencoded');
		}
		$this->setHeader('content-length', strlen($data));
		return $this->_do_url($url, 'post', $data, $redir);
	}

	/**
	 * Download a file to local disk via the GET method, with range support
	 * @param string the absolute URL
	 * @param string local filepath to save to; defaults to the same filename in the current working directory.
	 * @param boolean whether to overwrite the existing file
	 * when filepath exists and is not a valid partially-downloaded file.
	 * @return boolean true on success and false on failure.
	 * Note: this method can be used to resume getting a partially-downloaded file.
	 */
	function Download($url, $filepath = null, $overwrite = false)
	{
		// get filepath & head
		if ($filepath === true)
		{
			$overwrite = true; 
			$filepath = null;
		}
		if (is_null($filepath) || empty($filepath)) $filepath = '.';
		// get normal headers first
		$savehead = $this->getHeader(null);
		if (!$this->Head($url, true))
		{
			if ($this->verbose) echo "[ERROR] failed to get headers for downloading file.\n";
			return false;
		}
		else if ($this->getStatus() != 200)
		{
			if ($this->verbose) echo "[ERROR] can not get a valid 200 HTTP respond status.\n";
			return false;
		}
		// get filename & filesize
		$url = $this->getUrl();
		if ($this->verbose) echo "[INFO] real download url is: $url\n";
		if (is_dir($filepath))
		{
			if (substr($filepath, -1, 1) != DIRECTORY_SEPARATOR) $filepath .= DIRECTORY_SEPARATOR;		
			if (($disposition = $this->getHeader('content-disposition')) 
				&& preg_match('/filename=[\'"]?([^;\'" ]+)/', $disposition, $match))
			{
				$filename = $match[1];
				if ($this->verbose) echo "[INFO] fetch filename from disposition header: $filename\n";
			}
			else
			{
				$tmpstr = ($pos = strpos($url, '?')) ? substr($url, 0, $pos) : $url;
				$pos = strrpos($tmpstr, '/');
				$filename = substr($tmpstr, $pos + 1);
				if ($filename == '') $filename = 'index.html';
				if ($this->verbose) echo "[INFO] fetch filename from URL: $filename\n";
			}
			while (true)
			{
				$filepath .= $filename;
				if (!is_dir($filepath)) break;
				$filepath .= DIRECTORY_SEPARATOR . $filename;
			}
		}
		// check filepath
		if (!file_exists($filepath) || !($fsize = @filesize($filepath)))
		{
			$savefd = @fopen($filepath, 'w');
			if ($this->verbose) echo "[INFO] save file directly to: $filepath\n";
		}
		else
		{
			$length = $this->getHeader('content-length');
			$accept = $this->getHeader('accept-ranges');
			if ($length && $fsize < $length && stristr($accept, 'bytes'))
			{
				// range request used
				$this->setHeader('range', 'bytes=' . $fsize . '-');
				$savefd = @fopen($filepath, 'a');
				if ($this->verbose) echo "[INFO] range download used, range: {$fsize}-\n";
			}
			else if ($overwrite)
			{
				$savefd = @fopen($filepath, 'w');
				if ($this->verbose) echo "[INFO] overwrite the exists file: $filepath\n";
			}
			else
			{
				// auto append filename '.1, .2, ...'
				for ($i = 1; @file_exists($filepath . '.' . $i); $i++);
				$filepath .= '.' . $i;
				$savefd = @fopen($filepath, 'w');
				if ($this->verbose) echo "[INFO] auto skip exists file, last save to: $filepath\n";
			}
		}
		// check the savefd
		if (!$savefd)
		{
			// $filename is only set when $filepath was a directory, so report $filepath
			if ($this->verbose) echo "[ERROR] can not open the file to save data: $filepath\n";
			return false;
		}
		// do real download via get method
		foreach ($savehead as $hk => $hv) $this->setHeader($hk, $hv);
		if ($this->_do_url($url, 'get', null, false, $savefd) !== false)
		{
			$this->filepath = $filepath;
			fclose($savefd);
			if ($this->verbose) echo "[INFO] downloaded file saved in: $filepath\n";
			return true;
		}
		else
		{
			// close the local file on failure as well
			@fclose($savefd);
			if ($this->verbose) echo "[ERROR] can not download the URL: $url\n";
			return false;
		}
	}
	
	// -------------------------------------------------
	// private functions
	// -------------------------------------------------
	// read data from socket
	function _sock_read($fd, $maxlen = 4096, $wfd = false)
	{
		$rlen = 0;
		$data = '';
		$ntry = HC_MAX_RETRIES;
		while (!feof($fd))
		{
			$part = fread($fd, $maxlen - $rlen);
			if ($part === false || $part === '') $ntry--;
			else $data .= $part;
			$rlen = strlen($data);
			if ($rlen == $maxlen || $ntry == 0) break;
		}
		if ($ntry == 0 || feof($fd)) @fclose($fd);
		if (is_resource($wfd))
		{
			fwrite($wfd, $data);
			$data = '';
		}
		return $data;
	}

	// write data to socket
	function _sock_write($fd, $buf)
	{
		$wlen = 0;
		$tlen = strlen($buf);
		$ntry = HC_MAX_RETRIES;
		while ($wlen < $tlen)
		{
			$nlen = fwrite($fd, substr($buf, $wlen), $tlen - $wlen);
			if (!$nlen) { if (--$ntry == 0) return false; }
			else $wlen += $nlen;
		}
		return true;
	}

	// reset some request data (status)
	function _reset()
	{
		if ($this->status !== 0) 
		{
			$this->status = 0;
			$this->url = $this->title = $this->filepath = null;
			$this->headers = $this->post_files = $this->post_fields = array();
		}
	}
	
	// check whether a host belongs to a (cookie) domain
	function _belong_domain($host, $domain)
	{
		if (!strcasecmp($domain, $host)) return true;
		if (substr($domain, 0, 1) == '.')
		{
			if (!strcasecmp($host, substr($domain, 1))) return true;
			$hlen = strlen($host);
			$dlen = strlen($domain);
			if ($hlen > $dlen && !strcasecmp(substr($host, $hlen - $dlen), $domain))
				return true;
		}
		return false;
	}

	// format array field (convert N-DIM(n>=2) array => 2-DIM array)
	function _format_array_field($value, $pk = NULL)
	{
		$ret = array();
		foreach ($value as $k => $v)
		{
			$k = (is_null($pk) ? $k : $pk . $k);
			if (is_array($v)) $ret += $this->_format_array_field($v, $k . '][');
			else $ret[$k] = $v;
		}
		return $ret;
	}

	// do a url method
	function _do_url($url, $method, $data = null, $redir = true, $savefd = false)
	{
		// check the url
		if (strncasecmp($url, 'http://', 7) && strncasecmp($url, 'https://', 8) && isset($_SERVER['HTTP_HOST']))
		{
			$base = 'http://' . $_SERVER['HTTP_HOST'];
			if (substr($url, 0, 1) != '/')
				$url = substr($_SERVER['SCRIPT_NAME'], 0, strrpos($_SERVER['SCRIPT_NAME'], '/')+1) . $url;			
			$url = $base . $url;
		}

		// parse the url
		// decode HTML-escaped ampersands that may come from copied links
		$url = str_replace('&amp;', '&', $url);
		$pa = @parse_url($url);
		if ($pa['scheme'] && $pa['scheme'] != 'http' && $pa['scheme'] != 'https')
		{
			trigger_error("Invalid scheme `{$pa['scheme']}`", E_USER_WARNING);
			return false;
		}
		if (!isset($pa['host']))
		{
			trigger_error("Invalid request url, host required", E_USER_WARNING);
			return false;
		}
		if (!isset($pa['port'])) $pa['port'] = ($pa['scheme'] == 'https' ? 443 : 80);
		if (!isset($pa['path']))
		{
			$pa['path'] = '/';
			$url .= '/';
		}
		$host = strtolower($pa['host']);
		if (isset($this->headers['x-server-addr'])) $addr = $this->headers['x-server-addr'];
		else $addr = gethostbyname($pa['host']);
		$port = intval($pa['port']);
		$skey = $addr . ':' . $port;
		if ($pa['scheme'] && $pa['scheme'] == 'https') $host_conn = 'ssl://' . $addr;
		else $host_conn = 'tcp://' . $addr;

		// make the query buffer
		$method = strtoupper($method);
		$buf = $method . ' ' . $pa['path'];
		if (isset($pa['query'])) $buf .= '?' . $pa['query'];
		$buf .= " HTTP/1.1\r\nHost: {$host}\r\n";
		
		// basic auth support
		if (isset($pa['user']) && isset($pa['pass']))
			$this->headers['authorization'] = 'Basic ' . base64_encode($pa['user'] . ':' . $pa['pass']);

		// set default HTTP/headers
		$savehead = $this->headers;
		$this->_reset();
		if (!isset($this->headers['user-agent'])) 
		{
			$buf .= "User-Agent: Mozilla/5.0 (Compatible; " . HC_PACKAGENAME . "/" . HC_VERSION . "; +Hightman) ";
			$buf .= "php-" . php_sapi_name() . "/" . phpversion() . " ";
			$buf .= php_uname("s") . "/" . php_uname("r") . "\r\n";
		}
		if (!isset($this->headers['accept'])) $buf .= "Accept: */*\r\n";
		if (!isset($this->headers['accept-language'])) $buf .= "Accept-Language: zh-cn,zh\r\n";
		if (!isset($this->headers['connection'])) $buf .= "Connection: Keep-Alive\r\n";
		if (isset($this->headers['accept-encoding'])) unset($this->headers['accept-encoding']);
		if (isset($this->headers['host'])) unset($this->headers['host']);

		// saved cookies (session data)
		$now = time();
		$ck_str = '';
		foreach ($this->cookies as $ck_host => $ck_list)
		{
			if (!$this->_belong_domain($host, $ck_host)) continue;
			foreach ($ck_list as $ck => $cv)
			{
				if (isset($this->headers['cookie'][$ck])) continue;
				if ($cv['expires'] > 0 && $cv['expires'] < $now) continue;
				if (strncmp($pa['path'], $cv['path'], strlen($cv['path']))) continue;
				$ck_str .= '; ' . $cv['rawdata'];
			}
		}
		foreach ($this->headers as $k => $v)
		{
			if ($k != 'cookie')
				$buf .= ucfirst($k) . ": " . $v . "\r\n";
			else
			{
				foreach ($v as $ck => $cv) $ck_str .= '; ' . rawurlencode($ck) . '=' . rawurlencode($cv);
			}
		}
		// TODO: check cookie length?
		if ($ck_str != '') $buf .= 'Cookie:' . substr($ck_str, 1) . "\r\n";
		$buf .= "\r\n";
		if ($method == 'POST') $buf .= $data . "\r\n";

		// force reset status for next query even if failed this time.
		$this->status = -1;
		$this->url = $url;

		// show the header buf
		if ($this->verbose)
		{
			echo "[INFO] request url: $url\r\n";
			echo "[SEND] request buffer\r\n----\r\n";
			echo $buf;
			echo "----\r\n";
		}

		// create the sock & send the header
		$ntry = HC_MAX_RETRIES;
		$sock = isset($this->socks[$skey]) ? $this->socks[$skey] : false;
		do
		{
			if (is_resource($sock) && $this->_sock_write($sock, $buf)) break;
			if ($sock) @fclose($sock);
			$sock = @fsockopen($host_conn, $port, $errno, $error, 3);
			if ($sock)
			{
				stream_set_blocking($sock, 1);
				stream_set_timeout($sock, 10);
			}			
		}
		while (--$ntry);
		if (!$sock)
		{
			if (isset($this->socks[$skey])) unset($this->socks[$skey]);
			trigger_error("Can't connect to `$host:$port'", E_USER_WARNING);
			return false;
		}
		$this->socks[$skey] = $sock;
		if ($this->verbose)
		{
			echo "[SEND] using socket = {$sock}\r\n";
			echo "[RECV] http respond header\r\n----\r\n";
		}

		// read the respond header
		$with_range = isset($this->headers['range']);
		$this->headers = array();
		while ($line = fgets($sock, 2048))
		{
			if ($this->verbose) echo $line;
			$line = trim($line);
			if ($line === '') break;
			if (!strncasecmp('HTTP/', $line, 5))
			{
				$line = trim(substr($line, strpos($line, ' ')));
				list($this->status, $this->title) = explode(' ', $line, 2);
				$this->status = intval($this->status);
			}
			else if (!strncasecmp('Set-Cookie: ', $line, 12))
			{
				// ignore the cookie options: Httponly
				$ck_key = '';
				$ck_val = array('value' => '', 'expires' => 0, 'path' => '/', 'domain' => $host);
				$tmpa = explode(';', substr($line, 12));
				foreach ($tmpa as $tmp)
				{
					$tmp = trim($tmp);
					if (empty($tmp)) continue;
					list($tmpk, $tmpv) = explode('=', $tmp, 2);
					$tmpk2 = strtolower($tmpk);
					if ($ck_key == '')
					{
						$ck_key = rawurldecode($tmpk);
						$ck_val['value'] = rawurldecode($tmpv);
						$ck_val['rawdata'] = $tmpk . '=' . $tmpv;
					}
					else if ($tmpk2 == 'expires')
					{
						$ck_val['expires'] = strtotime($tmpv);
						if ($ck_val['expires'] < $now)
						{
							$ck_val['value'] = '';
							break;
						}
					}
					else if (isset($ck_val[$tmpk2]) && $tmpv != '')
					{
						$ck_val[$tmpk2] = $tmpv;
						// drop invalid-domain cookies?
						if ($tmpk2 == 'domain' && !$this->_belong_domain($host, $tmpv)) $ck_key = '';
					}
				}

				// delete cookie?
				if ($ck_key == '') continue;
				if ($ck_val['value'] == '') unset($this->cookies[$ck_val['domain']][$ck_key]);
				else $this->cookies[$ck_val['domain']][$ck_key] = $ck_val;

				// headers.
				$this->headers['cookie'][$ck_key] = $ck_val;
			}
			else 
			{
				list($k, $v) = explode(':', $line, 2);
				$k = strtolower(trim($k));
				$v = trim($v);
				$this->headers[$k] = $v;
			}
		}
		if ($this->verbose) echo "----\r\n";
		
		// check savefd
		if ($savefd && $with_range)
		{
			if ($this->status == 200)
			{
				ftruncate($savefd, 0);
				fseek($savefd, 0, SEEK_SET);
			}
			else if ($this->status != 206) $savefd = false;
		}

		// get body
		$connection = $this->getHeader('connection');
		$encoding = $this->getHeader('transfer-encoding');
		$length = $this->getHeader('content-length');
		if ($method == 'HEAD') 
		{
			// nothing to do
			$body = '';
		}
		else if ($encoding && !strcasecmp($encoding, 'chunked'))
		{
			$body = '';
			while (is_resource($sock))
			{
				if (!($line = fgets($sock, 1024))) break;
				if ($this->verbose) echo "[RECV] Chunk Line: " . $line;
				// strip any chunk extension after ';' ($pos was undefined here)
				if ($p1 = strpos($line, ';')) $line = substr($line, 0, $p1);
				$chunk_len = hexdec(trim($line));
				if ($chunk_len <= 0) break;	// end the chunk
				$body .= $this->_sock_read($sock, $chunk_len, $savefd);
				fread($sock, 2);			// eat the CRLF
			}

			// trailer header
			if ($this->verbose) echo "[RECV] chunk trailer\r\n----\r\n";
			while ($line = fgets($sock, 2048))
			{
				if ($this->verbose) echo $line;
				$line = trim($line);
				if ($line === '') break;
				list($k, $v) = explode(':', $line, 2);
				$k = strtolower(trim($k));
				$v = trim($v);
				$this->headers[$k] = $v;
			}		
			if ($this->verbose) echo "----\r\n";
		}
		else if ($length)
		{
			$body = '';
			$length = intval($length);
			while ($length > 0 && is_resource($sock))
			{
				$body .= $this->_sock_read($sock, ($length > 8192 ? 8192 : $length), $savefd);
				$length -= 8192;
			}
		}
		else
		{
			$body = '';
			while (is_resource($sock) && !feof($sock)) $body .= $this->_sock_read($sock, 8192, $savefd);
			$connection = 'close';
		}		

		// check close connection?
		if ($connection && !strcasecmp($connection, 'close'))
		{
			@fclose($sock);
			unset($this->socks[$skey]);
		}
			
		// check redirect
		if ($redir && $this->status != 200 && ($location = $this->getHeader('location')))
		{
			if (!is_int($redir)) $redir = HC_MAX_RETRIES;
			if (!preg_match('/^http[s]?:\/\//i', $location))
			{
				$url2 = $pa['scheme'] . '://' . $pa['host'];
				if (strpos($url, ':', 8)) $url2 .= ':' . $pa['port'];
				if (substr($location, 0, 1) == '/') $url2 .= $location;
				else $url2 .= substr($pa['path'], 0, strrpos($pa['path'], '/') + 1) . $location;
				$location = $url2;
			}
			if (!isset($savehead['referer'])) $savehead['referer'] = $url;
			foreach ($savehead as $hk => $hv) $this->setHeader($hk, $hv);
			return $this->_do_url($location, ($method == 'HEAD' ? 'head' : 'get'), null, $redir - 1);
		}

		// return the body buf
		return $body;
	}
}

// mimetypes used on http_client
$GLOBALS['___HC_MIMES___'] = array(
	'gif' => 'image/gif',
	'png' => 'image/png',
	'bmp' => 'image/bmp',
	'jpeg' => 'image/jpeg',
	'pjpg' => 'image/pjpeg',
	'jpg' => 'image/jpeg',
	'tif' => 'image/tiff',
	'htm' => 'text/html',
	'css' => 'text/css',
	'html' => 'text/html',
	'txt' => 'text/plain',
	'gz' => 'application/x-gzip',
	'tgz' => 'application/x-gzip',
	'tar' => 'application/x-tar',
	'zip' => 'application/zip',
	'hqx' => 'application/mac-binhex40',
	'doc' => 'application/msword',
	'pdf' => 'application/pdf',
	'ps' => 'application/postscript',
	'rtf' => 'application/rtf',
	'dvi' => 'application/x-dvi',
	'latex' => 'application/x-latex',
	'swf' => 'application/x-shockwave-flash',
	'tex' => 'application/x-tex',
	'mid' => 'audio/midi',
	'au' => 'audio/basic',
	'mp3' => 'audio/mpeg',
	'ram' => 'audio/x-pn-realaudio',
	'ra' => 'audio/x-realaudio',
	'rm' => 'audio/x-pn-realaudio',
	'wav' => 'audio/x-wav',
	'wma' => 'audio/x-ms-wma',
	'wmv' => 'video/x-ms-wmv',
	'mpg' => 'video/mpeg',
	'mpga' => 'audio/mpeg',
	'wrl' => 'model/vrml',
	'mov' => 'video/quicktime',
	'avi' => 'video/x-msvideo'
);
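
A minimal usage sketch follows, assuming the class above has been saved as `http_client.class.php` (the file name shown in the `@version` tag). The helper name `fetch_all` is hypothetical; it only relies on the `Get()` and `getStatus()` methods listed in the API summary, so the same instance (and its kept-alive connections and cookies) is reused for every URL.

```php
<?php
// Hypothetical helper: fetch several URLs with one client instance and
// record the status and body of each response.
function fetch_all($http, $urls)
{
    $results = array();
    foreach ($urls as $url) {
        $body = $http->Get($url);          // false if the request failed early
        $status = $http->getStatus();
        $results[$url] = array(
            'ok'     => ($body !== false && $status == 200),
            'status' => $status,
            'body'   => $body,
        );
    }
    return $results;
}

// Typical call (not executed here; the URL is a placeholder):
// require_once 'http_client.class.php';
// $http = new Http_Client();
// print_r(fetch_all($http, array('http://www.example.com/')));
```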

14 comments on "[Code] A full-featured Http_Client class in pure PHP (supports PHP 4 and PHP 5)"

  1. hightman

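    Originally I just pasted the code; now it can also be downloaded from the attachment. I will post some usage examples later when I have time, though it should be quite simple.
    ~~~
    1. About Keep-Alive:
    Unless you explicitly send a Connection: close header, and provided the web server also supports persistent connections, Http_Client automatically caches the connections it has used.
    The next request to the same host (IP) within the same script reuses that connection instead of opening a new one.
    For example: for ($i = 1; $i < 5; $i++) $http->Get('http://www.google.com'); does not establish a new HTTP connection for each call.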
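
The connection caching described above can be pictured with the following sketch. It is an illustration of the idea only, not the class's actual code: the class name `Connection_Cache` and the opener callback are hypothetical, while the "address:port" cache key mirrors the `$skey` used in `_do_url()`.

```php
<?php
// Hypothetical illustration: connections are cached under "address:port"
// and handed out again for as long as they are still open resources.
class Connection_Cache
{
    var $conns = array();
    var $opened = 0;    // how many real connects were made

    // $opener is any callable returning a stream resource, or false on failure
    function get($addr, $port, $opener)
    {
        $key = $addr . ':' . $port;
        if (isset($this->conns[$key]) && is_resource($this->conns[$key])) {
            return $this->conns[$key];     // reuse the cached connection
        }
        $conn = call_user_func($opener, $addr, $port);
        if ($conn) {
            $this->conns[$key] = $conn;
            $this->opened++;
        }
        return $conn;
    }
}

// A real opener could look like this (defined but not called here):
function open_tcp($addr, $port)
{
    return @fsockopen('tcp://' . $addr, $port, $errno, $error, 3);
}
```

Calling `get()` twice with the same address and port returns the same resource, which is why the loop in the example above does not reconnect on every iteration.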

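    2. About Cookies:
    Cookies have become a very important part of HTTP interaction. Anything related to session activity passes the session ID
    through a cookie by default, and many sites have started to use cookie support to decide whether a client is a scraping robot.
    Without cookie support, HTTP_CLIENT would be far less useful.

    Cookie handling in HTTP_Client is implicit. After each request you can call $http->getCookie([string key = null]);
    to get the current cookies. When a URL within the same domain is requested during the same script run, all previously received cookies are sent as needed,
    just like default browser behavior, with no extra work from the user.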
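
Which stored cookies count as "the same domain" can be sketched as below. This is a standalone restatement of the rule in the class's private `_belong_domain()` method; the function name `host_matches_cookie_domain` is hypothetical.

```php
<?php
// Hypothetical standalone version of the domain rule: a cookie stored for
// $domain is sent to $host when the two are equal, or when $domain starts
// with a dot and $host is that domain or one of its sub-domains.
function host_matches_cookie_domain($host, $domain)
{
    $host = strtolower($host);
    $domain = strtolower($domain);
    if ($host === $domain) {
        return true;
    }
    if (substr($domain, 0, 1) !== '.') {
        return false;                       // no leading dot: exact match only
    }
    if ($host === substr($domain, 1)) {
        return true;                        // ".example.com" covers "example.com"
    }
    $hlen = strlen($host);
    $dlen = strlen($domain);
    // ...and any host that ends in ".example.com"
    return $hlen > $dlen && substr($host, $hlen - $dlen) === $domain;
}
```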

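    All HTTP cookie data is kept in an internal array. If needed, once you are done with HTTP_CLIENT you can call
    $http->saveCookie('/path/to/file'); to save the received cookies to /path/to/file. The next time you create a new
    HTTP_CLIENT instance, call $http->loadCookie('/path/to/file'); to load the cookies saved earlier.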
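
The save and load step is a plain serialize/unserialize round trip, as the sketch below shows. The helper names `save_cookie_jar` and `load_cookie_jar` and the sample cookie are hypothetical; with the class itself you would call `saveCookie()` and `loadCookie()` instead. Because `unserialize()` is applied to the file content, only load files that your own script wrote.

```php
<?php
// Hypothetical helpers mirroring what saveCookie()/loadCookie() do:
// the whole cookie array is serialized to a file and read back later.
function save_cookie_jar($path, $cookies)
{
    return file_put_contents($path, serialize($cookies)) !== false;
}

function load_cookie_jar($path)
{
    if (!is_file($path)) {
        return array();
    }
    $cookies = @unserialize(file_get_contents($path));
    return is_array($cookies) ? $cookies : array();
}

// Sample data in the per-domain layout the class uses (values are made up)
$jar = array(
    'example.com' => array(
        'sid' => array(
            'value' => 'abc123', 'expires' => 0, 'path' => '/',
            'domain' => 'example.com', 'rawdata' => 'sid=abc123',
        ),
    ),
);
$file = tempnam(sys_get_temp_dir(), 'hc_');
save_cookie_jar($file, $jar);
$restored = load_cookie_jar($file);
unlink($file);
```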

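    3. About Debug:
    Get() and Post() return only the response body without the headers (Head() returns the header array), so to read HTTP headers you call
    $http->getHeader([string key = null]); separately. If no key is passed, all HTTP headers of the last request are returned.

    If you need to debug and see the complete exchange, pass true to the constructor when creating the http_client instance,
    and detailed information about the exchange will be printed directly.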

  2. hackfan

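    I think the most important feature of an HttpClient is control over the connection,
    specifically control over timeouts.

    CURL does timeout control very well; pure PHP does not seem to handle this as well.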

  3. hightman

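    [quote='hackfan' pid='3796' dateline='1288006531']
    I think the most important feature of an HttpClient is control over the connection,
    specifically control over timeouts.

    CURL does timeout control very well; pure PHP does not seem to handle this as well.
    [/quote]

    In PHP, the stream family of functions can also fully control timeout parameters; even a plain file_get_contents() has a $context parameter for that.

    The main point of HTTP_CLIENT is that it can keep persistent connections and reuse them, and that it handles cookies automatically.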
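
A minimal sketch of the `$context` approach mentioned above: the `timeout` option of the `http` wrapper sets the read timeout in seconds. The URL in the commented call is a placeholder. For comparison, the class itself passes a 3 second connect timeout to `fsockopen()` and then calls `stream_set_timeout($sock, 10)`.

```php
<?php
// Build a stream context that limits how long an HTTP read may take.
$context = stream_context_create(array(
    'http' => array(
        'method'  => 'GET',
        'timeout' => 5,      // seconds
    ),
));

// Example call (not executed here; the URL is a placeholder):
// $body = @file_get_contents('http://www.example.com/', false, $context);

// The options can be read back to confirm what the context carries.
$options = stream_context_get_options($context);
```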

  4. hightman

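    [quote='bawbaw' pid='3801' dateline='1288078272']
    The built-in Download works well when resuming, but if the download has already finished, calling Download again for the same file hangs.
    [/quote]

    Thanks for letting me know. I tested it on my machine, though: if the download is complete and overwrite is not set, the file is automatically downloaded again and saved under the original file name plus .1, .2, .3 … and so on.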
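
The ".1, .2, .3" naming can be sketched as follows. The function name `next_free_path` is hypothetical; the loop is the same idea as the suffix search inside `Download()`, with the small difference that this helper returns the path unchanged when it is still free.

```php
<?php
// Hypothetical helper: return $filepath if nothing exists there yet,
// otherwise the first of "$filepath.1", "$filepath.2", ... that is free.
function next_free_path($filepath)
{
    if (!file_exists($filepath)) {
        return $filepath;
    }
    $i = 1;
    while (file_exists($filepath . '.' . $i)) {
        $i++;
    }
    return $filepath . '.' . $i;
}

// Usage (not executed here): $target = next_free_path('/tmp/file.zip');
```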

  5. bawbaw

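    [quote='hightman' pid='3802' dateline='1288190344']
    [quote='bawbaw' pid='3801' dateline='1288078272']
    The built-in Download works well when resuming, but if the download has already finished, calling Download again for the same file hangs.
    [/quote]

    Thanks for letting me know. I tested it on my machine, though: if the download is complete and overwrite is not set, the file is automatically downloaded again and saved under the original file name plus .1, .2, .3 … and so on.
    [/quote]

    I see, it does indeed switch to the .1, .2 scheme; I mistakenly thought it had hung. Would it be possible to add an option to SKIP the download when the file size is the same?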

  6. hightman

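    [quote='bawbaw' pid='3811' dateline='1288311603']
    [quote='hightman' pid='3802' dateline='1288190344']
    [quote='bawbaw' pid='3801' dateline='1288078272']
    The built-in Download works well when resuming, but if the download has already finished, calling Download again for the same file hangs.
    [/quote]

    Thanks for letting me know. I tested it on my machine, though: if the download is complete and overwrite is not set, the file is automatically downloaded again and saved under the original file name plus .1, .2, .3 … and so on.
    [/quote]

    I see, it does indeed switch to the .1, .2 scheme; I mistakenly thought it had hung. Would it be possible to add an option to SKIP the download when the file size is the same?
    [/quote]

    There is no such option or parameter at the moment, but you could check beforehand and then decide whether to call download().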
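
One way to do the "check beforehand" suggested above is a small size comparison like the sketch below. The function name `is_download_complete` is hypothetical. The remote length could come from the `content-length` entry of the array returned by `Head()`; since `Download()` begins by copying whatever `getHeader()` currently returns, it is probably safer to run that probe on a separate Http_Client instance.

```php
<?php
// Hypothetical helper: true when a local file already has exactly the size
// the server reports, so the caller can skip calling Download().
function is_download_complete($filepath, $remoteLength)
{
    if (!is_file($filepath)) {
        return false;
    }
    $remoteLength = intval($remoteLength);
    if ($remoteLength <= 0) {
        return false;            // unknown remote size: cannot decide
    }
    clearstatcache();            // filesize() results are cached by PHP
    return filesize($filepath) === $remoteLength;
}

// Usage (not executed here; $probe and $http are separate Http_Client instances):
// $head = $probe->Head($url, true);
// if (!$head || !isset($head['content-length'])
//     || !is_download_complete($path, $head['content-length'])) {
//     $http->Download($url, $path);
// }
```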

  7. qtoy2ha

    Is there a logic problem with this statement on line 543 (http/https):
    if (strncasecmp($url, 'http://', 7) && strncasecmp($url, 'https://', 8) && isset($_SERVER['HTTP_HOST']))

  8. hightman

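    No problem there. It means: when the URL passed in starts with neither http:// nor https://, and $_SERVER['HTTP_HOST'] exists, the URL is treated as being requested under the current host.

    For example: if your page http://localhost/path/to/test.php uses the HTTP_CLIENT class, then calling
    $http->Get('hello.php') in test.php is treated as http://localhost/path/to/hello.php,
    while calling $http->Get('/hello.php'); is treated as http://localhost/hello.php

    That is what it means.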
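
The resolution rule explained above can be written as a standalone function, which makes it easy to try outside a web request. The name `resolve_request_url` is hypothetical; its two extra parameters stand in for `$_SERVER['HTTP_HOST']` and `$_SERVER['SCRIPT_NAME']`.

```php
<?php
// Hypothetical standalone version of the URL check at the top of _do_url():
// absolute http/https URLs pass through, anything else is resolved against
// the current host and the directory of the running script.
function resolve_request_url($url, $httpHost, $scriptName)
{
    if (!strncasecmp($url, 'http://', 7) || !strncasecmp($url, 'https://', 8)) {
        return $url;
    }
    if (substr($url, 0, 1) != '/') {
        // relative path: prepend the script's directory, e.g. "/path/to/"
        $url = substr($scriptName, 0, strrpos($scriptName, '/') + 1) . $url;
    }
    return 'http://' . $httpHost . $url;
}

$relative = resolve_request_url('hello.php', 'localhost', '/path/to/test.php');
$rooted   = resolve_request_url('/hello.php', 'localhost', '/path/to/test.php');
```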

  9. qtoy2ha

    I fetched an image whose headers look like this, and the headers ended up being saved together with the body:
    HTTP/1.1 200 OK
    Content-Type: image/jpeg
    Connection: keep-alive
    Server: nginx/BIT_V1_8
    Date: Sun, 17 Apr 2011 07:42:44 GMT
    Content-Length: 57790
    Last-Modified: Thu, 30 Dec 2010 00:01:49 GMT
    Expires: Sat, 16 Jul 2011 07:42:44 GMT
    Cache-Control: max-age=7776000
    Accept-Ranges: bytes
    Powered-By-ChinaCache: MISS from CNC-JZ-5-3B9
    Switch: FSCS

    (The content starts here; it is a binary file, so I won't paste it.)
    Image URL: http://img1.bitautoimg.com/autoalbum/files/20101222/776/15212277614776_1499795_7.JPG

