Urlparse
This module defines a standard interface for breaking Uniform Resource Locator (URL) strings up into components (addressing scheme, network location, path, etc.).
parse(url)
Returns a tuple (scheme, netloc, path, query, fragment) derived from url.
This corresponds to the general structure of a URL: scheme://netloc/path?query#fragment. Each tuple item is a string, possibly empty. The components are not broken up into smaller parts (for example, the network location is a single string), and % escapes are not expanded. The delimiters shown above are not part of the result, except for a leading slash in the path component, which is retained if present. For example:
import urlparse
o = urlparse.parse('http://www.cwi.nl:80/%7Eguido/Python.html')
# result is ('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '')
Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.
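The splitting rules above can be illustrated with a minimal sketch in plain Python. The helper name split_url is hypothetical and this is not the module's implementation; it only mirrors the documented behavior, including the rule that a netloc is recognized only after '//'.

```python
def split_url(url):
    """Split url into (scheme, netloc, path, query, fragment).

    Hypothetical helper, written only to illustrate the documented rules.
    """
    scheme = netloc = query = fragment = ""
    # Fragment: everything after the first '#'.
    if "#" in url:
        url, fragment = url.split("#", 1)
    # Query: everything after the first '?'.
    if "?" in url:
        url, query = url.split("?", 1)
    # Scheme: text before the first ':', if it looks like a scheme name.
    colon = url.find(":")
    if colon > 0 and url[0].isalpha() and all(
        c.isalnum() or c in "+-." for c in url[:colon]
    ):
        scheme, url = url[:colon], url[colon + 1:]
    # Netloc: recognized only when introduced by '//'.
    if url.startswith("//"):
        end = url.find("/", 2)
        if end < 0:
            end = len(url)
        netloc, url = url[2:end], url[end:]
    # Whatever remains is the path; a leading slash is kept.
    return (scheme, netloc, url, query, fragment)


full = split_url("http://www.cwi.nl:80/%7Eguido/Python.html")
relative = split_url("www.cwi.nl/%7Eguido/Python.html")
```

In this sketch, the second call has no '//' prefix, so the whole string ends up in the path component and the netloc stays empty.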
parse_netloc(netloc)
Given netloc as parsed by parse(), breaks it into its components, returning a tuple (user, password, host, port). Each component of the returned tuple is a string.
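As a rough illustration of this decomposition, the sketch below splits a network location of the form user:password@host:port. The helper name split_netloc is hypothetical, and returning an empty string for a missing component is an assumption; the text above only states that every component is a string. IPv6 literals in brackets are not handled.

```python
def split_netloc(netloc):
    """Split netloc into (user, password, host, port), all strings.

    Hypothetical helper; missing parts are returned as '' (an assumption).
    """
    user = password = port = ""
    host = netloc
    # Credentials, if any, come before the last '@'.
    if "@" in host:
        userinfo, host = host.rsplit("@", 1)
        if ":" in userinfo:
            user, password = userinfo.split(":", 1)
        else:
            user = userinfo
    # The port, if any, follows the last ':' in the host part.
    if ":" in host:
        host, port = host.rsplit(":", 1)
    return (user, password, host, port)


with_credentials = split_netloc("guido:secret@www.cwi.nl:80")
host_only = split_netloc("www.cwi.nl")
```

Note that the port stays a string here ("80"), matching the statement that each tuple item is a string.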
quote(s)
Returns the urlencoded version of s.
Urlencoding transforms unsafe bytes to their %XX representation where XX is the hex value of the byte.
Safe bytes are:
- lowercase letters from "a" to "z" (bytes from 0x61 to 0x7a)
- uppercase letters from "A" to "Z" (bytes from 0x41 to 0x5a)
- digits from "0" to "9" (bytes from 0x30 to 0x39)
- the following symbols: $-_.+!*'()
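The encoding rule can be sketched in a few lines of plain Python. The name quote_sketch is hypothetical and this is not the module's own code; encoding the input as UTF-8 and emitting uppercase hex digits are both assumptions, since the text only specifies the %XX form and the set of safe bytes.

```python
# Safe bytes as listed above: letters, digits and $-_.+!*'()
SAFE = frozenset(
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789"
    b"$-_.+!*'()"
)


def quote_sketch(s):
    """Percent-encode every byte of s that is not in SAFE (illustrative only)."""
    out = []
    for byte in s.encode("utf-8"):  # UTF-8 is an assumption
        if byte in SAFE:
            out.append(chr(byte))
        else:
            out.append("%%%02X" % byte)  # uppercase hex is an assumption
    return "".join(out)


encoded_tilde = quote_sketch("~guido")
encoded_space = quote_sketch("a b&c")
```

Under these assumptions the tilde (byte 0x7E) is not in the safe set, so "~guido" becomes "%7Eguido", the same form that appears in the parse() example URL.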
quote_plus(s)
Like quote(), but also escapes the + symbol.
unquote(s)
If s is urlencoded, returns s with every + replaced by a space and every %xx replaced by the corresponding character.
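The decoding step can be sketched as follows. The name unquote_sketch is hypothetical; treating the decoded bytes as UTF-8 and leaving a malformed % sequence untouched are assumptions, not documented behavior of this module.

```python
import string


def unquote_sketch(s):
    """Replace '+' with a space and %xx with the byte it encodes (illustrative)."""
    out = bytearray()
    i = 0
    while i < len(s):
        pair = s[i + 1:i + 3]  # the two characters after the current one
        if s[i] == "+":
            out.append(0x20)
            i += 1
        elif s[i] == "%" and len(pair) == 2 and all(
            c in string.hexdigits for c in pair
        ):
            out.append(int(pair, 16))
            i += 3
        else:
            # Ordinary character, or a '%' not followed by two hex digits.
            out.extend(s[i].encode("utf-8"))
            i += 1
    return out.decode("utf-8", errors="replace")  # UTF-8 is an assumption


decoded = unquote_sketch("%7Eguido+van+Rossum")
```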
parse_qs(qs)
Parses the query string qs and returns a dictionary mapping each key in qs to its value. Values are urldecoded by unquote().
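A minimal sketch of this kind of query-string parsing is shown below. It uses unquote_plus from CPython's urllib.parse as a stand-in for this module's unquote(), and the name parse_qs_sketch is hypothetical. Two further assumptions: pairs are separated by '&', and each key maps to a single string, with a repeated key keeping its last value. The documentation above does not specify either point.

```python
from urllib.parse import unquote_plus  # CPython stand-in for this module's unquote()


def parse_qs_sketch(qs):
    """Build a {key: decoded value} dictionary from a query string (illustrative)."""
    result = {}
    for pair in qs.split("&"):
        if not pair:
            continue  # skip empty segments such as a trailing '&'
        key, _, value = pair.partition("=")
        # Only values are decoded, following the description above.
        result[key] = unquote_plus(value)
    return result


params = parse_qs_sketch("name=Guido+van+Rossum&lang=py%33")
```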
urlencode(data)
Transforms the data dictionary into a urlencoded query string and returns it. Each (key, value) pair is encoded by the quote_via function. By default, quote_plus() is used to quote the values. An alternative function that can be passed as quote_via is quote().
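The role of quote_via can be illustrated with a short sketch. It borrows quote and quote_plus from CPython's urllib.parse as stand-ins, so the exact set of escaped characters may differ from this module's functions; the name urlencode_sketch, the '&' and '=' separators, and the conversion of keys and values with str() are assumptions.

```python
from urllib.parse import quote, quote_plus  # CPython stand-ins, not this module


def urlencode_sketch(data, quote_via=quote_plus):
    """Join the quoted key=value pairs of data with '&' (illustrative only)."""
    return "&".join(
        quote_via(str(key)) + "=" + quote_via(str(value))
        for key, value in data.items()
    )


default_encoding = urlencode_sketch({"q": "a b", "n": 1})
percent_encoding = urlencode_sketch({"q": "a b", "n": 1}, quote_via=quote)
```

With the stand-in functions used here, the default encodes the space as "+", while passing quote as quote_via encodes it as "%20".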