\section{Standard Module \sectcode{rfc822}}
\stmodindex{rfc822}

\renewcommand{\indexsubitem}{(in module rfc822)}

This module defines a class, \code{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
RFC 822.  It is used in various contexts, usually to read such headers
from a file.

A \code{Message} instance is instantiated with an open file object as
parameter.  Instantiation reads headers from the file up to a blank
line and stores them in the instance; after instantiation, the file is
positioned directly after the blank line that terminates the headers.

Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independent of upper or lower case;
e.g. \code{m['From']}, \code{m['from']} and \code{m['FROM']} all yield
the same result.
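
The reading rules described above can be sketched in a few lines of
plain Python.  This is not the module's implementation:
\code{read_headers} and \code{first_header} are hypothetical helper
names introduced only for illustration, and \code{io.StringIO} stands
in for an open file.

```python
import io


def read_headers(fp):
    """Read header lines from fp, consuming the terminating blank line."""
    headers = []
    while True:
        line = fp.readline()
        if not line:
            break                      # end of file, no blank line seen
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"    # CR-LF becomes a single linefeed
        if line == "\n":
            break                      # blank line terminates the headers
        headers.append(line)
    return headers


def first_header(headers, name):
    """Return the stripped value of the first header matching name, or None."""
    prefix = name.lower() + ":"
    for i, line in enumerate(headers):
        if line.lower().startswith(prefix):
            value = line[len(prefix):]
            # Continuation lines begin with a space or a tab.
            for cont in headers[i + 1:]:
                if cont[:1] not in " \t":
                    break
                value += cont
            return value.strip()
    return None


fp = io.StringIO("From: jack@cwi.nl (Jack Jansen)\r\nSubject: test\r\n\r\nbody text\n")
headers = read_headers(fp)
rest = fp.read()                       # the file is now positioned at the body
print(headers)
print(first_header(headers, "FROM"))
print(repr(rest))
```

The lookup lowercases both the header line and the requested name,
which is one simple way to obtain case-independent matching.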

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in RFC 822.  However,
some mailers don't follow that format as specified, so
\code{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an RFC 822 date, such as
\code{"Mon, 20 Nov 1995 19:12:08 -0500"}.  If it succeeds in parsing
the date, \code{parsedate()} returns a 9-tuple that can be passed
directly to \code{time.mktime()}; otherwise \code{None} will be
returned.  
\end{funcdesc}
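
As a rough illustration of \code{parsedate()}, the following sketch
parses the sample date string.  It imports the function from
\code{rfc822} and, on interpreters that do not ship that module, falls
back to \code{email.utils.parsedate}, which is assumed here to behave
the same way for this input.  Only the first six fields are inspected,
since the values of the remaining ones may differ between
implementations.

```python
try:
    from rfc822 import parsedate        # the module documented here
except ImportError:
    from email.utils import parsedate   # assumed-equivalent fallback

fields = parsedate("Mon, 20 Nov 1995 19:12:08 -0500")
# Year, month, day, hour, minute, second exactly as written in the
# string; parsedate() does not apply the time zone offset.
print(fields[:6])

# A string that cannot be parsed yields None rather than an exception.
unparsable = parsedate("not a date")
print(unparsable)
```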

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \code{parsedate()}, but returns either
\code{None} or a 10-tuple; the first 9 elements make up a tuple that
can be passed directly to \code{time.mktime()}, and the tenth is the
offset of the date's time zone from UTC (which is the official term
for Greenwich Mean Time).
\end{funcdesc}
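
The sketch below shows one way the tenth element might be used to
obtain a UTC timestamp.  It assumes that the offset is expressed in
seconds, which is how \code{email.utils.parsedate_tz} (used as a
fallback when \code{rfc822} cannot be imported) reports it; the use of
\code{calendar.timegm()} is a choice made for this example, not
something the module requires.

```python
import calendar
import time

try:
    from rfc822 import parsedate_tz        # the module documented here
except ImportError:
    from email.utils import parsedate_tz   # assumed-equivalent fallback

fields = parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500")
offset = fields[9]                         # -0500, assumed to be in seconds

# Interpret the first fields as if they were UTC, then subtract the
# offset to get seconds since the epoch for the real instant.
utc_seconds = calendar.timegm(fields[:9]) - offset
utc_fields = time.gmtime(utc_seconds)[:6]
print(offset, utc_fields)
```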

\subsection{Message Objects}

A \code{Message} instance has the following methods:

\begin{funcdesc}{rewindbody}{}
Seek to the start of the message body.  This only works if the file
object is seekable.
\end{funcdesc}

\begin{funcdesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any.  Each physical line, whether it is a continuation
line or not, is a separate list item.  Return the empty list if no
header matches \var{name}.
\end{funcdesc}

\begin{funcdesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any.  Return \code{None}
if there is no header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}.  This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present.  Return \code{None} if there is
no header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getheader}{name}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace (but not internal whitespace).
\end{funcdesc}
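
The difference between the raw and the stripped value can be
illustrated with a small stand-alone sketch.  \code{raw_value} is a
hypothetical helper written for this example; it only mimics the
behaviour described above and is not taken from the module.

```python
def raw_value(lines):
    """Text after the first colon of a header given as its physical lines."""
    first = lines[0]
    return first[first.index(":") + 1:] + "".join(lines[1:])


# One header spread over two physical lines (the second is a continuation).
header_lines = ["Subject: a rather long\n", "\tsubject line\n"]

raw = raw_value(header_lines)    # keeps leading space and trailing linefeed
stripped = raw.strip()           # internal linefeed and tab are preserved
print(repr(raw))
print(repr(stripped))
```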

\begin{funcdesc}{getaddr}{name}
Return a pair (full name, email address) parsed from the string
returned by \code{getheader(\var{name})}.  If no header matching
\var{name} exists, return \code{(None, None)}; otherwise both the full
name and the address are (possibly empty) strings.

Example: If \code{m}'s first \code{From} header contains the string\\
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{funcdesc}
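
The two spellings from the example can be compared directly on the
header strings.  This sketch uses the stand-alone function
\code{parseaddr} rather than the \code{getaddr()} method, importing it
from \code{rfc822} where available and otherwise from
\code{email.utils}; that both apply the same parsing rules as
\code{getaddr()} is an assumption of the example.

```python
try:
    from rfc822 import parseaddr        # where the module provides it
except ImportError:
    from email.utils import parseaddr   # assumed-equivalent fallback

values = ["jack@cwi.nl (Jack Jansen)", "Jack Jansen <jack@cwi.nl>"]
pairs = [parseaddr(value) for value in values]

# Both spellings are expected to give the same (full name, address) pair.
for value, pair in zip(values, pairs):
    print(value, "->", pair)
```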

\begin{funcdesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g. a \code{To} header) and
returns a list of (full name, email address) pairs (even if there was
only one address in the header).  If there is no header matching
\var{name}, return an empty list.

XXX The current version of this function is not really correct.  It
yields bogus results if a full name contains a comma.
\end{funcdesc}
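
For comparison, the sketch below shows the result one would want for
a header whose full name contains a comma.  It does not call
\code{getaddrlist()}; it uses \code{email.utils.getaddresses} as a
stand-in, and the second address is a made-up example value.

```python
from email.utils import getaddresses

# The quoted full name contains a comma, the case flagged above.
header_value = '"Jansen, Jack" <jack@cwi.nl>, ann@example.org'

pairs = getaddresses([header_value])
for full_name, address in pairs:
    print(repr(full_name), repr(address))
```

A parser that simply split the header on commas would see three
pieces here instead of two addresses.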

\begin{funcdesc}{getdate}{name}
Retrieve a header using \code{getheader()} and parse it into a 9-tuple
compatible with \code{time.mktime()}.  If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard.  While this function has been tested and found correct on
a large collection of email from many sources, it may still
occasionally yield an incorrect result.
\end{funcdesc}

\begin{funcdesc}{getdate_tz}{name}
Retrieve a header using \code{getheader()} and parse it into a 10-tuple;
the first 9 elements will make a tuple compatible with
\code{time.mktime()}, and the 10th is a number giving the offset of
the date's time zone from UTC.  Similarly to \code{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}. 
\end{funcdesc}

\code{Message} instances also support a read-only mapping interface.
In particular: \code{m[name]} is the same as \code{m.getheader(name)};
and \code{len(m)}, \code{m.has_key(name)}, \code{m.keys()},
\code{m.values()} and \code{m.items()} act as expected (and
consistently).
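
Mapping-style access of this kind can be tried out with the sketch
below.  It uses \code{email.message_from_string} as a stand-in for
constructing an \code{rfc822.Message} from a file; the resulting object
is a different class, so details such as the letter case of the keys
it reports should not be assumed to match.

```python
import email

text = "From: jack@cwi.nl (Jack Jansen)\nSubject: test\n\nbody\n"
m = email.message_from_string(text)

same = m["from"] == m["FROM"]    # lookups ignore upper and lower case
count = len(m)                   # number of header fields
keys = m.keys()
items = m.items()
print(same, count)
print(keys)
print(items)
```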

Finally, \code{Message} instances have two public instance variables:

\begin{datadesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read.  Each line contains a trailing newline.  The
blank line terminating the headers is not contained in the list.
\end{datadesc}

\begin{datadesc}{fp}
The file object passed at instantiation time.
\end{datadesc}