cpython/Doc/lib/liblocale.tex

\section{\module{locale} ---
         Internationalization services}

\declaremodule{standard}{locale}
\modulesynopsis{Internationalization services.}


The \module{locale} module opens access to the \POSIX{} locale database
and functionality. The \POSIX{} locale mechanism allows programmers
to deal with certain cultural issues in an application, without
requiring the programmer to know all the specifics of each country
where the software is executed.

The \module{locale} module is implemented on top of the
\module{_locale}\refbimodindex{_locale} module, which in turn uses an
ANSI C locale implementation if available.

The \module{locale} module defines the following exception and
functions:


\begin{funcdesc}{setlocale}{category\optional{, value}}
If \var{value} is specified, modifies the locale setting for the
\var{category}. The available categories are listed in the data
description below. The value is the name of a locale. An empty string
specifies the user's default settings. If the modification of the
locale fails, the exception \exception{Error} is
raised. If successful, the new locale setting is returned.

If no \var{value} is specified, the current setting for the
\var{category} is returned.

\function{setlocale()} is not thread safe on most systems. Applications
typically start with a call of
\begin{verbatim}
import locale
locale.setlocale(locale.LC_ALL,"")
\end{verbatim}
This sets the locale for all categories to the user's default setting
(typically specified in the \envvar{LANG} environment variable). If
the locale is not changed thereafter, using multithreading should not
cause problems.
\end{funcdesc}

\begin{excdesc}{Error}
Exception raised when \function{setlocale()} fails.
\end{excdesc}

\begin{funcdesc}{localeconv}{}
Returns the database of of the local conventions as a dictionary. This
dictionary has the following strings as keys:
\begin{itemize}
\item \code{decimal_point} specifies the decimal point used in
floating point number representations for the \constant{LC_NUMERIC}
category.
\item \code{grouping} is a sequence of numbers specifying at which
relative positions the \code{thousands_sep} is expected. If the
sequence is terminated with \constant{CHAR_MAX}, no further
grouping is performed. If the sequence terminates with a \code{0}, the last
group size is repeatedly used.
\item \code{thousands_sep} is the character used between groups.
\item \code{int_curr_symbol} specifies the international currency
symbol from the \constant{LC_MONETARY} category.
\item \code{currency_symbol} is the local currency symbol.
\item \code{mon_decimal_point} is the decimal point used in monetary
values.
\item \code{mon_thousands_sep} is the separator for grouping of
monetary values.
\item \code{mon_grouping} has the same format as the \code{grouping}
key; it is used for monetary values.
\item \code{positive_sign} and \code{negative_sign} gives the sign
used for positive and negative monetary quantities.
\item \code{int_frac_digits} and \code{frac_digits} specify the number
of fractional digits used in the international and local formatting
of monetary values.
\item \code{p_cs_precedes} and \code{n_cs_precedes} specifies whether
the currency symbol precedes the value for positive or negative
values.
\item \code{p_sep_by_space} and \code{n_sep_by_space} specifies
whether there is a space between the positive or negative value and
the currency symbol.
\item \code{p_sign_posn} and \code{n_sign_posn} indicate how the
sign should be placed for positive and negative monetary values.
\end{itemize}

The possible values for \code{p_sign_posn} and
\code{n_sign_posn} are given below.

\begin{tableii}{c|l}{code}{Value}{Explanation}
\lineii{0}{Currency and value are surrounded by parentheses.}
\lineii{1}{The sign should precede the value and currency symbol.}
\lineii{2}{The sign should follow the value and currency symbol.}
\lineii{3}{The sign should immediately precede the value.}
\lineii{4}{The sign should immediately follow the value.}
\lineii{LC_MAX}{Nothing is specified in this locale.}
\end{tableii}
\end{funcdesc}

\begin{funcdesc}{strcoll}{string1,string2}
Compares two strings according to the current \constant{LC_COLLATE}
setting. As any other compare function, returns a negative, or a
positive value, or \code{0}, depending on whether \var{string1}
collates before or after \var{string2} or is equal to it.
\end{funcdesc}

\begin{funcdesc}{strxfrm}{string}
Transforms a string to one that can be used for the built-in function
\function{cmp()}\bifuncindex{cmp}, and still returns locale-aware
results.  This function can be used when the same string is compared
repeatedly, e.g. when collating a sequence of strings.
\end{funcdesc}

\begin{funcdesc}{format}{format, val, \optional{grouping\code{ = 0}}}
Formats a number \var{val} according to the current
\constant{LC_NUMERIC} setting.  The format follows the conventions of
the \code{\%} operator.  For floating point values, the decimal point
is modified if appropriate.  If \var{grouping} is true, also takes the
grouping into account.
\end{funcdesc}

\begin{funcdesc}{str}{float}
Formats a floating point number using the same format as the built-in
function \code{str(\var{float})}, but takes the decimal point into
account.
\end{funcdesc}

\begin{funcdesc}{atof}{string}
Converts a string to a floating point number, following the
\constant{LC_NUMERIC} settings.
\end{funcdesc}

\begin{funcdesc}{atoi}{string}
Converts a string to an integer, following the \constant{LC_NUMERIC}
conventions.
\end{funcdesc}

\begin{datadesc}{LC_CTYPE}
\refstmodindex{string}
Locale category for the character type functions. Depending on the
settings of this category, the functions of module \refmodule{string}
dealing with case change their behaviour.
\end{datadesc}

\begin{datadesc}{LC_COLLATE}
Locale category for sorting strings. The functions
\function{strcoll()} and \function{strxfrm()} of the \module{locale}
module are affected.
\end{datadesc}

\begin{datadesc}{LC_TIME}
Locale category for the formatting of time. The function
\function{time.strftime()} follows these conventions.
\end{datadesc}

\begin{datadesc}{LC_MONETARY}
Locale category for formatting of monetary values. The available
options are available from the \function{localeconv()} function.
\end{datadesc}

\begin{datadesc}{LC_MESSAGES}
Locale category for message display. Python currently does not support
application specific locale-aware messages. Messages displayed by the
operating system, like those returned by \function{os.strerror()}
might be affected by this category.
\end{datadesc}

\begin{datadesc}{LC_NUMERIC}
Locale category for formatting numbers. The functions
\function{format()}, \function{atoi()}, \function{atof()} and
\function{str()} of the \module{locale} module are affected by that
category. All other numeric formatting operations are not affected.
\end{datadesc}

\begin{datadesc}{LC_ALL}
Combination of all locale settings. If this flag is used when the
locale is changed, setting the locale for all categories is
attempted. If that fails for any category, no category is changed at
all. When the locale is retrieved using this flag, a string indicating
the setting for all categories is returned. This string can be later
used to restore the settings.
\end{datadesc}

\begin{datadesc}{CHAR_MAX}
This is a symbolic constant used for different values returned by
\function{localeconv()}.
\end{datadesc}

Example:

\begin{verbatim}
>>> import locale
>>> loc = locale.setlocale(locale.LC_ALL) # get current locale
>>> locale.setlocale(locale.LC_ALL, "de") # use German locale
>>> locale.strcoll("f\344n", "foo") # compare a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL, "") # use user's preferred locale
>>> locale.setlocale(locale.LC_ALL, "C") # use default (C) locale
>>> locale.setlocale(locale.LC_ALL, loc) # restore saved locale
\end{verbatim}

\subsection{Background, details, hints, tips and caveats}

The C standard defines the locale as a program-wide property that may
be relatively expensive to change.  On top of that, some
implementation are broken in such a way that frequent locale changes
may cause core dumps.  This makes the locale somewhat painful to use
correctly.

Initially, when a program is started, the locale is the \samp{C} locale, no
matter what the user's preferred locale is.  The program must
explicitly say that it wants the user's preferred locale settings by
calling \code{setlocale(LC_ALL, "")}.

It is generally a bad idea to call \function{setlocale()} in some library
routine, since as a side effect it affects the entire program.  Saving
and restoring it is almost as bad: it is expensive and affects other
threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a locale
independent version of an operation that is affected by the locale
(e.g. \function{string.lower()}, or certain formats used with
\function{time.strftime()})), you will have to find a way to do it
without using the standard library routine.  Even better is convincing
yourself that using locale settings is okay.  Only as a last resort
should you document that your module is not compatible with
non-\samp{C} locale settings.

The case conversion functions in the
\refmodule{string}\refstmodindex{string} and
\module{strop}\refbimodindex{strop} modules are affected by the locale
settings.  When a call to the \function{setlocale()} function changes
the \constant{LC_CTYPE} settings, the variables
\code{string.lowercase}, \code{string.uppercase} and
\code{string.letters} (and their counterparts in \module{strop}) are
recalculated.  Note that this code that uses these variable through
`\keyword{from} ... \keyword{import} ...', e.g. \code{from string
import letters}, is not affected by subsequent \function{setlocale()}
calls.

The only way to perform numeric operations according to the locale
is to use the special functions defined by this module:
\function{atof()}, \function{atoi()}, \function{format()},
\function{str()}.

\subsection{For extension writers and programs that embed Python}
\label{embedding-locale}

Extension modules should never call \function{setlocale()}, except to
find out what the current locale is.  But since the return value can
only be used portably to restore it, that is not very useful (except
perhaps to find out whether or not the locale is \samp{C}).

When Python is embedded in an application, if the application sets the
locale to something specific before initializing Python, that is
generally okay, and Python will use whatever locale is set,
\emph{except} that the \constant{LC_NUMERIC} locale should always be
\samp{C}.

The \function{setlocale()} function in the \module{locale} module contains
gives the Python progammer the impression that you can manipulate the
\constant{LC_NUMERIC} locale setting, but this not the case at the C
level: C code will always find that the \constant{LC_NUMERIC} locale
setting is \samp{C}.  This is because too much would break when the
decimal point character is set to something else than a period
(e.g. the Python parser would break).  Caveat: threads that run
without holding Python's global interpreter lock may occasionally find
that the numeric locale setting differs; this is because the only
portable way to implement this feature is to set the numeric locale
settings to what the user requests, extract the relevant
characteristics, and then restore the \samp{C} numeric locale.

When Python code uses the \module{locale} module to change the locale,
this also affects the embedding application.  If the embedding
application doesn't want this to happen, it should remove the
\module{_locale} extension module (which does all the work) from the
table of built-in modules in the \file{config.c} file, and make sure
that the \module{_locale} module is not accessible as a shared library.