Difference between revisions of “Python/docs/3.9/c-api/unicode”

From 菜鸟教程
(autoload)
 
(Page commit)
 
Line 1: Line 1:
 +
{{DISPLAYTITLE:Unicode Objects and Codecs - Python Documentation}}
 
<div id="unicode-objects-and-codecs" class="section">
 
<div id="unicode-objects-and-codecs" class="section">
  
 
<span id="unicodeobjects"></span>
 
<span id="unicodeobjects"></span>
= Unicode Objects and Codecs =
+
= Unicode Objects and Codecs =
  
 
<div id="unicode-objects" class="section">
 
<div id="unicode-objects" class="section">
  
== Unicode Objects ==
+
== Unicode Objects ==
  
Since the implementation of <span id="index-0" class="target"></span>[https://www.python.org/dev/peps/pep-0393 '''PEP 393'''] in Python 3.3, Unicode objects internally
+
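Since the implementation of <span id="index-0" class="target"></span>[https://www.python.org/dev/peps/pep-0393 PEP 393] in Python 3.3, Unicode objects internally use a variety of representations, in order to allow handling the complete range of Unicode characters while staying memory efficient. There are special cases for strings where all code points are below 128, 256, or 65536; otherwise, code points must be below 1114112 (which is the full Unicode range).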
use a variety of representations, in order to allow handling the complete range
 
of Unicode characters while staying memory efficient. There are special cases
 
for strings where all code points are below 128, 256, or 65536; otherwise, code
 
points must be below 1114112 (which is the full Unicode range).
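The thresholds above can be read as a rule for choosing a storage width from the largest code point in a string. The following is a minimal sketch of that rule only; <code>sketch_bytes_per_char</code> and <code>sketch_is_ascii</code> are hypothetical names introduced for illustration and are not part of the CPython API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a CPython function: number of bytes per code
   point (1, 2 or 4) needed when the largest code point is max_cp. */
int sketch_bytes_per_char(uint32_t max_cp)
{
    assert(max_cp < 1114112);  /* 0x110000 and above is outside Unicode */
    if (max_cp < 256)
        return 1;              /* covers both the <128 and the <256 case */
    if (max_cp < 65536)
        return 2;
    return 4;
}

/* Hypothetical helper: nonzero when every code point is below 128. */
int sketch_is_ascii(uint32_t max_cp)
{
    return max_cp < 128;
}
```

Under these assumptions a string whose largest code point is U+00E9 fits in one byte per character, while a single U+1F600 forces four bytes per character for the whole string.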
 
  
<span class="xref c c-texpr">[[#c.Py_UNICODE|Py_UNICODE]]*</span> and UTF-8 representations are created on demand and cached
+
<span class="xref c c-texpr">Py_UNICODE*</span> and UTF-8 representations are created on demand and cached in the Unicode object. The <span class="xref c c-texpr">Py_UNICODE*</span> representation is deprecated and inefficient.
in the Unicode object. The <span class="xref c c-texpr">[[#c.Py_UNICODE|Py_UNICODE]]*</span> representation is deprecated
 
and inefficient; it should be avoided in performance- or memory-sensitive
 
situations.
 
  
Due to the transition between the old APIs and the new APIs, Unicode objects
+
Due to the transition between the old APIs and the new APIs, Unicode objects can internally be in two states depending on how they were created:
can internally be in two states depending on how they were created:
 
  
* &quot;canonical&quot; Unicode objects are all objects created by a non-deprecated Unicode API. They use the most efficient representation allowed by the implementation.
+
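* &quot;canonical&quot; Unicode objects are all objects created by a non-deprecated Unicode API. They use the most efficient representation allowed by the implementation.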
* &quot;legacy&quot; Unicode objects have been created through one of the deprecated APIs (typically [[#c.PyUnicode_FromUnicode|<code>PyUnicode_FromUnicode()</code>]]) and only bear the <span class="xref c c-texpr">[[#c.Py_UNICODE|Py_UNICODE]]*</span> representation; you will have to call [[#c.PyUnicode_READY|<code>PyUnicode_READY()</code>]] on them before calling any other API.
+
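* &quot;legacy&quot; Unicode objects have been created through one of the deprecated APIs (typically [[#c.PyUnicode_FromUnicode|PyUnicode_FromUnicode()]]) and only bear the <span class="xref c c-texpr">Py_UNICODE*</span> representation; you will have to call [[#c.PyUnicode_READY|PyUnicode_READY()]] on them before calling any other API.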
  
 
<div class="admonition note">
 
<div class="admonition note">
  
Note
+
Note
  
The &quot;legacy&quot; Unicode object will be removed in Python 3.12 with deprecated
+
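The &quot;legacy&quot; Unicode object will be removed in Python 3.12, together with the deprecated APIs. From then on, all Unicode objects will be &quot;canonical&quot;. See <span id="index-1" class="target"></span>[https://www.python.org/dev/peps/pep-0623 PEP 623] for more information.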
APIs. All Unicode objects will be &quot;canonical&quot; since then. See <span id="index-1" class="target"></span>[https://www.python.org/dev/peps/pep-0623 '''PEP 623''']
 
for more information.
 
  
  
Line 37: Line 28:
 
<div id="unicode-type" class="section">
 
<div id="unicode-type" class="section">
  
=== Unicode Type ===
+
=== Unicode Type ===
  
These are the basic Unicode object types used for the Unicode implementation in
+
These are the basic Unicode object types used for the Unicode implementation in Python:
Python:
 
  
 
<dl>
 
<dl>
<dt>''type'' <code>Py_UCS4</code><br />
+
<dt><span class="k"><span class="pre">type</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UCS4</span></span></span><br />
''type'' <code>Py_UCS2</code><br />
+
<br />
''type'' <code>Py_UCS1</code></dt>
+
<span class="k"><span class="pre">type</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UCS2</span></span></span><br />
<dd><p>These types are typedefs for unsigned integer types wide enough to contain
+
<br />
characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with
+
<span class="k"><span class="pre">type</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UCS1</span></span></span><br />
single Unicode characters, use [[#c.Py_UCS4|<code>Py_UCS4</code>]].</p>
+
</dt>
 +
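<dd><p>These types are typedefs for unsigned integer types wide enough to contain characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with single Unicode characters, use [[#c.Py_UCS4|Py_UCS4]].</p>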
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>''type'' <code>Py_UNICODE</code></dt>
+
<dt><span class="k"><span class="pre">type</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE</span></span></span><br />
<dd><p>This is a typedef of <code>wchar_t</code>, which is a 16-bit type or 32-bit type
+
</dt>
depending on the platform.</p>
+
<dd><p>This is a typedef of <code>wchar_t</code>, which is a 16-bit type or a 32-bit type depending on the platform.</p>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.3: </span>In previous versions, this was a 16-bit type or a 32-bit type depending on
+
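<p><span class="versionmodified changed">Changed in version 3.3: </span>In previous versions, this was a 16-bit type or a 32-bit type depending on whether you selected a &quot;narrow&quot; or &quot;wide&quot; Unicode version of Python at build time.</p>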
whether you selected a &quot;narrow&quot; or &quot;wide&quot; Unicode version of Python at
 
build time.</p>
 
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>''type'' <code>PyASCIIObject</code><br />
+
<dt><span class="k"><span class="pre">type</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyASCIIObject</span></span></span><br />
''type'' <code>PyCompactUnicodeObject</code><br />
+
<br />
''type'' <code>PyUnicodeObject</code></dt>
+
<span class="k"><span class="pre">type</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyCompactUnicodeObject</span></span></span><br />
<dd><p>These subtypes of [[../structures#c|<code>PyObject</code>]] represent a Python Unicode object. In
+
<br />
almost all cases, they shouldn't be used directly, since all API functions
+
<span class="k"><span class="pre">type</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicodeObject</span></span></span><br />
that deal with Unicode objects take and return [[../structures#c|<code>PyObject</code>]] pointers.</p>
+
</dt>
 +
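<dd><p>These subtypes of [[../structures#c|PyObject]] represent a Python Unicode object. In almost all cases, they shouldn't be used directly, since all API functions that deal with Unicode objects take and return [[../structures#c|PyObject]] pointers.</p>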
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
; [[../type#c|PyTypeObject]] <code>PyUnicode_Type</code>
+
; [[../type#c|<span class="n"><span class="pre">PyTypeObject</span></span>]]<span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Type</span></span></span><br />
: This instance of [[../type#c|<code>PyTypeObject</code>]] represents the Python Unicode type. It is exposed to Python code as <code>str</code>.
 
  
The following APIs are really C macros and can be used to do fast checks and to
+
: This instance of [[../type#c|PyTypeObject]] represents the Python Unicode type. It is exposed to Python code as <code>str</code>.
access internal read-only data of Unicode objects:
 
  
; int <code>PyUnicode_Check</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span>
+
The following APIs are really C macros and can be used to do fast checks and to access internal read-only data of Unicode objects:
: Return true if the object ''o'' is a Unicode object or an instance of a Unicode subtype.
 
  
; int <code>PyUnicode_CheckExact</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span>
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Check</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
: Return true if the object ''o'' is a Unicode object, but not an instance of a subtype.
+
 
 +
: Return true if the object ''o'' is a Unicode object or an instance of a Unicode subtype. This function always succeeds.
 +
 
 +
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_CheckExact</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
 +
 
 +
: Return true if the object ''o'' is a Unicode object, but not an instance of a subtype. This function always succeeds.
  
 
<dl>
 
<dl>
<dt>int <code>PyUnicode_READY</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span></dt>
+
<dt><span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_READY</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
<dd><p>Ensure the string object ''o'' is in the &quot;canonical&quot; representation. This is
+
</dt>
required before using any of the access macros described below.</p>
+
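<dd><p>Ensure the string object ''o'' is in the &quot;canonical&quot; representation. This is required before using any of the access macros described below.</p>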
<p>Returns <code>0</code> on success and <code>-1</code> with an exception set on failure, which in
+
<p>Returns <code>0</code> on success and <code>-1</code> with an exception set on failure, which in particular happens if memory allocation fails.</p>
particular happens if memory allocation fails.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
 
 
</div>
 
<div class="deprecated-removed">
 
 
 
<p><span class="versionmodified">Deprecated since version 3.10, will be removed in version 3.12: </span>This API will be removed with [[#c.PyUnicode_FromUnicode|<code>PyUnicode_FromUnicode()</code>]].</p>
 
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>Py_ssize_t <code>PyUnicode_GET_LENGTH</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span></dt>
+
<dt><span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_GET_LENGTH</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
<dd><p>Return the length of the Unicode string, in code points. ''o'' has to be a
+
</dt>
Unicode object in the &quot;canonical&quot; representation (not checked).</p>
+
<dd><p>Return the length of the Unicode string, in code points. ''o'' has to be a Unicode object in the &quot;canonical&quot; representation (not checked).</p>
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[#c.Py_UCS1|Py_UCS1]] *<code>PyUnicode_1BYTE_DATA</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span><br />
+
<dt>[[#c.Py_UCS1|<span class="n"><span class="pre">Py_UCS1</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_1BYTE_DATA</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
[[#c.Py_UCS2|Py_UCS2]] *<code>PyUnicode_2BYTE_DATA</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span><br />
+
<br />
[[#c.Py_UCS4|Py_UCS4]] *<code>PyUnicode_4BYTE_DATA</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span></dt>
+
[[#c.Py_UCS2|<span class="n"><span class="pre">Py_UCS2</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_2BYTE_DATA</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
<dd><p>Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
+
<br />
integer types for direct character access. No checks are performed if the
+
[[#c.Py_UCS4|<span class="n"><span class="pre">Py_UCS4</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_4BYTE_DATA</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
canonical representation has the correct character size; use
+
</dt>
[[#c.PyUnicode_KIND|<code>PyUnicode_KIND()</code>]] to select the right macro. Make sure
+
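<dd><p>Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4 integer types for direct character access. No check is performed that the canonical representation has the correct character size; use [[#c.PyUnicode_KIND|PyUnicode_KIND()]] to select the right macro. Make sure [[#c.PyUnicode_READY|PyUnicode_READY()]] has been called before accessing this.</p>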
[[#c.PyUnicode_READY|<code>PyUnicode_READY()</code>]] has been called before accessing this.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
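Because no check confirms that the buffer matches the requested character size, the caller has to pick the element type from the kind before dereferencing. The sketch below emulates that dispatch in plain C without the Python headers; <code>sketch_kind</code> and <code>sketch_read_at</code> are hypothetical names, and the enum values are assumptions of this sketch that simply mirror the number of bytes per character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kind constants (assumed values). */
enum sketch_kind { SKETCH_1BYTE = 1, SKETCH_2BYTE = 2, SKETCH_4BYTE = 4 };

/* Read the code point at index, casting the raw buffer to the element
   type that matches kind. Passing the wrong kind would misread the
   buffer, which is why the kind is inspected first. */
uint32_t sketch_read_at(enum sketch_kind kind, const void *data, size_t index)
{
    switch (kind) {
    case SKETCH_1BYTE:
        return ((const uint8_t *)data)[index];
    case SKETCH_2BYTE:
        return ((const uint16_t *)data)[index];
    case SKETCH_4BYTE:
        return ((const uint32_t *)data)[index];
    }
    assert(0 && "unknown kind");
    return 0;
}
```

In real extension code the same switch would select among the three typed data macros rather than casting by hand.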
  
 
<dl>
 
<dl>
<dt><code>PyUnicode_WCHAR_KIND</code><br />
+
<dt><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_WCHAR_KIND</span></span></span><br />
<code>PyUnicode_1BYTE_KIND</code><br />
+
<br />
<code>PyUnicode_2BYTE_KIND</code><br />
+
<span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_1BYTE_KIND</span></span></span><br />
<code>PyUnicode_4BYTE_KIND</code></dt>
+
<br />
<dd><p>Return values of the [[#c.PyUnicode_KIND|<code>PyUnicode_KIND()</code>]] macro.</p>
+
<span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_2BYTE_KIND</span></span></span><br />
 +
<br />
 +
<span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_4BYTE_KIND</span></span></span><br />
 +
</dt>
 +
<dd><p>Return values of the [[#c.PyUnicode_KIND|PyUnicode_KIND()]] macro.</p>
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
 
 
</div>
 
<div class="deprecated-removed">
 
 
 
<p><span class="versionmodified">Deprecated since version 3.10, will be removed in version 3.12: </span><code>PyUnicode_WCHAR_KIND</code> is deprecated.</p>
 
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>int <code>PyUnicode_KIND</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span></dt>
+
<dt><span class="kt"><span class="pre">unsigned</span> <span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_KIND</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
<dd><p>Return one of the PyUnicode kind constants (see above) that indicate how many
+
</dt>
bytes per character this Unicode object uses to store its data. ''o'' has to
+
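<dd><p>Return one of the PyUnicode kind constants (see above) that indicate how many bytes per character this Unicode object uses to store its data. ''o'' has to be a Unicode object in the &quot;canonical&quot; representation (not checked).</p>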
be a Unicode object in the &quot;canonical&quot; representation (not checked).</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>void *<code>PyUnicode_DATA</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span></dt>
+
<dt><span class="kt"><span class="pre">void</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DATA</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
<dd><p>Return a void pointer to the raw Unicode buffer. ''o'' has to be a Unicode
+
</dt>
object in the &quot;canonical&quot; representation (not checked).</p>
+
<dd><p>Return a void pointer to the raw Unicode buffer. ''o'' has to be a Unicode object in the &quot;canonical&quot; representation (not checked).</p>
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>void <code>PyUnicode_WRITE</code><span class="sig-paren">(</span>int ''kind'', void *''data'', Py_ssize_t ''index'', [[#c.Py_UCS4|Py_UCS4]] ''value''<span class="sig-paren">)</span></dt>
+
<dt><span class="kt"><span class="pre">void</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_WRITE</span></span></span><span class="sig-paren">(</span><span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">kind</span></span>, <span class="kt"><span class="pre">void</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">data</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">index</span></span>, [[#c.Py_UCS4|<span class="n"><span class="pre">Py_UCS4</span></span>]]<span class="w"> </span><span class="n"><span class="pre">value</span></span><span class="sig-paren">)</span><br />
<dd><p>Write into a canonical representation ''data'' (as obtained with
+
</dt>
[[#c.PyUnicode_DATA|<code>PyUnicode_DATA()</code>]]). This macro does not do any sanity checks and is
+
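<dd><p>Write into a canonical representation ''data'' (as obtained with [[#c.PyUnicode_DATA|PyUnicode_DATA()]]). This macro does not do any sanity checks and is intended for usage in loops. The caller should cache the ''kind'' value and ''data'' pointer as obtained from other macro calls. ''index'' is the index in the string (starts at 0) and ''value'' is the new code point value which should be written to that location.</p>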
intended for usage in loops. The caller should cache the ''kind'' value and
 
''data'' pointer as obtained from other macro calls. ''index'' is the index in
 
the string (starts at 0) and ''value'' is the new code point value which should
 
be written to that location.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[#c.Py_UCS4|Py_UCS4]] <code>PyUnicode_READ</code><span class="sig-paren">(</span>int ''kind'', void *''data'', Py_ssize_t ''index''<span class="sig-paren">)</span></dt>
+
<dt>[[#c.Py_UCS4|<span class="n"><span class="pre">Py_UCS4</span></span>]]<span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_READ</span></span></span><span class="sig-paren">(</span><span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">kind</span></span>, <span class="kt"><span class="pre">void</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">data</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">index</span></span><span class="sig-paren">)</span><br />
<dd><p>Read a code point from a canonical representation ''data'' (as obtained with
+
</dt>
[[#c.PyUnicode_DATA|<code>PyUnicode_DATA()</code>]]). No checks or ready calls are performed.</p>
+
<dd><p>Read a code point from a canonical representation ''data'' (as obtained with [[#c.PyUnicode_DATA|PyUnicode_DATA()]]). No checks or ready calls are performed.</p>
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
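The read and write macros are meant for loops in which ''kind'' and ''data'' are fetched once and reused. The following plain-C emulation illustrates that access pattern only; <code>sketch_read</code>, <code>sketch_write</code> and <code>sketch_replace</code> are hypothetical functions, not the CPython macros, and ''kind'' is assumed here to be the number of bytes per code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emulation of a kind-dispatched read (kind is 1, 2 or 4). */
uint32_t sketch_read(int kind, const void *data, size_t index)
{
    if (kind == 1)
        return ((const uint8_t *)data)[index];
    if (kind == 2)
        return ((const uint16_t *)data)[index];
    assert(kind == 4);
    return ((const uint32_t *)data)[index];
}

/* Hypothetical emulation of a kind-dispatched write. The value must fit
   in the element width selected by kind. */
void sketch_write(int kind, void *data, size_t index, uint32_t value)
{
    if (kind == 1) {
        assert(value <= 0xFF);
        ((uint8_t *)data)[index] = (uint8_t)value;
    } else if (kind == 2) {
        assert(value <= 0xFFFF);
        ((uint16_t *)data)[index] = (uint16_t)value;
    } else {
        assert(kind == 4);
        ((uint32_t *)data)[index] = value;
    }
}

/* Replace every occurrence of from with to and return the number of
   replacements. kind and data are obtained once by the caller and
   reused on every iteration. */
size_t sketch_replace(int kind, void *data, size_t length,
                      uint32_t from, uint32_t to)
{
    size_t count = 0;
    for (size_t i = 0; i < length; i++) {
        if (sketch_read(kind, data, i) == from) {
            sketch_write(kind, data, i, to);
            count++;
        }
    }
    return count;
}
```

The assertions in <code>sketch_write</code> are an addition of this sketch; the text above states that the real macro performs no sanity checks.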
  
 
<dl>
 
<dl>
<dt>[[#c.Py_UCS4|Py_UCS4]] <code>PyUnicode_READ_CHAR</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o'', Py_ssize_t ''index''<span class="sig-paren">)</span></dt>
+
<dt>[[#c.Py_UCS4|<span class="n"><span class="pre">Py_UCS4</span></span>]]<span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_READ_CHAR</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">index</span></span><span class="sig-paren">)</span><br />
<dd><p>Read a character from a Unicode object ''o'', which must be in the &quot;canonical&quot;
+
</dt>
representation. This is less efficient than [[#c.PyUnicode_READ|<code>PyUnicode_READ()</code>]] if you
+
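<dd><p>Read a character from a Unicode object ''o'', which must be in the &quot;canonical&quot; representation. This is less efficient than [[#c.PyUnicode_READ|PyUnicode_READ()]] if you do multiple consecutive reads.</p>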
do multiple consecutive reads.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt><code>PyUnicode_MAX_CHAR_VALUE</code><span class="sig-paren">(</span>''o''<span class="sig-paren">)</span></dt>
+
<dt><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_MAX_CHAR_VALUE</span></span></span><span class="sig-paren">(</span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
<dd><p>Return the maximum code point that is suitable for creating another string
+
</dt>
based on ''o'', which must be in the &quot;canonical&quot; representation. This is
+
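<dd><p>Return the maximum code point that is suitable for creating another string based on ''o'', which must be in the &quot;canonical&quot; representation. This is always an approximation but more efficient than iterating over the string.</p>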
always an approximation but more efficient than iterating over the string.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
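One plausible reading of &quot;always an approximation&quot; is that the bound comes from the storage width of the string rather than from the characters it actually contains. The sketch below works under that assumption; <code>sketch_max_char_value</code> is a hypothetical function and its parameters are not taken from the documented API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: upper bound on the code points a string can hold,
   derived only from whether it is ASCII and from its bytes per character. */
uint32_t sketch_max_char_value(int is_ascii, int bytes_per_char)
{
    if (is_ascii)
        return 0x7F;
    switch (bytes_per_char) {
    case 1:
        return 0xFF;
    case 2:
        return 0xFFFF;
    case 4:
        return 0x10FFFF;
    }
    assert(0 && "unsupported width");
    return 0;
}
```

A two-byte string whose largest character is U+0100 would still report 0xFFFF here, which is the sense in which the result is an upper bound and not the exact maximum.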
  
<dl>
+
; <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_GET_SIZE</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
<dt>Py_ssize_t <code>PyUnicode_GET_SIZE</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span></dt>
+
 
<dd><p>Return the size of the deprecated [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] representation, in
+
: Return the size of the deprecated [[#c.Py_UNICODE|Py_UNICODE]] representation, in code units (this includes surrogate pairs as 2 units). ''o'' has to be a Unicode object (not checked).
code units (this includes surrogate pairs as 2 units). ''o'' has to be a
 
Unicode object (not checked).</p>
 
<div class="deprecated-removed">
 
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 3.12: </span>Part of the old-style Unicode API, please migrate to using
+
; <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_GET_DATA_SIZE</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
[[#c.PyUnicode_GET_LENGTH|<code>PyUnicode_GET_LENGTH()</code>]].</p>
 
  
</div></dd></dl>
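The difference between code units and code points matters only when the legacy representation is 16 bits wide. As a minimal sketch, assuming a 16-bit <code>wchar_t</code>, the hypothetical helper <code>sketch_utf16_units</code> counts the code units a sequence of code points would occupy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 16-bit code units needed to store the
   given code points. Code points above 0xFFFF need a surrogate pair and
   therefore count as two units. */
size_t sketch_utf16_units(const uint32_t *code_points, size_t length)
{
    size_t units = 0;
    for (size_t i = 0; i < length; i++) {
        assert(code_points[i] <= 0x10FFFF);
        units += (code_points[i] > 0xFFFF) ? 2 : 1;
    }
    return units;
}
```

For the three code points U+0041, U+1F600 and U+4E2D this yields four units, while the length in code points is three.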
+
: Return the size of the deprecated [[#c.Py_UNICODE|Py_UNICODE]] representation in bytes. ''o'' has to be a Unicode object (not checked).
  
 
<dl>
 
<dl>
<dt>Py_ssize_t <code>PyUnicode_GET_DATA_SIZE</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span></dt>
+
<dt>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AS_UNICODE</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
<dd><p>Return the size of the deprecated [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] representation in
+
<br />
bytes. ''o'' has to be a Unicode object (not checked).</p>
+
<span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AS_DATA</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
<div class="deprecated-removed">
+
</dt>
 +
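<dd><p>Return a pointer to a [[#c.Py_UNICODE|Py_UNICODE]] representation of the object. The returned buffer is always terminated with an extra null code point. It may also contain embedded null code points, which would cause the string to be truncated when used in most C functions. The <code>AS_DATA</code> form casts the pointer to <span class="xref c c-texpr">const char*</span>. The ''o'' argument has to be a Unicode object (not checked).</p>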
 +
<div class="versionchanged">
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 3.12: </span>Part of the old-style Unicode API, please migrate to using
+
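<p><span class="versionmodified changed">Changed in version 3.3: </span>This macro is now inefficient, because in many cases the [[#c.Py_UNICODE|Py_UNICODE]] representation does not exist and needs to be created, and it can fail (return <code>NULL</code> with an exception set). Try to port the code to use the new <code>PyUnicode_nBYTE_DATA()</code> macros or use [[#c.PyUnicode_WRITE|PyUnicode_WRITE()]] or [[#c.PyUnicode_READ|PyUnicode_READ()]].</p>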
[[#c.PyUnicode_GET_LENGTH|<code>PyUnicode_GET_LENGTH()</code>]].</p>
 
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[#c.Py_UNICODE|Py_UNICODE]] *<code>PyUnicode_AS_UNICODE</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span><br />
+
<dt><span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_IsIdentifier</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">o</span></span><span class="sig-paren">)</span><br />
''const'' char *<code>PyUnicode_AS_DATA</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span></dt>
+
</dt>
<dd><p>Return a pointer to a [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] representation of the object. The
+
<dd><p>Return <code>1</code> if the string is a valid identifier according to the language definition, section [[../../reference/lexical_analysis#identifiers|Identifiers and keywords]]. Return <code>0</code> otherwise.</p>
returned buffer is always terminated with an extra null code point. It
 
may also contain embedded null code points, which would cause the string
 
to be truncated when used in most C functions. The <code>AS_DATA</code> form
 
casts the pointer to <span class="xref c c-texpr">''const'' char*</span>. The ''o'' argument has to be
 
a Unicode object (not checked).</p>
 
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.3: </span>This macro is now inefficient -- because in many cases the
+
<p><span class="versionmodified changed">Changed in version 3.9: </span>The function does not call [[../sys#c|Py_FatalError()]] anymore if the string is not ready.</p>
[[#c.Py_UNICODE|<code>Py_UNICODE</code>]] representation does not exist and needs to be created
+
 
-- and can fail (return <code>NULL</code> with an exception set). Try to port the
+
</div></dd></dl>
code to use the new <code>PyUnicode_nBYTE_DATA()</code> macros or use
+
 
[[#c.PyUnicode_WRITE|<code>PyUnicode_WRITE()</code>]] or [[#c.PyUnicode_READ|<code>PyUnicode_READ()</code>]].</p>
 
  
 
</div>
 
</div>
<div class="deprecated-removed">
+
<div id="unicode-character-properties" class="section">
 +
 
 +
=== Unicode Character Properties ===
 +
 
 +
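Unicode provides many different character properties. The most often needed ones are available through these macros, which are mapped to C functions depending on the Python configuration.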
 +
 
 +
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_ISSPACE</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 3.12: </span>Part of the old-style Unicode API, please migrate to using the
+
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a whitespace character.
<code>PyUnicode_nBYTE_DATA()</code> family of macros.</p>
 
  
</div></dd></dl>
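The truncation risk from embedded null code points can be seen with ordinary wide-character functions, which stop at the first null. The sketch below is independent of the Python headers; <code>sketch_has_embedded_null</code> is a hypothetical helper, and the caller is assumed to know the true length of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: nonzero when a buffer holding length code units
   (followed by a terminating null) contains an earlier null, so that C
   string functions would see a shorter string. */
int sketch_has_embedded_null(const wchar_t *buffer, size_t length)
{
    assert(buffer != NULL);
    return wcslen(buffer) < length;
}
```

For a buffer holding <code>a</code>, <code>b</code>, a null, then <code>c</code>, <code>wcslen()</code> reports 2 even though the buffer holds four code units before its terminator.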
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_ISLOWER</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
 +
 
 +
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a lowercase character.
  
<dl>
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_ISUPPER</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
<dt>int <code>PyUnicode_IsIdentifier</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''o''<span class="sig-paren">)</span></dt>
 
<dd><p>Return <code>1</code> if the string is a valid identifier according to the language
 
definition, section [[../../reference/lexical_analysis#identifiers|<span class="std std-ref">Identifiers and keywords</span>]]. Return <code>0</code> otherwise.</p>
 
<div class="versionchanged">
 
  
<p><span class="versionmodified changed">Changed in version 3.9: </span>The function does not call [[../sys#c|<code>Py_FatalError()</code>]] anymore if the string
+
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is an uppercase character.
is not ready.</p>
 
  
</div></dd></dl>
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_ISTITLE</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
  
 +
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a titlecase character.
  
</div>
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_ISLINEBREAK</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
<div id="unicode-character-properties" class="section">
 
  
=== Unicode Character Properties ===
+
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a linebreak character.
  
Unicode provides many different character properties. The most often needed ones
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_ISDECIMAL</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
are available through these macros which are mapped to C functions depending on
 
the Python configuration.
 
  
; int <code>Py_UNICODE_ISSPACE</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a decimal character.
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a whitespace character.
 
  
; int <code>Py_UNICODE_ISLOWER</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_ISDIGIT</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a lowercase character.
 
  
; int <code>Py_UNICODE_ISUPPER</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a digit character.
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is an uppercase character.
 
  
; int <code>Py_UNICODE_ISTITLE</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_ISNUMERIC</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a titlecase character.
 
  
; int <code>Py_UNICODE_ISLINEBREAK</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a numeric character.
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a linebreak character.
 
  
; int <code>Py_UNICODE_ISDECIMAL</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_ISALPHA</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a decimal character.
 
  
; int <code>Py_UNICODE_ISDIGIT</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is an alphabetic character.
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a digit character.
 
  
; int <code>Py_UNICODE_ISNUMERIC</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_ISALNUM</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a numeric character.
 
  
; int <code>Py_UNICODE_ISALPHA</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is an alphanumeric character.
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is an alphabetic character.
 
  
; int <code>Py_UNICODE_ISALNUM</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_ISPRINTABLE</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is an alphanumeric character.
 
  
; int <code>Py_UNICODE_ISPRINTABLE</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
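: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a printable character. Nonprintable characters are those defined in the Unicode character database as &quot;Other&quot; or &quot;Separator&quot;, except the ASCII space (0x20), which is considered printable. (Note that printable characters in this context are those which should not be escaped when [[../../library/functions#repr|repr()]] is invoked on a string. It has no bearing on the handling of strings written to [[../../library/sys#sys|sys.stdout]] or [[../../library/sys#sys|sys.stderr]].)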
: Return <code>1</code> or <code>0</code> depending on whether ''ch'' is a printable character. Nonprintable characters are those characters defined in the Unicode character database as &quot;Other&quot; or &quot;Separator&quot;, excepting the ASCII space (0x20) which is considered printable. (Note that printable characters in this context are those which should not be escaped when [[../../library/functions#repr|<code>repr()</code>]] is invoked on a string. It has no bearing on the handling of strings written to [[../../library/sys#sys|<code>sys.stdout</code>]] or [[../../library/sys#sys|<code>sys.stderr</code>]].)
 
  
These APIs can be used for fast direct character conversions:
+
These APIs can be used for fast direct character conversions:
  
 
<dl>
 
<dl>
<dt>[[#c.Py_UNICODE|Py_UNICODE]] <code>Py_UNICODE_TOLOWER</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span></dt>
+
<dt>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_TOLOWER</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
<dd><p>Return the character ''ch'' converted to lower case.</p>
+
</dt>
 +
<dd><p>Return the character ''ch'' converted to lower case.</p>
 
<div class="deprecated">
 
<div class="deprecated">
  
<p><span class="versionmodified deprecated">Deprecated since version 3.3: </span>This function uses simple case mappings.</p>
+
<p><span class="versionmodified deprecated">Deprecated since version 3.3: </span>This function uses simple case mappings.</p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[#c.Py_UNICODE|Py_UNICODE]] <code>Py_UNICODE_TOUPPER</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span></dt>
+
<dt>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_TOUPPER</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
<dd><p>Return the character ''ch'' converted to upper case.</p>
+
</dt>
 +
<dd><p>Return the character ''ch'' converted to upper case.</p>
 
<div class="deprecated">
 
<div class="deprecated">
  
<p><span class="versionmodified deprecated">Deprecated since version 3.3: </span>This function uses simple case mappings.</p>
+
<p><span class="versionmodified deprecated">Deprecated since version 3.3: </span>This function uses simple case mappings.</p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[#c.Py_UNICODE|Py_UNICODE]] <code>Py_UNICODE_TOTITLE</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span></dt>
+
<dt>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_TOTITLE</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
<dd><p>Return the character ''ch'' converted to title case.</p>
+
</dt>
 +
<dd><p>Return the character ''ch'' converted to title case.</p>
 
<div class="deprecated">
 
<div class="deprecated">
  
<p><span class="versionmodified deprecated">Deprecated since version 3.3: </span>This function uses simple case mappings.</p>
+
<p><span class="versionmodified deprecated">Deprecated since version 3.3: </span>This function uses simple case mappings.</p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
; int <code>Py_UNICODE_TODECIMAL</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_TODECIMAL</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
: Return the character ''ch'' converted to a decimal positive integer. Return <code>-1</code> if this is not possible. This macro does not raise exceptions.
 
  
; int <code>Py_UNICODE_TODIGIT</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
: Return the character ''ch'' converted to a decimal positive integer. Return <code>-1</code> if this is not possible. This macro does not raise exceptions.
: Return the character ''ch'' converted to a single digit integer. Return <code>-1</code> if this is not possible. This macro does not raise exceptions.
 
  
; double <code>Py_UNICODE_TONUMERIC</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] ''ch''<span class="sig-paren">)</span>
+
; <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_TODIGIT</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
: Return the character ''ch'' converted to a double. Return <code>-1.0</code> if this is not possible. This macro does not raise exceptions.
 
  
These APIs can be used to work with surrogates:
+
: Return the character ''ch'' converted to a single digit integer. Return <code>-1</code> if this is not possible. This macro does not raise exceptions.
  
; <code>Py_UNICODE_IS_SURROGATE</code><span class="sig-paren">(</span>''ch''<span class="sig-paren">)</span>
+
; <span class="kt"><span class="pre">double</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_TONUMERIC</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
: Check if ''ch'' is a surrogate (<code>0xD800 &lt;= ch &lt;= 0xDFFF</code>).
 
  
; <code>Py_UNICODE_IS_HIGH_SURROGATE</code><span class="sig-paren">(</span>''ch''<span class="sig-paren">)</span>
+
: Return the character ''ch'' converted to a double. Return <code>-1.0</code> if this is not possible. This macro does not raise exceptions.
: Check if ''ch'' is a high surrogate (<code>0xD800 &lt;= ch &lt;= 0xDBFF</code>).
 
  
; <code>Py_UNICODE_IS_LOW_SURROGATE</code><span class="sig-paren">(</span>''ch''<span class="sig-paren">)</span>
+
These APIs can be used to work with surrogates:
: Check if ''ch'' is a low surrogate (<code>0xDC00 &lt;= ch &lt;= 0xDFFF</code>).
 
  
; <code>Py_UNICODE_JOIN_SURROGATES</code><span class="sig-paren">(</span>''high'', ''low''<span class="sig-paren">)</span>
+
; <span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_IS_SURROGATE</span></span></span><span class="sig-paren">(</span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
: Join two surrogate characters and return a single Py_UCS4 value. ''high'' and ''low'' are respectively the leading and trailing surrogates in a surrogate pair.
+
 
 +
: Check if ''ch'' is a surrogate (<code>0xD800 &lt;= ch &lt;= 0xDFFF</code>).
 +
 
 +
; <span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_IS_HIGH_SURROGATE</span></span></span><span class="sig-paren">(</span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
 +
 
 +
: Check if ''ch'' is a high surrogate (<code>0xD800 &lt;= ch &lt;= 0xDBFF</code>).
 +
 
 +
; <span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_IS_LOW_SURROGATE</span></span></span><span class="sig-paren">(</span><span class="n"><span class="pre">ch</span></span><span class="sig-paren">)</span><br />
 +
 
 +
: Check if ''ch'' is a low surrogate (<code>0xDC00 &lt;= ch &lt;= 0xDFFF</code>).
 +
 
 +
; <span class="sig-name descname"><span class="n"><span class="pre">Py_UNICODE_JOIN_SURROGATES</span></span></span><span class="sig-paren">(</span><span class="n"><span class="pre">high</span></span>, <span class="n"><span class="pre">low</span></span><span class="sig-paren">)</span><br />
 +
 
 +
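: Join two surrogate characters and return a single Py_UCS4 value. ''high'' and ''low'' are respectively the leading and trailing surrogates in a surrogate pair.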
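
The surrogate checks and the join operation can be illustrated without the Python headers. The following is a minimal sketch, not the CPython implementation: the function names are hypothetical stand-ins for the macros above, the ranges are taken from the macro descriptions, and the join uses the standard UTF-16 decoding formula.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the surrogate macros; the ranges are the ones
   quoted in the macro descriptions. */
int is_surrogate(uint32_t ch)      { return 0xD800 <= ch && ch <= 0xDFFF; }
int is_high_surrogate(uint32_t ch) { return 0xD800 <= ch && ch <= 0xDBFF; }
int is_low_surrogate(uint32_t ch)  { return 0xDC00 <= ch && ch <= 0xDFFF; }

/* Combine a leading (high) and trailing (low) surrogate into one code point
   using the standard UTF-16 formula. */
uint32_t join_surrogates(uint32_t high, uint32_t low)
{
    assert(is_high_surrogate(high));
    assert(is_low_surrogate(low));
    /* Each surrogate carries 10 payload bits; the result is offset by 0x10000. */
    return 0x10000 + (((high & 0x03FF) << 10) | (low & 0x03FF));
}
```

For example, the pair <code>0xD83D</code>, <code>0xDE00</code> joins to the code point <code>0x1F600</code>.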
  
  
Line 376: Line 343:
 
<div id="creating-and-accessing-unicode-strings" class="section">
 
<div id="creating-and-accessing-unicode-strings" class="section">
  
=== Creating and accessing Unicode strings ===
+
=== Creating and accessing Unicode strings ===
  
To create Unicode objects and access their basic sequence properties, use these
+
To create Unicode objects and access their basic sequence properties, use these APIs:
APIs:
 
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_New</code><span class="sig-paren">(</span>Py_ssize_t ''size'', [[#c.Py_UCS4|Py_UCS4]] ''maxchar''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_New</span></span></span><span class="sig-paren">(</span><span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, [[#c.Py_UCS4|<span class="n"><span class="pre">Py_UCS4</span></span>]]<span class="w"> </span><span class="n"><span class="pre">maxchar</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Create a new Unicode object. ''maxchar'' should be the true maximum code point
+
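<dd><p>Create a new Unicode object. ''maxchar'' should be the true maximum code point to be placed in the string. As an approximation, it can be rounded up to the nearest value in the sequence 127, 255, 65535, 1114111.</p>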
to be placed in the string. As an approximation, it can be rounded up to the
+
<p>This is the recommended way to allocate a new Unicode object. Objects created using this function are not resizable.</p>
nearest value in the sequence 127, 255, 65535, 1114111.</p>
 
<p>This is the recommended way to allocate a new Unicode object. Objects
 
created using this function are not resizable.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
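
The rounding rule for ''maxchar'' can be sketched as a small helper. This is only an illustration under stated assumptions: <code>round_up_maxchar</code> is a hypothetical name, not part of the C API, and it simply maps a code point to the next value in the sequence 127, 255, 65535, 1114111 given in the description of <code>PyUnicode_New()</code>.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a maximum code point up to the nearest value in
   the sequence 127, 255, 65535, 1114111. */
uint32_t round_up_maxchar(uint32_t maxchar)
{
    /* Code points above 0x10FFFF (1114111) are outside the Unicode range. */
    assert(maxchar <= 0x10FFFF);
    if (maxchar <= 127)
        return 127;
    if (maxchar <= 255)
        return 255;
    if (maxchar <= 65535)
        return 65535;
    return 1114111;
}
```

A caller that knows its largest code point is <code>0x20AC</code> could therefore pass either <code>0x20AC</code> or the rounded value 65535 as ''maxchar''.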
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_FromKindAndData</code><span class="sig-paren">(</span>int ''kind'', ''const'' void *''buffer'', Py_ssize_t ''size''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_FromKindAndData</span></span></span><span class="sig-paren">(</span><span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">kind</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">void</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">buffer</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Create a new Unicode object with the given ''kind'' (possible values are
+
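<dd><p>Create a new Unicode object with the given ''kind'' (possible values are [[#c.PyUnicode_1BYTE_KIND|PyUnicode_1BYTE_KIND]] etc., as returned by [[#c.PyUnicode_KIND|PyUnicode_KIND()]]). The ''buffer'' must point to an array of ''size'' units of 1, 2 or 4 bytes per character, as given by the kind.</p>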
[[#c.PyUnicode_1BYTE_KIND|<code>PyUnicode_1BYTE_KIND</code>]] etc., as returned by
 
[[#c.PyUnicode_KIND|<code>PyUnicode_KIND()</code>]]). The ''buffer'' must point to an array of ''size''
 
units of 1, 2 or 4 bytes per character, as given by the kind.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
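
Choosing a unit width of 1, 2 or 4 bytes for a buffer amounts to finding the largest code point it must hold. The sketch below assumes the thresholds 256 and 65536 mentioned for the internal representations; <code>bytes_per_char</code> is a hypothetical helper, not part of the C API, and it returns a byte width rather than a <code>PyUnicode_*_KIND</code> constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest unit width in bytes (1, 2 or 4) that can
   store every code point in cps[0..n-1]. */
int bytes_per_char(const uint32_t *cps, size_t n)
{
    uint32_t maxchar = 0;

    assert(cps != NULL || n == 0);
    /* Find the largest code point in the input. */
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > maxchar)
            maxchar = cps[i];
    }
    if (maxchar < 256)
        return 1;   /* fits in one byte per character */
    if (maxchar < 65536)
        return 2;   /* fits in two bytes per character */
    return 4;       /* needs four bytes per character */
}
```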
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_FromStringAndSize</code><span class="sig-paren">(</span>''const'' char *''u'', Py_ssize_t ''size''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_FromStringAndSize</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">u</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Create a Unicode object from the char buffer ''u''. The bytes will be
+
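<dd><p>Create a Unicode object from the char buffer ''u''. The bytes will be interpreted as being UTF-8 encoded. The buffer is copied into the new object. If the buffer is not <code>NULL</code>, the return value might be a shared object, i.e. modification of the data is not allowed.</p>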
interpreted as being UTF-8 encoded. The buffer is copied into the new
+
<p>If ''u'' is <code>NULL</code>, this function behaves like [[#c.PyUnicode_FromUnicode|PyUnicode_FromUnicode()]] with the buffer set to <code>NULL</code>. This usage is deprecated in favor of [[#c.PyUnicode_New|PyUnicode_New()]], and will be removed in Python 3.12.</p></dd></dl>
object. If the buffer is not <code>NULL</code>, the return value might be a shared
 
object, i.e. modification of the data is not allowed.</p>
 
<p>If ''u'' is <code>NULL</code>, this function behaves like [[#c.PyUnicode_FromUnicode|<code>PyUnicode_FromUnicode()</code>]]
 
with the buffer set to <code>NULL</code>. This usage is deprecated in favor of
 
[[#c.PyUnicode_New|<code>PyUnicode_New()</code>]].</p></dd></dl>
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_FromString</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">u</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_FromString</code><span class="sig-paren">(</span>''const'' char *''u''<span class="sig-paren">)</span></dt>
+
 
<dd><p>''Return value: New reference.''</p>
+
: Create a Unicode object from a UTF-8 encoded null-terminated char buffer ''u''.
<p>Create a Unicode object from a UTF-8 encoded null-terminated char buffer
 
''u''.</p></dd></dl>
 
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_FromFormat</code><span class="sig-paren">(</span>''const'' char *''format'', ...<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_FromFormat</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">format</span></span>, <span class="p"><span class="pre">...</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Take a C <code>printf()</code>-style ''format'' string and a variable number of
+
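<dd><p>Take a C <code>printf()</code>-style ''format'' string and a variable number of arguments, calculate the size of the resulting Python Unicode string and return a string with the values formatted into it. The variable arguments must be C types and must correspond exactly to the format characters in the ''format'' ASCII-encoded string. The following format characters are allowed:</p>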
arguments, calculate the size of the resulting Python Unicode string and return
 
a string with the values formatted into it. The variable arguments must be C
 
types and must correspond exactly to the format characters in the ''format''
 
ASCII-encoded string. The following format characters are allowed:</p>
 
 
{|
 
{|
!width="26%"| <p>Format Characters</p>
+
!width="26%"| <p>Format Characters</p>
!width="28%"| <p>Type</p>
+
!width="28%"| <p>Type</p>
!width="46%"| <p>Comment</p>
+
!width="46%"| <p>Comment</p>
 
|-
 
|-
 
| <p><code>%%</code></p>
 
| <p><code>%%</code></p>
| <p>''n/a''</p>
+
| <p>''n/a''</p>
 
| <p>The literal % character.</p>
 
| <p>The literal % character.</p>
 
|-
 
|-
 
| <p><code>%c</code></p>
 
| <p><code>%c</code></p>
| <p>int</p>
+
| <p>int</p>
| <p>A single character,
+
| <p>A single character, represented as a C int.</p>
represented as a C int.</p>
 
 
|-
 
|-
 
| <p><code>%d</code></p>
 
| <p><code>%d</code></p>
| <p>int</p>
+
| <p>int</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%d&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%d&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%u</code></p>
 
| <p><code>%u</code></p>
| <p>unsigned int</p>
+
| <p>unsigned int</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%u&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%u&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%ld</code></p>
 
| <p><code>%ld</code></p>
| <p>long</p>
+
| <p>long</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%ld&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%ld&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%li</code></p>
 
| <p><code>%li</code></p>
| <p>long</p>
+
| <p>long</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%li&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%li&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%lu</code></p>
 
| <p><code>%lu</code></p>
| <p>unsigned long</p>
+
| <p>unsigned long</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%lu&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%lu&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%lld</code></p>
 
| <p><code>%lld</code></p>
| <p>long long</p>
+
| <p>long long</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%lld&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%lld&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%lli</code></p>
 
| <p><code>%lli</code></p>
| <p>long long</p>
+
| <p>long long</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%lli&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%lli&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%llu</code></p>
 
| <p><code>%llu</code></p>
| <p>unsigned long long</p>
+
| <p>unsigned long long</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%llu&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%llu&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%zd</code></p>
 
| <p><code>%zd</code></p>
| <p>Py_ssize_t</p>
+
| <p>Py_ssize_t</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%zd&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%zd&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%zi</code></p>
 
| <p><code>%zi</code></p>
| <p>Py_ssize_t</p>
+
| <p>Py_ssize_t</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%zi&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%zi&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%zu</code></p>
 
| <p><code>%zu</code></p>
| <p>size_t</p>
+
| <p>size_t</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%zu&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%zu&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%i</code></p>
 
| <p><code>%i</code></p>
| <p>int</p>
+
| <p>int</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%i&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%i&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%x</code></p>
 
| <p><code>%x</code></p>
| <p>int</p>
+
| <p>int</p>
| <p>Equivalent to
+
| <p>Equivalent to <code>printf(&quot;%x&quot;)</code>. [[#id14|1]]</p>
<code>printf(&quot;%x&quot;)</code>. [[#id14|1]]</p>
 
 
|-
 
|-
 
| <p><code>%s</code></p>
 
| <p><code>%s</code></p>
| <p>const char*</p>
+
| <p>const char*</p>
| <p>A null-terminated C character
+
| <p>A null-terminated C character array.</p>
array.</p>
 
 
|-
 
|-
 
| <p><code>%p</code></p>
 
| <p><code>%p</code></p>
| <p>const void*</p>
+
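| <p>const void*</p>
| <p>The hex representation of a C
+
| <p>The hex representation of a C pointer. Mostly equivalent to <code>printf(&quot;%p&quot;)</code> except that it is guaranteed to start with the literal <code>0x</code> regardless of what the platform's <code>printf</code> yields.</p>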
pointer. Mostly equivalent to
 
<code>printf(&quot;%p&quot;)</code> except that
 
it is guaranteed to start with
 
the literal <code>0x</code> regardless
 
of what the platform's
 
<code>printf</code> yields.</p>
 
 
|-
 
|-
 
| <p><code>%A</code></p>
 
| <p><code>%A</code></p>
| <p>PyObject*</p>
+
| <p>PyObject*</p>
| <p>The result of calling
+
| <p>The result of calling [[../../library/functions#ascii|ascii()]].</p>
[[../../library/functions#ascii|<code>ascii()</code>]].</p>
 
 
|-
 
|-
 
| <p><code>%U</code></p>
 
| <p><code>%U</code></p>
| <p>PyObject*</p>
+
| <p>PyObject*</p>
| <p>A Unicode object.</p>
+
| <p>A Unicode object.</p>
 
|-
 
|-
 
| <p><code>%V</code></p>
 
| <p><code>%V</code></p>
| <p>PyObject*,
+
| <p>PyObject*, const char*</p>
const char*</p>
+
| <p>A Unicode object (which may be <code>NULL</code>) and a null-terminated C character array as a second parameter (which will be used if the first parameter is <code>NULL</code>).</p>
| <p>A Unicode object (which may be
 
<code>NULL</code>) and a null-terminated
 
C character array as a second
 
parameter (which will be used,
 
if the first parameter is
 
<code>NULL</code>).</p>
 
 
|-
 
|-
 
| <p><code>%S</code></p>
 
| <p><code>%S</code></p>
| <p>PyObject*</p>
+
| <p>PyObject*</p>
| <p>The result of calling
+
| <p>The result of calling [[../object#c|PyObject_Str()]].</p>
[[../object#c|<code>PyObject_Str()</code>]].</p>
 
 
|-
 
|-
 
| <p><code>%R</code></p>
 
| <p><code>%R</code></p>
| <p>PyObject*</p>
+
| <p>PyObject*</p>
| <p>The result of calling
+
| <p>The result of calling [[../object#c|PyObject_Repr()]].</p>
[[../object#c|<code>PyObject_Repr()</code>]].</p>
 
 
|}
 
|}
  
<p>An unrecognized format character causes all the rest of the format string to be
+
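<p>An unrecognized format character causes all the rest of the format string to be copied as-is to the result string, and any extra arguments are discarded.</p>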
copied as-is to the result string, and any extra arguments discarded.</p>
 
 
<div class="admonition note">
 
<div class="admonition note">
  
<p>Note</p>
+
<p>Note</p>
<p>The width formatter unit is number of characters rather than bytes.
+
<p>The width formatter unit is a number of characters rather than bytes. The precision formatter unit is a number of bytes for <code>&quot;%s&quot;</code> and <code>&quot;%V&quot;</code> (if the <code>PyObject*</code> argument is <code>NULL</code>), and a number of characters for <code>&quot;%A&quot;</code>, <code>&quot;%U&quot;</code>, <code>&quot;%S&quot;</code>, <code>&quot;%R&quot;</code> and <code>&quot;%V&quot;</code> (if the <code>PyObject*</code> argument is not <code>NULL</code>).</p>
The precision formatter unit is number of bytes for <code>&quot;%s&quot;</code> and
 
<code>&quot;%V&quot;</code> (if the <code>PyObject*</code> argument is <code>NULL</code>), and a number of
 
characters for <code>&quot;%A&quot;</code>, <code>&quot;%U&quot;</code>, <code>&quot;%S&quot;</code>, <code>&quot;%R&quot;</code> and <code>&quot;%V&quot;</code>
 
(if the <code>PyObject*</code> argument is not <code>NULL</code>).</p>
 
  
 
</div>
 
</div>
 
<dl>
 
<dl>
<dt><span class="brackets">1</span><span class="fn-backref">([[#id1|1]],[[#id2|2]],[[#id3|3]],[[#id4|4]],[[#id5|5]],[[#id6|6]],[[#id7|7]],[[#id8|8]],[[#id9|9]],[[#id10|10]],[[#id11|11]],[[#id12|12]],[[#id13|13]])</span></dt>
+
<dt><span class="brackets">1</span><span class="fn-backref">([[#id1|1]],[[#id2|2]],[[#id3|3]],[[#id4|4]],[[#id5|5]],[[#id6|6]],[[#id7|7]],[[#id8|8]],[[#id9|9]],[[#id10|10]],[[#id11|11]],[[#id12|12]],[[#id13|13]])</span></dt>
<dd><p>For integer specifiers (d, u, ld, li, lu, lld, lli, llu, zd, zi,
+
<dd><p>For integer specifiers (d, u, ld, li, lu, lld, lli, llu, zd, zi, zu, i, x): the 0-conversion flag has effect even when a precision is given.</p></dd></dl>
zu, i, x): the 0-conversion flag has effect even when a precision is given.</p></dd></dl>
 
  
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.2: </span>Support for <code>&quot;%lld&quot;</code> and <code>&quot;%llu&quot;</code> added.</p>
+
<p><span class="versionmodified changed">Changed in version 3.2: </span>Support for <code>&quot;%lld&quot;</code> and <code>&quot;%llu&quot;</code> added.</p>
  
 
</div>
 
</div>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.3: </span>Support for <code>&quot;%li&quot;</code>, <code>&quot;%lli&quot;</code> and <code>&quot;%zi&quot;</code> added.</p>
+
<p><span class="versionmodified changed"> 3.3 版更改: </span> 添加了对 <code>&quot;%li&quot;</code><code>&quot;%lli&quot;</code> <code>&quot;%zi&quot;</code> 的支持。</p>
  
 
</div>
 
</div>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.4: </span>Support width and precision formatter for <code>&quot;%s&quot;</code>, <code>&quot;%A&quot;</code>, <code>&quot;%U&quot;</code>,
+
<p><span class="versionmodified changed">Changed in version 3.4: </span>Support for width and precision formatters for <code>&quot;%s&quot;</code>, <code>&quot;%A&quot;</code>, <code>&quot;%U&quot;</code>, <code>&quot;%V&quot;</code>, <code>&quot;%S&quot;</code> and <code>&quot;%R&quot;</code> added.</p>
<code>&quot;%V&quot;</code>, <code>&quot;%S&quot;</code>, <code>&quot;%R&quot;</code> added.</p>
 
  
 
</div></dd></dl>
 
</div></dd></dl>
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_FromFormatV</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">format</span></span>, <span class="n"><span class="pre">va_list</span></span><span class="w"> </span><span class="n"><span class="pre">vargs</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_FromFormatV</code><span class="sig-paren">(</span>''const'' char *''format'', va_list ''vargs''<span class="sig-paren">)</span></dt>
+
 
<dd><p>''Return value: New reference.''</p>
+
: Identical to [[#c.PyUnicode_FromFormat|PyUnicode_FromFormat()]] except that it takes exactly two arguments: the format string and a <code>va_list</code>.
<p>Identical to [[#c.PyUnicode_FromFormat|<code>PyUnicode_FromFormat()</code>]] except that it takes exactly two
 
arguments.</p></dd></dl>
 
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_FromEncodedObject</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''obj'', ''const'' char *''encoding'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_FromEncodedObject</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">obj</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">encoding</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Decode an encoded object ''obj'' to a Unicode object.</p>
+
<dd><p>Decode an encoded object ''obj'' to a Unicode object.</p>
<p>[[../../library/stdtypes#bytes|<code>bytes</code>]], [[../../library/stdtypes#bytearray|<code>bytearray</code>]] and other
+
<p>[[../../library/stdtypes#bytes|bytes]], [[../../library/stdtypes#bytearray|bytearray]] and other [[../../glossary#term-bytes-like-object|bytes-like objects]] are decoded according to the given ''encoding'' and using the error handling defined by ''errors''. Both can be <code>NULL</code> to have the interface use the default values (see [[#builtincodecs|Built-in Codecs]] for details).</p>
[[../../glossary#term-bytes-like-object|<span class="xref std std-term">bytes-like objects</span>]]
+
<p>All other objects, including Unicode objects, cause a [[../../library/exceptions#TypeError|TypeError]] to be set.</p>
are decoded according to the given ''encoding'' and using the error handling
+
<p>The API returns <code>NULL</code> if there was an error. The caller is responsible for decref'ing the returned objects.</p></dd></dl>
defined by ''errors''. Both can be <code>NULL</code> to have the interface use the default
 
values (see [[#builtincodecs|<span class="std std-ref">Built-in Codecs</span>]] for details).</p>
 
<p>All other objects, including Unicode objects, cause a [[../../library/exceptions#TypeError|<code>TypeError</code>]] to be
 
set.</p>
 
<p>The API returns <code>NULL</code> if there was an error. The caller is responsible for
 
decref'ing the returned objects.</p></dd></dl>
 
  
 
<dl>
 
<dl>
<dt>Py_ssize_t <code>PyUnicode_GetLength</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
+
<dt><span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_GetLength</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
<dd><p>Return the length of the Unicode object, in code points.</p>
+
</dt>
 +
<dd><p>Return the length of the Unicode object, in code points.</p>
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>Py_ssize_t <code>PyUnicode_CopyCharacters</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''to'', Py_ssize_t ''to_start'', [[../structures#c|PyObject]] *''from'', Py_ssize_t ''from_start'', Py_ssize_t ''how_many''<span class="sig-paren">)</span></dt>
+
<dt><span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_CopyCharacters</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">to</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">to_start</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">from</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">from_start</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">how_many</span></span><span class="sig-paren">)</span><br />
<dd><p>Copy characters from one Unicode object into another. This function performs
+
</dt>
character conversion when necessary and falls back to <code>memcpy()</code> if
+
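<dd><p>Copy characters from one Unicode object into another. This function performs character conversion when necessary and falls back to <code>memcpy()</code> if possible. Returns <code>-1</code> and sets an exception on error, otherwise returns the number of copied characters.</p>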
possible. Returns <code>-1</code> and sets an exception on error, otherwise returns
 
the number of copied characters.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>Py_ssize_t <code>PyUnicode_Fill</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode'', Py_ssize_t ''start'', Py_ssize_t ''length'', [[#c.Py_UCS4|Py_UCS4]] ''fill_char''<span class="sig-paren">)</span></dt>
+
<dt><span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Fill</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">start</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">length</span></span>, [[#c.Py_UCS4|<span class="n"><span class="pre">Py_UCS4</span></span>]]<span class="w"> </span><span class="n"><span class="pre">fill_char</span></span><span class="sig-paren">)</span><br />
<dd><p>Fill a string with a character: write ''fill_char'' into
+
</dt>
<code>unicode[start:start+length]</code>.</p>
+
<dd><p>Fill a string with a character: write ''fill_char'' into <code>unicode[start:start+length]</code>.</p>
<p>Fail if ''fill_char'' is bigger than the string maximum character, or if the
+
<p>Fail if ''fill_char'' is bigger than the string maximum character, or if the string has more than 1 reference.</p>
string has more than 1 reference.</p>
+
<p>Return the number of written characters, or return <code>-1</code> and raise an exception on error.</p>
<p>Return the number of written characters, or return <code>-1</code> and raise an
 
exception on error.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>int <code>PyUnicode_WriteChar</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode'', Py_ssize_t ''index'', [[#c.Py_UCS4|Py_UCS4]] ''character''<span class="sig-paren">)</span></dt>
+
<dt><span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_WriteChar</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">index</span></span>, [[#c.Py_UCS4|<span class="n"><span class="pre">Py_UCS4</span></span>]]<span class="w"> </span><span class="n"><span class="pre">character</span></span><span class="sig-paren">)</span><br />
<dd><p>Write a character to a string. The string must have been created through
+
</dt>
[[#c.PyUnicode_New|<code>PyUnicode_New()</code>]]. Since Unicode strings are supposed to be immutable,
+
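<dd><p>Write a character to a string. The string must have been created through [[#c.PyUnicode_New|PyUnicode_New()]]. Since Unicode strings are supposed to be immutable, the string must not be shared, or have been hashed yet.</p>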
the string must not be shared, or have been hashed yet.</p>
+
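<p>This function checks that ''unicode'' is a Unicode object, that the index is not out of bounds, and that the object can be modified safely (i.e. that its reference count is one).</p>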
<p>This function checks that ''unicode'' is a Unicode object, that the index is
 
not out of bounds, and that the object can be modified safely (i.e. that
 
its reference count is one).</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[#c.Py_UCS4|Py_UCS4]] <code>PyUnicode_ReadChar</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode'', Py_ssize_t ''index''<span class="sig-paren">)</span></dt>
+
<dt>[[#c.Py_UCS4|<span class="n"><span class="pre">Py_UCS4</span></span>]]<span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_ReadChar</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">index</span></span><span class="sig-paren">)</span><br />
<dd><p>Read a character from a string. This function checks that ''unicode'' is a
+
</dt>
Unicode object and the index is not out of bounds, in contrast to the macro
+
<dd><p>Read a character from a string. This function checks that ''unicode'' is a Unicode object and the index is not out of bounds, in contrast to the macro version [[#c.PyUnicode_READ_CHAR|PyUnicode_READ_CHAR()]].</p>
version [[#c.PyUnicode_READ_CHAR|<code>PyUnicode_READ_CHAR()</code>]].</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_Substring</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''str'', Py_ssize_t ''start'', Py_ssize_t ''end''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Substring</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">str</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">start</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">end</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Return a substring of ''str'', from character index ''start'' (included) to
+
<dd><p>Return a substring of ''str'', from character index ''start'' (included) to character index ''end'' (excluded). Negative indices are not supported.</p>
character index ''end'' (excluded). Negative indices are not supported.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[#c.Py_UCS4|Py_UCS4]] *<code>PyUnicode_AsUCS4</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''u'', [[#c.Py_UCS4|Py_UCS4]] *''buffer'', Py_ssize_t ''buflen'', int ''copy_null''<span class="sig-paren">)</span></dt>
+
<dt>[[#c.Py_UCS4|<span class="n"><span class="pre">Py_UCS4</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsUCS4</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">u</span></span>, [[#c.Py_UCS4|<span class="n"><span class="pre">Py_UCS4</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">buffer</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">buflen</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">copy_null</span></span><span class="sig-paren">)</span><br />
<dd><p>Copy the string ''u'' into a UCS4 buffer, including a null character, if
+
</dt>
''copy_null'' is set. Returns <code>NULL</code> and sets an exception on error (in
+
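<dd><p>Copy the string ''u'' into a UCS4 buffer, including a null character, if ''copy_null'' is set. Returns <code>NULL</code> and sets an exception on error (in particular, a [[../../library/exceptions#SystemError|SystemError]] if ''buflen'' is smaller than the length of ''u''). ''buffer'' is returned on success.</p>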
particular, a [[../../library/exceptions#SystemError|<code>SystemError</code>]] if ''buflen'' is smaller than the length of
 
''u''). ''buffer'' is returned on success.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[#c.Py_UCS4|Py_UCS4]] *<code>PyUnicode_AsUCS4Copy</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''u''<span class="sig-paren">)</span></dt>
+
<dt>[[#c.Py_UCS4|<span class="n"><span class="pre">Py_UCS4</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsUCS4Copy</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">u</span></span><span class="sig-paren">)</span><br />
<dd><p>Copy the string ''u'' into a new UCS4 buffer that is allocated using
+
</dt>
[[../memory#c|<code>PyMem_Malloc()</code>]]. If this fails, <code>NULL</code> is returned with a
+
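<dd><p>Copy the string ''u'' into a new UCS4 buffer that is allocated using [[../memory#c|PyMem_Malloc()]]. If this fails, <code>NULL</code> is returned with a [[../../library/exceptions#MemoryError|MemoryError]] set. The returned buffer always has an extra null code point appended.</p>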
[[../../library/exceptions#MemoryError|<code>MemoryError</code>]] set. The returned buffer always has an extra
 
null code point appended.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
Line 710: Line 602:
 
<div id="deprecated-py-unicode-apis" class="section">
 
<div id="deprecated-py-unicode-apis" class="section">
  
=== Deprecated Py_UNICODE APIs ===
+
=== Deprecated Py_UNICODE APIs ===
  
<div class="deprecated-removed">
+
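These API functions are deprecated with the implementation of <span id="index-2" class="target"></span>[https://www.python.org/dev/peps/pep-0393 PEP 393]. Extension modules can continue using them, as they will not be removed in Python 3.x, but need to be aware that their use can now cause performance and memory hits.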
 
 
<span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0.</span>
 
 
 
 
 
</div>
 
These API functions are deprecated with the implementation of <span id="index-2" class="target"></span>[https://www.python.org/dev/peps/pep-0393 '''PEP 393'''].
 
Extension modules can continue using them, as they will not be removed in Python
 
3.x, but need to be aware that their use can now cause performance and memory hits.
 
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_FromUnicode</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''u'', Py_ssize_t ''size''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_FromUnicode</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">u</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Create a Unicode object from the Py_UNICODE buffer ''u'' of the given size. ''u''
+
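<dd><p>Create a Unicode object from the Py_UNICODE buffer ''u'' of the given size. ''u'' may be <code>NULL</code>, which causes the contents to be undefined. It is the user's responsibility to fill in the needed data. The buffer is copied into the new object.</p>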
may be <code>NULL</code> which causes the contents to be undefined. It is the user's
+
<p>If the buffer is not <code>NULL</code>, the return value might be a shared object. Therefore, modification of the resulting Unicode object is only allowed when ''u'' is <code>NULL</code>.</p>
responsibility to fill in the needed data. The buffer is copied into the new
+
<p>If the buffer is <code>NULL</code>, [[#c.PyUnicode_READY|PyUnicode_READY()]] must be called once the string content has been filled before using any of the access macros such as [[#c.PyUnicode_KIND|PyUnicode_KIND()]].</p></dd></dl>
object.</p>
 
<p>If the buffer is not <code>NULL</code>, the return value might be a shared object.
 
Therefore, modification of the resulting Unicode object is only allowed when
 
''u'' is <code>NULL</code>.</p>
 
<p>If the buffer is <code>NULL</code>, [[#c.PyUnicode_READY|<code>PyUnicode_READY()</code>]] must be called once the
 
string content has been filled before using any of the access macros such as
 
[[#c.PyUnicode_KIND|<code>PyUnicode_KIND()</code>]].</p>
 
<div class="deprecated-removed">
 
 
 
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 3.12: </span>Part of the old-style Unicode API, please migrate to using
 
[[#c.PyUnicode_FromKindAndData|<code>PyUnicode_FromKindAndData()</code>]], [[#c.PyUnicode_FromWideChar|<code>PyUnicode_FromWideChar()</code>]], or
 
[[#c.PyUnicode_New|<code>PyUnicode_New()</code>]].</p>
 
  
</div></dd></dl>
+
; [[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsUnicode</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
  
<dl>
+
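: Return a read-only pointer to the Unicode object's internal [[#c.Py_UNICODE|Py_UNICODE]] buffer, or <code>NULL</code> on error. This will create the <span class="xref c c-texpr">Py_UNICODE*</span> representation of the object if it is not yet available. The buffer is always terminated with an extra null code point. Note that the resulting [[#c.Py_UNICODE|Py_UNICODE]] string may also contain embedded null code points, which would cause the string to be truncated when used in most C functions.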
<dt>[[#c.Py_UNICODE|Py_UNICODE]] *<code>PyUnicode_AsUnicode</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
 
<dd><p>Return a read-only pointer to the Unicode object's internal
 
[[#c.Py_UNICODE|<code>Py_UNICODE</code>]] buffer, or <code>NULL</code> on error. This will create the
 
<span class="xref c c-texpr">[[#c.Py_UNICODE|Py_UNICODE]]*</span> representation of the object if it is not yet
 
available. The buffer is always terminated with an extra null code point.
 
Note that the resulting [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] string may also contain
 
embedded null code points, which would cause the string to be truncated when
 
used in most C functions.</p>
 
<div class="deprecated-removed">
 
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 3.12: </span>Part of the old-style Unicode API, please migrate to using
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_TransformDecimalToASCII</span></span></span><span class="sig-paren">(</span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span><span class="sig-paren">)</span><br />
[[#c.PyUnicode_AsUCS4|<code>PyUnicode_AsUCS4()</code>]], [[#c.PyUnicode_AsWideChar|<code>PyUnicode_AsWideChar()</code>]],
 
[[#c.PyUnicode_ReadChar|<code>PyUnicode_ReadChar()</code>]] or similar new APIs.</p>
 
  
</div>
+
: Create a Unicode object by replacing all decimal digits in the [[#c.Py_UNICODE|Py_UNICODE]] buffer of the given ''size'' by ASCII digits 0-9 according to their decimal value. Return <code>NULL</code> if an exception occurs.
<div class="deprecated-removed">
 
 
 
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 3.10.</span></p>
 
 
 
</div></dd></dl>
 
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_TransformDecimalToASCII</code><span class="sig-paren">(</span>[[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size''<span class="sig-paren">)</span></dt>
+
<dt>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsUnicodeAndSize</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">size</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Create a Unicode object by replacing all decimal digits in
+
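<dd><p>Like [[#c.PyUnicode_AsUnicode|PyUnicode_AsUnicode()]], but also saves the [[#c.Py_UNICODE|Py_UNICODE]] array length (excluding the extra null terminator) in ''size''. Note that the resulting <span class="xref c c-texpr">Py_UNICODE*</span> string may contain embedded null code points, which would cause the string to be truncated when used in most C functions.</p>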
[[#c.Py_UNICODE|<code>Py_UNICODE</code>]] buffer of the given ''size'' by ASCII digits 0--9
 
according to their decimal value. Return <code>NULL</code> if an exception occurs.</p></dd></dl>
 
 
 
<dl>
 
<dt>[[#c.Py_UNICODE|Py_UNICODE]] *<code>PyUnicode_AsUnicodeAndSize</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode'', Py_ssize_t *''size''<span class="sig-paren">)</span></dt>
 
<dd><p>Like [[#c.PyUnicode_AsUnicode|<code>PyUnicode_AsUnicode()</code>]], but also saves the [[#c.Py_UNICODE|<code>Py_UNICODE</code>]]
 
array length (excluding the extra null terminator) in ''size''.
 
Note that the resulting <span class="xref c c-texpr">[[#c.Py_UNICODE|Py_UNICODE]]*</span> string
 
may contain embedded null code points, which would cause the string to be
 
truncated when used in most C functions.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
 
 
</div>
 
<div class="deprecated-removed">
 
 
 
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 3.12: </span>Part of the old-style Unicode API, please migrate to using
 
[[#c.PyUnicode_AsUCS4|<code>PyUnicode_AsUCS4()</code>]], [[#c.PyUnicode_AsWideChar|<code>PyUnicode_AsWideChar()</code>]],
 
[[#c.PyUnicode_ReadChar|<code>PyUnicode_ReadChar()</code>]] or similar new APIs.</p>
 
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[#c.Py_UNICODE|Py_UNICODE]] *<code>PyUnicode_AsUnicodeCopy</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
+
<dt>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsUnicodeCopy</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
<dd><p>Create a copy of a Unicode string ending with a null code point. Return <code>NULL</code>
+
</dt>
and raise a [[../../library/exceptions#MemoryError|<code>MemoryError</code>]] exception on memory allocation failure,
+
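<dd><p>Create a copy of a Unicode string ending with a null code point. Return <code>NULL</code> and raise a [[../../library/exceptions#MemoryError|MemoryError]] exception on memory allocation failure, otherwise return a newly allocated buffer (use [[../memory#c|PyMem_Free()]] to free the buffer). Note that the resulting <span class="xref c c-texpr">Py_UNICODE*</span> string may contain embedded null code points, which would cause the string to be truncated when used in most C functions.</p>
otherwise return a newly allocated buffer (use [[../memory#c|<code>PyMem_Free()</code>]] to free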
 
the buffer). Note that the resulting <span class="xref c c-texpr">[[#c.Py_UNICODE|Py_UNICODE]]*</span> string may
 
contain embedded null code points, which would cause the string to be
 
truncated when used in most C functions.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.2.</span></p>
+
<p><span class="versionmodified added">New in version 3.2.</span></p>
  
 
</div>
 
</div>
<p>Please migrate to using [[#c.PyUnicode_AsUCS4Copy|<code>PyUnicode_AsUCS4Copy()</code>]] or similar new APIs.</p></dd></dl>
+
<p>Please migrate to using [[#c.PyUnicode_AsUCS4Copy|PyUnicode_AsUCS4Copy()]] or similar new APIs.</p></dd></dl>
 
 
<dl>
 
<dt>Py_ssize_t <code>PyUnicode_GetSize</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
 
<dd><p>Return the size of the deprecated [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] representation, in
 
code units (this includes surrogate pairs as 2 units).</p>
 
<div class="deprecated-removed">
 
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 3.12: </span>Part of the old-style Unicode API, please migrate to using
+
; <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_GetSize</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
[[#c.PyUnicode_GET_LENGTH|<code>PyUnicode_GET_LENGTH()</code>]].</p>
 
  
</div></dd></dl>
+
: Return the size of the deprecated [[#c.Py_UNICODE|Py_UNICODE]] representation, in code units (this includes surrogate pairs as 2 units).
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_FromObject</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''obj''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_FromObject</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">obj</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Copy an instance of a Unicode subtype to a new true Unicode object if
+
<dd><p>Copy an instance of a Unicode subtype to a new true Unicode object if necessary. If ''obj'' is already a true Unicode object (not a subtype), return the reference with incremented refcount.</p>
necessary. If ''obj'' is already a true Unicode object (not a subtype),
+
<p>Objects other than Unicode or its subtypes will cause a [[../../library/exceptions#TypeError|TypeError]].</p></dd></dl>
return the reference with incremented refcount.</p>
 
<p>Objects other than Unicode or its subtypes will cause a [[../../library/exceptions#TypeError|<code>TypeError</code>]].</p></dd></dl>
 
  
  
Line 830: Line 656:
 
<div id="locale-encoding" class="section">
 
<div id="locale-encoding" class="section">
  
=== Locale Encoding ===
+
=== Locale Encoding ===
  
The current locale encoding can be used to decode text from the operating
+
The current locale encoding can be used to decode text from the operating system.
system.
 
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeLocaleAndSize</code><span class="sig-paren">(</span>''const'' char *''str'', Py_ssize_t ''len'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeLocaleAndSize</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">str</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">len</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Decode a string from UTF-8 on Android and VxWorks, or from the current
+
<dd><p>Decode a string from UTF-8 on Android and VxWorks, or from the current locale encoding on other platforms. The supported error handlers are <code>&quot;strict&quot;</code> and <code>&quot;surrogateescape&quot;</code> (<span id="index-3" class="target"></span>[https://www.python.org/dev/peps/pep-0383 PEP 383]). The decoder uses the <code>&quot;strict&quot;</code> error handler if ''errors'' is <code>NULL</code>. ''str'' must end with a null character but cannot contain embedded null characters.</p>
locale encoding on other platforms. The supported
+
<p>Use [[#c.PyUnicode_DecodeFSDefaultAndSize|PyUnicode_DecodeFSDefaultAndSize()]] to decode a string from <code>Py_FileSystemDefaultEncoding</code> (the locale encoding read at Python startup).</p>
error handlers are <code>&quot;strict&quot;</code> and <code>&quot;surrogateescape&quot;</code>
+
<p>This function ignores the Python UTF-8 mode.</p>
(<span id="index-3" class="target"></span>[https://www.python.org/dev/peps/pep-0383 '''PEP 383''']). The decoder uses <code>&quot;strict&quot;</code> error handler if
 
''errors'' is <code>NULL</code>. ''str'' must end with a null character but
 
cannot contain embedded null characters.</p>
 
<p>Use [[#c.PyUnicode_DecodeFSDefaultAndSize|<code>PyUnicode_DecodeFSDefaultAndSize()</code>]] to decode a string from
 
<code>Py_FileSystemDefaultEncoding</code> (the locale encoding read at
 
Python startup).</p>
 
<p>This function ignores the Python UTF-8 mode.</p>
 
 
<div class="admonition seealso">
 
<div class="admonition seealso">
  
<p>See also</p>
+
<p>See also</p>
<p>The [[../sys#c|<code>Py_DecodeLocale()</code>]] function.</p>
+
<p>The [[../sys#c|Py_DecodeLocale()]] function.</p>
  
 
</div>
 
</div>
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div>
 
</div>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.7: </span>The function now also uses the current locale encoding for the
+
<p><span class="versionmodified changed">Changed in version 3.7: </span>The function now also uses the current locale encoding for the <code>surrogateescape</code> error handler, except on Android. Previously, [[../sys#c|Py_DecodeLocale()]] was used for <code>surrogateescape</code>, and the current locale encoding was used for <code>strict</code>.</p>
<code>surrogateescape</code> error handler, except on Android. Previously, [[../sys#c|<code>Py_DecodeLocale()</code>]]
 
was used for the <code>surrogateescape</code>, and the current locale encoding was
 
used for <code>strict</code>.</p>
 
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeLocale</code><span class="sig-paren">(</span>''const'' char *''str'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeLocale</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">str</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Similar to [[#c.PyUnicode_DecodeLocaleAndSize|<code>PyUnicode_DecodeLocaleAndSize()</code>]], but compute the string
+
<dd><p>Similar to [[#c.PyUnicode_DecodeLocaleAndSize|PyUnicode_DecodeLocaleAndSize()]], but compute the string length using <code>strlen()</code>.</p>
length using <code>strlen()</code>.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeLocale</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeLocale</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Encode a Unicode object to UTF-8 on Android and VxWorks, or to the current
+
<dd><p>Encode a Unicode object to UTF-8 on Android and VxWorks, or to the current locale encoding on other platforms. The supported error handlers are <code>&quot;strict&quot;</code> and <code>&quot;surrogateescape&quot;</code> (<span id="index-4" class="target"></span>[https://www.python.org/dev/peps/pep-0383 PEP 383]). The encoder uses the <code>&quot;strict&quot;</code> error handler if ''errors'' is <code>NULL</code>. Return a [[../../library/stdtypes#bytes|bytes]] object. ''unicode'' cannot contain embedded null characters.</p>
locale encoding on other platforms. The
+
<p>Use [[#c.PyUnicode_EncodeFSDefault|PyUnicode_EncodeFSDefault()]] to encode a string to <code>Py_FileSystemDefaultEncoding</code> (the locale encoding read at Python startup).</p>
supported error handlers are <code>&quot;strict&quot;</code> and <code>&quot;surrogateescape&quot;</code>
+
<p>This function ignores the Python UTF-8 mode.</p>
(<span id="index-4" class="target"></span>[https://www.python.org/dev/peps/pep-0383 '''PEP 383''']). The encoder uses <code>&quot;strict&quot;</code> error handler if
 
''errors'' is <code>NULL</code>. Return a [[../../library/stdtypes#bytes|<code>bytes</code>]] object. ''unicode'' cannot
 
contain embedded null characters.</p>
 
<p>Use [[#c.PyUnicode_EncodeFSDefault|<code>PyUnicode_EncodeFSDefault()</code>]] to encode a string to
 
<code>Py_FileSystemDefaultEncoding</code> (the locale encoding read at
 
Python startup).</p>
 
<p>This function ignores the Python UTF-8 mode.</p>
 
 
<div class="admonition seealso">
 
<div class="admonition seealso">
  
<p>See also</p>
+
<p>See also</p>
<p>The [[../sys#c|<code>Py_EncodeLocale()</code>]] function.</p>
+
<p>The [[../sys#c|Py_EncodeLocale()]] function.</p>
  
 
</div>
 
</div>
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div>
 
</div>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.7: </span>The function now also uses the current locale encoding for the
+
<p><span class="versionmodified changed">Changed in version 3.7: </span>The function now also uses the current locale encoding for the <code>surrogateescape</code> error handler, except on Android. Previously, [[../sys#c|Py_EncodeLocale()]] was used for <code>surrogateescape</code>, and the current locale encoding was used for <code>strict</code>.</p>
<code>surrogateescape</code> error handler, except on Android. Previously,
 
[[../sys#c|<code>Py_EncodeLocale()</code>]]
 
was used for the <code>surrogateescape</code>, and the current locale encoding was
 
used for <code>strict</code>.</p>
 
  
 
</div></dd></dl>
 
</div></dd></dl>
Line 917: Line 720:
 
<div id="file-system-encoding" class="section">
 
<div id="file-system-encoding" class="section">
  
=== File System Encoding ===
+
=== File System Encoding ===
  
To encode and decode file names and other environment strings,
+
To encode and decode file names and other environment strings, <code>Py_FileSystemDefaultEncoding</code> should be used as the encoding, and <code>Py_FileSystemDefaultEncodeErrors</code> should be used as the error handler (<span id="index-5" class="target"></span>[https://www.python.org/dev/peps/pep-0383 PEP 383] and <span id="index-6" class="target"></span>[https://www.python.org/dev/peps/pep-0529 PEP 529]). To encode file names to [[../../library/stdtypes#bytes|bytes]] during argument parsing, the <code>&quot;O&amp;&quot;</code> converter should be used, passing [[#c.PyUnicode_FSConverter|PyUnicode_FSConverter()]] as the conversion function:
<code>Py_FileSystemDefaultEncoding</code> should be used as the encoding, and
 
<code>Py_FileSystemDefaultEncodeErrors</code> should be used as the error handler
 
(<span id="index-5" class="target"></span>[https://www.python.org/dev/peps/pep-0383 '''PEP 383'''] and <span id="index-6" class="target"></span>[https://www.python.org/dev/peps/pep-0529 '''PEP 529''']). To encode file names to [[../../library/stdtypes#bytes|<code>bytes</code>]] during
 
argument parsing, the <code>&quot;O&amp;&quot;</code> converter should be used, passing
 
[[#c.PyUnicode_FSConverter|<code>PyUnicode_FSConverter()</code>]] as the conversion function:
 
  
 
<dl>
 
<dl>
<dt>int <code>PyUnicode_FSConverter</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''obj'', void *''result''<span class="sig-paren">)</span></dt>
+
<dt><span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_FSConverter</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">obj</span></span>, <span class="kt"><span class="pre">void</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">result</span></span><span class="sig-paren">)</span><br />
<dd><p>ParseTuple converter: encode [[../../library/stdtypes#str|<code>str</code>]] objects -- obtained directly or
+
</dt>
through the [[../../library/os#os|<code>os.PathLike</code>]] interface -- to [[../../library/stdtypes#bytes|<code>bytes</code>]] using
+
<dd><p>ParseTuple converter: encode [[../../library/stdtypes#str|str]] objects, obtained directly or through the [[../../library/os#os|os.PathLike]] interface, to [[../../library/stdtypes#bytes|bytes]] using [[#c.PyUnicode_EncodeFSDefault|PyUnicode_EncodeFSDefault()]]; [[../../library/stdtypes#bytes|bytes]] objects are output as-is. ''result'' must be the address of a <span class="xref c c-texpr">PyBytesObject*</span>; the reference stored there must be released when it is no longer used.</p>
[[#c.PyUnicode_EncodeFSDefault|<code>PyUnicode_EncodeFSDefault()</code>]]; [[../../library/stdtypes#bytes|<code>bytes</code>]] objects are output as-is.
 
''result'' must be a <span class="xref c c-texpr">[[../bytes#c|PyBytesObject]]*</span> which must be released when it is
 
no longer used.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.1.</span></p>
+
<p><span class="versionmodified added">New in version 3.1.</span></p>
  
 
</div>
 
</div>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.6: </span>Accepts a [[../../glossary#term-path-like-object|<span class="xref std std-term">path-like object</span>]].</p>
+
<p><span class="versionmodified changed">Changed in version 3.6: </span>Accepts a [[../../glossary#term-path-like-object|path-like object]].</p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
To decode file names to [[../../library/stdtypes#str|<code>str</code>]] during argument parsing, the <code>&quot;O&amp;&quot;</code>
+
To decode file names to [[../../library/stdtypes#str|str]] during argument parsing, the <code>&quot;O&amp;&quot;</code> converter should be used, passing [[#c.PyUnicode_FSDecoder|PyUnicode_FSDecoder()]] as the conversion function:
converter should be used, passing [[#c.PyUnicode_FSDecoder|<code>PyUnicode_FSDecoder()</code>]] as the
 
conversion function:
 
  
 
<dl>
 
<dl>
<dt>int <code>PyUnicode_FSDecoder</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''obj'', void *''result''<span class="sig-paren">)</span></dt>
+
<dt><span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_FSDecoder</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">obj</span></span>, <span class="kt"><span class="pre">void</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">result</span></span><span class="sig-paren">)</span><br />
<dd><p>ParseTuple converter: decode [[../../library/stdtypes#bytes|<code>bytes</code>]] objects -- obtained either
+
</dt>
directly or indirectly through the [[../../library/os#os|<code>os.PathLike</code>]] interface -- to
+
<dd><p>ParseTuple converter: decode [[../../library/stdtypes#bytes|bytes]] objects, obtained either directly or indirectly through the [[../../library/os#os|os.PathLike]] interface, to [[../../library/stdtypes#str|str]] using [[#c.PyUnicode_DecodeFSDefaultAndSize|PyUnicode_DecodeFSDefaultAndSize()]]; [[../../library/stdtypes#str|str]] objects are output as-is. ''result'' must be the address of a <span class="xref c c-texpr">PyUnicodeObject*</span>; the reference stored there must be released when it is no longer used.</p>
[[../../library/stdtypes#str|<code>str</code>]] using [[#c.PyUnicode_DecodeFSDefaultAndSize|<code>PyUnicode_DecodeFSDefaultAndSize()</code>]]; [[../../library/stdtypes#str|<code>str</code>]]
 
objects are output as-is. ''result'' must be a <span class="xref c c-texpr">[[#c.PyUnicodeObject|PyUnicodeObject]]*</span> which
 
must be released when it is no longer used.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.2.</span></p>
+
<p><span class="versionmodified added">New in version 3.2.</span></p>
  
 
</div>
 
</div>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.6: </span>Accepts a [[../../glossary#term-path-like-object|<span class="xref std std-term">path-like object</span>]].</p>
+
<p><span class="versionmodified changed">Changed in version 3.6: </span>Accepts a [[../../glossary#term-path-like-object|path-like object]].</p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeFSDefaultAndSize</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeFSDefaultAndSize</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Decode a string using <code>Py_FileSystemDefaultEncoding</code> and the
+
<dd><p>Decode a string using <code>Py_FileSystemDefaultEncoding</code> and the <code>Py_FileSystemDefaultEncodeErrors</code> error handler.</p>
<code>Py_FileSystemDefaultEncodeErrors</code> error handler.</p>
+
<p>If <code>Py_FileSystemDefaultEncoding</code> is not set, fall back to the locale encoding.</p>
<p>If <code>Py_FileSystemDefaultEncoding</code> is not set, fall back to the
+
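<p><code>Py_FileSystemDefaultEncoding</code> is initialized at startup from the locale encoding and cannot be modified later. If you need to decode a string from the current locale encoding, use [[#c.PyUnicode_DecodeLocaleAndSize|PyUnicode_DecodeLocaleAndSize()]].</p>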
locale encoding.</p>
 
<p><code>Py_FileSystemDefaultEncoding</code> is initialized at startup from the
 
locale encoding and cannot be modified later. If you need to decode a string
 
from the current locale encoding, use
 
[[#c.PyUnicode_DecodeLocaleAndSize|<code>PyUnicode_DecodeLocaleAndSize()</code>]].</p>
 
 
<div class="admonition seealso">
 
<div class="admonition seealso">
  
<p>See also</p>
+
<p>See also</p>
<p>The [[../sys#c|<code>Py_DecodeLocale()</code>]] function.</p>
+
<p>The [[../sys#c|Py_DecodeLocale()]] function.</p>
  
 
</div>
 
</div>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.6: </span>Use the <code>Py_FileSystemDefaultEncodeErrors</code> error handler.</p>
+
<p><span class="versionmodified changed">Changed in version 3.6: </span>Use the <code>Py_FileSystemDefaultEncodeErrors</code> error handler.</p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeFSDefault</code><span class="sig-paren">(</span>''const'' char *''s''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeFSDefault</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Decode a null-terminated string using <code>Py_FileSystemDefaultEncoding</code>
+
<dd><p>Decode a null-terminated string using <code>Py_FileSystemDefaultEncoding</code> and the <code>Py_FileSystemDefaultEncodeErrors</code> error handler.</p>
and the <code>Py_FileSystemDefaultEncodeErrors</code> error handler.</p>
+
<p>If <code>Py_FileSystemDefaultEncoding</code> is not set, fall back to the locale encoding.</p>
<p>If <code>Py_FileSystemDefaultEncoding</code> is not set, fall back to the
+
<p>Use [[#c.PyUnicode_DecodeFSDefaultAndSize|PyUnicode_DecodeFSDefaultAndSize()]] if you know the string length.</p>
locale encoding.</p>
 
<p>Use [[#c.PyUnicode_DecodeFSDefaultAndSize|<code>PyUnicode_DecodeFSDefaultAndSize()</code>]] if you know the string length.</p>
 
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.6: </span>Use the <code>Py_FileSystemDefaultEncodeErrors</code> error handler.</p>
+
<p><span class="versionmodified changed">Changed in version 3.6: </span>Use the <code>Py_FileSystemDefaultEncodeErrors</code> error handler.</p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeFSDefault</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeFSDefault</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Encode a Unicode object to <code>Py_FileSystemDefaultEncoding</code> with the
+
<dd><p>Encode a Unicode object to <code>Py_FileSystemDefaultEncoding</code> with the <code>Py_FileSystemDefaultEncodeErrors</code> error handler, and return [[../../library/stdtypes#bytes|bytes]]. Note that the resulting [[../../library/stdtypes#bytes|bytes]] object may contain null bytes.</p>
<code>Py_FileSystemDefaultEncodeErrors</code> error handler, and return
+
<p>If <code>Py_FileSystemDefaultEncoding</code> is not set, fall back to the locale encoding.</p>
[[../../library/stdtypes#bytes|<code>bytes</code>]]. Note that the resulting [[../../library/stdtypes#bytes|<code>bytes</code>]] object may contain
+
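<p><code>Py_FileSystemDefaultEncoding</code> is initialized at startup from the locale encoding and cannot be modified later. If you need to encode a string to the current locale encoding, use [[#c.PyUnicode_EncodeLocale|PyUnicode_EncodeLocale()]].</p>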
null bytes.</p>
 
<p>If <code>Py_FileSystemDefaultEncoding</code> is not set, fall back to the
 
locale encoding.</p>
 
<p><code>Py_FileSystemDefaultEncoding</code> is initialized at startup from the
 
locale encoding and cannot be modified later. If you need to encode a string
 
to the current locale encoding, use [[#c.PyUnicode_EncodeLocale|<code>PyUnicode_EncodeLocale()</code>]].</p>
 
 
<div class="admonition seealso">
 
<div class="admonition seealso">
  
<p>See also</p>
+
<p>See also</p>
<p>The [[../sys#c|<code>Py_EncodeLocale()</code>]] function.</p>
+
<p>The [[../sys#c|Py_EncodeLocale()]] function.</p>
  
 
</div>
 
</div>
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.2.</span></p>
+
<p><span class="versionmodified added">New in version 3.2.</span></p>
  
 
</div>
 
</div>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.6: </span>Use the <code>Py_FileSystemDefaultEncodeErrors</code> error handler.</p>
+
<p><span class="versionmodified changed">Changed in version 3.6: </span>Use the <code>Py_FileSystemDefaultEncodeErrors</code> error handler.</p>
  
 
</div></dd></dl>
 
</div></dd></dl>
Line 1,036: Line 813:
 
<div id="wchar-t-support" class="section">
 
<div id="wchar-t-support" class="section">
  
=== wchar_t Support ===
+
=== wchar_t Support ===
  
<code>wchar_t</code> support for platforms which support it:
+
<code>wchar_t</code> support for platforms which support it:
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_FromWideChar</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="n"><span class="pre">wchar_t</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">w</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_FromWideChar</code><span class="sig-paren">(</span>''const'' wchar_t *''w'', Py_ssize_t ''size''<span class="sig-paren">)</span></dt>
+
 
<dd><p>''Return value: New reference.''</p>
+
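: Create a Unicode object from the <code>wchar_t</code> buffer ''w'' of the given ''size''. Passing <code>-1</code> as the ''size'' indicates that the function must itself compute the length, using wcslen. Return <code>NULL</code> on failure.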
<p>Create a Unicode object from the <code>wchar_t</code> buffer ''w'' of the given ''size''.
+
 
Passing <code>-1</code> as the ''size'' indicates that the function must itself compute the length,
+
; <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsWideChar</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span>, <span class="n"><span class="pre">wchar_t</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">w</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span><span class="sig-paren">)</span><br />
using wcslen.
 
Return <code>NULL</code> on failure.</p></dd></dl>
 
  
; Py_ssize_t <code>PyUnicode_AsWideChar</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode'', wchar_t *''w'', Py_ssize_t ''size''<span class="sig-paren">)</span>
+
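: Copy the Unicode object contents into the <code>wchar_t</code> buffer ''w''. At most ''size'' <code>wchar_t</code> characters are copied (excluding a possibly trailing null termination character). Return the number of <code>wchar_t</code> characters copied, or <code>-1</code> in case of an error. Note that the resulting <span class="xref c c-texpr">wchar_t*</span> string may or may not be null-terminated. It is the responsibility of the caller to make sure that the <span class="xref c c-texpr">wchar_t*</span> string is null-terminated in case this is required by the application. Also, note that the <span class="xref c c-texpr">wchar_t*</span> string might contain null characters, which would cause the string to be truncated when used with most C functions.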
: Copy the Unicode object contents into the <code>wchar_t</code> buffer ''w''. At most ''size'' <code>wchar_t</code> characters are copied (excluding a possibly trailing null termination character). Return the number of <code>wchar_t</code> characters copied or <code>-1</code> in case of an error. Note that the resulting <span class="xref c c-texpr">wchar_t*</span> string may or may not be null-terminated. It is the responsibility of the caller to make sure that the <span class="xref c c-texpr">wchar_t*</span> string is null-terminated in case this is required by the application. Also, note that the <span class="xref c c-texpr">wchar_t*</span> string might contain null characters, which would cause the string to be truncated when used with most C functions.
 
  
 
<dl>
 
<dl>
<dt>wchar_t *<code>PyUnicode_AsWideCharString</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode'', Py_ssize_t *''size''<span class="sig-paren">)</span></dt>
+
<dt><span class="n"><span class="pre">wchar_t</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsWideCharString</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">size</span></span><span class="sig-paren">)</span><br />
<dd><p>Convert the Unicode object to a wide character string. The output string
+
</dt>
always ends with a null character. If ''size'' is not <code>NULL</code>, write the number
+
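<dd><p>Convert the Unicode object to a wide character string. The output string always ends with a null character. If ''size'' is not <code>NULL</code>, write the number of wide characters (excluding the trailing null termination character) into ''*size''. Note that the resulting <code>wchar_t</code> string might contain null characters, which would cause the string to be truncated when used with most C functions. If ''size'' is <code>NULL</code> and the <span class="xref c c-texpr">wchar_t*</span> string contains null characters, a [[../../library/exceptions#ValueError|ValueError]] is raised.</p>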
of wide characters (excluding the trailing null termination character) into
+
<p>Returns a buffer allocated by <code>PyMem_New</code> (use [[../memory#c|PyMem_Free()]] to free it) on success. On error, returns <code>NULL</code> and ''*size'' is undefined. Raises a [[../../library/exceptions#MemoryError|MemoryError]] if memory allocation fails.</p>
''*size''. Note that the resulting <code>wchar_t</code> string might contain
 
null characters, which would cause the string to be truncated when used with
 
most C functions. If ''size'' is <code>NULL</code> and the <span class="xref c c-texpr">wchar_t*</span> string
 
contains null characters a [[../../library/exceptions#ValueError|<code>ValueError</code>]] is raised.</p>
 
<p>Returns a buffer allocated by <code>PyMem_New</code> (use
 
[[../memory#c|<code>PyMem_Free()</code>]] to free it) on success. On error, returns <code>NULL</code>
 
and ''*size'' is undefined. Raises a [[../../library/exceptions#MemoryError|<code>MemoryError</code>]] if memory allocation
 
fails.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.2.</span></p>
+
<p><span class="versionmodified added">New in version 3.2.</span></p>
  
 
</div>
 
</div>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.7: </span>Raises a [[../../library/exceptions#ValueError|<code>ValueError</code>]] if ''size'' is <code>NULL</code> and the <span class="xref c c-texpr">wchar_t*</span>
+
<p><span class="versionmodified changed">Changed in version 3.7: </span>Raises a [[../../library/exceptions#ValueError|ValueError]] if ''size'' is <code>NULL</code> and the <span class="xref c c-texpr">wchar_t*</span> string contains null characters.</p>
string contains null characters.</p>
 
  
 
</div></dd></dl>
 
</div></dd></dl>
Line 1,083: Line 848:
  
 
<span id="builtincodecs"></span>
 
<span id="builtincodecs"></span>
== Built-in Codecs ==
+
== Built-in Codecs ==
  
Python provides a set of built-in codecs which are written in C for speed. All of
+
Python provides a set of built-in codecs which are written in C for speed. All of these codecs are directly usable via the following functions.
these codecs are directly usable via the following functions.
 
  
Many of the following APIs take two arguments encoding and errors, and they
+
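Many of the following APIs take two arguments, encoding and errors, which have the same semantics as those of the built-in [[../../library/stdtypes#str|str()]] string object constructor.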
have the same semantics as the ones of the built-in [[../../library/stdtypes#str|<code>str()</code>]] string object
 
constructor.
 
  
Setting encoding to <code>NULL</code> causes the default encoding to be used
+
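Setting encoding to <code>NULL</code> causes the default encoding to be used, which is UTF-8. File system calls should use [[#c.PyUnicode_FSConverter|PyUnicode_FSConverter()]] for encoding file names. This uses the variable <code>Py_FileSystemDefaultEncoding</code> internally. This variable should be treated as read-only: on some systems, it will be a pointer to a static string, on others, it will change at run-time (such as when the application invokes setlocale).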
which is UTF-8. The file system calls should use
 
[[#c.PyUnicode_FSConverter|<code>PyUnicode_FSConverter()</code>]] for encoding file names. This uses the
 
variable <code>Py_FileSystemDefaultEncoding</code> internally. This
 
variable should be treated as read-only: on some systems, it will be a
 
pointer to a static string, on others, it will change at run-time
 
(such as when the application invokes setlocale).
 
  
Error handling is set by errors which may also be set to <code>NULL</code> meaning to use
+
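Error handling is set by errors, which may also be set to <code>NULL</code>, meaning to use the default handling defined for the codec. Default error handling for all built-in codecs is &quot;strict&quot; ([[../../library/exceptions#ValueError|ValueError]] is raised).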
the default handling defined for the codec. Default error handling for all
 
built-in codecs is &quot;strict&quot; ([[../../library/exceptions#ValueError|<code>ValueError</code>]] is raised).
 
  
The codecs all use a similar interface. For simplicity, only deviations from the
+
The codecs all use a similar interface. For simplicity, only deviations from the following generic APIs are documented.
following generic APIs are documented.
 
  
 
<div id="generic-codecs" class="section">
 
<div id="generic-codecs" class="section">
  
=== Generic Codecs ===
+
=== Generic Codecs ===
  
These are the generic codec APIs:
+
These are the generic codec APIs:
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Decode</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">encoding</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_Decode</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''encoding'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
+
 
<dd><p>''Return value: New reference.''</p>
+
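: Create a Unicode object by decoding ''size'' bytes of the encoded string ''s''. ''encoding'' and ''errors'' have the same meaning as the parameters of the same name in the [[../../library/stdtypes#str|str()]] built-in function. The codec to be used is looked up using the Python codec registry. Return <code>NULL</code> if an exception was raised by the codec.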
<p>Create a Unicode object by decoding ''size'' bytes of the encoded string ''s''.
 
''encoding'' and ''errors'' have the same meaning as the parameters of the same name
 
in the [[../../library/stdtypes#str|<code>str()</code>]] built-in function. The codec to be used is looked up
 
using the Python codec registry. Return <code>NULL</code> if an exception was raised by
 
the codec.</p></dd></dl>
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsEncodedString</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">encoding</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_AsEncodedString</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode'', ''const'' char *''encoding'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode a Unicode object and return the result as a Python bytes object.
 
''encoding'' and ''errors'' have the same meaning as the parameters of the same
 
name in the Unicode [[../../library/stdtypes#str|<code>encode()</code>]] method. The codec to be used is looked up
 
using the Python codec registry. Return <code>NULL</code> if an exception was raised by
 
the codec.</p></dd></dl>
 
  
<dl>
+
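: Encode a Unicode object and return the result as a Python bytes object. ''encoding'' and ''errors'' have the same meaning as the parameters of the same name in the Unicode [[../../library/stdtypes#str|encode()]] method. The codec to be used is looked up using the Python codec registry. Return <code>NULL</code> if an exception was raised by the codec.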
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_Encode</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size'', ''const'' char *''encoding'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode the [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] buffer ''s'' of the given ''size'' and return a Python
 
bytes object. ''encoding'' and ''errors'' have the same meaning as the
 
parameters of the same name in the Unicode [[../../library/stdtypes#str|<code>encode()</code>]] method. The codec
 
to be used is looked up using the Python codec registry. Return <code>NULL</code> if an
 
exception was raised by the codec.</p>
 
<div class="deprecated-removed">
 
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] API; please migrate to using
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Encode</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">encoding</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
[[#c.PyUnicode_AsEncodedString|<code>PyUnicode_AsEncodedString()</code>]].</p>
 
  
</div></dd></dl>
+
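: Encode the [[#c.Py_UNICODE|Py_UNICODE]] buffer ''s'' of the given ''size'' and return a Python bytes object. ''encoding'' and ''errors'' have the same meaning as the parameters of the same name in the Unicode [[../../library/stdtypes#str|encode()]] method. The codec to be used is looked up using the Python codec registry. Return <code>NULL</code> if an exception was raised by the codec.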
  
  
Line 1,150: Line 882:
 
<div id="utf-8-codecs" class="section">
 
<div id="utf-8-codecs" class="section">
  
=== UTF-8 Codecs ===
+
=== UTF-8 Codecs ===
 +
 
 +
These are the UTF-8 codec APIs:
 +
 
 +
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeUTF8</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
  
These are the UTF-8 codec APIs:
+
: Create a Unicode object by decoding ''size'' bytes of the UTF-8 encoded string ''s''. Return <code>NULL</code> if an exception was raised by the codec.
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeUTF8Stateful</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">consumed</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeUTF8</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
+
 
<dd><p>''Return value: New reference.''</p>
+
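: If ''consumed'' is <code>NULL</code>, behave like [[#c.PyUnicode_DecodeUTF8|PyUnicode_DecodeUTF8()]]. If ''consumed'' is not <code>NULL</code>, trailing incomplete UTF-8 byte sequences will not be treated as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in ''consumed''.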
<p>Create a Unicode object by decoding ''size'' bytes of the UTF-8 encoded string
 
''s''. Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsUTF8String</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeUTF8Stateful</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors'', Py_ssize_t *''consumed''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>If ''consumed'' is <code>NULL</code>, behave like [[#c.PyUnicode_DecodeUTF8|<code>PyUnicode_DecodeUTF8()</code>]]. If
 
''consumed'' is not <code>NULL</code>, trailing incomplete UTF-8 byte sequences will not be
 
treated as an error. Those bytes will not be decoded and the number of bytes
 
that have been decoded will be stored in ''consumed''.</p></dd></dl>
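The ''consumed'' behaviour can be pictured with a small stand-alone sketch. The helper below is hypothetical (it is not part of the CPython API and does not call it); it only estimates how many leading bytes of a buffer form complete UTF-8 sequences, assuming the input is otherwise well-formed. A streaming caller would decode that prefix and keep the remaining bytes for the next chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a CPython function: return the number of leading
   bytes of s that do not end in a truncated multi-byte UTF-8 sequence.
   Assumes the input is otherwise well-formed UTF-8. */
static size_t utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t back;

    assert(s != NULL || size == 0);
    /* The last lead byte is at most 3 bytes before the end of the buffer. */
    for (back = 1; back <= 3 && back <= size; back++) {
        unsigned char c = s[size - back];
        size_t need;

        if ((c & 0xC0) == 0x80)
            continue;                      /* continuation byte: keep looking */
        if (c < 0x80)
            need = 1;                      /* ASCII */
        else if ((c & 0xE0) == 0xC0)
            need = 2;
        else if ((c & 0xF0) == 0xE0)
            need = 3;
        else if ((c & 0xF8) == 0xF0)
            need = 4;
        else
            return size;                   /* invalid lead byte: left to the decoder */
        /* If the sequence needs more bytes than are present, cut before it. */
        return need > back ? size - back : size;
    }
    return size;
}
```

With this sketch, a buffer holding <code>a</code> followed by only the first byte of a two-byte sequence yields a prefix length of 1, which corresponds to the value the text above describes as being stored in ''consumed''.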
 
  
<dl>
+
: Encode a Unicode object using UTF-8 and return the result as a Python bytes object. Error handling is "strict". Return <code>NULL</code> if an exception was raised by the codec.
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_AsUTF8String</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode a Unicode object using UTF-8 and return the result as a Python bytes
 
object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception was
 
raised by the codec.</p></dd></dl>
 
  
 
<dl>
 
<dl>
<dt>''const'' char *<code>PyUnicode_AsUTF8AndSize</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode'', Py_ssize_t *''size''<span class="sig-paren">)</span></dt>
+
<dt><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsUTF8AndSize</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">size</span></span><span class="sig-paren">)</span><br />
<dd><p>Return a pointer to the UTF-8 encoding of the Unicode object, and
+
</dt>
store the size of the encoded representation (in bytes) in ''size''. The
+
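<dd><p>Return a pointer to the UTF-8 encoding of the Unicode object, and store the size of the encoded representation (in bytes) in ''size''. The ''size'' argument can be <code>NULL</code>; in this case no size will be stored. The returned buffer always has an extra null byte appended (not included in ''size''), regardless of whether there are any other null code points.</p>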
''size'' argument can be <code>NULL</code>; in this case no size will be stored. The
+
<p>In the case of an error, <code>NULL</code> is returned with an exception set and no ''size'' is stored.</p>
returned buffer always has an extra null byte appended (not included in
+
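<p>This caches the UTF-8 representation of the string in the Unicode object, and subsequent calls will return a pointer to the same buffer. The caller is not responsible for deallocating the buffer.</p>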
''size''), regardless of whether there are any other null code points.</p>
 
<p>In the case of an error, <code>NULL</code> is returned with an exception set and no
 
''size'' is stored.</p>
 
<p>This caches the UTF-8 representation of the string in the Unicode object, and
 
subsequent calls will return a pointer to the same buffer. The caller is not
 
responsible for deallocating the buffer.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div>
 
</div>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.7: </span>The return type is now <code>const char *</code> instead of <code>char *</code>.</p>
+
<p><span class="versionmodified changed">Changed in version 3.7: </span>The return type is now <code>const char *</code> instead of <code>char *</code>.</p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
 
<dl>
 
<dl>
<dt>''const'' char *<code>PyUnicode_AsUTF8</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
+
<dt><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsUTF8</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
<dd><p>Like [[#c.PyUnicode_AsUTF8AndSize|<code>PyUnicode_AsUTF8AndSize()</code>]], but does not store the size.</p>
+
</dt>
 +
<dd><p>Like [[#c.PyUnicode_AsUTF8AndSize|PyUnicode_AsUTF8AndSize()]], but does not store the size.</p>
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div>
 
</div>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.7: </span>The return type is now <code>const char *</code> instead of <code>char *</code>.</p>
+
<p><span class="versionmodified changed">Changed in version 3.7: </span>The return type is now <code>const char *</code> instead of <code>char *</code>.</p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeUTF8</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeUTF8</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode the [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] buffer ''s'' of the given ''size'' using UTF-8 and
 
return a Python bytes object. Return <code>NULL</code> if an exception was raised by
 
the codec.</p>
 
<div class="deprecated-removed">
 
 
 
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] API; please migrate to using
 
[[#c.PyUnicode_AsUTF8String|<code>PyUnicode_AsUTF8String()</code>]], [[#c.PyUnicode_AsUTF8AndSize|<code>PyUnicode_AsUTF8AndSize()</code>]] or
 
[[#c.PyUnicode_AsEncodedString|<code>PyUnicode_AsEncodedString()</code>]].</p>
 
  
</div></dd></dl>
+
: Encode the [[#c.Py_UNICODE|Py_UNICODE]] buffer ''s'' of the given ''size'' using UTF-8 and return a Python bytes object. Return <code>NULL</code> if an exception was raised by the codec.
  
  
Line 1,230: Line 938:
 
<div id="utf-32-codecs" class="section">
 
<div id="utf-32-codecs" class="section">
  
=== UTF-32 Codecs ===
+
=== UTF-32 Codecs ===
  
These are the UTF-32 codec APIs:
+
These are the UTF-32 codec APIs:
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeUTF32</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors'', int *''byteorder''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeUTF32</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">byteorder</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Decode ''size'' bytes from a UTF-32 encoded buffer string and return the
+
<dd><p>Decode ''size'' bytes from a UTF-32 encoded buffer string and return the corresponding Unicode object. ''errors'' (if non-<code>NULL</code>) defines the error handling. It defaults to "strict".</p>
corresponding Unicode object. ''errors'' (if non-<code>NULL</code>) defines the error
+
<p>If ''byteorder'' is non-<code>NULL</code>, the decoder starts decoding using the given byte order:</p>
handling. It defaults to &quot;strict&quot;.</p>
 
<p>If ''byteorder'' is non-<code>NULL</code>, the decoder starts decoding using the given byte
 
order:</p>
 
 
<div class="highlight-c notranslate">
 
<div class="highlight-c notranslate">
  
 
<div class="highlight">
 
<div class="highlight">
  
<pre>*byteorder == -1: little endian
+
<syntaxhighlight lang="c">*byteorder == -1: little endian
 
*byteorder == 0:  native order
 
*byteorder == 0:  native order
*byteorder == 1:  big endian</pre>
+
*byteorder == 1:  big endian</syntaxhighlight>
  
 
</div>
 
</div>
  
 
</div>
 
</div>
<p>If <code>*byteorder</code> is zero, and the first four bytes of the input data are a
+
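<p>If <code>*byteorder</code> is zero, and the first four bytes of the input data are a byte order mark (BOM), the decoder switches to this byte order and the BOM is not copied into the resulting Unicode string. If <code>*byteorder</code> is <code>-1</code> or <code>1</code>, any byte order mark is copied to the output.</p>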
byte order mark (BOM), the decoder switches to this byte order and the BOM is
+
<p>After completion, ''*byteorder'' is set to the current byte order at the end of input data.</p>
not copied into the resulting Unicode string. If <code>*byteorder</code> is <code>-1</code> or
+
<p>If ''byteorder'' is <code>NULL</code>, the codec starts in native order mode.</p>
<code>1</code>, any byte order mark is copied to the output.</p>
+
<p>Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
<p>After completion, ''*byteorder'' is set to the current byte order at the end
+
 
of input data.</p>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeUTF32Stateful</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">byteorder</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">consumed</span></span><span class="sig-paren">)</span><br />
<p>If ''byteorder'' is <code>NULL</code>, the codec starts in native order mode.</p>
+
 
<p>Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
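The BOM check described for <code>*byteorder == 0</code> can be sketched in plain C. The function below is a hypothetical helper written for illustration (it is not CPython's implementation and its name is an assumption); it reports the byte order a leading UTF-32 BOM would select, using the same <code>-1</code> / <code>0</code> / <code>1</code> convention as the table above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a CPython function: inspect the first four bytes
   of a UTF-32 buffer. Returns -1 for a little-endian BOM (FF FE 00 00),
   1 for a big-endian BOM (00 00 FE FF), and 0 when no BOM is present. */
static int utf32_bom_order(const unsigned char *s, size_t size)
{
    assert(s != NULL || size == 0);
    if (size < 4)
        return 0;                          /* too short to hold a BOM */
    if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00)
        return -1;                         /* little endian */
    if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF)
        return 1;                          /* big endian */
    return 0;                              /* no BOM: caller keeps native order */
}
```

A decoder following the description above would skip the four BOM bytes when this returns a non-zero value and would otherwise decode from the first byte in native order.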
+
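: If ''consumed'' is <code>NULL</code>, behave like [[#c.PyUnicode_DecodeUTF32|PyUnicode_DecodeUTF32()]]. If ''consumed'' is not <code>NULL</code>, [[#c.PyUnicode_DecodeUTF32Stateful|PyUnicode_DecodeUTF32Stateful()]] will not treat trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible by four) as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in ''consumed''.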
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsUTF32String</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeUTF32Stateful</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors'', int *''byteorder'', Py_ssize_t *''consumed''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>If ''consumed'' is <code>NULL</code>, behave like [[#c.PyUnicode_DecodeUTF32|<code>PyUnicode_DecodeUTF32()</code>]]. If
 
''consumed'' is not <code>NULL</code>, [[#c.PyUnicode_DecodeUTF32Stateful|<code>PyUnicode_DecodeUTF32Stateful()</code>]] will not treat
 
trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
 
by four) as an error. Those bytes will not be decoded and the number of bytes
 
that have been decoded will be stored in ''consumed''.</p></dd></dl>
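As a rough illustration of the "not divisible by four" case, the following stand-alone sketch decodes little-endian UTF-32 into an array of code points and reports how many bytes it used. The function name, its parameters, and the fixed little-endian order are assumptions made for the example; it performs no range validation and is not how CPython implements the stateful decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a CPython function: decode complete 4-byte
   little-endian units from s into out (at most out_cap code points).
   Trailing bytes that do not form a full unit are left undecoded, and the
   number of bytes actually decoded is stored in *consumed if it is non-NULL. */
static size_t utf32le_decode(const unsigned char *s, size_t size,
                             uint32_t *out, size_t out_cap, size_t *consumed)
{
    size_t n = 0;   /* code points written */
    size_t i = 0;   /* bytes decoded so far */

    assert(s != NULL || size == 0);
    assert(out != NULL || out_cap == 0);
    while (i + 4 <= size && n < out_cap) {
        out[n++] = (uint32_t)s[i]
                 | ((uint32_t)s[i + 1] << 8)
                 | ((uint32_t)s[i + 2] << 16)
                 | ((uint32_t)s[i + 3] << 24);
        i += 4;
    }
    if (consumed != NULL)
        *consumed = i;
    return n;
}
```

For a six-byte input this writes one code point and reports four bytes consumed; the caller keeps the last two bytes and prepends them to the next chunk.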
 
  
<dl>
+
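: Return a Python byte string using the UTF-32 encoding in native byte order. The string always starts with a BOM mark. Error handling is "strict". Return <code>NULL</code> if an exception was raised by the codec.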
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_AsUTF32String</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Return a Python byte string using the UTF-32 encoding in native byte
 
order. The string always starts with a BOM mark. Error handling is &quot;strict&quot;.
 
Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
 
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeUTF32</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size'', ''const'' char *''errors'', int ''byteorder''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeUTF32</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">byteorder</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Return a Python bytes object holding the UTF-32 encoded value of the Unicode
+
<dd><p>Return a Python bytes object holding the UTF-32 encoded value of the Unicode data in ''s''. Output is written according to the following byte order:</p>
data in ''s''. Output is written according to the following byte order:</p>
 
 
<div class="highlight-c notranslate">
 
<div class="highlight-c notranslate">
  
 
<div class="highlight">
 
<div class="highlight">
  
<pre>byteorder == -1: little endian
+
<syntaxhighlight lang="c">byteorder == -1: little endian
 
byteorder == 0:  native byte order (writes a BOM mark)
 
byteorder == 0:  native byte order (writes a BOM mark)
byteorder == 1:  big endian</pre>
+
byteorder == 1:  big endian</syntaxhighlight>
  
 
</div>
 
</div>
  
 
</div>
 
</div>
<p>If byteorder is <code>0</code>, the output string will always start with the Unicode BOM
+
<p>If byteorder is <code>0</code>, the output string will always start with the Unicode BOM mark (U+FEFF). In the other two modes, no BOM mark is prepended.</p>
mark (U+FEFF). In the other two modes, no BOM mark is prepended.</p>
+
<p>If <code>Py_UNICODE_WIDE</code> is not defined, surrogate pairs will be output as a single code point.</p>
<p>If <code>Py_UNICODE_WIDE</code> is not defined, surrogate pairs will be output
+
<p>Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
as a single code point.</p>
 
<p>Return <code>NULL</code> if an exception was raised by the codec.</p>
 
<div class="deprecated-removed">
 
 
 
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] API; please migrate to using
 
[[#c.PyUnicode_AsUTF32String|<code>PyUnicode_AsUTF32String()</code>]] or [[#c.PyUnicode_AsEncodedString|<code>PyUnicode_AsEncodedString()</code>]].</p>
 
 
 
</div></dd></dl>
 
  
  
Line 1,310: Line 994:
 
<div id="utf-16-codecs" class="section">
 
<div id="utf-16-codecs" class="section">
  
=== UTF-16 Codecs ===
+
=== UTF-16 Codecs ===
  
These are the UTF-16 codec APIs:
+
These are the UTF-16 codec APIs:
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeUTF16</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors'', int *''byteorder''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeUTF16</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">byteorder</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Decode ''size'' bytes from a UTF-16 encoded buffer string and return the
+
<dd><p>Decode ''size'' bytes from a UTF-16 encoded buffer string and return the corresponding Unicode object. ''errors'' (if non-<code>NULL</code>) defines the error handling. It defaults to "strict".</p>
corresponding Unicode object. ''errors'' (if non-<code>NULL</code>) defines the error
+
<p>If ''byteorder'' is non-<code>NULL</code>, the decoder starts decoding using the given byte order:</p>
handling. It defaults to &quot;strict&quot;.</p>
 
<p>If ''byteorder'' is non-<code>NULL</code>, the decoder starts decoding using the given byte
 
order:</p>
 
 
<div class="highlight-c notranslate">
 
<div class="highlight-c notranslate">
  
 
<div class="highlight">
 
<div class="highlight">
  
<pre>*byteorder == -1: little endian
+
<syntaxhighlight lang="c">*byteorder == -1: little endian
 
*byteorder == 0:  native order
 
*byteorder == 0:  native order
*byteorder == 1:  big endian</pre>
+
*byteorder == 1:  big endian</syntaxhighlight>
  
 
</div>
 
</div>
  
 
</div>
 
</div>
<p>If <code>*byteorder</code> is zero, and the first two bytes of the input data are a
+
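<p>If <code>*byteorder</code> is zero, and the first two bytes of the input data are a byte order mark (BOM), the decoder switches to this byte order and the BOM is not copied into the resulting Unicode string. If <code>*byteorder</code> is <code>-1</code> or <code>1</code>, any byte order mark is copied to the output (where it will result in either a <code>\ufeff</code> or a <code>\ufffe</code> character).</p>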
byte order mark (BOM), the decoder switches to this byte order and the BOM is
+
<p>After completion, ''*byteorder'' is set to the current byte order at the end of input data.</p>
not copied into the resulting Unicode string. If <code>*byteorder</code> is <code>-1</code> or
+
<p>If ''byteorder'' is <code>NULL</code>, the codec starts in native order mode.</p>
<code>1</code>, any byte order mark is copied to the output (where it will result in
+
<p>Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
either a <code>\ufeff</code> or a <code>\ufffe</code> character).</p>
 
<p>After completion, ''*byteorder'' is set to the current byte order at the end
 
of input data.</p>
 
<p>If ''byteorder'' is <code>NULL</code>, the codec starts in native order mode.</p>
 
<p>Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
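The remark that a copied BOM shows up as either <code>\ufeff</code> or <code>\ufffe</code> follows from reading the same two bytes in the two possible orders. The sketch below uses a hypothetical helper (the name and signature are assumptions, and it is independent of CPython) that reads one 16-bit code unit under an explicitly forced byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a CPython function: read one UTF-16 code unit
   from p using a forced byte order (-1 = little endian, 1 = big endian). */
static uint16_t utf16_read_unit(const unsigned char *p, int byteorder)
{
    assert(p != 0);
    assert(byteorder == -1 || byteorder == 1);
    if (byteorder == -1)
        return (uint16_t)(p[0] | (p[1] << 8));   /* low byte first */
    return (uint16_t)((p[0] << 8) | p[1]);       /* high byte first */
}
```

Reading the little-endian BOM bytes <code>FF FE</code> with the matching order gives U+FEFF, while reading them with the opposite order gives U+FFFE, which is the mismatch case mentioned in the text.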
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeUTF16Stateful</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">byteorder</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">consumed</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeUTF16Stateful</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors'', int *''byteorder'', Py_ssize_t *''consumed''<span class="sig-paren">)</span></dt>
+
 
<dd><p>''Return value: New reference.''</p>
+
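: If ''consumed'' is <code>NULL</code>, behave like [[#c.PyUnicode_DecodeUTF16|PyUnicode_DecodeUTF16()]]. If ''consumed'' is not <code>NULL</code>, [[#c.PyUnicode_DecodeUTF16Stateful|PyUnicode_DecodeUTF16Stateful()]] will not treat trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a split surrogate pair) as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in ''consumed''.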
<p>If ''consumed'' is <code>NULL</code>, behave like [[#c.PyUnicode_DecodeUTF16|<code>PyUnicode_DecodeUTF16()</code>]]. If
+
 
''consumed'' is not <code>NULL</code>, [[#c.PyUnicode_DecodeUTF16Stateful|<code>PyUnicode_DecodeUTF16Stateful()</code>]] will not treat
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsUTF16String</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
 
split surrogate pair) as an error. Those bytes will not be decoded and the
 
number of bytes that have been decoded will be stored in ''consumed''.</p></dd></dl>
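The stateful behaviour described above can be pictured with a small standalone sketch. It does not call the Python C API and is not CPython's implementation; <code>utf16_complete_prefix</code> is a hypothetical helper that only models which trailing bytes would be held back (an odd final byte, or a final high surrogate waiting for its partner). It does not decode or validate the earlier data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Python C API.
 * Returns how many leading bytes of `s` can be decoded without
 * splitting a UTF-16 code unit or a surrogate pair at the tail.
 * `big_endian` is nonzero for big-endian input, zero for little-endian. */
size_t utf16_complete_prefix(const unsigned char *s, size_t size, int big_endian)
{
    size_t usable;

    assert(s != NULL || size == 0);

    /* An odd trailing byte is half of a code unit: hold it back. */
    usable = size - (size % 2);

    if (usable >= 2) {
        const unsigned char *last = s + usable - 2;
        unsigned int unit = big_endian
            ? ((unsigned int)last[0] << 8) | last[1]
            : ((unsigned int)last[1] << 8) | last[0];

        /* A trailing high surrogate (0xD800..0xDBFF) is the first half
         * of a split pair: hold it back as well. */
        if (unit >= 0xD800 && unit <= 0xDBFF)
            usable -= 2;
    }
    return usable;
}
```

A caller that reads data in chunks would keep the bytes after the returned offset and prepend them to the next chunk, which is the role ''consumed'' plays for the real function.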
 
  
<dl>
+
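: Return a Python byte string using the UTF-16 encoding in native byte order. The string always starts with a BOM mark. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception was raised by the codec.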
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_AsUTF16String</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Return a Python byte string using the UTF-16 encoding in native byte
 
order. The string always starts with a BOM mark. Error handling is &quot;strict&quot;.
 
Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
 
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeUTF16</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size'', ''const'' char *''errors'', int ''byteorder''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeUTF16</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">byteorder</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Return a Python bytes object holding the UTF-16 encoded value of the Unicode
+
<dd><p>Return a Python bytes object holding the UTF-16 encoded value of the Unicode data in ''s''. Output is written according to the following byte order:</p>
data in ''s''. Output is written according to the following byte order:</p>
 
 
<div class="highlight-c notranslate">
 
<div class="highlight-c notranslate">
  
 
<div class="highlight">
 
<div class="highlight">
  
<pre>byteorder == -1: little endian
+
<syntaxhighlight lang="c">byteorder == -1: little endian
 
byteorder == 0:  native byte order (writes a BOM mark)
 
byteorder == 0:  native byte order (writes a BOM mark)
byteorder == 1:  big endian</pre>
+
byteorder == 1:  big endian</syntaxhighlight>
  
 
</div>
 
</div>
  
 
</div>
 
</div>
<p>If byteorder is <code>0</code>, the output string will always start with the Unicode BOM
+
<p>If byteorder is <code>0</code>, the output string will always start with the Unicode BOM mark (U+FEFF). In the other two modes, no BOM mark is prepended.</p>
mark (U+FEFF). In the other two modes, no BOM mark is prepended.</p>
+
<p>If <code>Py_UNICODE_WIDE</code> is defined, a single [[#c.Py_UNICODE|Py_UNICODE]] value may get represented as a surrogate pair. If it is not defined, each [[#c.Py_UNICODE|Py_UNICODE]] value is interpreted as a UCS-2 character.</p>
<p>If <code>Py_UNICODE_WIDE</code> is defined, a single [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] value may get
+
<p>Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
represented as a surrogate pair. If it is not defined, each [[#c.Py_UNICODE|<code>Py_UNICODE</code>]]
 
value is interpreted as a UCS-2 character.</p>
 
<p>Return <code>NULL</code> if an exception was raised by the codec.</p>
 
<div class="deprecated-removed">
 
 
 
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] API; please migrate to using
 
[[#c.PyUnicode_AsUTF16String|<code>PyUnicode_AsUTF16String()</code>]] or [[#c.PyUnicode_AsEncodedString|<code>PyUnicode_AsEncodedString()</code>]].</p>
 
 
 
</div></dd></dl>
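To make the three ''byteorder'' modes concrete, here is a minimal standalone sketch of the resulting byte layout. It is an illustration only, not CPython's implementation and not a call into the Python C API: <code>utf16_encode_sketch</code> and <code>put_unit</code> are hypothetical names, the input is assumed to be an array of code points no larger than U+10FFFF, and lone surrogates in the input are passed through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one 16-bit code unit in the requested byte order. */
static size_t put_unit(unsigned char *out, size_t len, unsigned int unit, int big)
{
    out[len]     = (unsigned char)(big ? (unit >> 8) : (unit & 0xFF));
    out[len + 1] = (unsigned char)(big ? (unit & 0xFF) : (unit >> 8));
    return len + 2;
}

/* Hypothetical helper mirroring the documented byteorder values:
 *   -1: little endian, no BOM
 *    0: native byte order, BOM (U+FEFF) written first
 *    1: big endian, no BOM
 * `out` must have room for 2 + 4 * n bytes. Returns bytes written. */
size_t utf16_encode_sketch(const uint32_t *cps, size_t n, int byteorder,
                           unsigned char *out)
{
    size_t len = 0, i;
    int big;

    assert(byteorder >= -1 && byteorder <= 1);

    if (byteorder == 0) {
        /* Detect the native byte order of this machine. */
        const uint16_t probe = 1;
        big = (*(const unsigned char *)&probe == 0);
        len = put_unit(out, len, 0xFEFF, big);
    } else {
        big = (byteorder == 1);
    }

    for (i = 0; i < n; i++) {
        uint32_t cp = cps[i];
        assert(cp <= 0x10FFFF);
        if (cp >= 0x10000) {
            /* Code points above the BMP become a surrogate pair. */
            cp -= 0x10000;
            len = put_unit(out, len, 0xD800 | (unsigned int)(cp >> 10), big);
            len = put_unit(out, len, 0xDC00 | (unsigned int)(cp & 0x3FF), big);
        } else {
            len = put_unit(out, len, (unsigned int)cp, big);
        }
    }
    return len;
}
```

For the code points U+0041 and U+1F600, mode <code>-1</code> yields the six bytes <code>41 00 3D D8 00 DE</code>, mode <code>1</code> yields <code>00 41 D8 3D DE 00</code>, and mode <code>0</code> yields eight bytes beginning with the BOM in the machine's own order.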
 
  
  
Line 1,392: Line 1,050:
 
<div id="utf-7-codecs" class="section">
 
<div id="utf-7-codecs" class="section">
  
=== UTF-7 Codecs ===
+
=== UTF-7 Codecs ===
  
These are the UTF-7 codec APIs:
+
These are the UTF-7 codec APIs:
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeUTF7</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeUTF7</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Create a Unicode object by decoding ''size'' bytes of the UTF-7 encoded string
 
''s''. Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
 
  
<dl>
+
: Create a Unicode object by decoding ''size'' bytes of the UTF-7 encoded string ''s''. Return <code>NULL</code> if an exception was raised by the codec.
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeUTF7Stateful</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors'', Py_ssize_t *''consumed''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>If ''consumed'' is <code>NULL</code>, behave like [[#c.PyUnicode_DecodeUTF7|<code>PyUnicode_DecodeUTF7()</code>]]. If
 
''consumed'' is not <code>NULL</code>, trailing incomplete UTF-7 base-64 sections will not
 
be treated as an error. Those bytes will not be decoded and the number of
 
bytes that have been decoded will be stored in ''consumed''.</p></dd></dl>
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeUTF7Stateful</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">consumed</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeUTF7</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size'', int ''base64SetO'', int ''base64WhiteSpace'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode the [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] buffer of the given size using UTF-7 and
 
return a Python bytes object. Return <code>NULL</code> if an exception was raised by
 
the codec.</p>
 
<p>If ''base64SetO'' is nonzero, &quot;Set O&quot; (punctuation that has no otherwise
 
special meaning) will be encoded in base-64. If ''base64WhiteSpace'' is
 
nonzero, whitespace will be encoded in base-64. Both are set to zero for the
 
Python &quot;utf-7&quot; codec.</p>
 
<div class="deprecated-removed">
 
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] API; please migrate to using
+
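: If ''consumed'' is <code>NULL</code>, behave like [[#c.PyUnicode_DecodeUTF7|PyUnicode_DecodeUTF7()]]. If ''consumed'' is not <code>NULL</code>, trailing incomplete UTF-7 base-64 sections will not be treated as an error. Those bytes will not be decoded, and the number of bytes that have been decoded will be stored in ''consumed''.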
[[#c.PyUnicode_AsEncodedString|<code>PyUnicode_AsEncodedString()</code>]].</p>
 
  
</div></dd></dl>
+
<dl>
 +
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeUTF7</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">base64SetO</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">base64WhiteSpace</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
 +
</dt>
 +
<dd><p>Encode the [[#c.Py_UNICODE|Py_UNICODE]] buffer of the given size using UTF-7 and return a Python bytes object. Return <code>NULL</code> if an exception was raised by the codec.</p>
 +
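<p>If ''base64SetO'' is nonzero, &quot;Set O&quot; (punctuation that has no otherwise special meaning) will be encoded in base-64. If ''base64WhiteSpace'' is nonzero, whitespace will be encoded in base-64. Both are set to zero for the Python &quot;utf-7&quot; codec.</p></dd></dl>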
  
  
Line 1,431: Line 1,072:
 
<div id="unicode-escape-codecs" class="section">
 
<div id="unicode-escape-codecs" class="section">
  
=== Unicode-Escape Codecs ===
+
=== Unicode-Escape Codecs ===
 +
 
 +
These are the &quot;Unicode Escape&quot; codec APIs:
  
These are the &quot;Unicode Escape&quot; codec APIs:
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeUnicodeEscape</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
  
<dl>
+
: Create a Unicode object by decoding ''size'' bytes of the Unicode-Escape encoded string ''s''. Return <code>NULL</code> if an exception was raised by the codec.
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeUnicodeEscape</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Create a Unicode object by decoding ''size'' bytes of the Unicode-Escape encoded
 
string ''s''. Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsUnicodeEscapeString</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_AsUnicodeEscapeString</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode a Unicode object using Unicode-Escape and return the result as a
 
bytes object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception was
 
raised by the codec.</p></dd></dl>
 
  
<dl>
+
: Encode a Unicode object using Unicode-Escape and return the result as a bytes object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception was raised by the codec.
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeUnicodeEscape</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode the [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] buffer of the given ''size'' using Unicode-Escape and
 
return a bytes object. Return <code>NULL</code> if an exception was raised by the codec.</p>
 
<div class="deprecated-removed">
 
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] API; please migrate to using
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeUnicodeEscape</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span><span class="sig-paren">)</span><br />
[[#c.PyUnicode_AsUnicodeEscapeString|<code>PyUnicode_AsUnicodeEscapeString()</code>]].</p>
 
  
</div></dd></dl>
+
: Encode the [[#c.Py_UNICODE|Py_UNICODE]] buffer of the given ''size'' using Unicode-Escape and return a bytes object. Return <code>NULL</code> if an exception was raised by the codec.
  
  
Line 1,464: Line 1,092:
 
<div id="raw-unicode-escape-codecs" class="section">
 
<div id="raw-unicode-escape-codecs" class="section">
  
=== Raw-Unicode-Escape Codecs ===
+
=== Raw-Unicode-Escape Codecs ===
 +
 
 +
These are the &quot;Raw Unicode Escape&quot; codec APIs:
  
These are the &quot;Raw Unicode Escape&quot; codec APIs:
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeRawUnicodeEscape</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
  
<dl>
+
: Create a Unicode object by decoding ''size'' bytes of the Raw-Unicode-Escape encoded string ''s''. Return <code>NULL</code> if an exception was raised by the codec.
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeRawUnicodeEscape</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Create a Unicode object by decoding ''size'' bytes of the Raw-Unicode-Escape
 
encoded string ''s''. Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsRawUnicodeEscapeString</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_AsRawUnicodeEscapeString</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode a Unicode object using Raw-Unicode-Escape and return the result as
 
a bytes object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception
 
was raised by the codec.</p></dd></dl>
 
  
<dl>
+
: Encode a Unicode object using Raw-Unicode-Escape and return the result as a bytes object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception was raised by the codec.
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeRawUnicodeEscape</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode the [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] buffer of the given ''size'' using Raw-Unicode-Escape
 
and return a bytes object. Return <code>NULL</code> if an exception was raised by the codec.</p>
 
<div class="deprecated-removed">
 
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] API; please migrate to using
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeRawUnicodeEscape</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span><span class="sig-paren">)</span><br />
[[#c.PyUnicode_AsRawUnicodeEscapeString|<code>PyUnicode_AsRawUnicodeEscapeString()</code>]] or
 
[[#c.PyUnicode_AsEncodedString|<code>PyUnicode_AsEncodedString()</code>]].</p>
 
  
</div></dd></dl>
+
: Encode the [[#c.Py_UNICODE|Py_UNICODE]] buffer of the given ''size'' using Raw-Unicode-Escape and return a bytes object. Return <code>NULL</code> if an exception was raised by the codec.
  
  
Line 1,498: Line 1,112:
 
<div id="latin-1-codecs" class="section">
 
<div id="latin-1-codecs" class="section">
  
=== Latin-1 Codecs ===
+
=== Latin-1 Codecs ===
  
These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
+
These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode ordinals, and only these are accepted by the codecs during encoding.
ordinals and only these are accepted by the codecs during encoding.
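The restriction to the first 256 ordinals can be illustrated with a short standalone sketch. It is not the codec's actual implementation and does not use the Python C API; <code>latin1_encode_sketch</code> is a hypothetical helper that models &quot;strict&quot; handling by failing on the first code point above U+00FF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Python C API.
 * Copies `n` code points into `out` as Latin-1 bytes.
 * Returns 0 on success, or -1 as soon as a code point does not fit
 * in one byte (the sketch's stand-in for a "strict" encoding error). */
int latin1_encode_sketch(const uint32_t *cps, size_t n, unsigned char *out)
{
    size_t i;

    assert(out != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (cps[i] > 0xFF)
            return -1;          /* outside the first 256 ordinals */
        out[i] = (unsigned char)cps[i];
    }
    return 0;
}
```

Each accepted code point maps to the byte with the same numeric value, which is why U+00E9 encodes to the single byte <code>0xE9</code> while U+20AC is rejected.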
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeLatin1</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeLatin1</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
+
 
<dd><p>''Return value: New reference.''</p>
+
: Create a Unicode object by decoding ''size'' bytes of the Latin-1 encoded string ''s''. Return <code>NULL</code> if an exception was raised by the codec.
<p>Create a Unicode object by decoding ''size'' bytes of the Latin-1 encoded string
 
''s''. Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsLatin1String</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_AsLatin1String</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode a Unicode object using Latin-1 and return the result as a Python bytes
 
object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception was
 
raised by the codec.</p></dd></dl>
 
  
<dl>
+
: Encode a Unicode object using Latin-1 and return the result as a Python bytes object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception was raised by the codec.
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeLatin1</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode the [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] buffer of the given ''size'' using Latin-1 and
 
return a Python bytes object. Return <code>NULL</code> if an exception was raised by
 
the codec.</p>
 
<div class="deprecated-removed">
 
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] API; please migrate to using
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeLatin1</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
[[#c.PyUnicode_AsLatin1String|<code>PyUnicode_AsLatin1String()</code>]] or
 
[[#c.PyUnicode_AsEncodedString|<code>PyUnicode_AsEncodedString()</code>]].</p>
 
  
</div></dd></dl>
+
: Encode the [[#c.Py_UNICODE|Py_UNICODE]] buffer of the given ''size'' using Latin-1 and return a Python bytes object. Return <code>NULL</code> if an exception was raised by the codec.
  
  
Line 1,534: Line 1,132:
 
<div id="ascii-codecs" class="section">
 
<div id="ascii-codecs" class="section">
  
=== ASCII Codecs ===
+
=== ASCII Codecs ===
  
These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
+
These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other codes generate errors.
codes generate errors.
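What &quot;7-bit&quot; means for input data can be shown with a small standalone check. This is only a sketch under the assumption that any byte with the high bit set counts as an error; <code>ascii_first_invalid</code> is a hypothetical helper, not a function of the Python C API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Python C API.
 * Returns the index of the first byte that is not 7-bit ASCII
 * (value 0x80 or above), or -1 if every byte is acceptable. */
ptrdiff_t ascii_first_invalid(const unsigned char *s, size_t size)
{
    size_t i;

    assert(s != NULL || size == 0);

    for (i = 0; i < size; i++) {
        if (s[i] >= 0x80)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

The returned index corresponds to the position a decoder would report in its error, for example the first byte of a UTF-8 encoded accented letter.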
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeASCII</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeASCII</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
+
 
<dd><p>''Return value: New reference.''</p>
+
: Create a Unicode object by decoding ''size'' bytes of the ASCII encoded string ''s''. Return <code>NULL</code> if an exception was raised by the codec.
<p>Create a Unicode object by decoding ''size'' bytes of the ASCII encoded string
 
''s''. Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsASCIIString</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_AsASCIIString</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode a Unicode object using ASCII and return the result as a Python bytes
 
object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception was
 
raised by the codec.</p></dd></dl>
 
  
<dl>
+
: Encode a Unicode object using ASCII and return the result as a Python bytes object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception was raised by the codec.
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeASCII</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode the [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] buffer of the given ''size'' using ASCII and
 
return a Python bytes object. Return <code>NULL</code> if an exception was raised by
 
the codec.</p>
 
<div class="deprecated-removed">
 
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] API; please migrate to using
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeASCII</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
[[#c.PyUnicode_AsASCIIString|<code>PyUnicode_AsASCIIString()</code>]] or
 
[[#c.PyUnicode_AsEncodedString|<code>PyUnicode_AsEncodedString()</code>]].</p>
 
  
</div></dd></dl>
+
: Encode the [[#c.Py_UNICODE|Py_UNICODE]] buffer of the given ''size'' using ASCII and return a Python bytes object. Return <code>NULL</code> if an exception was raised by the codec.
  
  
Line 1,570: Line 1,152:
 
<div id="character-map-codecs" class="section">
 
<div id="character-map-codecs" class="section">
  
=== Character Map Codecs ===
+
=== Character Map Codecs ===
  
This codec is special in that it can be used to implement many different codecs
+
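This codec is special in that it can be used to implement many different codecs (and this is in fact what was done to obtain most of the standard codecs included in the <code>encodings</code> package). The codec uses mappings to encode and decode characters. The mapping objects provided must support the <code>__getitem__()</code> mapping interface; dictionaries and sequences work well.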
(and this is in fact what was done to obtain most of the standard codecs
 
included in the <code>encodings</code> package). The codec uses mapping to encode and
 
decode characters. The mapping objects provided must support the
 
<code>__getitem__()</code> mapping interface; dictionaries and sequences work well.
 
  
These are the mapping codec APIs:
+
These are the mapping codec APIs:
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeCharmap</code><span class="sig-paren">(</span>''const'' char *''data'', Py_ssize_t ''size'', [[../structures#c|PyObject]] *''mapping'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeCharmap</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">data</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">mapping</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Create a Unicode object by decoding ''size'' bytes of the encoded string ''data''
+
<dd><p>Create a Unicode object by decoding ''size'' bytes of the encoded string ''data'' using the given ''mapping'' object. Return <code>NULL</code> if an exception was raised by the codec.</p>
using the given ''mapping'' object. Return <code>NULL</code> if an exception was raised
+
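<p>If ''mapping'' is <code>NULL</code>, Latin-1 decoding will be applied. Otherwise ''mapping'' must map byte ordinals (integers in the range from 0 to 255) to Unicode strings, integers (which are then interpreted as Unicode ordinals) or <code>None</code>. Unmapped data bytes (ones which cause a [[../../library/exceptions#LookupError|LookupError]]), as well as ones which get mapped to <code>None</code>, <code>0xFFFE</code> or <code>'\ufffe'</code>, are treated as undefined mappings and cause an error.</p></dd></dl>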
by the codec.</p>
 
<p>If ''mapping'' is <code>NULL</code>, Latin-1 decoding will be applied. Else
 
''mapping'' must map byte ordinals (integers in the range from 0 to 255)
 
to Unicode strings, integers (which are then interpreted as Unicode
 
ordinals) or <code>None</code>. Unmapped data bytes -- ones which cause a
 
[[../../library/exceptions#LookupError|<code>LookupError</code>]], as well as ones which get mapped to <code>None</code>,
 
<code>0xFFFE</code> or <code>'\ufffe'</code>, are treated as undefined mappings and cause
 
an error.</p></dd></dl>
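The mapping rules above can be tried from Python without writing C. The following is a minimal sketch, not the C call itself: it assumes that <code>codecs.charmap_decode()</code> (the helper used by the modules in the <code>encodings</code> package) applies the same mapping semantics. The names <code>SAMPLE_DECODING_TABLE</code>, <code>decode_with_table</code> and <code>is_undefined</code> are hypothetical and exist only for this illustration.

```python
import codecs

# Hypothetical decoding table: byte ordinal -> str, int (Unicode ordinal) or None.
SAMPLE_DECODING_TABLE = {
    0x00: "a",    # a one-character string
    0x01: 0x62,   # an integer is interpreted as a Unicode ordinal ("b")
    0x02: "cd",   # a string value may hold more than one character
    0x03: None,   # None marks the byte as undefined
}


def decode_with_table(data, table, errors="strict"):
    """Decode bytes through a mapping; table=None falls back to Latin-1."""
    text, _consumed = codecs.charmap_decode(data, errors, table)
    return text


def is_undefined(byte_value, table):
    """Return True if decoding the single byte fails under 'strict' handling."""
    try:
        decode_with_table(bytes([byte_value]), table)
    except UnicodeDecodeError:
        return True
    return False
```

In this sketch a byte that is missing from the dictionary and a byte mapped to <code>None</code> are both reported as undefined, which mirrors the two "unmapped" cases described above.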
 
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_AsCharmapString</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode'', [[../structures#c|PyObject]] *''mapping''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsCharmapString</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">mapping</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Encode a Unicode object using the given ''mapping'' object and return the
+
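<dd><p>Encode a Unicode object using the given ''mapping'' object and return the result as a bytes object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception was raised by the codec.</p>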
result as a bytes object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an
+
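<p>The ''mapping'' object must map Unicode ordinal integers to bytes objects, integers in the range from 0 to 255, or <code>None</code>. Unmapped character ordinals (ones which cause a [[../../library/exceptions#LookupError|LookupError]]), as well as ones mapped to <code>None</code>, are treated as &quot;undefined mapping&quot; and cause an error.</p></dd></dl>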
exception was raised by the codec.</p>
 
<p>The ''mapping'' object must map Unicode ordinal integers to bytes objects,
 
integers in the range from 0 to 255 or <code>None</code>. Unmapped character
 
ordinals (ones which cause a [[../../library/exceptions#LookupError|<code>LookupError</code>]]) as well as ones mapped to
 
<code>None</code> are treated as &quot;undefined mapping&quot; and cause an error.</p></dd></dl>
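The encoding direction can be sketched the same way from Python. This example assumes that <code>codecs.charmap_encode()</code> follows the mapping rules described above; it is an illustration rather than the C API call, and <code>SAMPLE_ENCODING_TABLE</code>, <code>encode_with_table</code> and <code>is_encodable</code> are hypothetical names.

```python
import codecs

# Hypothetical encoding table: Unicode ordinal -> int (0..255), bytes or None.
SAMPLE_ENCODING_TABLE = {
    ord("a"): 0x01,   # an integer in the range 0..255 becomes one byte
    ord("b"): b"xy",  # a bytes object is copied into the output as-is
    ord("c"): None,   # None marks the character as undefined
}


def encode_with_table(text, table, errors="strict"):
    """Encode a string through a mapping and return the bytes result."""
    data, _consumed = codecs.charmap_encode(text, errors, table)
    return data


def is_encodable(char, table):
    """Return False if the character has an undefined mapping."""
    try:
        encode_with_table(char, table)
    except UnicodeEncodeError:
        return False
    return True
```

With "strict" error handling, both a character mapped to <code>None</code> and a character absent from the table raise an error, so <code>is_encodable</code> returns <code>False</code> for each.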
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeCharmap</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">mapping</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeCharmap</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size'', [[../structures#c|PyObject]] *''mapping'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode the [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] buffer of the given ''size'' using the given
 
''mapping'' object and return the result as a bytes object. Return <code>NULL</code> if
 
an exception was raised by the codec.</p>
 
<div class="deprecated-removed">
 
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] API; please migrate to using
+
: Encode the [[#c.Py_UNICODE|Py_UNICODE]] buffer of the given ''size'' using the given ''mapping'' object and return the result as a bytes object. Return <code>NULL</code> if an exception was raised by the codec.
[[#c.PyUnicode_AsCharmapString|<code>PyUnicode_AsCharmapString()</code>]] or
 
[[#c.PyUnicode_AsEncodedString|<code>PyUnicode_AsEncodedString()</code>]].</p>
 
  
</div></dd></dl>
+
The following codec API is special in that it maps Unicode to Unicode.
 
 
The following codec API is special in that it maps Unicode to Unicode.
 
 
 
<dl>
 
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_Translate</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''str'', [[../structures#c|PyObject]] *''table'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Translate a string by applying a character mapping table to it and return the
 
resulting Unicode object. Return <code>NULL</code> if an exception was raised by the
 
codec.</p>
 
<p>The mapping table must map Unicode ordinal integers to Unicode ordinal integers
 
or <code>None</code> (causing deletion of the character).</p>
 
<p>Mapping tables need only provide the <code>__getitem__()</code> interface; dictionaries
 
and sequences work well. Unmapped character ordinals (ones which cause a
 
[[../../library/exceptions#LookupError|<code>LookupError</code>]]) are left untouched and are copied as-is.</p>
 
<p>''errors'' has the usual meaning for codecs. It may be <code>NULL</code>, which indicates that
 
the default error handling should be used.</p></dd></dl>
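The table semantics described above (ordinal to ordinal, <code>None</code> to delete, unmapped characters copied as-is) can be illustrated with <code>str.translate()</code>, the Python-level counterpart. This is a minimal sketch under that assumption; <code>SAMPLE_TRANSLATION_TABLE</code> and <code>translate_text</code> are hypothetical names.

```python
# Hypothetical table: Unicode ordinal -> Unicode ordinal, or None to delete.
SAMPLE_TRANSLATION_TABLE = {
    ord("l"): ord("L"),  # replace "l" with "L"
    ord("o"): None,      # delete every "o"
}


def translate_text(text, table):
    """Apply a character mapping table; unmapped characters are kept unchanged."""
    return text.translate(table)
```

For example, <code>translate_text("hello world", SAMPLE_TRANSLATION_TABLE)</code> drops both "o" characters, upper-cases each "l", and leaves every other character untouched.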
 
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_TranslateCharmap</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size'', [[../structures#c|PyObject]] *''mapping'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Translate</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">str</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">table</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Translate a [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] buffer of the given ''size'' by applying a
+
<dd><p>Translate a string by applying a character mapping table to it and return the resulting Unicode object. Return <code>NULL</code> if an exception was raised by the codec.</p>
character ''mapping'' table to it and return the resulting Unicode object.
+
<p>The mapping table must map Unicode ordinal integers to Unicode ordinal integers or <code>None</code> (causing deletion of the character).</p>
Return <code>NULL</code> when an exception was raised by the codec.</p>
+
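<p>Mapping tables need only provide the <code>__getitem__()</code> interface; dictionaries and sequences work well. Unmapped character ordinals (ones which cause a [[../../library/exceptions#LookupError|LookupError]]) are left untouched and are copied as-is.</p>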
<div class="deprecated-removed">
+
<p>''errors'' has the usual meaning for codecs. It may be <code>NULL</code>, which indicates that the default error handling should be used.</p></dd></dl>
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] API; please migrate to using
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_TranslateCharmap</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">mapping</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
[[#c.PyUnicode_Translate|<code>PyUnicode_Translate()</code>]] or the [[../codec#codec-registry|<span class="std std-ref">generic codec based API</span>]].</p>
 
  
</div></dd></dl>
+
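: Translate a [[#c.Py_UNICODE|Py_UNICODE]] buffer of the given ''size'' by applying a character ''mapping'' table to it and return the resulting Unicode object. Return <code>NULL</code> when an exception was raised by the codec.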
  
  
Line 1,652: Line 1,192:
 
<div id="mbcs-codecs-for-windows" class="section">
 
<div id="mbcs-codecs-for-windows" class="section">
  
=== MBCS codecs for Windows ===
+
=== MBCS codecs for Windows ===
  
These are the MBCS codec APIs. They are currently only available on Windows and
+
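These are the MBCS codec APIs. They are currently only available on Windows and use the Win32 MBCS converters to implement the conversions. Note that MBCS (or DBCS) is a class of encodings, not just one. The target encoding is defined by the user settings on the machine running the codec.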
use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
 
DBCS) is a class of encodings, not just one. The target encoding is defined by
 
the user settings on the machine running the codec.
 
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeMBCS</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeMBCS</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
+
 
<dd><p>''Return value: New reference.''</p>
+
: Create a Unicode object by decoding ''size'' bytes of the MBCS encoded string ''s''. Return <code>NULL</code> if an exception was raised by the codec.
<p>Create a Unicode object by decoding ''size'' bytes of the MBCS encoded string ''s''.
+
 
Return <code>NULL</code> if an exception was raised by the codec.</p></dd></dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_DecodeMBCSStateful</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">consumed</span></span><span class="sig-paren">)</span><br />
 +
 
 +
: If ''consumed'' is <code>NULL</code>, behave like [[#c.PyUnicode_DecodeMBCS|PyUnicode_DecodeMBCS()]]. If ''consumed'' is not <code>NULL</code>, [[#c.PyUnicode_DecodeMBCSStateful|PyUnicode_DecodeMBCSStateful()]] will not decode a trailing lead byte, and the number of bytes that have been decoded will be stored in ''consumed''.
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_AsMBCSString</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_DecodeMBCSStateful</code><span class="sig-paren">(</span>''const'' char *''s'', Py_ssize_t ''size'', ''const'' char *''errors'', Py_ssize_t *''consumed''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>If ''consumed'' is <code>NULL</code>, behave like [[#c.PyUnicode_DecodeMBCS|<code>PyUnicode_DecodeMBCS()</code>]]. If
 
''consumed'' is not <code>NULL</code>, [[#c.PyUnicode_DecodeMBCSStateful|<code>PyUnicode_DecodeMBCSStateful()</code>]] will not decode
 
a trailing lead byte, and the number of bytes that have been decoded will be stored
 
in ''consumed''.</p></dd></dl>
 
  
<dl>
+
: Encode a Unicode object using MBCS and return the result as a Python bytes object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception was raised by the codec.
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_AsMBCSString</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''unicode''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode a Unicode object using MBCS and return the result as a Python bytes
 
object. Error handling is &quot;strict&quot;. Return <code>NULL</code> if an exception was
 
raised by the codec.</p></dd></dl>
 
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeCodePage</code><span class="sig-paren">(</span>int ''code_page'', [[../structures#c|PyObject]] *''unicode'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeCodePage</span></span></span><span class="sig-paren">(</span><span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">code_page</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">unicode</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Encode the Unicode object using the specified code page and return a Python
+
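<dd><p>Encode the Unicode object using the specified code page and return a Python bytes object. Return <code>NULL</code> if an exception was raised by the codec. Use the <code>CP_ACP</code> code page to get the MBCS encoder.</p>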
bytes object. Return <code>NULL</code> if an exception was raised by the codec. Use
 
<code>CP_ACP</code> code page to get the MBCS encoder.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_EncodeMBCS</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span>[[#c.Py_UNICODE|<span class="n"><span class="pre">Py_UNICODE</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">size</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">errors</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_EncodeMBCS</code><span class="sig-paren">(</span>''const'' [[#c.Py_UNICODE|Py_UNICODE]] *''s'', Py_ssize_t ''size'', ''const'' char *''errors''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Encode the [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] buffer of the given ''size'' using MBCS and return
 
a Python bytes object. Return <code>NULL</code> if an exception was raised by the
 
codec.</p>
 
<div class="deprecated-removed">
 
  
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style [[#c.Py_UNICODE|<code>Py_UNICODE</code>]] API; please migrate to using
+
: Encode the [[#c.Py_UNICODE|Py_UNICODE]] buffer of the given ''size'' using MBCS and return a Python bytes object. Return <code>NULL</code> if an exception was raised by the codec.
[[#c.PyUnicode_AsMBCSString|<code>PyUnicode_AsMBCSString()</code>]], [[#c.PyUnicode_EncodeCodePage|<code>PyUnicode_EncodeCodePage()</code>]] or
 
[[#c.PyUnicode_AsEncodedString|<code>PyUnicode_AsEncodedString()</code>]].</p>
 
 
 
</div></dd></dl>
 
  
  
Line 1,710: Line 1,226:
 
<div id="methods-slots" class="section">
 
<div id="methods-slots" class="section">
  
=== Methods &amp; Slots ===
+
=== Methods &amp; Slots ===
  
  
Line 1,719: Line 1,235:
  
 
<span id="unicodemethodsandslots"></span>
 
<span id="unicodemethodsandslots"></span>
== Methods and Slot Functions ==
+
== Methods and Slot Functions ==
  
The following APIs are capable of handling Unicode objects and strings on input
+
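The following APIs are capable of handling Unicode objects and strings on input (we refer to them as strings in the descriptions) and return Unicode objects or integers as appropriate.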
(we refer to them as strings in the descriptions) and return Unicode objects or
 
integers as appropriate.
 
  
They all return <code>NULL</code> or <code>-1</code> if an exception occurs.
+
They all return <code>NULL</code> or <code>-1</code> if an exception occurs.
  
<dl>
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Concat</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">left</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">right</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_Concat</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''left'', [[../structures#c|PyObject]] *''right''<span class="sig-paren">)</span></dt>
+
 
<dd><p>''Return value: New reference.''</p>
+
: Concatenate two strings, giving a new Unicode string.
<p>Concatenate two strings, giving a new Unicode string.</p></dd></dl>
+
 
 +
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Split</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">sep</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">maxsplit</span></span><span class="sig-paren">)</span><br />
 +
 
 +
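: Split a string, giving a list of Unicode strings. If ''sep'' is <code>NULL</code>, splitting will be done at all whitespace substrings. Otherwise, splits occur at the given separator. At most ''maxsplit'' splits will be done. If negative, no limit is set. Separators are not included in the resulting list.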
 +
 
 +
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Splitlines</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">s</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">keepend</span></span><span class="sig-paren">)</span><br />
 +
 
 +
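: Split a Unicode string at line breaks, returning a list of Unicode strings. CRLF is considered to be one line break. If ''keepend'' is <code>0</code>, the line break characters are not included in the resulting strings.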
 +
 
 +
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Join</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">separator</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">seq</span></span><span class="sig-paren">)</span><br />
  
<dl>
+
: Join a sequence of strings using the given ''separator'' and return the resulting Unicode string.
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_Split</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''s'', [[../structures#c|PyObject]] *''sep'', Py_ssize_t ''maxsplit''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Split a string giving a list of Unicode strings. If ''sep'' is <code>NULL</code>, splitting
 
will be done at all whitespace substrings. Otherwise, splits occur at the given
 
separator. At most ''maxsplit'' splits will be done. If negative, no limit is
 
set. Separators are not included in the resulting list.</p></dd></dl>
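The splitting rules above match those of <code>str.split()</code> at the Python level, which makes them easy to try out. The sketch below assumes that correspondence; <code>split_text</code> is a hypothetical wrapper in which <code>sep=None</code> stands in for a <code>NULL</code> separator.

```python
def split_text(text, sep=None, maxsplit=-1):
    """Split text; sep=None splits on whitespace runs, negative maxsplit means no limit."""
    return text.split(sep, maxsplit)
```

With <code>maxsplit</code> set to 1, only the first separator is used and the remainder of the string is returned as the last list item.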
 
  
<dl>
+
; <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Tailmatch</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">str</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">substr</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">start</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">end</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">direction</span></span><span class="sig-paren">)</span><br />
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_Splitlines</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''s'', int ''keepend''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Split a Unicode string at line breaks, returning a list of Unicode strings.
 
CRLF is considered to be one line break. If ''keepend'' is <code>0</code>, the line break
 
characters are not included in the resulting strings.</p></dd></dl>
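The effect of ''keepend'' can be seen with <code>str.splitlines()</code>, assumed here to be the Python-level counterpart. <code>split_lines</code> is a hypothetical wrapper that takes an integer flag in the same style as the C function.

```python
def split_lines(text, keepend=0):
    """Split at line breaks; a non-zero keepend keeps the line break characters."""
    return text.splitlines(bool(keepend))
```

Note that the CRLF pair in the input is treated as a single line break in both modes.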
 
  
<dl>
+
: Return <code>1</code> if ''substr'' matches <code>str[start:end]</code> at the given tail end (''direction'' == <code>-1</code> means to do a prefix match, ''direction'' == <code>1</code> a suffix match), <code>0</code> otherwise. Return <code>-1</code> if an error occurred.
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_Join</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''separator'', [[../structures#c|PyObject]] *''seq''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Join a sequence of strings using the given ''separator'' and return the resulting
 
Unicode string.</p></dd></dl>
 
  
; Py_ssize_t <code>PyUnicode_Tailmatch</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''str'', [[../structures#c|PyObject]] *''substr'', Py_ssize_t ''start'', Py_ssize_t ''end'', int ''direction''<span class="sig-paren">)</span>
+
; <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Find</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">str</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">substr</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">start</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">end</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">direction</span></span><span class="sig-paren">)</span><br />
: Return <code>1</code> if ''substr'' matches <code>str[start:end]</code> at the given tail end (''direction'' == <code>-1</code> means to do a prefix match, ''direction'' == <code>1</code> a suffix match), <code>0</code> otherwise. Return <code>-1</code> if an error occurred.
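The prefix and suffix matching described for <code>PyUnicode_Tailmatch</code> can be modelled in Python with <code>str.startswith()</code> and <code>str.endswith()</code>. This is a sketch of the documented behaviour, not the C implementation; <code>tailmatch</code> is a hypothetical helper, and rejecting other ''direction'' values with <code>ValueError</code> is a choice made only for this example.

```python
def tailmatch(text, substr, start, end, direction):
    """Return 1 if substr matches text[start:end] at the chosen end, else 0."""
    if direction == -1:
        matched = text.startswith(substr, start, end)  # prefix match
    elif direction == 1:
        matched = text.endswith(substr, start, end)    # suffix match
    else:
        raise ValueError("direction must be -1 (prefix) or 1 (suffix)")
    return 1 if matched else 0
```

Narrowing ''start'' or ''end'' changes which slice is tested, so a prefix that matches the whole string may no longer match once ''start'' is moved forward.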
 
  
; Py_ssize_t <code>PyUnicode_Find</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''str'', [[../structures#c|PyObject]] *''substr'', Py_ssize_t ''start'', Py_ssize_t ''end'', int ''direction''<span class="sig-paren">)</span>
+
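: Return the first position of ''substr'' in <code>str[start:end]</code> using the given ''direction'' (''direction'' == <code>1</code> means to do a forward search, ''direction'' == <code>-1</code> a backward search). The return value is the index of the first match; a value of <code>-1</code> indicates that no match was found, and <code>-2</code> indicates that an error occurred and an exception has been set.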
: Return the first position of ''substr'' in <code>str[start:end]</code> using the given ''direction'' (''direction'' == <code>1</code> means to do a forward search, ''direction'' == <code>-1</code> a backward search). The return value is the index of the first match; a value of <code>-1</code> indicates that no match was found, and <code>-2</code> indicates that an error occurred and an exception has been set.
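A forward search corresponds to <code>str.find()</code> and a backward search to <code>str.rfind()</code>, so the two directions can be compared side by side in Python. The sketch below assumes that correspondence; <code>find_substring</code> is a hypothetical helper, and the <code>-2</code> error value exists only at the C level (Python raises an exception instead).

```python
def find_substring(text, substr, start, end, direction):
    """Return the index of the first match in the search direction, or -1."""
    if direction > 0:
        return text.find(substr, start, end)   # forward search
    return text.rfind(substr, start, end)      # backward search
```

For a string that contains the substring twice, the forward search reports the earlier index and the backward search the later one.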
 
  
 
<dl>
 
<dl>
<dt>Py_ssize_t <code>PyUnicode_FindChar</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''str'', [[#c.Py_UCS4|Py_UCS4]] ''ch'', Py_ssize_t ''start'', Py_ssize_t ''end'', int ''direction''<span class="sig-paren">)</span></dt>
+
<dt><span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_FindChar</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">str</span></span>, [[#c.Py_UCS4|<span class="n"><span class="pre">Py_UCS4</span></span>]]<span class="w"> </span><span class="n"><span class="pre">ch</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">start</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">end</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">direction</span></span><span class="sig-paren">)</span><br />
<dd><p>Return the first position of the character ''ch'' in <code>str[start:end]</code> using
+
</dt>
the given ''direction'' (''direction'' == <code>1</code> means to do a forward search,
+
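<dd><p>Return the first position of the character ''ch'' in <code>str[start:end]</code> using the given ''direction'' (''direction'' == <code>1</code> means to do a forward search, ''direction'' == <code>-1</code> a backward search). The return value is the index of the first match; a value of <code>-1</code> indicates that no match was found, and <code>-2</code> indicates that an error occurred and an exception has been set.</p>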
''direction'' == <code>-1</code> a backward search). The return value is the index of the
 
first match; a value of <code>-1</code> indicates that no match was found, and <code>-2</code>
 
indicates that an error occurred and an exception has been set.</p>
 
 
<div class="versionadded">
 
<div class="versionadded">
  
<p><span class="versionmodified added">New in version 3.3.</span></p>
+
<p><span class="versionmodified added">New in version 3.3.</span></p>
  
 
</div>
 
</div>
 
<div class="versionchanged">
 
<div class="versionchanged">
  
<p><span class="versionmodified changed">Changed in version 3.7: </span>''start'' and ''end'' are now adjusted to behave like <code>str[start:end]</code>.</p>
+
<p><span class="versionmodified changed">Changed in version 3.7: </span>''start'' and ''end'' are now adjusted to behave like <code>str[start:end]</code>.</p>
  
 
</div></dd></dl>
 
</div></dd></dl>
  
; Py_ssize_t <code>PyUnicode_Count</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''str'', [[../structures#c|PyObject]] *''substr'', Py_ssize_t ''start'', Py_ssize_t ''end''<span class="sig-paren">)</span>
+
; <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Count</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">str</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">substr</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">start</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">end</span></span><span class="sig-paren">)</span><br />
: Return the number of non-overlapping occurrences of ''substr'' in <code>str[start:end]</code>. Return <code>-1</code> if an error occurred.
+
 
 +
: 返回 <code>str[start:end]</code> 中 ''substr'' 的非重叠出现次数。 如果发生错误,则返回 <code>-1</code>
 +
 
 +
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Replace</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">str</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">substr</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">replstr</span></span>, <span class="n"><span class="pre">Py_ssize_t</span></span><span class="w"> </span><span class="n"><span class="pre">maxcount</span></span><span class="sig-paren">)</span><br />
  
<dl>
+
: Replace at most ''maxcount'' occurrences of ''substr'' in ''str'' with ''replstr'' and return the resulting Unicode object. ''maxcount'' == <code>-1</code> means replace all occurrences.
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_Replace</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''str'', [[../structures#c|PyObject]] *''substr'', [[../structures#c|PyObject]] *''replstr'', Py_ssize_t ''maxcount''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>Replace at most ''maxcount'' occurrences of ''substr'' in ''str'' with ''replstr'' and
 
return the resulting Unicode object. ''maxcount'' == <code>-1</code> means replace all
 
occurrences.</p></dd></dl>
 
  
 
<dl>
 
<dl>
<dt>int <code>PyUnicode_Compare</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''left'', [[../structures#c|PyObject]] *''right''<span class="sig-paren">)</span></dt>
+
<dt><span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Compare</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">left</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">right</span></span><span class="sig-paren">)</span><br />
<dd><p>Compare two strings and return <code>-1</code>, <code>0</code>, <code>1</code> for less than, equal, and greater than,
+
</dt>
respectively.</p>
+
<dd><p>Compare two strings and return <code>-1</code>, <code>0</code>, <code>1</code> for less than, equal, and greater than, respectively.</p>
<p>This function returns <code>-1</code> upon failure, so one should call
+
<p>此函数在失败时返回 <code>-1</code>,因此应调用 [[../exceptions#c|PyErr_Occurred()]] 来检查错误。</p></dd></dl>
[[../exceptions#c|<code>PyErr_Occurred()</code>]] to check for errors.</p></dd></dl>
 
  
 
<dl>
 
<dl>
<dt>int <code>PyUnicode_CompareWithASCIIString</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''uni'', ''const'' char *''string''<span class="sig-paren">)</span></dt>
+
<dt><span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_CompareWithASCIIString</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">uni</span></span>, <span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">string</span></span><span class="sig-paren">)</span><br />
<dd><p>Compare a Unicode object, ''uni'', with ''string'' and return <code>-1</code>, <code>0</code>, <code>1</code> for less
+
</dt>
than, equal, and greater than, respectively. It is best to pass only
+
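<dd><p>Compare a Unicode object, ''uni'', with ''string'' and return <code>-1</code>, <code>0</code>, <code>1</code> for less than, equal, and greater than, respectively. It is best to pass only ASCII-encoded strings, but the function interprets the input string as ISO-8859-1 if it contains non-ASCII characters.</p>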
ASCII-encoded strings, but the function interprets the input string as
+
<p>此函数不会引发异常。</p></dd></dl>
ISO-8859-1 if it contains non-ASCII characters.</p>
 
<p>This function does not raise exceptions.</p></dd></dl>
 
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_RichCompare</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''left'', [[../structures#c|PyObject]] *''right'', int ''op''<span class="sig-paren">)</span></dt>
+
<dt>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_RichCompare</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">left</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">right</span></span>, <span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="n"><span class="pre">op</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Rich compare two Unicode strings and return one of the following:</p>
+
<dd><p>Rich 比较两个 Unicode 字符串并返回以下值之一:</p>
 
<ul>
 
<ul>
<li><p><code>NULL</code> in case an exception was raised</p></li>
+
<li><p><code>NULL</code> 以防引发异常</p></li>
<li><p><code>Py_True</code> or <code>Py_False</code> for successful comparisons</p></li>
+
<li><p><code>Py_True</code> or <code>Py_False</code> for successful comparisons</p></li>
<li><p><code>Py_NotImplemented</code> in case the type combination is unknown</p></li></ul>
+
<li><p><code>Py_NotImplemented</code> 如果类型组合未知</p></li></ul>
 +
 
 +
<p>''op'' 的可能值为 <code>Py_GT</code>、<code>Py_GE</code>、<code>Py_EQ</code>、<code>Py_NE</code>、<code>Py_LT</code> 和 <code>Py_LE</code>。</p></dd></dl>
  
<p>Possible values for ''op'' are <code>Py_GT</code>, <code>Py_GE</code>, <code>Py_EQ</code>,
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Format</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">format</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">args</span></span><span class="sig-paren">)</span><br />
<code>Py_NE</code>, <code>Py_LT</code>, and <code>Py_LE</code>.</p></dd></dl>
+
 
 +
: 从 ''format'' 和 ''args'' 返回一个新的字符串对象; 这类似于 <code>format % args</code>
  
 
<dl>
 
<dl>
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_Format</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''format'', [[../structures#c|PyObject]] *''args''<span class="sig-paren">)</span></dt>
+
<dt><span class="kt"><span class="pre">int</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_Contains</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">container</span></span>, [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">element</span></span><span class="sig-paren">)</span><br />
<dd><p>''Return value: New reference.''</p>
+
</dt>
<p>Return a new string object from ''format'' and ''args''; this is analogous to
+
<dd><p>检查 ''element'' 是否包含在 ''container'' 中并相应返回 true 或 false。</p>
<code>format % args</code>.</p></dd></dl>
+
<p>''element'' has to coerce to a one element Unicode string. <code>-1</code> is returned if there was an error.</p></dd></dl>
  
<dl>
+
; <span class="kt"><span class="pre">void</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_InternInPlace</span></span></span><span class="sig-paren">(</span>[[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">string</span></span><span class="sig-paren">)</span><br />
<dt>int <code>PyUnicode_Contains</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] *''container'', [[../structures#c|PyObject]] *''element''<span class="sig-paren">)</span></dt>
+
 
<dd><p>Check whether ''element'' is contained in ''container'' and return true or false
+
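: Intern the argument ''*string'' in place. The argument must be the address of a pointer variable pointing to a Python Unicode string object. If there is an existing interned string that is the same as ''*string'', it sets ''*string'' to it (decrementing the reference count of the old string object and incrementing the reference count of the interned string object), otherwise it leaves ''*string'' alone and interns it (incrementing its reference count). (Clarification: even though there is a lot of talk about reference counts, think of this function as reference-count-neutral; you own the object after the call if and only if you owned it before the call.)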
accordingly.</p>
+
 
<p>''element'' has to coerce to a one element Unicode string. <code>-1</code> is returned
+
; [[../structures#c|<span class="n"><span class="pre">PyObject</span></span>]]<span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="sig-name descname"><span class="n"><span class="pre">PyUnicode_InternFromString</span></span></span><span class="sig-paren">(</span><span class="k"><span class="pre">const</span></span><span class="w"> </span><span class="kt"><span class="pre">char</span></span><span class="w"> </span><span class="p"><span class="pre">*</span></span><span class="n"><span class="pre">v</span></span><span class="sig-paren">)</span><br />
if there was an error.</p></dd></dl>
 
  
; void <code>PyUnicode_InternInPlace</code><span class="sig-paren">(</span>[[../structures#c|PyObject]] **''string''<span class="sig-paren">)</span>
+
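: A combination of [[#c.PyUnicode_FromString|PyUnicode_FromString()]] and [[#c.PyUnicode_InternInPlace|PyUnicode_InternInPlace()]], returning either a new Unicode string object that has been interned, or a new ("owned") reference to an earlier interned string object with the same value.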
: Intern the argument ''*string'' in place. The argument must be the address of a pointer variable pointing to a Python Unicode string object. If there is an existing interned string that is the same as ''*string'', it sets ''*string'' to it (decrementing the reference count of the old string object and incrementing the reference count of the interned string object), otherwise it leaves ''*string'' alone and interns it (incrementing its reference count). (Clarification: even though there is a lot of talk about reference counts, think of this function as reference-count-neutral; you own the object after the call if and only if you owned it before the call.)
 
  
<dl>
 
<dt>[[../structures#c|PyObject]] *<code>PyUnicode_InternFromString</code><span class="sig-paren">(</span>''const'' char *''v''<span class="sig-paren">)</span></dt>
 
<dd><p>''Return value: New reference.''</p>
 
<p>A combination of [[#c.PyUnicode_FromString|<code>PyUnicode_FromString()</code>]] and
 
[[#c.PyUnicode_InternInPlace|<code>PyUnicode_InternInPlace()</code>]], returning either a new Unicode string
 
object that has been interned, or a new (&quot;owned&quot;) reference to an earlier
 
interned string object with the same value.</p></dd></dl>
 
  
 +
</div>
  
 
</div>
 
</div>
 +
<div class="clearer">
 +
 +
  
 
</div>
 
</div>
  
[[Category:Python 3.9 中文文档]]
+
[[Category:Python 3.9 文档]]

Latest revision as of 04:50, 31 October 2021 (Sunday)

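Unicode Objects and Codecs

Unicode Objects

Since the implementation of PEP 393 in Python 3.3, Unicode objects internally use a variety of representations, in order to allow handling the complete range of Unicode characters while staying memory efficient. There are special cases for strings where all code points are below 128, 256, or 65536; otherwise, code points must be below 1114112 (which is the full Unicode range).

Py_UNICODE* and UTF-8 representations are created on demand and cached in the Unicode object. The Py_UNICODE* representation is deprecated and inefficient.

Due to the transition between the old APIs and the new APIs, Unicode objects can internally be in two states depending on how they were created:

  • "canonical" Unicode objects are all objects created by a non-deprecated Unicode API. They use the most efficient representation allowed by the implementation.
  • "legacy" Unicode objects have been created through one of the deprecated APIs (typically PyUnicode_FromUnicode()) and only bear the Py_UNICODE* representation; you will have to call PyUnicode_READY() on them before calling any other API.

Note

The "legacy" Unicode object will be removed in Python 3.12 together with the deprecated APIs. From then on, all Unicode objects will be "canonical". See PEP 623 for more information.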


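Unicode Type

These are the basic Unicode object types used for the Unicode implementation in Python:

type Py_UCS4

type Py_UCS2

type Py_UCS1

These types are typedefs for unsigned integer types wide enough to contain characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with single Unicode characters, use Py_UCS4.

New in version 3.3.

type Py_UNICODE

This is a typedef of wchar_t, which is a 16-bit type or 32-bit type depending on the platform.

Changed in version 3.3: In previous versions, this was a 16-bit type or a 32-bit type depending on whether you selected a "narrow" or "wide" Unicode version of Python at build time.

type PyASCIIObject

type PyCompactUnicodeObject

type PyUnicodeObject

These subtypes of PyObject represent a Python Unicode object. In almost all cases, they shouldn't be used directly, since all API functions that deal with Unicode objects take and return PyObject pointers.

New in version 3.3.

PyTypeObject PyUnicode_Type
This instance of PyTypeObject represents the Python Unicode type. It is exposed to Python code as str.

The following APIs are really C macros and can be used to do fast checks and to access internal read-only data of Unicode objects:

int PyUnicode_Check(PyObject *o)
Return true if the object o is a Unicode object or an instance of a Unicode subtype. This function always succeeds.
int PyUnicode_CheckExact(PyObject *o)
Return true if the object o is a Unicode object, but not an instance of a subtype. This function always succeeds.
int PyUnicode_READY(PyObject *o)

Ensure the string object o is in the "canonical" representation. This is required before using any of the access macros described below.

Returns 0 on success and -1 with an exception set on failure, which in particular happens if memory allocation fails.

New in version 3.3.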

Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)

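Return the length of the Unicode string, in code points. o has to be a Unicode object in the "canonical" representation (not checked).

New in version 3.3.

Py_UCS1 *PyUnicode_1BYTE_DATA(PyObject *o)

Py_UCS2 *PyUnicode_2BYTE_DATA(PyObject *o)

Py_UCS4 *PyUnicode_4BYTE_DATA(PyObject *o)

Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4 integer types for direct character access. No checks are performed on whether the canonical representation has the correct character size; use PyUnicode_KIND() to select the right macro. Make sure PyUnicode_READY() has been called before accessing this.

New in version 3.3.

PyUnicode_WCHAR_KIND

PyUnicode_1BYTE_KIND

PyUnicode_2BYTE_KIND

PyUnicode_4BYTE_KIND

Return values of the PyUnicode_KIND() macro.

New in version 3.3.

unsigned int PyUnicode_KIND(PyObject *o)

Return one of the PyUnicode kind constants (see above) that indicate how many bytes per character this Unicode object uses to store its data. o has to be a Unicode object in the "canonical" representation (not checked).

New in version 3.3.

void *PyUnicode_DATA(PyObject *o)

Return a void pointer to the raw Unicode buffer. o has to be a Unicode object in the "canonical" representation (not checked).

New in version 3.3.

void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, Py_UCS4 value)

Write into a canonical representation data (as obtained with PyUnicode_DATA()). This macro does not do any sanity checks and is intended for usage in loops. The caller should cache the kind value and data pointer as obtained from other macro calls. index is the index in the string (starts at 0) and value is the new code point value which should be written to that location.

New in version 3.3.

Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index)

Read a code point from a canonical representation data (as obtained with PyUnicode_DATA()). No checks or ready calls are performed.

New in version 3.3.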
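The kind/data/index access pattern used by PyUnicode_READ() and PyUnicode_WRITE() can be pictured with a small plain-C sketch. It does not include Python.h and is not CPython's implementation: the names KIND_1BYTE, KIND_2BYTE, KIND_4BYTE, read_code_point and write_code_point are hypothetical, chosen only to mirror the "bytes per character" meaning of the kind constants.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kind constants: each value is the number
   of bytes used per character in the buffer. */
enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Read the code point at `index` from a buffer whose element width is
   selected by `kind`. Like the macro, this performs no checks at all. */
static uint32_t read_code_point(int kind, const void *data, size_t index)
{
    switch (kind) {
    case KIND_1BYTE: return ((const uint8_t *)data)[index];
    case KIND_2BYTE: return ((const uint16_t *)data)[index];
    default:         return ((const uint32_t *)data)[index];
    }
}

/* Store `value` at `index`. The caller must guarantee that the value fits
   into the element width implied by `kind`. */
static void write_code_point(int kind, void *data, size_t index, uint32_t value)
{
    switch (kind) {
    case KIND_1BYTE: ((uint8_t *)data)[index] = (uint8_t)value; break;
    case KIND_2BYTE: ((uint16_t *)data)[index] = (uint16_t)value; break;
    default:         ((uint32_t *)data)[index] = value; break;
    }
}
```

In a loop, the kind and the data pointer would be fetched once and reused for every index, which is the caching the description above asks for.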

Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index)

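Read a character from a Unicode object o, which must be in the "canonical" representation. This is less efficient than PyUnicode_READ() if you do multiple consecutive reads.

New in version 3.3.

PyUnicode_MAX_CHAR_VALUE(o)

Return the maximum code point that is suitable for creating another string based on o, which must be in the "canonical" representation. This is always an approximation but more efficient than iterating over the string.

New in version 3.3.

Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
Return the size of the deprecated Py_UNICODE representation, in code units (this includes surrogate pairs as 2 units). o has to be a Unicode object (not checked).
Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
Return the size of the deprecated Py_UNICODE representation in bytes. o has to be a Unicode object (not checked).
Py_UNICODE *PyUnicode_AS_UNICODE(PyObject *o)

const char *PyUnicode_AS_DATA(PyObject *o)

Return a pointer to a Py_UNICODE representation of the object. The returned buffer is always terminated with an extra null code point. It may also contain embedded null code points, which would cause the string to be truncated when used in most C functions. The AS_DATA form casts the pointer to const char *. The o argument has to be a Unicode object (not checked).

Changed in version 3.3: This macro is now inefficient, because in many cases the Py_UNICODE representation does not exist and needs to be created, and it can fail (return NULL with an exception set). Try to port the code to use the new PyUnicode_nBYTE_DATA() macros or use PyUnicode_WRITE() or PyUnicode_READ().

int PyUnicode_IsIdentifier(PyObject *o)

Return 1 if the string is a valid identifier according to the language definition, section Identifiers and keywords. Return 0 otherwise.

Changed in version 3.9: The function does not call Py_FatalError() anymore if the string is not ready.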


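Unicode Character Properties

Unicode provides many different character properties. The most often needed ones are available through these macros which are mapped to C functions depending on the Python configuration.

int Py_UNICODE_ISSPACE(Py_UNICODE ch)
Return 1 or 0 depending on whether ch is a whitespace character.
int Py_UNICODE_ISLOWER(Py_UNICODE ch)
Return 1 or 0 depending on whether ch is a lowercase character.
int Py_UNICODE_ISUPPER(Py_UNICODE ch)
Return 1 or 0 depending on whether ch is an uppercase character.
int Py_UNICODE_ISTITLE(Py_UNICODE ch)
Return 1 or 0 depending on whether ch is a titlecase character.
int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)
Return 1 or 0 depending on whether ch is a linebreak character.
int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)
Return 1 or 0 depending on whether ch is a decimal character.
int Py_UNICODE_ISDIGIT(Py_UNICODE ch)
Return 1 or 0 depending on whether ch is a digit character.
int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)
Return 1 or 0 depending on whether ch is a numeric character.
int Py_UNICODE_ISALPHA(Py_UNICODE ch)
Return 1 or 0 depending on whether ch is an alphabetic character.
int Py_UNICODE_ISALNUM(Py_UNICODE ch)
Return 1 or 0 depending on whether ch is an alphanumeric character.
int Py_UNICODE_ISPRINTABLE(Py_UNICODE ch)
Return 1 or 0 depending on whether ch is a printable character. Nonprintable characters are those characters defined in the Unicode character database as "Other" or "Separator", excepting the ASCII space (0x20) which is considered printable. (Note that printable characters in this context are those which should not be escaped when repr() is invoked on a string. It has no bearing on the handling of strings written to sys.stdout or sys.stderr.)

These APIs can be used for fast direct character conversions:

Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)

Return the character ch converted to lower case.

Deprecated since version 3.3: This function uses simple case mappings.

Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)

Return the character ch converted to upper case.

Deprecated since version 3.3: This function uses simple case mappings.

Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)

Return the character ch converted to title case.

Deprecated since version 3.3: This function uses simple case mappings.

int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
Return the character ch converted to a decimal positive integer. Return -1 if this is not possible. This macro does not raise exceptions.
int Py_UNICODE_TODIGIT(Py_UNICODE ch)
Return the character ch converted to a single digit integer. Return -1 if this is not possible. This macro does not raise exceptions.
double Py_UNICODE_TONUMERIC(Py_UNICODE ch)
Return the character ch converted to a double. Return -1.0 if this is not possible. This macro does not raise exceptions.

These APIs can be used to work with surrogates:

Py_UNICODE_IS_SURROGATE(ch)
Check if ch is a surrogate (0xD800 <= ch <= 0xDFFF).
Py_UNICODE_IS_HIGH_SURROGATE(ch)
Check if ch is a high surrogate (0xD800 <= ch <= 0xDBFF).
Py_UNICODE_IS_LOW_SURROGATE(ch)
Check if ch is a low surrogate (0xDC00 <= ch <= 0xDFFF).
Py_UNICODE_JOIN_SURROGATES(high, low)
Join two surrogate characters and return a single Py_UCS4 value. high and low are respectively the leading and trailing surrogates in a surrogate pair.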
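The surrogate ranges listed above, and the arithmetic that joining a pair implies, can be sketched in plain C. This is a minimal illustration using the standard UTF-16 formula; the function names are hypothetical and the code is not taken from CPython's headers.

```c
#include <stdint.h>

/* Range checks matching the ranges quoted for the surrogate macros. */
static int is_high_surrogate(uint32_t ch) { return ch >= 0xD800 && ch <= 0xDBFF; }
static int is_low_surrogate(uint32_t ch)  { return ch >= 0xDC00 && ch <= 0xDFFF; }

/* Combine a leading (high) and trailing (low) surrogate into one code
   point: each surrogate contributes its low 10 bits, and the result is
   offset by 0x10000. The caller is expected to have validated both. */
static uint32_t join_surrogates(uint32_t high, uint32_t low)
{
    return 0x10000 + (((high & 0x03FF) << 10) | (low & 0x03FF));
}
```

For example, the pair 0xD83D, 0xDE00 joins to U+1F600.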


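Creating and accessing Unicode strings

To create Unicode objects and access their basic sequence properties, use these APIs:

PyObject *PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar)

Create a new Unicode object. maxchar should be the true maximum code point to be placed in the string. As an approximation, it can be rounded up to the nearest value in the sequence 127, 255, 65535, 1114111.

This is the recommended way to allocate a new Unicode object. Objects created using this function are not resizable.

New in version 3.3.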
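The rounding rule for maxchar can be made concrete with a short plain-C sketch. The helper names round_up_maxchar and bytes_per_char are hypothetical; the thresholds come from the sequence 127, 255, 65535, 1114111 quoted above and from the special cases for code points below 128, 256 and 65536 described at the top of this page.

```c
#include <stdint.h>

/* Round a maximum code point up to the nearest value in the sequence
   127, 255, 65535, 1114111. */
static uint32_t round_up_maxchar(uint32_t maxchar)
{
    if (maxchar <= 127)   return 127;
    if (maxchar <= 255)   return 255;
    if (maxchar <= 65535) return 65535;
    return 1114111;
}

/* Number of bytes per character needed to store code points up to
   `maxchar`: 1 for the first two buckets, then 2, then 4. */
static int bytes_per_char(uint32_t maxchar)
{
    if (maxchar <= 255)   return 1;
    if (maxchar <= 65535) return 2;
    return 4;
}
```

A caller that scans its input once to find the largest code point could pass either the exact maximum or the rounded value as maxchar.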

PyObject *PyUnicode_FromKindAndData(int kind, const void *buffer, Py_ssize_t size)

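Create a new Unicode object with the given kind (possible values are PyUnicode_1BYTE_KIND etc., as returned by PyUnicode_KIND()). The buffer must point to an array of size units of 1, 2 or 4 bytes per character, as given by the kind.

New in version 3.3.

PyObject *PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)

Create a Unicode object from the char buffer u. The bytes will be interpreted as being UTF-8 encoded. The buffer is copied into the new object. If the buffer is not NULL, the return value might be a shared object, i.e. modification of the data is not allowed.

If u is NULL, this function behaves like PyUnicode_FromUnicode() with the buffer set to NULL. This usage is deprecated in favor of PyUnicode_New(), and will be removed in Python 3.12.

PyObject *PyUnicode_FromString(const char *u)
Create a Unicode object from a UTF-8 encoded null-terminated char buffer u.
PyObject *PyUnicode_FromFormat(const char *format, ...)

Take a C printf()-style format string and a variable number of arguments, calculate the size of the resulting Python Unicode string and return a string with the values formatted into it. The variable arguments must be C types and must correspond exactly to the format characters in the format ASCII-encoded string. The following format characters are allowed: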

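Format Characters   Type                     Comment
%%                  n/a                      The literal % character.
%c                  int                      A single character, represented as a C int.
%d                  int                      Equivalent to printf("%d"). [1]
%u                  unsigned int             Equivalent to printf("%u"). [1]
%ld                 long                     Equivalent to printf("%ld"). [1]
%li                 long                     Equivalent to printf("%li"). [1]
%lu                 unsigned long            Equivalent to printf("%lu"). [1]
%lld                long long                Equivalent to printf("%lld"). [1]
%lli                long long                Equivalent to printf("%lli"). [1]
%llu                unsigned long long       Equivalent to printf("%llu"). [1]
%zd                 Py_ssize_t               Equivalent to printf("%zd"). [1]
%zi                 Py_ssize_t               Equivalent to printf("%zi"). [1]
%zu                 size_t                   Equivalent to printf("%zu"). [1]
%i                  int                      Equivalent to printf("%i"). [1]
%x                  int                      Equivalent to printf("%x"). [1]
%s                  const char*              A null-terminated C character array.
%p                  const void*              The hex representation of a C pointer. Mostly equivalent to printf("%p") except that it is guaranteed to start with the literal 0x regardless of what the platform's printf yields.
%A                  PyObject*                The result of calling ascii().
%U                  PyObject*                A Unicode object.
%V                  PyObject*, const char*   A Unicode object (which may be NULL) and a null-terminated C character array as a second parameter (which will be used if the first parameter is NULL).
%S                  PyObject*                The result of calling PyObject_Str().
%R                  PyObject*                The result of calling PyObject_Repr().

An unrecognized format character causes all the rest of the format string to be copied as-is to the result string, and any extra arguments are discarded.

Note

The width formatter unit is a number of characters rather than bytes. The precision formatter unit is a number of bytes for "%s" and "%V" (if the PyObject* argument is NULL), and a number of characters for "%A", "%U", "%S", "%R" and "%V" (if the PyObject* argument is not NULL).

[1] For integer specifiers (d, u, ld, li, lu, lld, lli, llu, zd, zi, zu, i, x): the 0-conversion flag has effect even when a precision is given.

Changed in version 3.2: Support for "%lld" and "%llu" added.

Changed in version 3.3: Support for "%li", "%lli" and "%zi" added.

Changed in version 3.4: Support for width and precision formatters for "%s", "%A", "%U", "%V", "%S", "%R" added.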

PyObject *PyUnicode_FromFormatV(const char *format, va_list vargs)
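Identical to PyUnicode_FromFormat() except that it takes exactly two arguments.
PyObject *PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)

Decode an encoded object obj to a Unicode object.

bytes, bytearray and other bytes-like objects are decoded according to the given encoding and using the error handling defined by errors. Both can be NULL to have the interface use the default values (see Built-in Codecs for details).

All other objects, including Unicode objects, cause a TypeError to be set.

The API returns NULL if there was an error. The caller is responsible for decref'ing the returned objects.

Py_ssize_t PyUnicode_GetLength(PyObject *unicode)

Return the length of the Unicode object, in code points.

New in version 3.3.

Py_ssize_t PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, PyObject *from, Py_ssize_t from_start, Py_ssize_t how_many)

Copy characters from one Unicode object into another. This function performs character conversion when necessary and falls back to memcpy() if possible. Returns -1 and sets an exception on error, otherwise returns the number of copied characters.

New in version 3.3.

Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, Py_ssize_t length, Py_UCS4 fill_char)

Fill a string with a character: write fill_char into unicode[start:start+length].

Fail if fill_char is bigger than the string maximum character, or if the string has more than 1 reference.

Return the number of written characters, or return -1 and raise an exception on error.

New in version 3.3.

int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, Py_UCS4 character)

Write a character to a string. The string must have been created through PyUnicode_New(). Since Unicode strings are supposed to be immutable, the string must not be shared, or have been hashed yet.

This function checks that unicode is a Unicode object, that the index is not out of bounds, and that the object can be modified safely (i.e. that its reference count is one).

New in version 3.3.

Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)

Read a character from a string. This function checks that unicode is a Unicode object and the index is not out of bounds, in contrast to the macro version PyUnicode_READ_CHAR().

New in version 3.3.

PyObject *PyUnicode_Substring(PyObject *str, Py_ssize_t start, Py_ssize_t end)

Return a substring of str, from character index start (included) to character index end (excluded). Negative indices are not supported.

New in version 3.3.

Py_UCS4 *PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, Py_ssize_t buflen, int copy_null)

Copy the string u into a UCS4 buffer, including a null character if copy_null is set. Returns NULL and sets an exception on error (in particular, a SystemError if buflen is smaller than the length of u). buffer is returned on success.

New in version 3.3.

Py_UCS4 *PyUnicode_AsUCS4Copy(PyObject *u)

Copy the string u into a new UCS4 buffer that is allocated using PyMem_Malloc(). If this fails, NULL is returned with a MemoryError set. The returned buffer always has an extra null code point appended.

New in version 3.3.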


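Deprecated Py_UNICODE APIs

These API functions are deprecated with the implementation of PEP 393. Extension modules can continue using them, as they will not be removed in Python 3.x, but need to be aware that their use can now cause performance and memory hits.

PyObject *PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)

Create a Unicode object from the Py_UNICODE buffer u of the given size. u may be NULL, which causes the contents to be undefined. It is the user's responsibility to fill in the needed data. The buffer is copied into the new object.

If the buffer is not NULL, the return value might be a shared object. Therefore, modification of the resulting Unicode object is only allowed when u is NULL.

If the buffer is NULL, PyUnicode_READY() must be called once the string content has been filled before using any of the access macros such as PyUnicode_KIND().

Py_UNICODE *PyUnicode_AsUnicode(PyObject *unicode)
Return a read-only pointer to the Unicode object's internal Py_UNICODE buffer, or NULL on error. This will create the Py_UNICODE* representation of the object if it is not yet available. The buffer is always terminated with an extra null code point. Note that the resulting Py_UNICODE string may also contain embedded null code points, which would cause the string to be truncated when used in most C functions.
PyObject *PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)
Create a Unicode object by replacing all decimal digits in the Py_UNICODE buffer of the given size by ASCII digits 0-9 according to their decimal value. Return NULL if an exception occurs.
Py_UNICODE *PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)

Like PyUnicode_AsUnicode(), but also saves the Py_UNICODE array length (excluding the extra null terminator) in size. Note that the resulting Py_UNICODE* string may contain embedded null code points, which would cause the string to be truncated when used in most C functions.

New in version 3.3.

Py_UNICODE *PyUnicode_AsUnicodeCopy(PyObject *unicode)

Create a copy of a Unicode string ending with a null code point. Return NULL and raise a MemoryError exception on memory allocation failure, otherwise return a newly allocated buffer (use PyMem_Free() to free the buffer). Note that the resulting Py_UNICODE* string may contain embedded null code points, which would cause the string to be truncated when used in most C functions.

New in version 3.2.

Please migrate to using PyUnicode_AsUCS4Copy() or similar new APIs.

Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
Return the size of the deprecated Py_UNICODE representation, in code units (this includes surrogate pairs as 2 units).
PyObject *PyUnicode_FromObject(PyObject *obj)

Copy an instance of a Unicode subtype to a new true Unicode object if necessary. If obj is already a true Unicode object (not a subtype), return the reference with incremented refcount.

Objects other than Unicode or its subtypes will cause a TypeError.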


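Locale Encoding

The current locale encoding can be used to decode text from the operating system.

PyObject *PyUnicode_DecodeLocaleAndSize(const char *str, Py_ssize_t len, const char *errors)

Decode a string from UTF-8 on Android and VxWorks, or from the current locale encoding on other platforms. The supported error handlers are "strict" and "surrogateescape" (PEP 383). The decoder uses the "strict" error handler if errors is NULL. str must end with a null character but cannot contain embedded null characters.

Use PyUnicode_DecodeFSDefaultAndSize() to decode a string from Py_FileSystemDefaultEncoding (the locale encoding read at Python startup).

This function ignores the Python UTF-8 mode.

See also

The Py_DecodeLocale() function.

New in version 3.3.

Changed in version 3.7: The function now also uses the current locale encoding for the surrogateescape error handler, except on Android. Previously, Py_DecodeLocale() was used for surrogateescape, and the current locale encoding was used for strict.

PyObject *PyUnicode_DecodeLocale(const char *str, const char *errors)

Similar to PyUnicode_DecodeLocaleAndSize(), but compute the string length using strlen().

New in version 3.3.

PyObject *PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)

Encode a Unicode object to UTF-8 on Android and VxWorks, or to the current locale encoding on other platforms. The supported error handlers are "strict" and "surrogateescape" (PEP 383). The encoder uses the "strict" error handler if errors is NULL. Return a bytes object. unicode cannot contain embedded null characters.

Use PyUnicode_EncodeFSDefault() to encode a string to Py_FileSystemDefaultEncoding (the locale encoding read at Python startup).

This function ignores the Python UTF-8 mode.

See also

The Py_EncodeLocale() function.

New in version 3.3.

Changed in version 3.7: The function now also uses the current locale encoding for the surrogateescape error handler, except on Android. Previously, Py_EncodeLocale() was used for surrogateescape, and the current locale encoding was used for strict.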


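File System Encoding

To encode and decode file names and other environment strings, Py_FileSystemDefaultEncoding should be used as the encoding, and Py_FileSystemDefaultEncodeErrors should be used as the error handler (PEP 383 and PEP 529). To encode file names to bytes during argument parsing, the "O&" converter should be used, passing PyUnicode_FSConverter() as the conversion function:

int PyUnicode_FSConverter(PyObject *obj, void *result)

ParseTuple converter: encode str objects, obtained directly or through the os.PathLike interface, to bytes using PyUnicode_EncodeFSDefault(); bytes objects are output as-is. result must be a PyBytesObject* which must be released when it is no longer used.

New in version 3.1.

Changed in version 3.6: Accepts a path-like object.

To decode file names to str during argument parsing, the "O&" converter should be used, passing PyUnicode_FSDecoder() as the conversion function:

int PyUnicode_FSDecoder(PyObject *obj, void *result)

ParseTuple converter: decode bytes objects, obtained either directly or indirectly through the os.PathLike interface, to str using PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is. result must be a PyUnicodeObject* which must be released when it is no longer used.

New in version 3.2.

Changed in version 3.6: Accepts a path-like object.

PyObject *PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)

Decode a string using Py_FileSystemDefaultEncoding and the Py_FileSystemDefaultEncodeErrors error handler.

If Py_FileSystemDefaultEncoding is not set, fall back to the locale encoding.

Py_FileSystemDefaultEncoding is initialized at startup from the locale encoding and cannot be modified later. If you need to decode a string from the current locale encoding, use PyUnicode_DecodeLocaleAndSize().

See also

The Py_DecodeLocale() function.

Changed in version 3.6: Use the Py_FileSystemDefaultEncodeErrors error handler.

PyObject *PyUnicode_DecodeFSDefault(const char *s)

Decode a null-terminated string using Py_FileSystemDefaultEncoding and the Py_FileSystemDefaultEncodeErrors error handler.

If Py_FileSystemDefaultEncoding is not set, fall back to the locale encoding.

Use PyUnicode_DecodeFSDefaultAndSize() if you know the string length.

Changed in version 3.6: Use the Py_FileSystemDefaultEncodeErrors error handler.

PyObject *PyUnicode_EncodeFSDefault(PyObject *unicode)

Encode a Unicode object to Py_FileSystemDefaultEncoding with the Py_FileSystemDefaultEncodeErrors error handler, and return bytes. Note that the resulting bytes object may contain null bytes.

If Py_FileSystemDefaultEncoding is not set, fall back to the locale encoding.

Py_FileSystemDefaultEncoding is initialized at startup from the locale encoding and cannot be modified later. If you need to encode a string to the current locale encoding, use PyUnicode_EncodeLocale().

See also

The Py_EncodeLocale() function.

New in version 3.2.

Changed in version 3.6: Use the Py_FileSystemDefaultEncodeErrors error handler.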


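wchar_t Support

wchar_t support for platforms which support it:

PyObject *PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
Create a Unicode object from the wchar_t buffer w of the given size. Passing -1 as the size indicates that the function must itself compute the length, using wcslen. Return NULL on failure.
Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *w, Py_ssize_t size)
Copy the Unicode object contents into the wchar_t buffer w. At most size wchar_t characters are copied (excluding a possibly trailing null termination character). Return the number of wchar_t characters copied, or -1 in case of an error. Note that the resulting wchar_t* string may or may not be null-terminated. It is the responsibility of the caller to make sure that the wchar_t* string is null-terminated in case this is required by the application. Also, note that the wchar_t* string might contain null characters, which would cause the string to be truncated when used with most C functions.
wchar_t *PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)

Convert the Unicode object to a wide character string. The output string always ends with a null character. If size is not NULL, write the number of wide characters (excluding the trailing null termination character) into *size. Note that the resulting wchar_t string might contain null characters, which would cause the string to be truncated when used with most C functions. If size is NULL and the wchar_t* string contains null characters, a ValueError is raised.

On success, returns a buffer allocated with the PyMem allocator (use PyMem_Free() to free it). On error, returns NULL and *size is undefined. Raises a MemoryError if memory allocation fails.

New in version 3.2.

Changed in version 3.7: Raises a ValueError if size is NULL and the wchar_t* string contains null characters.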


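Built-in Codecs

Python provides a set of built-in codecs which are written in C for speed. All of these codecs are directly usable via the following functions.

Many of the following APIs take two arguments, encoding and errors, and they have the same semantics as the ones of the built-in str() string object constructor.

Setting encoding to NULL causes the default encoding to be used, which is UTF-8. The file system calls should use PyUnicode_FSConverter() for encoding file names. This uses the variable Py_FileSystemDefaultEncoding internally. This variable should be treated as read-only: on some systems, it will be a pointer to a static string, on others, it will change at run-time (such as when the application invokes setlocale).

Error handling is set by errors, which may also be set to NULL meaning to use the default handling defined for the codec. Default error handling for all built-in codecs is "strict" (ValueError is raised).

The codecs all use a similar interface. Only deviations from the following generic ones are documented for simplicity.

Generic Codecs

These are the generic codec APIs:

PyObject *PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
Create a Unicode object by decoding size bytes of the encoded string s. encoding and errors have the same meaning as the parameters of the same name in the str() built-in function. The codec to be used is looked up using the Python codec registry. Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)
Encode a Unicode object and return the result as a Python bytes object. encoding and errors have the same meaning as the parameters of the same name in the Unicode encode() method. The codec to be used is looked up using the Python codec registry. Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)
Encode the Py_UNICODE buffer s of the given size and return a Python bytes object. encoding and errors have the same meaning as the parameters of the same name in the Unicode encode() method. The codec to be used is looked up using the Python codec registry. Return NULL if an exception was raised by the codec.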


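UTF-8 Codecs

These are the UTF-8 codec APIs:

PyObject *PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
Create a Unicode object by decoding size bytes of the UTF-8 encoded string s. Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
If consumed is NULL, behave like PyUnicode_DecodeUTF8(). If consumed is not NULL, trailing incomplete UTF-8 byte sequences will not be treated as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in consumed.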
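To make the idea of a "trailing incomplete UTF-8 byte sequence" concrete, the following plain-C sketch computes how many leading bytes of a buffer form complete sequences, which is the kind of value a stateful decoder would report through consumed. It is an illustration only: the function name utf8_complete_prefix is hypothetical, the code does not validate the input, and it is not CPython's decoder.

```c
#include <stddef.h>

/* Return the number of leading bytes of s[0..size) that do not end in the
   middle of a multi-byte UTF-8 sequence. Invalid input is not diagnosed
   here; the whole size is returned and left for a real decoder. */
static size_t utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t i = size;   /* index just past the byte being inspected */
    size_t back = 0;   /* how many bytes we have stepped back */

    while (i > 0 && back < 4) {
        unsigned char b = s[i - 1];
        i--;
        back++;
        if ((b & 0xC0) != 0x80) {          /* ASCII or a lead byte */
            size_t need;
            if (b < 0x80)                need = 1;
            else if ((b & 0xE0) == 0xC0) need = 2;
            else if ((b & 0xF0) == 0xE0) need = 3;
            else if ((b & 0xF8) == 0xF0) need = 4;
            else return size;              /* invalid lead byte */
            /* `back` bytes are present from the lead byte to the end. */
            return (back < need) ? i : size;
        }
    }
    return size;
}
```

A streaming caller would decode the first utf8_complete_prefix(s, size) bytes and carry the remainder over to the next chunk.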
PyObject *PyUnicode_AsUTF8String(PyObject *unicode)
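Encode a Unicode object using UTF-8 and return the result as a Python bytes object. Error handling is "strict". Return NULL if an exception was raised by the codec.
const char *PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)

Return a pointer to the UTF-8 encoding of the Unicode object, and store the size of the encoded representation (in bytes) in size. The size argument can be NULL; in this case no size will be stored. The returned buffer always has an extra null byte appended (not included in size), regardless of whether there are any other null code points.

In the case of an error, NULL is returned with an exception set and no size is stored.

This caches the UTF-8 representation of the string in the Unicode object, and subsequent calls will return a pointer to the same buffer. The caller is not responsible for deallocating the buffer.

New in version 3.3.

Changed in version 3.7: The return type is now const char * rather than char *.

const char *PyUnicode_AsUTF8(PyObject *unicode)

As PyUnicode_AsUTF8AndSize(), but does not store the size.

New in version 3.3.

Changed in version 3.7: The return type is now const char * rather than char *.

PyObject *PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Encode the Py_UNICODE buffer s of the given size using UTF-8 and return a Python bytes object. Return NULL if an exception was raised by the codec.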


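UTF-32 Codecs

These are the UTF-32 codec APIs:

PyObject *PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

Decode size bytes from a UTF-32 encoded buffer string and return the corresponding Unicode object. errors (if non-NULL) defines the error handling. It defaults to "strict".

If byteorder is non-NULL, the decoder starts decoding using the given byte order:

*byteorder == -1: little endian
*byteorder == 0:  native order
*byteorder == 1:  big endian

If *byteorder is zero, and the first four bytes of the input data are a byte order mark (BOM), the decoder switches to this byte order and the BOM is not copied into the resulting Unicode string. If *byteorder is -1 or 1, any byte order mark is copied to the output.

After completion, *byteorder is set to the current byte order at the end of input data.

If byteorder is NULL, the codec starts in native order mode.

Return NULL if an exception was raised by the codec.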
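The BOM-driven switch described for *byteorder == 0 can be sketched in plain C. The helper name utf32_bom_byteorder is hypothetical; it only inspects the first four bytes and reports the byte order using the same -1 / 0 / 1 convention as the byteorder argument. It is an illustration, not CPython's decoder.

```c
#include <stddef.h>

/* Inspect the first four bytes of a UTF-32 buffer.
   Returns -1 if they are a little-endian BOM (FF FE 00 00),
            1 if they are a big-endian BOM (00 00 FE FF),
            0 if no BOM is present (or fewer than four bytes are given). */
static int utf32_bom_byteorder(const unsigned char *s, size_t size)
{
    if (size < 4)
        return 0;
    if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00)
        return -1;
    if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF)
        return 1;
    return 0;
}
```

When a BOM is found, a decoder working in this mode would skip those four bytes and decode the rest in the detected order.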

PyObject *PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
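If consumed is NULL, behave like PyUnicode_DecodeUTF32(). If consumed is not NULL, PyUnicode_DecodeUTF32Stateful() will not treat trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible by four) as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in consumed.
PyObject *PyUnicode_AsUTF32String(PyObject *unicode)
Return a Python byte string using the UTF-32 encoding in native byte order. The string always starts with a BOM mark. Error handling is "strict". Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

Return a Python bytes object holding the UTF-32 encoded value of the Unicode data in s. Output is written according to the following byte order:

byteorder == -1: little endian
byteorder == 0:  native byte order (writes a BOM mark)
byteorder == 1:  big endian

If byteorder is 0, the output string will always start with the Unicode BOM mark (U+FEFF). In the other two modes, no BOM mark is prepended.

If Py_UNICODE_WIDE is not defined, surrogate pairs will be output as a single code point.

Return NULL if an exception was raised by the codec.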


UTF-16 编解码器

这些是 UTF-16 编解码器 API:

PyObject *PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

从 UTF-16 编码的缓冲区字符串中解码 size 字节并返回相应的 Unicode 对象。 errors(如果非NULL)定义错误处理。 它默认为“严格”。

如果 byteorder 是非 NULL,解码器开始使用给定的字节顺序解码:

*byteorder == -1: little endian
*byteorder == 0:  native order
*byteorder == 1:  big endian

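If *byteorder is zero and the first two bytes of the input data are a byte order mark (BOM), the decoder switches to this byte order and the BOM is not copied into the resulting Unicode string. If *byteorder is -1 or 1, any byte order mark is copied to the output (where it will result in either a \ufeff or a \ufffe character).

After completion, *byteorder is set to the current byte order at the end of the input data.

If byteorder is NULL, the codec starts in native order mode.

Return NULL if an exception was raised by the codec.

PyObject *PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
If consumed is NULL, behave like PyUnicode_DecodeUTF16(). If consumed is not NULL, PyUnicode_DecodeUTF16Stateful() will not treat trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a split surrogate pair) as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in consumed.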
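To illustrate how consumed supports chunked input, here is a small sketch, again assuming CPython and `ctypes.pythonapi`. The buffer deliberately ends with one dangling byte; the names `chunk` and `leftover` are hypothetical.

```python
import ctypes

decode_stateful = ctypes.pythonapi.PyUnicode_DecodeUTF16Stateful
decode_stateful.restype = ctypes.py_object
decode_stateful.argtypes = [
    ctypes.c_char_p,                   # const char *s
    ctypes.c_ssize_t,                  # Py_ssize_t size
    ctypes.c_char_p,                   # const char *errors
    ctypes.POINTER(ctypes.c_int),      # int *byteorder
    ctypes.POINTER(ctypes.c_ssize_t),  # Py_ssize_t *consumed
]

# "hi" in little-endian UTF-16 plus the first byte of a following code unit.
chunk = "hi".encode("utf-16-le") + b"\x21"

order = ctypes.c_int(-1)        # little endian, no BOM expected
consumed = ctypes.c_ssize_t(0)
text = decode_stateful(chunk, len(chunk), None,
                       ctypes.byref(order), ctypes.byref(consumed))

# Bytes that were not decoded should be carried over to the next call.
leftover = chunk[consumed.value:]
```

A streaming caller would prepend `leftover` to the next chunk before decoding again.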
PyObject *PyUnicode_AsUTF16String(PyObject *unicode)
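Return a Python bytes object using the UTF-16 encoding in native byte order. The string always starts with a BOM mark. Error handling is "strict". Return NULL if an exception was raised by the codec.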
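As a quick check of the "always starts with a BOM" statement, the sketch below (CPython with `ctypes.pythonapi` assumed) encodes a one-character string and inspects the first two bytes. Which BOM bytes appear depends on the native byte order of the machine.

```python
import ctypes
import sys

as_utf16 = ctypes.pythonapi.PyUnicode_AsUTF16String
as_utf16.restype = ctypes.py_object
as_utf16.argtypes = [ctypes.py_object]  # PyObject *unicode

encoded = as_utf16("A")
bom = encoded[:2]

# U+FEFF serialized in the native byte order of this machine.
expected_bom = b"\xff\xfe" if sys.byteorder == "little" else b"\xfe\xff"
```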
PyObject *PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

Return a Python bytes object holding the UTF-16 encoded value of the Unicode data in s. Output is written according to the following byte order:

byteorder == -1: little endian
byteorder == 0:  native byte order (writes a BOM mark)
byteorder == 1:  big endian

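If byteorder is 0, the output string will always start with the Unicode BOM mark (U+FEFF). In the other two modes, no BOM mark is prepended.

If Py_UNICODE_WIDE is defined, a single Py_UNICODE value may get represented as a surrogate pair. If it is not defined, each Py_UNICODE value is interpreted as a UCS-2 character.

Return NULL if an exception was raised by the codec.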


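UTF-7 Codecs

These are the UTF-7 codec APIs:

PyObject *PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)
Create a Unicode object by decoding size bytes of the UTF-7 encoded string s. Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
If consumed is NULL, behave like PyUnicode_DecodeUTF7(). If consumed is not NULL, trailing incomplete UTF-7 base-64 sections will not be treated as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in consumed.
PyObject *PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)

Encode the Py_UNICODE buffer of the given size using UTF-7 and return a Python bytes object. Return NULL if an exception was raised by the codec.

If base64SetO is nonzero, "Set O" (punctuation that has no otherwise special meaning) will be encoded in base-64. If base64WhiteSpace is nonzero, whitespace will be encoded in base-64. Both are set to zero for the Python "utf-7" codec.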


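Unicode-Escape Codecs

These are the "Unicode Escape" codec APIs:

PyObject *PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
Create a Unicode object by decoding size bytes of the Unicode-Escape encoded string s. Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
Encode a Unicode object using Unicode-Escape and return the result as a bytes object. Error handling is "strict". Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
Encode the Py_UNICODE buffer of the given size using Unicode-Escape and return a bytes object. Return NULL if an exception was raised by the codec.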


Raw-Unicode-Escape Codecs

These are the "Raw Unicode Escape" codec APIs:

PyObject *PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
Create a Unicode object by decoding size bytes of the Raw-Unicode-Escape encoded string s. Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
Encode a Unicode object using Raw-Unicode-Escape and return the result as a bytes object. Error handling is "strict". Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
Encode the Py_UNICODE buffer of the given size using Raw-Unicode-Escape and return a bytes object. Return NULL if an exception was raised by the codec.


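Latin-1 Codecs

These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode ordinals, and only these are accepted by the codecs during encoding.

PyObject *PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
Create a Unicode object by decoding size bytes of the Latin-1 encoded string s. Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_AsLatin1String(PyObject *unicode)
Encode a Unicode object using Latin-1 and return the result as a Python bytes object. Error handling is "strict". Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Encode the Py_UNICODE buffer of the given size using Latin-1 and return a Python bytes object. Return NULL if an exception was raised by the codec.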


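ASCII Codecs

These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other codes generate errors.

PyObject *PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
Create a Unicode object by decoding size bytes of the ASCII encoded string s. Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_AsASCIIString(PyObject *unicode)
Encode a Unicode object using ASCII and return the result as a Python bytes object. Error handling is "strict". Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Encode the Py_UNICODE buffer of the given size using ASCII and return a Python bytes object. Return NULL if an exception was raised by the codec.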


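Character Map Codecs

This codec is special in that it can be used to implement many different codecs (and this is in fact what was done to obtain most of the standard codecs included in the encodings package). The codec uses mappings to encode and decode characters. The mapping objects provided must support the __getitem__() mapping interface; dictionaries and sequences work well.

These are the mapping codec APIs: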

PyObject *PyUnicode_DecodeCharmap(const char *data, Py_ssize_t size, PyObject *mapping, const char *errors)

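Create a Unicode object by decoding size bytes of the encoded string data using the given mapping object. Return NULL if an exception was raised by the codec.

If mapping is NULL, Latin-1 decoding will be applied. Otherwise mapping must map byte ordinals (integers in the range from 0 to 255) to Unicode strings, integers (which are then interpreted as Unicode ordinals) or None. Unmapped data bytes (ones which cause a LookupError), as well as ones which get mapped to None, 0xFFFE or '\ufffe', are treated as undefined mappings and cause an error.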
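The three kinds of mapping values can be seen in a short sketch. It assumes CPython and `ctypes.pythonapi`; the dictionary `mapping` is a made-up example table, not one of the standard codecs.

```python
import ctypes

decode_charmap = ctypes.pythonapi.PyUnicode_DecodeCharmap
decode_charmap.restype = ctypes.py_object
decode_charmap.argtypes = [
    ctypes.c_char_p,   # const char *data
    ctypes.c_ssize_t,  # Py_ssize_t size
    ctypes.py_object,  # PyObject *mapping
    ctypes.c_char_p,   # const char *errors (None is passed as NULL)
]

# Hypothetical table: byte ordinal -> str, Unicode ordinal, or None.
mapping = {
    0x41: "a",    # mapped to a Unicode string
    0x42: 0x3B2,  # mapped to an integer, read as the ordinal of GREEK SMALL LETTER BETA
    0x43: None,   # explicitly undefined
}

text = decode_charmap(b"AB", 2, mapping, None)

# A byte mapped to None is an undefined mapping; with the default
# "strict" handling the C function returns NULL with an exception set,
# which ctypes re-raises here.
try:
    decode_charmap(b"C", 1, mapping, None)
    undefined_raises = False
except UnicodeDecodeError:
    undefined_raises = True
```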

PyObject *PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)

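Encode a Unicode object using the given mapping object and return the result as a bytes object. Error handling is "strict". Return NULL if an exception was raised by the codec.

The mapping object must map Unicode ordinal integers to bytes objects, integers in the range from 0 to 255, or None. Unmapped character ordinals (ones which cause a LookupError) as well as those mapped to None are treated as "undefined mapping" and cause an error.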
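The encoding direction works the same way with the roles reversed. The sketch below assumes CPython and `ctypes.pythonapi`, and uses a hypothetical two-entry table to show an integer target, a bytes target, and an unmapped character.

```python
import ctypes

as_charmap = ctypes.pythonapi.PyUnicode_AsCharmapString
as_charmap.restype = ctypes.py_object
as_charmap.argtypes = [ctypes.py_object, ctypes.py_object]  # unicode, mapping

# Hypothetical table: Unicode ordinal -> byte value (0..255) or bytes object.
mapping = {
    ord("a"): 0x41,    # single byte given as an integer
    ord("b"): b"<B>",  # several bytes given as a bytes object
}

encoded = as_charmap("ab", mapping)

# "c" is not in the table: the lookup fails, the mapping is undefined,
# and strict error handling reports an encoding error.
try:
    as_charmap("c", mapping)
    unmapped_raises = False
except UnicodeEncodeError:
    unmapped_raises = True
```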

PyObject *PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)
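Encode the Py_UNICODE buffer of the given size using the given mapping object and return the result as a bytes object. Return NULL if an exception was raised by the codec.

The following codec API is special in that it maps Unicode to Unicode.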

PyObject *PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)

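Translate a string by applying a character mapping table to it and return the resulting Unicode object. Return NULL if an exception was raised by the codec.

The mapping table must map Unicode ordinal integers to Unicode ordinal integers or None (causing deletion of the character).

Mapping tables need only provide the __getitem__() interface; dictionaries and sequences work well. Unmapped character ordinals (ones which cause a LookupError) are left untouched and are copied as-is.

errors has the usual meaning for codecs. It may be NULL, which indicates that the default error handling should be used.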

PyObject *PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)
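Translate a Py_UNICODE buffer of the given size by applying a character mapping table to it and return the resulting Unicode object. Return NULL when an exception was raised by the codec.


MBCS codecs for Windows

These are the MBCS codec APIs. They are currently only available on Windows and use the Win32 MBCS converters to implement the conversions. Note that MBCS (or DBCS) is a class of encodings, not just one. The target encoding is defined by the user settings on the machine running the codec.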

PyObject *PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
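Create a Unicode object by decoding size bytes of the MBCS encoded string s. Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_DecodeMBCSStateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
If consumed is NULL, behave like PyUnicode_DecodeMBCS(). If consumed is not NULL, PyUnicode_DecodeMBCSStateful() will not decode a trailing lead byte, and the number of bytes that have been decoded will be stored in consumed.
PyObject *PyUnicode_AsMBCSString(PyObject *unicode)
Encode a Unicode object using MBCS and return the result as a Python bytes object. Error handling is "strict". Return NULL if an exception was raised by the codec.
PyObject *PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors)

Encode the Unicode object using the specified code page and return a Python bytes object. Return NULL if an exception was raised by the codec. Use the CP_ACP code page to get the MBCS encoder.

New in version 3.3.

PyObject *PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Encode the Py_UNICODE buffer of the given size using MBCS and return a Python bytes object. Return NULL if an exception was raised by the codec.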


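Methods and Slots

Methods and Slot Functions

The following APIs are capable of handling Unicode objects and strings on input (we refer to them as strings in the descriptions) and return Unicode objects or integers as appropriate.

They all return NULL or -1 if an exception occurs.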

PyObject *PyUnicode_Concat(PyObject *left, PyObject *right)
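Concatenate two strings, giving a new Unicode string.
PyObject *PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)
Split a string, giving a list of Unicode strings. If sep is NULL, splitting will be done at all whitespace substrings. Otherwise, splits occur at the given separator. At most maxsplit splits will be done. If negative, no limit is set. Separators are not included in the resulting list.
PyObject *PyUnicode_Splitlines(PyObject *s, int keepend)
Split a Unicode string at line breaks, returning a list of Unicode strings. CRLF is considered to be one line break. If keepend is 0, the line break characters are not included in the resulting strings.
PyObject *PyUnicode_Join(PyObject *separator, PyObject *seq)
Join a sequence of strings using the given separator and return the resulting Unicode string.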
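The effect of maxsplit, and the round trip through PyUnicode_Join(), can be sketched as follows. This assumes CPython and `ctypes.pythonapi`; an explicit separator is used because passing a NULL sep is awkward from ctypes.

```python
import ctypes

split = ctypes.pythonapi.PyUnicode_Split
split.restype = ctypes.py_object
split.argtypes = [ctypes.py_object, ctypes.py_object, ctypes.c_ssize_t]

join = ctypes.pythonapi.PyUnicode_Join
join.restype = ctypes.py_object
join.argtypes = [ctypes.py_object, ctypes.py_object]

limited = split("a,b,c,d", ",", 2)     # at most two splits
unlimited = split("a,b,c,d", ",", -1)  # negative: no limit
rejoined = join("-", unlimited)
```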
Py_ssize_t PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
Return 1 if substr matches str[start:end] at the given tail end (direction == -1 means to do a prefix match, direction == 1 a suffix match), 0 otherwise. Return -1 if an error occurred.
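The direction argument is easy to get backwards, so here is a brief sketch of both cases, assuming CPython and `ctypes.pythonapi`; the sample string is arbitrary.

```python
import ctypes

tailmatch = ctypes.pythonapi.PyUnicode_Tailmatch
tailmatch.restype = ctypes.c_ssize_t
tailmatch.argtypes = [
    ctypes.py_object,  # PyObject *str
    ctypes.py_object,  # PyObject *substr
    ctypes.c_ssize_t,  # Py_ssize_t start
    ctypes.c_ssize_t,  # Py_ssize_t end
    ctypes.c_int,      # int direction
]

name = "unicode.c"
is_prefix = tailmatch(name, "uni", 0, len(name), -1)  # prefix match
is_suffix = tailmatch(name, ".c", 0, len(name), 1)    # suffix match
no_match = tailmatch(name, ".h", 0, len(name), 1)
```

These calls correspond to `str.startswith()` and `str.endswith()` at the Python level.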
Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
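Return the first position of substr in str[start:end] using the given direction (direction == 1 means to do a forward search, direction == -1 a backward search). The return value is the index of the first match; a value of -1 indicates that no match was found, and -2 indicates that an error occurred and an exception has been set.
Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, Py_ssize_t start, Py_ssize_t end, int direction)

Return the first position of the character ch in str[start:end] using the given direction (direction == 1 means to do a forward search, direction == -1 a backward search). The return value is the index of the first match; a value of -1 indicates that no match was found, and -2 indicates that an error occurred and an exception has been set.

New in version 3.3.

Changed in version 3.7: start and end are now adjusted to behave like str[start:end].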
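A short sketch of forward and backward searches with both functions, assuming CPython and `ctypes.pythonapi`. Py_UCS4 is bound as a 32-bit unsigned integer here.

```python
import ctypes

find = ctypes.pythonapi.PyUnicode_Find
find.restype = ctypes.c_ssize_t
find.argtypes = [ctypes.py_object, ctypes.py_object,
                 ctypes.c_ssize_t, ctypes.c_ssize_t, ctypes.c_int]

find_char = ctypes.pythonapi.PyUnicode_FindChar
find_char.restype = ctypes.c_ssize_t
find_char.argtypes = [ctypes.py_object, ctypes.c_uint32,  # Py_UCS4 ch
                      ctypes.c_ssize_t, ctypes.c_ssize_t, ctypes.c_int]

path = "a.b.c"
first = find(path, ".", 0, len(path), 1)     # forward search
last = find(path, ".", 0, len(path), -1)     # backward search
missing = find(path, "x", 0, len(path), 1)   # no match
last_char = find_char(path, ord("."), 0, len(path), -1)
```

In C, a caller would treat -1 as "not found" and -2 as "error, exception set", and handle the two cases separately.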

Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)
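Return the number of non-overlapping occurrences of substr in str[start:end]. Return -1 if an error occurred.
PyObject *PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)
Replace at most maxcount occurrences of substr in str with replstr and return the resulting Unicode object. maxcount == -1 means replace all occurrences.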
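"Non-overlapping" and the maxcount limit are easiest to see on a concrete string. The sketch below assumes CPython and `ctypes.pythonapi`; the sample word is arbitrary.

```python
import ctypes

count = ctypes.pythonapi.PyUnicode_Count
count.restype = ctypes.c_ssize_t
count.argtypes = [ctypes.py_object, ctypes.py_object,
                  ctypes.c_ssize_t, ctypes.c_ssize_t]

replace = ctypes.pythonapi.PyUnicode_Replace
replace.restype = ctypes.py_object
replace.argtypes = [ctypes.py_object, ctypes.py_object,
                    ctypes.py_object, ctypes.c_ssize_t]

word = "banana"

# "ana" occurs at index 1 and index 3, but the two occurrences overlap,
# so only one is counted.
occurrences = count(word, "ana", 0, len(word))

first_two = replace(word, "a", "o", 2)   # replace at most two occurrences
every_one = replace(word, "a", "o", -1)  # -1: replace all occurrences
```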
int PyUnicode_Compare(PyObject *left, PyObject *right)

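Compare two strings and return -1, 0, 1 for less than, equal, and greater than, respectively.

This function returns -1 upon failure, so one should call PyErr_Occurred() to check for errors.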
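Because -1 is both a valid result and the failure value, the error check matters. In the sketch below (CPython and `ctypes.pythonapi` assumed), ctypes performs the equivalent of the PyErr_Occurred() check after each call and re-raises any pending exception; a C caller has to do that check explicitly.

```python
import ctypes

compare = ctypes.pythonapi.PyUnicode_Compare
compare.restype = ctypes.c_int
compare.argtypes = [ctypes.py_object, ctypes.py_object]

less = compare("apple", "banana")
equal = compare("apple", "apple")
greater = compare("banana", "apple")

# Passing a non-string makes the C function return -1 with an exception
# set; ctypes.pythonapi surfaces that exception instead of the -1.
try:
    compare("apple", 1)
    failure_raises = False
except TypeError:
    failure_raises = True
```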

int PyUnicode_CompareWithASCIIString(PyObject *uni, const char *string)

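Compare a Unicode object, uni, with string and return -1, 0, 1 for less than, equal, and greater than, respectively. It is best to pass only ASCII-encoded strings, but the function interprets the input string as ISO-8859-1 if it contains non-ASCII characters.

This function does not raise exceptions.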

PyObject *PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

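Rich compare two Unicode strings and return one of the following:

  • NULL in case an exception was raised

  • Py_True or Py_False for successful comparisons

  • Py_NotImplemented in case the type combination is unknown

Possible values for op are Py_GT, Py_GE, Py_EQ, Py_NE, Py_LT, and Py_LE.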

PyObject *PyUnicode_Format(PyObject *format, PyObject *args)
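Return a new string object from format and args; this is analogous to format % args.
int PyUnicode_Contains(PyObject *container, PyObject *element)

Check whether element is contained in container and return true or false accordingly.

element has to coerce to a one element Unicode string. -1 is returned if there was an error.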

void PyUnicode_InternInPlace(PyObject **string)
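Intern the argument *string in place. The argument must be the address of a pointer variable pointing to a Python Unicode string object. If there is an existing interned string that is the same as *string, it sets *string to it (decrementing the reference count of the old string object and incrementing the reference count of the interned string object); otherwise it leaves *string alone and interns it (incrementing its reference count). (Clarification: even though there is a lot of talk about reference counts, think of this function as reference-count-neutral; you own the object after the call if and only if you owned it before the call.)
PyObject *PyUnicode_InternFromString(const char *v)
A combination of PyUnicode_FromString() and PyUnicode_InternInPlace(), returning either a new Unicode string object that has been interned, or a new ("owned") reference to an earlier interned string object with the same value.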
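The practical consequence of interning is that equal strings become the same object. A minimal sketch, assuming CPython and `ctypes.pythonapi`; the key text is a made-up example.

```python
import ctypes

intern_from_string = ctypes.pythonapi.PyUnicode_InternFromString
intern_from_string.restype = ctypes.py_object
intern_from_string.argtypes = [ctypes.c_char_p]  # const char *v (UTF-8, NUL-terminated)

first = intern_from_string(b"hypothetical_key")
second = intern_from_string(b"hypothetical_key")

# Both calls hand back a reference to the one interned object.
same_object = first is second
```

Interned strings can therefore be compared by pointer identity, which is why they are commonly used for attribute and dictionary key names.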