.. -*- mode: rst -*-

==============
Builtin Tokens
==============

.. module:: pygments.token

In the :mod:`pygments.token` module, there is a special object called `Token`
that is used to create token types.

You can create a new token type by accessing an attribute of `Token` whose
name starts with an uppercase letter:

.. sourcecode:: pycon

    >>> from pygments.token import Token
    >>> Token.String
    Token.String
    >>> Token.String is Token.String
    True

Note that tokens are singletons so you can use the ``is`` operator for comparing
token types.

You can also use the ``in`` operator to perform set tests:

.. sourcecode:: pycon

    >>> from pygments.token import Comment
    >>> Comment.Single in Comment
    True
    >>> Comment in Comment.Multi
    False

This can be useful in :doc:`filters <filters>` and if you write lexers on your
own without using the base lexers.
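
For example, a filter can use such a set test to match a whole family of token
types at once. The following is a minimal sketch (the filter itself is made up
for illustration):

.. sourcecode:: python

    from pygments.filter import Filter
    from pygments.token import Comment

    class UppercaseCommentsFilter(Filter):
        """Uppercase the text of every comment token."""

        def filter(self, lexer, stream):
            for ttype, value in stream:
                # The ``in`` test matches `Comment` and all of its
                # subtypes (`Comment.Single`, `Comment.Multiline`, ...).
                if ttype in Comment:
                    yield ttype, value.upper()
                else:
                    yield ttype, value

Such a filter can then be attached to a lexer with its ``add_filter()`` method.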

You can also split a token type into a hierarchy, and get the parent of it:

.. sourcecode:: pycon

    >>> from pygments.token import String
    >>> String.split()
    [Token, Token.Literal, Token.Literal.String]
    >>> String.parent
    Token.Literal

In principle, you can create an unlimited number of token types, but nobody can
guarantee that a style will define style rules for every token type. Because of
that, Pygments proposes some global token types defined in the
`pygments.token.STANDARD_TYPES` dict.
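
This dict maps each standard token type to a short name that formatters can
use, for instance as a CSS class in HTML output:

.. sourcecode:: pycon

    >>> from pygments.token import STANDARD_TYPES, Comment, Keyword
    >>> STANDARD_TYPES[Comment]
    'c'
    >>> STANDARD_TYPES[Keyword]
    'k'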

For some tokens, aliases are already defined:

.. sourcecode:: pycon

    >>> from pygments.token import String
    >>> String
    Token.Literal.String

Inside the :mod:`pygments.token` module the following aliases are defined:

============= ============================ ====================================
`Text`        `Token.Text`                 for any type of text data
`Whitespace`  `Token.Text.Whitespace`      for whitespace
`Error`       `Token.Error`                represents lexer errors
`Other`       `Token.Other`                special token for data not
                                           matched by a parser (e.g. HTML
                                           markup in PHP code)
`Keyword`     `Token.Keyword`              any kind of keywords
`Name`        `Token.Name`                 variable/function names
`Literal`     `Token.Literal`              any literals
`String`      `Token.Literal.String`       string literals
`Number`      `Token.Literal.Number`       number literals
`Operator`    `Token.Operator`             operators (``+``, ``not``...)
`Punctuation` `Token.Punctuation`          punctuation (``[``, ``(``...)
`Comment`     `Token.Comment`              any kind of comments
`Generic`     `Token.Generic`              generic tokens (have a look at
                                           the explanation below)
============= ============================ ====================================

Normally you just create token types using the already defined aliases. For each
of those token aliases, a number of subtypes exist (excluding the special tokens
`Token.Text`, `Token.Error` and `Token.Other`).
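
Accessing an attribute of an alias creates the corresponding subtype, just as
with `Token` itself:

.. sourcecode:: pycon

    >>> from pygments.token import Name
    >>> Name.Function
    Token.Name.Function
    >>> Name.Function in Name
    True
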
It's also possible to convert strings to token types (for example
if you want to supply a token from the command line):

.. sourcecode:: pycon

    >>> from pygments.token import String, string_to_tokentype
    >>> string_to_tokentype("String")
    Token.Literal.String
    >>> string_to_tokentype("Token.Literal.String")
    Token.Literal.String
    >>> string_to_tokentype(String)
    Token.Literal.String

Keyword Tokens
==============

`Keyword`
    For any kind of keyword (especially if it doesn't match any of the
    subtypes of course).

`Keyword.Constant`
    For keywords that are constants (e.g. ``None`` in Python 3).

`Keyword.Declaration`
    For keywords used for variable declaration (e.g. ``var`` in some programming
    languages like JavaScript).

`Keyword.Namespace`
    For keywords used for namespace declarations (e.g. ``import`` in Python and
    Java, and ``package`` in Java).

`Keyword.Pseudo`
    For keywords that aren't really keywords (e.g. ``None`` in Python 2).

`Keyword.Reserved`
    For reserved keywords.

`Keyword.Type`
    For builtin types that can't be used as identifiers (e.g. ``int``,
    ``char`` etc. in C).
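
In a lexer, these token types are attached to regex rules. Here is a minimal
sketch using the `RegexLexer` base class (the toy language and its keyword
lists are made up):

.. sourcecode:: python

    from pygments.lexer import RegexLexer, words
    from pygments.token import Keyword, Name, Whitespace

    class ToyLexer(RegexLexer):
        """Hypothetical lexer fragment mapping keywords to token types."""
        name = 'Toy'

        tokens = {
            'root': [
                (words(('if', 'else', 'while'), suffix=r'\b'), Keyword),
                (words(('var',), suffix=r'\b'), Keyword.Declaration),
                (words(('int', 'char'), suffix=r'\b'), Keyword.Type),
                (r'\w+', Name),
                (r'\s+', Whitespace),
            ],
        }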

Name Tokens
===========

`Name`
    For any name (variable names, function names, classes).

`Name.Attribute`
    For all attributes (e.g. in HTML tags).

`Name.Builtin`
    Builtin names; names that are available in the global namespace.

`Name.Builtin.Pseudo`
    Builtin names that are implicit (e.g. ``self`` in Ruby, ``this`` in Java).

`Name.Class`
    Class names. Because no lexer can know if a name is a class or a function
    or something else, this token is meant for class declarations.

`Name.Constant`
    Token type for constants. In some languages you can recognise a token by the
    way it's defined (the value after a ``const`` keyword for example). In
    other languages constants are uppercase by definition (Ruby).

`Name.Decorator`
    Token type for decorators. Decorators are syntactic elements in the Python
    language. Similar syntax elements exist in C# and Java.

`Name.Entity`
    Token type for special entities (e.g. ``&nbsp;`` in HTML).

`Name.Exception`
    Token type for exception names (e.g. ``RuntimeError`` in Python). Some languages
    define exceptions in the function signature (Java). You can then highlight
    the name of that exception using this token.

`Name.Function`
    Token type for function names.

`Name.Function.Magic`
    Same as `Name.Function` but for special function names that have an implicit use
    in a language (e.g. the ``__init__`` method in Python).

`Name.Label`
    Token type for label names (e.g. in languages that support ``goto``).

`Name.Namespace`
    Token type for namespaces (e.g. import paths in Java/Python) and names
    following the ``module``/``namespace`` keyword in other languages.

`Name.Other`
    Other names. Normally unused.

`Name.Property`
    Additional token type occasionally used for class attributes.

`Name.Tag`
    Tag names (in HTML/XML markup or configuration files).

`Name.Variable`
    Token type for variables. Some languages have prefixes for variable names
    (PHP, Ruby, Perl). You can highlight them using this token.

`Name.Variable.Class`
    Same as `Name.Variable` but for class variables (also static variables).

`Name.Variable.Global`
    Same as `Name.Variable` but for global variables (used in Ruby, for
    example).

`Name.Variable.Instance`
    Same as `Name.Variable` but for instance variables.

`Name.Variable.Magic`
    Same as `Name.Variable` but for special variable names that have an implicit use
    in a language (e.g. ``__doc__`` in Python).
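
To see these types in action, lex a snippet and inspect the token stream
(output abridged; exact whitespace tokens may vary between Pygments versions):

.. sourcecode:: pycon

    >>> from pygments import lex
    >>> from pygments.lexers import PythonLexer
    >>> for ttype, value in lex('def foo(): pass', PythonLexer()):
    ...     print(ttype, repr(value))
    ...
    Token.Keyword 'def'
    Token.Text.Whitespace ' '
    Token.Name.Function 'foo'
    Token.Punctuation '('
    ...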

Literals
========

`Literal`
    For any literal (if not further defined).

`Literal.Date`
    For date literals (e.g. ``42d`` in Boo).

`String`
    For any string literal.

`String.Affix`
    Token type for affixes that further specify the type of the string they're
    attached to (e.g. the prefixes ``r`` and ``u8`` in ``r"foo"`` and ``u8"foo"``).

`String.Backtick`
    Token type for strings enclosed in backticks.

`String.Char`
    Token type for single characters (e.g. Java, C).

`String.Delimiter`
    Token type for delimiting identifiers in "heredoc", raw and other similar
    strings (e.g. the word ``END`` in Perl code ``print <<'END';``).

`String.Doc`
    Token type for documentation strings (for example Python).

`String.Double`
    Double quoted strings.

`String.Escape`
    Token type for escape sequences in strings.

`String.Heredoc`
    Token type for "heredoc" strings (e.g. in Ruby or Perl).

`String.Interpol`
    Token type for interpolated parts in strings (e.g. ``#{foo}`` in Ruby).

`String.Other`
    Token type for any other strings (for example ``%q{foo}`` string constructs
    in Ruby).

`String.Regex`
    Token type for regular expression literals (e.g. ``/foo/`` in JavaScript).

`String.Single`
    Token type for single quoted strings.

`String.Symbol`
    Token type for symbols (e.g. ``:foo`` in LISP or Ruby).

`Number`
    Token type for any number literal.

`Number.Bin`
    Token type for binary literals (e.g. ``0b101010``).

`Number.Float`
    Token type for float literals (e.g. ``42.0``).

`Number.Hex`
    Token type for hexadecimal number literals (e.g. ``0xdeadbeef``).

`Number.Integer`
    Token type for integer literals (e.g. ``42``).

`Number.Integer.Long`
    Token type for long integer literals (e.g. ``42L`` in Python 2).

`Number.Oct`
    Token type for octal literals.
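
A lexer might map literal patterns to these types like this (a hypothetical
fragment with deliberately simplified patterns):

.. sourcecode:: python

    from pygments.lexer import RegexLexer
    from pygments.token import Number, String, Whitespace

    class LiteralLexer(RegexLexer):
        """Hypothetical lexer fragment for literal token types."""
        name = 'LiteralDemo'

        tokens = {
            'root': [
                (r'0b[01]+', Number.Bin),
                (r'0x[0-9a-fA-F]+', Number.Hex),
                (r'\d+\.\d+', Number.Float),
                (r'\d+', Number.Integer),
                (r'"[^"]*"', String.Double),
                (r"'[^']*'", String.Single),
                (r'\s+', Whitespace),
            ],
        }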

Operators
=========

`Operator`
    For any punctuation operator (e.g. ``+``, ``-``).

`Operator.Word`
    For any operator that is a word (e.g. ``not``).

Punctuation
===========

.. versionadded:: 0.7

`Punctuation`
    For any punctuation which is not an operator (e.g. ``[``, ``(``...)

`Punctuation.Marker`
    For markers that point to a location (e.g., carets in Python
    tracebacks for syntax errors).

    .. versionadded:: 2.10

Comments
========

`Comment`
    Token type for any comment.

`Comment.Hashbang`
    Token type for hashbang comments (i.e. first lines of files that start with
    ``#!``).

`Comment.Multiline`
    Token type for multiline comments.

`Comment.Preproc`
    Token type for preprocessor comments (also ``<?php``/``<%`` constructs).

`Comment.PreprocFile`
    Token type for filenames in preprocessor comments, such as include files in C/C++.

`Comment.Single`
    Token type for comments that end at the end of a line (e.g. ``# foo``).

`Comment.Special`
    Special data in comments. For example code tags, author and license
    information, etc.
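
For example, the Python lexer distinguishes a shebang line from an ordinary
comment (output abridged):

.. sourcecode:: pycon

    >>> from pygments import lex
    >>> from pygments.lexers import PythonLexer
    >>> code = '#!/usr/bin/env python\n# a note\n'
    >>> for ttype, value in lex(code, PythonLexer()):
    ...     print(ttype, repr(value))
    ...
    Token.Comment.Hashbang '#!/usr/bin/env python'
    ...
    Token.Comment.Single '# a note'
    ...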

Generic Tokens
==============

Generic tokens are for special lexers like the `DiffLexer`, which doesn't
really highlight a programming language but a patch file.

`Generic`
    A generic, unstyled token. Normally you don't use this token type.

`Generic.Deleted`
    Marks the token value as deleted.

`Generic.Emph`
    Marks the token value as emphasized.

`Generic.Error`
    Marks the token value as an error message.

`Generic.Heading`
    Marks the token value as a headline.

`Generic.Inserted`
    Marks the token value as inserted.

`Generic.Output`
    Marks the token value as program output (e.g. for the Python console lexer).

`Generic.Prompt`
    Marks the token value as a command prompt (e.g. the Bash lexer).

`Generic.Strong`
    Marks the token value as bold (e.g. for the reST lexer).

`Generic.EmphStrong`
    Marks the token value as bold and emphasized.

`Generic.Subheading`
    Marks the token value as a subheadline.

`Generic.Traceback`
    Marks the token value as a part of an error traceback.
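
For example, the `DiffLexer` marks added and removed lines with these types:

.. sourcecode:: pycon

    >>> from pygments import lex
    >>> from pygments.lexers import DiffLexer
    >>> for ttype, value in lex('+added\n-removed\n', DiffLexer()):
    ...     print(ttype, repr(value))
    ...
    Token.Generic.Inserted '+added\n'
    Token.Generic.Deleted '-removed\n'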