xdoctest.parser module

The XDoctest Parser

This parses a docstring into one or more “doctest parts” after the docstrings have been extracted from the source code by either static or dynamic means.

Terms and definitions:

logical block:

a snippet of code that can be executed by itself if given the correct global / local variable context.

PS1:

The original meaning is “Prompt String 1”. For details see: [SE32096] [BashPS1] [CustomPrompt] [GeekPrompt]. In the context of xdoctest, instead of referring to the prompt prefix, we use PS1 to refer to a line that starts a “logical block” of code. In the original doctest module these all had to be prefixed with “>>>”. In xdoctest the prefix simply denotes that the code is part of a doctest; it does not necessarily mean that a new “logical block” is starting.

PS2:

The original meaning is “Prompt String 2”. In the context of xdoctest, instead of referring to the prompt prefix, we use PS2 to refer to a line that continues a “logical block” of code. In the original doctest module these all had to be prefixed with “...”. However, xdoctest uses parsing to determine this automatically.

want statement:

Lines directly after a logical block of code in a doctest indicating the desired result of executing the previous block.
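As a rough illustration of how parsing can stand in for explicit PS1/PS2 prefixes, the sketch below uses the standard-library ast module to find which lines begin a top-level statement. The helper name locate_block_starts is hypothetical and this is not xdoctest's actual implementation. It assumes every line carries a 4-character ">>> " prefix and does not handle decorators or comment-only lines.

```python
import ast


def locate_block_starts(prefixed_lines):
    """Return 0-based indices of lines that begin a top-level statement."""
    # Assumption: each line starts with the 4-character ">>> " prefix.
    source = "\n".join(line[4:] for line in prefixed_lines)
    tree = ast.parse(source)
    # ast line numbers are 1-based; every top-level node starts a logical block.
    return [node.lineno - 1 for node in tree.body]


lines = [">>> x = [1, 2,", ">>>      3, 4]", ">>> print(len(x))"]
starts = locate_block_starts(lines)
print(starts)  # [0, 2]
```

Lines 0 and 2 play the PS1 role here, while line 1 merely continues the list literal and so plays the PS2 role.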

While I do believe this AST-based code is a significant improvement over the RE-based builtin doctest parser, I acknowledge that I’m not an AST expert and there is room for improvement here.

References

class xdoctest.parser.DoctestParser(simulate_repl=False)[source]

Bases: object

Breaks docstrings into parts using the parse method.

Example

>>> from xdoctest.parser import *  # NOQA
>>> parser = DoctestParser()
>>> doctest_parts = parser.parse(
>>>     '''
>>>     >>> j = 0
>>>     >>> for i in range(10):
>>>     >>>     j += 1
>>>     >>> print(j)
>>>     10
>>>     '''.lstrip('\n'))
>>> print('\n'.join(list(map(str, doctest_parts))))
<DoctestPart(ln 0, src="j = 0...", want=None)>
<DoctestPart(ln 3, src="print(j)...", want="10...")>

Example

>>> # Having multiline strings in doctests can be nice
>>> string = utils.codeblock(
        '''
        >>> name = 'name'
        'anything'
        ''')
>>> self = DoctestParser()
>>> doctest_parts = self.parse(string)
>>> print('\n'.join(list(map(str, doctest_parts))))

Parameters:

simulate_repl (bool) – if True each line will be treated as its own doctest. This more closely mimics the original doctest module. Defaults to False.

parse(string, info=None)[source]

Divide the given string into examples and interleaving text.

Parameters:
  • string (str) – The docstring that may contain one or more doctests.

  • info (dict | None) – info about where the string came from in case of an error

Returns:

a list of DoctestPart objects and intervening text in the input docstring.

Return type:

List[xdoctest.doctest_part.DoctestPart | str]

CommandLine

python -m xdoctest.parser DoctestParser.parse

Example

>>> docstr = '''
>>>     A simple docstring contains text followed by an example.
>>>     >>> numbers = [1, 2, 3, 4]
>>>     >>> thirds = [x / 3 for x in numbers]
>>>     >>> print(thirds)
>>>     [0.33  0.66  1  1.33]
>>> '''
>>> from xdoctest import parser
>>> self = parser.DoctestParser()
>>> results = self.parse(docstr)
>>> assert len(results) == 3
>>> for index, result in enumerate(results):
>>>     print(f'results[{index}] = {result!r}')
results[0] = '\nA simple docstring contains text followed by an example.'
results[1] = <DoctestPart(ln 2, src="numbers ...", want=None) at ...>
results[2] = <DoctestPart(ln 4, src="print(th...", want="[0.33  0...") at ...>

Example

>>> s = 'I am a dummy example with two parts'
>>> x = 10
>>> print(s)
I am a dummy example with two parts
>>> s = 'My purpose is to demonstrate how wants work here'
>>> print('The new want applies ONLY to stdout')
>>> print('given before the last want')
>>> '''
    this wont hurt the test at all
    even though its multiline '''
>>> y = 20
The new want applies ONLY to stdout
given before the last want
>>> # Parts from previous examples are executed in the same context
>>> print(x + y)
30

this is simply text, and doesn't apply to the previous doctest.
The <BLANKLINE> directive is still in effect.

Example

>>> from xdoctest.parser import *  # NOQA
>>> from xdoctest import parser
>>> from xdoctest.docstr import docscrape_google
>>> from xdoctest import core
>>> self = parser.DoctestParser()
>>> docstr = self.parse.__doc__
>>> blocks = docscrape_google.split_google_docblocks(docstr)
>>> doclineno = self.parse.__func__.__code__.co_firstlineno
>>> key, (string, offset) = blocks[-2]
>>> self._label_docsrc_lines(string)
>>> doctest_parts = self.parse(string)
>>> # each part with a want-string needs to be broken in two
>>> assert len(doctest_parts) == 6
>>> len(doctest_parts)

_package_groups(grouped_lines)[source]

_package_chunk(raw_source_lines, raw_want_lines, lineno=0)[source]

If self.simulate_repl is True, then each statement is broken into its own part. Otherwise, statements are grouped by the closest want statement.
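The two packaging modes can be pictured with a small standalone sketch. The function package_chunk_sketch and its (source, want) pairs are hypothetical stand-ins, not the real method or DoctestPart objects. The non-REPL branch assumes that the statement closest to the want is split off from the preceding ones, which is modeled on the two-part output shown in the DoctestParser example.

```python
def package_chunk_sketch(statements, want=None, simulate_repl=False):
    """Return a list of (source, want) pairs for one chunk of statements."""
    if not statements:
        return []
    if simulate_repl:
        # REPL-like mode: every statement becomes its own part and only
        # the last one carries the want.
        parts = [(stmt, None) for stmt in statements[:-1]]
        parts.append((statements[-1], want))
        return parts
    if want is not None and len(statements) > 1:
        # Assumption: the statement closest to the want is split off,
        # and everything before it is grouped into a single part.
        leading = "\n".join(statements[:-1])
        return [(leading, None), (statements[-1], want)]
    return [("\n".join(statements), want)]


stmts = ["j = 0", "for i in range(10):\n    j += 1", "print(j)"]
grouped = package_chunk_sketch(stmts, want="10")
repl_like = package_chunk_sketch(stmts, want="10", simulate_repl=True)
print(len(grouped), len(repl_like))  # 2 3
```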

Todo

  • [ ] EXCEPT IN CASES OF EXPLICIT CONTINUATION

Example

>>> from xdoctest.parser import *
>>> raw_source_lines = ['>>> "string"']
>>> raw_want_lines = ['string']
>>> self = DoctestParser()
>>> part, = self._package_chunk(raw_source_lines, raw_want_lines)
>>> part.source
'"string"'
>>> part.want
'string'

_group_labeled_lines(labeled_lines)[source]

Group labeled lines into logical parts to be executed together

Returns:

A list of parts. Text parts are just returned as a list of lines. Executable parts are returned as a tuple of source lines and an optional “want” statement.

Return type:

List[List[str] | Tuple[List[str], str]]
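To make the return shape concrete, here is a minimal sketch of one way such grouping could be done with itertools.groupby. The function group_labeled_lines_sketch is hypothetical and simplified; it only assumes the 'text', 'dsrc', 'dcnt', and 'want' labels that appear in the _label_docsrc_lines example.

```python
from itertools import groupby


def group_labeled_lines_sketch(labeled_lines):
    """Group (label, line) pairs into text lists and (source, want) tuples."""

    def kind(pair):
        # Treat 'dsrc' and 'dcnt' as one kind so a run of code stays together.
        return "code" if pair[0] in ("dsrc", "dcnt") else pair[0]

    runs = [(k, [line for _, line in grp])
            for k, grp in groupby(labeled_lines, key=kind)]
    grouped = []
    i = 0
    while i < len(runs):
        k, lines = runs[i]
        if k == "code":
            want = None
            # A 'want' run directly after code belongs to that code.
            if i + 1 < len(runs) and runs[i + 1][0] == "want":
                want = "\n".join(runs[i + 1][1])
                i += 1
            grouped.append((lines, want))
        else:
            # Text (or a stray want with no preceding code) stays a list of lines.
            grouped.append(lines)
        i += 1
    return grouped


labeled = [
    ("text", "intro"),
    ("dsrc", ">>> x = 1"),
    ("dsrc", ">>> print(x)"),
    ("want", "1"),
    ("text", ""),
    ("text", "outro"),
]
result = group_labeled_lines_sketch(labeled)
print(result)
```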

_locate_ps1_linenos(source_lines)[source]

Determines which lines in the source begin a “logical block” of code.

Parameters:

source_lines (List[str]) – lines belonging only to the doctest source. These will be unindented, prefixed, and without any want.

Returns:

A pair (linenos, mode_hint). linenos is a list of indices indicating which lines are considered “PS1”. mode_hint indicates whether the final line should be considered for a got/want assertion; in the examples below it takes the values 'eval', 'exec', and 'single'.

Return type:

Tuple[List[int], str]
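The examples in this section show mode_hint being 'eval' when the final block is a bare expression and 'exec' when it is a statement. A simplified sketch of that distinction, using only the standard-library ast module, might look like the following. The helper final_statement_mode is hypothetical, works on unprefixed source, and does not attempt to reproduce the 'single' case.

```python
import ast


def final_statement_mode(source):
    """Return 'eval' if the last top-level statement is a bare expression."""
    tree = ast.parse(source)
    # ast.Expr wraps an expression used as a statement, e.g. `3` or `print(x)`.
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        return "eval"
    return "exec"


mode_a = final_statement_mode("def foo():\n    return 0\n3")
mode_b = final_statement_mode("x = 1\ntry: raise Exception\nexcept Exception: pass")
print(mode_a, mode_b)  # eval exec
```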

Example

>>> self = DoctestParser()
>>> source_lines = ['>>> def foo():', '>>>     return 0', '>>> 3']
>>> linenos, mode_hint = self._locate_ps1_linenos(source_lines)
>>> assert linenos == [0, 2]
>>> assert mode_hint == 'eval'

Example

>>> from xdoctest.parser import *  # NOQA
>>> self = DoctestParser()
>>> source_lines = ['>>> x = [1, 2, ', '>>> 3, 4]', '>>> print(len(x))']
>>> linenos, mode_hint = self._locate_ps1_linenos(source_lines)
>>> assert linenos == [0, 2]
>>> assert mode_hint == 'eval'

Example

>>> from xdoctest.parser import *  # NOQA
>>> self = DoctestParser()
>>> source_lines = [
>>>    '>>> x = 1',
>>>    '>>> try: raise Exception',
>>>    '>>> except Exception: pass',
>>>    '...',
>>> ]
>>> linenos, mode_hint = self._locate_ps1_linenos(source_lines)
>>> assert linenos == [0, 1]
>>> assert mode_hint == 'exec'

Example

>>> from xdoctest.parser import *  # NOQA
>>> self = DoctestParser()
>>> source_lines = [
>>>    '>>> import os; print(os)',
>>>    '...',
>>> ]
>>> linenos, mode_hint = self._locate_ps1_linenos(source_lines)
>>> assert linenos == [0]
>>> assert mode_hint == 'single'

Example

>>> # We should ensure that decorators are PS1 lines
>>> from xdoctest.parser import *  # NOQA
>>> self = DoctestParser()
>>> source_lines = [
>>>    '>>> # foo',
>>>    '>>> @foo',
>>>    '... def bar():',
>>>    '...     ...',
>>> ]
>>> linenos, mode_hint = self._locate_ps1_linenos(source_lines)
>>> print(f'linenos={linenos}')
>>> assert linenos == [0, 1]

_label_docsrc_lines(string)[source]

Give each line in the docstring a label so we can distinguish what parts are text, what parts are code, and what parts are “want” string.

Parameters:

string (str) – doctest source

Returns:

labeled_lines - the above source broken up by lines, each with a label indicating its type for later use in parsing.

Return type:

List[Tuple[str, str]]
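The labeling idea can be sketched as a small state machine over the lines of a docstring. The function label_lines_sketch below is a hypothetical simplification, not the real method: it labels by prefix only, so it does not handle multiline strings or unprefixed continuation lines, which the actual parser resolves by parsing the source.

```python
def label_lines_sketch(docstring):
    """Label each line as 'text', 'dsrc', 'dcnt', or 'want'."""
    labeled = []
    state = "text"  # label assigned to the previous line
    for line in docstring.splitlines():
        stripped = line.strip()
        if stripped == ">>>" or stripped.startswith(">>> "):
            state = "dsrc"
        elif state in ("dsrc", "dcnt") and (
            stripped == "..." or stripped.startswith("... ")
        ):
            state = "dcnt"
        elif state in ("dsrc", "dcnt", "want") and stripped:
            # Non-blank lines directly after code are the want.
            state = "want"
        else:
            # A blank line ends the want; everything else is plain text.
            state = "text"
        labeled.append((state, stripped))
    return labeled


doc = "intro\n>>> print('a')\na\n\noutro"
labels = label_lines_sketch(doc)
print(labels)
```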

Todo

  • [ ] Sphinx does not parse this doctest properly

Example

>>> from xdoctest.parser import *
>>> # Having multiline strings in doctests can be nice
>>> string = utils.codeblock(
        '''
        text
        >>> items = ['also', 'nice', 'to', 'not', 'worry',
        >>>          'about', '...', 'vs', '>>>']
        ... print('but its still allowed')
        but its still allowed

        more text
        ''')
>>> self = DoctestParser()
>>> labeled = self._label_docsrc_lines(string)
>>> expected = [
>>>     ('text', 'text'),
>>>     ('dsrc', ">>> items = ['also', 'nice', 'to', 'not', 'worry',"),
>>>     ('dsrc', ">>>          'about', '...', 'vs', '>>>']"),
>>>     ('dcnt', "... print('but its still allowed')"),
>>>     ('want', 'but its still allowed'),
>>>     ('text', ''),
>>>     ('text', 'more text')
>>> ]
>>> assert labeled == expected

xdoctest.parser._min_indentation(s)[source]

Return the minimum indentation of any non-blank line in s
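A minimal sketch of this computation, under the assumption that indentation is measured in leading spaces and that a string with no non-blank lines yields 0 (min_indentation_sketch is a hypothetical name, not the module's implementation):

```python
def min_indentation_sketch(s):
    """Return the smallest count of leading spaces over non-blank lines."""
    indents = [
        len(line) - len(line.lstrip(" "))
        for line in s.splitlines()
        if line.strip()  # skip blank and whitespace-only lines
    ]
    # Assumption: return 0 when there are no non-blank lines.
    return min(indents) if indents else 0


smallest = min_indentation_sketch("    a\n  b\n\n      c")
print(smallest)  # 2
```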

xdoctest.parser._complete_source(line, state_indent, line_iter)[source]

Helper that removes lines from the iterator if they are needed to complete the source.

This uses static.is_balanced_statement() to do the heavy lifting
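The consume-until-complete idea can be sketched without xdoctest. Note that this sketch swaps in a plain ast.parse attempt for static.is_balanced_statement(), so it is an approximation of the technique rather than the real helper. The name complete_source_sketch, its return value (a list of lines), and the fixed 4-character prefix are all assumptions.

```python
import ast


def _parses(source):
    """Return True if the source is syntactically complete Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def complete_source_sketch(first_line, line_iter, prefix_len=4):
    """Pull lines from line_iter until the collected source parses."""
    collected = [first_line]
    while not _parses("\n".join(l[prefix_len:] for l in collected)):
        try:
            _, next_line = next(line_iter)
        except StopIteration:
            break  # ran out of lines; the source is simply incomplete
        collected.append(next_line)
    return collected


remaining = [">>> 1:2,", ">>> 3:4,", ">>> 5:6}", ">>> y = 7"]
line_iter = enumerate(remaining, start=1)
finished = complete_source_sketch(">>> x = { # The line is not finished", line_iter)
leftover = [text for _, text in line_iter]
print(len(finished), leftover)
```

Only the three lines needed to close the dictionary literal are consumed; the final line stays in the iterator for the caller.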

Example

>>> from xdoctest.parser import *  # NOQA
>>> from xdoctest.parser import _complete_source
>>> state_indent = 0
>>> line = '>>> x = { # The line is not finished'
>>> remain_lines = ['>>> 1:2,', '>>> 3:4,', '>>> 5:6}', '>>> y = 7']
>>> line_iter = enumerate(remain_lines, start=1)
>>> finished = list(_complete_source(line, state_indent, line_iter))
>>> final = chr(10).join([t[1] for t in finished])
>>> print(final)

xdoctest.parser._iterthree(items, pad_value=None)[source]

Iterate over a sliding window of size 3, padded with pad_value (None by default) on both sides.
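One straightforward way to build such a padded window is to zip three offset views of a padded list. The function iterthree_sketch is a hypothetical illustration and may differ from the module's implementation; the outputs in the comments are those of this sketch.

```python
def iterthree_sketch(items, pad_value=None):
    """Yield (previous, current, next) triples, padding both ends."""
    padded = [pad_value, *items, pad_value]
    # zip stops at the shortest view, so each item appears once as `current`.
    yield from zip(padded, padded[1:], padded[2:])


empty = list(iterthree_sketch([]))
pair = list(iterthree_sketch([1, 2]))
print(empty)  # []
print(pair)   # [(None, 1, 2), (1, 2, None)]
```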

Example

>>> from xdoctest.parser import *
>>> print(list(_iterthree([])))
>>> print(list(_iterthree(range(1))))
>>> print(list(_iterthree([1, 2])))
>>> print(list(_iterthree([1, 2, 3])))
>>> print(list(_iterthree(range(4))))
>>> print(list(_iterthree(range(7))))

xdoctest.parser._hasprefix(line, prefixes)[source]

Helper that tests whether a line starts with one of the given prefixes.