gtkdoc.mkhtml2

Generate html from docbook

The tool loads the main xml document (<module>-docs.xml) and chunks it like the xsl-stylesheets would do. For that it resolves all the xml-includes. Each chunk is converted to html using python functions.

In contrast to our previous approach of running gtkdoc-mkhtml + gtkdoc-fixxref, this tool replaces both without relying on external tools such as xsltproc and source-highlight.

Please note that we’re not aiming for complete docbook-xml support. All tags used in the generated xml are of course handled. More tags used in handwritten xml can easily be supported, but for some combinations of tags we prefer simplicity.

TODO: - tag converters:

  • ‘section’/‘simplesect’ - the first we convert as a chunk, the nested ones we need to convert as ‘sect{2,3,4,…}’; we can track depth in ‘ctx’

  • inside ‘footnote’ one can have many tags, we only handle ‘para’/‘simpara’

  • inside ‘glossentry’ we’re only handling ‘glossterm’ and ‘glossdef’

  • convert_{figure,table} need counters.

  • check for each docbook tag whether it can contain #PCDATA; if not, don’t check for xml.text/xml.tail and add a comment (# no PCDATA allowed here)

  • find a better way to print context for warnings - we use ‘xml.sourceline’, but this does not help much because of xi:include

  • consolidate title handling: - always use the titles-dict

    • convert_title(): uses titles.get(tid)[‘title’]

    • convert_xref(): uses titles[tid][‘tag’], [‘title’] and [‘xml’]

    • create_devhelp2_refsect2_keyword(): uses titles[tid][‘title’]

    • there only store what we have (xml, tag, …)

    • when chunking generate ‘id’s and add entries to titles-dict

    • add accessors for title and raw_title that lazily get them

    • see if any of the other ~10 places that call convert_title() could use this cache

  • performance - consider some perf-warnings flag

    • see ‘No “id” attribute on’

    • xinclude processing in libxml2 is slow - if we disable it, we get ‘{http://www.w3.org/2003/XInclude}include’ tags and we could try handling them ourselves; in some cases those are subtrees that we extract for chunking anyway
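The last performance item (handling the xi:include tags ourselves) could be sketched roughly as follows. This is only an illustration using the standard library's ElementTree instead of lxml; the DOCUMENTS dict, load_fragment() and expand_includes() are hypothetical names that stand in for reading the included files from disk, and they are not part of gtkdoc.mkhtml2.

```python
import xml.etree.ElementTree as ET

# Tag name we would see if libxml2's own xinclude processing were disabled.
XI_INCLUDE = "{http://www.w3.org/2003/XInclude}include"

# Hypothetical in-memory stand-in for the files on disk, keyed by href.
DOCUMENTS = {
    "xml/tester.xml": '<refentry id="tester"><refnamediv/></refentry>',
}


def load_fragment(href):
    """Parse the document an xi:include points at (assumed loader)."""
    return ET.fromstring(DOCUMENTS[href])


def expand_includes(parent, loader=load_fragment):
    """Replace xi:include children of `parent` in place, recursively."""
    for index, child in enumerate(list(parent)):
        if child.tag == XI_INCLUDE:
            included = loader(child.get("href"))
            included.tail = child.tail  # keep text that followed the include
            expand_includes(included, loader)  # included docs may include more
            parent[index] = included
        else:
            expand_includes(child, loader)


main = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2003/XInclude" id="index">'
    '<chapter id="ch1"><xi:include href="xml/tester.xml"/></chapter>'
    '</book>'
)
expand_includes(main)
```

Since the included subtrees are often the ones extracted for chunking anyway, a loader like this could also hand them straight to the chunker instead of splicing them back in.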

DIFFERENCES: - titles

  • we add the chunk label to the title in toc, on the page and in nav tooltips

  • docbook xsl only sometimes adds the label to the titles, and when it does it adds the name of the chunk type too (e.g. ‘Part I.’ instead of ‘I.’)

  • navigation - we always add an up-link except on the first page

  • footer - we’re now omitting the footer

  • tocs - we always add “Table of Contents” before a toc - docbook does that for some pages, it is configurable

OPTIONAL: - minify html: https://pypi.python.org/pypi/htmlmin/

Requirements: sudo pip3 install anytree lxml pygments

Example invocation:

  cd tests/bugs/docs/
  mkdir -p db2html
  cd db2html
  ../../../../gtkdoc-mkhtml2 tester ../tester-docs.xml
  cd ..
  xdg-open db2html/index.html
  meld html db2html

Benchmarking: cd tests/bugs/docs/; rm html-build.stamp; time make html-build.stamp

class gtkdoc.mkhtml2.FakeDTDResolver

Don’t load the docbookx.dtd since we disable the validation anyway.

libxml2 does not cache DTDs. If we produce a docbook file with 100 chunks, loading such a doc with xinclude processing will load and parse the docbook DTD 100 times. This causes tons of memory allocations and is slow.

resolve(self, system_url, public_id, context)

Override this method to resolve an external source by system_url and public_id. The third argument is an opaque context object.

Return the result of one of the resolve_*() methods.

gtkdoc.mkhtml2.chunk(xml_node, module, depth=0, idx=0, parent=None)

Chunk the tree.

The first time, we’re called with parent=None and in that case we return the new_node as the root of the tree. For each tree-node we generate a filename and process the children.
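The recursion described above can be sketched as follows. This is a simplified illustration, not the module's code: it uses the standard library's ElementTree and a hand-written ChunkNode class, omits the module argument, and the CHUNK_TAGS set and the filename scheme are assumptions made for the example. It also assumes the document root is itself a chunk tag.

```python
import xml.etree.ElementTree as ET

# Assumption: tags that start a new output page; the real set may differ.
CHUNK_TAGS = {"book", "part", "chapter", "refentry"}


class ChunkNode:
    """Hypothetical tree node: one output page and its child pages."""

    def __init__(self, xml, filename, depth, parent=None):
        self.xml = xml
        self.filename = filename
        self.depth = depth
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def chunk_sketch(xml_node, depth=0, idx=0, parent=None):
    """Build the chunk tree; the outermost call returns its root."""
    node = parent
    if xml_node.tag in CHUNK_TAGS:
        # Prefer the id; otherwise fall back to tag name plus position.
        name = xml_node.get("id") or "%s%02d" % (xml_node.tag, idx)
        node = ChunkNode(xml_node, name + ".html", depth, parent)
        depth += 1
    for child_idx, child in enumerate(xml_node):
        chunk_sketch(child, depth, child_idx, node)
    return node


root = chunk_sketch(ET.fromstring(
    '<book id="index">'
    '<chapter id="ch1"><refentry id="tester"/></chapter>'
    '<chapter><para/></chapter>'
    '</book>'
))
```

Here the first call has parent=None, so the node created for the book element is what the caller gets back; every nested call attaches its node to the chunk above it.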

gtkdoc.mkhtml2.convert(out_dir, module, files, node, src_lang)

Convert the docbook chunks to an html file.

Parameters:
  • out_dir – already created output dir

  • files – list of nodes in the tree in pre-order

  • node – current tree node
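To illustrate why a pre-order list of nodes is useful here, the sketch below derives previous/up/next navigation targets for one page from such a list. The Page class, pre_order() and nav_links() are hypothetical helpers written for this example; how convert() actually uses the files list may differ.

```python
class Page:
    """Hypothetical chunk-tree node holding just a filename."""

    def __init__(self, filename, parent=None):
        self.filename = filename
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def pre_order(node):
    """Yield the node itself, then each subtree from left to right."""
    yield node
    for child in node.children:
        yield from pre_order(child)


def nav_links(files, node):
    """Look up the neighbours of `node` in the pre-order list `files`."""
    i = files.index(node)
    return {
        "prev": files[i - 1].filename if i > 0 else None,
        "up": node.parent.filename if node.parent is not None else None,
        "next": files[i + 1].filename if i + 1 < len(files) else None,
    }


index = Page("index.html")
ch1 = Page("ch1.html", index)
tester = Page("tester.html", ch1)
ch2 = Page("ch2.html", index)

files = list(pre_order(index))
links = nav_links(files, tester)
```

In this layout only the first page has no parent, so it is the only one without an up-link.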

gtkdoc.mkhtml2.gen_chunk_name(node, chunk_params)

Generate a chunk file name.

This is either based on the id or on the position in the doc. In the latter case it uses a prefix from CHUNK_PARAMS and a sequence number for each chunk type.
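A minimal sketch of the two naming paths, assuming each chunk type has a params object with a prefix and a running counter. The ChunkParams class, the prefixes "ch" and "pt", and the two-digit number format are assumptions for illustration; only the name CHUNK_PARAMS comes from the description above.

```python
import xml.etree.ElementTree as ET


class ChunkParams:
    """Hypothetical per-tag settings: filename prefix plus a counter."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.count = 0


# Assumed prefixes; the real table may use other values and more tags.
CHUNK_PARAMS = {
    "chapter": ChunkParams("ch"),
    "part": ChunkParams("pt"),
}


def gen_chunk_name_sketch(node, chunk_params):
    """Use the id if there is one, else prefix plus sequence number."""
    idval = node.get("id")
    if idval is not None:
        return idval
    name = "%s%02d" % (chunk_params.prefix, chunk_params.count)
    chunk_params.count += 1  # one sequence per chunk type
    return name


with_id = ET.fromstring('<chapter id="intro"/>')
without_id = ET.fromstring('<chapter/>')

names = [
    gen_chunk_name_sketch(with_id, CHUNK_PARAMS["chapter"]),
    gen_chunk_name_sketch(without_id, CHUNK_PARAMS["chapter"]),
    gen_chunk_name_sketch(without_id, CHUNK_PARAMS["chapter"]),
    gen_chunk_name_sketch(ET.fromstring('<part/>'), CHUNK_PARAMS["part"]),
]
```

Keeping the counter on the per-type params object means that chunks with an id do not consume a sequence number, and each chunk type counts independently.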

gtkdoc.mkhtml2.get_id_path(node)

Generate the ‘id’. We need to walk up the xml-tree and check the positions for each sibling. When reaching the top of the tree we collect remaining index entries from the chunked-tree.
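The two phases (walking up the xml-tree, then continuing through the chunked-tree) might look like the sketch below. It is an approximation under stated assumptions: ElementTree has no parent pointers, so a child-to-parent map is built first; the ChunkNode class with its idx field is hypothetical; and the function takes the target element explicitly, which the real get_id_path(node) does not.

```python
import xml.etree.ElementTree as ET


class ChunkNode:
    """Hypothetical chunk: its xml subtree, sibling index and parent."""

    def __init__(self, xml, idx, parent=None):
        self.xml = xml
        self.idx = idx
        self.parent = parent


def id_path_sketch(chunk_node, element):
    """Return 1-based positions from the document root to `element`."""
    # Map every child in this chunk's subtree to its parent element.
    parents = {child: p for p in chunk_node.xml.iter() for child in p}
    path = []
    current = element
    # Phase 1: walk up the xml-tree, recording each sibling position.
    while current in parents:
        parent = parents[current]
        path.insert(0, str(list(parent).index(current) + 1))
        current = parent
    # Phase 2: top of the xml reached, continue through the chunk tree.
    node = chunk_node
    while node is not None:
        path.insert(0, str(node.idx + 1))
        node = node.parent
    return path


book_chunk = ChunkNode(ET.fromstring("<book/>"), 0)
chapter_xml = ET.fromstring(
    "<chapter><title>T</title><para>a</para>"
    "<para><emphasis>b</emphasis></para></chapter>"
)
chapter_chunk = ChunkNode(chapter_xml, 1, book_chunk)

path = id_path_sketch(chapter_chunk, chapter_xml.find("para/emphasis"))
```

Joining the returned positions with a separator gives a stable generated id for elements that have no "id" attribute of their own.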