gtkdoc.mkhtml2
Generate html from docbook
The tool loads the main xml document (<module>-docs.xml) and splits it into chunks the way the docbook xsl-stylesheets would. To do so, it first resolves all the xml-includes. Each chunk is then converted to html using python functions.
In contrast to our previous approach of running gtkdoc-mkhtml + gtkdoc-fixxref, this tool replaces both without relying on external tools such as xsltproc and source-highlight.
Please note that we’re not aiming for complete docbook-xml support. All tags used in the generated xml are handled. More tags used in handwritten xml can easily be supported, but for some combinations of tags we prefer simplicity.
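The "resolve the xml-includes, then work on one tree" step can be sketched with the Python standard library. This is only an illustration: it uses xml.etree.ElementInclude as a stand-in for the lxml/libxml2 processing the tool relies on, the in-memory FILES table and the name intro.xml are hypothetical, and the stdlib module only knows the 2001 XInclude namespace.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files"; a real run would read them from disk.
FILES = {
    "tester-docs.xml": (
        '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
        "<title>Tester</title>"
        '<xi:include href="intro.xml"/>'
        "</book>"
    ),
    "intro.xml": '<chapter id="intro"><title>Intro</title></chapter>',
}


def loader(href, parse, encoding=None):
    """Return the included resource as an element (parse == 'xml') or as text."""
    data = FILES[href]
    if parse == "xml":
        return ET.fromstring(data)
    return data


root = ET.fromstring(FILES["tester-docs.xml"])
# Replaces every xi:include element in place with the loaded subtree.
ElementInclude.include(root, loader=loader)

print([child.tag for child in root])
```

After this step the included chapter is an ordinary child of the book element, so later passes can walk a single tree.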
TODO:
- tag converters:
  - ‘section’/‘simplesect’ - the first we convert as a chunk, the nested ones we need to convert as ‘sect{2,3,4,…}’; we can track depth in ‘ctx’
  - inside ‘footnote’ one can have many tags, we only handle ‘para’/‘simpara’
  - inside ‘glossentry’ we’re only handling ‘glossterm’ and ‘glossdef’
  - convert_{figure,table} need counters.
- check each docbook tag if it can contain #PCDATA, if not don’t check for xml.text/xml.tail and add a comment (# no PCDATA allowed here)
- find a better way to print context for warnings
  - we use ‘xml.sourceline’, but this does not help a lot due to xi:include
- consolidate title handling:
  - always use the titles-dict
    - convert_title(): uses titles.get(tid)[‘title’]
    - convert_xref(): uses titles[tid][‘tag’], [‘title’] and [‘xml’]
    - create_devhelp2_refsect2_keyword(): uses titles[tid][‘title’]
  - store only what we have there (xml, tag, …)
  - when chunking, generate ‘id’s and add entries to the titles-dict
  - add accessors for title and raw_title that lazily get them
  - see if any of the other ~10 places that call convert_title() could use this cache
- performance
  - consider some perf-warnings flag
    - see ‘No “id” attribute on’
  - xinclude processing in libxml2 is slow
    - if we disable it, we get ‘{http://www.w3.org/2003/XInclude}include’ tags and we could try handling them ourselves; in some cases those are subtrees that we extract for chunking anyway
DIFFERENCES:
- titles
  - we add the chunk label to the title in the toc, on the page and in nav tooltips
  - docbook xsl only sometimes adds the label to the titles, and when it does it also adds the name of the chunk type (e.g. ‘Part I.’ instead of ‘I.’)
- navigation
  - we always add an up-link except on the first page
- footer
  - we’re now omitting the footer
- tocs
  - we always add ‘Table of Contents’ before a toc
  - docbook does that for some pages, it is configurable
OPTIONAL:
- minify html: https://pypi.python.org/pypi/htmlmin/
Requirements:
  sudo pip3 install anytree lxml pygments
Example invocation:
  cd tests/bugs/docs/
  mkdir -p db2html
  cd db2html
  ../../../../gtkdoc-mkhtml2 tester ../tester-docs.xml
  cd ..
  xdg-open db2html/index.html
  meld html db2html
Benchmarking:
  cd tests/bugs/docs/; rm html-build.stamp; time make html-build.stamp
- class gtkdoc.mkhtml2.FakeDTDResolver
Don’t load the docbookx.dtd, since we disable the validation anyway.
libxml2 does not cache DTDs. If we produce a docbook file with 100 chunks, loading such a doc with xincludes will load and parse the docbook DTD 100 times. This causes tons of memory allocations and is slow.
- resolve(self, system_url, public_id, context)
Override this method to resolve an external source by system_url and public_id. The third argument is an opaque context object.
Return the result of one of the resolve_*() methods.
- gtkdoc.mkhtml2.chunk(xml_node, module, depth=0, idx=0, parent=None)
Chunk the tree.
The first time, we’re called with parent=None and in that case we return the new_node as the root of the tree. For each tree-node we generate a filename and process the children.
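A minimal sketch of this recursion, assuming a plain tree class instead of anytree and the standard-library ElementTree instead of lxml. CHUNK_TAGS, ChunkNode and the file naming are hypothetical and only illustrate the "first call returns the root, every tree node gets a filename" behaviour described above.

```python
import xml.etree.ElementTree as ET

# Assumed set of tags that start a new chunk (illustrative only).
CHUNK_TAGS = {"book", "part", "chapter", "refentry"}


class ChunkNode:
    """Hypothetical tree node: one output file per node."""

    def __init__(self, xml, filename, depth, parent=None):
        self.xml = xml
        self.filename = filename
        self.depth = depth
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def chunk(xml_node, module, depth=0, idx=0, parent=None):
    """Build the chunk tree; the top-level call returns its root.

    `module` is kept to mirror the documented signature; this sketch
    does not use it.
    """
    if xml_node.tag in CHUNK_TAGS:
        if parent is None:
            filename = "index.html"
        else:
            # Hypothetical naming: id if present, else tag plus sibling position.
            filename = (xml_node.get("id") or f"{xml_node.tag}{idx:02d}") + ".html"
        parent = ChunkNode(xml_node, filename, depth, parent)
        depth += 1
    # Process the children; idx counts chunkable siblings only.
    idx = 0
    for child in xml_node:
        chunk(child, module, depth, idx, parent)
        if child.tag in CHUNK_TAGS:
            idx += 1
    return parent


doc = ET.fromstring(
    "<book><title>T</title>"
    '<chapter id="intro"><refentry id="api"/></chapter>'
    "<chapter/></book>"
)
root = chunk(doc, "tester")
print(root.filename, [c.filename for c in root.children])
```

Non-chunk elements such as the title stay inside the chunk of their nearest chunkable ancestor, which is why they create no tree node here.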
- gtkdoc.mkhtml2.convert(out_dir, module, files, node, src_lang)
Convert the docbook chunks to an html file.
- Parameters:
out_dir – already created output dir
files – list of nodes in the tree in pre-order
node – current tree node
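To illustrate what "list of nodes in the tree in pre-order" means for the files parameter, here is a small sketch with a hypothetical Node class. Using the flat list to look up the previous and next page is an assumption about how such a list is typically consumed, not a statement about the tool’s internals.

```python
class Node:
    """Hypothetical chunk-tree node with a filename and parent/child links."""

    def __init__(self, filename, parent=None):
        self.filename = filename
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def pre_order(node):
    """Yield a node before its children, depth first."""
    yield node
    for child in node.children:
        yield from pre_order(child)


root = Node("index.html")
ch1 = Node("ch01.html", root)
Node("api.html", ch1)
Node("ch02.html", root)

files = list(pre_order(root))
for i, node in enumerate(files):
    prev_name = files[i - 1].filename if i > 0 else None
    next_name = files[i + 1].filename if i + 1 < len(files) else None
    up_name = node.parent.filename if node.parent else None
    print(node.filename, "prev:", prev_name, "up:", up_name, "next:", next_name)
```

Because the list is in document order, the neighbours of an entry are the pages a reader would reach with "previous" and "next", while "up" comes from the tree itself.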
- gtkdoc.mkhtml2.gen_chunk_name(node, chunk_params)
Generate a chunk file name.
This is either based on the id or on the position in the doc. In the latter case it uses a prefix from CHUNK_PARAMS and a sequence number for each chunk type.
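The two naming strategies could look roughly like this. The ChunkParams class, its fields and the starting count are assumptions made for the sketch; the real CHUNK_PARAMS table may be shaped differently.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ChunkParams:
    """Hypothetical per-chunk-type state: a file name prefix and a running count."""

    prefix: str
    count: int = 1


def gen_chunk_name(node, chunk_params):
    # An explicit id wins, so links to the chunk stay stable.
    if "id" in node.attrib:
        return node.attrib["id"]
    # Otherwise fall back to the position: prefix plus sequence number.
    name = f"{chunk_params.prefix}{chunk_params.count:02d}"
    chunk_params.count += 1
    return name


chapter_params = ChunkParams(prefix="ch")
names = [
    gen_chunk_name(ET.fromstring(xml), chapter_params)
    for xml in ("<chapter/>", '<chapter id="intro"/>', "<chapter/>")
]
print(names)
```

In this sketch a chunk that has an id does not consume a sequence number, so position-based names stay consecutive within their type.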
- gtkdoc.mkhtml2.get_id_path(node)
Generate the ‘id’. We need to walk up the xml-tree and check the positions for each sibling. When reaching the top of the tree we collect remaining index entries from the chunked-tree.