8 changes: 4 additions & 4 deletions docs/source/external_exporters.rst
@@ -18,10 +18,10 @@ format designated by the ``FORMAT`` string as explained below.

Extending the built-in format exporters
---------------------------------------
-A few built-in formats are available by default: ``html``, ``pdf``, ``webpdf``,
-``script``, ``latex``. Each of these has its own *exporter* with many
-configuration options that can be extended. Having the option to point to a
-different *exporter* allows authors to create their own fully customized
+A few built-in formats are available by default: ``html``, ``pdf``, ``webhtml``,
+``webpdf``, ``script``, ``latex``. Each of these has its own *exporter* with
+many configuration options that can be extended. Having the option to point
+to a different *exporter* allows authors to create their own fully customized
templates or export formats.

A custom *exporter* must be an importable Python object. We recommend that
6 changes: 3 additions & 3 deletions docs/source/highlighting.rst
@@ -1,15 +1,15 @@
Customizing Syntax Highlighting
===============================

-Under the hood, nbconvert uses pygments to highlight code. pdf, webpdf and html exporting support
-changing the highlighting style.
+Under the hood, nbconvert uses pygments to highlight code. The pdf, webpdf, html, and webhtml
+exporters support changing the highlighting style.

Using Builtin styles
--------------------
Pygments has a number of builtin styles available. To use them, we just need to set the style setting
in the relevant preprocessor.

-To change html and webpdf highlighting export with:
+To change the highlighting for html, webhtml, and webpdf exports, use:

.. code-block:: bash

3 changes: 2 additions & 1 deletion docs/source/install.rst
@@ -79,7 +79,8 @@ notebooks to PDF.
Installing Chromium
-------------------

-For converting notebooks to PDF with ``--to webpdf``, nbconvert requires the
+For converting notebooks to PDF with ``--to webpdf``, or for prerendering HTML
+notebooks via ``--to webhtml``, nbconvert requires the
`playwright <https://github.com/microsoft/playwright-python>`_ Chromium automation library.

Playwright makes use of a specific version of Chromium. If it does not find a suitable
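Both ``--to webpdf`` and ``--to webhtml`` depend on ``playwright`` being importable. As a minimal sketch, the same ``importlib.util.find_spec`` test that ``webhtml.py`` uses to set ``PLAYWRIGHT_INSTALLED`` can be run before converting. The ``is_installed`` helper is hypothetical, and it only checks for the Python package, not for a usable Chromium build.

```python
from importlib import util as importlib_util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module ``module_name`` can be found."""
    # find_spec only locates a top-level module; it does not execute its code.
    return importlib_util.find_spec(module_name) is not None


# Mirrors the PLAYWRIGHT_INSTALLED check in nbconvert/exporters/webhtml.py.
PLAYWRIGHT_INSTALLED = is_installed("playwright")

if not PLAYWRIGHT_INSTALLED:
    print("playwright is missing; `--to webhtml` and `--to webpdf` will not work")
```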
14 changes: 14 additions & 0 deletions docs/source/usage.rst
@@ -26,6 +26,7 @@ The currently supported output formats are:
- :ref:`HTML <convert_html>`,
- :ref:`LaTeX <convert_latex>`,
- :ref:`PDF <convert_pdf>`,
+- :ref:`WebHTML <convert_webhtml>`,
- :ref:`WebPDF <convert_webpdf>`,
- :ref:`Reveal.js HTML slideshow <convert_revealjs>`,
- :ref:`Markdown <convert_markdown>`,
@@ -71,6 +72,19 @@ HTML

If this option is provided, embed images as base64 urls in the resulting HTML file.

+.. _convert_webhtml:
+
+WebHTML
+~~~~~~~
+* ``--to webhtml``
+
+Generates an HTML document by first rendering the notebook to HTML, loading that
+HTML in headless Chromium, and writing the resulting page content back to a file.
+This exporter supports the same templates as ``--to html``.
+
+The webhtml exporter requires the ``playwright`` Chromium automation library, which
+can be installed via ``nbconvert[webhtml]``.
+
.. _convert_latex:

LaTeX
2 changes: 2 additions & 0 deletions nbconvert/__init__.py
@@ -21,6 +21,7 @@
    ScriptExporter,
    SlidesExporter,
    TemplateExporter,
+    WebHTMLExporter,
    WebPDFExporter,
    export,
    get_export_names,
@@ -48,6 +49,7 @@
    "ScriptExporter",
    "SlidesExporter",
    "TemplateExporter",
+    "WebHTMLExporter",
    "WebPDFExporter",
    "__version__",
    "export",
2 changes: 2 additions & 0 deletions nbconvert/exporters/__init__.py
@@ -13,6 +13,7 @@
from .script import ScriptExporter
from .slides import SlidesExporter
from .templateexporter import TemplateExporter
+from .webhtml import WebHTMLExporter
from .webpdf import WebPDFExporter

__all__ = [
@@ -34,6 +35,7 @@
    "ScriptExporter",
    "SlidesExporter",
    "TemplateExporter",
+    "WebHTMLExporter",
    "WebPDFExporter",
    "export",
    "get_export_names",
159 changes: 159 additions & 0 deletions nbconvert/exporters/webhtml.py
@@ -0,0 +1,159 @@
"""Export to HTML after loading in a headless browser"""

# Copyright (c) IPython Development Team.
# Distributed under the terms of the Modified BSD License.

import asyncio
import concurrent.futures
import os
import subprocess
import sys
import tempfile
from importlib import util as importlib_util

from traitlets import Bool, List, Unicode, default

from .html import HTMLExporter

PLAYWRIGHT_INSTALLED = importlib_util.find_spec("playwright") is not None
IS_WINDOWS = os.name == "nt"

__all__ = ("WebHTMLExporter",)


class WebHTMLExporter(HTMLExporter):
"""Writer designed to write to HTML files after rendering in a browser.

This inherits from :class:`HTMLExporter`. It creates the HTML using the
template machinery, and then run playwright to load in a browser, saving
the resulting page.
"""

export_from_notebook = "HTML via Browser"

allow_chromium_download = Bool(
False,
help="Whether to allow downloading Chromium if no suitable version is found on the system.",
).tag(config=True)

@default("file_extension")
def _file_extension_default(self):
return ".html"

@default("template_name")
def _template_name_default(self):
return "webhtml"

    disable_sandbox = Bool(
        False,
        help="""
        Disable chromium security sandbox when rendering in the browser.

        WARNING: This could cause arbitrary code execution in specific circumstances,
        where JS in your notebook can execute server-side code! Please use with
        caution.

        ``https://github.com/puppeteer/puppeteer/blob/main@%7B2020-12-14T17:22:24Z%7D/docs/troubleshooting.md#setting-up-chrome-linux-sandbox``
        has more information.

        This is required for webhtml to work inside most container environments.
        """,
    ).tag(config=True)

    browser_args = List(
        Unicode(),
        help="""
        Additional arguments to pass to the browser used for rendering.

        These arguments will be passed directly to the browser launch method
        and can be used to customize browser behavior beyond the default settings.
        """,
    ).tag(config=True)

    def run_playwright(self, html, _postprocess=None):
        """Load ``html`` in headless Chromium via playwright and return the page content.

        If ``_postprocess`` is given, it is awaited with ``(page, browser, playwright)``
        and its result is returned instead of the page content.
        """

        async def main(temp_file):
            """Run main playwright script."""

            try:
                from playwright.async_api import (  # type: ignore[import-not-found] # noqa: PLC0415,
                    async_playwright,
                )
            except ModuleNotFoundError as e:
                msg = (
                    "Playwright is not installed to support browser-based (webhtml/webpdf) "
                    "conversion. Please install `nbconvert[webpdf]` to enable."
                )
                raise RuntimeError(msg) from e

            if self.allow_chromium_download:
                cmd = [sys.executable, "-m", "playwright", "install", "chromium"]
                subprocess.check_call(cmd)  # noqa: S603

            playwright = await async_playwright().start()
            chromium = playwright.chromium

            # Copy the configured list so that appending "--no-sandbox" does not
            # mutate the trait value and accumulate across repeated calls.
            args = list(self.browser_args)
            if self.disable_sandbox:
                args.append("--no-sandbox")

            try:
                browser = await chromium.launch(
                    handle_sigint=False, handle_sigterm=False, handle_sighup=False, args=args
                )
            except Exception as e:
                msg = (
                    "No suitable chromium executable found on the system. "
                    "Please use '--allow-chromium-download' to allow downloading one, "
                    "or install it using `playwright install chromium`."
                )
                await playwright.stop()
                raise RuntimeError(msg) from e

            page = await browser.new_page()
            await page.emulate_media(media="print")
            await page.wait_for_timeout(100)
            await page.goto(f"file://{temp_file.name}", wait_until="networkidle")
            await page.wait_for_timeout(100)

            data = await page.content()

            if _postprocess:
                # Reuse this code for webpdf
                data = await _postprocess(page, browser, playwright)

            await browser.close()
            await playwright.stop()
            return data

        pool = concurrent.futures.ThreadPoolExecutor()
        # Create a temporary file to pass the HTML code to Chromium:
        # Unfortunately, tempfile on Windows does not allow for an already open
        # file to be opened by a separate process. So we must close it first
        # before calling Chromium. We also specify delete=False to ensure the
        # file is not deleted after closing (the default behavior).
        temp_file = tempfile.NamedTemporaryFile(  # noqa: SIM115
            suffix=".html", delete=False
        )
        with temp_file:
            if isinstance(html, str):
                temp_file.write(html.encode("utf-8"))
            else:
                temp_file.write(html)
        try:
            html_data = pool.submit(asyncio.run, main(temp_file)).result()
        finally:
            # Ensure the file is deleted even if playwright raises an exception
            os.unlink(temp_file.name)
        return html_data

    def from_notebook_node(self, nb, resources=None, **kw):
        """Convert from a notebook node."""
        html, resources = super().from_notebook_node(nb, resources=resources, **kw)

        self.log.info("Building HTML")
        html_data = self.run_playwright(html)
        self.log.info("HTML successfully created")

        return html_data, resources
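The tail of ``run_playwright`` combines two techniques: the HTML is written to a ``NamedTemporaryFile(delete=False)`` that is closed before the browser opens it and removed in a ``finally`` block, and the coroutine is executed with ``asyncio.run`` on a worker thread, so the call also works when the caller is already inside a running event loop (for example a Jupyter kernel). The sketch below reproduces that pattern under the assumption that a hypothetical ``fake_render`` coroutine stands in for the playwright calls; it is an illustration, not nbconvert code.

```python
from __future__ import annotations

import asyncio
import concurrent.futures
import os
import tempfile
from pathlib import Path


async def fake_render(path: str) -> str:
    """Hypothetical stand-in for the playwright page load: read the file back."""
    await asyncio.sleep(0)  # yield to the event loop, as a real browser call would
    return Path(path).read_text(encoding="utf-8")


def render_via_temp_file(html: str | bytes) -> str:
    # delete=False keeps the file after it is closed, so another process
    # (Chromium, in webhtml.py) can open it, which Windows requires.
    temp_file = tempfile.NamedTemporaryFile(suffix=".html", delete=False)
    with temp_file:
        temp_file.write(html.encode("utf-8") if isinstance(html, str) else html)
    try:
        # asyncio.run creates a fresh event loop in the worker thread, which
        # avoids "asyncio.run() cannot be called from a running event loop".
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(asyncio.run, fake_render(temp_file.name)).result()
    finally:
        os.unlink(temp_file.name)  # clean up even if rendering raised


async def caller_inside_a_running_loop() -> str:
    # The synchronous helper still works from async code because the
    # coroutine runs on its own loop in another thread.
    return render_via_temp_file("<p>hello</p>")


print(asyncio.run(caller_inside_a_running_loop()))
```

Running the coroutine on a separate thread trades a little overhead for not having to know whether the caller already owns an event loop.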
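``browser_args`` is a configurable ``List`` trait, and ``disable_sandbox`` adds ``--no-sandbox`` to the launch arguments. Appending directly to ``self.browser_args`` would modify the configured trait value, so repeated conversions with the same exporter would accumulate duplicate flags. A minimal sketch of building the arguments from a copy instead; ``build_launch_args`` is a hypothetical helper, not part of nbconvert.

```python
from __future__ import annotations


def build_launch_args(browser_args: list[str], disable_sandbox: bool) -> list[str]:
    """Return Chromium launch arguments without mutating ``browser_args``."""
    args = list(browser_args)  # copy, so the caller's configured list is untouched
    # Skip the flag if the user already passed it explicitly.
    if disable_sandbox and "--no-sandbox" not in args:
        args.append("--no-sandbox")
    return args


configured = ["--disable-gpu"]
first = build_launch_args(configured, disable_sandbox=True)
second = build_launch_args(configured, disable_sandbox=True)
print(first, second, configured)
```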