Skip to content

Conversation

@bryanvaz
Copy link

@bryanvaz bryanvaz commented Oct 18, 2024

Fixes issue where \n characters are randomly inserted into the console for stdout messages. This results in non-deterministic output breaking any downstream process.

Connected to #433, #496
Closes #765

Current Behaviour:

When there is significant execution time or async activity between stdout/stderr calls in an output cell, the default logger inserts newline terminators to the papermill console output. This output is different than output as executed in a live notebook client. (See below for screenshots)

Example 1:
Notebook Output:
Screenshot 2024-10-18 at 3 17 11 AM
Console Output:
Screenshot 2024-10-18 at 3 19 02 AM

=============

Example 2:
Notebook Output:

Screenshot 2024-10-18 at 3 25 31 AM
Console Output:
Screenshot 2024-10-18 at 3 24 30 AM

=============

Example 3:
Notebook Output:

Screenshot 2024-10-18 at 3 27 04 AM
Console Output:
Screenshot 2024-10-18 at 3 27 32 AM

Expected Behaviour:

When using the --log-output option, especially when using the cli, the cell output text from papermill should match the text output when running a notebook in jupyter (or other .ipynb client)

Fix:

Added a secondary logger notebook_logger to log.py that removes the extraneous terminator from the stream handlers. The notebook logger is then only used by cell outputs, leaving the default papermill logger to handle all papermill messages.

Users can override this behaviour when using papermill as a library by either setting the log.notebook_logger to the default log.loggeror modifying the parameters when instantiating a Papermill client.

Output with fix:
Screenshot 2024-10-18 at 3 47 10 AM

Reasoning behind Fix

The papermill client was using the default log.info of the default notebook client for all stdout messages from a notebook when the explicit —log-output option is used. This calls the default logger which is built as a logger for syslog. As a result, the log formatter will automatically add \n when flushing.

In Jupyter, the input cell’s stdout and stderr calls are captured and redirected to the notebook’s output cells without additional \n characters added in order to preserve the intended output formatting of the notebook’s author. If the author needs to modify their output to ensure it is compatible with whatever downstream tool/process, they should be responsible for making those changes in the notebook itself and trusting that papermill will not alter the cell’s output.

Unfortunately this does not fix the same issue in the underlying nbclient. As a result, the cell output in the output notebook is still different than if you were to run the notebook in a live system (additional newline characters.)

Excerpt from python logging that is causing the issue:

class StreamHandler(Handler):
…
    terminator = '\n'def emit(self, record):
        """
        Emit a record.

        If a formatter is specified, it is used to format the record.
        The record is then written to the stream with a trailing newline.  If
        exception information is present, it is formatted using
        traceback.print_exception and appended to the stream.  If the stream
        has an 'encoding' attribute, it is used to determine how to do the
        output to the stream.
        """
        try:
            msg = self.format(record)
            stream = self.stream
            # issue 35046: merged two stream.writes into one.
            stream.write(msg + self.terminator) # <-- `\n` terminator is added to every call of log._log
            self.flush()

@codecov
Copy link

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 9 lines in your changes missing coverage. Please review.

Project coverage is 90.87%. Comparing base (cb2eb37) to head (2ec6897).
Report is 13 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #810      +/-   ##
==========================================
- Coverage   91.54%   90.87%   -0.68%     
==========================================
  Files          17       17              
  Lines        1621     1654      +33     
==========================================
+ Hits         1484     1503      +19     
- Misses        137      151      +14     

Comment on lines 435 to +439
with patch.object(logger, 'info') as info_mock:
with patch.object(logger, 'warning') as warning_mock:
with patch.object(NotebookExecutionManager, 'save'):
NBClientEngine.execute_notebook(
self.nb,
'python',
output_path='foo.ipynb',
progress_bar=False,
log_output=True,
)
info_mock.assert_has_calls(
[
call('Executing notebook with kernel: python'),
call('Executing Cell 1---------------------------------------'),
call('Ending Cell 1------------------------------------------'),
call('Executing Cell 2---------------------------------------'),
call('None\n'),
call('Ending Cell 2------------------------------------------'),
]
)
warning_mock.is_not_called()
with patch.object(notebook_logger, 'info') as notebook_info_mock:
with patch.object(notebook_logger, 'warning') as notebook_warning_mock:
with patch.object(NotebookExecutionManager, 'save'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with patch.object(logger, 'info') as info_mock:
with patch.object(logger, 'warning') as warning_mock:
with patch.object(NotebookExecutionManager, 'save'):
NBClientEngine.execute_notebook(
self.nb,
'python',
output_path='foo.ipynb',
progress_bar=False,
log_output=True,
)
info_mock.assert_has_calls(
[
call('Executing notebook with kernel: python'),
call('Executing Cell 1---------------------------------------'),
call('Ending Cell 1------------------------------------------'),
call('Executing Cell 2---------------------------------------'),
call('None\n'),
call('Ending Cell 2------------------------------------------'),
]
)
warning_mock.is_not_called()
with patch.object(notebook_logger, 'info') as notebook_info_mock:
with patch.object(notebook_logger, 'warning') as notebook_warning_mock:
with patch.object(NotebookExecutionManager, 'save'):
with patch.object(logger, 'info') as info_mock, patch.object(logger, 'warning') as warning_mock, patch.object(notebook_logger, 'info') as notebook_info_mock, patch.object(notebook_logger, 'warning') as notebook_warning_mock, patch.object(NotebookExecutionManager, 'save'):

consider reducing the depth

@Borda Borda requested a review from Copilot November 13, 2025 21:01
Copilot finished reviewing on behalf of Borda November 13, 2025 21:04
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an issue where newline characters (\n) were randomly inserted into console output for stdout/stderr messages, causing non-deterministic output. The fix introduces a separate notebook_logger that uses a custom NbOutputStreamHandler without the automatic newline terminator, ensuring cell output matches the notebook's intended formatting.

Key Changes:

  • Created a new notebook_logger with custom stream handler that doesn't append newline terminators
  • Updated PapermillNotebookClient to accept a log_cell parameter for logging cell output separately from papermill messages
  • Modified tests to mock both the default logger and the new notebook logger appropriately

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
papermill/log.py Introduces notebook_logger and NbOutputStreamHandler class that writes messages without automatic newline terminators
papermill/engines.py Passes notebook_logger as the log_cell parameter when instantiating PapermillNotebookClient
papermill/clientwrap.py Adds log_cell parameter to constructor and uses it for logging cell output instead of the default logger
papermill/tests/test_engines.py Updates tests to mock both logger and notebook_logger separately and verify correct usage
papermill/tests/test_clientwrap.py Updates tests to use notebook_logger for cell output assertions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +477 to +480
info_mock.is_not_called()
warning_mock.is_not_called()
notebook_info_mock.is_not_called()
notebook_warning_mock.is_not_called()
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_not_called() is not a valid assertion method. These should be assert_not_called() instead. The current code doesn't actually assert anything, which means the test will pass even if these mocks are called.

Copilot uses AI. Check for mistakes.
elif self.log_output and ("data" in output and "text/plain" in output.data):
self.log.info("".join(output.data['text/plain']))
self.log_cell.info("".join(output.data['text/plain']))
# self.log.info("".join(output.data['text/plain']))
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commented-out code should be removed. Leftover commented code reduces code clarity and maintainability.

Suggested change
# self.log.info("".join(output.data['text/plain']))

Copilot uses AI. Check for mistakes.
Comment on lines +461 to +462
warning_mock.is_not_called()
notebook_warning_mock.is_not_called()
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_not_called() is not a valid assertion method. This should be assert_not_called() instead. The current code doesn't actually assert anything, which means the test will pass even if these mocks are called.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Does tqdm print properly with --log-output?

2 participants