@dezhiAmd dezhiAmd commented Dec 1, 2025

Motivation

Introduces a memory monitoring system to diagnose out-of-memory (OOM) issues during CI builds.

Technical Details

The implementation includes monitoring tools, analysis utilities, and example integrations.

#1. memory_monitor.py tracks system and process memory usage while the build runs.

#2. Run the command below to analyze the results after the build:
python build_tools/analyze_memory_logs.py --detailed --github-summary
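
For readers unfamiliar with this kind of tool, a background memory sampler can be sketched roughly as follows. This is not the code in memory_monitor.py: the `MemorySampler` class, the JSON-lines log format, the field names, and the use of the third-party psutil package are all assumptions made for illustration.

```python
import json
import threading
from datetime import datetime


def read_memory_percent():
    """Return (memory_percent, swap_percent). Assumes psutil is installed."""
    import psutil  # third-party; an assumption, not confirmed by this PR

    return psutil.virtual_memory().percent, psutil.swap_memory().percent


class MemorySampler:
    """Hypothetical sampler that appends one JSON line per sample to a log file."""

    def __init__(self, log_path, phase, interval=10.0, reader=read_memory_percent):
        self.log_path = log_path
        self.phase = phase
        self.interval = interval
        self.reader = reader
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        with open(self.log_path, "a", encoding="utf-8") as log:
            while True:
                mem, swap = self.reader()
                record = {
                    "timestamp": datetime.now().isoformat(),
                    "phase": self.phase,
                    "memory_percent": mem,
                    "swap_percent": swap,
                }
                log.write(json.dumps(record) + "\n")
                log.flush()
                # wait() returns True as soon as a stop has been requested.
                if self._stop.wait(self.interval):
                    break

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        return False
```

Wrapping a build phase in `with MemorySampler("memory-logs/build.jsonl", "Build Target: therock-archives"):` would then record samples for the duration of that block; the path shown here is only an example.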

Test Plan

#1. Pick a CI job such as "CI / Windows::gfx110X-all::release / Build Artifacts / Build release (xfail false) (pull_request)",
click "Rerun with debug enabled",
and check that a folder named memory-logs is created inside the build folder.
#2. Run the command below to analyze the memory usage:
python build_tools\analyze_memory_logs.py --detailed --github-summary

Test Result

A typical result looks like this:

[*] Found 1 log file(s)
================================================================================
MEMORY USAGE ANALYSIS REPORT
================================================================================

Total phases analyzed: 1

SUMMARY (Sorted by Peak Memory Usage)
--------------------------------------------------------------------------------
Phase                                    Peak         Avg          Severity
--------------------------------------------------------------------------------
[~]  Build Target: therock-archives        76.6%        47.9%        MEDIUM
--------------------------------------------------------------------------------

DETAILED BREAKDOWN
================================================================================

Phase: Build Target: therock-archives
  Duration: 11779.8s
  Samples: 1179
  Memory Usage:
    Average: 47.9%
    Peak: 76.6% (24.44 GB)
    Range: 27.8% - 76.6%
  Swap Usage:
    Average: 2.8%
    Peak: 3.1%
  Time Range:
    Start: 2025-12-01T12:25:19.070405
    End: 2025-12-01T15:41:38.854486
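
The figures in a report like this can be derived from the raw samples with a few aggregate operations. The sketch below is an illustration, not the code in analyze_memory_logs.py: the field names, the `classify` and `summarize_phase` helpers, the severity thresholds, and every label other than MEDIUM are assumptions. The report itself implies roughly one sample every 10 seconds (11779.8 s / 1179 samples ≈ 10 s).

```python
from statistics import mean


def classify(peak_percent, medium_at=70.0, high_at=85.0, critical_at=95.0):
    """Map a peak memory percentage to a label. Thresholds are hypothetical."""
    if peak_percent >= critical_at:
        return "CRITICAL"
    if peak_percent >= high_at:
        return "HIGH"
    if peak_percent >= medium_at:
        return "MEDIUM"
    return "LOW"


def summarize_phase(samples):
    """Aggregate a list of dicts with 'memory_percent' and 'swap_percent' keys."""
    mem = [s["memory_percent"] for s in samples]
    swap = [s["swap_percent"] for s in samples]
    return {
        "samples": len(samples),
        "avg_memory": round(mean(mem), 1),
        "peak_memory": max(mem),
        "min_memory": min(mem),
        "avg_swap": round(mean(swap), 1),
        "peak_swap": max(swap),
        "severity": classify(max(mem)),
    }
```

With these assumed thresholds, a phase peaking at 76.6% would be labelled MEDIUM, which matches the sample report, but the real cut-offs may differ.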

Submission Checklist

dezhiAmd and others added 10 commits November 20, 2025 18:36
…ent, both Windows and Linux builds will now include memory monitoring

Signed-off-by: dezhliao <[email protected]>

Copilot AI left a comment

Pull request overview

This PR introduces a comprehensive memory monitoring system to help diagnose out-of-memory (OOM) issues during CI builds. The implementation includes monitoring tools, analysis utilities, and example integrations.

Key Changes:

  • Added memory monitoring infrastructure with memory_monitor.py for real-time tracking and analyze_memory_logs.py for post-build analysis
  • Created wrapper script memory_wrapped_build.py to integrate monitoring with build commands
  • Integrated optional memory monitoring into Windows and Linux build workflows via debug flag
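
The wrapper idea in the second bullet can be illustrated with a small sketch that runs a command and takes a sample at a fixed interval until the command exits. This is not the contents of memory_wrapped_build.py; `run_with_monitoring` and its parameters are hypothetical names chosen for the example.

```python
import subprocess


def run_with_monitoring(command, sample, interval=10.0):
    """Run `command`, calling `sample()` every `interval` seconds until it exits.

    `sample` is any zero-argument callable (hypothetical hook).
    Returns (exit_code, samples).
    """
    samples = []
    process = subprocess.Popen(command)
    while True:
        samples.append(sample())
        try:
            # Blocks for up to `interval` seconds, returning early on exit.
            exit_code = process.wait(timeout=interval)
            break
        except subprocess.TimeoutExpired:
            continue
    return exit_code, samples
```

Returning the child's exit code lets a caller pass it on unchanged, so a failing build would still fail the CI step when wrapped this way.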

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| docs/development/memory-monitoring-example-integration.yml | Example workflow demonstrating comprehensive memory monitoring integration |
| build_tools/memory_monitor.py | Core monitoring module that tracks system and process memory usage |
| build_tools/github_actions/memory_wrapped_build.py | Wrapper script for executing build commands with memory monitoring |
| build_tools/analyze_memory_logs.py | Analysis tool for processing and reporting on memory logs |
| build_tools/tests/test_memory_monitor.py | Test suite for memory monitoring functionality |
| .github/workflows/build_windows_artifacts.yml | Windows build workflow with conditional memory monitoring |
| .github/workflows/build_portable_linux_artifacts.yml | Linux build workflow with conditional memory monitoring |

@dezhiAmd dezhiAmd marked this pull request as ready for review December 2, 2025 00:15

@geomin12 geomin12 left a comment

The scripts look fantastic! As I mentioned in the comments, we want to keep the build step and logging separate.

My thought is: let's land these scripts after some review (and keep cmake as a separate step in the workflows), then I can work on uploading the logs to S3 in another PR.

Maybe let's add a README with just the analysis steps? I don't think we need to include the entire yml file.

max_swap_percent,
)

def write_github_summary(

What I may do instead of write_github_summary: write to a file and upload it to S3 periodically.

Then at the end (if successful), we have a link to that log file on S3. I think writing to the GitHub summary may be bulky and too much.
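
One way the periodic-upload idea could look is sketched below. It is only an illustration of the proposal, not planned code: `PeriodicUploader` is a hypothetical helper, and the `upload` callable is left to the caller (for S3 it might wrap something like boto3's `upload_file`, which is an assumption here).

```python
import threading


class PeriodicUploader:
    """Hypothetical helper: call `upload(path)` every `interval` seconds."""

    def __init__(self, path, upload, interval=60.0):
        self.path = path
        self.upload = upload  # caller-supplied callable taking a file path
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # wait() returns False on timeout, so upload once per interval
        # until stop() sets the event.
        while not self._stop.wait(self.interval):
            self.upload(self.path)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        # Final upload so the most recent samples are not lost.
        self.upload(self.path)
```

Because the upload function is injected, the same loop can be exercised in tests with a stub and pointed at a real bucket only in CI.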

@dezhiAmd dezhiAmd closed this Dec 2, 2025
@github-project-automation github-project-automation bot moved this from TODO to Done in TheRock Triage Dec 2, 2025
@dezhiAmd dezhiAmd reopened this Dec 2, 2025
@dezhiAmd dezhiAmd moved this from Done to TODO in TheRock Triage Dec 2, 2025
