Add datasets installation command #1057
base: main
Conversation
Signed-off-by: George Armstrong <[email protected]>
@coderabbitai review

✅ Actions performed: Review triggered.
Walkthrough

This PR introduces dataset-specific installation command support. A new utility function retrieves installation commands defined by datasets, aggregates them with user-provided commands, and ensures they are executed before pipeline tasks. The datasets package version constraint is updated in requirements.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Pipeline as Pipeline Entry<br/>(prepare_data/eval)
    participant Utils as Dataset Utils
    participant Dataset as Dataset Module<br/>(e.g., livecodebench)
    participant Aggregator as Command<br/>Aggregator
    participant Executor as Task<br/>Executor
    Pipeline->>Utils: get_dataset_installation_command(name)
    Utils->>Dataset: import & access INSTALLATION_COMMAND
    Dataset-->>Utils: installation_command (if defined)
    Utils-->>Pipeline: installation_command or None
    Pipeline->>Aggregator: collect per-dataset commands
    Aggregator->>Aggregator: join with " && "
    Note over Aggregator: aggregated_dataset_installation_command
    Pipeline->>Pipeline: merge with user_installation_command
    Note over Pipeline: final_installation_command
    Pipeline->>Executor: pass final_installation_command
    Executor->>Executor: quote with shlex.quote()
    Executor->>Executor: execute via bash -c
    Executor-->>Pipeline: success/failure
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed).
Actionable comments posted: 1
♻️ Duplicate comments (1)
nemo_skills/dataset/livecodebench/__init__.py (1)
21-23: livecodebench INSTALLATION_COMMAND clashes with global `datasets>=4.0.0`

This dataset-level command (`pip install 'datasets<4'`) conflicts with the project-wide `datasets>=4.0.0` requirement in `requirements/main.txt`. In a shared env this will either fail pip resolution or leave you running code built for `datasets>=4` on an older `<4` version.

Please align this command with the global requirement (or segregate environments) so that running livecodebench's prepare/eval doesn't destabilize `datasets` for the rest of the pipelines. See also the comment in `requirements/main.txt`.
🧹 Nitpick comments (4)
nemo_skills/pipeline/utils/exp.py (1)
380-390: Quoting fix in `install_packages_wrap` looks correct; env UUID export is redundant

Using `shlex.quote(installation_command)` and executing via `bash -c {quoted_command}` correctly treats the full installation command as a single `-c` script and avoids shell parsing issues for `<`, `>`, `&&`, etc. This is a solid improvement over inlining the raw command.

If you want to trim dead code, `setup_env` currently exports `NEMO_SKILLS_JOB_UUID` but that variable is never read (the UUID is only embedded into `lock_file`). You can either drop the export or refactor the lock logic to actually derive the path from the env var to make the comment accurate.

nemo_skills/dataset/utils.py (1)
241-256: Helper is straightforward; consider documenting name expectations

`get_dataset_installation_command` is simple and robust (graceful on missing modules/attrs). The only subtle assumption is that `dataset_name` matches the Python module path under `nemo_skills.dataset` (e.g., `livecodebench` or `code.livecodebench`), so datasets coming from `extra_datasets` or with different CLI aliases won't participate in this mechanism.

If you expect non-core or aliased datasets to define `INSTALLATION_COMMAND` in the future, it's worth documenting this expectation in the docstring (and possibly threading `data_dir`/`extra_datasets_type` through later) so callers know which names will work.

nemo_skills/pipeline/prepare_data.py (1)
21-21: Dataset installation aggregation works; heuristics are a bit loose but safe

The integration is sound:
- `prepare_data` now discovers per-dataset `INSTALLATION_COMMAND`s via `get_dataset_installation_command(...)`,
- aggregates them into a single `" && "`-joined string,
- and passes that as `installation_command` into `_run_cmd`, so the new exp-level wrapper will run them once per job before the main prepare command.

A couple of minor points you may want to refine later:
- The heuristic "any `ctx.args` entry that does not start with `-` is a dataset name" will also see values of flags (e.g., `--split test` → `test`) and Hydra-style extras; these produce a harmless `ModuleNotFoundError` → `None`, but add a few wasted import attempts.
- If the same dataset is specified multiple times, its `INSTALLATION_COMMAND` will be repeated in the aggregated string; you could mirror the deduplication you already do in `prepare_eval_commands`.

Neither is functionally blocking given the current defensive implementation.

Also applies to: 94-112, 169-190
nemo_skills/pipeline/utils/eval.py (1)

24-25: Per-benchmark installation command aggregation is consistent with eval's usage

This wiring looks correct:
- `dataset_installation_commands` is built while enumerating benchmarks/groups, using `get_dataset_installation_command(benchmark)` and deduping by command string.
- You then aggregate once at the end (and in the `num_jobs == 0` early return) into `aggregated_dataset_installation_command` joined with `" && "`, which matches how `eval.py` expects to combine and forward it.
- Importing `get_dataset_module` and `import_from_path` from `nemo_skills.dataset.utils` also lines up with their definitions there, so call sites here stay consistent.

The only assumption worth keeping in mind is that `benchmark_args.name` matches the dataset module path under `nemo_skills.dataset`; benchmarks coming purely from `extra_datasets` won't currently participate in this mechanism unless they're also importable under that namespace, which is a reasonable first step.

Also applies to: 295-339, 370-377, 508-514
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- nemo_skills/dataset/livecodebench/__init__.py (1 hunks)
- nemo_skills/dataset/utils.py (1 hunks)
- nemo_skills/pipeline/eval.py (7 hunks)
- nemo_skills/pipeline/prepare_data.py (3 hunks)
- nemo_skills/pipeline/utils/eval.py (5 hunks)
- nemo_skills/pipeline/utils/exp.py (1 hunks)
- requirements/main.txt (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-23T17:56:57.556Z
Learnt from: Glorf
Repo: NVIDIA-NeMo/Skills PR: 908
File: requirements/main.txt:16-16
Timestamp: 2025-11-23T17:56:57.556Z
Learning: faiss-cpu must be explicitly listed in requirements/main.txt for BFCLv4 memory evaluations (memory_kv, memory_vector, memory_rec_sum) as it is an optional dependency of sentence_transformers that is required for vector similarity search functionality in the memory backends.
Applied to files:
requirements/main.txt
🧬 Code graph analysis (2)
nemo_skills/pipeline/prepare_data.py (1)
nemo_skills/dataset/utils.py (1)
get_dataset_installation_command (241-256)
nemo_skills/pipeline/utils/eval.py (1)
nemo_skills/dataset/utils.py (1)
get_dataset_installation_command (241-256)
🔇 Additional comments (1)
nemo_skills/pipeline/eval.py (1)
474-497: Installation-command plumbing is consistent; no backward-compatibility issues found

The implementation is sound:
- `prepare_eval_commands` returns exactly 3 values in all code paths: `(benchmarks_dict, job_batches, aggregated_dataset_installation_command)`.
- The single call site at line 474 correctly unpacks all 3 values.
- The merged `final_installation_command` flows consistently to all eval-related tasks (main eval, NVEmbed judge, LLM judge, summarize, and group-score).

No additional call sites to update.
requirements/main.txt:

```diff
 bs4
 click < 8.2.0  # https://github.com/ai-dynamo/dynamo/issues/1039
-datasets<4  # lcb problem with datasets 4.0.0
+datasets>=4.0.0
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Check requirements/main.txt for datasets constraints
echo "=== datasets constraints in requirements/main.txt ==="
rg -n "datasets" requirements/main.txt

# Check all requirements files
echo -e "\n=== datasets constraints in all requirements files ==="
fd -e txt . requirements/ | while read -r f; do
  echo "--- $f ---"
  rg -n "datasets" "$f"
done

# Find INSTALLATION_COMMAND uses
echo -e "\n=== INSTALLATION_COMMAND uses ==="
rg -n "INSTALLATION_COMMAND" nemo_skills

# Check livecodebench __init__.py specifically
echo -e "\n=== livecodebench/__init__.py content ==="
cat -n nemo_skills/dataset/livecodebench/__init__.py
```

Repository: NVIDIA-NeMo/Skills

Length of output: 1116
🏁 Script executed:

```shell
#!/bin/bash
# Get full livecodebench/__init__.py
echo "=== Full livecodebench/__init__.py ==="
cat nemo_skills/dataset/livecodebench/__init__.py

# Also check if there are any other datasets version constraints in other config files
echo -e "\n=== Search for datasets in setup files ==="
rg -i "datasets" setup.py pyproject.toml 2>/dev/null || true
```

Repository: NVIDIA-NeMo/Skills

Length of output: 1240
Reconcile conflicting datasets version constraints
`requirements/main.txt` pins `datasets>=4.0.0` (line 17), while `nemo_skills/dataset/livecodebench/__init__.py` defines `INSTALLATION_COMMAND = "pip install 'datasets<4'"` (line 23). These version ranges are mutually exclusive. When prepare/eval jobs for livecodebench execute, the `INSTALLATION_COMMAND` will attempt to install `datasets<4`, creating an unsatisfiable constraint conflict with the main requirements or causing a downgrade that violates project expectations.
Align both constraints to the same supported range, or move the legacy requirement behind a separate environment/extra to prevent installation failures or incompatible datasets versions during livecodebench operations.
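For the first option (aligning the ranges), the dataset-level command would simply mirror the global pin. This is only an illustrative fix matching the review's own suggestion, not merged code:

```python
# nemo_skills/dataset/livecodebench/__init__.py (illustrative fix, not merged code)
# Keeps the per-dataset command inside the range requirements/main.txt allows.
INSTALLATION_COMMAND = "pip install 'datasets>=4.0.0,<5'"
```

This keeps `requirements/main.txt` authoritative while still letting livecodebench declare an explicit installation step; whether livecodebench actually works with datasets 4.x is not verified here.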
🤖 Prompt for AI Agents
In requirements/main.txt line 17 and
nemo_skills/dataset/livecodebench/__init__.py around line 23, the project pins
conflicting datasets versions (requirements: datasets>=4.0.0; livecodebench
INSTALLATION_COMMAND: pip install 'datasets<4'), causing impossible installs;
fix by making the constraints consistent: either update
nemo_skills/dataset/livecodebench/__init__.py to use a compatible range such as
"pip install 'datasets>=4.0.0,<5'" (or simply "pip install 'datasets>=4.0.0'")
OR move the legacy datasets<4 requirement into a separate extra/environment for
livecodebench and remove the hard-coded INSTALLATION_COMMAND so main
requirements remain authoritative; apply one of these changes and run dependency
resolution to confirm no conflicts.