v0.2.2
What's Changed
- [Benchmark] Update welford torch.compile function name by @yf225 in #1029
- chore: Bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #1030
- chore: Bump actions/download-artifact from 5 to 6 by @dependabot[bot] in #1031
- [Benchmark CI] Set welford num_inputs to 6 to avoid timeout by @yf225 in #1032
- Default config: reduce block_size and num_stages to avoid shared mem OOM by @yf225 in #1033
- Default config: reduce block_size further to avoid shared mem OOM by @yf225 in #1034
- Disable autotuner progress bar in fbcode unit test by @yf225 in #1035
- Always print cached config by @oulgen in #1036
- Fix dtype mismatch error in se_block example by @yf225 in #1040
- Upgrade clang version by @oulgen in #1043
- Fix missing static_shapes=False in deployment_autotuning.md by @jansel in #1042
- Fix matmul output dtype to match PyTorch eager behavior by @yf225 in #1044
- Fix layernorm bwd unit test by @yf225 in #1047
- Fix FlattenedTileStrategy to handle unit-sized block dimensions by @yf225 in #1048
- [CI] Fix debug_str() to be compatible with latest PyTorch nightly by @yf225 in #1050
- [Fix upcoming CI error] Set current node in inductor lowering by @yf225 in #1052
- Remove Section Navigation pane from Deployment and Autotuning page. by @choijon5 in #1051
- Add `settings.autotune_baseline_fn` to allow passing in custom baseline function to autotuner by @yf225 in #1054
- Add `HELION_PRINT_REPRO=1` to print Helion kernel repro script to console by @yf225 in #1049
- Fix caching for CPUs by @oulgen in #1055
- Add get_num_sm for cpu by @oulgen in #1056
- Support torch.rand / torch.rand_like with dynamic tile sizes by @yf225 in #1057
- Remove line numbers from expected files by @oulgen in #1061
- Ignore passed in config when force autotune is turned on by @oulgen in #1060
- Update Watch Talk link to Triton conf talk. by @choijon5 in #1058
- Helion Puzzle docs bug fixes by @Athe-kunal in #1062
- Update test_persistent_kernels.expected by @jansel in #1070
- Make HELION_PRINT_REPRO=1 take effect in more error cases by @yf225 in #1066
- add geglu backward by @parsshar-RH in #1069
- [Unblock internal] Fix log capture issue on internal tests by @yf225 in #1076
- Add best effort triton-cpu support by @oulgen in #1037
- Update test_debug_utils.py by @oulgen in #1077
- Raise user error if device-loop is empty after DCE by @yf225 in #1074
- Add GRPO loss example by @ighoshsubho in #1063
- Use HELION_PRINT_REPRO=1 to print repro when device IR lowering or Triton codegen error by @yf225 in #1078
- add AMD demo link by @vivienfanghuagood in #1068
- Update test.yml by @oulgen in #1083
- Fix GRPO loss example unit tests by @yf225 in #1079
- Remove requirements.txt by @oulgen in #1088
- Relax requirements for inline_triton output_like=None by @jansel in #1087
- feat(autotuner): Make autotune cache class configurable via env var by @fulvius31 in #1071
- Add support for while and pass by @jansel in #1090
- Update sphinxtheme to pull from pypi package by @sekyondaMeta in #1091
- [Autotuner] Better error message for default config error by @yf225 in #1092
- Ignore illegal instruction errors by @jansel in #1093
- Update talk links to PTC version by @jansel in #1094
- Add autotuning log by @jansel in #1095
- Fix builtin min / max handling in device loop by @yf225 in #1085
- Add skipIfRocm to failing test on main by @jansel in #1101
- Fix lint in newer triton by @jansel in #1098
- Add AGENTS.md by @jansel in #1100
- Refactor _decorators.codegen to allow multiple backends by @jansel in #1099
- Add extra line before repro log; update repro log tests by @yf225 in #1102
- Refactor inductor_lowering.py into two files by @jansel in #1103
- Use CPU machine for triton-cpu by @oulgen in #1105
- Fix no libdw.so issue on AMD CI by @yf225 in #1107
- Fixes in helion puzzles by @Athe-kunal in #1104
- Add distributed CI job (4xH100) and example unit tests by @yf225 in #1106
- Generalize aten_lowering.py for multiple backends by @jansel in #1108
- Support tensor.T for transpose by @yf225 in #1110
- Add warning to discourage use of `acc += lhs @ rhs` pattern by @yf225 in #1111
- Remove `@helion.jit` usage and advise use of `@helion.kernel` by @yf225 in #1116
New Contributors
- @Athe-kunal made their first contribution in #1062
- @parsshar-RH made their first contribution in #1069
- @ighoshsubho made their first contribution in #1063
- @vivienfanghuagood made their first contribution in #1068
- @fulvius31 made their first contribution in #1071
Full Changelog: v0.2.1...v0.2.2