v0.2.2

@oulgen released this 12 Nov 18:58
51580b4

What's Changed

  • [Benchmark] Update welford torch.compile function name by @yf225 in #1029
  • chore: Bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #1030
  • chore: Bump actions/download-artifact from 5 to 6 by @dependabot[bot] in #1031
  • [Benchmark CI] Set welford num_inputs to 6 to avoid timeout by @yf225 in #1032
  • Default config: reduce block_size and num_stages to avoid shared mem OOM by @yf225 in #1033
  • Default config: reduce block_size further to avoid shared mem OOM by @yf225 in #1034
  • Disable autotuner progress bar in fbcode unit test by @yf225 in #1035
  • Always print cached config by @oulgen in #1036
  • Fix dtype mismatch error in se_block example by @yf225 in #1040
  • Upgrade clang version by @oulgen in #1043
  • Fix missing static_shapes=False in deployment_autotuning.md by @jansel in #1042
  • Fix matmul output dtype to match PyTorch eager behavior by @yf225 in #1044
  • Fix layernorm bwd unit test by @yf225 in #1047
  • Fix FlattenedTileStrategy to handle unit-sized block dimensions by @yf225 in #1048
  • [CI] Fix debug_str() to be compatible with latest PyTorch nightly by @yf225 in #1050
  • [Fix upcoming CI error] Set current node in inductor lowering by @yf225 in #1052
  • Remove Section Navigation pane from Deployment and Autotuning page. by @choijon5 in #1051
  • Add settings.autotune_baseline_fn to allow passing in custom baseline function to autotuner by @yf225 in #1054
  • Add HELION_PRINT_REPRO=1 to print Helion kernel repro script to console by @yf225 in #1049
  • Fix caching for CPUs by @oulgen in #1055
  • Add get_num_sm for cpu by @oulgen in #1056
  • Support torch.rand / torch.rand_like with dynamic tile sizes by @yf225 in #1057
  • Remove line numbers from expected files by @oulgen in #1061
  • Ignore passed in config when force autotune is turned on by @oulgen in #1060
  • Update Watch Talk link to Triton conf talk. by @choijon5 in #1058
  • Helion Puzzle docs bug fixes by @Athe-kunal in #1062
  • Update test_persistent_kernels.expected by @jansel in #1070
  • Make HELION_PRINT_REPRO=1 take effect in more error cases by @yf225 in #1066
  • add geglu backward by @parsshar-RH in #1069
  • [Unblock internal] Fix log capture issue on internal tests by @yf225 in #1076
  • Add best effort triton-cpu support by @oulgen in #1037
  • Update test_debug_utils.py by @oulgen in #1077
  • Raise user error if device-loop is empty after DCE by @yf225 in #1074
  • Add GRPO loss example by @ighoshsubho in #1063
  • Use HELION_PRINT_REPRO=1 to print repro when device IR lowering or Triton codegen error by @yf225 in #1078
  • Add AMD demo link by @vivienfanghuagood in #1068
  • Update test.yml by @oulgen in #1083
  • Fix GRPO loss example unit tests by @yf225 in #1079
  • Remove requirements.txt by @oulgen in #1088
  • Relax requirements for inline_triton output_like=None by @jansel in #1087
  • feat(autotuner): Make autotune cache class configurable via env var by @fulvius31 in #1071
  • Add support for while and pass by @jansel in #1090
  • Update sphinxtheme to pull from pypi package by @sekyondaMeta in #1091
  • [Autotuner] Better error message for default config error by @yf225 in #1092
  • Ignore illegal instruction errors by @jansel in #1093
  • Update talk links to PTC version by @jansel in #1094
  • Add autotuning log by @jansel in #1095
  • Fix builtin min / max handling in device loop by @yf225 in #1085
  • Add skipIfRocm to failing test on main by @jansel in #1101
  • Fix lint in newer triton by @jansel in #1098
  • Add AGENTS.md by @jansel in #1100
  • Refactor _decorators.codegen to allow multiple backends by @jansel in #1099
  • Add extra line before repro log; update repro log tests by @yf225 in #1102
  • Refactor inductor_lowering.py into two files by @jansel in #1103
  • Use CPU machine for triton-cpu by @oulgen in #1105
  • Fix no libdw.so issue on AMD CI by @yf225 in #1107
  • Fixes in helion puzzles by @Athe-kunal in #1104
  • Add distributed CI job (4xH100) and example unit tests by @yf225 in #1106
  • Generalize aten_lowering.py for multiple backends by @jansel in #1108
  • Support tensor.T for transpose by @yf225 in #1110
  • Add warning to discourage use of acc += lhs @ rhs pattern by @yf225 in #1111
  • Remove @helion.jit usage and advise use of @helion.kernel by @yf225 in #1116

New Contributors

Full Changelog: v0.2.1...v0.2.2