TPDE: A Fast Adaptable Compiler Back-End Framework

(arxiv.org)

60 points | by npalli 1 day ago

6 comments

npalli 1 day ago
Source code for the framework
https://github.com/tpde2/tpde

MaskRay 9 hours ago

Build instructions

In the llvm/llvm-project repository

    git switch --detach origin/release/19.x
    cmake -GNinja -S. -B/tmp/out/custom -DLLVM_TARGETS_TO_BUILD='X86;AArch64' -DLLVM_ENABLE_PROJECTS=clang -DLLVM_ENABLE_PLUGINS=off -DCMAKE_BUILD_TYPE=Release -DLLVM_LINK_LLVM_DYLIB=on
    # consider -DCLANG_ENABLE_OBJC_REWRITER=off -DCLANG_ENABLE_STATIC_ANALYZER=off -DCLANG_ENABLE_ARCMT=off -DCLANG_PLUGIN_SUPPORT=off
    ninja -C /tmp/out/custom clang LLVM FileCheck   # build clang and libLLVM.so and test utilities

In the tpde repository

    git submodule update --init
    cmake -GNinja -S. -Bout/debug -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=on -DCMAKE_PREFIX_PATH=/tmp/out/custom -DCMAKE_CXX_COMPILER=$HOME/Stable/bin/clang++ -DCMAKE_C_COMPILER=$HOME/Stable/bin/clang


There are some failures:

    % /tmp/out/custom/bin/llvm-lit out/debug/tpde/test/filetest
    ...
    Failed Tests (5):
      TPDE FileTests :: codegen/eh-frame-arm64.tir
      TPDE FileTests :: codegen/eh-frame-x64.tir
      TPDE FileTests :: codegen/simple_ret.tir
      TPDE FileTests :: codegen/tbz.tir
      TPDE FileTests :: tir/duplicate_funcs.tir


aengelke 8 hours ago
These are tests that use some more LLVM tools (llvm-objdump, llvm-dwarfdump, not). Could you try after building these tools in addition to FileCheck? Do the TPDE-LLVM tests, which use the same tools, pass with this setup?
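A minimal POSIX sh sketch for checking which of the tools named in the reply above are still missing from the LLVM build directory used in the build instructions (`/tmp/out/custom`). The helper name `missing_tools` is hypothetical, and the tool list covers only what this reply mentions; the full TPDE and TPDE-LLVM test suites may rely on others.

```shell
# Hypothetical helper: print each LLVM tool that the TPDE file tests
# reportedly invoke but that is not yet an executable in "$1/bin".
# The tool list is an assumption taken from the reply above.
missing_tools() {
  bin_dir="$1/bin"
  for tool in FileCheck llvm-objdump llvm-dwarfdump not; do
    # [ -x ] is true only if the file exists and is executable
    if [ ! -x "$bin_dir/$tool" ]; then
      echo "$tool"
    fi
  done
}

# Example (not run here): check the build, then add any missing tools
# to the earlier ninja invocation, e.g.
#   missing_tools /tmp/out/custom
#   ninja -C /tmp/out/custom clang LLVM FileCheck llvm-objdump llvm-dwarfdump not
```

Because `[ -x ]` also rejects files without the execute bit, a tool that exists but is not executable is reported as missing too.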

BarakWidawsky 1 day ago
If this is a faster backend for LLVM, does it potentially obviate the niche Cranelift is optimizing for?
- npalli 1 day ago
  They used Cranelift's IR itself (among others, not just LLVM IR) to demonstrate the performance improvements, which makes TPDE complementary rather than a replacement, but you raise a good point. It's quite possibly not as full-featured yet, so perhaps in the future, if at all.
  > The TPDE-based back-end compiles 4.27x faster than Cranelift and 2.68x faster than Cranelift with its fast register allocator, but is 1.74x slower than Winch.
  - cfallin 19 hours ago
    They're hitting another design point on the compile time vs. code-quality tradeoff curve, which is interesting. They compile 4.27x faster than Cranelift with default (higher quality) regalloc, but Cranelift produces code that runs 1.64x faster (section 6.2.2).
    This isn't too surprising to me, as the person who wrote Cranelift's current regalloc (hi!) -- regalloc is super important to run-time perf, so for Wasmtime's use-case at least, we've judged that it's worth the compile time.
    TPDE is pretty cool and it's great to see more exploration in compiler architectures!
fooker 1 day ago
What makes this 'adaptable' and what makes this a 'framework'?
Seems like a pretty neat fast compiler backend for LLVM. Why the extra buzzwords?
- t0b1 1 day ago
  TPDE is a framework for writing a back-end for various SSA IRs. TPDE-LLVM is an LLVM back-end written using TPDE, but TPDE itself is independent of LLVM. The paper also mentions back-ends written for Cranelift's IR and Umbra IR using TPDE.
xiphias2 1 day ago
It's a great start, but what would be even cooler is if they went through the boring part: putting it into LLVM as the new default -O0 compiler.
Edit: LLM to LLVM
- npalli 1 day ago
  You mean LLVM? I was confused about why you would put it into an LLM (which one?).
  - xiphias2 1 day ago
    Sure, I meant LLVM
vlovich123 1 day ago
> Performance results on SPECint 2017 show that we can compile LLVM-IR 8-24x faster than LLVM -O0 while being on-par in terms of run-time performance
Wait - it’s 8-24x faster than O0 while producing code on par with O3???
- ummonk 1 day ago
  No, the generated code is on par with LLVM -O0. It's slower than LLVM -O1, never mind LLVM -O3.
  - wiz21c 1 day ago
    I guess that doesn't include linking? (Which takes quite some time.)
    - andyferris 1 day ago
      One thing I never understood in this context here (fast JIT/debug builds/hot reloads/-O0) is why you would need much static linking. Generally your modules are going to have a DAG relationship. Even code inside a large compilation unit could potentially be factored out (automatically) into smaller modules. Could you not just generate a bunch of small dynamically linked libraries? Would the system dynamic loader become the speed bottleneck? Even if so, wouldn't reloading just a portion of the DAG in a hot-reload context be much faster than linking everything beforehand?