gcc and g++ are the traditional GNU compilers for C and C++ code. Recently, clang (and clang++) using LLVM has been gaining popularity as an alternative compiler.
What is the difference between clang and gcc / g++? Is there an advantage to using clang?
- They are back- and front-ends of their respective systems, and they have no relation to each other. – user541686 Jul 19 '14 at 2:35
- @Mehrdad that wasn't very helpful. Which is a backend (I assume that means code generation and linking), which is a frontend (I assume that means compiler driver), and what is the motivation for them compared to existing front- and backends? I'd also think that while the gcc executable is, strictly speaking, just a frontend, it's also customary to call the whole toolchain from source to executable "gcc". Wouldn't that be the case with other compilers too? – Peter - Reinstate Monica Jul 19 '14 at 3:49
- It's also nonsense, neither 'gcc' nor 'g++' is a front end, and neither is a back end either. They are driver programs that run the compiler, assembler and/or linker as needed. The compiler executable that gcc/g++ runs is what has a front end and back end. – Jonathan Wakely Dec 1 '20 at 19:26
GCC is a big bag of software. The typical process, as I understand it, is for a GCC frontend to lex and parse the code, for the result to be lowered (through the tree-level GENERIC and GIMPLE forms) to GCC's internal Register Transfer Language (RTL), and then for a backend to write out native code.
So one typical flow is: C code ---> GCC's C frontend ---> RTL ---> GCC's x86 backend ---> x86 machine code.
GCC supports several frontends: C, C++, Objective-C, Go, Fortran, and Ada. (A Java frontend, GCJ, existed until it was removed in GCC 7.)
GCC supports several backends: 32-bit x86, 64-bit x86, little endian ARM, big endian ARM, MIPS, SPARC, PowerPC, etc.
Frontends convert text to RTL; backends convert RTL to machine code of some sort.
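One way to watch this flow on a real toolchain is to ask GCC to dump its intermediate forms. The sketch below assumes a reasonably recent `gcc` on the PATH; the file name `square.c` and the function itself are made up for illustration, and the names of the dump files GCC writes vary between versions.

```c
/* square.c: hypothetical file name, used only for this illustration.
 *
 * Commands to try (dump file names differ between GCC versions):
 *   gcc -c -fdump-tree-gimple square.c   tree-level GIMPLE form
 *   gcc -c -fdump-rtl-expand  square.c   RTL produced when GIMPLE is expanded
 *   gcc -S square.c                      target assembly written by the backend
 */
int square(int x)
{
    return x * x;
}
```

Comparing the three outputs for the same function gives a rough feel for where the frontend's work ends and the backend's begins.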
LLVM is a middle-layer, machine-agnostic representation of computation, similar in concept to GCC's RTL. It has its own type system and instruction set, called LLVM Intermediate Representation (IR). If I understand correctly, LLVM's IR is richer, more expressive, and much more flexible than GCC's RTL, which brings many benefits. Compiler front-ends for many different languages can all compile down to LLVM IR. This can be used for 'conventional' languages like C, C++, Java, etc., but it can also be used for 'unconventional' programming tasks like GPU shaders or SQL queries.
LLVM is, perhaps, two things then. LLVM-the-machine, which is the type system and instruction set, which is probably better referred to as "LLVM IR"; and LLVM-the-API, which is software for manipulating code in the LLVM IR, such as the LLVM JIT compiler, or perhaps the LLVM x86 machine code backend.
Clang is a front-end for LLVM that processes C-family languages: C, C++, Objective-C, and Objective-C++. Clang converts C/C++/etc. to LLVM IR, LLVM performs optimizations on the IR, and the LLVM x86 backend writes out x86 machine code for execution.
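The same three stages can be inspected separately with the stock tools. This is a minimal sketch that assumes `clang` and `llc` are both installed (some distributions package `llc` separately); the file names and the `add` function are hypothetical.

```c
/* add.c: hypothetical file name.
 *
 * Commands to try:
 *   clang -S -emit-llvm add.c -o add.ll           C to textual LLVM IR, unoptimized
 *   clang -O2 -S -emit-llvm add.c -o add.opt.ll   the IR after LLVM's optimization passes
 *   llc add.opt.ll -o add.s                       LLVM backend: IR to target assembly
 */
int add(int a, int b)
{
    return a + b;
}
```

The `.ll` files are plain text, so the difference between the unoptimized and optimized IR can be read directly or compared with `diff`.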
Despite the name, LLVM is not a Virtual Machine in the traditional sense - it is a computation model and representation that lends itself well to the task of manipulating code.
Part of LLVM's popularity comes from the fact that it is a fully reified compiler API. It can be used for static analysis ("does this code ever accidentally use uninitialized memory?"), optimization, and code parsing (such as for building IDEs). GCC's internals are tightly coupled, so using GCC in this manner is very difficult. For example, GCC's frontends perform some optimizations during parsing, so some information may already be lost, and it is not always possible to get a faithful representation of the code as typed. Tools need that representation for things like reporting errors and drawing squiggly underlines.
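As a small illustration of the uninitialized-memory question, the function below leaves a variable unset on one path on purpose. This is a sketch, not a recommended pattern: the file name and function are invented, and the exact wording of any report depends on the Clang version. Calling it with a nonzero `flag` is well defined; calling it with `0` reads an indeterminate value.

```c
/* pick.c: hypothetical file name.
 *
 * Commands to try:
 *   clang --analyze pick.c             run the Clang static analyzer
 *   clang -Wall -fsyntax-only pick.c   compiler warnings only, no code generation
 */
int pick(int flag)
{
    int value;          /* deliberately not initialized here */
    if (flag)
        value = 42;
    return value;       /* indeterminate when flag == 0: the case a tool should flag */
}
```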
As I understand it, Clang preserves the unoptimized parsed syntax, making it possible for third-party tools to use its output and map transformations back to the original text. Most notably, Clang's error messages are much more helpful because they can highlight the exact part of the line that is in question.
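The preserved syntax tree can be viewed directly. The sketch below assumes `clang` is installed; the file name and function are hypothetical. The dump lists each declaration, statement, and expression node together with the line and column range it came from, which is the information tools rely on to point back at the original text.

```c
/* scale.c: hypothetical file name.
 *
 * Command to try:
 *   clang -Xclang -ast-dump -fsyntax-only scale.c
 */
int scale(int x)
{
    return x * 2 + 1;
}
```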
- @sapy ditto. Can someone present an easier to understand version of this, without terms like reified? – Sujay Phadke Dec 4 '20 at 7:44
- @SujayPhadke - reified: to make real; a fully realized implementation; to take something abstract and make it concrete. If you were to set out and design a library to specifically provide a compiler API, with all of the things that compilers do other than just emit machine code, what would it look like? Ostensibly, GCC's primary goal is to take text and turn it into machine code. LLVM's goal goes beyond that. – antiduh Dec 4 '20 at 19:15
- @antiduh If I understand your post correctly: 1. Both GCC and LLVM require an architecture-specific backend (64-bit x86, ARM, etc). 2. The benefit of LLVM is that it does not do as much as GCC does during conversion to some intermediate representation, so people can do more (for example, analysis) with LLVM IR than with GCC RTL? – Mr.Robot Jan 28 at 5:14