Is GCC written in C or C++?

The GNU Compiler Collection (GCC) was, from its inception, written in C and compiled by a C compiler. Beginning in 2008, an effort was undertaken to change GCC so that it could be compiled by a C++ compiler and take advantage of a subset of C++ constructs. This effort was jump-started by a presentation by Ian Lance Taylor at the June 2008 GCC summit. As with any major change, this one had its naysayers and its problems, as well as its proponents and successes.

Reasons

Taylor's slides list the reasons to commit to writing GCC in C++:

C++ is well-known and popular.
It's nearly a superset of C90, which GCC was then written in.
The C subset of C++ is as efficient as C.
C++ "supports cleaner code in several significant cases." It never requires "uglier" code.
C++ makes it harder to break interface boundaries, which leads to cleaner interfaces.
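The interface-boundary point in the last item can be shown with a small sketch. It contrasts a C struct, whose fields any caller can modify, with a C++ class whose representation is private. The `counter` type is hypothetical and is not taken from GCC.

```cpp
#include <cassert>
#include <cstddef>

// C style (shown only in this comment): every translation unit that sees
// the struct can bypass whatever API is meant to manage it.
//   struct counter { size_t n; };
//   c->n = 42;   /* nothing stops this */

// C++ style: the representation is private. The compiler rejects direct
// access, so callers must go through the public member functions.
class counter {
public:
  void increment() { ++n_; }
  std::size_t value() const { return n_; }

private:
  std::size_t n_ = 0;  // not accessible outside the class
};
```

Writing `c.n_ = 42;` outside the class is a compile-time error, so the boundary is enforced by the compiler and does not depend on convention alone.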

The popularity of C++ and its superset relationship to C speak for themselves. In stating that the C subset of C++ is as efficient as C, Taylor meant that if developers are concerned about efficiency, limiting themselves to C constructs will generate code that is just as efficient. Having cleaner interfaces is one of the main advantages of C++, or any object-oriented language. Saying that C++ never requires "uglier" code is a value judgment. However, saying that it supports "cleaner code in several significant cases" has a deep history, best demonstrated by gengtype.

According to the GCC Wiki:

As C does not have any means of reflection [...] gengtype was introduced to support some GCC-specific type and variable annotations, which in turn support garbage collection inside the compiler and precompiled headers. As such, gengtype is one big kludge of a rudimentary C lexer and parser.

What had happened was that developers were emulating features such as garbage collection, a vector class, and a tree class in C. This was the "ugly" code to which Taylor referred.
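As a rough illustration of that kind of emulation, the sketch below uses a preprocessor macro to stamp out a type-specific vector, much as C code has to. It then contrasts that with a C++ template. The macro `DEF_SIMPLE_VEC` and the names it generates are hypothetical. GCC's own vector code differs in detail.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <vector>

// C-style emulation (hypothetical macro): each element type needs its own
// struct and its own push function, generated by token pasting.
#define DEF_SIMPLE_VEC(T)                                          \
  struct vec_##T {                                                 \
    T *data;                                                       \
    size_t len, cap;                                               \
  };                                                               \
  static void vec_##T##_push(struct vec_##T *v, T x) {             \
    if (v->len == v->cap) {                                        \
      size_t new_cap = v->cap ? 2 * v->cap : 4;                    \
      T *p = (T *) realloc(v->data, new_cap * sizeof(T));          \
      if (!p)                                                      \
        abort(); /* out of memory */                               \
      v->data = p;                                                 \
      v->cap = new_cap;                                            \
    }                                                              \
    v->data[v->len++] = x;                                         \
  }

DEF_SIMPLE_VEC(int)  // defines struct vec_int and vec_int_push()

// C++: a single class template covers every element type, and the compiler
// instantiates it on demand. std::vector stands in for such a class here.
template <typename T>
using simple_vec = std::vector<T>;
```

The macro version has to be expanded once for each element type. Its errors are reported against the macro expansion, and tools such as gengtype have to parse around it. The template version gives the compiler a single type-checked definition.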

In his slides, Taylor also tried to address many of the initial objections: that C++ was slow, that it was complicated, that there would be a bootstrap problem, and that the Free Software Foundation (FSF) wouldn't like it. He addressed the speed issue by pointing out that the C subset of C++ is as efficient as C. As for the FSF, Taylor wrote, "The FSF is not writing the code."

The complexity of a language is in the eye of the beholder. Many GCC developers were primarily, or exclusively, C programmers, so of necessity there would be a time period in which they would be less productive, and/or might use C++ in ways that negated all its purported benefits. To combat that problem, Taylor hoped to develop coding standards that limited development to a subset of C++.

The bootstrap problem could be resolved by ensuring that GCC version N-1 could always build GCC version N, and that they could link statically against libstdc++. GCC version N-1 must be linked against libstdc++ N-1 while it is building GCC N and libstdc++ N; GCC N, in turn, will need libstdc++ N. Static linking ensures that each version of the compiler runs with the appropriate version of the library.

For many years prior to 2008, there had been general agreement to restrict GCC code to a common subset of C and C++, according to Taylor (via email). However, there was a great deal of resistance to replacing the C compiler with a C++ compiler. At the 2008 GCC summit, Taylor took a poll on how large that resistance was, and approximately 40% were opposed. The C++ boosters paid close attention to identifying and addressing the specific objections raised by C++ opponents (speed, memory usage, inexperience of developers, and so on), so that each year thereafter the size of the opposition shrank significantly. Most of these discussions took place at the GCC summits and via unlogged IRC chats. Therefore, the only available record is in the GCC mailing list archives.

First steps

The first step, a proper baby step, was merely to try to compile the existing C code base with a C++ compiler. While Taylor was still at the conference, he created a gcc-in-cxx branch for experimenting with building GCC with a C++ compiler. Developers were quick to announce their intention to work on the project. The initial build attempts encountered many errors and warnings, which were then cleaned up.
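Well-known incompatibilities of this kind include implicit conversion from `void *` and implicit `int`-to-enum conversion. C accepts both and C++ rejects both. The sketch below uses hypothetical functions and is written so that the same text compiles as both C and C++.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum color { RED, GREEN };

/* C accepts:  char *buf = malloc(n);
   C++ rejects the implicit void * conversion, so the cast is spelled out.
   A C-style cast keeps the line valid in both languages. */
static char *duplicate_string(const char *src) {
  size_t n = strlen(src) + 1;
  char *buf = (char *) malloc(n);
  if (buf != NULL)
    memcpy(buf, src, n);
  return buf;
}

/* C accepts:  enum color c = i;
   C++ has no implicit int-to-enum conversion, so a cast is needed here too. */
static enum color from_int(int i) {
  return (enum color) i;
}

/* Identifiers such as "new", "class", or "this" are fine in C but are
   keywords in C++, so variables with those names must be renamed. */
```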

In June 2009, almost exactly a year after proposing this switch, Taylor reported that phase one was complete. He configured GCC with the --enable-build-with-cxx configure switch, which causes the core compiler to be built with C++, and completed a bootstrap on a single target system. Around this time, the separate cxx branch was merged into the main GCC trunk, and people continued their work using the --enable-build-with-cxx switch. (However, the separate branch was revived on at least one occasion for experimentation.)

In May 2010, there was a GCC Release Manager Q&A on IRC. The conclusion from that meeting was to request permission from the GCC Steering Committee to use C++ language features in GCC itself, as opposed to just compiling with a C++ compiler. Permission was granted, with agreement also coming from the FSF. Mark Mitchell announced the decision in an email to the GCC mailing list on May 31, 2010.

In that thread, Jakub Jelinek and Vladimir Makarov expressed a lack of enthusiasm for the change. However, as Makarov put it, he had no desire to start a flame war over a decision that had already been made. That said, he recently shared via email that his primary concern was that the GCC community would rush into converting the GCC code base to C++ "instead of working on more important things for GCC users (like improving performance, new functionality and so on). Fortunately, it did not happen."

Richard Guenther was concerned about creating a tree class hierarchy:

It's a lot of work (tree extends in all three Frontends, middle-end and backends). And my fear is we'll only get a halfway transition - something worse than no transition at all.
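To give a sense of what a "tree class hierarchy" means, here is a drastically simplified, hypothetical sketch. GCC's real `tree` type has far more node kinds and accessor macros, and the names below only echo GCC's naming style. The C version is a single struct with a code and a union. The C++ version gives each node kind its own type.

```cpp
#include <cassert>
#include <string>
#include <utility>

enum tree_code { INTEGER_CST, IDENTIFIER_NODE };

// C style: one node type, a code, and a union. Reading u.name from an
// INTEGER_CST node compiles fine; only runtime checks can catch it.
struct c_tree_node {
  enum tree_code code;
  union {
    long int_value;
    const char *name;
  } u;
};

// C++ style: the code lives in a base class, and each kind of node is its
// own type, so the compiler knows which fields each node actually has.
struct tree_base {
  explicit tree_base(tree_code c) : code(c) {}
  tree_code code;
};

struct integer_cst : tree_base {
  explicit integer_cst(long v) : tree_base(INTEGER_CST), value(v) {}
  long value;
};

struct identifier : tree_base {
  explicit identifier(std::string n)
      : tree_base(IDENTIFIER_NODE), name(std::move(n)) {}
  std::string name;
};

// Checked downcast, playing the role of a C accessor macro with checking.
inline const integer_cst &as_integer_cst(const tree_base &t) {
  assert(t.code == INTEGER_CST);
  return static_cast<const integer_cst &>(t);
}
```

Every piece of code that touches a node would have to move from the union-plus-macros style to the typed style. That is why the change reaches into the front ends, the middle-end, and the back ends at once.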

The efforts of the proponents to allay concerns and the "please be careful" messages from the opponents give some indication of the other concerns. In addition to the issues raised by Taylor at the 2008 presentation, Jelinek mentioned memory usage. Others, often as asides to other comments, worried that novice C++ programmers would use the language inappropriately and create unmaintainable code.

There was much discussion about coding standards in the thread. Several argued for existing standards, but others pointed out that they needed to define a "safe" subset of C++ to use. There was, at first, little agreement about which features of C++ were safe for a novice C++ developer. Taylor proposed a set of coding standards. These were amended by Lawrence Crowl and others, and then were adopted. Every requirement has a thorough rationale and discussion attached. However, the guiding principle on maintainability is not the coding standard, but one that always existed for GCC: the maintainer of a component makes the final decision about any changes to that component.

Current status

Currently, those who supported the changes feel their efforts provided the benefits they expected. No one has publicly expressed any dissatisfaction with the effort. Makarov was relieved that his fear that the conversion effort would be a drain on resources did not come to pass. In addition, he cites the benefits of improved modularity as being a way to make GCC easier to learn, and thus more likely to attract new developers.

As far as speed goes, Makarov noted that a bootstrap on a multi-CPU platform is as fast as it was for C. However, on uniprocessor platforms, a C bootstrap was 30% faster. He did not speculate as to why that is. He also found positive effects, such as the conversion to C++ hash tables, which sped up compile time by 1-2%. This last work is an ongoing process that Lawrence Crowl last reported on in October 2012. In keeping with Makarov's concerns, this work is done slowly, as people's time and interests permit.
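The article does not say where that speedup came from. One plausible contributor, offered here only as an illustration, is inlining. A C++ hash table parameterized by a descriptor type lets the compiler inline the hash and equality functions. A C table such as libiberty's `htab_t` calls them through function pointers. The class and descriptor names below are hypothetical, and the sketch uses simple linear probing without deletion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical descriptor-based hash set (linear probing, no deletion).
// Descriptor supplies value_type, hash(), and equal() as static members,
// so calls to them are ordinary direct calls the compiler can inline.
template <typename Descriptor>
class hash_set_sketch {
public:
  using value_type = typename Descriptor::value_type;

  explicit hash_set_sketch(std::size_t capacity)
      : slots_(capacity), used_(capacity, false) {
    assert(capacity > 0);
  }

  // Returns false only if the value is absent and the table is full.
  bool insert(const value_type &v) {
    const std::size_t n = slots_.size();
    std::size_t i = Descriptor::hash(v) % n;
    for (std::size_t probes = 0; probes < n; ++probes, i = (i + 1) % n) {
      if (!used_[i]) {
        slots_[i] = v;
        used_[i] = true;
        return true;
      }
      if (Descriptor::equal(slots_[i], v))
        return true;  // already present
    }
    return false;
  }

  bool contains(const value_type &v) const {
    const std::size_t n = slots_.size();
    std::size_t i = Descriptor::hash(v) % n;
    // Without deletion, an empty slot ends the probe sequence.
    for (std::size_t probes = 0; probes < n && used_[i];
         ++probes, i = (i + 1) % n)
      if (Descriptor::equal(slots_[i], v))
        return true;
    return false;
  }

private:
  std::vector<value_type> slots_;
  std::vector<bool> used_;
};

// Descriptor for plain ints; a C table would take these as function pointers.
struct int_descriptor {
  using value_type = int;
  static std::size_t hash(int v) { return static_cast<std::size_t>(v); }
  static bool equal(int a, int b) { return a == b; }
};
```

Because `int_descriptor` is a template argument, each instantiation is specialized for its element type. In a C table the calls go through `void *` and function pointers, and the compiler usually cannot see through them.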

Of the initial desired conversions (gengtype, tree, and vector), vector support is provided using C++ constructs (i.e., a class) and gengtype has been rewritten for C++ compatibility. Trees are a different matter. Although the conversion has been much discussed, and people have volunteered for it several times, no change has been made to the code. This adds credence to the 2010 contention of Guenther (who has changed his surname to Biener) that it would be difficult to do correctly. Reached recently, Biener stated that he felt it was too early to assess the impact of the conversion because, compared to the size of GCC, there have been few changes to C++ constructs. On the negative side, he noted (as others have) that, because of the changes, long-time contributors must relearn things that they were familiar with in the past.

In 2008, 2009, and 2010 (i.e., at the beginning and after each milestone), Taylor provided formal plans for the next steps. There is no formal plan going forward from here. People will use C++ constructs in future patches as they deem necessary, but not just for the sake of doing so. Some will limit their changes to the times when they are patching the code anyway. Others approach the existing C code with an eye to converting code to C++ wherever it makes the code clearer or more efficient. Therefore, this is an ongoing effort on a meandering path for the foreseeable future.

As the C++ project has progressed, some fears have been allayed, while some developers are still in a holding pattern. For them it is too soon to evaluate things definitively, and too late to change course. However, the majority seems to be pleased with the changes. Only time will tell what new benefits or problems will arise.

Index entries for this article
GuestArticles	Jacobson, Linda


GCC's move to C++

Posted Mar 14, 2013 4:24 UTC (Thu) by jhhaller (subscriber, #56103)

Given the emphasis on compilation speed, I'm surprised that #include has never been implemented using openat(), with a directory opened for each -I option. For a simple project, openat() won't help much, but in a project with a hundred -I options, searching through the -I directories to find each header can have significant overhead. Processing the header will always be more expensive than open() or openat(), but openat() can help with the unsuccessful opens.
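Here is a minimal sketch of what jhhaller suggests. It assumes a POSIX system that provides `openat()` and `O_DIRECTORY`. Each -I directory is opened once, and every lookup is then resolved relative to an already-open directory descriptor. The functions `open_include_dirs` and `find_header` are hypothetical and are not part of GCC's preprocessor. The sketch also ignores details such as quoted versus angle-bracket includes.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <unistd.h>
#include <vector>

// Open every -I directory once, up front (hypothetical helper).
std::vector<int> open_include_dirs(const std::vector<std::string> &dirs) {
  std::vector<int> fds;
  for (const std::string &d : dirs) {
    int fd = open(d.c_str(), O_RDONLY | O_DIRECTORY);
    if (fd >= 0)
      fds.push_back(fd);  // directories that cannot be opened are skipped
  }
  return fds;
}

// Search the directories in order. openat() resolves the relative name
// against an already-open directory instead of re-walking the whole path.
// Returns an open file descriptor, or -1 if no directory has the header.
int find_header(const std::vector<int> &dir_fds, const char *name) {
  for (int dfd : dir_fds) {
    int fd = openat(dfd, name, O_RDONLY);
    if (fd >= 0)
      return fd;
  }
  return -1;
}
```

Failed lookups still cost one system call per directory. The saving comes from not resolving each directory's path prefix again for every header.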

GCC's move to C++

Posted Mar 14, 2013 16:30 UTC (Thu) by etienne (guest, #25256)

> has never been implemented using openat()

I am not a fan of openat(), but maybe it can be used for GCC.
For a server/daemon, you can get into this situation:
Let's assume you want to write a safe TFTP server that can only upload/download files in /tftpboot, so you write your server to change directory to /tftpboot before serving requests and to use only openat().
Now the TFTP user has a complex configuration: he keeps /tftpboot-v1 and /tftpboot-v2, and links or renames /tftpboot to whichever one is needed at the time.
If you used openat() in tftpd, your user is then forced to restart the tftpd server whenever he changes the configuration; otherwise he keeps getting the directory that was /tftpboot *when the server was started*...
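The behavior etienne warns about is easy to see. A directory descriptor keeps referring to the directory it was opened on, even after the path it was opened through points somewhere else. Here is a minimal sketch, assuming a POSIX system. The helper names are hypothetical.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical server start-up: resolve the served root exactly once.
// The directory that `root` names right now is the one the descriptor
// keeps referring to, even if `root` is later re-pointed.
int open_served_root(const char *root) {
  return open(root, O_RDONLY | O_DIRECTORY);
}

// Lookup relative to the saved descriptor (the openat() style).
bool exists_in(int dirfd, const char *name) {
  int fd = openat(dirfd, name, O_RDONLY);
  if (fd < 0)
    return false;
  close(fd);
  return true;
}

// Lookup by full path: resolved from scratch on every call, so it always
// follows the current target of any symlink along the way.
bool exists_at_path(const std::string &path) {
  int fd = open(path.c_str(), O_RDONLY);
  if (fd < 0)
    return false;
  close(fd);
  return true;
}
```

Consider a server that opens /tftpboot while it is a symlink to /tftpboot-v1, after which the administrator re-points the symlink at /tftpboot-v2. `exists_in()` on the saved descriptor still looks in /tftpboot-v1, while `exists_at_path()` sees /tftpboot-v2.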

GCC's move to C++

Posted Mar 14, 2013 9:57 UTC (Thu) by ncm (subscriber, #165)

The effort will be mature when people begin to note C constructs that are not appreciably slower than the conventional C++ equivalents.

GCC's move to C++

Posted Mar 14, 2013 16:40 UTC (Thu) by dashesy (guest, #74652)

Since the effort started only recently, I am just wondering: why not C++11? It is far better to use native language constructs than to try to hack them. BTW, since this is GCC, their coding standard, plus a simpler how-to for avoiding common C++ minefields, would be useful to many (a de facto standard on what to avoid).

GCC's move to C++

Posted Mar 14, 2013 16:57 UTC (Thu) by rriggs (guest, #11598)

I do not think that the GCC's coding standard would be useful to many C++ coders. In fact, many of those "restricted C++" coding standards (such as Google's*) do more harm than good for the language. It is only useful for dealing with novice C++ programmers in a transition period and is best shed as quickly as possible.

As to using C++11, the problem is that GCC's implementation is still experimental. It makes it more difficult to ensure that GCC N can be built with both GCC N-1 and N.

*Google provides a reasonable rationale for their standard, but many do not read the rationale or understand it and just blindly accept that it is a reasonable coding standard, even when it does not apply to their situation.

Compiler nannyism

Posted Mar 14, 2013 21:52 UTC (Thu) by ncm (subscriber, #165)

I read the Gcc coding standard. I found only one particularly silly rule: a class with a virtual function must have a virtual destructor. It's perfectly reasonable for an object with virtual functions to exist only on the stack, where there can be no confusion about the correct destructor to call, or for it to be destroyed only by its creator. This rule is, wrongly, hard-wired into "-W" (not just "-Wall"), so the requirement to build without warnings enforces it. That means, in turn, that the rule expressed in the coding standard is redundant.

The typical response is, why not just declare the destructor virtual? But we do not always have a choice about what goes into header files we get from others. A bug has been open on this for a long time: an easy fix would be to complain only when the destructor is called in a polymorphic context.

Another rule (echoed in Google's much sillier standard) forbids any use of exceptions. That can be defensible in a program that started out as C.
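For the stack-only case ncm describes, one well-known alternative to a public virtual destructor is a protected non-virtual one. The C++ Core Guidelines recommend it (C.35: a base-class destructor should be either public and virtual, or protected and non-virtual). The sketch below uses hypothetical `shape`/`triangle` types. The object is used polymorphically by reference but is never deleted through a base pointer, and a `delete` through a `shape *` would not compile.

```cpp
#include <cassert>

class shape {
public:
  virtual int sides() const = 0;

protected:
  // Non-virtual but protected: code outside the hierarchy cannot
  // `delete` through a shape *, so the unsafe case is rejected at compile time.
  ~shape() = default;
};

class triangle final : public shape {
public:
  int sides() const override { return 3; }
};

// Uses the object polymorphically without ever owning it.
int count_sides(const shape &s) { return s.sides(); }
```

A `triangle` on the stack is destroyed by its own destructor when it goes out of scope, so no virtual dispatch is needed for destruction.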

Compiler nannyism

Posted Mar 14, 2013 22:17 UTC (Thu) by dashesy (guest, #74652)

But aren't coding styles always some sort of generalization that is good most of the time for most people? They avoid the corner cases that the language allows but that are wrong most of the time, and when they are wrong, they have consequences. I have found some bugs because of that warning, BTW.

What most competent programmers overlook is that the next person may not have all the skills, so the code should be as simple and straightforward as possible.

As for using exceptions, I think it is more of a policy: either all code should be exception-aware or none of it should be. Then probably the person who initially put together that standard did not like them (maybe they take up too much code real estate and look ugly, maybe she thinks they are just glorified goto statements, maybe exceptions should never cross shared-library boundaries for any reason).
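The "all or nothing" point can be made concrete. Code written in C style, with manual cleanup, is correct only as long as nothing it calls throws. The sketch below uses hypothetical names and a counter to make the leak visible.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

static int live_buffers = 0;  // buffers constructed but not yet destroyed

struct buffer {
  buffer() { ++live_buffers; }
  ~buffer() { --live_buffers; }
};

// Written C-style: correct only as long as step() never throws.
void c_style(void (*step)()) {
  buffer *b = new buffer;
  step();    // an exception here skips the delete below and leaks b
  delete b;
}

// Exception-aware: unique_ptr destroys the buffer during stack unwinding.
void raii_style(void (*step)()) {
  std::unique_ptr<buffer> b(new buffer);
  step();
}

void throwing_step() { throw std::runtime_error("step failed"); }
```

If one callee starts throwing, every C-style caller between the `throw` and the `catch` becomes a potential leak. That is why a code base tends to pick one policy for the whole program.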

Compiler nannyism

Posted Mar 14, 2013 22:43 UTC (Thu) by jwakely (guest, #60262)

> an easy fix would be to complain only when the destructor is called in a polymorphic context.

That's pretty much what the -Wdelete-non-virtual-dtor warning does; I added it to GCC, "borrowing" it from Clang.
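The situation -Wdelete-non-virtual-dtor targets is a `delete` through a pointer to a base class that has virtual functions. The sketch below uses hypothetical `base` and `derived` types. With the virtual destructor in place, the derived destructor runs as expected.

```cpp
#include <cassert>

static int derived_destroyed = 0;

struct base {
  virtual ~base() = default;  // virtual: delete through base * runs ~derived
  virtual int id() const { return 0; }
};

struct derived : base {
  ~derived() override { ++derived_destroyed; }
  int id() const override { return 1; }
};

// Deleting through a base pointer is the "polymorphic context" in question.
// If ~base() were not virtual, this would be undefined behavior, and this
// is the kind of delete that -Wdelete-non-virtual-dtor reports.
void destroy(base *p) { delete p; }
```

Because the warning is attached to the `delete` rather than to the class definition, a class that is only ever used on the stack does not trigger it. That is the distinction ncm asked for.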

Compiler nannyism

Posted Mar 21, 2013 17:44 UTC (Thu) by xman (subscriber, #46972)

While I do think there are cases where a type with a vtable may have its life cycle managed in a non-polymorphic fashion, such that a virtual destructor is not needed, it's non-trivial to verify that this is the case (unless there are some fairly strict restrictions on the ways the type is used). Particularly for a large project, it seems much more maintainable to simply declare the destructor virtual. This practice is practically canon, and can be found in Effective C++.

While the case where you inherit a header is very real, one could make the same argument about a LOT of warnings. I think it is reasonable to have it wired into -W with an option to disable it on a per-header basis (in general, for warnings, you want to investigate and potentially disable them for a particular case... that's the difference between an error and a warning).
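GCC does offer a way to silence a warning for just one region of code: `#pragma GCC diagnostic push`, `ignored`, and `pop`, available in recent GCC versions. The sketch below treats `vendor_callback` as a hypothetical stand-in for a class from a header you cannot edit. The same pragmas can be placed around the `#include` of the real header. Marking a header with `#pragma GCC system_header`, or adding its directory with `-isystem`, is another option, since GCC suppresses most warnings in system headers.

```cpp
#include <cassert>

// Suppress one warning for one third-party declaration, then restore the
// previous settings so the rest of the file is still checked.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wnon-virtual-dtor"
// Hypothetical stand-in for a class from a header we cannot edit:
// it has a virtual function but a public, non-virtual destructor.
struct vendor_callback {
  virtual int run() { return 42; }
  ~vendor_callback() {}
};
#pragma GCC diagnostic pop
```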

GCC's move to C++

Posted Mar 14, 2013 17:10 UTC (Thu) by jwakely (guest, #60262)

> why not C++11?

Currently GCC trunk can be built by any GCC since 3.4, and, if you're willing to do a bit of hacking, by even older versions. Requiring a C++11 compiler would mean GCC 4.6 or later, or an equivalently new Clang, as before then the language was still in flux and the rules kept changing. You'd force people to bootstrap GCC 4.6 or 4.7 just to then bootstrap 4.8, even if they already have an older C++ compiler installed, which would be quite inconvenient.

For GNU/Linux users it wouldn't be a problem; they'd just get it from their distro, which deals with building it. But it doesn't help anyone to make it awkward for an admin of a ten-year-old Solaris, HP-UX or AIX server to install a new GCC. If we want people to use the GNU compiler instead of their system compiler, then the barrier to entry can't be too high.

It wasn't so long ago that GCC could still be bootstrapped starting with a K&R C compiler!

Bootstrapping GCC with K&R C

Posted Mar 14, 2013 21:32 UTC (Thu) by ncm (subscriber, #165)

It still can be; it just takes more steps. The best route might be hard to identify: first build an ancient Gcc, or build an emulator for a current target and run a current Gcc on that? Maybe build a K&R back-end for Gcc, use it to compile current Gcc to K&R C, and build the output on your target.

This would make a good contest.

GCC's move to C++

Posted Mar 15, 2013 4:26 UTC (Fri) by brianomahoney (guest, #6206)

Absolutely correct; the saying is "Do not let the Perfect be the enemy of the Good." Kind regards, omb

GCC's move to C++

Posted Mar 27, 2013 3:33 UTC (Wed) by eean (guest, #50420)

What would they compile it with?

The coding standard they established for GCC looks highly specific to the legacy codebase they are building on. It's not useful as a general guideline.

GCC's move to C++

Posted Mar 15, 2013 6:36 UTC (Fri) by eru (subscriber, #2753)

I doubt it is useful to require a "safe" or "efficient" subset of C++ and expect that requirement to have much effect, unless you have a picky tool for enforcing it. In a large project, programmers will have different notions of what is safe, and will use their favourite features anyway. For example, multiple inheritance essentially crept by accident into the first large C++ project I was involved with, although we also wanted to use only "safe" features, and not all C++ compilers even supported multiple inheritance at the time. On the other hand, the black-belt hackers who work on GCC can probably be trusted not to shoot themselves in the foot with C++ (unlike most programmers, who should steer clear of it).

Tuesday 29 June 2021
