Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team.
Many programming languages, C# included, treat certain sequences of letters as “special”.
Some sequences are so special that they cannot be used as identifiers. Let’s call those the “reserved keywords” and the remaining special sequences we’ll call the “contextual keywords”. They are “contextual” because the character sequence might have one meaning in a context where the keyword is expected and another in a context where an identifier is expected.*
The C# specification defines the following reserved keywords:
abstract as base bool break byte case catch char checked class const continue decimal default delegate do double else enum event explicit extern false finally fixed float for foreach goto if implicit in int interface internal is lock long namespace new null object operator out override params private protected public readonly ref return sbyte sealed short sizeof stackalloc static string struct switch this throw true try typeof uint ulong unchecked unsafe ushort using virtual void volatile while
The implementation also reserves the magic keywords __arglist __makeref __reftype __refvalue which are for obscure scenarios that I might blog about in the future.
Those are the keywords that we reserved in C# 1.0; no new reserved keywords have been added since. It is tempting to do so, but we always resist. Were we to add a new reserved keyword then any program that used that keyword as an identifier would break upon recompilation. Yes, you can always use a keyword as an identifier if you really want: @typeof @goto = @for.@switch(@throw); is perfectly legal, though more than a little weird. But we prefer to avoid as many breaking changes as possible.
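As a small illustration of the @ escape, here is a sketch; the type and member names are made up for this example and are not from the compiler or the specification. It shows reserved words used as identifiers, and that the @ is not part of the name.

```csharp
// Hypothetical demo type; all names here are illustrative only.
public static class VerbatimIdentifierDemo
{
    // "int", "class" and "return" are reserved, so as identifiers they need the @ prefix.
    public static int @int(int @class)
    {
        int @return = @class * 2;
        return @return;
    }

    // The @ is not part of the name: @value and value are the same identifier.
    public static int Same()
    {
        int @value = 21;
        return value + @value;
    }
}
```

Calling VerbatimIdentifierDemo.@int(5) doubles its argument, and Same() adds the single local to itself.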
We also have a whole bunch of contextual keywords.
The “preprocessor” † uses all the directives (#define, and so on) which of course were never valid identifiers in the first place. But it also uses contextual keywords hidden default disable restore checksum.
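Here is a rough sketch of where those words show up. The class and variable names are invented for the example. Inside a #pragma or #line directive the words are keywords; outside one, hidden, disable, restore and checksum are ordinary identifiers (default is a reserved keyword everywhere).

```csharp
// Hypothetical demo type; names are illustrative only.
public static class DirectiveWordsDemo
{
    public static int Sum()
    {
        // Outside a directive these four words are plain identifiers.
        int hidden = 1, disable = 2, restore = 3, checksum = 4;

        // Suppress, then re-enable, the "assigned but never used" warning.
#pragma warning disable 0219
        int unused = 0;
#pragma warning restore 0219

        // Hide the next line from the debugger, then go back to normal line info.
#line hidden
        int total = hidden + disable + restore + checksum;
#line default

        return total;
    }
}
```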
C# 1.0 had contextual keywords get set value add remove for properties, indexers and events. The attribute locations event and return are already reserved keywords; assembly module type method field property param typevar are contextual keywords in the context of an attribute.
C# 2.0 added where partial global yield alias.
C# 3.0 added from join on equals into orderby ascending descending group by select let var.
C# 4.0 added dynamic.
The async CTP added async and await.
Every time we add one of these we need to design the grammar carefully so that, wherever possible, the new contextual keyword cannot change the meaning of an existing program that used it as an identifier.
For example, when defining a partial class, the partial must go immediately before the class. Since there was never a legal C# 1.0 program where partial appeared immediately before class, we knew that adding this new feature to the grammar could not break any existing programs.
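To make the partial case concrete, here is a minimal sketch with invented names. The word partial is a keyword only in the position right before class; as a local variable name it is still an ordinary identifier.

```csharp
// Hypothetical type split across two declarations; names are illustrative only.
public partial class PartialDemo
{
    private readonly int start = 1;
}

public partial class PartialDemo
{
    public int Next()
    {
        // Here "partial" does not precede class, so it is just a local name.
        int partial = start;
        return partial + 1;
    }
}
```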
Or, another example. Consider var x = 1; – that could have been a legal C# 2.0 program if there was a type called var with a user-defined implicit conversion from int. The semantic analyzer for declaration statements checks to see whether there is a type called var that is accessible at the declaration; if there is then the normal declaration rules are used. Only if there is not such a type can we do the analysis as an implicitly typed local declaration.
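The lookup rule can be sketched as follows. The type named var below is a hypothetical one written for this example, nested so that it is only in scope inside the first class. Newer compilers may warn about an all-lowercase type name.

```csharp
public static class VarLookupDemo
{
    // Hypothetical user type named "var" with an implicit conversion from int.
    private sealed class var
    {
        public int Value;
        public static implicit operator var(int i) { return new var { Value = i }; }
    }

    public static string WithTypeInScope()
    {
        var x = 1; // a type named var is accessible, so normal declaration rules apply
        return x.GetType().Name;
    }
}

public static class VarInferenceDemo
{
    public static string WithoutTypeInScope()
    {
        var x = 1; // no accessible type named var, so x is implicitly typed as int
        return x.GetType().Name;
    }
}
```

The first method reports the nested type's name, the second reports Int32, even though the declaration statement is textually identical.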
One might wonder why on earth we added five contextual keywords to C# 1.0, when there was no chance of breaking backwards compatibility. Why not just make get set value add remove into “real” keywords?
Because we could easily get away with making them contextual keywords, and it seemed likely that real people would want to name variables or methods things like get, set, value, add or remove. So we left them unreserved as a courtesy.
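A quick sketch of what that courtesy buys you; the class and its members are invented for illustration. The accessor words keep their special meaning inside the property, and are plain parameter names in the method.

```csharp
// Hypothetical demo type; names are illustrative only.
public class AccessorWordsDemo
{
    private int total;

    // Inside the accessors, get and set introduce the accessor bodies
    // and value is the implicit parameter of the setter.
    public int Total
    {
        get { return total; }
        set { total = value; }
    }

    // Everywhere else the same five words are ordinary identifiers.
    public int Combine(int get, int set, int value, int add, int remove)
    {
        return get + set + value + add - remove;
    }
}
```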
Those were easy to make contextual, unlike, say, return. That’s a lot harder to make a contextual keyword because then return (10); would be ambiguous; is that calling the method named “return” or returning ten? So we didn’t make any of the other reserved keywords into contextual keywords.
(*) An unfortunate consequence of this definition is that using is said to be a reserved keyword even though its meaning depends on its context; whether the using begins a directive or a statement determines its meaning.
(†) An unfortunate name, since “preprocessing” is not done before regular language processing. In C#, the so-called “preprocessing” happens during lexical analysis.
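The two meanings of using mentioned in the first footnote can be seen side by side in a short sketch. The class and method names are made up for the example; StringReader is the standard System.IO type.

```csharp
using System.IO; // using directive: brings a namespace into scope

// Hypothetical demo type; names are illustrative only.
public static class UsingDemo
{
    public static string FirstLine(string text)
    {
        // using statement: disposes the reader when the block exits
        using (var reader = new StringReader(text))
        {
            return reader.ReadLine();
        }
    }
}
```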
Thanks for the great post Eric. I always find this kind of "behind the scenes" stuff interesting.
It is interesting that you highlight the "using" keyword. I was discussing disposable objects with my colleagues the other day, and mentioned the very cool using syntax you can use in C#. They scratched their heads and said, "we thought 'using' was the equivalent of 'imports'". When I explained it did both depending on where you type it, they were very surprised :)
This is a tricky point of language design; when one keyword is used to represent two completely different concepts, it can be confusing. But introducing a new keyword per concept makes the language feel a bit bloated. I personally would have chosen "imports" or some such syntax for the directive form to ensure that it is not confused with the statement form, but I understand that it's a judgment call.
We were designing a feature for C# 4.0 that got cut which was yet another form of "partial" class; basically, a way to share attribute metadata between the machine-generated and user-generated halves of a partial class. I pushed back on using the keyword "partial" for the feature because we would then have had THREE subtly different meanings for "partial" in C#, which I felt was two too many. (I was advocating adding another contextual keyword "existing".) Unfortunately the point ended up moot since the feature was cut for lack of time. -- Eric
By the way, I've just been reading your archives and I loved the "Riddle me this" Google posts :)
Thanks! Those are among my favourites too. Since we have changed blog software it is now more difficult for me to extract and search the referrer logs, so I haven't written a fourth. -- Eric
It's funny how C# 4.0 adds only one new (contextual) keyword : dynamic ... yet it enables sooo much with it, because it changes the way the compiler emits the calls from static to dynamic late bound through DLR, plus some other interesting things ...
I never knew that the compiler looks up a type called "var" before it decides to use type inference :), time to play some evil compiler error tricks on my colleagues <evil grin> ... (that will teach them to read your blog more :P)
As always, good stuff, the kind of brain food a geek enjoys the most.
Great post, as usual... Thanks Eric !
I thought I knew all C# keywords, but it seems I didn't... I'm looking forward to reading your explanations on hidden, disable, restore and checksum ;)
There's nothing particularly interesting there. Do a web search on "C# pragma" and "C# line" and you'll find the documentation for those preprocessor keywords. -- Eric
There are .NET languages where it is dramatic:
I've found a nice summary:
Of course, the fact that 'var' is still a legal name for a type, means that you can have some legitimately compiling C# 3.0 code which uses it, that then suddenly breaks when you import a namespace that contains a class with that name.
Sure, but of course this was a problem before "var". It is always possible that when you add a new reference, you introduce a new ambiguity. -- Eric
So of course, one would expect a tool such as FxCop to dissuade you from creating classes called 'var'. Perhaps ironic, then, that an old version of FxCop's Microsoft.Cci.dll (which you have to reference in order to build FxCop rules) included a top-level (non-namespaced) class called 'var'...
We considered adding a warning to the compiler if you use a type called "var" ambiguously, but never did implement it. -- Eric
Interesting related story: Why "yield return" rather than "yield"?
Hello again Eric,
One of my colleagues (a VB developer learning C#) emailed me with a LINQ problem today. He had a LINQ query in VB along these lines...
Dim pagesize As Integer = 5
Dim productspaged = pageNum * pagesize
Dim results = (From product In products Where product.productgroupid = deptid Order By product.product_id Descending _
Take productspaged Order By product.product_id _
Take pagesize).ToList
He asked what the C# equivalent would be, pointing out that he couldn't find a "take" keyword for C# query syntax. This keyword omission in C# surprised me somewhat, as it makes the query syntax in C# and VB diverge. There are also other keyword differences in C# and VB’s query syntax, but I can’t remember any off the top of my head.
I had to suggest to my colleague that he write the query the non-"syntactic sugar" way, seeing as there isn't a take keyword in C#, but this was a lot less expressive than the query syntax counterpart.
It struck me as interesting that this keyword was omitted from C#, especially as it would have only been a contextual keyword and therefore not a breaking change. I suppose you could go mad and keyword everything up if you liked, but it would raise the question of “where do you draw the line”.
I agree that it would have been nice. But as you say, the line has to be drawn somewhere, and C# and VB teams drew the line at different places. -- Eric
I just thought I'd comment with this observation as it seemed quite topical :)
You should be able to do it like this without having to revert totally to the non-"syntactic sugar" way:
var results = (from product in products where product.productgroupid == deptid orderby product.product_id descending select product).Take(productspaged).OrderBy(product => product.product_id).Take(pagesize).ToList();
Many programming languages, including C#, treat certain sequences of characters as “special”. Some
I would like to add that some identifiers (assembly field method module param property @return type typevar) can also have special meaning when they are used as attribute targets. And 'value' is the name of the implicit parameter in property, indexer and event accessors.