diff --git a/gopls/doc/semantictokens.md b/gopls/doc/semantictokens.md new file mode 100644 index 0000000000..fc541fbff1 --- /dev/null +++ b/gopls/doc/semantictokens.md @@ -0,0 +1,121 @@ +# Semantic Tokens + +The [LSP](https://microsoft.github.io/language-server-protocol/specifications/specification-3-17/#textDocument_semanticTokens) +specifies semantic tokens as a way of telling clients about language-specific +properties of pieces of code in a file being edited. + +The client asks for a set of semantic tokens and modifiers. This note describe which ones +gopls will return, and under what circumstances. Gopls has no control over how the client +converts semantic tokens into colors (or some other visible indication). In vscode it +is possible to modify the color a theme uses by setting the `editor.semanticTokenColorCustomizations` +object. We provide a little [guidance](#Colors) later. + +There are 22 semantic tokens, with 10 possible modifiers. The protocol allows each semantic +token to be used with any of the 1024 subsets of possible modifiers, but most combinations +don't make intuitive sense (although `async documentation` has a certain appeal). + +The 22 semantic tokens are `namespace`, `type`, `class`, `enum`, `interface`, + `struct`, `typeParameter`, `parameter`, `variable`, `property`, `enumMember`, + `event`, `function`, `member`, `macro`, `keyword`, `modifier`, `comment`, + `string`, `number`, `regexp`, `operator`. + +The 10 modifiers are `declaration`, `definition`, `readonly`, `static`, + `deprecated`, `abstract`, `async`, `modification`, `documentation`, `defaultLibrary`. + +The authoritative lists are in the [specification](https://microsoft.github.io/language-server-protocol/specifications/specification-3-17/#semanticTokenTypes) + +For the implementation to work correctly the client and server have to agree on the ordering +of the tokens and of the modifiers. Gopls, therefore, will only send tokens and modifiers +that the client has asked for. This document says what gopls would send if the client +asked for everything. By default, vscode asks for everything. + +Gopls sends 11 token types for `.go` files and 1 for `.*tmpl` files. +Nothing is sent for any other kind of file. +This all could change. (When Go has generics, gopls will return `typeParameter`.) + +For `.*tmpl` files gopls sends `macro`, and no modifiers, for each `{{`...`}}` scope. + +## Semantic tokens for Go files + +There are two contrasting guiding principles that might be used to decide what to mark +with semantic tokens. All clients already do some kind of syntax marking. E.g., vscode +uses a TextMate grammar. The minimal principle would send semantic tokens only for those +language features that cannot be reliably found without parsing Go and looking at types. +The maximal principle would attempt to convey as much as possible about the Go code, +using all available parsing and type information. + +There is much to be said for returning minimal information, but the minimal principle is +not well-specified. Gopls has no way of knowing what the clients know about the Go program +being edited. Even in vscode the TextMate grammars can be more or less elaborate +and change over time. (Nonetheless, a minimal implementation would not return `keyword`, +`number`, `comment`, or `string`.) + +The maximal position isn't particularly well-specified either. To chose one example, a +format string might have formatting codes (`%[4]-3.6f`), escape sequences (`\U00010604`), and regular +characters. Should these all be distinguished? One could even imagine distinguishing +different runes by their Unicode language assignment, or some other Unicode property, such as +being [confusable](http://www.unicode.org/Public/security/10.0.0/confusables.txt). + +Gopls does not come close to either of these principles. Semantic tokens are returned for +identifiers, keywords, operators, comments, and literals. (Sematic tokens do not +cover the file. They are not returned for +white space or punctuation, and there is no semantic token for labels.) +The following describes more precisely what gopls +does, with a few notes on possible alternative choices. +The references to *object* refer to the +```types.Object``` returned by the type checker. The references to *nodes* refer to the +```ast.Node``` from the parser. + +1. __`keyword`__ All Go [keywords](https://golang.org/ref/spec#Keywords) are marked `keyword`. +1. __`namespace`__ All package names are marked `namespace`. In an import, if there is an +alias, it would be marked. Otherwise the last component of the import path is marked. +1. __`type`__ Objects of type ```types.TypeName``` are marked `type`. +If they are also ```types.Basic``` +the modifier is `defaultLibrary`. (And in ```type B struct{C}```, ```B``` has modifier `definition`.) +1. __`parameter`__ The formal arguments in ```ast.FuncDecl``` nodes are marked `parameter`. +1. __`variable`__ Identifiers in the +scope of ```const``` are modified with `readonly`. ```nil``` is usually a `variable` modified with both +`readonly` and `defaultLibrary`. (```nil``` is a predefined identifier; the user can redefine it, +in which case it would just be a variable, or whatever.) Identifiers of type ```types.Variable``` are, +not surprisingly, marked `variable`. Identifiers being defined (node ```ast.GenDecl```) are modified +by `definition` and, if appropriate, `readonly`. Receivers (in method declarations) are +`variable`. +1. __`member`__ Members are marked at their definition (```func (x foo) bar() {}```) or declaration +in an ```interface```. Members are not marked where they are used. +In ```x.bar()```, ```x``` will be marked +either as a `namespace` if it is a package name, or as a `variable` if it is an interface value, +so distinguishing ```bar``` seemed superfluous. +1. __`function`__ Bultins (```types.Builtin```) are modified with `defaultLibrary` +(e.g., ```make```, ```len```, ```copy```). Identifiers whose +object is ```types.Func``` or whose node is ```ast.FuncDecl``` are `function`. +1. __`comment`__ Comments and struct tags. (Perhaps struct tags should be `property`?) +1. __`string`__ Strings. Could add modifiers for e.g., escapes or format codes. +1. __`number`__ Numbers. Should the ```i``` in ```23i``` be handled specially? +1. __`operator`__ Assignment operators, binary operators, ellipses (```...```), increment/decrement +operators, sends (```<-```), and unary operators. + +Gopls will send the modifier `deprecated` if it finds a comment +```// deprecated``` in the godoc. + +The unused tokens for Go code are `class`, `enum`, `interface`, + `struct`, `typeParameter`, `property`, `enumMember`, + `event`, `macro`, `modifier`, + `regexp` + +## Colors + +These comments are about vscode. + +The documentation has a [helpful](https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide#custom-textmate-scope-mappings) +description of which semantic tokens correspond to scopes in TextMate grammars. Themes seem +to use the TextMate scopes to decide on colors. + +Some examples of color customizations are [here](https://medium.com/@danromans/how-to-customize-semantic-token-colorization-with-visual-studio-code-ac3eab96141b). + +## Note + +While a file is being edited it may temporarily contain either +parsing errors or type errors. In this case gopls cannot determine some (or maybe any) +of the semantic tokens. To avoid weird flickering it is the responsibility +of clients to maintain the semantic token information +in the unedited part of the file, and they do. \ No newline at end of file