spec: update section on type inference for Go 1.21

The new section describes type inference as the problem
of solving a set of type equations for bound type parameters.

The next CL will update the section on unification to match
the new inference approach.

Change-Id: I2cb49bfb588ccc82d645343034096a82b7d602e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/503920
TryBot-Bypass: Robert Griesemer <gri@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
This commit is contained in:
Robert Griesemer 2023-06-15 16:09:52 -07:00 committed by Gopher Robot
parent 82ee946d7a
commit a04a665a92
1 changed files with 189 additions and 295 deletions

View File

@ -1,6 +1,6 @@
<!--{
"Title": "The Go Programming Language Specification",
"Subtitle": "Version of July 19, 2023",
"Subtitle": "Version of July 20, 2023",
"Path": "/ref/spec"
}-->
@ -2511,7 +2511,7 @@ type (
<p>
A type definition creates a new, distinct type with the same
<a href="#Types">underlying type</a> and operations as the given type
<a href="#Underlying_types">underlying type</a> and operations as the given type
and binds an identifier, the <i>type name</i>, to it.
</p>
@ -4343,7 +4343,7 @@ type parameter list type arguments after substitution
When using a generic function, type arguments may be provided explicitly,
or they may be partially or completely <a href="#Type_inference">inferred</a>
from the context in which the function is used.
Provided that they can be inferred, type arguments may be omitted entirely if the function is:
Provided that they can be inferred, type argument lists may be omitted entirely if the function is:
</p>
<ul>
@ -4351,7 +4351,7 @@ Provided that they can be inferred, type arguments may be omitted entirely if th
<a href="#Calls">called</a> with ordinary arguments,
</li>
<li>
<a href="#Assignment_statements">assigned</a> to a variable with an explicitly declared type,
<a href="#Assignment_statements">assigned</a> to a variable with a known type
</li>
<li>
<a href="#Calls">passed as an argument</a> to another function, or
@ -4371,7 +4371,7 @@ must be inferrable from the context in which the function is used.
// sum returns the sum (concatenation, for strings) of its arguments.
func sum[T ~int | ~float64 | ~string](x... T) T { … }
x := sum // illegal: sum must have a type argument (x is a variable without a declared type)
x := sum // illegal: the type of x is unknown
intSum := sum[int] // intSum has type func(x... int) int
a := intSum(2, 3) // a has value 5 of type int
b := sum[float64](2.0, 3) // b has value 5.0 of type float64
@ -4406,71 +4406,223 @@ For a generic type, all type arguments must always be provided explicitly.
<h3 id="Type_inference">Type inference</h3>
<p>
<em>NOTE: This section is not yet up-to-date for Go 1.21.</em>
A use of a generic function may omit some or all type arguments if they can be
<i>inferred</i> from the context within which the function is used, including
the constraints of the function's type parameters.
Type inference succeeds if it can infer the missing type arguments
and <a href="#Instantiations">instantiation</a> succeeds with the
inferred type arguments.
Otherwise, type inference fails and the program is invalid.
</p>
<p>
Missing function type arguments may be <i>inferred</i> by a series of steps, described below.
Each step attempts to use known information to infer additional type arguments.
Type inference stops as soon as all type arguments are known.
After type inference is complete, it is still necessary to substitute all type arguments
for type parameters and verify that each type argument
<a href="#Implementing_an_interface">implements</a> the relevant constraint;
it is possible for an inferred type argument to fail to implement a constraint, in which
case instantiation fails.
Type inference uses the type relationships between pairs of types for inference:
For instance, a function argument must be <a href="#Assignability">assignable</a>
to its respective function parameter; this establishes a relationship between the
type of the argument and the type of the parameter.
If either of these two types contains type parameters, type inference looks for the
type arguments to substitute the type parameters with such that the assignability
relationship is satisfied.
Similarly, type inference uses the fact that a type argument must
<a href="#Satisfying_a_type_constraint">satisfy</a> the constraint of its respective
type parameter.
</p>
<p>
Type inference is based on
Each such pair of matched types corresponds to a <i>type equation</i> containing
one or multiple type parameters, from one or possibly multiple generic functions.
Inferring the missing type arguments means solving the resulting set of type
equations for the respective type parameters.
</p>
<p>
For example, given
</p>
<pre>
// dedup returns a copy of the argument slice with any duplicate entries removed.
func dedup[S ~[]E, E comparable](S) S { … }
type Slice []int
var s Slice
s = dedup(s) // same as s = dedup[Slice, int](s)
</pre>
<p>
the variable <code>s</code> of type <code>Slice</code> must be assignable to
the function parameter type <code>S</code> for the program to be valid.
To reduce complexity, type inference ignores the directionality of assignments,
so the type relationship between <code>Slice</code> and <code>S</code> can be
expressed via the (symmetric) type equation <code>Slice ≡<sub>A</sub> S</code>
(or <code>S ≡<sub>A</sub> Slice</code> for that matter),
where the <code><sub>A</sub></code> in <code><sub>A</sub></code>
indicates that the LHS and RHS types must match per assignability rules
(see the section on <a href="#Type_unification">type unifcation</a> for
details).
Similarly, the type parameter <code>S</code> must satisfy its constraint
<code>~[]E</code>. This can be expressed as <code>S ≡<sub>C</sub> ~[]E</code>
where <code>X ≡<sub>C</sub> Y</code> stands for
"<code>X</code> satisfies constraint <code>Y</code>".
These observations lead to a set of two equations
</p>
<pre>
Slice ≡<sub>A</sub> S (1)
S ≡<sub>C</sub> ~[]E (2)
</pre>
<p>
which now can be solved for the type parameters <code>S</code> and <code>E</code>.
From (1) a compiler can infer that the type argument for <code>S</code> is <code>Slice</code>.
Similarly, because the underlying type of <code>Slice</code> is <code>[]int</code>
and <code>[]int</code> must match <code>[]E</code> of the constraint,
a compiler can infer that <code>E</code> must be <code>int</code>.
Thus, for these two equations, type inference infers
</p>
<pre>
S ➞ Slice
E ➞ int
</pre>
<p>
Given a set of type equations, the type parameters to solve for are
the type parameters of the functions that need to be instantiated
and for which no explicit type arguments is provided.
These type parameters are called <i>bound</i> type parameters.
For instance, in the <code>dedup</code> example above, the type parameters
<code>P</code> and <code>E</code> are bound to <code>dedup</code>.
An argument to a generic function call may be a generic function itself.
The type parameters of that function are included in the set of bound
type parameters.
The types of function arguments may contain type parameters from other
functions (such as a generic function enclosing a function call).
Those type parameters may also appear in type equations but they are
not bound in that context.
Type equations are always solved for the bound type parameters only.
</p>
<p>
Type inference supports calls of generic functions and assignments
of generic functions to (explicitly function-typed) variables.
This includes passing generic functions as arguments to other
(possibly also generic) functions, and returning generic functions
as results.
Type inference operates on a set of equations specific to each of
these cases.
The equations are as follows (type argument lists are omitted for clarity):
</p>
<ul>
<li>
a <a href="#Type_parameter_declarations">type parameter list</a>
<p>
For a function call <code>f(a<sub>0</sub>, a<sub>1</sub>, …)</code> where
<code>f</code> or a function argument <code>a<sub>i</sub></code> is
a generic function:
<br>
Each pair <code>(a<sub>i</sub>, p<sub>i</sub>)</code> of corresponding
function arguments and parameters where <code>a<sub>i</sub></code> is not an
<a href="#Constants">untyped constant</a> yields an equation
<code>typeof(p<sub>i</sub>) ≡<sub>A</sub> typeof(a<sub>i</sub>)</code>.
<br>
If <code>a<sub>i</sub></code> is an untyped constant <code>c<sub>j</sub></code>,
and <code>p<sub>i</sub></code> is a bound type parameter <code>P<sub>k</sub></code>,
the pair <code>(c<sub>j</sub>, P<sub>k</sub>)</code> is collected separately from
the type equations.
</p>
</li>
<li>
a substitution map <i>M</i> initialized with the known type arguments, if any
<p>
For an assignment <code>v = f</code> of a generic function <code>f</code> to a
(non-generic) variable <code>v</code> of function type:
<br>
<code>typeof(v) ≡<sub>A</sub> typeof(f)</code>.
</p>
</li>
<li>
a (possibly empty) list of ordinary function arguments (in case of a function call only)
<p>
For a return statement <code>return …, f, … </code> where <code>f</code> is a
generic function returned as a result to a (non-generic) result variable
of function type:
<br>
<code>typeof(r) ≡<sub>A</sub> typeof(f)</code>.
</p>
</li>
</ul>
<p>
and then proceeds with the following steps:
Additionally, each type parameter <code>P<sub>k</sub></code> and corresponding type constraint
<code>C<sub>k</sub></code> yields the type equation
<code>P<sub>k</sub><sub>C</sub> C<sub>k</sub></code>.
</p>
<p>
Type inference gives precedence to type information obtained from typed operands
before considering untyped constants.
Therefore, inference proceeds in two phases:
</p>
<ol>
<li>
apply <a href="#Function_argument_type_inference"><i>function argument type inference</i></a>
to all <i>typed</i> ordinary function arguments
<p>
The type equations are solved for the bound
type parameters using <a href="#Type_unification">type unification</a>.
If unification fails, type inference fails.
</p>
</li>
<li>
apply <a href="#Constraint_type_inference"><i>constraint type inference</i></a>
</li>
<li>
apply function argument type inference to all <i>untyped</i> ordinary function arguments
using the default type for each of the untyped function arguments
</li>
<li>
apply constraint type inference
<p>
For each bound type parameter <code>P<sub>k</sub></code> for which no type argument
has been inferred yet and for which one or more pairs
<code>(c<sub>j</sub>, P<sub>k</sub>)</code> with that same type parameter
were collected, determine the <a href="#Constant_expressions">constant kind</a>
of the constants <code>c<sub>j</sub></code> in all those pairs the same way as for
<a href="#Constant_expressions">constant expressions</a>.
The type argument for <code>P<sub>k</sub></code> is the
<a href="#Constants">default type</a> for the determined constant kind.
If a constant kind cannot be determined due to conflicting constant kinds,
type inference fails.
</p>
</li>
</ol>
<p>
If there are no ordinary or untyped function arguments, the respective steps are skipped.
Constraint type inference is skipped if the previous step didn't infer any new type arguments,
but it is run at least once if there are missing type arguments.
If not all type arguments have been found after these two phases, type inference fails.
</p>
<p>
The substitution map <i>M</i> is carried through all steps, and each step may add entries to <i>M</i>.
The process stops as soon as <i>M</i> has a type argument for each type parameter or if an inference step fails.
If an inference step fails, or if <i>M</i> is still missing type arguments after the last step, type inference fails.
If the two phases are successful, type inference determined a type argument for each
bound type parameter:
</p>
<pre>
P<sub>k</sub> ➞ A<sub>k</sub>
</pre>
<p>
A type argument <code>A<sub>k</sub></code> may be a composite type,
containing other bound type parameters <code>P<sub>k</sub></code> as element types
(or even be just another bound type parameter).
In a process of repeated simplification, the bound type parameters in each type
argument are substituted with the respective type arguments for those type
parameters until each type argument is free of bound type parameters.
</p>
<p>
If type arguments contain cyclic references to themselves
through bound type parameters, simplification and thus type
inference fails.
Otherwise, type inference succeeds.
</p>
<h4 id="Type_unification">Type unification</h4>
<p>
<em>
Note: This section is not up-to-date for Go 1.21.
</em>
</p>
<p>
Type inference is based on <i>type unification</i>. A single unification step
applies to a <a href="#Type_inference">substitution map</a> and two types, either
@ -4546,264 +4698,6 @@ and the type literal <code>[]E</code>, unification compares <code>[]float64</cod
the substitution map.
</p>
<h4 id="Function_argument_type_inference">Function argument type inference</h4>
<!-- In this section and the section on constraint type inference we start with examples
rather than have the examples follow the rules as is customary elsewhere in spec.
Hopefully this helps building an intuition and makes the rules easier to follow. -->
<p>
Function argument type inference infers type arguments from function arguments:
if a function parameter is declared with a type <code>T</code> that uses
type parameters,
<a href="#Type_unification">unifying</a> the type of the corresponding
function argument with <code>T</code> may infer type arguments for the type
parameters used by <code>T</code>.
</p>
<p>
For instance, given the generic function
</p>
<pre>
func scale[Number ~int64|~float64|~complex128](v []Number, s Number) []Number
</pre>
<p>
and the call
</p>
<pre>
var vector []float64
scaledVector := scale(vector, 42)
</pre>
<p>
the type argument for <code>Number</code> can be inferred from the function argument
<code>vector</code> by unifying the type of <code>vector</code> with the corresponding
parameter type: <code>[]float64</code> and <code>[]Number</code>
match in structure and <code>float64</code> matches with <code>Number</code>.
This adds the entry <code>Number</code> &RightArrow; <code>float64</code> to the
<a href="#Type_unification">substitution map</a>.
Untyped arguments, such as the second function argument <code>42</code> here, are ignored
in the first round of function argument type inference and only considered if there are
unresolved type parameters left.
</p>
<p>
Inference happens in two separate phases; each phase operates on a specific list of
(parameter, argument) pairs:
</p>
<ol>
<li>
The list <i>Lt</i> contains all (parameter, argument) pairs where the parameter
type uses type parameters and where the function argument is <i>typed</i>.
</li>
<li>
The list <i>Lu</i> contains all remaining pairs where the parameter type is a single
type parameter. In this list, the respective function arguments are untyped.
</li>
</ol>
<p>
Any other (parameter, argument) pair is ignored.
</p>
<p>
By construction, the arguments of the pairs in <i>Lu</i> are <i>untyped</i> constants
(or the untyped boolean result of a comparison). And because <a href="#Constants">default types</a>
of untyped values are always predeclared non-composite types, they can never match against
a composite type, so it is sufficient to only consider parameter types that are single type
parameters.
</p>
<p>
Each list is processed in a separate phase:
</p>
<ol>
<li>
In the first phase, the parameter and argument types of each pair in <i>Lt</i>
are unified. If unification succeeds for a pair, it may yield new entries that
are added to the substitution map <i>M</i>. If unification fails, type inference
fails.
</li>
<li>
The second phase considers the entries of list <i>Lu</i>. Type parameters for
which the type argument has already been determined are ignored in this phase.
For each remaining pair, the parameter type (which is a single type parameter) and
the <a href="#Constants">default type</a> of the corresponding untyped argument is
unified. If unification fails, type inference fails.
</li>
</ol>
<p>
While unification is successful, processing of each list continues until all list elements
are considered, even if all type arguments are inferred before the last list element has
been processed.
</p>
<p>
Example:
</p>
<pre>
func min[T ~int|~float64](x, y T) T
var x int
min(x, 2.0) // T is int, inferred from typed argument x; 2.0 is assignable to int
min(1.0, 2.0) // T is float64, inferred from default type for 1.0 and matches default type for 2.0
min(1.0, 2) // illegal: default type float64 (for 1.0) doesn't match default type int (for 2)
</pre>
<p>
In the example <code>min(1.0, 2)</code>, processing the function argument <code>1.0</code>
yields the substitution map entry <code>T</code> &RightArrow; <code>float64</code>. Because
processing continues until all untyped arguments are considered, an error is reported. This
ensures that type inference does not depend on the order of the untyped arguments.
</p>
<h4 id="Constraint_type_inference">Constraint type inference</h4>
<p>
Constraint type inference infers type arguments by considering type constraints.
If a type parameter <code>P</code> has a constraint with a
<a href="#Core_types">core type</a> <code>C</code>,
<a href="#Type_unification">unifying</a> <code>P</code> with <code>C</code>
may infer additional type arguments, either the type argument for <code>P</code>,
or if that is already known, possibly the type arguments for type parameters
used in <code>C</code>.
</p>
<p>
For instance, consider the type parameter list with type parameters <code>List</code> and
<code>Elem</code>:
</p>
<pre>
[List ~[]Elem, Elem any]
</pre>
<p>
Constraint type inference can deduce the type of <code>Elem</code> from the type argument
for <code>List</code> because <code>Elem</code> is a type parameter in the core type
<code>[]Elem</code> of <code>List</code>.
If the type argument is <code>Bytes</code>:
</p>
<pre>
type Bytes []byte
</pre>
<p>
unifying the underlying type of <code>Bytes</code> with the core type means
unifying <code>[]byte</code> with <code>[]Elem</code>. That unification succeeds and yields
the <a href="#Type_unification">substitution map</a> entry
<code>Elem</code> &RightArrow; <code>byte</code>.
Thus, in this example, constraint type inference can infer the second type argument from the
first one.
</p>
<p>
Using the core type of a constraint may lose some information: In the (unlikely) case that
the constraint's type set contains a single <a href="#Type_definitions">defined type</a>
<code>N</code>, the corresponding core type is <code>N</code>'s underlying type rather than
<code>N</code> itself. In this case, constraint type inference may succeed but instantiation
will fail because the inferred type is not in the type set of the constraint.
Thus, constraint type inference uses the <i>adjusted core type</i> of
a constraint: if the type set contains a single type, use that type; otherwise use the
constraint's core type.
</p>
<p>
Generally, constraint type inference proceeds in two phases: Starting with a given
substitution map <i>M</i>
</p>
<ol>
<li>
For all type parameters with an adjusted core type, unify the type parameter with that
type. If any unification fails, constraint type inference fails.
</li>
<li>
At this point, some entries in <i>M</i> may map type parameters to other
type parameters or to types containing type parameters. For each entry
<code>P</code> &RightArrow; <code>A</code> in <i>M</i> where <code>A</code> is or
contains type parameters <code>Q</code> for which there exist entries
<code>Q</code> &RightArrow; <code>B</code> in <i>M</i>, substitute those
<code>Q</code> with the respective <code>B</code> in <code>A</code>.
Stop when no further substitution is possible.
</li>
</ol>
<p>
The result of constraint type inference is the final substitution map <i>M</i> from type
parameters <code>P</code> to type arguments <code>A</code> where no type parameter <code>P</code>
appears in any of the <code>A</code>.
</p>
<p>
For instance, given the type parameter list
</p>
<pre>
[A any, B []C, C *A]
</pre>
<p>
and the single provided type argument <code>int</code> for type parameter <code>A</code>,
the initial substitution map <i>M</i> contains the entry <code>A</code> &RightArrow; <code>int</code>.
</p>
<p>
In the first phase, the type parameters <code>B</code> and <code>C</code> are unified
with the core type of their respective constraints. This adds the entries
<code>B</code> &RightArrow; <code>[]C</code> and <code>C</code> &RightArrow; <code>*A</code>
to <i>M</i>.
<p>
At this point there are two entries in <i>M</i> where the right-hand side
is or contains type parameters for which there exists other entries in <i>M</i>:
<code>[]C</code> and <code>*A</code>.
In the second phase, these type parameters are replaced with their respective
types. It doesn't matter in which order this happens. Starting with the state
of <i>M</i> after the first phase:
</p>
<p>
<code>A</code> &RightArrow; <code>int</code>,
<code>B</code> &RightArrow; <code>[]C</code>,
<code>C</code> &RightArrow; <code>*A</code>
</p>
<p>
Replace <code>A</code> on the right-hand side of &RightArrow; with <code>int</code>:
</p>
<p>
<code>A</code> &RightArrow; <code>int</code>,
<code>B</code> &RightArrow; <code>[]C</code>,
<code>C</code> &RightArrow; <code>*int</code>
</p>
<p>
Replace <code>C</code> on the right-hand side of &RightArrow; with <code>*int</code>:
</p>
<p>
<code>A</code> &RightArrow; <code>int</code>,
<code>B</code> &RightArrow; <code>[]*int</code>,
<code>C</code> &RightArrow; <code>*int</code>
</p>
<p>
At this point no further substitution is possible and the map is full.
Therefore, <code>M</code> represents the final map of type parameters
to type arguments for the given type parameter list.
</p>
<h3 id="Operators">Operators</h3>
<p>
@ -5479,7 +5373,7 @@ in any of these cases:
ignoring struct tags (see below),
<code>x</code>'s type and <code>T</code> are not
<a href="#Type_parameter_declarations">type parameters</a> but have
<a href="#Type_identity">identical</a> <a href="#Types">underlying types</a>.
<a href="#Type_identity">identical</a> <a href="#Underlying_types">underlying types</a>.
</li>
<li>
ignoring struct tags (see below),
@ -8291,7 +8185,7 @@ of if the general conversion rules take care of this.
<p>
A <code>Pointer</code> is a <a href="#Pointer_types">pointer type</a> but a <code>Pointer</code>
value may not be <a href="#Address_operators">dereferenced</a>.
Any pointer or value of <a href="#Types">underlying type</a> <code>uintptr</code> can be
Any pointer or value of <a href="#Underlying_types">underlying type</a> <code>uintptr</code> can be
<a href="#Conversions">converted</a> to a type of underlying type <code>Pointer</code> and vice versa.
The effect of converting between <code>Pointer</code> and <code>uintptr</code> is implementation-defined.
</p>