mirror of https://github.com/golang/go.git
105 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
a61524d103 |
cmd/internal/obj: add Prog.SetFrom3{Reg,Const}
These are the the most common uses, and they reduce line noise. I don't love adding new deprecated APIs, but since they're trivial wrappers, it'll be very easy to update them along with the rest. No functional changes; passes toolstash-check. Change-Id: I691a8175cfef9081180e463c63f326376af3f3a6 Reviewed-on: https://go-review.googlesource.com/c/go/+/296009 Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
6c34d2f420 |
[dev.regabi] cmd/compile: split out package ssagen [generated]
[git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> |
|
|
|
0ced54062e |
[dev.regabi] cmd/compile: split out package objw [generated]
Object file writing routines are used not just at the end
of the compilation but also during static data layout in walk.
Split them into their own package.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
# Move bit vector to new package bitvec
mv bvec.n bvec.N
mv bvec.b bvec.B
mv bvec BitVec
mv bvalloc New
mv bvbulkalloc NewBulk
mv bulkBvec.next bulkBvec.Next
mv bulkBvec Bulk
mv H0 h0
mv Hp hp
# Leave bvecSet and bitmap hashes behind - not needed as broadly.
mv bvecSet.extractUniqe bvecSet.extractUnique
mv h0 bvecSet bvecSet.grow bvecSet.add \
bvecSet.extractUnique hashbitmap bvset.go
mv bv.go cmd/compile/internal/bitvec
ex . ../arm ../arm64 ../mips ../mips64 ../ppc64 ../s390x ../riscv64 {
import "cmd/internal/obj"
var a *obj.Addr
var i int64
Addrconst(a, i) -> a.SetConst(i)
var p, to *obj.Prog
Patch(p, to) -> p.To.SetTarget(to)
}
rm Addrconst Patch
# Move object-writing API to new package objw
mv duint8 Objw_Uint8
mv duint16 Objw_Uint16
mv duint32 Objw_Uint32
mv duintptr Objw_Uintptr
mv duintxx Objw_UintN
mv dsymptr Objw_SymPtr
mv dsymptrOff Objw_SymPtrOff
mv dsymptrWeakOff Objw_SymPtrWeakOff
mv ggloblsym Objw_Global
mv dbvec Objw_BitVec
mv newProgs NewProgs
mv Progs.clearp Progs.Clear
mv Progs.settext Progs.SetText
mv Progs.next Progs.Next
mv Progs.pc Progs.PC
mv Progs.pos Progs.Pos
mv Progs.curfn Progs.CurFunc
mv Progs.progcache Progs.Cache
mv Progs.cacheidx Progs.CacheIndex
mv Progs.nextLive Progs.NextLive
mv Progs.prevLive Progs.PrevLive
mv Progs.Appendpp Progs.Append
mv LivenessIndex.stackMapIndex LivenessIndex.StackMapIndex
mv LivenessIndex.isUnsafePoint LivenessIndex.IsUnsafePoint
mv Objw_Uint8 Objw_Uint16 Objw_Uint32 Objw_Uintptr Objw_UintN \
Objw_SymPtr Objw_SymPtrOff Objw_SymPtrWeakOff Objw_Global \
Objw_BitVec \
objw.go
mv sharedProgArray NewProgs Progs \
LivenessIndex StackMapDontCare \
LivenessDontCare LivenessIndex.StackMapValid \
Progs.NewProg Progs.Flush Progs.Free Progs.Prog Progs.Clear Progs.Append Progs.SetText \
prog.go
mv prog.go objw.go cmd/compile/internal/objw
# Move ggloblnod to obj with the rest of the non-objw higher-level writing.
mv ggloblnod obj.go
'
cd ../objw
rf '
mv Objw_Uint8 Uint8
mv Objw_Uint16 Uint16
mv Objw_Uint32 Uint32
mv Objw_Uintptr Uintptr
mv Objw_UintN UintN
mv Objw_SymPtr SymPtr
mv Objw_SymPtrOff SymPtrOff
mv Objw_SymPtrWeakOff SymPtrWeakOff
mv Objw_Global Global
mv Objw_BitVec BitVec
'
Change-Id: I2b87085aa788564fb322e9c55bddd73347b4d5fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/279310
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
|
|
41f3af9d04 |
[dev.regabi] cmd/compile: replace *Node type with an interface Node [generated]
The plan is to introduce a Node interface that replaces the old *Node pointer-to-struct.
The previous CL defined an interface INode modeling a *Node.
This CL:
- Changes all references outside internal/ir to use INode,
along with many references inside internal/ir as well.
- Renames Node to node.
- Renames INode to Node
So now ir.Node is an interface implemented by *ir.node, which is otherwise inaccessible,
and the code outside package ir is now (clearly) using only the interface.
The usual rule is never to redefine an existing name with a new meaning,
so that old code that hasn't been updated gets a "unknown name" error
instead of more mysterious errors or silent misbehavior. That rule would
caution against replacing Node-the-struct with Node-the-interface,
as in this CL, because code that says *Node would now be using a pointer
to an interface. But this CL is being landed at the same time as another that
moves Node from gc to ir. So the net effect is to replace *gc.Node with ir.Node,
which does follow the rule: any lingering references to gc.Node will be told
it's gone, not silently start using pointers to interfaces. So the rule is followed
by the CL sequence, just not this specific CL.
Overall, the loss of inlining caused by using interfaces cuts the compiler speed
by about 6%, a not insignificant amount. However, as we convert the representation
to concrete structs that are not the giant Node over the next weeks, that speed
should come back as more of the compiler starts operating directly on concrete types
and the memory taken up by the graph of Nodes drops due to the more precise
structs. Honestly, I was expecting worse.
% benchstat bench.old bench.new
name old time/op new time/op delta
Template 168ms ± 4% 182ms ± 2% +8.34% (p=0.000 n=9+9)
Unicode 72.2ms ±10% 82.5ms ± 6% +14.38% (p=0.000 n=9+9)
GoTypes 563ms ± 8% 598ms ± 2% +6.14% (p=0.006 n=9+9)
Compiler 2.89s ± 4% 3.04s ± 2% +5.37% (p=0.000 n=10+9)
SSA 6.45s ± 4% 7.25s ± 5% +12.41% (p=0.000 n=9+10)
Flate 105ms ± 2% 115ms ± 1% +9.66% (p=0.000 n=10+8)
GoParser 144ms ±10% 152ms ± 2% +5.79% (p=0.011 n=9+8)
Reflect 345ms ± 9% 370ms ± 4% +7.28% (p=0.001 n=10+9)
Tar 149ms ± 9% 161ms ± 5% +8.05% (p=0.001 n=10+9)
XML 190ms ± 3% 209ms ± 2% +9.54% (p=0.000 n=9+8)
LinkCompiler 327ms ± 2% 325ms ± 2% ~ (p=0.382 n=8+8)
ExternalLinkCompiler 1.77s ± 4% 1.73s ± 6% ~ (p=0.113 n=9+10)
LinkWithoutDebugCompiler 214ms ± 4% 211ms ± 2% ~ (p=0.360 n=10+8)
StdCmd 14.8s ± 3% 15.9s ± 1% +6.98% (p=0.000 n=10+9)
[Geo mean] 480ms 510ms +6.31%
name old user-time/op new user-time/op delta
Template 223ms ± 3% 237ms ± 3% +6.16% (p=0.000 n=9+10)
Unicode 103ms ± 6% 113ms ± 3% +9.53% (p=0.000 n=9+9)
GoTypes 758ms ± 8% 800ms ± 2% +5.55% (p=0.003 n=10+9)
Compiler 3.95s ± 2% 4.12s ± 2% +4.34% (p=0.000 n=10+9)
SSA 9.43s ± 1% 9.74s ± 4% +3.25% (p=0.000 n=8+10)
Flate 132ms ± 2% 141ms ± 2% +6.89% (p=0.000 n=9+9)
GoParser 177ms ± 9% 183ms ± 4% ~ (p=0.050 n=9+9)
Reflect 467ms ±10% 495ms ± 7% +6.17% (p=0.029 n=10+10)
Tar 183ms ± 9% 197ms ± 5% +7.92% (p=0.001 n=10+10)
XML 249ms ± 5% 268ms ± 4% +7.82% (p=0.000 n=10+9)
LinkCompiler 544ms ± 5% 544ms ± 6% ~ (p=0.863 n=9+9)
ExternalLinkCompiler 1.79s ± 4% 1.75s ± 6% ~ (p=0.075 n=10+10)
LinkWithoutDebugCompiler 248ms ± 6% 246ms ± 2% ~ (p=0.965 n=10+8)
[Geo mean] 483ms 504ms +4.41%
[git-generate]
cd src/cmd/compile/internal/ir
: # We need to do the conversion in multiple steps, so we introduce
: # a temporary type alias that will start out meaning the pointer-to-struct
: # and then change to mean the interface.
rf '
mv Node OldNode
add node.go \
type Node = *OldNode
'
: # It should work to do this ex in ir, but it misses test files, due to a bug in rf.
: # Run the command in gc to handle gc's tests, and then again in ssa for ssa's tests.
cd ../gc
rf '
ex . ../arm ../riscv64 ../arm64 ../mips64 ../ppc64 ../mips ../wasm {
import "cmd/compile/internal/ir"
*ir.OldNode -> ir.Node
}
'
cd ../ssa
rf '
ex {
import "cmd/compile/internal/ir"
*ir.OldNode -> ir.Node
}
'
: # Back in ir, finish conversion clumsily with sed,
: # because type checking and circular aliases do not mix.
cd ../ir
sed -i '' '
/type Node = \*OldNode/d
s/\*OldNode/Node/g
s/^func (n Node)/func (n *OldNode)/
s/OldNode/node/g
s/type INode interface/type Node interface/
s/var _ INode = (Node)(nil)/var _ Node = (*node)(nil)/
' *.go
gofmt -w *.go
sed -i '' '
s/{Func{}, 136, 248}/{Func{}, 152, 280}/
s/{Name{}, 32, 56}/{Name{}, 44, 80}/
s/{Param{}, 24, 48}/{Param{}, 44, 88}/
s/{node{}, 76, 128}/{node{}, 88, 152}/
' sizeof_test.go
cd ../ssa
sed -i '' '
s/{LocalSlot{}, 28, 40}/{LocalSlot{}, 32, 48}/
' sizeof_test.go
cd ../gc
sed -i '' 's/\*ir.Node/ir.Node/' mkbuiltin.go
cd ../../../..
go install std cmd
cd cmd/compile
go test -u || go test -u
Change-Id: I196bbe3b648e4701662e4a2bada40bf155e2a553
Reviewed-on: https://go-review.googlesource.com/c/go/+/272935
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
|
|
84e2bd611f |
[dev.regabi] cmd/compile: introduce cmd/compile/internal/ir [generated]
If we want to break up package gc at all, we will need to move
the compiler IR it defines into a separate package that can be
imported by packages that gc itself imports. This CL does that.
It also removes the TINT8 etc aliases so that all code is clear
about which package things are coming from.
This CL is automatically generated by the script below.
See the comments in the script for details about the changes.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
# These names were never fully qualified
# when the types package was added.
# Do it now, to avoid confusion about where they live.
inline -rm \
Txxx \
TINT8 \
TUINT8 \
TINT16 \
TUINT16 \
TINT32 \
TUINT32 \
TINT64 \
TUINT64 \
TINT \
TUINT \
TUINTPTR \
TCOMPLEX64 \
TCOMPLEX128 \
TFLOAT32 \
TFLOAT64 \
TBOOL \
TPTR \
TFUNC \
TSLICE \
TARRAY \
TSTRUCT \
TCHAN \
TMAP \
TINTER \
TFORW \
TANY \
TSTRING \
TUNSAFEPTR \
TIDEAL \
TNIL \
TBLANK \
TFUNCARGS \
TCHANARGS \
NTYPE \
BADWIDTH
# esc.go and escape.go do not need to be split.
# Append esc.go onto the end of escape.go.
mv esc.go escape.go
# Pull out the type format installation from func Main,
# so it can be carried into package ir.
mv Main:/Sconv.=/-0,/TypeLinkSym/-1 InstallTypeFormats
# Names that need to be exported for use by code left in gc.
mv Isconst IsConst
mv asNode AsNode
mv asNodes AsNodes
mv asTypesNode AsTypesNode
mv basicnames BasicTypeNames
mv builtinpkg BuiltinPkg
mv consttype ConstType
mv dumplist DumpList
mv fdumplist FDumpList
mv fmtMode FmtMode
mv goopnames OpNames
mv inspect Inspect
mv inspectList InspectList
mv localpkg LocalPkg
mv nblank BlankNode
mv numImport NumImport
mv opprec OpPrec
mv origSym OrigSym
mv stmtwithinit StmtWithInit
mv dump DumpAny
mv fdump FDumpAny
mv nod Nod
mv nodl NodAt
mv newname NewName
mv newnamel NewNameAt
mv assertRepresents AssertValidTypeForConst
mv represents ValidTypeForConst
mv nodlit NewLiteral
# Types and fields that need to be exported for use by gc.
mv nowritebarrierrecCallSym SymAndPos
mv SymAndPos.lineno SymAndPos.Pos
mv SymAndPos.target SymAndPos.Sym
mv Func.lsym Func.LSym
mv Func.setWBPos Func.SetWBPos
mv Func.numReturns Func.NumReturns
mv Func.numDefers Func.NumDefers
mv Func.nwbrCalls Func.NWBRCalls
# initLSym is an algorithm left behind in gc,
# not an operation on Func itself.
mv Func.initLSym initLSym
mv nodeQueue NodeQueue
mv NodeQueue.empty NodeQueue.Empty
mv NodeQueue.popLeft NodeQueue.PopLeft
mv NodeQueue.pushRight NodeQueue.PushRight
# Many methods on Node are actually algorithms that
# would apply to any node implementation.
# Those become plain functions.
mv Node.funcname FuncName
mv Node.isBlank IsBlank
mv Node.isGoConst isGoConst
mv Node.isNil IsNil
mv Node.isParamHeapCopy isParamHeapCopy
mv Node.isParamStackCopy isParamStackCopy
mv Node.isSimpleName isSimpleName
mv Node.mayBeShared MayBeShared
mv Node.pkgFuncName PkgFuncName
mv Node.backingArrayPtrLen backingArrayPtrLen
mv Node.isterminating isTermNode
mv Node.labeledControl labeledControl
mv Nodes.isterminating isTermNodes
mv Nodes.sigerr fmtSignature
mv Node.MethodName methodExprName
mv Node.MethodFunc methodExprFunc
mv Node.IsMethod IsMethod
# Every node will need to implement RawCopy;
# Copy and SepCopy algorithms will use it.
mv Node.rawcopy Node.RawCopy
mv Node.copy Copy
mv Node.sepcopy SepCopy
# Extract Node.Format method body into func FmtNode,
# but leave method wrapper behind.
mv Node.Format:0,$ FmtNode
# Formatting helpers that will apply to all node implementations.
mv Node.Line Line
mv Node.exprfmt exprFmt
mv Node.jconv jconvFmt
mv Node.modeString modeString
mv Node.nconv nconvFmt
mv Node.nodedump nodeDumpFmt
mv Node.nodefmt nodeFmt
mv Node.stmtfmt stmtFmt
# Constant support needed for code moving to ir.
mv okforconst OKForConst
mv vconv FmtConst
mv int64Val Int64Val
mv float64Val Float64Val
mv Node.ValueInterface ConstValue
# Organize code into files.
mv LocalPkg BuiltinPkg ir.go
mv NumImport InstallTypeFormats Line fmt.go
mv syntax.go Nod NodAt NewNameAt Class Pxxx PragmaFlag Nointerface SymAndPos \
AsNode AsTypesNode BlankNode OrigSym \
Node.SliceBounds Node.SetSliceBounds Op.IsSlice3 \
IsConst Node.Int64Val Node.CanInt64 Node.Uint64Val Node.BoolVal Node.StringVal \
Node.RawCopy SepCopy Copy \
IsNil IsBlank IsMethod \
Node.Typ Node.StorageClass node.go
mv ConstType ConstValue Int64Val Float64Val AssertValidTypeForConst ValidTypeForConst NewLiteral idealType OKForConst val.go
# Move files to new ir package.
mv bitset.go class_string.go dump.go fmt.go \
ir.go node.go op_string.go val.go \
sizeof_test.go cmd/compile/internal/ir
'
: # fix mkbuiltin.go to generate the changes made to builtin.go during rf
sed -i '' '
s/\[T/[types.T/g
s/\*Node/*ir.Node/g
/internal\/types/c \
fmt.Fprintln(&b, `import (`) \
fmt.Fprintln(&b, ` "cmd/compile/internal/ir"`) \
fmt.Fprintln(&b, ` "cmd/compile/internal/types"`) \
fmt.Fprintln(&b, `)`)
' mkbuiltin.go
gofmt -w mkbuiltin.go
: # update cmd/dist to add internal/ir
cd ../../../dist
sed -i '' '/compile.internal.gc/a\
"cmd/compile/internal/ir",
' buildtool.go
gofmt -w buildtool.go
: # update cmd/compile TestFormats
cd ../..
go install std cmd
cd cmd/compile
go test -u || go test # first one updates but fails; second passes
Change-Id: I5f7caf6b20629b51970279e81231a3574d5b51db
Reviewed-on: https://go-review.googlesource.com/c/go/+/273008
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
|
|
26b66fd60b |
[dev.regabi] cmd/compile: introduce cmd/compile/internal/base [generated]
Move Flag, Debug, Ctxt, Exit, and error messages to
new package cmd/compile/internal/base.
These are the core functionality that everything in gc uses
and which otherwise prevent splitting any other code
out of gc into different packages.
A minor milestone: the compiler source code
no longer contains the string "yy".
[git-generate]
cd src/cmd/compile/internal/gc
rf '
mv atExit AtExit
mv Ctxt atExitFuncs AtExit Exit base.go
mv lineno Pos
mv linestr FmtPos
mv flusherrors FlushErrors
mv yyerror Errorf
mv yyerrorl ErrorfAt
mv yyerrorv ErrorfVers
mv noder.yyerrorpos noder.errorAt
mv Warnl WarnfAt
mv errorexit ErrorExit
mv base.go debug.go flag.go print.go cmd/compile/internal/base
'
: # update comments
sed -i '' 's/yyerrorl/ErrorfAt/g; s/yyerror/Errorf/g' *.go
: # bootstrap.go is not built by default so invisible to rf
sed -i '' 's/Fatalf/base.Fatalf/' bootstrap.go
goimports -w bootstrap.go
: # update cmd/dist to add internal/base
cd ../../../dist
sed -i '' '/internal.amd64/a\
"cmd/compile/internal/base",
' buildtool.go
gofmt -w buildtool.go
Change-Id: I59903c7084222d6eaee38823fd222159ba24a31a
Reviewed-on: https://go-review.googlesource.com/c/go/+/272250
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
|
|
3c240f5d17 |
[dev.regabi] cmd/compile: clean up debug flag (-d) handling [generated]
The debug table is not as haphazard as flags, but there are still
a few mismatches between command-line names and variable names.
This CL moves them all into a consistent home (var Debug, like var Flag).
Code updated automatically using the rf command below.
A followup CL will make a few manual cleanups, leaving this CL
completely automated and easier to regenerate during merge
conflicts.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
add main.go var Debug struct{}
mv Debug_append Debug.Append
mv Debug_checkptr Debug.Checkptr
mv Debug_closure Debug.Closure
mv Debug_compilelater Debug.CompileLater
mv disable_checknil Debug.DisableNil
mv debug_dclstack Debug.DclStack
mv Debug_gcprog Debug.GCProg
mv Debug_libfuzzer Debug.Libfuzzer
mv Debug_checknil Debug.Nil
mv Debug_panic Debug.Panic
mv Debug_slice Debug.Slice
mv Debug_typeassert Debug.TypeAssert
mv Debug_wb Debug.WB
mv Debug_export Debug.Export
mv Debug_pctab Debug.PCTab
mv Debug_locationlist Debug.LocationLists
mv Debug_typecheckinl Debug.TypecheckInl
mv Debug_gendwarfinl Debug.DwarfInl
mv Debug_softfloat Debug.SoftFloat
mv Debug_defer Debug.Defer
mv Debug_dumpptrs Debug.DumpPtrs
mv flag.go:/parse.-d/-1,/unknown.debug/+2 parseDebug
mv debugtab Debug parseDebug \
debugHelpHeader debugHelpFooter \
debug.go
# Remove //go:generate line copied from main.go
rm debug.go:/go:generate/-+
'
Change-Id: I625761ca5659be4052f7161a83baa00df75cca91
Reviewed-on: https://go-review.googlesource.com/c/go/+/272246
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
|
|
cb674b5c13 |
cmd/compile,cmd/asm: fix function pointer call perf regression on ppc64
by inserting hint when using bclrl. Using this instruction as subroutine call is not the expected default behavior, and as a result confuses the branch predictor. The default expected behavior is a conditional return from a subroutine. We can change this assumption by encoding a hint this is not a subroutine return. The regex benchmarks are a pretty good example of how much this hint can help generic ppc64le code on a power9 machine: name old time/op new time/op delta Find 606ns ± 0% 447ns ± 0% -26.27% FindAllNoMatches 309ns ± 0% 205ns ± 0% -33.72% FindString 609ns ± 0% 451ns ± 0% -26.04% FindSubmatch 734ns ± 0% 594ns ± 0% -19.07% FindStringSubmatch 706ns ± 0% 574ns ± 0% -18.83% Literal 177ns ± 0% 136ns ± 0% -22.89% NotLiteral 4.69µs ± 0% 2.34µs ± 0% -50.14% MatchClass 6.05µs ± 0% 3.26µs ± 0% -46.08% MatchClass_InRange 5.93µs ± 0% 3.15µs ± 0% -46.86% ReplaceAll 3.15µs ± 0% 2.18µs ± 0% -30.77% AnchoredLiteralShortNonMatch 156ns ± 0% 109ns ± 0% -30.61% AnchoredLiteralLongNonMatch 192ns ± 0% 136ns ± 0% -29.34% AnchoredShortMatch 268ns ± 0% 209ns ± 0% -22.00% AnchoredLongMatch 472ns ± 0% 357ns ± 0% -24.30% OnePassShortA 1.16µs ± 0% 0.87µs ± 0% -25.03% NotOnePassShortA 1.34µs ± 0% 1.20µs ± 0% -10.63% OnePassShortB 940ns ± 0% 655ns ± 0% -30.29% NotOnePassShortB 873ns ± 0% 703ns ± 0% -19.52% OnePassLongPrefix 258ns ± 0% 155ns ± 0% -40.13% OnePassLongNotPrefix 943ns ± 0% 529ns ± 0% -43.89% MatchParallelShared 591ns ± 0% 436ns ± 0% -26.31% MatchParallelCopied 596ns ± 0% 435ns ± 0% -27.10% QuoteMetaAll 186ns ± 0% 186ns ± 0% -0.16% QuoteMetaNone 55.9ns ± 0% 55.9ns ± 0% +0.02% Compile/Onepass 9.64µs ± 0% 9.26µs ± 0% -3.97% Compile/Medium 21.7µs ± 0% 20.6µs ± 0% -4.90% Compile/Hard 174µs ± 0% 174µs ± 0% +0.07% Match/Easy0/16 7.35ns ± 0% 7.34ns ± 0% -0.11% Match/Easy0/32 116ns ± 0% 97ns ± 0% -16.27% Match/Easy0/1K 592ns ± 0% 562ns ± 0% -5.04% Match/Easy0/32K 12.6µs ± 0% 12.5µs ± 0% -0.64% Match/Easy0/1M 556µs ± 0% 556µs ± 0% -0.00% Match/Easy0/32M 17.7ms ± 0% 17.7ms ± 0% +0.05% Match/Easy0i/16 7.34ns ± 0% 7.35ns ± 0% +0.10% Match/Easy0i/32 2.82µs ± 0% 1.64µs ± 0% -41.71% Match/Easy0i/1K 83.2µs ± 0% 48.2µs ± 0% -42.06% Match/Easy0i/32K 2.13ms ± 0% 1.80ms ± 0% -15.34% Match/Easy0i/1M 68.1ms ± 0% 57.6ms ± 0% -15.31% Match/Easy0i/32M 2.18s ± 0% 1.80s ± 0% -17.52% Match/Easy1/16 7.36ns ± 0% 7.34ns ± 0% -0.24% Match/Easy1/32 118ns ± 0% 96ns ± 0% -18.72% Match/Easy1/1K 2.46µs ± 0% 1.58µs ± 0% -35.65% Match/Easy1/32K 80.2µs ± 0% 54.6µs ± 0% -31.92% Match/Easy1/1M 2.75ms ± 0% 1.88ms ± 0% -31.66% Match/Easy1/32M 87.5ms ± 0% 59.8ms ± 0% -31.62% Match/Medium/16 7.34ns ± 0% 7.34ns ± 0% +0.01% Match/Medium/32 2.60µs ± 0% 1.50µs ± 0% -42.61% Match/Medium/1K 78.1µs ± 0% 43.7µs ± 0% -44.06% Match/Medium/32K 2.08ms ± 0% 1.52ms ± 0% -27.11% Match/Medium/1M 66.5ms ± 0% 48.6ms ± 0% -26.96% Match/Medium/32M 2.14s ± 0% 1.60s ± 0% -25.18% Match/Hard/16 7.35ns ± 0% 7.35ns ± 0% +0.03% Match/Hard/32 3.58µs ± 0% 2.44µs ± 0% -31.82% Match/Hard/1K 108µs ± 0% 75µs ± 0% -31.04% Match/Hard/32K 2.79ms ± 0% 2.25ms ± 0% -19.30% Match/Hard/1M 89.4ms ± 0% 72.2ms ± 0% -19.26% Match/Hard/32M 2.91s ± 0% 2.37s ± 0% -18.60% Match/Hard1/16 11.1µs ± 0% 8.3µs ± 0% -25.07% Match/Hard1/32 21.4µs ± 0% 16.1µs ± 0% -24.85% Match/Hard1/1K 658µs ± 0% 498µs ± 0% -24.27% Match/Hard1/32K 12.2ms ± 0% 11.7ms ± 0% -4.60% Match/Hard1/1M 391ms ± 0% 374ms ± 0% -4.40% Match/Hard1/32M 12.6s ± 0% 12.0s ± 0% -4.68% Match_onepass_regex/16 870ns ± 0% 611ns ± 0% -29.79% Match_onepass_regex/32 1.58µs ± 0% 1.08µs ± 0% -31.48% Match_onepass_regex/1K 45.7µs ± 0% 30.3µs ± 0% -33.58% Match_onepass_regex/32K 1.45ms ± 0% 0.97ms ± 0% -33.20% Match_onepass_regex/1M 46.2ms ± 0% 30.9ms ± 0% -33.01% Match_onepass_regex/32M 1.46s ± 0% 0.99s ± 0% -32.02% name old alloc/op new alloc/op delta Find 0.00B 0.00B 0.00% FindAllNoMatches 0.00B 0.00B 0.00% FindString 0.00B 0.00B 0.00% FindSubmatch 48.0B ± 0% 48.0B ± 0% 0.00% FindStringSubmatch 32.0B ± 0% 32.0B ± 0% 0.00% Compile/Onepass 4.02kB ± 0% 4.02kB ± 0% 0.00% Compile/Medium 9.39kB ± 0% 9.39kB ± 0% 0.00% Compile/Hard 84.7kB ± 0% 84.7kB ± 0% 0.00% Match_onepass_regex/16 0.00B 0.00B 0.00% Match_onepass_regex/32 0.00B 0.00B 0.00% Match_onepass_regex/1K 0.00B 0.00B 0.00% Match_onepass_regex/32K 0.00B 0.00B 0.00% Match_onepass_regex/1M 5.00B ± 0% 3.00B ± 0% -40.00% Match_onepass_regex/32M 136B ± 0% 68B ± 0% -50.00% name old allocs/op new allocs/op delta Find 0.00 0.00 0.00% FindAllNoMatches 0.00 0.00 0.00% FindString 0.00 0.00 0.00% FindSubmatch 1.00 ± 0% 1.00 ± 0% 0.00% FindStringSubmatch 1.00 ± 0% 1.00 ± 0% 0.00% Compile/Onepass 52.0 ± 0% 52.0 ± 0% 0.00% Compile/Medium 112 ± 0% 112 ± 0% 0.00% Compile/Hard 424 ± 0% 424 ± 0% 0.00% Match_onepass_regex/16 0.00 0.00 0.00% Match_onepass_regex/32 0.00 0.00 0.00% Match_onepass_regex/1K 0.00 0.00 0.00% Match_onepass_regex/32K 0.00 0.00 0.00% Match_onepass_regex/1M 0.00 0.00 0.00% Match_onepass_regex/32M 2.00 ± 0% 1.00 ± 0% -50.00% name old speed new speed delta QuoteMetaAll 75.2MB/s ± 0% 75.3MB/s ± 0% +0.15% QuoteMetaNone 465MB/s ± 0% 465MB/s ± 0% -0.02% Match/Easy0/16 2.18GB/s ± 0% 2.18GB/s ± 0% +0.10% Match/Easy0/32 276MB/s ± 0% 330MB/s ± 0% +19.46% Match/Easy0/1K 1.73GB/s ± 0% 1.82GB/s ± 0% +5.29% Match/Easy0/32K 2.60GB/s ± 0% 2.62GB/s ± 0% +0.64% Match/Easy0/1M 1.89GB/s ± 0% 1.89GB/s ± 0% +0.00% Match/Easy0/32M 1.89GB/s ± 0% 1.89GB/s ± 0% -0.05% Match/Easy0i/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.10% Match/Easy0i/32 11.4MB/s ± 0% 19.5MB/s ± 0% +71.48% Match/Easy0i/1K 12.3MB/s ± 0% 21.2MB/s ± 0% +72.62% Match/Easy0i/32K 15.4MB/s ± 0% 18.2MB/s ± 0% +18.12% Match/Easy0i/1M 15.4MB/s ± 0% 18.2MB/s ± 0% +18.12% Match/Easy0i/32M 15.4MB/s ± 0% 18.6MB/s ± 0% +21.21% Match/Easy1/16 2.17GB/s ± 0% 2.18GB/s ± 0% +0.24% Match/Easy1/32 271MB/s ± 0% 333MB/s ± 0% +23.07% Match/Easy1/1K 417MB/s ± 0% 648MB/s ± 0% +55.38% Match/Easy1/32K 409MB/s ± 0% 600MB/s ± 0% +46.88% Match/Easy1/1M 381MB/s ± 0% 558MB/s ± 0% +46.33% Match/Easy1/32M 383MB/s ± 0% 561MB/s ± 0% +46.25% Match/Medium/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.01% Match/Medium/32 12.3MB/s ± 0% 21.4MB/s ± 0% +74.13% Match/Medium/1K 13.1MB/s ± 0% 23.4MB/s ± 0% +78.73% Match/Medium/32K 15.7MB/s ± 0% 21.6MB/s ± 0% +37.23% Match/Medium/1M 15.8MB/s ± 0% 21.6MB/s ± 0% +36.93% Match/Medium/32M 15.7MB/s ± 0% 21.0MB/s ± 0% +33.67% Match/Hard/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.03% Match/Hard/32 8.93MB/s ± 0% 13.10MB/s ± 0% +46.70% Match/Hard/1K 9.48MB/s ± 0% 13.74MB/s ± 0% +44.94% Match/Hard/32K 11.7MB/s ± 0% 14.5MB/s ± 0% +23.87% Match/Hard/1M 11.7MB/s ± 0% 14.5MB/s ± 0% +23.87% Match/Hard/32M 11.6MB/s ± 0% 14.2MB/s ± 0% +22.86% Match/Hard1/16 1.44MB/s ± 0% 1.93MB/s ± 0% +34.03% Match/Hard1/32 1.49MB/s ± 0% 1.99MB/s ± 0% +33.56% Match/Hard1/1K 1.56MB/s ± 0% 2.05MB/s ± 0% +31.41% Match/Hard1/32K 2.68MB/s ± 0% 2.80MB/s ± 0% +4.48% Match/Hard1/1M 2.68MB/s ± 0% 2.80MB/s ± 0% +4.48% Match/Hard1/32M 2.66MB/s ± 0% 2.79MB/s ± 0% +4.89% Match_onepass_regex/16 18.4MB/s ± 0% 26.2MB/s ± 0% +42.41% Match_onepass_regex/32 20.2MB/s ± 0% 29.5MB/s ± 0% +45.92% Match_onepass_regex/1K 22.4MB/s ± 0% 33.8MB/s ± 0% +50.54% Match_onepass_regex/32K 22.6MB/s ± 0% 33.9MB/s ± 0% +49.67% Match_onepass_regex/1M 22.7MB/s ± 0% 33.9MB/s ± 0% +49.27% Match_onepass_regex/32M 23.0MB/s ± 0% 33.9MB/s ± 0% +47.14% Fixes #42709 Change-Id: Ice07fec2de4c5b1302febf8c2978ae8c1e4fd3e5 Reviewed-on: https://go-review.googlesource.com/c/go/+/271337 Reviewed-by: Cherry Zhang <cherryyz@google.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Trust: Carlos Eduardo Seo <carlos.seo@linaro.org> |
|
|
|
c3c6fbf314 |
cmd/compile: combine more 32 bit shift and mask operations on ppc64
Combine (AND m (SRWconst x)) or (SRWconst (AND m x)) when mask m is and the shift value produce constant which can be encoded into an RLWINM instruction. Combine (CLRLSLDI (SRWconst x)) if the combining of the underling rotate masks produces a constant which can be encoded into RLWINM. Likewise for (SLDconst (SRWconst x)) and (CLRLSDI (RLWINM x)). Combine rotate word + and operations which can be encoded as a single RLWINM/RLWNM instruction. The most notable performance improvements arise from the crypto benchmarks below (GOARCH=power8 on a ppc64le/linux): pkg:golang.org/x/crypto/blowfish goos:linux goarch:ppc64le ExpandKeyWithSalt 52.2µs ± 0% 47.5µs ± 0% -8.88% ExpandKey 44.4µs ± 0% 40.3µs ± 0% -9.15% pkg:golang.org/x/crypto/ssh/internal/bcrypt_pbkdf goos:linux goarch:ppc64le Key 57.6ms ± 0% 52.3ms ± 0% -9.13% pkg:golang.org/x/crypto/bcrypt goos:linux goarch:ppc64le Equal 90.9ms ± 0% 82.6ms ± 0% -9.13% DefaultCost 91.0ms ± 0% 82.7ms ± 0% -9.12% Change-Id: I59a0ca29face38f4ab46e37124c32906f216c4ce Reviewed-on: https://go-review.googlesource.com/c/go/+/260798 Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
e223c6cf07 |
cmd/compile: intrinsify runtime/internal/atomic.{And,Or} on PPC64
This is a simple case of changing the operand size of the existing 8-bit And/Or. I've also updated a few operand descriptions that were out-of-sync with the implementation. Change-Id: I95ac4445d08f7958768aec9a233698a2d652a39a Reviewed-on: https://go-review.googlesource.com/c/go/+/263150 Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
bdab5df40f |
cmd/compile,cmd/internal/obj/ppc64: use mulli where possible
This adds support to allow the use of mulli when one of the multiply operands is a constant that fits in 16 bits. This especially helps in the case where this instruction appears in a loop since the load of the constant is not being moved out of the loop. Some improvements seen in compress/flate on power9: Decode/Digits/Huffman/1e4 259µs ± 0% 261µs ± 0% +0.57% (p=1.000 n=1+1) Decode/Digits/Huffman/1e5 2.43ms ± 0% 2.45ms ± 0% +0.79% (p=1.000 n=1+1) Decode/Digits/Huffman/1e6 23.9ms ± 0% 24.2ms ± 0% +0.86% (p=1.000 n=1+1) Decode/Digits/Speed/1e4 278µs ± 0% 279µs ± 0% +0.34% (p=1.000 n=1+1) Decode/Digits/Speed/1e5 2.80ms ± 0% 2.81ms ± 0% +0.29% (p=1.000 n=1+1) Decode/Digits/Speed/1e6 28.0ms ± 0% 28.1ms ± 0% +0.28% (p=1.000 n=1+1) Decode/Digits/Default/1e4 278µs ± 0% 278µs ± 0% +0.28% (p=1.000 n=1+1) Decode/Digits/Default/1e5 2.68ms ± 0% 2.69ms ± 0% +0.19% (p=1.000 n=1+1) Decode/Digits/Default/1e6 26.6ms ± 0% 26.6ms ± 0% +0.21% (p=1.000 n=1+1) Decode/Digits/Compression/1e4 278µs ± 0% 278µs ± 0% +0.00% (p=1.000 n=1+1) Decode/Digits/Compression/1e5 2.68ms ± 0% 2.69ms ± 0% +0.21% (p=1.000 n=1+1) Decode/Digits/Compression/1e6 26.6ms ± 0% 26.6ms ± 0% +0.07% (p=1.000 n=1+1) Decode/Newton/Huffman/1e4 322µs ± 0% 312µs ± 0% -2.84% (p=1.000 n=1+1) Decode/Newton/Huffman/1e5 3.11ms ± 0% 2.91ms ± 0% -6.41% (p=1.000 n=1+1) Decode/Newton/Huffman/1e6 31.4ms ± 0% 29.3ms ± 0% -6.85% (p=1.000 n=1+1) Decode/Newton/Speed/1e4 282µs ± 0% 269µs ± 0% -4.69% (p=1.000 n=1+1) Decode/Newton/Speed/1e5 2.29ms ± 0% 2.20ms ± 0% -4.13% (p=1.000 n=1+1) Decode/Newton/Speed/1e6 22.7ms ± 0% 21.3ms ± 0% -6.06% (p=1.000 n=1+1) Decode/Newton/Default/1e4 254µs ± 0% 237µs ± 0% -6.60% (p=1.000 n=1+1) Decode/Newton/Default/1e5 1.86ms ± 0% 1.75ms ± 0% -5.99% (p=1.000 n=1+1) Decode/Newton/Default/1e6 18.1ms ± 0% 17.4ms ± 0% -4.10% (p=1.000 n=1+1) Decode/Newton/Compression/1e4 254µs ± 0% 244µs ± 0% -3.91% (p=1.000 n=1+1) Decode/Newton/Compression/1e5 1.85ms ± 0% 1.79ms ± 0% -3.10% (p=1.000 n=1+1) Decode/Newton/Compression/1e6 18.0ms ± 0% 17.3ms ± 0% -3.88% (p=1.000 n=1+1) Change-Id: I840320fab1c4bf64c76b001c2651ab79f23df4eb Reviewed-on: https://go-review.googlesource.com/c/go/+/259444 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
cc2a5cf4b8 |
cmd/compile,cmd/internal/obj/ppc64: fix some shift rules due to a regression
A recent change to improve shifts was generating some invalid cases when the rule was based on an AND. The extended mnemonics CLRLSLDI and CLRLSLWI only allow certain values for the operands and in the mask case those values were not being checked properly. This adds a check to those rules to verify that the 'b' and 'n' values used when an AND was part of the rule have correct values. There was a bug in some diag messages in asm9. The message expected 3 values but only provided 2. Those are corrected here also. The test/codegen/shift.go was updated to add a few more cases to check for the case mentioned here. Some of the comments that mention the order of operands in these extended mnemonics were wrong and those have been corrected. Fixes #41683. Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31 Reviewed-on: https://go-review.googlesource.com/c/go/+/258138 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
a424f6e45e |
cmd/asm,cmd/compile,cmd/internal/obj/ppc64: add extswsli support on power9
This adds support for the extswsli instruction which combines extsw followed by a shift. New benchmark demonstrates the improvement: name old time/op new time/op delta ExtShift 1.34µs ± 0% 1.30µs ± 0% -3.15% (p=0.057 n=4+3) Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/257017 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
967465da29 |
cmd/compile: use combined shifts to improve array addressing on ppc64x
This change adds rules to find pairs of instructions that can
be combined into a single shifts. These instruction sequences
are common in array addressing within loops. Improvements can
be seen in many crypto packages and the hash packages.
These are based on the extended mnemonics found in the ISA
sections C.8.1 and C.8.2.
Some rules in PPC64.rules were moved because the ordering prevented
some matching.
The following results were generated on power9.
hash/crc32:
CRC32/poly=Koopman/size=40/align=0 195ns ± 0% 163ns ± 0% -16.41%
CRC32/poly=Koopman/size=40/align=1 200ns ± 0% 163ns ± 0% -18.50%
CRC32/poly=Koopman/size=512/align=0 1.98µs ± 0% 1.67µs ± 0% -15.46%
CRC32/poly=Koopman/size=512/align=1 1.98µs ± 0% 1.69µs ± 0% -14.80%
CRC32/poly=Koopman/size=1kB/align=0 3.90µs ± 0% 3.31µs ± 0% -15.27%
CRC32/poly=Koopman/size=1kB/align=1 3.85µs ± 0% 3.31µs ± 0% -14.15%
CRC32/poly=Koopman/size=4kB/align=0 15.3µs ± 0% 13.1µs ± 0% -14.22%
CRC32/poly=Koopman/size=4kB/align=1 15.4µs ± 0% 13.1µs ± 0% -14.79%
CRC32/poly=Koopman/size=32kB/align=0 137µs ± 0% 105µs ± 0% -23.56%
CRC32/poly=Koopman/size=32kB/align=1 137µs ± 0% 105µs ± 0% -23.53%
crypto/rc4:
RC4_128 733ns ± 0% 650ns ± 0% -11.32% (p=1.000 n=1+1)
RC4_1K 5.80µs ± 0% 5.17µs ± 0% -10.89% (p=1.000 n=1+1)
RC4_8K 45.7µs ± 0% 40.8µs ± 0% -10.73% (p=1.000 n=1+1)
crypto/sha1:
Hash8Bytes 635ns ± 0% 613ns ± 0% -3.46% (p=1.000 n=1+1)
Hash320Bytes 2.30µs ± 0% 2.18µs ± 0% -5.38% (p=1.000 n=1+1)
Hash1K 5.88µs ± 0% 5.38µs ± 0% -8.62% (p=1.000 n=1+1)
Hash8K 42.0µs ± 0% 37.9µs ± 0% -9.75% (p=1.000 n=1+1)
There are other improvements found in golang.org/x/crypto which are all in the
range of 5-15%.
Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252097
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
|
|
7615b20d06 |
cmd/compile: generate subfic on ppc64
This merges an lis + subf into subfic, and for 32b constants lwa + subf into oris + ori + subf. The carry bit is no longer used in code generation, therefore I think we can clobber it as needed. Note, lowered borrow/carry arithmetic is self-contained and thus is not affected. A few extra rules are added to ensure early transformations to SUBFCconst don't trip up earlier rules, fold constant operations, or otherwise simplify lowering. Likewise, tests are added to ensure all rules are hit. Generic constant folding catches trivial cases, however some lowering rules insert arithmetic which can introduce new opportunities (e.g BitLen or Slicemask). I couldn't find a specific benchmark to demonstrate noteworthy improvements, but this is generating subfic in many of the default bent test binaries, so we are at least saving a little code space. Change-Id: Iad7c6e5767eaa9dc24dc1c989bd1c8cfe1982012 Reviewed-on: https://go-review.googlesource.com/c/go/+/249461 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> |
|
|
|
2aba467933 |
cmd/compile: remove unused carry related ssa ops in ppc64
The intermediate SSA opcodes* are no longer generated during the lowering pass. The shifting rules have been improved using ISEL. Therefore, we can remove them and the rules which expand them. * The removed opcodes are: LoweredAdd64Carry ADDconstForCarry MaskIfNotCarry FlagCarryClear FlagCarrySet Change-Id: I1ebe2726ed988f29ed4800c8f57b428f7a214cd0 Reviewed-on: https://go-review.googlesource.com/c/go/+/249462 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
e7c7ce646f |
cmd/compile: combine multiply/add into maddld on ppc64le/power9
Add a new lowering rule to match and replace such instances with the MADDLD instruction available on power9 where possible. Likewise, this plumbs in a new ppc64 ssa opcode to house the newly generated MADDLD instructions. When testing ed25519, this reduced binary size by 936B. Similarly, MADDLD combination occcurs in a few other less obvious cases such as division by constant. Testing of golang.org/x/crypto/ed25519 shows non-trivial speedup during keygeneration: name old time/op new time/op delta KeyGeneration 65.2µs ± 0% 63.1µs ± 0% -3.19% Signing 64.3µs ± 0% 64.4µs ± 0% +0.16% Verification 147µs ± 0% 147µs ± 0% +0.11% Similarly, this test binary has shrunk by 66488B. Change-Id: I077aeda7943119b41f07e4e62e44a648f16e4ad0 Reviewed-on: https://go-review.googlesource.com/c/go/+/248723 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
56933fb838 |
cmd/compile,cmd/internal/obj/ppc64: use mod instructions on power9
This updates the PPC64.rules file to use the MOD instructions that are available in power9. Prior to power9 this is done using a longer sequence with multiply and divide. Included in this change is removal of the REM* opcode variations that set the CC or OV bits since their settings are based on the DIV and are not appropriate for the REM. Change-Id: Iceed9ce33e128e1911c15592ee674276ce8ba3fa Reviewed-on: https://go-review.googlesource.com/c/go/+/229761 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
ea7126fe14 |
cmd/compile: use a Sym type instead of interface{} for symbolic offsets
Will help with strongly typed rewrite rules. Change-Id: Ifbf316a49f4081322b3b8f13bc962713437d9aba Reviewed-on: https://go-review.googlesource.com/c/go/+/227785 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> |
|
|
|
815509ae31 |
cmd/compile: improve lowered moves and zeros for ppc64le
This change includes the following: - Generate LXV/STXV sequences instead of LXVD2X/STXVD2X on power9. These instructions do not require an index register, which allows more loads and stores within a loop without initializing multiple index registers. The LoweredQuadXXX generate LXV/STXV. - Create LoweredMoveXXXShort and LoweredZeroXXXShort for short moves that don't generate loops, and therefore don't clobber the address registers or flags. - Use registers other than R3 and R4 to avoid conflicting with registers that have already been allocated to avoid unnecessary register moves. - Eliminate the use of R14 as scratch register and use R31 instead. - Add PCALIGN when the LoweredMoveXXX or LoweredZeroXXX generates a loop with more than 3 iterations. This performance opportunity was noticed in github.com/golang/snappy benchmarks. Results on power9: WordsDecode1e1 54.1ns ± 0% 53.8ns ± 0% -0.51% (p=0.029 n=4+4) WordsDecode1e2 287ns ± 0% 282ns ± 1% -1.83% (p=0.029 n=4+4) WordsDecode1e3 3.98µs ± 0% 3.64µs ± 0% -8.52% (p=0.029 n=4+4) WordsDecode1e4 66.9µs ± 0% 67.0µs ± 0% +0.20% (p=0.029 n=4+4) WordsDecode1e5 723µs ± 0% 723µs ± 0% -0.01% (p=0.200 n=4+4) WordsDecode1e6 7.21ms ± 0% 7.21ms ± 0% -0.02% (p=1.000 n=4+4) WordsEncode1e1 29.9ns ± 0% 29.4ns ± 0% -1.51% (p=0.029 n=4+4) WordsEncode1e2 2.12µs ± 0% 1.75µs ± 0% -17.70% (p=0.029 n=4+4) WordsEncode1e3 11.7µs ± 0% 11.2µs ± 0% -4.61% (p=0.029 n=4+4) WordsEncode1e4 119µs ± 0% 120µs ± 0% +0.36% (p=0.029 n=4+4) WordsEncode1e5 1.21ms ± 0% 1.22ms ± 0% +0.41% (p=0.029 n=4+4) WordsEncode1e6 12.0ms ± 0% 12.0ms ± 0% +0.57% (p=0.029 n=4+4) RandomEncode 286µs ± 0% 203µs ± 0% -28.82% (p=0.029 n=4+4) ExtendMatch 47.4µs ± 0% 47.0µs ± 0% -0.85% (p=0.029 n=4+4) Change-Id: Iecad3a39ae55280286e42760a5c9d5c1168f5858 Reviewed-on: https://go-review.googlesource.com/c/go/+/226539 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
287d67e3dd |
cmd/compile: indexed loads/stores can't be faultOnNilArg0
Because of the index, these ops can't guarantee faulting if arg0 is nil. Clean up the PPC64 index ops - they can't take a sym or an offset. Noticed while debugging #37881. I don't think it is the cause, but I guess there is a chance. Update #37881 Change-Id: Ic22925250bf7b1ba64e3cea1a65638bc4bab390c Reviewed-on: https://go-review.googlesource.com/c/go/+/224457 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
37fc092be1 |
cmd/compile: remove duplicate ppc64 rules
Const64 gets lowered to MOVDconst. Change rules using interior Const64 to use MOVDconst instead, to be less dependent on rule application order. As a result of doing this, some of the rules end up being exact duplicates; remove those. We had those exact duplicates because of the order dependency; ppc64 had no way to optimize away shifts by a constant if the initial lowering didn't catch it. Add those optimizations as well. The outcome is the same, but this makes the overall rules more robust. Change-Id: Iadd97a9fe73d52358d571d022ace145e506d160b Reviewed-on: https://go-review.googlesource.com/c/go/+/220877 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
40ebcfaa17 |
cmd/compile: enable nil check logging for other architectures.
Change-Id: If82ebd9cd6470863eb5de9e031e7905a66218857 Reviewed-on: https://go-review.googlesource.com/c/go/+/204159 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
24e549a717 |
cmd/compile, cmd/internal/obj/ppc64: use LR for indirect calls
On PPC64, indirect calls can be made through LR or CTR. Currently both are used. This CL changes it to always use LR. For async preemption, to return from the injected call, we need an indirect jump back to the PC we preeempted. This jump can be made through LR or CTR. So we'll have to clobber either LR or CTR. Currently, LR is used more frequently. In particular, for a leaf function, LR is live throughout the function. We don't want to make leaf functions nonpreemptible. So we choose CTR for the call injection. For code sequences that use CTR, if it is ok to use another register, change it to. Plus, it is a call so it will clobber LR anyway. It doesn't need to also clobber CTR (even without preemption). Change-Id: I07bd0e93b94a1a3aa2be2cd465801136165d8ab8 Reviewed-on: https://go-review.googlesource.com/c/go/+/203822 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
97592b3c14 |
cmd/compile: intrinsics for runtime/internal/atomic.Store8
For #10958, #24543, but makes sense on its own. Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a Reviewed-on: https://go-review.googlesource.com/c/go/+/203284 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
816ff44479 |
cmd/compile: use vsx loads and stores for LoweredMove, LoweredZero on ppc64x
This improves the code generated for LoweredMove and LoweredZero by using LXVD2X and STXVD2X to move 16 bytes at a time. These instructions are now used if the size to be moved or zeroed is >= 64. These same instructions have already been used in the asm implementations for memmove and memclr. Some examples where this shows an improvement on power8: MakeSlice/Byte 27.3ns ± 1% 25.2ns ± 0% -7.69% MakeSlice/Int16 40.2ns ± 0% 35.2ns ± 0% -12.39% MakeSlice/Int 94.9ns ± 1% 77.9ns ± 0% -17.92% MakeSlice/Ptr 129ns ± 1% 103ns ± 0% -20.16% MakeSlice/Struct/24 176ns ± 1% 131ns ± 0% -25.67% MakeSlice/Struct/32 200ns ± 1% 142ns ± 0% -29.09% MakeSlice/Struct/40 220ns ± 2% 156ns ± 0% -28.82% GrowSlice/Byte 81.4ns ± 0% 73.4ns ± 0% -9.88% GrowSlice/Int16 118ns ± 1% 98ns ± 0% -17.03% GrowSlice/Int 178ns ± 1% 134ns ± 1% -24.65% GrowSlice/Ptr 249ns ± 4% 212ns ± 0% -14.94% GrowSlice/Struct/24 294ns ± 5% 215ns ± 0% -27.08% GrowSlice/Struct/32 315ns ± 1% 248ns ± 0% -21.49% GrowSlice/Struct/40 382ns ± 4% 289ns ± 1% -24.38% ExtendSlice/IntSlice 109ns ± 1% 90ns ± 1% -17.51% ExtendSlice/PointerSlice 142ns ± 2% 118ns ± 0% -16.75% ExtendSlice/NoGrow 6.02ns ± 0% 5.88ns ± 0% -2.33% Append 27.2ns ± 0% 27.6ns ± 0% +1.38% AppendGrowByte 4.20ms ± 3% 2.60ms ± 0% -38.18% AppendGrowString 134ms ± 3% 102ms ± 2% -23.62% AppendSlice/1Bytes 5.65ns ± 0% 5.67ns ± 0% +0.35% AppendSlice/4Bytes 6.40ns ± 0% 6.55ns ± 0% +2.34% AppendSlice/7Bytes 8.74ns ± 0% 8.84ns ± 0% +1.14% AppendSlice/8Bytes 5.68ns ± 0% 5.70ns ± 0% +0.40% AppendSlice/15Bytes 9.31ns ± 0% 9.39ns ± 0% +0.86% AppendSlice/16Bytes 14.0ns ± 0% 5.8ns ± 0% -58.32% AppendSlice/32Bytes 5.72ns ± 0% 5.68ns ± 0% -0.66% AppendSliceLarge/1024Bytes 918ns ± 8% 615ns ± 1% -33.00% AppendSliceLarge/4096Bytes 3.25µs ± 1% 1.92µs ± 1% -40.84% AppendSliceLarge/16384Bytes 8.70µs ± 2% 4.69µs ± 0% -46.08% AppendSliceLarge/65536Bytes 18.1µs ± 3% 7.9µs ± 0% -56.30% AppendSliceLarge/262144Bytes 69.8µs ± 2% 25.9µs ± 0% -62.91% AppendSliceLarge/1048576Bytes 258µs ± 1% 93µs ± 0% -63.96% Change-Id: I21625dbe231a2029ddb9f7d73f5a6417b35c1e49 Reviewed-on: https://go-review.googlesource.com/c/go/+/199639 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
9c2e7e8bed |
cmd/compile: allow multiple SSA block control values
Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
55924135ee |
cmd/compile: fix register masks of ANDCC et al. on PPC64
PPC64's ANDCC, ORCC, XORCC SSA ops produce a flags value, which should not have register mask of an integer register. Fixes #34468. Change-Id: Ic762e423b20275fd9f8118dae7951c258d59738c Reviewed-on: https://go-review.googlesource.com/c/go/+/196960 Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
7987238d9c |
cmd/asm,cmd/compile: clean up isel codegen on ppc64x
This cleans up the isel code generation in ssa for ppc64x. Current there is no isel op and the isel code is only generated from pseudo ops in ppc64/ssa.go, and only using operands with values 0 or 1. When the isel is generated, there is always a load of 1 into the temp register before it. This change implements the isel op so it can be used in PPC64.rules, and can recognize operand values other than 0 or 1. This also eliminates the forced load of 1, so it will be loaded only if needed. This will make the isel code generation consistent with other ops, and allow future rule changes that can take advantage of having a more general purpose isel rule. Change-Id: I363e1dbd3f7f5dfecb53187ad51cce409a8d1f8d Reviewed-on: https://go-review.googlesource.com/c/go/+/195057 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> |
|
|
|
ed7f323c8f |
all: simplify code using "gofmt -s -w"
Most changes are removing redundant declaration of type when direct
instantiating value of map or slice, e.g. []T{T{}} become []T{{}}.
Small changes are removing the high order of subslice if its value
is the length of slice itself, e.g. T[:len(T)] become T[:].
The following file is excluded due to incompatibility with go1.4,
- src/cmd/compile/internal/gc/ssa.go
Change-Id: Id3abb09401795ce1e6da591a89749cba8502fb26
Reviewed-on: https://go-review.googlesource.com/c/go/+/166437
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
|
|
4a4e05b0b1 |
cmd/compile,runtime/internal/atomic: add Load8
Change-Id: Id52a5730cf9207ee7ccebac4ef12791dc5720e7c Reviewed-on: https://go-review.googlesource.com/c/go/+/172283 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
50ad09418e |
cmd/compile: intrinsify math/bits.Add64 for ppc64x
This change creates an intrinsic for Add64 for ppc64x and adds a testcase for it. name old time/op new time/op delta Add64-160 1.90ns ±40% 2.29ns ± 0% ~ (p=0.119 n=5+5) Add64multiple-160 6.69ns ± 2% 2.45ns ± 4% -63.47% (p=0.016 n=4+5) Change-Id: I9abe6fb023fdf62eea3c9b46a1820f60bb0a7f97 Reviewed-on: https://go-review.googlesource.com/c/go/+/173758 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> |
|
|
|
ad6c691542 |
cmd/compile: remove AUNDEF opcode
This opcode was only used to mark unreachable code for plive to use. plive now uses the SSA representation, so it knows locations are unreachable because they are ends of Exit blocks. It doesn't need these opcodes any more. These opcodes actually used space in the binary, 2 bytes per undef on x86 and more for other archs. Makes the amd64 go binary 0.2% smaller. Change-Id: I64c84c35db7c7949617a3a5830f09c8e5fcd2620 Reviewed-on: https://go-review.googlesource.com/c/go/+/171058 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
3023d7da49 |
cmd/compile/internal, cmd/internal/obj/ppc64: generate new count trailing zeros instructions on POWER9
This change adds new POWER9 instructions for counting trailing zeros (CNTTZW/CNTTZD) to the assembler and generates them in SSA when GOPPC64=power9. name old time/op new time/op delta TrailingZeros-160 1.59ns ±20% 1.45ns ±10% -8.81% (p=0.000 n=14+13) TrailingZeros8-160 1.55ns ±23% 1.62ns ±44% ~ (p=0.593 n=13+15) TrailingZeros16-160 1.78ns ±23% 1.62ns ±38% -9.31% (p=0.003 n=14+14) TrailingZeros32-160 1.64ns ±10% 1.49ns ± 9% -9.15% (p=0.000 n=13+14) TrailingZeros64-160 1.53ns ± 6% 1.45ns ± 5% -5.38% (p=0.000 n=15+13) Change-Id: I365e6ff79f3ce4d8ebe089a6a86b1771853eb596 Reviewed-on: https://go-review.googlesource.com/c/go/+/167517 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
2c423f063b |
cmd/compile,runtime: provide index information on bounds check failure
A few examples (for accessing a slice of length 3): s[-1] runtime error: index out of range [-1] s[3] runtime error: index out of range [3] with length 3 s[-1:0] runtime error: slice bounds out of range [-1:] s[3:0] runtime error: slice bounds out of range [3:0] s[3:-1] runtime error: slice bounds out of range [:-1] s[3:4] runtime error: slice bounds out of range [:4] with capacity 3 s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3 Note that in cases where there are multiple things wrong with the indexes (e.g. s[3:-1]), we report one of those errors kind of arbitrarily, currently the rightmost one. An exhaustive set of examples is in issue30116[u].out in the CL. The message text has the same prefix as the old message text. That leads to slightly awkward phrasing but hopefully minimizes the chance that code depending on the error text will break. Increases the size of the go binary by 0.5% (amd64). The panic functions take arguments in registers in order to keep the size of the compiled code as small as possible. Fixes #30116 Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd Reviewed-on: https://go-review.googlesource.com/c/go/+/161477 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
9fe9853ae5 |
cmd/compile: fix nilcheck for AIX
This commit adapts compile tool to create correct nilchecks for AIX. AIX allows to load a nil pointer. Therefore, the default nilcheck which issues a load must be replaced by a CMP instruction followed by a store at 0x0 if the value is nil. The store will trigger a SIGSEGV as on others OS. The nilcheck algorithm must be adapted to do not remove nilcheck if it's only a read. Stores are detected with v.Type.IsMemory(). Tests related to nilptr must be adapted to the previous changements. nilptr.go cannot be used as it's because the AIX address space starts at 1<<32. Change-Id: I9f5aaf0b7e185d736a9b119c0ed2fe4e5bd1e7af Reviewed-on: https://go-review.googlesource.com/c/144538 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
4ae49b5921 |
cmd/compile: use ANDCC, ORCC, XORCC to avoid CMP on ppc64x
This change makes use of the cc versions of the AND, OR, XOR instructions, omitting the need for a CMP instruction. In many test programs and in the go binary, this reduces the size of 20-30 functions by at least 1 instruction, many in runtime. Testcase added to test/codegen/comparisons.go Change-Id: I6cc1ca8b80b065d7390749c625bc9784b0039adb Reviewed-on: https://go-review.googlesource.com/c/143059 Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Michael Munday <mike.munday@ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
5c472132bf |
cmd/compile, runtime: add new lightweight atomics for ppc64x
This change creates the infrastructure for new lightweight atomics primitives in runtime/internal/atomic: - LoadAcq, for load-acquire - StoreRel, for store-release - CasRel, for Compare-and-Swap-release and implements them for ppc64x. There is visible performance improvement in producer-consumer scenarios, like BenchmarkChanProdCons*: benchmark old ns/op new ns/op delta BenchmarkChanProdCons0-48 2034 2034 +0.00% BenchmarkChanProdCons10-48 1798 1608 -10.57% BenchmarkChanProdCons100-48 1596 1585 -0.69% BenchmarkChanProdConsWork0-48 2084 2046 -1.82% BenchmarkChanProdConsWork10-48 1829 1668 -8.80% BenchmarkChanProdConsWork100-48 1650 1650 +0.00% Fixes #21348 Change-Id: I1f6ce377e4a0fe4bd7f5f775e8036f50070ad8db Reviewed-on: https://go-review.googlesource.com/c/142277 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
04dc1b2443 |
all: fix a bunch of misspellings
Change-Id: I94cebca86706e072fbe3be782d3edbe0e22b9432
GitHub-Last-Rev:
|
|
|
|
85c9c85955 |
cmd/compile: add rules to use index regs for ppc64x
The following adds support for load and store instructions with index registers, and adds rules to take advantage of those instructions. Examples of improvements: crypto/rc4: name old time/op new time/op delta RC4_128 445ns ± 0% 404ns ± 0% -9.21% (p=0.029 n=4+4) RC4_1K 3.46µs ± 0% 3.13µs ± 0% -9.29% (p=0.029 n=4+4) RC4_8K 27.0µs ± 0% 24.7µs ± 0% -8.83% (p=0.029 n=4+4) crypto/des: name old time/op new time/op delta Encrypt 276ns ± 0% 264ns ± 0% -4.35% (p=0.029 n=4+4) Decrypt 278ns ± 0% 263ns ± 0% -5.40% (p=0.029 n=4+4) TDESEncrypt 683ns ± 0% 645ns ± 0% -5.56% (p=0.029 n=4+4) TDESDecrypt 684ns ± 0% 641ns ± 0% -6.29% (p=0.029 n=4+4) crypto/sha1: name old time/op new time/op delta Hash8Bytes 661ns ± 0% 635ns ± 0% -3.93% (p=1.000 n=1+1) Hash320Bytes 2.70µs ± 0% 2.56µs ± 0% -5.26% (p=1.000 n=1+1) Hash1K 7.14µs ± 0% 6.78µs ± 0% -5.03% (p=1.000 n=1+1) Hash8K 52.1µs ± 0% 49.4µs ± 0% -5.14% (p=1.000 n=1+1) Change-Id: I03810e90fcc20029975a323f06bfa086c973c2b0 Reviewed-on: https://go-review.googlesource.com/c/135975 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Munday <mike.munday@ibm.com> |
|
|
|
9aed4cc395 |
cmd/compile: instrinsify math/bits.Mul on ppc64x
Add SSA rules to intrinsify Mul/Mul64 on ppc64x. benchmark old ns/op new ns/op delta BenchmarkMul-40 8.80 0.93 -89.43% BenchmarkMul32-40 1.39 1.39 +0.00% BenchmarkMul64-40 5.39 0.93 -82.75% Updates #24813 Change-Id: I6e95bfbe976a2278bd17799df184a7fbc0e57829 Reviewed-on: https://go-review.googlesource.com/138917 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
28edaf4584 |
cmd/compile,test: combine byte loads and stores on ppc64le
CL 74410 added rules to combine consecutive byte loads and stores when the byte order was little endian for ppc64le. This is the corresponding change for bytes that are in big endian order. These rules are all intended for a little endian target arch. This adds new testcases in test/codegen/memcombine.go Fixes #22496 Updates #24242 Benchmark improvement for encoding/binary: name old time/op new time/op delta ReadSlice1000Int32s-16 11.0µs ± 0% 9.0µs ± 0% -17.47% (p=0.029 n=4+4) ReadStruct-16 2.47µs ± 1% 2.48µs ± 0% +0.67% (p=0.114 n=4+4) ReadInts-16 642ns ± 1% 630ns ± 1% -2.02% (p=0.029 n=4+4) WriteInts-16 654ns ± 0% 653ns ± 1% -0.08% (p=0.629 n=4+4) WriteSlice1000Int32s-16 8.75µs ± 0% 8.20µs ± 0% -6.19% (p=0.029 n=4+4) PutUint16-16 1.16ns ± 0% 0.93ns ± 0% -19.83% (p=0.029 n=4+4) PutUint32-16 1.16ns ± 0% 0.93ns ± 0% -19.83% (p=0.029 n=4+4) PutUint64-16 1.85ns ± 0% 0.93ns ± 0% -49.73% (p=0.029 n=4+4) LittleEndianPutUint16-16 1.03ns ± 0% 0.93ns ± 0% -9.71% (p=0.029 n=4+4) LittleEndianPutUint32-16 0.93ns ± 0% 0.93ns ± 0% ~ (all equal) LittleEndianPutUint64-16 0.93ns ± 0% 0.93ns ± 0% ~ (all equal) PutUvarint32-16 43.0ns ± 0% 43.1ns ± 0% +0.12% (p=0.429 n=4+4) PutUvarint64-16 174ns ± 0% 175ns ± 0% +0.29% (p=0.429 n=4+4) Updates made to functions in gcm.go to enable their matching. An existing testcase prevents these functions from being replaced by those in encoding/binary due to import dependencies. Change-Id: Idb3bd1e6e7b12d86cd828fb29cb095848a3e485a Reviewed-on: https://go-review.googlesource.com/98136 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
20102594a0 |
cmd/compile: intrinsify runtime.getcallerpc on all link register architectures
Add a compiler intrinsic for getcallerpc on following architectures: arm mips mipsle mips64 mips64le ppc64 ppc64le s390x Change-Id: I758f3d4742fc214b206bcd07d90408622c17dbef Reviewed-on: https://go-review.googlesource.com/110835 Run-TryBot: Wei Xiao <Wei.Xiao@arm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
ebb67d993a |
cmd/compile, cmd/internal/obj/ppc64: make math.Round an intrinsic on ppc64x
This change implements math.Round as an intrinsic on ppc64x so it can be done using a single instruction. benchmark old ns/op new ns/op delta BenchmarkRound-16 2.60 0.69 -73.46% Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8 Reviewed-on: https://go-review.googlesource.com/109395 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
8871c930be |
cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific OpXXXconvert. This is silly because OpConvert means the same thing on all architectures and is logically a no-op that exists only to keep track of conversions to and from unsafe.Pointer. Furthermore, lowering it makes it harder to recognize in other analyses, particularly liveness analysis. This CL eliminates the lowering of OpConvert, leaving it as the generic op until code generation time. The main complexity here is that we still need to register-allocate OpConvert operations. Currently, each arch's lowered OpConvert specifies all GP registers in its register mask. Ideally, OpConvert wouldn't affect value homing at all, and we could just copy the home of OpConvert's source, but this can potentially home an OpConvert in a LocalSlot, which neither regalloc nor stackalloc expect. Rather than try to disentangle this assumption from regalloc and stackalloc, we continue to register-allocate OpConvert, but teach regalloc that OpConvert can be allocated to any allocatable GP register. For #24543. Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6 Reviewed-on: https://go-review.googlesource.com/108496 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
30311e8860 |
cmd/compile: generate load without DS relocation for go.string on ppc64le
Due to some recent optimizations related to the compare instruction, DS-form load instructions started to be used to load 8-byte go.strings. This can cause link time errors if the go.string is not aligned to 4 bytes. For DS-form instructions, the value in the offset field must be a multiple of 4. If the offset is known at the time the rules are processed, a DS-form load will not be chosen. But for go.strings, the offset is not known at that time, but a relocation is generated indicating that the linker should fill in the DS relocation. When the linker tries to fill in the relocation, if the offset is not aligned properly, a link error will occur. To fix this, when loading a go.string using MOVDload, the full address of the go.string is generated and loaded into the base register. Then the go.string is loaded with a 0 offset field. Added a testcase that reproduces this problem. Fixes #24799 Change-Id: I6a154e8e1cba64eae290be0fbcb608b75884ecdd Reviewed-on: https://go-review.googlesource.com/107855 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
ab48574bab |
cmd/compile: use Block.Likely to put likely branch first
When a neither of a conditional block's successors follows,
the block must end with a conditional branch followed by a
an unconditional branch. If the (conditional) branch is
"unlikely", invert it and swap successors to make it
likely instead.
This doesn't matter to most benchmarks on amd64, but in one
instance on amd64 it caused a 30% improvement, and it is
otherwise harmless. The problematic loop is
for i := 0; i < w; i++ {
if pw[i] != 0 {
return true
}
}
compiled under GOEXPERIMENT=preemptibleloops
This the very worst-case benchmark for that experiment.
Also in this CL is a commoning up of heavily-repeated
boilerplate, which made it much easier to see that the
changes were applied correctly. In the future this should
allow un-exporting of SSAGenState.Branches once the
boilerplate-replacement is done everywhere.
Change-Id: I0e5ded6eeb3ab1e3e0138e12d54c7e056bd99335
Reviewed-on: https://go-review.googlesource.com/104977
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
6633bb2aa7 |
cmd/compile/internal/ppc64, runtime internal/atomic, sync/atomic: implement faster atomics for ppc64x
This change implements faster atomics for ppc64x based on the ISA 2.07B, Appendix B.2 recommendations, replacing SYNC/ISYNC by LWSYNC in some cases. Updates #21348 name old time/op new time/op delta Cond1-16 955ns 856ns -10.33% Cond2-16 2.38µs 2.03µs -14.59% Cond4-16 5.90µs 5.44µs -7.88% Cond8-16 12.1µs 11.1µs -8.42% Cond16-16 27.0µs 25.1µs -7.04% Cond32-16 59.1µs 55.5µs -6.14% LoadMostlyHits/*sync_test.DeepCopyMap-16 22.1ns 24.1ns +9.02% LoadMostlyHits/*sync_test.RWMutexMap-16 252ns 249ns -1.20% LoadMostlyHits/*sync.Map-16 16.2ns 16.3ns ~ LoadMostlyMisses/*sync_test.DeepCopyMap-16 22.3ns 22.6ns ~ LoadMostlyMisses/*sync_test.RWMutexMap-16 249ns 247ns -0.51% LoadMostlyMisses/*sync.Map-16 12.7ns 12.7ns ~ LoadOrStoreBalanced/*sync_test.RWMutexMap-16 1.27µs 1.17µs -7.54% LoadOrStoreBalanced/*sync.Map-16 1.12µs 1.10µs -2.35% LoadOrStoreUnique/*sync_test.RWMutexMap-16 1.75µs 1.68µs -3.84% LoadOrStoreUnique/*sync.Map-16 2.07µs 1.97µs -5.13% LoadOrStoreCollision/*sync_test.DeepCopyMap-16 15.8ns 15.9ns ~ LoadOrStoreCollision/*sync_test.RWMutexMap-16 496ns 424ns -14.48% LoadOrStoreCollision/*sync.Map-16 6.07ns 6.07ns ~ Range/*sync_test.DeepCopyMap-16 1.65µs 1.64µs ~ Range/*sync_test.RWMutexMap-16 278µs 288µs +3.75% Range/*sync.Map-16 2.00µs 2.01µs ~ AdversarialAlloc/*sync_test.DeepCopyMap-16 3.45µs 3.44µs ~ AdversarialAlloc/*sync_test.RWMutexMap-16 226ns 227ns ~ AdversarialAlloc/*sync.Map-16 1.09µs 1.07µs -2.36% AdversarialDelete/*sync_test.DeepCopyMap-16 553ns 550ns -0.57% AdversarialDelete/*sync_test.RWMutexMap-16 273ns 274ns ~ AdversarialDelete/*sync.Map-16 247ns 249ns ~ UncontendedSemaphore-16 79.0ns 65.5ns -17.11% ContendedSemaphore-16 112ns 97ns -13.77% MutexUncontended-16 3.34ns 2.51ns -24.69% Mutex-16 266ns 191ns -28.26% MutexSlack-16 226ns 159ns -29.55% MutexWork-16 377ns 338ns -10.14% MutexWorkSlack-16 335ns 308ns -8.20% MutexNoSpin-16 196ns 184ns -5.91% MutexSpin-16 710ns 666ns -6.21% Once-16 1.29ns 1.29ns ~ Pool-16 8.64ns 8.71ns ~ PoolOverflow-16 1.60µs 1.44µs -10.25% SemaUncontended-16 5.39ns 4.42ns -17.96% SemaSyntNonblock-16 539ns 483ns -10.42% SemaSyntBlock-16 413ns 354ns -14.20% SemaWorkNonblock-16 305ns 258ns -15.36% SemaWorkBlock-16 266ns 229ns -14.06% RWMutexUncontended-16 12.9ns 9.7ns -24.80% RWMutexWrite100-16 203ns 147ns -27.47% RWMutexWrite10-16 177ns 119ns -32.74% RWMutexWorkWrite100-16 435ns 403ns -7.39% RWMutexWorkWrite10-16 642ns 611ns -4.79% WaitGroupUncontended-16 4.67ns 3.70ns -20.92% WaitGroupAddDone-16 402ns 355ns -11.54% WaitGroupAddDoneWork-16 208ns 250ns +20.09% WaitGroupWait-16 1.21ns 1.21ns ~ WaitGroupWaitWork-16 5.91ns 5.87ns -0.81% WaitGroupActuallyWait-16 92.2ns 85.8ns -6.91% Updates #21348 Change-Id: Ibb9b271d11b308264103829e176c6d9fe8f867d3 Reviewed-on: https://go-review.googlesource.com/95175 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> |
|
|
|
ae7d5f84f8 |
runtime: buffered write barrier for ppc64
Updates #22460. Change-Id: I6040c4024111c80361c81eb7eec5071ec9efb4f9 Reviewed-on: https://go-review.googlesource.com/92702 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
4d0151ede5 |
cmd/compile,cmd/internal/obj/ppc64: make math.Abs,math.Copysign instrinsics on ppc64x
This adds support for math Abs, Copysign to be instrinsics on ppc64x. New instruction FCPSGN is added to generate fcpsgn. Some new rules are added to improve the int<->float conversions that are generated mainly due to the Float64bits and Float64frombits in the math package. PPC64.rules is also modified as suggested in the review for CL 63290. Improvements: benchmark old ns/op new ns/op delta BenchmarkAbs-16 1.12 0.69 -38.39% BenchmarkCopysign-16 1.30 0.93 -28.46% BenchmarkNextafter32-16 9.34 8.05 -13.81% BenchmarkFrexp-16 8.81 7.60 -13.73% Others that used Copysign also saw smaller improvements. I attempted to make this work using rules since that seems to be preferred, but due to the use of Float64bits and Float64frombits in these functions, several rules had to be added and even then not all cases were matched. Using rules became too complicated and seemed too fragile for these. Updates #21390 Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995 Reviewed-on: https://go-review.googlesource.com/67130 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Keith Randall <khr@golang.org> |