Commit Graph

105 Commits

Author SHA1 Message Date
Josh Bleecher Snyder a61524d103 cmd/internal/obj: add Prog.SetFrom3{Reg,Const}
These are the the most common uses, and they reduce line noise.

I don't love adding new deprecated APIs,
but since they're trivial wrappers,
it'll be very easy to update them along with the rest.
No functional changes; passes toolstash-check.

Change-Id: I691a8175cfef9081180e463c63f326376af3f3a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/296009
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-02-25 23:19:16 +00:00
Russ Cox 6c34d2f420 [dev.regabi] cmd/compile: split out package ssagen [generated]
[git-generate]

cd src/cmd/compile/internal/gc
rf '
	# maxOpenDefers is declared in ssa.go but used only by walk.
	mv maxOpenDefers walk.go

	# gc.Arch -> ssagen.Arch
	# It is not as nice but will do for now.
	mv Arch ArchInfo
	mv thearch Arch
	mv Arch ArchInfo arch.go

	# Pull dwarf out of pgen.go.
	mv debuginfo declPos createDwarfVars preInliningDcls \
		createSimpleVars createSimpleVar \
		createComplexVars createComplexVar \
		dwarf.go

	# Pull high-level compilation out of pgen.go,
	# leaving only the SSA code.
	mv compilequeue funccompile compile compilenow \
		compileFunctions isInlinableButNotInlined \
		initLSym \
		compile.go

	mv BoundsCheckFunc GCWriteBarrierReg ssa.go
	mv largeStack largeStackFrames CheckLargeStacks pgen.go

	# All that is left in dcl.go is the nowritebarrierrecCheck
	mv dcl.go nowb.go

	# Export API and unexport non-API.
	mv initssaconfig InitConfig
	mv isIntrinsicCall IsIntrinsicCall
	mv ssaDumpInline DumpInline
	mv initSSATables InitTables
	mv initSSAEnv InitEnv
	mv compileSSA Compile
	mv stackOffset StackOffset
	mv canSSAType TypeOK
	mv SSAGenState State
	mv FwdRefAux fwdRefAux

	mv cgoSymABIs CgoSymABIs
	mv readSymABIs ReadSymABIs
	mv initLSym InitLSym
	mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go

	mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen
'
rm go.go gsubr.go

Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/279476
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 06:39:29 +00:00
Russ Cox 0ced54062e [dev.regabi] cmd/compile: split out package objw [generated]
Object file writing routines are used not just at the end
of the compilation but also during static data layout in walk.
Split them into their own package.

[git-generate]

cd src/cmd/compile/internal/gc
rf '
	# Move bit vector to new package bitvec
	mv bvec.n bvec.N
	mv bvec.b bvec.B
	mv bvec BitVec
	mv bvalloc New
	mv bvbulkalloc NewBulk
	mv bulkBvec.next bulkBvec.Next
	mv bulkBvec Bulk
	mv H0 h0
	mv Hp hp

	# Leave bvecSet and bitmap hashes behind - not needed as broadly.
	mv bvecSet.extractUniqe bvecSet.extractUnique
	mv h0 bvecSet bvecSet.grow bvecSet.add \
		bvecSet.extractUnique hashbitmap bvset.go

	mv bv.go cmd/compile/internal/bitvec

	ex . ../arm ../arm64 ../mips ../mips64 ../ppc64 ../s390x ../riscv64 {
		import "cmd/internal/obj"
		var a *obj.Addr
		var i int64
		Addrconst(a, i) -> a.SetConst(i)
		var p, to *obj.Prog
		Patch(p, to) -> p.To.SetTarget(to)
	}
	rm Addrconst Patch

	# Move object-writing API to new package objw
	mv duint8 Objw_Uint8
	mv duint16 Objw_Uint16
	mv duint32 Objw_Uint32
	mv duintptr Objw_Uintptr
	mv duintxx Objw_UintN
	mv dsymptr Objw_SymPtr
	mv dsymptrOff Objw_SymPtrOff
	mv dsymptrWeakOff Objw_SymPtrWeakOff
	mv ggloblsym Objw_Global
	mv dbvec Objw_BitVec
	mv newProgs NewProgs
	mv Progs.clearp Progs.Clear
	mv Progs.settext Progs.SetText
	mv Progs.next Progs.Next
	mv Progs.pc Progs.PC
	mv Progs.pos Progs.Pos
	mv Progs.curfn Progs.CurFunc
	mv Progs.progcache Progs.Cache
	mv Progs.cacheidx Progs.CacheIndex
	mv Progs.nextLive Progs.NextLive
	mv Progs.prevLive Progs.PrevLive
	mv Progs.Appendpp Progs.Append
	mv LivenessIndex.stackMapIndex LivenessIndex.StackMapIndex
	mv LivenessIndex.isUnsafePoint LivenessIndex.IsUnsafePoint

	mv Objw_Uint8 Objw_Uint16 Objw_Uint32 Objw_Uintptr Objw_UintN \
		Objw_SymPtr Objw_SymPtrOff Objw_SymPtrWeakOff Objw_Global \
		Objw_BitVec \
		objw.go
	mv sharedProgArray NewProgs Progs \
		LivenessIndex StackMapDontCare \
		LivenessDontCare LivenessIndex.StackMapValid \
		Progs.NewProg Progs.Flush Progs.Free Progs.Prog Progs.Clear Progs.Append Progs.SetText \
		prog.go
	mv prog.go objw.go cmd/compile/internal/objw

	# Move ggloblnod to obj with the rest of the non-objw higher-level writing.
	mv ggloblnod obj.go
'

cd ../objw
rf '
	mv Objw_Uint8 Uint8
	mv Objw_Uint16 Uint16
	mv Objw_Uint32 Uint32
	mv Objw_Uintptr Uintptr
	mv Objw_UintN UintN
	mv Objw_SymPtr SymPtr
	mv Objw_SymPtrOff SymPtrOff
	mv Objw_SymPtrWeakOff SymPtrWeakOff
	mv Objw_Global Global
	mv Objw_BitVec BitVec
'

Change-Id: I2b87085aa788564fb322e9c55bddd73347b4d5fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/279310
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 06:38:47 +00:00
Russ Cox 41f3af9d04 [dev.regabi] cmd/compile: replace *Node type with an interface Node [generated]
The plan is to introduce a Node interface that replaces the old *Node pointer-to-struct.

The previous CL defined an interface INode modeling a *Node.

This CL:
 - Changes all references outside internal/ir to use INode,
   along with many references inside internal/ir as well.
 - Renames Node to node.
 - Renames INode to Node

So now ir.Node is an interface implemented by *ir.node, which is otherwise inaccessible,
and the code outside package ir is now (clearly) using only the interface.

The usual rule is never to redefine an existing name with a new meaning,
so that old code that hasn't been updated gets a "unknown name" error
instead of more mysterious errors or silent misbehavior. That rule would
caution against replacing Node-the-struct with Node-the-interface,
as in this CL, because code that says *Node would now be using a pointer
to an interface. But this CL is being landed at the same time as another that
moves Node from gc to ir. So the net effect is to replace *gc.Node with ir.Node,
which does follow the rule: any lingering references to gc.Node will be told
it's gone, not silently start using pointers to interfaces. So the rule is followed
by the CL sequence, just not this specific CL.

Overall, the loss of inlining caused by using interfaces cuts the compiler speed
by about 6%, a not insignificant amount. However, as we convert the representation
to concrete structs that are not the giant Node over the next weeks, that speed
should come back as more of the compiler starts operating directly on concrete types
and the memory taken up by the graph of Nodes drops due to the more precise
structs. Honestly, I was expecting worse.

% benchstat bench.old bench.new
name                      old time/op       new time/op       delta
Template                        168ms ± 4%        182ms ± 2%   +8.34%  (p=0.000 n=9+9)
Unicode                        72.2ms ±10%       82.5ms ± 6%  +14.38%  (p=0.000 n=9+9)
GoTypes                         563ms ± 8%        598ms ± 2%   +6.14%  (p=0.006 n=9+9)
Compiler                        2.89s ± 4%        3.04s ± 2%   +5.37%  (p=0.000 n=10+9)
SSA                             6.45s ± 4%        7.25s ± 5%  +12.41%  (p=0.000 n=9+10)
Flate                           105ms ± 2%        115ms ± 1%   +9.66%  (p=0.000 n=10+8)
GoParser                        144ms ±10%        152ms ± 2%   +5.79%  (p=0.011 n=9+8)
Reflect                         345ms ± 9%        370ms ± 4%   +7.28%  (p=0.001 n=10+9)
Tar                             149ms ± 9%        161ms ± 5%   +8.05%  (p=0.001 n=10+9)
XML                             190ms ± 3%        209ms ± 2%   +9.54%  (p=0.000 n=9+8)
LinkCompiler                    327ms ± 2%        325ms ± 2%     ~     (p=0.382 n=8+8)
ExternalLinkCompiler            1.77s ± 4%        1.73s ± 6%     ~     (p=0.113 n=9+10)
LinkWithoutDebugCompiler        214ms ± 4%        211ms ± 2%     ~     (p=0.360 n=10+8)
StdCmd                          14.8s ± 3%        15.9s ± 1%   +6.98%  (p=0.000 n=10+9)
[Geo mean]                      480ms             510ms        +6.31%

name                      old user-time/op  new user-time/op  delta
Template                        223ms ± 3%        237ms ± 3%   +6.16%  (p=0.000 n=9+10)
Unicode                         103ms ± 6%        113ms ± 3%   +9.53%  (p=0.000 n=9+9)
GoTypes                         758ms ± 8%        800ms ± 2%   +5.55%  (p=0.003 n=10+9)
Compiler                        3.95s ± 2%        4.12s ± 2%   +4.34%  (p=0.000 n=10+9)
SSA                             9.43s ± 1%        9.74s ± 4%   +3.25%  (p=0.000 n=8+10)
Flate                           132ms ± 2%        141ms ± 2%   +6.89%  (p=0.000 n=9+9)
GoParser                        177ms ± 9%        183ms ± 4%     ~     (p=0.050 n=9+9)
Reflect                         467ms ±10%        495ms ± 7%   +6.17%  (p=0.029 n=10+10)
Tar                             183ms ± 9%        197ms ± 5%   +7.92%  (p=0.001 n=10+10)
XML                             249ms ± 5%        268ms ± 4%   +7.82%  (p=0.000 n=10+9)
LinkCompiler                    544ms ± 5%        544ms ± 6%     ~     (p=0.863 n=9+9)
ExternalLinkCompiler            1.79s ± 4%        1.75s ± 6%     ~     (p=0.075 n=10+10)
LinkWithoutDebugCompiler        248ms ± 6%        246ms ± 2%     ~     (p=0.965 n=10+8)
[Geo mean]                      483ms             504ms        +4.41%

[git-generate]
cd src/cmd/compile/internal/ir
: # We need to do the conversion in multiple steps, so we introduce
: # a temporary type alias that will start out meaning the pointer-to-struct
: # and then change to mean the interface.
rf '
	mv Node OldNode

	add node.go \
		type Node = *OldNode
'

: # It should work to do this ex in ir, but it misses test files, due to a bug in rf.
: # Run the command in gc to handle gc's tests, and then again in ssa for ssa's tests.
cd ../gc
rf '
        ex .  ../arm ../riscv64 ../arm64 ../mips64 ../ppc64 ../mips ../wasm {
                import "cmd/compile/internal/ir"
                *ir.OldNode -> ir.Node
        }
'
cd ../ssa
rf '
        ex {
                import "cmd/compile/internal/ir"
                *ir.OldNode -> ir.Node
        }
'

: # Back in ir, finish conversion clumsily with sed,
: # because type checking and circular aliases do not mix.
cd ../ir
sed -i '' '
	/type Node = \*OldNode/d
	s/\*OldNode/Node/g
	s/^func (n Node)/func (n *OldNode)/
	s/OldNode/node/g
	s/type INode interface/type Node interface/
	s/var _ INode = (Node)(nil)/var _ Node = (*node)(nil)/
' *.go
gofmt -w *.go

sed -i '' '
	s/{Func{}, 136, 248}/{Func{}, 152, 280}/
	s/{Name{}, 32, 56}/{Name{}, 44, 80}/
	s/{Param{}, 24, 48}/{Param{}, 44, 88}/
	s/{node{}, 76, 128}/{node{}, 88, 152}/
' sizeof_test.go

cd ../ssa
sed -i '' '
	s/{LocalSlot{}, 28, 40}/{LocalSlot{}, 32, 48}/
' sizeof_test.go

cd ../gc
sed -i '' 's/\*ir.Node/ir.Node/' mkbuiltin.go

cd ../../../..
go install std cmd
cd cmd/compile
go test -u || go test -u

Change-Id: I196bbe3b648e4701662e4a2bada40bf155e2a553
Reviewed-on: https://go-review.googlesource.com/c/go/+/272935
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25 17:30:43 +00:00
Russ Cox 84e2bd611f [dev.regabi] cmd/compile: introduce cmd/compile/internal/ir [generated]
If we want to break up package gc at all, we will need to move
the compiler IR it defines into a separate package that can be
imported by packages that gc itself imports. This CL does that.
It also removes the TINT8 etc aliases so that all code is clear
about which package things are coming from.

This CL is automatically generated by the script below.
See the comments in the script for details about the changes.

[git-generate]
cd src/cmd/compile/internal/gc

rf '
        # These names were never fully qualified
        # when the types package was added.
        # Do it now, to avoid confusion about where they live.
        inline -rm \
                Txxx \
                TINT8 \
                TUINT8 \
                TINT16 \
                TUINT16 \
                TINT32 \
                TUINT32 \
                TINT64 \
                TUINT64 \
                TINT \
                TUINT \
                TUINTPTR \
                TCOMPLEX64 \
                TCOMPLEX128 \
                TFLOAT32 \
                TFLOAT64 \
                TBOOL \
                TPTR \
                TFUNC \
                TSLICE \
                TARRAY \
                TSTRUCT \
                TCHAN \
                TMAP \
                TINTER \
                TFORW \
                TANY \
                TSTRING \
                TUNSAFEPTR \
                TIDEAL \
                TNIL \
                TBLANK \
                TFUNCARGS \
                TCHANARGS \
                NTYPE \
                BADWIDTH

        # esc.go and escape.go do not need to be split.
        # Append esc.go onto the end of escape.go.
        mv esc.go escape.go

        # Pull out the type format installation from func Main,
        # so it can be carried into package ir.
        mv Main:/Sconv.=/-0,/TypeLinkSym/-1 InstallTypeFormats

        # Names that need to be exported for use by code left in gc.
        mv Isconst IsConst
        mv asNode AsNode
        mv asNodes AsNodes
        mv asTypesNode AsTypesNode
        mv basicnames BasicTypeNames
        mv builtinpkg BuiltinPkg
        mv consttype ConstType
        mv dumplist DumpList
        mv fdumplist FDumpList
        mv fmtMode FmtMode
        mv goopnames OpNames
        mv inspect Inspect
        mv inspectList InspectList
        mv localpkg LocalPkg
        mv nblank BlankNode
        mv numImport NumImport
        mv opprec OpPrec
        mv origSym OrigSym
        mv stmtwithinit StmtWithInit
        mv dump DumpAny
        mv fdump FDumpAny
        mv nod Nod
        mv nodl NodAt
        mv newname NewName
        mv newnamel NewNameAt
        mv assertRepresents AssertValidTypeForConst
        mv represents ValidTypeForConst
        mv nodlit NewLiteral

        # Types and fields that need to be exported for use by gc.
        mv nowritebarrierrecCallSym SymAndPos
        mv SymAndPos.lineno SymAndPos.Pos
        mv SymAndPos.target SymAndPos.Sym

        mv Func.lsym Func.LSym
        mv Func.setWBPos Func.SetWBPos
        mv Func.numReturns Func.NumReturns
        mv Func.numDefers Func.NumDefers
        mv Func.nwbrCalls Func.NWBRCalls

        # initLSym is an algorithm left behind in gc,
        # not an operation on Func itself.
        mv Func.initLSym initLSym

        mv nodeQueue NodeQueue
        mv NodeQueue.empty NodeQueue.Empty
        mv NodeQueue.popLeft NodeQueue.PopLeft
        mv NodeQueue.pushRight NodeQueue.PushRight

        # Many methods on Node are actually algorithms that
        # would apply to any node implementation.
        # Those become plain functions.
        mv Node.funcname FuncName
        mv Node.isBlank IsBlank
        mv Node.isGoConst isGoConst
        mv Node.isNil IsNil
        mv Node.isParamHeapCopy isParamHeapCopy
        mv Node.isParamStackCopy isParamStackCopy
        mv Node.isSimpleName isSimpleName
        mv Node.mayBeShared MayBeShared
        mv Node.pkgFuncName PkgFuncName
        mv Node.backingArrayPtrLen backingArrayPtrLen
        mv Node.isterminating isTermNode
        mv Node.labeledControl labeledControl
        mv Nodes.isterminating isTermNodes
        mv Nodes.sigerr fmtSignature
        mv Node.MethodName methodExprName
        mv Node.MethodFunc methodExprFunc
        mv Node.IsMethod IsMethod

        # Every node will need to implement RawCopy;
        # Copy and SepCopy algorithms will use it.
        mv Node.rawcopy Node.RawCopy
        mv Node.copy Copy
        mv Node.sepcopy SepCopy

        # Extract Node.Format method body into func FmtNode,
        # but leave method wrapper behind.
        mv Node.Format:0,$ FmtNode

        # Formatting helpers that will apply to all node implementations.
        mv Node.Line Line
        mv Node.exprfmt exprFmt
        mv Node.jconv jconvFmt
        mv Node.modeString modeString
        mv Node.nconv nconvFmt
        mv Node.nodedump nodeDumpFmt
        mv Node.nodefmt nodeFmt
        mv Node.stmtfmt stmtFmt

	# Constant support needed for code moving to ir.
        mv okforconst OKForConst
        mv vconv FmtConst
        mv int64Val Int64Val
        mv float64Val Float64Val
        mv Node.ValueInterface ConstValue

        # Organize code into files.
        mv LocalPkg BuiltinPkg ir.go
        mv NumImport InstallTypeFormats Line fmt.go
        mv syntax.go Nod NodAt NewNameAt Class Pxxx PragmaFlag Nointerface SymAndPos \
                AsNode AsTypesNode BlankNode OrigSym \
                Node.SliceBounds Node.SetSliceBounds Op.IsSlice3 \
                IsConst Node.Int64Val Node.CanInt64 Node.Uint64Val Node.BoolVal Node.StringVal \
                Node.RawCopy SepCopy Copy \
                IsNil IsBlank IsMethod \
                Node.Typ Node.StorageClass node.go
        mv ConstType ConstValue Int64Val Float64Val AssertValidTypeForConst ValidTypeForConst NewLiteral idealType OKForConst val.go

        # Move files to new ir package.
        mv bitset.go class_string.go dump.go fmt.go \
                ir.go node.go op_string.go val.go \
                sizeof_test.go cmd/compile/internal/ir
'

: # fix mkbuiltin.go to generate the changes made to builtin.go during rf
sed -i '' '
        s/\[T/[types.T/g
        s/\*Node/*ir.Node/g
        /internal\/types/c \
                fmt.Fprintln(&b, `import (`) \
                fmt.Fprintln(&b, `      "cmd/compile/internal/ir"`) \
                fmt.Fprintln(&b, `      "cmd/compile/internal/types"`) \
                fmt.Fprintln(&b, `)`)
' mkbuiltin.go
gofmt -w mkbuiltin.go

: # update cmd/dist to add internal/ir
cd ../../../dist
sed -i '' '/compile.internal.gc/a\
        "cmd/compile/internal/ir",
' buildtool.go
gofmt -w buildtool.go

: # update cmd/compile TestFormats
cd ../..
go install std cmd
cd cmd/compile
go test -u || go test  # first one updates but fails; second passes

Change-Id: I5f7caf6b20629b51970279e81231a3574d5b51db
Reviewed-on: https://go-review.googlesource.com/c/go/+/273008
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25 16:53:33 +00:00
Russ Cox 26b66fd60b [dev.regabi] cmd/compile: introduce cmd/compile/internal/base [generated]
Move Flag, Debug, Ctxt, Exit, and error messages to
new package cmd/compile/internal/base.

These are the core functionality that everything in gc uses
and which otherwise prevent splitting any other code
out of gc into different packages.

A minor milestone: the compiler source code
no longer contains the string "yy".

[git-generate]
cd src/cmd/compile/internal/gc
rf '
        mv atExit AtExit
        mv Ctxt atExitFuncs AtExit Exit base.go

        mv lineno Pos
        mv linestr FmtPos
        mv flusherrors FlushErrors
        mv yyerror Errorf
        mv yyerrorl ErrorfAt
        mv yyerrorv ErrorfVers
        mv noder.yyerrorpos noder.errorAt
        mv Warnl WarnfAt
        mv errorexit ErrorExit

        mv base.go debug.go flag.go print.go cmd/compile/internal/base
'

: # update comments
sed -i '' 's/yyerrorl/ErrorfAt/g; s/yyerror/Errorf/g' *.go

: # bootstrap.go is not built by default so invisible to rf
sed -i '' 's/Fatalf/base.Fatalf/' bootstrap.go
goimports -w bootstrap.go

: # update cmd/dist to add internal/base
cd ../../../dist
sed -i '' '/internal.amd64/a\
	"cmd/compile/internal/base",
' buildtool.go
gofmt -w buildtool.go

Change-Id: I59903c7084222d6eaee38823fd222159ba24a31a
Reviewed-on: https://go-review.googlesource.com/c/go/+/272250
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25 16:39:54 +00:00
Russ Cox 3c240f5d17 [dev.regabi] cmd/compile: clean up debug flag (-d) handling [generated]
The debug table is not as haphazard as flags, but there are still
a few mismatches between command-line names and variable names.
This CL moves them all into a consistent home (var Debug, like var Flag).

Code updated automatically using the rf command below.
A followup CL will make a few manual cleanups, leaving this CL
completely automated and easier to regenerate during merge
conflicts.

[git-generate]
cd src/cmd/compile/internal/gc
rf '
	add main.go var Debug struct{}
	mv Debug_append Debug.Append
	mv Debug_checkptr Debug.Checkptr
	mv Debug_closure Debug.Closure
	mv Debug_compilelater Debug.CompileLater
	mv disable_checknil Debug.DisableNil
	mv debug_dclstack Debug.DclStack
	mv Debug_gcprog Debug.GCProg
	mv Debug_libfuzzer Debug.Libfuzzer
	mv Debug_checknil Debug.Nil
	mv Debug_panic Debug.Panic
	mv Debug_slice Debug.Slice
	mv Debug_typeassert Debug.TypeAssert
	mv Debug_wb Debug.WB
	mv Debug_export Debug.Export
	mv Debug_pctab Debug.PCTab
	mv Debug_locationlist Debug.LocationLists
	mv Debug_typecheckinl Debug.TypecheckInl
	mv Debug_gendwarfinl Debug.DwarfInl
	mv Debug_softfloat Debug.SoftFloat
	mv Debug_defer Debug.Defer
	mv Debug_dumpptrs Debug.DumpPtrs

	mv flag.go:/parse.-d/-1,/unknown.debug/+2 parseDebug

	mv debugtab Debug parseDebug \
		debugHelpHeader debugHelpFooter \
		debug.go

	# Remove //go:generate line copied from main.go
	rm debug.go:/go:generate/-+
'

Change-Id: I625761ca5659be4052f7161a83baa00df75cca91
Reviewed-on: https://go-review.googlesource.com/c/go/+/272246
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25 16:39:42 +00:00
Paul E. Murphy cb674b5c13 cmd/compile,cmd/asm: fix function pointer call perf regression on ppc64
by inserting hint when using bclrl.

Using this instruction as subroutine call is not the expected
default behavior, and as a result confuses the branch predictor.

The default expected behavior is a conditional return from a
subroutine.

We can change this assumption by encoding a hint this is not a
subroutine return.

The regex benchmarks are a pretty good example of how much this
hint can help generic ppc64le code on a power9 machine:

name                          old time/op    new time/op     delta
Find                             606ns ± 0%      447ns ± 0%  -26.27%
FindAllNoMatches                 309ns ± 0%      205ns ± 0%  -33.72%
FindString                       609ns ± 0%      451ns ± 0%  -26.04%
FindSubmatch                     734ns ± 0%      594ns ± 0%  -19.07%
FindStringSubmatch               706ns ± 0%      574ns ± 0%  -18.83%
Literal                          177ns ± 0%      136ns ± 0%  -22.89%
NotLiteral                      4.69µs ± 0%     2.34µs ± 0%  -50.14%
MatchClass                      6.05µs ± 0%     3.26µs ± 0%  -46.08%
MatchClass_InRange              5.93µs ± 0%     3.15µs ± 0%  -46.86%
ReplaceAll                      3.15µs ± 0%     2.18µs ± 0%  -30.77%
AnchoredLiteralShortNonMatch     156ns ± 0%      109ns ± 0%  -30.61%
AnchoredLiteralLongNonMatch      192ns ± 0%      136ns ± 0%  -29.34%
AnchoredShortMatch               268ns ± 0%      209ns ± 0%  -22.00%
AnchoredLongMatch                472ns ± 0%      357ns ± 0%  -24.30%
OnePassShortA                   1.16µs ± 0%     0.87µs ± 0%  -25.03%
NotOnePassShortA                1.34µs ± 0%     1.20µs ± 0%  -10.63%
OnePassShortB                    940ns ± 0%      655ns ± 0%  -30.29%
NotOnePassShortB                 873ns ± 0%      703ns ± 0%  -19.52%
OnePassLongPrefix                258ns ± 0%      155ns ± 0%  -40.13%
OnePassLongNotPrefix             943ns ± 0%      529ns ± 0%  -43.89%
MatchParallelShared              591ns ± 0%      436ns ± 0%  -26.31%
MatchParallelCopied              596ns ± 0%      435ns ± 0%  -27.10%
QuoteMetaAll                     186ns ± 0%      186ns ± 0%   -0.16%
QuoteMetaNone                   55.9ns ± 0%     55.9ns ± 0%   +0.02%
Compile/Onepass                 9.64µs ± 0%     9.26µs ± 0%   -3.97%
Compile/Medium                  21.7µs ± 0%     20.6µs ± 0%   -4.90%
Compile/Hard                     174µs ± 0%      174µs ± 0%   +0.07%
Match/Easy0/16                  7.35ns ± 0%     7.34ns ± 0%   -0.11%
Match/Easy0/32                   116ns ± 0%       97ns ± 0%  -16.27%
Match/Easy0/1K                   592ns ± 0%      562ns ± 0%   -5.04%
Match/Easy0/32K                 12.6µs ± 0%     12.5µs ± 0%   -0.64%
Match/Easy0/1M                   556µs ± 0%      556µs ± 0%   -0.00%
Match/Easy0/32M                 17.7ms ± 0%     17.7ms ± 0%   +0.05%
Match/Easy0i/16                 7.34ns ± 0%     7.35ns ± 0%   +0.10%
Match/Easy0i/32                 2.82µs ± 0%     1.64µs ± 0%  -41.71%
Match/Easy0i/1K                 83.2µs ± 0%     48.2µs ± 0%  -42.06%
Match/Easy0i/32K                2.13ms ± 0%     1.80ms ± 0%  -15.34%
Match/Easy0i/1M                 68.1ms ± 0%     57.6ms ± 0%  -15.31%
Match/Easy0i/32M                 2.18s ± 0%      1.80s ± 0%  -17.52%
Match/Easy1/16                  7.36ns ± 0%     7.34ns ± 0%   -0.24%
Match/Easy1/32                   118ns ± 0%       96ns ± 0%  -18.72%
Match/Easy1/1K                  2.46µs ± 0%     1.58µs ± 0%  -35.65%
Match/Easy1/32K                 80.2µs ± 0%     54.6µs ± 0%  -31.92%
Match/Easy1/1M                  2.75ms ± 0%     1.88ms ± 0%  -31.66%
Match/Easy1/32M                 87.5ms ± 0%     59.8ms ± 0%  -31.62%
Match/Medium/16                 7.34ns ± 0%     7.34ns ± 0%   +0.01%
Match/Medium/32                 2.60µs ± 0%     1.50µs ± 0%  -42.61%
Match/Medium/1K                 78.1µs ± 0%     43.7µs ± 0%  -44.06%
Match/Medium/32K                2.08ms ± 0%     1.52ms ± 0%  -27.11%
Match/Medium/1M                 66.5ms ± 0%     48.6ms ± 0%  -26.96%
Match/Medium/32M                 2.14s ± 0%      1.60s ± 0%  -25.18%
Match/Hard/16                   7.35ns ± 0%     7.35ns ± 0%   +0.03%
Match/Hard/32                   3.58µs ± 0%     2.44µs ± 0%  -31.82%
Match/Hard/1K                    108µs ± 0%       75µs ± 0%  -31.04%
Match/Hard/32K                  2.79ms ± 0%     2.25ms ± 0%  -19.30%
Match/Hard/1M                   89.4ms ± 0%     72.2ms ± 0%  -19.26%
Match/Hard/32M                   2.91s ± 0%      2.37s ± 0%  -18.60%
Match/Hard1/16                  11.1µs ± 0%      8.3µs ± 0%  -25.07%
Match/Hard1/32                  21.4µs ± 0%     16.1µs ± 0%  -24.85%
Match/Hard1/1K                   658µs ± 0%      498µs ± 0%  -24.27%
Match/Hard1/32K                 12.2ms ± 0%     11.7ms ± 0%   -4.60%
Match/Hard1/1M                   391ms ± 0%      374ms ± 0%   -4.40%
Match/Hard1/32M                  12.6s ± 0%      12.0s ± 0%   -4.68%
Match_onepass_regex/16           870ns ± 0%      611ns ± 0%  -29.79%
Match_onepass_regex/32          1.58µs ± 0%     1.08µs ± 0%  -31.48%
Match_onepass_regex/1K          45.7µs ± 0%     30.3µs ± 0%  -33.58%
Match_onepass_regex/32K         1.45ms ± 0%     0.97ms ± 0%  -33.20%
Match_onepass_regex/1M          46.2ms ± 0%     30.9ms ± 0%  -33.01%
Match_onepass_regex/32M          1.46s ± 0%      0.99s ± 0%  -32.02%

name                          old alloc/op   new alloc/op    delta
Find                             0.00B           0.00B         0.00%
FindAllNoMatches                 0.00B           0.00B         0.00%
FindString                       0.00B           0.00B         0.00%
FindSubmatch                     48.0B ± 0%      48.0B ± 0%    0.00%
FindStringSubmatch               32.0B ± 0%      32.0B ± 0%    0.00%
Compile/Onepass                 4.02kB ± 0%     4.02kB ± 0%    0.00%
Compile/Medium                  9.39kB ± 0%     9.39kB ± 0%    0.00%
Compile/Hard                    84.7kB ± 0%     84.7kB ± 0%    0.00%
Match_onepass_regex/16           0.00B           0.00B         0.00%
Match_onepass_regex/32           0.00B           0.00B         0.00%
Match_onepass_regex/1K           0.00B           0.00B         0.00%
Match_onepass_regex/32K          0.00B           0.00B         0.00%
Match_onepass_regex/1M           5.00B ± 0%      3.00B ± 0%  -40.00%
Match_onepass_regex/32M           136B ± 0%        68B ± 0%  -50.00%

name                          old allocs/op  new allocs/op   delta
Find                              0.00            0.00         0.00%
FindAllNoMatches                  0.00            0.00         0.00%
FindString                        0.00            0.00         0.00%
FindSubmatch                      1.00 ± 0%       1.00 ± 0%    0.00%
FindStringSubmatch                1.00 ± 0%       1.00 ± 0%    0.00%
Compile/Onepass                   52.0 ± 0%       52.0 ± 0%    0.00%
Compile/Medium                     112 ± 0%        112 ± 0%    0.00%
Compile/Hard                       424 ± 0%        424 ± 0%    0.00%
Match_onepass_regex/16            0.00            0.00         0.00%
Match_onepass_regex/32            0.00            0.00         0.00%
Match_onepass_regex/1K            0.00            0.00         0.00%
Match_onepass_regex/32K           0.00            0.00         0.00%
Match_onepass_regex/1M            0.00            0.00         0.00%
Match_onepass_regex/32M           2.00 ± 0%       1.00 ± 0%  -50.00%

name                          old speed      new speed       delta
QuoteMetaAll                  75.2MB/s ± 0%   75.3MB/s ± 0%   +0.15%
QuoteMetaNone                  465MB/s ± 0%    465MB/s ± 0%   -0.02%
Match/Easy0/16                2.18GB/s ± 0%   2.18GB/s ± 0%   +0.10%
Match/Easy0/32                 276MB/s ± 0%    330MB/s ± 0%  +19.46%
Match/Easy0/1K                1.73GB/s ± 0%   1.82GB/s ± 0%   +5.29%
Match/Easy0/32K               2.60GB/s ± 0%   2.62GB/s ± 0%   +0.64%
Match/Easy0/1M                1.89GB/s ± 0%   1.89GB/s ± 0%   +0.00%
Match/Easy0/32M               1.89GB/s ± 0%   1.89GB/s ± 0%   -0.05%
Match/Easy0i/16               2.18GB/s ± 0%   2.18GB/s ± 0%   -0.10%
Match/Easy0i/32               11.4MB/s ± 0%   19.5MB/s ± 0%  +71.48%
Match/Easy0i/1K               12.3MB/s ± 0%   21.2MB/s ± 0%  +72.62%
Match/Easy0i/32K              15.4MB/s ± 0%   18.2MB/s ± 0%  +18.12%
Match/Easy0i/1M               15.4MB/s ± 0%   18.2MB/s ± 0%  +18.12%
Match/Easy0i/32M              15.4MB/s ± 0%   18.6MB/s ± 0%  +21.21%
Match/Easy1/16                2.17GB/s ± 0%   2.18GB/s ± 0%   +0.24%
Match/Easy1/32                 271MB/s ± 0%    333MB/s ± 0%  +23.07%
Match/Easy1/1K                 417MB/s ± 0%    648MB/s ± 0%  +55.38%
Match/Easy1/32K                409MB/s ± 0%    600MB/s ± 0%  +46.88%
Match/Easy1/1M                 381MB/s ± 0%    558MB/s ± 0%  +46.33%
Match/Easy1/32M                383MB/s ± 0%    561MB/s ± 0%  +46.25%
Match/Medium/16               2.18GB/s ± 0%   2.18GB/s ± 0%   -0.01%
Match/Medium/32               12.3MB/s ± 0%   21.4MB/s ± 0%  +74.13%
Match/Medium/1K               13.1MB/s ± 0%   23.4MB/s ± 0%  +78.73%
Match/Medium/32K              15.7MB/s ± 0%   21.6MB/s ± 0%  +37.23%
Match/Medium/1M               15.8MB/s ± 0%   21.6MB/s ± 0%  +36.93%
Match/Medium/32M              15.7MB/s ± 0%   21.0MB/s ± 0%  +33.67%
Match/Hard/16                 2.18GB/s ± 0%   2.18GB/s ± 0%   -0.03%
Match/Hard/32                 8.93MB/s ± 0%  13.10MB/s ± 0%  +46.70%
Match/Hard/1K                 9.48MB/s ± 0%  13.74MB/s ± 0%  +44.94%
Match/Hard/32K                11.7MB/s ± 0%   14.5MB/s ± 0%  +23.87%
Match/Hard/1M                 11.7MB/s ± 0%   14.5MB/s ± 0%  +23.87%
Match/Hard/32M                11.6MB/s ± 0%   14.2MB/s ± 0%  +22.86%
Match/Hard1/16                1.44MB/s ± 0%   1.93MB/s ± 0%  +34.03%
Match/Hard1/32                1.49MB/s ± 0%   1.99MB/s ± 0%  +33.56%
Match/Hard1/1K                1.56MB/s ± 0%   2.05MB/s ± 0%  +31.41%
Match/Hard1/32K               2.68MB/s ± 0%   2.80MB/s ± 0%   +4.48%
Match/Hard1/1M                2.68MB/s ± 0%   2.80MB/s ± 0%   +4.48%
Match/Hard1/32M               2.66MB/s ± 0%   2.79MB/s ± 0%   +4.89%
Match_onepass_regex/16        18.4MB/s ± 0%   26.2MB/s ± 0%  +42.41%
Match_onepass_regex/32        20.2MB/s ± 0%   29.5MB/s ± 0%  +45.92%
Match_onepass_regex/1K        22.4MB/s ± 0%   33.8MB/s ± 0%  +50.54%
Match_onepass_regex/32K       22.6MB/s ± 0%   33.9MB/s ± 0%  +49.67%
Match_onepass_regex/1M        22.7MB/s ± 0%   33.9MB/s ± 0%  +49.27%
Match_onepass_regex/32M       23.0MB/s ± 0%   33.9MB/s ± 0%  +47.14%

Fixes #42709

Change-Id: Ice07fec2de4c5b1302febf8c2978ae8c1e4fd3e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/271337
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
2020-11-19 18:23:23 +00:00
Paul E. Murphy c3c6fbf314 cmd/compile: combine more 32 bit shift and mask operations on ppc64
Combine (AND m (SRWconst x)) or (SRWconst (AND m x)) when mask m is
and the shift value produce constant which can be encoded into an
RLWINM instruction.

Combine (CLRLSLDI (SRWconst x)) if the combining of the underling rotate
masks produces a constant which can be encoded into RLWINM.

Likewise for (SLDconst (SRWconst x)) and (CLRLSDI (RLWINM x)).

Combine rotate word + and operations which can be encoded as a single
RLWINM/RLWNM instruction.

The most notable performance improvements arise from the crypto
benchmarks below (GOARCH=power8 on a ppc64le/linux):

pkg:golang.org/x/crypto/blowfish goos:linux goarch:ppc64le
ExpandKeyWithSalt                               52.2µs ± 0%    47.5µs ± 0%  -8.88%
ExpandKey                                       44.4µs ± 0%    40.3µs ± 0%  -9.15%

pkg:golang.org/x/crypto/ssh/internal/bcrypt_pbkdf goos:linux goarch:ppc64le
Key                                             57.6ms ± 0%    52.3ms ± 0%  -9.13%

pkg:golang.org/x/crypto/bcrypt goos:linux goarch:ppc64le
Equal                                           90.9ms ± 0%    82.6ms ± 0%  -9.13%
DefaultCost                                     91.0ms ± 0%    82.7ms ± 0%  -9.12%

Change-Id: I59a0ca29face38f4ab46e37124c32906f216c4ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/260798
Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-27 18:33:20 +00:00
Michael Pratt e223c6cf07 cmd/compile: intrinsify runtime/internal/atomic.{And,Or} on PPC64
This is a simple case of changing the operand size of the existing 8-bit
And/Or.

I've also updated a few operand descriptions that were out-of-sync with
the implementation.

Change-Id: I95ac4445d08f7958768aec9a233698a2d652a39a
Reviewed-on: https://go-review.googlesource.com/c/go/+/263150
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-23 15:11:03 +00:00
Lynn Boger bdab5df40f cmd/compile,cmd/internal/obj/ppc64: use mulli where possible
This adds support to allow the use of mulli when one of the multiply
operands is a constant that fits in 16 bits.

This especially helps in the case where this instruction appears in
a loop since the load of the constant is not being moved out of the loop.

Some improvements seen in compress/flate on power9:

Decode/Digits/Huffman/1e4         259µs ± 0%     261µs ± 0%   +0.57%  (p=1.000 n=1+1)
Decode/Digits/Huffman/1e5        2.43ms ± 0%    2.45ms ± 0%   +0.79%  (p=1.000 n=1+1)
Decode/Digits/Huffman/1e6        23.9ms ± 0%    24.2ms ± 0%   +0.86%  (p=1.000 n=1+1)
Decode/Digits/Speed/1e4           278µs ± 0%     279µs ± 0%   +0.34%  (p=1.000 n=1+1)
Decode/Digits/Speed/1e5          2.80ms ± 0%    2.81ms ± 0%   +0.29%  (p=1.000 n=1+1)
Decode/Digits/Speed/1e6          28.0ms ± 0%    28.1ms ± 0%   +0.28%  (p=1.000 n=1+1)
Decode/Digits/Default/1e4         278µs ± 0%     278µs ± 0%   +0.28%  (p=1.000 n=1+1)
Decode/Digits/Default/1e5        2.68ms ± 0%    2.69ms ± 0%   +0.19%  (p=1.000 n=1+1)
Decode/Digits/Default/1e6        26.6ms ± 0%    26.6ms ± 0%   +0.21%  (p=1.000 n=1+1)
Decode/Digits/Compression/1e4     278µs ± 0%     278µs ± 0%   +0.00%  (p=1.000 n=1+1)
Decode/Digits/Compression/1e5    2.68ms ± 0%    2.69ms ± 0%   +0.21%  (p=1.000 n=1+1)
Decode/Digits/Compression/1e6    26.6ms ± 0%    26.6ms ± 0%   +0.07%  (p=1.000 n=1+1)
Decode/Newton/Huffman/1e4         322µs ± 0%     312µs ± 0%   -2.84%  (p=1.000 n=1+1)
Decode/Newton/Huffman/1e5        3.11ms ± 0%    2.91ms ± 0%   -6.41%  (p=1.000 n=1+1)
Decode/Newton/Huffman/1e6        31.4ms ± 0%    29.3ms ± 0%   -6.85%  (p=1.000 n=1+1)
Decode/Newton/Speed/1e4           282µs ± 0%     269µs ± 0%   -4.69%  (p=1.000 n=1+1)
Decode/Newton/Speed/1e5          2.29ms ± 0%    2.20ms ± 0%   -4.13%  (p=1.000 n=1+1)
Decode/Newton/Speed/1e6          22.7ms ± 0%    21.3ms ± 0%   -6.06%  (p=1.000 n=1+1)
Decode/Newton/Default/1e4         254µs ± 0%     237µs ± 0%   -6.60%  (p=1.000 n=1+1)
Decode/Newton/Default/1e5        1.86ms ± 0%    1.75ms ± 0%   -5.99%  (p=1.000 n=1+1)
Decode/Newton/Default/1e6        18.1ms ± 0%    17.4ms ± 0%   -4.10%  (p=1.000 n=1+1)
Decode/Newton/Compression/1e4     254µs ± 0%     244µs ± 0%   -3.91%  (p=1.000 n=1+1)
Decode/Newton/Compression/1e5    1.85ms ± 0%    1.79ms ± 0%   -3.10%  (p=1.000 n=1+1)
Decode/Newton/Compression/1e6    18.0ms ± 0%    17.3ms ± 0%   -3.88%  (p=1.000 n=1+1)

Change-Id: I840320fab1c4bf64c76b001c2651ab79f23df4eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/259444
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-06 19:40:46 +00:00
Lynn Boger cc2a5cf4b8 cmd/compile,cmd/internal/obj/ppc64: fix some shift rules due to a regression
A recent change to improve shifts was generating some
invalid cases when the rule was based on an AND. The
extended mnemonics CLRLSLDI and CLRLSLWI only allow
certain values for the operands and in the mask case
those values were not being checked properly. This
adds a check to those rules to verify that the
'b' and 'n' values used when an AND was part of the rule
have correct values.

There was a bug in some diag messages in asm9. The
message expected 3 values but only provided 2. Those are
corrected here also.

The test/codegen/shift.go was updated to add a few more
cases to check for the case mentioned here.

Some of the comments that mention the order of operands
in these extended mnemonics were wrong and those have been
corrected.

Fixes #41683.

Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31
Reviewed-on: https://go-review.googlesource.com/c/go/+/258138
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-01 18:51:18 +00:00
Lynn Boger a424f6e45e cmd/asm,cmd/compile,cmd/internal/obj/ppc64: add extswsli support on power9
This adds support for the extswsli instruction which combines
extsw followed by a shift.

New benchmark demonstrates the improvement:
name      old time/op  new time/op  delta
ExtShift  1.34µs ± 0%  1.30µs ± 0%  -3.15%  (p=0.057 n=4+3)

Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/257017
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-09-28 18:13:48 +00:00
Lynn Boger 967465da29 cmd/compile: use combined shifts to improve array addressing on ppc64x
This change adds rules to find pairs of instructions that can
be combined into a single shifts. These instruction sequences
are common in array addressing within loops. Improvements can
be seen in many crypto packages and the hash packages.

These are based on the extended mnemonics found in the ISA
sections C.8.1 and C.8.2.

Some rules in PPC64.rules were moved because the ordering prevented
some matching.

The following results were generated on power9.

hash/crc32:
    CRC32/poly=Koopman/size=40/align=0          195ns ± 0%     163ns ± 0%  -16.41%
    CRC32/poly=Koopman/size=40/align=1          200ns ± 0%     163ns ± 0%  -18.50%
    CRC32/poly=Koopman/size=512/align=0        1.98µs ± 0%    1.67µs ± 0%  -15.46%
    CRC32/poly=Koopman/size=512/align=1        1.98µs ± 0%    1.69µs ± 0%  -14.80%
    CRC32/poly=Koopman/size=1kB/align=0        3.90µs ± 0%    3.31µs ± 0%  -15.27%
    CRC32/poly=Koopman/size=1kB/align=1        3.85µs ± 0%    3.31µs ± 0%  -14.15%
    CRC32/poly=Koopman/size=4kB/align=0        15.3µs ± 0%    13.1µs ± 0%  -14.22%
    CRC32/poly=Koopman/size=4kB/align=1        15.4µs ± 0%    13.1µs ± 0%  -14.79%
    CRC32/poly=Koopman/size=32kB/align=0        137µs ± 0%     105µs ± 0%  -23.56%
    CRC32/poly=Koopman/size=32kB/align=1        137µs ± 0%     105µs ± 0%  -23.53%

crypto/rc4:
    RC4_128    733ns ± 0%    650ns ± 0%  -11.32%  (p=1.000 n=1+1)
    RC4_1K    5.80µs ± 0%   5.17µs ± 0%  -10.89%  (p=1.000 n=1+1)
    RC4_8K    45.7µs ± 0%   40.8µs ± 0%  -10.73%  (p=1.000 n=1+1)

crypto/sha1:
    Hash8Bytes       635ns ± 0%     613ns ± 0%   -3.46%  (p=1.000 n=1+1)
    Hash320Bytes    2.30µs ± 0%    2.18µs ± 0%   -5.38%  (p=1.000 n=1+1)
    Hash1K          5.88µs ± 0%    5.38µs ± 0%   -8.62%  (p=1.000 n=1+1)
    Hash8K          42.0µs ± 0%    37.9µs ± 0%   -9.75%  (p=1.000 n=1+1)

There are other improvements found in golang.org/x/crypto which are all in the
range of 5-15%.

Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252097
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2020-09-17 12:37:40 +00:00
Paul E. Murphy 7615b20d06 cmd/compile: generate subfic on ppc64
This merges an lis + subf into subfic, and for 32b constants
lwa + subf into oris + ori + subf.

The carry bit is no longer used in code generation, therefore
I think we can clobber it as needed.  Note, lowered borrow/carry
arithmetic is self-contained and thus is not affected.

A few extra rules are added to ensure early transformations to
SUBFCconst don't trip up earlier rules, fold constant operations,
or otherwise simplify lowering.  Likewise, tests are added to
ensure all rules are hit.  Generic constant folding catches
trivial cases, however some lowering rules insert arithmetic
which can introduce new opportunities (e.g BitLen or Slicemask).

I couldn't find a specific benchmark to demonstrate noteworthy
improvements, but this is generating subfic in many of the default
bent test binaries, so we are at least saving a little code space.

Change-Id: Iad7c6e5767eaa9dc24dc1c989bd1c8cfe1982012
Reviewed-on: https://go-review.googlesource.com/c/go/+/249461
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2020-08-27 20:10:15 +00:00
Paul E. Murphy 2aba467933 cmd/compile: remove unused carry related ssa ops in ppc64
The intermediate SSA opcodes* are no longer generated during the
lowering pass.  The shifting rules have been improved using ISEL.
Therefore, we can remove them and the rules which expand them.

* The removed opcodes are:

  LoweredAdd64Carry
  ADDconstForCarry
  MaskIfNotCarry
  FlagCarryClear
  FlagCarrySet

Change-Id: I1ebe2726ed988f29ed4800c8f57b428f7a214cd0
Reviewed-on: https://go-review.googlesource.com/c/go/+/249462
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-08-27 13:07:57 +00:00
Paul E. Murphy e7c7ce646f cmd/compile: combine multiply/add into maddld on ppc64le/power9
Add a new lowering rule to match and replace such instances
with the MADDLD instruction available on power9 where
possible.

Likewise, this plumbs in a new ppc64 ssa opcode to house
the newly generated MADDLD instructions.

When testing ed25519, this reduced binary size by 936B.
Similarly, MADDLD combination occcurs in a few other less
obvious cases such as division by constant.

Testing of golang.org/x/crypto/ed25519 shows non-trivial
speedup during keygeneration:

name           old time/op  new time/op  delta
KeyGeneration  65.2µs ± 0%  63.1µs ± 0%  -3.19%
Signing        64.3µs ± 0%  64.4µs ± 0%  +0.16%
Verification    147µs ± 0%   147µs ± 0%  +0.11%

Similarly, this test binary has shrunk by 66488B.

Change-Id: I077aeda7943119b41f07e4e62e44a648f16e4ad0
Reviewed-on: https://go-review.googlesource.com/c/go/+/248723
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-08-18 21:09:30 +00:00
Lynn Boger 56933fb838 cmd/compile,cmd/internal/obj/ppc64: use mod instructions on power9
This updates the PPC64.rules file to use the MOD instructions
that are available in power9. Prior to power9 this is done
using a longer sequence with multiply and divide.

Included in this change is removal of the REM* opcode variations
that set the CC or OV bits since their settings are based
on the DIV and are not appropriate for the REM.

Change-Id: Iceed9ce33e128e1911c15592ee674276ce8ba3fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/229761
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-29 14:45:56 +00:00
Keith Randall ea7126fe14 cmd/compile: use a Sym type instead of interface{} for symbolic offsets
Will help with strongly typed rewrite rules.

Change-Id: Ifbf316a49f4081322b3b8f13bc962713437d9aba
Reviewed-on: https://go-review.googlesource.com/c/go/+/227785
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2020-04-10 16:24:46 +00:00
Lynn Boger 815509ae31 cmd/compile: improve lowered moves and zeros for ppc64le
This change includes the following:
- Generate LXV/STXV sequences instead of LXVD2X/STXVD2X on power9.
These instructions do not require an index register, which
allows more loads and stores within a loop without initializing
multiple index registers. The LoweredQuadXXX generate LXV/STXV.
- Create LoweredMoveXXXShort and LoweredZeroXXXShort for short
moves that don't generate loops, and therefore don't clobber the
address registers or flags.
- Use registers other than R3 and R4 to avoid conflicting with
registers that have already been allocated to avoid unnecessary
register moves.
- Eliminate the use of R14 as scratch register and use R31
instead.
- Add PCALIGN when the LoweredMoveXXX or LoweredZeroXXX generates a
loop with more than 3 iterations.

This performance opportunity was noticed in github.com/golang/snappy
benchmarks. Results on power9:

WordsDecode1e1    54.1ns ± 0%    53.8ns ± 0%   -0.51%  (p=0.029 n=4+4)
WordsDecode1e2     287ns ± 0%     282ns ± 1%   -1.83%  (p=0.029 n=4+4)
WordsDecode1e3    3.98µs ± 0%    3.64µs ± 0%   -8.52%  (p=0.029 n=4+4)
WordsDecode1e4    66.9µs ± 0%    67.0µs ± 0%   +0.20%  (p=0.029 n=4+4)
WordsDecode1e5     723µs ± 0%     723µs ± 0%   -0.01%  (p=0.200 n=4+4)
WordsDecode1e6    7.21ms ± 0%    7.21ms ± 0%   -0.02%  (p=1.000 n=4+4)
WordsEncode1e1    29.9ns ± 0%    29.4ns ± 0%   -1.51%  (p=0.029 n=4+4)
WordsEncode1e2    2.12µs ± 0%    1.75µs ± 0%  -17.70%  (p=0.029 n=4+4)
WordsEncode1e3    11.7µs ± 0%    11.2µs ± 0%   -4.61%  (p=0.029 n=4+4)
WordsEncode1e4     119µs ± 0%     120µs ± 0%   +0.36%  (p=0.029 n=4+4)
WordsEncode1e5    1.21ms ± 0%    1.22ms ± 0%   +0.41%  (p=0.029 n=4+4)
WordsEncode1e6    12.0ms ± 0%    12.0ms ± 0%   +0.57%  (p=0.029 n=4+4)
RandomEncode       286µs ± 0%     203µs ± 0%  -28.82%  (p=0.029 n=4+4)
ExtendMatch       47.4µs ± 0%    47.0µs ± 0%   -0.85%  (p=0.029 n=4+4)

Change-Id: Iecad3a39ae55280286e42760a5c9d5c1168f5858
Reviewed-on: https://go-review.googlesource.com/c/go/+/226539
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-06 12:09:39 +00:00
Keith Randall 287d67e3dd cmd/compile: indexed loads/stores can't be faultOnNilArg0
Because of the index, these ops can't guarantee faulting if arg0 is nil.

Clean up the PPC64 index ops - they can't take a sym or an offset.

Noticed while debugging #37881. I don't think it is the cause, but I guess
there is a chance.
Update #37881

Change-Id: Ic22925250bf7b1ba64e3cea1a65638bc4bab390c
Reviewed-on: https://go-review.googlesource.com/c/go/+/224457
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-21 02:46:16 +00:00
Josh Bleecher Snyder 37fc092be1 cmd/compile: remove duplicate ppc64 rules
Const64 gets lowered to MOVDconst.
Change rules using interior Const64 to use MOVDconst instead,
to be less dependent on rule application order.

As a result of doing this, some of the rules end up being
exact duplicates; remove those.

We had those exact duplicates because of the order dependency;
ppc64 had no way to optimize away shifts by a constant
if the initial lowering didn't catch it.

Add those optimizations as well.
The outcome is the same, but this makes the overall rules more robust.

Change-Id: Iadd97a9fe73d52358d571d022ace145e506d160b
Reviewed-on: https://go-review.googlesource.com/c/go/+/220877
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-03-02 21:59:19 +00:00
David Chase 40ebcfaa17 cmd/compile: enable nil check logging for other architectures.
Change-Id: If82ebd9cd6470863eb5de9e031e7905a66218857
Reviewed-on: https://go-review.googlesource.com/c/go/+/204159
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-10 17:12:15 +00:00
Cherry Zhang 24e549a717 cmd/compile, cmd/internal/obj/ppc64: use LR for indirect calls
On PPC64, indirect calls can be made through LR or CTR. Currently
both are used. This CL changes it to always use LR.

For async preemption, to return from the injected call, we need
an indirect jump back to the PC we preeempted. This jump can be
made through LR or CTR. So we'll have to clobber either LR or CTR.
Currently, LR is used more frequently. In particular, for a leaf
function, LR is live throughout the function. We don't want to
make leaf functions nonpreemptible. So we choose CTR for the call
injection. For code sequences that use CTR, if it is ok to use
another register, change it to.

Plus, it is a call so it will clobber LR anyway. It doesn't need
to also clobber CTR (even without preemption).

Change-Id: I07bd0e93b94a1a3aa2be2cd465801136165d8ab8
Reviewed-on: https://go-review.googlesource.com/c/go/+/203822
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07 19:20:03 +00:00
Austin Clements 97592b3c14 cmd/compile: intrinsics for runtime/internal/atomic.Store8
For #10958, #24543, but makes sense on its own.

Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/203284
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-29 03:18:55 +00:00
Lynn Boger 816ff44479 cmd/compile: use vsx loads and stores for LoweredMove, LoweredZero on ppc64x
This improves the code generated for LoweredMove and LoweredZero by
using LXVD2X and STXVD2X to move 16 bytes at a time. These instructions
are now used if the size to be moved or zeroed is >= 64. These same
instructions have already been used in the asm implementations for
memmove and memclr.

Some examples where this shows an improvement on power8:

MakeSlice/Byte                                  27.3ns ± 1%     25.2ns ± 0%    -7.69%
MakeSlice/Int16                                 40.2ns ± 0%     35.2ns ± 0%   -12.39%
MakeSlice/Int                                   94.9ns ± 1%     77.9ns ± 0%   -17.92%
MakeSlice/Ptr                                    129ns ± 1%      103ns ± 0%   -20.16%
MakeSlice/Struct/24                              176ns ± 1%      131ns ± 0%   -25.67%
MakeSlice/Struct/32                              200ns ± 1%      142ns ± 0%   -29.09%
MakeSlice/Struct/40                              220ns ± 2%      156ns ± 0%   -28.82%
GrowSlice/Byte                                  81.4ns ± 0%     73.4ns ± 0%    -9.88%
GrowSlice/Int16                                  118ns ± 1%       98ns ± 0%   -17.03%
GrowSlice/Int                                    178ns ± 1%      134ns ± 1%   -24.65%
GrowSlice/Ptr                                    249ns ± 4%      212ns ± 0%   -14.94%
GrowSlice/Struct/24                              294ns ± 5%      215ns ± 0%   -27.08%
GrowSlice/Struct/32                              315ns ± 1%      248ns ± 0%   -21.49%
GrowSlice/Struct/40                              382ns ± 4%      289ns ± 1%   -24.38%
ExtendSlice/IntSlice                             109ns ± 1%       90ns ± 1%   -17.51%
ExtendSlice/PointerSlice                         142ns ± 2%      118ns ± 0%   -16.75%
ExtendSlice/NoGrow                              6.02ns ± 0%     5.88ns ± 0%    -2.33%
Append                                          27.2ns ± 0%     27.6ns ± 0%    +1.38%
AppendGrowByte                                  4.20ms ± 3%     2.60ms ± 0%   -38.18%
AppendGrowString                                 134ms ± 3%      102ms ± 2%   -23.62%
AppendSlice/1Bytes                              5.65ns ± 0%     5.67ns ± 0%    +0.35%
AppendSlice/4Bytes                              6.40ns ± 0%     6.55ns ± 0%    +2.34%
AppendSlice/7Bytes                              8.74ns ± 0%     8.84ns ± 0%    +1.14%
AppendSlice/8Bytes                              5.68ns ± 0%     5.70ns ± 0%    +0.40%
AppendSlice/15Bytes                             9.31ns ± 0%     9.39ns ± 0%    +0.86%
AppendSlice/16Bytes                             14.0ns ± 0%      5.8ns ± 0%   -58.32%
AppendSlice/32Bytes                             5.72ns ± 0%     5.68ns ± 0%    -0.66%
AppendSliceLarge/1024Bytes                       918ns ± 8%      615ns ± 1%   -33.00%
AppendSliceLarge/4096Bytes                      3.25µs ± 1%     1.92µs ± 1%   -40.84%
AppendSliceLarge/16384Bytes                     8.70µs ± 2%     4.69µs ± 0%   -46.08%
AppendSliceLarge/65536Bytes                     18.1µs ± 3%      7.9µs ± 0%   -56.30%
AppendSliceLarge/262144Bytes                    69.8µs ± 2%     25.9µs ± 0%   -62.91%
AppendSliceLarge/1048576Bytes                    258µs ± 1%       93µs ± 0%   -63.96%

Change-Id: I21625dbe231a2029ddb9f7d73f5a6417b35c1e49
Reviewed-on: https://go-review.googlesource.com/c/go/+/199639
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-08 16:41:02 +00:00
Michael Munday 9c2e7e8bed cmd/compile: allow multiple SSA block control values
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.

Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.

This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.

Passes toolstash-check -all.

Results of compilebench:

name                      old time/op       new time/op       delta
Template                        208ms ± 1%        209ms ± 1%    ~     (p=0.289 n=20+20)
Unicode                        83.7ms ± 1%       83.3ms ± 3%  -0.49%  (p=0.017 n=18+18)
GoTypes                         748ms ± 1%        748ms ± 0%    ~     (p=0.460 n=20+18)
Compiler                        3.47s ± 1%        3.48s ± 1%    ~     (p=0.070 n=19+18)
SSA                             11.5s ± 1%        11.7s ± 1%  +1.64%  (p=0.000 n=19+18)
Flate                           130ms ± 1%        130ms ± 1%    ~     (p=0.588 n=19+20)
GoParser                        160ms ± 1%        161ms ± 1%    ~     (p=0.211 n=20+20)
Reflect                         465ms ± 1%        467ms ± 1%  +0.42%  (p=0.007 n=20+20)
Tar                             184ms ± 1%        185ms ± 2%    ~     (p=0.087 n=18+20)
XML                             253ms ± 1%        253ms ± 1%    ~     (p=0.377 n=20+18)
LinkCompiler                    769ms ± 2%        774ms ± 2%    ~     (p=0.070 n=19+19)
ExternalLinkCompiler            3.59s ±11%        3.68s ± 6%    ~     (p=0.072 n=20+20)
LinkWithoutDebugCompiler        446ms ± 5%        454ms ± 3%  +1.79%  (p=0.002 n=19+20)
StdCmd                          26.0s ± 2%        26.0s ± 2%    ~     (p=0.799 n=20+20)

name                      old user-time/op  new user-time/op  delta
Template                        238ms ± 5%        240ms ± 5%    ~     (p=0.142 n=20+20)
Unicode                         105ms ±11%        106ms ±10%    ~     (p=0.512 n=20+20)
GoTypes                         876ms ± 2%        873ms ± 4%    ~     (p=0.647 n=20+19)
Compiler                        4.17s ± 2%        4.19s ± 1%    ~     (p=0.093 n=20+18)
SSA                             13.9s ± 1%        14.1s ± 1%  +1.45%  (p=0.000 n=18+18)
Flate                           145ms ±13%        146ms ± 5%    ~     (p=0.851 n=20+18)
GoParser                        185ms ± 5%        188ms ± 7%    ~     (p=0.174 n=20+20)
Reflect                         534ms ± 3%        538ms ± 2%    ~     (p=0.105 n=20+18)
Tar                             215ms ± 4%        211ms ± 9%    ~     (p=0.079 n=19+20)
XML                             295ms ± 6%        295ms ± 5%    ~     (p=0.968 n=20+20)
LinkCompiler                    832ms ± 4%        837ms ± 7%    ~     (p=0.707 n=17+20)
ExternalLinkCompiler            1.58s ± 8%        1.60s ± 4%    ~     (p=0.296 n=20+19)
LinkWithoutDebugCompiler        478ms ±12%        489ms ±10%    ~     (p=0.429 n=20+20)

name                      old object-bytes  new object-bytes  delta
Template                        559kB ± 0%        559kB ± 0%    ~     (all equal)
Unicode                         216kB ± 0%        216kB ± 0%    ~     (all equal)
GoTypes                        2.03MB ± 0%       2.03MB ± 0%    ~     (all equal)
Compiler                       8.07MB ± 0%       8.07MB ± 0%  -0.06%  (p=0.000 n=20+20)
SSA                            27.1MB ± 0%       27.3MB ± 0%  +0.89%  (p=0.000 n=20+20)
Flate                           343kB ± 0%        343kB ± 0%    ~     (all equal)
GoParser                        441kB ± 0%        441kB ± 0%    ~     (all equal)
Reflect                        1.36MB ± 0%       1.36MB ± 0%    ~     (all equal)
Tar                             487kB ± 0%        487kB ± 0%    ~     (all equal)
XML                             632kB ± 0%        632kB ± 0%    ~     (all equal)

name                      old export-bytes  new export-bytes  delta
Template                       18.5kB ± 0%       18.5kB ± 0%    ~     (all equal)
Unicode                        7.92kB ± 0%       7.92kB ± 0%    ~     (all equal)
GoTypes                        35.0kB ± 0%       35.0kB ± 0%    ~     (all equal)
Compiler                        109kB ± 0%        110kB ± 0%  +0.72%  (p=0.000 n=20+20)
SSA                             137kB ± 0%        138kB ± 0%  +0.58%  (p=0.000 n=20+20)
Flate                          4.89kB ± 0%       4.89kB ± 0%    ~     (all equal)
GoParser                       8.49kB ± 0%       8.49kB ± 0%    ~     (all equal)
Reflect                        11.4kB ± 0%       11.4kB ± 0%    ~     (all equal)
Tar                            10.5kB ± 0%       10.5kB ± 0%    ~     (all equal)
XML                            16.7kB ± 0%       16.7kB ± 0%    ~     (all equal)

name                      old text-bytes    new text-bytes    delta
HelloSize                       761kB ± 0%        761kB ± 0%    ~     (all equal)
CmdGoSize                      10.8MB ± 0%       10.8MB ± 0%    ~     (all equal)

name                      old data-bytes    new data-bytes    delta
HelloSize                      10.7kB ± 0%       10.7kB ± 0%    ~     (all equal)
CmdGoSize                       312kB ± 0%        312kB ± 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       122kB ± 0%        122kB ± 0%    ~     (all equal)
CmdGoSize                       146kB ± 0%        146kB ± 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.13MB ± 0%       1.13MB ± 0%    ~     (all equal)
CmdGoSize                      15.1MB ± 0%       15.1MB ± 0%    ~     (all equal)

Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-02 09:56:36 +00:00
Cherry Zhang 55924135ee cmd/compile: fix register masks of ANDCC et al. on PPC64
PPC64's ANDCC, ORCC, XORCC SSA ops produce a flags value, which
should not have register mask of an integer register.

Fixes #34468.

Change-Id: Ic762e423b20275fd9f8118dae7951c258d59738c
Reviewed-on: https://go-review.googlesource.com/c/go/+/196960
Reviewed-by: Keith Randall <khr@golang.org>
2019-09-23 21:16:33 +00:00
Lynn Boger 7987238d9c cmd/asm,cmd/compile: clean up isel codegen on ppc64x
This cleans up the isel code generation in ssa for ppc64x.

Current there is no isel op and the isel code is only
generated from pseudo ops in ppc64/ssa.go, and only using
operands with values 0 or 1. When the isel is generated,
there is always a load of 1 into the temp register before it.

This change implements the isel op so it can be used in PPC64.rules,
and can recognize operand values other than 0 or 1. This also
eliminates the forced load of 1, so it will be loaded only if
needed.

This will make the isel code generation consistent with other ops,
and allow future rule changes that can take advantage of having
a more general purpose isel rule.

Change-Id: I363e1dbd3f7f5dfecb53187ad51cce409a8d1f8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/195057
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2019-09-18 15:54:32 +00:00
Shulhan ed7f323c8f all: simplify code using "gofmt -s -w"
Most changes are removing redundant declaration of type when direct
instantiating value of map or slice, e.g. []T{T{}} become []T{{}}.

Small changes are removing the high order of subslice if its value
is the length of slice itself, e.g. T[:len(T)] become T[:].

The following file is excluded due to incompatibility with go1.4,

- src/cmd/compile/internal/gc/ssa.go

Change-Id: Id3abb09401795ce1e6da591a89749cba8502fb26
Reviewed-on: https://go-review.googlesource.com/c/go/+/166437
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-05-06 22:19:22 +00:00
Austin Clements 4a4e05b0b1 cmd/compile,runtime/internal/atomic: add Load8
Change-Id: Id52a5730cf9207ee7ccebac4ef12791dc5720e7c
Reviewed-on: https://go-review.googlesource.com/c/go/+/172283
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-05-03 19:25:37 +00:00
Carlos Eduardo Seo 50ad09418e cmd/compile: intrinsify math/bits.Add64 for ppc64x
This change creates an intrinsic for Add64 for ppc64x and adds a
testcase for it.

name               old time/op  new time/op  delta
Add64-160          1.90ns ±40%  2.29ns ± 0%     ~     (p=0.119 n=5+5)
Add64multiple-160  6.69ns ± 2%  2.45ns ± 4%  -63.47%  (p=0.016 n=4+5)

Change-Id: I9abe6fb023fdf62eea3c9b46a1820f60bb0a7f97
Reviewed-on: https://go-review.googlesource.com/c/go/+/173758
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2019-04-28 23:51:04 +00:00
Keith Randall ad6c691542 cmd/compile: remove AUNDEF opcode
This opcode was only used to mark unreachable code for plive to use.
plive now uses the SSA representation, so it knows locations are
unreachable because they are ends of Exit blocks. It doesn't need
these opcodes any more.

These opcodes actually used space in the binary, 2 bytes per undef
on x86 and more for other archs.

Makes the amd64 go binary 0.2% smaller.

Change-Id: I64c84c35db7c7949617a3a5830f09c8e5fcd2620
Reviewed-on: https://go-review.googlesource.com/c/go/+/171058
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-07 01:15:28 +00:00
Carlos Eduardo Seo 3023d7da49 cmd/compile/internal, cmd/internal/obj/ppc64: generate new count trailing zeros instructions on POWER9
This change adds new POWER9 instructions for counting trailing zeros (CNTTZW/CNTTZD)
to the assembler and generates them in SSA when GOPPC64=power9.

name                 old time/op  new time/op  delta
TrailingZeros-160    1.59ns ±20%  1.45ns ±10%  -8.81%  (p=0.000 n=14+13)
TrailingZeros8-160   1.55ns ±23%  1.62ns ±44%    ~     (p=0.593 n=13+15)
TrailingZeros16-160  1.78ns ±23%  1.62ns ±38%  -9.31%  (p=0.003 n=14+14)
TrailingZeros32-160  1.64ns ±10%  1.49ns ± 9%  -9.15%  (p=0.000 n=13+14)
TrailingZeros64-160  1.53ns ± 6%  1.45ns ± 5%  -5.38%  (p=0.000 n=15+13)

Change-Id: I365e6ff79f3ce4d8ebe089a6a86b1771853eb596
Reviewed-on: https://go-review.googlesource.com/c/go/+/167517
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2019-03-20 20:27:00 +00:00
Keith Randall 2c423f063b cmd/compile,runtime: provide index information on bounds check failure
A few examples (for accessing a slice of length 3):

   s[-1]    runtime error: index out of range [-1]
   s[3]     runtime error: index out of range [3] with length 3
   s[-1:0]  runtime error: slice bounds out of range [-1:]
   s[3:0]   runtime error: slice bounds out of range [3:0]
   s[3:-1]  runtime error: slice bounds out of range [:-1]
   s[3:4]   runtime error: slice bounds out of range [:4] with capacity 3
   s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3

Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.

An exhaustive set of examples is in issue30116[u].out in the CL.

The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.

Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.

Fixes #30116

Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-03-18 17:33:38 +00:00
Clément Chigot 9fe9853ae5 cmd/compile: fix nilcheck for AIX
This commit adapts compile tool to create correct nilchecks for AIX.

AIX allows to load a nil pointer. Therefore, the default nilcheck
which issues a load must be replaced by a CMP instruction followed by a
store at 0x0 if the value is nil. The store will trigger a SIGSEGV as on
others OS.

The nilcheck algorithm must be adapted to do not remove nilcheck if it's
only a read. Stores are detected with v.Type.IsMemory().

Tests related to nilptr must be adapted to the previous changements.
nilptr.go cannot be used as it's because the AIX address space starts at
1<<32.

Change-Id: I9f5aaf0b7e185d736a9b119c0ed2fe4e5bd1e7af
Reviewed-on: https://go-review.googlesource.com/c/144538
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-26 14:13:53 +00:00
Lynn Boger 4ae49b5921 cmd/compile: use ANDCC, ORCC, XORCC to avoid CMP on ppc64x
This change makes use of the cc versions of the AND, OR, XOR
instructions, omitting the need for a CMP instruction.

In many test programs and in the go binary, this reduces the
size of 20-30 functions by at least 1 instruction, many in
runtime.

Testcase added to test/codegen/comparisons.go

Change-Id: I6cc1ca8b80b065d7390749c625bc9784b0039adb
Reviewed-on: https://go-review.googlesource.com/c/143059
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-11-09 19:40:52 +00:00
Carlos Eduardo Seo 5c472132bf cmd/compile, runtime: add new lightweight atomics for ppc64x
This change creates the infrastructure for new lightweight atomics
primitives in runtime/internal/atomic:

- LoadAcq, for load-acquire
- StoreRel, for store-release
- CasRel, for Compare-and-Swap-release

and implements them for ppc64x. There is visible performance improvement
in producer-consumer scenarios, like BenchmarkChanProdCons*:

benchmark                           old ns/op     new ns/op     delta
BenchmarkChanProdCons0-48           2034          2034          +0.00%
BenchmarkChanProdCons10-48          1798          1608          -10.57%
BenchmarkChanProdCons100-48         1596          1585          -0.69%
BenchmarkChanProdConsWork0-48       2084          2046          -1.82%
BenchmarkChanProdConsWork10-48      1829          1668          -8.80%
BenchmarkChanProdConsWork100-48     1650          1650          +0.00%

Fixes #21348

Change-Id: I1f6ce377e4a0fe4bd7f5f775e8036f50070ad8db
Reviewed-on: https://go-review.googlesource.com/c/142277
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-10-23 18:10:38 +00:00
Igor Zhilianin 04dc1b2443 all: fix a bunch of misspellings
Change-Id: I94cebca86706e072fbe3be782d3edbe0e22b9432
GitHub-Last-Rev: 8e15a40545
GitHub-Pull-Request: golang/go#28067
Reviewed-on: https://go-review.googlesource.com/c/140437
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-10-08 03:12:03 +00:00
Lynn Boger 85c9c85955 cmd/compile: add rules to use index regs for ppc64x
The following adds support for load and store instructions
with index registers, and adds rules to take advantage of
those instructions.

Examples of improvements:

crypto/rc4:
name     old time/op   new time/op   delta
RC4_128    445ns ± 0%    404ns ± 0%   -9.21%  (p=0.029 n=4+4)
RC4_1K    3.46µs ± 0%   3.13µs ± 0%   -9.29%  (p=0.029 n=4+4)
RC4_8K    27.0µs ± 0%   24.7µs ± 0%   -8.83%  (p=0.029 n=4+4)

crypto/des:
name         old time/op    new time/op    delta
Encrypt         276ns ± 0%     264ns ± 0%  -4.35%  (p=0.029 n=4+4)
Decrypt         278ns ± 0%     263ns ± 0%  -5.40%  (p=0.029 n=4+4)
TDESEncrypt     683ns ± 0%     645ns ± 0%  -5.56%  (p=0.029 n=4+4)
TDESDecrypt     684ns ± 0%     641ns ± 0%  -6.29%  (p=0.029 n=4+4)

crypto/sha1:
name          old time/op    new time/op    delta
Hash8Bytes       661ns ± 0%     635ns ± 0%  -3.93%  (p=1.000 n=1+1)
Hash320Bytes    2.70µs ± 0%    2.56µs ± 0%  -5.26%  (p=1.000 n=1+1)
Hash1K          7.14µs ± 0%    6.78µs ± 0%  -5.03%  (p=1.000 n=1+1)
Hash8K          52.1µs ± 0%    49.4µs ± 0%  -5.14%  (p=1.000 n=1+1)

Change-Id: I03810e90fcc20029975a323f06bfa086c973c2b0
Reviewed-on: https://go-review.googlesource.com/c/135975
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2018-10-03 22:33:13 +00:00
Carlos Eduardo Seo 9aed4cc395 cmd/compile: instrinsify math/bits.Mul on ppc64x
Add SSA rules to intrinsify Mul/Mul64 on ppc64x.

benchmark             old ns/op     new ns/op     delta
BenchmarkMul-40       8.80          0.93          -89.43%
BenchmarkMul32-40     1.39          1.39          +0.00%
BenchmarkMul64-40     5.39          0.93          -82.75%

Updates #24813

Change-Id: I6e95bfbe976a2278bd17799df184a7fbc0e57829
Reviewed-on: https://go-review.googlesource.com/138917
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-10-02 18:56:06 +00:00
Lynn Boger 28edaf4584 cmd/compile,test: combine byte loads and stores on ppc64le
CL 74410 added rules to combine consecutive byte loads and
stores when the byte order was little endian for ppc64le. This
is the corresponding change for bytes that are in big endian order.
These rules are all intended for a little endian target arch.

This adds new testcases in test/codegen/memcombine.go

Fixes #22496
Updates #24242

Benchmark improvement for encoding/binary:
name                      old time/op    new time/op    delta
ReadSlice1000Int32s-16      11.0µs ± 0%     9.0µs ± 0%  -17.47%  (p=0.029 n=4+4)
ReadStruct-16               2.47µs ± 1%    2.48µs ± 0%   +0.67%  (p=0.114 n=4+4)
ReadInts-16                  642ns ± 1%     630ns ± 1%   -2.02%  (p=0.029 n=4+4)
WriteInts-16                 654ns ± 0%     653ns ± 1%   -0.08%  (p=0.629 n=4+4)
WriteSlice1000Int32s-16     8.75µs ± 0%    8.20µs ± 0%   -6.19%  (p=0.029 n=4+4)
PutUint16-16                1.16ns ± 0%    0.93ns ± 0%  -19.83%  (p=0.029 n=4+4)
PutUint32-16                1.16ns ± 0%    0.93ns ± 0%  -19.83%  (p=0.029 n=4+4)
PutUint64-16                1.85ns ± 0%    0.93ns ± 0%  -49.73%  (p=0.029 n=4+4)
LittleEndianPutUint16-16    1.03ns ± 0%    0.93ns ± 0%   -9.71%  (p=0.029 n=4+4)
LittleEndianPutUint32-16    0.93ns ± 0%    0.93ns ± 0%     ~     (all equal)
LittleEndianPutUint64-16    0.93ns ± 0%    0.93ns ± 0%     ~     (all equal)
PutUvarint32-16             43.0ns ± 0%    43.1ns ± 0%   +0.12%  (p=0.429 n=4+4)
PutUvarint64-16              174ns ± 0%     175ns ± 0%   +0.29%  (p=0.429 n=4+4)

Updates made to functions in gcm.go to enable their matching. An existing
testcase prevents these functions from being replaced by those in encoding/binary
due to import dependencies.

Change-Id: Idb3bd1e6e7b12d86cd828fb29cb095848a3e485a
Reviewed-on: https://go-review.googlesource.com/98136
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 13:15:39 +00:00
Wei Xiao 20102594a0 cmd/compile: intrinsify runtime.getcallerpc on all link register architectures
Add a compiler intrinsic for getcallerpc on following architectures:
  arm
  mips mipsle mips64 mips64le
  ppc64 ppc64le
  s390x

Change-Id: I758f3d4742fc214b206bcd07d90408622c17dbef
Reviewed-on: https://go-review.googlesource.com/110835
Run-TryBot: Wei Xiao <Wei.Xiao@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-02 16:59:27 +00:00
Carlos Eduardo Seo ebb67d993a cmd/compile, cmd/internal/obj/ppc64: make math.Round an intrinsic on ppc64x
This change implements math.Round as an intrinsic on ppc64x so it can be
done using a single instruction.

benchmark                   old ns/op     new ns/op     delta
BenchmarkRound-16           2.60          0.69          -73.46%

Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8
Reviewed-on: https://go-review.googlesource.com/109395
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-04-26 14:12:09 +00:00
Austin Clements 8871c930be cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.

This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.

The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.

For #24543.

Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 18:46:39 +00:00
Lynn Boger 30311e8860 cmd/compile: generate load without DS relocation for go.string on ppc64le
Due to some recent optimizations related to the compare
instruction, DS-form load instructions started to be used
to load 8-byte go.strings. This can cause link time errors
if the go.string is not aligned to 4 bytes.

For DS-form instructions, the value in the offset field must
be a multiple of 4. If the offset is known at the time the
rules are processed, a DS-form load will not be chosen. But for
go.strings, the offset is not known at that time, but a
relocation is generated indicating that the linker should fill
in the DS relocation. When the linker tries to fill in the
relocation, if the offset is not aligned properly, a link error
will occur.

To fix this, when loading a go.string using MOVDload, the full
address of the go.string is generated and loaded into the base
register. Then the go.string is loaded with a 0 offset field.

Added a testcase that reproduces this problem.

Fixes #24799

Change-Id: I6a154e8e1cba64eae290be0fbcb608b75884ecdd
Reviewed-on: https://go-review.googlesource.com/107855
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 16:16:47 +00:00
David Chase ab48574bab cmd/compile: use Block.Likely to put likely branch first
When a neither of a conditional block's successors follows,
the block must end with a conditional branch followed by a
an unconditional branch.  If the (conditional) branch is
"unlikely", invert it and swap successors to make it
likely instead.

This doesn't matter to most benchmarks on amd64, but in one
instance on amd64 it caused a 30% improvement, and it is
otherwise harmless.  The problematic loop is

		for i := 0; i < w; i++ {
			if pw[i] != 0 {
				return true
			}
		}

compiled under GOEXPERIMENT=preemptibleloops
This the very worst-case benchmark for that experiment.

Also in this CL is a commoning up of heavily-repeated
boilerplate, which made it much easier to see that the
changes were applied correctly.  In the future this should
allow un-exporting of SSAGenState.Branches once the
boilerplate-replacement is done everywhere.

Change-Id: I0e5ded6eeb3ab1e3e0138e12d54c7e056bd99335
Reviewed-on: https://go-review.googlesource.com/104977
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-11 17:35:34 +00:00
Carlos Eduardo Seo 6633bb2aa7 cmd/compile/internal/ppc64, runtime internal/atomic, sync/atomic: implement faster atomics for ppc64x
This change implements faster atomics for ppc64x based on the ISA 2.07B,
Appendix B.2 recommendations, replacing SYNC/ISYNC by LWSYNC in some
cases.

Updates #21348

name                                           old time/op new time/op    delta
Cond1-16                                           955ns     856ns      -10.33%
Cond2-16                                          2.38µs    2.03µs      -14.59%
Cond4-16                                          5.90µs    5.44µs       -7.88%
Cond8-16                                          12.1µs    11.1µs       -8.42%
Cond16-16                                         27.0µs    25.1µs       -7.04%
Cond32-16                                         59.1µs    55.5µs       -6.14%
LoadMostlyHits/*sync_test.DeepCopyMap-16          22.1ns    24.1ns       +9.02%
LoadMostlyHits/*sync_test.RWMutexMap-16            252ns     249ns       -1.20%
LoadMostlyHits/*sync.Map-16                       16.2ns    16.3ns         ~
LoadMostlyMisses/*sync_test.DeepCopyMap-16        22.3ns    22.6ns         ~
LoadMostlyMisses/*sync_test.RWMutexMap-16          249ns     247ns       -0.51%
LoadMostlyMisses/*sync.Map-16                     12.7ns    12.7ns         ~
LoadOrStoreBalanced/*sync_test.RWMutexMap-16      1.27µs    1.17µs       -7.54%
LoadOrStoreBalanced/*sync.Map-16                  1.12µs    1.10µs       -2.35%
LoadOrStoreUnique/*sync_test.RWMutexMap-16        1.75µs    1.68µs       -3.84%
LoadOrStoreUnique/*sync.Map-16                    2.07µs    1.97µs       -5.13%
LoadOrStoreCollision/*sync_test.DeepCopyMap-16    15.8ns    15.9ns         ~
LoadOrStoreCollision/*sync_test.RWMutexMap-16      496ns     424ns      -14.48%
LoadOrStoreCollision/*sync.Map-16                 6.07ns    6.07ns         ~
Range/*sync_test.DeepCopyMap-16                   1.65µs    1.64µs         ~
Range/*sync_test.RWMutexMap-16                     278µs     288µs       +3.75%
Range/*sync.Map-16                                2.00µs    2.01µs         ~
AdversarialAlloc/*sync_test.DeepCopyMap-16        3.45µs    3.44µs         ~
AdversarialAlloc/*sync_test.RWMutexMap-16          226ns     227ns         ~
AdversarialAlloc/*sync.Map-16                     1.09µs    1.07µs       -2.36%
AdversarialDelete/*sync_test.DeepCopyMap-16        553ns     550ns       -0.57%
AdversarialDelete/*sync_test.RWMutexMap-16         273ns     274ns         ~
AdversarialDelete/*sync.Map-16                     247ns     249ns         ~
UncontendedSemaphore-16                           79.0ns    65.5ns      -17.11%
ContendedSemaphore-16                              112ns      97ns      -13.77%
MutexUncontended-16                               3.34ns    2.51ns      -24.69%
Mutex-16                                           266ns     191ns      -28.26%
MutexSlack-16                                      226ns     159ns      -29.55%
MutexWork-16                                       377ns     338ns      -10.14%
MutexWorkSlack-16                                  335ns     308ns       -8.20%
MutexNoSpin-16                                     196ns     184ns       -5.91%
MutexSpin-16                                       710ns     666ns       -6.21%
Once-16                                           1.29ns    1.29ns         ~
Pool-16                                           8.64ns    8.71ns         ~
PoolOverflow-16                                   1.60µs    1.44µs      -10.25%
SemaUncontended-16                                5.39ns    4.42ns      -17.96%
SemaSyntNonblock-16                                539ns     483ns      -10.42%
SemaSyntBlock-16                                   413ns     354ns      -14.20%
SemaWorkNonblock-16                                305ns     258ns      -15.36%
SemaWorkBlock-16                                   266ns     229ns      -14.06%
RWMutexUncontended-16                             12.9ns     9.7ns      -24.80%
RWMutexWrite100-16                                 203ns     147ns      -27.47%
RWMutexWrite10-16                                  177ns     119ns      -32.74%
RWMutexWorkWrite100-16                             435ns     403ns       -7.39%
RWMutexWorkWrite10-16                              642ns     611ns       -4.79%
WaitGroupUncontended-16                           4.67ns    3.70ns      -20.92%
WaitGroupAddDone-16                                402ns     355ns      -11.54%
WaitGroupAddDoneWork-16                            208ns     250ns      +20.09%
WaitGroupWait-16                                  1.21ns    1.21ns         ~
WaitGroupWaitWork-16                              5.91ns    5.87ns       -0.81%
WaitGroupActuallyWait-16                          92.2ns    85.8ns       -6.91%

Updates #21348

Change-Id: Ibb9b271d11b308264103829e176c6d9fe8f867d3
Reviewed-on: https://go-review.googlesource.com/95175
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-03-22 14:13:01 +00:00
Austin Clements ae7d5f84f8 runtime: buffered write barrier for ppc64
Updates #22460.

Change-Id: I6040c4024111c80361c81eb7eec5071ec9efb4f9
Reviewed-on: https://go-review.googlesource.com/92702
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-13 16:34:23 +00:00
Lynn Boger 4d0151ede5 cmd/compile,cmd/internal/obj/ppc64: make math.Abs,math.Copysign instrinsics on ppc64x
This adds support for math Abs, Copysign to be instrinsics on ppc64x.

New instruction FCPSGN is added to generate fcpsgn. Some new
rules are added to improve the int<->float conversions that are
generated mainly due to the Float64bits and Float64frombits in
the math package. PPC64.rules is also modified as suggested
in the review for CL 63290.

Improvements:
benchmark                           old ns/op     new ns/op     delta
BenchmarkAbs-16                   1.12          0.69          -38.39%
BenchmarkCopysign-16              1.30          0.93          -28.46%
BenchmarkNextafter32-16           9.34          8.05          -13.81%
BenchmarkFrexp-16                 8.81          7.60          -13.73%

Others that used Copysign also saw smaller improvements.

I attempted to make this work using rules since that
seems to be preferred, but due to the use of Float64bits and
Float64frombits in these functions, several rules had to be added and
even then not all cases were matched. Using rules became too
complicated and seemed too fragile for these.

Updates #21390

Change-Id: Ia265da9a18355e08000818a4fba1a40e9e031995
Reviewed-on: https://go-review.googlesource.com/67130
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-30 13:56:39 +00:00