go/src/cmd/compile/internal/ssa/gen
Josh Bleecher Snyder 54dbab5221 cmd/compile: optimize TrailingZeros(8|16) on amd64
Introduce Ctz8 and Ctz16 ops and provide optimized lowerings for them.
amd64 only for this CL, although it wouldn't surprise me
if other architectures also admit of optimized lowerings.

name               old time/op  new time/op  delta
TrailingZeros8-8   1.33ns ± 6%  0.84ns ± 3%  -36.90%  (p=0.000 n=20+20)
TrailingZeros16-8  1.26ns ± 5%  0.84ns ± 5%  -33.50%  (p=0.000 n=20+18)

Code:

func f8(x uint8)   { z = bits.TrailingZeros8(x) }
func f16(x uint16) { z = bits.TrailingZeros16(x) }
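The snippet assigns to a package-level variable z, but the commit message does not show its declaration. Below is a minimal runnable version. It assumes z is an int, which fits the 8-byte MOVQ store to "".z(SB) in the listings. Per the math/bits documentation, TrailingZeros8 returns 8 and TrailingZeros16 returns 16 for a zero argument, and the lowerings in the listings have to preserve that behavior.

```go
package main

import (
	"fmt"
	"math/bits"
)

// z mirrors the package-level result variable used by the snippet.
// Its type (int) is an assumption; the commit message does not declare it.
var z int

func f8(x uint8)   { z = bits.TrailingZeros8(x) }
func f16(x uint16) { z = bits.TrailingZeros16(x) }

func main() {
	// Include zero, which yields the operand width (8 or 16).
	for _, x := range []uint8{0, 1, 0x08, 0x80} {
		f8(x)
		fmt.Printf("TrailingZeros8(%#x) = %d\n", x, z)
	}
	for _, x := range []uint16{0, 1, 0x0100, 0x8000} {
		f16(x)
		fmt.Printf("TrailingZeros16(%#x) = %d\n", x, z)
	}
}
```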

Before:

"".f8 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	BTSQ	$8, AX
	0x000d 00013 (x.go:7)	BSFQ	AX, AX
	0x0011 00017 (x.go:7)	MOVL	$64, CX
	0x0016 00022 (x.go:7)	CMOVQEQ	CX, AX
	0x001a 00026 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:7)	RET

"".f16 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	BTSQ	$16, AX
	0x000d 00013 (x.go:8)	BSFQ	AX, AX
	0x0011 00017 (x.go:8)	MOVL	$64, CX
	0x0016 00022 (x.go:8)	CMOVQEQ	CX, AX
	0x001a 00026 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:8)	RET

After:

"".f8 STEXT nosplit size=20 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	BTSL	$8, AX
	0x0009 00009 (x.go:7)	BSFL	AX, AX
	0x000c 00012 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0013 00019 (x.go:7)	RET

"".f16 STEXT nosplit size=20 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	BTSL	$16, AX
	0x0009 00009 (x.go:8)	BSFL	AX, AX
	0x000c 00012 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0013 00019 (x.go:8)	RET

Change-Id: I0551e357348de2b724737d569afd6ac9f5c3aa11
Reviewed-on: https://go-review.googlesource.com/108940
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:33:52 +00:00
386.rules cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
386Ops.go cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
AMD64.rules cmd/compile: optimize TrailingZeros(8|16) on amd64 2018-04-25 21:33:52 +00:00
AMD64Ops.go cmd/compile: add amd64 LEAL{1,2,4,8} ops 2018-04-23 21:42:28 +00:00
ARM.rules cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
ARM64.rules cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
ARM64Ops.go cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
ARMOps.go cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
MIPS.rules cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
MIPS64.rules cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
MIPS64Ops.go cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
MIPSOps.go cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
PPC64.rules cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
PPC64Ops.go cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
README
S390X.rules cmd/compile/internal/types: remove ElemType wrapper 2018-04-24 22:24:47 +00:00
S390XOps.go cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
dec.rules cmd/compile/internal/types: remove ElemType wrapper 2018-04-24 22:24:47 +00:00
dec64.rules cmd/compile: change ssa.Type into *types.Type 2017-05-09 23:01:51 +00:00
dec64Ops.go [dev.ssa] cmd/compile: decompose 64-bit integer on ARM 2016-06-02 13:01:09 +00:00
decOps.go
generic.rules cmd/compile/internal/types: remove ElemType wrapper 2018-04-24 22:24:47 +00:00
genericOps.go cmd/compile: optimize TrailingZeros(8|16) on amd64 2018-04-25 21:33:52 +00:00
main.go cmd/compile: don't lower OpConvert 2018-04-20 18:46:39 +00:00
rulegen.go all: fix misspellings 2018-02-20 21:02:58 +00:00

README

// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

This package generates opcode tables, rewrite rules, etc. for the ssa compiler.
Run it with:
   go run *.go