Writing a static analysis tool for a language can be daunting for those who haven’t done it before. The good news is that for Go, it’s actually very straightforward if you know how to leverage the existing packages exposed by the compiler itself.

In this series of blog posts, I would like to share some of the tips I learned from building my first Go static analysis tool: sqlvet.

To keep it simple, I won’t be using sqlvet as the example. Instead, I am going to build a dummy static analysis tool that warns about the use of the fmt.Println and fmt.Printf functions.

Source code is stored as a blob of text, which is easy for humans to read but hard for computers to manipulate. So the first step is to parse it into an in-memory data structure:

package main

import (
    "fmt"
    "go/parser"
    "go/token"
    "os"
)

func main() {
    srcPath := os.Args[1]
    fmt.Printf("Parsing source file %s...\n", srcPath)
    fset := token.NewFileSet()
    f, err := parser.ParseFile(fset, srcPath, nil, 0)
    // f is of type *ast.File
    if err != nil {
        panic(err)
    }
    fmt.Println("Found imports:")
    for _, s := range f.Imports {
        fmt.Println(s.Path.Value)
    }
}

Yes, it’s that easy! Using just two function calls, we have our source file fully parsed into an AST. If you don’t know what an AST is, don’t worry about it, just think of it as source code represented in a tree data structure that’s easy for a machine to consume. As noted in the comment above, the parsed AST is stored in the variable f with a type of *ast.File.

Here is what it looks like to run this code on itself:

$ go run . main.go
Parsing source file main.go...
Found imports:
"fmt"
"go/parser"
"go/token"
"os"

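To make the tree idea a little more concrete, here is a minimal sketch that walks one level below the *ast.File root and lists the top-level functions it finds. It parses from an in-memory string rather than a file on disk. The helper name topLevelFuncs and the sample source are made up for illustration and are not part of sqlvet.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// topLevelFuncs parses src and returns the names of its top-level functions.
// The helper name is hypothetical, chosen only for this illustration.
func topLevelFuncs(src string) ([]string, error) {
	fset := token.NewFileSet()
	// When the src argument is non-nil, ParseFile reads from it instead of
	// the file system; the filename is only used for position information.
	f, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range f.Decls {
		// Top-level declarations are usually either *ast.GenDecl (import,
		// const, type, var) or *ast.FuncDecl (functions and methods).
		if fn, ok := decl.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	src := `package demo

import "fmt"

func hello() { fmt.Println("hi") }

func add(a, b int) int { return a + b }
`
	names, err := topLevelFuncs(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints [hello add]
}
```

The import declaration is also in f.Decls (as an *ast.GenDecl), which is why the type assertion is needed to keep only functions.
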
Finding fmt.Println and fmt.Printf calls can be done through two steps. First, find all function calls. Then filter those calls by function name.

All statements and expressions, including function calls, are stored as tree nodes within the AST we generated from the source code. If we do a full traversal of the AST, we should be able to hit all the function calls.

Because AST traversal is such a common operation, the go/ast package comes with a helper function called ast.Inspect. When invoked, this function traverses the AST in depth-first order and calls the provided callback on each syntax tree node. Returning true from the callback tells ast.Inspect to keep descending into that node’s children:

    // print all function calls (this needs "go/ast" added to the import block)
    ast.Inspect(f, func(n ast.Node) bool {
        switch x := n.(type) {
        case *ast.CallExpr:
            ast.Print(fset, x.Fun)  // ast.Print is handy for debugging
        }
        return true
    })

Let’s run the code and see if we can find the AST node for fmt.Printf:

$ go run . ./main.go
Parsing source file ./main.go...
<...>
     0  *ast.SelectorExpr {
     1  .  X: *ast.Ident {
     2  .  .  NamePos: ./main.go:13:2
     3  .  .  Name: "fmt"
     4  .  }
     5  .  Sel: *ast.Ident {
     6  .  .  NamePos: ./main.go:13:6
     7  .  .  Name: "Printf"
     8  .  }
     9  }
<...>

Notice that the fmt.Printf call is parsed into an *ast.SelectorExpr struct, with fmt as the expression (struct field X) and Printf as the selector (struct field Sel).
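
As a small, self-contained illustration of that X and Sel structure, the sketch below collects every call of the form identifier.Name from a source string. The helper name qualifiedCalls and the sample source are made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// qualifiedCalls returns "X.Sel" for every call whose callee is a selector
// on a plain identifier, in the order ast.Inspect visits the calls.
// The helper name is hypothetical, chosen only for this illustration.
func qualifiedCalls(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var calls []string
	ast.Inspect(f, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true // not a call, keep descending
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true // a builtin or local function such as len(x)
		}
		if id, ok := sel.X.(*ast.Ident); ok {
			calls = append(calls, id.Name+"."+sel.Sel.Name)
		}
		return true
	})
	return calls, nil
}

func main() {
	src := `package demo

import (
	"fmt"
	"os"
)

func run() {
	fmt.Printf("%d args\n", len(os.Args))
	os.Exit(0)
}
`
	calls, err := qualifiedCalls(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(calls) // prints [fmt.Printf os.Exit]
}
```

The len call is skipped because its Fun field is a plain *ast.Ident, and os.Args is skipped because it is a selector that is never called.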

With this information, we can add a couple of filter rules in the callback to focus only on print function calls:

    hasPrint := false
    ast.Inspect(f, func(n ast.Node) bool {
        switch x := n.(type) {
        case *ast.CallExpr:
            selexpr, ok := x.Fun.(*ast.SelectorExpr)
            if !ok {
                return true
            }
            ident, ok := selexpr.X.(*ast.Ident)
            if !ok || ident.Name != "fmt" {
                return true
            }
            if selexpr.Sel.Name == "Printf" || selexpr.Sel.Name == "Println" {
                // convert compact token position to raw source position for display
                pos := fset.Position(selexpr.Sel.Pos())
                fmt.Printf("Use of `fmt.%s` detected at %v\n", selexpr.Sel.Name, pos)
                hasPrint = true
            }
        }
        return true
    })
    if hasPrint {
        os.Exit(1)
    } else {
        fmt.Println("All good!")
    }

Here is what the final output looks like:

$ go run . ./main.go
Use of `fmt.Printf` detected at ./main.go:33:9
Use of `fmt.Println` detected at ./main.go:43:7
exit status 1

Not bad for fewer than 50 lines of code, right?
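
One caveat worth knowing: the check compares the identifier against the literal name fmt, so it would miss a file that imports the package under an alias (for example import out "fmt"). A rough sketch of one way to handle that, using only the import list we already have, is shown here. The helpers localName and printCalls are hypothetical names for this example, and the approach still cannot tell a package apart from a local variable that shadows it; that kind of resolution needs type information.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"path"
	"strconv"
)

// localName reports the identifier under which importPath is visible in f.
// It returns "" when the file does not import the package, or imports it
// as "_" or "." (neither form can be matched by a selector check).
// Hypothetical helper, not part of the original tool.
func localName(f *ast.File, importPath string) string {
	for _, imp := range f.Imports {
		p, err := strconv.Unquote(imp.Path.Value)
		if err != nil || p != importPath {
			continue
		}
		if imp.Name == nil {
			// No alias. Assumption: the package name equals the last path
			// element, which holds for "fmt" but not for every package.
			return path.Base(p)
		}
		if imp.Name.Name == "_" || imp.Name.Name == "." {
			return ""
		}
		return imp.Name.Name
	}
	return ""
}

// printCalls returns the names of the Printf/Println calls made through the
// fmt package in src, whatever local name fmt was imported under.
func printCalls(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	name := localName(f, "fmt")
	if name == "" {
		return nil, nil
	}
	var found []string
	ast.Inspect(f, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		id, ok := sel.X.(*ast.Ident)
		if !ok || id.Name != name {
			return true
		}
		if sel.Sel.Name == "Printf" || sel.Sel.Name == "Println" {
			found = append(found, sel.Sel.Name)
		}
		return true
	})
	return found, nil
}

func main() {
	src := `package demo

import out "fmt"

func run() {
	out.Println("hello")
}
`
	found, err := printCalls(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(found) // prints [Println]
}
```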

What we have built so far only works on a single source file, which is not very useful. Using the golang.org/x/tools/go/packages package, we can parse all source files within a given package path with just a couple of lines of code.
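
The go/packages loader itself is not shown here. As a rough standard-library stand-in (not the approach named above), the sketch below parses every non-test .go file directly inside one directory by calling parser.ParseFile in a loop. The helper name parseDirFiles is made up, and unlike go/packages this ignores build tags, modules, and dependencies.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
	"path/filepath"
	"strings"
)

// parseDirFiles parses every non-test .go file directly inside dir and
// returns the shared FileSet together with the parsed files.
// Hypothetical helper: a simplified stand-in for a real package loader.
func parseDirFiles(dir string) (*token.FileSet, []*ast.File, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, nil, err
	}
	fset := token.NewFileSet()
	var files []*ast.File
	for _, e := range entries {
		name := e.Name()
		if e.IsDir() || !strings.HasSuffix(name, ".go") || strings.HasSuffix(name, "_test.go") {
			continue
		}
		f, err := parser.ParseFile(fset, filepath.Join(dir, name), nil, 0)
		if err != nil {
			return nil, nil, err
		}
		files = append(files, f)
	}
	return fset, files, nil
}

func main() {
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	fset, files, err := parseDirFiles(dir)
	if err != nil {
		panic(err)
	}
	for _, f := range files {
		// f.Package is the position of the "package" keyword in the file.
		filename := fset.Position(f.Package).Filename
		fmt.Printf("%s: package %s, %d imports\n", filename, f.Name.Name, len(f.Imports))
	}
}
```

Each returned *ast.File can be handed to the same ast.Inspect callback used for the single-file version.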

While the go/ast package is straightforward to use, we can’t run deeper analysis with it alone without a lot of extra work. For example, we don’t have access to type information or function call graphs. Luckily, we can get all of those through the golang.org/x/tools/go/ssa and golang.org/x/tools/go/pointer packages with very little effort.

In the next blog post, I will cover how sqlvet leverages those packages and other techniques to discover SQL statements in a code base and analyze them at build time to prevent runtime errors.