[string] How to get the number of characters in a string

How can I get the number of characters of a string in Go?

For example, if I have a string "hello" the method should return 5. I saw that len(str) returns the number of bytes and not the number of characters so len("£") returns 2 instead of 1 because £ is encoded with two bytes in UTF-8.

This question is related to string go character string-length

The answer is


There is a way to get count of runes without any packages by converting string to []rune as len([]rune(YOUR_STRING)):

package main

import "fmt"

func main() {
    russian := "??????? ? ??????"
    english := "Sputnik & pogrom"

    fmt.Println("count of bytes:",
        len(russian),
        len(english))

    fmt.Println("count of runes:",
        len([]rune(russian)),
        len([]rune(english)))

}

count of bytes 30 16

count of runes 16 16


If you need to take grapheme clusters into account, use regexp or unicode module. Counting the number of code points(runes) or bytes also is needed for validaiton since the length of grapheme cluster is unlimited. If you want to eliminate extremely long sequences, check if the sequences conform to stream-safe text format.

package main

import (
    "regexp"
    "unicode"
    "strings"
)

func main() {

    str := "\u0308" + "a\u0308" + "o\u0308" + "u\u0308"
    str2 := "a" + strings.Repeat("\u0308", 1000)

    println(4 == GraphemeCountInString(str))
    println(4 == GraphemeCountInString2(str))

    println(1 == GraphemeCountInString(str2))
    println(1 == GraphemeCountInString2(str2))

    println(true == IsStreamSafeString(str))
    println(false == IsStreamSafeString(str2))
}


func GraphemeCountInString(str string) int {
    re := regexp.MustCompile("\\PM\\pM*|.")
    return len(re.FindAllString(str, -1))
}

func GraphemeCountInString2(str string) int {

    length := 0
    checked := false
    index := 0

    for _, c := range str {

        if !unicode.Is(unicode.M, c) {
            length++

            if checked == false {
                checked = true
            }

        } else if checked == false {
            length++
        }

        index++
    }

    return length
}

func IsStreamSafeString(str string) bool {
    re := regexp.MustCompile("\\PM\\pM{30,}") 
    return !re.MatchString(str) 
}

There are several ways to get a string length:

package main

import (
    "bytes"
    "fmt"
    "strings"
    "unicode/utf8"
)

func main() {
    b := "?????"
    len1 := len([]rune(b))
    len2 := bytes.Count([]byte(b), nil) -1
    len3 := strings.Count(b, "") - 1
    len4 := utf8.RuneCountInString(b)
    fmt.Println(len1)
    fmt.Println(len2)
    fmt.Println(len3)
    fmt.Println(len4)

}


I should point out that none of the answers provided so far give you the number of characters as you would expect, especially when you're dealing with emojis (but also some languages like Thai, Korean, or Arabic). VonC's suggestions will output the following:

fmt.Println(utf8.RuneCountInString("??")) // Outputs "6".
fmt.Println(len([]rune("??"))) // Outputs "6".

That's because these methods only count Unicode code points. There are many characters which can be composed of multiple code points.

Same for using the Normalization package:

var ia norm.Iter
ia.InitString(norm.NFKD, "??")
nc := 0
for !ia.Done() {
    nc = nc + 1
    ia.Next()
}
fmt.Println(nc) // Outputs "6".

Normalization is not really the same as counting characters and many characters cannot be normalized into a one-code-point equivalent.

masakielastic's answer comes close but only handles modifiers (the rainbow flag contains a modifier which is thus not counted as its own code point):

fmt.Println(GraphemeCountInString("??"))  // Outputs "5".
fmt.Println(GraphemeCountInString2("??")) // Outputs "5".

The correct way to split Unicode strings into (user-perceived) characters, i.e. grapheme clusters, is defined in the Unicode Standard Annex #29. The rules can be found in Section 3.1.1. The github.com/rivo/uniseg package implements these rules so you can determine the correct number of characters in a string:

fmt.Println(uniseg.GraphemeClusterCount("??")) // Outputs "2".

Depends a lot on your definition of what a "character" is. If "rune equals a character " is OK for your task (generally it isn't) then the answer by VonC is perfect for you. Otherwise, it should be probably noted, that there are few situations where the number of runes in a Unicode string is an interesting value. And even in those situations it's better, if possible, to infer the count while "traversing" the string as the runes are processed to avoid doubling the UTF-8 decode effort.


I tried to make to do the normalization a bit faster:

    en, _ = glyphSmart(data)

    func glyphSmart(text string) (int, int) {
        gc := 0
        dummy := 0
        for ind, _ := range text {
            gc++
            dummy = ind
        }
        dummy = 0
        return gc, dummy
    }

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to go

Has been blocked by CORS policy: Response to preflight request doesn’t pass access control check Go test string contains substring Golang read request body How to uninstall Golang? Decode JSON with unknown structure Access HTTP response as string in Go How to search for an element in a golang slice How to delete an element from a Slice in Golang How to set default values in Go structs MINGW64 "make build" error: "bash: make: command not found"

Examples related to character

Set the maximum character length of a UITextField in Swift Max length UITextField Remove last character from string. Swift language Get nth character of a string in Swift programming language How many characters can you store with 1 byte? How to convert integers to characters in C? Converting characters to integers in Java How to check the first character in a string in Bash or UNIX shell? Invisible characters - ASCII How to delete Certain Characters in a excel 2010 cell

Examples related to string-length

String field value length in mongoDB SQL SELECT everything after a certain character Length of string in bash Get column value length, not column max length of value How to get the number of characters in a string How to find the length of a string in R Retrieve the maximum length of a VARCHAR column in SQL Server Measure string size in Bytes in php How to get the size of a string in Python? Display only 10 characters of a long string?