How to get the number of characters in a string

Question

How can I get the number of characters of a string in Go? For example, if I have a string "hello", the method should return 5. I saw that len(str) returns the number of bytes and not the number of characters, so len("£") returns 2 instead of 1 because £ is encoded with two bytes in UTF-8.

User · Answer

You can count runes without importing any package by converting the string to []rune and taking its length, len([]rune(YOUR_STRING)):

package main

import "fmt"

func main() {
    russian := "Спутник и погром"
    english := "Sputnik & pogrom"

    fmt.Println("count of bytes:",
        len(russian),
        len(english))

    fmt.Println("count of runes:",
        len([]rune(russian)),
        len([]rune(english)))

}

count of bytes: 30 16

count of runes: 16 16

User · Answer

Depends a lot on your definition of what a "character" is. If "rune equals a character" is OK for your task (generally it isn't), then the answer by VonC is perfect for you. Otherwise, it should probably be noted that there are few situations where the number of runes in a Unicode string is an interesting value. And even in those situations it's better, if possible, to infer the count while "traversing" the string as the runes are processed, to avoid doubling the UTF-8 decode effort.
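As a rough illustration of counting while traversing, the sketch below tallies runes in the same range loop that does the actual processing. The function name countWhileScanning and the "count letters" task are assumptions made up for this example, not something from the answer above.

```go
package main

import (
	"fmt"
	"unicode"
)

// countWhileScanning walks s once. Ranging over a string decodes its UTF-8
// into runes, so the rune count comes for free alongside the real work
// (here, hypothetically, counting letters).
func countWhileScanning(s string) (runes, letters int) {
	for _, r := range s {
		runes++
		if unicode.IsLetter(r) {
			letters++
		}
	}
	return runes, letters
}

func main() {
	// "héllo, 世界" written with escapes so the code points are unambiguous.
	runes, letters := countWhileScanning("h\u00e9llo, \u4e16\u754c")
	fmt.Println(runes, letters) // prints: 9 7
}
```

The string is decoded only once here, whereas calling a rune counter first and then looping would decode it twice.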

User · Answer

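I tried to make the normalization a bit faster:

en, _ = glyphSmart(data)

func glyphSmart(text string) (int, int) {
    gc := 0
    dummy := 0
    // Ranging over a string steps rune by rune, so gc ends up as the rune count.
    for ind := range text {
        gc++
        dummy = ind
    }
    dummy = 0
    return gc, dummy
}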

User · Answer

You can try RuneCountInString from the utf8 package. It returns the number of runes in the string, so, as illustrated in this script, the length of "World" might be 6 (when written in Chinese: "世界"), but its rune count is 2:

package main

import "fmt"
import "unicode/utf8"

func main() {
    fmt.Println("Hello, 世界", len("世界"), utf8.RuneCountInString("世界"))
}

Phrozen adds in the comments:

Actually you can do len() over runes by just type casting. len([]rune("世界")) will print 2. At least in Go 1.3.

And with CL 108985 (May 2018, for Go 1.11), len([]rune(string)) is now optimized. (Fixes issue 24923)

The compiler detects the len([]rune(string)) pattern automatically, and replaces it with a for r := range s call.

Adds a new runtime function to count runes in a string. Modifies the compiler to detect the pattern len([]rune(string)) and replaces it with the new rune counting runtime function.

name                                old time/op  new time/op  delta
RuneCount/lenruneslice/ASCII        27.8ns ± 2%  14.5ns ± 3%  -47.70%
RuneCount/lenruneslice/Japanese     126ns ± 2%   60ns ± 2%    -52.03%
RuneCount/lenruneslice/MixedLength  104ns ± 2%   50ns ± 1%    -51.71%

Stefan Steiger points to the blog post "Text normalization in Go":

What is a character?

As was mentioned in the strings blog post, characters can span multiple runes. For example, an 'e' and '◌́' (acute "\u0301") can combine to form 'é' ("e\u0301" in NFD). Together these two runes are one character.

The definition of a character may vary depending on the application. For normalization we will define it as: a sequence of runes that starts with a starter (a rune that does not modify or combine backwards with any other rune), followed by a possibly empty sequence of non-starters, that is, runes that do (typically accents).

The normalization algorithm processes one character at a time.

Using that package and its Iter type, the actual number of "characters" would be:

package main

import "fmt"
import "golang.org/x/text/unicode/norm"

func main() {
    var ia norm.Iter
    ia.InitString(norm.NFKD, "école")
    nc := 0
    for !ia.Done() {
        nc = nc + 1
        ia.Next()
    }
    fmt.Printf("Number of chars: %d\n", nc)
}

Here, this uses the Unicode Normalization form NFKD, "Compatibility Decomposition".

Oliver's answer points to UNICODE TEXT SEGMENTATION as the only way to reliably determine default boundaries between certain significant text elements: user-perceived characters, words, and sentences.

For that, you need an external library like rivo/uniseg, which does Unicode Text Segmentation.

That will actually count "grapheme clusters", where multiple code points may be combined into one user-perceived character.

package main

import (
    "fmt"

    "github.com/rivo/uniseg"
)

func main() {
    gr := uniseg.NewGraphemes("👍🏼!")
    for gr.Next() {
        fmt.Printf("%x ", gr.Runes())
    }
    // Output: [1f44d 1f3fc] [21]
}

Two graphemes, even though there are three runes (Unicode code points).

You can see other examples in "How to manipulate strings in GO to reverse them?"

👩🏾‍🦰 alone is one grapheme, but, according to a Unicode to code points converter, 4 runes: woman (1f469), dark skin (1f3fe), ZERO WIDTH JOINER (200d), red hair (1f9b0).
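To make the 'e' plus combining acute example concrete, here is a minimal standard-library sketch comparing the precomposed and decomposed spellings of é. The helper name sizes is an assumption introduced only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes reports the byte length and the rune count of s.
func sizes(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9" // é as a single code point
	decomposed := "e\u0301" // 'e' followed by COMBINING ACUTE ACCENT

	for _, s := range []string{precomposed, decomposed} {
		b, r := sizes(s)
		// %+q prints the string with non-ASCII code points escaped.
		fmt.Printf("%+q: %d bytes, %d runes\n", s, b, r)
	}
}
```

Both strings render as the same single character, yet the first is 2 bytes and 1 rune while the second is 3 bytes and 2 runes, which is why neither len nor a rune count matches what a reader sees.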

User · Answer

If you need to take grapheme clusters into account, use the regexp or unicode package. Counting the number of code points (runes) or bytes is also needed for validation, since the length of a grapheme cluster is unlimited. If you want to eliminate extremely long sequences, check whether the sequences conform to the stream-safe text format.

package main

import (
    "regexp"
    "strings"
    "unicode"
)

func main() {
    str := "\u0308" + "a\u0308" + "o\u0308" + "u\u0308"
    str2 := "a" + strings.Repeat("\u0308", 1000)

    println(4 == GraphemeCountInString(str))
    println(4 == GraphemeCountInString2(str))

    println(1 == GraphemeCountInString(str2))
    println(1 == GraphemeCountInString2(str2))

    println(true == IsStreamSafeString(str))
    println(false == IsStreamSafeString(str2))
}

// GraphemeCountInString counts a non-mark rune plus any marks that follow it as one unit.
func GraphemeCountInString(str string) int {
    re := regexp.MustCompile("\\PM\\pM*|.")
    return len(re.FindAllString(str, -1))
}

// GraphemeCountInString2 does the same with a loop instead of a regexp.
func GraphemeCountInString2(str string) int {
    length := 0
    checked := false // true once the first non-mark rune has been seen

    for _, c := range str {
        if !unicode.Is(unicode.M, c) {
            length++
            checked = true
        } else if !checked {
            // Marks at the very start have no base rune, so each counts on its own.
            length++
        }
    }

    return length
}

// IsStreamSafeString reports false when a base rune is followed by a very long run of marks.
func IsStreamSafeString(str string) bool {
    re := regexp.MustCompile("\\PM\\pM{30,}")
    return !re.MatchString(str)
}

User · Answer

I should point out that none of the answers provided so far give you the number of characters as you would expect, especially when you're dealing with emojis (but also some languages like Thai, Korean, or Arabic). VonC's suggestions will output the following:

fmt.Println(utf8.RuneCountInString("🏳️‍🌈🇩🇪")) // Outputs "6".
fmt.Println(len([]rune("🏳️‍🌈🇩🇪"))) // Outputs "6".

That's because these methods only count Unicode code points. There are many characters which can be composed of multiple code points.

Same for using the Normalization package:

var ia norm.Iter
ia.InitString(norm.NFKD, "🏳️‍🌈🇩🇪")
nc := 0
for !ia.Done() {
    nc = nc + 1
    ia.Next()
}
fmt.Println(nc) // Outputs "6".

Normalization is not really the same as counting characters, and many characters cannot be normalized into a one-code-point equivalent.

masakielastic's answer comes close but only handles modifiers (the rainbow flag contains a modifier which is thus not counted as its own code point):

fmt.Println(GraphemeCountInString("🏳️‍🌈🇩🇪"))  // Outputs "5".
fmt.Println(GraphemeCountInString2("🏳️‍🌈🇩🇪")) // Outputs "5".

The correct way to split Unicode strings into (user-perceived) characters, i.e. grapheme clusters, is defined in the Unicode Standard Annex #29. The rules can be found in Section 3.1.1. The github.com/rivo/uniseg package implements these rules so you can determine the correct number of characters in a string:

fmt.Println(uniseg.GraphemeClusterCount("🏳️‍🌈🇩🇪")) // Outputs "2".
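To see where a count of 6 comes from, it can help to list the individual code points. The sketch below uses only the standard library; codePoints is a hypothetical helper, and the test string is an assumption: a rainbow flag followed by a regional-indicator flag (the German flag is picked here), written with escapes so the invisible code points are explicit.

```go
package main

import "fmt"

// codePoints returns every rune of s formatted as U+XXXX.
func codePoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	// WHITE FLAG, VARIATION SELECTOR-16, ZERO WIDTH JOINER, RAINBOW,
	// then REGIONAL INDICATOR D and REGIONAL INDICATOR E.
	flags := "\U0001F3F3\uFE0F\u200D\U0001F308\U0001F1E9\U0001F1EA"
	for _, cp := range codePoints(flags) {
		fmt.Println(cp)
	}
}
```

Six lines are printed for what a reader perceives as two flags. The first four code points form one grapheme cluster and the last two form another, which is the grouping that UAX #29 segmentation recovers.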

User · Answer

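There are several ways to get a string length:

package main

import (
    "bytes"
    "fmt"
    "strings"
    "unicode/utf8"
)

func main() {
    b := "这是个测试"
    // All four expressions count runes (code points), not bytes or grapheme clusters.
    len1 := len([]rune(b))
    len2 := bytes.Count([]byte(b), nil) - 1
    len3 := strings.Count(b, "") - 1
    len4 := utf8.RuneCountInString(b)
    fmt.Println(len1)
    fmt.Println(len2)
    fmt.Println(len3)
    fmt.Println(len4)
}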
