How to define hash tables in Bash

Question

What is the equivalent of Python dictionaries in Bash? It should work across OS X and Linux.

User · Answer

I agree with @lhunath and others that associative arrays are the way to go with Bash 4. If you are stuck with Bash 3 (OS X, old distros that you cannot update), you can also use expr, which should be available everywhere, together with a string and regular expressions. I like it especially when the dictionary is not too big.

1. Choose 2 separators that you will not use in keys and values (e.g. ',' and ':').

2. Write your map as a string (note the separator ',' also at the beginning and end):

animals=",moo:cow,woof:dog,"

3. Use a regex to extract the values:

get_animal() {
    echo "$(expr "$animals" : ".*,$1:\([^,]*\),.*")"
}

4. Split the string to list the items:

get_animal_items() {
    arr=$(echo "${animals:1:${#animals}-2}" | tr "," "\n")
    for i in $arr
    do
        value="${i##*:}"
        key="${i%%:*}"
        echo "${value} likes to $key"
    done
}

Now you can use it:

$ animal=$(get_animal "moo"); echo "$animal"
cow
$ get_animal_items
cow likes to moo
dog likes to woof

User · Answer

Just use the file system

The file system is a tree structure that can be used as a hash map. Your hash table will be a temporary directory, your keys will be filenames, and your values will be file contents. The advantage is that it can handle huge hashmaps, and doesn't require a specific shell.

Hashtable creation

hashtable=$(mktemp -d)

Add an element

echo "$value" > "$hashtable/$key"

Read an element

value=$(< "$hashtable/$key")

Performance

Of course, it's slow, but not that slow. I tested it on my machine, with an SSD and btrfs, and it does around 3000 element reads/writes per second.
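The three operations above can be wrapped into small functions. The following is only a sketch under a few assumptions: the names ht_put, ht_get, ht_has and ht_keys are hypothetical (they are not from the answer), keys must be valid file names (no '/'), and the trap removes the directory when the script exits.

```shell
#!/bin/sh
# Directory-backed map: one file per key, file contents are the value.
# ht_put, ht_get, ht_has and ht_keys are hypothetical helper names.

hashtable=$(mktemp -d)
trap 'rm -rf "$hashtable"' EXIT      # delete the map when the script exits

ht_put()  { printf '%s' "$2" > "$hashtable/$1"; }   # ht_put KEY VALUE
ht_get()  { cat "$hashtable/$1" 2>/dev/null; }      # prints nothing if KEY is missing
ht_has()  { [ -e "$hashtable/$1" ]; }               # succeeds if KEY exists
ht_keys() { ls -1 "$hashtable"; }                   # one key per line

ht_put moo cow
ht_put woof dog

echo "moo -> $(ht_get moo)"
ht_has quack || echo "quack is not in the map"
ht_keys
```

Writing a key twice simply overwrites the file, so key uniqueness comes for free.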

User · Answer

Bash 4

Bash 4 natively supports this feature. Make sure your script's hashbang is #!/usr/bin/env bash or #!/bin/bash so you don't end up using sh. Make sure you're either executing your script directly, or executing it with bash script. (Not actually executing a Bash script with Bash does happen, and will be really confusing!)

You declare an associative array by doing:

declare -A animals

You can fill it up with elements using the normal array assignment operator. For example, if you want to have a map of animal[sound(key)] = animal(value):

animals=( ["moo"]="cow" ["woof"]="dog" )

Or merge them:

declare -A animals=( ["moo"]="cow" ["woof"]="dog" )

Then use them just like normal arrays. Use:

animals['key']='value' to set a value
"${animals[@]}" to expand the values
"${!animals[@]}" (notice the !) to expand the keys

Don't forget to quote them:

echo "${animals[moo]}"
for sound in "${!animals[@]}"; do echo "$sound - ${animals[$sound]}"; done

Bash 3

Before bash 4, you don't have associative arrays. Do not use eval to emulate them. Avoid eval like the plague, because it is the plague of shell scripting. The most important reason is that eval treats your data as executable code (there are many other reasons too).

First and foremost: consider upgrading to bash 4. This will make the whole process much easier for you.

If there's a reason you can't upgrade, declare is a far safer option. It does not evaluate data as bash code like eval does, and as such does not allow arbitrary code injection quite so easily.

Let's prepare the answer by introducing the concepts.

First, indirection:

$ animals_moo=cow; sound=moo; i="animals_$sound"; echo "${!i}"
cow

Secondly, declare:

$ sound=moo; animal=cow; declare "animals_$sound=$animal"; echo "$animals_moo"
cow

Bring them together:

# Set a value:
declare "array_$index=$value"

# Get a value:
arrayGet() {
    local array=$1 index=$2
    local i="${array}_$index"
    printf '%s' "${!i}"
}

Let's use it:

$ sound=moo; animal=cow; declare "animals_$sound=$animal"; arrayGet animals "$sound"
cow

Note: declare cannot be put in a function. Any use of declare inside a bash function makes the variable it creates local to the scope of that function, meaning we can't access or modify global arrays with it. (In bash 4.2 and later you can use declare -g to declare global variables, but in bash 4 you can use associative arrays in the first place, avoiding this workaround.)

Summary:

Upgrade to bash 4 and use declare -A for associative arrays.
Use the declare option if you can't upgrade.
Consider using awk instead and avoid the issue altogether.
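The Bash 4 part of this answer covers setting, reading and iterating. Two operations that a dictionary usually also needs are a key-existence test and deletion. The following sketch shows one way to do both; has_sound is a hypothetical helper name, and the script assumes Bash 4.0 or later.

```shell
#!/usr/bin/env bash
# Requires Bash 4.0 or later (declare -A).
declare -A animals=( ["moo"]="cow" ["woof"]="dog" )

# has_sound is a hypothetical helper: it succeeds if the key exists,
# even when the stored value is the empty string.
has_sound() { [[ -n ${animals[$1]+set} ]]; }

has_sound moo   && echo "moo -> ${animals[moo]}"
has_sound quack || echo "quack is not in the map"

unset 'animals[woof]'                 # delete a single key (quoted to avoid globbing)
echo "entries left: ${#animals[@]}"
```

The ${name[key]+set} expansion yields "set" only when the element is defined, which distinguishes a missing key from a key whose value is empty.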

User · Answer

You can further modify the hput()/hget() interface so that you have named hashes as follows:

hput() {
    eval "$1""$2"='$3'
}

hget() {
    eval echo '${'"$1$2"'#hash}'
}

and then

hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid
echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`

This lets you define other maps that don't conflict (e.g., 'rcapitals', which does country lookup by capital city). But, either way, I think you'll find that this is all pretty terrible, performance-wise.

If you really want fast hash lookup, there's a terrible, terrible hack that actually works really well. It is this: write your key/values out to a temporary file, one per line, then use 'grep "^$key"' to get them out, using pipes with cut or awk or sed or whatever to retrieve the values.

Like I said, it sounds terrible, and it sounds like it ought to be slow and do all sorts of unnecessary IO, but in practice it is very fast (disk cache is awesome, ain't it?), even for very large hash tables. You have to enforce key uniqueness yourself, etc. Even if you only have a few hundred entries, the output file/grep combo is going to be quite a bit faster, in my experience several times faster. It also eats less memory.

Here's one way to do it:

hinit() {
    rm -f "/tmp/hashmap.$1"
}

hput() {
    echo "$2 $3" >> "/tmp/hashmap.$1"
}

hget() {
    grep "^$2 " "/tmp/hashmap.$1" | awk '{ print $2 };'
}

hinit capitals
hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid

echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`
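The answer notes that key uniqueness has to be enforced by hand with the flat-file approach. One possible way to do that is sketched below: hput_unique is a hypothetical replacement for hput that rewrites the file without the old line for the key before appending the new pair. It assumes keys contain no whitespace and trades the cheap append for a full rewrite on every put.

```shell
#!/usr/bin/env bash
# Flat-file map with overwrite-on-put. hput_unique is a hypothetical
# name; keys are assumed to contain no whitespace.

hinit() {
    rm -f "/tmp/hashmap.$1"
    touch "/tmp/hashmap.$1"
}

hput_unique() {
    local file="/tmp/hashmap.$1" key=$2 value=$3 k v
    # Copy every line except the one for this key, then append the new pair.
    while read -r k v; do
        [ "$k" = "$key" ] || printf '%s %s\n' "$k" "$v"
    done < "$file" > "$file.tmp"
    printf '%s %s\n' "$key" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}

hget() {
    grep "^$2 " "/tmp/hashmap.$1" | awk '{ print $2 }'
}

hinit capitals
hput_unique capitals France Lyon
hput_unique capitals Spain Madrid
hput_unique capitals France Paris     # replaces the earlier France line
hget capitals France
```

As in the original hget, grep treats the key as a regular expression, so keys containing regex metacharacters would need escaping.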

User · Answer

I really liked Al P's answer but wanted uniqueness enforced cheaply, so I took it one step further: use a directory. There are some obvious limitations (directory file limits, invalid file names), but it should work for most cases.

hinit() {
    rm -rf "/tmp/hashmap.$1"
    mkdir -p "/tmp/hashmap.$1"
}

hput() {
    printf '%s' "$3" > "/tmp/hashmap.$1/$2"
}

hget() {
    cat "/tmp/hashmap.$1/$2"
}

hkeys() {
    ls -1 "/tmp/hashmap.$1"
}

hdestroy() {
    rm -rf "/tmp/hashmap.$1"
}

hinit ids

for (( i = 0; i < 10000; i++ )); do
    hput ids "key$i" "value$i"
done

for (( i = 0; i < 10000; i++ )); do
    printf '%s\n' "$(hget ids "key$i")" > /dev/null
done

hdestroy ids

It also performs a tad bit better in my tests.

$ time bash hash.sh
real    0m46.500s
user    0m16.767s
sys     0m51.473s

$ time bash dirhash.sh
real    0m35.875s
user    0m8.002s
sys     0m24.666s

Just thought I'd pitch in. Cheers!

Edit: Adding hdestroy()

User · Answer

Consider a solution using the bash builtin read, as illustrated in the following code snippet from a ufw firewall script. This approach has the advantage of supporting as many delimited fields (not just 2) as desired. We have used the | delimiter because port range specifiers may require a colon, i.e. 6001:6010.

#!/usr/bin/env bash

readonly connections=(
                            '192.168.1.4/24|tcp|22'
                            '192.168.1.4/24|tcp|53'
                            '192.168.1.4/24|tcp|80'
                            '192.168.1.4/24|tcp|139'
                            '192.168.1.4/24|tcp|443'
                            '192.168.1.4/24|tcp|445'
                            '192.168.1.4/24|tcp|631'
                            '192.168.1.4/24|tcp|5901'
                            '192.168.1.4/24|tcp|6566'
)

function set_connections(){
    local range proto port
    for fields in "${connections[@]}"
    do
            IFS=$'|' read -r range proto port <<< "$fields"
            ufw allow from "$range" proto "$proto" to any port "$port"
    done
}

set_connections
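The same read-based splitting can be tried without a firewall. The sketch below uses made-up records (the names, roles and ids are placeholders) and a hypothetical get_role helper that treats the first field as the key. It relies only on indexed arrays and here-strings, so it should also run on Bash 3.

```shell
#!/usr/bin/env bash
# Hypothetical records with three '|'-separated fields: name, role, id.
records=(
    'alice|admin|1001'
    'bob|staff|1002'
)

# get_role NAME prints the second field of the record whose first field
# is NAME and returns 1 if there is no such record (hypothetical helper).
get_role() {
    local rec name role id
    for rec in "${records[@]}"; do
        IFS='|' read -r name role id <<< "$rec"
        if [[ $name == "$1" ]]; then
            printf '%s\n' "$role"
            return 0
        fi
    done
    return 1
}

get_role bob
get_role carol || echo "no record for carol"
```

Setting IFS only on the read command keeps the change local to that one call, so the rest of the script still splits on whitespace.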

User · Answer

This is what I was looking for here:

declare -A hashmap
hashmap["key"]="value"
hashmap["key2"]="value2"
echo "${hashmap["key"]}"
for key in ${!hashmap[@]}; do echo $key; done
for value in ${hashmap[@]}; do echo $value; done
echo hashmap has ${#hashmap[@]} elements

This did not work for me with bash 4.1.5:

animals=( ["moo"]="cow" )

User · Answer

Bash 3 solution:

In reading some of the answers I put together a quick little function I would like to contribute back that might help others.

# Define a hash like this
MYHASH=("firstName:Milan"
        "lastName:Adamovsky")

# Function to get value by key
getHashKey()
{
  declare -a hash=("${!1}")
  local key KEY VALUE
  local lookup=$2

  for key in "${hash[@]}" ; do
    KEY=${key%%:*}
    VALUE=${key#*:}
    if [[ $KEY == "$lookup" ]]
    then
      echo "$VALUE"
    fi
  done
}

# Function to get a list of all keys
getHashKeys()
{
  declare -a hash=("${!1}")
  local KEY
  local VALUE
  local key
  local keys=""

  for key in "${hash[@]}" ; do
    KEY=${key%%:*}
    VALUE=${key#*:}
    keys+="${KEY} "
  done

  echo $keys
}

# Here we want to get the value of 'lastName'
echo $(getHashKey MYHASH[@] "lastName")

# Here we want to get all keys
echo $(getHashKeys MYHASH[@])

User · Answer

I also used the bash4 way, but I found an annoying bug.

I needed to update the associative array content dynamically, so I did it this way:

for instanceId in $instanceList
do
   aws cloudwatch describe-alarms --output json --alarm-name-prefix $instanceId| jq '.["MetricAlarms"][].StateValue'| xargs | grep -E 'ALARM|INSUFFICIENT_DATA'
   [ $? -eq 0 ] && statusCheck+=([$instanceId]="checkKO") || statusCheck+=([$instanceId]="allCheckOk")
done

I found that with bash 4.3.11, using += on a key already present in the dict appended the new value to the existing one. So, for example, after some repetitions the value was "checkKOcheckKOallCheckOk", which was not what I wanted.

There was no problem with bash 4.3.39, where using += on an existing key replaces the current value.

I solved this by clearing and redeclaring the statusCheck associative array before the loop:

unset statusCheck; declare -A statusCheck
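Another way to sidestep the version difference, offered here as an assumption rather than as the author's fix, is to use plain element assignment, statusCheck[key]=value, which replaces the stored value instead of going through the += compound form. In the sketch below, check_instance is a hypothetical stand-in for the aws/jq/grep pipeline and the instance ids are made up.

```shell
#!/usr/bin/env bash
# Requires Bash 4.0 or later (declare -A).
declare -A statusCheck

# check_instance is a hypothetical stand-in for the aws/jq/grep pipeline:
# here it reports a problem only for ids ending in "bad".
check_instance() { [[ $1 != *bad ]]; }

update_status() {
    local id
    for id in "$@"; do
        if check_instance "$id"; then
            statusCheck[$id]="allCheckOk"   # plain assignment replaces the value
        else
            statusCheck[$id]="checkKO"
        fi
    done
}

update_status i-1 i-2bad
update_status i-1 i-2bad      # a second pass overwrites, it does not append

for id in "${!statusCheck[@]}"; do
    echo "$id: ${statusCheck[$id]}"
done
```

With this form the array can be updated repeatedly without being unset and redeclared between passes.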

User · Answer

I create HashMaps in bash 3 using dynamic variables. I explained how that works in my answer to: Associative arrays in Shell scripts

You can also take a look at shell_map, which is a HashMap implementation written in bash 3.

User · Answer

Prior to bash 4 there is no good way to use associative arrays in bash. Your best bet is to use an interpreted language that actually has support for such things, like awk. On the other hand, bash 4 does support them.

As for less good ways in bash 3, here is a reference that might help: http://mywiki.wooledge.org/BashFAQ/006
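To illustrate the awk suggestion, here is a minimal sketch of a lookup done entirely inside awk and called from a shell function. The lookup function name and the inline "sound animal" pairs are assumptions made for the example; in practice the pairs would come from a file or another command.

```shell
#!/bin/sh
# lookup KEY prints the value stored for KEY, or nothing if KEY is absent.
# The pairs fed to awk are hypothetical sample data ("sound animal").
lookup() {
    printf '%s\n' 'moo cow' 'woof dog' |
        awk -v want="$1" '
            { map[$1] = $2 }                        # build the associative array
            END { if (want in map) print map[want] }
        '
}

lookup woof
```

awk arrays are associative by default, so this works the same way on systems that only ship Bash 3 or a plain POSIX sh.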

User · Answer

A coworker just mentioned this thread. I've independently implemented hash tables within bash, and it's not dependent on version 4. From a blog post of mine in March 2010 (before some of the answers here...) entitled Hash tables in bash:

I previously used cksum to hash but have since translated Java's string hashCode to native bash/zsh.

# Here's the hashing function
ht() {
  local h=0 i
  for (( i=0; i < ${#1}; i++ )); do
    let "h=( (h<<5) - h ) + $(printf %d \'${1:$i:1})"
    let "h |= h"
  done
  printf "$h"
}

# Example:

myhash[`ht foo bar`]="a value"
myhash[`ht baz baf`]="b value"

echo ${myhash[`ht baz baf`]} # "b value"
echo ${myhash[@]} # "a value b value" though perhaps reversed
echo ${#myhash[@]} # "2" - there are two values (note, zsh doesn't count right)

It's not bidirectional, and the built-in way is a lot better, but neither should really be used anyway. Bash is for quick one-offs, and such things should quite rarely involve complexity that might require hashes, except perhaps in your ~/.bashrc and friends.

User · Answer

Two things: you can use memory instead of /tmp on any 2.6 kernel by using /dev/shm (Red Hat; other distros may vary). Also, hget can be reimplemented using read as follows:

function hget {
  while read -r key idx
  do
    if [ "$key" = "$2" ]
    then
      echo "$idx"
      return
    fi
  done < "/dev/shm/hashmap.$1"
}

In addition, by assuming that all keys are unique, the return short-circuits the read loop and avoids having to read through all entries. If your implementation can have duplicate keys, then simply leave out the return. This saves the expense of reading and forking both grep and awk. Using /dev/shm for both implementations yielded the following, using time hget on a 3-entry hash and searching for the last entry:

Grep/Awk:

hget() {
    grep "^$2 " "/dev/shm/hashmap.$1" | awk '{ print $2 };'
}

$ time echo $(hget FD oracle)
3

real    0m0.011s
user    0m0.002s
sys     0m0.013s

Read/echo:

$ time echo $(hget FD oracle)
3

real    0m0.004s
user    0m0.000s
sys     0m0.004s

On multiple invocations I never saw less than a 50% improvement. This can all be attributed to fork overhead, given the use of /dev/shm.

User · Answer

There's parameter substitution, though it may be un-PC as well... like indirection.

#!/bin/bash

# Array pretending to be a Pythonic dictionary
ARRAY=( "cow:moo"
        "dinosaur:roar"
        "bird:chirp"
        "bash:rock" )

for animal in "${ARRAY[@]}" ; do
    KEY="${animal%%:*}"
    VALUE="${animal##*:}"
    printf "%s likes to %s.\n" "$KEY" "$VALUE"
done

printf "%s is an extinct animal which likes to %s\n" "${ARRAY[1]%%:*}" "${ARRAY[1]##*:}"

The BASH 4 way is better of course, but if you need a hack... only a hack will do. You could search the array/hash with similar techniques.

User · Answer

hput() {
  eval hash"$1"='$2'
}

hget() {
  eval echo '${hash'"$1"'#hash}'
}

hput France Paris
hput Netherlands Amsterdam
hput Spain Madrid
echo `hget France` and `hget Netherlands` and `hget Spain`

$ sh hash.sh
Paris and Amsterdam and Madrid
