ElasticSearch: Unassigned Shards, how to fix?

Question

I have an ES cluster with 4 nodes:

    number_of_replicas: 1
    search01 - master: false, data: false
    search02 - master: true, data: true
    search03 - master: false, data: true
    search04 - master: false, data: true

I had to restart search03, and when it came back, it rejoined the cluster with no problem, but left 7 unassigned shards laying about.

    {
      "cluster_name" : "tweedle",
      "status" : "yellow",
      "timed_out" : false,
      "number_of_nodes" : 4,
      "number_of_data_nodes" : 3,
      "active_primary_shards" : 15,
      "active_shards" : 23,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 7
    }

Now my cluster is in yellow state. What is the best way to resolve this issue?

- Delete (cancel) the shards?
- Move the shards to another node?
- Allocate the shards to the node?
- Update 'number_of_replicas' to 2?
- Something else entirely?

Interestingly, when a new index was added, that node started working on it and played nice with the rest of the cluster; it just left the unassigned shards laying about.

Follow-on question: am I doing something wrong to cause this to happen in the first place? I don't have much confidence in a cluster that behaves this way when a node is restarted.

NOTE: If you're running a single-node cluster for some reason, you might simply need to do the following:

    curl -XPUT 'localhost:9200/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }'

User · Answer

Might help, but I had this issue when trying to run ES in embedded mode. The fix was to make sure the Node had local(true) set.

User · Answer

In my case, an old node with old shards was joining the cluster, so we had to shut down the old node and delete the indices with unassigned shards.

User · Answer

By default, Elasticsearch will re-assign shards to nodes dynamically. However, if you've disabled shard allocation (perhaps you did a rolling restart and forgot to re-enable it), you can re-enable shard allocation.

    # v0.90.x and earlier
    curl -XPUT 'localhost:9200/_settings' -d '{
        "index.routing.allocation.disable_allocation": false
    }'

    # v1.0+
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{
        "transient" : {
            "cluster.routing.allocation.enable" : "all"
        }
    }'

Elasticsearch will then reassign shards as normal. This can be slow; consider raising indices.recovery.max_bytes_per_sec and cluster.routing.allocation.node_concurrent_recoveries to speed it up.

If you're still seeing issues, something else is probably wrong, so look in your Elasticsearch logs for errors. If you see EsRejectedExecutionException, your thread pools may be too small.

Finally, you can explicitly reassign a shard to a node with the reroute API.

    # Suppose shard 4 of index "my-index" is unassigned, so you want to
    # assign it to node search03:
    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
        "commands": [{
            "allocate": {
                "index": "my-index",
                "shard": 4,
                "node": "search03",
                "allow_primary": 1
            }
        }]
    }'
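The `allocate` command above belongs to the 1.x/2.x reroute API. From Elasticsearch 5.0 onward it was split into `allocate_replica`, `allocate_stale_primary` and `allocate_empty_primary`. A minimal sketch of the newer request bodies, assuming a 5.x or later cluster; `ES_URL`, `replica_body`, `stale_primary_body` and `reroute` are hypothetical names, not part of any Elasticsearch tooling:

```shell
#!/bin/sh
# Sketch only: ES_URL and all function names are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a reroute body that allocates a replica of INDEX/SHARD on NODE.
# The replica is copied from the live primary, so no data is discarded.
replica_body() {
  printf '{"commands":[{"allocate_replica":{"index":"%s","shard":%s,"node":"%s"}}]}\n' \
    "$1" "$2" "$3"
}

# Print a reroute body that promotes the stale copy on NODE to primary.
# accept_data_loss must be true: writes missing from that copy are lost.
stale_primary_body() {
  printf '{"commands":[{"allocate_stale_primary":{"index":"%s","shard":%s,"node":"%s","accept_data_loss":true}}]}\n' \
    "$1" "$2" "$3"
}

# POST a body read from stdin to the cluster reroute API.
reroute() {
  curl -s -XPOST "$ES_URL/_cluster/reroute" \
    -H 'Content-Type: application/json' -d @-
}
```

For example, `replica_body my-index 4 search03 | reroute` would ask the master to place a replica of shard 4 on search03. The stale-primary variant is a last resort, since it explicitly accepts data loss.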

User · Answer

This little bash script will brute-force reassign; you may lose data.

    NODE="YOUR NODE NAME"
    IFS=$'\n'
    for line in $(curl -s 'localhost:9200/_cat/shards' | fgrep UNASSIGNED); do
      INDEX=$(echo $line | (awk '{print $1}'))
      SHARD=$(echo $line | (awk '{print $2}'))

      curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
         "commands": [
            {
                "allocate": {
                    "index": "'$INDEX'",
                    "shard": '$SHARD',
                    "node": "'$NODE'",
                    "allow_primary": true
              }
            }
        ]
      }'
    done
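Grepping the whole line for UNASSIGNED treats primaries and replicas alike, which is why forcing `allow_primary` on every match can lose data. A sketch of a stricter filter that matches on the state column and separates replicas, assuming the default `_cat/shards` column order (index, shard, prirep, state, ...); the function names are hypothetical:

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Read plain-text _cat/shards output on stdin and print "index shard prirep"
# for every copy whose state column (4th) is exactly UNASSIGNED.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}

# Keep only replica copies ("r"). These can be rebuilt from a live primary;
# unassigned primaries ("p") are the ones where forcing allocation loses data.
unassigned_replicas() {
  unassigned_shards | awk '$3 == "r" { print $1, $2 }'
}
```

For example, `curl -s localhost:9200/_cat/shards | unassigned_replicas` lists only the replica copies, which could then be fed into a reroute loop instead of the blanket `fgrep`.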

User · Answer

For me, this was resolved by running this from the dev console:

    POST /_cluster/reroute?retry_failed

I started by looking at the index list to see which indices were red and then ran:

    GET /_cat/shards/INDEXNAME?h=index,shard,prirep,state,unassigned.reason

and saw that it had shards stuck in the ALLOCATION_FAILED state, so running the retry above caused them to re-try the allocation.
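The same check-then-retry flow can be scripted. Shards typically land in ALLOCATION_FAILED after exhausting `index.allocation.max_retries` attempts (5 by default), which is why they do not recover on their own. This sketch assumes a 5.x or later cluster (where `retry_failed` exists) and that the columns are requested exactly as `h=index,shard,prirep,state,unassigned.reason`; `ES_URL` and the helper names are hypothetical:

```shell
#!/bin/sh
# Sketch only: ES_URL and the helper names are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print "index shard prirep" for copies whose last allocation attempt failed.
# Input: _cat/shards?h=index,shard,prirep,state,unassigned.reason on stdin.
failed_allocations() {
  awk '$4 == "UNASSIGNED" && $5 == "ALLOCATION_FAILED" { print $1, $2, $3 }'
}

# Ask the master to retry allocations that gave up after max_retries.
retry_failed() {
  curl -s -XPOST "$ES_URL/_cluster/reroute?retry_failed=true"
}

# Only issue the retry when at least one copy is in ALLOCATION_FAILED.
retry_if_needed() {
  curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
    | failed_allocations | grep -q . && retry_failed
}
```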

User · Answer

I had two indices with unassigned shards that didn't seem to be self-healing. I eventually resolved this by temporarily adding an extra data-node[1]. After the indices became healthy and everything stabilized to green, I removed the extra node and the system was able to rebalance (again) and settle on a healthy state.

It's a good idea to avoid killing multiple data nodes at once (which is how I got into this state). Likely, I had failed to preserve any copies/replicas for at least one of the shards. Luckily, Kubernetes kept the disk storage around, and reused it when I relaunched the data-node.

...Some time has passed...

Well, this time just adding a node didn't seem to be working (after waiting several minutes for something to happen), so I started poking around in the REST API.

    GET /_cluster/allocation/explain

This showed my new node with "decision": "YES".

By the way, all of the pre-existing nodes had "decision": "NO" due to "the node is above the low watermark cluster setting". So this was probably a different case than the one I had addressed previously.

Then I made the following simple POST[2] with no body, which kicked things into gear:

    POST /_cluster/reroute

Other notes:

- Very helpful: https://datadoghq.com/blog/elasticsearch-unassigned-shards
- Something else that may work: set cluster_concurrent_rebalance to 0, then to null, as I demonstrate here.

[1] Pretty easy to do in Kubernetes if you have enough headroom: just scale out the stateful set via the dashboard.

[2] Using the Kibana "Dev Tools" interface, I didn't have to bother with SSH/exec shells.
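With no body, the allocation explain API explains the first unassigned shard it finds; naming a specific copy in the request body lets you check each unassigned shard in turn. A minimal sketch; `ES_URL`, `explain_body` and `explain` are hypothetical names:

```shell
#!/bin/sh
# Sketch only: ES_URL and the helper names are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Body naming one copy: $1 index, $2 shard number,
# $3 "true" for the primary or "false" for a replica.
explain_body() {
  printf '{"index":"%s","shard":%s,"primary":%s}\n' "$1" "$2" "$3"
}

# With no arguments, explain whichever unassigned shard the API picks;
# with three arguments, explain the copy they describe.
explain() {
  if [ "$#" -eq 0 ]; then
    curl -s "$ES_URL/_cluster/allocation/explain?pretty"
  else
    explain_body "$1" "$2" "$3" | curl -s -XGET \
      "$ES_URL/_cluster/allocation/explain?pretty" \
      -H 'Content-Type: application/json' -d @-
  fi
}
```

For example, `explain my-index 0 false` would show the per-node decisions (such as the low-watermark "NO" above) for replica 0 of my-index.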

User · Answer

In my case, when I create a new index, the default number_of_replicas is set to 1. The number of nodes in my cluster was only one, so there was no extra node to hold the replica, and the health was turning yellow. So I created the index with the settings property and set number_of_replicas to 0, and then it worked fine. Hope this helps.

    PUT /customer
    {
        "settings": {
            "number_of_replicas": 0
        }
    }

User · Answer

When dealing with corrupted shards, you can set the replication factor to 0 and then set it back to the original value. This should clear up most, if not all, of your corrupted shards and relocate the new replicas in the cluster.

Setting indexes with unassigned replicas to use a replication factor of 0:

    curl -XGET http://localhost:9200/_cat/shards |\
      grep UNASSIGNED | grep ' r ' |\
      awk '{print $1}' |\
      xargs -I {} curl -XPUT http://localhost:9200/{}/_settings -H "Content-Type: application/json" \
      -d '{ "index":{ "number_of_replicas": 0}}'

Setting them back to 1:

    curl -XGET http://localhost:9200/_cat/shards |\
      awk '{print $1}' |\
      xargs -I {} curl -XPUT http://localhost:9200/{}/_settings -H "Content-Type: application/json" \
      -d '{ "index":{ "number_of_replicas": 1}}'

Note: Do not run this if you have different replication factors for different indexes. This would hardcode the replication factor for all indexes to 1.
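One way around that caveat is to record each affected index's own replica count before dropping it to 0, and restore that value per index afterwards. A sketch assuming two saved files, one from `_cat/shards` and one from `_cat/indices?h=index,rep`; the file names, `ES_URL`, `replica_plan` and `set_replicas` are hypothetical:

```shell
#!/bin/sh
# Sketch only: file names, ES_URL and helper names are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# $1: saved _cat/shards output; $2: saved _cat/indices?h=index,rep output.
# Print "index replicas" for each index that has an unassigned replica,
# using that index's own current replica count (each index listed once).
replica_plan() {
  awk 'NR == FNR { if ($3 == "r" && $4 == "UNASSIGNED") hit[$1] = 1; next }
       ($1 in hit) { print $1, $2 }' "$1" "$2"
}

# Set index.number_of_replicas ($2) on one index ($1).
set_replicas() {
  curl -s -XPUT "$ES_URL/$1/_settings" -H 'Content-Type: application/json' \
    -d "{\"index\":{\"number_of_replicas\":$2}}"
}

# Usage sketch:
#   curl -s "$ES_URL/_cat/shards" > shards.txt
#   curl -s "$ES_URL/_cat/indices?h=index,rep" > indices.txt
#   replica_plan shards.txt indices.txt > plan.txt
#   while read -r idx rep; do set_replicas "$idx" 0; done < plan.txt
#   # ...once the cluster settles...
#   while read -r idx rep; do set_replicas "$idx" "$rep"; done < plan.txt
```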

User · Answer

I tried several of the suggestions above and unfortunately none of them worked. We have a "Log" index in our lower environment where apps write their errors. It is a single-node cluster. What solved it for me was checking the YML configuration file for the node and seeing that it still had the default setting "gateway.expected_nodes: 2". This was overriding any other settings we had. Whenever we would create an index on this node, it would try to spread 3 out of 5 shards to the phantom 2nd node. These would therefore appear as unassigned, and they could never be moved to the 1st and only node.

The solution was editing the config, changing the setting "gateway.expected_nodes" to 1 (so it would quit looking for its never-to-be-found brother in the cluster), and restarting the Elastic service instance. Also, I had to delete the index and create a new one. After creating the index, the shards all showed up on the 1st and only node, and none were unassigned.

    # Set how many nodes are expected in this cluster. Once these N nodes
    # are up (and recover_after_nodes is met), begin recovery process immediately
    # (without waiting for recover_after_time to expire):
    #
    # gateway.expected_nodes: 2
    gateway.expected_nodes: 1

User · Answer

Maybe it helps someone: I had the same issue, and it was due to a lack of storage space caused by a log getting way too big.

User · Answer

First, use the cluster health API to get the current health of the cluster, where RED means one or more primary shards are missing and YELLOW means one or more replica shards are missing.

After this, use the cluster allocation explain API to find out why a particular shard is missing and why Elasticsearch is not able to allocate it on a data node.

Once you get the exact root cause, try to address the issue, which often requires changing a few cluster settings (mentioned in @wilfred's answer earlier). In some cases, if it's replica shards and you have another copy of the same shard (i.e. another replica) available, you can reduce the replica count using the update replica setting and later increase it again, if you need it.

Apart from the above, if the cluster allocation API mentions that it doesn't have a valid data node to allocate a shard to, then you need to add new data nodes or change the shard allocation awareness settings.
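The health-then-explain sequence can be sketched as a small shell helper. `_cat/health?h=status` returns just the colour; `ES_URL`, `health_meaning` and `diagnose` are hypothetical names:

```shell
#!/bin/sh
# Sketch only: ES_URL and the helper names are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Translate a health colour into what it says about shard copies.
health_meaning() {
  case "$1" in
    green)  echo "all primary and replica shards are assigned" ;;
    yellow) echo "all primaries assigned; one or more replicas unassigned" ;;
    red)    echo "one or more primary shards unassigned" ;;
    *)      echo "unknown status: $1"; return 1 ;;
  esac
}

# Step 1: read the health colour. Step 2: if it is not green, ask the
# allocation explain API why an unassigned shard cannot be placed.
diagnose() {
  status=$(curl -s "$ES_URL/_cat/health?h=status" | tr -d '[:space:]')
  health_meaning "$status"
  [ "$status" = green ] || curl -s "$ES_URL/_cluster/allocation/explain?pretty"
}
```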

User · Answer

Elasticsearch automatically allocates shards if the config below is set to all. This config can be set using the REST API as well:

    cluster.routing.allocation.enable: all

If, even after applying this config, ES fails to assign the shards automatically, then you have to force-assign the shards yourself (ES official link for this).

I have written a script to force-assign all unassigned shards across the cluster. The array below contains the list of nodes among which you want to balance the unassigned shards.

    #!/bin/bash
    array=( node1 node2 node3 )
    node_counter=0
    length=${#array[@]}
    IFS=$'\n'
    for line in $(curl -s 'http://127.0.0.1:9200/_cat/shards' | fgrep UNASSIGNED); do
        INDEX=$(echo $line | (awk '{print $1}'))
        SHARD=$(echo $line | (awk '{print $2}'))
        NODE=${array[$node_counter]}
        echo $NODE
        curl -XPOST 'http://127.0.0.1:9200/_cluster/reroute' -d '{
            "commands": [
            {
                "allocate": {
                    "index": "'$INDEX'",
                    "shard": '$SHARD',
                    "node": "'$NODE'",
                    "allow_primary": true
                }
            }
            ]
        }'
        # advance to the next node, wrapping back to index 0 after the last one
        node_counter=$(( (node_counter + 1) % length ))
    done

User · Answer

This may be caused by disk space as well. In Elasticsearch 7.5.2, by default, if a node's disk usage is above 85% (the low disk watermark), replica shards are not assigned to that node.

This can be fixed by setting a different threshold or by disabling it, either in the .yml or via Kibana:

    PUT _cluster/settings
    {
      "persistent": {
        "cluster.routing.allocation.disk.threshold_enabled": "false"
      }
    }

User · Answer

OK, I've solved this with some help from ES support. Issue the following command to the API on all nodes (or the nodes you believe to be the cause of the problem):

    curl -XPUT 'localhost:9200/<index>/_settings' \
        -d '{"index.routing.allocation.disable_allocation": false}'

where <index> is the index you believe to be the culprit. If you have no idea, just run this on all nodes:

    curl -XPUT 'localhost:9200/_settings' \
        -d '{"index.routing.allocation.disable_allocation": false}'

I also added this line to my yaml config and, since then, any restarts of the server/service have been problem-free. The shards re-allocated back immediately.

FWIW, to answer an oft-sought-after question, set MAX_HEAP_SIZE to 30G unless your machine has less than 60G RAM, in which case set it to half the available memory.

References

- Shard Allocation Awareness

User · Answer

I had the same problem, but the root cause was a difference in version numbers (1.4.2 on two nodes (with problems) and 1.4.4 on two nodes (ok)). The first and second answers (setting "index.routing.allocation.disable_allocation" to false and setting "cluster.routing.allocation.enable" to "all") did not work.

However, the answer by @Wilfred Hughes (setting "cluster.routing.allocation.enable" to "all" using transient) gave me an error with the following statement:

    [NO(target node version [1.4.2] is older than source node version [1.4.4])]

After updating the old nodes to 1.4.4, these nodes started to resync with the other good nodes.

User · Answer

I've been stuck today with the same shard allocation issue. The script that W. Andrew Loe III proposed in his answer didn't work for me, so I modified it a little and it finally worked:

    #!/usr/bin/env bash

    # The script performs force relocation of all unassigned shards,
    # of all indices to a specified node (NODE variable)

    ES_HOST="<elasticsearch host>"
    NODE="<node name>"

    curl ${ES_HOST}:9200/_cat/shards > shards
    grep "UNASSIGNED" shards > unassigned_shards

    while read LINE; do
      IFS=" " read -r -a ARRAY <<< "$LINE"
      INDEX=${ARRAY[0]}
      SHARD=${ARRAY[1]}

      echo "Relocating:"
      echo "Index: ${INDEX}"
      echo "Shard: ${SHARD}"
      echo "To node: ${NODE}"

      curl -s -XPOST "${ES_HOST}:9200/_cluster/reroute" -d "{
        \"commands\": [
          {
            \"allocate\": {
              \"index\": \"${INDEX}\",
              \"shard\": ${SHARD},
              \"node\": \"${NODE}\",
              \"allow_primary\": true
            }
          }
        ]
      }"; echo
      echo "------------------------------"
    done <unassigned_shards

    rm shards
    rm unassigned_shards

    exit 0

Now, I'm not a Bash guru, but the script really worked for my case. Note that you'll need to specify appropriate values for the "ES_HOST" and "NODE" variables.

User · Answer

I tried to delete unassigned shards or manually assign them to a particular data node. It didn't work because unassigned shards kept appearing and the health status was "red" over and over. Then I noticed that one of the data nodes was stuck in the "restart" state. I reduced the number of data nodes and killed it. The problem is not reproducible anymore.

User · Answer

I ran into exactly the same issue. This can be prevented by temporarily setting shard allocation to false before restarting Elasticsearch, but this does not fix the unassigned shards if they are already there.

In my case, it was caused by a lack of free disk space on the data node. The unassigned shards were still on the data node after the restart, but they were not recognized by the master.

Just cleaning up the disk on one of the nodes got the replication process started for me. This is a rather slow process, because all the data has to be copied from one data node to the other.

User · Answer

I first increased

    "index.number_of_replicas"

by 1 (wait until the nodes are synced), then decreased it by 1 afterwards, which effectively removes the unassigned shards, and the cluster is green again without the risk of losing any data.

I believe there are better ways, but this is easier for me.

Hope this helps.
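The bump-and-restore approach can be scripted per index. A sketch, assuming Elasticsearch 5.0 or later for the `wait_for_no_initializing_shards` health parameter; `ES_URL`, `replicas_body`, `set_replicas` and `bump_replicas` are hypothetical names, and the 10-minute timeouts are arbitrary:

```shell
#!/bin/sh
# Sketch only: ES_URL and the helper names are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Settings body for index.number_of_replicas.
replicas_body() {
  printf '{"index":{"number_of_replicas":%s}}\n' "$1"
}

# PUT a new replica count ($2) on one index ($1).
set_replicas() {
  replicas_body "$2" | curl -s -XPUT "$ES_URL/$1/_settings" \
    -H 'Content-Type: application/json' -d @-
}

# Raise the replica count by one, wait for the new copies to finish
# initializing, then restore the original count and wait for green.
bump_replicas() {
  idx=$1
  rep=$(curl -s "$ES_URL/_cat/indices/$idx?h=rep" | tr -d '[:space:]')
  set_replicas "$idx" $((rep + 1))
  curl -s "$ES_URL/_cluster/health/$idx?wait_for_no_initializing_shards=true&timeout=10m" >/dev/null
  set_replicas "$idx" "$rep"
  curl -s "$ES_URL/_cluster/health/$idx?wait_for_status=green&timeout=10m"
}
```

The health calls block until the condition holds or the timeout expires, which replaces the manual "wait until nodes are synced" step.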

User · Answer

Another possible reason for unassigned shards is that your cluster is running more than one version of the Elasticsearch binary. As the documentation puts it:

    shard replication from the more recent version to the previous versions will not work

This can be a root cause for unassigned shards.

Elastic Documentation - Rolling Upgrade Process
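A quick way to spot a mixed-version cluster is to compare the `version` column of `_cat/nodes`. A sketch; `check_versions` is a hypothetical helper that reads `_cat/nodes?h=name,version` output on stdin, prints each version with the nodes running it, and exits non-zero when more than one version is present:

```shell
#!/bin/sh
# Sketch only: check_versions is a hypothetical helper name.

# Group node names by version; fail (exit 1) if there is more than one version.
# Output order of the versions is not guaranteed.
check_versions() {
  awk '{ seen[$2] = seen[$2] " " $1 }
       END {
         n = 0
         for (v in seen) { n++; print v ":" seen[v] }
         exit (n > 1)
       }'
}
```

For example, `curl -s 'localhost:9200/_cat/nodes?h=name,version' | check_versions` would have flagged the 1.4.2/1.4.4 split described in the previous answer.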

User · Answer

I also ran into this situation and finally fixed it.

First, I will describe my situation. I have two nodes in an ElasticSearch cluster, and they can find each other, but when I created an index with the settings "number_of_replicas" : 2, "number_of_shards" : 5, ES showed a yellow signal and unassigned_shards was 5.

The problem occurs because of the value of number_of_replicas: Elasticsearch never places two copies of the same shard on one node, so with only two nodes each shard can have at most one replica. When I set its value to 1, all is fine.
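The arithmetic behind this answer: each shard needs one data node per copy (1 primary + N replicas), because no two copies share a node. A tiny sketch of that rule; `unassignable_copies` is a hypothetical helper:

```shell
#!/bin/sh
# Sketch only: unassignable_copies is a hypothetical helper name.

# $1: number of data nodes, $2: number_of_replicas.
# Print how many copies of each shard cannot be placed, because
# a shard needs (replicas + 1) distinct data nodes.
unassignable_copies() {
  data_nodes=$1
  replicas=$2
  excess=$((replicas + 1 - data_nodes))
  [ "$excess" -gt 0 ] || excess=0
  echo "$excess"
}
```

With 2 data nodes and `number_of_replicas: 2`, each of the 5 shards has one copy that cannot be placed, matching the 5 unassigned shards reported above.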

User · Answer

The only thing that worked for me was changing the number_of_replicas. I had 2 replicas, so I changed it to 1 and then changed it back to 2.

First:

    PUT /myindex/_settings
    {
        "index" : {
            "number_of_replicas" : 1
        }
    }

Then:

    PUT /myindex/_settings
    {
        "index" : {
            "number_of_replicas" : 2
        }
    }

I already answered it in this question.

User · Answer

In my case, the hard disk space upper bound was reached.

Look at this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html

Basically, I ran:

    PUT /_cluster/settings
    {
      "transient": {
        "cluster.routing.allocation.disk.watermark.low": "90%",
        "cluster.routing.allocation.disk.watermark.high": "95%",
        "cluster.info.update.interval": "1m"
      }
    }

So that it will allocate if <90% of hard disk space is used, and move a shard to another machine in the cluster if >95% of hard disk space is used; and it checks every 1 minute.
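To see which nodes those thresholds actually affect, you can compare `_cat/allocation?h=node,disk.percent` against them. A sketch; `watermark_report` is a hypothetical helper, its 90/95 defaults mirror the settings above (not Elasticsearch's own defaults), and `disk.percent` is assumed to be a whole-number percentage:

```shell
#!/bin/sh
# Sketch only: watermark_report is a hypothetical helper name.

# $1: low watermark percent (default 90), $2: high watermark percent (default 95).
# Reads "node disk.percent" lines on stdin and classifies each node.
watermark_report() {
  awk -v low="${1:-90}" -v high="${2:-95}" '
    NF < 2 || $2 !~ /^[0-9]+$/ { next }   # e.g. the UNASSIGNED row has no disk figure
    { state = "ok" }
    $2 + 0 > low  { state = "above low: no new shards allocated here" }
    $2 + 0 > high { state = "above high: shards will be moved away" }
    { print $1, $2 "%", state }'
}
```

For example, `curl -s 'localhost:9200/_cat/allocation?h=node,disk.percent' | watermark_report 90 95`.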

User · Answer

I also encountered a similar error. It happened to me because one of my data nodes was full, and because of that, shard allocation failed. If there are unassigned shards and your cluster is RED, with a few indices also RED, I followed the steps below and they worked like a champ.

In Kibana Dev Tools:

    GET _cluster/allocation/explain

If there are any unassigned shards, you will get the details; otherwise it will throw an ERROR.

Simply running the command below solved everything for me:

    POST _cluster/reroute?retry_failed

Thanks to: https://github.com/elastic/elasticsearch/issues/23199#issuecomment-280272888

User · Answer

I was having this issue as well, and I found an easy way to resolve it.

Get the index of the unassigned shards:

    $ curl -XGET http://172.16.4.140:9200/_cat/shards

Install the Curator tool and use it to delete the index (this deletes the index and its data):

    $ curator --host 172.16.4.140 delete indices --older-than 1 \
           --timestring '%Y.%m.%d' --time-unit days --prefix logstash

NOTE: In my case, the index is logstash of the day 2016-04-21.

Then check the shards again; all the unassigned shards go away!
