I am new to Elasticsearch and have been entering data manually up until this point. For example I've done something like this:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}'
I now have a .json file and I want to index this into Elasticsearch. I've tried something like this too, but no success:
curl -XPOST 'http://jfblouvmlxecs01:9200/test/test/1' -d lane.json
How do I import a .json file? Are there steps I need to take first to ensure the mapping is correct?
This question is related to
json
elasticsearch
If you are using the elastic search 7.7 or above version then follow below command.
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk? pretty&refresh" --data-binary @"/Users/waseem.khan/waseem/elastic/account.json"
On above file path is /Users/waseem.khan/waseem/elastic/account.json
.
If you are using elastic search 6.x version then you can use the below command.
curl -X POST localhost:9200/bank/_bulk?pretty&refresh --data-binary @"/Users/waseem.khan/waseem/elastic/account.json" -H 'Content-Type: application/json'
Note: Make sure in your .json file at the end you will add the one empty line otherwise you will be getting below exception.
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "The bulk request must be terminated by a newline [\n]"
}
],
"type" : "illegal_argument_exception",
"reason" : "The bulk request must be terminated by a newline [\n]"
},
`enter code here`"status" : 400
Adding to KenH's answer
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests
You can replace @requests
with @complete_path_to_json_file
Note: @
is important before the file path
I wrote some code to expose the Elasticsearch API via a Filesystem API.
It is good idea for clear export/import of data for example.
I created prototype elasticdriver. It is based on FUSE
One thing I've not seen anyone mention: the JSON file must have one line specifying the index the next line belongs to, for every line of the "pure" JSON file.
I.E.
{"index":{"_index":"shakespeare","_type":"act","_id":0}}
{"line_id":1,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"ACT I"}
Without that, nothing works, and it won't tell you why
You are using
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests
If 'requests' is a json file then you have to change this to
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests.json
Now before this, if your json file is not indexed, you have to insert an index line before each line inside the json file. You can do this with JQ. Refer below link: http://kevinmarsh.com/2014/10/23/using-jq-to-import-json-into-elasticsearch.html
Go to elasticsearch tutorials (example the shakespeare tutorial) and download the json file sample used and have a look at it. In front of each json object (each individual line) there is an index line. This is what you are looking for after using the jq command. This format is mandatory to use the bulk API, plain json files wont work.
I'm the author of elasticsearch_loader
I wrote ESL for this exact problem.
You can download it with pip:
pip install elasticsearch-loader
And then you will be able to load json files into elasticsearch by issuing:
elasticsearch_loader --index incidents --type incident json file1.json file2.json
If you want to import a json file into Elasticsearch and create an index, use this Python script.
import json
from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
i = 0
with open('el_dharan.json') as raw_data:
json_docs = json.load(raw_data)
for json_doc in json_docs:
i = i + 1
es.index(index='ind_dharan', doc_type='doc_dharan', id=i, body=json.dumps(json_doc))
We made a little tool for this type of thing https://github.com/taskrabbit/elasticsearch-dump
Per the current docs, https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html:
If you’re providing text file input to curl, you must use the --data-binary flag instead of plain -d. The latter doesn’t preserve newlines.
Example:
$ curl -s -XPOST localhost:9200/_bulk --data-binary @requests
I just made sure that I am in the same directory as the json file and then simply ran this
curl -s -H "Content-Type: application/json" -XPOST localhost:9200/product/default/_bulk?pretty --data-binary @product.json
So if you too make sure you are at the same directory and run it this way. Note: product/default/ in the command is something specific to my environment. you can omit it or replace it with whatever is relevant to you.
if you are using VirtualBox and UBUNTU in it or you are simply using UBUNTU then it can be useful
wget https://github.com/andrewvc/ee-datasets/archive/master.zip
sudo apt-get install unzip (only if unzip module is not installed)
unzip master.zip
cd ee-datasets
java -jar elastic-loader.jar http://localhost:9200 datasets/movie_db.eloader
just get postman from https://www.getpostman.com/docs/environments give it the file location with /test/test/1/_bulk?pretty command.
As of Elasticsearch 7.7, you have to specify the content type also:
curl -s -H "Content-Type: application/json" -XPOST localhost:9200/_bulk --data-binary @<absolute path to JSON file>
Source: Stackoverflow.com