Inserting documents using text format

The TEXT API reads one document from each line of the text file. A regular expression must be provided to determine what will be captured. Each capture is copied to a field of the schema.

Use this API to add documents to the index from plain-text formats (CSV, TTL).
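The mapping from one line to one document can be sketched in Python. This is a minimal illustration under stated assumptions, not the server's implementation: `line_to_document` is a hypothetical helper, captures are assumed to be assigned to the `field` names in order, `langpos` is assumed to count captures from 1, and a line that does not match is assumed to produce no document. The CSV-like line and the `id`/`title`/`lang` field names are made up for the example.

```python
import re
from typing import Optional


def line_to_document(
    line: str, pattern: str, fields: list[str], langpos: Optional[int] = None
) -> Optional[tuple[dict[str, str], Optional[str]]]:
    """Map the captures of one text line to schema fields (hypothetical helper)."""
    match = re.search(pattern, line)
    if match is None:
        return None  # assumption: a non-matching line produces no document
    captures = match.groups()
    if len(captures) != len(fields):
        raise ValueError(f"{len(captures)} captures but {len(fields)} fields")
    # Assumption: the n-th capture goes to the n-th `field` parameter.
    document = dict(zip(fields, captures))
    # Assumption: langpos counts captures from 1.
    lang = captures[langpos - 1] if langpos is not None else None
    return document, lang


# Hypothetical CSV-like line: id;title;language
print(line_to_document("42;Hello world;en", r"^([^;]*);([^;]*);([^;]*)$",
                       ["id", "title", "lang"], langpos=3))
# ({'id': '42', 'title': 'Hello world', 'lang': 'en'}, 'en')
```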

Requirement: OpenSearchServer v1.5

Call parameters

URL: /services/rest/index/{index_name}/document

Method: PUT

Header:

  • Content-Type (required): text/plain
  • Accept (optional): The returned type, application/json or application/xml.

URL parameters:

  • index_name (required): The name of the index.

Query parameters:

  • pattern (required): A regular expression whose captures extract the field values from each text line.
  • field (required): One field for each capture in the regular expression, listed in the same order as the captures (field mapping).
  • langpos (optional): The position of the capture containing the language (for example, 3 for the third capture).
  • charset (optional): The charset of the text (default is UTF-8).
  • buffersize (optional): The size of the buffer used when indexing the lines (default is 100).
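Because `field` is repeated once per capture, the query string is easiest to build from a list of pairs rather than a dictionary. The Python sketch below uses the standard `urllib.parse.urlencode`; `build_query` is a hypothetical helper, and `PATTERN` is the URL-decoded value of the `pattern` parameter in the sample call further down. With these inputs it reproduces that call's query string.

```python
from urllib.parse import urlencode

# URL-decoded `pattern` value from the sample call.
PATTERN = r'^<([^>]*)> <[^>]*> "([^"]*)"@([a-zA-Z\-_]*) \.$'


def build_query(pattern, fields, langpos=None, charset="UTF-8", buffersize=100):
    """Encode the query parameters, repeating `field` once per capture."""
    params = [("pattern", pattern)]
    params += [("field", name) for name in fields]  # same order as the captures
    if langpos is not None:
        params.append(("langpos", str(langpos)))
    params += [("charset", charset), ("buffersize", str(buffersize))]
    # safe="*" only mirrors the sample call's encoding; "%2A" is equally valid.
    return urlencode(params, safe="*")


query = build_query(PATTERN, ["url", "abstract", "lang"], langpos=3)
print(query)
```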

Raw data (PUT):
The text lines, one document per line. For example, these RDF triples from the French DBpedia:

<http://fr.dbpedia.org/resource/!!!> <http://www.w3.org/2000/01/rdf-schema#comment> "!!!, qui se prononce tchik tchik tchik ou à la convenance toute autre syllabe répétée trois fois, est un groupe américain formé pendant l'été 1995 de la fusion d'une partie des groupes Black Liquorice et Popesmashers. Ce nom à but anticonformiste a l'inconvénient d'être, selon Spin Magazine, « le plus dur des noms de groupe pour Google »."@fr .
<http://fr.dbpedia.org/resource/$5000_Reward,_Dead_or_Alive> <http://www.w3.org/2000/01/rdf-schema#comment> "$5000 Reward, Dead or Alive est un film muet américain réalisé par Allan Dwan et sorti en 1911."@fr .
<http://fr.dbpedia.org/resource/$O$> <http://www.w3.org/2000/01/rdf-schema#comment> "$O$ est le premier album du groupe Sud-Africain Die Antwoord."@fr .
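Before uploading, the pattern can be checked locally against a few lines. This minimal Python sketch uses the `re` module as a stand-in for the server's regex engine (the constructs used here, bracket expressions with an escaped `-` and an escaped `.`, behave the same way in Java's `java.util.regex`, for instance). `PATTERN` is the URL-decoded `pattern` value from the sample call, and `preview` is a hypothetical helper.

```python
import re

# URL-decoded `pattern` value from the sample call; the three captures
# correspond to field=url, field=abstract and field=lang.
PATTERN = re.compile(r'^<([^>]*)> <[^>]*> "([^"]*)"@([a-zA-Z\-_]*) \.$')
FIELDS = ["url", "abstract", "lang"]


def preview(lines):
    """Yield the field mapping each line would produce, or None if it does not match."""
    for line in lines:
        m = PATTERN.match(line.rstrip("\n"))
        yield dict(zip(FIELDS, m.groups())) if m else None


sample = [
    '<http://fr.dbpedia.org/resource/$5000_Reward,_Dead_or_Alive> <http://www.w3.org/2000/01/rdf-schema#comment> "$5000 Reward, Dead or Alive est un film muet américain réalisé par Allan Dwan et sorti en 1911."@fr .',
    '<http://fr.dbpedia.org/resource/$O$> <http://www.w3.org/2000/01/rdf-schema#comment> "$O$ est le premier album du groupe Sud-Africain Die Antwoord."@fr .',
]
for doc in preview(sample):
    print(doc)
```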

Sample call

curl -XPUT -H "Content-Type: text/plain" --upload-file my_file.txt 'http://localhost:8080/services/rest/index/gendarmerie_test/document?pattern=%5E%3C%28%5B%5E%3E%5D*%29%3E+%3C%5B%5E%3E%5D*%3E+%22%28%5B%5E%22%5D*%29%22%40%28%5Ba-zA-Z%5C-_%5D*%29+%5C.%24&field=url&field=abstract&field=lang&langpos=3&charset=UTF-8&buffersize=100'
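The same call can be made without curl. The following is a minimal sketch using Python's standard `urllib.request`; `build_request` and `upload` are hypothetical helpers, the optional `Accept` header is added to request JSON, and the response body is assumed to be UTF-8.

```python
import urllib.error
import urllib.request


def build_request(url: str, text: bytes) -> urllib.request.Request:
    """Build a PUT request equivalent to the sample curl call."""
    return urllib.request.Request(
        url,  # full URL, including the encoded query string
        data=text,  # raw text lines, one document per line
        method="PUT",  # curl --upload-file also sends a PUT
        headers={
            "Content-Type": "text/plain",
            # Optional: ask for a JSON response (application/xml also works).
            "Accept": "application/json",
        },
    )


def upload(url: str, path: str) -> tuple[int, str]:
    """Upload a text file and return (HTTP status, response body)."""
    with open(path, "rb") as f:
        request = build_request(url, f.read())
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urlopen raises on a 500; the error body holds the failure reason.
        return err.code, err.read().decode("utf-8")
```

For example, `upload(url, "my_file.txt")`, with `url` set to the full URL of the sample call, query string included.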

Success response

The document(s) have been created or updated.

HTTP code:
200

Content (application/json):

{
    "successful": true,
    "info": "95 document(s) updated."
}

Error response

The creation/update failed. The reason is provided in the content.

HTTP code:
500
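A client can fold the two documented outcomes into a single check. This sketch assumes that a 200 response carries a JSON object like the one in the success response; because the format of the error content is not specified, any other response body is passed through unchanged. `interpret_response` is a hypothetical helper.

```python
import json


def interpret_response(status: int, body: str) -> str:
    """Return the server's info message, or raise if the upload failed."""
    if status == 200:
        # Assumption: the body is the JSON object from the success response.
        result = json.loads(body)
        if result.get("successful"):
            return result.get("info", "")
        raise RuntimeError(f"Server reported failure: {result.get('info')}")
    # HTTP 500 (or any other code): the reason is in the content.
    raise RuntimeError(f"HTTP {status}: {body}")
```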

