== Indexing Documents

When you add documents to Elasticsearch, you index JSON documents. This maps naturally to PHP associative arrays, since
they can easily be encoded in JSON. Therefore, in Elasticsearch-PHP you create and pass associative arrays to the client
for indexing. There are several methods of ingesting data into Elasticsearch, which we will cover here.

=== Single document indexing

When indexing a document, you can either provide an ID or let Elasticsearch generate one for you.

{zwsp} +

.Providing an ID value
[source,php]
----
$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id',
    'body' => [ 'testField' => 'abc']
];

// Document will be indexed to my_index/my_type/my_id
$response = $client->index($params);
----
{zwsp} +

.Omitting an ID value
[source,php]
----
$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'body' => [ 'testField' => 'abc']
];

// Document will be indexed to my_index/my_type/<autogenerated ID>
$response = $client->index($params);
----
{zwsp} +


If you need to set other parameters, such as a `routing` value, you specify those in the array alongside the `index`,
`type`, etc. For example, let's set the routing and timestamp of this new document:

.Additional parameters
[source,php]
----
$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id',
    'routing' => 'company_xyz',
    'timestamp' => strtotime('-1 day'),
    'body' => [ 'testField' => 'abc']
];

$response = $client->index($params);
----
{zwsp} +

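
The variants above differ only in which keys are present in the parameter array. As a minimal sketch, the construction could be wrapped in a small function. The name `buildIndexParams` is hypothetical and not part of Elasticsearch-PHP; it only assembles the array and does not talk to the cluster.

.Building index parameters (illustrative sketch)
[source,php]
----
// Hypothetical helper, not part of Elasticsearch-PHP: assembles the
// parameter array that $client->index() expects.
function buildIndexParams(string $index, string $type, array $body, ?string $id = null, array $extra = []): array
{
    $params = ['index' => $index, 'type' => $type, 'body' => $body];

    // Only set 'id' when the caller supplies one; omitting the key lets
    // Elasticsearch generate an ID.
    if ($id !== null) {
        $params['id'] = $id;
    }

    // Extra request parameters (e.g. 'routing') sit alongside 'index' and
    // 'type'. The + operator keeps the keys already set above on conflict.
    return $params + $extra;
}

$withId = buildIndexParams('my_index', 'my_type', ['testField' => 'abc'], 'my_id', ['routing' => 'company_xyz']);
$autoId = buildIndexParams('my_index', 'my_type', ['testField' => 'abc']);
----

Either array can then be passed to `$client->index()` in the same way as the hand-written arrays.
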

=== Bulk Indexing

Elasticsearch also supports bulk indexing of documents. The bulk API expects JSON action/metadata pairs, separated by
newlines. When constructing your documents in PHP, the process is similar: you first create an action array
(e.g. an `index` action), then a document body array, and repeat this pair for every document.

A simple example might look like this:

.Bulk indexing with PHP arrays
[source,php]
----
$params = ['body' => []];

for ($i = 0; $i < 100; $i++) {
    // Action/metadata entry
    $params['body'][] = [
        'index' => [
            '_index' => 'my_index',
            '_type' => 'my_type',
        ]
    ];

    // Document body entry
    $params['body'][] = [
        'my_field' => 'my_value',
        'second_field' => 'some more values'
    ];
}

$responses = $client->bulk($params);
----
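
To make the action/metadata pairing concrete, the following sketch serializes such a `body` array into the newline-delimited JSON that the bulk API expects. It only illustrates the wire format: `toBulkPayload` is a hypothetical name, and in normal use you pass the array to `$client->bulk()` as shown above rather than serializing it yourself.

.Bulk wire format (illustrative sketch)
[source,php]
----
// Hypothetical illustration of the bulk wire format: one JSON object per
// line, with every line (including the last) terminated by a newline.
function toBulkPayload(array $body): string
{
    $payload = '';
    foreach ($body as $line) {
        $payload .= json_encode($line) . "\n";
    }
    return $payload;
}

$body = [
    ['index' => ['_index' => 'my_index', '_type' => 'my_type']], // action/metadata
    ['my_field' => 'my_value'],                                   // document body
];

$payload = toBulkPayload($body);
----
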

In practice, you'll likely have more documents than you want to send in a single bulk request. In that case, you need
to batch up the requests and periodically send them:

.Bulk indexing with batches
[source,php]
----
$params = ['body' => []];

for ($i = 1; $i <= 1234567; $i++) {
    $params['body'][] = [
        'index' => [
            '_index' => 'my_index',
            '_type' => 'my_type',
            '_id' => $i
        ]
    ];

    $params['body'][] = [
        'my_field' => 'my_value',
        'second_field' => 'some more values'
    ];

    // Every 1000 documents stop and send the bulk request
    if ($i % 1000 == 0) {
        $responses = $client->bulk($params);

        // erase the old bulk request
        $params = ['body' => []];

        // unset the bulk response when you are done to save memory
        unset($responses);
    }
}

// Send the last batch if it exists
if (!empty($params['body'])) {
    $responses = $client->bulk($params);
}
----
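
A bulk request can succeed as a whole while individual actions fail, so it can be worth inspecting each response before discarding it. The sketch below assumes the usual bulk response shape, with a top-level `errors` flag and an `items` list in which each entry is keyed by its action name. The function name `failedBulkItems` and the sample response are illustrative only.

.Checking a bulk response for failures (illustrative sketch)
[source,php]
----
// Hypothetical helper: collects the items of a bulk response that report an error.
function failedBulkItems(array $response): array
{
    // 'errors' is false when every action succeeded, so skip the scan.
    if (empty($response['errors'])) {
        return [];
    }

    $failed = [];
    foreach ($response['items'] as $item) {
        // Each item has a single key naming the action, e.g. 'index'.
        $result = reset($item);
        if (isset($result['error'])) {
            $failed[] = $result;
        }
    }
    return $failed;
}

// Hand-written sample response for illustration, not real server output.
$sample = [
    'errors' => true,
    'items'  => [
        ['index' => ['_id' => '1', 'status' => 201]],
        ['index' => ['_id' => '2', 'status' => 400, 'error' => 'illustrative failure']],
    ],
];

$failed = failedBulkItems($sample);
----

In the batching loop, such a check would run on `$responses` before the `unset()` call.
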

== Getting Documents

Elasticsearch provides realtime GETs of documents. This means that as soon as the document has been indexed and your
client receives an acknowledgement, you can immediately retrieve the document from any shard. Get operations are
performed by requesting a document by its full `index/type/id` path:

[source,php]
----
$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id'
];

// Get doc at /my_index/my_type/my_id
$response = $client->get($params);
----
{zwsp} +

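
The response is an associative array containing metadata and the document itself. As a sketch, assuming the standard get response layout in which the original document sits under the `_source` key, the document can be extracted as follows. The helper name `documentFromGetResponse` is hypothetical, and the sample array is hand-written for illustration.

.Reading the document from a get response (illustrative sketch)
[source,php]
----
// Hypothetical helper: returns the stored document from a get response,
// or null when the response carries no '_source'.
function documentFromGetResponse(array $response): ?array
{
    return $response['_source'] ?? null;
}

// Hand-written sample shaped like a get response; not real server output.
$sample = [
    '_index'  => 'my_index',
    '_type'   => 'my_type',
    '_id'     => 'my_id',
    'found'   => true,
    '_source' => ['testField' => 'abc'],
];

$doc = documentFromGetResponse($sample);
----
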

== Updating Documents

Updating a document allows you to either completely replace the contents of the existing document, or perform a partial
update to just some fields (either changing an existing field, or adding new fields).

=== Partial document update

If you want to partially update a document (e.g. change an existing field, or add a new one) you can do so by specifying
the `doc` in the `body` parameter. This will merge the fields in `doc` with the existing document:

[source,php]
----
$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id',
    'body' => [
        'doc' => [
            'new_field' => 'abc'
        ]
    ]
];

// Update doc at /my_index/my_type/my_id
$response = $client->update($params);
----
{zwsp} +

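
To illustrate what merging means here, the following sketch mimics the behavior locally on plain PHP arrays: fields in `doc` overwrite or extend the existing document, nested objects are merged recursively, and all other fields are left untouched. This is a simplified model for illustration only. `mergeDoc` and `isObject` are hypothetical functions, and the real merge happens inside Elasticsearch.

.Local model of a partial update (illustrative sketch)
[source,php]
----
// Treat a value as a JSON object if it is a non-empty array that is not a 0..n-1 list.
function isObject($value): bool
{
    return is_array($value) && $value !== [] && array_keys($value) !== range(0, count($value) - 1);
}

// Hypothetical local model of a partial update; does not contact Elasticsearch.
function mergeDoc(array $existing, array $doc): array
{
    foreach ($doc as $field => $value) {
        if (isObject($value) && isset($existing[$field]) && isObject($existing[$field])) {
            // Both sides are objects: merge their fields recursively.
            $existing[$field] = mergeDoc($existing[$field], $value);
        } else {
            // Scalars and lists simply replace (or add) the field.
            $existing[$field] = $value;
        }
    }
    return $existing;
}

$existing = ['testField' => 'abc', 'meta' => ['views' => 1]];
$doc      = ['new_field' => 'abc', 'meta' => ['likes' => 2]];

$merged = mergeDoc($existing, $doc);
----

In this model, `$merged` keeps `testField`, gains `new_field`, and its `meta` object holds both `views` and `likes`.
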

=== Scripted document update

Sometimes you need to perform a scripted update, such as incrementing a counter or appending a new value to an array.
To perform a scripted update, you need to provide a script and (usually) a set of parameters:

[source,php]
----
$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id',
    'body' => [
        'script' => 'ctx._source.counter += count',
        'params' => [
            'count' => 4
        ]
    ]
];

$response = $client->update($params);
----
{zwsp} +


=== Upserts

Upserts are "update or insert" operations. An upsert attempts to run your update script against the existing document;
if the document does not exist, the contents of `upsert` are inserted as a new document instead.

[source,php]
----
$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id',
    'body' => [
        'script' => 'ctx._source.counter += count',
        'params' => [
            'count' => 4
        ],
        'upsert' => [
            'counter' => 1
        ]
    ]
];

$response = $client->update($params);
----
{zwsp} +

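
The decision an upsert makes can be modeled locally. In this sketch, `null` stands for a document that does not exist yet. `applyUpsert` is a hypothetical function that imitates the counter example on plain arrays and does not contact Elasticsearch.

.Local model of an upsert (illustrative sketch)
[source,php]
----
// Hypothetical local model of an upsert: insert the 'upsert' body when the
// document is missing, otherwise apply the counter increment.
function applyUpsert(?array $existing, int $count, array $upsert): array
{
    if ($existing === null) {
        // Document does not exist: the upsert body becomes the new document.
        return $upsert;
    }

    // Document exists: equivalent of ctx._source.counter += count
    $existing['counter'] += $count;
    return $existing;
}

$first  = applyUpsert(null, 4, ['counter' => 1]);   // document created
$second = applyUpsert($first, 4, ['counter' => 1]); // counter incremented
----

The first call creates the document with `counter` set to 1; the second call finds it and raises `counter` to 5.
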

== Deleting Documents

Finally, you can delete documents by specifying their full `index/type/id` path:

[source,php]
----
$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id'
];

// Delete doc at /my_index/my_type/my_id
$response = $client->delete($params);
----
{zwsp} +