Extraction results updates
Overview
No extraction system is 100% foolproof, so there is a small chance that DocHeart misextracts a value. Luckily, we have your back! DocHeart allows you to manually correct values that were misextracted. This reference page shows you how.
The Update Object
Before we move on to the update endpoint, it is important to understand DocHeart’s update procedure and its data model. As explained in the section about extraction results queries, the data extracted by the extraction pipeline is contained within a set of objects called extracted targets. Accordingly, update objects define operations on extracted targets.
An update object contains the following fields:
- extracted_target_id - the id of the extracted target being updated by the update object.
- type - the type of the extracted target being updated (either field or table).
- new_extracted_values - a list of extracted values for the extracted target that will override the list already contained in the extracted target. The ids of the new extracted values are optional. When an id is not provided, DocHeart automatically generates a unique id.
Examples of update objects are provided below:
- Update object for a field:
{
  "type": "field",
  "extracted_target_id": "960d0a18-e44a-4a13-99f4-dc871d1bb018",
  "new_extracted_values": [
    {
      // no id, DocHeart will generate a unique id for this extracted value
      "extracted_field": "new value"
    },
    {
      // DocHeart will use the provided id for the extracted value
      "id": "fe8d3e39-1b97-4a95-a96f-6d3a1f2ab145",
      "extracted_field": "new value with id"
    }
  ]
}
- Update object for a table:
{
  "type": "table",
  "extracted_target_id": "182fc714-6303-4351-be7a-4f957c0394f7",
  "new_extracted_values": [
    {
      // DocHeart will use the provided id for the extracted value
      "id": "fh8d3e39-1b97-4a95-a96f-6d3a1f2ab324",
      "extracted_cell": "cell 1",
      "table_id": "6c6fa88e-10db-4353-9a97-9936f4eeac66",
      "row": 0,
      "column": 0
    },
    {
      // no id, DocHeart will generate a unique id for this extracted value
      "extracted_cell": "cell 2",
      "table_id": "6c6fa88e-10db-4353-9a97-9936f4eeac66",
      "row": 0,
      "column": 1
    }
  ]
}
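Update objects like the ones above can also be assembled programmatically. The following is a minimal sketch, not part of the DocHeart API: the helper names buildFieldUpdate and buildTableUpdate are hypothetical, and the table helper assumes that row and column indices are zero-based, as the example above suggests.

```javascript
// Hypothetical helpers: they only assemble plain objects in the shape
// described in this section and do not call DocHeart.

// Build an update object for a "field" extracted target.
// `values` is a list of strings or { id, value } objects; the id is optional.
function buildFieldUpdate(extractedTargetId, values) {
  return {
    type: "field",
    extracted_target_id: extractedTargetId,
    new_extracted_values: values.map((v) => {
      const entry = typeof v === "string" ? { value: v } : v;
      const extracted = { extracted_field: entry.value };
      // Include the id only when the caller supplied one; otherwise
      // DocHeart generates a unique id for the value.
      if (entry.id !== undefined) extracted.id = entry.id;
      return extracted;
    }),
  };
}

// Build an update object for a "table" extracted target from a 2D array
// of cell strings, where rows[r][c] is the cell at row r and column c
// (assumption: indices are zero-based).
function buildTableUpdate(extractedTargetId, tableId, rows) {
  const cells = [];
  rows.forEach((row, r) => {
    row.forEach((cell, c) => {
      cells.push({ extracted_cell: cell, table_id: tableId, row: r, column: c });
    });
  });
  return {
    type: "table",
    extracted_target_id: extractedTargetId,
    new_extracted_values: cells,
  };
}

// Usage: reproduces the shape of the two example update objects.
const exampleUpdates = [
  buildFieldUpdate("960d0a18-e44a-4a13-99f4-dc871d1bb018", ["new value"]),
  buildTableUpdate(
    "182fc714-6303-4351-be7a-4f957c0394f7",
    "6c6fa88e-10db-4353-9a97-9936f4eeac66",
    [["cell 1", "cell 2"]]
  ),
];
```

Because new_extracted_values overrides the whole list already contained in the extracted target, the table helper expects every cell that should remain in the target, not only the cells that changed.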
Update Endpoint
The update endpoint updates one or more extracted targets within an extracted data object. To perform an update, you need to provide the extraction_id of the extraction whose results you want to update, together with the list of update objects specifying the updates to be applied.
The update endpoint returns the version of the extracted data object that results from applying the updates.
Below you can find an example of how to perform the request:
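- JavaScript:
const response = await fetch("https://api.docheart.ai/docheart/api/extraction/result/update", {
  method: "POST",
  headers: {
    "X-Api": "<api_token>",
    // assumed: the request body is sent as JSON
    "Content-Type": "application/json"
  },
  // fetch expects a string body, so the payload is serialized explicitly
  body: JSON.stringify({
    "extraction_id": "<the extraction id corresponding to the result being updated>",
    "updates": [ /* the list of update objects to be applied */ ]
  })
});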
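- curl:
# replace the "updates" placeholder with the list of update objects to be applied
# (the Content-Type header is an assumption: the body is sent as JSON)
curl -X POST https://api.docheart.ai/docheart/api/extraction/result/update \
  -H "X-Api: <api_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "extraction_id": "<the extraction id corresponding to the result being updated>",
    "updates": [...]
  }'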
- Example request body:
{
  "extraction_id": "571fa66c-9ffa-422e-8d60-6e00c4a107f0",
  "updates": [
    {
      "type": "field",
      "extracted_target_id": "960d0a18-e44a-4a13-99f4-dc871d1bb018",
      "new_extracted_values": [
        {
          "extracted_field": "<new value for field>"
        }
      ]
    },
    {
      "type": "table",
      "extracted_target_id": "182fc714-6303-4351-be7a-4f957c0394f7",
      "new_extracted_values": [
        {
          "extracted_cell": "<new value for cell 1>",
          "table_id": "6c6fa88e-10db-4353-9a97-9936f4eeac66",
          "row": 0,
          "column": 0
        },
        {
          "extracted_cell": "<new value for cell 2>",
          "table_id": "6c6fa88e-10db-4353-9a97-9936f4eeac66",
          "row": 0,
          "column": 1
        }
      ]
    }
  ]
}
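The request above can be wrapped in a small reusable function. This is a sketch under stated assumptions rather than an official client: buildUpdateRequest and sendUpdate are hypothetical names, the client-side checks only mirror the fields described in this section, the Content-Type header is assumed, and treating a non-2xx status as a failure is an assumption because this page does not describe the error responses.

```javascript
const UPDATE_URL = "https://api.docheart.ai/docheart/api/extraction/result/update";

// Hypothetical helper: checks the update objects against the fields
// described in this section and returns the URL and fetch options.
function buildUpdateRequest(apiToken, extractionId, updates) {
  if (!Array.isArray(updates) || updates.length === 0) {
    throw new Error("updates must be a non-empty array of update objects");
  }
  for (const update of updates) {
    if (update.type !== "field" && update.type !== "table") {
      throw new Error(`unknown update type: ${update.type}`);
    }
    if (typeof update.extracted_target_id !== "string") {
      throw new Error("extracted_target_id must be a string");
    }
    if (!Array.isArray(update.new_extracted_values)) {
      throw new Error("new_extracted_values must be an array");
    }
  }
  return {
    url: UPDATE_URL,
    options: {
      method: "POST",
      headers: {
        "X-Api": apiToken,
        // Assumption: the endpoint accepts a JSON body.
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ extraction_id: extractionId, updates }),
    },
  };
}

// Hypothetical helper: sends the request and returns the parsed response.
// `fetchImpl` defaults to the global fetch and can be replaced in tests.
async function sendUpdate(apiToken, extractionId, updates, fetchImpl = fetch) {
  const { url, options } = buildUpdateRequest(apiToken, extractionId, updates);
  const response = await fetchImpl(url, options);
  // Assumption: failures are signalled with a non-2xx HTTP status.
  if (!response.ok) {
    throw new Error(`Update failed with HTTP status ${response.status}`);
  }
  return response.json();
}
```

Separating the request construction from the network call keeps the validation testable without contacting the API.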
- Example response:
{
  "id": "65c20d3d84e5d82f3a8a5d0e",
  "user_id": "7ZawPyIwfZYt3LTlmdRCJhV0vlG3",
  "extraction_info": {
    // the id of the extraction info object
    "id": "65c20d3684e5d82f3a8a5d06",
    // id unique for the API key used to trigger the extraction
    "api_key_hash": "c487c3eed215c73541aa86e82a563c895275c1a7936555ecbef873e16440dace",
    // the id of the extraction
    "extraction_id": "be18f53b-5cf7-4787-b3d1-cd87810384a7",
    // the name of the extraction
    "extraction_name": "API Sync 3 Demo",
    // the id of the user who triggered the extraction (your id)
    "trigger_user_id": "7ZawPyIwfZYt3LTlmdRCJhV0vlG3",
    // the unix timestamp at which the extraction was triggered
    "trigger_unix_timestamp": 1707216182.5737329,
    // whether the extraction was synchronous or asynchronous
    "trigger_type": "sync",
    // how the document was provided ("raw", "hosted", or "docheart"), as explained in the section about extraction triggering
    "input_type": "raw",
    // whether or not the extracted data was saved within DocHeart's database
    "extracted_data_saved": true,
    // the id of the configuration used for the extraction
    "used_extraction_configuration_id": "65c1d47df09cb9a0e925972e",
    // the name of the configuration used for the extraction
    "used_extraction_configuration_name": "Demo Invoice Mock",
    "preview": false
  },
  // the list of extracted targets collected in this extraction
  "extracted_targets": [
    {
      "id": "906fd471-973a-4973-993f-06e1c498af0a",
      "used_extraction_group_id": "f42b849a-c338-4408-87e4-5829385b1908",
      "used_extraction_target_id": "c0e671db-1de2-4e68-ba07-f42f971bd3dc",
      "type": "field",
      "extracted_values": [
        {
          "id": "fe8d3e39-1b97-4a95-a96f-6d3a1f2ab145",
          "extracted_field": "Fountain Fresh Imports Ltd",
          "validation_passed": true,
          "validation_messages": []
        }
      ],
      "confidence": 0.9979146003723145
    },
    {
      "id": "7099e694-1378-40ff-9169-1001d0e71555",
      "used_extraction_group_id": "83be75ab-953e-4766-8565-359c9861961b",
      "used_extraction_target_id": "59e9b850-fbbf-4463-b38a-001472a5e411",
      "type": "table",
      "extracted_values": [
        {
          "id": "b8defa1d-f05d-4095-96c8-d9845ebcaae6",
          "extracted_cell": "Total Amount",
          "table_id": "4395d1e6-da7a-4f98-bf99-9b873a02f388",
          "row": 0,
          "column": 0,
          "validation_passed": true,
          "validation_messages": []
        },
        {
          "id": "fc8346e3-5b16-45ad-968d-ab7b77ff8781",
          "extracted_cell": "£\n44",
          "table_id": "4395d1e6-da7a-4f98-bf99-9b873a02f388",
          "row": 0,
          "column": 1,
          "validation_passed": true,
          "validation_messages": []
        },
        ...
      ],
      "confidence": 0.9890895883242289
    }
  ],
  // the unix timestamp at which the extracted data object was created
  "creation_unix_timestamp": 1707216189.0296614,
  // the DocHeart Vault storage id of the extracted document
  "extracted_document_storage_id": "ae378e24-f925-4dbe-9c55-e222059b16fe",
  // a copy of the configuration object used in the extraction (understanding the details of this object is not necessary)
  "used_extraction_configuration": {...}
}
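After an update it can be useful to confirm that the returned extracted data object contains the new values. The sketch below uses hypothetical helper names and relies only on the fields visible in the example response. It assumes that the extracted_target_id sent in an update object corresponds to the id of an entry in extracted_targets.

```javascript
// Hypothetical helper: look up one extracted target by id in the
// extracted data object returned by the update endpoint.
function findExtractedTarget(extractedData, targetId) {
  const targets = extractedData.extracted_targets || [];
  return targets.find((target) => target.id === targetId) || null;
}

// Hypothetical helper: collect every extracted value whose validation
// did not pass, together with its validation messages.
function collectValidationFailures(extractedData) {
  const failures = [];
  for (const target of extractedData.extracted_targets || []) {
    for (const value of target.extracted_values || []) {
      if (value.validation_passed === false) {
        failures.push({
          targetId: target.id,
          valueId: value.id,
          messages: value.validation_messages || [],
        });
      }
    }
  }
  return failures;
}

// Usage with a trimmed-down object in the shape of the example response.
const sampleExtractedData = {
  extracted_targets: [
    {
      id: "906fd471-973a-4973-993f-06e1c498af0a",
      type: "field",
      extracted_values: [
        {
          id: "fe8d3e39-1b97-4a95-a96f-6d3a1f2ab145",
          extracted_field: "Fountain Fresh Imports Ltd",
          validation_passed: true,
          validation_messages: [],
        },
      ],
    },
  ],
};
const updatedTarget = findExtractedTarget(
  sampleExtractedData,
  "906fd471-973a-4973-993f-06e1c498af0a"
);
const validationFailures = collectValidationFailures(sampleExtractedData);
```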