Triggering an extraction

Overview

This guide explains how to trigger a simple extraction. DocHeart supports several ways of triggering extractions, which differ both in how the document is supplied and in the nature of the interaction (synchronous vs. asynchronous).

In this guide, we will focus on synchronous direct extractions, as they are the simplest. As the name suggests, in a synchronous direct extraction the document is passed directly in the body of the request and the extracted data is returned synchronously in the response. This means that there will be a significant delay between the HTTP request and the HTTP response, as DocHeart needs to process the entire document before returning a response.
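
Because the response only arrives once processing has finished, it can help to give the HTTP client an explicit, generous timeout. The sketch below is one way to do that in JavaScript; the helper name fetchWithTimeout and the 120-second default are our own assumptions, not values documented by DocHeart.

```javascript
// Hypothetical helper: calls a fetch-compatible function with an abort signal
// that fires after `timeoutMs` milliseconds. The 120000 ms default is an
// arbitrary placeholder, not a documented DocHeart limit.
function fetchWithTimeout(url, options = {}, timeoutMs = 120000, fetchImpl = fetch) {
    // AbortSignal.timeout() returns a signal that aborts itself after the delay.
    const signal = AbortSignal.timeout(timeoutMs);
    return fetchImpl(url, { ...options, signal });
}
```

If the timeout elapses first, the returned promise rejects, so the caller can decide whether to retry or to switch to an asynchronous trigger.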

We will rely on the extraction configuration created in the previous guides to extract data from a document entitled FieldLab Invoice, which follows the same layout and structure as the Docklab Invoice seen before. The document can be downloaded from here.

Extraction triggering

As mentioned above, we will extract data from the document below, entitled FieldLab Invoice:

  • doc_page_1
  • doc_page_2

The first step in our extraction process is converting the contents of the document to a Base64 string. You can do this with the Base64 utilities available in the programming language you are using. Since this guide is programming-language agnostic, we will use an online service to perform the conversion. The Base64 encoding of the PDF should begin as follows:

JVBERi0xLjUKJcOkw7zDtsOfCjIgMCBvYmoKPDwvTG...

For the purpose of this tutorial, if you don’t want to perform the Base64 encoding yourself, you can download the ready-made Base64 encoding of the FieldLab Invoice from here.
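
If you prefer to do the conversion in code, the following is a minimal Node.js sketch. The function names are our own, and the file path you pass in is whatever location you saved the invoice to.

```javascript
const fs = require("node:fs");

// Encode raw bytes (a Node.js Buffer) as a Base64 string.
function encodeBufferToBase64(buffer) {
    return buffer.toString("base64");
}

// Read a file from disk and return its contents as a Base64 string.
// `filePath` is a placeholder for wherever the PDF was saved.
function encodeFileToBase64(filePath) {
    return encodeBufferToBase64(fs.readFileSync(filePath));
}

// A PDF 1.5 file starts with the bytes "%PDF-1.5", which is why the
// encoded document shown above starts with "JVBERi0xLjU".
console.log(encodeBufferToBase64(Buffer.from("%PDF-1.5"))); // prints "JVBERi0xLjU="
```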

Now that we have the Base64-encoded document, we can move on to creating the body of the trigger request:

{
    "configuration_name": "Tutorial Invoice Config.",
    "extraction_name": "Tutorial Extraction",
    "content_type": "application/pdf",
    "save_extraction": true,
    "input_type": "raw",
    "document_raw_data": "JVBERi0xLjU..."
}

The configuration_name is set to Tutorial Invoice Config. (with a dot at the end), which means that we are going to use the configuration we created in the previous part of the tutorial. The input_type is set to raw, which means that we are passing the raw data of the document directly in the body of the HTTP request. Last but not least, the document_raw_data property holds the Base64-encoded document itself.
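
The body can also be assembled programmatically. The sketch below uses a hypothetical helper, buildTriggerBody, which is not part of any DocHeart SDK; the property names and fixed values are copied from the example above.

```javascript
// Hypothetical helper that assembles the trigger request body.
// Property names and fixed values mirror the example body in this guide.
function buildTriggerBody({ configurationName, extractionName, base64Document }) {
    if (typeof base64Document !== "string" || base64Document.length === 0) {
        throw new TypeError("base64Document must be a non-empty Base64 string");
    }
    return {
        configuration_name: configurationName,
        extraction_name: extractionName,
        content_type: "application/pdf",
        save_extraction: true,
        input_type: "raw",
        document_raw_data: base64Document
    };
}

const exampleBody = buildTriggerBody({
    configurationName: "Tutorial Invoice Config.", // note the trailing dot
    extractionName: "Tutorial Extraction",
    base64Document: "JVBERi0xLjU..." // truncated placeholder, as in the guide
});
console.log(JSON.stringify(exampleBody, null, 4));
```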

We can now move on to creating the actual request. To proceed, you need the API token that you obtained in one of the previous guides.

  • JavaScript:

    const response = await fetch("https://api.docheart.ai/docheart/api/extraction/trigger_sync", {
        method: "POST",
        headers: {
            "X-Api": "<api_token>"
        },
        body: JSON.stringify({
            "configuration_name": "Tutorial Invoice Config.",
            "extraction_name": "Tutorial Extraction",
            "content_type": "application/pdf",
            "save_extraction": true,
            "input_type": "raw",
            "document_raw_data": "JVBERi0xLjU..."
        })
    })

  • curl:

    curl -X POST https://api.docheart.ai/docheart/api/extraction/trigger_sync \
    -H "X-Api: <api_token>" \
    -d '{
        "configuration_name": "Tutorial Invoice Config.",
        "extraction_name": "Tutorial Extraction",
        "content_type": "application/pdf",
        "save_extraction": true,
        "input_type": "raw",
        "document_raw_data": "JVBERi0xLjU..."
    }'
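
In an application, you will usually want to check the HTTP status before reading the result. The following sketch wraps the request from this guide in a function; the function name and the injectable fetchImpl parameter are our own additions, and since the error payload format is not covered here, a failed response is surfaced as plain text.

```javascript
const TRIGGER_SYNC_URL = "https://api.docheart.ai/docheart/api/extraction/trigger_sync";

// Sends the trigger request and returns the parsed JSON result.
// `fetchImpl` defaults to the global fetch and can be swapped out so the
// function can be exercised without network access.
async function triggerSyncExtraction(apiToken, body, fetchImpl = fetch) {
    const response = await fetchImpl(TRIGGER_SYNC_URL, {
        method: "POST",
        headers: { "X-Api": apiToken },
        body: JSON.stringify(body)
    });
    if (!response.ok) {
        // The error format is an unknown here, so it is not parsed as JSON.
        const detail = await response.text();
        throw new Error(`Extraction request failed with HTTP ${response.status}: ${detail}`);
    }
    return response.json();
}
```

A call such as triggerSyncExtraction("<api_token>", body) then resolves to the extraction result, or rejects with an error that carries the status code.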
    

If you’ve done everything correctly, the result of the extraction should look as follows:

{
    "id": "65ca3e0d6a5c7bc736e7bd4b",
    "user_id": "hvmydu3PBLc2tnX7ChLdMoXnTr33",
    "extraction_info": {
        "id": "65ca3e046a5c7bc736e7bd43",
        "api_key_hash": "8fc51cf75b691e2e14d2303e5acfd27da488cd906207c095502f56c7d1e64b4a",
        "extraction_id": "ece3df2c-ef13-49b9-8166-8c4336f7a09a",
        "extraction_name": "Tutorial Extraction",
        "trigger_user_id": "hvmydu3PBLc2tnX7ChLdMoXnTr33",
        "trigger_unix_timestamp": 1707752964.7163458,
        "trigger_type": "sync",
        "input_type": "raw",
        "extracted_data_saved": true,
        "used_extraction_configuration_id": "65ca2708c279a4cbde4d235f",
        "used_extraction_configuration_name": "Tutorial Invoice Config.",
        "preview": false
    },
    "extracted_targets": [
        {
            "id": "27c09168-b32d-4765-9032-68dc3fce917f",
            "used_extraction_group_id": "970cf923-3cc5-4a7f-95ee-2e1cf84a645d",
            "used_extraction_target_id": "5ea2c412-ae9a-495f-9342-b2c1737827e3",
            "type": "field",
            "extracted_values": [
                {
                    "id": "1ecae908-3878-4fa5-af89-e0a88c19d7c7",
                    "extracted_field": "Emma",
                    "validation_passed": false,
                    "validation_messages": [
                        "`Emma` did not match word length condition `= 2`"
                    ]
                }
            ],
            "confidence": 0.9830590057373046
        },
        {
            "id": "5917f06b-dcf9-4acc-80c0-758412d7c216",
            "used_extraction_group_id": "970cf923-3cc5-4a7f-95ee-2e1cf84a645d",
            "used_extraction_target_id": "28526429-39e4-43ea-bc00-13a4bd3bd428",
            "type": "field",
            "extracted_values": [
                {
                    "id": "ebec1984-798b-46d5-8cf8-bf1cafd18d8d",
                    "extracted_field": "(123) 451-7897",
                    "validation_passed": true,
                    "validation_messages": []
                }
            ],
            "confidence": 0.9830590057373046
        },
        {
            "id": "494141da-b9b3-466b-b7e5-da004f10d45d",
            "used_extraction_group_id": "970cf923-3cc5-4a7f-95ee-2e1cf84a645d",
            "used_extraction_target_id": "76d4a768-7f8e-4a17-b7d3-84fc2c3bef68",
            "type": "field",
            "extracted_values": [
                {
                    "id": "bb4e9393-6b8a-46d1-8487-00323afacb56",
                    "extracted_field": "emma@fieldlab.nl",
                    "validation_passed": true,
                    "validation_messages": []
                }
            ],
            "confidence": 0.9830590057373046
        },
        ...
    ],
    "creation_unix_timestamp": 1707752973.4157176,
    "extracted_document_storage_id": "3d5af5bc-269b-4e61-a4ee-a44ba173474b",
    "used_extraction_configuration": {...}
}
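
To work with this result in code, it is often convenient to flatten the extracted values and to look at which ones failed validation. The sketch below does so using only the property names visible in the response above; the function names are our own, and we assume that both timestamps are Unix times in seconds and that every target has the "field" shape shown here.

```javascript
// Flattens `extracted_targets` into one row per extracted value.
function summarizeExtraction(result) {
    const rows = [];
    for (const target of result.extracted_targets ?? []) {
        for (const value of target.extracted_values ?? []) {
            rows.push({
                targetId: target.used_extraction_target_id,
                value: value.extracted_field,
                valid: value.validation_passed,
                messages: value.validation_messages,
                confidence: target.confidence
            });
        }
    }
    return rows;
}

// Seconds between triggering the extraction and the creation of the result.
function processingSeconds(result) {
    return result.creation_unix_timestamp - result.extraction_info.trigger_unix_timestamp;
}

// Trimmed copy of the response shown above (two targets only).
const sample = {
    extraction_info: { trigger_unix_timestamp: 1707752964.7163458 },
    extracted_targets: [
        {
            used_extraction_target_id: "5ea2c412-ae9a-495f-9342-b2c1737827e3",
            type: "field",
            extracted_values: [{
                extracted_field: "Emma",
                validation_passed: false,
                validation_messages: ["`Emma` did not match word length condition `= 2`"]
            }],
            confidence: 0.9830590057373046
        },
        {
            used_extraction_target_id: "28526429-39e4-43ea-bc00-13a4bd3bd428",
            type: "field",
            extracted_values: [{
                extracted_field: "(123) 451-7897",
                validation_passed: true,
                validation_messages: []
            }],
            confidence: 0.9830590057373046
        }
    ],
    creation_unix_timestamp: 1707752973.4157176
};

const failedRows = summarizeExtraction(sample).filter((row) => !row.valid);
console.log(failedRows.map((row) => `${row.value}: ${row.messages.join("; ")}`));
console.log(processingSeconds(sample).toFixed(1), "seconds");
```

For the sample above, the only failed value is Emma, and the gap between the two timestamps is roughly 8.7 seconds, which illustrates the delay inherent in a synchronous extraction.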

More about extraction triggering

To learn more about triggering extractions you can read the full reference.

Next steps

Congratulations! You are now ready to start integrating DocHeart in your application!