Pipeline API (v2): How to Use It

🧪 Beta. This API is in beta, and we are not committing to backward compatibility yet. Endpoints and request/response shapes may change without notice during the beta, so build defensively and expect breaking changes. (The path is deliberately /api/v2-beta/… to make this explicit.)

 

Toloka's Pipeline API (v2) lets you operate the Pipeline product programmatically over HTTP: bring your data, run a pipeline you designed, watch it, and pull the results back.

 

📖 API reference (interactive docs)

The complete, always-current reference is public; no login is needed to read it:

 

🔑 Authentication

Every data request carries an API-key header:

Authorization: ApiKey <your-api-key>

The documentation pages themselves are public; all data endpoints require the key.
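
As a minimal sketch, the header can be attached with Python's standard library. Only the `Authorization: ApiKey …` format and the `/api/v2-beta/projects` path come from this article. The host below is a placeholder (the article does not state it), `build_request` is a hypothetical helper name, and the JSON `Content-Type` is an assumption.

```python
import json
import urllib.request

# Assumption: placeholder host; take the real one from the API reference.
BASE_URL = "https://example.invalid"


def build_request(api_key, method, path, body=None):
    """Build (but do not send) an authenticated request."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    # Header format as given in this article.
    req.add_header("Authorization", f"ApiKey {api_key}")
    if data is not None:
        # Assumption: request bodies are sent as JSON.
        req.add_header("Content-Type", "application/json")
    return req


req = build_request("my-secret-key", "GET", "/api/v2-beta/projects")
```

Passing `req` to `urllib.request.urlopen` would send it; building the request separately makes the header easy to inspect before any network call.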

 

🧠 How it works: the core concepts

1. You design pipelines in the Toloka app, by hand. The API does not build a pipeline from scratch. You first create a project and design its pipeline (the workflow itself, for example "annotate bounding boxes on these images") once, in the app UI.

2. The API operates existing pipelines. Once a pipeline exists, the API lets you copy it, attach data to it, run it, watch the run, and download results.

3. Everything is run-based, and a run is a snapshot.

Important: You cannot add new items to an existing run. A run processes a fixed set of dataset items, captured when the run starts. There is no "append an item to a running job."

 

To process new data, you create a new run. The loop is always:

  1. Take your pipeline: copy an existing one (or import an exported .pipeline file).

  2. Attach a dataset to it (new data, or a selection of existing items).

  3. Create a run.

  4. Watch the run via the API until it finishes.

So "send more data" or "re-process" always means: a new pipeline bound to that data → a new run, never a change to a live run. This keeps every run reproducible and auditable.

 

🚀 End-to-end workflow

(First build the project and its pipeline in the app. Then, via the API:)

1. Pick or create a project

  • GET /api/v2-beta/projects: list the projects you can access

  • POST /api/v2-beta/projects: create a project (the pipeline graph is still authored in the app)

2. Prepare a dataset

  • POST /api/v2-beta/projects/{projectId}/datasets: create a dataset

  • Add your data in one of two ways, depending on size:

    • Small payloads (≤ 2 MB), one synchronous call: POST /api/v2-beta/datasets/{datasetId}/items. Send rows inline as a JSON array and, optionally, declare the fields to create. Each row is a JSON object keyed by field name.

    • Large files (upload straight to storage, bypassing our servers):

      1. POST /api/v2-beta/datasets/{datasetId}/uploads → returns a one-time uploadUrl (you can declare fields here too)

      2. Upload the file bytes (a JSON array of row objects) directly to that uploadUrl

      3. POST /api/v2-beta/uploads/{uploadId}/finalize: kicks off ingestion

      4. GET /api/v2-beta/uploads/{uploadId}: poll until the status is succeeded

  • GET /api/v2-beta/datasets/{datasetId}/fields: list a dataset's fields (e.g. a url field for image links)

Fields are declared when you add data (on items or uploads); they're created up front and filled as rows land. When you declare fields, each row is mapped into them by name; omit fields and the whole row is stored under an uploaded field. (There is no standalone "add a field to an existing dataset" operation; adding a field after data already exists would require re-propagating values, so declare fields with the data instead.)
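
The choice between the two ingestion routes can be sketched as a size check on the serialized rows. This is an illustration under stated assumptions: the article gives the limit as "≤ 2 MB" without saying how it is measured, so the sketch reads it as 2 × 1024 × 1024 bytes of UTF-8 JSON, and `choose_ingest_path` is a hypothetical helper name.

```python
import json

# Assumption: "2 MB" is interpreted as 2 * 1024 * 1024 bytes of serialized
# JSON; the article does not say how the limit is measured.
INLINE_LIMIT_BYTES = 2 * 1024 * 1024


def choose_ingest_path(dataset_id, rows):
    """Return the endpoint path suited to the size of `rows`."""
    payload = json.dumps(rows).encode("utf-8")
    if len(payload) <= INLINE_LIMIT_BYTES:
        # Small enough for one synchronous call.
        return f"/api/v2-beta/datasets/{dataset_id}/items"
    # Too large: request an uploadUrl and upload the file there instead.
    return f"/api/v2-beta/datasets/{dataset_id}/uploads"


small_rows = [{"url": "https://example.invalid/cat.jpg"}]
large_rows = [{"url": "x" * 1024} for _ in range(3000)]  # roughly 3 MB

small_path = choose_ingest_path("ds-1", small_rows)
large_path = choose_ingest_path("ds-1", large_rows)
```

If the inline call also carries field declarations, the real request body is larger than the rows alone, so leaving some headroom under the limit is a reasonable precaution.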

 

3. Bring your pipeline

  • Copy an existing pipeline: POST /api/v2-beta/pipelines/{pipelineId}/copy
  • Or import a .pipeline file exported from the app: POST /api/v2-beta/projects/{projectId}/pipelines/import
  • You can also export any pipeline as a portable file: GET /api/v2-beta/pipelines/{pipelineId}/export

4. Attach the dataset to the pipeline

  • POST /api/v2-beta/pipelines/{pipelineId}/dataset: binds the dataset to the pipeline

    • Optional itemLimit: cap how many items the run processes

    • Optional itemFilter: select which items (e.g. by field values)
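
A hedged sketch of assembling the attach request body follows. `itemLimit` and `itemFilter` are named in this article, but the `datasetId` key and the shape of the filter value are assumptions, so the filter is passed through untouched. `build_attach_body` is a hypothetical helper name.

```python
def build_attach_body(dataset_id, item_limit=None, item_filter=None):
    """Build the JSON body for the attach call, omitting unset options."""
    # Assumption: the dataset is referenced by a "datasetId" key.
    body = {"datasetId": dataset_id}
    if item_limit is not None:
        body["itemLimit"] = item_limit
    if item_filter is not None:
        # Assumption: the filter shape is defined by the API reference,
        # so it is forwarded as given.
        body["itemFilter"] = item_filter
    return body


plain_body = build_attach_body("ds-1")
capped_body = build_attach_body("ds-1", item_limit=100)
```

Leaving unset options out of the body, rather than sending them as null, avoids relying on how the server treats null values.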

5. Start the run

  • POST /api/v2-beta/pipelines/{pipelineId}/runs

6. Watch the run

  • GET /api/v2-beta/runs/{runId}: status and progress (total / completed / failed …)

  • GET /api/v2-beta/pipelines/{pipelineId}/runs: list all runs of a pipeline
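
Watching a run amounts to polling until it reaches a final state. The sketch below shows one way to do that with a timeout. The status values and the `status` response key are assumptions (this article names only `succeeded`, and only for uploads), and `fetch_run` is a caller-supplied function standing in for the `GET /api/v2-beta/runs/{runId}` call.

```python
import time

# Assumption: terminal status names; check the API reference for the
# values a run actually reports.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_run(fetch_run, run_id, interval_s=5.0, timeout_s=3600.0,
                 sleep=time.sleep):
    """Poll fetch_run(run_id) until the run reaches a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        run = fetch_run(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} did not finish in {timeout_s}s")
        sleep(interval_s)


# Demo with canned responses instead of real HTTP calls.
_responses = iter([
    {"status": "running", "completed": 1, "total": 2},
    {"status": "succeeded", "completed": 2, "total": 2},
])
final_run = wait_for_run(lambda run_id: next(_responses), "run-123",
                         sleep=lambda seconds: None)
```

Injecting `fetch_run` and `sleep` keeps the loop independent of any HTTP client and lets it be exercised without waiting.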

7. Get the results

  • POST /api/v2-beta/datasets/{datasetId}/exports: request an export of the fields you want (identify each field by its name or id)

  • GET /api/v2-beta/exports/{exportId}: poll until ready; the response includes a downloadUrl

  • Download the result file from downloadUrl
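
The export step can be sketched as polling until the response carries a `downloadUrl`. This assumes the key is absent or empty until the export is ready, which the article does not state explicitly; `fetch_export` is a caller-supplied stand-in for `GET /api/v2-beta/exports/{exportId}`, and `wait_for_download_url` is a hypothetical helper name.

```python
import time


def wait_for_download_url(fetch_export, export_id, attempts=60,
                          interval_s=5.0, sleep=time.sleep):
    """Poll fetch_export(export_id) until it returns a downloadUrl."""
    for _ in range(attempts):
        export = fetch_export(export_id)
        # Assumption: downloadUrl appears only once the export is ready.
        url = export.get("downloadUrl")
        if url:
            return url
        sleep(interval_s)
    raise TimeoutError(f"export {export_id} not ready after {attempts} polls")


# Demo with canned responses instead of real HTTP calls.
_exports = iter([
    {},
    {"downloadUrl": "https://example.invalid/result.json"},
])
download_url = wait_for_download_url(lambda export_id: next(_exports),
                                     "exp-1", sleep=lambda seconds: None)
```

If the API reference documents an explicit status field for exports, checking that field is a more direct readiness test than the presence of the URL.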

 

♻️ "I have more data to process"

  1. Copy the pipeline: POST /api/v2-beta/pipelines/{pipelineId}/copy

  2. Attach the new dataset / items: POST /api/v2-beta/pipelines/{pipelineId}/dataset

  3. Run: POST /api/v2-beta/pipelines/{pipelineId}/runs

  4. Watch: GET /api/v2-beta/runs/{runId}

Each batch of data gets its own pipeline and its own run.
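
The first three steps above can be chained in a short function, with watching left to a separate polling loop. This is a sketch under assumptions: `call(method, path, body)` is a caller-supplied function that performs the HTTP request and returns the decoded JSON, and the `id` response key and `datasetId` body key are guesses at the real shapes.

```python
def process_new_batch(call, template_pipeline_id, dataset_id):
    """Copy a pipeline, bind a dataset to the copy, and start a run."""
    copied = call(
        "POST", f"/api/v2-beta/pipelines/{template_pipeline_id}/copy", None)
    pipeline_id = copied["id"]  # Assumption: response exposes an "id" key.
    call(
        "POST", f"/api/v2-beta/pipelines/{pipeline_id}/dataset",
        {"datasetId": dataset_id})  # Assumption: body key name.
    run = call("POST", f"/api/v2-beta/pipelines/{pipeline_id}/runs", None)
    return run["id"]


# Demo with a fake transport that records the calls it receives.
calls = []


def fake_call(method, path, body):
    calls.append((method, path))
    return {"id": f"new-{len(calls)}"}


run_id = process_new_batch(fake_call, "tmpl-1", "ds-9")
```

Note that the attach and run calls target the copied pipeline's id, not the template's, which matches the rule that each batch gets its own pipeline.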


For exact request/response shapes, parameters, and examples, use the interactive docs linked at the top; every endpoint is documented there.
