An end-to-end natural language training data platform.


Markers helps your team generate training data quickly, shrinking the time between idea and working model. Once your model is in production, Markers helps you monitor its accuracy by comparing its output against your training data.


Getting your data into Markers is a breeze: upload documents by hand or, for more complex workflows, generate an API key and programmatically send slices of your data.
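A programmatic upload might look like the sketch below. The endpoint URL and field names here are illustrative assumptions, not the documented Markers API:

```python
# Hypothetical upload payload; the endpoint and field names below are
# illustrative, not the actual Markers API.
API_URL = "https://api.markers.example/v1/documents"
API_KEY = "YOUR_API_KEY"

def build_upload(documents):
    """Package a slice of raw texts into an upload request body."""
    return {"documents": [{"text": text} for text in documents]}

payload = build_upload(["First support ticket...", "Second support ticket..."])

# To actually send it (requires the `requests` package and a valid key):
# requests.post(API_URL, json=payload,
#               headers={"Authorization": f"Bearer {API_KEY}"})
```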

When it's time to export your training data, download the entire dataset or filter out unwanted labels. Save those settings and point your training tasks at our API to always receive the latest data. Exported datasets are versioned to ensure reproducibility.
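A saved export might then be fetched from a URL like the one built below. The endpoint path, parameter names, and version scheme are assumptions for the sake of illustration:

```python
from urllib.parse import urlencode

# Illustrative only: the export endpoint and its parameters are assumptions,
# not the documented Markers API.
EXPORT_URL = "https://api.markers.example/v1/datasets/support-tickets/export"

def export_url(exclude_labels=(), version="latest"):
    """Build an export URL that filters out unwanted labels and pins a
    dataset version (use "latest" to always get the newest export)."""
    params = {"version": version}
    if exclude_labels:
        params["exclude_labels"] = ",".join(exclude_labels)
    return f"{EXPORT_URL}?{urlencode(params)}"

url = export_url(exclude_labels=["spam"], version="v3")
```

Pinning an explicit version in a training job keeps runs reproducible, while "latest" suits recurring retraining tasks.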


Once your data's in Markers, it's time to get to work. Divide labeling tasks between your team's workers manually, randomly, or with a custom strategy. Want to quickly get through your corpus but still have some overlap to quality-check your annotations? We've got that covered.
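One possible strategy along those lines is sketched below: each document goes to one randomly chosen annotator, and a sampled fraction also goes to a second annotator for quality checks. This is a hypothetical illustration, not Markers' built-in implementation:

```python
import random

def assign_with_overlap(doc_ids, annotators, overlap_fraction=0.1, seed=0):
    """Randomly assign each document to one annotator, then give a sampled
    fraction of documents to a second annotator so labels can be
    cross-checked. A sketch of one possible strategy."""
    rng = random.Random(seed)
    assignments = {doc: [rng.choice(annotators)] for doc in doc_ids}
    n_overlap = max(1, int(len(doc_ids) * overlap_fraction))
    for doc in rng.sample(doc_ids, k=n_overlap):
        # Pick a second, different annotator for the overlap documents.
        others = [a for a in annotators if a not in assignments[doc]]
        assignments[doc].append(rng.choice(others))
    return assignments

assignments = assign_with_overlap(list(range(20)), ["alice", "bob", "carol"])
```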

If you're using our API to upload documents and have set up automatic assignment, those documents will be assigned to your team automatically and we'll let them know there's work to be done.

Markers supports named entity recognition (fragment labeling) and text classification (document labeling). We also support nested document structures (e.g., conversational data).
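To make the three task types concrete, here are hypothetical document shapes for each; the field names are illustrative assumptions, not the actual Markers schema:

```python
# Text classification: one or more labels for the whole document.
classified = {"text": "My package never arrived.", "labels": ["shipping"]}

# Named entity recognition: labeled character spans within the text
# (end offsets are exclusive).
ner = {
    "text": "Order 4821 shipped via FedEx.",
    "spans": [
        {"start": 6, "end": 10, "label": "ORDER_ID"},
        {"start": 23, "end": 28, "label": "CARRIER"},
    ],
}

# Nested structure: a conversation whose turns each carry their own labels.
conversation = {
    "turns": [
        {"speaker": "customer", "text": "Where is my order?", "labels": ["inquiry"]},
        {"speaker": "agent", "text": "Let me check for you.", "labels": []},
    ]
}
```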


As datasets grow, they tend to become harder to manage. Markers provides tools to make training data management easier.

Visualize your label distributions and ensure your training distribution matches your test (or prod) distributions. Track consensus between annotators with an agreement matrix. If some data looks wrong, simply exclude it from the dataset.


GDPR's not the only regulation forcing companies to reconsider their data practices. Many US states have begun passing consumer data protection bills with new requirements for companies: e.g., the California Consumer Privacy Act of 2018 requires that companies respond to requests about personal information (amongst other things). And the New York City Council unanimously voted for an "algorithmic transparency" bill mandating introspectability of the city's "automated decision systems".

The trend is clear: governments are getting more and more involved in protecting individuals' data rights. Data transparency isn't just good data science; it also helps you meet these regulatory challenges.

Markers provides tools for you to introspect your training data: check the label distributions for bias, or run a full-text search for a specific individual and remove their records if necessary.

Interested? Sign up for early access and launch-related news!


Have questions? Drop us a line, or reach us on Twitter at @markers_ai.