Get keyword regexes

We highly recommend using our native client, which has bindings for all major languages, handles caching for you, and is very efficient (< 1μs).

This endpoint returns regexes that must be cached on the server side. Use the Cache-Control header to set the cache expiration, or default to one hour. The regexes are updated regularly, but no more often than once a day.
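The caching rule above can be sketched as follows. This is a minimal illustration, assuming the response carries a `max-age` directive in its Cache-Control header; `ttl_from_cache_control` and `KeywordCache` are hypothetical names, and `fetch` is any callable you supply that returns the parsed payload together with the header value.

```python
import re
import time

DEFAULT_TTL_SECONDS = 3600  # the one-hour default suggested above


def ttl_from_cache_control(header_value):
    """Return max-age (seconds) from a Cache-Control value, else the default."""
    if header_value:
        m = re.search(r"(?:^|[\s,])max-age\s*=\s*(\d+)", header_value, re.IGNORECASE)
        if m:
            return int(m.group(1))
    return DEFAULT_TTL_SECONDS


class KeywordCache:
    """Hold one fetched payload in memory until its TTL runs out."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch          # callable returning (payload, cache_control)
        self._clock = clock          # injectable so expiry can be tested
        self._payload = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._payload is None or now >= self._expires_at:
            payload, cache_control = self._fetch()
            self._payload = payload
            self._expires_at = now + ttl_from_cache_control(cache_control)
        return self._payload
```

Calling `get()` repeatedly only triggers a new request once the previous payload has expired, which keeps traffic to the endpoint well below the update frequency.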

This endpoint returns two sets of PCRE-compliant regexes: a concise regex to preprocess your input string, and a list of large keyword regexes to match the preprocessed string against.

Here’s an example response, with a small sample of keywords:

{
  "version": "20220206",
  "filter": "category=eating,parenting:confidence=high,medium",
  "regexes": {
    "preprocess": "[^\\p{L}\\p{Nd}]",
    "keywords": [
      {
        "regex": "(^| +)(i|!|1)+ *(s|\\$|z)+ *h+ *(o|0)+ *u+ *l+ *d+ *k+ *(i|!|1)+ *l+ *l+ *m+ *y+ *(s|\\$|z)+ *(e|3)+ *l+ *f+( +|$)",
        "category": "suicide",
        "confidence": "high"
      },
      {
        "regex": "(^| +)d+ *(o|0)+ *n+ *t+ *f+ *(e|3)+ *l+ *l+ *(i|!|1)+ *k+ *(e|3)+ *l+ *(i|!|1)+ *v+ *(i|!|1)+ *n+ *g+ *(a|4)+ *n+ *y+ *m+ *(o|0)+ *r+ *(e|3)+( +|$)",
        "category": "suicide",
        "confidence": "high"
      }
    ]
  }
}
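If you prefer typed access over raw dictionaries, the response shape shown above can be loaded into small records. This is only a sketch based on the fields visible in the example response; `KeywordRegex`, `KeywordPayload`, and `parse_keyword_payload` are hypothetical names, and the real payload may carry fields not shown here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordRegex:
    regex: str
    category: str
    confidence: str


@dataclass(frozen=True)
class KeywordPayload:
    version: str
    filter: str
    preprocess: str
    keywords: tuple


def parse_keyword_payload(raw):
    """Accept a JSON string/bytes or an already-decoded dict."""
    data = json.loads(raw) if isinstance(raw, (str, bytes)) else raw
    regexes = data["regexes"]
    keywords = tuple(
        KeywordRegex(k["regex"], k["category"], k["confidence"])
        for k in regexes["keywords"]
    )
    return KeywordPayload(
        version=data["version"],
        filter=data.get("filter", ""),  # assumed optional
        preprocess=regexes["preprocess"],
        keywords=keywords,
    )
```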

The preprocess regex matches every character that is not a letter or a decimal digit, including spaces; replace each match with a single space. Preprocessing the input string on your end is an extra step, but it reduces the size of the keyword regexes dramatically.

Some users add spaces and non-word characters in an effort to evade detection. Imagine a user searches for “#i want.to kill   myself”.

Here’s an example (in Python). First, we use the preprocess regex as follows:

>>> import requests, regex
>>> koko_keywords = requests.get('https://api.kokocares.org/keywords', auth=('user', 'pass')).json()
>>> preprocess = koko_keywords["regexes"]["preprocess"]
>>> text = "#i want.to kill   myself"  # named text to avoid shadowing the built-in str
>>> preprocessed_string = regex.sub(preprocess, ' ', text)
>>> print(repr(preprocessed_string))
' i want to kill   myself'

Now we have a more standardized string that contains only letters, digits, and spaces. Runs of spaces are not collapsed, which is fine because the keyword regexes allow for them.

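We can then match this string against the list of keyword regexes, stopping at the first hit. Use a search rather than a match anchored at the start, so that a keyword is also found in the middle of a longer input.

>>> matched = False
>>> for r in koko_keywords["regexes"]["keywords"]:
...     if regex.search(r['regex'], preprocessed_string, regex.IGNORECASE):
...         matched = True
...         break
>>> print(matched)
True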

Make sure to use case-insensitive matching and, if possible, compile each regex once and cache the compiled object to improve performance. Support for Unicode property classes such as \p{L} is required, and some platforms need a third-party library for it. Python 3, for example, requires the regex library because the standard re module does not support \p{...} classes.
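The advice about compiling and caching can be sketched as below. `CompiledKeywords` and `first_match` are hypothetical names, not part of any Koko client. The fallback import exists only so the sketch loads without the third-party dependency; the real preprocess pattern uses \p{L} and will not compile with the standard re module.

```python
try:
    import regex as rx   # third-party; needed for \p{L} and \p{Nd}
except ImportError:      # fallback only so the sketch imports without it
    import re as rx      # NOTE: stdlib re cannot compile the real preprocess regex


class CompiledKeywords:
    """Compile the preprocess and keyword regexes once and reuse them."""

    def __init__(self, payload):
        regexes = payload["regexes"]
        self.version = payload.get("version")
        self._preprocess = rx.compile(regexes["preprocess"])
        # Case-insensitive matching, as the endpoint requires.
        self._keywords = [
            (rx.compile(k["regex"], rx.IGNORECASE), k["category"], k["confidence"])
            for k in regexes["keywords"]
        ]

    def first_match(self, text):
        """Return (category, confidence) of the first hit, or None."""
        cleaned = self._preprocess.sub(" ", text)
        for pattern, category, confidence in self._keywords:
            if pattern.search(cleaned):
                return category, confidence
        return None
```

Build one `CompiledKeywords` instance per fetched payload and keep it for as long as the payload is cached, so compilation cost is paid once per refresh rather than once per input string.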

Query Params
string

Filter the keywords by taxonomy using a colon-delimited list of “dimension=value” filters. Omitting a dimension means the results are not filtered by that dimension. For example: category=eating,parenting:confidence=1,2

This returns regexes with keywords for eating and parenting, with a confidence of 1 or 2 and any intensity.

Omitting the parameter entirely returns regexes for all keywords.

string

Use this to pin to a specific keyword version; otherwise the endpoint returns the latest keywords.
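The filter syntax described above is easy to get wrong by hand, so here is a small sketch that assembles it from a dictionary. The parameter names `filter` and `version` are assumptions (they mirror the field names in the example response); check them against your API reference before relying on them. `build_filter` and `build_query` are hypothetical helpers.

```python
def build_filter(dimensions):
    """Join {dimension: [values]} into 'dim=v1,v2:dim2=v3'."""
    parts = []
    for dimension, values in dimensions.items():
        if values:  # skip empty dimensions so they stay unfiltered
            joined = ",".join(str(v) for v in values)
            parts.append(f"{dimension}={joined}")
    return ":".join(parts)


def build_query(dimensions=None, version=None):
    """Return a params dict; the keys 'filter' and 'version' are assumed names."""
    params = {}
    if dimensions:
        filter_value = build_filter(dimensions)
        if filter_value:
            params["filter"] = filter_value
    if version:
        params["version"] = version
    return params
```

The resulting dictionary can be passed as the `params` argument of `requests.get`, which takes care of URL-encoding the colons, commas, and equals signs.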

Headers
string

When backwards-incompatible changes are made to the API, the API version is changed. The latest API version will be pinned to your API key, so you only need to add this header if you want to use a more up-to-date version. You can also update the pinned version by contacting us at [email protected].

Responses

401

Unauthenticated
