Get keyword regexes

GET http://api.kokocares.org/keywords

We highly recommend using our native client, which has bindings for all major languages, handles caching, and is very efficient (< 1μs).

This endpoint returns regexes that must be cached on the server side. Use the Cache-Control header to set the expiration, or default to one hour. The regexes are updated regularly, but no more often than once a day.
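The caching rule above could be sketched roughly as follows. This is a minimal illustration, not part of the Koko client: the names `ttl_from_cache_control` and `KeywordCache` are hypothetical, and it assumes the expiration arrives as a `max-age` directive in the response's Cache-Control header.

```python
import time


def ttl_from_cache_control(header, default=3600):
    """Return max-age (seconds) from a Cache-Control value, else the default."""
    for directive in (header or "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return int(value)
    return default  # one hour, as suggested above


class KeywordCache:
    """Keeps one fetched payload until its expiry time passes (hypothetical helper)."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch  # callable returning (payload, cache_control_header)
        self._clock = clock
        self._payload = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._payload is None or now >= self._expires_at:
            payload, header = self._fetch()
            self._payload = payload
            self._expires_at = now + ttl_from_cache_control(header)
        return self._payload
```

In practice, `fetch` would wrap the HTTP request and return the parsed JSON together with the response's Cache-Control header value.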

This endpoint returns two sets of PCRE-compliant regexes: a concise regex for preprocessing your input string, and a list of large keyword regexes to match against.

Here’s an example response, with a small sample of keywords:

{
  "version": "20220206",
  "filter": "category=eating,parenting:confidence=high,medium",
  "regexes": {
    "preprocess": "[^\\p{L}\\p{Nd}]",
    "keywords": [
      {
        "regex": "(^| +)(i|!|1)+ *(s|\\$|z)+ *h+ *(o|0)+ *u+ *l+ *d+ *k+ *(i|!|1)+ *l+ *l+ *m+ *y+ *(s|\\$|z)+ *(e|3)+ *l+ *f+( +|$)",
        "category": "suicide",
        "confidence": "high"
      },
      {
        "regex": "(^| +)d+ *(o|0)+ *n+ *t+ *f+ *(e|3)+ *l+ *l+ *(i|!|1)+ *k+ *(e|3)+ *l+ *(i|!|1)+ *v+ *(i|!|1)+ *n+ *g+ *(a|4)+ *n+ *y+ *m+ *(o|0)+ *r+ *(e|3)+( +|$)",
        "category": "suicide",
        "confidence": "high"
      }
    ]
  }
}
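Each keyword entry carries a category and a confidence level, so a consumer may want to act on only some of them. The following is a rough sketch under that assumption; `select_keywords` is a hypothetical helper, and it relies only on the field names shown in the example response.

```python
def select_keywords(payload, categories=None, confidences=None):
    """Return keyword entries whose category and confidence are in the given sets.

    Passing None for either argument means "do not filter on this field".
    """
    selected = []
    for entry in payload["regexes"]["keywords"]:
        if categories is not None and entry["category"] not in categories:
            continue
        if confidences is not None and entry["confidence"] not in confidences:
            continue
        selected.append(entry)
    return selected
```

For example, `select_keywords(payload, confidences={"high"})` would keep only the high-confidence entries.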

The preprocess regex matches every character in the input string that is not a letter or a digit, such as punctuation and other non-word characters; replace each match with a single space. Preprocessing the input string on your end is an extra step, but it dramatically reduces the size of the keyword regexes.

Here’s an example (in Python):

Some users add spaces and non-word characters in an effort to evade detection. Imagine a user enters “#i want.to kill   myself”.

First, we use the preprocess regex as follows:

>>> import requests, regex
>>> # Fetch the regexes (replace 'user' and 'pass' with your credentials)
>>> koko_keywords = requests.get('https://api.kokocares.org/keywords', auth=('user', 'pass')).json()
>>> preprocess = koko_keywords["regexes"]["preprocess"]
>>> text = "#i want.to kill   myself"
>>> # Replace each character that is not a letter or digit with a space
>>> preprocessed_string = regex.sub(preprocess, ' ', text)
>>> preprocessed_string
' i want to kill   myself'

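Now we have a more standardized string that contains only letters, digits, and spaces. Leading or repeated spaces are fine, because the keyword regexes allow for them.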

We can then match this term against the list of keyword regexes.

>>> matched = False
>>> for r in koko_keywords["regexes"]["keywords"]:
...     # Use search rather than match: a keyword may appear anywhere in the string
...     if regex.search(r['regex'], preprocessed_string, regex.IGNORECASE):
...         matched = True
...         break
...
>>> print(matched)
True

Make sure to use case-insensitive matching and, if possible, compile the regexes and cache them to improve performance. Also note that the regexes require support for Unicode constructs, which on some platforms means a third-party library. Python 3, for example, requires the regex library, since the standard re module does not support Unicode property escapes such as \p{L}.
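The compile-and-cache advice above might look roughly like this in Python. It is a sketch, not the native client's implementation: `compile_keywords` and `first_match` are hypothetical names, and the fallback to the standard `re` module exists only so the snippet runs without the third-party package (it cannot compile `\p{...}` patterns, so real use should install `regex`).

```python
try:
    import regex as re_engine  # third-party; needed for \p{...} constructs
except ImportError:
    import re as re_engine  # fallback for this sketch only; no \p{...} support


def compile_keywords(payload):
    """Compile every keyword regex once, case-insensitively."""
    return [
        (re_engine.compile(k["regex"], re_engine.IGNORECASE),
         k["category"], k["confidence"])
        for k in payload["regexes"]["keywords"]
    ]


def first_match(compiled, preprocessed):
    """Return (category, confidence) of the first matching keyword, or None."""
    for pattern, category, confidence in compiled:
        if pattern.search(preprocessed):
            return category, confidence
    return None


# One entry taken from the example response above.
sample = {
    "regexes": {
        "keywords": [
            {
                "regex": "(^| +)(i|!|1)+ *(s|\\$|z)+ *h+ *(o|0)+ *u+ *l+ *d+ *k+ *(i|!|1)+ *l+ *l+ *m+ *y+ *(s|\\$|z)+ *(e|3)+ *l+ *f+( +|$)",
                "category": "suicide",
                "confidence": "high",
            }
        ]
    }
}

# Compile once (for example, whenever a fresh payload is cached), then reuse.
compiled = compile_keywords(sample)
result = first_match(compiled, " I should kill myself ")
```

Recompiling only when the cached payload is refreshed keeps the per-request cost down to the search itself.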