ftmq-search provides search store logic for FollowTheMoney data.
The aim is to experiment with different full-text search backends for efficient shallow search of entities.
Currently supported backends:
- SQLite FTS5
- Elasticsearch
- Tantivy (persistent or in-memory)
Requires Python 3.11 or later.
pip install ftmq-search
ftmqs transform -i entities.ftm.json > entities.transformed.json
Speed up the transform step with GNU Parallel:
cat entities.ftm.json | parallel -j8 --pipe --roundrobin ftmqs transform > entities.transformed.json
ftmqs --uri sqlite:///ftmqs.store index -i entities.transformed.json
ftmqs --uri http://localhost:9200 index -i entities.transformed.json
Elasticsearch indexing can be parallelized with GNU Parallel:
cat entities.transformed.json | parallel -j8 --pipe --roundrobin ftmqs --uri http://localhost:9200 index
ftmqs --uri tantivy://tantivy.db index -i entities.transformed.json
ftmqs search <query>
ftmqs autocomplete <query>
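For intuition about what a full-text search and a prefix-based autocomplete involve on the SQLite FTS5 backend, here is a minimal sketch using only Python's standard `sqlite3` module. It is not the ftmqs implementation: the table name `entities`, its columns, and the helper functions `search` and `autocomplete` are hypothetical, and it assumes the local SQLite build ships with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema for illustration only; not the table layout ftmqs uses.
# UNINDEXED columns are stored but excluded from the full-text index.
conn.execute(
    "CREATE VIRTUAL TABLE entities "
    "USING fts5(id UNINDEXED, entity_schema UNINDEXED, name)"
)
conn.executemany(
    "INSERT INTO entities (id, entity_schema, name) VALUES (?, ?, ?)",
    [
        ("p1", "Person", "Jane Doe"),
        ("p2", "Person", "John Doe"),
        ("c1", "Company", "Doe Holdings Ltd"),
    ],
)


def search(conn, query, schema=None):
    """Full-text search; FTS5 treats space-separated terms as an implicit AND."""
    sql = "SELECT id, name FROM entities WHERE entities MATCH ?"
    params = [query]
    if schema is not None:
        # Plain SQL filter on a stored, non-indexed column.
        sql += " AND entity_schema = ?"
        params.append(schema)
    sql += " ORDER BY rank"  # best match first
    return conn.execute(sql, params).fetchall()


def autocomplete(conn, prefix):
    """Prefix search: a quoted string followed by * matches token prefixes."""
    escaped = prefix.replace('"', '""')
    rows = conn.execute(
        "SELECT name FROM entities WHERE entities MATCH ? ORDER BY rank",
        (f'"{escaped}"*',),
    ).fetchall()
    return [name for (name,) in rows]


print(search(conn, "jane doe"))
print(search(conn, "doe", schema="Person"))
print(autocomplete(conn, "ja"))
```

The query string in `search` is passed to FTS5 unchanged, so input containing FTS5 syntax characters may need quoting first, as done in `autocomplete`.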
from ftmq import Query, smart_stream_proxies
from ftmqs import get_store
from ftmqs.logic import index_proxies
# elasticsearch
store = get_store("http://localhost:9200")
# sqlite
store = get_store("sqlite:///ftmqs.db")
# tantivy
store = get_store("tantivy://tantivy.db")
# tantivy in-memory
store = get_store("memory://")
# index entity data
proxies = smart_stream_proxies("./entities.ftm.json")
index_proxies(proxies, store)
# search
store.search("jane doe")
# filter for country and schema
q = Query().where(country="de", schema="Person")
store.search("jane doe", q)
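The store URIs above differ only in their scheme, which is what selects the backend. A minimal sketch of that mapping, as it appears in the examples, might look like the following. The function `backend_for_uri` and the `BACKENDS` table are hypothetical helpers for illustration and are not part of the ftmqs API; the `https` entry is an assumption, since the examples only show `http`.

```python
from urllib.parse import urlparse

# Scheme-to-backend mapping as it appears in the examples above.
# "https" is an assumption; only "http" is shown in the examples.
BACKENDS = {
    "sqlite": "Sqlite FTS5",
    "http": "Elasticsearch",
    "https": "Elasticsearch",
    "tantivy": "Tantivy (persistent)",
    "memory": "Tantivy (in-memory)",
}


def backend_for_uri(uri: str) -> str:
    """Return a human-readable backend name for a store URI (hypothetical helper)."""
    scheme = urlparse(uri).scheme
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"Unsupported store URI scheme: {scheme!r}") from None


for uri in (
    "http://localhost:9200",
    "sqlite:///ftmqs.db",
    "tantivy://tantivy.db",
    "memory://",
):
    print(uri, "->", backend_for_uri(uri))
```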