dataresearchcenter/ftmq-search

ftmq-search

Search store logic for FollowTheMoney data.

The aim is to experiment with different full-text search backends for efficient shallow search of entities.

Currently supported backends: SQLite FTS, Elasticsearch, and Tantivy.

Install

Requires Python 3.11 or later.

pip install ftmq-search

Generate search documents

ftmqs transform -i entities.ftm.json > entities.transformed.json
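
To try the command without an existing dataset, a small input file can be written by hand. This is a minimal sketch, assuming the usual FollowTheMoney layout of one JSON entity per line with `id`, `schema`, and `properties` (each property holding a list of values). The ids and values are made up for illustration.

```python
import json

# Made-up sample entities in FollowTheMoney JSON form (illustrative only).
entities = [
    {
        "id": "jane-doe",
        "schema": "Person",
        "properties": {"name": ["Jane Doe"], "country": ["de"]},
    },
    {
        "id": "acme-gmbh",
        "schema": "Company",
        "properties": {"name": ["ACME GmbH"], "country": ["de"]},
    },
]

# Write one JSON object per line (line-delimited JSON).
with open("entities.ftm.json", "w", encoding="utf-8") as fh:
    for entity in entities:
        fh.write(json.dumps(entity) + "\n")
```

The resulting `entities.ftm.json` can then be passed to `ftmqs transform -i`.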

Speed it up with GNU Parallel:

cat entities.ftm.json | parallel -j8 --pipe --roundrobin ftmqs transform > entities.transformed.json
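
Splitting the input this way is safe because each line is an independent JSON record. The sketch below illustrates the round-robin idea only: it deals lines out one at a time, whereas GNU Parallel hands out larger blocks of lines, and the output order is not guaranteed to match the input order. The `round_robin` helper is hypothetical and not part of ftmqs.

```python
import json
from itertools import cycle


def round_robin(lines, workers):
    """Deal lines out to `workers` buckets in turn (hypothetical helper)."""
    buckets = [[] for _ in range(workers)]
    for bucket, line in zip(cycle(buckets), lines):
        bucket.append(line)
    return buckets


# Five made-up records, one JSON object per line.
lines = [json.dumps({"id": f"entity-{i}"}) for i in range(5)]
buckets = round_robin(lines, 2)

for n, bucket in enumerate(buckets):
    print(n, [json.loads(line)["id"] for line in bucket])
```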

Index transformed documents

SQLite FTS

ftmqs --uri sqlite:///ftmqs.store index -i entities.transformed.json

Elasticsearch

ftmqs --uri http://localhost:9200 index -i entities.transformed.json

Elasticsearch indexing can be parallelized:

cat entities.transformed.json | parallel -j8 --pipe --roundrobin ftmqs --uri http://localhost:9200 index

Tantivy

ftmqs --uri tantivy://tantivy.db index -i entities.transformed.json

Search

ftmqs search <query>

Autocomplete

ftmqs autocomplete <query>

Python

from ftmq import Query, smart_stream_proxies

from ftmqs import get_store
from ftmqs.logic import index_proxies

# pick one backend; each get_store() call below is an alternative
# elasticsearch
store = get_store("http://localhost:9200")

# sqlite
store = get_store("sqlite:///ftmqs.db")

# tantivy
store = get_store("tantivy://tantivy.db")

# tantivy in-memory
store = get_store("memory://")

# index entity data
proxies = smart_stream_proxies("./entities.ftm.json")
index_proxies(proxies, store)

# search
store.search("jane doe")

# filter for country and schema
q = Query().where(country="de", schema="Person")
store.search("jane doe", q)
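
The store URIs used above differ by scheme, so the backend can be read off the URI. The following is only a sketch of that mapping as it appears in this README. It is not how `get_store` is implemented, and `guess_backend` is a hypothetical helper.

```python
from urllib.parse import urlparse

# Scheme-to-backend mapping as shown in the examples of this README.
BACKENDS = {
    "http": "elasticsearch",
    "sqlite": "sqlite",
    "tantivy": "tantivy",
    "memory": "tantivy (in-memory)",
}


def guess_backend(uri):
    """Return the backend name for a store URI (hypothetical helper)."""
    scheme = urlparse(uri).scheme
    if scheme not in BACKENDS:
        raise ValueError(f"Unknown store URI scheme: {scheme!r}")
    return BACKENDS[scheme]


for uri in (
    "http://localhost:9200",
    "sqlite:///ftmqs.db",
    "tantivy://tantivy.db",
    "memory://",
):
    print(uri, "->", guess_backend(uri))
```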
