6 changes: 6 additions & 0 deletions datasketch/__init__.py
@@ -23,21 +23,27 @@
WeightedMinHashLSH = MinHashLSH
WeightedMinHashLSHForest = MinHashLSHForest

# Optional async export (requires motor or redis.asyncio)
try:
    from datasketch.aio import AsyncMinHashLSH
except ImportError:
    AsyncMinHashLSH = None  # type: ignore[misc,assignment]

__all__ = [
    "AsyncMinHashLSH",
    "HNSW",
    "HyperLogLog",
    "HyperLogLogPlusPlus",
    "LeanMinHash",
    "MinHash",
    "MinHashLSH",
    "MinHashLSHBloom",
    "MinHashLSHEnsemble",
    "MinHashLSHForest",
    "WeightedMinHash",
    "WeightedMinHashGenerator",
    "WeightedMinHashLSH",
    "WeightedMinHashLSHForest",
    "bBitMinHash",
    "sha1_hash32",
]

Check failure on line 49 in datasketch/__init__.py

GitHub Actions / ruff

Ruff (RUF022)

datasketch/__init__.py:32:11: RUF022 `__all__` is not sorted

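
Because the import in this file falls back to `AsyncMinHashLSH = None` when the async dependencies are missing, calling code has to check for `None` before use. A minimal sketch of that guard, written as a generic helper so it runs whether or not datasketch is installed; `optional_attr` is a hypothetical name, not part of datasketch:

```python
import importlib


def optional_attr(module_name, attr):
    """Return `module_name.attr`, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Same fallback as the diff: expose None instead of raising.
        return None
    return getattr(module, attr, None)


# Hypothetical caller-side guard.
AsyncMinHashLSH = optional_attr("datasketch.aio", "AsyncMinHashLSH")
if AsyncMinHashLSH is None:
    print("async LSH unavailable; motor or redis.asyncio is required")
```

The trade-off of exporting `None` is that a missing dependency surfaces only when the name is first used, so an explicit check like this keeps the failure message readable.
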
34 changes: 34 additions & 0 deletions datasketch/aio/__init__.py
@@ -0,0 +1,34 @@
"""Async MinHash LSH module.

This module provides asynchronous implementations of MinHash LSH for use with
async storage backends like MongoDB (via motor) and Redis (via redis.asyncio).

Example:

Check failure on line 6 in datasketch/aio/__init__.py

GitHub Actions / ruff

Ruff (D413)

datasketch/aio/__init__.py:6:1: D413 Missing blank line after last section ("Example")

    .. code-block:: python

        from datasketch.aio import AsyncMinHashLSH
        from datasketch import MinHash

        async def main():
            async with AsyncMinHashLSH(
                storage_config={"type": "aiomongo", "mongo": {"host": "localhost", "port": 27017}},
                threshold=0.5,
                num_perm=128
            ) as lsh:
                m = MinHash(num_perm=128)
                m.update(b"data")
                await lsh.insert("key", m)

medium

The example code will raise a TypeError because you are inserting a string key "key" when using aiomongo storage. By default (prepickle=False), aiomongo storage requires keys to be bytes.

Suggested change
- await lsh.insert("key", m)
+ await lsh.insert(b"key", m)
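
If the storage does require `bytes` keys as this comment states, callers holding string identifiers would need to encode them before inserting. A minimal sketch, with `to_key_bytes` as a hypothetical helper (not a datasketch function) and UTF-8 as an assumed encoding:

```python
def to_key_bytes(key, encoding="utf-8"):
    """Return `key` as bytes, encoding str keys and passing bytes through."""
    if isinstance(key, bytes):
        return key
    if isinstance(key, str):
        return key.encode(encoding)
    # Anything else is rejected rather than silently coerced.
    raise TypeError(f"unsupported key type: {type(key).__name__}")


print(to_key_bytes("key"))
```
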

                result = await lsh.query(m)
"""

from datasketch.aio.lsh import (
    AsyncMinHashLSH,
    AsyncMinHashLSHDeleteSession,
    AsyncMinHashLSHInsertionSession,
)

__all__ = [
    "AsyncMinHashLSH",
    "AsyncMinHashLSHInsertionSession",
    "AsyncMinHashLSHDeleteSession",
]

Check failure on line 34 in datasketch/aio/__init__.py

GitHub Actions / ruff

Ruff (RUF022)

datasketch/aio/__init__.py:30:11: RUF022 `__all__` is not sorted

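
Both RUF022 failures concern the ordering of `__all__`. Ruff's default "isort-style" order appears to place all-uppercase names first, then names starting with an uppercase letter, then everything else. A rough sketch of that ordering, assuming this reading of the rule is right; `isort_style_key` is a hypothetical helper, and it uses plain string comparison where Ruff uses a natural sort, so Ruff's own autofix remains the authoritative result:

```python
def isort_style_key(name):
    """Approximate sort key: constants, then classes, then the rest."""
    if len(name) > 1 and name.isupper():
        category = 0  # SCREAMING_CASE, e.g. "HNSW"
    elif name[:1].isupper():
        category = 1  # CamelCase, e.g. "MinHash"
    else:
        category = 2  # anything else, e.g. "sha1_hash32"
    return (category, name)


aio_all = [
    "AsyncMinHashLSH",
    "AsyncMinHashLSHInsertionSession",
    "AsyncMinHashLSHDeleteSession",
]
print(sorted(aio_all, key=isort_style_key))
```

Under this approximation, `HNSW` would sort ahead of `AsyncMinHashLSH` in `datasketch/__init__.py`, and `AsyncMinHashLSHDeleteSession` ahead of `AsyncMinHashLSHInsertionSession` in `datasketch/aio/__init__.py`.
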