[Enhancement] Add Visual Grounding / Detection by Image Input #4

@Interpause

async detectObjectsByText(
  imageFileUri: string,
  textPrompt: string,
  includeDescription: boolean
): Promise<DetectionResult> {
  return this.performDetection(imageFileUri, includeDescription, {
    model: "DINO-X-1.0",
    prompt: {
      type: "text",
      text: textPrompt
    },
    targets: ["bbox"],
    bbox_threshold: 0.25,
    iou_threshold: 0.8
  });
}

Hi, DINO-X supports visual prompts for visual-grounding-based object detection, right? Would it be possible to add a tool that lets the agent pass a reference image, given as a local file URL or an HTTPS URL, to use as the detection prompt? I think it would only need a variant of the function above. Thanks.
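To make the request concrete, here is a rough sketch of how the request body for an image-prompted variant might be built. It is only an illustration: the `"visual"` prompt type, its `image` field, and the helper names `isSupportedImageUri` and `buildVisualPromptRequest` are my own placeholders and are not taken from the DINO-X API or this repo, so the real prompt schema would need to be checked against the API docs. The model name, targets, and thresholds are copied from `detectObjectsByText`.

```typescript
// Prompt shape used by the existing detectObjectsByText.
type TextPrompt = { type: "text"; text: string };

// ASSUMPTION: hypothetical shape for an image prompt. The "visual" type name
// and the "image" field are placeholders, not confirmed DINO-X API fields.
type VisualPrompt = { type: "visual"; image: string };

interface DetectionRequest {
  model: string;
  prompt: TextPrompt | VisualPrompt;
  targets: string[];
  bbox_threshold: number;
  iou_threshold: number;
}

// Accept only the two forms mentioned in this issue: file:// and https:// URLs.
function isSupportedImageUri(uri: string): boolean {
  return /^(file|https):\/\//i.test(uri);
}

// Build the request body for an image-prompted detection call.
// Everything except the prompt mirrors the text-prompt function.
function buildVisualPromptRequest(promptImageUri: string): DetectionRequest {
  if (!isSupportedImageUri(promptImageUri)) {
    throw new Error("Unsupported prompt image URI: " + promptImageUri);
  }
  return {
    model: "DINO-X-1.0",
    prompt: { type: "visual", image: promptImageUri },
    targets: ["bbox"],
    bbox_threshold: 0.25,
    iou_threshold: 0.8,
  };
}
```

A new method (say, a hypothetical `detectObjectsByImage`) could then pass the result of `buildVisualPromptRequest(promptImageUri)` as the third argument to `performDetection`, the same way the text version does. A local file URL would presumably also have to be read and uploaded before the API can use it, which this sketch does not cover.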
