[Enhancement] Add Visual Grounding / Detection by Image Input

https://github.com/IDEA-Research/DINO-X-MCP/blob/83be2e9f02b1df931e03cdd1cbc57780d27d01aa/src/dino-x/client.ts#L186-L201

Hi, DINO-X supports visual prompts for visual-grounding-based object detection, right? Would it be possible to add a tool that lets the agent pass a local file URI or an HTTPS URL of an image to use as the detection prompt? I think only the function linked above (`detectObjectsByText`) would need to be modified, right? Thanks.

	async detectObjectsByText(
		imageFileUri: string,
		textPrompt: string,
		includeDescription: boolean
	): Promise<DetectionResult> {
		return this.performDetection(imageFileUri, includeDescription, {
			model: "DINO-X-1.0",
			prompt: {
				type: "text",
				text: textPrompt
			},
			targets: ["bbox"],
			bbox_threshold: 0.25,
			iou_threshold: 0.8
		});
	}
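
To make the request concrete, here is a rough sketch of the options object an image-prompted variant might hand to `performDetection`. It is only an illustration: the `"visual"` prompt type, its `image` field, and the helper names `isSupportedPromptUri` and `buildVisualPromptOptions` are assumptions of mine and are not taken from the DINO-X API or this repository. The model name and thresholds are copied from `detectObjectsByText`.

```typescript
// Hypothetical prompt shapes. Only the "text" variant appears in the existing client;
// the "visual" variant is an ASSUMED shape for illustration, not the confirmed API.
type TextPrompt = { type: "text"; text: string };
type VisualPrompt = { type: "visual"; image: string };

interface DetectionOptions<P> {
  model: string;
  prompt: P;
  targets: string[];
  bbox_threshold: number;
  iou_threshold: number;
}

// Accept only local file URIs (file://) and HTTPS URLs, as asked for in this issue.
function isSupportedPromptUri(uri: string): boolean {
  return /^(file|https):\/\//i.test(uri);
}

// Build the options for an image-prompted detection call (hypothetical helper).
function buildVisualPromptOptions(
  promptImageUri: string
): DetectionOptions<VisualPrompt> {
  if (!isSupportedPromptUri(promptImageUri)) {
    throw new Error(`Unsupported prompt image URI: ${promptImageUri}`);
  }
  return {
    model: "DINO-X-1.0",
    prompt: { type: "visual", image: promptImageUri }, // assumed field names
    targets: ["bbox"],
    bbox_threshold: 0.25,
    iou_threshold: 0.8
  };
}
```

A new method on the client could then, presumably, forward the result in the same way as the text version, for example `this.performDetection(imageFileUri, includeDescription, buildVisualPromptOptions(promptImageUri))`, once the real prompt format is confirmed.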

[Enhancement] Add Visual Grounding / Detection by Image Input #4
