
Feat: Support running scraper as a CLI #724

Merged
brafdlog merged 3 commits into master from runFromCmdLine on Feb 7, 2026
Conversation

@brafdlog (Owner) commented Jan 23, 2026

This allows running caspion from the command line for scraping.
I chose a lean approach rather than a full refactor. This is a very beneficial feature and, as you can see, it can be supported with a relatively small change.

Fixes #183


Copilot AI left a comment


Pull request overview

This PR adds CLI support for running Caspion's scraper in headless mode using the --scrape flag. This enables automated scraping via cron jobs without requiring the GUI to be open.

Changes:

  • Added a new scripts/scrape.js file that checks if source files have changed and rebuilds if necessary before launching the scraper
  • Modified the main entry point to detect CLI mode and bypass single-instance checks, allowing the scraper to run alongside the GUI (a rough sketch of this detection follows this list)
  • Changed the default maxConcurrency from 1 to 3 for better performance
  • Added comprehensive documentation in README.md with platform-specific examples and cron job configuration
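
To make the CLI-mode bullet above concrete, here is a minimal sketch of what --scrape detection in an Electron main process can look like. The --scrape flag and the isCliScrape variable name come from this PR; everything else (helper names, structure) is illustrative rather than the actual diff:

```ts
// Illustrative sketch only, not the real packages/main/src/index.ts.
// app.requestSingleInstanceLock() and app.whenReady() are standard Electron APIs.
import { app } from 'electron';

// Hypothetical stand-ins for the real scraping flow and GUI bootstrap.
async function runCliScrape(): Promise<void> { /* scrape configured accounts and export results */ }
function createMainWindow(): void { /* open the Electron BrowserWindow */ }

const isCliScrape = process.argv.includes('--scrape');

if (!isCliScrape) {
  // Normal GUI start: keep the single-instance lock so only one window runs.
  if (!app.requestSingleInstanceLock()) {
    app.quit();
  }
}
// In CLI mode the lock is skipped, so a cron-triggered scrape can run
// even while the GUI instance is open.

app.whenReady().then(async () => {
  if (isCliScrape) {
    await runCliScrape(); // headless path: scrape, then exit without a window
    app.quit();
  } else {
    createMainWindow();
  }
});
```

For scheduled runs, the PR adds a scrape script to package.json and README documentation with platform-specific examples and cron configuration; the exact commands to schedule live in that documentation, not in this sketch.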

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Summary per file:

  • scripts/scrape.js: new build-aware launcher script for CLI scraping in development mode
  • packages/main/src/index.ts: added CLI mode detection and scraping flow; bypasses the single-instance lock in CLI mode
  • packages/main/src/backend/import/importTransactions.ts: updated the default maxConcurrency from 1 to 3 (see the sketch below this list)
  • package.json: added the scrape script command
  • README.md: added comprehensive CLI mode documentation with platform-specific examples
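
As a side note on the maxConcurrency item above: that value caps how many scraper runs execute in parallel. The sketch below is a generic concurrency limiter, not the project's importTransactions.ts, purely to illustrate what moving the default from 1 (fully sequential) to 3 means:

```ts
// Generic illustration of a maxConcurrency cap; not the project's implementation.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  maxConcurrency = 3, // this PR raises the default from 1 to 3
): Promise<T[]> {
  const results: T[] = new Array<T>(tasks.length);
  let next = 0;

  // Spin up at most maxConcurrency workers; each pulls the next pending task.
  const workers = Array.from(
    { length: Math.min(maxConcurrency, tasks.length) },
    async () => {
      while (next < tasks.length) {
        const index = next++; // safe: JS is single-threaded between awaits
        results[index] = await tasks[index]();
      }
    },
  );

  await Promise.all(workers);
  return results;
}
```

With maxConcurrency = 1 this degenerates to a single worker, i.e. the old sequential behaviour.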


@baruchiro (Collaborator)

Although I left this requirement behind once I found the daniel-hauser/moneyman project, I will still review it.

Comment on lines +69 to +76
eventPublisher.onAny((eventName, eventData) => {
console.log(`[${eventName}]`, eventData?.message ?? '');
});
await scrapeAndUpdateOutputVendors(config, eventPublisher);
logAppEvent('CLI_SCRAPE_SUCCESS');
app.quit();
} catch (error) {
logAppEvent('CLI_SCRAPE_FAILED', { errorMessage: (error as Error).message });
Collaborator


I'm on mobile, but what is the difference between console.log and logAppEvent?

If logAppEvent is for special lifecycle keys, why are those keys not an enum?

Owner Author


logAppEvent writes structured logs to electron-log (persistent log files), while console.log here outputs scraping progress to stdout for CLI users running from terminal/cron. They serve different purposes — logAppEvent for persistent diagnostics, console for real-time CLI feedback.
Regarding enums — good point, but all existing event keys throughout the codebase (APP_READY, APP_QUIT, UPDATE_CHECK_START, etc.) are plain strings. We can create an enum for all of them in a separate PR to keep this one focused.
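
To make the enum suggestion concrete, a possible shape for that follow-up PR might be the sketch below. The key names are taken from this thread (APP_READY, APP_QUIT, UPDATE_CHECK_START, and the two CLI keys from the diff); the enum itself does not exist in the codebase yet:

```ts
// Hypothetical sketch of the proposed enum; not part of this PR.
enum AppEventKey {
  APP_READY = 'APP_READY',
  APP_QUIT = 'APP_QUIT',
  UPDATE_CHECK_START = 'UPDATE_CHECK_START',
  CLI_SCRAPE_SUCCESS = 'CLI_SCRAPE_SUCCESS',
  CLI_SCRAPE_FAILED = 'CLI_SCRAPE_FAILED',
}

// logAppEvent would then accept the enum instead of a bare string,
// so a typo in a lifecycle key fails at compile time.
declare function logAppEvent(event: AppEventKey, data?: Record<string, unknown>): void;

logAppEvent(AppEventKey.CLI_SCRAPE_SUCCESS);
logAppEvent(AppEventKey.CLI_SCRAPE_FAILED, { errorMessage: 'example' });
```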

Collaborator


What is this file? Is it for those who really want to cron it from source, where you're trying to avoid recompiling every time and only rebuild when needed?

Owner Author


Exactly
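
For readers landing here later, the idea confirmed above roughly amounts to the following. This is a sketch only; the real scripts/scrape.js is plain JavaScript, and the paths, build command, launch command, and change-detection heuristic below are assumptions:

```ts
// Rough sketch of the idea behind scripts/scrape.js; details may differ from the real file.
import { execSync } from 'node:child_process';
import { existsSync, readdirSync, statSync } from 'node:fs';
import { join } from 'node:path';

// Newest modification time under a directory tree (assumed heuristic for "sources changed").
function newestMtime(dir: string): number {
  let newest = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    const mtime = entry.isDirectory() ? newestMtime(full) : statSync(full).mtimeMs;
    newest = Math.max(newest, mtime);
  }
  return newest;
}

const srcDir = 'packages/main/src';   // assumption
const distDir = 'packages/main/dist'; // assumption

// Rebuild only when the build output is missing or older than the sources.
if (!existsSync(distDir) || newestMtime(srcDir) > newestMtime(distDir)) {
  execSync('npm run build', { stdio: 'inherit' }); // build command is an assumption
}

// Launch the app in CLI scrape mode; --scrape is the flag added in this PR,
// but the exact launch command here is an assumption.
execSync('npx electron . --scrape', { stdio: 'inherit' });
```

With something like this in place, a cron job only pays the rebuild cost when the sources actually changed, which is exactly the motivation confirmed above.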

Contributor

Copilot AI commented Jan 27, 2026

@baruchiro I've opened a new pull request, #725, to work on those changes. Once the pull request is ready, I'll request review from you.

Addresses code review feedback on #724 requesting removal of two inline
comments that don't add value beyond what the code clearly expresses.

**Changes:**
- Removed `// Check for CLI mode` comment before `isCliScrape` variable
declaration
- Removed `// CLI mode: run scraping and exit` comment at start of CLI
execution block

Both comments were redundant given the self-documenting variable names
and control flow.


Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: baruchiro <17686879+baruchiro@users.noreply.github.com>
brafdlog merged commit 601ed6b into master on Feb 7, 2026
10 checks passed
brafdlog deleted the runFromCmdLine branch on February 7, 2026, 20:51
github-actions bot (Contributor) commented Feb 8, 2026

🎉 This PR is included in version 2.16.2 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀



Development

Successfully merging this pull request may close these issues.

Feature Request: support CLI commands

3 participants