Releases: EdJoPaTo/website-stalker
v0.13.0
Split css_select into css_select and css_remove
This results in simpler configs for removing via css selector:
  editors:
-   - css_select:
-       selector: img
-       remove: true
+   - css_remove: img
This is a breaking change and also simplifies the internal logic.
img in html_markdownify
Images are now added to the markdown output.
Images need absolute URLs when the markdown is rendered as HTML, so html_url_canonicalize is helpful here.
If you do not want the images (the behavior before this release), add the editor css_remove: img to your config.
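A rough sketch of how these editors could be combined. The site URL is borrowed from the v0.8.0 example further down; the md extension and the editor order are assumptions rather than something this release prescribes:

```yaml
sites:
  - url: "https://edjopato.de/post/"
    extension: md  # assumption: markdown output is saved as .md
    editors:
      # make relative image paths absolute before converting to markdown
      - html_url_canonicalize
      - html_markdownify
```

To get the previous image-free output instead, adding `- css_remove: img` before `html_markdownify` should be enough.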
Minor Changes
- fix(git): work in repo without commits yet 7436dff
v0.12.1
v0.12.0
Editors
Two new editors json_prettify and html_url_canonicalize. 73814fb e51baf0
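A minimal sketch of json_prettify in a site entry, assuming the editor is used without options. The URL is hypothetical and only stands in for any endpoint that returns JSON:

```yaml
sites:
  - url: "https://example.com/api/status.json"  # hypothetical URL
    extension: json
    editors:
      # re-format the JSON response before it is saved
      - json_prettify
```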
IPv6 vs legacy IPv4
The log output now shows which kind of address was used. e034a70
v0.11.0
Simplify Git Logic
The git part was heavily updated. When running with --commit, the command now aborts when it is not in a git repo or the repo is unclean.
Without --commit, git add is no longer used, even if the repo is unclean. This simplifies testing out the ideal config before committing it.
With these changes, all the git logic is now handled via libgit2. The git binary is no longer a required dependency. ❇️
- feat(run)!: prevent --commit in a not clean repo 73800f0
- feat!: prevent --commit when not in a git repo 664837c
- fix(run): only git add when --commit 25fa0d8
- fix(git): dont integrate git diff and git status da23989
- feat(run): dont cleanup or reset b75d9f8
- refactor(run): simplify git finishup logic 8efda45
Warn on redirected URLs
Some URLs are redirected before the content is returned. This results in additional traffic and round trips. As this happens every time website-stalker runs, it adds up over time. To reduce traffic, the target of the redirect should be specified directly.
There is now a warning which shows which URL leads where and suggests using the target instead.
- feat: warn on redirected URLs to reduce traffic 4c9136c
Init command
You can now init a directory with a git repo (git init) and a config (website-stalker example-config > website-stalker.yaml) in one neat command:
website-stalker init
- feat(init): provide init folder/repo/config command 9842d9a
Case insensitive site filter
The site filter is now case insensitive. Where you previously had to use website-stalker run EdJoPaTo to run on https://EdJoPaTo.de, you can now also use website-stalker run edjopato
- feat(cli)!: site filter is now case insensitive 85af5f6
Config format is now fixed
Before, you could use other formats for the config, like website-stalker.toml. To simplify the config logic, the config now has to be a yaml file.
- refactor(config)!: simplify 4d5e390
Minor Changes
v0.10.0
html_markdownify
A new editor html_markdownify can create markdown from html input. See more details about this new editor in the README. e1798ee
html_textify
Now creates at most one empty line between non-empty lines db894e9 32fa6d1
Rename editors to be more like functions
Editors should now be more clear in what they are doing when they are applied. This is a breaking change and you have to adapt your configs in order to work with this release. 82cefbc
- html_text → html_textify
- css_selector → css_select
- regex_replacer → regex_replace
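As an illustration of the migration, here is the h1 example from the v0.8.0 notes further down, rewritten with the new editor names. Only the names change; the rest of the entry is assumed to stay as it was:

```yaml
sites:
  - url: "https://edjopato.de/post/"
    extension: txt
    editors:
      - css_select: h1  # was: css_selector
      - html_textify    # was: html_text
```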
v0.9.0
v0.8.0
More generic config file format
Each site in the config file is now more generic. Before each entry was an html or utf8 entry. Now each entry is basically the same.
Each entry has a URL and a file extension which is then used to save the resulting file.
Each site can also have editors. An editor manipulates the content before saving the result.
css_selector and regex_replacer are now editors. The default behavior of html to prettify the content is now an editor too: html_prettify.
Additionally this update includes a new editor html_text which only returns text entries from the HTML.
To give an example:
sites:
  - url: "https://edjopato.de/post/"
    extension: html
    editors:
      - css_selector: article
      - css_selector:
          selector: .meta
          remove: true
      - html_prettify
If you want to see a config migration see this commit.
css_selector remove elements
The css_selector can now remove matching HTML elements from the result. This is already included in the example above.
html_text Editor
This editor only returns text entries from the HTML.
To give an example: This will save every h1 heading to the resulting file.
  - url: "https://edjopato.de/post/"
    extension: txt
    editors:
      - css_selector: h1
      - html_text
systemd improvements
v0.7.1
v0.7.0
systemd files
Adds a systemd service and timer to be used locally 3a210f2
libgit2
Migrate some functions from running git as a commandline tool towards libgit2.
This should make handling and detecting easier on the code side of things.
Not everything is migrated (yet?). Some outputs like the git diff are just fine currently via the commandline command.
0074611 bf08eb4 20f07d5 9398cd6
This also allows for running from within a subfolder of a git repo 9398cd6