v0.2.93 #33

Merged
D4Vinci merged 30 commits into main from dev on Jan 31, 2025
D4Vinci commented on Jan 31, 2025

This is an essential update for everyone who wants to fully enjoy Scrapling as it's intended.

What's changed

  1. Return types are now consistent across the whole parser engine, so you will always get one of these: Adaptor, Adaptors, TextHandler, TextHandlers, None, or a list when the results are mixed, as with a combined CSS selector. This gives a better coding experience with minimal manual type checking, makes the library more stable, and makes chaining methods almost always possible.
  2. Most of the parser engine, especially the Adaptor class, was refactored into a cleaner and, most importantly, faster version. Almost all methods/properties, especially the searching methods, are now 5-40% faster. Some methods got bigger boosts: find_by_regex is ~60% faster! The automatch feature got a small ~5% speed boost.
  3. Fixed logic bugs in the find_all/find methods that made the passed filters apply in an OR fashion at some times and in an AND fashion at others. Now every returned element must fulfill all the filters you pass, except the passed tag names.
  4. All regex-related methods now return TextHandler/TextHandlers for easier method chaining.
  5. Added a new below_elements property that returns an Adaptors object of all elements under the current element in the DOM tree.
  6. All methods/properties that used to return HTML source as a string now return it as a TextHandler, so you can easily run regex on it, etc.
  7. StealthyFetcher is now a bit faster and more stealthy. Also, the option that makes it possible to click Captchas like Cloudflare Turnstile is now enabled by default.
  8. Auto-completion and type hints improved a lot in nearly half the library, especially for Adaptor, TextHandler, and TextHandlers.
  9. Slicing a TextHandler, accessing it by index, or using the split methods now returns another TextHandler instead of a standard Python string. Almost all standard string operations now return another TextHandler to make chaining methods/functions always possible.
  10. Fixed some small bugs and typos. For example, the Fetcher async_put was sending a POST request instead of a PUT request 😶‍🌫️
  11. Improved the README a bit until I finish the documentation website.
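
To make the filter semantics from item 3 concrete, here is a minimal, library-independent sketch. It is not Scrapling's actual implementation: the `find_all_sketch` and `matches` helpers and the dict-based element shape are hypothetical, used only to illustrate the rule that tag names act as alternatives while every other filter must hold.

```python
import re

# Hypothetical element shape for illustration only (not Scrapling's Adaptor):
# {"tag": str, "attrs": dict, "text": str}


def matches(element, tags=(), attrs=None, pattern=None, predicates=()):
    """Return True when the element satisfies every passed filter."""
    # Tag names are alternatives: the element needs to match any one of them.
    if tags and element["tag"] not in tags:
        return False
    # Every attribute filter must match (AND).
    for name, value in (attrs or {}).items():
        if element["attrs"].get(name) != value:
            return False
    # The regex filter, if given, must match the text (AND).
    if pattern is not None and not re.search(pattern, element["text"]):
        return False
    # Every callable filter must return a truthy value (AND).
    return all(predicate(element) for predicate in predicates)


def find_all_sketch(elements, **filters):
    """Hypothetical stand-in for a find_all-style search over plain dicts."""
    return [element for element in elements if matches(element, **filters)]


elements = [
    {"tag": "div", "attrs": {"class": "quote"}, "text": "Price 10"},
    {"tag": "span", "attrs": {"class": "quote"}, "text": "No digits"},
    {"tag": "p", "attrs": {"class": "quote"}, "text": "Price 30"},
    {"tag": "div", "attrs": {"class": "note"}, "text": "Price 40"},
]

# Tag names: div OR span. Attribute and regex filters: both required.
strict = find_all_sketch(
    elements, tags=("div", "span"), attrs={"class": "quote"}, pattern=r"\d+"
)
by_tag_only = find_all_sketch(elements, tags=("div", "span"))
```

Here `strict` keeps only the first element, because the `span` fails the regex filter, the `p` fails the tag filter, and the second `div` fails the attribute filter.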

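The idea behind item 9 (string operations that keep returning the custom type so chaining never falls back to a plain string) can be sketched with a small `str` subclass. This is an illustration under assumptions, not Scrapling's code: `ChainableText` and its `re_first` helper are hypothetical names defined here.

```python
import re


class ChainableText(str):
    """Hypothetical TextHandler-like class; not Scrapling's implementation."""

    def __getitem__(self, key):
        # Indexing and slicing both go through __getitem__; re-wrap the result.
        return ChainableText(super().__getitem__(key))

    def split(self, sep=None, maxsplit=-1):
        # Re-wrap each part so items of the returned list stay chainable.
        return [ChainableText(part) for part in super().split(sep, maxsplit)]

    def strip(self, chars=None):
        return ChainableText(super().strip(chars))

    def re_first(self, pattern):
        """Hypothetical helper: first regex match as ChainableText, or None."""
        match = re.search(pattern, self)
        return ChainableText(match.group(0)) if match else None


price = ChainableText("  Price: $42.50 USD  ")

# Each step returns ChainableText, so the regex helper is still available
# at the end of the chain.
amount = price.strip().split(": ")[1][1:].re_first(r"\d+\.\d+")
```

Without the overrides, `strip`, `split`, and slicing would each return a plain `str`, and the final `re_first` call would raise `AttributeError`.
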
This was supposed to be a small update until version 0.3, but I thought I'd make it better.
Thanks for all your support!


Shoutout to our biggest Sponsor: Scrapeless
It runs autocompletion on some IDEs
This allows elements in cross-origin iframes, such as the Turnstile checkbox, to be clicked.
…for async_fetch
Forgot to add it with the other commit
This is instead of the default disk-based cache.
This will provide a better autocompletion experience inside IDEs, commit affects:
- TextHandler
- AttributesHandler
…n fixes

and access by index return `TextHandler`
- Now return types are consistent across all the parser engine
- Parser got a 5-30% performance boost across different methods.
- Renamed some of the internal methods for clearer code.
- A lot better auto-completion experience after a lot of adjustments.
…a (and) fashion and other times (OR)

Also, the speed boost is 6-14% not 95% as I said first :(
So you can do regex on raw html if wanted etc...
D4Vinci merged commit e40a1e1 into main on Jan 31, 2025
5 checks passed