Speed up attribute lookup #651

Open
preciz wants to merge 1 commit into philss:main from preciz:speed_up_attribute_lookup_2

Conversation

preciz (Contributor) commented Dec 18, 2025

Improvements:
exact attribute => ~5% faster, ~50% lower memory usage
attribute present => ~10% faster, ~40% lower memory usage
attribute includes => ~100% faster, ~60% lower memory usage

This change uses more built-in functions instead of Enum, and uses String.contains? to check for a possible match before performing String.split.
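As a rough illustration of the String.contains? pre-check described above, here is a minimal sketch of an "attribute includes" (`~=`) test. The module and function names are hypothetical and this is not Floki's actual code; it also assumes words are separated by plain whitespace, which is what `String.split/1` handles.

```elixir
defmodule IncludesSketch do
  # Hypothetical helper for illustration only, not taken from Floki.
  # Mirrors `[attr~='value']`: `value` must equal one of the
  # whitespace-separated words in `attr_value`.
  def includes?(attr_value, value) when is_binary(attr_value) and is_binary(value) do
    # Cheap substring check first: if `value` does not occur anywhere,
    # no word can equal it, so the list-allocating split is skipped.
    String.contains?(attr_value, value) and value in String.split(attr_value)
  end
end

IncludesSketch.includes?("wikitable sortable", "wikitable")  # true
IncludesSketch.includes?("wikitables sortable", "wikitable") # false, substring only
IncludesSketch.includes?("infobox", "wikitable")             # false, split never runs
```

Most elements in a large document do not carry the class being searched for, so the common case returns after the substring check without building a list of words.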

  read_file = fn name ->
    __ENV__.file
    |> Path.dirname()
    |> Path.join(name)
    |> File.read!()
    |> Floki.parse_document!()
  end

  inputs = %{
    "big" => read_file.("big.html")
  }

  Benchee.run(
    %{
      "exact attribute" => fn doc -> Floki.find(doc, "[class='noprint']") end,
      "attribute present" => fn doc -> Floki.find(doc, "[title]") end,
      "attribute includes" => fn doc -> Floki.find(doc, "[class~='wikitable']") end
    },
    time: 10,
    inputs: inputs,
    memory_time: 2
  )

-  defp get_value(attr_name, attributes) do
-    Enum.find_value(attributes, "", fn
+  defp get_value(attr_name, attributes) when is_list(attributes) do
+    case List.keyfind(attributes, attr_name, 0) do
Owner

How much of the improvement came from this change?

If it is not that much, I would prefer to keep Enum.find_value/3, just because it's easier to read and maintain.
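For readers weighing the two styles, the sketch below puts them side by side. The diff shows only the first two lines of each version, so the clause bodies here are an assumed reconstruction, and the module and function names are hypothetical. It assumes attributes are a list of `{name, value}` tuples with string values.

```elixir
defmodule GetValueSketch do
  # Enum-based style: walks the list through the Enumerable machinery
  # and calls an anonymous function for every attribute.
  def with_enum(attr_name, attributes) do
    Enum.find_value(attributes, "", fn
      {^attr_name, value} -> value
      _other -> nil
    end)
  end

  # List.keyfind/3 style: looks up the tuple whose element at
  # position 0 equals `attr_name`, with no per-element closure call.
  def with_keyfind(attr_name, attributes) when is_list(attributes) do
    case List.keyfind(attributes, attr_name, 0) do
      {_name, value} -> value
      nil -> ""
    end
  end
end

attrs = [{"class", "noprint"}, {"title", "Home"}]

GetValueSketch.with_enum("class", attrs)    # "noprint"
GetValueSketch.with_keyfind("class", attrs) # "noprint"
GetValueSketch.with_enum("id", attrs)       # ""
GetValueSketch.with_keyfind("id", attrs)    # ""
```

Both versions return the same results for string values; the trade-off under discussion is readability against the cost of the extra function calls.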

Contributor Author

preciz commented Dec 19, 2025

I just ran the benchmarks back and forth, changing only this part, and it's faster in all cases. The biggest speedup is ~11% in the "exact attribute" case, but most significantly it halves the memory usage in the "exact attribute" and "attribute includes" cases.

I believe this library is used heavily by a lot of companies where every speedup has a huge effect on throughput.
That is the case for our company.


2 participants