Skip to content

Conversation

@kamenkremen
Copy link
Contributor

@kamenkremen kamenkremen commented Dec 18, 2025

For older Tarantool versions vshard uses func.call to execute functions.

If such a function raises an error during an uncommitted transaction, 'Transaction is active...' error is going to be raised, which will mask the original error.

This patch fixes that bug by checking the function call result. If the function returns an error and a transaction is still active, vshard now explicitly rolls back the transaction.

Fixes #614

NO_DOC=bugfix

Copy link
Collaborator

@Serpentian Serpentian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done! I'm glad, you've figured out the reason 🎉

P.S. And please, don't forget to rebase)

@Serpentian Serpentian assigned kamenkremen and unassigned Serpentian Dec 22, 2025
@kamenkremen kamenkremen force-pushed the gh-614 branch 5 times, most recently from 949d8a9 to 80bd895 Compare December 26, 2025 23:26
Copy link
Collaborator

@Serpentian Serpentian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there! Well done

Copy link
Collaborator

@Serpentian Serpentian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nits are left

Copy link
Collaborator

@Serpentian Serpentian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the patches! Well done

Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch 💪!

Comment on lines 704 to 716
local function test_local_call_return_type_consistency_template(add_to_schema,
g)
g.replica_1_a:exec(function(add_to_schema)
rawset(_G, 'test_return_tuple', function()
return box.tuple.new({100, 200, 'tuple'})
end)
if add_to_schema then
box.schema.func.create('test_return_tuple')
end
local status, result = ivshard.storage.call(1,
'read',
'test_return_tuple',
{})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument line wraps look malformed. Maybe make the function name shorter, or wrap all arguments to the next line like:


local function test_local_call_return_type_consistency_template(
    add_to_schema, g)
...

Same with the ivshard.storage.call().

Comment on lines 302 to 300
if not func then
return pcall(netbox_self_call, netbox_self, func_name, args)
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if my function left a transaction unfinished, and it was called via netbox_self_call(), then netbox_self_call() will roll it back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If the function throws an error during an active transaction, the transaction is rolled back in handle_eval_result. If the function leaves the transaction unfinished, it is rolled back after the function completes

@Serpentian Serpentian added the do not merge Not ready to be merged label Jan 26, 2026
@kamenkremen
Copy link
Contributor Author

I ran some performance tests, and here are the results.

Setup

Hardware

Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz

Tarantool

Tarantool 2.11.0-0-g247a9a4183
Target: Linux-x86_64-Release
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=TRUE
Compiler: GNU-8.3.1
C_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -fopenmp -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=./tarantool=. -std=c11 -Wall -Wextra -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type
CXX_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -fopenmp -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=./tarantool=. -std=c++11 -Wall -Wextra -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type

2 shards, 2 replicas in one shard, 9 routers, 9 load generators, which are created as follows:

tarantool generate_load.lua \
            --bucket_count 30000 \
            --op_type replace \
            --warmup \
            --uri localhost:3305,localhost:3306,localhost:3307,localhost:3308,localhost:3309,localhost:3310,localhost:3311,localhost:3312,localhost:3313 \
            --fibers 50 \
            --ops 1000000 \

Mean of 20 iterations

The perf test is from https://github.com/tarantool/vshard-ee/pull/51, with registered replace function in schema.

  • Before patch: 193721.8000 (100%)
  • After only patch that preserves original error on function fail: 193125.2000 (99.7%)
  • After whole patch: 175869.9000 (90.7%)

@Serpentian
Copy link
Collaborator

Let's do only that variant:

After only patch that preserves original error on function fail: 193125.2000 (99.7%)

The perf degradation is way too huge in the second variant. We'll leave #630 unresolved for now, since nobody complained about that yet.

@Serpentian Serpentian assigned kamenkremen and unassigned Gerold103 Feb 2, 2026
@kamenkremen kamenkremen force-pushed the gh-614 branch 2 times, most recently from ee34ee2 to 4c6abe1 Compare February 3, 2026 04:07
@kamenkremen kamenkremen assigned Serpentian and unassigned kamenkremen Feb 3, 2026
@Serpentian Serpentian requested a review from mrForza February 4, 2026 13:30
Copy link
Contributor

@mrForza mrForza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clean patch! 👍 Left some minor comments below

end
local status, err = ivshard.storage.call(1, 'read', 'test_fail', {})
ilt.assert_not(status)
local err_msg = tostring(err)
Copy link
Contributor

@mrForza mrForza Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to convert the whole error into a string, because it will contain an excess data: code, base_type, type, trace. It can slightly slow down the assert_(not)_str_contains. We need only the message of this error.

Lets simplify it:

ilt.assert_str_contains(err.message, 'test_error')
ilt.assert_not_str_contains(err.message, 'Transaction is active')

end, {global_cfg})
end

local function test_error_msg_is_preserved_template(add_to_schema, g)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: lets make this template function consistent with other ones: g argument should come first.

local function test_error_msg_is_preserved_template(g, add_to_schema)

and it will be looked just a bit pretty in test_error_msg_is_preserved function:

test_error_msg_is_preserved_template(g, false)
test_error_msg_is_preserved_template(g, true)

@mrForza mrForza assigned kamenkremen and unassigned mrForza Feb 5, 2026
-- transaction, then in Tarantool versions before 3.0.0-beta1-18,
-- the original error is replaced by the "Transaction is active" error.
-- We manually check if the function returned an error. If so, we
-- rollback the transaction to reveal the original error.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the comment should be in the handle_eval_function.

then in Tarantool versions before 3.0.0-beta1-18

It happens everywhere, so no need to mention version

@Serpentian Serpentian removed their assignment Feb 6, 2026
@Serpentian
Copy link
Collaborator

Serpentian commented Feb 6, 2026

I can confirm, that @kamenkremen commit doesn't cause major perf degradation, so I remove do not merge lablel, we`re safe now.

However, @Gerold103, can we go even further and drop that piece of shit from net.box, which we don't need at all:

    local results = {...}
    for i = 1, select('#', ...) do
        if type(results[i]) == 'cdata' then
            results[i] = msgpack.decode(msgpack.encode(results[i]))
        end
    end
    return unpack(results)

This breaks backward compatibility of vshard.storage.call, but who cares, when it costs us 10% of performance. Here're my measurements. This perf bump will affect all requests, which return cdata (e.g. get, replace, insert). The only one not affected - select (it returns table). All tests (even cartridge's ones) works: #641

5 workers (load generators), 9 routers, 2 shards, 2 replicas in one shard. replaces.

Before the patches:
Average per iteration: 186312.8500 (100%)

After Ivan`s commit:
Average per iteration: 185511.4500 (99.6%)

After two commits (Ivan's + Mine: #641):
Average per iteration: 204105.0500 (109.6%)

Edit

Actually, we must fix it in core: tarantool/tarantool#11624 (comment). The inconsistency between versions is much more serious: the return type on router becomes cdata instead of table. Id rather return cdata everywhere and that's it. When we'll fix it in core, we can use the net.box.self.call` from there, but on all versions before it we should reimplement net.box.self.call, as it's done in #641, in order to avoid that perf degradation.

This way we have the following backward compatibility breakage: when cdata is returned on old vshard versions, vshard automatically did table from it, from now on cdata itself is returned. IMHO, it's ok.

@Serpentian Serpentian removed the do not merge Not ready to be merged label Feb 6, 2026
In Tarantool versions prior to 3.0.0-beta1-18, `net_box.self.call()`
cannot invoke C stored or Lua persistent functions.
Therefore, if such a function is registered in `box.func`, vshard
executes it via `func.call` instead.

If such a function raises an error during an uncommitted transaction,
'Transaction is active...' error is going to be raised, which will mask
the original error.

This patch fixes that bug by checking the function call result. If the
function returns an error and a transaction is still active, vshard now
explicitly rolls back the transaction.

Fixes tarantool#614

NO_DOC=bugfix
@Gerold103
Copy link
Collaborator

I think it is fine to just drop netbox.self from vshard entirely. I can't imagine what can really break besides for people who were calling vshard.storage.call() locally on storage and were checking type() of results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

vshard.storage.call masks the original error thrown from a function in some cases

4 participants