Dear All,
I am sorry for bothering you...
In the just-uploaded feature/add-coarray-buckets branch I get a SIGSEGV that I am not able to debug; any hints are more than welcome. A full report follows.
The test
The test is minimal
program hasty_test_caf_get_clone
use, intrinsic :: iso_fortran_env, only : int32, int64, error_unit
use hasty
type(hash_table) :: a_table !< A table.
class(*), allocatable :: a_new_content !< A content.
call a_table%initialize(buckets_number=4, use_prime=.true.)
#ifdef CAF
call a_table%add_clone(key=3_int32, content=int(this_image(), int32))
critical
call a_table%get_clone(key=3_int32, content=a_new_content)
end critical
sync all
#endif
endprogram hasty_test_caf_get_clone
The get_clone method is
subroutine get_clone(self, key, content)
class(hash_table), intent(in) :: self !< The hash table.
class(*), intent(in) :: key !< The key.
class(*), allocatable, intent(out) :: content !< Content of the queried node.
integer(I4P) :: b !< Bucket index.
integer(I4P) :: i !< Image index.
if (self%is_initialized_) then
call self%get_bucket_image_indexes(key=key, bucket=b, image=i)
if (b>0) then
#ifdef CAF
call dictionary_get_clone(self%bucket(b)[i], key=key, content=content)
#else
call self%bucket(b)%get_clone(key=key, content=content)
#endif
endif
endif
endsubroutine get_clone
The statement call dictionary_get_clone(self%bucket(b)[i], key=key, content=content) is where all the evil starts.
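A possible suspect (an assumption on my part, not verified against HASTY's sources): since type(dictionary) holds pointer components (the linked-list head), passing the coindexed object self%bucket(b)[i] as an actual argument produces a local copy whose pointer components still hold addresses that belong to image i. If I read the Fortran 2008 standard correctly, C1237 (12.5.2.4) restricts exactly this case: an actual argument that is a coindexed object with a pointer ultimate component must correspond to a dummy with the VALUE or INTENT(IN) attribute, precisely because the copied pointers are meaningful only on the remote image, and dereferencing them locally is invalid. A minimal sketch of the hazard, with hypothetical types (not HASTY code):

```fortran
program coindexed_pointer_hazard
   implicit none
   type :: node_t
      integer :: v = 0
   end type node_t
   type :: dict_t
      type(node_t), pointer :: head => null()
   end type dict_t
   type(dict_t) :: d[*]  ! coarray of a type with a pointer component

   allocate(d%head)
   d%head%v = this_image()
   sync all
   if (num_images() > 1 .and. this_image() == 1) then
      ! d[2] is copied in: the copy's "head" holds image 2's address,
      ! which is NOT valid memory on this image.
      call inspect(d[2])
   endif
   sync all
contains
   subroutine inspect(x)
      type(dict_t), intent(in) :: x
      ! associated(x%head) may still report .true. (the descriptor was
      ! copied bit-wise), but dereferencing x%head%v here would touch
      ! image 2's memory: this is where a SIGSEGV would come from.
      print *, associated(x%head)
   endsubroutine inspect
end program coindexed_pointer_hazard
```

The same reasoning would apply to the allocatable key component that has_key queries: copying such derived types across images via a coindexed actual argument is an area where, as far as I know, OpenCoarrays support may still be incomplete.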
Results using OpenCoarrays/GNU gfortran
The call to get_clone raises a SIGSEGV if the number of images is greater than 1
stefano@zaghi(06:24 PM Wed Nov 30) on feature/add-coarray-buckets
~/fortran/HASTY 14 files, 356Kb
→ cafrun -np 2 ./exe/hasty_test_caf_get_clone
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f6966ba10af in ???
#1 0x40eae9 in __hasty_dictionary_node_MOD_has_key
at src/lib/hasty_dictionary_node.f90:97
#2 0x41095f in key_iterator_search
at src/lib/hasty_dictionary.f90:328
#3 0x40fa7e in __hasty_dictionary_MOD_traverse_iterator
at src/lib/hasty_dictionary.f90:521
#4 0x40fe5f in __hasty_dictionary_MOD_node
at src/lib/hasty_dictionary.f90:315
#5 0x410772 in __hasty_dictionary_MOD_get_pointer
at src/lib/hasty_dictionary.f90:223
#6 0x40fc98 in __hasty_dictionary_MOD_get_clone
at src/lib/hasty_dictionary.f90:206
#7 0x41362f in __hasty_hash_table_MOD_get_clone
at src/lib/hasty_hash_table.f90:236
#8 0x414ca7 in hasty_test_caf_get_clone
at src/tests/hasty_test_caf_get_clone.F90:23
#9 0x414d4d in main
at src/tests/hasty_test_caf_get_clone.F90:7
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 9044 RUNNING AT zaghi
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
Valgrind inspection
stefano@zaghi(06:24 PM Wed Nov 30) on feature/add-coarray-buckets
~/fortran/HASTY 14 files, 356Kb
→ valgrind --leak-check=yes cafrun -np 2 ./exe/hasty_test_caf_get_clone
==9448== Memcheck, a memory error detector
==9448== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==9448== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==9448== Command: /opt/arch/opencoarrays/build/bin/cafrun -np 2 ./exe/hasty_test_caf_get_clone
==9448==
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f7e933dd0af in ???
#1 0x40eae9 in __hasty_dictionary_node_MOD_has_key
at src/lib/hasty_dictionary_node.f90:97
#2 0x41095f in key_iterator_search
at src/lib/hasty_dictionary.f90:328
#3 0x40fa7e in __hasty_dictionary_MOD_traverse_iterator
at src/lib/hasty_dictionary.f90:521
#4 0x40fe5f in __hasty_dictionary_MOD_node
at src/lib/hasty_dictionary.f90:315
#5 0x410772 in __hasty_dictionary_MOD_get_pointer
at src/lib/hasty_dictionary.f90:223
#6 0x40fc98 in __hasty_dictionary_MOD_get_clone
at src/lib/hasty_dictionary.f90:206
#7 0x41362f in __hasty_hash_table_MOD_get_clone
at src/lib/hasty_hash_table.f90:236
#8 0x414ca7 in hasty_test_caf_get_clone
at src/tests/hasty_test_caf_get_clone.F90:23
#9 0x414d4d in main
at src/tests/hasty_test_caf_get_clone.F90:7
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 9452 RUNNING AT zaghi
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
==9448==
==9448== HEAP SUMMARY:
==9448== in use at exit: 101,606 bytes in 1,426 blocks
==9448== total heap usage: 4,560 allocs, 3,134 frees, 257,768 bytes allocated
==9448==
==9448== 12 bytes in 1 blocks are definitely lost in loss record 94 of 409
==9448== at 0x4C2AB8D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9448== by 0x47219D: xmalloc (in /usr/bin/bash)
==9448== by 0x46BDD8: set_default_locale (in /usr/bin/bash)
==9448== by 0x41A048: main (in /usr/bin/bash)
==9448==
==9448== LEAK SUMMARY:
==9448== definitely lost: 12 bytes in 1 blocks
==9448== indirectly lost: 0 bytes in 0 blocks
==9448== possibly lost: 0 bytes in 0 blocks
==9448== still reachable: 101,594 bytes in 1,425 blocks
==9448== suppressed: 0 bytes in 0 blocks
==9448== Reachable blocks (those to which a pointer was found) are not shown.
==9448== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==9448==
==9448== For counts of detected and suppressed errors, rerun with: -v
==9448== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
There are memory leaks, but I cannot understand why.
Digging deeper
I think that the final memory leak happens when I try to check if a node has a key here
elemental logical function has_key(self)
!< Return .true. if the node has a key (or id) set-up.
class(dictionary_node), intent(in) :: self !< The node.
has_key = allocated(self%key)
endfunction has_key
Note that self is not declared as a pointer, but when I invoke has_key as a type-bound method the passed object is likely a pointer into the list. Moreover, before calling has_key on a node pointer I check that the pointer is associated; see here
subroutine traverse_iterator(self, iterator)
!< Traverse dictionary from head to tail calling the iterator procedure.
class(dictionary), intent(in) :: self !< The dictionary.
procedure(iterator_interface) :: iterator !< The iterator procedure to call for each node.
type(dictionary_node), pointer :: p !< Pointer to scan the dictionary.
logical :: done !< Flag to set to true to stop traversing.
done = .false.
p => self%head
do
if (associated(p)) then
call iterator(node=p, done=done)
if (done) exit
p => p%next
else
exit
endif
enddo
endsubroutine traverse_iterator
The call iterator(node=p, done=done) statement is where I actually apply the has_key check to the node pointer p.
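One thing I notice: the associated(p) guard only protects against a disassociated local pointer; it cannot prove that the target is valid memory. A purely local illustration of this (a hypothetical example, not HASTY code; strictly speaking, querying q after the deallocation is itself non-conforming because its association status becomes undefined, which is exactly the point):

```fortran
program associated_is_not_safety
   implicit none
   integer, pointer :: p => null(), q => null()
   allocate(p)
   p = 42
   q => p          ! q aliases p's target
   deallocate(p)   ! q's target is gone, but q's descriptor is untouched
   ! In practice this typically still reports .true., showing that
   ! associated() is not a validity check on the target's memory.
   print *, associated(q)
end program associated_is_not_safety
```

In the coarray case the situation would be worse still: a pointer copied from another image looks associated locally, so the guard in traverse_iterator passes and the first dereference inside has_key segfaults.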
@LadaF @jeffhammond @MichaelSiehl @zbeekman @rouson do you have any suggestions? (I do not mean to force you to read all of this; just, what do you do in such situations?)
In such situations I generally try other compilers, but as you know, for this project I have to stick with GNU gfortran (OpenCoarrays).
O.T. @rouson @MichaelSiehl @zbeekman: I am failing to force a sync all for debugging output. Even echoing to the standard error unit and disabling all I/O buffering in my shell, the write(error_unit, ...) statements of my tests seem to be unaffected by sync all. Is sync all really like an MPI barrier, or am I misunderstanding (a lot)?
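For reference, this is the kind of pattern I am trying, with an explicit flush added in case buffering is the issue (my understanding, to be confirmed: sync all orders execution between images like an MPI barrier, but it does not flush or order file output, so without a flush the lines can appear in any order):

```fortran
program sync_flush_probe
   use, intrinsic :: iso_fortran_env, only : error_unit
   implicit none
   write(error_unit, '(a,i0)') 'before sync all, image ', this_image()
   ! sync all is an image-execution barrier, but (as far as I understand)
   ! it says nothing about I/O buffers, hence the explicit flush.
   flush(error_unit)
   sync all
   write(error_unit, '(a,i0)') 'after sync all, image ', this_image()
   flush(error_unit)
end program sync_flush_probe
```

Even with the explicit flush, how lines from different images are interleaved on the terminal seems to be up to the runtime and the OS, not something the standard guarantees.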