coarray "get": I got a SIGSEGV, but I do not understand why... #6

@szaghi

Description

Dear All,

I am sorry for bothering you...

In the just-uploaded feature/add-coarray-buckets branch I get a SIGSEGV that I am not able to debug; any hints are most welcome. A full report follows.

The test

The test is minimal:

program hasty_test_caf_get_clone
use, intrinsic :: iso_fortran_env, only : int32, int64, error_unit
use hasty

type(hash_table)      :: a_table       !< A table.
class(*), allocatable :: a_new_content !< A content.

call a_table%initialize(buckets_number=4, use_prime=.true.)

#ifdef CAF

call a_table%add_clone(key=3_int32, content=int(this_image(), int32))

critical
call a_table%get_clone(key=3_int32, content=a_new_content)
end critical
sync all

#endif
endprogram hasty_test_caf_get_clone

The get_clone method is:

  subroutine get_clone(self, key, content)
  class(hash_table),     intent(in)  :: self    !< The hash table.
  class(*),              intent(in)  :: key     !< The key.
  class(*), allocatable, intent(out) :: content !< Content of the queried node.
  integer(I4P)                       :: b       !< Bucket index.
  integer(I4P)                       :: i       !< Image index.
  
  if (self%is_initialized_) then
    call self%get_bucket_image_indexes(key=key, bucket=b, image=i)
    if (b>0) then
#ifdef CAF
      call dictionary_get_clone(self%bucket(b)[i], key=key, content=content)
#else
      call self%bucket(b)%get_clone(key=key, content=content)
#endif
    endif
  endif
  endsubroutine get_clone

The statement call dictionary_get_clone(self%bucket(b)[i], key=key, content=content) is where the trouble starts.
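My current reading of the failure (an assumption, not verified): each bucket is a derived type whose linked-list components are pointers, and a pointer component of a coindexed object such as self%bucket(b)[i] holds an address that is only meaningful on image i. Dereferencing it on another image reads foreign memory. A minimal sketch of that pattern (hypothetical, simplified types):

```fortran
program coindexed_pointer_sketch
use, intrinsic :: iso_fortran_env, only : int32
implicit none

type :: node
  integer(int32) :: value = 0
endtype node

type :: list
  type(node), pointer :: head => null() ! address valid only on the owning image
endtype list

type(list) :: bucket[*]

allocate(bucket%head)        ! each image allocates its own head...
bucket%head%value = this_image()
sync all

if (num_images() > 1 .and. this_image() == 1) then
  ! bucket[2]%head was associated on image 2: its target address belongs to
  ! image 2's address space, so a local dereference such as
  !   print *, bucket[2]%head%value
  ! is undefined and, I suspect, is the analogue of the crash above.
  print '(a)', 'dereferencing bucket[2]%head here would be undefined'
endif
endprogram coindexed_pointer_sketch
```

If this reading is right, passing self%bucket(b)[i] to dictionary_get_clone hands the callee a remote bucket whose internal list pointers cannot be walked locally; the data itself would have to be communicated instead.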

Results using OpenCoarrays/GNU gfortran

The call to get_clone raises a SIGSEGV when the number of images is greater than 1:

stefano@zaghi(06:24 PM Wed Nov 30) on feature/add-coarray-buckets
~/fortran/HASTY 14 files, 356Kb
→ cafrun -np 2 ./exe/hasty_test_caf_get_clone

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f6966ba10af in ???
#1  0x40eae9 in __hasty_dictionary_node_MOD_has_key
        at src/lib/hasty_dictionary_node.f90:97
#2  0x41095f in key_iterator_search
        at src/lib/hasty_dictionary.f90:328
#3  0x40fa7e in __hasty_dictionary_MOD_traverse_iterator
        at src/lib/hasty_dictionary.f90:521
#4  0x40fe5f in __hasty_dictionary_MOD_node
        at src/lib/hasty_dictionary.f90:315
#5  0x410772 in __hasty_dictionary_MOD_get_pointer
        at src/lib/hasty_dictionary.f90:223
#6  0x40fc98 in __hasty_dictionary_MOD_get_clone
        at src/lib/hasty_dictionary.f90:206
#7  0x41362f in __hasty_hash_table_MOD_get_clone
        at src/lib/hasty_hash_table.f90:236
#8  0x414ca7 in hasty_test_caf_get_clone
        at src/tests/hasty_test_caf_get_clone.F90:23
#9  0x414d4d in main
        at src/tests/hasty_test_caf_get_clone.F90:7

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 9044 RUNNING AT zaghi
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Valgrind inspection

stefano@zaghi(06:24 PM Wed Nov 30) on feature/add-coarray-buckets
~/fortran/HASTY 14 files, 356Kb
→ valgrind --leak-check=yes cafrun -np 2 ./exe/hasty_test_caf_get_clone
==9448== Memcheck, a memory error detector
==9448== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==9448== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==9448== Command: /opt/arch/opencoarrays/build/bin/cafrun -np 2 ./exe/hasty_test_caf_get_clone
==9448==

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f7e933dd0af in ???
#1  0x40eae9 in __hasty_dictionary_node_MOD_has_key
        at src/lib/hasty_dictionary_node.f90:97
#2  0x41095f in key_iterator_search
        at src/lib/hasty_dictionary.f90:328
#3  0x40fa7e in __hasty_dictionary_MOD_traverse_iterator
        at src/lib/hasty_dictionary.f90:521
#4  0x40fe5f in __hasty_dictionary_MOD_node
        at src/lib/hasty_dictionary.f90:315
#5  0x410772 in __hasty_dictionary_MOD_get_pointer
        at src/lib/hasty_dictionary.f90:223
#6  0x40fc98 in __hasty_dictionary_MOD_get_clone
        at src/lib/hasty_dictionary.f90:206
#7  0x41362f in __hasty_hash_table_MOD_get_clone
        at src/lib/hasty_hash_table.f90:236
#8  0x414ca7 in hasty_test_caf_get_clone
        at src/tests/hasty_test_caf_get_clone.F90:23
#9  0x414d4d in main
        at src/tests/hasty_test_caf_get_clone.F90:7

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 9452 RUNNING AT zaghi
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
==9448==
==9448== HEAP SUMMARY:
==9448==     in use at exit: 101,606 bytes in 1,426 blocks
==9448==   total heap usage: 4,560 allocs, 3,134 frees, 257,768 bytes allocated
==9448==
==9448== 12 bytes in 1 blocks are definitely lost in loss record 94 of 409
==9448==    at 0x4C2AB8D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9448==    by 0x47219D: xmalloc (in /usr/bin/bash)
==9448==    by 0x46BDD8: set_default_locale (in /usr/bin/bash)
==9448==    by 0x41A048: main (in /usr/bin/bash)
==9448==
==9448== LEAK SUMMARY:
==9448==    definitely lost: 12 bytes in 1 blocks
==9448==    indirectly lost: 0 bytes in 0 blocks
==9448==      possibly lost: 0 bytes in 0 blocks
==9448==    still reachable: 101,594 bytes in 1,425 blocks
==9448==         suppressed: 0 bytes in 0 blocks
==9448== Reachable blocks (those to which a pointer was found) are not shown.
==9448== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==9448==
==9448== For counts of detected and suppressed errors, rerun with: -v
==9448== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Valgrind reports a small definite leak, but its backtrace points into /usr/bin/bash: Valgrind here is attached to the cafrun launcher script rather than to the application itself, so this leak is probably unrelated to the crash.

Digging deeper

I think the invalid memory reference occurs when I check whether a node has a key, here:

  elemental logical function has_key(self)
  !< Return .true. if the node has a key (or id) set-up.
  class(dictionary_node), intent(in) :: self !< The node.

  has_key = allocated(self%key)
  endfunction has_key

Note that self is not declared as a pointer, but when has_key is invoked as a type-bound procedure the actual argument is in fact a pointer into the list. Moreover, before calling has_key on a node pointer I check that the pointer is associated, see here:

  subroutine traverse_iterator(self, iterator)
  !< Traverse dictionary from head to tail calling the iterator procedure.
  class(dictionary), intent(in)  :: self     !< The dictionary.
  procedure(iterator_interface)  :: iterator !< The iterator procedure to call for each node.
  type(dictionary_node), pointer :: p        !< Pointer to scan the dictionary.
  logical                        :: done     !< Flag to set to true to stop traversing.

  done = .false.
  p => self%head
  do
    if (associated(p)) then
      call iterator(node=p, done=done)
      if (done) exit
      p => p%next
    else
      exit
    endif
  enddo
  endsubroutine traverse_iterator

The call iterator(node=p, done=done) statement is where the has_key check is actually applied to the node pointer p.
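For what it is worth, the traverse/has_key pattern itself seems sound on a purely local list, which points the finger at the coindexed access rather than at the iterator logic. A local sanity sketch (hypothetical, simplified types):

```fortran
program local_traverse_sketch
implicit none

type :: dnode
  character(len=:), allocatable :: key
  type(dnode), pointer          :: next => null()
endtype dnode

type(dnode), pointer :: head, p
integer              :: n

allocate(head)      ; head%key = 'a'
allocate(head%next) ; head%next%key = 'b'

n = 0
p => head
do
  if (.not.associated(p)) exit   ! the associated-guard, as in traverse_iterator
  if (allocated(p%key)) n = n + 1 ! the has_key test, inlined
  p => p%next
enddo
print '(a,i0)', 'nodes with a key: ', n ! prints: nodes with a key: 2

deallocate(head%next)
deallocate(head)
endprogram local_traverse_sketch
```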

@LadaF @jeffhammond @MichaelSiehl @zbeekman @rouson do you have any suggestions? (I do not mean to force you to read all of this; I am just asking what you do in such situations.)

In such situations I would normally try other compilers, but as you know, for this project I have to stick with GNU gfortran (OpenCoarrays).

O.T. @rouson @MichaelSiehl @zbeekman I am failing to order debugging output with sync all: even writing to the standard error unit and disabling all I/O buffering in my shell, the write(error_unit, ...) statements of my tests seem unaffected by sync all. Is sync all really like an MPI barrier, or am I misunderstanding (a lot)?
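For what it is worth, my understanding (an assumption about the I/O side, not a definitive answer) is that sync all is indeed an execution barrier like MPI_Barrier, but it says nothing about when each image's buffered output actually reaches the terminal, so the best one can do is an explicit FLUSH before the barrier; even then, interleaving at the terminal is not guaranteed. A minimal sketch:

```fortran
program sync_flush_sketch
use, intrinsic :: iso_fortran_env, only : error_unit
implicit none

write(error_unit, '(a,i0)') 'before barrier, image ', this_image()
flush(error_unit) ! push this image's buffered output out before syncing
sync all          ! execution barrier: all images reach this point...
                  ! ...but terminal interleaving is still not guaranteed
write(error_unit, '(a,i0)') 'after barrier, image ', this_image()
flush(error_unit)
endprogram sync_flush_sketch
```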
