Sony's Developer World forum


    farapi can get into deadlock when SMP is on.

    Spresense
      jens6151

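      Although farapi_main tries to move the current task/thread to CPU0, this does not always succeed, because the workaround it relies on (a fixed 10 ms sleep after sched_setaffinity) is not fully reliable. As soon as the call runs on CPU1 even once, the system freezes immediately.
      You can raise the probability of seeing this by calling farapi at high frequency, for example by polling GNSS and/or keeping the camera stream active. Pin both to a core other than CPU0 so that a CPU switch is always required.
      My modification to the affinity code is marked in the excerpt below.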

        if (0 != cpu)
          {
            /* Save the current cpuset */
      
            sched_getaffinity(getpid(), sizeof(cpu_set_t), &cpuset0);
      
            /* Assign the current task to cpu0 */
      
            cpu_set_t cpuset1;
            CPU_ZERO(&cpuset1);
            CPU_SET(0, &cpuset1);
            sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset1);
      
            /* NOTE: a workaround to finish rescheduling */
      
            nxsig_usleep(10 * 1000);
      
            // my modification start
            // Keep waiting until the task really runs on CPU0
            while (up_cpu_index() != 0)
              {
                nxsig_usleep(100 * 1000);
              }
            // my modification end
          }    
      

      I observed this by adding prints before taking the lock and after releasing it:

        printf("#E#%d-%d-%s->%d,%d,%d=%08x###\n", sched_getcpu(), getpid(), rtcb->name, mlist->cpuno, mlist->mbxid, id, arg);
        farapi_semtake(&g_farlock);
      ...
      err:
        nxsem_post(&g_farlock);
        printf("###%d-%d#R#\n", sched_getcpu(), getpid());
      

      With the modification above, it seems to work well.
      Two additional notes, however:

      • Once, only the print after releasing the lock reported CPU1. This still puzzles me, because at that point the task must be pinned to CPU0. It did not freeze, and I have not seen it again.
      • Please note the documentation of sched_getcpu, quoted below: "The caller must allow for the possibility that the information returned is no longer current by the time the call returns." So it might be better to always pin, then check, then switch and wait, and then check again. (Addition: sched_getcpu just returns up_cpu_index.)
      /****************************************************************************
       * Name: sched_getcpu
       *
       * Description:
       *    sched_getcpu() returns the number of the CPU on which the calling
       *    thread is currently executing.
       *
       *    The return CPU number is guaranteed to be valid only at the time of
       *    the call.  Unless the CPU affinity has been fixed using
       *    sched_setaffinity(), the OS might change the CPU at any time.  The
       *    caller must allow for the possibility that the information returned is
       *    no longer current by the time the call returns.
       *
       *    Non-Standard.  Functionally equivalent to the GLIBC __GNU_SOURCE
       *    interface of the same name.
       *
       * Input Parameters:
       *   None
       *
       * Returned Value:
       *   A non-negative CPU number is returned on success.  -1 (ERROR) is
       *   returned on failure with the errno value set to indicate the cause of
       *   the failure.
       *
       ****************************************************************************/
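
The "pin, check, wait, check again" order suggested in the notes above could be sketched roughly as follows. This is only a sketch under assumptions: pin_and_confirm and its max_retries parameter are hypothetical names that are not part of farapi, and it uses the user-space calls sched_setaffinity, sched_getcpu and usleep rather than the kernel-internal up_cpu_index and nxsig_usleep seen in the excerpt. The retry limit is there so that a migration that never completes returns an error instead of spinning forever.

```c
#define _GNU_SOURCE   /* needed on glibc for sched_getcpu() and CPU_SET() */
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper, not part of farapi: pin the calling task to `cpu`,
 * then poll until the scheduler has really moved it there.
 * The previous affinity is stored in *saved so the caller can restore it.
 * Returns 0 on success, -1 on error or if the task is still on another
 * CPU after max_retries polls.
 */

static int pin_and_confirm(int cpu, int max_retries, cpu_set_t *saved)
{
  cpu_set_t target;
  int i;

  /* Save the current cpuset */

  if (sched_getaffinity(getpid(), sizeof(cpu_set_t), saved) != 0)
    {
      return -1;
    }

  /* Always pin first: a CPU number read before pinning may be stale */

  CPU_ZERO(&target);
  CPU_SET(cpu, &target);
  if (sched_setaffinity(getpid(), sizeof(cpu_set_t), &target) != 0)
    {
      return -1;
    }

  /* Check, wait, check again, up to max_retries times */

  for (i = 0; i < max_retries; i++)
    {
      if (sched_getcpu() == cpu)
        {
          return 0;
        }

      usleep(10 * 1000);
    }

  /* One final check after the last wait */

  return sched_getcpu() == cpu ? 0 : -1;
}
```

Because the affinity is fixed before the first check, a matching sched_getcpu() result stays valid until the saved cpuset is restored, which is the condition the quoted description requires.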
      
      Copyright © 2021 Sony Group Corporation. All rights reserved.