SparkFun Forums 

Where electronics enthusiasts find answers.

Topics pertaining to the Arduino Core & software used with the Artemis module and Artemis development boards.
By robin_hodgson
#216616
I am using the Arduino development environment to perform a bazillion SPI transfers to an LCD display. It takes a lot of operations to transfer a pixel image to a display one pixel at a time! A fair number of the transfers are failing with the following error:
Code:
got an error on _transfer: 4
This message gets printed to my Arduino Serial connection. It's not from my code, so I'm thinking that it must come from somewhere inside the Arduino SPI driver.

Whenever one of these errors occurs, the drawing process freezes up for a second or so, and leaves a dark spot instead of my pixel. It makes me think that there is some sort of timeout involved. I am not sure why there would be a timeout though. It's not like SPI needs to wait for anything unless the SPI transactions are being queued up inside the driver.

Does anyone know what this error means?
By robin_hodgson
#216617
I figured out that this is indeed a timeout error, but I still have no idea why I should be getting it. Here is where the message is generated inside the SparkFun Arduino core for SPI:
Code:
void SPIClass::_transfer(void *buf_out, void *buf_in, size_t count)
{
  ...
  retVal32 = am_hal_iom_blocking_transfer(_handle, &iomTransfer);
  ...
  if (retVal32 != 0)
  {
    Serial.printf("got an error on _transfer: %d\n", retVal32);
  }
}
The specific error 4 is defined here as a timeout error:
Code:
typedef enum
  {
    AM_HAL_STATUS_SUCCESS,
    AM_HAL_STATUS_FAIL,
    AM_HAL_STATUS_INVALID_HANDLE,
    AM_HAL_STATUS_IN_USE,
    AM_HAL_STATUS_TIMEOUT,
    AM_HAL_STATUS_OUT_OF_RANGE,
    AM_HAL_STATUS_INVALID_ARG,
    AM_HAL_STATUS_INVALID_OPERATION,
    AM_HAL_STATUS_MEM_ERR,
    AM_HAL_STATUS_HW_ERR,
    AM_HAL_STATUS_MODULE_SPECIFIC_START = 0x08000000,
  } am_hal_status_e;
Why should the HAL be timing out? I am sending tons of sequential SPI operations, but they are being sent via a blocking transfer mechanism, so they should not be queuing up anywhere.
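Since the driver only prints the raw number, a small lookup makes messages like "got an error on _transfer: 4" easier to read. The sketch below is a local copy of the enum ordering quoted above (it is not the Ambiq header, and `halStatusName` is a hypothetical helper name), so index N is the name of status code N.

```cpp
#include <cstdint>
#include <string_view>

// Local copy of the am_hal_status_e ordering quoted above, for illustration
// only. Array index == numeric status code.
constexpr std::string_view kHalStatusNames[] = {
    "AM_HAL_STATUS_SUCCESS",            // 0
    "AM_HAL_STATUS_FAIL",               // 1
    "AM_HAL_STATUS_INVALID_HANDLE",     // 2
    "AM_HAL_STATUS_IN_USE",             // 3
    "AM_HAL_STATUS_TIMEOUT",            // 4
    "AM_HAL_STATUS_OUT_OF_RANGE",       // 5
    "AM_HAL_STATUS_INVALID_ARG",        // 6
    "AM_HAL_STATUS_INVALID_OPERATION",  // 7
    "AM_HAL_STATUS_MEM_ERR",            // 8
    "AM_HAL_STATUS_HW_ERR",             // 9
};

// Hypothetical helper: translate a raw HAL status into its enum name.
std::string_view halStatusName(uint32_t status) {
  constexpr uint32_t kCount =
      sizeof(kHalStatusNames) / sizeof(kHalStatusNames[0]);
  if (status < kCount) return kHalStatusNames[status];
  // Codes from AM_HAL_STATUS_MODULE_SPECIFIC_START upward belong to
  // individual HAL modules and are not listed in the common enum.
  if (status >= 0x08000000u) return "module-specific status";
  return "unknown status";
}
```

For example, `halStatusName(4)` yields `AM_HAL_STATUS_TIMEOUT`, matching the reading of error 4 above.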
By robin_hodgson
#216618
OK, either I don't know how to use the SPI interface, or the Artemis Arduino SPI implementation is broken. To replicate the issue, try out this minimal Arduino sketch:
Code:
#include <SPI.h>

void setup() {
  SPI.begin();
  Serial.begin(115200);
}

void loop() {
  uint32_t cnt=0;
  while (1) {
    cnt++;
    if ((cnt%1000)==0) Serial.println(cnt);
    SPI.beginTransaction(SPISettings(8000000, MSBFIRST, SPI_MODE0));
    for (int i=0; i<4; i++) {
      SPI.transfer(i);
    }
    SPI.endTransaction();
  }
}
I don't care about asserting SS: it is fine for the SPI data to just vanish into the ether. All I care about is that I should be able to begin a transaction, send 4 bytes, and end the transaction. Repeat forever.

Interestingly, this test fails very roughly 1 out of every 2000 times. Here is some output from a typical test run:
Code:
1000
2000
3000
4000
5000
6000
7000
got an error on _transfer: 4
8000
9000
10000
got an error on _transfer: 4
11000
got an error on _transfer: 4
12000
got an error on _transfer: 4
got an error on _transfer: 4
13000
14000
got an error on _transfer: 4
15000
got an error on _transfer: 4
got an error on _transfer: 4
16000
17000
18000
19000
got an error on _transfer: 4
20000
21000
These errors occur randomly. Here is another run:
Code:
got an error on _transfer: 4
1000
2000
got an error on _transfer: 4
3000
got an error on _transfer: 4
4000
got an error on _transfer: 4
5000
6000
7000
got an error on _transfer: 4
8000
got an error on _transfer: 4
9000
10000
got an error on _transfer: 4
11000
12000
13000
14000
15000
16000
17000
got an error on _transfer: 4
got an error on _transfer: 4
18000
From a user perspective, this is an undetectable error: the Artemis SPI driver may generate an error message to the Serial port, but the Arduino transfer() mechanisms do not support returning error information to the user's application code, so the driver has no choice but to throw the error on the ground. Sadly, ignorance is not bliss in this case.

Finally, experimentation shows that if the inner loop transmits 4 or more bytes in the transaction, there will be errors. If I change that to 3 bytes or fewer, all the errors go away.
By robin_hodgson
#216621
On a whim, I tried the SPI transfer test program on another RedBoard I had lying around. Interestingly, I could not replicate the SPI errors on that second board. So I tried a third board I had made myself with a bare Artemis module on it. I let that system run the SPI transfer test overnight. As of this morning, it had no SPI transfer errors in over 440 million transfers. That would seem to suggest that my original board has a processor that misbehaves under rare circumstances. In my experience, that is a profoundly unusual occurrence, but maybe that's what it is. I would be interested to hear if anyone else sees any errors when running the same test.
By liquid.soulder
#216640
Thanks for writing this up Robin. That's surprising behavior to be sure. I also did not know that that debugging printf statement had snuck its way into a release.
By robin_hodgson
#216641
I am totally OK with the error message! It sure beats silently having a problem with a transfer.
By liquid.soulder
#216642
I agree, but I'm also uneasy about forcing the user to see it if they do not want to. Perhaps we can add a way to route some errors to a serial port of the user's choosing... or we could extend the Arduino SPI API so that there is an accessible report of the last status message. Logging issue on GitHub: https://github.com/sparkfun/Arduino_Apollo3/issues/196
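One way to picture the "accessible report of the last status" idea is a small status record that the driver updates after each HAL call, with an optional hook for printing errors wherever the user wants. This is a hypothetical sketch: `SpiStatusReport` and all of its methods are illustrative names that do not exist in the SparkFun Apollo3 core, and the assumption is that a driver would call `record()` with the return value of `am_hal_iom_blocking_transfer()`.

```cpp
#include <cstdint>

// Hypothetical status record for an SPI driver (not part of any real core).
class SpiStatusReport {
 public:
  using ErrorHook = void (*)(uint32_t status);

  // Optional hook, e.g. to print errors on a serial port of the user's
  // choosing instead of always printing to Serial.
  void setErrorHook(ErrorHook hook) { hook_ = hook; }

  // Assumed to be called by the driver after every HAL transfer.
  void record(uint32_t status) {
    last_ = status;
    if (status != 0) {  // 0 == AM_HAL_STATUS_SUCCESS
      lastError_ = status;
      ++errorCount_;
      if (hook_ != nullptr) hook_(status);
    }
  }

  uint32_t lastStatus() const { return last_; }    // status of latest call
  uint32_t lastError() const { return lastError_; }  // most recent failure
  uint32_t errorCount() const { return errorCount_; }
  void clearErrors() {
    lastError_ = 0;
    errorCount_ = 0;
  }

 private:
  ErrorHook hook_ = nullptr;
  uint32_t last_ = 0;
  uint32_t lastError_ = 0;
  uint32_t errorCount_ = 0;
};
```

Keeping an error count alongside the last status lets an application check once per operation (for example, after drawing a frame) whether any transfer failed since the last check, rather than inspecting every single `transfer()` call.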
By robin_hodgson
#216654
I can generate transfer error 4 timeout situations at will now. What follows is one way to do it.

I modified the SparkFun SPI _transfer() method to print whether it was being invoked to TX, RX, or both. This is not critical to generating the bug, but it helps explain the test output that follows:
Code:
void SPIClass::_transfer(void *buf_out, void *buf_in, size_t count)
{
    Serial.printf("_transfer(%s,%s,%d)\n", buf_out?"TX":"", buf_in?"RX":"", count);
...
It took a surprising amount of time to figure out that it could be generated in a trivially simple fashion:
Code:
    SPI.begin();
    uint8_t buffer[256];
    SPISettings settings = SPISettings();
    settings.clockFreq = 16000000;
    SPI.beginTransaction(settings);
    uint32_t testIter = 0;
    while (1) {
      Serial.println(testIter++);
      SPI.transfer(0);
      SPI.transferOut((void*)buffer, sizeof(buffer));
    }
Code:
0
_transfer(TX,RX,1)
_transfer(TX,,256)
got an error on _transfer: 4
1
_transfer(TX,RX,1)
got an error on _transfer: 4
The timeout error 4 shows up when there is a bidirectional transfer() followed by a unidirectional write transferOut().

Final Notes:
  • If I remove the bidirectional transfer(0) from the test code, the 256 byte transferOut() write runs perfectly forever
  • If I change the bidirectional transfer(0) call to be a 1-byte unidirectional transferOut(buffer, 1) call, then both transfers run perfectly forever
At the moment, I do not know whether this issue is in the SparkFun Arduino SPI driver, in the Ambiq HAL, or in an interaction between the two. But I'm pretty sure it's not my code for a change :)
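The second workaround in the notes (send the lone byte as a 1-byte `transferOut()` instead of a full-duplex `transfer()`) can be wrapped in a helper so it is applied consistently. This is a sketch under assumptions: `sendByteTxOnly` is a hypothetical name, the `transferOut(void*, size_t)` shape is taken from how it is called in this thread, and `FakeBus` is only a host-side stand-in for checking the helper off-target.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Send one byte write-only, so a following transferOut() is never preceded
// by a bidirectional transfer(). Bus is any type with
// transferOut(void*, size_t), such as the Artemis core's SPIClass as it is
// used in this thread.
template <typename Bus>
void sendByteTxOnly(Bus &bus, uint8_t value) {
  uint8_t tmp = value;  // transferOut() takes a non-const buffer
  bus.transferOut(&tmp, 1);
}

// Host-side stand-in that just records every byte "sent".
// Not part of any real SPI driver.
struct FakeBus {
  std::vector<uint8_t> sent;
  void transferOut(void *buf, size_t count) {
    const uint8_t *p = static_cast<const uint8_t *>(buf);
    sent.insert(sent.end(), p, p + count);
  }
};
```

On the board, the test loop above would then call `sendByteTxOnly(SPI, 0);` in place of `SPI.transfer(0);`. This only applies when the byte clocked back in is not needed.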
By stefan_haechler
#219271
Is there any news? I've run into strange behavior:
Code:
SPISettings settings(24000000, MSBFIRST, SPI_MODE0);
SPIEth.begin();
uint8_t b[256] = {0x00};
while (1) {
  SPIEth.beginTransaction(settings);
  //Serial.println("write");
  SPIEth.transferOut((void*)b, 256);
  //Serial.println("\nread");
  //for(int i = 0;i < 100;i++){
  SPIEth.transferIn((void*)b, 256);
  //}
  SPIEth.endTransaction();
  delay(4);
}
If the delay is removed, every SPI send and receive fails with error 4 after about 700 transfers.
With the delay set to 4 ms, errors appear after only about 40 transfers (and then persist for every SPI transfer afterwards).
With the delay set to 10 ms, errors appear after only 24 transfers...

With the delay set to 20 ms, it was stable... :?:

The timeout error comes from "am_hal_flash_delay_status_check(...)" in am_hal_flash.c, but this function is already called with a 0.5 s blocking wait time (AM_HAL_IOM_MAX_BLOCKING_WAIT).
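To make the timeout mechanism concrete, here is a simplified host-side model of a bounded status poll of the kind described: read a register, mask it, compare against an expected value, and give up with TIMEOUT once the wait budget is spent. This is not the Ambiq source; `pollWithTimeout` is a hypothetical name, the register read is a callback, and the delay between polls is reduced to an iteration count.

```cpp
#include <cstdint>
#include <functional>

// Same numeric values as the am_hal_status_e codes quoted earlier.
constexpr uint32_t kStatusSuccess = 0;
constexpr uint32_t kStatusTimeout = 4;

// Simplified bounded poll. In firmware each iteration would be followed by
// a short fixed delay, so maxPolls stands in for the blocking wait time.
// If the condition never holds, the caller gets TIMEOUT, which the SPI
// driver then prints as "error 4".
uint32_t pollWithTimeout(uint32_t maxPolls,
                         const std::function<uint32_t()> &readReg,
                         uint32_t mask, uint32_t expected,
                         bool waitForEqual) {
  for (uint32_t i = 0; i < maxPolls; ++i) {
    bool equal = (readReg() & mask) == expected;
    if (equal == waitForEqual) return kStatusSuccess;
    // (firmware: delay one step here before polling again)
  }
  return kStatusTimeout;
}
```

In this model, a status bit that is never set (for example, a peripheral that has stopped signaling completion) makes every call spend its full budget before returning TIMEOUT. That matches the symptom of a failed transfer stalling for a noticeable time rather than failing instantly.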
By robin_hodgson
#219275
I never bothered debugging it further after figuring out the workaround described above that resolved my specific issue.

I did see this a couple of days ago: https://support.ambiqmicro.com/hc/en-us ... e-release-

The gist is that there is a fix to 2.4.2 that deals with problems involving blocking full-duplex SPI transfers. That was right in the area of the issues I was seeing, but it is not clear that it would cover what you are seeing. That said, Ambiq has made improvements to the SPI driver and it might be worth applying the patch at that link to see if it improves your situation. Please let us know what you find!
By stefan_haechler
#219276
Update: it only occurs on some Artemis MCUs. The Artemis RedBoard gives the errors described above; on another Artemis module there were no errors at all.

I used the same HAL SDK and only non-duplex SPI transfers.
Last edited by stefan_haechler on Mon Sep 28, 2020 9:09 am, edited 1 time in total.
By robin_hodgson
#219277
That was my experience too. However, that does not mean it is a silicon issue. It feels like a driver problem involving something like a race condition.
By robin_hodgson
#219334
Today, I saw that Ambiq released SDK 2.5.1. It claims to have the SPI fixes inside it that were available as patches before now. I tried it out, and now my projects don't even build. Boo. It appears to me that they introduced a bug into a header file so that C++ projects won't compile. I filed a report with Ambiq. Of course, it wouldn't be the first time that I thought I found a bug that turned out to be my own problem :oops:
By stephenf
#219697
I've been running into this error and getting quite frustrated. I'm still fixing a few things in my own code and still using core 1.2.0, but the following seemed to help my cause and might be useful to someone:
- Fixing a memory leak caused by not closing files
- Removing some of my debugging statements, which were doing a lot of Serial printing, and also logging to a large buffer, which was eventually written to the SD card