SparkFun Forums 

By TriBob
#138715
I'm planning to add an ADXL345 3-axis digital accelerometer to my Logomatic V2, and I'd like to get comments from the forum before I let the smoke out of the chips.

I need to get the V2 to log data as fast as possible (3 axes @ 1 kHz is my minimum requirement; 3.2 kHz is the ADXL345 max), so I'm planning lots of s/w changes beyond the minimum needed to talk to the ADXL345.

I plan to connect the ADXL345 to the V2 as follows (via a breakout board to include decoupling caps):
Code: Select all
  ADXL345 signal:  Vdd  GND  -  GND  GND  Vs   CS   Int1  Int2  -   -   SDO    SDI    SCLK
     ADXL345 pin:  1    2    3  4    5    6    7    8     9     10  11  12     13     14
    Logomatic V2:  VCC  GND  -  GND  GND  VCC  CS1  P1    ???   -   -   MISO1  MOSI1  SCLK1
Note: Map LPC2148 pin 15 ("P1" on J6-2) as EINT3.

I intend to use the PackageTracker ADXL345 code, though that decision may change if that code is unable to support high data rates and key device features.

Other V2 software changes I'm planning include:
Code: Select all
1. Logging:
   - Pre-allocate log file:
     - Default to all of free uSD space, though this could make for a long startup.
     - Erase any allocated sector needing it. Easily avoidable by formatting first.
     - When closing the log file:
       - Truncate file to actual space used (free excess allocation).
       - Set the file modification time to match that of the config file.
   - Binary mode:
     - Write '$$' separator less often (or not at all) to reduce non-data uSD bandwidth
       (presently 25% loss for a 3-ADC read).
     - Start binary log file with a shebang and a self-dumping script:
       - Executing the log file under Linux/Windows(Cygwin)/MacOS creates a CSV translation.
       - Include field names and brief descriptions.
       - Script also documents binary translation algorithm changes across releases.

2. Config file:
   - New option to set log file pre-allocation size in KB. (Or default to all free uSD space?)
   - New option to log RTC tick counter with each sample. (Or use hack, "Trigger Character = t"?)

3. uSD buffers:
   - Maximize number of uSD buffers to cope with uSD timing irregularities.
   - Optimize system RAM use.

4. RTC:
   - Configure RTC from PCLK to avoid anomalous tick counter reads.
   - Consider initializing RTC date/time from config file modification timestamp.
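A host-side decoder for the binary mode sketched above is straightforward. This is a hypothetical sketch: it assumes the current V2 frame layout of a "$$" separator followed by three 16-bit ADC readings (8 bytes total), and the byte order shown (high byte first) is an assumption to check against the actual firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one Logomatic-style binary frame: a "$$" separator followed by
 * three 16-bit ADC readings.  Byte order is assumed big-endian here --
 * verify against the real firmware before relying on this.  Returns the
 * number of values decoded, or -1 if the separator is missing. */
int decode_frame(const uint8_t *buf, size_t len, uint16_t out[3])
{
    if (len < 8 || buf[0] != '$' || buf[1] != '$')
        return -1;
    for (int i = 0; i < 3; i++)
        out[i] = (uint16_t)((buf[2 + 2 * i] << 8) | buf[3 + 2 * i]);
    return 3;
}
```

A self-dumping log file would embed essentially this loop in its leading script, emitting one CSV row per frame.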
There are lots of other things I'd like to change, but they won't help me toward my immediate goal of logging ADXL345 data as rapidly as possible with minimal risk of data loss.

I haven't developed code for the V2 or the LPC2148 before: Any debugging hints I should know?

Comments on all the above?

Any suggestions for other changes?

Once I get everything working, where would be the best place to publish my code for ease of access by the SFE community? GitHub?

Thanks!
Last edited by TriBob on Mon Jan 23, 2012 11:16 pm, edited 2 times in total.
By skimask
#138716
Log the data to an FRAM chip, then write it to the SD card later. No waiting for writes. The clock and data pins can be shared with the SD card, so only one extra pin is needed for the FRAM CS line, or more if you want to use more than one FRAM chip.
The FRAM chips themselves are a bit spendy, but if you look on the RAMTRON website, you can get a few samples for free and try them out.
By Polux rsv
#138722
What about building your software structure to also read devices on I2C, like gyros, pressure sensors, or other accelerometers?
The configuration could contain various sections, each of which activates a device's setup and readings. The firmware would be generic, configured entirely through the config file. Something like:
[I2C]
label=BMA180 AccX ; label to put on first line
port=I2C1
baseaddress=0x64
initregister=0x03
initvalue=0x45
valuetype=word
valueregister=0x05

label=BMA180 AccY ; label to put on first line
port=I2C1
baseaddress=0x64
valuetype=word ; no init
valueregister=0x07

label=BMA180 AccZ ; label to put on first line
port=I2C1
baseaddress=0x64
valuetype=word ; no init
valueregister=0x09

...and so on with various value types: byte, word, int, float, big- or little-endian, ...

Keep the analog and serial readings. Serial could be binary, text, or GPS.
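This config-driven idea maps naturally onto a descriptor table. A minimal sketch, with hypothetical type and field names mirroring the config keys (the I2C transfer itself is omitted since it's hardware-specific):

```c
#include <assert.h>
#include <stdint.h>

/* One descriptor per config section; a generic loop would walk an array
 * of these, perform the optional init write, then read valueregister.
 * All names here are illustrative, not from any existing firmware. */
typedef enum { VAL_BYTE = 1, VAL_WORD = 2 } value_type;

typedef struct {
    const char *label;      /* label=         */
    uint8_t base_addr;      /* baseaddress=   */
    int has_init;           /* 1 if initregister/initvalue were present */
    uint8_t init_reg, init_val;
    value_type type;        /* valuetype=     */
    uint8_t value_reg;      /* valueregister= */
} sensor_desc;

/* Bytes to read for one sample of this sensor. */
int sample_size(const sensor_desc *d) { return (int)d->type; }
```

The logging loop then needs no per-device code at all: it just iterates the table built from the config file.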

Angelo
By UhClem
#138761
3200 SPS × 6 bytes/sample = 19,200 bytes/second.

This is not particularly difficult and you may not even need to use the FIFOs in the SSP.

You don't even need to pre-allocate the FAT so long as the FAT library isn't terminally stupid. (Scanning the FAT from the start at each cluster allocation is right out.)
By TriBob
#138774
@UhClem:
I also intend to log the 16-bit RTC tick counter (needed for statistical timing and sampling analyses), which bumps the byte count up to 8 bytes/sample, or 25,600 bytes/second. And that's assuming the "$$" binary separator text is eliminated, otherwise it would be 10 bytes/sample, or 32,000 bytes/second.

The Logomatic manual states the uSD card needs up to 40 ms to write a 512 byte sector, which is only 12,800 bytes/second. That's why the "Safety" feature exists in the current code. That's why the default baud rate for text logging is only 9600.

No matter how you slice it, the current V2 uSD write performance is pretty miserable and must be improved. Preferably tripled!

I'm confident the 40ms writes happen only occasionally, and are primarily due to dynamic FAT16 cluster allocation: Pre-allocation will eliminate them during log file writes. The FAT file system grows open files dynamically as they are written: Updating the cluster allocation for a file typically causes 2 additional flash sectors to be modified (erased and rewritten), turning a single sector write into a 2 sector read, a 2 sector erase, a 2 sector write, finally followed by the 1 sector write we were trying to do in the first place! This is just how the FAT file system on flash memory works.

Another source of delay occurs when the FAT16 library allocates a previously-used sector that is subsequently written to: The uSD card will implement an automatic erase cycle before writing the data, greatly delaying completion of the sector write. Remember, most filesystems don't erase sectors when a file is deleted - the filesystem just updates the free sector map, which is why file undelete tools work (the data is still there in the sectors). And remember that a flash sector must be erased before being re-written, unlike a hard disk.

Even the smartest FAT16 library can't get around this. It's part of the design of the filesystem and it's inherent in the characteristics of flash memory technology. Which is one reason (among many) why FAT is actually a lousy filesystem for use on flash media: Filesystems like YAFFS and JFFS are vastly superior, but only Linux systems will natively mount SD cards formatted this way. Which works fine for me, but isn't a feature I'd like to force on the world (despite being very tempted to do so).

Testing and clearing the pre-allocated log file will eliminate the risk of implicit erase cycles during log writing. But the need for this process can itself be eliminated if the user is careful to NEVER delete log files: Just use the card until it is full, then do a 'deep' reformat to reclaim and erase the space. So I probably won't implement this on the V2.

For a little extra insurance, creating more write buffers will spread any other occasional uSD write delays (such as automatic bad sector remapping) over a longer time period, reducing their impact on the average write rate, hopefully preventing any logging interruptions. And the RAM is there anyway, so why not put it to good use?
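The smoothing effect of extra buffers is easy to model. A toy simulation (all timings illustrative, not measured): sectors fill at a steady rate while writes are usually fast but occasionally stall, and the peak backlog tells you how many buffers are needed to ride out the stalls without dropping data.

```c
#include <assert.h>

/* Simulate a producer filling 512-byte sectors at a fixed interval and
 * a writer that is usually fast but stalls every `slow_every` sectors
 * (e.g. a FAT update).  Returns the peak number of filled-but-unwritten
 * sectors, i.e. the buffer count needed to avoid data loss.
 * All times are in microseconds. */
int peak_buffers_needed(int n_sectors, int fill_us, int fast_write_us,
                        int slow_write_us, int slow_every)
{
    int queued, peak = 0;
    long t_fill = 0, t_write = 0;
    for (int s = 0; s < n_sectors; s++) {
        t_fill += fill_us;            /* time this sector becomes ready */
        if (t_write < t_fill)
            t_write = t_fill;         /* writer was idle; it starts now */
        queued = (int)((t_write - t_fill) / fill_us) + 1;
        if (queued > peak)
            peak = queued;
        t_write += (s % slow_every == slow_every - 1)
                       ? slow_write_us : fast_write_us;
    }
    return peak;
}
```

With 20 ms to fill a sector and an occasional 40 ms stall, two buffers suffice; double the data rate and the same stall needs four. The point is that the required buffer count grows with the stall length divided by the fill time, which is why throwing the spare RAM at buffers is cheap insurance.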
By TriBob
#138776
@skimask:
I intend to log data at 3200 Hz for 4 hours (the life of the battery), which means I'll be generating log files up to 512 MB in size. Since the largest FRAM currently on sale holds only 1 Mbit (128 KB), I'd need something like 4000 FRAM chips to hold just one log file.
By TriBob
#138777
@Polux rsv:
My needs don't include any use of I2C. If you want that, it should be straightforward for you to port the relevant parts of the PackageTracker code.

I'm primarily interested in high-rate logging, and I2C is a slow bus.
By UhClem
#138814
TriBob wrote:@UhClem:
I also intend to log the 16-bit RTC tick counter (needed for statistical timing and sampling analyses), which bumps the byte count up to 8 bytes/sample, or 25,600 bytes/second. And that's assuming the "$$" binary separator text is eliminated, otherwise it would be 10 bytes/sample, or 32,000 bytes/second.
A minor point which I did consider but this is far below the maximum attainable speed of the SD card. In any case, why use the RTC counter?
Why not use one of the regular timers?
The Logomatic manual states the uSD card needs up to 40 ms to write a 512 byte sector, which is only 12,800 bytes/second. That's why the "Safety" feature exists in the current code. That's why the default baud rate for text logging is only 9600.
The FAT library code delivered with the Logomatic (and other choices, like a very slow SPI clock) are the reasons for these limits. Which are actually too high. It has been a while but I analyzed the FAT code and posted about it in a thread here somewhere. When it needs to allocate a new cluster it reads the entire FAT chain. Twice.

If you read the link to the firmware I posted earlier you would have seen that I achieved a write rate of over 300,000 bytes/second while reading the data from the ADCs. Part of that speed increase is from deferring all FAT table updates to file close. Note that I did not pre-erase any sectors. I am certain it could have gone even faster with better hardware on the micro-controller side as the limiting factor was getting bits over the SPI bus.

I learned a few things in the process. One is that writing a sector does not necessarily trigger an erase of that sector. For example, some cards use an erase pool: sectors are grouped together along with a number of spare sectors which are kept erased (the erase pool). When a sector is written, the data goes to a sector from the erase pool. The sector that used to respond to that address is then erased and sent to the erase pool.

This has at least two nice effects. The first is that an erase operation is not required before the write. The second is that it provides wear leveling, because even if you wrote a single sector repeatedly, the data would be stored in a set of sectors cycled through the erase pool.
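This erase-pool behavior can be modeled in a few lines. A toy model (pool and sector counts are illustrative, not from any real card) showing both effects: writes never wait for an erase of their target, and repeated writes to one logical sector spread wear across several physical sectors.

```c
#include <assert.h>

#define PHYS 8          /* physical sectors on the "card"  */
#define LOGICAL 6       /* sectors visible to the host     */

static int map[LOGICAL];        /* logical -> physical mapping     */
static int pool[PHYS];          /* FIFO of pre-erased sectors      */
static int head, tail, npool;
static int erase_count[PHYS];   /* erases seen per physical sector */

static void pool_put(int p) { pool[tail] = p; tail = (tail + 1) % PHYS; npool++; }
static int  pool_get(void)  { int p = pool[head]; head = (head + 1) % PHYS; npool--; return p; }

void flash_init(void)
{
    head = tail = npool = 0;
    for (int i = 0; i < LOGICAL; i++) map[i] = i;
    for (int i = 0; i < PHYS; i++) erase_count[i] = 0;
    for (int i = LOGICAL; i < PHYS; i++) pool_put(i);   /* spares start erased */
}

/* Write one logical sector: the data lands in a pre-erased physical
 * sector from the pool; the old physical sector is erased afterwards,
 * off the critical path, and returned to the pool.  Returns the
 * physical sector actually written. */
int flash_write(int logical)
{
    int fresh = pool_get();
    int old = map[logical];
    map[logical] = fresh;
    erase_count[old]++;         /* erase happens after the write */
    pool_put(old);
    return fresh;
}
```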

Another thing I learned and I had this point rammed home today, is that not all SD cards follow the specification. I just had an initialization bug in the new SDIO code I am working on that vanished when I swapped out the Kingston microSD card I was testing with.
Updating the cluster allocation for a file typically causes 2 additional flash sectors to be modified (erased and rewritten), turning a single sector write into a 2 sector read, a 2 sector erase, a 2 sector write, finally followed by the 1 sector write we were trying to do in the first place!
Your math is off a bit. The typical sequence, assuming no dedicated FAT sector buffer is:

read
write
write (if you update the copy of the FAT that nearly every FAT file system is created with)

Worst case is when the current cluster is in the last position in a sector:

read (next FAT sector)
write (modified next FAT sector)
read (current FAT sector)
write (modified current FAT sector)

(Don't forget the two extra writes for the copy.)

While you could issue explicit erase commands before the writes, SD cards usually hide the erase from you. You might get higher write speeds by pre-erasing but at your write speeds, it is not needed. (How would an explicit erase command interact with the erase pool?)

You might have noticed that this assumes no extra time spent skipping over clusters already allocated to another file. If that happens, add more sector reads to find a free cluster.
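This counting can be captured in a tiny model. The sketch below just tallies the extra sector operations per cluster allocation under the assumptions stated above (no dedicated FAT sector buffer; `fat_copies` is 2 on nearly every FAT volume); free-cluster search reads are left out, as noted.

```c
#include <assert.h>

/* Extra SD sector operations for one cluster allocation: one read plus
 * one write per FAT sector touched (two sectors when the cluster entry
 * straddles a sector boundary), plus one extra write per additional
 * FAT copy. */
int fat_alloc_ops(int straddles_boundary, int fat_copies)
{
    int sectors = straddles_boundary ? 2 : 1;
    int reads = sectors;
    int writes = sectors * fat_copies;
    return reads + writes;
}
```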


The thing about these updates that hurts performance is the requirement to transfer these extra sectors over the slow SPI bus. (The LPC2148 SPI port maximum clock speed is much lower than the maximum the SD cards support.) Worse, because the Logomatic connects the SD card to the SPI port and not the SSP port, there is no data FIFO available which means that servicing interrupts to gather your input data can slow down SD card transfers. If you are servicing an interrupt when the SPI shift register goes empty, you cannot fill it in a timely fashion.

Have I mentioned before that using the SPI port for the SD card sucks? :-)

Since cluster allocation doesn't happen too often, if you can buffer your data it doesn't hurt too bad. The Logomatic used exactly two 512 byte buffers so it was especially susceptible to this problem. This was an amazingly bad choice since there was about 30K of RAM not being used for anything.
By TriBob
#138821
why use the RTC counter? Why not use one of the regular timers?
The RTC can operate from either the Q1 or Q2 clock sources, allowing me to use whichever turns out to be most accurate and stable. Plus, I have other uses for the RTC I may implement, so it will be nice to have it up and running anyway.
If you read the link to the firmware I posted earlier
You didn't post it to this thread, so I haven't seen it. It would be courteous if you would provide the context for your comments WITH your comments.
Part of that speed increase is from deferring all FAT table updates to file close.
That's not a wise design or implementation choice, since it takes an unnecessary risk: What is the state of the card if logging is interrupted unexpectedly, such as by a shock dislodging the card or the battery? You'd see only a zero-length log file!

The major advantage of the pre-allocation approach is that the FAT is always in a consistent state during logging. If logging ends unexpectedly, pre-allocation ensures there will be a huge log file that has 0xFF from the last successful write to the end of the file, something that is trivially easy to deal with.
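Recovering data from an interrupted pre-allocated log is then trivial. A minimal sketch, assuming the pre-allocation clears the file to 0xFF as described (a real tool would need a frame-aware scan, since legitimate 0xFF data bytes can occur at the very end of the stream):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A pre-allocated log file is cleared to 0xFF, so after an interrupted
 * run the valid data is everything before the trailing run of 0xFF
 * bytes.  Scan backwards for the first non-0xFF byte and return the
 * length of the valid prefix. */
size_t valid_log_length(const uint8_t *buf, size_t len)
{
    while (len > 0 && buf[len - 1] == 0xFF)
        len--;
    return len;
}
```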

Imagine if an aircraft data recorder used your system, and the aircraft crashed (causing a sudden end to logging). Not good: It would take a significant amount of forensics to reconstruct the logged data, with little assurance that the reconstruction was correct. The ONLY case where the reconstruction has a good chance to be correct is if it can be proven that the card had been reformatted since the prior log file had been written, and that the config file had been written to the card just once (no edit-update cycles). Not exactly a robust premise to assume! (Not that a Logomatic would be used for this purpose: I'm just trying to make my point a bit more concrete.)

I choose to design and implement my systems to survive worst-case upsets whenever possible. Especially when it takes only 6 lines of code to do so in this particular case. The only down side for this degree of data security (combined with the desired benefit of faster logging) is a longer delay between power-on and the start of logging.

I'm working on an idea to eliminate this startup delay, and possibly make startup faster than the current V2 code, while still retaining all the advantages of pre-allocation.
By UhClem
#138856
You didn't post it to this thread, so I haven't seen it. It would be courteous if you would provide the context for your comments WITH your comments.
I posted it to another recent thread which you started and responded to so you HAVE seen it. I foolishly assumed you would remember it.
By TriBob
#138864
I foolishly assumed you would remember it.
Call me blind, but I saw no link to any code of yours in that thread. All the links in that thread were mine. Check for yourself! Please prove me wrong.

You could have simply posted a link to your code. Instead, you posted a link to a thread that doesn't contain a link to your code. You seem to be fine with wasting everyone's time, including your own, instead of moving a technical discussion forward.

While your hardware advice has been valuable, your software advice, at least in this instance, has been of no use whatsoever. The one specific idea you did proudly share exhibited terrible design choices.

I think I'll just have to get along without your code. I see no reason to keep beating you over the head for a link you refuse to post to this thread.

But thanks for the offer.
By TriBob
#138913
uSD Card Tests:

Before modifying the V2 code to attempt to improve uSD performance, I decided to first see just how fast my 2GB Patriot uSD card can transfer data on my Linux system.

To ensure the card itself is being tested (and not the adapter or hub), I put my uSD card in an SD adapter that was then placed in an SDHC-compatible SD card reader on a USB2 port. This setup uses the full 4-bit SD interface, while the V2 uses the SPI interface, which is at least 4x slower (probably worse).

My card shows up as /dev/sdb, and the first (and only) partition, /dev/sdb1, mounts as /media/728D-CF57

Read speed:
Code: Select all
$ sudo hdparm -t /dev/sdb
/dev/sdb:
 Timing buffered disk reads:  40 MB in  3.01 seconds =  13.28 MB/sec
Not too shabby!

But we really care about write performance:
Code: Select all
$ dd count=1k bs=1M if=/dev/zero of=/media/728D-CF57/test.img
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 355.876 s, 3.0 MB/s
Not bad! This test was performed on a "dirty" card that had been completely filled up, all files deleted, and NOT reformatted prior to the test. So it includes all erase-before-write actions the card performs behind the scenes.

The SD card specification says the SPI port on a uSD card should be capable of operating at 50 MHz, which yields a transfer rate of 6.25 MB/s. This will limit the maximum read speed, but should not in any way hinder the write speed.

But how fast can the LPC2148 transfer data out SPI0? First, let's look at the SPI0 clock: The V2 uses a 12 MHz crystal that the PLL multiplies by 5 to yield a 60 MHz processor clock. This clock is divided by 4 to yield a peripheral clock of 15 MHz. The lowest allowed SPI0 clock divider value is 8, yielding a maximum SPI0 clock rate of 1.875 MHz. It takes SPI 8 clocks to transfer each byte, yielding a maximum possible SPI0 transfer rate of 234.375 KB/s.
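The clock chain above reduces to a few constants; here is a quick check of the arithmetic:

```c
#include <assert.h>

/* SPI0 byte rate for the V2's clock tree: 12 MHz crystal, x5 PLL,
 * peripheral clock divider of 4, minimum SPI0 clock divider of 8,
 * and 8 SPI clocks per byte. */
long spi0_bytes_per_sec(long xtal_hz)
{
    long cclk = xtal_hz * 5;    /* 60 MHz processor clock    */
    long pclk = cclk / 4;       /* 15 MHz peripheral clock   */
    long sck  = pclk / 8;       /* 1.875 MHz SPI0 clock      */
    return sck / 8;             /* one byte per 8 SPI clocks */
}
```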

However, the SPI0 transfer is not done via DMA, and requires the processor to poll the SPI0 status and loop to transfer each byte, so there are some instruction delays between when one SPI0 byte transfer ends and the next begins. I haven't counted the instructions, but it should be no more than 20 or so, for a polling rate of about 3 MHz, which yields only a tiny reduction in the overall SPI0 data rate. So let's just round down VERY generously, to include tons of time to send the write command, for a maximum expected uSD transfer rate of 200 KB/s.

I expect to log a minimum of 8 bytes per sample, and the limit of the ADXL345 is 3200 samples/s, yielding a logging rate of 25.6 KB/s. Even given all the limitations of the V2 processor, this should be easily achievable. But the current code is limited to just under 13 KB/s.

If we put this in terms of 512-byte sectors, that's an expected ideal transfer rate of 400 sectors/s, a desired logging rate of 50 sectors/s, and a current V2 performance of about 25 sectors/s.

Another way to look at it is as the time needed to write a sector: 2.5 ms/sector max, 20 ms/sector desired, and 40 ms/sector currently.

So why is the current code so slow? That was partially discussed in a prior post in this thread. Basically, whenever a new cluster is needed, the FAT code must read and update at least one FAT sector, sometimes more. At best, this temporarily cuts the throughput by a factor of 2 (probably slightly worse when you add in the time needed to compute the update). So the 40 ms/sector may be the worst case, due to allocating a new cluster, meaning the write rate between cluster allocations should be under 20 ms/sector.

In other words, if we can eliminate the FAT updates for cluster allocation, we should be able to hold 20 ms/sector or better. That's exactly our desired rate!

However, I may have just found a bug in the V2 code: In the Initialize() function in main.c, the "S0SPCR" register is being set to 8, when it should be the "S0SPCCR" register! A single-character typo that yields another valid register name instead of creating an error that would be easy to fix. The previous value written to the S0SPCCR register appears to be 60. That error would degrade the transfer rate by a factor of 7.5, to 18.75 ms/sector MAX! That seems way too close to 20 ms to be a coincidence.
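The suspected bug's impact is easy to quantify: with the clock-counter register left at its earlier value of 60 instead of 8, the raw SPI time per 512-byte sector is 60/8 = 7.5x worse. A quick model (this computes only the raw shift-register time, which comes out slightly below the generously rounded 2.5 ms and 18.75 ms figures above):

```c
#include <assert.h>

/* Minimum microseconds to clock one 512-byte sector through SPI0,
 * given the SPI clock divider actually in effect (pclk = 15 MHz,
 * 8 SPI clocks per byte). */
long sector_floor_us(long spi_divider)
{
    long byte_rate = 15000000L / spi_divider / 8;   /* bytes per second */
    return 512L * 1000000L / byte_rate;
}
```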

If this is indeed a bug, then we may not need to go through the bother of trying to avoid the FAT updates: We may have more than enough uSD bandwidth.

I'll code it up this weekend and see if I missed something, or if my analysis is wrong.
Last edited by TriBob on Sat Jan 28, 2012 3:29 am, edited 3 times in total.
By TriBob
#138916
BOO-YAH!

I couldn't wait to see if the bug was real. It was. I put in the fix, cleaned up several existing compiler warnings, then tested the new FW.SFE.

I used the following LOGCON.txt content:
Code: Select all
MODE = 2
ASCII = N
Baud = 4
Frequency = 3200
Trigger Character = $
Text Frame = 100
AD1.3 = Y
AD0.3 = Y
AD0.2 = Y
AD0.1 = N
AD1.2 = N
AD0.4 = N
AD1.7 = N
AD1.6 = N
Saftey On = N
That's reading 3 ADCs at 3200 Hz, then logging in Binary mode with a "$$" separator for 8 bytes per sample.

I acquired data for 10 minutes, which yielded a log file containing 15,500,608 bytes. The actual sample rate captured was 15,500,608 bytes / 8 bytes/sample / (10 minutes × 60 seconds/minute) = 3229 Hz!
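As a sanity check on that arithmetic (the 10-minute interval was hand-timed, so the result is approximate):

```c
#include <assert.h>

/* Measured sample rate: log bytes / bytes-per-sample / elapsed seconds. */
long samples_per_sec(long log_bytes, long bytes_per_sample, long seconds)
{
    return log_bytes / bytes_per_sample / seconds;
}
```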

Considering I did the timing by hand, that is deliciously close to the configured rate!

I need to do more testing to ensure no samples were dropped, and that no sector was written twice in a row (a risk in the current V2 code if a Tx buffer overflows).

However, at this point I can confidently predict the V2 should be able to log from the serial port at 96,000 baud! (Assuming it could be set to that rate.)

My next step will be to add a timing test mode and put all the free RAM into uSD write buffers.
By TriBob
#138918
I tried to see how many ADCs I could read and log, and how fast. It seems with the new faster SPI rate, the ADC conversion polling combined with SPI write polling creates a problem: The CPU runs out of polling cycles! At 3200 Hz, it is possible to read only 6 ADCs, not all 8. And even then, about half the data fails to be written to the uSD card.

I suspected this could happen, which is one reason why I went with the ADXL345, an SPI accelerometer that can interrupt the processor when data is ready: No polling required.

The CPU is wasting lots of time in the polling loop waiting for each SPI byte transfer to complete: A one byte transfer at max speed takes 256 processor clock cycles. Seems it's time to make the SPI interfaces interrupt-driven so they can both run flat-out.

For that matter, an ADC conversion takes at least 145 processor clocks (generally more), so perhaps interrupts should replace polling there as well (in addition to the frequency timer interrupt). The UART appears to already be fully interrupt-driven.

The net ADC conversion rate can be doubled by starting two conversions in parallel, one on each ADC. The current code does only one at a time, each in sequence. Hmmm... I wonder if the ADC Burst and Global modes can be combined with a timer? That could eliminate all the ADC overhead.

I won't be making any ADC code modifications for my project, but if anyone else does, I'd be glad to merge the changes!
By TriBob
#138939
Some notes on my build environment:

I obtained the latest V2 software source using Git:
Code: Select all
git clone https://github.com/SFE-Chris/Logomatic-V2.git
The main advantage of using git is that it makes it easier to a) make your own changes reversible, and b) share your changes with others. If you are new to git, I recommend using "git gui" to manage your local repository until you get used to the git command line.

For now, I'm only able to share source, diffs and patches, since my local git repository isn't public (my ISP makes it needlessly difficult). If you want to see my changes as I make them, or if you'd like to contribute some changes of your own, just ask, and I'll clone my repository to github.

I'm using the Free (and current) Code Sourcery ARM EABI GCC toolchain, available for Linux and Windows (and Mac with minor effort) here:
https://sourcery.mentor.com/sgpp/lite/a ... iption3053
(Yes, if I were really hard-core I'd build the ARM GCC toolchain from source. But I'd rather build loggers, not compilers.)

After installing it, I modified this section of the V2 Makefile to use the right toolchain command names:
Code: Select all
## Multi-platform Code Sourcery toolchain (ARM EABI)
CC = arm-none-eabi-gcc
CPP = arm-none-eabi-g++
OBJCOPY = arm-none-eabi-objcopy
OBJDUMP = arm-none-eabi-objdump
SIZE = arm-none-eabi-size
NM = arm-none-eabi-nm
If you compile the unmodified V2 code, you will get lots of compiler warnings. To make them easier to see, I build using the following command line from within the Main directory:
Code: Select all
make clean all | grep -v eabi-gcc
Yes, I rebuild everything every time. Omit the 'clean' target if you want to rebuild only what's changed.

Before making any changes to the code, do an as-is build, copy the new FW.SFE file from the Main directory to the root directory of the Logomatic V2 uSD card, then start the V2 and ensure the new firmware performs EXACTLY like the old. The new firmware may differ in size by a handful of bytes from the old, due to using a different version of GCC. If you suspect a problem, the original firmware will be in the top V2 directory (above Main).

The compiler warnings are mainly of two types: missing includes / function prototypes, and side effects of debug code. In my code base I've fixed the first kind, but left the second alone (for now).

FWIW, I'm doing my development under 32-bit Ubuntu 11.10 (with Cairo-Dock/GLX-Dock instead of Unity), and I'm using Geany as my code editor.