SparkFun Forums 

By bobcousins
#99329
If you are getting many files being corrupted, it sounds like clusters are being allocated incorrectly, or there are errors writing to the FAT itself. I would suspect a bug in the FAT code. With large files it is quite possibly an overflow-type error when calculating a cluster address or something similar.

If you are using code from a book, it is probably not very widely tested and/or out of date. Have a look at the author's website/blog for some later code: http://blog.flyingpic24.com/2009/06/01/june-bugs/

Sounds like this fix might be important...
** 05/31/08 V2.2 LDJ writeFAT bug fix
By UhClem
#99343
I took a quick look at the SDlib code at the link provided and have a few comments.

1) For something that bills itself as "SDlib" it uses the obsolete MMC card initialization routine.

2) While the FAT table access code does attempt to cache a copy of the current FAT page, allocating a new cluster is slow. Really slow. Function newFAT() does:

a) Scan FAT table, starting with a sector read.
b) Mark free cluster as EOF, write.
c) Link previous cluster and write.

So this requires two writes even when the two clusters are in the same sector.

Complicated by a detail that I had forgotten: there is usually more than one copy of the FAT table (typically two) and they must both be updated.

So at best this requires 1 sector read and 4 writes. If the clusters are in different sectors this requires 2 reads and 4 writes. Add more reads if a free cluster isn't found in the first sector.
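
To make the pattern concrete, here is a rough sketch of what that sequence amounts to; every name and constant below is a placeholder invented for the sketch, not the actual SDlib code, and the one-sector cache mentioned above is ignored for brevity:

#include <stdint.h>

/* Placeholders for this sketch only; real values come from the boot sector
   and the physical layer. */
#define FAT_START        0x0100UL   /* LBA of the first sector of FAT copy 1 */
#define SECTORS_PER_FAT  243UL      /* example value                         */
extern int readSector(uint32_t lba, uint8_t *buf);
extern int writeSector(uint32_t lba, const uint8_t *buf);

/* FAT16, simplified: find a free cluster, mark it end-of-chain, then link the
   previous cluster to it, writing each touched FAT sector back to BOTH FAT
   copies. */
uint16_t newFAT_sketch(uint16_t prevCluster)
{
    static uint8_t buf[512];
    uint32_t sector;
    uint16_t i, newCluster = 0;

    /* a) scan the FAT one sector (256 FAT16 entries) at a time */
    for (sector = 0; sector < SECTORS_PER_FAT && newCluster == 0; sector++) {
        readSector(FAT_START + sector, buf);                        /* read 1 */
        for (i = 0; i < 256; i++) {
            if (buf[2*i] == 0x00 && buf[2*i + 1] == 0x00) {         /* free   */
                newCluster = (uint16_t)(sector * 256 + i);
                /* b) mark it end-of-chain and write back, once per FAT copy */
                buf[2*i] = 0xFF;  buf[2*i + 1] = 0xFF;
                writeSector(FAT_START + sector, buf);                   /* write 1 */
                writeSector(FAT_START + SECTORS_PER_FAT + sector, buf); /* write 2 */
                break;
            }
        }
    }
    if (newCluster == 0)
        return 0;                                    /* FAT full */

    /* c) link the previous cluster to the new one: another read/modify/write,
       duplicated for the second FAT copy even if it is the same sector */
    readSector(FAT_START + prevCluster / 256, buf);
    buf[2*(prevCluster % 256)]     = (uint8_t)(newCluster & 0xFF);
    buf[2*(prevCluster % 256) + 1] = (uint8_t)(newCluster >> 8);
    writeSector(FAT_START + prevCluster / 256, buf);                    /* write 3 */
    writeSector(FAT_START + SECTORS_PER_FAT + prevCluster / 256, buf);  /* write 4 */

    return newCluster;
}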

3) The file write code does some extra bit shuffling. The first thing it does is copy the data to be written into its internal sector buffer. If you are doing byte-at-a-time I/O this is great. If you are logging lots of data and already have your own data buffers (highly recommended), not so good.

On the plus side, it doesn't update the file size in the directory except on file close.

On a side note there is a very disturbing graph in the SD specification. Figure 4-9 on page 70 seems to say that as the card fills up, speed declines. A lot. But so much of that standard is redacted that it is impossible to know what it really means.



Do you have your own data buffers or are you just calling fwrite(M) and letting it do the sector buffering?
By sebmadgwick
#99358
bobcousins wrote:If you are using code from a book, it is probably not very widely tested and/or out of date...
I am using the latest code from the website, which includes the noted corrections.

UhClem, first of all, thank you for spending the time that you have reviewing the code.
UhClem wrote:1) For something that bills itself as "SDlib" it uses the obsolete MMC card initialization routine.
I've basically thrown out their physical layer interface (i.e. initSD, read/write sectors) and written my own. The FAT library that sits on top of this is largely unchanged. I have ditched the 400 kHz start-up SPI clock because I've found it doesn't always init the card; 3.685 MHz (my full speed) works every time. Is this the obsolete feature you are referring to?
UhClem wrote:On the plus side, it doesn't update the file size in the directory except on file close.
This is an observation I was also able to make; it reflects the solution you advised in the other thread ("Who makes fast micro SD cards?" in this forum).
UhClem wrote:Do you have your own data buffers or are you just calling fwrite(M) and letting it do the sector buffering?
I have a great big circular buffer. Data is added to the buffer by a sampling timer interrupt, and 'fwrite' is called (within the main loop) whenever the number of bytes in the buffer exceeds 2048.
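
Roughly like this, if it helps (a simplified sketch of the idea rather than my actual code; the buffer size, names, and the fwriteM signature are illustrative):

#include <stdint.h>

#define BUF_SIZE         8192u   /* illustrative; must be a power of two      */
#define FLUSH_THRESHOLD  2048u   /* hand data to the FAT library in 2 kB lots */

typedef struct MFILE MFILE;                              /* library file handle */
extern unsigned fwriteM(const void *src, unsigned count, MFILE *f);  /* assumed */

static volatile uint8_t  ringBuf[BUF_SIZE];
static volatile uint16_t head = 0;      /* written by the sampling ISR        */
static volatile uint16_t tail = 0;      /* read by the main loop              */

/* Called from the sampling timer ISR with each new sample packet. */
void loggerPush(const uint8_t *data, uint16_t n)
{
    uint16_t i;
    for (i = 0; i < n; i++) {
        ringBuf[head] = data[i];
        head = (head + 1) & (BUF_SIZE - 1);
    }
}

/* Called continuously from the main loop: once FLUSH_THRESHOLD bytes are
   queued, copy them into a linear chunk and pass that to the library's
   write call. */
void loggerPoll(MFILE *f)
{
    static uint8_t chunk[FLUSH_THRESHOLD];
    uint16_t i, used = (uint16_t)((head - tail) & (BUF_SIZE - 1));

    if (used >= FLUSH_THRESHOLD) {
        for (i = 0; i < FLUSH_THRESHOLD; i++) {
            chunk[i] = ringBuf[tail];
            tail = (tail + 1) & (BUF_SIZE - 1);
        }
        fwriteM(chunk, FLUSH_THRESHOLD, f);
    }
}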

It seems you have easily identified inadequacies in the FAT code; would it be just as easy to suggest the required changes?

Latest info: I left the device logging overnight again and this time an 800 MB file was created with no errors (in it or in other files on the drive). This time I used the Samsung card (SanDisk previously). As discussed in the other thread ("Who makes fast micro SD cards?"), I have found that Samsung cards achieve a practical throughput double that of SanDisk (with SPI). Over the next few days I will be repeating tests to obtain representative results.
By UhClem
#99367
sebmadgwick wrote: I've basically thrown out their physical layer interface (i.e. initSD, read/write sectors) and written my own. The FAT library that sits on top of this is largely unchanged. I have ditched the 400 kHz start-up SPI clock because I've found it doesn't always init the card; 3.685 MHz (my full speed) works every time. Is this the obsolete feature you are referring to?
OK, several problems.

A 400 kHz maximum clock is absolutely a requirement of the SD specification: "Until the end of Card Identification Mode the host shall remain at fOD frequency because some cards may have operating frequency restrictions during the card identification mode." The obsolete part refers to the initialization sequence as given in Figure 7-2 of the specification.

Cards are insensitive to SPI clock speeds slower than their maximums, so your code not always working at 400 kHz is a very bad sign.
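
For reference, the identification sequence the spec describes looks roughly like this over SPI; spiSetClock(), spiTx() and sdCommand() are stand-ins for whatever your physical layer provides, and real code also needs timeouts and the CMD8/SDHC handling that I have left out:

#include <stdint.h>

extern void    spiSetClock(uint32_t hz);
extern void    spiTx(uint8_t b);
extern uint8_t sdCommand(uint8_t cmd, uint32_t arg);   /* returns the R1 byte */

int sdInitSketch(void)
{
    uint8_t r;
    int i;

    spiSetClock(400000UL);            /* fOD: stay at or below 400 kHz until
                                         card identification is finished      */
    for (i = 0; i < 10; i++)
        spiTx(0xFF);                  /* at least 74 clocks with CS de-asserted */

    if (sdCommand(0, 0) != 0x01)      /* CMD0: enter idle state (SPI mode)     */
        return -1;

    do {                              /* ACMD41 until the card leaves idle
                                         (old MMC cards use CMD1 instead)      */
        sdCommand(55, 0);             /* CMD55: next command is app-specific   */
        r = sdCommand(41, 0);
    } while (r != 0x00);

    spiSetClock(3685000UL);           /* only now is full SPI speed safe       */
    return 0;
}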
sebmadgwick wrote:It seems you have easily identified inadequacies in the FAT code; would it be just as easy to suggest the required changes?
I did at least hint at a couple of desirable changes:

1) Change the code that allocates a new cluster so that the FAT sector is written back (per FAT copy) only once every 256 clusters instead of twice on every allocation.

2) Add a new write function that writes whole sectors without copying data hither and yon.
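
For (2) I have in mind something along these lines; MFILE, its fields, and the helper functions are all invented for the sketch rather than taken from SDlib:

#include <stdint.h>

#define SECTORS_PER_CLUSTER 64u                     /* example value          */
typedef struct {
    uint16_t currentCluster;
    uint16_t sectorInCluster;
    uint32_t size;
} MFILE;
extern int      writeSector(uint32_t lba, const uint8_t *buf);
extern uint32_t clusterToLBA(uint16_t cluster);
extern int      allocateNextCluster(MFILE *f);      /* extends the cluster chain */

/* Append n whole 512-byte sectors straight from the caller's buffer to the
   card, with no copy into an internal sector buffer.  Assumes the file
   position is sector aligned, which it is if you only write 512-byte chunks. */
int fwriteSectors(MFILE *f, const uint8_t *buf, uint16_t nSectors)
{
    uint16_t s;
    for (s = 0; s < nSectors; s++) {
        if (f->sectorInCluster >= SECTORS_PER_CLUSTER) {     /* cluster full? */
            if (allocateNextCluster(f) != 0)
                return -1;
            f->sectorInCluster = 0;
        }
        if (writeSector(clusterToLBA(f->currentCluster) + f->sectorInCluster, buf) != 0)
            return -1;
        buf += 512;
        f->sectorInCluster++;
        f->size += 512;
    }
    return 0;
}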

I did not see any obvious problems that would cause troubles with FAT allocation and sector addressing but that would require deep thought rather than a quick scan.
sebmadgwick wrote:Latest info: I left the device logging overnight again and this time an 800 MB file was created with no errors (in it or in other files on the drive). This time I used the Samsung card (SanDisk previously). As discussed in the other thread ("Who makes fast micro SD cards?"), I have found that Samsung cards achieve a practical throughput double that of SanDisk (with SPI). Over the next few days I will be repeating tests to obtain representative results.
Have you checked to see how many FAT copies they have?
By sebmadgwick
#99381
I’ve now re-implemented the 400 kHz start-up clock (actually 460 kHz due to my system clock) and it works :? . I'm not sure what was going on; I tested this previously on more than one occasion. Anyway, that's that solved.
UhClem wrote:I did at least hint at a couple of desirable changes:
Thanks for that. They are on my list of future [non-urgent] tasks. I no longer have any issues with data throughput limitations so for now I will focus on solving the corrupt file issue.
UhClem wrote:Have you checked to see how many FAT copies they have?
No. How can I do this? What conclusions can be drawn from an answer to this question?
By UhClem
#99393
sebmadgwick wrote:Thanks for that. They are on my list of future [non-urgent] tasks. I no longer have any issues with data throughput limitations so for now I will focus on solving the corrupt file issue.
You have never described exactly what the file corruption looks like. I figured out my corruption problem after I noticed data that was identical to the file system FAT tables and directory structures in the middle of my data files.
sebmadgwick wrote:No. How can I do this? What conclusions can be drawn from an answer to this question?
There are several ways. One is to simply look at the media descriptor that the FAT code builds because it has a value in there for the number of FAT copies.

For example, when I modified the Logomatic code I changed the function that writes a default logcon.txt file to include some extra data:

Card Specific Data =
0 26 0 32 5f 5a 83 c9 3e fb cf ff 92 80 40 cb

Sector zero = 249 (0xf9)
Start of Data = 519 (0x207)
RootDirectory = 487 (0x1e7)
Sectors per FAT = 243 (0xf3)
Sectors per Cluster = 64 (0x40)
Reserved Sectors = 1 (0x1)
Total Sectors = 3967238 (0x3c8906)
Bytes per Sector = 512 (0x200)
Number of FATS = 2 (0x2)
Root Directory sectors = 32 (0x20)
Count of Clusters = 61972 (0xf214)

This is from my 2GB SanDisk Ultra II. It has been a while so I can't recall for certain, but since I fragged the file system several times in the process, this is almost certainly not the original format.
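
If you don't want to add that kind of dump, the same value can be read straight from the BPB in the partition's boot sector. A minimal sketch, where readSector() stands for your own physical-layer read and bootSectorLBA is 0 on an unpartitioned card (otherwise the partition's starting LBA from the MBR):

#include <stdint.h>

extern int readSector(uint32_t lba, uint8_t *buf);   /* physical-layer stand-in */

/* Return the "number of FATs" field from the FAT12/16/32 BIOS Parameter Block. */
uint8_t numberOfFATs(uint32_t bootSectorLBA)
{
    static uint8_t bs[512];
    readSector(bootSectorLBA, bs);

    /* Standard BPB offsets (little-endian multi-byte fields):
         11  bytes per sector        (2 bytes)
         13  sectors per cluster     (1 byte)
         14  reserved sector count   (2 bytes)
         16  number of FATs          (1 byte)  <-- the field of interest
         22  sectors per FAT, FAT16  (2 bytes) */
    return bs[16];
}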


I suspect that at least part of the difference in speed between your cards is due to one having more FAT copies than the other. I think I remember a comment about having to reformat the SanDisk several times, and it is likely that the default option for the formatting routine is two copies. This is reasonable for a hard disk, which has a higher chance of errors, but flash memory devices don't have those sorts of problems, and they do have internal firmware checking things out.
By sebmadgwick
#99427
When I say the files are corrupt, I mean that I put the card in my PC and it says something like... 'cannot read file as it is corrupt'. My understanding of FAT is extremely limited.
UhClem wrote:There are several ways. One is to simply look at the media descriptor that the FAT code builds...
I'm not even certain where to find that right now, sorry. I've got a copy of the book arriving today, so I shall be studying the underlying theory intensely; hopefully I will then be able to understand and implement your suggestions.

As a parallel task, I am going to try to use the Logomatic FAT/SD code. My device is a dsPIC, but I hope it will not be too hard to port. Any comments on that? It sounds like you have used that code as well.

I did another test last night; the Samsung card is subject to the same problems as all the others.
By UhClem
#99507
sebmadgwick wrote:When I say the files are corrupt, I mean that I put the card in my PC and it says something like... 'cannot read file as it is corrupt'. My understanding of FAT is extremely limited.
Curiosity reared its ugly head so I spent an hour or two playing with the code. I replaced the SD code with some stubs to access a dummy file system in a file built with mkdosfs. I discovered a minor bug that might be your problem.

After writing a sector to the drive, the code checks to see if it has filled up the current cluster. If it has, it allocates a new cluster. If no more data is written, there will be a mismatch between the cluster chain and the file size; fsck complained about this problem.

Looking at the code, fixing this will not be easy because it uses a single buffer both for data and for managing the cluster chain. A workaround, if you are writing data in 512-byte chunks (or any size that evenly divides the cluster size), is to write a single extra byte just before closing the file.
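
In other words, something like this at close time. The fwriteM/fcloseM names just follow the fwrite(M) convention used earlier in the thread (adjust to whatever your copy calls them), and bytesWritten is whatever running total your own code keeps:

#include <stdint.h>

typedef struct MFILE MFILE;                                          /* library file handle */
extern unsigned fwriteM(const void *src, unsigned count, MFILE *f);  /* assumed shape       */
extern unsigned fcloseM(MFILE *f);                                   /* assumed shape       */

/* If the total written happens to be an exact multiple of the cluster size,
   write one dummy byte first so the freshly allocated cluster is actually
   used and the chain agrees with the recorded file size. */
void closeLogFile(MFILE *f, uint32_t bytesWritten, uint32_t bytesPerCluster)
{
    static const uint8_t pad = 0x00;

    if (bytesWritten != 0 && (bytesWritten % bytesPerCluster) == 0)
        fwriteM(&pad, 1, f);

    fcloseM(f);
}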

If you are writing a random amount of data in 512-byte chunks then this problem will happen randomly, depending on exactly how much data was written: hit a magic multiple of the cluster size and you get an error. Not something that should be a big problem, but it might upset overly sensitive software.
By sebmadgwick
#99530
It's great that you've identified a problem, but rather concerning that this library is now known to be 'faulty'; even more so because I do not think this fault would account for my problem. If I understand you correctly: assuming the number of bytes in any file is some number >512 and random (uniformly distributed), then a kB file should be just as likely to be corrupt as an MB or GB file.

In my on-going quest for a solution I tried to implement the Logomatic code with no success (it gets stuck somewhere after 'openroot') and then implemented the Microchip file I/O library. Microchip's library seems to work well (600 MB file overnight with no errors) BUT it is sooo slow. I had a while(1) loop constantly calling fileWrite() passing a 500-byte string, and the throughput was just under 19 kB/s!! ...compared to the 50 kB/s or so of the ‘learning to fly PIC24’ library.

My immediate direction is to really optimise Microchip's SD physical interface layer (e.g. sector read/write in assembler) and just hope that things speed up enough; throughput must not fall below 25 kB/s. On the plus side, this library supports FAT32; btw, compiler #if etc. allow FAT32 support to be removed, but throughput does not change noticeably.

It is my preference to use the ‘learning to fly PIC24’ library, as I think it is more elegant than Microchip's and that this is reflected in its speed. Obviously, while it corrupts files, it is not an option.
If your curiosity provokes even more investigation into my problems, please keep me updated. Thank you, and I shall post my findings when available.

P.S. If you can communicate exactly what the problem (above) is, perhaps it would be worth contacting the book's author. From their website they seem very welcoming of public contributions.
By sebmadgwick
#99568
So I spent just about the whole day fiddling with code and re-writing parts of the SD-SPI interface in asm, and I've boosted the file-write throughput by 40% (now ~26 kB/s). However, I had miscalculated and this is still not fast enough; my buffer is overflowing! I am running out of options...
By UhClem
#99595
One more reason to dislike the SDlib FAT handling...

Flash memory devices have a finite number of erase/write cycles before they fail. The cards implement wear levelling algorithms to try and control this, but those algorithms are of limited utility when you abuse them.

Note that for every cluster allocated the SDlib performs two writes. That is two erase/write cycles every time.

I read a SanDisk white paper on their wear levelling scheme, which they call 'write before erase'.

The basic idea is that some number of sectors (32 or more) are grouped together into a block, which is erased as a unit and written as a unit. The blocks are grouped together into zones of 4MB, and each zone has roughly 3% more blocks than are made available externally; these spare blocks form the erase pool.

The FAT table accesses will be concentrated in a small number of zones, and those zones are likely to wear out first. By raising the number of erase/write cycles, the SDlib code will promote the early death of SD cards used with it. (Other wear levelling schemes might be used by other vendors, but the problem remains even if the details change.)

Logging a large file can generate 512 write cycles in each FAT sector used. This is bad.

Assuming that a zone is 4MB, the FAT will reside in a single zone. Consider writing a 512MB file to a 1GB disk: the file uses 32,768 clusters and will generate 64K writes to this zone. There are 256 blocks in the zone, so assume 8 spares. That works out to 248 write/erase cycles per block just for this one file. (The write/erase endurance figures I see vary from 10K for cheap parts to 2M for high-end parts like SanDisk.)

But it is worse than that because the writes are concentrated in a small range of blocks, so not all blocks in the zone get circulated through the erase pool. The hypothetical 512MB file will use 2 FAT blocks. (Most likely spread over three, but assume two to simplify.) The first block will see 16,384 writes spread over 9 physical blocks (the original plus the erase pool). Then it moves on to the next block: another 16,384 writes spread over 9 blocks, but the 8 in the pool started with 16,384/9 and end up with twice that, or about 3,640 cycles after just one file. (Since there are typically two copies of the FAT table, and both will be in the same zone, it is actually a bit worse.) The only thing that could help this situation is writing to the other blocks in this zone, which would spread the writes out a bit.

The SDlib code already leaves the file system in an inconsistent state by only updating the file size field in the directory entry at file close, so delaying the FAT update for a while should not impose any additional risk.

The change to the SDlib code is pretty simple: add a dedicated 512-byte buffer to hold FAT sectors. It is updated in place and only written back to the SD card when a new FAT sector must be read or on file close. The potential reduction in write/erase cycles per FAT sector is from 512 to 2, with a significant increase in the life of the SD card.
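
Something along these lines (a sketch only: the names and constants are invented, FAT16 is assumed, and error handling is omitted):

#include <stdint.h>

/* Placeholders for this sketch. */
#define FAT_START        0x0100UL    /* LBA of the first sector of FAT copy 1 */
#define SECTORS_PER_FAT  243UL       /* example value                         */
extern int readSector(uint32_t lba, uint8_t *buf);
extern int writeSector(uint32_t lba, const uint8_t *buf);

/* Dedicated FAT-sector cache: FAT entries are updated in this buffer and it
   is only written back (to both FAT copies) when a different FAT sector is
   needed or when the file is closed. */
static uint8_t  fatCache[512];
static uint32_t fatCachedLBA  = 0xFFFFFFFFUL;
static uint8_t  fatCacheDirty = 0;

void fatFlush(void)                                 /* call this at file close */
{
    if (fatCacheDirty) {
        writeSector(fatCachedLBA, fatCache);                     /* FAT copy 1 */
        writeSector(fatCachedLBA + SECTORS_PER_FAT, fatCache);   /* FAT copy 2 */
        fatCacheDirty = 0;
    }
}

static uint8_t *fatGetSector(uint32_t lba)
{
    if (lba != fatCachedLBA) {
        fatFlush();                       /* write the old sector back first   */
        readSector(lba, fatCache);
        fatCachedLBA = lba;
    }
    return fatCache;
}

/* Set one FAT16 entry, e.g. link the previous cluster to the new one or mark
   end-of-chain.  Card writes now happen at most once per 256 allocations per
   FAT copy, instead of twice per copy on every allocation. */
void fatSetEntry(uint16_t cluster, uint16_t value)
{
    uint8_t *s = fatGetSector(FAT_START + cluster / 256);
    s[2 * (cluster % 256)]     = (uint8_t)(value & 0xFF);
    s[2 * (cluster % 256) + 1] = (uint8_t)(value >> 8);
    fatCacheDirty = 1;
}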