Get extra free RAM in ESP8266.

Introduction

The ESP8266 microcontroller leaves roughly 35-40 kilobytes (KB) of RAM for applications.
That is not much, so every KB is precious.

Below you will learn how to gain additional RAM in your application with a simple trick.
The trick works only for constant data, and it is needed if you compile your application with the free GCC compiler and toolchain for the ESP8266.

Problem description

Let’s take a look at the following C definition:

static const uint16_t vm_decode_table[] = { 1,2,3,4,5,6,7,8,10, 11 };

This constant data does not change during the lifecycle of the program. Such constant data could live in Flash and be read into RAM only when needed, which saves heap space.
However, the current open-source GCC compiler does not do this on its own, and the data ends up in RAM.

In order to see this with our own eyes we can compile and link our application. After that we can check the placement of the different symbols by calling a command like the one below:

xtensa-lx106-elf-readelf -s ./out/build/app_0.out

The output will contain our vm_decode_table constant data and its location, something like this:

   Num:    Value  Size Type    Bind   Vis      Ndx Name
...

   487: 4025e85c   114 FUNC    LOCAL  DEFAULT    4 parser_emit_two_bytes
   488: 3ffed654   276 OBJECT  LOCAL  DEFAULT    2 CSWTCH$26
   489: 00000000     0 FILE    LOCAL  DEFAULT  ABS vm.c
   490: 4025eec8    67 FUNC    LOCAL  DEFAULT    4 vm_construct_literal_obje

   491: 3ffed900   610 OBJECT  LOCAL  DEFAULT    2 vm_decode_table

   492: 3fff6b1c     1 OBJECT  LOCAL  DEFAULT    3 is_direct_eval_form_call
   493: 4025ef44  5553 FUNC    LOCAL  DEFAULT    4 vm_loop
   494: 3fff6b20     4 OBJECT  LOCAL  DEFAULT    3 vm_top_context_p
   495: 402604fc   978 FUNC    LOCAL  DEFAULT    4 vm_execute

The output above tells us the type, size and location of each symbol. For example, parser_emit_two_bytes is a function (column Type: FUNC) stored at address 4025e85c (column Value). The symbol vm_decode_table is stored at address 3ffed900 and its size is 610 bytes. (The ten-element definition shown earlier would occupy only 20 bytes, so it is evidently a shortened version of the real table.)

Using the Memory Map for ESP8266 we can determine where the content will end up. In this map, addresses from 3FFE8000h to 3FFFC000h belong to dram0 and addresses from 40200000h to 40300000h are mapped to Flash. This means that the function parser_emit_two_bytes ends up in Flash and does not reduce the amount of available RAM. Unfortunately, our constant data vm_decode_table ends up in RAM and reduces the available RAM by 610 bytes, which is more than half a KB.
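To make the address check less error-prone, the two ranges quoted above can be put into a small helper. This is only a sketch: the function name esp_region_name is made up for illustration, and the range boundaries are simply the ones from the memory map paragraph, with the end address treated as exclusive.

```c
#include <stdint.h>

/* Address ranges quoted in the text; end addresses are treated as exclusive. */
#define DRAM0_START 0x3FFE8000u
#define DRAM0_END   0x3FFFC000u
#define FLASH_START 0x40200000u
#define FLASH_END   0x40300000u

/*
 * Hypothetical helper (not part of any SDK): name the region that a
 * symbol address from the readelf "Value" column falls into.
 */
static const char *esp_region_name(uint32_t addr)
{
    if (addr >= DRAM0_START && addr < DRAM0_END) {
        return "dram0";
    }
    if (addr >= FLASH_START && addr < FLASH_END) {
        return "flash";
    }
    return "other";
}
```

Called with 0x4025e85c (parser_emit_two_bytes) the helper reports "flash", and with 0x3ffed900 (vm_decode_table) it reports "dram0", which matches the reading of the table above.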

Solution

We have plenty of Flash (around 1 Megabyte addressable), which is a lot more than the 40 KB of RAM that we have.
So wouldn’t it be great if we could move some of our constant data into Flash?

Currently that is possible with a bit of work. The ESP8266’s Flash allows only aligned 32-bit reads. To convince the linker to put the data in Flash we have to align it properly. We also need a compiler that emits 32-bit load instructions (l32i) even for 1- and 2-byte accesses, so that our application does not crash when it reads that data.

This means that we need two things:
1. A compiler that supports the -mforce-l32 option.
2. Attributes that instruct the linker to put the data in Flash.
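To get a feeling for what the first requirement means, here is a rough sketch of reading a 16-bit value using nothing but an aligned 32-bit load. It is not the code the compiler generates. The helper name read_u16_via_word and the sample data are made up for illustration, and the table is assumed to be laid out as on a little-endian CPU.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sample data: the 16-bit entries {1, 2, 3, 4} as they appear
 * when a little-endian CPU views them as two 32-bit words.
 */
static const uint32_t sample_words[] = { 0x00020001u, 0x00040003u };

/*
 * Read the 16-bit entry number `index` from a table that is accessed only
 * through whole, aligned 32-bit words.
 */
static uint16_t read_u16_via_word(const uint32_t *words, size_t index)
{
    uint32_t word = words[index / 2];                /* one aligned 32-bit load */
    unsigned shift = (unsigned)(index % 2) * 16u;    /* even index: low half    */
    return (uint16_t)(word >> shift);                /* keep the wanted 16 bits */
}
```

With a compiler that supports -mforce-l32 you do not write this by hand. The sketch only shows why every narrow read has to be turned into a full word read plus a shift.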

The latest open-source GCC Xtensa compiler supports the -mforce-l32 option. If you are using the latest version of the esp-open-sdk then that is already the case for you. If you are not sure, run the compiler with the following command:

xtensa-lx106-elf-gcc --help=target | grep mforce-l32

If there is an output similar to the one below then you are golden.

-mforce-l32                 Use l32i to access 1- and 2-byte quantities in

Otherwise, either build the ESP8266 toolchain yourself or find a precompiled toolchain that supports the -mforce-l32 option.

Once you are sure that your compiler has the needed option, you can go to the second step: adding special attributes to the constant data. Keep in mind that the option only takes effect when it is actually passed to the compiler, for example through your compiler flags.
In order to instruct the linker to put our constant data in Flash we need to use two attributes:

__attribute__((aligned(4))) __attribute__((section(".irom.text")))

Our initial constant data definition will start to look like this:

static const uint16_t vm_decode_table[] __attribute__((aligned(4))) __attribute__((section(".irom.text"))) = { 1,2,3,4,5,6,7,8,10, 11 };

We can improve the code above by hiding the attributes behind a preprocessor macro.

static const uint16_t vm_decode_table[] ESP_CONST_DATA = { 1,2,3,4,5,6,7,8,10, 11 };

Here ESP_CONST_DATA can be defined in a header file or passed as a definition on the compiler command line.

#define ESP_CONST_DATA __attribute__((aligned(4))) __attribute__((section(".irom.text")))
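If the same source file also has to build on a PC (for unit tests, for example), the macro can be made conditional. The sketch below is one possible arrangement. USE_ESP_FLASH_CONST and VM_DECODE_TABLE_LEN are hypothetical names chosen for this example, not something the SDK defines.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * USE_ESP_FLASH_CONST is a hypothetical switch for this example. Pass
 * -DUSE_ESP_FLASH_CONST only when building for the ESP8266.
 */
#ifdef USE_ESP_FLASH_CONST
#define ESP_CONST_DATA \
    __attribute__((aligned(4))) __attribute__((section(".irom.text")))
#else
#define ESP_CONST_DATA /* other builds: let the toolchain place the data */
#endif

static const uint16_t vm_decode_table[] ESP_CONST_DATA =
    { 1, 2, 3, 4, 5, 6, 7, 8, 10, 11 };

/* Number of table entries, computed at compile time. */
#define VM_DECODE_TABLE_LEN \
    (sizeof(vm_decode_table) / sizeof(vm_decode_table[0]))
```

On a build without the switch the macro expands to nothing, so the table is an ordinary constant array and the rest of the code does not need to change.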

After recompiling with those changes we will see that vm_decode_table is now located in Flash (address range 40200000h to 40300000h) and that system_get_free_heap_size() reports more free bytes than before.

Neat, isn’t it 😉

Use it with care

But should we now rush to put all of our constant data into Flash? The answer is no.

First, it makes sense to use this trick only for constant data that has a size of hundreds of bytes or more, unless you are really desperate.
What I did was to run the following command and import its output as a fixed-width sheet in LibreOffice.

xtensa-lx106-elf-readelf -s ./out/build/app_0.out > /tmp/app-symbols.csv

Once the data is in LibreOffice or your favourite spreadsheet application, filter it so that only rows of type OBJECT are shown and sort the rows by the “Size” column. Then start optimizing the rows that show up at the top AND are constant data.
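If you would rather not use a spreadsheet, the same filtering and sorting can be sketched in a few lines of C. The code below is only an illustration. The names symbol_row, parse_symbol_row and collect_ram_objects are made up, it handles only rows shaped like the readelf output shown earlier, and it uses the dram0 range from the memory map to decide what counts as RAM.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t addr;   /* "Value" column */
    unsigned size;   /* "Size" column  */
    char type[16];   /* "Type" column  */
    char name[64];   /* "Name" column  */
} symbol_row;

/* Parse one symbol row; returns 1 on success, 0 for headers and other lines. */
static int parse_symbol_row(const char *line, symbol_row *out)
{
    unsigned num, addr, size;
    char bind[16], vis[16], ndx[16];

    memset(out, 0, sizeof *out);
    if (sscanf(line, " %u: %x %u %15s %15s %15s %15s %63s",
               &num, &addr, &size, out->type, bind, vis, ndx, out->name) != 8) {
        return 0;
    }
    out->addr = addr;
    out->size = size;
    return 1;
}

/* qsort comparator: largest size first. */
static int by_size_desc(const void *a, const void *b)
{
    const symbol_row *ra = a;
    const symbol_row *rb = b;
    return (ra->size < rb->size) - (ra->size > rb->size);
}

/* Keep only OBJECT symbols located in dram0 and sort them by size. */
static size_t collect_ram_objects(const char *const *lines, size_t n_lines,
                                  symbol_row *out, size_t max_out)
{
    size_t count = 0;
    for (size_t i = 0; i < n_lines && count < max_out; i++) {
        symbol_row row;
        if (!parse_symbol_row(lines[i], &row)) continue;
        if (strcmp(row.type, "OBJECT") != 0) continue;
        if (row.addr < 0x3FFE8000u || row.addr >= 0x3FFFC000u) continue;
        out[count++] = row;
    }
    qsort(out, count, sizeof out[0], by_size_desc);
    return count;
}
```

Fed with the sample rows from the listing above, the first entry of the result is vm_decode_table with 610 bytes. Whether a symbol is really constant data still has to be checked by hand in the source.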

Second, reading data from Flash comes at a price, and the price is speed. Before putting your constant data into Flash, make sure that it is not used in time-critical code and that microsecond delays are acceptable.

This trick was used to make JerryScript, an open-source JavaScript engine from Samsung, run on the ESP8266 microcontroller, where it freed more than 2 KB of heap.

Summary

If you are looking for an easy way to gain additional free bytes for your application, apply this relatively simple trick to your application or to the libraries that it depends on.

If you want to share your feedback make sure to leave a comment below 🙂

Comments

  • Geoffrey McRae

    Excellent information, thanks!

    I found that by passing the readelf output through grep and sort you can get a quick list of the worst offenders.


    # xtensa-lx106-elf-readelf -s a.out | grep OBJECT | grep ': 3' | sort -k3 -n | tail
    1100: 3ffeb7f0 120 OBJECT GLOBAL DEFAULT 3 espconn_TaskQueue
    378: 3fff0010 148 OBJECT GLOBAL DEFAULT 3 chip6_sleep_params
    1382: 3ffeaa54 156 OBJECT GLOBAL DEFAULT 3 gScanStruct
    933: 3ffeb6f0 180 OBJECT GLOBAL DEFAULT 3 premot
    860: 3ffec660 232 OBJECT GLOBAL DEFAULT 3 pmc
    902: 3ffea030 256 OBJECT GLOBAL DEFAULT 3 event_TaskQueue
    1569: 3ffe8c48 257 OBJECT GLOBAL DEFAULT 2 _ctype_
    55: 3ffe83b8 1064 OBJECT LOCAL DEFAULT 1 impure_data
    988: 3ffea358 1704 OBJECT GLOBAL DEFAULT 3 g_ic
    1504: 3fff01a0 2244 OBJECT GLOBAL DEFAULT 3 app

    I also like to put all my global application variables into a struct called ‘app’ which makes it easy to track how much memory I am using.
