
Debugging: Solaris bus error caused by taking pointer to structure member

Take a look at this sample program that fails horribly when compiled on Solaris using gcc (I haven't tried other compilers, and I'm not pointing my finger at gcc here, this is a Sun gotcha).

Here's an example program (simplified from something much more complex that I was debugging) that illustrates how memory alignment on SPARC systems can bite you if you are doing low-level things in C. In the example, the program allocates space for a thing structure that is prefixed with a header. The header structure has a dummy byte array called data, which is used to reference the start of the thing.
#include <stdlib.h>

struct thing {
  int an_int;
};

struct header {
  short id;
  char data[0];
};

struct header * maker( int size ) {
  return (struct header *)malloc( sizeof( struct header ) + size );
}

int main( void ) {
  struct header * a_headered_thing = maker( sizeof( struct thing ) );

  struct thing * a_thing = (struct thing *)&(a_headered_thing->data[0]);

  a_thing->an_int  = 42;
}
If you build this on a SPARC machine you'll get the following error when you run it:
Bus Error (core dumped)

Annoyingly, if you build a debugging version of this program the problem magically goes away and doesn't dump core in the debugger. So you either resort to printf-style debugging or going into gdb and looking at the assembly output.

Here's what happens when you run this in gdb (non-debug code):
(gdb) run

Program received signal SIGSEGV, Segmentation fault.
0x000106d8 in main ()

Since you can't get back to the source we're forced to do a little disassembly:
(gdb) disassemble
Dump of assembler code for function main:
0x000106b0: save %sp, -120, %sp
0x000106b4: mov 4, %o0
0x000106b8: call 0x10688
0x000106bc: nop
0x000106c0: st %o0, [ %fp + -20 ]
0x000106c4: ld [ %fp + -20 ], %o0
0x000106c8: add %o0, 2, %o0
0x000106cc: st %o0, [ %fp + -24 ]
0x000106d0: ld [ %fp + -24 ], %o1
0x000106d4: mov 0x2a, %o0
0x000106d8: st %o0, [ %o1 ]
0x000106dc: mov %o0, %i0
0x000106e0: nop
0x000106e4: ret
0x000106e8: restore
0x000106ec: retl
0x000106f0: add %o7, %l7, %l7
End of assembler dump.

The offending instruction is the st %o0, [ %o1 ] at 0x000106d8, the address gdb reported when the program stopped. From the code you can see that the o0 register contains the value 0x2a (which is, of course, 42), so we are looking at code corresponding to the line a_thing->an_int = 42;. The st instruction is going to write the 42 into the an_int field of thing. The address of an_int is stored in o1.

Asking gdb for o1's value shows us:
(gdb) info registers o1
o1             0x2094a  133450

An int is 4 bytes and you can easily see that the address of an_int stored in o1 is not 4 byte aligned (133450 mod 4 = 2, or just stare at the bottom nybble). The SPARC architecture insists that data accesses be correctly aligned for the size of the access. In this case we need 4 byte alignment (note that malloc returns suitably aligned memory, and the compiler pads structures so that each member is correctly aligned while minimizing space).
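The arithmetic above can be written as a tiny helper. This is only a sketch: the function names is_aligned and address_is_aligned are made up for illustration, and the code assumes the alignment passed in is non-zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if ptr is a multiple of alignment,
   0 otherwise. alignment is assumed to be non-zero. */
int is_aligned( const void * ptr, size_t alignment ) {
  return ( (uintptr_t)ptr % alignment ) == 0;
}

/* The same test on a raw address value, for example one copied out
   of gdb's register dump. */
int address_is_aligned( uintptr_t address, size_t alignment ) {
  return ( address % alignment ) == 0;
}
```

Calling address_is_aligned( 0x2094a, 4 ) returns 0, matching the 133450 mod 4 = 2 calculation, while address_is_aligned( 0x2094a, 2 ) returns 1: the address is fine for a short but not for an int.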

In this case, the code fails because the data member is byte aligned (since we declared it as a character array), but then we take a pointer to it and treat it as a structure with an integer member. Oops. Bus error.

(Note you could have discovered this with printf and %p to get the pointer values without going into the debugger and poking around in the assembly code).
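A printf-based check along those lines might look like the following sketch. The helpers misalignment_for_int and report_pointers are hypothetical names, not part of the original program, and the code only prints addresses, so it does not fault even on a strict-alignment machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct thing {
  int an_int;
};

struct header {
  short id;
  char data[0];   /* zero-length array, as in the article (a GNU C extension) */
};

/* Hypothetical helper: remainder of the address modulo sizeof( int ).
   Non-zero means an int store through ptr would be misaligned. */
unsigned misalignment_for_int( const void * ptr ) {
  return (unsigned)( (uintptr_t)ptr % sizeof( int ) );
}

/* Hypothetical helper: prints the header and data addresses and returns
   the misalignment of data, or -1 if malloc fails. */
int report_pointers( void ) {
  struct header * h = malloc( sizeof( struct header ) + sizeof( struct thing ) );
  int remainder;

  if ( h == NULL ) {
    return -1;
  }

  printf( "header at %p\n", (void *)h );
  printf( "data   at %p (offset %lu)\n", (void *)h->data,
          (unsigned long)offsetof( struct header, data ) );

  remainder = (int)misalignment_for_int( h->data );
  printf( "data address mod %lu = %d\n", (unsigned long)sizeof( int ), remainder );

  free( h );
  return remainder;
}
```

Assuming a 2 byte short and a 4 byte int, the data offset is 2 and the remainder is non-zero, which is the same misalignment gdb showed in o1.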

There are a couple of ways to fix it. The first is to pad the header structure so that data is correctly aligned: adding 2 bytes of padding in the form of a short will make the problem go away:
struct header {
  short id;
  short padding;
  char data[0];
};

That's ugly and requires careful commenting and could be a maintenance problem if maker is used to make things requiring a different alignment, or the header structure is modified.

It's slightly cleaner to not have padding but change the type of data to something like the alignment you want:
struct header {
  short id;
  int data[0];
};

(Or even double data[0] to get 8 byte alignment). With gcc you could even make this really clear by using the aligned attribute to create a special type:
typedef char aligned_data __attribute__ ((aligned (8)));

struct header {
  short id;
  aligned_data data[0];
};

I think that's the clearest option of all. With a little documentation around this it should be maintainable.
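Whichever fix you pick, the maintenance worry (someone later changing the header) can be guarded against at compile time. The sketch below assumes a C11 compiler, since _Static_assert and _Alignof did not exist when this was written; data_offset is a hypothetical helper added only so the offset can be inspected.

```c
#include <stddef.h>

struct thing {
  int an_int;
};

struct header {
  short id;
  int data[0];   /* the int-typed variant from above */
};

/* C11 (an assumption): compilation fails if someone changes the header so
   that data is no longer suitably aligned for a struct thing. */
_Static_assert( offsetof( struct header, data ) % _Alignof( struct thing ) == 0,
                "header.data is not aligned for struct thing" );

/* Hypothetical helper: exposes the offset for inspection. */
size_t data_offset( void ) {
  return offsetof( struct header, data );
}
```

Switching data back to char data[0] makes the assertion fire on any platform where int needs more than 2 byte alignment, so the mistake is caught at build time rather than as a bus error.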


Unknown said…
It's been a number of years since I did low-level system programming in C, but my recollection is that the maximally-portable solution to your problem is to use a union for the data field of the struct.

Of course, if you don't care about compiling for something older than c99, the aligned attribute is fine.
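One reading of the union suggestion is sketched below. The union name payload and its members are assumptions made for illustration; the idea is that the compiler aligns the union for its strictest member, so data ends up aligned for a thing without padding or attributes.

```c
#include <stddef.h>
#include <stdlib.h>

struct thing {
  int an_int;
};

/* Hypothetical union: list every type that may live behind the header.
   The union takes the alignment of its strictest member. */
union payload {
  struct thing a_thing;
  double       widest;     /* assumption: forces double alignment too */
  char         bytes[1];
};

struct header {
  short id;
  union payload data;
};

/* Same shape as the article's maker; it over-allocates slightly because
   sizeof( struct header ) now already includes one payload. */
struct header * maker( size_t size ) {
  return malloc( sizeof( struct header ) + size );
}
```

With this layout the store becomes a_headered_thing->data.a_thing.an_int = 42; and no cast from a char pointer is needed.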
