## Wednesday, November 21, 2012

### A Formal Language for Data Definitions

Lately, I've started to think again about the irritating problem that there is no formal language for describing binary data layouts (at least not that I know of). So when people attempt to describe a file format or a network protocol they have to resort to vague and nondescript things like:

```
Each section in the file starts with a header with the format:

0--20 bytes  extra data in header

The extra data is described below.
```

As anyone who has tried to decipher such descriptions can testify, they are not always clear-cut, which leads to a lot of unnecessary work when trying to coax data out of a document.

It is even worse when I create my own data formats (for our engine's runtime data). I would like to document those formats in a clear and unambiguous way, so that others can understand them. But since I have no standardized way of doing that, I too have to resort to ad hoc methods.

This whole thing reminds me of the state of mathematics before formal algebraic notation was introduced, when you had to write things like: the sum of the squares of these two numbers equals the square of the previous number. Formal notation can bring a lot of benefits (just look at what it has done for mathematics, music, and chess).

For data layouts, a formal definition language would allow us to write a tool that could open any binary file (that we had a data definition for) and display its content in a human-readable way:

```
height = 128
width = 128
comment = "A funny cat animation"
frames = [
    {display_time = 0.1 image_data = [100 120 25 ...]}
    ...
]
```

The tool could even allow us to edit the readable data and save it back out as a binary file.

A formal language would also allow debuggers to display more useful information. By writing data definition files, we could make the debugger understand all our types and display them nicely. And it would be a lot cleaner than the hackery that is autoexp.dat.

Just to toss something out there, here's an idea of what a data definition might look like:

```
typedef uint32_t StringHash;

struct Light
{
    StringHash name;
    Vector3 color;
    float falloff_start;
    float falloff_end;
};

struct Level
{
    uint32_t version;
    uint32_t num_lights;
    uoffset32_t light_data_offset;

    ...

light_data_offset:
    Light lights[num_lights];
};
```

This is a C-inspired approach, with some additions. Array lengths can be parametrized on earlier data in the file, and labels can be used to generate offsets to different sections in the file.
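To make those two additions concrete, here is a rough Python sketch of what a reader for the `Level` layout would have to do by hand. Several things are assumptions of mine that the definition above does not pin down: little-endian byte order, no padding, `Vector3` as three 32-bit floats, and `light_data_offset` counted from the start of the buffer. The name `read_level` is made up for the illustration.

```python
import struct

# Assumed layouts (not specified by the definition above): little-endian, no padding,
# Vector3 = three 32-bit floats, offsets counted from the start of the buffer.
LEVEL_HEADER = struct.Struct("<III")  # version, num_lights, light_data_offset
LIGHT = struct.Struct("<I3fff")       # name hash, color x/y/z, falloff_start, falloff_end

def read_level(buf):
    """Hypothetical hand-written reader for the Level layout."""
    version, num_lights, light_data_offset = LEVEL_HEADER.unpack_from(buf, 0)
    lights = []
    # The array length comes from earlier data (num_lights) and the array
    # position comes from the stored offset (the light_data_offset label).
    for i in range(num_lights):
        name, r, g, b, start, end = LIGHT.unpack_from(buf, light_data_offset + i * LIGHT.size)
        lights.append({"name": name, "color": [r, g, b],
                       "falloff_start": start, "falloff_end": end})
    return {"version": version, "num_lights": num_lights, "lights": lights}
```

Everything in `read_level` is implied by the declaration, which is the point: with a formal definition language, nobody should have to write this function by hand.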

I'm still tossing around ideas in my head about what the best way would be to make a language like this a reality. Some of the things I'm thinking about are:

## Use Case

I don't think it would do much good to just define a language. I want to couple it with something that makes it immediately useful. First, for my own motivation. Second, to provide a "reality check" to make sure that the choices I make for the language are the right ones. And third, as a reference implementation for anyone else who might want to make use of the language.

My current idea is to write a binary-to-JSON converter. I.e., a program that given a data definition file can automatically convert back and forth between a binary and a JSON-representation of that same data.
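As a very small sketch of that round trip, the Python below converts a flat record to JSON and back. The "data definition" here is just a hard-coded list of field names and `struct` format codes standing in for a parsed definition file; the names `FRAME_HEADER`, `binary_to_json` and `json_to_binary` are hypothetical, and little-endian byte order is assumed.

```python
import json
import struct

# Hypothetical stand-in for a parsed data definition file:
# ordered (field name, struct format code) pairs.
FRAME_HEADER = [("height", "I"), ("width", "I"), ("display_time", "f")]

def _layout(definition):
    # Little-endian with no padding is an assumption, not part of the post.
    return struct.Struct("<" + "".join(code for _, code in definition))

def binary_to_json(definition, data):
    """Decode one record and return it as a JSON object string."""
    values = _layout(definition).unpack(data)
    names = [name for name, _ in definition]
    return json.dumps(dict(zip(names, values)))

def json_to_binary(definition, text):
    """Encode a JSON object string back into the binary layout."""
    record = json.loads(text)
    return _layout(definition).pack(*(record[name] for name, _ in definition))
```

A real converter would also need nested structs, arrays and offsets, but even this flat version shows why one definition can drive both directions.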

## Syntax

The syntax in the example is very "C like". The advantage of that is that it will automatically understand C structs if you just paste them into the data definition file, which reduces the work required to set up a file.

The disadvantage is that it can be confusing with a language that is very similar to C, but not exactly C. It is easy to make mistakes. Also, C++ (we probably want some kind of template support) is quite tricky to parse. If we want to add our own enhancements on top of that, we might just make a horrible mess.

So maybe it would be better to go for something completely different. Something Lisp-like perhaps. (Because: Yay, Lisp! But also: Ugh, Lisp.)

I'm still not 100% decided, but I'm leaning towards a restricted variant of C. Something that retains the basic syntactic elements, but is easier to parse.

## Completeness

Should this system be able to describe any possible binary format out there?

Completeness would be nice of course. It is kind of annoying to have gone through all the trouble of defining a language and creating the tools and still not be able to handle all forms of binary data.

On the other hand, there are a lot of different formats out there and some of them have a complexity that is borderline insane. The only way to be able to describe everything is to have a data definition language that is Turing complete and procedural (in other words, a detailed list of the instructions required to pack and unpack the data).

But if we go down that route, we haven't really raised the abstraction level. In that case, why even bother creating a new language? The format description could just be a list of the C instructions needed to unpack the data. That doesn't feel like a step forward.

Perhaps some middle ground could be found. Maybe we could make a language that was simple and readable for "normal" data, but still had the power to express more esoteric constructs. One approach would be to regard the "declarative statements" as syntactic sugar in a procedural language. With this approach, the declaration:

```
struct LightCollection
{
    unsigned num_lights;
    LightData lights[num_lights];
};
```

Would just be syntactic sugar for:

```
function unpack_light_collection(stream)
    local res = {}
    -- the array length is read first, then used to size the array
    res.num_lights = unpack_unsigned(stream)
    res.lights = {}
    for i=1,res.num_lights do
        res.lights[i] = unpack_light_data(stream)
    end
    return res
end
```

This would allow the declarative syntax to be used in most places, but we could drop out to full-featured Turing complete code whenever needed.
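Here is one way that idea could look, sketched in Python rather than in the language itself. `declare_struct` is a hypothetical helper that turns a declarative field list into an ordinary unpack function, so declared structs and hand-written unpackers are interchangeable. `LightData` is not defined anywhere above, so a single 32-bit float is assumed for it, and `unpack_cstring` is a made-up example of the procedural escape hatch.

```python
import io
import struct

def unpack_unsigned(stream):
    return struct.unpack("<I", stream.read(4))[0]

def unpack_light_data(stream):
    # Assumption: LightData is a single 32-bit float (the post does not define it).
    return struct.unpack("<f", stream.read(4))[0]

def unpack_cstring(stream):
    # Procedural escape hatch: a NUL-terminated ASCII string, which a purely
    # declarative "type name[count]" field could not describe.
    out = bytearray()
    while True:
        byte = stream.read(1)
        if byte in (b"", b"\x00"):
            return out.decode("ascii")
        out += byte

def declare_struct(*fields):
    """Build an unpack function from (name, unpacker, count_field or None) entries."""
    def unpack(stream):
        res = {}
        for name, unpack_field, count_field in fields:
            if count_field is None:
                res[name] = unpack_field(stream)
            else:
                # Array whose length is an earlier field of the same struct.
                res[name] = [unpack_field(stream) for _ in range(res[count_field])]
        return res
    return unpack

# The declarative form of LightCollection from above.
unpack_light_collection = declare_struct(
    ("num_lights", unpack_unsigned, None),
    ("lights", unpack_light_data, "num_lights"),
)
```

Because `declare_struct` returns a plain function of a stream, its result can itself be used as a field unpacker in another declaration, right next to hand-written functions like `unpack_cstring`.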

1. Years ago, I wrote a tool that made some steps towards this. It was a very simple hierarchical block structured language - like a minimal xml and trivial to parse. In it you could write both the data to be converted to binary, and a 'rules' file that would tell a compiler how to do the conversion.

The compiler would walk the structure of both the rules and the data file simultaneously, translating it as it went. There was a set of special rules for common things like adding a header, or saving the size of a block (including those in the future), or the length of an array.

The basic point of it was to minimise the amount of crap we had to write when converting data from our tools to game ready binary data. We could change the binary format independently of the data and the tools, which was nice.

It was also a big pain in the ass, mostly because that's all it did. It didn't help with versioning or backwards compatibility (which was a nightmare), or generate your binary loading code for you, and frankly I wrote it when I was still only 1 year into industry so it wasn't exactly my finest piece of code.

These things are all fixable, but it's definitely a problem with a lot of details that need carefully unpicking. Code gen and backwards compatibility are definitely two biggies. Being able to avoid having to put your data into a text file first is another.

It's an interesting problem though. We also have a need for this, so I'd be interested in talking to you about it. It just appears that standardised but ad hoc methods have been easier to date.

ta,
Sam

2. FWIW, there's an existing standard called ASN.1 that provides a formal language for describing binary data layouts. I think it's generally used for defining messaging protocols.

It's verbose and nowhere near as readable as your pseudocode though, so even if ASN.1 provides some of the functionality you're looking for I suspect it's more heavyweight than most game developers would want.

3. I can see merit in having a more formal way to define structure. One approach that comes to mind is a hex editor called Synalyze It[1], where you can define grammars for file formats etc. and it can highlight and understand parts of the binary. The grammar isn't quite as human-readable, being XML created by the grammar editor, but it has the mechanics needed for many file formats laid out.

[1] http://www.synalysis.net

4. Google's protocol buffers might interest you...

5. Hi,

I think 010Editor [1] does something similar to what you describe, but they have a language close to C in syntax which lets you describe "every" case. So it's less simple than what you are proposing, but it's very powerful.

By the way, I totally love your blog. Believe it or not, I was working at a game company that was trying to do exactly what you are doing at bitsquid (but only for internal use), and I think we made all the errors you list in this blog (all-in-one editor, complex XML format, complex serialization system, everything is an object, etc.). I was very disappointed with the technical decisions and... a colleague showed me your blog and it blew my mind!!

Now, I no longer work in the game industry, nor do I code in C++, but I think your blog is one of the main reasons (maybe also the book "Coders at Work") I'm still coding for a living.

Keep up the good work,
Andreas, a true fan

[1] http://www.sweetscape.com/010editor/

1. Indeed, 010Editor's templates are what I have been using. After talking to a lot of friends who do reverse engineering, this is basically the standard.

2. Thanks for the nice words!

I had a quick look at the 010Editor data templates. And you are right, it looks very similar to what I was looking for.

I'll investigate it further.

3. You're welcome :)

When I read your blog for the first time I was so amazed, because it was like you answered all our problems with a very simple, understandable and yet extremely powerful and modular solution. On our side we had some hyper-bloated tech that was almost unusable despite five years of R&D... What a waste of time and resources!

I want to write something about that, because it's the exact opposite (in terms of design) of your engine and I think it will be a great example of what to avoid :).

6. There are examples of reading such data if you look at any open-source Halo map editor. We had a very similar approach to what you're looking at when writing updates to the editor, Entity. Though this approach would of course need some way to detect the different sets of data.

7. The answer is "Protocol Buffers": already mentioned, but worth a look.
