Data Oriented Design – Principles

Data Oriented Design (DOD) is an approach to designing software that focuses on context: the data and the transformations applied to it. Object Oriented Design (OOD), on the other hand, strives to decompose the problem into manageable pieces of responsibility represented as classes. DOD favors a carefully crafted solution that is well suited to the problem at hand, while OOD looks for general patterns that cover the requirements (including potential ones).

There is no clear winner between these two methodologies. While OOD might seem to be the prevalent design school nowadays, there is no reason to ignore DOD. Why is that? DOD shines in two fields: performance and flexibility. By understanding the environment we can approach the optimal efficiency for a given platform. And because we solve the concrete problem, we can always arrive at a feasible solution without being distracted by the (often conflicting) requirements that a general pattern would have to cover.

I do believe that these approaches can be merged, and you are the one to determine how much DOD or OOD your application should contain.

Overview

[Image: dod]

Imprint this image in your mind
Close your eyes
Imagine a mosaic of such images
This is your solution

Decalogue

  1. You shall determine properties of input data:
    * size (copying vs sharing)
    * uniqueness (mapping hash : data, and storing only hash vs keeping everything as is)
    * divisibility (many blocks (array) vs single entity)
    * source access method (stream vs database (various optimizations already present) vs simple binary (example: file))
    * data format (binary, JSON, XML, UTF-8 plain text)
    * influx (stable data vs bursts vs steady throughput)
  2. You shall know the detailed requirements of xforms (transformations)
    * latency (time from scheduling the task to receiving the answer)
    * throughput (number of elements that should be xformed per second/minute/hour)
    * reentrancy (should the xform be reentrant and produce expected results)
    * pre/post-conditions and invariants (how constrained your xform is by its postconditions and invariants, and how it should react to broken preconditions: undefined behavior with greater efficiency vs double checking with lower speed)
  3. You shall define the behavior in concurrent scenarios
    * eventual guarantees (what will be the final state/guarantee in case of multiple actors?)
    * independence (strong synchronization and worse efficiency vs loose synchronization and better efficiency)
  4. You shall think like the platform
    * OS API functions (are there any higher level synchronization primitives, async IO read/write, etc.)
    * thread/process restrictions (max threads per process, cost of spawning proc./thread, inter-process messaging)
    * atomic file operations (can you safely lock on a file handle, and with what semantics; for example, byte-range locks on Windows are mandatory, while POSIX locks are advisory)
    * user rights (what privileges are required for your solution to operate)

  5. You shall understand the machine
    * 32/64-bit (64-bit means larger pointers (data size) and a potential lack of support in some libraries, but also more registers and ongoing compiler support)
    * CPU family (ARM/x86/x64/Cell) (efficiency of and support for floating-point arithmetic, weak/strong synchronization guarantees, branching efficiency, instruction/data cache sizes, unaligned data costs, clock speed (MHz))
    * peripherals (GPU present and potential GPGPU usage, disk speed (SSD/HDD/flash) and the cost of fetching data from secondary storage, access patterns of peripherals, DMA present and the ability to delegate IO operations)
  6. You shall avoid dynamism
    * branching (avoid complex conditional logic and use early-out checks only if you know that they improve performance)
    * sort (data shall be sorted so that similar operations run back to back and their code stays in the instruction cache)
    * virtual calls (while a virtual call is relatively cheap in itself, indirect calls through pointers disturb the pipeline and can seriously degrade performance)
    * bulk (operate on data in a SIMD-like fashion, with the data properly aligned and processed in a predictable, prefetch-friendly way)
    * static (strive to calculate as much as possible at compile time, and prefer logic that can be optimized before the code runs)
  7. You shall simplify
    * feature creep (implement only what should be implemented; eagerness wastes precious time)
    * no patterns (most of the time you are not designing a widely used library, so focus on the job, not on the general way that the problem could be solved)
    * clarity (no matter how you optimize, make sure that the code is readable and commented where needed)
  8. You shall structure your code around data
    * states (determine the state of your data)
    * assign operations (moving from one state to another will be done with some xform)
  9. You shall not forget the universal & timeless design principles
    * DRY (one reason to change – one place to do it)
    * orthogonality (fewer dependencies between modules, easier reuse)
    * testability (no tests, no product)
    * morphism (decompose the application so that even heavily customized xforms can be swapped)
    * documentation (clearly present the goals, features and restrictions of your architecture)
  10. You shall measure
    * statistics (add logging to the critical parts of your program so that you know the usage statistics of your data)
    * tuning (tune the properties of your code according to those statistics in order to reach the set goals)
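
To make rules 6 and 10 a bit more tangible, here is a minimal sketch in Python. It groups records by the kind of operation they need, runs one xform per homogeneous group instead of dispatching per record, and counts how many records of each kind were seen. The operation kinds `SCALE` and `OFFSET`, the function names and the parameters are all invented for this illustration. Python will not show the cache and pipeline effects that the rules are really about; the sketch only shows the shape of the data layout.

```python
from array import array
from collections import Counter

# Hypothetical operation kinds, used only for this illustration.
SCALE, OFFSET = 0, 1


def scale_batch(values, factor):
    """Bulk xform: multiply every value in a homogeneous batch."""
    return array("d", (v * factor for v in values))


def offset_batch(values, delta):
    """Bulk xform: add a constant to every value in a homogeneous batch."""
    return array("d", (v + delta for v in values))


def transform(kinds, values, factor=2.0, delta=1.0, stats=None):
    """Group records by kind, then run one xform per group."""
    # Rule 6, "sort": collect the indices of each kind so that every batch
    # contains only one kind of operation.
    batches = {SCALE: [], OFFSET: []}
    for index, kind in enumerate(kinds):
        batches[kind].append(index)

    out = array("d", values)
    plan = ((SCALE, scale_batch, factor), (OFFSET, offset_batch, delta))
    for kind, xform, argument in plan:
        indices = batches[kind]
        # Rule 6, "bulk": one call per batch instead of one dispatch per record.
        results = xform(array("d", (values[i] for i in indices)), argument)
        for i, result in zip(indices, results):
            out[i] = result
        # Rule 10, "statistics": record how many records of each kind were seen.
        if stats is not None:
            stats[kind] += len(indices)
    return out


# Data is kept as two parallel arrays rather than a list of objects.
kinds = array("b", [SCALE, OFFSET, SCALE, OFFSET, OFFSET])
values = array("d", [1.0, 2.0, 3.0, 4.0, 5.0])
stats = Counter()
out = transform(kinds, values, stats=stats)
print(list(out))    # [2.0, 3.0, 6.0, 5.0, 6.0]
print(dict(stats))  # {0: 2, 1: 3}
```

The counters are the input for the "tuning" step: if one kind dominates, that is the batch worth optimizing first.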
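
Rule 8 can be sketched in a similar way. In the hypothetical example below, each state of the data ("raw", "parsed", "accepted", "rejected") is its own collection, and moving from one state to another is done by a plain function, so no record needs to carry a status flag. The state names, the function names and the `limit` parameter are assumptions made for the illustration only.

```python
def parse(raw):
    """xform: raw state -> parsed state."""
    return [int(text) for text in raw]


def validate(parsed, limit=100):
    """xform: parsed state -> accepted and rejected states."""
    accepted = [n for n in parsed if 0 <= n <= limit]
    rejected = [n for n in parsed if not 0 <= n <= limit]
    return accepted, rejected


raw = ["3", "250", "7", "-1"]
parsed = parse(raw)
accepted, rejected = validate(parsed)
print(accepted)  # [3, 7]
print(rejected)  # [250, -1]
```

Because every state is a separate collection, a later xform that only cares about accepted values never has to branch on the state of an element.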