--- title: "Coding Style Conventions" layout: default permalink: /page_code_style.html ---
DynamoRIO
|
The overall goal is to make the source code as readable and as self-documenting as possible. Everyone following the same style guidelines is an important part of keeping the code consistent and maintainable.
We employ automated code formatting via clang-format
version 14.0. The .clang-format
top-level file specifies the style rules for all .h
, .c
, and .cpp
source code files. Developers are expected to set up their editors to use clang-format
when saving each file (see the clang-format
documentation for pointers to vim, emacs, and Visual Studio setup instructions). Our test suite includes a format check and will fail any code whose formatting does not match the clang-format
output.
Some of the style conventions have changed over time, but we have not wanted to incur the cost in time and history confusion of changing all old code. Thus, you may observe old code that does not comply with some of these conventions. These listed conventions overrule surrounding code! Please change the style of old code when you are making other changes to the same lines.
_api.h
suffix: e.g., encode_api.h
. If the content is solely used externally, it should be named the same as its exported name: e.g., dr_inject.h
. For the former case, where the exported name is different, the include guards should use the exported name.Variable and function names use only lowercase letters. Multi-word function and variable names are all lowercase with underscores delimiting words. Do not use CamelCase for names, unless mirroring Windows-defined data structures.
GOOD: instr_get_target()
BAD: instrGetTarget()
_t
: This is true for C++ class names as well.instr_t
struct begin with instr_
.core/
, short names or any global name with a chance of colliding with names from an including application linking statically should be qualified with a d_r_
prefx: e.g., d_r_dispatch
. This is distinct from the dr_
prefix which is used on exported interface names.static
when possible for every function or variable that is not needed outside of its own file.When declaring a function with no arguments, always explicitly use the void
keyword. Otherwise the compiler will not be able to check whether you are incorrectly passing arguments to that function.
GOOD: int foo(void);
BAD: int foo();
Use the IN
, OUT
, and INOUT
labels to describe function parameters. This is a recent addition to DynamoRIO so you will see many older functions without these labels, but use them on all new functions.
GOOD: int foo(IN int length, OUT char *buf);
BAD: int foo(int length, char **buf);
Only use boolean types as conditionals. This means using explicit NULL comparisons and result comparisons. In particular with functions like strcmp() and memcmp(), the use of ! is counter-intuitive.
GOOD: if (p == NULL) ...
BAD: if (!p)
GOOD: if (p != NULL) ...
BAD: if (p)
GOOD: if (strncmp(...) == 0) ...
BAD: if (!strncmp(...))
0
.It's much easier to read if (i == 0)
than if (0 == i)
. The compiler, with all warnings turned on (which we have), will warn you if you use assignment rather than equality.
GOOD: if (i == 0) ...
BAD: if (0 == i)
Use the TEST
and related macros for testing bits.
GOOD: if (TEST(BITMASK, x))
BAD: if ((x & BITMASK) != 0)
INT64_FORMAT
and related macros for printing 64-bit integers.ASSERT_TRUNCATE
macros when casting to a smaller type.PFX
(rather than p, which is inconsistent across compilers) and other printf macros for printing pointer-sized variables in code using general printing libraries. For code that exclusively uses DR's own printing facilities, p is allowed: its improved code readability and simplicity outweigh the risk of such code being copied into non-DR locations and resulting in inconsistent output.const
makes code easier to read and lets the compiler complain about errors and generate better code. It is also required for the most efficient self-protection. Use whenever possible. However, do not mark simple scalar type function parameters as const
.Place *
(or &
for C++ references) prefixing variable names (C style), not suffixing type names (Java style):
GOOD: char *foo;
BAD: char* foo;
In a struct, union, or class, list each field on its own line with its own type declaration, even when sharing the type of the prior field. Similarly, declare global variables separately. Local variables of the same type can optionally be combined on a line.
GOOD:
BAD:
char
is signed: use our sbyte
typedef for a signed one-byte type./∗ ∗/
comments are preferable to //
. Put stars on each line of a multi-line comment, like this: /* multi-line comment * with stars */The trailing
∗/
can be either on its own line or the end of the preceding line, but on its own line is preferred.For C++ code, //
comments are allowed.
Do not use large, clunky function headers that simply duplicate information in the code itself. Such headers tend to contain stale, incorrect information, for two reasons: the code is often updated without maintaining the header, and since the headers are a pain to type they are often copied from other functions and not completely modified for their new home. They also make it harder to see the code or to group related functions, as they take up so much screen space. It is better to have leaner, more maintainable, and more readable implementation files by using self-descriptive function and parameter names and placing comments for function parameters next to the parameters themselves.
GOOD:
/* Retrieves the name of the logfile for a particular thread. * Returns false if no such thread exists. */ bool get_logfile(IN thread_id_t thread, OUT char **fname, IN size_t fname_len)
BAD:
/*------------------------------------------------------ * Name: get_logfile * * Purpose: * Retrieves the name of the logfile for a particular thread. * * Parameters: * [IN] thread = which thread * [OUT](IN] fname = where to store the logfile name * [IN] fname_len = the size of the fname buffer * * Returns: * True if successful. * False if no such thread exists. * * Side effects: * None. * ------------------------------------------------------ */ bool get_logfile(thread_id_t thread, char *fname, size_t fname_len)
Use doxygen comments on all function and type declarations that are exported as part of the API. For comments starting with /∗∗
, leave the rest of the first line empty, unless the entire comment is a single line. Some examples (ignore leading dots – only there to work around GitHub markdown problems with leading spaces in literal blocks in list entries):
DR_API /** * Returns the entry point of the function with the given name in the module * with the given base. Returns NULL on failure. * \note Currently Windows only. */ generic_func_t dr_get_proc_address(IN module_handle_t lib, IN const char *name); /** * Data structure passed with a signal event. Contains the machine * context at the signal interruption point and other signal * information. */ typedef struct _dr_siginfo_t { int sig; /**< The signal number. */ void *drcontext; /**< The context of the thread receiving the signal. */ dr_mcontext_t mcontext; /**< The machine state at the signal interruption point. */ siginfo_t siginfo; /**< The signal information provided by the kernel. **/ } dr_siginfo_t;
Within doxygen comments, create links using parentheses for functions foo()
and a leading #
for other items such as types #dr_siginfo_t
or defines #DR_REG_START_GPR
. See the doxygen documentation for more information: http://www.doxygen.nl/manual/autolink.html
#if DISABLED_UNTIL_BUG_812_IS_FIXED
), and perhaps additionally explain in a comment why the code is disabled.XXX
in comments to indicate code that could be optimized or something that may warrant re-examination later. Include the issue number using the syntax i#<number>
. For example (ignore leading dots – only there to work around GitHub markdown problems with leading spaces in literal blocks in list entries): /* XXX i#391: this could be done more efficiently via ... */
FIXME
or TODO
in comments to indicate missing features that are required and not just optimizations or optional improvements (use XXX
for those). Include the issue number using the syntax i#<number>
. For example (ignore leading dots – only there to work around GitHub markdown problems with leading spaces in literal blocks in list entries): /* FIXME i#999: we do not yet handle a corner case where ... */
NOCHECKIN
comment. The make/codereview.cmake
script will remind you to clean up the code. x = 4; /* NOCHECKIN: temporary debugging change */
/**************************************************************************** * Name for this group of functions */If a closing marker is needed use this style:
/* ****************************************************************************/
Uninitialized variables warning (W4701 for cl): Don't initialize variables when you don't need to, so that we can still have good warnings about uninitialized variables in the future. Only if the compiler can't analyze code properly is it better to err on the side of a deterministic bug and set to 0 or {0}
.
Use do {} while ()
loops to help the compiler figure out that variables will get initialized. The generated code on those constructs is faster and better predicted (although optimizations should be able to transform simple loops).
Use an indentation level of 4 spaces (no tabs, always expand them to spaces when saving the file). (Exception: in CMakeLists.txt and other CMake scripts, use an indentation level of 2 spaces.)
WARNING: Emacs defaults are not always correct here. Make sure your .emacs contains the following:
For CMake, use cmake-mode which does default to 2 spaces.
Functions should have their type on a separate line from their name. Place the function's opening brace on a line by itself at zero indentation.
Function declarations should also have the type on a separate line, although this rule can be relaxed for short (single-line) signatures with a one-line comment or no comment beforehand.
Put spaces after commas in parameter and argument lists
GOOD: foo(x, y, z);
BAD: foo(x,y,z);
Do not put spaces between a function name and the following parenthesis. Do put a space between a for
, if
, while
, do
, or switch
and the following parenthesis. Do not put spaces after an opening parenthesis or before a closing parenthesis, unless the interior expression is complex and contains multiple layers of parentheses.
GOOD: foo(x, y, z);
BAD: foo (x, y, z);
BAD: foo( x, y, z);
BAD: foo(x, y, z );
GOOD: if (x == 6)
BAD: if( x==6 )
Always put the body of an if or loop on a separate line from the line containing the keyword.
GOOD:
BAD:
#
character must be in the first column, but the rest of the statement can be indented. DODEBUG
, DOSTATS
, or DOLOG
macros, or the IF_WINDOWS
and related macros, rather than ifdefs, for common defines.DO_ONCE(SYSLOG_INTERNAL
. Instead use two new macros: DODEBUG_ONCE
and SYSLOG_INTERNAL_*_ONCE
.make/codereview.cmake
's style checks to examine the code for known poor coding patterns. In the future we may add checks using astyle
(issue 83).When using DEBUG_DECLARE or other conditional macros at the start of a line, move any code not within the macro to the subsequent line, to aid readability and avoid the reader skipping over it under the assumption that it's debug-only. E.g.:
GOOD:
BAD:
Avoid embedding assignments inside expressions. We consider a separate assignment statement to be more readable. E.g.:
GOOD:
BAD:
Avoid embedding non-const expressions inside macros. We consider a separate expression statement to be more readable, as well as avoiding hidden errors when macros such as ASSERT are disabled in non-debug build. Example:
GOOD:
BAD:
Use named constants rather than "magic values" embedded in the code. Recognizing and naming constants up front and centralizing them makes assumptions clearer, code more readable, and modifying and maintaining the code easier.
GOOD: char buf[MAXIMUM_LINE_LENGTH];
BAD: char buf[128];
core/
directory. If code must diverge for Windows versus Linux, provide an OS-independent interface documented in core/os_shared.h
and implemented separately in core/unix/
and core/windows/
.