These are the phpBB Coding Guidelines for Olympus; every effort should be made to follow them as closely as possible.

Coding Guidelines


1. Defaults

1.i. Editor Settings

Tabs vs Spaces:

In order to make this as simple as possible, we will be using tabs, not spaces. We use 4 (four) spaces for one tab, so set the tab width in your editor to 4 spaces. Make sure that when you save a file, it is saving tabs and not spaces. This way, everyone can have the code displayed the way they like it without breaking the layout of the actual files.

Tabs at the beginning of a line are no problem, but tabs within a line can be, unless you use the same tab width as everyone else. In general, aim for this:

{TAB}$mode{TAB}{TAB}= request_var('mode', '');
{TAB}$search_id{TAB}= request_var('search_id', '');
	

When entered with tabs (replacing each {TAB}), both equal signs line up in the same column.

Linefeeds:

Ensure that your editor saves files in the UNIX format, meaning lines are terminated with a newline (LF), not with a CR/LF combination as on Win32 or a bare CR as on classic Mac OS. Any decent editor should be able to do this, but it might not always be the default setting. Know your editor. If you want advice on a text editor for Windows, just ask one of the developers; some of them do their editing on Win32.

1.ii. File Header

Standard header for new files:

This header template must be included at the start of all phpBB files:

/**
*
* @package {PACKAGENAME}
* @version $Id: $
* @copyright (c) 2007 phpBB Group
* @license http://opensource.org/licenses/gpl-license.php GNU Public License
*
*/
	

See section 1.iii (File Locations) for the correct {PACKAGENAME}.

Files containing executable code:

For those files, put an empty comment directly after the header to prevent the documentor from assigning the header to the first code element it finds.

/**
* {HEADER}
*/

/**
*/
{CODE}
	

Files containing only functions:

Do not forget to comment the functions (especially the first function following the header). Each function should have at least a comment describing what it does. For more complex functions, it is recommended to document the parameters too.

Files containing only classes:

Do not forget to comment the class. Classes need a separate @package definition, which is the same as the package name in the file header. Apart from this special case, the same requirements and methods apply as for files containing only functions.

Code following the header in files containing only functions/classes:

In this case, the best way to avoid confusing the documentor is to add an ignore command, for example:

/**
* {HEADER}
*/

/**
* @ignore
*/
Small code snippet, mostly one or two defines or an if statement

/**
* {DOCUMENTATION}
*/
class ...
	

1.iii. File Locations

Functions used by more than one page should be placed in functions.php. Functions used by only one page should be placed at the end of that page's file, or within the relevant section's functions file. Some files in /includes hold functions responsible for specific sections, for example uploading files, displaying "things", user-related functions and so forth.

The following packages are defined. New features/functions should be placed within the files/locations listed for their package, and should specify the correct package name. Each entry in this list starts with the package name:

  • phpBB3
    Main files and files not belonging to any other package
  • acm
    /includes/acm, /includes/cache.php
    Cache System
  • acp
    /adm, /includes/acp, /includes/functions_admin.php
    Administration Control Panel
  • dbal
    /includes/db
    Database Abstraction Layer.
    Base class is dbal
    • /includes/db/dbal.php
      Base DBAL class, defining the overall framework as well as common denominators
    • /includes/db/firebird.php
      Firebird/Interbase Database Abstraction Layer
    • /includes/db/mssql.php
      MSSQL Database Abstraction Layer
    • /includes/db/mssql_odbc.php
      MSSQL ODBC Database Abstraction Layer for MSSQL
    • /includes/db/mysql.php
      MySQL Database Abstraction Layer for MySQL 3.x/4.0.x
    • /includes/db/mysql4.php
      MySQL4 Database Abstraction Layer for MySQL 4.1.x/5.x
    • /includes/db/mysqli.php
      MySQLi Database Abstraction Layer
    • /includes/db/oracle.php
      Oracle Database Abstraction Layer
    • /includes/db/postgres.php
      PostgreSQL Database Abstraction Layer
    • /includes/db/sqlite.php
      Sqlite Database Abstraction Layer
  • diff
    /includes/diff
    Diff Engine
  • docs
    /docs
    phpBB Documentation
  • images
    /images
    All images not belonging to a specific style
  • install
    /install
    Installation System
  • language
    /language
    All language files
  • login
    /includes/auth
    Login Authentication Plugins
  • VC
    /includes/captcha
    CAPTCHA
  • mcp
    mcp.php, /includes/mcp, report.php
    Moderator Control Panel
  • ucp
    ucp.php, /includes/ucp
    User Control Panel
  • utf
    /includes/utf
    UTF8-related functions/classes
  • search
    /includes/search, search.php
    Search System
  • styles
    /styles, style.php
    phpBB Styles/Templates/Themes/Imagesets

2. Code Layout/Guidelines

Please note that these guidelines apply to all PHP, HTML, JavaScript and CSS files.

2.i. Variable/Function Naming

We will not be using any form of Hungarian notation in our naming conventions. Many of us believe that Hungarian notation is one of the primary code obfuscation techniques currently in use.

Variable names:

Variable names should be in all lowercase, with words separated by an underscore, for example:

$current_user is right, but $currentuser and $currentUser are not.

Names should be descriptive, but concise. We don't want huge sentences as our variable names, but typing a couple of extra characters is always better than wondering what exactly a certain variable is for.

Loop indices:

The only situation where a one-character variable name is allowed is when it's the index for some looping construct. In this case, the index of the outer loop should always be $i. If there's a loop inside that loop, its index should be $j, followed by $k, and so on. If the loop is being indexed by some already-existing variable with a meaningful name, this guideline does not apply. For nested loops, this looks like:

for ($i = 0; $i < $outer_size; $i++)
{
	for ($j = 0; $j < $inner_size; $j++)
	{
		foo($i, $j);
	}
}
	

Function names:

Functions should also be named descriptively. We're not programming in C here; we don't want to write functions called things like "stristr()". Again, use all lowercase names with words separated by a single underscore character. Function names should preferably contain a verb. Good function names are print_login_status(), get_user_data(), etc.

Function arguments:

Arguments are subject to the same guidelines as variable names. We don't want a bunch of functions like do_stuff($a, $b, $c). In most cases, we'd like to be able to tell how to use a function just by looking at its declaration.

Summary:

The basic philosophy here is to not hurt code clarity for the sake of laziness. This has to be balanced by a little common sense, though: print_login_status_for_a_given_user(), for example, goes too far; that function would be better named print_user_login_status(), or just print_login_status().

Special names:

For all emoticons, use the term smiley in the singular and smilies in the plural.

2.ii. Code

Always include the braces:

This is another case where being too lazy to type two extra characters causes problems with code clarity. Even if the body of some construct is only one line long, do not drop the braces. Just don't. For example:

// These are all wrong.

if (condition) do_stuff();

if (condition)
	do_stuff();

while (condition)
	do_stuff();

for ($i = 0; $i < $size; $i++)
	do_stuff($i);
	

// These are all right.

if (condition)
{
	do_stuff();
}

while (condition)
{
	do_stuff();
}

for ($i = 0; $i < $size; $i++)
{
	do_stuff($i);
}
	

Where to put the braces:

This one is a bit of a holy war, but we're going to use a style that can be summed up in a few sentences: braces always go on their own line, and the closing brace is always in the same column as the corresponding opening brace. For example:

if (condition)
{
	while (condition2)
	{
		...
	}
}
else
{
	...
}

for ($i = 0; $i < $size; $i++)
{
	...
}

while (condition)
{
	...
}

function do_stuff()
{
	...
}
	

Use spaces between tokens:

This is another simple, easy step that helps keep code readable without much effort. Whenever you write an assignment, expression, etc., always leave one space between the tokens. Basically, write code as if it were English. Put spaces between variable names and operators. Don't put spaces just after an opening bracket or before a closing bracket, and don't put spaces just before a comma or a semicolon. This is best shown with a few examples:

// Each pair shows the wrong way followed by the right way.

$i=0;
$i = 0;

if($i<7) ...
if ($i < 7) ...

if ( ($i < 7)&&($j > 8) ) ...
if ($i < 7 && $j > 8) ...

do_stuff( $i, 'foo', $b );
do_stuff($i, 'foo', $b);

for($i=0; $i<$size; $i++) ...
for ($i = 0; $i < $size; $i++) ...

$i=($j < $size)?0:1;
$i = ($j < $size) ? 0 : 1;
	

Operator precedence:

Do you know the exact precedence of all the operators in PHP? Neither do I. Don't guess. Always make it obvious by using brackets to force the precedence of an expression, so you know what it does. Don't overdo it, though, as that can hurt readability; basically, do not bracket single expressions. For example:

// What's the result? Who knows...

$bool = ($i < 7 && $j > 8 || $k == 4);
	

// Now you can be certain what the intended logic is.

$bool = (($i < 7) && (($j > 8) || ($k == 4)));
	

// But this one is even better: it is easier on the eye, and the intended order is preserved.

$bool = ($i < 7 && ($j > 8 || $k == 4));
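A minimal standalone sketch of why guessing is risky (the sample values of $i, $j and $k are arbitrary): in PHP, && binds more tightly than ||, so the unbracketed form is evaluated as (($i < 7 && $j > 8) || $k == 4). With these values, that gives a different result from bracketing the || part.

```php
<?php
// Arbitrary sample values chosen to expose the difference
$i = 10;
$j = 0;
$k = 4;

// Unbracketed: && binds tighter than ||, so PHP reads this as
// (($i < 7 && $j > 8) || $k == 4), which is true here
$implicit = ($i < 7 && $j > 8 || $k == 4);

// Bracketing the || part changes the meaning: false here,
// because $i < 7 is already false
$explicit = ($i < 7 && ($j > 8 || $k == 4));

var_dump($implicit); // bool(true)
var_dump($explicit); // bool(false)
```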
	

Quoting strings:

There are two different ways to quote strings in PHP: with single quotes or with double quotes. The main difference is that the parser does variable interpolation in double-quoted strings, but not in single-quoted strings. Because of this, you should always use single quotes unless you specifically need variable interpolation in that string. This way, we save the parser the trouble of scanning a bunch of strings where no interpolation needs to be done.

Also, if you are passing a string variable as an argument to a function, there is no need to enclose the variable in quotes; again, that just makes unnecessary work for the parser. Remember too that the escape sequences available in double-quoted strings (such as \n or \t) do not work in single-quoted strings; only \' and \\ are recognised there. Be careful, and feel free to break these rules if following them makes your code harder to read. For example:

// wrong

$str = "This is a really long string with no variables for the parser to find.";

do_stuff("$str");
	

// right

$str = 'This is a really long string with no variables for the parser to find.';

do_stuff($str);
	

// Sometimes single quotes are just not right

$post_url = $phpbb_root_path . 'posting.' . $phpEx . '?mode=' . $mode . '&amp;start=' . $start;
	

// Double quotes are sometimes needed to avoid cluttering the line with concatenations

$post_url = "{$phpbb_root_path}posting.$phpEx?mode=$mode&amp;start=$start";
	

In SQL statements, mixing single and double quotes is partly allowed (following the SQL formatting guidelines in section 2.iii); otherwise, try to stick to one method, mostly single quotes.

Associative array keys:

In PHP, it's legal to use a literal string as a key to an associative array without quoting that string. We don't want to do this: the string should always be quoted to avoid problems. Note that this applies only when using a literal, not when using a variable. For example:

// wrong
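A small standalone sketch of the quoting differences described above (the variable $mode and its value are only for illustration): interpolation happens only inside double quotes, and "\n" is a single newline character while '\n' is a literal backslash followed by the letter n.

```php
<?php
$mode = 'reply';

// Double quotes: $mode is replaced by its value
$double = "mode=$mode";

// Single quotes: no interpolation, the text is taken literally
$single = 'mode=$mode';

// "\n" is one newline character; '\n' is two characters (backslash, n)
$newline_double = "\n";
$newline_single = '\n';

echo $double, "\n";                 // mode=reply
echo $single, "\n";                 // mode=$mode
echo strlen($newline_double), "\n"; // 1
echo strlen($newline_single), "\n"; // 2
```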

$foo = $assoc_array[blah];
	

// right

$foo = $assoc_array['blah'];
	

// wrong

$foo = $assoc_array["$var"];
	

// right

$foo = $assoc_array[$var];
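A short standalone sketch of the kind of problem an unquoted key can cause: if a constant with the same name happens to be defined, the bare word resolves to that constant's value instead of the string. (Since PHP 8.0, a bare word that is not a defined constant throws an Error rather than being treated as a string.) The constant blah and the array contents are purely illustrative.

```php
<?php
$assoc_array = array(
	'blah'	=> 'from the string key',
	'other'	=> 'from the constant'
);

// Illustrative constant that happens to share the key's name
define('blah', 'other');

$quoted = $assoc_array['blah']; // always the 'blah' element
$bare = $assoc_array[blah];     // uses the constant's value, i.e. the 'other' element

echo $quoted, "\n"; // from the string key
echo $bare, "\n";   // from the constant
```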
	

Comments:

Each function should be preceded by a comment that tells a programmer everything they need to know to use that function. The meaning of every parameter, the expected input, and the output are required as a minimum. The function's behaviour in error cases (and what those error cases are) should also be described, but this is mostly included within the comment about the output.

Especially important to document are any assumptions the code makes, or preconditions for its proper operation. Any developer should be able to look at any part of the application and figure out what's going on in a reasonable amount of time.

Avoid using /* */ comments for small blocks; use // for short comments.
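A sketch of what such a function comment might look like, following the comment layout of the file header in section 1.ii. The function get_page_count() is hypothetical and exists only to carry the comment; note how the error cases are documented as part of the parameter and output descriptions.

```php
<?php
/**
* Return the number of pages needed to display a list of items.
*
* Hypothetical helper, shown only to illustrate the comment layout.
* Precondition: both arguments are numeric.
*
* @param int $total_items Number of items to paginate; negative values are treated as 0
* @param int $per_page Number of items per page; values below 1 are treated as 1
* @return int Number of pages, 0 if there are no items
*/
function get_page_count($total_items, $per_page)
{
	// Enforce the documented bounds instead of dividing by zero
	$total_items = max(0, (int) $total_items);
	$per_page = max(1, (int) $per_page);

	return (int) ceil($total_items / $per_page);
}

echo get_page_count(25, 10) . "\n"; // 3
```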

Magic numbers:

Don't use them. Use named constants for any literal value other than obvious special cases. It's fine, for example, to check whether an array has no elements by comparing against the literal 0. It's not fine to assign some special meaning to a number and then use it everywhere as a literal; this hurts readability AND maintainability. Likewise, use the constants true and false in place of the literals 1 and 0: even though they have the same values (but not the same type!), the actual logic is more obvious when you use the named constants. Typecast variables where needed and do not rely on a variable having the correct type (PHP is very loose about types, which can lead to security problems if a developer does not keep a very close eye on it).
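A minimal sketch combining both points: a named constant instead of a bare 1, and an explicit cast of outside input before a strict comparison. The constant names TOPIC_STATUS_UNLOCKED and TOPIC_STATUS_LOCKED are hypothetical, chosen for illustration only.

```php
<?php
// Hypothetical constants, named for illustration only
define('TOPIC_STATUS_UNLOCKED', 0);
define('TOPIC_STATUS_LOCKED', 1);

// Values coming from a request or a cookie always arrive as strings
$raw_status = '1';

// Without a cast, a strict comparison fails: '1' is a string, not an int
$uncast_check = ($raw_status === TOPIC_STATUS_LOCKED);

// Cast explicitly instead of relying on PHP's loose type juggling
$topic_status = (int) $raw_status;

// The named constant says what 1 means; the comparison yields true or false
$is_locked = ($topic_status === TOPIC_STATUS_LOCKED);

var_dump($uncast_check); // bool(false)
var_dump($is_locked);    // bool(true)
```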

Shortcut operators:

The only shortcut operators that cause readability problems are the increment ($i++) and decrement ($j--) operators. These operators should not be used as part of an expression; put them on a line of their own. Using them in expressions is just not worth the headache when debugging. For example:

// wrong

$array[++$i] = $j;
$array[$i++] = $k;
	

// right

$i++;
$array[$i] = $j;

$array[$i] = $k;
$i++;
	

Inline conditionals:

Inline conditionals should only be used to do very simple things. Preferably, use them only for assignments, not for function calls or anything complex. They can be harmful to readability, so don't fall in love with them just to save a few keystrokes. For example:

// wrong

($i < $size && $j > $size) ? do_stuff($foo) : do_stuff($bar);
	

// right

$min = ($i < $j) ? $i : $j;
	

Don't use uninitialized variables.

For phpBB3, we intend to use a higher level of run-time error reporting. This means that using an uninitialized variable will be reported as an error. These errors can be avoided by using the built-in isset() function to check whether a variable has been set, but preferably the variable should always exist. For checking whether an array has a key set, isset() is handy, though. For example:

// wrong

if ($forum) ...
	

// right

if (isset($forum)) ...
	

// also possible

if (isset($forum) && $forum == 5)
	

The empty() function is useful if you want to check whether a variable is not set or is empty (an empty string, 0 as an integer or string, NULL, false, an empty array, or a variable declared without a value inside a class). Therefore, use empty() in favour of isset($array) && sizeof($array) > 0; that check can be written more briefly as !empty($array).
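A standalone sketch of the values empty() treats as empty, and of the shorter array check (the sample values are arbitrary). Note that the string '0' counts as empty, which matters when '0' is a legitimate value.

```php
<?php
// Values that empty() treats as "empty"
$empty_values = array('', 0, '0', 0.0, null, false, array());

foreach ($empty_values as $value)
{
	var_dump(empty($value)); // bool(true) for every entry
}

// An unset variable is also empty, and checking it raises no warning
var_dump(empty($undefined_variable)); // bool(true)

// For arrays, !empty() is the short form of isset() && sizeof() > 0
$forum_ids = array(1, 2, 3);
$long_form = isset($forum_ids) && sizeof($forum_ids) > 0;
$short_form = !empty($forum_ids);

var_dump($long_form === $short_form); // bool(true)
```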

Switch statements:

Switch/case code blocks can get a bit long sometimes. To keep them easy to scan, the rule that an opening and closing brace sit in the same column also applies to switch/case blocks: each break is placed in the same column as its case. An example:

// Wrong

switch ($mode)
{
	case 'mode1':
		// I am doing something here
		break;
	case 'mode2':
		// I am doing something completely different here
		break;
}
	

// Good

switch ($mode)
{
	case 'mode1':
		// I am doing something here
	break;

	case 'mode2':
		// I am doing something completely different here
	break;

	default:
		// Always assume that none of the cases above matched
	break;
}
	

// Also good, if you have more code between the case and the break

switch ($mode)
{
	case 'mode1':

		// I am doing something here

	break;

	case 'mode2':

		// I am doing something completely different here

	break;

	default:

		// Always assume that none of the cases above matched

	break;
}
	

Even if the break for the default case is not needed, it is sometimes better to include it just for readability and completeness.

If no break is intended, please add a comment instead. An example:

// Example with no break

switch ($mode)
{
	case 'mode1':

		// I am doing something here

	// no break here

	case 'mode2':

		// I am doing something completely different here

	break;

	default:

		// Always assume that none of the cases above matched

	break;
}
	

2.iii. SQL/SQL Layout

Common SQL Guidelines:

All SQL should be cross-DB compatible. If DB-specific SQL is used, alternatives must be provided that work on all supported DBs (MySQL3/4/5, MSSQL (7.0 and 2000), PostgreSQL (7.0+), Firebird, SQLite, Oracle8, ODBC (generalised if possible)).

All SQL commands should utilise the DataBase Abstraction Layer (DBAL).

SQL code layout:

SQL statements are often unreadable without some formatting, since they tend to be big at times, so formatting them adds a lot to the readability of the code. Format SQL statements as follows, basically starting each keyword clause on its own line, indented with tabs:

$sql = 'SELECT *
<-one tab->FROM ' . SOME_TABLE . '
<-one tab->WHERE a = 1
<-two tabs->AND (b = 2
<-three tabs->OR b = 3)
<-one tab->ORDER BY b';
	

Here is the example with the tabs applied:

$sql = 'SELECT *
	FROM ' . SOME_TABLE . '
	WHERE a = 1
		AND (b = 2
			OR b = 3)
	ORDER BY b';
	

SQL Quotes:

Use double quotes where applicable (the variables in these examples have been typecast to integers beforehand). Examples:

// These are wrong.

"UPDATE " . SOME_TABLE . " SET something = something_else WHERE a = $b";

'UPDATE ' . SOME_TABLE . ' SET something = ' . $user_id . ' WHERE a = ' . $something;
	

// These are right.

'UPDATE ' . SOME_TABLE . " SET something = something_else WHERE a = $b";

'UPDATE ' . SOME_TABLE . " SET something = $user_id WHERE a = $something";
	

In other words use single quotes where no variable substitution is required or where the variable involved shouldn't appear within double quotes. Otherwise use double quotes.

Avoid DB specific SQL:

The "not equals operator", as defined by the SQL:2003 standard, is "<>"

// This is wrong.

$sql = 'SELECT *
	FROM ' . SOME_TABLE . '
	WHERE a != 2';
	

// This is right.

$sql = 'SELECT *
	FROM ' . SOME_TABLE . '
	WHERE a <> 2';
	

Common DBAL methods:

sql_escape():

Always use $db->sql_escape() if you need to check for a string within an SQL statement (even if you are sure the variable cannot contain single quotes - never trust your input), for example:

$sql = 'SELECT *
	FROM ' . SOME_TABLE . "
	WHERE username = '" . $db->sql_escape($username) . "'";
	

sql_query_limit():

We do not add LIMIT clauses to the SQL query, but use $db->sql_query_limit() instead. You basically pass the query, the total number of rows to retrieve and the offset.

Note: Since Oracle handles limits differently, and because of how we implemented this handling, you need to take special care if you use sql_query_limit() with an SQL query retrieving data from more than one table.

When using something like "SELECT x.*, y.jars", make sure that there is no column named jars in x; that is, make sure there is no overlap between an implicit column and the explicit columns.
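The row limit belongs to the DBAL because the same query, total and offset turn into different SQL on different backends. As a hypothetical illustration (build_limited_query() is not a phpBB function, and only two common syntaxes are covered), the same request might be rendered like this:

```php
<?php
/**
* Hypothetical helper, NOT part of phpBB: appends a row limit using one of two
* common syntaxes. In phpBB code, call $db->sql_query_limit() instead.
*
* @param string $sql Query without any LIMIT clause
* @param int $total Number of rows to retrieve
* @param int $offset Number of rows to skip
* @param string $dialect 'mysql', 'postgres' or 'sqlite' in this sketch
* @return string Query with the limit appended
*/
function build_limited_query($sql, $total, $offset, $dialect)
{
	$total = (int) $total;
	$offset = (int) $offset;

	switch ($dialect)
	{
		case 'mysql':
			// MySQL: LIMIT offset, count
			return $sql . " LIMIT $offset, $total";
		break;

		case 'postgres':
		case 'sqlite':
			// PostgreSQL and SQLite: LIMIT count OFFSET offset
			return $sql . " LIMIT $total OFFSET $offset";
		break;

		default:
			// Backends such as Oracle or MSSQL need the query itself rewritten
			throw new InvalidArgumentException('Dialect not covered by this sketch: ' . $dialect);
		break;
	}
}

$sql = 'SELECT * FROM phpbb_forums';

echo build_limited_query($sql, 10, 20, 'mysql') . "\n";
// SELECT * FROM phpbb_forums LIMIT 20, 10
echo build_limited_query($sql, 10, 20, 'postgres') . "\n";
// SELECT * FROM phpbb_forums LIMIT 10 OFFSET 20
```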

sql_build_array():

If you need to UPDATE or INSERT data, use the $db->sql_build_array() function. This function already escapes strings and checks other types, so there is no need to do this yourself. The data to be inserted should go into an array ($sql_ary), or directly into the statement if only one or two variables need to be inserted/updated. An example of an insert statement would be:

$sql_ary = array(
	'somedata'		=> $my_string,
	'otherdata'		=> $an_int,
	'moredata'		=> $another_int
);

$db->sql_query('INSERT INTO ' . SOME_TABLE . ' ' . $db->sql_build_array('INSERT', $sql_ary));
	

To complete the example, this is what an update statement would look like:

$sql_ary = array(
	'somedata'		=> $my_string,
	'otherdata'		=> $an_int,
	'moredata'		=> $another_int
);

$sql = 'UPDATE ' . SOME_TABLE . '
	SET ' . $db->sql_build_array('UPDATE', $sql_ary) . '
	WHERE user_id = ' . (int) $user_id;
$db->sql_query($sql);
	

The $db->sql_build_array() function supports the following modes: INSERT (example above), INSERT_SELECT (building query for INSERT INTO table (...) SELECT value, column ... statements), MULTI_INSERT (for returning extended inserts), UPDATE (example above) and SELECT (for building WHERE statement [AND logic]).
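To make the INSERT and UPDATE modes less abstract, here is a hypothetical, much simplified stand-in (build_array_sketch() is not phpBB code). It assumes standard SQL quoting, where a single quote inside a string is doubled; the real function escapes through the active DBAL, so a sketch like this must never replace it.

```php
<?php
/**
* Hypothetical, simplified stand-in for the INSERT and UPDATE modes.
* Not phpBB's implementation: real escaping is backend-specific.
*/
function build_array_sketch($mode, $sql_ary)
{
	$values = array();

	foreach ($sql_ary as $column => $value)
	{
		if (is_null($value))
		{
			$values[$column] = 'NULL';
		}
		else if (is_bool($value))
		{
			$values[$column] = ($value) ? 1 : 0;
		}
		else if (is_int($value) || is_float($value))
		{
			$values[$column] = $value;
		}
		else
		{
			// Standard SQL escaping of single quotes, for illustration only
			$values[$column] = "'" . str_replace("'", "''", (string) $value) . "'";
		}
	}

	if ($mode == 'INSERT')
	{
		return '(' . implode(', ', array_keys($values)) . ') VALUES (' . implode(', ', $values) . ')';
	}

	// UPDATE: comma-separated column = value pairs
	$pairs = array();
	foreach ($values as $column => $value)
	{
		$pairs[] = "$column = $value";
	}

	return implode(', ', $pairs);
}

$sql_ary = array(
	'somedata'	=> "it's a string",
	'otherdata'	=> 42
);

echo build_array_sketch('INSERT', $sql_ary) . "\n";
// (somedata, otherdata) VALUES ('it''s a string', 42)
echo build_array_sketch('UPDATE', $sql_ary) . "\n";
// somedata = 'it''s a string', otherdata = 42
```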

sql_in_set():

The $db->sql_in_set() function should be used for building IN () and NOT IN () constructs. Since MySQL in particular tends to be faster when a single value is compared using the = and <> operators, we let the DBAL decide what to do. A typical example of a positive match against a number of values would be:

$sql = 'SELECT *
	FROM ' . FORUMS_TABLE . '
	WHERE ' . $db->sql_in_set('forum_id', $forum_ids);
$db->sql_query($sql);
	

Based on the number of values in $forum_ids, the query can look different.

// SQL Statement if $forum_ids = array(1, 2, 3);

SELECT * FROM phpbb_forums WHERE forum_id IN (1, 2, 3)
	

// SQL Statement if $forum_ids = array(1) or $forum_ids = 1

SELECT * FROM phpbb_forums WHERE forum_id = 1
	

Of course the same is possible for doing a negative match against a number of values:

$sql = 'SELECT *
	FROM ' . FORUMS_TABLE . '
	WHERE ' . $db->sql_in_set('forum_id', $forum_ids, true);
$db->sql_query($sql);
	

Based on the number of values in $forum_ids, the query can look different here too.

// SQL Statement if $forum_ids = array(1, 2, 3);

SELECT * FROM phpbb_forums WHERE forum_id NOT IN (1, 2, 3)
	

// SQL Statement if $forum_ids = array(1) or $forum_ids = 1

SELECT * FROM phpbb_forums WHERE forum_id <> 1
	

If the given array is empty, an error will be produced.
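The decision rules described above can be sketched as follows. in_set_sketch() is hypothetical, handles integer values only, and throws an exception for an empty set, whereas the real DBAL reports the error in its own way.

```php
<?php
/**
* Hypothetical stand-in mirroring the behaviour described above.
* Integer values only, to keep the sketch short; not phpBB's code.
*/
function in_set_sketch($field, $values, $negate = false)
{
	// A single scalar is treated like a one-element array
	if (!is_array($values))
	{
		$values = array($values);
	}

	if (!sizeof($values))
	{
		throw new InvalidArgumentException('No values specified for ' . $field);
	}

	// Reindex so that $values[0] is always the first value
	$values = array_map('intval', array_values($values));

	if (sizeof($values) == 1)
	{
		// A single value: plain comparison operator
		return $field . (($negate) ? ' <> ' : ' = ') . $values[0];
	}

	return $field . (($negate) ? ' NOT IN ' : ' IN ') . '(' . implode(', ', $values) . ')';
}

echo in_set_sketch('forum_id', array(1, 2, 3)) . "\n"; // forum_id IN (1, 2, 3)
echo in_set_sketch('forum_id', 1, true) . "\n";         // forum_id <> 1
```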

sql_build_query():

The $db->sql_build_query() function is responsible for building SQL statements for SELECT and SELECT DISTINCT queries if you need to JOIN on more than one table, or retrieve data from more than one table while doing a JOIN. It must be used to make sure the resulting statement works on all supported DBs. Instead of explaining every possible combination, here is a short example:

$sql_array = array(
	'SELECT'	=> 'f.*, ft.mark_time',

	'FROM'		=> array(
		FORUMS_WATCH_TABLE	=> 'fw',
		FORUMS_TABLE		=> 'f'
	),

	'LEFT_JOIN'	=> array(
		array(
			'FROM'	=> array(FORUMS_TRACK_TABLE => 'ft'),
			'ON'	=> 'ft.user_id = ' . $user->data['user_id'] . ' AND ft.forum_id = f.forum_id'
		)
	),

	'WHERE'		=> 'fw.user_id = ' . $user->data['user_id'] . '
		AND f.forum_id = fw.forum_id',

	'ORDER_BY'	=> 'left_id'
);

$sql = $db->sql_build_query('SELECT', $sql_array);
	

The possible first parameter for sql_build_query() is SELECT or SELECT_DISTINCT. As you can see, the logic is pretty self-explanatory. For the LEFT_JOIN key, just add another array if you want to join on two tables, for example. The added benefit of this construct is that you can easily build the query statement based on conditions. For example, the above LEFT_JOIN is only necessary if server-side topic tracking is enabled; a slight adjustment would be:

$sql_array = array(
	'SELECT'	=> 'f.*',

	'FROM'		=> array(
		FORUMS_WATCH_TABLE	=> 'fw',
		FORUMS_TABLE		=> 'f'
	),

	'WHERE'		=> 'fw.user_id = ' . $user->data['user_id'] . '
		AND f.forum_id = fw.forum_id',

	'ORDER_BY'	=> 'left_id'
);

if ($config['load_db_lastread'])
{
	$sql_array['LEFT_JOIN'] = array(
		array(
			'FROM'	=> array(FORUMS_TRACK_TABLE => 'ft'),
			'ON'	=> 'ft.user_id = ' . $user->data['user_id'] . ' AND ft.forum_id = f.forum_id'
		)
	);

	$sql_array['SELECT'] .= ', ft.mark_time ';
}
else
{
	// Here we read the cookie data
}

$sql = $db->sql_build_query('SELECT', $sql_array);
	

2.iv. Optimizations

Operations in loop definition:

Always try to optimize your loops if operations are performed in the comparison part, since this part is executed on every iteration of the loop. For assignments, choose a descriptive name. Example:

// On every iteration the sizeof function is called

for ($i = 0; $i < sizeof($post_data); $i++)
{
	do_something();
}
	

// You are able to assign the (not changing) result within the loop itself

for ($i = 0, $size = sizeof($post_data); $i < $size; $i++)
{
	do_something();
}
	

Use of in_array():

Try to avoid using in_array() on huge arrays, and try not to call it inside loops if the array being checked has more than 20 entries. in_array() performs a linear search and can use a lot of CPU time. For small checks this is not noticeable, but checks against a huge array within a loop can add up to several seconds on their own. If you need this functionality, try using isset() on the array's keys instead, by swapping the values into keys. For example, a call to isset($array[$var]) is a lot faster than in_array($var, array_keys($array)).
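A minimal sketch of the flip-once, isset()-in-the-loop approach (the ID ranges are arbitrary). Keep in mind that array_flip() only works with integer and string values, and duplicate values collapse into a single key.

```php
<?php
// A large list of values we need to check against repeatedly
$allowed_ids = range(1, 10000);

// Flip once, outside the loop: values become keys, so each check is a hash lookup
$allowed_lookup = array_flip($allowed_ids);

$requested_ids = array(5, 20000, 9999);
$found = array();

foreach ($requested_ids as $id)
{
	// isset() on a key instead of in_array($id, $allowed_ids), which scans the whole array
	if (isset($allowed_lookup[$id]))
	{
		$found[] = $id;
	}
}

echo implode(', ', $found) . "\n"; // 5, 9999
```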

2.v. General Guidelines

General things:

Never trust user input (this also applies to server variables as well as cookies).

Try to sanitize values returned from a function.

Try to sanitize given function variables within your function.

The auth class should be used for all authorisation checking.

No attempt should be made to remove any copyright information (either contained within the source or displayed interactively when the source is run/compiled), neither should the copyright information be altered in any way (it may be added to).

Variables:

Use the request_var() function for all input except submit buttons or parameters that are only checked for existence.

The request_var() function determines the type of the result from the second parameter (which also sets the default value). If you need a specific scalar type, you have to tell request_var() explicitly by passing a default of that type. Examples:

// Old method, do not use it

$start = (isset($HTTP_GET_VARS['start'])) ? intval($HTTP_GET_VARS['start']) : intval($HTTP_POST_VARS['start']);
$submit = (isset($HTTP_POST_VARS['submit'])) ? true : false;
	

// Use request var and define a default variable (use the correct type)

$start = request_var('start', 0);
$submit = (isset($_POST['submit'])) ? true : false;
	

// $start is an int, the following use of request_var therefore is not allowed

$start = request_var('start', '0');
	

// Getting an array, keys are integers, value defaults to 0

$mark_array = request_var('mark', array(0));
	

// Getting an array, keys are strings, value defaults to 0

$action_ary = request_var('action', array('' => 0));
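To show how a default value can determine the type of the result, here is a hypothetical, heavily simplified stand-in. request_var_sketch() is not phpBB's implementation, and it reads from an array passed in ($source) rather than from the request superglobals.

```php
<?php
/**
* Hypothetical, simplified illustration: the default's type decides the
* type of the returned value. Not phpBB's request_var() code.
*/
function request_var_sketch($var_name, $default, $source)
{
	if (!isset($source[$var_name]))
	{
		return $default;
	}

	$value = $source[$var_name];

	if (!is_array($default))
	{
		// Scalar default: an array input is rejected, anything else is cast
		if (is_array($value))
		{
			return $default;
		}

		settype($value, gettype($default));
		return $value;
	}

	// Array default such as array(0) or array('' => 0):
	// its first key and value decide the key and value types
	if (!is_array($value))
	{
		return array();
	}

	$key_type = gettype(key($default));
	$value_type = gettype(current($default));
	$result = array();

	foreach ($value as $k => $v)
	{
		// Nested arrays are skipped in this sketch
		if (is_array($v))
		{
			continue;
		}

		settype($k, $key_type);
		settype($v, $value_type);
		$result[$k] = $v;
	}

	return $result;
}

// Stand-in for request data, which always arrives as strings
$get = array('start' => '20', 'mark' => array('3' => '1', '7' => '0'));

$start = request_var_sketch('start', 0, $get);        // int(20)
$mode = request_var_sketch('mode', '', $get);         // '' (not present, default)
$mark = request_var_sketch('mark', array(0), $get);   // array(3 => 1, 7 => 0)
```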
	

Login checks/redirection:

To show a forum login box use login_forum_box($forum_data), else use the login_box() function.

The login_box() function can take a redirect as the first parameter. As a rule of thumb, specify an empty string if you want to redirect to the user's current location; otherwise, do not add the $SID to the redirect string (for example, within the ucp/login we redirect to the board index because otherwise the user would be redirected to the login screen).

Sensitive Operations:

For sensitive operations always let the user confirm the action. For the confirmation screens, make use of the confirm_box() function.

Altering Operations:

For operations altering the state of the database, for instance posting, always verify the form token, unless you are already using confirm_box(). To do so, make use of the add_form_key() and check_form_key() functions.

	add_form_key('my_form');

	if ($submit)
	{
		if (!check_form_key('my_form'))
		{
			trigger_error('FORM_INVALID');
		}
	}
	

The string passed to add_form_key() needs to match the string passed to check_form_key(). Another requirement for this to work correctly is that all forms include the {S_FORM_TOKEN} template variable.
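The two names must match because the token is bound to the form name. The sketch below shows the general technique with an HMAC; create_form_token(), verify_form_token(), the secret and the session ID are all hypothetical, and phpBB's own add_form_key()/check_form_key() differ in detail.

```php
<?php
// Hypothetical per-installation secret; a real one must be random and private
$secret = 'example-secret';

function create_form_token($form_name, $session_id, $secret)
{
	// The token depends on the form name, so it is only valid for that form
	return hash_hmac('sha256', $form_name . '|' . $session_id, $secret);
}

function verify_form_token($form_name, $session_id, $token, $secret)
{
	// hash_equals() compares in constant time
	return hash_equals(create_form_token($form_name, $session_id, $secret), $token);
}

$token = create_form_token('my_form', 'session123', $secret);

var_dump(verify_form_token('my_form', 'session123', $token, $secret));    // bool(true)
var_dump(verify_form_token('other_form', 'session123', $token, $secret)); // bool(false)
```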

Sessions:

Sessions should be initiated on each page, as near the top as possible using the following code:

$user->session_begin();
$auth->acl($user->data);
$user->setup();
	

The $user->setup() call can be used to pass on additional language definitions and a custom style (used in viewforum).

Errors and messages:

All messages/errors should be output by calling trigger_error() with the appropriate message type and language string. Examples:

trigger_error('NO_FORUM');
	
trigger_error($user->lang['NO_FORUM']);
	
trigger_error('NO_MODE', E_USER_ERROR);
	

Url formatting

All URLs pointing to internal files need to be prepended by the $phpbb_root_path variable. Within the Administration Control Panel, all URLs pointing to internal files need to be prepended by the $phpbb_admin_path variable. This makes sure the path is always correct and lets users simply rename the admin folder with the ACP still working as intended (though some links will fail and the code needs to be slightly adjusted).

The append_sid() function from 2.0.x is available too, though it does not handle URL alterations automatically. Please have a look at the code documentation for more details on how to use append_sid(). A sample call to append_sid() can look like this:

append_sid("{$phpbb_root_path}memberlist.$phpEx", 'mode=group&amp;g=' . $row['group_id'])
	

General function usage:

Some of these functions are chosen over others only because of personal preference, with no benefit other than being consistent throughout the code.

  • Use sizeof instead of count

  • Use strpos instead of strstr

  • Use else if instead of elseif

  • Use false (lowercase) instead of FALSE

  • Use true (lowercase) instead of TRUE
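One practical consequence of preferring strpos() over strstr(): strpos() returns the integer offset of the match (0 when the needle is at the start) or false, so the result must be compared with !== false. A short standalone sketch (the string values are arbitrary):

```php
<?php
$haystack = 'phpBB';

// Right: strpos() returns the match position (which can be 0) or false
$found = (strpos($haystack, 'php') !== false) ? true : false;

// Wrong: the match is at position 0, which is falsy, so this reports "not found"
$loose = (strpos($haystack, 'php')) ? true : false;

// sizeof() is an alias of count(), so preferring it changes nothing functionally
$same_size = (sizeof(array(1, 2, 3)) === count(array(1, 2, 3)));

var_dump($found);     // bool(true)
var_dump($loose);     // bool(false)
var_dump($same_size); // bool(true)
```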


3. Styling

General things

Templates should be produced in a consistent manner. Where appropriate, they should be based on an existing copy, e.g. index, viewforum or viewtopic (which together implement a range of conditional and variable forms). Please also note that the indentation and coding guidelines also apply to templates where possible.

The outer table class forumline has gone and is replaced with tablebg.

When writing <table> the order <table class="" cellspacing="" cellpadding="" border="" align=""> creates consistency and allows everyone to easily see which table produces which "look". The same applies to most other tags for which additional parameters can be set, consistency is the major aim here.

Each block-level element should be indented by one tab, and the same goes for tabular elements, e.g. <tr>, <td> etc., whereby <table> and the following/closing <tr> should be at the same indentation level. This does not apply to div elements, of course.

Don't use <span> more than is essential ... the CSS is such that text sizes are dependent on the parent class. So writing <span class="gensmall"><span class="gensmall">TEST</span></span> will result in very very small text. Similarly don't use span at all if another element can contain the class definition, e.g.

<td><span class="gensmall">TEST</span></td>

can just as well become:

<td class="gensmall">TEST</td>

Try to match text class types with existing usage, e.g. don't use the nav class where viewtopic uses gensmall.

Row colours/classes are now defined by the template, use an IF S_ROW_COUNT switch, see viewtopic or viewforum for an example.

Remember that block level ordering is important ... while not all pages validate as XHTML 1.0 Strict compliant, it is something we're working towards.

Use a standard cellpadding of 2 and cellspacing of 0 on outer tables. Inner tables can vary from 0 to 3 or even 4 depending on the need.

Use div container/css for styling and table for data representation.

The separate catXXXX and thXXX classes are gone. When defining a header cell just use <th> rather than <th class="thHead"> etc. Similarly for cat, don't use <td class="catLeft"> use <td class="cat"> etc.

Try to retain consistency of basic layout and class usage, i.e. _EXPLAIN text should generally be placed below the title it explains, e.g. {L_POST_USERNAME}<br /><span class="gensmall">{L_POST_USERNAME_EXPLAIN}</span> is the typical way of handling this ... there may be exceptions and this isn't a hard and fast rule.

Try to keep template conditional and other statements tabbed in line with the block to which they refer.

This is correct:

<!-- BEGIN test -->
	<tr>
		<td>{test.TEXT}</td>
	</tr>
<!-- END test -->

this is also correct:

<!-- BEGIN test -->
<tr>
	<td>{test.TEXT}</td>
</tr>
<!-- END test -->

Either form gives immediate feedback on exactly what is looping; decide which way to use based on readability.


4. Templating

File naming

Firstly, templates now take the suffix ".html" rather than ".tpl". This was done simply to make life easier for some people with regard to syntax highlighting, etc.

Variables

All template variables should be named appropriately (using underscores for spaces): language entries should be prefixed with L_, system data with S_, URLs with U_, JavaScript URLs with UA_, and language strings to be put in JavaScript statements with LA_; all other variables should be presented 'as is'.

If the code does not set (and thereby overwrite) an L_* template variable explicitly, the template engine automatically tries to map it to the corresponding language entry. For example, {L_USERNAME} maps to $user->lang['USERNAME']. LA_* template variables are handled in the same way, but are escaped so they can be safely put into JavaScript code. This should reduce the need to assign lots of new language variables in Modifications.
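The fallback rule can be modelled in a few lines. `resolve_lang_var()` is a hypothetical function and not phpBB's template engine: an explicitly assigned value wins, otherwise the L_ or LA_ prefix is stripped and the key is looked up in a language array. addslashes() is only a simplified stand-in for whatever escaping the real engine applies to LA_* values.

```php
<?php
// Hypothetical sketch of the L_*/LA_* fallback rule; NOT phpBB's template code.
// $assigned: variables set explicitly by the code; $lang: the language array.
function resolve_lang_var($name, array $assigned, array $lang)
{
	// An explicitly assigned variable always wins.
	if (isset($assigned[$name]))
	{
		return $assigned[$name];
	}

	// LA_* entries: same lookup, but escaped for use inside a JavaScript string.
	// addslashes() is a simplified stand-in for the real escaping.
	if (strpos($name, 'LA_') === 0)
	{
		$key = substr($name, 3);
		return isset($lang[$key]) ? addslashes($lang[$key]) : '';
	}

	// L_* entries: strip the prefix and look up the language entry.
	if (strpos($name, 'L_') === 0)
	{
		$key = substr($name, 2);
		return isset($lang[$key]) ? $lang[$key] : '';
	}

	// Not a language variable: nothing to fall back to in this sketch.
	return '';
}
```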

Blocks/Loops

The basic block level loop remains and takes the form:

<!-- BEGIN loopname -->
	markup, {loopname.X_YYYYY}, etc.
<!-- END loopname -->

Loops are explained in more detail later; to keep things simple, conditionals and other statements are explained first.

Including files

Something that existed in 2.0.x which no longer exists in 3.0.x is the ability to assign a template to a variable. This was used (for example) to output the jumpbox. Instead (perhaps better, perhaps not but certainly more flexible) we now have INCLUDE. This takes the simple form:

<!-- INCLUDE filename -->

You will note in the 3.0 templates the major sources start with <!-- INCLUDE overall_header.html --> or <!-- INCLUDE simple_header.html -->, etc. In 2.0.x control of "which" header to use was defined entirely within the code. In 3.0.x the template designer can output what they like. Note that you can introduce new templates (i.e. other than those in the default set) using this system and include them as you wish ... perhaps useful for a common "menu" bar or some such. No need to modify loads of files as with 2.0.x.

PHP

In a contentious decision, the ability to include PHP within templates has been introduced. This is achieved by enclosing the PHP within the relevant tags:

<!-- PHP -->
	echo "hello!";
<!-- ENDPHP -->

You may also include PHP from an external file using:

<!-- INCLUDEPHP somefile.php -->

It will be included and executed inline.

Note that template designers are strongly encouraged not to include PHP. The ability to include raw PHP was introduced primarily to allow end users to include banner code, etc. without modifying multiple files (as with 2.0.x). It was not intended for general use ... hence www.phpbb.com will not make available template sets which include PHP. By default, templates have PHP disabled (the admin needs to specifically activate PHP for a template).

Conditionals/Control structures

The most significant addition to 3.0.x is conditions or control structures: "if something then do this else do that". The system deployed is very similar to Smarty. This may confuse some people at first, but it offers great potential and great flexibility with a little imagination. In their simplest form these constructs look like this:

<!-- IF expr -->
	markup
<!-- ENDIF -->

expr can take many forms, for example:

<!-- IF loop.S_ROW_COUNT is even -->
	markup
<!-- ENDIF -->

This will output the markup if the S_ROW_COUNT variable in the current iteration of loop is an even value (i.e. the expr is true). You can use various comparison methods (the standard operators as well as the equivalent textual versions noted in square brackets). For better readability, the textual forms (not, or, and, eq, neq, is) should be used where possible:

== [eq]
!= [neq, ne]
<> (same as !=)
!== (not equivalent in value and type)
=== (equivalent in value and type)
> [gt]
< [lt]
>= [gte]
<= [lte]
&& [and]
|| [or]
% [mod]
! [not]
+
-
*
/
,
<< (bitwise shift left)
>> (bitwise shift right)
| (bitwise or)
^ (bitwise xor)
& (bitwise and)
~ (bitwise not)
is (can be used to join comparison operations)

Basic parentheses can also be used to enforce the good old BODMAS rules. Additionally, some basic comparison types are defined:

even
odd
div
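The built-in tests are easiest to read as plain arithmetic. This is a hedged PHP sketch of the intended meaning only, not the code the template compiler generates; the function names are hypothetical, and `div` is assumed to take a divisor, as in the Smarty-style `<!-- IF loop.S_ROW_COUNT is div by 3 -->`.

```php
<?php
// Hypothetical helpers mirroring "is even", "is odd" and "is div by N".
function tpl_is_even($value)
{
	return ($value % 2) == 0;
}

function tpl_is_odd($value)
{
	return ($value % 2) != 0;
}

function tpl_is_div_by($value, $divisor)
{
	// True when $value is an exact multiple of $divisor.
	return ($value % $divisor) == 0;
}
```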

Beyond the simple use of IF you can also do a sequence of comparisons using the following:

<!-- IF expr1 -->
	markup
<!-- ELSEIF expr2 -->
	markup
	.
	.
	.
<!-- ELSEIF exprN -->
	markup
<!-- ELSE -->
	markup
<!-- ENDIF -->

Each statement is tested in turn and the relevant output is generated when (and if) a match is found. ELSEIF is optional; ELSE can be used on its own to match "everything else".

So what can you do with all this? Well take for example the colouration of rows in viewforum. In 2.0.x row colours were predefined within the source as either row color1, row color2 or row class1, row class2. In 3.0.x this is moved to the template, it may look a little daunting at first but remember control flows from top to bottom and it's not too difficult:

<table>
	<!-- IF loop.S_ROW_COUNT is even -->
		<tr class="row1">
	<!-- ELSE -->
		<tr class="row2">
	<!-- ENDIF -->
	<td>HELLO!</td>
</tr>
</table>

This will cause the row cell to be output using class row1 when the row count is even, and class row2 otherwise. The S_ROW_COUNT parameter gets assigned to loops by default. Another example would be the following:

<table>
	<!-- IF loop.S_ROW_COUNT > 10 -->
		<tr bgcolor="#FF0000">
	<!-- ELSEIF loop.S_ROW_COUNT > 5 -->
		<tr bgcolor="#00FF00">
	<!-- ELSEIF loop.S_ROW_COUNT > 2 -->
		<tr bgcolor="#0000FF">
	<!-- ELSE -->
		<tr bgcolor="#FF00FF">
	<!-- ENDIF -->
	<td>hello!</td>
</tr>
</table>

Since S_ROW_COUNT starts at zero, this will output the row cell in purple for the first three rows (S_ROW_COUNT 0 to 2), blue for S_ROW_COUNT 3 to 5, green for 6 to 10 and red for the remainder. This could be used to produce a "nice" gradient effect, for example.
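Tracing the two IF/ELSEIF chains in PHP makes it easy to check which class or colour each row receives. `row_class()` and `row_colour()` are hypothetical names; the sketch assumes S_ROW_COUNT is 0 for the first row, which is why the first row counts as even.

```php
<?php
// Hypothetical re-statement of the two template examples above.
// $row_count plays the role of loop.S_ROW_COUNT (0 for the first row).
function row_class($row_count)
{
	// <!-- IF loop.S_ROW_COUNT is even -->
	return ($row_count % 2 == 0) ? 'row1' : 'row2';
}

function row_colour($row_count)
{
	// Conditions are tested top to bottom; the first match wins.
	if ($row_count > 10)
	{
		return '#FF0000'; // red
	}
	else if ($row_count > 5)
	{
		return '#00FF00'; // green
	}
	else if ($row_count > 2)
	{
		return '#0000FF'; // blue
	}
	return '#FF00FF'; // purple
}

// Print the class and colour chosen for the first twelve rows.
for ($i = 0; $i < 12; $i++)
{
	echo $i, ': ', row_class($i), ' ', row_colour($i), "\n";
}
```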

What else can you do? You could use IF to perform common checks, for example on the login state of a user:

<!-- IF S_USER_LOGGED_IN -->
	markup
<!-- ENDIF -->

This replaces the existing (fudged) method in 2.0.x using a zero length array and BEGIN/END.

Extended syntax for Blocks/Loops

Back to our loops: they have been extended with the following additions. Firstly, you can set the start and end points of the loop. For example:

<!-- BEGIN loopname(2) -->
	markup
<!-- END loopname -->

Will start the loop on the third entry (note that indexes start at zero). Extensions of this are:

loopname(2): Will start the loop on the 3rd entry
loopname(-2): Will start the loop two entries from the end
loopname(3,4): Will start the loop on the fourth entry and end it on the fifth
loopname(3,-4): Will start the loop on the fourth entry and end it four from last
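The range rules can be expressed as a small function that returns the zero-based indexes a loop visits. `loop_range()` is hypothetical; it assumes that a negative start counts back from the end, that the end argument is inclusive, and that a negative end counts back from the last entry (so -1 is the last entry), which matches the four examples above.

```php
<?php
// Hypothetical model of <!-- BEGIN loopname(start[,end]) --> on a loop of $count entries.
// Returns the zero-based indexes that would be iterated.
function loop_range($count, $start, $end = null)
{
	// Negative start: count back from the end (never before the first entry).
	$first = ($start < 0) ? max(0, $count + $start) : min($start, $count);

	// No end given: run to the last entry. Negative end: -1 is the last entry.
	if ($end === null)
	{
		$last = $count - 1;
	}
	else if ($end < 0)
	{
		$last = $count + $end;
	}
	else
	{
		$last = min($end, $count - 1);
	}

	return ($first <= $last) ? range($first, $last) : array();
}
```

For a loop of ten entries, loopname(3,-4) in this model visits indexes 3 to 6, i.e. the fourth entry through the fourth entry from the end.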

A further extension to BEGIN is BEGINELSE:

<!-- BEGIN loop -->
	markup
<!-- BEGINELSE -->
	markup
<!-- END loop -->

This will cause the markup between BEGINELSE and END to be output if the loop contains no values. This is useful for forums with no topics (for example) ... in some ways it replaces "bits of" the existing "switch_" type control (the rest being replaced by conditionals).

Another way of checking whether a loop contains values is to prefix the loop's name with a dot:

<!-- IF .loop -->
	<!-- BEGIN loop -->
		markup
	<!-- END loop -->
<!-- ELSE -->
	markup
<!-- ENDIF -->

You are even able to check the number of items within a loop by comparing it with values within the IF condition:

<!-- IF .loop > 2 -->
	<!-- BEGIN loop -->
		markup
	<!-- END loop -->
<!-- ELSE -->
	markup
<!-- ENDIF -->

When loops are nested, variables and conditionals of an inner loop must be prefixed with all loop names, from the outermost loop to the innermost. An illustration of this:

<!-- BEGIN firstloop -->
	{firstloop.MY_VARIABLE_FROM_FIRSTLOOP}

	<!-- BEGIN secondloop -->
		{firstloop.secondloop.MY_VARIABLE_FROM_SECONDLOOP}
	<!-- END secondloop -->
<!-- END firstloop -->

Sometimes it is necessary to break out of nested loops to be able to call another loop within the current iteration. This sounds a little confusing and it is not used very often. The following (rather complex) example shows this quite well; it also shows how to test for the first and last row in a loop (the example is explained in detail further down):

<!-- BEGIN l_block1 -->
	<!-- IF l_block1.S_SELECTED -->
		<strong>{l_block1.L_TITLE}</strong>
		<!-- IF S_PRIVMSGS -->

			<!-- the ! at the beginning of the loop name forces the loop to be not a nested one of l_block1 -->
			<!-- BEGIN !folder -->
				<!-- IF folder.S_FIRST_ROW -->
					<ul class="nav">
				<!-- ENDIF -->

				<li><a href="{folder.U_FOLDER}">{folder.FOLDER_NAME}</a></li>

				<!-- IF folder.S_LAST_ROW -->
					</ul>
				<!-- ENDIF -->
			<!-- END !folder -->

		<!-- ENDIF -->

		<ul class="nav">
		<!-- BEGIN l_block2 -->
			<li>
				<!-- IF l_block1.l_block2.S_SELECTED -->
					<strong>{l_block1.l_block2.L_TITLE}</strong>
				<!-- ELSE -->
					<a href="{l_block1.l_block2.U_TITLE}">{l_block1.l_block2.L_TITLE}</a>
				<!-- ENDIF -->
			</li>
		<!-- END l_block2 -->
		</ul>
	<!-- ELSE -->
		<a class="nav" href="{l_block1.U_TITLE}">{l_block1.L_TITLE}</a>
	<!-- ENDIF -->
<!-- END l_block1 -->

Let us first concentrate on this part of the example:

<!-- BEGIN l_block1 -->
	<!-- IF l_block1.S_SELECTED -->
		markup
	<!-- ELSE -->
		<a class="nav" href="{l_block1.U_TITLE}">{l_block1.L_TITLE}</a>
	<!-- ENDIF -->
<!-- END l_block1 -->

Here we open the loop l_block1 and do something if the value S_SELECTED within the current loop iteration is true; otherwise we write the block's link and title. Here you see {l_block1.L_TITLE} referenced. Remember that L_* variables automatically get the corresponding language entry assigned? This is true, but not within loops. The L_TITLE variable within the loop l_block1 is assigned within the code itself.

Let's have a closer look at the markup:

<!-- BEGIN l_block1 -->
.
.
	<!-- IF S_PRIVMSGS -->

		<!-- BEGIN !folder -->
			<!-- IF folder.S_FIRST_ROW -->
				<ul class="nav">
			<!-- ENDIF -->

			<li><a href="{folder.U_FOLDER}">{folder.FOLDER_NAME}</a></li>

			<!-- IF folder.S_LAST_ROW -->
				</ul>
			<!-- ENDIF -->
		<!-- END !folder -->

	<!-- ENDIF -->
.
.
<!-- END l_block1 -->

The <!-- IF S_PRIVMSGS --> statement clearly checks a global variable and not one within the loop, since no loop name is given. So, if S_PRIVMSGS is true we execute the markup shown. Next comes the <!-- BEGIN !folder --> statement. The exclamation mark instructs the template engine to iterate through the main (top-level) loop folder. So we are now within the loop folder; with <!-- BEGIN folder --> we would automatically have been within the loop l_block1.folder, as is the case with l_block2:

<!-- BEGIN l_block1 -->
.
.
	<ul class="nav">
	<!-- BEGIN l_block2 -->
		<li>
			<!-- IF l_block1.l_block2.S_SELECTED -->
				<strong>{l_block1.l_block2.L_TITLE}</strong>
			<!-- ELSE -->
				<a href="{l_block1.l_block2.U_TITLE}">{l_block1.l_block2.L_TITLE}</a>
			<!-- ENDIF -->
		</li>
	<!-- END l_block2 -->
	</ul>
.
.
<!-- END l_block1 -->

You see the difference? The loop l_block2 is a member of the loop l_block1 but the loop folder is a main loop.

Now back to our folder loop:

<!-- IF folder.S_FIRST_ROW -->
	<ul class="nav">
<!-- ENDIF -->

<li><a href="{folder.U_FOLDER}">{folder.FOLDER_NAME}</a></li>

<!-- IF folder.S_LAST_ROW -->
	</ul>
<!-- ENDIF -->

You may have wondered what the comparison to S_FIRST_ROW and S_LAST_ROW is about. If you haven't guessed already: S_FIRST_ROW is true on the first iteration of the loop and S_LAST_ROW on the last. This often comes in handy when you want to open or close design elements, like the above list. Imagine a folder loop built with three iterations; the output would be:

<ul class="nav"> <!-- written on first iteration -->
	<li>first element</li> <!-- written on first iteration -->
	<li>second element</li> <!-- written on second iteration -->
	<li>third element</li> <!-- written on third iteration -->
</ul> <!-- written on third iteration -->

As you can see, all three elements are written down as well as the markup for the first iteration and the last one. Sometimes you want to omit writing the general markup - for example:

<!-- IF folder.S_FIRST_ROW -->
	<ul class="nav">
<!-- ELSEIF folder.S_LAST_ROW -->
	</ul>
<!-- ELSE -->
	<li><a href="{folder.U_FOLDER}">{folder.FOLDER_NAME}</a></li>
<!-- ENDIF -->

would result in the following markup:

<ul class="nav"> <!-- written on first iteration -->
	<li>second element</li> <!-- written on second iteration -->
</ul> <!-- written on third iteration -->

Always remember that processing takes place from top to bottom.
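To see the top-to-bottom processing in action, this PHP sketch replays the first folder example: it emits the opening <ul> on the first iteration and the closing </ul> on the last. `render_folder_list()` and the two flags computed inside it are a hypothetical stand-in for what the template engine provides as S_FIRST_ROW and S_LAST_ROW.

```php
<?php
// Hypothetical replay of the folder loop: $folders is a list of folder names.
function render_folder_list(array $folders)
{
	$html = '';
	$total = sizeof($folders);

	foreach (array_values($folders) as $i => $name)
	{
		$first_row = ($i == 0); // S_FIRST_ROW
		$last_row = ($i == $total - 1); // S_LAST_ROW

		if ($first_row)
		{
			$html .= '<ul class="nav">';
		}

		$html .= '<li>' . $name . '</li>';

		if ($last_row)
		{
			$html .= '</ul>';
		}
	}

	return $html;
}
```

Note that a loop with a single entry is both the first and the last row, so the list is still opened and closed correctly.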

Forms

If a form is used for a non-trivial operation (i.e. more than a jumpbox), then it should include the {S_FORM_TOKEN} template variable.

<form method="post" id="mcp" action="{U_POST_ACTION}">

	<fieldset class="submit-buttons">
		<input type="reset" value="{L_RESET}" name="reset" class="button2" /> 
		<input type="submit" name="action[add_warning]" value="{L_SUBMIT}" class="button1" />
	</fieldset>
	{S_FORM_TOKEN}
</form>
		


5. Character Sets and Encodings

What are Unicode, UCS and UTF-8?

The Universal Character Set (UCS) described in ISO/IEC 10646 consists of a large number of characters. Each of them has a unique name and a code point, which is an integer number. Unicode, an industry standard, complements the Universal Character Set with further information about the characters' properties and alternative character encodings. More information on Unicode can be found on the Unicode Consortium's website. One of the Unicode encodings is the 8-bit Unicode Transformation Format (UTF-8). It encodes characters with up to four bytes, aiming for maximum compatibility with the American Standard Code for Information Interchange (ASCII), which is a 7-bit encoding of a relatively small subset of the UCS.

phpBB's use of Unicode

Unfortunately PHP does not natively support Unicode strings (this was planned for PHP 6, which was never released). Most functions simply treat strings as sequences of bytes, assuming that each character takes up exactly one byte. This behaviour still allows UTF-8 encoded text to be stored in PHP strings, but many operations on strings have unexpected results. To circumvent this problem we have created some alternatives to PHP's native string functions which operate on code points instead of bytes. These functions can be found in /includes/utf/utf_tools.php. They are also covered in the phpBB3 Sourcecode Documentation. A lot of native PHP functions still work with UTF-8 as long as you stick to certain restrictions. For example, explode() still works as long as the first and the last character of the delimiter string are ASCII characters.
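The difference between bytes and code points is visible with plain PHP. This sketch uses only strlen(), preg_match_all() with the /u (UTF-8) modifier and explode(); it is not one of the utf_tools.php functions, and the strings are written with byte escapes so the example does not depend on the encoding of the file it is saved in.

```php
<?php
// "Käse" in UTF-8: the "ä" (U+00E4) takes two bytes, 0xC3 0xA4.
$word = "K\xC3\xA4se";

// strlen() counts bytes, not characters.
$bytes = strlen($word);

// With the /u modifier, "." matches one whole UTF-8 character.
$characters = preg_match_all('/./u', $word);

// explode() is safe here because the delimiter ", " consists of ASCII characters.
$parts = explode(', ', "K\xC3\xA4se, Br\xC3\xB6tchen");

echo $bytes, ' bytes, ', $characters, " characters\n";
```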

phpBB only uses the ASCII and UTF-8 character encodings. All strings are nonetheless UTF-8 encoded, because ASCII is a subset of UTF-8. The only exceptions to this rule are code sections which deal with external systems that use other encodings and character sets. Such external data should be converted to UTF-8 using the utf8_recode() function supplied with phpBB. It supports a variety of other character sets and encodings; a full list can be found below.

With request_var() you can either allow all UCS characters in user input or restrict user input to ASCII characters. This feature is controlled by the function's third parameter called $multibyte. You should allow multibyte characters in posts, PMs, topic titles, forum names, etc. but it's not necessary for internal uses like a $mode variable which should only hold a predefined list of ASCII strings anyway.

// an input string containing a multibyte character
$_REQUEST['multibyte_string'] = 'Käse';

// print request variable as a UTF-8 string allowing multibyte characters
echo request_var('multibyte_string', '', true);
// print request variable as ASCII string
echo request_var('multibyte_string', '');

This code snippet will generate the following output:

Käse
K??se

Unicode Normalization

If you retrieve user input with multibyte characters you should additionally normalize the string using utf8_normalize_nfc() before you work with it. This is necessary to make sure that equal characters can only occur in one particular binary representation. For example the character Å can be represented either as U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE) or as U+212B (ANGSTROM SIGN). phpBB uses Normalization Form Canonical Composition (NFC) for all text. So the correct version of the above example would look like this:

$_REQUEST['multibyte_string'] = 'Käse';

// normalize multibyte strings
echo utf8_normalize_nfc(request_var('multibyte_string', '', true));
// ASCII strings do not need to be normalized
echo request_var('multibyte_string', '');
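Why normalisation matters can be checked directly: the two representations of Å mentioned above are different byte sequences, so a plain comparison fails. This sketch uses PHP's Normalizer class from the intl extension as a stand-in for phpBB's utf8_normalize_nfc(); the intl extension is an assumption and may not be installed, so the code checks for it first.

```php
<?php
// U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE, as UTF-8 bytes.
$a_ring = "\xC3\x85";
// U+212B ANGSTROM SIGN, as UTF-8 bytes.
$angstrom = "\xE2\x84\xAB";

// Without normalisation the strings compare as different.
$equal_raw = ($a_ring === $angstrom);

// With the intl extension (if present), NFC maps U+212B to U+00C5.
$equal_nfc = null;
if (class_exists('Normalizer'))
{
	$equal_nfc = (Normalizer::normalize($angstrom, Normalizer::FORM_C) === $a_ring);
}

var_dump($equal_raw, $equal_nfc);
```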

Case Folding

Case insensitive comparison of strings is no longer possible with strtolower or strtoupper as some characters have multiple lower case or multiple upper case forms depending on their position in a word. The utf8_strtolower and the utf8_strtoupper functions suffer from the same problem so they can only be used to display upper/lower case versions of a string but they cannot be used for case insensitive comparisons either. So instead you should use case folding which gives you a case insensitive version of the string which can be used for case insensitive comparisons. An NFC normalized string can be case folded using utf8_case_fold_nfc().

// Bad - The strings might be the same even if strtolower differs

if (strtolower($string1) == strtolower($string2))
{
	echo '$string1 and $string2 are equal or differ in case';
}

// Good - Case folding is really case insensitive

if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
{
	echo '$string1 and $string2 are equal or differ in case';
}

Confusables Detection

phpBB offers a special method, utf8_clean_string(), which can be used to make sure string identifiers are unique. This method uses Normalization Form Compatibility Composition (NFKC) instead of NFC and replaces similar-looking characters with a particular representative of their equivalence class. It is currently used for usernames and group names to avoid confusion between similar-looking names.


6. Translation (i18n/L10n) Guidelines

6.i. Standardisation

Reason:

phpBB is one of the most translated open-source projects, with the current stable version being available in over 60 localisations. Whilst the ad hoc approach to the naming of language packs has worked, for phpBB3 and beyond we hope to make this process saner which will allow for better interoperation with current and future web browsers.

Encoding:

With phpBB3, the output encoding for the forum is now UTF-8, a Universal Character Encoding by the Unicode Consortium whose character repertoire is by design a superset of US-ASCII and ISO-8859-1. By using one character set which simultaneously supports all scripts that previously would have required different encodings (eg: ISO-8859-1 to ISO-8859-15 (Latin, Greek, Cyrillic, Thai, Hebrew, Arabic); GB2312 (Simplified Chinese); Big5 (Traditional Chinese), EUC-JP (Japanese), EUC-KR (Korean), VISCII (Vietnamese); et cetera), the need to convert between encodings is removed and the accessibility of multilingual forums is improved.

The impact is that the language files for phpBB must now also be encoded as UTF-8, with the caveat that the files must not contain a BOM, for compatibility with non-Unicode-aware versions of PHP. For those with forums using the Latin character set (ie: most European languages), this change is transparent, since UTF-8 can represent every character of US-ASCII and ISO-8859-1.

Language Tag:

The IETF published RFC 4646 for tags used to identify languages, which in combination with RFC 4647 obsoletes the older RFC 3066 and the older-still RFC 1766. RFC 4646 uses ISO 639-1/ISO 639-2, ISO 3166-1 alpha-2, ISO 15924 and UN M.49 to define a language tag. Each complete tag is composed of subtags which are not case sensitive and can also be empty.

When all subtags are non-empty, their order is: language-script-region-variant-extension-privateuse. Should any subtag be empty, its corresponding hyphen is also omitted. Thus, the language tag for English is en and not en-----.
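The ordering and hyphen rule can be sketched as follows. `build_language_tag()` is a hypothetical helper for illustration only; it simply joins the non-empty subtags in order and does not validate them against the IANA Language Subtag Registry.

```php
<?php
// Hypothetical helper: joins non-empty subtags in RFC 4646 order
// (language-script-region-variant-extension-privateuse).
function build_language_tag($language, $script = '', $region = '', $variant = '', $extension = '', $privateuse = '')
{
	$subtags = array($language, $script, $region, $variant, $extension, $privateuse);

	// Drop empty subtags so that their hyphens are omitted as well.
	$subtags = array_filter($subtags, 'strlen');

	return implode('-', $subtags);
}
```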

Most language tags consist of a two- or three-letter language subtag (from ISO 639-1/ISO 639-2). Sometimes, this is followed by a two-letter or three-digit region subtag (from ISO 3166-1 alpha-2 or UN M.49). Some examples are:

Language tag examples
Language tag | Description | Component subtags
en | English | language
mas | Masai | language
fr-CA | French as used in Canada | language+region
en-833 | English as used in the Isle of Man | language+region
zh-Hans | Chinese written with Simplified script | language+script
zh-Hant-HK | Chinese written with Traditional script as used in Hong Kong | language+script+region
de-AT-1996 | German as used in Austria with 1996 orthography | language+region+variant

The ultimate aim of a language tag is to convey the useful distinguishing information needed, whilst keeping it as short as possible. So, for example, use en, fr and ja as opposed to en-GB, fr-FR and ja-JP, since we know English, French and Japanese are the native languages of Great Britain, France and Japan respectively.

Next is the ISO 15924 script code and when one should or shouldn't use it. For example, whilst en-Latn is syntactically correct for describing English written in Latin script, real-world English writing is more or less exclusively in the Latin script. For languages like English that are written in a single script, the IANA Language Subtag Registry has a "Suppress-Script" field, meaning the script code should be omitted unless a specific language tag requires a specific script code. Some languages are written in more than one script, and in such cases the script code is encouraged, since an end-user may be able to read their language in one script but not the other. Some examples are:

Language subtag + script subtag examples
Language tag | Description | Component subtags
en-Brai | English written in Braille script | language+script
en-Dsrt | English written in Deseret (Mormon) script | language+script
sr-Latn | Serbian written in Latin script | language+script
sr-Cyrl | Serbian written in Cyrillic script | language+script
mn-Mong | Mongolian written in Mongolian script | language+script
mn-Cyrl | Mongolian written in Cyrillic script | language+script
mn-Phag | Mongolian written in Phags-pa script | language+script
az-Cyrl-AZ | Azerbaijani written in Cyrillic script as used in Azerbaijan | language+script+region
az-Latn-AZ | Azerbaijani written in Latin script as used in Azerbaijan | language+script+region
az-Arab-IR | Azerbaijani written in Arabic script as used in Iran | language+script+region

The three-digit UN M.49 code should be used instead of the two-letter ISO 3166-1 alpha-2 code if a macro-geographical entity is required and/or the ISO 3166-1 alpha-2 code is ambiguous.

Examples of English using macro-geographical regions:

Coding for English using macro-geographical regions
ISO 639-1/ISO 639-2 + ISO 3166-1 alpha-2 | ISO 639-1/ISO 639-2 + UN M.49 (example macro regions)
en-AU (English as used in Australia) | en-053 (English as used in Australia & New Zealand), en-009 (English as used in Oceania)
en-NZ (English as used in New Zealand) | en-053 (English as used in Australia & New Zealand), en-009 (English as used in Oceania)
en-FJ (English as used in Fiji) | en-054 (English as used in Melanesia), en-009 (English as used in Oceania)

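Examples of Spanish using macro-geographical regions:

Coding for Spanish macro-geographical regions
ISO 639-1/ISO 639-2 + ISO 3166-1 alpha-2 | ISO 639-1/ISO 639-2 + UN M.49 (example macro regions)
es-PR (Spanish as used in Puerto Rico) | es-419 (Spanish as used in Latin America & the Caribbean), es-019 (Spanish as used in the Americas)
es-HN (Spanish as used in Honduras) | es-419 (Spanish as used in Latin America & the Caribbean), es-019 (Spanish as used in the Americas)
es-AR (Spanish as used in Argentina) | es-419 (Spanish as used in Latin America & the Caribbean), es-019 (Spanish as used in the Americas)
es-US (Spanish as used in the United States of America) | es-021 (Spanish as used in Northern America), es-019 (Spanish as used in the Americas)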

Example of where the ISO 3166-1 alpha-2 code is ambiguous and why UN M.49 might be preferred:

Coding for ambiguous ISO 3166-1 alpha-2 regions
CS assignment pre-1994:
CS Czechoslovakia (ISO 3166-1) | 200 Czechoslovakia (UN M.49)
CZ Czech Republic (ISO 3166-1) | 203 Czech Republic (UN M.49)
SK Slovakia (ISO 3166-1) | 703 Slovakia (UN M.49)
CS assignment post-1994:
CS Serbia & Montenegro (ISO 3166-1) | 891 Serbia & Montenegro (UN M.49)
RS Serbia (ISO 3166-1) | 688 Serbia (UN M.49)
ME Montenegro (ISO 3166-1) | 499 Montenegro (UN M.49)

Macro-languages & Topolects:

RFC 4646 anticipates features which shall be available in the (currently draft) ISO 639-3, which aims to provide as complete an enumeration of languages as possible, including living, extinct, ancient and constructed languages, whether major, minor or unwritten. A new feature of ISO 639-3 compared to the previous two revisions is the concept of macrolanguages, of which Arabic and Chinese are two examples. In such cases, their respective codes ar and zh are very vague as to which dialect/topolect is used, or may even denote some terse classical variant which is difficult for all but very educated users. For such macrolanguages, it is recommended that the sub-language tag is used as a suffix to the macrolanguage tag, eg:

Macrolanguage subtag + sub-language subtag examples
Language tag | Description | Component subtags
zh-cmn | Mandarin (Putonghua/Guoyu) Chinese | macrolanguage+sublanguage
zh-yue | Yue (Cantonese) Chinese | macrolanguage+sublanguage
zh-cmn-Hans | Mandarin (Putonghua/Guoyu) Chinese written in Simplified script | macrolanguage+sublanguage+script
zh-cmn-Hant | Mandarin (Putonghua/Guoyu) Chinese written in Traditional script | macrolanguage+sublanguage+script
zh-nan-Latn-TW | Minnan (Hoklo) Chinese written in Latin script (POJ Romanisation) as used in Taiwan | macrolanguage+sublanguage+script+region

6.ii. Other considerations

Normalisation of language tags for phpBB:

For phpBB, the language tags are not used in their raw form and instead converted to all lower-case and have the hyphen - replaced with an underscore _ where appropriate, with some examples below:

Language tag normalisation examples
Raw language tag | Description | Value of USER_LANG in ./common.php | Language pack directory name in /language/
en | British English | en | en
de-AT | German as used in Austria | de-at | de_at
es-419 | Spanish as used in Latin America & Caribbean | es-419 | es_419
zh-yue-Hant-HK | Cantonese written in Traditional script as used in Hong Kong | zh-yue-hant-hk | zh_yue_hant_hk
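The conversion in the table can be sketched in a few lines of PHP. `normalise_language_tag()` is a hypothetical name; the sketch only lower-cases the tag and swaps hyphens for underscores, as described above, and performs no validation of the tag itself.

```php
<?php
// Hypothetical helper: returns the USER_LANG value and the /language/ directory name.
function normalise_language_tag($raw_tag)
{
	// USER_LANG keeps the hyphens, e.g. "de-at".
	$user_lang = strtolower($raw_tag);

	// The directory name uses underscores instead, e.g. "de_at".
	$directory = str_replace('-', '_', $user_lang);

	return array($user_lang, $directory);
}
```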

How to use iso.txt:

The iso.txt file is a small UTF-8 encoded plain-text file which consists of three lines:

  1. Language's English name
  2. Language's local name
  3. Author information

iso.txt is automatically generated by the language pack submission system on phpBB.com. You don't have to create this file yourself if you plan on releasing your language pack on phpBB.com, but do keep in mind that phpBB itself does require this file to be present.
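Reading the three lines back is straightforward. `parse_iso_txt()` is a hypothetical helper, not phpBB's own loader; it accepts LF or CRLF line endings and returns false if fewer than three lines are present.

```php
<?php
// Hypothetical parser for the three-line iso.txt format described above.
function parse_iso_txt($contents)
{
	// Accept both LF and CRLF line endings and ignore trailing blank lines.
	$lines = preg_split('/\r?\n/', rtrim($contents));

	if (sizeof($lines) < 3)
	{
		return false;
	}

	return array(
		'english_name'	=> trim($lines[0]),
		'local_name'	=> trim($lines[1]),
		'author'		=> trim($lines[2]),
	);
}
```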

Because language tags are meant to be machine-read, they can be rather obtuse to humans, which is why descriptive strings such as those provided by iso.txt are needed. Whilst en-US could fairly easily be deduced to be "English as used in the United States", de-CH is more difficult unless one happens to know that de is from "Deutsch", German for "German", and CH is the abbreviation of the official Latin name for Switzerland, "Confoederatio Helvetica".

For the English language description, the language name is always first and any additional attributes required to describe the subtags within the language code are then listed in order separated with commas and enclosed within parentheses, eg:

English language description examples for iso.txt
Raw language tag | English description within iso.txt
en | British English
en-US | English (United States)
en-053 | English (Australia & New Zealand)
de | German
de-CH-1996 | German (Switzerland, 1996 orthography)
gws-1996 | Swiss German (1996 orthography)
zh-cmn-Hans-CN | Mandarin Chinese (Simplified, Mainland China)
zh-yue-Hant-HK | Cantonese Chinese (Traditional, Hong Kong)

For the localised language description, just translate the English version, using whatever punctuation is typical for your own locale (assuming the language uses punctuation at all).

Unicode bi-directional considerations:

Because phpBB is now UTF-8, all translators must take into account that certain strings may be shown when the directionality of the document is either opposite to normal or is ambiguous.

The various Unicode control characters for bi-directional text and their HTML equivalents, where appropriate, are as follows:

Unicode bidirectional control characters & HTML elements/entities
Unicode character abbreviation | Unicode code-point | Unicode character name | Equivalent HTML markup/entity | Raw character (enclosed between '')
LRM U+200E Left-to-Right Mark &lrm; '‎'
RLM U+200F Right-to-Left Mark &rlm; '‏'
LRE U+202A Left-to-Right Embedding dir="ltr" '‪'
RLE U+202B Right-to-Left Embedding dir="rtl" '‫'
PDF U+202C Pop Directional Formatting </bdo> '‬'
LRO U+202D Left-to-Right Override <bdo dir="ltr"> '‭'
RLO U+202E Right-to-Left Override <bdo dir="rtl"> '‮'

For iso.txt, the directionality of the text can be set explicitly with special Unicode characters, using any of the three methods: left-to-right/right-to-left marks, embeddings or overrides. Without them, the ordering of characters will be incorrect, eg:

Unicode bidirectional control characters iso.txt
Directionality | Raw character view | Display of localised description in iso.txt | Ordering
dir="ltr" English (Australia & New Zealand) English (Australia & New Zealand) Correct
dir="rtl" English (Australia & New Zealand) English (Australia & New Zealand) Incorrect
dir="rtl" with LRM English (Australia & New Zealand)U+200E English (Australia & New Zealand)‎ Correct
dir="rtl" with LRE & PDF U+202AEnglish (Australia & New Zealand)U+202C ‪English (Australia & New Zealand)‬ Correct
dir="rtl" with LRO & PDF U+202DEnglish (Australia & New Zealand)U+202C ‭English (Australia & New Zealand)‬ Correct

When choosing between the three methods: in the majority of cases it is sufficient to use an LRM or RLM, which places a "strong" character next to an ambiguous punctuation character so that it inherits the correct directionality.

In some cases, there may be mixed scripts of left-to-right and right-to-left direction, so using LRE or RLE with PDF may be more appropriate. Lastly, in very rare instances where directionality must be forced, use LRO or RLO with PDF.
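In PHP source the control characters can be written with the \u{...} escape, which keeps them visible in the code instead of hiding them as raw invisible characters; this escape requires PHP 7.0 or later, an assumption about the PHP version in use. `ltr_with_lrm()` and `ltr_embed()` are hypothetical helpers showing the LRM approach and the LRE + PDF approach described above.

```php
<?php
// Hypothetical helpers for marking an English description inside RTL text.
// LRM (U+200E): a strong left-to-right character placed after trailing punctuation.
function ltr_with_lrm($text)
{
	return $text . "\u{200E}";
}

// LRE (U+202A) ... PDF (U+202C): embed the whole string as left-to-right.
function ltr_embed($text)
{
	return "\u{202A}" . $text . "\u{202C}";
}

$description = 'English (Australia & New Zealand)';
$with_lrm = ltr_with_lrm($description);
$embedded = ltr_embed($description);
```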

For further information on authoring techniques of bi-directional text, please see the W3C tutorial on authoring techniques for XHTML pages with bi-directional text.

Working with placeholders:

As phpBB is translated into languages whose word-ordering rules differ from those of English, placeholder values can be shown in whatever order is appropriate. Take the extremely simple "Page X of Y" as an example. In English this could simply be coded as:

	...
'PAGE_OF'	=>	'Page %s of %s',
		/* Just grabbing the replacements as they
		come and hoping they are in the right order */
	...
	

… a clearer way, which makes the order of the replacements explicit, is:

	...
'PAGE_OF'	=>	'Page %1$s of %2$s',
		/* Explicit ordering of the replacements,
		even if they are the same order as English */
	...
	

Why bother at all? Because in some languages, the string translated back into English might read something like:

	...
'PAGE_OF'	=>	'Total of %2$s pages, currently on page %1$s',
		/* Explicit ordering of the replacements,
		reversed compared to English as the total comes first */
	...
	
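The effect of numbered placeholders can be checked with PHP's sprintf(), which supports argument swapping through the %n$s syntax. In this sketch the two arrays stand in for two language packs (the second language is hypothetical); the calling code passes the current page and the total in the same order in both cases, and only the language string decides where each value appears.

```php
<?php
// Two hypothetical language entries for the same key, one per language pack.
// Single quotes keep PHP from treating "$s" as a variable.
$lang_en = array('PAGE_OF' => 'Page %1$s of %2$s');
$lang_xx = array('PAGE_OF' => 'Total of %2$s pages, currently on page %1$s');

// The calling code always passes the arguments in the same order:
// current page first, total number of pages second.
$on_page = 3;
$total_pages = 10;

echo sprintf($lang_en['PAGE_OF'], $on_page, $total_pages), "\n"; // Page 3 of 10
echo sprintf($lang_xx['PAGE_OF'], $on_page, $total_pages), "\n"; // Total of 10 pages, currently on page 3
```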

6.iii. Writing Style

Miscellaneous tips & hints:

As the language files are PHP files in which phpBB's strings are stored in an array, and those strings are in turn displayed within an HTML page, the syntax rules of both PHP and HTML must be respected. Potentially problematic characters are: ' (straight quote/apostrophe), " (straight double quote), < (less-than sign), > (greater-than sign) and & (ampersand).

// Bad - The unescaped straight quote/apostrophe will throw a PHP parse error

	...
'CONV_ERROR_NO_AVATAR_PATH'
	=>	'Note to developer: you must specify $convertor['avatar_path'] to use %s.',
	...
	

// Good - Literal straight quotes should be escaped with a backslash, i.e. \

	...
'CONV_ERROR_NO_AVATAR_PATH'
	=>	'Note to developer: you must specify $convertor[\'avatar_path\'] to use %s.',
	...
	

However, because phpBB3 now uses UTF-8 as its sole encoding, we can use this to our advantage: by using typographic characters instead of straight quotes, there is nothing to remember to escape:

// Bad - The unescaped straight quote/apostrophe will throw a PHP parse error

	...
'USE_PERMISSIONS'	=>	'Test out user's permissions',
	...
	

// Okay - However, non-programmers wouldn't type "user\'s" automatically

	...
'USE_PERMISSIONS'	=>	'Test out user\'s permissions',
	...
	

// Best - Use the Unicode Right Single Quotation Mark character (U+2019)

	...
'USE_PERMISSIONS'	=>	'Test out user’s permissions',
	...
	
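A short sketch of why both the escaped and the typographic forms work inside a single-quoted PHP string, assuming the language file is saved as UTF-8:

```php
<?php
// Escaped straight apostrophe: valid PHP, but easy for non-programmers to forget.
$escaped = 'Test out user\'s permissions';

// U+2019 RIGHT SINGLE QUOTATION MARK: not special to PHP, so no escaping is needed.
$typographic = 'Test out user’s permissions';

// The backslash is not part of the resulting string.
echo $escaped, "\n"; // Test out user's permissions

// In UTF-8 the typographic apostrophe is stored as three bytes.
echo bin2hex('’'), "\n"; // e28099
```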

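The " (straight double quote), < (less-than sign) and > (greater-than sign) characters can be used either as displayed glyphs or as part of HTML markup. When they are displayed glyphs they must be entitised, or, for quotes, replaced with typographic quotation marks, for example: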

// Bad - Invalid HTML, as characters that are not part of the markup are not entitised

	...
'FOO_BAR'	=>	'PHP version < 4.3.3.<br />
	Visit "Downloads" at <a href="http://www.php.net/">www.php.net</a>.',
	...
	

// Okay - No more invalid HTML, but "&quot;" is rather clumsy

	...
'FOO_BAR'	=>	'PHP version &lt; 4.3.3.<br />
	Visit &quot;Downloads&quot; at <a href="http://www.php.net/">www.php.net</a>.',
	...
	

// Best - No more invalid HTML, and usage of correct typographical quotation marks

	...
'FOO_BAR'	=>	'PHP version &lt; 4.3.3.<br />
	Visit “Downloads” at <a href="http://www.php.net/">www.php.net</a>.',
	...
	
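PHP's htmlspecialchars() shows which of these characters need entitising when they are displayed glyphs. This sketch runs it only on the plain-text segments of the example: the full language string also contains intended markup such as <br /> and <a>, which must stay intact, so in the language file the entities are written by hand. Flags and charset are passed explicitly so the result does not depend on the PHP version's defaults.

```php
<?php
$flags = ENT_QUOTES | ENT_HTML401;

// "<" and straight double quotes are converted into entities...
$version           = htmlspecialchars('PHP version < 4.3.3.', $flags, 'UTF-8');
$visit_straight    = htmlspecialchars('Visit "Downloads"', $flags, 'UTF-8');

// ...whereas typographic quotation marks (U+201C, U+201D) are left untouched.
$visit_typographic = htmlspecialchars('Visit “Downloads”', $flags, 'UTF-8');

echo $version, "\n";           // PHP version &lt; 4.3.3.
echo $visit_straight, "\n";    // Visit &quot;Downloads&quot;
echo $visit_typographic, "\n"; // Visit “Downloads”
```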

Lastly, the & (ampersand) must always be entitised regardless of where it is used:

// Bad - Invalid HTML, none of the ampersands are entitised

	...
'FOO_BAR'	=>	'<a href="http://somedomain.tld/?foo=1&bar=2">Foo & Bar</a>.',
	...
	

// Good - Valid HTML, ampersands are correctly entitised in all cases

	...
'FOO_BAR'	=>	'<a href="http://somedomain.tld/?foo=1&amp;bar=2">Foo &amp; Bar</a>.',
	...
	
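As a sketch, the "Good" string above can be reproduced from plain values with PHP's htmlspecialchars() for the link text and http_build_query(), whose third argument sets the separator placed between query pairs, for the URL. The domain and parameters are the example's own placeholders.

```php
<?php
$flags = ENT_QUOTES | ENT_HTML401;

// The ampersand in displayed text becomes &amp;.
$link_text = htmlspecialchars('Foo & Bar', $flags, 'UTF-8');

// The ampersand inside the href attribute must be entitised as well.
$query = http_build_query(array('foo' => 1, 'bar' => 2), '', '&amp;');

$html = '<a href="http://somedomain.tld/?' . $query . '">' . $link_text . '</a>.';

echo $html, "\n"; // <a href="http://somedomain.tld/?foo=1&amp;bar=2">Foo &amp; Bar</a>.
```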

How these characters are entered depends very much on the operating system, the current language locale/keyboard configuration and the native abilities of the text editor used to edit phpBB language files. Please see http://en.wikipedia.org/wiki/Unicode#Input_methods for more information.

Spelling, punctuation, grammar, et cetera:

The default language pack bundled with phpBB is British English using Cambridge University Press spelling and is assigned the language code en. The style and tone of writing tend towards the formal, and translations should emulate this style, at least for the variant using the most compact language code. Less formal translations, or those with colloquialisms, must be denoted as such via either an extension or a privateuse tag within their language code.


7. Changelog

Changes since revision 1.31

  • Added descriptions of add_form_key and check_form_key.

Changes since revision 1.24

Changes since revision 1.16

Changes since revisions 1.11-1.15

  • Various grammar and punctuation fixes. Fixed formatting.

Changes since revisions 1.9-1.10

Changes since revision 1.8

Changes since revision 1.5


8. Copyright and disclaimer

This application is open-source software released under the GPL. For more information, see the source code and the docs directory. This package and its contents are Copyright (c) 2000, 2002, 2005, 2007 phpBB Group. All rights reserved.
This document was translated into Russian by Xpert and VVVas.