Globalization, Localization, Internationalization and Translation

alignment
CAT process that associates segments of text in the source and target languages of a translated document in order to build translation memories.
attribute - 屬性
big5 - 大五
most common character set used in Taiwan, over 20,000 characters, including 13,053 Chinese characters
bit - 位元
byte - 位元組
CAT - 電腦輔助翻譯, 計算機輔助翻譯
Computer Assisted Translation - Translation using software which assists the translator with consistency, speed, or project management. Translation memories and lexicons created from previous translations with CAT tools may be leveraged through the CAT's translator interface. CAT tools consist of tools to build, manage and use translation memories, terminology lexicons and translation projects.
character - 字元
charset, character set - 字集, 編碼, 文字集合
a (standard) collection of characters, e.g. iso8859, ISO-8859-1, Unicode, UTF-8, UTF-16
Chinese - 中文, 漢語, 華文
languages used in China, Taiwan, Hong Kong, Singapore and other Chinese locales
CJK, CJKV
abbreviation for the major languages of Asia (Chinese, Japanese, Korean, and Vietnamese), which have large character sets requiring specialized computer processing
computerization - 電腦化
content - 內容
Content Management System - 內容管理系統
cultural bias - 文化成見
cultural taboos - 注意文化禁忌
colors and cultural nuances - 顏色和文化上的細微差異
data - 資料
data markup - 資料標記
database - 資料庫
delimiter - 定界符
dialect - 方言
Putonghua - 普通話 or Guoyu - 國語, Taiwanese - 台語 or Minnanyu - 閩南語, Cantonese - 粵語, Shanghainese, Hakka - 客語...
document - 文件
document format - 文件格式
document type definition - 文件型別定義
document type declaration - 文件型別宣告
encoding, character encoding - 字集碼, 編碼, 編碼方式, 字符集編碼, 符號化
method or standard defining the correspondence between code representation (for computer use) and printable or displayable glyphs
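As a minimal sketch of the definition above, the following Python snippet encodes the same two characters under several encodings named in this glossary. The helper name encode_all is hypothetical; the codec names are those shipped with the Python standard library.

```python
def encode_all(text, codecs=("big5", "gb2312", "utf-8", "utf-16-be")):
    """Return {codec name: encoded bytes} for the same text."""
    return {codec: text.encode(codec) for codec in codecs}

for codec, raw in encode_all("中文").items():
    # The characters are the same in every case; only the code representation differs.
    print(f"{codec:10} {len(raw)} bytes  {raw.hex()}")
    # Decoding with the matching codec recovers the original text.
    assert raw.decode(codec) == "中文"
```

Decoding bytes with the wrong codec is the usual cause of garbled Chinese text, which is why a document's encoding has to be declared or detected before it is processed.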
element - 元素
entity - 實體
font - 字型
styled glyphs for a character set. Ming - 細明體 in Windows, Kai - 標楷體 in Taiwan and SimSun - 簡宋 in China.
GB - 國標
character encoding used by China. GB is an abbreviation for "GuoBiao", meaning "national standard".
GB2312
character set representing 7,445 characters, including 6,763 simplified Chinese characters. GB2312 may be encoded with ISO-2022 or EUC-CN, and is a subset of GBK
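The difference in coverage between GB2312 and its extension GBK can be checked with a short Python sketch. The helper can_encode is a hypothetical name; the example characters (simplified 体 and traditional 體) were chosen for illustration.

```python
def can_encode(text: str, codec: str) -> bool:
    """Report whether every character in text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

simplified, traditional = "体", "體"
for codec in ("gb2312", "gbk"):
    # GB2312 only covers the simplified form; GBK also covers the traditional one.
    print(codec, can_encode(simplified, codec), can_encode(traditional, codec))
```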
GBK - 國標擴展
"extended" GB character set, adding 14,240 Chinese characters, including traditional forms, to GB2312. Used by Simplified Chinese Microsoft Windows 95 and 98
globalization - 全球化
addresses the business issues associated with taking a product global. In the globalization of high-tech products this involves integrating localization throughout a company, after proper internationalization and product design, as well as marketing, sales, and support in the world market. Abbreviated: G11N
glyph - 字形
a specific visual form of a character, as drawn in a particular font
HTML - Hypertext Markup Language
Internationalization - 國際化
The process of generalizing or designing a product to be as culturally and technically "neutral" as possible, allowing easy localization for a specific linguistic or cultural locale (LISA). An internationalized document can be easily adapted to a range of languages, cultures, technical or political locales without the need for re-design. Abbreviated: I18N
IME - 輸入法
input method allowing input from a large character set using a limited keyboard
implementation - 實用化
ISO - International Organization for Standardization - 國際標準化組織
keyword - 關鍵字
language - 語言, 語文
lexicon, glossary, terminology database - 辭庫, 詞彙庫
list of corresponding terminology in different languages, usually locale, industry or project specific
LISA - Localization Industry Standards Association
localization - 當地化 (Taiwan), 地方化 (China), 本土化 (originally), 區域化, 地區化, 地域化
the process of adapting a product or service to a linguistic and cultural locale
locale - 區域環境, 地區設定
A discrete linguistic or cultural environment. A group of people with a common language and culture. For example, "zh-tw" indicates Chinese used in Taiwan.
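A locale tag such as "zh-tw" is typically used to select localized resources, falling back to the bare language when no region-specific version exists. The sketch below illustrates one possible fallback rule; the function resolve_locale, the MESSAGES table and the default of "en" are all assumptions made for the example.

```python
# Hypothetical table of localized resources, keyed by lowercase locale tag.
MESSAGES = {
    "en": "File",
    "zh": "文件",
    "zh-tw": "檔案",
}

def resolve_locale(requested: str, available, default: str = "en") -> str:
    """Pick the closest available tag: exact match, then language only, then default."""
    tag = requested.replace("_", "-").lower()
    language = tag.split("-")[0]
    for candidate in (tag, language):
        if candidate in available:
            return candidate
    return default

for requested in ("zh_TW", "zh-HK", "fr-FR"):
    chosen = resolve_locale(requested, MESSAGES)
    print(requested, "->", chosen, MESSAGES[chosen])
```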
mark up - 加標示
MAHT - Machine-Assisted Human Translation - human translation using CAT tools
MLV - Multi-Language Vendor - a service provider who offers localization services into multiple target languages
MT - machine translation - 機器翻譯
Automatic translation of lexical and syntactic content
OCR - Optical Character Recognition - software used to convert scanned images of text into text data.
parameter - 參數
parsing
processing of a text file to extract desired data. Linguistic parsing may recognize words and phrases in text, and even recognize parts of speech.
pinyin - 拼音
"phonetic transcription" - system using roman letters to represent Chinese characters as syllables.
rc file, resource code file - 資源檔案
record - 記錄
reference - 參引
romanization
alphabetic or phonetic representation of Asian languages, for example: zhuyin - 注音, pinyin - 拼音 or hanyu pinyin - 漢語拼音, Wade-Giles, tongyong pinyin - 通用拼音
search engine - 搜尋引擎
Segmentation is the tokenizing of text into syntactic units. These units may then be aligned to create translation memories or extracted to terminology lexicons to be used by CAT tools. Segments may be phrases, sentences or whole paragraphs. Segmentation identifies the atoms to translate. Segmentation is intuitive to the skilled linguist, but challenging to even the best MT.
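A naive sentence-level segmenter can be sketched with a regular expression, as below. This is only an illustration under simple assumptions: it splits after Western sentence punctuation followed by whitespace, and directly after full-width Chinese punctuation. It will wrongly split abbreviations such as "e.g.", which is part of why segmentation is hard to automate. The name segment is hypothetical.

```python
import re

# Split after . ! ? when followed by whitespace, or immediately after 。！？
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")

def segment(text: str) -> list:
    """Return the non-empty sentence-like segments of text."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]

for seg in segment("Open the file. Is it saved? 開啟檔案。儲存檔案！"):
    print(seg)
```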
separator - 分隔符
character used to indicate the boundary of fields or segments. Often a tab, space or comma
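As a small illustration, a tab-separated terminology list can be split into fields with Python's standard csv module. The function name parse_records and the sample data are assumptions for the example.

```python
import csv
import io

def parse_records(text: str, separator: str = "\t") -> list:
    """Split each line of text into fields at the given separator character."""
    return list(csv.reader(io.StringIO(text), delimiter=separator))

sample = "tag\t標籤\ntext\t文字\n"
for term, translation in parse_records(sample):
    print(term, "=", translation)
```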
simplified characters - 簡化字, 簡體字
simplified Chinese characters used in China, Singapore and by overseas Chinese
source language
language from which the translation is made
tag - 標籤
non-content text element used to mark up and control the presentation of a document
target language
language into which a document is translated
text - 文字
TM - translation memory - 翻譯記憶
Stored pairs of associated source and target text segments from previous translations, which CAT tools can suggest to translators working on similar content in the same language pair.
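The lookup step can be sketched as a fuzzy match against stored source segments. The example below uses Python's standard difflib module as a stand-in for the matching algorithms of real CAT tools, which are not described here; the function suggest, the sample memory and the 0.75 threshold are assumptions.

```python
import difflib

# Hypothetical English -> Traditional Chinese translation memory.
MEMORY = {
    "Open the file.": "開啟檔案。",
    "Save the file.": "儲存檔案。",
    "Close the window.": "關閉視窗。",
}

def suggest(source: str, memory: dict, threshold: float = 0.75):
    """Return (matched source, stored translation, similarity) or None if nothing is close enough."""
    matches = difflib.get_close_matches(source, list(memory), n=1, cutoff=threshold)
    if not matches:
        return None
    match = matches[0]
    score = difflib.SequenceMatcher(None, source, match).ratio()
    return match, memory[match], score

print(suggest("Open the files.", MEMORY))
print(suggest("Thank you very much.", MEMORY))
```

A match below 100% similarity is only a suggestion: the translator still has to adapt the stored translation to the new segment.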
TMX - Translation Memory eXchange - 翻譯記憶交換
traditional characters - 繁體字
traditional characters used in Taiwan, Hong Kong and by overseas Chinese
translation - 翻譯
Translation is only one of the activities in localization; a localization project also includes many other tasks such as project management, software engineering, testing, and desktop publishing.
transliteration - 音譯
Unicode - 統一碼, 萬國碼, 多國文字編碼
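international character set, originally designed as a 16-bit code and since extended beyond 16 bits, with encodings designed to represent all the characters of all the world's languages.
UTF
"Unicode Transformation Format" - including UTF-8, a variable-width encoding of one to four bytes per character (the original design allowed up to six); and UTF-16, a variable-width encoding of one or two 16-bit units per character, introduced with Unicode 2.0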
XML - eXtensible Markup Language - 可擴展標示語言
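The variable widths of the UTF-8 and UTF-16 encodings listed above can be observed with a short Python sketch. The helper encoded_lengths is a hypothetical name, and the sample characters were chosen for illustration; code points in the current Unicode range need at most four bytes in either encoding.

```python
def encoded_lengths(ch: str):
    """Return the byte length of one character in (UTF-8, UTF-16 without byte-order mark)."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

samples = {
    "Latin letter A": "A",
    "e with acute accent": "\u00e9",
    "Chinese character zhong": "\u4e2d",
    "emoji outside the 16-bit range": "\U0001F600",
}
for label, ch in samples.items():
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"{label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```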

