全民健康保險研究資料庫

Introduction to the National Health Insurance Research Database (NHIRD), Taiwan

Background

Taiwan launched a single-payer National Health Insurance program on March 1, 1995. As of 2007, 22.60 million of Taiwan’s 22.96 million residents were enrolled in this program. Foreigners in Taiwan are also eligible. The program’s database contains registration files and original claim data for reimbursement. The National Health Insurance Research Database (NHIRD) is derived from this system by the Bureau of National Health Insurance, Taiwan (BNHI), maintained by the National Health Research Institutes, Taiwan, and provided to scientists in Taiwan for research purposes.

Data protection

Data in the National Health Insurance Research Database (NHIRD) that could be used to identify patients or care providers, including medical institutions and physicians, are scrambled before being sent to the National Health Research Institutes for database construction and are scrambled again before being released to each researcher. In theory, individuals cannot be identified at any level by querying this database alone. All researchers who wish to use the NHIRD and its data subsets are required to sign a written agreement declaring that they have no intention of attempting to obtain information that could potentially violate the privacy of patients or care providers.

Data files

Each year, BNHI collects data from the National Health Insurance program and sorts it into data files, including registration files and original claim data for reimbursement. These data files are de-identified by scrambling the identification codes of both patients and medical facilities and sent to the National Health Research Institutes to form the original files of NHIRD.

The registration files include:

Registry for contracted beds (BED)
Registry for contracted specialty services (DETA)
Registry for contracted medical facilities (HOSB)
Supplementary registry for contracted medical facilities (HOSX)
Registry for board-certified specialists (DOC)
Registry for medical personnel (PER)
Registry for catastrophic illness patients (HV)
Registry for medical services (HOX)
Registry for drug prescriptions (DRUG)
Registry for beneficiaries (ID)

The Original Claim Data include:

Monthly claim summary for inpatient claims (DT)
Monthly claim summary for ambulatory care claims (CT)
Inpatient expenditures by admissions (DD)
Details of inpatient orders (DO)
Ambulatory care expenditures by visits (CD)
Details of ambulatory care orders (OO)
Expenditures for prescriptions dispensed at contracted pharmacies (GD)
Details of prescriptions dispensed at contracted pharmacies (GO)

Data subsets

Based on the registration files and original claim data in NHIRD, specific data subsets are constructed for research purposes. Brief descriptions of these datasets are as follows:

Registration datasets

The registration dataset combines seven registration files (HOSB, HOSX, DETA, BED, PER, DOC, and HV) and two original claim data files (CT and DT) on a single CD-ROM for release. The registry for beneficiaries (ID) is released separately as one of the specific subject datasets.

Systematic Sampling DD

Five percent of the inpatient expenditures by admission (DD) records, extracted by systematic sampling on a monthly basis, together with the related records in the details of inpatient orders (DO), form the Systematic Sampling DD.
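
As a rough illustration of systematic sampling at the rates quoted in this document (5% for DD, 0.2% for CD), the sketch below keeps every k-th record of a month's file after a random start, with k taken as 1/fraction. The function name, the in-memory list representation, the toy monthly files, and the random starting offset are all assumptions made for illustration; the source does not state how the sampling interval or starting point were actually chosen.

```python
import random


def systematic_sample(records, fraction, seed=None):
    """Return every k-th record after a random start, where k = round(1 / fraction)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    step = round(1 / fraction)                    # 5% -> every 20th record
    start = random.Random(seed).randrange(step)   # hypothetical random start
    return records[start::step]


# Hypothetical monthly files: the sampling is applied to each month separately.
monthly_dd = {
    "2005-01": [f"adm-{i}" for i in range(1000)],
    "2005-02": [f"adm-{i}" for i in range(1000, 1800)],
}
sampled = {
    month: systematic_sample(rows, 0.05, seed=1)
    for month, rows in monthly_dd.items()
}
```

Calling the same function with `fraction=0.002` gives a sampling interval of 500, which corresponds to the 0.2% rate used for the ambulatory care file.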

Systematic Sampling CD

In the same way, 0.2% of the ambulatory care expenditures by visit (CD) records, extracted by systematic sampling on a monthly basis, together with the related records in the details of ambulatory care orders (OO), form the Systematic Sampling CD.

Longitudinal Health Insurance Database 2005 (LHID2005)

LHID2005 contains all the original claim data of 1,000,000 beneficiaries randomly sampled from the 2005 Registry for Beneficiaries (ID) of the NHIRD. The sampling population includes everyone who was a beneficiary of the National Health Insurance Program at any time in 2005; there are approximately 25.68 million individuals in this registry. All the registration and claim data of these 1,000,000 individuals collected by the National Health Insurance Program constitute the LHID2005. There is no significant difference in the gender distribution, age distribution, or average insured payroll-related amount between the individuals in the LHID2005 and those in the original NHIRD.
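
The construction described above (draw a random sample of beneficiaries, then keep every claim belonging to them) can be sketched as follows. This is only an illustration on toy data: the function names, the `ID` field name on claim records, and the use of Python's `random.sample` are assumptions, not a description of how the LHID2005 was actually produced.

```python
import random


def draw_cohort(beneficiary_ids, n, seed=None):
    """Draw a simple random sample of n distinct beneficiary IDs."""
    population = sorted(set(beneficiary_ids))   # sorted so a seed gives a repeatable draw
    return set(random.Random(seed).sample(population, n))


def claims_for_cohort(claims, cohort):
    """Keep every claim record that belongs to a sampled beneficiary."""
    return [claim for claim in claims if claim["ID"] in cohort]


# Toy data standing in for the registry and for one claim file.
registry_ids = [f"P{i:05d}" for i in range(2568)]
claims = [{"ID": f"P{i % 2568:05d}", "visit": i} for i in range(10000)]

cohort = draw_cohort(registry_ids, 100, seed=2005)
cohort_claims = claims_for_cohort(claims, cohort)

# Sampling fraction implied by the figures in the text: about 3.9%.
fraction = 1_000_000 / 25_680_000
```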

Longitudinal Health Insurance Database 2000 (LHID2000)

LHID2000 contains all the original claim data of 200,000 individuals randomly sampled from the 2000 Registry for Beneficiaries (ID) of the NHIRD, which maintains the registration data of everyone who was a beneficiary of the National Health Insurance Program during the period of 1996–2000. There are approximately 23.72 million individuals in this registry. All the registration and claim data of these 200,000 individuals collected by the National Health Insurance program constitute the LHID2000. There is no significant difference in the gender distribution between the patients in the LHID2000 and the original NHIRD. [i]

Specific subject datasets

Based on a survey of the research community, specific research subjects were selected and the matched datasets are provided on CD-ROM or as ready-to-dispatch files for more timely distribution. Prior to the year 2000, patients’ diagnoses in the database were encoded using either the A-code or the ICD-9-CM code; after 2000, all diagnoses follow ICD-9-CM. NHIRD codebooks are made available for researchers to use with the database. These codes are also the major basis for data subset construction.

6-1 Traditional Chinese medicine dataset (CM)

Traditional Chinese medicine original claim data, which is a sub-file in the CD data file provided by BNHI.

6-2 Cancer dataset (CN)

Cancer patient original claim data extracted from the CD data file. Records that matched any of the following cancer-related ICD-9-CM or other cancer-related codes were selected to construct this dataset: the cancer treatment codes 12, D1, or D2 in CURE_ITEM_NO1 to CURE_ITEM_NO4; the disease-category codes in “ACODE_ICD9_1~3” with the first three digits from 140 to 239; the diagnosis codes in “ACODE_ICD9_1~3” with the first three characters from A08 to A17; and the cancer-surgery-related codes in “ICD_OP_CODE” with the first three characters from V57 to V58.
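
The selection rule above can be sketched as a predicate over a single claim record. The sketch assumes that each record is a dictionary of string values, that “ACODE_ICD9_1~3” stands for the three fields ACODE_ICD9_1, ACODE_ICD9_2, and ACODE_ICD9_3, and that a record qualifies when any one condition holds; the helper names are hypothetical.

```python
CANCER_TREATMENT_CODES = {"12", "D1", "D2"}


def _clean(value):
    """Normalize a raw field value to an upper-case string without padding."""
    return (value or "").strip().upper()


def _is_cancer_diagnosis(code):
    """Check the first three characters against 140-239 and A08-A17."""
    prefix = _clean(code)[:3]
    if len(prefix) < 3:
        return False
    if prefix.isdigit():                              # ICD-9-CM 140 to 239
        return 140 <= int(prefix) <= 239
    if prefix[0] == "A" and prefix[1:].isdigit():     # A-code A08 to A17
        return 8 <= int(prefix[1:]) <= 17
    return False


def is_cancer_record(record):
    """True when any of the cancer-related conditions in the text is met."""
    if any(_clean(record.get(f"CURE_ITEM_NO{i}")) in CANCER_TREATMENT_CODES
           for i in range(1, 5)):
        return True
    if any(_is_cancer_diagnosis(record.get(f"ACODE_ICD9_{i}"))
           for i in range(1, 4)):
        return True
    return _clean(record.get("ICD_OP_CODE"))[:3] in {"V57", "V58"}
```

Numeric and A-code prefixes are parsed separately rather than compared as raw strings, so that a malformed value such as "1A0" is not mistaken for a code between 140 and 239.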

6-3 Diabetes dataset (DB)

Diabetes patient original claim data extracted from the CD data file. Records that matched any of the following diabetes-related ICD-9-CM codes were selected to construct this dataset: the diagnosis codes in “ACODE_ICD9_1~3” with the first three digits 250; and the diagnosis codes in “ACODE_ICD9_1~3” with the first four characters 6488, 7751, 7902, 6480, or A181.
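
Because the diabetes rule is purely a matter of code prefixes, it reduces to a single prefix test per diagnosis field. The sketch below assumes string-valued records with fields ACODE_ICD9_1 to ACODE_ICD9_3; the function and constant names are hypothetical.

```python
# Prefixes taken from the text: three-digit 250, plus five four-character codes.
DIABETES_PREFIXES = ("250", "6488", "7751", "7902", "6480", "A181")


def is_diabetes_record(record):
    """True when any of the three diagnosis fields starts with a listed prefix."""
    for i in range(1, 4):
        code = (record.get(f"ACODE_ICD9_{i}") or "").strip().upper()
        if code.startswith(DIABETES_PREFIXES):
            return True
    return False


# Filtering a toy claim file.
toy_claims = [
    {"ACODE_ICD9_1": "25000"},
    {"ACODE_ICD9_1": "4019", "ACODE_ICD9_3": "64883"},
    {"ACODE_ICD9_1": "4659"},
]
diabetes_claims = [row for row in toy_claims if is_diabetes_record(row)]
```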

6-4 Inpatient expenditures, by admission (DD)

Original claim data of inpatients, by admission.

6-5 Dental dataset (DN)

Dental original claim data, which is a sub-file in the CD data file provided by BNHI.

6-6 Pharmacies dataset (G)

Pharmacies original claim data, which is a sub-file in the GD and GO data files provided by BNHI.

6-7 Catastrophic illness dataset (HV)

Catastrophic illness patient original claim data extracted from the CD data file. Data that matched the co-payment related catastrophic illness code 001 in “PART_NO” were selected to construct this dataset.

6-8 Registry for beneficiaries (ID)

Registration data of all beneficiaries. Data include identification number (scrambled), type of insurance, birthday, gender, coverage period, etc.

6-9 Injury dataset (IN)

Injury patient original claim data extracted from the CD data file. Records that matched any of the following injury-related ICD-9-CM codes were selected to construct this dataset: the diagnosis codes in “ACODE_ICD9_1~3” beginning with the letter E; and the diagnosis codes in “ACODE_ICD9_1~3” with the first three digits from 800 to 959.
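
A minimal sketch of the injury rule follows. It reads the phrase about the letter E as meaning that the code starts with E (an external-cause code), which is an assumption, as are the function names and the dictionary-of-strings record layout.

```python
def is_injury_code(code):
    """True for an E-code or for a code whose first three digits are 800-959."""
    code = (code or "").strip().upper()
    if code.startswith("E"):          # assumed reading of "with the letter E"
        return True
    prefix = code[:3]
    return len(prefix) == 3 and prefix.isdigit() and 800 <= int(prefix) <= 959


def is_injury_record(record):
    """True when any of ACODE_ICD9_1 to ACODE_ICD9_3 is injury-related."""
    return any(is_injury_code(record.get(f"ACODE_ICD9_{i}")) for i in range(1, 4))
```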

6-10 Medical center dataset (MC)

Patient original claim data submitted by medical centers, extracted from the CD data file. From the HOSB, we collected the IDs of all medical centers and then obtained all the data claimed by these medical centers from the CD data file to construct this dataset.
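
The two-step construction described here (collect medical center IDs from HOSB, then filter CD by those IDs) amounts to a lookup join. In the sketch below the field names `HOSP_ID` and `LEVEL`, and the value `"medical_center"`, are placeholders chosen for illustration; the actual field names and coding are defined in the NHIRD codebooks.

```python
def medical_center_ids(hosb_rows):
    """Step 1: collect the IDs of facilities flagged as medical centers."""
    return {row["HOSP_ID"] for row in hosb_rows if row["LEVEL"] == "medical_center"}


def medical_center_claims(cd_rows, hosb_rows):
    """Step 2: keep the CD records claimed by those facilities."""
    ids = medical_center_ids(hosb_rows)
    return [row for row in cd_rows if row["HOSP_ID"] in ids]


# Toy registry and claim file with placeholder values.
hosb = [
    {"HOSP_ID": "H001", "LEVEL": "medical_center"},
    {"HOSP_ID": "H002", "LEVEL": "clinic"},
    {"HOSP_ID": "H003", "LEVEL": "medical_center"},
]
cd = [
    {"HOSP_ID": "H001", "visit": 1},
    {"HOSP_ID": "H002", "visit": 2},
    {"HOSP_ID": "H003", "visit": 3},
    {"HOSP_ID": "H002", "visit": 4},
]
mc_claims = medical_center_claims(cd, hosb)
```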

6-11 Case-payment dataset (NCP)

Case payment[ii] coverage original claim data of patients extracted from the CD data file. Records that matched any of the following case-payment-related ICD-9-CM or other codes were selected to construct this dataset: the case-type code “CASE_TYPE” 81; the disease-diagnosis codes in “ACODE_ICD9_1~3” with the first three digits 592; the diagnosis codes in “ACODE_ICD9_1~3” with the first five digits 27411; the case-payment-surgery-related codes “ICD_OP_CODE” 132 or 133; and “ICD_OP_CODE” 5300, 5301, 5302, 5329, 9851, 1311, 1319, 1341, 1342, 1343, 1351, 1359, 1371, 3001, 3009, 3022, or 3142.
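
Since this rule mixes a case-type test, two diagnosis-prefix tests, and a list of surgery codes, a sketch may make the combination easier to follow. It assumes string-valued records, treats the listed “ICD_OP_CODE” values (including 132 and 133) as exact matches, and treats the conditions as alternatives; the text does not say whether 132 and 133 are full codes or prefixes, so that choice is an assumption, and the function name is hypothetical.

```python
# Surgery codes listed in the text, matched exactly (an assumption).
CASE_PAYMENT_OP_CODES = {
    "132", "133",
    "5300", "5301", "5302", "5329", "9851", "1311", "1319", "1341", "1342",
    "1343", "1351", "1359", "1371", "3001", "3009", "3022", "3142",
}


def is_case_payment_record(record):
    """True when any case-payment condition from the text is met."""
    def field(name):
        return (record.get(name) or "").strip().upper()

    if field("CASE_TYPE") == "81":
        return True
    for i in range(1, 4):
        diagnosis = field(f"ACODE_ICD9_{i}")
        if diagnosis[:3] == "592" or diagnosis[:5] == "27411":
            return True
    return field("ICD_OP_CODE") in CASE_PAYMENT_OP_CODES
```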

6-12 Occupational disease and occupational injury dataset (OC)

Occupational disease or occupational injury patient original claim data extracted from the CD data file. Records with a “GAVE_KIND” value of 1 or 2 were selected to construct this dataset.

6-13 Psychiatric Inpatient Medical Claim Dataset (PIMC)

From the inpatient expenditures by admission (DD), we selected the patients whose admitting department was psychiatric or whose diagnosis was psychiatric. Data of these individuals in CD, DD, OO, and DO were collected to construct the PIMC dataset.

6-14 Traffic accident dataset (TR)

Traffic accident patient original claim data extracted from the DD data file. Records that matched the traffic event code and ICD-9-CM codes were selected to construct this dataset: the traffic event code “TRA_EVEN” equal to Y and the diagnosis codes in “ACODE_ICD9_1~3” with the first four characters from E800 to E848.
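
Unlike the datasets selected on any one of several codes, the wording here suggests that both the traffic event flag and an E-code in the stated range are required. The sketch below follows that reading, which is an assumption; if either condition alone were sufficient, the final `and` would become `or`. Record layout and function names are hypothetical.

```python
def is_traffic_e_code(code):
    """True when the first four characters fall between E800 and E848."""
    code = (code or "").strip().upper()
    if len(code) < 4 or code[0] != "E" or not code[1:4].isdigit():
        return False
    return 800 <= int(code[1:4]) <= 848


def is_traffic_record(record):
    """Require both the traffic event flag and a matching E-code (assumed reading)."""
    flagged = (record.get("TRA_EVEN") or "").strip().upper() == "Y"
    has_e_code = any(
        is_traffic_e_code(record.get(f"ACODE_ICD9_{i}")) for i in range(1, 4)
    )
    return flagged and has_e_code
```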

6-15 Rehabilitation therapy dataset (RH)

Rehabilitation therapy patient original claim data extracted from the CD data file. Records that matched any of the following rehabilitation-related treatment codes and ICD-9-CM codes were selected to construct this dataset: the treatment codes D3 or D0 in “CURE_ITEM_NO1~CURE_ITEM_NO4”; and the rehabilitation therapy surgery codes in “ICD_OP_CODE” with the first two digits 93.

Teaching demo dataset

All the original claim data of 1,000 individuals randomly sampled from the NHIRD Registry for Beneficiaries in the year 2000. This dataset is provided to teachers for educational purposes.

[i] Churn-Shiouh Gau, I-Shou Chang, Fe-Lin Wu, Hui-Tzu Yu, Yu-Wen Huang, Cheng-Liang Chi, Su-Yu Chien, Keh-Ming Lin, Ming-Ying Liu, Hui-Po Wang. Usage of the claim database of national health insurance programme for analysis of cisapride-erythromycin co-medication in Taiwan. Pharmacoepidemiology and Drug Safety 2007; 16:86-95.

[ii] Third-party payers pay physicians/hospitals according to the cases treated rather than per service or per bed day. The coverage in Taiwan is available at this LINK [Accessed May 2007. In Chinese].