National Health Insurance Research Database (NHIRD) v2014

Based on the registration files and original claim data in NHIRD, specific data subsets are constructed for research purposes. Brief descriptions of these datasets are as follows:


1.	Registration dataset
	The registration dataset combines seven registration files, namely HOSB, HOSX, DETA, BED, PER, DOC, and HV, and two original claim data files, CT and DT, on a CD-ROM for release. The registry for beneficiaries is released separately under 'Specific subject datasets'.

2.	Systematic Sampling of CD and DD
	Systematic Sampling of CD: 0.2% of the ambulatory care expenditures, by visit (CD), are extracted by systematic sampling on a monthly basis. These records, together with the related records in the details of ambulatory care orders (OO), form the Systematic Sampling CD.
	Systematic Sampling of DD: 5% of the inpatient expenditures, by admission (DD), are extracted by systematic sampling on a monthly basis. These records, together with the related records in the details of inpatient orders (DO), form the Systematic Sampling DD.
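	The monthly extraction described above can be sketched as follows. This is only an illustration of generic systematic sampling (every k-th record from a random start, with k = 1 / sampling fraction); the source does not state the ordering of records or how the starting point is chosen, so the function names, the random start, and the stand-in data are all assumptions.

```python
import random


def systematic_sample(records, fraction, rng=None):
    """Keep every k-th record (k = round(1 / fraction)) from a random start."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if rng is None:
        rng = random.Random()
    interval = round(1 / fraction)   # 0.002 -> 500 (CD), 0.05 -> 20 (DD)
    start = rng.randrange(interval)  # assumed: random offset in the first interval
    return records[start::interval]


def sample_by_month(records_by_month, fraction, seed=None):
    """Apply systematic sampling independently to each month's records."""
    rng = random.Random(seed)
    return {
        month: systematic_sample(records, fraction, rng)
        for month, records in records_by_month.items()
    }


# Hypothetical stand-in data: integers play the role of CD claim records.
claims = {"2010-01": list(range(10000)), "2010-02": list(range(5000))}
cd_sample = sample_by_month(claims, fraction=0.002, seed=0)
print({month: len(rows) for month, rows in cd_sample.items()})
```

	In a real extraction the sampled CD (or DD) records would then be joined to their OO (or DO) order details; that join is omitted here.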

3.	Longitudinal Health Insurance Database (LHID)
	LHID2010: LHID2010 contains all the original claim data of 1,000,000 beneficiaries enrolled in year 2010, randomly sampled from the year 2010 Registry for Beneficiaries (ID) of the NHIRD, where registration data of everyone who was a beneficiary of the National Health Insurance program during the period of Jan. 1st, 2010 to Dec. 31st, 2010 were drawn for random sampling. There are approximately 27.38 million individuals in this registry. All the registration and claim data of these 1,000,000 individuals collected by the National Health Insurance program constitute the LHID2010. There was no significant difference in the gender distribution (χ2=0.067, df=1, p-value=0.796) between the patients in the LHID2010 and the original NHIRD.
	LHID2005: LHID2005 contains all the original claim data of 1,000,000 beneficiaries enrolled in year 2005, randomly sampled from the year 2005 Registry for Beneficiaries (ID) of the NHIRD, where registration data of everyone who was a beneficiary of the National Health Insurance program during the period of Jan. 1st, 2005 to Jan. 1st, 2006 were drawn for random sampling. There are approximately 25.68 million individuals in this registry. All the registration and claim data of these 1,000,000 individuals collected by the National Health Insurance program constitute the LHID2005. There was no significant difference in the gender distribution (χ2=0.008, df=1, p-value=0.931) between the patients in the LHID2005 and the original NHIRD.
	LHID2000: LHID2000 contains all the original claim data of 1,000,000 individuals randomly sampled from the 2000 Registry for Beneficiaries (ID) of the NHIRD, which maintains the registration data of everyone who was a beneficiary of the National Health Insurance program during the period of 1996–2000. There are approximately 23.75 million individuals in this registry. All the registration and claim data of these 1,000,000 individuals collected by the National Health Insurance program constitute the LHID2000. There was no significant difference in the gender distribution (χ2=1.74, df=1, p-value=0.187) between the patients in the LHID2000 and the original NHIRD.
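	The gender-distribution checks quoted above report a χ2 statistic with df=1, which is consistent with a chi-square goodness-of-fit test of the sample against the registry proportions; the source does not spell out the exact procedure, so the following is a sketch under that assumption. The function names and the example counts are hypothetical. For one degree of freedom the p-value has the closed form erfc(sqrt(χ2 / 2)), which lets the reported p-values be recomputed from the reported statistics.

```python
import math


def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic for observed category counts."""
    total = sum(observed)
    return sum(
        (obs - total * prop) ** 2 / (total * prop)
        for obs, prop in zip(observed, expected_props)
    )


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Recompute p-values from the chi-square statistics reported in the text.
for name, chi2 in [("LHID2010", 0.067), ("LHID2005", 0.008), ("LHID2000", 1.74)]:
    print(name, round(p_value_df1(chi2), 3))

# Hypothetical counts (not NHIRD figures): [male, female] in a 1,000,000 sample,
# compared against an assumed 50/50 split in the full registry.
stat = chi_square_gof([500300, 499700], [0.5, 0.5])
print(round(stat, 2), round(p_value_df1(stat), 3))
```

	Because the reported statistics are rounded, a recomputed p-value can differ from the published one in the last digit.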

4.	Specific subject datasets
	Based on a survey of the research community, specific research subjects were selected, and the matched datasets are provided on CD-ROM or as ready-to-dispatch files for more timely distribution. Prior to the year 2000, patients’ diagnoses in the database were encoded using either the A-code or the ICD-9-CM code; only the ICD-9-CM codes have been used since 2000. NHIRD codebooks are made available for researchers to use with the database. These codes are also the major basis for data subset construction.
	4-1 Dental dataset (DN): Dental original claim data, which is a sub-file in the CD data file provided by BNHI.
	4-2 Traditional Chinese medicine dataset (CM): Traditional Chinese medicine original claim data, which is a sub-file in the CD data file provided by BNHI.
	4-3 Inpatient expenditures, by admission (DD): Original claim data of inpatients, by admission.
	4-4 Registry for beneficiaries (ID): Registration data of all beneficiaries. Data include identification number (scrambled), type of insurance, birthday, gender, coverage period, etc.
	4-5 Cancer dataset (CN): Cancer patient original claim data extracted from the CD data file. Data that matched any of the cancer-related ICD-9-CM or other cancer-related codes were selected to construct this dataset, i.e. the cancer treatment codes 12, D1, or D2 from CURE_ITEM_NO1 to CURE_ITEM_NO4; the disease-category codes of “ACODE_ICD9_1~3” with the first three digits from 140 to 239; the diagnosis codes “ACODE_ICD9_1~3” with the first three digits from A08 to A17; and the cancer-surgery-related codes “ICD_OP_CODE” with the first three digits from V57 to V58.
	4-6 Injury dataset (IN): Injury patient original claim data extracted from the CD data file. Data that matched any of the injury-related ICD-9-CM codes were selected to construct this dataset, i.e. the diagnosis codes of “ACODE_ICD9_1~3” with the letter E; and the diagnosis codes “ACODE_ICD9_1~3” with the first three digits from 800 to 959.
	4-7 Case-payment dataset (NCP): Case payment coverage original claim data of patients extracted from the CD data file. Data that matched any of the case payment related ICD-9-CM or other codes were selected to construct this dataset, i.e. the case-type codes, “CASE_TYPE” 81; the disease-diagnosis codes of “ACODE_ICD9_1~3” with the first three digits 592; the diagnosis codes “ACODE_ICD9_1~3” with the first five digits 27411; the case payment surgery related codes “ICD_OP_CODE” 132 or 133; and “ICD_OP_CODE” 5300, 5301, 5302, 5329, 9851, 1311, 1319, 1341, 1342, 1343, 1351, 1359, 1371, 3001, 3009, 3022, or 3142.
	4-8 Diabetes dataset (DB): Diabetes patient original claim data extracted from the CD data file. Data that matched any of the diabetes-related ICD-9-CM codes were selected to construct this dataset, i.e. the diagnosis codes of “ACODE_ICD9_1~3” with the first three digits 250; and the diagnosis codes “ACODE_ICD9_1~3” with the first four digits 6488, 7751, 7902, 6480, or A181.
	4-9 Psychiatric Inpatient Medical Claim Dataset (PIMC): From the inpatient expenditures by admission (DD), we selected the patients whose admitting department was psychiatric or whose diagnosis matched psychiatric. Data of these individuals in CD, DD, OO, and DO were collected to construct the PIMC dataset.
	4-10 Catastrophic illness dataset (HV): Catastrophic illness patient original claim data extracted from the CD data file. Data that matched the co-payment related catastrophic illness code 001 in “PART_NO” were selected to construct this dataset.
	4-11 Occupational disease and occupational injury dataset (OC): Occupational disease or occupational injury patient original claim data extracted from the CD data file. Data that matched 1 or 2 “GAVE_KIND” were selected to construct this dataset.
	4-12 Traffic accident dataset (TR): Traffic accident patient original claim data extracted from the DD data file. Data that matched traffic event code and ICD-9-CM codes were selected to construct this dataset, i.e. the traffic event code “TRA_EVEN” Y and the diagnosis codes “ACODE_ICD9_1~3” with the first four letters/digits from E800-E848.
	4-13 Rehabilitation therapy dataset (RH): Rehabilitation therapy patient original claim data extracted from the CD data file. Data that matched any of the rehabilitation-related treatment codes and ICD-9-CM codes were selected to construct this dataset, i.e. D3 or D0 in “CURE_ITEM_NO1~CURE_ITEM_NO4” treatment codes, and the rehabilitation therapy surgery codes “ICD_OP_CODE” with the first two digits 93.
	4-14 Medical center dataset (MC): Patient original data claimed by medical centers extracted from the CD data file. From the HOSB, we collected all the medical centers’ ID and obtained all the data claimed by these medical centers from the CD data file to construct this dataset.
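	The code-matching rules for the subject datasets can be expressed as simple record filters. The sketch below covers the cancer (CN), injury (IN), and diabetes (DB) rules only. The field names come from the descriptions above, but the record layout is an assumption: each claim is taken to be a dict of strings, with diagnosis codes stored without decimal points and A-codes prefixed with "A". The helper and function names are hypothetical.

```python
def _diagnoses(claim):
    """Non-empty values of ACODE_ICD9_1 to ACODE_ICD9_3 (assumed dict layout)."""
    codes = (claim.get(f"ACODE_ICD9_{i}", "") for i in (1, 2, 3))
    return [code.strip() for code in codes if code and code.strip()]


def _prefix_in_range(code, width, low, high):
    """True if the first `width` characters are a number between low and high."""
    prefix = code[:width]
    return len(prefix) == width and prefix.isdecimal() and low <= int(prefix) <= high


def is_cancer_claim(claim):
    treatments = {claim.get(f"CURE_ITEM_NO{i}", "") for i in (1, 2, 3, 4)}
    if treatments & {"12", "D1", "D2"}:                    # cancer treatment codes
        return True
    for code in _diagnoses(claim):
        if _prefix_in_range(code, 3, 140, 239):            # ICD-9-CM 140 to 239
            return True
        if code.startswith("A") and _prefix_in_range(code[1:], 2, 8, 17):
            return True                                    # A-codes A08 to A17
    return claim.get("ICD_OP_CODE", "")[:3] in {"V57", "V58"}


def is_injury_claim(claim):
    return any(
        code.startswith("E") or _prefix_in_range(code, 3, 800, 959)
        for code in _diagnoses(claim)
    )


def is_diabetes_claim(claim):
    four_digit = {"6488", "7751", "7902", "6480", "A181"}
    return any(
        code[:3] == "250" or code[:4] in four_digit
        for code in _diagnoses(claim)
    )


# Hypothetical claim records for illustration only.
claims = [
    {"ACODE_ICD9_1": "1749"},                              # matches the cancer rule
    {"ACODE_ICD9_1": "4019", "ACODE_ICD9_2": "25000"},     # matches the diabetes rule
    {"ACODE_ICD9_1": "8208", "ACODE_ICD9_2": "E8859"},     # matches the injury rule
]
diabetes_subset = [c for c in claims if is_diabetes_claim(c)]
print(len(diabetes_subset))
```

	The remaining datasets follow the same pattern with different fields, for example “CASE_TYPE” for NCP or “GAVE_KIND” for OC.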

5.	Teaching demo dataset
	This dataset contains all the original claim data of 1,000 individuals randomly sampled from the NHIRD Registry for Beneficiaries in the year 2000. It is provided to teachers for educational purposes.