1 00:00:17,083 --> 00:00:19,052 >> Jon Walter McKeeby: My name is Jon Walter McKeeby. 2 00:00:19,119 --> 00:00:22,088 I am the chief information officer at the Clinical Center. 3 00:00:22,622 --> 00:00:25,358 The Clinical Center is the hospital portion 4 00:00:25,425 --> 00:00:27,994 of the NIH Intramural Program. 5 00:00:28,561 --> 00:00:33,400 I oversee the EHR and all the clinical information systems 6 00:00:33,466 --> 00:00:36,936 that support clinical research at the Clinical Center. 7 00:00:37,804 --> 00:00:40,907 Today we're going to talk about common data elements. 8 00:00:40,974 --> 00:00:43,576 We're presenting in four parts. 9 00:00:43,643 --> 00:00:48,281 Part one is an overview to provide background information 10 00:00:48,348 --> 00:00:50,216 about common data elements. 11 00:00:51,885 --> 00:00:54,254 We want to be able to use common language 12 00:00:54,320 --> 00:00:57,791 across different sites, studies, and clinical trials. 13 00:00:57,857 --> 00:01:03,129 That way, we can compare and utilize existing standards 14 00:01:03,196 --> 00:01:06,833 to conduct clinical trials and share information. 15 00:01:07,700 --> 00:01:10,437 We want to ensure consistent data collection, 16 00:01:10,503 --> 00:01:12,605 data analysis, and data reporting. 17 00:01:13,840 --> 00:01:16,009 And so we're going to review a background -- 18 00:01:16,075 --> 00:01:18,812 what is data, information, knowledge, and wisdom, 19 00:01:19,345 --> 00:01:22,015 review of quantitative versus qualitative data, 20 00:01:22,582 --> 00:01:26,519 the need for data standards, form, and content, problems 21 00:01:26,586 --> 00:01:30,890 with data, terminologies, interoperability standards, 22 00:01:31,524 --> 00:01:34,961 FAIR guiding principles, common data elements, 23 00:01:35,028 --> 00:01:37,197 data management and data sharing plans. 24 00:01:37,764 --> 00:01:41,000 We'll review CDE usage at the NIH. 
25 00:01:41,067 --> 00:01:45,371 We'll review National Library of Medicine and data standards 26 00:01:46,406 --> 00:01:49,742 usage at the institute and center level, 27 00:01:49,809 --> 00:01:53,179 and the NIH CDE Task Force. 28 00:01:55,682 --> 00:01:58,618 Data, information, knowledge, and wisdom model -- 29 00:01:58,685 --> 00:02:03,923 so data is that specific discrete data element. 30 00:02:04,557 --> 00:02:09,262 It's collected or organized, named, 31 00:02:10,263 --> 00:02:14,267 but it doesn't have context yet. It's not information. 32 00:02:14,334 --> 00:02:17,504 You add organization, interpretation, 33 00:02:17,570 --> 00:02:19,772 and then data becomes information. 34 00:02:20,440 --> 00:02:23,610 It becomes knowledge when it's integrated 35 00:02:23,676 --> 00:02:27,780 into more information and understanding. 36 00:02:28,348 --> 00:02:31,084 So within the EHR world, 37 00:02:31,150 --> 00:02:35,855 the specific data element of a lab value is the data. 38 00:02:36,456 --> 00:02:40,493 It becomes information when you identify the medications, 39 00:02:40,560 --> 00:02:43,596 the individual, the patient, 40 00:02:43,663 --> 00:02:46,966 and what's going on with the patient. 41 00:02:47,033 --> 00:02:48,768 It becomes information. 42 00:02:48,835 --> 00:02:52,872 Understanding what to provide in the sense 43 00:02:52,939 --> 00:02:58,545 of what orders to place, what to do with the information, 44 00:02:59,279 --> 00:03:03,683 how to treat the symptoms of the patient, 45 00:03:03,750 --> 00:03:09,522 that is knowledge. It's wisdom when you now apply it 46 00:03:09,589 --> 00:03:12,258 and take it outside the organization 47 00:03:12,325 --> 00:03:16,462 and it's published in a journal. Then it becomes wisdom. 48 00:03:16,529 --> 00:03:22,168 So this is the data, information, knowledge, and wisdom model. 
49 00:03:24,037 --> 00:03:28,341 Quantitative studies -- measurable, numeric, 50 00:03:28,408 --> 00:03:30,310 provides answers to the how many, 51 00:03:30,376 --> 00:03:34,814 the how much, the how often. Data is fixed and factual. 52 00:03:34,881 --> 00:03:38,084 We can utilize statistical analysis. 53 00:03:38,751 --> 00:03:42,488 Discrete quantitative data is fixed numerical values. 54 00:03:43,523 --> 00:03:46,192 Options are surveys and questionnaires, 55 00:03:46,259 --> 00:03:48,127 analytical tools, 56 00:03:48,194 --> 00:03:51,030 electronic health records that have discrete data, 57 00:03:51,097 --> 00:03:55,401 vital signs, clinical information systems, 58 00:03:55,468 --> 00:03:57,904 laboratory data, blood bank data. 59 00:03:57,971 --> 00:04:00,106 They all provide quantitative data 60 00:04:00,173 --> 00:04:02,809 that we can measure as part of our studies. 61 00:04:02,875 --> 00:04:05,878 Qualitative -- descriptive language, 62 00:04:06,746 --> 00:04:09,916 unstructured, subjective, dynamic. 63 00:04:09,983 --> 00:04:13,820 We use that to analyze and identify themes. 64 00:04:14,554 --> 00:04:17,857 Nominal data is used to label, categorize. 65 00:04:17,924 --> 00:04:20,293 We're looking for clustering of information. 66 00:04:21,260 --> 00:04:24,530 Examples of qualitative data we use for research, 67 00:04:24,597 --> 00:04:26,966 again, electronic health records, 68 00:04:27,033 --> 00:04:29,569 but the textual data, the report data, 69 00:04:29,636 --> 00:04:31,938 interpretations, unstructured data, 70 00:04:32,538 --> 00:04:35,475 clinical information systems such as lab, 71 00:04:35,541 --> 00:04:39,212 radiology, blood bank, it's, again, the textual data, 72 00:04:39,278 --> 00:04:42,749 the unstructured data, the interpretations. 73 00:04:42,815 --> 00:04:45,618 Interviews, surveys, and questionnaires, 74 00:04:45,685 --> 00:04:49,989 and observations are all examples of qualitative data. 
75 00:04:51,124 --> 00:04:55,461 It's important to know knowledge, information, wisdom, 76 00:04:55,528 --> 00:04:57,497 and quantitative, qualitative data, 77 00:04:57,563 --> 00:04:59,766 because they provide the foundation 78 00:04:59,832 --> 00:05:03,403 that clinical data elements 79 00:05:03,469 --> 00:05:05,071 stand upon. 80 00:05:07,440 --> 00:05:08,708 The problems with data -- 81 00:05:08,775 --> 00:05:12,045 inconsistent results, inconsistent formats, 82 00:05:12,111 --> 00:05:15,348 inconsistent questions, sampling bias, 83 00:05:15,415 --> 00:05:18,818 under-representation, misrepresentation, 84 00:05:18,885 --> 00:05:22,789 interview quality, poorly validated questions, 85 00:05:22,855 --> 00:05:25,191 respondents not answering truthfully, 86 00:05:25,258 --> 00:05:29,362 changes in meaning, changes in coding strategies. 87 00:05:29,429 --> 00:05:32,832 It makes it difficult to have consistent data 88 00:05:32,899 --> 00:05:36,669 that you can utilize with your clinical research studies 89 00:05:36,736 --> 00:05:43,476 to make valid decisions, valid results, 90 00:05:43,543 --> 00:05:45,912 and have valid studies. 91 00:05:45,978 --> 00:05:51,784 And so common data elements help resolve many of these issues. 92 00:05:54,454 --> 00:05:55,888 So here's an example. 93 00:05:55,955 --> 00:05:58,624 This is actually a true letter to my father 94 00:05:59,459 --> 00:06:01,327 from somebody on the railroad. 95 00:06:01,394 --> 00:06:06,466 My father's name was Thom, no A-S, T-H-O-M, McKeeby. 96 00:06:06,532 --> 00:06:10,303 And the person wrote, "Please deliver this to Thom McKeeby. 97 00:06:10,369 --> 00:06:14,107 He drives a train. He lives near the glass factory. 98 00:06:14,173 --> 00:06:16,709 His brother-in-law is Doug." 99 00:06:17,343 --> 00:06:20,747 And it was able to be delivered, right. 
100 00:06:20,813 --> 00:06:23,416 The post office knew who Thom was -- 101 00:06:23,483 --> 00:06:26,552 small town -- was able to deliver it. 102 00:06:26,619 --> 00:06:30,857 But it's because that human was part of the process, 103 00:06:31,357 --> 00:06:34,660 and they had cognitive understanding 104 00:06:34,727 --> 00:06:37,663 of all the information that was here. 105 00:06:38,264 --> 00:06:41,334 They relied on human knowledge and cognition, 106 00:06:41,400 --> 00:06:44,203 standardization not required for the address. 107 00:06:45,138 --> 00:06:47,240 But it's hard to share and process. 108 00:06:47,306 --> 00:06:50,042 It's hard to share the knowledge. 109 00:06:50,109 --> 00:06:51,677 And so then we moved -- 110 00:06:51,744 --> 00:06:54,180 the post office moved to standard formats, 111 00:06:54,247 --> 00:06:58,017 right, where the street, you put the apartment, 112 00:06:58,084 --> 00:07:01,788 the city, the state, the zip code, standard values, 113 00:07:01,854 --> 00:07:05,591 format of the zip code, values for states, 114 00:07:06,259 --> 00:07:08,261 two-letter abbreviations. 115 00:07:08,327 --> 00:07:12,398 And that standardized it so we could share and process 116 00:07:12,465 --> 00:07:15,635 letters quickly, efficiently, effectively. 117 00:07:16,169 --> 00:07:21,340 And so that's an example of where common data elements 118 00:07:21,407 --> 00:07:25,311 and standards come into play to improve 119 00:07:26,679 --> 00:07:29,582 clinical studies and clinical research. 120 00:07:32,418 --> 00:07:36,355 And so some healthcare examples -- patient name format, 121 00:07:37,223 --> 00:07:40,993 prefix, first name, middle name, last name, suffix, 122 00:07:41,761 --> 00:07:43,896 comparing results across organizations, 123 00:07:43,963 --> 00:07:47,967 lab results, using coding methods such as LOINC, 124 00:07:49,135 --> 00:07:51,537 the name of the test, the reference range, 125 00:07:51,604 --> 00:07:55,007 the result values, the unit of measure. 
126 00:07:55,074 --> 00:07:57,243 An example of questionnaires with scales -- 127 00:07:57,310 --> 00:08:02,014 here's an example of a pain measurement scale, smiley face, 128 00:08:02,081 --> 00:08:06,853 light blue going to red, from no pain to worst pain. 129 00:08:06,919 --> 00:08:10,389 These are scales that are used and consistent 130 00:08:10,923 --> 00:08:12,692 and allow the sharing of data 131 00:08:12,758 --> 00:08:16,495 to be easy and efficient and effective. 132 00:08:18,698 --> 00:08:20,333 NLM and data standards. 133 00:08:20,399 --> 00:08:23,336 NLM is the central coordinating body 134 00:08:23,402 --> 00:08:25,504 for clinical terminology standards 135 00:08:25,571 --> 00:08:28,107 within the Department of Health and Human Services. 136 00:08:29,108 --> 00:08:31,277 NLM works closely with the Office 137 00:08:31,344 --> 00:08:34,647 of the National Coordinator for Health Information Technology 138 00:08:34,714 --> 00:08:38,284 to ensure NLM efforts are aligned 139 00:08:38,351 --> 00:08:42,154 with the goal of the president and the HHS secretary. 140 00:08:42,922 --> 00:08:46,926 It's to improve and ensure interoperability 141 00:08:46,993 --> 00:08:50,129 of health information technology infrastructure 142 00:08:50,196 --> 00:08:53,599 to improve the quality and efficiency of health care, 143 00:08:54,133 --> 00:08:57,503 allow the interoperability between systems, 144 00:08:57,570 --> 00:09:01,774 between organizations, for the sharing of clinical data. 145 00:09:03,409 --> 00:09:07,647 NLM HealthIT, which summarizes health information technology 146 00:09:07,713 --> 00:09:11,884 and health data standards at NLM, is also available. 147 00:09:11,951 --> 00:09:14,420 So that's a link that you can follow 148 00:09:14,487 --> 00:09:18,090 to see everything that's done for data standards at NLM. 149 00:09:19,525 --> 00:09:20,927 Medical terminologies. 
150 00:09:20,993 --> 00:09:25,298 Medical terminologies facilitate precise and clear communication 151 00:09:25,364 --> 00:09:29,201 by minimizing and eliminating ambiguity. 152 00:09:29,702 --> 00:09:34,507 By having a code, you can identify that value, 153 00:09:34,573 --> 00:09:38,678 that item, across different domains, 154 00:09:38,744 --> 00:09:40,446 different research studies. 155 00:09:41,247 --> 00:09:44,350 Some terminologies serve as a crosswalk 156 00:09:44,417 --> 00:09:47,620 or a mapping between multiple other terminologies. 157 00:09:48,154 --> 00:09:50,556 Key features of a medical terminology 158 00:09:50,623 --> 00:09:52,091 are a unique identifier, 159 00:09:52,158 --> 00:09:56,495 usually a code, an official name, and synonyms. 160 00:09:56,562 --> 00:10:01,067 So it allows you to identify the item and its values 161 00:10:01,133 --> 00:10:02,735 and what's possible, 162 00:10:03,369 --> 00:10:07,773 ensuring that you can utilize it across systems. 163 00:10:09,909 --> 00:10:13,245 And so here are examples of medical terminologies. 164 00:10:13,779 --> 00:10:16,782 Current Procedural Terminology, CPT, 165 00:10:16,849 --> 00:10:20,853 used to describe procedures performed, used for billing. 166 00:10:21,620 --> 00:10:25,491 International Classification of Diseases, ICD-10, 167 00:10:25,558 --> 00:10:27,793 sponsored by the World Health Organization -- 168 00:10:28,661 --> 00:10:35,368 main terminology used worldwide to identify diseases 169 00:10:35,434 --> 00:10:40,072 and classification of diseases across the world. 170 00:10:40,973 --> 00:10:46,612 LOINC -- Logical Observation Identifiers Names and Codes -- 171 00:10:47,146 --> 00:10:50,383 laboratory and clinical codes identifying tests, 172 00:10:50,449 --> 00:10:54,286 so you know that a sodium is a sodium across the world 173 00:10:54,353 --> 00:10:58,891 based on its LOINC code, and any other test. 
174 00:10:59,825 --> 00:11:03,229 Medical Subject Headings -- created and used by NLM, 175 00:11:03,295 --> 00:11:08,000 used to tag medical abstracts with concept-based information, 176 00:11:08,534 --> 00:11:12,605 allows authors and searchers to use different terminology 177 00:11:12,671 --> 00:11:15,174 for the same concept. 178 00:11:16,375 --> 00:11:21,247 National Drug Codes identify codes for the medications, 179 00:11:21,313 --> 00:11:23,883 so you know what product that you're utilizing. 180 00:11:24,450 --> 00:11:26,819 It's typically used for billing reimbursement, 181 00:11:26,886 --> 00:11:29,488 but it's also used to identify 182 00:11:29,555 --> 00:11:32,224 what medications are being used in a study. 183 00:11:34,293 --> 00:11:40,232 RxNorm -- produced by NLM and is included as part of UMLS. 184 00:11:40,299 --> 00:11:44,537 It's a standardized clinical drug terminology naming system, 185 00:11:45,037 --> 00:11:49,742 a curated collection of drug names, has synonyms, 186 00:11:49,809 --> 00:11:55,381 has a thesaurus to identify ingredients of a medication, 187 00:11:55,448 --> 00:11:57,149 components of a medication, 188 00:11:57,216 --> 00:11:59,051 but you can share that information. 189 00:12:00,586 --> 00:12:04,924 SNOMED-CT, created by SNOMED International -- 190 00:12:05,891 --> 00:12:08,828 it is managed by NLM, 191 00:12:09,528 --> 00:12:12,832 and it's used to identify diagnosis, 192 00:12:12,898 --> 00:12:18,871 concepts, different information to codify unstructured data 193 00:12:18,938 --> 00:12:20,539 to make value with it. 194 00:12:21,440 --> 00:12:25,377 And then Unified Medical Language System, UMLS, 195 00:12:25,444 --> 00:12:29,381 it's a collection of over 100 controlled vocabularies 196 00:12:29,448 --> 00:12:31,317 in biomedicine and health. 197 00:12:31,383 --> 00:12:34,353 It maps concepts between vocabularies 198 00:12:34,420 --> 00:12:36,989 and enables linking between systems. 
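The key features of a terminology described above (a unique identifier, an official name, and synonyms) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Concept` and `Terminology` names are hypothetical, and the `EX-` codes are placeholders, not real LOINC, SNOMED-CT, or UMLS identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    code: str                        # unique identifier (placeholder, not a real code)
    name: str                        # official name
    synonyms: tuple[str, ...] = ()   # alternative labels for the same concept


class Terminology:
    """Hypothetical in-memory terminology: maps any known label to its concept."""

    def __init__(self, concepts):
        self._by_label = {}
        for concept in concepts:
            for label in (concept.name, *concept.synonyms):
                self._by_label[label.strip().lower()] = concept

    def lookup(self, label):
        # Returns the matching Concept, or None when the label is unknown.
        return self._by_label.get(label.strip().lower())


TERMS = Terminology([
    Concept("EX-0001", "Sodium", ("Na", "serum sodium")),
    Concept("EX-0002", "Calcium", ("Ca",)),
])


def same_concept(label_a, label_b, terms=TERMS):
    """Two labels refer to the same item only if both resolve to the same code."""
    a, b = terms.lookup(label_a), terms.lookup(label_b)
    return a is not None and b is not None and a.code == b.code
```

With this sketch, `same_concept("Na", "sodium")` is true because both labels resolve to the same placeholder code, which is the sense in which "a sodium is a sodium" across systems once a shared code is in place.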
199 00:12:40,693 --> 00:12:45,464 And so the United States Core Data for Interoperability 200 00:12:47,233 --> 00:12:52,771 provides standards to identify components 201 00:12:52,838 --> 00:12:59,411 of data elements, data classes, to be able to share. 202 00:12:59,478 --> 00:13:04,183 So specific standardized elements 203 00:13:04,250 --> 00:13:06,752 include items such as patient name, 204 00:13:07,419 --> 00:13:11,924 sex assigned at birth, ethnicity, smoking status, 205 00:13:12,725 --> 00:13:15,094 date of birth, preferred language, 206 00:13:15,161 --> 00:13:17,763 race, lab values, procedures, 207 00:13:19,064 --> 00:13:21,534 medication allergies, immunization, 208 00:13:21,600 --> 00:13:24,170 vital signs, medications, and goals. 209 00:13:25,137 --> 00:13:27,973 It's both format as well as values, 210 00:13:28,474 --> 00:13:31,644 so sex being coded to female, 211 00:13:31,710 --> 00:13:35,681 male, and unknown, values for ethnicity, 212 00:13:35,748 --> 00:13:37,950 a format for date of birth. 213 00:13:38,017 --> 00:13:41,820 These are all part of interoperability standards 214 00:13:41,887 --> 00:13:44,423 to share information, again, 215 00:13:44,490 --> 00:13:48,427 put in some common data element component 216 00:13:48,494 --> 00:13:51,497 to make it easy to share information 217 00:13:52,398 --> 00:13:55,668 to use for analysis studies, 218 00:13:55,734 --> 00:13:59,505 but also to convey to other systems, 219 00:13:59,572 --> 00:14:01,407 to interface to other systems. 220 00:14:03,509 --> 00:14:10,382 And so here is the example of vocabulary for ethnicity group, 221 00:14:10,449 --> 00:14:14,553 so Hispanic or Latino, not Hispanic or Latino. 222 00:14:14,620 --> 00:14:16,689 There's a concept code to it, 223 00:14:16,755 --> 00:14:18,857 so then you can share this information. 
224 00:14:19,458 --> 00:14:21,827 And then you see the race category, 225 00:14:21,894 --> 00:14:28,067 so the main level of race categories: 226 00:14:29,034 --> 00:14:31,136 American Indian or Alaskan Native, 227 00:14:31,203 --> 00:14:33,839 Asian, Black or African American, 228 00:14:33,906 --> 00:14:37,943 Native Hawaiian or Other Pacific Islander, and then White. 229 00:14:38,477 --> 00:14:42,881 And then each one of these can be broken down to another level 230 00:14:42,948 --> 00:14:47,686 if you need to go to a deeper level of American Indian 231 00:14:47,753 --> 00:14:52,258 or Alaskan Native, sub-level of Asian, 232 00:14:53,025 --> 00:14:55,928 sub-level of Black or African American, 233 00:14:55,995 --> 00:14:58,063 a sub-level of Native Hawaiian 234 00:14:58,130 --> 00:15:01,233 or Other Pacific Islander, or sub-level of White. 235 00:15:01,300 --> 00:15:05,170 So each of them has a hierarchical structure. 236 00:15:06,872 --> 00:15:09,942 Health Level Seven, HL7, 237 00:15:10,476 --> 00:15:14,013 is that main interoperability standard 238 00:15:14,079 --> 00:15:18,083 used to share information between systems. 239 00:15:18,150 --> 00:15:21,587 So if you have a laboratory information system, 240 00:15:21,654 --> 00:15:25,391 an electronic health record, you want to provide the orders 241 00:15:25,457 --> 00:15:27,393 from the electronic health record 242 00:15:27,459 --> 00:15:29,962 to the laboratory information system 243 00:15:31,797 --> 00:15:33,832 utilizing HL7. 244 00:15:33,899 --> 00:15:37,536 It identifies the time and date of the message, 245 00:15:38,170 --> 00:15:41,774 the patient, any next of kin information, 246 00:15:41,840 --> 00:15:43,842 and information of orders, 247 00:15:43,909 --> 00:15:46,545 and then it can convey the results back 248 00:15:46,612 --> 00:15:48,213 to the electronic health record. 249 00:15:48,847 --> 00:15:52,584 And so this is a standard format created in 1987. 
250 00:15:53,852 --> 00:15:57,790 It is an ANSI-accredited standard, 251 00:15:57,856 --> 00:16:01,060 and again, it's focused on the exchange, integration, 252 00:16:01,126 --> 00:16:03,862 sharing, and retrieval of electronic health information. 253 00:16:07,499 --> 00:16:11,136 The FAIR guiding principles -- so findable, 254 00:16:11,203 --> 00:16:15,341 accessible, interoperable, reusable -- 255 00:16:16,075 --> 00:16:18,477 and there's a link for the FAIR principles. 256 00:16:19,078 --> 00:16:26,018 And so these are the guidelines to make data usable, beneficial, 257 00:16:27,119 --> 00:16:31,924 shareable, consistent -- ensures the integrity. 258 00:16:32,725 --> 00:16:36,528 And so findable, finding that there's sufficient metadata -- 259 00:16:36,595 --> 00:16:38,197 there's a unique code 260 00:16:38,263 --> 00:16:40,833 that identifies the value of the item. 261 00:16:41,533 --> 00:16:45,671 My example of a sodium is a sodium across systems 262 00:16:45,738 --> 00:16:47,506 based on the LOINC code. 263 00:16:47,573 --> 00:16:51,577 So the instrument that produces the sodium result 264 00:16:51,643 --> 00:16:55,447 will have a code that you can search 265 00:16:55,514 --> 00:16:59,351 and identifies that it's the sodium result. 266 00:17:00,719 --> 00:17:02,921 Accessible -- to be accessible, metadata, 267 00:17:03,856 --> 00:17:06,825 as well as the data itself, 268 00:17:06,892 --> 00:17:10,062 should be readable by humans and machines. 269 00:17:10,129 --> 00:17:13,098 It must reside in a trusted repository 270 00:17:13,165 --> 00:17:16,602 that you can access and utilize between systems, 271 00:17:17,236 --> 00:17:24,777 between humans, as well as for different items 272 00:17:24,843 --> 00:17:30,849 such as research as well as clinical purposes. 273 00:17:30,916 --> 00:17:34,887 Interoperability -- data must share a common structure. 
274 00:17:34,953 --> 00:17:38,724 It must use recognized, formal terminologies for description, 275 00:17:40,259 --> 00:17:44,062 so the format as well as the code for the item. 276 00:17:44,997 --> 00:17:46,999 And reusable -- data and collections 277 00:17:47,065 --> 00:17:50,836 must have clear usage licenses, clear provenance, 278 00:17:51,603 --> 00:17:54,873 meet relevant community standards for the domain. 279 00:17:57,643 --> 00:18:00,245 And so there's sub-categories for each. 280 00:18:00,312 --> 00:18:03,849 So findable -- metadata is assigned a globally unique 281 00:18:03,916 --> 00:18:05,517 and persistent identifier. 282 00:18:06,585 --> 00:18:09,488 Data are described with rich metadata. 283 00:18:10,355 --> 00:18:13,358 Metadata clearly and explicitly include 284 00:18:13,425 --> 00:18:15,494 the identifier of the data they describe. 285 00:18:16,628 --> 00:18:19,198 And the metadata is registered or indexed 286 00:18:19,264 --> 00:18:20,933 in a searchable resource. 287 00:18:22,301 --> 00:18:26,138 Accessible -- metadata is retrievable by identifier 288 00:18:26,205 --> 00:18:29,308 using a standardized communication protocol. 289 00:18:30,242 --> 00:18:34,780 The protocol is open, free, and universally implemented. 290 00:18:35,814 --> 00:18:39,151 The protocol allows for an authentication 291 00:18:39,218 --> 00:18:43,589 and authorization procedure, and the metadata is accessible 292 00:18:43,655 --> 00:18:46,525 even when the data are no longer available. 293 00:18:47,493 --> 00:18:49,294 So they have to be always available. 294 00:18:50,562 --> 00:18:55,033 Interoperability -- the metadata uses a formal, 295 00:18:55,100 --> 00:18:56,368 accessible, shared, 296 00:18:56,435 --> 00:18:59,104 and applicable language for knowledge representation. 297 00:18:59,972 --> 00:19:03,375 Utilizes vocabularies that follow FAIR principles. 
298 00:19:04,309 --> 00:19:07,946 The metadata include qualified references 299 00:19:08,013 --> 00:19:09,615 to other metadata. 300 00:19:10,582 --> 00:19:12,317 And then reusability -- 301 00:19:12,985 --> 00:19:15,954 the metadata are richly described 302 00:19:16,021 --> 00:19:20,025 with a plurality of accurate and relevant attributes. 303 00:19:21,226 --> 00:19:23,095 The metadata is released with a clear 304 00:19:23,161 --> 00:19:25,464 and accessible data usage license, 305 00:19:26,098 --> 00:19:29,201 so you know how you can use it, when it's available, 306 00:19:30,202 --> 00:19:33,572 what you need to identify, when you use it. 307 00:19:34,339 --> 00:19:37,142 The data is associated with a detailed provenance, 308 00:19:37,209 --> 00:19:40,012 so you know the value of the source 309 00:19:40,078 --> 00:19:44,283 and integrity of the metadata. 310 00:19:45,117 --> 00:19:48,720 And the metadata meets the main relevant community standards -- 311 00:19:48,787 --> 00:19:51,123 so for healthcare, HL7. 312 00:19:51,189 --> 00:19:54,159 For financial, there's other standards, 313 00:19:54,226 --> 00:19:57,529 so each domain might have its own standard. 314 00:20:01,233 --> 00:20:04,036 Challenges with the FAIR guiding principles -- 315 00:20:04,102 --> 00:20:06,204 not everybody wants to share data, 316 00:20:06,772 --> 00:20:09,508 and that's true with common data elements. 317 00:20:09,575 --> 00:20:12,611 Not everybody wants to share their research questions. 318 00:20:12,678 --> 00:20:15,847 Not everybody wants to make it available 319 00:20:15,914 --> 00:20:18,483 for other people to do the same research. 320 00:20:20,218 --> 00:20:22,821 And if you do not measure something the same way, 321 00:20:22,888 --> 00:20:25,390 you're not measuring the same thing. 
322 00:20:25,457 --> 00:20:27,826 So in the common data element world, 323 00:20:27,893 --> 00:20:32,164 if I want my study to be across multiple sites 324 00:20:32,230 --> 00:20:33,999 and my question is different, 325 00:20:34,066 --> 00:20:36,001 I'm not measuring the same thing. 326 00:20:36,535 --> 00:20:39,237 So that's true to FAIR guiding principles, 327 00:20:39,304 --> 00:20:41,573 and it's true to common data elements. 328 00:20:43,775 --> 00:20:46,545 So common data elements -- a standardized, 329 00:20:46,612 --> 00:20:48,981 precisely defined question 330 00:20:49,047 --> 00:20:52,985 paired with a set of specific allowable responses, 331 00:20:53,518 --> 00:20:56,154 used systematically across different sites, 332 00:20:56,221 --> 00:20:58,023 studies or clinical trials, 333 00:20:58,523 --> 00:21:00,659 ensure consistent data collection. 334 00:21:01,693 --> 00:21:05,697 CDEs can be grouped into sets to form question surveys, 335 00:21:05,764 --> 00:21:08,200 case report forms, or other instruments. 336 00:21:08,867 --> 00:21:15,140 They are defined unambiguously in both human and machine 337 00:21:15,974 --> 00:21:17,609 computable terms. 338 00:21:19,678 --> 00:21:21,213 The benefits -- reduced time 339 00:21:21,279 --> 00:21:23,982 and cost to develop data collection tools, 340 00:21:24,049 --> 00:21:28,654 because you have the questions. You have the elements. 341 00:21:28,720 --> 00:21:32,290 You have what the possible results can be. 342 00:21:32,357 --> 00:21:35,861 So again, it could be gender at birth, 343 00:21:35,927 --> 00:21:37,329 or it could be a lab test, 344 00:21:37,396 --> 00:21:40,165 or it could be a question on a questionnaire. 
345 00:21:41,233 --> 00:21:43,835 Standardized and consistent data collection, 346 00:21:44,536 --> 00:21:46,705 improved data quality and data sharing, 347 00:21:47,239 --> 00:21:49,875 meeting those FAIR standards that we talked about 348 00:21:50,442 --> 00:21:53,912 three slides ago allow for meta-analysis 349 00:21:53,979 --> 00:21:56,581 and comparison of results from different studies. 350 00:21:57,115 --> 00:22:02,320 If it's a lab result and you have a code and it's utilized, 351 00:22:02,387 --> 00:22:07,726 then you can compare that sodium or that calcium result, 352 00:22:07,793 --> 00:22:09,327 because you have the LOINC code 353 00:22:09,394 --> 00:22:12,597 and they are identified as being the same. 354 00:22:13,298 --> 00:22:16,802 If you have a question on a questionnaire and you -- 355 00:22:16,868 --> 00:22:19,304 it's coded, it has metadata, 356 00:22:19,871 --> 00:22:23,375 and you have content that the answers can be, 357 00:22:23,442 --> 00:22:26,978 you can then compare the results for that question. 358 00:22:27,646 --> 00:22:30,148 And so it allows that meta-analysis 359 00:22:30,215 --> 00:22:32,017 and comparison of results. 360 00:22:35,787 --> 00:22:39,357 Implementing CDE questions -- how is data released? 361 00:22:39,424 --> 00:22:40,659 How is data shared? 362 00:22:40,726 --> 00:22:44,463 How do you ensure the meaning of information across sites? 363 00:22:44,996 --> 00:22:48,633 How can you use external sources of data? 
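The definition of a common data element given earlier (a precisely defined question paired with a set of specific allowable responses) and the comparison of results across sites can be sketched as follows. This is a minimal Python sketch under stated assumptions: `CommonDataElement`, the `EX-CDE-001` identifier, the question wording, and the response values are all hypothetical and do not come from any NIH CDE repository.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    cde_id: str          # placeholder identifier, not a real repository ID
    question: str        # the standardized question text
    allowed: frozenset   # the set of allowable responses

    def validate(self, answer):
        # Reject anything outside the agreed response set.
        if answer not in self.allowed:
            raise ValueError(
                f"{answer!r} is not an allowable response for {self.cde_id}"
            )
        return answer


# Hypothetical CDE; the response values are illustrative only.
SMOKING = CommonDataElement(
    cde_id="EX-CDE-001",
    question="What is the participant's smoking status?",
    allowed=frozenset({"current", "former", "never", "unknown"}),
)


def pool(cde, *site_answers):
    """Validate each site's answers against the CDE, then count them together."""
    counts = Counter()
    for answers in site_answers:
        counts.update(cde.validate(a) for a in answers)
    return counts


site_a = ["never", "former", "never"]
site_b = ["current", "never"]
pooled = pool(SMOKING, site_a, site_b)
```

Because both sites answer the same question with the same response set, their counts can be added directly. A site that recorded a free-text value such as "sometimes" would fail validation, which reflects the earlier point that a different question or answer set means you are not measuring the same thing.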
364 00:22:52,671 --> 00:22:56,608 Implementing CDE -- the purpose, to identify discrete, 365 00:22:56,675 --> 00:22:58,877 defined items for data collection, 366 00:22:58,944 --> 00:23:02,013 to promote consistent data collection in the field, 367 00:23:02,681 --> 00:23:06,218 to eliminate unneeded or redundant data collection, 368 00:23:06,284 --> 00:23:09,721 to promote consistent reporting and analysis, 369 00:23:10,322 --> 00:23:12,524 to reduce the possibility of error 370 00:23:12,591 --> 00:23:15,460 related to data translation and transformation, 371 00:23:15,994 --> 00:23:18,563 and to facilitate data sharing. 372 00:23:27,472 --> 00:23:31,510 Here's examples of clinical research typical data elements. 373 00:23:31,576 --> 00:23:34,546 So from chapter 30 in the data management 374 00:23:34,613 --> 00:23:41,253 and clinical trials book, on-study, demographic data, 375 00:23:41,319 --> 00:23:45,590 eligibility criteria, family history, patient history, 376 00:23:46,258 --> 00:23:49,127 surgical history, medication treatments, 377 00:23:49,194 --> 00:23:53,131 laboratory results, radiology reports, 378 00:23:53,198 --> 00:23:56,468 current symptoms, adverse events, 379 00:23:56,535 --> 00:23:59,171 treatment response, disease status, 380 00:23:59,237 --> 00:24:02,073 protocol treatment, non-protocol treatment, 381 00:24:02,607 --> 00:24:04,910 toxicities, date of death, 382 00:24:04,976 --> 00:24:09,781 cause of death are typical clinical research data elements. 383 00:24:13,351 --> 00:24:15,987 The importance of common data elements 384 00:24:16,054 --> 00:24:20,826 also leads to the ability to complete data management plans. 385 00:24:20,892 --> 00:24:23,895 So during the design and development of a clinical trial, 386 00:24:23,962 --> 00:24:27,399 you need to establish the data elements to be collected, 387 00:24:27,465 --> 00:24:29,267 the design of the data collection, 388 00:24:29,334 --> 00:24:31,436 and the design of the computer database. 
389 00:24:32,003 --> 00:24:35,407 You're building your common data elements for your study, 390 00:24:35,941 --> 00:24:38,310 and you want to utilize common data elements 391 00:24:38,376 --> 00:24:41,813 that already exist to make the data management 392 00:24:41,880 --> 00:24:45,116 and sharing more effective and efficient. 393 00:24:46,518 --> 00:24:48,920 The process of data management should result 394 00:24:48,987 --> 00:24:54,059 in an error-free, valid, and statistically robust database. 395 00:24:54,125 --> 00:24:57,762 Data management plans are developed to document processes 396 00:24:57,829 --> 00:25:00,098 and procedures of how you're going to do that, 397 00:25:00,165 --> 00:25:02,133 how you're going to handle the data, 398 00:25:02,200 --> 00:25:04,936 how are -- for each of your trials, 399 00:25:06,404 --> 00:25:10,108 what databases you're going to use, what data entry, 400 00:25:10,175 --> 00:25:13,211 how you're tracking your quality, 401 00:25:13,278 --> 00:25:15,614 what quality control measures you have, 402 00:25:16,348 --> 00:25:20,719 how you're tracking adverse event reconciliation, 403 00:25:20,785 --> 00:25:23,455 how you're going to deal with discrepancy management, 404 00:25:24,155 --> 00:25:26,224 data transfer, extraction. 405 00:25:27,225 --> 00:25:30,295 All the database management guidelines 406 00:25:30,362 --> 00:25:32,364 will be in your data management plan. 
407 00:25:33,131 --> 00:25:37,535 The data management plan's comprised of multiple parts, 408 00:25:37,602 --> 00:25:38,837 and here's examples -- 409 00:25:38,904 --> 00:25:41,539 so description of all data sources, 410 00:25:42,274 --> 00:25:46,645 data and workflow diagrams, definitions of data elements, 411 00:25:47,279 --> 00:25:50,382 planned data model during data collection, 412 00:25:50,448 --> 00:25:53,151 any changes for specific data sharing -- 413 00:25:53,218 --> 00:25:57,122 if you're adding terminologies, adding codes, adding metadata -- 414 00:25:57,923 --> 00:26:01,993 observations and measurements, your data recording process, 415 00:26:02,060 --> 00:26:06,564 your data processing processes, your data integrations, 416 00:26:06,631 --> 00:26:08,867 your data quality control definitions 417 00:26:08,934 --> 00:26:13,338 and acceptance criteria, plans for business continuity, 418 00:26:13,405 --> 00:26:16,675 your security plan, your change management plan 419 00:26:16,741 --> 00:26:19,778 to anything that's part of your data management process, 420 00:26:20,345 --> 00:26:25,150 your data retention, archival, and disposal plan -- 421 00:26:25,216 --> 00:26:27,185 how long you're going to keep the data. 422 00:26:27,252 --> 00:26:28,887 What are you going to do with the data 423 00:26:28,954 --> 00:26:30,889 when you're ending your study? 424 00:26:31,623 --> 00:26:33,792 Do you have to keep it for a period of time? 425 00:26:33,858 --> 00:26:35,660 What is that period of time? 426 00:26:35,727 --> 00:26:37,295 And then your data sharing plan -- 427 00:26:37,362 --> 00:26:39,564 how are you going to share your data? 
428 00:26:41,700 --> 00:26:45,503 So principles of data sharing in clinical trials -- 429 00:26:46,338 --> 00:26:49,474 so making the data sharing a reality, 430 00:26:49,541 --> 00:26:51,576 consent for data sharing, 431 00:26:51,643 --> 00:26:53,845 protection of trial participants, 432 00:26:54,346 --> 00:26:56,548 data standards, rights, 433 00:26:56,614 --> 00:26:59,851 types, management of access, data management 434 00:26:59,918 --> 00:27:03,488 and repositories, discovery, and metadata -- 435 00:27:03,555 --> 00:27:06,257 all these are part of clinical trial data sharing. 436 00:27:06,858 --> 00:27:09,794 All these rely on common data elements, 437 00:27:10,428 --> 00:27:13,231 and so it's important that you identify 438 00:27:13,298 --> 00:27:16,234 your common data elements that you're going to utilize. 439 00:27:16,301 --> 00:27:20,372 And anything that you create, make it a common data element 440 00:27:20,438 --> 00:27:23,742 to be shared across trials by others. 441 00:27:26,778 --> 00:27:29,414 So coding and metadata in clinical trials -- 442 00:27:29,481 --> 00:27:30,648 data and coding standards 443 00:27:30,715 --> 00:27:33,918 should be built into any trial's data design 444 00:27:33,985 --> 00:27:35,987 at the beginning of the trial. 445 00:27:36,054 --> 00:27:38,056 Various data standards are available. 446 00:27:38,123 --> 00:27:40,658 Those from CDISC should be considered 447 00:27:40,725 --> 00:27:42,694 as offering the best starting point. 448 00:27:43,495 --> 00:27:45,663 Metadata schema suitable 449 00:27:45,730 --> 00:27:48,967 for describing all the repository data objects 450 00:27:49,034 --> 00:27:52,370 linked to the clinical trials should be developed and implemented, 451 00:27:53,004 --> 00:27:57,108 agreed by the stakeholders and repository managers 452 00:27:57,175 --> 00:27:58,743 and widely disseminated. 
453 00:27:59,577 --> 00:28:02,881 Mechanisms to sustain metadata repositories 454 00:28:02,947 --> 00:28:07,152 for portal search systems that connect them long-term, 455 00:28:07,218 --> 00:28:10,555 so they can be utilized for data sharing 456 00:28:10,622 --> 00:28:12,657 across clinical trials.