Transcript of the September 21, 2012 NCVHS Full Committee Meeting

[This Transcript is Unedited]

DEPARTMENT OF HEALTH AND HUMAN SERVICES

THE NATIONAL COMMITTEE ON VITAL AND HEALTH STATISTICS

September 21, 2012

Hubert H. Humphrey Building
200 Independence Ave., SW
Washington, D.C.

Proceedings by:
CASET Associates, Ltd.
Fairfax, Virginia 22030
(703) 266-8402

Call to Order, Review Agenda - Dr. Justine Carr
Standards Administrative Simplification Letter - ACTION Outline of HIPAA Report - Dr. Walter Suarez
Privacy Community Health Data Report - Action - Ms. Linda Kloss
NCVHS - Summary Steps and Future Directions - Dr. Larry Green
De-identification methods for Open Health data - Jonathan Gluck and Khaled El Aman

P R O C E E D I N G S (10:08 a.m.)

Agenda Item: Call to Order, Review Agenda

DR. CARR: Welcome to day two of the National Committee on Vital and Health Statistics. I am Justine Carr, Lame Duck Chair of the Committee, Steward Health Care, alive and well, and no conflicts.

DR. TANG: Paul Tang, Palo Alto Medical Foundation, member of the committee, no conflicts.

DR. FRANCIS: Leslie Francis, University of Utah, member of the committee, no conflicts.

MR. QUINN: Matt Quinn, NIST, staff of the Quality Subcommittee.

MS. KLOSS: Linda Kloss, member of the committee, no conflicts.

MR. BURKE: Jack Burke, Harvard Pilgrim Health Care, member of the committee, no conflicts.

DR. SCANLON: Bill Scanlon, National Health Policy Forum, member of the committee, no conflicts.

DR. COHEN: Bruce Cohen, Massachusetts Department of Public Health, member of the committee, no conflicts.

DR. CHANDERRAJ: Raj Chanderraj, private cardiologist in Las Vegas, member of the committee, no conflicts.

DR. SUAREZ: Walter Suarez with Kaiser Permanente, member of the committee, no conflicts.

DR. FITZMAURICE: Michael Fitzmaurice, Agency for Healthcare Research and Quality, liaison to the full committee, staff to the Subcommittee on Quality and the Subcommittee on Standards.

DR. WALKER: Jim Walker, Geisinger Health Systems, no conflicts.

DR. WARREN: Judy Warren, University of Kansas School of Nursing, member of the committee, no conflicts.

MR. SOONTHORNSIMA: Ob Soonthornsima, Blue Cross Blue Shield Louisiana, member of the committee, no conflicts.

MS. MILAM: Sally Milam, West Virginia Health Care Authority, member of the committee, no conflicts.

DR. GREEN: Larry Green, University of Colorado, member of the committee, no conflicts.

MS. GREENBERG: Marjorie Greenberg, National Center for Health Statistics, CDC, and Executive Secretary to the committee.

MR. SCANLON: Jim Scanlon, Deputy Assistant Secretary for Planning at HHS, and executive director of the full committee.

DR. MAYS: Vickie Mays, University of California, Los Angeles, member of the full committee, no conflicts.

(Introductions of staff and guests)

DR. CARR: Jim, I would like to turn it over to you. You have an announcement.

MR. SCANLON: Let me give the committee an update on a couple of things. About eight weeks ago, we made a set of recommendations for reappointments and new members for the full committee. I am pleased to say, late yesterday, it was at the end of the day, so we couldn't really announce anything during the meeting, the Secretary approved all of our recommendations. What I can announce today are the reappointments. For new members, we were asked to wait until we heard from the formal confirmation for the new members. We should be getting four members who will fit in very nicely with the committee, and will bring us up to the full complement of 18 members.

Number one, I am pleased to say that Sally and Walter have been reappointed for a 15-year term, 4-year times. They negotiated a 7 percent increase in pay, but we had to take away their health insurance. Then, let's see, we have Justine, our esteemed chair for quite a while now, a member of the committee. She has served two terms and so has reached the limit, we can't renew her. She will be leaving the committee, but staying on as the chair of our working group on data access and use.

The Secretary has asked Larry to serve as the chair. In a moment of weakness, Larry agreed. I think what we can do is, as of the end of the day today, probably Justine can turn over the gavel to Larry, and Larry will be chair for the next meeting subsequently.

MS. GREENBERG: I just wanted to say that we have four new members that we have to wait until they accept their appointments. We, of course, hope to bring them in, they will have done that and they will have completed their paperwork, so we can bring them in for the November meeting. We would also be prepared to bring in, to make the transition complete, those, I guess it is just the two of you right now, Justine and Judy, for that meeting, as well.

Now, you would be coming in any event because of the working group. You wouldn't be voting members at that point, assuming that they are voting members. Sometimes, they haven't gotten all of their paperwork in, so don't think that those dates are free now on your calendar. On the other hand, I don't want to put undue pressure on you, but I just wanted to clarify that.

DR. CARR: Actually, I want to just take a moment to share the things. Everything I needed to know, I learned at NCVHS as a member, I want to share things. We'll have to pass this along to new members, but it is relevant to current members, all members. Just a couple of things, and then a couple of things for Larry as the incoming chair.

As a member, I urge you to ask dumb questions, to make grids to understand complex issues, very helpful, express your thoughts once clearly, less is more. Turn off your mic if you are prone to sidebar comments. Read transcripts, your wisecracks may not be as good as you thought. Don't multitask, you'll be sorry what you missed. Bring snacks to meetings, no public dollars can be spent on food. Go to all the dinners and kick back.

Linda, this will be for you, bring a calculator and do the math for Marjorie at the end of the day. Now, for Larry, start on time, end on time, set timely deadlines and meet them. Ask for PowerPoints before developing a letter. Keep track of themes. When concepts are muddy, go around the room.

When discord emerges, embrace it as an opportunity to learn. Appreciate and recognize colleagues. Edit letters for one voice. Don't sign anything you don't understand. For the members and the chair, recognize the honor bestowed on you and live up to it.

One other request, can we do a class photo? It would be fun.

(Pause for class photo)

DR. CARR: Okay, we are ready for the standards, Walter and Judy.

Agenda Item: Standards Administrative Simplification Letter -- ACTION Outline of HIPAA Report

DR. SUAREZ: Thank you, it will be a tough act to follow. In the spirit of the theme today, and to reflect on our leader, this letter truly reflects some of the things that you just said, Justine, we hope. When you sign it, we hope that everything will be clear. More importantly, I think it always comes back to all of the things that you always told us to think about. Why are we doing this, who is being affected by it, and why is this good? I think those four questions that you always ask us to follow are truly reflective in the work that was done in this letter.

We presented this letter yesterday to the committee and we worked through the comments. Thank you again to everyone who provided us with those comments yesterday. We came back with this final draft. Yesterday, the subcommittee reviewed it and edited it, and is submitting it for your consideration.

The first change, I think, was down in the introductory part. There are a couple of places. Yes, on the ICD10, I think we added a couple of points. One was this point about ensuring that, through this transition and through this adoption of ICD10, we minimize disruptions in the business of delivering care.

We heard testimony, and even specific examples of this one entity that has gone out of business basically. More importantly, the fact that we need to be mindful, as we think of this transition as the Secretary and HHS, through the recommendations, begin to develop this strategy to minimize disruption to business of delivering care. We inserted that as one statement.

We also inserted another bullet below this, which is basically the last bullet, that highlights the importance, and we also heard this during the testimony, promoting the establishment of testing areas and test methods, including sample data, and to allow for some innovative opportunities for testing during the transition period, so we wanted to insert that.

All of this to ensure a couple of things, clinical consistency, so that when we move from nine to ten, whatever we express in ten is clinically consistent with what we were expressing before in ICD9. Financial neutrality, so a very important point about not seeing any deviation from what we were doing before with ICD9. Those two elements were added to the section here.

Those were the only comments in the first part of the letter, although the comments were in the recommendations, so we will go down to the recommendations. I think the first comment on the recommendation was on recommendation three, I believe. I think we did actually, yes. That's right, we added, as the clarification point on the dental codes to address some concerns that the way this was phrased might suggest some direct relationship or oversight from NCHS to ADA. We added, it was suggested, by testifiers, that NCVHS look into the maintenance process of all HIPAA-named standard code sets. It wasn't just to look at one or to look at a specific situation, but to look at all of them.

In addition to that, I think now we get into the recommendation side. The first change was really in the recommendation in the testing. Testing, we mentioned yesterday, was one of the most consistent and important themes. We had originally one recommendation. It was actually a long paragraph, and so we broke the same paragraph into the three recommendations. The first one did not change.

The second one, we changed the word industry, working session to ensure that it is different from the other listening session that we are recommending in the letter, that CMS convene. I think we added the Recommendation 3C, which was specific to ICD10, and it was highlighting a couple of points. Again, the point about establishing the testing areas and test methods, and then the expectation that NCVHS would look forward to receiving reports on the status of ICD10 being tested on a regular basis.

DR. COHEN: We had talked about adding the word, quickly, or expeditiously or something.

DR. SUAREZ: Yes, putting some prioritization, I think that was another point that I don't think we capture here. We should.

DR. COHEN: CMS should, I don't know whether quickly promote or expeditiously promote the establishment. There needs to be more of a sense of urgency for this to happen.

DR. SUAREZ: Expeditiously?

DR. COHEN: I would really like to see that.

DR. SUAREZ: Timely, expeditiously, pick one. That was the main addition to this recommendation. Any comments or additional questions on this? Okay, then the next one in the education and outreach, I think we also added the importance of two things, targeting the safety net providers and small entities with limited resources in this outreach effort. The point, I think, that was mentioned, and I think that was it basically. That was the concept of we need CMS to certainly prioritize, given the constraints and resources, so it should target safety net providers and small providers and entities in this outreach and education.

DR. FITZMAURICE: Walter, would you want to consider outreach should be targeted to help and educate safety net providers?

DR. SUAREZ: Target to?

DR. FITZMAURICE: Just to help safety net providers.

DR. SUAREZ: Help safety, sure, health safety net providers and small health care. Oh, help, not health.

DR. WARREN: Isn't that redundant? We are talking already about education.

DR. CARR: Do I have a motion?

DR. SUAREZ: We still have, I think, a couple of more. The only other one was recommendation seven, we added the word sample, so both conduct adequate sample of compliance audits, so that it did not give the impression that it was compliance audits across the board, to everyone, but the sample of compliance audits, which is something that CMS is --

DR. FITZMAURICE: To conduct an adequate sample?

DR. SUAREZ: Yes.

DR. FITZMAURICE: Thank you.

DR. SUAREZ: Those were the main changes to the letter.

DR. FITZMAURICE: I would take out – and put its.

DR. SUAREZ: Thank you.

DR. CARR: That is now where you want to go.

MS. GREENBERG: I assure you, after we finalize these letters, I read through all of them but I always appreciate an extra set of eyes.

DR. SUAREZ: Those were the changes. The word industry adoption, or the term industry adoption, changed. We had before industry absorption, we changed absorption to adoption.

DR. CARR: Do I have a motion to approve?

DR. CHANDERRAJ: Move.

DR. COHEN: Second.

DR. CARR: All in favor?

("Aye")

Any opposed? None. Any abstentions? None. Well done.

(Motion approved.)

(Applause)

DR. SUAREZ: We want to next probably just bring up the additional point about the HIPAA report. As you all know, every year, we had been providing a HIPAA report to Congress. Actually, it hasn't happened really every year since HIPAA was passed, because we are in the 17th year. Last year we submitted our 10th report to Congress. We don't do it every year, and the subcommittee had a discussion about the timing and the appropriateness of preparing a HIPAA report to Congress this year, in light of a couple of things.

The first one was our very extensive detailed report was delivered late last year, basically almost early this year, to Congress, the 10th HIPAA report to Congress, in which we had a number of things identified, discussed, and proposed to be worked on into the future.

The second major consideration was the fact that there has been a number of changes done in the administrative simplification activities, including the adoption of the new version of standards, starting this year in January of 2012, the adoption of operating rules, which will start in January of 2013, the delay of ICD-10, the adoption of a health plan ID.

All of these are new things that are going to be changing and shifting the way we continue to attempt to improve administrative processes. The subcommittee really felt that it was early to provide any type of report about those changes, because in reality, some of them just started to happen, and some of them are to happen in the next few months.

The decision, and that is the recommendation of the subcommittee, is to not prepare and submit a report this year to Congress, but actually wait until next year, when we would have the experience of at least a year and a half or more of implementation of the new standards, the new version of the new standards, as well as the implementation of some of the new operating rules and some of the other processes.

We also, in the letter that we just approved, recommend that CMS begin to consider putting some resources into assessing, into making a more deliberate and complete assessment of the implementation of this standard. All of the regulations really related to administrative simplification. In light of that, we are recommending that we not report this year, or not prepare a report to Congress this year, but wait until next year.

I will turn it back to you, Justine, to see if there's any.

MS. JACKSON: Just in terms of dissemination, Linda's suggestion about making sure that our documents get out right to the people who want them. Every time this report goes out, as you can see, I've got a note that this is a final copy. People really enjoy this report, so I'm kind of happy that we have another year of being able to disseminate this.

As you go to your meetings that can use it, please just send me an email. Let me know how many copies, we do have plenty. I am sure when the next report comes out, we will have the excitement of an Apple 5. We want to make sure that this gets out and uses this time for this year while we can.

DR. CARR: I think the work that was done last year was tremendous, and I think that when we do these reports, they deserve that amount of attention. It's too soon to do it again, so I think I agree, accept, endorse your recommendation.

I have to say Walter, you are an extraordinary powerhouse. You, and actually Judy and Ob, what you have accomplished, since I've been here, is nothing short of extraordinary. I thank you, on behalf of the committee, and also for myself, because you do such great thorough work. Thank you very much.

We will now move to the Privacy Community Health Data Report. I think the way that would be a good way to tee this up is to say, what were the takeaways from yesterday and the actions that you did, and then we can go into the detail.

Agenda Item: Privacy Community Health Data Report -- ACTION

MS. KLOSS: We gave you both a markup, as well as a clean copy, because it's hard to encapsulate what we did to it. We did a lot to it. We did combine principle one and three, so we reduced the number of principles from ten to nine. We took out musts, and made them shoulds.

We addressed all of the other points we believed that came out in the discussion yesterday. I think we might do best by going through this, paragraph by paragraph, but not worrying about wordsmithing or editing or typos. We didn't have the luxury of time to do a final polish on any of that. We just wanted to make sure that we've captured the issues that were most compelling.

DR. FRANCIS: We have got the clean copy up, but if you want to see what got changed, you have a markup copy.

MS. KLOSS: On the first page, we changed a must to should, you will see in the second paragraph. We also added reference to the original Code of Fair Information Practice, produced in 1973, to underscore the historical nature of the evolution of these principles. Maya suggested that and added that, and I think that does strengthen.

DR. FITZMAURICE: Could you give an example of maybe one way the communities are using this digital data? I am interested in, how did they get it, as well as what are they using it for?

MS. KLOSS: Well, the reference in communities today are using digital data to tackle important health issues, is just a general statement, that then is supported on the next page by specific reference to the committee's report on community health data initiatives.

DR. FITZMAURICE: I am saying are they looking at registry for say, influenza vaccines, and then sending out the police to knock on doors saying, you didn't get your vaccine.

MS. KLOSS: Are you suggesting we should put an example right here?

DR. FITZMAURICE: No, I am just trying to understand it. I am not suggesting any change, I am just trying to understand.

DR. CARR: You are right, it is a bold statement to say, digital data to tackle important issues in ways never imagined.

DR. FITZMAURICE: I'm trying to imagine that they get the data and use it without violating HIPAA.

DR. FRANCIS: We can take digital out, if that would help. Basically, that was meant to be a summary that came from the community health data report.

DR. FITZMAURICE: I don't mind digital, just what do they do with the data and how do they get it?

DR. CARR: You are right, it is the opening kind of introduction, so look forward to achieving the promise of that sentence.

MS. KLOSS: We could footnote the report right here, if that would help. We also looked at ways to flip from anything that seemed like it was on track to be regulatory, to more of a positive. That really, at the top of page two, topic of the letter of stewardship framework, enabling communities to use data to improve health in a manner that fosters trust. Again, positioning these statements in a positive way.

DR. CARR: Just in the second paragraph, these developments should be encouraged. Yet, they should be coupled. I am just wondering whether we want to set it up. These developments should be cultivated and guided by data stewardship. I also don't know what the difference between appropriate data stewardship and data stewardship, or data stewardship practice is or data stewardship.

I think we just need to decide, data stewardship is a frame of mind that is carried out in some practices. I would take out appropriate and I would take out practices, and I would make it and guided by, or perhaps guided by building the trust through data stewardship or something like that.

DR. FRANCIS: When I tucked in appropriate there, my reasoning was that we didn't want to suggest that there was only a specific set. That was the reason for that extra word, but it is fine to take it out.

DR. SCANLON: I like appropriate. I mean, there is the risk if somebody claims that they are doing data stewardship and they are not doing the right thing.

DR. CARR: I would say it is appropriate data practices then. I think stewardship, but we will see who is signing this letter.

DR. FRANCIS: We have got an appropriate later on, so we could just take it out. Go ahead and take it out.

MS. KLOSS: The sentence will read, these developments should be cultivated and guided by data stewardship practices. I am missing the appropriate in there, because I mean the custodianship of data doesn't mean it is good custodianship. That was the point, I think, right?

MS. BERNSTEIN: Did you want to take out the word encouraged, or just say encouraged and guided by? You didn't want to remove the word encouraged, did you?

DR. CARR: I would say it cultivated, I would even say, and guided by data stewardship.

DR. FRANCIS: Then, appropriately comes in at the end of that sentence, because we combine two sentences, yes.

MS. KLOSS: Okay, page two. This is where we inserted the definition of community. There's two new sentences there.

DR. MAYS: I want to go back before you start the community definition. The very last part of the sentence at the top of the page on two, yes, right there. The topic of the letter is stewardship frame, enabling communities to use data to improve health in a manner that fosters trust. I am not as comfortable with in a manner that fosters trust, because I don't think we provide enough of a case for that. Instead, enabling communities to use data to improve the health of their communities, I would suggest is a friendly amendment.

MS. KLOSS: I think we were trying to very directly link stewardship to fostering trust, and so that was the point.

MR. SOONTHORNSIMA: Trust is the operative word here, I think.

MS. GREENBERG: The way we modified that sentence, still has the kind of negative aspect by the if. If you change the if, I think it is these developments should be cultivated and guided by stewardship practices, so that, I think I would say. Then, since you mentioned trust there, do you need to mention it again?

DR. CARR: Where you go down to data stewardship, comma, if individuals and communities are to trust. I would suggest that we say rather these developments should be cultivated and guided by data stewardship, building the trust of individuals and communities.

DR. FRANCIS: How about changing if to so? So individuals and communities can trust. The reason for making sure trust is there is that that is what was called out in the CHIP report as the next step.

DR. CARR: I think the subtle things we are saying is that we don't want to say we have an emergency, it is broken, we have got to jump in. We want to say, we have a continuum of practice that is building. As we grow in our use of it, we grow in our understanding.

DR. COHEN: I don't know that we need may. Why don't we just say trust, so that individuals and communities trust that their information.

MS. KLOSS: Then, the lead-in to the sentence you're questioning, Vickie, the bottom of that page, these frameworks however have not been developed to attend to the topic of this letter, of stewardship framework enabling communities to use data to improve health in a manner that fosters trust.

DR. FRANCIS: The reason for that sentence was to indicate that although there are a lot of data stewardship frameworks out there, they are not directed to this circumstance.

DR. MAYS: Now, I understand the trust stuff. I am just feeling it over promises what is to come. What is to come is focused less on the process about trust, and more about the process of the use of data. It is just a little bit switching it in some way. That was why I kind of dropped it at the end, but let me see if I can edit it differently. I think that it is on the data and the community, and not the process of trust.

DR. TANG: Isn't the sentence building on the Fair Information Practices, and what we are saying then is that the framework exists. We are building on the framework in order to accommodate the broader sharing or dissemination of data. That is the uncovered thing that wasn't present in 1974. I think maybe what we are trying to express is we are building on instead of that wasn't covered.

DR. FRANCIS: Let's change that sentence to use your language, to say something like this letter.

DR. TANG: This letter builds on these frameworks to account for the broader use and sharing of data. I mean, that's what community creates. What it builds on previous frameworks to accommodate the wider use and dissemination of data.

MS. KLOSS: Well, we might want to reference the specific use of communities to use data, to improve health.

DR. FRANCIS: To encompass rather than to accommodate?

MS. KLOSS: The middle paragraph on page two inserts the definition from the report, many different types of communities today are using data to improve health. The report described some of these efforts. The 2011 report defined a community as an interdependent group of people who share a set of characteristics and are joined over time by a sense that what happens to one member affects many or all of the others.

Then, we go on to say, and reinforce that communities are diverse, although stewardship matters for all, application of stewardship principles may differ, depending on community characteristics.

DR. CARR: I don't understand what that means.

DR. TANG: The stewardship principles are uniform and universal, because communities are diverse and they have diverse uses and applications, we need to apply the principles somewhat differently.

MS. KLOSS: Principles are not processes.

MS. GREENBERG: An example would be, we heard from a number of Indian tribes. In their case, they have a very structured process of approving anything, any research or data that is collected, a whole governance structure because of their tribal relationships and their position. They are going to have a different, there's going to be a group you have to go to, et cetera, to manifest some of these principles, whereas other communities wouldn't have these structures already set up.

DR. CARR: Is it application of the principles or implementation of the principles. Because I think if we believe these are the principles, we apply them, how we implement them.

MS. KLOSS: This says application.

DR. CARR: I'm saying, I don't think that's the right word. I think implementing, in other words, this makes me think that you have 10, I can choose six.

DR. TANG: Okay, so it is like Marjorie said. We said that we need to consult with the community. Well, when you implement that in the Indian tribes, that means you go to the tribal council.

MS. GREENBERG: I think we all agree, but Justine is suggesting that implementation is what you are talking about.

MS. BERNSTEIN: Implementation strikes me as something you do with an actual rule or a procedure.

MS. KLOSS: I think we were trying to be a little more general, and reference how they are applied, rather than getting into specific practices. I think that is why we use application.

MS. GREENBERG: Maybe you could say, although the principles, not just stewardship, but although the principles matter for all, their application may differ, depending on community characteristics. Make it clear that the principles are relevant to all.

DR. WALKER: I have what I hope is a substantive question. This document has in view the governance of data stewards, correct, not the governance of communities.

DR. TANG: Correct.

DR. WALKER: I think that could be clearer, partly because in some instances, the community is the data steward. In other instances, it is definitely not. I think there are places here where it's not clear whether we are governing the community.

Then, the question that comes out of that, so say there is a data steward that serves 12 communities. Which communities' characteristics does it match its data stewardship to, if they are different? I don't think we have addressed. When we say it should be matched to the community, I get the point. In practice, that will be a little weird, in some cases, I would guess in many cases.

DR. FRANCIS: One way to deal with this, because actually I think we can envision that, although stewardship matters, there may be cases in which a particular stewardship principle isn't applicable at all. We wanted to keep this very general, so I understand you.

DR. WALKER: What if there is one data steward, there are 12 communities, this principle is appropriate to this community, but not to that community. Data steward has two different stewardship responses?

DR. FRANCIS: They might have different responsibilities to different communities. Suppose what we do is just say, application of stewardship principles may differ, and then drop the depending on community characteristics, because they might differ, depending on a number. That is why I was trying to address that.

DR. TANG: Might we be emphasizing the wrong thing? We want to emphasize actually the universality of the principles across diverse communities. I think it is a bit of an exercise to the reader, in terms of how do you do it with the consortium that you happen to be dealing with. Our main point is that we all have the principles universally apply, despite the diversity of the communities.

DR. COHEN: I agree both with Jim and Paul. I think that universally apply, but it's not a function of the characteristics of the community. It is a function of the characteristics of the data holder, because stewardship principles are very different for a local government than they are for an NGO or a community coalition, or a provider in a community.

MS. BERNSTEIN: That is true, but I think if you look at all of the principles that doesn't universally apply, so for example, one of the principles is transparency and notice essentially. It doesn't really have to do with who is holding the data, but who you are communicating with, how you are going to go about giving that notice.

It matters more about the characteristics of the community, its size, the way that you communicate with people effectively, and not on the fact that I am holding the data, I am going to communicate with everybody exactly the same way. Depending on which principle you are talking about, that might be right or it might not be. Maybe it is more than just community characteristics, but it is not only the characteristics of the steward.

DR. SCANLON: I think what you are saying is there is a set of principles, and they are universal. They need to be adhered to and their application varies, all kind of circumstances, data holder, community, some other external factor, et cetera. The principle, transparency is the principle. You just stick with a set of principles.

DR. CARR: This gets back to my PowerPoint recommendation, we are at the end of this thing, we are proposing these 10 things. We are asking the Secretary to accept that these are the 10 things, and we are asking the Secretary to do research to further understand these things.

If I could, what is it, this is a letter to the Secretary, and we are asking what? We have recommendations, but we are talking about a lot of things in the beginning that don't follow through in the recommendations. I just want to take and tie what we are recommending.

DR. COHEN: Are we recommending, should this last sentence be, this letter is to help HHS facilitate the stewardship of community health information?

MS. KLOSS: Yes.

DR. COHEN: So should that be the last statement, this letter is to help HHS?

MR. SCANLON: Isn't this guidance to communities as well?

PARTICIPANT: Yes.

MR. SCANLON: We don't necessarily want to do something to them.

DR. WALKER: I think that is a good point, and I would say there are two parts of one letter, or two letters. One is what are the responsibilities of data stewards, whoever they are. Then, another would be quite a different thing, as sort of what are the things a community should look for, almost a consumer's guide to data stewardship, that is the flip side of that.

Then, I think the two need to be very clearly distinguished. I think you are right, we need both. Both need to be provided, but if they are not clearly distinguished, it gets unclear who is being governed, who is doing the governing, whose responsibilities are.

MS. BERNSTEIN: Governing is too strong a word, right? We are not governing anyone.

DR. WALKER: We use it a great deal in the letter.

DR. FRANCIS: Could we take out facilitate the stewardship? What we are trying to do is lay out an affirmative framework, that we hope will facilitate, that will help HHS. Letters have different functions. Some letters say, Secretary, do X. Other letters can say, Secretary, here is an understanding of the territory, which we see some specific things that you might do. The most important thing is to understand that there is a territory here, and that is helpful. That is our goal in this letter.

DR. MAYS: Can I just suggest that rather than health information, it is usually health data. The focus is really on the data. Information is a little too broad, I think.

DR. FRANCIS: I would also take out local, because I think that is too narrowing, too. Where it says, although stewardship principles are universal, application of stewardship principles may vary. The communities whose efforts, because that is the point you want to make.

MS. BERNSTEIN: I am having trouble with the switch from information to data for the following reason. I think of data as the raw material that we collect and manage, and that information is what we discern from it and what is reported out after. This is supposed to deal with the whole life cycle of what we do, what we collect, how we use data, how we manage it, and eventually, how it gets disseminated.

At the point of dissemination, it is not data anymore. In some cases, it is really information. It is results and analysis and so forth.

DR. CARR: I need to interrupt for a second. We have got some parallel processes going on here. This letter, clearly, you have done a tremendous amount of work. We are seeing it for the first time, and new themes are emerging.

One question is, is it realistic to finish this in the allotted time. The second is that Ed Sondik is strongly encouraging us to do a field trip down to the NHANES trailers. We spend a lot of our time talking about surveys and data collection, and none of us, most of us have never seen it. The proposal on the table right now is to defer, and to give the members an opportunity to read this and set up a call, or some input onto this letter, and use the remaining time to tour the NHANES trailers.

I am just saying this is what was asked. I am going to ask for a vote of hands as to what we can do. This is the only time that we can do it, so go ahead, Leslie.

DR. FRANCIS: All I wanted to say is that we have gone through the hard part. There is another probably five or 10 minutes of looking at it. We really have done the hard part. This is where all of the changes were. I, for one, would feel very sad if we've put all this amount of work and we are so close, and to have what feels like --

MS. GREENBERG: Obviously the invitation to tour the MEC has to take a backseat to the work of the committee. My concern is, is that there are a lot, maybe we haven't, you know, gone through the hard part.

I think you guys have done a lot of work, but I look through this marked up, and there's just a lot of changes. It is very hard to personally, I feel, I mean, maybe the committee is more agile than I am, there have been so many more comments now, just on these first few pages, and I don't even think we have reached consensus. I am not even sure we have reached consensus on the purpose of the letter.

I think the letter is very important. I think it is important for the committee to issue it. I don't know that it is critical that it be issued at the end of this meeting. I particularly don't want the committee to be doing this under such a time pressure, that then you have to go back and sort of revisit it. That is my concern, but I defer obviously to the members.

MS. KLOSS: May I suggest though that even if we aren't ready to approve it at this meeting, having everybody here wrestling with these issues is invaluable. We will lose that.

MS. GREENBERG: I concur with that.

MS. KLOSS: Then, who will be back here in November, because we have got some basic concepts.

MS. GREENBERG: We may not be ready to take a vote.

MS. KLOSS: Let us walk you through what changes, so that you can read it.

DR. WALKER: I think it is unfair to the subcommittee as far as to not give them our full attention. I think it would be a mistake to send them off into a vacuum, and to work some more and have this happen again next time. I think we ought to take what time we need and get this done.

MS. KLOSS: Definition.

DR. WARREN: Are we still on this data information question that Maya was on?

MS. BERNSTEIN: I said we will drop it.

MS. KLOSS: The takeaway here for me was that we need a little more discussion about this last sentence. This letter sets out a framework to do what? I mean, we really need to be crystal clear here, up front. As Leslie said, our goal was laying out the case why more work needs to be done on this, not laying out a definitive set of principles ready to be implemented. We hope that has come through.

DR. FRANCIS: If anyone thinks that the purpose of facilitating and supporting effective stewardship is not what should be the focus, it would be great to tell us that.

DR. WALKER: We might say this letter sets out principles for stewardship of health data, because I think we all agree that is what it does.

DR. FRANCIS: It should be a framework in principles, not a framework in recommendations. That is great.

DR. WALKER: Is it a framework?

DR. MAYS: You had said something about two letters. I was just curious.

DR. WALKER: I am just following on what Jim said, that there are two communication tasks. One is to say to data stewards, this is the universal set of principles that apply to data stewards. There is another communication task that would say to communities, highly resourced and not very resourced at all, this is what you can expect when someone holds your information, so that it would be the same thing, just the obverse or whatever you call that.

If we want this ecology to really work, part of what we need to do is enable communities, to make sure that we know what these principles are, and have a way to assess whether they are being followed in their case. The baby blood spots is a perfect example, where the community found out belatedly that those principles weren't being observed. How do we help just make sure we do the other side of it? Often we kind of forget the last mile.

MS. KLOSS: The early thinking was that the next step from this might be a revision of a stewardship primer, directed at the community. I think you are right. Are we ready to move to page three?

Here, the major change was to delete references to HIPAA, and generalize that into just a discussion of current structures for data protection and ethical use, such as individual informed consent for identifiable data, de-identification. We tried to generalize that because the discussion yesterday was concerned that this was setting up these principles to be an addenda or an extension of HIPAA, and that wasn't our intent at all. Once we read it, we didn't really feel we would lose anything by just deleting that reference.

Then, we go on to add one sentence, that is inserted. These approaches may not always be adequate or practicable for community health data uses. Communities need good stewardship principles to use data to improve their health.

DR. CARR: Just to clarify that thought, so in other words, there are times where community data is governed by informed consent or the common rule?

DR. MAYS: I was just going to ask, maybe we can say these approaches may not be practicable for community health data usage. I like the not always be adequate out, because I get concerned that it makes the procedures that we use seem like, oh, they don't protect me as much. We do a lot of work with the communities, to help them with some of that first part. If we could just drop, I don't think we need it, we can just drop, may not be adequate.

DR. CARR: Is it that it is practical or that it is relevant?

MS. BERNSTEIN: Getting individual consent, for example, for a very large population is very hard to manage. In some cases, we go directly to records or we waive consent through an IRB, or we do other things. That is just an administrative problem, for example.

DR. FRANCIS: We might want to change adequate to applicable. That would capture Justine's point that sometimes they do apply. We are not denying that.

MS. GREENBERG: The common rule includes waiving, or you have waiving, so you have covered it.

DR. FITZMAURICE: I notice you have an example here of the common rule. Aren't they more likely to be governed and be abiding by the HIPAA privacy rule, and maybe security rule, than they are the common rule? How many clinical trials do they face, but they are going to face a lot more uses, which will put them up against the --

MS. BERNSTEIN: The common rule covers more than clinical trials, it covers all federal grants that involve human subjects, including surveys that are not.

DR. FITZMAURICE: Don't deny it.

MS. BERNSTEIN: Many of these communities are not covered entities and most of them are not covered by the HIPAA privacy.

DR. FITZMAURICE: How do they get protected health information if they are not covered entities?

MS. BERNSTEIN: You don't have to be a covered entity to receive the information. For example, a public health entity can receive information, as long as the covered entity is disclosing it properly under the HIPAA rule. Once it is disclosed, it is not covered.

DR. FITZMAURICE: Agreed, so we are talking mostly about public health entities.

MR. SCANLON: I think in almost every case, the product will be de-identified. We want it to be.

DR. CARR: Communities have names of people who are found to have certain diseases, identifiable in detail.

DR. FITZMAURICE: It is strange to take out the reference to HIPAA, if you are talking about HIPAA de-identified data.

MS. BERNSTEIN: We are not talking HIPAA de-identified data, and that is why we took it out. We didn't want to connect the concept of de-identification specifically with the definition of the HIPAA rule.

DR. FITZMAURICE: Their definition of de-identification.

MS. BERNSTEIN: De-identification existed before HIPAA existed. We used these kinds of things before HIPAA. HIPAA only covers specific kinds of coverage.

DR. FITZMAURICE: I understand, I am just wondering what the definition of de-identified data is then, but let's move on.

MS. KLOSS: Page four, we emphasize in the first paragraph that communities differ in many ways, different governance structures, needs, values and successful stewardship must address these differences in a flexible manner. We just tweaked that paragraph.

DR. COHEN: Again, it is not only the communities, it is the entities that hold the data. I think that needs to be added.

DR. WALKER: Just as a general thing, I think we should use data stewards over and over and over again, to make it clear. That sentence, communities, researchers, data users, consumers, I take it that some of those are data stewards and some are not, but they are all in that sentence together. That is what I mean about just being crystal clear about who is being protected and who is being whatever, governed or whatever, to protect them.

DR. FRANCIS: Just say guidance is needed for data stewards.

DR. WALKER: I think what this sentence is saying is, and the purported beneficiaries of data stewards, the potential victims, too. That is what I mean, I think this idea that there are two fundamental groups, data stewards and the people for whom they hold the data, are two different groups. I think our position is one that needs to be protected, the other needs to be managed or something, some word like that. They are very different.

It is confusing because some data stewards are the communities. One entity can be in both groups, but it is just everything we can do to keep those two groups separate, data stewards and whatever the other is, communities roughly speaking.

DR. FRANCIS: Maybe just tuck in, after desired by, data stewards, comma, communities, researchers, data users and consumers. We heard from all of those groups.

MS. BERNSTEIN: I thought it was trying to get at the comment that you were making before, that the principles that apply to data stewards also consumers or users or whatever, understand what can be expected. The guidance is beneficial to all of those different groups at different times.

DR. WALKER: I think of it as car makers and car buyers. There is a framework that controls car makers. They have to have seatbelts and there are a bunch of things they have to do. The recipients of those protections, the consumers, have a different perspective. They need to be informed about their rights. If you don't keep those two separate roles, separate, then when you read this, it just gets hard to tell whether we are talking about the consumers, just say it that way, or communities, or we are talking about the data stewards.

MS. BERNSTEIN: I understand that. The way I look at that, with your metaphor, is if I am a consumer and I am aware that car manufacturers are required to put seatbelts, if I see a car without seatbelts, I am going to be worried. If I know what those rules are that apply, then that helps me to understand what is safe and appropriate for me, even though I am not the person who has to implement that.

DR. WALKER: If it is a good safety regimen, I am allowed to be ignorant of all that and still buy a safe car.

MS. BERNSTEIN: I don't think that we want communities to be ignorant of all that.

DR. WALKER: I don't think we want them to be ignorant, but it is not their Constitutional obligation to be educated, either.

DR. MAYS: Since I think what we are doing is advisory, I just want to say that I think this notion of trying to see the letter. This is why I was asking about the two different letters. I think I am now coming to a different convergence, and that is trying to make this distinction.

When I edited an earlier version of the letter, that is what I was really struggling with. I would see some things as if I were in role A, which would be I am the data steward, I should do it this way. If I am in the other role of being the person whose data is being used, I kind of felt like I had a little, there was a different nuance to that.

To the extent possible, as the revisions are done, either having a reorganization where there is a separate section that talks about kind of each person's responsibilities or what they have to gain, I would suggest that. Or to the extent possible, that we go through and make sure that those two things are separate. I am more worried about the community. I just don't think the community has the same responsibilities as the data steward, and the data steward is where we really are trying to push for the change, then, for the community to be informed.

MS. KLOSS: We will clarify that, because when we are using community, we are using it as data steward. We will be more deliberate in doing that, because we are addressing this to the organizations that came before the committee. And they said, here are all the wonderful things we are doing, but we are worried about trust of data. I think we can fix that.

MR. SCANLON: It is directed at the sponsors of these community health data initiatives, maybe stewards who are probably the stewards, should be stewards.

MS. KLOSS: Right.

MR. SCANLON: They are clearly the sponsors. They may be stewards.

MS. KLOSS: We will be very explicit in terms of who it is addressed to.

MR. SCANLON: Just say community health data, because when we say community health initiatives, it takes care of this.

DR. CARR: Just a question on the next sentence, the committee recommends that HHS develop guidance on stewardship practices for use of, would this be community health data then that we are talking about? Is that the same recommendation that appears at the back?

DR. FRANCIS: Justine, I think part of what we need to do is combine those last two sentences. The committee recommends that HHS should develop guiding principles and resources to enable data users and data subjects to understand the chain of trust required to be effective stewards. That captures both sides. It doesn't look like we are making guidance documents, which is what we were trying to avoid. Develop guiding principles, take out the develop guidance, develop guiding principles and resources. Take out from develop principles and resources, okay, to enable.

DR. CARR: We have incorporated here a starter set, I guess, that is missing in the recommendations. We also recommend that those principles address the following 10 things. That doesn't make it through, I don't think, to the recommendations.

DR. COHEN: Can you explain to me the difference between the principles and the framework, or how they work together? I am just unclear on that.

DR. CARR: I go back to the data definitions, who is the steward, what is the framework, what is the guiding principle.

DR. COHEN: I see the principles, but I don't see the framework.

DR. CARR: Let's hear from the authors, what is your concept of a framework, and Bruce or Judy, what is your concept of a framework?

DR. FRANCIS: The reason I think that we use the idea of a framework was that we did not want to imply that this was a set of principles that are necessarily set in stone. They are a starting architecture that should frame the way we think about stewardship. There may well be other principles. I mean, these are principles that fall under a stewardship framework.

DR. COHEN: I see those principles, but a framework implies a structure to me about how you integrate them, how you follow them, and how you apply them. I don't see that framework in this letter, I just see the principles.

DR. FRANCIS: We didn't tell you we are doing the full framework. We said we are starting on one.

DR. CARR: I think the word has different meanings to different people. We either need a data definition in the beginning to set expectations, or we need to stay away from framework.

MS. KLOSS: We get a little closer to the framework, as you are describing it, Bruce, in the appendix, which does differentiate between individual responsibilities and community responsibilities. I think that we saw that that is where we needed to head.

I will say up front that we did not take out some of the musts in the appendix yet, so that part is still needing to be tweaked. That clearly is moving in the direction of a framework.

MS. GREENBERG: Our primer uses the word framework, doesn't it? It is a stewardship framework? I don't know if it uses framework or not.

MS. MILAM: It is the way it used when you look at any other stewardship or privacy framework nationally, as well as internationally. You have principles making up the different components of your framework.

DR. WARREN: The problem I have is all I see and hear are principles. I don't see any components of a framework. If you are telling me the components of the framework are in the appendix, then my question is, why not put them in the letter. Your letter is a stewardship framework for the use of health data. If you put the development of the framework in the appendix, to me, that says it is not that good.

DR. COHEN: I don't see the table in appendix A as being actual framework. I just see it is a list of responsibilities for different parties, with respect to the principle.

DR. FITZMAURICE: Suppose we just say stewardship principles for the use of community health data?

DR. WARREN: Then, I think what you can do is at the end, say this is step one of developing a stewardship framework. That tees us up for the next big product.

MS. BERNSTEIN: What else would be in that framework, that makes it a framework and not a set of principles? We have talked about how we really can't place a governance structure, because the Indian communities, some have governance structures, some don't. That was the point of this, is that the governance structure that is appropriate to the community or developed by the community is what they should use, right, which we say.

DR. FRANCIS: That actually argues for just talking about principles here. I am fine. We will rewrite this so it talks about principles, not framework.

DR. COHEN: Maya, you need to apply the principles in some kind of ordered fashion to have a framework, and that is what is missing. That is the distinction I would make.

DR. WALKER: I think the consensus is that principles is a perfectly brilliant place to start. It is not a problem.

MS. KLOSS: We aren't ready for a framework, as you're defining it.

DR. WALKER: Can I ask another framing question, I am sorry. We say community data stewards, does this exclude other data stewards? We don't mean to exclude other data stewards from this set of principles, right? I think the fact that, we have enunciated these principles in response to a community data steward request. That is different from implying somehow that this doesn't touch other data stewards. Every time we say community health data steward, I am just thinking we ought to say health data steward, and that applies to everyone, communities and researchers and all of the others.

MS. BERNSTEIN: I think we were just talking about stewards of community data, community data stewards and not community data stewards. I don't think it was more than that.

MS. KLOSS: I think we were being cautious to have this more specific and narrower audience in mind, but realizing at the same time that this set of principles are more universal.

DR. WALKER: Then we ought to take the Texas Department of Health, because see the issue there is that this data steward betrayed these principles. Then, it is not relevant if we are really talking about. I would, by the way, say stewards of community data, then nobody could misinterpret it.

MS. KLOSS: That is perfect. See, we couldn't have done this over the telephone or on SharePoint. Let's just take a few minutes now to sail through principles themselves, because we worked hard to take the musts and the prescriptive language out of this, and just lay them out as topics. Principle one, openness and transparency, there we moved what had been in three, which was the communications principle, we moved that into this one.

MS. BERNSTEIN: We made more specific connection between the blood spots, which was having to do with outreach in particular and communication.

MS. KLOSS: Number two, purpose specification and use limitation. You can give us feedback on the stuff within that, in the next iteration, but let's make sure we have got the titles right. Three is gone, so a new three is involving communities in decision-making.

DR. WARREN: I have a question on two. One of the things that we are learning about health data is that later on, with new knowledge and science, we may want to repurpose the use of the data. How is that handled, because you have got limitation? You have purpose specification and use limitation. We can say we can't do.

DR. FRANCIS: No, we just say you reevaluate it. That is what the second sentence says.

MS. GREENBERG: I would still just say purpose specification and use, and then talk about it. That limitation right away is a red flag.

MS. KLOSS: How about purpose and use specification?

MS. GREENBERG: Yes, but I would take limitation out of the title.

DR. FRANCIS: That actually comes from the original Fair Information Practice Principles statement.

MS. BERNSTEIN: That would be a red flag on the other side, to people in the privacy community and other kinds of advocacy community, because that is sort of how it is stated. A basic principle of privacy is that information collected for one purpose should not be used for another purpose, without going back to the data subjects. Now, we do that in research sometimes, but that is by the reevaluation.

DR. WALKER: I think what Maya is saying, if I understand it, is if we take it out of the title, that will be read as a message. I think what Maya is saying is from the privacy community's standpoint, it won't be seen as a good message. It will be seen as a betrayal of privacy. It is just something to be aware of. I am not saying we can't do it.

DR. FRANCIS: Actually in FIPS, typically purpose specification and use limitation are two separate FIPS. It wouldn't look unusual to just have purpose specification. We do, in the body of this, say that if there are changes, that it should be reevaluated. That is not saying that you can't do it, it is saying that you have to think about whether, as a good steward, this is a change.

MS. GREENBERG: I think that is reasonable to say.

MS. MILAM: At the same time, when you look at every other framework that is out there, and there are dozens, not having this, and as I think Maya said, it is usually a standalone principle. Not having it in its entirety will be a red flag to the privacy community.

DR. CARR: The way I look at two and three is that today's reality is we collect data, and now we make connections where that same data can inform, enhance, improve health. Even raising the question of, should we go back to the others, I don't see how we can go back to the original people and now say, oh, we now discovered something else. I think it ties more into involving the community. There you say, this data now can answer that question, and how do we engage the community around that repurposing of that data.

MS. BERNSTEIN: That is going back to the people. That is one way to go back.

DR. CARR: I guess I read this as if you have to go back and talk to the original people who were in that cohort, to get their opinion. They may have been long gone.

DR. FRANCIS: We don't intend to have it be read that way. What we do intend, one way that, I will just speak personally, that I see as deeply problematic from the point of view of public health, is that there has been a lot of insistence on going back. Part of the whole point of some of the earlier framework stuff is that this is not a good model, the one you just described, for many public health circumstances. That doesn't mean anything goes, so that is why we wanted to say reevaluate under these stewardship principles. Maybe the best way to do it would be to say specification of purposes and uses.

DR. CARR: Maybe what needs to be explicit is in this new world, data will be repurposed. Maybe that is the message that people need to get when the data is used, and not the expectation that it will never be used. I think it is all being reused, that broadens the purpose.

MS. BERNSTEIN: This doesn't say it will never be reused. This says, if you make a significant change, you have to reevaluate how to do that. That is all this says, but you don't want to get into the Havasupai case, right, where you have got something, where you collected information for one purpose, used it for something completely different, and you have got a lawsuit on your hands because the researchers went off on some track that the community completely didn't expect, and objects to.

DR. CARR: I guess researchers are guided by their IRB common rule, whatever that kind of stuff. I had thought we were talking about the kind of public census information that was used to define a community. Now, we can take it and marry it up with something else, and tell a new story. When I gave census information, I thought they just wanted to know how many people lived in my house. Now, I find that it is being repurposed for some other thing. That is what I think we are talking about. Anything that is guided by a consent, an IRB, privacy research, that has a whole separate set of rules. I think maybe that is where we are getting confused, for me.

MS. BERNSTEIN: Also, things that are also in the original consent form may be more narrow than future uses. I think this is Judy's point that, in the future, someone may look at the data and go, you know, I can tell a different story with this data that the original consent form did not anticipate. Now, what do I do? Even if I had an IRB at the time, I mean, we may not.

DR. CARR: These are two separate tracks, and I think trying to create a middle ground is what is making it difficult. Anything covered by consent needs a separate thing. Data that is in the public domain, maybe collected, you have a heel stick because you wanted to know if my child had homocystinuria.

I don't even know if that is the right answer, but things that are in the public domain already, that have been repurposed, I think people should know when data is collected about them in the public domain, it likely will be repurposed. That is all of what we are talking about today, of matching stuff up.

MR. SCANLON: I think this is really sort of getting at the crux of this, plus who exactly does this. We are mixing up publicly available data that is identified and often statistical, that is meant to be used for the vital statistics rates, the infant mortality rates. Who cares how many people use it over and over again for counties or cities? That is meant to be an indicator. We are not revealing anything about individuals or causes of death, it is a statistical indicator.

The whole other area, where you have done community research, you may or may not involve HIPAA, it could be re-identified, or its publication would result in harm to the community. Those are things that are almost a different set of guidance for the data developers, I would think. Mixing them together, I think you are scaring people here with a level of governance and regulatory framework, even though we are not saying it, it is publicly available data.

MS. BERNSTEIN: There's no governance here, there's no regulatory framework, and these principles apply to both what you are just talking to.

MR. SCANLON: You have to make this distinction or no one will know what to do.

DR. WALKER: I wonder, it sounds to me like what we are talking about is something like accountable reuse. If you are going to reuse data, there is an expectation that you justify that to yourself, to others, in some kind of way, to be specified further, and maybe undoubtedly situation specific. It is accountable reuse, and something like that, I think, is what we are trying to say.

Yes, information will be reused. Some of it was even designed to be reused. Whatever the case is, when you do that, there should be a set of questions you ask and answer, and record publicly probably, that prevent the public feeling blindsided, communities feeling like it has to sue.

DR. COHEN: This discussion just needs to be expanded.

DR. CARR: I want to hear from Glen and then Vickie and then Bruce.

DR. NICHOLS: Just very briefly, thank you, I just wanted to have everybody go back and read this sentence. It just says stewards also need guidance about when types of data might be considered. It is just saying be mindful of it and that we ought to develop standards to govern the reuse. There is nothing, I mean, I am as nervous about having avenues to research cut off as anybody. I don't think that is what is happening here. I think what is happening is a recommendation that we actually think about what it means to design a principle around which reuse would occur. I think that is all they are saying.

DR. CARR: It is the first sentence, the purpose of data collection and use should be explicit. Today, I am explicitly collecting these data for X, Y, Z. What do I do tomorrow?

DR. NICHOLS: It is saying we should develop guidance. It is not saying you can't do it. It is saying we need to think about how to govern that situation.

DR. CARR: I am saying that there is certain data, by definition, is for reuse, and that is what we need to say.

DR. NICHOLS: That would be part of the guidance, I think.

MS. BERNSTEIN: That is an explicit purpose you can specify up front, if you know up front. This is a question of what happens when you don't know up front. How do you deal with that situation, because if you know up front, Justine, you can give that notice.

DR. FRANCIS: Then, actually what it is trying to say is, when you have the new purpose, you should state it. You should think about, not that you are limited, but you should think about under these principles whether the new use is okay. That is all it is saying. If we are not clear about that, that's an easy fix.

DR. CARR: Vickie and then Bruce and then Marjorie.

DR. MAYS: If these sentences stay, the purpose of the data one, the significant changes, then there needs to be at an earlier place in the document, a longer discussion about these different types of data and I am going to tell you why. When you say significant changes from the original purpose should be reevaluated and the principles of community health data stewardship, the person in the community is not going to understand the difference between my NIH grant, in which I had an IRB that had community members on it, that allowed me to have brought a different use later in my data.

Then, I can see I will be out of the meeting and they will say, then, we have to do these principles of community health data stewardship. It is like, no, we already had consent to do what we did. It feels like then that, on the subject of, I did something that wasn't honest or something.

There are too many different types of data, and I think the notion of what we are really about to experience, which is what I think we want to deal with, it is almost like our group that is coming later actually is going to start trying to figure out ways to connect all kinds of data together, to give it to the community.

It is going to be repurposed, beyond what I think any of us are sitting here imagining. We want to be ahead of the curve. I think you actually will do an incredible contribution to help the community understand the notion of repurposing, and how then technology, as well as the departments like open use and trying to facilitate use of the data more, for them to think about those things would be great. I mean, it would be exciting to the community, I think, to hear it that way.

This way, there is another side that, unless we tell them, well, you were given the chance. We did tell you that we would come back and do this. I think we are at a place where we can make an incredible contribution, that would be great for the community.

DR. FRANCIS: Could I just ask a question, because this could be written in a way that says, with research data, we have the following regime that applies. With data originally collected for public health, we don't have this regime that applies. There are still questions about repurposing. There are also questions about repurposing research data. They look just different in different contexts.

We could write that. It is at least three paragraphs to write, and the letter is really long if we do that. I guess one of the things we were trying to do was cut a balance between length and raising the questions. If having it be this short is seriously misleading, then we cut the balance the wrong way. I just want to raise that for reactions, because at least my sense is that in order to be responsive to a lot of the kinds of comments, this is going to have to be a longer letter.

DR. WALKER: Vickie, I would have thought that when you are in that public meeting, you just would have said, this is the original purpose that this information was collected for, and here is the process we went through. It is probably a teachable moment for you.

I would have thought the first sentence here would have taken care of that, because what you are talking about is not reuse, it is just someone misinterpreting your original specified carefully-worked out use, as reuse. You just need to explain to them, no, the reason this data was collected was for this exact purpose.

DR. CARR: I do have Bruce and Marjorie.

DR. COHEN: A couple of things. I think you are correct. I think in your attempt to be parsimonious in your words, you lost the richness of the context. Some of these basic declarative statements need more context, so everybody reads them and understands them the same way. That is what is happening. It is clear to you what you meant when you wrote them, but it is not clear to the reader. I think the letter needs to be longer, to explain some of these things. That is my first point.

My second point is, you say stewards also need guidance. Guidance from whom? Is the intent there from the community, from some other body? I don't know who is going to provide stewards guidance. Again, that is another example of your attempt to be discreet, but it raises more issues than it clarifies.

MS. KLOSS: It does relate back to the discussion of where this came from, from the testimony of the communities, that indicated a need for guidance. We were referencing that back.

DR. COHEN: The intent here is that HHS is going to provide that guidance to stewards?

DR. FRANCIS: Or facilitate someone else doing it.

DR. COHEN: Okay, whatever the answer is, it just needs to be here, because it raises more questions to me not knowing who is going to provide that guidance.

MS. GREENBERG: This has been a very interesting and rich discussion. I think there is no doubt that there is need for something like this. There also is no doubt that some additional work, I think, needs to be done by the committee before we can ask anyone else to do additional work, like the department or someone else.

I think what you said about needing context is obvious. I loved the letter when I read it the first time because it was clear, it was crisp, it wasn't bogged down. All right, maybe lack of context, too. I think where you suggested a few times, Linda, that really what this might be leading to is an updating of the stewardship document, the primer, which could really spell out these different types of data, different types of uses, different types of issues, either in several appendices or something else. I think that that is probably what is really needed. Whether the committee has the bandwidth to do it, we have already started it, so maybe you do, at least to take it to a certain point.

I just want to challenge the concept that data will always be reused, you just have to be clear about that. I think there are cases where certain repurposing is inappropriate, and as Maya said, could get you in a lot of trouble if you have not gotten either, whether it was informed consent or the waiver, or whatever understanding under which you collected the data, really doesn't permit that.

I don't think we want to go on record saying, we just want to be clear that that is going to happen. It shouldn't happen sometimes without either going back to the IRB, going back to the individuals, doing something. Exactly what, it depends on the circumstances. This is maybe kind of a wordsmithing, except that I think when you are talking about number two, I would rather you said the purposes and uses should be explicit or should be clear.

It isn't always, you don't have to only have one purpose of data collection or one use. It is just that, if you have multiple purposes, you should make it clear. It is like you said, if purposes are allowed, I mean, we collect some data through HANES where we tell people, this is the way it is going to be used. It might be used in these other ways, too. The point is that you have to really be transparent and open.

I think these nuances need to be, if not in this document, in that updating of the primer. I agree with Vickie that it will be very useful. It obviously needs some additional work. Now, the question is whether that is where you want to go next, or whether you want to start with this letter and say that you are going to do that, and then that would be your next product, one or the other.

DR. CARR: If I refer to our agenda, we are actually at the juncture where we say summary steps and future directions. I think that is a good way for us to wrap up this conversation, this very rich discussion, and then also we turn to the discussion we had from 8:00 to 10:00 this morning, and decide on next steps.

Larry, I am looking to you a little bit, to put your perspective and how you would like to see next steps, because you will be sitting here.

MS. BERNSTEIN: Before we move on, I could just ask that we wrap up this letter by asking members of the committee, if they have further comments, one of the comments was that we hadn't seen the letter, that we continue to work on this, that they really make an effort to read it carefully and make comments, send them to myself.

MS. GREENBERG: You want to make changes to it, based on this discussion, before?

DR. FRANCIS: We will make changes and send a new version around. A couple of things, before I do that, I really want to be sure of, which is that it is okay to do a letter because earlier on, we actually asked this question, whether it should be a letter or a primer, and we got the sense of the committee that it should be a letter first, so we wrote a letter. It should be understood that we are going to start with a letter. If people really think we should start with a primer, we should know that.

The second thing really early on is that it will be longer because it is going to need to give context. If anybody has any trouble with that, we ought to know that now, too.

DR. CARR: I think the whole discussion of letter, primer, length is secondary to what is it that you want. What is the ask, what is going to move this forward? If there are things we need from the Secretary, we want to articulate them and move that forward. If there isn't an ask, it is a reflection on what we heard, then it shouldn't be a letter. I don't want to get locked in stone. I think the dialogue has been very rich, and I think the sensitivity about the importance of this type of document has helped us be very meticulous in saying what we mean.

As difficult as it is to kind of hammer through this, this is really when we do our best work. I think it is because this deliberative group, community, committee really does deliberate, because no one else will. If it feels hard, it is because it is hard. Every contribution to this is making this sharper and more helpful. Take it as an affirmation we are on the right track, and it should be a letter if there is an ask.

DR. FRANCIS: That is something I have never understood because couldn't a letter inform the Secretary? In order to take the form of a letter, does a letter have to say, Secretary, we need you to do A, B and C?

MS. KLOSS: I think there is an ask here that isn't something that we need a new taskforce for or something like that. It is an ask that underscores the dynamic of what is going on in community health data use, and raises this as an issue that needs attention.

I think there are a lot of different ways that can be carried out by the Secretary, through new thoughtful provisions perhaps in granting and other varied ways. I think that we were seeing this as thinking that needs to permeate a lot of things, not being one single sort of project to be done.

MS. GREENBERG: Let me just say that that kind of letter, the committee has done that in other cases. This is where we are in thinking about this issue. I remember with the PMRI standards. Then, it gets it out to the broader community, the health industry, the communities, whatever, at large, too, so there is some opportunity, if it works right, for people within the department, outside of the department, et cetera, to communicate back to the committee and say, we think this is going in the wrong direction and this is going in the right direction, whatever. I think there are purposes for letters that go beyond adopt this standard for this transaction.

MR. SCANLON: The purpose here is to inform, somewhat persuade. If the committee said, we have become aware of this, this represents what we heard in our thinking to date, we will continue to look at ways of approaching this. I don't think we have something that HHS can do much with at the moment, other than do that. Even explaining it to our data holders, I am not sure they would know.

I think you have identified an issue that is an emerging issue, and I think the letter should be an informing letter, and say you are doing it and you are looking into it and so on.

DR. TANG: Maybe I might summarize it a little bit, in terms of the ask. We are pointing out a need for universal protection. Because it is universal, that is the federal government kind of responsibility.

The suggestion or ask for the Secretary is it would be wonderful if we had uniform guidance about how to be a good steward of community health data. That is voluntary at this point. If it is widely abused, then it should be mandatory. It is still front and center. When it is privacy, it would be wonderful, we could do things voluntarily. Because we are state-based, we would like to have some uniformity. The ask is for uniform guidance, that could address that thing which really is a universal right of citizens in this country.

DR. CARR: Also, we would like to seek input from the working group on data access and use this afternoon. I don't know if we will have an update on the letter, or at least maybe we will take a look at the ten principles and provide you their feedback, as well. The plan will be that we work what we have, circulate it to the full committee with the timeline for their feedback, incorporate that. Then, it will go to the executive subcommittee in preparation for the November meeting.

DR. MAYS: Can I just ask for clarity, because I hear two different things? There is going to be an ask in it or not an ask. Maybe at the very beginning, tell us that, so that as we read it, we will know what we need to help with.

DR. CARR: I think that will be helpful, because there are rich things in there, and I think that just calling them out, deciding on the ask is good.

DR. FRANCIS: Jim put the point of the letter, I thought, very well.

DR. CARR: That brings us to our third theme. That reminds me, there may be a little more work to do to tighten up the document that we discussed yesterday, which was actually just the minutes of our meeting. At some point, I think we may want to revisit that document, and perhaps seek input on that, in terms of our guiding principles and our work.

MS. GREENBERG: Were you thinking of revising? People do it all the time, I was going to say we don't want to rewrite history. History is in the view of the writer obviously or the historian. There is one thing that just documents what the executive subcommittee discussed. At the same time, I think there was some useful and good input yesterday. You just have to decide how you want to do that.

DR. CARR: Kind of having it, I think someone said it yesterday, Jim said it, dynamic document, because it is a moment in time. The principles are things that we have learned along the way, we may revise, et cetera, and the focus.

MS. GREENBERG: We can even just introduce them as a summary, as modified by discussion with the full committee, so we have a single document.

Agenda Item: NCVHS - Summary Steps and Future Directions

DR. CARR: All right. Then, that brings us back to our discussion from 8:00 this morning. Then, Larry, I will turn it to you, in terms of how you would like to see next steps on that work.

DR. GREEN: First of all, Sally and Paul and I all want to thank you for coming to the 8:00 a.m. discussion. I think our first next step is we get a nice summary of that discussion, sculpted toward a conceptual framework and also narrowing that framework down to some focus particular work. I think the next step is a written document that allows us to say, yes, it is the same meeting I went to.

I think the second step is we need clarification of the federal players that are relevant to care about and involved in this theme, what we were talking about. To make that specific, I studied the minutes from last time, and we heard from the ONC about issues that are pertinent to this.

We heard about the Center for Consumer Information and Insurance Oversight. There is the new workgroup and I will be engaged in the workgroup in this area. Bryan Sivak is the new CTO for HHS. This is extremely pertinent to his charge and his work. We have the CMS Office of Information Products and Data Analysis, that is pertinent. It goes on, and I am frankly befuddled by all of that. We need some clarification, I think, probably from Jim and Marjorie. When we start chewing on this theme, these are the folks that have to come along. I am sure our liaisons can help us with that, too.

MS. GREENBERG: In fact, in that regard, I just wanted to mention that Seth Foldy, he had a two-year appointment at CDC, and his two years have ended. He has gone back to I don't know if it will be the private or the public sector, maybe both. He obviously will no longer be the CDC liaison, although he was working to get a new one.

I think first of all, I would like to propose that we write him a letter, thanking him for his liaison function during the time that he was liaison to the committee. Also, that this project or this theme is the one that I think we could use the most help from CDC, to say as we discussed this morning. This is an area that they work in and have done work and are very involved with, so that we could actually rather than just asking for a liaison, we could ask for a liaison particularly who could maybe help support and bring CDC expertise to this project, and help define it for that matter.

DR. GREEN: A key thing that I have heard mentioned at least four or five times in the conversations is we want to not do something redundant. We want to step into a space that needs to be stepped into, and where we are positioned to be the right group to be doing it. It seems to me, as part of the next step, we had better get clear about that, so that a year from now, we don't have a discussion saying, why the heck did we get here.

MR. SCANLON: I was at a meeting the other day where everyone talked about being agile and lean, and not necessarily have a biblical outline before we start a project. Basically, take it step-by-step. Here, I think we need a fair amount of exploration before we know what it is.

I am wondering if we should start with some facts like what exactly does CDC have in the nature of community, health information, products and services, and maybe CMS and others, just so we have a better sense. I think there may be organizations, I think the folks know, that do this, as well. There is the whole community health indicators project that we were doing.

I don't know. It is probably maybe a subcommittee hearing or just a meeting. I just think that we haven't really done a good environmental scan. You are exactly right, Larry, and I don't want to commit to some specific report when it probably was true that someone else had a lot better.

DR. GREEN: Staying pretty operational, I think the co-chairs of the executive subcommittee are going to have to work with Marjorie to have a manager approach for ourselves. Now that we have got the three themes consolidated, and we have a new workgroup and we are about to be 18, and we are going to suddenly have people sitting around the table, have no idea what the heck we are talking about.

I am thinking that the next step is to do as much preparation as we can in orienting them, so that we do our best to put them in a position where they can start being effective now, rather than two years from now.

I am going to hang myself again. You don't often have opportunities to hang yourself in public twice for the same crime. The last meeting we said, we want one of the Susans. It appears to me that we have lost one of the Susans, I don't want to lose the other one. Susan Queen is working with another group, so for our theme, we want the other Susan.

The point here is that, I bet we have got unanimous opinion. We are putting pressure on our staff. We cannot pursue this community as a learning health system theme. After the conversation I just heard, I am pretty sure that to be successful, we are going to have to really tighten up our staffing, and people assess the implications for that.

Right now, at the most, the population subcommittee has no lead staff. The next step is resolving that, there has got to be lead staff here, I think. I want to invite both Sally and Paul to add other next steps. I have one personal one, and this one I am going to go off the ranch from reporting about communities learning health system theme, and pretend that I am about to become the chairman of the whole committee.

These are work assignments for you guys. Pretend you are in fifth grade and I am your teacher. This is a homework assignment, okay? Think about it that way. Read before you get here, read it all. I am absolutely convinced from the conversations I have heard last night over dinner and here this morning, that some of us arrive here not knowing really what has already been done. We haven't digested it, and we learn about it as we go. That slows us down.

In the instance of the communities learning health system, please read our report, from cover to cover. If you can spare a little more time, read the appendix, because so many of the issues that come rolling out on the table, they are already there. They have been debated by those whose shoulders we are standing on, they have been expressed. Then, if you read it and you say, this is wrong, bring it to the table immediately, please. Consider that a homework assignment.

The second one, if you have never read anything from or about the Folsom report in 1967, go online and track something down about that, to the point that you can come back next time and know what a community solution is. What is a community solution? It was the key idea, produced by the American Public Health Association in the 1960s, after doing a bucket load of work. Once we do that, I think it will provide us with some common understanding where we won't have to have some of the discussions that we seem to have to have right now.

Thirdly, would you take about 10 to 15 minutes, and go on the web and Google NIH community engagement. There is six years of work funded by the CTSAs in 60 locations in the country that have various levels of community engagement stuff that often say what I have heard you say maybe five times this morning. It seems to me that we could build a common understanding of where the country is around this community engagement stuff, and why this letter from the privacy committee is so important.

I am going to end with this. The discussion so far, Justine has reminded me of two things. One is that Obama video, after he was elected, but before he was in office and the economy collapsed. Our first thought was, can we get a recount? After that discussion, I wonder if I can reconsider.

DR. CARR: I want to then take it back. We talked about this letter and standards. Walter, you updated us, was there anything more in the work on standards?

DR. GREEN: Before you say that, can I say one more thing and then go to Walter and then I will quit. I will be done, I won't have to do anything else.

The other thing is I had this quick exchange with Matt at the break. He was telling me about a sergeant that he had a discussion with, about how they could put a nuclear weapon on a shell in a Howitzer and shoot it off to distances as far as 28 miles away. They developed a work plan for that. It was going to be all right because they are going to wear special suits.

Where we are with this theme of the communities of learning health system is that everything has changed, except the way we think about it. That is why we have got hard, difficult discussions to have, because the committee's role and position for these communities of learning health system, is to step into this space where there is a missing infrastructure. They don't even know what a data steward is. There is no place for the data steward to live. There is no one that will pay the data steward.

This is a momentous shift. What is happening at a community level around the country, the change is unleashed and it is rolling right along like crazy. There is urgency for this framework that this committee is calling out for. There is going to be trouble here. There is already trouble here. You want to predict that because it is already here.

We have serious work to do on this. I want to ask each of you, as members of the committee, to do two things. One is, help me help you. I am going to shift roles after this meeting. My number one job here is to help you get these themes explored. You can help me by doing your homework assignments, coming prepared. I am going to get Justine to write down and send me that list of things she blurted out really fast. We may pass that out every meeting, we may pass it out in the middle of every meeting. I am going to try to become a manager of the process. This is, I know, basically impossible. Please, I would ask you to just see me that way. I want you to know that I need your help to manage this group.

Secondly, I want you to know the following. I have come to admire each of you. Because of that, I am really quite confident. I can hardly wait to meet our new members, because I don't know them all. It is a great group, and never, ever think that I don't have respect for you. Never think that what you have to say won't matter to me. It does, it has and it will.

When you see me get frustrated, I will be frustrated because we seem to be inextricably stalled, going around in circles, making the same points again. I will try to unstick us. If I unstick us in a way that offends you, you will tell me and I will try to do damage repair and that sort of stuff.

I will quit being a strong advocate for the communities of learning health system in 30 seconds. I will become a strong advocate for this committee, making progress in all three themes. You will have to help me, and know that is my intense desire. I will do my homework, you do yours.

DR. CARR: Walter, was there anything you wanted to add?

DR. SUAREZ: Why do I get to follow all of these difficult? Just to build up on what you just said, Larry, I think standards is a reflection of a lot of the things that are happening in the market. It needs to be and it needs to support what is happening in the market.

As Larry said, a lot of things are changing and a lot of the things that we focus on in the standards world are changing significantly. The administrative transactions of the past are now being looked upon and saying, are they still the right ones. In light of all of the transformation of the health care system, the experience, should we be looking at that.

Since a lot of things are changing, and we haven't changed the way we think about them, I think that is what we, in the standards community, are going to begin to do, is change the way we think about things, based on the changes that we see are going to happen.

We are already talking about, for example, how we were asked to identify standards for attachments. The word attachments and the word claim attachments are probably a relic of this old view of how health care is being done. We are not thinking or going to begin to not think about that, that way, but more the importance of the need for information exchanges that are happening already, that are going to expand and are going to expand not just between providers, but in the content of the exchange itself and the substance.

Our task, I think, into the future, and we already started talking about it at the standards committee, is to really charter our course with that in mind, the changes that are coming into the future, how we need to really transform the way we see those, and how we need to think about the changes and the standards that need to support those transformations.

MS. GREENBERG: I want to just ask regarding the third theme. This morning, at least I came to the thinking that, our discussion this morning was very much around both the first and the third themes. We kept talking about convergence, we talked about all. My original suggestion that we spend maybe a half a day on the 15th or whatever, trying to address that third theme, now I am thinking that it would be different, more theme one than theme three.

We could try to do some work with the chair and the relevant subcommittee chairs, et cetera, with Susan Kanaan, to look at past work, look at related work, a short sort of environmental scan, prior to that meeting. Is that something that you are interested that we would at least poll for, to see how many people could stay over for a half day on the 15th? Or maybe we reorganize the two days that we have. We still would want to, I think, have a half day for the working group, right?

DR. CARR: I really think the 8:00 to 10:00 timeframe today was terrific. I would frame it as committee time, working on these themes. It is not populations, it is not quality standards, it is everyone. I think continuing to create, within the time we are here, to work on those things. That is one part of it.

I think the other thing, a little bit of housekeeping, we need to identify who is on the executive subcommittee, because we have got co-chairs that become chairs, we have chairs that have become. You may want to do that offline.

MS. GREENBERG: We are in this transition period, but I think Sally and Larry, who are the co-chairs of population, except now in five hours or something, Larry will be the chair of the full committee. He can't be the co-chair of that. They have asked someone else to serve as the co-chair. Did you want to mention that?

MS. MILAM: I would like to let everyone know that Bruce Cohen will be the new co-chair of population health with me, and I am really excited to work with Bruce.

MS. GREENBERG: We have Paul, who is at least two people. At the same time, I think this is part of what the executive subcommittees. Obviously, Paul is on the executive subcommittee, Bruce, Larry, Sally and Ob and Walter, and Leslie and Linda, so I think that continues. I think we will need to have a call of that group. I think that is clear.

What we will have to discuss is, it is very possible that one of the new members would be a good co-chair for quality, except if quality is not going to have a separate agenda. That needs to be discussed. Obviously, at this point, we are not prepared to name a new co-chair, I assume, of quality.

I would suggest a call of the executive subcommittee as soon as possible. What I would hope is it could be in the next two weeks, because then I am going to be going on some international travel, and we come back and there is a meeting. We will poll for that, and with Susan, who will be involved, as well.

DR. SUAREZ: One quick item from the standards committee I forgot to mention. During the standards subcommittee meeting, we actually talked about, well, we are going to be having conference calls monthly, but convening a half day hearing on the 15th. We thought the day before, that is Monday the 12th, which is a holiday. The 15th is the day that we were targeting for.

MS. GREENBERG: That is why the executive subcommittee needs to have a call, as I said, in the next two weeks. Talk about the November meeting, talk about the 15th, maybe you want to use a half day, and this other activity be the other half day, or how we are going to structure.

I think right now, as I understand it, we have one action item for the November meeting, and that is a letter, the stewardship letter, yes. Do we have any other action items? If not, we can move as much of the full committee meeting as possible for some of this continuation of this morning, et cetera. Maybe then just cede the half day to you all. I would like to have that call, so that we can all agree on that.

DR. SUAREZ: The other one that we already started to plan is in February. It is not too early to do it. In February, we will need also at least a half day, if not a full day, hearing before the committee meeting. I think it is Wednesday, February 27.

DR. CARR: Let's take it offline, because I realize we have 30 minutes for lunch, and our speakers are calling in at 1:00. We really do need to be here in place, ready to listen at 1:00.

(Recess for lunch)

A F T E R N O O N S E S S I O N

Agenda Item: De-identification methods for Open Health data

DR. CARR: Welcome to the afternoon session of NCVHS. Do we have our speakers on the line?

MS. GREENBERG: Do we have Jonathan and Khaled on the phone? We are just reconvening here from lunch.

DR. CARR: Well, let's bring this meeting to order. Thank you very much to our speakers, Jonathan Gluck and Khaled El Aman. We are very grateful for you making yourself available to us, on the topic of de-identification methods for open health data.

As you know, we have a working group now of the NCVHS that is focused on data access and use. This is a topic of particular interest to that group, as well as to the NCVHS in an ongoing fashion, going back to our report on secondary uses four or five years ago.

I will open it up to you. Do we have slides or is there anything we need to follow this presentation?

MR. GLUCK: The first half of the presentation, there are no slides. For the second half, which is Khaled's, Khaled does have some slides.

DR. CARR: We will open it to you, Jonathan, thank you.

MR. GLUCK: Good afternoon. My name is Jonathan Gluck and I am a counselor for Heritage Provider Network. I also do other special projects for Heritage, such as managing the Heritage Health Prize.

Initially, I want to thank you for giving me the opportunity to speak to you today. I apologize for having to do this over the phone, but I just simply couldn't get away for two days.

I think it is important to start off by giving you some background into who we are, why we created the prize, and describe how the privacy issues drove many of the decisions we made about the structure of the prize. Khaled will get into a more detailed discussion of the de-identification methods that were used. I think it is very important to understand the business decisions behind the prize, and how they were impacted by the privacy issues.

To start with the brief description about Heritage Provider Network, Heritage is a fully integrated physician's network that was founded by Dr. Richard Merkin about 30 years ago. Heritage is spread throughout Southern California, from San Luis Obispo to the north and to the west, to San Diego in the south, Bakersfield and Palm Springs in the east.

In Southern California, we have approximately 35 physical clinic locations, which range from 100,000 square foot, almost mini hospitals, to small offices that might only have 10 doctors. We employ approximately 400 doctors at these locations, and then contract with an additional 3000 primary care doctors, 30,000 specialists and 100 hospitals to provide care to the members.

In the industry, we are kind of what's known as the clinic model with the wraparound IPA. We also have operations in Arizona, as well as the five boroughs and Long Island in New York. We have approximately 700,000 members for whom we care.

Heritage is a full-risk, fully capitated medical group. By fully capitated I mean that we are fully at risk for both professional and hospital claims. As a full-risk group in California, we have a limited Knox-Keene license, which is the license required by the state to take hospital risk. It has far more stringent tangible equity requirements and reserve requirements than for an average medical group, because the state wants to make sure that the licensee has the wherewithal to pay expensive hospital claims.

Because we are at risk for those hospital claims, controlling hospital costs and reducing unnecessary utilization of the hospital is critical. To that end, we have lots of programs which aim to reduce hospitalization. We have, for example, chronic disease case management, where we risk stratify our population to provide them different case management techniques, depending on the severity and type of the illness. Programs for diabetics, COPD and CHF patients.

We have pharmacists that will go to the home to do medication reconciliation post-discharge, and home-visiting doctors that will visit a patient who can't get out of the house to get to the doctor, because we know that the alternative may be to dial 911. We want to prevent that unnecessary hospitalization. It can be prevented simply by a doctor going to the house. Then, we have 24/7, 365 nurse/doctor hotlines the patients can call, all in an effort to prevent hospitalization.

In addition to these programs, we wanted to see what we were missing and what other component could we create that might add to what we are already doing, and specifically to do something through the use of data that would allow us to find new ways to attack this ongoing problem of unnecessary utilization of the hospital.

Dr. Merkin, who is the founder of our group, is also on the board of X Prize. You are probably familiar with X Prize, they are the ones who created the Ansari X Prize which awarded $10 million to the first group that sent an individual 100 miles into space and returned them safely to earth.

We wanted to do a prize that involved health care. Dr. Merkin is a mathematician by training, and for a long time has believed that the use of data in health care has lagged somewhat behind some of the other industries, such as the tech industry and possibly the finance hedge fund industry.

We also wanted, through the use of a prize, to open up what we considered to be some of the best young minds in the country, to the possibilities that would exist in the health care field, that they may not realize. Typically, when we speak to these types of individuals, when I have spoken about the prize at Strata Conference or elsewhere, these individuals are really thinking about tech or finance or some other industries, and don't really think about healthcare. We wanted to use the prize to kind of open them up to the possibilities in health care.

We began discussing a prize to predict hospitalization, which we believe would solve a real world problem, and do so through the use of readily available data. Now, the goal behind predicting hospitalization is simple. We know that unnecessary hospital utilization in the United States is a $40 billion a year problem. We also know that you are not going to be able to prevent every hospitalization, nor should you.

We also know that among hospital visits, there will be many that can be prevented through the use of preventive care measures. Indeed, many of the types of measures we have used for a long time. We wanted to do a better job of identifying those members, who would benefit from the preventive measures. We began discussing the creation of a data prize to find these individuals, and predict these individuals who would benefit from the care protocols.

When we discussed the prize, there were really two critical components that stood out above everything else. Number one, we wanted to make sure that the prize was real world usable and it had real world results. We do a lot of analytics work today, just as most of the other larger health care companies do. We attempt to use the data to risk stratify the population, to decide who would benefit from which care management protocols.

This work, however, relies largely on physicians, using their years of experience to place the patients in the risk bands I discussed. We wanted to make sure that the winning algorithm would do a better job than simply the human beings that had previously been working on the problem, or would add additional knowledge that could then be used in the real world to give us better results.

The second, and I must say equally as critical a component as we were discussing this, was the need to make sure that the data was de-identified. First, we obviously had to be HIPAA-compliant. I mean, that was a no-brainer, it goes without saying. This obviously had to be HIPAA-compliant.

On a more mundane level, we could not take the risk that the data would be re-identified. This was one of the first times that such a large and detailed data set had been made available, generally online. As a for-profit business, even if the data was HIPAA-compliant, re-identification of the data would have been a public relations nightmare. That de-identification privacy issue was in our minds just as much as the real world usability issue was.

We quickly realized that doing the de-identification in-house was going to be challenging, to say the least. We are obviously HIPAA compliant, but we are not data de-identification experts. We asked around and quickly were led to Khaled, who came on-board to do the de-identification process on our data.

Now, as a full-risk medical group, we have claims data, encounter data, pharmacy data and lab data on our patient population. The original intent was to provide the competitors in the prize with the full data set, each of those components, to each of the competitors. We knew, in speaking with people who have run these types of prizes a number of times, the richer the data set you can provide, the better solution you are going to get. The more information you have to pull out, the weaker the solution is going to be.

However, after discussing with Khaled, it quickly became apparent that we were not going to be able to provide all of the data we had wanted to provide, without running into too great a risk of re-identification. Khaled is going to discuss the details and specifics as to what we had to pull back and why.

We had to make a number of revisions to what we intended to release, to ensure that the data remained de-identified. This did not allow us to provide as rich a data set as we had originally intended.

In addition to having to pull back data from release in order to assure that it was not re-identified, we also tried to create as strict a legal structure around the release of the data as we possibly could. We made entrants enter into what many of them considered, because many of the people we were dealing with think that data should be for everyone, and data that is released should be used as anyone wants.

They thought our legal structure was very onerous, but we made everyone agree to keep the data private, not attempt to re-identify the data, as well as certain other legal hurdles we made people jump over if they wanted to participate in the prize. I don't know if the legal structure we had to put around it deterred anyone from participating. However, we are very happy with the participation we did get. I could not say that there are people who did not participate because of the legal structure. We do believe that legal structure has somewhat acted as a deterrent, because we have not really got wind of people attempting re-identification such as was attempted, with certain other prior data prizes.

Finally, we also commissioned an adversarial attack on the data before release, to determine if we thought it was stringently enough de-identified. We hired an individual at Stanford University who had actually done the re-identification of the second Netflix prize data set, to see if they could re-identify the Heritage Health Prize data set.

The conclusion was it would be extremely difficult to re-identify the Heritage Health Prize data set. However, we also realized, through having to hold back some of the data that we wanted to release, that the solution we will ultimately get is not the optimal solution, or likely will not be the optimal solution.

Where we sit today in the competition with approximately eight months to go, we don't know how much more robust the solution we are hoping to get will be, from what we can already do with our doctors and our private protocol that we have used, to identify the population. We are hopeful. We have approximately seven months to go, but we simply don't know yet how much better this result is going to be from what we have always done in the past.

Now, speaking of the business, it is looking for a business solution to a real world problem. I wanted to leave you, before we get to Khaled, with the few key takeaways related to the de-identification issues. We have run other prizes before. This is the first data prize we have run, but we have run other prizes before.

Prizes and competitions derive their benefit from the numerous individuals, from all walks, that participate and try to solve the issue. We have realized, as have others who have run prizes, you just don't know where your best solution is going to come from. This is why you don't want to hire five people and have them try to solve the problem. If we had done so here, they undoubtedly would not have done as well as the competitors in the prize competition.

This has been borne out in the year and approximately four months, during which the prize has been ongoing. Most of the leading solutions have not been created by people who are working in the medical space. Indeed, they are mathematicians, they are hedge fund managers, they are people from all other types of industries who happen to have a gift for doing data work.

We certainly could not have hired all of them. Even if we could have afforded them, they probably would not have all wanted to come work for us. That is the benefit of doing a prize model.

On the other hand, data prizes require the release of data sets. There has to be a way to balance the individual's privacy interest with the greater good to society that would be achieved by solving somebody's bigger problems through crowd sourcing and prizes. Clearly, the problem we are trying to solve is a large one.

We spend $40 billion a year, as I mentioned, on unnecessary hospital utilization in the United States, and we have a health care crisis on our hands. The general use that we are trying to make of the data has larger implications in the general attempt to move from a post disease provision and care model to a pre-disease prediction, prevention and cure model.

I am going to have to, however, leave it to people smarter than myself to figure out where the balance between the two lies. Thank you for letting me address you today, and now I will turn it over to Khaled, who is going to discuss much more specifically the de-identification of the Heritage data set.

MR. EL AMAN: Thank you, Jonathan. I have sent you some slides. I am not sure if they made it, but I will still talk to the main points on the slide.

MS. QUEEN: Khaled, this is Susan. I just sent you an email a few minutes ago about a different person to send the slides to. Have you received that?

MR. EL AMAN: I am Khaled El Aman. I am the CEO of a company focusing on data de-identification, which was contracted by Heritage Provider Network to do the data de-identification work.

I will give you an overview of the technical issues that we faced while doing de-identification for the Heritage prize. I am just going to start off with a number of general observations. First of all, at the time, which was around 18 months ago, we used the best de-identification practices that were available at that point in time. There have been a number of improvements in methodologies, metrics and algorithms that we have developed and others have. This is a very active area of research. I think more can be done, if we were to start again. I will discuss some of the improvements as we go along.

The other point I would like to make is that re-identification attacks are hard to do. They take a lot of skills and resources to do successfully. This should be also kept in mind. We did a review of publicly reported attacks that was published in PLoS ONE at the end of last year. There were 13 attacks, six of them on health data, but only two of them were on data sets that were properly de-identified. We used the HIPAA standard as the basis for definition of properly de-identified.

In these two cases, the hit rate was quite low. I think a lot of the conclusions were if you de-identify a data set properly, using contemporary standards, the probability of a successful re-identification attack is low. The stories we hear about people re-identifying data sets stem largely from the fact that these data sets were not de-identified properly when they were released. The systematic review, I think, makes that point clearer when you look at all of the evidence on one page.

This next point is about the reasonableness criterion, which is the way this issue is addressed in HIPAA. I am going to read from the regs the definition of identifiable health information. Health information that does not identify an individual, and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual, is not individually identifiable health information.

There is the no reasonable basis term in the privacy rule. Also, it requires that the risk is very small and that the information could be used alone or in combination with other reasonably available information. Again, the reasonableness criterion is used in the privacy rule. Here, I am talking about the typical standards for de-identification in HIPAA. We are not striving for perfection; we are striving for something that would pass the reasonableness test. Of course, we have to figure out what very small risk means.

I will describe how we approached that here. In terms of the data sets, we started off with an original longitudinal data set that had information on 175,000 patients or members, over a three-year period. That data set included claims, which have diagnoses and procedures, as well as drug data and lab information. We had the three domains in the original data set.

What we ended up releasing was a three-year longitudinal data set with data on 113,000 patients. It was a subsample from the original data set. It included the claims data and some drug information, and no lab information. A decision was made not to release lab information and to truncate drug information. Again, I will describe the reasoning behind that, as we go through this.

Another important point is that there were a lot of missing values in the original data set, for example, length of stay. This is normal for data sets that come out of clinical information systems. If you look at the Heritage Health Prize data and notice a lot of missing data, it was missing in the original data as well. It was not necessarily a function of the de-identification.

The data set has information on some basic demographics, information about the specialty of the provider, place of service, CPT codes, ICD-9 codes, length of stay, then pseudonyms for provider and vendor and the information about the payment. Then, the drug also had information about the number of drugs dispensed to the patient over a certain period of time.

In terms of the technical issues that we addressed, the first issue is what was the definition of very small, according to the statistical method in the HIPAA privacy rule? We chose a probability of 0.05 for re-identification of a single record. A maximum of 0.05 for the re-identification of a single record, and that was our definition of very small. The reason we chose that was we erred on the conservative side.

I think throughout the whole project, there was a general sense that it was necessary to err on the conservative side, because of the volume of data, visibility of the competition and also the potential consequences of a successful re-identification should it happen. We used the maximum probability of 0.05. That is a little bit lower than the threshold that was used by CMS recently to release their claims data. They used a maximum probability of 0.1, so we are more conservative than them. One of the reasons was the longitudinal nature of the data, but also the data was more detailed than the CMS data that was released.

The 0.05 threshold was consistent with other public releases of data. We are involved with other agencies that release data publicly, and they use a probability of 0.05. It is not completely inconsistent. However, it is lower than the threshold in the more recent CMS release of claims data sets.

Also, the fact that we use the maximum probability, rather than the average probability, is important. Again, that is erring on the conservative side. It meant that we took the worst case scenario and operated on that. We minimized the risk on the worst case scenario, rather than looking at the average risk across all of the records in the data set.
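[Illustrative sketch, not part of the testimony. The difference between a maximum-risk and an average-risk criterion can be shown with a few lines of Python. The sketch assumes, as is common in disclosure control but not stated by the speaker, that a record's re-identification probability is one over the number of records sharing its quasi-identifier values. The field names and toy records are hypothetical; only the 0.05 threshold comes from the remarks above.]

```python
from collections import Counter

THRESHOLD = 0.05  # the "very small" risk threshold quoted in the testimony


def record_risks(records, quasi_identifiers):
    """Return one risk per record: 1 / size of its equivalence class."""
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[key] for key in keys]


def max_and_average_risk(records, quasi_identifiers):
    """Return (worst-case risk, mean risk) over all records."""
    risks = record_risks(records, quasi_identifiers)
    return max(risks), sum(risks) / len(risks)


# Hypothetical toy data: 98 records share one profile, 2 share a rarer one.
records = ([{"age_group": "40-49", "sex": "F"}] * 98
           + [{"age_group": "80+", "sex": "M"}] * 2)

max_risk, avg_risk = max_and_average_risk(records, ["age_group", "sex"])

# The two rare records fail the maximum criterion even though the
# average across all records is comfortably under the threshold.
print("max risk passes:", max_risk <= THRESHOLD)
print("average risk passes:", avg_risk <= THRESHOLD)
```

[In this toy case the maximum criterion would force further generalization or suppression of the two rare records, while the average criterion would not, which is the sense in which the maximum is the more conservative choice.]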

One of the other factors we looked at, we looked at two types of attacks. One was an adversary who may know a member of HPN and will try to re-identify that member. It could be a nosey neighbor scenario or it could be a famous person who is an HPN member. Then you maybe have a member of the press, for example, trying to re-identify them.

The second type of attack we looked at was matching against external databases. The two databases we considered were the voter registration list for California, and the state inpatient database for California as well, for the three years that were covered by the data set. We did some matching experiments with the state.

We did some estimations and simulations for the voter registration list, using Census data to estimate the probability of a successful match with those databases as a potential attack. Strictly speaking, this was not really necessary, because when we manage the risk of a single record being re-identified, we can show mathematically that that manages the risk from matching with the other databases. We did it anyway, just for the sake of completeness, and to see how much buffer we had. We wanted to leave a bit of contingency with a 0.05 threshold for that data.

Also, at the outset, we removed the patients that had sensitive diagnoses. The NCVHS actually has published a report on definitions of sensitive information, so the definitions we used were certainly consistent with that, plus the previous work on rare and visible ICD-9 codes. We also used common sense, of course. In the paper that we published, we list those diagnoses and procedures and types of visits that were removed.

So in theory the de-identification we did would have reduced the risks for those individuals, those individuals with sensitive information. These are members with HIV, substance abuse, certain types of mental health diagnoses and so on. De-identification would have, in principle, protected those individuals. We were concerned about inferences.

The data set was so rich, and as we know, health medical records tend to have multiple domains that are strongly correlated with each other. The concern was if we remove only certain pieces of information, we would also have to test that that information could not be inferred from other information we were disclosing or releasing.

For all of the different types of sensitive information that would have been, at the time, not possible, so the decision was just to remove those individuals from the data set. This is consistent also with practices from other agencies that release data. They would just remove those with the most sensitive information.

The matching experiments we did with the voter registration list and the state inpatient database showed that, for a successful match in any of the three years, the highest was 1.7 for certain combinations of variables: age, length of stay, sex, condition groupings, procedure codes, CPT codes. The hit rate was lower than our threshold for all individual years and combined years. Also, the numbers were very small for the estimated match, should someone try to match with the voter registration list.

Now coming back to the correlation issue, because I think that is quite important. We are concerned that if we try to reduce some of the details in the claims data, and also provide drug information and lab information, that our adversary can use the drug and/or lab information to predict the diagnosis information.

We did a number of experiments with pharmacists, where we wanted to get a number. We know that if you give the pharmacist the drug information, and ask them to essentially reverse engineer the diagnosis, we know they can do this. The question was, how accurately can this be done?

On this particular data set, if the accuracy was low, then that inference channel would not be a concern for us. If it was very high, then of course it would be a concern for us. We did a number of experiments or empirical studies with pharmacists, where we gave them incomplete medical records, and asked them to fill in the gaps.

We found that the success rate varied from about 30 percent to 60, 65 percent, depending on the level of detail we ask them to predict. We didn't find much of an experience effect. We felt that that rate was sufficiently high, and that was a driver for curtailing the amount of drug information that was disclosed in this data set. We have to get out the claims data, but we didn't want the adversary to use the drug data to enhance or create more information in the claims data set that will increase our risk level with these inference channels.

Then, for the same reasoning, we recommended that we don't release the lab data. For the lab data, we actually built a number of models. Also for the drug data, there were a number of machine-learning models that would predict one domain from the other. It turns out that even simple models, such as naïve Bayes, had a remarkably high accuracy for predicting values that we tried to generalize or suppress.

Again, the more information you have, the drug and lab, the more accurate the models were. If you only had drug data to predict diagnosis or procedure codes, they were less accurate, but still had very high F scores. Those were some of the decisions around how much drug data to release and not releasing the lab data.
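[Illustrative sketch, not part of the testimony. The inference-channel test described above, predicting one domain from another, can be imitated with a tiny naive Bayes classifier written against the Python standard library. This is a Bernoulli naive Bayes with add-one smoothing, chosen here for brevity; the models, features and data actually used for the Heritage data set are not described in the remarks, and the drug classes and diagnosis groups below are invented.]

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows):
    """rows: iterable of (set of drug classes, diagnosis group)."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for drugs, label in rows:
        label_counts[label] += 1
        for drug in drugs:
            feature_counts[label][drug] += 1
            vocabulary.add(drug)
    return label_counts, feature_counts, vocabulary


def predict(model, drugs):
    """Return the diagnosis group with the highest posterior log score."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # class prior
        for drug in vocabulary:
            # Probability that this drug class appears, with add-one smoothing.
            p = (feature_counts[label][drug] + 1) / (n + 2)
            score += math.log(p if drug in drugs else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training rows: drug classes seen for a member, and the
# diagnosis group that was generalized or suppressed in the release.
training_rows = [
    ({"insulin"}, "diabetes"),
    ({"insulin", "statin"}, "diabetes"),
    ({"bronchodilator"}, "copd"),
    ({"bronchodilator", "steroid"}, "copd"),
]
model = train_naive_bayes(training_rows)

held_out = [({"insulin"}, "diabetes"), ({"bronchodilator", "steroid"}, "copd")]
hits = sum(predict(model, drugs) == truth for drugs, truth in held_out)
print("share of suppressed diagnoses recovered:", hits / len(held_out))
```

[If the recovered share is high on realistic held-out data, the drug domain would act as an inference channel for the suppressed diagnoses, which is the concern that led to curtailing the drug data.]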

The other issue was the number of claims. Some members had a very large number of claims. They really do stand out. We removed or truncated claims, to essentially cut off the tail of that distribution, because some of the members were really extreme outliers just by the number of claims that they had. If you had some basic demographics like age and gender, and then looked at just the number of claims, they were quite unique.

We developed methods to do this claim truncation, so that these individuals would not stand out as extreme outliers. It was only the claims that were most unique in the data that were truncated; that way, we minimized the impact on the data set. The argument being that the extreme outliers, in many data analyses, would be removed anyway. We tried to be as careful as possible to minimize the number of claims truncated, but also to focus on the ones that were really outliers on all the variables that I mentioned.
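[Illustrative sketch, not part of the testimony. One simple way to cut off the tail of the claim-count distribution while removing the most unusual claims first is shown below. The cap value, the use of data-set-wide code frequency as the measure of how unique a claim is, and the member and claim names are all assumptions made for the example; the truncation method actually used is not specified in the remarks.]

```python
from collections import Counter


def truncate_claims(claims_by_member, max_claims):
    """Cap each member at max_claims, dropping the rarest claim codes first."""
    code_frequency = Counter(
        code for claims in claims_by_member.values() for code in claims
    )
    truncated = {}
    for member, claims in claims_by_member.items():
        if len(claims) <= max_claims:
            truncated[member] = list(claims)
            continue
        # Stable sort: most common codes first, original order kept within ties.
        ranked = sorted(claims, key=lambda code: code_frequency[code], reverse=True)
        truncated[member] = ranked[:max_claims]
    return truncated


# Hypothetical members: m2 is an outlier by number of claims.
claims_by_member = {
    "m1": ["office", "office", "lab"],
    "m2": ["office", "rare_a", "rare_b", "office", "lab", "office"],
}
truncated = truncate_claims(claims_by_member, max_claims=4)
print(truncated["m2"])
```

[Here the outlier keeps its four most ordinary claims and loses the two rare ones, so the member no longer stands out by claim count and the least common information is what is removed.]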

Another important concept was that of adversary power. When you have a patient with a hundred claims, normal risk assessments would say, at least historically, I have an adversary who would know what is in these hundred claims. That is quite an implausible assumption. Nobody would know what is in a hundred claims about anybody. Even the patients themselves don't have that much detail about themselves. With each claim having about seven or eight variables, that is 700 or 800 pieces of information. It just didn't seem plausible.

There is the concept of the adversary power, where we assume that an adversary would only have information about a limited number of claims. For example, if we say an adversary would have background information they can use for re-identification on five claims, then we can assess the risk on that basis. That will, of course, have a dramatic impact on risk, and allow the release of more data, using a very plausible, a very reasonable assumption.

The other problem is if you have 50 or 100 claims, which combination of five claims do you assess? It is a combinatorial problem. We have various methods to assess risk, taking into account this adversary power concept. This is a concept that has existed in the computational disclosure control for some years, not applied to longitudinal health data, but I think it provides a reasonable way to evaluate risk for data sets, where you have multiple instances -- they have transactional type data sets, claims data, visits data, so on.
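[Illustrative sketch, not part of the testimony. The adversary power idea and its combinatorial cost can be sketched as follows: for each member, enumerate every combination of up to `power` claims the adversary might know, count how many members are consistent with that knowledge, and take the worst case. Exhaustive enumeration is only feasible on toy data; the methods actually used to avoid it are not described in the remarks. The member and claim labels are hypothetical.]

```python
from itertools import combinations


def max_risk_with_power(claims_by_member, power):
    """Worst-case risk when an adversary knows at most `power` claims of a member."""
    claim_sets = {member: set(claims) for member, claims in claims_by_member.items()}
    worst = 0.0
    for claims in claim_sets.values():
        k = min(power, len(claims))
        for known in combinations(sorted(claims), k):
            known = set(known)
            # Members whose history contains everything the adversary knows.
            matches = sum(1 for other in claim_sets.values() if known <= other)
            worst = max(worst, 1.0 / matches)
    return worst


# Hypothetical data: four members, each with three of four possible claim codes.
claims_by_member = {
    "A": ["x", "y", "z"],
    "B": ["x", "y", "w"],
    "C": ["x", "z", "w"],
    "D": ["y", "z", "w"],
}
for power in (1, 2, 3):
    print(power, max_risk_with_power(claims_by_member, power))
```

[The measured risk rises as the assumed power grows, which is why the choice of power matters so much: assuming the adversary knows every claim gives the highest risk and the least releasable data.]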

Another important concept was that of patient diversity. Some patients who have chronic conditions, if you know the information in one or two claims, you can predict the remainder, or a lot of the remaining claims throughout the rest of the year or subsequent years. A good example would be a patient receiving dialysis. It is a recurring pattern for those patients. If you know part of that pattern, you can fill in the gaps.

Then you have patients where they have a lot more diversity in their claims. They have a number of acute incidents that are not directly related to each other. Then in computing what an adversary would know, we took that into account. The kidney dialysis patient, for example, if an adversary knew one thing, they could predict a long trail of claims for that patient. Whereas the second type with high diversity, the information content in each claim was smaller because you can't use it to predict other claims. The claims were very diverse and they didn't have a pattern.

We developed a number of diversity metrics and used those also. In deciding how to compute the adversary power, we considered the diversity of each patient, as well. Patients that had low diversity, we would give the adversary a lot of power. Those with high diversity, we give them less power. That way, we essentially tried to customize or adjust the risk for each single patient, so that we can release more data in a defensible way.
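[Illustrative sketch, not part of the testimony. A per-patient adjustment of adversary power by claim diversity could look like the following. The diversity measure (share of distinct claim codes) and the linear mapping between the bounds of 5 and 10 are assumptions for the example; the diversity metrics actually developed are not described in the remarks.]

```python
def claim_diversity(claims):
    """Share of distinct claim codes: near 0 for repetitive histories, 1 if all differ."""
    return len(set(claims)) / len(claims) if claims else 0.0


def adversary_power(claims, min_power=5, max_power=10):
    """Low diversity gives the adversary more power, high diversity gives less."""
    diversity = claim_diversity(claims)
    return round(max_power - diversity * (max_power - min_power))


# Hypothetical patients: a recurring dialysis pattern versus unrelated acute events.
dialysis_patient = ["dialysis"] * 20
diverse_patient = ["fracture", "flu", "rash", "sprain", "migraine", "burn", "cough", "bite"]

print(adversary_power(dialysis_patient))
print(adversary_power(diverse_patient))
```

[The repetitive history is assigned the upper bound, since knowing one claim lets an adversary predict many others, while the diverse history is assigned the lower bound.]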

For generalization and suppression, we used an algorithm called Optimal Lattice Anonymization, or OLA. This is a globally optimal generalization and suppression method that we had developed a few years prior, and we are using that for this data set. I think the article references some material on that.

As I mentioned before, we subsampled the data set. Subsampling, of course, increases the uncertainty and works with the risk metrics. We released 113,000 out of the 175,000, again to allow for a little bit of buffer, in case any of our assumptions were violated in the future. We assumed a power of five, but what if there was an adversary with a power of 10? Would that increase the risk beyond that threshold? We did some (inaudible) analysis and we found that our assumptions would have to be violated quite a bit to have the risk above that 0.05 threshold. We needed that buffer, which was achieved partially through the subsampling, in order to maintain this insensitivity.

Then, the final thing in terms of data modification is that we linked them(?) to protect provider identity, because we had provider IDs, we also had vendor IDs, and information about place of service and so on. The adversarial attack that Jonathan had mentioned used information about provider IDs and so on in order to draw some inferences.

This is not necessarily a privacy issue, per se, or patient privacy issue, per se. It was more on the provider confidentiality. It was possible to figure out the identity of a provider from the information here, by looking at the pattern of patients that an individual was seeing.

You can determine, to some extent, which facility people were working in, by looking at the age of the patients and how many patients go there per year, if it is pediatric or adult. If it is pediatric, you can look at how many visits to determine which is the bigger facility and which is the smaller facility. There are paths that you can walk down, where you can draw inferences about the provider. We made some additional modifications to thwart such attempts, from the perspective of protecting provider confidentiality.

For the de-identification methods that we use, in terms of lessons learned, where would we be now if we were to start to do this again? I think we would look more at the average risk, rather than maximum risk. I think a good case can be made that this can be a reasonable compromise between data quality and the protection of privacy.
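
[Illustrative sketch, not part of the spoken record. It shows one common way to compute maximum and average re-identification risk from equivalence classes, that is, groups of records sharing the same quasi-identifier values. The record layout and field names are hypothetical, and the assumption that an adversary picks uniformly within a class is a modeling simplification.]

```python
from collections import Counter


def reidentification_risks(records, quasi_identifiers):
    """Return (maximum risk, average risk) over all records.

    Each record's risk is 1 / (size of its equivalence class).
    Maximum risk is driven by the smallest class. Average risk is the mean
    of the per-record risks, which simplifies to classes / records.
    """
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    class_sizes = Counter(keys)
    max_risk = 1 / min(class_sizes.values())
    avg_risk = len(class_sizes) / len(records)
    return max_risk, avg_risk


# Hypothetical toy data: four records share one profile, one record is unique.
records = [
    {"age_group": "40-49", "sex": "F"},
    {"age_group": "40-49", "sex": "F"},
    {"age_group": "40-49", "sex": "F"},
    {"age_group": "40-49", "sex": "F"},
    {"age_group": "80+", "sex": "M"},
]

max_risk, avg_risk = reidentification_risks(records, ["age_group", "sex"])
print(max_risk, avg_risk)
```

[In this toy data a single unique record pushes the maximum risk to 1.0 while the average stays at 0.4, which is why a maximum-risk criterion forces more generalization or suppression than an average-risk criterion.]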

In general, I don't think it is necessary to do these matching experiments because by managing the risk from an individual being re-identified, you essentially manage the risk from matching with the external databases. It will always be low. In principle, thinking about these matching experiments may not be necessary for future de-identification efforts.

The issue of correlations within the data set and what is in the data set, especially across domains (diagnoses, procedures, et cetera), is complex and requires careful consideration, especially if you are trying to release a very detailed data set.

Then in terms of improvements in algorithms and so on, an active area of work has been improving claim truncation(?) algorithms and algorithms to compute adversary power. There have been a lot of advances in that over the last 18 months, which can result in more data being released, just because the optimizations are much more effective than they were then.

I think that that would be it. These are all of my comments. That gives us some time to answer some questions, thank you.

DR. CARR: This is really fascinating, a very fascinating analysis. I will open it up to questions. I believe Leslie Francis has the first question.

DR. FRANCIS: I have a very simple set of questions for Jonathan. One is, would you be willing to share a copy of the contract that you ask everybody to sign, or at least some parts of it, with us? The other is, do you have any way of following up, so that if person number one gets a data set, and somehow they were to use it to re-identify, could you figure out that it was the release of the data to person one, rather than to person 53, that had been the source of the data breach, or the source of the effort to re-identify?

MR. GLUCK: With respect to your first question, I would be happy to share with you a copy of the contract. Again, we had outside counsel who works specifically on -- I did not even know that such a practice existed, but it does -- prize rules and contracts for companies such as McDonald's, when they do their buy your Big Mac, I guess, and get your game card thing. I would be happy to share the contract with you.

If someone, for example, downloaded the data, and then shared it ten ways down the line, I don't think we would be able to identify necessarily who that individual who breached the agreement was. Khaled, do you think differently? I don't know that we would be able to do that.

MR. EL AMAN: We did have a discussion as we were doing this of watermarking the data sets, so that if a breach occurred, from this invisible watermark you would be able to determine which account downloaded that version of the data set. It was deemed to be quite complex because it would have generated such variations in the data set each time. The watermark would have to be embedded within the data. You have different versions of the data set that would have to be generated dynamically, and it was a very large data set.

Then, the second issue, of course, was the concern that if different entrants get different data sets, would that be a fair competition?

MR. GLUCK: One of our concerns throughout this entire competition was that, given the magnitude of the final prize, we had to be very careful we didn't wind up with anyone suing us because, like Khaled said, someone got a slightly different data set, which they thought prejudiced their ability to win.

If someone could send me the information on where I should send that contract, I should be able to get that to you.

MR. SCANLON: Jonathan and Khaled, thanks again. This is very interesting. It sounds like the model you are describing is not an open health data set in the sense that you just put it out there, de-identified. It sounds like it is more for restricted use, where you sort of chose who you would release it to under the protection of a data use and contract agreement. This is not something you would simply put out.

MR. GLUCK: I think it is kind of a hybrid, because while it is limited to people who have agreed to sign up and abide by the rules of the competition, I believe that clearly we are over 5,000 competitors. It is not like we chose or handpicked people who could compete. As long as they were willing to agree to the rules, and they didn't live in certain countries which were excluded, they were able to download the data.

We tried to put some very, very broad guardrails around it, but generally, it is pretty open.

DR. GREEN: You may have covered this, and I just didn't digest it. I am interested in what part of an address, if any part, of the individuals whose data are in the data set is included in the released data set.

MR. EL AMAN: There is no address information. There is no ZIP Code information included. I think the decision was made at the very outset not to include the ZIP Code information. If someone really tried very hard, they may be able to infer the facility where treatment was received, by looking at the size of the facilities, and just focusing on the largest and the smallest. I think that would still be hard to do. In terms of geography, there was little there.

DR. COHEN: Many of the open data sets that we work with use county as the lowest level of geographic identifier. How sensitive do you think your de-identification method would be, with the inclusion of county as an identifier?

MR. GLUCK: I would like to begin the answer and then Khaled can follow up on my answer. One of the issues for us specifically is that we are in 11 counties throughout Southern California, some of which have a much more sparse population. Including counties, together with certain conditions, might get us too close to being able to re-identify. If we were talking, for example, only about L.A. and Orange County, I don't think it would be a big deal. We were including in our data set individuals who lived in these much more sparsely populated counties. Khaled, do you want to follow up on that?

MR. EL AMAN: I think also it is an empirical question. We could have added the county variable, and then done a risk assessment when we measured the de-identification. We would have been able to measure the risk with county information included, and then, determine whether that was a problem or not. I think, for this particular data set or any particular data set, the exact answer would require including the variable and measuring the risk on that data.

DR. GREEN: In the rules of the game, were the contestants allowed to make use of any other data set they wanted to, besides yours?

MR. GLUCK: They were allowed to use certain publicly available data sets. If they used a data set, it had to be something that everyone could use.

DR. GREEN: They could go to Healthdata.gov and use any data that they found there in the contest?

MR. GLUCK: As long as it was publicly available. One of the things, and Khaled, I don't know if you covered it completely or if you want to address it, that we actually had to take into account as we were doing the de-identification, and this was something that arose somewhat at the last minute, was the realization that cross-referencing a different data set with ours would have upped the re-identification risks, and we had to account for that. Yes, if it was a publicly available data set, people could use it.

MR. EL AMAN: As I mentioned, we included in the explicit analysis the State Inpatient Database for California, which covers the hospital discharges, did a data matching experiment with a few years' worth of data for that, and also looked at voter registration lists. The age data or the OECD data covered a lot of the variables in the claims, including diagnoses, procedures, demographics, length of stay and so on. That was a good database to match against, because a lot of the fields that were included in the prize data sets were matchable to that discharge data. It gave us a good sense of what the risks were. Those results were also taken into account in the de-identification.

MR. SCANLON: That is the question that I think I was interested in. There are folks who tell us that, from the motor registration list and the voter registration lists and publicly available health care data sets, they can often re-identify. You actually did this in-house to see what was the probability and likelihood that re-identification could take place.

MR. EL AMAN: Right, for the voter registration list, we estimated it. We used a number of estimators to compute, with the help of some census data, what the match rate would be if we got the voter registration list. In Southern California, you are not allowed to get the voter registration list for purposes unrelated to an election. We couldn't use it for a re-identification. We couldn't get it and use it for that purpose.

We were able to estimate the risk, and the match rate was quite low. Then, we obtained the data and did actual matching experiments for the three years with the hospital discharge data. Again, I think that the metrics that we used would have anticipated the results of those matching experiments.

When we manage the maximum probability of re-identifying a single record, if we ensure that that probability is 0.05, then the proportion of successfully matched records would also be less than 5 percent, for any database that overlaps with this data set. I think matching experiments are good for assurance, but their revealing something completely surprising after you have managed the original type of risk would be unlikely.
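
[Illustrative sketch, not part of the spoken record. It works through the arithmetic behind the statement above under a simplifying assumption: a record matched against an external database is correctly re-identified with probability 1 divided by the size of its equivalence class. The class sizes used are hypothetical.]

```python
def expected_match_proportion(class_sizes):
    """Expected share of matched records that are re-identified correctly.

    `class_sizes` holds, for each matched record, the size of the
    equivalence class it belongs to in the released data set.
    """
    return sum(1 / size for size in class_sizes) / len(class_sizes)


# If the maximum risk is capped at 0.05, every class has at least 20 records.
class_sizes = [20, 25, 40, 100]
proportion = expected_match_proportion(class_sizes)

# Each term is at most 1/20, so the mean cannot exceed 0.05.
print(proportion <= 0.05)
```

[Because every term in the sum is at most 0.05, the mean is at most 0.05 no matter which external database supplies the matches, which is the sense in which the maximum-risk threshold anticipates the outcome of a matching experiment.]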

To answer your question, if you do this well, the matching with these external databases would not be an issue.

MR. SCANLON: Depending on the geography, I guess and the detail. It sounds like you curtailed the information on diagnosis and dates of service and procedures, or did you not have to?

MR. EL AMAN: We did, to some extent, yes.

MR. SCANLON: That is traditional.

DR. CARR: This is Justine Carr, maybe I will make the last question. Do we glean from this that there is a generalizable application out of this? Are we to learn from this that, at the end of the day, labs are dicey and ZIP Codes are dicey? Or are we to learn that, given your own data set, you have to put it through these maneuvers to come to your own measurement of de-identification?

MR. EL AMAN: I think there is a general process that you have to follow, because the answer will depend on the data set. We have tried to spell out the steps in the article. I also covered them in the presentations. If you think through all of these issues, then I think you can produce a data set where you can maintain good utility, and then also have strong guarantees at the end.

MR. SCANLON: The contest is still under way, right?

DR. CARR: Wouldn't that be the measure of the utility of the data set?

DR. FRANCIS: Do you plan to recapture the data after the ending of the prize? The reason I ask that is that data sets that are available now may not be the same in five years.

MR. GLUCK: I am not sure what you mean by recapture the data set.

DR. FRANCIS: Is one of your requirements that people give it back at the end of the prize time period, without retaining copies or having sent copies to anyone else? The reason I am interested in that is that if the data hang around for 15 years, the other kinds of data sets that are available, the landscape of what it is reasonably anticipatable that somebody might get ahold of, will have really changed.

MR. GLUCK: The rules require that the data only be used for purposes of the prize. It is not to be used for anything else without special permission. We have actually had a few research institutions that were unable to get other data specifically ask for the ability to use it for other research purposes. We have typically granted those, if they are reputable.

No one is allowed, under the rules, to use it for any other purposes. Because it is data and they have downloaded it onto their computer, at some level we are going to have to, I guess, trust people that they are not going to use it for other purposes. I don't know that a requirement that it be returned or destroyed would add that much to the requirement that it not be used for any other purposes.

DR. CARR: Your work is very stimulating, and as evidence of that, we have two more last questions.

MR. QUINN: This is Matt Quinn. This seems like something, de-identification validation, and re-identification ability, seems like something that NIST could provide technical guidance towards, if they haven't already. My takeaway is that, as opposed to everybody reinventing this for every contest and everything, that guidance does exist today. I will talk to Kevin Stein and Matt Shoal(?) at NIST to see that. It seems like a great joint project with HHS and NIST.

MR. SOONTHORNSIMA: You talked about a lesson learned in terms of balancing the trade-offs, because along the way, you talked about truncating data, claims data, promissory data and so forth. The richness of data, because of re-identification risk, therefore, you start taking away pieces of information, pieces of data. Therefore, the richness of data and the ability to stratify its more useful purpose may have diminished as a result of that. I guess, what is your reaction to that comment?

MR. GLUCK: I would agree. I think again, that is why the balance has to be struck. I am not sure where I am sitting now, I guess about my job, that is a difficult question. I have to leave it to policymakers to figure out where that balance should be. I agree wholeheartedly with the comment.

DR. CARR: Thank you very, very much. We really appreciate you taking the time, and it is very exciting, very thoughtful work, and we are looking forward to seeing who wins the prize. With that, I believe it is almost time to conclude the full committee meeting. Before we do, I want to again express my gratitude to all of the people that I have worked with as chair of the committee, particularly, obviously, Jim and Marjorie, who taught me so much, and our incredible staff, Catherine and Debbie, Marietta, Janine and Nicole, Susan, and also to John and Shanda for helping us with our acoustics. Of course, to the very able staff, to Matt, Lorraine, Maya, all of you, it has been really my privilege to serve as chair of this committee. With that, I will entertain a motion to adjourn.

(Whereupon, at 2:02 p.m., the meeting was adjourned.)