Transcript of the September 21, 2012 NCVHS Working Group on Data Access and Use Meeting

[This Transcript is Unedited]

DEPARTMENT OF HEALTH AND HUMAN SERVICES

Meeting of
The Working Group on Data Access and Use

September 21, 2012

Hubert H. Humphrey Building
200 Independence Ave., SW
Washington, D.C.

Proceedings by:
CASET Associates, Ltd.
Fairfax, Virginia 22030

Introductions - Review Agenda
HHS Chief Technology Officer Perspectives
Overview of NCHS Data Products and Services
Overview of CMS Data Products and Services
HHS Approaches to Dissemination

P R O C E E D I N G S (2:07 p.m.)

Agenda Item: Introductions -- Review Agenda

DR. CARR: We now open the Data Access and Use Working Group. Keeping with our tradition, we want to start on time and move forward efficiently. What we would like to do is go around the room.

(Intro around table)

DR. CARR: Is there anybody who called in on the line?

Agenda Item: HHS Chief Technology Officer Perspectives

MR. SIVAK: Thank you very much for having me today. I just wanted to say, first of all, that I have been hearing some stories all day long about some crazy barbecue that happened last night. All I have to say is that I want an invite next time, so put me on the list.

(Laughter)

MR. SIVAK: Again, thank you for having me. I unfortunately have to keep it relatively short, as I have a meeting with the deputy secretary at 2:30 today that I need to be at. What I did want to do today is kind of go through a brief overview of what we are doing with Healthdata.gov, kind of what some of the directions we are heading in are, and kind of get a sense from you guys about how that fits in with the work that you are doing, suggestions, comments, thoughts, all of that kind of stuff.

I will just dive right in. As I am assuming most of you guys know, Healthdata.gov is sort of the version two of the platform that was launched with Data.gov, sort of way back in the day. We launched Healthdata.gov in June of this year, coincidentally with the third edition of the Health Data Initiative Forum, or Datapalooza as it is affectionately known.

Actually, since about a month or six weeks ago, we have been working on sort of phase two of version two. We are developing this with an agile team, using two week sprints to build new features and things like that. We are going to be releasing this version, this phase, on October 8th, next month. There are going to be a number of primarily back-end improvements to the system. I think the one that is most interesting for the folks here is that we are going to be improving the publishing process and the workflow associated with that, for the metadata associated with the datasets that people put out.

I think a message went out, actually, I know a message went out relatively recently to all of the health data leads within HHS. We are basically creating categories of folks who can go and author metadata entries and then edit metadata entries. It is actually kind of going back to the discussion that was just happening, it is relevant in many ways to that. We need to make sure that the datasets that we are putting out, now that we sort of have the control of Healthdata.gov and what datasets we put out there, we need to make sure that the datasets are both of high quality and sort of conform to the issues of the mosaic effect and re-identification and some of those challenges.

It is actually very important, but I think the new workflow that we are putting in place here will allow us to do that much more effectively. That is just one of the improvements. Keep that in mind.

For future iterations of Healthdata.gov, I am personally very interested in the feedback of both internal and external users, because I think we should use that to guide our development. Today might not be the right forum for this, but I would love to get input from you guys about the direction that you think this thing should take.

I really consider this a marketing tool for us. We have thousands of datasets across this agency or across the department that I think are incredibly valuable. In fact, it was funny, I got an email from somebody yesterday who had gotten the email about the new Healthdata.gov process, and said something to me about a couple of datasets that they have that they don't think are really that important. They are not really sure if it matters.

I looked at it and it is great data. It is data that you can think of a million different use cases for. I think there is an education process still that we need to go through internally. This site is primarily a tool for us to get this information out. The more usable and user friendly it is, the better, the better the descriptions of the data are, et cetera.

Really, I think over the course of the last few years, this has been, as you guys all know, a very intense, ongoing effort. These are some of the high level things that have happened with Healthdata.gov over the past really two and a half years, but primarily over the past, say, six to eight months as we have kind of built this out.

There is the metadata enhancement, or the publishing workflow efforts that we have been undergoing. If you guys have noticed, there's a blog on Healthdata.gov, which we've used pretty extensively to start to publish some good material on how data is being used, what some of these datasets are. I am a big fan of stories, so if any of you guys have stories that you want to promote, things that you want to tell, I think this is the way to engage people. We are happy to use this as a platform for anybody who wants to promote these things.

I should also say that we are looking at ways of enhancing the design. The look and feel, I think, can be brought up to date a little bit. I don't want to put anything negative on the team, because I don't think any of us are really designers. We kind of put this together, without having the expert services of somebody who is very familiar with design, and look and feel and user experience.

We are looking at ways now of incorporating that into future versions. We might even run a design challenge on Healthdata.gov, which I think would be kind of fun. That is all to say, marketing tool, if you guys have stories, bring them on. We are happy to use the platform to kind of publish those things.

I think a big step forward was providing programmatic access to the metadata catalog through APIs. That is really a first step. I think the real holy grail is providing an API for all of the data that we have, and that is obviously a much tougher lift, but something that is obviously right in line with the digital strategy, and everything that the entire federal government is kind of moving towards. I think we are seen, rightly so, as leaders in this space, and so we can continue this by moving aggressively down this path.

This next bullet point is a bit of a misnomer. One of the things that we are working on doing is enabling the ability to actually store data within Healthdata.gov. Right now, it is just a metadata catalog. That is great for the larger agencies and the folks who have some resources to actually host their datasets. There are lots and lots and lots of examples of smaller entities in HHS that don't have this ability. We think that it would be very useful to be able to provide them a place to maintain some of their content and information.

At the same time, using features of the platform to actually provide these APIs automatically on the data. This is something that we are working towards. With this new release, we will have at least the sort of basics of that capability. That is something to kind of keep in mind, going forward. If it is a heavy lift to create an API for a specific dataset, it might even be easier to bring that dataset into Healthdata.gov, and then use the tool itself to provide the API and the functionality that we need.

Then, finally, we are big fans of open source. It is the whole reason that we are building this on CKAN and Drupal and Solr. We are working now towards releasing the entire code base as a project on GitHub in the next few weeks. We want to encourage development of this. Much like the data, we don't want to lock up the code within the walls of HHS, so we are trying to release this and get this out there as much as we possibly can.

One thing that has really kind of struck me about the whole data movement over the past three years, and I have been working on this stuff since I was with the DC government a few years ago, is that we have had this big push to get data out. We haven't done a ton in terms of helping people understand what the data is all about.

I did this before, I will do this here because I think it is kind of fun. Just as a really simple example of a dataset that we have, that I think is very valuable, but it is not accessible to normal people, is one called the National Plan and Provider Enumeration System Downloadable File. You guys all know what this is? I figured everybody here would know what that is.

Pretend that you don't know what it is for a second, and listen to the first sentence of the description of this dataset. The National Plan and Provider Enumeration System, NPPES, Downloadable File, also referred to as the NPI Downloadable File, contains FOIA-disclosable NPPES health care provider data for health care providers who have been assigned national provider identifiers, or NPIs.

I will admit, I read that and actually one of my guys that you all probably know, Arnaub Chatterjee, was showing me how we had included this dataset as part of our health data initiative starter kit, and how it is a great dataset. I read the description and I was like, I have no idea what that means. I said, what is that, and he said, oh, well, it's not totally complete, but it's pretty much a list of all of the doctors in the United States. Why didn't we say that? That's a useful dataset, right?

Every time I have said this to an audience that is not primarily composed of subject matter experts, their reaction is, wow, that is useful. We could use a list of all of the doctors in the United States, along with their location and specialty and all of that kind of stuff.

To me, that's a perfect example of some of the work we need to do, in terms of really broadening the tent and talking to a much wider group of people. I think there is a lot of talent out there that we can take advantage of, that we are missing right now. One of the efforts that we are going to be working on is, for lack of a better term at the moment, call it data education. We want to basically use the skills that we have internally, to translate things and make them available to people on a broader basis.

That is kind of one part of that. Another part of that is we have got people here who have been working on these datasets for a long, long time. Up until now, they really haven't had much contact, in some cases, with the people who actually could use the data, or maybe who aren't using the data, but would like to. I think that there is some value in that two-way conversation, both from the perspective of the users of the data, who might not know what it means, and from the perspective of the people inside.

Right, I mean, if we could have the person who creates the NPPES or maintains it say, hey, it is a list of all of the doctors, and here is how we collect it and here is how you use it. That would be helpful to people. At the same time, I think it would be helpful to the people inside, who are working on this stuff, to see how people on the outside want to use it, how they can use it, how they are using it. I think that will help influence how we develop things going forward. We are making an effort now to connect people, both virtually online, but also in person, and I will talk about that in a second, to kind of help spur some of these developments.

Finally, something else that really struck me as being a little strange was that we have our Healthdata.gov project, which is based on Drupal and CKAN and Solr, going on. Simultaneously, the other federal government entities have this thing called the OGPL going on, which is based on CKAN and Drupal and Solr. We are both doing it because they are open source projects, and because we want to be able to kind of control the development of these things, or at least control what features and functionality we think are important, we put our resources to. We want to own the code. We want to build communities around this stuff.

Right now, we are building two separate communities, which really doesn't make a lot of sense to me, because it is on the same stuff. We have recently started a process to get everybody together and start working on the same code base, to kind of put these projects together. Obviously, we will have different features and functionality that we prioritize, but that is fine. That is how open source works, right?

I just feel like if we put everybody together and get a single code base going, that is this project, we will have much, much greater buy-in from the rest of the world. We will have a virtually unlimited pool of developers to tap into, that hopefully can expand the scope of this rapidly. So far, this process has actually been working pretty well.

It looks like there is a lot of international interest from the folks in India and the U.K. and Canada. The U.S.'s been really psyched about this. The guys in the Healthdata.gov team are pretty excited about this, so I think this can go somewhere. For any of you guys who want to pitch in on this effort, I think it would be great. The more support we can show for a unified band of people, kind of working together on this thing, would be really helpful.

Let's see, I mentioned the memo that we sent out to the health data leads already. If anybody has any questions about that, feel free to come talk to us at any time. The basic timeline is pretty straightforward. We want the leads to identify their internal teams, the authors and the editors, by October 1st. We want those leads to then compile a catalog of just what datasets that we are talking about by the 10th.

Then, we are going to do a training webinar to kind of walk through how the system actually works and what the new functionality looks like, on the 10th, as well. Then, we will hold office hours until December 1st, to kind of answer any questions that come up. This is the general idea, that we want to get across.

Now, in that webinar on the 10th, one of the things we are going to be doing is refreshing people's familiarity with the privacy issues and the data quality issues. I think that previous discussion, a lot of that stuff is probably relevant. It is probably worth getting our guys in touch with some of the folks here, before that happens, in order to make sure we have the current thinking kind of in place.

I think the other thing I should say is that over the past few years, we have spent a lot of time focusing on the idea of changing the culture at HHS from this default setting of closed data, to a default setting of open, with thinking about what to do after, as opposed to a blanket restriction and then trying to open it. I think that has been remarkably successful. I just showed up here a couple of months ago, and I can tell you, just based on my conversations, it is almost a presumption of openness now among most people. I think we have done a great job there.

There are sort of pockets of resistance, I would say, within the department that we still want to talk to and get them sort of onboard with this whole thing. We are undertaking this process of meeting with the data owners at the agencies, with everybody at pretty much every level, to kind of again explain the importance of this, walk them through what we have done over the past few years, talk about some of the challenges, some of the successes, make sure they understand that this isn't really a terrible thing and there are lots of good things that can happen from it. Then, also, get feedback from them about what some of their issues are, how we can help resolve them, what we need to do in order to make this work better.

We started with the Administration for Children and Families. We met with them in July. That meeting actually led directly to a whole bunch of data being released just a few weeks ago, which is incredibly useful actually for many, many, many purposes. As part of this effort of getting the people who know the data together with the people who can use the data, we partnered up with some folks at the Greater Baltimore Technology Council, who are running a set of events called Unwired and then Groundwork.

Unwired was the first one, and the idea was to basically define the problems that existed in the city of Baltimore, figure out what assets were available to solve those problems, have everybody form into teams around specific projects designed to solve those problems. Then, the Groundwork event, which happens next week on the 28th and 29th, is to actually build those solutions, with the time being spent in those intervening weeks to actually kind of add more assets to the projects and things like that.

It turns out a lot of the ACF data is useful for some of those projects. We had people at the first event, at Unwired, from ACF and from SAMHSA, who were there to actually provide their expertise. They are going to come back to the events on Groundwork, be part of the teams and help explain what data is out there, how it works, all of that kind of stuff. We are looking forward to seeing how that goes.

It is an experiment for us. Even if it is not successful, I think we will learn from it. I would like to see us try to replicate this in other places. We have already had some interest expressed from folks in Philly, from some folks in Boston, some West Coast cities. I think there is this really interesting idea of connecting people who understand the information with people on the ground who know the problems, and have the access to other assets, to try to mash it all up and do some interesting things with it. This is another thing that we are working on.

Finally, we are having our biannual meeting for the HDI data leads in November. We will send out more information on that, so stay tuned for that. Almost finally, we are working on a bunch of different things externally.

We have this regional affiliates program for the health data initiative, the idea being that there are lots of interested parties around the country, who want to be a part of this and kind of do their own things in their regions. We are encouraging as much of that as we possibly can. We have some great examples here. In Cincinnati, two weeks ago, there was an event focused around a health company accelerator that had some really interesting applications come out of it.

There is an event, the first week in October, in Boston, a Massachusetts regional affiliate, kind of along the same lines. There has been a bunch of activity out there, and we are encouraging more and more folks to kind of sign up for this program, that are interested. Our job partly is to support those efforts, and to help those people kind of do as much as they possibly can. I should say you can find more details on this at the URL at the bottom of the slide, hdiforum.org. Again, if you have any questions, feel free to contact me.

Finally, the event that you all have been waiting for, the Fourth Edition of the Health Data Initiative Forum, or Datapalooza. Save the dates were just announced, June 3rd and 4th, next year. Mark your calendars, don't plan anything else. Be there the whole time, it will be fun. We are in the planning process now, so I would, over the next few months, just keep an eye out for lots of different pieces of information coming through around schedules and agendas and suggested topics. If you guys have any thoughts, let us have it. We are definitely open to suggestions. That is about it. I am happy to answer any questions. I think I can stay for one minute.

DR. COHEN: If data are going to be housed in Healthdata.gov, how is that going to interact with the Health Indicators Warehouse?

MR. SIVAK: That is a good question. My personal opinion is that the datasets are slightly different, or they can have slightly different purposes. It is an alternative, in the sense that there is a different use case for it, maybe. The Health Indicators Warehouse, there is a lot of analysis that goes in there, and a lot of sort of aggregated data and things like that. I am imagining that, if we want to provide programmatic access to raw datasets that don't currently have a home, it is not a bad place to do it. These are all things that can be worked out. This is an initial stab.

DR. COHEN: Is the focus going to be on individual level or aggregate level or whatever is around?

MR. SIVAK: I think it remains to be seen. I would say that kind of whatever is around would be my answer right now, but that is something that we can talk about.

DR. WARREN: I love the fact that you use the NPPES as an example, because you described it wrong. When we are using these datasets, we need to be very careful, if we start changing the wording, that we have accurate descriptions. It is more than just physicians, it is any clinician that needs an identifier in order to be out there.

MR. SIVAK: I want to ask you a question about this. That is a great point, and this is something that I have seen kind of throughout my career, when talking to any subject matter expert in any field. Actually, my best friend in the world, actually he is a philosopher, right, I used him as an example of this.

I went to his thesis defense because I wanted to be a good friend and show support. I sat down and it was a four-hour long affair, with him and the committee that was sort of interrogating him. They spent four hours talking about his thesis, using words that I understood, I had heard before, they were part of my vocabulary. They were using these words in combinations that literally made no sense to me. I understood maybe 15 percent of his defense. I like to think I am a pretty sharp guy.

What I realized at that moment was that, in technical fields, we have a language that we use because we have to. We have to be precise and we have to use very specific words, in order to communicate what we are trying to say. At the same time, if we are trying to involve a broader group of people than we typically talk to in our technical fields, I think we have to give up some of that precision, in order to be more approachable. I think this is a potentially interesting example of that, right?

DR. WARREN: As long as we don't disenfranchise other people.

MR. SIVAK: Sure, and absolutely right, but I think there is a way to phrase things in sort of lay language that might not be as precise as using the word, provider, right, but might be more accessible to folks. My gut tells me that we can involve a lot more people if we do that. It is something that I am happy to have the conversation about, because I am interested in this quite a bit.

MR. SCANLON: Bryan, just before you go, this is the group we have just formed. We have folks who know the data area, the public health area, the community health area and the technology area, and folks who have helped plan previous Datapaloozas. We will be using them, and it is available to your office, as well, as resource experts on how we are doing with HHS data. Do you have ideas about how we can reach out better to the developer community, and what kinds of datasets do you think would be more useful?

The other thing is, as a FACA, you can hold meetings with the public, and with anyone really, any group, and be covered under agency consultation. Again, it provides the opportunity for open meetings with these communities for the department, as well.

MR. SIVAK: That is great. I am sorry I have to run. If anybody has any questions, please feel free to just drop me an email. It is just Bryan.Sivak@HHS.gov and I am happy to help.

DR. CARR: Thank you so much, this is great. It was clear and we understood it. All right. Now, Jim, I keep preempting you, but back to you, review charge and agenda.

MR. SCANLON: To review, we invited you, you were all recommended highly to serve on this working group. The focus really is, as Bryan said, we are not developers. We have a lot of folks who are experts in data and surveys and research and programs, but we collect a lot of data. Some of it is intended to be public, and they have extensive dissemination programs. Other data is more administrative and you sort of have to turn it into a data product.

What we are going to try to do, and we really look to you for help, what would be the best way. We would like to expose you to the kind of data that HHS has. To be honest, what you see on Healthdata.gov is probably the tip of the iceberg. That is the data that can be put on an open health data website with no restrictions. I think you know, that is probably not what we can do most of the time, other than directory data or location-based data, public lead data.

We have a lot of other data. We have all of the CMS claims data, for example, that can be made available for research, analysis, quality and so on. We can't put that, for the reasons you heard at the previous discussion. We can't just put that on a public website and cross our fingers. We would all be in jail before the day ended.

What we are asking is your help on what data do we have. We will talk about how we make it available now. If you could advise us on how we would reach out even further, I think you particularly could develop a community. How would we even interact more, because again, our folks sort of stop when they publish or move on with the next study. You really know how that data is taken and applied. Many of you make a living doing that actually, so that is the kind of advice that we would need.

All the while, thinking of protecting where we are making it available, how do we be sure to protect the confidentiality, and generally give us advice. I think we will start, as we will today, with two of our biggest data holder producer organizations, the National Center for Health Statistics and CMS. We will take you through some of the others, as well, at Public Health Data, and we will be looking for more advice.

DR. CARR: I would just add, I realize this is our second meeting, and Todd really asked us to work quickly, and we took the summer off. It is our intention to be nimble, quick, focused and really take this. In fact, Bruce brought forward an excellent application where we may begin to say, here is what is out there, and here would be a way to use it. We will talk about that later. I think we are going to hit the ground, working hard, before we leave here today. With that, Jim, can we turn it over to you?

MR. CRAVER: Thank you. I was hoping to squeeze another 10 minutes out of your agenda, but that has been taken from me.

DR. CARR: No, we are flexible. We are here to learn basically.

Agenda Item: Overview of NCHS Data Products and Services

MR. CRAVER: I am here on behalf of the National Center for Health Statistics. Really, my objective is to give you a whirlwind tour of some of the pieces that are available to you. I am going to try to jump back and forth, between the Web and the presentation. If I am going too quickly, stop me.

DR. CARR: Did you say the Web?

MR. CRAVER: Yes.

DR. CARR: Are you familiar with the HHS building?

MR. CRAVER: Yes, I am, so I can stay within the presentation, if I have to. Thank you for the reminder. These are some of the areas that I am going to touch for the presentation, just a brief overview of some of the data systems and how we think of them, how we organize them. Then, ask the question or answer the question, why might we approach a single health topic from multiple perspectives with multiple surveys. Then, dive in to some of the tools and some of the resources that are available to you and to the public really.

Again, my objective is that so your familiarity with NCHS and its resources is increased by the end of this talk. Also, for you to be empowered and able on your own, to go even more in-depth into some of those resources.

Just broadly speaking, the types of data sources that we usually refer to within NCHS are the vital statistics system, the births and deaths, mortality and natality datasets. We have surveys of individual people, and that includes person to person interviews, computer-assisted interviews, knocking on doors or telephone surveys, as well as bringing people through our Mobile Examination Center, the MEC, which not too long ago was sitting out in front of this building.

MS. GREENBERG: It still is.

MR. CRAVER: It is still here? I came in the back way.

MS. GREENBERG: We had hoped to visit it during this meeting, but there was just too much going on.

MR. CRAVER: That is a lost opportunity or missed opportunity, too bad. It is maybe four and a half trailers, I know they say five, but the last one is a small, short trailer. For our new survey, the National Youth Fitness Survey, which is bringing in youth, young children, and adolescents, and actually measuring their fitness, and their height and weight. It is really the gold standard of clinical measures of individuals.

You can imagine the provocative statement that I always say is, imagine you walk up to someone and ask them what their weight is and you will get a number. If you put a scale on the ground in front of them and ask them to step on it, you will get a different number. You get a sense of kind of the reasons why we like to approach topics from multiple perspectives.

We also have the National Health Care Survey. It is a family of surveys which do survey providers, doctors, clinicians and others, who are in the health care arena. That is really quite a broad area of surveys that are coordinated and ongoing. Those tend to be of administrative records or records that are abstracted from hospitals and clinics and other places where people interact with the health care system.

I have sort of hinted at or touched on some of the reasons why we have multiple sources. You have probably guessed already, to really capture all aspects or all facets or all sides of the health care industry, the health care system, health care as it is used, and health care as it is not used. We even have estimates of undiagnosed disease prevalence and incidence rates.

We also use our data to look at some of the methodological issues regarding collecting the data and analysis of the data, how best to understand the data that we have, how best to combine that data with other datasets. Also, to look at extending a data system that we have through linking it at the record level to other datasets. We have an ongoing program, we call it the Linkage Program, that takes Social Security Administration data, and takes our survey data, and it takes CMS data and census data, and we merge that together at the individual level, at the record level. We essentially come up with another dataset that really has a full range or a fuller range of information at the record level.

Your discussion about disclosure and re-identification starts to make some of the people at NCHS very nervous, because we do have this kind of data that is tapping into data sources from multiple agencies, and really intentionally doing that. Then, we have to release that data in an aggregate form, in a form that is perturbed in some way, or guards against that disclosure. Or provide access to researchers to the raw data that we have, in a very secure format, with high levels of assurance from them that they won't do anything that they shouldn't do with those data.
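As a rough illustration of the record-level linkage and guarded aggregate release described here, a minimal sketch in Python follows. The toy records, the shared `person_id` key, the field names, and the minimum cell size of 2 are all assumptions made for illustration; they do not represent the NCHS Linkage Program's actual data, identifiers, or disclosure rules.

```python
from collections import defaultdict

# Hypothetical toy records; field names and the "person_id" key are assumptions.
survey = [
    {"person_id": 1, "region": "East", "has_diabetes": True},
    {"person_id": 2, "region": "East", "has_diabetes": False},
    {"person_id": 3, "region": "West", "has_diabetes": True},
    {"person_id": 4, "region": "East", "has_diabetes": True},
]
claims = [
    {"person_id": 1, "annual_visits": 5},
    {"person_id": 2, "annual_visits": 1},
    {"person_id": 3, "annual_visits": 7},
]


def link_records(left, right, key):
    """Inner-join two lists of dicts on a shared identifier (record-level linkage)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def aggregate_with_suppression(rows, group_field, min_cell=2):
    """Count rows per group; report None for groups smaller than min_cell."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[group_field]] += 1
    return {group: (n if n >= min_cell else None) for group, n in counts.items()}


# Link at the individual level, then release only a guarded aggregate table.
linked = link_records(survey, claims, "person_id")
table = aggregate_with_suppression(linked, "region")
print(table)
```

The linked rows stay internal in this sketch; only the aggregated table, with small cells withheld, would be published. Small-cell suppression is just one simple guard, swapped in here as a stand-in for whatever perturbation or review process is actually applied.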

These are the tools. I am going to get up and grab my water, because I am starting to get dry. It is warm in here for me. These are some of the tools that I will touch on. I will try to jump over and take a look at those. If I don't succeed, because it will actually take me a little bit longer if I do that, but if I don't succeed, I will stay within the presentation.

Just looking at the homepage for NCHS, I will give you a little bit of the geography. This is maybe a month and a half old. Our website was revamped. We are very proud of the work that went into this. I have to tell you, though, as an older user of the NCHS website, it took me about three days and a phone call to find out where FastStats was. There is a big red arrow, that is to help you. It was right there in front of me, and I couldn't find it. The person who I called was very nice and didn't say anything bad.

Scrolling down the screen, you will get to the data access areas, and then you will get to the additional resource areas. Obviously, this page has a lot of information and a lot of links that go off into different parts of NCHS. These five red arrows point to the five areas that I am going to look at today.

The first is the Health Indicators Warehouse, the first one that I am going to feature. I appreciate the question earlier, Bruce. It is one that I think about often. Bryan and I have talked, and we will have ongoing conversations about it, so I have my thoughts that are not inconsistent with some of his thoughts.

The Health Indicators Warehouse is all aggregate, public use data. There are no individual records, it is all data that anyone can come and use, access, download and access through an API. The system has the ability to graph data. There is some linking to interventions. We use, right now, the community guide to preventive practice and the guide to clinical preventive practice for some of those. It was an idea that was proposed very early on, that we don't just put out indicators like the number of people who have diabetes. If someone in a community wants to do something about that, how can we point them in a direction that they then might be able to do something about.

Because that data is available through our API, it can then be used by third-party developers, which is really one of the things that this project is trying to drive also, not just open data, but also to seed that community of developers, third-party developers, who aren't necessarily in public health, population health circles, but are developers, and want access to high quality data.

We have about 1100, I think it's 1170, 1169, indicators on the system right now. It comes from a variety of data sources. I am embarrassed slightly that CMS is not called out specifically here. It is included, of course, under HHS. We do have maybe about 150, 170 indicators that are from CMS, and many of those are only available through the health indicators warehouse.

Currently, those indicators only have a single year of data, 2008. We have, and are about to make available to the public, four years of data, and then, very soon after that, five years of data. There is going to be a little bit of a tweak for the interface, but that will happen quite shortly.

Let me get to the homepage and see if I can come over. Here, I am on the browser, and I am on our homepage at NCHS, and I am going to dive down the Health Indicators Warehouse. Just a quick geography, we have three main areas by topic, by geography or initiative, to get into the warehouse. I like to say that these are basically three doors into the same very large room.

What a user will do is select a topic and jump over to what we call our filter page. Immediately, you are presented with a subset of the 1100 indicators, related to the topic that you are interested in. I am just going to pick something. By the end of the day, this might be relevant. I am looking at binge drinking in adults.

We don't force you to read the metadata, but we do force you to click through the page that has the metadata on it. At NCHS, we do think that that is very important. You don't really know what indicator it is you are looking at, just by the title of that indicator. You need to know something about the methodology, something about the numerator, something about the denominator. We, at least, put that in front of you.

We also tell you the data source and what years might be available for these data, and some of the dimensions or variables that we have. It is a quick click over to the data display, which we start with a chart. In three or four clicks, you are looking at the data that we have on a particular indicator. This interface is to help you get familiar with the content of the warehouse. We then let you download these data, on an indicator by indicator basis. We also expose the database through an API, so you can gather as much about all of the indicators as you want.

DR. TANG: Who determines the 1100 indicators the first time, and who determines that HHS will actually maintain it?

MR. CRAVER: A really good question, it is one that we struggle with. We started with what we refer to as an initiative. We made the assumptions that there are groups out there, within HHS and closely aligned with HHS, that have already done that heavy lifting. The obvious one for us is Healthy People. The data stewards for Healthy People happen to be in NCHS. They happen to be about 50 feet down the hall from my office. It has been a very nice collaboration.

Now, ODPHP and the assistant secretary for health, they have their interests and their agenda, and their audiences for Healthy People. They also have a heavy respect for the data that go into that. Right now, well over 50 percent of the indicators in the warehouse are associated with that initiative. We also have a subset, not a complete set, but a subset of the county health status indicators, and the county health rankings of the community health status indicators, as well as, as I mentioned earlier, the CMS indicators. Together, that adds up to 1100, 1200.

Moving forward, it is a very good question, because what we like to do is receive data already processed for us. We do some quality control checks, quality assurance checks, on that. Within this project, we do some data programming, also.

We try to increase the efficiencies with the other projects that are happening at NCHS, including Health U.S., which I will talk about in a second, which programs a lot of data to make available to the world, Health Data Interactive, which I will also talk about, programs a lot of data, makes it available to the world. But it makes it available in a different way. Both of those projects and products have their own history and their audience and their stakeholders.

We think it is reasonable, and we do have a governance body; the overseers come from the NCHS Board of Scientific Counselors. Then, we have our indicator advisory group, which is made up of members, of representatives of different HHS agencies. We have a statistical standards group, which is made up of staff of NCHS, and representatives of some of the data sources.

Just to make sure, we have a couple of levels of criteria before we put any indicator and any data in here. For example, if you have a survey, we want to know what the response rate is and we want to publish that. We don't want to just say, it has a good response rate. We want to include confidence intervals where it is relevant, or standard errors where available, those sorts of things. We make sure that that is true.

Each year, when a survey repopulates its publicly available dataset, then we go through and we process it as quickly as we can. It is not a small task, as you can imagine.

I have jumped back to my presentation. Just to move along a little bit here, a couple of screen shots --metadata from a different indicator, stroke deaths. Here is just an example of a different way that you can display the data. We saw the chart format for binge drinking. Those data are also available in map format, state by state. Where available, we have county level data, and that is mappable, as well. As you can guess, most people want the lowest level of geography as possible, for as much data as possible. We assure you we do that. Unfortunately, that mainly means county level.

I am going to move over to Health United States, and I see that I am going long here. I am going to stay within the presentation here. Health United States, I actually meant to bring the volume. Who has seen or who owns a copy of Health United States? Everyone here should have it.

MS. GREENBERG: It is routinely sent to the members of the National Committee. Actually, for comment, but also, we will include the workgroup the next time it goes out.

MR. CRAVER: I think the first time the workgroup met, we distributed the Health US in brief.

MS. GREENBERG: If anyone doesn't have it, let us know.

MR. CRAVER: You are very familiar with Health United States, the reason for its being, the fact that it is legislatively mandated, and that it covers the broad range of sources and topics. It provides, for example, this window onto the world, how might you start to describe selected trends in health care use? There are tables on preventive services, prescription drug use and inpatient surgery. Here, they are associated with the different surveys that we use as the sources. I just have some example slides, pulled from the text of Health United States.

One of the interesting developments just around the corner for Health United States is a collaboration that we have with elaborate(?) medicine, to make an interactive version of the Health United States available to the world. You can dive deeper and deeper, and deeper still, into these tables, deeper than what is in the published version, deeper than what is on the Excel sheets that you can download for the content of the warehouse. That might deserve its own presentation at some point.

I mentioned Healthy People earlier. We have recently transitioned from Healthy People 2010 to Healthy People 2020. That allowed an opportunity for people to review the indicators that are there; the number of objectives has increased. That, I think, just reflects the additional feedback and visibility to that project. Moving forward, their challenge will be to track that data and to keep that data updated. One of the things that they have done is to home in on what they are calling their 10 leading health indicators. They are producing reports about those 10 leading health indicators for Healthy People.

The example that I have pulled out here has to do with heart disease and stroke. For Healthy People 2010, there were 17 objectives tracked. Now, there are 18. There were five data sources. The same data sources are used, but there are an additional 31 developmental indicators. Developmental indicators are, if you want to think about it, sort of aspirational indicators. They are indicators or objectives that ought to be collected and ought to be available in order to more thoroughly understand the area or the topic.

This is a capture taken of the final review for Healthy People 2010. I think it is really worth looking at that PDF. The URL is up here, and I apologize that I did not have the slides to distribute. If that is necessary or desired, I would be happy to do that.

MS. GREENBERG: We will post them. Is the working group on the SharePoint, too? We will post them publicly, because it is part of the public meeting. We can also put them on the SharePoint.

MR. CRAVER: I am sorry you don't have them in front of you. This is, I think, a really creative way to display a lot of information and to look at the trends on several indicators or objectives for a topic area, all at once on one page. This really does capture a lot of complexity. You can stay at that surface level, or you can look at the details and really gather a lot of information.

Health Data Interactive is another project that NCHS has. It also has aggregate data. It also has indicators, or in this project, we talk about tables. They are tabular views of the data. I am going to hold this up as I am talking about this, because it really has an interactivity that I think is worth understanding. Once you understand that interactivity, then you can really see the value of this site and why it exists separate from the warehouse.

When you drop into HDI, as we call it, it was the original HDI, I will have you know, it is not that Health Data Initiative, we had the name first. You come to our splash page with the table topics. When you click on a table, this screenshot shows the results if you had searched for the word, asthma, you get the list of tables inside the application. From there, you can look at charts, you can look at tables.

It is much more interesting if we look at, let's say, okay, ED visits in the United States. What I want to show you is this is our opening view for this table. We think that this probably meets most of the needs for most of the people who come to this page. However, these little shaded parts allow you to click and drag, and customize the view of this table. Suddenly, you have taken this dimensional hyper cube of data and twisted it, and shown a different face that you might be interested in.

Perhaps, you want to look at diagnosis by regions of the country. If that is not how you want to display it, then maybe this is how you want to display it. HDI, Health Data Interactive, lets you take these 50 odd tables, and not just look at the data that is there, but start to manipulate it and start to come up with a way of changing what you see, being able to focus on something that you are interested in exclusively.

Now, this is really your table. This is your customized view of the table. Of course, this can be looked at in chart format and, where available geographically, in a map display. That is Health Data Interactive. That process is the same process that was used to create this customized chart. It really is an interactive tool for manipulating the data.

Lastly, I am going to mention FastStats. FastStats is the place to go, if you don't know the number, but you know the descriptor of the number. Say you want to look up the number of people who have diabetes and what is the current value of that. FastStats is the place to go. That is in the upper right quadrant of the homepage. It provides a quick access to a long laundry list of topics that we maintain and update. Whenever there is new data on a topic, we go in and we update that FastStats page. Here is an example page for diabetes. We cover that for, in this case, morbidity and health care use and so on.

That is the end of my talk. I encourage you to take a look at each of these tools, to explore the NCHS webpage. That is really your portal to these tools. Each of them has an approach and an audience and a reason for being, that is slightly different from the other. We are, at the backend, really trying to increase our production efficiencies.

One of the other things that certainly the Health Indicators Warehouse has spurred us to do even more than we already do is to pay attention to harmonization and pay attention to standardization. That is flowing back and forth, as we speak, with the Healthy People project, with Health Data Interactive, with CMS. It is an exciting set of tools that we have. I hope I have exposed you to some of those, and that you are able to take them and run with them. Any questions?

DR. CARR: I just want to say, it is great. It really is well thought through. It is sort of user friendly, because it makes sense where you want to go and where you find it. I really like it.

DR. ROSENTHAL: One of the external developers that Bryan and company are trying to target, what has been their reaction to it? Speaking as one of those, one thing that springs immediately to mind would be an entity diagram. My question is, in terms of target groups, you mentioned it was for different audiences, different views.

One of my questions was, how do you reconcile this with what Bryan was saying, in terms of what is the nature of the reaction of external developers. From an external development kind of perspective, when I am looking at things outside health data, the very first thing I would expect to see is a big ole entity diagram up in the middle of it.

I am looking at interactive tables. If I want to develop, it is not enough to know this metadata is defined in sentences. I need to see what is this thing, and this goes for CMS, too, parent, county, org, et cetera. What is this thing, this piece of metadata, where does it exist as a piece of data or as a summary. It is very, very important for someone coming in, who doesn't understand the nature of the difference between physician, provider, clinician, et cetera, et cetera.

Show me that physically, what is the nature of that relationship. Typically, call it an entity diagram. We have spoken in other committees with some of the NORC and IMPACT people. They said there is no reason not to share that.

If I am coming in and trying to put on HHS's glasses, and take a look at what the world looks like through your eyes, when you're looking at your data, what is the single quickest thing I can see, I immediately go to that. My question is, this is all fantastic and wonderful, but from a development point of view, that would be the absolute first thing.

MR. CRAVER: I don't disagree. The warehouse is, to date, the data that is available through an API, and that is the place where there is the most critical need for an ERD. One of the issues that we have as a federal project is security. We have some people who are at one end of the continuum of locking down everything, and we have other people on the other end who say, it is public use data and there are no secrets.

For systems like that, we have to make sure that we are not handing the tools to someone who wants to crack it and open it up, and do something nefarious. I know that same continuum of people who anticipate that and worry about that, and make me as a project officer and project manager behave a certain way because of their fears, whether they are real or not, they are what I am compelled to follow.

We do have plans to put up an ERD that should be sufficient. I do want to engage on a one-to-one basis with people who are trying to use our API, so that I can learn from them how to improve it, and what other resources. We do have user's guides and we have data dictionaries and those sorts of things.

There is at least one piece missing, and it may be just the ERD, but I think there is another piece missing, too. I actually have a person on staff, who I am working with, I basically said, you go use the API and you build me an example that I can post, an example app. I am circling back around with him in the next couple of weeks.

DR. ROSENTHAL: I have actually prepared some slides, and when we can get into what we could do, showing tangible examples of how to do this, especially with things that are very concrete, specific examples that have very little to do with security, in terms of nature of relationship between parent or contract. I can share those with the committee.

MR. SCANLON: Jim, if I remember correctly, for the warehouse, anyway, the backup data that supports the graphics and others, isn't it machine readable? Am I thinking of Health Data? So that if you had the metadata, you saw an application, we have access to the data, as well, right?

MR. CRAVER: Absolutely, that is correct. What people have struggled with is, I mean, it is an obvious problem. You have a series of tables that are related to each other, through foreign keys, and you don't have the Rosetta Stone of how those relationships are built. If you know the data, you can sort of guess. If you don't know the data, you are lost in the wilderness. What we want to do is we want to encourage use, not discourage.
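
[Illustrative sketch, not part of the spoken record. The foreign-key problem described above can be shown with a small in-memory SQLite database. The table and column names (data_source, indicator, source_id) are assumptions invented for this example and do not describe the warehouse's actual schema. The sketch shows that when relationships are declared in the schema, they can be listed mechanically, which is roughly the information an ERD would publish.]

```python
import sqlite3

# Hypothetical schema: table and column names are invented for illustration
# and do not describe the warehouse's real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_source (
        source_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE indicator (
        indicator_id INTEGER PRIMARY KEY,
        title        TEXT NOT NULL,
        source_id    INTEGER NOT NULL REFERENCES data_source(source_id)
    );
    INSERT INTO data_source VALUES (1, 'Example survey');
    INSERT INTO indicator VALUES (10, 'Example indicator', 1);
""")


def foreign_keys(connection, table):
    """Return (column, referenced_table, referenced_column) per foreign key."""
    rows = connection.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    # PRAGMA row layout: id, seq, table, from, to, on_update, on_delete, match
    return [(row[3], row[2], row[4]) for row in rows]


# The declared relationship, recovered from the schema itself.
links = foreign_keys(conn, "indicator")
print(links)

# Once the relationship is known, the join is straightforward.
joined = conn.execute("""
    SELECT i.title, s.name
    FROM indicator AS i
    JOIN data_source AS s ON s.source_id = i.source_id
""").fetchall()
print(joined)
```

[Without the declared relationship, a developer would have to guess that the two source_id columns correspond, which is the "lost in the wilderness" situation described above.]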

MR. DAVENHALL: Is it possible, Jim, that you can start to provide for this working group visitor statistics, site metrics, to give us some sort of sense of that, and anything else you can tell us about it?

MR. CRAVER: I can give you a kind of up to date statistics on the warehouse. We are generally around 1000 unique users in a given week. That has been pretty steady, it falls off in the summer, as you would expect. It is starting to ramp up again now that schools are back in session, people are back from vacation, so forth.

I haven't looked at HDI recently. That, a while ago, was more around the 5000. I am guessing that has tailed off a little bit with some of these other tools. Frankly, Health US, I don't really know. I could probably find that out, but I would have to look at that, and FastStats, also.

DR. CARR: The users are identified by their email, is that how?

MR. CRAVER: Again, there is something called C&A, Certification and Accreditation, which any IT project in the government is supposed to go through. There are levels of oversight involved. If you collect PII information from visitors, such as their email address and their name, first name, last name, location, phone numbers, those sorts of things, you have a much, much higher hurdle to jump over, just to get your project out the door.

Most projects, including these, basically it is an open door and anyone can come and go as they please, and we don't track who you are. We might track whether someone has walked through the turnstile, but we don't take the fingerprint when they go through. We can give numbers. For the warehouse, we are currently using Google Analytics, which allows you to take a look at unique visitors during that time period that you are looking at. You can say, in a given week, this person only came once. Or if they came again, I didn't count them a second time.
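
[Illustrative sketch, not part of the spoken record. The "counted once per week" idea above can be expressed as counting distinct anonymous tokens per calendar week. The log entries and token names below are made up for the example, and this is not a description of how Google Analytics computes its figures.]

```python
from collections import defaultdict
from datetime import date

# Hypothetical page-view log: (anonymous visitor token, date of visit).
# The tokens stand in for an analytics cookie; no names or emails are stored.
page_views = [
    ("visitor-a", date(2012, 9, 17)),
    ("visitor-a", date(2012, 9, 19)),  # same visitor, same week: counted once
    ("visitor-b", date(2012, 9, 18)),
    ("visitor-a", date(2012, 9, 24)),  # following week: counted again
]


def unique_visitors_by_week(views):
    """Map (ISO year, ISO week) to the number of distinct visitor tokens."""
    seen = defaultdict(set)
    for token, day in views:
        iso = day.isocalendar()
        seen[(iso[0], iso[1])].add(token)
    return {week: len(tokens) for week, tokens in seen.items()}


weekly = unique_visitors_by_week(page_views)
print(weekly)
```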

The other tools, I am less familiar with how they track their users. Unless you see a site where you have to register or you have to deposit your ID, your email address or something like that, you are not likely to do that. Now, I am on the brink of making a decision, and having a discussion about asking more information about users of the API, so that we can provide more one-on-one relationship. I think it is worth the cost for me, as a project officer, to get that through. I think the audience will give that stuff away. They are not going to care about it.

MR. SCANLON: But you do accept suggestions or complaints?

MR. CRAVER: Absolutely.

DR. VAUGHAN: Are you thinking along the lines of what they are asking for labor, in terms of applying for developer keys?

MR. CRAVER: Yes, just so that we can then go back to them and get the kinds of stories that Bryan is asking for. Or say, we have this new resource, or we are thinking of doing this, can you give us some more targeted feedback.

DR. VAUGHAN: I was real interested to know, to what extent exists now, or is anticipated alignment of these same initiatives with what is going on in the states and counties, many of whom are also looking to open up their data, and kind of piggybacking on what Bryan is saying. They are using Drupal, blah blah blah blah. What does that alignment landscape look like?

MR. CRAVER: Well, the warehouse is not open source. Health United States, the data is available through Excel. HDI, Health Data Interactive is also in a proprietary system, and FastStats is html, it is up on the web. We have not had much focused interaction with representatives from the states, although we are also on the brink of doing an evaluation study and trying to get input from state directors of public health and their deputies, feedback and baseline awareness on survey. That should be happening soon.

I am hoping that, in addition to getting that data, that will sort of prime the pump a little bit for that back and forth communication, so we can have those exchanges. Now, in terms of open source, that was a decision to go down, not to have open source. It was a decision that was made, but that doesn't mean that couldn't be changed at some point in the future.

DR. COHEN: Can I respond a little to your question? There are, I would say, between 20 and 30 state-based, web-based query systems, either directly from state health departments or in conjunction with partners. Some of them are open source code, some of them are in-house development, and some of them use essentially COTS solutions. Then, there are a couple of vendors who sell essentially products to create that functionality at the state.

Obviously, the states in general go down to county, and some go down below county to large communities. In a couple of cases, ZIP Codes for some of the data that they include in there, they are equivalent to the indicator warehouse. They draw from a variety of data sources, the analogous data sources that are available at the state level.

I hope that, as we move down the road here thinking about development, part of the conversation should be stimulating state and local developments. There are a bunch of counties, San Francisco is a leader and has always held up to the light. There is a variety of these initiatives at the county and community level, as well. Some of them have very interesting properties. Seattle allows actually folks to enter their own data in a framework, so they can choose data that is secondary data, as well as marry that to primary data. There are lots of different threads going on.

DR. ROSENTHAL: What might be really helpful, in terms of assessing where to allocate resources, in terms of bumping up depth or frequency outside of the privacy conversation, is just a basic kind of utilization or usage by the individual dataset. A kind of hit, download, maybe or something like that. If there are three of these that are accounting for 90 percent of the usage, that would be very informative in terms of being able to make an assessment.

DR. CARR: Jim, are you able to stay around?

DR. SONDIK: Can I say something? Let me just add something to Jim's really excellent description of this. There is another level here of data that I don't want the committee to lose sight of. Really, these tools focus on the aggregated, the indicators. A lot of this comes from raw stuff.

There are two kinds of raw stuff. This is my view, it is a technical term, raw stuff. There is the vital statistics, okay, which really is the listing of births and the listing of deaths, and it is as complete as it can be. Then, there is what we get from the surveys. The surveys is the really tough nut to crack here, because what you asked for is really hard to describe, when you asked for the diagram.

DR. ROSENTHAL: It is a specific technical entity relationship diagram. If you have a database, you do have an ERD somewhere.

DR. SONDIK: I don't know how to do that for a survey. I am sure if we sat down, we could do it in two minutes or so. The thing is that the survey, you get the data because you have used let's say a statistical technology beyond methodology really, to identify who you are going to bring in. Each one of those people, if you are surveying people, you can describe that individual in terms of what they represent, in terms of that diagram that you were getting at. They are a county, they are a block, they are a census, you know what I am getting at.

DR. ROSENTHAL: Yes, it is actually not a many-to-many. It could be as simple as saying, what states belong to a region, that is the type of thing I am talking about. If you are coming into this site for the first time, you don't know anything about the data. You see region up there, my first question is which states belong to a region. That, in terms of a diagram, is what I had expected to find as a developer. Say West Coast region includes.
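
[Illustrative sketch, not part of the spoken record. The "which states belong to a region" question can be answered with a published lookup table. The region names and memberships below are assumptions invented for the example; they are not the groupings used by NCHS, CMS, or the Census Bureau.]

```python
# Hypothetical region definitions, invented for illustration only; they are
# not the groupings used by NCHS, CMS, or the Census Bureau.
REGION_TO_STATES = {
    "West Coast": ["CA", "OR", "WA"],
    "Gulf Coast": ["AL", "FL", "LA", "MS", "TX"],
}


def build_state_index(region_to_states):
    """Invert the listing so each state maps to the regions containing it."""
    index = {}
    for region, states in region_to_states.items():
        for state in states:
            index.setdefault(state, []).append(region)
    return index


STATE_TO_REGIONS = build_state_index(REGION_TO_STATES)

print(REGION_TO_STATES["West Coast"])
print(STATE_TO_REGIONS["FL"])
```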

DR. SONDIK: I am just saying that a lot of that thinking, when you look at an indicator, like the indicators in Healthy People, that thinking has already taken place, and that can be done. When I look at a survey, it is not so easy to do that. It is there, but it is many things. It depends on how I combine that data, that raw data, and how to combine the raw data, the 5000 individual, very, very long records, but there are only 5000 of them, in one year of HANES. There are many ways to combine. Do you agree with what I am saying? There are many ways. You can disagree if you want. There are many ways to combine that. The same holds true for the 125,000 in HIS. That is a tremendous source of data. Actually, if we could get a few apps that start to use this, it will become clearer to the applications community as to what can be done. There is a challenge with this. I don't know of any apps at this point that go into any of the survey data.

DR. VAUGHAN: What would be your top three most important ones that you would want to have apps developed to?

DR. SONDIK: Well, I mean, there is the HIS, which is the core dataset for the department, a wide variety of information on everything from prevention to services that people are receiving. It is a nationally representative piece. That differs, in a sense, from what you get from CMS, because CMS says this is exactly what is going on, although there are surveys in there, too, that say what is going on. It is that survey thing that I think is really a challenge.

There is one other thing that I wanted to put in front of the committee, that is interesting. We did a briefing yesterday for Hill staffers. Somebody came up and said, I think this thing you got is really terrific, so it is very much on my mind. It is the tutorial that NHANES has. The tutorial is extensive, very extensive. It is not a two-minute deal, it is an extensive tutorial on how you get the data, how you analyze the data. It is actually award winning, and people can get credit for this.

Because it is so extensive, it is not as accessible as perhaps it could be. There is a possible application. It is not the usual app, what I would think of as the usual app. It is not just in front of you, per se, but the committee, because this is part of what we all do. We all being the people who produce this core data. We aren't as facile and as creative as we can be, if we can liberate this. The stuff I think we need to liberate is not only the data that has been massaged, and what I would call indicators, but what leads into that.

DR. CARR: I think it illustrates exactly why we have this meeting, because we have questions that we didn't know existed. He can use it, but you can't use it, what you need, he might need to make.

DR. ROSENTHAL: I think we might be speaking at kind of cross purposes. Just so you know, my Fulbright post doctoral was on qualitative data, so I have a reasonable approach to that. From a development point of view, it is kind of 80/20. For every database, there exists an entity relationship diagram. I absolutely guarantee it. That is cold, hard facts. You may debate about the nature of the relationships, but there is something behind that.

If you are wondering why there are no applications being developed, first thing to look at is market utility. Is any developer able to create anything valuable out of that, which is an interesting question. The second is, how easy is the access. The first thing a developer is going to look at is some form. It can be sifted, it can be only 20 percent of it, it can be 80 percent of it. Any basic metadata relationships, before I am even going to bother, because otherwise, I click on that and I don't go any further.

DR. CARR: I think this is exactly the kinds of stuff that we will begin to do. Let's hear from Allison.

DR. SONDIK: I want to give just a rejoinder to that. If you look at genetics, there are a lot of tools that have been produced that nobody around this table is going to use. Those tools are absolutely essential to the people who are doing genetics research.

In a sense, I have that kind of thing in mind, I think, when I talk about these core survey activities, whether they are what we do or what SAMHSA is doing or the many surveys that CDC has underway. I don't know how ready they are for the public.

DR. COHEN: I think they are. I disagree with you. I can see many uses, and this is a much longer conversation. If I am a woman and I just got pregnant, I am going to want to know what the C section rates are for a 35-year old woman with my particular risk profile. I would love to be able to go to an app that would combine the hospital discharge data and the birth data for my city or my community, and it would tell me, this place is good because, this place is bad because. All of those data exist in different places. I think the surveys can provide that kind of information, too.

DR. CARR: I want to make sure that we have time to get a scenario, to do just that. We have talked about that this morning, and want to hear what Josh has to say, too. To say, here is an issue. How would we populate this with the information, where would we go? Hold your thoughts. Allison, please.

Agenda Item: Overview of CMS Data Products and Services

MS. OELSCHLAEGER: I am Allison Oelschlaeger. I am from the Centers for Medicare and Medicaid Services, our new Office of Information Products and Data Analysis that was launched, as I am sure many of you know, back at the Datapalooza in the spring, late spring, early summer.

It has been a whirlwind in the past couple of months, getting up to speed as the new office and really starting to think about the charge of our office, which is supporting both internal and external data users, CMS data users. Really, having a focus on data and the people who are using data out there in the world, and inside of CMS. There really hasn't been a focus at that level, until we announced this new office.

I am going to walk through a couple of the various data tools that we have out, starting with the Health Indicators Warehouse. I am actually going to stick to the web and show you some of the various things that we have on Healthdata.gov, look at the Blue Button Initiative, which is a cool thing that we are working on right now. Then, talk a little bit about how we are updating and improving our process for sharing actual claim level data with researchers, with states, that kind of thing.

Starting with the Health Indicators Warehouse, and for the people in the audience who aren't here, I am at HealthIndicators.gov. CMS has a specific kind of way that we present our data in the Health Indicators Warehouse that is a little bit different from how the other data is presented. We like to look at our data in specific reports, so taking indicators that are related and having a report that shows all of them across a geographic area.

I am on the resources tab, and down at the bottom is the CMS indicators. We have our methodology paper that you can click here and kind of read about our various population and the things that we are doing to clean the data up. Then, we have our various reports. I am going to jump into the hospital and patient report, just to give you an example.

This is Alabama. One of the things that people are always interested in, and there will be more interesting data as we have trends in here, which will be loaded very soon, inpatient admissions per thousand beneficiaries. Alabama is obviously above the national average. When you click in and look at the HRRs, let's pick Tuscaloosa, it's even higher, 436.8 admissions.

What if we want to say, well, how does Tuscaloosa look compared to the areas around it? You can click through here and look at the data itself for just this indicator and jump to a map. Obviously, one of the things that you always have to think about when you are looking at individual indicators is the various interactions that it has with different populations. For example, maybe Alabama has a lot of dual-eligible beneficiaries who have higher resource needs than the general Medicare population. This is just one way of looking at indicators. It is not comparing across various sets.

Looking at the hospital referral regions, you will see down here that, here is Tuscaloosa, and it is surrounded by HRRs that also have higher inpatient admission rates per thousand beneficiaries. It helps you start thinking about targeting resources, or looking at causes for that in this area. We think that the Health Indicators Warehouse is a great tool, and we are really excited to see our trend data get up here, because it is even more information for people to start using, and putting into apps and that kind of thing.

I am going to jump over to Healthdata.gov. You can get to the Health Indicators Warehouse from Healthdata.gov. One of the interesting things that we're just starting to come out with is called Basic Standalone PUFs. I heard you guys talking a little bit earlier about kind of privacy and HIPAA, and thinking about how do we release claim-level data, while worrying about privacy and making sure that we are not releasing beneficiary-identifiable information.

This is kind of CMS' first crack at doing that. These are public use files. They are available for download on the CMS website. What we have done is we have worked with a contractor to strip them of all identifiable information. It is still actual claims. It is not aggregated information. It takes things like age and puts them into buckets. It reduces ICD-9 codes to three digits instead of five. It is keeping hopefully a lot of the information that you need, while allowing you to download claims data, but also protecting beneficiary privacy.
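[Editorial illustration, not part of the spoken record: a minimal Python sketch of the two generalization steps described above, bucketing age and shortening ICD-9 codes to three digits. The field names, the ten-year bucket width, and the sample record are assumptions made for illustration; they are not the actual CMS or contractor method. Taking the first three characters is a simplification that ignores ICD-9 E codes, whose category is four characters long.]

```python
def age_bucket(age, width=10):
    """Map an exact age to a range label such as '70-79' (width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def truncate_icd9(code, digits=3):
    """Drop the decimal point and keep only the leading category characters."""
    return code.replace(".", "")[:digits]


def generalize_claim(claim):
    """Return a copy of a claim record with age and diagnosis coarsened."""
    out = dict(claim)                      # leave the caller's record untouched
    out["age_group"] = age_bucket(out.pop("age"))
    out["icd9"] = truncate_icd9(out["icd9"])
    return out


# Hypothetical record, invented for this sketch.
claim = {"age": 73, "sex": "F", "icd9": "428.21"}
print(generalize_claim(claim))
```

[The record stays at the claim level, as the speaker notes, but exact age and the full diagnosis code are no longer present.]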

If you click into these and you are not impressed, which was my initial kind of perspective, I think you have to remember that HIPAA has a lot of really strict rules around cleaning data, to get rid of identifiable information. We really have to be careful and make sure we are not releasing beneficiary information.

DR. COHEN: Allison, does this have the three digit ZIP or what geographic level?

MS. OELSCHLAEGER: I am actually not sure what the answer to that is, but if we click, I think it will tell us. There is some geographic information, I think. Each of the basic standalone claims have their own kind of rules around cleaning things. This one, it doesn't look like it has geography, but it does have the age category, sex, ICD9 code. Inpatient may have some of that information.

DR. FRANCIS: Could I just ask you what the de-identification methodology is that you are using? Is it the Safe Harbor?

MS. OELSCHLAEGER: It is not the Safe Harbor. We are actually going through statistical review with our contractor. They are both masking, kind of creating groups and categories to mask things, and then also making sure that there aren't any cells where you could aggregate and have a count of less than 11 beneficiaries or providers.
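[Editorial illustration, not part of the spoken record: a minimal Python sketch of the small-cell check described above, in which any cell with a count of less than 11 is masked. The threshold of 11 comes from the discussion; the record layout, the function name, and the choice to mask with None are assumptions, and the contractor's actual statistical review is not described in the transcript.]

```python
from collections import Counter

MIN_CELL = 11  # cells with fewer than 11 beneficiaries or providers are masked


def suppressed_counts(records, keys, min_cell=MIN_CELL):
    """Count records per combination of `keys`, masking small cells with None."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: (n if n >= min_cell else None) for cell, n in counts.items()}


# Hypothetical records, invented for this sketch.
records = (
    [{"age_group": "70-79", "sex": "F"}] * 12
    + [{"age_group": "80-89", "sex": "M"}] * 3
)
table = suppressed_counts(records, keys=("age_group", "sex"))
print(table)
```

[A production review would also have to consider whether masked cells can be recovered from row or column totals; this sketch does not attempt that.]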

DR. FRANCIS: Who is the contractor?

MS. OELSCHLAEGER: NORC, yes. It looks like we are not doing geography right now.

MR. SCANLON: Is that the national file, all of the national level claims? There's no geography.

DR. FRANCIS: You can't get it by state or by region?

MS. OELSCHLAEGER: Right, it is a 5 percent file. The other thing that I wanted to show you on Healthdata.gov is our Medicare and Medicaid statistical supplement. That is in the Medicare box down at the bottom, and it is the fourth one down here. This is something that I don't think all that many people know about, but is a really great resource if you are looking for basic statistical information on the Medicare/Medicaid program. It has things like enrollment by state, enrollment by eligibility type, all in individual Excel tables that you can go in and look at.

One of the things that our office is trying to do is improve the way that we start sharing data. Right now, this is just Excel tables that you have to download individually, but more and more, we are trying to think about how do we move into the iTools or other ways to share this data that gives people more access and makes it easier to use things.

Then, the final thing I wanted to show you on Healthdata.gov is the compare tools. Medicare has a great set of compare tools. Probably the one that is the best known is Hospital Compare, and that has been around for a while. Now, we have Nursing Home Compare, Dialysis Compare, and that data actually sits on Medicaredata.gov. Let's go over here.

DR. VAUGHAN: Allison, what is the recent year for this?

MS. OELSCHLAEGER: For compare data? I actually don't know the answer to that. It might be '10, but I can check and get back to you. Hospital Compare, Nursing Home Compare, Home Health Compare, and these are tools that allow people to come in and actually look at providers and see how they are doing, and compare across. In their geographic area, for example, what are the different providers that they could use and what are the comparisons on various indicators of quality. Medicare.gov has a setup that allows you to automatically do things like filter and visualize the data yourself.

I am going to jump over to MyMedicare.gov. Another initiative that we are working on is the Blue Button Initiative. This is a way to share claims data with the beneficiaries. One of the most recent changes that we have made to the Blue Button is to expand the amount of data available to beneficiaries. They used to only be able to get one year of A&B claims data. Now, they are getting three years of AB data and one year of part D data, which is a great addition.

Scrolling down, it is right here on the main page. You can click Blue Button, download my data, and it opens up a website that allows you to read about the Blue Button program. You can actually go ahead and download your data. For example, at Datapalooza, we saw a developer who is working with Blue Button data and helping patients start to use their data in more interactive ways on their iPhone or Android or whatever, and actually share the data with their providers, and then have the providers share the data back with them. There are a lot of tools, and this is not only a Medicare program, but also a VA program, which is where it started, and also some of the FEHB plans are starting to do this, as well.

Then, the final thing that I wanted to talk about with you guys was the Research Data Assistance Center. This is our external facing group to help people who are interested in actually getting claims data, or Medicare current beneficiary survey data, identifiable data. They are going to sign a DUA, they are going to promise us that they are not going to share the data or sell it, or use it in incorrect ways.

ResDAC just redesigned their website. It used to be kind of an old format. It looked like you were going into a university webpage from the early 2000s. We are really excited about this new website, and all of the kind of improvements that they have made in communicating with researchers and communicating with states, and making sure states can come in and get Medicare data and can share Medicare data within the various agencies in the state.

Going back to your earlier point on having a program for people to go look at quality of providers, in the winter, we announced the Medicare Qualified Entity Program. That is a program, qualified entities come in through ResDAC to request the data. It is a program where you can get Medicare data specifically for public provider performance reporting. We are really excited about that opportunity. It gives us a new way to share our data with people who are going to use it for good purposes. They have to combine it with other claims data, so you are going to have one report that kind of covers the provider's practice, Medicare, Medicaid, private plans. That is an exciting thing, too.

I want to give you guys a chance to ask me any questions.

DR. FRANCIS: I have a question about your data use agreements. Actually, first of all, I would love for us to be able to get a copy of it. Secondly, I am curious about whether you do anything with respect to following up on whether people comply, any spot-checking, anything of that sort. And what the penalty would be if somebody broke one.

MS. OELSCHLAEGER: That is a great question. Our data use agreement is available publicly, but I can also make sure that the working group gets a copy of it. There are penalties in the DUA for violations of the agreement with CMS. In a lot of cases, I think we have found that most of the people who violate the DUA don't mean to. They are not selling the data or whatever, so we try to work with people to make sure that we are not penalizing them for doing something that they are not necessarily at fault for.

In terms of follow-up, I think that is something that CMS is kind of still trying to figure out the best way to make sure that we are not. It takes a lot of resources to follow up and make sure that you are tracking people. We, so far, have only shared our data with kind of trusted academic researchers.

As we move towards more data sharing, one of the things that we are considering is a data enclave model, so giving people virtual access to Medicare data in an enclave. That gives us a little more control over the data, making sure that they are only using it for purposes that we kind of think are valid, and making sure that they are not running off and doing something else with it. I think that is a good solution, moving forward, and we have already started piloting an enclave with 200 users.

MR. SCANLON: Allison, on the follow-up question, ASPE, we actually did a joint project with CMS privacy group, and designed a program for sample audit follow-up. It is just a pilot basis to see if it works. They might have actually tried that, Allison, I am not sure. It was basically not just complaint-driven, but to actually follow-up proactively on a sample of some of the data holders.

As Allison said, it is usually something like the researcher added two graduate students to the access list and didn't tell us. It is usually not more serious than that, but it would have come to our attention.

MS. OELSCHLAEGER: If you are really interested in how we are approaching privacy, there is a whole privacy group at CMS that knows a lot more about this than I do. It would be great to have them come in and talk to you.

DR. MAYS: Is there a tutorial or a place that you can go to? What I am thinking about is, in terms of other academics and students, et cetera, getting access, is there some place you go and it would give you a kind of a sense of all of the datasets, or would I have to kind of know the different things?

MS. OELSCHLAEGER: In terms of beneficiary level information, if you want to do your own research, or in terms of kind of the various data and information products that CMS makes available?

DR. MAYS: Yes.

MS. OELSCHLAEGER: There isn't. We are announcing very, very soon, there will be a data navigator function on the CMS website. It is pretty much ready to roll out, but we are just finishing up a couple of things. Our website is notoriously difficult to use and to find things on, so the data navigator will be a great tool to help you find the specific, if you are looking for Medicare inpatient information.

MR. SCANLON: ResDAC actually has got a little ways towards that. It shows you all of the datasets they have.

MR. CROWLEY: Do you have any mechanisms in place to understand how the users that are coming to you are making use of the data, or cataloging their needs and wants from the data?

MS. OELSCHLAEGER: I don't know the answer to that question. When you guys asked Jim, I was thinking we need to figure it out if we don't have it. The CMS website is not something that our office manages, but the Office of Communications, which does manage it, probably does collect that information. If they don't, they may.

MS. QUEEN: There is a CMS privacy board, and so when things are requested, restricted data is requested, there is a whole protocol that gets submitted, that you know exactly how the data are going to be used by whom and for what purpose.

MS. OELSCHLAEGER: When researchers come and request underlying claims data, we do collect that information. The statistical supplement, how many people are going on there and downloading tables on Medicare enrollment, I don't know if we actually are collecting that kind of information.

MR. CROWLEY: As a researcher, it is useful to understand how other people are making use of certain datasets, from techniques to methodologies, to the extent that they are willing to share.

MS. OELSCHLAEGER: ResDAC is doing a great job of starting to think about ways to share that information. They are going to soon have a page up that will allow researchers to come in and say, here is what I am doing with the data. Here is where I have published, and then that will be available to other researchers, as they come in and think about, okay, here is what other people have been doing with this data. What can I do to add to that?

DR. KAUSHAL: What is the time lag between updating the data? For example, claims data, as an example.

MS. OELSCHLAEGER: Meaning for access to researchers or public products, everything?

DR. KAUSHAL: Yes, I'm sure there is a range.

MS. OELSCHLAEGER: Yes, there is a range. Historically, CMS has been almost at like a two-year lag, in terms of getting data out publicly. Right now, we are at probably a year lag for public data, so for kind of stuff that we are getting out and sharing, maybe a little bit more than that. As Jim said, we are updating the health indicators warehouse with 2011 data now.

Six months to a year is when we kind of feel like we can start to get data out publicly. For researchers and for qualified entities, we are starting to move more towards a six-month lag. Between five and six months is when we really see the data starting to settle down, and we are doing more and more work, trying to figure out when can we actually share data, and what is the earliest point that we can get data out there.

DR. KAUSHAL: The aspiration is to cut that six months down to even less?

MS. OELSCHLAEGER: It is, except that we are really relying on providers submitting claims. When they don't submit the claims and the data is not complete, we really can't do anything about that. I guess the other option is to tell people the data is not complete, and it is up to you to figure out.

DR. CARR: That is what we are doing with the Pioneer program, getting rolling data.

MS. OELSCHLAEGER: Yes, as we are starting to communicate with providers, we are giving them even more up to date data.

MR. DAVENHALL: I want to make a comment. I wish Walter were here, because this is a standards issue. Why I want to point this out to you is that, one of the other problems we have to address is there are no standards in how people are setting up these websites. We saw the National Center, and they have this look and feel. If you want people to really start to use your data, you have to start to worry about being consistent across the board, as to what these things look like. Otherwise, people go to these sites, spend most of their time trying to figure out where this stuff is.

If you look at this site, this is, in my personal opinion, the richest, best site of all of the sites that we have been talking about here today. I would ask you to go to that one and play with it, and download that file she has out there, the provider number. Now, this is one of the most difficult crosswalk files to find in Medicare. Right, Allison?

This has every hospital and provider, and has their Medicare provider number. If you have that Medicare provider number, you can link that to a whole bunch of interesting statistics from intensity rates in hospitals to DRG adjustment rates and so forth. You have got to have this crosswalk file available. You can download that.

If you look across this tab, right here, you can't read it from where you are here. It is the most fully functioning set of tabs for anybody to get ahold of data out of CMS. I am just proposing that, as we go down this road, we think about using something like this as a model. It was immediately obvious what you are going to be able to get from that site, where the other places, it was always, can I download from this site, can I export, can I share, and that kind of thing.

The other thing that I wanted to mention while Jim is still here, tell Ed there are people out there who are using his data in the way he said he has never seen it. We need to find a way to bring him up to speed on that. I would say 50 percent of the hospitals in this country are using that survey data in a way that he would be shocked.

MR. SCANLON: Can I ask Allison, so this is institutional provider? It is not individual professional provider?

MS. OELSCHLAEGER: We don't make that available, although the qualified entity program is moving in that direction. As we start to announce QEs, that is pretty much what they will be doing.

MR. DAVENHALL: There is a file called the hospital service area file, by the way, which now you put data out in six months. That file has 2011 data in it, of every dollar that Medicare paid into a ZIP Code. It tells you how many days, how many cases and so forth. It provides a provider ID. It is totally worthless to you, unless you have this file, to tell you who that provider is.

I would say some of these things, as we think about, as we do our work, how do we make these crosswalk files? Could there be another file there, when you go to that other hospital service area file, that says, oh, just link here and you will get this crosswalk file, kind of thing. I really want to compliment you on this site. I would say that if more of the sites had that kind of look and feel, we would all get to it easier.
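[Editorial illustration, not part of the spoken record: a minimal Python sketch of the crosswalk join described above, in which a provider number in one file is looked up in a second file to recover the provider's name. The column names, provider numbers, hospital names, and figures are all invented for this sketch; the layouts of the actual CMS files are not given in the transcript.]

```python
import csv
import io

# Hypothetical file contents; real files would be read from disk.
CROSSWALK_CSV = (
    "provider_id,provider_name\n"
    "010001,Example Hospital A\n"
    "010002,Example Hospital B\n"
)
SERVICE_AREA_CSV = (
    "provider_id,zip_code,cases,days\n"
    "010001,35401,120,610\n"
    "010003,35404,15,70\n"
)


def load_crosswalk(text):
    """Build a provider_id -> provider_name lookup from crosswalk CSV text."""
    return {row["provider_id"]: row["provider_name"]
            for row in csv.DictReader(io.StringIO(text))}


def attach_names(service_text, crosswalk):
    """Add a provider_name column to each service-area row, if one is known."""
    rows = []
    for row in csv.DictReader(io.StringIO(service_text)):
        row["provider_name"] = crosswalk.get(row["provider_id"], "UNKNOWN")
        rows.append(row)
    return rows


linked = attach_names(SERVICE_AREA_CSV, load_crosswalk(CROSSWALK_CSV))
for row in linked:
    print(row["provider_id"], row["zip_code"], row["provider_name"])
```

[Identifiers are kept as strings so that leading zeros survive the join. Rows with no match are flagged rather than dropped, which makes gaps in a crosswalk visible.]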

DR. ROSENTHAL: If you do that taxonomy work or see the entity relationship diagram, that stuff will pop out. You will say, oh, we don't have that, and that will be your roadmap for your metadata development, if you will, to say, here is where we need to build something, as Bill said.

DR. CARR: Josh, did you want to come forward and go through?

MS. GREENBERG: I know Allison mentioned about the data use agreements, et cetera. You were pretty much talking about public use data tapes. I don't know if everyone on the group knows about our research data center, where we make more access to more restricted data. I just thought maybe you could just say a few words about that, and then maybe, in a future meeting, they might want to hear more about that.

MR. CRAVER: I mentioned our RDC, our Research Data Center. It might not be a bad idea to have Peter Meyer give a quick tour of the concept. The Research Data Center is a facility really that is under lock and key, and restricted access part of NCHS as a physical space that researchers can submit proposals to, and if accepted, come to our place and have the access to restricted datasets and datasets that are non-public, and do their research on it.

There are a couple of caveats to that. One, it costs money, because you are using our facility and our staff to do that. We have to set up the files for you, so that you can have access to it, and we also do disclosure review of your resulting files, so that we are not releasing into the wild something that legislation should. There are non-trivial reasons why we care about that. One of them is if we do disclose data, we, as individuals, are on the hook for hundreds of thousands of dollars, and many years of our lives in jail.

DR. COHEN: It is important to know that, not only NCHS data, but we actually are going to be using the RDC as a data enclave, to store data from another part of CDC that we are linking with state data. There is no way that we could get it out of CDC, because of the agreement that CDC made when they actually collected the data. You don't have to go to lovely downtown Hyattsville to use this. Remote access is a key for using the data at the RDC. It is a really powerful option that really hasn't been explored fully for using confidential data.

MS. QUEEN: Will the CMS data enclave be like that, be like an RDC?

DR. COHEN: Yes. The census uses, as well, a portion of data enclaves.

MR. CRAVER: At NCHS, we have ongoing and developing relationships, new relationships with census. NCHS, RDC now installed in Atlanta, there were recent discussions at a university in a neighboring state to install a new RDC that would have census data, as well as NCHS data. It might be the time for us to discuss with CMS.

DR. VAUGHAN: Jim, is it now possible to align CDRC data with RDC, or is that a special super security? Is it possible to unify that under one secure roof at this juncture?

MR. CRAVER: I think that that is happening. I think that the efforts that need to take place are in process.

DR. CARR: We have an hour and four minutes remaining. This has been very, very helpful, very rich. I would like to hear from Josh, and then hopefully have 45 minutes or so to talk about what would be the next thing that we might want to do, to begin kind of getting a feel for what is available, how it works, what the issues are, et cetera.

DR. ROSENTHAL: I had from a previous meeting and this meeting, I just decided to put together some slides instead of put it in the Wiki. I had roughly speaking, and these were some suggestions and conversations I had with Todd way back when.

I had about seven very basic recommendations. Some of them are crazy and challenging, some of them are kind of common sense. I know Greg and company and Allison and John are already working on some of them. The specific recommendations, where to start, would be taxonomy, communal learning center, baking in business or market value into the challenges and contests, files accessible, the NORC IMPACT stuff you are talking about, I would also add synthetic files, completely synthetic for priming and loading systems, as well as just getting rid. I know you might already be doing that. Just leave it out there for the whole committee, so everyone is aware of it.

Data browsers, which are very different, Google Public Data, Tableau data, where you don't have data available to people. You just allow them to do analysis on the fly. This is common in the web world. Some of your data was actually used by Tableau and ReadWriteWeb at a contest, half a million people used the thing. Someone won, a young girl, for diabetes, comorbidity. It was fantastic, your data was being used. It was in this browser, they could instantly ask questions.

Then, partnerships and product development, we got into some of that. That will take me into basic web utilization. This is Google analytics stuff, so you get a sense of who is doing what. Opt-in, which is completely crazy, but consider it like green button for people who want to have their data shared. Don't underestimate that, as everyone knows from Blue Button.

What is taxonomy, just really quickly so everyone knows what we are talking about. It is not just metadata or talking about it or putting out sentences about it. It is showing the relationships between the data. Before you do anything else, before we get into it, you have to have a taxonomy, and you have it one way or the other. It is there. I can reference internal or external conversations.

Let me give you an example. I am looking at this file from CMS. I download the thing and this is what I see. I am like, all right. I am a developer with no health care experience, now what am I going to do. What are these things?

Well, I can go and maybe I can find it in a metadata catalog, but that doesn't tell me anything I need to know to develop an application. I have to sift, bit by bit, and be able to reconstruct the relationships. It turns out that plan belongs to contract. I mean, you get into kind of many to many, it belongs to organization, belongs to parent organizations.

Taxonomy defines business entities and the relationships between them. Parent orgs have orgs which have contracts which cover plans. Then, there are attributes, right? Contract number, legal entity, this little cell on the spreadsheet, what does that belong to? That will kill a developer coming in, who has no familiarity with it. They expect to see that sort of thing.
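[Editorial illustration, not part of the spoken record: a minimal Python sketch of the taxonomy described above, with parent organizations, organizations, contracts, and plans as entities, and contract number and legal entity as attributes of a contract. The class names, field names, and sample values are assumptions. The speaker mentions many-to-many relationships; each child here points to a single parent to keep the sketch short.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentOrg:
    name: str


@dataclass(frozen=True)
class Org:
    name: str
    parent: ParentOrg          # an org belongs to a parent organization


@dataclass(frozen=True)
class Contract:
    contract_number: str       # attribute named in the discussion
    legal_entity: str          # attribute named in the discussion
    org: Org                   # a contract belongs to an organization


@dataclass(frozen=True)
class Plan:
    plan_id: str
    contract: Contract         # a plan belongs to a contract


def lineage(plan):
    """Walk from a plan up to its parent organization."""
    contract = plan.contract
    return [
        f"plan {plan.plan_id}",
        f"contract {contract.contract_number}",
        f"org {contract.org.name}",
        f"parent org {contract.org.parent.name}",
    ]


# Hypothetical values, invented for this sketch.
parent = ParentOrg("Example Holdings")
org = Org("Example Health Plan", parent)
contract = Contract("X0001", "Example Health Plan, Inc.", org)
plan = Plan("001", contract)
print(" -> ".join(lineage(plan)))
```

[With the relationships written down once, a developer can answer "what does this cell belong to?" by following references rather than reconstructing the structure from a spreadsheet.]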

Anywhere else outside of health care, you see it. You see it in government geolocation, you see in weather. All of the great success stories we have talked about have this. If we are wondering why it hadn't picked up, if we are only hitting 1000 of user downloads, this is really, really important stuff. There is sort of an expectation in it. It doesn't have to be. I understand the privacy thing. It can literally be what states belong to a region. It can be that level, same thing applies, 80/20.

If you create all of this, you have CMS doing their data products and tools, Niles' office, you have researchers doing their things. You have people in the commercial space, who will build applications, if they can create market value, myself being one of them.

Inside your extract system, call it nonclave, you need a public learning center, I will get to you. The center of all of that is taxonomy, the business relationships of the entities. If you want anyone to build something cool up here, it is very, very difficult to ask them to kind of knit by hand all of those relationships. I just gave you one example of a file. Thousands of them, to be able to pull together a reasonable application. That is taxonomy.

Learning center, I will let you read on your own time. I want to be able to see what other developers are doing, and actually how some access, Bryan talked about speaking with the data experts. We could do that at little roundtables and committees, but actually having an interactive community to do that would be really, really helpful.

You know what would be really helpful? Sharing the web analytics, not only so you have them, but so that when I walk into it and I see 500 files, I say, what is the most important one? What file is everyone going to, right? I know that Bill is a super user because he has 5000 posts and five stars and five recommendations. When he says that particular crosswalk file is the most important thing, I am going to listen to him. That sort of information, doing that in a scalable way.

I might suggest, and even Bryan mentioned it, this doesn't have to be kind of big budget, cost-built internally. You are hosting various challenges and spending money for this or that. Point it towards the core infrastructure, and you will get the developers themselves actually doing this. Learning center, blah blah blah, all of your videos, blah blah blah, okay, here is what it might look like. I know they are working on this. You have different things in there, fine.

For the challenges, you track an internal database, or you look at CrunchBase, in the health startups. This is some of my work and I have had a couple of successful ones myself. They fail at an astronomical rate, far outside pure-play technology. Why is that? Legacy guys don't want to retool, fine. The young MIT and Harvard kids come in, and they can't navigate perverse incentives, they can't figure out the market layout. They don't know the business case of the data they are looking at or how to use it. They built WheatCracker 2000. If you guys want to steal this app, you can. How fat am I today, a pig, a hippo or a Jabba the Hutt, right? This is what they come up with and they try to sell it at DTC and it doesn't work.

If you are building a challenge and you are saying, I am going to spend money and reward someone for building an application, one of the criteria for building that application or product should be having viable market value. I am not talking about a whole business case. Todd referenced the HDI analytics and data session I put on at HDI, and where I had a little tiny Mad Lib template, where people had to fill it out, and very well reviewed. That was just literally just forcing them to say, what is the market, what is the product, why would they buy this, what are the challenges, what are the opportunities.

Doing a little bit of kind of business thinking around the data, I would humbly submit would astronomically increase the success rate in these challenges. By the way, if you have a learning center, you can put all that information up there so I can see. I see that he says this and it is working well for him. I say, oh, that is a really interesting use of that data. I never thought about using it that way. Then, we can build on one another, and that is sort of what you try.

DR. KAUSHAL: What do you see as our role in teaching that business case?

DR. ROSENTHAL: I am going to humbly refrain from answering that question now. What I am going to say is that I think, however we do it, if you want to crowd source it, if you want to experts, personally, I think multiple perspectives, I would love to have James, I would love to have the West Wireless fellow talking about some of that. I would love to have some of the challenge winners talking about that.

By the way, if you are spending dollars internally to kind of track, I might look at the companies that have successfully done this and had exits, which are few and far between. You could have a multiplicity of perspectives and uses.

MR. CROWLEY: Essentially, you can weight the scoring of the challenge to have a model as one of those factors. We do this when we run challenges at the business school. We will bring some engineering students, get the business school students, public health students, and it is really helpful. Will you bring in either a resident or the business expert residents, sort of throw them in.

DR. ROSENTHAL: If you capture that and keep it as part of it, and now I go to the public website and I cannot just see the data, I can see the relationships of the data and this taxonomy, business relationships. I can see what people have submitted and how they are trying to use the data for different business issues. That is how you develop and create in internet time.

DR. KAUSHAL: There is a supply side of data and then there is the demand side. Our role in this working group, my assumption was a little bit more on the supply side, but would love clarification. I think that you need both.

DR. ROSENTHAL: I am just throwing out it would be good to get both. Basically, before we even start saying what is more important, where should we focus, there is a demand issue. I might humbly suggest that a little tiny form, in terms of demand and what you would like, with some basic categorization. The stuff you consider, like the iterative quick form, as well as just the basic demand issue, who is using what. Let me just get through and then we can go on anyway, because I have a couple of more.

If I actually win this thing, I am talking about it, and by the way, this has worked really well in other instances. You are doing public data, blah blah blah, perhaps offer some synthetic or fully synthetic. That is really nice for reasons you will get into. A lot of the big folks, if I want to build an application, which I did, and prime the system, I need something. I don't need security clearance, I just need fake, made up data in your structures. You will hear the big analytic vendors and big payers say that sort of stuff. You have already created it.

If you did that in a very specific way, you might even be able to kind of walk around a good deal of the demand for privacy. Pure synthetic files, if it has a specific type of meaning intact. Here is someone at Caltech. She isn't in health care whatsoever, no affiliation with what you are hearing earlier. They use the stuff for privacy, for credit cards, pure synthetic creation that retains a specific analytic usage. It is done under industries that are out there.

Push the ERD, here is the data browser. If I go to Google Public Data, I will actually be able to see all of the wonderful data and I will be able to move things around, and click and ask questions. This gets tremendous usage. If you want to engage the broader community, who is not even data savvy, but the tech business students who might want to get in and ask, what might I want to solve in the health care, starting from the business side, before I hit the data. This is tremendously valuable. These are data explorers, free, Google has one, Tableau has a public one. I mentioned they did a ReadWriteWeb one, there is a Google one.

It got like half a million hits, half a million uses in their contest, right? The contest was they put up a little bit of data, some of your stuff publicly available, about a half a million hits. A young girl wins it with comorbidity of diabetes. Why does it generate that usage? She doesn't have to understand the data structures. One of the people actually put it together, and then it allows everyone to play and ask the questions. No data leaves the environment. It is all de-identified, aggregate. That is very good partner usage, basic web analytics, demand and supply is obviously a question, but just somewhere to start with.

Partner, publicize, monitor, and then you can actually allocate resources and build based on the most popular stuff. I did that with Todd, so it's kind of a joke. You do a photo gallery of Todd and it is the most popular thing.

Finally, Opt-In. Blue Button was crazy, no one will ever use Blue Button, right? You guys remember all of the talk before it went out, 10,000 uses or something like that. What is it, 10 million now? How about just a crazy idea of allowing individuals, like myself, there might be some others, who want to share their data for specific usage. I want to Opt-In and say, please use my data for research. Call it a Green Button and see what happens.

Obviously, you have to work out all of that, but this is another kind of creative way to attempt to get around the privacy. If I say I want the Opt-In for limited usage in a crafted way, if you get even a very small percentage, and if you get something like Blue Button, you might be surprised. All of a sudden, you are creating another very rich source of data. That is it, so thank you very much for the few minutes.

DR. CARR: Jim, did you want to say anything?

MR. SCANLON: No. I think we are at the point now where we just do some brainstorming here and what do you think would be a reasonable next step.

DR. VAUGHAN: One thing I would also suggest, too, is there are use cases, and looking into some of the things we just talked about over the last couple of days. The products that you are looking for don't just have to do with building private businesses, that part of the audience and part of the customer base is public health departments, non-profits. Their use case and their outcomes and their products and their needs and engagement issues are going to be different from a developer oftentimes. They may be far more data literate than somebody who is trying to approach it from the point of view of, I don't have to know any data.

I think that also we shouldn't lose sight of what is a very rich public health heritage, and that people have been donating their data for many, many decades, in the Washington County study, the Alameda County study, in Framingham, in the national nurses study. Much of that is with privacy protection, but has offered a very, very rich template upon which we have learned a great deal. Perhaps move those forward in ways that make it more accessible, I think, is not going back to reinvent the wheel, but use what we know works really, really well.

MR. SCANLON: I think in terms of developers, the motive is not the interest. It is more, if there are public health person apps, that is fine. If they are for profit, whatever they are, I think we have experience at reaching some of those audiences, and we have formal ways audiences. This community, we don't.

We don't have much experience and obviously we need your ideas about how to get those two communities together. We have the auspices of a federal advisory committee. Bring people in, whatever they would be able to say publicly, we could do that. If there are some questions we should pursue, in terms of environmental scan a little more. I think you are right. I think what we will do is look at the usage statistics. It might be embarrassing, because you have to promote.

DR. VAUGHAN: There might be an opportunity to say, well, given what we know, we have use statistics now. What would be helpful and useful moving forward, if we found those that weren't quite what we might need.

MR. SCANLON: If you could give us ideas about the metadata, here we are, we just renewed and revised the metadata for the Healthdata.gov. If we are missing a couple of more items, obviously you can't have infinite metadata. No one would ever put up that set. If there is a tendency toward a standard or at least a core that would make it available, not only for what we have to do in a government agency, but what would make it easier for developers, that would be very helpful.

DR. VAUGHAN: I guess to not discount that the potential community of developers is far broader than you might imagine. To try to think out of the box, so that we are being as inclusive as possible, because I hate to think of these very rich domain experts, who are in struggling health departments all around the country. What are those people doing, when they really should be tapped and brought to the table.

MR. SCANLON: I would think all 3000 counties shouldn't have to do independently a decent set of community health indicators or population analysis. That is more than just the data. That is what we do in the U.S., everybody does it, that is obvious. Everyone shouldn't have to do it. If someone makes money doing it, too. Otherwise, you have to be an aficionado. You literally have to know who does this, how do I find this, how do I pull together. There are aficionados.

DR. ROSENTHAL: Even if you are an aficionado, I will speak for decades' worth of Dartmouth Atlas, I still as an aficionado say, where is the ERD. It is not mutually exclusive, it benefits everyone.

MR. SCANLON: That raises more questions than it answers.

DR. ROSENTHAL: Even the super savvy aficionados, basic kind of community sharing, if government doesn't want to share, if you want to say actually, I don't want to share this, or there are security reasons, or we don't have an official taxonomy, which we don't, there are multiple ones floating around. If you take kind of the community development perspective, I share mine with other folks in the community, and they share theirs with me. It would be really nice if we could post that somewhere, where other people can share that, as well.

DR. COHEN: I think the sharing issue, we can certainly figure out how to get past that. I don't see that as an impediment to doing what we want to do. I see the impediment from my point of view is, we are the data holders, we understand this really incredible set of very complex data that we think is useful for certain things. We don't know how it resonates to the real world, because we are these geeks who focus our entire lives on making the distinction between 37 weeks and 38 weeks gestation.

Not only do I need to learn to speak your language about the kinds of stuff that you need to develop the apps, I need another perspective to ground my distorted reality about how these data can be used and applied in ways that the folks that are actually developing these applications know resonate with communities who might want to use this. I am asking just as much for guidance in that area.

I am having trouble thinking, I could name a million business applications that I think would be cool. I don't know whether anybody else would care. That is the kind of feedback that would help me figure out how we can best get you the information that you need to do your development. That is where I am coming from.

MR. CROWLEY: Basically, that question that you just asked, Bruce, there is not enough staff or time in the world to curate and catalog all of the potential uses of this data and find out people's needs. As was discussed, and has been discussed in different ways, by virtue of a learning community with certain social features, to allow people to have that conversation, for you to also engage in that conversation with them, as they had their questions, you bring their expertise and have it in a community-mediated way that is accessible to others, then that creates those answers to those questions.

DR. COHEN: That is a great idea. Just say, here is all the stuff we have. What can you do with it and what would be valuable for you, or people like you, and begin that conversation. Perhaps, priorities would emerge.

I think we have some basic usage statistics we have for our web-based query. We see what reports people like to generate and stuff like that. It is the people we don't reach. We have a cult community of data users, because our data has never been free enough for people to get access to it. It is the people who don't use the data who we want to be using the data, not the usual suspects.

I think learning community is the way to engage folks. Show them what we have, in a way that they can understand it, and get the feedback so you can develop the tools that they can use to meet their needs.

MR. CROWLEY: You might want to consider incentivizing in some way. There is a community willingness to learn an approach, but maybe run a couple of pilots. Some of the challenges that have already been used for leveraging those mechanisms, to build that into the community. There is some additional quid pro quo for participating.

DR. CARR: Okay. Now, it is time to land the meeting, at least for our next thing. I am trying to keep up with you all here. We talked a bit, there are clearly infinite numbers of ways to approach it. There is a need for us to articulate what is it, what are we going to do. Is it that we are going to talk about, as you said, the uses of the data, the usability of the data, the applications. I think all are valid, but we have got to figure out one.

I will just bring you back to what we talked about today. Because we are linked into NCVHS, we have an interest in getting more expert in what is available. One of our themes for this year is thinking more about our communities, the data available to them, what they do with it. It brought us back to this slide that came out as part of the NCVHS Shaping Health Statistics 10 years ago.

As we look at it today, there is a way we could make it come alive, and even test the validity of it, with the data that is out there. At the center is the population's health, disease, functional status, well-being, the incidence and the distribution. Beyond that are community attributes, so some of these we could get. The biological characteristics, community age distribution, gender, genetic makeup, health services, number of personnel that are available, cost and financing. These are things that I think that we have been seeing today, that are out there, and population-based health programs and so on.

Then, beyond that, context, natural environment, and this goes beyond HHS, but air quality, water, climate, weather, cultural context, political context, and place and time. Bruce, you may want to speak to this a little bit to say, would this be something that we would land on, to say if we use this to kind of walk through, do we have this kind of data. Does it help us, what are the complications that we encountered that we were not anticipating.

I just use this as something. I know it highlights Massachusetts because that is where I am from. This is something AHRQ has put together, and it combines a whole host of factors, and comes back and tells you what every state looks like. I just want to point out New York, as opposed to Massachusetts. It is just something I happened to be looking into this week. This would tie in very well with our ongoing focus on our community. This is Bruce's idea.

DR. COHEN: This is the space where I want to be, but again, I don't know where other people want to be and how they see what we can provide. I am very interested in pursuing this from the national committee's point of view. I think this can add value to us understanding community, the health of communities and how people relate to that by essentially breaking down the traditional silos of defining public health very narrowly. I think this is an incredibly important space for the quality of life.

Listening to what Josh had to say and some of the folks today, I don't know whether we can develop applications from this that people will want to buy, if that is the bottom line. Again, I am very ambivalent. I really want to be here and explore the possibility, because I think we can add value from my perspective. Again, I don't know what the folks who once we liberate our data independently, what they are going to want to do with it. I have these two competing notions in my mind.

DR. CARR: Datapalooza and cool apps are sort of the enticements, but do we want to take it to a level of a little more sophistication that is beyond a cool app, but it is a relevant, sophisticated important question. I am not saying this is the only one, but I really like Bruce's idea that it ties in with the work of the committee. You have an audience, and it would take us through our paces if we did Healthy People 2020 and put that in there. I don't know if it is simplistic to think about rolling it up like that, that would be a really cool app.

Ed, you have been saying everybody is looking for the one number, right, that tells you about a community. I think also, as we look at this, we don't know what of these things, in context or community, drive health. We have been talking about it for 10 years, for a decade, but no one has ever put it all together to see if we could get one number.

DR. FRANCIS: Mostly what I have heard addressed is that the current data that are up there could be more user friendly in many important ways. What I haven't heard is, and this goes to what you were saying at the end, and it also goes to the choice question that Heritage put out, really an underlying problem is that the datasets that are currently out there just aren't rich enough, or interesting enough.

What do we then start to run up against with respect to stewardship questions, protection questions, and so on, if we are going to go richer. I don't know, and I think that is partly my role here. I just want to tell you that there is some work that the committee is doing, if you weren't here earlier, about data stewardship for uses of community health data. This group is going to get a copy of what we are looking at, before we do anything with it.

DR. CARR: I think of all the array of things that we have seen, there are a huge number of things that do not touch into that sensitivity, and when juxtaposed, integrated and harmonized, can tell a very powerful story. I think that we have got to walk before we run. If we jump into de-identification, who has the right to know what, we are really going to have the same experience of others.

DR. GREEN: I would like to make an observation and nominate something for one of your PowerPoints. I know there are going to be next steps or goals or whatever you are going to do up there. I think I am safe doing this on behalf of the NCVHS committee.

First observation is, I have been back there listening to you guys all afternoon. So far, I have not heard anything that cannot fit comfortably into the three themes that the NCVHS is going to guide its work going forward. If you disagree with that, I really want to hear that dissent and understand what that dissent is about. This looks very promising, from my perspective.

The thing I want to put somewhere on your list, if you will let me, goes like this. Regardless of what type of sharing we have, when we stay focused on helping the people in the United States have better, longer lives, and to get the health care they deserve and need, we keep re-exposing, over and over again, that this nation is missing an infrastructure. It happened again this afternoon.

Infrastructure probably means different things in everyone's brain, sitting around the table. I am using that word to put a spaceholder out there for systematic proper use of new knowledge and new technology. We have these infrastructures for other human enterprises, but we do not have the infrastructure in place to either turn the health care delivery system into a learning system, or to turn communities into learning systems. What NCVHS has been seeing now for a decade is a silent cry for that infrastructure.

A minor, but I think important, example for NCVHS, I usually just try to beat this to death. I have annoyed Ed at least three different times about the workforce. We heard in our hearings that the analytic capacity of this nation's public health system is weak. In some places, it is not weak, it is absent.

We have heard that we don't have a fellowship program some place, that is preparing people to do the sorts of work that I hear you guys wanting to do here. Then, share it, disseminate it, teach it, augment it, scale it. Where is that national infrastructure that prepares the workforce to work within a new infrastructure, to just totally change health care and health status in this country?

I have passion about this now for several reasons, but the main one is, I will state this negatively, there is a risk that we are going to get a lot of cool stuff done and it isn't going to matter, because we didn't bother to think what it was going to take to move data into information into stories that change the world. I want to get that on our list somewhere.

DR. CARR: I am going to push back a little bit, because I think we can either come at it from here down to here, or when I look around the room and I look at the expertise, I am thinking if we do one thing, we will learn many things. I am wondering if we try to land on walking through one thing, it doesn't have to be a long time thing. Let's see how it goes and that will obviously inform.

I would agree with you 100 percent about the workforce. I have been saying it as long as you were, even longer. Maybe it is too hard, so maybe this is the interface. Maybe we create information that is so easily accessible, that you don't need a PhD in these various things.

DR. GREEN: I don't hear that as a push back. I hear that as helping us with our question about how are we going to adjudicate and coordinate our work. If this group could get down and dirty, and while we are thinking about changing the world, if you could just change something. If you can get a couple of apps that just knock your socks off, and finally you say, now, I get it. I think the full committee is well-positioned to grab that and then to provide advice at a broader level, to do broader policy work that helps get that done.

DR. CARR: Which gets back to Bruce's idea, which is if we landed on this, the long-standing NCVHS kind of vision, and tried to say, okay, now we have made all this data available, let's fill in the blanks, what is it telling us. If we take the healthiest community and find out it actually doesn't matter at all what the streets and the roads and the air quality is. And take the worst community and we see, oh, actually, they have the best of everything. I don't know what the answer is. I don't know if that is a simplistic way.

DR. ROSENTHAL: Real quickly to interject, just pick something. Do whatever you are comfortable with. Do what you know to start with. You are not going to find something else you don't know.

DR. VAUGHAN: I would say maybe a lot of those questions are answered with GIS. What happens where and why, and how is it different. I would also say that, for all of the infrastructure we lack, we also have a lot of infrastructure to work with, that we are not using. It is not so much that that is legacy or old school, and this is bright, shiny and new, and a killer app. It is what is the best practices of both and bring those together, and making it possible for those to come together.

Agenda Item: HHS Approaches to Dissemination

MR. SCANLON: A couple of specific things I would like to get out of the group, not today. Number one, you all have ideas about, I wouldn't call it principles or guidelines, but something like that off the top of your head, about how examples of what you heard today, which HHS could do better. Very practical or theoretical, but again, that we could begin to. You made some very practical suggestions today. This is to improve what we are doing already. Maybe what we could do is to ask people to think about them and we will start getting it on our worksite.

Secondly, I see a lot of student level applications, I will call them student level, and a lot of enthusiasm. I am an optimist. I think generation three, those will turn into serious, not that there aren't serious ones now. Again, I don't want to dampen the enthusiasm, but if you have ideas about how you get to that stage, how we could get there a little quicker in the health area, that would be helpful to us, including the idea of down the road demos or incentives or challenges. Again, I don't want to dampen enthusiasm. We want to be careful of how we use your time. I think we are looking at sort of beyond that.

Then, third, I think do we want to pick, and we are not ready to do this, an area. An area where it seems to want community level data, and there are a lot of examples of that available now. I am guessing that many of these tools are ignored. You can assess the use of the various tools now. We could focus on this at a hearing, what differentiates use and application from just putting it up there, and trying to get some best practices from some actual example.

Then, knowing what is useful and what differentiates, we probably have a dozen, at least, community-level indicator type systems, and probably the states have even more. We don't need to build another one, we need to find out sort of how to get the best out of those, I think. I would like to start that way, from my recommendations, advice and principles about the job we were first asked to do, which is to help HHS, one, get the data up, get it out there and then we will see what happens later. Get it out in a way that can promote and accelerate applications. Some of the ideas today, I think, were fine.

Then, two, maybe this is just the way things have to, how do you accelerate the serious applications. I don't know. Again, maybe we don't know, maybe it just has to happen. Maybe there is no theory, maybe it is atheoretical, it is market-based.

Then, the other one, do we want to pick an area. Again, I would ask the committee to come back with an idea. Is it community health tools, is it quality tools, which you probably don't want to go there. Just stay in the health area, but otherwise, is it social determinants, is it environmental. Again, this is the information that communities put up, that organizations put up. You are looking for a house in a neighborhood. Some of these apps will tell you the quality of the schools and probably the environment in there. Is there an area that we could look at to kind of specifically apply some of these ideas.

MR. DAVENHALL: Jim, I prepared a one-pager. I don't want to discuss it today, but I want to read it on the airplane. Part of it is me trying to figure out what you have asked me to do, what job you really want us to do. I actually think we ought to spend some time thinking about how we enrich the ecosystem that we are talking about. Good medical advice to be that good, and proper diagnosis before we start to fix what we don't understand. I offer that to try to figure out what it is you really want to do, and at the same time, give you some ideas of what I think would bring the developers to the table.

DR. COHEN: Thank you so much, Bill. That was going to be essentially my ask for our joint homework assignment. I need to understand more what space you live in and what makes sense, and how you use words and language, so that we can provide you with the tools that you need to succeed. What I would like for you to do is understand what information we have in the space we will operate in, and see if that leads to a fruitful relationship.

DR. VAUGHAN: I would also say, though, that part of the answer to what you have asked, which is when do we get from the cool to something that is useful, is to start out with a different question. That is, what would be useful to which we would like there to be easier access to the data, that that would empower communities, that that would push this thing forward.

Same thing as any health survey, the same thing as any other epi study, any other public health intervention. To start with, what are your goals and objectives, to sound like somebody who used to beat me over the head with it.

DR. ROSENTHAL: I would respectfully disagree. When I use the word business and usage, I also mean non-profit, as well. I am just using that broadly. I think things have been done for quite a while, saying what would be good water to drink, and actually asking the market, where do the horses want to drink, is at least how I understood how Todd and Bryan framed the problem. I will share my slides around that.

Bruce, to your point, I can share my two cents, I can provide resources that you guys may want to look at. I think the beauty, what we have learned in the consumer web world, whatever you want to call it, is the learning center, as crowdsourcing that question is a really powerful way to go about it.

MS. GREENBERG: I think this has been a really useful and interesting discussion. I think we have gotten some substantive reports from the department, but also some really great little reports from the members. It is Friday afternoon, and I, for one, am exhausted for various reasons.

I think that what Josh presented, and then what Larry said, and I am sure what is on here, Bill, and what Jim said and everything, there is an opportunity to try to connect these things, because we are talking about at least two different things, which would be very helpful to, like Bruce, who is running a state data operation. That is, what are some really top priority things that would make these data more helpful to developers.

Also, that is a different question than, how could we make this data more accessible to people who are not developers, but have all of these questions. We know that health websites are among the most visited websites. Increasingly, when you ask people where did they get their information, whether they have a health problem or whatever, they are all going to the web, we are all going to the web.

That touches on the workforce issue, which I agree with somebody, I don't know who it was, who said it is going to take years to quadruple, and it won't happen probably, but the people who are getting MPHs, and you don't necessarily need a PhD in epidemiology. That is where some of Josh's suggestions, I think, and we need to explore them more, are ways to make the data easier. Whether it be these tutorials or these examples of these learning centers, not just for applications or for people who really are data savvy, but for the rest of us.

NCHS, years ago, we had an applied statistics learning program or something, and it was one of many things in the Cooperative Health Statistics System; some were kept and had new names, others were lost. There was a reason why we had that, and nothing ever filled the gap frankly.

I think that yes, we want to increase the really technical and well-trained workforce. We want to just push this data out to people in a way that just makes it much more accessible. They don't have to be statisticians, they don't have to be graphics gurus. I seem to be resonating with Josh here, so I feel like I am not completely on the wrong track.

MR. SCANLON: The unemployment rate is almost 8 percent. I think there are a lot of folks who have the skills who can actually help here. I don't think you need necessarily a lot of new folks. I think you have to repurpose.

MS. GREENBERG: We don't need what?

MR. SCANLON: Well, I think there are a lot of folks, including MPHs, who are unemployed. I think there is probably an excess.

MS. GREENBERG: People who are unemployed, who could get up to speed, because they have other skills or they could be repurposed or whatever. Our focus is not just the people who do the apps, yes. It is also several of the things that Josh suggested, I think, are really for leveling the playing field.

I think we will have a transcript from this meeting. We will get your slides, we will have this. Then, those of you who are members of both groups, of the working group, think about how to integrate it, possibly with the project that you have suggested. I think both the full committee and this working group will be more effective and more productive if they can support each other.

MR. DAVENHALL: Marjorie, something you can do to help us with, some of the things that you are mentioning, like, I look at your conference you had a month ago. Most of those sessions with the data geniuses were filled. That tells us there's something that is really working there, but there were no sessions for CMS at that. They didn't have a data guru. Allison wasn't present talking about it or the data guru behind all of that. I am saying that was well-attended. This is the place where you go and meet a person who has the diagram you want.

MS. GREENBERG: We have to realize the environment that we are in, that we have been told conferences like that, which are biennial, should be no more frequent than triennial or quad annual. We have got to look at all of the different opportunities.

DR. CARR: That is why we are here. We each speak different languages. I understood the first part, the second part, not so much. Likewise, I know what an MPI is, but I think it is convening these kinds of groups in this small setting, in larger settings. Josh and Ed, having conversation, coming at it from completely different environments. That is what we are here to do. I think that again, our next NCVHS meeting is in November. It would be nice to sort of have a thought about what is the next thing. I mean, I think you are right, we need to reflect on what we heard today. I would like to hear from you guys, what would feel right to you. We have done some presentations of what there is, what would be the next thing.

MS. QUEEN: Who is the audience for this working group? Is the audience for the working group developers?

MR. SCANLON: It is HHS. You heard Bryan say today, and he really did it sort of without substance. It was how do you just promote this generally. I think we are looking for the data that we have, and we can go into a lot of detail, and for the data we have already posted on Healthdata.gov, other mediums, too, how can we promote the use of the data for applications of all kinds. I think he is specifically looking for applications, third-party developers.

Number two, are there other datasets that should be made available there? If not there, should they be available somewhere else? This is clearly an HHS working group. It is not necessarily for developing reports. This is really to give us fairly practical advice on bridging the gap between our data producers here and other government agencies, and the apps.

Again, we could if we wanted to go into the various user communities like public health and research, but I don't think that is where Bryan is actually asking. There are a lot of ways to do that. I think you have been selected because you are technology. You have developed apps yourself, you know the community, you know them. You know what we are not doing.

I think we are looking for not a report. I think we are looking for advice from you, what are the principles, what are the recommendations that you would provide us, I would say just after today, for example. I would like this to be an actual group where, in two weeks, you give us your comments and we take them to Healthdata.gov. I don't want us to revise Healthdata.gov again and get the metadata wrong, so there we are for another three years.

I think I would like us to keep this fairly practical, agile and oriented. What is your sense of advice after today, after you heard from Bryan, after you took a look at the Healthdata.gov, at the Health Indicators Warehouse. Give us some advice that we could take from the department for that. That is number one, because you already are good, you are experts at this already.

Then, beyond that, it would be is there a longer term, are there other issues. Do you want to think of ways to bring the communities together. It may not be the conference, but it may be other ways. It might be Challenge. We're FACA now, so we could probably host the meeting where we literally bring in, not thousands, but folks that we think would represent the points of view and expand our own.

I guess what bothers me is Healthdata.gov. I love it, I support it. It is such a small part of the data we have, that it is what we are making available without restriction. I guess what I would really like you to do is think about how we could take these other datasets that are now restricted access. You can get to them now, but you have to sort of have a data use agreement.

Then, look at those in the sense of, is there technology that can help us, are there apps that can help us. I don't think we are going to be able to just put those on the website without restriction. Maybe it doesn't have to be quite the way we do it now. Maybe there are platforms and technologies.

We have probably six of these research data centers within HHS, and we can describe those to you. Maybe there is a much more facile, agile platform that we could aim for, that would do a lot of the work for the analyst. Then, you get to the point where now you need the data, like the 5 percent sample or the partially synthetic. I think that is really what Bryan and the leadership here is looking for. Not that we don't have broader issues and needs, but I think that is really what this workgroup was set up to do.

DR. KAUSHAL: Can I make a suggestion? Thank you, first of all, that helps clarify a bunch of things for me. I think we need to divide the workday into the framework I sort of started with the supply side. Joshua had some great ideas at best practices. That is not everything, so how do we increase the dataset. I think we should spend some time, thinking around how do we improve the actual supply of datasets and how do we just make it easier to access.

Then, I feel the next piece of our work, and with our collective networks, I think we could help with that, work on the demand side. As an example, I mentor in New York, Blueprint Health. It is Rock Health in San Francisco. All of these young engineers are crying out for data to make their business models and apps better. I also mentor them on business models. I think we could bring them to the table once we sort of got our ducks in a row around the supply side, iterate with them, get their feedback and then maybe go through another cycle.

DR. ROSENTHAL: There is a course at Harvard we are doing on entrepreneurship around that with the young developers this January.

MS. GREENBERG: Could I just suggest that if we have a transcript, we are going to have a draft summary. We also have these other documents, but they will maybe be integrated since they were part of the record. We have the SharePoint site, if we could use that to start teasing out some principles, as you said, some priorities, both for applications and for more general use, for increasing literacy really in the data across the board. Use that as a way to be communicating, and do you want to have a teleconference before the November meeting? It is two months away.

I think there is a lot of content as to what I was saying before, in what we have discussed just this afternoon and before, and that has been presented, that we could start organizing it somewhat in the ways that Jim had suggested and that all of you have said.

MR. SCANLON: I would think by November, we would have a rough draft. We would just accumulate your ideas for principles, and sort of how HHS do this.

MS. GREENBERG: Priorities, if you can do these four things.

MR. SCANLON: This is what would be most useful.

UNIDENTIFIED SPEAKER: Part of it goes back to the goal to increase the number of apps, the goal to expand the universe of users so that the data is more distributed.

DR. VAUGHAN: That is why we do everything, to improve health and healthcare.

MS. GREENBERG: Many people as possible, able to make some use of.

MR. SCANLON: There is an approximate kind of goal and that is get our data out, so that would provide the conditions for these.

DR. VAUGHAN: You will not forbid me to tell any CMS and Ed Sondik and everybody else how they can improve theirs as well, because ultimately, they will do that. You see how it all started with the open government focus and with these ideas of Healthdata.gov. By the way, it is health and environment data. Healthdata.gov includes other agencies.

PARTICIPANT: I think if you want to get beyond a cool app, you have to ask for something beyond a cool app. You don't have to know that it has this button here and that button there. What is the question you are trying to answer.

MR. SCANLON: What is the health or other personal issues? By the way, it is tools, as well. For example, I think NIH just put up a BMI calculation index. We have had others. Those are tools. You decide, you give us a sense of, are they valuable? Are the datasets more valuable? Are they both valuable in just separate things? Then, those kinds of apps get put on all kinds.

DR. VAUGHAN: Does that make a difference in people's health? They get made, but you have to ask, so what? Again, kind of pull back to kind of a more epi framework, what is the question you are trying to answer.

DR. COHEN: The question that I am trying to ask and answer is, how can we use all of the data we have locked up here, to create information to help individuals and the communities improve their health.

MR. SCANLON: Through this particular technology for now.

DR. CARR: What I am hearing is we are going to get all of the information compiled. We will have our transcript, perhaps we could get an executive summary out of that transcript. I have tried to capture and revise, as we speak, what we said here. We will meet again on the second, on November 14th, so between now and November 14th, is there a sense that we would want to communicate, develop, revise, have a call?

DR. KAUSHAL: I think it would be a great idea. We have three hours again on the next meeting? I think we can realistically have time to maybe jump into one of the big issues, however you want to define it. Again, I am open to whatever framework. If it is a supply side, we can focus maybe the next one just on supply, if it is demand, we could do that. If it is supply, I would ask us all to think through.

Joshua has already started with his stack, as well, around what are some of the best practices we have from our combined experiences. Why don't we just lay that all out, maybe on the short side. Hopefully by the next meeting, we would have our compiled thoughts and a couple of pages probably around what we think is the best practices. Then, we can prioritize, okay, what are the most high impacts of these, and use the next meeting to debate that.

Hopefully, that will whittle down to maybe 10, 15 different things. Then, hopefully, we can maybe even implement a couple of those. That is just a suggestion. I am not biased either way. How do we feel about that? We have homework, great.

DR. SONDIK: Let me throw one other thing out that I didn't hear. First of all, I think the focus on Healthdata.gov, and is this a good tool for marketing the data, I think is really important. It seems to me sort of a two-dimensional kind of thing. I don't know, it seems like we need something more dynamic.

MR. SCANLON: I think that we start with that, but I think we immediately get others.

DR. SONDIK: Let me throw one other thing that comes back to the people on the supply side, I suppose, which is do we have a responsibility that goes beyond the metadata in saying, here it is. Is there something else there, because we know the ins and outs. We know what it is not saying. Or maybe we don't, we think we know what it is not saying. Is there a responsibility here? Is there a liability here that goes with putting this stuff up, that goes beyond the mosaic issue which is a concern?

The other thing, though, is that a point that came up many times, you have raised it and Bruce raised it, as well, which is the use side. I think before that first Datapalooza, we had the supply people, one strange group, and then we had this other strange group, which was the developers. They got together at the IOM, at the National Academy of Sciences, and I tell you something really sparked.

There were a few questions, like somebody said something about, well, you know we know the difference by race of the number of people who get screened for something. Somebody said, yes, you really know that? Well, of course we know that. It is clear to us that we know that. The other side had no idea, really no idea, of what was there.

Of course, they weren't thinking about it from a standpoint of what the questions were and the users were. I think another component to this is the user side. When I said before, well, you know, it's not clear exactly to me how people would actually use the survey data, Bruce had semi-infinite number of ideas immediately about that. I think it is really important in this, to crank that in. I feel we are doing a lousy job of marketing the data, and it is not clear to me that Healthdata.gov, a website, is what we need. I think it is terrific to build on, but I think we need something more dynamic.

MR. SCANLON: It will become obvious to all of you that that is necessary, but not sufficient for some of these, and that is why that will be the next.

DR. COHEN: One other comment, I think this endeavor is based on an assumption and a shared belief that data will lead to better decisions, and improve individuals' and entities' lives. That is our underlying assumption. We have got lots of numbers, we have got lots of things out there, and we want to leverage that to improve people's lives. We are looking for strategies to do that. I think we just need to keep in mind that that is our assumption.

MR. SCANLON: I would say that information is necessary, but not sufficient. Other than New York City, you can't force people.

DR. SONDIK: We can prove that it is true.

DR. VAUGHAN: It is not always true in the same way, in the same place, at the same time. You have to measure it. It doesn't mean you have to have perfect measure, but you have to measure.

DR. SONDIK: We know how many people in a county don't know that they have high blood pressure. We know that because we have it really solid on a national level. We can model that, that is an app. Now, how good is that model? That is an issue. That is the kind of thing where we use market forces to try to tell us that. Right now, nobody is marketing that. Nobody is marketing the number of people who have undiagnosed diabetes at this point. Part of that is we don't have the workforce over there to use that information.

DR. COHEN: I am comfortable. I have drunk the Kool-Aid.

DR. CARR: The table has been set for a robust and exciting discussion, based on the entries onto the SharePoint. Thank you all very much. I look forward to our next meeting.

(Whereupon, at 5:04 p.m., the meeting was adjourned.)
