Episode 037: Spreadsheets, Shared drives and Emails: The Scary Side of Enterprise Search


Recurring guest on The Paperless Podcast and Senior Customer Advisor for Government at Hyland Software, Kevin Albrecht takes the mic to give voice to a critical topic he has been discussing with government agencies across the country: unsecured, personally identifying information circulating in disparate government systems.

If you work in any industry that manages sensitive information, you won’t want to miss Kevin’s insights on how to store information more securely while maintaining public transparency efforts, how to quickly fulfill public records or FOIA requests, and several nationwide use cases of government agencies that are already leading the way in more secure, compliant and accessible enterprise search efforts.

Check out this episode!

Read the Transcript
Kevin Ledgister: Welcome to the Paperless Productivity podcast, where we have experts give you the insights, know-how, and resources to help you transform your workplace from paper to digital and make your work life better at the same time.

Thanks for joining us. My name is Kevin Ledgister, your host. Today we’re talking about one of the most challenging tasks for government agencies, which is finding information that is stored in so many different systems, otherwise known as enterprise search. You will want to hear this because you’ll be amazed, maybe even a little alarmed, at how much unsecured, personal identifying information is circulating around government systems. But it is also a very useful tool for assisting government staff when the public is requesting documents. If you’re not in government, you will still want to lean in because this technology can be applied to any industry, but today we’re focused on our public agencies.

So with me, is Kevin Albrecht, a Senior Customer Advisor for Government from Hyland Software. Kevin has been talking to government agencies all around the country about this, and he’s appeared on our podcast channel before. So welcome back, Kevin.

Kevin Albrecht: Hi, good to be here, Kevin.
Kevin L: So in the interest of full disclosure, ImageSoft, who is the sponsor of this channel, is a partner of Hyland Software. So we do have a relationship behind the scenes that we want to let people know about, but I think that you’re going to find this information very helpful. So just to kind of start off, Kevin, for our listeners who maybe aren’t familiar with this term enterprise search, what it means, what it’s all about, can you give us a little primer on enterprise search?
Kevin A: Yeah, absolutely. So for those of you that are familiar with OnBase, right? We know about going in and searching for things that reside in OnBase, or if you have SharePoint, searching for things that are in SharePoint, or you’re in your Outlook, your email, and you need to go find something in that, or on your hard drive, right? We’ve all done this where we go in and look for these different things. But what we’re finding is that government employees, as you said at the beginning, Kevin, across all the sectors, are spending a lot of time searching for content in all of these different repositories, especially when they’re not sure which one it’s residing in. So agencies will have, like I said, all these different places: shared drives in the cloud, the hard drives on their desktops, email, all of these, including OnBase, or they could even have other legacy content management systems.

And what enterprise search does is it gives you a single tool for searching across those multiple repositories, right? So it really gives you one interface. To understand what we’re talking about, if you’re listening to this podcast, one thing you might want to do while we talk is go to Idaho State Legislature’s website, where in the upper right corner there’s a little magnifying glass. You can kind of look around and play with it a little bit while we’re talking about it, because they’ve integrated this on their public-facing website, and it searches across all of their different repositories.

Kevin L: Wow. That’s great that you can actually go out and try that out. I’m thinking I’ll do that right after this podcast. I know that we actually have a webinar coming up where we’re talking about departments of transportation. And I know that one of the challenges they have is that they’ve used so many different systems, with content stored in each of them, and finding it in each one is a bit of a challenge in terms of being able to see that data and information. How else can enterprise search fit into a government agency? What are some of the use cases you’re seeing pop up that really just make a lot of sense?
Kevin A: Yeah. As I mentioned, there are a couple of different ones that jump to mind. So one is trying to find the content that people need to do their jobs. There’s also a use case out there for fulfilling public records requests, right? So not only are the workers going to find that content, but customers like Idaho State Legislature will put fulfilled information, or material they want the public to be able to access, out on portals that they create using enterprise search so that the public can go and do the searches themselves. This frees up a lot of time for government agency workers because they’re not having to do it. And the third one that we can touch on is really protecting the personal identifying information that can reside in all of these different repositories.

I think even at Hyland, and I’m sure at ImageSoft, think about the number of different places where your social security number and date of birth reside. There were a lot of situations, we even had some where administrative assistants were storing passport information and family information of some of our executives to help them with the travel plans they would make. And it was just out there residing in a shared drive that wasn’t secure. And we found this by running enterprise search to uncover what information was where. We found there was a ton of it lying all over the place. Again, Excel spreadsheets, shared drives, hard drives, emails. It’s crazy.

Kevin L: That’s kind of scary. I can imagine that government agencies obviously think about this, and they may even have most data secured and locked up inside their systems, or encrypted inside their database, which is fine. But if somebody is working on an issue and they just copy and paste and dump information into a Word document and store it on a drive somewhere, that stuff becomes accessible, right? If a hacker breaks in, they can find that, right?
Kevin A: Yeah. Somewhere where a spreadsheet, I’m sorry, Kevin. I was just going to say, yeah, somebody creates a spreadsheet that ends up with hundreds of social security numbers. And then when they’re done with the project, where does it go? It just sits on their hard drive. They don’t delete it. They don’t get rid of that content, so that can just kind of build up and up and up over time.
Kevin L: Wow. Well, and have you actually ever been at any sites, like doing your presentation, where you actually tested out the software and saw all kinds of information come up in a presentation?
Kevin A: Yes. There’s actually an interesting story that came up. We had, let’s say a US intelligence type agency that had a question about their software and their usage, and so in order for us to work on this content that they sent us, they went through and scrubbed it first of this certain type of information.

And so when they sent it to us, we thought it would be neat to just kind of run enterprise search against it, and they had missed some. And they were not very happy with the person they thought was in charge of finding all this classified information when, in just a matter of seconds, we were able to uncover it.

Kevin L: Wow. Let’s talk about that just for a moment, because we mentioned social security information, there could be other information that’s private, and unless you’re searching for a particular one, really your search tool is open. So how does your search tool, this enterprise search tool, find that kind of information as opposed to just throwing up false positives all the time?
Kevin A: Yeah. Are you saying about the specific content inside the page?
Kevin L: Yeah. So let’s suppose that we wanted to find any social security numbers that are unsecured in our system. How does this know what to look for in there? Because it may just be numbers, it may not actually say social security number and everybody’s is different. So it’s not like you’re searching for a specific number.
Kevin A: Yeah. So what we’re able to do, and it’s not just things like social security numbers, it could be searching for any type of words that are located on documents or any groups of words. But with social security numbers, what it’s actually doing is looking for a number format, right? So when you connect it to all of these different repositories, it can go out there and, this is kind of neat, do it on a scheduled basis, right? You could say, okay, every week I want you to go look at SharePoint and OnBase and hard drives and Outlook and all these different things and just scan through there for things that look like social security numbers. You could also then say, okay, well, I want to look for documents that have social security numbers and dates of birth on them.

And then you can change that by the repository. So take PeopleSoft, which is managing a lot of HR information. You can have it not look for that type of information in PeopleSoft, because you know it’s going to find that. Another use case could be that in this type of repository, these documents always must have this word on them. So it could be that this repository is only for things that have been redacted, and those documents will say the word redacted on them somewhere. So everything that resides there must have this. And if it doesn’t, then a notification would get sent to an administrator.
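To make the number-format idea and the per-repository rules concrete, here is a minimal Python sketch. It is an illustration only, not how Hyland’s product works: the `RepositoryRule` class, the `check_document` function, the repository names, and the regular expression are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed format: 3 digits, 2 digits, 4 digits, optionally separated by
# dashes or spaces. This matches the *shape* of a social security number,
# so any nine-digit number will also match (a known source of false positives).
SSN_PATTERN = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")


@dataclass
class RepositoryRule:
    """Hypothetical per-repository scan settings."""
    name: str
    scan_for_ssn: bool = True            # turn off where SSNs are expected (e.g. HR)
    required_word: Optional[str] = None  # word every document here must contain


def check_document(rule: RepositoryRule, text: str) -> List[str]:
    """Return a list of findings an administrator might be notified about."""
    findings = []
    if rule.scan_for_ssn and SSN_PATTERN.search(text):
        findings.append("possible SSN")
    if rule.required_word and rule.required_word.lower() not in text.lower():
        findings.append(f"missing required word: {rule.required_word}")
    return findings


if __name__ == "__main__":
    shared_drive = RepositoryRule("shared-drive")
    hr_system = RepositoryRule("PeopleSoft", scan_for_ssn=False)
    redacted_archive = RepositoryRule(
        "redacted-archive", scan_for_ssn=False, required_word="redacted"
    )
    print(check_document(shared_drive, "Employee SSN 123-45-6789 on file"))
    print(check_document(hr_system, "Employee SSN 123-45-6789 on file"))
    print(check_document(redacted_archive, "Plain memo with no marking"))
```

A scheduler (weekly, as described above) would simply call something like `check_document` for every document in every connected repository and forward non-empty findings to an administrator.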

Kevin L: So this could really help with, like, data governance and just making sure that your data is secured, that your information is correct, that things are being stored correctly. So it’s not just that I’m looking for something and find it. I might also be looking for things that I know should be there, and if they’re not, or if they’re not in a proper format, then that also raises a flag. And it gives you a way to address that, correct?
Kevin A: Yeah, absolutely. And it can be a great way, especially when people are in the process of migrating their content, maybe from a legacy or a home-built document management system into OnBase. Search can be a great tool for helping you monitor or search for content in that legacy system, right? So while you’re like, hey, this is going to take us a while to move all of this stuff, that’s fine, you can do that and move it as you need. And in the meantime, enterprise search can be that kind of singular tool to help you do your day-to-day searching.
Kevin L: Wow. That’s awesome. Now, since you mentioned OnBase and that’s the company that you work for, and I think that’s great. What are some tools, and maybe there are tools out there in the marketplace that are offering similar solutions, but let’s suppose that we are doing a search for certain things that should be redacted. So going back to that website portal that the public can go to, certainly you don’t want any average Joe going to that website and being able to search and come up with personal identifying information. So what are some things that OnBase can do to help mitigate some of these things in this content?
Kevin A: That OnBase would do?
Kevin L: Yeah.
Kevin A: So OnBase has its capability for redaction, right? So you can do the redacting of information on a document, and people can either have the permission or ability to do that or not. And as far as enterprise search doing a similar thing, like we mentioned with social security numbers, you can go in there and say, every time you find a social security number, put a redaction over the top of it in the viewer. So that’s not even necessarily redacting the actual document, right? That document can stay as it is, but enterprise search and its viewing capability would mask that social security number. And I have a really great example of that. If you remember, Kevin, several years ago, then Governor of Florida Jeb Bush released all of his emails out to the public and said, I want to have full transparency and so we’re releasing all of our emails.

And I told our sales engineer, go download those immediately. And so he went and downloaded over 50,000 emails, and a week later, someone in the press uncovered that those emails were full of social security numbers. Staffers emailing back and forth, hey, can you help this constituent with this? Or here’s this information. I mean, still to this day, when we do a demonstration of enterprise search, we have those emails that contain those social security numbers, and we can show that. But what we’ve done is use that redaction capability to mask over the top of them so that we’re not showing the social security numbers to individuals, to the general public.
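The view-time masking described here, where the stored document stays untouched and only what the viewer displays is masked, can be sketched in a few lines of Python. This is a hedged illustration, not OnBase or Hyland code: `render_for_viewer`, the `can_see_pii` flag, and the dashed-only pattern are assumptions chosen for the example.

```python
import re

# Assumed pattern: dashed social security number format only (NNN-NN-NNNN).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MASK = "XXX-XX-XXXX"


def render_for_viewer(stored_text: str, can_see_pii: bool) -> str:
    """Return the text to display; the stored text itself is never modified.

    can_see_pii is a hypothetical permission flag: staff who need the
    number to do their job see the original, everyone else sees a mask.
    """
    if can_see_pii:
        return stored_text
    return SSN_PATTERN.sub(MASK, stored_text)


if __name__ == "__main__":
    email_body = "Can you help this constituent? SSN 123-45-6789."
    print(render_for_viewer(email_body, can_see_pii=False))
    print(render_for_viewer(email_body, can_see_pii=True))
```

Because masking happens at display time, the same stored email can serve both the public portal and internal staff without keeping two copies.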

Kevin L: Wow. That’s really scary. But at the same time, that’s a great feature because you can then have the public view this information, which you want to make accessible to them, saves their internal staff a lot of time. But at the same time internally, when staff needs to access that data and information, they don’t have to worry about the masking because they need to see that data to do their job so it just simplifies the whole process for them. So what about in terms of public requests and discovery for legal purposes? Have you seen it being used for that perspective or how does it help with that overall process in terms of what you do?
Kevin A: Yeah, absolutely. So public requests fulfillment, again, kind of has two aspects. It’s for that FOIA, PRM worker fulfilling those requests, and what they can do then is guarantee that they’ve uncovered all of the information that’s related to the request, right? They know that they’ve looked in every repository for all the information. So when a requester comes back and says, I don’t believe that this is all of it, it’s very easy for them to say, God forbid, if it goes to court or something like that, hey, I’ve done full discovery, and I can show that this is all of the information.

Another use case for it is Dutchess County, New York, which is a great example of fulfilling public records requests. They’ve gone and put all of their ancient documents in the repository. So you can go and search yourself for a record type, and you can type in warrant and find the actual warrant for the arrest from 1792 for Jebediah Jones or whatever it might be. And you can go and fulfill your own kind of FOIA requests and find the information you’re looking for. So that’s kind of just neat, to look at these old documents. But the other way to do it is allowing people to search for previously fulfilled FOIA requests or public records requests, right? The public is then able to go onto the portal themselves and look for the type of information that maybe has already been requested and fulfilled. And then the government agency doesn’t have to spend any time redoing it.

Kevin L: That’s great. Does this also help with, like, what I would call a fuzzy search? So I might search for a particular word, but I may want to turn on features so that it also searches for words that are synonyms or different conjugations, words that are close to that word?
Kevin A: Absolutely. Yep. And you can kind of dial that up and down as you need it. Just as you said, you have that fuzzy search capability for similar words and variations of a word, and then you can also require that it is next to this other word, or within, say, a certain number of words of that other word. So yeah, depending on how fuzzy you want it to be, you can really dial that up or down.
Kevin L: So this could really help with data governance and maintaining the security and integrity of data, and making sure that it’s in the correct format. It could also help with finding data that should be there but isn’t. Is that right?
Kevin A: Again, being able to prove that you have fulfilled the records requests and to show that you’re providing all the documents. And another great aspect, beyond being able to do that, is the automation that comes with it, right? So being able to not only search for documents, but highlight the content inside of the document that you’re looking for, tag those results, and even then submit them onto a workflow from there, right? So it’s kind of doing all of these things: running the search, but then getting that content into the hands of the next person in the row that should be dealing with it.
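The two dials mentioned here, how similar a word has to be and how close it has to sit to another word, can be sketched with Python’s standard library. This is a rough illustration under stated assumptions, not the product’s algorithm: `tokenize`, `fuzzy_positions`, and `near` are hypothetical helper names, similarity is measured with `difflib.SequenceMatcher`, and synonyms are not handled at all.

```python
import re
from difflib import SequenceMatcher
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fuzzy_positions(tokens: List[str], term: str, threshold: float) -> List[int]:
    """Positions of tokens whose similarity to term is at least threshold.

    threshold is the 'fuzziness dial': 1.0 means exact matches only,
    lower values accept more distant variations of the word.
    """
    term = term.lower()
    return [
        i for i, tok in enumerate(tokens)
        if SequenceMatcher(None, tok, term).ratio() >= threshold
    ]


def near(text: str, term_a: str, term_b: str,
         max_distance: int, threshold: float = 0.8) -> bool:
    """True if a fuzzy match of term_a lies within max_distance words of term_b."""
    tokens = tokenize(text)
    pos_a = fuzzy_positions(tokens, term_a, threshold)
    pos_b = fuzzy_positions(tokens, term_b, threshold)
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)


if __name__ == "__main__":
    sentence = "The warrants were issued before the arrest was made."
    print(near(sentence, "warrant", "arrest", max_distance=5))
    print(near(sentence, "warrant", "arrest", max_distance=2))
```

Lowering `threshold` or raising `max_distance` makes the search fuzzier; raising `threshold` to 1.0 falls back to exact word matching.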
Kevin L: So then I would assume then for this tool to be as effective as possible, that means that we need to have content digitized, right? So stuff that’s sitting in a paper file cabinet or something like that, this tool is not going to help you. You’re still going to have to climb through those files or go into the dungeon, the musty dungeon or have Iron Mountain ship those boxes back in so you can climb through those. So that’s an important message, right? To begin this kind of efficiency in, go ahead.
Kevin A: I was just going to let you know. Yeah, the last thing we want to do is use this to search across all these different digital repositories, and then go print the information out and hand that to somebody. That’s going in the wrong direction.
Kevin L: It certainly is. So do you have any idea, and I’ve heard numbers before, and I don’t know if there are any updated numbers, but how much time do public officials spend on just responding to FOIA requests, just searching for the documents and responding to what the public is asking for? Do you have any idea or any metrics on that?
Kevin A: I don’t off the top of my head, but I will tell you, I was just talking with another customer and it wasn’t specific to public records requests, but they were sharing that just one of their problems was they were spending 10 to 15 hours a week just looking for information in order to then do their job, right? And so that was outside of public records request, but that most assuredly applies to that too, where they’re not actually performing the task of what they need to be doing day to day, they’re spending their time looking for the information, so they can then go do their jobs.
Kevin L: Yeah. If you think about how many times have you searched through Outlook trying to find that right email, and then you multiply that by a bunch of different systems searching for what you need.
Kevin A: Yeah. And then not finding it and having to ask someone. I just unfortunately had to do that with an email the other day, because I know I have this somewhere, and I keep searching and searching. And when we talked about those fuzzy searches, what happened with what I did in Outlook was that I wasn’t searching for the right word. The word in that email was a variation of that, was close, and it was near another word. So if I had had a tool like this, I could have gone in, just searched, and found what I needed, but I wasn’t doing it correctly. And therefore, I said, can you send this back to me? I know I lost it. Sorry.
Kevin L: Wow. So, Kevin, this has been a great discussion on enterprise search, and I think so many government agencies, I mean, virtually every agency, could benefit from having technology like this just to make it more efficient and functional. And even in our current crisis too, just think about that: how tough is it when you have to download different systems, versus just opening up an application? And I believe the enterprise search tool that you’re talking about operates basically in a web browser, right?
Kevin A: Yeah.
Kevin L: You can go in and login and authenticate and search. And so even in this current crisis where people can’t go to the office, you could still be getting your job done, and you still can be responding to the public. You still can be responsive to things that need to happen from that perspective, so I think that’s great. Do you have any websites or locations that our listeners could go to, to maybe get some more information on this?
Kevin A: We’ll say here’s the easiest way to do it. If you go to your web browser and you look for Hyland and enterprise search, that will take you to a general website that talks about enterprise search as a whole. But if you also look for Hyland and confidential information discovery, that’ll take you to a website that has a lot more information about using enterprise search to monitor all of those repositories for content that shouldn’t be there, and it includes a brochure and a really nice little 20-minute webinar that talks about the capabilities and has a little demonstration. So I really recommend people go check that out because it gives you a nice overview of the capabilities.
Kevin L: That’s awesome. And for our listeners, if you do find that information and this is something that you want to follow up on, you can certainly contact us at imagesoftinc.com. That’s imagesoftinc.com. You can either use a web chat or send us an email, or you can even call us, and we will get the correct information out to you and we can begin to talk about pricing, if you’re interested, or what that might look like in your organization. So, Kevin, thank you so much for joining us, this has been a wealth of information. And just thinking about it, now I’m a little bit scared. I mean, I’ll still sleep soundly tonight, even knowing what information is out there, but again, thank you for joining us.
Kevin A:
Kevin L: Yeah. And thank you again everyone for joining us today, and if you haven’t already done so, be sure to subscribe to Paperless Productivity, where we tackle some of the biggest paper-based pain points facing organizations today. We’ll see you next time.

