Research involving internet data

Started by stupefied, June 21, 2019, 06:38:54 AM

pigou

Quote from: stupefied on June 26, 2019, 01:58:10 PM
What is frustrating is that I tried to do the right thing after I learned of my innocent error; yet, if I had not reached out for permission, maybe things would have been easier?
This is often true. Hence the saying that it's easier to ask for forgiveness than permission.

Quote
The other thing is that the website forbids commercial gain from use of the site's data. I don't see that as a problem; I can simply forgo the advance and any royalties from the book. My intention is to use the book for academic purposes, not to make money (especially since academic books tend not to sell well anyway).
The issue is that your publisher is going to sell the book and not give it away. So, they are in violation of the site's terms. Which is a good reason not to bring this to their attention, too...

If I were in your shoes, I'd stop doing anything on email related to this and talk to whomever you asked at the university and tell them they no longer need to reply to it. Then go ahead with the book and focus on communicating your results. You've gotten a book contract... enjoy it.

There are probably a thousand things you're doing every year that could get you sued by someone. Deal with it when someone actually sues, which isn't a low bar: hiring lawyers is expensive, and they're unlikely to spend tens or hundreds of thousands on something from which they have nothing to gain even if they win.

stupefied

Thanks for your replies, Pigou.  Lots to think about. 

Just out of curiosity, what would the rest of you do if you were in my shoes? 

BlueberryBagel

Quote from: polly_mer on June 21, 2019, 07:07:32 AM
Welcome!

I'll repeat my advice here for readers:

1) Check with your IRB because there are still humans somewhere who wrote the text.

2) Check with your colleagues on whether what you propose will count as publishable research. A convenience sample is increasingly frowned upon for scientific research because it lacks an adequate control group and there is strong bias in who participates in internet forums.

What I would do is 100% this.

My diss contains data that was published elsewhere, and has been available for years, online and in print. I was required by my IRB to get letters from the authors of said texts that I was allowed to use that specific data for my dissertation and subsequent publications.

Some schools are being much more careful about thinking about internet data collection. In my field, your project as you describe it isn't unheard of, and would require a cursory IRB review at the least because there are human subjects.

There's nothing to be lost by shooting the IRB contact people an email ahead of time and asking how to handle it. I don't know if other places would require the same letters, and since each place has its own ideas about CYA, you need to ask your own.

I got a surprise when my latest project involved oral history but didn't count as "research" under federal guidelines because the IRB did not feel the results would be generalizable. What is important is that one hour after I submitted the IRB application, I got an email message stating this and containing a review number. Your place might have handled it similarly, and you would have done due diligence.

polly_mer

For the readers at home: This is one way that being mentored by wolves manifests itself.

Quote from: stupefied on June 26, 2019, 01:58:10 PM
What is frustrating is that I tried to do the right thing after I learned of my innocent error

There should never have been even the possibility of an innocent error here. A very early step in planning the research should have been establishing the right to use the data, either by obtaining written permission or by satisfying university counsel/IRB that a diligent attempt to obtain permission was made and failed. No analysis should have been performed, let alone publication contracts sought, until rights to use the data had been established.

Pigou is more cavalier about being sued and/or fired than I am. The initial barrier to suing someone is pretty low (hundreds of dollars of a law office's time, not tens of thousands), especially when the evidence is so clear: written notice that permission to use the data for external purposes was explicitly denied.

I'm not sure of the best way to proceed, but one possibility is contacting authors who have published using this data set and seeing whether they have written permission that they could perhaps extend to you in some grandfathered sense. As Pigou mentioned, now that you've already flagged your intent to use the data in non-authorized ways, brazening out the situation and claiming ignorance isn't an option.

BlueberryBagel

And in addition to polly_mer's wisdom, my training in IRB tells me that ignorance is not a defense. Your institution's HSC will host on-line training modules and descriptions of which kinds of work require which type of IRB oversight. The institution will put the onus on you as a researcher to make sure you are operating within the lines.

And, to be sure, I am not saying that IRB committees are flawless decision-makers, or that the process is highly effective, etc. But these are the rules, and as a junior scholar, I am certainly lawsuit-averse. What I would suffer in stress and taking time away from my other projects is not worth the risk, nor is my reputation, nor is my school's.

pigou

Quote from: polly_mer on June 27, 2019, 10:33:17 PM
Pigou is more cavalier about being sued and/or fired than I am. The initial barrier to suing someone is pretty low (hundreds of dollars of a law office's time, not tens of thousands), especially when the evidence is so clear: written notice that permission to use the data for external purposes was explicitly denied.
There's no requirement to receive written permissions for the use of publicly available data in analysis or research -- and there are court cases to back that up.

There are hundreds (thousands?) of papers using data from Twitter, Yelp, Google Reviews, etc., virtually all of which have at least violated the terms of service and quite possibly federal law on circumventing restrictions (that is, all those without a coauthor from Twitter, Yelp, Google, or Microsoft). Lots of them are published in very good journals.

The risk of getting fired comes solely from going against the advice of the university's lawyers... which I'd never advise anyone to do. But their advice isn't necessarily based on the legality of what the OP did.

Quote from: BlueberryBagel on June 28, 2019, 05:36:52 AM
And in addition to polly_mer's wisdom, my training in IRB tells me that ignorance is not a defense. Your institution's HSC will host on-line training modules and descriptions of which kinds of work require which type of IRB oversight. The institution will put the onus on you as a researcher to make sure you are operating within the lines.
The OP can easily get an exempt declaration from the IRB. Scraping data from the web does not count as a "research activity" that falls under the IRB. The issue is that the OP wanted to clear his conscience with regard to the legality of getting the data in the first place.