Peer reviewing (in Comp Ling) - Why it is broken and ways to fix it

Posted on Wed, 30 Sep 2020 in reviewing

Reviewing takes a lot of time

I review a lot. I review for conferences, workshops, journals. I am also an AE for TACL, ACM Computing Surveys, Computer Speech and Language and Traitement Automatique des Langues (mostly in French). Which means that I review too much. Fortunately, my employer (CNRS) does not seem to mind. Yet. The situation is getting worse every year, to the point where I can feel[^1] -- as many do -- that our model is broken. This point was already raised years ago in our community, notably by K. Church (2005) and I. Mani (2011), as it was in others. A long and thorough analysis of the issue, with proposals to improve our current practices, can be found in A. Rogers and I. Augenstein (2020), following a series of blog posts by the first author. These proposals are mostly aimed at maintaining the fairness and accountability of the peer review procedure despite the increase in submissions, and I share many of them.[R&A] My viewpoint is a bit different, though, since I am primarily interested in ways to spare reviewers' time - which is, it seems, the most critical resource in the system.

We review more because there are more scholars, more venues and more papers: our field is growing, more active, and more diverse. This is probably a good thing. More scholars could imply more reviewers, so the situation may not be that bad.[^2] Still.

Another reason why there are more reviews is that there is a growing pressure on scholars to increase their publication statistics: to enroll in a good PhD program, to get a position, to get tenured, to be promoted.

As my field (Computational Linguistics / Natural Language Processing > Machine Learning > AI) is very active, there is also pressure to be the first to [add stuff here], so papers need to be published quickly, and getting reviews is now part of the writing process. Which could be considered a good thing as well: reviews can help strengthen papers.

So for an author, there is no reason not to send a paper out for review: if the reviews are good, the paper will be published with the "peer-reviewed" stamp; if they are bad, they can help make the paper better. If the paper is not good enough for the current deadline, it might work for the next one. All this for free.

For a reviewer, all this combines to make things worse -- more papers to read and more reviews to write. All this for free, and most of the time no one will ever know. So one of my reactions has been to cut down on the number of reviews, and to stop reviewing (and submitting) papers outside the core NLP community (no more NIPS, ICML, ICLR, AAAI, IJCAI, etc. for me), even though these conferences also publish a significant number of NLP-related papers.

For the field as a whole, this has a clear downside: even with more slots and venues, more papers compete for the good spots, which means that: (a) people have less time to review, and the quality of reviews drops (maybe also as a combination of side effects, such as the direct or indirect involvement of more junior researchers in the process); (b) paper acceptance is increasingly random and very much depends upon the set of reviewers (see the toy simulation after this paragraph); (c) good papers get rejected; (d) bad papers can be reviewed multiple times, and waste reviewer time (back to (a)); (e) with more "core" papers to review, there is less time to accept reviews for other, related fields (applied linguistics, ML, image processing) - maybe.
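
To make (b) concrete, here is a minimal simulation sketch, with entirely made-up numbers, of how noisy scores turn borderline decisions into near coin flips: the same paper, sent to independently drawn panels, gets accepted by some and rejected by others.

```python
# Toy simulation (hypothetical numbers): acceptance of a borderline paper
# under noisy reviewing. Nothing here models any real conference.
import random

random.seed(0)

def panel_decision(true_quality, n_reviewers=3, noise=1.5, threshold=6.0):
    """Accept if the mean of noisy reviewer scores clears the threshold."""
    scores = [random.gauss(true_quality, noise) for _ in range(n_reviewers)]
    return sum(scores) / n_reviewers >= threshold

# The same paper (true quality just above the bar), judged by 1000
# independently drawn panels: how often do the panels agree?
accepted = sum(panel_decision(true_quality=6.2) for _ in range(1000))
print(f"accepted by {accepted / 10:.1f}% of panels")
```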

Why review?

  1. Because this is how our field, and more generally how science, works. Ideas, experiments, claims, results, proofs need to be reviewed, verified, criticized and discussed before they can slowly be turned into facts and knowledge. Ideally, we should even aim at full reproducibility. Reviewing is part of our job.

  2. Because this is good for our CV. Being a reviewer, a workshop / area chair, an editor, is good for the CV. It shows recognition and expertise. With more reviews to write, this might not be a very distinctive feature of our resumes. Nonetheless.

  3. Because we can learn from others. Reviewing helps keep in sync with new ideas in the field. So does following arXiv, sure, but reviewing, in addition, requires actually reading the papers, rather than skimming them (well, that is the theory).

In any case, writing a long, detailed, well-argued review remains a pure act of altruism. What you write will not last; almost no one will read it, and (almost) no one will know about your work. Why bother? Writing bad reviews is so much better. You still improve your CV, at a reduced cost. And (almost) no one will know.

The ACL discussion and proposals

ACL has opened the discussion regarding a series of short-term as well as long-term proposals to reform the reviewing and publication process. The discussion page is still open. Use it.

They start their discussion with four problems with the current system, three of which I share (see below) and one that I do not. I will start with the latter.

Turnaround time. Quoting the ACL Exec: "A large factor in the incentive to go to arXiv is that turnaround time from when a paper is finished to when it can be made public can be many months, especially when you get several random rejects." A first comment is that this turnaround issue is related to the first problem identified by the ACL Exec (arXiv and non-blind reviewing). If we fix that one (and we can, by having an ACL-operated blind archive, a proposal that is starting to gain momentum), maybe the turnaround issue will not be that bad. There are multiple deadlines throughout the year and multiple venues to publish good papers. Most conferences in our field are called "archival" for a reason - the papers they archive are meant to remain relevant for a long time. For papers that cannot wait the extra months before being released, because they risk losing their value, arXiv is an excellent choice, but please do not expect a good (and free) peer review. This is in fact my take on the ACL Exec's first proposal: a ban on pre-publishing on arXiv, as this breaks blind review. So, sad news: good peer review is somewhat antagonistic to fast turnaround -- even if we can make our best efforts, as TACL has, to reduce the total duration of the reviewing process.

A possible answer to this might be the launch of a new publication vehicle, 'Findings of EMNLP', for borderline papers that are in a hurry to be published in a peer-reviewed venue. The future will tell whether this initiative will help reduce the number of unpublished papers on arXiv (which is not a concern), as well as the number of resubmitted papers (which is one). There are good reasons to support this initiative. A. Rogers and I. Augenstein (2020) give (ironically, in a 'Findings' paper) compelling reasons why the exact contrary could well happen.

Comments

Reduce the number of papers to review: draining the pipeline

Ask each author to explicitly endorse their paper(s). It is not rare to see some authors having their name on dozens of papers submitted to a given conference. When reviews are random, this is the most rational course of action: submitting more papers maximizes the expected number of publications (the toy calculation below makes this concrete). How many papers can one actually author over a short time frame? That number may vary from individual to individual and from team to team. There is, however, a definition of what authoring a paper means. A simple nudge would be to ask each author to read this short text[Singapore] and confirm their status as an author. In fact, our ethical guidelines focus much more on editors and reviewers than on authors. This is an easy fix.
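
Here is a back-of-the-envelope calculation, with invented numbers, of why mass submission is individually rational (and collectively costly) when reviews behave like a lottery:

```python
# Toy calculation (invented numbers): if the per-paper acceptance probability
# is roughly constant -- i.e. reviews behave like a lottery -- the expected
# number of accepted papers grows linearly with the number of submissions,
# and so does the reviewing load imposed on the community.
P_ACCEPT = 0.25        # assumed acceptance probability of one submission
REVIEWS_PER_PAPER = 3  # assumed number of reviews each submission consumes

for n_submissions in (1, 4, 10):
    expected_accepts = n_submissions * P_ACCEPT
    reviews_consumed = n_submissions * REVIEWS_PER_PAPER
    print(f"{n_submissions:2d} submissions -> {expected_accepts:4.2f} expected "
          f"acceptances, {reviews_consumed:2d} reviews consumed")
```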

Desk reject more papers. Style and presentation issues are a problem. They make papers more difficult to read - and they impose an additional burden on reviewers: reporting typos, style errors, etc. A first step would be to have a decent English style checker included in the submission pipeline, with a public bar for automatic rejection. Mentoring could be proposed more systematically to assist paper writing (until the time when everyone can write in her own language). Given that we have already solved Machine Translation and Natural Language Understanding[^3], this seems doable. Likewise for plagiarism and incremental publications. Good journals have such tools. We should as well.

Better control resubmission. This is a TACL policy: rejected papers cannot be resubmitted before a period of time - taken to be long enough for the authors to make the necessary adjustments.

Distribute fixed submission tokens. This amounts to granting each author a limit on the number of papers they can have reviewed each year. As with the previous proposal on resubmission, this raises boundary issues that would be hard to fix, given the growing "fluidity" between conferences (LREC - Interspeech, ICASSP - ACL - ECAI, IJCAI, AAAI - ICML, NIPS, AIStats, ICLR, etc.).

Give more tokens if you review more papers, or if you publish more papers. Likely to create a rich-get-richer effect (in the best case) or a black market of reviewing tokens (in the worst), and to yield many discussions about implementation details (how many tokens for team work? One per author? Just one, to be split among authors? Just the first author? Just the last? Just one random author?). Having one central repository, as proposed by the ACL Exec, would make it possible to compute interesting statistics (max / mean number of submissions and reviews per year and per scholar or institution, etc.); a small sketch of such statistics is given below.
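
As an illustration, here is a minimal sketch of the kind of statistics such a repository could expose; the data model and the records are of course invented:

```python
# Toy sketch (invented data model): per-scholar submission and review counts
# that a central, ACL-wide repository could aggregate each year.
from collections import Counter
from statistics import mean

# Each record: (scholar_id, year, kind), with kind in {"submission", "review"}.
records = [
    ("a.smith", 2020, "submission"), ("a.smith", 2020, "submission"),
    ("a.smith", 2020, "review"),
    ("b.jones", 2020, "submission"),
    ("b.jones", 2020, "review"), ("b.jones", 2020, "review"),
    ("b.jones", 2020, "review"),
]

counts = Counter((scholar, year, kind) for scholar, year, kind in records)

for kind in ("submission", "review"):
    per_scholar = [n for (_, _, k), n in counts.items() if k == kind]
    print(f"{kind}s per scholar in 2020: "
          f"max={max(per_scholar)}, mean={mean(per_scholar):.1f}")
```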

Improve reviewing quality (work on the supply side)

Incentivize good reviews. This is a proposal of the ACL reviewing committee, which recommends recognizing good reviews and reviewers and granting awards (short-term action number 3). This implies that someone will have to review the reviews -- arguably, ACs / AEs read reviews and can evaluate them; but comparing them on fair grounds and distributing valuable rewards will require building a more robust system. Do we have time for this? Conversely, one could think of many ways to discourage bad reviews (a prize for the worst reviewer(s), publication of the worst reviews (with author or institution names), etc.). Probably not the road to follow.

Train reviewers. This is also an important ACL proposal (short-term action number 4). Learning to review should be part of any good PhD training program, and a lot of material has already been published, as pointed out in this EMNLP 2020 blog post or [editors]. Having reviews endorsed by senior staff members could also be a good thing, as long as the right person gets credit for the review. Only selecting reviewers who have been duly trained and prepared for this should be a basic rule of Program Committee building. Should one go one step further and enforce formal requirements (i.e., a completed PhD degree)?

Make reviews public. NeurIPS (as of 2013) and ICLR make their anonymous reviews (as well as the authors' rebuttals) public forever, and I tend to think that this improves reviews (could we test this?). If, additionally, reviews are signed, this should nudge reviewers towards paying more attention to what they write, which will last, have more readers, more impact, and may help make new friends (or enemies). Non-blind reviewing systems have other well-documented biases, though, so the cure might again be worse than the disease.

Political grunts

About paid reviews: https://www.the-scientist.com/careers/scientists-publishers-debate-paychecks-for-peer-reviewers-68101


[Singapore] Excerpt from the Singapore Statement on Research Integrity:

  1. Authorship: Researchers should take responsibility for their contributions to all publications, funding applications, reports and other representations of their research. Lists of authors should include all those and only those who meet applicable authorship criteria.

  2. Publication Acknowledgement: Researchers should acknowledge in publications the names and roles of those who made significant contributions to the research, including writers, funders, sponsors, and others, but do not meet authorship criteria.

[R&A] There are many points worth discussing in this paper. One statement that I find very disturbing is about the indisputable randomness of reject/accept decisions, which the authors attribute to an alleged power-law distribution of quality. I disagree - for reasons developed in [this associated post].


[^1]: EMNLP this year required that, for each submitted paper, at least one author be a registered candidate reviewer -- suggesting that there is some imbalance between the number of authors and reviewers.