Skip to content

Throw Warning When Reading File with Blank Lines#7707

Open
Asa-Henry wants to merge 3 commits intoRdatatable:masterfrom
Asa-Henry:fix-3339
Open

Throw Warning When Reading File with Blank Lines#7707
Asa-Henry wants to merge 3 commits intoRdatatable:masterfrom
Asa-Henry:fix-3339

Conversation

@Asa-Henry
Copy link
Copy Markdown

Adds a check to detect when blank lines are present in the file passed to fread, but the user did not specify to skip blank lines. This change throws a warning to let the user know they may want to consider setting the blank.lines.skip to TRUE.

…kip' is greater than 0 when blank lines are present, so I also check if blank lines should be skipped so I can throw a warning to let the user know.
@jangorecki
Copy link
Copy Markdown
Member

PR could have meaningful title

@Asa-Henry Asa-Henry changed the title Fix #3339 Throw Warning When Reading File with Blank Lines Apr 14, 2026
@tdhock
Copy link
Copy Markdown
Member

tdhock commented Apr 14, 2026

please add a test case

@aitap
Copy link
Copy Markdown
Member

aitap commented Apr 14, 2026

This is a good start.

Can you run R CMD build . in the Git checkout directory and then R CMD check data.table_1.18.99.tar.gz? You'll need a working compiler for this (Rtools on Windows, highly specific compiler versions and libraries on macOS, feel free to ask otherwise) and all of data.table's dependencies installed (which you can do using something like source('.ci/ci.R'); dcf.dependencies(which = 'all') |> install.packages(); this is needed only once). Generally, data.table is required to pass the check with nothing worse than a few NOTEs. (There are some shorthands for this process that need a little more familiarity with R. You can learn about them if you want to by reading .dev/cc.R. See also ?data.table::test.data.table.)

You can see that the tests don't pass. I think that tests 1578.1, 1578.2, 1578.6, 1883 will need to be updated for the new warning, possibly 1867.07 and 1867.11 too. Unfortunately, other tests show a regression:

  • tests 1840.2, 1867.05, 1867.06, 1867.09, 2251.07, 2251.09 produce the "blank line" warning despite there are no blank lines in the input:
    write('a,b\n ab,cd,ce\nabc,def \n hj,kli  ', f<-tempfile())
    fread(f)
    fread("A\nCol1,Col2\n1,3,5\n2,4,6\n")
    fread("Some header\ninfo\nCol1,Col2,Col3\n1,3,5\n2,4,6\n")
    fread("Heading text\nA,B,C,D,E\n1,3,5\n2,4,6\n")
    lines = c(
    "12223, University",
    "12227, bridge, Sky",
    "12828, Sunset",
    "13801, Ground",
    "14853, Tranceamerica",
    "14854, San Francisco",
    "15595, shibuya, Shrine",
    "16126, fog, San Francisco",
    "16520, California, ocean, summer, golden gate, beach, San Francisco",
    "")
    text = paste(lines, collapse="\n")
    fread(text)
    text = paste(lines[c(1:5, 9L, 6:8, 10L)], collapse="\n")
    fread(text)
  • test 1856.3 warns about the blank lines despite they don't pose a real problem:
    fread("A,B\n\n\n")

Any ideas how the spurious warnings can be fixed? If you need to debug data.table::fread while it runs, you can run R -d $your_debugger or call Sys.getpid() and attach to the running process.

@Asa-Henry
Copy link
Copy Markdown
Author

Thanks for this comprehensive response @aitap! This made it easier to understand what went wrong with my changes!

In response to your question: it appears I introduced confusion between a file with intermittent blank lines and a file which has the header and the data separated by a blank line. For issues such as 1840.2, looks like I over looked how spaces come into play when parsing...

I picked up on that topSkip, prevStart, and topStart are used to find chunks of lines, but topSkip was greater than zero for the example case presented in the issue #3339, but I don't know enough about freadMain to know what all goes into dealing with blank lines. Is there any insight for how freadMain handles files or maybe that I'm working in the wrong place?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants