Selection using Base functions and possibly missing values #134

@tcovert

Suppose I have a DataFrame with two fields: idx and date. The date field has missing values (in the DataFrames sense) and is currently stored in the DataFrame as a string. Is there a query statement that I can write which parses the string into a date? I tried something like this:

df2 = @from i in df begin
    @select {i.idx, date = Date.(i.date, "mm/dd/yyyy")}
    @collect DataFrame
end

but got an error like this:

ERROR: type UnionAll has no field parameters
Stacktrace:
 [1] column_types at /Users/tcovert/.julia/v0.6/IterableTables/src/utilities.jl:20 [inlined]
 [2] _DataFrame(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},_} where _,Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##11#13}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:105
 [3] DataFrames.DataFrame(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},_} where _,Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##11#13}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:128
 [4] collect(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},_} where _,Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##11#13}, ::Type{DataFrames.DataFrame}) at /Users/tcovert/.julia/v0.6/Query/src/sinks/sink_type.jl:2

I also tried a version with no dot-broadcasting:

df2 = @from i in df begin
    @select {i.idx, date = Date(i.date, "mm/dd/yyyy")}
    @collect DataFrame
end

and got this error:

ERROR: MethodError: Cannot `convert` an object of type DataValues.DataValue{String} to an object of type Int64
This may have arisen from a call to the constructor Int64(...),
since type constructors fall back to convert methods.
Stacktrace:
 [1] next at /Users/tcovert/.julia/v0.6/Query/src/enumerable/enumerable_select.jl:41 [inlined]
 [2] macro expansion at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:91 [inlined]
 [3] _filldf(::Tuple{DataArrays.DataArray{Int64,1},Array{Date,1}}, ::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},Date},Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##15#16}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:79
 [4] _DataFrame(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},Date},Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##15#16}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:119
 [5] DataFrames.DataFrame(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},Date},Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##15#16}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:128
 [6] collect(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},Date},Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##15#16}, ::Type{DataFrames.DataFrame}) at /Users/tcovert/.julia/v0.6/Query/src/sinks/sink_type.jl:2

Is what I am trying to do possible? If so, what am I doing wrong?

Thanks in advance for any suggestions you can offer.

Here is some example data to run the code above on: https://www.dropbox.com/s/kgiicawhegmtavc/query_example.csv?dl=0
