How to keep or delete columns/variables of a data frame in R

After reading data into R, the first thing to do before processing, enriching or preparing the data in the required format is to retain the columns that will be used later and remove the rest. Working with fewer columns can improve performance in the subsequent steps.

There are two scenarios:

Dropping a list of columns from a data frame
Keeping only the required columns

The examples use a data frame named Testdata with the columns Vendor Name, Payment ID, Vendor Type, Country, Sales and Invoice Number.
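The example data frame itself is not reproduced in the text, so the following is only a minimal sketch of what Testdata might look like. The column names come from the methods below, and Payment ID, Country and Sales are placed at positions 2, 4 and 5 as the notes state. The positions of the other columns and all the values are assumptions made up for illustration.

```r
# Hypothetical stand-in for Testdata; the values are invented.
# check.names = FALSE keeps the spaces in the column names.
Testdata <- data.frame(
  "Vendor Name"    = c("Alpha Ltd", "Beta Inc", "Gamma LLC"),
  "Payment ID"     = c(1001, 1002, 1003),
  "Vendor Type"    = c("Local", "Foreign", "Local"),
  "Country"        = c("India", "USA", "India"),
  "Sales"          = c(2500, 4100, 1800),
  "Invoice Number" = c("INV-01", "INV-02", "INV-03"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

names(Testdata)  # column names in position order
str(Testdata)    # column types and a preview of the values
```

Checking names(Testdata) before dropping or keeping anything is a cheap way to confirm the exact spelling and position of each column.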

Scenario 1 – Dropping/deleting a list of columns from a data frame

Method 1: Delete column by name

We are going to delete/drop Vendor Type and Country.

df = subset(Testdata, select = -c(`Vendor Type`, Country))

Note: The “-” sign indicates dropping variables.

Do not put the column names in quotes when using the subset() function with “-”. A name that contains a space, such as Vendor Type, is wrapped in backticks instead.
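When the names to drop are already held as strings (for example, in a character vector built elsewhere), the “-” form above does not apply because a character vector cannot be negated. The following is a minimal sketch of a base R alternative using %in%. The data frame is a small stand-in with invented values, and drop_cols is a hypothetical variable name.

```r
# Small stand-in for Testdata; values are invented for illustration
Testdata <- data.frame(
  "Vendor Name" = c("Alpha Ltd", "Beta Inc"),
  "Payment ID"  = c(1001, 1002),
  "Vendor Type" = c("Local", "Foreign"),
  "Country"     = c("India", "USA"),
  check.names = FALSE
)

# Unquoted names (backticks for names with spaces), as in Method 1
df1 <- subset(Testdata, select = -c(`Vendor Type`, Country))

# Same columns dropped, starting from a character vector of names
drop_cols <- c("Vendor Type", "Country")
df2 <- Testdata[, !(names(Testdata) %in% drop_cols), drop = FALSE]

names(df1)
names(df2)
```

Both calls should leave Vendor Name and Payment ID in this stand-in.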

Method 2: Delete column by column index number

We are going to delete/drop Payment ID, Country and Sales.

df = Testdata[-c(2, 4:5)]

Note: 2, 4 and 5 are the positions of the columns in the data frame.
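Hard-coded positions silently point at different columns if the input file's column order changes. One way to hedge against that is to look the positions up by name first. This is a sketch under the assumption that the stand-in below mirrors the column order of Testdata; the values are invented.

```r
# Stand-in for Testdata: column order is assumed, values are invented
Testdata <- data.frame(
  "Vendor Name"    = c("Alpha Ltd", "Beta Inc"),
  "Payment ID"     = c(1001, 1002),
  "Vendor Type"    = c("Local", "Foreign"),
  "Country"        = c("India", "USA"),
  "Sales"          = c(2500, 4100),
  "Invoice Number" = c("INV-01", "INV-02"),
  check.names = FALSE
)

# Look up the positions of the columns to drop
idx <- match(c("Payment ID", "Country", "Sales"), names(Testdata))
idx  # 2 4 5 for this stand-in

# Drop by position, as in Method 2
df <- Testdata[-idx]
names(df)
```

match() returns NA for a name that is not present, so it is worth checking idx for NA before using it as a negative index.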

Method 3: Delete columns by index number using the “dplyr” package (install the “dplyr” package first)

We are going to delete/drop Payment ID, Country and Sales.

df = select(Testdata, -2, -4:-5)

Note: 2, 4 and 5 are the positions of the columns in the data frame.

Method 4: Delete columns by name using the “dplyr” package (install the “dplyr” package first)

We are going to delete/drop Payment ID, Country and Sales.

There are 2 ways:

df = select(Testdata, -`Payment ID`, -Country, -Sales)

         (or)

df = select(Testdata, -c(`Payment ID`, Country, Sales))
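If the names to drop are stored as strings, dplyr's all_of() helper can be negated inside select(). The sketch below assumes dplyr is installed and uses a stand-in data frame with invented values; drop_cols is a hypothetical variable name.

```r
library(dplyr)

# Stand-in for Testdata: column order is assumed, values are invented
Testdata <- data.frame(
  "Vendor Name"    = c("Alpha Ltd", "Beta Inc"),
  "Payment ID"     = c(1001, 1002),
  "Vendor Type"    = c("Local", "Foreign"),
  "Country"        = c("India", "USA"),
  "Sales"          = c(2500, 4100),
  "Invoice Number" = c("INV-01", "INV-02"),
  check.names = FALSE
)

# Unquoted names; backticks are needed for names that contain a space
df1 <- select(Testdata, -c(`Payment ID`, Country, Sales))

# Same columns dropped, starting from a character vector of names
drop_cols <- c("Payment ID", "Country", "Sales")
df2 <- select(Testdata, -all_of(drop_cols))

names(df1)
names(df2)
```

all_of() raises an error if any listed name is missing from the data, which makes typos visible early.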

Method 5: Delete columns whose names start with a given prefix, e.g. “Vendor”

df = Testdata[, !grepl("^Vendor", names(Testdata))]

Note: The “!” sign indicates negation. This retains the columns Payment ID, Country, Sales and Invoice Number.

Method 6: Delete columns whose names end with the letter “e”

df = Testdata[, !grepl("e$", names(Testdata))]

Note: This removes the columns whose names end with “e”, that is, “Vendor Name” and “Vendor Type”.

Method 7: Delete columns whose names contain “Type”

df = Testdata[, !grepl("Type", names(Testdata))]

Note: This removes the column whose name contains “Type”, that is, “Vendor Type”. No wildcard is needed in the pattern, because grepl() matches it anywhere in the name.
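The three pattern-based methods differ only in the regular expression, which the following sketch puts side by side. It assumes a stand-in for Testdata with invented values, and adds drop = FALSE so the result stays a data frame even if only one column is left.

```r
# Stand-in for Testdata: column order is assumed, values are invented
Testdata <- data.frame(
  "Vendor Name"    = c("Alpha Ltd", "Beta Inc"),
  "Payment ID"     = c(1001, 1002),
  "Vendor Type"    = c("Local", "Foreign"),
  "Country"        = c("India", "USA"),
  "Sales"          = c(2500, 4100),
  "Invoice Number" = c("INV-01", "INV-02"),
  check.names = FALSE
)

cols <- names(Testdata)

starts_vendor <- grepl("^Vendor", cols)            # "^" anchors at the start
ends_e        <- grepl("e$", cols)                 # "$" anchors at the end
has_type      <- grepl("Type", cols, fixed = TRUE) # literal match anywhere

df_a <- Testdata[, !starts_vendor, drop = FALSE]
df_b <- Testdata[, !ends_e, drop = FALSE]
df_c <- Testdata[, !has_type, drop = FALSE]

names(df_a)
names(df_b)
names(df_c)
```

grepl() is case-sensitive by default, so “type” in lower case would not match here unless ignore.case = TRUE is passed.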

Scenario 2 – Keeping required columns in a data frame

Method 1: Keep column by name

We are going to keep Vendor Type and Country.

df = subset(Testdata, select = c(`Vendor Type`, Country))

Note: The column names do not need quotes when using the subset() function. A name that contains a space, such as Vendor Type, is wrapped in backticks.
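Keeping columns from a character vector of names works directly with single-bracket indexing. The sketch below also shows one way to guard against a requested name that does not exist. The data frame is a stand-in with invented values, and “Region” is a deliberately absent, hypothetical column name.

```r
# Small stand-in for Testdata; values are invented for illustration
Testdata <- data.frame(
  "Vendor Name" = c("Alpha Ltd", "Beta Inc"),
  "Payment ID"  = c(1001, 1002),
  "Vendor Type" = c("Local", "Foreign"),
  "Country"     = c("India", "USA"),
  check.names = FALSE
)

# Unquoted names, as in Method 1
df1 <- subset(Testdata, select = c(`Vendor Type`, Country))

# Quoted names in a character vector
keep_cols <- c("Vendor Type", "Country")
df2 <- Testdata[keep_cols]

# "Region" is not in the data; intersect() keeps only the names present
wanted <- c("Vendor Type", "Country", "Region")
df3 <- Testdata[intersect(wanted, names(Testdata))]

names(df1)
names(df2)
names(df3)
```

Indexing with Testdata[wanted] directly would stop with an “undefined columns selected” error, so the intersect() step is a trade-off between strictness and convenience.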

Method 2: Keep column by column index number

We are going to keep Payment ID, Country and Sales.

df = Testdata[c(2, 4:5)]

Note: 2, 4 and 5 are the positions of the columns in the data frame.

Method 3: Keep columns by index number using the “dplyr” package (install the “dplyr” package first)

We are going to keep Payment ID, Country and Sales.

df = select(Testdata, 2, 4:5)

Note: 2, 4 and 5 are the positions of the columns in the data frame.

Method 4: Keep columns by name using the “dplyr” package (install the “dplyr” package first)

We are going to keep Payment ID, Country and Sales.

There are 2 ways:

df = select(Testdata, `Payment ID`, Country, Sales)

(or)

df = select(Testdata, c(`Payment ID`, Country, Sales))
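With names stored as strings, dplyr offers all_of() (strict) and any_of() (lenient) inside select(). This is a sketch assuming dplyr is installed; the data frame is a stand-in with invented values, and “Region” is a hypothetical column that is deliberately absent.

```r
library(dplyr)

# Stand-in for Testdata: column order is assumed, values are invented
Testdata <- data.frame(
  "Vendor Name"    = c("Alpha Ltd", "Beta Inc"),
  "Payment ID"     = c(1001, 1002),
  "Vendor Type"    = c("Local", "Foreign"),
  "Country"        = c("India", "USA"),
  "Sales"          = c(2500, 4100),
  "Invoice Number" = c("INV-01", "INV-02"),
  check.names = FALSE
)

keep_cols <- c("Payment ID", "Country", "Sales")

# all_of(): every listed name must exist, otherwise select() errors
df1 <- select(Testdata, all_of(keep_cols))

# any_of(): names missing from the data are skipped without an error
df2 <- select(Testdata, any_of(c(keep_cols, "Region")))

names(df1)
names(df2)
```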

Method 5: Keep columns whose names start with a given prefix, e.g. “Vendor”

df = Testdata[, grepl("^Vendor", names(Testdata))]

Method 6: Keep columns whose names end with the letter “e”

df = Testdata[, grepl("e$", names(Testdata))]

Note: This retains the columns whose names end with “e”, that is, “Vendor Name” and “Vendor Type”.
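The prefix and suffix selections can also be written with dplyr's selection helpers instead of grepl(). This is a swapped-in alternative rather than the method shown above, sketched on a stand-in data frame with invented values and assuming dplyr is installed.

```r
library(dplyr)

# Stand-in for Testdata: column order is assumed, values are invented
Testdata <- data.frame(
  "Vendor Name"    = c("Alpha Ltd", "Beta Inc"),
  "Payment ID"     = c(1001, 1002),
  "Vendor Type"    = c("Local", "Foreign"),
  "Country"        = c("India", "USA"),
  "Sales"          = c(2500, 4100),
  "Invoice Number" = c("INV-01", "INV-02"),
  check.names = FALSE
)

# The helpers ignore case by default; ignore.case = FALSE mirrors grepl()
df_start <- select(Testdata, starts_with("Vendor", ignore.case = FALSE))
df_end   <- select(Testdata, ends_with("e", ignore.case = FALSE))

names(df_start)
names(df_end)
```

Prefixing a helper with “-” inside select() drops the matching columns instead of keeping them.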

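Method 7: Keep columns whose names contain “Type”

df = Testdata[, grepl("Type", names(Testdata)), drop = FALSE]

Note: This retains the column whose name contains “Type”, that is, “Vendor Type”. Because only one column matches, drop = FALSE is needed to get a data frame back rather than a plain vector.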
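The effect of a single matching column is easy to overlook, so here is a short sketch of the difference. It uses a stand-in data frame with invented values; the variable names are hypothetical.

```r
# Small stand-in for Testdata; values are invented for illustration
Testdata <- data.frame(
  "Vendor Name" = c("Alpha Ltd", "Beta Inc"),
  "Vendor Type" = c("Local", "Foreign"),
  "Country"     = c("India", "USA"),
  check.names = FALSE
)

is_type <- grepl("Type", names(Testdata))

# With a single match, [ , j] simplifies the result to a vector
as_vector <- Testdata[, is_type]
is.data.frame(as_vector)  # FALSE

# drop = FALSE keeps the data frame structure and the column name
as_frame <- Testdata[, is_type, drop = FALSE]
is.data.frame(as_frame)   # TRUE
names(as_frame)
```

Indexing with a single bracket argument, Testdata[is_type], also keeps the data frame structure.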
