After reading a data in R, the 1st thing we should do before processing/enriching/ preparing the data in require format is to retain the required columns that can be used further and remove rest of the columns. This will  improve the performance in the subsequent steps.

There could be 2 scenarios

  • Dropping list of columns from a data frame
  • Keeping required columns

Here is an example of data frame (Testdata)

 


Scenario 1 – (Dropping/deleting) list of columns from a data frame)

Method 1: Delete column by name

We are going to delete/drop Vendor Type and Country

df= subset(Testdata, select = -c ( Vendor Type, Country))

Note: “-“ sign indicates dropping variables

 

Make sure the variable/column names should not specify in a quote when using () function

Method 2: Delete column by column index number

We are going to delete/drop Payment ID, Country and Sales

df = subset (Testdata[-c(2,4:5)]

Note: 2,4,5 are position of the variable in the data frame

Method 3: Delete columns by index number using “dplyr” package (install package “dplyr” )

We are going to delete/drop Payment ID, Country and Sales

df = select (Testdata,-2,-4:-5]

Note: 2,4,5 are position of the variable in the data frame

Method 4: Delete columns by name using “dplyr” package (install package “dplyr”)

We are going to delete/drop Payment ID, Country and Sales

There are 2 ways:

df = select (Testdata, -Payment ID, - Country, - Sales]

         (or)

df = select (Testdata, -C (Payment ID, Country, Sales)]

Method 5: Delete columns whose name starts with e.g. “Vendor”

df = Testdata[,!grepl(“^Vendor”, names(Testdata))]

Note: “!” sign indicates negation, It will retain column Payment ID, country, sales and invoice number

Method 6: Delete columns whose name ends with letter “e”

df = Testdata[,!grepl(“e$”, names(Testdata))]

Note: This will remove column that end with “e” that is “Vendor Name” and “Vendor Type

Method 7: Delete columns whose name contains “Type”

df = Testdata[,!grepl(“*Type”, names(Testdata))]

Note: This will remove column that end with “Type” that is “Vendor Type

Scenario 2 – (Keeping required columns in a data frame)

Method 1: Keep column by name

We are going to keep Vendor Type and Country

df= subset(Testdata, select = c ( Vendor Type, Country))

Note: Make sure the variable/column names should not specify in a quote when using () function

Method 2: Keep column by column index number

We are going to keep Payment ID, Country and Sales

df = subset (Testdata[c(2,4:5)]


Note: 2,4,5 are position of the variable in the data frame

Method 3: Keep columns by index number using “dplyr” package (install package “dplyr” )

We are going to keep Payment ID, Country and Sales

df = select (Testdata,2,4:5]

Note: 2,4,5 are position of the variable in the data frame

Method 4: Keep columns by name using “dplyr” package (install package “dplyr”)

We are going to keep Payment ID, Country and Sales

There are 2 ways:

df = select (Testdata, Payment ID, Country, Sales]

(or) 

df = select (Testdata, C (Payment ID, Country, Sales)]

Method 5: Keep columns whose name starts with e.g. “Vendor”

df = Testdata[,grepl(“^Vendor”, names(Testdata))]

Method 6: Keep columns whose name ends with letter “e”

df = Testdata[,grepl(“e$”, names(Testdata))]

Note: This will retains columns that end with “e” that is “Vendor Name” and “Vendor Type”

Method 7: keep columns whose name contains “Type”

df = Testdata[,grepl(“*Type”, names(Testdata))]

Note: This will retain column that end with “Type” that is “Vendor Type