Source of data: Link

Covid-19 data is easily accessible through several websites, it is usually publically available to use. It is also a crucial worldwide event, that affected everyone’s life. So, the analysis of this data may help in getting information on how to face such incidents if it happens again in the future.

1.Total number of confirmed cases by country

I used date_sub() function to get automated data from the day before, as the data changes and accumulates daily. So, whenever we would use the below code we get results up to the day before.

SELECT
  country_region,
  SUM(confirmed) AS Total_confirmed_cases, 
  SUM(deaths) AS Total_deaths
FROM `bigquery-public-data.covid19_jhu_csse.summary`
WHERE  date = date_sub(current_date(), INTERVAL 1 day)
GROUP BY country_region
ORDER BY 2 desc,3 desc

Sample from running the code by looker Studio:

Total amount of confirmed cases by country.bmp

2. Comparing total number of deaths to confirmed cases.

SELECT 
  country_region,  
  SUM(confirmed) AS Total_confirmed_cases,
  SUM(deaths) AS Total_deaths,
ROUND((SUM(deaths)/SUM(confirmed))*100, 2) AS Death_percentage
FROM `bigquery-public-data.covid19_jhu_csse.summary`
WHERE  date = date_sub(current_date(), INTERVAL 1 day)
GROUP BY country_region

Sample from running the code by looker Studio:

deathsvs cases.bmp

3.New cases and deaths For Finland “Updated daily”

First, I wrote a CTE query (base_date ) to list total_cases and total_deaths daily. Then a second CTE query (yesterday), where I used lag() function to get Total_confirmed_cases_yestarday, and Total_deaths_yestarday. Additionally, the difference between accumulated numbers and yesterday's result to get new case and new deaths can also be seen.

WITH base_date AS (
SELECT
  date, 
  country_region, 
  SUM(confirmed) AS Total_confirmed_cases,
  SUM(deaths) AS Total_deaths
FROM `bigquery-public-data.covid19_jhu_csse.summary`
GROUP BY 1,2
ORDER BY 2,3 
), yestarday AS(
SELECT
   *,
  lag(Total_confirmed_cases) over (partition by country_region order by date) AS Total_confirmed_cases_yestarday,
  lag(Total_deaths) over (partition by country_region order by date) AS Total_deaths_yestarday
FROM base_date)
SELECT
  country_region, 
  (Total_confirmed_cases - Total_confirmed_cases_yestarday) AS New_cases,
  (Total_deaths - Total_deaths_yestarday) AS New_deaths
FROM yestarday
WHERE
 country_region = "Finland" 
 AND date = date_sub(current_date(), INTERVAL 1 day)

4.Highest number of confirmed cases in one day per country.

I wrote a CTE query (base_date) to get a list of total_cases and total_deaths daily. Next a second CTE query (yesterday) and used lag() function to get Total_confirmed_cases_yestarday, and Total_deaths_yestarday.

I wrote a third CTE query (difference_table) to obtain the difference between accumulated number and yesterday’s result to get daily new case and daily new deaths. Lastly, I wrote a query where I used row_number() function to make windows and limit results by filtered results when row_number() is 1

WITH base_date AS (
SELECT
  date,
  country_region, 
  SUM(confirmed) AS Total_confirmed_cases, 
  SUM(deaths) AS Total_deaths
FROM `bigquery-public-data.covid19_jhu_csse.summary`
GROUP BY 1,2
ORDER BY 2,3 
), yestarday AS(
SELECT
  *,
  lag(Total_confirmed_cases) over (partition by country_region order by date) AS Total_confirmed_cases_yestarday,
  lag(Total_deaths) over (partition by country_region order by date) AS Total_deaths_yestarday
FROM base_date), difference_table as (
SELECT
  *, 
  (Total_confirmed_cases - Total_confirmed_cases_yestarday) AS daily_New_cases,
  (Total_deaths - Total_deaths_yestarday) AS daily_New_deaths
FROM yestarday)
SELECT
 * 
FROM (
SELECT
  date, 
  country_region,
  daily_New_cases,
  daily_New_deaths,
  ROW_NUMBER() over (partition by country_region order by daily_New_cases DESC, daily_New_deaths DESC ) as RN
FROM difference_table) 
WHERE RN = 1
ORDER BY 3 DESC

Sample from running the code :

daily.bmp