[Big Data] Using flexdashboard: tabsets, dygraphs, plotly, layout figure sizes, the shiny package, and valueBox() icons


GOGO치삼 2020. 4. 8. 10:50


How to use flexdashboard

flexdashboard basics

---
title: "Dashboard Example"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns 
    vertical_layout: fill  
---

```{r setup, include=FALSE} 
library(flexdashboard)
library(ggplot2)
```

Column {data-width=650}
-----------------------------------------------------------------------
<!-- data-width sets the column's width; 650 vs. 350 splits the page roughly 65:35 -->

### Chart A

```{r}
# Simple visualization
ggplot(data=mtcars, aes(x=hp, y=mpg, color=as.factor(cyl)))+
  geom_point()

```

Column {data-width=350}
-----------------------------------------------------------------------
<!-- Second column (data-width=350): Chart B and Chart C are stacked vertically in it -->

### Chart B

```{r}
ggplot(data = mtcars)+
  geom_bar(mapping = aes(x=cyl, fill=as.factor(am)))
```

### Chart C

```{r}
ggplot(data=mtcars)+
  geom_bar(mapping = aes(x=cyl, fill=as.factor(cyl)), position="dodge")+
  coord_polar() # polar coordinates turn the bars into a radial chart (like a Nightingale rose)
```

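With vertical_layout: fill, the charts are resized to fill the browser window. When the window is resized (for example, made very narrow), the column layout may not be kept. This responsive behavior is common on the web.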

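---
title: "Dashboard Example"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns # lay the charts out by columns
    vertical_layout: fill  # shrink or grow the charts to fit the window size
---

```{r setup, include=FALSE}
# chunk named "setup"; include=FALSE runs the code but hides the code and its output
library(flexdashboard)
library(ggplot2)
```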

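Applying flexdashboard: orientation: rows

With orientation: rows, each level-2 header (text underlined with dashes) defines a row instead of a column. The charts inside a row are placed side by side. The header text ("Column" below) is only a label.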

---
title: "Dashboard Example"
output: 
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill  
---

```{r setup, include=FALSE} 
library(flexdashboard)
library(ggplot2)
```

Column {data-width=650}
-----------------------------------------------------------------------
<!-- with orientation: rows, this section becomes the first row -->

### Chart A

```{r}
# Simple visualization
ggplot(data=mtcars,aes(x=hp, y=mpg,color=as.factor(cyl)))+
geom_point()

```

Column {data-width=350}
-----------------------------------------------------------------------
<!-- second row: Chart B and Chart C are placed side by side in it -->

### Chart B

```{r}
ggplot(data = mtcars)+
  geom_bar(mapping = aes(x=cyl, fill=as.factor(am)))
```

### Chart C

```{r}
ggplot(data=mtcars)+
  geom_bar(mapping = aes(x=cyl, fill=as.factor(cyl)), position="dodge")+
  coord_polar() # polar coordinates turn the bars into a radial chart (like a Nightingale rose)
```

---
title: "Dashboard Example"
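output: # output settings (the nested lines below must keep their indentation; the overall layout is column-based)
  flexdashboard::flex_dashboard:
    orientation: columns # lay the dashboard out by columns (columns or rows)
    vertical_layout: scroll # charts keep their natural height and the page scrolls (fill would stretch them to the window height)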
---

```{r setup, include=FALSE}
# this chunk is named "setup"; include=FALSE runs it but hides the code and its output
library(flexdashboard) 
library(ggplot2) # load libraries used throughout the document here; a library needed only once can be loaded in that chunk
```

Column {data-width=650}
-----------------------------------------------------------------------
<!-- First column, width 650. Everything under this header belongs to this column (the dashed underline creates the column; the header text is only a label) -->
### Chart A

```{r}
ggplot(data=mtcars, aes(x=hp, y=mpg, color=as.factor(cyl)))+
  geom_point()
```

Column {data-width=350} 
-----------------------------------------------------------------------
<!-- Second column, width 350 -->
### Chart B

```{r}
ggplot(data=mtcars)+
  geom_bar(mapping=aes(x=cyl, fill=as.factor(am)))
```

### Chart C

```{r}
ggplot(data=mtcars)+
  geom_bar(mapping = aes(x=cyl, fill=as.factor(cyl)), position="dodge")+
  coord_polar() # polar coordinates turn the bars into a radial chart (like a Nightingale rose)
```

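Result after switching to orientation: columns and vertical_layout: scroll.

Using tabsets in flexdashboard

Adding {.tabset} to a column header shows each ### section in that column as a separate tab instead of stacking the sections.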

---
title: "Dashboard Example"
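output: # output settings (the nested lines below must keep their indentation; the overall layout is column-based)
  flexdashboard::flex_dashboard:
    orientation: columns # lay the dashboard out by columns (columns or rows)
    vertical_layout: scroll # charts keep their natural height and the page scrolls (fill would stretch them to the window height)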
---

```{r setup, include=FALSE}
# this chunk is named "setup"; include=FALSE runs it but hides the code and its output
library(flexdashboard) 
library(ggplot2) # load libraries used throughout the document here; a library needed only once can be loaded in that chunk
```

Column {data-width=650}
-----------------------------------------------------------------------
<!-- First column, width 650. Everything under this header belongs to this column (the dashed underline creates the column; the header text is only a label) -->
### Chart A

```{r}
ggplot(data=mtcars, aes(x=hp, y=mpg, color=as.factor(cyl)))+
  geom_point()
```

Column {.tabset} 
-----------------------------------------------------------------------
<!-- Second column shown as tabs: each ### section below becomes one tab -->
### Chart B

```{r}
ggplot(data=mtcars)+
  geom_bar(mapping=aes(x=cyl, fill=as.factor(am)))
```

### Chart C

```{r}
ggplot(data=mtcars)+
  geom_bar(mapping = aes(x=cyl, fill=as.factor(cyl)), position="dodge")+
  coord_polar() # polar coordinates turn the bars into a radial chart (like a Nightingale rose)
```

---
title: "Dashboard Example"
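output: # output settings (the nested lines below must keep their indentation; the overall layout is column-based)
  flexdashboard::flex_dashboard:
    orientation: columns # lay the dashboard out by columns (columns or rows)
    vertical_layout: scroll # charts keep their natural height and the page scrolls (fill would stretch them to the window height)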
---

```{r setup, include=FALSE}
# this chunk is named "setup"; include=FALSE runs it but hides the code and its output
library(flexdashboard) 
library(ggplot2) # load libraries used throughout the document here; a library needed only once can be loaded in that chunk
```

Column {data-width=650}
-----------------------------------------------------------------------
<!-- First column, width 650. Everything under this header belongs to this column (the dashed underline creates the column; the header text is only a label) -->
### Chart A

```{r}
ggplot(data=mtcars, aes(x=hp, y=mpg, color=as.factor(cyl)))+
  geom_point()
```

Column {.tabset .tabset-fade} 
-----------------------------------------------------------------------
<!-- Second column shown as tabs; .tabset-fade adds a fade effect when switching tabs -->
### Chart B

```{r}
ggplot(data=mtcars)+
  geom_bar(mapping=aes(x=cyl, fill=as.factor(am)))
```

### Chart C

```{r}
ggplot(data=mtcars)+
  geom_bar(mapping = aes(x=cyl, fill=as.factor(cyl)), position="dodge")+
  coord_polar() # polar coordinates turn the bars into a radial chart (like a Nightingale rose)
```

Using dygraphs in flexdashboard

---
title: "Untitled"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
---

```{r setup, include=FALSE}
library(flexdashboard)
library(dygraphs) # interactive time-series charts with highlighting, zooming, and more
```

### Lung Deaths (All)

```{r}
dygraph(ldeaths)
```

### Lung Deaths (male)

```{r}
dygraph(mdeaths)
```

### Lung Deaths (female)

```{r}
dygraph(fdeaths)
```
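The three lung-death series can also be shown in a single chart. This is a minimal sketch that assumes the dygraphs package is installed. `cbind()` combines the male and female series into one multivariate time series, `dySeries()` relabels each line, and `dyRangeSelector()` adds a slider under the chart for zooming. The object names `lung_deaths` and `g` are only illustrative.

```r
library(dygraphs)

# Combine the monthly male and female series (UK, 1974-1979) into one time-series matrix
lung_deaths <- cbind(mdeaths, fdeaths)

# One interactive chart: two labeled lines plus a range selector for zooming
g <- dygraph(lung_deaths, main = "Monthly lung deaths in the UK") %>%
  dySeries("mdeaths", label = "Male") %>%
  dySeries("fdeaths", label = "Female") %>%
  dyRangeSelector()
g
```

Inside a dashboard chunk, the final `g` is what gets displayed.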

Using plotly (and highcharter) in flexdashboard

---
title: "Untitled"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
---

```{r setup, include=FALSE}
library(flexdashboard)
library(dygraphs) # interactive time-series charts (not used in this example)
library(plotly)

#install.packages("highcharter")
library(highcharter)
```

### plotly: hp vs. mpg (mtcars)

```{r}
plot_ly(mtcars, x=~hp,y=~mpg, type = 'scatter',
        mode='markers', color = ~as.factor(cyl))
        
```

### highcharter: hp vs. mpg (mtcars)

```{r}
hchart(mtcars,"scatter", hcaes(x=hp, y=mpg, group=as.factor(cyl)))
```

### highcharter: diamond price histogram

```{r}
hchart(diamonds$price, color="#B71C1C", name="Price") %>% 
  hc_title(text = "You can Zoom me")
```
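If a chart already exists as a ggplot2 object, plotly's `ggplotly()` can convert it into an interactive widget, so it does not have to be rewritten with `plot_ly()`. This is a minimal sketch that assumes ggplot2 and plotly are installed. It reuses the hp/mpg scatterplot from the earlier Chart A.

```r
library(ggplot2)
library(plotly)

# The static ggplot2 scatterplot used for Chart A earlier
p <- ggplot(mtcars, aes(x = hp, y = mpg, color = as.factor(cyl))) +
  geom_point() +
  labs(color = "cyl")

# Convert it to an interactive plotly widget (hover tooltips, zoom, pan)
interactive_p <- ggplotly(p)
interactive_p
```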

https://rmarkdown.rstudio.com/flexdashboard/using.html#compenents


Layout figure sizes (fig.width / fig.height)

---
title: "figure Sizes"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
---

```{r setup, include=FALSE}
library(flexdashboard)
```

Column
-----------------------------------------------------------------------

### Chart A(12,7)

```{r, fig.width=12,fig.height=7}
plot(cars)
```

Column 
-----------------------------------------------------------------------

### Chart B(5,5)

```{r,fig.width=5,fig.height=5}
plot(pressure)

```

### Chart C(10,7)

```{r,fig.width=10,fig.height=7}
plot(airmiles)

```
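fig.width and fig.height are knitr chunk options measured in inches. Instead of repeating them in every chunk, you can set a default once in the setup chunk. A chunk that sets its own values, like Chart A above, still overrides that default. This is a minimal sketch that assumes knitr is installed (it is whenever rmarkdown is). The 8 x 5 size is an arbitrary example.

```r
# Put this in the setup chunk: default figure size (in inches) for every chunk
knitr::opts_chunk$set(fig.width = 8, fig.height = 5)

# Chunk options can be read back, e.g. to check the current defaults
knitr::opts_chunk$get("fig.width")
```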

The shiny package (runtime: shiny)

---
title: "figure Sizes"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
runtime: shiny
---

```{r setup, include=FALSE}
library(flexdashboard)

```

Column
-----------------------------------------------------------------------

### Chart A(12,7)

```{r, fig.width=12,fig.height=7}
plot(cars)
```

Column 
-----------------------------------------------------------------------

### Chart B(5,5)

```{r,fig.width=5,fig.height=5}
plot(pressure)

```

### Chart C(10,7)

```{r,fig.width=10,fig.height=7}
library(shiny)
renderTable({head(mtcars,10)})

```
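With runtime: shiny, inputs can drive the render functions. Below is a minimal sketch of a complete dashboard file, assuming the shiny package. A column marked `{.sidebar}` holds a `sliderInput()`, and `renderTable()` reads the slider value through `input$n`. The input id `n`, the slider range and the title are arbitrary choices for this example.

````
---
title: "Shiny input example"
output: flexdashboard::flex_dashboard
runtime: shiny
---

```{r setup, include=FALSE}
library(flexdashboard)
library(shiny)
```

Column {.sidebar}
-----------------------------------------------------------------------

```{r}
# slider whose current value is available as input$n
sliderInput("n", "Rows to show:", min = 1, max = 32, value = 10)
```

Column
-----------------------------------------------------------------------

### First rows of mtcars

```{r}
# re-rendered every time the slider moves
renderTable({ head(mtcars, input$n) }, rownames = TRUE)
```
````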

---
title: "figure Sizes"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
runtime: shiny
---

```{r setup, include=FALSE}
library(flexdashboard)

```

Column
-----------------------------------------------------------------------

### Chart A(12,7)

```{r, fig.width=12,fig.height=7}
plot(cars)
```

Column 
-----------------------------------------------------------------------

### Chart B(5,5)

```{r,fig.width=5,fig.height=5}
plot(pressure)

```

### Chart C(10,7)

```{r,fig.width=10,fig.height=7}
library(shiny)
renderTable({head(mtcars,10)})

```


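### Data table (DT)

```{r}
# paging = TRUE splits the table into pages of pageLength rows;
# fillContainer = TRUE makes the table fill its box and scroll inside it
DT::datatable(mtcars,
              options = list(pageLength=25,
                             paging=TRUE),
              fillContainer = TRUE)
```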
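`DT::datatable()` has a few more useful arguments. This is a minimal sketch that assumes the DT package is installed. `filter = "top"` adds a filter box above each column. `options` is passed through to the DataTables JavaScript library (here `pageLength`, the number of rows per page). `fillContainer = TRUE` lets the table fill its dashboard box.

```r
library(DT)

tbl <- datatable(
  mtcars,
  filter = "top",                   # a filter box above every column
  options = list(pageLength = 10),  # DataTables option: rows per page
  fillContainer = TRUE              # fill the containing box; scroll if needed
)
tbl
```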

Setting valueBox() icons in flexdashboard

---
title: "Untitled"
output: 
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
---

```{r setup, include=FALSE}
library(flexdashboard)
library(ggplot2)
library(knitr)
```

### valuebox Example1

```{r}
valueBox(42, 
         icon = "fa-github")

```

### # of Bit Coin

```{r}
num=8
valueBox(num, 
         icon = "fa-bitcoin",
         color = "info")

```

### valuebox Example3

```{r}
num=50
valueBox(num,
         caption="APPLE PAY",
         icon = "fa-bluetooth",
         color = ifelse(num>10, "warning", "primary")) # "warning" and "primary" are built-in valueBox colors (as are "info", "success", "danger")

```

### valuebox Example4

```{r}
valueBox(107,
         caption="AWS",
         icon = "fa-cannabis",
         color = "success")
```
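The `color` argument can also be computed. In this minimal sketch, `box_color()` is a hypothetical helper, not part of flexdashboard. It maps a value to one of the built-in color names using two thresholds, which keeps the rule in one place when several value boxes share it. The thresholds 10 and 100 are arbitrary.

```r
library(flexdashboard)

# Hypothetical helper (not part of flexdashboard): choose a built-in
# valueBox color from two thresholds
box_color <- function(x, warn_above = 10, danger_from = 100) {
  if (x >= danger_from) "danger" else if (x > warn_above) "warning" else "primary"
}

num <- 50
valueBox(num,
         caption = "APPLE PAY",
         icon = "fa-bluetooth",
         color = box_color(num))
```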

---
title: "Untitled"
output: 
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
---

```{r setup, include=FALSE}
library(flexdashboard)
library(ggplot2)
library(knitr)
```

Row1
----------------------------------------

### valuebox Example1

```{r}
valueBox(42, 
         icon = "fa-github")

```

### # of Bit Coin

```{r}
num=8
valueBox(num, 
         icon = "fa-bitcoin",
         color = "info")

```

### valuebox Example3

```{r}
num=50
valueBox(num,
         caption="APPLE PAY",
         icon = "fa-bluetooth",
         color = ifelse(num>10, "warning", "primary")) # "warning" and "primary" are built-in valueBox colors (as are "info", "success", "danger")

```

### valuebox Example4

```{r}
valueBox(107,
         caption="AWS",
         icon = "fa-cannabis",
         color = "success")
```

Row2
----------------------------------------

### ggplot2 graph

```{r}
ggplot(data=mtcars)+
  geom_bar(mapping = aes(x=cyl, fill=as.factor(am)),
           position="dodge")+
  theme(legend.position = "none")
```

### Tabular data

```{r}
kable(mtcars)
```

---
title: "Multiple Pages"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    source_code: embed
    navbar:
      - { title: "About", href: "https://www.naver.com/about", align: right }
      - { icon: "fa-pencil", href: "https://www.github.com", align: right}
---

Page 1
==========================================
  
  This is an example. As you can see, flexdashboard can have text annotations.

```{r setup, include=FALSE}
library(flexdashboard)
library(ggplot2)
library(knitr)
```

Column 1 {data-width=300}
-----------------------------------------------------------------------
  
### Gauge ex1. Contact Rate
  
```{r}
gauge(45, min=0, max=100, symbol = '%',              # sectors: the gauge color depends on which range the value falls in
      sectors = gaugeSectors(success = c(80, 100),
                             warning = c(40, 79),
                             danger = c(0, 39)))

```

### Gauge ex2. Average Rating

```{r}
rating=42
gauge(rating, 0, 50, label = 'Test', sectors = gaugeSectors(
  success = c(41, 50), warning = c(21, 40), danger = c(0, 20)))

```

### Text Annotation

A dashboard section can also be used as a text area.

Markdown syntax works here.

It is not that difficult.

Just give it a try.

Page 2
==========================================

Column 2 {data-width=300}
-----------------------------------------------------------------------

### ggplot2 chart1

```{r}

ggplot(data= mtcars,aes(x=hp, y=mpg,color=as.factor(cyl)))+
  geom_point()+
  theme(legend.position = "none")

```

Navbar entries: where the "About" link and the pencil icon go when clicked

- { title: "About", href: "https://www.naver.com/about", align: right }
- { icon: "fa-pencil", href: "https://www.github.com", align: right}
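In the gauges above, `gaugeSectors()` receives one c(from, to) range per color. The plain-R sketch below is not flexdashboard code. `sector_of()` is a hypothetical helper that reports which range a value falls into, treating both bounds as inclusive (an assumption). Checking the ranges this way shows that integer bounds such as c(40, 79) and c(80, 100) leave a value like 79.5 uncovered. Contiguous bounds may be worth considering when values are not whole numbers.

```r
# Hypothetical helper: which named range contains a value? (bounds treated as inclusive)
sector_of <- function(value, sectors) {
  hit <- vapply(sectors, function(r) value >= r[1] && value <= r[2], logical(1))
  if (any(hit)) names(sectors)[hit][1] else NA_character_
}

# The ranges used in "Gauge ex1"
contact_sectors <- list(success = c(80, 100), warning = c(40, 79), danger = c(0, 39))

sector_of(45, contact_sectors)    # "warning"
sector_of(79.5, contact_sectors)  # NA: falls between 79 and 80
```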

########################################################
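# A hypothetical case study: bookstore customer data
#http://blog.daum.net/revisioncrm/405
# (exploratory analysis and customer segmentation)
########################################################
# Data file: cust_seg_smpl_280122.csv
# --------------------------------------------------------
# Tasks
#--------------------------------------------------------
# Check the relationship between recency (time since the last purchase) and the number of books bought
# Many customers may share the same coordinates, so handle the overplotting

# Hypothesis 1
# Judging from the regression line used as a reference, the less recent the last purchase
# (i.e., the longer ago it was), the more books the customer bought
#---------------------------------------------------------

# Hypothesis 2
# The more books bought, the larger the amount spent
# Compute the average price per book to see whether customers mainly bought expensive books


# Steps
#---------------------------------------------------------
# Compare characteristics by sex
#---------------------------------------------------------
# Compare book and non-book purchase amounts

# Conclusion 1
# Select the group that buys many books but few other products
# and focus cross-selling efforts on it


#---------------------------------------------------------
# Target group criteria:
# extract the customers inside the region marked by the visual reference lines


#---------------------------------------------------------
# Visually profile the selected group
# Check the number of books bought and the sex distribution (women in pink)


#---------------------------------------------------------
# Compare with the mean/median number of books bought by all customers

# The profile shows female customers who bought more books than
# the median and also more than the mean

# Conclusion 2
# Run a cross-sell campaign that finds non-book products preferred by women
# and recommends them


#---------------------------------------------------------
# Customer segmentation with cluster analysis

# Assign an arbitrary color to each customer segment
# Create a vector of color names indexed by cluster number

# Color each customer by the cluster they belong to


#---------------------------------------------------------
# Conclusion 3
# Customers who buy many genres are likely to buy many books,
# so compute a new ratio (= number of genres / number of books bought)
######################################################

# Problem: segment the customers and analyze each segment
# Goal: decide whom to target next and how to focus the marketing effort
# Many customers are stacked on the same coordinates, so spread them out with jitter()

# Check whether customers really bought many books, and whether they mostly bought expensive or cheap books
# Split by sex to see who bought other products and who bought books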

# 1. Load the data file
cs0<-read.csv("cust_seg_smpl_280122.csv")
head(cs0)
names(cs0) # get the column names (getter form of names())

cs1<-cs0

names(cs1) <- c("cust_name", "sex", "age", "location", "days_purchase", # rename the columns (setter form)
                "recency", "num_books", "amt_books", "amt_non_book",
                "amt_total", "interest_genre", "num_genre",
                "membership_period", "sms_optin" )
# (.xls is the old Excel file format; .xlsx is the newer one)

# Check the relationship between recency (time since the last purchase) and the number of books bought
plot(cs1$recency,cs1$num_books)

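# Many customers may share the same coordinates, so use jitter()
# --------------------------------------------------------
# jitter()
# jitter() shifts each value by a small random amount so that points sharing
# the same coordinates are not drawn exactly on top of each other.
# It works on numeric vectors.
# --------------------------------------------------------
# Description
# Adds a small amount of noise to a numeric vector.

# Usage
# jitter(x, factor = 1, amount = NULL)

# Arguments
# x: numeric vector to which the noise is added.
#
# factor: numeric; scales the amount of noise.

# amount: numeric.
# If positive, it is used as the amount a directly;
# if amount == 0, a = factor * z / 50.
# The default is amount = NULL, in which case
# a = factor * d / 5, where d is the smallest difference between adjacent unique x values.

# Details
# Let z <- max(x) - min(x) (assuming the usual case).
# The amount a is either given as a positive argument amount or computed as above,
# and the result is x + runif(length(x), -a, a).

round(jitter(c(rep(1,3),rep(1.2,4),rep(3,3))),3) # spreads apart repeated values
# [1] 0.997 1.040 0.975 1.174 1.190 1.177 1.187 2.970 2.998 3.022   (random; differs on every run)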

# rep(1,3)
# [1] 1 1 1
# rep(1.2,4)
# [1] 1.2 1.2 1.2 1.2
# rep(3,3)
# [1] 3 3 3

# jitter(c(rep(1,3),rep(1.2,4),rep(3,3)))
# [1] 0.9854338 0.9838435 0.9848714 1.1701283 1.1712497 1.2084189 1.1942649 3.0360655 3.0260686 2.9635534
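The default noise size can be checked numerically with the same vector. The distinct values 1, 1.2 and 3 are at least d = 0.2 apart, so with factor = 1 the default noise is at most a = d / 5 = 0.04. With amount = 0, the bound is a = z / 50, where z = 3 - 1 = 2, which is also 0.04 here. The seed is arbitrary, and the small tolerance only absorbs floating-point rounding.

```r
x <- c(rep(1, 3), rep(1.2, 4), rep(3, 3))

d <- min(diff(sort(unique(x))))  # smallest gap between distinct values (0.2)
z <- diff(range(x))              # max(x) - min(x) (2)

set.seed(42)                         # make the random noise reproducible
jx_default <- jitter(x)              # amount = NULL -> a = factor * d / 5
jx_zero    <- jitter(x, amount = 0)  # amount = 0    -> a = factor * z / 50

# Every value moved by at most a
max(abs(jx_default - x)) <= d / 5 + 1e-12
max(abs(jx_zero - x))    <= z / 50 + 1e-12
```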

# Scatterplot of the jittered values
plot(jitter(cs1$recency),jitter(cs1$num_books))

# Add the least-squares regression line (num_books ~ recency)
abline(lm(cs1$num_books~cs1$recency),col="blue")
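Hypothesis 1 is judged from the direction of the regression line, and the sign of the fitted slope expresses the same thing as a number. This minimal sketch uses a small made-up data frame; `demo` and its values are hypothetical, not the bookstore data. With the real data, the same call would be `lm(num_books ~ recency, data = cs1)`.

```r
# Hypothetical toy data: recency (time since last purchase) and books bought
demo <- data.frame(recency   = c(1, 3, 5, 8, 12, 20),
                   num_books = c(2, 3, 3, 5, 6, 9))

fit   <- lm(num_books ~ recency, data = demo)
slope <- coef(fit)[["recency"]]
slope                        # > 0: longer since last purchase, more books (in this toy data)
summary(fit)$coefficients    # estimate, standard error, t value, p-value
```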

View(cs1)

# Amounts exported from Excel contain thousands separators (e.g. "1,234"); remove the commas with gsub() and convert to numeric
cs1$amt_books <- as.numeric(gsub(",",
                                 "",
                                 as.character(cs1$amt_books))
)

cs1$amt_non_book <- as.numeric(gsub(",",
                                    "",
                                    as.character(cs1$amt_non_book))
)
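The same cleanup is repeated for every amount column, so it can be wrapped in a small function. In this sketch, `to_number()` is a hypothetical helper name. `fixed = TRUE` makes gsub() treat the comma as a literal string rather than a regular expression.

```r
# Hypothetical helper: "1,234" -> 1234 (thousands separators removed)
to_number <- function(x) {
  as.numeric(gsub(",", "", as.character(x), fixed = TRUE))
}

to_number(c("1,234", "56", "1,000,000"))  # 1234, 56, 1000000

# Applied to several columns at once, e.g. for the bookstore data:
# amt_cols <- c("amt_books", "amt_non_book")
# cs1[amt_cols] <- lapply(cs1[amt_cols], to_number)
```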

plot(jitter(cs1$num_books), jitter(cs1$amt_books))

abline(lm(cs1$amt_books~cs1$num_books),
       col="blue")

# Compute the average price per book to see whether customers mainly bought expensive books
cs1$unitprice_book <- cs1$amt_books / cs1$num_books
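If any customer has num_books equal to 0, this division gives NaN (0/0) or Inf. `max(cs1$unitprice_book)`, used for the plot limits below, would then be NaN or Inf as well. The guarded version below is sketched on hypothetical toy values. It sets the unit price to NA for such customers and drops NA when taking the maximum.

```r
# Hypothetical toy values: the last customer bought no books
num_books <- c(4, 10, 0)
amt_books <- c(60000, 95000, 0)

# NA instead of NaN/Inf when no book was bought
unitprice_book <- ifelse(num_books > 0, amt_books / num_books, NA)
unitprice_book                            # 15000 9500 NA
max(unitprice_book, na.rm = TRUE) * 1.05  # usable upper limit for ylim
```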

plot(jitter(cs1$num_books),
     jitter(cs1$unitprice_book),
     pch=19,
     col="blue",
     cex=0.7,
     ylim=c(0, max(cs1$unitprice_book)*1.05)
)

abline(lm(cs1$unitprice_book~cs1$num_books),
       col="blue",
       lwd=2, lty=2)

abline(h=median(cs1$unitprice_book), # horizontal reference line at the median unit price
       col="darkgrey")

# Compare characteristics by sex
plot(jitter(cs1$num_books),
     jitter(cs1$unitprice_book),
     pch=19,
     cex=0.7,
     col=ifelse(cs1$sex=='여', "pink", "blue"), # '여' = female in the data
     ylim=c(0, max(cs1$unitprice_book)*1.05),
     sub="pink: female, blue: male")

abline(lm(cs1$unitprice_book~cs1$num_books),
       col="blue",
       lwd=2, lty=2)



abline(h=median(cs1$unitprice_book), # horizontal reference line at the median unit price
       col="darkgrey")

# Scale the point size by the non-book purchase amount
plot(jitter(cs1$num_books),
     jitter(cs1$unitprice_book),
     pch=19,
     cex=4*cs1$amt_non_book/max(cs1$amt_non_book),
     col=ifelse(cs1$sex=='여', "pink", "blue"), # '여' = female in the data
     ylim=c(0, max(cs1$unitprice_book)*1.05),
     sub="size: non-book purchase amount")


abline(lm(cs1$unitprice_book~cs1$num_books),
       col="blue",
       lwd=2, lty=2)

abline(h=median(cs1$unitprice_book),
       col="darkgrey")

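## Excluding the outliers, customers who buy more books also tend to buy more non-book products.
## --> Either increase book purchases, or group the other key products into categories and analyze their sales (to identify cash cows and problem children).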

########################################################### Step 4: distribution of and relationship between book and non-book purchase amounts

# Compare book and non-book purchase amounts

plot(jitter(cs1$amt_books),
     jitter(cs1$amt_non_book),
     pch=19,
     col="khaki",
     cex=1.5,
     ylim=c(0, max(cs1$amt_non_book)*1.05)
)

abline(lm(cs1$amt_non_book~cs1$amt_books),
       col="blue",
       lwd=2, lty=2)

abline(h=median(cs1$amt_non_book)*1.5, col="darkgrey")
abline(v=median(cs1$amt_books)*1.5, col="darkgrey")

text(median(cs1$amt_books)*1.5*2,
     median(cs1$amt_non_book)*1.5*0.7,"cross-sell target")

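## Customers who spend less on books also spend less on other products (the regression line slopes upward).
## This reveals a selling point:
## --> build a strategy that targets the group whose purchasing of other products is weak.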

########################################################### Step 5: extract the customer list
# Target group criteria:
# extract the customers inside the region marked by the visual reference lines

tgtgridseg <- cs1[cs1$amt_books > median(cs1$amt_books)*1.5 &
                    cs1$amt_non_book < median(cs1$amt_non_book)*1.5,]

nrow(tgtgridseg) # 2 customers in the author's data

paste("size of target =",
      as.character(100*nrow(tgtgridseg) / nrow(cs1)),
      "% of customer base") # result: "size of target = 10 % of customer base"

########################################################### Step 6: visualize the group
# Visually profile the selected group
# Check the number of books bought and the sex distribution (women in pink)

barplot(tgtgridseg$num_books,
        names.arg = tgtgridseg$cust_name,
        col=ifelse(tgtgridseg$sex=='여',"pink","blue"),
        ylab="number of books purchased")

# Compare with the mean and the median number of books bought by all customers
abline(h=mean(cs1$num_books), lty=2)
abline(h=median(cs1$num_books), lty=2)
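In Step 5, the size of the target group was computed as 100 * nrow(tgtgridseg) / nrow(cs1). Because the mean of a logical vector is the share of TRUE values, the same number can be computed directly from the condition. This sketch uses hypothetical toy amounts for 10 customers, not the bookstore file, together with the same 1.5 x median thresholds.

```r
# Hypothetical toy amounts for 10 customers
amt_books    <- c(10, 20, 30, 40, 50, 60, 70, 200, 250, 300)
amt_non_book <- c(5, 10, 15, 20, 25, 30, 35, 5, 10, 100)

# Target: high book spending but low non-book spending (1.5 x median thresholds)
is_target <- amt_books > median(amt_books) * 1.5 &
  amt_non_book < median(amt_non_book) * 1.5

sum(is_target)  # number of target customers
sprintf("size of target = %.1f %% of customer base", 100 * mean(is_target))
```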


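# Segmentation
# Customer segmentation with cluster analysis (k-means)
cs2 <- cs1[, names(cs1) %in% c("days_purchase", "recency", "num_books",
                               "amt_books", "unitprice_book", "amt_non_book",
                               "num_genre", "membership_period")]
cs2

set.seed(1) # kmeans() starts from random centers; fixing the seed makes the clusters reproducible
kmm1 <- kmeans(cs2, 3) # k-means clustering into 3 clusters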

table(kmm1$cluster)
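kmeans() uses Euclidean distance, so columns measured in money (amt_books, amt_non_book) can dominate small-range columns such as num_genre. Its random starting centers can also change the result from run to run. A common remedy is to standardize the columns with scale(), fix the seed, and keep the best of several starts with nstart. This is a swapped-in variant, not the author's analysis, sketched on the built-in mtcars data as a stand-in for cs2.

```r
set.seed(123)  # reproducible random starts

# Stand-in for cs2: numeric columns on very different scales
vars <- c("mpg", "hp", "wt", "disp")
X <- scale(mtcars[, vars])  # mean 0, sd 1 per column

km <- kmeans(X, centers = 3, nstart = 25)  # keep the best of 25 random starts

table(km$cluster)  # rows (here: cars) per cluster
aggregate(mtcars[, vars], by = list(cluster = km$cluster), FUN = mean)  # profile in original units
```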

# Assign an arbitrary color to each customer segment
# (a vector of color names, indexed by cluster number)
cols<-c("red","green","blue")

barplot(table(kmm1$cluster),
        names.arg=names(table(kmm1$cluster)),
        col=cols,
        main="Number of customers per cluster")

# Profile the segments: purchase frequency vs. genre diversity
# color each customer by the cluster they belong to
plot(jitter(cs2$days_purchase),
     jitter(cs2$num_genre),
     col=cols[kmm1$cluster],
     pch=19,
     main="Customer segment profile: purchase frequency vs. number of book genres",
     sub="cl#1:red, cl#2:green, cl#3:blue")

# 1. Load the data file
bank0<-read.csv("bnk05.csv")
head(bank0)
names(bank0) # get the column names (getter form of names())
View(bank0)


### Keep only customers in their 20s and 30s
library(dplyr)

# store the filtered rows (ages 20-39); inside filter(), columns are referred to by name
bank1 <- bank0 %>% filter(age > 19, age < 40)
View(bank1)
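The exercises at the end compare customers in their 20s with those in their 30s, and an age-group column makes that split explicit. This sketch uses hypothetical ages, and `age_group` is an illustrative name. `right = FALSE` makes each interval include its left end, so ages 20-29 become "20s" and 30-39 become "30s".

```r
ages <- c(20, 25, 29, 30, 35, 39)  # hypothetical ages

# [20, 30) -> "20s", [30, 40) -> "30s"
age_group <- cut(ages, breaks = c(20, 30, 40), right = FALSE,
                 labels = c("20s", "30s"))
table(age_group)

# With the bank data this would be, e.g.:
# bank1$age_group <- cut(bank1$age, breaks = c(20, 30, 40), right = FALSE,
#                        labels = c("20s", "30s"))
```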

### Plot the distributions of age and balance
plot(bank1$age)
plot(bank1$balance)
plot(bank1$duration)

hist(bank1$age)
hist(bank1$balance)

# Scatterplot
### Draw a scatterplot of age vs. balance
plot(bank1$age,bank1$balance)

table(bank1$age)

# Plot of the jittered values
### Many customers share the same age, so plot with jitter() and make the points semi-transparent blue
plot(jitter(bank1$age), jitter(bank1$balance), pch=19, col=rgb(0,0,1,0.2))

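### Draw a bar chart of the number of customers by marital status, titled "Marital status distribution in the 20s"

# Quick look at the categories: plot() of a factor draws a bar chart
# (since R 4.0, read.csv() keeps strings as character, so convert to a factor first)
plot(factor(bank1$marital))

# Bar chart
barplot(table(bank1$marital),
        main ="Marital status distribution in the 20s")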

# - Draw a scatterplot of balance vs. duration and add a linear regression line.
plot(bank1$balance,bank1$duration)

# the model must match the axes: y = duration, x = balance
abline(lm(duration ~ balance, data = bank1),
       col="blue",
       lwd=2, lty=2)

abline(h=median(bank1$duration), # horizontal reference line at the median duration
       col="darkgrey")

# - Color the points semi-transparent blue if the marital status is single, and semi-transparent red otherwise.
plot(jitter(bank1$age), jitter(bank1$balance), pch=19, 
     col=ifelse(bank1$marital=="single", rgb(0,0,1,0.2), rgb(1,0,0,0.2)),
     sub="blue: single")
abline(lm(balance~age, data=bank1), col="darkgrey", lwd=2, lty=2)


# - Add vertical and horizontal reference lines at the medians of balance and duration.
plot(jitter(bank1$balance), jitter(bank1$duration), pch=19, 
     col=ifelse(bank1$loan=="yes", rgb(1,0,0,0.2), rgb(0,0,1,0.2)),
     sub="red: personal loan, blue: no loan")

abline(lm(duration~balance, data=bank1), col="darkgrey", lwd=2, lty=2)
abline(h=median(bank1$duration), lty=2)
abline(v=median(bank1$balance), lty=2)
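The two median lines split the plot into four quadrants, and counting the customers in each quadrant with table() gives numbers to what the plot shows. This sketch uses hypothetical balance and duration values. With the real data, replace `balance` and `duration` with `bank1$balance` and `bank1$duration`.

```r
# Hypothetical toy values
balance  <- c(100, 250, 900, 1500, 3000, 4200)
duration <- c(400, 120, 300, 90, 600, 200)

high_balance  <- balance  > median(balance)
long_duration <- duration > median(duration)

# 2 x 2 counts: rows = balance above median?, columns = duration above median?
quadrants <- table(high_balance, long_duration)
quadrants
```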
# - Show the distribution of balance by personal-loan status (loan) with a box plot.
boxplot(balance~loan, data=bank1, main="Balance by personal-loan status",
        ylab="balance", ylim=c(-1000,4000))
grid()

# - Compute the median balance per job and draw a bar plot.
agg1 <- aggregate(bank1$balance, by=list(bank1$job),
                  FUN=median)

# - Color the bar blue if the job is student, and grey otherwise.
names(agg1) <- c("job","mdn_bal")
barplot(agg1$mdn_bal, names.arg=agg1$job,
        col=ifelse(agg1$job=="student","blue","grey"))

# - Add a horizontal reference line at the median balance of all customers in their 20s.
abline(h=median(bank1$balance), lty=2)
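aggregate() also has a formula interface that names the result columns automatically, so no separate names() step is needed. This sketch uses a hypothetical toy data frame (`bank_demo`). With the real data, the call would be `aggregate(balance ~ job, data = bank1, FUN = median)`.

```r
# Hypothetical toy data
bank_demo <- data.frame(
  job     = c("student", "student", "admin.", "admin.", "admin."),
  balance = c(300, 500, 1000, 2000, 4000)
)

agg <- aggregate(balance ~ job, data = bank_demo, FUN = median)
agg  # columns: job, balance (median per job)

barplot(agg$balance, names.arg = agg$job,
        col = ifelse(agg$job == "student", "blue", "grey"))
```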

# mean age per job
agg2 <-aggregate(bank1$age, by=list(bank1$job), 
                 FUN=mean)

names(agg2) <- c("job","avg_age")

# share of customers with a personal loan in each job (0 to 1),
# used as the color intensity of each bar (matched to agg2 by job name)
rloan <- prop.table(table(bank1$job, bank1$loan), margin = 1)[, "yes"]

barplot(agg2$avg_age, names.arg=agg2$job,
        col=rgb(rloan[agg2$job], 0, rloan[agg2$job], 0.8), 
        ylab="age", xlab="job")

abline(h=median(bank1$age), col="grey", lty=2)


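# - Using table(), compute the share of customers in each job for customers in their 20s,
#
#   compute the same shares for all customers (not only those in their 20s),
#
#   and combine the two into one data frame.
#
# - Draw a bar plot that compares the job composition of the 20s with that of all customers.
#
# - Draw one figure (using par(mfrow)) that shows, as separate panels, the age-vs-balance
#   scatterplot for customers in their 30s and the same plot for customers in their 20s.
#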
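Here is one possible approach to the last two exercises, sketched on a simulated stand-in for bank0. `bank_demo`, its columns and its random values are hypothetical, since bnk05.csv is not reproduced here. Job shares come from prop.table(table(...)). Using the same factor levels in both tables keeps the rows aligned even when a job is missing from one group. par(mfrow = c(1, 2)) then places two plots side by side.

```r
set.seed(1)  # reproducible toy data
n <- 300
bank_demo <- data.frame(
  age     = sample(20:59, n, replace = TRUE),
  job     = sample(c("admin.", "student", "technician", "services"), n, replace = TRUE),
  balance = round(rnorm(n, mean = 1500, sd = 800))
)

in20s <- bank_demo$age >= 20 & bank_demo$age < 30
in30s <- bank_demo$age >= 30 & bank_demo$age < 40
jobs  <- sort(unique(bank_demo$job))

# Job composition (shares) for the 20s and for all customers, with the same job levels
share_20s <- prop.table(table(factor(bank_demo$job[in20s], levels = jobs)))
share_all <- prop.table(table(factor(bank_demo$job, levels = jobs)))

job_share <- data.frame(job       = jobs,
                        share_20s = as.numeric(share_20s),
                        share_all = as.numeric(share_all))
job_share

# Grouped bar plot: 20s vs. all customers for each job
barplot(t(as.matrix(job_share[, c("share_20s", "share_all")])),
        beside = TRUE, names.arg = job_share$job,
        col = c("steelblue", "grey"), legend.text = c("20s", "all"),
        ylab = "share of customers")

# Age vs. balance for the 20s and the 30s, side by side in one figure
old_par <- par(mfrow = c(1, 2))
plot(jitter(bank_demo$age[in20s]), bank_demo$balance[in20s],
     main = "20s", xlab = "age", ylab = "balance")
plot(jitter(bank_demo$age[in30s]), bank_demo$balance[in30s],
     main = "30s", xlab = "age", ylab = "balance")
par(old_par)  # restore the previous layout
```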

License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)

:: GO치의 에브리데이 일기장::
