Data Science Analysis Report #5: Scatter Plot, Line Graph

Republic of Korea Crime Analysis Report [Data Science]

rudals0000 2024. 12. 19. 21:32

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import matplotlib.font_manager as fm

# Use Malgun Gothic so the Korean labels render correctly (Windows font path).
font_path = 'C:/Windows/Fonts/malgun.ttf'
prop = fm.FontProperties(fname=font_path).get_name()
plt.rcParams['font.family'] = prop

# National Police Agency (경찰청) dataset: offenders by education level.
df = pd.read_csv("C:/데사프로젝트 데이터셋/경찰청_범죄자 교육정도_2021.10.csv", encoding='euc-kr')

# Skip the '기타' (other) and '미상' (unknown) columns.
# The first two columns are assumed to hold the crime categories.
exclude_columns = ['기타', '미상']
education_columns = [col for col in df.columns[2:] if col not in exclude_columns]

# Sum per (major, minor) crime category, then across all categories.
crime_education_sum = df.groupby(['범죄대분류', '범죄중분류'])[education_columns].sum()
education_sum = crime_education_sum.sum(axis=0)

# Correlation coefficient (discarded)
#correlation = education_sum.corr(df[education_columns].sum(axis=0))
#print("상관계수:", correlation)

# Scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(x=education_sum.index, y=education_sum.values, color='b', marker='o')
plt.title('교육 수준별 범죄 합계', fontsize=16)  # "Total crimes by education level"
plt.xlabel('교육 수준', fontsize=12)  # "Education level"
plt.ylabel('범죄 발생 합계', fontsize=12)  # "Total crimes"
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

Prediction: education level and crime would be correlated.

Result: there is almost no correlation between education level and crime.

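The scatter plot shows that high school graduates (고등학교(졸업)) account for by far the most offenders, and graduates of a university of four years or more (대학(4년 이상)(졸업)) come second by a meaningful margin.

The remaining education levels do not differ much from one another.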
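The chart plots raw totals, so turning them into shares makes the comparison between education levels easier to read. The following is only a minimal sketch: the labels and counts are hypothetical stand-ins, not values from the police dataset, and `education_share` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical counts per education level; NOT values from the police dataset.
education_sum = pd.Series(
    {"elementary": 50, "middle school": 100, "high school": 600, "university": 250}
)

# Each level's share of all offenders; the shares add up to 1.
education_share = education_sum / education_sum.sum()

# Sort so that the dominant education levels come first.
print(education_share.sort_values(ascending=False).round(3))
```

With the real data, `education_sum` from the analysis code could be used in place of the made-up Series.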

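Why the prediction was wrong: Korea's overall education level is high and nearly everyone finishes high school (compulsory education runs through middle school, but high school enrollment is close to universal), so separating offenders by education level tells us little. Because the vast majority of Koreans are high school graduates, the offender counts simply follow the same distribution.

On top of that, education level is a categorical (qualitative) variable, not a quantitative one, so a scatter plot is not a suitable method for this analysis.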
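For a categorical variable like this, a bar chart over ordered categories is one common alternative to a scatter plot. The sketch below assumes hypothetical level names and counts (not taken from the dataset), and `plot_counts_by_level` is a helper made up for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical levels, listed from lowest to highest, with made-up counts.
levels = ["elementary", "middle school", "high school", "university"]
raw_counts = {"high school": 600, "elementary": 50, "university": 250, "middle school": 100}

# An ordered categorical index lets sort_index() follow the natural level order.
education_sum = pd.Series(raw_counts)
education_sum.index = pd.CategoricalIndex(
    education_sum.index, categories=levels, ordered=True
)
ordered_sum = education_sum.sort_index()


def plot_counts_by_level(series):
    """Draw one bar per education level and return the Axes."""
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.bar([str(label) for label in series.index], series.to_numpy(), color="b")
    ax.set_title("Offenders by education level")
    ax.set_xlabel("Education level")
    ax.set_ylabel("Total offenders")
    fig.tight_layout()
    return ax


ax = plot_counts_by_level(ordered_sum)
```

Calling `plt.show()` afterwards displays the chart. The bars make it clear that the x-axis holds named groups rather than numeric values.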

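import pandas as pd
import matplotlib.pyplot as plt

# Korean labels need a Korean-capable font; 'Malgun Gothic' is the family name of malgun.ttf on Windows.
plt.rcParams['font.family'] = 'Malgun Gothic'

# Forward slashes avoid invalid escape sequences such as "\데" in a Windows path.
df = pd.read_csv("C:/데사프로젝트 데이터셋/경찰청_범죄자 직업_2021.10.csv", encoding='euc-kr')

student_data = df['학생']  # offenders whose occupation is "student"
crime_types = df['범죄중분류']  # minor crime category

# Line graph
plt.figure(figsize=(12, 6))  # set the figure size
plt.plot(crime_types, student_data, marker='o', color='b', label='학생')
plt.title('학생에 따른 범죄 유형별 데이터')  # "Student offenders by crime type"
plt.xlabel('범죄 유형')  # "Crime type"
plt.ylabel('학생 수')  # "Number of students"
plt.xticks(rotation=45, ha='right')
plt.legend()
plt.tight_layout()
plt.show()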

#=====================================================================================================================

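import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams['font.family'] = 'Malgun Gothic'  # Korean-capable font on Windows

# Forward slashes avoid invalid escape sequences in the Windows path.
df = pd.read_csv("C:/데사프로젝트 데이터셋/경찰청_범죄자 직업_2021.10.csv", encoding='euc-kr')

# Finance workers, doctors, professors, clergy
selected_columns = ['금융업', '의사', '교수', '종교가']

# Index by minor crime category first, then select the columns.
# This avoids assigning into a slice, which raises SettingWithCopyWarning.
df_selected = df.set_index('범죄중분류')[selected_columns]

# Line graph
df_selected.plot(figsize=(12, 6), marker='o')
plt.title('범죄 유형별 직업군 분포 (금융업, 의사, 교수, 종교가)')  # "Occupations by crime type (finance, doctor, professor, clergy)"
plt.xlabel('범죄 유형')  # "Crime type"
plt.ylabel('수치')  # "Count"
plt.xticks(ticks=range(len(df_selected.index)), labels=df_selected.index, rotation=45, ha='right')
plt.tight_layout()
plt.show()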
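When building a table like `df_selected`, assigning a new column into a column selection such as `df[selected_columns]` can trigger pandas' `SettingWithCopyWarning`, because the selection may be a copy of the original frame. One way around it, sketched here with a small made-up DataFrame (the crime type names "A", "B", "C" and all numbers are hypothetical), is to set the index before selecting the columns.

```python
import pandas as pd

# Hypothetical stand-in for the occupation CSV; every value here is made up.
df = pd.DataFrame(
    {
        "범죄중분류": ["A", "B", "C"],
        "금융업": [1, 2, 3],
        "의사": [4, 5, 6],
        "교수": [7, 8, 9],
        "종교가": [0, 1, 2],
        "학생": [10, 20, 30],
    }
)

selected_columns = ["금융업", "의사", "교수", "종교가"]

# set_index returns a new DataFrame, so no column is assigned into a slice.
df_selected = df.set_index("범죄중분류")[selected_columns]

print(df_selected)
```

The result has the crime type as its index and only the four selected occupations as columns, which is the shape `DataFrame.plot` needs for one line per occupation.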

