3-2. Saving Crawled Data to a CSV File

2018. 6. 10. 17:45ㆍCoding/Python

import requests
from bs4 import BeautifulSoup as BS
import csv
 
def mnet_Crawling(html):
    temp_list = []
    temp_dict = {}
 
    tr_list = html.select('div.MnetMusicList.MnetMusicListChart > div.MMLTable.jQMMLTable > table > tbody > tr')
 
    for tr in tr_list :
        rank = int(tr.find('td',{'class':'MMLItemRank'}).find('span').text.strip('위'))
 
        # tag['src'] and tag.get('src') both read the attribute; .get() returns None instead of raising KeyError when it is missing.
        img = tr.find('td',{'class':'MMLItemTitle'}).find('div',{'class':'MMLITitle_Album'}).find('img').get('src')
 
        title = tr.find('td',{'class':'MMLItemTitle'}).find('div',{'class':'MMLITitle_Box info'}).find('a',{'class':'MMLI_Song'}).text
        artist = tr.find('td',{'class':'MMLItemTitle'}).find('div',{'class':'MMLITitle_Box info'}).find('a',{'class':'MMLIInfo_Artist'}).text
        album = tr.find('td',{'class':'MMLItemTitle'}).find('div',{'class':'MMLITitle_Box info'}).find('a',{'class':'MMLIInfo_Album'}).text
        temp_list.append([rank, img, title, artist, album])
        temp_dict[str(rank)] = {'img':img, 'title':title, 'artist':artist, 'album':album}
 
 
 
    return temp_list, temp_dict
#============================================================ End of mnet_Crawling() ============================================================#
 
 
def toCSV(mnet_list):
    file = open('mnet_chart.csv', 'w', encoding='utf-8', newline='')
    csvfile = csv.writer(file)
    for row in mnet_list :
        csvfile.writerow(row)
    file.close()
#============================================================ End of toCSV() ============================================================#
 
mnet_list = []
mnet_dict = {}
 
# Each chart page is requested inside the loop below, so no separate request is needed here.
 
for page in [1,2]:
    req = requests.get('http://www.mnet.com/chart/TOP100/?pNum={}'.format(page))
    html = BS(req.text, 'html.parser')
    
    mnet_temp = mnet_Crawling(html)
    mnet_list += mnet_temp[0]
    mnet_dict = dict(mnet_dict, **mnet_temp[1])
 
# Print the list
for item in mnet_list :
    print(item)
 
# Print the dictionary
for item in mnet_dict :
    print(item, mnet_dict[item]['img'], mnet_dict[item]['title'], mnet_dict[item]['artist'], mnet_dict[item]['album'])
 
# Create the CSV file
toCSV(mnet_list)
 
 

Python's built-in csv module takes care of the formatting, so writing the file is simple.
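To make "the module takes care of it" concrete, here is a minimal sketch of what csv.writer does with awkward values. The sample row is made up for illustration and is not taken from the Mnet chart, and io.StringIO stands in for a real file so the result can be inspected as a string.

```python
import csv
import io

# In-memory text buffer used in place of a real file (assumption for this demo).
buffer = io.StringIO()
writer = csv.writer(buffer)

# Hypothetical row: the title contains a comma and the artist contains quotes.
writer.writerow([1, 'Song, Pt. 2', 'The "Band"'])

output = buffer.getvalue()
print(repr(output))
```

Fields that contain a comma or a quote character are wrapped in quotes, and embedded quotes are doubled, so the row can be read back without ambiguity. Joining the values by hand with ',' would not do this.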

1. Import the csv module

import csv

2. Create the toCSV() function

def toCSV(mnet_list):
    file = open('mnet_chart.csv', 'w', encoding='utf-8', newline='')
    csvfile = csv.writer(file)
    for row in mnet_list :
        csvfile.writerow(row)
    file.close()

# Create the CSV file
toCSV(mnet_list)

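Code	Meaning
file =	variable that receives the file object
open	function that opens (and creates) the file
'mnet_chart.csv'	name of the file to create
'w'	write mode
encoding='utf-8'	text encoding; add it if an encoding error occurs
newline=''	without it, an empty line appears after each row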

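About encoding='utf-8':

If you leave it out, open() falls back to the platform's default encoding. The same script can therefore run without errors in one environment and fail in another, for example depending on the editor or terminal it is launched from.

Add the option if you get an error like the one below.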

UnicodeEncodeError: 'ascii' codec can't encode characters in position 64-67: ordinal not in range(128)
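The error can be reproduced on any machine by forcing the encoding instead of relying on the default. The following is a minimal sketch: the row contents, the file name, and the helper try_write() are made up for illustration, and encoding='ascii' is used only to imitate an environment whose default encoding cannot represent Korean text.

```python
import csv
import os
import tempfile

# Hypothetical chart row containing Korean text.
row = [1, '노래 제목', '가수']

def try_write(path, encoding):
    """Write one row with the given encoding and report what happened."""
    try:
        with open(path, 'w', encoding=encoding, newline='') as f:
            csv.writer(f).writerow(row)
        return 'ok'
    except UnicodeEncodeError as exc:
        return type(exc).__name__

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'encoding_demo.csv')
    results = {enc: try_write(path, enc) for enc in ('ascii', 'utf-8')}

print(results)
```

Passing encoding='utf-8' explicitly makes the script behave the same way regardless of where it is run.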

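About newline='':

csv.writer ends every row with \r\n by itself. Without this option, on platforms whose line separator is \r\n (such as Windows), the file object translates the \n again, so each row ends in \r\r\n and an empty line shows up after every row. Passing newline='' turns that translation off.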
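The effect can be observed on any operating system with the sketch below. It is an illustration under one assumption: newline='\r\n' is passed to open() to imitate the translation that happens by default on Windows. The helper write_rows() and the sample rows are made up for this demo.

```python
import csv
import os
import tempfile

def write_rows(path, rows, newline):
    """Write rows with csv.writer, then return the raw bytes of the file."""
    with open(path, 'w', encoding='utf-8', newline=newline) as f:
        csv.writer(f).writerows(rows)
    with open(path, 'rb') as f:
        return f.read()

rows = [[1, 'a'], [2, 'b']]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'newline_demo.csv')
    # '\r\n' imitates Windows: every '\n' written becomes '\r\n'.
    translated = write_rows(path, rows, '\r\n')
    # '' disables translation, so csv.writer's own '\r\n' is kept as-is.
    untouched = write_rows(path, rows, '')

print(translated)
print(untouched)
```

The first result contains \r\r\n after each row, which spreadsheet programs display as an extra empty line; the second contains a single \r\n per row.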

Tip

def toCSV(mnet_list):
    file = open('mnet_chart.csv', 'w', encoding='utf-8', newline='')
    csvfile = csv.writer(file)
    for row in mnet_list :
        csvfile.writerow(row)
    file.close()

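Instead of the code above, you can use a with statement, as shown below.

This is convenient because the file is closed automatically when the block ends, so an explicit file.close() is no longer needed.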

def toCSV(mnet_list):
    with open('mnet_chart.csv', 'w', encoding='utf-8', newline='') as file :
        csvfile = csv.writer(file)
        for row in mnet_list:
            csvfile.writerow(row)
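As an optional extension of the with version, the sketch below adds a header line and reads the file back to check the result. The function names to_csv_with_header() and read_csv(), the column names, the file name, and the sample rows are all assumptions made for this example; they are not part of the original script.

```python
import csv
import os
import tempfile

def to_csv_with_header(path, header, rows):
    """Write a header line followed by the data rows."""
    with open(path, 'w', encoding='utf-8', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(header)
        writer.writerows(rows)  # writes every row in one call

def read_csv(path):
    """Read the whole file back as a list of rows."""
    with open(path, 'r', encoding='utf-8', newline='') as file:
        return list(csv.reader(file))

# Hypothetical column names matching the order used in mnet_list.
header = ['rank', 'img', 'title', 'artist', 'album']
sample_rows = [
    [1, 'http://example.com/1.jpg', 'Title A', 'Artist A', 'Album A'],
    [2, 'http://example.com/2.jpg', 'Title B, Remix', 'Artist B', 'Album B'],
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'mnet_chart_demo.csv')
    to_csv_with_header(path, header, sample_rows)
    loaded = read_csv(path)

for row in loaded:
    print(row)
```

Note that csv.reader returns every value as a string, so the rank comes back as '1' rather than 1 and needs int() if it is used as a number.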

License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)
