
Xiaohongshu: the user's brief profile info cannot be retrieved when fetching profile page data #529

Open
@GithubLinfuxi

Description

async def get_creator_info(self, user_id: str) -> Dict:
    """
    Get a user's brief profile info by parsing the HTML of the web version of the user's profile page.
    On the PC profile page the data is stored in the window.__INITIAL_STATE__ variable, so parsing it is enough.
    eg: https://www.xiaohongshu.com/user/profile/59d8cb33de5fb4696bf17217
    """
    uri = f"/user/profile/{user_id}"
    html_content = await self.request(
        "GET", self._domain + uri, return_response=True, headers=self.headers
    )
    match = re.search(
        r"<script>window.__INITIAL_STATE__=(.+)<\/script>", html_content, re.M
    )

    if match is None:
        print("Failed to get user data")
        return {}

    info = json.loads(match.group(1).replace(":undefined", ":null"), strict=False)
    if info is None:
        return {}
    return info.get("user").get("userPageData")

This function cannot find window.__INITIAL_STATE__ in the response, even though the variable is definitely present when the page is viewed in a browser.
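One thing worth ruling out on the parsing side: the pattern above only matches a bare `<script>` tag with no attributes, and without `re.S` the `.` does not cross newlines, so a state object that spans several lines would be missed. The sketch below is a more tolerant extractor under those assumptions. `extract_initial_state` is a hypothetical helper, not part of MediaCrawler, and it will not help if the server never sends the variable in the first place.

```python
import json
import re
from typing import Optional

# Tolerates attributes on the <script> tag, whitespace around "=",
# and a state object that spans multiple lines (re.S).
_STATE_RE = re.compile(
    r"<script[^>]*>\s*window\.__INITIAL_STATE__\s*=\s*(.+?)\s*</script>",
    re.S,
)


def extract_initial_state(html: str) -> Optional[dict]:
    """Return the parsed window.__INITIAL_STATE__ object, or None if absent or invalid."""
    match = _STATE_RE.search(html)
    if match is None:
        return None
    raw = match.group(1).rstrip(";")
    # JavaScript `undefined` is not valid JSON; map it to null as the original code does.
    raw = re.sub(r":\s*undefined", ":null", raw)
    try:
        return json.loads(raw, strict=False)
    except json.JSONDecodeError:
        return None
```

If this also returns `None` for the HTML actually returned to the crawler, the variable is most likely missing from that response, which points to the request side rather than the regex.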

Activity

NanmiCoder commented on Dec 29, 2024

@NanmiCoder
Owner

Try switching to a different account. You may have hit a CAPTCHA or been flagged by risk control.
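To tell these cases apart, it can help to inspect the HTML the crawler actually received rather than what the browser shows. The following is a rough sketch only: `diagnose_profile_html` is a hypothetical helper, and the marker strings are guesses, since the thread does not say what Xiaohongshu's verification page actually contains.

```python
# Assumed markers of a verification page; these are NOT confirmed by the
# thread and should be adjusted after looking at a real blocked response.
SUSPECT_MARKERS = ("captcha", "verify", "验证")


def diagnose_profile_html(html: str, final_url: str = "") -> str:
    """Classify a profile-page response as a coarse hint for debugging."""
    if "window.__INITIAL_STATE__" in html:
        return "state_present"
    haystack = (final_url + " " + html).lower()
    if any(marker in haystack for marker in SUSPECT_MARKERS):
        return "possible_verification_page"
    return "state_missing"
```

Saving `html_content` to a file when the result is not `"state_present"` makes it easy to open the response and see whether it is a verification page, a login redirect, or something else.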

Xiaohongshu: the user's brief profile info cannot be retrieved when fetching profile page data · Issue #529 · NanmiCoder/MediaCrawler