Merged
Changes from 29 commits
69 commits
8c38853
facebook: add export option to `FacebookWizardImportOrBuildPage`
redshiftzero Feb 19, 2025
45a383f
facebook: add `FacebookLastImportOrBuildComponent`
redshiftzero Feb 19, 2025
fd500e0
facebook: add `FacebookWizardImportPage`
redshiftzero Feb 19, 2025
35cd45c
facebook: add `FacebookWizardImportPage`
redshiftzero Feb 19, 2025
8c02088
facebook: add `WizardDeleteOption` state
redshiftzero Feb 19, 2025
9c5ef80
facebook: add `FacebookWizardImportDownloadPage`
redshiftzero Feb 19, 2025
b58504d
facebook: emit `State.WizardImportStart` event
redshiftzero Feb 19, 2025
0a1969e
facebook: add `FacebookWizardImportingPage`
redshiftzero Feb 19, 2025
5f167ab
facebook: add required methods to `Facebook` interface
redshiftzero Feb 19, 2025
a5101e8
facebook: hook up new methods through IPC/preload script
redshiftzero Feb 19, 2025
baf36c6
facebook: implement `unzipFacebookArchive`
redshiftzero Feb 19, 2025
ddf84f9
facebook: implement `deleteUnzippedFacebookArchive`
redshiftzero Feb 19, 2025
bfa4452
facebook: implement `verifyFacebookArchive`
redshiftzero Feb 19, 2025
33e08f3
facebook: import archive is default option, from scratch not impl
redshiftzero Feb 20, 2025
b3cccec
facebook: start implementing `importFacebookArchive`
redshiftzero Feb 20, 2025
508cd66
facebook: import posts into database table
redshiftzero Feb 20, 2025
9c32bf4
facebook: fix test on archive import page
redshiftzero Feb 20, 2025
c030bfd
facebook: remove unnecessary container type
redshiftzero Feb 20, 2025
75230d0
facebook: filter by wall posts only for now
redshiftzero Feb 20, 2025
979b6bd
fix: initDB in the Facebook directory
redshiftzero Feb 22, 2025
f6a1625
fix: use consistent Facebook data dir `{account id} {account name}`
redshiftzero Feb 22, 2025
d9e2a29
fix: typo in download instructions
redshiftzero Mar 1, 2025
1798996
fix: update instructions for Facebook download
redshiftzero Mar 1, 2025
5735f04
fix: formatting on export page
redshiftzero Mar 1, 2025
fcd4f18
fix: parse JSON in `verifyFacebookArchive`
redshiftzero Mar 1, 2025
a1ef2ea
fix: small edits to time that facebook archiving takes
redshiftzero Mar 1, 2025
4bf23ae
fix: update Facebook import logic for JSON import file
redshiftzero Mar 1, 2025
a4c437c
fix: just import "shared a status" posts, that's the wall in fb land
redshiftzero Mar 1, 2025
a543ccc
fix: linting
redshiftzero Mar 1, 2025
43c35d5
facebook: add initial archive site
redshiftzero Feb 20, 2025
de64aa2
facebook: start implementing `archiveBuild` method
redshiftzero Feb 20, 2025
97c793a
facebook: finish `archiveBuild`
redshiftzero Feb 21, 2025
b4e07d8
facebook: package up facebook-archive site
redshiftzero Feb 21, 2025
be9bb92
facebook: fix build of static site, remove path which we don't have yet
redshiftzero Feb 21, 2025
03daa4e
refactor: DRY up static site build code a bit
redshiftzero Feb 21, 2025
303b9d0
fix: save archive in correct dir (`(account id) (name)`)
redshiftzero Mar 1, 2025
33d144c
If we start and do not have an account ID, go back to login
micahflee Mar 4, 2025
73d8821
Log the posts that get skipped
micahflee Mar 4, 2025
5ad60e4
eslint config
redshiftzero Mar 4, 2025
26cc1fd
decode unicode characters
redshiftzero Mar 4, 2025
4b6a0fd
Merge pull request #411 from lockdown-systems/facebook-webapp
micahflee Mar 4, 2025
e2efbec
facebook: add isReposted field to Post
redshiftzero Feb 21, 2025
c35c739
facebook: current posts are not reshares
redshiftzero Feb 21, 2025
21ef0e2
facebook: try to pull out post types from HTML content
redshiftzero Feb 21, 2025
8244c09
facebook: add db migration to add isReposted column
redshiftzero Feb 21, 2025
3703766
facebook: show reposts on the static archive site
redshiftzero Feb 21, 2025
80c1ea2
facebook: display title on archive
redshiftzero Feb 22, 2025
2a1fad7
fix: populate db with repost status
redshiftzero Mar 2, 2025
66ae851
fix: remove now unused function
redshiftzero Mar 3, 2025
d2f23f6
facebook: support media imports
redshiftzero Mar 3, 2025
0bf6090
fix: media gets saved to post_media, and media/ dir
redshiftzero Mar 3, 2025
49b66ce
facebook: add video/image support to static site
redshiftzero Mar 4, 2025
dcb2078
fix: don't show post text twice if it's the same text as the image
redshiftzero Mar 4, 2025
a508edb
facebook: stop skipping any posts (will need to add support)
redshiftzero Mar 4, 2025
4211521
fix: text can be null
redshiftzero Mar 4, 2025
7bcf910
Merge pull request #415 from lockdown-systems/facebook-reposts
redshiftzero Mar 4, 2025
7160150
facebook: improve checking of shared post, add debug logging
redshiftzero Mar 4, 2025
4121fb8
facebook: display life events on archive using life event title
redshiftzero Mar 4, 2025
59f8ebe
Merge branch 'main' into facebook-export-import
redshiftzero Mar 4, 2025
5e91ecc
archive: add repost indicator
redshiftzero Mar 11, 2025
98ab0c6
Merge branch 'main' into facebook-export-import
redshiftzero Mar 11, 2025
f371cd3
facebook: also import URLs into the database
redshiftzero Mar 12, 2025
9bdbd60
facebook: enable display of urls on website
redshiftzero Mar 12, 2025
e0c1d34
facebook: remove unused type
redshiftzero Mar 12, 2025
ea6418d
fix: remove unused import
redshiftzero Mar 12, 2025
e0c56ea
facebook: render URLs on static site
redshiftzero Mar 12, 2025
f7a5277
fix: display of link previews
redshiftzero Mar 12, 2025
cba0d5e
Merge branch 'main' into facebook-export-import
redshiftzero Mar 12, 2025
0ef9809
Merge branch 'main' into facebook-export-import
micahflee Mar 13, 2025
287 changes: 283 additions & 4 deletions src/account_facebook/facebook_account_controller.ts
@@ -1,9 +1,13 @@
import path from 'path'
import fs from 'fs'
import os from 'os'

import fetch from 'node-fetch';
import { session } from 'electron'
import log from 'electron-log/main';
import Database from 'better-sqlite3'
import unzipper from 'unzipper';
import { glob } from 'glob';

import {
getAccountDataPath,
@@ -13,6 +17,7 @@ import {
FacebookJob,
FacebookProgress,
emptyFacebookProgress,
FacebookImportArchiveResponse,
} from '../shared_types'
import {
runMigrations,
@@ -25,6 +30,8 @@ import { IMITMController } from '../mitm';
import {
FacebookJobRow,
convertFacebookJobRowToFacebookJob,
FacebookArchivePost,
FacebookPostRow
} from './types'

export class FacebookAccountController {
@@ -113,7 +120,7 @@ export class FacebookAccountController {
}

// Make sure the account data folder exists
this.accountDataPath = getAccountDataPath('X', this.account.name);
this.accountDataPath = getAccountDataPath("Facebook", `${this.account.accountID} ${this.account.name}`);
log.info(`FacebookAccountController.initDB: accountDataPath=${this.accountDataPath}`);

// Open the database
@@ -138,7 +145,19 @@ export class FacebookAccountController {
id INTEGER PRIMARY KEY AUTOINCREMENT,
key TEXT NOT NULL UNIQUE,
value TEXT NOT NULL
);`
);`]
},
{
name: "20250220_add_post_table",
sql: [
`CREATE TABLE post (
id INTEGER PRIMARY KEY AUTOINCREMENT,
postID TEXT NOT NULL UNIQUE,
createdAt DATETIME NOT NULL,
title TEXT,
text TEXT,
addedToDatabaseAt DATETIME NOT NULL
);`
]
},
])
@@ -220,7 +239,7 @@ export class FacebookAccountController {
}
const buffer = await response.buffer();
log.info("FacebookAccountController.getProfileImageDataURI: buffer", buffer);
return `data:${response.headers.get('content-type')};base64,${buffer.toString('base64')}`;
return `data: ${response.headers.get('content-type')}; base64, ${buffer.toString('base64')}`;
} catch (e) {
log.error("FacebookAccountController.getProfileImageDataURI: error", e);
return "";
@@ -234,4 +253,264 @@ export class FacebookAccountController {
async setConfig(key: string, value: string) {
return setConfig(key, value, this.db);
}
}

// Unzip facebook archive to the account data folder using unzipper
// Return null if there is no account, else return the unzipped path (extraction errors are thrown)
async unzipFacebookArchive(archiveZipPath: string): Promise<string | null> {
if (!this.account) {
return null;
}
const unzippedPath = path.join(getAccountDataPath("Facebook", `${this.account.accountID} ${this.account.name}`), "tmp");

const archiveZip = await unzipper.Open.file(archiveZipPath);
await archiveZip.extract({ path: unzippedPath });

log.info(`FacebookAccountController.unzipFacebookArchive: unzipped to ${unzippedPath}`);

return unzippedPath;
}

// Delete the unzipped facebook archive once the build is completed
async deleteUnzippedFacebookArchive(archivePath: string): Promise<void> {
try {
// Await the deletion so the folder is actually gone when this method resolves
await fs.promises.rm(archivePath, { recursive: true, force: true });
} catch (err) {
log.error(`FacebookAccountController.deleteUnzippedFacebookArchive: Error occurred while deleting unzipped folder: ${err}`);
}
}

// Return null on success, and a string (error message) on error
async verifyFacebookArchive(archivePath: string): Promise<string | null> {
// If archivePath contains just one folder and no files, update archivePath to point to that inner folder
const archiveContents = fs.readdirSync(archivePath);
if (archiveContents.length === 1 && fs.lstatSync(path.join(archivePath, archiveContents[0])).isDirectory()) {
archivePath = path.join(archivePath, archiveContents[0]);
}

const foldersToCheck = [
archivePath,
path.join(archivePath, "personal_information", "profile_information"),
];

// Make sure folders exist
for (let i = 0; i < foldersToCheck.length; i++) {
if (!fs.existsSync(foldersToCheck[i])) {
log.error(`FacebookAccountController.verifyFacebookArchive: folder does not exist: ${foldersToCheck[i]}`);
return `The folder ${foldersToCheck[i]} doesn't exist.`;
}
}

// Check if there's a profile_information.html file. This means the person downloaded the archive using HTML, not JSON.
const profileHtmlInformationPath = path.join(archivePath, "personal_information/profile_information/profile_information.html");
if (fs.existsSync(profileHtmlInformationPath)) {
log.error(`FacebookAccountController.verifyFacebookArchive: file is in wrong format, expected JSON, not HTML: ${profileHtmlInformationPath}`);
return `The file ${profileHtmlInformationPath} is in the wrong format. Request a JSON archive.`;
}

// Make sure profile_information.json exists and is readable
const profileInformationPath = path.join(archivePath, "personal_information/profile_information/profile_information.json");
if (!fs.existsSync(profileInformationPath)) {
log.error(`FacebookAccountController.verifyFacebookArchive: file does not exist: ${profileInformationPath}`);
return `The file ${profileInformationPath} doesn't exist.`;
}
try {
fs.accessSync(profileInformationPath, fs.constants.R_OK);
} catch {
log.error(`FacebookAccountController.verifyFacebookArchive: file is not readable: ${profileInformationPath}`);
return `The file ${profileInformationPath} is not readable.`;
}

// Make sure the profile_information.json file belongs to the right account
try {
const profileData = JSON.parse(fs.readFileSync(profileInformationPath, 'utf-8'));

if (!profileData.profile_v2?.profile_uri) {
log.error("FacebookAccountController.verifyFacebookArchive: Could not find profile URI in archive");
return "Could not find profile URI in archive";
}

const profileUrl = profileData.profile_v2.profile_uri;
const profileId = profileUrl.split('id=')[1];

if (!profileId) {
log.error("FacebookAccountController.verifyFacebookArchive: Could not extract profile ID from URL");
return "Could not extract profile ID from URL";
}

if (profileId !== this.account?.accountID) {
log.error(`FacebookAccountController.verifyFacebookArchive: profile_information.json does not belong to the right account`);
return `This archive is for @${profileId}, not @${this.account?.accountID}.`;
}
} catch {
return "Error parsing JSON in profile_information.json";
}

return null;
}

// Import the given data type from the archive; returns a FacebookImportArchiveResponse with status, error message, and import/skip counts
async importFacebookArchive(archivePath: string, dataType: string): Promise<FacebookImportArchiveResponse> {
if (!this.db) {
this.initDB();
}

let importCount = 0;
const skipCount = 0;

// If archivePath contains just one folder and no files, update archivePath to point to that inner folder
const archiveContents = fs.readdirSync(archivePath);
if (archiveContents.length === 1 && fs.lstatSync(path.join(archivePath, archiveContents[0])).isDirectory()) {
archivePath = path.join(archivePath, archiveContents[0]);
}

// Load the profile ID
let profileId: string;

try {
const profileInformationPath = path.join(archivePath, "personal_information/profile_information/profile_information.json");
const profileData = JSON.parse(fs.readFileSync(profileInformationPath, 'utf-8'));

if (!profileData.profile_v2?.profile_uri) {
return {
status: "error",
errorMessage: "Could not find profile URI in archive",
importCount: importCount,
skipCount: skipCount,
};
}

const profileUrl = profileData.profile_v2.profile_uri;
profileId = profileUrl.split('id=')[1] || '';

if (!profileId) {
return {
status: "error",
errorMessage: "Could not extract profile ID from URL",
importCount: importCount,
skipCount: skipCount,
};
}
} catch (e) {
return {
status: "error",
errorMessage: "Error parsing profile information JSON",
importCount: importCount,
skipCount: skipCount,
};
}

// Import posts
if (dataType == "posts") {
const postsFilenames = await glob(
[
// TODO: for really big Facebook archives, are there more files here?
path.join(archivePath, "your_facebook_activity", "posts", "your_posts__check_ins__photos_and_videos_1.json"),
Comment on lines +578 to +579
Member

Probably. We're going to need people with big Facebook accounts to help us test this. I actually ran into the same problem with Twitter: it splits the JSON into multiple files for people with around 100k tweets. We should ask around.
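If large archives do split posts across numbered files, something like the following could pick them all up in a stable order. It is a sketch that assumes (unconfirmed until someone with a big account tests it) that extra files follow the `your_posts__check_ins__photos_and_videos_<n>.json` naming of the one file we know about; `selectPostsFiles` is a hypothetical helper, not part of the codebase.

```typescript
// ASSUMPTION: split archives name their files ..._1.json, ..._2.json, ..., ..._10.json.
const POSTS_FILE_RE = /^your_posts__check_ins__photos_and_videos_(\d+)\.json$/;

// Keep only the numbered posts files from a directory listing and sort them by
// their numeric suffix, so _10.json comes after _2.json (a plain string sort
// would put it first).
function selectPostsFiles(filenames: string[]): string[] {
  return filenames
    .map((name) => ({ name, match: POSTS_FILE_RE.exec(name) }))
    .filter((e): e is { name: string; match: RegExpExecArray } => e.match !== null)
    .sort((a, b) => Number(a.match[1]) - Number(b.match[1]))
    .map((e) => e.name);
}

// Example: a directory listing in arbitrary order, plus an unrelated file.
const listing = [
  "your_posts__check_ins__photos_and_videos_10.json",
  "your_posts__check_ins__photos_and_videos_2.json",
  "some_other_file.json",
  "your_posts__check_ins__photos_and_videos_1.json",
];
console.log(selectPostsFiles(listing));
```

In the controller this would be fed a listing of the `your_facebook_activity/posts` folder (for example via `fs.readdirSync`), with each result joined back onto that folder before reading it.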

],
{
windowsPathsNoEscape: os.platform() == 'win32'
}
);
if (postsFilenames.length === 0) {
return {
status: "error",
errorMessage: "No posts files found",
importCount: importCount,
skipCount: skipCount,
};
}

// Go through each file and import the posts
for (let i = 0; i < postsFilenames.length; i++) {
const postsData: FacebookArchivePost[] = [];
try {
const postsFile = fs.readFileSync(postsFilenames[i], 'utf8');
const posts = JSON.parse(postsFile);

for (const post of posts) {
// Skip if no post text
const postText = post.data?.find((d: { post?: string }) => 'post' in d && typeof d.post === 'string')?.post;
if (!postText) {
log.info("FacebookAccountController.importFacebookArchive: skipping post with no text");
continue;
}

// Check if it's a shared post by looking for external_context in attachments
const isSharedPost = post.attachments?.[0]?.data?.[0]?.external_context !== undefined;
Member

This is falsely classifying posts as shared posts. Here's one of the posts from my test account, which includes two links:

[Screenshot 2025-03-03 at 5 32 09 PM]

I added a commit to log the actual post content whenever a post is skipped (73d8821), and it skipped this one as a shared post:

17:30:27.316 › FacebookAccountController.importFacebookArchive: skipping shared post {"timestamp":1740775067,"attachments":[{"data":[{"external_context":{"url":"https://stardewvalleywiki.com/JojaMart"}}]}],"data":[{"post":"One of the best wikis around: https://stardewvalleywiki.com/Stardew_Valley_Wiki\n\nAnd the store to boycott: https://stardewvalleywiki.com/JojaMart"},{"update_timestamp":1740775067},{},{},{}],"title":"Chase Westbrook shared a link."}

Here's the JSON:

{
    "timestamp": 1740775067,
    "attachments": [
        {
            "data": [
                {
                    "external_context": {
                        "url": "https://stardewvalleywiki.com/JojaMart"
                    }
                }
            ]
        }
    ],
    "data": [
        {
            "post": "One of the best wikis around: https://stardewvalleywiki.com/Stardew_Valley_Wiki\n\nAnd the store to boycott: https://stardewvalleywiki.com/JojaMart"
        },
        {
            "update_timestamp": 1740775067
        },
        {},
        {},
        {}
    ],
    "title": "Chase Westbrook shared a link."
}

It appears that external_context is used for link previews, and probably other things too, even for posts that aren't reposts.
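Going by this sample, the post body sits in a `data[]` entry with a `post` key, and the link preview sits in `attachments[].data[].external_context.url`. A minimal sketch that reads both, so a link preview gets recorded as a URL instead of deciding the post type. The types are inferred from this one sample only (real exports likely have more attachment kinds), and `ArchivePost`, `postText`, and `attachmentUrls` are hypothetical names.

```typescript
// Shape inferred from the single sample above; fields are optional because
// other post types may omit them.
interface ArchivePost {
  timestamp: number;
  title?: string;
  data?: Array<{ post?: string; update_timestamp?: number }>;
  attachments?: Array<{ data?: Array<{ external_context?: { url?: string } }> }>;
}

// The post body is the first `data` entry that carries a string `post`.
function postText(p: ArchivePost): string | null {
  return p.data?.find((d) => typeof d.post === "string")?.post ?? null;
}

// Collect every non-empty external_context.url across all attachments.
function attachmentUrls(p: ArchivePost): string[] {
  const urls: string[] = [];
  for (const attachment of p.attachments ?? []) {
    for (const item of attachment.data ?? []) {
      const url = item.external_context?.url;
      if (url) urls.push(url);
    }
  }
  return urls;
}

// The sample post from this thread.
const sample: ArchivePost = {
  timestamp: 1740775067,
  attachments: [{ data: [{ external_context: { url: "https://stardewvalleywiki.com/JojaMart" } }] }],
  data: [
    { post: "One of the best wikis around: https://stardewvalleywiki.com/Stardew_Valley_Wiki\n\nAnd the store to boycott: https://stardewvalleywiki.com/JojaMart" },
    { update_timestamp: 1740775067 },
    {},
    {},
    {},
  ],
  title: "Chase Westbrook shared a link.",
};
console.log(attachmentUrls(sample));
```

The sample yields only the JojaMart URL even though the text contains two links, so capturing every link would also mean scanning `postText` for URLs.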

Contributor

Maybe we can check the domain to make sure that the external_context.url is a Facebook link?
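The domain check suggested here could look roughly like this. It is a sketch that assumes a reshare would point at `facebook.com` or one of its subdomains (not verified against a real reshare); `tryParseUrl` and `isFacebookUrl` are hypothetical helpers.

```typescript
// Parse a string as a URL, returning null instead of throwing on bad input.
function tryParseUrl(s: string): URL | null {
  try {
    return new URL(s);
  } catch {
    return null;
  }
}

// True only for facebook.com itself or one of its subdomains.
function isFacebookUrl(url: string): boolean {
  const parsed = tryParseUrl(url);
  if (parsed === null) {
    return false; // empty or malformed URL: nothing to check
  }
  const host = parsed.hostname;
  // Match on a label boundary so lookalikes such as "notfacebook.com" fail.
  return host === "facebook.com" || host.endsWith(".facebook.com");
}

console.log(isFacebookUrl("https://stardewvalleywiki.com/JojaMart")); // false
console.log(isFacebookUrl("https://www.facebook.com/permalink.php?story_fbid=x&id=1")); // true
```

Because an empty URL returns `false`, this check on its own would not flag an `external_context` whose `url` is blank.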

Contributor Author

In my export, reposts do have "data"->"external_context"->"url", but the URL is empty, so I've modified the logic to look for that.
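The check described in this reply can be sketched as below. It rests on an observation from one export (reposts carry an `external_context` whose `url` is empty), and it also treats a missing `url` the same way, which is an added assumption. `looksLikeRepost` is a hypothetical name; the logic actually merged may differ.

```typescript
// Minimal attachment shape, inferred from the samples in this thread.
interface AttachmentItem {
  external_context?: { url?: string };
}
interface PostWithAttachments {
  attachments?: Array<{ data?: AttachmentItem[] }>;
}

// A post looks like a repost if any attachment item has an external_context whose
// url is empty or absent. Link previews with a real URL do not count.
function looksLikeRepost(post: PostWithAttachments): boolean {
  return (post.attachments ?? []).some((attachment) =>
    (attachment.data ?? []).some(
      (item) => item.external_context !== undefined && !item.external_context.url
    )
  );
}

// The link-preview post from this thread is no longer treated as a repost.
console.log(looksLikeRepost({
  attachments: [{ data: [{ external_context: { url: "https://stardewvalleywiki.com/JojaMart" } }] }],
})); // false
console.log(looksLikeRepost({ attachments: [{ data: [{ external_context: { url: "" } }] }] })); // true
```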


// Skip if it's a shared/repost, group post, shares a group, etc. We will extend the import logic
// to include other data types in the future.
if (isSharedPost) {
log.info("FacebookAccountController.importFacebookArchive: skipping shared post");
continue;
}
else if (post.attachments) {
log.info("FacebookAccountController.importFacebookArchive: skipping unknown post type");
continue;
}

postsData.push({
id_str: post.timestamp.toString(),
Member

I think using the timestamp as an ID is reasonable, since (hopefully) it will be unique. But my bigger concern is this: since it isn't the post's real ID, how will we use the info we're storing to delete this post in the future, if we choose to?

For example, in my test account, if I click on a post it loads this URL:

https://www.facebook.com/permalink.php?story_fbid=pfbid0UWD2yDkvpM5FoBCJFiG3wFvjKcCx45RaZFeTZm1EU7VzyyRXRNkKTjnYwWTKGqc6l&id=61572798227018

The id part, 61572798227018, is my Facebook ID, so the story_fbid part, pfbid0UWD2yDkvpM5FoBCJFiG3wFvjKcCx45RaZFeTZm1EU7VzyyRXRNkKTjnYwWTKGqc6l, must be the post ID. (I stripped the pfbid prefix and tried base64-decoding the rest, but nothing legible came out.)

When I grep my FB archive for this string, there's nothing. So once we get to the point where we choose which posts to delete, how do we delete this one if we can't build the URL to load it?

I don't think this is something we need to solve in this PR, but it makes sense to consider what we actually get from the archive. If we don't have real post IDs (or rather, story IDs, as FB seems to call them), then maybe the deletion options just have to be things like "delete all posts without media", "delete all videos", "delete all photos", etc., instead of being based on reactions like we do with X.
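For reference, the permalink above splits cleanly into the two values described in this comment. A sketch using the standard `URL` API; `parsePermalink` is hypothetical, and it would only help if a permalink were captured from the live site, since the `story_fbid` value does not appear in the archive.

```typescript
// Extract story_fbid (the opaque story/post ID) and id (the profile ID) from a
// permalink.php URL. Returns null if the URL is malformed or either value is missing.
function parsePermalink(link: string): { storyFbid: string; profileId: string } | null {
  try {
    const params = new URL(link).searchParams;
    const storyFbid = params.get("story_fbid");
    const profileId = params.get("id");
    return storyFbid && profileId ? { storyFbid, profileId } : null;
  } catch {
    return null; // not an absolute URL
  }
}

const parsed = parsePermalink(
  "https://www.facebook.com/permalink.php?story_fbid=pfbid0UWD2yDkvpM5FoBCJFiG3wFvjKcCx45RaZFeTZm1EU7VzyyRXRNkKTjnYwWTKGqc6l&id=61572798227018"
);
console.log(parsed?.profileId); // 61572798227018
```

The pfbid value is treated as an opaque string here; nothing in this sketch decodes it.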

Contributor Author

I don't see how we can get the post ID from the archive, so we may need to do the latter. We could also provide options to delete different types of content within a specified time period, e.g. "Delete all posts more than 2 years old".
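A time-based option like "Delete all posts more than 2 years old" needs only the `createdAt` column of the `post` table, which the archive does give us. A minimal sketch of the selection step, assuming rows come back with `createdAt` as an ISO-8601 string (how the `exec` helper stores a `Date` isn't shown in this diff); `PostRow` and `postsOlderThan` are hypothetical names.

```typescript
interface PostRow {
  postID: string; // the import currently uses the post's Unix timestamp here
  createdAt: string; // ASSUMPTION: ISO-8601 string, e.g. "2022-01-01T00:00:00.000Z"
}

// Return the rows created strictly before the cutoff `years` years before `now`.
function postsOlderThan(rows: PostRow[], years: number, now: Date = new Date()): PostRow[] {
  const cutoff = new Date(now.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - years);
  return rows.filter((row) => new Date(row.createdAt).getTime() < cutoff.getTime());
}

const rows: PostRow[] = [
  { postID: "1640995200", createdAt: "2022-01-01T00:00:00.000Z" },
  { postID: "1717200000", createdAt: "2024-06-01T00:00:00.000Z" },
];
// With now = 2025-03-13 and a 2-year window, the cutoff is 2023-03-13.
console.log(postsOlderThan(rows, 2, new Date("2025-03-13T00:00:00Z")).map((r) => r.postID));
```

In the app the same filter would more naturally be a `WHERE createdAt < ?` query; the in-memory version just keeps the sketch self-contained.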

title: post.title || '',
full_text: postText,
created_at: new Date(post.timestamp * 1000).toISOString(),
});
}
} catch (e) {
return {
status: "error",
errorMessage: "Error parsing JSON in exported posts",
importCount: importCount,
skipCount: skipCount,
};
}

// Loop through the posts and add them to the database
try {
postsData.forEach((post) => {
// Is this post already there?
const existingPost = exec(this.db, 'SELECT * FROM post WHERE postID = ?', [post.id_str], "get") as FacebookPostRow;
if (existingPost) {
// Delete the existing post to re-import
exec(this.db, 'DELETE FROM post WHERE postID = ?', [post.id_str]);
}

// TODO: implement media import for facebook
// TODO: implement urls import for facebook

// Import it
exec(this.db, 'INSERT INTO post (postID, createdAt, title, text, addedToDatabaseAt) VALUES (?, ?, ?, ?, ?)', [
post.id_str,
new Date(post.created_at),
post.title,
post.full_text,
new Date(),
]);
importCount++;
});
} catch (e) {
return {
status: "error",
errorMessage: "Error importing posts: " + e,
importCount: importCount,
skipCount: skipCount,
};
}
}

return {
status: "success",
errorMessage: "",
importCount: importCount,
skipCount: skipCount,
};
}

return {
status: "error",
errorMessage: "Invalid data type.",
importCount: importCount,
skipCount: skipCount,
};
}
}
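Both `verifyFacebookArchive` and `importFacebookArchive` get the profile ID with `profileUrl.split('id=')[1]`, which keeps everything after `id=`, including any further query parameters. A sketch of a URL-based alternative, assuming `profile_uri` carries the ID in an `id` query parameter as the existing split implies; `profileIdFromUri` is hypothetical. A URI without an `id` parameter still yields `null`, which maps onto the existing "Could not extract profile ID from URL" error path.

```typescript
// Read the `id` query parameter from a profile URI. Returns null when the string is
// not an absolute URL or when `id` is missing or empty.
function profileIdFromUri(profileUri: string): string | null {
  try {
    // searchParams.get() stops at the next "&", unlike split('id=')[1]
    return new URL(profileUri).searchParams.get("id") || null;
  } catch {
    return null; // new URL() throws on strings that are not absolute URLs
  }
}

const uri = "https://www.facebook.com/profile.php?id=61572798227018&extra=1";
console.log(uri.split("id=")[1]); // 61572798227018&extra=1
console.log(profileIdFromUri(uri)); // 61572798227018
```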
36 changes: 36 additions & 0 deletions src/account_facebook/ipc.ts
Original file line number Diff line number Diff line change
@@ -110,4 +110,40 @@ export const defineIPCFacebook = () => {
throw new Error(packageExceptionForReport(error as Error));
}
});

ipcMain.handle('Facebook:unzipFacebookArchive', async (_, accountID: number, archivePath: string): Promise<string | null> => {
try {
const controller = getFacebookAccountController(accountID);
return await controller.unzipFacebookArchive(archivePath);
} catch (error) {
throw new Error(packageExceptionForReport(error as Error));
}
});

ipcMain.handle('Facebook:deleteUnzippedFacebookArchive', async (_, accountID: number, archivePath: string): Promise<void> => {
try {
const controller = getFacebookAccountController(accountID);
await controller.deleteUnzippedFacebookArchive(archivePath);
} catch (error) {
throw new Error(packageExceptionForReport(error as Error));
}
});

ipcMain.handle('Facebook:verifyFacebookArchive', async (_, accountID: number, archivePath: string): Promise<string | null> => {
try {
const controller = getFacebookAccountController(accountID);
return await controller.verifyFacebookArchive(archivePath);
} catch (error) {
throw new Error(packageExceptionForReport(error as Error));
}
});

ipcMain.handle('Facebook:importFacebookArchive', async (_, accountID: number, archivePath: string, dataType: string): Promise<FacebookImportArchiveResponse> => {
try {
const controller = getFacebookAccountController(accountID);
return await controller.importFacebookArchive(archivePath, dataType);
} catch (error) {
throw new Error(packageExceptionForReport(error as Error));
}
});
};