
Determining how long it will take to get your EAD from USCIS

Scraping data off the USCIS website

I adapted a script I found online (https://github.com/co89757/USCISCasePoll/blob/master/poll_uscis.py) that scrapes the USCIS case status website. With it, I collected the status of every 10th receipt number, covering cases filed between last October and today.
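The script below samples every 10th receipt number. That only picks up a particular case if its distance from the starting number is a multiple of 10. A quick check in Python confirms that my case lands on that grid and shows how many lookups the sample costs. It uses a hypothetical `sample_receipts` helper with the same `YSC` prefix and bounds as the script:

```python
# Hypothetical helpers (not part of the original script) for checking the
# every-10th sampling grid used by the scraper.

def sample_receipts(prefix, start, stop, step=10):
    """Return receipt numbers prefix+N for N in [start, stop), every `step`."""
    return [f"{prefix}{n}" for n in range(start, stop, step)]


def includes_case(receipts, case_number):
    """True if a specific receipt number is part of the sample."""
    return case_number in set(receipts)


# Same prefix and numeric bounds as the scraping script
sample = sample_receipts("YSC", 1890038932, 1890079632)
print(len(sample))                             # 4070
print(sample[0], sample[-1])                   # YSC1890038932 YSC1890079622
print(includes_case(sample, "YSC1890048782"))  # True: offset 9850 is a multiple of 10
```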

from pyquery import PyQuery as pq
import requests
import pandas as pd
import feather

STATUS_OK = 0      # a status heading was found; the R code below keeps code == 0
STATUS_ERROR = -1  # no status heading found (any non-zero value would do)

mynum = 1890048782  # This is my case number

def poll_optstatus(casenumber):
    """Poll the USCIS case status for a given receipt number.

    param casenumber: the case receipt number, e.g. 'YSC1890048782'
    returns: a tuple (code, status, details) with the status code,
             the status heading and the detailed status text
    """
    headers = {
        'Accept': 'text/html, application/xhtml+xml, image/jxr, */*',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'en-US, en; q=0.8, zh-Hans-CN; q=0.5, zh-Hans; q=0.3',
        'Cache-Control': 'no-cache',
        'Connection': 'Keep-Alive',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Host': 'egov.uscis.gov',
        'Referer': 'https://egov.uscis.gov/casestatus/mycasestatus.do',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586',
    }
    url = "https://egov.uscis.gov/casestatus/mycasestatus.do"
    data = {"appReceiptNum": casenumber, 'caseStatusSearchBtn': 'CHECK+STATUS'}

    res = requests.post(url, data=data, headers=headers, timeout=30)
    doc = pq(res.text)
    status = doc('h1').text()                # status heading, e.g. "Case Was Received"
    code = STATUS_OK if status else STATUS_ERROR
    details = doc('.text-center p').text()   # detailed status paragraph
    return (code, status, details)

# Get every 10th case status
case_nums = ['YSC' + str(i) for i in range(1890038932, 1890079632, 10)]
vals = [poll_optstatus(case) for case in case_nums]

# Name the columns explicitly: code, status, details
df = pd.DataFrame.from_records(vals, columns=['code', 'status', 'details'])
df['case'] = case_nums

feather.write_dataframe(df, "uscis.feather")
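Sending roughly 4,000 POST requests back to back is hard on the site, and a single dropped connection raises an exception that aborts the whole list comprehension. Below is a minimal sketch of a more forgiving loop. It assumes a `fetch` function like `poll_optstatus`. The `poll_all` name, the one-second delay and the retry count are illustrative choices, not limits USCIS documents:

```python
import time

# Hypothetical polite polling loop. `fetch` is injected (e.g. poll_optstatus)
# and so is `sleep`, which lets the loop be tested without network access.

def poll_all(case_nums, fetch, delay=1.0, retries=3, sleep=time.sleep):
    """Fetch each case in order, pausing between requests and retrying failures.

    Returns a list of (case, result) pairs; result is None if every attempt failed.
    """
    results = []
    for case in case_nums:
        result = None
        for attempt in range(retries):
            try:
                result = fetch(case)
                break
            except Exception:
                # In real use, catching requests.RequestException would be narrower.
                # Back off a little longer after each failed attempt.
                sleep(delay * (attempt + 1))
        results.append((case, result))
        sleep(delay)  # pause before the next case
    return results
```

To use it with the script above, pass `poll_optstatus` as `fetch`. At one second per request, the 4,070-case sample takes at least about 68 minutes, which is the price of not hammering the server.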

Data Transformations and cleaning

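library(feather)   # read_feather
library(dplyr)     # filter, mutate
library(magrittr)  # %<>% assignment pipe
library(readr)     # parse_date
library(stringr)   # word, str_extract
library(ggplot2)

cases = read_feather("uscis.feather")
colnames(cases)[1:3] = c('code', 'status', 'details')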

my_case_numeric = 1890048782
my_case_date = parse_date("2017-11-21")

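I kept only Form I-765 (the EAD application) cases still at "Case Was Received", plus cases at "New Card Is Being Produced", and dropped everything else, including cases that were rejected or cancelled for any reason.

This leaves only cases that have either been processed or are still waiting to be processed.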

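cases %<>% 
  # keep successful lookups: received I-765 cases, or cards in production
  filter(code == 0 & ((status == "Case Was Received" & grepl("765", details)) | 
         status == 'New Card Is Being Produced')) %>%
  # details start like "On November 21, 2017, ...", so words 2-4 hold a date;
  # for "New Card Is Being Produced" cases this is not the receipt date
  mutate(status_date = parse_date(word(details,2,4),format="%B %d, %Y,"),
         case_numeric = as.numeric(str_extract(case,"[0-9]+")))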
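For readers working in Python rather than R, the same filter and date extraction can be sketched with the standard library. This assumes, as the R code does, that `details` starts with a phrase like "On November 21, 2017, ..." so that words 2 to 4 hold the date. The function names are hypothetical:

```python
import re
from datetime import date, datetime

# Python sketch of the R filter/mutate step above (hypothetical helper names).

RECEIVED = "Case Was Received"
PRODUCED = "New Card Is Being Produced"

def keep_case(code, status, details):
    """Mirror the R filter: OK code, and either a received I-765 or a card in production."""
    return code == 0 and (
        (status == RECEIVED and "765" in details) or status == PRODUCED
    )

def status_date(details):
    """Parse words 2-4, e.g. 'November 21, 2017,' -> date(2017, 11, 21)."""
    words = details.split()
    return datetime.strptime(" ".join(words[1:4]), "%B %d, %Y,").date()

def case_numeric(case):
    """First run of digits in a receipt number, e.g. 'YSC1890048782' -> 1890048782."""
    return int(re.search(r"[0-9]+", case).group())
```

`%B` matches English month names only under an English or C locale, the same caveat that applies to `parse_date` in the R version.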

Unprocessed cases

First, let's look at the distribution of cases that have not yet been processed, including mine.

We can see that USCIS is falling behind on its promise to process applications within 75 to 90 days. So far, it appears to have finished cases submitted up to the beginning of November, which was 106 days ago.
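The comparison with the 75 to 90 day promise is plain date arithmetic on each pending case's receipt date. Here is a small sketch, with hypothetical helper names and the 90-day upper bound of the stated target as the cutoff. The dates in the usage check are made-up examples, not scraped data:

```python
from datetime import date

# Hypothetical helpers: how long pending cases have waited versus a 90-day target.

TARGET_MAX_DAYS = 90

def days_waiting(received, as_of):
    """Whole days between the receipt date and the as-of date."""
    return (as_of - received).days

def overdue(received, as_of, limit=TARGET_MAX_DAYS):
    """True if a still-pending case has waited longer than the target window."""
    return days_waiting(received, as_of) > limit

def share_overdue(pending_dates, as_of, limit=TARGET_MAX_DAYS):
    """Fraction of pending cases that are past the target window."""
    if not pending_dates:
        return 0.0
    return sum(overdue(d, as_of, limit) for d in pending_dates) / len(pending_dates)
```

Applied to the `status_date` column of the pending cases, `share_overdue` turns the visual impression into a single number for a chosen as-of date.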

Newly processed cases

Now we will look at the cases that USCIS has recently completed.

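# pending ("Case Was Received") cases carry their receipt date in status_date
cases_pending = cases %>% filter(status == "Case Was Received")

# fit receipt date (as days since 1970-01-01) as a smooth function of case number
fit <- smooth.spline(cases_pending$case_numeric, as.numeric(cases_pending$status_date))
cases$predicted_receival_date = as.Date(
                                  predict(fit, cases$case_numeric)$y, 
                                  origin = "1970-01-01")

cases_pending$predicted_receival_date = as.Date(
                                          predict(fit, cases_pending$case_numeric)$y, 
                                          origin = "1970-01-01")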

ggplot(cases, aes(x=predicted_receival_date, 
                  y = status, 
                  color = status)) + 
  geom_jitter() + 
  geom_vline(xintercept = my_case_date) +
  ggtitle('Case Status by Date Received', 'My case is indicated by the vertical line') +
  xlab('Date Received') +
  ylab('Case Status') +
  theme_classic() +
  scale_color_discrete(name="Process Status",
                         breaks=c("Case Was Received", "New Card Is Being Produced"),
                         labels=c("Unprocessed", "Processed"))

From this plot, it seems that cases received around mid-November are currently being processed; my official receipt date is November 21st.
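The plot effectively shows, for each estimated receipt date, what share of the sampled cases has reached “New Card Is Being Produced”. The sketch below computes that summary. It uses hypothetical names and toy input rather than the scraped data. It also picks out the dates that are only partly processed, which is where USCIS is currently working:

```python
from collections import defaultdict
from datetime import date

# Hypothetical summary of the jitter plot: rows are (receipt_date, status) pairs.

PROCESSED = "New Card Is Being Produced"

def share_processed_by_date(rows):
    """Return {receipt_date: fraction processed}, ordered by date."""
    totals = defaultdict(int)
    done = defaultdict(int)
    for received, status in rows:
        totals[received] += 1
        if status == PROCESSED:
            done[received] += 1
    return {d: done[d] / totals[d] for d in sorted(totals)}

def processing_frontier(shares):
    """Receipt dates whose cases are partly, but not fully, processed."""
    return [d for d, s in shares.items() if 0 < s < 1]
```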


From this data, I can make the following observations:

  • Cases with the same date as mine have just started being processed
  • It seems to take about 21 days for the cases received on a given date to go from entirely unprocessed to entirely processed.


  • There is a small but non-zero chance that my case will be processed in the next 4 days.
  • There is a 50% chance it will be processed in the next 10 days.
  • There is close to a 100% chance that my case will be processed in the next 21 days.
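The post does not spell out the model behind these estimates, so what follows is an assumption. Suppose that once cases from a given receipt date start being processed, each one finishes at a uniformly random point within a window of about 21 days. Conditioned on my case not being done yet, the chance it finishes within the next t days is then t / (21 - s), where s is how many days ago processing of my date began. A sketch of that calculation:

```python
# Assumed even-pace model (not stated in the post): completion time for a
# case is uniform over a ~21-day window that starts when its receipt date
# begins processing.

WINDOW_DAYS = 21

def prob_processed_within(days, started_days_ago=0, window=WINDOW_DAYS):
    """P(done within `days` | not done yet), under the uniform-window assumption."""
    remaining = window - started_days_ago
    if remaining <= 0:
        return 1.0  # the window has already passed
    return min(max(days, 0) / remaining, 1.0)
```

With `started_days_ago=0`, this gives about 0.48 for 10 days and 1.0 for 21 days, in line with the 50% and near-certain figures above. It gives about 0.19 for 4 days. If work on a date ramps up slowly at first, the true early probability would be lower than this even-pace figure.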


Converting case number to date received

Cases in the “New Card Is Being Produced” category do not show the date they were first received, so I needed to estimate each application's receipt date from its case number. Among the pending cases, which do show their receipt date, the date rises steadily with case number, so the mapping is well approximated by a simple smoothing spline:

ggplot(cases_pending, aes(x=case_numeric, y = status_date)) +
  geom_point() +
  geom_line(aes(y=predicted_receival_date), color = 'red') +
  xlab('Case Number') +
  ylab('Date Received') +
  theme_classic()
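Outside R, a simpler stand-in for `smooth.spline` is piecewise-linear interpolation of receipt date against case number, fitted only on the pending cases that carry a receipt date. This is a different technique from the spline used above: it passes exactly through every observed point rather than smoothing them, so it is noisier when receipt dates are scattered. A standard-library sketch with a hypothetical `fit_case_to_date` helper:

```python
from bisect import bisect_right
from datetime import date

# Hypothetical substitute for smooth.spline: linear interpolation between
# observed (case number, receipt date) points, clamped at both ends.

def fit_case_to_date(case_numbers, receipt_dates):
    """Return a function mapping a case number to an interpolated receipt date."""
    pairs = sorted(zip(case_numbers, (d.toordinal() for d in receipt_dates)))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]

    def predict(case_number):
        # Clamp outside the observed range instead of extrapolating
        if case_number <= xs[0]:
            return date.fromordinal(ys[0])
        if case_number >= xs[-1]:
            return date.fromordinal(ys[-1])
        i = bisect_right(xs, case_number)  # xs[i-1] <= case_number < xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        frac = (case_number - x0) / (x1 - x0)
        return date.fromordinal(round(y0 + frac * (y1 - y0)))

    return predict
```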